arXivDaily: arXiv daily academic digest, updated Monday through Friday

1. Statistical Theory and Methods (12 papers)

2606.19086 2026-06-18 stat.ME New submission

Probability Bound Analysis for Dependence Uncertainty in Risk and Decision Models

Rowan Iskandar

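AI summary: For settings where both marginal and dependence information are incomplete, proposes a dependence-sensitive PBA framework that propagates uncertainty through p-boxes, copulas, and Fréchet coupling sets, and shows in a risk decision model how dependence assumptions affect output bounds and tail risk.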

Abstract

Risk and decision models often combine sparse marginal information, precisely specified probability distributions, and dependence assumptions that are only partly justified. Probability bound analysis (PBA) represents epistemic uncertainty through probability boxes, but many applications assume independence or require dependence structures to be fully specified. We develop a dependence-sensitive PBA framework for black-box risk and decision models in which both marginal information and dependence information may be incomplete. The framework combines p-box parameters, precise-CDF parameters, and fixed quantities; incorporates specified dependence through copulas; and propagates unknown dependence through Fréchet-style admissible coupling sets. We also extend the construction to cross-dependence between imprecisely specified and precisely specified inputs. In an illustrative risk decision model, dependence assumptions materially affected output bounds and tail-risk summaries; analyses that ignored or simplified dependence produced narrower characterizations of plausible outcomes. The framework supports transparent uncertainty propagation when evidence is insufficient to justify either precise marginal distributions or a single dependence model.
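
As a hedged illustration of the unknown-dependence case in this abstract, the classical Fréchet-Hoeffding bounds give the range of a joint probability when only the two marginal probabilities are known. This is a minimal sketch, not the paper's framework; the function names and the numeric inputs are hypothetical and serve only to show that an independence assumption selects one point inside a wider admissible interval.

```python
def frechet_bounds(p_a: float, p_b: float) -> tuple:
    """Range of P(A and B) over all couplings with marginals P(A), P(B)."""
    lower = max(0.0, p_a + p_b - 1.0)  # countermonotone coupling
    upper = min(p_a, p_b)              # comonotone coupling
    return lower, upper


def independent_joint(p_a: float, p_b: float) -> float:
    """P(A and B) under an (assumed) independence coupling."""
    return p_a * p_b


# Hypothetical marginal probabilities of two adverse events.
p_a, p_b = 0.25, 0.5
lower, upper = frechet_bounds(p_a, p_b)
point = independent_joint(p_a, p_b)
print(f"unknown dependence: [{lower}, {upper}]; independence: {point}")
```

Reporting the interval rather than the single independence value is what keeps the analysis from appearing more certain than the evidence supports.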

2606.19011 2026-06-18 stat.ME New submission

Dimension reduction of multivariate densities in Bayes spaces

Adéla Czolková, Karel Hron, Sonja Greven

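AI summary: Proposes an orthogonal decomposition of multivariate probability density functions in Bayes spaces that separates independent and interactive components, proves that the decomposition is optimal in a PCA sense, and demonstrates its interpretability on housing and geological data.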

Abstract

The Bayes space provides a Hilbert space structure for analysing probability density functions (PDFs), equipping them with a geometry that reflects their relative and constrained nature. A key tool in this framework is the centred logratio (clr) transformation, which establishes an isometric isomorphism between the Bayes space and (a subspace of) the classical $L^2$ space. This makes it possible to apply functional data analysis (FDA) techniques, particularly functional principal component analysis (FPCA), to both univariate and multivariate density data in the context of dimension reduction. For multivariate PDFs, embedding them in the Bayes space enables an orthogonal decomposition into independent and interactive components. Furthermore, the independent part can be decomposed into mutually orthogonal geometric marginals. This structure provides more profound insights into the sources of variation in multivariate densities. We show that this decomposition of the total variance is optimal in a PCA sense, impacting the interpretation of the eigenfunctions and scores resulting from FPCA. We demonstrate that applying FPCA directly to multivariate densities is equivalent in a certain sense to applying multivariate FPCA to their decomposed form, with the resulting eigenfunctions and scores decomposing accordingly. The unique decomposition based on these theoretical results is applied to housing and geological empirical data respectively, demonstrating the interpretability and practical value of this approach.
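
The clr transformation mentioned in this abstract can be sketched for a density discretized on an equal-width grid, where the integral mean of the log-density becomes an arithmetic mean. This is an illustrative discretization under that assumption, not the authors' implementation; the function names and the example values are hypothetical.

```python
import math


def clr(density):
    """Centred logratio transform of a positive density on an equal-width grid."""
    logs = [math.log(v) for v in density]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def clr_inverse(values):
    """Map clr coordinates back to a density, normalised to sum to one."""
    expd = [math.exp(v) for v in values]
    total = sum(expd)
    return [v / total for v in expd]


# Hypothetical discretised density (bin probabilities).
density = [0.1, 0.2, 0.4, 0.3]
coords = clr(density)
recovered = clr_inverse(coords)
print(coords)
print(recovered)
```

The clr coordinates sum to zero, which is the discrete counterpart of the zero-integral constraint that defines the $L^2$ subspace, and ordinary FPCA can then be applied to those coordinates.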

2606.18366 2026-06-18 stat.ME New submission

A closed-form sample size correction for always-valid inference with optional stopping

Mårten Schultzberg

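AI summary: For sequential tests with continuous monitoring in A/B testing, proposes a closed-form correction factor $k^*$ that adjusts the fixed-sample size so that empirical power is close to the target, saving 8-20% of the sample budget.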

Abstract

Sequential tests that allow continuous monitoring are common in A/B experimentation. Power calculations for these tests require simulations that are hard to scale across many metrics on an experimentation platform. Instead, a common sizing heuristic inflates the fixed-sample size until the marginal rejection probability at the planned endpoint reaches $1-\beta$. This last-point rule is conservative because always-valid (AV) power is the probability of a boundary crossing at any time during the run, not at the endpoint alone. We give a closed-form correction factor $k^*(\alpha, \beta, t_0)$ expressed in elementary functions and the bivariate normal CDF, where $t_0 = m/n_z$ is the burn-in fraction. The closed-form approximation depends on the boundary only through its value and slope at the planned endpoint and can be evaluated for any smooth concave boundary. We work out three cases: the confidence sequences of Waudby-Smith et al. (2023) and Maharaj et al. (2023), and the mixture sequential probability ratio test of Johari et al. (2022). Setting the total sample size to $k^* \cdot n_z$, where $n_z$ is the fixed-sample size for allocation ratio $r$, hits empirical power within approximately 3 percentage points of target in Gaussian simulations. The correction factor depends on the allocation ratio $r$ only through $t_0 = m/n_z(r)$. We study sensitivity to the burn-in parameter and show that the correction saves 8--20% of the last-point sample budget across the operating range.

2606.18365 2026-06-18 stat.ME New submission

Logarithmic energy distances and Gini covariance for Hilbert-valued random elements

Norbert Henze, M. Dolores Jiménez-Gamero

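AI summary: Studies the limit of the generalized energy distance as $\alpha \downarrow 0$, obtaining a logarithmic energy distance that retains the characterization property and admits a representation via Gaussian-kernel maximum mean discrepancies, and then proposes a logarithmic Gini covariance for the $k$-sample problem.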

Comments 18 pages

Abstract

For $α\in(0,2)$, the generalized energy distance and the Gini covariance statistic are based on kernels of the form $(x,y)\mapsto \|x-y\|^α$, where $\|\cdot\|$ denotes the norm in a real separable Hilbert space. This paper investigates the boundary regime $α\downarrow 0$. After suitable normalization, the corresponding energy distance converges to a logarithmic energy distance involving the kernel $(x,y)\mapsto\log\|x-y\|$. We establish that the resulting logarithmic energy distance retains the fundamental characterization property of ordinary energy distances in separable Hilbert spaces and derive a representation in terms of Gaussian-kernel maximum mean discrepancies. Motivated by this representation, we introduce a logarithmic Gini covariance for the $k$-sample problem and investigate its structural and asymptotic properties. In particular, we derive a representation in terms of pairwise logarithmic energy distances, establish a characterization theorem for equality of distributions, develop asymptotic null and alternative theory for the corresponding empirical statistic, and discuss permutation-based implementation. The logarithmic framework reveals a new boundary phenomenon within the family of energy-type statistics and provides connections with kernel methods, functional data analysis, and high-dimensional inference.

2606.18933 2026-06-18 cs.LG cs.IR stat.ME New submission

Zero-Shot Active Feature Acquisition via LLM-Elicitation

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

Affiliations: Faculty of EE, Technion; Faculty of Medicine, Technion; CytoReason; NVIDIA

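AI summary: Proposes a zero-shot active feature acquisition framework that elicits the sufficient statistics of a Markov random field from an LLM, addressing the shortage of labeled data, and outperforms existing methods on a cohort of IBD patients.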

Abstract

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amounts of labeled data to fit probabilistic models guiding acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.

2606.18306 2026-06-18 cs.LG stat.ML New submission

Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

Vu Khac Ky

Affiliations: Department of Mathematics, FPT University

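AI summary: Proposes Fisher width as an analogue of Gaussian width on statistical manifolds, using the Fisher information to measure local geometry, proves that it retains key properties of Gaussian width, and applies it to generalization bounds for Fisher-Lipschitz hypothesis classes.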

Comments 48 pages, 3 figures

Abstract

Gaussian width is a central geometric complexity measure in high-dimensional probability, compressed sensing, convex optimization, and learning theory. It quantifies the average extent of a set along random directions, thereby capturing the effective dimension of constraint sets, hypothesis classes, and descent cones. However, this notion is intrinsically Euclidean. Statistical models instead carry a natural Riemannian geometry induced by the Fisher information metric, where directions are scaled according to statistical distinguishability rather than ambient Euclidean length. We introduce Fisher width, a Fisher-geometric analogue of Gaussian width for statistical manifolds. At a parameter point $θ$, Fisher width replaces the Euclidean identity by the local metric tensor $G(θ)^{1/2}$, measuring the Gaussian width of the Fisher-rescaled set. This makes the resulting quantity sensitive to local statistical curvature and invariant under smooth reparameterizations. We develop the basic theory of Fisher width, showing that it retains key structural features of Gaussian width, including concentration, metric perturbation stability, and spectral comparison bounds with the Euclidean baseline, while also capturing anisotropic geometric effects invisible to Euclidean measures. As an application, we prove a generalization bound for Fisher-Lipschitz hypothesis classes and propose computable estimators, which we evaluate empirically on MNIST across three model classes. Fisher width is to statistical manifolds what Gaussian width is to Euclidean convex bodies. This work lays the foundation for studying complexity and learning on curved statistical manifolds.
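
One way to read the definition in this abstract is as the Gaussian width of the set after rescaling by $G(\theta)^{1/2}$. The Monte Carlo sketch below follows that reading for a finite set of points and, purely for simplicity, assumes a diagonal Fisher matrix; it is not the paper's estimator, and all names and values are hypothetical.

```python
import math
import random


def gaussian_width(points, n_draws=2000, seed=0):
    """Monte Carlo estimate of E sup_{t in T} <g, t> for a finite set T."""
    rng = random.Random(seed)
    dim = len(points[0])
    total = 0.0
    for _ in range(n_draws):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += max(sum(gi * ti for gi, ti in zip(g, t)) for t in points)
    return total / n_draws


def fisher_width(points, fisher_diag, n_draws=2000, seed=0):
    """Gaussian width of the set rescaled by G^{1/2}, with G assumed diagonal."""
    scale = [math.sqrt(v) for v in fisher_diag]
    rescaled = [[s * ti for s, ti in zip(scale, t)] for t in points]
    return gaussian_width(rescaled, n_draws, seed)


# Hypothetical set: the four vertices of the unit cross-polytope in the plane.
points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
euclidean = gaussian_width(points)
anisotropic = fisher_width(points, fisher_diag=[9.0, 1.0])
print(euclidean, anisotropic)
```

With the identity matrix the two quantities coincide, and stretching one direction increases the width, which is the kind of anisotropic effect a purely Euclidean measure does not register.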

2606.18446 2026-06-18 astro-ph.CO astro-ph.IM stat.ME New submission

Covariance shrinkage for cosmological inference with Sellentin-Heavens-type likelihoods

Mattera Raffaele

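AI summary: Studies shrinkage regularization of covariance matrices estimated from a finite number of simulations, and proposes treating the shrinkage intensity as an auxiliary inferential quantity whose uncertainty is marginalized out, in order to improve the calibration of parameter posteriors.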

Abstract

Covariance matrices used in astronomical and cosmological parameter inference are often estimated from a finite number of simulations, so covariance uncertainty can affect posterior calibration and parameter constraints. We study covariance regularisation from the perspective of likelihood-based inference with simulation-estimated covariance matrices. First, we analyse scalar covariance scaling under the Gaussian plug-in likelihood and the covariance-marginalised Sellentin--Heavens likelihood. Using an expected negative log-likelihood loss, we show that Hartlap covariance-side scaling is recovered as the optimum under the Gaussian plug-in likelihood, whereas the unscaled sample covariance is optimal under the Sellentin--Heavens likelihood. This shows that scalar covariance corrections are likelihood-dependent and that an additional Hartlap-type global scaling is not favoured once covariance uncertainty is marginalised through the Sellentin--Heavens likelihood. We then introduce a shrinkage formulation in which the sample covariance is regularised towards a spherical target and the shrinkage intensity is treated as an auxiliary inferential quantity. A prior is assigned to the shrinkage intensity, the likelihood induces its posterior distribution, and the final parameter posterior is obtained by marginalising over it. Monte Carlo experiments show that shrinkage substantially improves covariance conditioning, while marginalisation over the shrinkage intensity propagates uncertainty about the amount of regularisation into posterior inference. The proposed approach provides a simple way to combine covariance-marginalised likelihood inference with structural regularisation of noisy simulation-estimated covariance matrices.
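
Two ingredients named in this abstract have simple standard forms that can be sketched directly: the Hartlap factor $(n_s - p - 2)/(n_s - 1)$ that debiases the inverse of a sample covariance estimated from $n_s$ simulations of a $p$-dimensional data vector, and linear shrinkage towards a spherical target. The code is an illustration with a fixed shrinkage intensity, not the paper's marginalised procedure; the function names and the example matrix are hypothetical.

```python
import math


def hartlap_factor(n_sims: int, n_data: int) -> float:
    """Multiplier that debiases the inverse sample covariance."""
    if n_sims <= n_data + 2:
        raise ValueError("need n_sims > n_data + 2")
    return (n_sims - n_data - 2) / (n_sims - 1)


def shrink_to_spherical(cov, intensity):
    """(1 - lam) * S + lam * (tr(S)/p) * I for a square matrix given as lists."""
    p = len(cov)
    mean_var = sum(cov[i][i] for i in range(p)) / p
    return [
        [(1 - intensity) * cov[i][j] + (intensity * mean_var if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]


def condition_number_2x2(cov):
    """Ratio of the eigenvalues of a symmetric positive-definite 2x2 matrix."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    half_trace = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b * b)
    return (half_trace + radius) / (half_trace - radius)


# Hypothetical, nearly singular sample covariance.
sample_cov = [[1.0, 0.99], [0.99, 1.0]]
shrunk = shrink_to_spherical(sample_cov, intensity=0.5)
print(condition_number_2x2(sample_cov), condition_number_2x2(shrunk))
print(hartlap_factor(n_sims=100, n_data=10))
```

In the approach the abstract describes, the intensity would not be fixed as it is here but assigned a prior and integrated out.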

2606.11136 2026-06-18 math.ST stat.ME stat.ML stat.TH New submission

Conformal Prediction for Dyadic Regression Under Complex Missingness

Robert Lunde, Minjie Yang, Elizaveta Levina, Ji Zhu

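AI summary: For dyadic regression under complex missingness mechanisms, proposes a conformal prediction framework that replaces exchangeability with distributional invariance conditions, handles random-subset samples through a bijection argument, and introduces several conformal procedures, including a graphon-weighted method, achieving asymptotic conditional validity.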

Abstract

We develop a framework for conformal prediction in dyadic regression problems under complex missingness mechanisms. At the theoretical level, we develop general technical tools for establishing finite-sample validity of conformal prediction under distributional invariance conditions weaker than exchangeability. A key result handles the case where the sample itself is a random subset of the index set, a setting not covered by existing theory, via a novel bijection argument that constructs an explicit measure-preserving correspondence between events. In addition, we propose conformal prediction procedures for jointly exchangeable arrays, including full conformal, split conformal, a row-column approach exploiting similarities within rows and columns, and a selective conformal procedure achieving mask-conditional validity. For missing elements, we establish asymptotic validity of a weighted conformal procedure under a nonparametric graphon model for the missingness mechanism. We further establish conditional validity results for both continuous and discrete responses; to the best of our knowledge, this is the first formal proof of asymptotic conditional validity for weighted conformal prediction under a missing-not-at-random assumption. The proposed methods are illustrated on synthetic and real network data.
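
For orientation, the baseline split conformal procedure that this abstract generalizes can be sketched in a few lines. The sketch assumes the standard exchangeable setting, not the dyadic or missing-data settings of the paper; the function name and the calibration scores are hypothetical.

```python
import math


def split_conformal_quantile(scores, alpha):
    """ceil((n + 1)(1 - alpha))-th smallest calibration score, or infinity."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this level
    return sorted(scores)[rank - 1]


# Hypothetical absolute residuals |y - prediction| on a calibration set.
calibration_scores = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.4]
radius = split_conformal_quantile(calibration_scores, alpha=0.25)
prediction = 2.0
print((prediction - radius, prediction + radius))
```

The interval `prediction ± radius` covers a new response with probability at least `1 - alpha` when calibration and test scores are exchangeable, which is the assumption the paper relaxes.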

2605.20726 2026-06-18 stat.ME cs.LG stat.ML Updated version

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

Ziang Song, Ying Jin, Emmanuel J. Candès

Affiliations: Department of Statistics, Stanford University; Department of Statistics and Data Science, University of Pennsylvania; Department of Mathematics, Stanford University

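AI summary: Proposes bounds on the false discovery proportion (FDP) in multiple testing that hold simultaneously over all rejection thresholds, constructing a high-probability envelope so that guarantees survive arbitrary post hoc threshold selection, with applications to outlier detection and conformal selection.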

Comments 34 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference

Abstract

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
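
The objects this abstract starts from, conformal p-values, the Benjamini-Hochberg procedure, and the realized FDP, can be sketched as follows. This shows only the baseline that the paper contrasts itself with, not its simultaneous envelope; the function names and numbers are hypothetical, and larger scores are assumed to indicate more outlying points.

```python
def conformal_p_value(calibration_scores, test_score):
    """(1 + #{calibration scores >= test score}) / (n + 1)."""
    n = len(calibration_scores)
    return (1 + sum(s >= test_score for s in calibration_scores)) / (n + 1)


def benjamini_hochberg(p_values, q):
    """Indices rejected by the BH step-up procedure at nominal level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            n_reject = rank  # keep the largest rank passing its threshold
    return sorted(order[:n_reject])


def false_discovery_proportion(rejected, true_nulls):
    """Fraction of rejections that are true nulls (0 if nothing is rejected)."""
    if not rejected:
        return 0.0
    return sum(i in true_nulls for i in rejected) / len(rejected)


# Hypothetical inputs.
calibration = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(conformal_p_value(calibration, 10), conformal_p_value(calibration, 5))
rejected = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], q=0.1)
print(rejected, false_discovery_proportion(rejected, true_nulls={1, 3}))
```

BH controls only the expectation of the last quantity at a threshold fixed in advance, which is the gap the abstract's high-probability, all-thresholds bounds are meant to close.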

2604.04141 2026-06-18 stat.ME math.ST stat.AP stat.TH Updated version

On Data Thinning for Model Validation in Small Area Estimation

Sho Kawano, Paul A. Parker, Zehang Richard Li

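AI summary: Applies data thinning, which splits a single observation into independent training and test components, to model validation in small area estimation, analyzes the resulting bias-variance tradeoff, and gives practical recommendations.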

Abstract

Small area estimation produces estimates of population parameters for geographic and demographic subgroups with limited sample sizes. Such estimates are critical for policy decisions, yet principled validation of these models remains a challenge. Unlike conventional predictive settings, validation data are rarely available. Data thinning splits a single observation into independent training and test components. It enables out-of-sample validation using only the area-level summary statistics routinely available, requiring only their Gaussianity and known sampling variances. However, the properties of thinning-based model comparison have not been formally studied. In this paper, we develop these properties. We construct an unbiased estimator of thinned-data mean squared error and show that it differs systematically from its full-data counterpart; for the standard Fay-Herriot model, the gap admits a closed-form expression that depends on the candidate model's shrinkage behavior. We further show that the estimator variance increases sharply as the training fraction approaches one, producing a bias-variance tradeoff with no universally optimal thinning parameter. Practical recommendations balancing these forces are informed by theory and verified empirically. Design-based simulations using American Community Survey microdata show that the recommended data thinning approach is competitive with information-criterion and simulation-based methods, and substantially more stable across heterogeneous sampling designs.
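
Gaussian data thinning, as described in this abstract, has a short constructive form: given $X \sim N(\mu, \sigma^2)$ with known $\sigma^2$ and a training fraction $\varepsilon$, draw $X_{\text{train}} \mid X \sim N(\varepsilon X, \varepsilon(1-\varepsilon)\sigma^2)$ and set $X_{\text{test}} = X - X_{\text{train}}$. The sketch below implements that single step; the function name and inputs are hypothetical, and it is not the authors' validation pipeline.

```python
import math
import random


def thin_gaussian(x, sampling_var, train_fraction, rng):
    """Split one Gaussian observation into independent train and test parts.

    If x ~ N(mu, sampling_var), then
      x_train ~ N(eps * mu, eps * sampling_var),
      x_test  ~ N((1 - eps) * mu, (1 - eps) * sampling_var),
    and the two parts are independent.
    """
    eps = train_fraction
    noise_sd = math.sqrt(eps * (1 - eps) * sampling_var)
    x_train = eps * x + rng.gauss(0.0, noise_sd)
    x_test = x - x_train
    return x_train, x_test


# Hypothetical direct estimate for one area, with known sampling variance.
rng = random.Random(0)
x_train, x_test = thin_gaussian(x=12.3, sampling_var=4.0, train_fraction=0.8, rng=rng)
print(x_train / 0.8, x_test / 0.2)  # both are unbiased for the area mean
```

The rescaled test part has variance `sampling_var / (1 - eps)`, which grows without bound as the training fraction approaches one, consistent with the variance blow-up the abstract reports.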

2511.14992 2026-06-18 stat.ME Updated version

An Estimand-Focused Approach for AUC Generalization and Cross-Study Benchmarking

Jiajun Liu, Guangcai Mao, Xiaofei Wang

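AI summary: Proposes an estimand-focused framework that, through calibration weighting and augmented variants, anchors AUC inference to a prespecified target population, addressing population shift and cross-study comparison in biomarker validation.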

Abstract

The area under the ROC curve (AUC) is the standard measure of a biomarker's discriminatory accuracy; however, AUC is rarely treated as a population-specific estimand. When validation cohorts differ from the intended target population in case mix, naïve AUC estimates can mislead both generalization and cross-study comparison. We develop an estimand-focused framework that anchors biomarker AUC inference to a prespecified target population, aligning with the ICH E9(R1) estimand perspective adapted to discrimination rather than treatment effect. The framework supports two scientific goals: generalizing a study-specific AUC to a clinically relevant target population, and benchmarking AUCs across studies on a common population footing. Methodologically, we extend calibration weighting to the U-statistic formulation of AUC, allowing valid estimation even when the target population is characterized only by summary-level covariate information. This setting is common in biomarker validation, where individual-level target data are often unavailable and existing transportability methods may not be applicable. When patient-level real-world data are accessible, the proposed augmented variants provide double robustness and improved efficiency. We establish asymptotic properties and study their performance through comprehensive simulations. Furthermore, we demonstrate the proposed framework on the POWER trials, evaluating baseline stair-climb power (SCP) as a prognostic marker for 6-month survival in advanced non-small-cell lung cancer (NSCLC). Unlike prior work on transporting model-based predictive accuracy, our framework targets the biomarker-level estimand directly and addresses cross-study comparability, an issue not resolved by current methods.
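
The U-statistic formulation of AUC mentioned in this abstract averages a pairwise kernel over all case-control pairs, and a weighted version simply reweights those pairs. The sketch below takes the weights as given inputs; how calibration weights are obtained from target-population summaries is the paper's contribution and is not reproduced here. The function name and all numbers are hypothetical.

```python
def weighted_auc(case_scores, control_scores, case_weights=None, control_weights=None):
    """Weighted U-statistic estimate of P(case score > control score).

    Ties contribute one half. With unit weights this is the usual
    Mann-Whitney estimate of the AUC.
    """
    if case_weights is None:
        case_weights = [1.0] * len(case_scores)
    if control_weights is None:
        control_weights = [1.0] * len(control_scores)
    numerator = denominator = 0.0
    for x, w in zip(case_scores, case_weights):
        for y, v in zip(control_scores, control_weights):
            kernel = 1.0 if x > y else 0.5 if x == y else 0.0
            numerator += w * v * kernel
            denominator += w * v
    return numerator / denominator


# Hypothetical biomarker values; the third case is up-weighted to mimic a
# target population in which such patients are more common.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3]
print(weighted_auc(cases, controls))
print(weighted_auc(cases, controls, case_weights=[1.0, 1.0, 4.0]))
```

The two printed values differ even though the biomarker and the data are unchanged, which illustrates why the abstract treats AUC as a population-specific quantity.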

2508.06402 2026-06-18 stat.ME math.ST stat.TH Updated version

Coverage correlation: detecting singular dependencies between random variables

Xuzhi Yang, Mona Azadkia, Tengyao Wang

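AI summary: Proposes the coverage correlation coefficient, a nonparametric statistic based on Monge-Kantorovich ranks for detecting singular dependencies between random variables, which is distribution-free and efficient to compute.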

Comments 100 pages, 6 figures, 3 tables

Abstract

We introduce the coverage correlation coefficient, a novel nonparametric measure of statistical association designed to quantify the extent to which two random variables have a joint distribution concentrated on a singular subset with respect to the product of the marginals. Our correlation statistic consistently estimates an $f$-divergence between the joint distribution and the product of the marginals, which is 0 if and only if the variables are independent and 1 if and only if the copula is singular. Using Monge--Kantorovich ranks, the coverage correlation naturally extends to measure association between random vectors. It is distribution-free, admits an analytically tractable asymptotic null distribution, and can be computed efficiently, making it well-suited for detecting complex, potentially nonlinear associations in large-scale pairwise testing.

2. Bayesian Statistics and Probabilistic Modeling (7 papers)

2606.18412 2026-06-18 stat.ME stat.ML New submission

Bayesian Nonparametric Detection of Anomalies in Multivariate Functional Data

Daniel Krasnov, David Stephens

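AI summary: Proposes a Bayesian nonparametric method that models multivariate functional data as an infinite mixture of multi-output Gaussian processes, determines the number of mixture components automatically via slice sampling, obtains sparse representations with Besov priors, and selects covariance kernels with a Carlin-Chib step, so that anomalies are detected without specifying their number in advance.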

Comments 29 pages, 8 figures

Abstract

Anomalies in functional data arise from rare or distinct processes that deviate from the dominant data-generating mechanism. Detecting such departures is essential in applications where they may correspond to errors, structural changes, or other behavior of interest. This work introduces a Bayesian nonparametric approach for anomaly detection in multivariate functional data. We model functional data as an infinite mixture of multi-output Gaussian processes, with a finite and automatically determined number of mixture components obtained through slice sampling. Mean functions are represented using a wavelet basis and regularized through Besov priors to obtain a smooth and sparse representation of the data. Cross-functional dependence is captured using the intrinsic coregionalization model and we solve covariance kernel selection by introducing a Carlin-Chib product space step in the Markov Chain Monte Carlo algorithm. Within this model, anomalous observations are assigned to small mixture components without requiring prior specification of the number or nature of anomalies. We consider a semi-supervised setting, in which labels are available for 15% of the normal observations and a large class imbalance is present. The utility of our model is demonstrated on both univariate and multivariate functional data.

2606.19230 2026-06-18 cs.LG cs.HC stat.ML New submission

A Human-in-the-Loop Bayesian Optimization Framework for Constraint-Aware Bioprocess Development

Samuel Stricker, Claus Wirnsperger, Alessandro Butté, Laura Helleckes, Gonzalo Guillén Gosálbez, Antonio del Rio Chanona, Mehmet Mercangöz

Affiliations: Imperial College London; DataHow AG; ETH Zurich

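AI summary: Proposes an extension of Pareto Front Guided Sampling that treats the probability of constraint satisfaction and robustness, both derived from a Gaussian process surrogate, as objectives of a multi-objective optimization problem, and combines this with an interactive dashboard for human-in-the-loop, constraint-aware bioprocess optimization.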

Abstract

This work presents an extension to Pareto Front Guided Sampling (PFGS), a Human-in-the-Loop (HitL) Bayesian Optimization (BO) framework in which Gaussian process (GP) surrogate-derived quantities are reformulated as objectives of a multi-objective optimization problem, and the resulting Pareto front is exposed to a domain expert for interactive candidate selection rather than returning a single automated recommendation. The framework is extended in two directions: constrained optimization is addressed by incorporating the posterior probability of satisfying output specification limits as an explicit Pareto objective, computed analytically from the GP posterior distribution; robust optimization is addressed by a Monte Carlo sampling strategy that estimates expected lower-confidence performance over a user-defined variability of input perturbations, capturing performance degradation under likely implementation deviations. The resulting multi-dimensional Pareto representation renders trade-offs between predicted performance, model uncertainty, probabilistic constraint satisfaction, and input robustness simultaneously visible through pairwise two-dimensional projections on an interactive dashboard, enabling selection criteria to be iteratively refined as the surrogate model improves and development objectives evolve. The framework is showcased on an eight-dimensional fed-batch Chinese Hamster Ovary (CHO) cell culture simulator demonstrating systematic identification of high-performing, feasibility-compliant, and perturbation-resilient operating conditions, and illustrating how expert-defined requirements provide a principled stopping criterion and support informed allocation of experimental resources.
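
The constraint objective in this abstract, the posterior probability of meeting output specification limits, has a closed form when the GP posterior at a candidate point is Gaussian. The sketch below assumes a single output with scalar posterior mean and standard deviation already available from some surrogate; the function name and the numbers are hypothetical, and no particular GP library is implied.

```python
from statistics import NormalDist


def prob_within_spec(post_mean, post_std, lower, upper):
    """P(lower <= y <= upper) when y ~ N(post_mean, post_std**2)."""
    posterior = NormalDist(mu=post_mean, sigma=post_std)
    return posterior.cdf(upper) - posterior.cdf(lower)


# Hypothetical GP posterior at two candidate operating conditions,
# with specification limits [3.0, 7.0] on the predicted output.
confident = prob_within_spec(post_mean=5.0, post_std=0.5, lower=3.0, upper=7.0)
uncertain = prob_within_spec(post_mean=5.0, post_std=2.0, lower=3.0, upper=7.0)
print(confident, uncertain)
```

Two candidates with the same predicted mean can therefore score very differently on this objective, which is why it is useful to show it alongside predicted performance on a Pareto front.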

2606.18535 2026-06-18 stat.ME cs.LG math.ST stat.TH New submission

Shrinkage priors for Bayesian Substitute Confounders

Yordan P. Raykov, Hengrui Luo, Justin D. Strait, Wasiur R. KhudaBukhsh

Affiliations: School of Mathematical Sciences, University of Nottingham, Nottingham, UK; Department of Statistics, Rice University, USA; Lawrence Berkeley National Laboratory, USA; Statistical Sciences Group, Los Alamos National Laboratory, USA

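AI summary: To address over-encoding by substitute confounders in multi-cause observational studies, proposes a Bayesian factor assignment framework that uses shrinkage priors to learn sparse substitute confounders retaining coarse multi-cause dependence, and establishes posterior concentration and overlap-preserving geometry, yielding consistent estimation of mean potential outcomes.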

英文摘要

Multi-cause observational studies contain information about unmeasured confounding through the dependence structure among causes. However, literal imputation of the unobserved confounder is often more complex than learning a lower-dimensional substitute score that preserves the shared assignment variation needed for stable causal adjustment. The deconfounder (Wang and Blei, 2019) and related substitute confounder methods exploit this idea, but flexible assignment models can fit the joint distribution of the causes while producing scores that over-encode the treatment vector, collapse overlap, or capture single-cause variation. We develop a Bayesian factor assignment framework for learning sparse substitute confounders that retain coarse multi-cause dependence with shrinkage priors. The theory is stated at the level of posterior concentration, factor score contraction, and overlap-preserving assignment geometry and therefore does not rely on a particular shrinkage prior. Under these conditions, the proposed regression-adjusted estimators are consistent for mean potential outcomes when the corresponding latent variable identification assumptions hold. Shrinkage priors provide a natural tool for latent structural learning: they favour low-dimensional factors supported by multiple causes, discourage effectively single-cause factors, and induce an ordering of the latent factors through progressive shrinkage. Synthetic experiments illustrate the roles of signal strength, outcome validity, and geometry-aware regularization. In an Alzheimer's Disease Neuroimaging Initiative (ADNI) baseline analysis, sparse substitute scores recover much of the adjustment obtained by directly conditioning on invasive cerebrospinal-fluid biomarkers, while collapse diagnostics identify when fitted factors reduce to individual observed measurements.

2606.18491 2026-06-18 astro-ph.CO astro-ph.IM hep-ph stat.ME 新提交

The Coherence Principle: A Falsifiable Prior for Model Selection from the Grammar of Theories

相干性原理:从理论语法出发的可证伪模型选择先验

Raul Jimenez, Carlos Peña Garay, Fergus Simpson, Licia Verde

AI总结 提出相干性原理,通过理论语法(对称性、守恒律等)的符合度分配模型先验,用最大熵指数形式量化代价,在宇宙学和粒子物理中验证其有效性。

详情
AI中文摘要

宇宙学和粒子物理学中的贝叶斯模型选择通常是在后验几率继承了对竞争模型先验的强烈、往往未被承认的依赖性的情况下进行的。标准方法——参考先验、层次先验或诉诸自然性——忽略了相关的理论知识或依赖于难以操作定义的标准。我们提出\emph{相干性原理}:一种可重复的处方,根据与现有理论验证结构的兼容性来分配模型先验。这种结构,或\emph{语法},包括对称性、守恒律、局域性、洛伦兹不变性和普适性模式。对这些规则的无动机违反会产生相干性代价,通过由一个可校准参数$\alpha$控制的最大熵指数形式转换为先验权重。所得先验既不同于贝叶斯奥卡姆因子也不同于自然性:它惩罚的不是参数体积或精细调谐,而是对已验证理论语法的偏离。我们用宇宙学和基础物理中的例子说明该原理:中微子质量机制、暗能量和修改引力、暴胀、超出标准模型扇区以及层次天体物理推断。我们还在四个历史案例——广义相对论、泡利中微子、宇称破坏和狭义相对论——上测试它,这些案例中证据和理论背景可以重构。这些例子表明,当在正确的领域和时间定义适当的语法时,它倾向于历史上成功的选择。相干性原理使物理推理中常见但通常不言而喻的部分变得明确:对已验证结构规则的信任。它将这种判断转化为贝叶斯推断中透明、可测试和可推翻的组成部分,当数据足够有约束力时,让经验似然自由主导。

英文摘要

Bayesian model selection in cosmology and particle physics is often performed where posterior odds inherit a strong, often unacknowledged dependence on the prior assigned to competing models. Standard responses -- reference priors, hierarchical priors, or appeals to naturalness -- ignore relevant theoretical knowledge or rely on criteria hard to define operationally. We propose the \emph{Coherence Principle}: a reproducible prescription for assigning model priors according to compatibility with the validated structure of an existing theory. This structure, or \emph{grammar}, includes symmetries, conservation laws, locality, Lorentz invariance, and universality patterns. Unmotivated violations of these rules incur a coherence cost, converted into a prior weight through a maximum-entropy exponential form controlled by one calibratable parameter $\alpha$. The resulting prior is distinct from both the Bayesian Occam factor and naturalness: it penalizes not parameter volume or fine tuning, but departures from validated theoretical grammar. We illustrate the principle with examples from cosmology and fundamental physics: neutrino mass mechanisms, dark energy and modified gravity, inflation, beyond-Standard-Model sectors, and hierarchical astrophysical inference. We test it also on four historical cases -- general relativity, Pauli's neutrino, parity violation, and special relativity -- where evidential and theoretical contexts can be reconstructed. These examples show that it favors the historically successful choice when the proper grammar is defined in the correct domain and time. The Coherence Principle makes explicit a common but usually tacit part of physical reasoning: trust in validated structural rules. It turns this judgment into a transparent, testable, and overrulable component of Bayesian inference, leaving empirical likelihoods free to dominate when data are sufficiently constraining.
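
The exponential prior described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form below (weight proportional to exp(-alpha * cost), normalized over the candidate models) is an assumption, and the function name `coherence_prior`, the costs, and the Bayes factor are hypothetical values chosen only for illustration.

```python
import math

def coherence_prior(costs, alpha):
    """Assumed form: prior weight proportional to exp(-alpha * cost), normalized."""
    weights = [math.exp(-alpha * c) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example: model A violates no grammar rule (cost 0),
# model B incurs a coherence cost of 2.
prior = coherence_prior([0.0, 2.0], alpha=1.0)
prior_odds = prior[0] / prior[1]          # equals exp(alpha * (2 - 0))

# Posterior odds = Bayes factor * prior odds, so a sufficiently strong
# likelihood preference for B can still overrule the prior.
bayes_factor_a_over_b = 0.01              # hypothetical data strongly favouring B
posterior_odds = bayes_factor_a_over_b * prior_odds
```

With these made-up numbers the prior favours A, yet the posterior odds fall below 1, which matches the abstract's point that empirical likelihoods remain free to dominate when the data are constraining.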

2606.17491 2026-06-18 stat.ML cs.LG stat.ME 新提交

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

贝叶斯布尔矩阵分解及其在癌症拷贝数分析中的应用

Adolphus Wagala, Mehmet Samur, Giovanni Parmigiani

发表机构 * Department of Data Science, Dana-Farber Cancer Institute(达纳-法伯癌症研究所数据科学系) Department of Biostatistics, Harvard T.H. Chan School of Public Health(哈佛大学陈曾熙公共卫生学院生物统计学系)

AI总结 提出贝叶斯布尔矩阵分解(BBMF)模型,通过全共轭生成模型和稀疏先验实现布尔约束下的可解释因子分解,并应用于多发性骨髓瘤的染色体臂拷贝数变异分析,揭示肿瘤异质性的离散潜在结构。

详情
AI中文摘要

二值数据分解很常见,但实值方法忽略了离散性并产生难以解释的因子。布尔矩阵分解(BooMF)通过逻辑与和或运算将二值矩阵分解为两个低秩二值矩阵,将数据表示为可解释模式的布尔析取。在癌症基因组学中,BooMF可以揭示可能驱动肿瘤演化的协调特征变化,这与旋转或加性分解不同。大多数现有的BooMF方法是启发式的、贪婪的、对初始化敏感、容易陷入局部最优,并且不支持原则性的模型选择或不确定性量化。我们引入了贝叶斯布尔矩阵分解(BBMF),这是一个具有稀疏诱导先验的全共轭生成模型。它强制执行布尔约束,产生具有一致不确定性量化的可解释潜在因子,并允许具有封闭形式全条件分布的吉布斯采样。由于癌症演化通常涉及广泛、近乎同时的染色体数目变化(例如,全基因组复制后伴随不稳定性和选择),布尔分解比加性模型更自然地捕捉这些模式。应用于多发性骨髓瘤的臂级拷贝数变异数据(其中条目指示染色体臂扩增的存在/缺失),BBMF找到了一小组可解释的双团,将患者子集与反复共变的染色体臂联系起来,提供了肿瘤异质性的紧凑、生物学上有意义的总结,并展示了BBMF在复杂二值数据中发现离散潜在结构的实用性。

英文摘要

Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two lower-rank binary matrices via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, BooMF can reveal coordinated feature changes that may drive tumor evolution, unlike rotational or additive decompositions. Most existing BooMF methods are heuristic, greedy, sensitive to initialization, prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms, providing a compact, biologically meaningful summary of tumor heterogeneity and demonstrating BBMF's utility for uncovering discrete latent structure in complex binary data.
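
The Boolean product underlying BooMF can be sketched in a few lines. This is only an illustration of the AND/OR reconstruction, not the BBMF sampler from the paper; the function name `boolean_product` and the toy matrices are hypothetical.

```python
def boolean_product(U, V):
    """Boolean matrix product: X[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    rank = len(V)
    cols = len(V[0])
    return [
        [int(any(row[k] and V[k][j] for k in range(rank))) for j in range(cols)]
        for row in U
    ]

# Hypothetical toy data: 4 patients x 2 latent patterns,
# and 2 latent patterns x 5 chromosome arms.
U = [[1, 0],
     [1, 1],
     [0, 1],
     [0, 0]]
V = [[1, 1, 0, 0, 0],
     [0, 1, 1, 1, 0]]

X = boolean_product(U, V)
```

The second patient carries both patterns, and the arm shared by both patterns is reconstructed as 1 rather than 2. That is the difference from an additive factorization: overlapping patterns combine by disjunction, not by summation.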

2605.22640 2026-06-18 stat.ME 版本更新

Positive-definiteness in separable priors: effects on prior interpretability and inference

在可分离先验中的正定性:对先验可解释性和推断的影响

Jack Storror Carter, David Rossell

AI总结 本文研究了在对称正定矩阵中使用可分离先验时,截断对先验可解释性和推断的影响,探讨了如何设置先验参数以减少截断带来的影响。

Comments 34 pages, 3 figures

详情
AI中文摘要

对称正定矩阵的一类常用先验假设各元素相互独立,并添加截断以确保正定性。虽然概念上简单且计算上常有便利,但除非谨慎处理,这种截断可能会产生意外影响。如果截断先验或其边际分布显著不同于未截断的对应分布,则其可解释性可能受损,其收缩特性更难刻画,且后验推断可能以意想不到的方式受到影响。我们研究了截断对密集和稀疏矩阵的影响,并展示了如何设置先验参数(如非对角元素的方差),使得随着矩阵维度的增长,这种影响得以减轻。我们特别关注稀疏推断:除非精心设置先验参数,否则与未截断先验相比,截断先验及其对应的后验会系统性地将更多概率质量分配给更稀疏的结构。

英文摘要

A popular class of priors for symmetric positive-definite matrices assumes independent entries and adds a truncation to ensure positive-definiteness. While conceptually simple and often computationally convenient, unless done carefully this truncation can have unintended effects. If the truncated prior or its margins are significantly different from their untruncated counterpart, then its interpretability may suffer, its shrinkage properties become harder to characterise, and posterior inference may be affected in unanticipated ways. We investigate the effect of the truncation both for dense and sparse matrices, and show how to set prior parameters such as the variance of off-diagonal entries such that said effect is mitigated as the matrix dimension grows. We pay particular attention to sparse inference where, unless prior parameters are set carefully, the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures than the untruncated prior.
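
How strongly the truncation bites can be checked by simulation. The sketch below estimates the probability that a symmetric matrix with independent Gaussian off-diagonal entries is positive definite; a small probability means the truncated prior differs substantially from the untruncated one. Fixing the diagonal at 1, the Gaussian choice, and the standard deviation 0.4 are simplifying assumptions made here, not the paper's prior, and the function names are hypothetical.

```python
import random

def is_positive_definite(a):
    """Attempt a Cholesky factorization; it succeeds only for positive-definite input."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True

def pd_probability(dim, offdiag_sd, draws, rng):
    """Monte Carlo estimate of P(positive definite) with unit diagonal (assumed)."""
    hits = 0
    for _ in range(draws):
        a = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        for i in range(dim):
            for j in range(i):
                a[i][j] = a[j][i] = rng.gauss(0.0, offdiag_sd)
        hits += is_positive_definite(a)
    return hits / draws

rng = random.Random(1)
p_small = pd_probability(2, 0.4, 2000, rng)   # low dimension
p_large = pd_probability(8, 0.4, 2000, rng)   # same entry variance, higher dimension
```

Holding the off-diagonal variance fixed while the dimension grows makes positive definiteness rarer, which is consistent with the abstract's advice to tie the off-diagonal variance to the matrix dimension.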

2412.08895 2026-06-18 eess.SP stat.AP stat.CO 版本更新

Bayesian Wideband Signal Detection via Source Signal Marginalization and RJMCMC

基于源信号边际化与RJMCMC的贝叶斯宽带信号检测

Kyurae Kim, Philip T. Clemson, James P. Reilly, Jason F. Ralph, Simon Maskell

AI总结 提出一种宽带信号模型,通过循环卷积和频域稀疏矩阵分解,将边际似然计算复杂度从O(N^3 k^3)降至O(N k^3),结合非可逆RJMCMC实现全贝叶斯的源数检测。

详情
AI中文摘要

考虑一个阵列接收来自未知数量$k$个源的未知宽带信号。宽带信号可占据任意宽的带宽,使得基于解调的方法不适用,这在涉及声学信号的场景中很常见。本文旨在根据$N$个含噪阵列测量值确定$k$,这一任务称为“检测问题”,贝叶斯模型比较是常用方法。为使贝叶斯推断可行,通常需要对源信号进行边际化。不幸的是,对于宽带信号,朴素边际化的时间复杂度为$\mathcal{O}(N^3 k^3)$,难以承受。因此,全贝叶斯信号检测尚未在宽带设置中得到验证。本文提出一种宽带信号模型,允许计算上可处理的源信号边际化。我们从线性时不变(LTI)信号传播的规范模型出发,将其增强为循环卷积,且不失一般性。这允许在频域中进行高效计算,所得线性系统可分解为一个稀疏矩阵,我们称之为\textit{条带矩阵分解}。利用这种稀疏模式,可将计算边际似然的时间复杂度降至$\mathcal{O}(N k^3)$。这些计算改进使得通过可逆跳跃马尔可夫链蒙特卡洛(RJMCMC)进行高效后验推断成为可能。本文使用RJMCMC的非可逆扩展(NRJMCMC),它通常比RJMCMC具有更低的自相关性和更快的收敛速度。然后,可以使用NRJMCMC抽取的样本以全贝叶斯方式检测潜在源信号。我们通过与广义似然比检验(GLRT)和信息准则进行比较来评估我们的方法。

英文摘要

Consider an array receiving unknown wideband signals from an unknown number of sources $k$. Wideband signals can occupy arbitrarily wide bandwidths, rendering demodulation-based approaches inapplicable, a common situation in settings involving acoustic signals. Here, we aim to determine $k$ given $N$ noisy array-valued measurements, a task known as the "detection problem," for which Bayesian model comparison is a common approach. To render Bayesian inference tractable, it is typically necessary to marginalize the source signals. Unfortunately, for wideband signals, naive marginalization has an unaffordable time complexity of $\mathcal{O}(N^3 k^3)$. As a result, fully Bayesian signal detection has yet to be demonstrated in wideband settings. In this work, we propose a wideband signal model that allows for computationally tractable marginalization of the source signals. We begin from the canonical model of linear time-invariant (LTI) signal propagation, which is then augmented into a circular convolution, all without loss of generality. This allows for efficient computation in the frequency domain, where the resulting linear system admits a decomposition into a sparse matrix we refer to as a \textit{stripe matrix decomposition}. Exploiting this sparsity pattern reduces the time complexity of computing the marginal likelihood to $\mathcal{O}(N k^3)$. These computational improvements enable efficient posterior inference via reversible-jump Markov chain Monte Carlo (RJMCMC). In this work, we use the non-reversible extension of RJMCMC (NRJMCMC), which often achieves lower autocorrelation and faster convergence than RJMCMC. Detection of the latent source signals can then be performed in a fully Bayesian manner using samples drawn by NRJMCMC. We evaluate our procedure by comparing it against generalized likelihood ratio testing (GLRT) and information criteria.
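
The reason a circular-convolution model permits efficient frequency-domain computation is the convolution theorem: circular convolution in time becomes element-wise multiplication after a discrete Fourier transform. The sketch below checks this numerically on a toy signal. It illustrates only that identity, not the paper's stripe matrix decomposition or its marginal likelihood; the filter `h` and signal `s` are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def circular_convolve(h, s):
    """Direct circular convolution: y[t] = sum over m of h[m] * s[(t - m) mod n]."""
    n = len(s)
    return [sum(h[m] * s[(t - m) % n] for m in range(n)) for t in range(n)]

h = [1.0, 0.5, 0.0, 0.0]     # hypothetical propagation filter
s = [2.0, -1.0, 0.0, 3.0]    # hypothetical source signal

direct = circular_convolve(h, s)
product = [a * b for a, b in zip(dft(h), dft(s))]
via_dft = [v.real for v in dft(product, inverse=True)]
```

Both routes give the same output up to floating-point error. In the frequency domain each bin is handled independently, which is the kind of structure that lets the cost scale linearly rather than cubically in the number of measurements.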

3. 因果推断与实验设计 8 篇

2606.18459 2026-06-18 stat.ME 新提交

Apportioning Causal Responsibility of Two Risk Factors for an Adverse Outcome via Counterfactual Attribution

通过反事实归因分配两个风险因素对不良结果的因果责任

Shanshan Luo, Yafang Deng, Qingyuan Zhao, Zhi Geng

AI总结 提出一个量化框架,在无混杂和单调性假设下,通过反事实归因分配两个二元风险因素对已发生不良结果的因果责任,并建立非参数识别或推导出尖锐界限。

详情
AI中文摘要

与前瞻性评估原因效应的传统因果推断不同,分配因果责任需要回顾性评估以推断已发生结果的原因。本文提出了一个量化框架,用于在共同导致已实现不良结果的两个二元风险因素之间分配因果责任。理想情况下,了解个体的潜在因果类型(由所有可能暴露组合下的潜在结果定义)将允许精确分配;然而,这些潜在结果无法同时观测。因此,我们将每个风险因素的平均因果责任定义为其在潜在因果类型分布上的期望责任。在无混杂和单调性假设下,当类型特定责任满足结构平衡条件时,我们建立了该指标的非参数识别,否则推导出尖锐界限。我们使用吸烟和石棉暴露导致肺癌的经典例子来说明所提出的框架。

英文摘要

Unlike traditional causal inference, which prospectively evaluates the effects of causes, apportioning causal responsibility requires a retrospective assessment to deduce the causes of an outcome that has already occurred. This paper proposes a quantitative framework for apportioning causal responsibility between two binary risk factors that jointly contribute to a realized adverse outcome. Ideally, knowing the individual's latent causal type, defined by the potential outcomes under all possible exposure combinations, would allow precise apportionment; however, these potential outcomes cannot be simultaneously observed. We therefore define the average causal responsibility of each risk factor as its expected responsibility over the distribution of latent causal types. Under the assumptions of no confounding and monotonicity, we establish nonparametric identification of this metric when the type-specific responsibilities satisfy a structural balance condition, and derive sharp bounds otherwise. We illustrate the proposed framework using the classic example of lung cancer attributable to smoking and asbestos exposures.

2606.19117 2026-06-18 stat.ME cs.LG econ.EM stat.ML 新提交

Wasserstein Policy Learning for Distributional Outcomes

Wasserstein 策略学习用于分布性结果

Yiyan Huang, Cheuk Hang Leung, Qi Wu, Zhiheng Zhang

AI总结 针对分布值结果,提出基于Wasserstein重心和效用泛函的策略学习框架,使用IPW和DR估计器,证明遗憾率由策略类复杂度主导,并给出极小化下界。

Comments Accepted by The 39th Annual Conference on Learning Theory (COLT 2026)

详情
AI中文摘要

离线策略学习在因果推断中受到越来越多的关注。主要目标是学习一个策略(个体化治疗规则),作为从协变量到治疗的映射,以最大化定义为标量值潜在结果均值的经验福利。在本文中,我们研究具有分布值结果的离线策略学习,其中每个潜在结果是$\mathbb{R}$上的概率测度,奖励通过应用于诱导结果分布的Wasserstein重心的效用泛函来定义。我们基于逆概率加权(IPW)和双稳健(DR)估计器为策略学习框架建立了统计保证。通过处理组合策略类和无限维分位数域乘积上的具有挑战性的均匀偏差,我们证明了有限样本遗憾具有主导依赖$\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$。在一维Wasserstein设定下,并在所述正则条件下,主导遗憾率仍由策略类复杂度控制。此外,我们提供了一个极小化下界,建立了对$N$和$\mathrm{N\text{-}dim}(\Pi)$主导依赖的尖锐性。

英文摘要

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(\Pi)$.
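
For distributions on the real line, the 2-Wasserstein barycenter has a simple form: its quantile function is the weighted average of the input quantile functions. The sketch below applies this to empirical distributions with equally many atoms, where it reduces to averaging order statistics. It shows only the barycenter step, not the paper's IPW or DR estimators; the function name `barycenter_1d` and the samples are hypothetical.

```python
def barycenter_1d(samples, weights=None):
    """W2 barycenter of 1-D empirical distributions with equally many atoms.

    Returns the atoms of the barycenter, obtained by averaging order statistics.
    """
    count = len(samples)
    size = len(samples[0])
    if any(len(s) != size for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    if weights is None:
        weights = [1.0 / count] * count
    ordered = [sorted(s) for s in samples]
    return [sum(w * s[i] for w, s in zip(weights, ordered)) for i in range(size)]

# Hypothetical outcome distributions observed for two units.
a = [0.0, 1.0, 2.0]
b = [10.0, 12.0, 11.0]

bary = barycenter_1d([a, b])

# A utility functional, here simply the mean, applied to the barycenter.
utility = sum(bary) / len(bary)
```

Because the barycenter is built from quantiles, it keeps the shape of the inputs (here the unit spacing between atoms), whereas a plain mixture of the two samples would be bimodal.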

2606.18969 2026-06-18 stat.ME cs.MS stat.ML 新提交

Balanced Twins: Causal Inference on Time Series with Hidden Confounding

平衡双胞胎:存在隐藏混杂的时间序列因果推断

Ouali Maha, Ghattas Badih, Flachaire Emmanuel, Charpentier Philippe, Bozzi Laurent

AI总结 提出神经框架同时学习个体时间序列的低维潜在表示和倾向得分,通过灵活匹配恢复反事实,估计处理组的平均处理效应,适用于交错干预和隐藏混杂场景。

详情
AI中文摘要

准确估计时间序列中的处理效应对于评估实际应用中的干预措施至关重要,尤其是当处理分配受到未观测因素的偏差影响时。在许多实际环境中,干预措施在不同时间点被不同个体采用,导致交错的处理暴露和异质性的处理前历史。在这种情况下,汇总处理单元的结果轨迹是不明确的,因此个体处理效应(ITE)估计成为可靠因果推断的前提。因此,我们通过首先恢复个体层面的反事实来研究估计处理组平均处理效应(ATT)的问题。我们引入了一个神经框架,同时学习个体时间序列的低维潜在表示和倾向得分。然后,这些估计通过一个灵活的匹配过程来近似个体处理效应,该过程避免了合成控制方法中常用的经典凸性约束。通过在个体层面操作,我们的方法自然地适应交错干预,并在潜在偏差下改进反事实估计,而不依赖于显式的时间建模假设。我们在实际能源消耗数据和临床时间序列上展示了我们的方法,包括高频电力需求响应项目和重症监护病房(ICU)个体的半合成数据,其中隐藏混杂、交错处理采纳和非平稳动态普遍存在。

英文摘要

Accurately estimating treatment effects in time series is essential for evaluating interventions in real-world applications, especially when treatment assignment is biased by unobserved factors. In many practical settings, interventions are adopted at different times across individuals, leading to staggered treatment exposure and heterogeneous pre-treatment histories. In such cases, aggregating outcome trajectories across treated units is ill-defined, making individual treatment effect (ITE) estimation a prerequisite for reliable causal inference. We therefore study the problem of estimating the average treatment effect for the treated (ATT) by first recovering individual-level counterfactuals. We introduce a neural framework that learns simultaneously low-dimensional latent representations of individual time series and propensity scores. These estimates are then used to approximate the individual treatment effects through a flexible matching procedure that avoids classical convexity constraints commonly used in synthetic control methods. By operating at the individual level, our approach naturally accommodates staggered interventions and improves counterfactual estimation under latent bias, without relying on explicit temporal modeling assumptions. We illustrate our approach on both real-world energy consumption data and clinical time series, including high-frequency electricity demand-response programs and semi-synthetic data for individuals in intensive care unit (ICU), where hidden confounding, staggered treatment adoption, and non-stationary dynamics are prevalent.

2606.18750 2026-06-18 stat.AP cs.LG 新提交

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

确保可信的在线A/B测试:解决关于CUPED的五个关键问题

Yu Zhang, Bokui Wan, Yongli Qin, Jinyong Ma, Yifan Guo

AI总结 本文系统解决CUPED应用中五个常见但被忽视的问题,包括最优调整规范、回归调整有效性、鲁棒方差估计,并扩展到多臂实验和两阶段抽样设计,通过理论分析和实验验证提供可靠方法,已在字节跳动平台部署。

Comments 15 pages, 3 figures

详情
AI中文摘要

A/B测试已成为大规模在线实验中数据驱动决策的金标准,为功能发布、定价优化和用户体验提升提供关键指导。为最大化统计灵敏度,许多科技公司常规使用基于实验前数据的对照实验方法(CUPED),该技术实现大幅方差缩减,同时保持平均处理效应估计的无偏性。尽管被广泛采用,CUPED的几个关键方法和实践细节仍未充分探索。本文系统解决了关于CUPED应用的五个常见但被忽视的问题。首先,我们提供各种后CUPED估计量的比较分析,以确定最优调整规范。其次,我们评估基于回归的调整的有效性,并描述为此类框架定制的鲁棒方差估计方法。最后,我们将研究扩展到复杂但常见的场景,包括多臂实验和两阶段抽样设计。我们的发现表明,在这些设置中,简单地依赖标准方差估计量可能导致严重误导的推断。通过提供严格的理论见解和广泛的实验验证,本工作加深了对CUPED的概念理解。值得注意的是,推荐的方法已成功部署并集成到字节跳动的实验平台中。

英文摘要

A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.
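
The basic CUPED adjustment that the paper builds on can be sketched as follows: each outcome is shifted by theta times the centered pre-experiment covariate, with theta = cov(Y, X) / var(X) estimated on the pooled sample. This is the textbook single-covariate form, not any of the specific estimators compared in the paper; the simulated data, effect size, and function names are hypothetical.

```python
import random
import statistics

def cuped_adjust(y, x):
    """Return y - theta * (x - mean(x)) with theta = cov(y, x) / var(x)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / statistics.variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]

def diff_in_means(values, treat):
    """Treated-minus-control difference in sample means."""
    treated = [v for v, t in zip(values, treat) if t == 1]
    control = [v for v, t in zip(values, treat) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

# Simulated data (hypothetical): x is a pre-experiment metric, y the
# in-experiment metric; alternating assignment stands in for randomization.
rng = random.Random(0)
n = 2000
treat = [i % 2 for i in range(n)]
x = [rng.gauss(10.0, 2.0) for _ in range(n)]
y = [xi + 0.3 * t + rng.gauss(0.0, 1.0) for xi, t in zip(x, treat)]

y_adj = cuped_adjust(y, x)
naive_effect = diff_in_means(y, treat)
cuped_effect = diff_in_means(y_adj, treat)
```

The adjustment leaves the overall mean unchanged because the centered covariate sums to zero, while the variance of the adjusted metric drops by roughly the squared correlation between `x` and `y`. Valid variance estimation for the resulting effect estimate is a separate question, and it is the one the paper examines.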

2606.18281 2026-06-18 stat.AP cs.LG stat.ML 新提交

A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings

竞争风险背景下条件平均处理效应估计指南

Daniel Klippert, Sarah Friedrich, Markus Pauly

发表机构 * Department of Statistics, TU Dortmund University(多特蒙德工业大学统计学系) Research Center Trustworthy Data Science and Security, University Alliance Ruhr (UA Ruhr)(鲁尔大学联盟可信数据科学与安全研究中心) Institute for Mathematics, University of Augsburg(奥格斯堡大学数学研究所)

AI总结 针对竞争风险生存数据,比较六种元学习器估计条件平均处理效应,提供R包crsurvlearners指导模型选择。

详情
AI中文摘要

条件平均处理效应(CATE)是个性化医疗中治疗决策的核心。在竞争风险背景下,从生存数据估计CATE允许对特定感兴趣事件的治疗效果进行患者特异性评估,同时适当考虑替代事件类型。在存在合并症的情况下,这种区分至关重要,因为竞争死亡原因可能混淆治疗效果。本文聚焦于右删失生存时间和二元治疗,研究CATE定义为在固定时间点上感兴趣事件绝对风险的协变量条件差异。为此,我们研究了元学习器,这些学习器将机器学习算法适应于竞争风险场景中的CATE估计。我们系统比较了六种元学习器,结合Cox回归或随机生存森林进行风险建模,以及弹性网回归或随机森林进行直接CATE建模。为提供模型选择的实践指导,我们在多种模拟设置中评估其性能,这些设置在风险复杂性、治疗异质性、治疗分配、事件类型分布和删失方面有所不同。为促进应用,我们提供R包crsurvlearners,实现了所有考虑的方法。

英文摘要

Conditional average treatment effects (CATEs) are central to treatment decision-making in personalized medicine. In competing risks settings, estimating CATEs from survival data allows for patient-specific assessments of treatment effectiveness for a specific event of interest while properly accounting for alternative event types. This distinction is essential in the presence of comorbidities, where competing causes of death may otherwise confound the therapeutic benefit. Focusing on right-censored survival times with binary treatment, we examine CATEs defined as covariate-conditional differences in the absolute risk for the event of interest at a fixed time. To this end, we study meta-learners which adapt machine learning algorithms for CATE estimation in competing risks scenarios. We systematically compare six meta-learners, combining Cox regression or random survival forests for risk modeling with elastic net regression or random forests for direct CATE modeling. To provide practical guidance on model selection, we evaluate their performance in multiple simulation settings that differ in hazard complexity, treatment heterogeneity, treatment assignment, event type distribution, and censoring. To facilitate applied use, we provide the R package crsurvlearners, which implements all considered approaches.

2501.11996 2026-06-18 stat.ME econ.EM 版本更新

Experimental Designs for Multi-Item Multi-Period Inventory Control

多物品多周期库存控制的实验设计

Xinqi Chen, Xingyu Bai, Zeyu Zheng, Nian Si

AI总结 研究多物品多周期库存系统中A/B测试的偏差问题,提出一种物品与时间配对的实验设计,并通过仿真和真实数据验证其有效性。

详情
AI中文摘要

随机实验,或称A/B测试,是评估干预措施的黄金标准,但在库存管理中仍未得到充分利用。本研究通过分析在具有缺货损失和容量约束的多物品、多周期库存系统中的A/B测试策略来填补这一空白。我们考察了两种经典实验设计——切换实验和物品级随机化——并表明两者都因干扰而产生系统性偏差:切换实验中的时间延续效应和容量约束下物品间的相互蚕食。在温和条件下,我们刻画了不同场景下偏差的方向。受双边随机化的启发,我们提出了一种物品与时间配对的实验设计,并分析了其偏差性质。受控随机模拟验证了理论预测,而在真实生鲜零售数据上的轨迹驱动实验表明,在存在缺货替代的现实环境中,相同的机制仍然存在。

英文摘要

Randomized experiments, or A/B testing, are the gold standard for evaluating interventions, yet they remain underutilized in inventory management. This study addresses this gap by analyzing A/B testing strategies in multi-item, multi-period inventory systems with lost sales and capacity constraints. We examine two canonical experimental designs--switchback experiments and item-level randomization--and show that both suffer from systematic bias due to interference: temporal carryover in switchbacks and cannibalization across items under capacity constraints. Under mild conditions, we characterize the direction of this bias in different scenarios. Motivated by two-sided randomization, we propose a pairwise design over items and time and analyze its bias properties. Controlled stochastic simulations verify the theoretical predictions, and trace-driven experiments on real-world fresh-retail data show that the same mechanisms persist in realistic environments with stockout substitution.

2505.15215 2026-06-18 stat.ML cs.LG stat.ME 版本更新

Clustering and Pruning in Causal Data Fusion

因果数据融合中的聚类与剪枝

Otto Tabell, Santtu Tikka, Juha Karvanen

发表机构 * Department of Mathematics and Statistics(数学与统计学系)

AI总结 针对多数据源因果融合中变量增多导致计算复杂的问题,提出剪枝和聚类预处理方法,基于小图推断大图中因果效应的可识别性并给出识别函数。

详情
AI中文摘要

数据融合,即结合观测数据和实验数据的过程,可以使得原本不可识别的因果效应变得可识别。尽管针对特定场景已经开发了识别算法,但do-calculus仍然是因果数据融合的唯一通用工具,特别是当某些变量存在于部分数据源而其他数据源中没有时。然而,基于do-calculus的方法可能随着变量数量增加和因果图复杂度增长而面临计算挑战。因此,有必要在保留必要特征的同时减小此类模型的规模。为此,我们提出将剪枝(移除不必要的变量)和聚类(合并变量)作为因果数据融合的预处理操作。我们将先前关于单一数据源的结果进行推广,并推导出在多数据源情况下应用剪枝和聚类的条件。我们给出了基于较小图推断较大图中因果效应可识别性或不可识别性的充分条件,并展示了如何为可识别的因果效应获得相应的识别函数。来自流行病学和社会科学的例子展示了这些结果的应用。

英文摘要

Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific scenarios, do-calculus remains the only general-purpose tool for causal data fusion, particularly when variables are present in some data sources but not others. However, approaches based on do-calculus may encounter computational challenges as the number of variables increases and the causal graph grows in complexity. Consequently, there exists a need to reduce the size of such models while preserving the essential features. For this purpose, we propose pruning (removing unnecessary variables) and clustering (combining variables) as preprocessing operations for causal data fusion. We generalize earlier results on a single data source and derive conditions for applying pruning and clustering in the case of multiple data sources. We give sufficient conditions for inferring the identifiability or non-identifiability of a causal effect in a larger graph based on a smaller graph and show how to obtain the corresponding identifying functional for identifiable causal effects. Examples from epidemiology and social science demonstrate the use of the results.

2502.07641 2026-06-18 stat.ME stat.ML 版本更新

Distributional Instrumental Variable Method

分布工具变量方法

Anastasiia Holovchak, Sorawit Saengkyongam, Nicolai Meinshausen, Xinwei Shen

AI总结 提出分布工具变量方法,利用生成建模在非线性工具变量设置中估计整个干预分布,并证明其可识别性优于传统方法。

详情
AI中文摘要

工具变量方法常用于存在未测量混杂因素时推断因果效应。现有方法通常旨在估计平均因果效应,而少数方法关注分位数处理效应。本文的目标是估计整个干预分布。我们提出了一种称为分布工具变量(DIV)的方法,该方法在非线性工具变量设置中使用生成建模。我们在一般假设下建立了干预分布的可识别性,并展示了一个“欠识别”案例,其中DIV可以识别因果效应,而两阶段最小二乘法无法识别。我们的实证结果表明,DIV方法在广泛的模拟数据上表现良好,在均值或分位数处理效应的可识别性和估计误差方面优于现有工具变量方法。此外,我们将DIV应用于一个经济数据集,以检验制度质量与经济发展之间的因果关系,结果与原研究吻合良好。我们还将DIV应用于一个单细胞数据集,研究在未见干预下预测基因表达的泛化性和稳定性。DIV的软件实现可在R和Python中获取。

英文摘要

The instrumental variable (IV) approach is commonly used to infer causal effects in the presence of unmeasured confounding. Existing methods typically aim to estimate the mean causal effects, whereas a few other methods focus on quantile treatment effects. The aim of this work is to estimate the entire interventional distribution. We propose a method called Distributional Instrumental Variable (DIV), which uses generative modelling in a nonlinear IV setting. We establish identifiability of the interventional distribution under general assumptions and demonstrate an 'under-identified' case, where DIV can identify the causal effects while two-step least squares fails to. Our empirical results show that the DIV method performs well for a broad range of simulated data, exhibiting advantages over existing IV approaches in terms of the identifiability and estimation error of the mean or quantile treatment effects. Furthermore, we apply DIV to an economic data set to examine the causal relation between institutional quality and economic development and our results align well with the original study. We also apply DIV to a single-cell data set, where we study the generalizability and stability in predicting gene expression under unseen interventions. The software implementations of DIV are available in R and Python.

4. 高维统计与正则化 2 篇

2606.19065 2026-06-18 stat.ME stat.AP 新提交

Regularized covariance estimation from partially observed interferometric data

基于部分观测干涉数据的正则化协方差估计

Teresa Bortolotti, Roberta Troilo, Francesco Casu, Simone Vantini, Alessandra Menafoglio

AI总结 针对部分观测干涉数据中系统缺失的问题,提出一种基于拉普拉斯正则化的矩阵补全方法进行非参数协方差估计,无需平稳性或各向同性假设,在模拟和实地数据中均表现优异。

详情
AI中文摘要

小基线子集技术提供了高空间分辨率的地面位移远程测量,使其成为监测灾害易发地区地球物理过程的关键工具。有效分析这类数据需要可靠估计其二阶结构,但由于测量在调查区域的相对较大范围内系统缺失,这一目标难以实现。我们从函数数据分析的角度处理该问题,将观测视为具有二维域的部分观测函数数据。为了恰当表征数据,我们引入了部分观测的碎片化机制,其中曲线部分在重复测量中系统缺失。针对该机制,我们提出了一种新的协方差估计方法,将任务表述为带有拉普拉斯正则化的矩阵补全问题。该估计量是非参数的,且无需平稳性或各向同性假设。大量模拟表明,我们的方法在多种协方差结构下均能实现一致的低估计误差。应用于与Phlegraean Fields相关的地面位移数据,证明了其恢复有意义空间依赖模式的能力,突显了其在环境风险评估和监测中的潜力。

英文摘要

The Small BAseline Subset technique provides remote measurements of ground displacement with high spatial resolution, making it a key tool for monitoring geophysical processes in hazard-prone areas. An effective analysis of this type of data requires reliable estimation of their second-order structure, which is difficult to achieve because the measurements are systematically missing over relatively large portions of the investigated areas. We tackle the problem from a functional data analysis perspective and treat the observations as partially observed functional data with two-dimensional domain. To properly characterize the data, we introduce the fragmented regime of partial observation, where parts of the curves are systematically missing across replicates. For this regime, we propose a novel method for covariance estimation, formulating the task as a matrix completion problem with Laplacian regularization. The estimator is nonparametric and free from stationarity or isotropy assumptions. Extensive simulations show that our method achieves consistently low estimation error across a range of covariance structures. Application to ground displacement data relative to the Phlegraean Fields demonstrates its ability to recover meaningful spatial dependence patterns, highlighting its potential for environmental risk assessment and monitoring.

2606.18949 2026-06-18 stat.ME 新提交

Feature Screening for High-Dimensional Structural Break Predictive Regression

高维结构断点预测回归的特征筛选

Zhenjie Qin, Rongmao Zhang, Wenyang Zhang, Yang Zu

AI总结 提出一种在高维结构断点预测回归中筛选活跃预测变量和估计断点的方法,结合SICS、RCRS和IC准则,实现一致估计与选择。

详情
AI中文摘要

预测回归是探索收益可预测性的重要工具。在本研究中,我们介绍了一种在结构断点预测回归中选择和估计活跃预测变量及断点的有效程序。我们的方法允许断点数量随样本量增加,并适应可能是平稳或协整的稀疏活跃预测变量。我们首先使用确信独立规范筛选(SICS)程序识别活跃预测变量。接着,通过比率控制回归筛选(RCRS)方法估计断点。最后,使用信息准则(IC)消除不必要的断点和预测变量以减少冗余。该方法能够一致地估计和选择真实断点及活跃预测变量。我们的模拟和实证研究表明,所提出的程序表现良好。

英文摘要

Predictive regression is a crucial tool for exploring return predictability. In this study, we introduce an efficient procedure for selecting and estimating active predictors and change points in structural break predictive regression. Our approach allows the number of change points to increase with the sample size and accommodates sparse active predictors that may be stationary or cointegrated. We begin by identifying the active predictors using a Sure Independence Canonical Screening (SICS) procedure. Next, we estimate the change points through a Ratio-Controlled Regression Screening (RCRS) method. Finally, we reduce redundancy by eliminating unnecessary breakpoints and predictors using information criteria (IC). This approach allows for consistent estimation and selection of true breakpoints and active predictors. Our simulations and empirical studies demonstrate that the proposed procedure performs effectively.

5. 时间序列与空间统计 6 篇

2606.18806 2026-06-18 stat.AP 新提交

Spatial emergence of acceleration in global warming

全球变暖加速的空间涌现

Tanja Korsten Bugajski, Nicolai Peder Bulow Pedersen, J. Eduardo Vera-Valdes

AI总结 使用贝叶斯分层时空模型检测全球变暖加速的空间涌现,发现高置信度信号最早出现在高纬度地区,空间聚集会延迟检测。

Comments Supplementary information included after main manuscript

详情
AI中文摘要

全球变暖是否在加速仍存在争议,因为内部变率和空间异质性可能掩盖变暖速率的变化。这里我们使用具有结构化空间依赖的贝叶斯分层时空模型来估计局部变暖轨迹和加速,并将模型应用于逐步截断的观测数据,以推断加速何时变得可检测。我们发现,可检测的加速在气候系统中不均匀地涌现,最早的高置信度信号集中在特定的高纬度地区。在保留的网格单元中,超过90%后验概率为正加速的比例从1970-1990年的13.6%增加到1970-2026年的39.7%,而超过50%阈值的比例从46.4%增加到70.3%。这些结果表明,空间聚集通过平均加速已经出现的区域与加速仍然微弱或不确定的区域,从而延迟了检测。该框架提供了一个概率诊断工具,用于识别变暖在何处加剧以及加速何时变得统计上可检测。

英文摘要

Whether global warming is accelerating remains contested because internal variability and spatial heterogeneity can obscure changes in warming rates. Here we use a Bayesian hierarchical spatio-temporal model with structured spatial dependence to estimate local warming trajectories and acceleration, and apply the model to progressively truncated observations to infer when acceleration becomes detectable. We find that detectable acceleration emerges unevenly across the climate system, with the earliest high-confidence signals concentrated in selected high-latitude regions. Across retained grid cells, the proportion exceeding a 90% posterior probability of positive acceleration increases from 13.6% for 1970-1990 to 39.7% for 1970-2026, while the proportion exceeding a 50% threshold increases from 46.4% to 70.3%. These results show that spatial aggregation can delay detection by averaging regions where acceleration has already emerged with regions where it remains weak or uncertain. The framework provides a probabilistic diagnostic for identifying where warming is intensifying and when acceleration becomes statistically detectable.
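
The reported proportions (grid cells whose posterior probability of positive acceleration exceeds 90% or 50%) are a simple summary of posterior draws. The sketch below shows one way such a summary could be computed. It assumes the posterior is available as a list of acceleration samples per grid cell; the function name `exceedance_fraction` and the toy draws are hypothetical, not the authors' code or data.

```python
def exceedance_fraction(posterior_draws, threshold):
    """Share of grid cells whose posterior P(acceleration > 0) exceeds threshold.

    posterior_draws: one list of posterior acceleration samples per grid cell.
    """
    flagged = 0
    for draws in posterior_draws:
        p_positive = sum(1 for a in draws if a > 0) / len(draws)
        if p_positive > threshold:
            flagged += 1
    return flagged / len(posterior_draws)


# Hypothetical draws for four grid cells (four samples each).
cells = [
    [0.2, 0.1, 0.3, 0.4],     # P(acc > 0) = 1.00
    [0.2, -0.1, 0.3, 0.1],    # P(acc > 0) = 0.75
    [-0.2, -0.1, 0.3, -0.4],  # P(acc > 0) = 0.25
    [0.1, -0.1, 0.2, -0.3],   # P(acc > 0) = 0.50
]
share_90 = exceedance_fraction(cells, 0.9)
share_50 = exceedance_fraction(cells, 0.5)
```

Repeating the calculation on fits to progressively truncated records (for example 1970-1990, then 1970-2026) would give the kind of emergence curve summarized in the abstract.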

2606.18729 2026-06-18 stat.ML cs.LG 新提交

TimeLAVA: Learning-Agnostic Data Valuation for Time Series

TimeLAVA: 时间序列的学习无关数据估值

Wenqin Liu, Weizhi Quan, Aoqi Zuo, Erdun Gao, Vu Nguyen, Dino Sejdinovic, Howard Bondell, Mingming Gong

发表机构 * School of Mathematics and Statistics, The University of Melbourne(墨尔本大学数学与统计学学院)；Statistics, The University of Melbourne(墨尔本大学统计学系)；Statistics, University of Sydney(悉尼大学统计学系)；Responsible AI Research Centre, Australian Institute for Machine Learning(澳大利亚机器学习研究所负责任人工智能研究中心)；Amazon(亚马逊)；School of Mathematical Sciences, Adelaide University(阿德莱德大学数学科学学院)；Department of Machine Learning, MBZUAI(MBZUAI机器学习系)

AI总结 提出TimeLAVA,一种学习无关框架,通过小波变换和最优传输评估时间序列片段对分布差异的边际贡献,无需模型训练,在异常检测、数据剪枝和标签噪声检测中优于现有方法。

Comments 34 pages

详情
Journal ref ICML 2026
AI中文摘要

数据估值量化单个样本的内在质量,以实现原则性的数据整理、质量控制和鲁棒学习。对于医疗、金融和工业监控等关键领域的时间序列,有效的估值方法至关重要但基本缺乏。现有方法要么依赖于模型,限制了其泛化性,要么针对独立同分布数据设计,因此无法捕捉序列数据固有的时间依赖性、多尺度模式和非平稳动态。我们引入了TimeLAVA,一种学习无关框架,通过评估时间片段对最小化评估数据与参考数据之间分布差异的边际贡献来估值。其核心是一种新颖的基于选择性小波的Wasserstein差异,结合了用于时间定位的多尺度小波变换和用于对分布偏移具有鲁棒性的非平衡最优传输。通过敏感性分析高效计算片段值,无需模型训练,并聚合成逐点得分。我们提供了将估值与模型无关泛化联系起来的理论保证,并证明了对异常值污染的有界敏感性。在异常检测、数据剪枝和标签噪声检测上的大量实验表明,TimeLAVA在多样化的真实世界数据集上产生了比现有方法显著更具信息量的价值分数。

英文摘要

Data valuation quantifies the intrinsic quality of individual samples to enable principled data curation, quality control, and robust learning. For time series in critical domains such as healthcare, finance, and industrial monitoring, effective valuation methods are essential yet fundamentally lacking. Existing approaches are either model-dependent, limiting their generalizability, or designed for i.i.d. data and thus fail to capture temporal dependencies, multi-scale patterns, and non-stationary dynamics inherent to sequential data. We introduce TimeLAVA, a learning-agnostic framework that values temporal segments by their marginal contribution to minimizing distributional discrepancy between evaluated and reference data. At its core is a novel Selective Wavelet-based Wasserstein discrepancy combining multi-scale wavelet transforms for temporal localization with unbalanced optimal transport for robustness to distributional shifts. Segment values are efficiently computed via sensitivity analysis without requiring model training and aggregated into point-wise scores. We provide theoretical guarantees linking valuation to model-agnostic generalization and prove bounded sensitivity to outlier contamination. Extensive experiments across anomaly detection, data pruning, and label noise detection demonstrate that TimeLAVA produces significantly more informative value scores than existing methods on diverse real-world datasets.

2606.19138 2026-06-18 cs.LG stat.ML 新提交

INDEQS: Informed Neural controlled Differential EQuationS

INDEQS: 信息引导的神经控制微分方程

Michael Detzel, Gabriel Nobis, Kristiyan Blagov, Juri Schubert, Jackie Ma, Wojciech Samek

AI总结 提出INDEQS,一种基于图的NCDE预测方法,通过在不同架构位置注入有向图先验知识,结合内外混合机制和自适应图卷积,在合成和真实任务中优于无信息NCDE。

详情
AI中文摘要

神经控制微分方程(NCDE)为时间序列预测提供了强大的连续时间框架,但标准的基于图的扩展通常纯粹从数据中学习空间结构,即使在已知有向图结构的情况下也是如此。我们引入了信息引导的神经控制微分方程(INDEQS),这是一种基于图的NCDE预测方法,在特定的架构位置融入有向图的先验知识。INDEQS将隐藏状态在图节点上的内部混合与向量场和控制之间的外部混合分开,并提供了一种轻量级的图约束变体和一种更具表现力的变体,后者通过自适应图卷积从数据中学习额外的图连接。为了系统研究图信息在何种情况下有利于预测,我们在有向图上设计了一个连续平流模拟,生成了具有已知真实流结构的合成时空数据集。然后,我们在两个实际任务上评估INDEQS:水文网络上的河流流量预测和PeMS08上的交通流预测。在这些合成和真实基准测试中,外部信息引导在参数数量相当的情况下,相比无信息NCDE持续降低了平均绝对误差,尤其是在较大图上,而内部信息引导在需要严格遵循已知邻接时提供了一种参数效率更高的替代方案。离散卷积和连续时间解码器的比较进一步表明,连续解码器在实际任务中提供了更好的准确性和更大的时间灵活性。INDEQS和平流模拟的实现可在 https://github.com/Mitchi1/indeqs 获取。

英文摘要

Neural Controlled Differential Equations (NCDE) provide a powerful continuous-time framework for forecasting time series, but standard graph-based extensions typically learn spatial structure purely from data, even in settings where a directed graph structure is known a priori. We introduce Informed Neural controlled Differential EQuationS (INDEQS), a graph-based NCDE forecasting method that incorporates prior knowledge of a directed graph at distinct architectural positions. INDEQS separates inner mixing of hidden states across graph nodes from outer mixing between vector field and control, and offers both a lightweight graph-constrained variant and a more expressive variant, learning additional graph connections from data via adaptive graph convolutions. To systematically study when graph informedness is beneficial in forecasting, we devise a continuous advection simulation on directed graphs, yielding synthetic spatio-temporal datasets with known ground-truth flow structure. We then evaluate INDEQS on two real-world tasks: river discharge forecasting on a hydrological network and traffic flow prediction on PeMS08. Across these synthetic and real-world benchmarks, outer informedness consistently improves mean absolute error over an uninformed NCDE with comparable parameter count, particularly on larger graphs, while inner informedness offers a more parameter-efficient alternative when strict adherence to a known adjacency is desired. A comparison of discrete convolutional and continuous-time decoders further shows that continuous decoders yield better accuracy and greater temporal flexibility on real-world tasks. An implementation of INDEQS and the advection simulation is available at https://github.com/Mitchi1/indeqs.

2606.18445 2026-06-18 math.ST stat.ME stat.TH 新提交

A spectral based coefficient of determination for the fit of an MA(q) model

基于谱的MA(q)模型拟合优度判定系数

Holger Dette, Sebastian Kühnert

AI总结 提出基于谱的判定系数,衡量MA(q)模型对平稳过程谱密度的拟合优度,并建立渐近正态性、假设检验及最小阶数选择方法。

Comments 8 pages, 2 figures

详情
AI中文摘要

我们开发了一种基于谱的判定系数,用于衡量平稳过程的谱密度被MA($q$)模型类表示的程度。利用基于周期图的估计量,我们建立了渐近正态性,推导了MA($q$)假设的检验,并构造了确定达到规定近似质量的最小阶数$q$的程序。

英文摘要

We develop a spectral based coefficient of determination to measure how well the spectral density of a stationary process is represented by the class of MA($q$) models. Using periodogram-based estimators, we establish asymptotic normality, derive tests for the MA($q$) hypothesis, and construct procedures for determining the smallest order $q$ achieving a prescribed approximation quality.
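
The two ingredients named in the abstract, the spectral density of an MA($q$) model and the periodogram, can be written down directly. The sketch below uses the standard textbook definitions $f(\lambda)=\frac{\sigma^2}{2\pi}\,|1+\sum_{k=1}^{q}\theta_k e^{-ik\lambda}|^2$ and $I(\lambda_j)=\frac{1}{2\pi n}\,|\sum_{t} X_t e^{-it\lambda_j}|^2$. It does not reproduce the paper's coefficient of determination, whose exact form is not given here, and the function names are hypothetical.

```python
import cmath
import math


def ma_spectral_density(theta, sigma2, lam):
    """Spectral density of X_t = e_t + sum_k theta[k-1] * e_{t-k} at frequency lam."""
    transfer = 1 + sum(t * cmath.exp(-1j * (k + 1) * lam) for k, t in enumerate(theta))
    return sigma2 * abs(transfer) ** 2 / (2 * math.pi)


def periodogram(x):
    """Periodogram at the Fourier frequencies 2*pi*j/n, j = 0, ..., n-1."""
    n = len(x)
    out = []
    for j in range(n):
        lam = 2 * math.pi * j / n
        d = sum(xt * cmath.exp(-1j * lam * t) for t, xt in enumerate(x))
        out.append((lam, abs(d) ** 2 / (2 * math.pi * n)))
    return out


# MA(1) with theta = 0.5: integrating f over one period recovers the variance.
m = 64
variance = sum(ma_spectral_density([0.5], 1.0, 2 * math.pi * j / m) for j in range(m)) * 2 * math.pi / m
```

A fit measure of the kind described would compare a periodogram-based estimate of the spectral density with the closest member of the MA($q$) class.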

2605.27478 2026-06-18 stat.ML cs.LG math.PR 版本更新

Triangular-Reference Schrödinger Bridges for Time Series Generation

三角参考薛定谔桥用于时间序列生成

Gabriele Bocchi

发表机构 * Arakne S.r.l.(阿拉克内公司)

AI总结 提出三角参考薛定谔桥框架,通过区间冻结的退化扩散参考和层次化潜在波动率结构,实现时间序列的保守生成,并保持熵最小化的变分核心。

详情
AI中文摘要

我们引入了用于时间序列的三角参考薛定谔桥(TR-SBTS),这是SBTS框架的一种保守扩展,其中布朗参考被替换为区间冻结的、可能退化的扩散参考,在潜在波动率水平的层次上呈三角形。该构造是在增广状态空间上的单一熵投影,变分约束在时间和潜在水平上联合施加,并通过相对熵的分解层次展开。SBTS的变分核心得以保留:熵最小化器是参考的h-变换,在每个冻结区间上,最优动力学在活跃协方差方向的仿射叶上具有对数梯度漂移公式,即使冻结协方差是秩亏的也成立。我们建立了冻结近似的稳定性以及相应正则化核估计量的收敛性。该构造通过一个有限维条件映射实现,该映射由三种互补的过去约简组成——块PCR摘要、由运行时冻结协方差累积量诱导的过去增量的参考感知马氏核,以及在同一参考度量下的过去窗口WLS漂移回归器——以及一个耦合的状态-协方差桥步骤,其中每个潜在水平为上一水平产生动态参考,并由协方差描述符总结;该构造在数值实验上进行了评估。

英文摘要

Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya--Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments.

2407.20085 2026-06-18 stat.ME 版本更新

Local Level Dynamic Random Partition Models for Changepoint Detection

用于变化点检测的局部水平动态随机划分模型

Alice Giampino, Bernardo Nipoti, Marina Vannucci, Michele Guindani

AI总结 提出一种新颖的状态空间建模框架,通过动态线性模型和马尔可夫结构实现时间序列的局部水平动态随机划分,并采用非边缘错误发现率控制进行变化点检测,在模拟和人体手势数据上验证了有效性。

详情
AI中文摘要

受生物力学、运动分析和体育科学中传感器数据等复杂多元时间序列建模需求增加的驱动,我们引入了一种新颖的状态空间建模框架,其中状态方程编码数据潜在划分随时间的变化。基于动态线性模型的原理,我们的方法开发了一种随机划分模型,能够将数据划分与先前的划分随时间联系起来,使用简单的马尔可夫结构来解释时间持续性并促进变化点检测。变化点的选择涉及多个依赖决策,我们通过采用非边缘错误发现率控制来解决这种时间依赖性。这导致了一个简单的决策规则,与不考虑依赖性的方法相比,确保了更严格的错误发现率控制。该方法使用吉布斯采样算法高效实现,与现有的依赖随机划分模型方法相比,提供了一种直接的方法。此外,我们展示了所提方法如何适应多视图聚类场景。模拟研究和通过多种传感技术收集的人体手势阶段数据集的分析显示了该方法在动态聚类多元时间序列和检测变化点方面的有效性。

英文摘要

Motivated by an increasing demand for models that can effectively describe features of complex multivariate time series, e.g. from sensor data in biomechanics, motion analysis, and sports science, we introduce a novel state-space modeling framework where the state equation encodes the evolution of latent partitions of the data over time. Building on the principles of dynamic linear models, our approach develops a random partition model capable of linking data partitions to previous ones over time, using a straightforward Markov structure that accounts for temporal persistence and facilitates changepoint detection. The selection of changepoints involves multiple dependent decisions, and we address this time-dependence by adopting a non-marginal false discovery rate control. This leads to a simple decision rule that ensures more stringent control of the false discovery rate compared to approaches that do not consider dependence. The method is efficiently implemented using a Gibbs sampling algorithm, leading to a straightforward approach compared to existing methods for dependent random partition models. Additionally, we show how the proposed method can be adapted to handle multiview clustering scenarios. Simulation studies and the analysis of a human gesture phase dataset collected through various sensing technologies show the effectiveness of the method in dynamically clustering multivariate time series and detecting changepoints.

6. 计算统计与MCMC 12 篇

2606.19148 2026-06-18 stat.CO 新提交

Fast Computation of Free-Support Wasserstein Medians

自由支撑Wasserstein中位数的快速计算

Kisung You, Mauro Giuffré, Dennis Shung

AI总结 提出一种直接固定权重自由支撑求解器,通过求解精确最优传输子问题并重新定位支撑点,避免内循环,实现单调下降、凸包不变性和有限时间最佳残差率,计算效率显著优于嵌套Weiszfeld方法。

详情
AI中文摘要

Wasserstein中位数是Wasserstein重心的一种稳健替代方法,用于平均概率测度,但精确经验计算可能代价高昂。一种自然的度量空间Weiszfeld方案通过在每次外迭代中求解加权Wasserstein重心问题来更新当前候选,产生嵌套优化问题。我们提出一种直接固定权重自由支撑求解器,避免了这种内重心循环。在每次迭代中,该方法从当前候选到输入测度求解精确最优传输(OT)子问题,计算所选计划的质心投影,并将每个支撑原子重新定位到其投影目的地的逆距离加权平均值。对于平滑的中位数目标,我们证明这种重新定位是紧的majorization-minimization代理的精确最小化器。这产生了精确传输子问题的单调下降、凸包不变性、有限时间最佳残差率、可微性下的残差到梯度控制以及不动点和驻点刻画。我们还给出了平滑性、稳定性和分辨率一致性结果,阐明了固定权重近似。在精确OT基准测试中,直接求解器在显著减少精确传输子问题使用量的同时,获得了接近紧密求解的嵌套Weiszfeld基线中位数目标。额外的污染、后验聚合和图像原型实验表明,直接求解器产生的中位数摘要与嵌套计算相当,并且对异常分布比Wasserstein重心更不敏感。

英文摘要

The Wasserstein median is a robust alternative to the Wasserstein barycenter for averaging probability measures, but exact empirical computation can be expensive. A natural metric-space Weiszfeld scheme updates the current candidate by solving a weighted Wasserstein barycenter problem at each outer iteration, producing a nested optimization problem. We propose a direct fixed-weight free-support solver that avoids this inner barycenter loop. At each iteration, the method solves exact optimal transport (OT) subproblems from the current candidate to the input measures, computes barycentric projections of the selected plans, and relocates each support atom to an inverse-distance-weighted average of its projected destinations. For a smoothed median objective, we show that this relocation is the exact minimizer of a tight majorization--minimization surrogate. This yields monotone descent for exact transport subproblems, convex-hull invariance, a finite-time best-residual rate, residual-to-gradient control under differentiability, and fixed-point and stationarity characterizations. We also give smoothing, stability, and resolution-consistency results clarifying the fixed-weight approximation. In exact-OT benchmarks, the direct solver attains median objectives close to tightly solved nested Weiszfeld baselines while using substantially fewer exact transport subproblems. Additional contamination, posterior aggregation, and image-prototype experiments show that the direct solver produces median summaries comparable to nested computation and less sensitive to outlying distributions than Wasserstein barycenters.
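
The relocation step described above can be sketched in a deliberately simple setting. The assumptions are loud ones: measures are one-dimensional with equally many uniformly weighted atoms, so the exact optimal transport plan is the sorted matching and the barycentric projection of an atom is its matched atom. The inverse-distance weights are taken to be $1/\sqrt{W_2^2+\varepsilon^2}$ per input measure, which is one natural reading of the smoothed objective and not necessarily the authors' exact choice. All function names are hypothetical.

```python
import math


def w2_and_map(candidate, target):
    """W2 distance and matched destinations for 1D uniform measures of equal size."""
    order = sorted(range(len(candidate)), key=lambda i: candidate[i])
    sorted_target = sorted(target)
    dest = [0.0] * len(candidate)
    for rank, i in enumerate(order):
        dest[i] = sorted_target[rank]  # monotone matching is optimal in 1D
    cost = sum((c - d) ** 2 for c, d in zip(candidate, dest)) / len(candidate)
    return math.sqrt(cost), dest


def objective(candidate, measures):
    """Median objective: sum of W2 distances to the input measures."""
    return sum(w2_and_map(candidate, mu)[0] for mu in measures)


def median_step(candidate, measures, eps=1e-8):
    """Move each atom to the inverse-distance-weighted mean of its destinations."""
    results = [w2_and_map(candidate, mu) for mu in measures]
    weights = [1.0 / math.sqrt(d * d + eps * eps) for d, _ in results]
    total = sum(weights)
    return [
        sum(w * dest[i] for w, (_, dest) in zip(weights, results)) / total
        for i in range(len(candidate))
    ]


# Two nearby measures and one outlying measure (hypothetical data).
measures = [[0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [10.0, 11.0, 12.0]]
candidate = [sum(col) / len(col) for col in zip(*measures)]  # start at the atomwise mean
history = [objective(candidate, measures)]
for _ in range(50):
    candidate = median_step(candidate, measures)
    history.append(objective(candidate, measures))
```

In this toy case the iterates move away from the outlying measure while the objective decreases, which mirrors the robustness and monotone-descent properties claimed for the direct solver. In higher dimensions the sorted matching would have to be replaced by a general exact OT solver.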

2606.19044 2026-06-18 stat.CO stat.AP stat.ME 新提交

smoothbp: Fast Bayesian Hierarchical Piecewise Regression with Smoothed Transitions and Spike-and-Slab Model Selection

smoothbp:具有平滑转换和尖峰-板模型选择的快速贝叶斯分层分段回归

Aidan D. Bindoff

AI总结 提出R包smoothbp,利用Rust实现的Metropolis-within-Gibbs采样器和HMC,实现具有逻辑平滑转换的贝叶斯分层分段回归,支持多断点、随机效应和自动断点选择。

Comments 16 pages, 2 figures, R package on CRAN

详情
AI中文摘要

分段回归模型对于识别不同科学领域中纵向或空间数据的结构变化至关重要。虽然标准方法通常假设尖锐的瞬时转换和单一的非分层断点,但许多现实世界现象表现出逐渐平滑的转换,并且在不同组间系统性变化。我们介绍smoothbp,一个用于快速贝叶斯分层分段回归的R包,具有逻辑平滑转换。通过在Rust中实现定制的Metropolis-within-Gibbs采样器,smoothbp将线性项的精确共轭更新与非线性位置和尖锐度参数的哈密顿蒙特卡洛(HMC)转换相结合。smoothbp原生支持多个断点、随机截距、随机断点时机以及所有分段参数上的结构协变量。它还通过smoothbp_ss函数集成了Kuo和Mallick(1998)的尖峰-板先验,用于自动推断活跃断点的数量。我们记录了采样器,通过基于模拟的校准和区间覆盖研究验证了参数恢复和校准,并将smoothbp与R、Python、Julia和MATLAB中的现有软件进行了对比,展示了其相对于通用概率编程语言(如brms)和专用包(如mcp)的竞争效率。

英文摘要

Piecewise regression models are essential for identifying structural changes in longitudinal or spatial data across diverse scientific domains. While standard approaches often assume sharp, instantaneous transitions and single, non-hierarchical breakpoints, many real-world phenomena exhibit gradual, smoothed transitions that vary systematically across groups. We introduce smoothbp, an R package for fast, Bayesian hierarchical piecewise regression featuring logistic-smoothed transitions. By implementing a bespoke Metropolis-within-Gibbs sampler in Rust, smoothbp combines exact conjugate updates for linear terms with Hamiltonian Monte Carlo (HMC) transitions for non-linear location and sharpness parameters. smoothbp natively supports multiple change-points, random intercepts, random change-point timing, and structural covariates on all segment parameters. It also incorporates Kuo and Mallick (1998) spike-and-slab priors for automatic inference on the number of active breakpoints via the smoothbp_ss function. We document the sampler, validate parameter recovery and calibration through simulation-based calibration and interval-coverage studies, and contrast smoothbp against the existing software landscape across R, Python, Julia, and MATLAB, demonstrating its competitive efficiency against general-purpose probabilistic programming languages like brms and specialized packages like mcp.
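
To make "logistic-smoothed transition" concrete, the sketch below evaluates a single-change-point mean function in which the slope change is switched on by a logistic gate. The parameterization $\mu(x)=b_0+b_1x+\delta\,(x-c)\,\mathrm{logistic}(k(x-c))$ is an assumption chosen for illustration and may differ from the one used inside smoothbp. The function names are hypothetical and are not part of the package's API.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_piecewise(x, b0, b1, delta, c, k):
    """Mean function with slope b1 before c and roughly b1 + delta after c.

    c is the change-point location and k the sharpness: large k approaches a
    sharp hinge, small k gives a gradual transition.
    """
    return b0 + b1 * x + delta * (x - c) * sigmoid(k * (x - c))


# Hypothetical parameter values: slope 0.5 before x = 3, about 2.5 after.
left = smooth_piecewise(-7.0, b0=1.0, b1=0.5, delta=2.0, c=3.0, k=5.0)
at_c = smooth_piecewise(3.0, b0=1.0, b1=0.5, delta=2.0, c=3.0, k=5.0)
right = smooth_piecewise(13.0, b0=1.0, b1=0.5, delta=2.0, c=3.0, k=5.0)
```

In a hierarchical version, parameters such as `c` and `k` would vary by group, which is where the non-linear location and sharpness updates mentioned in the abstract would come in.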

2606.18409 2026-06-18 stat.ME 新提交

Learning Moment Maps for Continuous-Time Markov Chains under Monte Carlo Noise

蒙特卡洛噪声下连续时间马尔可夫链的矩映射学习

Madison Pratt, Olivia Prosper-Feldman

AI总结 针对连续时间马尔可夫链矩计算困难的问题,提出基于蒙特卡洛噪声训练数据的代理模型,学习参数到矩的映射,并分析噪声对均值和协方差估计的影响,给出计算资源分配策略。

详情
AI中文摘要

连续时间马尔可夫链广泛用于建模随机动力系统,但关键的汇总量如均值和协方差通常难以计算。虽然蒙特卡洛采样提供渐近精确的估计,但当需要在许多参数值下评估矩时,计算成本变得过高。我们开发了一个基于模拟的代理建模框架,从蒙特卡洛衍生的含噪声训练目标中学习参数到矩的映射,从而能够在参数空间中进行高效且准确的近似。我们表明,蒙特卡洛噪声主要通过加性方差影响均值估计,而协方差估计还额外受到来自经验估计的非线性变换引起的偏差的影响。使用随机易感-感染-恢复模型,我们证明了在用于构建含噪声训练标签的固定模拟预算下,神经网络能够准确学习均值和协方差。我们进一步描述了如何在参数空间覆盖和蒙特卡洛重复之间分配计算资源,表明协方差估计需要平衡分配以控制方差和偏差,而均值估计则更多受益于增加参数空间覆盖。最后,我们展示了学习到的矩映射能够产生有效的总体水平量,并在下游任务(如白化)中表现良好。这些结果强调了在代理建模中考虑蒙特卡洛噪声的重要性,并为随机系统中的基于模拟的学习提供了实用指导。

英文摘要

Continuous-time Markov Chains are widely used to model stochastic dynamical systems, but key summary quantities such as means and covariances are often intractable. While Monte Carlo sampling provides asymptotically exact estimates, it becomes computationally prohibitive when moments must be evaluated across many parameter values. We develop a simulation-based surrogate modeling framework that learns parameter-to-moment mappings from Monte Carlo-derived, noise-corrupted training targets, enabling efficient and accurate approximation across the parameter space. We show that Monte Carlo noise affects mean estimation primarily through additive variance, whereas covariance estimation is additionally impacted by bias arising from nonlinear transformations of empirical estimates. Using a stochastic Susceptible-Infected-Recovered model, we demonstrate that neural networks accurately learn both mean and covariance under fixed simulation budgets allocated to constructing the noisy training labels. We further characterize how to allocate computational resources between parameter-space coverage and Monte Carlo replication, showing that covariance estimation requires a balanced allocation to control both variance and bias, while mean estimation benefits more from increased parameter space coverage. Finally, we show that the learned moment mappings produce valid population-level quantities and perform well in downstream tasks such as whitening. These results highlight the importance of accounting for Monte Carlo noise in surrogate modeling and provide practical guidance for simulation-based learning in stochastic systems.

2606.19052 2026-06-18 q-fin.CP stat.CO 新提交

An extendable, integrated, and dynamic approach to forecasting and stress-testing credit risk

一种可扩展、集成且动态的信用风险预测与压力测试方法

Marcel Muller, Arno Botha, Conrad Beyers

AI总结 提出一种集成贷款生成与信用风险的可扩展压力测试方法,通过蒙特卡洛模拟生成贷款组合并计算风险指标,支持动态调整参数以评估多种压力情景。

Comments 23 pages, 10 figures

详情
AI中文摘要

本文提出了一种集成且可扩展的贷款组合压力测试方法,该方法包括贷款生成组件和信用风险组件。在该方法中,我们使用现实的贷款参数和分布假设模拟完整的贷款组合。随后,我们在多状态概率框架内生成这些贷款的不确定现金流历史。我们通过基于模拟的研究来说明我们的方法,尽管该方法可以拟合真实世界的数据。这种基于模拟的方法非常适合压力测试,因为它允许评估一系列条件。根据这些完成的贷款,我们计算组合层面的信用风险指标,例如违约率和损失率。通过在更广泛的蒙特卡洛设置中相应改变贷款参数来引入压力情景,从而产生一系列组合。经典的压力测试方法通常不集成贷款生成或嵌入风险指标之间的相关结构。在我们的方法中,我们将风险指标的预测与收据生成相结合。给定数据,我们可扩展方法中的贷款参数可以使用任何适用的技术动态建模为输入变量的函数。总体而言,我们的方法可以生成更动态且灵活调整的预测,这可以增强任何银行内的压力测试实践。

英文摘要

An integrated and extendable approach for stress-testing loan portfolios is presented, which includes both a loan production component and a credit risk component. In this approach, we simulate a completed portfolio using realistic loan parameters and distributional assumptions. Thereafter, we generate the uncertain cash flow history of these loans within a multistate probabilistic framework. We illustrate our approach using a simulation-based study, though the approach can be fit to real-world data. Such a simulation-based approach is ideal for stress-testing since it allows for evaluating a range of conditions. From these completed loans, we compute portfolio-level credit risk metrics, e.g., default and loss rates. Stress scenarios are introduced by varying the loan parameters accordingly within a broader Monte Carlo setup, thereby resulting in a range of portfolios. A classical approach to stress-testing does not typically integrate loan production or embed the correlation structure amongst risk metrics. In our approach, we integrate the forecasting of risk metrics with receipt-generation. Given data, the loan parameters within our extendable approach can be dynamically modelled as functions of input variables using any applicable technique. Overall, our approach can render predictions that are more dynamic and flexibly tuned, which can enhance stress-testing practices within any bank.

2606.18778 2026-06-18 cs.LG stat.ML 新提交

Online Distributional Prediction via Latent Cluster Geometry Under Drift and Corruption

漂移与污染下基于潜在簇几何的在线分布预测

Navyansh Mahla, Prateek Chanda, Ganesh Ramakrishnan

发表机构 * Indian Institute of Technology, Bombay(印度理工学院,孟买)

AI总结 针对非平稳流中的在线分布预测问题,提出一种基于潜在簇几何的吉布斯准后验方法,通过可逆跳跃MCMC采样变维后验,并引入重启变体应对漂移,在亚线性污染预算和运输代价下实现亚线性Wasserstein遗憾。

详情
AI中文摘要

非平稳流中的在线学习通常被表述为跟踪点估计,但许多应用需要预测完整的数据生成分布。我们研究漂移和对抗性污染下的在线分布预测。我们的方法通过潜在簇几何表示每个候选律:一个可变大小的中心配置,组织概率质量并诱导预测分布。这些配置上的吉布斯准后验通过后验平均产生在线预测器,所得变维后验可通过可逆跳跃MCMC采样。因此,该方法避免了指定参数化流律,同时保留了用于不确定性、正则化和比较的结构化潜在空间。我们以相对于时变真实分布律的累积Wasserstein-1遗憾来评估性能。分析分离了两种效应:污染会扰动基于损失的后验更新,而漂移使长时域后验记忆过时。我们通过一个重启变体来解决后者,该变体在时间上局部化相同的准贝叶斯更新。所得的高概率界分解为PAC-Bayesian复杂度项、对污染敏感的后验扰动项以及由\(A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)\)驱动的动态最优传输项。在有界支撑、稳定潜在几何、预测映射正则性、预言可实现性、局部化重启窗口、亚线性传输作用和亚线性污染预算下,重启预测器实现了亚线性累积Wasserstein遗憾。这些保证不需要对流、漂移机制或污染过程进行参数化建模。

英文摘要

Online learning in non-stationary streams is often formulated as tracking a point estimate, but many applications require predicting the full data-generating distribution. We study online distributional prediction under drift and adversarial corruption. Our approach represents each candidate law through a latent cluster geometry: a variable-size configuration of centers that organizes probability mass and induces a predictive distribution. A Gibbs quasi-posterior over these configurations yields an online predictor by posterior averaging, and the resulting variable-dimensional posterior can be sampled with reversible-jump MCMC. The method therefore avoids specifying a parametric streaming law while retaining a structured latent space for uncertainty, regularization, and comparison. We evaluate performance by cumulative Wasserstein-1 regret against the time-varying true law. The analysis separates two effects: corruption perturbs the loss-based posterior update, whereas drift makes long-horizon posterior memory stale. We address the latter with a restarted variant that temporally localizes the same quasi-Bayesian update. The resulting high-probability bounds decompose into a PAC-Bayesian complexity term, a corruption-sensitive posterior perturbation term, and a dynamic optimal-transport term driven by \(A_T^{\mathrm{OT}}=\sum_{t=2}^T W_2^2(p_{t-1}^*,p_t^*)\). Under bounded support, stable latent geometry, predictive-map regularity, oracle realizability, localized restart windows, sublinear transport action, and sublinear corruption budget, the restarted predictor achieves sublinear cumulative Wasserstein regret. These guarantees require no parametric model for the stream, drift mechanism, or corruption process.

2606.18463 2026-06-18 cs.DC cs.LG cs.NA math.NA stat.ML 新提交

Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs

面向GPU上广义线性模型的混合精度通信避免SGD

Aditya Devarakonda, Irene Simó Muñoz, Giulia Guidi

发表机构 * Department of Computer Science, Wake Forest University(维克森林大学计算机科学系)；Department of Computer Science, Cornell University(康奈尔大学计算机科学系)

AI总结 提出混合精度通信避免SGD(CA-SGD),通过分析有限精度误差将精度选择分解为九个独立部分,在NVIDIA GPU上实现5.1-6.8倍加速,且损失与FP32 SGD匹配。

详情
AI中文摘要

分布式随机梯度下降(SGD)受限于通信而非计算,因为每次迭代都需要跨进程进行AllReduce。通信避免SGD(CA-SGD)通过将$s$次连续的AllReduce替换为单个$sb\times sb$ Gram矩阵的AllReduce,将通信开销分摊到$s$次迭代中,以更多的计算和带宽换取更少的同步点。现代GPU配备矩阵硬件和低精度格式,通过加速Gram GEMM和缩减BF16流量来抵消这一开销。我们研究了NVIDIA GPU上针对广义线性模型的混合精度CA-SGD。我们的有限精度分析将一次CA-SGD外迭代的局部舍入误差分解为九个独立的精度选择,仅通过低精度单元舍入误差依赖于硬件,因此所得方案原则上可跨GPU代际迁移。该方案将输入矩阵和边缘向量以低精度存储,从低精度输入计算Gram矩阵并采用高精度累加,以高精度通信该矩阵,并以高精度执行内部递推和权重更新。在NERSC Perlmutter A100 GPU上,混合精度CA-SGD在逻辑回归、线性回归和泊松问题上的损失与FP32 SGD相差在0.5%以内,并在epsilon、SUSY、HIGGS、synth和Poisson-synth数据集上达到5.1-6.8倍于FP32 SGD的加速。我们的软件可在 https://doi.org/10.5281/zenodo.20448273 获取。

英文摘要

Distributed stochastic gradient descent (SGD) is limited by communication rather than computation, since each iteration requires an AllReduce across processes. Communication-avoiding SGD (CA-SGD) amortizes communication over $s$ iterations by replacing $s$ consecutive AllReduces with a single AllReduce of an $sb\times sb$ Gram matrix, trading more computation and bandwidth for fewer synchronization points. Modern GPUs with matrix hardware and reduced-precision formats offset this by accelerating the Gram GEMM and shrinking BF16 traffic. We study mixed-precision CA-SGD for generalized linear models on NVIDIA GPUs. Our finite-precision analysis decomposes the local rounding error of one CA-SGD outer iteration into nine independent precision choices, depending on the hardware only through its low-precision unit roundoffs, so the resulting recipes transfer in principle across GPU generations. The recipe stores the input matrix and margin vector in low precision, computes the Gram matrix from low-precision inputs with high-precision accumulation, communicates it in high precision, and performs the inner recurrence and weight updates in high precision. On NERSC Perlmutter A100 GPUs, mixed-precision CA-SGD matches FP32 SGD loss within $0.5\%$ on logistic, linear, and Poisson problems and reaches $5.1$--$6.8\times$ speedup over FP32 SGD on epsilon, SUSY, HIGGS, synth, and Poisson-synth. Our software is available at https://doi.org/10.5281/zenodo.20448273
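
The Gram-matrix reformulation can be checked on a toy least-squares problem. The sketch below is a single-process, full-precision illustration under stated assumptions: squared-error loss, equal batch sizes, and hypothetical function names. It does not model the mixed-precision recipe or the actual AllReduce. It runs $s$ mini-batch SGD steps in the usual way, then reproduces them from the $sb\times sb$ Gram matrix of the stacked batches, so the weights are only touched once at the end.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgd(batches, w, lr):
    """Plain mini-batch SGD for least squares; batches is a list of (X_j, y_j)."""
    for X, y in batches:
        b = len(X)
        r = [dot(row, w) - yi for row, yi in zip(X, y)]  # residuals at current w
        grad = [sum(r[i] * X[i][k] for i in range(b)) / b for k in range(len(w))]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def ca_sgd(batches, w, lr):
    """The same s updates, recovered from the Gram matrix of the stacked batches."""
    rows = [row for X, _ in batches for row in X]
    ys = [yi for _, y in batches for yi in y]
    b, n = len(batches[0][0]), len(rows)
    G = [[dot(rows[i], rows[j]) for j in range(n)] for i in range(n)]  # sb x sb
    v = [dot(row, w) for row in rows]  # predictions at the starting weights
    r = [0.0] * n
    for j in range(len(batches)):  # inner recurrence: no access to w needed
        for i in range(j * b, (j + 1) * b):
            corr = sum(G[i][m] * r[m] for m in range(j * b))
            r[i] = v[i] - (lr / b) * corr - ys[i]
    return [wk - (lr / b) * sum(r[i] * rows[i][k] for i in range(n)) for k, wk in enumerate(w)]


rng = random.Random(1)
d, b, s = 4, 3, 5
batches = [
    ([[rng.gauss(0, 1) for _ in range(d)] for _ in range(b)], [rng.gauss(0, 1) for _ in range(b)])
    for _ in range(s)
]
w_plain = sgd(batches, [0.0] * d, lr=0.05)
w_ca = ca_sgd(batches, [0.0] * d, lr=0.05)
```

The two weight vectors agree up to rounding. In a distributed setting, the point of the reformulation is that the $s$ per-iteration reductions are replaced by one reduction of the Gram matrix, at the price of the extra $sb\times sb$ computation.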

2606.15414 2026-06-18 cond-mat.dis-nn cond-mat.stat-mech stat.CO 新提交

Cluster-based Message-Passing (CluMP) Optimization for Complex QUBO Problems

基于聚类的消息传递(CluMP)优化复杂QUBO问题

Paolo Rissone, Stefan Boettcher, Alfonso Amendola, Simone Sala, Federico Ricci-Tersenghi

AI总结 提出CluMP算法,通过信念传播控制聚类内阻挫,实现自旋集体更新,在稀疏图上以更少操作达到更低能量,优于局部更新启发式方法。

Comments Main: 9 pages, 4 figures, 1 table. End Matter: 2 pages and 1 figure. Supp. Info: 5 pages, 3 figures

详情
AI中文摘要

二次无约束布尔优化(QUBO)问题在工业应用和科学研究中广泛存在。QUBO问题对应于定义在通常稀疏且异质图上的伊辛自旋系统的优化。当QUBO问题包含冲突请求时,相应的伊辛系统存在阻挫,产生复杂的能量景观,难以探索和优化。尽管有广泛的算法和硬件发展,在这些系统中找到低能构型仍然具有挑战性(例如,局部更新启发式方法通常陷入亚稳态),特别是当(可能存在阻挫的)相互作用产生扩展的相关域时。我们引入CluMP(基于聚类的消息传递),一种利用信念传播(BP)信息对自旋连接聚类进行集体更新的算法。通过控制聚类内的阻挫程度,CluMP使得BP在大子图上收敛,并提出了在单次移动中涉及多达数百个自旋的非局域重排。我们在几种图拓扑(包括随机正则图和二维、三维晶格正则图)上定义的自旋玻璃模型上,将CluMP与最先进的局部更新启发式方法进行基准测试。聚类移动始终如一地绕过局部陷阱,并以比单自旋动力学更少的有效操作达到更低的能量。这些结果表明,容忍阻挫的聚类更新可以在稀疏图上高效实现。CluMP框架为大规模组合优化和推理问题提供了一种可扩展的策略,其中利用中长程相关性是导航复杂能量景观的关键。

英文摘要

Quadratic Unconstrained Boolean Optimization (QUBO) problems are widespread in both industrial applications and scientific studies. A QUBO problem corresponds to the optimization of a system of Ising spins defined on a generally sparse and heterogeneous graph. When the QUBO problem contains conflicting requests, the corresponding Ising system is frustrated, generating a complex energy landscape, which is hard to explore and optimize. Despite extensive algorithmic and hardware developments, finding low-energy configurations in these systems remains challenging (e.g., local-update heuristics typically become trapped in metastable states), especially when the (possibly frustrated) interactions generate extended correlated domains. We introduce CluMP (Cluster-based Message-Passing), an algorithm that performs collective updates on connected clusters of spins using information from Belief Propagation (BP). By controlling the amount of frustration within clusters, CluMP enables BP convergence on large subgraphs and proposes nonlocal rearrangements involving up to hundreds of spins in a single move. We benchmark CluMP against state-of-the-art local-update heuristics on spin-glass models defined on several graph topologies, including random regular graphs and lattice regular graphs in two and three dimensions. Cluster moves consistently bypass local trapping and reach lower energies with fewer effective operations than single-spin dynamics. These results demonstrate that frustration-tolerant cluster updates can be implemented efficiently on sparse graphs. The CluMP framework provides a scalable strategy for large-scale combinatorial optimization and inference problems, where exploiting medium- and long-range correlations is key to navigating complex energy landscapes.

2602.23006 2026-06-18 stat.ML cs.LG 版本更新

Regular Fourier Features for Nonstationary Gaussian Processes

非平稳高斯过程的规则傅里叶特征

Arsalan Jawaid, Abdullah Karatas, Jörg Seewig

发表机构 * Institute of Measurement and Sensor Technology, University of Kaiserslautern-Landau(凯泽斯劳滕-兰道大学测量与传感器技术研究所)；Independent Researcher(独立研究者)

AI总结 提出规则傅里叶特征方法,通过直接离散化谱表示避免概率假设,实现非平稳高斯过程的低秩近似,并扩展至核学习。

Comments 11 pages (9 main + 2 suppl.), 5 figures, 2 tables

详情
AI中文摘要

模拟高斯过程需要从高维高斯分布中采样,其计算复杂度随采样点数量呈三次方增长。谱方法通过利用傅里叶表示并将谱密度视为适用于蒙特卡洛近似的概率分布来应对这一挑战。尽管这种概率解释对平稳过程有效,但对于非平稳情况则过于严格,因为非平稳过程的谱密度通常不是概率测度。我们针对可调和过程提出规则傅里叶特征以避免这一限制。我们的方法直接离散化谱表示,保留谱权重之间的相关结构,无需概率假设。在有限谱支撑假设下,这产生了一个高效的低秩近似,该近似一致且半正定。当谱密度未知时,该框架自然地扩展到基于数据的核学习。我们在局部平稳和可调和混合核(后者具有复值谱密度)上演示了该方法,并将核学习扩展应用于真实和合成数据。

英文摘要

Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation and treating the spectral density as a probability distribution suitable for Monte Carlo approximation. Although this probabilistic interpretation is valid for stationary processes, it is overly restrictive for the nonstationary case, where spectral densities are generally not probability measures. We propose regular Fourier features for harmonizable processes to avoid this limitation. Our method discretizes the spectral representation directly, preserving the correlation structure among spectral weights without requiring probability assumptions. Under a finite-spectral-support assumption, this yields an efficient low-rank approximation that is consistent and positive semi-definite by construction. When the spectral density is unknown, the framework extends naturally to kernel learning from data. We demonstrate the method on locally stationary and harmonizable mixture kernels, the latter with a complex-valued spectral density, and apply the kernel-learning extension to real and synthetic data.

2511.00366 2026-06-18 stat.ML cs.CE cs.LG 版本更新

A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications

面向数字孪生应用中导数信息高斯过程代理的流式稀疏Cholesky方法

Shridhar Vashishtha, Krishna Prasath Logakannan, Jacob Hochhalter, Shandian Zhe, Robert M. Kirby

发表机构 * Department of Mechanical Engineering, University of Utah, Salt Lake City, UT 84112, USA；Kahlert School of Computing, University of Utah, Salt Lake City, UT 84112, USA；Scientific Computing & Imaging Institute, University of Utah, Salt Lake City, UT 84112, USA

AI总结 提出一种流式稀疏Cholesky方法,通过动态更新和导数信息增强高斯过程代理,降低协方差矩阵维度,实现数字孪生中飞机结构性能的实时预测。

详情
AI中文摘要

数字孪生被开发用于模拟特定物理资产(或孪生体)的行为,它们可以由高保真基于物理的模型或代理组成。高精度代理通常优于多物理场模型,因为它们能够实时预测物理孪生体的未来状态。为了适应特定的物理孪生体,必须使用来自该物理孪生体的在役数据更新数字孪生模型。在本文中,我们结合并扩展了几项先前与代理相关的进展,旨在展示一个端到端的数字孪生(DT)解决方案,用于预测飞机结构(物理资产)的性能。为此,我们将高斯过程(GP)模型扩展到包含导数数据,以提高精度,并通过动态更新来吸收在役期间的物理孪生体数据。然而,包含导数数据会带来协方差矩阵维度增加的过高成本。我们通过改进的动态稀疏Cholesky线性系统求解器规避了这个问题。数值实验表明,导数增强的稀疏Cholesky GP方法在动态数据添加时产生了改进的模型预测精度。最后,我们在一个数字孪生框架内演示了所开发的算法,用于模拟航空航天飞行器中的疲劳裂纹扩展,从而通过我们组装的工程系统展示了数字孪生技术如何在实践中结合。

英文摘要

Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogate is often preferred over multi-physics models as they enable forecasting the physical twin future state in real-time. To adapt to a specific physical twin, the digital twin model must be updated using in-service data from that physical twin. In this paper, we combine and extend several previous surrogate-related advancements with the goal of demonstrating an end-to-end digital twin (DT) solution for predicting performance of an aircraft structure (the physical asset). To this end, we extend Gaussian process (GP) models to include derivative data, for improved accuracy, with dynamic updating to ingest physical twin data during service. Including derivative data, however, comes at a prohibitive cost of increased covariance matrix dimension. We circumvent this issue through our modified dynamic sparse Cholesky linear system solver. Numerical experiments demonstrate that the prediction accuracy of the derivative-enhanced sparse Cholesky GP method produces improved models upon dynamic data additions. Lastly, we demonstrate the developed algorithm within a DT framework to model fatigue crack growth in an aerospace vehicle, thereby exhibiting through our assembled engineered system how digital twin technologies can be combined in practice.
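
The idea of updating a Cholesky factor as data stream in can be shown in its simplest dense form. The sketch below is not the authors' modified sparse solver and does not include derivative observations. It only illustrates, under those simplifying assumptions, how the factor of a covariance matrix $K$ extends to the factor of the bordered matrix $\begin{pmatrix}K & k\\ k^\top & \kappa\end{pmatrix}$ when one new point arrives, without refactorizing from scratch. The function names and the RBF kernel with jitter are hypothetical choices.

```python
import math


def cholesky(A):
    """Dense lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def append_point(L, k_new, kappa):
    """Extend the factor L of K to the factor of [[K, k_new], [k_new^T, kappa]]."""
    l = []
    for i in range(len(L)):  # forward substitution: solve L l = k_new
        l.append((k_new[i] - sum(L[i][j] * l[j] for j in range(i))) / L[i][i])
    d = math.sqrt(kappa - sum(x * x for x in l))
    return [row + [0.0] for row in L] + [l + [d]]


def rbf(a, b):
    return math.exp(-0.5 * (a - b) ** 2)


xs = [0.0, 0.5, 1.3, 2.0]  # the last point plays the role of newly arrived data
K = [[rbf(a, b) + (1e-6 if i == j else 0.0) for j, b in enumerate(xs)] for i, a in enumerate(xs)]
L_old = cholesky([row[:3] for row in K[:3]])
L_streamed = append_point(L_old, K[3][:3], K[3][3])
L_full = cholesky(K)
```

The update costs one triangular solve per new point instead of a full refactorization, which is the property a streaming surrogate relies on. Exploiting sparsity, as the paper does, would reduce the cost further.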

2412.08876 2026-06-18 stat.CO astro-ph.CO updated version

Practical and Scalable Hamiltonian Monte Carlo Without the Metropolis Test

Jakob Robnik, Reuben Cohn-Gordon, Uroš Seljak

AI summary Proposes an automatic step-size tuning scheme that controls asymptotic bias in unadjusted (Metropolis-free) HMC and underdamped Langevin Monte Carlo; experiments on Bayesian inference and physics models with over one million parameters show that the tuned unadjusted methods significantly outperform their adjusted counterparts.

Abstract

Hamiltonian Monte Carlo and underdamped Langevin Monte Carlo are leading methods for sampling from high-dimensional distributions with differentiable densities. Both rely on numerical integration, which introduces asymptotic bias in expectation estimates. This bias can be removed by adjusting the numerical integration with a Metropolis-Hastings (MH) step, at a cost of slower mixing and larger variance. Alternatively, we can trade bias for lower variance if we avoid the MH step and use an appropriate step size of integration. These unadjusted schemes have strong performance, especially in high-dimensional problems, but are rarely used due to the lack of automated step size selection. We propose an automatic tuning scheme that selects a step size to meet a user-specified asymptotic bias tolerance. The method is based on a relationship between energy error and bias which we establish. We rigorously analyze the method in the Gaussian setting and numerically extend the analysis to several non-Gaussian problems. Experiments on Bayesian inference and large-scale statistical physics models (with over one million parameters) show that, with our tuning, unadjusted methods consistently and significantly outperform adjusted counterparts.
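
Illustrative sketch (not from the paper): the abstract ties step-size selection to the energy error of the numerical integrator but does not state the relationship itself. The snippet below only shows the quantity involved, assuming a one-dimensional standard Gaussian target and a plain leapfrog integrator. The function names, the chain count, and the step sizes are hypothetical choices made for this example.

```python
import random


def leapfrog(x, p, step, n_steps):
    """Leapfrog integration for the assumed target U(x) = x^2 / 2 (standard Gaussian)."""
    p -= 0.5 * step * x              # initial half step in momentum (dU/dx = x)
    for i in range(n_steps):
        x += step * p                # full step in position
        if i < n_steps - 1:
            p -= step * x            # full step in momentum between position updates
    p -= 0.5 * step * x              # final half step in momentum
    return x, p


def energy(x, p):
    """Hamiltonian: potential plus kinetic energy."""
    return 0.5 * x * x + 0.5 * p * p


def mean_squared_energy_error(step, n_steps=10, n_chains=2000, seed=0):
    """Average squared energy change over trajectories started from the target."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_chains):
        x, p = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x_new, p_new = leapfrog(x, p, step, n_steps)
        total += (energy(x_new, p_new) - energy(x, p)) ** 2
    return total / n_chains


# Larger step sizes produce larger energy errors; without a Metropolis test
# this error is not corrected, so the step size has to be chosen with care.
for step in (0.1, 0.3, 0.5):
    print(step, mean_squared_energy_error(step))
```

A tuner in the spirit of the abstract would search for the largest step size whose energy error stays below a tolerance; how that tolerance maps to bias is the paper's contribution and is not reproduced here.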

2503.18351 2026-06-18 stat.ME updated version

Parametric inference for the discretely observed multivariate Hawkes process using particle Markov Chain Monte Carlo

Jason J. Lambe, Feng Chen, Tom Stindl, Tsz-Kit Jeffrey Kwan

AI summary For the discretely observed multivariate Hawkes process, proposes an unbiased likelihood estimate based on sequential Monte Carlo and combines it with a Metropolis-Hastings algorithm for posterior inference on the parameters; the method outperforms existing approaches on simulated data and is applied to real count data.

Comments 28 pages, 9 figures and 2 tables

Abstract

The multivariate Hawkes process (MHP) is a useful statistical model for analysing multidimensional event time sequences that exhibit self-excitation and cross-excitation. When the MHP is monitored discretely, only the total number of events for each dimension in disjoint time intervals is observed. The likelihood function relative to this data is intractable, so traditional inference techniques are not available. To address this, we design an unbiased estimate of the intractable likelihood function using sequential Monte Carlo (SMC) based on a representation of the unobserved event times as latent variables in a state-space model. The unbiasedness of the SMC estimate allows for its use in place of the true likelihood in a Metropolis-Hastings algorithm, enabling the construction of a Markov Chain Monte Carlo sample from the posterior distribution over the parameters of the MHP. Using simulated data, we assess the performance of our method and demonstrate that it outperforms existing approaches in terms of mean squared error and computational efficiency. Terrorist activity in Afghanistan and Pakistan from 2018 to 2021 is analysed based on daily count data to examine the dynamics of terrorism in the region.

1712.09566 2026-06-18 stat.CO stat.ME updated version

Mixture model fitting using conditional models and modal Gibbs sampling

Héctor Gómez-López, Virgilio Gómez-Rubio

AI summary Proposes a new algorithm based on modal Gibbs sampling that approximates the posterior of a mixture model by sampling only the auxiliary variables, using the integrated nested Laplace approximation to compute the modes of the conditional distributions of the density parameters; this reduces the computational burden while giving good estimates.

Abstract

Mixture models are a convenient way of modeling data using a convex combination of different parametric distributions. A new algorithm based on Gibbs sampling is used to approximate the posterior distribution of the auxiliary variables, which assign each observation to a group in the mixture, without sampling any other parameter in the model. In particular, the modes of an approximation to the full conditional distributions of the parameters of the densities in the mixture are computed using the Integrated Nested Laplace Approximation. These are plugged into the full conditional distribution of the auxiliary variables to draw samples. The posterior distributions of the remainder of the parameters in the mixture are obtained by averaging over their conditional posterior marginals on the auxiliary variables using Bayesian model averaging. This approximation, 'modal' Gibbs sampling, reduces the computational burden in the Gibbs sampling algorithm and provides very good estimates of the posterior distribution of the auxiliary variables. A simulation study supports the validity of 'modal' Gibbs sampling, and two examples on well-known datasets are discussed using mixtures of Gaussian and Poisson distributions, respectively.

7. Statistical Foundations of Machine Learning 30 papers

2606.19212 2026-06-18 stat.ML cs.LG new submission

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

Martin Anthony, Kaveh Salehzadeh Nobari

AI summary Proposes a continuous local model that quantifies semantic adversarial attackability through the largest generalised eigenvalue of a matrix pencil $(A,B)$, and gives a prediction-flip condition, attackability certificates, and a VC bound.

Abstract

Recent empirical work shows that semantically equivalent paraphrases can fool financial sentiment classifiers: although a paraphrase remains close to the original under a strong reference embedding, it may shift the target model's representation enough to change the predicted class. Existing robustness theory either assumes a single-model threat model or focuses mainly on empirical attack algorithms. We develop a continuous local model of semantic paraphrase perturbations that captures this two-model structure. We show that the worst-case local displacement of the target representation, subject to a proxy-model budget, is governed by the largest generalised eigenvalue of a matrix pencil $(A,B)$ constructed from the Jacobians of the two embedding maps. The resulting attackability index $\lambda^*(x)$ is intrinsic to the local paraphrase geometry and the chosen embedders, yields a closed-form prediction-flip condition for affine readouts, and supports conservative population and finite-sample attackability certificates. For uniform control over classes of affine readouts, we derive a distribution-free VC bound for binary attackability indicators and a scale-sensitive margin bound based on an attackability-adjusted margin that subtracts a local geometric penalty from the standard classifier margin. We also connect the continuous theory to discrete paraphrase search, identify an asymmetry between successful and unsuccessful finite searches, and give a covering condition under which the discrete and continuous settings agree. Finally, we propose an empirical verification framework using soft-token relaxations and generated paraphrase sets to assess the local eigenvalue geometry, prediction-flip condition, and finite-search approximation on a deployed financial-text classifier.
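
Illustrative sketch (not from the paper): the abstract says the pencil $(A,B)$ is built from the Jacobians of the two embedding maps but does not give the construction. The snippet assumes the natural choice $A = J_t^\top J_t$ and $B = J_p^\top J_p$ for a target Jacobian $J_t$ and a proxy Jacobian $J_p$, so that the largest generalised eigenvalue equals the maximum of $\|J_t v\|^2 / \|J_p v\|^2$ over directions $v$. The 2x2 matrices and all function names are hypothetical.

```python
import math


def gram(J):
    """Return J^T J for a matrix given as a list of rows."""
    cols = len(J[0])
    return [[sum(row[i] * row[j] for row in J) for j in range(cols)]
            for i in range(cols)]


def largest_generalized_eigenvalue_2x2(A, B):
    """Largest root of det(A - lam * B) = 0 for symmetric 2x2 A and positive definite B."""
    # Expanding the determinant gives qa * lam^2 + qb * lam + qc = 0.
    qa = B[0][0] * B[1][1] - B[0][1] ** 2
    qb = -(A[0][0] * B[1][1] + A[1][1] * B[0][0] - 2.0 * A[0][1] * B[0][1])
    qc = A[0][0] * A[1][1] - A[0][1] ** 2
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)
    return (-qb + math.sqrt(disc)) / (2.0 * qa)


def rayleigh_ratio(A, B, v):
    """Quotient (v^T A v) / (v^T B v) for a 2-vector v."""
    quad = lambda M: sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))
    return quad(A) / quad(B)


# Hypothetical Jacobians: the target map stretches the first direction,
# while the proxy map is most sensitive to the second direction.
J_target = [[3.0, 0.0], [0.0, 1.0]]
J_proxy = [[1.0, 0.0], [0.0, 2.0]]
A, B = gram(J_target), gram(J_proxy)
lam = largest_generalized_eigenvalue_2x2(A, B)
print(lam)
```

Under this assumed construction, a perturbation with proxy-space size at most $\epsilon$ can move the target representation by at most $\epsilon\sqrt{\lambda}$, which is the sense in which a large eigenvalue signals an attackable input.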

2606.19057 2026-06-18 stat.ML cs.LG stat.CO stat.ME new submission

Quantifying and Auditing LLM Evaluation via Positive-Unlabeled Learning

Zilong Zhang, Yi-Ting Hung, Lei Ding, Chi-Kuang Yeh

AI summary To address systematic biases of LLM judges such as verbosity bias, proposes a geometric auditing framework based on partial optimal transport that uses a small set of human-verified positives to correct the bias and improves alignment with human preferences without retraining.

Abstract

Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive-unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-judge pipelines.

2606.18993 2026-06-18 stat.ML cs.LG stat.ME new submission

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

Zheng He, Danica J. Sutherland

AI summary Proposes a sequential conditional independence test that is more robust to estimation error, combining an adaptively optimized kernel conditional independence statistic with normalization and truncate-and-shift calibration; it reduces Type I error inflation while retaining high power on synthetic and real data.

Comments Published at ICML 2026: https://openreview.net/forum?id=vUMdIyTs9c

Abstract

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X" paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.
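
Illustrative sketch (not the authors' code; their implementation is at the URL above): the general testing-by-betting idea can be shown with a wealth process. The snippet assumes each payoff lies in [-1, 1] and has conditional mean zero under the null, in which case the wealth is a nonnegative martingale and, by Ville's inequality, crosses 1/alpha with probability at most alpha. The fixed bet size and the function name are hypothetical simplifications; the paper adapts its statistic and calibration, which this sketch does not attempt.

```python
def sequential_betting_test(payoffs, bet=0.5, alpha=0.05):
    """Return the first step at which the null is rejected, or None if it never is.

    payoffs: iterable of values in [-1, 1], assumed mean-zero under the null.
    bet:     fixed fraction of wealth staked each round (0 <= bet < 1).
    """
    wealth = 1.0
    for step, payoff in enumerate(payoffs, start=1):
        if not -1.0 <= payoff <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * payoff     # multiplicative wealth update
        if wealth >= 1.0 / alpha:        # Ville threshold for level alpha
            return step
    return None


# Consistently positive payoffs (evidence against the null) trigger rejection,
# while payoffs that cancel out never let the wealth grow enough.
print(sequential_betting_test([0.5] * 30))
print(sequential_betting_test([0.5, -0.5] * 50))
```

Because the test can stop as soon as the wealth crosses the threshold, the sample size does not have to be fixed in advance, which is what makes the approach sequential.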

2606.18972 2026-06-18 stat.ML cs.LG new submission

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Connor Simpson, Ricardo J. G. B. Campello

AI summary Proposes the FOSC-X framework, which uses dynamic programming to extract the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, supports constraints on the number of clusters, and guarantees the optimal ranking in linear time.

Abstract

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.
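
Illustrative sketch (not the FOSC-X algorithm): the unconstrained, single-solution case described in the abstract can be pictured as a bottom-up recursion in which each node either keeps itself as a cluster or defers to the best selection among its children. The tree, the per-cluster quality scores, and the function name are hypothetical, and the sketch covers neither the top-M ranking nor the cluster-count constraints.

```python
def best_flat_clustering(children, score, node):
    """Return (total score, selected clusters) for the subtree rooted at node.

    children: dict mapping a node to the list of its child nodes.
    score:    dict mapping a node to its (assumed additive) quality score.
    """
    kids = children.get(node, [])
    if not kids:
        return score[node], [node]
    child_total, child_selection = 0.0, []
    for kid in kids:
        sub_total, sub_selection = best_flat_clustering(children, score, kid)
        child_total += sub_total
        child_selection.extend(sub_selection)
    # Keep the node itself only if it scores at least as well as its descendants.
    if score[node] >= child_total:
        return score[node], [node]
    return child_total, child_selection


# Hypothetical hierarchy: the best solution keeps A whole but splits B,
# so the cut through the tree is not horizontal.
children = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
score = {"root": 1.0, "A": 3.0, "A1": 1.0, "A2": 1.5, "B": 2.0, "B1": 2.0, "B2": 1.5}
total, selection = best_flat_clustering(children, score, "root")
print(total, selection)
```

The recursion works because locally optimal choices inside each subtree combine into a globally optimal one; as the abstract notes, that property no longer holds once the number of clusters is constrained.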

2606.18853 2026-06-18 stat.ML cs.LG new submission

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

Nicolas Mahler

AI summary Proposes the KPP kernel, which indexes the feature map by forest nodes weighted by a path metric and unifies prediction, exact additive attribution, a deterministic Lipschitz robust radius, and Rademacher risk bounds, giving a geometric framework for tree ensembles.

Comments 31 pages

Abstract

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.

2606.18531 2026-06-18 stat.ML cs.LG new submission

When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Xuanfei Ren, Tengyang Xie

Affiliations * University of Wisconsin-Madison

AI summary Studies the statistical theory of offline reinforcement learning when policy optimization uses only trajectory-level outcomes (such as cumulative returns or preferences), proposes the OPAC algorithm with a sample-complexity guarantee, and identifies statistical barriers under nonlinear aggregation objectives.

Comments 69 pages

Abstract

Offline reinforcement learning is typically analyzed under process-level reward supervision, yet many sequential decision datasets record only trajectory-level outcomes. We develop a statistical theory for offline policy optimization from such outcome-level supervision. We first study the canonical setting where the target remains the expected cumulative reward, but each offline trajectory provides only a scalar label whose conditional mean is the cumulative return. We propose OPAC, a pessimistic actor-critic algorithm that learns a latent reward model and optimizes a policy from trajectory-level labels. We prove a high-probability guarantee of order $\widetilde O(H^2\sqrt{C_{sa}(\pi^\star)/n})$ and a matching lower bound, characterizing the sharp statistical cost of replacing process-level rewards with one trajectory-level label. We then extend the principle to preference-based feedback, preserving the leading horizon and concentrability dependence up to preference-model constants. Finally, we study generalized outcome-based offline RL, where both the supervision and the objective are trajectory-level quantities induced by a nonlinear aggregation of latent per-step rewards. This problem is not learnable in general: for all-success objectives, any offline learner may require $\Omega(2^H)$ trajectories even with deterministic transitions and constant concentrability. We then identify a tractable regime through two structural coefficients, $\kappa_\mu(\sigma)$ and $\chi_\mu(\sigma)$, capturing information loss in outcome aggregation and generalized Bellman updates, under which generalized OPAC achieves polynomial sample complexity. Together, our results delineate when outcome-level supervision enables sample-efficient offline control and when missing process-level rewards create fundamental statistical barriers.

2606.18527 2026-06-18 stat.ML cs.LG new submission

Toward Simultaneously Optimal Regret in U-Calibration

Rafael Frongillo, Haipeng Luo, Nishant A. Mehta, Jon Schneider

Affiliations * University of Colorado Boulder; University of Southern California; Google Research

AI summary Proposes an FTPL variant based on self-concordant noise that achieves optimal $\tilde O(\sqrt{T})$ regret for all bounded proper losses and logarithmic regret for smooth losses.

Comments 30 pages; to appear at COLT 2026

Abstract

U-calibration studies online forecasting algorithms whose predictions can be consumed by any unknown downstream agent, guaranteeing sublinear regret simultaneously for all proper loss functions. Existing U-calibration algorithms achieve worst-case optimal $O(\sqrt{T})$ regret for every bounded proper loss, but they fail to adapt to easier losses: as we show, even for smooth losses such as squared loss, they incur $\Omega(\sqrt{T})$ regret instead of the optimal $O(\log T)$ regret. In this work, we show that this limitation is not inherent. Specifically, we design a single forecast algorithm that simultaneously achieves $\tilde O(\sqrt{T})$ regret for every bounded proper loss and $O(\log T)$ regret for every bounded smooth proper loss. More generally, our algorithm also attains logarithmic regret for losses that are smooth relative to the log-barrier, which include several non-Lipschitz examples. Our approach is based on a novel variant of Follow-the-Perturbed-Leader (FTPL) in which perturbations are applied directly in the prediction space using self-concordant noise. The resulting analysis also departs substantially from prior FTPL analyses due to the complex nature of this noise and may be of independent interest.

2606.18520 2026-06-18 stat.ML cs.CG cs.CL cs.DS cs.IR cs.LG new submission

Compact Geometric Representations of Hierarchies

Prashant Gokhale, Piotr Indyk, Yuhao Liu, Sandeep Silwal, Tony Chang Wang, Haike Xu

Affiliations * UW-Madison; MIT

AI summary Studies how to represent ancestor-descendant relationships in directed acyclic graphs with low-dimensional geometric embeddings, proving upper and lower bounds on the dimension in terms of structural parameters such as treewidth and demonstrating compactness on real-world datasets.

Comments Published at the 39th Annual Conference on Learning Theory (COLT) 2026. 22 pages

Abstract

Computing geometric representations of data is a cornerstone of modern machine learning, typically achieved by training dual encoders which map queries and documents into a shared embedding space. Recent work of You et al. [NeurIPS '25] has extended this approach to hierarchical retrieval, where relevance is determined by the ancestor-descendant relationships in a Directed Acyclic Graph (DAG). While previous work has shown that valid embeddings exist when the number of descendants is small, these bounds degrade significantly for deep hierarchies, requiring dimensions as large as the total number of nodes. In this paper, we investigate compact reachability embeddings for more general graph classes and provide theoretical guarantees for representing hierarchies using embeddings whose dimension depends on structural graph parameters. We prove that for any directed tree, there exists a reachability embedding in constant dimension 3, independent of the tree's size or depth. We generalize this result to graphs characterized by treewidth $t$, constructing embeddings of dimension $O(t \log n)$, where $n$ is the number of nodes. Complementing these upper bounds, we provide matching or near-matching lower bounds, showing that dimension $\Omega(n)$ is necessary for general DAGs and $\Omega(t/\log(n/t))$ is required for graphs of treewidth $t$. We also obtain upper and lower bounds parameterized by the number of cross-edges in the DAG. We additionally show that our embeddings can be constructed on real-world datasets, and that they give much smaller dimensions in high recall regimes compared to prior embeddings with theoretical guarantees.
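
Illustrative sketch (a classical related technique, not the paper's embedding): the abstract does not describe how the dimension-3 construction works. As intuition for why trees admit constant-size ancestry labels at all, the snippet uses the textbook depth-first interval labelling, where each node stores an entry and an exit time and ancestry becomes interval containment. The example tree and the function names are hypothetical.

```python
def interval_labels(children, root):
    """Assign (entry, exit) depth-first times to every node of a rooted tree."""
    labels = {}
    clock = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            labels[node] = (labels[node][0], clock)   # record exit time
        else:
            labels[node] = (clock, None)              # record entry time
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
        clock += 1
    return labels


def is_ancestor(labels, u, v):
    """True if u is an ancestor of v (or u == v): u's interval contains v's."""
    return labels[u][0] <= labels[v][0] and labels[v][1] <= labels[u][1]


# Hypothetical tree: a has children b and c, and b has child d.
tree = {"a": ["b", "c"], "b": ["d"]}
labels = interval_labels(tree, "a")
print(labels)
print(is_ancestor(labels, "a", "d"), is_ancestor(labels, "c", "d"))
```

Two numbers per node suffice here regardless of the tree's size or depth, which mirrors the size-independence claimed for trees in the abstract; the paper's embedding answers reachability under its own geometric criterion.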

2606.19105 2026-06-18 cs.LG stat.ML new submission

Smoothness-Based Derandomization of PAC-Bayes Bounds

Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

Affiliations * Department of Computer Science and Software Engineering, Université Laval

AI summary Exploits the smoothness of the loss and the predictor to derandomize the Gibbs predictor into the deterministic predictor at the posterior mean, controls the generalization bound through the Rademacher complexity of the Jensen gap class, and derives a regularizer involving parameter Jacobians and Hessians.

Abstract

We study PAC-Bayes derandomization for smooth loss functions. Our goal is to obtain generalization bounds that hold with high probability for deterministic predictors by exploiting smoothness properties of both the loss and the predictor class. We show that passing from the Gibbs predictor to the deterministic predictor at the posterior mean has a precise cost, given by the generalization gap of the Jensen gap class. We control this class through its Rademacher complexity, leading to bounds for deterministic predictors that involve flatness quantities expressed in terms of parameter Jacobians and Hessians of the score map. The framework applies to both bounded and unbounded smooth loss functions, and we specialize the results to linear predictors and smooth neural networks. Finally, the Jacobian and Hessian quantities appearing in the theory motivate a practical regularizer. For BatchNorm networks, we compute this regularizer with respect to effective BatchNorm weights obtained by folding the BatchNorm transformation into the adjacent affine weights. Experiments on CIFAR-10 illustrate the behavior of this regularizer under different batch sizes.

2606.18867 2026-06-18 cs.LG cs.CY stat.ML new submission

Strategic Feature Selection

Jivat Neet Kaur, Pratik Patil, Divya Shanmugam, Emma Pierson, Michael I. Jordan, Nika Haghtalab, Meena Jagadeesan, Ahmed Alaa, Serena Wang

Affiliations * University of California, Berkeley; University of Texas, Austin; Cornell Tech; Stanford University; University of Pennsylvania; Harvard University; Inria, Paris

AI summary Studies strategic classification addressed through feature selection and ridge regularization, finds that excluding features based on manipulability alone is generally suboptimal, and proposes an algorithm that jointly chooses the feature set and the regularization level, illustrated on a healthcare payments benchmark.

Abstract

When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself to explicitly account for strategic interactions. In practice, however, decision makers are often constrained to adjusting coarser levers within existing prediction pipelines. For example, healthcare organizations often select which features to exclude based on perceived manipulability, while using standard regularization procedures to shrink the coefficients of retained features. In this work, we initiate a formal study of strategic classification through feature selection and its interaction with ridge regularization. Our main finding is that excluding individual features based on their manipulability alone is generally suboptimal. We provide a fine-grained characterization of the performance of a feature subset under optimal regularization, yielding new insights for policy design. Motivated by this characterization, we develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization. Through a real-world case study on a healthcare payments benchmark, we illustrate how our algorithm can guide the design of coarse policy levers in practice. Our results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.

2606.18538 2026-06-18 cs.LG stat.ML new submission

Effects of sparsity and superposition on loss in simple autoencoders

Mriganka Basu Roy Chowdhury, Eric McLaughlin Weiner

Affiliations * Department of Statistics, UC Berkeley; Department of Materials Science, UC Berkeley

AI summary Examines the claim that polysemanticity in neural networks arises from superposition, deriving upper and lower bounds on the L2 reconstruction loss of an autoencoder with sparse inputs that rigorously corroborate and extend the empirical results of Elhage et al.

Comments 16 pages, 3 figures

Abstract

One of the major difficulties in the mechanistic interpretability of neural networks is the occurrence of polysemanticity, which suggests that each neuron is typically responsible for multiple different tasks, impeding a clean interpretation of their function. The seminal paper of Elhage et al. (2022) argues that this occurs due to superposition, a phenomenon where the neural network represents distinct features as non-orthogonal directions in a lower-dimensional space, a strategy that allows much greater compression of the data without sacrificing fidelity due to the feature sparsity of input vectors. Elhage et al. (2022) empirically validate these hypotheses in a rather natural and simple autoencoder with sparse inputs. The contribution of the present work is to analyze the mathematical basis for the occurrence and optimality of superposition, while rigorously corroborating some of their findings. In particular, we provide upper and lower bounds for the L2 reconstruction loss, tight in the very sparse regime, for power activation functions. A short list of interesting open problems is also included at the end.

2606.18509 2026-06-18 cs.LG stat.ML new submission

Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation

Soheun Yi, Yizhou Lu, Chandler Squires, Pradeep Ravikumar

Affiliations * Department of Statistics and Data Science, Carnegie Mellon University; Machine Learning Department, Carnegie Mellon University

AI summary Proposes concept modulation models (CMMs), which unify the analysis of identifiability and extrapolation in conditional latent variable models through attribute potentials, lift transition-based identifiability to conditional settings, and yield algebraic extrapolation criteria.

Abstract

Reliable generalization in conditional latent variable models requires understanding both identifiability and extrapolation: how observed variation across attributes determines latent structure, and how that structure determines distributions at unseen attributes. However, existing identifiability and extrapolation guarantees are largely model-specific, with separate analyses in nonlinear ICA, causal representation learning, perturbation modeling, and related conditional latent variable models. We introduce concept modulation models (CMMs), an attribute-indexed class of conditional generative models with structure $A\to \Lambda\to C\to X$, where attributes select modulators, modulators induce latent concept laws, and concepts generate observed features. CMMs lift transition-based identifiability to conditional settings by showing that feature agreement on observed attributes induces a latent concept transition constrained by the CMM class. We express these constraints through attribute potentials, log-density ratios between attribute-conditioned concept laws, separating the generic lifting step from model-specific rigidity arguments. The same potentials control extrapolation: agreement at unseen attributes holds exactly when the transported attribute-potential identities extend to those attributes. This yields algebraic extrapolation criteria, identifies the common potential-based proof objects behind several existing identifiability and extrapolation results, and, when combined with the model-specific rigidity arguments in those works, recovers their stated conclusions.

2606.18503 2026-06-18 cs.LG stat.ML 新提交

Quantum Annealing Enhanced Reinforcement Learning for Accurate Remaining Useful Lifetime Prediction

量子退火增强强化学习用于精确剩余使用寿命预测

Manoranjan Gandhudi, Arunkumar V., G. R. Anil, Gangadharan G. R

发表机构 * Central University of Karnataka(卡纳塔克中央大学) University College of Engineering, Anna University(安娜大学工程学院) AIONOS India Pvt Ltd(AIONOS印度私人有限公司) National Institute of Technology Tiruchirappalli(蒂鲁吉拉帕利国立理工学院)

AI总结 提出量子退火增强Q学习框架,通过将Q值更新编码为QUBO问题并利用量子退火采样实现随机动作选择,解决高维非凸空间中的收敛问题,在C-MAPSS和工业数据集上显著优于基线方法。

Comments 29 pages, 6 figures, 12 tables

详情
AI中文摘要

剩余使用寿命(RUL)估计是预测性维护的核心,意外故障的成本可能远超资产本身。统计退化模型忽略了真实系统的强非线性,而数据驱动模型在高维非凸搜索空间中常收敛到次优解。我们提出量子退火增强Q学习(QAQL)框架,将量子退火的采样行为与Q学习的序列决策相结合。每个Q值更新被编码为一个小的二次无约束二元优化(QUBO)问题,其基态对应贪婪动作;退火器不是作为确定性优化器,而是在多次读取中返回一个近最优动作的分布,这种随机动作选择提供了探索,从而抑制了在非线性退化轨迹上的过早收敛。QUBO在D-Wave Advantage系统上通过小规模嵌入求解,退火器被嵌入强化学习循环中,而非训练后附加。我们在两个公开基准上验证了QAQL:NASA C-MAPSS涡扇发动机数据集和一个设备群预测性维护数据集。在多次独立运行和六个误差指标上平均,QAQL优于本研究考虑的经典和量子基线,具有统计显著性改进。结果表明,量子退火是工业预测性维护应用中强化学习循环内一个可用的(而非仅理论上的)优化器。

英文摘要

Remaining useful life (RUL) estimation is central to predictive maintenance, where an unplanned failure can cost far more than the asset itself. Statistical degradation models miss the strong nonlinearity of real systems, and data-driven models often converge to suboptimal solutions in high-dimensional, non-convex search spaces. We propose a Quantum Annealing enhanced Q-Learning (QAQL) framework that couples the sampling behaviour of quantum annealing with the sequential decision making of Q-learning. Each Q-value update is encoded as a small quadratic unconstrained binary optimization (QUBO) whose ground state is the greedy action; rather than acting as a deterministic optimizer, the annealer returns a distribution over near-optimal actions across many reads, and this stochastic action selection supplies the exploration that curbs premature convergence on nonlinear degradation trajectories. The QUBO is solved on the D-Wave Advantage system using minor embedding, with the annealer woven into the reinforcement-learning loop rather than bolted on after training. We validate QAQL on two public benchmarks: the NASA C-MAPSS turbofan engine datasets and a device-fleet predictive maintenance dataset. Averaged over many independent runs and across six error metrics, QAQL outperforms the classical and quantum baselines considered in this study, with statistically significant improvements. The results indicate that quantum annealing is a usable, not merely theoretical, optimizer inside a reinforcement-learning loop for industrial predictive-maintenance applications.
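
The idea of a small QUBO whose ground state is the greedy action, sampled many times to obtain a spread of near-optimal actions, can be sketched classically. The block below is only an illustration under stated assumptions: the one-hot encoding with a quadratic penalty, the names `build_action_qubo` and `sample_actions`, and the Boltzmann sampler standing in for annealer reads are all hypothetical and are not taken from the paper or from any D-Wave API.

```python
import itertools
import math
import random
from collections import Counter


def build_action_qubo(q_values, penalty):
    """Return QUBO coefficients whose minimum-energy state one-hot encodes argmax Q.

    Energy(x) = -sum_a q[a] * x[a] + penalty * (sum_a x[a] - 1)^2, constant dropped.
    Assumed encoding for illustration; penalty should exceed max |q|.
    """
    n = len(q_values)
    qubo = {}
    for a in range(n):
        # x_a^2 == x_a for binary variables, so linear terms live on the diagonal.
        qubo[(a, a)] = -q_values[a] - penalty
        for b in range(a + 1, n):
            qubo[(a, b)] = 2.0 * penalty
    return qubo


def energy(state, qubo):
    """Evaluate the QUBO energy of one binary state."""
    return sum(coeff * state[a] * state[b] for (a, b), coeff in qubo.items())


def enumerate_states(qubo, n):
    """Brute-force every binary state; only feasible for a handful of actions."""
    states = list(itertools.product((0, 1), repeat=n))
    return states, [energy(s, qubo) for s in states]


def sample_actions(q_values, penalty, temperature, num_reads, seed=0):
    """Classical stand-in for annealer reads: Boltzmann sampling over all states."""
    n = len(q_values)
    states, energies = enumerate_states(build_action_qubo(q_values, penalty), n)
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    rng = random.Random(seed)
    reads = rng.choices(states, weights=weights, k=num_reads)
    # A read is a valid action only if exactly one bit is set.
    return [s.index(1) if sum(s) == 1 else None for s in reads]


q_values = [0.2, 1.0, 0.7]
states, energies = enumerate_states(build_action_qubo(q_values, penalty=5.0), len(q_values))
ground_state = states[energies.index(min(energies))]
action_counts = Counter(
    sample_actions(q_values, penalty=5.0, temperature=0.3, num_reads=1000)
)
```

In this toy setting the ground state selects the action with the largest Q-value, while the sampled reads occasionally return the runner-up, which is the kind of built-in exploration the abstract describes.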

2606.18420 2026-06-18 cs.LG q-bio.QM stat.ML 新提交

Measurement noise limits the advantage of nonlinear models over linear models in biomedical prediction

测量噪声限制了非线性模型在生物医学预测中相对于线性模型的优势

Marc-Andre Schulz, Kerstin Ritter

发表机构 * Hertie Institute for AI in Brain Health, University of Tübingen(赫蒂人工智能脑健康研究所,图宾根大学) Tübingen AI Center, University of Tübingen(图宾根人工智能中心,图宾根大学) Department of Psychiatry and Neurosciences, Charité – Universitätsmedizin Berlin(精神病学与神经科学系,柏林夏里特医学院) Bernstein Center for Computational Neuroscience, Berlin(伯恩斯坦计算神经科学中心,柏林) German Center for Mental Health (DZPG), partner site Tübingen(德国心理健康中心(DZPG),图宾根合作站点)

AI总结 本文指出,在生物医学表格数据中,测量噪声会削弱非线性结构,导致非线性模型与线性模型性能相当,并提出了一个精确的超额风险恒等式,揭示了测量可靠性、样本量和特征表示三个条件必须同时满足才能体现非线性优势。

详情
AI中文摘要

在生物医学表格数据上,诸如深度网络、梯度提升树和核方法等灵活模型,在给定相同特征的情况下,反复被线性回归和逻辑回归匹配或击败。通常的反应是将其视为模型方面的不足,需要通过更多数据、更好的架构或调参来修复,假设非线性结构存在而模型未能捕捉到。我们认为,当限制因素是测量而非模型时(这在生物医学中经常发生),这些修复无法奏效。加性噪声模糊了群体最优预测器,并且由于模糊在去除函数的广泛形状之前先去除精细、快速变化的细节,它比线性结构更快地抹去非线性结构。一个k阶交互作用被特征可靠性的k次幂衰减,而线性部分只衰减一次。在生物医学测量典型的可靠性下,即使底层生物学是强非线性的,非线性优势也可能消失,并且噪声所移除的部分无法通过更大的队列或更灵活的模型恢复,只能通过更好的测量。非线性是隐藏的,而非缺失,线性模型与灵活模型之间的平局本身并不能对生物学做出定论。这些片段是经典的,来自测量误差统计、心理测量学和高斯分析,我们将它们组合成一个精确的超额风险恒等式。测量可靠性是与样本量和特征表示并列的三个条件之一,必须对齐才能使灵活模型发挥作用,而它们共同只留下一个狭窄的窗口,大多数生物医学任务落在此窗口之外。在140个英国生物银行任务中,灵活模型与线性模型之间的差距(如果存在)带有预测的噪声特征,并且这三个条件可以通过干预而非仅通过基准测试来分离。

英文摘要

On biomedical tabular data, flexible models such as deep networks, gradient-boosted trees, and kernel methods are repeatedly matched or beaten by linear and logistic regression given the same features. The usual reaction is to treat this as a model-side shortfall, to be fixed with more data, a better architecture, or tuning, on the assumption that the nonlinear structure is there and the model has failed to capture it. We argue that these fixes cannot help when the binding limit is the measurement rather than the model, as it frequently is in biomedicine. Additive noise blurs the population-optimal predictor, and because blurring removes a function's fine, rapidly varying detail before its broad shape, it erases nonlinear structure faster than linear structure. A degree-$k$ interaction is attenuated by the $k$-th power of feature reliability, while the linear part is attenuated only once. At the reliabilities typical of biomedical measurement, the nonlinear advantage can vanish even when the underlying biology is strongly nonlinear, and what the noise removes cannot be recovered by a larger cohort or a more flexible model, only by better measurement. The nonlinearity is hidden, not absent, and a tie between linear and flexible models is not by itself a verdict on the biology. These pieces are classical, drawn from measurement-error statistics, psychometrics, and Gaussian analysis, and we assemble them into an exact excess-risk identity. Measurement reliability is one of three conditions, alongside sample size and feature representation, that must align for a flexible model to help, and together they leave only a narrow window that most biomedical tasks fall outside. Across 140 UK Biobank tasks, the gap between flexible and linear models, where it exists, carries the predicted noise signature, and the three conditions can be separated by intervention but not by a benchmark alone.
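
The claim that a degree-$k$ interaction is attenuated by the $k$-th power of reliability can be checked numerically in a simple special case. The sketch below assumes independent standard Gaussian features, independent additive Gaussian noise, and a pure product interaction; under those assumptions the best predictor of the clean product is proportional to the noisy product, so the squared correlation between the two is the recoverable share of variance. The function names are hypothetical, and this is not the paper's excess-risk identity.

```python
import math
import random


def predicted_r2(reliability, degree):
    """Share of a degree-k product's variance still predictable from noisy features."""
    return reliability ** degree


def simulated_r2(reliability, degree, n_samples=200_000, seed=0):
    """Monte Carlo squared correlation between a clean and a noisy degree-k product."""
    rng = random.Random(seed)
    # Unit-variance signal plus independent noise, so that
    # Var(signal) / Var(observed) equals the requested reliability.
    noise_sd = math.sqrt((1.0 - reliability) / reliability)
    s_c = s_n = s_cc = s_nn = s_cn = 0.0
    for _ in range(n_samples):
        clean = 1.0
        noisy = 1.0
        for _ in range(degree):
            x = rng.gauss(0.0, 1.0)
            clean *= x
            noisy *= x + rng.gauss(0.0, noise_sd)
        s_c += clean
        s_n += noisy
        s_cc += clean * clean
        s_nn += noisy * noisy
        s_cn += clean * noisy
    n = float(n_samples)
    cov = s_cn / n - (s_c / n) * (s_n / n)
    var_c = s_cc / n - (s_c / n) ** 2
    var_n = s_nn / n - (s_n / n) ** 2
    return cov * cov / (var_c * var_n)
```

Calling `simulated_r2(0.7, 1)` and `simulated_r2(0.7, 2)` and comparing with `predicted_r2` shows the linear term losing one factor of reliability and the pairwise interaction losing two, up to Monte Carlo error.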

2606.19179 2026-06-18 cs.LG cs.AI math.OC stat.ML 新提交

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

随机动量方法的计算效率与串行运行时间权衡

Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade

发表机构 * Harvard University(哈佛大学) Kempner Institute at Harvard University(哈佛大学凯普纳研究所)

AI总结 研究随机动量方法(如重球法和加速SGD)在一致线性回归中的批次大小权衡,证明重球法不改善SGD的计算效率前沿但允许更大批次减少串行运行时间,而加速SGD的计算效率与串行运行时间权衡依赖于谱衰减。

详情
AI中文摘要

随机动量方法,如重球法(HB)、Nesterov动量以及加速SGD(ASGD)的变体[Kidambi等人,2018],在现代训练中被广泛使用,但其随机优势取决于两个不同的量:串行运行时间(达到目标精度所需的迭代次数)和计算效率(CE,总梯度查询或FLOP成本的倒数)。仅当收缩间隙随批次大小线性增长时,更大的批次才能在不损害CE的情况下减少串行运行时间。我们研究了具有高斯协变量的一致线性回归中的随机HB和ASGD,并证明了其批次大小权衡的有限维、离散时间下界。我们的第一个结果表明,对于任意谱,HB都不会改善SGD的CE前沿;相反,它在更大的批次大小窗口内保持SGD级别的CE,允许更大的批次减少串行运行时间,直到HB达到其确定性加速尺度。这个窗口可能比SGD临界批次大小大$\sqrt{\kappa}$倍。对于ASGD,情况更依赖于谱:对于快速衰减的幂律谱,ASGD改善了小批次下的CE(相对于HB/SGD),但随着批次大小增加,它牺牲了这种CE优势以换取改进的串行运行时间。合成线性回归实验验证了这些定性区域,包括慢衰减谱下ASGD和HB的近乎重叠,以及快速衰减谱下预测的CE-串行权衡。

英文摘要

Stochastic momentum methods such as heavy ball (HB), Nesterov momentum, and variants of Accelerated SGD (ASGD) [Kidambi et al., 2018] are widely used in modern training, but their stochastic benefits depend on two distinct quantities: serial runtime, the number of iterations needed to reach a target accuracy, and compute efficiency (CE), the inverse total gradient-query or FLOP cost. Larger batches reduce serial runtime without hurting CE only when the contraction gap grows linearly with batch size. We study stochastic HB and ASGD for consistent linear regression with Gaussian covariates and prove finite-dimensional, discrete-time lower bounds on their batch-size tradeoffs. Our first result shows that HB does not improve the CE frontier over SGD for arbitrary spectra; rather, it preserves SGD-level CE over a larger batch-size window, allowing larger batches to reduce serial runtime until HB reaches its deterministic accelerated scale. This window can be a factor $\sqrt{\kappa}$ larger than the SGD critical batch size. For ASGD, the picture is more spectrum-dependent: for rapidly decaying power-law spectra, ASGD improves small-batch CE over HB/SGD, but as batch size grows it trades this CE advantage for improved serial runtime. Synthetic linear-regression experiments verify these qualitative regimes, including near-overlap of ASGD and HB for slowly decaying spectra and the predicted CE--serial tradeoff for rapidly decaying spectra.
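
The distinction between serial runtime and compute efficiency can be made concrete with a toy calculation. The model below is an assumption made purely for illustration: the per-step contraction gap grows linearly with batch size up to a cap (`max_gap`), and the error shrinks by a factor of one minus the gap at every step. None of the names or constants come from the paper.

```python
import math


def iterations_to_target(batch_size, gap_per_sample, max_gap, target_ratio):
    """Serial runtime: steps needed to shrink the error by target_ratio."""
    # Toy assumption: the contraction gap is linear in batch size until it saturates.
    gap = min(gap_per_sample * batch_size, max_gap)
    return math.ceil(math.log(target_ratio) / math.log(1.0 - gap))


def gradient_queries(batch_size, gap_per_sample, max_gap, target_ratio):
    """Total gradient queries; compute efficiency is the inverse of this number."""
    steps = iterations_to_target(batch_size, gap_per_sample, max_gap, target_ratio)
    return batch_size * steps


settings = dict(gap_per_sample=1e-4, max_gap=1e-2, target_ratio=1e-3)
cost_small = gradient_queries(1, **settings)
cost_medium = gradient_queries(8, **settings)
cost_critical = gradient_queries(100, **settings)
cost_large = gradient_queries(400, **settings)
```

In this toy model the critical batch size is `max_gap / gap_per_sample = 100`: below it, larger batches cut the number of steps at almost no extra total cost, and above it the step count stops improving while the total cost grows in proportion to the batch size.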

2606.19147 2026-06-18 stat.ML cs.LG math.ST stat.TH 新提交

On Local Population-Risk Certificates

论局部总体风险证书

Mingzhi Song

发表机构 * Department of Mathematics, The University of Hong Kong(香港大学数学系)

AI总结 本文提出局部总体风险增量证书,用于在模型更新时提供风险控制,通过双边置信带判断更新是否接受。

Comments 35 pages, 6 figures

详情
AI中文摘要

本文为当前模型周围的总体风险增量开发了局部证书。对于局部候选集 \(\mathcal D\),该证书是 \(P({\ell_{\theta+v}-\ell_\theta})\) 在 \(v\in\mathcal D\) 上的双边置信带。作为应用,该置信带的上端点产生了一个风险控制的更新规则:仅当认证的上端点为非正时,更新被接受;否则保留当前模型。

英文摘要

This paper develops local certificates for population-risk increments around a current model. For a local candidate set \(\mathcal D\), the certificate is a two-sided confidence band for \(P({\ell_{\theta+v}-\ell_\theta})\) over \(v\in\mathcal D\). As an application, the upper endpoint of this band yields a risk-controlled update rule: an update is accepted only when its certified upper endpoint is nonpositive; otherwise the current model is retained.
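
The accept-or-retain rule can be illustrated for a single candidate update. The sketch below is deliberately simplified and rests on assumptions that are not from the paper: it uses a pointwise normal-approximation interval on held-out per-example loss differences rather than a band that is uniform over the candidate set \(\mathcal D\), and the names `increment_interval` and `accept_update` are hypothetical.

```python
import math
import statistics


def increment_interval(loss_current, loss_candidate, z=1.96):
    """Two-sided normal-approximation interval for the mean loss increment."""
    diffs = [cand - cur for cur, cand in zip(loss_current, loss_candidate)]
    mean = statistics.fmean(diffs)
    half_width = z * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean - half_width, mean + half_width


def accept_update(loss_current, loss_candidate, z=1.96):
    """Accept the candidate only if the upper endpoint of the interval is nonpositive."""
    _, upper = increment_interval(loss_current, loss_candidate, z)
    return upper <= 0.0


current = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3]
clearly_better = [0.5, 0.6, 0.5, 0.6, 0.5, 0.7]
ambiguous = [0.9, 1.4, 0.7, 1.3, 0.9, 1.5]
```

With these made-up losses, `accept_update(current, clearly_better)` accepts the update, while `accept_update(current, ambiguous)` keeps the current model because the interval still includes positive increments.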

2606.19084 2026-06-18 math.ST stat.ML stat.TH 新提交

Optimal score function estimation via derivatives constraints

通过导数约束的最优得分函数估计

Thomas Bonis, Thanh Mai Pham Ngoc, Viet Chi Tran

AI总结 研究通过经验风险最小化估计得分函数的问题,提出在Sobolev球上约束假设空间以防止过拟合并获得极小极大估计速率,并应用于基于得分的生成模型。

详情
AI中文摘要

我们考虑通过经验风险最小化进行得分函数估计的问题。首先从推断平坦环面上具有密度的概率测度 $\mu$ 的得分函数的问题开始,基于分布 $\mu$ 的样本。我们证明将假设空间约束到 Sobolev 球足以防止过拟合并获得极小极大估计速率。然后我们考虑基于得分的生成模型中的得分函数估计问题。再次,在将得分估计速率与基于得分的生成模型输出质量联系起来的猜想下,我们通过将假设类约束到 Sobolev 球获得的得分函数估计器,为此类方法获得了极小极大速率。

英文摘要

We consider the problem of score function estimation via empirical risk minimization. We start with the question of inferring the score function of a probability measure $\mu$ with density on the flat torus from a sample drawn from $\mu$. We show that constraining the hypothesis space to a Sobolev ball is sufficient to prevent overfitting and to obtain minimax estimation rates. We then consider the problem of score function estimation in the context of score-based generative modeling. Again, under a conjecture tying the score estimation rates to the quality of the output of a score-based generative model, we obtain minimax rates for such an approach using score function estimators obtained by constraining the hypothesis class to a Sobolev ball.

2606.18515 2026-06-18 quant-ph cs.LG stat.ML 新提交

Exponentially many initializations to avoid barren plateaus

指数多个初始化以避免贫瘠高原

Ankit Kulshrestha, Ricard Puig, Diego García-Martín, Lukasz Cincio, Ilya Safro, Zoë Holmes, M. Cerezo

发表机构 * Fujitsu Research of America, Santa Clara, CA 95054, USA(富士通美国研究院) University of Delaware, Newark, DE 19716, USA(特拉华大学) Department for Quantum Information and Computation at Kepler (QUICK), Johannes Kepler University, Linz, Austria(约翰内斯·开普勒大学量子信息与计算系) Information Sciences, Los Alamos National Laboratory, Los Alamos, NM 87545, USA(洛斯阿拉莫斯国家实验室信息科学部)

AI总结 提出一阶矩框架诊断初始化能否逃离完全集中的贫瘠高原不动点,发现避免贫瘠高原的初始化策略高度非唯一,存在指数多个不等价族,且不同初始化导致不同极小值。

Comments 18 + 27 pages, 5+4 figures, 1 Table

详情
AI中文摘要

贫瘠高原通常被表述为一种平均情况现象:选择一个拟设,朴素地初始化,集中现象便随之出现。这导致了一种普遍观点,即贫瘠高原的潜在解决办法仅仅是更仔细地初始化参数。在这里,我们表明情况更为微妙。我们引入了一个一阶矩框架,该框架提供了一个简单的算子级诊断,用于判断初始化何时可能逃离完全集中的贫瘠高原不动点,并用于比较不同初始化策略引起的偏差。我们的框架恢复了几种已知的初始化方案,如恒等初始化和高斯初始化,但也表明避免贫瘠高原的方式是高度非唯一的。实际上,许多平移、有偏和非对称的参数分布都可以避免集中,并且这些选择不必等价。事实上,我们的结果表明,可以生成指数多个不等价的初始化策略族。此外,我们的数值实验表明,一阶矩不同的初始化可能达到不同的极小值,这表明通过智能初始化避免贫瘠高原,可能只是把指数集中问题换成了从众多选项中选出正确可训练口袋的挑战。

英文摘要

Barren plateaus are stated as an average-case phenomenon: pick an ansatz, initialize it naively, and concentration follows. This has led to the common view that a potential cure for barren plateaus is simply to initialize the parameters more carefully. Here we show that the situation is subtler. We introduce a first-moment framework that gives a simple operator-level diagnostic for when an initialization may escape the fully concentrated barren-plateau fixed point, and for comparing the biases induced by different initialization strategies. Our framework recovers several known initialization schemes such as identity and Gaussian initialization, but also shows that barren-plateau avoidance is highly non-unique. Indeed, many shifted, biased, and non-symmetric parameter distributions can avoid concentration, and these choices need not be equivalent. In fact, our results show that one can generate exponentially many families of inequivalent initialization strategies. Then, our numerics indicate that different first-moment-distinct initializations can lead to different attained minima, suggesting that avoiding barren plateaus via smart initializations can trade the exponential concentration problem for the challenge of selecting the right trainable pocket amongst many options.

2602.11557 2026-06-18 cs.LG stat.ML 交叉投稿

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient

小批量随机梯度最陡下降的隐式偏差

Jichu Li, Xuan Tang, Difan Zou

AI总结 研究小批量随机最陡下降在多类分类中的隐式偏差,揭示批大小、动量和方差缩减对最大间隔行为和收敛率的影响,并证明动量可实现小批量收敛,方差缩减可恢复全批量隐式偏差。

详情
AI中文摘要

多种广泛使用的优化方法,如SignSGD和Muon,可以被解释为在不同范数诱导几何下的最陡下降实例。在这项工作中,我们研究了多类分类中小批量随机最陡下降的隐式偏差,刻画了批大小、动量和方差缩减如何在一般逐项和Schatten-$p$范数下塑造极限最大间隔行为和收敛率。我们证明,在没有动量时,最坏情况下的收敛和成功分类只能通过全批量梯度保证。相反,动量通过批量-动量权衡使得小批量收敛到近似最大间隔解成为可能,尽管会减慢收敛速度。该方法提供了完全显式、与维度无关的收敛率,优于先前的结果。此外,我们证明方差缩减可以恢复任意批大小下的精确全批量隐式偏差,尽管收敛速度较慢。最后,我们进一步研究了无动量的单批量最陡下降,并通过一个具体数据示例揭示了其收敛到根本不同偏差的特性,这揭示了纯随机更新的一个关键局限性。总体而言,我们的统一分析阐明了随机优化何时与全批量行为一致,并为更深入地探索随机梯度最陡下降算法的训练行为铺平了道路。

英文摘要

A variety of widely used optimization methods like SignSGD and Muon can be interpreted as instances of steepest descent under different norm-induced geometries. In this work, we study the implicit bias of mini-batch stochastic steepest descent in multi-class classification, characterizing how batch size, momentum, and variance reduction shape the limiting max-margin behavior and convergence rates under general entry-wise and Schatten-$p$ norms. We show that, without momentum, worst-case convergence and successful classification can only be guaranteed with full-batch gradients. In contrast, momentum enables small-batch convergence to an approximate max-margin solution through a batch-momentum trade-off, though it slows convergence. This approach provides fully explicit, dimension-free rates that improve upon prior results. Moreover, we prove that variance reduction can recover the exact full-batch implicit bias for any batch size, albeit at a slower convergence rate. Finally, we investigate batch-size-one steepest descent without momentum and show, via a concrete data example, that it converges to a fundamentally different bias, exposing a key limitation of purely stochastic updates. Overall, our unified analysis clarifies when stochastic optimization aligns with full-batch behavior, and paves the way for deeper explorations of the training behavior of stochastic gradient steepest descent algorithms.
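
The reading of sign-based updates as steepest descent under a particular norm can be sketched for vector-valued parameters. The block below computes the normalized steepest-descent direction, that is, the unit-norm vector with the largest inner product with the gradient, for three entry-wise norms. It is an illustrative sketch only: the function names are hypothetical, and matrix (Schatten-$p$) norms such as the spectral geometry associated with Muon are not covered.

```python
import math


def steepest_direction(grad, norm):
    """Unit-norm direction d maximizing <grad, d> under the chosen norm."""
    if norm == "linf":
        # The sign vector: this is the SignSGD-style update direction.
        return [math.copysign(1.0, g) if g != 0 else 0.0 for g in grad]
    if norm == "l2":
        # The normalized gradient: ordinary (normalized) gradient descent.
        length = math.sqrt(sum(g * g for g in grad))
        return [g / length for g in grad] if length > 0 else [0.0] * len(grad)
    if norm == "l1":
        # All mass on the largest-magnitude coordinate: coordinate descent.
        k = max(range(len(grad)), key=lambda i: abs(grad[i]))
        return [
            math.copysign(1.0, grad[k]) if i == k and grad[k] != 0 else 0.0
            for i in range(len(grad))
        ]
    raise ValueError(f"unsupported norm: {norm}")


def steepest_descent_step(weights, grad, step_size, norm):
    """One normalized steepest-descent update: w <- w - step_size * d."""
    direction = steepest_direction(grad, norm)
    return [w - step_size * d for w, d in zip(weights, direction)]
```

For a gradient such as `[3.0, -4.0]`, the three geometries move along the sign vector, the normalized gradient, and the single largest coordinate respectively, which is why the choice of norm changes which solution the iterates drift toward.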

2606.04404 2026-06-18 stat.ML cs.LG 版本更新

Knockoffs-based False Discovery Rate Control and Simplification for Deep Neural Networks

基于Knockoffs的深度神经网络错误发现率控制与简化

Wenyu Liao, Yiqing Shi, Fang Xie

发表机构 * bnbu.edu.cn(北师香港浸会大学)

AI总结 本文基于knockoff方法和正则化神经网络,提出了三种在控制错误发现率条件下的变量筛选方法(单层过滤、多层过滤、变量权重聚合过滤),以简化深度神经网络并降低计算复杂度。

详情
AI中文摘要

深度神经网络是机器学习中广泛使用的框架,已广泛应用于各个领域。然而,深度神经网络通常涉及大量参数和输入,其中许多可能与目标或真实输出无关。这些参数和输入变量不仅增加了计算复杂度,还导致了额外的计算成本。解决这一问题的一种方法是knockoff方法,该方法在高维回归中已被证明能有效控制错误发现率。基于knockoff方法和正则化神经网络,本文提出了三种在控制错误发现率条件下的变量筛选方法:单层过滤、多层过滤、变量权重聚合过滤。与现有算法相比,我们发现我们的算法表现出令人满意的性能。

英文摘要

The deep neural network is a widely used framework in machine learning that has been widely applied in various fields. However, deep neural networks often involve a large number of parameters and inputs, many of which may be irrelevant to the goal or true output. These parameters and input variables not only increase computational complexity, but also contribute to additional computational cost. One solution to this problem is knockoff methods, which have proven successful in controlling false discovery rates in high-dimensional regression. Building on the knockoff methods and using the regularised neural network, this paper proposes three variable screening methods under the condition of controlling false discovery rates: one layer filter, multiple layers filter, and variable weight aggregation filter. In comparison with existing algorithms, we find that our algorithms show satisfactory performance.
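
For readers unfamiliar with knockoff filtering, the selection step that controls the false discovery rate can be sketched as follows. The block implements the standard knockoff+ data-dependent threshold on a vector of feature statistics $W_j$, where large positive values favor the original feature over its knockoff copy. How the statistics are computed from a regularised network, and the three filters proposed in the paper, are not reproduced here; the function names and example values are hypothetical.

```python
import math


def knockoff_plus_threshold(w_stats, fdr_level):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= fdr_level."""
    candidates = sorted({abs(w) for w in w_stats if w != 0})
    for t in candidates:
        negatives = sum(1 for w in w_stats if w <= -t)
        positives = sum(1 for w in w_stats if w >= t)
        if (1 + negatives) / max(1, positives) <= fdr_level:
            return t
    # No threshold meets the target level, so nothing is selected.
    return math.inf


def select_features(w_stats, fdr_level):
    """Indices of features whose statistic clears the knockoff+ threshold."""
    t = knockoff_plus_threshold(w_stats, fdr_level)
    return [j for j, w in enumerate(w_stats) if w >= t]


example_stats = [6.0, 5.0, 4.0, 3.0, 2.5, -1.0, 0.5, -0.3]
selected = select_features(example_stats, fdr_level=0.25)
```

The negative statistics act as an estimate of how many false positives sit above the threshold, which is why the rule needs no p-values.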

2605.17232 2026-06-18 cs.LG math.ST stat.ML stat.TH 版本更新

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

离散扩散模型的维度无关收敛性:伴随方程诱导了正确的空间

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

发表机构 * Department of Mathematics(数学系) Oden Institute School of Data Science and Society(数据科学与社会学院) UCLA(加州大学洛杉矶分校) University of Texas at Austin(德克萨斯大学奥斯汀分校) UNC Chapel Hill(北卡罗来纳大学教堂山分校) Computational and Applied Sciences Group(计算与应用科学组) Department of Mathematics and Statistics(数学与统计学系) SRI International(SRI国际) University of Massachusetts Amherst(马萨诸塞大学阿姆赫斯特分校)

AI总结 本文提出了一种基于伴随方程的统一框架,实现了任何积分概率度量(IPM)下的维度无关收敛保证,克服了传统KL和TV方法在处理大规模状态空间时的局限性。

详情
AI中文摘要

离散扩散已成为生成建模中的领先框架,广泛应用于语言、视觉和生物学等领域。然而,现有的收敛理论存在根本性局限。基于KL的分析在掩码分布等奇异先验下会发散,而总变差(TV)的界依赖于状态空间大小$S$,在词汇表包含数十万个标记的现代语言任务中变得空洞。我们开发了一种统一的基于伴随方程的框架,建立了任何积分概率度量(IPM)下的维度无关收敛保证。据我们所知,我们的界是首个完全不依赖$S$且同时适用于掩码先验和均匀先验的界。重要的是,我们的理论仅依赖于一个标准的速率矩阵正则性假设,并适用于一般先验。五项新技术推动了我们的改进:通过伴随方程在可观测量空间中工作而非直接处理概率测度;一种可给出任意IPM界的正则性分析;一种在均匀转移下消除$S$依赖性的耦合论证;以及在掩码转移下消除$S$依赖性的分数-边际抵消技术和退出路由技术。因此,我们的框架与先前的分析截然不同,并避免了路径空间KL方法和现有基于TV的方法的不足。除了收敛界之外,我们的框架还为离散扩散模型的进一步理论研究提供了一个通用工具包,包括损失函数的原则性选择和维度无关的步数复杂度。

英文摘要

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors. Five novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models, including principled choices of loss functions and dimension-free step complexity.

2604.04342 2026-06-18 cs.LG stat.ML 版本更新

Generative models for decision-making under distributional shift

分布偏移下决策的生成模型

Xiuyuan Cheng, Yunqin Zhu, Yao Xie

发表机构 * Department of Mathematics, Duke University(杜克大学数学系) H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology(佐治亚理工学院H. Milton Stewart工业与系统工程学院)

AI总结 本文提出基于流和分数生成模型的统一框架,通过传输映射、速度场等工具处理分布偏移下的决策问题,实现鲁棒性、条件分布生成及不确定性量化。

Comments INFORMS TutORials in Operations Research, 2026

详情
AI中文摘要

许多数据驱动的决策问题使用从历史数据估计的名义分布来制定,而性能最终由可能发生偏移、依赖于上下文、部分观测或由压力引起的部署分布决定。本教程介绍了现代生成模型,特别是基于流和分数的方法,作为构建决策相关分布的数学工具。从运筹学的角度来看,它们的主要价值不在于无约束的样本合成,而在于通过传输映射、速度场、分数场和引导随机动力学来表示和变换分布。我们提出了一个基于前推映射、连续性、Fokker-Planck方程、Wasserstein几何和概率空间优化的统一框架。在此框架内,生成模型可用于学习名义不确定性、构建用于鲁棒性的受压或最不利分布,以及在侧信息和部分观测下生成条件或后验分布。我们还强调了代表性的理论保证,包括迭代流模型的前向-反向收敛、传输映射空间中的一阶极小极大分析,以及具有生成先验的后验采样的误差传递界。本教程为在分布偏移下使用生成模型进行场景生成、鲁棒决策、不确定性量化及相关问题提供了原则性的介绍。

英文摘要

Many data-driven decision problems are formulated using a nominal distribution estimated from historical data, while performance is ultimately determined by a deployment distribution that may be shifted, context-dependent, partially observed, or stress-induced. This tutorial presents modern generative models, particularly flow- and score-based methods, as mathematical tools for constructing decision-relevant distributions. From an operations research perspective, their primary value lies not in unconstrained sample synthesis but in representing and transforming distributions through transport maps, velocity fields, score fields, and guided stochastic dynamics. We present a unified framework based on pushforward maps, continuity, Fokker-Planck equations, Wasserstein geometry, and optimization in probability space. Within this framework, generative models can be used to learn nominal uncertainty, construct stressed or least-favorable distributions for robustness, and produce conditional or posterior distributions under side information and partial observation. We also highlight representative theoretical guarantees, including forward-reverse convergence for iterative flow models, first-order minimax analysis in transport-map space, and error-transfer bounds for posterior sampling with generative priors. The tutorial provides a principled introduction to using generative models for scenario generation, robust decision-making, uncertainty quantification, and related problems under distributional shift.

2603.09344 2026-06-18 cs.AI stat.ML 版本更新

Robust Regularized Policy Iteration under Transition Uncertainty

转移不确定性下的鲁棒正则化策略迭代

Hongqiang Lin, Zhenghui Fu, Weihao Tang, Pengfei Wang, Yiding Sun, Qixian Huang, Dongxu Zhang

发表机构 * College of Computer Science and Technology, Zhejiang University, Hangzhou, China(浙江大学计算机科学与技术学院) School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China(西北工业大学人工智能、光学与电子学院(iOPEN)) School of Software Technology, Zhejiang University, Hangzhou, China(浙江大学软件技术学院) School of Software Engineering, Xi'an Jiaotong University, Xi'an, China(西安交通大学软件工程学院) School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou, China(中山大学系统科学与工程学院)

AI总结 提出鲁棒正则化策略迭代(RRPI),通过将离线强化学习建模为鲁棒策略优化,使用KL正则化替代难解的双层目标,并基于鲁棒正则化贝尔曼算子实现高效策略迭代,理论保证收敛性,实验在D4RL基准上表现优异。

详情
AI中文摘要

离线强化学习(RL)无需在线探索即可实现数据高效且安全的策略学习,但其性能常因分布偏移而下降。学习到的策略可能访问分布外的状态-动作对,其中价值估计和学习到的动态不可靠。为了在统一框架中处理策略引发的外推和转移不确定性,我们将离线RL建模为鲁棒策略优化,将转移核视为不确定性集内的决策变量,并针对最坏情况动态优化策略。我们提出鲁棒正则化策略迭代(RRPI),用可处理的KL正则化替代难解的最大-最小双层目标,并基于鲁棒正则化贝尔曼算子推导出高效的策略迭代过程。我们提供了理论保证,证明所提出的算子是$\gamma$-压缩算子,且迭代更新替代目标能单调改进原始鲁棒目标并收敛。在D4RL基准上的实验表明,RRPI实现了强大的平均性能,在大多数环境中优于包括基于百分位数方法在内的最新基线,并在其余环境中保持竞争力。此外,RRPI通过将较低的$Q$值与高认知不确定性对齐,展现出鲁棒性能,从而防止策略执行不可靠的分布外动作。

英文摘要

Offline reinforcement learning (RL) enables data-efficient and safe policy learning without online exploration, but its performance often degrades under distribution shift. The learned policy may visit out-of-distribution state-action pairs where value estimates and learned dynamics are unreliable. To address policy-induced extrapolation and transition uncertainty in a unified framework, we formulate offline RL as robust policy optimization, treating the transition kernel as a decision variable within an uncertainty set and optimizing the policy against the worst-case dynamics. We propose Robust Regularized Policy Iteration (RRPI), which replaces the intractable max-min bilevel objective with a tractable KL-regularized surrogate and derives an efficient policy iteration procedure based on a robust regularized Bellman operator. We provide theoretical guarantees by showing that the proposed operator is a $\gamma$-contraction and that iteratively updating the surrogate yields monotonic improvement of the original robust objective with convergence. Experiments on D4RL benchmarks demonstrate that RRPI achieves strong average performance, outperforming recent baselines including percentile-based methods on the majority of environments while remaining competitive on the rest. Moreover, RRPI exhibits robust performance by aligning lower $Q$-values with high epistemic uncertainty, which prevents the policy from executing unreliable out-of-distribution actions.
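
To see why a KL-regularized surrogate can make a worst-case backup tractable, consider the textbook case in which an adversary picks next-state probabilities and pays a KL penalty for deviating from a nominal model. The inner minimization then has a closed form, a "soft minimum" of the next-state values. The sketch below shows only this generic identity under that assumption; it is not claimed to be the RRPI operator, and the function names and the `temperature` parameter are hypothetical.

```python
import math


def kl_robust_value(next_values, nominal_probs, temperature):
    """min over P of E_P[V] + temperature * KL(P || P0), in closed form.

    Equals -temperature * log E_P0[exp(-V / temperature)].
    """
    v_min = min(next_values)  # shift for numerical stability
    total = sum(
        p * math.exp(-(v - v_min) / temperature)
        for v, p in zip(next_values, nominal_probs)
    )
    return v_min - temperature * math.log(total)


def robust_backup(reward, gamma, next_values, nominal_probs, temperature):
    """One pessimistic Bellman backup using the KL-penalized worst case."""
    return reward + gamma * kl_robust_value(next_values, nominal_probs, temperature)


values = [1.0, 2.0, 4.0]
probs = [0.5, 0.3, 0.2]
nominal_mean = sum(v * p for v, p in zip(values, probs))
```

A small temperature makes the adversary nearly unconstrained, so the backup approaches the worst next-state value; a large temperature pins the adversary to the nominal model, so the backup approaches the ordinary expectation.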

2603.04895 2026-06-18 stat.ML cs.LG math.OC 版本更新

How Does the ReLU Activation Affect the Implicit Bias of Gradient Descent on High-dimensional Neural Network Regression?

ReLU激活函数如何影响高维神经网络回归中梯度下降的隐式偏差?

Kuo-Wei Lai, Guanghui Wang, Molei Tao, Vidya Muthukumar

发表机构 * Georgia Institute of Technology(佐治亚理工学院)

AI总结 本文通过原始-对偶分析,研究了高维随机数据下浅层ReLU模型平方损失梯度下降的隐式偏差,证明其以高概率近似最小ℓ2范数解,差距为Θ(√(n/||λ||₁))。

Comments 66 pages

详情
AI中文摘要

过度参数化的机器学习模型(包括神经网络)通常会导致欠定的训练目标,具有多个全局最小值。隐式偏差指的是通过常见优化算法(如梯度下降)达到的极限全局最小值。在本文中,我们刻画了在高维随机特征上使用平方损失训练浅层ReLU模型时梯度下降的隐式偏差。先前的工作(Vardi和Shamir,2021)表明,在最坏情况下隐式偏差不存在,或者在完全正交数据下恰好对应于最小ℓ2范数插值解(Boursier等人,2022)。我们的工作介于这两个极端之间,并表明,对于足够高维的随机数据,隐式偏差以高概率近似最小ℓ2范数解,差距为Θ(√(n/||λ||₁)),其中n是训练样本数,λ表示数据协方差矩阵的谱。我们的结果通过一种新颖的原始-对偶分析获得,该分析仔细跟踪了预测、数据跨度系数及其相互作用的演变,并表明ReLU激活模式在随机数据上以高概率迅速稳定。

英文摘要

Overparameterized ML models, including neural networks, typically induce underdetermined training objectives with multiple global minima. The implicit bias refers to the limiting global minimum that is attained by a common optimization algorithm, such as gradient descent (GD). In this paper, we characterize the implicit bias of GD for training a shallow ReLU model with the squared loss on high-dimensional random features. Prior work (Vardi and Shamir, 2021) showed that the implicit bias does not exist in the worst-case, or corresponds exactly to the minimum-$\ell_2$-norm interpolating solution under exactly orthogonal data (Boursier et al., 2022). Our work interpolates between these two extremes and shows that, for sufficiently high-dimensional random data, the implicit bias approximates the minimum-$\ell_2$-norm solution with high probability with a gap on the order $\Theta(\sqrt{n/\|\lambda\|_1})$, where $n$ is the number of training examples and $\lambda$ denotes the spectrum of the data covariance matrix. Our results are obtained through a novel primal-dual analysis that carefully tracks the evolution of predictions, data-span coefficients, as well as their interactions, and show that the ReLU activation pattern quickly stabilizes with high probability over random data.

2602.21160 2026-06-18 stat.ML cs.LG stat.AP stat.ME 版本更新

Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

不仅多少,而且何处:将认知不确定性分解为每类贡献

Mame Diarra Toure, David A. Stephens

发表机构 * Department of Mathematics and Statistics(数学与统计学系)

AI总结 针对安全关键分类中认知不确定性度量无法区分类别的问题,提出将互信息分解为每类向量$C_k$,通过二阶泰勒展开和$1/\mu_k$加权校正边界抑制,在糖尿病视网膜病变选择性预测、分布外检测和标签噪声研究中验证其有效性。

Comments 8 pages, 17 figures. Accepted at UAI 2026

详情
Journal ref
Forty-Second Annual Conference on Uncertainty in Artificial Intelligence, 2026. https://openreview.net/forum?id=cxuWscJmAr
AI中文摘要

在安全关键分类中,失败的代价往往是不对称的,然而贝叶斯深度学习用单个标量——互信息(MI)来总结认知不确定性,这无法区分模型的无知涉及良性类别还是安全关键类别。我们将MI分解为每类向量$C_k(x)=\sigma_k^{2}/(2\mu_k)$,其中$\mu_k{=}\mathbb{E}[p_k]$,$\sigma_k^2{=}\mathrm{Var}[p_k]$,计算基于后验样本。该分解来自熵的二阶泰勒展开;$1/\mu_k$加权校正了边界抑制,使$C_k$在稀有类别和常见类别之间具有可比性。根据构造,$\sum_k C_k \approx \mathrm{MI}$,并且伴随的偏度诊断标志可识别近似退化的输入。在刻画$C_k$的公理性质后,我们在三个任务上验证了它:(i)糖尿病视网膜病变的选择性预测,其中关键类别的$C_k$相比MI降低了34.7%的选择性风险,相比方差基线降低了56.2%;(ii)临床和图像基准上的分布外检测,其中$\sum_k C_k$取得了最高的AUROC,并且每类视角暴露了MI无法察觉的不对称偏移;(iii)受控的标签噪声研究,其中在端到端贝叶斯训练下,$\sum_k C_k$对注入的偶然噪声的敏感性低于MI,而在迁移学习下两种度量均退化。在所有任务中,后验近似的质量对不确定性的影响至少与度量选择本身一样强,这表明不确定性如何通过网络传播与其如何被度量同等重要。

英文摘要

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k{=}\mathbb{E}[p_k]$ and $\sigma_k^2{=}\mathrm{Var}[p_k]$ across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7\% over MI and 56.2\% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
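
The per-class quantity $C_k=\sigma_k^{2}/(2\mu_k)$ and its relation to mutual information can be computed directly from a set of posterior predictive samples. The sketch below follows the formulas stated in the abstract; everything else is an assumption for illustration, including the function names and the use of Dirichlet draws as a stand-in for predictions from a Bayesian network.

```python
import math
import random


def per_class_contributions(prob_samples):
    """C_k = Var[p_k] / (2 * E[p_k]) across posterior samples of class probabilities."""
    n = len(prob_samples)
    num_classes = len(prob_samples[0])
    mu = [sum(p[k] for p in prob_samples) / n for k in range(num_classes)]
    var = [
        sum((p[k] - mu[k]) ** 2 for p in prob_samples) / n
        for k in range(num_classes)
    ]
    return [var[k] / (2.0 * mu[k]) for k in range(num_classes)]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mutual_information(prob_samples):
    """Entropy of the mean prediction minus the mean entropy of the samples."""
    n = len(prob_samples)
    num_classes = len(prob_samples[0])
    mean_probs = [sum(p[k] for p in prob_samples) / n for k in range(num_classes)]
    return entropy(mean_probs) - sum(entropy(p) for p in prob_samples) / n


def dirichlet_samples(alpha, n, seed=0):
    """Synthetic stand-in for posterior predictive samples (assumption)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(draws)
        samples.append([d / total for d in draws])
    return samples


samples = dirichlet_samples([120.0, 60.0, 20.0], n=4000)
contributions = per_class_contributions(samples)
mi = mutual_information(samples)
```

When the samples are tightly concentrated, as here, `sum(contributions)` tracks `mi` closely, and the individual entries show how the total splits across classes, with the rare class receiving the largest share in this synthetic example.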

2602.17187 2026-06-18 stat.ML cs.LG 版本更新

Anti-causal domain generalization: Leveraging unlabeled data

反因果域泛化:利用无标签数据

Sorawit Saengkyongam, Juan L. Gamella, Andrew C. Miller, Jonas Peters, Nicolai Meinshausen, Christina Heinze-Deml

发表机构 * Apple(苹果公司) ETH Zürich(苏黎世联邦理工学院)

AI总结 针对反因果设置下的域泛化问题,提出利用无标签数据估计环境扰动方向,通过惩罚模型对协变量均值和协方差变化的敏感性实现鲁棒性,并提供最坏情况最优性保证。

Comments Accepted at the International Conference on Machine Learning (ICML) 2026

详情
AI中文摘要

域泛化问题关注的是学习在部署到新的、未见过的环境时对分布变化具有鲁棒性的预测模型。现有方法通常需要来自多个训练环境的标记数据,这在标记数据稀缺时限制了它们的适用性。在这项工作中,我们研究了反因果设置下的域泛化,其中结果导致观察到的协变量。在这种结构下,影响协变量的环境扰动不会传播到结果,这促使我们对模型对这些扰动的敏感性进行正则化。关键在于,估计这些扰动方向不需要标签,使我们能够利用来自多个环境的无标签数据。我们提出了两种方法,分别惩罚模型对跨环境协变量均值和协方差变化的敏感性,并证明这些方法在特定环境类别下具有最坏情况最优性保证。最后,我们在一个受控物理系统和一个生理信号数据集上展示了我们方法的实证性能。

英文摘要

The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.

2602.14789 2026-06-18 cs.LG stat.ML 版本更新

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

关于GD和SGD中非线性动力学的稳定性:超越二次势能

Rotem Mulayoff, Sebastian U. Stich

发表机构 * CISPA Helmholtz Center for Information Security(CISPA赫尔姆霍兹信息安全中心)

AI总结 研究梯度下降和随机梯度下降中非线性项对动力学稳定性的影响,推导了多元设置下稳定振荡的精确条件,并发现SGD的稳定性由单个不稳定批次决定。

Comments Accepted to COLT 2026

详情
AI中文摘要

训练过程中迭代的动力稳定性在确定优化算法所获得的极小值方面起着关键作用。例如,梯度下降(GD)的稳定解对应于平坦极小值,而平坦极小值被认为具有有利特征。虽然先前的工作通常依赖线性化来确定稳定性,但线性化动力学是否忠实捕捉完整的非线性行为仍不清楚。最近的研究表明,GD可能在线性不稳定的极小值附近稳定振荡,并在步长衰减后仍然收敛,这表明线性分析可能具有误导性。在这项工作中,我们明确研究了非线性项的影响。具体而言,我们在多元设置下推导了GD在极小值附近稳定振荡的精确准则。我们的条件依赖于高阶导数,推广了现有结果。将分析扩展到随机梯度下降(SGD),我们表明即使只有单个批次不稳定,非线性动力学也可能在期望上发散。这意味着稳定性可能由单个不稳定振荡的批次决定,而不是像线性分析所暗示的那样由平均效应决定。最后,我们证明如果所有批次都是线性稳定的,则SGD的非线性动力学在期望上是稳定的。

英文摘要

The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
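The gap between linearized and nonlinear stability described above can be seen on a one-dimensional toy problem. The sketch below is an illustration only, not the paper's criterion: it assumes the hypothetical loss f(x) = log(cosh(x)), whose gradient is tanh(x) and whose curvature at the minimum x = 0 is 1, so linearization predicts stability only when the step size is below 2.

```python
import math

def gd_logcosh(step_size, x0=0.1, n_steps=200):
    """Run gradient descent on the toy loss f(x) = log(cosh(x)); f'(x) = tanh(x)."""
    x = x0
    for _ in range(n_steps):
        x = x - step_size * math.tanh(x)
    return x

# Linearization at x = 0 predicts stability iff step_size * f''(0) < 2, with f''(0) = 1.
x_stable = gd_logcosh(step_size=1.5)   # linearly stable: iterates shrink to the minimum
x_osc = gd_logcosh(step_size=2.5)      # linearly unstable, yet the iterates stay bounded

# In the second run the iterates settle into a period-2 orbit +a, -a, +a, ...
# whose amplitude a solves 2 * a = step_size * tanh(a).
print(abs(x_stable), abs(x_osc))
```

Because the gradient of this loss saturates, the nonlinear terms cap the growth that the linearized dynamics would predict, which is the kind of bounded oscillation near a linearly unstable minimum that the abstract refers to.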

2602.02056 2026-06-18 cs.AR cs.LG cs.SY eess.SY stat.ML 版本更新

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

基于Kolmogorov-Arnold网络中样条局部性的超快片上在线学习

Duc Hoang, Aarush Gupta, Philip Harris

发表机构 * MIT(麻省理工学院)

AI总结 针对量子计算和核聚变控制等高频系统对亚微秒级在线学习的需求,提出利用Kolmogorov-Arnold网络的B样条局部性实现稀疏更新,并利用其对定点量化的鲁棒性,在FPGA上实现比MLP更高效、更具表达力的超快在线学习。

Comments Forty-Third International Conference on Machine Learning (ICML'26)

详情
AI中文摘要

超快在线学习对于高频系统(如量子计算和核聚变的控制系统)至关重要,这些系统中的自适应必须在亚微秒时间尺度内完成。满足这些需求需要在严格的内存约束下进行低延迟、固定精度的计算,而传统的多层感知器(MLP)在这种条件下既低效又数值不稳定。我们识别了Kolmogorov-Arnold网络(KAN)与这些约束相契合的关键特性。具体来说,我们表明:(i)利用B样条局部性的KAN更新是稀疏的,从而实现更优的片上资源缩放;(ii)KAN对定点量化具有固有的鲁棒性。通过在现场可编程门阵列(FPGA,一种代表性的片上计算平台)上实现定点在线训练,我们证明基于KAN的在线学习器在一系列低延迟和资源受限的任务中比MLP显著更高效且更具表达力。据我们所知,这项工作首次展示了在亚微秒延迟下的无模型在线学习。

英文摘要

Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.
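The locality claim in point (i) follows from a standard property of B-splines: at any input, only degree + 1 basis functions are nonzero, so only that many spline coefficients receive a gradient. The sketch below illustrates this with the textbook Cox-de Boor recursion; the uniform knot grid, the degree and the evaluation point are hypothetical choices, and none of this is taken from the authors' FPGA implementation.

```python
def bspline_basis(i, p, knots, x):
    """Cox-de Boor recursion: value of the i-th B-spline basis of degree p at x."""
    if p == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    d2 = knots[i + p + 1] - knots[i + 1]
    if d1 > 0:  # skip terms with zero-width support (0/0 is treated as 0)
        left = (x - knots[i]) / d1 * bspline_basis(i, p - 1, knots, x)
    if d2 > 0:
        right = (knots[i + p + 1] - x) / d2 * bspline_basis(i + 1, p - 1, knots, x)
    return left + right

degree = 3
knots = [float(k) for k in range(13)]   # hypothetical uniform knot grid 0, 1, ..., 12
n_basis = len(knots) - degree - 1       # 9 basis functions, one coefficient each
x = 5.3                                 # hypothetical input inside the valid range [3, 9)

values = [bspline_basis(i, degree, knots, x) for i in range(n_basis)]
active = [i for i, v in enumerate(values) if v > 0.0]

# Only the coefficients listed in `active` would be touched by an update at this input.
print(active, sum(values))
```

Here 4 of the 9 coefficients are active, and the active basis values sum to one. A dense MLP layer, by contrast, updates every weight for every input.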

2509.14969 2026-06-18 cs.LG math.OC stat.ML 版本更新

Stochastic Adaptive Gradient Descent Without Descent

无需下降的随机自适应梯度下降

Jean-François Aujol, Jérémie Bigot, Camille Castera

发表机构 * Univ. Bordeaux CNRS, Bordeaux INP, IMB, UMR 5251(波尔多大学 CNRS,波尔多 INP,IMB,UMR 5251)

AI总结 提出一种无需超参数调优的随机梯度自适应步长策略,仅通过一阶随机预言机(oracle)利用目标函数的局部几何信息,理论上证明了收敛性,实验表明其可与经过调优的基线相媲美。

详情
AI中文摘要

我们针对使用随机梯度的凸优化提出了一种新的自适应步长策略,该策略仅借助一阶随机预言机(oracle)利用目标函数的局部几何信息,且无需任何超参数调优。该方法是对“Adaptive Gradient Descent Without Descent”方法在随机设置下的一种有理论依据的改造。我们在多种假设下证明了采用该步长的随机梯度下降的收敛性,并表明其在实验中可与经过调优的基线相媲美。

英文摘要

We introduce a new adaptive step-size strategy for convex optimization with stochastic gradient that exploits the local geometry of the objective function only by means of a first-order stochastic oracle and without any hyper-parameter tuning. The method comes from a theoretically-grounded adaptation of the Adaptive Gradient Descent Without Descent method to the stochastic setting. We prove the convergence of stochastic gradient descent with our step-size under various assumptions, and we show that it empirically competes against tuned baselines.

2507.07156 2026-06-18 stat.ML cs.CG cs.LG math.AT 版本更新

Unreduced Persistence Diagrams for Topological Machine Learning

未约简持久图在拓扑机器学习中的应用

Nicole Abreu, Parker B. Edwards, Francis Motta

发表机构 * Department of Mathematics and Statistics, Florida Atlantic University, Boca Raton, FL(佛罗里达大西洋大学数学与统计学系,佛罗里达州博卡拉顿)

AI总结 研究未约简边界矩阵生成的拓扑特征向量在机器学习中的性能,发现其与完全约简持久图性能相当甚至更优,且计算内存需求低一个数量级。

Comments Substantially expanded to include additional ML and software benchmark experiments. 11 figures, 4 tables, 20 pages (without appendix and references)

详情
AI中文摘要

实验观察表明,基于持久同调特征训练的监督机器学习流程忽略了持久图中包含的大量信息。然而,计算持久图通常是此类流程中计算开销最大的步骤。为了探索这一现象,我们引入了几种从未约简边界矩阵生成拓扑特征向量的方法,并研究了它们的理论和计算性质。我们比较了基于未约简持久图向量化的流程与基于完全约简持久图向量化的流程在多种数据和任务类型上的性能。结果表明,基于未约简图构建的持久图训练的模型在某些任务上可以与基于完全约简图训练的模型表现相当,甚至更优。我们还对一个计算未约简图的算法进行了计算性能基准测试,该算法是Ripser的大幅修改版本。这些计算是可并行的,并且平均所需内存比计算完整持久图少一个数量级。我们的结果表明,包含拓扑特征的机器学习流程可以通过利用未约简边界矩阵中的信息,在计算成本和性能方面受益。

英文摘要

Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the most computationally demanding step in such a pipeline, however. To explore this dynamic, we introduce several methods to generate topological feature vectors from unreduced boundary matrices and investigate their theoretical and computational properties. We compared the performance of pipelines trained on vectorizations of unreduced PDs to vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par and even outperform those trained on fully-reduced diagrams on some tasks. We also benchmarked the computational performance of an algorithm for computing unreduced diagrams, which was implemented as a heavily modified version of Ripser. These computations are parallelizable and required an order of magnitude less memory on average compared to computing full persistence diagrams. Our results suggest that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.

8. 生物统计与医学统计 8 篇

2606.19125 2026-06-18 eess.AS stat.ME 新提交

Continuous-Speech Parkinson's Disease Detection Using Acoustic and Inharmonicity Features

连续语音帕金森病检测:基于声学和非谐和性特征

Rujia Li, Niloofar Momeni, Susanna Whitling, Andreas Jakobsson

AI总结 提出一种基于连续语音的帕金森病检测方法,利用传统声学特征和新型非谐和性特征,实验表明连续语音模型优于持续元音模型。

详情
AI中文摘要

已有研究主要利用持续元音发声从语音数据中识别帕金森病(PD)。本文在此基础上,提出了一种针对连续语音的PD识别方法,从而实现对语音数据的实用背景监测,以检测指示PD的语音变化。使用两个不同的数据集,我们比较了最佳持续元音模型与所提出的连续语音模型的性能,清晰展示了后者的优越性能。我们研究了说话人级别评估和数据泄漏预防的方法,以及如何从连续语音中可靠提取元音信息。所提出的方法框架同时利用传统声学表示和一种有前景的新型基于非谐和性的框架,展示了后者如何提供互补信息以改善其中一个数据集的性能;然而,对于另一个数据集,该信息并未显著改善(或降低)性能,表明在得出其使用结论前需要进一步研究。总体而言,本文清晰展示了使用连续语音进行PD分类相比使用持续元音声音的优势。

英文摘要

Notable efforts have been made to identify Parkinson's disease (PD) from vocal data, primarily using sustained vowel phonations. In this work, we extend these efforts by introducing a PD identification approach for continuous speech, enabling practical background monitoring of voice data to detect vocal changes indicative of PD. Using two distinct data sets, we compare the best sustained vowel model with the proposed continuous speech model, clearly illustrating the preferential performance of the latter. We examine approaches for speaker-level evaluation and data leakage prevention, as well as how vowel information may be reliably extracted from continuous speech. The proposed method framework exploits both traditional acoustic representations and a promising novel inharmonicity-based framework, showing how the latter provides complementary information improving the performance for one of the data sets; however, for the other data set, this information did not significantly improve (nor reduce) the performance, suggesting that further studies are required before being able to draw firm conclusions on its use. Overall, the work clearly illustrates the benefit of forming PD classification using continuous speech compared to using sustained vowel sounds.

2606.19041 2026-06-18 stat.ME 新提交

Efficient Cumulative Incidence Estimation in Biobank Studies Using All Prevalent and Incident Events

利用所有现患和发病事件在生物库研究中进行高效累积发病率估计

David M. Zucker, Malka Gorfine

AI总结 针对生物库数据中同时包含招募前发病(现患)和随访期间发病的个体,提出一种新的累积发病率函数估计方法,整合所有病例,处理年轻发病且生存期长的疾病,理论证明渐近性质,模拟和UK生物库癌症数据验证其优势。

详情
AI中文摘要

基于人群的生物库已在许多国家建立,为大规模研究各种疾病的发病率提供了机会。生物库数据通常是在特定日历期内招募的研究队列中收集的,受试者在年龄介于$R_L$和$R_U$之间时进入研究。本研究关注包含两类个体的生物库数据:在招募前已发生目标疾病(称为现患病例)的个体,以及最初招募时无病但在随访期间发病的个体。我们提出一种新的累积发病率函数(CIF)估计量,它超越了现有方法,因为它整合了所有疾病病例,无论是现患还是发病,无论其后续生命历程如何。特别是,新方法可以处理涉及在年轻年龄发生且发病后生存期长的疾病的情况。建立了新方法的渐近性质,并进行了模拟研究以检验该方法的性能。我们通过将方法应用于英国生物库的癌症数据,说明了该方法的使用,并强调了其相对于现有方法的优势。

英文摘要

Population-based biobanks, now established in many countries, offer opportunities for large-scale studies investigating the incidence of various diseases. Biobank data is typically collected from a study cohort recruited over a defined calendar period, with subjects entering the study at various ages falling between $R_L$ and $R_U$. This work focuses on biobank data that includes individuals in whom onset of the disease of interest occurred before recruitment, termed prevalent cases, along with individuals initially recruited as disease-free in whom disease onset occurred during the follow-up period. We propose a novel cumulative incidence function (CIF) estimator that goes beyond existing methods in that it incorporates all disease cases, both prevalent and incident, irrespective of their subsequent life course. In particular, the new method can handle situations involving diseases that can occur at young ages with long survival after disease onset. Asymptotic properties of the new method are established and a simulation study is presented examining the performance of the method. We illustrate the use of the method and highlight its advantages over existing methods with an application to cancer data from the UK biobank.

2606.18843 2026-06-18 stat.ME 新提交

Improved prediction of extreme random effects in joint models: WRaPs

联合模型中极端随机效应的改进预测:WRaPs

Eline Vanderpijpen, Els Goetghebeur

AI总结 针对联合模型中极端随机效应预测的向均值回归问题,提出加权随机效应预测器(WRaPs),通过最小化加权平方预测误差来改善尾部估计,并在贝叶斯框架下提供解析解和MCMC计算方案。

详情
AI中文摘要

混合模型常用于预测受试者特定的重复结局或多个中心中各中心的绩效。然而,当目标是识别极端或不良结局时,标准随机效应预测可能受到向均值回归的影响,低估其分布尾部的值。最近有人提出了最优加权随机效应估计量来缓解这一问题。受重复结局可能以死亡结束的临床情境启发,我们将该方法扩展到预测定义为“死亡或低于标准的重复测量”的不良结局。我们从纵向结局和生存结局具有共享随机效应的联合模型出发,在给定生存和重复测量可用数据的条件下,通过最小化加权平方预测误差来估计其随机效应。与混合模型一样,选择权重以更重地惩罚尾部的误差。我们将结果称为WRaPs:加权随机效应预测器。对于基本模型和选定的权重集,可从通常的联合模型参数推导出解析闭式解。对于更复杂的情况,在贝叶斯范式内使用rjags中的MCMC方法开发计算解决方案。我们在具有随机截距和斜率的I型模拟中展示了所提出方法的有限样本性质,并将新方法应用于一项针对胶质母细胞瘤患者的随机研究,以预测个体未来结局和生存。

英文摘要

Mixed models are popular for the prediction of subject-specific repeated outcomes or center performance among many centers. When the goal is to identify extreme or poor outcomes, standard random effects predictions may, however, suffer from regression to the mean and underestimate values in the tail of their distribution. Optimally weighted random effect estimators have recently been proposed to mitigate this. Motivated by clinical settings where repeated outcomes may end in death, we extend that method to predict poor outcome defined as 'death or substandard repeated measures'. We start from joint models with shared random effects for the longitudinal and survival outcome and estimate their random effects by minimizing squared weighted prediction errors given available data on survival and repeated measures. As for mixed models, weights are chosen to more heavily penalize errors in the tails. We call the results WRaPs: Weighted Random effect Predictors. For basic models and a select set of weights analytical closed form solutions are derived from the usual joint model parameters. For the more complex setting, computational solutions are developed in rjags using MCMC methods within the Bayesian paradigm. We illustrate finite sample properties of the proposed method in Type I simulations with random intercept and slope; and apply the new approach to predict individual future outcomes and survival in a randomized study with glioblastoma patients.

2606.18809 2026-06-18 stat.ME stat.AP 新提交

Applying the Weibull Shape Parameter test for signal detection in pharmacovigilance using the R package WSPsignal

应用Weibull形状参数检验进行药物警戒信号检测:基于R包WSPsignal

Julia Dyck, Odile Sauzet

AI总结 提出Weibull形状参数检验家族用于药物警戒信号检测,开发R包WSPsignal集成多种估计方法和分布,支持默认与仿真调优,通过大样本和小样本示例展示功能。

详情
AI中文摘要

上市后药物警戒依赖统计信号检测方法识别潜在药物不良反应。Weibull形状参数(WSP)检验概念利用时间信息(电子健康记录)评估药物起始后不良事件的风险随时间的变化。从恒定性中统计显著的偏离产生信号。WSP框架包含一系列检验,这些检验在估计方法(频率学派或贝叶斯)、用于风险建模的选定时间-事件分布(Weibull、双Weibull、幂广义Weibull)以及检验规范参数方面有所不同。为促进实际应用并鼓励在未来研究中考虑WSP信号检测检验,我们开发了R包WSPsignal。该包将所有WSP检验所需功能整合到一个统一的开源接口中。它使实践者和研究人员能够应用默认检验规范,或执行基于仿真的调优以针对给定数据场景确定最优检验。我们通过两个示例说明该包的功能。在大样本设置(约20,000个观测)中,考虑频率学派WSP检验。在小样本设置(约1,000个观测)中,选择贝叶斯WSP检验。额外的检验规范通过基于仿真的调优进行优化。

英文摘要

Post-marketing pharmacovigilance relies on statistical signal detection methods to identify potential adverse drug reactions. The Weibull shape parameter (WSP) test concept exploits temporal information (electronic health records) to assess the hazard of an adverse event over time after drug initiation. A statistically significant deviation from constancy results in a signal. The WSP framework comprises a family of tests that differ with respect to the estimation approach (frequentist or Bayesian), the chosen time-to-event distribution (Weibull, double Weibull, power generalized Weibull) for hazard modeling, and test specification parameters. To facilitate practical application and encourage consideration of the WSP signal detection test in future research, we developed the R package WSPsignal. The package consolidates all functionalities required for WSP testing into a unified, open-source interface. It enables practitioners and researchers to apply default test specifications or perform simulation-based tuning to identify the optimal test for a given data scenario. We illustrate the package functionalities in two examples to follow along. In a large-sample setting (ca. 20 000 observations), a frequentist WSP test is considered. In a small-sample setting (ca. 1 000 observations), a Bayesian WSP test is chosen. The additional test specifications are optimized through simulation-based tuning.
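The reason a test on the Weibull shape parameter can detect a non-constant hazard is that the Weibull hazard is constant exactly when the shape equals 1. The sketch below evaluates the standard Weibull hazard for three hypothetical shape values; it is a plain illustration of that formula and does not use or reproduce the WSPsignal package interface.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Hazard of a Weibull(shape, scale) time-to-event distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

times = [0.5, 1.0, 2.0, 4.0]   # hypothetical times since drug initiation

constant = [weibull_hazard(t, shape=1.0) for t in times]     # shape = 1: constant hazard
decreasing = [weibull_hazard(t, shape=0.5) for t in times]   # shape < 1: risk concentrated early
increasing = [weibull_hazard(t, shape=2.0) for t in times]   # shape > 1: risk grows over time

print(constant, decreasing, increasing)
```

An estimated shape whose interval estimate excludes 1 therefore indicates a hazard that changes over time after drug initiation, which is the deviation from constancy that the abstract treats as a signal.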

2606.19092 2026-06-18 stat.AP cs.LG 新提交

Context-Aware Optimization of Follow-Up Intervals for Type 2 Diabetes Care Using Markov Decision Processes

使用马尔可夫决策过程对2型糖尿病护理随访间隔进行上下文感知优化

Parisa Lotfibagha, Kristen Miller, William J. Gallagher, Elizabeth B. Selden, Muge Capan

AI总结 提出上下文马尔可夫决策过程模型,利用电子健康记录数据为2型糖尿病患者优化个性化随访间隔,识别低风险和高风险亚群,相比固定间隔策略显著降低预期累积成本。

详情
AI中文摘要

慢性病管理依赖于定期的医患互动来跟踪疾病进展和控制。对于2型糖尿病,当前指南对所有患者规定固定的初级保健随访间隔,忽略了临床轨迹和患者特征的异质性。本研究引入上下文马尔可夫决策过程模型,利用来自10个初级保健诊所的22,154名2型糖尿病患者的电子健康记录数据,优化亚群特定的随访间隔决策。上下文通过以下方式识别:i) 利用主成分分析对代表个体健康轨迹的变量进行降维,以及ii) 通过主成分和额外的患者层面特征使用聚类将患者分配到上下文中。出现了两个不同的上下文,分别代表低风险和高风险亚群。CMDP导出的策略建议:(i) 如果当前就诊的实验室值未测量,则在1个月内随访;(ii) 对于实验室值升高或近期住院,最多3个月;(iii) 对于持续血糖控制,6至12个月,高风险上下文患者的随访间隔更短。最优策略实现了比基准更低的预期累积成本(例如,在高共病上下文中,相对于美国糖尿病协会类似的固定间隔随访策略,CMDP策略降低了约34.8%的成本;在低共病上下文中降低了约6.4%)。这些发现展示了上下文感知方法如何为适应性随访策略提供信息,并有可能通过综合机器学习和概率决策模型来推进初级保健中的慢性病管理。

英文摘要

Chronic disease management relies on regular patient-provider interactions to follow up on disease progression and control. For Type 2 Diabetes (T2D), current guidelines prescribe fixed time intervals between subsequent primary care visits for all patients, overlooking heterogeneity in clinical trajectories and patient characteristics. This study introduces a Contextual Markov Decision Process (CMDP) model to optimize subpopulation-specific follow-up interval decisions using Electronic Health Record (EHR) data from 22,154 T2D patients across 10 primary care clinics. Contexts are identified by: i) dimensionality reduction of variables representing the individual health trajectories utilizing Principal Component Analysis, and ii) assigning patients to contexts via principal components and additional patient-level features using clustering. Two distinct contexts emerged, representing a lower-risk and a higher-risk subpopulation. CMDP-derived policies recommend: (i) follow-up within 1 month if the lab value at the current visit is unmeasured; (ii) up to 3 months for elevated lab values or recent hospitalizations; and (iii) 6 to 12 months for sustained glycemic control, with shorter follow-up intervals for patients in the high-risk context. The optimal policies achieved lower expected cumulative cost than benchmarks (e.g., in the higher-comorbidity context, the CMDP policy reduced cost by about 34.8%, and in the lower-comorbidity context by about 6.4%, relative to an American Diabetes Association-like fixed interval follow-up policy). These findings demonstrate how context-aware approaches can inform adaptive follow-up strategies, and have the potential to advance chronic care management in primary care by synthesizing machine learning and probabilistic decision models.

2606.18506 2026-06-18 cs.LG eess.SP stat.AP 新提交

Beyond AHI: An Interpretable Causal-Discovery-Guided Framework for Sleep Recovery in Connected Health

超越AHI:一种可解释的因果发现引导的睡眠恢复框架在互联健康中的应用

Saba A. Farahani, Elahe Khatibi, Manoj Vishwanath, Amir M. Rahmani, Hung Cao

发表机构 * University of California, Irvine(加州大学尔湾分校)

AI总结 提出一种可解释的因果发现引导框架,从多模态PSG中推导层次化睡眠恢复评分(SRS),在两大队列中SRS与感知恢复的关联强度是AHI的2.5倍。

Comments 6 pages, 2 figures, 2 tables. Accepted at the 2nd Workshop on Sensing and Computing for Smart and Connected Health (SCH), co-located with IEEE/ACM CHASE 2026

详情
AI中文摘要

客观睡眠评估依赖于多导睡眠图(PSG),但临床影响通常更好地反映在患者报告结局(PROs)如嗜睡和疲劳中。现有的总结指标,包括呼吸暂停低通气指数(AHI),对功能恢复背后的多域生理学提供的洞察有限。我们提出了一种可解释的、因果发现引导的框架,用于从多模态PSG中推导层次化睡眠恢复评分(SRS)。利用两个大型人群队列(MESA: n=1540; MrOS: n=825),我们应用有向无环图(DAG)学习来识别候选生理驱动因素,涵盖呼吸负担、缺氧负担、睡眠碎片化、睡眠结构和自主神经调节。尽管源自临床PSG,这些域自然映射到互联健康技术中日益可用的传感流,包括可穿戴心电图、血氧测定和睡眠阶段估计设备。为了保持机制合理性,我们引入了一个两阶段筛选过程,结合基于生理学的约束和受约束的LLM辅助审计,以识别和消除结构混杂因素以及构造重叠变量。跨队列,这五个域作为与恢复相关的重复生理域出现,所得SRS与感知恢复的关联强度高达AHI的2.5倍。通过将多模态睡眠生理学与以患者为中心的结果通过可解释、偏差感知和域结构化的框架联系起来,这项工作为临床睡眠研究和新兴智能互联健康环境中的恢复建模提供了实用基础。

英文摘要

Objective sleep assessment relies on polysomnography (PSG), yet clinical impact is often better reflected in patient-reported outcomes (PROs) such as sleepiness and fatigue. Existing summary indices, including the Apnea-Hypopnea Index (AHI), provide limited insight into the multidomain physiology underlying functional recovery. We propose an interpretable, causal-discovery-guided framework for deriving a hierarchical Sleep Recovery Score (SRS) from multimodal PSG. Using two large population cohorts (MESA: n=1540; MrOS: n=825), we apply directed acyclic graph (DAG) learning to identify candidate physiological drivers spanning respiratory burden, hypoxic burden, sleep fragmentation, sleep architecture, and autonomic regulation. Although derived from clinical PSG, these domains map naturally to sensing streams increasingly available in connected health technologies, including wearable ECG, oximetry, and sleep-stage estimation devices. To preserve mechanistic plausibility, we introduce a two-stage screening process that combines physiology-based constraints with constrained LLM-assisted auditing to identify and remove structural confounders and construct-overlapping variables. Across cohorts, these five domains emerge as recurrent physiological domains associated with recovery, and the resulting SRS shows up to 2.5$\times$ stronger alignment with perceived recovery than AHI. By linking multimodal sleep physiology to patient-centered outcomes through an interpretable, bias-aware, and domain-structured framework, this work provides a practical foundation for recovery modeling across both clinical sleep studies and emerging smart and connected health settings.

2603.25235 2026-06-18 stat.AP 版本更新

Bayesian Inference for Epidemic Final Size Datasets with Hidden Underlying Household Structure

具有隐藏底层家庭结构的流行病最终规模数据集的贝叶斯推断

Joseph Brooks, Thomas House, Lorenzo Pellis, Joe Hilton

AI总结 提出一种基于MCMC的贝叶斯推断方法,通过插补未报告的家庭结构来估计传染病传播强度,在合成数据上实现超90%覆盖率,并利用COVID-19数据表明按家庭规模分层可降低估计不确定性。

详情
AI中文摘要

家庭是传染病流行病学中经验研究和数学建模的关键兴趣单位。疾病的家庭内传播潜力通常用二代发病率(SAR)来概括。尽管SAR被广泛使用,但它依赖于研究期间观察到的家庭规模分布(HHSD),难以推广到新环境。将传播潜力估计扩展到新人群需要估计人际传播率,这些传播率可以与人口结构数据结合,以参数化机制传播模型。在本研究中,我们提出了一种新的贝叶斯推断方法,该方法使用MCMC算法,通过插补流行病背后未报告的家庭结构来推断传播强度。该方法可以在不同分辨率水平上报告的家庭流行病学数据上运行。对于来自真实底层HHSD的合成数据,我们能够持续实现超过90%的传播率估计覆盖率。对于由病态底层HHSD生成的数据,在给定HHSD强信息的情况下,我们也能持续实现超过90%的覆盖率。利用一个在COVID-19大流行期间记录微观家庭流行病学结果的现有数据集,我们表明按家庭规模对观察到的SAR进行分层可以显著降低估计的不确定性。我们的发现表明,开展家庭流行病学研究的研究人员可以通过报告按家庭分层的估计来提高结果对传染病建模者的实用性。这些结果旨在鼓励在流行病学现场工作中报告更高分辨率的数据,因为在缺乏强先验的情况下,从通常报告的低分辨率数据集中难以识别传播参数。

英文摘要

Households represent a key unit of interest in infectious disease epidemiology, in both empirical studies and mathematical modelling. The within-household transmission potential of a disease is often summarised by a secondary attack ratio (SAR). Despite its widespread use, the SAR depends on the household size distribution (HHSD) seen during the study period, making it difficult to generalise to new contexts. Extending estimates of transmission potential to new populations instead requires estimates of person-to-person transmission rates which can be convoluted with data on population structure to parametrise mechanistic transmission models. In this study we present a new Bayesian inference method which uses an MCMC algorithm to infer the transmission intensity by imputing the unreported household structure underlying the epidemic. This method can be run on household epidemiological data reported at varying levels of resolution. For synthetic data from a realistic underlying HHSD, we were able to achieve over 90% coverage in our estimates of transmission rate consistently. We were also able to consistently achieve over 90% coverage for data generated with a pathological underlying HHSD, given strong information about the HHSD. Using an existing dataset which recorded micro-scale household epidemiological outcomes during the COVID-19 pandemic, we show that stratifying observed SARs by household size substantially reduces the uncertainty in estimates. Our findings suggest that researchers conducting household epidemiological studies can improve the utility of results for infectious disease modellers by reporting household-stratified estimates. These results aim to encourage the reporting of higher resolution outputs in epidemiological field work as, in the absence of strong priors, transmission parameters were not easily identifiable from low resolution datasets, which are often reported.

2509.14183 2026-06-18 stat.ME stat.AP 版本更新

Index Date Imputation for Survival Analysis in Externally Controlled Trials with Delayed Treatment Initiation

延迟治疗启动的外部对照试验中生存分析的索引日期插补

Q. Le Coent, G. L. Rosner, M-C. Wang, C. Hu

AI总结 针对外部对照试验中因治疗启动延迟导致的索引日期错位问题,提出截断感知的索引日期插补(IDI)方法,结合倾向得分加权以校正混杂,模拟和真实数据验证其减少偏差的有效性。

详情
AI中文摘要

外部对照试验将单臂试验的结果与从历史试验、注册或观察性研究中抽取的外部对照进行比较。对于时间至事件终点,一个关键挑战是单臂试验中的随访以治疗启动为索引,而外部对照数据以更早的临床里程碑(如诊断或复发)为索引。这种错位可能引入永存时间偏倚,扭曲风险集,并复杂化生存比较的解释。我们提出索引日期插补(IDI),一种截断感知的方法,用于在延迟治疗启动的设置中为外部对照患者插补可比较的索引日期。IDI估计目标单臂人群中治疗启动时间的边际分布,同时考虑到启动时间仅在存活足够长以启动治疗的患者中观察到。然后使用插补的索引日期来对齐随访,并在外部对照队列中强制实施可比较的截断条件。由于仅时间对齐不能解决人群水平的混杂,IDI与倾向得分加权或匹配相结合,以改善队列之间的协变量可比性。我们通过蒙特卡洛模拟研究评估所提出方法的有限样本性能。使用来自一项随机肿瘤试验的数据,我们模拟了一个具有诱导索引日期错位的外部对照分析,并显示IDI减少了与随机试验基准的差异。IDI为涉及延迟治疗启动的生存分析中的索引日期对齐提供了一种实用策略,并且在有合适外部对照可用时,可以与标准的协变量调整方法集成。

英文摘要

Externally controlled trials compare outcomes from a single-arm trial with external controls drawn from historical trials, registries, or observational studies. For time-to-event endpoints, a key challenge arises when follow-up is indexed at treatment initiation in the single-arm trial, but the external-control data are indexed at an earlier clinical milestone, such as diagnosis or relapse. This misalignment can induce immortal time bias, distort risk sets, and complicate the interpretation of survival comparisons. We propose Index Date Imputation (IDI), a truncation-aware approach for imputing comparable index dates for external-control patients in settings with delayed treatment initiation. IDI estimates the marginal distribution of treatment-initiation times in the target single-arm population while accounting for the fact that initiation times are observed only among patients who survive long enough to initiate treatment. The imputed index dates are then used to align follow-up and enforce comparable truncation conditions in the external-control cohort. Because temporal alignment alone does not address population-level confounding, IDI is combined with propensity score weighting or matching to improve covariate comparability between cohorts. We evaluate the finite-sample performance of the proposed approach through Monte Carlo simulation studies. Using data from a randomized oncology trial, we emulate an externally controlled analysis with induced index-date misalignment and show that IDI reduces discrepancy from the randomized trial benchmark. IDI provides a practical strategy for index-date alignment in survival analyses involving delayed treatment initiation and can be integrated with standard covariate-adjustment methods when suitable external controls are available.

9. 经济金融与社会科学统计 4 篇

2606.18512 2026-06-18 econ.EM stat.ME 新提交

Causal Forecasting in Panel Data: A Two-Way Synthetic Forecasting Approach

面板数据中的因果预测:一种双向合成预测方法

Dennis Shen

AI总结 针对面板数据中未经历干预的目标单元的未来结果预测问题,提出双向合成预测(TWSF)方法,结合合成控制与时间序列外推,给出有限样本误差界和渐近正态性,并通过NFL体育场开放案例验证。

详情
AI中文摘要

估计面板数据中的因果效应是政策评估的核心问题。现有方法主要解决回顾性问题:在观测面板期间,目标单元在不同干预下会发生什么?然而,在许多应用中,决策者面临前瞻性问题:在观测面板之外,目标单元在尚未经历的干预下会发生什么?本文通过将基于合成控制的回顾性反事实逻辑与多元时间序列预测的外推结构相结合,开发了一个回答此类因果预测问题的框架。基于证明合成控制中单元侧回归合理性的潜在因子模型,我们对潜在时间因子施加低秩时间结构,以识别前瞻性因果预测估计量。我们通过双向合成预测估计量(TWSF)实施这一策略,该估计量从预处理结果中学习跨单元关系,并将其与从感兴趣干预下的处理单元轨迹中学习的时间序列模型相结合。在适当条件下,我们建立了有限样本预测误差界,该误差界意味着逐点一致性,并引入正交化校正,得到渐近正态性,从而实现逐点推断。我们将该框架扩展到固定多步预测视界,通过直接和递归两种程序,每种程序都继承了类似的逐点保证。我们通过模拟研究验证了理论,并通过研究2020赛季开放NFL体育场对公共卫生的影响,说明了TWSF的实际效用。

英文摘要

Estimating causal effects in panel data is a central problem in policy evaluation. Existing methods largely address retrospective questions of the form: what would have happened to a target unit under a different intervention during the observed panel? In many applications, however, decision-makers face prospective questions: what will happen to a target unit under an intervention it has not yet experienced, beyond the observed panel? This article develops a framework for answering such causal forecasting questions by integrating the retrospective counterfactual logic of synthetic-controls-based approaches with the extrapolative structure of multivariate time-series forecasting. Building on the latent factor models that justify unit-side regressions in synthetic controls, we impose low-rank temporal structure on the latent time factors to identify prospective causal forecast estimands. We operationalize this strategy through the Two-Way Synthetic Forecasting estimator, or TWSF, which learns cross-unit relationships from pre-treatment outcomes and combines them with a time-series model learned from treated donor trajectories under the intervention of interest. Under suitable conditions, we establish finite-sample forecasting error bounds that imply pointwise consistency and introduce an orthogonalized correction that yields asymptotic normality and thus enables pointwise inference. We extend the framework to fixed multi-step forecasting horizons through both direct and recursive procedures, each of which inherits analogous pointwise guarantees. We corroborate the theory with simulation studies and illustrate the practical utility of TWSF by studying the public-health impact of opening NFL stadiums during the 2020 season.

2606.19087 2026-06-18 stat.ME 新提交

What does ethnic density represent? Spatial co-occurrence networks of a widely used contextual measure using harmonised UK small-area census data

族群密度代表什么?基于统一英国小区域普查数据的广泛使用的背景测量的空间共现网络

Joseph Lam

AI总结 通过分析英国23.9万小区域普查数据,使用混合图模型和空间自相关分析,揭示族群密度并非单一背景标量,不同族群的密度百分比代表不同的邻里背景特征。

详情
AI中文摘要

族群密度在流行病学和健康地理学中广泛用作背景暴露,但很少被作为测量问题本身进行研究。等价的百分比值可能代表不同群体和地区的不同邻里背景,特别是在迁移、宗教、语言、家庭结构和社会经济条件空间共现的情况下。利用统一的英国普查数据,我分析了239,023个小区域普查数据,将族群密度作为探索性背景共现结构进行研究。我针对八个族群密度目标估计了英国范围的混合图模型(MGM),使用了239,019个完整案例和每个目标特定模型32个节点。随后,仅英格兰的空间分析使用k近邻输出区域质心(k=8)估计LISA和空间调整残差网络。族群密度并不表现为单一背景标量。在英国范围的MGM中,不同群体保留的最强目标-邻居边不同。亚洲密度与中东/亚洲出生比例(0.59)关联最强,印度密度与印度教比例(0.55)关联最强,巴基斯坦密度与穆斯林比例(0.47)关联最强,孟加拉国密度与穆斯林比例(0.23)关联最强,黑人密度与非洲出生比例(0.42)关联最强,白人密度与中东/亚洲出生比例(0.35)关联最强。仅英格兰的族群密度测量具有强空间自相关,全局莫兰指数从混合比例的0.57到白人比例的0.90。在对英格兰区域和局部空间滞后进行残差化后,64.3%至96.4%的原始目标-节点边在族群密度网络中持续存在。等价百分比值在不同族群之间不一定可比。这对估计量定义、调整策略以及城市健康研究中族群密度和其他捆绑背景测量的解释具有意义。

英文摘要

Ethnic density is widely used in epidemiology and health geography as a contextual exposure, yet it is rarely examined as a measurement problem in its own right. Equivalent percentage values may represent different neighbourhood contexts across groups and places, particularly where migration, religion, language, household structure and socioeconomic conditions are spatially co-located. Using the harmonised Unified UK Census Data release, I analysed 239,023 small-area census data to examine ethnic density as an exploratory contextual co-occurrence construct. I estimated UK-wide mixed graphical models (MGM) for eight ethnic-density targets using 239,019 complete cases and 32 nodes per target-specific model. England-only spatial analyses then used k-nearest-neighbour Output Area centroids (k = 8) to estimate LISA and spatially adjusted residual networks. Ethnic density did not behave as a single contextual scalar. In the UK-wide MGM, the strongest retained target-neighbour edges differed across groups. Asian density was linked most strongly with Middle East/Asia-born share (0.59), Indian density with Hindu share (0.55), Pakistani density with Muslim share (0.47), Bangladeshi density with Muslim share (0.23), Black density with Africa-born share (0.42), and White density with Middle East/Asia-born share (0.35). England-only ethnic-density measures were strongly spatially autocorrelated, with Global Moran's I ranging from 0.57 for Mixed share to 0.90 for White share. After residualising against English region and local spatial lag, 64.3% to 96.4% of original target-node edges persisted across ethnic-density networks. Equivalent percentage values are not necessarily comparable across ethnic groups. This has implications for estimand definition, adjustment strategies, and the interpretation of ethnic density and other bundled contextual measures in urban health research.
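The spatial autocorrelation figures quoted above are values of Global Moran's I under k-nearest-neighbour weights. The sketch below shows the textbook statistic with binary kNN weights on a toy 10 x 10 lattice; the lattice, the choice k = 4 and both test surfaces are hypothetical and stand in for the Output Area centroids (k = 8) used in the study, whose exact weighting scheme is not stated in the abstract.

```python
def knn_neighbours(points, k):
    """Indices of the k nearest other points for each point (binary kNN weights)."""
    neighbours = []
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def morans_i(values, neighbours):
    """Global Moran's I with weight 1 for each listed neighbour and 0 otherwise."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)   # sum of all spatial weights
    cross = sum(z[i] * z[j] for i, nb in enumerate(neighbours) for j in nb)
    return (n / s0) * cross / sum(zi * zi for zi in z)

grid = [(x, y) for x in range(10) for y in range(10)]   # toy lattice of area centroids
nbrs = knn_neighbours(grid, k=4)

i_gradient = morans_i([float(x) for x, _ in grid], nbrs)            # smooth east-west trend
i_checker = morans_i([float((x + y) % 2) for x, y in grid], nbrs)   # alternating pattern

print(i_gradient, i_checker)
```

The smooth trend yields a strongly positive value and the alternating pattern a negative one, which gives a sense of scale for the reported range of 0.57 to 0.90.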

2606.18545 2026-06-18 stat.AP q-fin.RM 新提交

The Gini-Bayes Connection: The CAP Slope as Bayes' Theorem, with Applications to Weight of Evidence, Somers' $D$, and Calibration

Gini-Bayes 联系:CAP 斜率作为贝叶斯定理,及其在证据权重、Somers' $D$ 和校准中的应用

Denis Burakov

AI总结 本文明确将累积精度曲线 (CAP) 的斜率识别为贝叶斯定理在累积坐标下的形式,并由此推导出证据权重、信息值、准确率比、Somers' $D$ 和 Gini 系数之间的几何关系,同时提出基于 Gini 系数差异的校准诊断方法。

Comments 19 pages, 7 figures, 6 tables. Code and data: https://github.com/deburky/gini-bayes-paper

详情
AI中文摘要

累积精度曲线 (CAP) 的概率解释在工业界有着悠久的历史。Falkenstein, Boral 和 Carty (2000) 以离散形式指出,在某个分数百分位处的违约率等于组合平均违约率乘以功效曲线的局部斜率;van der Burgt (2008, 2019) 将其形式化为连续恒等式 $p(D\mid x) = p_D\, dy/dx$,并将连续形式作为工作事实引入;Tasche (2009) 分析了由此产生的校准方法;Voloshyn 和 Voloshyn (2023) 将贝叶斯定理 $f(x\mid D)=p(D\mid x) f(x)/p_D$ 代入面积积分,并将 Gini 系数写为校准曲线的泛函。斜率本身已存在于这一谱系中(van der Burgt 的 $dy/dx$ 是两个累积微分的比值),但它是作为引用的工作事实出现的,从未被视为贝叶斯定理。我们明确地做出这一识别,并阐述其后果。首先,CAP 斜率是累积坐标下的贝叶斯定理:它所恢复的标准化 PD 是经先验概率重新缩放的后验概率。本文的重点在于这一解读所揭示的两个结果。几率形式将证据权重(似然比的对数,即贝叶斯因子)和信息值置于同一几何框架内(某点的证据权重是“坏”和“好”CAP 斜率比值的对数)。准确率比、Somers' $D_{xy}$ 和 Gini 系数 $(2A-1)/(1-p_D)$ 被揭示为同一数值的三种计算方式。在比较模式下(实际结果与模型预测对比),同一恒等式恢复了累积坐标下的可靠性图,其中经验 Gini 系数与模型隐含 Gini 系数之间的差距符号可作为校准诊断指标。一个五组示例以离散形式呈现了所有恒等式,一个核密度示例将其推广到连续情形。

英文摘要

The probabilistic reading of the cumulative accuracy profile (CAP) has a long industry lineage. Falkenstein, Boral and Carty (2000) state, in discrete form, that the default rate at a score percentile equals the portfolio average rate times the local slope of the power curve; van der Burgt (2008, 2019) formalizes this as the continuous identity $p(D\mid x) = p_D\, dy/dx$ and imports the continuous form as a working fact; Tasche (2009) analyzes the resulting calibration method; Voloshyn and Voloshyn (2023) substitute Bayes' theorem, $f(x\mid D)=p(D\mid x) f(x)/p_D$, into the area integral and write the Gini as a functional of the calibration curve. The slope itself is already in the lineage (van der Burgt's $dy/dx$ is the ratio of the two cumulative differentials), but it enters as a cited working fact, never as Bayes' theorem. We make that identification explicit and draw out its consequences. First, the CAP slope is Bayes' theorem in cumulative coordinates: the standardized PD it recovers is the posterior probability rescaled by the prior. The weight of the paper then falls on two results this reading unlocks. The odds form places the weight of evidence (the log of the likelihood ratio, i.e. the Bayes factor) and the information value inside one geometry (the weight of evidence at a point is the log of the ratio of the "bad" and "good" CAP slopes). The accuracy ratio, Somers' $D_{xy}$, and the Gini $(2A-1)/(1-p_D)$ are revealed as one number computed three ways. Run in comparison mode (realized outcomes against model claims), the same identity recovers the reliability diagram in cumulative coordinates, with the sign of the gap between the empirical and model-implied Gini coefficients as a calibration diagnostic. A worked five-band example carries every identity in discrete form, and a kernel-density example extends them to the continuous case.
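
The discrete identities named in the abstract can be checked on a toy portfolio. The sketch below is not the paper's worked example: the five bands and their counts are made-up numbers, and the sign convention for weight of evidence (log of bad slope over good slope) follows the wording of the abstract. It recovers each band's default rate as $p_D$ times the CAP slope and compares the Gini $(2A-1)/(1-p_D)$ with Somers' $D$ computed from pair counts.

```python
import math

# Hypothetical five-band portfolio, ordered from riskiest to safest band.
counts = [100, 200, 300, 250, 150]    # obligors per band (made-up numbers)
defaults = [30, 24, 15, 5, 1]         # defaults per band (made-up numbers)

N, D = sum(counts), sum(defaults)
G = N - D                              # number of non-defaulters ("goods")
p_D = D / N                            # portfolio default rate (the prior)
goods = [n - d for n, d in zip(counts, defaults)]

pd_from_slope, woe = [], []
area, y_prev = 0.0, 0.0
for n_k, d_k, g_k in zip(counts, defaults, goods):
    dx = n_k / N                       # band width on the CAP x-axis
    bad_slope = (d_k / D) / dx         # CAP slope dy/dx in this band
    good_slope = (g_k / G) / dx        # slope of the analogous curve for goods
    pd_from_slope.append(p_D * bad_slope)          # p(D|x) = p_D * dy/dx
    woe.append(math.log(bad_slope / good_slope))   # log likelihood ratio
    y = y_prev + d_k / D
    area += dx * (y_prev + y) / 2      # trapezoid rule for the area under the CAP
    y_prev = y

gini = (2 * area - 1) / (1 - p_D)

# Somers' D over all (defaulter, non-defaulter) pairs; ties within a band count zero.
K = len(counts)
concordant = sum(defaults[i] * goods[j] for i in range(K) for j in range(K) if i < j)
discordant = sum(defaults[i] * goods[j] for i in range(K) for j in range(K) if i > j)
somers_d = (concordant - discordant) / (D * G)
```

In this toy setting `pd_from_slope` matches the observed band default rates `d_k / n_k`, and `gini` agrees with `somers_d` up to floating-point error, which is the "one number computed three ways" claim in discrete form.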

2506.01101 2026-06-18 cs.CE q-fin.MF stat.CO 版本更新

Gradient-based Stochastic Optimization of Utility-based Shortfall Risk

基于梯度的随机优化在效用型短缺风险中的应用

Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat

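AI summary This paper extends utility-based shortfall risk (UBSR) to unbounded random variables, proposes a gradient estimator for it, and gives non-asymptotic convergence bounds for a stochastic gradient algorithm under strongly convex, convex, and non-convex objectives.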

AI中文摘要

我们考虑效用型短缺风险(UBSR)的估计和优化问题。我们将UBSR扩展到可能无界的随机变量。我们将诸如熵风险、期望分位数风险、风险价值和二次风险等主要风险度量作为UBSR的特例。在估计方面,我们推导了UBSR的经典样本均值逼近(SAA)估计器的平均绝对误差(MAE)和均方误差(MSE)的非渐近界。在优化方面,我们推导了光滑参数化下UBSR梯度的表达式。我们提出了UBSR的梯度估计器,并推导了该估计器的MAE和MSE的非渐近界。我们将上述梯度估计器纳入随机梯度(SG)优化算法,并推导了我们的SG算法在优化UBSR时针对三种目标(即强凸、凸和非凸)的收敛速度的非渐近界。最后,我们在金融应用上进行实验,以展示我们提出的UBSR估计和优化算法的性能。

英文摘要

We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR). We extend UBSR to cover possibly unbounded random variables. We cover prominent risk measures such as entropic risk, expectile risk, Value-at-Risk, and quadratic risk as special cases of the UBSR. In the context of estimation, we derive non-asymptotic bounds on the mean absolute error (MAE) and the mean-squared error (MSE) of the classical sample-average approximation (SAA) estimator for the UBSR. In the context of optimization, we derive an expression for the gradient of UBSR under a smooth parameterization. We propose a gradient estimator for the UBSR and derive non-asymptotic bounds on MAE and MSE for this estimator. We incorporate the aforementioned gradient estimator into a stochastic gradient (SG) optimization algorithm and derive non-asymptotic bounds on the convergence rate of our SG algorithm for optimizing UBSR under three objectives, namely, strongly convex, convex and non-convex. Finally, we conduct experiments on financial applications to demonstrate the performance of our proposed UBSR estimation and optimization algorithms.
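
A sample-average approximation of UBSR reduces to one-dimensional root finding. The sketch below assumes the common convention $\mathrm{SR}(X) = \inf\{t : \mathbb{E}[\ell(-X - t)] \le \lambda\}$ with an increasing loss $\ell$; the paper's exact sign convention, the bisection bracket, and the sample distribution are assumptions made here for illustration. With the exponential loss and $\lambda = 1$ this is the entropic risk, which has a closed form to compare against.

```python
import math
import random

def ubsr_saa(samples, loss, lam, lo=-20.0, hi=20.0, tol=1e-10):
    """SAA estimate of inf{t : mean(loss(-x - t)) <= lam}, found by bisection.

    Assumes `loss` is increasing, so the sample mean is decreasing in t.
    """
    def g(t):
        return sum(loss(-x - t) for x in samples) / len(samples) - lam

    if not (g(lo) > 0 >= g(hi)):
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid      # constraint still violated: t must increase
        else:
            hi = mid      # constraint satisfied: try a smaller t
    return hi

beta = 0.5                                   # hypothetical risk-aversion level
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]

estimate = ubsr_saa(samples, lambda z: math.exp(beta * z), lam=1.0)
# Entropic risk of the empirical distribution, for comparison.
closed_form = math.log(sum(math.exp(-beta * x) for x in samples) / len(samples)) / beta
```

Bisection is used only because it needs nothing beyond monotonicity; the estimator analysed in the paper may be computed differently.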

10. Data Privacy, Robustness, and Fairness (2 papers)

2606.18467 2026-06-18 stat.ML cs.LG 新提交

ToolChain-CRC: Conformal Risk Control for Agentic AI Under Retrieval and Tool-Use Drift

ToolChain-CRC: 检索与工具使用漂移下代理型AI的共形风险控制

Jeffery Opoku, David Banahene

发表机构 * The University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德谷分校) Florida International University(佛罗里达国际大学)

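AI summary To control risk for retrieval-augmented and tool-using agents under drift, the paper proposes ToolChain-CRC, which builds trajectory-level risk scores and calibrates an accept-or-intervene rule to obtain provable trajectory-level risk control.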

Comments 26 pages, 11 figures

AI中文摘要

现代AI代理检索文档、调用工具、检查中间信息,然后产生最终答案或行动。这产生了一个仅从最终答案无法察觉的风险控制问题。即使检索薄弱、工具输出错误或早期步骤缺乏支持,最终响应也可能看起来可接受。我们提出ToolChain-CRC,一种针对漂移下检索增强和工具使用代理的共形风险控制方法。该方法将每次代理运行视为动作、观察和最终输出的完整轨迹。它构建步骤级风险评分,将其组合成轨迹风险评分,校准接受或干预规则,并添加一个随时报警,可在最终答案前停止风险运行。我们在可交换校准运行下证明了轨迹级风险控制,给出了具有可审计常数的漂移感知扩展,并通过超鞅构造证明了随时升级规则。实验涵盖合成工具链漂移、RAG/工具使用压力测试、基于SQuAD的公共检索任务、无API代理问答案例研究、消融实验、目标风险敏感性检查、20种子鲁棒性检查、漂移边界审计以及实时RAG/工具使用代理基准。在这些设置中,仅基于最终答案的校准可能遗漏检索和工具故障,而轨迹级校准将接受轨迹的风险保持在目标之下。

英文摘要

Modern AI agents retrieve documents, call tools, check intermediate information, and then produce a final answer or action. This creates a risk-control problem that is not visible from the final answer alone. A final response may look acceptable even when the retrieval was weak, a tool output was wrong, or an earlier step was unsupported. We propose ToolChain-CRC, a conformal risk-control method for retrieval-augmented and tool-using agents under drift. The method treats each agent run as a full trajectory of actions, observations, and final output. It builds step-level risk scores, combines them into a trajectory risk score, calibrates an accept-or-intervene rule, and adds an anytime alarm that can stop risky runs before the final answer. We prove trajectory-level risk control under exchangeable calibration runs, give a drift-aware extension with auditable constants, and prove an anytime escalation rule through a supermartingale construction. Experiments cover synthetic tool-chain drift, RAG/tool-use stress tests, public SQuAD-derived retrieval tasks, an API-free agentic QA case study, ablations, target-risk sensitivity checks, 20-seed robustness checks, a drift-margin audit, and a live RAG/tool-use agent benchmark. Across these settings, final-answer-only calibration can miss retrieval and tool failures, while trajectory-level calibration keeps accepted-trajectory risk below the target.
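
The calibration step can be illustrated with the generic conformal risk control recipe. This is a sketch under stated assumptions, not the ToolChain-CRC procedure: the function name, the accept-if-score-at-most-threshold rule, and the toy scores are hypothetical. The loss of a run is 1 when it is accepted and turns out to have failed, and the threshold is the largest calibration score whose inflated empirical risk $(n\hat{R}(\lambda) + B)/(n+1)$ stays at or below the target $\alpha$, with $B$ the maximum loss.

```python
def calibrate_threshold(scores, failures, alpha, max_loss=1.0):
    """Pick the largest acceptance threshold on trajectory risk scores.

    scores   : trajectory risk score of each calibration run (higher = riskier)
    failures : 1 if the run failed, 0 otherwise
    A run is accepted when its score is <= the returned threshold.
    Returns None when no threshold meets the target (intervene on every run).
    """
    n = len(scores)
    best = None
    for lam in sorted(set(scores)):
        # Number of calibration runs that would be accepted yet failed.
        accepted_failures = sum(f for s, f in zip(scores, failures) if s <= lam)
        if (accepted_failures + max_loss) / (n + 1) <= alpha:
            best = lam
        else:
            break  # accepted failures only grow as the threshold increases
    return best

# Hypothetical calibration set of nine runs.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cal_failures = [0, 0, 0, 0, 1, 0, 1, 1, 1]
threshold = calibrate_threshold(cal_scores, cal_failures, alpha=0.25)
```

Note that this generic recipe bounds the expected rate of accepted-and-failed runs under exchangeability; the drift-aware extension and the anytime alarm described in the abstract are not represented here.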

2602.06518 2026-06-18 cs.CR stat.ME stat.ML 版本更新

Sequential Auditing for f-Differential Privacy

f-差分隐私的顺序审计

Tim Kutta, Martin Dunsche, Yu Wei, Vassilis Zikas

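AI summary Proposes sequential auditors based on output samples that adaptively determine the sample size needed to detect f-DP violations, reducing sampling cost and supporting both whitebox and blackbox settings.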

Comments 24 pages, 10 figures

AI中文摘要

我们提出了新的审计器,用于基于输出样本评估算法的差分隐私(DP)。这种经验性审计器常用于检查算法正确性和实现错误。现有的大多数审计器是基于批次的,或针对传统的$(\varepsilon,\delta)$-DP概念;通常两者兼具。在这项工作中,我们将重点转向高度表达性的隐私概念$f$-DP,其中整个隐私行为由单个权衡曲线捕获。我们的审计器检测整个隐私谱上的违规行为,并具有统计显著性保证,这些保证由理论和模拟支持。最重要的是,与先前工作相比,我们的审计器不需要用户指定样本大小作为输入。相反,它们自适应地确定做出决策所需的近最优样本数量,从而避免了在许多审计研究中常见的过大样本量。这种采样成本的降低对于昂贵的训练过程(如DP-SGD)尤其有益。我们的方法支持白盒和黑盒设置,并且也可以在单次运行框架中执行。

英文摘要

We present new auditors to assess Differential Privacy (DP) of an algorithm based on output samples. Such empirical auditors are commonly used to check for algorithmic correctness and implementation bugs. Most existing auditors are batch-based or targeted toward the traditional notion of $(\varepsilon,\delta)$-DP; typically both. In this work, we shift the focus to the highly expressive privacy concept of $f$-DP, in which the entire privacy behavior is captured by a single tradeoff curve. Our auditors detect violations across the full privacy spectrum with statistical significance guarantees, which are supported by theory and simulations. Most importantly, and in contrast to prior work, our auditors do not require a user-specified sample size as an input. Rather, they adaptively determine a near-optimal number of samples needed to reach a decision, thereby avoiding the excessively large sample sizes common in many auditing studies. This reduction in sampling cost becomes especially beneficial for expensive training procedures such as DP-SGD. Our method supports both whitebox and blackbox settings and can also be executed in one-run frameworks.

11. Datasets, Software, and Applications (16 papers)

2606.19294 2026-06-18 stat.AP 新提交

Accelerating Network-Agent Dispersion: Territorial Behavior and Directionally Biased Lazy Random Walks

加速网络智能体分散:领地行为与方向性偏倚懒惰随机游走

Li Zeng, Steve Alpern

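AI summary Studies how territorial behavior and directionally biased lazy random walks accelerate network-agent dispersion; theoretical analysis and simulations show that territorial behavior sharply reduces dispersion time and that directional bias adds further speedup when combined with it.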

AI中文摘要

领地行为可以极大地加速网络上智能体的分散。本文研究一个网络智能体分散问题,其中m个自主智能体在离散时间内在连通图上移动,并寻求一种配置,使得没有两个智能体占据同一节点。我们关注m=n的分散情况,此时成功配置中每个节点恰好有一个智能体。在基线模型中,每个智能体遵循具有共同懒惰参数p的懒惰随机游走。该过程定义了一个有限吸收马尔可夫链,期望吸收时间用于衡量分散效率。我们引入了两种局部行为扩展:领地行为,即单独占据节点的智能体声明该节点并排斥后来的到达者;方向性偏倚,即智能体在路径和环上共享一个优先移动方向。在三个智能体的路径和环网络上的精确计算以及更大实例上的蒙特卡洛模拟表明,领地行为显著减少了期望分散时间,且网络规模越大相对减少越大。方向性偏倚在大多数小网络情况下效果有限,但与领地行为结合时能产生额外的大幅加速。特别是,模拟显示当所有智能体从一个节点出发时,在L100和C100上分别减少了99.22%和97.48%。这些结果表明简单的局部移动规则如何强烈影响分散式网络多智能体系统的全局分散时间。

英文摘要

Territorial behavior can greatly accelerate decentralized agent dispersion on networks. This paper studies a network-agent dispersion problem in which m autonomous agents move in discrete time on a connected graph and seek a configuration in which no two agents occupy the same node. We focus on the case m = n, where n is the number of nodes, so that successful configurations contain exactly one agent per node. In the baseline model, each agent follows a lazy random walk with a common laziness parameter p. This process defines a finite absorbing Markov chain, and the expected absorption time is used to measure dispersion efficiency. We introduce two local behavioral extensions: territorial behavior, in which an agent that is alone at a node claims that node and repels later arrivals, and directional bias, in which agents share a preferred direction of movement on paths and cycles. Exact calculations on three-agent path and cycle networks and Monte Carlo simulations on larger instances show that territorial behavior substantially reduces expected dispersion time, with larger relative reductions as network size increases. Directional bias alone has limited effect in most small-network cases, but when combined with territorial behavior it can produce large additional speedups. In particular, the simulations show reductions of 99.22% on L100 and 97.48% on C100 when all agents start from one node. These results show how simple local movement rules can strongly affect global dispersion time in decentralized networked multi-agent systems.
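
A single Monte Carlo run of the dispersion process can be sketched as follows. Several modelling details are assumptions made for illustration and may differ from the paper: `p` is taken to be the probability of staying put, the graph is a cycle, an agent that claims a node simply stops moving, and visitors to a claimed node keep walking rather than being actively repelled. Directional bias is not modelled.

```python
import random

def disperse(n, p, territorial, rng, max_steps=1_000_000):
    """Simulate n agents on the cycle C_n, all starting at node 0.

    Returns (steps, positions) at the first time every node holds one agent.
    """
    pos = [0] * n
    settled = [False] * n
    for step in range(max_steps + 1):
        occupancy = {}
        for agent, node in enumerate(pos):
            occupancy.setdefault(node, []).append(agent)
        if len(occupancy) == n:            # one agent per node: absorbed
            return step, pos
        if territorial:
            for agents in occupancy.values():
                if len(agents) == 1:       # alone at a node: claim it
                    settled[agents[0]] = True
        for agent in range(n):
            if settled[agent] or rng.random() < p:
                continue                   # territorial owner or lazy step
            pos[agent] = (pos[agent] + rng.choice((-1, 1))) % n
    raise RuntimeError("no dispersion within max_steps")

rng = random.Random(1)
steps_territorial, final_territorial = disperse(5, 0.5, True, rng)
steps_baseline, final_baseline = disperse(5, 0.5, False, rng)
```

Averaging `steps` over many runs gives a Monte Carlo estimate of the expected absorption time for each rule, which is the quantity the abstract uses to compare the baseline and territorial models.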

2606.18544 2026-06-18 stat.AP 新提交

Chess Signatures of Play

对弈的棋谱签名

Christian Turk, Nicholas Polson

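AI summary Uses the signature transform of rough-path theory to extract invariant features of the order and interaction of in-game events, and builds a signature-kernel two-sample test and an anytime-valid cheating-detection method that raises detection power while controlling the error rate.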

详情
AI中文摘要

一局棋是一个流:一个按时间排序的走法序列,每个走法携带引擎评估、准确度度量、局面复杂度度量和时钟读数。我们将一局棋建模为多元路径,并应用粗路径理论的签名变换,获得一个重参数化不变、分级的特征集,记录棋局内事件的顺序和交互,无需参数化似然。我们证明,棋手的对弈法则可以从期望签名中识别,直至树状等价;构造路径空间上的签名核双样本检验;并将作弊检测重新表述为任意时序有效的序列检验:签名符合度得分成为一个e过程,其误差通过Ville不等式对每个样本量同时控制,波动在中等偏差尺度上校准。判别信息存在于签名的Levy面积中,该面积衡量准确度是否恰好当局面变难时上升——这是引擎辅助的特征,而聚合的匹配率统计忽略了这一点。在对照研究中,该检验保持精确的第一类错误控制,检测能力从对细微辅助的微不足道上升到对明显辅助的0.98,中位检测时间与增长率预测一致。校准至马格努斯·卡尔森记录在案的精英准确度后,该监测器不会标记世界冠军级别的对弈;我们展示了作弊策略,这些策略使所有聚合统计量(包括Regan系统的最佳走法频率z分数)保持不变,却被签名干净地捕获——精确说明了顺序感知、任意时序有效的检验如何加强现有的国际象棋反作弊方法。

英文摘要

A game of chess is a stream: a time-ordered sequence of moves, each carrying an engine evaluation, a measure of accuracy, a measure of position complexity, and a clock reading. We model a game as a multivariate path and apply the signature transform of rough-path theory to obtain a reparametrization-invariant, graded feature set that records the order and interaction of in-game events without a parametric likelihood. We show that a player's law of play is identifiable from the expected signature up to tree-like equivalence, construct a signature-kernel two-sample test on path space, and recast cheating detection as an anytime-valid sequential test: a signature conformance score becomes an e-process whose error is controlled for every sample size at once by Ville's inequality, with fluctuations calibrated on the moderate-deviation scale. The discriminating information lives in the signature's Lévy areas, which measure whether accuracy rises precisely when positions become hard: the fingerprint of engine assistance that aggregate match-rate statistics discard. In a controlled study the test holds exact type-I control and detection power rises from negligible for subtle assistance to 0.98 for blatant assistance, with a median detection time matching the growth-rate prediction. Calibrated to Magnus Carlsen's documented elite accuracy, the monitor does not flag world-champion-level play. We also exhibit cheating strategies that leave every aggregate statistic, including the best-move-frequency z-score of the Regan system, unchanged yet are caught cleanly by the signature, making precise how an order-aware, anytime-valid test strengthens the prevailing approach to chess anti-cheating.
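
The order sensitivity of the Lévy area is easy to see on a two-channel path. The sketch below computes the signed area of a piecewise-linear path; reading the channels as position complexity and move accuracy is an illustrative assumption, and the paper's full signature features are not reproduced.

```python
def levy_area(path):
    """Signed Levy area of a piecewise-linear 2D path given as (x, y) points.

    Equals 0.5 * integral of (x - x0) dy - (y - y0) dx along the path.
    """
    x0, y0 = path[0]
    area = 0.0
    for (xa, ya), (xb, yb) in zip(path, path[1:]):
        # On a straight segment only the offset of its start point contributes.
        area += 0.5 * ((xa - x0) * (yb - ya) - (ya - y0) * (xb - xa))
    return area

# Two paths with identical endpoints, so identical total increments:
# the first channel rises first, or the second channel rises first.
first_then_second = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
second_then_first = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

area_a = levy_area(first_then_second)
area_b = levy_area(second_then_first)
```

The two paths share every aggregate increment yet have Lévy areas of opposite sign, which is the kind of ordering information that an aggregate match-rate statistic cannot see.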

2606.18567 2026-06-18 stat.ML cs.LG stat.AP stat.ME 新提交

Bridging Data Gaps in Structural Fragility Modeling through Transfer Learning: Methodology and Case Studies

通过迁移学习弥合结构易损性建模中的数据空白:方法与案例研究

Narges Saeednejad, Jamie Ellen Padgett

发表机构 * Department of Civil and Environmental Engineering, Rice University(Rice大学土木与环境工程系) Ken Kennedy Institute, Rice University(Rice大学肯尼迪研究所)

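AI summary Proposes a methodology-centered transfer learning framework that addresses domain shift, class imbalance, and scarce target labels, and validates through three case studies that it improves failure detection and predictive stability in low-data regimes.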

Comments 24 pages, 12 figures

AI中文摘要

本文提出了一个以方法为中心的迁移学习框架,用于在领域偏移、类别不平衡和目标标签稀缺的情况下进行易损性自适应,同时保持工程可解释性并支持不确定性下的决策。通过三个互补的案例研究展示了四种迁移学习策略(基于实例、基于参数、分层贝叶斯和多源):(i) 基于实例的迁移学习通过重要性加权,利用卡特里娜飓风观测数据演示了沿海桥梁易损性;(ii) 基于参数的迁移学习结合分层贝叶斯迁移学习,实现了跨层的部分合并和后验不确定性量化,利用伊恩飓风观测数据演示了住宅建筑易损性;(iii) 多源迁移学习融合多个分析易损性模型,学习源权重并进行正则化的目标域自适应,利用2001年尼斯夸利地震观测数据演示了地震桥梁易损性。在这些案例研究中,直接迁移源模型(即使用现有最先进模型)在领域偏移和严重类别不平衡下失败,而有针对性的自适应在低数据场景下显著提高了失效检测和预测稳定性。这些发现强调了在开发和自适应易损性模型时,需要对诊断、策略选择和不确定性报告提供系统指导。

英文摘要

This paper presents a methodology-centered transfer learning framework for fragility adaptation under domain shift, class imbalance, and scarce target labels while preserving engineering interpretability and supporting decision-making under uncertainty. Four transfer learning strategies (instance-based, parameter-based, hierarchical Bayesian, and multi-source) are demonstrated through three complementary case studies: (i) instance-based transfer learning via importance weighting, demonstrated on coastal bridge fragility using Hurricane Katrina observations; (ii) parameter-based transfer learning together with hierarchical Bayesian transfer learning, enabling partial pooling across strata and posterior uncertainty quantification, demonstrated on residential building fragility using Hurricane Ian observations; and (iii) multi-source transfer learning that fuses multiple analytical fragility models with learned source weights and regularized target-domain adaptation, demonstrated on seismic bridge fragility using observations from the 2001 Nisqually earthquake. Across these case studies, direct transfer of source models (i.e. using existing state-of-the-art models) fails under domain shift and severe class imbalance, while targeted adaptation substantially improves failure detection and predictive stability in low-data regimes. These findings highlight the need for systematic guidance on diagnostics, strategy selection, and uncertainty reporting when developing and adapting fragility models.

2606.18536 2026-06-18 stat.AP cs.SE 新提交

Analytics for Quality Assurance for Item Pools (AQuAP): Monitoring and Maintaining Item Bank Health in AI-Driven Assessment Systems

题库质量保证分析(AQuAP):AI驱动评估系统中题库健康的监控与维护

Alina A. von Davier, Xiaowan Zhang, Yigal Attali, Yena Park, Jacqueline Church, Andrew Runge, Geoff T. LaFlair, Alexander Tsigler

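AI summary Presents the AQuAP dashboard environment, which monitors item bank quality through metrics such as Effective Bank Size and supports large-scale automated and human-supported item development, helping keep item banks healthy for high-stakes tests.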

Comments 11 pages, 4 figures

AI中文摘要

教育评估的大规模数字化使得题库的持续监督既必要又复杂。本文提出了题库质量保证分析(AQuAP),一个用于监控试题质量和题库健康的仪表盘环境。AQuAP支持高利害测试中大规模试题生成程序的操作实施,这些程序包含在试题工厂(一个自动化和人工支持的测试开发框架)中。本文描述了AQuAP与试题开发过程的关系,概述了题库质量保证的更广泛度量框架,并强调了有效题库规模(EBS)作为题库活力的核心指标。EBS量化了在内容重复发生之前可以构建的独立测试会话数量,当与曝光度和使用度量结合时,它提供了对题库安全性、多样性和效率的洞察。我们进一步引入了题库健康度量,如最大曝光度、最大条件曝光度、调整后的有效题库规模和极少施测比例,所有这些都扩展了试题利用情况的图景。AQuAP展示了操作分析如何将心理测量概念转化为高容量、AI驱动的测试程序的质量保证工具。本文以多邻国英语测试(DET)流程为例进行说明。

英文摘要

The large-scale digitization of educational assessment has made the continuous oversight of item banks both essential and complex. This paper presents Analytics for Quality Assurance for Item Pools (AQuAP), a dashboard environment for monitoring item quality and item bank health. AQuAP supports the operational implementation of the large-scale item generation procedures for high-stakes tests as included in the Item Factory, a framework for automated and human-supported test development. The paper describes AQuAP in relation to the process of item development, outlines the broader metric framework for item-pool quality assurance, and highlights the Effective Bank Size (EBS) as one central indicator of pool vitality. EBS quantifies how many independent test sessions can be constructed before content repetition occurs and, when coupled with exposure and usage metrics, provides insight into item bank security, diversity, and efficiency. We further introduce bank-health metrics, such as maximum exposure, maximum conditional exposure, adjusted effective bank size, and the rarely-administered fraction, all of which extend this picture of item utilization. AQuAP illustrates how operational analytics can translate psychometric concepts into quality assurance tools for high-volume, AI-enabled testing programs. This work is illustrated with the Duolingo English Test (DET) processes.

2606.18436 2026-06-18 stat.ML cs.LG 新提交

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

逐点是否无意义?基于图神经网络的降水临近预报的多模态消融研究

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

发表机构 * Norwegian Meteorological Institute(挪威气象研究所)

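AI summary Using a multimodal graph neural network system, this study ablates radar, numerical weather prediction, surface observations, satellite data, and training losses for precipitation nowcasting. It finds that each modality improves a different aspect, and that point observations improve local skill but need a suitable loss function and uncertainty representation to improve radar fields.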

AI中文摘要

稀疏点观测在降水临近预报中日益可用,但尚不清楚它们能在多大程度上改善密集雷达场预报。我们通过北欧雷达区域的多模态图神经网络临近预报系统部分回答了这个问题。该模型预测未来两小时内每五分钟的降雨率,并采用雷达历史、MEPS数值天气预报、Netatmo地面观测、MSG卫星通道、随机噪声和基于CRPS的集合损失的不同组合进行训练。本研究设计为对操作相关信源和训练目标的消融。我们比较了仅雷达、NWP信息、站点信息、卫星信息、噪声增强和基于CRPS的配置,使用雷达网格、站点位置、降雨起始的互补诊断,以及oracle、位移和幅度评分。结果表明,每个信源改善了预报问题的不同方面。MEPS稳定了仅雷达外推,Netatmo观测改善了局部站点和起始诊断,卫星预测因子减少了某些站点级偏差,但在确定性使用时可能过早激活降雨。基于CRPS的配置提供了最一致的雷达网格增益,而卫星与CRPS的组合设置给出了最佳的整体oracle/DAS评分。这些结果不支持点观测对临近预报无用的结论,但表明局部观测技能和空间相干雷达场技能是不同的目标。实际意义是,稀疏观测可以提供有用的局部约束,但它们对雷达类场的益处取决于训练损失、不确定性表示以及观测支持在模型中的编码方式。

英文摘要

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
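
For readers unfamiliar with CRPS-based ensemble losses, the standard ensemble estimator is $\mathrm{CRPS} = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, with $X, X'$ drawn independently from the ensemble. The sketch below implements that textbook form for a single grid point; whether the paper uses this estimator or a variant (for example a bias-corrected one) is not stated in the abstract, so treat it as an assumption.

```python
def crps_ensemble(members, obs):
    """Ensemble CRPS for one scalar observation (lower is better)."""
    m = len(members)
    # Mean absolute error of the members against the observation.
    accuracy = sum(abs(x - obs) for x in members) / m
    # Mean absolute difference between all ordered pairs of members.
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread

# Hypothetical rain rates (mm/h) at one grid point.
two_member = crps_ensemble([0.0, 2.0], 1.0)
one_member = crps_ensemble([3.0], 1.0)
```

With a single member the spread term vanishes and the score reduces to the absolute error, which is why CRPS is often described as a probabilistic generalisation of MAE.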

2606.18280 2026-06-18 stat.AP cs.AI 新提交

IOAH3: Importance-Driven Adaptive Spatial Partitioning

IOAH3: 重要性驱动的自适应空间划分

Ehsaneddin Jalilian

发表机构 * Interdisciplinary Transformation University Austria(跨学科转型大学奥地利)

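AI summary Proposes IOAH3, which builds adaptive spatial partitions through multi-source feature extraction, Markov Random Field graph-cut optimisation, and data-driven hierarchical refinement, addressing the modifiable areal unit problem.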

AI中文摘要

我们提出IOAH3(重要性导向的自适应H3划分),一种用于构建地理参考观测域的数据驱动空间划分的计算方法。标准的空间聚合方法采用固定面积单元,例如行政边界或单一分辨率的均匀六边形网格,而不考虑每个区域中底层观测的信息内容。这导致了著名的可修改面积单元问题:统计和推断结果依赖于划分的任意选择,空间集中的现象在粗网格中被平均化,从而掩盖了精细尺度的结构。IOAH3通过三个阶段构建自适应划分来解决这一问题:多源特征提取和重要性评分,通过主成分分析对道路密度、POI密度、建筑密度和地形粗糙度信号进行,人口和洪水灾害数据作为辅助输入用于单元过滤和空间平滑;通过马尔可夫随机场图割优化进行空间单元选择,该优化在强制空间连续性的同时联合最大化每个单元的重要性;以及数据驱动的高重要性区域层次细化到更精细的H3分辨率级别,并通过邻居传播支持以避免孤立的精细分辨率孤岛。所得划分作为空间推断流程的输入,并在任何建模步骤之前提供了对划分敏感性问题的原则性解决方案。

英文摘要

We present IOAH3 (Importance-Oriented Adaptive H3 partitioning), a computational method for constructing data-driven spatial partitions of geo-referenced observation domains. Standard approaches to spatial aggregation adopt fixed areal units, such as administrative boundaries or uniform hexagonal grids at a single resolution, without regard to the informational content of the underlying observations in each region. This leads to the well-known modifiable areal unit problem: statistical and inferential results depend on the arbitrary choice of partition, and spatially concentrated phenomena are averaged out in coarse cells that obscure fine-scale structure. IOAH3 addresses this by constructing an adaptive partition in three stages: multi-source feature extraction and importance scoring via principal component analysis over road density, POI density, building density, and terrain roughness signals, with population and flood-hazard data entering as auxiliary inputs to cell filtering and spatial smoothness; spatial cell selection via Markov Random Field graph-cut optimisation, which jointly maximises per-cell importance while enforcing spatial contiguity; and data-driven hierarchical refinement of high-importance regions to finer H3 resolution levels, with neighbour-propagated support to avoid isolated fine-resolution islands. The resulting partitions serve as input to spatial inference pipelines and provide a principled resolution of the partition-sensitivity problem prior to any modelling step.

2606.18611 2026-06-18 cs.SD cs.AI cs.LG stat.ML 新提交

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

QC-GAN: 一种参数高效的四元数Conformer GAN用于高保真语音增强

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

发表机构 * The Asahi Shimbun Company(朝日新闻社) Tokyo Woman's Christian University(东京女子基督教大学)

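AI summary Proposes the parameter-efficient QC-GAN, which combines a Quaternion Conformer generator with MetricGAN training and reduces the parameter count through Hamilton-product weight sharing; on VoiceBank+DEMAND it reaches PESQ 3.48 with 0.89M parameters, comparable to state-of-the-art models more than twice its size.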

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech 2026

AI中文摘要

我们提出了一种参数高效的语音增强框架——四元数Conformer GAN(QC-GAN),它将四元数Conformer生成器与基于MetricGAN的训练相结合。汉密尔顿积通过结构化权重共享对幅度和相位进行编码,在减少层参数数量的同时保持其相互依赖性。采用度量学习判别器,通过优化近似感知评估分数来最大化感知质量。在VoiceBank+DEMAND数据集上,QC-GAN仅用0.89M参数就达到了3.48的语音质量感知评估(PESQ)分数,其性能与最先进模型相当,而参数量不到后者的一半。一个35K参数的变体实现了3.23的PESQ分数,以显著更少的参数超越了传统方法。在DNS-Challenge 3数据集上的评估进一步证实了其在真实世界条件下的泛化能力。

英文摘要

We propose a parameter-efficient speech enhancement framework, Quaternion Conformer GAN (QC-GAN), which combines a Quaternion Conformer generator with MetricGAN-based training. The Hamilton product encodes the magnitude and phase via structured weight sharing, reducing the number of layer parameters while preserving their interdependencies. A metric-learning discriminator is employed to maximize perceptual quality by optimizing approximate perceptual evaluation scores. On the VoiceBank+DEMAND dataset, QC-GAN achieves a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering performance comparable to state-of-the-art models at less than half their size. A 35K-parameter variant achieves a PESQ score of 3.23, surpassing conventional methods with significantly fewer parameters. Evaluation on the DNS-Challenge 3 dataset further confirms generalization to real-world conditions.
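
The weight sharing comes from the algebra of the Hamilton product: one quaternion weight (four real numbers) acts on all four components of a quaternion input. The sketch below shows the product itself and the usual parameter count of a quaternion linear layer against an unconstrained real layer of the same width. The function names are hypothetical, and how QC-GAN maps magnitude and phase onto quaternion components is not specified in the abstract.

```python
def hamilton(q, r):
    """Hamilton product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = r
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def real_weights(n_in, n_out):
    """Real weight counts (no biases) for n_in -> n_out quaternion units."""
    quaternion_layer = 4 * n_in * n_out        # one quaternion weight per pair
    real_layer = (4 * n_in) * (4 * n_out)      # dense layer on 4*n_in real inputs
    return quaternion_layer, real_layer

i, j = (0, 1, 0, 0), (0, 0, 1, 0)
ij = hamilton(i, j)   # the product is non-commutative: ij and ji differ in sign
ji = hamilton(j, i)
```

The fourfold reduction in `real_weights` is the generic saving of quaternion layers; the overall model size reported in the abstract also depends on architectural choices not shown here.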

2606.18539 2026-06-18 cs.LG stat.ML 新提交

TS-Fault: Benchmarking Time Series Forecasters Against Structural Faults

TS-Fault: 针对结构性故障的时间序列预测器基准测试

Yuyang Zhao, Lian Xu, Hao Miao, Chenxi Liu, Hao Xue

发表机构 * Ray-zyy

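AI summary Proposes the TS-Fault benchmark, which evaluates the robustness of time series forecasting models under parameterized fault scenarios organized along two axes (observation- vs mechanism-level; univariate vs multivariate). It finds that clean-data accuracy anti-correlates with robustness, that mechanism-level faults reshuffle rankings, and that foundation models are the most fragile.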

AI中文摘要

时间序列预测(TSF)支撑着能源、交通、金融和医疗等领域的关键决策,然而TSF模型几乎普遍通过在干净保留数据上的单一数字(如平均误差)进行排名,隐含假设该数字能预测部署可靠性。但实际故障并非独立同分布噪声,而是具有时间形状的结构化事件、断裂的跨变量依赖、伴随缺失的机制变化以及跨传感管道的因果传播。将TSF鲁棒性视为数据质量问题,我们提出TS-Fault,一个在显式、参数化且具有可控语义难度的故障场景下评估预测模型的基准。TS-Fault将重复出现的故障沿两个正交轴(观测级 vs 机制级;单变量 vs 多变量)组织为四种模式,并通过统一重要性评分将每种故障注入最关键的预测窗口。该设计使得鲁棒性能够针对模型实际依赖的结构进行测试,而非简化为通用噪声敏感性。我们在6个数据集、4种模式和5个难度级别上,采用配对干净/损坏协议评估了21个模型。结果揭示了三个与常见排行榜直觉相悖的发现:(i)干净数据准确性与鲁棒性负相关;(ii)干净排名在观测级故障下保持不变,但在机制级故障下重新洗牌;(iii)所有灾难性故障均发生在机制级故障下,基础模型在干净数据上准确率最高但表现出最大的脆弱性。代码已公开于 https://github.com/Ray-zyy/TS-Fault。

英文摘要

Time series forecasting (TSF) underpins consequential decisions in energy, transportation, finance, and healthcare, yet TSF models are almost universally ranked by a single number (e.g., average error) on clean held-out data, under the implicit assumption that it predicts deployed reliability. However, real faults are not i.i.d. noise but structured events with temporal shape, broken cross-variable dependencies, regime change coupled with missingness, and causal propagation across a sensing pipeline. Treating TSF robustness as a data-quality problem, we present TS-Fault, a benchmark that evaluates forecasting models under explicit, parameterized fault scenarios with controllable semantic difficulty. TS-Fault organizes recurring failures into four modes along two orthogonal axes (observation- vs mechanism-level; univariate vs multivariate) and injects each fault into the most prediction-critical window via a unified importance score. This design enables robustness to be tested against the structures models actually rely on, rather than reduced to generic noise sensitivity. We evaluate 21 models across 6 datasets, 4 modes, and 5 difficulty levels under a paired clean/corrupt protocol. The results reveal three findings that contradict common leaderboard intuition: (i) clean-data accuracy anti-correlates with robustness; (ii) clean rankings are preserved under observation-level faults but reshuffled under mechanism-level faults; and (iii) all catastrophic failures occur under mechanism-level faults, with foundation models achieving the highest clean-data accuracy yet exhibiting the greatest fragility. The code is publicly available at https://github.com/Ray-zyy/TS-Fault.

2606.18113 2026-06-18 stat.ME 新提交

Undocumented Behavior in the gsynth R package and its Consequences for Three Published Studies

gsynth R包中的未记录行为及其对三项已发表研究的影响

Beniamino Green, P. M. Aronow

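AI summary Finds that, under a specific combination of options, an implementation error in the gsynth package severely underestimates standard errors, raising false positive rates and affecting the conclusions of three APSR papers.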

AI中文摘要

在2025年12月CRAN上的1.3.1版本更新之前,gsynth(一个用于估计交互固定效应(IFE)模型的流行R包)可能严重且系统地低估标准误。当两个估计选项(inference = "parametric" 和 EM = TRUE)同时使用时,会出现这种低估,此时该包会对Gobillon和Magnac(2016)的IFE-EM估计量应用参数自助法。该包在2025年12月停止支持这种组合,最新文档现在描述参数自助法因理论上的不兼容性而不适用于IFE-EM估计量。我们的重点是在gsynth的1.3.1之前版本中发现的实现错误:当EM = TRUE时使用的参数自助法与Xu(2017)提出的算法不匹配,使用了样本内残差而非样本外误差。我们证明,仅此实现错误就可能导致低估数个数量级。我们进行了一项实证蒙特卡洛研究,在一系列州级面板数据集上随机分配安慰剂处理,并表明gsynth在现实环境中可能产生高假阳性率。我们识别出三篇发表在《美国政治科学评论》上的论文受到此行为的影响。重新分析这些论文的相关部分,我们表明:(i)纠正实现错误后,大多数发现变得不显著;(ii)使用Xu(2017)的广义合成控制方法替代IFE-EM后,所有发现均变得不显著。

英文摘要

Prior to the version 1.3.1 update on CRAN in December 2025, gsynth, a popular R package for estimating Interactive Fixed Effects (IFE) models, could drastically and systematically underestimate standard errors. This underestimation would occur when two estimation options (inference = "parametric", and EM = TRUE) were used together, in which case the package would apply a parametric bootstrap procedure to Gobillon and Magnac (2016)'s IFE-EM estimator. The package ceased supporting this combination in December 2025, and the latest documentation now describes the parametric bootstrap as not suitable for use with the IFE-EM estimator due to a theoretical incompatibility. Our focus is an implementation error we identified in the pre-1.3.1 versions of gsynth: the parametric bootstrap used when EM = TRUE did not match the algorithm proposed in Xu (2017), using in-sample residuals instead of out-of-sample errors. We show that this implementation error alone can cause underestimation by orders of magnitude. We conduct an empirical Monte Carlo study using randomly assigned placebo treatments on a series of state-level panel datasets, and show that gsynth could yield high false positive rates in realistic settings. We identify three papers published in the American Political Science Review that are affected by this behavior. Reanalyzing the relevant sections of these papers, we show that (i) correcting the implementation error renders most findings insignificant, and (ii) using Xu (2017)'s Generalized Synthetic Control method in place of IFE-EM renders every finding insignificant.

2606.07622 2026-06-18 cs.LG stat.AP 新提交

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

发表机构 * Korea Aerospace University(韩国航空大学) Korea Airports Corporation(韩国机场公社)

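AI summary Proposes a Transformer-based framework that uses historical queue length, waiting time, and passenger throughput data to forecast queue length and waiting time at departure gates and security checkpoints up to two hours ahead, supporting proactive queue management.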

Comments 10 pages, 6 figures, accepted at DASC 2026

AI中文摘要

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要,因为它能够实现主动的拥堵管理。然而,时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中,我们提出了一种旅客排队预测框架,该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构,利用过去登机口和安检点的队列长度和等待时间,以及值机岛的旅客吞吐量,来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部,以预测登机口和安检点的队列长度和等待时间。实验结果表明,该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

英文摘要

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

2509.10064 2026-06-18 cs.HC stat.ME 交叉投稿

From customer survey feedback to software improvements: Leveraging the full potential of data

从客户调查反馈到软件改进:充分利用数据的潜力

Erik Bertram, Nina Hollender, Sebastian Juhl, Sandra Loop, Martin Schrepp

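AI summary Proposes an end-to-end approach that extracts useful information from customer survey data, analyzes it with inferential statistics to drive software improvements, and presents a UX prototype dashboard for communicating the analyses to stakeholders.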

Comments 10 pages, 8 figures, published in Springer Nature

Journal ref
Lecture Notes in Computer Science, Volume 15795, Pages 3-19, 2025
AI中文摘要

将客户调查反馈数据转化为可用见解一直是大型软件企业面临的巨大挑战。尽管该领域有所改进,但在从数据中得出正确结论并将其引导到软件开发过程中时,一个主要障碍仍然存在。在本文中,我们提出了一种实用的端到端方法,说明如何从数据集中提取有用信息并利用这些信息驱动变革。我们描述了如何选择正确的度量指标、从客户最终用户那里收集适当的反馈、通过利用推断统计方法分析数据、使数据透明化,并最终用结果驱动变革。此外,我们展示了一个UX原型仪表板的示例,该仪表板可用于在公司内部向利益相关者传达分析结果。

英文摘要

Converting customer survey feedback data into usable insights has always been a great challenge for large software enterprises. Despite improvements in this field, a major obstacle often remains: drawing the right conclusions from the data and channeling them into the software development process. In this paper, we present a practical end-to-end approach to extracting useful information from a data set and leveraging it to drive change. We describe how to choose the right metrics to measure, gather appropriate feedback from customer end-users, analyze the data by leveraging methods from inferential statistics, make the data transparent, and finally drive change with the results. Furthermore, we present an example of a UX prototype dashboard that can be used to communicate the analyses to stakeholders within the company.

2605.26631 2026-06-18 stat.AP cs.LG 版本更新

Data-driven sparse identification of governing PDEs via knockoff filters and multi-criteria trade-offs

基于Knockoff滤波器与多准则权衡的数据驱动稀疏识别控制偏微分方程

Pongpisit Thanasutives, Naichang Ke, Yoshinobu Kawahara

发表机构 * RIKEN Center for Advanced Intelligence Project (AIP)(RIKEN先进人工智能项目中心) The University of Osaka(大阪大学)

AI总结 提出KO-PDE-IDENT框架,通过模型-X knockoff滤波器控制错误发现率,结合递归特征消除和多准则决策,从噪声数据中稀疏识别偏微分方程。

Comments 44 pages, 5 figures, 11 tables

详情
AI中文摘要

我们提出KO-PDE-IDENT,一个用于识别简洁偏微分方程(PDE)并控制错误发现率(FDR)的数据驱动框架。从噪声观测中发现PDE常常受到候选项之间极端多重共线性的阻碍,这导致典型的稀疏回归方法选择虚假项。为了解决这个问题,KO-PDE-IDENT首先通过具有有限样本FDR控制的模型-X knockoff滤波器挖掘潜在候选项的支持集,然后对存活的PDE备选方案进行细化和排序。该框架整合了三个组成部分。首先,通过将$\ell_{0}$约束的自适应最佳子集选择与SHapley Additive exPlanations(SHAP)相结合,构建knockoff特征统计量,产生有效且计算高效的差异统计量。其次,递归特征消除(RFE)过程去除边际贡献可省略的项,并通过knockoff扰动假设检验评估统计必要性。第三,最终模型选择被表述为一个多准则决策(MCDM)问题,其中最优控制方程是在预测精度、模型复杂度和系数不确定性等广泛准则之间取得最佳平衡的备选方案。我们在严重噪声污染下对五个经典PDE验证了KO-PDE-IDENT。实验结果表明,我们的框架可以精确恢复真实的PDE结构,消除错误发现同时保留所有真实潜在项,且系数估计误差低。

英文摘要

We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hindered by extreme multicollinearity among candidate terms, which causes typical sparse-regression methods to select spurious terms. To address this problem, KO-PDE-IDENT initially mines a support set of potential candidate terms via model-X knockoff filters with finite-sample FDR control, then refines and ranks the surviving PDE alternatives. The framework integrates three components. First, knockoff feature statistics are constructed by coupling $\ell_{0}$-constrained adaptive best-subset selection with SHapley Additive exPlanations (SHAP), yielding an effective and computationally efficient difference statistic. Second, a recursive feature elimination (RFE) procedure removes terms whose marginal contributions are dispensable and assesses statistical necessity through knockoff-perturbed hypothesis testing. Third, the final model selection is formulated as a multi-criteria decision-making (MCDM) problem, where the optimal governing equation is the alternative that best balances a wide range of criteria such as predictive accuracy, model complexity and coefficient uncertainty. We evaluate KO-PDE-IDENT on five canonical PDEs under severe noise corruption. Empirical results show that our framework can exactly recover the true PDE structure, eliminating false discoveries while retaining all true underlying terms, with low coefficient estimation error.
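The FDR-controlled selection step mentioned above can be illustrated with the standard knockoff+ thresholding rule of the knockoff filter literature. This is a minimal sketch, not the KO-PDE-IDENT implementation: the function names and the example statistics `w` are hypothetical, and the abstract's SHAP-based difference statistic is assumed to have been computed already, with large positive values favouring the original feature over its knockoff.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        false_proxy = 1 + sum(1 for x in w if x <= -t)  # estimate of false discoveries
        selected = max(1, sum(1 for x in w if x >= t))   # number of selections at t
        if false_proxy / selected <= q:
            return t
    return float("inf")  # no threshold meets the target, so select nothing


def knockoff_select(w, q):
    """Indices of candidate terms whose statistic clears the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical difference statistics for ten candidate PDE terms.
w = [5.0, 4.0, 3.5, 3.0, -0.5, 0.2, 2.5, -0.1, 1.5, 0.3]
selected_terms = knockoff_select(w, q=0.2)
```

Negative statistics act as a proxy for false positives, which is why the numerator counts them; the added 1 is what distinguishes knockoff+ from the plain knockoff threshold.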

2406.14399 2026-06-18 cs.LG cs.CV physics.ao-ph stat.ML 版本更新

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

面向全球站点业务天气预报的物理信息时间序列模型基准测试

Tao Han, Zhibin Wen, Zhenghao Chen, Dazhao Du, Song Guo, Lei Bai

发表机构 * Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR China(香港科技大学计算机科学与工程系) Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China(南方科技大学计算机科学与工程系) School of Computer and Information Sciences, University of Newcastle, Newcastle, Australia(纽卡斯尔大学计算机与信息科学学院) Hangzhou Innovation Institute of Beihang University, Hangzhou, China(北京航空航天大学杭州创新研究院) Shanghai Artificial Intelligence Laboratory, Shanghai, China(上海人工智能实验室)

AI总结 提出大规模观测数据集WEATHER-5K和物理信息模型PhysicsFormer,通过压力-风对齐和能量感知平滑损失增强物理一致性,在多个天气变量和极端事件预测上评估学术模型与业务系统的差距。

Comments Accepted by ICML2026

详情
AI中文摘要

时间序列预测(TSF)模型的发展常受限于缺乏全面的数据集,尤其是在全球站点天气预报(GSWF)中,现有数据集规模小、时间短且空间稀疏。为解决这一问题,我们引入了WEATHER-5K,一个大规模观测天气数据集,能更好地反映真实世界条件,支持改进模型训练和评估。尽管最近的TSF方法在基准测试上表现良好,但在捕捉复杂天气动态和极端事件方面落后于业务数值天气预报系统。我们提出了PhysicsFormer,一种物理信息预测模型,结合动态核心与Transformer残差来预测未来天气状态。通过压力-风对齐和能量感知平滑损失强制物理一致性,确保在捕捉复杂时间模式的同时保持合理的动力学。我们将PhysicsFormer及其他TSF模型与业务系统在多个天气变量、极端事件预测和模型复杂度上进行基准测试,全面评估学术TSF模型与业务预报之间的差距。数据集和基准测试实现可在以下网址获取:https://github.com/taohan10200/WEATHER-5K

英文摘要

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

2512.13417 2026-06-18 astro-ph.SR stat.AP 版本更新

Defects and Inconsistencies in Solar Flare Data Sources: Implications for Machine Learning Forecasting

太阳耀斑数据源中的缺陷与不一致性:对机器学习预报的影响

Ke Hu, Kevin Jin, Victor Verma, Weihao Liu, Ward Manchester, Lulu Zhao, Tamas Gombosi, Yang Chen

AI总结 本研究识别了太阳耀斑数据源中的缺陷与不一致性,量化其影响,并提出修复或缓解方法,为机器学习预报模型的数据选择提供建议。

详情
AI中文摘要

用于预报太阳耀斑的机器学习模型已经使用多种数据源进行训练和评估,包括空间天气预报中心(SWPC)的业务数据和科学质量数据。通常,这些数据源在被用于训练和验证预报模型之前只经过最少的处理。然而,如果忽略这些数据源之间的缺陷和不一致性,预测性能可能会受到影响。针对一组常用的数据源以及查询和输出处理数据的软件,我们识别了它们的缺陷和不一致性,量化了它们的程度,并展示了它们如何影响数据驱动的机器学习预报模型的预测。我们还概述了修复这些问题或至少减轻其影响的程序。最后,基于数据源对训练后的预报模型预测技能得分的全面比较,我们为在业务预报中使用不同数据产品提供了建议。

英文摘要

Machine learning models for forecasting solar flares have been trained and evaluated using a variety of data sources, including Space Weather Prediction Center (SWPC) operational and science-quality data. Typically, data from these sources is minimally processed before being used to train and validate a forecasting model. However, predictive performance can be affected if defects and inconsistencies between these data sources are ignored. For a set of commonly used data sources, along with the software that queries and outputs processed data, we identify their defects and inconsistencies, quantify their extent, and show how they can affect predictions from data-driven machine-learning forecasting models. We also outline procedures for fixing these issues or at least mitigating their impacts. Finally, based on thorough comparisons of the effects of data sources on the trained forecasting model's predictive skill scores, we offer recommendations for using different data products in operational forecasting.

2511.11833 2026-06-18 stat.AP 版本更新

Source apportionment of air pollution burden using geometric non-negative matrix factorization and high-throughput multi-pollutant air sensor data in Curtis Bay, Baltimore, USA

使用几何非负矩阵分解和高通量多污染物空气传感器数据在美国巴尔的摩柯蒂斯湾进行空气污染负担源解析

Bora Jin, Bonita D. Salmerón, David McClosky, David H. Hagan, Russell R. Dickerson, Nicholas J. Spada, Lauren N. Deanes, Matthew A. Aubourg, Tarunika Ramprasad, Laura E. Schmidt, Gregory G. Sawtell, Christopher D. Heaney, Abhirup Datta

AI总结 本研究利用几何非负矩阵分解方法分析高时空分辨率多污染物传感器数据,成功识别出三个稳定潜在源,并定量评估了其对不同污染物的贡献,展示了社区空气监测从污染检测向源识别转变的潜力。

详情
AI中文摘要

空气传感器网络提供超本地、高频的多污染物数据,但与颗粒物(PM)的化学形态测量不同,它们缺乏用于源识别的直接化学特征。然而,高时间分辨率和多个空间位置为通过已知来源的空间邻近性、时间模式和气象学的关系来解释潜在源创造了新的机会。我们分析了来自柯蒂斯湾(美国巴尔的摩;2022年10月至2023年6月)的451946条一分钟空气传感器记录,涵盖粒径分辨的PM、黑碳(BC)、一氧化碳(CO)、一氧化氮(NO)和二氧化氮(NO2),使用一种可扩展至大数据集并产生可证明唯一源贡献百分比的几何非负矩阵分解(NMF)方法。出现了三个稳定的潜在源,其证据趋同于可识别的源类别:源1解释了细和粗PM的$>$70%以及BC的$\sim$30%;源2主导CO并贡献了BC、NO和NO2的$\sim$70%;源3特定于较大的PM组分,PM10至PM40。回归分析和关于已知推土机事件的案例研究将源1和源3与附近的煤炭码头联系起来。来自源1和源3的极端强度事件在距离码头最近的站点平均每天持续$\sim$33和$\sim$24分钟,并随距离衰减。源2反映了昼夜交通模式。总之,这些结果表明,密集的空气传感器网络与几何NMF方法相结合,可以使社区空气监测超越污染检测,走向识别可能的源类别并为可行的缓解策略提供信息。

英文摘要

Air sensor networks provide hyperlocal, high-frequency data on multiple pollutants, but unlike speciated particulate matter (PM) measurements, they lack direct chemical signatures for source identification. High temporal resolution and multiple spatial locations nonetheless create new opportunities to interpret latent sources through their relationships with spatial proximity to known origins, temporal patterns, and meteorology. We analyze 451946 one-minute air sensor records from Curtis Bay (Baltimore, USA; October 2022 - June 2023), covering size-resolved PM, black carbon (BC), carbon monoxide (CO), nitric oxide (NO), and nitrogen dioxide (NO2), using a geometric non-negative matrix factorization (NMF) approach that scales to large datasets and yields provably unique source attribution percentages. Three stable latent sources emerge with converging evidence toward recognizable source categories: Source 1 explains $>$ 70% of fine and coarse PM and $\sim$30% of BC; Source 2 dominates CO and contributes $\sim$70% of BC, NO, and NO2; Source 3 is specific to the larger PM fractions, PM10 to PM40. Regression analyses and a case study on a known bulldozer incident link Sources 1 and 3 to a nearby coal terminal. Extreme-intensity episodes from Sources 1 and 3 averaged $\sim$33 and $\sim$24 minutes per day at the site nearest the terminal, attenuating with distance. Source 2 reflects diurnal traffic patterns. Together, these results show that dense air sensor networks paired with the geometric NMF method can move community air monitoring beyond pollution detection toward identifying likely source categories and informing actionable mitigation strategies.

2507.18824 2026-06-18 hep-ph nucl-th stat.AP stat.ML 版本更新

Deep Neural Network Driven Simulation Based Inference Method for Pole Position Estimation under Model Misspecification

深度神经网络驱动的基于模拟推理的极点估计方法在模型误设定下的应用

Daniel Sadasivan, Isaac Cordero, Andrew Graham, Cecilia Marsh, Daniel Kupcho, Melana Mourad, Maxim Mai

AI总结 提出深度神经网络驱动的基于模拟推理方法,在模型误设定下比传统卡方最小化更准确估计共振极点位置,以ππ散射和ρ(770)共振为例验证。

Comments 12 pages, 4 figures

详情
AI中文摘要

基于模拟推理(SBI)被证明在模型误设定的某些情况下能比传统卡方最小化产生更准确的共振参数估计,通过ππ散射和ρ(770)共振的案例研究进行了演示。使用卡方最小化对某些数据集拟合的模型可能会预测出ρ(770)的不准确极点位置,而SBI在相同模型和数据下提供了更稳健的预测。这一结果具有重要意义,既作为SBI能够处理模型误设定的概念验证,也因为ππ散射的精确建模对于研究许多当代物理系统(例如a1(1260)、ω(782))至关重要。

英文摘要

Simulation Based Inference (SBI) is shown to yield more accurate resonance parameter estimates than traditional chi-squared minimization in certain cases of model misspecification, demonstrated through a case study of pi-pi scattering and the rho(770) resonance. Models fit to some data sets using chi-squared minimization can predict inaccurate pole positions for the rho(770), while SBI provides more robust predictions across the same models and data. This result is significant both as a proof of concept that SBI can handle model misspecification, and because accurate modeling of pi-pi scattering is essential in the study of many contemporary physical systems (e.g., a1(1260), omega(782)).

12. 其他/综合统计 24 篇

2606.18424 2026-06-18 stat.OT cs.AI cs.IT math.IT 新提交

A Variational Framework for LLM Generator-Regulator Games

大语言模型生成器-调节器博弈的变分框架

Quanyan Zhu

发表机构 * Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY, USA(纽约大学坦登工程学院电气与计算机工程系,美国纽约布鲁克林)

AI总结 提出一个变分框架,将语言生成建模为熵正则化吉布斯分布,将调节建模为最优判别器,通过鞍点问题平衡效用、熵、调节一致性和有限长度可检测性,并通过审查过滤和钓鱼防御案例验证。

详情
AI中文摘要

本文发展了一个用于受调节语言生成的变分框架。从自回归令牌采样出发,我们推导了完整消息上的诱导分布,并将其与熵正则化的吉布斯定律联系起来。调节被建模为一个最优判别器,其对偶凸值为f-散度,生成器-调节器交互被表述为一个鞍点问题。该框架适用于内容审核、审查、AI欺骗检测、合规审计、钓鱼防御和操纵控制,其中调节涉及可能消息上的分布而非单个输出。均衡阐明了效用、熵、调节一致性和有限长度可检测性之间的权衡。两个有限词汇案例研究,即审查过滤和钓鱼防御,说明了如何通过效用、熵、散度、接收端分数和检测概率来评估该理论。

英文摘要

This paper develops a variational framework for regulated language generation. Starting from autoregressive token sampling, we derive the induced distribution over complete messages and relate it to an entropy-regularized Gibbs law. Regulation is modeled as an optimal discriminator whose convex-dual value is an f-divergence, and the generator-regulator interaction is formulated as a saddle-point problem. The framework applies to moderation, censorship, AI deception detection, compliance auditing, phishing defense, and manipulation control, where regulation concerns a distribution over possible messages rather than a single output. The equilibrium clarifies the tradeoff among utility, entropy, regulatory alignment, and finite-length detectability. Two finite-vocabulary case studies, censorship filtering and phishing defense, illustrate how the theory can be evaluated through utility, entropy, divergence, receiver-side scores, and detection probability.
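The entropy-regularized Gibbs law referred to above has a simple finite-vocabulary illustration: over a finite set of messages with utilities u(x), the distribution maximizing E[u] + τH(p) is p(x) ∝ exp(u(x)/τ), and the optimal value equals τ log Z. The sketch below only checks this textbook fact numerically; the utilities, the temperature and the function names are illustrative assumptions and are not taken from the paper.

```python
import math


def gibbs_distribution(utilities, temperature):
    """Return p(x) proportional to exp(u(x) / temperature)."""
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp((u - m) / temperature) for u in utilities]
    z = sum(weights)
    return [w / z for w in weights]


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def regularized_objective(p, utilities, temperature):
    """Expected utility plus temperature-weighted entropy."""
    expected_utility = sum(pi * u for pi, u in zip(p, utilities))
    return expected_utility + temperature * entropy(p)


# Hypothetical utilities of four candidate messages.
utilities = [1.0, 0.5, -0.2, 0.0]
tau = 0.7
p_gibbs = gibbs_distribution(utilities, tau)
uniform = [0.25] * 4
log_partition = tau * math.log(sum(math.exp(u / tau) for u in utilities))
```

Comparing `regularized_objective(p_gibbs, utilities, tau)` with the same objective at `uniform` shows the Gibbs distribution scoring at least as high, and its value matches `log_partition`.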

2606.19306 2026-06-18 math.PR math.ST stat.TH 新提交

On two overlooked stick-breaking constructions of the normalized inverse Gaussian process

关于归一化逆高斯过程两个被忽视的棍子断裂构造

Annalisa Cerquetti

AI总结 本文揭示了归一化逆高斯随机离散分布的两种替代棍子断裂构造,分别基于条件布朗运动划分和归一化广义Gamma子序的随机时空变换,并推广到更广泛的分布族。

Comments 10 pages

详情
AI中文摘要

我们阐明了归一化逆高斯(NIG)随机离散分布的两种替代棍子断裂构造,这些构造在贝叶斯非参数设置中似乎至今被忽视。第一种源自Aldous和Pitman(1998)关于条件布朗运动划分的结果,通过对时间1之前零点的局部时间进行混合。第二种作为James(2013)关于通过随机时空变换归一化广义Gamma子序得到先验的结果的特例出现。两种构造都基于标准随机变量的直接变换,并且可以轻松推广,分别为a) 由$1/2$稳定Lévy测度驱动的混合Poisson-Kingman模型族和b) 由逆Gaussian子序驱动的Poisson-Gamma过程族中的任何元素提供棍子断裂构造。

英文摘要

We shed light on two alternative stick-breaking constructions of the normalized inverse Gaussian (NIG) random discrete distribution which appear to have been overlooked so far in the Bayesian nonparametric setting. The first is derived from a result in Aldous and Pitman (1998) for the conditional Brownian excursion partition, mixing over the local time at zero up to time one. The second arises as a particular case of a result in James (2013) for priors obtained by a random spatial and temporal change of the normalized generalized Gamma subordinator. Both constructions are in terms of straightforward transformations of standard random variables and can be easily generalized to provide the stick-breaking construction of any element, respectively, in a) the family of mixed Poisson-Kingman models driven by the $1/2$ stable Lévy measure and b) the family of Poisson-Gamma processes driven by the Inverse Gaussian subordinator.

2606.19268 2026-06-18 math.ST cs.CG stat.TH 新提交

Patnaik-Pearson intrinsic dimension for internal representations of neural networks

神经网络内部表示的Patnaik-Pearson本征维数

Tom Hadfield

AI总结 提出Patnaik-Pearson维数度量数据流形的本征维数,应用于神经网络内部表示,证明其性质,并与HTSR/SETOL分析关联,通过BERT-base和DeepSeek-R1模型研究维数演化。

Comments 35 pages, 19 figures

详情
AI中文摘要

我们定义了一种新的数据流形本征维数度量,称为Patnaik-Pearson维数,并将其应用于神经网络(特别是Transformer)的内部表示。其灵感来自Martin、Mahoney和Hinrichs的HTSR和SETOL工作,以及Facco等人的TwoNN本征维数估计器。我们证明了该本征维数估计器的各种性质。将神经网络的权重矩阵视为数据流形,对于经验谱密度服从Pareto(幂律)分布的权重矩阵,我们将Patnaik-Pearson维数与HTSR和SETOL分析联系起来,并表明两种方法的尾部指数临界值一致。结合理论和数值技术,我们研究了数据流形在神经网络典型变换下Patnaik-Pearson维数的行为。我们将此方法应用于BERT-base和DeepSeek-R1-Distill-Qwen-1模型,首先研究token嵌入初始数据流形的Patnaik-Pearson维数,其次研究token嵌入通过模型各层时Patnaik-Pearson维数的演化。本文数值结果所使用的代码和笔记本可在以下网址获取:https://github.com/tdhadfield/PatnaikPearson

英文摘要

We define a new measure of intrinsic dimension of a data manifold, which we call the Patnaik-Pearson dimension, and apply this to internal representations of neural networks, in particular transformers. The inspiration for this comes from the HTSR and SETOL work of Martin, Mahoney and Hinrichs, combined with the TwoNN intrinsic dimension estimator of Facco et al. We prove various properties of this intrinsic dimension estimator. Treating weight matrices of neural networks as data manifolds, for weight matrices whose Empirical Spectral Density follows a Pareto (Power Law) distribution, we relate the Patnaik-Pearson dimension to the HTSR and SETOL analysis, and show that critical values of the tail exponent coincide for the two approaches. Using a combination of theoretical and numerical techniques, we study the behaviour of the Patnaik-Pearson dimension of a data manifold under the transformations typical to neural networks. We apply this machinery to the BERT-base and DeepSeek-R1-Distill-Qwen-1 models, to investigate first the Patnaik-Pearson dimension of the initial data manifold of token embeddings, and second the evolution of the Patnaik-Pearson dimension as token embeddings pass through the layers of the model. Code and notebooks used for the numerical results presented here are available at https://github.com/tdhadfield/PatnaikPearson
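For orientation, the TwoNN estimator cited above rests on the ratio μ = r2/r1 of each point's second to first nearest-neighbour distance, which is approximately Pareto-distributed with exponent equal to the intrinsic dimension d. The sketch below uses the maximum-likelihood form d ≈ N / Σ log μ, a common variant of the estimator; it is not the Patnaik-Pearson dimension defined in the paper, and the function name and synthetic data are assumptions made for illustration.

```python
import math
import random


def twonn_dimension(points):
    """Maximum-likelihood TwoNN estimate from nearest-neighbour distance ratios."""
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # nearest and second-nearest distances so far
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        if r1 > 0:  # skip exact duplicates, whose ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


# Synthetic check: a 2-dimensional plane linearly embedded in 3 dimensions.
rng = random.Random(0)
plane = [(x, y, 2 * x - y) for x, y in
         ((rng.random(), rng.random()) for _ in range(300))]
estimate = twonn_dimension(plane)
```

The brute-force neighbour search is quadratic in the number of points, which is adequate for a sketch but would need a spatial index for embedding tables of realistic size.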

2606.18942 2026-06-18 math.ST stat.TH 新提交

Group Efficient Randomized-Adaptive Designs with Delayed and Missing Responses

具有延迟和缺失响应的群组高效随机自适应设计

Guijing Zhang, Li-Xin Zhang

AI总结 提出一种基于ERADE的群组响应自适应设计,通过固定时间间隔招募和动态更新分配概率,处理延迟和缺失响应,保留原设计的渐近性质。

详情
AI中文摘要

响应自适应随机化设计在临床试验中备受关注。本文提出了一类新的响应自适应设计,该设计基于Hu、Zhang和He提出的高效随机自适应设计(ERADE)。原始的ERADE使用离散分配概率函数,并利用随机过程的停时理论建立渐近结果。已证明该设计对于任何目标分配比例都能达到Cramér-Rao下界。本研究进一步扩展了原始框架,将传统的逐例顺序招募替换为固定时间间隔(每周、每两周或每月)的群组招募,并根据每个群组内的累积响应信息动态更新分配概率。同时,为了更好地适应实际应用场景,我们进一步明确考虑了随机缺失数据和响应延迟的情况。理论分析表明,新设计保留了原始ERADE的所有主要渐近性质,并且在响应延迟和缺失数据条件下仍然表现良好。最后,通过模拟研究和真实临床试验的重新设计,验证了所提方法的有效性和实用性。

英文摘要

Response-adaptive randomization designs have attracted much attention in clinical trials. This paper proposes a new class of response-adaptive designs built on the efficient randomized-adaptive design (ERADE) proposed by Hu, Zhang, and He. The original ERADE uses a discrete allocation probability function and leverages the stopping time theory of stochastic processes to establish asymptotic results. It has been proven that the design can reach the Cramér-Rao lower bound for any target allocation proportion. This study further expands the original framework by replacing the traditional case-by-case sequential enrollment with group recruitment over fixed time intervals (weekly, biweekly, or monthly), and by dynamically updating the allocation probabilities based on the cumulative response information within each group. Meanwhile, to better fit practical application scenarios, we further explicitly consider the situations of randomly missing data and response delay. Theoretical analysis shows that the new design retains all the main asymptotic properties of the original ERADE, and still performs well under the conditions of response delay and missing data. Finally, through simulation studies and the redesign of a real-world clinical trial, the effectiveness and practicality of the proposed method are verified.

2606.18700 2026-06-18 math.ST stat.TH 新提交

Bayesian Prediction in Gamma Models: Admissibility and Infinitesimal Prediction

Gamma模型中的贝叶斯预测:可容许性与无穷小预测

Fumiyasu Komaki

AI总结 研究Gamma模型中形状参数已知、尺度参数未知时的估计与预测问题,证明基于Jeffreys先验的贝叶斯预测密度对所有α>0是可容许的,并建立无穷小预测框架。

详情
AI中文摘要

我们研究了Gamma模型$\mathrm{Ga}(\alpha,\beta)$在Kullback--Leibler损失下的估计与预测问题,其中形状参数$\alpha$已知,尺度参数$\beta$未知。对于$\alpha\le1$,所有尺度不变的$\beta$估计量具有无限风险,表明在边界$\alpha=1$处估计问题发生质变。我们的主要结果是,基于Jeffreys先验的贝叶斯预测密度对所有$\alpha>0$都是可容许的。这解决了Gamma模型中贝叶斯预测密度的可容许性问题。作为相关结果,我们还建立了$\alpha>1$时相应贝叶斯估计量的可容许性。为了证明预测可容许性结果,我们发展了一个基于Gamma过程的无穷小预测框架。该框架自然导出了Lévy密度的Kullback--Leibler损失,并建立了预测分布与Lévy测度之间的联系。在所得损失下,贝叶斯预测Lévy密度被证明是后验均值Lévy密度。与正态和Poisson模型不同,Gamma模型中的无穷小预测并未简化为参数估计,而是简化为Lévy密度的估计。我们将这一现象与均值混合曲率联系起来,并从信息几何角度进行讨论。

英文摘要

We study estimation and prediction in the Gamma model $\mathrm{Ga}(α,β)$, where the shape parameter $α$ is known and the scale parameter $β$ is unknown, under the Kullback--Leibler loss. For $α\le1$, all scale-invariant estimators of $β$ have infinite risk, indicating a qualitative change in the estimation problem at the boundary $α=1$. Our main result is that the Bayesian predictive density based on the Jeffreys prior is admissible for all $α>0$. This resolves the admissibility problem for Bayesian predictive densities in Gamma models. As a related result, we also establish the admissibility of the corresponding Bayesian estimator for $α>1$. To prove the predictive admissibility result, we develop an infinitesimal prediction framework based on Gamma processes. This framework naturally leads to a Kullback--Leibler loss for Lévy densities and establishes a connection between predictive distributions and Lévy measures. Under the resulting loss, the Bayesian predictive Lévy density is shown to be the posterior mean Lévy density. Unlike the normal and Poisson models, infinitesimal prediction in the Gamma model does not reduce to parameter estimation. Instead, it reduces to the estimation of a Lévy density. We relate this phenomenon to mean mixture curvature and discuss it from an information-geometric viewpoint.

2606.18378 2026-06-18 math.ST stat.TH 新提交

Inferential Models: The Power of Auxiliary Variables for Reasoning with Scientific Uncertainty

推断模型:辅助变量在科学不确定性推理中的力量

Chuanhai Liu

AI总结 本文探讨推断模型(IMs)作为无先验概率推理的框架,通过预测辅助变量并利用校准预测随机集传递合理性陈述,实现频率校准的不确定性评估,并阐明与费希尔、内曼、登普斯特-谢弗等方法的联系。

Comments 29 pages, 4 figures

详情
AI中文摘要

科学推断的一个核心挑战是产生既针对具体情境又经过频率校准的不确定性评估。本文探讨了推断模型(IMs)作为无先验概率推理处理科学不确定性的框架。IM的核心思想是将抽样模型中的辅助变量视为基于模型的不确定性的来源。R. A. Fisher的信仰推断在应用概率计算之前将辅助随机性转移到参数空间;而IMs则使用校准预测随机集(PRSs)预测未观测到的辅助值,并仅在之后传递合理性陈述。这种顺序的改变产生了有效的不确定性评估,并阐明了Fisherian信仰推理、Neymanian置信理论、Dempster-Shafer信念函数、广义信仰推断和IMs之间的关系。通过将IMs与客观先验贝叶斯推断进行比较,文章认为E. T. Jaynes的科学逻辑雄心可以继续,而无需将所有科学不确定性强制纳入精确的先验分布,因为校准的不精确性往往是必要的。最后,文章提出IMs的微分几何理论可能即将实现,为传统上以似然原理表述的基础问题提供了一条可能的途径。

英文摘要

A central challenge in scientific inference is to produce uncertainty assessments that are both situation-specific and frequency-calibrated. This article examines inferential models (IMs) as a framework for prior-free probabilistic reasoning with scientific uncertainty. The central IM idea is to view the auxiliary variables in a sampling model as the source of model-based uncertainty. R. A. Fisher's fiducial inference transfers auxiliary randomness to the parameter space before applying probability calculus; IMs instead predict the unobserved auxiliary value with calibrated predictive random sets (PRSs) and transfer the resulting plausibility statements only afterward. This change in order yields valid uncertainty assessments and clarifies the relations among Fisherian fiducial reasoning, Neymanian confidence theory, Dempster-Shafer belief functions, generalized fiducial inference, and IMs. By comparing IMs with objective-prior Bayesian inference, the article argues that E. T. Jaynes' logic-of-science ambition can be continued without forcing all scientific uncertainty into a precise prior distribution because calibrated imprecision is often essential. Finally, the article suggests that a differential-geometric theory of IMs may be within reach, offering a possible route to foundational questions traditionally framed in terms of the likelihood principle.

2606.05072 2026-06-18 math.ST stat.TH 版本更新

Adaptive Sequential Change Detection using Mixtures of Predictive Distributions

使用预测分布混合的自适应序列变化检测

Topi Halme, H. Vincent Poor, Visa Koivunen

AI总结 针对后变化分布未知的独立观测序列变化检测问题,提出一种基于滑动窗口预测分布混合的PM-CuSum算法,实现一阶渐近最优性且渐近延迟余项更小。

详情
AI中文摘要

本文研究了当后变化分布未知时,检测独立观测序列分布变化的问题。我们提出了一种新颖的变化检测算法,称为预测混合CuSum(PM-CuSum),该算法在CuSum递归中结合了从不同长度滑动窗口构建的预测分布。预测分布根据其近期预测性能使用自适应权重进行聚合。我们证明,在温和条件下,PM-CuSum实现了一阶渐近最优性,并且其渐近延迟界具有比任何固定(甚至先知)窗口更小的余项阶数。数值模拟表明,与现有方法相比,PM-CuSum表现良好。此外,与插件似然相比,使用完整预测分布形成似然比可以显著提高性能。

英文摘要

This paper studies the problem of detecting a change in the distribution of a sequence of independent observations when the post-change distribution is unknown. We propose a novel change detection algorithm, termed Predictive-Mixture CuSum (PM-CuSum), which combines predictive distributions constructed from sliding windows of different lengths within a CuSum recursion. The predictive distributions are aggregated using adaptive weights based on their recent predictive performance. We show that PM-CuSum achieves first-order asymptotic optimality under mild conditions, and that its asymptotic delay bound has a smaller remainder order than that achieved by procedures using a single fixed (even oracle) window. Numerical simulations demonstrate that PM-CuSum performs well compared to existing methods. Moreover, it is demonstrated that forming likelihood ratios using full predictive distributions can substantially improve performance compared to plug-in likelihoods.
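The CuSum recursion that PM-CuSum builds on can be sketched in its classical form, W_t = max(0, W_{t-1} + log-likelihood ratio), with an alarm once W_t reaches a threshold. The example assumes Gaussian observations with known pre-change and post-change means, which is the baseline setting and not the unknown post-change case treated in the paper; the function names, data and threshold are illustrative.

```python
def gaussian_llr(x, mu0, mu1, sigma):
    """Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) at x."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)


def cusum_alarm_time(xs, mu0, mu1, sigma, threshold):
    """Return the first 1-based time at which the CuSum statistic reaches threshold."""
    w = 0.0
    for t, x in enumerate(xs, start=1):
        # Reset at zero so evidence against a change does not accumulate.
        w = max(0.0, w + gaussian_llr(x, mu0, mu1, sigma))
        if w >= threshold:
            return t
    return None  # no alarm raised


# Noise-free toy sequence: the mean shifts from 0 to 1 after the fifth observation.
observations = [0.0] * 5 + [1.0] * 10
alarm = cusum_alarm_time(observations, mu0=0.0, mu1=1.0, sigma=1.0, threshold=2.0)
```

In this toy sequence each post-change observation adds 0.5 to the statistic, so the alarm is raised four observations after the change. In the setting of the abstract, the fixed post-change density would be replaced by a weighted mixture of predictive densities.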

2604.07336 2026-06-18 astro-ph.CO astro-ph.IM physics.data-an stat.AP 版本更新

The Non-Gaussian Weak-Lensing Likelihood: A Multivariate Copula Construction and Impact on Cosmological Constraints

非高斯弱引力透镜似然:多元Copula构建及其对宇宙学约束的影响

Veronika Oehl, Tilman Tröster

AI总结 提出用Copula方法构建两点相关函数的非高斯似然,在大尺度上比高斯似然更准确,但对Stage-IV巡天影响可忽略。

Comments 16 pages, 5 figures in the main text. Published in the Open Journal of Astrophysics

详情
Journal ref
The Open Journal of Astrophysics, Vol. 9, 2026
AI中文摘要

我们提出了一个计算两点相关函数的非高斯似然的框架。非高斯性在Stage-IV弱引力透镜巡天将精确测量的大尺度上最为显著。我们展示了如何通过Copula方法构建并高效评估这种多元似然,该方法结合了精确的一维边缘分布和来自精确多元似然的依赖结构。发现Copula似然与相关函数的模拟抽样分布比高斯似然更一致,尤其是在大尺度上。此外,我们研究了非高斯Copula似然对后验推断的影响,包括对当代弱引力透镜分析的全参数空间采样。我们发现对于$1\,000\ \mathrm{deg}^2$巡天,$S_8$可能存在约一个标准差的参数偏移,但对于$10\,000\ \mathrm{deg}^2$区域偏移可忽略,表明高斯似然对于Stage-IV巡天是足够的,尽管结果依赖于详细的掩膜几何和数据向量结构。

英文摘要

We present a framework to compute non-Gaussian likelihoods for two-point correlation functions. The non-Gaussianity is most pronounced on large scales that will be well-measured by stage-IV weak-lensing surveys. We show how such a multivariate likelihood can be constructed and efficiently evaluated using a copula approach by incorporating exact one-dimensional marginals and a dependence structure derived from the exact multivariate likelihood. The copula likelihood is found to be in better agreement with simulated sampling distributions of correlation functions than Gaussian likelihoods, particularly on large scales. We furthermore investigate the effect of the non-Gaussian copula likelihood on posterior inference, including sampling the full parameter space of contemporary weak-lensing analyses. We find potential parameter shifts in $S_8$ on the order of one standard deviation for $1 \ 000 \ \mathrm{deg}^2$ surveys but negligible shifts for areas of $10 \ 000 \ \mathrm{deg}^2$, suggesting Gaussian likelihoods are sufficient for stage-IV surveys, though results depend on the detailed mask geometry and data-vector structure.

2510.27319 2026-06-18 math.ST stat.TH 版本更新

Adaptive Algorithms for Infinitely Many-Armed Bandits: A Unified Framework

无穷多臂老虎机的自适应算法:统一框架

Emmanuel Pilliat

AI总结 提出统一框架OSE和PROSE算法,针对预算小于臂数(可能无穷)的老虎机问题,自适应臂均值分布,最大化期望简单奖励,实现近最优率。

详情
AI中文摘要

我们考虑一个预算小于臂数(可能无穷)的老虎机问题。在此情况下,文献中的通常目标是最小化简单遗憾。为了分析具有潜在无界支撑的广泛分布类别,其中简单遗憾可能无法明确定义,我们采取略有不同的方法,旨在最大化推荐臂的期望简单奖励,并提供随时保证。为此,我们引入了一个无分布算法OSE,该算法自适应于臂均值的分布,并为几种分布类别实现了近最优的速率。我们通过秩校正的逆平方间隙函数来刻画样本复杂度。特别地,当分位数函数为$\lambda_\eta = 1-\eta^{\alpha}$时,我们恢复了已知的上界和$\alpha$小于或大于$1/2$时的过渡区域。此外,我们根据相对于$\alpha$的噪声水平识别了新的过渡区域,并推测这些区域是近乎最优的。另外,我们引入了一个增强的实用版本PROSE,该版本在文献中考虑的主要分布类别上实现了最先进的实证性能。

英文摘要

We consider a bandit problem where the budget is smaller than the number of arms, which may be infinite. In this regime, the usual objective in the literature is to minimize simple regret. To analyze broad classes of distributions with potentially unbounded support, where simple regret may not be well-defined, we take a slightly different approach and seek to maximize the expected simple reward of the recommended arm, providing anytime guarantees. To that end, we introduce a distribution-free algorithm, OSE, that adapts to the distribution of arm means and achieves near-optimal rates for several distribution classes. We characterize the sample complexity through the rank-corrected inverse squared gap function. In particular, we recover known upper bounds and transition regimes for $α$ less than or greater than $1/2$ when the quantile function is $λ_η= 1-η^α$. We additionally identify new transition regimes depending on the noise level relative to $α$, which we conjecture to be nearly optimal. Finally, we introduce an enhanced practical version, PROSE, that achieves state-of-the-art empirical performance for the main distribution classes considered in the literature.

2508.02158 2026-06-18 cs.IT cs.CR cs.DS cs.LG math.IT math.ST stat.TH 版本更新

Robust Detection of Planted Subgraphs in Semi-Random Models

半随机模型中植入子图的鲁棒检测

Dor Elimelech, Wasim Huleihel

AI总结 研究半随机模型下植入子图检测问题,证明存在对抗者时强次对数密度子图检测在信息论上不可能,而对数以上密度子图统计极限不变,并设计了高效鲁棒检测算法。

Comments 38 pages, 2 figures

详情
AI中文摘要

在Erdős-Rényi随机图中检测植入子图已被广泛研究,产生了丰富的刻画统计和计算阈值的结果。然而,大多数先前的工作假设纯随机生成模型,使得所得算法在面对现实扰动时可能脆弱。本文开创性地研究了植入子图检测问题的半随机模型,其中允许对抗者在图被揭示给统计学家之前移除植入子图外的边。关键的是,统计学家仍然不知道哪些边被移除,这给推理任务带来了根本性挑战。我们建立了该半随机模型下检测的基本统计极限,揭示了尖锐的二分性。具体而言,对于具有强次对数最大密度的植入子图,在存在对抗者的情况下检测在信息论上变得不可能——尽管在经典随机模型中某些植入子图是可能的。与此形成鲜明对比的是,对于具有超对数密度的子图,统计极限基本保持不变;我们证明最优(尽管计算上不可行)的似然比检验仍然是鲁棒的。在这些统计边界之外,我们设计了一种新的计算高效且鲁棒的检测算法,并为其性能提供了严格的统计保证。我们的结果为植入子图检测建立了第一个鲁棒框架,并为半随机模型、计算-统计权衡和图推理问题中的鲁棒性研究开辟了新方向。

英文摘要

Detection of planted subgraphs in Erdős-Rényi random graphs has been extensively studied, leading to a rich body of results characterizing both statistical and computational thresholds. However, most prior work assumes a purely random generative model, making the resulting algorithms potentially fragile in the face of real-world perturbations. In this work, we initiate the study of semi-random models for the planted subgraph detection problem, wherein an adversary is allowed to remove edges outside the planted subgraph before the graph is revealed to the statistician. Crucially, the statistician remains unaware of which edges have been removed, introducing fundamental challenges to the inference task. We establish fundamental statistical limits for detection under this semi-random model, revealing a sharp dichotomy. Specifically, for planted subgraphs with strongly sub-logarithmic maximum density, detection becomes information-theoretically impossible in the presence of an adversary, despite being possible for some planted subgraphs in the classical random model. In stark contrast, for subgraphs with super-logarithmic density, the statistical limits remain essentially unchanged; we prove that the optimal (albeit computationally intractable) likelihood ratio test remains robust. Beyond these statistical boundaries, we design a new computationally efficient and robust detection algorithm, and provide rigorous statistical guarantees for its performance. Our results establish the first robust framework for planted subgraph detection and open new directions in the study of semi-random models, computational-statistical trade-offs, and robustness in graph inference problems.

2307.10067 2026-06-18 econ.EM math.ST stat.TH 版本更新

The Canonical Decomposition of Factor Models: Weak Factors are Everywhere

因子模型的规范分解:弱因子无处不在

Philipp Gersing, Matteo Barigozzi, Christoph Rust, Manfred Deistler

AI总结 本文提出因子模型的规范分解,引入弱公共成分(动态与静态公共成分之差),并通过理论和实证表明该成分不可忽略,且考虑弱成分可获得更合理的脉冲响应函数。

详情
AI中文摘要

我们推导出一种新颖的因子模型规范分解,涵盖静态因子模型(因子仅同期加载)和广义动态因子模型(因子滞后加载)。该分解包含一个新项:弱公共成分,定义为动态与静态公共成分之差。它由(可能无限多的)非普遍弱因子驱动,这些因子属于动态公共空间。通过理论和实证例子(涉及美国宏观经济指标和全球金融波动性),我们表明弱公共成分通常不可忽略。此外,我们证明,通过考虑弱公共成分的存在,我们可能获得比纯静态方法更合理的脉冲响应函数形状。我们还为规范分解的所有项和弱因子提供了一致估计量。

英文摘要

We derive a novel canonical decomposition of factor models encompassing both the static factor model - where factors are loaded only contemporaneously - and the Generalised Dynamic Factor Model - where factors are loaded with lags. This decomposition features a new term: the weak common component, defined as the difference between the dynamic and static common components. It is driven by (possibly infinitely many) non-pervasive weak factors which belong to the dynamically common space. Through theoretical and empirical examples - both on U.S. macroeconomic indicators and global financial volatilities - we show that, in general, the weak common component should not be neglected. Furthermore, we show that, by accounting for the presence of weak common components, we are likely to obtain Impulse Response Functions with more plausible shapes than those obtained from purely static approaches. In addition, we provide consistent estimators for all terms of the canonical decomposition and for the weak factors.

2405.05344 2026-06-18 math.ST stat.TH updated version

A note on the minimax risk of sparse linear regression

Yilin Guo, Shubhangi Ghosh, Haolei Weng, Arian Maleki

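AI summary This paper derives an asymptotically sharp expression for the minimax risk of sparse linear regression under isotropic Gaussian random design, namely $2\sigma^2 (k/n) \log(p/k)$, and summarizes existing results and open problems.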

Details
English abstract

Sparse linear regression is one of the classical and extensively studied problems in high-dimensional statistics and compressed sensing. Despite the substantial body of literature dedicated to this problem, the precise determination of its minimax risk remains elusive. This paper aims to fill this gap by deriving an asymptotically constant-sharp characterization of the minimax risk of sparse linear regression. More specifically, the paper focuses on scenarios where the sparsity level, denoted $k$, satisfies the condition $(k \log p)/n \to 0$, with $p$ and $n$ representing the number of features and observations, respectively. We establish that the minimax risk under isotropic Gaussian random design is asymptotically equal to $2\sigma^2 (k/n) \log(p/k)$, where $\sigma$ denotes the standard deviation of the noise. In addition to this result, we summarize the existing results in the literature and mention some fundamental problems that remain open.
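As a rough numerical illustration of the rate stated in the abstract, the sketch below evaluates $2\sigma^2 (k/n) \log(p/k)$ for one set of inputs. The function name and the sample values of sigma, k, n and p are illustrative assumptions, not taken from the paper, and the expression is only meaningful in the regime $(k \log p)/n \to 0$.

```python
import math


def sparse_minimax_risk(sigma: float, k: int, n: int, p: int) -> float:
    """Evaluate the asymptotic expression 2 * sigma^2 * (k / n) * log(p / k)."""
    if not (0 < k <= p and n > 0):
        raise ValueError("need 0 < k <= p and n > 0")
    return 2.0 * sigma**2 * (k / n) * math.log(p / k)


# Hypothetical problem size: 10 active features out of 10,000, 5,000 observations.
risk = sparse_minimax_risk(sigma=1.0, k=10, n=5000, p=10000)

# The abstract's regime requires (k log p) / n to be small; check it for these values.
regime_ratio = 10 * math.log(10000) / 5000

print(f"asymptotic minimax risk: {risk:.4f}")
print(f"(k log p) / n = {regime_ratio:.4f}")
```

With these values the risk is roughly 0.028 and the regime ratio roughly 0.018, so the example sits comfortably inside the sparse regime the abstract describes.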

2504.03228 2026-06-18 econ.EM stat.ML

Weak instrumental variables due to ignored nonlinearities in panel data: A Super Learner Control Function estimator

Monika Avila-Marquez

Details
English abstract

A triangular structural panel data model with additively separable individual-specific effects is used to model the causal effect of a covariate on an outcome variable when there are unobservable confounders, some of which are time-invariant. In this setup, a linear specification for the reduced-form equation might be problematic when the conditional mean of the endogenous covariate given the instrumental variables is nonlinear in the population. The reason is that ignoring the nonlinearity could lead to weak instruments (instruments that are weakly correlated with the endogenous covariate) due to misspecification, as shown using a generalized concentration parameter for panel data. As a solution, we propose a triangular simultaneous equation model for panel data with additively separable individual-specific fixed effects, composed of a linear structural equation and a nonlinear reduced-form equation. The parameter of interest is the structural parameter of the endogenous variable. The identification of this parameter is obtained under the assumption of available exclusion restrictions and using a control function approach. We provide an estimator that we call the Super Learner Control Function estimator (SLCFE). The estimation procedure is composed of two main steps and cross-fitting. First, we estimate the control function using a super learner. In the following step, we use the estimated control function to control for endogeneity in the structural equation. Cross-fitting is done across the individual dimension. The estimator is consistent and asymptotically normal, achieving a parametric rate of convergence. We show that the SLCF estimator differs from both the plug-in IV estimator and a naive plug-in 2SLS estimator, with the former not being consistent without cross-fitting, and the latter not being consistent even with cross-fitting.

2510.03681 2026-06-18 stat.AP

Bayesian Variable Selection for Censored Spatial Responses with Application to PFAS Concentrations in California

Suman Majumder, Indranil Sahoo

Details
English abstract

Per- and polyfluoroalkyl substances (PFAS) are persistent environmental pollutants of major public health concern due to their resistance to degradation, widespread presence, and potential health risks. Analyzing PFAS in groundwater is challenging due to left-censoring and strong spatial dependence. Although PFAS levels are influenced by sociodemographic, industrial, and environmental factors, the relative importance of these drivers remains unclear, highlighting the need for robust statistical tools to identify key predictors from a large candidate set. We present a Bayesian hierarchical framework that integrates censoring into a spatial process model via approximate Gaussian processes and employs a global-local shrinkage prior for high-dimensional variable selection. We evaluate three post-selection strategies, namely credible interval rules, shrinkage weight thresholds, and clustering-based inclusion, and compare their performance in terms of predictive accuracy, censoring robustness, and variable selection stability through cross-validation. Applied to PFOS concentrations in California groundwater, the model identifies a concise, interpretable set of predictors, including demographic composition, industrial facility counts, proximity to airports, traffic density, and environmental features such as herbaceous cover and elevation. These findings demonstrate that the proposed approach delivers stable, interpretable inference in censored, spatial, high-dimensional contexts, thereby offering actionable insights into the environmental and industrial factors affecting PFAS concentrations.

2601.18637 2026-06-18 quant-ph cs.LG stat.ML

Universality of Many-body Projected Ensemble for Learning Quantum Data Distribution

Quoc Hoan Tran, Koki Chinzei, Yasuhiro Endo, Hirotaka Oshima

Affiliations * Quantum Laboratory, Fujitsu Research, Fujitsu Limited, Kawasaki, Kanagawa 211-8588, Japan

Comments 21 pages, 6 figures (added Github repository)

Details
Journal ref
IJCNN 2026
English abstract

Generating quantum data by learning the underlying quantum distribution poses challenges in both theoretical and practical scenarios, yet it is a critical task for understanding quantum systems. A fundamental question in quantum machine learning (QML) is the universality of approximation: whether a parameterized QML model can approximate any quantum distribution. We address this question by proving a universality theorem for the Many-body Projected Ensemble (MPE) framework, a method for quantum state design that uses a single many-body wave function to prepare random states. This demonstrates that MPE can approximate any distribution of pure states within a 1-Wasserstein distance error. This theorem provides a rigorous guarantee of universal expressivity, addressing key theoretical gaps in QML. For practicality, we propose an Incremental MPE variant with layer-wise training to improve the trainability. Numerical experiments on clustered quantum states and quantum chemistry datasets validate MPE's efficacy in learning complex quantum data distributions.

2512.17696 2026-06-18 cs.LG stat.ME stat.ML

Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting

Yuri Calleo

Affiliations * Unimercatorum

Details
English abstract

The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.

2508.14629 2026-06-18 stat.AP

Experimental validation of universal filtering and smoothing for linear system identification using adaptive tuning

Zihao Liu, Sima Abolghasemi, Mohsen Ebrahimzadeh Hassanabadi, Nicholas E. Wierschem, Daniel Dias-da-Costa

Details
English abstract

In Kalman filtering, unknown inputs are often estimated by augmenting the state vector, which introduces reliance on fictitious input models. In contrast, minimum-variance unbiased methods estimate inputs and states separately, avoiding fictitious models but requiring strict sensor configurations, such as full-rank feedforward matrices or the absence of direct feedthrough. To address these limitations, two universal approaches have been proposed to handle systems with or without direct feedthrough, including cases of rank-deficient feedforward matrices. Numerical studies have shown their robustness and applicability; however, they have so far relied on offline tuning, and their performance under physical sensor noise and structural uncertainties has not yet been experimentally validated. To address this gap, this paper experimentally validates the universal methods on a five-storey shear frame subjected to shake table tests and multi-impact events. Both typical and rank-deficient conditions are considered. Furthermore, a self-tuning mechanism is introduced to replace impractical offline tuning and enable real-time adaptability. The findings of this paper provide strong evidence of the robustness and adaptability of the methods for structural health monitoring applications, particularly when sensor networks deviate from ideal configurations.

2412.05687 2026-06-18 stat.ME

Bootstrap Model Averaging

Minghui Song, Guohua Zou, Alan T. K. Wan

Details
English abstract

Model averaging has gained significant attention in recent years due to its ability to fuse information from different models. The critical challenge in frequentist model averaging is the choice of the weight vector. The bootstrap method, known for its favorable properties, presents a new solution. In this paper, we propose a bootstrap model averaging approach that selects the weights by minimizing a bootstrap criterion. Our weight selection criterion can also be interpreted as a form of bootstrap aggregating. We demonstrate that the resultant estimator is asymptotically optimal in the sense that it achieves the lowest possible squared error loss. Furthermore, we establish the rate at which the bootstrap weights converge to the theoretically optimal weights. Additionally, we derive the limiting distribution of our proposed model averaging estimator. Through simulation studies and empirical applications, we show that our proposed method often performs better than other commonly used model selection and model averaging methods, as well as bootstrap variants.

2411.02276 2026-06-18 stat.ME

A Bayesian Model for Co-clustering Ordinal Data with Informative Missing Entries

Alice Giampino, Antonio Canale, Bernardo Nipoti

Comments 25 pages, 11 figures

Details
English abstract

Several approaches have been proposed in the literature for clustering multivariate ordinal data. These methods typically treat missing values as absent information, rather than recognizing them as valuable for profiling population characteristics. To address this gap, we introduce a Bayesian nonparametric model for co-clustering multivariate ordinal data that treats censored observations as informative, rather than merely missing. We demonstrate that this offers a significant improvement in understanding the underlying structure of the data. Our model exploits the flexibility of two independent Dirichlet processes, allowing us to infer potentially distinct subpopulations that characterize the latent structure of both subjects and variables. The ordinal nature of the data is addressed by introducing latent variables, while a matrix factorization specification is adopted to handle the high dimensionality of the data in a parsimonious way. The conjugate structure of the model enables an explicit derivation of the full conditional distributions of all the random variables in the model, which facilitates seamless posterior inference using a Gibbs sampling algorithm. We demonstrate the method's performance through simulations and by analyzing politician and movie ratings data.

2404.12463 2026-06-18 stat.ME stat.AP

Spatially Selected and Dependent Random Effects for Small Area Estimation with Application to Rent Burden

Sho Kawano, Paul A. Parker, Zehang Richard Li

Details
Journal ref
J. R. Stat. Soc. Ser. A Stat. Soc. (2025) qnaf063
English abstract

Area-level models for small area estimation typically rely on areal random effects to shrink design-based direct estimates towards a model-based predictor. Incorporating the spatial dependence of the random effects into these models can further improve the estimates when there are not enough covariates to fully account for the spatial dependence of the areal means. A number of recent works have investigated models that include random effects for only a subset of areas, in order to improve the precision of estimates. However, such models do not readily handle spatial dependence. In this paper, we introduce a model that accounts for spatial dependence in both the random effects and the latent process that selects the effects. We show how this model can significantly improve predictive accuracy via an empirical simulation study based on data from the American Community Survey, and illustrate its properties via an application to estimate county-level median rent burden.

2303.00806 2026-06-18 stat.AP physics.geo-ph

Survival modelling of smartphone trigger data for earthquake parameter estimation in early warning. With applications to 2023 Turkish-Syrian and 2019 Ridgecrest events

Luca Aiello, Raffaele Argiento, Francesco Finazzi, Lucia Paci

Details
Journal ref
Journal of the Royal Statistical Society Series A: Statistics in Society 189(1), qnae148 (2025)
English abstract

Crowdsourced smartphone-based earthquake early warning systems have recently emerged as reliable alternatives to the more expensive solutions based on scientific-grade instruments. For instance, during the deadly 2023 Turkish-Syrian event, the system implemented by the Earthquake Network citizen science initiative provided a forewarning of up to 25 seconds. We develop a statistical methodology based on a survival mixture cure model which provides full Bayesian inference on epicentre, depth and origin time, and we design an efficient tempering MCMC algorithm to address multi-modality of the posterior distribution. The methodology is applied to data collected by the Earthquake Network, including the 2023 Turkish-Syrian and 2019 Ridgecrest events.

1909.13203 2026-06-18 cs.LG stat.ML

Learning transport cost from subset correspondence

Ruishan Liu, Akshay Balsubramani, James Zou

Affiliations * Department of Electrical Engineering; Department of Genetics; Stanford University; Department of Biomedical Data Science

Details
Journal ref
International Conference on Learning Representations (ICLR 2020)
English abstract

Learning to align multiple datasets is an important problem with many applications, and it is especially useful when we need to integrate multiple experiments or correct for confounding. Optimal transport (OT) is a principled approach to aligning datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related. Reliable cost functions are typically not available, and practitioners often resort to hand-crafted or Euclidean costs even when they may not be appropriate. In this work, we investigate how to learn the cost function using a small amount of side information, which is often available. The side information we consider captures subset correspondence, i.e., certain subsets of points in the two datasets are known to be related. For example, we may have some images labeled as cars in both datasets, or we may have a common annotated cell type in single-cell data from two batches. We develop an end-to-end optimizer (OT-SI) that differentiates through the Sinkhorn algorithm and effectively learns a suitable cost function from side information. In systematic experiments on images, marriage matching, and single-cell RNA-seq, our method substantially outperforms state-of-the-art benchmarks.

1611.02690 2026-06-18 stat.ME

A Multi-State Conditional Logistic Regression Model for the Analysis of Animal Movement

Aurélien Nicosia, Thierry Duchesne, Louis-Paul Rivest, Daniel Fortin

Details
Journal ref
The Annals of Applied Statistics, 2017, 11 (3), 1537-1560
English abstract

A multi-state version of an animal movement analysis method based on conditional logistic regression, called the Step Selection Function (SSF), is proposed. In ecology, an SSF is built from a comparison between the observed location of an animal and randomly sampled locations at each time step. Interpretation of the parameters in the multi-state model and the impact of different sampling schemes for the random locations are discussed. We prove the equivalence between the new model and a random walk model on the plane. This equivalence allows one to use both pure movement and local discrete choice behaviors in identifying the model's hidden states. The new method is used to model the movement behavior of GPS-collared bison in Prince Albert National Park, Canada. The multi-state SSF successfully teases apart areas used to forage and to travel. The analysis thus provides valuable insights into how bison adjust their movement to habitat features, thereby revealing spatial determinants of functional connectivity in heterogeneous landscapes.
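The comparison between an observed location and randomly sampled locations corresponds to the usual conditional logistic regression likelihood, which can be sketched for a single time step as follows. This covers only the ordinary single-state case, not the multi-state model of the paper; the function name, the coefficient values and the covariate vectors are hypothetical.

```python
import math


def step_log_likelihood(beta, observed, controls):
    """Conditional-logit log-likelihood of one step.

    observed: covariate vector at the location the animal actually moved to
    controls: covariate vectors at the randomly sampled alternative locations
    """
    def score(x):
        # Linear predictor x . beta
        return sum(b * xi for b, xi in zip(beta, x))

    scores = [score(observed)] + [score(c) for c in controls]
    top = max(scores)  # subtract the maximum for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[0] - log_norm


# Hypothetical covariates per location, e.g. (forage quality, distance to road).
beta = [0.8, -0.3]
observed = [1.2, 0.5]
controls = [[0.4, 0.9], [0.1, 0.2], [0.7, 1.5]]

ll = step_log_likelihood(beta, observed, controls)
prob_observed = math.exp(ll)  # probability of the observed choice given the sampled set

# With all coefficients at zero, every candidate location is equally likely.
ll_null = step_log_likelihood([0.0, 0.0], observed, controls)
```

Summing this quantity over all time steps gives the log-likelihood that is maximised over `beta` in a single-state analysis.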

1507.08653 2026-06-18 stat.ME

A General Hidden State Random Walk Model for Animal Movement

Aurélien Nicosia, Thierry Duchesne, Louis-Paul Rivest, Daniel Fortin

Comments 28 pages

Details
Journal ref
Computational Statistics & Data Analysis, Volume 105, January 2017, Pages 76-95
English abstract

In this paper, we propose a general hidden state random walk model to describe the movement of an animal that takes into account movement taxis with respect to features of the environment. A circular-linear process models the direction and distance between two consecutive localizations of the animal. A hidden process structure accounts for the animal's change in movement behavior. The originality of the proposed approach is that several environmental targets can be included in the directional model. An EM algorithm is devised to fit this model, and an application to the analysis of the movement of caribou in Canada's boreal forest is presented.