arXivDaily: daily arXiv digest, updated Monday to Friday

1. Statistical Theory and Methods (7 papers)

2606.13593 2026-06-12 stat.ME New submission

Smoothed Rank-Based Regression Estimation Using Wilcoxon Score Functions

Feridun Tasdan

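AI summary: Proposes a Wilcoxon rank-regression estimator that replaces integer ranks with smoothed ranks, approximating the indicator function with a kernel distribution function. It retains robustness while improving efficiency under heavy-tailed errors and handling tied data. A Wald test is derived and asymptotic normality is proved.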

Comments
17 pages
Abstract

This article proposes an improved rank-based regression estimator obtained by replacing the ordinary integer ranks in the Wilcoxon rank-score regression procedure with smoothed ranks derived from a smoothed empirical cumulative distribution function. The smoothed ranks are computed via a continuous, nondecreasing kernel distribution function H that provides a differentiable approximation to the classical indicator function used in standard rank regression. Substituting these smoothed ranks into the Wilcoxon score function yields a new estimator for the slope parameter(s) of the simple and multiple linear regression models. We show that the proposed estimator inherits the robustness properties of classical rank regression while providing improved efficiency under heavy-tailed error distributions and better handling of tied observations. A Wald-type hypothesis test for the regression coefficients is derived and its asymptotic normality is established. A Monte Carlo simulation study compares the new estimator with the ordinary least-squares (OLS) estimator, the classical Wilcoxon rank regression estimator, and the Theil-Sen estimator under several error distributions, including the normal, Laplace, Cauchy, and contaminated normal. The proposed estimator achieves relative efficiencies at or above those of classical rank regression uniformly across all scenarios considered, with notable gains in the presence of outliers and heavy-tailed errors.
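The smoothed-rank idea can be sketched in a few lines. This is a minimal illustration that makes several assumptions not taken from the paper:

* H is the standard normal CDF, and h is a bandwidth.
* The smoothed rank is R_i = 1/2 + sum_j H((e_i - e_j)/h). As h goes to 0, this tends to the integer rank for distinct values and to the midrank for ties.
* The slope minimizes a Jaeckel-type dispersion sum_i a(R_i) e_i with Wilcoxon scores a(R) = sqrt(12)(R/(n+1) - 1/2), found by a crude grid search.

All function names are hypothetical.

```python
import math

def kernel_cdf(u):
    # Assumed choice of H: the standard normal CDF (any continuous, nondecreasing CDF would do).
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_ranks(e, h):
    # R_i = 1/2 + sum_j H((e_i - e_j) / h); the j = i term contributes H(0) = 1/2.
    # As h -> 0 this gives integer ranks for distinct values and midranks for ties.
    return [0.5 + sum(kernel_cdf((ei - ej) / h) for ej in e) for ei in e]

def wilcoxon_score(r, n):
    # Wilcoxon score function phi(u) = sqrt(12) * (u - 1/2) evaluated at u = r / (n + 1).
    return math.sqrt(12.0) * (r / (n + 1.0) - 0.5)

def dispersion(beta, x, y, h):
    # Jaeckel-type dispersion sum_i a(R_i) * e_i of the residuals e_i = y_i - beta * x_i.
    # With a symmetric H the scores sum to zero, so the intercept drops out.
    e = [yi - beta * xi for xi, yi in zip(x, y)]
    ranks = smoothed_ranks(e, h)
    return sum(wilcoxon_score(r, len(e)) * ei for r, ei in zip(ranks, e))

def fit_slope(x, y, h, lo, hi, steps):
    # Crude grid search over the slope; a serious implementation would use a proper optimizer.
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda b: dispersion(b, x, y, h))

if __name__ == "__main__":
    x = list(range(1, 21))
    y = [2.0 * xi for xi in x]
    y[-1] += 100.0  # one gross outlier
    print(fit_slope(x, y, h=1e-3, lo=0.0, hi=4.0, steps=400))
```

Larger h gives smoother ranks and a smoother objective. As h shrinks, the dispersion approaches classical Wilcoxon rank regression.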

2606.13433 2026-06-12 stat.ME New submission

Smoothed-KL Reweighting: A Principled Account and Matching Rule for SNR-Based Diffusion Training

Lei Li

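AI summary: Proposes smoothed-KL reweighting, which derives a closed-form weight from the spread divergence and establishes a matching rule with the Min-SNR family. The method is validated on CIFAR-10 and CelebA-64, reaching comparable final FID with dataset-dependent iteration efficiency.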

Abstract

We give a principled derivation of the Soft-Min-SNR weight of Crowson et al. (2024). The spread divergence of Zhang et al. (2018) convolves both compared distributions with a Gaussian kernel before taking the Kullback-Leibler (KL) divergence; applied to the per-sample local matched-Gaussian surrogate at each timestep, it yields the closed-form weight w(t,lambda) = sigma^2 / (sigma^2 + lambda). Three consequences follow. First, for variance-preserving schedules, w(t,lambda) equals a constant multiple of Soft-Min-SNR with gamma' = (1+lambda)/lambda, deriving a validated heuristic rather than introducing a new weight. Second, the same weight matches Min-SNR-gamma at leading order under gamma approximately 1/lambda, giving a cross-walk between the soft and hard reweighting families. Third, a local-geometry analysis scales an SGD-difficulty proxy by w^3 at high-SNR timesteps. Complementary to the objective-level account of Kingma & Gao (2023), who unified monotonic-in-log-SNR weightings as ELBOs of noise-augmented data, ours smooths both compared distributions rather than only the data side. Empirically, the matching rule holds on CIFAR-10 (linear and cosine) and CelebA-64 (cosine), with trajectory-wide confirmation on the cross-dataset cut: |Ours - Min-SNR| averages 0.45 FID across seven intermediate checkpoints on the seed-42 CelebA-64 trajectory, roughly 3x tighter than either reweighter's gap to DDPM. The local-geometry prediction is partially borne out: Ours converges about 21% earlier than DDPM at mid-training FID thresholds on CIFAR-10's linear schedule, where high-SNR damping headroom is largest, but this iteration-efficiency advantage does not transfer to cosine or CelebA-64, where all three methods reach similar final FIDs. Overall: final-FID parity with dataset-dependent iteration efficiency, plus a principled matching rule across the Min-SNR family.
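The first two consequences can be checked by direct algebra. The short derivation below rests on two assumptions:

* A variance-preserving schedule, with $\alpha_t^2+\sigma_t^2=1$ and $\mathrm{SNR}_t=\alpha_t^2/\sigma_t^2$.
* The soft-min-SNR weight written in the $\epsilon$-prediction form $\gamma/(\mathrm{SNR}_t+\gamma)$. This convention is an assumption, since the abstract does not state it.

```latex
\begin{aligned}
\sigma_t^2 &= \frac{1}{1+\mathrm{SNR}_t}, \\
w(t,\lambda) &= \frac{\sigma_t^2}{\sigma_t^2+\lambda}
  = \frac{1}{1+\lambda\,(1+\mathrm{SNR}_t)}
  = \frac{1}{\lambda}\cdot\frac{1}{\gamma'+\mathrm{SNR}_t},
  \qquad \gamma' = \frac{1+\lambda}{\lambda}, \\
 &= \frac{1}{1+\lambda}\cdot\frac{\gamma'}{\mathrm{SNR}_t+\gamma'} .
\end{aligned}
```

The prefactor $1/(1+\lambda)$ does not depend on $t$, so $w$ is a constant multiple of the soft weight with $\gamma'=(1+\lambda)/\lambda$. When $\mathrm{SNR}_t \gg \gamma'$, $\gamma'/(\mathrm{SNR}_t+\gamma') \approx \gamma'/\mathrm{SNR}_t = \min(\mathrm{SNR}_t,\gamma')/\mathrm{SNR}_t$, which is the Min-SNR form in that regime. For small $\lambda$, $\gamma' \approx 1/\lambda$.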

2606.13242 2026-06-12 stat.ME stat.CO New submission

Least Absolute Deviations Estimation for Sinusoidal Models

Zehaan Naik, Debasis Kundu

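AI summary: Proposes robust parameter estimation for sinusoidal regression models based on least absolute deviations. A coordinate descent algorithm updates amplitudes by weighted medians and optimizes frequencies by a periodogram-based grid search. The paper proves strong consistency and asymptotic normality of the estimator and demonstrates robustness to non-Gaussian noise on synthetic data and real time series.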

Comments
34 pages, 5 figures
Abstract

We study robust parameter estimation in sinusoidal regression models within a least absolute deviations (LAD) framework. While classical approaches rely predominantly on least-squares formulations, they are known to be sensitive to heavy-tailed noise and outliers. We formulate the estimation problem as direct minimization of the LAD objective and propose a simple, modular coordinate descent algorithm that exploits the partial convexity of the objective: amplitude parameters are updated via weighted median computations, leading to substantial computational improvements over traditional simplex-based optimization methods, while frequency parameters are estimated via a periodogram-inspired grid search with local refinement. We establish strong consistency and asymptotic normality of the proposed estimator under mild regularity conditions. Empirically, we demonstrate the method's effectiveness on both synthetic datasets and real-world time series, including the Mauna Loa atmospheric CO2 data, air passenger data, and UK drivers' deaths data, where robustness to non-Gaussian noise is essential. The proposed approach provides a simple, interpretable, and robust alternative to least-squares-based methods for sinusoidal signal estimation.
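The weighted-median amplitude step follows from the identity sum_i |r_i - c g_i| = sum_i |g_i| |r_i/g_i - c|. Its minimizer over c is therefore a weighted median of r_i/g_i with weights |g_i|.

Below is a minimal sketch for a single sinusoid y_t ~ a cos(omega t) + b sin(omega t), with a plain frequency grid. It assumes one component and no local refinement step, so it is not the authors' full algorithm, and the function names are hypothetical.

```python
import math

def weighted_median(values, weights):
    # Lower weighted median: a minimizer m of sum_i w_i * |v_i - m|.
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= half:
            return v
    return pairs[-1][0]

def update_coef(resid, basis):
    # argmin_c sum_i |resid_i - c * basis_i| = argmin_c sum_i |basis_i| * |resid_i / basis_i - c|;
    # points with (near-)zero basis value barely depend on c and are skipped.
    idx = [i for i, g in enumerate(basis) if abs(g) > 1e-12]
    if not idx:
        return 0.0
    return weighted_median([resid[i] / basis[i] for i in idx], [abs(basis[i]) for i in idx])

def lad_loss(y, t, omega, a, b):
    return sum(abs(yi - a * math.cos(omega * ti) - b * math.sin(omega * ti)) for yi, ti in zip(y, t))

def fit_amplitudes(y, t, omega, sweeps=50):
    # Coordinate descent on (a, b). Each step is an exact one-dimensional LAD minimization,
    # so the loss is non-increasing (up to the negligible near-zero filter above).
    c = [math.cos(omega * ti) for ti in t]
    s = [math.sin(omega * ti) for ti in t]
    a = b = 0.0
    for _ in range(sweeps):
        a = update_coef([yi - b * si for yi, si in zip(y, s)], c)
        b = update_coef([yi - a * ci for yi, ci in zip(y, c)], s)
    return a, b

def fit_frequency(y, t, omegas, sweeps=20):
    # Grid search over candidate frequencies, keeping the smallest LAD loss.
    best = None
    for w in omegas:
        a, b = fit_amplitudes(y, t, w, sweeps)
        loss = lad_loss(y, t, w, a, b)
        if best is None or loss < best[0]:
            best = (loss, w, a, b)
    return best[1:]  # (omega, a, b)
```

Each amplitude update only needs a sort, so it costs O(n log n) per coordinate. A general-purpose simplex search over all parameters offers no such structure to exploit.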

2606.12884 2026-06-12 stat.ME eess.SP New submission

Volterra--Wiener--Kunchenko Orthogonalization: From Wiener--Hermite to Distribution-Matched Volterra Bases

Serhii Zabolotnii

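AI summary: Addresses the ill-conditioning of Volterra identification under non-Gaussian input by constructing a distribution-matched VWK basis through oriented Gram-Schmidt orthogonalization. The paper proves that the risk of a self-normalized diagonal estimator in the variance-matched Gaussian basis is governed by the skew coefficient, and experiments show that the VWK basis is better conditioned than the power basis.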

Comments
20 pages, 1 figure; companion reproducibility archive with code, frozen results, and Lean 4 files
Abstract

The monomial parameterization of finite-memory Volterra identification is ill-conditioned under non-Gaussian input, and the Wiener--Hermite expansion removes this ill-conditioning only for Gaussian white-noise input. We construct the distribution-matched Volterra--Wiener--Kunchenko (VWK) basis by oriented Gram--Schmidt orthogonalization of monomials in $L^2(P)$ and use it as an arbitrary-polynomial-chaos coordinate system for finite-memory Volterra identification from data, following the generalized polynomial chaos of Xiu and Karniadakis (2002) and the data-driven arbitrary polynomial chaos of Oladyshkin and Nowak (2012). The basis itself is classical; the contribution is the Volterra-estimation reading. First, an order-2 misspecification-penalty theorem shows that a self-normalized diagonal estimator in the variance-matched Gaussian basis incurs an excess $L^2(P)$ risk governed by the skew coefficient $\delta=\mu_3/\sigma^2$, vanishing exactly for symmetric inputs. Second, conditioning experiments separate the constructional fact that the population matched Gram is the identity from the finite-sample design Gram: at $n=2000$, the centered-exponential empirical VWK Gram remains far better conditioned than the power Gram, although it degrades with degree. Third, a machine-checked Lean 4 proof establishes the Binomial$(N,p)$ Krawtchouk row for arbitrary $N$. Full least squares over a fixed span is basis-invariant, so VWK stabilizes diagonal cross-correlation and regularized coordinate fits rather than claiming universal prediction superiority. The analysis is moment-based, finite-memory, and restricted to product input laws.
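Gram-Schmidt orthogonalization of monomials in $L^2(P)$ needs only the moments of P. Here <p, q> = E[p(X) q(X)] is computed as sum_{i,j} p_i q_j m_{i+j}.

Below is a minimal univariate sketch using population moments, with the leading coefficient kept positive. This is a plain Gram-Schmidt and an assumption about what "oriented" means. The data-driven version would replace the moments with sample averages, and the multivariate Volterra construction is not reproduced.

For standard Gaussian moments the result is the normalized Hermite basis. For the centered exponential X = E - 1 with E ~ Exp(1), the skew coefficient is $\delta=\mu_3/\sigma^2=2$.

```python
import math

def poly_inner(p, q, moments):
    # <p, q> = E[p(X) q(X)] for coefficient lists (constant term first), from raw moments E[X^k].
    return sum(pi * qj * moments[i + j] for i, pi in enumerate(p) for j, qj in enumerate(q))

def orthonormal_basis(moments, degree):
    # Gram-Schmidt on 1, x, ..., x^degree in L^2(P); needs moments up to order 2 * degree.
    basis = []
    for k in range(degree + 1):
        v = [0.0] * k + [1.0]  # the monomial x^k
        for b in basis:
            proj = poly_inner(v, b, moments)
            v = [vi - proj * (b[i] if i < len(b) else 0.0) for i, vi in enumerate(v)]
        norm = math.sqrt(poly_inner(v, v, moments))
        basis.append([vi / norm for vi in v])  # leading coefficient stays positive
    return basis

def gaussian_moments(kmax):
    # E[Z^k] for Z ~ N(0, 1): 0 for odd k, (k - 1)!! for even k.
    m = [1.0]
    for k in range(1, kmax + 1):
        m.append(0.0 if k % 2 else m[k - 2] * (k - 1))
    return m

def centered_exponential_moments(kmax):
    # E[(E - 1)^k] for E ~ Exp(1), via the binomial expansion and E[E^j] = j!.
    return [float(sum(math.comb(k, j) * math.factorial(j) * (-1) ** (k - j) for j in range(k + 1)))
            for k in range(kmax + 1)]

if __name__ == "__main__":
    m = centered_exponential_moments(6)
    print("skew coefficient mu3 / sigma^2 =", m[3] / m[2])
    for p in orthonormal_basis(m, 3):
        print(p)
```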

2606.13240 2026-06-12 cs.LG cs.AI cs.CV stat.ME stat.ML New submission

Towards More General Control of Diffusion Models Using Jeffrey Guidance

Raphaël Razafindralambo, Rémy Sun, Frédéric Precioso, Jes Frellsen, Pierre-Alexandre Mattei

Affiliations: Inria, CNRS, I3S, Maasai, Université Côte d’Azur; Technical University of Denmark; Inria, CNRS, LJAD, Maasai, Université Côte d’Azur

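AI summary: Proposes the Jeffrey guidance framework, which updates marginal distributions via Jeffrey's rule of conditioning. This extends diffusion-model control to applications that standard guidance cannot express. The method substantially reduces FID on CIFAR-10 and FFHQ and enables fairness control on CelebA-HQ.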

Abstract

A key strength of diffusion models lies in their flexibility, since their outputs can be controlled at sampling time through guidance. However, beyond simple cases such as conditional sampling, the target distribution is often left implicit, defined only through a sampling rule or a heuristic energy function. To address this, we propose Jeffrey guidance, a principled framework that extends diffusion-model control to applications beyond what standard guidance can express. It leverages Jeffrey's rule of conditioning to update marginal distributions towards a prescribed target, preserving the conditional structure and minimally perturbing the joint distribution. We first demonstrate Jeffrey guidance by targeting a prescribed embedding distribution. With Inception embeddings as the target, this leads to substantial reductions in FID on both CIFAR-10 and FFHQ. We further apply Jeffrey guidance to fairness on CelebA-HQ, updating an unconditional diffusion model to enforce independence between attributes.
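On a finite table, Jeffrey's rule reads P_new(x, e) = q(e) P(x | e). The marginal of e is moved to a prescribed q while every conditional P(x | e) is kept. This update is known to minimize KL(P_new || P) among distributions with that marginal.

The toy below only illustrates the rule itself. It does not show how Jeffrey guidance applies it inside diffusion sampling, and the names are hypothetical.

```python
def jeffrey_update(joint, target):
    # Jeffrey's rule on a finite table: P_new(x, e) = q(e) * P(x | e).
    # joint maps (x, e) -> probability; target maps e -> new marginal probability q(e).
    marg = {}
    for (x, e), p in joint.items():
        marg[e] = marg.get(e, 0.0) + p
    return {(x, e): target[e] * p / marg[e] for (x, e), p in joint.items()}

def marginal_e(joint):
    # Marginal distribution of the conditioning variable e.
    out = {}
    for (_, e), p in joint.items():
        out[e] = out.get(e, 0.0) + p
    return out

if __name__ == "__main__":
    joint = {("a", "e1"): 0.2, ("b", "e1"): 0.2, ("a", "e2"): 0.5, ("b", "e2"): 0.1}
    print(jeffrey_update(joint, {"e1": 0.7, "e2": 0.3}))
```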

2606.13554 2026-06-12 math.ST stat.ME New submission

Asymptotic regimes for maximum likelihood estimation in the Ewens--Pitman model: When the strength parameter matters

Filippo Ascolani, Mario Beraha, Stefano Favaro

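AI summary: Studies the large-sample asymptotic behaviour of maximum likelihood estimation of the discount and strength parameters in the Ewens-Pitman model. It finds four distinct regimes, in which θ may play a crucial role, and overcomes the restriction imposed by infinite exchangeability through a scaled model.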

Abstract

We study the large sample asymptotic behaviour of the Maximum Likelihood Estimator of the discount and strength parameters $(\alpha,\theta)$ in the Ewens--Pitman model for random partitions, under mild assumptions on the data-generating mechanism. We show that four distinct regimes arise, depending on the limiting behaviour of the frequency spectrum. In particular, in contrast with previous work, we find that $\theta$ may play a crucial role asymptotically. We further show that the existing literature implicitly focuses on only two of these regimes, and we relate this restriction to the constraints imposed by infinite exchangeability. Under the latter, indeed, the number of distinct blocks and the frequency spectrum are necessarily tied by a rigid structural relation. We prove that this lack of flexibility can be overcome through what we call the scaled Ewens--Pitman model, in which $\theta$ is allowed to grow with the sample size $n$. Finally, we provide empirical evidence from real-world data showing that such extensions are needed to capture frequency spectra that fall outside the classical Ewens--Pitman framework.
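The likelihood being maximized is the standard Ewens-Pitman exchangeable partition probability function. For block sizes $n_1,\dots,n_k$ with $n=\sum_j n_j$, it is $\prod_{i=1}^{k-1}(\theta+i\alpha) \,/\, (\theta+1)_{n-1} \cdot \prod_{j=1}^{k}(1-\alpha)_{n_j-1}$, where $(a)_m$ is the rising factorial. The data therefore enter only through the frequency spectrum.

Below is a minimal sketch with a crude grid-search MLE. The grid search is an illustration only, not the paper's procedure, and the regime analysis is not reproduced.

```python
import math

def log_rising(a, m):
    # log of the rising factorial (a)_m = a (a + 1) ... (a + m - 1), for a > 0.
    return math.lgamma(a + m) - math.lgamma(a)

def ewens_pitman_loglik(block_sizes, alpha, theta):
    # Log EPPF of a partition with the given block sizes; requires 0 <= alpha < 1 and theta > -alpha.
    n, k = sum(block_sizes), len(block_sizes)
    ll = sum(math.log(theta + i * alpha) for i in range(1, k))
    ll -= log_rising(theta + 1.0, n - 1)
    ll += sum(log_rising(1.0 - alpha, nj - 1) for nj in block_sizes)
    return ll

def mle_grid(block_sizes, alphas, thetas):
    # Maximize the log-likelihood over a grid, skipping invalid (alpha, theta) pairs.
    best = None
    for a in alphas:
        for t in thetas:
            if t <= -a:
                continue
            ll = ewens_pitman_loglik(block_sizes, a, t)
            if best is None or ll > best[0]:
                best = (ll, a, t)
    return best[1], best[2]
```

The probabilities sum to one over set partitions. For n = 3 that means p(3) + 3 p(2,1) + p(1,1,1) = 1, which gives a quick sanity check on any implementation.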

2606.12720 2026-06-12 math.PR math.ST stat.ML New submission

On McDiarmid's Inequality under Dependence via Approximate Tensorization of Entropy

Valentin Roth

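AI summary: Derives McDiarmid inequalities for dependent data via approximate tensorization of entropy (ATE), with applications to non-isotropic Gaussian vectors and strongly log-concave, log-smooth measures. The paper resolves a question on the concentration of the sign function, studies Erdős-Rényi graphs under dependence, and proves a Dvoretzky-Kiefer-Wolfowitz-type inequality that improves the convergence rate from $n^{-1/3}$ to $1/\sqrt{n}$.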

Comments
27 pages
Abstract

We argue that dependent versions of McDiarmid's inequality are a useful but underutilized tool in mathematical statistics, learning theory and theoretical computer science. To make this point, we first highlight that approximate tensorization of entropy (ATE) implies McDiarmid's inequality via the entropy method. Second, we derive McDiarmid's inequality for non-isotropic Gaussian random vectors $X \sim \mathcal N(\mu, \Sigma)$ through ATE, with a constant of the order of the condition number of $\Sigma$. We obtain this ATE independently through a simple application of stochastic localization. We also discuss how a more general ATE for the Gibbs sampler due to Ascolani et al. (2026) extends McDiarmid-type concentration to strongly log-concave and log-smooth probability measures. We then apply the resulting concentration inequalities to resolve a question posed by Simone Bombari on the concentration of $\operatorname{sign}(X)$ and to investigate Erdős-Rényi graphs under dependence. We also prove a Dvoretzky-Kiefer-Wolfowitz-type inequality for observations from a joint measure that satisfies ATE and has continuous marginal CDFs. For the class of strongly log-concave and log-smooth measures, this result improves upon a prior Dvoretzky-Kiefer-Wolfowitz-type inequality for non-i.i.d. observations due to Bobkov and Götze (2010): it establishes the expected $1/\sqrt{n}$ rate of convergence under weak dependence instead of $n^{-1/3}$.
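For reference, here are the independent-case bounds that the dependent versions generalize:

* McDiarmid: $P(f - Ef \ge t) \le \exp(-2t^2/\sum_i c_i^2)$ for bounded differences $c_i$.
* DKW with Massart's constant: $P(\sup_x |F_n(x)-F(x)| > \varepsilon) \le 2\exp(-2n\varepsilon^2)$. Inverting it gives the $1/\sqrt{n}$ radius.

The ATE-based versions change the constants according to the dependence, for example through the condition number of $\Sigma$ in the Gaussian case. The abstract does not give their exact form, so only the i.i.d. baseline is coded, and the names are hypothetical.

```python
import math

def mcdiarmid_tail(t, c):
    # Independent-case bound: P(f(X) - E f(X) >= t) <= exp(-2 t^2 / sum_i c_i^2).
    return math.exp(-2.0 * t * t / sum(ci * ci for ci in c))

def dkw_epsilon(n, delta):
    # Radius eps solving 2 exp(-2 n eps^2) = delta (DKW inequality with Massart's constant, i.i.d. data).
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

if __name__ == "__main__":
    for n in (100, 400, 1600):
        print(n, dkw_epsilon(n, 0.05))  # quadrupling n halves the radius
```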

2. Bayesian Statistics and Probabilistic Modeling (1 paper)

2606.12701 2026-06-12 stat.ME New submission

Bayesian machine learning approach for recurrent events studies using Soft Bayesian Additive Regression Trees (SBART)

MengXing Chen, Debajyoti Sinha, Antonio Linero

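AI summary: Proposes Soft Bayesian Additive Regression Trees (SBART), a nonparametric method that combines soft decision trees with Bayesian ensemble learning to model recurrent events. A two-layer data augmentation scheme enables efficient computation, and the method outperforms existing approaches on simulated and real data.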

Abstract

Recurrent event data frequently arise in biomedical studies, where individuals may experience multiple recurrences of the same type of event, such as recurrent hospitalizations. This article introduces a nonparametric method for recurrent events under a Bayesian ensemble learning framework, called Soft Bayesian Additive Regression Trees (SBART), which combines multiple soft decision trees to achieve high predictive accuracy and a smooth estimator of the underlying intensity of the recurrent events. The proposed model represents the conditional intensity function of the non-homogeneous Poisson process as the product of a time-constant baseline, a subject-specific frailty random effect, and a nonparametric component capturing potentially nonlinear covariate effects and unknown interactions among covariates and time. A two-layer data augmentation scheme is employed to efficiently incorporate the SBART component within our computational algorithm. Simulation studies demonstrate that our method, abbreviated RecSBART, achieves superior accuracy in estimating cumulative intensity compared to existing approaches, even when our modeling assumptions do not hold. Through a Bayesian analysis of a study of recurrent hospitalizations of colorectal cancer patients, we further demonstrate the ability of RecSBART to reveal and interpret the underlying complex relationships among covariates in a recurrent events study.
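A soft decision tree replaces each hard split with a probabilistic one. The prediction is then a probability-weighted average of the leaves, which makes the fitted function smooth in the covariates.

Below is a minimal sketch of evaluating a soft tree and a sum of trees. It assumes logistic gating with bandwidth tau, and the dictionary tree representation is hypothetical. How the ensemble enters the intensity (the link to the baseline and frailty factors) is not stated in the abstract, so it is left out, as is the posterior sampling.

```python
import math

def logistic(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)

def soft_tree_predict(node, x, tau):
    # Leaves are {"leaf": value}; internal nodes are {"var": j, "split": c, "left": ..., "right": ...}.
    # x goes left with probability logistic((c - x[j]) / tau); tau -> 0 recovers a hard split.
    if "leaf" in node:
        return node["leaf"]
    p_left = logistic((node["split"] - x[node["var"]]) / tau)
    return (p_left * soft_tree_predict(node["left"], x, tau)
            + (1.0 - p_left) * soft_tree_predict(node["right"], x, tau))

def ensemble_predict(trees, x, tau):
    # Sum-of-trees output, as in BART-type models.
    return sum(soft_tree_predict(tr, x, tau) for tr in trees)

if __name__ == "__main__":
    stump = {"var": 0, "split": 0.5, "left": {"leaf": -1.0}, "right": {"leaf": 3.0}}
    for x0 in (0.0, 0.4, 0.5, 0.6, 1.0):
        print(x0, soft_tree_predict(stump, [x0], tau=0.1))
```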

3. Causal Inference and Experimental Design (6 papers)

2606.13531 2026-06-12 stat.ME New submission

When Representative Samples Produce Worse Outcomes: Scale-up Decisions and Testing in Small-Budget RCTs

Hannah Li, Hongseok Namkoong, Isaac Scheinfeld

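AI summary: Studies how the sample composition of a small-budget RCT pilot affects expected outcomes when a statistical significance test decides whether to scale up an intervention. It finds that under small budgets the optimal design samples from only a single homogeneous sub-population.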

Abstract

Small randomized controlled trials are often used to screen interventions before running larger follow-up studies. This is a critical phase of experimentation, as missing effective interventions or scaling up harmful ones can be very costly. A common proposal to mitigate these errors is to recruit samples that are representative of the target population, but this is often challenging in resource-constrained pilots. We challenge the narrative that representative samples are always superior by showing that when statistical significance testing determines whether interventions receive further study, the pilot trial composition that maximizes the downstream expected improvement in outcomes depends critically on its budget size. In the large-budget limit, the optimal pilot design converges to a sample that is representative of the target population. However, in the small-budget regime, the pilot designer maximizes expected impact by sampling only from a single homogeneous sub-population, chosen in a manner that depends on sampling costs and the designer's prior beliefs about heterogeneous treatment effects. Our proof of the small-budget result applies more generally when an RCT and significance test are used to decide whether to receive any non-adaptive downstream payoff, a result that may be applicable to other settings with constrained experimentation budgets.
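The budget dependence can be seen in a toy model. This is not the paper's model, and every number in it is a made-up assumption:

* There are two subgroups, A and B, each half of the target population.
* The prior has two equally likely effect scenarios: (0.5, 0.0) and (0.3, -0.5).
* The test is a one-sided z-test with known standard error 2/sqrt(n).
* The payoff is the population-average effect, realized whenever the pilot is significant.
* The representative pilot sees the population-average effect, ignoring the extra variance from heterogeneity.

With these settings, sampling only subgroup A gives the higher expected payoff at n = 20. The representative pilot wins at n = 2000, because it rarely scales up the harmful scenario.

```python
import math

Z_CRIT = 1.6448536269514722  # one-sided 5% critical value of N(0, 1)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(effect, n, sigma=1.0):
    # Rejection probability of a one-sided z-test in a two-arm pilot of total size n
    # (equal arms, known outcome SD sigma, so the standard error is 2 * sigma / sqrt(n)).
    return norm_cdf(effect * math.sqrt(n) / (2.0 * sigma) - Z_CRIT)

# Hypothetical prior: two equally likely scenarios for the effects (tau_A, tau_B),
# with subgroups A and B each making up half of the target population.
SCENARIOS = [(0.5, 0.0), (0.3, -0.5)]

def expected_payoff(n, design):
    # Expected population-average gain from scaling up whenever the pilot is significant.
    # "subgroup_A" samples only subgroup A; "representative" sees the population-average effect
    # (simplification: extra variance from effect heterogeneity is ignored).
    total = 0.0
    for tau_a, tau_b in SCENARIOS:
        pop_effect = 0.5 * (tau_a + tau_b)
        tested = tau_a if design == "subgroup_A" else pop_effect
        total += 0.5 * power(tested, n) * pop_effect
    return total

if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, expected_payoff(n, "subgroup_A"), expected_payoff(n, "representative"))
```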

2606.13305 2026-06-12 stat.ME stat.AP stat.CO New submission

Semiparametric Bayesian inference for causal mediation in cluster randomized trials

Woojung Bae, Michael Daniels, Joseph Hogan, Rajesh Vedanthan, Stavroula Chrysanthopoulou

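AI summary: Addresses causal mediation analysis in cluster randomized trials with few clusters and a mediator measured at the cluster level. It proposes a robust inference framework that combines parametric Bayesian models with a similarity-weighted Bayesian bootstrap to estimate natural direct and indirect effects accurately.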

Abstract

Cluster randomized trials (CRTs) are frequently used to evaluate interventions, yet conducting causal mediation analysis in these settings remains challenging, particularly when the mediator is measured at the cluster level and the number of clusters is small. Standard inference methods often rely on asymptotic assumptions that fail in finite-sample settings, leading to biased variance estimation and invalid confidence intervals. In this paper, we propose a robust inference framework for causal mediation analysis in CRTs. We utilize parametric Bayesian models for the outcome and mediator to ensure computational efficiency and interpretability. Crucially, to quantify uncertainty, we specify a novel similarity-weighted Bayesian bootstrap (SWBB) with a "distance" metric between clusters; this avoids the need for restrictive parametric assumptions and allows the model to borrow more information from "closer" clusters. By combining observed data models with causal assumptions, our approach accurately estimates natural direct and indirect effects even with limited clusters. Simulation studies demonstrate that our method achieves nominal coverage probability across diverse scenarios. We illustrate the practical utility of our approach by assessing mediation in a CRT in Kenya.
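The SWBB builds on the Bayesian bootstrap (Rubin, 1981), which draws Dirichlet(1, ..., 1) weights over units (here, clusters) instead of resampling them. Dirichlet(1, ..., 1) weights can be generated as normalized Exp(1) draws.

Below is a minimal sketch of the plain version applied to cluster-level values. The similarity weighting through a distance between clusters is the paper's addition, and the abstract does not specify it, so it is not reproduced. The function name is hypothetical.

```python
import random

def bayesian_bootstrap_means(cluster_values, draws, seed=0):
    # Plain Bayesian bootstrap over clusters: each draw reweights the clusters with
    # Dirichlet(1, ..., 1) weights (normalized Exp(1) variables) and returns the weighted mean.
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        g = [rng.expovariate(1.0) for _ in cluster_values]
        s = sum(g)
        out.append(sum(gi / s * v for gi, v in zip(g, cluster_values)))
    return out

if __name__ == "__main__":
    draws = sorted(bayesian_bootstrap_means([1.0, 2.0, 3.0, 4.0, 10.0], 4000))
    print("95% interval:", draws[100], draws[3899])
```

Every draw is a convex combination of the observed cluster values. This is one reason the plain version can be too narrow with very few clusters, the setting the abstract targets.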

2606.13281 2026-06-12 stat.ME New submission

Causal invariance in graphical models with latent variables

Marco Borriero, Monia Lupparelli, Giovanni M. Marchetti, Veronica Vinciotti

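AI summary: Studies when the causal invariance principle applies in the presence of latent variables and characterizes the structure of the graph induced on the observed variables. It also gives necessary and sufficient conditions for testing invariance with a multivariate Gaussian target.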

Abstract

Causal discovery aims to identify causal relationships among variables from observational or interventional data, typically represented by a directed acyclic graph (DAG). The causal invariance principle enables the identification of the causal parents of target variables by exploiting the stability of causal effects across different experimental settings. When some parents are unobserved, however, the induced graph over the observed variables may no longer be a DAG, and it may not be unique, complicating causal inference. For relevant configurations of latent parents, we characterize the induced graph and formalize the conditions under which causal invariance is preserved for the identification of the observed parents. Necessary and sufficient conditions for testing such invariance are formally established for a multivariate Gaussian target.

2606.12623 2026-06-12 stat.AP cs.LG New submission

Estimating Individualized Treatment Effects in Acute Ischemic Stroke with Causal Transformation Models (TRAM-DAG): A Multi-Centre Observational Study with External RCT Validation

Oliver Dürr, Lisa Herzog, Pascal Bühler, Susanne Wegener, Beate Sick

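AI summary: Proposes causal transformation models (TRAM-DAG) for estimating individualized treatment effects in patients with acute ischemic stroke. The model is fitted on observational data and then validated on an RCT population, where the average of its estimates agrees with the ATE and it correctly ranks patient outcomes.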

Abstract

Personalized medicine in acute ischemic stroke requires moving beyond average treatment effects (ATE) to individualized treatment effect (ITE) estimates to support treatment decisions. In acute ischemic stroke, mechanical thrombectomy has been shown to be more effective on average than lysis in randomized controlled trials (RCTs), such as the MR CLEAN study. We aim to identify which individual patients benefit most from mechanical thrombectomy compared to lysis. The outcome of interest is the modified Rankin Scale (mRS) at three months, an ordinal measure of functional disability (0: no symptoms, 6: death). We demonstrate that causal transformation models on directed acyclic graphs (TRAM-DAG) can be used for ITE estimation after being fitted on observational MAGIC multi-center stroke patient data. To ensure comparability with the MR CLEAN population, which we use for validation, we train the TRAM-DAG on a MAGIC sub-population with NIHSS at admission >= 6, corresponding to one inclusion criterion of MR CLEAN. The fitted model is then used to estimate ITEs for stroke patients in the MR CLEAN population. While these ITE estimates cannot be confirmed experimentally, we show that their average is consistent with the trial's reported ATE. Furthermore, the ITE estimates correctly rank trial patients by their observed frequency of a good outcome (mRS at three months <= 2). These findings support the use of causal models like TRAM-DAG for personalized decision-making in stroke care and highlight their ability to bridge the gap between observational evidence and clinical trials.
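For an ordinal outcome such as mRS, one natural ITE scale is the difference in the probability of a good outcome (mRS <= 2) under treatment versus control for the same patient.

Below is a minimal sketch under a hypothetical proportional-odds (logistic transformation) model, P(mRS <= 2) = sigmoid(theta2 - eta). Here theta2 is a cutpoint, eta = x'beta is the patient's linear predictor, and treatment adds gamma (negative gamma means better outcomes). The parameter values and names are assumptions, not the fitted TRAM-DAG.

```python
import math

def sigmoid(u):
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)

def prob_good_outcome(theta2, eta):
    # Hypothetical proportional-odds model: P(mRS <= 2 | eta) = sigmoid(theta2 - eta);
    # larger eta shifts probability mass toward higher (worse) mRS categories.
    return sigmoid(theta2 - eta)

def ite(theta2, x_beta, gamma):
    # Individualized effect on P(good outcome): treated (eta = x_beta + gamma) minus control (eta = x_beta).
    return prob_good_outcome(theta2, x_beta + gamma) - prob_good_outcome(theta2, x_beta)

def average_ite(theta2, x_betas, gamma):
    # Average of the ITEs over a population of linear predictors (comparable to an ATE).
    return sum(ite(theta2, xb, gamma) for xb in x_betas) / len(x_betas)
```

Even with a constant log-odds effect gamma, the ITE on the probability scale varies with x_beta. That variation is what lets such a model rank patients by expected benefit.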

2606.12680 2026-06-12 cs.LG stat.ML New submission

How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?

Julia Kostin, Kasra Jalaldoust, Elias Bareinboim, Samory Kpotufe, Fanny Yang

Affiliations: Department of Computer Science, ETH Zurich; Causal Artificial Intelligence Lab, Columbia University; Department of Statistics, Columbia University

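AI summary: Studies how causal invariance improves supervised domain adaptation in linear regression. It derives matching upper and lower bounds in terms of the target-risk margins between candidate predictors and the finite-sample estimation error, and shows that adaptive aggregation avoids negative transfer when the margins are large enough.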

Abstract

Machine learning models often degrade when they are deployed on a target distribution that differs from the source distributions they were trained on. Recent work in causality-based domain generalization has shown how shared causal structure between domains can induce invariant predictors, e.g., models on a subset of features which have stable risk across structured domain shifts. However, the extent to which such population-level causal invariances can lead to gains in finite-sample settings remains underexplored. In particular, in practice we often have access to a few labeled target samples, a setting called supervised domain adaptation (sDA). In this paper, we explore when (full or partial) causal knowledge can provably improve supervised domain adaptation. As a first step, we study linear regression, where full or partial causal knowledge specifies a collection of invariant or possibly invariant feature subsets, each yielding a source-trained candidate predictor. We derive matching upper and lower bounds showing that finite-sample gains are governed by the target-risk margins separating the candidates, together with the finite-source estimation error. When these margins are sufficiently large relative to the number of labeled target samples $n_Q$, an adaptive aggregation procedure can match the best candidate predictor while avoiding negative transfer relative to target-only learning. On the other hand, when the margins are too small, no algorithm can reliably exploit the candidate collection to obtain faster finite-sample rates. We further connect these margins to structural shift magnitude in linear SCMs and validate the theory on real-world causal benchmarks.
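The candidate-versus-target-only trade-off can be illustrated with a simple stand-in for adaptive aggregation. The sketch fits a target-only model on half of the labeled target sample, then picks whichever of that model and the source-trained candidates has the smallest hold-out error on the other half.

This is an assumption for illustration, not the paper's procedure, whose exact form the abstract does not give. The candidate names and the one-dimensional, no-intercept setup are hypothetical.

```python
def mse(pred, y):
    return sum((p - yi) ** 2 for p, yi in zip(pred, y)) / len(y)

def fit_through_origin(xs, ys):
    # One-dimensional least squares without intercept: slope = sum(x * y) / sum(x^2).
    sxx = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / sxx

def aggregate_by_validation(candidates, xs, ys):
    # candidates: name -> source-trained predictor (a function of x).
    # Fit a target-only model on the first half of the target sample, then return the name of
    # the predictor with the smallest mean squared error on the second half.
    half = len(xs) // 2
    slope = fit_through_origin(xs[:half], ys[:half])
    pool = dict(candidates)
    pool["target_only"] = lambda x: slope * x
    xv, yv = xs[half:], ys[half:]
    return min(pool, key=lambda k: mse([pool[k](x) for x in xv], yv))
```

Keeping the target-only fit in the pool is what guards against negative transfer. When no candidate beats it on held-out target data, it is selected.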

2606.12892 2026-06-12 stat.ML cs.LG econ.EM math.ST stat.ME New submission

Prediction-Powered Causal Inference by Automatic Debiased Machine Learning and Semi-Supervised Riesz Regression

Masahiro Kato

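AI summary: Studies semiparametric efficient estimation of causal parameters in a semi-supervised setting. By combining debiased machine learning with semi-supervised Riesz regression, it proposes the EE-DML-PPCI and TMLE-DML-PPCI methods, which achieve smaller asymptotic variance than estimators that use labeled data alone.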

Abstract

This study investigates semiparametric efficient estimation of causal and structural parameters in a semi-supervised setting. In our setting, unlabeled auxiliary regressors are available in addition to labeled observations consisting of outcomes and regressors. Our goal is to construct estimators of causal and structural parameters whose asymptotic variances are smaller than those of estimators constructed using only labeled data. We refer to this framework as prediction-powered causal inference (PPCI). We first derive the efficient influence function and the efficiency bound, which imply that the use of auxiliary regressors can attain a smaller asymptotic variance than the efficiency bound attainable from labeled observations alone. Then, by combining the efficient influence function with the debiased machine learning (DML) framework, we propose methods that we call DML-PPCI. If we construct an estimating-equation estimator, we refer to the method as EE-DML-PPCI; if we construct a targeted-learning estimator, we refer to the method as TMLE-DML-PPCI. The asymptotic variances of both estimators match our derived efficiency bound. In the construction of the estimators, estimation of the efficient influence function plays an important role. In our study, the efficient influence function is also a Neyman orthogonal score, which depends on the Riesz representer and the regression function. For Riesz representer estimation, we develop semi-supervised generalized Riesz regression with convergence rate guarantees.
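The simplest special case shows why unlabeled regressors help. To estimate the mean E[Y] when labels are missing completely at random, combine the average of a predictor m(X) over all regressors, labeled and unlabeled, with the average labeled residual Y - m(X). The residual term corrects the bias of an imperfect predictor, and the first term uses the extra regressors to reduce variance.

Below is a minimal sketch for this case only, assuming a fixed, externally fitted `predict` function (hypothetical; in DML it would be cross-fitted). The paper's estimators target general causal parameters through Riesz representers and are not reproduced.

```python
def semi_supervised_mean(y_labeled, x_labeled, x_unlabeled, predict):
    # Estimate E[Y] as: mean of predict(X) over all regressors (labeled and unlabeled)
    # plus the mean labeled residual Y - predict(X), which removes the predictor's bias.
    all_x = list(x_labeled) + list(x_unlabeled)
    pred_mean = sum(predict(x) for x in all_x) / len(all_x)
    resid_mean = sum(y - predict(x) for x, y in zip(x_labeled, y_labeled)) / len(y_labeled)
    return pred_mean + resid_mean
```

With a useless predictor (always 0), the estimator falls back to the labeled-sample mean. With a perfect predictor, it averages predictions over the full sample.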

4. Time Series and Spatial Statistics (2 papers)

2606.13615 2026-06-12 math.PR stat.ME New submission

Data-driven subsampling rates for diffusion parameter estimation of SDEs

Felix Lindner, Andre Schmeißer, Felipe Trolldenier, Raimund Wegener

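AI summary: Proposes an automatic method for selecting subsampling rates based on monotone-run statistics. The method ensures that the subsampled data agree with the SDE model at the infinitesimal scale and does not require an asymptotic framework of multiscale diffusions.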

Comments
30 pages, 11 figures
Abstract

We study the problem of diffusion parameter estimation for stochastic differential equation (SDE) models in scenarios where data and model are compatible only on specific scales that have yet to be determined. We introduce a simple and efficient method for selecting suitable rates at which given time series data should be subsampled in order to ensure that the statistical structure of the subsampled data is consistent with the behavior of the SDE model on an infinitesimal scale. Our approach is based on analyzing the statistics of the lengths of monotonically increasing or decreasing segments in the subsampled data sequence, which we refer to as monotone runs. As an analytical foundation, we prove for a large class of SDEs with additive noise that the lengths of monotone runs at an infinitesimal scale are approximately geometrically distributed with success probability $1/2$. This universal characterization is employed to derive an automated method for selecting appropriate subsampling rates for given time series data that is directly applicable in real-world scenarios and does not rely on an asymptotic framework of multiscale diffusions. The approach is demonstrated using an application from industrial mathematics concerning surrogate models for fiber lay-down curves in production processes of nonwoven textiles.
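The geometric characterization has a simple source. If increments were i.i.d. with a symmetric continuous law, their signs would be fair coin flips. A run would then end at each step with probability 1/2, so run lengths would be Geometric(1/2) with mean 2.

The sketch below builds one plausible selection rule on this: pick the smallest subsampling step whose mean run length is close to 2. The rule, the tolerance, and the function names are assumptions, not the paper's procedure.

```python
import random

def monotone_run_lengths(x):
    # Lengths of maximal runs of consecutive increments with the same sign (zero increments dropped).
    signs = [1 if b > a else -1 for a, b in zip(x, x[1:]) if b != a]
    runs, length = [], 1
    for prev, cur in zip(signs, signs[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    if signs:
        runs.append(length)
    return runs

def mean_run_length(x):
    runs = monotone_run_lengths(x)
    return sum(runs) / len(runs) if runs else float("nan")

def select_subsampling_rate(x, rates, tol=0.15):
    # Return the first step k for which keeping every k-th point gives a mean run length
    # within tol of 2 (the mean of Geometric(1/2)), or None if no candidate qualifies.
    for k in rates:
        if abs(mean_run_length(x[::k]) - 2.0) <= tol:
            return k
    return None

if __name__ == "__main__":
    rng = random.Random(0)
    walk = [0.0]
    for _ in range(100000):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))
    print(mean_run_length(walk), select_subsampling_rate(walk, range(1, 10)))
```

A moving-average-smoothed random walk shows long monotone runs at the native resolution. This rule then selects a larger step, once increments stop sharing overlapping smoothing windows.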

2606.12836 2026-06-12 physics.data-an q-bio.QM stat.ME New submission

Interpretable model-free inference of parametric variation across time-series data through large-scale feature extraction

Ben D. Fulcher, Carl H. Lubba, Giorgio F. Gilestro, Simon R. Schultz, Nick S. Jones

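AI summary: Proposes an unsupervised, data-driven method that uses a library of over 7000 time-series features to infer, from time-series data, the dimensionality and nature of parametric variation in an unknown generative process, without specifying or fitting a model.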

English abstract

Here we address the problem of estimating the dimensionality and nature of parametric variation in an unknown generative process directly from time-series data, without specifying or fitting a model. In particular we suppose that inter-instance variation in collections of time series is caused by parametric variation in the generating model. We hypothesize that, given a sufficiently large library of time-series features, low-dimensional parametric variation will manifest as low-dimensional structure in feature space, enabling interpretable estimators of the underlying degrees of freedom to be constructed. We test our hypothesis using a library of over 7000 diverse and interpretable time-series statistics and thirteen simulated systems with known parametric variation, spanning linear stochastic processes, nonlinear oscillators, and chaotic dynamics. Our unsupervised, data-driven approach often reconstructs the underlying parametric variation across this extensive range of simulated dynamical systems while also yielding interpretable estimators for each underlying dimension. Applied to the movement dynamics of 1143 fruit flies, we use this method to extract biologically meaningful components corresponding to sex and circadian rhythmicity. Our results pave the way for much-needed data-driven methods to bridge the gap between interpretable theoretical understanding of dynamics and the large and complex datasets that characterize modern scientific problems.
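
A toy Python sketch of the central hypothesis, assuming a three-feature "library" (lag-1 and lag-2 autocorrelation and log standard deviation) as a hypothetical stand-in for the 7000-feature library. AR(1) processes with one varying coefficient `phi` play the role of the generative model with a single parameter. The leading principal component of the standardized feature matrix, computed here by power iteration, should track `phi` even though `phi` is never used in the unsupervised step.

```python
import math
import random


def simulate_ar1(phi, n, rng, burn_in=100):
    """Simulate n values of x_t = phi * x_{t-1} + e_t with standard normal e_t."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def autocorr(x, lag):
    """Sample autocorrelation at the given lag."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def features(x):
    """Hypothetical three-feature library (a real library is far larger)."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return [autocorr(x, 1), autocorr(x, 2), math.log(sd)]


def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and variance 1."""
    cols = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        cols.append([(v - m) / s for v in col])
    return [list(r) for r in zip(*cols)]


def first_pc_scores(rows, iters=500):
    """Scores on the leading principal component, found by power iteration."""
    z = zscore_columns(rows)
    d = len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / len(z) for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(r[i] * v[i] for i in range(d)) for r in z]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
phis = [0.9 * i / 19 for i in range(20)]            # hidden parameter per instance
rows = [features(simulate_ar1(phi, 1000, rng)) for phi in phis]
scores = first_pc_scores(rows)                      # unsupervised: phis not used
print("|corr(PC1, phi)| =", round(abs(pearson(scores, phis)), 3))
```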

5. Computational Statistics and MCMC (8 papers)

2606.13213 2026-06-12 stat.ME stat.ML New submission

Calibrating simplified vine copulas with a noise contrastive estimation approach

Michael Denis Kraus, David Huk, Claudia Czado

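AI summary: Because simplified vine copulas can be misspecified when conditional dependence varies markedly, proposes a calibration strategy based on observation-specific correction factors that uses noise contrastive estimation (NCE) to make local adjustments and improve model accuracy.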

Comments
Preprint

English abstract

Vine copulas provide a flexible framework for modeling complex multivariate dependence structures using only bivariate building blocks. Their practical success relies heavily on the simplifying assumption, which restricts conditional pair copulas to be independent of the specific conditioning values. While this assumption greatly facilitates estimation, it may lead to model misspecification in applications where the conditional dependence varies markedly. We propose a novel calibration strategy for simplified vine copula models based on observation-specific correction factors. These factors are derived using noise contrastive estimation (NCE), a supervised learning technique for density estimation that reframes the problem as a binary classification task with an easily sampled noise distribution. Treating the fitted simplified vine copula as the noise model, the NCE approach yields corrected log-likelihood estimates for individual observations, thereby locally adjusting the simplified vine toward the underlying data-generating dependence structure. Simulation studies demonstrate that the proposed calibration provides sensible and effective adjustments, improving model accuracy when the simplifying assumption is violated while remaining neutral when the simplified model is adequate. Two real-data applications further illustrate the practical benefits of the method. The results highlight NCE-based calibration as a promising tool to enhance simplified vine copula models without abandoning their computational tractability.
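
A one-dimensional Python sketch of the NCE correction idea, not the authors' vine-copula implementation. Assumptions: the "true" density is N(0, 1), the fitted (misspecified) noise model is N(0, 4), the classifier is logistic regression on the hypothetical features (1, x^2) fitted by Newton's method, and data and noise samples have equal size, so the noise-to-data ratio term log(nu) vanishes. The classifier's log-odds then estimate log p(x) - log q(x), which is added to the noise model's log density.

```python
import math
import random


def log_normal_pdf(x, sd):
    """Log density of N(0, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - x * x / (2.0 * sd * sd)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_2d(feats, labels, iters=30):
    """Logistic regression with two features, fitted by Newton's method."""
    w0 = w1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for (f0, f1), y in zip(feats, labels):
            p = sigmoid(w0 * f0 + w1 * f1)
            g0 += (y - p) * f0            # gradient of the log-likelihood
            g1 += (y - p) * f1
            s = p * (1.0 - p)             # Fisher information weights
            h00 += s * f0 * f0
            h01 += s * f0 * f1
            h11 += s * f1 * f1
        det = h00 * h11 - h01 * h01
        w0 += (h11 * g0 - h01 * g1) / det   # Newton step: w += H^{-1} g
        w1 += (h00 * g1 - h01 * g0) / det
    return w0, w1


rng = random.Random(0)
n = 4000
data = [rng.gauss(0.0, 1.0) for _ in range(n)]      # "true" density: N(0, 1)
NOISE_SD = 2.0                                      # misspecified model q = N(0, 4)
noise = [rng.gauss(0.0, NOISE_SD) for _ in range(n)]

# Classify data (label 1) against noise (label 0) using features (1, x^2).
feats = [(1.0, x * x) for x in data + noise]
labels = [1] * n + [0] * n
w0, w1 = fit_logistic_2d(feats, labels)


def corrected_log_density(x):
    """NCE correction: log p(x) ~ log q(x) + h(x) + log(nu), with nu = 1 here."""
    return log_normal_pdf(x, NOISE_SD) + w0 + w1 * x * x


for x in (0.0, 1.0, 2.0):
    print(x, round(log_normal_pdf(x, NOISE_SD), 3),
          round(corrected_log_density(x), 3), round(log_normal_pdf(x, 1.0), 3))
```

In this toy case the exact log ratio, log 2 - 0.375 x^2, lies in the classifier's function class, so the corrected density should move close to the true one.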

2606.12857 2026-06-12 stat.ME stat.CO New submission

Discrepancy Modeling with Intermediate Variables: A New Framework for Robust Gaussian Process Calibration

Henry Shaowu Yuchi, Michael Grosskopf, Aman Sharma, Nicolas Schunck, Jared O'Neal, Matt Menickelly, Stefan M. Wild

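AI summary: Proposes a robust Gaussian process calibration framework that uses intermediate variables for discrepancy modeling; through structured variable selection, a discretized scaled Gaussian process constraint, and a space-filling design, it jointly models the emulator and the discrepancy, improving predictive performance and mitigating identifiability issues.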

English abstract

Gaussian processes are widely used for surrogate modeling in computer experiments, which often produce numerous intermediate variables that are not explicitly used in standard calibration frameworks. Calibration of imperfect models can be challenging without leveraging these variables, while fitting the emulator and the discrepancy models separately also poses identifiability issues. In this work, we propose a robust Gaussian process calibration framework that leverages intermediate variables for discrepancy modeling. The framework integrates a structured intermediate variable selection process, a discretized scaled Gaussian stochastic process (S-GaSP) to constrain the discrepancy term, and a space-filling design strategy for selecting constraint points. This enables joint modeling of the emulator and discrepancy, improving predictive performance, providing principled uncertainty quantification, and alleviating identifiability risks. We demonstrate its efficacy on a nuclear physics application involving binding energies, where it outperforms baseline approaches.

2606.12596 2026-06-12 stat.ME New submission

Extending Prais-Winsten Regression to Panel Data with Higher-Order Autoregressive Errors: A Simulation Study

Ariel Linden

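AI summary: Extends the Prais-Winsten AR(k) GLS transformation to panel data, implements it in the Stata package xtpraisk, and validates its statistical properties by Monte Carlo simulation; xtpraisk achieves higher power than xtscc while maintaining the nominal Type I error rate, and is robust to misspecification of the autoregressive order.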

English abstract

We extend the Prais-Winsten AR(k) generalized least squares (GLS) transformation to panel data within the Beck-Katz panel-corrected standard error (PCSE) framework and implement the method in the community-contributed Stata package xtpraisk. As the panel extension of Prais-Winsten, xtpraisk is the natural comparator to xtscc, the panel extension of Newey-West and implementation of the Driscoll-Kraay estimator. We conduct a Monte Carlo simulation to validate the statistical properties of xtpraisk and compare its finite-sample performance with xtscc. The simulation spans autoregressive orders 1-3, three autocorrelation scenarios, three panel sizes, six series lengths, and five effect sizes, with 2,000 replications per condition. Across all conditions, xtpraisk achieved higher power than xtscc while maintaining near-nominal Type I error rates, confidence interval coverage, and standard error calibration. In contrast, xtscc exhibited systematic standard error underestimation and inflated Type I error at short series lengths, with both deficiencies worsening as autoregressive order increased. Both estimators were essentially unbiased. Misspecification of the autoregressive order did not degrade xtpraisk's inferential performance, and cross-panel correlation and panel size had negligible effects on the relative performance of either estimator. The results indicate that xtpraisk is preferable when both statistical efficiency and valid inference are priorities, particularly under persistent higher-order autocorrelation and short to moderate series lengths.
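
For orientation, a minimal Python sketch of the classical single-series AR(1) Prais-Winsten transformation with a known `rho`. It does not reproduce xtpraisk's AR(k) transform, its panel structure, or the PCSE computation, and the function names are hypothetical. The first observation is rescaled by sqrt(1 - rho^2) rather than dropped, and the transformed intercept column must replace the plain column of ones.

```python
import math


def prais_winsten_transform(values, rho):
    """AR(1) Prais-Winsten transform of one column (response or regressor).

    Observation 1 is scaled by sqrt(1 - rho^2) instead of being dropped;
    later observations are quasi-differenced: v_t - rho * v_{t-1}.
    """
    first = math.sqrt(1.0 - rho * rho) * values[0]
    return [first] + [values[t] - rho * values[t - 1] for t in range(1, len(values))]


def fit_simple_prais_winsten(y, x, rho):
    """Intercept and slope by OLS on the transformed data, for a known rho."""
    ys = prais_winsten_transform(y, rho)
    cs = prais_winsten_transform([1.0] * len(y), rho)  # transformed intercept column
    xs = prais_winsten_transform(x, rho)
    # 2x2 normal equations for the design [cs, xs] (no additional intercept).
    a11 = sum(c * c for c in cs)
    a12 = sum(c * v for c, v in zip(cs, xs))
    a22 = sum(v * v for v in xs)
    b1 = sum(c * w for c, w in zip(cs, ys))
    b2 = sum(v * w for v, w in zip(xs, ys))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


x = [float(i) for i in range(10)]
y = [2.0 + 3.0 * v for v in x]          # noiseless line, so any rho recovers it
print(fit_simple_prais_winsten(y, x, rho=0.6))
```

In a feasible GLS workflow, `rho` would itself be estimated, for example from lagged OLS residuals, and the fit could be iterated.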

2606.13234 2026-06-12 stat.CO math.NA math.ST New submission

Switching Hamiltonian Monte Carlo for sampling from mixture distributions

A. Sharma

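AI summary: Proposes a switching Hamiltonian Monte Carlo method that combines symmetric numerical integrators with Poisson jumps to sample from finite mixtures of Boltzmann-Gibbs distributions, and proves geometric ergodicity and second-order bias.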

English abstract

We introduce a switching Hamiltonian Monte Carlo method for sampling from finite mixtures of Boltzmann-Gibbs distributions. We propose symmetric numerical integrators to approximate switching Hamiltonian dynamics interlaced with Poisson jumps, where the regime-switching chain is simulated using the uniformization technique or the stochastic simulation algorithm. We prove geometric ergodicity of the resulting Markov chain. We develop an approach based on the discrete Poisson equation associated with numerical schemes to estimate the error in computing ergodic averages. Using this approach, we prove that the proposed numerical integrators have second-order bias. This approach is simple and can be generalized to other settings, for example, kinetic Langevin equations. Finally, we verify the convergence result via numerical experiments.
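
A minimal Python sketch of the uniformization technique for the regime-switching component only; the Hamiltonian integrator is omitted. Assumptions: a finite-state chain with generator `Q` and a uniformization rate equal to the largest exit rate. Event times come from a Poisson process with that rate, and at each event the chain moves according to P = I + Q / lambda, self-loops included. The check uses the closed form for a two-state chain with rates a (0 to 1) and b (1 to 0): P(X_t = 0 | X_0 = 0) = b/(a+b) + a/(a+b) exp(-(a+b)t).

```python
import math
import random


def simulate_ctmc_uniformization(Q, state, t, rng):
    """Return the state at time t of a finite CTMC with generator Q, by uniformization."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))            # uniformization rate
    # Jump matrix of the uniformized chain; self-loops absorb the extra events.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    clock = rng.expovariate(lam)                     # first Poisson event time
    while clock <= t:
        u, acc = rng.random(), 0.0
        for j, pij in enumerate(P[state]):
            acc += pij
            if u < acc:
                state = j
                break
        clock += rng.expovariate(lam)
    return state


Q = [[-2.0, 2.0], [1.0, -1.0]]   # rate 2 for 0 -> 1, rate 1 for 1 -> 0
rng = random.Random(0)
t, n_paths = 0.5, 20_000
frac0 = sum(simulate_ctmc_uniformization(Q, 0, t, rng) == 0 for _ in range(n_paths)) / n_paths
exact0 = 1.0 / 3.0 + (2.0 / 3.0) * math.exp(-3.0 * t)   # P(X_t = 0 | X_0 = 0)
print(f"simulated {frac0:.4f} vs exact {exact0:.4f}")
```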

2606.12694 2026-06-12 cs.DS cs.LG math.PR stat.ML New submission

A unified complexity bound for logconcave sampling

Yunbum Kook, Santosh S. Vempala

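AI summary: Using the In-and-Out algorithm with exponential lifting, gives a simple, unified, and nearly tight bound for sampling any logconcave distribution from a warm start; the main new ingredient is an improved bound on the Poincaré constant of the lifted distribution.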

Comments
5 pages

English abstract

We give a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start using the In-and-Out algorithm along with exponential lifting. The main new ingredient in the analysis is an improved bound on the Poincaré constant of a lifted distribution. As a consequence, the resulting convergence rate is nearly tight for both constrained settings (e.g., Gaussian restricted to a convex body) and well-conditioned settings (e.g., strongly logconcave and smooth densities).
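
A minimal Python sketch of the basic In-and-Out step for the uniform distribution on a convex body, assuming only a membership oracle. The exponential lifting that extends it to general logconcave targets is not shown, and the step size `h`, the rejection cap, and the unit-square example are illustrative choices. Each iteration is one sweep of a Gibbs sampler on the joint density proportional to 1_K(x) exp(-|x - y|^2 / (2h)), whose x-marginal is uniform on K.

```python
import math
import random


def in_and_out_uniform(inside, x0, h, n_steps, rng, max_tries=10_000):
    """In-and-Out sampler whose target is the uniform distribution on a convex body K.

    inside(x) -> bool is a membership oracle for K. Each iteration performs
      Out: y ~ N(x, h I), ignoring K;
      In:  x ~ N(y, h I) conditioned on x in K, by rejection sampling.
    """
    x, samples = list(x0), []
    sd = math.sqrt(h)
    for _ in range(n_steps):
        y = [xi + sd * rng.gauss(0.0, 1.0) for xi in x]
        for _ in range(max_tries):
            cand = [yi + sd * rng.gauss(0.0, 1.0) for yi in y]
            if inside(cand):
                x = cand
                break
        else:
            raise RuntimeError("rejection step failed; consider a smaller h")
        samples.append(x)
    return samples


def unit_square(p):
    """Membership oracle for K = [0, 1]^2."""
    return all(0.0 <= c <= 1.0 for c in p)


rng = random.Random(0)
samples = in_and_out_uniform(unit_square, [0.5, 0.5], h=0.02, n_steps=20_000, rng=rng)
mean_x = sum(s[0] for s in samples) / len(samples)
print(f"mean of first coordinate: {mean_x:.3f} (uniform target: 0.5)")
```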

2606.13063 2026-06-12 math.NA stat.ML New submission

A Quadratic Order Reduction -- Gaussian Process Ordinary Differential Equation framework for the inference of Large Continuous Dynamical Systems

Guglielmo Padula, Michele Girfoglio, Gianluigi Rozza

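AI summary: Proposes a framework combining Gaussian processes with quadratic order reduction for accurate and stable forecasting of complex dynamical systems with uncertainty quantification.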

Comments
49 pages, 11 figures

English abstract

Forecasting the evolution of complex dynamical systems remains a fundamentally challenging task, primarily due to pronounced nonlinear interactions, high-dimensional state spaces, and the concomitant requirement for rigorous and reliable uncertainty quantification. Contemporary reduced-order modelling (ROM) frameworks frequently exhibit inherent trade-offs among predictive accuracy, numerical stability, and interpretability, and thus often fail to achieve an optimal balance among these competing objectives. To address these limitations, we propose a framework for forecasting complex dynamical systems via a kernel autonomous ordinary differential equation approach based on Gaussian Processes and Quadratic Order Model Reduction. Our base method, the Gaussian Process Ordinary Differential Equations model, allows accurate short-term forecasting with uncertainty quantification, and it provably converges to the real autonomous equation in the smooth case. We integrate it with quadratic order reduced-order modelling and sphere projection for learning the latent dynamics efficiently while preserving stability. Numerical experiments demonstrate that our full model outperforms ROM forecasting methods such as Extended Dynamic Mode Decomposition, Bagging Optimised Dynamic Mode Decomposition and Linear and Nonlinear Disambiguation Optimisation in terms of accuracy or computational costs. These results demonstrate the potential of the framework as a robust and stable tool for forecasting complex dynamical systems with rigorous uncertainty quantification.

2606.13245 2026-06-12 physics.comp-ph stat.ML New submission

REMAL: Residual Equilibrium Manifold Active Learning for Surrogate-Based Multidisciplinary Design Analysis

Kail Yuan, Ashwin Renganathan

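AI summary: Proposes the REMAL framework, which learns a surrogate of the joint residual manifold with multitask Gaussian processes, uses entropy-based active learning to sample near the zero contour, and recovers equilibrium states by solving a nonlinear least-squares problem, substantially reducing the cost of analyzing coupled systems at many design points.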

Comments
30 pages, 16 figures

English abstract

Multidisciplinary design analysis of coupled engineering systems requires the computation of equilibrium states in which all disciplinary coupling variables are mutually consistent. Conventional fixed-point iteration resolves this consistency problem separately at each design point, which can become expensive when disciplinary evaluations are costly and many analyses are required in outer-loop tasks such as multidisciplinary design optimization, uncertainty quantification, or digital twin updating. This paper introduces REMAL, a residual manifold surrogate modeling framework for coupled systems. Instead of approximating each discipline independently or directly learning converged coupling variables, the proposed method learns a surrogate model of the joint residual manifold via multitask Gaussian process models. An entropy-based active learning strategy selects additional residual evaluations near uncertain zero-contour regions, and equilibrium states for new design inputs are recovered by solving a nonlinear least squares optimization problem using only the trained surrogate. The method is evaluated on four engineering coupled system benchmarks: a satellite model, an aerostructural model, a finite-element gas-turbine heat-transfer and economics model, and a modified turbine model with added feedback coupling. Across these cases, REMAL is consistently cost-effective when the fixed point must be evaluated repeatedly across the design space. Theoretically, we show that, under mild assumptions, REMAL's predictive fixed point error is bounded.
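
To make the equilibrium condition concrete, a small Python sketch of the conventional baseline described above: Gauss-Seidel fixed-point iteration on a hypothetical two-discipline toy system, repeated at every design point, together with the joint residual r(x, y) = y - F(x, y) whose zero set is what REMAL approximates with a surrogate. The disciplines are invented for illustration, and the GP surrogate, active learning, and least-squares recovery are not shown.

```python
import math


def discipline_1(x, y2):
    """Hypothetical discipline 1: coupling output y1 given design x and input y2."""
    return x + 0.5 * math.cos(y2)


def discipline_2(x, y1):
    """Hypothetical discipline 2: coupling output y2 given design x and input y1."""
    return 0.3 * y1 + math.sin(x)


def residual(x, y1, y2):
    """Joint residual r(x, y) = y - F(x, y); equilibrium states satisfy r = 0."""
    return y1 - discipline_1(x, y2), y2 - discipline_2(x, y1)


def fixed_point(x, tol=1e-12, max_iter=200):
    """Gauss-Seidel fixed-point iteration at a single design point x."""
    y1 = y2 = 0.0
    for it in range(1, max_iter + 1):
        y1_new = discipline_1(x, y2)
        y2_new = discipline_2(x, y1_new)
        if abs(y1_new - y1) + abs(y2_new - y2) < tol:
            return y1_new, y2_new, it
        y1, y2 = y1_new, y2_new
    raise RuntimeError("fixed-point iteration did not converge")


# The whole iteration is repeated for every design point; this per-point cost
# is what a residual surrogate aims to amortize.
for x in [0.0, 0.5, 1.0, 1.5]:
    y1, y2, iters = fixed_point(x)
    print(f"x={x}: y1={y1:.6f}, y2={y2:.6f}, iterations={iters}")
```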

2606.13453 2026-06-12 math-ph stat.ML New submission

Rapid mixing for Gibbs measures in Riemannian manifolds

Ángela Capel, Marco Castrillón-López, Sofyan Iblisdir, Angelo Lucia, Pablo Páez-Velasco, David Pérez-García

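AI summary: Analyzes Langevin dynamics on Riemannian manifolds and identifies conditions ensuring a logarithmic Sobolev inequality (rapid mixing to the Gibbs measure); the conditions involve curvature, the inverse temperature, and escape directions from saddle points, exclude barren plateaus and spurious local minima, and yield mixing times polynomial in the dimension.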

Comments
88 + 80 pages, 1 figure

English abstract

Langevin dynamics on Riemannian manifolds is analyzed. Conditions ensuring the existence of a suitable logarithmic Sobolev inequality (rapid mixing to the Gibbs measure) are identified. These conditions involve the curvature of the manifold, the inverse temperature, escaping directions from saddle points, and exclude barren plateaus and spurious local minima. We show that when these conditions are met, mixing times polynomial in the dimension of the manifold are achievable. This result is obtained through a relation between Langevin processes in the domain and in the image of a Riemannian submersion. Such a relation can be of independent interest.
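
A minimal Python sketch of Langevin dynamics on the simplest compact Riemannian manifold, the circle, assuming the potential U(theta) = -cos(theta) and an unadjusted Euler-Maruyama discretization with angle wrapping. It only illustrates convergence to the Gibbs measure proportional to exp(-beta U); the curvature, saddle-point, and log-Sobolev conditions studied in the paper do not arise in this one-dimensional example.

```python
import math
import random


def langevin_on_circle(grad_u, beta, h, n_steps, rng, theta0=0.0):
    """Unadjusted Langevin on S^1: Euler-Maruyama step, then wrap to [0, 2*pi)."""
    theta, out = theta0, []
    noise_sd = math.sqrt(2.0 * h / beta)
    for _ in range(n_steps):
        theta = (theta - h * grad_u(theta) + noise_sd * rng.gauss(0.0, 1.0)) % (2.0 * math.pi)
        out.append(theta)
    return out


beta = 1.0
rng = random.Random(0)
# U(theta) = -cos(theta), so grad U = sin(theta) and the Gibbs measure is
# proportional to exp(beta * cos(theta)), a von Mises distribution.
thetas = langevin_on_circle(math.sin, beta, h=0.01, n_steps=200_000, rng=rng)
kept = thetas[10_000:]                                   # discard burn-in
mc_mean_cos = sum(math.cos(t) for t in kept) / len(kept)

# Reference value of E[cos(theta)] under the Gibbs measure, by quadrature.
grid = [2.0 * math.pi * k / 10_000 for k in range(10_000)]
weights = [math.exp(beta * math.cos(t)) for t in grid]
exact_mean_cos = sum(w * math.cos(t) for w, t in zip(weights, grid)) / sum(weights)
print(f"Langevin estimate {mc_mean_cos:.3f}, Gibbs value {exact_mean_cos:.3f}")
```

The unadjusted scheme has an O(h) bias, which is small compared with the Monte Carlo error at h = 0.01.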

6. Statistical Foundations of Machine Learning (16 papers)

2606.13295 2026-06-12 stat.ML cs.LG stat.ME New submission

Simultaneous Latent Budget Trees for Stratified Classification

Cristian Buoncompagni, Stefano Pellegrino, Giulia Vannucci, Roberta Siciliano

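AI summary: Proposes the Simultaneous Latent Budget Trees framework, which handles a stratification factor through a model-based split rule to obtain interpretable classification trees, and applies it to analyze gender-related differences in amyotrophic lateral sclerosis.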

English abstract

In the era of Explainable Artificial Intelligence, there is a renewed focus on single trees for their ease of interpretation. This paper introduces Simultaneous Latent Budget Trees, a probabilistic machine learning framework for classification trees in the presence of a stratification factor such as a temporal, spatial, or demographic variable, acting as a control variable or potential confounder. Standard tree growth procedures are not designed to optimize a conditional split rule. A model-based split rule is proposed in which child nodes are interpreted as latent components of a simultaneous mixture model, such as the Simultaneous Latent Budget Model and its constrained versions, fitted to the parent node. Mixing parameters drive the observations to the child nodes, differently for each group, whereas the latent budget parameters update the response-class profile of each level of the control variable. Parameters are estimated by least squares, using a neural network perspective on the model. An informative tree structure can be visualized interactively, with interpretation aids on the nodes and paths, including visual pruning and a decision-tree selection procedure. Suitable measures are proposed to handle an unbalanced response class distribution. The proposed methodology is applied to investigate gender-related differences in disease progression of Amyotrophic Lateral Sclerosis. The SLBT library with the various tree-based algorithms is available in the linked GitHub repository.

2606.13277 2026-06-12 stat.ML cs.LG New submission

ProtoX-AD: Self-Explainable Time Series Anomaly Detection and Characterization

Aitor Sánchez-Ferrera, Elisabeth Wetzer, Kristoffer Wickstrøm, Michael Kampffmeyer, Robert Jenssen

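AI summary: Proposes the ProtoX-AD framework, which uses prototype learning to make self-supervised time series anomaly detection explainable, providing semantically consistent explanations of anomaly characteristics while maintaining detection performance.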

Comments
26 pages, 8 figures

English abstract

Recent advances in time series anomaly detection (TSAD) have highlighted the effectiveness of self-supervised classification-based approaches. These methods apply transformations to normal training samples, training a classifier to recognize transformation-specific patterns that help identify anomalies through increased classification errors. Despite their strong performance, a significant challenge is their lack of explainability, as they provide limited insight into the characteristics of flagged anomalies. To address this limitation, we propose ProtoX-AD, a prototype-based self-explainable framework for self-supervised TSAD. ProtoX-AD learns transformation-aware latent representations alongside interpretable prototypes, enabling both accurate anomaly detection and the identification of distinct anomalous profiles through prototype-based explanations. Additionally, it allows for systematic analysis of how transformation design impacts detection performance and explainability. Experimental results on synthetic and real-world datasets demonstrate that ProtoX-AD achieves detection performance comparable to its black-box counterparts while offering more consistent and semantically meaningful explanations than existing explainable baselines. Our code is publicly available at this https URL.

2606.13146 2026-06-12 stat.ML cs.LG stat.ME New submission

Robust State-Conditional Feature-Weighted Jump Models for Temporal Clustering

Federico P. Cortese, Alessio Farcomeni

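AI summary: Proposes a robust feature-weighted jump model that achieves robustness through Tukey's biweight loss and introduces state-specific feature weights, outperforming competing methods in simulations and empirical applications.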

English abstract

We propose a robust feature-weighted jump model for time-dependent clustering. A penalty is used to encourage smoothness of transitions over time, while robustness is achieved through Tukey's biweight loss function. An additional parameter controls the variability of feature weights across states, allowing the model to assign state-specific relevance to each feature. We illustrate in simulation how the method accurately recovers the true cluster sequence and reliably identifies relevant features, outperforming competing approaches, particularly in the presence of outliers. We conclude with two empirical applications, one on the number of conflict-related homicides in Kosovo in the period 1998-2000, and another on macroeconomic performance of twelve European countries in the period 1949-2024.
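
A minimal Python sketch of the state-assignment step of a jump model with Tukey's biweight loss, assuming fixed, known state centroids and a single feature, so the feature weights and centroid estimation are omitted. `c = 4.685` is the conventional biweight tuning constant and `lam` is the jump penalty; both values here are illustrative. The dynamic program minimizes the total loss plus `lam` times the number of state switches, and the example contrasts it with squared loss on a series containing one outlier.

```python
def tukey_biweight(r, c=4.685):
    """Tukey's biweight loss: smooth near 0, constant c^2/6 for |r| > c."""
    if abs(r) <= c:
        return (c * c / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return c * c / 6.0


def squared(r):
    """Non-robust squared loss, for comparison."""
    return r * r


def assign_states(series, centroids, lam, loss):
    """Minimize sum_t loss(y_t - mu_{s_t}) + lam * #{t : s_t != s_{t-1}} by dynamic programming."""
    K = len(centroids)
    cost = [loss(series[0] - mu) for mu in centroids]
    back = []
    for y in series[1:]:
        new_cost, prev_best = [], []
        for k in range(K):
            # Cheapest predecessor j for state k (a jump j != k costs lam).
            cands = [cost[j] + (0.0 if j == k else lam) for j in range(K)]
            j_best = min(range(K), key=cands.__getitem__)
            prev_best.append(j_best)
            new_cost.append(cands[j_best] + loss(y - centroids[k]))
        back.append(prev_best)
        cost = new_cost
    state = min(range(K), key=cost.__getitem__)
    path = [state]
    for prev_best in reversed(back):          # backtrack the optimal path
        state = prev_best[state]
        path.append(state)
    return path[::-1]


series = [0.1, -0.1, 0.0, 50.0, 0.05, 5.0, 5.1, 4.9, 5.0]   # 50.0 is an outlier
print("biweight:", assign_states(series, [0.0, 5.0], 1.0, tukey_biweight))
print("squared: ", assign_states(series, [0.0, 5.0], 1.0, squared))
```

With squared loss the outlier forces two spurious jumps; with the biweight loss its cost is capped at c^2/6 for every state, so the jump penalty keeps the sequence in its current regime.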

2606.12471 2026-06-12 stat.ML cs.CL cs.ET cs.LG New submission

Identifiability Without Gaussianity: Symbolic World Models and Near-Infinite Temporal Consistency

Seth Dobrin, Łukasz Chmiel

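AI summary: Proposes the Physics-Grounded Symbolic Architecture (PGSA) and proves that it achieves exact linear identifiability and near-infinite temporal consistency in non-Gaussian dynamical systems, overcoming the Gaussian-boundary limitation of statistical world models.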

Comments
Pre-print

English abstract

Klindt, LeCun, and Balestriero (arXiv:2605.26379) proved that Joint-Embedding Predictive Architectures (JEPAs) achieve linear identifiability, the linear recovery of the world's true latent variables, if and only if the world's latent dynamics follow a Gaussian, stationary process. This Gaussian boundary implies a fundamental limit on temporal consistency: for any non-Gaussian physical system, the representation error of a statistical World Model grows monotonically with time. We prove that this limit is an artifact of the statistical alignment mechanism, not a property of World Models in general. We introduce the Physics-Grounded Symbolic Architecture (PGSA) and prove three results: (1) a PGSA achieves exact linear identifiability for all physical regimes, regardless of the latent distribution; (2) the per-step error of a PGSA is bounded by numerical precision alone; and (3) as a direct consequence, a PGSA maintains temporal consistency for an unbounded number of transitions, a property we term near-infinite temporal consistency. We further prove that statistical World Models cannot achieve this property for any non-Gaussian system, regardless of model capacity or the volume of training data. The algebraic cores of four of the theorems are formalized in Lean 4 with Mathlib4 v4.31.0 (zero sorry placeholders); the Klindt et al. converse is taken as an external premise. The contrast establishes that symbolic grounding in the causal generator of the world's dynamics is the sufficient condition and, in non-Gaussian regimes, the only condition for near-infinite temporal consistency.

2606.13576 2026-06-12 cs.LG cs.CC cs.DS stat.ML New submission

Learning with Simulators: No Regret in a Computationally Bounded World

Sasha Voitovych, Abhishek Shetty, Noah Golowich, Alexander Rakhlin

Affiliations: MIT, Microsoft Research

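AI summary: Proposes the framework of simulatable processes, in which a simulator approximates a data distribution with arbitrarily complex dependence; recovers VC-dimension error bounds and demonstrates statistical and computational advantages of conditional sampling.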

Comments
To appear at COLT 2026

English abstract

Understanding the minimal assumptions necessary for generalization is the fundamental question in learning theory. Unfortunately, most results rely heavily on independence (or some proxy thereof) of the data-generating process, while results for strongly dependent data are far more limited. Towards addressing this gap, we introduce the framework of simulatable processes, where the learner has access to a simulator that approximates the distribution generating the data (which may be an arbitrarily complex and dependent process). Surprisingly, given access to such a simulator, we show that we can recover the same learning guarantees as in the classical setting with independent data, namely, error bounds that depend on the VC dimension. Further, we use this framework to study the power of conditional sampling and show strict statistical and computational advantages in this setting. As a highlight of our framework, we exhibit a single algorithm that simultaneously learns any given VC class under all processes samplable in bounded polynomial time, with regret controlled by the time-bounded Kolmogorov complexity of the process. This provides a significant conceptual broadening of the classical PAC model.

2606.13426 2026-06-12 cs.LG stat.ML New submission

Accelerating Speculative Diffusions via Block Verification

Alexander Soen, Hisham Husain, Valentin De Bortoli, Arnaud Doucet

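AI summary: Proposes a speculative sampling scheme for diffusion models that uses block verification to raise the draft acceptance rate; the training-free Free Drafter achieves a speedup of up to 6.3%.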

English abstract

Speculative decoding speeds up LLM inference by using a draft model to generate tokens, with an acceptance-rejection scheme that ensures that the output matches the target distribution. Adapting this to continuous diffusions is difficult because speculative sampling requires drawing from a residual distribution. While straightforward in discrete spaces, efficiently sampling this residual in continuous space is non-trivial. Consequently, existing diffusion adaptations either use computationally inefficient sampling techniques or rely on an alternative scheme. In this work, we introduce a novel scheme that efficiently implements the original speculative sampling mechanism for diffusion models. Our approach offers a critical advantage over current methods: it enables us to adapt block verification from LLMs to diffusions -- which provably improves the acceptance rate of drafts. Furthermore, we formalize and analyze the Free Drafter, a heuristic self-speculative drafter for diffusions that requires no training. By enabling block verification, our Free Drafter yields up to a 6.3% speedup over existing speculative methods with no additional training and negligible overhead beyond the existing parallel verification pass.
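
For reference, a minimal Python sketch of the standard discrete speculative-sampling step that the abstract starts from: draft x from q, accept with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual (p - q)_+. It is not the continuous-diffusion scheme or the block verification proposed in the paper, and the distributions are hypothetical. The output is distributed exactly as p, and the acceptance rate equals the sum over tokens of min(p, q), here 0.7.

```python
import random


def residual_distribution(p, q):
    """Normalized positive part (p - q)_+, sampled after a rejected draft."""
    diff = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(diff)
    return [d / total for d in diff]


def sample_categorical(probs, rng):
    """Inverse-CDF sampling from a finite distribution."""
    u, acc = rng.random(), 0.0
    for i, pi in enumerate(probs):
        acc += pi
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def speculative_step(p, q, rng):
    """One draft-and-verify step; returns (token, accepted)."""
    x = sample_categorical(q, rng)                   # draft from q
    if rng.random() < min(1.0, p[x] / q[x]):         # accept w.p. min(1, p/q)
        return x, True
    return sample_categorical(residual_distribution(p, q), rng), False


p = [0.5, 0.3, 0.2]   # target distribution (hypothetical)
q = [0.2, 0.5, 0.3]   # draft distribution (hypothetical)
rng = random.Random(0)
draws = [speculative_step(p, q, rng) for _ in range(100_000)]
freqs = [sum(1 for x, _ in draws if x == i) / len(draws) for i in range(len(p))]
acceptance = sum(1 for _, ok in draws if ok) / len(draws)
print("empirical frequencies:", [round(f, 3) for f in freqs], "acceptance:", round(acceptance, 3))
```

In continuous space the residual (p - q)_+ has no convenient sampler, which is the difficulty the abstract describes.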

2606.12997 2026-06-12 cs.LG stat.ML New submission

Reliability of Probabilistic Emulation of Physical Systems

Sam F. Greenbury (1), Radka Jersakova (1), Paolo Conti (1 and 2), Marjan Famili (1 and 3), Christopher Iliffe Sprague (1 and 4), Edwin Brown (1 and 5), Jason D. McEwen (1 and 6) ((1) The Alan Turing Institute, (2) Autodesk Research, (3) PhysicsX, (4) Orbital, (5) University of Sheffield, (6) University College London)

Affiliations: The Alan Turing Institute, Autodesk Research, PhysicsX, Orbital, University of Sheffield, University College London

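AI summary: Compares the reliability of generative models and CRPS-trained ensembles for probabilistic emulation of physical systems, finding that CRPS-trained ensembles give better coverage and faster inference.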

English abstract

Two dominant approaches have emerged for generating probabilistic forecasts of physical systems: generative models, such as diffusion or flow matching; and ensembles of deterministic models with stochasticity injected, trained using the continuous ranked probability score (CRPS) loss. While both approaches have demonstrated strong predictive accuracy, the reliability of their uncertainties has not been systematically assessed. We address this gap by developing a framework to evaluate both approaches across diverse 2D spatiotemporal physical systems, under matched model size and computational budget. We assess the reliability of probabilistic emulation by inspecting the empirical coverage of predictive intervals, while also considering accuracy and computational efficiency metrics. CRPS-trained ensembles typically achieve more reliable uncertainties on both single-step prediction and autoregressive rollouts, demonstrating better coverage than the standard alternative of training generative models in a latent space. Moreover, the CRPS approach offers significantly faster inference. When generative models are trained in ambient rather than a compressed latent space, which is often infeasible for high-dimensional problems, they exhibit comparable coverage to CRPS-trained ensembles, though with substantially larger inference latency. In contrast, when CRPS-trained ensembles are trained in latent space they do not show a marked degradation in coverage with respect to ambient space. Both generative models and CRPS-trained ensembles demonstrate good predictive accuracy. To facilitate future research and application, we release AutoCast, a modular framework implementing both generative models and CRPS-trained ensembles, alongside AutoSim, a flexible dataset generation package for rapid prototyping.
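
A minimal Python sketch of the two evaluation quantities named above, assuming scalar forecasts: the empirical ensemble CRPS, E|X - y| - 0.5 E|X - X'|, and the empirical coverage of predictive intervals. The `fair` flag switches to the m(m - 1) normalization of the second term that is often used when training ensembles; which variant the paper uses is not stated in the abstract, and the function names are hypothetical.

```python
def crps_ensemble(members, obs, fair=False):
    """Empirical CRPS of an ensemble forecast for a scalar observation.

    CRPS = mean |x_i - y| - sum_{i,j} |x_i - x_j| / (2 m^2);
    with fair=True the second term is divided by 2 m (m - 1) instead.
    """
    m = len(members)
    term1 = sum(abs(x - obs) for x in members) / m
    pair_sum = sum(abs(a - b) for a in members for b in members)
    denom = 2 * m * (m - 1) if (fair and m > 1) else 2 * m * m
    return term1 - pair_sum / denom


def interval_coverage(lower, upper, obs):
    """Fraction of observations that fall inside their predictive intervals."""
    hits = sum(1 for lo, hi, y in zip(lower, upper, obs) if lo <= y <= hi)
    return hits / len(obs)


print(crps_ensemble([0.0, 1.0], 0.0))
print(interval_coverage([0, 0, 0, 0], [1, 1, 1, 1], [0.5, 2.0, 0.2, -1.0]))
```

For a single-member ensemble the CRPS reduces to the absolute error, and a well-calibrated 90% interval should have empirical coverage close to 0.9.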

2606.12658 2026-06-12 cs.LG q-bio.QM stat.ML New submission

Physics-Informed Neural Networks for Chemotherapy Pharmacokinetics: Benchmarking the Clinical Estimator and Exposing Parameter Identifiability

Riya Bisht, Dhruv Agarwal

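AI summary: This study applies physics-informed neural networks (PINNs) to chemotherapy pharmacokinetics, matching the standard clinical method on the linear two-compartment model, exposing parameter non-identifiability in the Michaelis-Menten extension, and partially restoring identifiability with sparse tissue observations.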

Abstract

Physics-Informed Neural Networks (PINNs) are an attractive tool for partial-observation problems in biology, where the governing dynamics are known but some compartments cannot be measured. Chemotherapy pharmacokinetics (PK) is a clean instance: drug concentration in plasma is routinely measured, but concentration in tissue, which determines tumour kill and off-target toxicity, is not. We benchmark a PINN against the standard clinical baseline (nonlinear least squares on the analytical biexponential plasma solution, hereafter NLS) and a physics-agnostic neural baseline (a data-only MLP) on two PK problems. On the linear two-compartment problem, NLS is near-optimal; the PINN matches it to within a small constant factor while also producing the tissue curve in a single training pass, whereas the data-only MLP's tissue error is roughly 10x larger. On a Michaelis-Menten extension (saturable elimination), the biexponential closed form no longer exists, so NLS is misspecified and silently returns meaningless rate constants. The PINN instead exposes a deeper fact: the Michaelis-Menten two-compartment model is non-identifiable from plasma data alone, and the PINN reports this honestly by converging to a basin with k12 -> 0. Adding two sparse tissue observations largely resolves identifiability: across five seeds, the PINN recovers k21 to within 1% of the true value and Vmax and Km to within one standard deviation, while k12 moves in the correct direction (0.02 -> 0.82) but remains about 2 sigma below the true value. The closed-form NLS estimator cannot attempt this recovery at all, because its biexponential ansatz describes only plasma. Our claim is not that PINNs beat NLS. It is that PINNs offer a uniform recipe that ties the textbook estimator on the textbook problem, exposes structural identifiability issues that the textbook estimator hides, and absorbs heterogeneous measurements within a single loss.
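The two forward models being compared can be sketched as follows. This is our illustration, not the authors' code: a two-compartment system with plasma amount x1 and tissue amount x2 (unit volumes assumed), with either linear elimination k10*x1 or saturable Michaelis-Menten elimination Vmax*x1/(Km + x1), integrated by fixed-step RK4. For the linear model after a bolus dose, the plasma curve is the biexponential closed form that NLS fits, because alpha and beta are the decay rates of the linear system, with alpha + beta = k10 + k12 + k21 and alpha*beta = k10*k21. A PINN would instead penalize the residual of `pk_rhs` at collocation points. All parameter values used with it are arbitrary.

```python
import numpy as np


def pk_rhs(state, k12, k21, elim):
    """Two-compartment model: plasma x1, tissue x2.

    `elim(x1)` is the elimination flux out of plasma, e.g. linear k10 * x1
    or Michaelis-Menten Vmax * x1 / (Km + x1).
    """
    x1, x2 = state
    dx1 = -elim(x1) - k12 * x1 + k21 * x2
    dx2 = k12 * x1 - k21 * x2
    return np.array([dx1, dx2])


def rk4(rhs, state0, t_grid, **params):
    """Classical fourth-order Runge-Kutta on a given time grid."""
    states = [np.asarray(state0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h = t1 - t0
        s = states[-1]
        a = rhs(s, **params)
        b = rhs(s + 0.5 * h * a, **params)
        c = rhs(s + 0.5 * h * b, **params)
        d = rhs(s + h * c, **params)
        states.append(s + h / 6.0 * (a + 2 * b + 2 * c + d))
    return np.array(states)


def biexponential_plasma(t, dose, k10, k12, k21):
    """Closed-form plasma amount for the linear model after a bolus into plasma."""
    total = k10 + k12 + k21
    disc = np.sqrt(total**2 - 4 * k10 * k21)
    alpha, beta = (total + disc) / 2, (total - disc) / 2  # roots of s^2 - total*s + k10*k21
    A = dose * (alpha - k21) / (alpha - beta)
    B = dose * (k21 - beta) / (alpha - beta)
    return A * np.exp(-alpha * t) + B * np.exp(-beta * t)
```

Swapping `elim=lambda x: 0.3 * x` for a saturable `elim=lambda x: vmax * x / (km + x)` leaves the integrator unchanged but removes any biexponential closed form, which is the situation in which NLS becomes misspecified.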

2606.13614 2026-06-12 stat.ML cs.LG math.ST New submission

Majority-of-Three is Optimal

Divit Rawal, Nikita Zhivotovskiy

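AI summary: Gives a short proof that, in the realizable PAC learning setting, the majority vote of three independent consistent classifiers is an optimal learner, simplifying the algorithmic structure and probabilistic analysis of voting learners.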

Comments
9 pages
Abstract

We give a short proof that the majority vote of three independent consistent classifiers is an optimal learner in the realizable PAC setting. This proves optimality for the simplest voting scheme, while simplifying both the algorithmic structure and the probabilistic analysis of previous voting learners, including the algorithm of S. Hanneke and the analysis of bagging by K. Green Larsen.
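To make the voting scheme concrete, here is a toy sketch in a realizable setting chosen purely for illustration: 1-D threshold classifiers h_t(x) = [x >= t], a consistent learner that returns the smallest positive example, and three classifiers trained on disjoint thirds of the sample (our reading of "independent"). It illustrates the learner only, not the optimality proof, and the function names are hypothetical.

```python
import numpy as np


def consistent_threshold(xs, ys):
    """Consistent learner for thresholds in the realizable case: the smallest
    positive example (or +inf, i.e. 'always negative', if there is none)."""
    pos = xs[ys]
    return pos.min() if pos.size else np.inf


def majority_of_three(xs, ys, rng):
    """Train three consistent classifiers on disjoint random thirds; return a majority-vote predictor."""
    parts = np.array_split(rng.permutation(len(xs)), 3)
    thresholds = [consistent_threshold(xs[p], ys[p]) for p in parts]

    def predict(x):
        votes = sum(int(x >= t) for t in thresholds)  # each classifier votes positive or negative
        return votes >= 2                             # majority of three

    return predict, thresholds
```

Each threshold can only err on the interval between the true threshold and itself, and the majority vote errs only where at least two of the three independent classifiers err simultaneously.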

2606.12879 2026-06-12 cs.DS math.ST stat.ML New submission

Diffusion-Network Alignment: An Efficient Algorithm and Explicit Probability Bounds

Ziao Wang, Lei Ying

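AI summary: Introduces the diffusion-network alignment problem, designs an efficient algorithm based on tree correlation tests, proves that matches are correct with high probability on sparse graphs, and gives explicit lower bounds on the probability that each vertex is correctly matched.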

Abstract

This paper studies a variation of the classic network alignment problem, called diffusion-network alignment. The goal is to align the vertices of a rooted diffusion tree with the vertices of a network, where the diffusion tree could come from a communication trace or contact tracing, and the network could be an online or offline social network. Unlike classic network alignment, where both networks are fully observed, this model captures the information asymmetry between the two networks. To solve this problem, this paper presents an efficient algorithm based on tree correlation tests that extracts alignment information from local neighborhoods. We analyze the performance of the algorithm in the sparse graph regime and show that, with high probability, all matched pairs are correct. Furthermore, for each vertex of the diffusion tree, this paper establishes an explicit lower bound on the probability that the vertex is correctly matched. These lower bounds are depth-dependent and increase as vertices get closer to the root.

2606.12691 2026-06-12 cs.LG cs.AI eess.SY math.OC stat.ML New submission

Two-Layer Linear Auto-Regressive Models Estimate Latent States

Yahya Sattar, Sunmook Choi, Leo Maynard-Zhang, Yassir Jedra, Maryam Fazel, Sarah Dean

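AI summary: Shows that two-layer linear auto-regressive models trained by empirical risk minimization approximate Kalman filtering and recover latent state estimates, with finite-sample guarantees.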

Comments
ICML 2026
Abstract

Auto-regressive models have emerged as powerful tools for sequential data, from language to video. Understanding how and why these models learn latent representations remains an open theoretical question. In this work, we demonstrate that when trained by empirical risk minimization on data from partially observed linear dynamical systems, two-layer linear auto-regressive models naturally learn to approximate Kalman filtering. In particular, we show that the learned hidden representation coincides, up to a similarity transformation, with the state estimates produced by the optimal (Kalman) filter, even though the model has no explicit knowledge of the underlying dynamics or state. The result follows from three main insights. First, we establish that the Kalman filter is well approximated by an auto-regressive model with bounded truncation error. Second, we show that despite non-convexity, the two-layer optimization landscape is benign, i.e., all stationary points are either strict saddles or global minima. Finally, as our main contribution, we provide finite-sample guarantees on prediction error, parameter estimation error, and latent state recovery. Numerical simulations support the theoretical results and demonstrate that the latent representations of auto-regressive models recover state estimates.
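The first insight, that the Kalman filter is well approximated by a truncated auto-regressive model, can be checked numerically. In this sketch (our own construction, with a small system chosen arbitrarily), the steady-state one-step predictor x̂_{t+1} = (A - LC) x̂_t + L y_t unrolls into an infinite linear map of past outputs, sum_i (A - LC)^i L y_{t-i}. Truncating after p lags leaves an error of exactly (A - LC)^p x̂_{t+1-p}, which decays geometrically because A - LC is stable. It does not implement the paper's two-layer model or its ERM training; all function names are hypothetical.

```python
import numpy as np


def steady_state_predictor_gain(A, C, Q, R, iters=1000):
    """Iterate the Riccati recursion for the one-step prediction covariance P and
    return the steady-state predictor gain L = A P C^T (C P C^T + R)^{-1}."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        S = C @ P @ C.T + R
        L = A @ P @ C.T @ np.linalg.inv(S)
        P = A @ P @ A.T + Q - L @ S @ L.T
    S = C @ P @ C.T + R
    return A @ P @ C.T @ np.linalg.inv(S)


def simulate(A, C, Q, R, T, rng):
    """Outputs y_t = C x_t + v_t of x_{t+1} = A x_t + w_t, started at x_0 = 0."""
    n, m = A.shape[0], C.shape[0]
    x = np.zeros(n)
    ys = np.empty((T, m))
    for t in range(T):
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(m), R)
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return ys


def kalman_predictions(F, L, ys):
    """Steady-state predictor xhat_{t+1} = F xhat_t + L y_t with F = A - L C, xhat_0 = 0."""
    xhat = np.zeros((len(ys) + 1, F.shape[0]))
    for t, y in enumerate(ys):
        xhat[t + 1] = F @ xhat[t] + L @ y
    return xhat


def ar_predictions(F, L, ys, p):
    """Truncated AR(p) map sum_{i<p} F^i L y_{t-i}, one row per t = p-1, ..., T-1."""
    coeffs = [np.linalg.matrix_power(F, i) @ L for i in range(p)]
    return np.array([
        sum(coeffs[i] @ ys[t - i] for i in range(p))
        for t in range(p - 1, len(ys))
    ])
```

Row j of `ar_predictions(F, L, ys, p)` approximates `kalman_predictions(F, L, ys)[p + j]`; a linear model whose hidden layer holds such a state estimate is the kind of representation the paper shows is learned.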

2606.12646 2026-06-12 stat.ML cs.IT cs.LG New submission

Epistemic Uncertainty Is Not the Reducible Kind

Robin Young

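AI summary: Proves that the standard definition of epistemic uncertainty, as the part removable by more data, is extensionally inconsistent with the mutual-information measure, and proposes a three-part decomposition into aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty.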

Abstract

The standard taxonomy of predictive uncertainty defines epistemic uncertainty as the part removable by collecting more data, while the standard measure identifies it with a mutual-information term. We prove that the definition and the measure are extensionally inconsistent. On an explicit construction, the measure assigns all uncertainty to the epistemic class, yet no quantity of training data reduces it. Reducibility is instead a property of the pair (uncertainty, acquisition class), and the dichotomy resolves into three parts: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty. An exact identity for the value of an observation shows that in-distribution data never reduces mechanism-irreducible uncertainty and generically increases it. Ensemble disagreement, the epistemic estimate used in deployment, tracks the training procedure rather than the epistemic term: under consistent training it collapses to zero even when the true epistemic term is positive, and under interpolation it equals hyperparameter-scaled initialization noise. A finite-sample falsification test and seed-swept experiments confirm the theory.
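For reference, the "standard measure" being critiqued is usually computed as an entropy split over an ensemble: total predictive entropy equals the expected entropy of the members (aleatoric) plus the mutual information between the prediction and the model (epistemic). The sketch below is that textbook baseline, not the paper's three-part decomposition; the function names are ours. Note that the epistemic term is zero whenever all members agree, which is why it behaves as a measure of ensemble disagreement.

```python
import numpy as np


def entropy(p, axis=-1):
    """Shannon entropy in nats; tiny clipping avoids log(0)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)


def decompose(member_probs):
    """Standard split of predictive entropy for an ensemble.

    member_probs: (n_members, n_classes) predictive distributions.
    Returns (total, aleatoric, epistemic), where epistemic is the mutual-information term.
    """
    probs = np.asarray(member_probs, dtype=float)
    total = entropy(probs.mean(axis=0))         # entropy of the averaged prediction
    aleatoric = entropy(probs, axis=-1).mean()  # expected entropy of the members
    return total, aleatoric, total - aleatoric
```

Two members that both predict (0.5, 0.5) give a purely aleatoric split, while two confident members that disagree give a purely epistemic one, regardless of whether more data could actually resolve the disagreement.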

2606.13548 2026-06-12 cond-mat.mtrl-sci physics.data-an stat.ML New submission

Symmetry-electronic fingerprints reveal competing magnetic phases in two-dimensional materials

Addis Fuhr, Zachary R. Fox, David Parker, Ayana Ghosh

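AI summary: Proposes the symmetry-electronic fingerprint (SEF), a representation combining crystal symmetry with electronic structure; with random-forest ensemble learning it accurately classifies magnetic ordering, regresses magnetic moments and anisotropy, and identifies regions where Stoner ferromagnetism competes with localized superexchange, with model uncertainty serving as a diagnostic for near-degenerate FM/AFM phases.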

Abstract

Two-dimensional magnets offer compelling platforms for spintronics and quantum technologies, yet predicting their magnetic ground states, moments, and anisotropy remains challenging. This limitation arises primarily because existing machine-learning representations encode chemical environments without capturing the symmetry or exchange physics that govern magnetism. In this work, we introduce the symmetry-electronic fingerprint (SEF), a physically interpretable representation that encodes crystallographic symmetry operations and Wyckoff-site geometry together with site-resolved electronic structure. Combined with random-forest ensemble learning, the SEF accurately classifies magnetic ordering and regresses magnetic moments and anisotropy energies, while also distinguishing the regime of itinerant Stoner ferromagnetism from that of localized superexchange. What sets the SEF-trained models apart is that regions of elevated model uncertainty are not a failure but a diagnostic, identifying materials where these mechanisms compete. First-principles calculations on Co- and Ni-based halides and oxides confirm that these regions correspond to genuinely near-degenerate FM and AFM phases with magnetic frustration, suppressed anisotropy, and emergent non-collinear ordering. Unlike conventional descriptors, the SEF encodes symmetry and exchange physics directly into the representation, turning model uncertainty into a compass that points toward two-dimensional materials in which small perturbations drive transitions between collinear, frustrated, or non-collinear magnetic phases.

2606.11104 2026-06-12 cs.LG math.CA stat.ML New submission

Limitations of Learning Tanh Neural Networks with Finite Precision

Philipp Grohs, Matěj Trödler

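AI summary: Under finite-precision computation and L^p accuracy guarantees, constructs sharply localized bump functions to prove that no adaptive randomized algorithm can converge faster than the Monte Carlo rate O(m^{-1/p}) in the L^p norm unless the sampling budget grows exponentially with the size of the network parameters and architecture.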

Abstract

We investigate limitations of learning $\tanh$ neural networks from point evaluations under finite-precision computations and $L^p$ accuracy guarantees, building on Berner, Grohs, and Voigtländer (2023). Our approach is based on a novel construction of sharply localized bump functions via iterated $\tanh$ activations. Using this mechanism, we show that, in a finite-precision setting, no adaptive randomized algorithm based on $m$ samples can achieve a convergence rate higher than the Monte Carlo rate $O(m^{-1/p})$ in the $L^p$ norm, unless the sampling budget grows exponentially with the size of the network parameters and architecture. The results reveal fundamental limitations imposed by finite precision on the learnability of classes containing localized bump functions, extending previous results for ReLU networks to the $\tanh$ setting.

2606.07247 2026-06-12 cond-mat.dis-nn cond-mat.stat-mech stat.ML New submission

Theory of learning of high-dimensional controlled non-linear dynamical systems (I): models and methods

Pierfrancesco Urbani

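AI summary: Introduces a class of theoretical models, solves the training dynamics of neural ODEs under online stochastic gradient descent via dynamical mean field theory, and derives learning curves in the high-dimensional limit.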

Comments
28 pages, 2 figures
Abstract

Neural ordinary differential equations (neural ODEs) have rapidly gained prominence as a powerful and unifying framework for conceptualizing artificial neural networks, elegantly connecting the continuous-time modeling of dynamical systems with the discrete, data-driven paradigm of modern deep learning. Beyond their practical advantages, they offer fresh theoretical insights into the training and generalization properties of neural networks. The distinctive feature of this framework is its dual dynamical nature: inference dynamics, which govern the ODE evolution during forward computation, and training dynamics, which control the optimization of model parameters. This makes neural ODEs a particularly well-suited theoretical framework for studying a wide variety of settings, such as multi-layer neural networks (e.g., ResNets), autoregressive models (with next-token generation dynamics), generative models, and recurrent neural networks in theoretical neuroscience. In this work, we introduce a theoretically grounded class of models for studying neural ODEs trained via online stochastic gradient descent. We solve the training dynamics of these models via dynamical mean field theory and derive learning curves in the high-dimensional limit.

2605.18898 2026-06-12 cs.LG stat.ML Cross-list

A Two-Parameter Weibull Framework for Diagnosing Transformer Weight Distributions

Tiexin Ding

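AI summary: Proposes a two-parameter framework based on the Weibull distribution for analyzing element-wise weight magnitude distributions in transformers; experiments characterize how the shape parameter k is distributed across modules and how the scale parameter lambda evolves during training.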

Comments
27 pages, 14 figures. Companion library npm-weibull-py and benchmark database available at this https URL
Abstract

We apply the Weibull distribution -- a two-parameter family from extreme-value theory -- as a diagnostic framework for element-wise weight magnitude distributions in transformers. At initialization, i.i.d. Gaussian weights give |w| ~ HalfNormal, yielding k ~ 1.20 under a middle-80% probability-plot fit (the protocol used throughout this work). This anchor makes k a principled, architecture-independent measuring stick for training dynamics; fitting each weight matrix independently at every layer and every checkpoint enables per-component, per-layer, and per-step diagnostics that aggregate statistics cannot resolve. Applying this framework to 12 model entries spanning 7 architectural families (Pythia, OLMo-1/2, LLaMA-3, Mistral, Qwen2.5/3) yields three findings. First, FFN modules and the attention output projection W_o -- the Transmission Class -- fall in a narrow k band: median terminal k in [1.186, 1.204] across 12 entries (cross-family CV = 0.51%), shared across SwiGLU/GeLU activations, Pre-LN/QK-Norm placements, and 70M-14B sizes. Second, the attention input projections W_q, W_k -- the Selection Class -- depart from the Weibull family, with severity shaped by storage: separately stored Q/K (OLMo-1, OLMo-2) yields k in [0.76, 0.99] (deep departure); GQA models yield k in [1.10, 1.16] (mild); and Pythia's merged W_qkv occupies a transitional zone whose k tracks the training budget T/tau monotonically. Third, lambda grows substantially during training and scales with sqrt(eta/lambda_wd) within the Pythia family (Pearson r = 0.94, three Transmission kinds), directionally consistent with Fan et al. (2025). The two parameters carry independent information: k labels the functional class, and lambda labels training progress. We release npm-weibull-py v0.4 (Python library) and DATABASE_v9_1 at this https URL.
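The fitting protocol can be sketched as a least-squares line on the Weibull probability plot, ln(-ln(1 - F)) = k ln|w| - k ln(lambda), restricted to the middle 80% of plotting positions. This is our reconstruction under stated assumptions, not the npm-weibull-py API: we assume (i - 0.5)/n plotting positions and ordinary least squares, and the paper's exact choices may differ. Under these assumptions, half-normal magnitudes reproduce the quoted initialization anchor k ≈ 1.20, and rescaling the weights leaves k unchanged while lambda scales with them, which is why k can label shape and lambda magnitude.

```python
import numpy as np


def weibull_probability_plot_fit(magnitudes, lo=0.1, hi=0.9):
    """Estimate Weibull shape k and scale lam from a probability-plot regression,
    keeping only points with plotting position in [lo, hi] (middle 80% by default)."""
    x = np.sort(np.asarray(magnitudes, dtype=float))
    n = x.size
    F = (np.arange(1, n + 1) - 0.5) / n           # plotting positions (our assumption)
    keep = (F >= lo) & (F <= hi) & (x > 0)
    u = np.log(x[keep])                            # ln |w|
    v = np.log(-np.log1p(-F[keep]))                # ln(-ln(1 - F))
    k, intercept = np.polyfit(u, v, 1)             # v = k ln|w| - k ln(lam)
    lam = np.exp(-intercept / k)
    return k, lam
```

Applied per weight matrix and per checkpoint, the same call yields the per-component, per-layer, and per-step (k, lambda) trajectories the abstract describes.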

7. Biostatistics and Medical Statistics (3 papers)

2606.12677 2026-06-12 stat.ME New submission

Restricted Multivariate Spatial Modeling

Jihyeon Kwon, Harrison Quick

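AI summary: Addresses the excessive informativeness of multivariate conditional autoregressive (MCAR) models by proposing a restricted MCAR model that controls informativeness through reparameterization, and demonstrates its advantages on heart disease death data.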

Comments
30 pages
Abstract

When modeling health events in small areas, the conditional autoregressive (CAR) framework of Besag, York, and Mollié (BYM) is widely used. For multiple outcomes, the multivariate CAR (MCAR) extension accommodates dependence among diseases that share risk factors, in addition to spatial dependence, and can also jointly model demographic subgroups for a single disease, allowing information to be borrowed across related populations. However, recent studies have shown that the BYM CAR model can be overly informative, leading to excessively precise estimates. While the MCAR model is expected to be more informative due to additional information shared across subgroups, its level of informativeness has not been previously quantified. We propose a framework to measure MCAR model informativeness as an extension of prior work and introduce a method to control it, ensuring the model contributes comparably to each subgroup. We achieve this through a reparameterization of the MCAR model within a computationally efficient framework. We demonstrate how the MCAR model compares with the BYM CAR model in terms of informativeness and oversmoothing and highlight the advantages of the restricted MCAR model using county-level heart disease death data stratified by race and sex.
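For readers new to MCAR models, one common separable construction combines a spatial CAR precision with a between-outcome (or between-subgroup) precision via a Kronecker product. The sketch below shows a proper CAR precision D - alpha*W, which for alpha = 1 becomes the singular intrinsic CAR used as the spatial component of the BYM model, and its separable MCAR extension. The paper's restricted reparameterization and its informativeness measure are not reproduced, and the function names are ours.

```python
import numpy as np


def car_precision(W, alpha):
    """Proper CAR precision D - alpha W for a symmetric 0/1 adjacency matrix W.

    |alpha| < 1 gives a positive-definite matrix; alpha = 1 gives the singular
    intrinsic CAR (graph Laplacian).
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))   # number of neighbours of each area
    return D - alpha * W


def separable_mcar_precision(W, alpha, Lambda):
    """Separable MCAR precision: between-outcome precision Lambda (kron) spatial CAR precision."""
    return np.kron(np.asarray(Lambda, dtype=float), car_precision(W, alpha))
```

Because the same spatial structure is shared across every outcome or subgroup, each one borrows from its neighbours and from the other outcomes at once, which is the extra informativeness the paper sets out to measure and control.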

2606.12566 2026-06-12 stat.ME New submission

Inferring resource selection and utilization distributions from irregular and error-prone animal tracking data

Fanny Dupont, Brett T. McClintock, Jan-Ole Fischer, Marianne Marcoux, Nigel E. Hussey, Marie Auger-Méthé

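AI summary: Proposes a single-stage framework based on the Laplace approximation and implemented in TMB that handles measurement error and irregular sampling simultaneously, outperforming the two-step method in simulations and on narwhal data.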

Comments
26 pages
Abstract

Habitat selection and space use are fundamental to understanding animal distribution. Traditional methods for quantifying habitat preferences from telemetry data assume regular sampling and negligible measurement error. However, these assumptions are routinely violated in marine systems. Practitioners typically regularize and filter the data before fitting models, but these two-step procedures do not propagate uncertainty from the filtering stage and can yield biased estimates. Habitat-driven Langevin diffusion models offer an elegant alternative, naturally accommodating irregular sampling. However, incorporating measurement error via a state-space formulation is challenging because habitat covariates depend on the latent true locations. We address this using the Laplace approximation to simultaneously integrate over true locations and account for habitat covariates along latent paths, yielding a single-stage framework efficiently implemented in Template Model Builder (TMB). By doing so, we provide the first TMB implementation capable of handling covariates that depend on latent variables, allowing inference via fast and efficient maximum likelihood estimation. Simulations show that our approach outperforms the two-step method, recovering habitat-selection parameters even under substantial measurement error and missing data, with more accurate utilization distributions and trajectory reconstructions. Applied to narwhal (Monodon monoceros) telemetry data, the two-step method substantially shrinks the habitat selection coefficient towards zero, while our unified approach recovers a much stronger signal. Our framework offers a computationally efficient solution to long-standing challenges of measurement error and temporal irregularity in habitat selection inference, applicable across a wide range of taxa and environments.
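The habitat-driven Langevin model can be sketched as a simulator. We assume the common formulation in which the utilization distribution is pi(x) ∝ exp(sum_j beta_j c_j(x)) and the track follows dX = (gamma^2 / 2) grad log pi(X) dt + gamma dW. We discretize with Euler-Maruyama on an irregular time grid and add isotropic Gaussian measurement error to each fix. Only the forward model is shown; the paper's state-space likelihood, Laplace approximation, and TMB implementation are not reproduced, and all names and parameter values are our own.

```python
import numpy as np


def simulate_langevin_track(times, beta, grad_cov, gamma, obs_sd, rng, x0=(0.0, 0.0)):
    """Euler-Maruyama for dX = (gamma^2 / 2) * grad(beta . c)(X) dt + gamma dW on an
    irregular time grid, plus Gaussian measurement error on each fix.

    grad_cov(x) must return a (2, J) array whose column j is the gradient of c_j at x.
    """
    beta = np.asarray(beta, dtype=float)
    x = np.empty((len(times), 2))
    x[0] = x0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]                           # irregular step length
        drift = 0.5 * gamma**2 * (grad_cov(x[i - 1]) @ beta)   # (gamma^2/2) grad log pi
        x[i] = x[i - 1] + drift * dt + gamma * np.sqrt(dt) * rng.standard_normal(2)
    obs = x + obs_sd * rng.standard_normal(x.shape)            # error-prone observed fixes
    return x, obs
```

With a single covariate c(x) = -|x|^2 and beta = 1, the utilization distribution is N(0, I/2), so a long simulated track should have per-coordinate variance near 0.5; a two-step method sees only `obs`, whose extra scatter is what biases the habitat coefficient toward zero.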

2606.13236 2026-06-12 cs.LG cs.AI cs.SD stat.AP New submission

Decoding Insect Song: A Multitask Semisupervised Orthoptera Bioacoustic Classifier

Olga Isupova, Danil Kuzin, Ella Browning, Tom Mills, Steven Reece

Affiliation: University of Oxford

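AI summary: Proposes PULSE, a semi-supervised multi-task framework combining weakly supervised classification, self-supervised learning, and knowledge distillation; it outperforms a general-purpose model on Orthoptera bioacoustic classification and improves further with active learning.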

Comments
ICML 2026 Workshop on Machine Learning for Audio
Abstract

Passive acoustic monitoring holds great promise for ecological inference, yet existing automated tools are typically narrowly trained and non-transferable. We address these limitations with PULSE, a semi-supervised, multi-task framework for Orthoptera bioacoustics, combining weakly-supervised species classification, self-supervised learning on unlabelled field audio, and knowledge distillation from a general-purpose bioacoustic model. Our domain-adapted specialist model outperforms a state-of-the-art general model across all metrics (macro F1: 0.21 vs. 0.07; AUC: 0.74 vs. 0.45; AP: 0.32 vs. 0.19), with active learning further raising F1 to 0.34 and AUC to 0.84. Beyond classification, the learned embeddings encode ecologically meaningful structure, exposed through an interactive visualisation tool for ecological discovery.

8. Economic, Financial, and Social Science Statistics (4 papers)

2606.13401 2026-06-12 stat.AP New submission

Scaling Demand-Side Flexibility Through Dynamic Tariffs

Lucas Brylle, Niels Andersen, Henrik Madsen

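AI summary: Argues that implicit demand-side flexibility incentivized by dynamic tariffs is the most scalable and cost-effective way to address distribution-grid challenges, with potential savings of 13-48 million DKK per constrained substation.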

Abstract

The ongoing electrification and integration of renewable energy sources in Denmark's distribution grids pose significant operational challenges, including insufficient reserve capacity, component degradation due to overload, voltage instability, and increasing infrastructure investment requirements. This article argues that implicit demand-side flexibility (DSF), incentivized through dynamic tariffs, offers the most scalable and cost-effective approach to addressing these challenges in a modern distribution network. We show that while explicit flexibility mechanisms provide operational certainty, they cannot scale to address system-wide congestion across heterogeneous customer bases. Drawing on empirical consumption data that show strong price-responsive behavior, on price variation arising from, e.g., regulatory frameworks such as the Danish Market Model 3.0 and Tariff Model 3.0, and on economic analysis, we estimate potential grid savings of 13-48 million DKK per constrained substation through deferred or avoided reinforcement. We argue that implicit DSF mechanisms are the necessary pathway to revenue-neutral, scalable flexibility solutions that can defer costly grid reinforcements while maintaining system reliability. Beyond direct grid savings, additional value streams, including avoided peak generation costs, reduced connection delays, and lower outage risk, further strengthen the economic case. Critically, dynamic tariffs provide the mechanism through which real-time grid constraints can be communicated to consumers, yielding price signals that accurately reflect the actual capacity state of the distribution network at any given time and location.

2606.13094 2026-06-12 stat.AP New submission

Efficient Estimation of A-basis and B-Basis Value under Epistemic Uncertainty using Importance Sampling and Control Variates

Elton Donfack-Siewe, Jérôme Morio, Sylvain Dubreuil, Jean-Philippe Navarro, Christian Fagiano

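AI summary: Addresses conservative quantile estimation for aerospace certification with a method that uses importance sampling and control variates to estimate A-basis and B-basis values efficiently under mixed uncertainty, guaranteeing unbiased and consistent estimates and quantifying the sources of epistemic uncertainty.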

Abstract

In aerospace certification and other safety-critical domains, conservative quantile estimates such as A- and B-basis values are essential to guarantee reliability. While these metrics are traditionally derived from experimental campaigns, this work focuses on estimating them with a validated deterministic numerical model. The problem is formulated under mixed aleatory-epistemic uncertainty, accounting for limited material data, finite sampling effects, and surrogate modeling errors. We propose a methodology for estimating conservative design quantiles with statistical guarantees under mixed uncertainties. The proposed method leverages importance sampling and control variates to achieve accurate and efficient estimates within a fixed computational budget. A key point is that the surrogate model serves solely as a variance-reduction device, which guarantees unbiased and consistent quantile estimation. By explicitly integrating all sources of uncertainty, the proposed framework provides a numerical alternative for estimating A-basis and B-basis values. Furthermore, Sobol-based sensitivity indices are obtained at no additional cost, offering insight into the dominant epistemic sources. Numerical experiments on structural models demonstrate the method's reliability and computational efficiency. In particular, the application to large-scale industrial simulations confirms its suitability for aerospace certification workflows and highlights its relevance for real-world engineering environments.
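For orientation, the classical experimental route can be sketched. A B-basis value is a 95% lower confidence bound on the 10th percentile of a material property, and an A-basis value is a 95% lower confidence bound on the 1st percentile. The distribution-free version below picks the r-th smallest observation, since P(x_(r) <= q_p) = P(Binomial(n, p) >= r) for continuous data. This is the textbook nonparametric definition, not the paper's importance-sampling and control-variate estimator, and the function names are ours.

```python
from math import comb


def binom_sf(r, n, p):
    """P(Binomial(n, p) >= r)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r))


def basis_rank(n, p, conf=0.95):
    """Largest rank r such that the r-th smallest of n i.i.d. values is a lower
    `conf` confidence bound on the p-quantile; None if n is too small."""
    best = None
    for r in range(1, n + 1):
        if binom_sf(r, n, p) >= conf:
            best = r
        else:
            break  # binom_sf is decreasing in r
    return best


def nonparametric_basis(sample, p, conf=0.95):
    """Order-statistic basis value for the p-quantile."""
    r = basis_rank(len(sample), p, conf)
    if r is None:
        raise ValueError("sample too small for a nonparametric basis value")
    return sorted(sample)[r - 1]


B_BASIS = 0.10  # B-basis targets the 10th percentile
A_BASIS = 0.01  # A-basis targets the 1st percentile
```

This reproduces the familiar minimum sample sizes: 29 specimens for a nonparametric B-basis and 299 for an A-basis, in both cases equal to the smallest observation. Sample sizes like these, multiplied across conditions, are what make a simulation-based route attractive.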

2606.13019 2026-06-12 stat.AP New submission

Stochastic Modeling of Composite Interfaces: Sensitivity to Spatial Correlation and Bayesian Identification from Standard Fracture Tests

Elton Donfack-Siewe, Sylvain Dubreuil, Christian Fagiano, Jérôme Morio, Jean-Philippe Navarro

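AI summary: Proposes a stochastic finite-element framework that represents interface variability with spatially correlated random fields and uses approximate Bayesian computation to extract the key parameter from standard fracture tests, improving the reliability assessment of aerospace composites.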

Abstract

To enable numerical handling of uncertainties in composite structures, this work presents a stochastic finite-element framework aimed at improving the reliability assessment of aerospace composites, with particular attention to stiffener debonding. By representing interface variability between laminate parts with spatially correlated random fields, the method aims to account for scatter effects at higher scales of simulation and testing. A parametric study on standardized Mode I and Mode II fracture tests reveals that the correlation length is the primary driver of the observed variability, while the regularity of the covariance kernel has only a marginal impact. To ensure industrial relevance, we show that this key parameter can be extracted from experimental fracture data using Approximate Bayesian Computation. The proposed methodology therefore offers a robust route to high-fidelity virtual testing and to the predictive management of uncertainties in the design of damage-tolerant composite airframes.
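The two quantities varied in the parametric study, the correlation length and the regularity of the covariance kernel, can be illustrated with a Gaussian random field along a 1-D interface coordinate. This minimal sketch is our own: it uses an exponential (Matern-1/2, rough) kernel and a squared-exponential (very smooth) kernel, and samples via a Cholesky factor with a small jitter for numerical stability. The finite-element coupling and the ABC calibration are not shown. A positive interface property such as toughness could then be obtained as G0 * exp(field), which is our assumption rather than the paper's stated mapping.

```python
import numpy as np


def exponential_kernel(d, ell):
    """Matern-1/2 correlation: rough, non-differentiable sample paths."""
    return np.exp(-np.abs(d) / ell)


def squared_exponential_kernel(d, ell):
    """Gaussian correlation: very smooth sample paths."""
    return np.exp(-0.5 * (d / ell) ** 2)


def sample_interface_field(s, ell, kernel, rng, mean=0.0, sd=1.0, jitter=1e-6):
    """Draw one stationary Gaussian random field at positions s via a Cholesky factor."""
    s = np.asarray(s, dtype=float)
    d = s[:, None] - s[None, :]                                # pairwise separations
    K = sd**2 * (kernel(d, ell) + jitter * np.eye(s.size))     # jitter keeps K numerically PD
    L = np.linalg.cholesky(K)
    return mean + L @ rng.standard_normal(s.size)
```

For the same correlation length, the exponential kernel produces much larger point-to-point fluctuations than the squared-exponential one. The paper's finding is that this regularity matters far less for the scatter of fracture-test responses than the correlation length itself.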

2606.12889 2026-06-12 stat.AP New submission

The Persistent Non-Response Bias in a Sample-Matched Poll for the 2024 U.S. Presidential Election

Jay Chooi

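AI summary: Examines polling bias in the 2024 U.S. presidential election by applying the data defect correlation framework to data from 60,000 respondents, finds persistent non-response bias among Trump voters, and proposes a pre-election bias-correction estimator based on historical data defect correlations and turnout.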

Comments
Submitted to Journal of Survey Statistics and Methodology
Abstract

Donald Trump won the 2024 US Presidential Election despite polls predicting a Democratic lead, echoing the polling miss in 2016. Using the data defect correlation framework, we revisit the 60,000-respondent Cooperative Election Study and find that non-response bias for Trump voters persists on the same order of magnitude ($\rho=-0.0030$ vs $-0.0045$ in 2016) even under sample-matching to the US adult population. We additionally find evidence of positive response bias for Harris voters after adjusting for turnout. Consistent with findings in 2016, polling errors scale with state population size, and larger samples produce greater departures from conventional confidence intervals, with reductions of effective sample size exceeding 99% in the largest states. We propose a pre-election bias correction estimator informed by historical data defect correlations and turnout rates that decreases RMSE from 0.13 to 0.05 using only prior election data, comparable to post-election weighting (RMSE 0.09).
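The data defect correlation framework rests on an exact identity due to X.-L. Meng: the error of a sample mean equals rho * sqrt((1 - f) / f) * sigma, where rho is the correlation between responding and the outcome, f = n / N is the sampling fraction, and sigma is the population standard deviation. Equating the resulting squared error with the variance of a simple random sample gives an effective sample size f / ((1 - f) rho^2), ignoring the finite-population correction. The sketch below encodes these two formulas; the function names and the example n and N are our own hypothetical choices.

```python
import math


def ddc_error(rho, n, N, sigma):
    """Error of the sample mean: rho * sqrt((1 - f) / f) * sigma, with f = n / N."""
    f = n / N
    return rho * math.sqrt((1 - f) / f) * sigma


def effective_sample_size(rho, n, N):
    """Size of a simple random sample with the same MSE (finite-population correction ignored)."""
    f = n / N
    return f / ((1 - f) * rho**2)
```

With the abstract's 2024 value rho = -0.0030 and a hypothetical state poll of n = 5,000 drawn from N = 30 million, the effective sample size is about 18.5, a reduction of more than 99%. Because f shrinks as N grows, the same tiny rho costs more in larger states, which is the scaling with state population the abstract reports.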

9. Data Privacy, Robustness, and Fairness (4 papers)

2606.13327 2026-06-12 stat.ME stat.OT New submission

Disclosure risk in a geo-spatial setting

Peter-Paul de Wolf

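AI summary: Addresses the balance between disclosure risk and utility when publishing statistics on thematic maps by proposing a new risk measure that is unaffected by the modifiable areal unit problem, is tied to the local density of the target population, and accounts for multiple units sharing a location; its behavior is illustrated on an example dataset of enterprise locations.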

Abstract

Using thematic maps to publish statistical information has become a popular form of visualization. As is the case with all statistical publications, thematic maps also have to deal with the balance between disclosure risk and utility. However, most risk and utility measures do not take into account the spatial character of a map. Some of the proposed spatial risk measures suffer from the Modifiable Areal Unit Problem (MAUP): slightly changing regional classifications may influence the risk. Indeed, even a small translation of, for example, a grid may influence that risk. We propose a new risk measure that does not suffer from MAUP. Moreover, our risk is directly related to the local density of the (target) population and takes into account that multiple units may often be connected to a single location. We show the behavior of our risk measure using an example dataset of fake but realistic enterprise locations. Our risk measure can be adapted to take into account the effect of zooming in or out on the (perceived) risk, as well as the effect of the resolution used.
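To make the MAUP concrete, the sketch below applies a simple threshold rule to three hypothetical enterprise locations. The rule flags any non-empty grid cell with fewer than three units. Shifting the grid by half a cell changes the number of flagged cells from 3 to 0, which is the kind of sensitivity the proposed measure is designed to avoid. The threshold rule is only an illustrative baseline, not the paper's risk measure, and the function names are hypothetical.

```python
import math
from collections import Counter

def grid_counts(points, cell_size, offset=(0.0, 0.0)):
    """Count points per grid cell for a square grid shifted by `offset`."""
    ox, oy = offset
    return Counter(
        (math.floor((x - ox) / cell_size), math.floor((y - oy) / cell_size))
        for x, y in points
    )

def risky_cells(points, cell_size, offset=(0.0, 0.0), threshold=3):
    """Hypothetical threshold rule: a non-empty cell with fewer than
    `threshold` units is flagged as a disclosure risk."""
    counts = grid_counts(points, cell_size, offset)
    return sum(1 for c in counts.values() if c < threshold)

pts = [(0.9, 0.9), (1.1, 1.1), (1.2, 0.8)]
print(risky_cells(pts, 1.0))               # 3: each unit sits alone in its own cell
print(risky_cells(pts, 1.0, (0.5, 0.5)))   # 0: all three units fall into one cell
```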

2606.13025 2026-06-12 stat.ME New submission

Diagnostics-guided variance-inflated Fay-Herriot estimation from non-probability samples

Andrius Čiginas

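AI summary: For small area estimation from non-probability samples, the paper proposes diagnostics-guided variance-inflated Fay-Herriot estimation. The method adjusts variance inflation using domain diagnostics and smooths weakly covered domains more strongly, substantially reducing estimation error.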

Comments
17 pages, 2 figures

Abstract

Non-probability data sources are increasingly considered in small area estimation, but inverse probability weighting (IPW) gives model-dependent domain estimators whose reliability may vary substantially across domains. Standard Fay-Herriot (FH) smoothing borrows strength across domains, yet it uses the supplied area-level variance estimates as if they fully described the uncertainty of the input estimators. This can be misleading when some domains have weak coverage, unstable weights, or poor auxiliary balance, since these features may indicate selection-bias risk not captured by the estimated variance alone. We propose a diagnostics-guided variance-inflated FH estimator for finite-population domain totals. The method starts from calibrated IPW domain estimators, summarizes their reliability through a small set of domain diagnostics, and introduces a mixture variance-inflation component in the FH observation equation. Domains whose diagnostics indicate weaker IPW information are thereby smoothed more strongly toward the area-level regression mean. A truth-known validation based on a pseudo-real population of Lithuanian business enterprises shows a substantial reduction in estimation error relative to calibrated IPW.
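As background, the standard Fay-Herriot composite estimator shrinks each direct estimate toward a regression (synthetic) estimate. The shrinkage weight is gamma_i = sigma_u^2 / (sigma_u^2 + psi_i), so inflating the sampling variance psi_i of a poorly covered domain increases its shrinkage. The sketch makes two simplifying assumptions. First, sigma_u^2 is treated as known rather than estimated. Second, the linear rule psi_i * (1 + lam * d_i), driven by a diagnostic score d_i in [0, 1], is a hypothetical stand-in for the paper's mixture variance-inflation component.

```python
import numpy as np

def fay_herriot_eblup(theta_hat, X, psi, sigma2_u):
    """Composite FH estimator with a fixed (assumed known) random-effect variance."""
    w = 1.0 / (sigma2_u + psi)                  # GLS weights
    XtW = X.T * w                               # X' W with W = diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ theta_hat)
    synthetic = X @ beta                        # area-level regression mean
    gamma = sigma2_u / (sigma2_u + psi)         # shrinkage toward the direct estimate
    return gamma * theta_hat + (1 - gamma) * synthetic, gamma

def inflate_variances(psi, diagnostics, lam=1.0):
    """Hypothetical rule: inflate psi_i in proportion to a diagnostic score in [0, 1]."""
    return psi * (1.0 + lam * diagnostics)

theta_hat = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # hypothetical calibrated IPW estimates
X = np.column_stack([np.ones(5), [1.0, 2.0, 1.5, 3.0, 2.0]])
psi = np.array([1.0, 1.0, 2.0, 1.0, 4.0])             # supplied sampling variances
d = np.array([0.0, 0.0, 1.0, 0.0, 1.0])               # 1 = diagnostics flag weak coverage

est, gamma = fay_herriot_eblup(theta_hat, X, psi, 2.0)
est_inf, gamma_inf = fay_herriot_eblup(theta_hat, X, inflate_variances(psi, d), 2.0)
```

Only the flagged domains (third and fifth) lose weight on their direct estimates: their gamma drops from 0.5 and 1/3 to 1/3 and 0.2, while the others stay at 2/3.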

2606.13629 2026-06-12 stat.ME cs.AI cs.LG stat.ML New submission

Valid Inference with Synthetic Data via Task Exchangeability

Lezhi Tan, Tijana Zrnic

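AI summary: Proposes a task exchangeability condition that ensures valid statistical inference when synthetic data are used in scientific research, with applications to public opinion polling and AI evaluation.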

Abstract

There is a proliferation of work arguing for the use of synthetic data in scientific research. For example, social scientists are arguing for the use of LLM-generated "silicon samples" in pilot studies; AI evaluations increasingly rely on "LLM-as-a-judge" outputs; and proteomics research is accelerated by generative models that produce synthetic protein structures. These developments raise an intriguing possibility: synthetic data may help researchers ask more questions, run more studies, and accelerate discovery. But they also raise a fundamental concern: synthetic data can be biased, noisy, and misspecified. In this work, we propose statistical principles for using synthetic data in scientific research with provable validity guarantees. The key insight is a new technical condition that we call task exchangeability. Informally, this is a requirement that the researcher can identify historical tasks, for which real data is available, such that their current task of interest is exchangeable with the historical tasks in an appropriate mathematical sense. We develop methods for valid inference under task exchangeability, together with extensions that provide guarantees even beyond exchangeability. We demonstrate the framework on public opinion surveys with silicon samples and AI evaluation with autoraters.
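One standard way to turn exchangeability into a finite-sample validity guarantee is split conformal prediction. The sketch below assumes that the gaps between synthetic and real estimates on historical tasks are exchangeable with the gap on the current task. Under that assumption, the interval covers the current task's real-data value with probability at least 1 - alpha. It illustrates the general idea only; it is not the estimators developed in the paper, and the function and argument names are hypothetical.

```python
import math
import numpy as np

def conformal_interval(synthetic_hist, real_hist, synthetic_new, alpha=0.1):
    """Split-conformal interval for the real-data quantity of a new task,
    centred at its synthetic-data estimate."""
    scores = np.sort(np.abs(np.asarray(synthetic_hist, float) - np.asarray(real_hist, float)))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha) - 1e-9)  # small tolerance guards against float round-off
    if k > n:
        return -math.inf, math.inf  # too few historical tasks for this alpha
    q = scores[k - 1]               # k-th smallest historical gap
    return synthetic_new - q, synthetic_new + q

# Hypothetical: synthetic-minus-real gaps on nine past polling questions.
hist_synth = [0.01, -0.02, 0.03, -0.04, 0.05, -0.06, 0.07, -0.08, 0.09]
hist_real = [0.0] * 9
lo, hi = conformal_interval(hist_synth, hist_real, 0.5, alpha=0.2)  # about (0.42, 0.58)
```

With nine historical tasks, any alpha below 0.1 returns an unbounded interval. This reflects the general point that the guarantee is only as informative as the pool of exchangeable historical tasks.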

2606.12654 2026-06-12 stat.ME cs.LG stat.ML New submission

Computationally tractable robust differentially private mean estimation

Kelly Ramsay

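AI summary: Proposes the "balloon mean", a new differentially private mean estimator that achieves computational tractability, robustness, and zero-concentrated differential privacy through iterative clipping over expanding Mahalanobis balls. The paper gives theoretical guarantees on its statistical performance and robustness under heavy-tailed and contaminated elliptical models.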

Comments
40 pages, 17 figures

Abstract

We develop a new, differentially private mean estimator called the balloon mean. The main features of the balloon mean are that it is computationally tractable and enjoys robustness to outlying observations. It is based on an iterative clipping procedure over expanding Mahalanobis balls, or "balloons." The method satisfies zero-concentrated differential privacy and depends on a small number of interpretable tuning parameters. We provide theoretical guarantees under heavy-tailed and contaminated elliptical models, characterizing its statistical performance and robustness to outliers. Extensive simulations demonstrate that the balloon mean is robust to heavy-tailed and contaminated data, and outperforms existing differentially private mean estimators in contaminated settings.
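The abstract does not spell out the balloon mean's update rule, so the sketch below shows only the generic clip-and-noise building block under zero-concentrated DP. Each step clips points to a ball around the current center, releases the clipped mean with Gaussian noise, and repeats. It uses Euclidean rather than Mahalanobis balls and a user-supplied radius schedule, both simplifying assumptions, so it should not be read as the balloon mean itself. The privacy accounting runs as follows:
- Each individual point can move the clipped mean by at most 2r/n.
- The Gaussian mechanism with sigma = Delta / sqrt(2 rho_t) is rho_t-zCDP.
- zCDP budgets add up across steps.

```python
import numpy as np

def clip_to_ball(X, center, radius):
    """Project each row of X onto the Euclidean ball B(center, radius)."""
    diff = X - center
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    return center + diff * scale

def iterative_clipped_mean(X, radii, rho, rng):
    """rho-zCDP mean estimate via repeated clip-and-noise steps (simplified sketch)."""
    n, d = X.shape
    center = np.zeros(d)                 # data-independent starting point
    rho_step = rho / len(radii)          # zCDP composes additively over steps
    for r in radii:
        clipped = clip_to_ball(X, center, r)
        sensitivity = 2.0 * r / n        # replace-one L2 sensitivity of the clipped mean
        sigma = sensitivity / np.sqrt(2.0 * rho_step)  # Delta^2 / (2 sigma^2) = rho_step
        center = clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)
    return center

rng = np.random.default_rng(1)
inliers = rng.normal(5.0, 1.0, size=(9500, 2))
outliers = np.full((500, 2), 100.0)      # hypothetical 5% contamination
X = np.vstack([inliers, outliers])
est = iterative_clipped_mean(X, radii=[20.0, 5.0, 3.0], rho=1.0, rng=rng)
```

In this toy example, the raw mean is pulled to roughly 9.75 per coordinate. The clipped estimate stays near 5 because each outlier's influence is capped at the current radius.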

10. Datasets, Software, and Applications (2 papers)

2606.13523 2026-06-12 stat.CO New submission

HNPclassifier: An R Package for Hierarchical Neyman-Pearson Classification

Lujia Yang, Che Shen, Shunan Yao, Lijia Wang

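AI summary: Presents the R package HNPclassifier, which implements the hierarchical Neyman-Pearson framework and controls under-classification errors in ordered multi-class classification using built-in or user-supplied scoring functions.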

Abstract

In multi-class classification problems, classes often have a natural priority ordering (e.g., cancer stages, COVID-19 severity levels, or air-quality categories). In such settings, it is important to prioritize correct identification of more severe classes and to control under-classification errors, which occur when an observation from a higher-priority class is misclassified into a lower-priority one. The Hierarchical Neyman-Pearson (H-NP) framework of Wang et al. (2024) was developed for ordered multi-class settings to prioritize under-classification error control; its H-NP umbrella algorithm provides high-probability control of under-classification errors at user-specified levels. This paper introduces the R package HNPclassifier, which implements H-NP umbrella algorithms to construct H-NP classifiers using built-in learners such as logistic regression, random forests, and support vector machines, as well as user-supplied scoring functions, thereby enabling effective error control for ordered multi-class classification tasks.
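The H-NP umbrella algorithm builds on the binary Neyman-Pearson umbrella algorithm of Tong, Feng and Li (2018). That algorithm picks a threshold as an order statistic of held-out scores so that an error rate stays below alpha with probability at least 1 - delta. The Python sketch below shows only that binary step, framed as keeping severe-class observations from being under-classified. It is not the HNPclassifier API, and it assumes continuous scores in which larger values favor the less severe class.

```python
from math import comb

def violation_bound(n, k, alpha):
    """Upper bound on P(error > alpha) when the threshold is the k-th smallest
    of n held-out scores: sum_{j=k}^{n} C(n, j) (1 - alpha)^j alpha^(n - j)."""
    return sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j) for j in range(k, n + 1))

def np_umbrella_rank(n, alpha, delta):
    """Smallest order-statistic index k whose violation bound is <= delta."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    raise ValueError("too few held-out samples for the requested (alpha, delta)")

def np_threshold(severe_scores, alpha, delta):
    """Threshold t: assign the less severe class only when score > t, so that
    P(score > t | severe) <= alpha with probability at least 1 - delta."""
    s = sorted(severe_scores)
    return s[np_umbrella_rank(len(s), alpha, delta) - 1]
```

With alpha = delta = 0.05 at least 59 held-out severe-class scores are needed. With exactly 59, the threshold is their maximum, because 0.95^59 ≈ 0.0485 is just below 0.05.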

2606.12642 2026-06-12 astro-ph.EP astro-ph.IM stat.AP New submission

Quantifying Surface Heterogeneity Across Asteroid (101955) Bennu using Candidate Site Remote Sensing Data

Emma-Catherine Belhadfa, Neil E. Bowles, Katherine A. Shirley, Amy A. Simon, Victoria E. Hamilton, Hannah H. Kaplan

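AI summary: Using visible-near-infrared and thermal infrared spectra acquired by the OSIRIS-REx mission, the study quantifies heterogeneity in the mineral composition and physical properties of Bennu's surface at 2-10 m scales. It finds significant differences in hydration indicators and silicate bands between sampling sites.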

Comments
Currently under review at JGR: Planets

Abstract

The OSIRIS-REx mission acquired spatially resolved (2-10 m spot sizes) visible-near infrared (VNIR) and thermal infrared (TIR) spectra across four candidate sampling sites on asteroid (101955) Bennu: Nightingale, Osprey, Sandpiper, and Kingfisher. To quantify heterogeneity across a small body (about 500 m radius) like Bennu, we explore remotely observed spectral data to draw conclusions about the mineralogical composition and key physical processes that drive surface variability. We derive diagnostic band parameters from the OSIRIS-REx Visible and Infrared Spectrometer and the OSIRIS-REx Thermal Emission Spectrometer datasets to quantify compositional and physical variability across sites and assess their mineralogical context. The VNIR spectra exhibit similar overall reflectance shapes but systematic differences in spectral slopes and the 2.74 micron OH absorption. TIR emissivity spectra reveal modest but statistically significant shifts in the positions of the Christiansen Feature and the silicate stretching and bending bands, indicating differences in silicate composition, hydration state, and Mg/Fe relative abundance. Principal component analysis separates each site into distinct clusters in multivariate band-parameter space, whereas K-means clustering identifies intra-site spectral sub-populations. Welch's Analysis of Variance and Hotelling's tests confirm that band-parameter variations between sites are significant. These results reveal that Bennu's surface preserves measurable spectral heterogeneity at 2-10 m scales, with site-to-site variations in hydration indicators and silicate band positions. The spectral properties of Nightingale encompass the full range observed across all four sites, establishing a remote sensing baseline for contextualizing laboratory analyses of the returned sample within Bennu's broader compositional diversity and alteration history.
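For readers reproducing a site-to-site comparison, Welch's one-way ANOVA compares group means without assuming equal variances. The sketch implements Welch's (1951) statistic directly with SciPy's F distribution; the band-parameter values and site labels are hypothetical. With two groups it reduces to Welch's t-test (F = t^2, same degrees of freedom), which gives a convenient consistency check.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA for k groups with unequal variances.
    Returns (F, df1, df2, p_value)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    w = n / variances                           # precision weights
    W = w.sum()
    grand = np.sum(w * means) / W               # weighted grand mean
    a = np.sum(w * (means - grand) ** 2) / (k - 1)
    lam = np.sum((1 - w / W) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k**2 - 1) * lam
    F = a / b
    df1 = k - 1
    df2 = (k**2 - 1) / (3 * lam)
    return F, df1, df2, stats.f.sf(F, df1, df2)

# Hypothetical band-centre values (micrometres) at three sites.
site_a = [8.91, 8.93, 8.95, 8.92, 8.94, 8.96]
site_b = [8.97, 9.01, 8.99, 9.05, 8.98, 9.02, 9.00]
site_c = [8.95, 8.99, 8.93, 8.97, 8.96]
F, df1, df2, p = welch_anova(site_a, site_b, site_c)
```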

11. Other / General Statistics (1 paper)

2606.12448 2026-06-12 physics.geo-ph stat.CO stat.ME New submission

A generalized framework for performance-based earthquake engineering: integrated assessment of structural reliability and resilience

C. Nardin, S. Marelli, B. Sudret, M. Broccardo

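AI summary: Proposes a generalized PBEE framework that embeds damage and recovery in the system dynamics via a continuous-time Markov chain, giving a unified description of structural reliability and resilience. It uses the spectral properties of the generator matrix to compute these metrics efficiently.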

Abstract

Assessing structural performance under seismic hazard requires accounting for both damage accumulation and post-event recovery. In current performance-based earthquake engineering (PBEE), recovery is generally treated as a post-processing attribute, while structural performance is modeled using Poissonian exceedance assumptions that imply renewability and memorylessness. These assumptions hinder a unified treatment of reliability and resilience under repeated seismic loading. This study proposes a generalized PBEE framework in which damage and recovery are embedded directly into the system dynamics through a continuous-time Markov chain. A single generator matrix governs state-dependent transitions, providing a unified description of structural reliability and resilience while remaining compatible with standard PBEE metrics. Time-dependent failure probabilities and reliability indices are derived from the transient system dynamics, whereas resilience is quantified through the expected fraction of operational time before collapse. The framework exploits the spectral properties of the generator matrix to compute both metrics efficiently and transparently. The methodology is illustrated on a three-state example and applied to two structural archetypes: a braced frame and a base-isolated system. Results show that recovery dynamics can strongly affect long-term resilience even when conventional reliability measures exhibit limited sensitivity, emphasizing the need to explicitly account for recovery in life-cycle seismic performance assessment.
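Below is a minimal three-state sketch of the CTMC idea. It assumes the states are operational (0), damaged (1), and collapsed (2, absorbing), with hypothetical annual rates. Transition probabilities come from P(t) = exp(Qt), evaluated through the eigendecomposition of the generator Q. This assumes Q is diagonalizable, which holds for these rates. The failure probability is P_{0,2}(t), and the reliability index is beta = -Phi^{-1}(P_f). The resilience helper implements one possible reading of "expected fraction of operational time before collapse". It divides the expected time spent operational by the expected time to collapse, both taken from the fundamental matrix (-T)^{-1} of the transient block T.

```python
import numpy as np
from scipy.linalg import expm
from scipy.stats import norm

def generator(lam_damage=0.1, lam_collapse_op=0.01, mu_recover=1.0, lam_collapse_dmg=0.05):
    """Hypothetical 3-state generator: 0 operational, 1 damaged, 2 collapsed (absorbing)."""
    return np.array([
        [-(lam_damage + lam_collapse_op), lam_damage, lam_collapse_op],
        [mu_recover, -(mu_recover + lam_collapse_dmg), lam_collapse_dmg],
        [0.0, 0.0, 0.0],
    ])

def transition_matrix(Q, t):
    """P(t) = exp(Qt) via Q = V diag(lam) V^{-1}; assumes Q is diagonalizable
    with real eigenvalues (true for the rates above)."""
    lam, V = np.linalg.eig(Q)
    P = V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)
    return np.real(P)

def failure_probability(Q, t):
    """Probability of having collapsed by time t, starting operational."""
    return transition_matrix(Q, t)[0, 2]

def reliability_index(pf):
    return -norm.ppf(pf)

def operational_fraction(Q):
    """Expected time operational / expected time to collapse, starting from state 0."""
    T = Q[:2, :2]                        # transient block (states 0 and 1)
    occupancy = np.linalg.inv(-T)[0]     # expected time spent in each transient state
    return occupancy[0] / occupancy.sum()

Q = generator()
pf_50 = failure_probability(Q, 50.0)
beta_50 = reliability_index(pf_50)
print(round(operational_fraction(Q), 3))  # 0.913
```

With these rates, the operational fraction equals (mu + lam_collapse_dmg) / (mu + lam_collapse_dmg + lam_damage) = 1.05 / 1.15. It therefore rises with the recovery rate mu, which is the kind of recovery effect a Poissonian exceedance model cannot express.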