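arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.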

2605.20151 2026-05-20 cs.LG math.ST stat.TH

When Does Model Collapse Occur in Structured Interactive Learning?

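Yuchen Wu, Kangjie Zhou, Weijie Su

AI summary: This study examines when model collapse, the performance degradation of generative models, occurs in structured interactive learning environments. By analyzing the topology of the interaction graph, it derives a necessary and sufficient condition for model collapse and validates the theory with numerical experiments.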

Comments: 57 pages, 12 figures

Abstract

The proliferation of generative artificial intelligence has given rise to an interactive learning environment, where model parameters are continuously updated using not only data generated by natural processes, but also synthetic outputs produced by other models. This paradigm introduces two major challenges: (1) training data are no longer drawn exclusively from the target population, undermining a core assumption of classical statistical learning, and (2) model training processes become inherently correlated, as models interact with one another through repeated exposure to each other's synthetic outputs in a potentially complex manner. Establishing reliable statistical inference in such structured interactive learning environments therefore remains an important open problem. In particular, there is growing concern about model collapse, a phenomenon in which the performance of generative models progressively degrades as they are trained on synthetic data produced by earlier model generations. Prior work on model collapse primarily focuses on a single model trained on its own output, failing to capture model performance in multi-model interactive settings. In this work, we fill this gap by investigating the performance of generative models in an interactive learning environment with general interaction patterns. In particular, we formalize model interactions using directed graphs and show that the occurrence of model collapse depends critically on the topology of the interaction graph. We further derive an explicit necessary and sufficient condition characterizing when model collapse occurs, and establish finite-sample results for linear regression and asymptotic guarantees for general M-estimators. We support our theoretical findings through extensive numerical experiments.


2605.20145 2026-05-20 stat.ML cs.LG stat.ME

Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

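Aurélien Pion, Emmanuel Vazquez

AI summary: This paper studies goal-oriented calibration, in the noiseless setting, of the predictive distributions of standard Gaussian process models below a low threshold t. It proposes tcGP, a post-hoc method that calibrates the part of the predictive distribution below t, and shows that the resulting global optimization algorithm remains dense in the design space. Experiments show improved lower-tail calibration and Bayesian optimization performance relative to standard and globally calibrated Gaussian process models.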

Journal ref: ICML 2026

Abstract

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space. Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.
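
To make concrete why lower-tail miscalibration matters for EI, here is a minimal sketch of the standard closed-form expected improvement for minimization under a Gaussian predictive distribution. It is not the tcGP method from the paper; the function names and the example numbers are illustrative assumptions.

```python
import math


def expected_improvement_min(mu, sigma, best):
    """Closed-form EI for minimization when f(x) ~ N(mu, sigma^2).

    EI = E[max(best - f(x), 0)], which depends only on the part of the
    predictive distribution that lies below the current best value.
    """
    if sigma <= 0.0:
        # Degenerate predictive distribution: improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def lower_tail_probability(mu, sigma, t):
    """Predictive probability that f(x) <= t under N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical candidate point: predictive mean 1.0, current best value 0.0.
ei_overconfident = expected_improvement_min(1.0, 0.2, 0.0)  # sigma too small
ei_wider = expected_improvement_min(1.0, 1.0, 0.0)          # wider sigma
```

In this toy setting the overconfident predictive distribution assigns almost no probability below the current best, so its EI is far smaller than with the wider standard deviation, and the point would be unlikely to be sampled.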


2605.20142 2026-05-20 stat.AP q-fin.ST

Mining Financial Data using Mixtures of Mirrored Weibull Distributions

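Zijun Jia, Sharon X. Lee

AI summary: This paper proposes a mixture of mirrored Weibull distributions (MMW) model for modeling stock returns and estimating risk measures. The model flexibly accommodates the non-normal features common in financial data and shows a marked advantage in value-at-risk (VaR) estimation.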

2605.20135 2026-05-20 stat.ME

Quantile-Based Effectiveness Persistence Function: A Tail-Focused Metric with Theory, Estimation, and Application to Biosimilar Evaluation

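Sankaran P. G., Prasanth V. P., Midhu N. N

AI summary: This paper proposes a quantile-based effectiveness persistence function for assessing the tail performance of biosimilars. Through theoretical analysis, nonparametric estimation, and a bootstrap-calibrated two-sample equivalence test, it provides robust statistical tools for capturing clinically relevant tail persistence.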

Abstract

In clinical studies, persistence, which measures the duration of time a patient continues to take a prescribed medication without discontinuation, is increasingly recognized as a critical indicator of adherence to medication. Adherence encompasses not only whether a patient takes their medication as prescribed but also the consistency and duration with which they do so. Among the various metrics used to evaluate adherence, persistence stands out as a particularly robust measure because it provides a temporal dimension, reflecting the sustained commitment of patients to their therapeutic regimens. This focus on persistence offers unique insights into adherence-related quality and performance, shedding light on the challenges and opportunities to optimize long-term medication use. The comparison of upper-tail clinical performance, which measures the extent to which very large responses persist among top responders, is often more decisive in therapy evaluation than conventional summaries. In this paper, we introduce the quantile-based effectiveness persistence function, defined as the ratio between the tail mean and the quantile function. The notion parallels expected shortfall in risk theory and is tailored to detect clinically meaningful deviations in the upper tail. We establish key properties and show that the function is equivalent to the first L-moment of the scaled tail, yielding robust inference tools. We derive a simple nonparametric estimator of the function and develop a bootstrap-calibrated two-sample (upper-tail) equivalence test. Simulation studies and real-data analysis illustrate that the proposed measure captures clinically relevant tail persistence that complements median- and mean-based summaries.
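
The "ratio between the tail mean and the quantile function" can be illustrated with a plug-in estimate. The sketch below assumes the function at level u is the mean of the observations above the empirical u-quantile divided by that quantile; this reading, the function name, and the choice of empirical quantile are assumptions, and the paper's own estimator may differ.

```python
import math


def persistence_ratio(sample, u):
    """Plug-in estimate: (mean of observations above the u-quantile) / (u-quantile)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    xs = sorted(sample)
    n = len(xs)
    k = math.ceil(n * u)  # rank of the empirical u-quantile (inverse ECDF)
    if k >= n:
        raise ValueError("need at least one observation above the quantile")
    quantile = xs[k - 1]
    if quantile <= 0:
        raise ValueError("the ratio is only meaningful for a positive quantile")
    tail = xs[k:]  # observations ranked above the quantile
    return (sum(tail) / len(tail)) / quantile


# Toy responses 1..8 at level u = 0.75: quantile 6, tail mean 7.5.
ratio = persistence_ratio(range(1, 9), 0.75)
```

A ratio close to 1 means responses barely exceed the quantile in the upper tail, while larger values indicate that very large responses persist among top responders.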


2605.20125 2026-05-20 stat.ME math.ST stat.TH

Federated Learning with Incomplete Data: When to Use Complete Cases and When to Weight

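Jesus E. Vazquez, Yicheng Shen, Jason Akulian, Chad Hochberg, Theodore J. Iwashyna, Elizabeth A. Stuart, Jiayi Tong

AI summary: This paper studies when a complete-case estimator should be used instead of an inverse probability weighting estimator in federated learning with missing data, and proposes a calibrated weight estimation method to improve consistency.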

2605.20122 2026-05-20 stat.ML cs.CC cs.LG

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

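Peter Matthew Jacobs, Jeff M. Phillips

AI summary: This paper proposes a Sample-Sketch-Solve approach that introduces a regular Cartesian grid sketch to compress the data and speed up Wasserstein distance computation, achieving ε-error estimation with improved runtime for Hölder smooth distributions.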

Abstract

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower-dimensional Euclidean space problems ($d \in \{2,3\}$), algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $\varepsilon$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular Cartesian grid sketch of the samples. We show that (especially under $\alpha$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure, which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $\varepsilon$ error in $\varepsilon^{-\max(2,\frac{d+1+o(1)}{1+\alpha})}$ time for $0 < \alpha < 1$ Hölder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $\Theta(\varepsilon^{-2})$ for $\alpha > 1/2$ when $d=2$ and nearly optimal as $\alpha \to 1$ when $d = 3$.
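
The two closing claims follow from simple arithmetic on the runtime exponent. The worked step below drops the $o(1)$ term for readability, which is a simplification of the stated bound.

```latex
\[
  \text{runtime} \approx \varepsilon^{-\max\left(2,\ \frac{d+1}{1+\alpha}\right)},
  \qquad 0 < \alpha < 1 .
\]
% d = 2: the second argument is 3/(1+\alpha), and
\[
  \frac{3}{1+\alpha} \le 2 \iff \alpha \ge \tfrac{1}{2},
\]
% so for \alpha > 1/2 the maximum equals 2 and the runtime is of order \varepsilon^{-2}.
% d = 3: the second argument is 4/(1+\alpha) > 2 for every \alpha < 1, and
\[
  \frac{4}{1+\alpha} \to 2 \quad \text{as } \alpha \to 1 ,
\]
% for example \alpha = 1/2 gives exponent 8/3, so \varepsilon^{-2} is only approached in the limit.
```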


2605.20099 2026-05-20 math.ST stat.ME stat.TH

A Goodness-of-Fit Test for Independent Component Models in High Dimensions

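Mingshuo Liu, Siyao Wang, Miles E. Lopes

AI summary: This paper proposes a goodness-of-fit test for independent component models in high dimensions. The test has theoretical guarantees when the data dimension and sample size diverge proportionally, and numerical experiments and a gene expression example illustrate its diagnostic potential in practice.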

2605.20068 2026-05-20 stat.ML cs.LG

Tail Annealing for Heavy-Tailed Flow Matching

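Jean Pachebat

AI summary: This paper proposes a simple way to handle heavy-tailed data: apply a soft-log transform to the data before training, then exponentiate samples after generation. A Hill diagnostic decides per coordinate whether to transform, leaving light-tailed margins untouched. This compresses heavy tails into a range that standard flow matching can handle, without heavy-tailed base distributions or architectural modifications.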

Comments: 18 pages

Abstract

Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply the soft-log transform $\phi(x) = \mathrm{sign}(x) \cdot \log(1 + |x|)$ coordinate-wise to data before training, then exponentiate samples after generation. A Hill diagnostic decides per-coordinate whether to transform, leaving light-tailed margins untouched at no added complexity. This compresses heavy tails into a range where standard flow matching succeeds, without heavy-tailed base distributions or architectural modifications. We provide theoretical intuition for why this works: the log-transform maps Pareto tails to exponentials, and the induced dynamics implement a form of tail annealing via power transformations. On a 144-configuration multivariate benchmark (3 copulas, $d$ up to 100, 4 tail indices), Log-FM dominates specialized baselines on $W_1$, CVaR$_{99}$, and extreme-quantile metrics, and is the only method with zero severe divergences across 2,880 runs.
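
The transform and its inverse are short enough to sketch directly, together with a textbook Hill estimator of the tail index. The function names, the choice of k, and the idea of thresholding the Hill estimate to decide whether to transform a coordinate are illustrative assumptions; the abstract does not specify how the diagnostic is applied.

```python
import math


def soft_log(x):
    """phi(x) = sign(x) * log(1 + |x|), applied before training."""
    return math.copysign(math.log1p(abs(x)), x)


def soft_exp(y):
    """Inverse of soft_log: sign(y) * (exp(|y|) - 1), applied after generation."""
    return math.copysign(math.expm1(abs(y)), y)


def hill_tail_index(values, k):
    """Hill estimate of the tail index alpha from the k largest magnitudes."""
    mags = sorted((abs(v) for v in values), reverse=True)
    if not 0 < k < len(mags) or mags[k] <= 0.0:
        raise ValueError("k must satisfy 0 < k < n with a positive threshold value")
    # Average log-excess over the (k+1)-th largest magnitude.
    mean_log_excess = sum(math.log(m / mags[k]) for m in mags[:k]) / k
    return math.inf if mean_log_excess == 0.0 else 1.0 / mean_log_excess


# Deterministic Pareto(alpha = 2) quantiles stand in for a heavy-tailed coordinate.
n = 2000
pareto = [(1.0 - (i + 0.5) / n) ** (-1.0 / 2.0) for i in range(n)]
alpha_hat = hill_tail_index(pareto, k=200)

# Hypothetical decision rule: transform only when the estimated index is small.
transformed = [soft_log(x) for x in pareto] if alpha_hat < 5.0 else pareto
```

On these Pareto quantiles the Hill estimate lands close to the true index 2, so the coordinate would be transformed under the hypothetical rule, and `soft_exp` maps generated values back to the original scale.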


2605.20007 2026-05-20 stat.ME

Identifying Interventional Joint Distributions via Extended Bridge Functions

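Constantin Schott

AI summary: This paper proposes extended bridge functions for identifying joint interventional distributions, addressing the inability of traditional approaches to retain all relevant proxy variables. It applies them to a proximal identification algorithm and builds a general framework based on kernel operations.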

2605.20003 2026-05-20 stat.ME stat.AP

Estimating treatment duration effects via clone-censor-weight: a breast cancer case study

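Charlotte Voinot, Noémie Simon-Tillaux, Emma Torrini, Stefan Michiels, Bernard Sebastien, Clément Berenfeld, Julie Josse

AI summary: This paper studies the estimation of treatment duration effects in observational survival data. It uses the clone-censor-weight framework to emulate target trials of treatment duration strategies, compares several estimation approaches through simulation studies, and applies the framework to a breast cancer cohort, showing both the practical utility and the limitations of the approach.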

Abstract

In this work, we study the estimation of treatment duration effects in observational survival data, where treatment and covariate histories evolve over time and longer observed durations are only attainable among individuals who remain event-free and under follow-up, leading to immortal time bias under naive analyses. The cloning-censoring-weighting (CCW) framework provides a practical approach to emulate target trials of treatment duration strategies, but several methodological aspects remain insufficiently understood. We focus on static treatment duration strategies under two settings of increasing complexity: baseline confounding only, and confounding with time-varying covariates. We formalize the assumptions underlying CCW, with particular emphasis on treatment admissibility, relaxed intervention rules, and the distinction between artificial and natural censoring. We then compare several estimation approaches after cloning and censoring, including inverse probability of censoring weighting (IPCW), the G-formula, and doubly robust estimators, through simulation studies assessing robustness, variability, and sensitivity to censoring model misspecification. Finally, we apply the framework to a breast cancer cohort to emulate a target trial comparing 2 versus 5 years of adjuvant tamoxifen in early-stage breast cancer. Due to the small number of events and limited support for the 2-year strategy, estimates are associated with substantial uncertainty. These findings highlight both the practical relevance and the limitations of CCW, and underscore the importance of sensitivity analyses in complex longitudinal observational settings.


2605.19989 2026-05-20 math.ST cs.NA math.NA stat.TH

Error Bounds for Importance Sampling with Estimated Proposal Distributions

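Cathrine Aeckerle-Willems, Ilja Klebanov, Simon Weissmann

AI summary: This paper studies importance sampling with data-driven proposal distributions. It derives non-asymptotic error bounds that separate the Monte Carlo error from the proposal approximation error, and provides quantitative guarantees for KDE-based proposals.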

Abstract

Importance sampling with data-driven proposal distributions is widely used in practice. A common workflow first generates an auxiliary sample of size $N$ from an approximation of the target distribution, constructs a density estimate $\hat q$ such as a kernel density estimator (KDE), and then draws $n$ importance samples from this learned proposal. Despite its practical relevance, the theoretical properties of this hierarchical procedure remain poorly understood, since classical importance sampling theory assumes a fixed proposal. We address this gap by deriving non-asymptotic error bounds for standard, defensive, and self-normalized importance sampling estimators with random proposals. Our results separate the Monte Carlo error, scaling as $n^{-1/2}$, from the proposal approximation error measured through the mean integrated absolute and squared errors (MIAE and MISE) of $\hat q$. To obtain explicit convergence rates in $(N,n)$, we establish MIAE and MISE bounds for KDEs constructed from geometrically ergodic Markov chains in stationary and non-stationary regimes. Combining these results yields quantitative guarantees for importance sampling with KDE-based proposals. Our theory provides practical guidance for selecting defensive mixture weights in a nonparametric importance sampling framework.
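
The workflow described above (auxiliary sample, KDE proposal, then importance samples) can be sketched in a few lines. The example below is a toy one-dimensional version with a Gaussian KDE, a defensive mixture with a wide Gaussian, and the self-normalized estimator. The i.i.d. auxiliary sample, the rule-of-thumb bandwidth, the mixture weight 0.1, and all names are assumptions made for illustration, not the paper's setup.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kde_pdf(x, centers, h):
    """Gaussian kernel density estimate q_hat(x) with bandwidth h."""
    return sum(normal_pdf(x, c, h) for c in centers) / len(centers)


def defensive_snis(f, target_unnorm, centers, h, n, eta, safe_sd, rng):
    """Self-normalized IS with proposal (1 - eta) * q_hat + eta * N(0, safe_sd^2)."""
    num = den = 0.0
    for _ in range(n):
        if rng.random() < eta:
            x = rng.gauss(0.0, safe_sd)          # defensive component
        else:
            x = rng.gauss(rng.choice(centers), h)  # draw from the KDE
        q = (1.0 - eta) * kde_pdf(x, centers, h) + eta * normal_pdf(x, 0.0, safe_sd)
        w = target_unnorm(x) / q                 # unnormalized importance weight
        num += w * f(x)
        den += w
    return num / den


def demo(seed=0, big_n=200, n=2000):
    """Estimate E[X^2] = 1 for a standard normal target known up to a constant."""
    rng = random.Random(seed)
    # Auxiliary sample from a rough approximation of the target (an assumption).
    aux = [rng.gauss(0.3, 1.2) for _ in range(big_n)]
    h = 1.06 * statistics.stdev(aux) * big_n ** (-0.2)  # rule-of-thumb bandwidth
    return defensive_snis(lambda x: x * x, lambda x: math.exp(-0.5 * x * x),
                          aux, h, n, eta=0.1, safe_sd=3.0, rng=rng)


estimate = demo()
```

The defensive component keeps the weights bounded even where the KDE underestimates the target, which is the role the mixture weight plays in the guidance mentioned at the end of the abstract.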


2605.19900 2026-05-20 math.ST stat.TH

Uniform projection designs under the stratified $L_2$-discrepancy

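Sixu Liu, Yaping Wang

AI summary: This paper studies the uniform projection criterion for space-filling designs under the stratified $L_2$-discrepancy. It gives an explicit formula and establishes tight upper and lower bounds, shows that known optimal constructions attain the lower bound, and shows that designs attaining the lower bound of the full stratified $L_2$-discrepancy also attain the lower bound of this criterion. The criterion efficiently assesses the uniformity of low-dimensional projections.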

2605.19894 2026-05-20 math.PR math.ST stat.TH

Sharp Spectral Thresholds for Multi-View Spiked Wigner Models

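Xiaodong Yang, Subhabrata Sen, Yue M. Lu

AI summary: This paper studies a multi-view spiked Wigner model. It derives a spectral estimator for the latent spikes by linearizing approximate message passing (AMP), gives an explicit sharp threshold formula for its spectrum, and shows that, for a broad class of spike priors, the spectral threshold SNR(λ,B) = 1 of the linearized AMP method coincides with the information-theoretic weak-recovery threshold.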

Comments: 67 pages, 2 figures

Abstract

Motivated by multimodal estimation, we study a multi-view spiked Wigner model in which several noisy matrix observations contain correlated latent spikes. We derive a spectral estimator for the latent spikes by linearizing approximate message passing (AMP). Our main result is an explicit sharp transition formula for its spectrum: for $L \geq 2$ views, letting $\lambda$ be the $L$-dimensional vector of spike strengths and $B$ the $L\times L$ limiting Gram matrix of the spikes, the critical parameter is $\mathsf{SNR}(\lambda,B)=\lambda_{\max}[\mathrm{Diag}(\sqrt{\lambda}) (B \odot B) \mathrm{Diag}(\sqrt{\lambda})]$. When $\mathsf{SNR}(\lambda,B)<1$, the linearized AMP matrix has no outlier beyond the right edge of its bulk spectrum. When $\mathsf{SNR}(\lambda,B)>1$, an informative outlier is pinned at the distinguished point $1$, and the associated eigenvector has explicit, nontrivial overlaps with the latent signals. Thus $\mathsf{SNR}(\lambda,B)=1$ gives the exact spectral weak-recovery threshold for the linearized AMP method. To establish our results, we analyze the correlated Gaussian noise matrix through a matrix Dyson equation and combine this deterministic description with finite-rank perturbation arguments adapted to the multi-view spike structure. We also show that, for a broad class of spike priors, the spectral threshold $\mathsf{SNR}(\lambda,B)=1$ coincides with the information-theoretic threshold for weak recovery, ruling out a statistical-computational gap for this class of priors.
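
The critical parameter is the largest eigenvalue of a small symmetric matrix, so it is easy to evaluate numerically. The sketch below builds the matrix with entries sqrt(λ_i) · B_ij² · sqrt(λ_j) and finds its top eigenvalue by power iteration, which applies because the matrix is symmetric with nonnegative entries. The function name and the example inputs are illustrative assumptions; a linear algebra library's symmetric eigenvalue routine would serve equally well.

```python
import math


def multiview_snr(spike_strengths, gram, iters=1000):
    """Largest eigenvalue of Diag(sqrt(lam)) (B ⊙ B) Diag(sqrt(lam))."""
    size = len(spike_strengths)
    root = [math.sqrt(s) for s in spike_strengths]
    # Entrywise (Hadamard) square of the Gram matrix, rescaled on both sides.
    m = [[root[i] * gram[i][j] ** 2 * root[j] for j in range(size)]
         for i in range(size)]

    def apply(v):
        return [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]

    v = [1.0] * size  # positive start vector overlaps the nonnegative top eigenvector
    for _ in range(iters):
        w = apply(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-norm) iterate.
    return sum(vi * wi for vi, wi in zip(v, apply(v)))


# Two views with equal strength 0.6 (hypothetical values).
snr_partial = multiview_snr([0.6, 0.6], [[1.0, 0.5], [0.5, 1.0]])  # 0.75
snr_aligned = multiview_snr([0.6, 0.6], [[1.0, 1.0], [1.0, 1.0]])  # 1.2
```

In this toy example the same per-view strengths fall below the threshold 1 when the spikes have Gram entry 0.5 and above it when the spikes are fully aligned, which shows how correlation between views enters the criterion.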


2605.19878 2026-05-20 stat.ME

Sample Size Determination Under Selection Bias: Robust Tolerance Limits for Prevalent Cohort Data

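James H. McVittie, Martin Lysy, Masoud Asgharian

AI summary: This paper studies sample size determination in the presence of selection bias. It proposes robust tolerance limits for biased sampling schemes, including weight bias and censoring, and validates them through a simulation study.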

Comments: 11 pages, 3 figures, 1 table

Abstract

Tolerance limits have received considerable attention in the statistical literature, with applications reaching far beyond their initial role in quality control. The well-known formula of Scheffé and Tukey (1944) establishes a simple, distribution-free relation between sample size and population coverage by two given order statistics and a given confidence level. A key requirement in applying this formula is the availability of an unbiased, representative sample from the population of interest. However, as often happens in biological and medical applications, various logistical constraints may preclude obtaining an unbiased sample. We derive extensions of this formula that accommodate a large class of biased sampling schemes, including weight bias and censoring. The modified formulae are validated through a simulation study and compared to the unmodified formula. We illustrate the use of the modified formulae on partially observed failure times for individuals with dementia, using data collected by the Canadian Study of Health and Aging.
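
For orientation, the classical unbiased-sample relation can be evaluated directly. The sketch below covers only the textbook special case of an i.i.d. sample from a continuous distribution with the interval from the sample minimum to the sample maximum, where the covered population fraction follows a Beta(n - 1, 2) distribution. It does not implement the bias-adjusted extensions derived in the paper, and the function names are illustrative.

```python
def coverage_confidence(n, beta):
    """P([X_(1), X_(n)] covers at least a fraction beta of the population).

    Valid for an i.i.d. sample of size n from a continuous distribution,
    where the coverage is Beta(n - 1, 2) distributed.
    """
    return 1.0 - n * beta ** (n - 1) + (n - 1) * beta ** n


def min_sample_size(beta, gamma):
    """Smallest n whose min-max interval covers beta with confidence gamma."""
    if not (0.0 < beta < 1.0 and 0.0 < gamma < 1.0):
        raise ValueError("beta and gamma must lie strictly between 0 and 1")
    n = 2
    while coverage_confidence(n, beta) < gamma:
        n += 1
    return n


n_95_95 = min_sample_size(0.95, 0.95)  # 93
```

With 95% coverage and 95% confidence this gives n = 93, the familiar textbook value. Under biased sampling the same n no longer carries that guarantee, which is the gap the modified formulae address.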


2605.19861 2026-05-20 stat.ME

Stationary subspace analysis for spatial data

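Perttu Saarela, Klaus Nordhausen, Jaakko Pere, Anne M. Ruiz

AI summary: This paper proposes spatial stationary subspace analysis (spSSA) for spatially indexed data. It estimates the unmixing matrix using first- and second-order spatial statistics, handles multiple forms of nonstationarity, and estimates the dimension of the nonstationary subspace through a data augmentation approach.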

Abstract

Stationary subspace analysis (SSA) is a blind source separation framework that decomposes linearly mixed multivariate data into stationary and nonstationary components. We extend SSA to spatially indexed data by introducing spatial stationary subspace analysis (spSSA), which explicitly accounts for spatial dependence. We propose three estimation procedures for the unmixing matrix based on first- and second-order spatial statistics. Each procedure targets a different type of nonstationarity and can be formulated as the solution to a generalized eigenvalue problem. To address situations where multiple forms of nonstationarity are present simultaneously, we combine the three procedures using approximate joint diagonalization. Simulation studies demonstrate that this combined approach yields superior separation performance. When the dimension of the nonstationary subspace is known, the proposed methods reliably recover the latent stationary and nonstationary components. However, determining this dimension remains a fundamental challenge in SSA, for which no generally accepted solution currently exists. Building on our estimation procedures, we propose a novel data augmentation approach to estimate the dimension of the nonstationary subspace and demonstrate its effectiveness through simulation studies. The proposed methodology is easily transferable to time series settings, making it of broader methodological interest.


2605.19830 2026-05-20 cs.LG math.ST stat.TH

Set-Valued Policy Learning

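Laura Fuentes-Vicente, Mathieu Even, Gaëlle Dormion, Antoine Chambaz, Uri Shalit, Julie Josse

AI summary: This paper proposes a set-valued policy learning approach for the multiple-treatment setting, in which the policy outputs a set of plausible treatments rather than a single recommendation, thereby quantifying uncertainty intrinsically. It extends the learning-to-defer framework via a novel greatest Lower Bound method and introduces conformal policy learning to bridge unobserved ground-truth optimal treatments and estimated optimal treatment rules.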

Abstract

Conventional treatment policies map patient covariates to a single recommended intervention in order to maximize expected clinical outcomes. Although a rich body of causal inference methods has been developed to estimate such policies, point-valued recommendations can be highly sensitive to estimation uncertainty, model specification, and finite-sample variability, while typically providing little guidance about how confident one should be in the recommended action. In this work, we propose a set-valued policy learning paradigm for the multiple-treatment setting, in which policies output a set of plausible treatments rather than a single recommendation. This formulation enables intrinsic uncertainty quantification, with the size of the predicted set reflecting the degree of decision ambiguity. We extend the learning-to-defer framework to multiple treatments via a novel greatest Lower Bound method, and introduce conformal policy learning, which bridges the gap between unobserved ground-truth optimal treatments and estimated optimal treatment rules. Drawing on insights from the noisy-label literature, we develop a randomness-injection approach that guarantees marginal coverage without requiring assumptions on underlying black-box optimal treatment rules. Through experiments on synthetic data and a real-world application to In-Vitro Fertilization (IVF), we demonstrate that our methods produce robust and actionable policies that naturally incorporate clinical considerations while effectively balancing performance and reliability.


2605.19823 2026-05-20 cs.LG cs.AI math.AP math.DS stat.ML

Smooth Piecewise Cutting for Neural Operator to Handle Discontinuities and Sharp Transitions

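Ha Dang, Sebastian Schmidt, Juergen Hesser

AI summary: This paper proposes Cut-DeepONet, a two-stage training framework that represents discontinuities as boundaries in a higher-dimensional space. This reduces learning complexity and captures discontinuities and sharp transitions more effectively when learning solution operators of partial differential equations.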

Abstract

Neural operators have achieved strong performance in learning solution operators of partial differential equations (PDEs), but their inherently continuous representations struggle to capture discontinuities and sharp transitions. Existing approaches typically approximate such features within continuous function spaces, often requiring increased model capacity and high-resolution data. In this work, we propose Cut-DeepONet, a two-stage training framework that explicitly models discontinuities while reducing learning complexity. Our approach reformulates the problem via a lifting strategy, partitioning the domain into smooth subregions while representing discontinuities as boundaries in a higher-dimensional space. This separation aligns the operator learning task with the inductive bias of neural networks and avoids directly approximating discontinuities. An additional network predicts input-dependent discontinuity locations for unseen inputs, which are then used to guide the neural operator in generating smooth components within each region. Experiments on benchmark PDEs show that Cut-DeepONet outperforms state-of-the-art methods, even when trained on low-resolution datasets. The method excels on problems with discontinuities and sharp transitions, while using fewer trainable parameters. Our results highlight the benefits of changing the representation of operator learning rather than increasing model complexity.


2605.19813 2026-05-20 cs.LG math.ST stat.TH

General Lower Bounds for Differentially Private Federated Learning with Arbitrary Public-Transcript Interactions

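Yicheng Li

AI summary: This paper studies lower bounds for differentially private federated learning with arbitrary public-transcript interactions. It proposes a federated Van Trees lower bound for parameter estimation under squared $\ell_2$ loss and illustrates it with applications to mean estimation, linear regression, and nonparametric regression.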

2605.19812 2026-05-20 cs.LG cs.AI stat.AP stat.ML

FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes

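Anya Fries, Jacob A Nelson, Martin Jung, Markus Reichstein, Jonas Peters

AI summary: This study introduces FLUXtrapolation, a benchmark for extrapolating ecosystem fluxes. It analyzes the challenges that distribution shift poses for flux upscaling and evaluates machine learning methods under distribution shift, with the aim of advancing the scientific goal of flux upscaling.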

Abstract

We introduce FLUXtrapolation, a benchmark for extrapolating ecosystem fluxes under progressively harder distribution shifts. Ecosystem fluxes are central to understanding the carbon, water, and energy cycles, yet they can only be measured directly at sparsely located measurement towers. Producing global flux estimates therefore requires training models on observed sites using globally available covariates and predicting in unobserved regions, that is, upscaling. Flux upscaling is a challenging domain generalization problem that is affected by a shift in covariate distribution across climates, ecosystem types, and environmental conditions, as well as by conditional shift: important drivers remain unobserved at global scale. We provide a quantitative analysis of both these shifts in $P_X$ and $P_{Y\mid X}$. FLUXtrapolation is designed based on domain expertise on flux upscaling: it defines temporal, spatial, and temperature-based extrapolation scenarios and evaluates performance across held-out domains, temporal aggregations, and tail errors. In a pilot study, we find that baselines perform similarly under median hourly RMSE, but separate under the proposed tail-focused and multi-scale evaluation. FLUXtrapolation therefore poses a realistic and thus relevant challenge for machine learning methods under distribution shift; at the same time, progress on this benchmark would directly support the scientific goal of improving flux upscaling.


2605.19807 2026-05-20 stat.ME

Reliable model selection in the presence of parameter non-identifiability

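Yong See Foo, Torkel E. Loman, Alexander P. Browning, Ivo Siekmann, Ruth E. Baker, Jennifer A. Flegg

AI summary: This paper studies the reliability of evidence computation methods in the presence of parameter non-identifiability. It proposes a novel implementation of adaptive multiple importance sampling to make model selection more robust, and uses ecological case studies to show results comparable to those of MCMC methods at substantially lower computational cost.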

Comments: 33 pages, 8 figures

Abstract

Mathematical models are invaluable for understanding and predicting how biological systems behave, although their construction requires specifying mechanisms and relationships that are often not perfectly known. In the presence of multiple competing models, model uncertainty should be accounted for when performing inference based on available data. Bayesian model selection is a framework for testing mechanistic hypotheses and generating predictions under model uncertainty, which generally requires computation of the model evidence. In this work, we investigate the reliability of evidence computation methods when parameter non-identifiability -- the inability to distinguish between parameter values given available data -- is present, and find that deterministic evidence approximations can produce misleading model selection results because their underlying assumptions are violated. We propose a novel implementation of adaptive multiple importance sampling for evidence estimation, and demonstrate its robustness against non-identifiability. We use ecological case studies to demonstrate how simple model selection methods fail to produce accurate results, whereas our method yields model selection results that are comparable to those obtained by Markov chain Monte Carlo methods at substantially lower computational cost. Given the pervasiveness of parameter non-identifiability in mathematical biology, this work provides a practical approach to reliable model selection in the presence of poorly identified parameters.

2605.19784 2026-05-20 math.OC math.PR math.ST stat.ML stat.TH

Fast Spawn&Prune (FS&P): Global convergence of stochastic conic particle gradient descent via birth/death process

快速生成与剪枝（FS&P）：通过生成/死亡过程实现随机锥粒子梯度下降的全局收敛

Yohann De Castro, Sébastien Gadat, Clément Marteau

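AI summary: This paper studies the global optimization, over the space of measures, of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO). Conic Particle Gradient Descent (CPGD) methods are computationally efficient, but the non-convex parameterization can trap them in local minima. The authors introduce Fast Spawn&Prune (FS&P), a stochastic algorithm that extends FastPart (De Castro et al., 2025) and combines CPGD with a birth-death process. The birth mechanism ensures asymptotic global exploration by introducing particles in regions where first-order optimality conditions are violated; the death process preserves computational efficiency by pruning non-informative particles. The paper gives the first theoretical guarantee of global convergence for this class of discrete-time stochastic algorithms, without requiring exponentially large initializations. It derives explicit convergence rates for the excess risk of order $\mathcal{O}\big((\log K / K)^{\frac{1}{2(2+d)}}\big)$, where $K$ is the number of iterations and $d$ the dimension of the domain, which quantifies the trade-off between global exploration and local refinement. The sample complexity is $\mathcal{O}\big(N^{-\frac{1}{4(2+d)}}\big)$ up to logarithmic factors. A horizon-free variant that needs no prior knowledge of the iteration budget is also proposed.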

AI中文摘要

我们研究了连续稀疏回归中出现的目标函数（特别是Beurling LASSO（BLASSO））在测度空间中的全局优化问题。尽管锥粒子梯度下降（CPGD）方法在计算上是高效的，但非凸参数化可能导致陷入局部极小值。为克服这一限制，我们引入了快速生成与剪枝（FS&P），一种随机算法，它扩展了De Castro等人（2025）提出的FastPart，并将CPGD与生成-死亡过程相结合。生成机制通过在违反一阶最优条件的区域引入粒子，确保渐近全局探索；死亡过程通过剪枝非信息性粒子保持计算效率。我们为这一类离散时间随机算法提供了首次理论保证，证明其全局收敛性，无需指数级初始化。此外，我们推导了经验风险的显式收敛率，其规模为O((log K/K)^{1/(2(2+d))})，其中K表示迭代次数，d为域的维度，从而量化了全局探索与局部细化之间的权衡。此外，样本复杂度为O(N^{-1/(4(2+d))})（忽略对数因子）。我们还提出了一种无需先验知识迭代预算的变体。

英文摘要

We investigate the global optimization of the objective function arising in continuous sparse regression, specifically the Beurling LASSO (BLASSO), over the space of measures. While Conic Particle Gradient Descent (CPGD) methods are computationally efficient, they may become trapped in local minima due to the non-convexity of the parameterization. To overcome this limitation, we introduce Fast Spawn&Prune (FS&P), a stochastic algorithm that extends FastPart introduced in De Castro et al. (2025) and combines CPGD with a birth-death process. The birth mechanism ensures asymptotic global exploration by introducing particles in regions where first-order optimality conditions are violated, while the death process preserves computational efficiency by pruning non-informative particles. We provide the first theoretical guarantee of global convergence for this class of discrete-time stochastic algorithms, without requiring exponentially large initializations. Furthermore, we derive explicit convergence rates for the excess risk, which scale as $\mathcal{O}\big(\left(\log K / K\right)^{\frac{1}{2(2+d)}}\big)$, where $K$ denotes the number of iterations and $d$ the dimension of the domain, thereby quantifying the trade-off between global exploration and local refinement. Moreover, the sample complexity is $\mathcal{O}\big(N^{-\frac{1}{4(2+d)}}\big)$ (up to logarithmic factors). We also propose a horizon-free variant that does not require prior knowledge of the iteration budget.

2604.23826 2026-05-20 stat.CO

Building a GPU-Accelerated Multivariate Statistics Platform

构建一个GPU加速的多元统计平台

Mike Crowhurst

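AI summary: This paper studies how to apply classical multivariate statistical methods efficiently at very large data scales, using a GPU-accelerated workflow that computes sufficient statistics in a single pass, with attention to both performance and numerical stability.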

Comments: 13 pages, 1 Figure, 3 Tables

AI中文摘要

经典的多元统计方法，如协方差估计和主成分分析，在数学上已被充分理解，但在极端数据规模下的应用仍然具有挑战性。当观测数达到十亿级别时，性能受限于数据移动、输入输出瓶颈和数值稳定性，而非算术复杂性。本文展示了在单个多GPU节点上扩展经典多元统计方法的案例研究。使用C++和CUDA开发了一个GPU加速的工作流，以在单次遍历100亿行数据集时计算足够统计量。通过列总和和交叉乘积矩阵，可以无需重新访问原始数据即可进行均值、协方差、相关性和主成分分析的下游计算。结果突显了在大规模应用已建立的统计方法时，数据表示、使用已知不变量进行验证以及仔细的数值处理的重要性。

英文摘要

Classical multivariate statistical methods such as covariance estimation and principal component analysis are well understood mathematically, yet their application at extreme data scales remains challenging. When the number of observations reaches billions, performance is limited by data movement, input-output bottlenecks, and numerical stability rather than arithmetic complexity. This work presents a case study of scaling classical multivariate statistics on a single multi-GPU node. Using C++ and CUDA, a GPU-accelerated workflow was developed to compute sufficient statistics in a single pass over a 10-billion-row dataset. Column sums and cross-product matrices are used to enable downstream computation of means, covariance, correlation, and principal component analysis without revisiting the raw data. The results highlight the importance of data representation, validation using known invariants, and careful numerical treatment when applying established statistical methods at large scale.
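
The role of the sufficient statistics can be illustrated on the CPU. The sketch below makes one pass over the rows, keeps only the row count, the column sums, and the cross-product matrix, and then derives covariance and correlation without touching the raw data again. It is a pure-Python illustration of the idea, not the C++/CUDA workflow described in the abstract, and the function names are hypothetical.

```python
import math


def accumulate(rows, p):
    """One pass over rows: count, column sums, and the p x p cross-product matrix X'X."""
    n = 0
    col_sum = [0.0] * p
    cross = [[0.0] * p for _ in range(p)]
    for row in rows:
        n += 1
        for j in range(p):
            col_sum[j] += row[j]
            for k in range(p):
                cross[j][k] += row[j] * row[k]
    return n, col_sum, cross


def covariance(n, col_sum, cross):
    """Sample covariance from the sufficient statistics: (X'X - n * mean mean') / (n - 1)."""
    p = len(col_sum)
    mean = [s / n for s in col_sum]
    return [[(cross[j][k] - n * mean[j] * mean[k]) / (n - 1) for k in range(p)]
            for j in range(p)]


def correlation(cov):
    """Correlation matrix obtained by rescaling the covariance matrix."""
    p = len(cov)
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    return [[cov[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]
```

Principal components then follow from an eigendecomposition of the covariance or correlation matrix. The subtraction in `covariance` can lose precision when column means are large relative to the spread, which is one reason careful numerical treatment matters at this scale.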

2604.00671 2026-05-20 stat.CO

Implementation and Workflows for INLA-Based Approximate Bayesian Structural Equation Modelling

基于INLA的近似贝叶斯结构方程建模的实现与工作流

Haziq Jamil, Håvard Rue

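AI summary: This paper presents INLAvaan, an R package for fast, approximate Bayesian structural equation modelling built on the Integrated Nested Laplace Approximation (INLA) framework, and demonstrates its efficiency on complex models.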

AI中文摘要

贝叶斯结构方程建模（BSEM）提供了许多优势，如原理化的不确定性量化、小样本正则化和灵活的模型规范。然而，其依赖的马尔可夫链蒙特卡洛（MCMC）方法在仔细的心理测量实践中所需的迭代规范、批评和细化循环中计算上是不可行的。我们提出了INLAvaan，这是一个基于Jamil & Rue（2026，arXiv:2603.25690 [stat.ME]）开发的结构方程模型集成嵌套拉普拉斯近似（INLA）框架的R包，用于快速、近似的贝叶斯结构方程建模。本文作为配套论文，描述了该包的架构决策和计算策略。两个实质性应用——一个具有256个参数的双因子圆周模型和一个具有完整信息缺失数据处理的多层中介模型——展示了该方法在MCMC需要数小时运行时间和仔细收敛工作的规范中的高效性能。相比之下，INLAvaan在几秒钟内即可提供校准的后验总结。

英文摘要

Bayesian structural equation modelling (BSEM) offers many advantages such as principled uncertainty quantification, small-sample regularisation, and flexible model specification. However, the Markov chain Monte Carlo (MCMC) methods on which it relies are computationally prohibitive for the iterative cycle of specification, criticism, and refinement that careful psychometric practice demands. We present INLAvaan, an R package for fast, approximate Bayesian SEM built around the Integrated Nested Laplace Approximation (INLA) framework for structural equation models developed by Jamil & Rue (2026, arXiv:2603.25690 [stat.ME]). This paper serves as a companion manuscript that describes the architectural decisions and computational strategies underlying the package. Two substantive applications -- a 256-parameter bifactor circumplex model and a multilevel mediation model with full-information missing-data handling -- demonstrate the approach on specifications where MCMC would require hours of run time and careful convergence work. In contrast, INLAvaan delivers calibrated posterior summaries in seconds.

2603.25690 2026-05-20 stat.ME

Approximate Bayesian Inference for Structural Equation Models using Integrated Nested Laplace Approximations

使用集成嵌套拉普拉斯近似法对结构方程模型进行近似贝叶斯推断

Haziq Jamil, Håvard Rue

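AI summary: This paper proposes an approximate Bayesian approach to inference for structural equation models that draws on the integrated nested Laplace approximation (INLA) framework, combining a simplified Laplace approximation with a variational Bayes correction to make inference fast while retaining accuracy.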

AI中文摘要

马尔可夫链蒙特卡罗（MCMC）方法仍然是结构方程模型（SEM）贝叶斯估计的主要方法，尽管它们通常会产生较高的计算成本。我们提出了一种定制的近似贝叶斯方法用于SEM，借鉴了集成嵌套拉普拉斯近似法（INLA，Rue等人，2009，J. R. Stat. Soc. Series B Stat. Methodol.）框架的思想。我们实现了一种简化的拉普拉斯近似，能够高效地在每个参数方向上对后验密度进行轮廓化，并校正不对称性，从而实现参数化的偏斜正态估计。此外，我们应用变分贝叶斯修正来调整边际位置，从而更好地捕捉后验质量。基本量，包括因子得分和模型拟合指数，通过调整的高斯哥特采样方案获得。对于正常理论SEM，这种方法为基于采样的推断提供了一个高度准确的替代方案，实现了接近最大似然的速度，同时保留了完整贝叶斯推断的精度。

英文摘要

Markov chain Monte Carlo (MCMC) methods remain the mainstay of Bayesian estimation of structural equation models (SEM), though they often incur a high computational cost. We present a bespoke approximate Bayesian approach to SEM, drawing on ideas from the integrated nested Laplace approximation (INLA, Rue et al., 2009, J. R. Stat. Soc. Series B Stat. Methodol.) framework. We implement a simplified Laplace approximation that efficiently profiles the posterior density in each parameter direction while correcting for asymmetry, allowing for parametric skew-normal estimation of the marginals. Furthermore, we apply a variational Bayes correction to shift the marginal locations, thereby better capturing the posterior mass. Essential quantities, including factor scores and model-fit indices, are obtained via an adjusted Gaussian copula sampling scheme. For normal-theory SEM, this approach offers a highly accurate alternative to sampling-based inference, achieving near-'maximum likelihood' speeds while retaining the precision of full Bayesian inference.

2603.24400 2026-05-20 stat.ML cs.LG

Neural Network Models for Contextual Regression

用于上下文回归的神经网络模型

Seksan Kiatsupaibul, Pakawan Chansiripas

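AI summary: This paper proposes a neural network model for contextual regression, together with an algorithm to fit it. The model separates context identification, in which contextual features determine the active submodel, from context-specific regression, giving a structured and interpretable architecture with fewer parameters than a fully connected feed-forward network. The authors show mathematically that the architecture suffices to represent contextual linear regression models using only standard neural network components, and numerical experiments show lower excess mean squared error and more stable performance than feed-forward networks with comparable numbers of parameters.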

AI中文摘要

我们提出了一种用于上下文回归的神经网络模型，其中回归模型依赖于确定活跃子模型的上下文特征以及一个拟合模型的算法。所提出的简单上下文神经网络（SCtxtNN）将上下文识别与上下文特定回归分离，从而实现了一个结构化且可解释的架构，其参数数量少于全连接前馈网络。我们数学上证明所提出的架构仅使用标准神经网络组件即可表示上下文线性回归模型。提供的数值实验支持这一理论结果，显示所提模型在参数数量相当的情况下，比具有相同参数数量的前馈神经网络具有更低的超额均方误差和更稳定的性能，而更大的网络只能以增加复杂性为代价提高准确性。结果表明，引入上下文结构可以提高模型效率，同时保持可解释性。

英文摘要

We propose a neural network model for contextual regression in which the regression model depends on contextual features that determine the active submodel and an algorithm to fit the model. The proposed simple contextual neural network (SCtxtNN) separates context identification from context-specific regression, resulting in a structured and interpretable architecture with fewer parameters than a fully connected feed-forward network. We show mathematically that the proposed architecture is sufficient to represent contextual linear regression models using only standard neural network components. Numerical experiments are provided to support the theoretical result, showing that the proposed model achieves lower excess mean squared error and more stable performance than feed-forward neural networks with comparable numbers of parameters, while larger networks improve accuracy only at the cost of increased complexity. The results suggest that incorporating contextual structure can improve model efficiency while preserving interpretability.

2602.18718 2026-05-20 stat.ML cs.LG math.OC stat.CO

Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space

基于Bures-沃斯特斯坦空间到参数空间的随机梯度变分推断与Price梯度估计

Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell

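AI summary: This paper studies stochastic gradient-based variational inference when only the unnormalized log-density of the target distribution is available. Comparing Wasserstein VI (WVI) and black-box VI (BBVI), it attributes the stronger convergence guarantees of WVI to its use of Price's gradient estimator, and shows that both methods attain identical state-of-the-art iteration complexity guarantees.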

Comments: Accepted to ICML'26

AI中文摘要

对于仅给定目标分布无规范化的对数密度时，基于随机梯度的变分推断（VI）算法是一种流行的方法。例如，Wasserstein VI（WVI）和Black-Box VI（BBVI）分别在测度空间（Bures-Wasserstein空间）和参数空间上执行梯度下降。此前，对于高斯变分族，WVI的收敛性保证显示出优于使用重参数化梯度的Black-Box VI的结果，表明测度空间方法可能提供一些独特优势。然而，本文通过获得两者相同的最优迭代复杂度保证，填补了这一差距。特别是，我们发现WVI的优越性源于其使用的特定梯度估计器，BBVI也可以通过少量修改利用该估计器。所讨论的估计器通常与Price定理相关，并利用目标对数密度的二阶信息（Hessian）。我们将此称为Price梯度。另一方面，WVI可以通过使用重参数化梯度使其更广泛适用，这只需要对数密度的梯度。我们实验证明，使用Price梯度是性能提升的主要来源。

英文摘要

For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space) and parameter space, respectively. Previously, for the Gaussian variational family, convergence guarantees for WVI have shown superiority over existing results for black-box VI with the reparametrization gradient, suggesting the measure space approach might provide some unique benefits. In this work, however, we close this gap by obtaining identical state-of-the-art iteration complexity guarantees for both. In particular, we identify that WVI's superiority stems from the specific gradient estimator it uses, which BBVI can also leverage with minor modifications. The estimator in question is usually associated with Price's theorem and utilizes second-order information (Hessians) of the target log-density. We will refer to this as Price's gradient. On the flip side, WVI can be made more widely applicable by using the reparametrization gradient, which requires only gradients of the log-density. We empirically demonstrate that the use of Price's gradient is the major source of performance improvement.
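
For readers who have not met the identities behind the two estimators, the standard Gaussian results are stated below with a one-dimensional check. The notation ($m$, $\Sigma$, $f$) is chosen here for illustration and is not taken from the paper; the abstract does not spell out the exact estimator, so this shows only the textbook identities usually attached to the names.

```latex
% Assume x ~ N(m, \Sigma) and f twice differentiable with suitable growth.
\begin{align*}
  \nabla_m \, \mathbb{E}[f(x)] &= \mathbb{E}[\nabla f(x)]
    && \text{(Bonnet's theorem; first-order information only)} \\
  \nabla_\Sigma \, \mathbb{E}[f(x)] &= \tfrac{1}{2}\, \mathbb{E}[\nabla^2 f(x)]
    && \text{(Price's theorem; uses the Hessian)}
\end{align*}
% One-dimensional check with f(x) = x^4 and x ~ N(m, \sigma^2):
\begin{align*}
  \mathbb{E}[x^4] &= m^4 + 6 m^2 \sigma^2 + 3 \sigma^4, \\
  \frac{\partial}{\partial \sigma^2} \mathbb{E}[x^4] &= 6 m^2 + 6 \sigma^2
    = \tfrac{1}{2}\, \mathbb{E}[12 x^2] = \tfrac{1}{2}\, \mathbb{E}[f''(x)].
\end{align*}
```

The second identity is why an estimator built on it needs Hessians of the target log-density, whereas the reparametrization gradient needs only first derivatives.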

2602.09061 2026-05-20 stat.ME cs.IT math.IT math.ST stat.TH

Optimal information deletion and Bayes' theorem

最优信息删除与贝叶斯定理

Hans Montcho, Håvard Rue

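AI summary: This paper revisits Bayes' theorem from the perspective of information deletion, proposes an optimal information deletion rule, and proves that the rule agrees with the posterior obtained with the data left out.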

2601.15014 2026-05-20 stat.ML cs.LG math.ST stat.TH

Efficient and Minimax Optimal In-context Nonparametric Regression with Transformers

高效且最优的基于上下文的非参数回归变换器

Michelle Ching, Ioana Popescu, Nico Smith, Tianyi Ma, William G. Underwood, Richard J. Samworth

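AI summary: This paper studies in-context learning for nonparametric regression with α-Hölder smooth regression functions and proves that a pretrained transformer can attain the minimax optimal rate of convergence while using substantially fewer parameters and pretraining sequences than in the existing literature.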

2601.10252 2026-05-20 stat.ME

Asymptotic Theory of Tail Dependence Measures for Checkerboard Copula and the Validity of Multiplier Bootstrap

checkerboard copula尾部依赖度的渐近理论及乘法自助法的有效性

Mayukh Choudhury, Debraj Das, Sujit Ghosh

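AI summary: This paper develops asymptotic and bootstrap theory for checkerboard-based estimation of lower and upper tail copulas under unknown marginal distributions. The estimator is built by local bilinear (checkerboard) interpolation of the empirical copula and extended to the tail region to give nonparametric estimators of extremal dependence. By decomposing the error into a stochastic empirical process term and a deterministic approximation bias induced by the checkerboard projection, the authors establish almost sure uniform consistency of the checkerboard-smoothed copula estimator, which is strongly consistent under mild growth conditions on the grid size. They then derive weak convergence of the centered and scaled checkerboard copula process in $\ell^\infty([0,1]^2)$, showing that smoothing does not affect the first-order limit: the limiting Gaussian process coincides with that of the empirical copula, augmented by terms arising from marginal estimation. The results extend to the lower and upper tail copula processes, giving functional central limit theorems and asymptotic normality of the tail dependence coefficient. Because the limiting covariance depends on unknown tail features and partial derivatives, direct inference is infeasible, so the authors propose a direct multiplier bootstrap adapted to the checkerboard structure and prove conditional weak convergence of the bootstrap process to the same limit, which ensures valid inference for smooth functionals. Simulations and statistical applications, including goodness-of-fit testing and inference on tail dependence under a range of dependence structures, show accurate finite-sample performance.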

AI中文摘要

在本文中，我们开发了在未知边缘分布下基于checkerboard方法估计下尾和上尾copula的全面渐近和自助理论。估计器通过empirical copula的局部双线性（checkerboard）插值构建，并扩展到尾部区域以获得非参数极值依赖估计。我们首先通过将误差分解为随机经验过程项和由checkerboard投影引起的确定性近似偏差，证明了checkerboard平滑copula估计器的几乎处处一致性和强一致性。接下来，我们推导了中心化和缩放后的checkerboard copula过程在$\ell^\infty([0,1]^2)$下的弱收敛性，证明平滑不影响一阶极限。所得高斯过程与经验copula过程一致，但附加了边际估计带来的项。这些结果扩展到下尾和上尾copula过程，得到函数中心极限定理和尾部依赖系数的渐近正态性。由于极限协方差依赖于未知尾部特征和偏导数，直接推断不可行，因此提出适用于checkerboard结构的直接乘法自助法。我们证明自助过程条件弱收敛到相同极限，确保光滑函数的有效推断。最后，我们通过模拟和统计应用展示自助方法，包括拟合优度检验和在多种依赖结构下尾部依赖推断，展示准确的小样本性能。

英文摘要

In this paper, we develop a comprehensive asymptotic and bootstrap theory for checkerboard-based estimation of lower and upper tail copulas under unknown marginal distributions. The estimator is constructed via local bilinear (checkerboard) interpolation of the empirical copula and extended to the tail region to obtain nonparametric estimators of extremal dependence. We first establish almost sure uniform consistency of the checkerboard-smoothed copula estimator by decomposing the error into a stochastic empirical process term and a deterministic approximation bias induced by the checkerboard projection. Under mild growth conditions on the grid size, the estimator is shown to be strongly consistent. Next, we derive weak convergence of the centered and scaled checkerboard copula process in $\ell^\infty([0,1]^2)$, showing that the smoothing does not affect the first-order limit. The resulting Gaussian process coincides with that of the empirical copula, augmented by terms arising from marginal estimation. These results extend to the lower and upper tail copula processes, yielding functional central limit theorems and asymptotic normality of the tail dependence coefficient. Since the limiting covariance depends on unknown tail features and partial derivatives rendering direct inference infeasible, we propose a direct multiplier bootstrap adapted to the checkerboard structure. We prove conditional weak convergence of the bootstrap process to the same limit, ensuring valid inference for smooth functionals. Finally, we illustrate the bootstrap methodology through simulations and statistical applications, including goodness-of-fit testing and inference on tail dependence under a range of dependence structures, demonstrating accurate finite-sample performance.

2512.24139 2026-05-20 cs.LG stat.ME

Colorful Pinball: Density-Weighted Quantile Regression for Conditional Guarantee of Conformal Prediction

Colorful Pinball：基于密度加权分位数回归的条件保证置信预测

Qianyi Chen, Bo Li

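AI summary: This paper proposes a density-weighted quantile regression approach that improves the conditional coverage of standard conformal prediction procedures, supported by exact non-asymptotic guarantees.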

Comments: ICML 2026

AI中文摘要

尽管置信预测提供了稳健的边缘覆盖保证，但实现特定输入的可靠条件覆盖仍然具有挑战性。虽然有限样本下无法获得精确的分布无关条件覆盖，但近期研究集中在改进标准置信程序的条件覆盖性能上。与针对放宽条件覆盖概念的方法不同，我们直接针对条件覆盖的均方误差，通过优化支撑许多置信方法的分位数回归组件来改进。利用泰勒展开，我们推导出一种尖锐的替代目标函数：密度加权pinball损失，其中权重由非置信分数的条件密度在真实分位数处的值给出。我们提出了一种三头分位数网络，通过使用辅助分位数水平$1-α\pm δ$的有限差分估计这些权重，随后通过优化加权损失微调中心分位数。我们提供了具有精确非渐近保证的理论分析，刻画了由此产生的超额风险。在多样化的高维真实世界数据集上的广泛实验展示了在条件覆盖性能上的显著改进。

英文摘要

Although conformal prediction provides robust marginal coverage guarantees, achieving reliable conditional coverage for specific inputs remains challenging. While exact distribution-free conditional coverage is impossible with finite samples, recent work has focused on improving the conditional coverage of standard conformal procedures. Distinct from approaches that target relaxed notions of conditional coverage, we directly target the mean squared error of conditional coverage by refining the quantile regression components that underpin many conformal methods. Leveraging a Taylor expansion, we derive a sharp surrogate objective for quantile regression: a density-weighted pinball loss, where the weights are given by the conditional density of the nonconformity score evaluated at the true quantile. We propose a three-headed quantile network that estimates these weights via finite differences using auxiliary quantile levels at $1-\alpha \pm \delta$, subsequently fine-tuning the central quantile by optimizing the weighted loss. We provide a theoretical analysis with exact non-asymptotic guarantees characterizing the resulting excess risk. Extensive experiments on diverse high-dimensional real-world datasets demonstrate remarkable improvements in conditional coverage performance.
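
A minimal sketch of the two ingredients named above, the pinball loss and a finite-difference density weight, may help fix ideas. It assumes that predicted quantiles at levels tau - delta, tau, and tau + delta are already available for each example (with tau = 1 - α), and it estimates the density from the slope of the quantile function. The exact weighting and training procedure in the paper are not given in the abstract, so the formula and all names here are illustrative assumptions.

```python
def pinball_loss(y, q, tau):
    """Standard pinball (quantile) loss at level tau for target y and prediction q."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def finite_difference_density(q_lo, q_hi, delta):
    """Density estimate at the central quantile from quantiles at levels tau - delta and tau + delta.

    The quantile function has slope 1 / f at the quantile, so f is roughly
    2 * delta / (q_hi - q_lo). The floor on the gap is a hypothetical safeguard.
    """
    return 2.0 * delta / max(q_hi - q_lo, 1e-12)


def weighted_pinball(ys, q_mid, q_lo, q_hi, tau, delta):
    """Average pinball loss of the central quantile with per-example density weights."""
    total = 0.0
    for y, qm, ql, qh in zip(ys, q_mid, q_lo, q_hi):
        weight = finite_difference_density(ql, qh, delta)
        total += weight * pinball_loss(y, qm, tau)
    return total / len(ys)
```

Examples whose auxiliary quantiles sit close together get a large weight, because the score distribution is concentrated there and a small quantile error translates into a large coverage error.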

2512.03021 2026-05-20 stat.CO

Semiparametric Robust Estimation of Population Location

半参数鲁棒总体位置估计

Ananyabrata Barua, Ayanendranath Basu

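AI summary: This paper proposes a semiparametric method for robustly estimating population location that models the dominant component parametrically and the background nonparametrically. The method is computationally scalable and statistically robust; rather than downweighting outliers as in traditional robust statistics, it maximizes the observed likelihood so that the noisy background is absorbed by the nonparametric component.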

AI中文摘要

现实中的测量数据通常包含主导信号，其被噪声背景所污染。在实践中鲁棒地估计主导信号是一个基本的统计问题。传统上，混合模型被用来将异质人口划分为同质成分。用完全参数化模型建模此类数据在模型不正确时会引入偏差，而完全非参数化方法可能会损失统计功效和计算资源。我们提出一种中间方法：一种半参数方法，仅对主导成分进行参数化建模，而将背景完全非参数化，从而保持计算可扩展性和统计稳健性。因此，我们不采用传统稳健统计文献中使用的异常值降权，而是最大化观测似然，使噪声背景被非参数化成分吸收。在计算上，我们提出了一种新的近似FFT加速的似然最大化算法。经验上，这种FFT插值方法在速度上比传统的加权EM快一个数量级，同时保持统计准确性和大样本性质。

英文摘要

Real-world measurements often comprise a dominant signal contaminated by a noisy background. Robustly estimating the dominant signal in practice has been a fundamental statistical problem. Classically, mixture models have been used to cluster the heterogeneous population into homogeneous components. Modeling such data with fully parametric models risks bias under misspecification, while fully nonparametric approaches can dissipate power and computational resources. We propose a middle path: a semiparametric method that models only the dominant component parametrically and leaves the background completely nonparametric, yet remains computationally scalable and statistically robust. So instead of outlier downweighting, traditionally done in robust statistics literature, we maximize the observed likelihood such that the noisy background is absorbed by the nonparametric component. Computationally, we propose a new approximate FFT-accelerated likelihood maximization algorithm. Empirically, this FFT plug-in achieves order-of-magnitude speedups over vanilla weighted EM while preserving statistical accuracy and large sample properties.

2510.25632 2026-05-20 stat.ME

Automatic selection of hyper-parameters via the use of softened profile likelihood

通过使用软化轮廓似然实现超参数的自动选择

Gengyang Chen, Mu Zhu

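AI summary: This paper proposes an extension for automatically selecting hyper-parameters for high-dimensional data, which uses a softened profile likelihood to identify the elbow in a scree plot. The approach is demonstrated on the elastic net, support vector machines, and neural networks, and a small simulation study checks the robustness of its assumptions.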

2509.19707 2026-05-20 stat.ML cs.LG stat.CO stat.ME

Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies

扩散与流基copula：遗忘与记忆依赖

David Huk, Theodoros Damoulas

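AI summary: This paper proposes copula models based on the principles of diffusions and flows. Processes that progressively forget inter-variable dependencies are paired with models that learn to remember them, which captures multivariate dependence effectively, increases the representational power of copula models, and suits complex and high-dimensional data.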

Comments: Published as a conference paper at ICLR 2026

AI中文摘要

copulas是建模数据多变量依赖的基本工具，在众多领域和应用中被广泛采用。然而，现有模型在处理多模态和高维依赖时受到限制性假设和扩展性差的阻碍。在本文中，我们提出了基于扩散和流原理的copula建模方法。我们设计了两种过程，逐步遗忘变量间依赖，同时不影响维度分布，证明在所有时间都定义有效的copula。我们展示了如何通过学习从每个过程中记忆遗忘的依赖来获得copula模型，理论上在最优时恢复真实copula。我们的框架的第一种实例专注于直接密度估计，第二种则专注于高效采样。实验表明，我们的方法在建模科学数据集和图像中的复杂和高维依赖方面优于现有copula方法。我们的工作增强了copula模型的表示能力，推动了其在更广泛领域和更大规模应用中的采用。

英文摘要

Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.

2507.01932 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML

A first-order method for nonconvex-nonconcave minimax problems under a local Kurdyka-Lojasiewicz condition

非凸-非凹极小极大问题的一种一阶方法：在局部Kurdyka-Lojasiewicz条件下

Zhaosong Lu, Xiangyuan Wang

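AI summary: This paper studies a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. Compared with the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature, the local KL condition covers a broader range of practical scenarios but introduces new analytical challenges. The authors show that the associated maximal function is locally generalized Hölder smooth, use this property to develop an inexact proximal gradient method for the minimax problem, and establish complexity guarantees for computing an approximate stationary point under mild assumptions.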

Comments: Accepted by SIAM Journal on Optimization

AI中文摘要

我们研究了一类非凸-非凹极小极大问题，其中内部最大化问题满足一个可能随外部最小化变量变化的局部Kurdyka-Lojasiewicz（KL）条件。与文献中常见的全局KL或Polyak-Lojasiewicz（PL）条件相比，该局部KL条件能涵盖更广泛的实际场景，但同时也带来了新的分析挑战。特别是，随着优化算法向问题的 stationary 点推进，KL条件成立的区域可能缩小，导致更复杂且可能病态的景观。为解决这一挑战，我们证明了关联的最大函数是局部广义Hölder光滑的。利用这一关键性质，我们开发了一种近似近端梯度方法来求解极小极大问题，其中最大函数的近似梯度通过应用KL结构子问题的近端梯度方法计算。在温和假设下，我们建立了计算极小极大问题近似 stationary 点的复杂性保证。

英文摘要

We study a class of nonconvex-nonconcave minimax problems in which the inner maximization problem satisfies a local Kurdyka-Lojasiewicz (KL) condition that may vary with the outer minimization variable. In contrast to the global KL or Polyak-Lojasiewicz (PL) conditions commonly assumed in the literature -- which are significantly stronger and often too restrictive in practice -- this local KL condition accommodates a broader range of practical scenarios. However, it also introduces new analytical challenges. In particular, as an optimization algorithm progresses toward a stationary point of the problem, the region over which the KL condition holds may shrink, resulting in a more intricate and potentially ill-conditioned landscape. To address this challenge, we show that the associated maximal function is locally generalized Hölder smooth. Leveraging this key property, we develop an inexact proximal gradient method for solving the minimax problem, where the inexact gradient of the maximal function is computed by applying a proximal gradient method to a KL-structured subproblem. Under mild assumptions, we establish complexity guarantees for computing an approximate stationary point of the minimax problem.

2506.15897 2026-05-20 math.ST math.PR stat.TH

The exact region and an inequality between Chatterjee's and Spearman's rank correlations

Chatterjee's 和 Spearman's 排列相关系数之间的精确区域及不等式

Jonathan Ansari, Marcus Rockel

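AI summary: This paper studies the attainable region of pairs formed by Chatterjee's rank correlation ξ(X,Y) and Spearman's ρ(X,Y). The ξ-ρ region is a convex set whose boundary is characterized by a family of absolutely continuous, asymmetric copulas with a diagonal band structure. The authors prove that ξ(X,Y) ≤ |ρ(X,Y)| whenever Y is stochastically increasing or decreasing in X, and that the maximal difference ρ(X,Y) − ξ(X,Y) is exactly 0.4.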

DOI: 10.1016/j.jmva.2026.105630
Journal ref: Journal of Multivariate Analysis 214 (2026) 105630
Comments: 24 pages, 5 figures

AI中文摘要

最近由Sourav Chatterjee建立的排列相关系数ξ(X,Y)取值在[0,1]区间内，其中0表示X和Y的独立性，1表示Y完全依赖于X。不同于如Spearman的ρ这样的共识度量，ξ量化的是函数依赖性的强度。本文研究了可取的(ξ(X,Y),ρ(X,Y))对的集合。所得到的ξ-ρ区域是一个凸集，其边界由一种新的绝对连续、非对称Copula家族所刻画。此外，我们证明当Y在X上单调时，ξ(X,Y) ≤ |ρ(X,Y)|，并且最大差ρ(X,Y)-ξ(X,Y)恰好为0.4。我们的证明依赖于在各种等式和不等式约束下的凸优化问题，以及ξ和ρ的排序性质。我们的结果有助于更好地理解Chatterjee的排列相关系数，该系数在量化正相关时通常比Spearman的ρ给出更小的值。特别是，当将Chatterjee的排列相关系数的值在ρ的尺度上解释时，√ξ似乎更为合适。

英文摘要

The rank correlation ξ(X,Y), recently established by Sourav Chatterjee and already popular in the statistics literature, takes values in [0,1], where 0 characterizes independence of X and Y, and 1 characterizes perfect dependence of Y on X. Unlike concordance measures such as Spearman's ρ, which capture the degree of positive or negative dependence, ξ quantifies the strength of functional dependence. In this paper, we study the attainable set of pairs (ξ(X,Y), ρ(X,Y)). The resulting ξ-ρ region is a convex set whose boundary is characterized by a novel family of absolutely continuous, asymmetric copulas having a diagonal band structure. Moreover, we prove that ξ(X,Y) ≤ |ρ(X,Y)| whenever Y is stochastically increasing or decreasing in X, and we identify the maximal difference ρ(X,Y) − ξ(X,Y) as exactly 0.4. Our proofs rely on a convex optimization problem under various equality and inequality constraints, as well as on ordering properties for ξ and ρ. Our results contribute to a better understanding of Chatterjee's rank correlation, which typically yields substantially smaller values than Spearman's ρ when quantifying positive dependencies. In particular, when interpreting the values of Chatterjee's rank correlation on the scale of ρ, the quantity √ξ appears to be more appropriate.
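
The two coefficients are easy to compute on a sample, which makes the contrast between them tangible. The sketch below implements the usual no-ties sample formulas for Spearman's ρ and Chatterjee's ξ in plain Python. These are sample statistics, whereas the results in the abstract concern population quantities, so the numbers are only indicative. The helper names are hypothetical and ties are assumed absent.

```python
def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def chatterjee_xi(x, y):
    """Chatterjee's xi_n(X, Y) via the no-ties formula 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1)."""
    n = len(x)
    ry = ranks(y)
    # Reorder the ranks of Y by increasing X, then sum the absolute jumps.
    by_x = [ry[i] for i in sorted(range(n), key=lambda i: x[i])]
    jumps = sum(abs(by_x[i + 1] - by_x[i]) for i in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)
```

For a strictly increasing relationship on n points, ρ equals 1 while the sample ξ equals 1 − 3/(n + 1), so ξ sits below |ρ| as in the monotone case of the abstract. For a U-shaped relationship, ξ can exceed |ρ|, which shows why the inequality is stated only under stochastic monotonicity.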

2506.05855 2026-05-20 math.OC stat.ML

Optimized projection-free algorithms for online learning: construction and worst-case analysis

在线学习中优化的无投影算法：构造与最坏情况分析

Julien Weibel, Pierre Gaillard, Wouter M. Koolen, Adrien Taylor

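AI summary: This paper studies and develops projection-free online learning algorithms that handle the constraint set through linear optimization oracles (a.k.a. Frank-Wolfe). It proposes an improved variant of an online Frank-Wolfe algorithm and shows how semidefinite programming can be used to jointly design and analyze online Frank-Wolfe-type algorithms in a variety of settings. The numerical evidence suggests that, without additional assumptions, no pure online Frank-Wolfe algorithm in the model class can have a worst-case regret guarantee better than $O(T^{3/4})$, where $T$ is the time horizon; that current algorithms do not have optimal constants; that the algorithm enjoys a similar anytime $O(t^{3/4})$ property without knowing $T$ in advance; and that multiple linear optimization rounds do not generally yield better regret guarantees.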

Journal ref: AISTATS 2026 - 29th International Conference on Artificial Intelligence and Statistics, May 2026, Tanger, Morocco

AI中文摘要

本文研究并开发了使用线性优化 oracle（即 Frank-Wolfe）处理约束集的无投影在线学习算法。更具体地说，本文（i）提供了改进的在线 Frank-Wolfe 算法变体及其概念上简单的势能证明，（ii）展示了如何利用半定规划在多种设置中联合设计和分析在线 Frank-Wolfe 类型算法——包括变体（i）的设计。基于半定规划技术，我们得出强有力的数值证据，表明在我们的模型类别中，没有任何纯在线 Frank-Wolfe 算法可以在不增加额外假设的情况下具有优于 O(T^3/4)（T 是时间跨度）的 regret 保证，当前算法的常数不最优，该算法具有无需提前知道 T 的 anytime 性质 O(t^3/4)，且多个线性优化轮次通常无助于获得更好的 regret 保证。

英文摘要

This work studies and develops projection-free algorithms for online learning with linear optimization oracles (a.k.a. Frank-Wolfe) for handling the constraint set. More precisely, this work (i) provides an improved (optimized) variant of an online Frank-Wolfe algorithm along with its conceptually simple potential-based proof, and (ii) shows how to leverage semidefinite programming to jointly design and analyze online Frank-Wolfe-type algorithms numerically in a variety of settings, including the design of the variant (i). Based on the semidefinite technique, we conclude with strong numerical evidence suggesting that no pure online Frank-Wolfe algorithm within our model class can have a regret guarantee better than $O(T^{3/4})$ (where $T$ is the time horizon) without additional assumptions, that the current algorithms do not have optimal constants, that the algorithm benefits from similar anytime properties $O(t^{3/4})$ without requiring $T$ to be known in advance, and that multiple linear optimization rounds do not generally help to obtain better regret bounds.

2409.02311 2026-05-20 econ.EM stat.ME

A simple distributional difference-in-differences estimator for univariate and bivariate outcomes

一种用于单变量和双变量结果的简单分布差异-差异估计器

Iván Fernández-Val, Jonas Meier, Aico van Vuuren, Francis Vella

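AI summary: This paper proposes a simple distribution regression estimator for treatment effects in difference-in-differences designs, which is particularly useful when the treatment effect differs across the distribution of the outcome variable. The estimator easily incorporates covariates and extends to settings where the treatment may affect the joint distribution of multiple outcomes. The key identifying restriction is that the untreated outcome distribution has no interaction effect of group and time, which yields a parallel trend assumption on a transformation of the distribution. The paper relates the procedure and its assumptions to the changes-in-changes approach of Athey and Imbens (2006) and reexamines the Card and Krueger (1994) study of the effect of minimum wages on employment to illustrate the method.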

Comments: 43 pages, 3 figures, 4 tables; new section on asymptotic theory with respect to previous version

AI中文摘要

我们提供了一种简单的分布回归估计器，用于差异-差异（DiD）设计中的处理效应估计。我们的方法特别适用于处理效应在结果变量分布上存在差异的情况。所提出的估计器易于纳入协变量，并且重要的是，可以扩展到处理可能影响多个结果联合分布的设置。我们的关键识别限制是未处理结果分布中没有组别和时间的交互效应。这一假设导致了对分布变换的平行趋势假设。我们强调了我们的方法与Athey和Imbens（2006）的改变-改变方法之间的关系。我们还重新审视了Card和Krueger（1994）关于最低工资对就业影响的研究，以展示我们方法的实用性。

英文摘要

We provide a simple distribution regression estimator for treatment effects in the difference-in-differences (DiD) design. Our procedure is particularly useful when the treatment effect differs across the distribution of the outcome variable. Our proposed estimator easily incorporates covariates and, importantly, can be extended to settings where the treatment potentially affects the joint distribution of multiple outcomes. Our key identifying restriction is that the untreated outcome distribution does not exhibit an interaction effect of group and time. This assumption results in a parallel trend assumption on a transformation of the distribution. We relate our procedure and its assumptions to the changes-in-changes approach of Athey and Imbens (2006). We also reexamine the Card and Krueger (1994) study of the impact of minimum wages on employment to illustrate the utility of our approach.
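
One way to read "a parallel trend assumption on a transformation of the distribution" is sketched below for a single outcome with no covariates. The untreated distribution of the treated group after treatment is imputed by adding, on a link scale, the change observed in the control group. The logit link, the use of raw empirical CDFs, and every function name are assumptions made for illustration; the abstract does not specify the link, and the estimator in the paper is a distribution regression that also handles covariates.

```python
import math


def ecdf(sample, y):
    """Empirical CDF of the sample at threshold y."""
    return sum(1 for v in sample if v <= y) / len(sample)


def logit(p, eps=1e-6):
    """Logit link, clipped away from 0 and 1 so that it stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def expit(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual_cdf(treated_pre, control_pre, control_post, y):
    """Untreated CDF of the treated group after treatment, under parallel trends on the logit scale."""
    z = (logit(ecdf(treated_pre, y))
         + logit(ecdf(control_post, y))
         - logit(ecdf(control_pre, y)))
    return expit(z)


def distributional_effect(treated_post, treated_pre, control_pre, control_post, y):
    """Observed minus counterfactual CDF of the treated group at threshold y."""
    return ecdf(treated_post, y) - counterfactual_cdf(treated_pre, control_pre, control_post, y)
```

Evaluating `distributional_effect` over a grid of thresholds y traces out how the treatment shifts the whole outcome distribution rather than only its mean.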

2403.20200 2026-05-20 math.ST math.PR stat.ME stat.ML stat.TH

High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile

具有方差轮廓的非同分布数据的高维岭回归分析

Jérémie Bigot, Issa-Mbenard Dabo, Camille Male

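AI summary: This paper studies high-dimensional ridge regression for independent but non-identically distributed data, modelling the predictors as a random matrix with a variance profile. It analyzes the predictive risk and the degrees of freedom of the ridge estimator, shows that the double descent phenomenon emerges for the minimum norm least-squares estimator under certain variance profiles, and exhibits variance profiles for which the shape of the predictive risk differs from double descent.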

AI中文摘要

在独立且同分布数据背景下，高维线性回归已得到充分研究。本文旨在研究独立但非同分布数据下的高维回归模型。为此，我们假设观测预测变量（或特征）的集合是一个具有方差轮廓的随机矩阵，其维度以比例速率增长。假设随机效应模型，我们研究具有此类方差轮廓的线性回归中岭估计器的预测风险。在此设定下，我们提供了该风险及岭估计器自由度的确定性等价量。对于某些方差轮廓类别，我们的工作揭示了在岭正则化参数趋于零时，最小范数最小二乘估计器在高维回归中出现已知的双下降现象。我们还展示了预测风险形状不同于双下降的方差轮廓。我们的结果证明基于随机矩阵理论工具，在存在方差轮廓的情况下，这些工具尚未被用于研究回归模型。数值实验展示了上述确定性等价量在计算岭回归预测风险准确性方面的有效性。我们还探讨了与标准独立同分布数据设定下的相似性和差异性。

英文摘要

High-dimensional linear regression has been thoroughly studied in the context of independent and identically distributed data. We propose to investigate high-dimensional regression models for independent but non-identically distributed data. To this end, we suppose that the set of observed predictors (or features) is a random matrix with a variance profile and with dimensions growing at a proportional rate. Assuming a random effect model, we study the predictive risk of the ridge estimator for linear regression with such a variance profile. In this setting, we provide deterministic equivalents of this risk and of the degrees of freedom of the ridge estimator. For certain classes of variance profiles, our work highlights the emergence of the well-known double descent phenomenon in high-dimensional regression for the minimum norm least-squares estimator when the ridge regularization parameter goes to zero. We also exhibit variance profiles for which the shape of this predictive risk differs from double descent. The proofs of our results are based on tools from random matrix theory in the presence of a variance profile that have not been considered so far to study regression models. Numerical experiments are provided to show the accuracy of the aforementioned deterministic equivalents on the computation of the predictive risk of ridge regression. We also investigate the similarities and differences that exist with the standard setting of independent and identically distributed data.
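
For reference, the objects named in the abstract can be written down in standard notation. The symbols ($X$, $Y$, $\lambda$, $s_j$) and the normalization without a factor of $n$ are assumptions made here; the paper may scale the penalty differently.

```latex
% Ridge estimator with penalty \lambda > 0, and its degrees of freedom
% (s_j denote the singular values of the n x p design matrix X):
\begin{align*}
  \hat{\beta}_\lambda &= \left(X^\top X + \lambda I_p\right)^{-1} X^\top Y, \\
  \mathrm{df}(\lambda) &= \operatorname{tr}\!\left( X \left(X^\top X + \lambda I_p\right)^{-1} X^\top \right)
    = \sum_{j} \frac{s_j^2}{s_j^2 + \lambda}.
\end{align*}
% As the penalty vanishes, the ridge estimator tends to the minimum norm
% least-squares estimator, where X^{+} is the Moore-Penrose pseudoinverse:
\begin{equation*}
  \lim_{\lambda \to 0^+} \hat{\beta}_\lambda = X^{+} Y.
\end{equation*}
```

A variance profile, in the usual random matrix sense, means that the entries of $X$ are independent with variances that may differ from entry to entry, in contrast to the i.i.d. setting where all entries share one variance.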

2605.19758 2026-05-20 cs.AI cs.DB stat.ML

CogScale: Scalable Benchmark for Sequence Processing

CogScale: 用于序列处理的可扩展基准

Yannis Bendi-Ouis, Romain de Coudenhove, Xavier Hinaut

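AI summary: This paper proposes CogScale, a benchmark of 14 scalable synthetic tasks for evaluating the cognitive and memory abilities of different architectures at different parameter scales, providing a standardized, lightweight framework for validating architectural innovations quickly.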

AI中文摘要

维持和操纵信息随时间变化的能力是生物和人工智能的基本特征。尽管现代模型在自然语言处理等任务上取得了显著成功，但评估新型架构处理序列信息的能力仍计算成本高且耗时。测试新架构通常需要扩展到大规模数据集和模型，导致巨大的计算成本和缓慢的迭代周期。在本文中，我们提出了CogScale，一个包含14个可扩展合成任务的基准，旨在隔离和评估不同参数规模下的特定认知和记忆能力。通过提供标准化的轻量框架，CogScale允许研究者在投入大规模训练之前快速验证架构创新。为了建立坚实的基础，我们评估了七种不同的架构：门控循环单元（GRU）、长短期记忆（LSTM）、xLSTM、回声状态网络（ESN）、Mamba、Transformer解码器和Transformer编码器-解码器。这些评估在严格的参数预算（1k、10k和100k）和不同的难度级别和规模下进行。我们的结果表明，尽管经典RNN和回声状态网络在严格参数预算内表现出色，只有注意力机制和现代状态空间模型在推理复杂性和任务难度增加时仍能保持高性能。

英文摘要

The ability to maintain and manipulate information over time is a fundamental aspect of living beings and Artificial Intelligence. While modern models have achieved remarkable success in tasks like natural language processing, evaluating the capacity of novel architectures to process sequential information remains computationally expensive and time-consuming. Testing a new architecture often requires scaling up to massive datasets and models, leading to vast computational costs and slow iteration cycles. In this paper, we propose CogScale, a benchmark of 14 scalable synthetic tasks designed to isolate and evaluate specific cognitive and memory abilities at different parametrizable scales. By providing a standardized, lightweight framework, CogScale allows researchers to rapidly validate architectural innovations before committing to large-scale training. To establish a solid baseline, we evaluate seven distinct architectures: Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), xLSTM, Echo State Network (ESN), Mamba, Transformer Decoder, and Transformer Encoder-Decoder. These evaluations are conducted under strict parameter budgets (1k, 10k, and 100k) and across different difficulty levels and scales. Our results show that while classical RNNs and Echo State Networks excel at basic retention within strict parameter budgets, only attention mechanisms and modern state-space models consistently maintain high performance as reasoning complexity and task difficulty scale.


2605.19745 2026-05-20 stat.OT

Making Uncertainty Visible: Multiverse Analysis for Robust Computational Social Science

Maximilian Linde, Jun Sun, Paul Balluff, Danica Radovanović, Chung-hong Chan

AI summary: Through case studies, this paper shows how multiverse analysis can strengthen the robustness and transparency of computational social science research against alternative methodological decisions. Analyzing three social science studies that use Bayesian analysis, network generative modeling, and machine learning (with or without large language models), it reveals how combinations of methodological decisions affect the empirical findings, and points out that showing which combinations lead to computational failure is a further motivation for multiverse analysis.

Abstract

Through case studies, we demonstrate how multiverse analysis can strengthen the robustness and transparency of computational social science findings against alternative methodological decisions. We conduct multiverse analyses of three published social science studies that use the following computational methods: Bayesian analysis, network generative modeling, and machine learning with or without large language models. These methods are applied frequently in computational social science studies, yet entail a greater degree of arbitrariness in terms of methodological choices, or "researcher degrees of freedom." Our multiverse analyses reveal how the empirical findings in these studies vary as a function of various plausible decision combinations. Our three case studies also expose an often-ignored motivation for conducting multiverse analysis: Showing which methodological combinations lead to computational failure. These failed cases are usually not communicated in the published reports, even though these sophisticated computational methods have a much higher likelihood of failure. We end our paper with suggestions on how to find defensible decision combinations for multiverse analysis of computational social science studies and how to communicate multiverse analysis findings fairly.


2605.19693 2026-05-20 stat.ME

Causal treatment effect decompositions with time-to-event outcomes under competing events

Mikko Valtanen, Tommi Härkänen, Jenni Lehtisalo, Tiia Ngandu, Miia Kivipelto, Kari Auranen

AI summary: This paper proposes a decomposition for analyzing treatment effects on time-to-event outcomes that accounts for the different mechanisms involving competing events, and demonstrates the four-way decomposition on datasets from two randomized trials.

Comments: 18 pages, 3 figures

Abstract

Inference about treatment effects for time-to-event outcomes is often obscured by the presence of competing events. A particularly complex situation arises when the treatment influences the occurrence of the competing event. A comprehensive assessment should then account for different mechanisms by which the treatment and the competing event together produce the apparent treatment effect. Here, we propose a decomposition of the treatment's effect on the event of interest (target), characterising how it arises due to four distinct mechanisms involving both the target and competing events. Based on a causal model, the decomposition relies on cross-world estimands reflecting counterfactual scenarios in which the treatment affects the two events as if set to conflicting levels. We specify exchangeability and consistency assumptions under which the decomposition can be estimated from observed data. We discuss how the new decomposition reveals the role of the competing event and serves as a basis for defining causally interpretable estimands in the presence of competing events. Finally, we demonstrate the use of the four-way decomposition with datasets from two randomised trials.


2605.19685 2026-05-20 stat.ML cs.LG

Probabilistic Multivariate Time Series Forecasting with Diffusion Copulas

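David Huk, Dongshan Wang, Miha Bresar

AI summary: This paper proposes a diffusion-copula framework that separates learning of the marginal distributions from learning of the dependence structure, improving tail-risk estimation in multivariate time series forecasting and showing an advantage in predicting systemic extremes in cryptocurrency markets.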

2605.19648 2026-05-20 math.ST stat.TH

Influence as soft sparsity: Estimation of monotone functions on $\{0,1\}^d$

Gérard Biau

AI summary: This study addresses the estimation of monotone functions on the Boolean hypercube from noisy observations, using the total L1 influence to measure the complexity of the target function and establishing corresponding minimax bounds.

Abstract

We study the problem of estimating a monotone function $f:\{0,1\}^d\to[0,1]$ from noisy observations at uniformly random vertices of the Boolean hypercube. As a measure of complexity for the target $f$, we use the total $L^1$-influence $I(f)=\sum_{i=1}^d\bigl(\mathbb{E}[f(X)\mid X_i=1]-\mathbb{E}[f(X)\mid X_i=0]\bigr)$, a classical quantity in Boolean analysis that is nonnegative for monotone functions and controls the effective dimensionality of the estimation problem: through a spectral concentration result in the spirit of Friedgut's junta theorem, the Fourier spectrum of any $f$ with $I(f)\leqslant K$ concentrates on low-degree subsets of the influential coordinates. We establish minimax bounds over the class $\mathcal{F}_K=\{f:\{0,1\}^d\to[0,1],\;f\text{ monotone},\; I(f)\leqslant K\}$: \[ c\,\frac{K^2}{(\log n)^{3/2}} \;\leqslant\; \inf_{\hat f}\;\sup_{f\in\mathcal{F}_K}\; \mathbb{E}\bigl[\|\hat f - f\|_2^2\bigr] \;\leqslant\; C\,\frac{K}{\sqrt{\log n}}, \] where $n$ is the sample size. The upper bound holds for all $K\geqslant 1$ and is uniform in the ambient dimension $d$ (under the mild condition $\log d\leqslant n^{1-\varepsilon}$). It is achieved by a Fourier thresholding estimator that adapts to the unknown $K$. The lower bound relies on a Varshamov--Gilbert packing on the middle layer of the hypercube combined with Fano's inequality.
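To make the complexity measure concrete, the total influence defined in this abstract can be computed by brute force for small $d$. The sketch below enumerates the whole hypercube, so it is only meant for tiny examples; the function names and the two test functions (`dictator`, `majority3`) are illustrative choices, not objects from the paper.

```python
from itertools import product

def total_influence(f, d):
    """I(f) = sum_i (E[f(X) | X_i = 1] - E[f(X) | X_i = 0]), X uniform on {0,1}^d."""
    points = list(product((0, 1), repeat=d))
    total = 0.0
    for i in range(d):
        ones = [f(x) for x in points if x[i] == 1]
        zeros = [f(x) for x in points if x[i] == 0]
        total += sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return total

def dictator(x):
    """Monotone function that depends on the first coordinate only."""
    return float(x[0])

def majority3(x):
    """Monotone majority vote over three bits."""
    return 1.0 if sum(x) >= 2 else 0.0

print(total_influence(dictator, 3))   # 1.0
print(total_influence(majority3, 3))  # 1.5
```

The dictator function has all of its influence on one coordinate, while majority spreads influence 1/2 over each of its three coordinates.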


2605.19641 2026-05-20 stat.ML cs.LG

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

Ferdinand Genans, Erwan Scornet

AI summary: This paper studies how increasing missingness can reduce gradient bias. It proposes Richardson-SGD, a method based on Richardson extrapolation that deliberately raises the missingness level of already incomplete data in order to cancel the gradient bias, improving optimization and estimation with incomplete data.

Abstract

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this work, we prove that all parametric models exhibit similar gradient bias for various imputation procedures and characterize exactly the dependence on the missingness ratio vector $p$, with $O(\|p\|)$ as the leading term. We exploit this analysis to propose a simple debiasing procedure for stochastic gradient descent (SGD) with missing values based on Richardson extrapolation, which leverages the exact expression of the gradient bias. The key idea is to \emph{deliberately add missingness}: from an already incomplete observation, we generate a further-thinned version at a higher, controlled missingness level, and combine the two resulting stochastic gradients to cancel the leading bias term. We prove that one Richardson step reduces the gradient bias from $O(\|p\|)$ to $O(\|p\|^2)$ under several missingness scenarios. Our proposed method is computationally efficient, model-agnostic and applies to any parametric loss whose stochastic gradient can be computed after imputation. Furthermore, when missing indicators are independent, the population gradient bias is a multilinear polynomial in $p$ and depends only on population gradient errors induced by declaring a single coordinate missing. In this case, our method generalizes to a multi-step Richardson procedure which recursively cancels higher-order terms. Empirically, Richardson debiasing improves optimization and estimation across several generalized linear models and combines positively with widely used imputation procedures such as MICE. These results suggest that, somewhat counter-intuitively, adding controlled missingness on top of existing missing data can make stochastic learning from incomplete data more accurate.
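The cancellation idea can be seen in a toy setting. The sketch below is not the paper's algorithm: it uses the zero-imputed sample mean as a stand-in for a stochastic gradient, assumes values missing completely at random at level `p`, and thins the observed entries so that the overall missingness level becomes `2p`. In this toy case the bias is exactly linear in `p`, so the combination `2 * g(p) - g(2p)` removes it entirely; in general the abstract only claims a reduction from $O(\|p\|)$ to $O(\|p\|^2)$. All names and constants are illustrative assumptions.

```python
import numpy as np

def richardson_combine(g_p, g_2p):
    """Cancel a bias term that is linear in the missingness level p."""
    return 2.0 * g_p - g_2p

rng = np.random.default_rng(0)
mu, p, n = 3.0, 0.1, 200_000

x = mu + rng.standard_normal(n)
observed = rng.random(n) >= p  # MCAR mask at level p

# Drop observed entries with this extra probability so that the overall
# missingness level becomes 2p: 1 - (1 - p) * (1 - extra_drop) = 2p.
extra_drop = (2 * p - p) / (1 - p)
observed_thin = observed & (rng.random(n) >= extra_drop)

g_p = np.where(observed, x, 0.0).mean()        # zero imputation, bias about -p * mu
g_2p = np.where(observed_thin, x, 0.0).mean()  # bias about -2p * mu
debiased = richardson_combine(g_p, g_2p)       # close to mu
```

The thinned copy is built from the already incomplete sample, which mirrors the "deliberately add missingness" step described in the abstract.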


2605.19629 2026-05-20 stat.ML cs.LG math.OC

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

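Ilya Levin, Maksim Shuklin, Eric Moulines, Paul Mangold, Sergey Samsonov

AI summary: This paper establishes Berry-Esseen-type bounds for federated linear stochastic approximation, giving the first federated Gaussian approximation that explicitly captures the communication-computation trade-off and a heterogeneity error term, and quantifying how the local step size, the number of local updates, and heterogeneity affect the convergence rate.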

2605.19619 2026-05-20 cs.LG cs.AI math.OC stat.ML

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

Feihu Huang, Yuning Luo, Songcan Chen

AI summary: This paper studies the generalization error of the Muon optimizer and proposes an improved mixed Muon optimizer, MiMuon, proving that it has a lower generalization error while keeping the same convergence rate as Muon.

Comments: 25 pages

Abstract

Matrix-structured parameters appear frequently in many artificial intelligence models, such as large language models. Recently, the efficient Muon optimizer was designed for the matrix parameters of large-scale models and shows markedly faster convergence than vector-wise algorithms. Although some works have begun to study the convergence properties (i.e., optimization error) of the Muon optimizer, its generalization properties (i.e., generalization error) have not yet been established. Thus, in this paper, we study the generalization error of the Muon optimizer based on algorithmic stability and mathematical induction, and prove that Muon has a generalization error of $O\big(\frac{1}{N\kappa^{T}}\big)$, where $N$ is the training sample size, $T$ denotes the number of iterations, and $\kappa>0$ denotes the minimum difference between singular values of the gradient estimate. To enhance the generalization of Muon, we propose an effective mixed Muon (MiMuon) optimizer that uses orthogonalization of the gradient cautiously and is a hybrid of the Muon and momentum-based SGD optimizers. We then prove that our MiMuon optimizer has a generalization error of $O\big(\frac{1}{N}\big)$, lower than the $O\big(\frac{1}{N\kappa^{T}}\big)$ of the Muon optimizer, since $\kappa$ is generally very small. Meanwhile, we also study the convergence properties of our MiMuon algorithm and prove that it has the same convergence rate of $O(\frac{1}{T^{1/4}})$ as the Muon algorithm. Numerical experiments on training large models, including Qwen3-0.6B and YOLO26m, demonstrate the efficiency of the MiMuon optimizer.
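The gradient orthogonalization that this abstract refers to can be sketched as replacing a matrix by its polar factor, that is, setting all of its singular values to one. The version below is a minimal illustration that assumes an exact SVD; practical Muon implementations are usually described as using an iterative approximation instead. The step function, learning rate and momentum constant are hypothetical, and the sketch does not reproduce the MiMuon mixing rule, which the abstract does not specify.

```python
import numpy as np

def orthogonalize(m):
    """Return U @ V^T from the thin SVD m = U diag(s) V^T (singular values set to 1)."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def muon_like_step(w, grad, momentum, lr=0.02, beta=0.9):
    """One momentum step whose direction is the orthogonalized momentum buffer."""
    momentum = beta * momentum + grad
    return w - lr * orthogonalize(momentum), momentum

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3))
grad = rng.standard_normal((4, 3))
w_next, buf = muon_like_step(w, grad, np.zeros_like(w))
```

Because the update direction has all singular values equal to one, its scale no longer depends on the gradient magnitude, which is what distinguishes it from a plain momentum SGD step.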


2605.19618 2026-05-20 cs.LG stat.ME

A Family of Divergence Measures for Evaluating the Reconstruction Quality of Explainable Ensemble Trees

Massimo Aria, Agostino Gnasso, Carmela Iorio

AI summary: This paper proposes a divergence-based framework of measures for evaluating the reconstruction quality of explainable ensemble trees; by distinguishing agreement from association, it provides a new diagnostic for identifying the specific causes of reconstruction failure.

Abstract

Validating interpretable surrogate models for ensemble learners requires measuring agreement between the ensemble's internal representation and its surrogate approximation, rather than mere association. Correlation-based approaches are scale-invariant and fail to detect systematic discrepancies in co-occurrence structure. We propose a statistical framework grounded in the agreement-association distinction, centered on the normalized Loss of Interpretability (nLoI). Rooted in the Cressie-Read power divergence family with lambda equal to 2, the nLoI admits a closed-form decomposition into within-node and between-node components, providing a unique diagnostic capability to identify precisely where and why reconstruction fails. The framework incorporates four complementary measures capturing distinct structural facets of approximation quality. A unified permutation testing procedure delivers valid inference for all measures within a single resampling pass. Theoretical properties, including boundedness and symmetry, are established for each metric. Monte Carlo simulations and empirical evaluations confirm exact Type I error control and demonstrate that these measures detect reconstruction fidelity gradients invisible to correlation-based alternatives. The framework is developed and illustrated in the context of Explainable Ensemble Trees (E2Tree), and empirical evaluation on three benchmark datasets illustrates the practical utility of the framework.
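For reference, the Cressie-Read power divergence family mentioned in this abstract has a simple closed form for observed and expected counts. The sketch below implements the generic statistic only; how the nLoI is normalized and decomposed into within-node and between-node parts is not given in the abstract and is not attempted here. The example counts are made up.

```python
import numpy as np

def cressie_read(observed, expected, lam):
    """Power-divergence statistic 2 / (lam * (lam + 1)) * sum O * ((O / E)**lam - 1).

    Valid for lam not in {0, -1}; lam = 1 gives Pearson's chi-square
    when the observed and expected totals agree.
    """
    o = np.asarray(observed, dtype=float)
    e = np.asarray(expected, dtype=float)
    return 2.0 / (lam * (lam + 1.0)) * np.sum(o * ((o / e) ** lam - 1.0))

obs = [30, 50, 20]
exp = [25, 50, 25]
stat_lambda_2 = cressie_read(obs, exp, lam=2)
stat_pearson = cressie_read(obs, exp, lam=1)
```

Larger values of `lam` put more weight on cells where the observed count exceeds the expected one, which is one reason different members of the family can rank discrepancies differently.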


2605.19610 2026-05-20 stat.ML cs.LG

Posterior Contraction of Lévy Adaptive B-spline Regression in Besov Spaces

Jeunghun Oh, Sewon Park, Jaeyong Lee

AI summary: This paper studies the posterior contraction properties of the Lévy Adaptive B-spline regression model in Besov spaces, proving that in the nonparametric regression framework the posterior contracts around the true function at a nearly optimal rate while adapting automatically to unknown smoothness.

Abstract

We investigate the asymptotic properties of the Lévy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the Lévy Adaptive Regression Kernel (LARK) model. LABS applies splines of varying degrees with independently defined knots, yielding a flexible model class capable of adapting to irregular and locally structured features of the true function. Within the nonparametric regression framework with univariate random design and Gaussian errors, we establish that the LABS posterior contracts around the true function in Besov classes at nearly minimax-optimal rates, up to a logarithmic factor, while adapting automatically to unknown smoothness. This study contributes to filling a gap in the literature, where theoretical results on posterior contraction of the LARK model in Besov spaces remain scarce. Simulation experiments on standard test functions in Besov spaces, including Blocks, Bumps, HeaviSine, and Doppler, complement the theoretical results and demonstrate the practical utility of LABS.


2605.19591 2026-05-20 stat.ME

Uncertainty-Aware Ideal Point Estimation via Variational EM

Kwangok Seo, Youngjo Lee, Jong Hee Park, Xinlei Wang, Johan Lim

AI summary: This paper proposes a computationally efficient method for estimating legislators' ideal points and their standard errors, using a variational EM algorithm and a variational Louis' method to approximate the observed Fisher information, improving computational efficiency and estimation accuracy on large datasets.

Abstract

Roll-call data analysis aims to estimate legislators' ideal points and quantify the associated uncertainty. Existing approaches either rely on Bayesian methods implemented via Markov chain Monte Carlo sampling or focus primarily on point estimation, with uncertainty typically assessed through resampling procedures such as the bootstrap. Consequently, the computational burden of these approaches can become substantial when applied to large roll-call datasets. To address this challenge, we propose a computationally efficient likelihood method for estimating ideal points and their standard errors. Leveraging the Pólya--Gamma identity, we develop a variational expectation--maximization algorithm for estimating ideal points and introduce a variational Louis' method to approximate the observed Fisher information for standard error estimation. Numerical studies and applications to U.S. congressional roll-call data demonstrate that the proposed method produces accurate ideal point estimates and reliable standard errors while being substantially more computationally efficient than existing approaches.


2605.19584 2026-05-20 cs.LG stat.ML

Online Market Making and the Value of Observing the Order Book

Davide Maran, Marcello Restelli

AI summary: This paper studies an online market-making problem in which a learner sequentially posts bid and ask prices while interacting with traders holding private valuations. Unlike existing online learning formulations that assume fully censored feedback, it introduces an action-dependent feedback model inspired by real limit order books, and shows that this additional information fundamentally changes the learnability of the problem. In the stochastic setting, it proposes an elimination-based algorithm achieving O(√T) regret with high probability without any smoothness assumptions on the distribution of trader valuations. The result is extended to a broad class of mean-reverting price processes, under both local autoregressive dynamics and a weaker global drift condition based on cumulative deviations from the mean; under either assumption, high-probability O(√T) regret bounds are established using a new concentration inequality. Finally, in the adversarial setting, an explore-then-perturb algorithm guarantees O(T^{2/3}) regret in expectation.

Comments: Accepted at COLT2026

Abstract

We study an online market-making problem in which a learner sequentially posts bid and ask prices for a single asset while interacting with traders holding private valuations. Unlike existing online learning formulations that assume fully censored feedback, we introduce an action-dependent feedback model inspired by real limit order books: when a trade occurs, the trader's valuation remains hidden, whereas when no trade occurs, informative feedback about supply and demand is revealed. We show that this additional information fundamentally changes the learnability of the problem. In the stochastic setting with i.i.d. market prices, we propose an elimination-based algorithm that achieves $O(\sqrt T)$ regret with high probability, without requiring any smoothness assumptions on the distribution of trader valuations. We then extend this result to a broad class of mean-reverting price processes by considering both local, autoregressive dynamics and a weaker global drift condition based on cumulative deviations from the mean. Under either assumption, we establish high-probability $O(\sqrt T)$ regret bounds, relying on a new concentration inequality of independent interest. Finally, in the adversarial setting with oblivious prices, we design an explore-then-perturb algorithm that guarantees $O(T^{2/3})$ regret in expectation. Our results quantify the value of observing the order book in online market making and demonstrate that even limited, action-dependent feedback can substantially improve regret guarantees compared to standard bandit feedback models.


2605.19557 2026-05-20 stat.ML cs.LG

Density-Ratio Losses for Post-Hoc Learning to Defer

Alexander Soen, Ragnar Thobaben, Joakim Jaldén, Richard Nock

AI summary: This paper studies post-hoc Learning to Defer (L2D), defining deferral through the lens of ideal distributions and proposing density-ratio-based CPE losses. Deferral decisions are made by thresholding, so deferral rates can be adjusted without retraining, and the paper also reveals a connection between Chow's rule and an expert-tilted Bayes posterior.

Comments: Preprint

Abstract

We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density ratio between a model's and an expert's ideals. Using the reduction from density-ratio estimation to class-probability estimation, we derive the DR CPE losses for post-hoc L2D scorers. Deferral decisions are then made by thresholding the scorer, allowing deferral rates to be adjusted without retraining. For KL-based ideal distributions, our deferral rule recovers Chow's rule under the original distribution and exhibits a connection to an expert-tilted Bayes posterior, which incorporates the expert's performance, depending on whether the ideal distributions are joint or marginal distributions. Experimentally, our approach is competitive with common baselines and more robust across dataset settings. More broadly, our results cast post-hoc L2D as density-ratio learning between ideal distributions, bridging Chow-style rules and expert comparison, and elucidating connections to related learning settings, including anomaly detection.
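Two of the ingredients named in this abstract are easy to illustrate in isolation: Chow's rule, which defers when the top class probability falls below one minus the deferral cost, and the adjustment of a deferral rate by moving a threshold on a fixed scorer. The sketch below is generic and does not implement the paper's density-ratio losses; the convention that a higher score means "defer" and the helper names are assumptions.

```python
import numpy as np

def chow_defer(probs, cost):
    """Chow's rule: defer when the top class probability is below 1 - cost."""
    probs = np.asarray(probs, dtype=float)
    return probs.max(axis=1) < 1.0 - cost

def threshold_for_rate(scores, rate):
    """Threshold such that roughly a fraction `rate` of the scores lies above it."""
    return np.quantile(np.asarray(scores, dtype=float), 1.0 - rate)

probs = [[0.9, 0.1], [0.55, 0.45]]
decisions = chow_defer(probs, cost=0.3)  # second example is deferred

scores = np.arange(10) / 10.0            # hypothetical deferral scores
t = threshold_for_rate(scores, rate=0.2)
deferred = scores > t
```

Changing `rate` only moves the threshold `t`; the scorer itself is left untouched, which is the sense in which the deferral rate can be tuned without retraining.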


2605.19519 2026-05-20 stat.ME

Inference for Fréchet Regression

Wookyeong Song, Paromita Dubey, Hans-Georg Müller, Alexander Petersen

AI summary: This paper studies inference for Fréchet regression, where responses are non-Euclidean random objects; it proposes a significance test of whether the regression function depends on the predictors, and demonstrates the method's effectiveness through simulations and real data.

Comments: 35 pages, 6 figures

Abstract

Linear regression is widely used to model relationships between responses and predictors. In modern applications, one encounters data where the responses are non-Euclidean random objects situated in a metric space, paired with Euclidean predictors. Global Fréchet regression generalizes linear regression to such general settings; however, statistical inference has remained largely unexplored. We develop a significance test for the null hypothesis that the Fréchet regression function does not depend on the predictors, addressing the challenge of an absence of linear operations in metric spaces. We also develop a test for the partial effect of a subset of the predictors in analogy to, but quite different from, the partial F-tests commonly used in classical linear regression under Gaussian assumptions. Key ideas are to employ random multipliers to obtain non-degenerate null distributions for the proposed test statistics and to use the Cauchy combination method. We obtain consistency and convergence results under the null hypothesis and contiguous alternatives and demonstrate the finite sample performance of the proposed tests through simulations on network data represented by graph Laplacians and spherical data with geodesic distances. We further illustrate our method using transport networks arising from New York City taxi trip data and U.S. energy source compositional data.
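The Cauchy combination method named in this abstract merges several p-values into one by mapping each to a Cauchy-distributed quantity and averaging. The sketch below shows the standard form of the combination with equal weights by default; how the paper applies it to the random-multiplier test statistics is not described in the abstract, so this is only the generic building block.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p-values via T = sum_i w_i * tan((0.5 - p_i) * pi).

    The combined p-value is 0.5 - atan(T) / pi. Weights are assumed to be
    nonnegative and to sum to one.
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t) / math.pi

combined = cauchy_combination([0.01, 0.20, 0.65])
```

A useful sanity check is that combining identical p-values returns the same value, since the weighted average of equal terms is unchanged.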


2605.19401 2026-05-20 econ.GN q-fin.EC q-fin.GN stat.AP

External Demand, Domestic Monetary Conditions, and Remittance Dynamics in Nepal

Sahaj Raj Malla

AI summary: This paper studies the macroeconomic determinants and dynamic behaviour of personal remittances as a share of GDP in Nepal, focusing on external demand in major destination countries and domestic monetary policy. Using annual data for 1993-2024, it constructs a composite multi-country external demand index and a domestic monetary conditions index, and applies small-sample econometric methods including ARDL bounds testing, Engle-Granger cointegration, dynamic OLS, and a two-step error correction model. It finds a significant positive effect of external demand on remittances and a significant negative effect of tighter domestic monetary conditions.

Comments: 16 pages, 1 figure, 7 tables

Abstract

This study investigates the macroeconomic determinants and dynamic behaviour of personal remittances as a share of Gross Domestic Product (GDP) in Nepal, emphasizing external demand in major destination countries and domestic monetary policy. Using annual data (1993-2024), we construct composite indices via Principal Component Analysis (PCA) for multi-country external demand and a domestic Monetary Conditions Index (MCI). Our small-sample econometric pipeline includes Autoregressive Distributed Lag (ARDL) bounds testing, Engle-Granger cointegration, Dynamic OLS (DOLS), and a two-step Error Correction Model (ECM). We also employ Granger causality tests and multi-model forecasting using machine learning and ECM scenarios. The analysis reveals a strong positive long-run effect of external demand on remittances and a significant negative impact of tighter domestic monetary conditions. The ECM confirms a stable cointegrating relationship, correcting approximately 26% of disequilibria annually. Medium-term projections indicate remittances will remain structurally important, reaching around 28.3% of GDP by 2030 under baseline conditions, while exhibiting high sensitivity to external demand shocks. This study advances the literature by integrating PCA-derived external demand and monetary conditions indices within a unified ARDL-ECM framework for small samples. Focusing on one of the world's most remittance-dependent economies, it offers actionable insights for monetary policy calibration, migration diversification, and the productive utilization of remittance inflows.
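The reported adjustment speed of about 26% per year can be turned into a half-life with a short calculation. The sketch below assumes the simplest reading, in which a fraction 0.26 of any deviation from the long-run relationship is removed each year and short-run dynamics are ignored; that simplification is an assumption of this illustration, not a statement about the paper's model.

```python
import math

def half_life(adjustment_speed):
    """Periods needed to close half of a deviation at a constant correction rate."""
    return math.log(0.5) / math.log(1.0 - adjustment_speed)

def remaining_gap(adjustment_speed, periods):
    """Share of an initial disequilibrium left after `periods` periods."""
    return (1.0 - adjustment_speed) ** periods

years = half_life(0.26)               # roughly 2.3 years
left_after_3 = remaining_gap(0.26, 3) # roughly 0.41 of the initial gap
```

Under this reading, about half of a shock to the remittance ratio is absorbed within a little over two years.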


2605.19391 2026-05-20 stat.ML cs.LG

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian

Wenpin Tang, Nizar Touzi, Zikun Zhang, Xun Yu Zhou

AI summary: This paper extends Tweedie's formula to important non-Gaussian processes such as geometric Brownian motion, squared Bessel processes, and Cox-Ingersoll-Ross processes, and uses these formulae to apply non-Gaussian diffusion models to image and financial time series generation and to empirical Bayes estimation, demonstrating the potential of non-Gaussian models.

Comments: 27 pages, 18 figures

Abstract

Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise, transforming it into a simple prior, and then use denoising score matching, a consequence of Tweedie's formula, to learn the score function and generate clean samples from noise. However, non-Gaussian diffusion models with state-dependent diffusion coefficient have been largely underexplored, as have the corresponding Tweedie's formulae. In this work, we extend Tweedie's formula to important non-Gaussian processes, including geometric Brownian motion (GBM), squared Bessel (BESQ) processes, and Cox-Ingersoll-Ross (CIR) processes, thereby yielding the corresponding denoising score-matching objectives. We then apply the derived formulae to image and financial time series generation using GBM- and CIR-based diffusion models, and to empirical Bayes estimation under the BESQ setting. The reported experimental results demonstrate the potential of non-Gaussian models.
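For orientation, the classical Gaussian case that this abstract starts from can be checked numerically. With $y = x + \varepsilon$ and $\varepsilon \sim N(0, \sigma^2)$, Tweedie's formula gives $\mathbb{E}[x \mid y] = y + \sigma^2 \, \partial_y \log p(y)$. The sketch below verifies this against the conjugate posterior mean for a Gaussian prior; the non-Gaussian extensions (GBM, BESQ, CIR) derived in the paper are not reproduced, and the numbers are arbitrary.

```python
def tweedie_posterior_mean(y, score, sigma2):
    """Tweedie's formula for additive N(0, sigma2) noise: E[x | y] = y + sigma2 * score."""
    return y + sigma2 * score

def gaussian_marginal_score(y, mu, tau2, sigma2):
    """Score of the marginal N(mu, tau2 + sigma2) obtained when x ~ N(mu, tau2)."""
    return -(y - mu) / (tau2 + sigma2)

mu, tau2, sigma2, y = 1.0, 4.0, 1.0, 3.0
score = gaussian_marginal_score(y, mu, tau2, sigma2)
via_tweedie = tweedie_posterior_mean(y, score, sigma2)

# Conjugate Gaussian posterior mean, computed independently for comparison.
via_conjugacy = (tau2 * y + sigma2 * mu) / (tau2 + sigma2)
```

Both routes give 2.6 here, which is the identity that denoising score matching exploits: learning the score of the noisy marginal is equivalent to learning the denoiser.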


2605.19370 2026-05-20 stat.AP

A General Statistical Framework for Hardy-Weinberg Equilibrium Inference on the X Chromosome

Lin Zhang, Andrew Paterson, Lei Sun

AI summary: This paper proposes a general statistical framework, based on a robust allele-based regression model, for Hardy-Weinberg equilibrium inference on the X chromosome; it unifies existing Pearson chi-square-based tests and clarifies their assumptions, degrees of freedom, and sensitivity to sdMAF.

Abstract

Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X chromosome is more complex due to sex-specific genotype structures and potential sex differences in minor allele frequency (sdMAF). Existing tests differ in their assumptions about sdMAF and male sample inclusion, often leading to distinct but poorly characterized null hypotheses. We develop a general statistical framework for HWE inference using the robust allele-based regression model. By formulating HWE testing as an assessment of allele-level dependence, the framework directly parameterizes Hardy-Weinberg disequilibrium, unifies existing Pearson chi-square-based tests under explicit modeling assumptions, and clarifies their null hypotheses, degrees of freedom, and sensitivity to sdMAF. The framework also accommodates covariate and population-structure adjustment within a unified regression-based formulation. The proposed framework provides robust, interpretable, and flexible inference, establishing a unified statistical foundation for HWE testing across autosomal and X-chromosomal regions. Simulation studies and analysis of high-coverage 1000 Genomes Project data demonstrate that commonly used X-chromosome tests can exhibit inflated type I error or misleading inference when sdMAF is present.
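As background for the tests this abstract unifies, the classical Pearson chi-square test for HWE at an autosomal biallelic locus compares observed genotype counts with those expected from the estimated allele frequency. The sketch below covers only that textbook autosomal case with one degree of freedom; it does not implement the X-chromosome tests or the allele-based regression framework, and the genotype counts are made up.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for HWE at an autosomal biallelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

in_equilibrium = hwe_chi_square(25, 50, 25)       # 0.0
heterozygote_deficit = hwe_chi_square(30, 40, 30) # 4.0
```

On the X chromosome, males carry a single allele, so this genotype table no longer applies directly, which is the complication the abstract describes.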


2605.19364 2026-05-20 math.ST math.PR stat.TH

Optimal Spectral Algorithms for Correlated Two-view Models in High Dimensions

Hang Du, Henry Hu, Saba Lepsveridze

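AI summary: This paper studies spectral methods for correlated two-view models in high dimensions. It proposes a general framework inspired by statistical physics that treats three classical models in a unified way, and constructs explicit spectral algorithms that attain the information-theoretic limits for strong detection and weak recovery.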

2605.19341 2026-05-20 cs.CL cs.AI cs.LG stat.ML

HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

Emmy Liu, Varun Gangal, Michael Yu, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng

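AI summary: This paper introduces the HalluWorld benchmark, which studies hallucination in language models through explicit reference world models. It finds that hallucination behavior is inconsistent across tasks, suggesting that hallucinations stem from multiple failure modes rather than a single capability.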

Comments: HalluWorld benchmark (code and data) at github.com/DegenAI-Labs/HalluWorld

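AI abstract (translated from Chinese)

Hallucination remains a central failure mode of large language models, yet existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation is effective across settings. Current benchmarks either require human annotation and fixed references or rely on observations that are hard to reproduce. To study root causes, we introduce HalluWorld, an extensible benchmark built on an explicit reference world model: a model hallucinates when it produces an observable claim that is inconsistent with that world. Building on this view, we construct synthetic and semi-synthetic environments in which the reference world is fully specified, the model's view is controlled, and hallucination labels are produced automatically. HalluWorld covers gridworlds, chess, and realistic terminal tasks, makes world complexity, observability, temporal change, and source-conflict policy controllable, and breaks hallucinations down into fine-grained error categories. We evaluate frontier and open-weight language models in these settings and find consistent patterns: for frontier models, perceptual hallucination on directly observed information is close to solved, while multi-step state tracking and causal forward simulation remain difficult and are not generally solved by extended thinking. In the terminal setting, models also struggle with deciding when to abstain. Failures are unevenly distributed across probe types and domains, indicating that hallucinations stem from distinct failure modes rather than a single capability. Our results indicate that controlled reference worlds offer a scalable and reproducible path to measuring and reducing hallucinations in modern language models.

English abstract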

Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting reduces hallucinations across contexts. Current benchmarks either require human annotation and fixed references that may be memorized, or rely on observations in settings that are difficult to reproduce. To study root causes, we introduce HalluWorld, an extensible benchmark grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this world. Building on this view, we construct synthetic and semi-synthetic environments in which the reference world is fully specified, the model's view is controlled, and hallucination labels are generated automatically. HalluWorld spans gridworlds, chess, and realistic terminal tasks, enabling controlled variation of world complexity, observability, temporal change, and source-conflict policy, and disentangling hallucinations into fine-grained error categories. We evaluate frontier and open-weight language models across these settings and find consistent patterns: perceptual hallucination on directly observed information is near-solved for frontier models, while multi-step state tracking and causal forward simulation remain difficult and are not generally solved by extended thinking. In the terminal setting, models also struggle with when to abstain. The uneven profile of failures across probe types and domains suggests that hallucinations arise from distinct failure modes rather than a single capability. Our results suggest that controlled reference worlds offer a scalable and reproducible path toward measuring and reducing hallucinations in modern language models.

2605.19313 2026-05-20 stat.ML cs.LG stat.ME

A Unified Framework for Structure-Aware Clustering and Heterogeneous Causal Graph Learning

Honglin Du, Muxuan Liang, Xiang Zhong

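AI summary: This paper proposes a dependency clustering method based on directed acyclic graphs that handles structural heterogeneity via the alternating direction method of multipliers, enabling robust discovery of subpopulation-specific dependency structures.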

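AI abstract (translated from Chinese)

In complex multivariate systems, interactions among variables are defined by dependency structures, usually encoded as directed acyclic graphs (DAGs). However, dependency structures can vary across individuals, and ignoring this structural heterogeneity introduces bias and obscures subpopulation-specific dependencies. To address this, we propose a DAG-based dependency clustering method that uses the alternating direction method of multipliers (ADMM); it is built on structural equation modeling (SEM) and jointly learns cluster assignments and cluster-specific dependency structures. We encode acyclicity through a smooth constraint and incorporate a groupwise truncated Lasso fusion penalty (gTLP) to cluster individuals by structural similarity. This yields a nonconvex optimization problem combining sparsity, acyclicity, and structural consensus constraints. We handle the nonconvexity with the augmented Lagrangian method and solve the resulting difference-of-convex program with an adapted ADMM. For certain graph structures, such as upper triangular adjacency matrices, our algorithm is guaranteed to converge to a KKT point. Experiments show that our method recovers subpopulation-specific causal dependency structures with a high true positive rate and a low false discovery rate. This capability allows robust discovery of heterogeneous dependencies across individuals when subpopulation labels are unknown.

English abstract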

In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs ($\text{DAGs}$). However, dependency structures can vary across subjects, and ignoring this structural heterogeneity introduces bias and obscures subpopulation-specific dependencies. To address this, we propose Directed Acyclic Graph-based Dependency Clustering via Alternating Direction Method of Multipliers (DAG-DC-ADMM), a unified framework built upon Structural Equation Modeling (SEM) that jointly learns cluster assignments and cluster-specific dependency structures. We encode acyclicity via a smooth constraint and integrate a groupwise truncated Lasso fusion penalty (gTLP) to cluster subjects based on their structural similarity. This yields a nonconvex optimization problem that incorporates sparsity, acyclicity, and structural consensus constraints. We address the nonconvexity by using the augmented Lagrangian method and solve it with an adapted version of the Alternating Direction Method of Multipliers (ADMM) for difference-of-convex programs. For certain graph structures, such as upper triangular adjacency matrices, our algorithm is guaranteed to converge to a Karush-Kuhn-Tucker (KKT) point. Experiments demonstrate that our method recovers cluster-specific causal dependency structures with a high true positive rate and a low false discovery rate. This capability enables the robust discovery of heterogeneous dependencies across subjects where the subpopulation label is unknown.
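
The abstract says acyclicity is encoded "via a smooth constraint" without naming it. One widely used choice, shown here purely as an assumption about what such a constraint can look like, is the NOTEARS function $h(W) = \operatorname{tr}(e^{W \circ W}) - d$, which is zero exactly when the weighted adjacency matrix $W$ has no directed cycle. The sketch below evaluates it in plain Python with a truncated power series; the helper names are hypothetical and the paper may use a different constraint.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def acyclicity(w, terms=30):
    """h(W) = trace(exp(W * W)) - d, with * the elementwise product.

    The matrix exponential is approximated by a truncated power series,
    which is adequate for the small, modest-norm matrices used here.
    """
    d = len(w)
    m = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]
    term = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # M^0 / 0!
    trace = float(d)
    for k in range(1, terms):
        term = mat_mul(term, m)
        term = [[x / k for x in row] for row in term]   # now M^k / k!
        trace += sum(term[i][i] for i in range(d))
    return trace - d

# Strictly upper triangular weights form a DAG, so h is zero.
dag = [[0.0, 1.5, -2.0], [0.0, 0.0, 0.7], [0.0, 0.0, 0.0]]
# Two nodes pointing at each other form a cycle, so h is positive.
cycle = [[0.0, 1.0], [1.0, 0.0]]
h_dag, h_cycle = acyclicity(dag), acyclicity(cycle)
```

Because $h$ is differentiable in $W$, it can be placed inside an augmented Lagrangian, which is consistent with the optimization strategy the abstract describes.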

2605.19291 2026-05-20 stat.ML cs.LG math.ST stat.TH

Factor Augmented High-Dimensional SGD

Shubo Li, Yuefeng Han, Xiufan Yu

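AI summary: This paper proposes a new optimization method, Factor-Augmented SGD (FSGD), which exploits latent factor representations in high-dimensional learning tasks to overcome the data-storage and online-learning limitations of traditional two-stage dimension-reduction approaches. It establishes the first theoretical framework that incorporates latent factor estimation error into the analysis of SGD, and provides moment convergence in the $\ell^s$ norm under decaying step sizes and mini-batch updates.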

2605.19283 2026-05-20 cs.LG cs.AI stat.ML

EviTrack: Selection over Sampling for Delayed Disambiguation

Omer Haq

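AI summary: This paper proposes the EviTrack framework, which performs selection over latent trajectories rather than marginal states to enable more effective sequential inference under delayed disambiguation. Its core method selects among trajectory hypotheses using evidence and likelihood ratios, delaying commitment until the data support it, and it outperforms sampling-based baselines.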

Comments: https://github.com/Haq94/EviTrack

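AI abstract (translated from Chinese)

Sequential prediction is challenging in settings with delayed disambiguation, where early observations are ambiguous and multiple latent explanations remain plausible until enough evidence has accumulated. Standard methods based on marginal inference perform poorly here: they either collapse uncertainty too early or fail to recover once informative evidence arrives. We introduce EviTrack, a test-time inference framework that operates on latent trajectories rather than marginal states. EviTrack maintains a set of competing trajectory hypotheses and applies evidence- and likelihood-ratio-based selection to delay commitment until the data support it, drawing on hypothesis management in multiple hypothesis tracking and track-before-detect. To evaluate this setting, we construct a controlled synthetic benchmark with known latent ground truth that explicitly exhibits delayed disambiguation. Under a matched inference budget, EviTrack substantially outperforms sampling-based baselines and recovers faster after disambiguation. These results show that, in delayed disambiguation settings, moderate trajectory-level selection is more effective than increasing sampling coverage, highlighting selection over sampling as a key principle for reliable sequential inference.

English abstract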

Sequential prediction is challenging in regimes of delayed disambiguation, where early observations are ambiguous and multiple latent explanations remain plausible until sufficient evidence accumulates. Standard approaches based on marginal inference struggle in this setting, either collapsing uncertainty prematurely or failing to recover once informative evidence arrives. We introduce EviTrack, a test-time inference framework that operates over latent trajectories rather than marginal states. EviTrack maintains a set of competing trajectory hypotheses and applies evidence- and likelihood-ratio-based selection to delay commitment until supported by data, drawing inspiration from hypothesis management in multiple hypothesis tracking and track-before-detect. To evaluate this setting, we construct a controlled synthetic benchmark with known latent ground truth that explicitly exhibits delayed disambiguation. At matched inference budget, EviTrack substantially outperforms sampling-based baselines, achieving faster post-disambiguation recovery. These results show that, in delayed disambiguation regimes, moderate trajectory-level selection is more effective than increasing sampling coverage, highlighting selection over sampling as a key principle for reliable sequential inference.

2605.19275 2026-05-20 stat.AP

Open-Weight LLMs Are Often Competitive with Commercial APIs for Political Science Text Classification

Hanno Hilbig

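AI summary: This study asks whether local open-weight models can replace commercial APIs for political science text classification. Comparing five local models with four commercial API models, it finds that local models are competitive on simpler tasks while commercial APIs retain an edge on complex tasks; overall, local models are a viable alternative.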

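AI abstract (translated from Chinese)

Can researchers use local open-weight models instead of commercial APIs for LLM text classification? Local models avoid marginal API charges, keep data on the researcher's machine, and make it easier to preserve exact model versions. This paper benchmarks five local models against four commercial API models on 34 political science classification tasks. Local models are competitive on many tasks, especially simpler ones. In a task-specific oracle comparison, local models match or exceed API performance on 9 tasks; on average, the best API model exceeds the best local model by 0.015 F1. The means of the four strongest observed models fall within 0.021 F1 of one another. The advantage of API models is clearest on complex tasks with many labels or multiple outputs per item. Putting several items in one prompt usually reduces per-item runtime for local models, but specific model-task pairs can return invalid response formats or labels. Overall, the results indicate that local open-weight models are a practical alternative for many political science classification tasks, provided researchers validate candidate models on task-specific labels and check batching reliability before scaling up.

English abstract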

Can researchers use local open-weight models instead of commercial APIs for LLM text classification? Local models avoid marginal API charges, keep data on the researcher's machine, and make exact model versions easier to preserve. I benchmark five local models against four commercial API models on 34 political science classification tasks. Local models are often competitive, especially on simpler tasks. In a task-specific oracle comparison, local models match or exceed API performance on 9 tasks; on average, the best API model exceeds the best local model by 0.015 F1. The four strongest observed model means fall within 0.021 F1. API models have their clearest edge on complex tasks with many labels or multiple outputs per item. Batching several items in one prompt usually reduces local runtime per item, but specific model-task pairs can return invalid response formats or labels. Taken together, the results make local open-weight models a practical candidate alternative for many political science classification tasks, provided researchers validate candidate models on task-specific labels and check batching reliability before scaling up.

2605.19231 2026-05-20 cs.LG stat.ML

DeRegiME: Deep Regime Mixtures for Probabilistic Forecasting under Distribution Shift

Kieran Wood, Stefan Zohren, Stephen J. Roberts

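AI summary: DeRegiME introduces a sparse variational Gaussian process to realize regime mixing in probabilistic forecasting, addressing the shortcomings of neural forecasters under distribution shift and improving the accuracy of predictive densities.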

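AI abstract (translated from Chinese)

We introduce DeRegiME (Deep Regime Mixture of Experts), a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and uses a sparse variational Gaussian process (GP) to softly assign each forecast location to learned recurring regimes. Its nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes through a shared gate, yielding a single sparse-GP posterior rather than a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads, whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples, rarely expose the regime structure of the residual. Yet in noisy heteroskedastic time series, distribution shift can be abrupt, gradual, or horizon-dependent, and often appears in the residual uncertainty rather than the conditional mean. DeRegiME provides an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity, with transitions surfacing as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and propriety of the predictive density. Across ten benchmarks and three encoder grids, DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline (a DeepAR/GluonTS-style dynamic Student-t head), with parallel gains in CRPS (3.0%) and MSE (4.7%). The improvements are consistent across all datasets, which cover abrupt, gradual, and seasonal shifts.

English abstract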

We introduce DeRegiME -- Deep Regime Mixture of Experts -- a direct multi-horizon probabilistic forecaster that separates latent uncertainty regimes from the underlying signal and softly assigns each forecast location to learned recurring regimes using a sparse variational Gaussian process (GP) whose nonstationary regime-mixing kernel and Student-t likelihood combine per-regime sub-kernels and noise processes via a shared gate. This yields a single sparse-GP posterior, not a mixture of GP experts. DeRegiME addresses a key limitation of neural forecasters: point forecasts discard residual uncertainty, and probabilistic heads -- whether single marginals, uninterpreted mixtures, quantile sets, or diffusion samples -- rarely expose the regime structure of the residual. Yet distribution shift in noisy heteroskedastic time series may be abrupt, gradual, or horizon-dependent and often appears in residual uncertainty rather than the conditional mean. DeRegiME yields an interpretable mean-residual-noise decomposition with a direct-sum feature-space representation that anchors regimes as clusters of residual similarity whose transitions surface as implicit changepoints. The effective number of regimes is pruned by the stick-breaking gate. We prove kernel validity and predictive-density propriety, and across ten benchmarks and three encoder grids DeRegiME improves negative log predictive density (NLPD) by 20.3% over the strongest encoder-matched baseline, a DeepAR/GluonTS-style dynamic Student-t head, with parallel gains on CRPS (3.0%) and MSE (4.7%). Improvements are consistent across all datasets, which span abrupt, gradual, and seasonal shifts.
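
The abstract states that the effective number of regimes is pruned by a stick-breaking gate but does not say how the gate is parameterized. The generic stick-breaking construction, given here only as background and under that caveat, turns break fractions $v_1, \dots, v_K \in [0, 1]$ into weights $\pi_k = v_k \prod_{j<k} (1 - v_j)$. The function name `stick_breaking` is an assumption of this sketch.

```python
def stick_breaking(fractions):
    """Turn break fractions v_1..v_K in [0, 1] into weights pi_1..pi_K.

    Returns (weights, leftover), where leftover is the unassigned stick length.
    """
    weights, remaining = [], 1.0
    for v in fractions:
        weights.append(v * remaining)   # take a fraction v of what is left
        remaining *= (1.0 - v)          # shrink the remaining stick
    return weights, remaining

# Large early fractions leave little mass for later components,
# which is how such a gate can switch off unused regimes.
weights, leftover = stick_breaking([0.5, 0.5, 1.0])
```

Components whose weight is close to zero contribute almost nothing to the mixture, so the number of regimes in effective use can be smaller than the nominal $K$.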

2605.19208 2026-05-20 stat.AP cs.LG stat.ML

Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions

Gefei Lin, Rui Miao, Jennifer Sacheck, Xiaoke Zhang

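AI summary: This paper proposes a reinforcement-learning-based algorithm that personalizes and optimizes the distribution of daily steps with respect to cardiometabolic risk, and uses data from the All of Us Research Program to validate the method's effectiveness in improving health biomarkers.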

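AI abstract (translated from Chinese)

Physical activity (PA) plays an important role in maintaining and improving health. Daily step count has become a key PA measure that is easy to obtain from common wearable devices. However, methods are lacking for recommending a personalized optimal distribution of daily steps that best improves certain health biomarkers. Using data from the All of Us Research Program, which include months of step counts and repeated measurements of key health biomarkers, this paper develops a new offline reinforcement learning (RL) algorithm to learn personalized, optimal PA distributions associated with cardiometabolic risk, where the action is a function representing the distribution of daily steps over a period of time. Simulation studies show the advantage of the proposed approach over existing continuous-action RL methods. The optimal policy learned from the All of Us data generally recommends that people take more daily steps and follow a more consistent PA pattern over time, while offering tailored recommendations for subgroups defined by blood glucose level, body mass index, blood pressure, age, and sex.

English abstract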

Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized optimal distribution of daily steps over a period of time for the best of certain health biomarkers. In this paper, we fill this void based on the data from the All of Us Research Program which includes months of step counts as well as repeated measurements of key health biomarkers. We develop a new offline reinforcement learning (RL) algorithm to learn personalized and optimal PA distributions associated with cardiometabolic risk, where the action is a function representing the daily step distribution over a period of time. Simulation studies demonstrate the advantage of the proposed approach over existing continuous-action RL methods. The learned optimal policy from the All of Us data generally suggests people take more daily steps and also follow a more consistent pattern of PA over time while offering tailored recommendations for subgroups in blood glucose level, body mass index, blood pressure, age, and sex.

2605.19195 2026-05-20 cond-mat.stat-mech cs.IT math.IT stat.ML

The Thermodynamic Costs of Simple Linear Regression

Samuel H. D'Ambrosia, Sultan M. Daniels, Michael R. DeWeese, Anant Sahai

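AI summary: This paper studies the thermodynamic costs of simple linear regression algorithms. It derives an energy-aware scaling law for the optimal dataset size when training a linear regression model to a required generalization error, and discusses ways to reduce the entropy production of algorithms with continuous input variables.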

2605.19189 2026-05-20 math.ST math.FA stat.ME stat.TH

Inference Functionals and Observation Operators for Distributional Statistical Models

R. Labouriau

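AI summary: This paper generalises inference functions to distributional statistical models, in which each probability measure is represented by a distribution-kernel pair $(T_θ, φ) \in \mathcal S'(\mathbb R) \times \mathcal S(\mathbb R)$. The central idea is that the consistency and asymptotic normality of maximum likelihood estimation follow from the MLE being the root of an inference function; the main contributions are the introduction of observation operators and a hierarchy of information bounds.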

Comments: 40 pages, one figure and one table

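AI abstract (translated from Chinese)

This paper generalises inference functions (Godambe, 1960) to distributional statistical models, in which each probability measure is represented by a distribution-kernel pair $(T_θ, φ) \in \mathcal S'(\mathbb R) \times \mathcal S(\mathbb R)$. The generalisation is strategically motivated: the consistency and asymptotic normality of maximum likelihood estimation come not from maximising the likelihood but from the MLE being the root of a regular inference function. Extending inference functions to the distributional setting provides an optimality theory for models that lack classical densities or finite moments. The extension requires enlarging the notion of observation: we introduce observation operators $\mathcal O : \mathcal S'(\mathbb R) \to \mathcal Y$ that map distributional models to an observation space, and define inference functionals as estimating equations composed with these operators. The framework covers classical point observations, interval-censored data, convolutional measurements, and transform-based statistics. We establish asymptotic theory (consistency, asymptotic normality, Godambe optimality) under mild conditions and, via the Hájek-Le Cam convolution theorem, derive a hierarchy of information bounds: classical Fisher information dominates the information available through the observation operator, which in turn dominates the information captured by any inference functional. The two gaps quantify distinct sources of information loss: the observation mechanism and the choice of inference functional. Examples include sinusoidal inference functions for heavy-tailed distributions, interval-censored location inference, elliptically contoured models, and nuisance parameters via the Bhapkar-Godambe projection.

English abstract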

This paper generalises inference functions (Godambe, 1960) to distributional statistical models, in which each probability measure is represented by a distribution-kernel pair $(T_θ, φ) \in \mathcal S'(\mathbb R) \times \mathcal S(\mathbb R)$. The generalisation is strategically motivated: the key properties of maximum likelihood estimation, consistency and asymptotic normality, derive not from maximising the likelihood but from the MLE being the root of a regular inference function. Extending inference functions to the distributional setting provides an optimality theory for models lacking classical densities or finite moments. The extension requires enlarging the notion of observation. We introduce observation operators $\mathcal O : \mathcal S'(\mathbb R) \to \mathcal Y$ mapping distributional models to an observation space, and define inference functionals as estimating equations composed with these operators. The framework encompasses classical point observations, interval-censored data, convolutional measurements, and transform-based statistics. We establish asymptotic theory (consistency, asymptotic normality, Godambe optimality) under mild conditions and derive, via the Hájek-Le Cam convolution theorem, a hierarchy of information bounds: classical Fisher information dominates the information available through the observation operator, which in turn dominates the information captured by any inference functional. The two gaps quantify distinct sources of information loss: the observation mechanism and the choice of inference functional. Examples include sinusoidal inference functions for heavy-tailed distributions, interval-censored location inference, elliptically contoured models, and nuisance parameters via the Bhapkar-Godambe projection.

2605.19164 2026-05-20 stat.ME math.ST stat.TH

The Spatial Cramér-von Mises Test of Independence under $β$-Mixing: Asymptotic Theory and Python Implementation

Marco Mandap

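AI summary: This paper studies the asymptotic distribution of the spatial Cramér-von Mises statistic under polynomial β-mixing dependence and provides a Python implementation. By combining three ingredients, it extends the classical independence test to spatially dependent data, gives eigenvalue formulas for three weight functions, and verifies the results through simulation experiments.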

Comments: 34 pages

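AI abstract (translated from Chinese)

We derive the asymptotic distribution of the spatial Cramér-von Mises statistic for testing bivariate independence in two-dimensional stationary random fields under polynomial β-mixing dependence, and document a Python implementation that reproduces all simulation results. The classical test assumes i.i.d. observations; we extend it to spatially dependent data by combining three ingredients: (i) a Davydov-type covariance bound that makes the spatial covariance kernel integrable when θ > 2(2+δ)/δ; (ii) a reformulation of the inner-form test statistic as a degenerate U-statistic of order 2 with product kernel Q = G₁⊗G₂, following De Wet (1980); and (iii) an extension of Gregory's (1977) U-statistic limit theorem to β-mixing sequences via Yoshihara (1976). The limit distribution is a weighted sum of correlated χ²₁ variables whose eigenvalues factor as products of marginal eigenvalues; in the small-bandwidth limit the correlation vanishes and the limit reduces to the classical i.i.d. form. Explicit eigenvalue formulas are given for three weight functions (uniform, optimal normal, Anderson-Darling), yielding computable critical values. The software generates Matérn random fields by circulant embedding, computes the test statistic through the inner-form kernel decomposition, evaluates asymptotic critical values by Monte Carlo, and runs permutation-based alternatives. Simulation experiments show that the Anderson-Darling weight achieves the best power, whereas the Mantel and cross-K tests have no power against cross-dependence in spatially correlated fields.

English abstract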

We derive the asymptotic distribution of the spatial Cramér-von Mises statistic for testing bivariate independence in stationary random fields on $\mathbb{R}^2$ under polynomial $β$-mixing dependence, and document the Python implementation that reproduces all simulation results. The classical test assumes i.i.d. observations; we extend it to spatially dependent data by combining three ingredients: (i) a Davydov-type covariance bound yielding integrability of the spatial covariance kernel under $θ > 2(2+δ)/δ$; (ii) a reformulation of the inner-form test statistic as a degenerate U-statistic of order 2 with product kernel $Q = G_1 \otimes G_2$, following De Wet (1980); and (iii) an extension of Gregory's (1977) U-statistic limit theorem to $β$-mixing sequences via Yoshihara (1976). The limit distribution is a weighted sum of correlated $χ^2_1$ variables whose eigenvalues factor as products of marginal eigenvalues; in the small-bandwidth limit the correlation vanishes and the limit reduces to the classical i.i.d. form. Explicit eigenvalue formulas are given for three weight functions (uniform, optimal normal, Anderson-Darling), producing computable critical values. The software generates Matérn random fields by circulant embedding, computes the test statistic via the inner-form kernel decomposition, evaluates asymptotic critical values by Monte Carlo, and runs permutation-based alternatives. Simulation experiments show that the Anderson-Darling weight achieves the best power, while the Mantel and cross-$K$ tests have no power against cross-dependence in spatially correlated fields.

2605.19122 2026-05-20 stat.ML cs.LG

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

Elynn Chen, Jiayu Li, Zheshi Zheng, Jian Pei

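AI summary: This paper proposes the dual-channel tensor neural network (DC-TNN), which decomposes a tensor input into a low-rank core and a sparse refinement and processes the two through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores. For estimation, it establishes non-asymptotic risk bounds for the DC-TNN estimator and shows that the effective dimension is determined jointly by the core rank and the refinement sparsity. For inference, it develops a structure-aware conformal ROC procedure that produces ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, it proposes a conformal structure selector, presented as the first distribution-free procedure with finite-sample validity for choosing among candidate tensor decompositions. Simulations and an analysis of a protein dataset show competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of tensor structure.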

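AI abstract (translated from Chinese)

Tensor-valued data arise naturally in neuroimaging, genomics, climate science, and spatiotemporal networks, where multilinear dependencies across modes carry information that vectorization destroys. Existing methods either impose a single low-rank structure, which can miss localized signal, or treat the tensor as a long vector, which discards its multiway geometry. We propose the dual-channel tensor neural network (DC-TNN), which decomposes each tensor input into a low-rank core and a sparse refinement and processes the two components through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores in a single architecture. For estimation, we establish non-asymptotic risk bounds for the DC-TNN estimator that decompose into network approximation, core estimation, and refinement-selection terms, and show that the effective dimension is determined jointly by the core rank and the refinement sparsity rather than by the ambient tensor size. For inference, we develop a structure-aware conformal ROC procedure that calibrates in the core-refinement latent space and produces ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, we propose a conformal structure selector which, to our knowledge, is the first distribution-free procedure with finite-sample validity for choosing among candidate tensor decompositions. Simulations and an analysis of a protein dataset show competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of tensor structure.

English abstract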

Tensor-valued data arise naturally in neuroimaging, genomics, climate science, and spatiotemporal networks, where multilinear dependencies across modes carry information that is destroyed under vectorization. Existing approaches either impose a single low-rank structure, which can miss localized signal, or treat the tensor as a long vector, which discards its multiway geometry. We propose a *Dual-Channel Tensor Neural Network* (DC-TNN) that decomposes each tensor input into a low-rank core and a sparse refinement, and processes the two components through coupled neural channels. The framework is structure-agnostic and accommodates CP, Tucker, and tensor-train cores within a single architecture. For estimation, we establish non-asymptotic risk bounds for the DC-TNN estimator that decompose into network approximation, core estimation, and refinement-selection terms, and show that the effective dimension is determined jointly by the core rank and refinement sparsity rather than by the ambient tensor size. For inference, we develop a *structure-aware conformal ROC* procedure that calibrates within the core-refinement latent space and produces ROC and AUC confidence bands with finite-sample, distribution-free coverage. Building on this, we propose a *conformal structure selector* that, to our knowledge, is the *first distribution-free procedure* for choosing among candidate tensor decompositions with finite-sample validity. Simulations and an analysis of a protein dataset demonstrate competitive predictive accuracy, reliable uncertainty quantification, and consistent recovery of the tensor structure.

2605.19113 2026-05-20 stat.ME cs.LG stat.ML

Learning Interpretable Point-Based Clinical Risk Scores via Direct Optimization

Ying Cui, Albert M Li, Vivek Charu, Yeon-Mi Hwang, Tina Hernandez-Boussard, Lu Tian

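AI summary: This paper proposes a new machine learning algorithm that uses a flexible greedy optimization strategy to learn interpretable point-based clinical risk scores directly, optimizing additive scores under explicit optimality objectives.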

Comments: 23 pages, 4 figures

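AI abstract (translated from Chinese)

Many clinical risk scores are deployed as additive rules in which relevant binary predictive features are assigned nonnegative integer points. These integer weights not only make the score easier to use in practice but also promote sparsity in the resulting prediction model. Such risk scores are usually obtained by first fitting a regression model and then, after appropriate scaling, rounding the estimated coefficients to the nearest integer. This approach is computationally fast but does not guarantee that the final score is optimal. An alternative is to search over all possible integer weights and directly optimize a value function by treating the problem as an integer programming task. However, the associated computational burden can be considerable, especially when the value function is nonconcave or even discontinuous. In this paper, we develop new machine learning algorithms that use a flexible greedy optimization strategy to learn such additive scores directly under explicit and sensible optimality objectives. We apply the proposed method to a large electronic health record (EHR) cohort in Epic Cosmos to construct an integer-weighted comorbidity score that measures the risk of post-discharge mortality. We also conduct a simulation study to examine finite-sample operating characteristics.

English abstract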

Many clinical risk scores are deployed as additive rules with nonnegative integer points assigned to relevant binary predictive features. These integer weights not only make the score easier to use in practice but also promote sparsity in the resulting prediction model. Such risk scores are often derived by first fitting a regression model and then rounding the estimated coefficients to the nearest integer after appropriate scaling. This approach is computationally fast but does not guarantee optimality of the resulting score. Alternatively, one may search over all possible integer weights to directly optimize a value function by posing the problem as an integer programming task. However, the associated computational burden can be substantial, especially when the value function is nonconcave or even discontinuous. In this paper, we develop new machine learning algorithms that employ a flexible greedy optimization strategy to learn such additive scoring directly under explicit and sensible optimality objectives. We apply the proposed method to a large electronic health record (EHR) cohort in Epic Cosmos to construct an integer-weighted comorbidity score for measuring the risk of post-discharge mortality. We also conduct a simulation study to examine the finite-sample operating characteristics.
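
The conventional baseline described in the abstract (fit a regression, scale, round to integers) can be illustrated with a short sketch. This is not the greedy algorithm proposed in the paper. The scaling rule used here, mapping the largest coefficient to a chosen maximum number of points, is an assumption, since the abstract only says "appropriate scaling"; the coefficients are made-up values and the function names are hypothetical. The sketch assumes the largest coefficient is positive.

```python
import math

def round_to_points(coefficients, max_points):
    """Scale so the largest coefficient maps to max_points, then round half up."""
    scale = max_points / max(coefficients)
    # Negative coefficients are clipped to zero so every point value is nonnegative.
    return [max(0, math.floor(c * scale + 0.5)) for c in coefficients]

def risk_score(points, features):
    """Additive score: total points of the binary features that are present."""
    return sum(p for p, x in zip(points, features) if x == 1)

# Hypothetical fitted log-odds coefficients for three binary features.
points = round_to_points([0.21, 0.48, 1.0], max_points=5)
# A patient with the first and third features present.
score = risk_score(points, [1, 0, 1])
```

Rounding is applied coefficient by coefficient and never looks at the value function, which is why, as the abstract notes, the resulting score carries no optimality guarantee.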

2605.19100 2026-05-20 stat.CO stat.AP stat.ME

ldmppr: Location Dependent Marked Point Processes in R

Lane Drew, Andee Kaplan

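AI summary: This paper presents the ldmppr package for estimating, evaluating, simulating from, and visualizing location-dependent marked spatial point processes. The package addresses the fact that the usual assumption of independence between marks and locations is unreasonable in applications such as forestry, provides a practical framework for generating point processes with mark-location dependence, and demonstrates the package's use on real data.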

Comments: 31 pages, 5 figures

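AI abstract (translated from Chinese)

In this article, we present ldmppr, an R package for estimating, evaluating, simulating from, and visualizing location-dependent marked spatial point processes. To date, it has commonly been assumed that the marks of a point process are independent of the locations. For many point processes, such as those arising in forestry applications, this independence assumption is unreasonable. We introduce a practical framework for generating marked point processes with dependence between marks and locations. We briefly discuss the theory behind our modeling approach and outline how the package is used in a typical scenario with real data. We highlight the package's functionality for generating from a given model and for assessing its goodness of fit, which lets users generate realistic point patterns from a reference pattern or from parameter values of interest.

English abstract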

In this article, we present $\textbf{ldmppr}$, an R package for estimating, evaluating, simulating from, and visualizing location-dependent marked spatial point processes. To date, it has commonly been assumed that the marks associated with a point process are independent of the locations. However, when dealing with many point processes, such as those arising in forestry applications, the independence assumption proves unreasonable. We introduce a practical framework for generating marked point processes with dependence between the marks and locations. We provide a brief discussion of the theory underpinning our modeling approach and outline the use of the package in a typical scenario involving real data. We highlight the functionality of the package for both generating from and assessing the goodness-of-fit of a given model, enabling users to generate realistic point patterns given a reference pattern or parameter values of interest.

2605.19096 2026-05-20 math.NA cs.NA stat.CO

Sharp analysis of sketched least squares and randomized low-rank approximation

Ethan N. Epperly, Robert J. Webber

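AI summary: This paper studies which random embeddings are optimal in sketched least squares and randomized low-rank approximation, and what the sharpest error bounds are. It proves that a random orthonormal matrix is minimax optimal for the sketch-and-solve algorithm and that any rotation-invariant embedding is minimax optimal for the randomized SVD, and shows experimentally that different random embeddings achieve similar accuracy in practice.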

Comments: 23 pages, 2 figures

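AI abstract (translated from Chinese)

Two widely used randomized algorithms are the sketch-and-solve method for least-squares regression and the randomized SVD for low-rank approximation. These algorithms apply a random embedding to compress a target matrix and carry out computations on the compressed matrix to reduce computational cost. This paper asks: what is the optimal random embedding in these algorithms, and what is the sharpest error bound for the optimal embedding? It proves that a random orthonormal matrix is minimax optimal for the sketch-and-solve algorithm, while any rotation-invariant embedding is minimax optimal for the randomized SVD. It then obtains the best possible error bounds for sketched least squares and the randomized SVD. Finally, experiments provide evidence of universality phenomena, in which several random embeddings achieve accuracy similar to that of the optimal embeddings in practice.

English abstract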

Two widely used randomized algorithms are the sketch-and-solve method for least-squares regression and the randomized SVD for low-rank approximation. These algorithms apply a random embedding to compress a target matrix, and they perform computations on the compressed matrix to save computational cost. This paper asks, what is the optimal random embedding in these algorithms? Also, what is the sharpest possible error bound for the optimal embedding? The paper proves that a random orthonormal matrix is minimax optimal for the sketch-and-solve algorithm while any rotation-invariant embedding is minimax optimal for the randomized SVD. Following these results, the paper obtains the best possible error bounds for sketched least-squares and the randomized SVD. Last, empirical experiments provide evidence of universality phenomena, in which several random embeddings lead to similar accuracy to the optimal embeddings in practice.
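
The sketch-and-solve idea can be shown on a tiny problem. The example below is a hedged illustration in plain Python: it uses an i.i.d. Gaussian embedding rather than the random orthonormal matrix that the paper proves minimax optimal, it handles only a two-column design matrix so the normal equations can be solved by hand, and all function names and the synthetic data are assumptions of this sketch.

```python
import random

def least_squares_2col(rows, rhs):
    """Solve min_x ||A x - b||_2 for a two-column A via the 2x2 normal equations."""
    s11 = sum(a1 * a1 for a1, _ in rows)
    s12 = sum(a1 * a2 for a1, a2 in rows)
    s22 = sum(a2 * a2 for _, a2 in rows)
    t1 = sum(a1 * y for (a1, _), y in zip(rows, rhs))
    t2 = sum(a2 * y for (_, a2), y in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)

def gaussian_sketch(rows, rhs, d, rng):
    """Compress n rows to d rows by applying one d x n Gaussian matrix to A and b."""
    sk_rows, sk_rhs = [], []
    for _ in range(d):
        g = [rng.gauss(0.0, 1.0) for _ in rows]   # one row of the embedding S
        sk_rows.append((sum(gi * a1 for gi, (a1, _) in zip(g, rows)),
                        sum(gi * a2 for gi, (_, a2) in zip(g, rows))))
        sk_rhs.append(sum(gi * y for gi, y in zip(g, rhs)))
    return sk_rows, sk_rhs

def residual_sq(rows, rhs, x):
    """Squared residual ||A x - b||_2^2 on the full, uncompressed problem."""
    return sum((a1 * x[0] + a2 * x[1] - y) ** 2 for (a1, a2), y in zip(rows, rhs))

def sketch_and_solve(rows, rhs, d, rng):
    """Solve the compressed problem min_x ||S A x - S b||_2 instead of the full one."""
    sk_rows, sk_rhs = gaussian_sketch(rows, rhs, d, rng)
    return least_squares_2col(sk_rows, sk_rhs)

# Synthetic regression with an intercept column and one covariate.
rng = random.Random(0)
rows = [(1.0, rng.uniform(-1.0, 1.0)) for _ in range(200)]
rhs = [2.0 + 3.0 * a2 + rng.gauss(0.0, 0.5) for _, a2 in rows]
x_exact = least_squares_2col(rows, rhs)
x_sketch = sketch_and_solve(rows, rhs, 20, rng)
```

The sketched solution can never have a smaller full-problem residual than the exact least-squares solution; how much larger it is depends on the embedding and its dimension, which is the quantity the paper's error bounds characterize.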

2605.19095 2026-05-20 cs.LG cs.AI stat.ML

ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

Aaron Defazio

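AI summary: This paper proposes a learning-rate-free and schedule-free training method (ScheduleFree+) for large language models. The method significantly outperforms the conventional Warmup-Stable-Decay (WSD) schedule in large-scale training and demonstrates the effectiveness of schedule-free learning for long training runs.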

2605.19034 2026-05-20 stat.ME

Sparse Latent Class Analysis: Post-Estimation Refinement via Item-level Pseudo-Likelihood

Yuxuan Xu, Lea Kaufmann, Yunxiao Chen, Maria Kateri, Irini Moustaki

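AI summary: This paper proposes a computationally efficient post-estimation refinement method that improves model interpretability through a sparse model estimate. Model selection on the item-specific response probabilities merges redundant levels, yielding a sparse matrix that is easier to interpret.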

Comments: 32 pages, 4 figures

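AI abstract (translated from Chinese)

Latent class analysis (LCA) is widely used to identify unobserved subgroups in the social and behavioural sciences. A long-standing challenge for LCA is the interpretability of the latent classes, owing to the high complexity of the estimated item response probability matrix. To address this, we propose a computationally efficient post-estimation refinement procedure that improves interpretability through a sparse model estimate. The method first estimates a classical, unrestricted latent class model and determines the number of classes using the Bayesian information criterion (BIC). A refinement step then performs model selection on the item-specific response probabilities based on the initial estimate. The refinement penalises the number of distinct response probability levels per item and merges redundant levels, giving a sparse matrix that is markedly easier to interpret than the one produced by classical LCA. We provide asymptotic theory showing that the proposed procedure consistently recovers the sparsity pattern of each item's response probabilities, and further validate its performance through extensive simulations. The practical value of the method is illustrated by an application to survey data on social role performance, where it gives a parsimonious and clear characterisation of the latent classes. Code for the proposed method is publicly available at https://github.com/florence07/Sparse-LCA-Refinement.

English abstract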

Latent Class Analysis (LCA) is widely used to identify unobserved subgroups in social and behavioural sciences. A long-standing challenge for LCA is the interpretability of the latent classes, due to the high complexity of the estimated item response probability matrix. To address this, we propose a computationally efficient post-estimation refinement procedure that enhances model interpretability by a sparse model estimate. The method begins by estimating a classical, unrestricted, latent class model and determining the number of classes using the Bayesian information criterion (BIC). It is followed by a refinement step that further performs model selection on the item-specific response probabilities based on the initial estimate. This refinement penalises the number of distinct response probability levels per item, collapsing redundant levels to yield a sparse matrix that is significantly easier to interpret than those produced by classical LCA. We provide asymptotic theory showing that the proposed procedure consistently recovers the sparse pattern of the item response probabilities for each item, and further validate its performance through extensive simulations. The practical power of the proposed method is further illustrated via an application to survey data on social role performance, where it provides a parsimonious and clear characterisation of the resulting latent classes. The code for implementing the proposed method is publicly available at https://github.com/florence07/Sparse-LCA-Refinement.

2605.19024 2026-05-20 stat.ML cs.LG stat.ME

Conformal Prediction via Transported Beta Laws

Thiago R. Ramos, Helton Graziadei, Luben M. C. Cabezas

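AI summary: This paper studies the law of the calibration-conditional coverage induced by a realized conformal threshold. It uses the beta distribution as a finite-sample reference object and quantifies departures from it with Wasserstein distances, which yields direct bounds on marginal coverage gaps and bad-calibration probabilities and separates different sources of non-i.i.d. behavior.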

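AI abstract (translated from Chinese)

Split conformal prediction provides a finite-sample marginal coverage guarantee under exchangeability, but this guarantee averages over the random calibration sample. We instead study the law of the calibration-conditional coverage induced by a realized conformal threshold. In the continuous i.i.d. setting this law is exactly Beta(k, n+1-k), so the usual marginal guarantee corresponds to its mean. We take this beta law as a finite-sample reference object and quantify departures from it using Wasserstein distances on [0,1]. The framework gives direct bounds on marginal coverage gaps and on bad-calibration probabilities, and distinguishes different kinds of non-i.i.d. behavior by how they deform the beta reference: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. We instantiate the framework in scale-shift, clustered, and stationary mixing settings, where the induced deformations can be characterized explicitly or through Berry-Esseen approximations. Simulations on dependent processes confirm that the first-order approximation tracks the empirical Wasserstein distance at moderate sample sizes.

English abstract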

Split conformal prediction provides finite-sample marginal coverage under exchangeability, but this guarantee averages over the random calibration sample. We study instead the law of the calibration-conditional coverage induced by a realized conformal threshold. In the continuous i.i.d. setting this law is exactly $Beta(k,n+1-k)$, so the usual marginal guarantee corresponds to its mean. We take this beta law as a finite-sample reference object and quantify departures from it using Wasserstein distances on $[0,1]$. The framework yields direct bounds on marginal coverage gaps and on bad-calibration probabilities, and separates different sources of non-i.i.d. behavior according to how they deform the beta reference: test-side shift acts through a transport map on the coverage scale, while calibration dependence changes the order-statistic law itself. We instantiate the framework in scale-shift, clustered, and stationary mixing settings, where the induced deformations can be characterized explicitly or through Berry-Esseen approximations. Simulations on dependent processes confirm that the first-order approximation tracks the empirical Wasserstein distance even at moderate sample sizes.
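
The beta reference law in the continuous i.i.d. case can be checked with a small simulation. The sketch assumes, without loss of generality for continuous i.i.d. scores, that conformity scores are Uniform(0, 1), so the coverage attained by a threshold equals the threshold itself; `k` stands for the rank of the calibration score used as threshold, typically $\lceil (1-\alpha)(n+1) \rceil$. The function names and the values of `n`, `k`, and `reps` are illustrative assumptions, not taken from the paper.

```python
import random

def simulate_conditional_coverage(n, k, reps, seed=0):
    """Draw `reps` calibration sets of n i.i.d. Uniform(0, 1) scores and
    return the conditional coverage attained by each realized threshold."""
    rng = random.Random(seed)
    coverages = []
    for _ in range(reps):
        scores = sorted(rng.random() for _ in range(n))
        threshold = scores[k - 1]          # k-th smallest calibration score
        # For a Uniform(0, 1) test score, P(score <= threshold) = threshold.
        coverages.append(threshold)
    return coverages

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# n = 99 calibration points and k = 90 correspond to a 90% target level.
coverages = simulate_conditional_coverage(n=99, k=90, reps=2000)
ref_mean, ref_var = beta_mean_var(90, 99 + 1 - 90)
emp_mean = sum(coverages) / len(coverages)
emp_var = sum((c - emp_mean) ** 2 for c in coverages) / (len(coverages) - 1)
```

The empirical mean should sit close to $k/(n+1)$, the usual marginal guarantee, while the spread of `coverages` shows how far a single realized calibration set can land from that average.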

2605.15165 2026-05-20 cs.CY math.PR stat.AP

Due Process on Hold: A Queueing Framework for Improving Access in SNAP

Andrew Daw, Chloe Pache, Angela Zhou

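AI summary: This paper proposes a queueing-model-based evaluation framework for assessing and improving access in SNAP. By analyzing the interaction between system dynamics and access demand, it captures phenomena specific to social services, such as redials and abandonment, to inform the design of access systems and policy.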

AI中文摘要

美国的社会安全网在大规模提供基本服务，但访问障碍仍然存在，因为拥挤的联系方式或呼叫中心是申请完成和帮助的主要方式。在Holmes v. Knodell案中，密苏里州的SNAP呼叫中心如此拥挤，以至于近一半的申请拒绝是程序性的，而不是由于申请人无法完成必需的面试。法官裁定这些系统失败导致了程序正当程序的违反。我们提出了一种基于运营研究和管理的排队模型的性能评估框架，以评估和改进此类系统的访问。呼叫中心的操作访问失败与以往福利提供中的自动化失败不同。系统动态和访问需求之间的相互作用导致了新兴的任意性，而不是显式的算法规则，这使得诊断和修复本质上是系统层面的。我们开发了一个排队模型，该模型纳入了区分社会服务与标准服务领域的现象，如重拨和放弃，从而产生内生拥堵。标准的Erlang-A排队指导不考虑内生拥堵，从根本上低估了人员，这可能导致实践中持续的不足。使用流近似，我们推导出稳态性能指标，以分析捆绑人员和服务交付变化的影响。我们通过法院文件中披露的呼叫中心数据拟合模型参数。我们的排队模型可以支持访问系统的先验评估和设计，为改善访问提供政策杠杆，并提供证据说明申请人是否有机会在大规模上得到服务。

英文摘要

The U.S. social safety net delivers essential services at mass scale, but access burdens persist, as congested contact or call centers serve as a primary mode of application completion and assistance. In Holmes v. Knodell, Missouri's SNAP call centers were so congested that nearly half of all application denials were procedural, caused by applicants' inability to complete required interviews, rather than underlying ineligibility. The judge ruled these system failures led to a violation of procedural due process. We propose a performance evaluation framework based on queueing models from operations research and management to assess and improve access in such systems. Operational access failures of call centers are distinct from prior automation failures in benefits provision. Emergent arbitrariness arises from interactions between system dynamics and access demand, rather than from an explicit algorithmic rule, making diagnosis and repair inherently system-level. We develop a queueing model that incorporates phenomena that distinguish social services from standard service domains, redials and abandonment, through which backlogs generate endogenous congestion. Standard queueing guidance from Erlang-A that does not address endogenous congestion fundamentally understaffs, which could lead to persistent shortfalls in practice. Using a fluid approximation, we derive steady-state performance metrics to analytically characterize the impacts of bundled staffing and service delivery changes. We fit model parameters to call-center data disclosed in court documents. Our queueing model can support ex-ante evaluation and design of access systems, inform policy levers for improving access, and provide evidence about whether applicants are afforded a meaningful opportunity to be served at scale.


2605.09569 2026-05-20 math.ST cs.IT math.IT stat.ML stat.TH

Minimax optimal submatrix detection: Sharp non-asymptotic rates

子矩阵检测的最优最小最大方法：精确的非渐近速率

Parker Knight, Julien Chhor

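AI summary: This paper studies the detection of a planted submatrix in a mean matrix. It proposes non-asymptotically optimal tests, establishes a matching minimax lower bound, and extends the tests to adapt to unknown sparsity levels.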

Comments: 75 pages. Significant extension of our prior work arXiv:2505.18372

AI中文摘要

给定一个观察到的矩阵Y∈R^{d1×d2}，来自模型Y=X+E，其中X是常数，E具有i.i.d. N(0,1)的条目，我们考虑在均值矩阵X中检测植入子矩阵的问题。具体而言，我们旨在区分原假设X=0与备择假设，其中X仅在s1×s2的子矩阵上有非零条目，且这些条目下限被限制在μ>0。我们建立了最小最大下界，以确定μ必须多大才能确保在高概率下区分两个假设。此外，我们推导了新的最小最大最优测试，达到下界，并描述了这些测试的扩展，以适应未知的稀疏性水平s1和s2。与之前的工作不同，后者对s1,s2,d1和d2有严格的假设，而我们的非渐近上界和下界对于这些参数的任何配置都匹配。

英文摘要

Given an observation $\mathbf Y \in \mathbb{R}^{d_1\times d_2}$ from the model $\mathbf Y = \mathbf X + \mathbf E$ where $\mathbf X$ is constant and $\mathbf E$ has i.i.d. $N(0,1)$ entries, we consider the problem of detecting a planted submatrix in the mean matrix $\mathbf X$. Specifically, we aim to distinguish the null hypothesis $\mathbf X = 0$ from the alternative hypothesis in which $\mathbf X$ is non-zero only on a submatrix of size $s_1 \times s_2$ with elevated entries bounded below by $\mu>0$. We establish a minimax lower bound characterizing how large $\mu$ must be to ensure that the two hypotheses are distinguishable with high probability. Furthermore, we derive novel minimax-optimal tests achieving the lower bound, and describe extensions of these tests that are adaptive to unknown sparsity levels $s_1$ and $s_2$. In contrast with previous work, which required restrictive assumptions on $s_1,s_2, d_1$ and $d_2$, our non-asymptotic upper and lower bounds match for any configuration of these parameters.
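To make the testing problem concrete, here is a minimal sketch of the classical total-sum statistic, which is a simple baseline and not the minimax-optimal test derived in the paper. The helper names, the placement of the planted block in the top-left corner, and the threshold are all illustrative assumptions.

```python
import math
import random

def sum_test_statistic(Y):
    """Normalized total sum of entries; N(0, 1) under the null X = 0."""
    d1, d2 = len(Y), len(Y[0])
    return sum(sum(row) for row in Y) / math.sqrt(d1 * d2)

def sample_matrix(d1, d2, s1=0, s2=0, mu=0.0, seed=0):
    """Draw Y = X + E with X equal to mu on the top-left s1 x s2 block."""
    rng = random.Random(seed)
    return [[(mu if i < s1 and j < s2 else 0.0) + rng.gauss(0.0, 1.0)
             for j in range(d2)] for i in range(d1)]

threshold = 3.0  # a true null exceeds this with probability about 0.0013
null_stat = sum_test_statistic(sample_matrix(40, 40))
alt_stat = sum_test_statistic(sample_matrix(40, 40, s1=20, s2=20, mu=1.0))
print(f"null: {null_stat:.2f}, alternative: {alt_stat:.2f}")
```

Under the alternative the statistic is shifted by at least $\mu s_1 s_2 / \sqrt{d_1 d_2}$, so this baseline only has power when the planted block is large; very sparse blocks need a different kind of test.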


2605.05480 2026-05-20 cs.LG cs.AI stat.ML

GRALIS: A Unified Canonical Framework for Linear Attribution Methods via Riesz Representation

GRALIS：通过里斯表示建立线性归因方法的统一规范框架

Raimondo Fanale

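AI summary: This paper presents GRALIS, a framework that unifies linear attribution methods through the Riesz representation theorem. Seven formal theorems cover the canonical form, exact completeness, Monte Carlo convergence, Shapley interaction values, the Hoeffding ANOVA decomposition, a Sobol sensitivity generalization, and a multi-scale extension. Preliminary validation results on medical images are also reported.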

Comments: 25 pages, 6 tables, 2 figures. Theoretical framework with preliminary experimental validation on BreaKHis (1,187 images, DenseNet-121). Extended empirical comparison in preparation

AI中文摘要

深度神经网络的主要XAI归因方法——GradCAM、SHAP、LIME、集成梯度——基于不同的理论基础且无法正式比较。我们提出了GRALIS（梯度-里斯平均局部积分Shapley），一个建立归因表示理论的数学框架：L^2(Q, mu)上的每一个可加、线性和连续的归因功能都具有唯一的规范表示（Q，w，Delta），由里斯表示定理证明其必要性。该类包括SHAP、IG、LIME和线性化GradCAM，但不包括非线性功能如标准GradCAM或注意力图。七个形式定理提供了任何单个方法都缺乏的同时保证：（T1）必要规范形式；（T2）精确完备性；（T3）蒙特卡洛收敛O(1/sqrt(m))+O(1/k)；（T4）精确Shapley交互值；（T5）Hoeffding ANOVA分解；（T6）Sobol敏感性泛化；（T7）多尺度扩展（MS-GRALIS）具有最小方差权重。代数附录通过Mobius变换证明GRALIS-SIV对应关系，无需循环论证。GRALIS满足13.5/14个公理性质，而单独方法仅为2.5-6/14，包括完备性、敏感性、局部性、k阶交互和最优多尺度聚合。在BreaKHis（1,187例病理图像，DenseNet-121）上的初步验证报告删除忠实度AUC+0.015（恶性），96%类条件一致性，SAL=0.762±0.109和稀疏性指数0.39。与基线XAI方法的扩展比较计划在配套论文中进行。

英文摘要

The main XAI attribution methods for deep neural networks -- GradCAM, SHAP, LIME, Integrated Gradients -- operate on separate theoretical foundations and are not formally comparable. We present GRALIS (Gradient-Riesz Averaged Locally-Integrated Shapley), a mathematical framework establishing a representation theory for attributions: every additive, linear, and continuous attribution functional on L^2(Q,mu) admits a unique canonical representation (Q, w, Delta), proved necessary by the Riesz Representation Theorem. This class encompasses SHAP, IG, LIME and linearized GradCAM, but excludes nonlinear functionals such as standard GradCAM or attention maps. Seven formal theorems provide simultaneous guarantees absent in any individual method: (T1) necessary canonical form; (T2) exact completeness; (T3) Monte Carlo convergence O(1/sqrt(m))+O(1/k); (T4) exact Shapley Interaction Values; (T5) Hoeffding ANOVA decomposition; (T6) Sobol sensitivity generalization; (T7) multi-scale extension (MS-GRALIS) with minimum-variance weights. An algebraic appendix justifies the GRALIS-SIV correspondence via the Mobius transform without circularity. GRALIS satisfies 13.5/14 axiomatic properties vs. 2.5-6/14 for individual methods, including completeness, sensitivity, locality, order-k interactions and optimal multi-scale aggregation simultaneously. Preliminary validation on BreaKHis (1,187 histology images, DenseNet-121) reports deletion faithfulness AUC +0.015 (malignant), 96% class-conditional consistency, SAL = 0.762+/-0.109 and sparsity index 0.39. Extended comparison with baseline XAI methods is planned for a companion paper.


2605.02099 2026-05-20 math.ST stat.TH

Entropic Strict Minimum Message Length and Its Connections to PAC-Bayes and NML

熵严格最小信息量及其与PAC-Bayes和NML的联系

Enes Makalic, Daniel F. Schmidt

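AI summary: This paper introduces entropic strict minimum message length (SMML), a risk-sensitive generalization of strict minimum message length coding. The criterion replaces the expected two-part codelength under the prior predictive distribution with an exponential certainty equivalent, defining a one-parameter family of coding rules that interpolates between Bayesian average-case coding and worst-case minimax coding. Ordinary SMML is recovered in the risk-neutral limit, while the extreme risk-sensitive limit yields a minimax codelength criterion; when centered by the oracle maximum likelihood codelength, this criterion coincides with the normalized maximum likelihood (NML) minimax-regret principle. Entropic SMML also admits a variational characterization as a Kullback-Leibler-regularized worst-case expected codelength, giving it a PAC-Bayes-type interpretation. A joint asymptotic theory in the sample size n and the risk parameter τ shows that, in regular parametric models, the transition between the Bayesian, robust, and minimax coding regimes occurs on a logarithmic scale. For regular exponential families, the fixed-codebook partition remains affine in sufficient-statistic space, while the codepoints satisfy a tilted moment-matching condition and can be interpreted as tilted Bregman centroids. These results position entropic SMML as an information-theoretic bridge between MML, PAC-Bayes, and MDL.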

AI中文摘要

我们引入了熵严格最小信息量（SMML），一种风险敏感的严格最小信息量编码扩展。所提出的准则将先验预测分布下的期望两部分码长替换为指数确定等价，从而定义了一族参数化的编码规则，介于贝叶斯平均情况编码和最坏情况最小最大编码之间。我们展示普通SMML在风险中性极限下被恢复，而极端风险敏感极限产生最小最大码长准则；当以oracle最大似然码长为中心时，该准则与规范化最大似然（NML）最小遗憾原则一致。我们进一步证明熵SMML可作为Kullback-Leibler正则化最坏情况期望码长的变分刻画，具有PAC-Bayes类型的解释。我们建立了样本量n和风险参数τ的联合渐近理论，显示在正则参数模型中，贝叶斯、鲁棒和最小最大编码制度的过渡发生在对数尺度上。对于正则指数族，固定码书分区保持在充分统计量空间中为仿射，而码点满足倾斜矩匹配条件并可解释为倾斜Bregman重心。这些结果将熵SMML定位为信息论上连接MML、PAC-Bayes和MDL的桥梁。

英文摘要

We introduce entropic strict minimum message length (SMML), a risk-sensitive generalization of strict minimum message length coding. The proposed criterion replaces expected two-part codelength under the prior predictive distribution with an exponential certainty equivalent, thereby defining a one-parameter family of coding rules that interpolates between Bayesian average-case coding and worst-case minimax coding. We show that ordinary SMML is recovered in the risk-neutral limit, while the extreme risk-sensitive limit yields a minimax codelength criterion; when centered by the oracle maximum likelihood codelength, this criterion coincides with the normalized maximum likelihood (NML) minimax-regret principle. We further prove that entropic SMML admits a variational characterization as a Kullback-Leibler-regularized worst-case expected codelength, giving it a PAC-Bayes-type interpretation. We establish a joint asymptotic theory linking the sample size $n$ and the risk parameter $\tau$, showing that in regular parametric models the transition between Bayesian, robust, and minimax coding regimes occurs on a logarithmic scale. For regular exponential families, the fixed-codebook partition remains affine in sufficient-statistic space, while the codepoints satisfy a tilted moment-matching condition and admit an interpretation as tilted Bregman centroids. These results position entropic SMML as an information-theoretic bridge between MML, PAC-Bayes, and MDL.
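The interpolation between average-case and worst-case coding can be illustrated numerically. The sketch below assumes the standard form of an exponential certainty equivalent, $\frac{1}{\tau}\log \mathbb{E}[e^{\tau L}]$, which the abstract does not spell out; the three codelengths and their probabilities are hypothetical.

```python
import math

def certainty_equivalent(lengths, probs, tau):
    """(1 / tau) * log E[exp(tau * L)], computed with a log-sum-exp shift."""
    shift = max(lengths)
    total = sum(p * math.exp(tau * (length - shift))
                for length, p in zip(lengths, probs))
    return shift + math.log(total) / tau

lengths = [3.0, 5.0, 12.0]  # hypothetical codelengths
probs = [0.6, 0.3, 0.1]     # hypothetical prior predictive probabilities

expected = sum(p * length for length, p in zip(lengths, probs))
near_neutral = certainty_equivalent(lengths, probs, 1e-6)  # tau -> 0
near_minimax = certainty_equivalent(lengths, probs, 50.0)  # large tau
print(f"expected codelength: {expected:.3f}")
print(f"small tau: {near_neutral:.3f}, large tau: {near_minimax:.3f}")
```

For small τ the value approaches the expected codelength, and for large τ it approaches the longest codelength, which matches the risk-neutral and minimax limits described in the abstract.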


2604.19265 2026-05-20 stat.ME

From design of experiments to analysis of variance of multivariate data: a tutorial review on ANOVA simultaneous component analysis

从实验设计到多变量数据方差分析：关于ANOVA同时成分分析的教程综述

José Camacho, Jokin Ezenarro, Daniel Schorn-García, Johan A. Westerhuis

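AI summary: This tutorial review covers ANOVA simultaneous component analysis (ASCA) for analyzing high-dimensional experimental data, explains how it combines with design of experiments, and proposes best practices for its use.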

2604.18739 2026-05-20 cs.LG stat.ML

Discrete Tilt Matching

离散倾斜匹配

Yuyuan Chen, Shiyi Wang, Peter Potaptchik, Jaeyeon Kim, Michael S. Albergo

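AI summary: This paper proposes Discrete Tilt Matching, a likelihood-free method for fine-tuning diffusion large language models. It uses state-level matching of local unmasking posteriors to improve training stability and prevent mode collapse.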

AI中文摘要

Masked diffusion large language models (dLLMs) 是一种有前景的替代自回归生成方法。尽管最近强化学习 (RL) 方法已被适应到 dLLM 微调中，但其目标通常依赖于序列级边际似然，这在掩码扩散模型中是不可行的。为了解决这个问题，我们推导出离散倾斜匹配 (DTM)，一种无需概率模型的方法，将 dLLM 微调重新表述为在奖励倾斜下局部解掩码后验的状态级匹配。DTM 以加权交叉熵目标形式出现，具有显式的最小化器，并且允许控制变体以提高训练稳定性。在合成迷宫规划任务中，我们分析了 DTM 的退火计划和控制变体如何影响训练稳定性并防止模式崩溃。在大规模情况下，使用 DTM 微调 LLaDA-8B-Instruct 在 Sudoku 和 Countdown 任务上表现出强劲的提升，同时在 MATH500 和 GSM8K 任务上保持竞争力。

英文摘要

Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.


2603.14918 2026-05-20 stat.ML cs.LG

Bayesian Symbolic Regression for Missing Physics

贝叶斯符号回归用于缺失物理

Arno Strouwen

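AI summary: This paper proposes a Bayesian symbolic regression method for learning missing physics from experimental data, using Reversible Jump Markov Chain Monte Carlo to quantify uncertainty in the model structure.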

Comments: 6 pages, 4 figures. Accepted at IFAC World Congress 2026. v2: updated title and results for camera-ready version

AI中文摘要

基于模型的方法用于(bio)过程系统时，往往面临对底层物理、化学或生物定律不完整知识的挑战。通用微分方程，将神经网络嵌入微分方程中，已发展为从实验数据中学习缺失物理的强大工具。然而，神经网络本质上是不透明的，因此需要通过符号回归进行后处理以获得可解释的数学表达式。基于遗传算法的符号回归是这种后处理步骤的流行方法，但只能提供点估计，无法量化发现方程的置信度。我们通过应用贝叶斯符号回归来解决这一限制，该方法使用Reversible Jump Markov Chain Monte Carlo在符号表达式树的后验分布上采样。这种方法自然地量化了恢复模型结构的不确定性。我们通过Lotka-Volterra捕食者-猎物系统演示了该方法，然后展示了精心设计的实验如何在Fed-batch生物反应器案例研究中降低不确定性。

英文摘要

Model-based approaches for (bio)process systems often suffer from incomplete knowledge of the underlying physical, chemical, or biological laws. Universal differential equations, which embed neural networks within differential equations, have emerged as powerful tools to learn this missing physics from experimental data. However, neural networks are inherently opaque, motivating their post-processing via symbolic regression to obtain interpretable mathematical expressions. Genetic algorithm-based symbolic regression is a popular approach for this post-processing step, but provides only point estimates and cannot quantify the confidence we should place in a discovered equation. We address this limitation by applying Bayesian symbolic regression, which uses Reversible Jump Markov Chain Monte Carlo to sample from the posterior distribution over symbolic expression trees. This approach naturally quantifies uncertainty in the recovered model structure. We demonstrate the methodology on a Lotka-Volterra predator-prey system and then show how a well-designed experiment leads to lower uncertainty in a fed-batch bioreactor case study.


2603.07018 2026-05-20 stat.ME cs.LG econ.EM

TEA-Time: Transporting Effects Across Time

TEA-Time: 跨时间效应传输

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

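AI summary: This paper proposes a method for transporting treatment effects across time. It formalizes the transported average treatment effect under a separable temporal effects assumption, derives two identification strategies (replicated trials and common arm), and develops doubly robust, semiparametrically efficient estimators for each.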

AI中文摘要

从随机对照试验中估计的处理效应不仅局限于研究人群，还局限于试验进行的时间。关于将实验结果推广到新人群的文献非常广泛，但跨时间传输效应却受到较少关注，甚至定义目标估计量也并不明显。我们正式化了在可分离的时变效应假设下的传输平均处理效应，推导出两种识别策略：重复试验和共同臂，并为每种策略开发双重稳健、半参数高效估计器。应用于一个大型的头条A/B测试档案库，共同臂策略在精度上显著更高，但当时间因素依赖于干预与测量之间的间隔而非单独的测量时间时，会表现出系统性偏差，而允许这种依赖的重复试验策略则更忠实于真实情况。模拟研究探讨了每种策略在何时可靠以及何时会无声地失败。

英文摘要

Treatment effects estimated from a randomized controlled trial are local not only to the study population but also to the time at which the trial was conducted. The literature on generalizing experimental findings to new populations is extensive, yet transporting effects across time has received far less attention, and even defining the target estimand is nonobvious. We formalize the transported average treatment effect under a separable temporal effects assumption, derive two identification strategies: replicated trials and common arm, and develop doubly robust, semiparametrically efficient estimators for each. Applied to a large archive of headline A/B tests, the common arm strategy is substantially more precise but exhibits systematic bias when the temporal factor depends on the gap between intervention and measurement rather than on measurement time alone, while the replicated trials strategy, which allows this dependence, tracks the ground truth more faithfully. Simulation studies investigate when each strategy is reliable and when it silently fails.


2601.18178 2026-05-20 math.ST stat.ME stat.TH

Asymptotic properties of the multivariate Szász-Mirakyan estimator for cumulative distribution functions on the nonnegative orthant

多变量Szász-Mirakyan估计器在非负正交量上的累积分布函数的渐近性质

Guanjie Lyu, Frédéric Ouimet, Cindy Feng

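AI summary: This paper studies the asymptotic properties of multivariate Szász-Mirakyan estimators for cumulative distribution functions on the nonnegative orthant. It derives explicit bias and variance expansions on compact subsets of the interior, which give sharp mean squared error characterizations and optimal smoothing rates. The analysis shows that the proposed Poisson smoothing yields a non-negligible variance reduction relative to the empirical distribution function, leading to asymptotic efficiency gains that can be quantified through local and global deficiency measures. The estimator's behavior near the boundary of the support is examined separately: under a boundary-layer scaling that preserves nondegenerate Poisson smoothing, the bias and variance expansions differ fundamentally from those in the interior. In particular, the variance reduction mechanism disappears at leading order, so no asymptotically optimal smoothing parameter exists in the boundary regime. Central limit theorems and almost sure uniform consistency are also established. Together, these results provide a unified asymptotic theory for multivariate Szász-Mirakyan distribution function estimation and clarify the distinct roles of smoothing in the interior and boundary regions.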

Comments: 40 pages, 3 figures, 3 tables

AI中文摘要

多变量Szász-Mirakyan估计器在非负正交量上累积分布函数的渐近性质被研究。在紧致子集的内部中推导了显式的偏倚和方差展开，从而得到了精确的均方误差刻画和最优平滑速率。分析表明，所提出的泊松平滑相对于经验分布函数具有非可忽略的方差减少，从而带来可通过局部和全局缺陷度量量化渐近效率提升。此外，还分别研究了估计器在支持区域边界附近的性质。在保持非退化泊松平滑的边界层缩放下，当评价点接近$[0,\infty)^d$的边界时，得到的偏倚和方差展开与内部区域有根本性差异。特别是，方差减少机制在最高阶消失，意味着在边界区域不存在渐近最优的平滑参数。还建立了中心极限定理和几乎处处一致性的结果。这些结果为多变量Szász-Mirakyan分布函数估计提供了统一的渐近理论，并阐明了平滑在内部和边界区域中的不同作用。

英文摘要

The asymptotic properties of multivariate Szász-Mirakyan estimators for cumulative distribution functions (cdf) supported on the nonnegative orthant are investigated. Explicit bias and variance expansions are derived on compact subsets of the interior, yielding sharp mean squared error characterizations and optimal smoothing rates. The analysis shows that the proposed Poisson smoothing yields a non-negligible variance reduction relative to the empirical cdf, leading to asymptotic efficiency gains that can be quantified through local and global deficiency measures. The behavior of the estimator near the boundary of its support is examined separately. Under a boundary-layer scaling that preserves nondegenerate Poisson smoothing as the evaluation point approaches the boundary of $[0,\infty)^d$, bias and variance expansions are obtained that differ fundamentally from those in the interior region. In particular, the variance reduction mechanism disappears at leading order, implying that no asymptotically optimal smoothing parameter exists in the boundary regime. Central limit theorems and almost sure uniform consistency are also established. Together, these results provide a unified asymptotic theory for multivariate Szász-Mirakyan cdf estimation and clarify the distinct roles of smoothing in the interior and boundary regions.
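A univariate version of Poisson smoothing can be sketched as follows. This assumes the estimator has the standard Szász-Mirakyan operator form applied to the empirical cdf, $\hat F_{n,m}(x) = \sum_{k \ge 0} F_n(k/m)\, e^{-mx} (mx)^k / k!$, which the abstract does not state explicitly; the function name, the truncation rule, and the exponential test sample are illustrative.

```python
import bisect
import math
import random

def szasz_mirakyan_cdf(sample, x, m):
    """Poisson-smoothed empirical cdf at x >= 0 (univariate sketch)."""
    data = sorted(sample)
    n = len(data)
    lam = m * x
    if lam == 0.0:
        return bisect.bisect_right(data, 0.0) / n  # all weight sits on k = 0
    k_max = int(lam + 10.0 * math.sqrt(lam) + 20.0)  # truncate the far tail
    total = 0.0
    for k in range(k_max + 1):
        log_w = -lam + k * math.log(lam) - math.lgamma(k + 1)  # Poisson pmf
        ecdf = bisect.bisect_right(data, k / m) / n            # F_n(k / m)
        total += math.exp(log_w) * ecdf
    return total

rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(2000)]  # Exp(1) test data
estimate = szasz_mirakyan_cdf(sample, x=1.0, m=50)
truth = 1.0 - math.exp(-1.0)
print(f"smoothed estimate {estimate:.3f}, true cdf {truth:.3f}")
```

Here `m` plays the role of the smoothing parameter: larger values concentrate the Poisson weights around x and bring the estimate closer to the raw empirical cdf.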


2601.14234 2026-05-20 cs.LG cs.AI cs.RO stat.ML

Q-learning with Adjoint Matching

具有伴随匹配的Q学习

Qiyang Li, Sergey Levine

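AI summary: This paper proposes QAM, a temporal-difference-based reinforcement learning algorithm that addresses a long-standing challenge in continuous-action reinforcement learning: efficiently optimizing an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires the critic's first-order information, but direct gradient-based optimization by backpropagating through the policy's multi-step denoising process is numerically unstable. Existing methods either use only the value and discard gradient information, or rely on approximations that sacrifice policy expressivity or bias the learned policy. QAM instead uses adjoint matching, a recently proposed technique from generative modeling, to transform the critic's action gradient into a step-wise objective that avoids unstable backpropagation while yielding an unbiased, expressive policy at the optimum. Combined with temporal-difference backups for critic learning, QAM consistently outperforms prior methods on hard, sparse-reward tasks in offline and offline-to-online reinforcement learning.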

Comments: 32 pages, 8 figures, 7 tables

AI中文摘要

我们提出QAM，一种新颖的基于时序差分的强化学习（RL）算法，解决了连续动作RL中长期存在的挑战：高效优化表达性强的扩散或流匹配策略相对于参数化的Q函数。有效的优化需要利用批评者的首阶信息，但通过反向传播其多步去噪过程进行直接梯度优化在数值上不稳定。现有方法通过仅使用价值和丢弃梯度信息或依赖近似方法牺牲策略的表达性或偏置学习策略。QAM通过利用生成建模中最近提出的技术伴随匹配，将批评者的动作梯度转换为逐步目标函数，避免了不稳定反向传播，同时在最优时提供无偏且表达性强的策略。结合时序差分备份进行批评者学习，QAM在离线和离线到在线RL的硬稀疏奖励任务中一致优于先前方法。

英文摘要

We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.


2601.12707 2026-05-20 cs.LG stat.ML

Decoding Rewards in Competitive Games: Inverse Game Theory with Entropy Regularization

在竞争性游戏中解码奖励：带有熵正则化的逆向博弈论

Junyi Liao, Zihan Zhu, Ethan Fang, Zhuoran Yang, Vahid Tarokh

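AI summary: This paper studies the recovery of unknown reward functions in competitive games through inverse game theory with entropy regularization. It proposes a unified framework that learns reward functions in both static and dynamic settings and validates it with theoretical guarantees and numerical experiments.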

Comments: Extended journal version of ICML 2025 paper. Submitted to Operations Research

AI中文摘要

估计驱动智能体行为的未知奖励函数在逆向强化学习和博弈论中具有核心重要性。为解决这个问题，我们开发了一个统一的框架，用于在两名玩家零和矩阵博弈和马尔可夫博弈中恢复奖励函数，并通过熵正则化来重建给定观察到的玩家策略和动作的潜在奖励函数。这项任务具有挑战性，因为逆向问题固有的模糊性、可行奖励的非唯一性和观察数据覆盖的限制。为了解决这些挑战，我们利用线性假设在量级响应均衡（QRE）下建立了奖励函数的可识别性。在此理论基础上，我们提出了一种新的算法，从观察到的动作中学习奖励函数。我们的算法适用于静态和动态设置，并且可以适应不同方法，如最大似然估计（MLE）。我们为算法的可靠性和样本效率提供了强有力的理论保证。进一步，我们进行了广泛的数值研究，以证明所提出框架的实际有效性，为竞争环境中的决策提供了新的见解。

英文摘要

Estimating the unknown reward functions driving agents' behaviors is of central interest in inverse reinforcement learning and game theory. To tackle this problem, we develop a unified framework for reward function recovery in two-player zero-sum matrix games and Markov games with entropy regularization, where we aim to reconstruct the underlying reward functions given observed players' strategies and actions. This task is challenging due to the inherent ambiguity of inverse problems, the non-uniqueness of feasible rewards, and limited observational data coverage. To address these challenges, we establish the reward function's identifiability using the quantal response equilibrium (QRE) under linear assumptions. Building upon this theoretical foundation, we propose a novel algorithm to learn reward functions from observed actions. Our algorithm works in both static and dynamic settings and is adaptable to incorporate different methods, such as Maximum Likelihood Estimation (MLE). We provide strong theoretical guarantees for the reliability and sample efficiency of our algorithm. Further, we conduct extensive numerical studies to demonstrate the practical effectiveness of the proposed framework, offering new insights into decision-making in competitive environments.


2601.12238 2026-05-20 stat.ML cs.LG math.OC

On the Provable Suboptimality of Momentum SGD in Nonstationary Stochastic Optimization

关于动量SGD在非平稳随机优化中的可证明次优性的研究

Sharan Sahu, Cameron J. Hogan, Martin T. Wells

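AI summary: This paper studies how stochastic gradient descent and its momentum variants (Polyak heavy-ball and Nesterov) track time-varying optima under strong convexity and smoothness. It shows that momentum incurs an explicit drift-amplification penalty under distribution shift and proves that this penalty is an information-theoretic barrier rather than an artifact of the analysis, giving theoretical grounding for the empirical instability of momentum in dynamic settings.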

Comments: Accepted to ICML 2026. 75 pages, 5 figures, 4 tables

AI中文摘要

在本文中，我们对随机梯度下降（SGD）及其动量变体（Polyak重球和Nesterov）在强凸性和光滑性条件下跟踪时间变化最优解进行了全面的理论分析。我们的有限时间界揭示了跟踪误差的尖锐分解，将其分为瞬态、噪声诱导和漂移诱导成分。这种分解揭示了一个根本性的权衡：虽然动量通常被用作梯度平滑启发式方法，但在分布偏移下，它会引入显式漂移放大惩罚，当动量参数β接近1时，该惩罚会发散，导致系统性的跟踪滞后。我们通过梯度变化约束下的最小最大下界补充这些上界，证明这种动量引起的跟踪惩罚并非分析伪影，而是信息论障碍：在漂移主导的 regime 中，动量不可避免地更差，因为旧梯度平均迫使系统性滞后。我们的结果为动量方法在动态环境中的经验不稳定性提供了理论依据，并精确界定了 vanilla SGD 在其加速变体中可证明表现更好的 regime 分界线。

英文摘要

In this paper, we provide a comprehensive theoretical analysis of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak Heavy-Ball and Nesterov) for tracking time-varying optima under strong convexity and smoothness. Our finite-time bounds reveal a sharp decomposition of tracking error into transient, noise-induced, and drift-induced components. This decomposition exposes a fundamental trade-off: while momentum is often used as a gradient-smoothing heuristic, under distribution shift it incurs an explicit drift-amplification penalty that diverges as the momentum parameter $β$ approaches 1, yielding systematic tracking lag. We complement these upper bounds with minimax lower bounds under gradient-variation constraints, proving this momentum-induced tracking penalty is not an analytical artifact but an information-theoretic barrier: in drift-dominated regimes, momentum is unavoidably worse because stale-gradient averaging forces systematic lag. Our results provide theoretical grounding for the empirical instability of momentum in dynamic settings and precisely delineate regime boundaries where vanilla SGD provably outperforms its accelerated counterparts.


2512.05650 2026-05-20 stat.ME stat.CO

Efficient sequential Bayesian inference for state-space epidemic models using ensemble data assimilation

基于集合数据同化的有效序贯贝叶斯推断用于状态空间流行病模型

Dhorasso Temfack, Jason Wyse

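AI summary: This paper proposes an efficient sequential Bayesian inference method for state-space epidemic models that replaces the conventional particle filter with ensemble data assimilation, improving computational efficiency while preserving inferential accuracy.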

AI中文摘要

从部分观测、噪声数据中估计隐含的流行病状态和模型参数仍然是流行病学建模中的主要挑战。状态空间公式提供了一种连贯的概率框架用于此类推断，但完全贝叶斯估计通常计算上是不可行的，因为评估观测数据似然需要对潜在轨迹进行积分。序贯蒙特卡洛平方（SMC$^2$）算法提供了一种联合状态和参数推断的系统方法，结合了对外部参数的SMC采样器和内部粒子滤波器，用于估计当前时间点的似然。尽管这种嵌套粒子滤波器在理论上具有吸引力，但其计算成本很高，限制了在接近实时爆发响应中的常规使用。我们提出集合SMC$^2$（eSMC$^2$），一种计算上更高效的变体，用集合卡尔曼滤波（EnKF）替代内部粒子滤波器，以在每个观测时间点近似增量似然。虽然这种替代引入了通过高斯近似带来的偏差，但我们通过使用无偏高斯密度估计器和通过状态依赖的观测方差适应EnKF来减轻有限样本效应。这使我们的方法特别适合于在流行病监测中常见的超分散发病率数据。具有已知真实值的模拟实验和对2022年美国猴痘发病率数据的应用表明，eSMC$^2$在实现显著的计算优势的同时，产生的后验估计与SMC$^2$相当。该方法能够准确重建流行病轨迹并估计关键流行病学参数，提供了一个高效的框架用于从不完美的监测数据中进行序贯贝叶斯推断。

英文摘要

Estimating latent epidemic states and model parameters from partially observed, noisy data remains a major challenge in infectious disease modeling. State-space formulations provide a coherent probabilistic framework for such inference, yet fully Bayesian estimation is often computationally prohibitive because evaluating the observed-data likelihood requires integration over a latent trajectory. The Sequential Monte Carlo squared (SMC$^2$) algorithm offers a principled approach for joint state and parameter inference, combining an outer SMC sampler over parameters with an inner particle filter that estimates the likelihood up to the current time point. Despite its theoretical appeal, this nested particle filter imposes substantial computational cost, limiting routine use in near-real-time outbreak response. We propose Ensemble SMC$^2$ (eSMC$^2$), a computationally efficient variant that replaces the inner particle filter with an Ensemble Kalman Filter (EnKF) to approximate the incremental likelihood at each observation time. While this substitution introduces bias via a Gaussian approximation, we mitigate finite-sample effects using an unbiased Gaussian density estimator and adapt the EnKF for epidemic data through state-dependent observation variance. This makes our approach particularly suitable for overdispersed incidence data commonly encountered in infectious disease surveillance. Simulation experiments with known ground truth and an application to 2022 United States (U.S.) monkeypox incidence data demonstrate that eSMC$^2$ achieves substantial computational gains while producing posterior estimates comparable to SMC$^2$. The method accurately reconstructs epidemic trajectories and estimates key epidemiological parameters, providing an efficient framework for sequential Bayesian inference from imperfect surveillance data.
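The idea of replacing the inner particle filter with a Gaussian approximation of the incremental likelihood can be sketched for a scalar observation. This is the plain plug-in version, assuming a fixed observation variance; it does not implement the paper's unbiased Gaussian density estimator or its state-dependent observation variance, and the function and argument names are hypothetical.

```python
import math
import statistics

def enkf_incremental_loglik(predicted_obs, y, obs_var):
    """Gaussian plug-in approximation to log p(y_t | y_{1:t-1}).

    predicted_obs: forecast ensemble members mapped to observation space.
    obs_var: observation noise variance (held fixed in this sketch).
    """
    mean = statistics.fmean(predicted_obs)
    innovation_var = statistics.variance(predicted_obs) + obs_var
    resid = y - mean
    return -0.5 * (math.log(2.0 * math.pi * innovation_var)
                   + resid * resid / innovation_var)

# Toy ensemble: sample mean 2, sample variance 1, so innovation variance 2.
loglik = enkf_incremental_loglik([1.0, 2.0, 3.0], y=2.0, obs_var=1.0)
print(f"incremental log-likelihood: {loglik:.4f}")
```

In an SMC$^2$-style sampler, a term like this would be accumulated over observation times for each parameter particle, in place of the likelihood estimate from an inner particle filter.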


2511.01126 2026-05-20 cs.LG cs.NA math.NA math.OC math.ST stat.TH

Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

在线零阶和一阶双层优化的随机遗憾保证

Parvin Nazari, Bojian Hou, Davoud Ataee Tarzanagh, Li Shen, George Michailidis

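AI summary: This paper introduces a new search direction and proves that zeroth- and first-order stochastic online bilevel optimization algorithms using it achieve sublinear stochastic bilevel regret without window smoothing. The framework also improves efficiency by reducing oracle dependence in hypergradient estimation, updating inner and outer variables alongside the linear system solution, and using zeroth-order estimates of Hessians, Jacobians, and gradients.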

Comments: Published at NeurIPS 2025

AI中文摘要

在线双层优化（OBO）是一种强大的框架，用于解决机器学习问题，其中外层和内层目标随时间演变，需要动态更新。当前的OBO方法依赖于确定性的窗口平滑后悔最小化，这在函数变化迅速时可能无法准确反映系统性能。在本文中，我们引入了一种新的搜索方向，并证明利用该方向的零阶和一阶随机OBO算法能够在不使用窗口平滑的情况下实现亚线性随机双层遗憾。除了这些保证外，我们的框架通过以下方式提高效率：（i）减少超梯度估计中的oracle依赖，（ii）在求解线性系统的同时更新内层和外层变量，（iii）使用基于零阶的Hessian、雅可比和梯度估计。在在线参数损失调谐和黑盒对抗攻击的实验中验证了我们的方法。

英文摘要

Online bilevel optimization (OBO) is a powerful framework for machine learning problems where both outer and inner objectives evolve over time, requiring dynamic updates. Current OBO approaches rely on deterministic window-smoothed regret minimization, which may not accurately reflect system performance when functions change rapidly. In this work, we introduce a novel search direction and show that both first- and zeroth-order (ZO) stochastic OBO algorithms leveraging this direction achieve sublinear stochastic bilevel regret without window smoothing. Beyond these guarantees, our framework enhances efficiency by: (i) reducing oracle dependence in hypergradient estimation, (ii) updating inner and outer variables alongside the linear system solution, and (iii) employing ZO-based estimation of Hessians, Jacobians, and gradients. Experiments on online parametric loss tuning and black-box adversarial attacks validate our approach.


2510.20035 2026-05-20 stat.ME cs.LG

Throwing Vines at the Wall: Structure Learning via Random Search

向墙上投掷藤蔓：通过随机搜索进行结构学习

Thibault Vatter, Thomas Nagler

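AI summary: This paper proposes a statistical framework based on model confidence sets, together with random search algorithms, to improve structure selection. It provides theoretical guarantees and lays a foundation for ensembling.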

2510.19382 2026-05-20 stat.ML cs.LG

A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond

一种用于结构发现的去随机化框架：应用于神经网络及其他领域

Nikos Tsikouras, Yorgos Pantis, Ioannis Mitliagkas, Christos Tzamos

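AI summary: This paper addresses the dynamics of feature learning in neural networks. It proposes a derandomization-based framework for structure discovery, studies structure discovery under weaker assumptions, and gives applications to end-to-end approximation for MAXCUT and to computing Johnson-Lindenstrauss embeddings.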

AI中文摘要

理解神经网络中特征学习动态的机制仍然是一个重大挑战。Mousavi-Hosseini等人（2023）分析了多重索引教师-学生设置，并展示了在使用随机梯度下降（SGD）和强正则化器训练时，两层学生模型的第一层权重会呈现低秩结构。这种结构特性已知可以减少泛化样本复杂度。在第二步中，同一作者们在额外假设下建立了算法特定的学习保证。本文专注于结构发现方面，并在更弱的假设下研究了该问题，具体包括：允许任意大小和深度的神经网络，所有参数可训练，任何平滑损失函数，微弱正则化，以及通过任何能够达到二阶平稳点（SOSP）的方法（例如扰动梯度下降（PGD））进行训练。我们方法的核心是一个关键的去随机化引理，该引理指出在温和条件下，优化函数E_x[g_θ(Wx + b)]会收敛到W=0的点。该引理的本质直接解释了结构发现，并在其他领域如端到端MAXCUT近似和Johnson-Lindenstrauss嵌入计算中具有即时应用。

英文摘要

Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of Mousavi-Hosseini et al. (2023) analyzes a multiple index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure discovery aspect and study it under weaker assumptions, more specifically: we allow (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g., perturbed gradient descent (PGD). At the core of our approach is a key derandomization lemma, which states that optimizing the function $\mathbb{E}_{\mathbf{x}} \left[g_\theta(\mathbf{W}\mathbf{x} + \mathbf{b})\right]$ converges to a point where $\mathbf{W} = \mathbf{0}$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains including an end-to-end approximation for MAXCUT, and computing Johnson-Lindenstrauss embeddings.


2510.09028 2026-05-20 math.ST stat.TH

Drift estimation for rough processes under small noise asymptotic : QMLE approach

小噪声渐近下粗糙过程的漂移估计：QMLE方法

Arnaud Gloter, Nakahiro Yoshida

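AI summary: This paper studies QMLE-based estimation of drift-function parameters for stochastic Volterra processes with singular kernels under small-noise asymptotics. It proves that the path reconstruction error decays as $h^{1/2}$ as the mesh size $h$ tends to zero, which yields an efficient estimator.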

2510.03824 2026-05-20 cs.LG cs.AI stat.ML

Proximal Diffusion Neural Sampler

近端扩散神经采样器

Wei Guo, Jaemoo Choi, Yuchen Zhu, Molei Tao, Yongxin Chen

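AI summary: This paper proposes the Proximal Diffusion Neural Sampler (PDNS), a framework that applies the proximal point method on the space of path measures to address multimodal target distributions and mode collapse in training neural samplers. It approaches the target distribution through a sequence of simpler staged subproblems, which promotes thorough exploration across modes.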

Comments: Accepted at ICLR 2026 (https://openreview.net/forum?id=XTHQqS7ObC)

AI中文摘要

学习基于扩散的神经采样器以从未归一化目标分布中抽取样本的任务可以被视为路径测度上的随机最优控制问题。然而，当目标分布是多模式且存在显著的模式分离屏障时，神经采样器的训练可能会面临挑战，可能导致模式崩溃。我们提出了一种名为近端扩散神经采样器（PDNS）的框架，通过在路径测度空间上应用近端点方法来解决这些问题。PDNS将学习过程分解为一系列更简单的子问题，逐步创建一条接近目标分布的路径。这种分阶段的程序会逐步细化路径以接近目标分布，并促进对所有模式的彻底探索。为了实现实用且高效的实现，我们用近端加权去噪交叉熵（WDCE）目标实例化每个近端步骤。通过在连续和离散采样任务中的广泛实验，包括分子动力学和统计物理中的挑战性场景，我们展示了PDNS的有效性和鲁棒性。我们的代码可在https://github.com/AlexandreGUO2001/PDNS上获得。

英文摘要

The task of learning a diffusion-based neural sampler for drawing samples from an unnormalized target distribution can be viewed as a stochastic optimal control problem on path measures. However, the training of neural samplers can be challenging when the target distribution is multimodal with significant barriers separating the modes, potentially leading to mode collapse. We propose a framework named Proximal Diffusion Neural Sampler (PDNS) that addresses these challenges by tackling the stochastic optimal control problem via proximal point method on the space of path measures. PDNS decomposes the learning process into a series of simpler subproblems that create a path gradually approaching the desired distribution. This staged procedure traces a progressively refined path to the desired distribution and promotes thorough exploration across modes. For a practical and efficient realization, we instantiate each proximal step with a proximal weighted denoising cross-entropy (WDCE) objective. We demonstrate the effectiveness and robustness of PDNS through extensive experiments on both continuous and discrete sampling tasks, including challenging scenarios in molecular dynamics and statistical physics. Our code is available at https://github.com/AlexandreGUO2001/PDNS.


2509.19250 2026-05-20 stat.ML cs.LG

Recovering Wasserstein Distance Matrices from Few Measurements

Muhammad Rana, Abiy Tasissa, HanQin Cai, Yakov Gavriyelov, Keaton Hamm

AI summary: This paper proposes two algorithms for estimating square Wasserstein distance matrices from a small number of entries. Such matrices are used to compute manifold learning embeddings such as multidimensional scaling (MDS) or Isomap but, unlike Euclidean distance matrices, are extremely costly to compute. The paper analyzes matrix completion from upper triangular samples and Nyström completion, proves stability of MDS under Nyström completion, and shows that Nyström completion can outperform matrix completion for a fixed budget of sampled distances. Finally, it shows that classification of the OrganCMNIST dataset remains stable on the embedded data even when only 10% of the columns of the distance matrix are computed.

DOI: 10.1109/ICASSP55912.2026.11460676
Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

Abstract

This paper proposes two algorithms for estimating square Wasserstein distance matrices from a small number of entries. These matrices are used to compute manifold learning embeddings like multidimensional scaling (MDS) or Isomap but, unlike Euclidean distance matrices, are extremely costly to compute. We analyze matrix completion from upper triangular samples and Nyström completion, in which $\mathcal{O}(d\log(d))$ columns of the distance matrices are computed, where $d$ is the desired embedding dimension. We prove stability of MDS under Nyström completion and show that it can outperform matrix completion for a fixed budget of sample distances. Finally, we show that classification of the OrganCMNIST dataset from the MedMNIST benchmark is stable on data embedded from the Nyström estimation of the distance matrix even when only 10% of the columns are computed.


2509.12884 2026-05-20 stat.ME stat.ML

Modeling nonstationary spatial processes with normalizing flows

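Pratik Nag, Andrew Zammit-Mangion, Ying Sun

AI summary: This paper proposes using neural autoregressive flows (NAFs) to model nonstationary, anisotropic spatial processes, demonstrates through simulation studies that the approach outperforms traditional spatial process models, and illustrates its effectiveness in practice on a 3D Argo float dataset.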

2507.11719 2026-05-20 stat.ME stat.CO stat.ML

Barycentric model aggregation in the Wasserstein space of distributions and a variational approach to consistency

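Emmanouil Androulakis, Georgios I. Papayiannis, Athanasios N. Yannacopoulos

AI summary: This paper studies model aggregation in the Wasserstein space of probability measures on the real line. It proposes a data-driven calibration framework that learns the aggregation weights statistically, establishes consistency of the aggregation scheme from a variational perspective based on Γ-convergence, and evaluates the method on synthetic experiments and on data from a real temperature monitoring network.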

2507.06428 2026-05-20 math.OC cs.LG cs.NA math.NA stat.ML

Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies

Samuel N. Cohen, Jackson Hebner, Deqing Jiang, Justin Sirignano

AI summary: This paper studies a neural actor-critic method for solving high-dimensional Hamilton-Jacobi-Bellman partial differential equations and, through asymptotic analysis and numerical studies, demonstrates its effectiveness on stochastic control problems.

Comments: 46 pages

Abstract

We mathematically analyze and numerically study an actor-critic machine learning algorithm for solving high-dimensional Hamilton-Jacobi-Bellman (HJB) partial differential equations from stochastic control theory. The architecture of the critic (the estimator for the value function) is structured so that the boundary condition is always perfectly satisfied (rather than being included in the training loss) and utilizes a biased gradient which reduces computational cost. The actor (the estimator for the optimal control) is trained by minimizing the integral of the Hamiltonian over the domain, where the Hamiltonian is estimated using the critic. We show that the training dynamics of the actor and critic neural networks converge in a Sobolev-type space to a certain infinite-dimensional ordinary differential equation (ODE) as the number of hidden units in the actor and critic $\rightarrow \infty$. Further, under a convexity-like assumption on the Hamiltonian, we prove that any fixed point of this limit ODE is a solution of the original stochastic control problem. This provides an important guarantee for the algorithm's performance in light of the fact that finite-width neural networks may only converge to local minimizers (and not optimal solutions) due to the non-convexity of their loss functions. In our numerical studies, we demonstrate that the algorithm can solve stochastic control problems accurately in up to 200 dimensions. In particular, we construct a series of increasingly complex stochastic control problems with known analytic solutions and study the algorithm's numerical performance on them. These problems range from a linear-quadratic regulator equation to highly challenging equations with non-convex Hamiltonians, allowing us to identify and analyze the strengths and limitations of this neural actor-critic method for solving HJB equations.


2506.20058 2026-05-20 stat.ME stat.AP

Causal mediation analysis for longitudinal and survival data in continuous time using Bayesian non-parametric joint models

Saurabh Bhandari, Michael J. Daniels, Juned Siddique

AI summary: This paper proposes a causal mediation framework that jointly models longitudinal exposures, confounders, mediators, and time-to-event outcomes as continuous functions of age, performs statistical inference with an enriched Dirichlet process mixture model, and assesses the effect of medication on time to cardiovascular disease death.

Abstract

Observational cohort data is an important source of information for understanding the causal effects of treatments on survival and the degree to which these effects are mediated through changes in disease-related risk factors. However, these analyses are often complicated by irregular data collection intervals and the presence of longitudinal confounders and mediators. We propose a causal mediation framework that jointly models longitudinal exposures, confounders, mediators, and time-to-event outcomes as continuous functions of age. This framework for longitudinal covariate trajectories enables statistical inference even at ages where the subject's covariate measurements are unavailable. The observed data distribution in our framework is modeled using an enriched Dirichlet process mixture (EDPM) model. Using data from the Atherosclerosis Risk in Communities cohort study, we apply our methods to assess how medication -- prescribed to target cardiovascular disease (CVD) risk factors -- affects the time-to-CVD death.


2506.17036 2026-05-20 stat.ME cs.LG stat.ML

Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction

Sina Aghaee Dabaghan Fard, Minhee Kim, Akash Deep, Jaesung Lee

AI summary: This paper proposes a Bayesian approach that jointly models multi-sensor time-series data and multi-mode failure times. By integrating a Cox proportional hazards model, a convolved multi-output Gaussian process, and multinomial failure mode distributions, it predicts a system's remaining useful life accurately, and its advantages are validated through numerical and case studies.

Abstract

Modern industrial systems are often subject to multiple failure modes, and their conditions are monitored by multiple sensors, generating multiple time-series signals. Additionally, time-to-failure data are commonly available. Accurately predicting a system's remaining useful life (RUL) requires effectively leveraging multi-sensor time-series data alongside multi-mode failure event data. In most existing models, failure mode prediction and RUL prediction are performed independently, ignoring the inherent relationship between these two tasks. Some models integrate multiple failure modes and event prediction using black-box machine learning approaches, which lack statistical rigor and cannot characterize the inherent uncertainty in the model and data. This paper introduces a unified approach to jointly model the multi-sensor time-series data and failure time concerning multiple failure modes. The proposed model integrates a Cox proportional hazards model, a Convolved Multi-output Gaussian Process, and multinomial failure mode distributions in a hierarchical Bayesian framework with corresponding priors, enabling accurate prediction with robust uncertainty quantification. Posterior distributions are effectively obtained by Variational Bayes, and prediction is performed with Monte Carlo sampling. The advantages of the proposed model are validated through extensive numerical and case studies with a jet-engine dataset.


2506.00286 2026-05-20 cs.LG cs.AI math.OC stat.ML

Recursive Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model

Oliver Mortensen, Mohammad Sadegh Talebi

AI summary: This paper studies risk-sensitive reinforcement learning with recursive entropic risk measures (ERM) in finite discounted Markov decision processes (MDPs). It introduces a model-based algorithm, Model-Based ERM Q-Value Iteration (MB-RS-QVI), derives PAC-type sample complexity bounds for value learning and policy learning, and shows that an exponential dependence of the sample complexity on |β|/(1-γ) is unavoidable in the worst case, giving the first rigorous sample complexity guarantees for recursive ERM in both risk-averse and risk-seeking regimes.

Abstract

We study risk-sensitive reinforcement learning in finite discounted MDPs with recursive entropic risk measures (ERM), where the risk parameter $β\neq 0$ controls the agent's risk attitude: $β>0$ for risk-averse and $β<0$ for risk-seeking behavior. A generative model of the MDP is assumed to be available. Our focus is on the sample complexities of learning the optimal state-action value function (value learning) and an optimal policy (policy learning) under recursive ERM. We introduce a model-based algorithm, called Model-Based ERM $Q$-Value Iteration (MB-RS-QVI), and derive PAC-type bounds on its sample complexity for both value and policy learning. Both PAC bounds scale exponentially with $|β|/(1-γ)$, where $γ$ is the discount factor. We also establish corresponding lower bounds for both value and policy learning, showing that exponential dependence on $|β|/(1-γ)$ is unavoidable in the worst case. The bounds are tight in the number of states and actions ($S$ and $A$), providing the first rigorous sample complexity guarantees for recursive ERM across both risk-averse and risk-seeking regimes.
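To make the role of the risk parameter concrete, here is a minimal Python sketch of an entropic risk measure on a discrete reward distribution, followed by a single recursive backup. The sign convention $-\frac{1}{β}\log E[e^{-βX}]$ (rewards, with $β>0$ risk-averse) and the backup form `reward + gamma * ERM(next values)` are assumptions chosen to match the abstract's description; the paper's exact definitions may differ, and the function names are hypothetical.

```python
import math


def entropic_risk(values, probs, beta):
    """-(1/beta) * log E[exp(-beta * X)] for a discrete reward X, beta != 0.

    Assumed convention: beta > 0 is risk-averse, beta < 0 is risk-seeking.
    """
    if beta == 0:
        raise ValueError("beta must be nonzero")
    # Log-sum-exp with a shift for numerical stability.
    exponents = [-beta * v for v in values]
    shift = max(exponents)
    total = sum(p * math.exp(e - shift) for p, e in zip(probs, exponents))
    return -(shift + math.log(total)) / beta


def erm_backup(reward, next_values, next_probs, beta, gamma):
    """One assumed recursive backup for a single state-action pair."""
    return reward + gamma * entropic_risk(next_values, next_probs, beta)
```

For a fair coin between rewards 0 and 1, the risk-averse value falls below the mean 0.5 and the risk-seeking value rises above it, which is the behaviour the two regimes in the abstract refer to.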


2504.08220 2026-05-20 stat.ME stat.AP

Feature aware covariance estimation, with application to mixtures of chemical exposures

Elizabeth Bersson, Kate Hoffman, Heather M. Stapleton, David B. Dunson

AI summary: This paper aims to improve inference on the covariation among environmental exposures by applying a feature-aware covariance regression extension of Bayesian factor analysis, which allows shrinkage toward more flexible covariance structures and reduces over-shrinkage.

Comments: 25 pages, 6 figures

Abstract

The motivation of this article is to improve inferences on the covariation in environmental exposures, motivated by data from a study of Toddlers Exposure to SVOCs in Indoor Environments (TESIE). The challenge is that the sample size is limited, so empirical covariance provides a poor estimate. In related applications, Bayesian factor models have been popular; these approaches express the covariance as low rank plus diagonal and can infer the number of factors adaptively. However, they have the disadvantage of shrinking towards a diagonal covariance, often underestimating important covariation patterns in the data. Alternatively, the dimensionality problem is addressed by collapsing the detailed exposure data within chemical classes, potentially obscuring important information. We apply a feature aware covariance regression extension of Bayesian factor analysis, which improves performance by including information from features summarizing properties of the different exposures. This approach enables shrinkage to more flexible covariance structures, reducing the over-shrinkage problem, as we illustrate in the TESIE data using various chemical features.


2502.04575 2026-05-20 stat.ML cs.LG cs.NA math.NA physics.comp-ph stat.CO

Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond

Wei Guo, Molei Tao, Yongxin Chen

AI summary: This paper studies normalizing constant estimation. It gives a non-asymptotic analysis that yields the complexity of annealed importance sampling for estimating the normalizing constant, and proposes a new algorithm for handling multimodality.

Comments: Accepted at ICLR 2026 (https://openreview.net/forum?id=96fJALwotm)

Abstract

Given an unnormalized probability density $π\propto\mathrm{e}^{-V}$, estimating its normalizing constant $Z=\int_{\mathbb{R}^d}\mathrm{e}^{-V(x)}\mathrm{d}x$ or free energy $F=-\log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. It is challenging especially in high dimensions or when $π$ is multimodal. To mitigate the high variance of conventional importance sampling estimators, annealing-based methods such as Jarzynski equality and annealed importance sampling are commonly adopted, yet their quantitative complexity guarantees remain largely unexplored. We take a first step toward a non-asymptotic analysis of annealed importance sampling. In particular, we derive an oracle complexity of $\widetilde{O}\left(\frac{dβ^2{\mathcal{A}}^2}{\varepsilon^4}\right)$ for estimating $Z$ within $\varepsilon$ relative error with high probability, where $β$ is the smoothness of $V$ and $\mathcal{A}$ denotes the action of a curve of probability measures interpolating $π$ and a tractable reference distribution. Our analysis, leveraging Girsanov's theorem and optimal transport, does not explicitly require isoperimetric assumptions on the target distribution. Finally, to tackle the large action of the widely used geometric interpolation, we propose a new algorithm based on reverse diffusion samplers, establish a framework for analyzing its complexity, and empirically demonstrate its efficiency in tackling multimodality.
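For readers unfamiliar with the estimator being analyzed, the standard annealed importance sampling identity with geometric interpolation can be written as follows. The notation ($f_k$, $\lambda_k$, $T_k$, $K$, $N$) is introduced here for illustration and is an assumption; it may not match the paper's, and the paper's analysis may use a continuous-time formulation instead.

```latex
% Geometric interpolation between a normalized reference density f_0 and f_K = e^{-V}
f_k(x) = f_0(x)^{1-\lambda_k}\, f_K(x)^{\lambda_k}, \qquad 0 = \lambda_0 < \lambda_1 < \dots < \lambda_K = 1 .

% One trajectory: x_0 \sim f_0, then x_k \sim T_k(x_{k-1}, \cdot), where T_k leaves \pi_k \propto f_k invariant
W = \prod_{k=1}^{K} \frac{f_k(x_{k-1})}{f_{k-1}(x_{k-1})}, \qquad \mathbb{E}[W] = Z .

% Estimator from N independent trajectories
\widehat{Z} = \frac{1}{N} \sum_{i=1}^{N} W^{(i)} .
```

The estimator is unbiased for any choice of schedule, so the complexity question in the abstract concerns how many steps and trajectories are needed for $\widehat{Z}$ to concentrate within $\varepsilon$ relative error.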


2501.11448 2026-05-20 stat.CO stat.ML

An accuracy-runtime trade-off comparison of scalable Gaussian process approximations for spatial data

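Filippo Rambelli, Fabio Sigrist

AI summary: This paper compares the accuracy and runtime of several Gaussian process approximations for likelihood evaluation, parameter estimation, and prediction, and finds that Vecchia approximations offer the best accuracy-runtime trade-off in most cases.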

2409.13760 2026-05-20 stat.AP

Mapping climate change awareness through spatial hierarchical clustering

Gianpaolo Zammarchi, Paolo Maranzano

AI summary: Using a geographically-informed hierarchical clustering analysis, this paper studies the level of climate change awareness across countries. The analysis combines climate change awareness, socio-economic factors, climate-related characteristics, and the physical distances between countries, and proposes a customized algorithm for choosing the clustering hyperparameters. Geographically-informed clustering yields more stable and more geographically compact partitions than non-geographical clustering, and reveals a contrast between the high awareness of Western countries and the lower, more variable awareness of Asian, African, and Middle Eastern countries.

DOI: 10.1016/j.seps.2026.102509

Abstract

Climate change is a critical issue that will be on the political agenda for the next decades. While it is important for this topic to be discussed at higher levels, it is also of paramount importance that populations become aware of the problem. As different countries may face more or less severe repercussions, it is also useful to understand the degree of awareness of specific populations. In this paper, we present a geographically-informed hierarchical clustering analysis aimed at identifying groups of countries with a similar level of climate change awareness. We employ a Ward-like clustering algorithm that combines information pertaining to climate change awareness, socio-economic factors, climate-related characteristics of different countries, and the physical distances between countries. To choose suitable values for the clustering hyperparameters, we propose a customized algorithm that takes into account the within-cluster homogeneity and the between-cluster separation, and that explicitly compares the geographically-informed and non-geographical partitionings. The results show that the geographically-informed clustering provides more stable partitions and leads to interpretable and geographically-compact aggregations compared to a clustering in which the geographical component is absent. In particular, we identify a clear contrast between Western countries, characterized by high and compact awareness, and Asian, African, and Middle Eastern countries, which have greater variability but still lower awareness.


2407.17200 2026-05-20 stat.ML cs.LG math.OC stat.ME

Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems

Pierre-Cyril Aubin-Frankowski, Yohann De Castro, Axel Parmentier, Alessandro Rudi

AI summary: This paper studies generalization bounds for surrogate policies in combinatorial optimization. By analyzing smoothed (perturbed) policies, it derives a generalization bound that decomposes the excess risk into perturbation bias, statistical estimation error, and optimization error, introduces a new geometric quantity to control the perturbation bias, and uses kernel Sum-of-Squares methods to mitigate the curse of dimensionality of global optimization.

Comments: 29 pages main document, 9 pages supplement

Abstract

Many real-world decision problems require solving, again and again, combinatorial optimization instances drawn from a common distribution. A recent line of structured learning methods exploits this regularity by learning policies that pair a statistical model with a tractable combinatorial oracle, instead of solving each instance independently. Training such policies is notoriously difficult, however: the resulting empirical risk is piecewise constant in the model parameters, which hinders gradient-based optimization, and only a few theoretical guarantees have been provided so far. We address this issue by analyzing smoothed (perturbed) policies: adding controlled random perturbations to the direction used by the linear oracle yields a differentiable surrogate risk and improves generalization. Our main contribution is a generalization bound that decomposes the excess risk into (i) perturbation bias, (ii) statistical estimation error, and (iii) optimization error. The perturbation bias is controlled by the fan-crossing probability, a new geometric quantity measuring the likelihood that a perturbation changes the oracle solution. We introduce two complementary conditions to bound it: the Uniformly Bounded Density (UBD) property, yielding a sharp $O(λ)$ bias, and the weaker Uniform Weak moment (UW) property, yielding a sub-linear bound. Both capture the geometric interaction between the statistical model and the normal fan of the feasible polytope. The statistical estimation error is controlled via a uniform deviation bound over the policy class, with rate $O(1/(λ\sqrt{n}))$ that scales inversely in the smoothing parameter. Concerning the optimization error, we exploit kernel Sum-of-Squares methods to mitigate the curse of dimensionality of global optimization.


2404.12949 2026-05-20 math.PR cs.GT math.OC math.ST stat.TH

Optimal single threshold stopping rules and sharp prophet inequalities

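Alexander Goldenshluger, Yaakov Malinovsky, Assaf Zeevi

AI summary: This paper studies a finite-horizon optimal stopping problem for a sequence of independent and identically distributed random variables, where the goal is to design stopping rules that select the largest value in the sequence. Performance is benchmarked against a prophet with perfect foreknowledge and is typically stated as prophet inequalities. The paper develops a game-theoretic characterization for deriving sharp non-asymptotic prophet inequalities for single-threshold stopping rules. It shows that the sharp constants in ratio-type and difference-type prophet inequalities are determined by the optimal values of infinite two-person zero-sum games on the unit square, and that the solutions of these games provide optimal stopping rules and least favorable distributions. The approach also handles restricted classes of distributions systematically, and leads to a numerically efficient algorithmic paradigm for computing the sharp constants to any prescribed accuracy.

Abstract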

This paper considers a finite horizon optimal stopping problem for a sequence of independent and identically distributed random variables, where the objective is to design stopping rules that attempt to select the random variable with the highest value in the sequence. The performance of any stopping rule may be benchmarked relative to the selection of a "prophet" that has perfect foreknowledge of the largest value. Such comparisons are typically stated in the form of "prophet inequalities." In this paper we develop a game-theoretic characterization that supports a principled approach for deriving sharp non-asymptotic prophet inequalities for single threshold stopping rules. We demonstrate that sharp constants in the ratio- and difference-type prophet inequalities are determined by the optimal values of infinite two-person zero-sum games on the unit square with particular payoff kernels, while the solutions to the games provide optimal stopping rules and least favorable distributions. Among other things, this formulation also allows a systematic way to tackle restricted classes of distributions. The proposed framework leads to a numerically efficient algorithmic paradigm that allows computing sharp constants in prophet inequalities with any prescribed level of accuracy.
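As a small illustration of what a single threshold stopping rule is being compared against, the Python sketch below computes the expected reward of such a rule for one fixed distribution, i.i.d. Uniform(0,1), and its ratio to the prophet's expected maximum. This is not the paper's game-theoretic method: it assumes one specific distribution, and it assumes that the rule accepts the last observation when no value reaches the threshold. The function names are hypothetical.

```python
def threshold_rule_value(n, t):
    """E[X_tau] for i.i.d. Uniform(0,1) X_1..X_n when we stop at the first
    X_i >= t and, by assumption, accept X_n if no observation reaches t."""
    p_none = t ** n              # probability that all n observations fall below t
    stopped_mean = (1 + t) / 2   # mean of Uniform(t, 1)
    fallback_mean = t / 2        # mean of X_n given X_n < t
    return (1 - p_none) * stopped_mean + p_none * fallback_mean


def prophet_value(n):
    """E[max(X_1..X_n)] for i.i.d. Uniform(0,1)."""
    return n / (n + 1)


def best_threshold(n, grid_size=1000):
    """Grid search for the threshold maximizing the ratio to the prophet."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    t_best = max(grid, key=lambda t: threshold_rule_value(n, t))
    return t_best, threshold_rule_value(n, t_best) / prophet_value(n)
```

A sharp prophet inequality would bound this ratio from below over a whole class of distributions; the sketch only evaluates it at a single member of that class.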


2402.00267 2026-05-20 cs.DS cs.CR stat.ML

Not All Learnable Distribution Classes are Privately Learnable

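Mark Bun, Gautam Kamath, Argyris Mouzakis, Vikrant Singhal

AI summary: This work gives an example of a class of distributions that is learnable in total variation distance with a finite number of samples but cannot be learned to the same error under $(\varepsilon, \delta)$-differential privacy, thereby refuting a conjecture of Ashtiani.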

2401.16667 2026-05-20 math.ST stat.AP stat.ME stat.TH

Sharp variance estimator and causal bootstrap in stratified randomized experiments

Haoyang Yu, Ke Zhu, Hanzhong Liu

AI summary: This paper proposes a sharp variance estimator and two causal bootstrap methods that approximate the sampling distribution of the weighted difference-in-means estimator in stratified randomized experiments more accurately, addressing the fact that the conventional Neyman-type variance estimator can be overly conservative, and the normal approximation can fail, when the sample size is small or the outcomes are skewed.

DOI: 10.1002/sim.70139
Comments: Accepted by Statistics in Medicine

Abstract

Randomized experiments are the gold standard for estimating treatment effects, and randomization serves as a reasoned basis for inference. In widely used stratified randomized experiments, randomization-based finite-population asymptotic theory enables valid inference for the average treatment effect, relying on normal approximation and a Neyman-type conservative variance estimator. However, when the sample size is small or the outcomes are skewed, the Neyman-type variance estimator may become overly conservative, and the normal approximation can fail. To address these issues, we propose a sharp variance estimator and two causal bootstrap methods to more accurately approximate the sampling distribution of the weighted difference-in-means estimator in stratified randomized experiments. The first causal bootstrap procedure is based on rank-preserving imputation and we prove its second-order refinement over normal approximation. The second causal bootstrap procedure is based on constant-treatment-effect imputation and is further applicable in paired experiments. In contrast to traditional bootstrap methods, where randomness originates from hypothetical super-population sampling, our analysis for the proposed causal bootstrap is randomization-based, relying solely on the randomness of treatment assignment in randomized experiments. Numerical studies and two real data applications demonstrate advantages of our proposed methods in finite samples. The R package CausalBootstrap implementing our method is publicly available.
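For context, the baseline that the abstract calls the Neyman-type conservative variance estimator can be sketched in a few lines of Python. This is the conventional textbook estimator for the weighted difference-in-means, not the paper's sharp variance estimator or causal bootstrap, and it is unrelated to the API of the CausalBootstrap R package; the function name and the input layout are hypothetical.

```python
from statistics import mean, variance


def stratified_estimate(strata):
    """strata: list of (treated_outcomes, control_outcomes), one pair per stratum.

    Returns (weighted difference-in-means, Neyman-type variance estimate).
    """
    n_total = sum(len(t) + len(c) for t, c in strata)
    estimate, var_hat = 0.0, 0.0
    for treated, control in strata:
        weight = (len(treated) + len(control)) / n_total  # stratum share n_h / n
        estimate += weight * (mean(treated) - mean(control))
        # Sample variances need at least two units per arm in each stratum.
        var_hat += weight ** 2 * (
            variance(treated) / len(treated) + variance(control) / len(control)
        )
    return estimate, var_hat
```

Because the sample variances require two or more units per arm, this baseline cannot be computed as written in paired experiments, which is one of the settings the abstract says the second bootstrap procedure covers.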


2310.11203 2026-05-20 cs.LG stat.ML

Federated Learning with Nonvacuous Generalisation Bounds

Pierre Jobic, Maxime Haddouche, Benjamin Guedj

AI summary: This paper proposes a new strategy for training randomised predictors in federated learning, in which each node preserves its privacy by releasing a local predictor while keeping its training data secret from the other nodes. It builds a global randomised predictor that inherits the properties of the local private predictors in the sense of a PAC-Bayesian generalisation bound. Numerical experiments show predictive performance comparable to that of the batch approach while preserving privacy.

Abstract

We introduce a novel strategy to train randomised predictors in federated learning, where each node of the network aims at preserving its privacy by releasing a local predictor but keeping its training dataset secret from the other nodes. We then build a global randomised predictor which inherits the properties of the local private predictors in the sense of a PAC-Bayesian generalisation bound. We consider the synchronous case where all nodes share the same training objective (derived from a generalisation bound), and the heterogeneous and homogeneous cases where each node may have its own personalised training objective. We show through a series of numerical experiments that our approach achieves a comparable predictive performance to that of the batch approach where all datasets are shared across nodes. Moreover, the predictors are supported by numerically nonvacuous generalisation bounds while preserving privacy for each node. We explicitly compute the increment on predictive performance and generalisation bounds for our two federated settings, highlighting the price to pay to preserve privacy.


2205.02726 2026-05-20 stat.ME

Asymptotic Efficiency Bounds for a Class of Experimental Designs

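Timothy B. Armstrong

AI summary: This paper studies experimental designs that sample sequentially from an infinite population and assign treatment. It derives an asymptotic efficiency bound that applies to any experiment in which treatment is assigned based on covariates and past outcome data, and shows that, for estimating the average treatment effect of a binary treatment, no further first-order asymptotic efficiency gain is possible relative to estimators that already attain the Hahn (1998) bound.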

2203.15890 2026-05-20 econ.EM stat.ME

Testing the identification of causal effects in observational data

Martin Huber, Jannis Kueck

AI summary: This paper studies a testable condition for identifying the causal effect of a treatment on an outcome in observational data. It proposes tests of this condition based on machine learning methods, examines their asymptotic behavior and finite-sample performance in a simulation study, and applies them to evaluating the effect of fertility on female labor supply, finding a violation of the testable implication when the sibling sex ratio of the first two children is used as an instrument.

Abstract

This study demonstrates the existence of a testable condition for the identification of the causal effect of a treatment on an outcome in observational data, which relies on two sets of variables: observed covariates to be controlled for and a suspected instrument. Under a causal structure commonly found in empirical applications, the testable conditional independence of the suspected instrument and the outcome given the treatment and the covariates has two implications. First, the instrument is valid, i.e. it does not directly affect the outcome (other than through the treatment) and is unconfounded conditional on the covariates. Second, the treatment is unconfounded conditional on the covariates such that the treatment effect is identified. We suggest tests of this conditional independence based on machine learning methods that account for covariates in a data-driven way and investigate their asymptotic behavior and finite sample performance in a simulation study. We also apply our testing approach to evaluating the impact of fertility on female labor supply when using the sibling sex ratio of the first two children as supposed instrument, which by and large points to a violation of our testable implication for the moderate set of socio-economic covariates considered.


2112.08507 2026-05-20 cs.LG stat.ML

Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

Tong Li, Jacob Nogas, Haochen Song, Anna Rafferty, Eric M. Schwartz, Audrey Durand, Harsh Kumar, Nina Deliu, Sofia S. Villar, Dehan Kong, Joseph J. Williams

AI summary: This paper proposes a statistically sensitive algorithm, TS-PostDiff, which trades off statistical analysis against user reward by combining uniform random assignment with reward maximization, with the aim of improving the efficiency and accuracy of experiments.

Abstract

Traditional randomized A/B experiments assign arms with uniform random (UR) probability, such as 50/50 assignment to two versions of a website to discover whether one version engages users more. To more quickly and automatically use data to benefit users, multi-armed bandit algorithms such as Thompson Sampling (TS) have been advocated. While TS is interpretable and incorporates the randomization key to statistical inference, it can cause biased estimates and increase false positives and false negatives in detecting differences in arm means. We introduce a more Statistically Sensitive algorithm, TS-PostDiff (Posterior Probability of Small Difference), that mixes TS with traditional UR by using an additional adaptive step, where the probability of using UR (vs TS) is proportional to the posterior probability that the difference in arms is small. This allows an experimenter to define what counts as a small difference, below which a traditional UR experiment can obtain informative data for statistical inference at low cost, and above which using more TS to maximize user benefits is key. We evaluate TS-PostDiff against UR, TS, and two other TS variants designed to improve statistical inference. We consider results for the common two-armed experiment across a range of settings inspired by real-world applications. Our results provide insight into when and why TS-PostDiff or alternative approaches provide better tradeoffs between benefiting users (reward) and statistical inference (false positive rate and power). TS-PostDiff's adaptivity helps efficiently reduce false positives and increase statistical power when differences are small, while increasing reward more when differences are large. The work highlights important considerations for future Statistically Sensitive algorithm development that balances reward and statistical analysis in adaptive experimentation.
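The mixing step described in the abstract can be sketched for two Bernoulli arms as follows. This is a minimal reading of the abstract, not the authors' implementation: it assumes uniform Beta(1, 1) priors, and it realizes "use UR with the posterior probability of a small difference" by drawing one posterior sample per arm and checking whether the draws differ by less than a user-chosen threshold `c`. The function names, the fresh second draw for the Thompson Sampling step, and the simulation harness are all hypothetical.

```python
import random


def ts_postdiff_choose(successes, failures, c, rng):
    """Pick one of two Bernoulli arms; returns (arm, used_uniform)."""
    # Beta(1 + successes, 1 + failures) posterior per arm (uniform prior, an assumption).
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    # Under one joint posterior draw, the event |p0 - p1| < c occurs with the
    # posterior probability that the difference between the arms is small.
    if abs(draws[0] - draws[1]) < c:
        return rng.randrange(2), True  # uniform random assignment
    # Otherwise take a Thompson Sampling step using fresh posterior draws.
    fresh = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return (0 if fresh[0] >= fresh[1] else 1), False


def run_experiment(true_means, horizon, c, seed=0):
    """Simulate the adaptive experiment; returns counts and the number of UR steps."""
    rng = random.Random(seed)
    successes, failures = [0, 0], [0, 0]
    uniform_steps = 0
    for _ in range(horizon):
        arm, used_uniform = ts_postdiff_choose(successes, failures, c, rng)
        uniform_steps += used_uniform
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, uniform_steps
```

Setting `c` to 0 recovers plain Thompson Sampling and setting it above 1 recovers a uniform random experiment, so the threshold is the knob by which an experimenter states what counts as a small difference.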

2605.19014 2026-05-20 cs.LG econ.EM stat.ML

SAGA: A Sequence-Adaptive Generative Architecture for Multi-Horizon Probabilistic Forecasting with Adaptive Temporal Conformal Prediction

Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov, Hafize Gonca Cömert

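AI summary: This paper proposes SAGA, a decoder-only transformer for irregular tabular panel sequences, paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on longitudinal data from the Swedish LISA register, SAGA forecasts annual labor earnings at horizons of 1 to 30 years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Against the canonical parametric process and tabular and recurrent baselines, SAGA reduces the continuous ranked probability score by 31.9% at the ten-year horizon and mean absolute error by 37.7% at the twenty-year horizon. Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327, against the partially observed truth of 0.341 and the GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.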

Comments: 14 pages, 3 figures, 12 tables, 5 appendices, 45 references. Submitted to IEEE TPAMI. Source code at https://github.com/olaflaitinen/saga (archived: doi:10.5281/zenodo.20260366). Synthetic equivalent dataset: doi:10.5281/zenodo.20260287. Empirical work conducted on the Swedish LISA register via SCB MONA (project SCB-MONA-2026-147); ethical approval Swedish Ethical Review Authority 2026-04127-01

English abstract

Microsimulation models used by ministries of finance and central banks rely on parametric processes for lifetime earnings that capture only first and second moments of the conditional distribution and miss long-range nonlinear structure. We propose SAGA, a decoder-only transformer for irregular tabular panel sequences, paired with a split conformal calibration wrapper that delivers individual-level prediction intervals with finite-sample marginal coverage guarantees. Trained on the longitudinal Swedish LISA register over 1990 to 2022, comprising 2,143,817 individuals and 61,284,903 person-years, the model forecasts annual labor earnings at horizons of one to thirty years and aggregates them by Monte Carlo into present-discounted lifetime earnings distributions. Against the canonical Guvenen, Karahan, Ozkan, and Song parametric process and tabular and recurrent baselines, SAGA reduces continuous ranked probability score by 31.9 percent at the ten-year horizon and mean absolute error by 37.7 percent at the twenty-year horizon. Conformal intervals achieve nominal coverage to within 0.4 percentage points marginally and within 2.4 percentage points on the worst-case demographic subgroup. The reconstructed lifetime earnings Gini coefficient is 0.327 against the partially observed truth of 0.341 and the GKOS estimate of 0.378. Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment.
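The split conformal calibration idea mentioned in this abstract can be illustrated with the textbook absolute-residual variant. This is a hedged sketch, not SAGA's adaptive temporal wrapper: the function name `split_conformal_interval` and the symmetric absolute-residual score are assumptions chosen for brevity.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Symmetric prediction intervals from absolute-residual scores."""
    # Nonconformity scores on a held-out calibration set.
    scores = np.abs(np.asarray(cal_true) - np.asarray(cal_pred))
    n = scores.size

    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th
    # smallest calibration score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        q = np.inf  # too few calibration points for this alpha
    else:
        q = np.sort(scores)[k - 1]

    test_pred = np.asarray(test_pred, dtype=float)
    return test_pred - q, test_pred + q
```

The marginal coverage guarantee of this construction relies on the calibration and test points being exchangeable, which is why time-series settings typically need an adaptive variant.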

2605.19006 2026-05-20 stat.ME stat.ML

Causal Inference with Categorical Unobserved Confounder via Mixture Learning

Aytijhya Saha, Stephen Bates, Devavrat Shah

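AI summary: This paper studies how mixture learning enables causal inference when the unobserved confounder is categorical, proposes an estimation method based on tensor decomposition, and shows that causal effects are identifiable under suitable conditions.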

English abstract

Unobserved confounding is a fundamental challenge for estimating causal effects. To address unobserved confounding, recent literature has turned to two different approaches: proxy variables and the use of multiple treatments. The first approach, commonly referred to as proximal causal inference, requires proxies to be assigned to specific asymmetric roles: treatment-inducing proxies (negative control exposures), variables that act as common causes of the treatment and outcome, and outcome-inducing proxies (negative control outcomes). In practice, however, identifying variables that satisfy these asymmetric roles can be difficult depending on the application domain. The second approach, commonly referred to as the "Deconfounder," deals with multiple conditionally independent treatments. There has been limited progress towards developing a consistent estimation method for this setting. As the primary contribution of this work, we establish that, under suitable conditions, causal effects are identifiable in both settings when the unobserved confounder is categorical. Our approach builds on a mixture learning perspective: we show that the underlying confounding structure can be recovered by identifying the corresponding mixture distribution. We propose an estimation procedure based on tensor decomposition, which allows consistent recovery of the latent structure and comes with non-asymptotic guarantees. Simulation studies and real data experiments demonstrate that the proposed method performs well even with limited data.

2605.18927 2026-05-20 stat.ML cs.LG math.PR

Bayesian Latent Space Models for Graphs Are Misspecified: Toward Robust Inference via Generalized Posteriors

Aldric Labarthe

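AI summary: This paper studies misspecification in Bayesian latent space models for graphs and proposes a generalized posterior framework. Its Link-Sequential R-SafeBayes method improves robustness, yielding better calibration and link prediction performance.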

English abstract

Bayesian latent space models offer a principled approach to network representation, but rely on correct specification of both geometry and link function. Real-world networks often violate these assumptions, exhibiting geometric mismatch and structural anomalies that break standard metric properties. We show that such misspecification pushes the data-generating distribution outside the model class, causing Bayesian inference to become overconfident and poorly calibrated. To address this, we propose a generalized posterior framework for random geometric graphs. We introduce Link-Sequential R-SafeBayes, a method that exploits dyadic conditional independence to estimate prequential risk and adaptively tune posterior regularization. Experiments on synthetic and real-world networks demonstrate improved calibration, better link prediction performance, and a reliable criterion for selecting latent geometries across Euclidean, spherical, and hyperbolic spaces.

2605.18910 2026-05-20 stat.ME

A Tutorial on Symbolic Structural Identifiability Analysis of ODE Models in Julia

Abdallah Alsammani

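AI summary: This tutorial shows how to carry out symbolic structural identifiability analysis with the Julia package StructuralIdentifiability.jl. Seven case studies illustrate global identifiability, local-only identifiability, structural non-identifiability, and recovery of identifiability through additional measurements and reparameterization.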

English abstract

Structural identifiability analysis determines whether the parameters of a mechanistic ordinary differential equation (ODE) model can be uniquely recovered from ideal observations and is therefore a fundamental prerequisite for reliable parameter estimation. This tutorial presents a modern, reproducible computational framework for symbolic structural identifiability analysis using the Julia package StructuralIdentifiability.jl. We provide a rigorous yet accessible introduction to local and global identifiability, observability, parameter-to-output mappings, and identifiable parameter combinations, together with a unified workflow based on the core functions @ODEmodel, assess_local_identifiability, assess_identifiability, and find_identifiable_functions. The framework is demonstrated through seven case studies from epidemiology, pharmacokinetics, and systems biology, illustrating globally identifiable systems, local-only identifiability, structural non-identifiability, and recovery of identifiability through additional measurements and reparameterization. Beyond the theoretical foundations, the tutorial emphasizes practical model reformulation, experimental design, and reproducible scientific workflows within the Julia SciML ecosystem, providing a comprehensive reference for researchers and graduate students working with mechanistic ODE models.

2605.18887 2026-05-20 econ.EM econ.GN q-fin.EC stat.AP

Valuing Winners: When and How to Correct for Selection Bias in Randomized Experiments

Ron Berman, Walter W. Zhang, Hangcheng Zhao

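AI summary: This paper studies how to correct for selection bias in randomized experiments, distinguishes two forms of the winner's curse (global and selective), and discusses how to match the correction method to the manager's objective.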

Comments: 68 pages

English abstract

Decision-makers often deploy the best-performing treatment from a randomized experiment, creating a winner's curse: selection favors treatments whose observed outcomes are high partly because of statistical noise, so the naïve estimate of the winner is upward biased. We distinguish two forms of winner's curse, bias relative to the true best treatment (global) and bias relative to the selected treatment's true mean (selective), and link them to regret from deploying a suboptimal treatment. This framework defines seven decision-relevant evaluation targets: mean bias, mean squared error, and confidence interval coverage for the global and selective winner's curse, and mean regret. We then show that methods that perform well on one target can perform poorly on others, so corrections should be matched to the manager's objective. Across simulations with varying effect sizes, multiple-arm settings, and data calibrated to an online A/B testing platform, no method dominates uniformly: the plug-in estimator performs best when treatment differences are large, cross-fitting performs best when treatments are similar, and resampling methods often achieve low mean squared error for moderate differences. We also introduce an adaptive empirical likelihood procedure that delivers asymptotically valid confidence intervals across settings without the tuning sensitivity of resampling-based methods.
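The two bias notions defined in this abstract can be made concrete with a small Monte Carlo simulation of the naive estimate. This is only an illustrative sketch: the function name `naive_winner_bias` and the Gaussian noise model with a common standard deviation are assumptions, and none of the paper's correction methods are implemented here.

```python
import numpy as np

def naive_winner_bias(true_means, noise_sd, n_sims, rng):
    """Monte Carlo estimate of the selective and global bias of the naive winner estimate."""
    true_means = np.asarray(true_means, dtype=float)

    # Each row is one simulated experiment: observed arm means = truth + noise.
    noise = noise_sd * rng.standard_normal((n_sims, true_means.size))
    observed = true_means + noise

    winner = observed.argmax(axis=1)  # arm that would be deployed
    naive = observed.max(axis=1)      # naive estimate of the winner's value

    # Selective: relative to the true mean of the arm actually selected.
    selective_bias = np.mean(naive - true_means[winner])
    # Global: relative to the true mean of the best arm.
    global_bias = np.mean(naive - true_means.max())
    return selective_bias, global_bias
```

In this setup the selective bias is never smaller than the global bias, because the selected arm's true mean cannot exceed the best arm's true mean. Both shrink toward zero as the arms become well separated relative to the noise.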

2605.18858 2026-05-20 cs.LG cs.AI cs.GT stat.ML

When Individually Calibrated Models Become Collectively Miscalibrated

Zhaohui Wang

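AI summary: This paper shows that in multi-agent settings the aggregate prediction can be miscalibrated even when every model is individually calibrated, and proposes VCG-based aggregation to address this, achieving incentive compatibility and near-optimal performance.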

Comments: 42 pages, 1 main figure, multiple tables. Accepted at ProbML 2026

English abstract

Probabilistic prediction systems often aggregate probability estimates from multiple models into a single decision. A common assumption is that if each model is individually calibrated, the aggregate prediction will also be well calibrated. We show that this assumption fails in multi-agent settings: individually calibrated predictors can become collectively miscalibrated when their predictions interact strategically, in the game-theoretic sense of Brier-optimal local response, even without deliberate coordination. This phenomenon arises naturally when agents are independently trained on overlapping data. We prove that under Brier-score-based aggregation with positively correlated beliefs, each agent's individually optimal report systematically underestimates the positive-class probability, yielding a Price of Anarchy greater than one whenever Cov(b_i, b_j) > 0. In a canonical setting (n = 5 agents, pairwise correlation = 0.5, base rate = 0.3), the empirically measured PoA in false-negative rate reaches 7.25x. In contrast, VCG-based aggregation aligns incentives by rewarding marginal contribution, achieving dominant-strategy incentive compatibility and near-optimal performance. Experiments on three real-world datasets (NSL-KDD, UNSW-NB15, Credit Card Fraud) show that VCG provides strong robustness while maintaining comparable accuracy. It performs particularly well in data-sparse and adversarial settings, and adaptive weighting further improves performance under distribution shift.

2605.18798 2026-05-20 cs.LG cs.IT math.IT math.ST stat.ML stat.TH

Accurate Evaluation of Quickest Changepoint Detectors via Non-parametric Survival Analysis

Taiki Miyagawa, Akinori F. Ebihara

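AI summary: This paper proposes non-parametric estimators for the average run length and average detection delay in quickest changepoint detection. By drawing an analogy between changepoint detection and survival analysis, it handles estimation under finite and irregular sequence lengths and improves robustness and interpretability.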

Comments: Accepted to ICML 2026. GitHub: https://github.com/TaikiMiyagawa/Kaplan-Meier-Average-Run-Length

English abstract

We propose non-parametric estimators for the average run length (ARL) and average detection delay (ADD) in quickest changepoint detection (QCD) under finite and irregular sequence lengths. Although ARL and ADD are widely used as optimality criteria in theoretical and simulation studies, their application to real-world datasets is hindered by limited and irregular sequence lengths. To address this issue, we propose non-parametric estimators for the ARL and ADD, termed KM-ARL and KM-ADD, by drawing an analogy between QCD and survival analysis to model detection probabilities under sequence truncation. We derive estimation bias bounds and prove that they are asymptotically unbiased unless extrapolation is required. Experiments on simulated and real-world datasets demonstrate their practical utility, enhancing robustness against limited and irregular sequence lengths, improving interpretability, and facilitating empirical, intuitive model selection. Our Python code is provided at https://github.com/TaikiMiyagawa/Kaplan-Meier-Average-Run-Length, offering ready-to-use implementations for practitioners.
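The survival-analysis analogy in this abstract can be sketched with a generic Kaplan-Meier calculation, treating a sequence that ends before any alarm as right-censored. This is a minimal illustration and not the authors' KM-ARL or KM-ADD code (see their repository for that): the function name `km_restricted_mean` and the choice to integrate the survival curve only up to the largest observed time are assumptions.

```python
import numpy as np

def km_restricted_mean(times, detected):
    """Area under the Kaplan-Meier survival curve up to the largest observed time.

    times: alarm time, or the sequence length if no alarm was raised.
    detected: True where an alarm was raised, False where the sequence
              ended first (right-censored).
    """
    times = np.asarray(times, dtype=float)
    detected = np.asarray(detected, dtype=bool)

    survival, prev_t, area = 1.0, 0.0, 0.0
    for t in np.unique(times):  # sorted distinct times
        area += survival * (t - prev_t)      # S is constant on [prev_t, t)
        at_risk = np.sum(times >= t)         # sequences still running at t
        events = np.sum((times == t) & detected)
        survival *= 1.0 - events / at_risk   # product-limit update
        prev_t = t
    return area
```

For example, with `times = [2, 4, 6]` and `detected = [True, False, True]`, averaging only the detected runs gives 4, while the Kaplan-Meier area is 14/3, since the censored sequence still contributes the information that no alarm occurred before time 4.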

2605.18786 2026-05-20 math.OC cs.NA math.NA stat.ME

Unbiased Gradients for a Class of Conditional Stochastic Optimization Problems

Miguel Alvarez, Ajay Jasra

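AI summary: This paper studies a class of conditional stochastic optimization problems and proposes a method that combines Markovian stochastic approximation with unbiased approximation methods to find the optimizer. The method is illustrated on parameter estimation with model averaging and on portfolio selection for high-dimensional full factor multivariate stochastic volatility models.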

English abstract

In this paper we consider the conditional stochastic optimization (CSO) problem. This consists of optimizing a function which can be written as the expectation of a function which is itself a function of a conditional expectation, i.e., of the type $F(ξ) := \mathbb{E}\left[f\left(Z,\mathbb{E}[g(Z,X,ξ)|Z]\right)\right]$, where precise definitions are given in the main text. We address a particular class of CSO problems where the joint law of the random variables $X,Z$ cannot be exactly sampled; this case has been addressed in Goda & Kitade (2023). We introduce a method that combines Markovian stochastic approximation with unbiased approximation methods which allows one to find the optimizer of $F(ξ)$ in the context of interest. We illustrate our methodology on two examples associated to parameter estimation with model averaging and portfolio selection associated to high-dimensional full factor multivariate stochastic volatility models.

2506.19958 2026-05-20 stat.ME econ.GN q-fin.EC stat.AP stat.CO

RobustiPy: An efficient next generation multiversal library with model selection, averaging, resampling, and explainable artificial intelligence

Daniel Valdenegro, Jiani Yan, Duiyi Dai, Charles Rahal

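AI summary: This paper presents RobustiPy, an efficient library for multiverse analysis that unifies resampling-based inference, combinatorial specification search, model selection and averaging, joint inference, and explainable AI methods in a single modular framework, with the aim of improving the transparency and reproducibility of empirical research.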

English abstract

Scientific inference is often undermined by the vast but rarely explored "multiverse" of defensible modelling choices, which can generate results as variable as the phenomena under study. We introduce RobustiPy, an open-source Python library that systematizes multiverse analysis and model-uncertainty quantification at scale. RobustiPy unifies bootstrap-based inference, combinatorial specification search, model selection and averaging, joint-inference routines, and explainable AI methods within a modular, reproducible framework. Beyond exhaustive specification curves, it supports rigorous out-of-sample validation and quantifies the marginal contribution of each covariate. We demonstrate its utility across five simulation designs and ten empirical case studies spanning economics, sociology, psychology, and medicine, including a re-analysis of widely cited findings with documented discrepancies. Benchmarking on ~672 million simulated regressions shows that RobustiPy delivers state-of-the-art computational efficiency while expanding transparency in empirical research. By standardizing and accelerating robustness analysis, RobustiPy transforms how researchers interrogate sensitivity across the analytical multiverse, offering a practical foundation for more reproducible and interpretable computational science.
