arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.28785 2026-05-28 stat.ME

Beyond Exchangeability: Distribution-Shift-Aware Integration of External Control Data in Randomized Trials

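Jiawei Shan, Yiteng Tu, Guanbo Wang, Chao Ying, Jiwei Zhao

AI summary: To handle distribution shift between a randomized trial and external control data, the paper proposes augmented estimators that balance the two populations through calibration equations, and develops an adaptive shrinkage estimator that preserves consistency while guaranteeing an efficiency gain over the trial-only benchmark.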

Abstract

Randomized controlled trials (RCTs) are the gold standard for evaluating causal effects but are often costly and difficult to scale; consequently, they are frequently augmented with auxiliary external controls in many applications. Prior approaches for borrowing such data typically rely on exchangeability, under which the external controls are readily usable for inference in the trial population. In practice, however, differences in eligibility criteria, standard of care, and data collection procedures may induce distribution shifts between the RCT and the external controls, rendering exchangeability implausible. In this paper, we propose a novel framework for integrating external controls by explicitly modeling these distribution shifts. We construct augmented estimators by adapting trial-only efficient influence functions through calibration equations that balance the trial and external populations, thereby fully exploiting the external control data even when exchangeability fails. We further develop an adaptive shrinkage estimator that preserves consistency while guaranteeing efficiency dominance over the trial-only benchmark. Synthetic experiments and a real data application demonstrate the practical advantages of the proposed approaches.

2605.28767 2026-05-28 cs.LG stat.ML

Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning

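Mehryar Mohri, Yutao Zhong

AI summary: Building on $H$-consistency theory, the paper designs decomposable surrogate losses and proposes the MMO family of algorithms for optimizing generalized linear-fractional metrics in multi-label learning, validating scalability and strong performance on large-scale datasets.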

Abstract

Many real-world classification tasks require predicting multiple labels per instance, necessitating the optimization of complex evaluation metrics such as the $F$-measure and Jaccard index. While the Empirical Utility Maximization (EUM) framework is natural for these population-level metrics, existing theoretical results are largely limited to asymptotic Bayes-consistency. In this paper, we develop principled learning algorithms for optimizing a broad class of generalized metrics within the EUM framework, grounded in the stronger notion of $H$-consistency. Our key contribution is the design of novel surrogate loss functions for multi-label learning that admit provable $H$-consistency bounds, enabling optimization with non-asymptotic guarantees tailored to the hypothesis class and finite samples. Crucially, we prove these combinatorially formulated surrogates decompose exactly, operating in strictly $O(l)$ time without approximations. Building on this foundation, we introduce MMO (Multi-Label Metric Optimization), a new family of algorithms for optimizing generalized linear-fractional metrics. We validate our approach through extensive experiments, demonstrating robust scalability and superior performance over state-of-the-art continuous baselines on large-scale datasets (MS-COCO, Reuters-21578) in high-sparsity, deep learning regimes. Our results offer both theoretical rigor and practical effectiveness for general multi-label metric optimization.
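As a small illustration of the metrics named in the abstract, the sketch below computes the $F$-measure and the Jaccard index from confusion counts. Both are ratios of linear functions of the true-positive, false-positive, and false-negative counts, which is the sense of "linear-fractional". The function names and the convention of returning 0.0 for an empty denominator are assumptions made here; this is not the MMO algorithm or its surrogate losses.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives over 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) TP / ((1 + beta^2) TP + beta^2 FN + FP)."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    return (1.0 + b2) * tp / denom if denom else 0.0  # assumed convention for 0/0


def jaccard(tp: int, fp: int, fn: int) -> float:
    """Jaccard index = TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0  # assumed convention for 0/0


# One instance with five candidate labels (hypothetical data).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
tp, fp, fn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, f_measure(tp, fp, fn), jaccard(tp, fp, fn))
```

Because neither metric decomposes into a sum of per-label losses, optimizing them directly calls for dedicated surrogates of the kind the abstract describes.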

2605.28762 2026-05-28 math.ST stat.AP stat.CO stat.ME stat.ML stat.TH

Deep Neural Networks for Doubly Robust Estimation with Nonprobability Survey Samples

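Yufang Dai, Shihua Luo, Wendy Lou, Zilin Wang, Xuewen Lu

AI summary: Proposes a deep-neural-network-assisted doubly robust framework that combines a nonprobability sample with a probability sample to estimate the finite population mean; the nonparametric sampling score is estimated by pseudo-likelihood, and consistency and convergence rates are established.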

Comments
29 pages, 1 figure
Abstract

Integrating probability and nonprobability survey samples is an important problem in modern survey sampling. Nonprobability samples often contain rich outcome information but may lack population representativeness, whereas probability samples provide design-based auxiliary information but may not contain the study variable. We propose a deep neural network (DNN)-assisted doubly robust framework for estimating the finite population mean from these two data sources. The proposed method models the logit sampling score for the nonprobability sample as an unknown nonparametric function and estimates it by maximizing a pseudo-likelihood that combines information from the nonprobability sample and a reference probability sample. The DNN parameters are optimized using the ADAM algorithm. The resulting DNN-estimated sampling scores are incorporated into a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator. We establish consistency and convergence rates under regularity conditions and evaluate the finite-sample performance of the proposed estimators through simulation studies and an empirical application using Pew Research Center and Behavioral Risk Factor Surveillance System data. The results suggest that the proposed estimators can improve robustness to parametric propensity-score misspecification, especially when the true selection mechanism is nonlinear.
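To make the doubly robust idea concrete, here is a minimal sketch of the form such an estimator commonly takes in the nonprobability-sampling literature: an inverse-probability-weighted average of outcome-model residuals on the nonprobability sample, plus a design-weighted average of outcome-model predictions on the probability sample. The abstract does not state the formula, so this form is an assumption, as are all names below. In the paper the sampling scores would come from the DNN fit; here both the scores and the predictions are taken as given inputs.

```python
from typing import Sequence


def doubly_robust_mean(
    y_np: Sequence[float],    # outcomes observed in the nonprobability sample
    pi_np: Sequence[float],   # estimated sampling scores for those units
    m_np: Sequence[float],    # outcome-model predictions for those units
    m_prob: Sequence[float],  # outcome-model predictions in the probability sample
    d_prob: Sequence[float],  # design weights of the probability sample
) -> float:
    """Residual correction on the nonprobability sample plus prediction term."""
    n_hat_np = sum(1.0 / p for p in pi_np)   # estimated population size from scores
    n_hat_prob = sum(d_prob)                 # estimated population size from design
    residual_term = sum((y - m) / p for y, m, p in zip(y_np, m_np, pi_np)) / n_hat_np
    prediction_term = sum(d * m for d, m in zip(d_prob, m_prob)) / n_hat_prob
    return residual_term + prediction_term


# Hypothetical inputs: a perfect outcome model leaves only the prediction term.
est_perfect_model = doubly_robust_mean(
    y_np=[1.0, 2.0], pi_np=[0.5, 0.25], m_np=[1.0, 2.0],
    m_prob=[1.0, 3.0], d_prob=[10.0, 30.0],
)
# A zero outcome model reduces the estimator to the (Hajek-type) IPW mean.
est_ipw_only = doubly_robust_mean(
    y_np=[1.0, 2.0], pi_np=[0.5, 0.25], m_np=[0.0, 0.0],
    m_prob=[0.0, 0.0], d_prob=[10.0, 30.0],
)
print(est_perfect_model, est_ipw_only)
```

The two calls show the two limiting cases of the estimator: it falls back on the prediction term when residuals vanish and on inverse-probability weighting when the outcome model contributes nothing.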

2605.28749 2026-05-28 econ.EM math.ST stat.ME stat.TH

IV regression with distribution-valued outcomes

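David Van Dijcke, Kaspar Wüthrich

AI summary: Proposes IV Fréchet regression (IVFR), an instrumental-variable method for outcomes that are entire distributions. It extends global Fréchet regression to endogenous covariates via IV regression in 2-Wasserstein space, proves that the projection step reduces estimation error and guarantees valid fitted distributions, and shows that the estimator converges weakly to a Gaussian process.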

Comments
37 pages, 4 figures, 2 tables
Abstract

We develop IV Fréchet regression (IVFR), an instrumental-variable (IV) method for settings where the outcome is an entire distribution. Framing the problem as an IV regression in 2-Wasserstein space, IVFR extends global Fréchet regression to the case with endogenous covariates. IVFR projects IV-weighted quantile curves onto the space of valid distributions and then recovers the corresponding regression coefficient functions. The projection provably reduces the estimation error in finite samples and guarantees valid fitted distributions. We show that the IVFR estimator converges weakly to a mean-zero Gaussian process and establish the validity of a multiplier bootstrap procedure for uniform inference. In simulations, the projection reduces the integrated mean squared error (IMSE) by up to 63% relative to existing methods. Revisiting the effects of Chinese import competition on the wage distribution within commuting zones, the proposed method produces 9-10% narrower confidence bands than existing methods. Using our novel uniform confidence bands, we find no evidence that import competition reduced wages at the very bottom of the distribution; we find reductions only between the 10th and 35th percentiles. We also revisit the effect of county food stamp programs on the county's birth weight distribution and find no significant effects.
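The projection step can be pictured with a small sketch. A curve on a quantile grid is a valid quantile function only if it is non-decreasing, and the closest non-decreasing sequence in squared error can be found with the pool-adjacent-violators algorithm. Treating the projection as an equal-weight isotonic regression on a uniform grid is an assumption made here for illustration; the paper's exact projection and weighting are not given in the abstract, and `project_to_monotone` is a hypothetical name.

```python
from typing import List, Sequence


def project_to_monotone(values: Sequence[float]) -> List[float]:
    """Least-squares projection onto non-decreasing sequences (pool adjacent violators)."""
    blocks: List[List[float]] = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1.0])
        # Merge backwards while the previous block mean exceeds the current block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    projected: List[float] = []
    for total, count in blocks:
        projected.extend([total / count] * int(count))
    return projected


# A hypothetical fitted "quantile curve" that dips, so it is not a valid quantile function.
raw_curve = [1.0, 3.0, 2.0, 4.0]
valid_curve = project_to_monotone(raw_curve)
print(valid_curve)
```

Curves that are already non-decreasing pass through unchanged, so the projection only acts where the unconstrained fit leaves the space of valid distributions.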

2605.28729 2026-05-28 stat.ML cs.LG

Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity

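Jürgen Dölz, Michael Multerer, Michele Palma

AI summary: Proposes a data-driven robustness framework based on the discrete modulus of continuity (DMOC), a nonlinear generalization of Lipschitz continuity, together with a scalable minibatch algorithm, enabling fine-grained robustness assessment relative to the data distribution.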

Abstract

Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent behavior. We propose a data-driven, architecture-agnostic framework based on the discrete modulus of continuity (DMOC), a nonlinear generalization of Lipschitz continuity that provides a finer notion of robustness. Unlike many existing approaches, DMOC does not require access to model internals and instead evaluates regularity relative to the data distribution. This shifts the focus from the model to the data, which provide a data-driven baseline of regularity against which the network's robustness is assessed. We establish convergence results for DMOC-induced seminorms with explicit data-driven rates in terms of the separation distance, and introduce a scalable minibatch algorithm that reduces the quadratic cost of exact computation, enabling application to large-scale data sets such as ImageNet. Empirically, DMOC serves as an architecture-independent diagnostic: it distinguishes trained from untrained networks, reveals underfitting and overfitting regimes, and yields, as a special case, tight Lipschitz estimates comparable to state-of-the-art methods such as ECLipsE and ECLipsE-fast.
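A rough sketch of the quantity involved, under an assumed definition: for a finite sample, take the largest output change over all pairs of inputs that lie within distance $\delta$ of each other, $\omega(\delta) = \max\{\lVert f(x_i) - f(x_j)\rVert : \lVert x_i - x_j\rVert \le \delta\}$. This is the textbook modulus of continuity restricted to data points; the paper's precise DMOC definition and its minibatch algorithm are not reproduced here, and the function name is hypothetical. The double loop also shows where the quadratic cost of exact computation comes from.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def discrete_modulus(points: Sequence[Point], outputs: Sequence[Point], delta: float) -> float:
    """Largest output distance over all pairs of inputs at distance <= delta."""
    worst = 0.0
    n = len(points)
    for i in range(n):                 # all pairs: O(n^2) distance evaluations
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= delta:
                worst = max(worst, math.dist(outputs[i], outputs[j]))
    return worst


# Hypothetical black-box model f(x) = x^2, evaluated only through its outputs.
points = [(0.0,), (0.5,), (1.0,), (3.0,)]
outputs = [(x[0] ** 2,) for x in points]
moduli = [discrete_modulus(points, outputs, d) for d in (0.5, 1.0, 2.0)]
print(moduli)
```

Only input and output pairs are needed, which matches the abstract's point that no access to model internals is required.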

2605.28679 2026-05-28 cs.LG stat.ML

Optimal ridge regularization revisited

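Jack Timmermans, Sergio A. Alvarez

AI summary: For linear ridge regression on a finite data sample, proposes an iterative algorithm that computes the optimal regularization strength from the generative parameters and proves its convergence at limited noise levels; experiments show that, combined with sample-based parameter estimates, it attains near-optimal generalization across a range of settings.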

Abstract

We consider $L^2$-regularized linear (ridge) regression over a finite data sample $X$ with bounded covariance and linear prediction targets $y$ with additive isotropic noise of finite variance. We present an iterative procedure to compute the optimal regularization strength numerically from the generative parameters in the fixed-$X$ setting and prove its convergence at limited noise levels. Our experimental evaluation over synthetic data shows that the proposed procedure combined with sample-based parameter estimates attains near-optimal random-$X$ generalization across a wide range of sample sizes, aspect ratios, and noise levels, at an added computational cost equivalent to one preliminary ridge regression in the underparameterized regime and two in the overparameterized case.
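For intuition on what "optimal regularization strength from the generative parameters" means, consider the scalar case, which is a simplification assumed here and not the paper's setting. With $y_i = \beta x_i + \varepsilon_i$, noise variance $\sigma^2$, and ridge estimate $\hat\beta_\lambda = S_{xy}/(S_{xx}+\lambda)$, the fixed-$X$ mean squared error of the coefficient is $\left(\lambda\beta/(S_{xx}+\lambda)\right)^2 + \sigma^2 S_{xx}/(S_{xx}+\lambda)^2$, which is minimized at $\lambda^\star = \sigma^2/\beta^2$. The sketch below checks this with a plain grid search, which stands in for the authors' iterative procedure; all names and numbers are hypothetical.

```python
def fixed_x_risk(lam: float, sxx: float, beta: float, sigma2: float) -> float:
    """Squared bias plus variance of the scalar ridge coefficient for a fixed design."""
    shrink = sxx + lam
    squared_bias = (lam * beta / shrink) ** 2
    variance = sigma2 * sxx / shrink ** 2
    return squared_bias + variance


# Hypothetical fixed design and generative parameters.
xs = [0.5, -1.0, 1.5, 2.0]
sxx = sum(x * x for x in xs)
beta, sigma2 = 2.0, 1.0

# Grid search over lambda in [0, 2] with step 0.001.
grid = [k / 1000 for k in range(0, 2001)]
best_lambda = min(grid, key=lambda lam: fixed_x_risk(lam, sxx, beta, sigma2))
closed_form = sigma2 / beta ** 2
print(best_lambda, closed_form)
```

In practice $\beta$ and $\sigma^2$ are unknown, which is why the abstract pairs the procedure with sample-based parameter estimates.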

2605.28653 2026-05-28 stat.ME

Adaptive clinical trials based on design-optimal e-values with automatic curtailment: An application to single-arm trials with binary data

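Stef Baas, Judith ter Schure, Joost van Rosmalen

AI summary: Proposes single-arm multi-stage clinical trial designs based on finite-horizon optimal e-values, constructed by dynamic programming to maximize statistical power or minimize expected sample size, and shows that they are competitive for binary data.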

Comments
19 pages, 4 figures, 1 table
Abstract

The e-value is gaining traction as a robust alternative to p-values and Bayes factors for quantifying statistical evidence. e-values are a promising method for adaptive clinical trials due to their anytime-validity: e-values ensure type I error rate control at any stopping time, facilitating repeated interim analyses, complex stopping rules, and valid inference under protocol deviations. The e-value literature focuses mostly on asymptotic optimality; however, sample sizes in clinical trials are often limited. To this end, we investigate e-value-based designs with finite-horizon optimality for single-arm multi-stage clinical trials with binary data. This setting is relevant in early-phase cancer trials, but it also facilitates an accessible introduction to the betting interpretation of e-values, which we use to construct e-values that either (1) maximize statistical power, or (2) minimize the expected sample size, with or without constraints on the minimum power. We construct these designs through (constrained) dynamic programming based on the currently observed e-value, the maximum sample size, and the pre-specified significance level. Using exact calculations, we show that, in addition to being robust, e-value-based designs can provide operating characteristics competitive with those of standard (non-)adaptive designs with and without futility stopping, and that they outperform growth-rate-optimal e-values in finite samples. In addition, small e-values automatically indicate that trial continuation is futile, e.g., an e-value of zero indicates the impossibility of an efficacy conclusion. Hence, e-value-based designs provide a viable alternative to the current state of the art in single-arm binary trials, warranting extension to other adaptive clinical trial settings such as multi-arm multi-stage and response-adaptive designs.
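The betting interpretation can be sketched with the simplest e-process for a single-arm binary trial: a running likelihood ratio of a fixed alternative response rate `p1` against the null rate `p0`. Each factor has expectation one under the null, so the running product is a fair bet, and rejecting once it reaches $1/\alpha$ controls the type I error at any stopping time. This fixed-alternative construction is an assumption chosen for illustration; it is not the design-optimal dynamic program of the paper, and the names and numbers are hypothetical.

```python
from typing import List, Optional, Sequence


def e_process(outcomes: Sequence[int], p0: float, p1: float) -> List[float]:
    """Running likelihood-ratio e-value after each binary outcome (1 = response)."""
    e_value = 1.0
    path = []
    for x in outcomes:
        e_value *= (p1 / p0) if x == 1 else ((1.0 - p1) / (1.0 - p0))
        path.append(e_value)
    return path


def first_rejection(path: Sequence[float], alpha: float) -> Optional[int]:
    """First sample size at which the e-value reaches 1/alpha, or None if never."""
    for n, e_value in enumerate(path, start=1):
        if e_value >= 1.0 / alpha:
            return n
    return None


# Hypothetical trial: null response rate 0.2, alternative 0.5, six patients.
path = e_process([1, 1, 0, 1, 1, 1], p0=0.2, p1=0.5)
stop_at = first_rejection(path, alpha=0.05)

# Betting everything on responses (p1 = 1): one non-response drives the e-value to zero.
all_in_path = e_process([1, 0, 1, 1], p0=0.2, p1=1.0)
print(path, stop_at, all_in_path)
```

The second path illustrates the abstract's remark on futility: once the e-value hits zero it stays there, so an efficacy conclusion is no longer possible.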

2605.28613 2026-05-28 math.OC cs.LG stat.ML

Implicit Regularization in Perturbed Deep Matrix Factorization: Spectral Conditions and Stability

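Jingzhe Wang, Hung-Hsu Chou

AI summary: Studies the stability of low-rank implicit regularization in perturbed deep matrix factorization: spectral conditions are derived to characterize the low-rank phase in the noiseless case, and gradient descent under perturbation is shown to converge while preserving the low-rank phase.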

Abstract

This paper studies the stability of low-rank implicit regularization in perturbed deep matrix factorization, where the target matrix is corrupted by a noise matrix. We first derive sufficient spectral conditions under which gradient descent exhibits a low-rank phase in the noiseless setting. These conditions show how the target spectrum, initialization, and step size jointly determine the existence of a nonempty low-rank interval. We then analyze the perturbed gradient descent dynamics, proving convergence guarantees and quantifying how the perturbation affects iteration complexity and eigenvalue recovery. Finally, we show that the low-rank phase persists under perturbation, with explicit dependence on the perturbation size. Numerical experiments support the theoretical findings.

2605.28559 2026-05-28 stat.ME

Sequential generalized kernel equating: Providing comparable scores across multiple test forms with nonequivalent groups and differently measured covariates

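Michaela Vařejková, Patrícia Martinková, Eva Potužníková

AI summary: Proposes sequential generalized kernel equating, which accounts for differences in covariate distributions so that covariates can be used to equate scores across multiple test forms when no anchor items are available; simulations and real data indicate reduced equating bias.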

Abstract

Test equating using covariates may be applied to provide comparable scores from multiple test forms when no anchor items are available. However, its performance may be compromised if some of the covariates themselves are measured using different test forms. In this work, we propose sequential generalized kernel equating to account for possible differences in the distribution of covariates used in the NEC design. We evaluate the proposed approach through a simulation study within the kernel equating framework. Results indicate that equating the covariate reduces bias in equated test scores, particularly when the covariate distributions differ and the correlation between the covariate and the test score is strong. A real data example from a national high school leaving examination further demonstrates the practical application.

2605.28516 2026-05-28 stat.ML cs.LG

Conservative neural posterior estimation via distributionally robust training

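William Laplante, Yuga Hikida, Charita Dellaporta, François-Xavier Briol, Ayush Bharti

AI summary: Proposes DRO-NPE, which replaces the standard NPE objective with a worst-case loss over a Wasserstein ambiguity set to control overfitting and reduce posterior overconfidence, improving coverage and calibration under low simulation budgets.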

Abstract

Simulation-based inference with neural posterior estimation (NPE) often yields overconfident and unreliable posteriors under limited simulation budgets. To address this, we propose DRO-NPE, a distributionally robust approach that replaces the standard NPE objective with a worst-case loss over a Wasserstein ambiguity set. We introduce KL-based metrics for miscoverage and miscalibration, and use these to show that the DRO-NPE objective controls overfitting and reduces posterior overconfidence. Our method is tractable, parallelisable, and readily integrates with standard normalising flows. Across benchmark SBI tasks, DRO-NPE consistently improves coverage and calibration, while narrowing the gap between empirical and population NPE loss, leading to more reliable inference in low-simulation regimes.

2605.28471 2026-05-28 stat.ME

The Modified Egger Intercept Tests for Detecting Horizontal Pleiotropy in Two-Sample Summary-Data Mendelian Randomization

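Yilei Ma, Youpeng Su, Xin Liu, Xuanye Cui, Ping Yin, Peng Wang

AI summary: To address the bias that measurement error and the winner's curse induce in the Egger intercept test for horizontal pleiotropy, proposes a modified Egger intercept test based on bias correction and combines two allele coding schemes for robustness; the resulting test improves on the original in type I error control and power.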

Abstract

The Egger intercept (EI) test is a widely used tool to detect horizontal pleiotropy in two-sample summary-data Mendelian randomization. A significant EI test suggests that either the average pleiotropic effect differs from zero (i.e., directional pleiotropy) or the InSIDE (Instrument Strength Independent of Direct Effect) assumption is violated (i.e., correlated pleiotropy) or both. As such, the EI test provides an assessment of the validity of the instrumental variable assumptions, with a non-zero EI indicating that the commonly used inverse-variance weighted (IVW) estimator will be biased. However, the EI test may exhibit inaccurate type one error rates due to biased estimation in Egger regression caused by measurement error and the winner's curse. In this article, we propose a modified EI (MEI) test based on a bias-corrected EI estimator under the null hypothesis of no directional or correlated pleiotropy, leveraging the recently developed rerandomized IVW estimator. We then prove the asymptotic properties of the MEI test under realistic conditions. We find that, as with the EI test, the power of the MEI test is affected by the orientation of SNPs. To enhance the robustness of power, we further combine the MEI test statistics obtained under two specific allele coding schemes. Both simulation and real data studies show that the combined test outperforms the EI test in terms of type one error control and power.
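For readers unfamiliar with the baseline being modified, the following sketch shows ordinary Egger regression: a weighted least-squares fit of SNP-outcome effects on SNP-exposure effects with an intercept, using inverse-variance weights from the outcome standard errors. The intercept is the EI estimate. SNPs are first oriented so that the exposure effects are non-negative, a common convention that is assumed here and that illustrates why the result depends on allele coding. This is the uncorrected estimator, not the MEI test; function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def orient_positive(beta_x: Sequence[float], beta_y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Flip the coded allele where needed so that every SNP-exposure effect is >= 0."""
    signs = [-1.0 if bx < 0 else 1.0 for bx in beta_x]
    return [s * bx for s, bx in zip(signs, beta_x)], [s * by for s, by in zip(signs, beta_y)]


def egger_regression(beta_x: Sequence[float], beta_y: Sequence[float], se_y: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares of beta_y on beta_x with intercept; returns (intercept, slope)."""
    weights = [1.0 / s ** 2 for s in se_y]
    total_w = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, beta_x)) / total_w
    y_bar = sum(w * y for w, y in zip(weights, beta_y)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, beta_x))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, beta_x, beta_y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical summary statistics; after orientation, beta_y = 0.02 + 0.5 * beta_x exactly.
raw_x = [-0.1, 0.2, 0.3, 0.4]
raw_y = [-0.07, 0.12, 0.17, 0.22]
se_y = [0.01, 0.02, 0.01, 0.02]
bx, by = orient_positive(raw_x, raw_y)
intercept, slope = egger_regression(bx, by, se_y)
print(intercept, slope)
```

With noise-free inputs the fit recovers the intercept exactly; with noisy exposure estimates and selected SNPs it would not, which is the bias the abstract attributes to measurement error and the winner's curse.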

2605.28429 2026-05-28 math.ST stat.TH

The 'Right' Extension of Type-I Error to Data-Dependent Levels

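Nick W. Koning

AI summary: Shows, via three axioms, that the extension of the Type-I error to data-dependent levels is unique, and uses this to support the common definition of the E-value.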

Abstract

The literature on hypothesis testing with data-dependent and post-hoc significance levels relies on a particular extension of the Type-I error to data-dependent levels. Existing arguments for this extension are heuristic, and primarily motivated by a resulting connection to the E-value. Our main contribution is to argue that the extension is 'right', by showing that it emerges from three axioms: it is the only extension that nests classical Type-I error validity for data-independent levels, preserves classical validity for data-dependent levels and is monotone in the strength of the rejection claim. We subsequently apply this result to support the common definition of the E-value, by showing that it arises as the 'right' notion of validity for the numerical representation of a generalized hypothesis test that may reject at different data-driven significance levels.

2605.28427 2026-05-28 cs.LG stat.ML

Latent Diffusion for Missing Data

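Alberte Heering Estad, Ignacio Peis, Jes Frellsen

AI summary: Proposes a two-stage framework that first learns latent representations from incomplete data with a robust VAE and then trains a diffusion model in that latent space; generation quality remains high at MCAR missing rates up to 50%, outperforming pixel-space diffusion.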

Abstract

Diffusion models have emerged as powerful generative approaches for missing-data imputation, yet most existing methods operate directly in data space and degrade when training data are heavily incomplete. We investigate whether shifting diffusion to a learned latent representation improves robustness under missing-completely-at-random (MCAR) corruption. To this end, we propose a two-stage framework: a robust VAE-based imputer first learns compact semantic features from incomplete observations, and a diffusion model is then trained in the resulting latent space. Across training missing rates, we perform a controlled comparison against pixel-space diffusion models under the same incomplete-data setting. The latent diffusion model maintains high sample quality and remains stable up to 50% missingness, while pixel-space diffusion degrades progressively as missingness increases. For downstream imputation, latent diffusion also achieves consistently better performance than pixel-space diffusion. These findings indicate that latent-space modeling mitigates artifact amplification from zero-imputed inputs and provides a more robust generative prior for incomplete-data learning. Overall, our results support latent diffusion as a strong and practically useful alternative to pixel-space diffusion for missing-data problems.
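The corruption setting in the abstract can be sketched in a few lines. Under MCAR, each entry is dropped independently of the data with a fixed probability, and a zero-imputed input simply writes 0 where an entry is missing. The helper below is a hypothetical illustration of that preprocessing only; the VAE imputer and the diffusion models are not sketched, and the name, seed, and data are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def mcar_zero_impute(
    rows: Sequence[Sequence[float]], missing_rate: float, seed: int = 0
) -> Tuple[List[List[float]], List[List[bool]]]:
    """Drop entries independently with probability missing_rate and fill them with 0.0.

    Returns the zero-imputed rows and the observation masks (True = observed).
    """
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    imputed, masks = [], []
    for row in rows:
        mask = [rng.random() >= missing_rate for _ in row]
        masks.append(mask)
        imputed.append([value if observed else 0.0 for value, observed in zip(row, mask)])
    return imputed, masks


# Hypothetical flattened "images" with four pixels each.
data = [[0.2, 0.9, 0.4, 0.7], [0.5, 0.1, 0.8, 0.3]]
corrupted, observed = mcar_zero_impute(data, missing_rate=0.5)
print(corrupted, observed)
```

A model trained directly on `corrupted` cannot tell a true zero from a missing entry unless it also receives the mask, which is one way to read the abstract's remark on artifact amplification from zero-imputed inputs.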

2605.28406 2026-05-28 math.ST math.PR stat.TH

Sharp inequalities between variance-based dependent sensitivity indices and Shapley effects: upper-bounds

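Matieyendou Lamboni

AI summary: Studies inequalities between variance-based Shapley effects and dependent sensitivity indices when inputs are dependent, shows that Shapley effects lie between the main and total dependent sensitivity indices, and provides several upper bounds that ease the identification of non-relevant inputs in high dimensions.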

Abstract

For models evaluated at a random set of independent variables, the variance-based Shapley effects range between Sobol' indices, and the corresponding total indices admit derivative-based upper bounds. Such relationships fail when the inputs are non-independent. This study investigates a general inequality link between the variance-based dependent sensitivity indices, recently introduced by us, and the variance-based Shapley effects for models with dependent inputs. It turns out that Shapley effects range between the main and total dependent sensitivity indices, and such indices are computationally more attractive. Moreover, different upper bounds of such indices are provided so as to ease the identification of non-relevant inputs in higher dimensions and, in some cases, to obtain practical estimates of total dependent indices. Some of these bounds rely on traditional gradients, while others rely on generalized sensitivity indices using dependency models.
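The independent-input statement in the first sentence can be checked on a toy example. Shapley effects are Shapley values of the normalized value function $v(S) = \operatorname{Var}(E[Y \mid X_S])/\operatorname{Var}(Y)$; the first-order Sobol' index of input $i$ is $v(\{i\})$ and the total index is $1 - v(\text{all inputs except } i)$. The sketch assumes $Y = X_1 + X_1 X_2$ with independent standard normal inputs, for which $\operatorname{Var}(Y) = 2$, $v(\{1\}) = 0.5$, $v(\{2\}) = 0$, and $v(\{1,2\}) = 1$. The model and function name are chosen here for illustration and do not come from the paper; the dependent-input indices it studies are not computed.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_effects(value: Callable[[FrozenSet[int]], float], d: int) -> List[float]:
    """Shapley values of a set function over inputs 0..d-1 by exact enumeration."""
    effects = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        total = 0.0
        for k in range(d):  # size of the coalition that input i joins
            weight = factorial(k) * factorial(d - k - 1) / factorial(d)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        effects.append(total)
    return effects


# Normalized value function for Y = X1 + X1*X2 (inputs indexed 0 and 1).
values = {frozenset(): 0.0, frozenset({0}): 0.5, frozenset({1}): 0.0, frozenset({0, 1}): 1.0}
effects = shapley_effects(lambda s: values[s], d=2)
main_indices = [values[frozenset({i})] for i in range(2)]
total_indices = [1.0 - values[frozenset({0, 1}) - {i}] for i in range(2)]
print(main_indices, effects, total_indices)
```

Each Shapley effect falls between the corresponding main and total index, and the effects sum to one. The enumeration is exponential in the number of inputs, which is why cheaper bounding indices are attractive.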

2605.28364 2026-05-28 stat.ML cs.LG

Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

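Wonyoung Kim, Min-Hwan Oh, Garud Iyengar, Assaf Zeevi

AI summary: For reinforcement learning with multinomial logit function approximation, proposes a computationally efficient variance-adaptive algorithm that attains an instance-wise optimal regret bound, with experiments showing gains over conventional methods.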

Abstract

Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case analysis, they do not capture how performance depends on the variability of the interaction between the learner and the environment. In this paper, we develop a new theoretical analysis for MNL-based Markov decision processes that yields explicit variance-adaptive regret bounds. Our algorithm is computationally efficient and achieves the instance-wise optimal rate of regret, narrowing the gap between upper and lower bounds. Our numerical experiments validate that our method learns optimal policies more efficiently than conventional approaches.

2605.28344 2026-05-28 stat.AP

Capturing the Curve: Functional Data Analysis for Validated Digital Outcome Measures

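Mia S. Tackney, Marcos Matabuena, Marco Palma, Michael Wester, Claire Maassen, Thomas Krammer, Julian Mustroph, Peter H. Charlton, James Carpenter, Sofia S. Villar

AI summary: Proposes between-subject scores from Multilevel Functional Principal Component Analysis (MFPCA) as low-dimensional representations of digital health data, and shows through simulation and a worked example that they compare favorably with pre-defined scalars in reliability, discrimination, and detection of clinical change.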

Abstract

Digital health technologies enable high-frequency collection of data in near-continuous time and capture rich information about the health of individuals. The raw data collected by these devices often have a hierarchical functional structure: repeated physiological functions are observed over time and on multiple time scales (seconds, days, weeks). While many summaries can be derived from digital data, typically, only a small subset of pre-defined scalars is validated as outcome measures in clinical trials. We explore data-driven summaries based on between-subject scores from Multilevel Functional Principal Component Analysis (MFPCA), which are low-dimensional representations of functional data with robust statistical properties. Specifically, we compute MFPCA projection scores with respect to a reference population, summarising how individuals differ from the dominant directions of variation at each hierarchical level. Through a simulation study based on smartwatch electrocardiogram (ECG) signals, we compare MFPCA scores with pre-specified summaries in terms of validation criteria, including test-retest reliability and known-groups discrimination. We demonstrate that MFPCA scores generally have high reliability and can discriminate between groups across simulated scenarios of change. This offers an advantage when digital tools enable the measurement of novel physiological signals and the characteristics of the change are not yet defined. Finally, using knee flexion-extension data from individuals living with Parkinson's disease, we demonstrate that one of the MFPCA scores more strongly correlates with established gold-standard metrics and can detect clinical change, compared to a pre-specified scalar. We conclude that MFPCA-derived scores retain more information than typical outcome measures and open the door to using learning representation strategies in clinical trial settings.

2605.28340 2026-05-28 stat.ML cs.LG

Decision-focused learning for optimal PV-Battery scheduling

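Joris Depoortere, Hussain Kazmi, Johan Driesen

AI summary: Proposes a decision-focused learning framework that trains an LSTM photovoltaic forecaster to minimize battery scheduling cost, lowering average electricity costs by 3.6% relative to the conventional two-stage approach and underscoring the value of aligning forecasting with the optimization objective.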

Journal ref
Journal of Energy Storage Volume 154, Part A, 10 April 2026, 121152
Abstract

The use of residential photovoltaics has increased dramatically in recent years. With battery systems becoming more affordable, the optimal operation of a photovoltaic-battery system can bring significant savings to households. Optimal control requires correct forecasts of underlying parameters, such as photovoltaic power generation, to schedule the battery. While forecasting models have become increasingly accurate due to algorithmic advances and data availability, accuracy is typically measured in generic metrics which might not align with the downstream application. This study proposes a decision-focused learning framework that integrates optimization and prediction by training a Long Short-Term Memory photovoltaic energy forecaster on the downstream optimal scheduling of a battery system. The proposed methodology is compared against a standard two-phase approach. Across a 14-month evaluation period, the decision-focused method reduced average electricity costs across twenty buildings by 3.6% when normalized against performance bounds defined by a perfect forecast and a baseline of no optimization. Critically, this financial improvement was achieved despite the model exhibiting a root mean squared error of 19.9%, significantly higher than the decoupled model's 8.2%. Warm-starting the decision-focused model further improves results, lowering average cost by approximately 8%, while also mitigating the negative impact on statistical accuracy (root mean squared error of 13.7%). The findings are statistically significant at the 0.001 level across the twenty households and for each household individually. These results demonstrate that aligning forecast models with optimization goals is key for achieving cost advantages in PV-battery systems. Future research should replicate these findings on other datasets, alternate forecasting models and alternate optimization algorithms.

2605.28339 2026-05-28 math.ST stat.ME stat.TH

From nonstationarity to stationarity via $1/f$ noise: discrete Fourier transforms and sample mean asymptotics for testing

从非平稳到平稳通过 $1/f$ 噪声:用于检验的离散傅里叶变换和样本均值渐近性

Mohamedou Ould Haye, Anne Philippe

AI总结 本文研究具有长记忆和非平稳性的时间序列的统计量渐近行为,推导离散傅里叶变换的联合极限分布,并构造一个参数无关的检验来区分非平稳性与长记忆平稳性。

详情
AI中文摘要

我们研究了表现出长记忆和非平稳性的时间序列的不同统计量的渐近行为。对于记忆参数 $d\in(-1/2,3/2)$ 的过程,我们推导了在固定数量的傅里叶频率处离散傅里叶变换的联合极限分布,采用统一的归一化。得到的极限是高斯分布,具有显式的协方差结构。特别关注边界情况 $d=1/2$,也称为 $1/f$ 噪声。我们证明对数校正为样本均值和样本方差提供了非退化极限,从而得到 $\chi^2$ 类型的显式渐近分布。我们构造了一个结合样本均值、样本方差和低频周期图坐标的统计量,设计使得在边界情况 $(d=1/2)$ 下,它具有易于处理的极限分布。这些结果被应用于构造一个一致的、参数无关的检验,用于区分非平稳性与长记忆平稳性。

英文摘要

We study the asymptotic behaviour of different statistics for time series exhibiting long memory and nonstationarity. For processes with memory parameter $d\in(-1/2,3/2)$, we derive the joint limiting distribution of discrete Fourier transforms at a fixed number of Fourier frequencies, with a unified normalization. The resulting limits are Gaussian with an explicit covariance structure. Particular attention is given to the boundary case $d=1/2$, also known as $1/f$ noise. We show that logarithmic corrections yield nondegenerate limits for the sample mean and sample variance, leading to explicit asymptotic distributions of $\chi^2$ type. We construct a statistic that combines the sample mean, the sample variance, and low-frequency periodogram ordinates, designed so that, at the boundary case $(d=1/2)$, it admits a tractable limit distribution. These results are applied to construct a consistent parameter-free test of nonstationarity against long memory stationarity.
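
For readers unfamiliar with the objects named in this abstract, the following is a minimal sketch of discrete Fourier transforms and periodogram ordinates evaluated at the first few Fourier frequencies $\lambda_j = 2\pi j/n$. The $1/\sqrt{2\pi n}$ scaling is one common textbook convention and is an assumption here; the paper's unified normalization (and its logarithmic corrections at $d=1/2$) is not reproduced. The function names are hypothetical.

```python
import cmath
import math


def dft_at_fourier_frequencies(x, m):
    """DFT of the series x at the Fourier frequencies 2*pi*j/n, j = 1..m.

    Uses the 1/sqrt(2*pi*n) scaling (an assumed textbook convention).
    """
    n = len(x)
    values = []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n
        total = sum(x_t * cmath.exp(-1j * lam * t) for t, x_t in enumerate(x, start=1))
        values.append(total / math.sqrt(2.0 * math.pi * n))
    return values


def periodogram(x, m):
    """Periodogram ordinates: squared modulus of the DFT at the first m frequencies."""
    return [abs(w) ** 2 for w in dft_at_fourier_frequencies(x, m)]


# A constant series carries no mass at nonzero Fourier frequencies.
flat = periodogram([1.0] * 8, 3)
# A pure cosine at the first Fourier frequency concentrates its mass there.
wave = periodogram([math.cos(2.0 * math.pi * t / 8) for t in range(1, 9)], 1)
```

A statistic of the kind described in the abstract would combine a few such low-frequency ordinates with the sample mean and sample variance; how they are combined is specific to the paper and is not attempted here.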

2605.28290 2026-05-28 cs.LG cs.GT stat.ML

Adaptive Bandit Algorithms for Contextual Matching Markets

上下文匹配市场的自适应Bandit算法

Shiyun Lin, Simon Mauras, Vianney Perchet, Nadav Merlis

AI总结 针对上下文匹配市场中的bandit学习问题,提出自适应算法,在随机和对抗性上下文下分别实现实例相关的多对数遗憾上界和次线性遗憾界。

详情
Comments
Accepted to ICML 2026
AI中文摘要

我们研究匹配市场中的bandit学习,其中玩家和臂构成市场的两侧,玩家的效用与臂上下文呈线性关系。每一轮,新臂带着可观测的上下文到达。然后,算法将它们与玩家匹配,旨在最小化每个玩家相对于稳定匹配基准的遗憾。这种上下文结构带来了显著的复杂性:微妙的上下文偏移可能轻微改变一个玩家的效用,同时完全重构底层基准,导致其他玩家出现大的遗憾峰值。我们在两种设置下解决这个问题:随机上下文(从潜在分布中抽取)和对抗性上下文(可能是任意的)。对于随机情况,我们引入了一个新颖的最小偏好差距来捕捉学习难度,并提供了一种完全自适应的算法,具有实例相关的多对数遗憾上界。我们还在温和的分布假设下建立了匹配的实例无关遗憾上界和下界。对于对抗性设置,我们提出了一种在任意上下文下仍然有效的可处理遗憾概念,并通过自适应算法实现了实例无关的次线性遗憾界。

英文摘要

We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm matches them to players, aiming to minimize each player's regret against a stable matching benchmark. This contextual structure creates significant complexity: subtle context shifts can slightly alter one player's utility while completely reconfiguring the underlying benchmark, causing large regret spikes for others. We address this in two settings: stochastic contexts, drawn from a latent distribution, and adversarial contexts, which may be arbitrary. For the stochastic case, we introduce a novel minimum preference gap to capture learning difficulty and provide a fully adaptive algorithm with an instance-dependent poly-logarithmic regret upper bound. We also establish matching instance-independent regret upper and lower bounds under a mild distributional assumption. For the adversarial setting, we propose a tractable regret notion that remains valid under arbitrary contexts and achieves an instance-independent sublinear regret bound via an adaptive algorithm.
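
The stable matching benchmark mentioned above can be illustrated with the classical player-proposing deferred acceptance (Gale-Shapley) procedure, where each player ranks arms by a utility that is linear in the arm context. This is only a sketch of the benchmark under known parameters, not the paper's learning algorithm; the names `thetas`, `contexts`, and `arm_prefs` are hypothetical, and arm-side preferences are assumed to be given as explicit rankings.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def player_optimal_stable_matching(thetas, contexts, arm_prefs):
    """Player-proposing deferred acceptance.

    thetas[p]    : parameter vector of player p (utility = <theta_p, context_a>)
    contexts[a]  : observable context vector of arm a
    arm_prefs[a] : list of players, from most to least preferred by arm a
    Returns a dict mapping each matched player to an arm.
    """
    n_players, n_arms = len(thetas), len(contexts)
    # Each player ranks arms by decreasing linear utility.
    rankings = [
        sorted(range(n_arms), key=lambda a: -dot(thetas[p], contexts[a]))
        for p in range(n_players)
    ]
    rank_of = [{p: r for r, p in enumerate(prefs)} for prefs in arm_prefs]
    next_choice = [0] * n_players
    holder = {}  # arm -> player currently held by that arm
    free = list(range(n_players))
    while free:
        p = free.pop()
        if next_choice[p] >= n_arms:
            continue  # p has been rejected by every arm and stays unmatched
        a = rankings[p][next_choice[p]]
        next_choice[p] += 1
        q = holder.get(a)
        if q is None:
            holder[a] = p
        elif rank_of[a][p] < rank_of[a][q]:
            holder[a] = p  # arm a prefers the new proposer
            free.append(q)
        else:
            free.append(p)
    return {p: a for a, p in holder.items()}


# Both players prefer arm 0 (larger first coordinate); arm 0 prefers player 1.
matching = player_optimal_stable_matching(
    thetas=[[1.0, 0.0], [1.0, 0.0]],
    contexts=[[2.0, 0.0], [1.0, 0.0]],
    arm_prefs=[[1, 0], [0, 1]],
)
```

Because rankings come from inner products with the contexts, a small change in one context can reorder a player's ranking and change the whole matching, which is the sensitivity the abstract points to.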

2605.28269 2026-05-28 cs.LG stat.ME

Dynamic Topic Modeling with a Higher-Order Hypergraphical Representation

基于高阶超图表示的动态主题建模

Hanjia Gao, Hanwen Ye, Qing Nie, Annie Qu

AI总结 针对传统主题模型忽略词间高阶交互和动态语料中语义重叠的问题,提出超图表示文本并构建动态主题建模框架,通过结构化低秩分解和时间正则化实现,理论保证收敛性和误差界,实验优于现有模型。

详情
Comments
34 pages, 4 figures
AI中文摘要

动态主题建模被广泛用于分析科学文献、医疗记录和社交媒体中的演变趋势。传统主题模型通过多项单纯形上的单个概率向量表示每个主题,并将词的出现和重复隐式耦合在一个概率机制中。然而,这种表述限制了词之间的依赖结构,并忽略了信息丰富的高阶交互,特别是在具有重叠语义的动态语料中。为了解决这些局限性,我们引入文本的超图表示,其中每个文档被建模为一个连接所有共现词的超边,重复强度编码为节点权重。这种表示自然地将词的出现与重复分开,并引入了一种新颖的基于超图的多项分布,其非线性归一化取决于每个文档的观测词集。基于此似然,我们通过结构化低秩分解和主题-词轮廓上的显式时间正则化,开发了一个动态主题建模框架。此外,尽管双线性分解和文档特定的非线性归一化导致了内在的非凸性,我们仍建立了局部收敛保证并推导了非渐近误差界。在合成数据上的数值实验以及在国际学习表征会议(ICLR)语料库上的应用表明,该方法比现有的基于多项式的主题模型具有一致的改进。

英文摘要

Dynamic topic modeling is widely used to analyze evolving trends in scientific literature, medical records, and social media. Traditional topic models represent each topic through a single probability vector on the multinomial simplex and implicitly couple word occurrence and repetition within one probabilistic mechanism. However, this formulation restricts the dependence structure among words and overlooks informative higher-order interactions, particularly in dynamic corpora with overlapping semantics. To address these limitations, we introduce a hypergraph representation of text where each document is modeled as a hyperedge connecting all co-occurring words, with repetition intensities encoded as node weights. This representation naturally separates word occurrence from repetition and induces a novel hypergraph-based multinomial distribution with a nonlinear normalization depending on the observed word set of each document. Building on this likelihood, we develop a dynamic topic modeling framework via structured low-rank factorizations with explicit temporal regularization on topic-word profiles. Moreover, we establish local convergence guarantees and derive non-asymptotic error bounds despite the intrinsic nonconvexity induced by bilinear factorization and document-specific nonlinear normalization. Numerical experiments on synthetic data and an application to the International Conference on Learning Representations (ICLR) corpus demonstrate consistent improvements over existing multinomial-based topic models.

2605.28267 2026-05-28 cs.LG stat.ML

Parameter-Efficient Generative Modeling with Controlled Vector Fields

基于受控向量场的参数高效生成建模

Peyman Morteza

AI总结 提出一种基于Chow-Rashevskii定理的连续时间生成建模框架,通过少量固定向量场和学习的标量控制构建表达流,实现参数高效的分布变换。

详情
AI中文摘要

受Chow-Rashevskii定理启发,我们引入了一个连续时间生成建模框架,该框架从一小组固定向量场和学习的标量控制中构建表达流。我们的框架不是学习无约束的高维向量场,而是通过学习标量控制函数来调制固定向量场,从而构造速度。当固定场是括号生成时,它们的李代数张成整个空间,提供了一种仅用少量学习控制通道即可实现表达性传输的机制,并为标准向量场参数化提供了一种参数高效的几何替代方案。这种解耦公式产生了一个结构化和可解释的生成模型,其中学习的标量输出通道的数量可以独立于环境维度选择。我们制定了一个表达性原则,表明在适当的可控性和适定性假设下,这种受控流可以将源分布传输到目标分布。我们使用连续归一化流似然目标训练所得模型,并在合成分布上进行了概念验证实验。

英文摘要

We introduce a continuous-time generative modeling framework, motivated by the Chow-Rashevskii theorem, that builds expressive flows from a small set of fixed vector fields and learned scalar controls. Instead of learning an unconstrained high-dimensional vector field, our framework constructs the velocity by modulating fixed vector fields with learned scalar control functions. When the fixed fields are bracket-generating, their Lie algebra spans the ambient space, providing a mechanism for expressive transport with only a small number of learned control channels and offering a parameter-efficient geometric alternative to standard vector-field parameterizations. This decoupled formulation yields a structured and interpretable generative model in which the number of learned scalar output channels can be chosen independently of the ambient dimension. We formulate an expressivity principle showing that, under suitable controllability and well-posedness assumptions, such controlled flows can transport a source distribution to a target distribution. We train the resulting model using a continuous-normalizing-flow likelihood objective and present proof-of-concept experiments on synthetic distributions.
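
To make the bracket-generating condition concrete, the sketch below uses two fixed vector fields on $\mathbb{R}^3$ (the standard Heisenberg pair, chosen here purely for illustration and not taken from the paper) and checks numerically that their Lie bracket supplies the missing third direction. The velocity is then a combination of the two fixed fields weighted by scalar controls; in the paper these controls are learned functions, whereas here they are plain numbers.

```python
def field_x(p):
    """Fixed field X = d/dx - (y/2) d/dz."""
    x, y, z = p
    return (1.0, 0.0, -0.5 * y)


def field_y(p):
    """Fixed field Y = d/dy + (x/2) d/dz."""
    x, y, z = p
    return (0.0, 1.0, 0.5 * x)


def directional_derivative(field, p, v, h=1e-5):
    """Central-difference derivative of `field` at p in direction v."""
    plus = field(tuple(pi + h * vi for pi, vi in zip(p, v)))
    minus = field(tuple(pi - h * vi for pi, vi in zip(p, v)))
    return tuple((a - b) / (2.0 * h) for a, b in zip(plus, minus))


def lie_bracket(f, g, p):
    """[f, g](p) = Dg(p) f(p) - Df(p) g(p)."""
    dg_f = directional_derivative(g, p, f(p))
    df_g = directional_derivative(f, p, g(p))
    return tuple(a - b for a, b in zip(dg_f, df_g))


def controlled_velocity(p, controls):
    """Velocity u1 * X(p) + u2 * Y(p) with two scalar control values."""
    u1, u2 = controls
    return tuple(u1 * a + u2 * b for a, b in zip(field_x(p), field_y(p)))


point = (0.3, -1.2, 2.0)
bracket = lie_bracket(field_x, field_y, point)  # approximately (0, 0, 1)
velocity = controlled_velocity(point, (2.0, -1.0))
```

Here two control channels suffice in a three-dimensional space because X, Y, and [X, Y] together span it at every point, which is the hypothesis of the Chow-Rashevskii theorem.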

2605.28251 2026-05-28 stat.ML cs.CY cs.LG

Counterfactually Fair Regression via Optimal Transport

通过最优传输实现反事实公平回归

M. Generali Lince, S. Gaucher, J-J. Vie, P. Loiseau

AI总结 本文采用因果不确定性视角,通过重采样噪声定义反事实公平性,提出基于最优传输的后处理估计器,并证明其有限样本公平性保证和风险界。

详情
AI中文摘要

我们考虑学习一个反事实公平回归器的问题。我们采用因果不确定性视角,其中反事实公平性通过重采样噪声定义。我们专注于为一种新的后处理估计器获得理论公平性保证。我们首先证明反事实公平性等价于满足以潜在变量为条件的群体均等。这使我们能够通过重心分位数映射提供最优公平回归器的闭式表达式。为了处理连续潜在变量,我们提出了一种离散化的后处理方法。然后,在温和的正则性假设下,我们证明了我们的估计器具有高概率的有限样本公平性保证,不公平性衰减率为 $\tilde O(n^{-1/3})$,并建立了匹配的风险界 $\tilde O(n^{-1/3})$。我们给出了几乎公平预测的过剩风险的匹配下界。最后,我们将结果扩展到宽松反事实公平性的设置。我们在真实世界和合成数据上验证了我们的方法。

英文摘要

We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new post-processing estimator. We begin by showing that counterfactual fairness is equivalent to satisfying demographic parity conditional on the latent variable. This allows us to provide a closed-form expression of the optimal fair regressor via a barycentric quantile map. In order to handle continuous latent variables, we propose a discretized post-processing method. Then, under mild regularity assumptions, we prove high-probability finite-sample fairness guarantees for our estimator, providing an unfairness decay at rate $\tilde O(n^{-1/3})$, and establishing a matching risk bound of order $\tilde O(n^{-1/3})$. We provide a matching lower bound on the excess risk of almost fair predictions. Finally, we extend our results to the setting of relaxed counterfactual fairness. We validate our approach on real-world and synthetic data.
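
As a rough illustration of a barycentric quantile map, the sketch below post-processes scores so that each group's score distribution is pushed to a weighted average of the group quantile functions: a score $y$ from group $s$ is mapped to $\sum_{s'} p_{s'} Q_{s'}(F_s(y))$. This is the plain demographic-parity version built from empirical CDFs and quantiles. The paper applies such a map conditionally on a (discretized) latent variable, which is not reproduced here, and all names are hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf(sorted_vals, y):
    """Fraction of values less than or equal to y."""
    return bisect_right(sorted_vals, y) / len(sorted_vals)


def empirical_quantile(sorted_vals, u):
    """Generalized inverse CDF: smallest value whose empirical CDF is at least u."""
    n = len(sorted_vals)
    k = max(0, min(n - 1, math.ceil(u * n) - 1))
    return sorted_vals[k]


def barycentric_map(scores_by_group, group, y):
    """Map score y of `group` to the weighted average of all group quantiles."""
    total = sum(len(v) for v in scores_by_group.values())
    sorted_scores = {g: sorted(v) for g, v in scores_by_group.items()}
    u = empirical_cdf(sorted_scores[group], y)
    return sum(
        len(v) / total * empirical_quantile(v, u) for v in sorted_scores.values()
    )


scores = {"a": [0.0, 1.0, 2.0, 3.0], "b": [10.0, 11.0, 12.0, 13.0]}
fair_a = barycentric_map(scores, "a", 1.0)   # median-level score in group a
fair_b = barycentric_map(scores, "b", 11.0)  # median-level score in group b
```

Two individuals at the same within-group rank receive the same adjusted score, so the adjusted score distribution no longer depends on the group.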

2605.28233 2026-05-28 stat.ML cs.CY cs.LG

Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings

松弛公平回归的几何:统一感知与无感知设置框架

M. Generali Lince, V. Divol, R. Flamary, S. Gaucher, P. Loiseau

AI总结 本文通过最优传输理论统一了感知与无感知设置下的公平回归问题,提出了基于Wasserstein-2和全变差惩罚的算法,在松弛公平约束下实现准确预测。

详情
AI中文摘要

公平-准确权衡是部署公平感知机器学习方法的核心问题。当敏感属性在推理时不可用——即所谓的无感知设置时,在松弛公平约束下获得准确预测的原则性方法基本缺失。在这项工作中,我们通过将人口统计平价惩罚下的回归问题表述为最优传输问题来填补这一空白。我们的框架统一了感知和无感知设置,并通过最优传输映射刻画了在平方Wasserstein-2和全变差惩罚下的最优预测函数。这些结果表明,惩罚的选择反映了根本不同的公平哲学:Wasserstein惩罚诱导出平滑的、群体范围内的妥协,而全变差惩罚则对个体子集强制执行精确的平价。基于这些理论刻画,我们提出了一种易于实现、计算高效且在实际基准测试中始终匹配或超越最先进基线的算法。

英文摘要

Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining accurate predictions under relaxed fairness constraints are largely missing. In this work, we address this gap by formulating regression under a demographic parity penalty as an optimal transport problem. Our framework unifies both the aware and unaware settings and characterizes optimal prediction functions via optimal transport maps, under both squared Wasserstein-2 and Total Variation penalties. These results reveal that the choice of penalty reflects fundamentally different fairness philosophies: the Wasserstein penalty induces a smooth, population-wide compromise, while Total Variation enforces exact parity for a subset of individuals. Building on these theoretical characterizations, we propose an algorithm that is simple to implement, computationally efficient, and consistently matches or outperforms state-of-the-art baselines on real-world benchmarks.

2605.28212 2026-05-28 stat.AP

How to measure intra-physician variability in clinical decision-making?

如何衡量临床决策中医生的个体内变异性?

Alaedine Benani, Pierre Meneton, Emmanuel Messas, Liza Hettal, Sai Sagireddy, Damien Grosgeorge, Jérôme Salomon, Sylvain Bodard, Xavier Tannier

AI总结 本研究通过基准测试八种方法(如学习权重匹配、互信息加权匹配等)在94种实验条件下评估医生处方不一致性,发现学习权重匹配误差最低,且监督方法在医生异质性下保持排序准确性。

详情
Comments
24 pages, 7 tables, 3 figures
AI中文摘要

医生处方变异性,即一位医生对两个在观测协变量上被认为可比较的患者做出不一致决策的概率,对护理质量、安全性和成本具有重大影响。然而,目前尚无经过验证的测量方法。在此,我们在94种实验条件下,针对合成真实数据基准测试了八种方法(欧几里得、马氏距离、学习权重、遗传马氏距离、随机森林邻近度、互信息加权、潜在剖面分析和贝叶斯二项广义线性混合模型)。学习权重匹配实现了最低的平均绝对误差(0.027),其次是互信息加权匹配(0.028)和随机森林邻近度(0.034)。只要医生变异性分组分离良好,所有八种不一致分析方法都能高保真地保持医生排名顺序(在SCORE2实验中与真实数据的斯皮尔曼相关系数>0.89)。在连续异质性医生模型下,无监督方法的排名保持性显著下降(斯皮尔曼相关系数=[0.28, 0.35]),但监督特征加权方法和广义线性混合模型仍能保持(斯皮尔曼相关系数=[0.62, 0.68])。这项受控的方法学评估为在观察性处方数据上的验证奠定了基础。一旦在观察性处方数据上得到验证,这些经过评估的开源估计器可将处方不一致性转化为常规可测量的临床医生级质量指标,系统性地补充现有关于医生间变异性的文献。

英文摘要

Intra-physician prescribing variability, the probability that one physician issues discordant decisions for two patients deemed comparable on observed covariates, has a major impact on quality of care, safety, and cost, yet no validated measurement method is known. Here, we benchmark eight methods (Euclidean, Mahalanobis, Learned-Weights, Genetic Mahalanobis, Random Forest proximity, Mutual-Information-weighted, Latent Profile Analysis and Bayesian binomial generalized linear mixed model) against a synthetic ground truth across 94 experimental conditions. Learned-Weights matching achieves the lowest mean absolute error (0.027), followed by Mutual-Information-weighted matching (0.028) and RF Proximity (0.034). All eight discordance-analysis methods preserve the physician rank ordering with high fidelity (Spearman > 0.89 versus the ground truth on the SCORE2 experiment), as long as the physician variability groups are well separated. Under a continuous-heterogeneity physician model, rank preservation degrades substantially for unsupervised methods (Spearman = [0.28, 0.35]) but is retained by supervised feature-weighted methods and the GLMM (Spearman = [0.62, 0.68]). This controlled methodological evaluation is a foundation for validation on observational prescribing data. Once validated on such data, these evaluated open-source estimators could turn prescribing inconsistency into a routinely measurable clinician-level quality metric, systematically complementing the existing literature on between-physician variation.

2605.28134 2026-05-28 math.OC stat.ML

Convergence of empirical subgradients for optimal transport-based objectives

基于最优传输目标的经验次梯度的收敛性

Tam Le

AI总结 研究由采样传输成本定义的参数化目标,证明其次微分的图形收敛到总体目标的次微分,确保标准次梯度方法一致地逼近总体问题的稳定点。

详情
AI中文摘要

最优传输被广泛用于学习分布、强制执行分布约束和建模不确定性。在实际应用中,传输损失通常通过可处理的表示(如一维排序公式或切片Wasserstein成本)从样本中计算,使其成为训练流程中的实用组件。我们研究了由采样传输成本定义的参数化目标,并证明了其次微分的图形收敛到总体目标的次微分。特别地,这确保了标准次梯度方法一致地逼近总体问题的稳定点。我们在几个设置中说明了结果,包括风险厌恶优化、公平约束学习和切片Wasserstein问题。我们的分析强调,平滑参数化为统计一致性和优化之间提供了有利的接口。相比之下,具有非光滑成本和模型的传输目标在大样本极限下可能表现出不稳定的导数。

英文摘要

Optimal transport is widely used to learn distributions, enforce distributional constraints, and model uncertainty. In applications, transport losses are often computed from samples through tractable representations, such as one-dimensional sorting formulas or sliced Wasserstein costs, making them practical components in training pipelines. We study parameterized objectives defined by sampled transport costs and prove graphical convergence of their subdifferentials to the subdifferential of the population objective. In particular, this ensures that standard subgradient methods consistently approach stationary points of the population-level problem. We illustrate the results in several settings, including risk-averse optimization, fairness-constrained learning, and sliced Wasserstein problems. Our analysis highlights that smooth parameterizations provide a favorable interface between statistical consistency and optimization. By contrast, transport objectives with nonsmooth costs and models may exhibit unstable derivatives in the large-sample limit.
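
The "one-dimensional sorting formulas" and "sliced Wasserstein costs" mentioned above can be sketched as follows. For two samples of equal size, the squared Wasserstein-2 distance in one dimension is the mean squared difference between sorted values; the sliced version averages this over random projection directions. This is a generic Monte Carlo sketch with hypothetical function names and an assumed fixed seed, not the parameterized objectives analysed in the paper.

```python
import math
import random


def wasserstein2_squared_1d(xs, ys):
    """Squared W2 between two equal-size 1D samples via the sorting formula."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def sliced_wasserstein2_squared(points_x, points_y, n_projections=64, seed=0):
    """Average of 1D squared W2 costs over random unit directions."""
    rng = random.Random(seed)
    dim = len(points_x[0])
    total = 0.0
    for _ in range(n_projections):
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        proj_x = [sum(c * v for c, v in zip(direction, p)) for p in points_x]
        proj_y = [sum(c * v for c, v in zip(direction, p)) for p in points_y]
        total += wasserstein2_squared_1d(proj_x, proj_y)
    return total / n_projections


cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
shifted = [(x + 3.0, y + 4.0) for x, y in cloud]
cost_1d = wasserstein2_squared_1d([0.0, 2.0, 1.0], [3.0, 1.0, 2.0])
cost_same = sliced_wasserstein2_squared(cloud, cloud)
cost_shift = sliced_wasserstein2_squared(cloud, shifted)
```

The sort makes the cost piecewise smooth in the sample values, so when the samples depend on model parameters a subgradient method is a natural fit. Whether empirical subgradients converge to population ones is the question the paper studies.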

2605.28133 2026-05-28 cs.LG stat.ML

Learning to Bid in Repeated Second-Price Auctions with Dynamic Values and Aggregated Feedback

在具有动态价值和聚合反馈的重复第二价格拍卖中学习出价

Benjamin Heymann, Otmane Sakhi

AI总结 研究当投标者价值动态变化时,如何通过结合插件估计器和最优策略的微分方程刻画来学习出价策略,并针对分段线性和一般光滑原始函数分别实现接近最优的遗憾界。

详情
AI中文摘要

我们研究了当投标者的价值是动态的,即当前价值依赖于过去结果时的学习出价问题。具体来说,我们考虑一个参与重复第二价格拍卖的投标者,其价值取决于自上次成功出价以来的时间,拍卖在连续时间内到达,并且仅在时间范围结束时揭示聚合反馈。这样的投标者必须(1)平衡赢得当前拍卖的即时收益与其对未来价值的影响,以及(2)学习未知的环境参数。我们推导了一类学习方法的遗憾界,该方法将插件估计器与最优策略的微分方程刻画相结合,并表明一个特定的置信界算法以接近最优的遗憾学习最优策略,对于分段线性原始函数为$\widetilde{O}(\log N)$,对于一般光滑原始函数为$\widetilde{O}(N^{1/3})$,且无需显式随机化即可实现这些遗憾。这些理论结果得到了数值实验的支持。

英文摘要

We study the problem of learning to bid when the bidder's value is dynamic, i.e., when the current value depends on past outcomes. Specifically, we consider a bidder participating in repeated second-price auctions whose value depends on the time elapsed since their last successful bid, with auctions arriving in continuous time and only aggregated feedback revealed at the end of the horizon. Such a bidder must (1) balance the immediate benefit of winning the current auction against its impact on future values and (2) learn unknown environmental parameters. We derive regret bounds for a class of learning methods that combine plug-in estimators with a differential-equation characterization of the optimal policy, and show that a specific confidence bound algorithm learns the optimal policy with a near optimal regret of $\widetilde{O}(\log N)$ for piecewise linear primitives, and $\widetilde{O}(N^{1/3})$ for general, smooth primitives, achieving these regrets without explicit randomization. These theoretical results are supported by numerical experiments.

2605.28106 2026-05-28 math.PR math.FA stat.ML

Gaussian Processes with Sample Paths in Reproducing Kernel Banach Spaces

再生核巴拿赫空间中具有样本路径的高斯过程

Toni Karvonen, Rasmus Kleist Hørlyck Sørensen

AI总结 研究高斯过程与再生核巴拿赫空间中高斯随机元之间的联系,通过γ-拉德尼化算子刻画协方差算子对应的正定函数,并将经典Driscoll定理推广到巴拿赫空间。

详情
AI中文摘要

我们研究了高斯过程与再生核巴拿赫空间中高斯随机元之间的联系。我们证明,此类空间上的弱二阶拉东概率测度的协方差算子由一个正定函数唯一确定。在高斯情形下,我们利用γ-拉德尼化算子刻画了那些由协方差算子导出的正定函数。基于这些结果,我们将经典的Driscoll定理推广到了巴拿赫空间框架。

英文摘要

We investigate the connection between Gaussian processes and Gaussian random elements in reproducing kernel Banach spaces. We show that the covariance operator of a weak second-order Radon probability measure on such a space is uniquely determined by a positive definite function. In the Gaussian case, we characterize those positive definite functions that arise from covariance operators in terms of $\gamma$-radonifying operators. Building on these results, we extend the classical Driscoll theorem to the Banach space setting.

2605.28105 2026-05-28 stat.ME

Identifying Direct Causal Effects in Latent Factor Models by Accounting for Unidentified Parents

通过考虑未识别的父节点来识别潜在因子模型中的直接因果效应

Tom Hochsprung, Nils Sturma, Jakob Runge, Mathias Drton, Andreas Gerhardus

AI总结 针对线性结构方程模型中显式建模的潜在变量,提出一种新的识别准则,通过递归识别方案并显式考虑未识别直接效应的因果父节点,结合网络流计算解决组合搜索问题,从而在密集混淆图中识别观测变量间的直接因果效应。

详情
Comments
48 pages, 4 tables, 7 figures
AI中文摘要

我们考虑具有显式建模潜在变量的线性结构方程模型。在此类模型中,观测变量和潜在变量通过包含随机噪声项的线性方程求解。我们的工作目标是提供观测协方差中的(有理)公式,以识别感兴趣的观测变量之间的直接因果效应。大多数先前的识别方法在潜在投影框架下操作,其中潜在变量被投影到依赖误差项中。然而,当观测变量被密集混淆时,即使仅由少数潜在变量引起,基于投影的方法也无法证明大多数效应的可识别性。对于此类问题,显式使用潜在变量的方法更为有效,但最近为此目的提出的算法在更密集的因果图中通常仍无法得出结论。我们开发了一种新的识别准则,通过利用递归识别方案可以推广的关键见解,即显式考虑具有(尚未)识别直接效应的因果父节点,从而更好地处理密集图。我们的新准则中的组合搜索问题可以借助网络流计算来解决,从而产生一个实用的算法工具,我们也将其在软件中提供。

英文摘要

We consider linear structural equation models with explicitly modelled latent variables. In such models, observed and latent variables solve linear equations including stochastic noise terms. The goal of our work is to identify the direct causal effects between the observed variables of interest by providing (rational) formulas in the observed covariances. Most prior identification approaches operate in the latent projection framework, where latent variables are projected away into dependent error terms. However, when the observed variables are densely confounded, even if only by a few latent variables, the projection-based approaches are unable to certify identifiability of most effects. For such problems, approaches that explicitly use the latent variables are more effective, but algorithms that were recently proposed for this purpose often remain inconclusive for denser causal graphs. We develop a new identification criterion that is able to better handle dense graphs by leveraging the key insight that recursive identification schemes can be generalized by explicitly accounting for causal parents with (yet) unidentified direct effects. Combinatorial search problems in our new criterion can be tackled with the help of network-flow computations, leading to a practically useful algorithmic tool that we also make available in software.

2605.28099 2026-05-28 stat.ME stat.CO

A computationally-tractable measure of global sensitivity for sampling-based Bayesian inference

一种计算可行的全局敏感性度量方法,用于基于采样的贝叶斯推断

Arina Odnoblyudova, Charita Dellaporta, François-Xavier Briol

AI总结 提出基于Fisher散度的全局敏感性分析方法,仅需参考后验样本和得分函数评估,计算可行且适用于高维问题。

详情
AI中文摘要

贝叶斯推断通常对先验或似然的超参数选择敏感,但在实践中以原则性和计算可行的方法定义和量化这种敏感性仍然具有挑战性。不幸的是,现有的敏感性方法由于计算成本高且在中等至高维度下性能差,很少适用于现代贝叶斯工作流程。为了解决这些限制,我们引入了一种基于Fisher散度的全局敏感性分析新方法。我们的方法仅需要一组来自参考后验的样本以及评估得分函数的能力,使其在计算上广泛可行。在温和的正则条件下,它控制整个后验的变化,并提供扰动对前两个矩影响的界限。我们在具有挑战性的贝叶斯推断问题上展示了这些优势,这些问题实际上超出了现有方法的范围,包括未归一化模型的广义贝叶斯推断、时间序列贝叶斯模型推断以及基于神经模拟的推断。

英文摘要

Bayesian inference can often be sensitive to the choice of hyperparameters of the prior or likelihood, yet defining and quantifying this sensitivity in a principled and computationally feasible way remains challenging in practice. Unfortunately, existing sensitivity methods are rarely applicable in modern Bayesian workflows due to their high computational cost and poor performance in moderate to high dimensions. To address these limitations, we introduce a new approach to global sensitivity analysis based on the Fisher divergence. Our method only requires a set of samples from a reference posterior and the ability to evaluate score functions, making it broadly computationally tractable. Under mild regularity conditions, it controls changes in the whole posterior, and provides a bound on the impact of perturbations on the first two moments. We demonstrate these strengths on challenging Bayesian inference problems which are practically out of reach of existing approaches, including generalised Bayesian inference for unnormalised models, inference in Bayesian models of time series, and neural simulation-based inference.

2605.28078 2026-05-28 cs.CR cs.AI cs.LG stat.ML

Mind the Gap: Mixtures of Gaussians in Approximate Differential Privacy

注意差距:近似差分隐私中的高斯混合机制

Huikang Liu, Aras Selvi, Wolfram Wiesemann

AI总结 针对已知敏感度的标量实值查询函数,设计了一类混合高斯加性噪声机制,在中等和低隐私保护区间下显著降低噪声幅度和方差,接近最优性。

详情
Comments
ICML 2026 style: 9 main pages followed by acknowledgements, references, appendices
AI中文摘要

我们设计了一类加性噪声机制,满足标量实值查询函数的 \((\varepsilon, δ)\)-差分隐私(DP),这些函数具有已知敏感度,特别关注中等和低隐私保护区间。这些机制称为\textit{混合机制},通过混合多个高斯分布构建,这些高斯分布共享相同的方差,但均值和混合权重不同。得到的分布可以解释为零均值高斯(如解析高斯机制中所用)和额外高斯(其均值取决于查询函数的敏感度)的凸组合。我们推导了 \((\varepsilon, δ)\)-DP 所需方差的紧条件,并提供了高效算法来计算它们。与解析高斯机制相比,我们的机制产生了显著更低的期望噪声幅度(\(l_1\)-损失)和方差(零均值分布的 \(l_2\)-损失)。在激励我们设计的低隐私保护区间下,我们的机制接近最优性,几乎消除了解析高斯机制的所有最优性差距。

英文摘要

We design a class of additive noise mechanisms that satisfy \((\varepsilon, δ)\)-differential privacy (DP) for scalar, real-valued query functions with known sensitivities, with a particular focus on moderate and low-privacy regimes. These mechanisms, which we call \textit{mixture mechanisms}, are constructed by mixing multiple Gaussian distributions that share the same variance but differ in their means and mixture weights. The resulting distributions can be interpreted as convex combinations of a zero-mean Gaussian (as used in the analytic Gaussian mechanism) and additional Gaussians whose means depend on the sensitivity of the query function. We derive tight conditions on the variances required for \((\varepsilon, δ)\)-DP and provide efficient algorithms to compute them. Compared to the analytic Gaussian mechanism, our mechanisms yield substantially lower expected noise amplitudes (\(l_1\)-loss) and variances (\(l_2\)-loss for zero-mean distributions). In the low-privacy regime that motivates our design, our mechanisms approach optimality, mitigating nearly all of the optimality gap of the analytic Gaussian mechanism.

2605.27967 2026-05-28 stat.ME cs.AI cs.LG stat.ML

Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors

通过教师引导的混合先验进行多教师知识蒸馏

Luyang Fang, Yongkai Chen, Jiazhang Cai, Ping Ma, Wenxuan Zhong

AI总结 提出多教师贝叶斯知识蒸馏(MT-BKD)框架,利用贝叶斯推断和教师引导的先验分布,结合熵加权机制,实现多教师知识的高效融合与不确定性量化。

详情
AI中文摘要

知识蒸馏是一种强大的模型压缩方法,能够高效部署复杂的深度学习模型(教师模型),包括大型语言模型。然而,其潜在的统计机制尚不明确,且不确定性评估常被忽视,特别是在需要多样化教师专业知识的实际场景中。为解决这些挑战,我们引入了\textit{多教师贝叶斯知识蒸馏}(MT-BKD),其中蒸馏学生模型在贝叶斯框架内从多个教师模型学习。我们的方法利用贝叶斯推断来捕捉蒸馏过程中的固有不确定性。我们引入了一种教师引导的先验,整合来自教师模型和特定任务训练数据的外部知识,提供了更好的泛化性、鲁棒性和可扩展性。此外,一种基于熵的加权机制自适应地调整每个教师的影响,使学生能够有效组合多个专业知识来源。MT-BKD增强了学生模型学习过程的可解释性,提高了预测准确性,并提供了不确定性量化。我们在合成任务和真实任务(包括蛋白质亚细胞定位预测和图像分类)上验证了MT-BKD。实验表明,我们的MT-BKD框架在性能提升和稳健的不确定性量化方面表现出色,突显了其优势。

英文摘要

Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear, and uncertainty evaluation is often overlooked, especially in real-world scenarios requiring diverse teacher expertise. To address these challenges, we introduce \textit{Multi-Teacher Bayesian Knowledge Distillation} (MT-BKD), where a distilled student model learns from multiple teachers within the Bayesian framework. Our approach leverages Bayesian inference to capture inherent uncertainty in the distillation process. We introduce a teacher-informed prior, integrating external knowledge from teacher models and task-specific training data, offering better generalization, robustness, and scalability. Additionally, an entropy-based weighting mechanism adaptively adjusts each teacher's influence, allowing the student to combine multiple sources of expertise effectively. MT-BKD enhances the interpretability of the student model's learning process, improves predictive accuracy, and provides uncertainty quantification. We validate MT-BKD on both synthetic and real-world tasks, including protein subcellular location prediction and image classification. Our experiments show improved performance and robust uncertainty quantification, highlighting the strengths of our MT-BKD framework.

2605.27946 2026-05-28 stat.ML cs.LG

Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency

反向传播是最优的吗?合成梯度何时提高样本效率

Yibo Jacky Zhang, Zeyu Tang, Sanmi Koyejo

AI总结 本文通过理论分析,提出合成梯度作为反向传播的替代方案,并证明在某些条件下合成梯度能实现更低的梯度估计均方误差,从而显著提高样本效率。

详情
AI中文摘要

反向传播是人工神经网络的默认学习规则,并且在可微性可用时通常被视为既定方法。在这项工作中,我们从样本效率的理论角度重新审视这一惯例。我们引入了一个用于计算图上基于损失和基于奖励学习的统一向量化反馈框架,其中合成梯度作为反向传播的自然替代方案出现。我们刻画了合成梯度能够实现比反向传播更低的梯度估计均方误差的条件。我们构造了示例,说明这种样本效率优势可以任意大。在上下文赌博机和强化学习任务上的实验证明了我们理论发现的潜力。

英文摘要

Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample efficiency. We introduce a unified vectorized feedback framework for loss-based and reward-based learning on computational graphs, in which synthetic gradients emerge as a natural alternative to backpropagation. We characterize the conditions under which synthetic gradients can achieve a lower gradient-estimation mean squared error than backpropagation. We construct examples illustrating that this sample efficiency advantage can be arbitrarily large. Experiments on contextual bandits and reinforcement learning tasks demonstrate the potential of our theoretical findings.

2605.27925 2026-05-28 cond-mat.stat-mech math.PR physics.data-an stat.CO stat.ME

Finite-size occupancy scaling of apparent fractal dimensions in stochastic trajectories

随机轨迹中表观分形维数的有限尺寸占据标度

Bon A. Koo, Edward Ju

AI总结 针对有限随机轨迹中分形维数估计的有限尺寸标度问题,提出基于球箱占据模型的偏差校正方法,通过归一化局部斜率坍塌实现跨模型类的误差降低。

详情
Comments
Main text: 30 pages, 5 figures; supplementary material included
AI中文摘要

从有限随机轨迹估计分形维数是一个有限尺寸标度问题:表观盒计数指数受限于分辨尺度范围和有限采样点之间的占据交叉,不一定等于极限过程的维数。我们用一个球箱占据律来建模这种交叉,该定律预测盒计数曲线、有限尺寸饱和尺度以及归一化局部斜率的标度函数。在随机游走轨迹、分数布朗图和列维飞行中,归一化局部斜率坍塌到单一交叉曲线,而当回归窗口相对于饱和尺度定位时,窗口化盒计数偏差坍塌。反转占据模型给出了有限尺寸偏差校正,减少了受控随机轨迹上的误差,并可在保留模型类之间迁移。与关联维数、去趋势波动分析、变异函数和Higuchi方法的比较表明,主导偏差特定于有限尺度窗口上的点采样盒计数,并且仅局部斜率稳定性不是可靠的诊断。一个DNA行走示例展示了在测量数据上的工作流程,所有图表和文中数字均从发布的单种子代码重新生成。

英文摘要

Estimating a fractal dimension from a finite stochastic trajectory is a finite-size scaling problem: the apparent box-counting exponent is shaped by an occupancy crossover between the resolved range of scales and the finite number of sampled points, and need not equal the dimension of the limiting process. We model this crossover with a balls-in-boxes occupancy law, which predicts the box-count curve, the finite-size saturation scale, and a scaling function for the normalized local slope. Across random-walk traces, fractional Brownian graphs, and Levy flights, the normalized local slope collapses onto a single crossover curve, while the windowed box-counting bias collapses when the regression window is positioned relative to the saturation scale. Inverting the occupancy model gives a finite-size bias correction that reduces error on controlled stochastic trajectories and transfers across held-out model classes. Comparisons with correlation dimension, detrended fluctuation analysis, the variogram, and Higuchi's method show that the dominant bias is specific to point-sampled box-counting over finite scale windows, and that local-slope stability alone is not a reliable diagnostic. A DNA-walk example illustrates the workflow on measured data, and all figures, tables, and in-text numbers are regenerated from released single-seed code.
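
The occupancy crossover can be illustrated with the textbook balls-in-boxes formula: when $N$ points fall uniformly into $M$ boxes, the expected number of occupied boxes is $M\,(1-(1-1/M)^N)$. The sketch below assumes uniform occupancy and $M(\varepsilon)=\varepsilon^{-D}$ boxes at scale $\varepsilon$ for a set of dimension $D$. Both are simplifying assumptions and need not match the paper's exact occupancy law; the function names are hypothetical.

```python
import math


def expected_occupied(n_points, n_boxes):
    """Expected number of occupied boxes for n_points thrown uniformly into n_boxes."""
    if n_boxes <= 1.0:
        return 1.0
    # Numerically stable form of n_boxes * (1 - (1 - 1/n_boxes) ** n_points).
    return n_boxes * -math.expm1(n_points * math.log1p(-1.0 / n_boxes))


def apparent_slope(n_points, dim, eps, h=1e-3):
    """Apparent box-counting exponent -d log(count) / d log(eps) at scale eps."""

    def log_count(scale):
        return math.log(expected_occupied(n_points, scale ** (-dim)))

    upper = log_count(eps * math.exp(h))
    lower = log_count(eps * math.exp(-h))
    return -(upper - lower) / (2.0 * h)


# Coarse boxes (few boxes, many points): the slope matches the true dimension.
slope_coarse = apparent_slope(n_points=1000, dim=1.5, eps=0.5)
# Fine boxes (far more boxes than points): the count saturates near n_points.
slope_fine = apparent_slope(n_points=1000, dim=1.5, eps=1e-4)
```

Between these two regimes the apparent exponent slides from $D$ toward zero, so a regression window that overlaps the saturation scale would underestimate the dimension under this simplified model.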

2605.27859 2026-05-28 math.ST stat.TH

Near-Unit-Root Theory for Affine Processes

仿射过程的近单位根理论

Gael Anne, Yang Lu, Xuewen Yu, Xiaowen Zhou

AI总结 本文针对离散时间仿射过程建立了近单位根渐近理论,揭示了其时变条件方差在近单位根下的非渐近可忽略性,并分析了局部单位根、温和爆炸和温和平稳三种框架下的估计量行为与推断方法。

详情
AI中文摘要

离散时间仿射过程广泛应用于金融和经济学中,涵盖计数、正值和非负值过程。本文为这类模型建立了近单位根渐近理论。与线性AR(1)过程不同,仿射过程表现出时变条件方差,该方差在接近单位根时仍渐近不可忽略,导致定性的不同缩放极限和估计量行为。我们表明,局部单位根区域存在通常的冗余参数问题,而温和爆炸区域虽然无此问题,但仍无法一致估计截距项。相比之下,温和平稳框架更易处理:OLS估计量渐近正态,所得轨迹比线性AR(1)模型更现实,并且可以通过插件法或自助法进行推断。理论结果得到模拟证据的支持,并通过保险和金融数据的应用加以说明。

英文摘要

Discrete-time affine processes are widely used in finance and economics and encompass count, positive, and nonnegative-valued processes. This paper develops near-unit-root asymptotic theory for this class of models. Unlike linear AR(1) processes, affine processes exhibit time-varying conditional variance that remains asymptotically non-negligible near unity, leading to qualitatively different scaling limits and estimator behavior. We show that the local-to-unity regime suffers from the usual nuisance-parameter problem, whereas the mildly explosive regime, while free of it, still does not allow consistent estimation of the intercept. By contrast, the mildly stationary framework is more tractable: the OLS estimator is asymptotically normal, the resulting trajectories are more realistic than those of linear AR(1) models, and inference is possible through either a plug-in method or the bootstrap. The theoretical results are supported by simulation evidence and illustrated through applications to insurance and financial data.

2605.27844 2026-05-28 stat.ME

A Parameterization-Invariant DIC

参数化不变的DIC

Xingyao Xiao, Sophia Rabe-Hesketh

AI总结 针对经典DIC对参数化敏感且有效参数可能为负的问题,提出一种无插件、参数化不变的DIC(DIC_i),并证明其渐近等价于WAIC,在因子分析和增长混合模型中表现良好。

详情
AI中文摘要

经典的偏差信息准则(DIC)对重新参数化不是不变的,并且可能具有负且不稳定的有效参数数量。有效参数数量为负的原因实际上是当模型参数的后验均值与最大似然估计差异显著时,插件偏差变得过大。在潜变量模型中,原因可能是可识别性问题,导致无意义且不稳定的插件估计。具体来说,不可识别性意味着不同的参数点可以具有相同的似然,并且在MCMC链内部或链之间切换这些点会产生不稳定且无意义的后验均值。为了解决这个问题,我们提出了一种无插件、参数化不变的DIC版本,记为DIC$_i$,并证明其渐近等价于Watanabe-Akaike信息准则(WAIC)。模拟表明,在经典DIC失效的因子分析和增长混合模型中,DIC$_i$与WAIC一致。这些结果表明,当WAIC不适用或不可用时,DIC$_i$是DIC的一个有用且计算高效的替代方案。

英文摘要

The classic Deviance Information Criterion (DIC) is not invariant to reparameterization and can have a negative and unstable effective number of parameters. The reason for the effective number of parameters being negative is actually that the plug-in deviance becomes excessively large when the posterior means of the model parameter differ dramatically from the maximum likelihood estimates. In latent variable models, the cause can be identifiability issues that lead to meaningless and unstable plug-in estimates. Specifically, nonidentifiability means that distinct parameter points can have the same likelihood and switching between such points within or between MCMC chains produces unstable and meaningless posterior means. To address this issue, we propose a plug-in-free, parameterization-invariant version of the DIC, denoted DIC$_i$, and show that it is asymptotically equivalent to the Watanabe-Akaike Information Criterion (WAIC). Simulations demonstrate that DIC$_i$ aligns with WAIC in factor analysis and growth mixture models where the classic DIC breaks down. These results suggest that DIC$_i$ is a useful, computationally efficient alternative to the DIC when WAIC is not applicable or not available.
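
The failure mode described above can be reproduced in a toy example. The sketch below computes the classic DIC, whose effective number of parameters is the mean deviance minus the plug-in deviance at the posterior mean, and the standard WAIC, which uses only pointwise log-likelihood values. The toy likelihood depends on the parameter only through its absolute value, a hypothetical stand-in for sign or label non-identifiability, and the "posterior draws" are hand-picked rather than sampled. The code does not implement the paper's DIC$_i$.

```python
import math
from statistics import mean, variance


def log_mean_exp(values):
    """Stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(loglik):
    """WAIC from loglik[s][i] (draw s, observation i); returns (waic, p_waic)."""
    n_obs = len(loglik[0])
    columns = [[row[i] for row in loglik] for i in range(n_obs)]
    lppd = sum(log_mean_exp(col) for col in columns)
    p_waic = sum(variance(col) for col in columns)
    return -2.0 * (lppd - p_waic), p_waic


def classic_dic(draws, pointwise_loglik, data):
    """Classic DIC with plug-in at the posterior mean; returns (dic, p_d)."""
    deviances = [-2.0 * sum(pointwise_loglik(th, y) for y in data) for th in draws]
    mean_dev = mean(deviances)
    plug_in_dev = -2.0 * sum(pointwise_loglik(mean(draws), y) for y in data)
    p_d = mean_dev - plug_in_dev
    return mean_dev + p_d, p_d


def toy_loglik(theta, y):
    """Unit-variance normal log-density with mean |theta| (sign not identified)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - abs(theta)) ** 2


data = [1.0, 1.0]
draws = [-1.0, 1.0, -1.0, 1.0]  # hand-picked draws switching between two modes
dic_value, p_d = classic_dic(draws, toy_loglik, data)
waic_value, p_waic = waic([[toy_loglik(th, y) for y in data] for th in draws])
```

Both modes fit the data equally well, yet their average (zero) fits badly, so the plug-in deviance is inflated and `p_d` comes out negative. The WAIC penalty is unaffected because it never evaluates the likelihood at an averaged parameter.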

2605.27834 2026-05-28 cs.LG stat.ML

Reward Transfer from Inverse Reinforcement Learning: A Coupled Minimax Approach

从逆强化学习中的奖励迁移:一种耦合极小极大方法

Guang-Yuan Hao, Lars van der Laan, Aurélien Bibaut, Nathan Kallus

AI总结 提出一种耦合极小极大方法,通过联合求解源和目标环境的贝尔曼方程组,消除源贝尔曼残差误差的一阶影响,实现逆强化学习奖励从源环境到目标环境的有效迁移。

详情
AI中文摘要

我们研究利用逆强化学习从专家演示中学习到的奖励从一个环境迁移到另一个不同环境的强化学习问题。当演示在受控环境中收集时,这自然发生。我们将问题表述为跨源和目标环境的贝尔曼方程联合系统,并开发了目标软$q$函数的极小极大估计器。顺序求解方法首先估计源奖励,然后将其代入目标控制问题,而耦合方法则联合求解源和目标系统方程。我们表明,与顺序方法相比,耦合方法消除了源贝尔曼残差误差的一阶影响。我们刻画了每种方法的局部行为,建立了有限样本软$q$函数误差界,并证明了所得软控制策略的遗憾保证。使用脓毒症模拟器的实证研究验证了理论比较。

英文摘要

We study the transfer of rewards learned using inverse reinforcement learning from expert demonstrations in one environment to reinforcement learning in a new, different environment. This arises naturally when demonstrations are collected in a controlled environment. We formulate the problem as a joint system of Bellman equations across the source and target environments and develop minimax estimators for the target soft-$q$-function. Whereas a sequential solution approach first estimates the source reward and then plugs it into the target control problem, a coupled approach solves the source and target system of equations jointly. We show that, in contrast to the sequential approach, the coupled approach removes the first-order influence of source Bellman residual error. We characterize the local behavior of each approach, develop finite-sample soft-$q$-function error bounds, and prove regret guarantees for the resulting soft-control policy. An empirical investigation using a sepsis simulator validates the theoretical comparison.

2605.27796 2026-05-28 eess.IV cs.CV cs.LG eess.SP stat.AP

Benchmarking Ultrasound Foundation Models for Fetal Plane Classification

超声基础模型在胎儿平面分类中的基准测试

Leya Barrientos, Yuexi Du, Nicha C. Dvornek

AI总结 本研究对四种超声基础模型(USFM、MOFO、UltraSAM、FetalCLIP)在胎儿平面分类任务上进行基准测试,发现FetalCLIP在线性探测设置中表现最佳,而USFM在全微调设置中表现最佳,且预训练目标显著影响迁移性能。

详情
AI中文摘要

超声因其安全性、可及性和实时成像能力被广泛应用于产科护理。然而,其解读仍依赖操作者,且易受噪声和伪影影响。深度学习模型在解决这些问题上表现出色,但通常需要大量标注数据集,这在临床超声中难以获得。基础模型(FMs)提供了一种替代方案,利用大量超声图像学习可迁移的表征,从而在有限标注数据下实现泛化。本文针对胎儿平面分类任务,对超声专用基础模型进行了全面基准测试。我们评估了四种超声基础模型(USFM、MOFO、UltraSAM、FetalCLIP),并与两个CNN基线(ResNet50、EfficientNet-V2)以及一个在自然图像上预训练的ViT(DINOv3)进行比较。我们在两种互补设置下训练所有模型:全微调和冻结编码器的线性探测。所有模型均使用西班牙胎儿超声数据集进行5折患者级交叉验证训练,并在域内数据和外部非洲队列上测试,以评估跨人群泛化能力。我们发现,FetalCLIP在线性探测设置中取得最佳结果(域内F1=0.9261,域外F1=0.9731),而USFM在全微调设置中表现最佳(域内F1=0.9476,域外F1=0.9515)。MOFO和UltraSAM在两种设置中性能下降最多,在某些情况下甚至不如自然图像预训练模型。这些发现强调了预训练模型的选择对胎儿平面分类性能的显著影响,因为不同的预训练目标导致不同的迁移能力。

英文摘要

Ultrasound is widely used in obstetric care due to its safety, accessibility, and real-time imaging. However, interpretation remains operator-dependent and susceptible to noise and artifacts. Deep learning models have shown strong performance in addressing these problems, but they typically require large annotated datasets that are difficult to obtain in clinical ultrasound. Foundation models (FMs) offer an alternative, using a large number of ultrasound images to learn transferable representations that can generalize with limited labeled data. This work presents a comprehensive benchmark of ultrasound-specific FMs for fetal plane classification. We evaluated four ultrasound FMs (USFM, MOFO, UltraSAM, FetalCLIP) against two CNN baselines (ResNet50, EfficientNet-V2) and a ViT (DINOv3) pretrained on natural images. We trained all models under two complementary settings: full fine-tuning and linear probing with a frozen encoder. All models were trained using 5-fold patient-level cross-validation on a Spanish fetal ultrasound dataset and tested on both in-domain data and an external African cohort to assess cross-population generalization. We found that FetalCLIP achieved the best results in the linear probing setting (F1 = 0.9261 for in-domain, F1 = 0.9731 for out-of-domain), while USFM performed best in the full fine-tuning setting (F1 = 0.9476 for in-domain, F1 = 0.9515 for out-of-domain). MOFO and UltraSAM degraded most in both settings, underperforming natural image pretrained models in some cases. These findings highlight how the choice of pretrained model strongly affects fetal plane classification performance, since different pretraining objectives lead to different levels of transferability.

2605.27794 2026-05-28 stat.ML cs.LG stat.ME

Learning to target with network interference

在网络干扰下学习目标定位

Xiaomeng Wang, Hamsa Bastani, Osbert Bastani, Zhimei Ren

AI总结 研究在bandit设置下网络干扰中的自适应目标定位,通过线性模型和稀疏假设,针对不同干扰结构知识水平提出近最优遗憾算法。

详情
AI中文摘要

本文研究在bandit设置下网络干扰中的自适应目标定位,其中对一个个体的处理可能通过溢出效应影响他人。我们考虑稀疏场景下的线性模型,每个个体的结果最多受少数其他人影响。首先建立遗憾下界,表明忽略网络结构并将问题简化为标准线性bandit必然导致低效学习,尤其是在大规模群体中。为了理解如何利用结构信息,我们分析了干扰结构知识水平不同的场景:(1) 完全支持知识,(2) 列支持大小知识,(3) 无先验知识。对于每种场景,我们建立了表征学习基本极限的遗憾下界,并开发了实现近最优遗憾的算法。总之,我们的结果提供了干扰结构知识如何影响在线学习效率的统一视角,并在每种设置下提供了实用的自适应目标定位算法。在合成和真实数据上的数值实验证明了我们算法的实际优势。

英文摘要

This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each individual's outcome can be affected by at most a few others. We first establish a regret lower bound showing that ignoring the network structure and reducing the problem to a standard linear bandit inevitably leads to inefficient learning, particularly in large populations. To understand how structural information can be leveraged, we analyze regimes with varying levels of knowledge of the interference structure: (1) full support knowledge, (2) knowledge of the column support sizes, and (3) no prior knowledge. For each regime, we establish regret lower bounds characterizing the fundamental limits of learning, and develop algorithms that achieve near-optimal regret. Together, our results provide a unified view of how knowledge of the interference structure governs the efficiency of online learning under interference, and offer practical adaptive targeting algorithms in each setting. Numerical experiments on synthetic and real-world data demonstrate the practical benefits of our algorithms.

2605.27781 2026-05-28 stat.AP cs.SY eess.SY

Day-Ahead Electricity Price Forecasting Using a Multivariate Group Lasso Method

基于多变量组套索方法的日前电价预测

Keyi Wang, Jiaxiang Ji, Mahan Mansouri, Ahmed Aziz Ezzat

AI总结 针对电价序列中的时间组效应,提出基于Group Lasso的多变量统计方法进行日前电价向量预测,在CAISO数据上显著提升点预测和概率预测精度,并在国际竞赛中获第二名。

详情
AI中文摘要

现代电力系统中的电价信号表现出复杂的依赖结构,使得预测具有内在挑战性。我们对加州独立系统运营商(CAISO)实际电价信号的分析揭示了复杂的时间组效应,即由于潜在的经济和运营驱动因素,解释变量对电价的影响在连续的时间块中持续存在。为此,我们提出了一种基于Group Lasso公式的多变量统计方法,通过利用多特征时间组效应来预测日前电价向量。我们的方法在CAISO两年的完整电价数据上进行了评估,与多种统计和深度学习方法相比,在点预测和概率预测指标上显示出显著改进。理论和实证分析证实了所提方法在建模实际组效应方面的有效性,同时保持了可解释性和低计算复杂度。在最近一次国际电价预测挑战赛的测试数据上进行回顾性评估时,尽管所提方法获得的信息远少于竞争方法,但仍排名第二。最后,所提方法在CAISO的两个运营电价预测系统中得到独立验证,显示出具有竞争力的预测性能和实际相关性。

英文摘要

Electricity price signals in modern power systems exhibit complex dependence structures that render forecasting inherently challenging. Our analysis of real-world pricing signals from the California Independent System Operator (CAISO) reveals complex temporal group effects, whereby the influence of explanatory variables on electricity prices persists across consecutive blocks of time due to underlying economic and operational drivers. In response, we propose a multivariate statistical method based on a Group Lasso formulation to forecast the vector of day-ahead electricity prices, by leveraging multi-feature temporal group effects. Our approach is evaluated on two full years of electricity prices from CAISO, demonstrating considerable improvements in point and probabilistic forecast metrics compared to a wide array of statistical and deep learning methods. Theoretical and empirical analyses confirm the effectiveness of the proposed approach in modeling realistic group effects, maintaining both interpretability and low computational complexity. When retrospectively evaluated on test data from a recent international electricity price forecasting challenge, the proposed method ranked in second place, despite having access to significantly less information than competing approaches. Finally, the proposed method is independently validated against two operational electricity price forecasting systems in CAISO, demonstrating competitive predictive performance and practical relevance.
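
The building block behind most Group Lasso solvers is group soft-thresholding, the proximal operator of $\lambda \lVert v \rVert_2$, which either shrinks a whole coefficient group or sets it to zero. The sketch below illustrates that operator on a hypothetical coefficient vector whose groups are consecutive blocks of hours; the grouping, the numbers, and the function names are assumptions for illustration and do not reproduce the paper's multivariate formulation.

```python
import math

def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2 applied to one coefficient group."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)  # the whole group is zeroed out together
    scale = 1.0 - lam / norm
    return [scale * x for x in v]

def prox_group_lasso(beta, groups, lam):
    """Apply group soft-thresholding to each index group of beta."""
    out = list(beta)
    for idx in groups:
        shrunk = group_soft_threshold([beta[i] for i in idx], lam)
        for i, value in zip(idx, shrunk):
            out[i] = value
    return out

# Hypothetical coefficients of one feature over six hours,
# grouped into two consecutive three-hour blocks.
beta = [0.1, -0.1, 0.05, 3.0, 4.0, 0.0]
groups = [[0, 1, 2], [3, 4, 5]]
result = prox_group_lasso(beta, groups, lam=1.0)
print(result)
```

The weak first block is removed as a unit, while the strong second block is kept and shrunk proportionally, which is the behavior that lets a block of hours enter or leave the model together.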

2605.27769 2026-05-28 cs.DS cs.IT cs.LG math.IT stat.ML

Smoothed Score Queries and the Complexity of Sampling

平滑得分查询与采样的复杂度

Jingbo Liu

AI总结 本文研究利用梯度信息从高维高斯分布中采样的查询复杂度,通过引入平滑得分查询(即高斯卷积密度的对数梯度)将条件数依赖从√κ降低到对数级别,并给出近乎匹配的上下界。

详情
AI中文摘要

我们研究利用梯度信息从高维高斯分布中采样的查询复杂度。在标准预言机模型中,精确梯度仅暴露与精度矩阵的矩阵-向量乘积,导致多项式逼近障碍和特征性的条件数$\sqrt{κ}$依赖。我们证明,当允许采样器查询\emph{平滑得分}(即高斯卷积密度的对数梯度)时,这一障碍消失。对于精度矩阵为$Λ$的高斯目标,噪声水平$τ$下的平滑得分查询可访问预解式$(Λ+τ^{-1}I)^{-1}$。将几何间隔的噪声水平与sinc求积有理逼近相结合,我们得到一个采样器,其总变分误差$δ_{\rm TV}$所需的平滑得分查询次数为$q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$,将条件数依赖从$\sqrt{κ}$改进为对数依赖。我们还研究了有限比特梯度预言机。通过对变换后的平滑得分答案进行坐标量化并添加最终抖动步骤,我们得到一个采样方案,其总通信梯度信息在$κ$中为多对数级别;特别地,对于固定维度和精度,比特复杂度为$O(\log^2κ)$。为补充这些上界,我们引入一种信道合成(或反向香农)逆技术用于采样下界。这将总变分模拟保证转化为通信需求,并得到所需梯度信息的$Ω(\logκ)$下界。综合这些结果,我们识别出平滑得分作为采样中可证明信息更丰富的预言机,并为其有限比特复杂度给出了近乎匹配的上下界。

英文摘要

We study the query complexity of sampling from high-dimensional Gaussian distributions using gradient information. In the standard oracle model, exact gradients expose only matrix-vector products with the precision matrix, leading to polynomial approximation barriers and a characteristic \(\sqrtκ\) dependence on the condition number. We show that this barrier disappears when the sampler is allowed to query \emph{smoothed scores}, namely gradients of the logarithms of the Gaussian-convolved densities. For a Gaussian target with precision matrix \(Λ\), a smoothed-score query at noise level \(τ\) gives access to the resolvent \((Λ+τ^{-1}I)^{-1}\). Combining geometrically spaced noise levels with sinc-quadrature rational approximation, we obtain a sampler with $q=O\!\left(\bigl(\logκ+\log(e\sqrt d/δ_{\rm TV})\bigr)\log(e\sqrt d/δ_{\rm TV})\right)$ smoothed-score queries for total variation error \(δ_{\rm TV}\), improving the condition-number dependence from \(\sqrtκ\) to logarithmic. We also study finite-bit gradient oracles. Using coordinatewise quantization of the transformed smoothed-score answers and a final dithering step, we obtain a sampling scheme whose total communicated gradient information is polylogarithmic in \(κ\); in particular, for fixed dimension and accuracy, the bit complexity is \(O(\log^2κ)\). To complement these upper bounds, we introduce a channel-synthesis, or reverse-Shannon, converse technique for sampling lower bounds. This converts total-variation simulation guarantees into communication requirements and yields an \(Ω(\logκ)\) lower bound on the required gradient information. Together, these results identify smoothed scores as a provably more informative oracle for sampling and give nearly matching upper and lower bounds for its finite-bit complexity.
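
A short derivation shows why a smoothed-score query exposes the resolvent. It assumes, for illustration only, a centered target $N(0, Λ^{-1})$ and that the noise level $τ$ denotes the variance of the Gaussian smoothing kernel; neither convention is stated in the abstract.

```latex
\begin{align*}
p_τ &= N(0, Λ^{-1}) * N(0, τ I) = N\bigl(0,\; Λ^{-1} + τ I\bigr), \\
\nabla \log p_τ(x) &= -\bigl(Λ^{-1} + τ I\bigr)^{-1} x, \\
\bigl(Λ^{-1} + τ I\bigr)^{-1}
  &= τ^{-1} \bigl(Λ + τ^{-1} I\bigr)^{-1} Λ
   = τ^{-1} \Bigl[ I - τ^{-1} \bigl(Λ + τ^{-1} I\bigr)^{-1} \Bigr].
\end{align*}
```

Under these assumptions, one smoothed-score evaluation at $x$ is an affine function of $(Λ + τ^{-1} I)^{-1} x$, whereas an exact gradient query returns only $-Λ x$.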

2605.27747 2026-05-28 stat.ML cs.LG stat.CO

Soft Specialists: $α$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

软专家:用于不确定性感知的LLM后训练的$\alpha$-Rényi集成

Paula Cordero-Encinar, Georgy Tyukin, Andrew B. Duncan

AI总结 提出一种$\alpha$-Rényi变分框架,通过学习后训练参数的分布来替代深度集成,实现不确定性感知的LLM后训练,并支持软路由和模型专业化。

详情
AI中文摘要

现有的大语言模型训练方法基于大量数据学习单一参数集,这些数据通常异构、冲突且往往直接矛盾。因此,模型被迫将冲突目标和固有不确定性压缩为单一的平均行为模式。我们提出了一种$\alpha$-Rényi变分框架,用于学习后训练参数的分布,为深度集成方法提供了一种不确定性感知的替代方案。得到的变分目标在经典变分贝叶斯和预测导向的后验学习之间插值,平衡全局合理的个体模型与互补专家系统。我们确定了局部稳定性准则,展示了模型误设如何使非退化后验扩散局部有利,将矛盾或冲突数据表现为认知不确定性。我们将该框架应用于LLM后训练,学习附着在共享冻结基模型上的LoRA适配器集成,为监督微调和偏好优化提供了可扩展的训练过程。我们的方法使得训练示例能够被软路由到集成成员之间,促进模型专业化,并为不同任务提供可操作的不确定性估计。

英文摘要

Existing training approaches for large language models learn a single set of parameters from large volumes of data that are typically heterogeneous, conflicting, and often outright contradictory. As a result, the model is forced to compress conflicting goals and inherent uncertainties into a single, averaged pattern of behaviour. We propose an $α$-Rényi variational framework for learning distributions over post-training parameters, offering an uncertainty-aware alternative to deep ensemble approaches. The resulting variational objective interpolates between classical variational Bayes and predictively oriented posterior learning, balancing globally plausible individual models against systems of complementary specialists. We identify local stability criteria, demonstrating how model misspecification can make non-degenerate posterior spread locally favourable, manifesting contradictory or conflicting data as epistemic uncertainty. We apply our framework to LLM post-training, learning an ensemble of LoRA adapters attached to a shared, frozen base model, providing a scalable training procedure for both supervised fine-tuning and preference optimisation. Our approach enables training examples to be softly routed across ensemble members, promoting model specialisation and providing actionable uncertainty estimates across different tasks.

2605.27720 2026-05-28 cs.LG stat.AP

Bayesian Deployment Approval for Learned Landing Controllers under Finite Rollout Validation

有限滚动验证下学习型着陆控制器的贝叶斯部署批准

Fei Jiang, Lei Yang

AI总结 针对学习型自主控制器在有限仿真验证下的部署不确定性,提出基于贝叶斯后验推断的部署批准框架,通过后验批准概率和部署风险进行不确定性校准评估。

详情
Comments
16 pages, 4 figures and 4 tables
AI中文摘要

强化学习和数据驱动的自主控制器通常使用累积奖励和有限仿真轨迹下的经验成功频率进行评估。然而,这些经验指标不一定能为不确定性下的部署准备提供足够的统计证据。本文针对有限滚动证据下的学习型自主着陆控制器,开发了一个贝叶斯批准框架。基于不确定运行条件下的触地安全满足性,引入了概率着陆能力公式,同时使用贝叶斯后验推断来量化学习策略真实部署能力的不确定性。进一步引入了后验批准概率和后验部署风险用于面向部署的评估,以及一个支持在渐进滚动测试中做出批准/拒绝/继续决策的顺序验证框架。使用PPO和SAC控制器的仿真实验表明,在有限的验证证据下,经验成功和奖励优化可能产生过度自信的部署解释,而后验批准推断提供了更不确定性校准的部署准备评估。所提出的框架为传统强化学习评估与不确定性下面向部署的验证之间提供了实用的统计联系,并可推广到更广泛的学习型自主系统类别。

英文摘要

Reinforcement learning and data-driven autonomous controllers are commonly evaluated using cumulative reward and empirical success frequency under finite simulation trajectories. However, such empirical metrics do not necessarily provide sufficient statistical evidence regarding deployment readiness under uncertainty. This work develops a Bayesian approval framework for learned autonomous landing controllers under finite rollout evidence. A probabilistic landing capability formulation is introduced based on touchdown safety satisfaction under uncertain operating conditions, while Bayesian posterior inference is used to quantify uncertainty regarding the true deployment capability of learned policies. Posterior approval probability and posterior deployment risk are further introduced for deployment-oriented evaluation, together with a sequential validation framework supporting approve/reject/continue decisions during progressive rollout testing. Simulation experiments using PPO and SAC controllers demonstrate that empirical success and reward optimization may produce overconfident deployment interpretation under limited validation evidence, whereas posterior approval inference provides a more uncertainty-calibrated assessment of deployment readiness. The proposed framework provides a practical statistical connection between conventional reinforcement-learning evaluation and deployment-oriented validation under uncertainty and may be generalized to broader classes of learned autonomous systems.
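
The gap between empirical success and posterior evidence can be illustrated with a conjugate Beta-Binomial calculation. The sketch below computes $P(p \ge p_0 \mid \text{data})$ for a landing success probability $p$; the uniform Beta(1, 1) prior, the threshold $p_0 = 0.99$, the rollout counts, and the function name are assumptions for illustration, since the abstract does not specify the paper's prior or approval rule.

```python
from math import comb

def posterior_approval_probability(successes, failures, threshold, prior_a=1, prior_b=1):
    """P(p >= threshold | data) under a Beta(prior_a, prior_b) prior (integer parameters).

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a) for integer a, b,
    so that P(Beta(a, b) >= x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    a = prior_a + successes
    b = prior_b + failures
    n = a + b - 1
    return sum(comb(n, j) * threshold**j * (1.0 - threshold) ** (n - j) for j in range(a))

prob = posterior_approval_probability(successes=50, failures=0, threshold=0.99)
print(f"P(p >= 0.99 | 50 of 50 rollouts succeed) = {prob:.3f}")
```

With 50 successes in 50 rollouts the empirical success rate is 100%, yet under these assumptions the posterior probability that the true success rate reaches 0.99 is only about 0.40, which is the kind of overconfidence the abstract warns about.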

2605.27718 2026-05-28 math.ST cs.LG stat.ME stat.TH

Robust Moment-Based Estimation via Spectral Gradient Reweighting

基于谱梯度重加权的稳健矩估计

Liu Zhang, Amit Singer

AI总结 提出SGR-GMM算法,通过谱梯度重加权对观测梯度进行软重加权,实现稳健的广义矩估计,并给出理论保证和实验验证。

详情
AI中文摘要

基于矩的估计是参数推断在理论上具有吸引力的方法,尤其是在基于似然的估计不可用、设定错误或计算不便时。然而,矩方程涉及样本均值,这使得基于矩的估计对异常值敏感。我们提出了SGR-GMM算法,这是一种稳健的广义矩估计(GMM)程序,它使用谱梯度重加权(SGR)原语在矩匹配优化过程中对每个观测的梯度进行软重加权。我们的分析分为三层。首先,对于固定中心,SGR原语被表述为样本权重玩家和密度矩阵玩家之间的熵正则化谱博弈,并使用经典的多重权重和矩阵多重权重遗憾界进行分析。其次,我们建立了SGR原语中固定中心更新的显式收敛半径和有限终止界。第三,我们证明了局部有限样本参数估计误差界,该界显式依赖于污染比例、内点梯度稳定性、局部GMM识别强度和优化精度。我们进一步特化SGR-GMM算法,以获得稳健的对角加权GMM(DGMM)估计量,用于估计在加性高斯噪声和强污染下观测到的异方差低秩高斯混合模型。在数值实验中,SGR原语产生近乎神谕的梯度估计,而稳健的DGMM特化显著优于非稳健的矩基线。代码和数据可在https://github.com/liu-lzhang/sgr-gmm获取。

英文摘要

Moment-based estimation is a theoretically attractive approach to parametric inference, especially when likelihood-based estimation is unavailable, misspecified, or computationally inconvenient. However, the moment equations involve sample averages, which makes moment-based estimation sensitive to outliers. We propose the SGR-GMM algorithm, a robust generalized method of moments (GMM) procedure that uses a spectral gradient reweighting (SGR) primitive to soft-reweight the per-observation gradients during the moment-matching optimization. Our analysis has three layers. First, for a fixed center, the SGR primitive is formulated as an entropy-regularized spectral game between a sample-weight player and a density-matrix player, which is analyzed using classical multiplicative-weights and matrix-multiplicative-weights regret bounds. Second, we establish an explicit convergence radius and a finite-termination bound for the fixed-center updates in the SGR primitive. Third, we prove a local finite-sample parameter estimation error bound with explicit dependence on the contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy. We further specialize the SGR-GMM algorithm to obtain a robust diagonally-weighted GMM (DGMM) estimator for estimating heteroscedastic low-rank Gaussian mixtures observed under additive Gaussian noise and strong contamination. In the numerical experiments, the SGR primitive produces near-oracle gradient estimates and the robust DGMM specialization substantially improves over non-robust moment baselines. The code and data are available at https://github.com/liu-lzhang/sgr-gmm.

2605.27711 2026-05-28 stat.ME

Improving Power in Randomized Controlled Trials with Time-to-Event Endpoints: A Risk-Free Approach

改善具有时间至事件终点的随机对照试验的效能:一种无风险方法

Junyi Zhou, Qing Liu, May Mo, Amy Xia

AI总结 提出一种利用外部数据中的预后信息,通过两步法(先估计预后评分,再将其作为协变量纳入非参数调整对数秩检验)来无偏估计边际风险比并提高试验效能的框架。

详情
AI中文摘要

利用外部或历史数据提高随机临床试验的效率而不引入偏倚或膨胀I类错误率仍然具有挑战性。最近关于外部训练预后评分的工作,例如针对连续终点的PROCOVA,已经通过协变量调整展示了一种无风险方法。然而,由于边际风险比(HR)的非可折叠性,将该范式扩展到时间至事件终点并非易事。在本文中,我们通过提出一个统一框架来解决这一挑战,该框架将从外部数据学习到的复杂、高维预后信息纳入具有时间至事件终点的随机对照试验的主要分析中,同时以边际风险比为目标。所提出的程序分两步进行。首先,通过使用灵活的监督学习方法将鞅残差对基线协变量进行回归,从外部或历史数据估计预后评分。其次,将拟合的评分作为额外协变量纳入Ye等人[2024]的非参数协变量调整对数秩检验及相关的边际HR估计量中。所提出的方法控制了I类错误,并提供了边际HR的渐近无偏估计,无论预后模型设定错误或外部/历史数据与试验数据之间的总体异质性如何。我们表明,方差减少及相应的事件数节省大约等于预后评分与试验中鞅伪结局之间的平方相关系数。扩展到分层随机化是直接的。模拟研究证明了令人满意的有限样本性能,并在历史预后信息具有信息性时实现了有意义的效率提升。

英文摘要

Leveraging external or historical data to improve the efficiency of randomized clinical trials without introducing bias or inflating the Type I error rate remains challenging. Recent work on externally trained prognostic scores, such as PROCOVA for continuous endpoints, has demonstrated a risk-free approach via covariate adjustment. However, extending this paradigm to time-to-event endpoints is nontrivial due to the non-collapsibility of the marginal hazard ratio (HR). In this paper, we address this challenge by proposing a unified framework for incorporating complex, high-dimensional prognostic information learned from external data into the primary analysis of RCTs with time-to-event endpoints, while targeting the marginal hazard ratio. The proposed procedure proceeds in two steps. First, a prognostic score is estimated from external or historical data by regressing martingale residuals on baseline covariates using flexible supervised learning methods. Second, the fitted score is included as an additional covariate in the nonparametric covariate-adjusted log-rank test and the associated marginal HR estimator of Ye et al. [2024]. The proposed method controls Type I error and provides asymptotically unbiased estimation of the marginal HR, irrespective of prognostic model misspecification or population heterogeneity between external/historical and trial data. We show that the variance reduction, and corresponding event count savings, are approximately equal to the squared correlation between the prognostic score and the martingale pseudo-outcome in the trial. Extensions to stratified randomization are straightforward. Simulation studies demonstrate satisfactory finite-sample performance and meaningful efficiency gains when historical prognostic information is informative.
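
The stated rule of thumb, that the fractional variance reduction is roughly the squared correlation $ρ^2$ between the prognostic score and the martingale pseudo-outcome, translates into simple planning arithmetic. The sketch below converts an unadjusted event target into an adjusted one under that approximation; the function name and the example numbers are hypothetical, and the calculation is only as accurate as the approximation itself.

```python
import math

def adjusted_event_count(unadjusted_events, correlation):
    """Events needed after adjustment, assuming variance shrinks by correlation**2."""
    if not -1.0 < correlation < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    needed = unadjusted_events * (1.0 - correlation**2)
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(needed, 9))

for rho in (0.2, 0.4, 0.6):
    print(f"rho = {rho}: about {adjusted_event_count(400, rho)} events instead of 400")
```

Because the saving scales with $ρ^2$, a weakly prognostic score buys little: under this approximation a correlation of 0.2 saves about 4% of events, while 0.6 saves about 36%.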

2605.27694 2026-05-28 stat.AP

Likelihood-Free Inference for Multivariate Generalized Pareto Models

多元广义帕累托模型的无似然推断

Samira Aka, Marie Kratz, Philippe Naveau

AI总结 针对似然函数难处理或支撑离散的多元极值模型,提出两阶段无似然推断方法AW-NBE,结合神经贝叶斯估计与Sinkhorn散度优化,改善参数推断。

详情
AI中文摘要

基于似然的多元极值模型推断在似然函数难处理或支撑离散时通常不可靠或不可行。这一挑战在多元离散广义帕累托模型中尤为突出,因为需要从稀疏的超阈值样本中推断边际尾部行为和依赖结构。我们提出了一种两阶段无似然推断程序,称为AW--NBE(自适应Wasserstein神经贝叶斯估计器),它将神经贝叶斯估计与基于Sinkhorn散度的目标最优传输细化步骤相结合。在第一阶段,基于模拟数据训练的神经贝叶斯估计器提供快速稳定的初始参数估计。在第二阶段,通过最小化观测和模拟超阈值经验分布之间的Sinkhorn散度,对这些估计进行局部细化。这种细化减少了观测和模拟超阈值经验分布之间的Sinkhorn散度,同时保留了神经估计器学习到的依赖特征。模型充分性通过新的基于最优传输的多元Q-Q图和潜在诊断进行评估。应用于金融对数收益率和瑞士干旱期超阈值数据表明,与单独使用Sinkhorn散度、标准神经贝叶斯估计器或删失似然估计相比,AW--NBE可以改善参数推断。

英文摘要

Likelihood-based inference for multivariate extreme-value models is often unreliable or infeasible when likelihoods are intractable or supports are discrete. This challenge is particularly acute for multivariate discrete generalized Pareto models, where both marginal tail behavior and dependence must be inferred from sparse exceedance samples. We propose a two-stage likelihood-free inference procedure, termed AW--NBE (Adaptive Wasserstein Neural Bayes Estimator), that combines neural Bayes estimation with a targeted optimal transport refinement step based on the Sinkhorn divergence. In the first stage, a neural Bayes estimator trained on simulated data provides fast and stable initial parameter estimates. In the second stage, these estimates are locally refined by minimizing the Sinkhorn divergence between the empirical distributions of observed and simulated exceedances, while preserving the dependence features learned by the neural estimator. Model adequacy is assessed using new optimal-transport-based multivariate Q--Q and potential diagnostics. Applications to financial log-returns and Swiss dry spell exceedances suggest that AW--NBE can improve parameter inference compared with estimation based solely on the Sinkhorn divergence, the standard neural Bayes estimator, or censored likelihood estimation.

2605.27676 2026-05-28 stat.ML cs.LG

Unsupervised Identification and Removal of Spurious Correlations During Fine-Tuning

微调过程中虚假关联的无监督识别与消除

Ciarán M. Gilligan-Lee, Joseph Egan, Yuchen Zhu, Michael O'Riordan

AI总结 提出GRASP方法,通过梯度投影在微调时无监督识别并消除与任务纠缠的虚假关联,同时保留预训练知识,在三个任务上优于基线。

详情
Comments
10 + 4 pages, comments welcome
AI中文摘要

在精心策划的数据集上微调预训练语言模型可能会在微调任务与无意中的潜在因素(如不对齐的人物角色或政治倾向)之间产生虚假关联,而这些因素是由策划过程与任务纠缠在一起的。模型可能会依赖这些虚假关联,导致偏差并降低分布外泛化能力。我们证明,在任务复杂性和虚假关联的合理假设下,可以从朴素LoRA微调的权重中无监督地识别这些潜在因素。现有的消除偏差方法(如激活引导)在推理或训练期间从残差流激活中移除已识别的因素。然而,我们认为目标应该是消除虚假关联,而不是潜在因素本身,因为预训练模型可能依赖该因素来获取真实的任务信号。为此,我们提出GRASP(关联虚假模式的梯度投影),该方法防止模型对已识别的潜在因素产生新的依赖,同时保留沿该方向的任何预训练内容。我们在三个微调任务上进行了验证。前两个涉及紧急不对齐,即在狭窄任务(在我们的案例中,编写不安全的代码和给出糟糕的医疗建议)上进行微调会导致在无关话题上产生不对齐的响应。在这里,我们的方法在不安全代码案例中完全消除了不对齐,在糟糕医疗建议案例中减少了约5倍,在不对齐减少与任务保持之间的权衡中击败了所有基线。最后一个是新颖的政治偏见实验,即在右倾的Reddit金融建议数据上进行微调会导致在无关话题上产生政治倾向漂移。在这里,我们的方法将漂移减少了一半以上,同时提高了金融任务性能,击败了所有基线。

英文摘要

Fine-tuning a pretrained language model on a curated dataset can produce spurious correlations between the fine-tuning task and unintended latent factors -- such as misaligned personas or political slant -- that the curation procedure has entangled with the task. The model can latch onto these spurious correlations, leading to bias and reduced out-of-distribution generalisation. We prove that under reasonable assumptions on task complexity and the spurious correlation, such latent factors can be identified, without supervision, from the weights of a naive LoRA fine-tune. Existing approaches to removing bias, such as activation steering, remove identified factors from residual-stream activations, either at inference or during training. We argue, however, that the goal should be to remove the spurious correlation, not the latent factor itself, as the pretrained model may rely on it for genuine task signal. To enable this, we propose GRASP, GRadient projection of Associated Spurious Patterns, which prevents the model from acquiring new reliance on the identified latent factor while preserving any pretrained content along it. We validate on three fine-tuning tasks. The first two involve emergent misalignment, where fine-tuning on a narrow task -- in our case, writing insecure code and giving bad medical advice -- leads to misaligned responses on unrelated topics. Here our method completely removes misalignment in the insecure code case and reduces it by ~5x in the bad medical advice case, beating all baselines in the trade-off between misalignment reduction and task preservation. The last is a novel political-bias experiment, where fine-tuning on right-skewed Reddit financial-advice data causes political-lean drift on unrelated topics. Here our method reduces drift by more than half, while improving financial task performance, beating all baselines.
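
The abstract does not spell out how GRASP performs its projection, so the following is only a generic sketch of orthogonal gradient projection: the component of a gradient along a given direction is removed before the update, so training cannot move the weights along that direction while whatever the pretrained weights already encode along it stays untouched. The vectors and the function name are hypothetical.

```python
import math

def project_out(gradient, direction):
    """Remove from `gradient` its component along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    along = sum(g * u for g, u in zip(gradient, unit))  # signed length of the component
    return [g - along * u for g, u in zip(gradient, unit)]

grad = [1.0, 2.0, 3.0]      # hypothetical flattened gradient
spurious = [0.0, 0.0, 2.0]  # hypothetical identified latent direction
projected = project_out(grad, spurious)
print(projected)
```

This differs from activation steering in what is modified: the projection constrains the update rather than editing activations, which matches the stated aim of blocking new reliance without deleting the factor itself.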

2605.27671 2026-05-28 stat.ML cs.LG

Evolving and Detecting Multi-Turn Deception using Geometric Signatures

使用几何特征演化与检测多轮欺骗

Surender Suresh Kumar, Mary L. Cummings

AI总结 提出多目标遗传优化生成多轮欺骗问题集,并利用嵌入空间中的简单几何特征(角覆盖、距离比、线性度)结合轻量级分类器实现高召回率(0.89)的欺骗检测。

详情
AI中文摘要

大型语言模型(LLM)的安全防御通常针对单轮提示进行训练和评估,但实际攻击往往以间接的多轮探测形式展开。为了防御这种更微妙的欺骗形式,我们提出了一种统一流程,通过具有协同进化变异算子的多目标遗传提示优化,生成逼真的多轮欺骗问题集。我们通过人类研究验证了该数据集,该研究还表明,早期生成产生了最令人信服的欺骗,并且存在实际约束,如依从性过滤和顺序效应。利用这些数据,我们能够通过嵌入空间中简单、可解释的几何信号,结合轻量级前馈分类器,检测到试图获取被禁止信息的欺骗行为。三个几何特征(角覆盖、距离比和线性度)加上成对相似性统计,形成了一个紧凑的预测模型,在基础、改写和截断(三轮)场景中持续实现了高召回率(0.89),测试时F1值在0.74-0.86之间。结果支持一个中心假设:多轮欺骗意图会留下稳定的几何足迹,从而能够实现轻量级、透明的筛选,无需昂贵的端到端训练。我们进一步讨论了负责任的使用、局限性以及构建更大、更多样化的人类评估数据集的路径。对人工智能的主要贡献是多目标进化提示生成框架,工程应用是部署用于LLM安全基础设施的轻量级几何检测系统。

英文摘要

Safety defenses for large language models (LLMs) are typically trained and evaluated on single-turn prompts, yet real attacks often unfold as indirect, multi-turn probing. To defend against this more nuanced form of deception, we present a unified pipeline that generates realistic multi-turn deceptive question sets via multi-objective genetic prompt optimization with co-evolving mutation operators. We validate this dataset through a human study, which also revealed that early generations yielded the most convincing deception and surfaced practical constraints such as adherence filtering and ordering effects. Using this data, we were able to detect deceptive attempts to access prohibited information using simple, explainable geometric signals in embedding space coupled with a lightweight feed-forward classifier. Three geometric features (angular coverage, distance ratio, and linearity), augmented with pairwise similarity statistics, led to a compact predictive model that achieved consistently high recall (0.89) across base, reworded, and truncated (three-turn) scenarios, with test-time F1 ranging from 0.74 to 0.86. The results support a central hypothesis that multi-turn deceptive intent leaves a stable geometric footprint that enables lightweight, transparent screening without expensive end-to-end training. We further discuss responsible uses, limitations, and paths toward larger, more diverse human-evaluated datasets. The primary contribution to artificial intelligence is the multi-objective evolutionary framework for prompt generation, and the engineering application is the deployment of a lightweight geometric detection system for LLM safety infrastructure.

2605.27664 2026-05-28 stat.ME

BOOST: Power-Optimal Strong-FWER Testing for Block-Structured Multiplicity

BOOST: 块结构多重性的功率最优强FWER检验

Prasanjit Dubey, Xiaoming Huo

AI总结 针对块结构多重检验问题,提出BOOST方法,通过等边际KKT条件实现块间功率最优分配,在有限样本下以O(K)成本控制强FWER,并显著提升发现能力。

详情
AI中文摘要

结构化多重检验问题(如门控试验、剂量探索、多组织eQTL定位、捆绑挑战者A/B实验)将假设组织成设计强加的块,并要求对确认性声明进行强族系错误率(FWER)控制。目前从业者使用目标无关的逐步规则(Bonferroni、Holm、Hochberg、Hommel)、封闭检验和图形扩展,或分层和重抽样方法;没有一种方法在这些设计所诱导的块可分离类中达到功率最优。我们引入BOOST(块最优目标驱动强FWER检验),这是针对块大小为三的功率最优强FWER程序,具有三个保证:(i) 有限样本下以O(K)成本(相对于一般封闭检验的O(K^2))实现强FWER有效性,无需独立性假设,在跨块独立下具有严格的Sidak改进;(ii) 通过等边际KKT条件实现跨异质块的功率最优分配,可通过二分法在O(B log(1/ε))内求解;(iii) 针对未知备选密度g的样本分割插件变体,实现α控制,膨胀至多O(B_T E‖g-ĝ‖_∞),每个假设的功率赤字与B_T无关。在独立、等相关、稀疏和错误设定情景下的模拟显示,在校准FWER下,相对于最强现有基线,功率提升1.4-1.7倍。在两个已发表数据集(BLUEPRINT跨谱系顺式eQTL和Upworthy捆绑挑战者A/B实验)上,BOOST在受控FWER下比现有基线认证了数量级更多的全块发现。

英文摘要

Structured multiple-testing problems (gatekeeping trials, dose-finding, multi-tissue eQTL mapping, bundled-challenger A/B experiments) organize hypotheses into design-imposed blocks and demand strong family-wise error rate (FWER) control for confirmatory claims. Practitioners currently use objective-agnostic stepwise rules (Bonferroni, Holm, Hochberg, Hommel), closed-testing and graphical extensions, or hierarchical and resampling methods; none is power-optimal within the block-separable class these designs induce. We introduce BOOST (Block-Optimal Objective-driven Strong-FWER Testing), the power-optimal strong-FWER procedure for block size three, with three guarantees: (i) finite-sample strong-FWER validity at $O(K)$ cost (versus $O(K^2)$ for general closed testing) without independence assumptions, with a strict Sidak improvement under cross-block independence; (ii) power-optimal allocation across heterogeneous blocks via an equalized-marginal KKT condition, solvable by bisection in $O(B\log(1/\varepsilon))$; and (iii) a sample-split plug-in variant for unknown alternative density $g$, attaining $α$-control up to $O(B_T \mathbb E\|g-\widehat g\|_\infty)$ inflation with per-hypothesis power deficit independent of $B_T$. Simulations across independent, equicorrelated, sparse, and mis-specified regimes show 1.4-1.7$\times$ power gains over the strongest existing baseline at calibrated FWER. On two published datasets (BLUEPRINT cross-lineage cis-eQTL and Upworthy bundled-challenger A/B experiments), BOOST certifies an order of magnitude more full-block discoveries than existing baselines at controlled FWER.
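
For reference, two of the objective-agnostic baselines named above are easy to state in code. The sketch below implements the Bonferroni rule and the Holm step-down procedure, both of which control the FWER in the strong sense under arbitrary dependence; it does not implement BOOST, and the p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns one rejection flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject

p = [0.010, 0.015, 0.020, 0.900]  # hypothetical p-values
bonf = bonferroni_reject(p)
holm = holm_reject(p)
print("Bonferroni:", bonf)
print("Holm:      ", holm)
```

Holm rejects everything Bonferroni rejects and sometimes more, but neither rule uses the block structure or a power objective, which is the gap the abstract says BOOST targets.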

2605.27655 2026-05-28 stat.ME

Implementing the principal stratum strategy for intercurrent events with survival outcomes: a tutorial

针对生存结局的并发事件的主分层策略实施:教程

Xiaoxiao Zhou, Joyce Chen, Pallavi Mishra-Kalyani, Xiaoxue Li, Yuan Li Shen, Shu Wang, Susan Halabi, Fan Li

AI总结 本教程回顾了主分层策略在生存结局并发事件中的应用,通过混合模型和加权方法估计因果效应,并提供了R代码和模拟研究。

详情
AI中文摘要

国际协调理事会(ICH)E9(R1)增补提供了估计框架,用于制定临床试验中的治疗效果。该框架描述的估计属性之一是并发事件。在指南列出的五种应对并发事件的策略中,主分层策略在概念和技术上最具挑战性,因为它定义了未观察到的分层上的治疗效果。其在生存结局中的应用尤其难以被实践者掌握。本教程回顾了采用主分层策略处理生存结局并发事件的估计框架的方法和实施。我们使用肿瘤学临床试验进行说明,并关注一个简单案例:二元治疗和单一二元并发事件(即停止指定治疗)。我们定义了因果效应,并回顾了估计效应的两种主要方法:混合模型方法和加权方法。对于每种方法,我们详细阐述了相关假设、模型、敏感性分析、软件,并提供了示例R代码。我们进行了模拟研究,模拟真实研究以考察这些方法的操作特性。

英文摘要

The International Council for Harmonization (ICH) E9 (R1) addendum provides the estimand framework to formulate treatment effects in a clinical trial. One of the attributes of an estimand that the framework describes is intercurrent events. Among the five strategies the guidance lists for addressing intercurrent events, the principal stratum strategy is the most conceptually and technically challenging because it defines treatment effects on unobserved strata. Its application to survival outcomes is particularly inaccessible to practitioners. This tutorial reviews the methodology and implementation of the estimand framework with the principal stratum strategy to address intercurrent events with survival outcomes. We illustrate using a clinical trial in oncology and focus on a simple case with binary treatment and a single binary intercurrent event, discontinuation of the assigned treatment. We define the causal effects and review two main methods for estimating the effects: the mixture model method and the weighting method. For each method, we elaborate on the associated assumptions, models, sensitivity analysis, and software, and provide example R code. We conduct simulation studies that mimic the real study to examine the operating characteristics of these methods.

2605.27650 2026-05-28 stat.ME

Bayesian Imputation for Unplayed Games in Round-Robin Chess Tournaments: Application to Grand Chess Tour, Bucharest 2026

循环赛国际象棋比赛中未进行局面的贝叶斯插补:应用于2026年布加勒斯特国际象棋大巡回赛

Ravi Varadhan

AI总结 针对棋手中途退赛导致的未进行局面计分问题,提出基于最佳线性无偏预测(BLUP)的贝叶斯框架,通过结合赛前等级分与观测表现进行插补,相比国际棋联现行规则降低26%预测误差。

详情
AI中文摘要

当棋手在循环制国际象棋比赛中途退赛时,组织者面临一个基本问题:如何为从未进行的对局分配得分?当前的国际棋联(FIDE)指南规定,如果退赛发生在完成50%对局之前,则取消成绩;之后则判负(奖励未对阵对手1分)。这种二分法规则造成了任意的间断性,并可能严重扭曲最终排名。我们基于最佳线性无偏预测(BLUP)开发了一个贝叶斯框架,该框架最优地结合了赛前等级分与观察到的表现,产生反映退赛棋手当前状态和未对阵对手间实力差异的插补得分。该估计量是一致、保点的,并在线性无偏预测中最小化均方误差。对180,000场模拟比赛的蒙特卡洛模拟研究表明,与FIDE现行规则相比,贝叶斯BLUP插补总体上将预测误差降低了26%,比判负规则改进41%,比取消成绩规则改进12%。最大的改进发生在退赛棋手表现不佳时,这是最常见的退赛情况。我们进一步表明,在所有场景下,取消成绩规则的RMSE比判负规则低15-45%。该方法应用于GM Alireza Firouzja在2026年布加勒斯特国际象棋大巡回赛中的退赛,贝叶斯插补本应给予未对阵对手0.55-0.70分,而不是在判负规则下给予的1.0分。为比赛组织者提供了一个开源R Shiny应用程序。我们建议FIDE在世界冠军赛周期赛事中采用贝叶斯插补,或至少用统一的取消成绩规则取代当前的二分法规则。

英文摘要

When a player withdraws mid-tournament from a round-robin chess event, organizers face a fundamental problem: how should scores be assigned for games that were never played? Current FIDE guidelines specify annulment if withdrawal occurs before 50% of games are completed, and forfeit (awarding unplayed opponents a full point) thereafter. This dichotomous rule creates arbitrary discontinuities and can substantially distort final standings. We develop a Bayesian framework based on best linear unbiased prediction (BLUP) that optimally combines pre-tournament ratings with observed performance, producing imputed scores that reflect both the withdrawn player's current form and the strength differentials among unplayed opponents. The estimator is consistent, point-conserving, and minimizes mean squared error among linear unbiased predictors. A Monte Carlo simulation study on 180,000 simulated tournaments demonstrates that Bayesian BLUP imputation reduces prediction error by 26% overall compared to FIDE's current rule, with improvements of 41% over forfeit and 12% over annulment. The largest gains occur when the withdrawn player is underperforming, the most common withdrawal scenario. We further show that annulment achieves 15-45% lower RMSE than forfeit across all scenarios. The methodology is applied to GM Alireza Firouzja's withdrawal at Grand Chess Tour, Bucharest 2026, where Bayesian imputation would have awarded unplayed opponents 0.55-0.70 points rather than the 1.0 awarded under forfeit rules. An open-source R Shiny application is provided for tournament organizers. We recommend that FIDE adopt Bayesian imputation for World Championship cycle events, or at minimum replace the current dichotomous rule with uniform annulment.
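The idea of blending a pre-tournament rating with observed form can be illustrated with a small sketch. This is not the paper's BLUP estimator: the Elo expected-score formula is the standard one, but the shrinkage rule and the `prior_games` parameter are hypothetical choices made only for illustration.

```python
def elo_expected(r_player: float, r_opp: float) -> float:
    """Standard Elo expected score of a player against one opponent."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_player) / 400.0))


def imputed_scores(rating, played, unplayed_ratings, prior_games=4.0):
    """Impute the withdrawn player's score in each unplayed game.

    played: list of (opponent_rating, actual_score) for completed games.
    prior_games: hypothetical pseudo-count controlling how strongly the
    rating-based expectation is trusted over observed form (an assumption).
    """
    n = len(played)
    # Average over-/under-performance per game relative to Elo expectation.
    surplus = sum(score - elo_expected(rating, opp) for opp, score in played)
    per_game = surplus / n if n else 0.0
    # More completed games means more weight on observed form.
    weight = n / (n + prior_games)
    imputed = []
    for opp in unplayed_ratings:
        s = elo_expected(rating, opp) + weight * per_game
        imputed.append(min(1.0, max(0.0, s)))  # keep within [0, 1]
    return imputed


# A 2700-rated player who lost two games to 2700-rated opponents, then withdrew.
withdrawn = imputed_scores(2700, [(2700, 0.0), (2700, 0.0)], [2700])
opponent_points = [1.0 - s for s in withdrawn]  # each game still totals one point
```

In this sketch the unplayed opponent receives `1 - s` points, so every imputed game still distributes exactly one point, which is one simple way to read the point-conserving property mentioned in the abstract.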

2605.27648 2026-05-28 stat.AP

Why pyrotechnics markets keep killing: a simple geometric argument for redesign

为什么烟火市场持续致命:一个简单的几何重新设计论证

Carlos M. Hernandez-Suarez, Alonso Sanchez-Maldonado, Carlos A. Robles-Hernandez

AI总结 本文通过几何传播和疏散模型,论证市场全局拓扑结构是烟火市场火灾死亡率的主要决定因素,并提出一种基于空间生态学种子传播接触过程模型的市场几何设计,以同时减缓火势蔓延和缩短疏散距离。

详情
Comments
Nine pages, three figures
AI中文摘要

烟火零售市场的火灾和爆炸在全球范围内以可预测的规律性反复发生,单次事件导致数十至数百人死亡。本文论证市场的全局拓扑结构是死亡率的主要决定因素,通过两个独立的几何渠道起作用。第一个是传播,涉及点燃物品的弹道扩散:火势在摊位之间蔓延的概率与扩散范围内摊位的空间密度成正比。第二个是疏散,涉及场内人员到达边界所必须穿越的距离,该距离由市场足迹的全局几何决定,而非任何摊位级别的参数。由于死亡风险随疏散时间近似指数增长,拓扑结构将疏散距离的微小差异放大为伤亡人数的巨大差异。目前美国、欧盟和墨西哥的标准规定了过道宽度和摊位间距等局部参数,但未对市场的全局拓扑进行监管。我们认为拓扑结构应成为可监管的设计变量,并基于空间生态学中种子传播的接触过程模型,提出一种既能减缓火势蔓延又能缩短疏散距离的市场几何设计。

英文摘要

Fires and explosions in pyrotechnics retail markets recur worldwide with predictable regularity, killing dozens to hundreds of people in single events. This paper argues that the global topology of the market is the dominant determinant of mortality, acting through two independent geometric channels. The first, propagation, concerns ballistic dispersal of ignited articles: the probability that fire spreads between blocks scales with the spatial density of blocks within the dispersal range. The second, evacuation, concerns the distance an occupant must traverse to reach the perimeter, which is set by the global geometry of the market footprint, not by any stall-level parameter. Because mortality risk grows approximately exponentially in evacuation time, topology amplifies modest differences in egress distance into large differences in casualties. Current standards in the United States, the European Union, and Mexico prescribe local parameters such as aisle width and stall separation, but leave the global topology of the market unregulated. We argue that topology should be a regulable design variable, and propose a market geometry that simultaneously slows propagation and shortens evacuation, derived from contact-process models of seed dispersal in spatial ecology.

2605.27597 2026-05-28 stat.AP

Purely analytic composites: Relative variance contributions of indicators corresponding to a priori indicator weights

纯分析合成指标:对应先验指标权重的指标相对方差贡献

Andre Beauducel, Ned Kock

AI总结 提出纯分析合成指标,使得合成指标内各指标的方差贡献恰好等于其先验权重所定义的比例,并通过模拟数据和ETF应用示例说明其与普通分析合成指标的区别。

详情
AI中文摘要

合成指标通常是为了方便决策者而创建的。因此,实践或理论上的考虑可能导致形成合成指标的指标具有先验权重。作为加权聚合体创建的合成指标并非数据分析的结果,因此可称为“分析合成指标”。然而,已有研究表明,分析合成指标中指标的方差贡献受指标方差和指标间相关性的影响。在本研究中,我们提出了纯分析合成指标,其内部指标的方差贡献恰好等于由指标权重先验定义的比例。基于模拟数据的示例说明了分析合成指标与纯分析合成指标之间的差异。作为应用领域,我们提出纯分析合成指标可能在交易所交易基金中具有意义。附录中给出了用于计算纯分析合成指标的R脚本。

英文摘要

Composites are often created to facilitate the work of decision-makers. Therefore, practical or theoretical considerations may lead to a priori weights of the indicators forming a composite. Composites that are created as weighted aggregates are not the result of data analysis and may therefore be termed 'analytic composites'. However, it has already been shown that the variance contributions of indicators within analytic composites are affected by the indicator variances and the indicator inter-correlations. In the present study, purely analytic composites are proposed, in which the variance contributions of the indicators are exactly those defined a priori by the indicator weights. An example based on simulated data illustrates the difference between analytic composites and purely analytic composites. As an application area, we propose that purely analytic composites could be of interest for exchange-traded funds. An R script for the computation of purely analytic composites is given in the Appendix.
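To see why a priori weights need not match variance contributions in an ordinary weighted composite, the following sketch uses one common covariance-based decomposition, in which indicator i contributes w_i * sum_j(w_j * cov_ij) to the composite variance. This definition is an assumption chosen for illustration and may differ from the one used in the paper; the function name is hypothetical.

```python
def variance_shares(weights, cov):
    """Share of composite variance attributed to each indicator.

    Uses the decomposition Var(C) = sum_i w_i * (cov @ w)_i, so that the
    shares sum to one. This is one conventional choice, assumed here.
    """
    n = len(weights)
    cov_w = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    parts = [weights[i] * cov_w[i] for i in range(n)]
    total = sum(parts)  # equals the variance of the weighted composite
    return [p / total for p in parts]


# Two uncorrelated indicators with equal weights but variances 1 and 4.
shares = variance_shares([0.5, 0.5], [[1.0, 0.0], [0.0, 4.0]])
# shares is approximately [0.2, 0.8], far from the intended 50/50 split.
```

Even with equal weights, the higher-variance indicator dominates the composite in this example, which is the kind of mismatch the abstract describes.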

2605.27594 2026-05-28 cs.DS cs.LG stat.ML

Proper Agnostic Learning of Functions of Halfspaces under Gaussian Marginals

高斯边际下半空间函数的恰当不可知学习

Sergei Tikhonov, Arsen Vasilyan

AI总结 针对高斯分布下K个半空间的任意布尔函数,提出首个高效恰当不可知学习算法,运行时间在维度d上达到最优。

详情
AI中文摘要

我们研究了高斯分布下多维概念类的高效恰当不可知学习问题。在该设置中,给定来自$\mathbb{R}^d \times \{\pm 1\}$上未知分布(其边际在$\mathbb{R}^d$上为高斯分布)的i.i.d.标记样本,目标是输出目标类$\mathcal{F}$中的一个假设,使其0-1损失与$\mathcal{F}$中最优分类器的损失相差不超过$\epsilon$。我们给出了高斯边际下K个半空间的任意布尔函数的首个高效恰当不可知学习算法。我们的算法运行时间为$d^{O(K^2 \log(1/\epsilon)/\epsilon^2)} + (K/\epsilon)^{O(K^3/\epsilon^{2.5})}$。在我们工作之前,对于$K \geq 2$,唯一已知的算法是暴力搜索,运行时间关于d指数级。此外,我们运行时间对维度d的依赖与已知最佳非恰当学习算法相匹配,即$d^{\widetilde{O}(K^2/\epsilon^2)}$。对于单个半空间($K=1$)的特殊情况,先前最佳运行时间为$d^{O(1/\epsilon^4)} + (1/\epsilon)^{O(1/\epsilon^6)}$。我们的算法将其改进为$d^{O(1/\epsilon^2)} + (1/\epsilon)^{O(1/\epsilon^{2.5})}$。同样,对d的依赖与已知最佳非恰当算法$d^{O(1/\epsilon^2)}$相匹配。此外,我们运行时间对维度d的依赖在统计查询模型中本质上是最优的。

英文摘要

We study the problem of computationally efficient proper agnostic learning of multidimensional concept classes under the Gaussian distribution. In this setting, given i.i.d. labeled samples from an unknown distribution over $\mathbb{R}^d \times \{\pm 1\}$ whose marginal on $\mathbb{R}^d$ is Gaussian, the goal is to output a hypothesis from a target class $\mathcal{F}$ whose 0-1 loss is within $ε$ of that of the best classifier in $\mathcal{F}$. We give the first efficient proper agnostic learning algorithm for arbitrary Boolean functions of $K$ halfspaces under Gaussian marginals. Our algorithm runs in time $d^{O(K^2 \log(1/ε)/ε^2)} + (K/ε)^{O(K^3/ε^{2.5})}$. Prior to our work, the only known algorithm for $K \geq 2$ was brute-force search, with run-time exponential in $d$. Moreover, the dependence of our run-time on the dimension $d$ matches that of the best known improper learning algorithm, namely $d^{\widetilde{O}(K^2/ε^2)}$. For the special case of a single halfspace ($K=1$), the best previous run-time was $d^{O(1/ε^4)} + (1/ε)^{O(1/ε^6)}$. Our algorithm improves this to $d^{O(1/ε^2)} + (1/ε)^{O(1/ε^{2.5})}$. Once again, the dependence on $d$ matches that of the best known improper algorithm, namely $d^{O(1/ε^2)}$. Furthermore, the dependence of our run-time on the dimension $d$ is essentially optimal in the statistical query model.

2605.27563 2026-05-28 math.PR cs.AI stat.ML

On the Subgaussianity of Quantized Linear Maps: An AI-Assisted Note

关于量化线性映射的次高斯性:一份AI辅助笔记

Guangyi Zou, Roman Vershynin

AI总结 本文通过Gemini 3.5 Flash发现了一个与维度无关的次高斯集中界,适用于高斯向量在坐标非线性映射下的情况,并应用于回答Simone Bombari关于符号量化线性映射的问题。

详情
Comments
4 pages
AI中文摘要

这份简短的笔记给出了高斯向量在坐标非线性映射下与维度无关的次高斯集中界。该结果由Gemini 3.5 Flash发现,适用于任何在良态协方差下的有界函数。我们应用这一工具回答了Simone Bombari关于符号量化线性映射$Y = \text{sgn}(Wx)$的问题。

英文摘要

This short note presents a dimension-independent subgaussian concentration bound for Gaussian vectors under coordinate-wise nonlinear mappings. Discovered by Gemini 3.5 Flash, this result applies to any bounded function under a well-conditioned covariance. We apply this tool to answer a question of Simone Bombari on sign-quantized linear maps $Y = \text{sgn}(Wx)$.

2605.27556 2026-05-28 stat.ML cs.LG

Accelerating Reinforcement Learning Training Using Simulation Surrogate Models

利用仿真代理模型加速强化学习训练

Mohammadmahdi Ghasemloo, David J. Eckman, Yaxian Li

AI总结 针对奖励结构、模型参数或系统动态随时间变化的环境,提出使用仿真代理模型加速强化学习训练和再训练,并通过离散事件仿真实验验证其有效性。

详情
AI中文摘要

高保真仿真模型被广泛用于分析复杂随机系统,但其高计算成本促使开发更廉价的代理模型来近似仿真模型的输入-输出关系。同时,强化学习(RL)已成为在随机环境中进行在线决策的强大框架,越来越多的人关注使用仿真模型作为RL模型的训练环境。我们研究了一类适用于加速RL训练的代理模型,这些模型适用于奖励结构、模型参数或系统动态随时间变化的环境,并探讨了它们与仿真模型和RL模型的相互作用。通过对一个通过离散事件仿真建模的随机服务系统进行数值实验,我们证明利用代理模型可以显著加速RL训练和再训练。

英文摘要

High-fidelity simulation models are widely used to analyze complex stochastic systems, but their high computational cost motivates the development of cheaper surrogate models that approximate the simulation model's input-output relationship. In parallel, reinforcement learning (RL) has emerged as a powerful framework for making online decisions in stochastic environments, with increasing attention being given to the use of simulation models as training environments for RL models. We investigate a class of surrogate models suitable for accelerating RL training in settings where the reward structure, model parameters, or system dynamics change over time and explore their interactions with simulation models and RL models. Through numerical experiments on a stochastic service system modeled via discrete-event simulation, we demonstrate that leveraging surrogate models can substantially accelerate RL training and re-training.

2605.27526 2026-05-28 stat.ML cs.LG

Semiparametrically Efficient Inference for Kernel Measures of Noise Heterogeneity

噪声异质性核测度的半参数有效推断

Jakub Wornbard, Zikai Shen, Dimitri Meunier, Arthur Gretton

AI总结 针对加性噪声模型中噪声异质性的核测度,提出一种基于希尔伯特值一步估计的半参数有效推断方法,实现残差独立性和拟合优度的自举校准检验,并提供渐近有效的置信区间。

详情
AI中文摘要

我们为加性噪声模型中噪声异质性的核测度开发了半参数有效推断。在许多应用中,回归函数使用灵活的机器学习方法进行估计。基于所得残差的下游过程可能继承第一阶段偏差:回归误差可能引起协变量与残差之间的虚假依赖,从而使标准分析所需的假设无效。我们构建了一个新颖的希尔伯特值一步估计量,用于估计协变量与残差之间的核协方差算子。我们的估计量为加性噪声模型中的残差独立性和拟合优度提供了自举校准检验,同时在噪声异质性下为核依赖测度提供了渐近有效的置信区间。该框架扩展到包含额外协变量的设置,从而能够推断不同处理组间残差噪声的分布异质性。模拟显示,与朴素插件残差方法相比,校准和功效有所改进。

英文摘要

We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.

2605.27523 2026-05-28 stat.ML cs.LG

Identifiable Bayesian Deep Generative Copulas with Unknown Layer Widths for Data with Arbitrary Marginal Distributions

可识别的贝叶斯深度生成Copula模型:未知层宽下任意边缘分布数据的建模

Joseph Feldman, Yuqi Gu

AI总结 提出Deep Discrete Encoder (DDE) Copula模型,通过二元潜变量的分层有向网络与Copula框架结合,实现任意边缘分布数据的可识别与可解释生成建模,并基于秩似然进行估计与后验推断。

详情
AI中文摘要

深度生成模型为多变量数据分析提供了强大工具,但其黑箱架构往往不可识别且难以解释。我们引入了Deep Discrete Encoder (DDE) Copula,一种用于任意边缘分布多变量数据的可识别且可解释的生成模型。该模型在Copula框架内放置了一个二元潜变量的分层有向网络,从而能够灵活地对混合离散和连续数据进行依赖关系建模。估计基于秩似然,它将边缘建模与DDE参数的后验推断解耦,并避免了指定边缘分布。我们建立了DDE Copula参数可识别的条件,确保层特定参数提供有意义的多元依赖总结。我们还证明了在精确秩似然下连续边缘的商空间后验一致性,并将用于结或混合边缘的扩展秩似然视为广义似然,在额外对比条件下具有集中性。在计算方面,我们提出了一种随机期望最大化算法用于最大后验估计,并辅以改进收敛的初始化策略。为了自适应地学习网络维度,我们将贝叶斯秩选择先验扩展到推断层特定宽度。模拟实验展示了强大的有限样本性能,一项人格调查分析揭示了复杂多变量数据中可解释的分层潜在结构。

英文摘要

Deep generative models offer powerful tools for multivariate data analysis, but their black-box architectures are often unidentified and difficult to interpret. We introduce the Deep Discrete Encoder (DDE) Copula, an identifiable and interpretable generative model for multivariate data with arbitrary marginal distributions. The model places a hierarchical directed network of binary latent variables inside a copula framework, enabling flexible dependence modeling for mixed discrete and continuous data. Estimation is based on rank likelihoods, which decouple marginal modeling from posterior inference on the DDE parameters and avoid specifying the marginal distributions. We establish conditions for identification of the DDE copula parameters, ensuring that layer-specific parameters provide meaningful summaries of multivariate dependence. We also prove quotient-space posterior consistency for continuous margins under the exact rank likelihood and treat the extended rank likelihood for tied or mixed margins as a generalized likelihood, with concentration under an additional contrast condition. For computation, we propose a stochastic expectation-maximization algorithm for \emph{maximum a posteriori} estimation, together with initialization strategies that improve convergence. To learn network dimension adaptively, we extend Bayesian rank-selection priors to infer layer-specific widths. Simulations show strong finite-sample performance, and a personality-survey analysis reveals interpretable hierarchical latent structure in complex multivariate data.

2605.27499 2026-05-28 cs.LG astro-ph.CO astro-ph.IM physics.comp-ph stat.ML

GenSBI: Generative Methods for Simulation-Based Inference in JAX

GenSBI: 基于JAX的模拟推断生成方法

Aurelio Amerio

AI总结 提出GenSBI库,在JAX中实现流匹配、分数匹配和去噪扩散等生成模型,用于模拟推断,提供统一接口和多种Transformer架构,并在标准基准上达到接近理想的C2ST分数。

详情
Comments
48 pages + 1 appendix, 33 figures, 18 tables. For the associated Python code, see https://github.com/aurelio-amerio/GenSBI
AI中文摘要

流和扩散生成模型已成为模拟推断(SBI)中广泛采用的密度估计器,从神经后验估计自然扩展到似然和联合密度估计。它们原则性的优化目标和不受架构约束的特点推动了在自然科学中的快速采用。然而,最广泛使用的SBI库仍然是基于PyTorch的,这使得在JAX中开发前向模型和分析流程的研究人员没有原生选择。我们提出GenSBI,一个完全在JAX中实现流匹配、分数匹配和去噪扩散的开源库。该库提供三种基于Transformer的架构——SimFormer、Flux1和一种新颖的Flux1Joint,它将门控调制Transformer块扩展到联合密度估计——所有这些都通过一个统一接口互换,该接口解耦了生成方法、神经骨干和推理模式。GenSBI提供了从训练到后验校准(SBC、TARP、LC2ST)的端到端工作流,并支持具有领域特定嵌入网络的自定义架构。我们在标准SBI基准上验证了该框架,在SBIBM任务上以最小的每任务调整实现了接近理想的平均C2ST分数(0.50-0.56,其中0.50为理想值),并且在所有测试配置中后验覆盖校准良好。代码公开于https://github.com/aurelio-amerio/GenSBI。

英文摘要

Flow and diffusion generative models have established themselves as widely adopted density estimators for simulation-based inference (SBI), extending naturally from neural posterior estimation to likelihood and joint density estimation. Their principled optimization objectives and freedom from architectural constraints have driven rapid adoption across the natural sciences. Yet the most widely used SBI libraries remain PyTorch-based, leaving researchers who develop their forward models and analysis pipelines in JAX without a native option. We present GenSBI, an open-source library that implements flow matching, score matching, and denoising diffusion entirely in JAX. The library offers three transformer-based architectures - SimFormer, Flux1, and a novel Flux1Joint that extends gate-modulated transformer blocks to joint density estimation - all interchangeable through a unified interface that decouples generative method, neural backbone, and inference mode. GenSBI provides an end-to-end workflow from training through posterior calibration (SBC, TARP, LC2ST) and supports custom architectures with domain-specific embedding networks. We validate the framework on standard SBI benchmarks, achieving near-ideal mean C2ST scores (0.50-0.56, where 0.50 is ideal) on SBIBM tasks with minimal per-task tuning and well-calibrated posterior coverage across all tested configurations. The code is publicly available at https://github.com/aurelio-amerio/GenSBI.

2605.27496 2026-05-28 stat.ME

Model-based clustering for spherical and hyper-spherical data using elliptically symmetric distributions

基于模型的球面和超球面数据聚类:使用椭圆对称分布

Theodoros Perdikis, Nader Alharbi, Michail Tsagris

AI总结 提出使用椭圆对称分布(椭圆对称角度高斯分布和球面椭圆对称投影柯西分布)进行基于模型的定向数据聚类,通过期望最大化算法实现,并比较了两种分布在聚类数选择和计算成本上的表现,应用于球面和超球面数据集。

详情
AI中文摘要

基于模型的定向数据聚类引起了广泛兴趣,但大多数方法使用旋转对称分布。本文建议使用椭圆对称分布,即最近文献中提出的用于建模球面数据的椭圆对称角度高斯分布和球面椭圆对称投影柯西分布。采用期望最大化算法,并考察了协变量的包含。模拟研究比较了这两种分布在选择最优聚类数和计算成本方面的表现。我们使用这两种分布的混合来聚类两个球面数据集(地震位置)和两个超球面数据集。

英文摘要

Model-based clustering for directional data has attracted a lot of interest, but most methods utilize rotationally symmetric distributions. This paper suggests the use of elliptically symmetric distributions, namely the elliptically symmetric angular Gaussian and the spherical elliptically symmetric projected Cauchy distributions that were recently proposed in the literature for modelling spherical data. The expectation-maximization algorithm is employed and the inclusion of covariates is also examined. Simulation studies compare the two distributions in terms of choosing the optimal number of clusters and computational cost. We use mixtures of these two distributions to cluster two datasets on the sphere (earthquake locations) and two hyper-spherical datasets.

2605.27477 2026-05-28 stat.ML cs.LG

Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the $1+K$ Lower Bound

迭代因果发现:每边不可能性证书、分层感知的Oracle查询以及$1+K$下界

Eichi Uehara

AI总结 提出一种迭代因果发现协议,通过为每条候选边分配不可能性证书(RESOLVED/IMPOSSIBLE代码)和五层门控可识别性层级(LSNM、IGCI、Stein、MDL、PEIT),结合两种Oracle原语(元枢纽查询和子节点查询),在理想Oracle假设下实现了最多$1+K$次专家交互即可恢复任意DAG的上界。

详情
Comments
Contains 10 figures and 5 tables
AI中文摘要

因果发现算法返回一个有向图,但无法原则性地区分由数据确定的边方向和在没有识别假设的情况下分配的边方向。在标准马尔可夫性和忠实性条件下,观测分布仅识别一个马尔可夫等价类;该类内的方向不由联合分布决定,且无法仅通过额外样本恢复,而是需要功能限制或干预。我们提出一种针对连续数据的观测因果发现协议,该协议为每个候选边附加一个离散的不可能性证书:RESOLVED代码记录提交方向所依据的可识别性定理,而IMPOSSIBLE代码记录失败模式以及领域专家必须回答以解决该问题的具体问题。双变量级联扩展了五个门控可识别性层级:LSNM、IGCI、Stein、MDL和PEIT,当它们的前提条件检验被拒绝时,这些层级会弃权。两种Oracle原语——元枢纽查询和子节点查询——共同建立了最多$1+K$次专家交互的上界,足以恢复任意DAG,其中$K$表示非叶节点的数量。在理想Oracle假设下,该界在asia、sachs、child和alarm基准上被精确达到。

英文摘要

Causal-discovery algorithms return a directed graph, yet provide no principled means of distinguishing edge directions identified by the data from those assigned without an identifying assumption. Under the standard Markov and faithfulness conditions, the observational distribution identifies only a Markov equivalence class; orientations within that class are not determined by the joint distribution and cannot be recovered from additional samples alone, but require either a functional restriction or an intervention. We introduce a protocol for observational causal discovery on continuous data that attaches to each candidate edge a discrete impossibility certificate: a RESOLVED code records the identifiability theorem under which the direction was committed, while an IMPOSSIBLE code records the failure mode together with the specific question a domain expert must answer to resolve it. The bivariate cascade is extended with five gated identifiability tiers (LSNM, IGCI, Stein, MDL, and PEIT) that abstain when their precondition test rejects. Two oracle primitives, the meta-hub query and the node-children query, jointly establish an upper bound of $1+K$ expert interactions sufficient to recover any DAG, where $K$ denotes the number of non-leaf vertices. Under an ideal-oracle assumption, the bound is met exactly on the asia, sachs, child, and alarm benchmarks.

2605.27473 2026-05-28 stat.ML cs.LG

Calibrated Inference for the Conditional Average Treatment Effect in the Few-Placebo Regime via Gaussian Processes

在少安慰剂条件下通过高斯过程对条件平均处理效应的校准推断

Eichi Uehara

AI总结 针对少安慰剂条件下条件平均处理效应估计的校准不确定性,提出GP-CATE方法,通过高斯过程直接建模每个臂的结果曲面,实现校准覆盖。

详情
Comments
14 pages, 1 figure, 5 tables
AI中文摘要

估计干预对给定个体的帮助程度——条件平均处理效应(CATE)——在医学、经济学和政策决策中日益重要,当估计值伴随校准的不确定性区间时最为有用。我们研究少安慰剂条件,即一个治疗臂远小于另一个,如出现在非均衡分配试验和小样本保留的A/B测试中。该设置下的标准估计器是X-Learner,获得可信区间的自然方法是使其第二阶段贝叶斯化。我们表明这些区间覆盖不足:它们包含真实效应的频率低于名义水平。我们将其归因于结构性原因——X-Learner的回归目标继承了拟合小臂的干扰模型的偏差,因此后验中心偏离真实效应。我们发现标准补救措施——回归正交双稳健得分——在此也不可靠,因为该条件的有限重叠使得估计器要么高度可变,要么一旦稳定后再次有偏。这两种后果反映了超越因果推断的模式:单独估计的方差附加到难以学习的量的点估计上,而点估计的偏差未被该方差捕获。我们提出GP-CATE,它用高斯过程建模每个臂的结果曲面,因此稀缺臂的不确定性直接进入后验,而不是作为未建模的偏差。在合成和半合成基准测试中,GP-CATE实现了校准覆盖,而我们比较的估计器(包括Causal Forest和BART)未能做到,代价是当数据无信息时区间适当变宽。

英文摘要

Estimating how much an intervention helps a given individual, the conditional average treatment effect (CATE), is increasingly central to decision-making in medicine, economics, and policy, where an estimate is most useful when accompanied by a calibrated uncertainty interval. We study the few-placebo regime, in which one treatment arm is much smaller than the other, as arises in unequal-allocation trials and small-holdout A/B tests. The standard estimator in this setting is the X-Learner, and a natural way to obtain credible intervals is to make its second stage Bayesian. We show that these intervals under-cover: they contain the true effect less often than their nominal level. We trace this to a structural cause: the X-Learner's regression target inherits the bias of a nuisance model fitted to the small arm, so the posterior is centered away from the true effect. We also find that the standard remedy, regressing an orthogonal doubly-robust score, is unreliable here, since the regime's limited overlap leaves the estimator either highly variable or, once stabilized, biased once more. Both consequences reflect a pattern that extends beyond causal inference: a separately estimated variance is attached to a point estimate of a hard-to-learn quantity, and the point estimate's bias is not captured by that variance. We propose GP-CATE, which models each arm's outcome surface with a Gaussian process, so the scarce arm's uncertainty enters the posterior directly rather than as an unmodelled bias. Across synthetic and semi-synthetic benchmarks, GP-CATE attains calibrated coverage where the estimators we compare against, including Causal Forest and BART, do not, at the cost of intervals that are appropriately wide when the data are uninformative.

2605.27466 2026-05-28 cs.MA cs.AI cs.LG stat.ML

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

AgensFlow:多智能体系统的协调策略基础

Nicole Koenigstein

AI总结 提出AgensFlow框架,将多智能体协调视为在线策略学习问题,通过可学习路由优化协调流程,在分布式系统事件和安全咨询任务上验证了其优于固定管道基线。

详情
Comments
7 pages, 4 figures, 4 tables. Code and reproducible evaluations available at: https://github.com/Nicolepcx/AgensFlow
AI中文摘要

基于大语言模型(LLM)构建的多智能体系统需要许多难以先验固定的协调选择:调用哪个技能协议、哪个智能体角色应执行子任务、每个角色绑定哪个模型、角色之间如何交互、何时使用检索或验证,以及何时完全省略某个步骤。这些选择与任务机制和操作约束相互影响,因此静态管道和一次性模型比较只能提供设计空间的有限视角。本文介绍AgensFlow,一个开源框架,将多智能体协调视为部分可观测下的在线策略学习问题。该框架使协调决策可观测且可从重复轨迹中学习,而不是将技能、角色、模型、拓扑和评估选择视为固定的管道设计。AgensFlow在两个语料库上进行了评估:分布式系统事件任务和安全咨询任务。评估展示了三个主要结果:在协调密集型任务上,学习路由比固定管道基线达到更高质量的操作点;skip:X将拓扑压缩隔离为基础的有意义部分;热启动策略图可以在保持平台质量的同时减少探索成本。总体而言,结果支持学习型可审计路由可以改善静态布线下的协调密集型多智能体工作流。

英文摘要

Multi-agent systems built on large language models (LLMs) require many coordination choices that are difficult to fix a priori: which skill protocol to invoke, which agent role should perform a subtask, which model to bind to each role, how roles should interact, when to use retrieval or verification, and when to omit a step entirely. These choices interact with task regime and operational constraints, so static pipelines and one-off model comparisons provide only a limited view of the design space. This paper introduces AgensFlow, an open-source framework that treats multi-agent coordination as an online policy-learning problem under partial observability. The framework makes coordination decisions observable and learnable from repeated trajectories, rather than treating skill, role, model, topology, and evaluation choices as fixed pipeline design. AgensFlow is evaluated on two corpora: distributed-systems incident tasks and security-advisory tasks. The evaluation shows three main results: learned routing reaches a higher-quality operating point than a fixed pipeline baseline on coordination-heavy classes; skip:X isolates topology compression as a meaningful part of the substrate; and warm-started policy graphs can reduce exploration cost while preserving plateau quality. Overall, the results support that learned, auditable routing can improve coordination-heavy multi-agent workflows over static wiring.

2605.27463 2026-05-28 stat.ME cs.AI stat.AP

When prompt perturbations break your A/B test: A valid statistical test for generative surveying

当提示扰动破坏你的A/B测试:一种用于生成式调查的有效统计检验

Hayden Helm, Carey Priebe

AI总结 针对生成式调查中LLM对提示设计敏感的问题,提出一种置换检验方法,在包含扰动结构的统计模型下保持有效性,并给出预算分配建议。

详情
AI中文摘要

生成式调查——利用基于LLM的角色集合对消息提供反馈——已成为传统市场研究的廉价且可扩展的替代方案。然而,LLM对提示设计中的微小变化很敏感,从生成式调查中得出的结论可能依赖于任意的措辞选择。控制这种敏感性需要在分析中包含语义等价的扰动。在本文中,我们表明,在包含现实扰动结构的生成式调查统计模型下,标准假设检验(包括符号检验和Wilcoxon符号秩检验)是无效的。我们提出了一种在该模型下有效的置换检验,并正式刻画了标准检验失效的条件。将我们的框架应用于一个简单的生成式调查问题,我们估计了相关参数,刻画了置换检验在现实条件下的功效,并提供了关于在角色、扰动和重复之间分配预算的实用指导。最后,我们表明,即使在同一个模型家族内,估计效应的大小和方向都对模型选择敏感。

英文摘要

Generative surveying, where collections of LLM-based personas provide feedback on messages, has emerged as a cheap and scalable alternative to traditional market research. However, LLMs are sensitive to small variations in prompt design, and conclusions drawn from generative surveys may depend on arbitrary phrasing choices. Controlling for this sensitivity requires including semantically equivalent perturbations in the analysis. In this paper, we show that standard hypothesis tests, including the sign test and Wilcoxon signed-rank test, are invalid under a statistical model for generative surveying that includes realistic perturbation structure. We propose a permutation test that is valid under this model and formally characterize the conditions under which standard tests fail. Applying our framework to a simple generative surveying problem, we estimate relevant parameters, characterize the power of the permutation test under realistic conditions, and provide practical guidance on budget allocation across personas, perturbations, and replicates. Finally, we show that both the magnitude and direction of the estimated effect are sensitive to the choice of model, even within the same model family.
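One generic way to respect perturbation structure is to treat each prompt perturbation as a cluster and permute at the cluster level instead of treating every response as independent. The sketch below is an exact cluster-level sign-flip test; it is not the authors' procedure, and the choice of the perturbation as the exchangeable unit is an assumption made for illustration.

```python
from itertools import product


def cluster_sign_flip_pvalue(diffs_by_cluster):
    """Exact two-sided sign-flip test on cluster means.

    diffs_by_cluster: list of lists; each inner list holds paired A-minus-B
    differences collected under one prompt perturbation (assumed unit).
    Valid if, under the null, each cluster mean is symmetric about zero.
    """
    means = [sum(d) / len(d) for d in diffs_by_cluster]
    observed = abs(sum(means))
    hits = 0
    total = 0
    # Enumerate all 2^k sign assignments, feasible for a handful of clusters.
    for signs in product((1, -1), repeat=len(means)):
        total += 1
        stat = abs(sum(s * m for s, m in zip(signs, means)))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return hits / total


# Five perturbations, each favouring message A on average.
p_value = cluster_sign_flip_pvalue(
    [[1.0, 1.0], [2.0], [1.5, 0.5], [1.0], [3.0, 1.0]]
)
```

With only five clusters the smallest attainable two-sided p-value is 2/32, which shows concretely why the number of perturbations, and not only the number of personas or replicates, limits what such a test can detect.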

2605.25567 2026-05-28 stat.ML cs.LG

Rao-Blackwellized Score Matching on Manifolds

流形上的 Rao-Blackwellized 得分匹配

Divit Rawal

AI总结 针对潜分布支撑在光滑嵌入流形上的去噪得分匹配,提出通过最近点投影条件期望消除奇异性的 Rao-Blackwellized 方法,并推导出小噪声展开下与内在黎曼得分相差显式 σ² 修正的规范目标。

详情
Comments
22 pages, 3 figures; SPIGM @ ICML 2026
AI中文摘要

我们研究了当潜分布支撑在光滑嵌入流形 $M \subset \mathbb{R}^D$ 上时的去噪得分匹配(DSM)。在环境高斯噪声下,切向去噪目标包含一个奇异的法向纤维噪声通道,其方差在 $\sigma \to 0^+$ 时发散为 $d/\sigma^2$。我们证明,对最近点投影 $\pi(X)$ 取条件可以规范地消除这一奇异性:所得的条件期望是所有仅依赖于投影观测 $\pi(X)$ 的估计量中切向 DSM 目标的唯一 $L^2$ 最优 Rao-Blackwellized 预测器。然后我们计算了这一规范目标的小噪声展开,并证明它等于内在黎曼得分,相差一个显式的 $\sigma^2$ 阶修正,该修正分解为一个内在的 Tweedie 项和一个涉及 Weingarten 和 Ricci 算子的外在曲率项。在平坦情形下,该构造精确退化为普通的低维高斯 DSM,而在 $S^d$ 上,外在修正简化为标量因子 $(1-d/2)\nabla_M \log q$;在 $S^2$ 上,这一外在 $\sigma^2$ 修正恒为零,但内在的 Tweedie 项仍然存在。

英文摘要

We study denoising score matching (DSM) when the latent distribution is supported on a smooth embedded manifold $M \subset \mathbb{R}^D$. Under ambient Gaussian corruption, the tangent denoising target contains a singular normal-fiber noise channel whose variance diverges as $d/σ^2$ as $σ\to 0^+$. We show that conditioning on the nearest-point projection $π(X)$ canonically removes this singularity: the resulting conditional expectation is the unique $L^2$-optimal Rao-Blackwellized predictor of the tangent DSM target among all estimators depending only on the projected observation $π(X)$. We then compute the small-noise expansion of this canonical target and show that it equals the intrinsic Riemannian score up to an explicit order-$σ^2$ correction that decomposes into an intrinsic Tweedie term and an extrinsic curvature term involving the Weingarten and Ricci operators. In the flat case, the construction reduces exactly to ordinary lower-dimensional Gaussian DSM, while on $S^d$ the extrinsic correction simplifies to the scalar factor $(1-d/2)\nabla_M \log q$; this extrinsic $σ^2$ correction cancels identically on $S^2$, though the intrinsic Tweedie term remains.
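The variance-reduction step behind the conditioning argument can be written out with the standard Rao-Blackwell identity. Here $T$ is a placeholder symbol (not the paper's notation) for the tangent DSM target, and the display only restates the law of total variance and the $L^2$-projection property of conditional expectation; it does not reproduce the paper's curvature expansion.

$$
\mathbb{E}\,\lVert T - \mathbb{E}[T] \rVert^2
= \mathbb{E}\,\bigl\lVert \mathbb{E}[T \mid \pi(X)] - \mathbb{E}[T] \bigr\rVert^2
+ \mathbb{E}\,\bigl\lVert T - \mathbb{E}[T \mid \pi(X)] \bigr\rVert^2
\;\ge\; \mathbb{E}\,\bigl\lVert \mathbb{E}[T \mid \pi(X)] - \mathbb{E}[T] \bigr\rVert^2,
$$

$$
\mathbb{E}[T \mid \pi(X)] \;=\; \arg\min_{g}\; \mathbb{E}\,\bigl\lVert T - g(\pi(X)) \bigr\rVert^2 .
$$

The second term on the right of the first line is the part of the variance that conditioning on $\pi(X)$ discards, which is where the diverging normal-fiber contribution described in the abstract would sit.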

2605.24666 2026-05-28 math.DS cs.NA math.NA stat.ML

Finding Koopman Invariant Subspaces via Personalized PageRank

通过个性化PageRank寻找Koopman不变子空间

Hyukpyo Hong, Qin Li, Matthew J. Colbrook, Hanbaek Lyu

AI总结 利用扩展动态模态分解(EDMD)矩阵的零块结构,通过个性化PageRank检测Koopman不变子空间,实现紧凑可解释字典的自动选择,并提供有限样本保证。

详情
Comments
37 pages, 9 figures
AI中文摘要

选择一组其张成空间是Koopman不变的有限观测字典是数据驱动Koopman算子逼近中的核心挑战。我们通过利用扩展动态模态分解(EDMD)矩阵中的零块结构来解决这个问题。我们证明,任何其张成空间是Koopman不变的子字典都会在EDMD矩阵中诱导出一个精确的零块,即使对于有限数据也是如此。然后我们证明,通过将PageRank应用于从大型初始字典构建的行归一化EDMD矩阵,可以检测到这样的块。该理论扩展到近似不变子空间,并且当种子观测位于目标块内并到达该块中的所有观测时,为个性化PageRank(PPR)提供更强的保证。将EDMD浓度界与PageRank扰动理论相结合,给出了端到端的检测保证,具有$O(1/\sqrt{M})$的有限样本缩放和显式常数。更一般地,在不假设存在不变子空间的情况下,子字典上的高PPR质量控制了从种子观测出发的折扣多步泄漏。在Duffing振荡器、Van der Pol振荡器、Lorenz系统和三阱Ramachandran势上的数值实验表明,该方法能够识别出紧凑、可解释的字典,并具有准确的预测能力。

英文摘要

Selecting a finite dictionary of observables whose span is Koopman-invariant is a central challenge in data-driven Koopman operator approximation. We address this problem by exploiting zero-block structure in Extended Dynamic Mode Decomposition (EDMD) matrices. We show that any sub-dictionary whose span is Koopman-invariant induces an exact zero block in the EDMD matrix, even for finite data. We then show that such blocks can be detected by applying PageRank to a row-normalized EDMD matrix constructed from a large initial dictionary. The theory extends to approximately invariant subspaces and yields stronger guarantees for personalized PageRank (PPR) when the seed observables lie inside the target block and reach all observables in that block. Combining EDMD concentration bounds with PageRank perturbation theory gives end-to-end detection guarantees with $O(1/\sqrt{M})$ finite-sample scaling and explicit constants. More generally, without assuming an invariant subspace exists, high PPR mass on a sub-dictionary controls discounted multi-step leakage from the seed observables. Numerical experiments on the Duffing oscillator, Van der Pol oscillator, Lorenz system, and a three-well Ramachandran potential suggest that the method identifies compact, interpretable dictionaries with accurate predictions.
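The detection idea can be sketched with a plain power iteration for personalized PageRank on a row-normalized matrix of absolute values. The orientation of the matrix (row i describing how observable i is mapped onto the dictionary), the damping value `alpha`, and the toy matrix are all assumptions made for illustration; the paper's exact construction may differ.

```python
def row_normalize(K):
    """Turn |K| into a row-stochastic transition matrix."""
    P = []
    for row in K:
        mags = [abs(x) for x in row]
        total = sum(mags)
        if total > 0:
            P.append([m / total for m in mags])
        else:
            # Fallback for an all-zero row: jump uniformly (an assumption).
            P.append([1.0 / len(row)] * len(row))
    return P


def personalized_pagerank(P, seed, alpha=0.15, iters=200):
    """Power iteration for p = alpha * s + (1 - alpha) * p P."""
    n = len(P)
    s = [0.0] * n
    for i in seed:
        s[i] = 1.0 / len(seed)
    p = s[:]
    for _ in range(iters):
        p = [
            alpha * s[j] + (1 - alpha) * sum(p[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
    return p


# Toy matrix: observables 0 and 1 form a closed block (zeros in columns 2, 3).
K = [
    [0.9, 0.1, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.3, 0.0, 0.5, 0.2],
    [0.0, 0.1, 0.4, 0.5],
]
ppr = personalized_pagerank(row_normalize(K), seed=[0])
```

Seeding inside the closed block leaves no PPR mass on observables 2 and 3 in this toy case, so thresholding the PPR vector recovers the block `{0, 1}`.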

2605.20635 2026-05-28 cs.LG math.ST stat.ML stat.TH

The General Theory of Localization Methods

局部化方法的一般理论

Congwei Song

AI总结 本文提出一种基于局部化核和局部均值的通用机器学习框架——局部化方法,系统揭示其与多种现有模型(如核方法、MeanShift、Transformer等)的联系,并展示其统一和泛化现代架构的能力。

详情
Comments
correct some math expressions
AI中文摘要

本文提出一种称为局部化方法的通用机器学习框架,该框架从根本上建立在两个核心概念之上:局部化核和局部均值——这些是支撑自注意力机制的关键组成部分。为了建立严格的理论基础,该框架通过两个基本支柱正式定义:局部(化)模型的公式化和局部化技巧。我们系统地研究了局部化方法与广泛现有机器学习模型/方法之间的联系,包括(但不限于)核方法、惰性学习、MeanShift算法、松弛标记、Hopfield网络、局部线性嵌入(LLE)、模糊推理和去噪自编码器(DAEs)。通过剖析这些关系,我们阐明了局部化方法更广泛的理论意义,并展示了其在各种机器学习任务中的实际适用性。此外,我们探讨了该框架的高级扩展,如自适应核、层次局部模型和非局部模型。值得注意的是,我们展示了Transformer——现代序列建模的基石——可以使用层次局部模型构建,揭示了局部化方法统一和泛化最先进架构的能力。这项工作不仅提供了重新解释现有模型的统一理论视角,还为设计灵活、数据自适应的学习系统提供了新的方法论工具。

英文摘要

This paper proposes a general machine learning framework called the localization method, which is fundamentally built on two core concepts: localization kernels and local means -- key components that underpin the self-attention mechanism. To establish a rigorous theoretical foundation, the framework is formally defined through two essential pillars: the formulation of the local(-ized) model and the localization trick. We systematically investigate the connections between the localization method and a wide range of existing machine learning models/methods, including (but not limited to) kernel methods, lazy learning, the MeanShift algorithm, relaxation labeling, Hopfield networks, local linear embedding (LLE), fuzzy inference, and denoising autoencoders (DAEs). By dissecting these relationships, we clarify the broader theoretical significance of the localization method and demonstrate its practical applicability across diverse machine learning tasks. Furthermore, we explore advanced extensions of the framework, such as adaptive kernels, hierarchical local models, and non-local models. Notably, we show that the Transformer -- a cornerstone of modern sequence modeling -- can be constructed using hierarchical local models, revealing the ability of the localization method to unify and generalize state-of-the-art architectures. This work not only provides a unified theoretical lens to reinterpret existing models but also offers new methodological tools for designing flexible, data-adaptive learning systems.
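
The link between local means and self-attention mentioned above can be made concrete with a small sketch. It assumes a Gaussian localization kernel and a hypothetical `bandwidth` parameter; the paper's own definitions may differ. Under that assumption, the kernel-weighted local mean coincides with softmax attention whose scores are negative scaled squared distances.

```python
import numpy as np

def local_mean(query, keys, values, bandwidth=1.0):
    """Kernel-weighted mean of `values` around `query` (Gaussian kernel, assumed)."""
    sq_dist = np.sum((keys - query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    w = w / w.sum()                      # normalize the kernel weights
    return w @ values, w

def softmax_attention(query, keys, values, bandwidth=1.0):
    """Softmax attention with scores -||q - k||^2 / (2 h^2)."""
    scores = -np.sum((keys - query) ** 2, axis=1) / (2.0 * bandwidth ** 2)
    scores = scores - scores.max()       # shift for numerical stability
    w = np.exp(scores) / np.exp(scores).sum()
    return w @ values, w

keys = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
values = np.array([[1.0], [2.0], [3.0]])
query = np.array([0.5, 0.0])

mean_out, mean_w = local_mean(query, keys, values)
attn_out, attn_w = softmax_attention(query, keys, values)
```

The two functions return the same weights and the same output, which is the elementary sense in which a local mean is an attention read-out.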

2605.23419 2026-05-28 stat.ME eess.SP math.ST stat.TH

Generalized Stochastic Approximation of the Log-Likelihood Ratio for Robust Sequential Change-Point Detection

鲁棒序列变点检测的对数似然比的广义随机逼近

Serhii Zabolotnii

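AI summary: For sequential change-point detection in non-Gaussian stochastic processes, the paper proposes a framework that approximates the log-likelihood ratio on a generalized stochastic basis (polynomial, logarithmic, or fractional-power), using only moments up to order 3s and no analytic form of the distribution. The approximation order is selected through a projection of the Kullback-Leibler divergence, the false-alarm rate is controlled with Kunchenko's probability-error bound, and the method outperforms classical procedures on extremely heavy-tailed data.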

详情
Comments
68 pages, 7 figures. Companion code, Monte Carlo experiments, and Lean 4 formal proofs of the core theorems: https://github.com/SZabolotnii/KuYuPe-Change_Point-code-supplement
AI中文摘要

非高斯随机过程中的序列变点检测具有挑战性,因为底层密度在实时中很少已知。经典的参数化方法如CUSUM在分布失配时失去最优性,而非参数替代方法通常反应缓慢。我们开发了一个统一框架,在广义随机基(多项式、对数或分数幂)上逼近对数似然比(LLR),仅使用高达3s阶矩,无需分布的解析形式,从而将经典的CUSUM、GRSh和SRP方法适应于非高斯数据。收敛泛函J(s) = K^T Y被解释为Kullback-Leibler散度在基张成空间上的投影,为选择逼近阶数提供了形式化准则。我们针对小相对变点的场景,其中信号能量变化很小,但分布的形状(尾部结构和模态)发生变化。一个鲁棒的阈值来自Kunchenko的概率误差界(KU-PE),无需经验调参即可控制虚警率。在四个领域的九个公开基准上,据我们所知,该方法是唯一能在极端重尾数据(超额峰度γ_4 > 20)上运行的方法,而经典方法会产生100%的虚警,同时在保证虚警水平下减少检测延迟。核心定理已在Lean 4中正式验证。

英文摘要

Sequential change-point detection in non-Gaussian stochastic processes is challenging because the underlying densities are rarely known in real time. Classical parametric procedures such as CUSUM lose optimality under distributional mismatch, whereas nonparametric alternatives often react slowly. We develop a unified framework that approximates the log-likelihood ratio (LLR) on a generalized stochastic basis -- polynomial, logarithmic, or fractional-power -- using only moments up to order 3s, with no analytic form of the distribution, and thereby adapts the classical CUSUM, GRSh, and SRP procedures to non-Gaussian data. The convergence functional J(s) = K^T Y is interpreted as the projection of the Kullback-Leibler divergence onto the basis span, yielding a formal criterion for selecting the approximation order. We target the regime of small relative change-points, where the signal energy changes little but the shape of the distribution -- tail structure and modality -- does. A robust threshold follows from Kunchenko's probability-error bound (KU-PE), which controls the false-alarm rate without empirical tuning. On nine public benchmarks across four domains, the method is, to our knowledge, the only one operative on extremely heavy-tailed data (excess kurtosis gamma_4 > 20), where classical methods produce 100% false alarms, while reducing the detection delay at a guaranteed false-alarm level. The core theorems are formally verified in Lean 4.
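
For readers unfamiliar with the classical procedure that the abstract adapts, here is a minimal sketch of the textbook CUSUM recursion for a Gaussian mean shift with known parameters. It is the baseline only, not the paper's moment-based approximation; the function name and the toy data are illustrative assumptions.

```python
def cusum_alarm(xs, mu0, mu1, sigma, threshold):
    """Return the first index at which the CUSUM statistic reaches `threshold`, else None."""
    stat = 0.0
    for t, x in enumerate(xs):
        # Exact log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) at x.
        llr = (mu1 - mu0) / sigma ** 2 * (x - 0.5 * (mu0 + mu1))
        stat = max(0.0, stat + llr)      # reset at zero, accumulate evidence otherwise
        if stat >= threshold:
            return t
    return None

# Noise-free toy stream: the mean jumps from 0 to 1 at index 50.
xs = [0.0] * 50 + [1.0] * 50
alarm = cusum_alarm(xs, mu0=0.0, mu1=1.0, sigma=1.0, threshold=3.0)
```

Here each post-change sample adds 0.5 to the statistic, so the alarm fires at index 55, a delay of six samples. When the density is unknown, the exact `llr` line is the piece that an approximation such as the one described in the abstract would replace.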

2605.03517 2026-05-28 cs.LG stat.ML

Understanding Self-Supervised Learning via Latent Distribution Matching

通过潜在分布匹配理解自监督学习

Fabian A Mikulasch, Friedemann Zenke

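AI summary: The paper formalizes self-supervised learning as latent distribution matching (LDM), which maximizes the log-probability (alignment) and the entropy (uniformity) of latent representations; this view unifies many SSL methods and yields a nonlinear, sampling-free Bayesian filtering model for high-dimensional time series.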

详情
Comments
Accepted to ICML 2026 (Spotlight)
AI中文摘要

自监督学习(SSL)擅长从复杂数据中学习通用潜在表示,但缺乏统一的理论框架来解释现有各种方法并指导新方法的设计。我们将SSL视为潜在分布匹配(LDM):学习表示以最大化其在假设潜在模型下的对数概率(对齐),同时最大化潜在熵以防止坍塌(均匀性)。这一观点将独立成分分析与对比、非对比和预测性SSL方法(包括停止梯度方法)统一起来。利用LDM,我们推导出一个非线性的、无采样的贝叶斯滤波模型,其中包含基于卡尔曼的预测器,用于高维时间序列。我们进一步证明,在温和假设下,即使使用非线性预测器,预测性LDM也能产生可识别的潜在表示。总体而言,LDM阐明了现有SSL方法背后的假设,并为开发新方法提供了原则性指导。

英文摘要

Self-supervised learning (SSL) excels at finding general-purpose latent representations from complex data, yet lacks a unifying theoretical framework that explains the diverse existing methods and guides the design of new ones. We cast SSL as latent distribution matching (LDM): learning representations that maximize their log-probability under an assumed latent model (alignment), while maximizing latent entropy to prevent collapse (uniformity). This view unifies independent component analysis with contrastive, non-contrastive, and predictive SSL methods, including stop-gradient approaches. Leveraging LDM, we derive a nonlinear, sampling-free Bayesian filtering model with a Kalman-based predictor for high-dimensional time series. We further prove that predictive LDM yields identifiable latent representations under mild assumptions, even with nonlinear predictors. Overall, LDM clarifies the assumptions behind established SSL methods and provides principled guidance for developing new approaches.

2512.21075 2026-05-28 cs.LG cs.AI math.PR stat.ML

Feature Learning Dynamics in Infinite-Depth Neural Networks

无限深度神经网络中的特征学习动力学

Zihan Yao, Ruoyu Wu, Tianxiang Gao

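AI summary: The paper studies the forward-backward coupling caused by weight reuse in one-layer ResNets under depth-μP scaling, proves that the coupling vanishes with width at initialization but produces a nontrivial correlation term during training, and derives the Neural Feature Dynamics (NFD) SDE system in the infinite-depth limit.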

详情
AI中文摘要

深度神经网络在实践中取得了显著成功,但对训练过程中特征如何演化的机制理解仍不完整,尤其是在大深度极限下。对于深度-μP缩放下的ResNet,先前工作将层索引ℓ视为连续时间t_ℓ = ℓ/L,得到训练动力学的SDE描述。一个关键未解决问题是,反向传播通过其转置W_ℓ^⊤重用每个前向权重矩阵W_ℓ,在前向特征和反向梯度之间产生相关性,其行为和特征学习中的作用尚不清楚。我们研究了深度-μP下单层ResNet中这种重用权重的前向-后向耦合。使用条件高斯表示,我们在取任何网络极限之前,显式地将权重重用引起的耦合项与解耦的高斯波动分开。在初始化时,我们证明耦合是有限宽度效应,并以O(n^{-1})的速率随深度一致消失。然而,在训练期间,SGD引入了一个非平凡的前向-后向相关项,该项在无限宽度极限下仍然存在。关键的深度效应是,在深度-μP缩放下,这个幸存项在深度上是高阶的,并且随着L→∞,其在层上的累积贡献变得可忽略。这种深度诱导的抑制促使了神经特征动力学(NFD),一个具有解耦后向权重的向前-向后SDE系统,它保留了训练期间生成的特征-梯度协方差结构。在非退化假设下,我们证明有限网络训练动力学收敛到其NFD极限,深度离散化误差为O(L^{-1}),而重用权重耦合项具有更快的O(L^{-2})衰减。这些结果为深度-μP下单层ResNet的特征学习动力学提供了严格的无限深度极限。

英文摘要

Deep neural networks have achieved remarkable success in practice, yet a mechanistic understanding of how features evolve during training remains incomplete, especially in the large-depth limit. For ResNets under depth-$μ$P scaling, prior work treats the layer index $\ell$ as a continuous time $t_\ell = \ell/L$, yielding SDE descriptions of the training dynamics. A key unresolved issue is that backpropagation reuses each forward weight matrix $W_\ell$ through its transpose $W_\ell^\top$, creating correlations between forward features and backward gradients whose behavior and role in feature learning remain unclear. We study this reused-weight forward--backward coupling in one-layer ResNets under depth-$μ$P. Using conditional Gaussian representations, we explicitly separate the coupling terms induced by weight reuse from decoupled Gaussian fluctuations before taking any network limit. At initialization, we prove that the coupling is a finite-width effect and vanishes at rate $O(n^{-1})$, uniformly over depth. During training, however, SGD induces a nontrivial forward--backward correlation term that survives the infinite-width limit. The key depth effect is that, under depth-$μ$P scaling, this surviving term is higher order in depth and its accumulated contribution over layers becomes negligible as $L\to\infty$. This depth-induced suppression motivates Neural Feature Dynamics (NFD), a forward--backward SDE system with decoupled backward weights that retains the feature-gradient covariance structure generated during training. Under nondegeneracy assumptions, we prove that the finite-network training dynamics converge to their NFD limit with an $O(L^{-1})$ depth-discretization error, while the reused-weight coupling term has a faster $O(L^{-2})$ decay. These results provide a rigorous infinite-depth limit for the feature-learning dynamics of one-layer ResNets under depth-$μ$P.

2605.11755 2026-05-28 cs.LG cs.CV stat.ML

One-Step Generative Modeling via Wasserstein Gradient Flows

通过Wasserstein梯度流的一步生成建模

Jiaqi Han, Puheng Li, Qiushan Guo, Renyuan Xu, Stefano Ermon, Emmanuel J. Candès

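AI summary: The paper proposes W-Flow, a framework that uses a Wasserstein gradient flow to compress the evolution from a reference distribution to a target distribution into one-step generation, instantiated with the Sinkhorn divergence for an efficient optimal-transport-based update; it reaches 1.29 FID on ImageNet 256×256 with roughly 100× faster sampling.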

详情
Comments
40 pages, 14 figures
AI中文摘要

扩散模型和基于流的方法展现了令人印象深刻的生成能力,尤其对于图像,但其采样成本高昂,因为需要多次迭代更新。我们引入了W-Flow,一个训练生成器的框架,该生成器在单步中将来自简单参考分布的样本转换为来自目标数据分布的样本。这通过两步实现:首先,通过最小化能量泛函的Wasserstein梯度流,定义从参考分布到目标分布的演化;其次,训练一个静态神经生成器将此演化压缩为一步生成。我们用Sinkhorn散度实例化能量泛函,该散度产生一种高效的基于最优传输的更新规则,捕获全局分布差异并改善目标分布的覆盖。我们进一步证明了在适当假设下,有限样本训练动力学收敛到连续时间分布动力学。实验上,W-Flow为一步ImageNet 256×256生成设立了新的最先进水平,实现了1.29 FID,并改善了模式覆盖和域迁移。与具有相似FID分数的多步扩散模型相比,我们的方法实现了约100倍的采样加速。这些结果表明,Wasserstein梯度流为快速且高保真的生成建模提供了原则性和有效的基础。

英文摘要

Diffusion models and flow-based methods have shown impressive generative capability, especially for images, but their sampling is expensive because it requires many iterative updates. We introduce W-Flow, a framework for training a generator that transforms samples from a simple reference distribution into samples from a target data distribution in a single step. This is achieved in two steps: we first define an evolution from the reference distribution to the target distribution through a Wasserstein gradient flow that minimizes an energy functional; second, we train a static neural generator to compress this evolution into one-step generation. We instantiate the energy functional with the Sinkhorn divergence, which yields an efficient optimal-transport-based update rule that captures global distributional discrepancy and improves coverage of the target distribution. We further prove that the finite-sample training dynamics converge to the continuous-time distributional dynamics under suitable assumptions. Empirically, W-Flow sets a new state of the art for one-step ImageNet 256$\times$256 generation, achieving 1.29 FID, with improved mode coverage and domain transfer. Compared to multi-step diffusion models with similar FID scores, our method yields approximately 100$\times$ faster sampling. These results show that Wasserstein gradient flows provide a principled and effective foundation for fast and high-fidelity generative modeling.
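
The Sinkhorn divergence named above can be sketched in a few lines of NumPy. This is a generic log-domain implementation of the standard debiased definition (entropic OT between the two clouds minus half of each self-term), not the W-Flow training code; the squared Euclidean cost, uniform weights, `eps`, and the iteration count are all assumptions made for illustration.

```python
import numpy as np

def _logsumexp(m, axis):
    """Numerically stable log-sum-exp along one axis."""
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)

def entropic_ot(x, y, eps=0.5, iters=300):
    """Dual value of entropic OT between uniform point clouds x (n, d) and y (m, d)."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
    log_a = np.full(len(x), -np.log(len(x)))
    log_b = np.full(len(y), -np.log(len(y)))
    f = np.zeros(len(x))
    g = np.zeros(len(y))
    for _ in range(iters):
        # Alternating updates of the dual potentials (Sinkhorn iterations).
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    return np.exp(log_a) @ f + np.exp(log_b) @ g

def sinkhorn_divergence(x, y, eps=0.5):
    """Debiased divergence: OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    return entropic_ot(x, y, eps) - 0.5 * entropic_ot(x, x, eps) - 0.5 * entropic_ot(y, y, eps)

x = np.array([[0.0], [1.0], [2.0]])
y = x + 3.0                         # same cloud shifted by 3
d_same = sinkhorn_divergence(x, x)
d_shift = sinkhorn_divergence(x, y)
```

Subtracting the self-terms removes the entropic bias, so the divergence vanishes for identical clouds and is positive for the shifted one. A gradient of this quantity with respect to the sample positions is what would drive a particle-style update.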

2511.10132 2026-05-28 math.PR math.ST stat.TH

Hawkes autoregressive processes: a new model for multiscale and heterogeneous processes

霍克斯自回归过程:一种用于多尺度与异质过程的新模型

Théo Leblanc

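AI summary: The paper introduces the Hawkes autoregressive (HAR) model, which combines continuous-time and discrete-time dynamics, and proves probabilistic properties including stationarity, a cluster representation, stability, and ergodicity.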

详情
Comments
As suggested by the anonymous referee, we decided to cut the paper in half. The paper now focuses only on the probabilistic study of HAR processes, on which the statistical study fundamentally relies. The statistical analysis, which builds upon these probabilistic results, is postponed to a separate paper. Some results have also been improved
AI中文摘要

霍克斯过程和自回归过程都依赖于其过去的线性泛函,同时对不同类型的数据进行建模。由于同一现象观测产生的数据集可能是异质的,并且以不同的时间尺度采样,因此研究多尺度和异质过程(例如通过结合霍克斯和自回归动力学获得的过程)是自然的。在本文中,我们引入了这种新的霍克斯自回归(HAR)模型,它融合了连续时间和离散时间动力学,并建立了若干概率结果,包括平稳版本的存在性、聚类表示,以及稳定性和遍历性性质。

英文摘要

Both Hawkes processes and autoregressive processes rely on linear functionals of their past, while modeling different types of data. Since datasets arising from observations of the same phenomenon may be heterogeneous and sampled at different time scales, it is natural to study multiscale and heterogeneous processes, such as those obtained by combining Hawkes and autoregressive dynamics. In this paper, we introduce this new Hawkes autoregressive (HAR) model incorporating both continuous- and discrete-time dynamics, and establish several probabilistic results, including the existence of a stationary version, a cluster representation, as well as stability and ergodic properties.
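
The phrase "linear functionals of their past" can be spelled out with the two textbook definitions. The block below is only a reminder of the standard linear Hawkes intensity and the AR(p) recursion; the symbols ($\mu$, $h$, $c$, $a_i$, $\varepsilon_n$) are generic notation assumed here, and the block does not reproduce the paper's HAR definition.

```latex
% Linear Hawkes process: the conditional intensity is a linear functional of past events T_k.
\lambda(t) \;=\; \mu + \int_{(-\infty,\,t)} h(t-s)\,\mathrm{d}N_s
          \;=\; \mu + \sum_{T_k < t} h(t - T_k)

% AR(p) process: the current value is a linear functional of past values plus noise.
X_n \;=\; c + \sum_{i=1}^{p} a_i\, X_{n-i} + \varepsilon_n
```

In both cases the past enters linearly, through a kernel $h$ in continuous time and through coefficients $a_i$ in discrete time, which is what makes a combined model natural to consider.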

2605.09986 2026-05-28 stat.ML cs.CL cs.LG

Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage

带宽预算下的联邦语言模型:蒸馏率与共形覆盖

Prasanjit Dubey, Xiaoming Huo

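AI summary: The paper studies statistical guarantees for language models distributed across bandwidth-limited nodes. It presents two protocols, Federated Probe-Logit Distillation (FPLD) and Federated Conformal RAG (FC-RAG), and gives a training-time KL-consistency rate and an inference-time distribution-free marginal-coverage bound, respectively, treating bandwidth as a first-class statistical parameter for the first time.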

详情
AI中文摘要

在临床网络、企业知识库和科学联盟中,经常出现数据分散在带宽受限节点上且无法集中的场景,需要训练语言模型。我们研究数据必须保持分布式在节点上的情况,并询问在明确带宽预算下原则上可以实现哪些统计保证;我们的目标是描述可证明的可能性,而不是展示一个可部署的系统。现有理论要么单独处理训练时的一致性,要么单独处理推理时的校准,且没有先前的工作将带宽作为一阶统计参数。我们分析了两种协议:用于训练的联邦探针-对数蒸馏(FPLD)和用于推理的联邦共形RAG(FC-RAG),作为我们结果的分析载体。我们的第一个主要结果是FPLD的显式高概率KL一致性率,同时依赖于节点数$K$、每节点样本量$n$、量化预算$B$、探针集大小$m$和词汇量$V$;带宽仅通过指数衰减的量化项进入。我们的第二个主要结果是FC-RAG的无分布边际覆盖界,其新颖的检索带宽松弛量$\Delta_{\mathrm{RAG}} = f_{\max}\sqrt{K^{-2}\sum_i v(B_i)}$使每节点检索带宽成为一阶统计参数,在每节点均匀情况下,通过$K$个节点的算术聚合使松弛量以$K^{-1/2}$的速度缩小。一个Pinsker型推论将两个界组合成端到端的覆盖保证。合成实验验证了沿界参数的预测缩放;在GPT-2测试平台上的小规模实验表明,定性带宽-准确率权衡在真实语言模型上仍然存在。部署规模的实证评估超出范围。

英文摘要

Training a language model on data scattered across bandwidth-limited nodes that cannot be centralized is a setting that arises in clinical networks, enterprise knowledge bases, and scientific consortia. We study the regime in which data must remain distributed across nodes, and ask what statistical guarantees are in principle achievable under explicit bandwidth budgets; we aim to characterize what is provably possible, not to demonstrate a deployment-ready system. Existing theory treats either training-time consistency or inference-time calibration in isolation, and no prior work makes bandwidth a first-class statistical parameter. We analyze two protocols, Federated Probe-Logit Distillation (FPLD) for training and Federated Conformal RAG (FC-RAG) for inference, as the analytical vehicles for our results. Our first main result is an explicit high-probability KL-consistency rate for FPLD with simultaneous dependence on node count $K$, per-node sample size $n$, quantization budget $B$, probe-set size $m$, and vocabulary size $V$; bandwidth enters only through an exponentially vanishing quantization term. Our second main result is a distribution-free marginal-coverage bound for FC-RAG, whose novel retrieval-bandwidth slack $Δ_{\mathrm{RAG}} = f_{\max}\sqrt{K^{-2}\sum_i v(B_i)}$ makes per-node retrieval bandwidth a first-class statistical parameter, with arithmetic aggregation across $K$ nodes shrinking the slack as $K^{-1/2}$ in the per-node-uniform regime. A Pinsker-type corollary composes the two bounds into an end-to-end coverage guarantee. Synthetic experiments verify the predicted scaling along the bounds' parameters; small-scale experiments on a GPT-2 testbed illustrate that the qualitative bandwidth-accuracy tradeoff survives on a real language model. A deployment-scale empirical evaluation is out of scope.
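
The $K^{-1/2}$ scaling quoted above follows directly from the slack formula once every node has the same budget. The short derivation below assumes the sum runs over the $K$ nodes and writes $v(B)$ for the common per-node value; both are readings of the abstract, not statements taken from the paper's proofs.

```latex
% Per-node-uniform regime: v(B_i) = v(B) for i = 1, ..., K.
\Delta_{\mathrm{RAG}}
  = f_{\max}\sqrt{K^{-2}\sum_{i=1}^{K} v(B_i)}
  = f_{\max}\sqrt{K^{-2}\cdot K\, v(B)}
  = f_{\max}\sqrt{v(B)}\; K^{-1/2}
```

So doubling the number of nodes shrinks the slack by a factor of $\sqrt{2}$ when budgets are equal, while a single node with a much larger $v(B_i)$ can dominate the sum.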

2605.09084 2026-05-28 math.ST stat.TH

Two-Sample Inference for Gaussian-Smoothed Wasserstein Costs with Finite Moments

有限矩条件下高斯平滑Wasserstein代价的双样本推断

Jiaping Yang, Yunxin Zhang

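AI summary: The paper studies the two-sample plug-in estimator of the Gaussian-smoothed Wasserstein cost, establishes upper bounds in probability under finite polynomial moments, and derives a central limit theorem and a variance estimator.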

详情
AI中文摘要

高斯平滑已成为降低最优传输样本复杂度的有效技术。本文研究高斯平滑Wasserstein代价 \(T_p^{(σ)}(μ,ν)=W_p(μ*γ_σ,ν*γ_σ)^p\) 在 \(\R^d\) 上的双样本插件估计量。对于固定的平滑和有限多项式矩 \(M_{q_μ}(μ)<\infty\),\(M_{q_ν}(ν)<\infty\),其中 \(q_μ,q_ν>p\),我们建立了概率上界,阶为 \(ρ_{q_μ,p,d}(m)+ρ_{q_ν,p,d}(n)\)。这里当 \(p<q<d+2p\) 时 \(ρ_{q,p,d}(N)=N^{-(q-p)/(q+d)}\),当 \(q=d+2p\) 时 \(ρ_{q,p,d}(N)=N^{-1/2}\log N\),当 \(q>d+2p\) 时 \(ρ_{q,p,d}(N)=N^{-1/2}\)。在 \(q_μ,q_ν\ge2p\) 条件下,该阶在期望意义下也成立。当平滑总体距离为正时,代价界给出距离本身的该速率。对于 \(p>1\) 且 \(q_μ,q_ν>d+2p\),我们还推导了一阶展开、分离的双样本中心极限定理以及样本分裂方差估计量。

英文摘要

Gaussian smoothing has emerged as an effective technique for reducing the sample complexity of optimal transport. In this paper, we study the two-sample plug-in estimator of the Gaussian-smoothed Wasserstein cost \(T_p^{(σ)}(μ,ν)=W_p(μ*γ_σ,ν*γ_σ)^p\) on \(\R^d\). For fixed smoothing and finite polynomial moments \(M_{q_μ}(μ)<\infty\), \(M_{q_ν}(ν)<\infty\), with \(q_μ,q_ν>p\), we establish upper bounds in probability of order \(ρ_{q_μ,p,d}(m)+ρ_{q_ν,p,d}(n)\). Here \(ρ_{q,p,d}(N)=N^{-(q-p)/(q+d)}\) for \(p<q<d+2p\), \(N^{-1/2}\log N\) at \(q=d+2p\), and \(N^{-1/2}\) for \(q>d+2p\). This order also holds in expectation under \(q_μ,q_ν\ge2p\). When the smoothed population distance is positive, the cost bound yields this rate for the distance itself. For \(p>1\) and \(q_μ,q_ν>d+2p\), we also derive a first-order expansion, a separated two-sample central limit theorem, and a sample-splitting variance estimator.
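
The three-regime rate stated above is easy to misread, so here is a small helper that evaluates it. The function names are hypothetical; the formulas are transcribed from the abstract, and constants are ignored since the bound is only an order of magnitude.

```python
import math

def rho(q, p, d, N):
    """Rate rho_{q,p,d}(N) from the abstract; requires a moment order q > p."""
    if q <= p:
        raise ValueError("the bound requires q > p")
    critical = d + 2 * p
    if q < critical:
        return N ** (-(q - p) / (q + d))   # heavy-tail regime
    if q == critical:
        return N ** -0.5 * math.log(N)     # boundary case with a log factor
    return N ** -0.5                       # parametric regime

def two_sample_rate(q_mu, q_nu, p, d, m, n):
    """Order of the two-sample bound: rho(m) for mu plus rho(n) for nu."""
    return rho(q_mu, p, d, m) + rho(q_nu, p, d, n)

parametric = rho(q=10, p=2, d=3, N=10_000)   # q > d + 2p = 7
heavy = rho(q=4, p=2, d=3, N=10 ** 7)        # exponent (4 - 2) / (4 + 3) = 2/7
```

The exponent $(q-p)/(q+d)$ equals $1/2$ exactly at $q = d + 2p$, so the heavy-tail branch meets the boundary case up to the logarithmic factor.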

2604.19072 2026-05-28 cs.LG cs.AI stat.ML

S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

S2MAM: 半监督元加性模型用于稳健估计和变量选择

Xuelin Zhang, Hong Chen, Yingjie Wang, Tieliang Gong, Bin Gu

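AI summary: The paper proposes a semi-supervised meta additive model based on bilevel optimization that automatically identifies informative variables, updates the similarity matrix, and gives interpretable predictions; convergence and generalization bounds are proved, and experiments support its robustness and interpretability.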

详情
Comments
Accepted by ICML'2026 as Accept (regular)
AI中文摘要

基于流形正则化的半监督学习是一种经典的联合利用有标签和无标签数据进行学习的框架,其关键要求是未知边际分布的支持集具有黎曼流形的几何结构。通常,基于拉普拉斯-贝尔特拉米算子的流形正则化可以通过与整个训练数据及其对应的图拉普拉斯矩阵相关联的拉普拉斯正则化进行经验近似。然而,图拉普拉斯矩阵严重依赖于预先指定的相似度度量,并且在处理冗余或噪声输入变量时可能导致不适当的惩罚。为了解决上述问题,本文提出了一种新的半监督元加性模型(S$^2$MAM),该模型基于双层优化方案,能够自动识别信息变量、更新相似矩阵,并同时实现可解释的预测。为S$^2$MAM提供了理论保证,包括计算收敛性和统计泛化界。在4个合成数据集和12个真实世界数据集上进行的实验评估,涵盖了不同级别和类型的污染,验证了所提方法的鲁棒性和可解释性。

英文摘要

Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data and its corresponding graph Laplacian matrix. However, the graph Laplacian matrix depends heavily on the prespecified similarity metric and may lead to inappropriate penalties when dealing with redundant or noisy input variables. To address the above issues, this paper proposes a new Semi-Supervised Meta Additive Model (S$^2$MAM) based on a bilevel optimization scheme that automatically identifies informative variables, updates the similarity matrix, and simultaneously achieves interpretable predictions. Theoretical guarantees are provided for S$^2$MAM, including computational convergence and a statistical generalization bound. Experimental assessments across 4 synthetic and 12 real-world datasets, with varying levels and categories of corruption, validate the robustness and interpretability of the proposed approach.
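
The Laplacian regularizer discussed above can be written down in a few lines. The sketch assumes a Gaussian similarity with a hypothetical `bandwidth` and the unnormalized Laplacian $L = D - W$; it illustrates the classical penalty that depends on a prespecified similarity, not the S$^2$MAM procedure itself.

```python
import numpy as np

def graph_laplacian(X, bandwidth=1.0):
    """Unnormalized graph Laplacian L = D - W from a Gaussian similarity on rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)               # no self-loops
    L = np.diag(W.sum(axis=1)) - W
    return L, W

def laplacian_penalty(f, L):
    """Smoothness penalty f^T L f over labeled and unlabeled points."""
    return float(f @ L @ f)

X = np.array([[0.0], [1.0], [3.0]])        # inputs (labeled and unlabeled together)
f = np.array([1.0, 2.0, 4.0])              # predictions at those inputs
L, W = graph_laplacian(X)

penalty = laplacian_penalty(f, L)
pairwise = 0.5 * float((W * (f[:, None] - f[None, :]) ** 2).sum())
```

The quadratic form equals the weighted sum of squared prediction differences, so every entry of `W` shapes the penalty. A noisy input column changes the distances, hence `W`, hence the penalty, which is the sensitivity the abstract points to.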

2601.07299 2026-05-28 stat.AP

Cauchy-Gaussian Overbound for Heavy-tailed GNSS Measurement Errors

重尾GNSS测量误差的柯西-高斯过界

Zhengdao Li, Penggao Yan, Weisong Wen, Li-Ta Hsu

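AI summary: For heavy-tailed GNSS measurement errors, the paper proposes an overbound that combines a Cauchy core with Gaussian tails and whose overbounding property is preserved under convolution; in the position domain it lowers the vertical protection level by 15%-47%.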

详情
Comments
Published in NAVIGATION: Journal of the Institute of Navigation
AI中文摘要

重尾测量误差的过界对于满足完整性监测应用中的严格导航要求至关重要。本文提出利用柯西分布在核心的边界锐度以及高斯分布在尾部的边界锐度,来紧密界定重尾全球导航卫星系统测量误差。我们开发了一个程序来确定对称单峰和非对称单峰重尾误差的过界参数,并证明过界性质在卷积下得以保持。在模拟和真实数据集上的实验结果表明,我们的方法能够在核心和尾部区域紧密界定重尾误差。在位置域,与单累积分布函数高斯过界相比,所提方法将对称单峰重尾误差的平均垂直保护水平降低了15%;对于非对称单峰重尾误差,与导航离散包络和两步高斯过界相比,降低了21%-47%。

英文摘要

Overbounds of heavy-tailed measurement errors are essential for meeting stringent navigation requirements in integrity-monitoring applications. This paper proposes to leverage the bounding sharpness of the Cauchy distribution in the core and the Gaussian distribution in the tails to tightly bound heavy-tailed global navigation satellite system measurement errors. We develop a procedure to determine the overbounding parameters for both symmetric unimodal (SU) and non-symmetric unimodal (NSU) heavy-tailed errors and prove that the overbounding property is preserved through convolution. Experimental results on both simulated and real-world data sets reveal that our method can sharply bound heavy-tailed errors in both the core and tail regions. In the position domain, the proposed method reduces the average vertical protection level by 15% for SU heavy-tailed errors compared with the single-cumulative-density-function Gaussian overbound and by 21%-47% for NSU heavy-tailed errors compared with the navigation discrete envelope and two-step Gaussian overbounds.

2501.16721 2026-05-28 cond-mat.stat-mech math.ST physics.app-ph stat.TH

Heat-dissipation decomposition and free-energy generation in a non-equilibrium dot with multi-electron states

多电子态非平衡量子点中的耗散热分解与自由能产生

Chloe Salhani, Kensaku Chida, Takase Shimizu, Toshiaki Hayashi, Katsuhiko Nishiguchi

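AI summary: Using single-electron counting statistics, the experiment studies how heat dissipation decomposes in a nanometer-scale dot transitioning to a non-equilibrium steady state, and how the decomposed heat correlates directly with free-energy generation.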

详情
Comments
Revised version. Title changed
AI中文摘要

我们通过单电子计数统计实验,展示了纳米尺度量子点在过渡到非平衡稳态过程中自由能产生时耗散热的分解。驱动储层的交流信号将多个电子注入量子点,使其处于非平衡状态,导致自由能产生、耗散热和香农熵产生。通过分析量子点多电子态的时域概率分布,我们定量地将耗散热分解为维持热和过剩热,从而揭示了它们与自由能产生的直接相关性。这种相关性表明,在大信号诱导的远离平衡条件下,产生的自由能与施加到量子点的功之比可能达到0.5,而实验上实现了0.25的效率。这些结果建立了多电子随机系统中分解的耗散热与自由能产生之间的定量联系,为非平衡电子器件提供了热力学框架。

英文摘要

We experimentally demonstrate the decomposition of heat dissipation during free-energy generation in a nanometer-scale dot transitioning to a non-equilibrium steady state via single-electron counting statistics. An alternating-current signal driving a reservoir that injects multiple electrons into the dot drives the dot out of equilibrium, leading to free-energy generation, heat dissipation, and Shannon-entropy production. By analyzing the time-domain probability distributions of multi-electron states of the dot, we quantitatively decompose the heat dissipation into housekeeping and excess heats, thereby revealing their direct correlation with free-energy generation. This correlation suggests that the ratio of the generated free energy to the work applied to the dot can potentially reach 0.5 under far-from-equilibrium conditions induced by a large signal, while an efficiency of 0.25 was experimentally achieved. These results establish a quantitative link between decomposed heat dissipation and free-energy generation in a multi-electron stochastic system, providing a thermodynamic framework for non-equilibrium electronic devices.

2603.19745 2026-05-28 stat.ME

Invariant quantile regression for heterogeneous environments

异质环境下的不变分位数回归

Bo Fu, Dandan Jiang

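AI summary: The paper proposes an invariant quantile regression framework for multi-environment datasets; a kernel-smoothed estimator exploits invariance across environments to perform causal discovery and to overcome endogeneity.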

详情
Comments
25 pages, 4 figures
AI中文摘要

在本文中,我们提出了一个专门针对多环境数据集的不变分位数回归(IQR)框架,该框架捕捉了不同环境之间的不变性。该框架与迁移学习、因果推断和公平机器学习密切相关,其动机源于响应变量在给定协变量下的条件概率发生变化,而某些关键变量保持不变的场景。这一视角与以往仅关注条件均值的工作显著不同,后者通常不足以捕捉异质环境中协变量与响应变量之间的完整因果关系。相比之下,基于分位数的不变性自然地适应异质性,并且与结构因果模型更加一致,其中在一个或多个分位数水平上跨环境不变的变量直接指示潜在且稳定的因果变量。此外,我们表明,与条件均值框架相比,IQR 可能产生更大的内生变量集,从而更有效地排除虚假(非因果)变量。为此,我们引入了一种核平滑不变分位数回归(KS-IQR)估计器,该估计器利用潜在的不变结构和环境间的异质性,确保在多个环境中稳定估计。我们在非渐近框架下建立了我们方法的因果发现性质,展示了其克服“内生性诅咒”的能力,并推导了估计器的 $\ell_2$ 误差界。我们将我们的方法应用于真实数据的因果发现,获得了具有生物学意义的关系,恢复了已知的信号通路并揭示了额外的分位数特定效应。

英文摘要

In this paper, we propose an invariant quantile regression (IQR) framework specifically designed for multi-environment datasets, which captures the invariance across different environments. This framework is closely related to transfer learning, causal inference, and fair machine learning, and is motivated by scenarios in which the conditional probability of the response given covariates varies, while certain key variables remain invariant. This perspective differs notably from previous works that restrict attention to the conditional mean, which is often insufficient to capture the full causal relationships between covariates and the response in heterogeneous environments. In contrast, quantile-based invariance naturally accommodates heterogeneity, and aligns more closely with structural causal models, in which variables invariant across environments at one or multiple quantile levels directly indicate potential and stable causal variables. Moreover, we show that IQR may yield a larger set of endogenous variables compared to the conditional mean framework, which in turn promotes more effective exclusion of spurious (non-causal) variables. To achieve this, we introduce a Kernel-Smoothed Invariant Quantile Regression (KS-IQR) estimator, which leverages the underlying invariance structure and heterogeneity among environments, ensuring stable estimation across multiple environments. We establish the causal discovery properties of our method, demonstrate its ability to overcome the ``curse of endogeneity'', and derive an $\ell_2$ error bound for our estimator, all in a non-asymptotic framework. We apply our method to real data for causal discovery and obtain biologically meaningful relationships, recovering known signaling pathways and revealing additional quantile-specific effects.
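
Quantile regression rests on the check (pinball) loss, which the sketch below illustrates in its plain, unsmoothed form. The helper names are hypothetical and no kernel smoothing or invariance penalty is included, so this is background for the abstract above and not the KS-IQR estimator.

```python
import numpy as np

def check_loss(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def quantile_by_loss(y, tau):
    """Sample point that minimizes the total check loss (a tau-quantile of y)."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau).sum() for c in y]   # candidates are the data points
    return y[int(np.argmin(losses))]

y = np.arange(1, 10)                   # 1, 2, ..., 9
median = quantile_by_loss(y, 0.5)      # minimizer for tau = 0.5
upper = quantile_by_loss(y, 0.9)       # minimizer for tau = 0.9
```

Minimizing the loss at different levels `tau` recovers different parts of the conditional distribution, which is why invariance can be checked quantile by quantile and not only at the mean. The indicator makes the loss non-differentiable at zero, and kernel smoothing is one common way to handle that.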

2603.08761 2026-05-28 stat.ML cs.LG

No Certificate for Alignment: Two Independent Impossibilities and the Pareto Frontier of Achievable Safety Guarantees

对齐无证书:两个独立的不可行性与可实现安全保证的帕累托前沿

Ayushi Agarwal

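AI summary: Through two independent impossibility theorems, the paper shows that formal certification of AI alignment over open-ended or unbounded input domains is impossible under standard assumptions in computational complexity and learning theory, and it characterizes the Pareto frontier of achievable safety guarantees.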

详情
AI中文摘要

我们论证,在计算复杂性和学习理论的标准假设下,对开放或无界输入域上的AI对齐进行形式化认证是不可能的,并刻画了仍可实现的内容。两个结构独立的不可行性定理支持这一立场。语义障碍(定理1):判断一个系统是否在整个输入域上满足任何非平凡的对齐性质,对于前馈网络是NP难的,对于图灵完备架构是不可判定的——这是神经网络验证复杂性和Rice定理的直接推论。统计障碍(定理2):任何既正确又易处理的验证过程无法在整个输入域上满足完备性——这是从有限观测中认证无限域性质的不可能性的直接推论。这两个定理共同蕴含一个三难困境:没有过程能同时满足正确性(没有未对齐系统被认证)、完备性(没有对齐系统被拒绝)和易处理性(多项式运行时间)。每对性质可同时实现,但三者不可兼得。我们将这些结果整合为一个包含两个结构独立障碍的联合框架,证明它们的独立性,并通过构造性的覆盖间隙下界定量刻画可实现的帕累托前沿。

英文摘要

We argue that formal certification of AI alignment over open-ended or unbounded input domains is impossible under standard assumptions in computational complexity and learning theory, and characterise what remains achievable. Two structurally independent impossibility theorems support this position. The semantic barrier (Theorem 1): deciding whether a system satisfies any non-trivial alignment property over the full input domain is NP-hard for feedforward networks and undecidable for Turing-complete architectures -- a direct consequence of neural-network verification complexity and Rice's Theorem. The statistical barrier (Theorem 2): any verification procedure that is both sound and tractable cannot satisfy Completeness over the full input domain -- a direct consequence of the impossibility of certifying infinite-domain properties from finite observations. These two theorems jointly entail a trilemma: no procedure can simultaneously satisfy soundness (no misaligned system is certified), completeness (no aligned system is rejected), and tractability (polynomial runtime). Each pair is simultaneously achievable; all three are not. We combine these results as a joint framework of two structurally independent barriers, prove their independence, and characterise the achievable Pareto frontier quantitatively via a constructive coverage-gap lower bound.

2603.08276 2026-05-28 stat.ME stat.AP

A Unified Framework for Density Estimation under Right-Censored Point-Centred Quarter Sampling

右删失点四分位抽样下密度估计的统一框架

Wenzhe Huang, Guochun Shen, Dingliang Xing, Jiangyan Zhao

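AI summary: For right-censored point-centred quarter sampling data, the paper proposes a unified likelihood- and moment-based estimation framework under Poisson and negative binomial models, which addresses density estimation for spatially aggregated populations; the negative binomial MLE performs best across a range of ecological scenarios.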

详情
Comments
42 pages, 28 figures, 4 tables
AI中文摘要

尽管点四分位法(PCQM)广泛用于密度估计,但现有处理截断搜索半径导致的右删失数据的方法主要依赖于假设完全空间随机性(CSR)的泊松模型,这为空间聚集种群留下了关键空白。为解决这一局限,我们开发了一个基于泊松和负二项分布(NBD)模型的右删失点四分位抽样的统一似然与矩估计框架。特别地,所提出的NBD估计量同时显式考虑了空间聚集和删失,将基于距离的推断扩展到CSR设置之外。广泛的模拟和对完全映射森林地块的应用表明,NBD的MLE在不同生态场景下提供了最稳健的整体性能。在来自完全映射森林地块的100多个物种中,与现有删失估计量相比,所提出的NBD的MLE将绝对相对偏差的中位数大约降低了0.10,相对改进超过30%。最终,我们的框架为分析删失的点到树距离数据提供了一个经过严格验证且实际有用的工具。

英文摘要

While the point-centred quarter method (PCQM) is widely used for density estimation, existing methods for handling right-censored data from truncated search radii rely primarily on a Poisson model assuming complete spatial randomness (CSR), leaving a critical gap for spatially aggregated populations. To address this limitation, we develop a unified likelihood- and moment-based framework for right-censored point-centred quarter sampling under both Poisson and negative binomial distribution (NBD) models. In particular, the proposed NBD-based estimators explicitly account for spatial aggregation and censoring simultaneously, extending distance-based inference beyond the CSR setting. Extensive simulations and applications to fully mapped forest plots reveal that the NBD-based MLE delivers the most robust overall performance across diverse ecological scenarios. Across more than 100 species from fully mapped forest plots, the proposed NBD-based MLE reduced absolute relative bias by a median of approximately 0.10 compared with existing censored estimators, representing a relative improvement of over 30%. Ultimately, our framework provides a rigorously validated and practically useful toolkit for analysing censored point-to-tree distance data.
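
To make the censoring mechanism concrete, here is a sketch of the density MLE in the simplest case, a homogeneous Poisson pattern (CSR). Under that assumption the quarter-circle area to the nearest individual, $\pi r^2/4$, is exponentially distributed with rate equal to the density, so the censored-exponential MLE applies. The function name and the `None` encoding of censored quadrants are assumptions, and this is not claimed to be any specific estimator from the paper.

```python
import math

def censored_poisson_density_mle(distances, radius):
    """Density MLE under CSR with search truncated at `radius`.

    distances: one entry per quadrant, the distance to the nearest individual,
               or None when nothing was found within `radius` (right-censored).
    """
    observed = [r for r in distances if r is not None]
    n_censored = len(distances) - len(observed)
    # Total searched quarter-circle area: observed areas plus full areas for censored quadrants.
    exposure = (math.pi / 4.0) * (sum(r * r for r in observed) + n_censored * radius ** 2)
    return len(observed) / exposure

# Three quadrants searched out to 3 m: individuals at 1 m and 2 m, one empty quadrant.
density = censored_poisson_density_mle([1.0, 2.0, None], radius=3.0)
```

The estimate is the number of individuals found divided by the area searched, here $2 / (3.5\pi)$ individuals per square metre. Spatial aggregation breaks the exponential-area assumption, which is the gap a negative binomial model is meant to fill.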

2512.00252 2026-05-28 stat.ML cs.LG physics.ao-ph

DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants

DAISI:基于随机插值逆采样的数据同化

Martin Andrae, Erik Wikingsson, So Takao, Tomas Landelius, Fredrik Lindsten

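AI summary: The paper proposes DAISI, an algorithm that uses flow-based generative models for flexible probabilistic inference and combines forecast information with observations through inverse sampling, addressing the limits of classical Gaussian approximations in complex nonlinear systems.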

详情
Comments
Accepted at the International Conference on Machine Learning 2026, 44 pages, 28 figures
AI中文摘要

数据同化是科学和工程应用的基石,它将模型预报与稀疏且带噪声的观测相结合,以估计潜在的系统状态。经典的高维数据同化方法,如集合卡尔曼滤波器,依赖于高斯近似,这在复杂动力学或观测算子中会被违反。为了解决这一局限性,我们引入了DAISI,一种基于流式生成模型的可扩展滤波算法,能够利用数据驱动的先验实现灵活的概率推断。核心思想是使用一个固定的、预训练好的生成先验,首先通过一种新颖的逆采样步骤融入预报信息,然后通过基于引导的条件采样同化观测。这使我们能够利用任何预报模型作为数据同化流程的一部分,而无需在每个同化步骤重新训练或微调生成先验。在具有挑战性的非线性系统上的实验表明,DAISI在稀疏、带噪声和非线性观测的情况下实现了准确的滤波结果,而传统方法在这些情况下表现不佳。DAISI的代码可在https://github.com/Erik-Wikingsson/DAISI获取。

英文摘要

Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical high-dimensional DA methods, such as the ensemble Kalman filter, rely on Gaussian approximations that are violated for complex dynamics or observation operators. To address this limitation, we introduce DAISI, a scalable filtering algorithm built on flow-based generative models that enables flexible probabilistic inference using data-driven priors. The core idea is to use a stationary, pre-trained generative prior that first incorporates forecast information through a novel inverse-sampling step, before assimilating observations via guidance-based conditional sampling. This allows us to leverage any forecasting model as part of the DA pipeline without having to retrain or fine-tune the generative prior at each assimilation step. Experiments on challenging nonlinear systems show that DAISI achieves accurate filtering results in regimes with sparse, noisy, and nonlinear observations where traditional methods struggle. The code for DAISI is available at https://github.com/Erik-Wikingsson/DAISI.

2602.23602 2026-05-28 stat.ML cs.LG

Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data

矩重要:从异方差观测数据中发现均值和方差因果图

Yoichi Chikahara

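AI summary: The paper proposes a Bayesian, moment-driven causal discovery framework that infers separate mean and variance causal graphs from heteroscedastic observational data and quantifies uncertainty over structural features.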

详情
Comments
Accepted at KDD 2026. This is the full version of the accepted paper. 17 pages, 6 figures
AI中文摘要

异方差性——即一个变量的方差随其他变量变化——在真实数据中普遍存在,从统计矩的角度阐明其产生原因对于科学知识发现和决策至关重要。然而,标准因果发现无法揭示哪些原因作用于均值还是方差,因为它返回一个单一的不考虑矩的图,限制了可解释性和下游干预设计。我们提出了一个贝叶斯、矩驱动的因果发现框架,从观测异方差数据中推断独立的\textit{均值}和\textit{方差}因果图。我们首先通过建立充分条件推导出这两个图可分别识别的识别结果。基于此理论,我们开发了一种变分推理方法,学习两个图的后验分布,从而实现对结构特征(如边、路径和子图)的原则性不确定性量化。为了解决具有两个图结构的异方差模型中参数优化的挑战,我们采用曲率感知优化方法,并开发了一种先验引入技术,利用节点顺序的领域知识,提高样本效率。在合成、半合成和真实数据上的实验表明,我们的方法能够准确恢复均值和方差结构,并优于最先进的基线方法。

英文摘要

Heteroscedasticity -- where the variance of a variable changes with other variables -- is pervasive in real data, and elucidating why it arises from the perspective of statistical moments is crucial in scientific knowledge discovery and decision-making. However, standard causal discovery does not reveal which causes act on the mean versus the variance, as it returns a single moment-agnostic graph, limiting interpretability and downstream intervention design. We propose a Bayesian, moment-driven causal discovery framework that infers separate \textit{mean} and \textit{variance} causal graphs from observational heteroscedastic data. We first derive the identification results by establishing sufficient conditions under which these two graphs are separately identifiable. Building on this theory, we develop a variational inference method that learns a posterior distribution over both graphs, enabling principled uncertainty quantification of structural features (e.g., edges, paths, and subgraphs). To address the challenges of parameter optimization in heteroscedastic models with two graph structures, we take a curvature-aware optimization approach and develop a prior incorporation technique that leverages domain knowledge on node orderings, improving sample efficiency. Experiments on synthetic, semi-synthetic, and real data show that our approach accurately recovers mean and variance structures and outperforms state-of-the-art baselines.

2602.14862 2026-05-28 stat.ML cs.AI cs.IT cs.LG math.IT stat.ME

The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

温度缩放分类器:温度缩放的一些基本性质

Pierre-Alexandre Mattei, Bruno Loureiro

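AI summary: Using new perspectives such as information projection and linear-scaler submodels, this paper rigorously analyzes how temperature scaling affects classifier calibration and LLM diversity, proving that raising the temperature generally increases uncertainty while challenging the claim that it increases diversity.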

详情
AI中文摘要

温度缩放是一种简单的方法,可以控制概率模型的不确定性。它主要用于两个场景:改进分类器的校准和调节大型语言模型(LLM)的随机性。在这两种情况下,温度缩放都是最流行的方法。尽管其流行,但温度缩放性质的严格理论分析仍然难以捉摸。我们在此研究其中一些性质。对于分类,我们表明提高温度在非常普遍的意义上增加了模型的不确定性(特别是增加了其熵)。然而,对于LLM,我们质疑了提高温度会增加多样性的常见说法。此外,我们引入了温度缩放的两种新表征。第一种是几何的:温度缩放模型被证明是原始模型在具有给定熵的模型集合上的信息投影。第二种表征阐明了温度缩放作为更一般线性缩放器(如矩阵缩放和狄利克雷校准)的子模型的作用:我们表明温度缩放是唯一不改变模型硬预测的线性缩放器。

英文摘要

Temperature scaling is a simple method for controlling the uncertainty of probabilistic models. It is mostly used in two contexts: improving the calibration of classifiers and tuning the stochasticity of large language models (LLMs). In both cases, temperature scaling is the most popular method for the job. Despite its popularity, a rigorous theoretical analysis of the properties of temperature scaling has remained elusive. We investigate here some of these properties. For classification, we show that increasing the temperature increases the uncertainty in the model in a very general sense (and in particular increases its entropy). However, for LLMs, we challenge the common claim that increasing temperature increases diversity. Furthermore, we introduce two new characterisations of temperature scaling. The first one is geometric: the tempered model is shown to be the information projection of the original model onto the set of models with a given entropy. The second characterisation clarifies the role of temperature scaling as a submodel of more general linear scalers such as matrix scaling and Dirichlet calibration: we show that temperature scaling is the only linear scaler that does not change the hard predictions of the model.
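
Two of the properties stated in this abstract can be checked numerically on a toy example: raising the temperature increases the entropy of the tempered distribution, and the hard prediction (the argmax) stays the same. The sketch below is only an illustration under assumptions: the logits are hypothetical, and the helper names `tempered_softmax`, `entropy`, and `hard_prediction` are made up here and do not come from the paper.

```python
import math


def tempered_softmax(logits, temperature):
    """Softmax of the logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def hard_prediction(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


logits = [2.0, 1.0, 0.1]             # hypothetical classifier outputs
temperatures = [0.5, 1.0, 2.0, 5.0]  # increasing temperatures

tempered = [tempered_softmax(logits, t) for t in temperatures]
entropies = [entropy(p) for p in tempered]
predictions = [hard_prediction(p) for p in tempered]

for t, h, k in zip(temperatures, entropies, predictions):
    print(f"T={t}: entropy={h:.4f} nats, predicted class={k}")
```

On this example the printed entropies grow with the temperature while the predicted class is unchanged. This is a single numerical instance, not a proof of the general statements.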

2510.03534 2026-05-28 cs.MA cs.LG cs.SY eess.SY stat.ML

Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning

基于多智能体强化学习的杜罗河羽流长期映射

Nicolò Dal Fabbro, Milad Mesbahi, Renato Mendes, João Borges de Sousa, George J. Pappas

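AI summary: Proposes an energy- and communication-efficient multi-agent reinforcement learning method that combines spatiotemporal Gaussian process regression with a multi-head Q-network controller, enabling multiple autonomous underwater vehicles to map the Douro River plume over the long term (multiple days); it outperforms benchmarks in Delft3D simulations, and adding agents improves both accuracy and endurance.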

详情
Comments
Accepted at the 2026 IEEE International Conference on Robotics and Automation
AI中文摘要

我们研究了使用多艘自主水下航行器(AUV)对河流羽流进行长期(多天)映射的问题,重点关注杜罗河代表性用例。我们提出了一种能量和通信高效的多智能体强化学习方法,其中中央协调器间歇性地与AUV通信,收集测量数据并发出指令。我们的方法将时空高斯过程回归(GPR)与多头Q网络控制器相结合,该控制器调节每个AUV的方向和速度。使用Delft3D海洋模型的模拟表明,我们的方法始终优于单智能体和多智能体基准,并且增加智能体数量既能改善均方误差(MSE)又能提高操作续航。在某些情况下,我们的算法表明,将AUV数量加倍可以使续航增加一倍以上,同时保持或提高精度,这凸显了多智能体协调的优势。我们学习到的策略能够泛化到不同月份和年份的未见季节性情景,为未来开发数据驱动的动态羽流环境长期监测展示了前景。

英文摘要

We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that scaling the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.

2602.02855 2026-05-28 cs.LG cond-mat.dis-nn math.ST stat.TH

When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models

当预训练损害LoRA微调:基于单指标模型的动力学分析

Gibbs Nwemadji, Bruno Loureiro, Jean Barbier

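AI summary: Through a dynamical analysis under single-index models, this paper shows mathematically that excessive pre-training slows the convergence of LoRA fine-tuning, and characterizes how the convergence rate depends on the initial alignment and the non-linearity of the target task.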

详情
Comments
38 pages, 14 figures
AI中文摘要

在源任务上的预训练通常被认为有助于类似下游问题的微调。本文从数学上表明,这种朴素直觉并不总是成立:过度预训练会在计算上减慢微调优化。我们研究了在单次SGD训练的单指标模型上进行低秩适应(LoRA)微调的现象。利用微调动力学的汇总统计描述,我们精确刻画了收敛率如何依赖于初始微调对齐和目标任务的非线性程度。关键结论是,即使预训练和下游任务高度对齐,强预训练也会导致搜索阶段延长并阻碍收敛。因此,我们的理论提供了一个统一图景,说明预训练强度与任务难度如何在非平凡的可处理模型中共同塑造LoRA微调的动力学和局限性。在实践方面,我们通过实验表明,我们的理论发现超越了玩具模型,在真实数据上训练的视觉变换器模型中仍然相关。

英文摘要

Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial tractable model. On the practical side, we empirically show that our theoretical findings extend beyond our toy model and remain relevant in the context of a vision-transformer model trained on real data.

2510.02174 2026-05-28 cs.LG math.OC math.PR stat.ML

Flatness-Aware Stochastic Gradient Langevin Dynamics

平坦感知随机梯度Langevin动力学

Stefano Bruno, Youngsik Hwang, Jaehyeon An, Sotirios Sabanis, Dong-Young Lim

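AI summary: Proposes Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), which couples a theoretically prescribed noise scale with the inverse temperature to bias the dynamics toward flat basins while retaining computational efficiency, with non-asymptotic theoretical analysis and experimental validation.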

详情
Journal ref
ICML 2026
Comments
Accepted by ICML 2026
AI中文摘要

损失景观的平坦性已被广泛研究,作为理解深度学习算法行为和泛化的重要视角。受此观点启发,我们提出了平坦感知随机梯度Langevin动力学(fSGLD),这是一种一阶优化方法,在保持SGD和SGLD的计算和内存效率的同时,使其动力学偏向平坦盆地。我们提供了非渐近理论分析,表明在理论上规定的噪声尺度$σ$和逆温度$β$之间的耦合下,fSGLD以平坦偏差的吉布斯分布为目标,并给出了显式的过剩风险保证。我们在标准优化器基准、贝叶斯图像分类、不确定性量化和分布外检测上对fSGLD进行了实证评估,展示了持续强劲的性能和可靠的不确定性估计。额外实验证实了理论上规定的$β$-$σ$耦合相对于解耦选择的有效性。

英文摘要

Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases its dynamics toward flat basins while retaining the computational and memory efficiency of SGD and SGLD. We provide a non-asymptotic theoretical analysis showing that fSGLD targets a flatness-biased Gibbs distribution under a theoretically prescribed coupling between the noise scale $σ$ and the inverse temperature $β$, together with explicit excess risk guarantees. We empirically evaluate fSGLD across standard optimizer benchmarks, Bayesian image classification, uncertainty quantification, and out-of-distribution detection, demonstrating consistently strong performance and reliable uncertainty estimates. Additional experiments confirm the effectiveness of the theoretically prescribed $β$-$σ$ coupling compared to decoupled choices.
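
For readers unfamiliar with the baseline that fSGLD builds on, the sketch below shows the plain Langevin update that underlies SGLD: a gradient step plus Gaussian noise whose size is set by the inverse temperature $β$. It is not fSGLD and does not implement the $β$-$σ$ coupling from the abstract. The quadratic potential, the exact (non-stochastic) gradient, the step size, and the function name `langevin_chain` are all assumptions chosen for illustration.

```python
import math
import random


def langevin_chain(grad, theta0, step, beta, n_steps, rng):
    """Iterate theta <- theta - step * grad(theta) + sqrt(2 * step / beta) * xi,
    with xi drawn from a standard normal distribution at every step."""
    theta = theta0
    noise_scale = math.sqrt(2.0 * step / beta)
    samples = []
    for _ in range(n_steps):
        theta = theta - step * grad(theta) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


# Toy potential U(theta) = theta^2 / 2, so grad U(theta) = theta.
# Its Gibbs distribution exp(-beta * U) is a Gaussian with variance 1 / beta.
rng = random.Random(0)
samples = langevin_chain(lambda t: t, theta0=3.0, step=0.01,
                         beta=1.0, n_steps=200_000, rng=rng)

kept = samples[20_000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

With `beta=1.0` the sample variance should be close to 1, up to Monte Carlo and discretization error. In SGLD proper the exact gradient is replaced by a minibatch estimate.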

2601.22519 2026-05-28 stat.ML cs.LG

Corrected Samplers for Discrete Flow Models

离散流模型的校正采样器

Zhengyan Wan, Yidong Ouyang, Liyan Xie, Hongyuan Zha, Fang Fang, Guang Cheng

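AI summary: Existing samplers for discrete flow models (such as tau-leaping and the Euler solver) suffer from large discretization error and need many iterations; this paper proposes time-corrected and location-corrected samplers that reduce the error at almost no additional computational cost, and proves that the location-corrected sampler has lower complexity.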

详情
AI中文摘要

离散流模型(DFMs)被提出用于学习有限状态空间上的数据分布,作为离散扩散模型的替代方案提供了灵活框架。近期一系列工作研究了离散扩散模型的采样器,如tau-leaping和Euler求解器。然而,这些采样器需要大量迭代来控制离散化误差,因为转移速率在时间上被冻结并在每个时间区间内以初始状态评估。此外,这些采样器的理论结果通常要求转移速率的有限性条件,或专注于特定类型的源分布。为解决这些限制,我们在离散流模型框架下,建立了这些采样器的非渐近离散化误差界,且对转移速率和源分布无任何限制。进一步,通过分析Euler求解器的一步下界,我们提出了两种校正采样器:\textit{时间校正采样器}和\textit{位置校正采样器},它们几乎不增加额外计算成本即可减少tau-leaping和Euler求解器的离散化误差。我们严格证明位置校正采样器比现有并行采样器具有更低的复杂度。通过在模拟和文本到图像生成任务上以更少的推理时间获得更好的生成质量,验证了所提方法的有效性。代码见 https://github.com/WanZhengyan/Corrected-Samplers-for-Discrete-Flow-Models。

英文摘要

Discrete flow models (DFMs) have been proposed to learn the data distribution on a finite state space, offering a flexible framework as an alternative to discrete diffusion models. A line of recent work has studied samplers for discrete diffusion models, such as tau-leaping and the Euler solver. However, these samplers require a large number of iterations to control discretization error, since the transition rates are frozen in time and evaluated at the initial state within each time interval. Moreover, theoretical results for these samplers often require boundedness conditions on the transition rate or focus on specific types of source distributions. To address these limitations, we establish non-asymptotic discretization error bounds for these samplers without any restriction on transition rates and source distributions, under the framework of discrete flow models. Furthermore, by analyzing a one-step lower bound of the Euler sampler, we propose two corrected samplers: \textit{time-corrected sampler} and \textit{location-corrected sampler}, which can reduce the discretization error of tau-leaping and the Euler solver with almost no additional computational cost. We rigorously show that the location-corrected sampler has a lower complexity than existing parallel samplers. We validate the effectiveness of the proposed method by achieving better generation quality with reduced inference time on simulations and text-to-image generation tasks. Code can be found at https://github.com/WanZhengyan/Corrected-Samplers-for-Discrete-Flow-Models.

2601.10464 2026-05-28 stat.AP q-bio.GN

MitoFREQ: A Novel Approach for Mitogenome Frequency Estimation from Top-level Haplogroups and Single Nucleotide Variants

MitoFREQ:一种基于顶级单倍群和单核苷酸变异进行线粒体基因组频率估计的新方法

Mikkel Meyer Andersen, Nicole Huber, Kimberly S Andreaggi, Tóra Oluffa Stenberg Olsen, Walther Parson, Charla Marshall

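AI summary: Proposes MitoFREQ, which uses SNV allele frequencies for top-level haplogroups in the HelixMTdb and gnomAD databases to estimate mitogenome population frequencies from the weighted frequency of the rarest SNV, and provides the open-source R package mitofreq.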

详情
AI中文摘要

谱系标记群体频率可作为法医遗传学中表达证据价值的一种方式。然而,对于高质量的全线粒体DNA基因组序列(线粒体基因组),群体数据仍然有限。在本文中,我们提供了一种新方法MitoFREQ,用于估计线粒体基因组的群体频率。MitoFREQ使用线粒体基因组资源HelixMTdb和gnomAD,分别包含来自195,983和56,406个线粒体基因组的信息。HelixMTdb和gnomAD都不能直接查询单个线粒体基因组频率,但提供了30个“顶级”单倍群(TLHG)中每个的单核苷酸变异(SNV)等位基因频率。我们建议通过将给定线粒体基因组分类到TLHG方案中,随后使用该TLHG内其最稀有SNV的频率(按TLHG频率加权)来利用HelixMTdb和gnomAD资源。我们证明,该方法保证提供比使用精细单倍群及其SNV频率更高的群体频率估计。此外,我们表明,仅使用227个特定位置即可对99.9%的测试线粒体基因组实现顶级单倍群分类,可能使该方法适用于低质量样本。该方法在两类数据集上进行了测试:高质量法医参考数据集和来自GenBank的多样化经过审查的线粒体基因组集合。这种双重评估表明,该方法在精心策划的法医数据和更广泛的群体水平序列上均具有稳健性。该方法产生的似然比在100-100,000范围内,展示了其加强法医mtDNA证据统计评估的潜力。我们开发了一个开源R包`mitofreq`来实现我们的方法,包括一个Shiny应用程序,可以在其中提供自定义TLHG频率。

英文摘要

Lineage marker population frequencies can serve as one way to express evidential value in forensic genetics. However, for high-quality whole mitochondrial DNA genome sequences (mitogenomes), population data remain limited. In this paper, we offer a new method, MitoFREQ, for estimating the population frequencies of mitogenomes. MitoFREQ uses the mitogenome resources HelixMTdb and gnomAD, harbouring information from 195,983 and 56,406 mitogenomes, respectively. Neither HelixMTdb nor gnomAD can be queried directly for individual mitogenome frequencies, but both offer single nucleotide variant (SNV) allele frequencies for each of 30 "top-level" haplogroups (TLHG). We propose using the HelixMTdb and gnomAD resources by classifying a given mitogenome within the TLHG scheme and subsequently using the frequency of its rarest SNV within that TLHG weighted by the TLHG frequency. We show that this method is guaranteed to provide a higher population frequency estimate than if a refined haplogroup and its SNV frequencies were used. Further, we show that top-level haplogrouping can be achieved by using only 227 specific positions for 99.9% of the tested mitogenomes, potentially making the method available for low-quality samples. The method was tested on two types of datasets: high-quality forensic reference datasets and a diverse collection of scrutinised mitogenomes from GenBank. This dual evaluation demonstrated that the approach is robust across both curated forensic data and broader population-level sequences. This method produced likelihood ratios in the range of 100-100,000, demonstrating its potential to strengthen the statistical evaluation of forensic mtDNA evidence. We have developed an open-source R package `mitofreq` that implements our method, including a Shiny app where custom TLHG frequencies can be supplied.
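
The estimation rule described in this abstract (the frequency of the rarest SNV within the assigned TLHG, weighted by the TLHG frequency) amounts to a short piece of arithmetic. The sketch below illustrates it with invented numbers. The function name `mitofreq_estimate` is hypothetical and is not the interface of the `mitofreq` R package, and reading the likelihood ratio as the reciprocal of the estimated frequency is an assumption made here for illustration.

```python
def mitofreq_estimate(tlhg_freq, snv_freqs_in_tlhg):
    """Population frequency estimate: the TLHG frequency multiplied by the
    frequency of the rarest observed SNV within that TLHG."""
    if not snv_freqs_in_tlhg:
        raise ValueError("at least one SNV frequency is required")
    return tlhg_freq * min(snv_freqs_in_tlhg)


# Hypothetical inputs, not taken from HelixMTdb or gnomAD.
tlhg_freq = 0.10                     # share of the population in the assigned TLHG
snv_freqs_in_tlhg = [0.30, 0.02, 0.45]  # within-TLHG frequencies of the profile's SNVs

estimate = mitofreq_estimate(tlhg_freq, snv_freqs_in_tlhg)
likelihood_ratio = 1.0 / estimate    # assumed reading: LR = 1 / frequency

print(f"estimated frequency = {estimate:.4f}")
print(f"likelihood ratio    = {likelihood_ratio:.0f}")
```

With these made-up inputs the rarest SNV has within-TLHG frequency 0.02, giving an estimate of about 0.002 and a likelihood ratio of about 500, which falls inside the 100-100,000 range quoted in the abstract.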

2212.04382 2026-05-28 stat.ML cs.LG

Structure of Classifier Boundaries: Case Study for a Naive Bayes Classifier

分类器边界的结构:朴素贝叶斯分类器的案例研究

Alan F. Karr, Zac Bowen, Adam A. Porter, Regina Ruane

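AI summary: Studies the boundary structure of a Bayes classifier whose input space is a graph, measures classification uncertainty through Neighbor Similarity, and applies the analysis to the DNA read assignment problem.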

详情
AI中文摘要

对于输入空间为图的贝叶斯分类器,我们研究边界的结构,边界由那些至少有一个邻居被分类不同的点组成。科学背景是将下一代测序仪产生的DNA读段分配给候选源基因组。我们表明边界在结构上既大又复杂。一种新的不确定性度量——邻域相似性,它将输入点的分类器结果与其邻居的结果分布进行比较,不仅跟踪贝叶斯分类器的两个固有不确定性度量,而且可以用于没有固有不确定性度量的分类器。

英文摘要

For a Bayes classifier whose input space is a graph, we study the structure of the boundary, which comprises those points for which at least one neighbor is classified differently. The scientific setting is assignment of DNA reads produced by next-generation sequencers to candidate source genomes. We show that the boundary is both large and complicated in structure. A new measure of uncertainty, Neighbor Similarity, which compares the classifier result for an input point to the distribution of results for its neighbors, not only tracks two inherent uncertainty measures for the Bayes classifier, but also can be implemented for classifiers without inherent measures of uncertainty.

2512.06797 2026-05-28 math.OC cs.AI cs.LG stat.ML

Optimal and Diffusion Transports in Machine Learning

机器学习中的最优输运与扩散输运

Gabriel Peyré

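AI summary: This survey covers two transport approaches in machine learning, diffusion methods and optimal transport, which design the evolution of probability distributions from a Lagrangian viewpoint, with applications to sampling, neural network optimization, and modeling the dynamics of large language models.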

详情
Comments
Proc. 2026 International Congress of Mathematicians
AI中文摘要

机器学习中的若干问题自然地表述为随时间演化的概率分布的设计与分析。这包括通过扩散方法进行采样、优化神经网络的权重,以及分析大语言模型各层中令牌分布的演化。尽管目标应用不同(样本、权重、令牌),它们的数学描述共享一个共同结构。一个关键思想是通过平流粒子的向量场,从密度的欧拉表示转换到其拉格朗日对应。这种双重观点带来了挑战,特别是拉格朗日向量场的非唯一性,但也提供了机会,以构造在正则性、稳定性和计算可行性方面具有有利性质的密度演化和流。本综述概述了这些方法,重点介绍两种互补方法:扩散方法,它依赖于随机插值过程并支撑现代生成式AI;以及最优输运,它通过最小化位移成本来定义插值。我们说明了这两种方法如何出现在从采样、神经网络优化到建模大语言模型Transformer动力学的应用中。

英文摘要

Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. This includes sampling via diffusion methods, optimizing the weights of neural networks, and analyzing the evolution of token distributions across layers of large language models. While the targeted applications differ (samples, weights, tokens), their mathematical descriptions share a common structure. A key idea is to switch from the Eulerian representation of densities to their Lagrangian counterpart through vector fields that advect particles. This dual view introduces challenges, notably the non-uniqueness of Lagrangian vector fields, but also opportunities to craft density evolutions and flows with favorable properties in terms of regularity, stability, and computational tractability. This survey presents an overview of these methods, with emphasis on two complementary approaches: diffusion methods, which rely on stochastic interpolation processes and underpin modern generative AI, and optimal transport, which defines interpolation by minimizing displacement cost. We illustrate how both approaches appear in applications ranging from sampling and neural network optimization to modeling the dynamics of transformers for large language models.

2512.02019 2026-05-28 cs.LG cs.AI stat.ML

Diffusion-Augmented Markov Decision Processes for Maximum Entropy Reinforcement Learning

扩散增强马尔可夫决策过程用于最大熵强化学习

Sebastian Sanokowski, Kaustubh Patil

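AI summary: By extending maximum entropy reinforcement learning to diffusion processes, this paper proposes Diffusion-Augmented Markov Decision Processes (DA-MDPs), which learn the optimal policy trajectory distribution by minimizing an upper bound on the reverse KL divergence; PPO, WPO, and REPPO are adapted into diffusion variants that match or outperform baselines on continuous-control and multimodal benchmarks.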

详情
Comments
Preprint
AI中文摘要

扩散模型擅长从复杂的非归一化分布中采样。在这项工作中,我们将最大熵强化学习(ME-RL)扩展到扩散过程,从而能够从最优策略轨迹分布中采样。通过最小化扩散策略与最优策略轨迹分布之间的反向KL散度的可处理上界,我们推导出一个修改后的替代目标,并引入了扩散增强马尔可夫决策过程(DA-MDPs)。DA-MDPs允许将扩散策略无缝集成到任何ME-RL方法中,只需最小的修改。我们通过将近端策略优化(PPO)、Wasserstein策略优化(WPO)和相对熵路径策略优化(REPPO)适配为其基于扩散的变体:DA-MDP: PPO、DA-MDP: WPO和DA-MDP: REPPO,证明了其有效性。在标准连续控制基准上的实验结果表明,我们的方法匹配或优于基线方法,而在多模态基准上的实验证实了其建模多模态动作分布的能力。

英文摘要

Diffusion models excel at sampling from complex, unnormalized distributions. In this work, we extend Maximum Entropy Reinforcement Learning (ME-RL) to diffusion processes, enabling sampling from the optimal policy trajectory distribution. By minimizing a tractable upper bound on the reverse KL divergence between the diffusion policy and the optimal policy trajectory distributions, we derive a modified surrogate objective and introduce Diffusion-Augmented Markov Decision Processes (DA-MDPs). DA-MDPs allow for seamless integration of diffusion policies into any ME-RL method with minimal modifications. We demonstrate its effectiveness by adapting Proximal Policy Optimization (PPO), Wasserstein Policy Optimization (WPO), and Relative Entropy Pathwise Policy Optimization (REPPO) into their diffusion-based variants: DA-MDP: PPO, DA-MDP: WPO, and DA-MDP: REPPO. Empirical results on standard continuous-control benchmarks show that our approach matches or outperforms baseline methods, while experiments on multimodal benchmarks confirm its ability to model multimodal action distributions.

2505.24587 2026-05-28 quant-ph math.ST stat.ML stat.TH

Sample-optimal learning of quantum states using gentle measurements

使用温和测量的量子态样本最优学习

Cristina Butucea, Jan Johannes, Henning Stein

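AI summary: Introduces the class of α-locally-gentle measurements, proves a strong quantum data-processing inequality via an improved relation between gentleness and quantum differential privacy, uses it to derive sample-complexity lower bounds for quantum tomography and quantum state certification, and proposes the quantum Label Switch method, which attains these bounds.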

详情
AI中文摘要

量子态的温和测量不会完全坍缩初始态。相反,它们提供一个与初始态处于规定迹距离$α$的测量后态,以及一个用于量子学习初始态的随机变量。我们在此引入有限维量子系统上的$α$-局部温和测量($α-$LGM)类,它们是乘积态上的乘积测量,并通过改进温和性与量子差分隐私之间的关系,证明了该类上的强量子数据处理不等式(qDPI)。我们进一步展示了一个温和的量子Neyman-Pearson引理,它意味着我们的qDPI是渐近最优的(对于小$α$)。该不等式用于表明,对于量子层析和量子态认证,达到规定精度$ε$所需的量子态数量级为$1/(ε^2 α^2)$。最后,我们提出了一种称为量子标签切换的$α-$LGM,它达到了这些界。这是一种通用的可实现方法,可将任何双输出测量转化为$α-$LGM。

英文摘要

Gentle measurements of quantum states do not entirely collapse the initial state. Instead, they provide a post-measurement state at a prescribed trace distance $α$ from the initial state together with a random variable used for quantum learning of the initial state. We introduce here the class of $α-$locally-gentle measurements ($α-$LGM) on a finite dimensional quantum system which are product measurements on product states and prove a strong quantum Data-Processing Inequality (qDPI) on this class using an improved relation between gentleness and quantum differential privacy. We further show a gentle quantum Neyman-Pearson lemma which implies that our qDPI is asymptotically optimal (for small $α$). This inequality is employed to show that the necessary number of quantum states for prescribed accuracy $ε$ is of order $1/(ε^2 α^2)$ for both quantum tomography and quantum state certification. Finally, we propose an $α-$LGM called quantum Label Switch that attains these bounds. It is a general implementable method to turn any two-outcome measurement into an $α-$LGM.

2511.17291 2026-05-28 math.ST stat.TH

Properties of stepwise parameter estimation in high-dimensional vine copulas

高维藤蔓Copula中逐步参数估计的性质

Jana Gauss, Thomas Nagler

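AI summary: For high-dimensional vine copulas whose number of parameters is of the same order as the sample size, establishes consistency and asymptotic normality of the stepwise maximum likelihood estimator and examines the effects of truncated vines and of estimating the marginal distributions.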

详情
AI中文摘要

在高维设置中,藤蔓Copula的使用日益增多,其参数数量通常与样本量同阶,这要求超越传统固定$p$、大$n$框架的渐近理论。我们建立了当参数数量随$n \to \infty$发散时,藤蔓Copula的逐步最大似然估计的相合性和渐近正态性。我们的理论结果涵盖了边际分布的参数和非参数估计,以及截断藤蔓,并且也适用于一般的估计问题,特别是其他序贯过程。数值实验表明,如果较高树中的对Copula足够快地收敛到独立Copula,则所导出的假设成立。模拟研究证实了这些发现,并识别了估计变得具有挑战性的设置。特别是,藤蔓结构强烈影响估计精度,D-藤蔓比C-藤蔓更难估计,且Gumbel藤蔓中的估计偏差显著大于高斯藤蔓中的偏差。

英文摘要

The increasing use of vine copulas in high-dimensional settings, where the number of parameters is often of the same order as the sample size, calls for asymptotic theory beyond the traditional fixed-$p$, large-$n$ framework. We establish consistency and asymptotic normality of the stepwise maximum likelihood estimator for vine copulas when the number of parameters diverges as $n \to \infty$. Our theoretical results cover both parametric and nonparametric estimation of the marginal distributions, as well as truncated vines, and are also applicable to general estimation problems, particularly other sequential procedures. Numerical experiments suggest that the derived assumptions are satisfied if the pair copulas in higher trees converge to independence copulas sufficiently fast. A simulation study substantiates these findings and identifies settings in which estimation becomes challenging. In particular, the vine structure strongly affects estimation accuracy, with D-vines being more difficult to estimate than C-vines, and estimates in Gumbel vines exhibit substantially larger biases than those in Gaussian vines.

2411.13479 2026-05-28 stat.ML cs.LG stat.AP

Conformal Prediction for Hierarchical Data

分层数据的共形预测

Guillaume Principato, Gilles Stoltz, Yvenn Amara-Ouali, Yannig Goude, Bachir Hamrouche, Jean-Michel Poggi

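AI summary: For hierarchical data, adds a projection (reconciliation) step to split conformal prediction, obtaining smaller prediction regions under both joint and component-wise coverage, with a proof that the regions are globally smaller.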

详情
Comments
39 pages, 4 figures
AI中文摘要

我们考虑多元数据的共形预测,并重点关注分层数据,其中某些分量是其他分量的线性组合。直观上,可以利用分层结构来减小相同覆盖水平下的预测区域大小。我们通过在分裂共形预测(SCP)过程中引入投影步骤(也称为协调步骤)来实现这一直觉,并证明所得预测区域确实全局更小。我们既在经典的联合覆盖目标下,也在一个新的且具有挑战性的任务——分量覆盖下(该任务下效率结果更难获得)做到了这一点。相关策略及其分析基于我们连接的SCP和预测协调文献。我们还通过模拟数据在不同规模的分层结构上展示了理论发现。

英文摘要

We consider conformal prediction for multivariate data and focus on hierarchical data, where some components are linear combinations of others. Intuitively, the hierarchical structure can be leveraged to reduce the size of prediction regions for the same coverage level. We implement this intuition by including a projection step (also called a reconciliation step) in the split conformal prediction (SCP) procedure, and prove that the resulting prediction regions are indeed globally smaller. We do so both under the classic objective of joint coverage and under a new and challenging task: component-wise coverage, for which efficiency results are more difficult to obtain. The associated strategies and their analyses are based both on the literature of SCP and of forecast reconciliation, which we connect. We also illustrate the theoretical findings on simulated data for different scales of hierarchies.
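
As background for the SCP procedure mentioned in this abstract, the sketch below shows the standard split conformal step for a single scalar target: take the absolute residuals on a held-out calibration set and use their ceil((n + 1)(1 - alpha))-th smallest value as the interval half-width. It covers only plain SCP. The projection (reconciliation) step that the paper adds for hierarchical data is not shown, and the residuals and the function name `split_conformal_halfwidth` are hypothetical.

```python
import math


def split_conformal_halfwidth(calibration_residuals, alpha):
    """Half-width of a split conformal interval with target coverage 1 - alpha,
    computed from residuals (y - prediction) on a calibration set."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


# Hypothetical calibration residuals with absolute values 1, 2, ..., 19.
residuals = [(-1) ** i * float(i) for i in range(1, 20)]
halfwidth = split_conformal_halfwidth(residuals, alpha=0.1)

point_prediction = 50.0  # hypothetical forecast for a new observation
interval = (point_prediction - halfwidth, point_prediction + halfwidth)
print(f"half-width = {halfwidth}, interval = {interval}")
```

With 19 calibration points and alpha = 0.1, the rank is ceil(20 x 0.9) = 18, so the half-width is the 18th smallest absolute residual, here 18.0.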

2510.22016 2026-05-28 cs.LG stat.ML

Cost-Sensitive Evaluation for Binary Classifiers

二分类器的代价敏感评估

Pierangelo Lombardo, Antonio Casoli, Cristian Cingolani, Shola Oshodi, Michele Zanatta

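AI summary: To address the mismatch between classifier evaluation and Total Classification Cost (TCC) minimization, proposes the Weighted Accuracy (WA) metric and a general reweighting framework, proves that maximizing WA is equivalent to minimizing TCC under example-independent costs, and shows that WA remains robust across class-imbalance and cost scenarios.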

详情
Comments
24 pages, 5 figures
AI中文摘要

为分类器选择合适的评估指标对于模型比较、参数优化和部署决策至关重要,但目前尚无广泛接受的、明确与总分类代价(TCC)最小化一致的评估范式。同时,类别不平衡常被视为需要修正的问题本身,可能导致与TCC最小化的不一致。为解决这些局限,(i)我们定义了加权准确率(WA),一种对二分类器的评估指标,其直观解释为准确率的加权版本;(ii)我们提出了一个通用的重加权框架,用于处理代价敏感场景中的类别不平衡,为重采样技术提供了替代方案。该框架适用于任何可表示为示例相关量的线性组合的评估指标或损失函数;它能够有意义地比较在不同数据集上获得的评估结果,并考虑用于训练、验证和测试的“开发”数据集与模型将部署的“目标”数据集之间的差异。在该框架内,我们推导了标准重平衡技术与TCC最小化保持一致的条件,以及它们可能变得具有误导性的情况。我们证明,在示例无关的单位分类代价下,最大化WA等价于最小化TCC。最后,我们通过研究WA与TCC在广泛的类别不平衡和代价机制下的相关性,分析了WA在现实示例相关代价场景中的鲁棒性。结果表明,在几乎所有考察的场景中,WA与TCC保持稳健的对齐。

英文摘要

Selecting an appropriate evaluation metric for classifiers is crucial for model comparison, parameter optimization, and deployment decisions, yet there is no consensus on a broadly accepted evaluation paradigm explicitly aligned with Total Classification Cost (TCC) minimization. At the same time, class imbalance is often treated as a problem to be corrected \emph{per se}, potentially causing misalignments with TCC minimization. To address these limitations, (\emph{i}) we define Weighted Accuracy (WA), an evaluation metric for binary classifiers with a straightforward interpretation as a weighted version of accuracy and (\emph{ii}) we propose a general reweighting framework for handling class imbalance in cost-sensitive scenarios, providing an alternative to resampling techniques. This framework applies to any evaluation metric or loss function that can be expressed as a linear combination of example-dependent quantities; it enables meaningful comparison of evaluation results obtained on different datasets and accounts for discrepancies between the \emph{development} dataset, used for training, validation, and testing, and the \emph{target} dataset, where the model will be deployed. Within this framework, we derive the conditions under which standard rebalancing techniques remain coherent with TCC minimization, and when they may instead become misleading. We prove that, under example-independent Unit Classification Costs, maximizing WA is equivalent to minimizing TCC. Finally, we analyze the robustness of WA in realistic example-dependent cost scenarios by studying its correlation with TCC across a broad range of class imbalance and cost regimes. The results show that WA maintains robust alignment with TCC across almost all examined scenarios.
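
To make the claimed link between a weighted accuracy and TCC concrete, the sketch below uses one natural cost-weighted accuracy: positives are weighted by the false-negative cost and negatives by the false-positive cost. This particular formula is an assumption chosen for illustration and may differ from the paper's definition of WA. The labels, predictions, and costs are invented. Under this form, WA = 1 - TCC / D, where D depends only on the dataset and the costs, so ranking classifiers by higher WA is the same as ranking them by lower TCC.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def total_classification_cost(counts, cost_fp, cost_fn):
    """TCC with example-independent unit costs for the two error types."""
    _, fp, _, fn = counts
    return cost_fp * fp + cost_fn * fn


def weighted_accuracy(counts, cost_fp, cost_fn):
    """Illustrative cost-weighted accuracy (an assumed form, see the lead-in)."""
    tp, fp, tn, fn = counts
    numerator = cost_fn * tp + cost_fp * tn
    denominator = cost_fn * (tp + fn) + cost_fp * (tn + fp)
    return numerator / denominator


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # hypothetical labels
pred_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # one miss, one false alarm
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # no misses, two false alarms
cost_fp, cost_fn = 1, 5                   # hypothetical unit costs

counts_a = confusion_counts(y_true, pred_a)
counts_b = confusion_counts(y_true, pred_b)
tcc_a = total_classification_cost(counts_a, cost_fp, cost_fn)
tcc_b = total_classification_cost(counts_b, cost_fp, cost_fn)
wa_a = weighted_accuracy(counts_a, cost_fp, cost_fn)
wa_b = weighted_accuracy(counts_b, cost_fp, cost_fn)

print(f"A: TCC={tcc_a}, WA={wa_a:.4f}")
print(f"B: TCC={tcc_b}, WA={wa_b:.4f}")
```

Both classifiers make two errors and therefore have the same plain accuracy (0.8), yet classifier B has the lower cost and the higher weighted accuracy, because a miss costs five times as much as a false alarm in this example.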

2510.15839 2026-05-28 cs.LG econ.EM stat.ML

Learning Correlated Reward Models: Statistical Barriers and Opportunities

学习相关奖励模型:统计障碍与机遇

Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Gabriele Farina, Sobhan Mohammadpour

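AI summary: Studies the statistical and computational challenges of correlated probit models, which avoid the IIA assumption, proving that pairwise preference data are insufficient for learning correlations whereas best-of-three preference data allow near-optimal estimation.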

详情
Comments
International Conference on Learning Representations (ICLR) 2026
AI中文摘要

随机效用模型(RUM)是建模用户偏好的经典框架,并在基于人类反馈的强化学习(RLHF)的奖励建模中发挥关键作用。然而,这些技术的一个关键缺陷是无关选项独立性(IIA)假设,该假设将所有人类偏好归结为单一的潜在效用函数,从而对人类偏好范围进行了粗略近似。另一方面,避免这一假设的模型的统计和计算保证很少。在本文中,我们研究了学习相关probit模型的统计和计算挑战,这是一种避免IIA假设的基本RUM。首先,我们确定了成对偏好数据的经典数据收集范式从根本上不足以学习相关性信息,这解释了该设置下缺乏统计和计算保证的原因。接下来,我们证明了三选一偏好数据可证明地克服了这些缺陷,并设计了一个统计和计算上高效的估计器,具有近最优性能。这些结果突显了高阶偏好数据在学习相关效用中的优势,从而允许对人类偏好进行更精细的建模。最后,我们在几个真实世界数据集上验证了这些理论保证,展示了人类偏好的改进个性化。

英文摘要

Random Utility Models (RUMs) are a classical framework for modeling user preferences and play a key role in reward modeling for Reinforcement Learning from Human Feedback (RLHF). However, a crucial shortcoming of many of these techniques is the Independence of Irrelevant Alternatives (IIA) assumption, which collapses \emph{all} human preferences to a universal underlying utility function, yielding a coarse approximation of the range of human preferences. On the other hand, statistical and computational guarantees for models avoiding this assumption are scarce. In this paper, we investigate the statistical and computational challenges of learning a \emph{correlated} probit model, a fundamental RUM that avoids the IIA assumption. First, we establish that the classical data collection paradigm of pairwise preference data is \emph{fundamentally insufficient} to learn correlational information, explaining the lack of statistical and computational guarantees in this setting. Next, we demonstrate that \emph{best-of-three} preference data provably overcomes these shortcomings, and devise a statistically and computationally efficient estimator with near-optimal performance. These results highlight the benefits of higher-order preference data in learning correlated utilities, allowing for more fine-grained modeling of human preferences. Finally, we validate these theoretical guarantees on several real-world datasets, demonstrating improved personalization of human preferences.

2509.22553 2026-05-28 stat.ML cs.LG

Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement

通过拓扑排序、剪枝和解缠的线性因果表示学习

Hao Chen, Lin Liu, Yu Guang Wang

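AI summary: Proposes a new algorithm that recovers linear causal representations under weaker assumptions via topological ordering, pruning, and disentanglement, validated through synthetic experiments and an interpretability analysis of large language models.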

详情
AI中文摘要

因果表示学习(CRL)因其能够利用现代数据集的异质性,将复杂的数据生成机制解缠为因果可解释的潜在特征,而日益引起因果推断和人工智能领域的兴趣。本文进一步为CRL文献做出贡献,专注于潜在特征上的风格化线性结构因果模型,并假设一个线性混合函数将潜在特征映射到观测数据或测量值。现有的线性CRL方法通常依赖于严格假设,例如访问单节点干预数据或对潜在特征和/或外生测量噪声施加限制性分布约束。然而,这些先决条件在实践中容易违反。在这项工作中,我们提出了一种新颖的线性CRL算法,与现有方法不同,它在对环境异质性和数据生成分布的更弱假设下运行,同时仍然能够恢复潜在因果特征直至等价类。我们通过合成实验和大语言模型的可解释性分析进一步验证了我们的新算法,展示了其在有限样本下优于竞争方法的性能,以及将因果性融入理解人工智能的潜力。源代码可在https://github.com/utulie/code_for_linear_crl_paper_creator获取。

英文摘要

Causal representation learning (CRL) has garnered increasing interest from the causal inference and artificial intelligence communities due to its potential to disentangle complex data-generating mechanisms into causally interpretable latent features by leveraging the heterogeneity of modern datasets. In this paper, we further contribute to the CRL literature by focusing on the stylized linear structural causal model over latent features and assuming a linear mixing function that maps latent features to the observed data or measurements. Existing linear CRL methods often rely on stringent assumptions, such as access to single-node interventional data or restrictive distributional constraints on latent features and/or exogenous measurement noise. However, these prerequisites can be easy to violate in practice. In this work, we propose a novel linear CRL algorithm that, unlike existing methods, operates under weaker assumptions on environment heterogeneity and data-generating distributions while still recovering latent causal features up to an equivalence class. We further validate our new algorithm via synthetic experiments and an interpretability analysis of large language models, demonstrating both its superiority over competing methods in finite samples and its potential in integrating causality into understanding artificial intelligence. The source code is available at https://github.com/utulie/code_for_linear_crl_paper_creator.

2509.22446 2026-05-28 stat.ME math.ST stat.ML stat.TH

Rescuing double robustness: safe estimation under complete misspecification

拯救双重稳健性:完全错误设定下的安全估计

Lorenzo Testa, Francesca Chiaromonte, Kathryn Roeder

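AI summary: To address the fragility of doubly robust estimators under complete misspecification, proposes a safe estimation method based on adaptive correction clipping (DR+ACC) whose error is bounded by a convex combination of the individual nuisance model errors, while preserving semiparametric efficiency.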

详情
Comments
23 pages, 4 figures
AI中文摘要

双重稳健性是半参数和缺失数据方法论的主要卖点。其优点在于部分干扰错误设定下的保护以及在正确干扰设定下的渐近半参数效率。然而,在许多应用中,完全干扰错误设定应被视为常态(或至少是预期的默认情况),因此双重稳健估计可能表现脆弱。事实上,大量实证表明,当所有干扰函数被错误设定时,这些估计的表现可能很差。本文首先刻画了这种双重脆弱现象,然后提出了一种基于自适应修正裁剪(DR+ACC)的解决方案。我们认为,我们的DR+ACC方案是安全的,因为它继承了双重稳健估计在正确干扰设定下的优良性质,但其误差保证被限制在单个干扰模型误差的凸组合内,从而防止了双重稳健估计误差乘积复合导致的不稳定性。我们还表明,与双重稳健估计相比,我们的方案在半参数效率上没有降低,因此在干扰被正确设定时,可以基于渐近正态性进行有效的推断。通过大量模拟以及将其应用于阿尔茨海默病蛋白质组学数据分析,我们展示了DR+ACC估计的有效性。

英文摘要

Double robustness is a major selling point of semiparametric and missing data methodology. Its virtues lie in protection against partial nuisance misspecification and asymptotic semiparametric efficiency under correct nuisance specification. However, in many applications, complete nuisance misspecification should be regarded as the norm (or at the very least the expected default), and thus doubly robust estimators may behave fragilely. In fact, it has been amply verified empirically that these estimators can perform poorly when all nuisance functions are misspecified. Here, we first characterize this phenomenon of double fragility, and then propose a solution based on adaptive correction clipping (DR+ACC). We argue that our DR+ACC proposal is safe, in that it inherits the favorable properties of doubly robust estimators under correct nuisance specification, but its error is guaranteed to be bounded by a convex combination of the individual nuisance model errors, which prevents the instability caused by the compounding product of errors of doubly robust estimators. We also show that our proposal comes with no reduction in semiparametric efficiency compared to doubly robust estimators, and thus valid inference based on asymptotic normality can be conducted when nuisances are well-specified. We showcase the efficacy of our DR+ACC estimator both through extensive simulations and by applying it to the analysis of Alzheimer's disease proteomics data.

2012.02985 2026-05-28 math.ST stat.ME stat.TH

Selecting the number of components in PCA via random signflips

通过随机符号翻转选择PCA中的成分数量

David Hong, Yue Sheng, Edgar Dobriban

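AI summary: For selecting the number of components in principal component analysis under heterogeneous noise, proposes signflip parallel analysis (FlipPA), based on random sign flips, and proves that it has nonasymptotic type I error control and consistent selection.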

详情
Comments
54 pages, 22 figures
AI中文摘要

主成分分析(PCA)是现代数据分析中的基础工具,其中关键步骤是选择保留的成分数量。然而,在日益常见的高维异质噪声数据(即每个条目可能具有不同的噪声方差)设置下,经典选择方法(如碎石图、并行分析等)缺乏统计保证。此外,这些在均匀噪声下非常有效的方法在处理异质噪声数据时可能严重失效。本文针对近似对称噪声设置提出了一种新方法,称为符号翻转并行分析(FlipPA):它将数据奇异值与通过以概率1/2随机翻转每个条目符号生成的“经验零”矩阵的奇异值进行比较。我们为FlipPA建立了严格的理论,证明其具有非渐近第一类错误控制,并且在大维极限下(即使噪声异质)能一致选择高于噪声底限的信号的正确秩。我们还严格解释了为什么经典的基于置换的并行分析在异质噪声下性能下降。最后,通过数值模拟和来自天文学数据的示例,我们说明FlipPA与最先进方法相比具有优势。

英文摘要

Principal component analysis (PCA) is a foundational tool in modern data analysis, and a crucial step in PCA is selecting the number of components to keep. However, classical selection methods (e.g., scree plots, parallel analysis, etc.) lack statistical guarantees in the increasingly common setting of large-dimensional data with heterogeneous noise, i.e., where each entry may have a different noise variance. Moreover, it turns out that these methods, which are highly effective for homogeneous noise, can fail dramatically for data with heterogeneous noise. This paper proposes a new method called signflip parallel analysis (FlipPA) for the setting of approximately symmetric noise: it compares the data singular values to those of "empirical null" matrices generated by flipping the sign of each entry randomly with probability one-half. We develop a rigorous theory for FlipPA, showing that it has nonasymptotic type I error control and that it consistently selects the correct rank for signals rising above the noise floor in the large-dimensional limit (even when the noise is heterogeneous). We also rigorously explain why classical permutation-based parallel analysis degrades under heterogeneous noise. Finally, we illustrate that FlipPA compares favorably to state-of-the-art methods via numerical simulations and an illustration on data coming from astronomy.
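The signflip idea in the abstract above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `flippa_rank`, the number of null draws, the quantile, and the stopping rule (compare the k-th data singular value with the k-th null singular value and stop at the first failure, as in classical parallel analysis) are all assumptions made here.

```python
import numpy as np


def flippa_rank(X, n_trials=20, quantile=1.0, seed=0):
    """Estimate the number of components by signflip parallel analysis (sketch)."""
    rng = np.random.default_rng(seed)
    data_sv = np.linalg.svd(X, compute_uv=False)

    # "Empirical null" matrices: every entry keeps its magnitude, but its
    # sign is flipped independently with probability one-half.
    null_sv = np.empty((n_trials, data_sv.size))
    for t in range(n_trials):
        signs = rng.choice([-1.0, 1.0], size=X.shape)
        null_sv[t] = np.linalg.svd(signs * X, compute_uv=False)

    # Assumed selection rule: the k-th data singular value must exceed the
    # chosen quantile of the k-th null singular value; stop at the first failure.
    thresholds = np.quantile(null_sv, quantile, axis=0)
    rank = 0
    for sv, thr in zip(data_sv, thresholds):
        if sv <= thr:
            break
        rank += 1
    return rank
```

Because the sign flips preserve each entry's magnitude, the null matrices inherit the entrywise noise scale of the data, which is the property the abstract contrasts with permutation-based parallel analysis.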

2502.14498 2026-05-28 math.ST stat.TH

Nadaraya-Watson Type Estimator of the Transition Density Function for Diffusion Processes

扩散过程转移密度函数的Nadaraya-Watson型估计量

Nicolas Marie, Ousmane Sacko

AI总结 本文针对扩散过程的独立连续观测,提出非参数Nadaraya-Watson型转移密度估计量,建立其风险界,并扩展惩罚比较过拟合带宽选择方法,最后提供数值实验。

详情
Comments
24 pages, 4 figures
AI中文摘要

本文研究基于扩散过程独立连续观测计算的转移密度函数的非参数Nadaraya-Watson (NW)估计量,并建立了该估计量的风险界。本文还将惩罚比较过拟合(penalized comparison to overfitting)带宽选择方法推广到我们的NW估计量。最后,提供了数值实验。

英文摘要

This paper deals with a nonparametric Nadaraya-Watson (NW) estimator of the transition density function computed from independent continuous observations of a diffusion process. A risk bound is established on this estimator. The paper also deals with an extension of the penalized comparison to overfitting bandwidths selection method for our NW estimator. Finally, numerical experiments are provided.
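For orientation, a generic Nadaraya-Watson-type conditional density estimate can be sketched as follows. This is a simplified, discretely sampled stand-in and not the paper's estimator, which is built from continuous observations: the names `gaussian_kernel` and `nw_transition_density`, the Gaussian kernel, the single shared bandwidth `h`, and the use of one (start, end) pair per independent copy are all assumptions.

```python
import numpy as np


def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h, evaluated at u."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))


def nw_transition_density(x, y, starts, ends, h):
    """NW-type estimate of the density of the end point at y, given start x.

    starts[i], ends[i] are hypothetical (X_s, X_t) pairs, one per independent copy.
    """
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    weights = gaussian_kernel(x - starts, h)  # closeness of each start to x
    denom = weights.sum()
    if denom == 0.0:
        return 0.0  # no observation near x at numerical precision
    # Weighted average of kernels centred at the observed end points.
    return float(np.sum(weights * gaussian_kernel(y - ends, h)) / denom)
```

The ratio form (a joint-density-like numerator over a marginal-density-like denominator) is what makes the estimate of NW type; bandwidth selection, which the abstract addresses, is not covered here.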

2507.09787 2026-05-28 math.ST stat.TH

Fixed-Point Estimation of the Drift Parameter in Stochastic Differential Equations Driven by Rough Multiplicative Fractional Noise

粗糙乘性分数噪声驱动随机微分方程漂移参数的定点估计

Chiara Amorino, Laure Coutin, Nicolas Marie

AI总结 针对Hurst参数H∈(1/3,1)的乘性分数布朗噪声驱动的随机微分方程,提出基于Skorokhod积分的定点估计方法,通过Malliavin导数重公式化控制逼近误差,建立估计量的适定性、渐近置信区间和非渐近风险界。

详情
Comments
32 pages, 6 figures
AI中文摘要

我们研究了从由Hurst参数H∈(1/3,1)的乘性分数布朗噪声驱动的随机微分方程的N个独立副本中估计漂移参数的问题。基于涉及Skorokhod积分的最小二乘型对象,一个关键挑战在于用可计算的定点估计量逼近这个不可观测的量,这需要处理用路径积分替换Skorokhod积分所引起的修正。为此,本文的一个关键技术贡献是以不显式依赖于驱动噪声的方式重新表述过程的Malliavin导数,从而能够在乘性设置下控制逼近误差。对于H∈(1/3,1/2]的情况,我们进一步利用二维Young积分的结果来处理出现更复杂的修正项。结果,我们建立了对任意H∈(1/3,1)的定点估计量的适定性,以及渐近置信区间和非渐近风险界。最后,数值研究说明了所提估计量良好的实际性能。

英文摘要

We investigate the problem of estimating the drift parameter from $N$ independent copies of the solution of a stochastic differential equation driven by a multiplicative fractional Brownian noise with Hurst parameter $H\in (1/3,1)$. Building on a least-squares-type object involving the Skorokhod integral, a key challenge consists in approximating this unobservable quantity with a computable fixed-point estimator, which requires addressing the correction induced by replacing the Skorokhod integral with its pathwise counterpart. To this end, a crucial technical contribution of this work is the reformulation of the Malliavin derivative of the process in a way that does not depend explicitly on the driving noise, enabling control of the approximation error in the multiplicative setting. For the case $H\in (1/3,1/2]$, we further exploit results on two-dimensional Young integrals to manage the more intricate correction term that appears. As a result, we establish the well-posedness of a fixed-point estimator for any $H\in (1/3,1)$, together with both an asymptotic confidence interval and a non-asymptotic risk bound. Finally, a numerical study illustrates the good practical performance of the proposed estimator.

2506.08928 2026-05-28 cs.LG stat.ME stat.ML

Local MDI+: Local Feature Importances for Tree-Based Models

Local MDI+: 基于树的模型的局部特征重要性

Zhongyuan Liang, Zachary T. Rewolinski, Abhineet Agarwal, Tiffany M. Tang, Bin Yu

AI总结 提出Local MDI+ (LMDI+)方法,通过扩展MDI+框架到局部特征重要性,在12个基准数据集上平均提升10%的预测性能,并展现出更高的稳定性和可解释性。

详情
AI中文摘要

基于树的集成方法(如随机森林)由于其预测性能和计算效率,在表格数据上仍然是深度学习模型的首选。这些优势使其在高风险领域得到广泛应用,在这些领域中,可解释性对于确保可信预测至关重要。这推动了流行的局部特征重要性方法(如LIME和TreeSHAP)的发展。然而,这些方法依赖于忽略模型内部结构的近似,并依赖于可能不稳定的扰动。这些问题在全局设置中通过MDI+得到解决,MDI+是一种全局特征重要性方法,它通过利用决策树和最小二乘法在变换后的节点基上的等价性,结合了基于树和线性的特征重要性。然而,全局MDI+分数无法在面临异质个体特征时解释预测。为了解决这一差距,我们提出了Local MDI+ (LMDI+),这是MDI+框架的一种新颖扩展,用于量化每个特定样本的特征重要性。在12个真实世界基准数据集上,LMDI+在识别实例特定的预测特征方面优于现有基线,仅使用所选特征时平均预测性能提升10%。它进一步展现出更高的稳定性,在不同随机种子的重复模型拟合中,一致地产生相似的实例级特征重要性排名。消融实验表明,LMDI+的每个组件都对这一提升有贡献,并且这些改进不仅限于随机森林,还扩展到梯度提升模型。最后,我们展示了LMDI+通过为每个分类基准识别紧密匹配的反事实案例,以及在住房数据集案例研究中发现同质子群,从而实现了局部可解释性的用例。

英文摘要

Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local feature importance methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model's internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a global feature importance method which combines tree-based and linear feature importances by exploiting an equivalence between decision trees and least squares on a transformed node basis. However, the global MDI+ scores are not able to explain predictions when faced with heterogeneous individual characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework that quantifies feature importances for each particular sample. Across twelve real-world benchmark datasets, LMDI+ outperforms existing baselines at identifying instance-specific predictive features, yielding an average 10% improvement in predictive performance when using only the selected features. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across repeated model fits with different random seeds. Ablation experiments show that each component of LMDI+ contributes to these gains, and that the improvements extend beyond random forests to gradient boosting models. Finally, we show that LMDI+ enables local interpretability use cases by identifying closely matched counterfactuals for each classification benchmark and discovering homogeneous subgroups in a housing dataset case study.

2506.04948 2026-05-28 math.OC stat.ML

Unregularized limit of stochastic gradient method for Wasserstein distributionally robust optimization

Wasserstein分布鲁棒优化的随机梯度方法的非正则化极限

Tam Le

AI总结 研究通过熵平滑近似Wasserstein分布鲁棒优化问题,证明正则化参数趋于零时近似梯度收敛到非正则化目标的次梯度,并给出随机梯度方法的收敛保证和速率。

详情
AI中文摘要

Wasserstein分布鲁棒优化为机器学习中数据分布潜在偏移下的模型拟合提供了一个框架。我们研究了该问题的正则化变体,其中熵平滑产生了原始目标的一个采样近似。我们证明了当正则化参数消失时,近似梯度收敛到非正则化目标的次梯度,从而为随机梯度方法提供了收敛保证。我们在一般假设下获得了定性收敛结果,然后在额外正则性下提供了收敛速率。特别地,我们证明了当正则化水平随迭代次数降低时,非正则化目标值(直至采样误差)的收敛速率。我们的分析产生了独立感兴趣的副产品,包括最大值函数次微分平滑的近似结果以及Wasserstein分布鲁棒优化对偶解的经验下界。

英文摘要

Wasserstein distributionally robust optimization offers a framework for model fitting in machine learning under potential shifts in the data distribution. We study a regularized variant of this problem in which entropic smoothing produces a sampled approximation of the original objective. We establish convergence of the approximate gradients to subgradients of the unregularized objective as the regularization parameter vanishes, enabling convergence guarantees for stochastic gradient methods. We obtain qualitative convergence results under general assumptions, then we provide convergence rates under additional regularity. In particular, we prove rates for the convergence of the unregularized objective values, up to sampling errors, when the regularization level is decreased across iterations. Our analysis yields byproducts of independent interest, including approximation results for smoothing of maximum functions subdifferentials and empirical lower bounds for dual solutions of Wasserstein distributionally robust optimization.

2505.09861 2026-05-28 cs.LG cs.AI cs.IR stat.ME

LiDDA: Data Driven Attribution at LinkedIn

LiDDA:领英的数据驱动归因

John Bencina, Erkut Aykutlug, Yue Chen, Zerui Zhang, Stephanie Sorenson, Shao Tang, Changshuai Wei

AI总结 提出一种基于Transformer的统一归因方法,处理成员级、聚合级数据和外部宏观因素,并在领英大规模实施,显著提升营销效果。

详情
AI中文摘要

数据驱动归因基于从数据中学习到的因果模式,将转化功劳分配给营销互动,是现代营销智能的基础,对任何营销业务和广告平台至关重要。本文介绍了一种统一的基于Transformer的归因方法,能够处理成员级数据、聚合级数据以及外部宏观因素的整合。我们详细描述了该方法在领英的大规模实施,展示了显著的影响。我们还分享了广泛适用于营销和广告技术领域的经验与见解。

英文摘要

Data Driven Attribution, which assigns conversion credits to marketing interactions based on causal patterns learned from data, is the foundation of modern marketing intelligence and vital to any marketing business and advertising platform. In this paper, we introduce a unified transformer-based attribution approach that can handle member-level data, aggregate-level data, and integration of external macro factors. We detail the large scale implementation of the approach at LinkedIn, showcasing significant impact. We also share learnings and insights which are broadly applicable to the marketing and ad tech fields.

2505.01987 2026-05-28 math.ST stat.TH

Sharp Empirical Bernstein Bounds for the Variance of Bounded Random Variables

有界随机变量方差的尖锐经验伯恩斯坦界

Diego Martinez-Taboada, Aaditya Ramdas

AI总结 针对有界随机变量方差,提出在常数条件方差和均值下无需独立同分布假设的尖锐经验伯恩斯坦不等式,适用于批量与序贯设定,渐近最优且可推广至希尔伯特空间。

详情
AI中文摘要

我们发展了有界随机变量方差的新型经验伯恩斯坦不等式。我们的不等式在常数条件方差和均值下成立,无需随机变量独立或同分布等进一步假设,使其适用于序贯决策场景。结果在批量设定(样本量固定)和序贯设定(样本量为停时)中均有实例化。我们的界是渐近尖锐的:当数据独立同分布时,我们的置信区间能最优地适应未知均值 $μ$ 和未知 $\mathbb{V}[(X-μ)^2]$,即置信区间的一阶项与已知这些量的预言机伯恩斯坦不等式的一阶项完全一致。我们将结果与基于自界随机变量的广泛使用的(非尖锐)方差集中不等式进行比较,展示了我们方法的理论优势和改进的经验性能。最后,我们将方法扩展到任意可分希尔伯特空间。

英文摘要

We develop novel empirical Bernstein inequalities for the variance of bounded random variables. Our inequalities hold under constant conditional variance and mean, without further assumptions like independence or identical distribution of the random variables, making them suitable for sequential decision making contexts. The results are instantiated for both the batch setting (where the sample size is fixed) and the sequential setting (where the sample size is a stopping time). Our bounds are asymptotically sharp: when the data are iid, our CI adapts optimally to both unknown mean $μ$ and unknown $\mathbb{V}[(X-μ)^2]$, meaning that the first order term of our CI exactly matches that of the oracle Bernstein inequality which knows those quantities. We compare our results to a widely used (non-sharp) concentration inequality for the variance based on self-bounding random variables, showing both the theoretical gains and improved empirical performance of our approach. We finally extend our methods to work in any separable Hilbert space.

2504.10092 2026-05-28 stat.ME cs.NA math.NA stat.CO

Bayesian optimal experimental design with Wasserstein information criteria

基于Wasserstein信息准则的贝叶斯最优实验设计

Tapio Helin, Youssef Marzouk, Jose Rodrigo Rojo-Garcia

AI总结 提出基于先验与后验分布间期望Wasserstein-p距离的贝叶斯设计准则(Wasserstein信息准则),证明其在线性高斯模型下的闭式解,并建立稳定性分析与误差界。

详情
Comments
28 pages, 5 figures
AI中文摘要

贝叶斯最优实验设计(OED)为选择观测或实验提供了一个原则性框架。我们基于先验分布与后验分布之间的期望Wasserstein-p距离引入了新的贝叶斯设计准则,称为Wasserstein信息准则。这些准则与广泛使用的期望信息增益(EIG)准则有许多相似之处,后者依赖于Kullback-Leibler散度。我们证明,在线性高斯设定下,Wasserstein-2准则具有闭式解,这一性质可用于更一般的近似方案,并将该解与经典贝叶斯字母最优性概念进行对比。然后,我们对Wasserstein-1准则进行稳定性分析,其中我们限定了由先验或似然扰动引起的误差。我们将这一分析部分扩展到Wasserstein-2准则。特别地,这些结果给出了先验经验近似的误差率。然后,我们通过模拟说明了Wasserstein-2准则的可计算性并展示了我们的近似速率。

英文摘要

Bayesian optimal experimental design (OED) provides a principled framework for selecting observations or experiments. We introduce new Bayesian design criteria based on the expected Wasserstein-$p$ distance between the prior and posterior distributions, termed Wasserstein information criteria. These criteria have many parallels with the widely used expected information gain (EIG) criterion, which instead relies on the Kullback--Leibler divergence. We show that the Wasserstein-$2$ criterion admits a closed-form solution in the linear-Gaussian setting, a property which can be used for more general approximation schemes, and contrast this solution with classical notions of Bayesian alphabetic optimality. Then we develop a stability analysis of the Wasserstein-$1$ criterion, wherein we bound errors induced by perturbations of the prior or likelihood. We partially extend this analysis to the Wasserstein-$2$ criterion. In particular, these results yield error rates for empirical approximations of the prior. We then illustrate the computability of the Wasserstein-$2$ criterion and demonstrate our approximation rates through simulations.
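To see why a closed form is plausible in the linear-Gaussian setting, the following derivation sketch may help. It is not taken from the paper: the notation ($G$, $\Gamma$, $m_0$, $\Sigma_0$) is assumed here, and the sketch works with the expected squared Wasserstein-2 distance, which may differ from the exact criterion the authors define.

```latex
% Wasserstein-2 distance between two Gaussians (standard result):
W_2^2\bigl(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\bigr)
  = \lVert m_1 - m_2 \rVert^2
  + \operatorname{tr}\!\Bigl(\Sigma_1 + \Sigma_2
      - 2\bigl(\Sigma_1^{1/2}\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\Bigr).

% Assumed model: y = G\theta + \varepsilon,\ \theta \sim \mathcal{N}(m_0,\Sigma_0),\
% \varepsilon \sim \mathcal{N}(0,\Gamma). The posterior covariance does not depend on y:
\Sigma_{\mathrm{post}} = \bigl(\Sigma_0^{-1} + G^{\top}\Gamma^{-1}G\bigr)^{-1}.

% By the law of total covariance,
% \mathbb{E}_y \lVert m_{\mathrm{post}}(y) - m_0 \rVert^2
%   = \operatorname{tr}(\Sigma_0 - \Sigma_{\mathrm{post}}), hence
\mathbb{E}_y\, W_2^2\bigl(\text{prior},\,\text{posterior}\bigr)
  = 2\operatorname{tr}(\Sigma_0)
  - 2\operatorname{tr}\!\Bigl(\bigl(\Sigma_0^{1/2}\,\Sigma_{\mathrm{post}}\,\Sigma_0^{1/2}\bigr)^{1/2}\Bigr).
```

The design enters only through $G$ (and possibly $\Gamma$), so under these assumptions the expectation over data reduces to a deterministic function of the design.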

2503.17531 2026-05-28 stat.ME

Bayesian Latent Class Regression with Interpretable Binary Profiles

具有可解释二元轮廓的贝叶斯潜在类别回归

Yuren Zhou, Yuqi Gu, David B. Dunson

AI总结 针对高维分类数据,提出一种引入二元潜在属性层的贝叶斯潜在类别回归模型(BLIP),满足可识别性和后验一致性,并具有贝叶斯最优聚类性质以应对维数灾难。

详情
AI中文摘要

高维分类数据出现在不同的科学领域,并且通常伴随有协变量。潜在类别回归模型通常用于此类设置,通过假设在给定依赖于协变量的单个潜在类别(通过逻辑回归模型)的条件下分类变量条件独立来降低维度。然而,随着维度增加,这些方法变得不可靠。为了解决这个问题,我们提出了具有可解释二元轮廓的贝叶斯潜在类别回归(BLIP),这是一个灵活的模型族,在协变量依赖的潜在类别和观测到的分类响应之间引入了一个二元潜在属性层。BLIP满足关键理论性质,包括可识别性和后验一致性,并且我们建立了一个贝叶斯最优聚类性质,确保对维数灾难的鲁棒性。我们开发了高效的后验计算方法,通过模拟研究验证了它们,并使用BLIP来推断生态数据中的常见轮廓区域。

英文摘要

High-dimensional categorical data arise in diverse scientific domains and are often accompanied by covariates. Latent class regression models are routinely used in such settings, reducing dimensionality by assuming conditional independence of the categorical variables given a single latent class that depends on covariates through a logistic regression model. However, such methods become unreliable as the dimensionality increases. To address this, we propose Bayesian latent class regression with interpretable binary profiles (BLIP), a flexible family of models that introduces a binary latent-attribute layer between the covariate-dependent latent class and the observed categorical responses. BLIP satisfies key theoretical properties, including identifiability and posterior consistency, and we establish a Bayes oracle clustering property that ensures robustness against the curse of dimensionality. We develop efficient posterior computation methods, validate them through simulation studies, and use BLIP to infer regions of common profile in ecological data.

2503.21643 2026-05-28 math.ST math.PR stat.TH

Wasserstein bounds for non-linear Gaussian filters

非线性高斯滤波器的Wasserstein界

Toni Karvonen, Simo Särkkä

AI总结 利用Poincaré不等式推导预测与测量联合分布与其高斯近似之间的Wasserstein距离上界,用于评估非线性高斯滤波器性能并识别易产生误差的滤波近似。

详情
Comments
To appear in IEEE Transactions on Automatic Control
AI中文摘要

大多数用于非线性系统的卡尔曼滤波器(如无迹卡尔曼滤波器)都基于高斯近似。我们利用Poincaré不等式来界定预测与测量的真实联合分布与其高斯近似之间的Wasserstein距离。这些界限可用于评估非线性高斯滤波器的性能,并确定最可能引起误差的滤波近似。

英文摘要

Most Kalman filters for non-linear systems, such as the unscented Kalman filter, are based on Gaussian approximations. We use Poincaré inequalities to bound the Wasserstein distance between the true joint distribution of the prediction and measurement and its Gaussian approximation. The bounds can be used to assess the performance of non-linear Gaussian filters and determine those filtering approximations that are most likely to induce error.

2406.06306 2026-05-28 eess.SP cs.IT math.IT math.ST stat.TH

Unified Fourier transform on graphs sampled from stochastic block models

从随机块模型采样的图上的统一傅里叶变换

Mahya Ghandehari, Jeannette Janssen, Silo Murphy

AI总结 提出一种基于图论的傅里叶变换方法,用于从随机块模型采样的图,通过块大小和块概率矩阵计算傅里叶基,并利用扰动理论分析基对块大小变化的敏感性。

详情
Comments
27 pages
AI中文摘要

最近,提出了一种基于图论的图信号处理方法。这里我们展示了如何将这种图论驱动的傅里叶变换方法用于从随机块模型(SBM)采样的图。特别地,我们展示了如何从块大小和块概率矩阵轻松计算傅里叶基。利用扰动理论,我们推导了基对块大小变化的敏感性界限。然后我们考虑由加权凯莱图构建的SBM。当块大小相等时,可以从底层群的表示论推导出一个良好的傅里叶基。当块大小近似均匀时,我们证明这个傅里叶基很好地逼近SBM傅里叶基。对于高度非均匀的块大小,基于群的傅里叶基不再适用,但正如我们所示,底层群仍然提供关于SBM傅里叶基的部分信息。

英文摘要

Recently, an approach to graph signal processing based on graphons was proposed. Here we show how such a graphon-driven approach to the Fourier transform can be used on graphs sampled from a stochastic block model (SBM). In particular, we show how a Fourier basis can be easily calculated from the block sizes and the block probability matrix. Using perturbation theory, we derive bounds on the sensitivity of the basis with respect to variations in the block sizes. We then consider SBMs constructed from weighted Cayley graphs. When block sizes are equal, a nice Fourier basis can be derived from the representation theory of the underlying group. When block sizes are nearly uniform, we demonstrate that this Fourier basis closely approximates the SBM Fourier basis. For highly non-uniform block sizes, the group-based Fourier basis is no longer applicable, though, as we show, the underlying group still provides partial information about the SBM Fourier basis.

2502.08695 2026-05-28 stat.ML cs.LG

A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection

马氏距离用于分布外检测的贝叶斯非参数视角

Randolph W. Linderman, Noah Cowan, Yiran Chen, Scott W. Linderman

AI总结 本文通过建立贝叶斯非参数模型与相对马氏距离评分(RMDS)之间的形式关系,提出具有分层先验的贝叶斯非参数混合模型来推广RMDS,并在OpenOOD基准上证明其在训练类协方差结构不同且每类数据点较少时优于现有方法。

详情
Journal ref
Transactions on Machine Learning Research (2026)
Comments
32 pages, 5 figures, code is available at https://github.com/rwl93/bnp4ood
AI中文摘要

贝叶斯非参数方法天然适用于分布外(OOD)检测问题。然而,这些技术在很大程度上被更简单的方法所取代,这些方法基于预训练或学习的数据点嵌入之间的距离。在这里,我们展示了贝叶斯非参数模型与相对马氏距离评分(RMDS)之间的形式关系,RMDS是一种常用的OOD检测方法。基于这种联系,我们提出了具有分层先验的贝叶斯非参数混合模型,该模型推广了RMDS。我们在OpenOOD检测基准上评估了这些模型,并表明贝叶斯非参数方法可以改进现有的OOD方法,特别是在训练类协方差结构不同且每类数据点相对较少的场景中。

英文摘要

Bayesian nonparametric methods are naturally suited to the problem of out-of-distribution (OOD) detection. However, these techniques have largely been eschewed in favor of simpler methods based on distances between pre-trained or learned embeddings of data points. Here we show a formal relationship between Bayesian nonparametric models and the relative Mahalanobis distance score (RMDS), a commonly used method for OOD detection. Building on this connection, we propose Bayesian nonparametric mixture models with hierarchical priors that generalize the RMDS. We evaluate these models on the OpenOOD detection benchmark and show that Bayesian nonparametric methods can improve upon existing OOD methods, especially in regimes where training classes differ in their covariance structure and where there are relatively few data points per class.
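As background for the abstract above, the relative Mahalanobis distance score can be sketched as follows: class-conditional Gaussians with a shared covariance are compared against a single "background" Gaussian fitted to all embeddings. This is a minimal sketch of the baseline score only, not of the proposed Bayesian nonparametric models; the function names `fit_rmds` and `rmds_score` and the use of pseudo-inverses are choices made here.

```python
import numpy as np


def fit_rmds(Z, y):
    """Fit class means, a shared covariance, and a background Gaussian to embeddings Z."""
    classes, inverse = np.unique(y, return_inverse=True)
    means = np.stack([Z[inverse == k].mean(axis=0) for k in range(len(classes))])
    resid = Z - means[inverse]                      # within-class residuals
    shared_prec = np.linalg.pinv(resid.T @ resid / len(Z))
    bg_mean = Z.mean(axis=0)
    bg_resid = Z - bg_mean                          # residuals ignoring labels
    bg_prec = np.linalg.pinv(bg_resid.T @ bg_resid / len(Z))
    return means, shared_prec, bg_mean, bg_prec


def rmds_score(z, means, shared_prec, bg_mean, bg_prec):
    """Relative Mahalanobis score of one embedding z; larger means more OOD-like."""
    d = means - z
    md_class = np.einsum("kd,de,ke->k", d, shared_prec, d)  # distance to each class
    d0 = z - bg_mean
    md_background = d0 @ bg_prec @ d0
    return float(np.min(md_class - md_background))
```

Usage would be along the lines of `params = fit_rmds(Z_train, y_train)` followed by `rmds_score(z_test, *params)`, with a threshold chosen on held-out data.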

2412.08052 2026-05-28 cs.LG stat.ML

CANDOR: Counterfactual ANnotated DOubly Robust Off-Policy Evaluation

CANDOR: 反事实注释的双重稳健离策略评估

Aishwarya Mandyam, Shengpu Tang, Jiayu Yao, Jenna Wiens, Barbara E. Engelhardt

AI总结 提出基于双重稳健框架的离策略评估方法,通过仅在奖励模型组件中融入反事实注释,在理论保证和实验上优于其他策略。

详情
Comments
11 pages, published in the conference proceedings of the Conference on Health Inference and Learning (2026)
AI中文摘要

离策略评估(OPE)对于将上下文赌博算法应用于高风险决策环境(如医疗保健)至关重要,因为新治疗策略在部署前必须进行评估。不幸的是,OPE技术本质上受到可用数据广度的限制,这可能不足以评估新策略的性能。最近的工作尝试通过添加专家注释的反事实样本来改善数据集覆盖。然而,此类注释通常不完美,可能导致比不使用任何注释更差的估计器性能。为了更好地利用不完美注释,我们提出了一族基于双重稳健(DR)框架的OPE估计器,该框架将重要性采样(IS)与奖励模型(直接方法,DM)相结合以获得更好的统计保证。我们研究了三种融入反事实注释的方式。在温和假设下,我们证明仅在DM组件中使用注释能产生最理想的理论结果。在多个医疗保健任务(包括真实世界电子健康记录(EHR)数据)上的实验表明,该策略在错误指定的奖励模型和不准确的注释下最为稳健。通过解决不完美注释带来的挑战,这项工作拓宽了OPE方法的适用性,并促进了医疗保健中决策策略的更安全部署。

英文摘要

Off-policy evaluation (OPE) is critical for applying contextual bandit algorithms to high-stakes decision-making settings such as healthcare, where new treatment policies must be evaluated prior to deployment. Unfortunately, OPE techniques are inherently limited by the breadth of the available data, which may not be sufficient to evaluate the performance of a new policy. Recent work attempts to improve dataset coverage by adding expert-annotated counterfactual samples. However, such annotations are often imperfect and can lead to worse estimator performance than using no annotations at all. To better leverage imperfect annotations, we propose a family of OPE estimators grounded in the doubly robust (DR) framework, which combines importance sampling (IS) with a reward model (direct method, DM) for better statistical guarantees. We study three ways of incorporating counterfactual annotations. Under mild assumptions, we prove that using annotations within just the DM component yields the most desirable theoretical results. Experiments on multiple healthcare tasks, including real-world electronic health records (EHR) data, show that this strategy is most robust under misspecified reward models and inaccurate annotations. By addressing the challenges posed by imperfect annotations, this work broadens the applicability of OPE methods and facilitates safer deployment of decision-making policies in healthcare.
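The standard doubly robust estimator that the abstract builds on combines a direct-method term with an importance-weighted residual correction. The sketch below shows only that textbook form for a contextual bandit with a finite action set; it does not implement any of the paper's annotation-augmented variants, and the function name and array layout are assumptions.

```python
import numpy as np


def dr_value(pi_e, pi_b, actions, rewards, r_hat):
    """Doubly robust value estimate for a contextual bandit (sketch).

    pi_e, pi_b, r_hat: arrays of shape (n, n_actions) holding the evaluation
    policy, the behavior policy, and the reward-model predictions per context.
    actions: logged action indices, shape (n,). rewards: logged rewards, shape (n,).
    """
    idx = np.arange(len(actions))
    # Direct method: expected model reward under the evaluation policy.
    dm_term = np.sum(pi_e * r_hat, axis=1)
    # Importance-sampling correction applied to the model's residual.
    weights = pi_e[idx, actions] / pi_b[idx, actions]
    correction = weights * (rewards - r_hat[idx, actions])
    return float(np.mean(dm_term + correction))
```

With an all-zero reward model the estimate reduces to plain importance sampling, and with a reward model that reproduces every logged reward it reduces to the direct method, which is the sense in which the two components back each other up.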

2411.18502 2026-05-28 stat.ML cs.AI cs.IR cs.LG stat.ME

Isometry pursuit

等距追踪

Samson Koelle, Marina Meila

AI总结 提出等距追踪算法,通过新颖的归一化方法和多任务基追踪识别宽矩阵中的正交列子矩阵,用于从可解释字典中发现等距嵌入。

详情
AI中文摘要

等距追踪是一种用于识别宽矩阵中正交列子矩阵的凸算法。它由一种新颖的归一化方法后接多任务基追踪组成。应用于假定坐标函数的雅可比矩阵时,它有助于从可解释字典中识别等距嵌入。我们提供了理论和实验结果来证明该方法的合理性。对于涉及坐标选择和多样化的问题,它提供了贪心搜索和暴力搜索的协同替代方案。

英文摘要

Isometry pursuit is a convex algorithm for identifying orthonormal column-submatrices of wide matrices. It consists of a novel normalization method followed by multitask basis pursuit. Applied to Jacobians of putative coordinate functions, it helps identify isometric embeddings from within interpretable dictionaries. We provide theoretical and experimental results justifying this method. For problems involving coordinate selection and diversification, it offers a synergistic alternative to greedy and brute force search.

2407.12100 2026-05-28 stat.ME stat.AP stat.ML

An Agglomerative Clustering of Simulation Output Distributions Using Regularized Wasserstein Distance

使用正则化Wasserstein距离对模拟输出分布进行凝聚聚类

Mohammadmahdi Ghasemloo, David J. Eckman

AI总结 提出一种利用正则化Wasserstein距离对多元经验分布进行凝聚聚类的新算法,用于识别模拟系统性能模式,并应用于呼叫中心模型的异常检测、预优化和在线监控。

详情
AI中文摘要

使用统计学习方法分析随机模拟输出可以通过揭示不同模拟系统之间以及系统输入与输出之间的关系,显著增强决策能力。我们专注于对模拟输出的多元经验分布进行聚类,以识别性能指标之间的模式和权衡。我们提出了一种新颖的凝聚聚类算法,利用正则化Wasserstein距离对这些多元经验分布进行聚类。该框架具有多个重要用例,包括异常检测、预优化和在线监控。在涉及呼叫中心模型的数值实验中,我们展示了该方法如何识别产生相似性能结果的人员配置计划,并为当队列长度信号表明系统性能可能恶化时进行干预提供政策依据。

英文摘要

Using statistical learning methods to analyze stochastic simulation outputs can significantly enhance decision-making by uncovering relationships between different simulated systems and between a system's inputs and outputs. We focus on clustering multivariate empirical distributions of simulation outputs to identify patterns and trade-offs among performance measures. We present a novel agglomerative clustering algorithm that utilizes the regularized Wasserstein distance to cluster these multivariate empirical distributions. This framework has several important use cases, including anomaly detection, pre-optimization, and online monitoring. In numerical experiments involving a call-center model, we demonstrate how this methodology can identify staffing plans that yield similar performance outcomes and inform policies for intervening when queue lengths signal potentially worsening system performance.
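One common way to compute a regularized Wasserstein distance between two empirical distributions is entropic regularization solved by Sinkhorn iterations. The sketch below assumes that variant, a squared Euclidean ground cost, and uniform weights on the sample points; whether the paper uses exactly this formulation is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np


def sinkhorn_cost(X, Y, reg=1.0, n_iter=500):
    """Transport cost of the entropically regularized plan between two samples."""
    a = np.full(len(X), 1.0 / len(X))   # uniform weights on the rows of X
    b = np.full(len(Y), 1.0 / len(Y))   # uniform weights on the rows of Y
    # Squared Euclidean cost between every pair of points.
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-C / reg)                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)               # match the column marginals
        u = a / (K @ v)                 # match the row marginals
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * C))
```

A pairwise matrix of such costs between simulation output samples could then be passed to an off-the-shelf agglomerative clustering routine. For very small `reg` the kernel can underflow, in which case a log-domain implementation is the usual remedy.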

2410.12035 2026-05-28 stat.ML cs.LG

Learning with Importance Weighted Variational Inference

基于重要性加权变分推断的学习

Kamélia Daudel, François Roueff

AI总结 本文通过渐近分析比较了IWAE、VR和VR-IWAE边界下的重参数化和双重重参数化梯度估计器,揭示了偏差-方差权衡并证明了DREP的优越性,同时分析了困难区域中梯度估计器的方向合理性。

详情
AI中文摘要

几种涉及重要性加权思想的变分边界推广了用于边际似然优化的证据下界(ELBO),例如重要性加权自编码器(IWAE)、变分Rényi(VR)和VR-IWAE边界。然而,边界和梯度估计器的联合选择如何影响所得变分推断(VI)算法的行为仍不清楚。本文对与IWAE、VR和VR-IWAE边界相关的重参数化(REP)和双重重参数化(DREP)梯度估计器进行了统一的理论比较。通过当蒙特卡洛样本数$N$趋于无穷时信噪比的渐近分析,我们识别了这些梯度估计器中的偏差-方差权衡,并正式证明了在重要性加权VI中DREP优于REP。针对变分密度和后验密度之间的Kullback-Leibler散度以及$N$都趋于无穷的困难区域的额外渐近分析表明,即使变分近似恶化,重要性加权VI梯度估计器仍指向合理方向。这些互补的结果刻画了重要性加权VI中从糟糕初始化到最终收敛的优化轨迹。重要的是,我们的证明技术为样本均值比的研究建立了通用的理论工具,其范围超出了VI,并对蒙特卡洛方法领域做出了独立贡献。

英文摘要

Several variational bounds involving importance weighting ideas generalize the Evidence Lower BOund (ELBO) for marginal likelihood optimization, such as the Importance-weighted Auto-Encoder (IWAE), Variational Rényi (VR) and VR-IWAE bounds. Yet, it remains unclear how the joint choice of bound and gradient estimator impacts the behavior of the resulting variational inference (VI) algorithms. This paper provides a unified theoretical comparison of reparameterized (REP) and doubly-reparameterized (DREP) gradient estimators tied to the IWAE, VR and VR-IWAE bounds. Through asymptotic analyses of the Signal-to-Noise Ratio as the number of Monte Carlo samples $N$ goes to infinity, we identify a bias-variance tradeoff in these gradient estimators and we formally justify the superiority of DREP over REP in importance-weighted VI. An additional asymptotic analysis for challenging regimes, where both $N$ and the Kullback-Leibler divergence between the variational and posterior densities go to infinity, indicates that importance-weighted VI gradient estimators point in a well-founded direction even when the variational approximation deteriorates. Together, these complementary results characterize the optimization trajectory in importance-weighted VI from poor initialization to final convergence. Importantly, our proof techniques establish general theoretical tools for the study of ratios of sample means, whose scope extends beyond VI and which constitute an independent contribution to the field of Monte Carlo methods.
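For readers unfamiliar with the IWAE bound, a Monte Carlo estimate of it is the average over data points of the log of the mean importance weight. The sketch below evaluates only the bound from precomputed log-weights, not the REP or DREP gradient estimators the paper analyzes; the function name and the `(n_points, N)` array layout are assumptions.

```python
import numpy as np


def iwae_bound(log_w):
    """Monte Carlo estimate of the IWAE bound.

    log_w has shape (n_points, N): entry (i, j) is
    log p(x_i, z_ij) - log q(z_ij | x_i) for the j-th sample z_ij ~ q(. | x_i).
    """
    log_w = np.asarray(log_w, dtype=float)
    # Log-mean-exp over the N samples, stabilized by subtracting the row maximum.
    row_max = log_w.max(axis=1, keepdims=True)
    log_mean_w = row_max[:, 0] + np.log(np.mean(np.exp(log_w - row_max), axis=1))
    return float(np.mean(log_mean_w))
```

With `N = 1` the expression reduces to the usual ELBO estimate, and by Jensen's inequality the value computed from a given set of log-weights is never below their plain average.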

2410.10241 2026-05-28 cs.LG cs.AI stat.ML

Revisiting Graph Autoencoders as Implicit Contrastive Learners

重新审视图自编码器作为隐式对比学习器

Jintang Li, Ruofan Wu, Yuchang Zhu, Huizhe Zhang, Zulun Zhu, Liang Chen

AI总结 本文通过对比学习视角重新审视图自编码器,揭示其隐式对比学习本质,并强调对比视图设计的关键作用,提出非对称子图视图作为重要设计维度。

详情
Comments
KDD 2026 research track. Code available at https://github.com/EdisonLeeeee/lrGAE
AI中文摘要

图自编码器(GAEs)和图对比学习(GCL)是图上自监督表示学习的两种主要范式,但它们通常被孤立研究并被视为根本不同的方法。在这项工作中,我们通过对比学习的视角重新审视GAEs,并表明基于结构和基于特征的GAEs都可以概念化为隐式图对比学习器。这一视角揭示了许多现有GAEs的主要区别在于对比视图的构建方式,而非学习目标或架构。基于这一见解,我们引入了一个统一公式,强调对比视图设计是GAEs中一个核心且先前较少探索的维度。特别是,我们识别出由子图视图不匹配产生的非对称对比视图,作为先前GAE研究中一个重要但未充分探索的设计轴。我们在统一框架内形式化这一见解,并在代表性图学习任务上进行系统实验,以检验其对性能和效率的影响。我们的结果表明,将GAEs解释为隐式对比学习器能更清晰地理解现有模型,并为设计有效且可扩展的图自编码器提供实用指导。

英文摘要

Graph autoencoders (GAEs) and graph contrastive learning (GCL) are two major paradigms for self-supervised representation learning on graphs, yet they are often studied in isolation and treated as fundamentally different approaches. In this work, we revisit GAEs through the lens of contrastive learning and show that both structure-based and feature-based GAEs can be conceptualized as implicit graph contrastive learners. This perspective reveals that many existing GAEs differ primarily in how contrastive views are constructed, rather than in their learning objectives or architectures. Building on this insight, we introduce a unified formulation that highlights contrastive view design as a central and previously less explored dimension in GAEs. In particular, we identify asymmetric contrastive views, arising from mismatches in subgraph views, as an important yet underexplored design axis in prior GAE research. We formalize this insight within a unified framework and conduct systematic experiments on representative graph learning tasks to examine its impact on performance and efficiency. Our results show that interpreting GAEs as implicit contrastive learners offers a clearer understanding of existing models and provides practical guidance for designing effective and scalable graph autoencoders.

2409.19712 2026-05-28 stat.ME stat.ML

Posterior Conformal Prediction

后验共形预测

Yao Zhang, Emmanuel J. Candès

AI总结 提出后验共形预测(PCP)方法,通过将条件非一致性得分分布建模为混合分布,在数据自然发现的簇上实现边际和近似条件有效性,产生更紧的预测区间。

详情
Comments
67 pages, 17 figures
AI中文摘要

共形预测是一种流行的技术,用于构建具有无分布覆盖保证的预测区间。覆盖是边际的,意味着它仅在整个总体平均上成立,而不一定适用于任何特定子组。本文介绍了后验共形预测(PCP),它为数据中自然发现的簇(或子组)生成具有边际和近似条件有效性的预测区间。PCP通过将条件非一致性得分分布建模为簇分布的混合来实现这些保证。与其他具有近似条件有效性的方法相比,该方法产生更紧的区间,特别是当测试数据来自验证数据中代表性良好的簇时。PCP也可用于保证用户指定子组的条件覆盖,在这种情况下,它进一步确保每个子组中代表性不足的个体的覆盖。当响应变量是分类变量时,PCP可以根据分类器的预测概率调整覆盖水平,如果分类器校准良好,则产生低基数的预测集。我们在来自社会经济、材料科学和医疗保健的数据集上展示了增强的性能。

英文摘要

Conformal prediction is a popular technique for constructing prediction intervals with distribution-free coverage guarantees. The coverage is marginal, meaning it only holds on average over the entire population but not necessarily for any specific subgroup. This article introduces posterior conformal prediction (PCP), which generates prediction intervals with both marginal and approximate conditional validity for clusters (or subgroups) naturally discovered in the data. PCP achieves these guarantees by modelling the conditional nonconformity score distribution as a mixture of cluster distributions. Compared to other methods with approximate conditional validity, this approach produces tighter intervals, particularly when the test data is drawn from clusters that are well represented in the validation data. PCP can also be applied to guarantee conditional coverage on user-specified subgroups, in which case it further ensures coverage for underrepresented individuals in each subgroup. When the response variable is categorical, PCP can adjust the coverage level based on the classifier's predictive probabilities, yielding low-cardinality prediction sets if the classifier is well calibrated. We demonstrate enhanced performance on datasets from socioeconomics, materials science, and healthcare.
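The marginal guarantee described at the start of the abstract is what standard split conformal prediction delivers. The sketch below shows that baseline with absolute-residual scores; it is not PCP itself, which additionally reweights the calibration scores through a mixture model, and the function name is hypothetical.

```python
import numpy as np


def split_conformal_interval(pred_cal, y_cal, pred_test, alpha=0.1):
    """Split conformal intervals from absolute-residual nonconformity scores."""
    scores = np.abs(np.asarray(y_cal, dtype=float) - np.asarray(pred_cal, dtype=float))
    n = len(scores)
    # Finite-sample-corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]
    pred_test = np.asarray(pred_test, dtype=float)
    return pred_test - q, pred_test + q
```

Every test point receives the same half-width `q` here, which is exactly why coverage holds only on average over the population rather than within subgroups.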

2409.06060 2026-05-28 math.ST stat.TH

Empirical Bernstein in smooth Banach spaces

光滑Banach空间中的经验Bernstein不等式

Diego Martinez-Taboada, Aaditya Ramdas

AI总结 针对有界向量值随机变量,提出一种利用方差经验估计量的新经验Bernstein不等式,适用于2-光滑可分Banach空间,渐近达到与Bernstein不等式相同的置信集宽度。

详情
AI中文摘要

现有的有界向量值随机变量的集中界包括标量Hoeffding和Bernstein不等式的推广。后者通常更紧,但需要知道随机变量方差的一个界。我们推导了一个新的向量值经验Bernstein不等式,它利用方差的经验估计量代替真实方差。该不等式在2-光滑可分Banach空间中成立,包括有限维欧几里得空间和可分Hilbert空间。所得置信集在批量设置(样本量固定)和序贯设置(样本量为停时)中均被实例化。置信集宽度渐近地精确匹配Bernstein不等式在主导项中的结果。

英文摘要

Existing concentration bounds for bounded vector-valued random variables include extensions of the scalar Hoeffding and Bernstein inequalities. While the latter is typically tighter, it requires knowing a bound on the variance of the random variables. We derive a new vector-valued empirical Bernstein inequality, which makes use of an empirical estimator of the variance instead of the true variance. The bound holds in 2-smooth separable Banach spaces, which include finite dimensional Euclidean spaces and separable Hilbert spaces. The resulting confidence sets are instantiated for both the batch setting (where the sample size is fixed) and the sequential setting (where the sample size is a stopping time). The confidence set width asymptotically exactly matches that achieved by Bernstein in the leading term.
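As a point of reference for what "empirical Bernstein" means in the scalar case, the classical bound of Maurer and Pontil for i.i.d. observations in [0, 1] replaces the true variance with the sample variance. The sketch below implements that one-sided scalar bound only; it is not the vector-valued inequality of the paper, and the function name is an assumption.

```python
import math

import numpy as np


def empirical_bernstein_ucb(x, delta=0.05):
    """One-sided upper confidence bound on the mean of i.i.d. data in [0, 1].

    Classical scalar empirical Bernstein bound: holds with probability
    at least 1 - delta and needs at least two observations.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sample_var = x.var(ddof=1)           # unbiased sample variance
    log_term = math.log(2.0 / delta)
    return float(
        x.mean()
        + math.sqrt(2.0 * sample_var * log_term / n)
        + 7.0 * log_term / (3.0 * (n - 1))
    )
```

The leading term scales with the estimated standard deviation, so low-variance data yield a bound much tighter than Hoeffding's, at the price of the lower-order `1 / (n - 1)` term.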

2404.11150 2026-05-28 stat.ME

Automated, efficient and model-free inference for randomized clinical trials via data-driven covariate adjustment

通过数据驱动的协变量调整实现随机临床试验的自动化、高效和无模型推断

Kelly Van Lancker, Iván Díaz, Stijn Vansteelandt

AI总结 本文提出使用自动化数据自适应方法(如逐步回归、Lasso和灵活机器学习)进行协变量调整,以解决预指定协变量及其函数形式的挑战,并提供有效且可解释的处理效应估计和标准误差,即使结果模型错误设定或预测有偏。

详情
AI中文摘要

2023年,美国食品药品监督管理局发布了随机临床试验中协变量调整的指南,强调其通过预后基线变量提高精度和效力的作用。尽管有潜力,许多试验未充分利用该方法,部分原因在于预指定最优基线协变量及其函数形式的挑战。我们探索了自动化、数据自适应方法(包括逐步回归、Lasso和灵活机器学习算法)在协变量调整中的潜力,以应对预指定挑战。我们的方法确保有效且可解释的处理效应估计和标准误差,即使结果模型错误设定或使用有偏的结果预测。这与大多数竞争方法不同,后者假设模型正确指定以获得一致的标准误差。我们的估计量需要交叉拟合以获得可靠的标准误差估计,但当使用变量选择时,若结果模型满足超稀疏性假设,则可省略。因此,我们得到了随机临床试验(或类似研究如A/B测试)中边际处理效应的简单估计量和标准误差,利用来自预后基线协变量的数据自适应预测,即使在预测有偏的情况下,有限样本中偏差也很小(或没有)。实证和方法学结果表明,自动化协变量调整在提高试验分析统计效力方面具有前景。

English abstract

In 2023, the U.S. Food and Drug Administration issued guidance for adjustment of covariates in randomized clinical trials, emphasizing its role in enhancing precision and power through prognostic baseline variables. Despite its potential, many trials underutilize this method, partly due to challenges in pre-specifying optimal baseline covariates and their functional forms. We explore the potential of automated, data-adaptive methods (including stepwise regression, Lasso and flexible machine learning algorithms) for covariate adjustment, addressing the challenge of pre-specification. Our approach ensures valid and interpretable treatment effect estimates and standard errors, even when outcome models are misspecified or biased outcome predictions are used. This differs from most competing methods, which assume correctly specified models for consistent standard errors. Our estimators require cross-fitting for reliable standard error estimation, though it can be omitted when variable selection is used, provided the outcome model satisfies an ultra-sparsity assumption. As such, we arrive at simple estimators and standard errors for marginal treatment effects in randomized clinical trials (or similar studies like A/B testing), exploiting data-adaptive predictions from prognostic baseline covariates, with little (or no) bias in finite samples even when predictions are biased. Empirical and methodological results demonstrate the promise of automated covariate adjustment for improving the statistical power of trial analyses.

2403.16825 2026-05-28 cs.LG math.OC math.PR stat.ML

Weak Convergence Analysis of Online Neural Actor-Critic Algorithms

在线神经演员-评论家算法的弱收敛分析

Samuel Chun-Hei Lam, Justin Sirignano, Ziheng Wang

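AI summary: The paper proves that a single-layer neural network trained with the online actor-critic algorithm converges in distribution to a random ordinary differential equation as the number of hidden units and the number of training steps tend to infinity, and analyzes the limiting behavior using a Poisson equation and weak convergence techniques.

Details
AI Chinese abstract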

我们证明,当隐藏单元数和训练步数趋于无穷时,使用在线演员-评论家算法训练的单层神经网络在分布上收敛到一个随机常微分方程(ODE)。在在线演员-评论家算法中,数据样本的分布随着模型更新而动态变化,这是任何收敛分析的关键挑战。我们建立了在固定演员策略下数据样本的几何遍历性。然后,利用泊松方程,我们证明了由于随机到达的数据样本导致的模型更新围绕极限分布的波动随着参数更新次数趋于无穷而消失。利用泊松方程和弱收敛技术,我们证明了演员神经网络和评论家神经网络收敛到具有随机初始条件的ODE系统的解。对极限ODE的分析表明,极限评论家网络将收敛到真实价值函数,这将为演员提供策略梯度的渐近无偏估计。然后我们证明极限演员网络将收敛到一个驻点。

English abstract

We prove that a single-layer neural network trained with the online actor-critic algorithm converges in distribution to a random ordinary differential equation (ODE) as the number of hidden units and the number of training steps tend to $\infty$. In the online actor-critic algorithm, the distribution of the data samples dynamically changes as the model is updated, which is a key challenge for any convergence analysis. We establish the geometric ergodicity of the data samples under a fixed actor policy. Then, using a Poisson equation, we prove that the fluctuations of the model updates around the limit distribution due to the randomly arriving data samples vanish as the number of parameter updates tends to $\infty$. Using the Poisson equation and weak convergence techniques, we prove that the actor neural network and critic neural network converge to the solutions of a system of ODEs with random initial conditions. Analysis of the limit ODE shows that the limit critic network will converge to the true value function, which will provide the actor an asymptotically unbiased estimate of the policy gradient. We then prove that the limit actor network will converge to a stationary point.

2006.06049 2026-05-28 cs.LG stat.ML

On Mixup Regularization

关于混合正则化

Luigi Carratino, Moustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert

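AI summary: By interpreting Mixup as a combination of a data transformation and a random perturbation, the paper clarifies its regularization effects, proposes applying the data transformation at test time as an improvement, and identifies mechanisms such as label smoothing and reduction of the Lipschitz constant.

Details
Journal ref
Journal of Machine Learning Research, 23(325):1-31, 2022
AI Chinese abstract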

混合是一种数据增强技术,通过训练点和标签的凸组合创建新样本。这种简单技术已在不同设置和应用中经验性地提高了许多最先进模型的准确性,但其经验成功背后的原因仍然知之甚少。在本文中,我们通过阐明混合的正则化效应,在解释其理论基础方面迈出了重要一步。我们表明,混合可以解释为在数据变换和变换数据随机扰动组合下的标准经验风险最小化估计量。从这一新解释中,我们获得了两个核心见解。首先,数据变换表明,在测试时,使用混合训练的模型也应应用于变换后的数据,这是一行代码的改变,我们经验性地表明这可以提高预测的准确性和校准。其次,我们展示了混合新解释中的随机扰动如何诱导多种已知的正则化方案,包括标签平滑和估计量Lipschitz常数的减小。这些方案协同相互作用,产生自校准且有效的正则化效果,防止过拟合和过度自信的预测。我们通过实验支持我们的理论分析,这些实验证实了我们的结论。

English abstract

Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has been shown empirically to improve the accuracy of many state-of-the-art models in different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper we take a substantial step in explaining the theoretical foundations of Mixup, by clarifying its regularization effects. We show that Mixup can be interpreted as a standard empirical risk minimization estimator subject to a combination of data transformation and random perturbation of the transformed data. We gain two core insights from this new interpretation. First, the data transformation suggests that, at test time, a model trained with Mixup should also be applied to transformed data, a one-line change in code that we show empirically to improve both accuracy and calibration of the prediction. Second, we show how the random perturbation of the new interpretation of Mixup induces multiple known regularization schemes, including label smoothing and reduction of the Lipschitz constant of the estimator. These schemes interact synergistically with each other, resulting in a self-calibrated and effective regularization effect that prevents overfitting and overconfident predictions. We corroborate our theoretical analysis with experiments that support our conclusions.
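The convex-combination step described in the abstract can be sketched as follows. This is a minimal illustration of standard Mixup batch construction, not the paper's test-time transformation. It assumes one-hot labels, a single mixing weight per batch drawn from a Beta(alpha, alpha) distribution, and pairing by random permutation within the batch. The function name `mixup_batch` and the toy data are hypothetical.

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    # Draw one mixing weight for the whole batch (assumption: Beta(alpha, alpha)).
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    # Pair each example with another example from the same batch.
    perm = rng.permutation(len(x))
    # Convex combinations of inputs and of labels, using the same weight.
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

# Toy batch: 8 points with 3 features and 4 classes (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = np.eye(4)[rng.integers(0, 4, size=8)]

x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.4, rng=np.random.default_rng(1))
print("lambda:", lam)
print("mixed label rows sum to:", y_mix.sum(axis=1))
```

Because the labels are mixed with the same weight as the inputs, each mixed label remains a probability vector, which is one way to see the connection to label smoothing mentioned in the abstract.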

2206.15475 2026-05-28 cs.LG stat.ME

Causal Machine Learning: A Survey and Open Problems

因果机器学习:综述与开放问题

Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva

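AI summary: The paper surveys five main research directions in causal machine learning (CausalML): causal supervised learning, causal generative modeling, causal explanations, causal fairness, and causal reinforcement learning. It systematically compares the methods in each direction, points out open problems, and discusses applications in computer vision, natural language processing, and graph representation learning.

Details
Comments
v03. Work in progress. Feedback and comments are highly appreciated!
AI Chinese abstract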

因果机器学习(CausalML)是机器学习方法的总称,这些方法将数据生成过程形式化为结构因果模型(SCM)。这种视角使我们能够推理对该过程的改变(干预)的效果以及事后本会发生的情况(反事实)。我们根据解决的问题将CausalML的工作分为五类:(1)因果监督学习,(2)因果生成建模,(3)因果解释,(4)因果公平性,以及(5)因果强化学习。我们系统地比较了每个类别中的方法,并指出了开放问题。此外,我们回顾了计算机视觉、自然语言处理和图表征学习中的数据模态特定应用。最后,我们提供了因果基准的概述和对这一新兴领域现状的批判性讨论,包括对未来工作的建议。

English abstract

Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.

2205.14090 2026-05-28 stat.ML cs.LG

Surrogate modeling for Bayesian optimization beyond a single Gaussian process

超越单一高斯过程的贝叶斯优化代理建模

Qin Lu, Konstantinos D. Polyzos, Bingcong Li, Georgios B. Giannakis

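AI summary: The paper proposes an adaptive surrogate model based on an ensemble of Gaussian processes (EGP), combined with Thompson sampling (TS), for Bayesian optimization, to enhance expressiveness and parallelism, and establishes a Bayesian regret analysis.

Details
Comments
This version added some minor corrections and clarifications to the proofs
AI Chinese abstract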

贝叶斯优化(BO)在优化具有昂贵评估代价的黑盒函数方面具有充分记录的优点。这类函数出现在超参数调优、药物发现和机器人等多样化应用中。BO依赖于贝叶斯代理模型来顺序选择查询点,以平衡搜索空间的探索与利用。大多数现有工作依赖于基于单一高斯过程(GP)的代理模型,其中核函数形式通常使用领域知识预先选择。为了绕过这种设计过程,本文利用GP的集成(E)来自适应地选择实时拟合的代理模型,从而产生具有增强表达能力的GP混合后验。然后,通过汤普森采样(TS)实现使用基于EGP的函数后验获取下一个评估输入,这不需要额外的设计参数。为了赋予函数采样可扩展性,每个GP模型采用基于随机特征的核近似。新颖的EGP-TS易于适应并行操作。为了进一步建立所提出的EGP-TS收敛到全局最优的结论,基于贝叶斯遗憾的概念对顺序和并行设置进行了分析。在合成函数和实际应用上的测试展示了所提出方法的优点。

English abstract

Bayesian optimization (BO) has well-documented merits for optimizing black-box functions with an expensive evaluation cost. Such functions emerge in applications as diverse as hyperparameter tuning, drug discovery, and robotics. BO hinges on a Bayesian surrogate model to sequentially select query points so as to balance exploration with exploitation of the search space. Most existing work relies on a surrogate model based on a single Gaussian process (GP), where the kernel function form is typically preselected using domain knowledge. To bypass such a design process, this paper leverages an ensemble (E) of GPs to adaptively select the surrogate model fit on the fly, yielding a GP mixture posterior with enhanced expressiveness for the sought function. Acquisition of the next evaluation input using this EGP-based function posterior is then enabled by Thompson sampling (TS), which requires no additional design parameters. To endow function sampling with scalability, random feature-based kernel approximation is leveraged per GP model. The novel EGP-TS readily accommodates parallel operation. To further establish convergence of the proposed EGP-TS to the global optimum, analysis is conducted based on the notion of Bayesian regret for both sequential and parallel settings. Tests on synthetic functions and real-world applications showcase the merits of the proposed method.

2112.09305 2026-05-28 cs.LG stat.ML

Gaussian RBF Centered Kernel Alignment (CKA) in the Large Bandwidth Limit

大带宽极限下的高斯RBF中心核对齐(CKA)

Sergio A. Alvarez

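AI summary: The paper proves that Centered Kernel Alignment (CKA) based on a Gaussian RBF kernel converges to linear CKA in the large-bandwidth limit, and finds that convergence onset is sensitive to the geometry of the feature representations, with representation eccentricity bounding the range of bandwidths for which Gaussian CKA behaves nonlinearly.

Details
Journal ref
IEEE TPAMI, vol. 45, issue 5, 01 May 2023, pages 6587-6593
Comments
11 pages, 3 figures
AI Chinese abstract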

我们证明基于高斯RBF核的中心核对齐(CKA)在大带宽极限下收敛到线性CKA。我们表明收敛起始对特征表示的几何形状敏感,并且表示偏心率限制了高斯CKA表现非线性的带宽范围。

English abstract

We prove that Centered Kernel Alignment (CKA) based on a Gaussian RBF kernel converges to linear CKA in the large-bandwidth limit. We show that convergence onset is sensitive to the geometry of the feature representations, and that representation eccentricity bounds the range of bandwidths for which Gaussian CKA behaves nonlinearly.
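The large-bandwidth limit can be checked numerically with a short sketch. It assumes the common HSIC-based definition of CKA on centered Gram matrices and the kernel parametrization exp(-||x - x'||^2 / (2 sigma^2)); the paper may scale the bandwidth differently. The helper names and the synthetic representations are hypothetical.

```python
import numpy as np

def center(K):
    # Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    # CKA as normalized alignment of the centered Gram matrices.
    Kc, Lc = center(K), center(L)
    return np.sum(Kc * Lc) / np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc))

def linear_gram(X):
    return X @ X.T

def rbf_gram(X, sigma):
    # Gaussian RBF Gram matrix with bandwidth sigma (assumed parametrization).
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Two synthetic feature representations of the same 50 inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(50, 3))

cka_linear = cka(linear_gram(X), linear_gram(Y))
print("linear CKA:", cka_linear)
for sigma in (0.5, 5.0, 1000.0):
    print("sigma =", sigma, "Gaussian CKA:", cka(rbf_gram(X, sigma), rbf_gram(Y, sigma)))
```

For large sigma the kernel is approximately 1 - ||x - x'||^2 / (2 sigma^2), whose centered Gram matrix is proportional to the centered linear Gram matrix. Since CKA is invariant to rescaling of either Gram matrix, the Gaussian value approaches the linear one.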

1901.03808 2026-05-28 cs.LG eess.SP stat.ML

ECGadv: Generating Adversarial Electrocardiogram to Misguide Arrhythmia Classification System

ECGadv: 生成对抗性心电图以误导心律失常分类系统

Huangxun Chen, Chenyu Huang, Qianyi Huang, Qian Zhang, Wei Wang

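AI summary: Targeting ECG diagnosis systems based on deep neural networks, the paper analyzes ECG properties and designs adversarial attack schemes under two attack models, revealing blind spots in these systems and calling for countermeasures.

Details
Journal ref
Proceedings of the AAAI conference on artificial intelligence 2020
Comments
Accepted by AAAI 2020
AI Chinese abstract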

基于深度神经网络(DNN)的心电图(ECG)诊断系统最近取得了令人瞩目的进展,有望取代心脏病专家进行繁琐的检查。然而,它们对对抗攻击的脆弱性仍缺乏全面研究。由于心电图在可视化和动态特性上的独特性,图像领域的现有攻击无法直接适用。因此,本文迈出一步,深入探索对基于DNN的心电图诊断系统的对抗攻击。我们分析心电图特性,分别在两种攻击模型下设计有效的攻击方案。我们的结果揭示了基于DNN的诊断系统在对抗攻击下的盲点,这呼吁采取充分的应对措施。

English abstract

Deep neural network (DNN)-powered electrocardiogram (ECG) diagnosis systems have recently achieved promising progress toward taking over tedious examinations from cardiologists. However, their vulnerability to adversarial attacks still lacks comprehensive investigation. Existing attacks from the image domain are not directly applicable because of the distinct properties of ECGs in visualization and dynamics. Thus, this paper takes a step toward thoroughly exploring adversarial attacks on DNN-powered ECG diagnosis systems. We analyze the properties of ECGs to design effective attack schemes under two attack models. Our results demonstrate the blind spots of DNN-powered diagnosis systems under adversarial attacks, which calls for adequate countermeasures.