arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.30327 2026-05-29 cs.LG cs.AI cs.CL math.ST stat.ML stat.TH

Reasoning with Sampling: Cutting at Decision Points

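Felix Zhou, Anay Mehrotra, Quanquan C. Liu

AI summary: Proposes the Entropy-Cut Metropolis-Hastings algorithm, which uses the base model's next-token entropy as a proxy to identify key decision points and resamples from them, sampling efficiently from the power distribution to strengthen reasoning; it outperforms baselines and RL-trained models on several benchmarks.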

Abstract

Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power distribution, elicits comparable reasoning without additional training, curated datasets, or verifiers. However, making this method practical requires efficiently sampling from the power distribution. A sampler needs to "mix" to the power distribution, which necessitates moving between modes of the target distribution; intuitively, e.g., trying different reasoning strategies. The samplers proposed in prior works repeatedly select a "cut" position in the current reasoning trace uniformly at random and resample the suffix from that position onward. However, reasoning traces typically contain a few consequential decisions (e.g., the choice of proof strategy or algorithm), and we observe that a uniformly chosen cut tends to rewrite local details rather than revisit decision points. We introduce an algorithm (Entropy-Cut Metropolis-Hastings) that uses the base model's next-token entropy as a proxy to identify key decision points and resample from those positions. We empirically verify that entropy jumps are a useful proxy for decision points and, in a stylized model of reasoning, prove that our method's mixing time scales with the number of decisions in a trace rather than with the number of tokens, which can be much larger. Across MATH500, HumanEval, GPQA Diamond, and AIME26, our method consistently improves over baselines and RL-trained models.
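
To make the contrast between uniform and entropy-guided cuts concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not say how entropy is turned into a cut distribution, so weighting each position by its jump in next-token entropy is an assumption made for illustration, as are the names `token_entropy`, `cut_weights`, `sample_cut`, and the toy distributions.

```python
import math
import random

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cut_weights(next_token_dists, floor=1e-6):
    """Hypothetical cut distribution: weight each position by its entropy jump.

    The jump at position i is max(H_i - H_{i-1}, 0); position 0 uses H_0.
    A small floor keeps every position reachable.
    """
    entropies = [token_entropy(d) for d in next_token_dists]
    jumps = [entropies[0]] + [
        max(entropies[i] - entropies[i - 1], 0.0) for i in range(1, len(entropies))
    ]
    raw = [j + floor for j in jumps]
    total = sum(raw)
    return [r / total for r in raw]

def sample_cut(weights, rng):
    """Draw a cut position according to the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

# Toy trace: positions 0-2 and 4 are near-deterministic; position 3 is a
# "decision point" where four continuations are equally likely.
confident = [0.97, 0.01, 0.01, 0.01]
undecided = [0.25, 0.25, 0.25, 0.25]
trace = [confident, confident, confident, undecided, confident]

weights = cut_weights(trace)
uniform = [1.0 / len(trace)] * len(trace)
cut = sample_cut(weights, random.Random(0))
```

In this toy trace the entropy-based weights concentrate on position 3, whereas a uniform cut would pick it only one time in five. In a full Metropolis-Hastings sampler, a non-uniform cut proposal would also have to enter the acceptance ratio; that step is omitted here.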

2605.30324 2026-05-29 cs.DS cs.AI cs.CL cs.LG stat.ML

On Language Generation in the Limit with Bounded Memory

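Jon Kleinberg, Anay Mehrotra, Amin Saberi, Grigoris Velegkas

AI summary: Studies language generation in the limit under bounded memory, using combinatorial bounds and sliding-window analysis to characterize how memory constraints affect generability, density, and identification.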

Comments: The abstract has been shortened to fit within the arXiv limit

Abstract

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the entire history, a strong assumption since realistic algorithms retain limited past information. Classical work in learning theory shows memory constraints dramatically alter learnability; we extend this to language generation. First, we study memoryless generators. Under a mild enumeration restriction, every countable collection of infinite languages remains generable without memory. Without this restriction, we exactly characterize when memoryless generation is possible. For finite collections, we characterize the optimal minimax density achievable by memoryless generators -- the best density guaranteed against any collection of a given size. This combinatorial bound relies on Sperner's theorem and symmetric chain decompositions. We further show that a sliding window of the last $W$ examples does not improve this worst-case density, whereas allowing it to store $b$ adaptively chosen past examples improves the achievable density for every $b \geq 1$. Finally, we revisit identification in the limit, where the learner must converge to a single correct hypothesis for the target language. We focus on its incremental variant, where the learner remembers only its previous guess. Here, although exact identification fails on a collection of just three languages, a mild relaxation requiring convergence to an ``approximate'' version of the target is achievable for every finite collection. These results show bounded memory affects these tasks differently: generation remains achievable for every countable collection, while density and identification are confined to finite collections, with guarantees weakening as the collection grows.

2605.30321 2026-05-29 math.PR math.ST stat.TH

A Bayesian Proof and Interpretation of Talagrand's Majorizing Measure Theorem

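Ilias Zadik

AI summary: Gives a concise Bayesian proof of the lower bound in Talagrand's majorizing-measure theorem by comparing the maximum-likelihood estimator with Bayes-optimal estimation, using two area identities for Gaussian additive models.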

Abstract

In this paper, we give a short Bayesian proof of Talagrand's celebrated majorizing-measure theorem (MMT). While the upper-bound direction of MMT follows relatively directly from standard arguments, the lower-bound direction is widely regarded as the more difficult part and has received several distinct proofs. Unlike previous approaches, our proof does not rely on existing lower-bound techniques for Gaussian processes, nor on combinatorial, geometric, or coding-theoretic constructions. Instead, we derive the lower bound from two area identities for Gaussian additive models. We show that the Gaussian width of a finite set is the integrated mean-squared error of the maximum-likelihood estimator (MLE), while the integrated minimum mean-squared error (MMSE) is larger than the Fernique-Talagrand functional, up to a universal constant. Comparing the MLE with Bayes-optimal estimation then gives a direct proof of the hard direction of MMT.
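
For orientation, one standard way of stating the theorem uses Talagrand's $\gamma_2$ functional. The notation below ($X_t$, $d$, $\mathcal{A}_n$, $c$, $C$) is introduced here and may differ from the paper's, which phrases the bound through the Fernique-Talagrand functional.

```latex
% Centered Gaussian process (X_t)_{t \in T} with canonical metric
% d(s,t) = (\mathbb{E}(X_s - X_t)^2)^{1/2}.
% The infimum runs over admissible sequences of partitions (\mathcal{A}_n) of T:
% each \mathcal{A}_{n+1} refines \mathcal{A}_n, |\mathcal{A}_0| = 1, |\mathcal{A}_n| \le 2^{2^n};
% A_n(t) is the cell of \mathcal{A}_n containing t and \Delta is the d-diameter.
\gamma_2(T,d) \;=\; \inf_{(\mathcal{A}_n)} \, \sup_{t \in T} \sum_{n \ge 0} 2^{n/2}\, \Delta\bigl(A_n(t)\bigr),
\qquad
c\,\gamma_2(T,d) \;\le\; \mathbb{E} \sup_{t \in T} X_t \;\le\; C\,\gamma_2(T,d).
```

Here $c$ and $C$ are universal constants, and the left-hand inequality is the hard direction. For a finite set $T \subset \mathbb{R}^n$ and $X_t = \langle g, t \rangle$ with $g \sim N(0, I_n)$, the middle quantity is the Gaussian width of $T$.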

2605.30319 2026-05-29 stat.ML cs.AI cs.DS cs.LG math.ST stat.TH

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

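Anay Mehrotra, Phuc Tran, Van H. Vu, Manolis Zampetakis

AI summary: For heterogeneous treatment-effect estimation with panel data, proposes a simple, efficient estimator based on matrix completion that achieves row-wise $\ell_2$ error $\tilde{O}(\sqrt{1/n + n/m^2})$ under low-rank assumptions, and establishes the first row-wise $\ell_2$ perturbation bound for low-rank approximation.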

Abstract

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel-data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimation of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.
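
A quick numerical reading of the stated rate may help. The sketch below evaluates $\sqrt{1/n + n/m^2}$ for a few hypothetical panel sizes; it drops the constants and logarithmic factors hidden by $\tilde{O}$, so the numbers indicate scaling only, and the function name `rowwise_rate` is made up for this illustration.

```python
import math

def rowwise_rate(n, m):
    """sqrt(1/n + n/m^2), with constants and logarithmic factors dropped."""
    return math.sqrt(1.0 / n + n / m**2)

# n units observed across m times (hypothetical sizes).
for n, m in [(100, 50), (100, 100), (100, 1000), (10_000, 10_000)]:
    print(f"n={n:>6} m={m:>6} rate~{rowwise_rate(n, m):.4f}")
```

The two terms balance at $m = n$. With many more time points than units the $1/n$ term dominates, while with few time points the $n/m^2$ term does.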

2605.24244 2026-05-29 stat.ML cs.LG

MEDAL: Manifold Embedding Distillation via Autoencoder Learning

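Irene Chang, Tarek M. Zikry, Genevera I. Allen

AI summary: Proposes the MEDAL framework, which distills a manifold embedding into a reusable encoder-decoder model via a constrained autoencoder, enabling held-out validation, hyperparameter selection, and distribution-shift detection.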

Abstract

Low-dimensional embeddings are widely used as visual summaries of high-dimensional data and to enable downstream scientific discoveries. Yet, popular nonlinear dimension reduction methods, such as t-SNE and UMAP, are often selected based on visual appeal alone and without rigorous quantitative validation. A major reason is that manifold embeddings typically do not provide an out-of-sample map nor an inverse back to the original feature space; this makes held-out validation, the gold standard in supervised learning, all but impossible. To address these challenges, we develop a novel framework, MEDAL (Manifold Embedding Distillation via Autoencoder Learning), which distills a fitted manifold embedding into a reusable encoder--decoder model. MEDAL trains a constrained autoencoder whose bottleneck exactly matches any teacher embedding while the decoder reconstructs the original input; this yields an explicit map for new samples, an approximate inverse, and a pointwise reconstruction-based measure of distortion in the manifold space. This converts static manifold embeddings into models that can be evaluated on held-out data, enabling quantitative validation including comparing different dimension reduction methods as well as hyperparameter tuning. Across multiple benchmark and scientific case studies, we show that MEDAL enables held-out validation to determine optimal manifold embeddings and hyperparameters, reveals biologically coherent regions that are difficult to preserve in two dimensional embeddings, and detects distribution shift when new samples are mapped into a fixed reference manifold. MEDAL provides a general validation wrapper to any existing dimension reduction technique that will improve the rigor and reliability of dimension reduction in scientific workflows.

2510.08535 2026-05-29 stat.ML cs.LG math.PR

Permutation-Invariant Spectral Learning via Dyson Diffusion

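Tassilo Schwarz, Cai Dieball, Constantin Kogler, Renaud Lambiotte, Arnaud Doucet, Aljaž Godec, George Deligiannidis

AI summary: Proposes the Dyson Diffusion Model, which uses random matrix theory to analytically extract the spectral properties of the diffusion process and shifts the inductive bias from the architecture into the dynamics, yielding permutation-invariant spectral learning that learns graph spectra accurately and outperforms existing graph diffusion models.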

Abstract

Diffusion models are central to generative modeling and have been adapted to graphs by diffusing adjacency matrix representations. The challenge of having up to $n!$ such representations for graphs with $n$ nodes is only partially mitigated by using permutation-equivariant learning architectures. Despite their computational efficiency, existing graph diffusion models struggle to distinguish certain graph families and their spectra, unless graph data are augmented with ad hoc features. This shortcoming stems from enforcing the inductive bias within the learning architecture. In this work, we leverage random matrix theory to analytically extract the spectral properties of the diffusion process, allowing us to push most of the inductive bias from the architecture into the dynamics. Building on this, we introduce the Dyson Diffusion Model, which employs Dyson's Brownian motion to capture the spectral dynamics of an Ornstein-Uhlenbeck process on the adjacency matrix. Furthermore, conditioned on the spectral dynamics, we formulate a Lie group diffusion, appropriately modeling the remaining degrees of freedom. Strikingly, the resulting learning problem becomes permutation invariant at the Lie algebra level. We demonstrate that the Dyson Diffusion Model learns graph spectra accurately and outperforms existing graph diffusion models.
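
As background, Dyson's Brownian motion describes how the ordered eigenvalues $\lambda_1 < \dots < \lambda_n$ of a symmetric matrix evolve when its entries follow independent Brownian motions. One common normalization, with an added mean-reverting term for an Ornstein-Uhlenbeck process on the matrix, is sketched below. The symbols $\beta$ and $\theta$ and the scaling are assumptions made here; the paper's conventions may differ.

```latex
% beta = 1 for real symmetric matrices; theta >= 0 is the mean-reversion rate
% of the matrix-valued Ornstein-Uhlenbeck process (theta = 0 recovers plain
% Dyson Brownian motion); B_1, ..., B_n are independent Brownian motions.
d\lambda_i \;=\; \sqrt{\tfrac{2}{\beta}}\, dB_i
\;+\; \sum_{j \ne i} \frac{dt}{\lambda_i - \lambda_j}
\;-\; \theta\,\lambda_i\, dt,
\qquad i = 1, \dots, n.
```

The system involves only the eigenvalues, which are unchanged when the nodes of a graph are relabeled ($A \mapsto P A P^{\top}$ for a permutation matrix $P$), so dynamics written at the spectral level do not depend on the node ordering.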

2503.24022 2026-05-29 math.ST stat.ML stat.TH

Wasserstein KL-divergence for Gaussian distributions

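Adwait Datar, Nihat Ay

AI summary: Proposes a new version of the KL-divergence for Gaussian distributions based on Wasserstein geometry (the WKL-divergence), shows that it is consistent with the geometry of the sample space, and shows that the WKL-divergence between Dirac measures is proportional to the squared distance between the two points.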

Abstract

We introduce a new version of the KL-divergence for Gaussian distributions which is based on Wasserstein geometry and referred to as WKL-divergence. We show that this version is consistent with the geometry of the sample space ${\Bbb R}^n$. In particular, we can evaluate the WKL-divergence of the Dirac measures concentrated in two points which turns out to be proportional to the squared distance between these points.
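
The Dirac case is interesting because the classical KL-divergence is not useful there. The sketch below uses only the textbook one-dimensional formulas for the classical KL-divergence and the squared 2-Wasserstein distance between Gaussians; it does not implement the WKL-divergence, whose definition is given in the paper. The function names are chosen here for illustration.

```python
import math

def kl_gauss_1d(m1, s1, m2, s2):
    """Classical KL( N(m1, s1^2) || N(m2, s2^2) ) for s1, s2 > 0."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2.0 * s2**2) - 0.5

def w2_squared_gauss_1d(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between the same two Gaussians."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

# Shrink both Gaussians toward Dirac measures at 0 and 3.
for s in (1.0, 0.1, 0.01):
    print(s, kl_gauss_1d(0.0, s, 3.0, s), w2_squared_gauss_1d(0.0, s, 3.0, s))
```

As the common standard deviation shrinks, the classical KL-divergence grows without bound, while the squared Wasserstein distance stays at the squared distance between the two means. A divergence that remains finite and proportional to that squared distance in the Dirac limit, as the abstract reports for WKL, therefore behaves quite differently from the classical one.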

2605.30289 2026-05-29 cs.LG stat.AP stat.ML

Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets

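M. Ross Kunz, John Merickel, Keith Wilson

AI summary: Proposes a method for characterizing and comparing numeric tabular datasets through structured exploratory data analysis descriptors, sentence-transformer embeddings, and Canonical Correlation Analysis (CCA), enabling cross-dataset similarity retrieval and interpretable variable-level alignment, with support for differential privacy.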

Abstract

Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches either target predictive modeling over individual datasets, which requires a shared set of variable definitions, or lack mechanisms for interpretable cross-dataset alignment. The proposed methodology characterizes numeric tabular datasets through structured exploratory data analysis descriptors, embeds those descriptors into a shared vector space using a pretrained sentence transformer, and quantifies cross-dataset similarity via Canonical Correlation Analysis (CCA). Furthermore, a penalized formulation of CCA is applied to recover sparse, interpretable variable-level correspondences between datasets, identifying which statistical descriptors or variable-level quantities drive cross-dataset alignment without requiring shared variable names or feature conventions. Differential privacy is optionally applied to the descriptor set prior to embedding, supporting deployment in sensitive data contexts without requiring access to raw observations at time of comparison. The methodology is evaluated across 15 datasets spanning general-purpose benchmarks, materials informatics, and nuclear-grade graphite characterization. Results demonstrate a total P@1 score of 0.9, with known nearest-neighbor retrieval and cluster structure remaining robust across embedding ablations and differential privacy budgets. The proposed framework provides a principled pathway for integrating heterogeneous numeric data into retrieval-augmented generation pipelines while preserving statistical context, with direct applications to data-driven algorithm selection and simulation model initialization for unknown datasets.
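
For readers unfamiliar with the retrieval metric, the sketch below computes precision at 1 (P@1) for nearest-neighbor retrieval over a handful of embeddings. It assumes cosine similarity and a "same label" notion of relevance; the paper's exact protocol, its CCA-based similarity, and its data are not reproduced, and the toy vectors and labels are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def precision_at_1(embeddings, labels):
    """Fraction of items whose nearest other item (by cosine) shares their label."""
    hits = 0
    for i, query in enumerate(embeddings):
        best = max(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: cosine(query, embeddings[j]),
        )
        hits += labels[best] == labels[i]
    return hits / len(embeddings)

# Toy 2-D "dataset embeddings" with hypothetical domain labels.
embeddings = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (0.6, 0.8)]
labels = ["materials", "materials", "graphite", "graphite", "materials"]
score = precision_at_1(embeddings, labels)
```

In this toy set the last item sits closer to the "graphite" embeddings than to its own group, so four of the five queries retrieve a neighbor with the correct label.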

2605.30287 2026-05-29 stat.ME

MoSAIC: Multi-Resolution Spatial Regression Analysis of Cellular Colocalizations in Cancer Imaging

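Jessica Aldous, Michele Peruzzi, Maria Masotti, Aaron Udager, Allison May, Evan Keller, Veerabhadran Baladandayuthapani

AI summary: Proposes MoSAIC, a hierarchical Bayesian spatial regression model that jointly analyzes multi-resolution spatial data by decomposing global tumor-gradient effects, patient-specific effects, and spatial dependence; in renal cell carcinoma imaging it identifies changes in immune-tumor colocalization associated with the EMT gradient.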

Comments: 45 pages (30 before supplement), 6 figures, submitted to ISBA and JSM

Abstract

Hierarchical multiplex imaging approaches generate spatially resolved single-cell measurements across multiple, spatially organized fields of view (FOVs) within patient tumor specimens, thereby enabling systematic investigation of how the organization of the tumor microenvironment varies along biologically meaningful intratumoral gradients. Existing approaches fail to jointly address this multi-resolution data structure needed to recover true biological signals. We propose MoSAIC: multi-resolution spatial regression analysis of cell colocalizations, a hierarchical Bayesian spatial regression model designed for multi-resolution spatial data. MoSAIC decomposes the joint variation into three model components: (i) global tumor-gradient effects, (ii) patient-specific effects to capture inter-patient variability, and (iii) Gaussian process models to account for spatial dependence between FOVs within each patient tumor tissue. Simulations demonstrate MoSAIC has improved prediction and model fit compared to existing spatial and non-spatial model alternatives. Our method is motivated by and applied to a renal cell carcinoma multiplex imaging cohort to investigate immune-tumor colocalization patterns across the epithelial-to-mesenchymal transition (EMT) gradient. MoSAIC identifies increased macrophage-tumor colocalization and decreased cytotoxic T-tumor colocalization progressing across the increasing EMT gradient, consistent with EMT-associated immune suppression and spatially varying immune engagement. Overall, MoSAIC provides an interpretable, multi-resolution framework for quantifying spatial tumor-gradient effects in cancer imaging studies. Software is available on GitHub at jcaldous/MoSAIC.

2605.30266 2026-05-29 math.ST stat.TH

Wasserstein Least Squares: A Canonical Regression Method for Probability Distributions

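Uriel Martínez León, Jonathan Niles-Weed

AI summary: Proposes the Wasserstein least squares regression method, shows from the perspective of convex analysis that it is the canonical extension of Euclidean least squares to the space of probability distributions, obtains an $n^{-1/2}$ estimation rate under the template deformation model, and applies it to demographic data.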

Abstract

We perform a mathematical and statistical analysis of the Wasserstein least squares problem, a regression method for vector-valued covariates and distribution-valued responses. Our proposal contrasts with other distributional regression methods by having a direct interpretation in terms of random variables, as a nonparametric analogue of the classic random-effects model. On the mathematical side, we use a strategy of Lavenant (2024) to show that Wasserstein least squares is the canonical extension of Euclidean least squares to the space of probability distributions from the perspective of convex analysis; this viewpoint gives rise to multimarginal and dual formulations of the Wasserstein least squares problem, extending a similar theory for Wasserstein barycenters. We perform a statistical analysis of the Wasserstein least squares problem under the template deformation model, showing, surprisingly, that estimation is possible at the n^{-1/2} rate. As a special case, we obtain improved rates of estimation for Wasserstein barycenters, which are an exponential improvement over those established by Ahidar-Coutrix, Le Gouic and Paris (2020). Finally, we propose a heuristic particle method for Wasserstein least squares and use it to conduct a novel analysis of large-scale demographic data from the RAND Health and Retirement Study.
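
The abstract builds on the theory of Wasserstein barycenters. In one dimension these have a simple closed form: the quantile function of the barycenter is the weighted average of the input quantile functions. The sketch below illustrates only that special case for empirical measures with equal sample sizes; it is not the paper's regression estimator or its particle method, and `barycenter_1d` is a name chosen here.

```python
def barycenter_1d(samples, weights):
    """W2 barycenter of equal-size 1-D empirical measures via quantile averaging."""
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal sample sizes")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    sorted_samples = [sorted(s) for s in samples]
    n = sizes.pop()
    # Average the k-th order statistics across the input samples.
    return [
        sum(w * s[k] for w, s in zip(weights, sorted_samples))
        for k in range(n)
    ]

bary = barycenter_1d([[2.0, 0.0], [4.0, 6.0]], [0.5, 0.5])
```

Sorting plays the role of the optimal coupling here, which is why the one-dimensional case is so much easier than the general problem the paper studies.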

2605.30209 2026-05-29 econ.GN q-fin.EC stat.AP

Betting Against Integrity: Identifying Match-Fixing Through In-Play Market Dynamics

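David Winkelmann, Maya Vienken, Christian Deutscher, Roland Langrock

AI summary: Uses high-frequency live-betting data from the Italian Serie B to describe standard betting market dynamics and predict expected betting volumes with a state-space model, then applies outlier detection to identify abnormal betting behaviour, providing statistical support for the early detection of match-fixing.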

Abstract

Match-fixing undermines the integrity of sport by eroding public trust and threatening the financial sustainability of clubs and leagues. The global expansion of sports betting markets has created new incentives and opportunities for manipulation, calling for rigorous, data-driven monitoring tools. Football, which accounts for the largest share of global betting turnover, remains particularly exposed: integrity reports continue to flag several suspicious matches, with past scandals in Italy and Turkey underlining the problem's persistence. This study uses high-frequency live-betting data from the Italian Serie B (2018/19-2020/21) to explore statistical approaches for detecting abnormal betting behaviour. A state-space modelling framework is employed to describe standard betting market dynamics and to predict expected betting volumes conditional on match characteristics. Deviations from these expectations can then be analysed using outlier detection techniques to identify potentially suspicious periods. The results demonstrate how statistical modelling can contribute to the early identification of irregular betting patterns, thereby supporting integrity assurance in live sports betting markets.

2605.30178 2026-05-29 stat.ME stat.CO

Cellwise Robust Discriminant Analysis

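Fabio Centofanti, Can Hakan Dagidir, Mia Hubert, Peter J. Rousseeuw

AI summary: To handle cellwise outliers in the data matrix, proposes discriminant analysis methods (cellQDA/cellLDA) based on cellwise robust estimation and penalized maximum likelihood that handle cellwise and casewise outliers as well as missing values in both training and prediction.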

Abstract

Classical discriminant analysis (DA) is based on the mean and empirical covariance matrix of each class, both of which are sensitive to outliers in the data. In the past, the focus was on casewise outliers, that is, datapoints that lie far away. But nowadays there is increasing interest in cellwise outliers, which are unexpected entries in the data matrix. Removing an entire case because it has one or a few outlying cells would lose much information. Cellwise robust methods aim to detect the outlying cells and to preserve the information in the other cells. We propose a DA method that is trained by estimating the location and covariance of each class with cellwise and casewise robust estimators, which can also handle missing values (NAs). The main novelty of our approach is in the prediction on test data, which may themselves contain outlying cells and NAs. The new robust discriminant function is derived from a novel statistical model by penalized maximum likelihood. We focus on quadratic DA, but also cover the setting of linear DA. The new cellQDA and cellLDA methods perform well in simulation. The approach is illustrated on real data, and the results are interpreted with the help of graphical displays.
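
The difference between casewise and cellwise thinking can be seen in a small example. The sketch below flags individual cells with a univariate median/MAD rule, a generic robust device used here purely for illustration; it is not the cellwise estimator or the discriminant rule proposed in the paper, and `flag_cells`, the cutoff of 3, and the toy data are assumptions.

```python
import statistics

def flag_cells(rows, cutoff=3.0):
    """Flag cells whose robust z-score |x - median| / (1.4826 * MAD) exceeds cutoff.

    Missing values are given as None and are never flagged.
    """
    n_cols = len(rows[0])
    flags = [[False] * n_cols for _ in rows]
    for j in range(n_cols):
        column = [r[j] for r in rows if r[j] is not None]
        med = statistics.median(column)
        mad = statistics.median(abs(x - med) for x in column)
        scale = 1.4826 * mad
        for i, r in enumerate(rows):
            if r[j] is not None and scale > 0:
                flags[i][j] = abs(r[j] - med) / scale > cutoff
    return flags

data = [
    [1.0, 10.0, 5.1],
    [1.2, 11.0, 4.9],
    [0.9, 95.0, 5.0],   # one outlying cell; the rest of the row is ordinary
    [1.1, None, 5.2],   # a missing value
    [1.0, 10.5, 4.8],
]
flags = flag_cells(data)
```

Only the single cell holding 95.0 is flagged, so the other two entries of that row remain available. A casewise approach would discard the whole row.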

2605.30175 2026-05-29 astro-ph.HE cs.LG stat.ML

A new completely parameter-free clustering algorithm for unsupervised classification of BATSE gamma-ray bursts

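Soumita Modak

AI summary: Proposes a completely parameter-free clustering algorithm to classify the BATSE gamma-ray burst sample; the result supports two groups (short and long bursts), consistent with the merger-collapsar theory.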

Abstract

Cluster analysis is a widely applied machine learning technique for understanding the patterns in the population of gamma-ray bursts (GRBs) and thereby exploring their physical sources. At present, the number of clusters corresponding to distinguishable groups is still disputed, despite numerous attempts with state-of-the-art clustering procedures. This crucial unknown parameter needs to be evaluated, either directly or indirectly in terms of other tuning parameters, in order to produce clusters of GRBs with an appropriate clustering algorithm. While most of the applied algorithms arrived at two physically explained groups, mergers and collapsars, dominated by short and long bursts respectively, other statistical approaches contradicted this binary partition. However, the physical basis of any additional cluster(s) has not yet been confirmed. Therefore, we propose a new algorithm, from a different stream of clustering referred to as "completely parameter-free", which classifies GRBs in a manner that has not been tried so far. It indicates two main groups in the BATSE sample, short- and long-duration bursts, compatible with the merger-collapsar theory.

2605.30167 2026-05-29 stat.ML cs.CV cs.LG stat.AP

Visual Spatial Learning: Single-Field Spatial Interpolation Using Convolutional Neural Networks

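Daniel Tinoco, Raquel Menezes, Carlos Baquero, Alexandra Silva

AI summary: Proposes a convolutional neural network (CNN) architecture that learns spatial interpolation directly from a single partially observed field, without external data or prior fields, as an alternative to Kriging.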

Comments: 53 pages, 10 figures

Abstract

Predicting a complete spatially correlated field from sparse observations is a fundamental challenge in spatial statistics and environmental modelling. Classical interpolation methods such as Kriging rely on Gaussian process assumptions and variography, which can limit their effectiveness in non-stationary settings and require substantial domain expertise. In this work, we leverage an architecture based on convolutional neural networks (CNNs) for spatial interpolation that is trained and applied on a single partially observed field, without access to external data or prior fields. The model is supervised directly on the observed locations and learns to predict values at unobserved points on the user defined grid. Unlike Kriging, our method does not require explicit covariance modelling or variogram estimation, and it can flexibly capture local spatial patterns in a data-driven manner. This work demonstrates the potential of CNNs for single-instance spatial interpolation under sparse supervision, offering a practical alternative to classical geostatistical methods, and extending the use of CNNs to a new problem domain.
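
The phrase "supervised directly on the observed locations" suggests a loss that is evaluated only where data exist. A minimal sketch of such a masked loss follows. The choice of squared error and the name `masked_mse` are assumptions made for illustration, and the network that produces `pred` is left out entirely.

```python
def masked_mse(pred, target, mask):
    """Mean squared error over observed grid cells only (mask value 1)."""
    total, count = 0.0, 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:
                total += (p - t) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no observed cells")
    return total / count

# 2x3 grid; only three cells are observed. Unobserved targets are placeholders.
target = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
mask   = [[1,   0,   1  ], [0,   1,   0  ]]
pred   = [[1.5, 9.0, 2.0], [9.0, 2.0, 9.0]]
loss = masked_mse(pred, target, mask)
```

Predictions at unobserved cells (the 9.0 entries) do not affect the loss, so the model is free to fill them in from the spatial patterns it learns at the observed cells.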

2605.30158 2026-05-29 stat.ME

High-Dimensional Data with Measurement Error

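Herman Tesso, Georges Nguefack-Tsague

AI summary: For high-dimensional regression with measurement error in the covariates, reviews and compares ridge, lasso, the Dantzig selector, and the elastic net together with their measurement-error-corrected variants, and offers practical recommendations based on simulations and real data.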

Comments: 21 pages, 0 figures

Abstract

In many important statistical analyses, the number of covariates $p$ often exceeds the data size $n$, a regime commonly referred to as high-dimensional. While considerable progress has been made in high-dimensional regression under the assumption of error-free covariates, real-world data frequently involve noisy or corrupted measurements. When left unaddressed, measurement errors can silently distort the analysis and mislead the conclusions. This paper reviews and evaluates some advisable statistical inference methods for high-dimensional regression in the presence of mismeasured covariates. We discuss four penalized regression methods -- ridge, lasso, Dantzig selector, and Elastic-net -- alongside their measurement-error-corrected variants, and conduct a comparative study under linear additive and uncorrelated measurement error models. Through simulation studies and a real application to high-dimensional medical genetic data, we illustrate the methods studied, show that the choice of correction procedure is problem-specific, and provide practical recommendations to help practitioners navigate this methodological landscape.
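
Why uncorrected measurement error "silently distorts" an analysis is easiest to see with a single covariate. The simulation below is a low-dimensional illustration under an additive, uncorrelated error model with a known error variance; it is not one of the penalized methods reviewed in the paper, and all names and parameter values are hypothetical.

```python
import random

def slope_estimates(n=20_000, beta=2.0, sigma_u=1.0, seed=0):
    """Simulate y = beta * x + noise with observed covariate w = x + u.

    Returns (naive OLS slope on w, slope corrected with the known sigma_u^2).
    """
    rng = random.Random(seed)
    w, y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        w.append(x + rng.gauss(0.0, sigma_u))      # mismeasured covariate
        y.append(beta * x + rng.gauss(0.0, 0.5))   # response uses the true x
    w_bar, y_bar = sum(w) / n, sum(y) / n
    cov_wy = sum((a - w_bar) * (b - y_bar) for a, b in zip(w, y)) / n
    var_w = sum((a - w_bar) ** 2 for a in w) / n
    naive = cov_wy / var_w
    corrected = cov_wy / (var_w - sigma_u**2)      # subtract the error variance
    return naive, corrected

naive, corrected = slope_estimates()
```

With unit variance for both the true covariate and the error, the naive slope is attenuated toward roughly half of the true value of 2, while subtracting the known error variance from the denominator approximately recovers it. The corrected variants discussed in the abstract apply the same kind of adjustment in the penalized, high-dimensional setting.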

2605.30157 2026-05-29 stat.AP

Leveraging Large Language Models to Improve Precision in Randomized Controlled Trials

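Jaylin Lowe, Adam Sales, Johann A. Gagnon-Bartsch

AI summary: Explores how to use large language model (LLM) predictions safely and rigorously to improve the precision of randomized controlled trials (RCTs), and validates the approach in three case studies.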

Comments: Submitted to Machine Learning and Artificial Intelligence for Causal Inference in the Behavioral and Social Sciences: Methodological Advances and Applications, a topical issue of the Zeitschrift für Psychologie

Abstract

Large language models (LLMs) are increasingly used in statistical research and applications. However, they are also notorious for producing unreliable or biased information. Here, we explore whether LLMs can be used to improve the precision of randomized controlled trials (RCTs) in a safe and rigorous way. Following similar work on leveraging observational data, we incorporate LLM predictions into an RCT analysis. While incorporating external predictions to improve precision is not new, the value of using LLM predictions in this manner is an open question. We develop a pipeline for best leveraging LLM predictions in this context and apply it to three different case studies. We find that these predictions can safely improve precision, particularly when the RCT lacks predictive covariates or contains covariates, such as text data, that are well-suited to LLMs.
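
One generic way in which external predictions can sharpen a trial analysis without risking bias is to compare residuals rather than raw outcomes. The sketch below shows that idea only; it is not the authors' pipeline, and the function names and toy numbers are hypothetical. The key assumption, stated in the docstring, is that the predictions do not depend on treatment assignment.

```python
from statistics import mean

def adjusted_effect(outcomes, treated, predictions):
    """Difference in means of residuals y - y_hat between arms.

    `predictions` must be fixed before (or independently of) treatment
    assignment, for example produced from baseline covariates only.
    """
    resid_t = [y - p for y, t, p in zip(outcomes, treated, predictions) if t]
    resid_c = [y - p for y, t, p in zip(outcomes, treated, predictions) if not t]
    return mean(resid_t) - mean(resid_c)

def unadjusted_effect(outcomes, treated):
    """Plain difference in means (residuals against a zero prediction)."""
    return adjusted_effect(outcomes, treated, [0.0] * len(outcomes))

# Toy trial: hypothetical outcomes, assignments, and external predictions.
outcomes    = [12.0, 15.0, 9.0, 14.0, 10.0, 13.0]
treated     = [True, True, False, True, False, False]
predictions = [10.0, 13.0, 9.5, 12.0, 9.0, 13.5]

tau_adj = adjusted_effect(outcomes, treated, predictions)
tau_raw = unadjusted_effect(outcomes, treated)
```

Because the predictions are unrelated to who was randomized to treatment, subtracting them leaves the estimator unbiased under randomization. When they track the outcomes well, the residuals vary less than the raw outcomes, which is where any gain in precision comes from.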

2605.30153 2026-05-29 stat.ML cs.IT cs.LG math.IT math.ST stat.TH

Diffusion Models Are Statistically Optimal for Learning Low-Dimensional Multi-Modal Distributions

扩散模型在学习低维多模态分布时具有统计最优性

Jingda Wu, Changxiao Cai

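AI summary: This paper proves that, when learning distributions supported on a union of low-dimensional subspaces, diffusion models have a sample complexity that depends only on the intrinsic dimension and attain a near-optimal 1-Wasserstein error rate, without smoothness or bounded-density assumptions.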

详情
Comments
accepted to ICML 2026
AI中文摘要

基于分数的扩散模型在学习高维分布,特别是那些具有低维和多模态结构的分布方面,已经展现出显著的实证成功。然而,对其统计效率的理论理解仍然有限。现有理论通常依赖于强正则性假设,例如一致有界密度或全局光滑的分数函数,这些假设无法捕捉此类内在结构。在这项工作中,我们研究了扩散模型在学习支撑在低维子空间并集上的分布时的样本复杂度。假设每个子空间内的数据分布是次高斯的,我们证明扩散模型最多需要$\widetilde{O}(\varepsilon^{-k \vee 2})$个样本即可在1-Wasserstein距离上达到$\varepsilon$误差,其中$k$是内在维度。这一近最优的收敛速率仅依赖于内在维度,并显著改进了先前遭受维度灾难的理论保证。值得注意的是,我们的分析适用于广泛的分布,无需施加光滑性、有界密度或对数凹性假设。总体而言,我们的结果表明,扩散模型能够统计适应内在低维结构,同时自然容纳多模态数据,为其在复杂高维学习任务中的成功提供了严格的理论依据。

英文摘要

Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their statistical efficiency remains limited. Existing theories typically rely on strong regularity assumptions, such as uniformly bounded densities or globally smooth score functions, which fail to capture such intrinsic structures. In this work, we study the sample complexity of diffusion models for learning distributions supported on a union of low-dimensional subspaces. Assuming that the data distribution within each subspace is subgaussian, we show that diffusion models require at most $\widetilde{O}(\varepsilon^{-k \vee 2})$ samples to achieve $\varepsilon$ error in 1-Wasserstein distance, where $k$ is the intrinsic dimension. This near-optimal convergence rate depends only on the intrinsic dimension and significantly improves upon prior theoretical guarantees that suffer from the curse of dimensionality. Notably, our analysis applies to a broad collection of distributions without imposing smoothness, bounded-density, or log-concavity assumptions. Overall, our results show that diffusion models can statistically adapt to intrinsic low-dimensional structure while naturally accommodating multi-modal data, offering a rigorous theoretical justification for their success in complex high-dimensional learning tasks.

2605.30134 2026-05-29 stat.CO

Accurate and Efficient MCMC for Latent Position Models

潜在位置模型的精确高效MCMC

Zonghao Li, Aaron Smith

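AI summary: For Bayesian inference in latent position models (LPMs), the paper proposes two MCMC algorithms: one runs in almost O(|E|) time with stronger accuracy guarantees, and the other runs in almost O(|V|) time with slightly improved accuracy. The main innovation is an auxiliary data structure that can be updated cheaply.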

详情
Comments
43 pages, 8 figures
AI中文摘要

潜在位置模型(LPM)是一类广泛流行的随机图模型。然而,拟合贝叶斯LPM在计算上具有挑战性——即使只计算一次似然,所需时间也与观测图$G = (V,E)$的顶点数$|V|$呈二次关系。许多先前的工作引入了近似MCMC算法来加速,其中最接近我们的是Rastelli等人(2024),他们提出了一种算法,其摊销运行时间几乎可以降低到$O(|E|)$,并且在合理的推理问题上具有良好的实证表现。本文提供了两种解决同一问题的算法:一种“快速”算法,其运行时间与Rastelli等人相同,几乎为$O(|E|)$量级,但具有更强的精度保证;另一种“更快”算法,其运行时间改进为几乎$O(|V|)$,精度保证相比Rastelli等人略有提升(但不足以应对所有任务)。主要改进来自于引入一种简单的辅助数据结构,该结构可以在MCMC运行期间廉价地更新;我们怀疑这种“廉价草图”可能对其他MCMC算法也有用。

英文摘要

Latent position models (LPMs) are a large and popular class of models for random graphs. However, fitting Bayesian LPMs is computationally challenging: computing the likelihood even once takes time that is quadratic in the number of vertices $|V|$ of the observed graph $G = (V,E)$. Many previous papers have introduced approximate MCMC algorithms to speed this up, with the most similar to ours, Rastelli et al. (2024), presenting an algorithm that has amortized running time that can be reduced almost to $O(|E|)$ and good empirical performance on reasonable inference problems. The present paper offers two algorithms for solving the same problem: a "fast" algorithm with running time of the same almost-$O(|E|)$ order as Rastelli et al. and much stronger accuracy guarantees, and a "faster" algorithm with an improved running time of almost $O(|V|)$, and accuracy guarantees that are slightly improved compared to Rastelli et al. (but not sufficient for all tasks). The main improvements come from the introduction of a simple auxiliary data structure that can be cheaply updated during an MCMC run; we suspect that the same "cheap sketch" may be useful for other MCMC algorithms.

2605.30132 2026-05-29 cs.LG stat.ML

Learning to Extrapolate to New Tasks: A Relational Approach to Task Extrapolation

学习外推到新任务:一种关系型任务外推方法

Adam Ousherovitch, Yixin Wang

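AI summary: Proposes the Relational Task Extrapolator (RTE), which decomposes a target task into an anchor task and a transformation and learns a relational operator, enabling systematic extrapolation to unseen tasks. It substantially outperforms existing methods on function prediction and sequence prediction.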

详情
Comments
ICML 2026
AI中文摘要

现代学习系统擅长内插,但难以泛化到训练分布支持范围之外的未见任务。即使在简单设置中(如处理超出训练范围的任务参数),这种失败也会发生,并且尽管基础模型取得了进展,问题依然存在。为此,我们开发了关系型任务外推器(RTE),一种旨在实现向新任务系统性外推的算法。关键观察是外推本质上是关系型的:外推到未见任务需要学习任务如何相互转换。如果模型在训练期间学习了任务A和B之间的变换,它可以在测试时应用相同的变换来关联已知任务和未见任务。RTE通过将每个目标任务分解为一个已知的锚定任务和一个连接锚定与目标的变换来实现这一思想。然后它学习一个关系算子,将锚定-变换对映射到目标任务的预测。我们在函数预测的多个任务外推场景中实例化RTE,例如目标任务使用超出范围的参数(参数外推)、具有更大的组合深度(长度外推)和/或以未见方式重新组合函数原语(组合外推)。我们进一步将RTE扩展到序列预测,将其集成到基础模型的微调算法中。在实证研究中,我们发现RTE在向新颖、未见任务的外推上显著优于现有方法。

英文摘要

Modern learning systems excel at interpolation but struggle to generalize to unseen tasks outside the training distribution's support. This failure occurs even in simple settings, such as handling task parameters beyond the training range, and persists despite advances in foundation models. To this end, we develop the Relational Task Extrapolator (RTE), an algorithm designed to enable systematic extrapolation to novel tasks. The key observation is that extrapolation is inherently relational: extrapolating to unseen tasks requires learning how tasks transform into one another. If a model learns the transformation between tasks A and B during training, it can apply that same transformation to relate known tasks to unseen ones at test time. RTE operationalizes this idea by decomposing each target task into a known anchor task and a transformation linking the anchor and target. It then learns a relational operator, mapping an anchor-transformation pair to predictions for the target task. We instantiate RTE across multiple task extrapolation regimes in function prediction, e.g. where target tasks use out-of-range parameters (parameter extrapolation), have greater compositional depth (length extrapolation), and/or recombine function primitives in unseen ways (compositional extrapolation). We further extend RTE to sequence prediction, integrating it into fine-tuning algorithms for foundation models. Across empirical studies, we find that RTE substantially outperforms existing approaches on extrapolation to novel, unseen tasks.

2605.30113 2026-05-29 math.ST cs.CC cs.DS math.PR stat.TH

Low-degree estimation thresholds in planted hypergraphs and tensor PCA

植入超图和张量PCA中的低度估计阈值

Daniel Fu, Youngtak Sohn

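AI summary: Using the low-degree framework, this paper studies statistical-computational gaps in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior, identifies sharp thresholds for low-degree estimation, and gives polynomial-time algorithms.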

详情
Comments
67 pages, 1 figure
AI中文摘要

高维统计学中的一个核心问题是理解统计-计算差距:即恢复隐藏信号在信息论上可能但推测在计算上难以处理的区域。低度框架通过将估计器限制为观测数据中次数至多为$D$的多项式,提供了一种研究这一差距的具体方法。本文研究了植入稠密子超图、稀疏张量PCA以及具有一般先验的张量PCA中的低度估计。 对于$n$个顶点上的植入稠密子超图模型,我们根据植入集合是否大于或小于$\sqrt{n}$确定了两种情形。在此尺度之上,我们识别出低度估计的尖锐阈值。在此尺度之下,我们在先前工作预测的区域内建立了困难性,从而解决了Schramm和Wein(2022)以及Sohn和Wein(2025)的一个问题。对于稀疏张量PCA,我们识别出类似的尖锐相变。对于具有一般先验的张量PCA,我们在关键信号尺度上证明了低度估计下界,与先前工作提示的度-信号权衡相匹配。 我们的下界适用于次数$D=n^δ$,其中$n$是维度,$δ>0$是常数,并且我们通过相应的低度上界进行补充。此外,对于$\sqrt{n}$尺度以上的植入稠密子超图和稀疏张量PCA,我们将上界转化为多项式时间算法,在尖锐阈值以上实现几乎精确恢复,从而得到成功达到该阈值的多项式时间算法。我们的证明通过条件变体扩展了Sohn和Wein(2025)的框架,在无条件方法不足的设置中得到了正确的信噪比。

英文摘要

A central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The low-degree framework offers a concrete way to study this gap by restricting attention to estimators that are polynomials of degree at most $D$ in the observed data. In this paper, we study low-degree estimation in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior. For the planted dense subhypergraph model on $n$ vertices, we identify two regimes depending on whether the planted set is larger or smaller than $\sqrt{n}$. Above this scale, we identify a sharp threshold for low-degree estimation. Below this scale, we establish hardness in the regimes predicted by prior work, thereby resolving a question of Schramm and Wein (2022) and Sohn and Wein (2025). For sparse tensor PCA, we identify an analogous sharp phase transition. For tensor PCA with a general prior, we prove a low-degree estimation lower bound at the critical signal scale, matching the degree--signal tradeoff suggested by prior work. Our lower bounds apply to degree $D=n^δ$, where $n$ is the dimension and $δ>0$ is a constant, and we complement them with corresponding low-degree upper bounds. In addition, for planted dense subhypergraph and sparse tensor PCA above the $\sqrt{n}$ scale, we convert our upper bounds into polynomial-time algorithms that achieve almost exact recovery above the sharp threshold, yielding polynomial-time algorithms succeeding up to this threshold. Our proofs extend the framework of Sohn and Wein (2025) through a conditional variant that yields the correct signal-to-noise ratio in settings where the unconditional approach is insufficient.

2605.30095 2026-05-29 math.ST cs.IT eess.SP math.IT stat.TH

The generalized method of moments is (almost) statistically efficient in low-SNR Gaussian latent-variable models

广义矩方法在低信噪比高斯潜变量模型中(几乎)具有统计有效性

Amnon Balanov, Tamir Bendory, Dan Edidin

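AI summary: For low-SNR Gaussian latent-variable models, the paper proves that the generalized method of moments with optimal weighting has the same first-order asymptotic covariance as maximum likelihood estimation, providing a statistically efficient alternative.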

详情
AI中文摘要

我们研究了低信噪比(SNR)条件下的一类广泛的高斯潜变量模型,包括高斯混合和轨道恢复问题。我们证明,在该条件下,广义矩方法(GMoM)与最大似然估计的一阶渐近有效性相匹配。特别地,如果矩特征选择到识别所需的最小局部阶数并最优加权,则所得的GMoM估计量与最大似然估计量具有相同的主渐近协方差。我们的分析表明,在低信噪比下,这种等价性由分层局部几何结构决定:不同方向在不同矩阶数下变得信息丰富,将空间划分为具有不同SNR缩放比例的分层。我们证明了观测Fisher信息和GMoM信息算子在这些层上具有匹配的分层展开。因此,在低信噪比条件下,GMoM提供了最大似然的统计有效替代方案,同时保留了基于矩估计的计算优势。

英文摘要

We study estimation in the low signal-to-noise ratio (SNR) regime for a broad class of Gaussian latent-variable models, including Gaussian mixtures and orbit recovery problems. We show that, in this regime, the generalized method-of-moments (GMoM) matches the first-order asymptotic efficiency of maximum likelihood. In particular, if the moment features are chosen up to the minimal local order required for identification and are weighted optimally, then the resulting GMoM estimator has the same leading asymptotic covariance as the maximum-likelihood estimator. Our analysis shows that, in low SNR, this equivalence is governed by a layered local geometry: different directions become informative at different moment orders, partitioning the space into layers with distinct SNR scalings. We prove that the observed Fisher information and the GMoM information operator admit matching layerwise expansions across these layers. As a consequence, in the low-SNR regime, GMoM provides a statistically efficient alternative to maximum likelihood, while preserving the computational advantages of moment-based estimation.

2605.30085 2026-05-29 cs.AI cs.CL cs.LG stat.ML

Conformal Certification of Reasoning Trace Prefixes

推理轨迹前缀的保形认证

Matt Y. Cheung, Ashok Veeraraghavan, Hanjie Chen, Guha Balakrishnan

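AI summary: Proposes CROP, which selects a threshold by conformal calibration, returns the longest prefix whose step-level risk stays below that threshold, and controls the probability that the prefix contains an error, balancing the retention of valid reasoning against the discarding of misleading suffixes.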

详情
Comments
Code available at https://github.com/matthewyccheung/crop
AI中文摘要

语言模型推理轨迹很少是全有或全无;在关键错误发生之前,它们通常包含有效的中间步骤。现有的不确定性量化方法通常认证最终答案或整个响应,未能为顺序轨迹中可安全保留的比例提供统计保证。为了解决这个问题,我们引入了CROP(保形推理输出前缀),一种与验证器无关的校准程序,用于干净前缀认证。给定任何步骤级风险代理,CROP选择一个校准阈值,并返回其步骤风险代理保持低于该阈值的最长连续前缀,将未认证的后缀路由到下游审查或修复。假设可交换性,CROP严格控制了返回前缀包含注释错误的边际概率。在六个过程标记的推理数据集上,我们证明了标准步骤级指标(如AUROC)不能完全捕捉前缀效用,建议验证器应改为通过认证前缀长度进行评估。此外,CROP平衡了过度保留和不足保留,通过保留有效的中间推理同时丢弃误导后缀,提高了下游修复的准确性。最终,这项工作将前缀认证定位为过程监督、弃权和修复之间的严格、实用的桥梁。

英文摘要

Language model reasoning traces are rarely all-or-nothing; they frequently contain valid intermediate steps before a critical error occurs. Existing uncertainty quantification methods typically certify final answers or entire responses, failing to provide statistical guarantees for the proportion of a sequential trace that can be safely retained. To address this, we introduce CROP (Conformal Reasoning Output Prefixes), a verifier-agnostic calibration procedure for clean-prefix certification. Given any step-level risk proxy, CROP selects a calibrated threshold and returns the longest contiguous prefix whose step risk proxies remain below it, routing the uncertified suffix for downstream review or repair. Assuming exchangeability, CROP rigorously controls the marginal probability that the returned prefix contains an annotated error. Across six process-labeled reasoning datasets, we demonstrate that standard step-level metrics such as AUROC do not fully capture prefix utility, suggesting verifiers should instead be evaluated by certified prefix length. Furthermore, CROP balances over- and under-withholding, improving downstream repair accuracy by preserving valid intermediate reasoning while discarding misleading suffixes. Ultimately, this work positions prefix certification as a rigorous, practical bridge between process supervision, abstention, and repair.
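
Illustrative sketch (not taken from the paper): the abstract describes choosing a calibrated threshold and keeping the longest prefix whose step risks stay below it. The code below is one generic split-conformal way to do that, under assumptions of our own: each calibration trace is a pair `(risks, first_error)`, the prefix keeps steps with risk strictly below the threshold, and the threshold is the k-th smallest calibration score with k = floor(alpha * (n + 1)). The function names are hypothetical, and the actual CROP procedure may differ in its score and quantile rule.

```python
import math

def error_score(risks, first_error):
    """Largest step risk up to and including the first annotated error.

    A prefix cut at threshold t contains the error exactly when this
    score is strictly below t. Traces without an error get +inf.
    """
    if first_error is None:
        return math.inf
    return max(risks[: first_error + 1])

def calibrate_threshold(calibration, alpha):
    """Pick the k-th smallest calibration score, k = floor(alpha * (n + 1)).

    Under exchangeability, a new trace's score falls strictly below this
    value with probability at most k / (n + 1) <= alpha.
    """
    scores = sorted(error_score(risks, err) for risks, err in calibration)
    k = math.floor(alpha * (len(scores) + 1))
    if k == 0:
        return -math.inf  # too little data for this alpha: certify nothing
    return scores[k - 1]

def certified_prefix(risks, threshold):
    """Longest contiguous prefix whose step risks are all below the threshold."""
    kept = 0
    for risk in risks:
        if risk >= threshold:
            break
        kept += 1
    return risks[:kept]

# Toy calibration set: (step risks, index of first annotated error or None).
calibration = [
    ([0.1, 0.9], 1),
    ([0.2, 0.3, 0.8], 2),
    ([0.1, 0.2], None),
    ([0.5, 0.6], 0),
    ([0.3, 0.7], 1),
    ([0.2, 0.4, 0.6], 2),
    ([0.1], None),
]
threshold = calibrate_threshold(calibration, alpha=0.25)
prefix = certified_prefix([0.1, 0.3, 0.7, 0.2], threshold)
```

In this toy run the uncertified suffix (everything after `prefix`) is what would be routed to downstream review or repair.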

2605.30072 2026-05-29 stat.ME

Credible rectangles for high-dimensional posterior comparison

高维后验比较的可信矩形

Alice Chevaux, Julyan Arbel, Guillaume Kon Kam King, Sophie Achard

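AI summary: Proposes a Bayesian framework that constructs and compares credible hyperrectangles of posterior distributions to quantify and compare uncertainty in brain connectivity graph analysis, with scalable high-dimensional algorithms and theoretical guarantees.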

详情
Comments
35 pages, 4 figures
AI中文摘要

我们提出了一种用于脑连接图分析中不确定性量化和比较的贝叶斯框架。标准的基于图的方法通常依赖于相关矩阵的点估计,忽视了从有限数据中进行高维估计所引入的不确定性。我们的方法构建并比较从后验分布导出的可信超矩形,为个体水平推断和纵向监测提供了可解释的工具。我们开发了在高维中估计这些区域的可扩展算法,并在静息态fMRI数据的逆Wishart模型中建立了理论保证,包括相关矩阵的Bernstein--von Mises定理和贝叶斯族系错误率的控制。所提出的框架能够在保持联合依赖结构的同时,从全局和局部两个层面原则性地检测显著的连接差异。在合成数据集上,该方法与多重检验程序相比表现出竞争性能,同时它还促进了单个患者两次不同扫描的直接比较,这是文献中目前缺失的能力。我们利用这一新颖性在真实数据集上提高了可解释性。除了fMRI数据,该方法为高维依赖环境中的比较问题提供了一个通用框架。

英文摘要

We propose a Bayesian framework for uncertainty quantification and comparison in brain connectivity graph analysis. Standard graph-based approaches typically rely on point estimates of correlation matrices, overlooking the uncertainty induced by high-dimensional estimation from limited data. Our methodology constructs and compares credible hyperrectangles derived from posterior distributions, providing interpretable tools for subject-level inference and longitudinal monitoring. We develop scalable algorithms for estimating these regions in high dimensions and establish theoretical guarantees in the inverse-Wishart model for resting-state fMRI data, including a Bernstein--von Mises theorem for correlation matrices and control of a Bayesian family-wise error rate. The proposed framework enables principled detection of significant connectivity differences both globally and locally while preserving joint dependency structures. While demonstrating competitive performance against multiple-testing procedures on synthetic datasets, our approach also facilitates the direct comparison of two distinct scans from a single patient, a capability currently absent from the literature. We leverage this novelty on real datasets to improve interpretability. Beyond fMRI data, the approach provides a general framework for comparison problems in high-dimensional dependent settings.

2605.30071 2026-05-29 math.ST stat.TH

On multiplicative bias correction in kernel density estimation

核密度估计中的乘性偏差校正

M. C. Jones, D. F. Signorini, Nils Lid Hjort

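AI summary: This paper combines the Hjort-Glad and Jones-Linton-Nielsen semiparametric density estimation methods into a new multiplicative bias-corrected estimator that in theory achieves higher-order bias and better performance when the parametric model fits, although simulations show limited practical gains for small to moderate samples.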

详情
Journal ref
Sankhya: the Indian Journal of Statistics, Series A, 2009, pages 422-430
Comments
9 pages, no figures. This is the authors' manuscript, Statistical Research Report, Department of Mathematics, University of Oslo, later published, in essentially similar form, in Sankhya: the Indian Journal of Statistics, Series A, 2009, pages 422-430
AI中文摘要

Hjort和Glad(1995)提出了一种半参数密度估计方法。相对于普通核密度估计,当参数车辆分布拟合数据时,该技术表现更好,否则表现大致相同。Jones、Linton和Nielsen(1995)提出了一种类似但具有高阶偏差(对所有足够光滑的密度)的密度估计方法。本文中,我们结合了这两种方法。理论上,我们实现了期望的高阶偏差性质,同时对于合适的车辆模型具有更好的性能。模拟表明,对于小到中等样本量,新估计量在实践中仅实现了其理论潜力的一小部分。

英文摘要

Hjort and Glad (1995) present a method for semiparametric density estimation. Relative to the ordinary kernel density estimator, this technique performs much better when a parametric vehicle distribution fits the data, and otherwise performs at broadly the same level. Jones, Linton, and Nielsen (1995) present a somewhat similar method for density estimation which has higher order bias for all sufficiently smooth densities. In this paper, we combine the two methods. We show that, theoretically, the desired properties of general higher order bias allied with even better performance for an appropriate vehicle model are achieved. Simulations suggest that the new estimator realises only a little of its theoretical potential in practice for small to moderately large sample sizes.
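
Illustrative sketch (not taken from the paper): the two ingredients named in the abstract are usually written as a start density multiplied by a kernel-smoothed correction factor. Hjort-Glad uses a fitted parametric start, here a normal "vehicle" chosen purely as an assumption, and Jones-Linton-Nielsen uses a pilot kernel estimate. The combined estimator studied in the paper is not reproduced here, and the Gaussian kernel, bandwidth, and function names are choices made for this example only.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def kde(x, data, h):
    """Ordinary Gaussian-kernel density estimate at x with bandwidth h."""
    return sum(normal_pdf(x, xi, h) for xi in data) / len(data)

def hjort_glad(x, data, h):
    """Fitted normal start times the average of K_h(x - X_i) / start(X_i)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in data) / n)
    correction = sum(
        normal_pdf(x, xi, h) / normal_pdf(xi, mu, sigma) for xi in data
    ) / n
    return normal_pdf(x, mu, sigma) * correction

def jones_linton_nielsen(x, data, h):
    """Pilot kernel estimate times the average of K_h(x - X_i) / pilot(X_i)."""
    n = len(data)
    pilot_at_data = [kde(xi, data, h) for xi in data]  # O(n^2) evaluations
    correction = sum(
        normal_pdf(x, xi, h) / p for xi, p in zip(data, pilot_at_data)
    ) / n
    return kde(x, data, h) * correction

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
hg_at_zero = hjort_glad(0.0, data, h=0.4)
jln_at_zero = jones_linton_nielsen(0.0, data[:300], h=0.4)
```

With standard normal data the normal start is correctly specified, so the Hjort-Glad value at 0 should sit near the true density 1/sqrt(2*pi), roughly 0.399.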

2605.30059 2026-05-29 cs.LG cond-mat.stat-mech stat.ML

Ridge Regression from Poisson Resetting: A Renewal Perspective on Spectral Regularization

泊松重置的岭回归:谱正则化的更新视角

Petar Jolakoski

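AI summary: Connects stochastic resetting in non-equilibrium statistical physics with ridge regularization in statistical learning, shows that resetting a linear gradient flow to the origin at rate r yields a stationary mean equal to the ridge estimator, and extends the result to general renewal reset laws that generate alternative spectral filters.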

详情
AI中文摘要

我们将非平衡统计物理中的随机重置与统计学习中的岭正则化联系起来。对于线性梯度流,以速率$r$重置到原点产生稳态均值$(X^\top X+rI)^{-1}X^\top y$,这正是惩罚项$\lambda=r$的岭估计。这利用了岭回归与梯度流指数时间平均之间已知的拉普拉斯变换关系,其中指数时间现在被解释为与泊松重置相关的稳态年龄。然后我们将这一恒等式推广到一般更新重置律:指数重置时间分布是唯一的更新律,其稳态均值在每个特征方向上作为精确的滤波器恒等式对每个正曲率重现标量岭,而非指数更新律则生成替代的谱滤波器。在波动层面,我们研究了一个具有恒定扩散的独立加性奥恩斯坦-乌伦贝克扩展,解释为一种风格化的SGD近似。在这种设定下,等式仅在均值层面成立,因为重置过程由于累积的OU噪声和重置时序方差具有非零稳态协方差,而确定性岭是一个具有相同中心的固定估计量。风格化实验直接比较了确定性更新诱导的滤波器,并说明了非指数重置时间律诱导的滤波器何时可能在预测上与岭不同。关于稳态均值和诱导谱滤波器的结果是在二次目标上具有各向同性重置的连续时间梯度流下建立的;协方差和风险公式额外假设具有状态独立协方差的加性噪声。

英文摘要

We connect stochastic resetting from non-equilibrium statistical physics with ridge regularization in statistical learning. For linear gradient flow, resetting to the origin at rate $r$ produces stationary mean $(X^\top X+rI)^{-1}X^\top y$, exactly the ridge estimator with penalty $λ=r$. This uses the known Laplace-transform relationship between ridge regression and exponential-time averaging of gradient flow, with the exponential time now interpreted as the stationary age associated with Poisson resetting. We then extend this identity to general renewal reset laws: the exponential reset time distribution is the unique renewal law whose stationary mean reproduces scalar ridge in every eigendirection as an exact filter identity for every positive curvature, while non-exponential renewal laws generate alternative spectral filters. At the fluctuation level, we study a separate additive Ornstein-Uhlenbeck extension with constant diffusion, interpreted as a stylized SGD approximation. In this setting, the equality holds only at the level of the mean, since the reset process has a nonzero stationary covariance from accumulated OU noise and reset-timing variance, whereas deterministic ridge is a fixed estimator with the same center. Stylized experiments compare the deterministic renewal-induced filters directly and illustrate when filters induced by non-exponential reset-time laws can differ predictively from ridge. The results for the stationary mean and the induced spectral filters are established for continuous-time gradient flow with isotropic resetting on quadratic objectives; the covariance and risk formulas additionally assume additive noise with state-independent covariance.
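
Illustrative sketch (not taken from the paper): the mean-level identity can be checked numerically in a single eigendirection. We assume a quadratic loss with curvature `lam`, so gradient flow started at the origin reaches the fraction 1 - exp(-lam * t) of the least-squares solution at time t, and we take the stationary age under Poisson resetting at rate `r` to be exponential with rate `r`, as the abstract states. Averaging over that age should then give the ridge shrinkage factor lam / (lam + r). The function names are hypothetical.

```python
import math
import random

def ridge_filter(lam, r):
    """Shrinkage that ridge with penalty r applies along curvature lam."""
    return lam / (lam + r)

def resetting_filter_mc(lam, r, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[1 - exp(-lam * T)] for T ~ Exponential(rate r)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        age = rng.expovariate(r)  # time since the last reset
        total += 1.0 - math.exp(-lam * age)
    return total / n_samples

for lam, r in [(0.5, 1.0), (2.0, 0.3)]:
    print(lam, r, resetting_filter_mc(lam, r), ridge_filter(lam, r))
```

The agreement follows from the Laplace transform of the exponential law, E[exp(-lam * T)] = r / (r + lam). Swapping `expovariate` for a non-exponential age distribution would give a different filter, which is the direction the abstract's renewal extension points to.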

2605.30055 2026-05-29 math.PR math.FA math.ST stat.TH

The Wasserstein cost of Importance Sampling

重要性抽样的Wasserstein代价

Simon Coste, Michael Goldman

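AI summary: This paper proves that, in high dimension (d≥3), the expected Wasserstein cost of the importance sampling estimate decays at order n^{-p/d}, gives upper and lower bound constants, and finds that the optimal sampling distribution is a tempered version of g, reminiscent of Zador's theorem.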

详情
Comments
20 pages
AI中文摘要

重要性抽样(IS)包括将来自分布$f$的样本偏向于另一个分布$g$。具体来说,给定来自$f$的样本$X_i$,IS测度为$$\hat{g}_n = \frac{1}{Z_n}\sum_{i=1}^n \frac{g(X_i)}{f(X_i)} δ_{X_i},$$其中$Z_n = \sum_{i=1}^n \frac{g(X_i)}{f(X_i)}$。随机测度$\hat{g}_n$近似于$g$,并用于从蒙特卡洛积分到贝叶斯推断的许多背景中。我们证明,在高维($d \geqslant 3$)中,Wasserstein代价$W_p^p(\hat{g}_n, g)$的期望阶为$n^{-p/d}$,即$$β^{\mathrm{low}}_{p,d}\int gf^{-p/d}\leqslant \liminf_{n \to \infty} n^{p/d} \mathbb{E}[W_p^p(\hat{g}_n, g)] \leqslant \limsup_{n \to \infty} n^{p/d} \mathbb{E}[W_p^p(\hat{g}_n, g)] \leqslant β_{p,d} \int g f^{-p/d}$$其中$0<β^{\mathrm{low}}_{p,d}\leqslant β_{p,d}$是仅依赖于$p$和$d$的常数,对于$p=2$它们相等,并推测对于任何$p\geqslant 1$都相等。我们的结果对所有$p\geqslant 1$和$d\geqslant 3$都成立。在$β^{\mathrm{low}}_{p,d} = β_{p,d}$的情况下,我们证明重要性抽样的渐近最优抽样分布$f^*$不等于$g$,而是$g$的一个tempered版本,即$f^* \propto g^{d/(p+d)}$,这让人联想到测度量化的Zador定理。

英文摘要

Importance sampling (IS) consists in biasing samples from a distribution $f$ towards another distribution $g$. Concretely, given samples $X_i$ from $f$, the IS measure is $$\hat{g}_n = \frac{1}{Z_n}\sum_{i=1}^n \frac{g(X_i)}{f(X_i)} δ_{X_i},$$ with $Z_n = \sum_{i=1}^n \frac{g(X_i)}{f(X_i)}$. The random measure $\hat{g}_n$ approximates $g$, and is used in many contexts ranging from Monte Carlo integration to Bayesian inference. We show that, in high dimension ($d \geqslant 3$), the Wasserstein cost $W_p^p(\hat{g}_n, g)$ has order $n^{-p/d}$ in expectation, i.e. $$β^{\mathrm{low}}_{p,d}\int gf^{-p/d}\leqslant \liminf_{n \to \infty} n^{p/d} \mathbb{E}[W_p^p(\hat{g}_n, g)] \leqslant \limsup_{n \to \infty} n^{p/d} \mathbb{E}[W_p^p(\hat{g}_n, g)] \leqslantβ_{p,d} \int g f^{-p/d}$$ where $0<β^{\mathrm{low}}_{p,d}\leqslant β_{p,d}$ are constants depending only on $p$ and $d$, which are equal for $p=2$ and conjectured to be equal for any $p\geqslant 1$. Our results are valid for all $p\geqslant 1$ and $d\geqslant 3$. In the case where $β^{\mathrm{low}}_{p,d} = β_{p,d}$, we show that the asymptotically optimal sampling distribution $f^*$ for importance sampling is not equal to $g$ but to a tempered version of $g$, namely $f^* \propto g^{d/(p+d)}$, which is reminiscent of Zador's theorem in the domain of measure quantization.
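
Illustrative sketch (not taken from the paper): the self-normalized weights that define the IS measure can be computed directly from the formula in the abstract. The example is one-dimensional, with f = Exp(1) and g = Exp(2), purely to show the weights. It does not illustrate the Wasserstein rates, which the abstract states for d ≥ 3. The helper `tempered_exponent` simply evaluates the exponent d/(p+d) from the abstract, and all names are hypothetical.

```python
import math
import random

def importance_weights(samples, f_pdf, g_pdf):
    """Normalized weights (g/f)(X_i) / Z_n attached to each sample X_i."""
    raw = [g_pdf(x) / f_pdf(x) for x in samples]
    z_n = sum(raw)
    return [w / z_n for w in raw]

def tempered_exponent(p, d):
    """Exponent a in f* proportional to g**a, as stated in the abstract."""
    return d / (p + d)

def f_pdf(x):
    return math.exp(-x)  # proposal: Exponential(rate 1)

def g_pdf(x):
    return 2.0 * math.exp(-2.0 * x)  # target: Exponential(rate 2)

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(100_000)]
weights = importance_weights(samples, f_pdf, g_pdf)

# Integrating the identity function against the IS measure estimates E_g[X] = 1/2.
mean_under_g = sum(w * x for w, x in zip(weights, samples))
```

For p = 2 and d = 3 the exponent is 3/5, so the asymptotically optimal proposal in the abstract is flatter than the target itself.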

2605.30034 2026-05-29 stat.AP

Constructing Contact and Connectivity Matrices for Infectious Disease Modelling

构建传染病建模中的接触矩阵和连通性矩阵

Xiahui Li, Dongni Zhang, Neha Bansal, Jessica R. E. Bridgen, Chris Jewell, Emma McBryde, Glenn Marion, Emily Nixon, Philip D. O'Neill, David J. Pascall, Lorenzo Pellis, Simon E. F. Spencer, Panayiota Touloupou, Lloyd Chapman, Ben Swallow

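AI summary: This paper reviews the data types used to construct contact matrices and the methods for incorporating uncertainty and heterogeneity into them, and points out directions for future research.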

详情
AI中文摘要

接触(或混合,或更一般地,连通性)矩阵是传染病流行病学建模和推断的基本组成部分。它们的结构和参数化直接解释了不同个体亚群之间相互作用的频率,并有可能编码这些相互作用在人口统计轴、空间和时间上的动态异质性。大量研究致力于这些矩阵(的组成部分)的结构和估计,以帮助指导疫情控制和预测疾病传播。在本文中,我们回顾了关于用于构建接触矩阵的数据类型以及将不确定性和异质性纳入这些矩阵的方法的现有文献。我们还强调了在流行病学研究中使用这些接触矩阵的剩余挑战和未来方向。

英文摘要

Contact (or mixing, or more generally connectivity) matrices are a fundamental component of modelling and inference for infectious disease epidemiology. Their structure and parametrisation directly accounts for the frequency of interactions between different subpopulations of individuals, as well as having the potential to encode dynamic heterogeneity in these interactions across demographic axes, space and time. Considerable research has been devoted to the structure and estimation of (components of) these matrices to help inform outbreak control and forecast disease spread. In this paper, we review the existing literature on the data types used to construct contact matrices and the methods for incorporating uncertainties and heterogeneities into them. We also highlight remaining challenges and future directions in the use of these contact matrices for epidemiological research.

2605.29961 2026-05-29 stat.ME

Modifying causal models to distinguish between transient and lasting causal effects

修改因果模型以区分瞬时和持久因果效应

Russell Steele, Naftali Weinberger, Tess Baker, Ian Shrier

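AI summary: This paper proposes a system- and state-based approach that defines a new notion of the null effect to distinguish transient from lasting causal effects in time-varying systems.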

详情
Comments
18 pages, 7 figures
AI中文摘要

本文考虑如何对随时间观察的结果和暴露的因果模型中的干预效应进行分类。首先,我们展示了在时变框架中,潜在结果和因果有向无环图的最常见用法在捕捉所有可能干预方面的局限性,特别是在关键问题涉及维持或改变平衡行为的干预时。其次,我们采用基于系统和状态的方法,而不是基于测量的方法,来识别因果参数。特别是,我们讨论了关于系统平衡的假设以及干预对该平衡的影响如何允许更具体的因果解释,并阐明设计和分析的目标。第三,我们展示了识别时变系统因果参数的能力如何取决于测量系统状态的时间点的选择。我们通过提出一种新颖的零效应版本来解决这个问题,该版本旨在区分瞬时和持久因果效应。

英文摘要

This paper considers how to classify the effects of interventions in causal models for outcomes and exposures observed over time. First, we demonstrate the limitations of the most common uses of potential outcomes and causal directed acyclic graphs for capturing all possible interventions in a time-varying framework, particularly in problems where the key question concerns interventions to maintain or change equilibrium behaviour. Second, we adopt a system- and state-based approach rather than a measurement-based approach to identify the causal parameters. In particular, we discuss how assumptions about the system's equilibrium and the effects of interventions on that equilibrium can allow for more specific causal interpretations and clarify the goals of design and analysis. Third, we show how the ability to identify the causal parameters of a time-varying system depends on the selection of timepoints for measuring the system's states. We address this by proposing a novel version of the null effect, which is designed to distinguish between transient and lasting causal effects.

2605.29922 2026-05-29 stat.ME

Statistical Tapers for Correlation-Based Localization in Ensemble Data Assimilation

统计锥形函数用于集合数据同化中基于相关性的局地化

Alexandre A. Emerick, Vinicius Luiz Santos Silva

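AI summary: This paper proposes three statistical tapers (generalized power-law, logistic, and discrepancy-based) for correlation-based localization in ensemble data assimilation, to suppress spurious correlations while preserving meaningful parameter-data relationships.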

详情
AI中文摘要

局地化在基于集合的数据同化中至关重要,因为有限集合会产生噪声协方差估计,导致虚假更新和集合方差过度损失。在地下应用中,局地化通常基于空间距离,但当参数-数据关系受流动动力学、非线性算子、非局部参数或先验条件效应控制时,这一标准难以证明其合理性。本研究将基于相关性的局地化作为一种替代策略,其中锥形系数根据估计的模型-数据相关性的统计可靠性计算。我们将局地化解释为相关空间中的收缩问题,并提出三种锥形函数:基于均方误差校正的广义幂律锥形函数、源自贝叶斯尖峰-板公式的逻辑斯蒂锥形函数,以及受莫罗佐夫原理启发的差异型锥形函数。使用涉及标量和网格参数、局地化流动响应、非平凡相关模式以及模型维数增加的合成储层数据同化问题对这些锥形函数进行评估。结果表明,基于相关性的局地化可以抑制虚假相关,同时保留有意义的参数-数据关系。在几种情况下,所提出的幂律和逻辑斯蒂锥形函数比基于距离的局地化保留了更多的后验集合方差,同时保持了可接受的数据匹配质量。逻辑斯蒂锥形函数提供了最强的方差保留,而更平滑的锥形函数则有利于更好的数据匹配。总体而言,结果表明基于相关性的局地化是一种统计上合理的替代基于距离的局地化的方法,特别是在空间距离不可用或具有误导性时。

英文摘要

Localization is essential in ensemble-based data assimilation because finite ensembles produce noisy covariance estimates, causing spurious updates and excessive loss of ensemble variance. In subsurface applications, localization is usually based on spatial distance, but this criterion can be hard to justify when parameter-data relationships are controlled by flow dynamics, nonlinear operators, non-local parameters, or prior conditioning effects. This work investigates correlation-based localization as an alternative strategy in which tapering coefficients are computed from the statistical reliability of estimated model-data correlations. We interpret localization as a shrinkage problem in correlation space and propose three tapers: a generalized power-law taper motivated by mean-square-error correction, a logistic taper derived from a Bayesian spike-and-slab formulation, and a discrepancy-based taper inspired by Morozov's principle. The tapers are evaluated using synthetic reservoir data assimilation problems involving scalar and grid-based parameters, localized flow responses, non-trivial correlation patterns, and increasing model dimension. The results show that correlation-based localization can suppress spurious correlations while preserving meaningful parameter-data relationships. In several cases, the proposed power-law and logistic tapers retained more posterior ensemble variance than distance-based localization while maintaining acceptable data-match quality. The logistic taper provided the strongest variance preservation, whereas smoother tapers favored better data matches. Overall, the results indicate that correlation-based localization is a statistically motivated alternative to distance-based localization, especially when spatial distance is unavailable or misleading.

2605.29908 2026-05-29 stat.ML cs.LG

Joint Model and Data Sparsification via the Marginal Likelihood

通过边际似然进行联合模型与数据稀疏化

Alexander Timans, Thomas Möllenhoff, Christian A. Naesseth, Mohammad Emtiyaz Khan, Eric Nalisnick

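AI summary: Proposes a Bayesian method that jointly learns feature and sample relevancies via the marginal likelihood to sparsify model and data simultaneously, improving robustness while preserving conjugacy and closed-form updates.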

详情
Comments
36 pages, 8 figures, 12 tables (incl. appendix); published at ICML 2026
AI中文摘要

线性系统中的稀疏恢复支撑着从信号处理到高维回归的应用。基于自动相关性确定(ARD)原理的稀疏贝叶斯学习,通过边际似然优化为特征稀疏性提供了一种实用的贝叶斯机制。然而,其对同方差噪声模型的依赖使其对数据污染(如异常值或错误指定的噪声)敏感,损害了模型拟合和预测。相反,我们提出联合学习个体特征和样本相关性,通过单一贝叶斯目标实现同时模型与数据稀疏化。这种模型和数据的对称剪枝提供了一种自然扩展,保持了共轭性,允许标准优化过程的闭式更新,并与鲁棒回归和影响函数的观点一致。跨多种回归任务的实证结果证实,联合ARD方法一致地产生稀疏且鲁棒的预测模型。

英文摘要

Sparse recovery in linear systems underpins applications from signal processing to high-dimensional regression. Sparse Bayesian Learning, grounded in the principle of automatic relevance determination (ARD), offers a practical Bayesian mechanism for feature sparsity via marginal likelihood optimization. Yet, its reliance on a homoscedastic noise model renders it sensitive to data contaminations such as outliers or misspecified noise, harming model fit and predictions. Instead, we propose jointly learning individual feature and sample relevancies, enabling simultaneous model and data sparsification via a single Bayesian objective. This symmetric pruning of model and data offers a natural extension that preserves conjugacy, admits closed-form updates for standard optimization procedures, and aligns with perspectives from robust regression and influence functions. Empirical results across diverse regression tasks affirm that a joint ARD approach consistently yields both sparse and robust prediction models.

2605.29885 2026-05-29 cs.LG cond-mat.dis-nn math.OC math.RT stat.ML

Open Problem: Separating Geometric and Algorithmic Compression via Cayley-Table Completion

开放问题:通过凯莱表完成分离几何压缩与算法压缩

Dongsung Huh

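AI summary: Proposes Cayley-table completion as the canonical testbed for the missing inductive bias toward algorithmic complexity minimization, and challenges the community to generalize continuous flatness priors to autonomously discover discrete algorithmic axioms.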

详情
Comments
6 pages. Submitted to the Conference on Learning Theory (COLT) 2026 Open Problem track
AI中文摘要

现代统计学习理论和深度学习主要从连续容量控制(如基于范数的正则化、间隔最大化、低秩偏置)的角度来表征泛化。虽然在连续领域非常成功,但深度学习始终无法外推精确的算法或离散代数规则,这反映出缺失了向算法复杂度最小化的归纳偏置。我们提出凯莱表完成作为这一缺失偏置的规范测试平台,作为矩阵完成的离散代数对应物。正如矩阵分解结合权重衰减产生对低线性秩的隐式几何偏置,最近的结果表明,算子值张量分解结合平坦性先验产生对精确离散结合性的隐式算法偏置。我们提出了为凯莱表建立形式化精确恢复界限的开放问题,并挑战社区将连续平坦性先验推广,以自主发现更广泛的离散算法公理,而无需组合搜索。

英文摘要

Modern statistical learning theory and deep learning characterize generalization primarily in terms of continuous capacity control (e.g., norm-based regularization, margin maximization, low-rank bias). While highly successful in continuous domains, deep learning consistently fails to extrapolate exact algorithmic or discrete algebraic rules, reflecting a missing inductive bias toward algorithmic complexity minimization. We propose the Cayley-table completion as the canonical testbed for this missing bias, serving as the discrete algebraic counterpart to matrix completion. Just as matrix factorization combined with weight decay yields an implicit geometric bias toward low linear rank, recent results demonstrate that operator-valued tensor factorizations paired with a flatness prior yield an implicit algorithmic bias toward exact discrete associativity. We pose the open problem of establishing formal exact recovery bounds for Cayley-table completion, and challenge the community to generalize continuous flatness priors to autonomously discover broader discrete algorithmic axioms without combinatorial search.

2605.29839 2026-05-29 math.ST cs.IT math.IT physics.data-an stat.ML stat.TH

The Topological Stability Index: A Variance-Based Measure for Persistence Barcodes

拓扑稳定性指数:一种基于方差的持久性条形码度量

Joris Kirchner, Ioannis Diamantis

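AI summary: Proposes the Topological Stability Index (TSI), a variance-based scalar measure for persistence barcodes that quantifies the dispersion of persistence lifetimes, and establishes its basic properties and its connection to Rényi entropy.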

详情
Comments
31 pages, 14 figures
AI中文摘要

我们引入了拓扑稳定性指数(TSI),一种基于方差的持久性条形码标量度量,用于量化持久性寿命的离散程度。与仅依赖于归一化权重的持久熵不同,TSI捕获绝对变异性,并对异质特征尺度敏感。我们建立了TSI的基本性质,包括其缩放行为、在寿命平移下的不变性以及在插入和删除条形下的显式更新公式。我们还考虑了一种互补的一阶矩型量——拓扑信号指数(TSigI),它捕获持久性寿命的典型尺度,并与TSI一起提供额外的可解释性。我们进一步引入了一个归一化版本$cv\text{TSI}$,它是尺度不变的,并且与二阶Rényi熵有显式的代数关系。特别地,$cv\text{TSI}$是碰撞概率$\sum_i p_i^2$的仿射函数,因此是Rényi熵的单调重参数化,为拓扑数据分析中基于方差和基于熵的摘要提供了直接联系。在合成数据和随机时间序列上的数值实验表明,TSI捕获了与熵互补的结构变异性:它对确定性趋势相对不敏感,而对随机波动和持久性幅度的变化响应强烈。

英文摘要

We introduce the Topological Stability Index (TSI), a variance-based scalar measure for persistence barcodes that quantifies the dispersion of persistence lifetimes. Unlike persistent entropy, which depends only on normalized weights, the TSI captures absolute variability and is sensitive to heterogeneous feature scales. We establish fundamental properties of the TSI, including its scaling behavior, invariance under lifetime translation, and explicit update formulas under insertion and deletion of bars. We also consider a complementary first-moment-type quantity, the Topological Signal Index (TSigI), which captures the typical scale of persistence lifetimes and provides additional interpretability alongside the TSI. We further introduce a normalized version, $cv\text{TSI}$, which is scale invariant and admits an explicit algebraic relation to the Rényi entropy of order two. In particular, $cv\text{TSI}$ is an affine function of the collision probability $\sum_i p_i^2$, and therefore a monotone reparametrization of the Rényi entropy, providing a direct link between variance-based and entropy-based summaries in topological data analysis. Numerical experiments on synthetic data and stochastic time series demonstrate that the TSI captures structural variability complementary to entropy: it is relatively insensitive to deterministic trends, while responding strongly to stochastic fluctuations and variations in persistence magnitude.
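
Illustrative sketch (not taken from the paper): the abstract does not give the exact definitions, so the code below assumes the simplest reading, namely that the variance-based index is the population variance of the bar lifetimes and that the normalized version is the squared coefficient of variation. Under those assumptions, with weights p_i = lifetime_i / (sum of lifetimes) and n bars, the normalized quantity equals n * sum(p_i^2) - 1, which is affine in the collision probability for fixed n. The paper's actual normalization may differ, and the function names are hypothetical.

```python
def lifetimes(barcode):
    """Lifetimes death - birth of the finite bars in a barcode."""
    return [death - birth for birth, death in barcode]

def variance(values):
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def squared_cv(values):
    """Variance divided by the squared mean; unchanged by rescaling."""
    mean = sum(values) / len(values)
    return variance(values) / mean ** 2

def collision_probability(values):
    """Sum of squared normalized weights p_i = value_i / total."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)

barcode = [(0.0, 1.0), (0.5, 2.5), (1.0, 4.0)]
life = lifetimes(barcode)  # [1.0, 2.0, 3.0]

lhs = squared_cv(life)
rhs = len(life) * collision_probability(life) - 1.0
```

The same helpers also reproduce two of the properties listed in the abstract under this reading: adding a constant to every lifetime leaves `variance` unchanged, and multiplying every lifetime by c multiplies it by c squared.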

2605.29836 2026-05-29 cs.LG cs.AI stat.ML

CB-SLICE: Concept-Based Interpretable Error Slice Discovery

CB-SLICE: 基于概念的可解释错误切片发现

Yael Konforti, Mateo Espinosa Zarlenga, Elaf Almahmoud, Mateja Jamnik

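AI summary: Proposes CB-SLICE, which uses the concept prediction failures of concept bottleneck models to discover error slices and explains failure modes through keyword concepts, outperforming existing methods.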

详情
Comments
20 pages, 7 figures, 12 tables, to be published in the Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)
AI中文摘要

尽管平均性能强劲,深度学习模型在特定人群组(称为错误切片)上常表现出系统性错误。识别这些组及其失败的根本原因对于模型调试和偏差缓解至关重要。然而,现有的错误切片发现方法(SDMs)通常生成与模型推理过程脱节的解释,因此只能近似潜在错误源,可能不准确。我们通过利用概念瓶颈模型(CBMs)来解决这一局限,其预测直接依赖于人类可理解的语义概念。由于CBM中下游任务失败通常源于概念预测错误,概念表示为错误切片识别提供了强有力的候选,提供了直接关联错误源的细粒度解释。基于这一见解,我们引入CB-SLICE,一种基于概念的SDM,它将共享概念预测失败的样本分组,并识别每个切片失败模式中最关键的关键词概念。在多个基准测试中,我们展示了CB-SLICE在发现已知偏差方面优于最先进方法,同时提供更丰富、更忠实的模型错误解释。

英文摘要

Despite strong average-case performance, deep learning models often exhibit systematic errors on specific population groups, known as error slices. Identifying these groups and the root causes of their failures is critical for model debugging and bias mitigation. However, existing error Slice Discovery Methods (SDMs) typically generate explanations disconnected from the model's inference process, so they only approximate the underlying error source and may be inaccurate. We address this limitation by leveraging Concept Bottleneck Models (CBMs), whose predictions are directly dependent on human-understandable semantic concepts. Since downstream task failures in CBMs commonly arise from concept mispredictions, concept representations provide a strong candidate for error slice identification, offering fine-grained explanations directly linked to the error source. Building on this insight, we introduce CB-SLICE, a concept-based SDM that groups samples with shared concept prediction failures and identifies the keyword concepts most responsible for each slice's failure mode. Across multiple benchmarks, we show that CB-SLICE outperforms state-of-the-art methods in uncovering well-known biases while providing richer and more faithful explanations of model errors.

2605.29830 2026-05-29 math.ST math.PR stat.AP stat.TH

A Multi-factorial Innovation Model with Feature Interaction

具有特征交互的多因素创新模型

Giacomo Aletti, Irene Crimaldi, Andrea Ghiglietti

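AI summary: Proposes a multi-factorial innovation model based on the Indian buffet process, introduces a feature interaction mechanism, analyzes how the parameters affect asymptotic behavior, and establishes strong laws and central limit theorems.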

详情
AI中文摘要

我们引入了一个用于多因素创新的印度自助餐类型模型,其中每个到达的智能体可能同时表现出先前观察到的特征和新特征。新特征的数量遵循幂律行为,而选择旧特征的概率结合了自强化(取决于特征特定的流行度)和平均场交互项(取决于所有观察到的特征的平均流行度)。该模型由通常的创新参数(质量、折扣和浓度)以及两个额外参数控制:一个控制强化相对于强制输入(趋向于零)的强度,另一个调节特征交互的强度。尽管观察到的不同特征总数的增长与三参数印度自助餐过程相同,但交互机制产生了新的渐近状态。对于聚合量,包括预测均值、每个智能体的平均特征数、平均包含概率和平均特征流行度,相变由折扣参数与强制输入权重的比较决定。对于特征特定量,根据交互水平与临界阈值的比较,出现进一步的转变。特别是,高交互导致特征特定包含概率的渐近同步。我们建立了强律和二阶渐近结果,包括在鞅波动与确定性或随机项竞争的状态下的中心极限定理。该分析依赖于递归随机动力学的新的一般结果,这些结果可能在本框架之外也有用。

英文摘要

We introduce an Indian-buffet-type model for multi-factorial innovation in which each arriving agent may exhibit both previously observed and new features. The number of new features follows a power-law behavior, while the probability of selecting an old feature combines self-reinforcement, depending on the feature-specific popularity, with a mean-field interaction term depending on the average popularity of all observed features. The model is governed by the usual innovation parameters (mass, discount and concentration), together with two additional parameters: one controlling the strength of reinforcement against a forcing input toward zero, and one regulating the intensity of feature interaction. Although the growth of the total number of distinct observed features has the same behavior as in the three-parameter Indian buffet process, the interaction mechanism produces new asymptotic regimes. For aggregate quantities, including the predictive mean, the averaged number of features per agent, the mean inclusion probability, and the mean feature popularity, the phase transition is determined by the comparison between the discount parameter and the weight of the forcing input. For feature-specific quantities, a further transition appears according to the comparison between the interaction level and a critical threshold. In particular, high interaction leads to an asymptotic synchronization of feature-specific inclusion probabilities. We establish strong laws and second-order asymptotic results, including central limit theorems in regimes where martingale fluctuations compete with deterministic or random terms. The analysis relies on novel general results for recursive stochastic dynamics, which may be useful beyond the present framework.

2605.29758 2026-05-29 stat.ME

Fisher's ideas and the design of field experiments in agronomy and plant breeding

Fisher的思想与农学和植物育种中的田间试验设计

Hans-Peter Piepho

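AI summary: Reviews Fisher's key ideas on experimental design and relates them to methods used in current agricultural field trials, including systematic designs, row-column designs, and multi-environment trials.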

详情
Comments
31 pages, 2 tables
AI中文摘要

R. A. Fisher是上世纪最伟大的科学家之一。他做出了许多开创性的贡献,多到几乎无法一一列举。他对实验设计的革命性贡献主要可追溯至其学术生涯早期,并且与他在Rothamsted实验站参与农业田间试验密不可分。在本报告中,我将回顾Fisher关于实验设计的关键思想,并将其与我参与的一些工作联系起来,这些工作大多直接关注农业田间试验。涵盖的主题包括系统设计、行列设计、增广行列设计、多环境试验、部分重复设计、试验在细分目标环境群体中的最优分配,以及各国试验系统的联系。

英文摘要

R. A. Fisher was one of the greatest scientists of the last century. He made many ground-breaking contributions, so many indeed that it seems almost impossible to list all of them. His revolutionary contributions to the design of experiments can mostly be traced to the early part of his academic career, and they are inextricably linked to his involvement with agricultural field experiments at Rothamsted Experiment Station. In this talk I will review Fisher's key ideas on experimental design and relate them to some of the work I am involved in, most of which directly focuses on field experiments in agriculture. Topics covered include systematic designs, row-column designs, augmented row-column designs, multi-environment trials, partially replicated designs, optimal allocation of trials to zones in sub-divided target populations of environments, and the connection of trialling systems across countries.

2605.29748 2026-05-29 stat.ML cs.LG

Instance-dependent Stochastic Lipschitz bandit

实例依赖的随机Lipschitz bandit

Marius Potfer, Vianney Perchet

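AI summary: For the Lipschitz bandit problem, proposes an algorithm based on integrals of the suboptimality gap over level sets, achieving instance-dependent regret bounds that improve on those based on the classical zooming dimension.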

详情
AI中文摘要

我们研究Lipschitz bandit问题,其中学习器通过带噪声的点评估在域$\mathcal{X} \subset [0,1]^d$上顺序最大化未知的Lipschitz函数$f$。现有的遗憾界要么是最坏情况的,缩放为$\tilde{\Theta} \left ( T^{(d+1)/(d+2)}\right )$,要么通过缩放维度$d_z$自适应,得到$\tilde{\Theta} \left ( T^{(d_z+1)/(d_z+2)}\right )$。然而,这种基于缩放的保证仅是部分实例依赖的,因为它们仅依赖于近最优水平集的渐近增长,未能捕捉$f$的更精细结构性质。我们提供了一种分析和算法,通过$f$在其水平集上的次优性间隙的积分来刻画遗憾。这产生了适应水平集局部增长(而不仅仅是其渐近行为)的遗憾界。作为推论,当最大化者集合的维度$d^\star>0$时,我们获得了阶为$\tilde{\mathcal{O}} \left ( T^{(d_z+1) / (\max(d_z,d^\star)+2)}\right )$的改进自适应速率,在该情况下严格优于经典的缩放界。最后,我们将分析扩展到完全信息设置(Lipschitz专家),并展示了如何放宽一些正则性假设。

英文摘要

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tilde{\Theta} \left ( T^{(d+1)/(d+2)}\right )$, or adaptive via the zooming dimension $d_z$, yielding $\tilde{\Theta} \left ( T^{(d_z+1)/(d_z+2)}\right )$. However, such zooming-based guarantees are only partially instance-dependent, as they depend solely on the asymptotic growth of near-optimal level sets and fail to capture finer structural properties of $f$. We provide an analysis and an algorithm that characterizes the regret through integrals of the suboptimality gap of $f$ over its level sets. This yields regret bounds that adapt to the local growth of level sets, rather than only their asymptotic behavior. As a corollary, when the set of maximizers has dimension $d^\star>0$, we obtain improved adaptive rates of order $\tilde{\mathcal{O}} \left ( T^{(d_z+1) / (\max(d_z,d^\star)+2)}\right )$, strictly improving over classical zooming bounds in this regime. Finally, we extend our analysis to the full-information setting (Lipschitz experts) and show how some of the regularity assumptions can be relaxed.

2605.29702 2026-05-29 stat.ME

A Jensen-Shannon divergence based $k$--$NN$ algorithm for missing value imputation in compositional data

基于Jensen-Shannon散度的$k$--$NN$算法用于成分数据缺失值插补

Michail Tsagris, Connie Stewart, Abdulaziz Alenazi

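AI summary: Proposes a nonparametric $k$--$NN$ method based on the Jensen-Shannon divergence and the Fréchet mean for imputing missing values in compositional data; the method is self-adaptive and applicable to data containing zeros.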

详情
Comments
This is the preprint of the paper that was published in the Journal of Applied Statistics. https://www.tandfonline.com/doi/full/10.1080/02664763.2026.2677908
AI中文摘要

开发了一种新的非参数方法来插补成分数据中的缺失值。该方法基于$k$--$NN$算法,利用Jensen-Shannon散度并采用Fréchet均值,以在估计过程中提供更多灵活性。作为一个额外特性,超参数可以根据缺失值的模式自适应调整。与限制性参数模型不同,所提出的方法不对数据结构做任何假设,最重要的是,即使成分数据包含零值,它也能适用。通过使用真实数据的模拟研究,表明所提出的算法在各种设置下不仅准确性更高,而且计算效率也优于竞争算法。

英文摘要

A novel nonparametric method to impute missing values in compositional data is developed. The method is based on the $k$--$NN$ algorithm, utilizes the Jensen-Shannon divergence and employs the Fréchet mean to allow for more flexibility in the estimation process. As an extra feature, the hyper-parameters can be self-adaptive according to the pattern of missing values. Unlike restrictive parametric models, the proposed method makes no assumption about the structure of the data and, most importantly, it is applicable even when compositional data contain zero values. Through simulation studies using real data, it is shown that the proposed algorithm outperforms competing algorithms at various settings, not only in terms of accuracy but also in terms of computational efficiency.
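The general idea of nearest-neighbour imputation with the Jensen-Shannon divergence can be sketched as follows. This is only an illustration under stated assumptions, not the published algorithm: distances are computed on the observed subcomposition, a plain arithmetic mean of the neighbours stands in for the Fréchet mean, $k$ is fixed rather than adaptive, and the helper names `closure`, `js_divergence`, and `knn_impute` are hypothetical. The divergence treats $0 \log 0$ as $0$, which is why zero parts cause no difficulty.

```python
import math

def closure(parts):
    """Rescale non-negative parts so that they sum to one."""
    total = sum(parts)
    return [x / total for x in parts]

def js_divergence(p, q):
    """Jensen-Shannon divergence; terms with a zero part contribute zero."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

def knn_impute(row, complete_rows, k=2):
    """Fill the None entries of `row` using its k nearest complete compositions."""
    obs = [j for j, x in enumerate(row) if x is not None]
    target = closure([row[j] for j in obs])
    # Compare on the observed subcomposition only; skip donors with no mass there.
    donors = [r for r in complete_rows if sum(r[j] for j in obs) > 0]
    donors.sort(key=lambda r: js_divergence(target, closure([r[j] for j in obs])))
    nearest = donors[:k]
    filled = list(row)
    for j, x in enumerate(row):
        if x is None:
            # Arithmetic mean of the neighbours' parts (a stand-in for the Frechet mean).
            filled[j] = sum(r[j] for r in nearest) / len(nearest)
    missing_mass = sum(filled[j] for j, x in enumerate(row) if x is None)
    # Rescale the observed parts so the completed row is again a composition.
    for j, share in zip(obs, target):
        filled[j] = share * (1.0 - missing_mass)
    return filled

complete = [[0.5, 0.3, 0.2], [0.6, 0.4, 0.0], [0.1, 0.1, 0.8]]
print(knn_impute([0.25, 0.15, None], complete, k=2))
```

The second donor contains a zero part and is still usable, since the divergence never takes the logarithm of a zero entry.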

2605.29684 2026-05-29 cs.LG cond-mat.dis-nn stat.ML

Kernel Renormalization in Bayesian Deep Neural Networks: the Equivalent Wishart Ansatz in the Proportional Regime

贝叶斯深度神经网络中的核重整化:比例机制下的等效Wishart假设

Paolo Baglioni, Christian Keup, Vincenzo Zimbardo, Rosalba Pacelli, Alessandro Vezzani, Raffaella Burioni, Pietro Rotondo

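AI summary: For Bayesian multi-layer perceptrons of fixed depth L, proposes an equivalent Wishart Ansatz to capture the stochastic fluctuations of the hierarchical empirical kernels, obtains a renormalized NNGP kernel description via a large deviation analysis, characterizes representation learning in the proportional limit with at most L scalar order parameters, and extends the approach to CNNs, revealing a local kernel renormalization mechanism.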

详情
Comments
45 pages, 21 figures
AI中文摘要

训练集大小$P$和深度神经网络宽度$N$以相同速率增长的比例宽度极限,已被深入研究用于浅层单隐藏层网络。然而,将这些非微扰结果从浅层架构扩展到深度非线性网络已被证明非常具有挑战性。在这里,我们提出了一种有效的近似方法,用于预测固定深度$L$的贝叶斯多层感知机(MLP)在任意高维数据上的泛化性能。我们提出了一个等效Wishart假设,以捕捉MLP层次经验核的主要随机涨落。这使我们能够在比例极限下对MLP的配分函数进行大偏差分析,并用重正化NNGP核表示。在这种描述中,即使比例极限下的强表示学习也由至多$L$个标量序参数编码,这些参数自洽确定。将该方法扩展到卷积架构(CNN),我们识别出一种层次局部核重整化机制,该机制允许量化CNN中由于有限宽度效应导致的大宽度核的更复杂数据相关变换。我们在经典基准数据集上,针对深度$L \sim O(10)$和$P\sim O(10^3)$的有限深度神经网络的贝叶斯后验采样实验测试了我们的有效理论,发现总体吻合良好,同时存在两种不同类型的系统性偏差。

英文摘要

The scaling limit where both the size of the training set $P$ and the width $N$ of a deep neural network grow at the same rate, the so-called proportional-width regime, has been intensely studied for shallow, single-hidden-layer networks. However, extending these non-perturbative results from shallow architectures to deep non-linear networks has proven very challenging. Here we present an effective approximate approach to predict the generalization performance of Bayesian multi-layer perceptrons (MLPs) of fixed depth $L$ on arbitrary high-dimensional data. We propose an equivalent Wishart Ansatz to capture the dominant stochastic fluctuations of the hierarchical empirical kernels of MLPs. This allows us to perform a large deviation analysis for the partition function of MLPs in the proportional limit, expressed in terms of a renormalized NNGP kernel. In this description, even strong representation learning in the proportional limit is encoded in at most $L$ scalar order parameters, determined self-consistently. Extending the approach to convolutional architectures (CNNs), we identify a hierarchical local kernel renormalization mechanism, which allows us to quantify more complex data-dependent transformations of the large-width kernel in CNNs due to finite-width effects. We test our effective theory against sampling experiments from the Bayesian posterior of finite deep neural networks with depths $L \sim O(10)$ and $P\sim O(10^3)$ on classic benchmark datasets, finding overall very good agreement together with two distinct types of systematic deviations.

2605.29669 2026-05-29 stat.ML cs.LG math.PR math.ST stat.TH

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Eigen-Spike 涌现与共轭核在非线性可分数据上的二次等价

Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

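AI summary: For nonlinearly separable data (the XOR problem), uses a quadratic equivalent of the conjugate kernel matrix to analyze the emergence of outlier eigenvalues and the BBP-type phase transition in their alignment with the labels, revealing how sample complexity, signal-to-noise ratio, activation function, and pretrained features affect nonlinear learnability.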

详情
Comments
89 pages, 10 figures
AI中文摘要

近期随机矩阵理论(RMT)工作发展了确定性等价的概念:通常是线性代理模型,用于近似大型非线性随机矩阵(如神经网络中的非线性特征映射)的谱行为。一方面,这些确定性等价通过将复杂模型简化为具有经典RMT工具特性的更简单模型,使理论预测易于处理。然而,这留下了一个问题:在处理高维非线性可分数据(例如对非线性可分数据进行分类)时,这种理想化的线性等价是否仍然有意义。受此启发,我们考虑前馈神经网络的非线性特征映射——共轭核(CK),在典型的非线性可分数据集XOR问题上;我们利用CK中信息性异常特征值的研究及其对应特征向量是否渐近与XOR标签对齐,作为非线性可学习性的代理。我们开发了尖峰CK矩阵的稳健二次等价,从而能够精确分析随着修改机器学习实践中常见的各种旋钮(样本复杂度、信噪比、非线性激活选择以及预训练特征)时涌现的信息性尖峰。在每种情况下,我们推导出精确的BBP型相变,其中通过CK特征向量的线性分类变得可能。我们的分析有助于将RMT中确定性等价工具的力量转化为研究机器学习中实际相关的问题。

英文摘要

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in neural networks (NNs). On the one hand, these deterministic equivalents make theoretical predictions tractable by reducing a complex model to a simpler model with properties that fall under the umbrella of classical RMT tools. However, this leaves open the question of whether this idealized linear equivalence remains meaningful when dealing with high-dimensional nonlinearly separable data, such as performing classification on nonlinearly separable data. Motivated by this, we consider the conjugate kernel (CK), which is the nonlinear feature map of a feedforward NN, under a canonical nonlinearly separable dataset, the XOR problem; and we use the study of informative outlier eigenvalues in the CK and whether their corresponding eigenvectors asymptotically align with XOR labels as a proxy for nonlinear learnability. We develop a robust quadratic equivalent to the spiked CK matrix that enables a precise analysis of emergent informative spikes, as one modifies various knobs common in ML practice: sample complexity, signal-to-noise ratio (SNR), nonlinear activation choice, and pretrained features. In each of these scenarios, we derive a precise BBP-type phase transition in which linear classification via the CK eigenvectors becomes possible. Our analysis helps translate the power of deterministic equivalence tools in RMT to study problems of practical relevance in ML.

2605.29645 2026-05-29 cs.LG cs.AI stat.ML

The Sample Complexity of Multiclass and Sparse Contextual Bandits

多类别和稀疏上下文赌博机的样本复杂度

Liad Erez, Fan Chen, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran, Alexander Rakhlin

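AI summary: For stochastic i.i.d. contextual bandits, proposes algorithms based on the decision-estimation coefficient and low-variance exploration that achieve near-optimal sample complexity under sparse rewards, with matching lower bounds.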

详情
AI中文摘要

我们研究随机i.i.d.设置下的上下文赌博机,其中学习器观察来自未知分布的上下文,从有限集合$A$中选择动作,并旨在基于赌博机反馈从给定类别中识别近似最优策略。受零一奖励的赌博机多类别分类启发,我们关注$s$-稀疏设置,其中对于每个上下文,奖励向量的$L_1$范数至多为$s \ll |A|$。我们的主要结果是设计算法,以高概率输出一个相对于策略类$Π$的$ε$-最优策略,使用$\tilde{O} ((s/ε^2 + |A|/ε)\log |Π|/δ)$个样本。我们将此界推广到一般Natarajan类,并补充了匹配的下界(对数因子内),从而缩小了先前工作(Erez等人,2024, 2025)留下的巨大差距,后者额外增加了$Θ(|A|^9)$依赖。我们通过两种互补方法获得这些结果。首先,我们从具有结构化观测的上下文决策角度分析上下文赌博机,设计了一种探索-优化算法,其样本复杂度由决策估计系数(DEC;Foster等人,2021, 2022)控制。我们证明,在$s$-稀疏奖励下,诱导的模型类具有随$s$缩放的尖锐DEC界,直接产生最优速率。由于这种方法主要是信息论性的,并涉及求解复杂的min-max优化问题,我们还开发了第二种更专门的算法方法,基于低方差探索技术。这种方法产生了具体、易处理的算法,并自然地扩展到上下文组合半赌博机,为赌博机多类别列表分类提供了改进的样本复杂度保证。

英文摘要

We study contextual bandits in the stochastic i.i.d. setting, where a learner observes contexts drawn from an unknown distribution, selects actions from a finite set $A$, and aims to identify an approximately optimal policy from a given class based on bandit feedback. Motivated by bandit multiclass classification with zero-one rewards, we focus on the $s$-sparse setting in which, for every context, the reward vector has $L_1$-norm at most $s \ll |A|$. Our main result is the design of algorithms that, with high probability, output an $ε$-optimal policy compared to policy class $Π$ using $\tilde{O} ((s/ε^2 + |A|/ε)\log |Π|/δ)$ samples. We extend this bound to general Natarajan classes and complement it with a matching lower bound (up to logarithmic factors), thereby closing a substantial gap left by prior work (Erez et al., 2024, 2025), which incurred an additional $Θ(|A|^9)$ dependence. We obtain these results via two complementary approaches. First, we analyze contextual bandits through the lens of contextual decision making with structured observations, designing an exploration-by-optimization algorithm whose sample complexity is governed by the decision-estimation coefficient (DEC; Foster et al., 2021, 2022). We show that, with $s$-sparse rewards, the induced model class admits a sharp DEC bound that scales with $s$ and directly yields the optimal rate. Since this approach is largely information-theoretic and involves solving complex min-max optimization problems, we also develop a second, more specialized algorithmic method based on a low-variance exploration technique. This approach leads to concrete, tractable algorithms and naturally extends to contextual combinatorial semi-bandits, leading to improved sample complexity guarantees for bandit multiclass list classification.

2605.29642 2026-05-29 stat.ML cs.IT cs.LG math.IT

Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets

异构带宽预算下的联邦探针-逻辑蒸馏匹配率与最优分配

Prasanjit Dubey, Xiaoming Huo

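AI summary: Addresses the tightness of the bandwidth-term rate in federated probe-logit distillation (FPLD) and bandwidth allocation across heterogeneous nodes, providing a matching lower bound, a multi-round refinement scheme, and a closed-form optimal allocation rule.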

详情
AI中文摘要

在联邦语言建模中,$K$个节点各自持有$n$个样本,但无法合并数据或交换全精度梯度或权重。我们研究当每个节点在公共探针集上每次查询最多上传$B$比特时,对$V$个令牌上的条件分布进行估计的极小极大速率。在联邦探针-逻辑蒸馏(FPLD)中,每个节点在探针集上传输一个标量量化的逻辑向量,聚合器蒸馏出一个全局参数化学生模型。先前的工作(Dubey and Huo, 2026)建立了高概率KL速率$O(d/(Kn) + ρ\sqrt{V \log V / m} + K^{-1} \cdot 2^{-2B/V})$加上优化松弛项,其中带宽项采用迹锐化形式。该带宽项速率是否紧致,以及上界如何推广到异构每节点带宽,仍是开放问题。 我们填补了这两个空白。首先,抖动FPLD构造在非退化条件下具有匹配的单轮下界$Ω(K^{-1} \cdot 2^{-2B/V})$,将带宽轴速率确定为$Θ(K^{-1} \cdot 2^{-2B/V})$。使用嵌套/缩放残差量化器的$T$轮顺序细化达到$O(K^{-1} \cdot 2^{-2TB/V})$;对于任意$T > 1$,原始FPLD的与$T$无关的带宽项是次优的。其次,我们建立了每节点预算$B_i$的异构带宽上界,并配以闭合形式的最优分配$B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / \bar{w}_g)$,这是一种对数倾斜的注水规则,是失真率优化中反向注水的每节点类比。一种即插即用自适应变体通过短预热阶段估计权重,并达到$1 + O(\sqrt{\log(K/δ)/(m T_0)})$的相对次优性。合成n-gram模拟证实经验KL被上界和下界所界定,并且在异构裁剪下最优分配严格优于均匀和逆权重基线。

英文摘要

In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients or weights. We study the minimax rate at which a conditional distribution over $V$ tokens can be estimated when each node may upload at most $B$ bits per query in a public probe set. In federated probe-logit distillation (FPLD), each node transmits a scalar-quantized logit vector on the probe set, and an aggregator distills a global parametric student. Prior work (Dubey and Huo, 2026) establishes a high-probability KL rate $O(d/(Kn) + ρ\sqrt{V \log V / m} + K^{-1} \cdot 2^{-2B/V})$ plus optimization slack, with the bandwidth term in its trace-sharpened form. Whether this bandwidth-term rate is tight, and how the upper bound generalizes to heterogeneous per-node bandwidths, are left open. We close both gaps. First, the dithered FPLD construction has a matching single-round lower bound $Ω(K^{-1} \cdot 2^{-2B/V})$ under non-degeneracy, pinning the bandwidth-axis rate at $Θ(K^{-1} \cdot 2^{-2B/V})$. $T$-round sequential refinement with nested/scaled residual quantizers achieves $O(K^{-1} \cdot 2^{-2TB/V})$; vanilla FPLD's $T$-independent bandwidth term is suboptimal for every $T > 1$. Second, we establish a heterogeneous-bandwidth upper bound for per-node budgets $B_i$, paired with a closed-form optimal allocation $B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / \bar{w}_g)$, a log-tilted water-filling rule that is the per-node analogue of reverse water-filling for distortion-rate optimization. A plug-in adaptive variant estimates the weights from a short warm-up phase and attains $1 + O(\sqrt{\log(K/δ)/(m T_0)})$ relative suboptimality. Synthetic n-gram simulations confirm that empirical KL is bracketed by the upper and lower bounds and that the optimal allocation strictly dominates uniform and inverse-weighted baselines under heterogeneous clipping.
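The closed-form allocation $B_i^* = B_{\mathrm{tot}}/K + (V/2) \log_2(w_i / \bar{w}_g)$ is easy to evaluate directly. The sketch below assumes that $\bar{w}_g$ denotes the geometric mean of the node weights $w_i$ (the abstract does not define it, but this choice makes the budgets sum to $B_{\mathrm{tot}}$); the function name `optimal_bits` and the example weights are hypothetical.

```python
import math

def optimal_bits(weights, total_bits, vocab_size):
    """Per-node bit budgets B_i = B_tot/K + (V/2) * log2(w_i / geometric_mean(w)).

    Assumption: the reference weight is the geometric mean of the node weights.
    """
    k = len(weights)
    log_geo_mean = sum(math.log2(w) for w in weights) / k
    return [
        total_bits / k + (vocab_size / 2.0) * (math.log2(w) - log_geo_mean)
        for w in weights
    ]

# Three nodes with hypothetical weights 1, 2, 4; V = 10 tokens; 300 bits in total.
bits = optimal_bits([1.0, 2.0, 4.0], total_bits=300.0, vocab_size=10)
print(bits, sum(bits))
```

Nodes with larger weights receive more bits, and the log tilts cancel across nodes so the total budget is preserved. For very small weights the formula can return a negative budget; the abstract does not say how that case is handled, so the sketch leaves it untouched.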

2605.29641 2026-05-29 stat.ME cs.PF math.PR

Experimentation for Different Scheduling Policies on Queues: Mixed Differences-in-Q Estimators Based on Little's Law

不同调度策略在队列上的实验:基于Little定律的混合差分Q估计量

Nanshan Jia, Ramesh Johari, Nian Si, Zeyu Zheng

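AI summary: To address Markovian interference in A/B tests of data-center scheduling policies, proposes mixed Differences-in-Q estimators based on Little's Law that significantly reduce bias and variance, and validates their robustness and efficacy in simulations with non-stationary arrival rates, heterogeneous service rates, and other scenarios.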

详情
AI中文摘要

在数据中心,任务被分发到各个服务器以均匀分配工作负载。当数据中心考虑实施新的调度算法时,通常会在部署前进行A/B测试以评估该新方法的实际影响。然而,直接的A/B测试可能会受到所谓的“马尔可夫”干扰。我们利用Farias等人(2022)开发的差分Q估计量,并引入了基于Little定律的混合差分Q估计量。我们表明,我们的A/B测试方法在测试各种调度策略时显著减少了偏差和方差。在非平稳到达率、异构服务率和通信延迟等场景下进行了大量仿真。这些仿真突出了我们A/B测试方法的鲁棒性和有效性。

英文摘要

In data centers, tasks are dispatched to various servers to evenly distribute the workload. When a data center considers implementing a new scheduling algorithm, it typically conducts an A/B test prior to deployment to assess the real-world impact of this new method. However, a straightforward A/B test might be affected by so-called "Markovian" interference. We utilize the Differences-in-Q estimator, as developed by Farias et al. (2022), and introduce mixed Differences-in-Q estimators grounded in Little's Law. We show that our A/B testing methods significantly reduce bias and variance when testing various scheduling policies. Extensive simulations are conducted under scenarios such as non-stationary arrival rates, heterogeneous service rates, and communication delays. These simulations highlight the robustness and efficacy of our A/B testing approach.
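Little's Law, $L = \lambda W$, relates the time-average number of jobs in a system to the arrival rate and the mean time a job spends in the system. The sketch below only verifies that identity on a small hand-made trace; it does not reproduce the Differences-in-Q estimators, and the function name `littles_law_terms` and the trace are hypothetical. The identity holds exactly here because the system is empty at time 0 and at the end of the horizon.

```python
def littles_law_terms(arrivals, departures):
    """Return (L, lambda, W) for a trace that starts and ends with an empty system."""
    horizon = max(departures)
    n = len(arrivals)
    arrival_rate = n / horizon                                    # lambda
    mean_sojourn = sum(d - a for a, d in zip(arrivals, departures)) / n  # W
    # Integrate the number of jobs in the system over time with an event sweep.
    events = sorted([(a, 1) for a in arrivals] + [(d, -1) for d in departures])
    area, count, prev = 0.0, 0, 0.0
    for time, delta in events:
        area += count * (time - prev)
        count += delta
        prev = time
    avg_in_system = area / horizon                                # L
    return avg_in_system, arrival_rate, mean_sojourn

L, lam, W = littles_law_terms([0.0, 1.0, 2.0], [3.0, 4.0, 6.0])
print(L, lam * W)
```

The practical appeal is that queue lengths can be read off at any instant, while sojourn times are only known once a job completes, so the identity lets one quantity stand in for the other.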

2605.29611 2026-05-29 stat.ME stat.CO

Hierarchical forecasting: The role of information

分层预测:信息的作用

Minh Nguyen, Farshid Vahid, Shanika L Wickramasuriya

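AI summary: Proposes the information combination (IComb) method, which improves hierarchical time series forecasts by combining base forecasts built on different information sets, and shows that information combination and aggregation constraints contribute separately to forecast improvement.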

详情
AI中文摘要

在分层预测中,预测协调过程将一组不满足实际数据中分层聚合约束的“基础”或“原始”预测,转换为一组满足这些约束的“一致”预测。学术文献提供了大量模拟证据和实际案例,证明预测协调在改进分层时间序列预测方面的价值。这种改进归因于聚合约束的施加。然而,这些证据来源于每个使用不同信息集(通常是每个时间序列对应的单变量信息集)生成的基础预测。由于协调算法结合了预测,很难确定改进在多大程度上是由于约束的施加,还是由于不同预测所携带信息的组合。在本文中,我们证明当基础预测基于不同的信息集且历史数据可用时,即使预测已经一致,通过组合每个预测携带的信息也有改进这些预测的空间。我们提出了一种新方法,称为信息组合(IComb)方法,该方法在协调过程中结合预测的信息内容。该方法基于回归,可以使用现有的惩罚回归包实现。我们提供模拟证据来说明信息集与聚合约束在分层时间序列预测中的不同作用。最后,我们将我们的方法应用于文献中先前使用的数据集,并证明与传统方法相比,它取得了更优的结果。

英文摘要

In hierarchical forecasting, the process of forecast reconciliation transforms a set of "base" or "raw" forecasts, which do not satisfy the hierarchical aggregation constraints in the real data, into a set of "coherent" forecasts, which do satisfy those constraints. The academic literature provides ample simulation evidence and real-world examples demonstrating the value of forecast reconciliation in improving forecasts of hierarchical time series. This improvement is attributed to the imposition of aggregation constraints. However, this evidence is derived from base forecasts, each generated using a distinct information set, usually the univariate information set corresponding to each time series. Since reconciliation algorithms combine forecasts, it is difficult to determine the extent to which the improvement is due to the imposition of constraints versus the combination of information carried by different forecasts. In this paper, we demonstrate that when base forecasts are based on different information sets and historical data are available, there is scope for improving these forecasts by combining the information that each one carries, even when they are already coherent. We propose a new method, called the information combination (IComb) method, which combines the information content of forecasts during the reconciliation process. The method is regression-based and can be implemented using existing penalised regression packages. We provide simulation evidence to illustrate the role of information sets, as distinct from the role of aggregation constraints, in forecasting hierarchical time series. Finally, we apply our method to datasets previously used in the literature and demonstrate that it achieves superior results compared to traditional approaches.

2605.28327 2026-05-29 stat.ML cs.LG q-fin.RM stat.AP

Insurance Pricing Optimization via Off-Policy Evaluation

通过离线策略评估进行保险定价优化

Sascha Günther, Dimitri Semenovich, Mario V. Wüthrich

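AI summary: Proposes an insurance pricing approach based on off-policy evaluation and stochastic control, using a kernelized inverse propensity score estimator to reduce variance and two policy optimization methods, a data-shared Lasso and neural networks, to compute optimal pricing rules.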

详情
AI中文摘要

传统保险定价依赖于基于风险的原则,确保精算公平和偿付能力,但未明确考虑投保人的价格敏感性。我们将保险定价表述为一个决策问题,并使用离线策略评估和随机控制的工具进行研究。我们提出了一种核化逆倾向得分估计器,该估计器利用动作空间中的局部结构,与经典逆倾向得分估计器相比实现了方差减少。基于这些价值估计,我们研究了策略优化,并提出了两种计算最优定价规则的实用方法:一种可解释的数据共享Lasso公式和一种基于神经网络的灵活策略参数化。通过使用受控的合成旅行保险环境,我们实证验证了理论结果,并表明神经网络在策略优化方面优于现有技术。

英文摘要

Traditional insurance pricing relies on risk-based principles that ensure actuarial fairness and solvency but do not explicitly account for policyholders' price sensitivity. We formulate insurance pricing as a decision-making problem and study it using tools from off-policy evaluation and stochastic control. We propose a kernelized inverse propensity score estimator that exploits local structure in the action space and yields variance reduction compared to the classical inverse propensity score estimator. Building on these value estimates, we investigate policy optimization and present two practical approaches for computing optimal pricing rules: an interpretable data-shared Lasso formulation and a flexible policy parameterization based on neural networks. Using a controlled synthetic travel insurance environment, we empirically confirm the theoretical results and show that neural networks outperform existing techniques for policy optimization.
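For intuition, a kernel-smoothed inverse propensity score estimate for a continuous action such as a price can be written as $\hat{V} = \frac{1}{n} \sum_i \frac{K((\pi(x_i) - a_i)/h)}{h \, \mu(a_i \mid x_i)} r_i$, where $\mu$ is the logging density. This is a generic textbook-style form used here as a stand-in; the estimator proposed in the paper may differ. The function names, the Gaussian kernel, the logged records, and the pricing rule below are all hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_ips_value(logged, target_policy, bandwidth):
    """Kernel-smoothed IPS estimate of a deterministic pricing policy's value.

    logged: list of (context, price, logging_density, reward) tuples, where
    logging_density is the logging policy's density at the offered price.
    """
    total = 0.0
    for context, price, density, reward in logged:
        # Weight each record by how close the logged price is to the target price.
        u = (target_policy(context) - price) / bandwidth
        weight = gaussian_kernel(u) / (bandwidth * density)
        total += weight * reward
    return total / len(logged)

# Hypothetical logged quotes: (risk score, offered price, logging density, profit).
logged = [
    (0.2, 100.0, 0.02, 12.0),
    (0.5, 120.0, 0.02, 0.0),
    (0.8, 150.0, 0.02, 30.0),
]
target = lambda risk: 90.0 + 75.0 * risk  # hypothetical pricing rule
print(kernel_ips_value(logged, target, bandwidth=10.0))
```

Records whose logged price is far from the target policy's price receive almost no weight, which is how nearby prices are allowed to share information. A larger bandwidth lowers variance at the cost of more smoothing bias.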

2605.27265 2026-05-29 econ.GN q-fin.EC stat.AP stat.ME

Quantifying Social Inflation in Liability Insurance with Advanced Statistical Methods

用高级统计方法量化责任保险中的社会通胀

Tsz Chai Fung, Lie Ma, Liang Peng, Fang Yang

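AI summary: Using a database of US jury verdicts and settlements, quantifies social inflation in liability insurance across multiple channels with rolling-window logistic regression and quantile regression, finding that verdict severity is the main driver and that social inflation affects moderate losses as well as extreme verdicts.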

详情
AI中文摘要

社会通胀,即责任索赔成本超出一般经济通胀的上升,已成为保险公司和再保险公司的主要担忧,但由于诉讼结果具有重尾分布,且进入裁决与和解的案件组合随时间变化,因此难以量化。利用美国陪审团裁决与和解的大型数据库,我们通过多个对再保险公司重要的渠道开发了经案件组合调整的社会通胀度量:原告胜诉率(频率型渠道)、和解倾向(频率型渠道)以及裁决/和解严重性。该方法结合了概率的滚动窗口逻辑回归和严重性的分位数(风险价值)回归,不确定性通过随机加权自助法量化。我们发现,从2009年到2024年,原告胜诉概率有统计上显著的相对增长约20%-30%,同时和解概率在同一时期有统计上显著的相对下降超过10%。主导渠道是裁决严重性:即使在控制解释变量后,裁决金额在2020年后急剧上升,从2020年到2024年增长超过100%,而和解金额显示出有限且通常统计上不显著的通胀。因此,支付给原告的总金额通胀紧密跟随裁决严重性。社会通胀在公司被告和无保险被告案件以及没有侵权赔偿上限或第三方诉讼资助监管的州更为显著。此外,我们发现社会通胀不仅影响“核裁决”,而且以类似方式影响中等损失。

英文摘要

Social inflation, which is the rise in liability claim costs beyond general economic inflation, has become a major concern for insurers and reinsurers, yet it is difficult to quantify because litigation outcomes are heavy-tailed and the mix of cases reaching verdict versus settlement changes over time. Using a large database of US jury verdicts and settlements, we develop case-mix-adjusted social inflation measures through multiple channels that matter to reinsurers: plaintiff win rates (a frequency-type channel), settlement propensity (a frequency-type channel), and verdict/settlement severity. The approach combines rolling-window logistic regression for probabilities and quantile (value-at-risk) regression for severities, with uncertainty quantified via a random-weighted bootstrap. We find statistically significant relative increases in plaintiff win probability of approximately 20%-30% from 2009 to 2024, alongside a statistically significant relative decline in settlement probability of more than 10% over the same period. The dominant channel is verdict severity: Even after controlling for explanatory variables, verdict awards show a sharp rise after 2020, increasing by more than 100% from 2020 to 2024, whereas settlement amounts show limited and often statistically insignificant inflation. Therefore, inflation in total amounts payable to plaintiffs closely tracks verdict severity. Social inflation is more pronounced in corporate-defendant and uninsured-defendant cases and in states without tort caps or third-party litigation funding regulation. In addition, we find that social inflation has impacts not only on "nuclear verdicts" but also, in a similar manner, on moderate losses.

2605.26964 2026-05-29 stat.ME stat.AP

Semiparametric Inference for Causal Effects on Functional Outcomes

功能结果因果效应的半参数推断

Junzhu Nie, Chengxiu Ling, Mengfei Ran

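AI summary: For discretely observed functional data, proposes a semiparametric inference framework based on difference-in-differences that addresses the three challenges of identification, inference, and observation through the efficient influence function, debiased estimation, and uniform confidence bands.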

详情
AI中文摘要

双重差分(DiD)是因果推断的基石,但将其扩展到功能结果并非简单的标量推广;相反,它在识别、推断和观测方面面临三个基本挑战。本文针对离散观测数据的功能DiD开发了一个全面的半参数推断框架。首先,我们在平行趋势下定义了功能平均处理效应,并推导了其有效影响函数(EIF),从而建立了半参数效率界。其次,利用Neyman正交性和交叉拟合,我们构建了一个去偏估计量,有效缓解了非参数重构引起的正则化偏差。第三,我们建立了估计量的弱收敛性,并提出了渐近有效的均匀置信带,实现了从逐点推断到曲线级推断的严格过渡。最后,我们证明了离散采样下的重构误差在半参数推断中渐近可忽略,确保了实际可行性。模拟和实证应用证实,所提方法在有限样本中实现了优越的覆盖率和检验功效,为功能数据的因果评估提供了理论扎实且计算可行的基础。

英文摘要

Difference-in-differences (DiD) is a cornerstone of causal inference, yet extending it to functional outcomes is not a routine scalar generalization; rather, it entails three fundamental challenges in identification, inference, and observation. This paper develops a comprehensive semiparametric inference framework for functional DiD with discretely observed data. First, we define the functional average treatment effect under parallel trends and derive its efficient influence function (EIF), thereby establishing the semiparametric efficiency bound. Second, leveraging Neyman orthogonality and cross-fitting, we construct a debiased estimator that effectively mitigates regularization bias arising from nonparametric reconstruction. Third, we establish weak convergence of the estimator and propose an asymptotically valid uniform confidence band, enabling a rigorous transition from pointwise to curve-level inference. Finally, we demonstrate that reconstruction error under discrete sampling is asymptotically negligible for semiparametric inference, ensuring practical feasibility. Simulations and empirical applications confirm that the proposed method achieves superior coverage and testing power in finite samples, providing a theoretically grounded and computationally tractable foundation for causal evaluation with functional data.

2605.25303 2026-05-29 cs.DS cs.LG math.ST stat.ML stat.TH

Algorithms with Polynomially-Improved Approximation Factors for the $2 \rightarrow q$ Norm, and Applications

具有多项式改进近似因子的 $2 \rightarrow q$ 范数算法及其应用

Samuel B. Hopkins, Stefan Tiegel

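AI summary: For the $2 \rightarrow q$ norm with $q>2$, proposes the first polynomial-time approximation algorithm whose approximation factor beats the $d^{1/4}$ baseline by polynomial factors, for example $d^{1/8}$ for $q=4$, and constructs sum-of-squares certificates, yielding improved algorithms for robust mean estimation, covariance estimation, regression, and clustering.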

详情
Comments
v2 corrected minor typos
AI中文摘要

矩阵 $X \in \mathbb{R}^{n \times d}$ 的 $2 \rightarrow q$ 范数定义为 $\lVert X \rVert_{2 \rightarrow q} = \sup_{\lVert v \rVert_2 = 1} \lVert Xv \rVert_q$。我们针对 $q > 2$(即超收缩设置)给出了该范数的多项式时间乘法近似算法。该问题要么直接对应,要么与组合优化和近似难度(例如小集扩张)、量子信息(例如最佳可分态)以及算法统计学中长期存在的开放问题密切相关。 关于在多项式时间内能为此问题达到何种近似因子,我们所知甚少,尽管此类近似具有重要的下游影响。Barak、Brandão、Harrow、Kelner、Steurer 和 Zhou 表明,假设指数时间假设(FOCS'12),没有多项式时间算法能实现优于 $2^{\sqrt{\log n}}$ 的近似因子。另一方面,一个简单的谱算法给出了 $d^{1/4}$ 的基线近似。据我们所知,我们给出了首个在多项式因子内超越该基线的多项式时间近似算法。对于重要的特例 $q = 4$,它实现了 $d^{1/8}$ 的近似。所有先前的算法要么需要对 $X$ 附加假设,要么仅在 $n$ 较小时才能超越基线。 此外,我们为 $2 \rightarrow q$ 范数构造了平方和证书。这直接改进了当数据仅满足 $q$ 阶矩有界时的鲁棒均值和协方差估计、鲁棒回归以及聚类算法。

英文摘要

The $2 \rightarrow q$ norm of a matrix $X \in \mathbb{R}^{n \times d}$ is defined as $\lVert X \rVert_{2 \rightarrow q} = \sup_{\lVert v \rVert_2 = 1} \lVert Xv \rVert_q$. We give polynomial-time multiplicative approximation algorithms for this norm when $q > 2$ (i.e. in the hypercontractive setting). This problem either directly captures or is closely related to long-standing open problems in combinatorial optimization and hardness of approximation (e.g. Small Set Expansion), quantum information (e.g. Best Separable State), and algorithmic statistics. Very little is known about what approximation factors we can achieve for this problem in polynomial time, even though such approximations have significant downstream consequences. Barak, Brandão, Harrow, Kelner, Steurer, and Zhou showed that no polynomial-time algorithm can achieve an approximation factor better than $2^{\sqrt{\log n}}$, assuming the Exponential Time Hypothesis (FOCS'12). On the other hand, a simple spectral algorithm gives a $d^{1/4}$-approximation as a baseline. We give, to the best of our knowledge, the first polynomial-time approximation algorithm beating this baseline by polynomial factors. For the important special case of $q = 4$ it achieves a $d^{1/8}$-approximation. All previous algorithms required additional assumptions on $X$, or only surpassed the baseline for small values of $n$. Moreover, we construct sum-of-squares certificates for the $2 \rightarrow q$ norm. This directly implies improved algorithms for robust mean and covariance estimation, robust regression, and clustering, when the data only satisfies a bound on its $q$-th moment.

2605.13986 2026-05-29 cs.LG stat.ML

TabPFN-3: Technical Report

TabPFN-3: 技术报告

Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Mihir Manium, Shi Bin Hoo, Magnus Bühler, Anurag Garg, Dominik Safaric, Jake Robertson, Benjamin Jäger, Simone Alessi, Adrian Hayler, Vladyslav Moroshan, Lennart Purucker, Philipp Singer, Alan Arazi, Julien Siems, Jan Hendrik Metzen, Georg Grab, Nick Erickson, Siyuan Guo, Eliott Kalfon, Simon Bing, David Salinas, Clara Cornu, Lilly Charlotte Wehrhahn, Diana Kriuchkova, Kursat Kaya, Lydia Sidhoum, Marie Salmon, Jerry Chen, Madelon Hulsebos, Yann LeCun, Samuel Müller, Bernhard Schölkopf, Sauraj Gambhir, Noah Hollmann, Frank Hutter

AI总结 本文提出TabPFN-3,通过扩展训练数据和优化推理,在表格数据上实现最先进性能,并支持时间序列、关系数据和表格文本数据。

详情
AI中文摘要

表格数据支撑着科学和工业中大多数高价值预测问题,而TabPFN推动了该模态的基础模型革命。根据用户反馈设计,TabPFN-3在此基础上将最先进性能扩展到具有100万训练行的数据集,并大幅减少训练和推理时间。TabPFN-3完全基于我们先验的合成数据进行预训练,极大地推动了表格预测的前沿,并在时间序列、关系数据和表格文本数据上带来了实质性收益。在标准表格基准TabArena上,TabPFN-3的单次前向传播以显著优势优于所有其他模型(包括调优和集成基线),并在速度/性能前沿上占据帕累托优势。在更多样化的数据集上,TabPFN-3在类别数较多的数据集上排名第一,并在多达100万训练行和200个特征的数据集上击败了经过8小时调优的梯度提升树基线。TabPFN-3将测试时计算缩放引入表格基础模型。我们的API产品TabPFN-3-Plus(思考版)利用这一点,在TabArena上以超过200 Elo的优势击败所有非TabPFN模型,在最大数据子集上优势升至420 Elo,并且在性能优于AutoGluon 1.5 extreme的同时速度快10倍,且不使用LLM、真实数据、互联网搜索或除TabPFN之外的任何其他模型。TabPFN-3扩展了我们模型的能力,实现了对关系数据(在RelBenchV1上新的最先进基础模型)和表格文本数据(通过TabPFN-3-Plus在TabSTAR上达到最先进)的最先进预测;并改进了现有集成:专用检查点TabPFN-TS-3在时间序列基准fev-bench上排名第二,SHAP值计算速度提升高达120倍。TabPFN-3在实现这一性能的同时,比TabPFN-2.5最高快20倍。此外,减少的KV缓存和行分块技术使其能够在单个H100上以快速推理速度扩展到100万行。

英文摘要

Tabular data underpins most high-value prediction problems in science and industry, and TabPFN has driven the foundation model revolution for this modality. Designed with feedback from our users, TabPFN-3 builds on this foundation to scale state-of-the-art performance to datasets with 1M training rows and substantially reduce training and inference time. Pretrained exclusively on synthetic data from our prior, TabPFN-3 dramatically pushes the frontier of tabular prediction and brings substantial gains on time series, relational, and tabular-text data. On the standard tabular benchmark TabArena, a forward pass of TabPFN-3 outperforms all other models, including tuned and ensembled baselines, by a significant margin, and pareto-dominates the speed/performance frontier. On more diverse datasets, TabPFN-3 ranks first on datasets with many classes, and beats 8-hour-tuned gradient-boosted-tree baselines on datasets up to 1M training rows and 200 features. TabPFN-3 introduces test-time compute scaling to tabular foundation models. Our API offering TabPFN-3-Plus (Thinking) exploits this to beat all non-TabPFN models by over 200 Elo on TabArena, rising to 420 Elo on the largest data subset, and outperforms AutoGluon 1.5 extreme while being 10x faster, without using LLMs, real data, internet search or any other model besides TabPFN. TabPFN-3 extends the capabilities of our models, enabling SOTA prediction on relational data (new SOTA foundation model on RelBenchV1) and tabular-text data (SOTA on TabSTAR via TabPFN-3-Plus); and improves existing integrations: a specialized checkpoint, TabPFN-TS-3, ranks 2nd on the time-series benchmark fev-bench, and SHAP-value computation is up to 120x faster. TabPFN-3 achieves this performance while being up to 20x faster than TabPFN-2.5. In addition, a reduced KV cache and row-chunking scale to 1M rows on one H100 with fast inference speed.

2605.07596 2026-05-29 stat.ML cs.LG

A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning

极端多类监督对比表示学习的精细泛化分析

Nong Minh Hieu, Antoine Ledent

AI总结 针对对比表示学习在有限标注数据中构造元组导致依赖性的问题,提出改进的U-统计量分析,得到与类别数R同阶的样本复杂度,并设计新估计器在长尾分布下实现O(k)的样本复杂度。

详情
Comments
Accepted at ICML 2026
AI中文摘要

对比表示学习(CRL)在多个机器学习领域取得了强大的实证成功,但其理论样本复杂度仍然知之甚少。现有分析通常假设输入元组是独立同分布的,这一假设在大多数实际设置中被违反,因为对比元组是从有限标注数据池中构造的,导致元组之间存在依赖性。虽然最近有一项工作使用U-统计量分析这种学习设置以估计总体风险,但其中使用的技术要求每个类别的风险均匀集中,使得超额风险界限的规模为$ρ_{\min}^{-{1}/{2}}$,其中$ρ_{\min}$表示最稀有类别的概率。这种依赖在极端多类设置中可能过于悲观,因为存在许多尾部类别,它们对总体风险的贡献极小。我们的贡献有两方面。首先,我们改进了先前的工作,证明了一个样本复杂度与类别数$R$同阶的界限,无论类别分布如何。此外,我们制定了一个不同的估计器,捕捉风险\textit{跨类别}的集中性,从而在极端多类学习场景中实现更尖锐的界限,特别是在类别分布为长尾的情况下。在类别分布的温和假设下,得到的样本复杂度为$\mathcal{O}(k)$,其中$k$是每个元组的样本数。

英文摘要

Contrastive Representation Learning (CRL) has achieved strong empirical success in multiple machine learning disciplines, yet its theoretical sample complexity remains poorly understood. Existing analyses usually assume that input tuples are identically and independently distributed, an assumption violated in most practical settings where contrastive tuples are constructed from a finite pool of labeled data, inducing dependencies among tuples. While one recent work analyzed this learning setting using U-Statistics to estimate the population risk, the techniques used therein require the risk of each class to concentrate uniformly, making excess risk bounds scale in the order of $ρ_{\min}^{-{1}/{2}}$ where $ρ_{\min}$ denotes the probability of the rarest class. Such a dependency can be overly pessimistic in the extreme multiclass settings where there are many tail classes which contribute minimally to the overall population risk. Our contributions are two-fold. Firstly, we improve upon the previous work and prove a bound with a sample complexity of the same order as the number of classes $R$, regardless of the distribution over classes. Furthermore, we formulate a different estimator that captures the concentration of the risk \textit{across classes}, enabling sharper bounds in extreme multi-class learning scenarios, especially where class distributions are long-tailed. Under mild assumptions on the class distributions, the resulting sample complexity is $\mathcal{O}(k)$ where $k$ is the number of samples per tuple.
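
The role of U-statistics here can be illustrated with a toy case. The sketch below averages a symmetric kernel over all unordered pairs drawn from one finite pool, so the summands reuse the same data points and are dependent, which mirrors how contrastive tuples are built from a fixed labeled set. The kernel $h(a, b) = \tfrac{1}{2}(a - b)^2$ is chosen only because its U-statistic is known to equal the unbiased sample variance; it is an assumption for illustration and not the contrastive loss or the estimator analyzed in the paper.

```python
from itertools import combinations
import statistics

def u_statistic(sample, kernel):
    """Average a symmetric two-argument kernel over all unordered pairs."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)

def half_squared_diff(a, b):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (a - b) ** 2

# Hypothetical pool of five observations; every point appears in four pairs.
data = [2.0, 4.0, 4.0, 5.0, 7.0]
u_var = u_statistic(data, half_squared_diff)
```

Because each observation appears in several pairs, the usual i.i.d. concentration arguments do not apply to the summands directly, which is the dependence issue the abstract refers to.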

2605.06355 2026-05-29 cs.LG stat.ML

Order-Agnostic Autoregressive Modelling with Missing Data

缺失数据下的顺序无关自回归建模

Ignacio Peis, Pablo M. Olmos, Jes Frellsen

AI总结 本文通过缺失数据视角重新审视顺序无关自回归模型,提出缺失感知训练框架,并利用其条件密度估计进行主动信息获取,在多个基准上优于传统插补方法。

详情
AI中文摘要

顺序无关自回归模型在深度生成建模中表现出色,但其在数据不完整情况下的应用尚未被充分探索。本文从缺失数据的角度重新审视这些模型。首先,我们证明它们在完全观测数据上的标准训练过程隐式地在完全随机缺失机制下进行插补,从而在高缺失率场景下实现了稳健的样本外插补性能。其次,我们提出了第一个原则性框架,用于在一般缺失机制下直接从不完整数据集中训练这些模型。第三,我们利用其摊销条件密度估计进行主动信息获取,即顺序选择对下游预测或推理最有信息量的缺失变量。在一系列真实世界基准测试中,我们的缺失感知顺序无关自回归模型(MO-ARM)持续优于已建立的插补基线。

英文摘要

Order-Agnostic autoregressive models have demonstrated strong performance in deep generative modeling, yet their use in settings with incomplete data remains largely unexplored. In this work, we reinterpret them through the lens of missing data. First, we show that their standard training procedure on fully observed data implicitly performs imputation under a missing completely at random mechanism, resulting in robust out-of-sample imputation performance in settings with high missingness. Second, we introduce the first principled framework for training them directly on incomplete datasets under general missingness mechanisms. Third, we leverage their amortized conditional density estimation to perform active information acquisition, i.e., sequentially selecting the most informative missing variables for downstream prediction or inference. Across a suite of real-world benchmarks, our Missingness-Aware Order-Agnostic Autoregressive Model (MO-ARM) consistently outperforms established imputation baselines.

2605.01665 2026-05-29 econ.EM stat.ME

Exact Likelihood Inference and Robust Filtering for Gauss-Cauchy Convolution Models

高斯-柯西卷积模型的精确似然推断与鲁棒滤波

Peter Reinhard Hansen, Chen Tong

AI总结 本文推导了Voigt分布(高斯与柯西卷积)的解析表达式,用于重尾测量噪声建模,并基于此提出GCC滤波器,在状态空间模型中实现鲁棒滤波,实证中优于高斯、t分布等滤波器。

详情
AI中文摘要

高斯分布与柯西分布的卷积,即Voigt分布,广泛应用于光谱学,并为重尾测量噪声建模提供了自然框架。我们利用缩放互补误差函数推导了其密度、得分、海森矩阵、Fisher信息量和条件矩的解析表达式,从而无需数值卷积、有限差分导数或伪Voigt近似即可实现稳定的最大似然估计。潜在高斯分量的条件期望由再下降位置得分控制,因此极端观测值会被自动折扣而非传播。该结构导致了用于具有高斯潜在动态和Voigt测量误差的状态空间模型的高斯-柯西卷积(GCC)滤波器,其中Masreliez高斯预测近似保留了Voigt预测误差密度。在对科技板块SPDR基金的对数已实现波动率的应用中,GCC滤波器将持久的潜在变化与瞬时的测量噪声分离,并在所考虑的高斯、Student-t、Huber及相关滤波规格中实现了最高的预测误差准则。

英文摘要

The convolution of a Gaussian and a Cauchy distribution, known as the Voigt distribution, is widely used in spectroscopy and provides a natural framework for modeling heavy-tailed measurement noise. We derive analytical expressions for its density, score, Hessian, Fisher information, and conditional moments using the scaled complementary error function, enabling stable maximum likelihood estimation without numerical convolution, finite-difference derivatives, or pseudo-Voigt approximations. The conditional expectation of the latent Gaussian component is governed by a redescending location score, so extreme observations are automatically discounted rather than propagated. This structure leads to the Gauss-Cauchy Convolution (GCC) filter for state-space models with Gaussian latent dynamics and Voigt measurement errors, where the Masreliez Gaussian prediction approximation preserves a Voigt prediction-error density. In an application to log realized volatility for the Technology Select Sector SPDR Fund, the GCC filter separates persistent latent variation from transient measurement noise and attains the highest implemented prediction-error criterion among the Gaussian, Student-$t$, Huber, and related filtering specifications considered.

2605.01050 2026-05-29 stat.AP

Trust Me, I'm a Doctor?

相信我,我是医生?

Zach Shahn, Mats Stensrud

AI总结 利用嵌套在观察性队列中的随机试验数据,推导医生个人策略优于试验平均推荐策略的比例的严格界限。

详情
AI中文摘要

临床试验通常针对平均治疗效果,但治疗决策是针对个体做出的。这种张力引发了对循证医学的一个常见批评:平均有益的特定治疗可能不适合特定患者,而熟练的医生可能优于严格遵循随机试验中表现最佳的策略。我们考虑如何使用来自同一目标人群的随机和观察性数据来评估这种可能性。具体来说,我们研究了随机试验嵌套在观察性队列中的设置,从而在治疗、对照和常规护理下观察到结果。我们询问观察数据能揭示医生多久能胜过试验建议的策略。我们推导了医生个人策略优于始终选择试验中表现更好的治疗的比例的严格界限,假设没有医生的策略比始终选择试验中表现更差的治疗更差。这些结果揭示了临床数据何时支持依赖医生自由裁量权而非试验平均推荐,以及何时需要更强的理由。

英文摘要

Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate for a particular patient, and skilled physicians may outperform rigid adherence to the strategy that performed best in a randomized trial. We consider how randomized and observational data from the same target population can be used to assess that possibility. Specifically, we study settings in which a randomized trial is nested within an observational cohort, so that outcomes are observed under treatment, control, and usual care. We ask what the observed data can reveal about how often physicians outperform the strategy suggested by the trial. We derive sharp bounds on the proportion of physicians whose personal strategies perform better than always choosing the better performing treatment from the trial under the assumption that no physician's strategy is worse than always choosing the worse performing treatment from the trial. These results shed light on when clinical data support relying on physician discretion over the trial-average recommendation and when stronger justification is required.

2604.02094 2026-05-29 math.ST math.PR stat.TH

Importance sampling for Bayesian inference: polynomial-dimension dependent error bounds

贝叶斯推断的重要性采样:多项式维数依赖的误差界

Fabián González, Víctor Elvira, Joaquín Míguez

AI总结 本文在随机观测视角下,通过引入链接函数,证明了重要性采样的L2误差界在任意维数下以标准蒙特卡洛速率收敛,并给出了多项式维数依赖的误差分析。

详情
AI中文摘要

许多贝叶斯推断问题涉及高维模型,其中标准重要性采样(IS)方法的性能通常随着维数增加而迅速下降。经典IS分析通常假设观测是任意但固定的(即确定性的),从而忽略了贝叶斯模型赋予数据的概率结构。在本文中,我们采用观测本身是随机变量的视角,其分布由底层模型控制。在这个概率框架内,我们识别出一个模型依赖的函数,称为链接函数,它连接了固定观测和随机观测的表述。我们给出了L2蒙特卡洛估计误差的特征:具体来说,我们证明了当且仅当链接函数是Bochner可积时,L2误差界是有限的,并且以标准蒙特卡洛速率O(N^{-1/2})收敛,对于任意大的维数。这一结果揭示了控制近似误差的基本量,并建立了一种管理模型状态维数依赖性的机制。因此,我们的方法提供了一种原则性的方式来缓解高维挑战,提供了超越现有文献中主导的最坏情况分析的见解。最后,我们推导了几类模型(包括线性高斯系统和具有有界观测函数的模型)相关误差的维数缩放的显式解析示例。

英文摘要

Many Bayesian inference problems involve high-dimensional models where the performance of standard importance sampling (IS) methods often degrades rapidly as the dimensionality increases. Classical analyses of IS typically rely on the assumption that observations are arbitrary but fixed (i.e., deterministic), thereby neglecting the probabilistic structure that the Bayesian model induces on the data. In this paper, we adopt the perspective that observations are themselves random variables whose distribution is governed by the underlying model. Within this probabilistic framework, we identify a model-dependent function, referred to as the link function, which connects the fixed- and random-observation formulations. We provide a characterization of the $L^2$ Monte Carlo estimation error: specifically, we show that the $L^2$ error bounds are finite and converge at the standard Monte Carlo rate $O(N^{-1/2})$, for arbitrarily large dimension, if and only if the link function is Bochner integrable. This result reveals the fundamental quantity controlling the approximation error and establishes a mechanism to manage the dependence on the model state dimension. Consequently, our approach provides a principled way to alleviate the challenges of high dimensionality, offering insights that transcend worst-case analyses dominant in the existing literature. Finally, we derive explicit analytical examples of the dimensional scaling of the associated errors for several model classes, including linear-Gaussian systems and models with bounded observation functions.
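
For readers less familiar with the estimator under discussion, the following is a minimal one-dimensional sketch of standard self-normalised importance sampling. It only illustrates the basic mechanism and does not reproduce the paper's link function or its error bounds. The target (an unnormalised $N(1, 1)$ density), the proposal ($N(0, 2^2)$), and all function names are assumptions chosen for the example.

```python
import math
import random

def self_normalised_is(log_target, sample_proposal, log_proposal, f, n, seed=0):
    """Estimate E_target[f(X)] from n proposal draws with normalised weights."""
    rng = random.Random(seed)
    xs = [sample_proposal(rng) for _ in range(n)]
    log_w = [log_target(x) - log_proposal(x) for x in xs]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return sum(wi * f(x) for wi, x in zip(w, xs)) / total

def log_target(x):
    """Unnormalised log-density of N(1, 1)."""
    return -0.5 * (x - 1.0) ** 2

def sample_proposal(rng):
    """Draw from the proposal N(0, 2^2)."""
    return rng.gauss(0.0, 2.0)

def log_proposal(x):
    """Log-density of N(0, 2^2) up to a constant that cancels after normalisation."""
    return -0.5 * (x / 2.0) ** 2

estimate = self_normalised_is(log_target, sample_proposal, log_proposal,
                              lambda x: x, n=20000)
```

In this easy low-dimensional case the estimate of the target mean lands close to 1; the abstract concerns when comparable $O(N^{-1/2})$ behaviour can be guaranteed as the dimension grows.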

2603.20329 2026-05-29 stat.ML cs.LG math.PR

Measure flow path recovery in Bayes Hilbert spaces

贝叶斯希尔伯特空间中的测度流路径恢复

S. David Mis, Maarten V. de Hoop

AI总结 针对有限移动局部传感器恢复概率测度流的不适定问题,提出基于贝叶斯希尔伯特框架的变分理论,通过构造最小能量传输实现和线性化观测算子,分析可恢复性条件,并发展有限维约化方法实现稳定重建。

详情
AI中文摘要

我们研究使用贝叶斯希尔伯特框架从有限个移动局部传感器恢复概率测度流的不适定问题。相对于固定的参考概率测度,概率律由其中心化对数比坐标表示,因此演化律成为希尔伯特函数空间中的一条路径。对于足够正则的贝叶斯希尔伯特路径,我们通过在每个时间点求解加权纽曼问题,构造路径的规范最小能量传输实现,得到切方向上的内在传输形式。然后,我们直接在贝叶斯希尔伯特路径空间上制定逆问题。观测算子的线性化产生可观测性形式,可恢复性由其与传输几何通过联合传输-可观测性形式的相互作用决定。在无穷维环境中,我们发展了正则化变分理论,并识别了局部传感器的局限性:移动传感器可以使联合形式单射,但通常不能在整个状态空间上产生强制稳定性估计。这一障碍自然导致有限维贝叶斯希尔伯特约化。在那里,传输形式成为动能张量,线性化观测成为约化感知矩阵,因此可恢复性可以通过显式的格拉姆条件表达。我们证明局部凸起传感器检测每个固定的约化方向,有限个适当放置的静态传感器产生均匀的约化可观测性,并且存在依赖于路径的传感器轨迹,使得即使单个移动传感器也能恢复约化路径。最后,我们证明这些约化恢复结果可以提升到对由所选有限维子空间良好近似的路径的近似环境恢复,从而实现稳定重建至投影误差。

英文摘要

We study the ill-posed problem of recovering a probability measure flow from finitely many moving localized sensors using a Bayes Hilbert framework. Relative to a fixed reference probability measure, a probability law is represented by its centered log-ratio coordinates, so that an evolving law becomes a path in a Hilbert space of functions. For sufficiently regular Bayes Hilbert paths, we construct a canonical minimum-energy transport realization of the path by solving a weighted Neumann problem at each time, yielding an intrinsic transport form on tangent directions. We then formulate an inverse problem directly on Bayes Hilbert path space. Linearization of an observation operator yields an observability form, and recoverability is governed by its interaction with the transport geometry through a joint transport--observability form. In the ambient infinite-dimensional setting, we develop a regularized variational theory and identify limitations of localized sensing: mobile sensors can make the joint form injective, but they do not in general yield a coercive stability estimate on the full state space. This obstruction leads naturally to finite-dimensional Bayes Hilbert reductions. There the transport form becomes a kinetic tensor and the linearized observations become reduced sensing matrices, so recoverability can be expressed through explicit Gramian conditions. We show that localized bump sensors detect every fixed reduced direction, that finitely many suitably placed static sensors yield uniform reduced observability, and that there exist path-dependent sensor trajectories such that even a single moving sensor can recover the reduced path. Finally, we show that these reduced recovery results lift to approximate ambient recovery for paths that are well approximated by the chosen finite-dimensional subspaces, yielding stable reconstruction up to projection error.

2603.15192 2026-05-29 stat.AP

Benchmarking Formula 1 results using a normal model

使用正态模型对标一级方程式赛车成绩

John Fry, Silvio Fanzon, Mark Austin, Tom Brighton

AI总结 本文使用单变量和双变量正态模型,区分精英与非精英车队,量化车手和车队层面的合理性能预期,并应用于2025赛季数据。

详情
AI中文摘要

在体育运动中,区分技能和运气的影响一直备受关注。在一级方程式中,一个关键问题是区分赛车层面和车手层面的效应。目前,四支精英车队主导了一级方程式,并在过去四年中赢得了所有主要比赛。在本文中,我们使用单变量和双变量正态模型来量化车手和车队层面的合理性能预期,区分精英和非精英车队。我们通过应用于最近完全完成的2025赛季来展示我们的方法。

英文摘要

There is enduring interest in disentangling the effects of skill and luck in sport. A key issue in Formula 1 is distinguishing between car-level and driver-level effects. Four elite teams currently dominate Formula 1 and have won every major race for the last four years. In this paper we use univariate and bivariate normal models to quantify reasonable performance expectations at both driver and team levels, distinguishing between elite and non-elite teams. We illustrate our approach with an application to the last fully completed 2025 season.

2603.05002 2026-05-29 cs.LG math.OC stat.ML

Non-Euclidean Gradient Descent Operates at the Edge of Stability

非欧几里得梯度下降在稳定性边缘运行

Rustem Islamov, Michael Crawshaw, Jeremy Cohen, Robert Gower

AI总结 本文通过方向光滑性解释梯度下降中的稳定性边缘现象,并将其推广到非欧几里得范数,定义广义尖锐度,实验表明非欧几里得梯度下降也表现出渐进尖锐化和阈值振荡。

详情
AI中文摘要

稳定性边缘(EoS)是一种现象,其中Hessian矩阵的尖锐度(最大特征值)在梯度下降(GD)中接近并徘徊在稳定性阈值$2/η$附近(步长为$η$)。尽管(表面上)违反了经典光滑性假设,但EoS在深度学习中已被广泛观察到,其理论基础仍不完整。我们通过方向光滑性[Mishkin et al., 2024]的视角提供了对EoS的解释。这种解释自然地扩展到非欧几里得范数,我们用它来定义任意范数下的广义尖锐度。我们的广义尖锐度度量包括先前研究的普通GD和预处理GD作为特例,以及尚未研究EoS的方法,例如$\ell_{\infty}$下降、块坐标下降、谱GD及其归一化版本。通过在神经网络上的实验,我们表明具有广义尖锐度的非欧几里得GD也表现出渐进尖锐化,随后在阈值$2/η$附近或之上振荡。在实践中,我们的框架提供了一种几何感知的谱诊断方法,可应用于广泛的非欧几里得梯度方法类别。

英文摘要

The Edge of Stability (EoS) is a phenomenon where the sharpness (largest eigenvalue) of the Hessian approaches and then hovers near the stability threshold $2/η$ during gradient descent (GD) with step size $η$. Despite (apparently) violating classical smoothness assumptions, EoS has been widely observed in deep learning, but its theoretical foundations remain incomplete. We provide an interpretation of EoS through the lens of Directional Smoothness [Mishkin et al., 2024]. This interpretation naturally extends to non-Euclidean norms, which we use to define generalized sharpness under an arbitrary norm. Our generalized sharpness measure includes previously studied vanilla GD and preconditioned GD as special cases, as well as methods for which EoS has not been studied, such as $\ell_{\infty}$-descent, Block CD, Spectral GD, and their normalized versions. Through experiments on neural networks, we show that non-Euclidean GD with our generalized sharpness also exhibits progressive sharpening followed by oscillations around or above the threshold $2/η$. Practically, our framework provides a geometry-aware spectral diagnostic that can be applied across a broad class of non-Euclidean gradient methods.
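
The threshold $2/η$ comes from the classical stability analysis of gradient descent on a quadratic, which the sketch below reproduces in one dimension. On $f(x) = \tfrac{1}{2} λ x^2$ the update is $x \leftarrow (1 - ηλ) x$, so the iterates shrink when the sharpness $λ$ is below $2/η$ and blow up when it is above. This is only the Euclidean textbook case; it does not implement the generalized sharpness or any of the non-Euclidean methods from the paper, and the function name and constants are illustrative.

```python
def gd_on_quadratic(sharpness, step_size, x0=1.0, steps=200):
    """Run GD on f(x) = 0.5 * sharpness * x**2 and return the final |x|."""
    x = x0
    for _ in range(steps):
        x -= step_size * sharpness * x  # the gradient of f is sharpness * x
    return abs(x)

eta = 0.1
threshold = 2.0 / eta  # stability threshold 2 / eta

# Sharpness just below the threshold: iterates contract by |1 - 1.9| = 0.9 per step.
below = gd_on_quadratic(sharpness=19.0, step_size=eta)
# Sharpness just above the threshold: iterates grow by |1 - 2.1| = 1.1 per step.
above = gd_on_quadratic(sharpness=21.0, step_size=eta)
```

The Edge of Stability phenomenon is notable precisely because neural network training hovers near this threshold without diverging the way the quadratic does.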

2602.16449 2026-05-29 cs.LG cs.AI stat.ML

GICDM: Mitigating Hubness for Reliable Distance-Based Generative Model Evaluation

GICDM: 缓解枢纽性以实现可靠的基于距离的生成模型评估

Nicolas Salvy, Hugues Talbot, Bertrand Thirion

AI总结 针对生成模型评估中高维嵌入空间的枢纽性现象,提出GICDM方法(基于迭代上下文不相似度度量),通过多尺度扩展校正邻域估计,恢复可靠度量并与人类评估对齐。

详情
Comments
Forty-third International Conference on Machine Learning, 2026
AI中文摘要

生成模型评估通常依赖于高维嵌入空间来计算样本之间的距离。我们表明,这些空间中的数据集表示受到枢纽性现象的影响,这会扭曲最近邻关系并使基于距离的度量产生偏差。基于经典的迭代上下文不相似度度量(ICDM),我们引入了生成式ICDM(GICDM),一种校正真实数据和生成数据邻域估计的方法。我们引入了多尺度扩展以改善经验行为。在合成和真实基准上的大量实验表明,GICDM解决了枢纽性引起的失败,恢复了可靠的度量行为,并改善了与人类评估的一致性。

英文摘要

Generative model evaluation commonly relies on high-dimensional embedding spaces to compute distances between samples. We show that dataset representations in these spaces are affected by the hubness phenomenon, which distorts nearest-neighbor relationships and biases distance-based metrics. Building on the classical Iterative Contextual Dissimilarity Measure (ICDM), we introduce Generative ICDM (GICDM), a method to correct neighborhood estimation for both real and generated data. We introduce a multi-scale extension to improve empirical behavior. Extensive experiments on synthetic and real benchmarks demonstrate that GICDM resolves hubness-induced failures, restores reliable metric behavior, and improves alignment with human assessment.
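
Hubness is commonly diagnosed through k-occurrence counts: for each point, how many other points include it among their k nearest neighbours. The sketch below computes these counts on synthetic Gaussian data. It is a generic diagnostic, not an implementation of ICDM or GICDM; the data, dimension, and the function name `k_occurrence` are assumptions made for the example.

```python
import math
import random

def k_occurrence(points, k):
    """For each point, count how many other points list it among their k nearest neighbours."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            counts[j] += 1
    return counts

# Hypothetical data: 200 points drawn from a 50-dimensional standard Gaussian.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]
counts = k_occurrence(points, k=5)
```

The counts always average to k, so a strongly right-skewed distribution of `counts` (a few points with very large values, many with zero) is the signature of hubs that can bias nearest-neighbour metrics.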

2602.10637 2026-05-29 cs.LG cond-mat.stat-mech physics.chem-ph stat.ML

Coarse-Grained Boltzmann Generators

粗粒度玻尔兹曼生成器

Weilong Chen, Bojun Zhao, Jan Eckwert, Julija Zavadlav

AI总结 提出粗粒度玻尔兹曼生成器(CG-BGs)框架,结合基于流的生成模型与重要性采样,利用学习到的平均力势(PMF)进行重加权,在降低计算成本的同时实现大分子系统的平衡采样。

详情
Comments
Accepted at ICML 2026
AI中文摘要

从玻尔兹曼分布中采样平衡分子构型是一个长期挑战。玻尔兹曼生成器(BGs)通过结合精确似然生成模型与重要性采样来解决这一问题,但实际可扩展性有限。同时,粗粒度代理模型通过降低有效维度来建模更大系统,但往往缺乏确保渐近正确统计量的重加权过程。在这项工作中,我们提出了粗粒度玻尔兹曼生成器(CG-BGs),一个用于粗粒度坐标空间中的降阶生成建模与重要性采样的框架。CG-BGs使用基于流的模型生成样本,并使用学习到的平均力势(PMF)进行重加权。我们表明,可以通过增强采样力匹配从快速收敛的轨迹中学习PMF。实验证明,CG-BGs在高度降阶表示中捕获溶剂介导的相互作用,同时相对于原子级BGs大幅降低计算成本,为更大分子系统的平衡采样提供了实用途径。

英文摘要

Sampling equilibrium molecular configurations from the Boltzmann distribution is a longstanding challenge. Boltzmann Generators (BGs) address this by combining exact-likelihood generative models with importance sampling, but practical scalability is limited. Meanwhile, coarse-grained surrogates enable the modeling of larger systems by reducing effective dimensionality, yet often lack a reweighting procedure required to ensure asymptotically correct statistics. In this work, we propose Coarse-Grained Boltzmann Generators (CG-BGs), a framework for reduced-order generative modeling with importance sampling in coarse-grained coordinate space. CG-BGs generate samples using a flow-based model and reweight them using a learned potential of mean force (PMF). We show that the PMF can be learned from rapidly converged trajectories via enhanced sampling force matching. Experiments demonstrate that CG-BGs capture solvent-mediated interactions in highly reduced representations while substantially reducing computational cost relative to atomistic BGs, providing a practical route toward equilibrium sampling of larger molecular systems.

2602.06361 2026-05-29 cs.GT cs.IT cs.LG math.IT stat.ML

Envy-Free Allocation of Indivisible Goods via Noisy Queries

通过噪声查询实现不可分割物品的无嫉妒分配

Zihan Li, Yan Hao Ling, Jonathan Scarlett, Warut Suksompong

AI总结 针对不可直接观测估值、仅能通过噪声查询获取信息的不可分割物品分配问题,在双智能体高斯噪声和有界估值设定下,推导了实现无嫉妒分配所需查询次数的上下界,并证明了当最优分配负嫉妒值Δ不太小时最优查询次数与m^{2.5}/Δ^2成比例。

详情
Comments
ICML 2026
AI中文摘要

我们引入了一个公平分配不可分割物品(物品)的问题,其中智能体的估值无法直接观测,而只能通过噪声查询访问。在双智能体设定中,考虑高斯噪声和有界估值,我们推导了根据物品数量$m$和最优分配的负嫉妒值$Δ$,找到无嫉妒分配所需查询次数的上下界。特别地,当$Δ$不太小(即$Δ\gg m^{1/4}$)时,我们证明最优查询次数在忽略对数因子下为$\frac{\sqrt m }{(Δ/ m)^2} = \frac{m^{2.5}}{Δ^2}$。我们的上界基于非自适应查询和一个简单的基于阈值的分配算法,该算法在多项式时间内运行,而下界即使在自适应查询和任意计算时间下也成立。

英文摘要

We introduce a problem of fairly allocating indivisible goods (items) in which the agents' valuations cannot be observed directly, but instead can only be accessed via noisy queries. In the two-agent setting with Gaussian noise and bounded valuations, we derive upper and lower bounds on the required number of queries for finding an envy-free allocation in terms of the number of items, $m$, and the negative-envy of the optimal allocation, $Δ$. In particular, when $Δ$ is not too small (namely, $Δ\gg m^{1/4}$), we establish that the optimal number of queries scales as $\frac{\sqrt m }{(Δ/ m)^2} = \frac{m^{2.5}}{Δ^2}$ up to logarithmic factors. Our upper bound is based on non-adaptive queries and a simple thresholding-based allocation algorithm that runs in polynomial time, while our lower bound holds even under adaptive queries and arbitrary computation time.

2602.05961 2026-05-29 cs.LG stat.ML

Discrete diffusion samplers and bridges: Off-policy algorithms and applications in latent spaces

离散扩散采样器与桥:离策略算法及其在潜在空间中的应用

Arran Carter, Sanghyeok Choi, Kirill Tamogashev, Víctor Elvira, Esmeralda S. Whitammer

AI总结 提出离策略训练技术改进离散扩散采样器性能,并首次引入离散域的数据到能量薛定谔桥训练,应用于图像生成模型的离散潜在空间中的无数据后验采样。

详情
Comments
ICML 2026. Code: https://github.com/mmacosha/offpolicy-discrete-diffusion-samplers-and-bridges
AI中文摘要

从仅在相差一个归一化常数的意义下已知的分布 $p(x) \propto e^{-\mathcal{E}(x)}$ 中采样是统计学中一个重要且具有挑战性的问题。近年来,出现了一类新的摊销采样算法,通常称为扩散采样器,能够从未归一化的密度中快速高效地采样。这类算法在连续空间采样任务中已被广泛研究;然而,它们在离散空间问题中的应用仍基本未被探索。尽管该领域已取得一些进展,但离散扩散采样器并未充分利用连续空间采样中常用的思想。在本文中,我们提出通过引入离散扩散采样器的离策略训练技术来弥合这一差距。我们证明这些技术在已有和新颖的合成基准上提高了离散采样器的性能。接下来,我们将离散扩散采样器推广到两个任意分布之间的桥接任务,首次为离散域引入了数据到能量薛定谔桥训练。最后,我们展示了所提出的扩散采样器在图像生成模型的离散潜在空间中进行无数据后验采样的应用。

英文摘要

Sampling from a distribution $p(x) \propto e^{-\mathcal{E}(x)}$ known up to a normalising constant is an important and challenging problem in statistics. Recent years have seen the rise of a new family of amortised sampling algorithms, commonly referred to as diffusion samplers, that enable fast and efficient sampling from an unnormalised density. Such algorithms have been widely studied for continuous-space sampling tasks; however, their application to problems in discrete space remains largely unexplored. Although some progress has been made in this area, discrete diffusion samplers do not take full advantage of ideas commonly used for continuous-space sampling. In this paper, we propose to bridge this gap by introducing off-policy training techniques for discrete diffusion samplers. We show that these techniques improve the performance of discrete samplers on both established and new synthetic benchmarks. Next, we generalise discrete diffusion samplers to the task of bridging between two arbitrary distributions, introducing data-to-energy Schrödinger bridge training for the discrete domain for the first time. Lastly, we showcase the application of the proposed diffusion samplers to data-free posterior sampling in the discrete latent spaces of image generative models.

2512.10401 2026-05-29 stat.ML cs.LG math.ST stat.TH

Diffusion differentiable resampling

扩散可微重采样

Jennifer Rosina Andersson, Zheng Zhao

AI总结 针对序贯蒙特卡洛中的可微重采样问题,提出一种基于无训练扩散模型代理的信息性且即时可微的重采样方法,理论证明其一致性,并在多个滤波和参数估计基准上优于现有方法。

详情
Comments
In ICML 2026
AI中文摘要

本文关注序贯蒙特卡洛(例如粒子滤波)中的可微重采样问题。借鉴重参数化,我们提出了一种新的重采样方法,该方法基于无训练扩散模型代理,具有信息性且即时可微。我们从理论上证明了我们的扩散重采样方法提供了一致的重采样分布,并通过实验表明,在多个滤波和参数估计基准上,它优于最先进的可微重采样方法。最后,我们展示了当用于学习具有高维图像观测的复杂动力学-解码器模型时,它实现了具有竞争力的端到端性能。

英文摘要

This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). Drawing on reparametrisation, we propose a new resampling method that is informative and instantly differentiable, based on a training-free diffusion model surrogate. We theoretically prove that our diffusion resampling method provides a consistent resampling distribution, and we show empirically that it outperforms the state-of-the-art differentiable resampling methods on multiple filtering and parameter estimation benchmarks. Finally, we show that it achieves competitive end-to-end performance when used in learning a complex dynamics-decoder model with high-dimensional image observations.

2512.03109 2026-05-29 cs.LG cs.AI stat.AP stat.ML

E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing

E-valuator: 基于序贯假设检验的可靠智能体验证器

Shuvom Sadhuka, Drew Prinster, Clara Fannjiang, Gabriele Scalia, Bonnie Berger, Aviv Regev, Hanchen Wang

AI总结 提出E-valuator方法,将任意黑盒验证器分数转化为具有可控虚警率的决策规则,通过序贯假设检验实现对智能体轨迹的在线监控,提升统计功效并节省令牌。

详情
AI中文摘要

智能体AI系统根据用户提示执行一系列动作,如推理步骤或工具调用。为了评估其轨迹的成功性,研究人员开发了验证器(如LLM评判器和过程奖励模型)来对智能体轨迹中每个动作的质量进行评分。尽管这些启发式评分可能提供信息,但在用于决定智能体是否会产生成功输出时,无法保证正确性。在此,我们引入e-valuator,一种将任意黑盒验证器分数转化为具有可证明虚警率控制的决策规则的方法。我们将区分成功轨迹(即会导致对用户提示正确响应的动作序列)与不成功轨迹的问题构建为序贯假设检验问题。E-valuator基于e-过程工具开发了一个序贯假设检验,该检验在智能体轨迹的每一步都保持统计有效性,从而能够对任意长动作序列的智能体进行在线监控。实验表明,在六个数据集和三个智能体上,e-valuator相比其他策略提供了更高的统计功效和更好的虚警率控制。我们还展示了e-valuator可用于快速终止有问题的轨迹并节省令牌。总之,e-valuator提供了一个轻量级、模型无关的框架,将验证器启发式转化为具有统计保证的决策规则,从而支持部署更可靠的智能体系统。

英文摘要

Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have developed verifiers, such as LLM judges and process-reward models, to score the quality of each action in an agent's trajectory. Although these heuristic scores can be informative, there are no guarantees of correctness when used to decide whether an agent will yield a successful output. Here, we introduce e-valuator, a method to convert any black-box verifier score into a decision rule with provable control of false alarm rates. We frame the problem of distinguishing successful trajectories (that is, a sequence of actions that will lead to a correct response to the user's prompt) from unsuccessful trajectories as a sequential hypothesis testing problem. E-valuator builds on tools from e-processes to develop a sequential hypothesis test that remains statistically valid at every step of an agent's trajectory, enabling online monitoring of agents over arbitrarily long sequences of actions. Empirically, we demonstrate that e-valuator provides greater statistical power and better false alarm rate control than other strategies across six datasets and three agents. We additionally show that e-valuator can be used to quickly terminate problematic trajectories and save tokens. Together, e-valuator provides a lightweight, model-agnostic framework that converts verifier heuristics into decision rules with statistical guarantees, enabling the deployment of more reliable agentic systems.
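
The anytime-valid guarantee mentioned above rests on a generic mechanism that can be sketched in a few lines. Suppose each step yields an e-value, a nonnegative number whose conditional expectation under the null hypothesis is at most 1. The running product is then a nonnegative supermartingale starting at 1, and by Ville's inequality the probability that it ever reaches $1/α$ under the null is at most $α$. The sketch below implements only this thresholding rule; how e-valuator turns verifier scores into e-values, and which hypothesis it treats as the null, is not specified here, so the input list is a hypothetical placeholder.

```python
def sequential_e_test(e_values, alpha=0.05):
    """Multiply per-step e-values and flag the first step where the product reaches 1/alpha."""
    wealth = 1.0
    for step, e in enumerate(e_values, start=1):
        wealth *= e
        if wealth >= 1.0 / alpha:
            return step, wealth  # reject the null at this step
    return None, wealth  # the threshold was never reached

# Hypothetical e-values: each step doubles the evidence against the null.
stop_step, final_wealth = sequential_e_test([2.0, 2.0, 2.0, 2.0, 2.0], alpha=0.1)
```

With these inputs the product passes the threshold of 10 at the fourth step, so monitoring can stop there and the fifth action never needs to be scored, which is the token-saving behaviour the abstract describes.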

2511.12732 2026-05-29 stat.ME

Scalable and Communication-Efficient Varying Coefficient Mixed Effect Models: Methodology, Theory, and Applications

可扩展且通信高效的变系数混合效应模型:方法、理论与应用

Lida Chalangar Jalili Dehkharghani, Li-Hsiang Lin

AI总结 针对大规模分布式数据下的变系数混合模型,提出一种基于贝叶斯分层表示和充分统计量的通信高效推断框架,实现一阶有效估计,并通过SVD增强处理病态协方差,理论证明似然保持、收敛性和渐近效率,模拟和迁移数据验证了可扩展性和动态空间模式恢复。

详情
Comments
3 Figures
AI中文摘要

人类迁移表现出由环境和社会经济力量驱动的复杂时空依赖性。大规模建模此类模式需要能够容纳许多随机效应的方法,同时在原始数据或大型设计矩阵无法在分布式节点间自由共享时保持可行性。我们为变系数混合模型(VCMMs)开发了一个通信高效的推断框架,该模型具有灵活的均值结构和大型相关随机效应分量。利用惩罚样条的贝叶斯分层表示,我们推导出充分统计量,这些统计量保留每个节点的似然贡献,并在无限制通信下从完整数据中恢复估计量。在通信约束下,这些统计量支持具有一阶效率的单步通信高效估计量。SVD增强实现稳定了大型或病态随机效应协方差算子。理论建立了似然保持、收敛性、渐近效率和有限样本集中性。模拟和美国迁移流数据证明了准确性、可扩展性和动态空间模式的恢复。

英文摘要

Human migration exhibits complex spatiotemporal dependence driven by environmental and socioeconomic forces. Modeling such patterns at scale requires methods that accommodate many random effects while remaining feasible when raw data or large design matrices cannot be freely shared across distributed nodes. We develop a communication-efficient inference framework for Varying Coefficient Mixed Models (VCMMs) with flexible mean structures and large correlated random-effect components. Using a Bayesian hierarchical representation of penalized splines, we derive sufficient statistics that preserve each node's likelihood contribution and recover the estimator from the full data under unrestricted communication. Under communication constraints, these statistics support a one-step communication-efficient estimator with first-order efficiency. An SVD-enhanced implementation stabilizes large or ill-conditioned random-effect covariance operators. Theory establishes likelihood preservation, convergence, asymptotic efficiency, and finite-sample concentration. Simulations and U.S. migration-flow data demonstrate accuracy, scalability, and recovery of dynamic spatial patterns.

2510.27663 2026-05-29 eess.IV cs.LG stat.ME stat.ML

Bayesian model selection and misspecification testing in imaging inverse problems only from noisy and partial measurements

仅从噪声和部分测量中进行成像逆问题的贝叶斯模型选择与误设定检验

Tom Sprunck, Marcelo Pereyra, Tobias Liaudat

AI总结 提出一种结合贝叶斯交叉验证与数据分裂的通用方法,用于在无真实数据情况下对成像逆模型进行选择与误设定检测,兼容扩散采样器等贝叶斯成像采样器,计算成本低且准确率高。

详情
AI中文摘要

现代成像技术严重依赖贝叶斯统计模型来解决困难的图像重建和恢复任务。本文针对无真实数据的情况,研究此类模型的客观评估,重点关注模型选择和误设定诊断。现有的无监督模型评估方法通常因计算成本高且与通过机器学习模型隐式定义的现代图像先验不兼容,而不适用于计算成像。本文提出一种基于贝叶斯交叉验证与数据分裂(一种随机测量分裂技术)的新型组合方法,用于贝叶斯成像科学中的无监督模型选择和误设定检测。该方法与任何贝叶斯成像采样器兼容,包括扩散采样器和即插即用采样器。我们通过涉及多种评分规则和模型误设定类型的实验证明了该方法的有效性,在低计算成本下实现了出色的选择和检测精度。

英文摘要

Modern imaging techniques heavily rely on Bayesian statistical models to address difficult image reconstruction and restoration tasks. This paper addresses the objective evaluation of such models in settings where ground truth is unavailable, with a focus on model selection and misspecification diagnosis. Existing unsupervised model evaluation methods are often unsuitable for computational imaging due to their high computational cost and incompatibility with modern image priors defined implicitly via machine learning models. We herein propose a general methodology for unsupervised model selection and misspecification detection in Bayesian imaging sciences, based on a novel combination of Bayesian cross-validation and data fission, a randomized measurement splitting technique. The approach is compatible with any Bayesian imaging sampler, including diffusion and plug-and-play samplers. We demonstrate the methodology through experiments involving various scoring rules and types of model misspecification, where we achieve excellent selection and detection accuracy with a low computational cost.

2510.25154 2026-05-29 stat.ME stat.CO stat.ML

TabMGP: Martingale Posterior with TabPFN

TabMGP: 基于TabPFN的鞅后验

Kenyon Ng, Edwin Fong, David T. Frazier, Jeremias Knoblauch, Susan Wei

AI总结 提出TabMGP方法,利用TabPFN作为预测规则构建鞅后验,为表格数据提供具有近名义覆盖率的可信集,优于手工MGP构造和标准贝叶斯基线。

详情
Comments
Accepted at ICML 2026. Extra plots in https://drive.google.com/drive/folders/1ct_effOoTEGpiWUf0_1xI3VqLWHtJY16 . Code in https://github.com/weiyaw/tabmgp
AI中文摘要

贝叶斯推断提供了原则性的不确定性量化,但通常受限于先验和似然设定的困难。鞅后验(MGP)(Fong等人,2023)通过用预测规则替代这些要求提供了一种替代方案。此外,MGP将推断聚焦于通过损失函数定义的参数。这一框架在基础Transformer模型时代尤为契合;从业者越来越多地利用像TabPFN这样的模型以获得其最先进的能力,但通常需要对于科学待估量$θ$的认知不确定性,而该待估量不必参数化隐式潜在模型。MGP提供了一种恢复这些后验分布的机制。我们介绍了TabMGP,一种基于TabPFN构建的用于表格数据的MGP。TabMGP产生了具有近名义覆盖率的可信集,并且通常优于手工MGP构造和标准贝叶斯基线。

英文摘要

Bayesian inference provides principled uncertainty quantification but is often limited by the challenges of prior and likelihood elicitation. The martingale posterior (MGP) (Fong et al., 2023) offers an alternative by replacing these requirements with a predictive rule. In addition, the MGP focuses inference on parameters defined through a loss function. This framework is especially resonant in the era of foundation transformers; practitioners increasingly leverage models like TabPFN for their state-of-the-art capabilities, yet often require epistemic uncertainty for a scientific estimand $θ$ that need not parameterise the implicit latent model. The MGP provides a mechanism to recover these posterior distributions. We introduce TabMGP, an MGP built on TabPFN for tabular data. TabMGP produces credible sets with near-nominal coverage and often outperforms both handcrafted MGP constructions and standard Bayesian baselines.
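
The idea of replacing prior and likelihood with a predictive rule can be sketched by predictive resampling. This is a hedged illustration only: it swaps in the empirical distribution (a Pólya-urn rule) where TabMGP would use TabPFN, and the estimand is simply the mean. The function name and arguments are hypothetical.

```python
import random


def martingale_posterior_mean(data, n_forward, n_draws, rng):
    """Draws from a martingale posterior for the mean via predictive resampling."""
    draws = []
    for _ in range(n_draws):
        sample = list(data)
        for _ in range(n_forward):
            # Stand-in predictive rule: the empirical distribution of everything seen so far.
            sample.append(rng.choice(sample))
        # The estimand computed on the imputed "full" sample is one posterior draw.
        draws.append(sum(sample) / len(sample))
    return draws


rng = random.Random(1)
observed = [rng.gauss(0.0, 1.0) for _ in range(20)]
posterior = martingale_posterior_mean(observed, n_forward=300, n_draws=200, rng=rng)
```

The spread of `posterior` reflects uncertainty about the mean that comes purely from imputing unseen observations; a credible interval can be read off its empirical quantiles.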

2510.05991 2026-05-29 econ.EM math.ST stat.ME stat.TH

Robust Inference for Convex Pairwise Difference Estimators

凸成对差分估计量的稳健推断

Matias D. Cattaneo, Michael Jansson, Kenichi Nagasawa

AI总结 针对凸成对差分估计量,本文发展了分布理论和基于自助法的推断方法,通过小带宽渐近理论、广义刀切去偏和调整带宽方差扭曲的自助法,实现了在更弱带宽条件下的有效推断。

详情
AI中文摘要

本文为一类广泛的凸成对差分估计量发展了分布理论和基于自助法的推断方法。这些估计量在具有相似协变量的观测对之间最小化核加权的凸参数函数,其中相似性由局部化(带宽)参数控制。虽然经典结果在限制性带宽条件下建立了渐近正态性,但我们证明在更弱的假设下,有效的高斯和基于自助法的推断仍然是可能的。首先,我们将小带宽渐近理论扩展到凸成对差分估计设置,即使使用比标准更小的带宽也能推导出稳健的高斯近似。其次,我们采用基于广义刀切的去偏程序,使得在较大带宽下也能进行推断,同时保持目标函数的凸性。第三,我们构建了一种新颖的自助法,调整了带宽引起的方差扭曲,从而在广泛的带宽选择范围内实现有效的推断。我们提出的推断方法具有明显更强的稳健性,同时保留了凸成对差分估计量的实际吸引力。

英文摘要

This paper develops distribution theory and bootstrap-based inference methods for a broad class of convex pairwise difference estimators. These estimators minimize a kernel-weighted convex-in-parameter function over observation pairs with similar covariates, where the similarity is governed by a localization (bandwidth) parameter. While classical results establish asymptotic normality under restrictive bandwidth conditions, we show that valid Gaussian and bootstrap-based inference remains possible under substantially weaker assumptions. First, we extend the theory of small bandwidth asymptotics to convex pairwise difference estimation settings, deriving robust Gaussian approximations even when a smaller than standard bandwidth is used. Second, we employ a debiasing procedure based on generalized jackknifing to enable inference with larger bandwidths, while preserving convexity of the objective function. Third, we construct a novel bootstrap method that adjusts for bandwidth-induced variance distortions, yielding valid inference across a wide range of bandwidth choices. Our proposed inference method enjoys demonstrably greater robustness, while retaining the practical appeal of convex pairwise difference estimators.

2510.00094 2026-05-29 physics.soc-ph stat.AP

Proximity-based cities emit less mobility-driven CO$_2$

基于邻近性的城市减少出行驱动的CO$_2$排放

Francesco Marzolla, Matteo Bruno, Hygor P. M. Melo, Vittorio Loreto

AI总结 通过分析全球近400个城市的数据,发现服务设施靠近居民的区域人均交通碳排放更低,并量化了优化服务布局可实现的减排潜力。

详情
AI中文摘要

在追求更环境可持续的城市区域的过程中,提出了15分钟城市的概念,以鼓励主动出行,主要是步行和骑行。如果每个居民都能在离家15分钟步行或骑行范围内获得基本服务,则该城市区域被视为“15分钟城市”。然而,关于该模式在减少汽车使用和碳排放方面的有效性仍存在争议。在本研究中,我们进行了一项大规模数据驱动分析,以评估服务设施靠近住宅对CO$_2$排放的影响。通过检查全球近400个城市,我们发现,在同一城市内,服务设施更靠近居民的区域人均交通CO$_2$排放更低。我们为每个城市建立了服务设施邻近性与CO$_2$排放之间的明确关系。此外,我们量化了30个城市在优化服务设施位置后潜在的减排量。这种优化保持每个城市的服务设施总数不变,同时重新分布它们以确保整个城市区域的平等可达性。我们的研究结果表明,改善服务设施的邻近性可以显著减少与交通相关的预期城市排放。

英文摘要

In the quest for more environmentally sustainable urban areas, the concept of the 15-minute city has been proposed to encourage active mobility, primarily through walking and cycling. An urban area is considered a "15-minute city" if every resident can access essential services within a 15-minute walk or bike ride from their home. However, there is an ongoing debate about the effectiveness of this model in reducing car usage and carbon emissions. In this study, we conduct a large-scale data-driven analysis to evaluate the impact of service proximity to homes on CO$_2$ emissions. By examining nearly 400 cities worldwide, we discover that, within the same city, areas with services located closer to residents produce lower CO$_2$ emissions per capita from transportation. We establish a clear relationship between the proximity of services and CO$_2$ emissions for each city. Additionally, we quantify the potential reduction in emissions for 30 cities if they optimise the location of their services. This optimisation maintains each city's total number of services while redistributing them to ensure equal accessibility throughout the entire urban area. Our findings indicate that improving the proximity of services can significantly reduce expected urban emissions related to transportation.

2509.24100 2026-05-29 stat.ME cs.LG

SpeedCP: Fast Kernel-based Conditional Conformal Prediction

SpeedCP: 基于核的快速条件共形预测

Yating Liu, Yeo Jin Jung, Zixuan Wu, So Won Jeong, Claire Donnat

AI总结 提出一种基于路径追踪的高效算法,在保持RKHS条件共形预测框架理论优势的同时,将计算速度提升40倍,区间长度缩短30%。

详情
AI中文摘要

共形预测提供了具有有限样本条件保证的分布自由预测集。我们基于Gibbs等人(2023)的RKHS框架,该框架利用协变量偏移族来提供近似条件共形预测区间,具有强大的理论前景,但计算成本过高。为弥补这一差距,我们开发了一种稳定高效的算法,该算法以与单次核分位数拟合基本相同的成本计算正则化RKHS共形优化问题的完整解路径。我们的路径追踪框架同时调整超参数,提供平滑控制和数据自适应校准。为了将方法扩展到高维设置,我们进一步将我们的方法与低秩潜在嵌入相结合,在数据驱动的潜在空间中捕获条件有效性。实验上,我们的方法在各种现代黑盒预测器上提供了可靠的条件覆盖,将Gibbs等人(2023)的区间长度改善了30%,同时实现了40倍的加速。

英文摘要

Conformal prediction provides distribution-free prediction sets with finite-sample conditional guarantees. We build upon the RKHS-based framework of Gibbs et al. (2023), which leverages families of covariate shifts to provide approximate conditional conformal prediction intervals, an approach with strong theoretical promise, but with prohibitive computational cost. To bridge this gap, we develop a stable and efficient algorithm that computes the full solution path of the regularized RKHS conformal optimization problem, at essentially the same cost as a single kernel quantile fit. Our path-tracing framework simultaneously tunes hyperparameters, providing smoothness control and data-adaptive calibration. To extend the method to high-dimensional settings, we further integrate our approach with low-rank latent embeddings that capture conditional validity in a data-driven latent space. Empirically, our method provides reliable conditional coverage across a variety of modern black-box predictors, improving the interval length of Gibbs et al. (2023) by 30%, while achieving a 40-fold speedup.
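
For orientation, the simplest member of this family is split conformal prediction, sketched below. It gives only marginal coverage and is not the RKHS-based conditional method or the path-tracing algorithm of the paper; the function names are illustrative.

```python
import math


def conformal_halfwidth(calibration_residuals, alpha):
    """Finite-sample-corrected quantile of absolute residuals on a calibration set."""
    n = len(calibration_residuals)
    # Rank of the order statistic that guarantees coverage of at least 1 - alpha.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this level: the interval is the whole line.
        return math.inf
    return sorted(calibration_residuals)[rank - 1]


def prediction_interval(point_prediction, halfwidth):
    """Symmetric interval around a black-box point prediction."""
    return point_prediction - halfwidth, point_prediction + halfwidth
```

Conditional methods such as the one discussed above replace the single quantile with a function of the covariates, which is where the additional computational cost arises.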

2506.21543 2026-05-29 math.ST cs.IT math.IT math.PR stat.TH

Detecting weighted hidden cliques

检测加权隐藏团

Urmisha Chatterjee, Karissa Huang, Ritabrata Karmakar, B. R. Vinay Kumar, Gábor Lugosi, Nandan Malhotra, Anirban Mandal, Maruf Alam Tarafdar

AI总结 研究加权图中隐藏团检测问题,通过假设检验框架,在已知和部分已知分布下推导可检测性统计极限,并给出谱方法算法。

详情
Comments
Revision with organised references
AI中文摘要

我们研究了经典隐藏团问题到具有实值边权图的推广。形式上,我们定义了一个假设检验问题。在原假设下,$n$个顶点的完全图的边权独立同分布于分布$P$。在备择假设下,随机选择$k$个顶点,它们之间的边权来自分布$Q$,其余边权来自$P$。目标是观察边权后决定它们来自哪个假设。我们在两种场景下研究该问题:(1) $P$和$Q$完全已知,(2) $P$和$Q$只有部分信息。在第一种场景中,我们得到了当两个假设可区分和不可区分时$k$的统计极限。此外,在每个场景中,当$Q$关于$P$不是绝对连续时,我们给出了假设检验问题最小风险的界。我们还提供了计算高效的谱检验,只要$k=Ω(\sqrt{n})$,即可在两种场景下区分两个假设。

英文摘要

We study a generalization of the classical hidden clique problem to graphs with real-valued edge weights. Formally, we define a hypothesis testing problem. Under the null hypothesis, edges of a complete graph on $n$ vertices are associated with independent and identically distributed edge weights from a distribution $P$. Under the alternative hypothesis, $k$ vertices are chosen at random and the edge weights between them are drawn from a distribution $Q$, while the remaining edge weights are sampled from $P$. The goal is to decide, upon observing the edge weights, which of the two hypotheses they were generated from. We investigate the problem under two different scenarios: (1) when $P$ and $Q$ are completely known, and (2) when there is only partial information about $P$ and $Q$. In the first scenario, we obtain statistical limits on $k$ for when the two hypotheses are distinguishable and when they are not. Additionally, in each of the scenarios, we provide bounds on the minimal risk of the hypothesis testing problem when $Q$ is not absolutely continuous with respect to $P$. We also provide computationally efficient spectral tests that can distinguish the two hypotheses as long as $k=Ω(\sqrt{n})$ in both scenarios.
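
A spectral test of the kind described can be sketched as follows. The sketch assumes, purely for illustration, that $P$ is standard normal and $Q$ is normal with a shifted mean, and it uses power iteration to approximate the largest absolute eigenvalue of the weight matrix; the paper's actual test statistic and thresholds may differ, and all names here are hypothetical.

```python
import math
import random


def sample_weights(n, clique, shift, rng):
    """Symmetric weight matrix: N(0, 1) edges, with mean `shift` inside the clique."""
    members = set(clique)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = rng.gauss(0.0, 1.0)
            if i in members and j in members:
                value += shift
            w[i][j] = w[j][i] = value
    return w


def spectral_statistic(weights, n_iter, rng):
    """Approximate the largest absolute eigenvalue by power iteration."""
    n = len(weights)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    value = 0.0
    for _ in range(n_iter):
        w = [sum(row[j] * v[j] for j in range(n)) for row in weights]
        # v has unit norm, so v . (A v) is the Rayleigh quotient.
        value = abs(sum(wi * vi for wi, vi in zip(w, v)))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return value


rng = random.Random(1)
null_stat = spectral_statistic(sample_weights(60, [], 3.0, rng), 100, rng)
planted_stat = spectral_statistic(sample_weights(60, range(20), 3.0, rng), 100, rng)
```

Under the null the statistic stays near $2\sqrt{n}$, while a planted clique of size $k$ with mean shift $\mu$ pushes it towards roughly $k\mu$, so comparing the statistic with a threshold between the two separates the hypotheses in this toy setting.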

2505.13745 2026-05-29 cs.LG stat.ML

Synthetic Non-stationary Data Streams for Recognition of the Unknown

用于未知识别的合成非平稳数据流

Joanna Komorniczak

AI总结 提出一种同时包含概念漂移和新类出现的合成数据流生成策略,并评估无监督漂移检测器在开放集识别任务中的表现。

详情
AI中文摘要

数据非平稳性问题在数据流处理中常被讨论。在动态环境中,方法应持续准备分析时变数据——因此,它们应支持增量训练并应对概念漂移。非平稳数据流环境中另一个同样重要的变化是新的、先前未知类别的出现。通常,方法专注于这两种现象之一——检测概念漂移或检测新类别——而数据流中可能同时出现这两种困难。此外,关于先前未知的观测,开放类别集的话题近年来变得尤为重要,方法的目标是在已知类别内高效分类,并识别模型能力范围外的对象。本文提出一种合成数据流生成策略,其中同时出现概念漂移和代表未知对象的新类别。所呈现的研究展示了无监督漂移检测器如何处理检测新类别和概念漂移的任务,并演示了生成的数据流如何用于开放集识别任务。

英文摘要

The problem of data non-stationarity is commonly addressed in data stream processing. In a dynamic environment, methods should continuously be ready to analyze time-varying data -- hence, they should enable incremental training and respond to concept drifts. An equally important variability typical for non-stationary data stream environments is the emergence of new, previously unknown classes. Often, methods focus on one of these two phenomena -- detection of concept drifts or detection of novel classes -- while both difficulties can be observed in data streams. Additionally, concerning previously unknown observations, the topic of open set of classes has become particularly important in recent years, where the goal of methods is to efficiently classify within known classes and recognize objects outside the model competence. This article presents a strategy for synthetic data stream generation in which both concept drifts and the emergence of new classes representing unknown objects occur. The presented research shows how unsupervised drift detectors address the task of detecting novelty and concept drifts and demonstrates how the generated data streams can be utilized in the open set recognition task.

2505.07989 2026-05-29 stat.ME econ.EM stat.CO

rd2d: Causal Inference in Boundary Discontinuity Designs

rd2d:边界断点设计中的因果推断

Matias D. Cattaneo, Rocio Titiunik, Ruiqi Rae Yu

AI总结 本文介绍rd2d软件包,用于边界断点设计中基于局部多项式估计的因果效应推断,支持双变量得分或单变量符号距离得分,并提供带宽选择、偏差校正、置信带等功能。

详情
AI中文摘要

边界断点(BD)设计用于实证研究,以了解由双变量得分定义的连续分配边界上的因果处理效应。这些设计也称为多得分断点回归(RD)设计,其中地理RD设计是一个突出的例子。本文介绍了rd2d,一个用于R、Python和Stata的统计软件包,该软件包使用双变量得分或单变量符号距离边界得分实现BD设计的局部多项式估计和推断。该软件涵盖精确和模糊BD设计,提供自动带宽选择、稳健偏差校正逐点推断、一致置信带、联合或单独拟合约定的聚类稳健推断、协变量调整效率改进、质量点检查和协方差正则化等功能。我们通过一个应用于机会区的实证例子来说明该软件包,在该区域中,资格对指定有强烈的第一阶段效应,但对早期工作场所就业增长没有显著影响。

英文摘要

Boundary Discontinuity (BD) designs are used in empirical research to learn about causal treatment effects along a continuous assignment boundary defined by a bivariate score. These designs are also known as multi-score regression discontinuity (RD) designs, and include geographic RD designs as a prominent example. This article introduces rd2d, a statistical software package for R, Python, and Stata that implements local polynomial estimation and inference for BD designs using either the bivariate score or a univariate signed distance-to-boundary score. The software covers sharp and fuzzy BD designs, providing automatic bandwidth selection, robust bias-corrected pointwise inference, uniform confidence bands, cluster-robust inference with joint or separate fitting conventions, covariate-adjusted efficiency improvements, mass-point checks, and covariance regularization, among other features. We illustrate the package with an empirical application to Opportunity Zones, where eligibility has a strong first-stage effect on designation but no significant effects on early workplace-job growth.

2505.02069 2026-05-29 cs.LG stat.ML

Neural Logistic Bandits

神经逻辑老虎机

Seoungbin Bae, Dabeen Lee

AI总结 针对神经逻辑老虎机问题,利用一种新型的自归一化向量值鞅的Bernstein型不等式,提出两种算法NeuralLog-UCB-1和NeuralLog-UCB-2,分别实现与有效维度相关的遗憾上界,改进了现有结果。

详情
AI中文摘要

我们研究了神经逻辑老虎机问题,其主要任务是通过神经网络学习逻辑链接函数内的未知奖励函数。现有方法要么对$κ$(其中$1/κ$表示奖励分布的最小方差)有不利的依赖,要么直接依赖于特征维度$d$,而在基于神经网络的设置中$d$可能非常大。在这项工作中,我们引入了一种新型的自归一化向量值鞅的Bernstein型不等式,旨在绕过对环境维度的直接依赖。这使我们能够推导出一个遗憾上界,该上界随有效维度$\widetilde{d}$增长,而不是特征维度,同时保持对$κ$的最小依赖。基于该集中不等式,我们提出了两种算法NeuralLog-UCB-1和NeuralLog-UCB-2,它们分别保证了$\widetilde{O}(\widetilde{d}\sqrt{κT})$和$\widetilde{O}(\widetilde{d}\sqrt{T/κ})$阶的遗憾上界,改进了现有结果。最后,我们在合成数据集和真实数据集上报告了数值结果,以验证我们的理论发现。

英文摘要

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $κ$, where $1/κ$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $κ$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{κT})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/κ})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.

2503.15287 2026-05-29 stat.CO cs.DC

Distributed Generalized Linear Models: A Privacy-Preserving Approach

分布式广义线性模型:一种隐私保护方法

Daniel Tinoco, Raquel Menezes, Carlos Baquero

AI总结 提出一种隐私保护的分布式广义线性模型方法,通过数据流或分布式计算实现模型训练,并扩展到GLM框架,数值实验验证了其在联邦环境中的有效性。

详情
Comments
Total PDF pages: 23 Figures: 7
AI中文摘要

本文提出了一种新颖的经典线性回归方法,能够在数据流或分布式环境中进行模型计算,同时保护联邦环境中的数据隐私。我们将该框架扩展到广义线性模型(GLM),确保可扩展性和对不同数据分布的适应性,同时保持隐私保护特性。为了评估我们方法的有效性,我们在模拟和真实数据集上进行了数值研究,并将我们的方法与使用迭代重加权最小二乘法的GLM传统最大似然估计进行了比较。我们的结果证明了所提方法在分布式和联邦环境中的优势。

英文摘要

This paper presents a novel approach to classical linear regression, enabling model computation from data streams or in a distributed setting while preserving data privacy in federated environments. We extend this framework to generalized linear models (GLMs), ensuring scalability and adaptability to diverse data distributions while maintaining privacy-preserving properties. To assess the effectiveness of our approach, we conduct numerical studies on both simulated and real datasets, comparing our method with conventional maximum likelihood estimation for GLMs using iteratively reweighted least squares. Our results demonstrate the advantages of the proposed method in distributed and federated settings.
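
One standard way to fit a linear regression without pooling raw data is to share only the per-node sufficient statistics $X^\top X$ and $X^\top y$. The sketch below shows that textbook construction; it is an assumption that the paper builds on something similar, and the helper names are hypothetical.

```python
def node_summary(rows, targets):
    """Return (X^T X, X^T y) for one node; the raw rows never leave the node."""
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * t for r, t in zip(rows, targets)) for a in range(p)]
    return xtx, xty


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def pooled_coefficients(summaries):
    """Add the node summaries and solve the normal equations once."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][a][b] for s in summaries) for b in range(p)] for a in range(p)]
    xty = [sum(s[1][a] for s in summaries) for a in range(p)]
    return solve(xtx, xty)


# Two nodes holding points on the line y = 1 + 2x (first column is the intercept).
node_a = node_summary([[1.0, 0.0], [1.0, 1.0]], [1.0, 3.0])
node_b = node_summary([[1.0, 2.0], [1.0, 3.0]], [5.0, 7.0])
beta = pooled_coefficients([node_a, node_b])
```

Because the summaries add across nodes, the result matches the fit on the pooled data, and the same accumulation works for a data stream processed in batches. Extending this to GLMs requires iterating such a step, for instance inside iteratively reweighted least squares.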

2410.23222 2026-05-29 cs.LG cs.AI stat.ML

Dataset-Driven Channel Masks in Transformers for Multivariate Time Series

数据集驱动的Transformer通道掩码用于多变量时间序列

Seunghan Lee, Taeyoung Park, Kibok Lee

AI总结 提出部分通道依赖(PCD)概念,通过数据集特定的通道掩码(CMs)改进Transformer中的通道依赖建模,并在多种任务和数据集上验证有效性。

详情
Comments
ICASSP 2026. Preliminary version: NeurIPS Workshop on Time Series in the Age of Large Models 2024 (Oral presentation)
AI中文摘要

最近基础模型的进展已成功扩展到时间序列(TS)领域,这得益于大规模TS数据集的出现。然而,先前的努力主要集中于捕获通道依赖(CD),这对于建模多变量时间序列至关重要,并且基于注意力的方法已被广泛用于此目的。尽管如此,这些方法主要关注修改架构,往往忽略了数据集特定特征的重要性。在这项工作中,我们引入了部分通道依赖(PCD)的概念,通过利用数据集特定信息来增强基于Transformer的模型中的CD建模,从而细化模型捕获的CD。为了实现PCD,我们提出了通道掩码(CMs),通过逐元素乘法将其集成到Transformer的注意力矩阵中。CMs由两个组件组成:1)捕获通道之间关系的相似性矩阵,以及2)数据集特定且可学习的领域参数,用于细化相似性矩阵。我们在多种任务和数据集上使用不同的骨干网络验证了PCD的有效性。代码可在此存储库获取:https://github.com/YonseiML/pcd。

英文摘要

Recent advancements in foundation models have been successfully extended to the time series (TS) domain, facilitated by the emergence of large-scale TS datasets. However, previous efforts have primarily focused on capturing channel dependency (CD), which is essential for modeling multivariate time series, and attention-based methods have been widely employed for this purpose. Nonetheless, these methods primarily focus on modifying the architecture, often neglecting the importance of dataset-specific characteristics. In this work, we introduce the concept of partial channel dependence (PCD) to enhance CD modeling in Transformer-based models by leveraging dataset-specific information to refine the CD captured by the model. To achieve PCD, we propose channel masks (CMs), which are integrated into the attention matrices of Transformers via element-wise multiplication. CMs consist of two components: 1) a similarity matrix that captures relationships between the channels, and 2) dataset-specific and learnable domain parameters that refine the similarity matrix. We validate the effectiveness of PCD across diverse tasks and datasets with various backbones. Code is available at this repository: https://github.com/YonseiML/pcd.
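
The element-wise integration of a channel mask into an attention matrix can be sketched as below. The mask is taken as given, since the abstract does not say how the similarity matrix and the domain parameters are combined, and whether the rows are renormalized afterwards is also left open; the function names are illustrative and not from the linked repository.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_channel_attention(scores, channel_mask):
    """Channel-to-channel attention weights multiplied element-wise by a mask in [0, 1]."""
    attention = [softmax(row) for row in scores]
    return [
        [a * m for a, m in zip(a_row, m_row)]
        for a_row, m_row in zip(attention, channel_mask)
    ]
```

A mask of all ones leaves the attention unchanged (full channel dependence), an identity mask removes all cross-channel attention (channel independence), and values in between give the partial dependence the abstract describes.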

2409.06439 2026-05-29 cs.LG stat.CO stat.ML

Extending Explainable Ensemble Trees (E2Tree) to regression contexts

将可解释集成树(E2Tree)扩展到回归场景

Massimo Aria, Agostino Gnasso, Carmela Iorio, Marjolein Fokkema

AI总结 本文通过引入新的不相似度度量,将可解释集成树方法从分类扩展到回归,并在真实数据集上验证其解释能力。

详情
Journal ref
Applied Stochastic Models in Business and Industry, Vol. 42, No. 1, e70064 (2026)
AI中文摘要

集成方法如随机森林通过聚合多个弱学习器提供了高精度的预测,改变了监督学习的格局。然而,尽管它们有效,这些方法往往缺乏透明度,阻碍了用户理解随机森林模型如何得出预测。可解释集成树(E2Tree)是一种解释随机森林的新方法,提供了响应变量与预测变量之间关系的图形表示。E2Tree的一个显著特点是它不仅考虑预测变量对响应的影响,还通过计算和使用不相似度度量来考虑预测变量之间的关联。E2Tree方法最初是为分类任务提出的。在本文中,我们将该方法扩展到回归场景。为了展示所提算法的解释能力,我们在真实数据集上进行了演示。

英文摘要

Ensemble methods such as random forests have transformed the landscape of supervised learning, offering highly accurate prediction through the aggregation of multiple weak learners. However, despite their effectiveness, these methods often lack transparency, impeding users' comprehension of how random forest (RF) models arrive at their predictions. Explainable ensemble trees (E2Tree) is a novel methodology for explaining random forests that provides a graphical representation of the relationship between response variables and predictors. A striking characteristic of E2Tree is that it not only accounts for the effects of predictor variables on the response but also accounts for associations between the predictor variables through the computation and use of dissimilarity measures. The E2Tree methodology was initially proposed for use in classification tasks. In this paper, we extend the methodology to encompass regression contexts. To demonstrate the explanatory power of the proposed algorithm, we illustrate its use on real-world datasets.

2408.13596 2026-05-29 stat.ME stat.CO

Robust Principal Components by Casewise and Cellwise Weighting

通过案例加权和单元加权实现稳健主成分

Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

AI总结 提出 cellPCA 方法,通过结合两种稳健损失函数和迭代重加权最小二乘算法,同时处理案例异常值、单元异常值和缺失数据,实现稳健的主成分分析。

详情
AI中文摘要

主成分分析(PCA)是分析多元数据的基本工具。这里关注的是降维到主子空间,其特征由投影矩阵表示。经典的主子空间可能受到异常值的强烈影响。传统的稳健方法考虑案例异常值,即由与干净案例不同的未指定异常分布生成的案例。但也可能存在单元异常值,即可能出现在数据矩阵中任何位置的可疑条目。另一个常见问题是某些单元格可能缺失。本文提出了一种新的稳健PCA方法,称为cellPCA,它可以同时处理案例异常值、单元异常值和缺失单元格。其单一目标函数结合了两个稳健损失函数,共同减轻案例和单元异常值的影响。目标函数通过迭代重加权最小二乘(IRLS)算法最小化。提出了残差单元格图和增强异常值图用于异常值检测。推导了主子空间的案例和单元影响函数,并得到了其渐近分布。广泛的模拟和两个真实数据示例说明了cellPCA的性能。

英文摘要

Principal component analysis (PCA) is a fundamental tool for analyzing multivariate data. Here the focus is on dimension reduction to the principal subspace, characterized by its projection matrix. The classical principal subspace can be strongly affected by the presence of outliers. Traditional robust approaches consider casewise outliers, that is, cases generated by an unspecified outlier distribution that differs from that of the clean cases. But there may also be cellwise outliers, which are suspicious entries that can occur anywhere in the data matrix. Another common issue is that some cells may be missing. This paper proposes a new robust PCA method, called cellPCA, that can simultaneously deal with casewise outliers, cellwise outliers, and missing cells. Its single objective function combines two robust loss functions that together mitigate the effect of casewise and cellwise outliers. The objective function is minimized by an iteratively reweighted least squares (IRLS) algorithm. Residual cellmaps and enhanced outlier maps are proposed for outlier detection. The casewise and cellwise influence functions of the principal subspace are derived, and its asymptotic distribution is obtained. Extensive simulations and two real data examples illustrate the performance of cellPCA.

2406.18509 2026-05-29 math.ST math.PR stat.TH

Normal integral representation for the joint survival function of the cumulative sums of the components of multinomial random vectors

多项随机向量分量累积和的联合生存函数的正态积分表示

Frédéric Ouimet

AI总结 本文通过将多项随机向量分量累积和的联合生存函数与Dirichlet概率联系起来,推导出其正态积分表示,为多元KMT逼近提供潜在工具。

详情
Comments
15 pages, 0 figures, 4 tables
AI中文摘要

本文给出了任意多项随机向量在内部格点处分量累积和的联合生存函数的多元正态积分表示。该结果可视为Carter和Pollard (2004)中方程(7)的多元类比,其证明从二项生存概率的Beta积分表示出发,并利用Laplace方法改进了Tusnády不等式。我们的发现基于任意多项随机向量分量累积和的联合生存函数与对应累积和区域上的Dirichlet概率之间的关键关系。主要动机是这种显式公式最终可能有助于简化Einmahl (1989)多元KMT逼近中使用的条件分位数变换论证,这一联系留待未来工作。我们对d=2,3,4,5的情况进行了数值验证。

英文摘要

This paper presents a multivariate normal integral representation for the joint survival function of the cumulative sums of the components of any multinomial random vector at interior lattice points. This result can be viewed as a multivariate analog of Equation (7) in Carter and Pollard (2004), whose proof starts from the beta integral representation of binomial survival probabilities and uses Laplace's method to improve Tusnády's inequality. Our findings are based on a crucial relationship between the joint survival function of the cumulative sums of the components of any multinomial random vector and a Dirichlet probability over a corresponding cumulative-sum region. The main motivation is that such an explicit formula may eventually help streamline the conditional quantile-transformation arguments used in the multivariate KMT approximation of Einmahl (1989), a connection left for future work. We provide numerical checks of the identity for $d = 2,3,4,5$.
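
The univariate starting point mentioned above, the beta integral representation of a binomial survival probability, is $P(\mathrm{Bin}(n,p) \ge k) = \frac{n!}{(k-1)!\,(n-k)!} \int_0^p t^{k-1}(1-t)^{n-k}\,dt$ for $1 \le k \le n$. The following sketch checks this classical identity numerically with Simpson's rule; it does not implement the paper's multivariate result, and the function names are illustrative.

```python
import math


def binomial_survival(n, p, k):
    """P(Bin(n, p) >= k) by direct summation of the probability mass function."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def beta_integral(n, p, k, steps=2000):
    """k * C(n, k) * integral_0^p t^(k-1) (1-t)^(n-k) dt, by Simpson's rule (steps even)."""
    h = p / steps

    def integrand(t):
        return t ** (k - 1) * (1 - t) ** (n - k)

    total = integrand(0.0) + integrand(p)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    # n! / ((k-1)! (n-k)!) equals k * C(n, k).
    return k * math.comb(n, k) * total * h / 3
```

Calling both functions with, say, `n=10`, `p=0.3`, `k=4` should give values that agree up to the quadrature error.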

2406.15844 2026-05-29 stat.ME stat.AP

Bayesian modeling of multi-species labeling errors in ecological studies

生态研究中多物种标记错误的贝叶斯建模

Haoxuan Wang, Patrik Lauha, David B. Dunson

AI总结 提出贝叶斯分层模型,结合稀疏专家标注以改进鸟类物种分类并提供不确定性量化,同时评估专家表现。

详情
AI中文摘要

生态和保护研究监测鸟类群落通常依赖于基于鸟类鸣声的物种分类。历史上,这依赖于专家志愿者进入野外并列出他们观察到的鸟类物种。最近,机器学习算法已经出现,可以根据音频记录准确分类鸟类物种。这类算法关键依赖于专家标记的训练数据。当多个物种同时发声、存在背景噪声和/或鸟类远离麦克风时,自动分类具有挑战性。在连续监测不同地点时,音频数据量变得巨大,人类专家只能标记可用数据的一小部分。此外,专家在不同物种的准确性和知识广度上可能有所不同。本文关注结合稀疏专家标注以改进鸟类物种分类同时提供不确定性量化的重要问题。我们还感兴趣于提供专家表现评分以增加他们的参与度并鼓励改进。我们提出了一种贝叶斯分层建模方法,并在芬兰开发的一个新的社区科学平台上评估了该方法。

英文摘要

Ecological and conservation studies monitoring bird communities typically rely on species classification based on bird vocalizations. Historically, this has been based on expert volunteers going into the field and making lists of the bird species that they observe. Recently, machine learning algorithms have emerged that can accurately classify bird species based on audio recordings of their vocalizations. Such algorithms crucially rely on training data that are labeled by experts. Automated classification is challenging when multiple species are vocalizing simultaneously, there is background noise, and/or the bird is far from the microphone. In continuously monitoring different locations, the size of the audio data becomes immense and it is only possible for human experts to label a tiny proportion of the available data. In addition, experts can vary in their accuracy and breadth of knowledge about different species. This article focuses on the important problem of combining sparse expert annotations to improve bird species classification while providing uncertainty quantification. We are additionally interested in providing expert performance scores to increase their engagement and encourage improvements. We propose a Bayesian hierarchical modeling approach and evaluate this approach on a new community science platform developed in Finland.

2305.16842 2026-05-29 q-fin.ST stat.AP stat.ME

Accounting statement analysis at industry level. A gentle introduction to the compositional approach

行业层面的会计报表分析:组合方法的温和介绍

Germà Coenders, Núria Arimany Serrat

AI总结 本文介绍组合数据分析方法在行业层面财务报表分析中的应用,通过几何均值计算行业财务比率均值,利用组合主成分分析、聚类分析和回归模型进行可视化与建模,并以西班牙酒庄为例演示杜邦分析。

详情
AI中文摘要

组合数据当前被定义为正向量,其元素之间的比率是研究者感兴趣的。通过会计比率(即财务比率)进行的财务报表分析完全符合这一定义。组合数据分析解决了行业层面标准财务比率统计分析中的主要问题,如偏态、非正态性、非线性、异常值以及结果对选择哪个会计数字作为比率分子和分母的依赖性。尽管如此,组合方法在财务报表分析中的应用仍然很少。在本文中,我们介绍组合数据分析中一些对财务报表分析特别有用的变换。我们展示了如何从组合角度通过几何均值计算行业或子行业的标准财务比率均值。我们展示了如何使用组合主成分分析双标图可视化行业中的公司;如何使用组合聚类分析将它们分类为同质的财务绩效概况;以及如何将财务比率作为变量引入统计模型,例如通过组合回归模型关联财务绩效和公司特征。我们通过杜邦分析分解净资产收益率,展示了在西班牙酒庄的会计报表中的应用,并提供了组合免费软件CoDaPack的逐步教程。

英文摘要

Compositional data are currently defined as positive vectors, the ratios among whose elements are of interest to the researcher. Financial statement analysis by means of accounting ratios, a.k.a. financial ratios, fulfils this definition to the letter. Compositional data analysis solves the major problems in statistical analysis of standard financial ratios at industry level, such as skewness, non-normality, non-linearity, outliers, and dependence of the results on the choice of which accounting figure goes to the numerator and to the denominator of the ratio. Despite this, compositional applications to financial statement analysis are still rare. In this article, we present some transformations within compositional data analysis that are particularly useful for financial statement analysis. We show how to compute industry or sub-industry means of standard financial ratios from a compositional perspective by means of geometric means. We show how to visualise firms in an industry with a compositional principal-component-analysis biplot; how to classify them into homogeneous financial performance profiles with compositional cluster analysis; and how to introduce financial ratios as variables in a statistical model, for instance to relate financial performance and firm characteristics with compositional regression models. We show an application to the accounting statements of Spanish wineries using the decomposition of return on equity by means of DuPont analysis, and a step-by-step tutorial to the compositional freeware CoDaPack.
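
The point about numerator and denominator choice can be made concrete with a small sketch. The figures below are made up for illustration and the function names are hypothetical; the sketch only shows why an industry mean built from geometric means is consistent under inversion of the ratio, while the arithmetic mean of ratios is not.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def industry_ratio(numerators, denominators):
    """Industry-level ratio as a ratio of geometric means across firms."""
    return geometric_mean(numerators) / geometric_mean(denominators)


# Hypothetical accounting figures for three firms.
assets = [100.0, 400.0, 50.0]
liabilities = [50.0, 100.0, 40.0]

compositional = industry_ratio(assets, liabilities) * industry_ratio(liabilities, assets)
arithmetic = (
    sum(a / l for a, l in zip(assets, liabilities)) / 3
    * sum(l / a for a, l in zip(assets, liabilities)) / 3
)
```

Here `compositional` equals 1 up to rounding, so the mean of assets over liabilities is exactly the inverse of the mean of liabilities over assets, whereas `arithmetic` exceeds 1 and the two arithmetic means contradict each other.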

2303.09644 2026-05-29 math.ST stat.TH

Testing the goodness-of-fit of a functional autoregressive model

检验函数自回归模型的拟合优度

W. González-Manteiga, M. D. Ruiz-Medina, M. Febrero-Bande

AI总结 针对函数型时间序列中的线性自相关模型,提出基于经验过程的拟合优度检验,推导了经验过程收敛到时间变换的Wiener过程的函数中心极限定理,并验证了检验在简单和复合原假设下的大样本性质及一致性。

详情
AI中文摘要

提出的用于检验函数型时间序列中线性自相关模型的拟合优度检验基于一个经验过程,其残差标记和协变量索引集位于可分离希尔伯特空间 $\mathbb{H}$ 中。推导了函数中心极限定理,给出了经验过程向取值于可分离希尔伯特空间 $\mathbb{H}$ 的时间变换Wiener过程的收敛,其中从属过程由所涉及的严格平稳自回归希尔伯特过程(AR$\mathbb{H}$(1) 过程)的边缘概率给出。在简单和复合原假设下获得了检验统计量的大样本行为。在简单原假设下讨论了检验的一致性。附录中说明了在不同备择族和随机投影方案下检验过程的有限样本性能。

英文摘要

The proposed goodness-of-fit (GoF) test for checking the linear autocorrelation model in a functional time series is based on an empirical process, whose residual marks and covariate index set are in a separable Hilbert space $\mathbb{H}$. A functional central limit theorem is derived providing the convergence of the empirical process to a time-changed Wiener process evaluated in a separable Hilbert space $\mathbb{H}$, with subordinator given by the marginal probability of the involved strictly stationary autoregressive Hilbertian process (AR$\mathbb{H}$(1) process). The large sample behavior of the test statistics is obtained under simple and composite null hypotheses. The consistency of the test is addressed under the simple null hypothesis. The finite-sample performance of the testing procedure, under different families of alternatives and random projection schemes, is illustrated in the Appendix.

2212.12435 2026-05-29 stat.ME math.ST stat.TH

Second-level global sensitivity analysis of numerical simulators with application to an accident scenario in a sodium-cooled fast reactor

数值模拟器的二级全局灵敏度分析及其在钠冷快堆事故场景中的应用

Anouar Meynaoui, Amandine Marrel, Béatrice Laurent

AI总结 针对输入分布不确定对全局灵敏度分析结果的影响,提出基于加权HSIC估计量的单环蒙特卡洛方法,实现计算预算有限的二级全局灵敏度分析,并应用于核反应堆严重事故场景。

详情
Comments
This work was intended as a replacement of arXiv:1902.07030 and any subsequent updates will appear there
AI中文摘要

数值模拟器广泛用于模拟物理现象,全局灵敏度分析(GSA)旨在研究输入不确定性对模拟器输出的全局影响。为了进行GSA,通常使用基于输入/输出依赖度量的统计工具。我们关注希尔伯特-施密特独立性准则(HSIC)。有时,建模输入不确定性的概率分布本身可能是不确定的,量化其对GSA结果的影响非常重要。我们将其称为二级全局灵敏度分析(GSA2)。然而,当使用蒙特卡洛双环方法进行GSA2时,需要大量的模型评估,这对于CPU时间昂贵的模拟器来说是难以处理的。为了解决这一限制,我们提出了一种基于蒙特卡洛单环且计算预算有限的新统计方法。首先,我们从精心选择的输入概率分布中构建一个唯一的输入和模拟器输出样本。从该样本中,我们通过使用加权HSIC度量估计器,对各种假设的输入概率分布进行GSA。证明了这些加权估计器的统计性质。随后,我们定义了输入分布与GSA结果之间的基于HSIC的二级度量,即GSA2指标。通过一个解析示例说明了我们的GSA2方法的效率,从而比较了几种技术选项。最后,将其应用于模拟核反应堆严重事故场景的测试案例。

英文摘要

Numerical simulators are widely used to model physical phenomena, and global sensitivity analysis (GSA) aims at studying the global impact of the input uncertainties on the simulator output. To perform GSA, statistical tools based on input/output dependence measures are commonly used. We focus here on the Hilbert-Schmidt independence criterion (HSIC). Sometimes, the probability distributions modeling the uncertainty of the inputs may themselves be uncertain, and it is important to quantify their impact on GSA results. We call this second-level global sensitivity analysis (GSA2). However, GSA2, when performed with a Monte Carlo double loop, requires a large number of model evaluations, which is intractable for simulators that are expensive in CPU time. To cope with this limitation, we propose a new statistical methodology based on a Monte Carlo single loop with a limited calculation budget. First, we build a unique sample of inputs and simulator outputs from a well-chosen probability distribution of the inputs. From this sample, we perform GSA for various assumed probability distributions of the inputs by using weighted estimators of the HSIC measures. Statistical properties of these weighted estimators are demonstrated. Subsequently, we define second-level HSIC-based measures between the distributions of inputs and GSA results, which constitute GSA2 indices. The efficiency of our GSA2 methodology is illustrated on an analytical example, thereby comparing several technical options. Finally, an application to a test case simulating a severe accident scenario in a nuclear reactor is provided.
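
For readers unfamiliar with HSIC, the standard unweighted V-statistic estimator, $\frac{1}{n^2}\operatorname{tr}(KHLH)$ with Gram matrices $K$, $L$ and centering matrix $H$, can be sketched as follows. This assumes scalar inputs and Gaussian kernels with a fixed bandwidth; the weighted estimators introduced in the paper are not reproduced here, and the function names are illustrative.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix of the Gaussian kernel for scalar observations."""
    return [
        [math.exp(-((a - b) ** 2) / (2 * bandwidth**2)) for b in values]
        for a in values
    ]


def center(gram):
    """Double-center a symmetric Gram matrix, i.e. compute H K H."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
        for i in range(n)
    ]


def hsic(x, y, bandwidth=1.0):
    """Biased (V-statistic) HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    k_centered = center(gaussian_gram(x, bandwidth))
    l_gram = gaussian_gram(y, bandwidth)
    # trace(HKH L) equals the sum of element-wise products because L is symmetric.
    return sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n)) / n**2


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(150)]
noise = [rng.gauss(0.0, 1.0) for _ in range(150)]
dependent = hsic(x, x)
independent = hsic(x, noise)
```

The estimate is close to zero for independent samples and clearly positive for dependent ones, which is what makes it usable as a sensitivity index between an input and the simulator output.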

2605.29580 2026-05-29 cs.LG stat.ML

On the Construction and Implications of Low-Loss Valleys in LoRA-based Bayesian Inference

基于LoRA的贝叶斯推理中低损失谷的构造与启示

Daniel Dold, Emanuel Sommer, Julius Kobialka, Oliver Dürr, David Rügamer

AI总结 本文提出LoRA-Curve方法,通过分段贝塞尔曲线参数化在LoRA空间中连接独立最优解,形成连续低损失谷,并结合平坦极小扰动和JS散度正则化,在不牺牲性能的前提下提高预测分布的互信息,实现功能多样性。

详情
AI中文摘要

虽然低秩适应(LoRA)等参数高效微调方法已成为大型语言模型的标准方法,但对认知不确定性的原则性估计仍然具有挑战性。最近在LoRA机制下的结果表明,深度集成等离散多模态方法相比单模态方法几乎没有优势。这与深度学习中的更广泛观察相矛盾,在深度学习中,集成独立最优解通常能改善泛化,而通过连续低损失谷连接这些模态能进一步增强贝叶斯模型平均(BMA)。LoRA空间中是否存在这种结构,以及它是否能产生局部或离散方法所遗漏的功能多样性,尚未被研究。我们引入了LoRA-Curve,一种在LoRA空间中的分段贝塞尔曲线参数化,包含两种变体:一种自由配置,联合优化所有控制点;另一种锚定配置,连接独立微调的LoRA最优解。我们证明了损失沿曲线的路径连续性和Lipschitz正则性,并通过Qwen2.5 7B在推理和分类基准上的实验表明,线性插值会遇到损失障碍,而我们的锚定多段曲线通过连续低损失谷连接独立最优解。结合平坦极小扰动和詹森-香农散度正则化,LoRA-Curve在不牺牲性能的情况下,可测量地提高了预测分布的互信息,并将连续参数空间遍历与功能多样性联系起来。

英文摘要

While parameter-efficient fine-tuning methods like low-rank adaptation (LoRA) are standard for large language models, principled estimation of epistemic uncertainty remains challenging. Recent results in the LoRA regime suggest that discrete multi-mode approaches such as deep ensembles offer little benefit over single-mode methods. This contradicts broader observations in deep learning, where ensembling independent optima typically improves generalization, and linking these modes through continuous low-loss valleys further enhances Bayesian model averaging (BMA). Whether such structure exists in the LoRA space and whether it yields functional diversity missed by local or discrete methods has not been studied. We introduce LoRA-Curve, a segmented Bézier curve parameterization in the LoRA space, with two variants: a free configuration that jointly optimizes all control points, and an anchored configuration that connects independently fine-tuned LoRA optima. We prove pathwise continuity and Lipschitz regularity of the loss along the curve and empirically show, across reasoning and classification benchmarks with Qwen2.5 7B, that linear interpolation encounters loss barriers, while our anchored multi-segment curves connect independent optima through continuous low-loss valleys. Combined with flat-minima perturbations and a Jensen-Shannon divergence regularizer, LoRA-Curve yields measurably higher mutual information of the predictive distribution without sacrificing performance, and links continuous parameter-space traversal to functional diversity.
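
A single segment of such a curve can be sketched as a quadratic Bézier curve between two parameter vectors. This is an illustration only: it assumes flattened parameter vectors and one control point per segment, which may differ from the paper's parameterization, and the function name is hypothetical.

```python
def bezier_point(start, control, end, t):
    """Point at position t in [0, 1] on a quadratic Bezier curve in parameter space."""
    return [
        (1 - t) ** 2 * a + 2 * t * (1 - t) * c + t**2 * b
        for a, c, b in zip(start, control, end)
    ]


# Two hypothetical fine-tuned optima (anchors) and a control point between them.
anchor_a = [0.0, 1.0, -2.0]
anchor_b = [4.0, 3.0, 2.0]
midpoint = [(a + b) / 2 for a, b in zip(anchor_a, anchor_b)]
quarter = bezier_point(anchor_a, midpoint, anchor_b, 0.25)
```

With the control point at the midpoint the curve reduces to linear interpolation, which is the baseline reported to hit loss barriers. In the anchored configuration the endpoints stay fixed and only the control point would be trained to keep the loss low along the path.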

2605.29541 2026-05-29 stat.ME q-fin.ST

Change-point estimation for Weibull time series with copula-based Markov models

基于Copula马尔可夫模型的威布尔时间序列变点估计

Li-Hsien Sun, Zong-Yuan Huang, Yi-Ling Huang, Chi-Yang Chiu, Ning Ning

AI总结 针对具有非线性序列依赖的时间序列,提出基于Copula的马尔可夫链模型(威布尔边缘分布),通过Clayton和Joe copula捕捉非对称尾部依赖,利用牛顿-拉夫逊算法进行最大似然估计变点,并采用参数自助法构建置信区间。

AI中文摘要

我们研究具有非线性序列依赖的时间序列数据的离线变点估计。为解决此问题,我们提出一个基于Copula的马尔可夫链模型,具有威布尔边缘分布,适用于建模非负数据,如事件时间和波动率度量。通过Clayton和Joe copula纳入非线性依赖,使模型能够分别捕捉非对称下尾和上尾依赖结构。我们推导相应的似然函数,并通过牛顿-拉夫逊算法实现的最大似然估计来估计变点和模型参数。通过参数自助蒙特卡洛程序构建置信区间。进行大量数值研究,评估所提出方法在不同依赖结构和copula误设情况下的有限样本性能和稳健性。结果表明,所提出的估计量在RMSE和相对误差方面表现良好,特别是对于变点的估计。对COVID-19大流行期间VIX指数的实证应用进一步说明了所提出方法在检测边缘分布和序列依赖结构中的结构性变化方面的实际效用。

英文摘要

We study offline change-point estimation for time series data exhibiting nonlinear serial dependence. To address this problem, we propose a copula-based Markov chain model with Weibull marginal distributions, which is suitable for modeling nonnegative data such as event times and volatility measures. Nonlinear dependence is incorporated through the Clayton and Joe copulas, allowing the model to capture asymmetric lower-tail and upper-tail dependence structures, respectively. We derive the corresponding likelihood function and estimate the change point and model parameters using maximum likelihood estimation implemented through the Newton--Raphson algorithm. Confidence intervals are constructed via a parametric bootstrap Monte Carlo procedure. Extensive numerical studies are conducted to evaluate the finite-sample performance and robustness of the proposed method under different dependence structures and copula misspecification scenarios. The results demonstrate that the proposed estimators perform well in terms of RMSE and relative error, particularly for the estimation of the change point. An empirical application to the VIX index during the COVID-19 pandemic further illustrates the practical usefulness of the proposed approach in detecting structural changes in both the marginal distributions and serial dependence structure.

2605.29516 2026-05-29 stat.ME math.OC

Active learning strategy for excursion-set confidence regions of functional simulator outputs

主动学习策略用于功能模拟器输出的超越集置信区域

Lucas Brunel, Mathieu Balesdent, Loïc Brevault, Rodolphe Le Riche, Bruno Sudret

AI总结 提出结合主成分分析和高斯过程回归的代理模型,并引入基于最大-最小准则的主动学习策略,高效估计具有随机输入和功能输出的函数的超越集置信区域。

AI中文摘要

估计超越集置信区域旨在以给定的置信水平识别函数可能超过某个阈值的区域。本文关注于函数具有随机输入且功能输出一次性返回的情况下的置信区域估计。我们开发了一种基于代理模型的方法来估计置信区域,结合了主成分分析和高斯过程回归。还引入了一种基于最大-最小准则的主动学习策略,该策略选择可能减少置信区域不确定性的新样本。该策略通过Karhunen-Loève展开利用高斯过程的高效采样。将所提出的方法应用于三个案例研究的置信区域估计:一个合成函数、高超声速飞行器的表面压力系数分布以及可重复使用运载器第一级的滑翔返回轨迹。该方法在准确估计置信区域的同时减少了建模不确定性的来源,表现出高效性。与文献中的参考方法进行了基准比较。讨论了评估置信区域估计性能的相关指标。

英文摘要

Estimating excursion set confidence regions seeks to identify regions where a function may exceed some threshold with a given confidence level. This paper focuses on estimating such confidence regions in cases where the function has random inputs and a functional output that is returned all at once. We develop a surrogate-based approach for estimating the confidence region, combining principal component analysis and Gaussian process regression. An active learning strategy is also introduced, based on a max-min criterion that selects new samples which are likely to reduce the uncertainty in the confidence region. This strategy leverages efficient sampling of the Gaussian process through a Karhunen-Loève expansion. The proposed approach is applied to estimate the confidence regions of three case studies: a synthetic function, the surface pressure coefficient distribution of a hypersonic vehicle, and the glide-back trajectory of a reusable launcher first stage. The method demonstrates efficiency in accurately estimating the confidence region while reducing sources of modeling uncertainties. It is benchmarked against reference methods from the literature. Relevant metrics for assessing the confidence region estimation performance are discussed.

2605.29466 2026-05-29 stat.CO physics.data-an

`pandemonium`: High Dimensional Analysis in Linked Spaces

`pandemonium`: 链接空间中的高维分析

Gabriel McCoy, German Valencia, Ursula Laa

AI总结 提出R包pandemonium,通过聚类分析和链接可视化探索预测变量与响应变量之间的关系,核心方法包括非线性降维和动画游览,主要贡献在于提供了一种在双空间中同时可视化和分析高维数据结构的工具。

AI中文摘要

数据分析中的一个常见挑战是在涉及大量预测变量和响应变量的问题中揭示它们之间的关系。当预测变量和响应变量的数量有限时,可视化方法特别有效。我们提出了一个R包pandemonium,旨在通过将聚类分析与链接可视化相结合来探索此类问题。聚类在一组变量中执行,以识别在该空间中具有相似模式的区域。使用基于非线性降维和动画游览的链接视图,同时在两个空间中可视化得到的聚类。我们通过两个示例介绍该包,这些示例说明了不同类型的链接空间。在第一个示例中,我们考虑一组输入变量如何映射到神经网络回归模型中的潜在激活,以识别导致相似激活模式的输入组合。在第二个示例中,我们分析了一个在物理学中出现的复杂多变量数学模型,以研究预测空间中的结构如何与响应相关。

英文摘要

A common challenge in data analysis is uncovering relationships between predictors and responses in problems involving large numbers of both. When the number of predictors and responses is limited, visual approaches are particularly effective. We present an R package, pandemonium, designed to explore such problems by combining cluster analysis with linked visualisations. Clustering is performed in one set of variables to identify regions with similar patterns in that space. The resulting clusters are simultaneously visualised in both spaces using linked views based on non-linear dimension reduction and animated tours. We introduce the package through two examples that illustrate different types of linked spaces. In the first example, we consider how a set of input variables is mapped to latent activations in a neural network regression model, to identify input combinations that result in similar activation patterns. In the second example, we analyse a complex multivariable mathematical model arising in physics to investigate how structure in the predictor space relates to the responses.

2605.29464 2026-05-29 stat.ML cs.LG

Deep Optimal Individualized Treatment Rules for Bivariate Survival Outcomes via Adaptive Prediction-Powered Learning

双变量生存结局的深度最优个体化治疗规则:基于自适应预测驱动学习

Kun Ren, Yifan Cui, Wen Su

AI总结 针对随机试验中的双变量生存结局,提出一种基于深度神经网络的自适应预测驱动方法,通过随机策略建模治疗规则并耦合边际加速失效时间模型,以最大化联合生存概率。

AI中文摘要

在涉及多种治疗的随机试验中,双变量生存结局给决策带来了显著的分析挑战。本文通过深度神经网络,解决推导最优个体化治疗规则以最大化固定时间点$(t_1, t_2)$之后的联合生存概率的问题,同时考虑右删失。我们提出了一种新颖的方法,通过随机策略对治疗规则进行建模,并通过连接函数耦合边际加速失效时间模型以捕捉双变量依赖性。为了增强决策的鲁棒性和有效性,我们引入了一种自适应预测驱动方法,该方法利用机器学习模型的辅助预测。

英文摘要

In randomized trials involving multiple treatments, bivariate survival outcomes present significant analytical challenges for making decisions. This paper addresses the problem of deriving optimal individualized treatment rules to maximize the joint survival probability beyond fixed time points $(t_1, t_2)$ through deep neural networks, while accounting for right censoring. We propose a novel approach that models treatment rules via stochastic policies, coupling marginal accelerated failure time models via a link function to capture bivariate dependence. To enhance robustness and effectiveness of decision making, we introduce an adaptive prediction-powered method that leverages auxiliary predictions from machine learning models.

2605.29424 2026-05-29 stat.AP cond-mat.soft physics.data-an

Model-free estimation in scattering analysis of microscopy

显微镜散射分析中的无模型估计

Tong Lin, Jinseok Lee, Matt Helgeson, Megan T. Valentine, Yimin Luo, Mengyang Gu

AI总结 提出一种基于概率框架的无模型方法MF-AIUQ,通过中间散射函数与均方位移的关系,利用边际最大似然估计从显微镜视频中直接估计均方位移,无需粒子追踪或参数模型。

Comments
18 pages, 6 figures
AI中文摘要

粒子的均方位移通常通过粒子追踪方法从显微镜视频中估计,这些方法依赖手动调整参数,并且在整个滞后时间范围内往往不稳定,尤其是在密集或低对比度情况下。在这项工作中,我们提出了无模型从头不确定性量化方法(MF-AIUQ),这是一种基于概率框架的显微镜视频散射分析无模型方法,它无需分离粒子或链接其轨迹即可估计均方位移。基于累积量定理导出的中间散射函数与均方位移之间的关系,MF-AIUQ通过边际最大似然估计器估计均方位移值。为了降低计算成本,似然函数通过傅里叶变换强度的一个子集来近似。这些强度在傅里叶基函数和对数滞后时间点的对数值上等间距分布。我们发现中间散射函数在这个对数输入空间中是平滑的,并且中间散射函数的信息可以通过这个输入子集捕获。我们通过涵盖几个代表性随机过程的模拟研究和三个实验系统来检验该方法:用于评估在光学密集和明场设置下性能的牛顿流体、具有演化均方位移形状的凝胶系统,以及用于模量估计的粘弹性生物聚合物蜗牛粘液。在这些研究中,MF-AIUQ在整个滞后时间范围内提供了平滑且稳定的均方位移估计,并在粒子追踪不可靠或均方位移参数模型不可用或不可验证的情况下,作为一种有用的补充方法。

英文摘要

The mean squared displacement (MSD) of particles or probes is commonly estimated from microscopy videos using particle tracking approaches, which rely on tuning parameters manually, and are often unstable over the entire lag time range, especially in dense or low-contrast situations. In this work, we propose model-free ab initio uncertainty quantification (MF-AIUQ), a model-free method for scattering analysis of microscopy video based on a probabilistic framework, which estimates MSD without isolating particles and linking their trajectories. Based on the relationship between the intermediate scattering function (ISF) and the MSD derived from the cumulant theorem, MF-AIUQ estimates the MSD values by the marginal maximum likelihood estimator. To reduce the computational cost, the likelihood function is approximated by a subset of Fourier-transformed intensities. These intensities are equally spaced at the logarithmic values of Fourier basis functions and lag time points. We found that the ISF is smooth in this logarithmic input space, and the information of the ISF can be captured by this subset of inputs. We examine the method through simulation studies covering several representative stochastic processes and three experimental systems: a Newtonian fluid for evaluating performance in optically dense and bright-field settings, a gelation system with an evolving MSD shape, and snail mucin, a viscoelastic biopolymer, for modulus estimation. Across these studies, MF-AIUQ provides smooth and stable MSD estimates over the full lag time range and serves as a useful complementary approach in settings where particle tracking is unreliable or a parametric model of MSD is unavailable or unverifiable.

2605.29415 2026-05-29 eess.IV cs.CV cs.LG eess.SP stat.ML

Constructing efficient channels for ideal observers using the conjugate gradient method

使用共轭梯度法构建理想观察者的高效通道

Weimin Zhou

AI总结 针对医学成像系统图像质量的任务评估,提出基于共轭梯度(CG)的方法构建高效通道,以近似贝叶斯理想观察者(IO)和霍特林观察者(HO)的性能。

Comments
Submitted to the Journal of Medical Imaging (JMI) Special Issue Honoring Dr. Harrison H. Barrett
AI中文摘要

基于任务的图像质量(IQ)评估对于医学成像系统的设计和优化至关重要。理想观察者,包括贝叶斯理想观察者(IO)和理想线性观察者(即霍特林观察者(HO)),提供了客观的品质因数(FOM),用于量化系统在信号检测任务上的性能。然而,将理想观察者应用于高维图像数据通常在计算上难以处理。通道机制提供了一种有效的降维框架,可以促进理想观察者的计算。本文提出了一种基于共轭梯度(CG)的方法,用于构建近似IO和HO性能的高效通道。

英文摘要

Task-based assessment of image quality (IQ) is critically important for the design and optimization of medical imaging systems. Ideal observers, including the Bayesian Ideal Observer (IO) and the ideal linear observer, i.e., the Hotelling observer (HO), provide objective figures of merit (FOMs) that quantify system performance on signal detection tasks. However, the application of ideal observers to high-dimensional image data is often computationally intractable. Channel mechanisms provide an effective framework for dimensionality reduction that can facilitate the computation of ideal observers. This work presents a conjugate gradient (CG)-based method to construct efficient channels for approximating the IO and HO performance.

2605.29413 2026-05-29 q-fin.PM q-fin.MF q-fin.RM q-fin.ST stat.AP

From Classical Optimization to Bayesian Integration: A Comprehensive Analysis of Systematic Portfolio Management

从经典优化到贝叶斯整合:系统性投资组合管理的全面分析

Ajay Kumar Verma, Shravya Barkam

AI总结 本文通过十只美国股票在2023年9月至2025年12月期间的数据,比较了均值-方差优化、约束优化、Fama-French五因子回归、蒙特卡洛模拟和Black-Litterman模型等现代投资组合构建方法,分析了约束、风险因子、模拟近似和市场观点对投资组合配置、绩效和稳定性的影响。

AI中文摘要

本文通过选取十只美国股票(TSLA、WMT、BAC、GS、LLY、MRK、GOOG、META、AAPL和XOM),在2023年9月至2025年12月的时间范围内,比较了一系列当代投资组合构建方法。本文探讨了基本的均值-方差优化、约束优化、Fama-French五因子回归建模、蒙特卡洛模拟以及Black-Litterman模型,以确定解的约束、策略的风险因子、模拟近似以及特定的市场观点如何影响投资组合配置、绩效和稳定性。总体而言,结果表明:标准优化可能导致高度集中的投资组合,而约束优化通过改变有效前沿导致投资组合配置发生变化;五因子回归模型表明一种防御性大价值与盈利暴露的基本投资风格;蒙特卡洛近似是一种可行的技术,用于获得均值-方差最优投资组合,前提是模拟次数足够高,尤其是在箱约束下;与标准均值-方差优化相比,Black-Litterman投资组合方法产生了更具经济直觉的配置和更高的稳定性,因为该方法平衡了均衡收益与投资者观点。

英文摘要

This paper compares a series of contemporary portfolio construction approaches using ten U.S. stocks (TSLA, WMT, BAC, GS, LLY, MRK, GOOG, META, AAPL and XOM) over the period from September 2023 to December 2025. The paper explores basic mean-variance optimization, constrained optimization, Fama-French five-factor regression modeling, Monte Carlo simulation, and the Black-Litterman model to determine how constraints on the solution, the risk factors of a strategy, simulated approximations, and specific market views affect portfolio allocation, performance and stability. Overall, the results show that standard optimization may result in highly concentrated portfolios, while constrained optimization changes portfolio allocations by altering the efficient frontier; the five-factor regression models suggest a basic investment style of defensive large-value and profitability exposure; Monte Carlo approximation is a viable technique for arriving at mean-variance optimal portfolios provided the number of simulations is high enough, especially under a box constraint; and the Black-Litterman approach produces more economically intuitive allocations and greater stability than standard mean-variance optimization because it balances equilibrium returns with investor views.

2605.29411 2026-05-29 cs.LG cs.AI stat.ME stat.ML

The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction

马尔可夫边界在表格预测中的好、坏与丑

Shu Wan, Abhinav Gorantla, Huan Liu, K. Selçuk Candan

AI总结 研究马尔可夫边界在表格预测中的实际效用,发现理论上最优的边界在实践中有条件地提升预测性能,但因果发现方法难以实现其潜力。

Comments
11 pages, 9 figures, 2 tables. Preprint
AI中文摘要

在标准图形假设下,目标变量的马尔可夫边界是使所有其他特征冗余的最小特征集。一旦观察到边界,目标变量与表格的其余部分条件独立。这对于表格预测来说是一个诱人的对象,因为它恰好指出了模型所需的列。然而,现代回归器仍然在完整特征集上训练。我们询问马尔可夫边界是否在SCM3K(一个包含3450个任务的合成SCM基准,特征数量从40到1000,涵盖六个SCM家族)上对预测真正有用,并使用六个回归器进行评估。答案比理论所暗示的要微妙得多。将回归器限制在oracle边界上通常会显著改善预测,并且随着特征空间变得更大更稀疏,改善程度增加。但是,通过因果发现恢复边界并在恢复的掩码上训练的自然流程并不奏效。现有的估计器在达到边界最有帮助的区域之前就耗尽了计算预算,即使它们运行,也很少能击败完整特征集。我们将此归因于三个原因。发现优化的是结构恢复而非预测。假阴性和假阳性具有高度不对称的预测成本。精确边界只是众多击败所有特征的特征集之一。然后,我们阐述了这些事实对于预测对齐的特征选择以及学习使用因果结构的表格模型的意义。

英文摘要

Under standard graphical assumptions, the Markov boundary of a target variable is the smallest set of features that renders every other feature redundant. Once the boundary is observed, the target is conditionally independent of the rest of the table. This is a tempting object for tabular prediction, since it names exactly the columns a model should need. Yet modern regressors are still trained on the full feature set. We ask whether the Markov boundary is genuinely useful for prediction on SCM3K, a 3,450-task synthetic SCM benchmark with feature counts from 40 to 1000 and six SCM families, evaluated with six regressors. The answer is more nuanced than the theory suggests. Restricting a regressor to the oracle boundary often improves prediction substantially, and the improvement grows as the feature space becomes larger and sparser. But the natural pipeline of recovering the boundary with causal discovery and training on the recovered mask does not deliver. Existing estimators exhaust the compute budget before reaching the regime where the boundary helps most, and even where they run they rarely beat the full feature set. We trace this to three causes. Discovery optimizes structural recovery rather than prediction. False negatives and false positives carry sharply asymmetric predictive cost. The exact boundary is only one of many feature sets that beat all features. We then develop what these facts imply for prediction-aligned feature selection and for tabular models that learn to use causal structure.

2605.29403 2026-05-29 stat.ME stat.AP

Power Estimation for Longitudinal Studies with Time Dependent Covariates Using Generalized Method of Moments

使用广义矩方法对含时变协变量的纵向研究进行功效估计

Niloofar Ramezani, Oliver Hurst

AI总结 本文针对含时变协变量的纵向研究,提出基于广义矩方法(GMM)的两种功效估计方法(Wald法和距离度量法),填补了GMM框架下缺乏实用功效分析工具的空白。

Comments
27 pages with appendix, 16 pages main manuscript, 3 figures in main manuscript, 7 figures including figures in appendix
AI中文摘要

纵向研究经常包含随时间变化的协变量,这会在结果和预测变量之间产生复杂的依赖结构。当协变量是时变时,标准的功效分析工具——主要针对广义估计方程(GEE)开发——可能会产生误导性结果,因为它们没有考虑有效边际推断所需的矩基结构。广义矩方法(GMM)为在存在时变协变量的情况下估计边际效应提供了一个灵活且高效的框架,但目前尚无实用工具可用于GMM下的功效分析。本文介绍了一个现代、可实施的框架,用于使用GMM对含时变协变量的纵向研究进行功效估计。开发了两种互补方法:一种基于Wald的方法,利用GMM估计量的渐近正态性;另一种基于距离度量的方法,基于样本和总体矩条件的二次型。两种方法仅需有限的分布假设,并依赖于有效的矩条件而非完整的似然设定。我们概述了理论基础,提供了逐步实施指南,并利用骨关节炎倡议的数据说明了这些方法。提出了一个模拟框架用于评估实证性能。这些方法填补了纵向建模文献中的一个关键空白,为应用研究人员提供了一种实用的、分布轻量的功效估计方法,适用于存在时变协变量且GMM是首选估计技术的情况。

英文摘要

Longitudinal studies frequently incorporate covariates that evolve over time, creating complex dependence structures between outcomes and predictors. When covariates are time dependent, standard power analysis tools--largely developed for generalized estimating equations (GEE)--can yield misleading results because they do not account for the moment based structure required for valid marginal inference. Generalized Method of Moments (GMM) provides a flexible and efficient framework for estimating marginal effects in the presence of time dependent covariates, yet no practical tools exist for conducting power analysis under GMM. This paper introduces a modern, implementable framework for power estimation in longitudinal studies with time dependent covariates using GMM. Two complementary approaches are developed: a Wald based method that leverages the asymptotic normality of GMM estimators, and a distance metric method based on quadratic forms of sample and population moment conditions. Both approaches require only limited distributional assumptions and rely on valid moment conditions rather than full likelihood specification. We outline the theoretical foundations, provide step by step implementation guidance, and illustrate the methods using data from the Osteoarthritis Initiative. A simulation framework is presented for evaluating empirical performance. These methods fill a critical gap in the longitudinal modeling literature by offering applied researchers a practical, distribution light approach to power estimation when time dependent covariates are present and GMM is the preferred estimation technique.

2605.29395 2026-05-29 stat.ME stat.ML

Low Rank for Rank: Uncertainty-Aware Task-Specific LLM Ranking under Sparse Pairwise Comparisons

低秩排序:稀疏成对比较下不确定性感知的任务特定大语言模型排名

Jiachun Li, David Simchi-Levi, Will Wei Sun

AI总结 提出一种低秩框架,通过稀疏成对比较进行任务特定的大语言模型排名,利用任务-模型能力矩阵的低秩结构实现跨任务信息共享,并开发了不确定性量化方法以提供置信区间和排名证书。

AI中文摘要

成对人类偏好平台(如Chatbot Arena)已成为大语言模型评估的核心,但可靠的任务特定排名仍然具有挑战性。全局排行榜掩盖了任务异质性,而在稀疏、不平衡的比较下独立地对每个细粒度任务进行排名是不稳定的。我们提出了一种低秩框架,用于从稀疏成对比较中进行任务特定的大语言模型排名,将任务-模型能力矩阵 $Θ^\star \in \mathbb{R}^{d_t \times d_m}$ 建模为低秩,以便在相关任务之间共享信息,同时保留任务特定的差异。我们首先开发了一个最大范数($\ell_\infty$)准确的潜在分数估计器,结合凸初始化器和交替最小化细化,并证明了在稀疏采样下任务级 top-$K$ 恢复保证。我们的主要贡献是一个用于任务特定排名的不确定性量化框架。我们为固定分数对比(例如两个模型之间的任务特定能力差距)构建了交叉拟合的一步去偏估计器,产生渐近有效的置信区间,达到半参数效率界。然后我们将推断扩展到高维排名场景,其中每个任务的排名和 top-$K$ 成员资格由许多依赖的分数差距假设决定。使用高斯和乘子自助法校准,我们获得了跨多个任务和模型的每个任务排名的同时置信集以及有效的 top-$K$ 成员资格检验。在合成数据和 Chatbot Arena 上的实验表明,低秩共享提高了样本效率,优于独立的任务级 Bradley-Terry 估计,并产生更紧密、校准更好的排名证书,在真实大语言模型基准测试典型的稀疏场景中增益最大。

英文摘要

Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ranking each fine-grained task independently is unstable under sparse, imbalanced comparisons. We propose a low-rank framework for task-specific LLM ranking from sparse pairwise comparisons, modeling the task-by-model ability matrix $Θ^\star \in \mathbb{R}^{d_t \times d_m}$ as low rank so that information is shared across related tasks while task-specific differences are preserved. We first develop a max-norm ($\ell_\infty$) accurate estimator for the latent scores, combining a convex initializer with alternating-minimization refinement, and prove task-wise top-$K$ recovery guarantees under sparse sampling. Our main contribution is an uncertainty quantification framework for task-specific ranking. We construct cross-fitted one-step debiased estimators for fixed score contrasts -- such as the task-specific ability gap between two models -- yielding asymptotically valid confidence intervals that attain the semiparametric efficiency bound. We then extend the inference to the high-dimensional ranking regime, where per-task ranks and top-$K$ membership are determined by many dependent score-gap hypotheses. Using Gaussian and multiplier-bootstrap calibration, we obtain simultaneous confidence sets for per-task ranks and valid top-$K$ membership tests across many tasks and models. Experiments on synthetic data and Chatbot Arena show that low-rank sharing improves sample efficiency over independent task-wise Bradley-Terry estimation and produces tighter, better-calibrated ranking certificates, with the largest gains in the sparse regime typical of real LLM benchmarks.

2605.29387 2026-05-29 cs.LG cs.AI stat.ML

On the Optimizer Dependence of Neural Scaling Laws

神经缩放定律的优化器依赖性

Vansh Ramani, Shourya Vir Jain

AI总结 通过随机特征回归实验,发现优化器类型系统性地影响神经缩放定律中的缩放指数α,预条件优化器产生更陡峭的缩放,并提供了光谱诊断预测高级优化器的收益。

AI中文摘要

神经缩放定律 $L(N) \propto N^{-α}$ 中的缩放指数 $α$ 通常被视为由架构和数据确定的固定常数。我们提出证据表明 $α$ 系统性地依赖于优化器。在受控的随机特征回归实验——神经缩放的理论框架——中,我们测量了五种优化器变体和六种光谱条件下的 $α$。预条件优化器一致地产生更陡峭的缩放(更大的 $α$),且 $α$ 的偏移在大部分测试光谱范围内增加,在 $s = 1.5$ 附近达到峰值,并在 $s = 2.0$ 时保持较大。在 $s \approx 1.0$(自然语言的特征)时,完全自然梯度达到 $α\approx 0.31$,而梯度下降为 $α\approx 0.12$——拟合指数大 $2.6$ 倍,在随机特征模型中,该差异随模型规模加倍而累积。这种指数偏移是否以及如何迁移到大规模 LLM 训练中——近期证据表明优势可能随规模减弱——仍是一个重要的开放问题。我们的结果表明,缩放定律预测应考虑优化器选择,并且我们提供了一个光谱诊断来预测高级优化器何时会带来收益。

英文摘要

The scaling exponent $α$ in neural scaling laws $L(N) \propto N^{-α}$ is commonly treated as a fixed constant set by architecture and data. We present evidence that $α$ depends systematically on the optimizer. In controlled random-feature regression experiments -- the canonical theoretical framework for neural scaling -- we measure $α$ across five optimizer variants and six spectral conditions. Preconditioned optimizers consistently yield steeper scaling (larger $α$), with the $α$-shift increasing across most of the tested spectral range, peaking near $s = 1.5$, and remaining large at $s = 2.0$. At $s \approx 1.0$ (characteristic of natural language), the full natural gradient achieves $α\approx 0.31$ versus $α\approx 0.12$ for gradient descent -- a $2.6\times$ larger fitted exponent that, within the random-feature model, compounds with each model-size doubling. Whether and how this exponent shift transfers to large-scale LLM training -- where recent evidence suggests the advantage may attenuate with scale -- remains an important open question. Our results imply that scaling-law forecasts should account for optimizer choice, and we provide a spectral diagnostic predicting when advanced optimizers will pay off.
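
As a rough worked illustration of what the two fitted exponents mean under the power law $L(N) \propto N^{-α}$ quoted above, the sketch below computes how much the loss shrinks per doubling of $N$. Only the exponents 0.31 and 0.12 come from the abstract; the helper name `loss_ratio` and the choice of 1, 5 and 10 doublings are assumptions made for illustration, and the calculation ignores any irreducible loss term.

```python
def loss_ratio(alpha: float, doublings: int = 1) -> float:
    """Factor by which L(N) = c * N**(-alpha) shrinks after `doublings` doublings of N."""
    return 2.0 ** (-alpha * doublings)


ALPHA_NG = 0.31  # full natural gradient at s ~ 1.0 (from the abstract)
ALPHA_GD = 0.12  # gradient descent at s ~ 1.0 (from the abstract)

# Ratio of the fitted exponents, reported in the abstract as 2.6x.
exponent_ratio = ALPHA_NG / ALPHA_GD

# The gap between the two curves widens with every doubling of N.
for k in (1, 5, 10):
    print(k, round(loss_ratio(ALPHA_NG, k), 3), round(loss_ratio(ALPHA_GD, k), 3))
```

Because the exponent multiplies the number of doublings, the ratio between the two losses is itself a power law in $N$, which is one way to read "compounds with each model-size doubling".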

2605.29371 2026-05-29 math.OC cs.LG cs.NA math.NA stat.ML

Kernel-based potential mean-field games with unbiased random Fourier $U$-statistics

基于核的势均场博弈与无偏随机傅里叶 $U$-统计量

Yumiharu Nakano

AI总结 针对运行交互成本和终端目标成本均由再生核最大均值差异(MMD)惩罚表示的势均场博弈子类,提出一种利用核结构的计算框架,通过无偏随机傅里叶U-统计量估计成本,并证明样本级几乎必然收敛定理和显式收敛速率。

AI中文摘要

我们研究势均场博弈的子类,其中运行交互成本和终端目标成本均通过再生核最大均值差异(MMD)惩罚表示,并开发了一个利用这种核结构的计算框架。两种成本均使用无偏随机傅里叶U-统计量表示从有限样本经验分布中估计,该统计量在批量大小上具有线性成本。受控扩散的漂移由神经网络参数化,并通过随机梯度下降训练。对于该子类,我们在惩罚参数、随机特征数量、样本大小和优化容差的耦合速率条件下,证明了样本级几乎必然收敛定理和显式几乎必然收敛速率。该框架包括核MMD惩罚Schrödinger桥问题作为交互成本消失的特例。数值实验在高达一百维的Schrödinger桥问题以及一个具有每辆车物理异质性的电动汽车充电协调问题上展示了该方法,其中聚合需求拥堵成本代表群体层面的价格反馈竞争,终端MMD惩罚塑造截止时刻的荷电状态分布。

英文摘要

We study the subclass of potential mean-field games in which the running interaction cost and the terminal target cost are both expressed through reproducing-kernel maximum mean discrepancy (MMD) penalties, and develop a computational framework that exploits this kernel structure. Both costs are estimated from finite-sample empirical distributions using a random Fourier U-statistic representation that is unbiased and has linear cost in the batch size. The drift of the controlled diffusion is parametrized by a neural network and trained via stochastic gradient descent. For this subclass we prove a sample-level almost-sure convergence theorem and an explicit almost-sure rate of convergence, under coupled rate conditions on the penalty parameter, the random-feature count, the sample size, and the optimization tolerance. The framework includes the kernel-MMD-penalty Schrödinger bridge problem as the special case of a vanishing interaction cost. Numerical experiments illustrate the method on the Schrödinger bridge problem in dimensions up to one hundred, and on an electric vehicle charging coordination problem with per-vehicle physical heterogeneity, where an aggregate-demand congestion cost represents price-feedback competition at the population level and the terminal MMD penalty shapes the state-of-charge distribution at the deadline.
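
To make the "unbiased and linear in the batch size" claim concrete, here is a minimal sketch of the familiar unbiased MMD$^2$ U-statistic with the kernel replaced by random Fourier feature inner products. It is an assumption-laden illustration, not the paper's estimator: the Gaussian kernel with unit bandwidth, the feature map $\sqrt{2/D}\cos(w \cdot x + b)$, and all function names are choices made here. The trick is that $\sum_{i \neq j} \langle φ_i, φ_j \rangle = \|\sum_i φ_i\|^2 - \sum_i \|φ_i\|^2$, so no pairwise loop is needed.

```python
import math
import random


def random_fourier_features(points, freqs, phases):
    """Map each point x to the features sqrt(2/D) * cos(w . x + b)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [
        [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
         for w, b in zip(freqs, phases)]
        for x in points
    ]


def _within_term(feats):
    """(1 / (n (n - 1))) * sum over i != j of <phi_i, phi_j>, in O(n D) time."""
    n = len(feats)
    total = [sum(col) for col in zip(*feats)]          # sum of feature vectors
    sq_norm_of_sum = sum(t * t for t in total)         # includes the i == j terms
    sum_of_sq_norms = sum(v * v for f in feats for v in f)
    return (sq_norm_of_sum - sum_of_sq_norms) / (n * (n - 1))


def mmd2_u_statistic(feats_x, feats_y):
    """U-statistic estimate of MMD^2 between two samples given their features."""
    mean_x = [sum(col) / len(feats_x) for col in zip(*feats_x)]
    mean_y = [sum(col) / len(feats_y) for col in zip(*feats_y)]
    cross = sum(a * b for a, b in zip(mean_x, mean_y))
    return _within_term(feats_x) + _within_term(feats_y) - 2.0 * cross


# Hypothetical usage: standard normal frequencies correspond to a Gaussian
# kernel exp(-||x - y||^2 / 2); the same frequencies are shared by both samples.
rng = random.Random(0)
dim, n_features = 2, 64
freqs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_features)]
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(50)]
ys = [[rng.gauss(2.0, 1.0) for _ in range(dim)] for _ in range(50)]
estimate = mmd2_u_statistic(
    random_fourier_features(xs, freqs, phases),
    random_fourier_features(ys, freqs, phases),
)
```

In a training loop of the kind the abstract describes, `xs` would presumably be simulated terminal states and `ys` samples from the target distribution; that pairing is an assumption here.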

2605.29351 2026-05-29 cs.LG math.DS stat.ML

Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics

注意力作为上下文经验贝叶斯:通过粒子动力学的两阶段视角

Matthew Smart, Soumya Ganguly, Nilava Metya, Alexandre V. Morozov, Anirvan M. Sengupta

AI总结 本文通过粒子动力学将最小注意力仅变换器解释为两阶段经验贝叶斯过程,揭示了深度和注意力残差的统计角色,并证明无需显式噪声调度即可实现有效去噪。

Comments
52 pages, 5 figures
AI中文摘要

我们研究了在所有标记损坏情况下的最小注意力仅变换器,并表明它们具有两阶段经验贝叶斯解释。单个注意力步骤计算相对于由上下文定义的经验分布的核加权后验均值。深度通过粒子动力学(阶段1)细化该分布,而长程跳跃连接将噪声输入作为查询用于后验推断(阶段2),揭示了深度和注意力残差的独特统计角色。该框架隔离了一个最小设置,其中上下文本身诱导了一个控制上下文推断的深度依赖能量景观。我们表明,无需显式噪声调度即可出现有效去噪:固定的核带宽和有限的积分范围就足够了,从而产生了一个有原则的深度-噪声关系。我们进一步为一类表现良好的先验建立了后验均值恢复保证,其中经验估计器在渐近条件下收敛到贝叶斯最优预测器。将这些动力学与反向扩散极限联系起来,我们的结果为注意力作为通过基于样本的后验估计进行上下文推断提供了统计解释,无需显式密度建模。

英文摘要

We study minimal attention-only transformers under all-token corruption and show they admit a two-stage empirical Bayes interpretation. A single attention step computes a kernel-weighted posterior mean with respect to the empirical distribution defined by the context. Depth refines this distribution through particle dynamics (Stage 1), while a long-range skip-connection carries the noisy input as a query for posterior inference (Stage 2), revealing distinct statistical roles for depth and attention residuals. The framework isolates a minimal setting in which the context itself induces a depth-dependent energy landscape governing in-context inference. We show that effective denoising can emerge without an explicit noise schedule: a fixed kernel bandwidth and finite integration horizon suffice, yielding a principled depth-noise relationship. We further establish a posterior-mean recovery guarantee for a class of well-behaved priors, where the empirical estimator converges to the Bayes-optimal predictor under asymptotic conditions. Connecting these dynamics to reverse-diffusion limits, our results provide a statistical interpretation of attention as in-context inference via sample-based posterior estimation, without explicit density modeling.
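
The statement that a single attention step computes a kernel-weighted posterior mean can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: a Gaussian kernel with bandwidth `bandwidth`, context points used as both keys and values, and the function name `kernel_posterior_mean`. Under those assumptions the output equals the posterior mean $E[x \mid y]$ when the prior is the empirical distribution of the context and the query $y$ is a context point corrupted by isotropic Gaussian noise with standard deviation equal to the bandwidth.

```python
import math


def kernel_posterior_mean(query, context, bandwidth):
    """Gaussian-kernel-weighted mean of the context points, evaluated at `query`."""
    # Log-weights -||query - x_i||^2 / (2 h^2); a softmax over these gives
    # the posterior probabilities of each context point given the query.
    logits = [
        -sum((q - x) ** 2 for q, x in zip(query, point)) / (2.0 * bandwidth ** 2)
        for point in context
    ]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - shift) for value in logits]
    total = sum(weights)
    return [
        sum(w * point[d] for w, point in zip(weights, context)) / total
        for d in range(len(query))
    ]


# Hypothetical context of three clean points and one noisy query.
context = [[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]]
denoised = kernel_posterior_mean([9.5, 10.4], context, bandwidth=1.0)
```

Expanding the squared distance shows the weights are a softmax of $y \cdot x_i / h^2 - \|x_i\|^2 / (2h^2)$, which reduces to ordinary dot-product attention when all context points have the same norm.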

2605.29315 2026-05-29 econ.EM stat.ME

Generalized Spectral Testing with Sample Splitting

基于样本分割的广义谱检验

Yuxin Tao, Feiyu Jiang, Xiaofeng Shao

AI总结 提出一种样本分割广义谱检验方法,用于评估线性与非线性时间序列模型的条件均值设定,通过分割样本估计参数并计算残差,避免了参数估计效应,实现了与不可行检验等价的极限分布,并通过简单乘子自助法近似。

AI中文摘要

基于残差的参数时间序列模型拟合优度检验常常因参数估计效应而复杂化,这种效应会改变诊断统计量的极限行为。我们提出一种样本分割广义谱检验(借鉴Escanciano(2006)的思想),用于评估线性和非线性时间序列模型的条件均值设定。该过程在拟合子样本上估计模型参数,并根据检查/测试子样本计算的残差构造广义谱Cramer-von Mises统计量。该统计量聚合了所有滞后的成对条件均值限制,因此无需带宽选择和截断滞后选择。在温和的正则条件和得分对齐条件下,基于残差的过程与基于真实误差的不可行oracle过程具有相同的零分布极限。尽管得到的极限律仍然是非枢轴的,但可以通过一个简单的乘子自助法一致地近似,该方法不需要生成自助法时间序列或重新估计参数。这种oracle等价性质与原始全样本检验形成鲜明对比,在全样本检验中,参数估计对极限过程贡献了额外的一阶项,并且需要在每个自助法样本中重新估计参数。我们进一步证明了所提检验对固定备择假设的一致性以及对局部备择假设的非平凡功效。大量模拟和实际数据分析表明,所提检验能很好地控制大小,具有可比的功效,并在重复估计代价高昂的模型中大幅节省计算成本。

英文摘要

Residual-based goodness-of-fit tests for parametric time-series models are often complicated by parameter-estimation effects, which can alter the limiting behavior of diagnostic statistics. We propose a sample-splitting generalized spectral test (in the spirit of Escanciano (2006)) for assessing conditional mean specification in linear and nonlinear time-series models. The procedure estimates the model parameter on a fitting subsample and constructs a generalized spectral Cramér-von Mises statistic from residuals computed on a checking/testing subsample. The statistic aggregates pairwise conditional mean restrictions over all lags and is therefore bandwidth-free and free of truncation-lag selection. Under mild regularity conditions and a score-alignment condition, the residual-based process has the same limiting null distribution as the infeasible oracle process based on the true errors. Although the resulting limiting law is still non-pivotal, it can be consistently approximated by a simple multiplier bootstrap that does not require generating bootstrap time series or re-estimating parameters. Such an oracle-equivalence property is in sharp contrast to the original full-sample test, for which parameter estimation contributes an additional first-order term to the limiting process and parameters must be re-estimated in each bootstrap sample. We further establish consistency of the proposed test against fixed alternatives and nontrivial power against local alternatives. Extensive simulations and real data analyses show that the proposed test controls size well, has comparable power, and delivers substantial computational savings in models where repeated estimation is costly.

2605.29296 2026-05-29 stat.AP

Conformal prediction for functional time series: Application to age-specific mortality rates

函数型时间序列的保形预测:应用于年龄别死亡率

Han Lin Shang

AI总结 针对模型不确定性,提出无模型、无分布的保形预测方法,用于构建函数型时间序列的预测区间,并通过澳大利亚年龄别死亡率数据验证其有效性。

Comments
27 pages, 4 figures, 7 tables
AI中文摘要

在人口学文献中,预测不确定性通常通过统计模型来量化。这种基于模型的方法可能存在缺陷,即模型设定错误、选择效应以及缺乏有限样本有效性。我们引入了一种模型无关且无分布的程序——保形预测,用于构建函数型时间序列的预测区间。在保形预测家族中,分割保形预测将数据分为训练集、验证集和测试集。在验证集内,我们可以通过校准经验覆盖概率以匹配其名义值来选择最优调优参数。然后,利用选定的最优调优参数,我们使用相同的预测模型为测试集中的保留数据构建预测区间。无需样本分割,序列保形预测通过自回归过程顺序更新预测分位数。使用澳大利亚年龄和性别特定的对数死亡率,我们评估并比较了两种保形预测变体之间的区间预测准确性,通过经验覆盖概率、覆盖概率差异和平均区间得分来衡量。

英文摘要

In demographic literature, forecast uncertainty is often quantified with a statistical model. This model-based approach may potentially suffer from drawbacks, namely model misspecification, selection effect, and lack of finite-sample validity. We introduce a model-agnostic and distribution-free procedure, conformal prediction, for constructing prediction intervals for a functional time series. In the family of conformal prediction, split conformal prediction divides the data into training, validation, and test sets. Within the validation set, we can select optimal tuning parameters by calibrating the empirical coverage probabilities to match their nominal values. With the selected optimal tuning parameters, we then construct the prediction intervals using the same forecasting model for the holdout data in the testing set. Without sample splitting, sequential conformal prediction sequentially updates the predicted quantiles via an autoregressive process. Using Australian age- and sex-specific log mortality rates, we evaluate and compare the interval forecast accuracy, as measured by empirical coverage probability, coverage probability difference and mean interval score, between the two variants of conformal prediction.
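
The split conformal idea described above can be sketched for the simplest case of scalar point forecasts. This is a generic textbook-style illustration, not the paper's procedure for function-valued forecasts: the absolute-residual score, the symmetric interval, and the function names are assumptions, and the usual coverage guarantee additionally assumes exchangeable calibration and test points, which time series need not satisfy.

```python
import math


def conformal_halfwidth(calib_truth, calib_pred, alpha):
    """Half-width of a split conformal interval at nominal level 1 - alpha."""
    # Nonconformity scores: absolute forecast errors on the held-out calibration set.
    scores = sorted(abs(y - p) for y, p in zip(calib_truth, calib_pred))
    n = len(scores)
    # 1-indexed rank of the finite-sample-corrected quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this level
    return scores[rank - 1]


def prediction_interval(point_forecast, halfwidth):
    """Symmetric interval around a new point forecast."""
    return (point_forecast - halfwidth, point_forecast + halfwidth)


# Hypothetical calibration data: observed values and the model's forecasts.
truth = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
preds = [2.5, 3.0, 6.25, 9.5, 10.0, 11.0, 16.0]
width = conformal_halfwidth(truth, preds, alpha=0.25)
lower, upper = prediction_interval(5.0, width)
```

The empirical coverage on a separate test set can then be compared with the nominal level, which is the kind of calibration check the abstract refers to.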

2605.29284 2026-05-29 stat.ME stat.AP stat.CO

Rapid Approximation Prediction for Kriging

克里金法的快速近似预测

Ziyu Li, Gregory Fasshauer, Douglas Nychka

AI总结 针对大规模空间数据,提出一种通过局部邻域稀疏线性组合近似协方差向量的快速克里金预测方法,显著降低计算复杂度并保持精度。

Comments
11 figures, 38 pages
AI中文摘要

精确克里金法和用于不确定性量化的条件模拟(CS)对于具有大量观测值和密集预测网格的现代空间分析在计算上不可行。我们针对规则预测网格上的平稳高斯过程,提出一种克里金预测步骤的快速近似方法,通过将每个非网格协方差向量近似为局部 $L$ 阶邻域内 $M = (2L)^2$ 个网格点的协方差的稀疏线性组合。这种重新表述将复杂度从 $O(N n^3)$ 降低到 $O(N \log N + nM + M^3)$,同时保持精度。一项因子研究表明,近似误差随着 Matérn 平滑度、邻域阶数 $L$ 和网格分辨率的增加而系统性地减小,这与核近似理论的界限一致。在一个北美夏季降雨应用($n=1368$)中,我们的方法产生的预测与精确克里金法在视觉上无法区分,点误差约为 $10^{-5}$ 英寸,并在 $350\times350$ 网格上实现了超过 150 倍的加速,同时优于 Vecchia 和 LatticeKrig 的预测。嵌入到快速 CS 方案中,该方法再现了克里金标准误差,并且随着 $n$ 和 $N$ 的增加具有良好的可扩展性。我们推荐一个实用工作流程:先使用快速方法进行参数估计,然后使用我们的快速预测器进行精细网格映射和不确定性量化。

英文摘要

Exact Kriging and conditional simulation (CS) for uncertainty quantification are computationally infeasible for modern spatial analyses with large numbers of observations and dense prediction grids. We present a rapid approximation to the Kriging prediction step for stationary Gaussian processes for a regular prediction grid by approximating each off-grid covariance vector by a sparse linear combination of on-grid covariances within a local $L$-order neighborhood of $M = (2L)^2$ neighboring grid points. This reformulation reduces complexity from $O(N n^3)$ to $O(N \log N + nM + M^3)$ while preserving accuracy. A factorial study shows that approximation error decreases systematically with increased Matérn smoothness, neighbor order $L$, and grid resolution, aligning with bounds from kernel approximation theory. In a North American summer-rainfall application ($n=1368$), our method produces predictions visually indistinguishable from exact Kriging with point-wise errors on the order of $10^{-5}$ inches and achieves more than $150$ times speedups at a $350\times350$ grid, also outperforming Vecchia and LatticeKrig predictions. Embedded in a fast CS scheme, the approach reproduces Kriging standard errors and scales favorably with both $n$ and $N$. We recommend a practical workflow that uses a fast method for parameter estimation followed by our rapid predictor for fine-grid mapping and uncertainty quantification.

2605.29272 2026-05-29 cs.LG cs.AI stat.ML

Causal Label Recovery in Payment Networks

支付网络中的因果标签恢复

Gaurav Dhama

AI总结 针对支付网络中标签存在的四种系统偏差,提出序列三重稳健(STR)估计器,同时纠正所有偏差并达到半参数效率界,实现基于数天而非数月数据的训练。

Comments
49 pages
AI中文摘要

支付网络中的欺诈检测模型依赖于存在系统性偏差的退单标签进行训练。每个标签必须依次经过三个门控:授权(被拒绝的交易不产生标签)、发卡行报告(未报告的欺诈不可见)和延迟(待处理的退单在训练时缺失)。到达的标签可能因第一方滥用或发卡行错误分类而受损。配套论文[arXiv:2605.27557]证明这四种损害对检测性能施加了极小极大下界。本文问:能否达到该下界?我们将观测流程形式化为一个具有三个倾向阶段和一个损坏层的顺序缺失数据问题,并构建了序列三重稳健(STR)估计器。STR同时纠正所有四种损害,并达到半参数效率界——没有估计器能具有更低的渐近方差。它是序列三重稳健的:在每个门控处,一致性仅要求倾向模型或结果回归中有一个正确指定,而非两者。我们提供了通过噪声率调整的伪标签进行损坏校正、通过经验贝叶斯收缩稳定小发卡行的逆倾向权重、提供有效置信区间的插件方差估计量,以及用于有限样本保证的伯恩斯坦集中不等式。在操作层面,我们推导了最优训练延迟——使标签质量损失和模型过时之和最小化的成熟窗口——并证明STR允许使用数天而非数月前的数据进行训练,将模型新鲜度与退单成熟周期解耦。对于任何样本量,STR在均方误差上严格优于基于退单的朴素训练。

英文摘要

Fraud detection models in payment networks train on chargeback labels that are systematically biased. Every label must survive three sequential gates: authorization (declined transactions generate no labels), issuer reporting (unreported fraud is invisible), and delay (pending chargebacks are missing at training time). Labels that do arrive may be corrupted by first-party misuse or issuer misclassification. A companion paper [arXiv:2605.27557] proved that these four impairments impose a minimax lower bound on detection performance. This paper asks: can that bound be achieved? We formalize the observation pipeline as a sequential missing-data problem with three propensity stages and a corruption layer, and construct the Sequential Triply Robust (STR) estimator. The STR corrects for all four impairments simultaneously and achieves the semiparametric efficiency bound -- no estimator can have lower asymptotic variance. It is sequentially triply robust: at each gate, consistency requires only that either the propensity model or the outcome regression is correctly specified, not both. We provide corruption correction via noise-rate-adjusted pseudo-labels, empirical Bayes shrinkage to stabilize inverse-propensity weights for small issuers, a plug-in variance estimator yielding valid confidence intervals, and a Bernstein concentration inequality for finite-sample guarantees. On the operational side, we derive the optimal training delay -- the maturity window that minimizes the sum of label-quality loss and model staleness -- and prove that the STR permits training on data that is days old rather than months old, decoupling model freshness from the chargeback maturity cycle. The STR provably dominates naive chargeback-based training in mean squared error for any sample size.

2605.29255 2026-05-29 stat.ME

Outcome-Calibrated Regression and Predicted Outcome-Based Inference

结果校准回归与基于预测结果的推断

Hwiyoung Lee, Shuo Chen

AI总结 针对OLS预测在结果条件上存在偏差的问题,提出结果校准回归(OCR)方法,通过直接强制结果校准消除条件预测偏差,实现基于预测结果的有效推断。

AI中文摘要

回归是科学研究中的基本工具。普通最小二乘法(OLS)是最广泛使用的回归方法之一,具有多个理想性质,包括最佳线性无偏估计量(BLUE)性质。众所周知,在标准模型假设下,OLS在给定协变量时条件无偏,即 $\mathbb{E}(\widehat Y-Y\mid X=x)=0$。然而,OLS一个常被忽视的性质是,预测误差在给定结果时通常不是条件无偏的,即 $\mathbb{E}(\widehat Y-Y\mid Y=y)\neq 0$。由于最小化均方误差,OLS预测系统地朝向结果均值收缩,这解释了经典的均值回归(RTM)现象:大的结果值倾向于被低估,而小的结果值倾向于被高估。这种条件预测偏差给基于预测结果的推断带来了不可忽略的问题,其中科学推断使用预测结果 $\widehat Y$ 和另一个变量 $W$ 进行。在脑年龄分析和因果推断等应用中,我们表明基于回归预测结果的推断可能存在系统性偏差。为解决这一问题,我们提出结果校准回归(OCR),一种新的回归框架,具有闭式解,直接强制结果校准。所提出的OCR估计量消除了关于结果的条件预测偏差,并使得使用回归预测结果进行有效推断成为可能。

英文摘要

Regression is a fundamental tool in scientific research. Ordinary least squares (OLS), one of the most widely used regression methods, enjoys several desirable properties, including the best linear unbiased estimator (BLUE) property. It is well known that, under the assumptions of the standard model, the OLS is conditionally unbiased given the covariates, i.e., $\mathbb{E}(\widehat Y-Y\mid X=x)=0$. However, an often-overlooked property of OLS is that the prediction error is generally not unbiased conditional on the outcome, i.e., $\mathbb{E}(\widehat Y-Y\mid Y=y)\neq 0$. As a consequence of minimizing mean squared error, OLS predictions are systematically shrunk toward the outcome mean, which explains the classical phenomenon of regression to the mean (RTM): large outcome values tend to be underpredicted, whereas small outcome values tend to be overpredicted. This conditional prediction bias creates a nonignorable problem for predicted outcome-based inference, where scientific inference is performed using the predicted outcome $\widehat Y$ and another variable $W$. In applications such as brain-age analysis and causal inference, we show that inference based on regression-predicted outcomes can be systematically biased. To address this issue, we propose outcome-calibrated regression (OCR), a new regression framework with a closed-form solution that directly enforces outcome calibration. The proposed OCR estimator eliminates conditional prediction bias with respect to the outcome and enables valid inference using regression-predicted outcomes.
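A small simulation can make the shrinkage effect described in this abstract concrete. The sketch below is illustrative only and is not the authors' OCR estimator: it fits a one-covariate OLS on synthetic data (the data-generating values and the function name `ols_shrinkage_demo` are assumptions) and regresses the fitted values on the outcomes. For OLS with an intercept, that slope equals the in-sample $R^2$, so whenever $R^2<1$ the largest outcomes are underpredicted on average and the smallest are overpredicted.

```python
import random


def ols_shrinkage_demo(n=2000, beta=1.0, noise_sd=1.0, seed=0):
    """Fit y = a + b*x by OLS on synthetic data (illustrative values only)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, noise_sd) for xi in x]

    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)

    b = sxy / sxx    # OLS slope
    a = my - b * mx  # OLS intercept
    yhat = [a + b * xi for xi in x]

    # Slope of the fitted values regressed on the outcomes; equals R^2 for OLS.
    slope = sum((fi - my) * (yi - my) for fi, yi in zip(yhat, y)) / syy
    r2 = sxy ** 2 / (sxx * syy)

    # Average prediction error (yhat - y) in the bottom and top 10% of outcomes.
    order = sorted(range(n), key=lambda i: y[i])
    k = n // 10
    bottom_bias = sum(yhat[i] - y[i] for i in order[:k]) / k
    top_bias = sum(yhat[i] - y[i] for i in order[-k:]) / k
    return {"slope": slope, "r2": r2, "top_bias": top_bias, "bottom_bias": bottom_bias}


if __name__ == "__main__":
    print(ols_shrinkage_demo())
```

In this sketch `top_bias` comes out negative and `bottom_bias` positive, which is the regression-to-the-mean pattern the abstract describes.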

2605.29249 2026-05-29 stat.ML cs.LG

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

跨任务预测驱动推理在AI评估与社会科学研究中的应用

Nicolas Emmenegger, Ellery Stahler, Chara Podimata

AI总结 提出多任务预测驱动推理框架,通过跨任务重校准利用共享结构,在标签稀缺时提升统计推断效率,并证明非线性结构是跨任务增益的必要条件。

AI中文摘要

许多应用需要在多个相关任务中进行统计上有效的推断,而每个假设只使用少量高质量标签。在AI评估中,这些任务可能对应于不同提示、子群体或假设下的模型行为;在社会科学调查中,它们可能对应于相关问题、群体或测量条件。预测驱动推理(PPI)利用丰富但廉价的代理测量来改进有限真实标签的推断,但常用方法独立处理任务,因此未能利用相关任务间的共享结构。这一限制在每任务仅有少量标签的场景中尤为重要。为解决此问题,我们引入了一个多任务预测驱动推理框架,该框架利用来自相关任务的标记数据来提高统计功效,同时保留任务特定的推断。我们的方法通过跨任务重校准来利用代理-真实关系中的共享结构,同时保留任务内修正和功效调优,以构建精确的点估计和置信区间。我们证明,只有当代理-真实关系包含非线性结构时,才能实现超越功效调优PPI的效率提升;仿射跨任务重校准在渐近意义上等同于使用原始代理。我们通过合成和半合成数据集上的实验,以及2024年美国总统大选期间审计语言模型关于选举相关信息的案例研究,补充了我们的理论发现。利用一项大型人工标注研究,我们表明当标签稀缺时,跨任务重校准可以显著减少置信区间宽度。

英文摘要

Many applications require statistically valid inference across many related tasks, while using only a handful of high-quality labels per hypothesis. In AI evaluation, these tasks may correspond to model behaviors across prompts, subgroups, or hypotheses; in social science surveys, they may correspond to related questions, populations, or measurement conditions. Prediction-powered inference (PPI) uses abundant but inexpensive proxy measurements to improve inference from limited, ground-truth labels, but commonly used methods treat tasks independently and therefore fail to exploit shared structure across related tasks. This limitation is especially important in settings where only a small number of labels are available per task. To address this issue, we introduce a multi-task prediction-powered inference framework that uses labeled data from related tasks to improve power while preserving task-specific inference. Our methods exploit the shared structure in the proxy-ground-truth relationship through cross-task recalibration, while retaining within-task rectification and power tuning to construct accurate point estimates and confidence intervals. We prove that efficiency gains beyond power-tuned PPI are only possible when the proxy-ground-truth relationship contains nonlinear structure; affine cross-task recalibrations are asymptotically equivalent to using the original proxy. We complement our theoretical findings with experiments on synthetic and semi-synthetic datasets, as well as a case study auditing language models on election-related information during the 2024 U.S. presidential election. Using a large human-annotation study, we show that cross-task recalibration can substantially reduce confidence interval widths when labels are scarce.
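For readers unfamiliar with the single-task baseline this abstract builds on, the following is a minimal sketch of a power-tuned prediction-powered estimate of a mean. It does not implement the multi-task recalibration proposed in the paper; the function name `ppi_mean` and its arguments are hypothetical, and the tuning weight uses the standard variance-minimizing choice for mean estimation, $\lambda = \operatorname{Cov}(Y,f) / ((1+n/N)\operatorname{Var}(f))$.

```python
import statistics


def ppi_mean(y_lab, f_lab, f_unlab, lam=None):
    """Prediction-powered estimate of E[Y]; returns (estimate, lam).

    y_lab   : ground-truth labels on the small labeled set (size n)
    f_lab   : proxy predictions on the same labeled items
    f_unlab : proxy predictions on the large unlabeled set (size N)
    lam     : weight on the proxy; None selects the variance-minimizing value
    """
    n, N = len(y_lab), len(f_unlab)
    if lam is None:
        # Power tuning, estimated from the labeled set.
        my, mf = statistics.fmean(y_lab), statistics.fmean(f_lab)
        cov = sum((y - my) * (f - mf) for y, f in zip(y_lab, f_lab)) / (n - 1)
        lam = cov / ((1.0 + n / N) * statistics.variance(f_lab))
    # Labeled mean plus a proxy-based correction.
    correction = statistics.fmean(f_unlab) - statistics.fmean(f_lab)
    return statistics.fmean(y_lab) + lam * correction, lam
```

Setting `lam=0.0` recovers the plain labeled-sample mean, and `lam=1.0` recovers the classical estimator (the unlabeled proxy mean plus the labeled rectifier).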

2605.29222 2026-05-29 stat.ME

Valid and efficient possibilistic fusion

有效且高效的可能性融合

Leonardo Cella

AI总结 针对推断模型(IM)框架下的可能性测度融合问题,提出一种基于排序-验证构造的通用有效性保持框架,并研究在独立性、任意依赖性和可交换性下的实现,揭示效率考量。

Comments
28 pages, 7 figures
AI中文摘要

除了融合多源证据的经典动机外,基于随机化、重抽样和数据分割的现代推断过程常常引入分析师生成的多重性,其中跨随机实现聚合输出可以提高稳健性和稳定性。这凸显了制定有原则的策略的重要性:在不同推断设置下融合证据测度,同时保持所采用推断框架的关键属性。本文在推断模型(IMs)的背景下解决这一问题,IMs是一种用于可证明有效统计推断的可能性方法。尽管可能性测度的融合在可能性理论文献中已被广泛研究,但现有方法通常不能保持IM的有效性。我们提出一个通用的保持有效性的可能性融合框架,其动机来源于IMs背后的排序-验证构造。我们研究了该框架在可用IMs的独立性、任意依赖性和可交换性下的实现,从而为广泛的实际相关场景中的IM融合提供了统一的方法。所提出的框架还揭示了重要的效率考量,表明直观且常用的融合算子在IM背景下可能变得低效,因此替代选择有时可能更有利,包括那些从纯直观角度看起来不自然的选择。

英文摘要

Besides the classical motivation of fusing evidence from multiple sources, modern inferential procedures based on randomization, resampling, and data splitting often introduce analyst-generated multiplicity, where aggregating outputs across random realizations can improve robustness and stability. This emphasizes the importance of developing principled strategies for fusing measures of evidence across different inferential settings, while preserving the key properties of the adopted inferential framework. The present paper addresses this problem in the context of inferential models (IMs), a possibilistic approach for provably valid statistical inference. Although the fusion of possibility measures has been extensively studied in the possibility-theory literature, existing methods do not, in general, preserve IM validity. We propose a general validity-preserving framework for possibilistic fusion, motivated by the ranking--validification construction underlying IMs. We study the implementation of this framework under independence, arbitrary dependence, and exchangeability of the available IMs, thereby providing a unified approach for IM fusion across a broad range of practically relevant scenarios. The proposed framework also reveals important efficiency considerations, showing that intuitive and commonly used fusion operators may become inefficient in the IM context, so that alternative choices can sometimes be advantageous, including ones that might not appear natural from a purely intuitive standpoint.

2605.29196 2026-05-29 stat.AP

Coating Breakdown Prediction for Ships and Inspection Planning

船舶涂层破损预测与检查规划

Huy Truong-Ba, Michael E. Cholette, Geoffrey Will, Marc Hartmann

AI总结 采用幂律非齐次泊松过程(PL-NHPP)和分层贝叶斯方法,解决数据稀缺下船舶涂层缺陷预测问题,并优化检查规划以降低生命周期成本。

AI中文摘要

海洋腐蚀显著降低船舶可用性,增加运营成本并可能影响安全。防护涂层可缓解这些风险,但其有效性随时间恶化。早期检测涂层破损对于防止昂贵的维修和安全问题至关重要。虽然腐蚀本身已被充分理解,但由于缺乏长期数据,涂层退化仍研究不足。本文通过增强涂层缺陷预测和优化船舶检查规划来填补这一知识空白。采用幂律非齐次泊松过程(PL-NHPP)对涂层缺陷到达进行建模。与先前研究不同,我们采用分层贝叶斯方法进行参数拟合,有效解决了与稀缺真实数据相关的局限性。此外,我们通过考虑停运成本和延迟维修导致的潜在成本增加来优化检查规划。通过一项涉及最近投入使用且历史数据有限的船队的综合案例研究,评估了这些方法的有效性。本研究通过实现更准确的涂层破损预测和在船队寿命早期优化检查计划,推动了船舶基于状态的维护(CBM)策略的发展。该方法最终提高了运营效率并降低了生命周期成本。

英文摘要

Marine corrosion significantly reduces a ship's availability, increases operating costs, and could impact safety. Protective coatings mitigate these risks, but their effectiveness deteriorates over time. Early detection of coating breakdown is crucial to prevent costly repairs and safety concerns. While corrosion itself is well-understood, coating degradation remains under-investigated due to insufficient long-term data. This work addresses this knowledge gap by enhancing coating defect prediction and optimizing inspection planning for ships. The Power Law Non-Homogeneous Poisson Process (PL-NHPP) is utilized for modeling coating defect arrivals. Unlike prior studies, we employ a hierarchical Bayesian approach for parameter fitting, effectively addressing limitations associated with scarce real-world data. Furthermore, we optimize inspection planning by incorporating out-of-service costs and potential cost increases due to delayed repairs. The efficacy of these methods is evaluated through a comprehensive case study involving a recently commissioned fleet with limited historical data. This research contributes to the advancement of condition-based maintenance (CBM) strategies for ships by enabling more accurate prediction of coating breakdowns and optimizing inspection schedules early in the life of the fleet. This approach ultimately improves operational efficiency and reduces life-cycle costs.
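As a rough illustration of the defect-arrival model named in this abstract, the sketch below simulates a power-law NHPP under one common parametrization, with intensity $\lambda(t) = (\beta/\eta)(t/\eta)^{\beta-1}$ and cumulative intensity $\Lambda(t) = (t/\eta)^{\beta}$. The parametrization, the parameter values, and the function names are assumptions; the paper's hierarchical Bayesian fitting and inspection optimization are not reproduced here.

```python
import random


def expected_defects(t, beta, eta):
    """Cumulative intensity Lambda(t) = (t/eta)**beta of the power-law NHPP."""
    return (t / eta) ** beta


def simulate_defect_times(horizon, beta, eta, rng):
    """Draw arrival times on [0, horizon] by time-transforming a unit-rate
    homogeneous Poisson process: T_i = eta * S_i**(1/beta)."""
    times = []
    s = rng.expovariate(1.0)  # first arrival of the unit-rate process
    while True:
        t = eta * s ** (1.0 / beta)
        if t > horizon:
            return times
        times.append(t)
        s += rng.expovariate(1.0)  # next unit-rate arrival
```

With $\beta > 1$ the intensity grows over time, which is how this model family represents a coating whose defects arrive faster as it ages.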

2605.29193 2026-05-29 stat.AP

Bayesian reversal of the liquid level trajectory in a draining tank for pollution forensics

用于污染溯源的排水罐中液位轨迹的贝叶斯反演

Kyla D. Jones, Gbenga Fabusola, Alexander W. Dowling, Cory M. Simon

AI总结 针对污染事件中未知初始液位的反问题,提出基于贝叶斯统计反演的框架,结合托里拆利定律物理模型和经验偏差函数,从最终液位和排水时长推断初始液位并量化不确定性。

AI中文摘要

危险液体储罐在工业和农业中很常见。在污染事件中,液体可能通过小孔、裂缝或管道从储罐中排出。控制泄漏后,估算排出的液体体积对于公共安全、监管评估和修复至关重要。当原始库存未知时,这构成一个反问题。在这项工作中,我们提出了一个框架,用于从污染事件后观察到的最终液位和排水时长的估计值推断部分排空储罐中的初始液位。由于排水动力学、模型参数和观测值存在不确定性,我们采用贝叶斯统计反演将先验物理知识与实验液位时间序列数据相结合,以预测具有量化不确定性的初始液位。我们使用基于托里拆利定律的物理模型来描述储罐排水动力学,并通过经验偏差函数对其进行增强,以解释缺失或不完美建模的物理过程。在我们用水进行储罐排水的实验中,我们发现推断的初始液位是准确的,尽管不确定性随着排水时长的增加而增加。除了应用于污染溯源外,这项工作还可以作为动手课堂项目,说明动态建模、模型偏差和贝叶斯推断。

英文摘要

Storage tanks for hazardous liquids are common in industry and agriculture. During a pollution incident, liquid may drain from a storage tank through a small hole, crack, or pipe. After containing the leak, estimating the discharged volume of liquid is essential for public safety, regulatory assessment, and remediation. When the original inventory of liquid is unknown, this constitutes an inverse problem. In this work, we present a framework for inferring the initial liquid level in a partially drained tank from the observed final liquid level after a pollution incident and an estimate of the drainage duration. Because the drainage dynamics, model parameters, and observations are uncertain, we employ Bayesian statistical inversion to combine prior physical knowledge with experimental liquid level time series data to predict the initial liquid level with quantified uncertainty. We use a physics-based model based on Torricelli's law to describe the tank-draining dynamics and augment it with an empirical discrepancy function to account for missing or imperfectly modeled physics. In our experiments draining water from a tank, we found that our inferred initial liquid level was accurate, although uncertainty increased with drainage duration. Beyond its application to pollution forensics, this work may also serve as a hands-on classroom project illustrating dynamic modeling, model discrepancy, and Bayesian inference.
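The inverse problem in this abstract has a closed form in the idealized case, which may help fix ideas. The sketch below assumes a tank of constant cross-section obeying $dh/dt = -c\sqrt{h}$, where the lumped constant `c` (orifice-to-tank area ratio times $\sqrt{2g}$) and all numeric values are assumptions. It then propagates Gaussian uncertainty in the drainage duration by plain Monte Carlo. This is not the authors' Bayesian inversion and it omits their empirical discrepancy function.

```python
import math
import random
import statistics


def forward_level(h0, t, c):
    """Level after draining for time t under dh/dt = -c*sqrt(h) (ideal Torricelli)."""
    root = max(math.sqrt(h0) - 0.5 * c * t, 0.0)  # tank stays empty once drained
    return root ** 2


def initial_level(h_final, duration, c):
    """Invert the closed-form solution: sqrt(h0) = sqrt(h_final) + c*duration/2."""
    return (math.sqrt(h_final) + 0.5 * c * duration) ** 2


def initial_level_samples(h_final, duration_mean, duration_sd, c, n=5000, seed=0):
    """Propagate Gaussian uncertainty in the drainage duration to h0."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        duration = max(rng.gauss(duration_mean, duration_sd), 0.0)
        samples.append(initial_level(h_final, duration, c))
    return samples
```

Because $\partial h_0 / \partial T = c(\sqrt{h_f} + cT/2)$ grows with the duration $T$, the same timing error produces a wider spread in the inferred initial level for longer drainage, consistent with the trend the abstract reports.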

2605.29189 2026-05-29 math.ST stat.ME stat.ML stat.TH

Bayesian Multiplicity Correction in the Probabilistic Forward Stepwise Framework

概率前向逐步框架中的贝叶斯多重校正

Andrew Womack, Daniel Taylor-Rodriguez

AI总结 本文在回归问题的概率前向逐步模型空间先验表示中,提出一种自然的贝叶斯多重校正先验分布,通过类比Holm过程实现,并与Matryoshka doll先验行为一致,论证了多重校正为回归模型空间先验设定提供了原则性且透明的标准。

Comments
2 Figures
AI中文摘要

我们在回归问题的概率前向逐步模型空间先验表示中,开发了一种自然的贝叶斯多重校正先验分布。所提出的先验通过类比Holm过程得到,其行为与Matryoshka doll先验紧密一致。我们将这两种先验与其他几种先验进行比较,包括最近作为模型空间先验概率客观选择提出的先验。我们的比较表明,充分的多重校正需要一定程度的稀疏性,而许多推荐的先验并未提供这种稀疏性;我们论证了多重校正本身为回归中模型空间先验的设定提供了原则性且透明的标准。

英文摘要

We develop a natural Bayesian multiplicity-correcting prior distribution within the probabilistic forward stepwise representation of model space priors for regression problems. The proposed prior, obtained from making an analogy to the Holm procedure, exhibits behavior closely aligned with that of the Matryoshka doll prior. We compare both priors to several other priors, including some recently put forward as objective choices for model space prior probabilities. Our comparisons indicate that adequate multiplicity correction requires a degree of sparsity that many recommended priors do not provide, and we argue that multiplicity correction itself offers a principled and transparent criterion for specifying model space priors in regression.

2605.29182 2026-05-29 stat.ME

A Latent Variable Model for Response Times with Individual-Specific Change-Points

具有个体特异性变点的响应时间潜变量模型

Gabriel Wallin, Nivedita Bhaktha

AI总结 提出一种包含个体特异性变点的对数响应时间潜变量模型,通过边际最大似然估计参数,并利用后验分布量化个体变点位置的不确定性。

AI中文摘要

在计算机化测试中收集的响应时间提供了关于潜在响应过程的信息,并且可能在测试过程中表现出个体内变异。我们提出了一种对数响应时间的潜变量模型,该模型包含个体特异性变点。该模型扩展了对数正态响应时间模型,允许在未观测到的变点之后均值结构发生项目特异性偏移。变点被视为离散潜变量,其分布被建模为潜在速度的函数。使用边际最大似然进行估计。该框架为变点位置提供了后验分布,允许在个体水平上量化不确定性,并支持变点效应参数的统计推断。一项模拟研究考察了在不同边界条件、变点发生率、样本量和测试长度下的参数恢复和变点估计。结果表明项目和结构参数恢复准确。所提出的模型为建模具有个体内行为变化的响应时间提供了一种统一的方法。

英文摘要

Response times collected in computerised assessments provide information about the underlying response process and may exhibit within-person variation over the course of a test. We propose a latent variable model for log response times that incorporates individual-specific change-points. The model extends the log-normal response time model by allowing an item-specific shift in the mean structure after an unobserved change-point. The change-point is treated as a discrete latent variable, and its distribution is modeled as a function of latent speed. Estimation is carried out using marginal maximum likelihood. The framework yields posterior distributions for change-point locations, allowing uncertainty to be quantified at the individual level, and supports statistical inference for the change-point effect parameters. A simulation study examines parameter recovery and change-point estimation under varying boundary conditions, prevalence of changers, sample sizes, and test lengths. The results show accurate recovery of item and structural parameters. The proposed model provides a unified approach to modeling response times with within-person changes in behaviour.

2605.29180 2026-05-29 stat.CO

Neural Posterior Estimation for Spatial Individual-Level Epidemic Models

空间个体级流行病模型的神经后验估计

Yicheng Mao, Rob Deardon

AI总结 针对空间个体级流行病模型,提出使用神经后验估计(NPE)进行摊销贝叶斯推断,通过条件归一化流直接近似后验,避免推断时的似然计算,并比较了两种嵌入架构,其中图神经网络(GNN)嵌入在空间传播参数上误差更低且可信区间更窄。

AI中文摘要

空间个体级模型(ILMs)为已知位置的种群间传染病传播建模提供了灵活的框架。这些模型的贝叶斯推断依赖于马尔可夫链蒙特卡洛(MCMC),这需要重复的似然评估,并且当流行病轨迹部分未观测时,需要对高维潜在变量进行数据增强采样。这种计算成本限制了MCMC在大种群和需要跨多次暴发进行推断的场景中的适用性。我们提出使用神经后验估计(NPE)进行空间ILMs的摊销贝叶斯推断。NPE在模拟数据上训练条件归一化流以直接近似后验,从而在推断时绕过似然计算。我们比较了两种嵌入架构:一种是在种群水平发病率曲线上操作的卷积神经网络(CNN),另一种是在个体水平感染和位置数据上操作的图神经网络(GNN)。在完全观测、随机移除和部分观测下的模拟研究中,两种变体都产生了校准良好的后验,其中GNN嵌入在空间传播参数上产生更低的误差和更窄的可信区间。我们将该框架应用于2001年英国口蹄疫暴发中1177个农场位置的空间SEIR模型。GNN-NPE保持了校准的覆盖范围,并且在每次流行病的基础上比MCMC快得多。

英文摘要

Spatial individual-level models (ILMs) provide a flexible framework for modelling infectious disease transmission across populations with known locations. Bayesian inference for these models relies on Markov chain Monte Carlo (MCMC), which requires repeated likelihood evaluation and, when parts of the epidemic trajectory are unobserved, data-augmented sampling over high-dimensional latent variables. This computational cost limits the applicability of MCMC to large populations and to settings requiring inference across multiple outbreaks. We propose using neural posterior estimation (NPE) for amortised Bayesian inference in spatial ILMs. NPE trains a conditional normalising flow on simulated data to approximate the posterior directly, bypassing likelihood evaluation at inference time. We compare two embedding architectures: a convolutional neural network (CNN) operating on the population-level incidence curve and a graph neural network (GNN) operating on individual-level infection and location data. In a simulation study under full observation, stochastic removals, and partial observation, both variants produce well-calibrated posteriors, with the GNN embedding yielding lower error and narrower credible intervals for the spatial transmission parameters. We apply the framework to a spatial SEIR model on 1,177 farm locations from the 2001 UK foot-and-mouth disease outbreak. GNN-NPE maintains calibrated coverage and is substantially faster than MCMC on a per-epidemic basis.

2605.29152 2026-05-29 cs.LG math.OC stat.ML

Do Deep Networks Forget Initialization? A Forgetting-Time View of Practical Inductive Bias

深度网络会忘记初始化吗?实用归纳偏见的遗忘时间视角

Mohua Das, Pierfrancesco Beneventano, Shibshankar Dey, Gareth H. McKinkey, Tomaso Poggio

AI总结 通过引入初始化记忆度量,研究随机初始化对训练后预测器的影响,发现低学习率SGD保留初始化记忆而Adam族方法遗忘,且遗忘动力学与泛化正则化相关。

Comments
39 pages, 9 figures
AI中文摘要

随机初始化的神经网络在函数上诱导先验,但实践中使用的预测器仅在训练后产生。我们询问这种初始偏差有多少在训练流程中幸存。为了使问题可测量,我们引入初始化记忆:验证选择的预测器对随机初始化尺度的依赖性。我们在ResNet上进行了受控的CIFAR-10实验,其中初始化记忆已经尖锐地分离了训练机制。低学习率SGD可以在记住初始化的同时进行插值:在批大小$b=128$的ResNet-9上,尽管训练准确率$\ge99.5\%$,测试准确率在不同初始化尺度上变化$26.5$个百分点。这不是欠训练:将相同的低学习率机制扩展到$5{,}000$个epoch,差异基本不变。相比之下,Adam族方法在很大程度上消除了这种依赖性。当较大的学习率与显式$L_2$范数控制配对时,也可以使SGD遗忘初始化。我们根据遗忘的时间尺度解释这些发现:梯度流式动力学可以保留初始化记忆,而随机有限步效应、显式范数衰减和自适应预处理在由显式或隐式正则化大小控制的尺度上消除它。因此,训练网络的实用归纳偏见不仅仅是架构先验,而是经过训练流程遗忘动力学过滤后的架构先验;并且改善泛化的相同正则化器正是那些消除初始化记忆的。

英文摘要

Randomly initialized neural networks induce a prior over functions, but the predictor used in practice is produced only after training. We ask how much of this initial bias survives the training pipeline. To make the question measurable, we introduce initialization memory: the dependence of the validation-selected predictor on the scale of the random initialization. We perform controlled CIFAR-10 experiments on ResNets where initialization memory already sharply separates training regimes. Low-learning-rate SGD can interpolate while still remembering its initialization: on ResNet-9 with batch size $b=128$, test accuracy varies by $26.5$ percentage points across initialization scales despite $\ge99.5\%$ training accuracy. This is not undertraining: extending the same low-learning-rate regime to $5{,}000$ epochs leaves the spread essentially unchanged. In contrast, Adam-family methods largely erase the dependence. SGD can also be made to forget when larger learning rates are paired with explicit $L_2$ norm control. We interpret these findings in terms of the time scale of forgetting: gradient-flow-like dynamics can preserve initialization memory, whereas stochastic finite-step effects, explicit norm decay, and adaptive preconditioning erase it on scales governed by the size of explicit or implicit regularization. The practical inductive bias of a trained network is therefore not the architectural prior alone, but the architectural prior after being filtered by the forgetting dynamics of the training pipeline; and the same regularizers that improve generalization are precisely those that erase memory of initialization.

2605.29148 2026-05-29 cs.LG stat.ML

Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

私有随机决策理论在线学习的最优间隙相关遗憾

Tommaso Cesari, Roberto Colomboni

AI总结 针对完全信息、事件级纯差分隐私的随机决策理论在线学习,提出一种与时间范围无关的纯差分隐私算法,并证明遗憾界为O(log K / Δ_min + log K / ε)。

AI中文摘要

我们研究具有完全信息和事件级纯差分隐私的随机决策理论在线学习。Hu和Mehta在COLT上提出的一个开放问题要求确定在纯事件级差分隐私下,随机决策理论在线学习的最优间隙相关遗憾率。对于$K$个动作,损失在$[0,1]$中,且唯一最优动作与次优动作的间隙为$Δ_{\min}$,已知下界的阶为$ \frac{\log K}{\min\{Δ_{\min},\varepsilon\}} $,或等价地,在通用常数范围内,阶为\[ \frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon} \]。我们给出一个与时间范围无关的纯DP算法,并证明对于任意时间范围$T$,显式遗憾界\[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon}\right) \]成立。数值常数未优化。该算法将时间划分为指数增长大小的块,每个块内执行单个动作,并通过指数机制(应用于前一个块的数据无关随机前缀)选择下一个动作。随机前缀将块遗憾转化为所有前缀长度上softmax选择误差的和。单个熵势论证以代价$\log K/\varepsilon$控制所有隐私主导的大间隙动作。

英文摘要

We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks to determine the optimal gap-dependent regret rate for stochastic decision-theoretic online learning under pure event-level differential privacy. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $Δ_{\min}$, the known lower bound is of order $ \frac{\log K}{\min\{Δ_{\min},\varepsilon\}}, $ or equivalently, up to universal constants, of order \[ \frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{Δ_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.
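The selection step mentioned in this abstract relies on the exponential mechanism, which can be sketched in a few lines. The code below shows only the generic mechanism applied to cumulative losses, assuming that changing one event shifts any action's cumulative loss by at most `sensitivity`; the block schedule, the random prefix, and the regret analysis from the paper are not implemented, and the function names are hypothetical.

```python
import math
import random


def exponential_mechanism_probs(cum_losses, epsilon, sensitivity=1.0):
    """Selection probabilities proportional to exp(-epsilon * L_i / (2 * sensitivity))."""
    scale = epsilon / (2.0 * sensitivity)
    best = min(cum_losses)
    # Subtracting the minimum loss keeps the exponentials in a safe range.
    weights = [math.exp(-scale * (loss - best)) for loss in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(cum_losses, epsilon, rng, sensitivity=1.0):
    """Draw one action index from the exponential-mechanism distribution."""
    probs = exponential_mechanism_probs(cum_losses, epsilon, sensitivity)
    return rng.choices(range(len(cum_losses)), weights=probs, k=1)[0]
```

The probabilities form a softmax over scaled negative losses, which is why the abstract speaks of softmax selection errors: a smaller `epsilon` flattens the distribution and makes suboptimal actions more likely.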

2605.29139 2026-05-29 stat.ML cs.LG

Anytime-Valid Federated Conformal RAG for LLM Swarms

面向LLM群体的任意有效联邦共形RAG

Prasanjit Dubey, Xiaoming Huo

AI总结 提出Anytime-FC-RAG,通过可累积的逐步校准偏差预算和截断投注e过程,将联邦共形RAG扩展到任意停止时间均有效的序贯覆盖,并保证时间均匀报警有效性、Hoeffding拼接累积误覆盖包络及自适应控制下的安全性。

AI中文摘要

联邦共形RAG(FC-RAG)为带宽受限的弱语言模型群体提供了无分布假设的覆盖保证,但仅限于固定时间范围。我们将其扩展到任意有效序贯覆盖:在每个停止时间均有效,且在可预测自适应控制(重新校准、每节点带宽升级、蒸馏学生刷新)下保持不变,且无需比固定时间范围FC-RAG更多的假设。朴素组合失败,因为FC-RAG的边缘覆盖界使得投注e过程在不利校准抽取下成为非超鞅,无法调用Ville不等式。我们提出Anytime-FC-RAG,这是一种序贯扩展,基于可累加的逐步校准偏差预算,将边缘界转换为校准好事件上的严格条件界,并配以在整个概率空间上为非负超鞅的截断投注e过程。由这两个要素,我们获得四个保证:时间均匀报警有效性$\mathbb{P}(\sup_t E_t \ge 1/δ_e) \le δ_e + δ_{\mathrm{cal}}$,相同总预算下的Hoeffding拼接累积误覆盖包络,任何可预测控制器(重新校准、带宽升级、学生刷新)下的安全性,以及通过可累加训练预算在无界序列的联邦探针-逻辑蒸馏(FPLD)刷新上的训练侧误差传播。实际结果是,仅在e过程超过警告阈值时升级检索带宽的自适应控制器,以显著更低的通信成本匹配固定高带宽调度的报警率。在GPT-2-small + MiniLM群体上对MMLU、DBpedia和AG News的实验验证了预测的报警率、检测延迟、包络覆盖以及14%-57%的带宽节省;报警仅在覆盖真正失效时触发。

英文摘要

Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time, preserved under predictable adaptive control (recalibration, per-node bandwidth escalation, distilled-student refresh), at no extra cost in assumptions over fixed-horizon FC-RAG. Naive composition fails because FC-RAG's marginal coverage bound makes the betting e-process a non-supermartingale on adverse calibration draws, and Ville's inequality cannot be invoked. We give Anytime-FC-RAG, a sequential extension built on a summable per-step calibration-deviation budget that converts the marginal bound into a strict conditional bound on a calibration-good event, paired with a truncated betting e-process that is a nonnegative supermartingale on the entire probability space. From these two ingredients, we obtain four guarantees: time-uniform alarm validity $\mathbb{P}(\sup_t E_t \ge 1/δ_e) \le δ_e + δ_{\mathrm{cal}}$, a Hoeffding-stitched cumulative-miscoverage envelope at the same total budget, safety under any predictable controller (recalibration, bandwidth escalation, student refresh), and training-side error propagation across an unbounded sequence of Federated Probe-Logit Distillation (FPLD) refreshes via a summable training budget. As a practical consequence, an adaptive controller that escalates retrieval bandwidth only when the e-process crosses a warning threshold matches the alarm rate of a fixed-high-bandwidth schedule at substantially lower communication cost. Experiments on a GPT-2-small + MiniLM swarm across MMLU, DBpedia, and AG News verify the predicted alarm rate, detection delay, envelope coverage, and $14$-$57\%$ bandwidth savings; the alarm fires when and only when coverage genuinely breaks.
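To illustrate the kind of monitor this abstract refers to, here is a minimal sketch of a plain betting e-process over miscoverage indicators with an alarm at level $1/\delta$. It assumes a fixed bet `lam` in $[0, 1/\alpha]$ and a nominal miscoverage rate `alpha`; the truncation, the calibration-deviation budget, and the federated components of Anytime-FC-RAG are not reproduced, and the function names are hypothetical.

```python
def betting_e_process(miscovered, alpha, lam):
    """Running product E_t = prod_s (1 + lam * (Z_s - alpha)) with Z_s in {0, 1}."""
    if not 0.0 <= lam <= 1.0 / alpha:
        raise ValueError("lam must lie in [0, 1/alpha] to keep the process nonnegative")
    values, e = [], 1.0
    for z in miscovered:
        e *= 1.0 + lam * (z - alpha)  # grows on a miss, shrinks on a cover
        values.append(e)
    return values


def first_alarm(values, delta):
    """Index of the first step where E_t >= 1/delta, or None if no alarm fires."""
    for t, e in enumerate(values):
        if e >= 1.0 / delta:
            return t
    return None
```

When the conditional miscoverage probability stays at or below `alpha`, each factor has conditional mean at most one, so the product is a nonnegative supermartingale and Ville's inequality bounds the chance of ever raising an alarm by `delta`.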

2605.29112 2026-05-29 stat.ME

Efficient First-Order Methods for Estimating Generalized Additive Index Models

估计广义可加指标模型的高效一阶方法

Ziyu Peng, Linglingzhi Zhu, Yao Xie

AI总结 针对广义可加指标模型(GAIM)的序贯估计计算效率低的问题,提出基于基展开的梯度下降和变分不等式算法,实现同时估计,并证明收敛到稳定点,数值实验显示优于经典方法。

AI中文摘要

广义可加指标模型(GAIM)提供了一个灵活的半参数框架,用于捕捉复杂的数据关系,平衡了参数模型的可解释性和非参数方法的灵活性。然而,经典的GAIM逐阶段估计方法由于其序贯性质和对非参数平滑的依赖而遭受计算效率低下的问题。为了克服这些缺点,我们提出了高效的GAIM同时估计算法。通过利用基展开,我们将半参数估计任务转化为一个有限维优化问题,该问题可以通过一阶方法(如梯度下降(GD))求解。此外,我们引入了一种变分不等式(VI)估计算法,将VI框架从广义线性模型扩展到GAIM。我们为两种算法提供了统一的收敛到稳定点的结果。数值实验突出了我们的方法相对于经典逐阶段方法的计算和统计优势,并揭示了基于VI的方法在非规范链接函数上相对于GD的潜在优势。

英文摘要

Generalized additive index models (GAIMs) offer a flexible semiparametric framework for capturing complex data relationships, balancing the interpretability of parametric models with the flexibility of nonparametric approaches. However, classical stage-wise estimation procedures for GAIMs suffer from computational inefficiencies due to their sequential nature and reliance on nonparametric smoothing. To overcome these drawbacks, we propose efficient, simultaneous estimation algorithms for GAIMs. By leveraging basis expansion, we cast the semiparametric estimation task as a finite-dimensional optimization problem solvable by first-order methods such as gradient descent (GD). Furthermore, we introduce a variational inequality (VI) estimation algorithm, extending the VI framework from generalized linear models to GAIMs. We provide a unified convergence result to a stationary point for both algorithms. Numerical experiments highlight the computational and statistical advantages of our methods over classical stage-wise procedures, and reveal the potential benefits of the VI-based approach over GD for non-canonical link functions.

2605.29081 2026-05-29 stat.ME

Bayesian Inference of Mixing and Transmission Heterogeneity in Stratified Disease Surveillance Models

分层疾病监测模型中混合与传播异质性的贝叶斯推断

Miles Moran, Rob Trangucci, Lisa Madsen

AI总结 提出地方性-流行性(endemic-epidemic)模型的一种贝叶斯潜变量扩展,用于从分层疾病监测数据中推断未观测的个体传播性、疾病发生率与流行率的分离以及人口组间混合结构。

AI中文摘要

当传染病发病率的监测数据(如每周病例数)按人口统计指标分层时,这些群体之间长期健康结果的差异变得明显。准确识别高风险亚群将使政策制定者能够在流行病早期进行针对性干预;然而,疾病发病的时间模型通常缺乏对多变量(即亚群水平)结果的稳健处理。我们提出了一种新颖的贝叶斯潜变量扩展,用于通常为此目的使用的地方性-流行性(``EE'')建模框架。具体来说,我们通过明确表示未观测的个体水平传播性、明确分离疾病发病率和流行率以及参数化估计人口组间混合结构来增强EE模型类。得到的模型可以针对罕见疾病(高度地方性)或暴发驱动(高度流行性)背景进行定制,并且能够仅从发病率数据推断社会接触混合模式,包括多重分层数据中的混合模式。为了演示,我们进行了一项模拟研究,在预期的罕见疾病应用情境中将我们的模型与现有的双重分层EE模型进行比较。然后,我们将我们的推断与竞争对手对2011-2015年柏林诺如病毒胃肠炎实际发病率数据(按六个年龄组和十二个地理区域分层)的推断进行比较。最后,我们报告了我们的模型对疫情第一年密歇根州记录的COVID-19发病率(按六个年龄组和六十六个地理区域分层)的推断。

英文摘要

When surveillance data of infectious disease incidence (e.g. weekly case counts) are disaggregated by demographic indicators, disparities in long-run health outcomes between these groups become apparent. Accurate identification of high-risk subpopulations would enable policy-makers to target interventions early in an epidemic; however, temporal models of disease incidence typically lack robust treatment of multivariate (i.e. subpopulation-level) outcomes. We propose a novel Bayesian latent-variable extension of the endemic-epidemic (``EE'') modeling framework commonly used for this purpose. Specifically, we augment the EE model class with explicit representation of unobserved individual-level transmissibility; explicit separation of disease incidence and prevalence; and parametric estimation of between-demographic-groups mixing structure. The resulting model may be tailored for either rare-disease (highly-endemic) contexts or outbreak-driven (highly-epidemic) contexts, and is capable of inferring social contact mixing patterns from incidence data alone, including mixing patterns among multiply-stratified data. To demonstrate, we conduct a simulation study comparing our model to an existing doubly-stratified EE model in the intended rare-disease application regime. We then compare our inference to the competitor's for real incidence data of norovirus gastroenteritis in Berlin, 2011-2015, disaggregated by six age groups and twelve geographic regions. Finally, we report inference of our model on COVID-19 incidence recorded in Michigan during the first year of the pandemic, disaggregated by six age groups and sixty-six geographic regions.

2605.29066 2026-05-29 math.ST math.PR stat.TH

A scale-free density bound for Gaussian maxima

高斯最大值的一个无尺度密度界

Suhas Vijaykumar

AI总结 针对中心化高斯向量的最大值,推导了一个无尺度密度界,该界非均匀、对数依赖于维数且适用于任意协方差矩阵,并应用于高维假设检验和反集中性等。

AI中文摘要

我们推导了中心化高斯向量最大值密度的无尺度界。基本界是非均匀的,对数依赖于维数,并允许任意协方差矩阵。当最大边际方差与零分离时,它意味着最大值密度在所有大于2/3的分位数上被均匀控制,这对于许多假设检验应用是足够的;它在检验水平$α\le 1/3$下,无需进一步限制协方差,即可保证高维和最大值的高斯及自举近似的有效性。该结果还意味着均匀的反集中性界以及最大值方差的控制,具有最优的维数依赖性,用最大值的期望和最大边际方差表示。我们讨论了在高维相关性检验、时间均匀序贯检验以及潜在低维结构下的非参数推断中的应用。

英文摘要

We derive a scale-free bound on the density of the maximum of a centered Gaussian vector. The basic bound is non-uniform, depends logarithmically on the dimension, and allows any covariance matrix. When the largest marginal variance is separated from zero, it implies that the density of the maximum is uniformly controlled at all quantiles above 2/3, which is sufficient for many hypothesis testing applications; it yields validity of Gaussian and bootstrap approximations for maxima of high-dimensional sums at test levels $α\le 1/3$ without further restricting the covariance. The result also implies uniform anti-concentration bounds and control of the variance of the maximum with optimal dimension dependence, in terms of the expectation of the maximum and the largest marginal variance. We discuss implications for high-dimensional correlation testing, time-uniform sequential testing, and non-parametric inference under latent, low-dimensional structure.

2605.29032 2026-05-29 cs.LG stat.ML

Theoretical Foundations and Effective Algorithms for Policy-Aware Simulator Learning

策略感知模拟器学习的理论基础与有效算法

Christoph Dann, Yishay Mansour, Mehryar Mohri

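AI summary: To address simulator exploitation in model-based reinforcement learning, proposes strategic robustness as the learning objective, learns the simulator through a zero-sum minimax game, and provides theoretical guarantees and effective algorithms.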

详情
AI中文摘要

基于模型的强化学习(MBRL)智能体通常通过最小化预测损失来学习世界模型。然而,强大的RL优化器不可避免地会利用微小的模型不准确性,导致模拟器利用和现实差距,即策略在模拟中成功但在现实世界中失败。我们提出学习模拟器的目标应该是策略鲁棒性而非预测准确性,并将其形式化为模型玩家与对抗策略玩家之间的零和极小极大博弈。我们提供了全面的理论分析:(1)在线学习保证,表明该博弈是可学习的,具有次线性遗憾界;(2)一个可处理的基于评论家的简化,通过局部评论家的损失来界定全局策略价值差距;(3)误差-MDP对偶性,证明寻找最坏情况策略在形式上是标准RL问题的对偶,其中奖励是一步评论家误差。这种对偶性产生了一个可证明收敛的主动数据选择算法。在连续控制任务上的实验表明,我们的方法在策略重要区域将预测误差降低了1.5-2.2倍,并使完全在模拟中训练的策略能够匹配接近最优的现实世界性能。

英文摘要

Model-based reinforcement learning (MBRL) agents typically learn world models by minimizing predictive loss. However, powerful RL optimizers inevitably exploit minor model inaccuracies, leading to simulator exploitation and a reality gap where policies succeed in simulation but fail in the real world. We propose that the objective for learning simulators should be strategic robustness rather than predictive accuracy, and formulate this as a zero-sum minimax game between a model player and an adversarial policy player. We provide a comprehensive theoretical analysis: (1) an online learning guarantee showing the game is learnable with sublinear regret bounds; (2) a tractable critic-based simplification bounding the global policy-value gap by the local critic's loss; and (3) an Error-MDP duality, proving that finding the worst-case policy is formally dual to a standard RL problem where the reward is the one-step critic error. This duality yields a provably convergent active data selection algorithm. Experiments on continuous control tasks demonstrate that our approach reduces prediction error in strategically important regions by $1.5$-$2.2\times$ and enables policies trained purely in simulation to match near-optimal real-world performance.

2605.28974 2026-05-29 math.ST math.RT stat.AP stat.ME stat.TH

Algorithm to check Maximum Likelihood Estimate Existence for integrated PCA

检查集成PCA的最大似然估计存在性的算法

Dmitri Shmelkin

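AI summary: Building on the bridge between invariant theory and statistics and using quiver semi-invariant techniques, gives necessary and sufficient conditions for the maximum likelihood estimate to exist in the integrated PCA model, and provides easy-to-use software for checking them.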

详情
Comments
6 pages
AI中文摘要

受[AKRS](提供了统计学与不变理论之间的惊人桥梁)以及[FM](其中箭图半不变技术用于验证最近iPCA模型的MLE存在性)的启发,我们对[FM]进行了改进。我们的定理5.2给出了对于任意维数向量,MLE一般存在的充要条件。这些条件可以通过我们基于Derksen-Weyman算法的软件[T]轻松检查,简化了统计实践者和非箭图专家的应用。对于深入研究箭图表示理论的学者,定理5.2将MLE存在性与[Sh07]中引入的表示的局部半单性联系起来。我们也希望这篇基础而简短的文本能为两个领域的专家提供一个新类别的温暖起点。

英文摘要

Encouraged by [AKRS], which provides an amazing bridge between Statistics and Invariant Theory, and especially by [FM], where quiver semi-invariant techniques are applied to verify the existence of the MLE for a recent iPCA model, we provide an enhancement to [FM]. Our Theorem 5.2 yields necessary and sufficient conditions for the MLE to exist generically for any dimension vector. The conditions can be easily checked with our software [T], which is based on the Derksen-Weyman algorithm and simplifies the application for statistics practitioners and non-specialists in quivers. For those deep in quiver Representation Theory, Theorem 5.2 relates the MLE existence to the local semi-simplicity of representations as introduced in [Sh07]. We also hope that our elementary and short text can serve experts in both domains as a warm start in a new category.

2605.28961 2026-05-29 stat.ML cs.LG math.OC

Dynamics of Stochastic Momentum with Sparse Updates in High Dimensions

高维稀疏更新下随机动量的动力学

Katie Everett, Elliot Paquette

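AI summary: Using least squares and logistic regression models, theoretically analyzes momentum dynamics under sparse updates, reveals a phase structure governed by the ratio of the momentum retention timescale to the learning timescale, and finds a spectral conflict in the oscillatory dynamics across token sparsity levels.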

详情
AI中文摘要

现有的动量理论假设梯度以大致恒定的速率到达每个参数,但这一假设在重尾数据分布和现代架构中常被违反。我们理论分析了稀疏更新下两种可处理动量模型的动力学:具有稀疏输入的最小二乘模型和具有稀有类别的逻辑回归模型。两者都给出了精确的闭式二阶矩动力学,我们针对稀疏性、批量大小和动量衰减的三个标度指数刻画了其高维极限。两个问题上的相结构由两个内在时间尺度之比决定:动量保留时间尺度(缓冲区存活的活动更新次数)和学习时间尺度(减少平方误差所需的活动更新次数)。当学习远慢于保留时,极限匹配SGD;当学习更快时,系统不稳定;当时间尺度相当时,我们恢复经典的重球动力学。振荡动力学发生在不同令牌稀疏度的不同动量值处,从而在全局动量上产生跨令牌频率的谱冲突。

英文摘要

Existing theory of momentum assumes that gradients arrive at every parameter at a roughly constant rate, an assumption violated in practice by heavy-tailed data distributions and modern architectures. We theoretically analyze the dynamics of two tractable models of momentum under sparse updates: a least squares model with sparse inputs and a logistic regression model with a rare class. Both admit exact closed-form second-moment dynamics whose high-dimensional limits we characterize across three scaling exponents for sparsity, batch size, and momentum decay. The phase structure on both problems is governed by the ratio of two intrinsic timescales: a momentum retention timescale (how many active updates the buffer survives) and a learning timescale (how many active updates it takes to reduce the squared error). When learning is much slower than retention, the limit matches SGD; when learning is faster, the system is unstable; where the timescales coincide, we recover classical heavy-ball dynamics. The oscillatory dynamics occur at different momentum values for different token sparsity, creating a spectral conflict for global momentum across token frequencies.
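The interaction between momentum and sparse gradient arrivals can be sketched on a scalar quadratic. This is only an illustrative toy, not the least squares or logistic models analyzed in the paper: the function name, the loss 0.5 * theta**2, and the choice to decay the buffer on every step (including inactive ones) are all assumptions made here.

```python
import random

def sparse_heavy_ball(theta0, lr, beta, p_active, steps, seed=0):
    """Heavy-ball momentum on the loss 0.5 * theta**2 (hypothetical toy).

    On each step the coordinate receives its gradient only with
    probability p_active; otherwise the gradient is zero and the
    momentum buffer simply decays by the factor beta.
    """
    rng = random.Random(seed)
    theta, velocity = theta0, 0.0
    for _ in range(steps):
        active = rng.random() < p_active
        grad = theta if active else 0.0      # gradient of 0.5 * theta**2
        velocity = beta * velocity + grad    # momentum buffer update
        theta -= lr * velocity               # parameter step
    return theta

dense_result = sparse_heavy_ball(1.0, lr=0.1, beta=0.5, p_active=1.0, steps=200)
never_active = sparse_heavy_ball(1.0, lr=0.1, beta=0.5, p_active=0.0, steps=200)
sparse_result = sparse_heavy_ball(1.0, lr=0.1, beta=0.9, p_active=0.2, steps=200)
```

With `p_active=1.0` this is the classical heavy-ball iteration; lowering `p_active` makes active updates rarer relative to the buffer's decay, which is the kind of timescale comparison the abstract describes.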

2605.28920 2026-05-29 cs.LG cs.AI stat.ML

Conf-Gen: Conformal Uncertainty Quantification for Generative Models

Conf-Gen: 生成模型的共形不确定性量化

Gabriel Loaiza-Ganem, Kevin Zhang, Wei Cui, Marc T. Law, Kin Kwan Leung

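AI summary: Proposes the Conf-Gen framework, which adapts conformal risk control to generative tasks, unifies and extends applications of conformal prediction to generative models such as large language models, and provides formal guarantees in new domains including image generation, conversational AI, and AI agents.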

详情
Comments
ICML 2026
AI中文摘要

共形预测(CP)及其扩展共形风险控制(CRC)是通过形式化保证量化监督机器学习中不确定性的成熟框架。然而,人工智能(AI)的最新突破由无监督生成模型驱动,例如大型语言模型(LLMs)和图像生成器,这些模型与CP或CRC不直接兼容。在这项工作中,我们引入了共形生成(Conf-Gen),这是一个将CRC适配到生成任务同时放宽其理论假设的通用框架。Conf-Gen统一并泛化了先前将CP应用于LLMs的尝试,并将共形方法扩展到全新的领域。我们通过一些新颖的应用展示了Conf-Gen的灵活性,包括在以下方面获得共形保证:生成非记忆图像的图像生成器、提出足够澄清问题的对话AI系统,以及AI代理输出的正确性。

英文摘要

Conformal prediction (CP) and its extension, conformal risk control (CRC), are established frameworks for quantifying uncertainty in supervised machine learning through formal guarantees. However, recent breakthroughs in artificial intelligence (AI) have been driven by unsupervised generative models, such as large language models (LLMs) and image generators, which are not directly compatible with CP or CRC. In this work we introduce conformal generation (Conf-Gen), a general framework adapting CRC to generative tasks while relaxing its theoretical assumptions. Conf-Gen unifies and generalizes previous attempts to apply CP to LLMs, and extends conformal methodology to entirely new domains. We demonstrate the flexibility of Conf-Gen through some novel applications, including obtaining conformal guarantees on: image generators producing non-memorized images, conversational AI systems having asked enough clarifying questions, and the output of AI agents being correct.
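For orientation, the supervised starting point mentioned in the abstract, split conformal prediction, reduces to one order statistic of calibration scores. The sketch below shows that standard construction only; it is not Conf-Gen or conformal risk control, and the function name and example scores are hypothetical.

```python
import math

def conformal_quantile(calibration_scores, alpha):
    """Split-conformal threshold from nonconformity scores.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)).
    If the calibration and test scores are exchangeable, a new score falls
    at or below this threshold with probability at least 1 - alpha.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_scores)[k - 1]

scores = [0.3, 0.1, 0.2]  # hypothetical calibration scores
threshold_half = conformal_quantile(scores, alpha=0.5)
threshold_quarter = conformal_quantile(scores, alpha=0.25)
threshold_tiny = conformal_quantile(scores, alpha=0.125)
```

A prediction set then contains every candidate output whose score is at most the threshold; an infinite threshold means the set cannot be restricted at the requested level.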

2605.28894 2026-05-29 math.OC stat.ML

Saddle Networks: Structure-Preserving Architectures for Convex-Concave Functions

鞍点网络:凸-凹函数的结构保持架构

Xavier Warin

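AI summary: Proposes a structure-preserving separable decomposition, proves a one-dimensional approximation theorem under a mixed Monge-type convexity condition, and designs practical saddle network architectures that preserve convex-concave geometry by construction, achieving high accuracy on smooth, nonsmooth, and high-rank convex-concave test functions.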

详情
AI中文摘要

鞍点模型出现在优化、最优传输、鲁棒学习和控制等多个领域。在许多应用中,相关函数 f(x,y) 关于 x 凸、关于 y 凹,保持这种几何结构对于获得可处理的极小-极大公式和可靠保证至关重要。我们引入一种结构化的分离分解方法,以保持凸-凹几何,并在混合Monge型凸性条件下证明了一个完整的一维逼近定理。然后,我们描述了实用的鞍点网络架构,这些架构通过构造保持关于 x 的凸性和关于 y 的凹性。所提出的架构仅需要保持凸性的神经网络,以及施加符号和凹性约束的简单输出变换。最后,我们报告了在1维和5维上的数值基准测试,表明所提出的鞍点网络在光滑、非光滑和高秩凸-凹测试函数上实现了高精度。

英文摘要

Saddle-point models arise throughout optimization, optimal transport, robust learning, and control. In many applications, the relevant function f(x,y) is convex in x and concave in y, and preserving this geometry is essential for obtaining tractable min-max formulations and reliable certificates. We introduce a structured separable decomposition that preserves the convex-concave geometry and prove a complete one-dimensional approximation theorem under a mixed Monge-type convexity condition. We then describe practical saddle network architectures that preserve convexity in x and concavity in y by construction. The proposed architectures require only convexity-preserving neural networks, together with simple output transformations enforcing sign and concavity constraints. Finally, we report numerical benchmarks in dimensions 1 and 5, showing that the proposed saddle networks achieve high accuracy on smooth, nonsmooth, and high-rank convex-concave test functions.

2605.28880 2026-05-29 cs.LG physics.data-an stat.ME

Towards Continuous-time Causal Foundation Models

迈向连续时间因果基础模型

Dennis Thumm, Ruben Wiedemann, Ying Chen

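AI summary: Proposes a continuity criterion, invariance of the trajectory law to the observation schedule, realizes a continuous-time causal prior through fine-grid integration with decoupled observation, and shows on linear and nonlinear priors that it outperforms naive observation-grid integration.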

详情
Comments
ICML 2026 2nd Workshop on Foundation Models for Structured Data (FMSD)
AI中文摘要

将时间序列的离散时间因果先验数据拟合网络扩展到连续时间,需要将机制写为随机微分方程(SDE)——但如果SDE在每个观测间隔内只积分一次,轨迹律依赖于观测时间,先验仍然是披着SDE外衣的离散时间马尔可夫模型。我们提出了一个精确的连续性准则——轨迹律对观测时间表的不变性——以及一个三层分类法(离散;朴素观测网格积分;细网格积分与解耦观测),并在具有OU或小型MLP非线性漂移、不规则观测时间表以及硬/软/时变干预的随机DAG上实现了顶层。一个2×2编码器×积分器消融实验,在线性和非线性先验上独立运行,发现细网格积分在8/8个单元上优于朴素积分(符号一致性p<1/256),且随着评估网格细化差距增大;编码器轴在细积分下无效,而在朴素积分下具有时间感知优势。我们发布了该先验以及一个在药代动力学和物理系统数据上的初步零样本协议。

英文摘要

Extending discrete-time causal Prior-data Fitted Networks for time series to continuous time invites writing the mechanism as a stochastic differential equation (SDE). But if the SDE is integrated once per observation gap, the trajectory law depends on when it is observed, and the prior remains a discrete-time Markov model in SDE clothing. We propose a precise continuity criterion, namely trajectory-law invariance to the observation schedule, together with a three-tier taxonomy (discrete; naive observation-grid integration; fine-grid integration with decoupled observation) and a construction realising the top tier on a random DAG with OU or small-MLP nonlinear drifts, irregular observation schedules, and hard / soft / time-varying interventions. A $2 \times 2$ encoder $\times$ integrator ablation, run independently on a linear and a nonlinear prior, finds fine-grid integration beats naive on 8/8 cells (sign-consistency $p < 1/256$) with the gap growing as the eval grid refines; the encoder axis is null with fine integration but time-aware-leading with naive. We release the prior and a preliminary zero-shot protocol on pharmacokinetic and physical-system data.

2605.28488 2026-05-29 stat.ML cs.LG math.ST stat.TH

Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models

桥接最大似然与最优传输:随机块模型中的高效推理与模型选择

Simon Queric, Cédric Vincent-Cuaz, Charles Bouveyron, Marco Corneli

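AI summary: Studies stochastic block models through the lens of optimal transport and proposes regularized and unregularized semi-relaxed Gromov-Wasserstein estimators that jointly infer the clustering and the model parameters and automatically select the number of clusters.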

详情
Comments
10 pages, 8 figures
AI中文摘要

我们通过最优传输(OT)的视角研究随机块模型(SBM)中的推断。首先,我们证明最大似然变分推断(MLVI)可以解释为带有熵正则化的半松弛Gromov-Wasserstein(srGW)投影。虽然这种公式能产生准确的聚类,但熵正则化阻止了传输计划的稀疏性,从而阻碍了内在的模型选择。因此,我们研究未正则化的srGW估计器,并证明它们在渐近情况下一致地恢复SBM连接矩阵和潜在簇分配。然而,这种渐近性质在有限样本中并不能转化为可靠的模型选择,需要额外的机制来促进推断的簇比例中的稀疏性。我们通过实验表明,这种正则化公式产生的估计器能够在单个优化问题中同时恢复模型参数并选择簇的数量,从而避免了昂贵的网格搜索或启发式模型选择程序。

英文摘要

We study inference in stochastic block models (SBMs) through the lens of optimal transport (OT). We first establish that maximum likelihood variational inference (MLVI) can be interpreted as a semi-relaxed Gromov-Wasserstein (srGW) projection with entropic regularization. While this formulation yields accurate clustering, the entropic regularization prevents transport plans from being sparse, hindering intrinsic model selection. Consequently, we investigate unregularized srGW estimators, and prove that they consistently recover both the SBM connectivity matrix and latent cluster assignments in the asymptotic regime. However, this asymptotic property does not translate into reliable model selection in finite samples, and calls for additional mechanisms to promote sparsity in the inferred cluster proportions. We empirically show that such a regularized formulation yields estimators that simultaneously recover model parameters and select the number of clusters in a single optimization problem, thereby avoiding costly grid search or heuristic model selection procedures.

2605.28341 2026-05-29 stat.ME math.ST stat.TH

Identification and Inference for Structural Accelerated Failure Time Models via Instrument Interactions

通过工具变量交互作用的结构性加速失效时间模型的识别与推断

Qiushi Bu, Wen Su, Xinyu Zhang, Xingqiu Zhao, Zhonghua Liu

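AI summary: For right-censored time-to-event outcomes with unmeasured confounding, proposes an identification and inference framework that exploits interactions among instrumental variables, does not require classical instrument validity, and achieves robust inference using augmented inverse probability censoring weighting and generalized empirical likelihood.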

详情
AI中文摘要

我们研究在存在未测量混杂的情况下,右删失时间-事件结局的因果推断。聚焦于结构性加速失效时间模型,我们开发了一个利用工具变量交互作用的识别和推断框架。所提出的方法不依赖于经典工具变量有效性,并在有效和无效工具变量下均能产生有效的因果推断,前提是交互作用识别条件成立。为处理右删失,我们使用增强逆概率删失加权方法构建了一个删失调整的观测数据矩函数。该矩函数对 nuisance 函数具有 Neyman 正交性,并具有双重稳健性,从而在灵活的 nuisance 估计下实现有效推断。使用广义经验似然进行估计和推断,该方法适用于具有许多潜在弱交互作用矩条件的情形。我们在许多弱矩渐近条件下建立了相合性和渐近正态性,并开发了诊断工具来评估交互作用识别强度和过度识别限制。模拟研究展示了在各种删失率和工具配置下良好的有限样本性能。对英国生物银行数据的应用说明了所提出方法在大规模观察性研究中进行因果生存分析的实际意义。

英文摘要

We study causal inference for time-to-event outcomes under right censoring in the presence of unmeasured confounding. Focusing on structural accelerated failure time models, we develop an identification and inference framework that exploits interactions among instrumental variables. The proposed approach does not rely on classical instrumental variable validity and yields valid causal inference under both valid and invalid instruments, provided that the interaction-based identification condition holds. To accommodate right censoring, we construct a censoring-adjusted observed data moment function using an augmented inverse probability censoring weighting approach. The resulting moment function is Neyman orthogonal with respect to nuisance functions and enjoys a double robustness property, enabling valid inference under flexible nuisance estimation. Estimation and inference are conducted using generalized empirical likelihood, which is well suited to settings with many potentially weak interaction-based moment conditions. We establish consistency and asymptotic normality under many-weak-moment asymptotics, and we develop diagnostic tools to assess interaction-based identification strength and overidentifying restrictions. Simulation studies demonstrate favorable finite sample performance across a range of censoring rates and instrument configurations. An application to UK Biobank data illustrates the practical relevance of the proposed method for causal survival analysis in large-scale observational studies.

2605.27975 2026-05-29 cs.LG stat.ML

Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models

现代Hopfield网络中的持续学习及其在扩散模型中的应用

Ken Takeda, Masafumi Oizumi, Ryo Karakida

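AI summary: Analyzes continual learning in diffusion models through the modern Hopfield energy, proves that high-energy, outlier-like samples are more easily forgotten, and selects replay samples by energy to mitigate forgetting.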

详情
AI中文摘要

生成模型(包括扩散模型)越来越多地被用作基础模型,并通过顺序微调进行适配,这使得持续学习成为一个关键问题设定。然而,此类生成模型中的持续学习仍未被充分理解:任务变化后,学习分布的哪些方面最容易丢失,以及应优先重放哪些样本?我们通过现代Hopfield能量来解决这些问题。现代Hopfield网络(MHN)与扩散模型之间的最新联系使得MHN中的分析可以迁移到扩散模型。我们引入内在遗忘作为任务变化后Hopfield能量的增加。在MHN的可处理设定中,我们证明高能量、类似异常值的样本比类似聚类的样本经历更大的能量增加,这意味着位于尖锐、孤立盆地中的样本更容易被遗忘。我们进一步分析了记忆重放,并表明重放对高能量样本特别有效,从而实现了基于能量的重放样本选择。我们在MHN和两种扩散模型(Stable Diffusion和像素空间DDPM)的持续学习设置实验中验证了这些预测。在这些扩散模型中,Hopfield能量追踪基于重建的遗忘,重放实验揭示了与MHN分析一致的能量依赖性遗忘缓解。

英文摘要

Generative models, including diffusion models, are increasingly used as foundation models and adapted through sequential fine-tuning, making continual learning an essential problem setting. However, continual learning in such generative models remains poorly understood: after a task change, what aspects of the learned distribution are most easily lost, and what replay samples should be prioritized? We address these questions through the modern Hopfield energy. Recent links between modern Hopfield networks (MHNs) and diffusion models allow analyses in MHNs to be transferred to diffusion models. We introduce intrinsic forgetting as an increase in Hopfield energy after the task change. In tractable settings in an MHN, we prove that high-energy, outlier-like samples undergo a larger energy increase than cluster-like samples, implying that samples located in sharp, isolated basins are more forgettable. We further analyze memory replay and show that replay is particularly effective for high-energy samples, enabling an energy-based selection of replay samples. We validate these predictions in experiments on MHNs and two diffusion models under continual-learning settings: Stable Diffusion and a pixel-space DDPM. In these diffusion models, Hopfield energy tracks reconstruction-based forgetting, and replay experiments reveal energy-dependent mitigation of forgetting that is consistent with the MHN analysis.
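The notion of a high-energy, outlier-like sample can be made concrete with the log-sum-exp energy commonly used for modern Hopfield networks. The sketch below drops additive constants and uses hypothetical stored patterns; it is an illustration under those assumptions, not the paper's experimental setup or its definition of intrinsic forgetting.

```python
import math

def hopfield_energy(x, patterns, beta=1.0):
    """Log-sum-exp Hopfield energy of state x, additive constants dropped.

    E(x) = -(1 / beta) * log(sum_i exp(beta * <p_i, x>)) + 0.5 * ||x||^2
    """
    dots = [beta * sum(a * b for a, b in zip(p, x)) for p in patterns]
    m = max(dots)  # subtract the max for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(d - m) for d in dots))
    return -lse / beta + 0.5 * sum(a * a for a in x)

# Hypothetical unit-norm memories: a tight cluster plus one isolated pattern.
cluster = [(1.0, 0.0), (0.96, 0.28), (0.96, -0.28)]
outlier = (-1.0, 0.0)
memories = cluster + [outlier]

energy_cluster = hopfield_energy(cluster[0], memories)
energy_outlier = hopfield_energy(outlier, memories)
```

In this toy configuration the isolated pattern sits at a higher energy than a pattern inside the cluster, because it has no nearby memories contributing to the log-sum-exp term.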

2605.27625 2026-05-29 math.ST stat.TH

Admissibility of Adaptive Monotone Step-Down Multiple Testing Procedures Under Arbitrary Covariance Dependence

任意协方差依赖下自适应单调逐步多重检验程序的可容许性

Prasenjit Ghosh, Arijit Chakrabarti

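AI summary: For simultaneous testing of multivariate normal means under arbitrary covariance dependence, establishes an admissibility theorem for a class of residual-based adaptive monotone step-down multiple testing procedures, showing that they are admissible with respect to a vector-valued loss function.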

详情
AI中文摘要

本文考虑在任意协方差依赖下多元正态均值的同步检验问题。具体地,设 $\boldsymbol{X}\sim N_n(\boldsymbolθ,\boldsymbolΣ)$,其中 $\boldsymbolθ\in\mathbb{R}^n$ 未知,$\boldsymbolΣ$ 是已知的正定协方差矩阵。目标是同时检验 $H_{0i}:θ_i=0$ 对 $H_{Ai}:θ_i\neq 0$,$i=1,\ldots,n$。我们为一类广泛的基于残差的单调逐步下降多重检验程序建立了通用可容许性定理,这些程序通过使用由条件正态分布产生的适当标准化残差统计量的局部自适应严格递增变换得到的统计量,迭代地对活跃假设进行排序。我们的主要结果表明,每个这样的程序关于向量值损失函数(其分量是通常的单个 $0$-$1$ 检验损失)都是可容许的。证明依赖于对诱导接受区域的精细几何分析以及自适应逐步拒绝指标的结构不变性。该定理实质性地扩展了 Cohen 等人 (2009) 为最大残差下降程序建立的可容许性理论,并揭示了依赖下的可容许性根本上是由残差统计量诱导的单调排序结构驱动的,而不是由检验规则本身的精确函数形式驱动的。

英文摘要

In this paper, we consider the problem of simultaneous testing of multivariate normal means under arbitrary covariance dependence. Specifically, let $\boldsymbol{X}\sim N_n(\boldsymbol{\theta},\boldsymbol{\Sigma})$, where $\boldsymbol{\theta}\in\mathbb{R}^n$ is unknown and $\boldsymbol{\Sigma}$ is a known positive definite covariance matrix. The objective is to test $H_{0i}:\theta_i=0$ against $H_{Ai}:\theta_i\neq 0$, simultaneously for $i=1,\ldots,n$. We establish a general admissibility theorem for a broad class of monotone residual-based step-down multiple testing procedures which iteratively rank the active hypotheses using statistics obtained through locally adaptive strictly increasing transformations of suitably standardized residual statistics arising from conditional normal distributions. Our main result shows that every such procedure is admissible with respect to a vector-valued loss function whose components are the usual individual $0$--$1$ testing losses. The proof relies on a delicate geometric analysis of the induced acceptance regions together with structural invariance properties of the adaptive stagewise rejection indices. The theorem substantially extends the admissibility theory developed for the maximum residual down procedure of Cohen et al. (2009) and reveals that admissibility under dependence is fundamentally driven by the monotone ordering structure induced by the residual statistics rather than by the precise functional form of the testing rule itself.

2605.27474 2026-05-29 stat.ML cs.LG

Stop Suppressing the Tail: Causal Inference for Extreme Events

停止抑制尾部:极端事件的因果推断

Eichi Uehara

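AI summary: For heavy-tailed outcomes, proposes an average dose-response function (ADRF) estimator whose tail diagnostic (PDHTE+JK), based on pilot-median centering, breaks the circular dependence and outputs a structured tail shape and deep-tail risk measures, outperforming conventional methods on extreme-event estimation.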

详情
Comments
22 pages, 6 figures, 13 tables. Keywords: double machine learning, dose-response, heavy tails, extreme value theory, causal inference
AI中文摘要

估计结果如何响应连续处理(平均剂量-响应函数,ADRF)是因果推断的核心基础。然而,当结果具有重尾时,标准的鲁棒双重机器学习(DML)会刻意抑制这些极端值以稳定整体均值。在高风险场景(如金融收益或气候损失)中,这种被忽略的千分之一极端事件恰恰是实际目标量。此外,当前从模型残差中读取尾部的方法存在循环依赖,导致仅因核心估计器在Huber和Welsch之间切换,尾部形状推断就会发生剧烈变化。本研究提出一种ADRF估计器,它在标准点估计之外输出结构化的尾部形状。其尾部诊断(PDHTE+JK)通过基于中位数中心化的结果评估每个处理下的尾部形状,成功打破了循环依赖,使诊断结果不受核心方法选择的影响。输出包含四个处理条件量:尾部形状$\hatξ(t)$、深层尾部回报水平$\hat{Q}_α(t)$、条件短缺$\hat{S}_α(t)$、恢复的均值ADRF,以及一个明确的拒绝机制,当数据不支持极值建模时拒绝外推。与核加权分位数回归(QR)相比,所提估计器在重尾面板上将深层尾部($α=0.001$)回报水平MAE降低了11%,条件短缺MAE降低了25.5%。在样本稀缺场景($n\le2000$)中,MAE降低了20-29%。在freMTPL2汽车保险索赔数据上,它在对数索赔尺度上成功触发了明确的外推拒绝,这是QR或仅损失DML无法实现的。

英文摘要

Estimating how an outcome responds to a continuous treatment (the Average Dose-Response Function, or ADRF) is a core causal-inference primitive. However, when outcomes possess heavy tails, standard robust double machine learning (DML) deliberately suppresses these extremes to stabilize the bulk average. In high-stakes settings, such as financial returns or climate losses, this omitted 1-in-1000 extreme event is the actual target quantity. Furthermore, current methods that read the tail from a model's residuals suffer from circular dependence, causing tail shape inferences to shift drastically based solely on whether the core estimator is switched between Huber and Welsch. This work proposes an ADRF estimator that emits a structured tail-shape output alongside the standard point estimate. Its tail diagnostic (PDHTE+JK) evaluates the per-treatment tail shape from the outcome centered by a pilot median, successfully breaking the circular dependence and rendering the diagnostic invariant to the choice of core method. The output encompasses four treatment-conditional quantities: tail shape $\hat{\xi}(t)$, deep-tail return levels $\hat{Q}_{\alpha}(t)$, conditional shortfalls $\hat{S}_{\alpha}(t)$, and the recovered mean ADRF, together with an explicit refusal mechanism that declines extrapolation when extreme-value modeling is unsupported by the data. Compared to kernel-weighted quantile regression (QR), the proposed estimator reduces deep-tail ($\alpha=0.001$) return-level MAE by 11% and conditional-shortfall MAE by 25.5% across a heavy-tailed panel. It also achieves a 20-29% MAE reduction in sample-scarce regimes ($n\le2000$). On freMTPL2 motor-insurance claims, it successfully triggered an explicit extrapolation refusal on the log-claim scale, which neither QR nor loss-only DML can produce.

2605.26653 2026-05-29 stat.ME

Nonparametric Regression via Tree-Guided Feature Aggregation

基于树引导特征聚合的非参数回归

Sithija Manage, Y. Samuel Wang, Martin T. Wells

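AI summary: For regression problems whose covariates have a hierarchical tree structure, proposes KR-TEXAS, a penalized Nadaraya-Watson-type estimator that performs model selection and feature aggregation simultaneously through adaptive penalty weights, and proves model selection consistency.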

详情
AI中文摘要

在协变量自然组织成层次树结构的回归问题中,一个核心挑战是选择协变量进入模型的粒度。确定这种特征聚合水平具有内在的科学意义,并且可以通过引入稀疏性来提高统计效率。尽管已有丰富文献在线性设置中解决了这一问题,但将特征聚合扩展到非线性设置仍然是一个开放的挑战。在这项工作中,我们提出通过一种惩罚的Nadaraya-Watson型估计器同时进行模型选择和特征聚合。我们提出的估计器,即带有树探索聚合的核回归(KR-TEXAS),基于回归函数偏导数的初始估计器为特征构建自适应惩罚权重。在温和条件下,我们为定义良好的目标聚合集建立了模型选择一致性,并且我们的模拟显示在模型选择和预测方面均表现强劲。最后,我们通过将我们的方法应用于一个微生物组数据集来预测短链脂肪酸,展示了其效用。我们的方法在R包krtexas中提供了用户友好的实现。

英文摘要

In regression problems where covariates are naturally organized in a hierarchical tree structure, a central challenge is to select the resolution at which covariates enter the model. Determining this level of feature aggregation is of intrinsic scientific interest and can improve statistical efficiency by inducing sparsity. While a rich literature addresses this problem in the linear setting, extending feature aggregation to the nonlinear setting remains an open challenge. In this work, we propose to simultaneously perform model selection and feature aggregation through a penalized Nadaraya-Watson-type estimator. Our proposed estimator, Kernel Regression with Tree-EXploring AggregationS (KR-TEXAS), constructs adaptive penalty weights for the features based on pilot estimators of the regression function's partial derivatives. Under mild conditions, we establish model selection consistency for a well-defined target aggregation set, and our simulations show strong performance in both model selection and prediction. Finally, we demonstrate the utility of our procedure by applying it to a microbiome data set to predict short chain fatty acids. A user-friendly implementation of our procedure is available in the R package krtexas.
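As background for the penalized estimator described above, the plain Nadaraya-Watson estimator is a kernel-weighted average of the responses. The sketch below shows that unpenalized building block plus one possible way to aggregate features under a tree node; summing sibling features, the Gaussian kernel, and all names are assumptions made for illustration and are not taken from the krtexas package.

```python
import math

def nadaraya_watson(x_query, X, y, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x = x_query]."""
    weights = []
    for xi in X:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x_query, xi))
        weights.append(math.exp(-0.5 * sq_dist / bandwidth ** 2))
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, y)) / total

def aggregate(x, groups):
    """Sum leaf features within each group (hypothetical aggregation rule)."""
    return [sum(x[j] for j in group) for group in groups]

# Hypothetical data: three leaf features, the first two share a parent node.
X_leaf = [[1.0, 2.0, 0.5], [0.0, 1.0, 1.5], [2.0, 2.0, 1.0]]
y_obs = [3.0, 1.0, 4.0]
groups = [[0, 1], [2]]
X_agg = [aggregate(row, groups) for row in X_leaf]

fit_at_first_row = nadaraya_watson(X_agg[0], X_agg, y_obs, bandwidth=0.5)
midpoint_fit = nadaraya_watson([1.0], [[0.0], [2.0]], [0.0, 4.0], bandwidth=1.0)
```

Working with `X_agg` instead of `X_leaf` reduces the input dimension of the kernel, which is one reason a coarser aggregation level can be statistically attractive when the finer split carries no extra signal.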

2605.26408 2026-05-29 cs.LG stat.ME stat.ML

Function-Valued Causal Influence in Nonlinear Time Series

非线性时间序列中的函数值因果影响

Valentina V. Kuskova, Dmitry Zaytsev, Michael Coppedge

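AI summary: Because the scalar scores commonly used in nonlinear time-series causal discovery mask state-dependent functional effects, proposes a framework based on Individual Conditional Expectation that estimates causal response functions directly from Neural Additive Vector Autoregression models, revealing distinct functional behaviors that scalar scores cannot distinguish.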

详情
Comments
26 pages, 6 tables, 8 figures
AI中文摘要

时间序列中的因果发现越来越多地使用非线性机器学习模型进行,但由此产生的因果关系几乎总是通过标量边评分来总结。我们认为,这种做法掩盖了非线性自回归模型真正学习到的对象:一个状态依赖的函数,其效应随机制、幅度和上下文而变化。我们形式化了加性、贡献可分解架构的函数值因果影响,并表明标量因果评分构成了严重的信息瓶颈,将状态间变化与状态内残差噪声混为一谈。以神经加性向量自回归作为代表性架构,我们引入了一个基于个体条件期望的实用框架,直接从训练好的模型估计因果响应函数。通过受控的合成实验,我们证明了具有无法区分的标量评分的边可以表现出定性的不同函数行为,包括单调、阈值、饱和和符号变化效应。一个关于民主发展的应用案例进一步表明,函数值分析揭示了以评分为中心的方法系统性遗漏的特定于机制和非对称的因果结构。

英文摘要

Causal discovery in time series is increasingly performed using nonlinear machine-learning models, yet the resulting causal relationships are almost always summarized by scalar edge scores. We argue that this practice obscures the true object learned by nonlinear autoregressive models: a state-dependent function whose effect varies across regimes, magnitudes, and contexts. We formalize function-valued causal influence for additive, contribution-decomposable architectures and show that scalar causal scores constitute a severe information bottleneck, conflating between-state variation with within-state residual noise. Using Neural Additive Vector Autoregression as a representative architecture, we introduce a practical framework based on Individual Conditional Expectation for estimating causal response functions directly from trained models. Through controlled synthetic experiments, we demonstrate that edges with indistinguishable scalar scores can exhibit qualitatively different functional behaviors, including monotonic, thresholded, saturating, and sign-changing effects. An applied case study on democratic development further shows that function-valued analysis reveals regime-specific and asymmetric causal structure systematically missed by score-centric approaches.
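Individual Conditional Expectation curves, the tool the abstract builds on, are simple to compute for any fitted predictor: one input is swept over a grid while the others stay fixed at each observed row. The sketch below uses a hypothetical toy predictor in place of a trained Neural Additive Vector Autoregression model, and the function names are illustrative only.

```python
def ice_curves(model, rows, feature, grid):
    """Return one response curve per row by sweeping a single input.

    model   : callable mapping a list of inputs to a scalar prediction
    rows    : observed input vectors (for a time series, lagged states)
    feature : index of the input being varied
    grid    : values substituted for that input
    """
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            probe = list(row)        # copy so the original row is untouched
            probe[feature] = value
            curve.append(model(probe))
        curves.append(curve)
    return curves

def toy_model(x):
    # Hypothetical predictor: a sign-symmetric effect of x[0] plus x[1].
    return x[0] ** 2 + x[1]

curves = ice_curves(toy_model, [[5.0, 2.0], [0.5, -1.0]], feature=0,
                    grid=[-1.0, 0.0, 1.0])
```

Each curve is a function-valued description of the edge from the swept input to the output; collapsing it to a single number, such as an average magnitude, discards the shape that distinguishes monotonic from thresholded or sign-changing effects.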

2605.13168 2026-05-29 stat.ME

Variance-Aware Estimation and Inference for Michaelis-Menten Models with Heteroscedastic Errors and Clustered Measurements

异方差误差和聚类测量下Michaelis-Menten模型的方差感知估计与推断

Mijeong Kim, Minkyoung Cha, Ah Young Jeong

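AI summary: For Michaelis-Menten models, proposes a variance-aware estimation and inference method motivated by conditional moment restrictions and implemented through simple conditionally Gaussian working models for single-curve and clustered data, improving inference under heteroscedasticity and clustering.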

详情
AI中文摘要

Michaelis-Menten分析通常在恒定方差假设下通过非线性最小二乘法进行,尽管酶动力学数据经常表现出浓度依赖的异方差性,并且通常包含重复或聚类测量。我们开发了一种方差感知的Michaelis-Menten估计与推断程序,该程序由条件矩约束驱动,并通过简单的条件高斯工作模型实现。对于单曲线,该方法简化为对$K_m$的一维求根,随后对$V_{\max}$和方差尺度参数进行闭式插件更新;相同的得分逻辑通过随机效应诱导的工作协方差扩展到聚类水平。在模拟中,相对于同方差非线性最小二乘法,建模异方差改善了方差恢复和区间效率,而聚类感知的半参数和NLME拟合比忽略聚类的合并分析更有效地恢复了固定效应覆盖。在自主实验室和土壤胞外酶数据中,异方差模型实现了比同方差非线性最小二乘法更低的信息准则,其中平方根方差函数在预指定的工作模型中给出了最稳定的经验拟合。我们在配套的\texttt{inferMM}包中实现了单曲线、分组和聚类Michaelis-Menten分析的工作流程。这些结果表明,当变异性随底物浓度变化或测量值聚类时,简单的方差函数和协方差建模可以稳定原始尺度的Michaelis-Menten推断。

英文摘要

Michaelis-Menten analysis is often conducted by nonlinear least squares under a constant-variance assumption, even though enzyme-kinetic data frequently display concentration-dependent heteroscedasticity and often include repeated or clustered measurements. We develop a variance-aware procedure for Michaelis-Menten estimation and inference that is motivated by conditional moment restrictions and implemented through simple conditionally Gaussian working models. For single curves, the method reduces to one-dimensional root finding for $K_m$ followed by closed-form plug-in updates for $V_{\max}$ and a variance scale parameter; the same score logic yields a cluster-level extension through a random-effect-induced working covariance. In simulation, modeling heteroscedasticity improved variance recovery and interval efficiency relative to homoscedastic nonlinear least squares, while cluster-aware semiparametric and NLME fits restored fixed-effect coverage far more effectively than pooled analyses that ignored clustering. In self-driving laboratory and soil exoenzyme data, heteroscedastic models achieved lower information criteria than homoscedastic nonlinear least squares, with the square-root variance function giving the most stable empirical fit among the prespecified working models. We implement the workflow in the companion inferMM package for single-curve, grouped, and clustered Michaelis-Menten analysis. These results show that simple variance-function and covariance modeling can stabilize original-scale Michaelis-Menten inference when variability changes with substrate concentration or measurements are clustered.
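The single-curve recipe (a one-dimensional root find for $K_m$, then closed-form updates for $V_{\max}$ and a variance scale) can be sketched with a profiled weighted least-squares score. This is a simplified stand-in rather than the inferMM implementation: it assumes the working variance is sigma^2 * g(x) for a known function g of the substrate concentration, and the function names, bracket, and data are hypothetical.

```python
def mm_profile_fit(x, y, var_fn, k_lo=0.1, k_hi=50.0, iters=200):
    """Fit y ~ Vmax * x / (Km + x) with working variance sigma^2 * var_fn(x)."""
    w = [1.0 / var_fn(xi) for xi in x]  # inverse-variance weights

    def vmax_given(km):
        # Closed-form weighted least-squares Vmax for a fixed Km.
        f = [xi / (km + xi) for xi in x]
        num = sum(wi * yi * fi for wi, yi, fi in zip(w, y, f))
        den = sum(wi * fi * fi for wi, fi in zip(w, f))
        return num / den

    def score(km):
        # Profile score in Km: weighted residuals times x / (Km + x)^2.
        v = vmax_given(km)
        return sum(wi * (yi - v * xi / (km + xi)) * xi / (km + xi) ** 2
                   for wi, xi, yi in zip(w, x, y))

    lo, hi = k_lo, k_hi
    if score(lo) * score(hi) > 0:
        raise ValueError("profile score has no sign change on the bracket")
    for _ in range(iters):  # plain bisection on the one-dimensional score
        mid = 0.5 * (lo + hi)
        if score(lo) * score(mid) <= 0:
            hi = mid
        else:
            lo = mid
    km = 0.5 * (lo + hi)
    vmax = vmax_given(km)
    sigma2 = sum(wi * (yi - vmax * xi / (km + xi)) ** 2
                 for wi, xi, yi in zip(w, x, y)) / len(x)
    return km, vmax, sigma2

# Noise-free hypothetical data generated with Km = 2 and Vmax = 10.
x_demo = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
y_demo = [10.0 * xi / (2.0 + xi) for xi in x_demo]
km_hat, vmax_hat, sigma2_hat = mm_profile_fit(x_demo, y_demo, lambda c: c)
```

Profiling out $V_{\max}$ is what leaves a single unknown in the score, so a bracketing method such as bisection suffices; a different variance function only changes the weights `w`.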

2605.12208 2026-05-29 stat.ML cs.AI cs.LG stat.CO

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

自监督拉普拉斯近似用于贝叶斯不确定性量化

Julian Rodemann, Alexander Marquard, Thomas Augustin, Michele Caprio

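AI summary: Proposes the Self-Supervised Laplace Approximation (SSLA), which approximates the posterior predictive distribution directly by refitting on self-predicted data, giving deterministic, sampling-free Bayesian uncertainty quantification that outperforms classical Laplace approximations on regression tasks.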

详情
Journal ref
Transactions on Machine Learning Research (TMLR). ISSN 2835-8856 (2026)
Comments
Accepted for publication in TMLR (https://openreview.net/forum?id=T8w8L2t3JG), v2: fixed typos and added a deceased-author footnote with a dedication to Thomas Augustin
AI中文摘要

近似贝叶斯推断通常围绕计算后验参数分布展开。然而,在实践中,感兴趣的主要对象通常是模型的预测而非其参数。在这项工作中,我们提出绕过参数后验,直接关注近似后验预测分布。我们通过从自监督和半监督学习中的自训练中汲取灵感来实现这一点。本质上,我们通过重新拟合自预测数据来量化贝叶斯模型的预测不确定性。这个想法非常简单:如果模型对自预测数据赋予高似然,那么这些预测的不确定性低,反之亦然。这产生了后验预测的确定性、无采样近似。我们的自监督拉普拉斯近似(SSLA)的模块化结构进一步允许我们插入不同的先验规范,从而实现经典的贝叶斯敏感性(关于先验选择)分析。为了绕过昂贵的重新拟合,我们进一步引入了SSLA的近似版本,称为ASSLA。我们从理论和经验上研究了(A)SSLA,涉及从贝叶斯线性模型到贝叶斯神经网络的回归模型。在模拟和真实数据集的广泛回归任务中,我们的方法在预测校准方面优于经典拉普拉斯近似,同时保持计算效率。

英文摘要

Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose to bypass the parameter posterior and focus directly on approximating the posterior predictive distribution. We achieve this by drawing inspiration from self-training within self-supervised and semi-supervised learning. Essentially, we quantify a Bayesian model's predictive uncertainty by refitting on self-predicted data. The idea is strikingly simple: If a model assigns high likelihood to self-predicted data, these predictions are of low uncertainty, and vice versa. This yields a deterministic, sampling-free approximation of the posterior predictive. The modular structure of our Self-Supervised Laplace Approximation (SSLA) further allows us to plug in different prior specifications, enabling classical Bayesian sensitivity (w.r.t. prior choice) analysis. In order to bypass expensive refitting, we further introduce an approximate version of SSLA, called ASSLA. We study (A)SSLA both theoretically and empirically in regression models ranging from Bayesian linear models to Bayesian neural networks. Across a wide array of regression tasks with simulated and real-world datasets, our methods outperform classical Laplace approximations in predictive calibration while remaining computationally efficient.

2605.02574 2026-05-29 stat.CO cs.NA math.NA stat.ME

Fast and accurate conditioning for large-scale and online Gaussian process prediction problems

大规模和在线高斯过程预测问题的快速且精确的条件化方法

Samanyu Arora, Christopher J. Geoga

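AI summary: Proposes conditioning on carefully designed linear combinations of the data, which achieves large-scale Gaussian process prediction with exponential convergence and supports online prediction.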

详情
AI中文摘要

高斯过程模型为预测和不确定性量化提供了灵活框架。然而,对于大多数协方差函数,基于 $n$ 个点的精确 GP 预测计算复杂度为 $\mathcal{O}(n^3)$,这使得它在大数据集或大量预测点上代价高昂。虽然基于最近邻的预测在某些情况下效果良好,但非病理情况(例如测量噪声)会严重限制其效率。本文提出了一种互补方法,即对精心设计的数据线性组合进行条件化,这在联合预测数据域中大型连通区域内的多个值时特别有效。对于远离原点光滑的核函数和简单预测域,该方法在用于条件化的线性组合数量 $r$ 上呈指数收敛,并且当 $r \approx 100$ 时可达到机器精度。该方法的计算成本为 $\mathcal{O}(T r^2)$,其中 $T$ 是求解数据协方差矩阵线性系统的成本,因此在许多情况下,通过利用良好协方差矩阵的秩结构,可以以线性或近线性成本计算。通过额外 $\mathcal{O}(n r^2)$ 的预计算成本,该方法还可以在 $\mathcal{O}(1)$ 的在线工作中为指定区域的任意点提供预测,这使得它对于预测点事先未知的问题特别有吸引力。

英文摘要

Gaussian Process (GP) models provide a flexible framework for prediction and uncertainty quantification. For most covariance functions, however, exact GP prediction with $n$ points scales as $\mathcal{O}(n^3)$, making it prohibitively expensive for large datasets or large numbers of prediction points. While nearest neighbor-based prediction can work well in certain settings, non-pathological circumstances (for example measurement noise) can severely restrict its efficiency. This work presents a complementary approach where one conditions on carefully designed linear combinations of data, which is particularly effective in the setting of jointly predicting many values in large connected regions of the data domain. For kernel functions that are smooth away from the origin and simple prediction domains, this method can be exponentially convergent in the number of linear combinations $r$ used for conditioning, and can be accurate to machine precision for $r \approx 100$. This approach costs $\mathcal{O}(T r^2)$ work to compute, where $T$ is the cost of solving a linear system with the data covariance matrix, and so in many cases can be computed in linear or near-linear cost by exploiting rank structure in well-behaved covariance matrices. At the cost of $\mathcal{O}(nr^2)$ additional precomputation work, this approach can also provide predictions at arbitrary points of a designated region in $\mathcal{O}(1)$ online work, making it particularly attractive for problems where prediction points are not known in advance.
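Conditioning on linear combinations z = A y of the data, rather than on every observation, follows from the textbook Gaussian conditioning formula: the conditional mean of f(x*) is (A k*)^T (A K A^T)^{-1} z. The sketch below illustrates only that formula. The squared-exponential kernel, the noise level, the matrices `A`, and all function names are hypothetical, and it does not reproduce the paper's design of the combinations or its fast solvers.

```python
import math

def rbf(a, b, length=1.0):
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def predict_from_combinations(xs, ys, A, x_star, noise=0.5):
    """Posterior mean of f(x_star) given z = A y, with y = f(xs) + noise."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k_star = [rbf(x_star, xi) for xi in xs]         # Cov(y, f(x_star))
    A_t = [list(col) for col in zip(*A)]
    cov_z = matmul(matmul(A, K), A_t)               # Cov(z) = A K A^T
    z = [sum(a * yi for a, yi in zip(row, ys)) for row in A]
    cross = [sum(a * k for a, k in zip(row, k_star)) for row in A]
    weights = solve(cov_z, z)
    return sum(c * wgt for c, wgt in zip(cross, weights))

xs_demo, ys_demo = [0.0, 1.0, 2.0], [1.0, 2.0, 0.5]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mixing = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]  # invertible
one_row = [[1.0, 1.0, 1.0]]                                   # r = 1 < n

pred_full = predict_from_combinations(xs_demo, ys_demo, identity, 0.5)
pred_mixed = predict_from_combinations(xs_demo, ys_demo, mixing, 0.5)
pred_low_rank = predict_from_combinations(xs_demo, ys_demo, one_row, 0.5)
```

When `A` is square and invertible, conditioning on z carries the same information as conditioning on y, so the two predictions agree up to rounding; a matrix with r < n rows gives a cheaper, approximate predictor whose accuracy depends on how the rows are chosen.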

2604.13410 2026-05-29 stat.ME cs.LG stat.ML

Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression

使用两阶段核岭回归估计连续治疗效果

Seok-Jin Kim, Kaizheng Wang

AI总结 针对连续治疗的效果函数估计问题,提出两阶段核岭回归方法,通过第一阶段建模响应与治疗和协变量的关系,第二阶段构造伪结果校正分布偏移,无需估计条件治疗密度即可达到最优学习界,并实现数据驱动的模型选择。

详情
AI中文摘要

我们研究连续治疗的效果函数估计问题,该函数将每个治疗值映射到群体平均结果。该设置中的一个核心挑战是混杂:治疗分配通常依赖于协变量,产生选择偏差,使得直接对响应进行回归不可靠。为了解决这个问题,我们提出了一种两阶段核岭回归方法。在第一阶段,我们学习一个模型,将响应表示为治疗和协变量的函数;在第二阶段,我们使用该模型构造伪结果以校正分布偏移,然后拟合第二个模型来估计治疗效果。尽管响应随治疗和协变量变化,但通过对协变量平均得到的诱导效果函数通常更简单,我们的估计器适应这种结构。我们在不估计条件治疗密度的情况下实现了最优学习界,从而绕过了现有方法中的一个主要瓶颈。此外,我们引入了一种完全数据驱动的模型选择程序,该程序对未知的重叠程度和底层核的谱衰减具有可证明的自适应性。

英文摘要

We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on covariates, creating selection bias that makes direct regression of the response on treatment unreliable. To address this issue, we propose a two-stage kernel ridge regression method. In the first stage, we learn a model for the response as a function of both treatment and covariates; in the second stage, we use this model to construct pseudo-outcomes that correct for distribution shift, and then fit a second model to estimate the treatment effect. Although the response varies with both treatment and covariates, the induced effect function obtained by averaging over covariates is typically much simpler, and our estimator adapts to this structure. Our optimal learning bounds are achieved without estimating the conditional treatment density, thereby bypassing a major bottleneck in existing methods. Furthermore, we introduce a fully data-driven model selection procedure that achieves provable adaptivity to both the unknown degree of overlap and the spectral decay of the underlying kernel.
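The two stages described above can be sketched in a few lines. The abstract does not give the exact pseudo-outcome construction, so the sketch assumes one plausible reading: the pseudo-outcome at a treatment value is the stage-one prediction averaged over all observed covariates. Every name, the kernels, and the regularization values are hypothetical.

```python
import numpy as np

def rbf(a, b, scale=1.0):
    """Gaussian kernel between row sets a (m x d) and b (n x d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / scale**2)

def linear(a, b):
    """Linear kernel with an intercept term."""
    return a @ b.T + 1.0

def krr_fit(K, y, lam):
    """Kernel ridge regression dual coefficients."""
    n = len(y)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def two_stage_effect(t, x, y, t_grid, kernel=rbf, lam1=1e-3, lam2=1e-3):
    """Estimate the effect function on t_grid (assumed construction)."""
    n = len(t)
    tx = np.column_stack([t, x])
    # Stage 1: model the response as a function of treatment and covariates.
    alpha1 = krr_fit(kernel(tx, tx), y, lam1)
    # Assumed pseudo-outcome: average the stage-1 prediction at treatment
    # t[i] over all observed covariates, which removes the dependence
    # between treatment and covariates.
    pseudo = np.empty(n)
    for i in range(n):
        ti_x = np.column_stack([np.full(n, t[i]), x])
        pseudo[i] = (kernel(ti_x, tx) @ alpha1).mean()
    # Stage 2: regress the pseudo-outcomes on treatment alone.
    t_col = t[:, None]
    alpha2 = krr_fit(kernel(t_col, t_col), pseudo, lam2)
    return kernel(t_grid[:, None], t_col) @ alpha2
```

On confounded synthetic data, for example a response `y = t + 2 * x + noise` with `t` correlated with `x`, a direct regression of `y` on `t` picks up the covariate effect, while the averaged pseudo-outcomes target the effect of `t` alone.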

2604.13147 2026-05-29 stat.ML cs.LG math.PR

Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

基于离模型训练和重要性采样的自适应学习用于完全非马尔可夫最优随机控制(完整版)

Dorival Leão, Alberto Ohashi, Simone Scotti, Adolfo M. D da Silva

AI总结 针对完全非马尔可夫且依赖未知模型参数的连续时间随机控制问题,提出一种基于离散骨架和重要性采样的蒙特卡洛学习方法,实现离模型训练架构和自适应参数更新,并给出非渐近误差界。

详情
Comments
Typos are fixed. Numerical experiment is revised
AI中文摘要

本文研究连续时间随机控制问题,其受控状态是完全非马尔可夫的,且依赖于未知模型参数。这类问题自然出现在路径依赖随机微分方程、粗糙波动率对冲以及分数布朗运动驱动的系统中。基于先前工作中发展的离散骨架方法,我们提出了一种用于相关嵌入后向动态规划方程的蒙特卡洛学习方法。我们的主要贡献有两方面。首先,针对几类具有代表性的非马尔可夫受控系统,我们构造了显式的支配训练律和Radon-Nikodym权重。这产生了一种离模型训练架构,其中在参考律下生成固定的合成数据集,而通过重要性采样恢复与目标模型相关的动态规划算子。其次,我们利用这种结构设计了参数模型不确定性下的自适应更新机制,使得可以通过重新加权相同的训练样本而非重新生成新轨迹来执行重复校准。对于固定参数,我们建立了通过深度神经网络逼近嵌入动态规划方程的非渐近误差界。对于自适应学习,我们推导了将蒙特卡洛逼近误差与模型风险误差分离的定量估计。数值实验在结构化线性二次型例子中展示了离模型训练机制和自适应重要性采样更新。

英文摘要

This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon--Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.
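The reweighting idea, generating one fixed dataset under a reference law and recovering expectations under a target model by importance sampling, can be shown on a toy problem. The sketch below uses scalar Gaussian laws purely as an illustrative assumption. The paper's setting involves non-Markovian controlled paths and Radon--Nikodym weights that are not reproduced here, and all names are hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Reference (training) law: wider than every target considered, so the
# density ratio stays bounded.
REF_MEAN, REF_STD = 0.0, 2.0

def expectation_under_target(samples, theta, h):
    """Estimate E[h(X)] for X ~ N(theta, 1) from samples drawn under the reference law."""
    weights = normal_pdf(samples, theta, 1.0) / normal_pdf(samples, REF_MEAN, REF_STD)
    return np.mean(weights * h(samples))

rng = np.random.default_rng(0)
samples = rng.normal(REF_MEAN, REF_STD, size=200_000)  # generated once

# Recalibrating to a new parameter reuses the same samples with new weights.
est_a = expectation_under_target(samples, 0.5, lambda x: x)
est_b = expectation_under_target(samples, -1.0, lambda x: x)
```

The choice of a reference law that dominates all targets matters: if the reference had lighter tails than a target, the weights would be unbounded and the estimate could have very large variance.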

2604.05446 2026-05-29 stat.ML cs.LG

MEC: Machine-Learning-Assisted Generalized Entropy Calibration for Semi-Supervised Mean Estimation

MEC:基于机器学习的广义熵校准用于半监督均值估计

Se Yoon Lee, Jae Kwang Kim

AI总结 提出MEC方法,通过交叉拟合校准加权改进预测驱动推断,在半监督均值估计中实现半参数效率界,并提升置信区间覆盖率和精度。

详情
AI中文摘要

获取高质量标签成本高昂,而无标签协变量通常丰富,这推动了具有可靠不确定性量化的半监督推断方法的发展。预测驱动推断(PPI)利用在少量标记样本上训练的机器学习预测器来提高效率,但在模型误指定下可能损失效率,并因标签重用而导致覆盖失真。我们引入了基于机器学习的广义熵校准(MEC),这是PPI的一种交叉拟合、校准加权变体。MEC通过基于Bregman投影的原则性校准框架对标记样本重新加权,以更好地与目标群体对齐,从而提高效率。这使MEC对预测器的仿射变换具有鲁棒性,并通过用更弱的投影误差条件替代原始预测误差条件,放宽了有效性的要求。因此,MEC在比现有PPI变体更弱的假设下达到了半参数效率界。在模拟和实际数据应用中,MEC实现了接近名义覆盖率的置信区间,并且比CF-PPI和普通PPI具有更紧的置信区间。

英文摘要

Obtaining high-quality labels is costly, whereas unlabeled covariates are often abundant, motivating semi-supervised inference methods with reliable uncertainty quantification. Prediction-powered inference (PPI) leverages a machine-learning predictor trained on a small labeled sample to improve efficiency, but it can lose efficiency under model misspecification and suffer from coverage distortions due to label reuse. We introduce Machine-Learning-Assisted Generalized Entropy Calibration (MEC), a cross-fitted, calibration-weighted variant of PPI. MEC improves efficiency by reweighting labeled samples to better align with the target population, using a principled calibration framework based on Bregman projections. This yields robustness to affine transformations of the predictor and relaxes requirements for validity by replacing conditions on raw prediction error with weaker projection-error conditions. As a result, MEC attains the semiparametric efficiency bound under weaker assumptions than existing PPI variants. Across simulations and a real-data application, MEC achieves near-nominal coverage and tighter confidence intervals than CF-PPI and vanilla PPI.
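For orientation, the vanilla PPI mean estimator that MEC builds on can be written in a few lines: the average prediction on unlabeled data plus a correction estimated from the labeled sample. This is a sketch of plain PPI only. It does not implement cross-fitting or the calibration weighting that MEC adds, and the function name is hypothetical.

```python
import numpy as np

def ppi_mean(y_labeled, pred_labeled, pred_unlabeled):
    """Vanilla prediction-powered estimate of the population mean of Y."""
    y_labeled = np.asarray(y_labeled, dtype=float)
    pred_labeled = np.asarray(pred_labeled, dtype=float)
    pred_unlabeled = np.asarray(pred_unlabeled, dtype=float)
    # Average prediction error on the labeled sample (the "rectifier").
    rectifier = np.mean(y_labeled - pred_labeled)
    # Mean prediction on unlabeled data, corrected by the rectifier.
    return np.mean(pred_unlabeled) + rectifier
```

In this form a constant shift of the predictor cancels exactly between the two terms, whereas a rescaling of the predictor does not cancel and changes the estimate's variance.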

2603.19573 2026-05-29 stat.ME

Estimating within-cluster and between-cluster spillover effects in randomized saturation designs

随机饱和设计中估计簇内和簇间溢出效应

Sizhu Lu, Lei Shi, Peng Ding

AI总结 针对随机饱和设计中单位跨簇交互导致簇间溢出效应的问题,基于潜在结果框架提出估计簇内和簇间溢出效应的因果推断方法,并建立估计与推断的统计理论。

详情
Comments
To appear in Social Networks
AI中文摘要

随机饱和设计是两阶段实验:首先随机分配簇的处理概率,然后随机分配簇内单位的处理。现有关于随机饱和设计的文献通过假设不存在簇间溢出效应来关注估计簇内溢出效应。然而,在许多实际的随机饱和设计中,单位可能跨簇交互。一个主要例子是某些单位在地理上彼此接近,因此溢出效应跨簇产生。基于潜在结果框架,我们提出了在随机饱和设计中估计簇内和簇间溢出效应的因果推断问题。我们明确了因果估计量,并建立了估计与推断的统计理论。我们还应用我们的方法分析了肯尼亚最近一项关于现金转移对家庭支出的随机饱和设计。

英文摘要

Randomized saturation designs are two-stage experiments: they first randomly assign treatment probabilities over the clusters and then randomly assign the treatment to the units within the clusters. The existing literature on randomized saturation designs focuses on estimating within-cluster spillover effects by assuming away between-cluster spillover effects. However, the units may interact across clusters in many practical randomized saturation designs. A leading example is that some units are geographically close to each other, so spillover effects arise across clusters. Based on the potential outcomes framework, we formulate the causal inference problem of estimating within-cluster and between-cluster spillover effects in randomized saturation designs. We clarify the causal estimands and establish the statistical theory for estimation and inference. We also apply our method to analyze a recent randomized saturation design of cash transfer on household expenditure in Kenya.

2602.11760 2026-05-29 stat.ML cs.LG

Aggregate Models, Not Explanations: Improving Feature Importance Estimation

聚合模型而非解释:改进特征重要性估计

Joseph Paillard, Angel Reyero Lobo, Denis A. Engemann, Bertrand Thirion

AI总结 针对特征重要性估计不准确的问题,本文通过理论分析证明模型级集成比解释级集成能更有效地降低误差,并在基准和蛋白质组学数据上验证。

详情
AI中文摘要

特征重要性方法有望将机器学习模型从预测引擎转变为科学发现的工具。然而,由于数据采样和算法随机性,表达性模型可能不稳定,导致变量重要性估计不准确,削弱其在关键生物医学应用中的效用。尽管集成提供了一种解决方案,但由于重要性度量的非线性,决定是解释单个集成模型还是聚合单个模型解释是困难的,并且尚未得到充分研究。我们的理论分析在适应复杂最先进机器学习模型的假设下发展,揭示了这一选择主要由模型的超额风险驱动。与先前文献相反,我们表明模型级集成通过减少这一主导误差项,提供了更准确的变量重要性估计,特别是对于表达性模型。我们在经典基准和来自英国生物银行的大规模蛋白质组学研究中验证了这些发现。

英文摘要

Feature-importance methods show promise in transforming machine learning models from predictive engines into tools for scientific discovery. However, due to data sampling and algorithmic stochasticity, expressive models can be unstable, leading to inaccurate variable importance estimates and undermining their utility in critical biomedical applications. Although ensembling offers a solution, deciding whether to explain a single ensemble model or aggregate individual model explanations is difficult due to the nonlinearity of importance measures and remains largely understudied. Our theoretical analysis, developed under assumptions accommodating complex state-of-the-art ML models, reveals that this choice is primarily driven by the model's excess risk. In contrast to prior literature, we show that ensembling at the model level provides more accurate variable-importance estimates, particularly for expressive models, by reducing this leading error term. We validate these findings on classical benchmarks and a large-scale proteomic study from the UK Biobank.

2602.05786 2026-05-29 cs.LG stat.AP stat.ML

Selecting Hyperparameters for Tree-Boosting

选择树提升的超参数

Floris Jan Koster, Fabio Sigrist

AI总结 本文通过59个数据集比较了多种超参数优化方法,发现SMAC方法显著优于其他方法,并揭示了超参数调优的关键因素。

详情
AI中文摘要

树提升是一种广泛用于表格数据的机器学习技术。然而,其样本外准确性严重依赖于多个超参数。在本文中,我们使用59个回归和分类数据集,实证比较了几种流行的树提升超参数优化方法,包括随机网格搜索、树结构Parzen估计器(TPE)、基于高斯过程的贝叶斯优化(GP-BO)、Hyperband、基于序列模型的算法配置(SMAC)方法以及确定性全网格搜索。我们发现SMAC方法明显优于所有其他考虑的方法。我们进一步观察到:(i)需要相对较大的试验次数(大于100)才能进行准确的调优,(ii)使用超参数的默认值会产生非常不准确的模型,(iii)所有考虑的超参数都可能对树提升的准确性产生实质性影响,即不存在一组比其他超参数更重要的超参数,以及(iv)对于回归任务,使用早停法选择提升迭代次数比将其包含在搜索空间中能产生更准确的结果。

英文摘要

Tree-boosting is a widely used machine learning technique for tabular data. However, its out-of-sample accuracy is critically dependent on multiple hyperparameters. In this article, we empirically compare several popular methods for hyperparameter optimization for tree-boosting including random grid search, the tree-structured Parzen estimator (TPE), Gaussian-process-based Bayesian optimization (GP-BO), Hyperband, the sequential model-based algorithm configuration (SMAC) method, and deterministic full grid search using $59$ regression and classification data sets. We find that the SMAC method clearly outperforms all the other considered methods. We further observe that (i) a relatively large number of trials (more than $100$) is required for accurate tuning, (ii) using default values for hyperparameters yields very inaccurate models, (iii) all considered hyperparameters can have a material effect on the accuracy of tree-boosting, i.e., there is no small set of hyperparameters that is more important than others, and (iv) for regression tasks, choosing the number of boosting iterations using early stopping yields more accurate results than including it in the search space.

2601.18728 2026-05-29 cs.LG math.DG math.OC math.ST stat.TH

Riemannian AmbientFlow: Towards Simultaneous Manifold Learning and Generative Modeling from Corrupted Data

黎曼环境流:面向从损坏数据同时进行流形学习和生成建模

Willem Diepeveen, Oscar Leong

AI总结 提出Riemannian AmbientFlow框架,通过变分推断和数据驱动黎曼几何,从损坏观测中同时学习概率生成模型和非线性数据流形,并理论保证误差可控与双Lipschitz流形参数化。

详情
AI中文摘要

现代生成建模方法在从干净样本学习复杂数据分布方面表现出强大性能。然而,在许多科学和成像应用中,干净样本不可用,只能观测到噪声或线性损坏的测量值。此外,数据中存在的潜在结构(如流形几何)对于进一步的科学分析至关重要。在这项工作中,我们引入了Riemannian AmbientFlow,一个直接从损坏观测中同时学习概率生成模型和底层非线性数据流形的框架。基于AmbientFlow的变分推断框架,我们的方法结合了由归一化流引起的数据驱动黎曼几何,通过拉回度量和黎曼自编码器提取流形结构。我们建立了理论保证,表明在适当的几何正则化和测量条件下,学习到的模型以可控误差恢复底层数据分布,并产生光滑的双Lipschitz流形参数化。我们进一步证明,所得的光滑解码器可以作为具有恢复保证的逆问题的原则性生成先验。我们在低维合成流形和MNIST上实证验证了我们的方法。

英文摘要

Modern generative modeling methods have demonstrated strong performance in learning complex data distributions from clean samples. In many scientific and imaging applications, however, clean samples are unavailable, and only noisy or linearly corrupted measurements can be observed. Moreover, latent structures, such as manifold geometries, present in the data are important to extract for further downstream scientific analysis. In this work, we introduce Riemannian AmbientFlow, a framework for simultaneously learning a probabilistic generative model and the underlying, nonlinear data manifold directly from corrupted observations. Building on the variational inference framework of AmbientFlow, our approach incorporates data-driven Riemannian geometry induced by normalizing flows, enabling the extraction of manifold structure through pullback metrics and Riemannian Autoencoders. We establish theoretical guarantees showing that, under appropriate geometric regularization and measurement conditions, the learned model recovers the underlying data distribution up to a controllable error and yields a smooth, bi-Lipschitz manifold parametrization. We further show that the resulting smooth decoder can serve as a principled generative prior for inverse problems with recovery guarantees. We empirically validate our approach on low-dimensional synthetic manifolds and on MNIST.

2511.16815 2026-05-29 stat.ML cs.LG

BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates

BITS for GAPS:用于层次高斯过程代理的贝叶斯信息论采样

Kyla D. Jones, Alexander W. Dowling

AI总结 提出BITS for GAPS框架,通过贝叶斯层次建模将超参数不确定性传播到采样准则中,实现基于高斯过程代理模型的信息论实验设计,并在汽液平衡案例中验证其提升预测精度和信息增益的效果。

详情
Journal ref
Computers & Chemical Engineering, 197, 109041 (2026)
AI中文摘要

我们引入了用于层次高斯过程代理的贝叶斯信息论采样(BITS for GAPS),这是一个框架,能够实现基于高斯过程的代理模型的信息论实验设计。与标准方法(在采集函数中使用固定或点估计的超参数)不同,我们的方法通过贝叶斯层次建模将超参数不确定性传播到采样准则中。在该框架中,潜在函数接受高斯过程先验,而超参数被赋予额外的先验以捕捉建模者对控制物理现象的知识。因此,采集函数同时包含了来自潜在函数及其超参数的不确定性,确保采样由数据稀缺性和模型不确定性共同指导。我们进一步在此背景下建立了理论结果:后验微分熵的闭式近似和下界。我们通过一个汽液平衡案例研究展示了该框架在混合建模中的实用性。具体来说,我们为二元混合物中的潜在活度系数构建了一个代理模型。通过将代理嵌入扩展形式的拉乌尔定律中,我们构建了一个混合模型。该混合模型随后用于指导蒸馏设计。该案例研究展示了如何将部分物理知识转化为层次高斯过程代理。它还表明,使用BITS for GAPS通过瞄准Wilson活度模型的高不确定性区域,增加了期望信息增益和预测准确性。总体而言,BITS for GAPS是一个用于复杂物理系统中自适应数据采集的通用不确定性感知框架。

英文摘要

We introduce Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS), a framework enabling information-theoretic experimental design of Gaussian process-based surrogate models. Unlike standard methods, which use fixed or point-estimated hyperparameters in acquisition functions, our approach propagates hyperparameter uncertainty into the sampling criterion through Bayesian hierarchical modeling. In this framework, a latent function receives a Gaussian process prior, while hyperparameters are assigned additional priors to capture the modeler's knowledge of the governing physical phenomena. Consequently, the acquisition function incorporates uncertainties from both the latent function and its hyperparameters, ensuring that sampling is guided by both data scarcity and model uncertainty. We further establish theoretical results in this context: a closed-form approximation and a lower bound of the posterior differential entropy. We demonstrate the framework's utility for hybrid modeling with a vapor-liquid equilibrium case study. Specifically, we build a surrogate model for latent activity coefficients in a binary mixture. We construct a hybrid model by embedding the surrogate into an extended form of Raoult's law. This hybrid model then informs distillation design. This case study shows how partial physical knowledge can be translated into a hierarchical Gaussian process surrogate. It also shows that using BITS for GAPS increases expected information gain and predictive accuracy by targeting high-uncertainty regions of the Wilson activity model. Overall, BITS for GAPS is a generalized uncertainty-aware framework for adaptive data acquisition in complex physical systems.

2510.16060 2026-05-29 cs.LG cs.AI stat.ME stat.ML

Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?

超越准确性:时间序列基础模型是否良好校准?

Coen Adler, Yuxin Chang, Felix Draxler, Samar Abdi, Padhraic Smyth

AI总结 本文系统评估了五个时间序列基础模型和两个基线的校准特性,发现基础模型校准优于基线且无系统性过度自信或信心不足。

详情
Journal ref
Proceedings of ICLR 2026
Comments
Published as a conference paper at ICLR 2026
AI中文摘要

最近时间序列数据基础模型的发展引起了在各种应用中使用此类模型的广泛兴趣。尽管基础模型实现了最先进的预测性能,但它们的校准特性仍然相对未被充分探索,尽管校准在许多实际应用中可能至关重要。在本文中,我们研究了五个近期时间序列基础模型和两个竞争基线的校准相关特性。我们进行了一系列系统评估,包括模型校准(即过度自信或信心不足)、不同预测头的影响以及长期自回归预测下的校准。我们发现时间序列基础模型始终比基线模型校准得更好,并且往往不会系统性地过度自信或信心不足,这与在其他深度学习模型中常见的过度自信形成对比。

英文摘要

The recent development of foundation models for time series data has generated considerable interest in using such models across a variety of applications. Although foundation models achieve state-of-the-art predictive performance, their calibration properties remain relatively underexplored, despite the fact that calibration can be critical for many practical applications. In this paper, we investigate the calibration-related properties of five recent time series foundation models and two competitive baselines. We perform a series of systematic evaluations assessing model calibration (i.e., over- or under-confidence), effects of varying prediction heads, and calibration under long-term autoregressive forecasting. We find that time series foundation models are consistently better calibrated than baseline models and tend not to be either systematically over- or under-confident, in contrast to the overconfidence often seen in other deep learning models.

2510.12152 2026-05-29 stat.ML cs.LG

Follow-the-Perturbed-Leader for Decoupled Bandits: Best-of-Both-Worlds and Practicality

解耦赌博机的跟随扰动领导者:两全其美与实用性

Chaiwon Kim, Jongyeong Lee, Min-hwan Oh

AI总结 针对解耦多臂赌博机问题,提出一种高效的跟随扰动领导者策略,在随机环境下实现常数遗憾,在对抗环境下实现最优O(√KT)遗憾,且避免了凸优化和重采样过程,显著降低计算成本。

详情
Comments
Accepted to ICML 2026, 31 pages
AI中文摘要

我们研究了解耦多臂赌博机问题,其中学习者在每一轮分别选择一个臂进行探索,并选择另一个可能不同的臂进行利用。在此设置中,探索臂的损失被观察到但不承担,而利用臂的损失被承担但不被观察到。我们提出了一种高效的跟随扰动领导者(FTPL)策略,该策略在随机环境下实现常数遗憾,在对抗环境下实现最优$O(\sqrt{KT})$遗憾,从而获得两全其美(BOBW)保证。我们方法的一个关键特征是它完全避免了先前BOBW策略所需的凸优化以及FTPL赌博机策略中通常使用的重采样过程。这使得FTPL能够充分发挥其计算效率优势,大幅降低计算成本。我们通过实验证实,我们的策略不仅提高了运行时间,而且在两种环境下都表现出优越的遗憾性能。

英文摘要

We study the decoupled multi-armed bandit problem, where the learner separately selects one arm for exploration and one, possibly different, arm for exploitation at each round. In this setting, the loss of the explored arm is observed but not incurred, whereas the loss of the exploited arm is incurred without being observed. We propose an efficient Follow-the-Perturbed-Leader (FTPL) policy that achieves a Best-of-Both-Worlds (BOBW) guarantee, with constant regret in the stochastic regime and optimal $O(\sqrt{KT})$ regret in the adversarial regime. A key feature of our method is that it completely avoids both the convex optimization required by prior BOBW policies and the resampling procedures typically used in FTPL bandit policies. This allows FTPL to fully realize its computational efficiency advantages, leading to substantial reductions in computational cost. We empirically confirm that our policy not only improves the runtime but also demonstrates superior regret performance in both regimes.

2510.10578 2026-05-29 math.PR math.ST stat.TH

On extremes for Gaussian subordination

高斯从属过程的极值理论

Shuyang Bai, Marie-Christine Duker

AI总结 本文研究高斯从属过程的极值理论,通过改进方法、推广到多元设置并引入m-极值依赖概念,建立了点过程弱收敛和多元极值极限定理。

详情
Comments
32 pages; revised based on reviewer's comments
AI中文摘要

本文研究了通过对平稳高斯过程应用变换而得到的过程(也称为从属高斯过程)的极值理论。主要贡献如下:首先,我们改进了\cite{sly2008nonstandard}的方法,允许底层高斯过程的协方差以比任何多项式速率更慢的速度衰减,几乎达到Berman条件。其次,我们将理论推广到多元设置,其中从属过程和底层高斯过程都可以是向量值,且变换是有限维的。特别地,我们建立了由从属高斯过程构造的点过程的弱收敛,从而得到多元极值极限定理。一个促进我们分析的关键观察(可能具有独立意义)是:任何从两个具有非单位典型相关的联合高斯向量变换得到的二元随机向量始终保持极值独立。这一观察也促使我们引入并讨论一个称为m-极值依赖的概念,它扩展了经典的m-依赖概念。此外,我们放宽了对有限维变换的限制,通过近似论证将结果推广到无限维设置。作为示例,我们为多元移动最大值过程建立了极限定理,该过程由具有潜在长记忆的从属高斯过程产生的规则变化创新驱动。

英文摘要

This paper investigates extreme value theory for processes obtained by applying transformations to stationary Gaussian processes, also called subordinated Gaussian processes. The main contributions are as follows. First, we refine the method of Sly and Heyde (2008) to allow the covariance of the underlying Gaussian process to decay more slowly than any polynomial rate, nearly matching Berman's condition. Second, we extend the theory to a multivariate setting, where both the subordinated process and the underlying Gaussian process may be vector-valued, and the transformation is finite-dimensional. In particular, we establish the weak convergence of a point process constructed from the subordinated Gaussian process, from which a multivariate extreme value limit theorem follows. A key observation that facilitates our analysis, and may be of independent interest, is the following: any bivariate random vector derived from transformations of two jointly Gaussian vectors with a non-unity canonical correlation always remains extremally independent. This observation also motivates us to introduce and discuss a notion we call $m$-extremal-dependence, which extends the classical concept of $m$-dependence. Moreover, we relax the restriction to finite-dimensional transforms, extending the results to infinite-dimensional settings via an approximation argument. As an illustration, we establish a limit theorem for a multivariate moving maxima process driven by regularly varying innovations that arise from subordinated Gaussian processes with potentially long memory.

2510.10020 2026-05-29 stat.ML cs.LG q-bio.BM

Calibrating Generative Models to Distributional Constraints

生成模型的分布约束校准

Henry D. Smith, Nathaniel L. Diamant, Brian L. Trippe

AI总结 针对生成模型采样分布统计量偏离期望的校准问题,提出将校准形式化为受约束优化问题,并通过松弛损失和奖励损失两种替代目标进行微调,在蛋白质设计、图像生成和语言建模等应用中显著降低了数百个同时约束下的校准误差。

详情
Comments
To appear at the International Conference on Machine Learning (ICML), 2026. Codebase accompanying the paper is available at: https://github.com/smithhenryd/cgm
AI中文摘要

生成模型经常存在校准不足的问题,即采样分布的统计量(例如给定类别中生成样本的比例)偏离期望值。我们将校准形式化为一个受约束的优化问题,并寻找在满足校准约束条件下与原始模型在Kullback-Leibler散度上最接近的模型。为了解决精确施加这些约束的难解性,我们引入了两种用于微调的替代目标:(1) 松弛损失,用校准惩罚项替代约束;(2) 奖励损失,将校准转化为奖励微调问题。我们证明,这些方法在数百个同时约束和参数高达九十亿的模型上显著降低了校准误差,应用范围涵盖蛋白质设计、图像生成和语言建模。

英文摘要

Generative models frequently suffer miscalibration, wherein statistics of the sampling distribution, such as the fraction of generations in a given class, deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying a calibration constraint. To address the intractability of imposing these constraints exactly, we introduce two surrogate objectives for fine-tuning: (1) the relax loss, which replaces the constraint with a miscalibration penalty, and (2) the reward loss, which converts calibration into a reward fine-tuning problem. We demonstrate that these approaches substantially reduce calibration error across hundreds of simultaneous constraints and models with up to nine billion parameters, spanning applications in protein design, image generation, and language modeling.
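The constrained problem has a well-known closed-form structure under one reading of the abstract. The block below assumes the divergence is taken as $\mathrm{KL}(q \,\|\, p)$ from the calibrated model $q$ to the base model $p$, and that the constraint fixes the expectation of a statistic $h$ to a target $h^{\star}$. The abstract fixes neither the direction nor the notation, so both are assumptions.

```latex
% Assumed formulation: base model p, statistic h, target value h^\star.
\min_{q}\ \mathrm{KL}(q \,\|\, p)
\quad \text{subject to} \quad
\mathbb{E}_{x \sim q}\big[h(x)\big] = h^{\star}.

% When a feasible solution with finite divergence exists, the minimizer
% is an exponential tilt of the base model:
q_{\lambda}(x) = \frac{p(x)\,\exp\!\big(\lambda^{\top} h(x)\big)}{Z(\lambda)},
\qquad
Z(\lambda) = \mathbb{E}_{x \sim p}\Big[\exp\!\big(\lambda^{\top} h(x)\big)\Big],

% with the multiplier \lambda chosen so that the constraint holds:
\nabla_{\lambda} \log Z(\lambda) = \mathbb{E}_{x \sim q_{\lambda}}\big[h(x)\big] = h^{\star}.
```

The normalizer $Z(\lambda)$ is an expectation over the whole base model, which is one way to see why imposing the constraint exactly is intractable for large generative models and why surrogate fine-tuning objectives are attractive.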

2509.21734 2026-05-29 stat.ME

Optimal Stopping for Sequential Bayesian Experimental Design

序贯贝叶斯实验设计的最优停止

Chen Cheng, Xun Huan

AI总结 针对序贯实验设计中何时停止的问题,提出基于马尔可夫决策过程的贝叶斯最优停止框架,并采用课程学习策略解决联合训练中的局部最优陷阱。

详情
AI中文摘要

序贯贝叶斯实验设计通常假设实验次数在数据收集开始前是固定的。然而,在实际操作中,实验可能需要提前终止,因为额外的测量相对于其成本可能提供递减的信息,从而引发核心决策问题:何时应该停止?常见的基于阈值的停止规则易于实现但目光短浅,因为它们将当前状态与固定标准进行比较,而未考虑未来实验的预期价值。本文通过将停止和设计表述为马尔可夫决策过程中的耦合决策,为序贯实验设计开发了一个贝叶斯最优停止框架。我们证明,对于任何设计策略,最优停止规则恰好当立即终止奖励超过预期继续价值时终止。然后,我们推导出一种用于学习基于价值的停止和设计策略的策略梯度方法。朴素的联合训练可能产生循环依赖,使学习陷入早期停止的局部最优。我们通过一种课程学习策略解决了这一困难,该策略在训练过程中逐渐从强制继续过渡到自适应停止。在线性高斯基准、一维非线性测试问题以及污染物源检测问题上的数值研究表明,所提出的方法学习了稳定的设计-停止策略,并提高了资源感知性能,在具有强序贯依赖的设置中增益最大。

英文摘要

Sequential Bayesian experimental design typically assumes that the number of experiments is fixed before data collection begins. In practical campaigns, however, experimentation may need to terminate early because additional measurements can provide diminishing information relative to their cost, raising the central decision question: when should one stop? Common threshold-based stopping rules are easy to implement but myopic, because they compare the current state with a fixed criterion without accounting for the expected value of future experiments. This work develops a Bayesian optimal stopping framework for sequential experimental design by formulating stopping and design as coupled decisions in a Markov decision process. We prove that, for any design policy, the optimal stopping rule terminates exactly when the immediate terminal reward exceeds the expected continuation value. We then derive a policy gradient method for learning value-based stopping and design policies. Naïve joint training can create a circular dependency that traps learning in early-stopping local optima. We address this difficulty with a curriculum learning strategy that gradually transitions from forced continuation to adaptive stopping during training. Numerical studies on a linear-Gaussian benchmark, a one-dimensional nonlinear test problem, and a contaminant source detection problem show that the proposed approach learns stable design-stopping policies and improves resource-aware performance, with the largest gains in settings with strong sequential dependence.
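The stopping rule stated above, stop when the immediate terminal reward is at least the expected continuation value, is the classical dynamic-programming characterization. A minimal sketch on a finite-state, finite-horizon problem follows. The transition matrix, rewards, and per-step cost are hypothetical, and the paper's policy-gradient training and curriculum are not reproduced.

```python
import numpy as np

def optimal_stopping_table(P, g, cost, horizon):
    """Backward induction for a finite-state optimal stopping problem.

    P: (n, n) transition matrix when continuing; g: (n,) terminal reward;
    cost: price of running one more experiment; horizon: maximum number of steps.
    Returns the value at step 0 and a (horizon + 1, n) table of stop decisions.
    """
    P = np.asarray(P, dtype=float)
    g = np.asarray(g, dtype=float)
    V = g.copy()                                   # at the final step one must stop
    stop = np.ones((horizon + 1, len(g)), dtype=bool)
    for step in range(horizon - 1, -1, -1):
        continuation = -cost + P @ V               # expected value of continuing
        stop[step] = g >= continuation             # stop when terminal reward wins
        V = np.maximum(g, continuation)
    return V, stop

# Two states: state 0 has no reward yet, state 1 is reached after one more step.
P = [[0.0, 1.0], [0.0, 1.0]]
g = [0.0, 1.0]
V_cheap, stop_cheap = optimal_stopping_table(P, g, cost=0.1, horizon=3)
V_costly, stop_costly = optimal_stopping_table(P, g, cost=2.0, horizon=3)
```

With a cheap experiment the rule continues from state 0 and stops in state 1. With an expensive one it stops immediately in both states, which is the behavior a fixed threshold on the current state cannot adapt to without accounting for continuation value.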

2509.21707 2026-05-29 stat.ML cs.LG stat.ME

SADA: Safe and Adaptive Aggregation of Multiple Black-Box Predictions in Semi-Supervised Learning

SADA:半监督学习中多个黑箱预测的安全自适应聚合

Jiawei Shan, Zhifeng Chen, Yiming Dong, Yazhen Wang, Jiwei Zhao

AI总结 提出一种安全自适应聚合多个不确定质量黑箱预测的方法,保证不劣于仅用标注数据,并在存在完美预测时实现更快收敛或半参数效率界。

详情
AI中文摘要

半监督学习(SSL)在实践中出现于标注数据稀缺或获取成本高昂,而大量未标注数据易于获取的情况下。随着机器学习技术的广泛采用,使用多种模型和算法(包括深度学习、大语言模型和生成式AI)生成多个预测标签已变得越来越可行。在本文中,我们提出了一种新颖方法,能够安全且自适应地聚合多个质量不确定的黑箱预测,用于推理和预测任务。我们的方法提供两个关键保证:(i)无论预测质量如何,其表现永远不会差于仅使用标注数据;(ii)如果任意一个预测(无需知道是哪一个)完美拟合真实标签,算法会自适应地利用这一点,以实现更快的收敛速度或半参数效率界。我们通过小规模模拟和两项具有不同科学目标的真实数据分析展示了所提算法的有效性。提供了用户友好的R包sada以促进实际实施。

英文摘要

Semi-supervised learning (SSL) arises in practice when labeled data are scarce or expensive to obtain, while large quantities of unlabeled data are readily available. With the growing adoption of machine learning techniques, it has become increasingly feasible to generate multiple predicted labels using a variety of models and algorithms, including deep learning, large language models, and generative AI. In this paper, we propose a novel approach that safely and adaptively aggregates multiple black-box predictions of uncertain quality for both inference and prediction tasks. Our method provides two key guarantees: (i) it never performs worse than using the labeled data alone, regardless of the quality of the predictions; and (ii) if any one of the predictions (without knowing which one) perfectly fits the ground truth, the algorithm adaptively exploits this to achieve either a faster convergence rate or the semiparametric efficiency bound. We demonstrate the effectiveness of the proposed algorithm through small-scale simulations and two real-data analyses with distinct scientific goals. A user-friendly R package, sada, is provided to facilitate practical implementation.

2509.08194 2026-05-29 cs.LG stat.ML

Prescribe-then-Select: Adaptive Policy Selection for Contextual Stochastic Optimization

先规定后选择:面向情境随机优化的自适应策略选择

Caio de Prospero Iglesias, Kimberly Villalobos Carballo, Dimitris Bertsimas

AI总结 针对情境随机优化中候选策略在协变量空间表现异质的问题,提出Prescribe-then-Select模块化框架,通过构建可行策略库并基于最优策略树集成学习元策略实现数据驱动的自适应选择,在单阶段报童和两阶段运输规划问题中优于单一最优策略。

详情
AI中文摘要

我们解决了情境随机优化中的策略选择问题,其中协变量作为情境信息可用,且决策必须满足严格的可行性约束。在许多情境随机优化场景中,来自不同建模范式的多个候选策略在协变量空间上表现出异质性能,没有单一策略能够统一占优。我们提出了Prescribe-then-Select(PS)模块化框架,该框架首先构建一个可行候选策略库,然后学习一个元策略来为观测到的协变量选择最佳策略。我们使用在训练集上通过交叉验证训练的最优策略树集成来实现元策略,使策略选择完全数据驱动。在两个基准情境随机优化问题——单阶段报童和两阶段运输规划中,PS在协变量空间的异质区域始终优于最佳单一策略,并在不存在这种异质性时收敛到占优策略。所有重现结果的代码可在https://anonymous.4open.science/r/Prescribe-then-Select-TMLR获取。

英文摘要

We address the problem of policy selection in contextual stochastic optimization (CSO), where covariates are available as contextual information and decisions must satisfy hard feasibility constraints. In many CSO settings, multiple candidate policies--arising from different modeling paradigms--exhibit heterogeneous performance across the covariate space, with no single policy uniformly dominating. We propose Prescribe-then-Select (PS), a modular framework that first constructs a library of feasible candidate policies and then learns a meta-policy to select the best policy for the observed covariates. We implement the meta-policy using ensembles of Optimal Policy Trees trained via cross-validation on the training set, making policy choice entirely data-driven. Across two benchmark CSO problems--single-stage newsvendor and two-stage shipment planning--PS consistently outperforms the best single policy in heterogeneous regimes of the covariate space and converges to the dominant policy when such heterogeneity is absent. All the code to reproduce the results can be found at https://anonymous.4open.science/r/Prescribe-then-Select-TMLR.

2509.05771 2026-05-29 stat.ML cs.LG math.OC

Risk-averse Fair Multi-class Classification

风险规避的公平多类分类

Darinka Dentcheva, Xiangyu Tian

AI总结 基于一致风险度量与系统性风险理论,提出一种适用于噪声、稀缺和标签不可靠数据的风险规避多类分类框架,并通过非线性聚合的系统方法设计两阶段随机规划及正则化分解算法,同时实现公平性增强。

详情
AI中文摘要

我们基于一致风险度量和系统性风险理论开发了一种新的分类框架。所提出的方法适用于数据存在噪声、稀缺(相对于问题维度)且标签可能不可靠的多类问题。在论文的第一部分,我们提供了使用系统性风险模型的基础,并展示了如何将其应用于线性和基于核的多类问题中。我们提出了一种通过非线性聚合的系统理论方法进行更高级的公式化,这导致了一个两阶段随机规划问题。设计了一种风险规避的正则化分解方法来求解该问题。在性能分析中,我们使用一种流行的多类方法作为所提出分类方法的基准。我们通过使用一致风险度量对该方法进行多种推广来说明我们的想法。所提出的风险规避方法的可行性在理论和数值上得到了支持。此外,我们证明了系统性风险度量的应用有助于在分类中强制执行公平性。对所提出模型的公平性进行了仔细的分析和实验。对于所有方法,我们的数值实验表明,它们在训练数据不可靠的情况下具有鲁棒性,并且在未知数据上的表现优于最小化期望分类误差的方法。此外,当类别数量增加时,性能会得到提升。

英文摘要

We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem), and the labeling might be unreliable. In the first part of our paper, we provide the foundation for the use of systemic risk models and show how to apply them in the context of linear and kernel-based multi-class problems. A more advanced formulation via a system-theoretic approach with non-linear aggregation is proposed, which leads to a two-stage stochastic programming problem. A risk-averse regularized decomposition method is designed to solve the problem. We use a popular multi-class method as a benchmark in the performance analysis of the proposed classification methods. We illustrate our ideas by proposing several generalizations of that method using coherent measures of risk. The viability of the proposed risk-averse methods is supported theoretically and numerically. Additionally, we demonstrate that the application of systemic risk measures facilitates enforcing fairness in classification. Analysis and experiments regarding the fairness of the proposed models are carefully conducted. For all methods, our numerical experiments demonstrate that they are robust in the presence of unreliable training data and perform better on unknown data than the methods minimizing expected classification errors. Furthermore, the performance improves when the number of classes increases.

2507.21429 2026-05-29 stat.ML cs.LG

From Sublinear to Linear: Local Convergence in Finite-Width Networks via Locally Polyak-Lojasiewicz Regions

从次线性到线性:通过局部Polyak-Lojasiewicz区域在有限宽度网络中的局部收敛

Agnideep Aich, Ashit Baran Aich, Bruce Wade

AI总结 本文研究有限宽度前馈网络在平方经验损失下梯度下降的局部线性收敛,通过局部Polyak-Lojasiewicz不等式和NTK正定性条件,证明了在局部拟凸区域内可实现线性收敛。

详情
AI中文摘要

我们研究了有限宽度前馈网络在平方经验损失下梯度下降的局部线性收敛。先前的工作表明,梯度下降可以保持在初始化附近的局部拟凸区域(LQCR)内,但仅给出次线性速率。我们证明,如果经验神经正切核在初始化时正定、在LQCR上Lipschitz稳定且与LQCR半径兼容,则平方损失满足局部Polyak-Łojasiewicz不等式,常数 $\mu = \lambda_0 - L_\Theta\, r(\mathcal{R}) > 0$。结合固定步长迭代包含在LQCR内(作为线性速率定理中的假设),这在该区域上产生线性收敛。LQCR提供局部化;固定步长包含作为线性速率定理中的假设;PL不等式来自平方损失下的NTK条件。因此,结果是充分的局部条件,并非声称该机制对于快速收敛是必要或唯一的。实验上,我们通过NTK谱间隙、参数漂移、经验PL比率和次优性衰减来检验理论。在二值MNIST上,NTK保持正定,PL比率有正的下包络,损失在稳定区域呈几何衰减。在宽度消融实验中,固定步长宽度1024的运行离开局部区域;减小步长将最终漂移从1.870降至0.158,恢复观察到的局部区域诊断,并产生研究中观察到的最大经验PL比率下包络。在CIFAR-10子集上的CNN鲁棒性检查显示,PL比率包络在三个种子下保持正,且在稳定区域上三个种子均有正的下包络。

英文摘要

We study local linear convergence of gradient descent for finite-width feedforward networks under the squared empirical loss. Prior work shows that GD can remain confined to a Locally Quasi-Convex Region (LQCR) around initialization, but only gives a sublinear rate. We show that if the empirical Neural Tangent Kernel is positive at initialization, Lipschitz stable on the LQCR, and compatible with the LQCR radius, then the squared loss satisfies a local Polyak-Łojasiewicz inequality with constant $\mu = \lambda_0 - L_\Theta\, r(\mathcal{R}) > 0$. Combined with fixed-step iterate containment in the LQCR, imposed as a hypothesis in the linear-rate theorem, this yields linear convergence on the region. The LQCR supplies localization; fixed-step containment is imposed as a hypothesis in the linear-rate theorem; and the PL inequality comes from NTK conditioning under squared loss. The result is therefore a sufficient local condition, not a claim that this mechanism is necessary or unique for fast convergence. Empirically, we probe the theory through NTK spectral gap, parameter drift, empirical PL ratio, and suboptimality decay. On binary MNIST, the NTK remains positive, the PL ratio has a positive lower envelope, and the loss shows geometric decay on the stable regime. In a width ablation, the fixed-step width-$1024$ run leaves the local regime; reducing the step size lowers final drift from $1.870$ to $0.158$, restores the observed local-regime diagnostics, and yields the largest empirical PL-ratio lower envelope observed in the study. A CNN robustness check on a CIFAR-10 subset shows the PL-ratio envelope remains positive across three seeds, with a positive lower envelope across all three seeds on the stable regime.
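
The chain from NTK conditioning to a linear rate can be sketched as follows. This is a standard textbook-style argument under assumptions that are not taken from the paper: the loss is normalized as $L(\theta) = \tfrac{1}{2}\|e(\theta)\|^2$ with residual $e(\theta)$ and Jacobian $J(\theta)$, the kernel is $\Theta(\theta) = J(\theta)J(\theta)^\top$, Lipschitz stability is measured in operator norm, and $L$ is $\beta$-smooth on the region. The paper's own constants and normalization may differ.

```latex
% Assumed setup: L(\theta) = \tfrac12 \|e(\theta)\|^2, \quad \nabla L(\theta) = J(\theta)^\top e(\theta).
\begin{align*}
\|\nabla L(\theta)\|^2
  &= e(\theta)^\top \Theta(\theta)\, e(\theta)
   \;\ge\; \lambda_{\min}\!\big(\Theta(\theta)\big)\, \|e(\theta)\|^2
   \;=\; 2\,\lambda_{\min}\!\big(\Theta(\theta)\big)\, L(\theta), \\
\lambda_{\min}\!\big(\Theta(\theta)\big)
  &\ge \lambda_0 - \|\Theta(\theta) - \Theta(\theta_0)\|_{\mathrm{op}}
   \;\ge\; \lambda_0 - L_\Theta \|\theta - \theta_0\|
   \;\ge\; \lambda_0 - L_\Theta\, r(\mathcal{R}) \;=\; \mu
   \qquad \text{(Weyl's inequality)}, \\
\tfrac12 \|\nabla L(\theta)\|^2 &\ge \mu\, L(\theta)
   \qquad \text{(local PL inequality on the region)}, \\
L(\theta_{t+1})
  &\le L(\theta_t) - \eta\Big(1 - \tfrac{\beta\eta}{2}\Big)\|\nabla L(\theta_t)\|^2
   \;\le\; (1 - \eta\mu)\, L(\theta_t)
   \qquad \text{for } \eta \le 1/\beta .
\end{align*}
```

The last line applies only while the iterates stay inside the region, which is why containment has to be assumed or verified separately.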

2506.08028 2026-05-29 eess.SY cs.SY stat.AP

Sensor Fusion for Track Geometry Monitoring: Integrating On-Board Condition Monitoring and Degradation Models via Kalman Filtering

轨道几何监测的传感器融合:通过卡尔曼滤波集成车载状态监测与退化模型

Huy Truong-Ba, Jacky Chin, Michael E. Cholette, Pietro Borghesani

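AI summary: This study proposes fusing vibration signals from low-cost on-board sensors with degradation models through Kalman filtering to improve the reliability of track geometry predictions, and verifies experimentally that frequent sensor data substantially reduces prediction uncertainty.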

详情
AI中文摘要

轨道几何监测对于维护铁路运营的安全性和效率至关重要。虽然轨道检测车(TRCs)能提供轨道几何指标的精确测量,但其有限的可用性和高昂的运营成本限制了在大型铁路网络中的频繁监测。近年来,安装在运营列车上的车载传感器系统提供了一种成本效益高的替代方案,能够实现高频但精度较低的数据采集。本研究提出一种方法,通过卡尔曼滤波框架将低精度传感器振动信号与退化模型相结合,以增强轨道几何预测的可靠性。一项使用安装在TRC上的低成本传感器系统的实验活动评估了所提出的方法。结果表明,即使数据存在噪声,融入频繁的传感器数据也能显著降低预测不确定性。研究还探讨了数据记录频率如何影响可信预测区间的大小,为有效轨道监测和维护规划中车载传感器的最优部署提供指导。

英文摘要

Track geometry monitoring is essential for maintaining the safety and efficiency of railway operations. While Track Recording Cars (TRCs) provide accurate measurements of track geometry indicators, their limited availability and high operational costs restrict frequent monitoring across large rail networks. Recent advancements in on-board sensor systems installed on in-service trains offer a cost-effective alternative by enabling high-frequency, albeit less accurate, data collection. This study proposes a method to enhance the reliability of track geometry predictions by integrating low-accuracy sensor vibration signals with degradation models through a Kalman filter framework. An experimental campaign using a low-cost sensor system mounted on a TRC evaluates the proposed approach. The results demonstrate that incorporating frequent sensor data significantly reduces prediction uncertainty, even when the data is noisy. The study also investigates how the frequency of data recording influences the size of the credible prediction interval, providing guidance on the optimal deployment of on-board sensors for effective track monitoring and maintenance planning.
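
The fusion idea can be illustrated with a minimal scalar Kalman filter. This is a sketch under assumptions that do not come from the paper: the track geometry indicator follows a linear-drift random walk, both sources observe the state directly, and all numeric values (`drift`, `q`, `r_trc`, `r_sensor`, the 30-step inspection interval) are hypothetical.

```python
import numpy as np

def kalman_filter(n_steps, drift, q, observations, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_k = x_{k-1} + drift + w_k, w_k ~ N(0, q).

    observations maps a step index to a list of (value, noise_variance)
    pairs, so sources of different accuracy can be fused at the same step.
    Returns the posterior means and variances after each step.
    """
    x, p = x0, p0
    means, variances = np.empty(n_steps), np.empty(n_steps)
    for k in range(n_steps):
        # Predict with the (assumed) linear degradation model.
        x, p = x + drift, p + q
        # Update once per measurement available at this step.
        for z, r in observations.get(k, []):
            gain = p / (p + r)
            x = x + gain * (z - x)
            p = (1.0 - gain) * p
        means[k], variances[k] = x, p
    return means, variances

# Hypothetical simulation settings.
rng = np.random.default_rng(0)
n_steps, drift, q = 60, 0.05, 0.01
truth = np.cumsum(drift + rng.normal(0.0, np.sqrt(q), n_steps))

r_trc, r_sensor = 0.01, 0.25  # accurate but rare vs. noisy but frequent
trc_only = {k: [(truth[k] + rng.normal(0.0, np.sqrt(r_trc)), r_trc)]
            for k in range(0, n_steps, 30)}
fused = {k: list(trc_only.get(k, [])) for k in range(n_steps)}
for k in range(n_steps):
    fused[k].append((truth[k] + rng.normal(0.0, np.sqrt(r_sensor)), r_sensor))

_, var_trc = kalman_filter(n_steps, drift, q, trc_only)
_, var_fused = kalman_filter(n_steps, drift, q, fused)
print(f"final variance, TRC only: {var_trc[-1]:.3f}")
print(f"final variance, TRC + on-board sensor: {var_fused[-1]:.3f}")
```

In this toy setting the posterior variance between inspections grows linearly when only the rare accurate measurements are used, while the noisy per-step readings keep it bounded. That matches the qualitative effect the abstract describes.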

2505.20634 2026-05-29 cs.LG stat.ML

Explaining Concept Shift with Interpretable Feature Attribution

用可解释的特征归因解释概念漂移

Ruiqi Lyu, Alistair Turcan, Bryan Wilder

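AI summary: Proposes SGShift, which models concept shift as a feature selection task and uses statistical tools such as generalized additive models, knockoffs, and absorption to identify the sparse set of shifted features behind performance differences between source- and target-domain models.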

详情
AI中文摘要

当特征条件标签分布在域间发生变化时,就会发生概念漂移,这可能导致即使调优良好的机器学习模型在新域上校准失效。识别这些漂移特征可以独特地揭示域间特征-标签关系如何不同,考虑到这种差异可能跨越科学相关的维度(如时间、疾病状态、人群等)。在本文中,我们提出SGShift,一种将表格数据中概念漂移导致的性能下降归因于稀疏漂移特征集的方法。我们将概念漂移框架化为特征选择任务,以学习能够解释源域和目标域模型间性能差异的特征。该框架使SGShift能够适应强大的统计工具,如广义加性模型、敲除和吸收,以识别这些漂移特征。我们在各种机器学习模型的合成数据和真实数据上进行了广泛实验,发现SGShift比基线方法更准确地识别漂移特征,在漂移域中所需样本少,并且对复杂的概念漂移情况具有鲁棒性。

英文摘要

Concept shift occurs when the distribution of labels conditioned on the features changes between domains, which can make even a well-tuned ML model miscalibrated on a new domain. Identifying these shifted features provides unique insight into how feature-label relationships differ between domains, particularly when the domains differ along a scientifically relevant dimension such as time, disease status, or population. In this paper, we propose SGShift, a method for attributing performance degradation under concept shift in tabular data to a sparse set of shifted features. We frame concept shift as a feature selection task to learn the features that can explain performance differences between models in the source and target domains. This framework enables SGShift to adapt powerful statistical tools such as generalized additive models, knockoffs, and absorption to identify these shifted features. We conduct extensive experiments on synthetic and real data across various ML models and find SGShift can identify shifted features much more accurately than baseline methods, requires few samples in the shifted domain, and is robust to complex cases of concept shift.

2505.02743 2026-05-29 cs.LG stat.ML

Cooperative Variance Estimation and Bayesian Neural Networks for Disentangling Aleatoric and Epistemic Uncertainties

合作方差估计与贝叶斯神经网络用于分离偶然不确定性和认知不确定性

Jiaxiang Yi, Miguel A. Bessa

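AI summary: Proposes cooperatively training a variance estimation network with a Bayesian neural network to disentangle aleatoric and epistemic uncertainty while improving mean estimation.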

详情
Comments
38 pages, 26 figures
AI中文摘要

真实世界的数据包含偶然不确定性——由不完美的测量或对数据生成过程的不完全了解引起的不可约噪声。均值-方差估计网络可以学习这种类型的不确定性,但需要即兴的正则化策略以避免过拟合,并且无法预测认知不确定性(模型不确定性)。相反,贝叶斯神经网络可以预测认知不确定性,但由于贝叶斯推断的近似性质,它们以难以训练而著称。我们提出合作训练一个方差估计网络与一个贝叶斯神经网络,并通过实验证明,所得模型在改善均值估计的同时分离了偶然不确定性和认知不确定性。我们展示了该方法在多种数据集上的有效性和可扩展性,包括我们创建的一个时间依赖异方差回归数据集,其中偶然不确定性是已知的。所提出的方法易于实现、鲁棒,并且适用于各种模型架构。

英文摘要

Real-world data contains aleatoric uncertainty: irreducible noise arising from imperfect measurements or from incomplete knowledge about the data generation process. Mean-variance estimation networks can learn this type of uncertainty but require ad-hoc regularization strategies to avoid overfitting and are unable to predict epistemic uncertainty (model uncertainty). Conversely, Bayesian neural networks predict epistemic uncertainty but are notoriously difficult to train due to the approximate nature of Bayesian inference. We propose to cooperatively train a variance estimation network with a Bayesian neural network and empirically demonstrate that the resulting model disentangles aleatoric and epistemic uncertainties while improving the mean estimation. We demonstrate the effectiveness and scalability of this method across a diverse range of datasets, including a time-dependent heteroscedastic regression dataset we created where the aleatoric uncertainty is known. The proposed method is straightforward to implement, robust, and adaptable to various model architectures.

2502.04867 2026-05-29 stat.AP

Invariant Image Reparameterisation: Bridging Symbolic and Numerical Methods for Identifiability Analysis, Model Reduction, and Prediction

不变图像重参数化:连接符号与数值方法进行可辨识性分析、模型简化与预测

Oliver J. Maclaren, Ruanui Nicholson, Joel A. Trent, Joshua Rottenberry, Matthew Simpson

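AI summary: This paper proposes Invariant Image Reparameterisation (IIR), which replaces symbolic reparameterisation conditions with numerical derivative calculations at a single reference point to support model reduction, identifiability analysis, and prediction.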

详情
Comments
41 pages incl. supplementary material (main text approx. 28 pages)
AI中文摘要

当数学模型用于解释数据时,结构性和实践性参数不可辨识问题很常见。此类问题促使了模型重参数化和简化方法的发展。本文考虑不变图像重参数化(IIR),探讨何时可将符号重参数化条件替换为单参考点处的数值导数计算。核心对象是不变图像:一种简化且与基无关的表示,用于描述控制可观测模型行为的参数组合。我们证明,当存在一一对应的分量变换使得可观测行为仅依赖于变换后参数的固定线性组合时,单个数值雅可比矩阵即可确定相关的低维重参数化空间。这包括依赖于原始参数单项式组合的模型。我们还给出了一阶不变性条件,通过局部零空间的不变部分区分最小简化与非最小但精确的简化。在结构可辨识但实践弱信息的情况下,相同的计算可分离强信息与弱信息的参数组合。不变图像支持多种坐标表示:奇异值分解(SVD)提供默认的按局部可辨识性排序的标准正交基,而稀疏单项式基通常更具可解释性。将这些坐标作为剖面分析(Profile-Wise Analysis)中的关注参数,可得到基于似然的不确定性量化和预测。我们在具有泊松极限、扩展泊松极限和非极限情况的参数化正态模型上,以及在基因调控的非线性微分方程模型repressilator上演示了该方法。IIR的Julia实现(包含这些示例及更多案例)可在https://github.com/omaclaren/reparam获取。

英文摘要

Structural and practical parameter non-identifiability issues are common when mathematical models are used to interpret data. Such issues motivate model reparameterisation and reduction methods. Here, we consider Invariant Image Reparameterisation (IIR), which asks when symbolic reparameterisation conditions can be replaced by numerical derivative calculations at a single reference point. The central object is the invariant image: a reduced, basis-independent representation of the parameter combinations controlling observable model behaviour. We show that when a one-to-one componentwise transformation makes observable behaviour depend only on fixed linear combinations of the transformed parameters, a single numerical Jacobian determines the associated lower-dimensional reparameterisation space. This includes models depending on monomial combinations of the original parameters. We also give a first-order invariance condition that distinguishes minimal from non-minimal but exact reductions via the invariant part of the local null space. In structurally identifiable but practically weakly informed settings, the same calculations separate strongly and weakly informed parameter combinations. The invariant image admits multiple coordinate representations: the SVD gives a default orthonormal basis ordered by local identifiability, while sparse monomial bases are often more interpretable. Treating these coordinates as interest parameters in Profile-Wise Analysis gives likelihood-based uncertainty quantification and prediction. We demonstrate the method on parameterised normal models with Poisson-limit, extended Poisson-limit, and non-limit cases, and on the repressilator, a nonlinear differential equation model of gene regulation. A Julia implementation of IIR, with these and further examples, is available at https://github.com/omaclaren/reparam.
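
The single-Jacobian idea for monomial combinations can be illustrated with a toy model. This is a hedged sketch, not the authors' Julia implementation: the `observable` function, the reference point `theta_ref`, and the finite-difference helper `log_jacobian` are hypothetical, and the componentwise transformation is assumed to be the logarithm.

```python
import numpy as np

def observable(theta):
    """Toy model whose outputs depend on (a, b) only through the product a * b."""
    a, b = theta
    return np.array([a * b, (a * b) ** 2, np.exp(-a * b)])

def log_jacobian(f, theta, h=1e-6):
    """Central-difference Jacobian of f with respect to log(theta)."""
    phi = np.log(theta)
    cols = []
    for i in range(len(phi)):
        step = np.zeros_like(phi)
        step[i] = h
        cols.append((f(np.exp(phi + step)) - f(np.exp(phi - step))) / (2 * h))
    return np.column_stack(cols)

theta_ref = np.array([2.0, 0.5])  # single reference point
J = log_jacobian(observable, theta_ref)
U, s, Vt = np.linalg.svd(J)

print("singular values:", s)
print("leading right singular vector:", Vt[0])
```

In log coordinates the product `a * b` becomes the linear combination `log a + log b`, so the Jacobian has numerical rank one and its leading right singular vector is proportional to `(1, 1)`. The near-zero singular value flags the non-identifiable direction `log a - log b`.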

2410.19371 2026-05-29 stat.ML cs.CR cs.LG

Noise-Aware Differentially Private Variational Inference

噪声感知的差分隐私变分推断

Talal Alrawajfeh, Joonas Jälkö, Antti Honkela

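AI summary: To address unreliable downstream inference under differential privacy, proposes a noise-aware approximate Bayesian inference method based on stochastic gradient variational inference that applies to high-dimensional and non-conjugate models, together with a more accurate evaluation method for noise-aware posteriors.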

详情
Comments
26 pages, 4 figures
AI中文摘要

差分隐私(DP)为统计推断提供了强大的隐私保证,但这可能导致下游应用中不可靠的结果和偏差。尽管已有几种将DP扰动纳入推断的噪声感知方法被提出,但它们仅限于特定类型的简单概率模型。在这项工作中,我们提出了一种基于随机梯度变分推断的噪声感知近似贝叶斯推断新方法,该方法也可应用于高维和非共轭模型。我们还提出了一种更精确的噪声感知后验评估方法。实验表明,我们的推断方法在现有方法适用的领域具有相似的性能。在该领域之外,我们在高维贝叶斯线性回归上获得了准确的覆盖率,并在UCI成人数据集上的贝叶斯逻辑回归上获得了校准良好的预测概率。

英文摘要

Differential privacy (DP) provides robust privacy guarantees for statistical inference, but this can lead to unreliable results and biases in downstream applications. While several noise-aware approaches that integrate DP perturbation into the inference have been proposed, they are limited to specific types of simple probabilistic models. In this work, we propose a novel method for noise-aware approximate Bayesian inference based on stochastic gradient variational inference, which can also be applied to high-dimensional and non-conjugate models. We also propose a more accurate evaluation method for noise-aware posteriors. Empirically, our inference method has similar performance to existing methods in the domain where they are applicable. Outside this domain, we obtain accurate coverages on high-dimensional Bayesian linear regression and well-calibrated predictive probabilities on Bayesian logistic regression with the UCI Adult dataset.

2408.15451 2026-05-29 cs.LG cs.CR stat.ME

Certified Causal Defense with Generalizable Robustness

具有泛化鲁棒性的认证因果防御

Yiran Qiao, Yu Yin, Chen Chen, Jing Ma

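AI summary: Proposes the GLEAN framework, which uses certifiable causal factor learning to disentangle causal relations from spurious correlations and designs a causally certified defense strategy, so that robustness generalizes across domains with distribution shifts.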

详情
Comments
Accepted by AAAI 2025
AI中文摘要

尽管机器学习模型在各种场景中已被证明有效,但普遍认为许多模型容易受到对抗性攻击。近年来,出现了大量对抗性防御的研究。其中,认证防御因其对输入在特定范围内(例如$l_2$球)的任意对抗性扰动具有理论保证而闻名。然而,该领域现有的大多数工作难以将其认证鲁棒性泛化到具有分布偏移的其他数据域中。这一问题的根源在于难以消除不同域中虚假相关性对鲁棒性的负面影响。为解决此问题,本文提出了一种新颖的认证防御框架GLEAN,该框架将因果视角引入认证防御的泛化问题。具体而言,我们的框架集成了一个可认证的因果因子学习组件,以解耦输入与标签之间的因果关系和虚假相关性,从而排除虚假相关性对防御的负面影响。在此基础上,我们设计了一种因果认证防御策略来处理对潜在因果因子的对抗性攻击。通过这种方式,我们的框架不仅对训练分布中数据上的恶意噪声具有鲁棒性,而且能够将其鲁棒性泛化到具有分布偏移的各个域中。在基准数据集上的大量实验验证了我们的框架在不同数据域中认证鲁棒性泛化的优越性。代码见补充材料。

英文摘要

While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Numerous adversarial defense efforts have emerged recently. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range (e.g., an $\ell_2$ ball). However, most existing works in this line struggle to generalize their certified robustness to other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, in this work, we propose a novel certified defense framework, GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby exclude the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noise on data in the training distribution but can also generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains. Code is available in the supplementary materials.

2407.04142 2026-05-29 stat.ME

Bayesian Structured Mediation Analysis With Unobserved Confounders

贝叶斯结构化中介分析:存在未观测混杂因素

Yuliang Xu, Shu Yang, Jian Kang

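AI summary: To address unobserved confounding in high-dimensional mediators with spatially smooth structure (such as brain imaging data), proposes the Bayesian Structured Mediation analysis with Unobserved confounders (BASMU) framework, which debiases mediation effects by introducing latent individual effects as unobserved confounders, and establishes model identifiability conditions and a two-stage estimation algorithm.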

详情
AI中文摘要

我们探索了减少未观测混杂因素对具有空间平滑结构的高维中介变量(如脑成像数据)的因果中介分析影响的方法。关键方法是将影响结构化中介变量的潜在个体效应作为未观测混杂因素纳入结果模型,从而可能对中介效应进行去偏。我们开发了贝叶斯结构化中介分析(BASMU)框架,并建立了其模型可识别性条件。当中介分析中忽略未观测混杂因素时,我们对自然间接效应(NIE)和自然直接效应(NDE)的渐近偏差进行了理论分析。针对BASMU,我们提出了一种两阶段估计算法,以减轻这些未观测混杂因素对估计中介效应的影响。大量模拟表明,BASMU在各种场景下显著减少了偏差。我们将BASMU应用于青少年脑认知发展(ABCD)研究的fMRI数据分析,重点关注先前报道显示具有有意义中介效应的四个脑区。与现有的图像中介分析方法相比,BASMU识别出具有显著中介效应的体素数量增加了两到四倍,其中NIE增加了41%,NDE减少了26%。

英文摘要

We explore methods to reduce the impact of unobserved confounders on the causal mediation analysis of high-dimensional mediators with spatially smooth structures, such as brain imaging data. The key approach is to incorporate the latent individual effects, which influence the structured mediators, as unobserved confounders in the outcome model, thereby potentially debiasing the mediation effects. We develop the BAyesian Structured Mediation analysis with Unobserved confounders (BASMU) framework and establish its model identifiability conditions. Theoretical analysis is conducted on the asymptotic bias of the Natural Indirect Effect (NIE) and the Natural Direct Effect (NDE) when the unobserved confounders are omitted in mediation analysis. For BASMU, we propose a two-stage estimation algorithm to mitigate the impact of these unobserved confounders on estimating the mediation effect. Extensive simulations demonstrate that BASMU substantially reduces the bias in various scenarios. We apply BASMU to the analysis of fMRI data in the Adolescent Brain Cognitive Development (ABCD) study, focusing on four brain regions previously reported to exhibit meaningful mediation effects. Compared with the existing image mediation analysis method, BASMU identifies two to four times more voxels that have significant mediation effects, with the NIE increased by 41% and the NDE decreased by 26%.

2402.01866 2026-05-29 stat.ME

Parametric Bootstrap for Fixed Edge-Probability Network Models

固定边概率网络模型的参数自助法

Zhixuan Shao, Can M. Le

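AI summary: For uncertainty quantification of network statistics under the Chung-Lu model, proposes a two-level bootstrap that reduces the bias of the parametric bootstrap and yields more accurate confidence intervals.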

详情
AI中文摘要

本文研究网络数据的参数自助法,旨在量化感兴趣的网络统计量的不确定性。现有的网络重抽样方法主要关注节点可交换图模型下的计数统计量,而我们考虑在未假设节点可交换性的Chung-Lu模型下更一般的网络统计量,包括局部统计量。我们表明,自然的网络参数自助法(先估计网络生成模型,再从估计模型中抽取自助样本)通常存在自助偏差。作为通用补救措施,我们证明两层自助程序可证明地减少这种偏差。这将经典迭代自助法的思想扩展到网络设置中,其中参数数量随网络规模增长。此外,对于许多网络统计量,第二层自助法提供了构建更高精度置信区间的方法。作为该分析的副产品,我们还得到了非齐次Erdos-Rényi模型下子图计数的中心极限定理,这可能具有独立意义。

英文摘要

This paper studies parametric bootstrap methods for network data, with the goal of quantifying the uncertainty of network statistics of interest. While existing network resampling methods primarily focus on count statistics under node-exchangeable graphon models, we consider more general network statistics, including local statistics, under the Chung-Lu model without assuming node exchangeability. We show that the natural network parametric bootstrap, which first estimates the network-generating model and then draws bootstrap samples from the estimated model, generally suffers from bootstrap bias. As a general remedy, we show that a two-level bootstrap procedure provably reduces this bias. This extends the classical idea of the iterative bootstrap to the network setting, where the number of parameters grows with the network size. Moreover, for many network statistics, the second-level bootstrap provides a way to construct confidence intervals with higher accuracy. As a by-product of this analysis, we also obtain a central limit theorem for subgraph counts under the inhomogeneous Erdős-Rényi model, which may be of independent interest.
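
The "natural" parametric bootstrap described above (fit the model, then resample from the fit) might look like the following. This is a minimal sketch under assumptions not confirmed by the paper: edge probabilities are estimated by the plug-in form `d_i * d_j / sum(d)` with self-loops excluded, the statistic is the triangle count, and all function names and simulation settings are hypothetical. The paper's two-level bias correction is not reproduced here.

```python
import numpy as np

def fit_chung_lu(adj):
    """Plug-in Chung-Lu edge probabilities d_i * d_j / sum(d), clipped to [0, 1]."""
    deg = adj.sum(axis=1).astype(float)
    total = deg.sum()
    if total == 0:
        return np.zeros_like(adj, dtype=float)
    probs = np.clip(np.outer(deg, deg) / total, 0.0, 1.0)
    np.fill_diagonal(probs, 0.0)  # assumption: no self-loops
    return probs

def sample_graph(probs, rng):
    """Draw a symmetric 0/1 adjacency matrix with independent edges."""
    n = probs.shape[0]
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)

def triangle_count(adj):
    """Number of triangles, from the trace of the cubed adjacency matrix."""
    return int(np.trace(adj @ adj @ adj)) // 6

def parametric_bootstrap(adj, statistic, n_boot, rng):
    """First-level bootstrap: estimate the model once, then resample from it."""
    probs = fit_chung_lu(adj)
    return np.array([statistic(sample_graph(probs, rng)) for _ in range(n_boot)])

# Hypothetical "observed" network drawn from a known Chung-Lu model.
rng = np.random.default_rng(1)
weights = rng.uniform(2.0, 10.0, size=40)
true_probs = np.clip(np.outer(weights, weights) / weights.sum(), 0.0, 1.0)
np.fill_diagonal(true_probs, 0.0)
observed = sample_graph(true_probs, rng)

boot_stats = parametric_bootstrap(observed, triangle_count, n_boot=200, rng=rng)
print("observed triangles:", triangle_count(observed))
print("bootstrap mean and std:", boot_stats.mean(), boot_stats.std())
```

Comparing the bootstrap mean with the observed statistic across repeated simulations gives a rough view of the bootstrap bias that the abstract says a second bootstrap level is meant to reduce.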

2308.13222 2026-05-29 physics.comp-ph cs.LG physics.flu-dyn stat.ML

Bayesian Reasoning for Physics Informed Neural Networks

物理信息神经网络的贝叶斯推理

Krzysztof M. Graczyk, Kornel Witkowski

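AI summary: Proposes an evidence-driven Bayesian physics-informed neural network method that computes the model evidence efficiently via a Laplace approximation, automatically optimizes the loss weights between PDE residuals, boundary conditions, and observational data, and is validated for solution accuracy and uncertainty quantification on the heat, wave, and Burgers' equations.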

详情
Journal ref
Phys. Rev. E 113, 055307 (2026)
Comments
21 pages, 12 figures, re-edit the description of the Bayesian framework, some of the content moved to Appendix. Discussion of numerical performance added, as well as related approaches
AI中文摘要

我们引入了一种基于证据驱动的贝叶斯物理信息神经网络公式,能够自动优化偏微分方程残差、边界条件和观测数据之间的损失权重。与现有基于采样或变分推理的贝叶斯PINN方法不同,所提方法使用拉普拉斯近似解析计算模型证据,从而无需后验采样即可实现高效的超参数调优和模型比较。我们在热方程、波动方程和伯格斯方程上演示了该方法,获得了与精确解或参考解一致的结果。在伯格斯方程示例中,我们进一步展示了该框架自然地整合了控制方程和含噪声测量中的信息,在统一的贝叶斯框架内提供了预测不确定性。

英文摘要

We introduce an evidence-driven Bayesian formulation of physics-informed neural networks that enables automatic optimization of loss weights between PDE residuals, boundary conditions, and observational data. Unlike existing Bayesian PINN approaches based on sampling or variational inference, the proposed method uses a Laplace approximation to compute model evidence analytically, enabling efficient hyperparameter tuning and model comparison without posterior sampling. We demonstrate the method on the heat, wave, and Burgers' equations, obtaining solutions in agreement with exact or reference results. In the Burgers' equation example, we further show that the framework naturally integrates information from governing equations and noisy measurements, providing predictive uncertainties within a unified Bayesian setting.
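
For orientation, the generic Laplace approximation to the model evidence is given below. The notation is assumed rather than taken from the paper: $\theta \in \mathbb{R}^d$ are network weights, $\alpha$ collects the hyperparameters (for instance the loss weights), $\mathcal{D}$ stands for the data and physics constraints, and $\theta^\ast$ is the posterior mode. The paper's exact expression may differ.

```latex
% Generic Laplace approximation to the evidence; symbols are assumptions.
\begin{align*}
\log p(\mathcal{D} \mid \alpha)
  &\approx \log p(\mathcal{D} \mid \theta^\ast, \alpha)
   + \log p(\theta^\ast \mid \alpha)
   + \frac{d}{2}\log(2\pi)
   - \frac{1}{2}\log\det H, \\
H &= -\nabla_\theta^2 \,\log\!\big[\, p(\mathcal{D} \mid \theta, \alpha)\, p(\theta \mid \alpha) \,\big]\Big|_{\theta = \theta^\ast}.
\end{align*}
```

Maximizing this quantity over $\alpha$ is what allows hyperparameters to be tuned and models to be compared without drawing posterior samples.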

2212.08549 2026-05-29 stat.CO astro-ph.IM hep-lat hep-th

Microcanonical Hamiltonian Monte Carlo

微正则哈密顿蒙特卡洛

Jakob Robnik, G. Bruno De Luca, Eva Silverstein, Uroš Seljak

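AI summary: This paper proposes Microcanonical Hamiltonian Monte Carlo (MCHMC), which achieves ergodicity through fixed-energy Hamiltonian dynamics and energy-conserving momentum bounces, and develops an underdamped Langevin-like variant (MCLMC) with continuous partial direction-preserving bounces; on several benchmark problems it outperforms NUTS HMC, in some cases by more than an order of magnitude.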

详情
Comments
34 pages, 11 figures
AI中文摘要

我们发展了微正则哈密顿蒙特卡洛(MCHMC),这是一类遵循固定能量哈密顿动力学的模型,与遵循不同能量水平正则分布的哈密顿蒙特卡洛(HMC)形成对比。MCHMC调整哈密顿函数,使得动量变量上常能量曲面的均匀分布的边缘分布给出期望的目标分布。我们证明MCHMC需要偶尔的能量守恒台球式动量反弹以实现遍历性,类似于HMC中的动量重采样。我们将反弹概念推广到连续版本,在每一步进行部分方向保持反弹,这给出了具有非高斯噪声的能量守恒欠阻尼朗之万动力学(MCLMC)。MCHMC和MCLMC在条件数和维度上表现出有利的缩放性质。我们开发了一种高效的超参数调整方案,该方案在几个标准基准问题上实现了高性能并始终优于NUTS HMC,在某些情况下性能提升超过一个数量级。

英文摘要

We develop Microcanonical Hamiltonian Monte Carlo (MCHMC), a class of models that follow fixed-energy Hamiltonian dynamics, in contrast to Hamiltonian Monte Carlo (HMC), which follows the canonical distribution with different energy levels. MCHMC tunes the Hamiltonian function such that the marginal of the uniform distribution on the constant-energy surface over the momentum variables gives the desired target distribution. We show that MCHMC requires occasional energy-conserving billiard-like momentum bounces for ergodicity, analogous to momentum resampling in HMC. We generalize the concept of bounces to a continuous version with partial direction-preserving bounces at every step, which gives an energy-conserving underdamped Langevin-like dynamics with non-Gaussian noise (MCLMC). MCHMC and MCLMC exhibit favorable scalings with condition number and dimensionality. We develop an efficient hyperparameter tuning scheme that achieves high performance and consistently outperforms NUTS HMC on several standard benchmark problems, in some cases by more than an order of magnitude.