arXivDaily: arXiv daily academic digest, updated Monday to Friday

1. Statistical Theory and Methods (7 papers)

2606.20406 2026-06-19 stat.ME stat.CO New submission

Flexible modeling of bimodal distributions via skewed-$t$ mixtures

Marco Bee, Flavio Santi

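AI summary: Proposes a mixture model based on the skewed-$t$ distribution of Fernández and Steel (1998), with maximum likelihood estimation via the EM algorithm and a likelihood ratio test, for fitting bimodal, skewed and heavy-tailed data; bimodality is confirmed for the S&P 500 distortion.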

Abstract

We propose a mixture of location-scale skewed-$t$ distributions to fit bimodal, skewed and heavy-tailed data. In particular, the mixture is based on the skewed-$t$ distribution by Fernández and Steel (1998), so that the model-building procedure can be easily extended to mixtures of other symmetric distributions. After studying the properties of the mixture, we develop a maximum likelihood estimation approach via the EM algorithm and a likelihood ratio test of the null hypothesis of no skewness in any given component. A simulation-based comparison to a recently proposed mixture of g-and-h distributions suggests that the performance of the proposed model is excellent, in terms of both estimation precision in well-specified setups and modeling capability in mis-specified frameworks. Fitting the model to the Standard & Poor's 500 distortion allows us to confirm the bimodality of its distribution, with the implication that the US stock market has historically been in bearish or bullish conditions, rather than near its fundamental value.
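
A minimal Python sketch (standard library only) of the Fernández and Steel (1998) skewing mechanism applied to a Student-$t$ kernel, and of a two-component location-scale mixture, may help fix ideas. The function names, the parameter order `(mu, sigma, nu, gamma)` and the example values are illustrative assumptions, not the authors' code; EM estimation and the likelihood ratio test are not shown. The skewing divides positive arguments by $\gamma$ and multiplies negative ones by $\gamma$, with normalizing constant $2/(\gamma + 1/\gamma)$, so $\gamma = 1$ recovers the symmetric $t$.

```python
import math


def student_t_pdf(x: float, nu: float) -> float:
    """Density of the standard Student-t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def fs_skewed_t_pdf(x: float, nu: float, gamma: float) -> float:
    """Fernandez-Steel skewing of the Student-t: the right half is stretched
    by gamma and the left half by 1/gamma; gamma = 1 gives the symmetric t."""
    norm = 2.0 / (gamma + 1.0 / gamma)
    if x >= 0:
        return norm * student_t_pdf(x / gamma, nu)
    return norm * student_t_pdf(x * gamma, nu)


def ls_skewed_t_pdf(x: float, mu: float, sigma: float,
                    nu: float, gamma: float) -> float:
    """Location-scale version: (1 / sigma) * p((x - mu) / sigma)."""
    return fs_skewed_t_pdf((x - mu) / sigma, nu, gamma) / sigma


def mixture_pdf(x: float, weight: float, comp1: tuple, comp2: tuple) -> float:
    """Two-component mixture; each component is (mu, sigma, nu, gamma)."""
    return (weight * ls_skewed_t_pdf(x, *comp1)
            + (1.0 - weight) * ls_skewed_t_pdf(x, *comp2))
```

For example, `mixture_pdf(x, 0.4, (-2.0, 1.0, 5.0, 0.7), (3.0, 1.5, 5.0, 1.4))` evaluates a hypothetical bimodal density with a left-skewed component centred at $-2$ ($\gamma < 1$) and a right-skewed component centred at $3$ ($\gamma > 1$).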

2606.20226 2026-06-19 stat.ME stat.CO New submission

Analysis of uncertain fixed-effects model for Latin square designs

Yaru Cheng, Zhiming Li

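AI summary: For uncertain experimental data lacking frequency stability, builds uncertain fixed-effects models for Latin square designs, proposes three estimation methods with confidence intervals, carries out uncertain homogeneity and common tests, and validates the models through numerical simulations and real examples.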

Abstract

Uncertain data without frequency stability often arises in experimental design. Classical fixed-effects models can only analyze precise experimental data. Based on an uncertain measure, this paper establishes uncertain fixed-effects models for Latin-square designs. First, we propose three methods with uncertainty to estimate the treatment and block effects and construct their confidence intervals. Then, uncertain homogeneity and common tests are conducted to assess the significance of treatment effects. In the numerical simulations, the three estimation methods are compared based on bias, mean squared error, mean absolute error, overall standard deviation, coverage probability, and average interval length. Several examples are given to illustrate the process of estimation and hypothesis testing. Finally, the uncertain fixed-effects model is applied to real education data, demonstrating its practical value.

2606.20069 2026-06-19 stat.ME New submission

A minimum-risk and cost-efficient two-sample sequential testing framework for the shifted exponential models with application to precipitation data

Ashwani Rajput, Neeraj Joshi

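AI summary: Proposes a double sequential sampling framework that tests the difference between the location parameters of two shifted exponential models by controlling the type I error probability while minimizing a loss function that combines the type II error probability and the sampling cost; the procedure enjoys first- and second-order efficiency and second-order risk efficiency.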

Abstract

This paper investigates the problem of comparing the location parameters of two shifted exponential models through a novel double sequential sampling framework. The proposed hypothesis testing procedure is developed by controlling the type I error probability at a preassigned level while minimizing a loss function that incorporates both the type II error probability and the associated sampling cost. The corresponding optimal fixed-sample-size expressions are shown to depend on unknown scale parameters, rendering the desired testing accuracies unattainable in practice under fixed-sample designs. To overcome this difficulty, a double sequential sampling procedure is proposed to test the difference between location parameters when the scale parameters are unknown and unequal. The proposed methodology is shown to possess desirable asymptotic properties, including first-order efficiency, second-order efficiency, and second-order risk efficiency. Extensive simulation studies and a real-data application that involves heavy precipitation episodes at meteorological stations demonstrate the practical effectiveness and applicability of the proposed procedure.
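
Because the optimal fixed sample sizes depend on the unknown scale parameters, a sequential rule has to re-estimate them as observations arrive. As a minimal sketch of the estimation ingredients only (the authors' stopping rule and loss function are not reproduced here), the standard estimators for a shifted exponential sample are the sample minimum for the location and $\sum_i (X_i - X_{(1)})/(n-1)$, which is unbiased for the scale. The function names below are hypothetical.

```python
from typing import Sequence, Tuple


def shifted_exp_estimates(sample: Sequence[float]) -> Tuple[float, float]:
    """Return (location, scale) estimates for a shifted exponential sample:
    the sample minimum, and sum(x - min) / (n - 1), which is unbiased for the scale."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to estimate the scale")
    loc = min(sample)
    scale = sum(x - loc for x in sample) / (n - 1)
    return loc, scale


def location_difference(sample_1: Sequence[float], sample_2: Sequence[float]) -> float:
    """Plug-in estimate of mu_1 - mu_2 from two independent samples."""
    return shifted_exp_estimates(sample_1)[0] - shifted_exp_estimates(sample_2)[0]
```

In a sequential design both functions would simply be re-evaluated after each new observation from either population, so that the scale estimates, and hence the target sample sizes, are updated on the fly.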

2606.19737 2026-06-19 stat.ME stat.ML New submission

Calibration without labels in multiple testing

Adway S. Wadekar, Jake A. Soloff

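AI summary: Addresses the fact that true labels are never observed in multiple testing by constructing pseudo-labels from the spacings of ordered $p$-values, enabling calibration of local false discovery rates; shows that $q$-values in the psychology and neuroscience literature can be severely miscalibrated.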

Abstract

Large-scale hypothesis testing supports probability claims about individual hypotheses, as in empirical Bayes methods for estimating local false discovery rates. We study how such claims can be interpreted as approximately calibrated forecasts of the null hypothesis, yielding interpretable error probabilities even under model misspecification. Our approach draws conceptual inspiration from probabilistic forecasting but addresses a different challenge: unlike forecasting, where labels are eventually observed, in multiple testing the ground truth is never revealed, so calibration must be assessed stochastically and established indirectly. We address this challenge by constructing a set of pseudo-labels, derived from the spacings of ordered $p$-values, which have the local false discovery rate as their regression target. Our construction unlocks existing tools for assessing and performing post-hoc calibration in multiple testing. Notably, we find on a large-scale empirical survey of published psychology and neuroscience literature that the $q$-value, a popular error measure based on the false discovery rate, can be severely miscalibrated.
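
As background for the final finding, one common way to compute $q$-values is the Benjamini-Hochberg adjustment, which coincides with Storey's $q$-values when the null proportion is set to 1. The sketch below shows only that computation; it is not the paper's pseudo-label construction, and whether the surveyed studies used this exact variant is an assumption. The function name is hypothetical.

```python
from typing import List, Sequence


def bh_q_values(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (Storey q-values with pi0 = 1):
    q_(i) = min over j >= i of m * p_(j) / j, capped at 1, in original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, m * p_values[i] / rank)
        q[i] = running_min
    return q
```

A $q$-value is a set-level quantity (the smallest FDR level at which the hypothesis would be rejected), which is one reason reading it as the probability that an individual hypothesis is null can be misleading.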

2606.19580 2026-06-19 stat.ME stat.ML New submission

Machine Learning Integrated in Wavelet Shrinkage (MLShrink)

Dixon Vimalajeewa, Vijini Lakmini, Brani Vidakovic

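AI summary: Proposes MLShrink, which combines wavelet shrinkage with machine learning: two thresholds are used, and coefficients in the intermediate band are classified in a data-adaptive way while the simplicity of classical thresholding is retained elsewhere; nonexpansiveness and oracle consistency are proved, and the method performs well on non-smooth signals.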

Abstract

Data encountered in practice are frequently contaminated by additive noise, and wavelet shrinkage remains a fundamental tool for recovering underlying signals in nonparametric estimation. Classical procedures such as hard and soft thresholding decide whether to retain a wavelet coefficient almost entirely from its magnitude. Although effective in many settings, these rules can be too rigid for coefficients whose magnitudes fall in an intermediate region where the distinction between signal and noise is uncertain. We propose MLShrink, a two-threshold wavelet denoising procedure that combines wavelet shrinkage with machine learning. Coefficients below a lower threshold are discarded, coefficients above an upper threshold are retained, and coefficients in the intermediate band are classified using local wavelet-domain features. In this way, MLShrink preserves the simplicity of classical thresholding away from the decision boundary while allowing data-adaptive decisions for ambiguous coefficients. The paper also develops a theoretical framework tailored to this architecture. We show that MLShrink is a nonexpansive support-selection rule, derive an oracle-based risk decomposition showing that excess denoising risk is determined by classification errors on the undecided band, and establish an oracle-consistency result under suitable assumptions on classifier performance. Simulation experiments on standard benchmark signals indicate that MLShrink is competitive with several established wavelet shrinkage methods and is especially effective for signals with irregular, edge-rich, or non-smooth structure. These findings suggest that learned decisions on the intermediate threshold band provide a useful and interpretable connection between classical wavelet denoising and modern statistical learning.
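
The two-threshold architecture can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the `classify(i, coeffs)` callable stands in for the learned classifier, and `neighbour_energy_classifier` is a hypothetical toy rule using one local wavelet-domain feature (the mean squared magnitude of adjacent coefficients).

```python
from typing import Callable, List, Sequence

# Hypothetical classifier signature: (index, coefficients) -> keep the coefficient?
Classifier = Callable[[int, Sequence[float]], bool]


def two_threshold_shrink(coeffs: Sequence[float], lower: float, upper: float,
                         classify: Classifier) -> List[float]:
    """Keep-or-kill rule in the spirit of MLShrink:
    |d| <= lower -> 0, |d| >= upper -> keep, otherwise ask the classifier."""
    if not 0 <= lower <= upper:
        raise ValueError("need 0 <= lower <= upper")
    out = []
    for i, d in enumerate(coeffs):
        if abs(d) <= lower:
            out.append(0.0)
        elif abs(d) >= upper:
            out.append(d)
        else:  # undecided band
            out.append(d if classify(i, coeffs) else 0.0)
    return out


def neighbour_energy_classifier(threshold: float) -> Classifier:
    """Toy stand-in for a learned classifier: keep a coefficient when the mean
    squared magnitude of its immediate neighbours exceeds `threshold`."""
    def classify(i: int, coeffs: Sequence[float]) -> bool:
        nbrs = [coeffs[j] for j in (i - 1, i + 1) if 0 <= j < len(coeffs)]
        if not nbrs:
            return False
        return sum(c * c for c in nbrs) / len(nbrs) > threshold
    return classify
```

With `coeffs = [0.1, 5.0, 1.5, 0.2, 1.5, 0.1]`, thresholds `1.0` and `3.0`, and `neighbour_energy_classifier(4.0)`, the two coefficients of magnitude 1.5 are treated differently: the one next to the large coefficient is kept and the isolated one is set to zero. Every output is either the input coefficient or zero, so magnitudes never increase.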

2606.19572 2026-06-19 stat.ME New submission

SCOPE Shrinkage: A Unified Framework for Wavelet Denoising

Dixon Vimalajeewa, Vijini Lakmini, Malith Premarathna, Fabrizio Ruggeri, Brani Vidakovic

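AI summary: Proposes SCOPE, a family of shrinkage rules built from the cumulative distribution functions of symmetric unimodal distributions, with two interpretable parameters that separate scale and shape effects and balance strong local shrinkage against asymptotic unbiasedness; it combines competitive performance with interpretability in wavelet denoising.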

Abstract

We introduce Symmetric CDF Oriented Probability Enhanced (SCOPE) shrinkage, a unified family of sign-preserving shrinkage rules constructed from centered cumulative distribution functions of symmetric unimodal distributions. The proposed framework generates a broad class of attenuation profiles that interpolate between strong local shrinkage near zero and asymptotically unbiased behavior in the tails. A general formulation is developed that separates scale and shape effects through two interpretable parameters, allowing effective threshold location and transition sharpness to be controlled independently. Under explicit regularity assumptions, structural properties of SCOPE shrinkage are established, including oddness, monotonicity, continuity, contractivity, and a mixture representation that connects the rules to softened thresholding operators. A Bayesian and penalized likelihood interpretation is also developed: SCOPE rules admit even penalty representations that are nondecreasing in coefficient magnitude, and suitable subclasses arise as exact maximum a posteriori estimators under proper symmetric unimodal priors. Representative examples based on logistic, uniform, and Cauchy distributions illustrate how probabilistic shape governs shrinkage behavior. Data-driven parameter selection for smooth subclasses is discussed via Stein-type unbiased risk estimation. Oracle-calibrated simulation studies on standard Donoho-Johnstone test functions show that SCOPE shrinkage performs competitively with several established wavelet denoising methods, while retaining a high degree of interpretability and structural flexibility. The results highlight centered distribution functions as a natural and versatile design principle for shrinkage in wavelet denoising and related estimation problems.
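
As a hedged illustration of how a centered CDF can define a sign-preserving shrinkage rule, the sketch below uses the hypothetical one-parameter form $\delta(d) = d\,[2F(|d|/\lambda) - 1]$ with $F$ the standard logistic CDF, for which $2F(t) - 1 = \tanh(t/2)$. This form is an assumption chosen because it is odd, monotone and contractive, shrinks strongly near zero and has vanishing bias in the tails; the paper's general two-parameter formulation, which controls threshold location and transition sharpness separately, may differ. The function names are hypothetical.

```python
import math


def centered_logistic_cdf(t: float) -> float:
    """2F(t) - 1 for the standard logistic CDF F; equals tanh(t / 2)."""
    return math.tanh(t / 2.0)


def scope_like_shrink(d: float, lam: float) -> float:
    """Hypothetical CDF-based rule: delta(d) = d * (2F(|d| / lam) - 1).
    Odd in d, |delta(d)| <= |d|, close to 0 near d = 0, and
    delta(d) - d -> 0 as |d| grows (logistic tails are light)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return d * centered_logistic_cdf(abs(d) / lam)
```

Here `lam` plays the role of a scale (effective threshold) parameter, while swapping the logistic CDF for another symmetric unimodal CDF changes the shape of the transition.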

2606.18933 2026-06-19 cs.LG cs.IR stat.ME New submission

Zero-Shot Active Feature Acquisition via LLM-Elicitation

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

Affiliations: Faculty of EE, Technion; Faculty of Medicine, Technion; CytoReason; NVIDIA

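AI summary: Proposes a zero-shot active feature acquisition framework that elicits the sufficient statistics of a Markov random field from an LLM, addressing the scarcity of labeled data; it outperforms existing methods in diagnosing IBD patients.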

Abstract

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amounts of labeled data to fit probabilistic models guiding acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.

2. Bayesian Statistics and Probabilistic Modeling (3 papers)

2606.19540 2026-06-19 stat.ME stat.CO stat.ML New submission

Overfitted high-dimensional matrix factorizations via adaptive spectral shrinkage

Lorenzo Mauri, David B. Dunson

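AI summary: Proposes EigenBayes, which uses spectral estimation and adaptive empirical Bayes calibration of hyperparameters to fit overfitted factor models quickly and with uncertainty quantification; it outperforms existing methods in numerical experiments and a genomics application.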

Abstract

Factor models are popular approaches for analyzing high-dimensional data to extract low-rank signals and estimate covariances. They decompose the covariance matrix as the sum of low-rank and diagonal components. A key issue is how to choose the latent dimension $k$, which is particularly challenging when the factor model only holds approximately and in low signal-to-noise scenarios. Bayesian overfitted factor models specify an upper bound on $k$ and rely on structured shrinkage priors to effectively remove extra components. Such approaches are popular and effective, but computationally expensive. We propose a much faster \texttt{EigenBayes} approach that provides valid uncertainty quantification, based on spectral estimation of latent factors and adaptive empirical Bayes calibration of key hyperparameters. The resulting posterior distribution factorizes across outcomes and is analytically tractable, bypassing Markov chain Monte Carlo. We show that \texttt{EigenBayes} adapts to the signal-to-noise ratio of each outcome and latent dimension, while shrinking superfluous latent components to zero. We establish favorable asymptotic properties and demonstrate strong empirical performance in numerical experiments and a genomics application, where EigenBayes outperforms state-of-the-art alternatives.

2606.19643 2026-06-19 stat.ML cs.LG New submission

Variational Consensus Monte Carlo for Bayesian Mixture

Julie Fendler, Francesca L. Crowe, Tom Marshall, Sylvia Richardson, Paul D. W. Kirk

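AI summary: Extends variational Consensus Monte Carlo to overfitted Bayesian mixture models; with novel cluster-matching algorithms and aggregation strategies, it infers the number of clusters and all parameters in a federated learning setting, and is validated on simulated data and real electronic health record data.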

Abstract

Motivated by the privacy, sensitivity and sharing limitations of health data, we present a comprehensive pipeline for inference of Bayesian mixture models within a federated learning setting, i.e. when data cannot be fully shared or pooled across compute nodes. We adopt a Consensus Monte Carlo (CMC) approach, in which an MCMC algorithm is run independently within each data silo to estimate local posterior distributions, which are then aggregated to approximate the posterior over the full data. The variational CMC approach of Rabinovich, Angelino and Jordan (2015) [1] frames the aggregation step as a variational inference problem, but their application to mixtures assumes the number of clusters and key mixture parameters to be known. Our main methodological contributions are: (i) an extension of variational CMC to over-fitted Bayesian mixture models that infer the number of clusters and all model parameters, without requiring conjugacy; (ii) novel cluster-matching algorithms suitable for cross-silo settings in which not every cluster appears in each local dataset; (iii) a number of inference strategies for the aggregation step, matched to different federated learning constraints; and (iv) guidelines for choosing among these in practice. A comprehensive simulation study validates the framework and allows us to compare to state-of-the-art federated learning alternatives. Notably, we show that when the composition of local datasets reflects the underlying clustering structure in the data, our approach can recover small clusters with greater accuracy than standard MCMC applied to the pooled data. We illustrate the framework on large-scale electronic health record data, identifying multi-morbidity patterns in a British geriatric population.
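
For contrast with the variational aggregation discussed above, the original Consensus Monte Carlo combination rule of Scott et al. (2016) forms each consensus draw as a weighted average of the corresponding draws from all silos, with weights equal to the inverse subposterior variance; the rule is exact when the subposteriors are Gaussian. The scalar sketch below (hypothetical function name, illustrative inputs) shows only that basic rule. It does not handle label switching or clusters missing from some silos, which is precisely what the paper's cluster-matching and aggregation contributions address.

```python
import statistics
from typing import List, Sequence


def consensus_draws(silo_draws: Sequence[Sequence[float]]) -> List[float]:
    """Combine scalar subposterior draws from several silos.

    silo_draws[k][s] is draw s from silo k; every silo must supply the same
    number of draws. Weights are inverse sample variances of each silo's draws."""
    n_draws = len(silo_draws[0])
    if any(len(d) != n_draws for d in silo_draws):
        raise ValueError("all silos must provide the same number of draws")
    weights = [1.0 / statistics.variance(d) for d in silo_draws]
    total = sum(weights)
    return [sum(w * d[s] for w, d in zip(weights, silo_draws)) / total
            for s in range(n_draws)]
```

Only summaries of each silo's own draws are combined, which is why this family of methods suits settings where raw data cannot be pooled.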

2606.20480 2026-06-19 math.ST stat.ML stat.TH New submission

Leveraging tails for adaptation

Sergios Agapiou, Ismaël Castillo, Paul Egels

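AI summary: Studies posterior contraction rates in nonparametric Bayesian models whose priors have $p$-exponential tails, finding that contraction improves as $p$ decreases and that adaptation to smoothness is achieved as $p\to 0$; applications include white noise regression and ReLU neural networks.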

Comments: 59 pages, 3 figures

Abstract

We consider contraction of Bayesian posterior distributions in nonparametric settings where coefficients of a function over a basis or dictionary are given priors with $p$-exponential tails, including Laplace tails $(p=1)$ and heavier tails $(p<1)$. It is shown that contraction rates improve as $p$ decreases and that full adaptation to smoothness, up to logarithmic factors, is obtained in an appropriate $p\to 0$ regime. As applications, we consider both series priors in white noise regression and shallow ReLU neural networks in random design regression. In particular, we show that overparametrised shallow ReLU networks can adapt to any regularity $0\le \beta\le 2$. Through a simulation study, we show strong empirical agreement with the behavior predicted by our theory.
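
A density proportional to $\exp(-|x|^p)$ has Laplace tails for $p=1$ and heavier tails for $p<1$. One convenient way to simulate from it (a standard representation, not taken from the paper) uses $|X|^p \sim \mathrm{Gamma}(1/p, 1)$ together with an independent random sign; the normalized density is $p\,\exp(-|x|^p)/(2\Gamma(1/p))$. The sketch below uses hypothetical function names and omits any scaling of the coefficients that the paper's priors may include.

```python
import math
import random


def sample_p_exponential(p: float, rng: random.Random) -> float:
    """Draw X with density proportional to exp(-|x|**p), p > 0.
    Uses |X|**p ~ Gamma(1/p, 1) and an independent symmetric sign."""
    if p <= 0:
        raise ValueError("p must be positive")
    magnitude = rng.gammavariate(1.0 / p, 1.0) ** (1.0 / p)
    return magnitude if rng.random() < 0.5 else -magnitude


def log_density_p_exponential(x: float, p: float) -> float:
    """Normalized log density: log(p / (2 * Gamma(1/p))) - |x|**p."""
    return math.log(p / (2.0 * math.gamma(1.0 / p))) - abs(x) ** p
```

Drawing many coefficients with, say, `p = 0.5` and `p = 1` and comparing their largest absolute values gives a quick feel for how much heavier the tails become as $p$ decreases.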

3. Causal Inference and Experimental Design (5 papers)

2606.20148 2026-06-19 stat.ME New submission

A case study of causal mediation using Bayesian nonparametrics and semiparametric corrections

Yuhua Zhang, Michael J. Daniels

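AI summary: Proposes a truncated enriched Dirichlet process mixture model to estimate natural direct and indirect effects, combining an efficient MCMC algorithm with a one-step posterior correction based on the efficient influence function to obtain reliable inference for causal estimands in Bayesian nonparametrics.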

Abstract

We propose a Bayesian nonparametric approach using a truncated Enriched Dirichlet Process mixture (EDPM) model to estimate natural direct (NDE) and indirect (NIE) effects in causal mediation analyses in the presence of post-treatment confounders. We introduce an efficient cluster reallocation Metropolis-Hastings algorithm to improve mixing in the blocked Gibbs sampler. We implement a one-step posterior correction based on the efficient influence function for our setting. This post-processing step solves a critical problem in Bayesian nonparametrics: how to obtain reliable estimates and posteriors for a specific causal estimand of interest (the NDE and NIE) with excellent frequentist properties, such as correct coverage, from a model designed for complex joint distributions. We conduct simulation studies to assess our method's performance and apply it to evaluate causal mediation effects in a weight management clinical trial.

2606.20078 2026-06-19 stat.OT New submission

A Law of Iterated Expectation Primer for Causal Inference

Ashley I. Naimi, Razieh Nabi, Lindsay J. Collin, Paul N. Zivich, Stephen R. Cole

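AI summary: Introduces the law of iterated expectation and its use in causal effect identification, building mathematical intuition through two nonparametrically equivalent forms of the g-formula (NICE and ICE) and three numerical examples.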

Abstract

The g-formula is a foundational tool for identifying causal effects in observational data. This tool is based on the law of iterated expectation, a key mathematical identity in statistics. However, the notation with which the law of iterated expectation and the g-formula are expressed can be opaque to those with little background in statistics. We provide a primer introducing the law of iterated expectation, the integration notation used to express it, and its role for causal effect identification via the g-formula. Under the assumptions of causal consistency, positivity, and conditional exchangeability, the law of iterated expectation can be rewritten as a causal standardization formula (the g-formula) in two nonparametrically equivalent forms: a non-iterative conditional expectation (NICE) form involving a single weighted average of conditional outcome means, and an iterative conditional expectation (ICE) form involving nested expectations. We illustrate both forms using three progressively complex numerical examples: a time-fixed example with a single binary confounder, a time-fixed example with discrete and continuous confounders, and a time-varying example with two timepoints. We provide clarity on what the law of iterated expectation is, how it is related to the g-formula, and how to gain intuition of its mathematical formulations in actual data examples that can be generalized to a range of settings.
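
In the spirit of the primer's first example (a time-fixed setting with a single binary confounder $L$), the sketch below computes the g-formula $E[Y^a] = \sum_l E[Y \mid A=a, L=l]\,P(L=l)$ in its NICE form, and the same quantity as the ICE-style average $\frac{1}{n}\sum_i \hat E[Y \mid A=a, L=L_i]$. The ten records in `TOY_DATA` are made-up illustrative data, not the paper's, and the function names are hypothetical. With a single time point the two forms coincide, because averaging the fitted conditional mean over the empirical distribution of $L$ is the same weighted sum. The code assumes positivity, i.e. every $(a, l)$ cell is non-empty.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[int, int, int]  # (L, A, Y)

# Made-up records for illustration only.
TOY_DATA: List[Record] = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 1, 1),
                          (0, 1, 0), (1, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0)]


def cond_means(data: List[Record]) -> Dict[Tuple[int, int], float]:
    """Empirical E[Y | A = a, L = l] for every observed (a, l) cell."""
    sums: Counter = Counter()
    counts: Counter = Counter()
    for l, a, y in data:
        sums[(a, l)] += y
        counts[(a, l)] += 1
    return {key: sums[key] / counts[key] for key in counts}


def g_formula_nice(data: List[Record], a: int) -> float:
    """NICE form: sum over l of E[Y | A=a, L=l] * P(L=l)."""
    means = cond_means(data)
    l_counts = Counter(l for l, _, _ in data)
    n = len(data)
    # A KeyError here signals an empty (a, l) cell, i.e. a positivity violation.
    return sum(means[(a, l)] * c / n for l, c in l_counts.items())


def g_formula_ice(data: List[Record], a: int) -> float:
    """ICE-style form: average the fitted E[Y | A=a, L=L_i] over all individuals."""
    means = cond_means(data)
    return sum(means[(a, l)] for l, _, _ in data) / len(data)
```

On these toy data both forms give $E[Y^1] = 17/30 \approx 0.567$ and $E[Y^0] = 11/20 = 0.55$, a standardized risk difference of $1/60 \approx 0.017$.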

2606.20206 2026-06-19 stat.ML cs.LG New submission

Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

Ziheng Wei, Annie Qu, Rui Miao

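AI summary: For offline reinforcement learning with rewards missing not at random, proposes an identification strategy that uses future states as shadow variables, and recovers the conditional mean reward with a bridge function and a min-max estimator, enabling off-policy evaluation of missingness-aware policies.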

Comments: Accepted at ICML 2026. 31 pages, 6 figures

Abstract

In offline Reinforcement Learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or censored beyond certain reward values. This issue arises in practical settings, including health care and marketing. We investigate off-policy evaluation (OPE) in finite-horizon Markov decision processes when rewards are missing not at random (MNAR), which breaks ignorability and induces selection bias even after conditioning on states and actions. To address this, we formalize a reward-dependent propensity model and use future states as shadow variables to identify the full-data conditional mean reward. We further introduce a bridge function that recovers the conditional mean reward without explicitly modeling the MNAR mechanism, and estimate it via a min-max procedure to avoid double sampling. Building upon these identification results, we propose a Fitted-Q-Evaluation-style estimator that propagates the recovered rewards while allowing target policies to depend on past missingness indicators. Finally, we establish consistency and finite-sample error bounds for our OPE estimator, and show through experiments the strong performance of our method compared to existing methods on simulated and MIMIC-III Sepsis data.

2606.17308 2026-06-19 stat.ME stat.ML New submission

Kernel-Based Functional Balancing for Causal Inference with Compositional Treatments

Sungbum Kim, Jiayi Wang

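AI summary: For causal effect estimation with compositional treatments (exposures on a simplex), proposes kernel-based covariate functional balancing weights obtained by minimizing the worst-case balancing error over a reproducing kernel Hilbert space, and builds an augmented weighted estimator that achieves $\sqrt{n}$-consistency.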

Comments: 40 pages, 3 figures

Abstract

We study causal effect estimation with compositional treatments, where the exposure lies on a simplex and the estimand is defined over compositions rather than scalar or binary values. By considering a projection of the average potential outcome onto the treatment space, a kernel-based covariate functional balancing approach is adopted for weight construction. The weights are obtained by directly minimizing a worst-case balancing error over a reproducing kernel Hilbert space (RKHS) defined on the joint space of treatments and covariates, instead of being estimated under a treatment assignment model. Building on these weights, an augmented weighted estimator (AWE) is proposed, where the outcome function is estimated via kernel ridge regression and combined with a marginal augmentation over the covariate distribution. Despite the complex structure of the resulting objective, a finite-dimensional convex optimization problem is formulated via a representer theorem and a low-rank approximation. The proposed estimator achieves $\sqrt{n}$-consistency without requiring consistent estimation or smoothness of the weights. An asymptotic normality result is established around a sample-specific target. Empirical performance is demonstrated through simulation studies and a real data application.

2606.17165 2026-06-19 stat.ME cs.AI econ.EM math.ST stat.TH New submission

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

Joel Persson, Mårten Schultzberg, Sebastian Ankargren

Affiliation: Spotify USA, Inc.

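AI summary: Proposes a surrogacy framework showing that calibrating LLM outputs identifies the average treatment effect under conditions weaker than distributional equivalence, and analyzes the bias and variance introduced by LLM stochasticity.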

Abstract

Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes can recover the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs, showing that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. We present a falsification test for surrogacy and a bound on the worst-case bias from limited overlap between the LLM and human samples. We further show that the stochasticity inherent to LLMs can weaken surrogacy for identification while also introducing bias and variance during estimation, but that using an average over multiple LLM draws per unit as the surrogate mitigates these issues. Simulations validate the results, and an empirical application to A/B tests on Upworthy headlines shows that raw LLM predictions recover only 39% of the human treatment effect while nonparametric calibration closes the gap. A central takeaway is that A/B testing on LLMs yields correct results only by assumption, whereas A/B testing on humans is correct by design, and that the required assumptions are hardest to justify precisely where A/B testing on LLMs promises the greatest benefit. We discuss the role of LLM choice, prompting, and temperature as design variables, the compounded challenge posed by long-term outcomes, and how to size human pilot studies for validation.

4. High-Dimensional Statistics and Regularization (1 paper)

2606.20514 2026-06-19 stat.ME New submission

Hypergraph Variable Selection with False Discovery Rate Control

具有错误发现率控制的超图变量选择

Sarah Organ, Toby Kenney, Hong Gu

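AI summary: Addresses the loss of power of variable selection methods when predictors have complex dependence structures by proposing a hypergraph-based selection method that increases power while controlling the false discovery rate.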

Comments: 28 pages, 4 figures

Abstract

Variable selection methods that control the false discovery rate often lose power when predictors exhibit complex dependence structures. We previously showed that selecting hierarchically clustered groups of predictors can mitigate this issue while maintaining false discovery rate control. When correlations are less structured, however, overlapping predictor sets may be more effective. We introduce a generalized false discovery rate for hypotheses defined on sets of predictors and propose a hypergraph-based selection method. This approach achieves higher power across diverse settings while preserving rigorous false discovery rate control.

5. Computational Statistics and MCMC (5 papers)

2606.20191 2026-06-19 stat.ML stat.ME New submission

AK-MCS-C2: Active Kriging Monte Carlo Simulation method with conformal certification for failure probability estimation

Edgar Jaber, Vincent Chabridon, Mathilde Mougeot

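AI summary: Proposes an active-learning framework that combines Active Kriging Monte Carlo simulation with conformal prediction; using an adaptive cross-conformal strategy and the J+GP conformal estimator, it provides distribution-free guarantees on prediction errors in small-sample settings, makes the classification of samples near the limit-state surface more reliable, and thereby improves the accuracy and robustness of failure probability estimates.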

英文摘要

We introduce a novel active-learning framework for failure probability estimation in structural reliability analysis that integrates Active Kriging Monte Carlo simulation with conformal prediction. The proposed approach employs an adaptive cross-conformal strategy specifically designed for small-sample settings and kriging surrogate models using the J+GP conformal estimator. Unlike standard AK-MCS methods, the proposed framework provides distribution-free guarantees on prediction errors, leading to more reliable classification of samples near the limit-state surface. This improved uncertainty quantification enhances both the accuracy and robustness of failure probability estimates, especially for rare-event regimes where such efficiency is crucial. Reproducible numerical results illustrate the effectiveness of the method and also compare it to classical approaches on well-established benchmarks.
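The role of conformal prediction near the limit-state surface can be illustrated with plain split conformal prediction. This sketch assumes absolute residuals from a held-out calibration set and the convention that $g(x) \le 0$ denotes failure; it does not implement the adaptive cross-conformal J+GP estimator from the abstract, and the function names are hypothetical.

```python
import math

def conformal_quantile(residuals, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest absolute residual."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(abs(r) for r in residuals)[k - 1]

def classify(prediction, q):
    """Label a point from the interval [prediction - q, prediction + q] for g(x).

    g(x) <= 0 is taken as failure (a common convention, assumed here)."""
    lower, upper = prediction - q, prediction + q
    if upper <= 0:
        return "fail"
    if lower > 0:
        return "safe"
    return "uncertain"  # the interval straddles the limit state g = 0
```

Points labelled `uncertain` are those whose interval crosses $g = 0$; an active-learning loop in the spirit of AK-MCS would evaluate the expensive model at such points next, while `safe` and `fail` points can be counted directly in the Monte Carlo estimate of the failure probability.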

2606.20141 2026-06-19 stat.CO 新提交

DASH: A Dimensionality Reduction Method for Large-scale Convex MIQP with Applications in Subset Portfolio Selection

DASH: 一种用于大规模凸MIQP的降维方法及其在子集投资组合选择中的应用

Pinzhang Cheng

AI总结 提出DASH降维方法,通过减少变量层次改善大规模凸MIQP求解器性能,在子集投资组合选择中显著提升Gurobi难以求解问题的初始解质量。

AI中文摘要

作为MIP(混合整数规划)的子集选择问题是NP难的。对于大规模问题,在合理时间内找到全局最优解是不可行的,实践中常通过MIP求解器寻找高质量的初始解。本文提出DASH(递减活动集层次)——一种降维方法,针对可表述为MIQP(混合整数二次规划)的一类最佳子集选择问题,提高MIP求解器的性能。我们在子集投资组合选择问题中开发并评估了DASH的性能,并与商业MIP求解器Gurobi进行了比较。除了问题规模外,问题的难度还与协方差矩阵的条件数以及投资组合权重的箱约束有关。大量不同问题配置的数值实验表明,当Gurobi难以求解问题时,DASH能持续显著改进初始解。特别是,DASH改进的幅度和持续时间随问题难度增加而扩大。

英文摘要

Subset selection problems formulated as MIPs (Mixed Integer Programs) are NP-hard. For large-scale problems, finding globally optimal solutions in a reasonable time is infeasible, so in practice MIP solvers are used to seek good-quality incumbent solutions. This paper proposes DASH (Decreasing Active Set Hierarchy), a dimensionality reduction method that improves MIP solver performance for a subclass of best subset selection problems that can be formulated as MIQPs (Mixed Integer Quadratic Programs). We develop and evaluate DASH on the subset portfolio selection problem in comparison with Gurobi, a commercial MIP solver. Beyond problem size, the difficulty of a problem is related to the condition number of the covariance matrix and the box constraint on portfolio weights. An extensive set of numerical experiments with varying problem configurations shows that DASH consistently and significantly improves incumbent solutions when the problem is difficult for Gurobi to solve. In particular, the magnitude and duration of DASH's improvement scale with the difficulty of the problem.

2606.19909 2026-06-19 stat.CO math.PR stat.ME 新提交

Establishing an $\Omega(\sqrt{d})$ complexity lower bound for PDMP samplers and how to break it: a sub-$\sqrt{d}$ algorithm for Gaussian-tailed targets

建立 PDMP 采样器的 $\Omega(\sqrt{d})$ 复杂度下界及如何突破:针对高斯尾目标的一个亚 $\sqrt{d}$ 算法

Augustin Chevallier

AI总结 本文证明分段确定性马尔可夫过程采样器在标准设置下具有 $\Omega(\sqrt{d})$ 复杂度下界,并通过放宽目标密度连续时间不变性假设,提出一种新方案,对高斯尾目标实现 $O(d^\alpha)$($\alpha\in[0.2,0.3]$)的经验复杂度。

AI中文摘要

尽管分段确定性马尔可夫过程(PDMP)采样器在理论上有非可逆性的吸引力,但迄今为止,尚未开发出在计算复杂度上相对于目标维度 $d$ 优于 $\mathcal{O}(\sqrt{d})$ 的 PDMP 采样器。我们通过在标准设置中建立 PDMP 采样器算法复杂度的 $\Omega(\sqrt{d})$ 下界,证明这是一个基本限制。通过放宽目标密度必须在所有连续时间保持不变的假设,我们随后展示了如何突破这一障碍。具体来说,我们引入了一种新颖的 PDMP 采样方案,并表明它对高斯尾目标实现了 $\mathcal{O}(d^\alpha)$ 的经验复杂度,其中 $\alpha \in [0.2, 0.3]$。此外,该 PDMP 方案在轨迹长度和速度更新之间的距离上都是局部自适应的。

英文摘要

Despite the theoretical appeal of their non-reversibility, to date no Piecewise Deterministic Markov Process (PDMP) sampler has been developed whose computational complexity scales better than $\mathcal{O}(\sqrt{d})$ in the target dimension $d$. We prove that this is a fundamental limitation by establishing an $\Omega(\sqrt{d})$ lower bound on the algorithmic complexity of PDMP samplers in a standard setup. By relaxing the assumption that the target density must remain invariant at all continuous times, we then demonstrate how to bypass this barrier. Specifically, we introduce a novel PDMP sampling scheme and show that, for Gaussian-tailed targets, it achieves an empirical complexity of $\mathcal{O}(d^\alpha)$ with $\alpha \in [0.2, 0.3]$. In addition, this PDMP scheme is locally adaptive in both trajectory length and distance between velocity updates.
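To make the term concrete, the block below simulates the standard one-dimensional Zig-Zag process, a classical PDMP, for a standard normal target: the state moves at constant velocity and the velocity flips at random event times. It is not the sub-$\sqrt{d}$ scheme proposed in the paper; the target, the exact event-time inversion (which relies on the Gaussian potential $U(x) = x^2/2$), and the regular-grid read-out of the trajectory are choices made for this illustration.

```python
import math
import random

def zigzag_gaussian_1d(total_time, sample_dt, seed=0):
    """1D Zig-Zag sampler for a standard normal target (potential U(x) = x^2 / 2).

    Along x(t) = x + v t the switching rate is max(0, v * U'(x(t))) = max(0, a + t)
    with a = v * x (since v^2 = 1), so event times can be drawn exactly by inversion.
    """
    rng = random.Random(seed)
    x, v, t = 0.0, 1.0, 0.0
    next_sample = 0.0
    samples = []
    while t < total_time:
        e = rng.expovariate(1.0)
        a = v * x
        # Solve integral_0^tau max(0, a + s) ds = e for the time to the next flip.
        if a >= 0:
            tau = -a + math.sqrt(a * a + 2.0 * e)
        else:
            tau = -a + math.sqrt(2.0 * e)
        # Read the piecewise-linear path off a regular time grid.
        while next_sample <= t + tau and next_sample < total_time:
            samples.append(x + v * (next_sample - t))
            next_sample += sample_dt
        x += v * tau
        t += tau
        v = -v  # flip the velocity at the switching event
    return samples
```

With `zigzag_gaussian_1d(20000.0, 0.5)` the sample mean and variance should be close to 0 and 1. The grid points are autocorrelated time averages along one trajectory, not independent draws.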

2606.19655 2026-06-19 stat.CO math.ST stat.TH 新提交

A Flat Connection: The Pooling Factor and the Geometry of Centring in Hierarchical MCMC

平坦联络:分层MCMC中的汇集因子与中心化几何

Aidan D. Bindoff

AI总结 研究分层MCMC中中心化/非中心化障碍的几何原因,证明Fisher信息诱导的联络是平坦的,障碍源于统计上的汇集因子$\pi_j$,并据此提出诊断方法。

Comments 39 pages, 9 figures, accompanying R package

AI中文摘要

标准MCMC诊断($\hat{R}$、有效样本量、发散计数)检测链是否混合,但不检测为何未混合。我们询问分层模型中的中心化/非中心化障碍是否具有度量之外的几何原因。联合参数空间是一个纤维丛(超参数为底,组级参数为纤维),Fisher信息度量诱导一个Ehresmann联络$A = -G_{FF}^{-1}G_{BF}$;自然假设是障碍是其曲率,采样器将其感受为和乐。我们证明这是错误的。对于任何光滑的分层后验,不仅是高斯情况,联络是平坦的,因为其水平叶是纤维得分$\partial_\alpha \log p$的水平集:度量之上没有几何障碍。剩下的障碍是统计的,而非几何的,平坦联络将其识别为一个单一量:纤维对底的条件依赖性,由每组的先验比例$\pi_j$(经典汇集因子)控制。该框架由此恢复了已有图景:先验主导的组混合缓慢,每组的非中心化最优权重有闭式解,并且一项模拟研究通过它们对分层方差的相反依赖性,将这种底-纤维耦合与漏斗(一种不同的底空间病态)区分开来。一项直接归因测试确认NUTS不运输纤维:链级足迹是先验主导组中多余的条件自相关,正如$\pi_j$所预测。真正的、甚至旋转的曲率确实出现,但仅针对由采样器工作度量(固定质量矩阵)构建的联络,此时和乐作为算法现象而非几何现象重新出现。先验比例诊断作为R包fibr分发,几何方法作为附带的复现代码。

英文摘要

Standard MCMC diagnostics ($\hat{R}$, effective sample size, divergence counts) detect whether a chain has mixed, but not why it has not. We ask whether the centring/non-centring obstruction in hierarchical models has a geometric cause beyond the metric. The joint parameter space is a fiber bundle (hyperparameters the base, group-level parameters the fibers), and the Fisher information metric induces an Ehresmann connection $A = -G_{FF}^{-1}G_{BF}$; the natural hypothesis is that the obstruction is its curvature, felt by the sampler as holonomy. We prove this false. The connection is flat for any smooth hierarchical posterior, not only the Gaussian case, because its horizontal leaves are the level sets of the fiber score $\partial_\alpha \log p$: there is no geometric obstruction above the metric. What remains is statistical, not geometric, and the flat connection identifies it as a single quantity: the conditional dependence of fiber on base, governed per group by the prior fraction $\pi_j$, the classical pooling factor. From it the framework recovers the established picture, that prior-dominated groups mix slowly and that the optimal per-group non-centring weight follows in closed form, and a simulation study separates this base-fiber coupling from the funnel, a distinct base-space pathology, by their opposite dependence on the hierarchical variance. A direct attribution test confirms that NUTS does not transport the fiber: the chain-level footprint is excess conditional autocorrelation in prior-dominated groups, exactly as $\pi_j$ predicts. Genuine, even rotational, curvature does appear, but only for connections built from a sampler's working metric (a fixed mass matrix), where holonomy re-enters as an algorithmic rather than geometric phenomenon. The prior-fraction diagnostic is distributed as the R package fibr, with the geometric methods as accompanying reproduction code.
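In the normal-normal case the prior fraction has a closed form that shows how a group-level parameter (fiber) depends on the hyperparameter (base): $\mathbb{E}[\theta_j \mid \mu, \tau, y_j] = \pi_j \mu + (1 - \pi_j) y_j$, so the conditional mean moves with $\mu$ at rate $\pi_j$. The sketch below assumes known within-group variances $\sigma_j^2$ and hypothetical helper names; the abstract's results cover general smooth hierarchical posteriors, and its R package fibr is not used here.

```python
def prior_fraction(sigma2_j, tau2):
    """Share of the group's conditional posterior precision contributed by the prior.

    Normal-normal model: y_j | theta_j ~ N(theta_j, sigma2_j), theta_j | mu, tau ~ N(mu, tau2).
    """
    prior_prec, data_prec = 1.0 / tau2, 1.0 / sigma2_j
    return prior_prec / (prior_prec + data_prec)  # equals sigma2_j / (sigma2_j + tau2)

def conditional_posterior(y_j, sigma2_j, mu, tau2):
    """Mean and variance of theta_j given (mu, tau) and the group's observation y_j."""
    pi_j = prior_fraction(sigma2_j, tau2)
    mean = pi_j * mu + (1.0 - pi_j) * y_j
    var = 1.0 / (1.0 / tau2 + 1.0 / sigma2_j)
    return mean, var
```

A group with $\pi_j$ close to 1 (noisy data relative to $\tau^2$) is prior-dominated: its $\theta_j$ moves almost in lockstep with $\mu$, which is the base-fiber coupling the abstract associates with slow mixing.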

2606.19361 2026-06-19 cs.LG cs.AI cs.NA math.NA stat.CO stat.ME stat.ML 新提交

Computational Identifiability

计算可识别性

Lucius E. J. Bynum, Rajesh Ranganath, Kyunghyun Cho

发表机构 * New York University(纽约大学)

AI总结 提出“计算可识别性”框架,通过有限计算搜索过程在指定误差容限内找到经验估计量,从而解决理论可识别性在有限样本、模糊图标准等实际场景中的不足。

AI中文摘要

识别条件描述了目标查询或感兴趣参数作为可用信息类型和数量的函数的可计算性。在因果识别中,这些信息通常以因果图的形式表达,数据是针对图中某些变量子集观测或收集的。目标查询可以是单个效应,也可以是给定模型中的一类效应。识别算法的推导在数学上定义了期望中理论上唯一确定所需因果效应的过程。期望中的可识别性,即“理论可识别性”,通常假设渐近性质、无限数据或其他数学理想化条件。在本文中,我们探讨了这种理论理想化的可识别性与一种受计算限制的替代方案之间的根本区别。我们提出的框架——“计算可识别性”——而是为经验估计量定义一个有限的计算搜索过程。如果该过程在期望的误差容限内经验性地找到了估计量,则满足可识别性,条件取决于搜索的指定假设(即参数上的先验分布)以及搜索过程本身。通过多个实验,我们展示了该框架如何回答细粒度的实际识别问题,例如小有限样本下的识别、模糊图标准下的识别、混合观测-干预数据下的识别,以及跨反事实数据和估计量的识别。代码见 https://github.com/lbynum/metadentify 。

英文摘要

Identification conditions describe the computability of a target query or parameter of interest as a function of the type and amount of information available. In causal identification, this information is often expressed in the form of a causal graph, and data are observed or collected for some subset of variables in the graph. Target queries may be for a single effect alone or for a class of effects in a given model. The derivation of an identification algorithm then defines mathematically the process by which the desired causal effect(s) can be uniquely determined, theoretically, in expectation. Identifiability in expectation, or 'theoretical identifiability,' generally assumes asymptotic properties, infinite data, or other mathematically idealized conditions. In this paper, we explore a fundamental distinction between this theoretical, idealized notion of identifiability and a proposed alternative that is computation-bound. The framework we propose, 'computational identifiability', instead defines a finite computational search procedure for an empirical estimator. If this process finds an estimator empirically, within a desired error tolerance, then identifiability is satisfied, conditional on the specified assumptions of the search (i.e., a prior distribution over the parameters) and conditional on the search procedure itself. Through several experiments, we demonstrate how this framework allows us to answer fine-grained, practical identification questions, such as identification with small finite samples, with ambiguous graphical criteria, with mixed observational-interventional data, and across counterfactual data and estimands. Code is available at https://github.com/lbynum/metadentify.

6. 机器学习统计基础 12 篇

2606.20451 2026-06-19 stat.ML cs.LG stat.AP stat.CO 新提交

SSH-Net: A Deep Neural Network for Predicting Failure Time Distribution Functions under Competing Risks with Application to GPU Data

SSH-Net: 一种用于竞争风险下预测失效时间分布函数的深度神经网络及其在GPU数据上的应用

Jie Min, Yueyao Wang, Mengkun Chen

AI总结 提出结构化分段风险深度神经网络(SSH-Net),通过将网络结构与数据结构关联,允许不同协变量组通过子网络影响预测,在竞争风险框架下预测失效时间分布函数,仿真和GPU数据验证了准确性。

AI中文摘要

竞争风险在工程领域常见,当应用场景复杂时会给时间事件数据建模带来挑战。近年来,深度神经网络因其灵活性和高学习能力在竞争风险预测中受到广泛关注。然而,神经网络结构的复杂性使得基于不同数据输入的超参数调优更加困难。此外,当工程系统具有多层级的复杂物理结构时,将所有结构层级视为单一输入组可能无法捕捉关键信息。为解决这些问题,我们提出了一种结构化分段风险深度神经网络(SSH-Net),用于在特定原因竞争风险框架下预测失效时间。我们的方法将神经网络结构与数据结构相关联,并允许不同的协变量组通过分离的子网络影响失效预测。神经网络基于特定原因竞争风险模型构建。SSH-Net输出特定原因风险函数,并采用惩罚对数似然作为损失函数。通过评估Brier分数、接收者操作特征曲线下面积(AUC)和预测的特定原因累积发生函数的均方根误差(RMSE),仿真研究验证了SSH-Net的预测准确性。我们进一步使用Titan GPU失效时间数据展示了模型预测失效时间分布函数的能力。

英文摘要

Competing risks are commonly observed in engineering fields and can bring challenges to time-to-event data modeling when the application scenarios are complicated. Recently, deep neural networks have received great attention for prediction with competing risks, due to their flexibility and high learning capability. However, the complexity of neural network structure brings extra difficulty in hyperparameter tuning based on different data inputs. Additionally, when an engineered system has complex physical structures with multiple hierarchical levels, treating all structural levels as a single group of inputs may fail to capture critical information. To address these issues, we propose a Structured Segmented Hazard Deep Neural Network (SSH-Net) for failure time prediction under the cause-specific competing risks framework. Our approach associates neural network structure with data structures, and allows different covariate groups to impact the failure prediction through separate sub-networks. The neural network is constructed based on a cause-specific competing risks model. The SSH-Net outputs cause-specific hazard functions, and utilizes the penalized log-likelihood as the loss function. The prediction accuracy of SSH-Net is validated through simulation studies by evaluating the Brier score, the area under receiver operating characteristic curves (AUC), and the root mean square error (RMSE) of the predicted cause-specific cumulative incidence function. We further demonstrate the model's ability to predict failure time distribution functions using the Titan GPU failure time data.
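SSH-Net outputs cause-specific hazards, while its accuracy is scored on cumulative incidence functions. Under a discrete-time assumption made here for illustration, the map from one to the other is $F_k(t) = \sum_{s \le t} S(s-1)\, h_k(s)$ with overall survival $S(s) = \prod_{r \le s} \bigl(1 - \sum_k h_k(r)\bigr)$. The function below is a hypothetical helper, not part of SSH-Net.

```python
def cumulative_incidence(hazards):
    """Discrete-time cumulative incidence functions from cause-specific hazards.

    hazards[t][k] is the probability of failing from cause k in period t,
    given no event before period t. Returns cif[t][k] = P(T <= t, cause = k).
    """
    n_causes = len(hazards[0])
    surv = 1.0  # probability of no event before the current period
    running = [0.0] * n_causes
    cif = []
    for h_t in hazards:
        for k in range(n_causes):
            running[k] += surv * h_t[k]
        surv *= 1.0 - sum(h_t)  # survive the current period from all causes
        cif.append(list(running))
    return cif
```

At every period the incidences plus the overall survival sum to one, which is a quick sanity check on predicted hazards.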

2606.19714 2026-06-19 stat.ML cs.AI cs.LG stat.CO stat.ME 新提交

AURA: Adaptive Uncertainty-aware Refinement for LLM-as-a-Judge Auditing

AURA: 用于LLM作为评判审计的自适应不确定性感知精炼

Zilong Zhang, Yi-Ting Hung, Weiyi He, Junxi Zhang, Lei Ding, Chi-Kuang Yeh

AI总结 提出AURA框架,通过自适应不确定性感知精炼,在少量人工验证下迭代学习人类一致性信号,优先审核不确定比较,提升LLM评判的可靠性。

AI中文摘要

大型语言模型(LLM)越来越多地被用作开放式生成的评判者,因为大规模人工评估通常昂贵且难以扩展,但它们的偏好仍然是人类判断的不完美代理。现有的审计流程通常假设事先存在可靠的示例子集或干净的监督信号,例如来自人工注释、启发式过滤或强评判者的输出。在LLM评估中,这一假设是脆弱的:初始分割可能继承评判者偏差,而人工验证通常过于稀缺,无法在规模上定义稳定组。我们提出AURA,一种自适应不确定性感知精炼框架,用于在选定的人工验证下审计成对LLM作为评判的决策。AURA迭代学习人类一致性信号,传播可靠证据,并优先将不确定的比较提交人工审核。关键思想是将对评判者的信任视为一个潜在量,随着证据积累逐步精炼。我们提供了紧凑的公式、稳定的精炼过程,以及在合成和真实成对LLM答案数据上的全面评估。

英文摘要

Large language models (LLMs) are increasingly used as judges for open-ended generation, as large-scale human evaluation is often expensive and difficult to scale, yet their preferences remain imperfect proxies for human judgment. Existing auditing pipelines often assume that a reliable subset of examples or clean supervision signals are available beforehand, for example from human annotation, heuristic filtering, or the outputs of strong judges. In LLM evaluation, this assumption is fragile: the initial split may inherit judge bias, while human verification is typically too scarce to define stable groups at scale. We propose AURA, an adaptive uncertainty-aware refinement framework for auditing pairwise LLM-as-a-judge decisions under selected human verification. AURA iteratively learns a human-consistency signal, propagates reliable evidence, and prioritizes uncertain comparisons for human review. The key idea is to treat trust in a judge as a latent quantity that is progressively refined as evidence accumulates. We provide a compact formulation, a stable refinement procedure, and a comprehensive evaluation on both synthetic and real pairwise LLM-answer data.

2606.19587 2026-06-19 stat.ML cs.LG 新提交

A Solver-Free Training Method for Predict-then-Optimize

一种无求解器的预测后优化训练方法

Beichen Wan, Mo Liu

AI总结 提出一种基于测度变换的决策聚焦学习管道,通过无求解器代理损失实现预测后优化中预测模型的高效训练,理论保证Fisher一致性,训练时间降低数个数量级。

Comments Accepted by ICML 2026

AI中文摘要

我们提出了一种可扩展的方法,用于在预测后优化范式中训练预测(机器学习)模型,其中模型输出作为后续线性优化任务的系数。直接最小化经验决策遗憾对于线性规划和组合优化是不可行的,因为决策映射是分段常数,且梯度几乎处处为零。虽然现有方法通过平滑微分过程来解决这一问题,但它们存在可扩展性问题,因为每次梯度评估都需要调用计算昂贵的求解器。为了解决这个问题,我们提出了一种基于测度变换原理的决策聚焦学习管道,该管道在训练期间产生一个完全无优化求解器的新代理损失。我们建立了理论保证,包括Fisher一致性和超额风险界。实验上,我们的方法在实现与最先进方法相当的决策质量的同时,将训练时间减少了数个数量级。

英文摘要

We propose a scalable method for training prediction (machine learning) models in the predict-then-optimize paradigm, where model outputs serve as coefficients for a subsequent linear optimization task. Directly minimizing the empirical decision regret is intractable for linear programming and combinatorial optimization since the decision mapping is piecewise constant, and the gradients are zero almost everywhere. While existing methods address this by smoothing the differentiation process, they suffer from scalability issues, since a computationally expensive solver call is required for every gradient evaluation. To address this, we propose a decision-focused learning pipeline based on a measure transformation principle, which yields a new surrogate loss that is completely optimization-solver-free during training. We establish theoretical guarantees, including Fisher consistency and excess risk bounds. Empirically, our method achieves decision quality competitive with state-of-the-art methods while reducing training time by orders of magnitude.
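Why the regret is hard to minimize directly is easy to see on a toy problem: choose one of three items with linear cost. This sketch enumerates a hypothetical finite feasible set instead of calling an LP or combinatorial solver, and it is not the paper's solver-free surrogate loss.

```python
def best_decision(costs, feasible):
    """Return the feasible 0/1 decision vector with the smallest linear cost."""
    return min(feasible, key=lambda w: sum(c * wi for c, wi in zip(costs, w)))

def decision_regret(pred_costs, true_costs, feasible):
    """Extra true cost incurred by optimizing the predicted costs instead of the true ones."""
    def cost(w):
        return sum(c * wi for c, wi in zip(true_costs, w))
    return cost(best_decision(pred_costs, feasible)) - cost(best_decision(true_costs, feasible))
```

With true costs (3, 1, 2), predicting (0.5, 1.0, 2.0) selects the first item and incurs a regret of 2; nudging the first prediction to 0.6 leaves the decision, and therefore the regret, unchanged. This piecewise-constant behaviour is why the gradient of the regret with respect to the predictions is zero almost everywhere.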

2606.19410 2026-06-19 stat.ML cs.LG 新提交

The Representational Limit of Scalar Interactions: An Interventional Decomposition

标量交互的表征限制:一种干预分解

Potito Aghilar, Sabino Roccotelli, Stanislao Fidanza, Vito Walter Anelli, Sebastiano Stramaglia, Tommaso Di Noia

AI总结 本文证明标量交互指标混淆了唯一性、冗余性和协同性,并提出Stochastic Hi-Fi方法,通过干预掩码推理分解每个特征的U/R/S轮廓,在表格和图像任务中恢复被标量基线遗漏的结构。

AI中文摘要

有符号的成对交互指标从根本上混淆了唯一性(U)、冗余性(R)和协同性(S)。我们在一个最小的3路XOR结构因果模型上证明了这一点:忠实的指标如Shapley-Taylor对每对返回零,而投影指标如Shapley Interaction将三阶效应扩散到混淆三种机制的成对标量中。我们引入了Stochastic Hi-Fi,一种事后、无需重新训练的可预测性分解方法,通过干预掩码推理估计每个特征的U/R/S轮廓。该估计器提供精确的干预语义、有限样本蒙特卡洛界限、耦合菱形采样带来的严格方差减少以及均匀的有限词汇收敛。在表格SCM上,Stochastic Hi-Fi恢复了被标量基线遗漏的结构(交互幅度恢复比高达411倍)。它还在GPT-2 IOI电路中分离了冗余和协同头。在NIH ChestX-ray14上,Stochastic Hi-Fi在Pointing Game中匹配GradCAM,并在Deletion AUC上显著改进。

英文摘要

Signed pairwise interaction scores fundamentally conflate uniqueness (U), redundancy (R), and synergy (S). We prove this on a minimal 3-way XOR structural causal model: faithful indices such as Shapley-Taylor return zero per pair, whereas projective indices such as Shapley Interaction spread the third-order effect into pair scalars that conflate the three mechanisms. We introduce Stochastic Hi-Fi, a post-hoc, retraining-free predictability decomposition that estimates per-feature U/R/S profiles by interventional masked inference. The estimator provides exact interventional semantics, finite-sample Monte Carlo bounds, strict variance reduction from coupled diamond sampling, and uniform finite-vocabulary convergence. Across tabular SCMs, Stochastic Hi-Fi recovers structure missed by scalar baselines (up to 411x larger interaction-magnitude recovery ratios). It also separates redundant and synergistic heads in the GPT-2 IOI circuit. On NIH ChestX-ray14, Stochastic Hi-Fi matches GradCAM on Pointing Game and improves substantially on Deletion AUC.
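The 3-way XOR example can be checked by enumeration. As a simple information-theoretic stand-in (not Shapley-Taylor, Shapley Interaction, or Stochastic Hi-Fi), the sketch computes the mutual information between the output $Y = X_1 \oplus X_2 \oplus X_3$ and subsets of uniform binary inputs: every pair carries zero information, while the three inputs together determine $Y$ (1 bit), so the dependence is purely synergistic.

```python
from collections import Counter
from itertools import product
from math import log2

def mutual_information(pairs):
    """I(A; B) in bits for equally likely observations given as (a, b) tuples."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
               for (a, b), c in p_ab.items())

# All 8 equally likely input configurations and the XOR output.
inputs = list(product([0, 1], repeat=3))
y = {x: x[0] ^ x[1] ^ x[2] for x in inputs}

pair_info = {(i, j): mutual_information([((x[i], x[j]), y[x]) for x in inputs])
             for i, j in [(0, 1), (0, 2), (1, 2)]}
triple_info = mutual_information([(x, y[x]) for x in inputs])
```

Any score that only looks at pairs therefore sees nothing here, even though the output is a deterministic function of the inputs.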

2606.19883 2026-06-19 cs.LG stat.ML 新提交

Matching Markets meet Cumulative Prospect Theory: Towards Optimal and Adversarially Robust Learning

匹配市场遇上累积前景理论:迈向最优和对抗鲁棒学习

Ananya Kunisetty, Avishek Ghosh

发表机构 * Indian Institute of Technology Bombay(印度理工学院孟买分校)

AI总结 研究基于累积前景理论(CPT)的竞争性双边匹配市场多智能体多臂赌博机问题,提出最优遗憾界算法并扩展到对抗性市场。

Comments Accepted at ECML-PKDD 2026, Naples, Italy

AI中文摘要

我们研究了一个在竞争性设置下具有双边匹配市场的多智能体多臂赌博机问题,该问题基于以人为中心的决策模型。为了捕捉人类偏好,我们使用累积前景理论(CPT),该理论通过一个($\alpha$-Hölder连续)权重函数以非线性方式加权智能体的行动。CPT已被广泛用于行为经济学和风险敏感机器学习中,以模拟人类偏好。我们分析了带有CPT权重扭曲奖励的最先进学习算法,并获得了玩家最优遗憾界为$\mathcal{O}(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha})$,其中$K$表示臂数,$T$是学习时间,$\Delta$表示(适当定义的)玩家的最小偏好差距。注意到对$\Delta$的依赖是次优的,我们通过明智地选择探索期间的活跃臂集进一步改进了这一遗憾,从而在主导项中消除了对$K$的依赖,并在臂数$K$显著大于玩家数$N$的设置中实现了改进的(最优)遗憾保证。此外,我们考虑了对抗性市场,其中智能体的观测奖励可能被破坏。我们提出并分析了在已知和未知总破坏预算两种设置下,以CPT作为风险敏感度量的鲁棒市场算法,并在两种情况下建立了对数级别的玩家最优遗憾保证。

英文摘要

We study a multi-agent multi-armed bandit problem in the competitive setup with two-sided matching markets under a human-centric decision-making model. To capture human preferences, we use cumulative prospect theory (CPT), which weighs the actions of the agent in a nonlinear fashion using an ($\alpha$-Hölder continuous) weight function. CPT has been widely used in behavioral economics and risk-sensitive machine learning to emulate human preferences. We analyze the state-of-the-art learning algorithm with CPT weight-distorted rewards and obtain a player-optimal regret of $\mathcal{O}(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha})$, where $K$ denotes the number of arms, $T$ is the learning horizon, and $\Delta$ represents the (suitably defined) players' minimum preference gap. Noting that the dependence on $\Delta$ is sub-optimal, we further improve this regret by judiciously selecting the active set of arms during exploration, which removes the dependence on $K$ in the dominant term and achieves an improved (optimal) regret guarantee in the setting where the number of arms $K$ is significantly larger than the number of players $N$. In addition, we consider adversarial markets where the observed rewards of the agents may be corrupted. We propose and analyze algorithms for robust markets with CPT as a risk-sensitive measure, both when the total corruption budget is known and when it is unknown, and establish logarithmic player-optimal regret guarantees in both cases.
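CPT replaces probabilities with distorted decision weights. The sketch below uses the Tversky-Kahneman weighting function $w(p) = p^{\gamma} / \bigl(p^{\gamma} + (1-p)^{\gamma}\bigr)^{1/\gamma}$ with an assumed $\gamma = 0.61$ as one common example; the paper only requires an $\alpha$-Hölder continuous weight function, and the helper names are hypothetical. For a non-negative discrete reward, the CPT value weights the decumulative distribution: $\sum_i x_{(i)} \bigl[w(P(X \ge x_{(i)})) - w(P(X > x_{(i)}))\bigr]$.

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting w(p) = p^g / (p^g + (1 - p)^g)^(1 / g)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)

def cpt_value(outcomes, probs, weight=tk_weight):
    """CPT value of a non-negative discrete reward via decumulative weighting."""
    dist = {}
    for x, p in zip(outcomes, probs):
        dist[x] = dist.get(x, 0.0) + p
    value, tail = 0.0, 0.0  # tail = P(X > current outcome)
    for x in sorted(dist, reverse=True):
        value += x * (weight(tail + dist[x]) - weight(tail))
        tail += dist[x]
    return value
```

With the identity weight `lambda p: p` the function returns the ordinary expectation; with `tk_weight`, small probabilities are overweighted and a fair coin flip between 0 and 10 is valued below 5.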

2606.19607 2026-06-19 cs.AI stat.AP 新提交

Which Pairs to Compare for LLM Post-Training?

LLM后训练中应比较哪些对?

Jiangze Han, Vineet Goyal, Will Ma

发表机构 * Columbia University(哥伦比亚大学)

AI总结 研究偏好后训练中如何选择最具信息量的比较对,提出基于采样设计的比较策展方法,通过DPO训练的理论分析给出优化准则,实验证明能提升样本效率。

AI中文摘要

基于偏好的后训练已成为对齐语言模型的核心范式。常见的数据收集策略是为每个提示生成少量补全并标注生成的比较对。然而,人工偏好标签通常比生成额外补全昂贵得多,这提示了相同标注预算的不同使用方式:生成更大的补全集,但只标注最具信息量的比较对。本文研究在基于偏好的后训练中应比较哪些对。我们将比较策展形式化为一个采样设计问题,并通过基于偏好的后训练目标下的最终策略质量来评估设计。我们针对直接偏好优化(DPO)实例化该框架,分析标注对的选择如何通过DPO训练传播到下游策略性能。我们的主要结果为DPO训练策略的后训练最优性差距提供了匹配的上界和下界。这些界限表明,比较选择通过一个单一的设计相关信息矩阵影响下游性能,该矩阵将标签分配与参数估计误差和策略次优性联系起来。这为预算受限的比较策展提供了显式优化准则,并激发了从大型生成补全池中选择信息对的实际采样设计。在合成设置和语言模型后训练基准上的实验表明,所提出的设计在样本效率上持续优于常见的比较选择启发式方法。

英文摘要

Preference-based post-training has become a central paradigm for aligning language models. A common data-collection strategy is to generate a small set of completions for each prompt and label the resulting comparison pairs. However, human preference labels are often much more expensive than generating additional completions, suggesting a different use of the same labeling budget: generate a larger pool of completions, but label only the most informative comparison pairs. This paper studies which pairs should be compared in preference-based post-training. We formulate comparison curation as a sampling-design problem and evaluate designs by the quality of the final policy under the preference-based post-training objective. We instantiate this framework for Direct Preference Optimization (DPO), analyzing how the choice of labeled pairs propagates through DPO training to downstream policy performance. Our main results provide matching upper and lower bounds on the post-training optimality gap of the DPO-trained policy. The bounds show that comparison selection affects downstream performance through a single design-dependent information matrix, which links label allocation to parameter estimation error and policy suboptimality. This yields an explicit optimization criterion for budgeted comparison curation and motivates practical sampling designs for selecting informative pairs from large generated completion pools. Experiments on synthetic settings and language-model post-training benchmarks show that the proposed designs consistently improve sample efficiency over common comparison-selection heuristics.
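For context, each labelled comparison enters DPO through the per-pair loss $-\log \sigma\bigl(\beta[(\log \pi_\theta(y_w) - \log \pi_{\mathrm{ref}}(y_w)) - (\log \pi_\theta(y_l) - \log \pi_{\mathrm{ref}}(y_l))]\bigr)$, with prompt conditioning left implicit. The sketch evaluates it from precomputed sequence log-probabilities; $\beta = 0.1$ is an assumed default, and the paper's design-dependent information matrix is not shown.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss -log sigmoid(beta * (policy log-ratio margin)).

    logp_* are sequence log-probabilities of the preferred (w) and dispreferred (l)
    completions under the policy; ref_logp_* are the same under the reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(m)) = log(1 + exp(-m)), computed stably for either sign of m.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the loss is $\log 2$ for every pair; it falls below $\log 2$ once the policy raises the preferred completion relative to the dispreferred one.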

2606.19491 2026-06-19 cs.LG stat.ML 新提交

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

LayerNorm Transformer 中的代数死方向:一种仅需前向传播的大语言模型规模诊断方法

Tejas Pradeep Shirodkar, P. J. Narayanan

发表机构 * IIIT, Hyderabad(海得拉巴国际信息技术学院)

AI总结 本文发现 LayerNorm 的逆尺度方向是后最终归一化中心激活协方差矩阵的精确代数核,可仅从参数中读取死方向,无需前向或后向传播,并在 14 个预训练模型上验证了其有效性。

Comments 34 pages, 7 figures, 6 tables. Empirical companion to arXiv:2606.05957

AI中文摘要

预训练 Transformer 位于损失函数的奇异极小值附近,此时 Fisher 信息度量沿死方向退化:参数空间中方向性 Fisher 为零的方向。通常定位这样的方向需要一次前向传播和激活矩阵的特征分解,或基于采样的复杂度估计;没有一种方法能仅从网络参数计算方向。我们针对 LayerNorm Transformer 给出了一个这样的方向。LayerNorm 仿射的逆尺度方向 $\gamma^{-1}/\|\gamma^{-1}\|$ 是后最终归一化中心激活协方差矩阵的精确代数核,适用于任何输入分布,并在参数空间中诱导出相应的死方向。它仅从 LN 尺度参数读取,无需前向或后向传播,无需特征分解:这是针对 LayerNorm 的最廉价死方向读取方法。我们在 14 个预训练 Transformer(9 个 LayerNorm,5 个 RMSNorm;160M-35B;语言和视觉目标)上进行了测试。在随机初始化时,预测方向与测量的底部奇异方向(一次前向传播,直接 SVD)在 9/9 的 LayerNorm 模型上匹配到小数点后四位,并在 5/5 的 RMSNorm 模型上正确缺失,后者缺乏产生该方向的均值减法投影器。在训练后的检查点上,沿该方向的协方差特征值加深约 $10^3$ 倍,并打开更多死方向;随机初始化到训练后的差距是一次前向传播、每检查点沿预测坐标的奇异结构读出。由此得出两个闭式结论:残差流的最小奇异值在 13/14 个 Transformer 上逐块保持不变(在其自身输入分布上测量),唯一的例外(Gemma4-31B)是一个真正的死方向,同一读出可精确定位;核方向的存在从参数本身即可对 Transformer 的归一化进行分类。

英文摘要

Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher information vanishes. Locating such a direction normally requires a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none of these returns a direction computable from the network's parameters alone. We give one for LayerNorm transformers. The inverse-scale direction $\gamma^{-1}/\|\gamma^{-1}\|$ of the LayerNorm affine map is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on 14 pretrained transformers (9 LayerNorm, 5 RMSNorm; 160M-35B parameters; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on 9/9 LayerNorm models, and is correctly absent on 5/5 RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoints the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the gap between random initialisation and training is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on 13/14 transformers measured on their own input distribution, the one exception (Gemma4-31B) being a genuine dead direction that the same read pinpoints; and the presence of the kernel direction classifies a transformer's normalisation from the parameters alone.
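The kernel claim follows from LayerNorm's mean subtraction: after centring across inputs, each output is $\gamma \odot n$ with $\sum_i n_i = 0$, so its inner product with $\gamma^{-1}$ is $\sum_i n_i = 0$ for every input. The sketch checks this numerically for a single LayerNorm with made-up parameters and inputs; it says nothing about trained checkpoints or the induced parameter-space dead direction.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm over one feature vector: gamma * (x - mean) / sqrt(var + eps) + beta."""
    d = len(x)
    m = sum(x) / d
    var = sum((xi - m) ** 2 for xi in x) / d
    return [g * (xi - m) / math.sqrt(var + eps) + b for xi, g, b in zip(x, gamma, beta)]

def inverse_scale_direction(gamma):
    """Unit vector gamma^{-1} / ||gamma^{-1}||, read from the LN scale alone."""
    inv = [1.0 / g for g in gamma]
    norm = math.sqrt(sum(v * v for v in inv))
    return [v / norm for v in inv]

# Made-up LN parameters and inputs, for illustration only.
gamma = [0.5, 2.0, 1.5, -0.8]
beta = [0.1, -0.2, 0.0, 0.3]
inputs = [[1.0, 2.0, 0.5, -1.0], [0.3, -0.7, 2.2, 1.1], [-1.5, 0.4, 0.9, 3.0]]
outputs = [layer_norm(x, gamma, beta) for x in inputs]

# Centre the outputs across inputs (this removes beta), then project onto the direction.
means = [sum(col) / len(outputs) for col in zip(*outputs)]
u = inverse_scale_direction(gamma)
projections = [sum((o - m) * ui for o, m, ui in zip(out, means, u)) for out in outputs]
```

Every entry of `projections` is zero up to rounding, so the centred covariance has `u` in its kernel. Dropping the mean subtraction, as RMSNorm does, removes this guarantee.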

2606.20557 2026-06-19 cs.LG math.ST stat.ML stat.TH 新提交

Optimal Deterministic Multicalibration and Omniprediction

最优确定性多校准与全预测

Georgy Noarov, Aaron Roth

发表机构 * University of Pennsylvania(宾夕法尼亚大学)

AI总结 本文提出一种确定性算法,实现多校准的极小化最优样本复杂度,并推广到结果不可区分性,解决确定性预测器是否必要的问题。

AI中文摘要

一个模型在一组群体权重 $G$ 上是多校准的,如果它是校准的——即即使以其预测为条件也是无偏的——不仅整体上,而且在通过每个 $g \in G$ 对上下文重新加权后也是如此。这对于许多下游应用是一个有用的性质,也是可信机器学习的基本要求。在这项工作之前,所有已知达到 $\varepsilon$-多校准的极小化最优 $\widetilde O(\varepsilon^{-3})$ 样本复杂度的预测器都是随机化的,而确定性预测器仅以更差的样本复杂度已知。多校准中随机化对于最优样本复杂度是否必要的问题由 [CLNR26] 明确提出,并在之前的几项工作中隐含提出。我们通过给出一个输出确定性预测器的极小化最优多校准算法解决了这个开放问题。然后我们将该算法推广到产生满足关于有限或有限覆盖测试集合的结果不可区分性(OI)的最优确定性预测器。作为一个应用,这也给出了具有最优样本复杂度的确定性全预测器和泛预测器,解决了 [OKK25] 和 [BHHLZ25] 提出的开放问题。

英文摘要

A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of trustworthy machine learning. Before this work, all predictors known to attain the minimax-optimal $\widetilde O(\varepsilon^{-3})$ sample complexity rate for $\varepsilon$-multicalibration were randomized, while deterministic predictors were known only with substantially worse sample complexity. Whether randomization is necessary for optimal sample complexity in multicalibration was explicitly asked by [CLNR26] and implicitly in several prior works. We resolve this open problem by giving a minimax-optimal multicalibration algorithm that outputs a deterministic predictor. We then generalize the algorithm to produce optimal deterministic predictors that satisfy outcome indistinguishability (OI) with respect to finite or finitely covered collections of tests. As an application, this also gives deterministic omnipredictors and panpredictors with optimal sample complexity, resolving open problems posed by [OKK25] and [BHHLZ25].
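An empirical multicalibration check can be written directly from the definition. The sketch assumes predictions take finitely many values and groups are given as weight functions $g(x) \in [0, 1]$, and it reports the largest absolute reweighted bias; it is a diagnostic only, not the paper's deterministic algorithm, and the function name is hypothetical.

```python
from collections import defaultdict

def multicalibration_error(preds, labels, contexts, groups):
    """Largest |E[g(x) * (y - f(x)) * 1{f(x) = v}]| over groups g and predicted values v."""
    n = len(preds)
    worst = 0.0
    for g in groups:
        bias = defaultdict(float)  # reweighted bias per predicted value v
        for f, y, x in zip(preds, labels, contexts):
            bias[f] += g(x) * (y - f) / n
        worst = max(worst, max(abs(b) for b in bias.values()))
    return worst
```

For example, with labels (0, 0, 1, 1), contexts (0, 0, 1, 1) and the constant prediction 0.5, the predictor is calibrated overall (error 0 for the all-ones group) but has error 0.25 once the group $g(x) = \mathbf{1}\{x = 1\}$ is added.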

2606.20022 2026-06-19 stat.ML cs.LG math.OC 新提交

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

具有有界噪声的随机线性上下文赌博机:一种集合成员方法

Haonan Xu, Yingying Li

AI总结 针对有界奖励噪声的随机线性上下文赌博机,提出基于集合成员估计和乐观原则的SME-OFU算法,实现$O(\log T)$的遗憾界,优于次高斯噪声下的最优界。

Comments 23 pages, 1 figure

AI中文摘要

本文考虑具有有界奖励噪声的随机线性上下文赌博机(SLCB)。现有工作通常假设次高斯奖励噪声和有界期望奖励,在此条件下最优遗憾界关于时间T为$\tilde{O}(\sqrt{T})$。然而,在许多应用中,实现/观测到的奖励也自然有界,这意味着奖励噪声有界。有界噪声比次高斯条件更具信息性,但在SLCB文献中尚未被明确利用。本文通过利用一种称为集合成员估计(SME)的不确定性量化方法,并应用面对不确定性的乐观原则(OFU),提出了一种新颖的算法SME-OFU。我们的算法享有改进的遗憾界$O(\log T)$。注意,这并不与次高斯噪声下现有的最优界$\tilde{O}(\sqrt{T})$矛盾,因为有界噪声是更强的条件。最后,仿真表明,当奖励噪声有界时,SME-OFU相对于为次高斯噪声设计的基准算法在经验上有所改进。

英文摘要

This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.
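Bounded noise lets every observation cut down the feasible parameter set. For a scalar parameter with $y_t = \theta x_t + w_t$ and $|w_t| \le W$, each observation gives $\theta \in [(y_t - W)/x_t, (y_t + W)/x_t]$ (endpoints swapped when $x_t < 0$), and the set-membership estimate is the intersection. This scalar sketch is an assumption for illustration; the paper works with vector parameters and pairs the set with an optimistic (OFU) action choice, which is omitted here.

```python
def set_membership_interval(xs, ys, noise_bound, lo=-1e9, hi=1e9):
    """Feasible interval for scalar theta given y_t = theta * x_t + w_t with |w_t| <= noise_bound.

    Each observation with x_t != 0 confines theta to an interval; the estimate is their intersection."""
    for x, y in zip(xs, ys):
        if x == 0:
            continue  # carries no information about theta
        a, b = (y - noise_bound) / x, (y + noise_bound) / x
        lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
    return lo, hi
```

For instance, with $\theta = 2$, $W = 0.5$, $x = (1, 2, -1)$ and noise realisations $(0.5, -0.5, 0.2)$, the observations $y = (2.5, 3.5, -1.8)$ already pin the feasible set down to the single point $\theta = 2$. Under sub-Gaussian noise no finite set of observations can exclude a parameter value outright.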

2606.19878 2026-06-19 cs.LG math.OC stat.ML 新提交

On the Oracle Complexity of Interpolation-Based Gradient Descent

基于插值的梯度下降的预言复杂度

Dongmin Lee, William Lu, Anuran Makur

发表机构 * Purdue University(普渡大学)

AI总结 提出分段多项式插值梯度下降(PPI-GD)方法,通过数据域等距点查询一阶预言构造多项式插值近似全梯度,在强凸和非凸损失下分析预言复杂度,证明在数据维数受限且损失足够光滑时优于多种GD变体。

Comments 16 pages, 2 figures

AI中文摘要

最近关于经验风险最小化(ERM)的一阶优化器的工作表明,可以利用ERM损失函数在训练数据中的光滑性(而非优化参数中的光滑性)来改进梯度下降(GD)方法的预言复杂度。在本文中,我们提出了一种不精确梯度方法——分段多项式插值梯度下降(PPI-GD),该方法通过在数据域中的等距点处查询一阶预言来近似每次迭代中的全梯度,从而在数据域的适当大小的块上构造所得梯度样本的多项式插值。我们分析了PPI-GD在强凸和非凸损失函数下的预言复杂度,其中数据空间维数以训练样本数量的多对数函数为界,并发现当损失函数足够光滑时,PPI-GD在关键区域优于几种GD变体。此外,我们的分析将双三次样条插值误差分析中的几种技术扩展到$d$变量张量积多项式插值的设置中,这可能对插值分析具有独立意义。

英文摘要

Recent work on first-order optimizers for empirical risk minimization (ERM) has suggested that smoothness of ERM loss functions in the training data, rather than in the optimization parameters, can be leveraged to improve the oracle complexity of gradient descent (GD) methods. In this paper, we propose an inexact gradient method, piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient in each iteration by querying the first-order oracle at equidistant points in the data domain to construct polynomial interpolants of the resulting gradient samples over appropriately sized patches of the data domain. We analyze the oracle complexity of PPI-GD for strongly convex and non-convex loss functions when the data space dimension is bounded by a polylogarithmic function of the number of training samples, and find it to outperform several GD variants in key regimes when the loss function is sufficiently smooth. Furthermore, our analysis extends several techniques from the error analysis of bicubic spline interpolants to the setting of $d$-variate tensor product polynomial interpolants, which may be of independent interest in interpolation analysis.
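The interpolation step can be sketched in one dimension: sample a function at equidistant points on each patch and evaluate the Lagrange interpolant in place of the function. Here `f` stands in for a per-sample gradient viewed as a function of the data point; the 1D setting, patch layout, and helper names are assumptions, and the $d$-variate tensor-product construction and the GD loop itself are omitted.

```python
def equidistant(a, b, n):
    """n equally spaced points from a to b inclusive (n >= 2)."""
    return [a + (b - a) * k / (n - 1) for k in range(n)]

def lagrange_interpolant(nodes, values):
    """Return the polynomial interpolating (nodes[i], values[i]) in Lagrange form."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(nodes, values)):
            basis = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return p

def piecewise_interpolant(f, a, b, patches, degree):
    """Interpolate f on [a, b] with a degree-`degree` polynomial on each equal-width patch."""
    width = (b - a) / patches
    pieces = []
    for k in range(patches):
        lo = a + k * width
        nodes = equidistant(lo, lo + width, degree + 1)  # the oracle queries on this patch
        pieces.append(lagrange_interpolant(nodes, [f(z) for z in nodes]))
    def evaluate(z):
        k = min(int((z - a) / width), patches - 1)  # the right endpoint belongs to the last patch
        return pieces[k](z)
    return evaluate
```

Polynomials of degree at most `degree` are reproduced exactly (up to rounding); for smooth non-polynomial functions the error shrinks as the patches get smaller, which is the smoothness-in-the-data effect the abstract exploits.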

2606.20356 2026-06-19 math.OC cs.AI cs.LG math.PR stat.ML 新提交

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

公共噪声Wasserstein不确定性下的平均场控制鲁棒$Q$-学习

Mathieu Laurière, Ariel Neufeld, Kyunghyun Park

AI总结 提出一种针对公共噪声分布Wasserstein不确定性的离散时间平均场控制鲁棒$Q$-学习算法,结合量化投影与Wasserstein对偶,证明同步和异步学习的收敛性及有限时间界,并在系统风险和流行病模型中验证鲁棒性-性能权衡。

AI中文摘要

在本文中,我们提出了一种针对公共噪声分布存在Wasserstein不确定性的离散时间平均场控制问题的鲁棒$Q$-学习算法。该算法将量化投影方案与公共噪声空间上的Wasserstein对偶重述相结合。我们建立了其收敛性以及同步和异步学习方案的有限时间迭代界。关于系统风险和流行病模型的数值实验将异步实现与理想化的Bellman迭代进行了比较,说明了在公共噪声误设下的鲁棒性-性能权衡,并报告了异步$Q$-学习算法的观察收敛行为。

英文摘要

In this article, we present a robust $Q$-learning algorithm for discrete-time mean-field control problems under Wasserstein uncertainty in the common noise law. The algorithm combines a quantization-and-projection scheme with a Wasserstein dual reformulation on the common-noise space. We establish its convergence together with finite-time iteration bounds for both synchronous and asynchronous learning schemes. Numerical experiments on systemic risk and epidemic models compare the asynchronous implementation with an idealized Bellman iteration, illustrate the robustness-performance tradeoff under common-noise misspecification, and report the observed convergence behavior of the asynchronous $Q$-learning algorithm.

2606.20299 2026-06-19 stat.ML cs.LG hep-ph physics.data-an New submission

Statistical Properties of Training & Generalization

Itay Lavie, Noam Levi, Yonatan Kahn

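AI summary: Examines the key features and surprising phenomena of deep learning from a physics perspective. It reviews neural scaling laws and their interplay with the constraints and inductive biases of physics problems.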

Comments 32 pages, 3 figures. Part of the VERaiPHY initiative

Abstract

Deep learning has managed to evade numerous intuitions from classical statistics to achieve unprecedented performance on a number of real-world tasks. In this article, we investigate the key features and surprises of deep learning from a physics-informed perspective, taking care to point out and justify where possible the many choices inherent in constructing a deep learning model. In particular, we review the phenomenon of neural scaling laws and discuss their interplay with the constraints and inductive biases which may be present when applying machine learning to problems in physics.

7. Biostatistics and Medical Statistics (7 papers)

2606.20341 2026-06-19 stat.ME stat.AP New submission

Anchors Away: Navigating Unanchored Indirect Comparisons with Multilevel Unanchored Meta-Regression (ML-UMR)

Conor Chandler, Jack Ishak

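AI summary: Addresses unanchored treatment comparisons when randomized evidence is unavailable. It proposes multilevel unanchored meta-regression (ML-UMR), a Bayesian framework that jointly models individual-level and aggregate data. The framework estimates marginal and conditional effects across multiple treatments, studies and target populations, and makes the identification and transportability assumptions explicit.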

Comments 20 pages (excluding supplementary material), 5 figures

Abstract

Unanchored indirect treatment comparisons using single-arm studies or disconnected evidence are increasingly used in health technology assessment (HTA) when randomized evidence is unavailable. Existing methods, including matching-adjusted indirect comparison (MAIC) and simulated treatment comparison (STC), are generally limited to pairwise settings and typically estimate marginal effects in the comparator study population, which may differ from the decision-relevant population. We propose multilevel unanchored meta-regression (ML-UMR), a Bayesian regression framework for synthesizing individual patient data and aggregate data from fully disconnected evidence. ML-UMR extends multilevel network meta-regression (ML-NMR) to unanchored settings by jointly modeling individual- and aggregate-level data within a unified likelihood, enabling estimation of treatment-specific outcomes and both marginal and conditional effects across multiple treatments, studies, and target populations. ML-UMR distinguishes assumptions required to identify treatment effects from those required to transport results to target populations. As with all unanchored comparisons, valid inference relies on strong and often unverifiable assumptions, including conditional exchangeability, correct specification of the outcome model, and cross-treatment assumptions (e.g., shared prognostic factor assumption (SPFA)). ML-UMR does not lessen these requirements but makes them explicit within a unified framework and facilitates sensitivity analyses. In simulation studies, ML-UMR produced low bias and nominal coverage for comparator-population effects. Transportability to alternative populations depended critically on identifying assumptions: violations of SPFA led to bias under strong effect modification, whereas incorporating subgroup information restored near-unbiased estimation and nominal coverage.

2606.19982 2026-06-19 stat.ME New submission

Built-in Selection Bias in Proportional Hazards Models with Omitted Covariates: Simulation Evidence and Alternative Approaches

Ayoub Bifenzi, Helene Jacqmin-Gadda

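AI summary: Uses simulations and real data to show that, in randomized trials, omitting covariates biases the treatment hazard ratio estimated by the Cox proportional hazards model, even when those covariates are independent of treatment. It then compares the robustness of alternatives such as frailty models, accelerated failure time models and Kaplan-Meier curves.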

Abstract

In time-to-event analysis, the hazard ratio (HR) derived from the Cox proportional hazards (PH) model is the most commonly used and widely reported measure for assessing treatment effects. However, hazard ratios are non-collapsible due to their inherent conditioning on survival up to each time point. As a result, they are subject to built-in selection bias in the presence of unmeasured heterogeneity arising from omitted important covariates, even when these covariates are independent of the main exposure at baseline, as is the case in randomized controlled trials. This article aims to provide an overview of key findings from the literature on how unobserved heterogeneity, due to omitted covariates that affect the outcome, can bias the estimation of the treatment hazard ratio in standard proportional hazards models, even in randomized trials where treatment is assigned independently of such covariates. Through simulations, we evaluate the extent of bias in the semi-parametric Cox PH model and parametric PH model under various scenarios of unmeasured heterogeneity. We then compare these standard models to alternative approaches that either account for this issue or are considered robust to it. These alternatives include the hazard ratio estimated from frailty models, regression parameters from an Accelerated Failure Time (AFT) model, and survival differences between treatment groups estimated nonparametrically using Kaplan-Meier curves or based on a Cox model with time-dependent effect of the exposure. We illustrate the practical relevance of the explored alternatives through a real data application to a randomized controlled trial from the Radiation Therapy Oncology Group (RTOG 9202).
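
The mechanism behind this built-in selection bias has a closed form in one textbook special case: a gamma frailty that is omitted from the analysis. It is used here only as an illustration, and the paper's simulation scenarios may differ.

Assume the individual hazard is $W h_0(t) e^{\beta z}$, where $W \sim \text{Gamma}$ has mean 1 and variance $\theta$. The marginal hazard is then $h_0(t) e^{\beta z} / (1 + \theta H_0(t) e^{\beta z})$. The population hazard ratio therefore drifts toward 1 as high-risk individuals are depleted. The function name is hypothetical.

```python
import math


def marginal_hazard_ratio(beta, theta, cum_baseline_hazard):
    """Population-level treated/control hazard ratio under an omitted gamma frailty.

    Assumed individual-level model: h(t | z, W) = W * h0(t) * exp(beta * z), with
    W ~ Gamma(mean 1, variance theta) independent of the randomised treatment z.
    Integrating W out gives the marginal hazard h0(t) e^{beta z} / (1 + theta H0(t) e^{beta z});
    the ratio is evaluated at a time where H0(t) equals `cum_baseline_hazard`.
    """
    hr = math.exp(beta)
    h = theta * cum_baseline_hazard
    return hr * (1.0 + h) / (1.0 + h * hr)


beta = math.log(2.0)  # conditional (individual-level) hazard ratio of 2
for H0 in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(H0, round(marginal_hazard_ratio(beta, theta=1.0, cum_baseline_hazard=H0), 3))
```

Take a conditional HR of 2 and $\theta = 1$. The marginal HR is 2 at $H_0 = 0$, 1.5 at $H_0 = 0.5$ and 1.2 at $H_0 = 2$, even though treatment is randomised.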

2606.19892 2026-06-19 stat.ME New submission

The Ghosh-Lin and Fine-Gray models for a mix of administrative and random censoring

Thomas H. Scheike, Christian Mirian, Isao Yokota, Giuliana Cortese

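AI summary: For data subject to both administrative and random censoring, combines risk-set modification with inverse probability of censoring weighting, so that the Ghosh-Lin and Fine-Gray models yield consistent estimates.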

Abstract

Recurrent events or competing risks regression models are often applied in the biomedical setting, and both can be considered marginal models. In the presence of right-censoring, such models need to be adjusted to give consistent estimators. When censoring is administrative, marginal regression models are particularly easy to estimate. However, when censoring instead acts randomly, inverse probability of censoring weighting (IPCW) adjustments are typically used to obtain parameter estimates. This technique relies on a censoring-weights adjustment via a correct censoring model, whereas for administrative censoring the adjustment is done correctly simply by modifying the risk set. In practice, for large central registries or some clinical trials, the administrative censoring time will be known for all subjects, but there will typically also be a proportion of subjects that are censored at random. In this work, we consider two frequently used regression approaches: the Ghosh-Lin model for recurrent events with terminal events and the Fine-Gray model for competing events. For these two settings, when both administrative and random censoring are present, we demonstrate how to obtain correct estimation by dealing with the combination of the two different types of censoring, relying on a minimum of modeling assumptions.
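
For the random-censoring part, the standard IPCW idea can be sketched for a cumulative incidence function. Each observed event of the cause of interest is up-weighted by the inverse of the Kaplan-Meier probability of remaining uncensored just before its event time. This is a generic sketch with hypothetical function names, not the authors' combined estimator. The risk-set modification they use for administrative censoring is not shown.

```python
def censoring_km_left(times, status):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring (status 0) as the event.
    Every subject with T_i >= u is counted as at risk of censoring at time u."""
    steps = []  # (censoring time u, G(u)) in increasing order of u
    g = 1.0
    for u in sorted({t for t, s in zip(times, status) if s == 0}):
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, s in zip(times, status) if t == u and s == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def g_left(t):
        value = 1.0
        for u, gu in steps:
            if u >= t:  # only censoring strictly before t counts for G(t-)
                break
            value = gu
        return value

    return g_left


def ipcw_cumulative_incidence(times, status, cause, t):
    """IPCW estimate of P(T <= t, cause). status 0 = censored, status k >= 1 = event of cause k;
    each observed cause-specific event up to t is weighted by 1 / G(T_i-)."""
    g_left = censoring_km_left(times, status)
    total = sum(1.0 / g_left(ti) for ti, si in zip(times, status) if si == cause and ti <= t)
    return total / len(times)


times = [1.0, 2.0, 3.0, 4.0]
status = [1, 0, 1, 2]  # cause-1 event, censored, cause-1 event, cause-2 event
print(ipcw_cumulative_incidence(times, status, cause=1, t=4.0))
print(ipcw_cumulative_incidence(times, status, cause=2, t=4.0))
```

On the toy data the sketch gives 0.625 for cause 1 and 0.375 for cause 2, matching the Aalen-Johansen estimates. With no censoring it reduces to the empirical proportion of cause-specific events.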

2606.19760 2026-06-19 stat.AP New submission

Covariate-Adjusted Functional Principal Components Analysis for Modeling Hazard Rates of Physical Activity in the US Population

Md Rokibul Hasan, Pratim Guha Niyogi

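AI summary: Proposes a hazard-function-based distributional analysis of wrist-worn accelerometer data. It uses functional principal component analysis (FPCA) to characterize individual variation in activity-intensity distributions and outperforms mean-based summaries.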

Abstract

Physical activity plays a vital role in human health, and its full distribution differs across individuals. Commonly used summary measures cannot describe this distributional pattern. We present a distribution-based analytical approach to describe physical activity by modeling individual-level activity-intensity patterns through hazard functions derived from wrist-worn accelerometer data. We analyzed minute-level Monitor-Independent Movement Summary (MIMS) data of 4297 adults with seven continuous days of device wear from the 2011-2012 National Health and Nutrition Examination Survey (NHANES). Using a survival-based approach, we derived a nonparametric activity-intensity hazard for each individual on a common intensity grid, treating the hazard curves computed on both the original and the log-transformed MIMS scales as functional objects. We applied functional principal component analysis (FPCA) on both scales to characterize the dominant modes of variation in activity-intensity distributions. Group-wise mean hazard functions differed little at lower intensity levels but substantially at higher intensity levels. Our results demonstrate that hazard-based functional representations of physical activity intensity distributions offer a flexible and interpretable way to capture heterogeneity across individuals. This approach outperforms mean-based summaries and supports principled comparisons of physical activity patterns across population subgroups.
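
One natural reading of a "survival-based" intensity hazard treats each minute's MIMS value as a "time". On a common grid $u_0 < u_1 < \dots$, it computes the fraction of minutes at or above $u_k$ that fall below $u_{k+1}$. This reading is an assumption, and the paper's exact estimator may differ. The function name and values are hypothetical.

```python
import math


def discrete_intensity_hazard(values, grid):
    """Empirical discrete hazard of activity intensity on an increasing grid u_0 < u_1 < ...

    h(u_k) = #{x : u_k <= x < u_{k+1}} / #{x : x >= u_k}; the last bin is open-ended.
    Returns nan where no minutes remain at risk.
    """
    hazards = []
    for k, u in enumerate(grid):
        upper = grid[k + 1] if k + 1 < len(grid) else math.inf
        at_risk = sum(1 for x in values if x >= u)
        events = sum(1 for x in values if u <= x < upper)
        hazards.append(events / at_risk if at_risk else math.nan)
    return hazards


minutes = [0.0, 0.0, 1.5, 4.2, 7.9, 12.3, 20.1, 35.0]  # hypothetical minute-level MIMS values
grid = [0.0, 5.0, 10.0, 20.0]
print(discrete_intensity_hazard(minutes, grid))
# Log scale: log1p is an assumption that keeps zero-activity minutes finite.
print(discrete_intensity_hazard([math.log1p(x) for x in minutes], [math.log1p(u) for u in grid]))
```

Each person's hazard vector on the shared grid then becomes one functional observation that can be passed to FPCA.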

2606.19743 2026-06-19 stat.ME stat.AP New submission

A Bayesian spatio-temporal nearest neighbor Gaussian process model for pooled genetic data

Imke Botha, Tianxiao Hao, Lucinda E. Harrison, Nick Golding, Daniel J. Weiss, Jennifer A. Flegg

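AI summary: Proposes a nearest neighbor Gaussian process model combined with a sequential Monte Carlo squared algorithm to efficiently infer haplotype frequencies from pooled genetic data. The method is applied to genetic data on antimalarial drug resistance in Africa.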

Abstract

Large-scale genetic datasets often aggregate the total allele counts of distinct genetic markers. Inferring haplotype frequencies (i.e., the frequency of multimarker alleles) from these pooled data is a challenge. Previous spatio-temporal modelling in this context has been limited to 3 markers due to the computational cost. In this work, we propose a nearest neighbor Gaussian process (NNGP) model to improve scaling with the number of markers and observations. To infer the parameters of our model, we develop a novel sequential Monte Carlo squared algorithm, which uses particle Gibbs with ancestor sampling to mutate the NNGP function values. The latter has a linear cost in the number of observations and the number of NNGPs, and can be applied to a broad range of NNGP models. As a case study, we analyse genetic data relating to antimalarial drug resistance in Africa, and show our scaling results empirically on 3- and 6-marker genetic datasets.
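
The sketch below shows the nearest-neighbour (Vecchia-type) factorisation behind an NNGP. It makes these assumptions:

* 1-D locations;
* a zero-mean GP with an exponential covariance of unit variance and range;
* directly observed values. In the paper the NNGP function values are latent and feed a pooled-allele likelihood, which is not shown.

Each point conditions only on its `m` nearest predecessors, so for fixed `m` the likelihood needs only small `m`-by-`m` solves per point. The neighbour search here is brute force for clarity. With `m = n - 1` the factorisation recovers the exact GP log-likelihood. All names are hypothetical.

```python
import math


def exp_cov(a, b, sigma2=1.0, rho=1.0):
    """Assumed exponential covariance between 1-D locations a and b."""
    return sigma2 * math.exp(-abs(a - b) / rho)


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_spd(A, b):
    """Solve A x = b for symmetric positive definite A via Cholesky (L z = b, then L^T x = z)."""
    L = cholesky(A)
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def full_gp_loglik(locs, y):
    """Exact log N(y; 0, C) with C built from exp_cov."""
    C = [[exp_cov(a, b) for b in locs] for a in locs]
    L = cholesky(C)
    quad = sum(yi * ai for yi, ai in zip(y, solve_spd(C, y)))
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(len(y)))
    return -0.5 * (quad + logdet + len(y) * math.log(2 * math.pi))


def nngp_loglik(locs, y, m):
    """NNGP log-likelihood: each point conditions on its m nearest predecessors in the given order."""
    total = 0.0
    for i, (s, yi) in enumerate(zip(locs, y)):
        prev = sorted(range(i), key=lambda j: abs(locs[j] - s))[:m]
        if prev:
            C_nn = [[exp_cov(locs[a], locs[b]) for b in prev] for a in prev]
            c_in = [exp_cov(s, locs[j]) for j in prev]
            w = solve_spd(C_nn, c_in)  # kriging weights C_NN^{-1} c_Ni
            mean = sum(wj * y[j] for wj, j in zip(w, prev))
            var = exp_cov(s, s) - sum(wj * cj for wj, cj in zip(w, c_in))
        else:
            mean, var = 0.0, exp_cov(s, s)
        total += -0.5 * (math.log(2 * math.pi * var) + (yi - mean) ** 2 / var)
    return total


locs = [0.0, 0.3, 1.1, 1.5, 2.7]
y = [0.2, -0.1, 0.5, 0.4, -0.3]
print(full_gp_loglik(locs, y), nngp_loglik(locs, y, m=1), nngp_loglik(locs, y, m=4))
```

With this 1-D exponential kernel and sorted locations, even `m = 1` is exact, because the underlying process is Markov. In higher dimensions or with other kernels, small `m` gives a genuine approximation.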

2606.20489 2026-06-19 q-bio.PE nlin.CG physics.bio-ph stat.AP New submission

West Nile virus outbreak in Italy modelled with the quantum Game of Life

Andrea Fontana, Simone Tambascia, Ciro Di Carluccio, Andrea Esposito, Bernardo Spagnolo, Andrea M. Chiariello

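AI summary: Uses a quantum Game of Life cellular automaton to simulate the spread of West Nile virus in Italy in summer 2025. By optimizing only the mosquito birth and removal rates, the model accurately fits local and regional-average cumulative infection curves, and it is used to assess the effects of environmental change.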

Abstract

In recent years, an anomalously high spread of West Nile virus (WNV) has been observed in Italy, with particularly high infection peaks in the southern Lazio, Campania and Veneto regions. The main vector of WNV is the Culex pipiens mosquito, which transmits infection to humans through its bites. Here, we investigate the spread of the WNV fever epidemic in Italy during the 2025 summer season through a computational approach based on a quantum version of the Game of Life (GOL) cellular automaton model. Specifically, human dynamics evolve according to the GOL rules, while the stochastic dynamics of the disease vectors (mosquitoes) and their interaction with humans occur simultaneously. We show that this model fits the curves of cumulative infected individuals with high accuracy, at both the local and the average regional level, optimizing only the mosquito birth and removal rate parameters. Furthermore, leveraging the model's flexibility, we show that changes in parameter values elucidate the system's response to environmental variations. For instance, we quantify the impact of measures to contain mosquito spread and of sudden increases in mosquito abundance due to climatic and ecological changes. Overall, we provide a general, quantitative description of WNV infection spreading in Italy. It could serve as a supportive tool for testing different environmental scenarios and help decision makers devise strategies to monitor vector dynamics and control the consequent spread of the virus.
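
For readers unfamiliar with the underlying automaton, the sketch below implements one step of Conway's classical Game of Life on a bounded grid. A dead cell is born with exactly 3 live neighbours, and a live cell survives with 2 or 3. This is only the classical rule. The quantum version used in the paper and its coupling to stochastic mosquito dynamics are not shown, and the function name is hypothetical.

```python
def life_step(alive, rows, cols):
    """One synchronous update of the classical B3/S23 rule on a rows x cols grid.

    `alive` is a set of (row, col) cells; cells outside the grid are treated as dead.
    """
    counts = {}
    for r, c in alive:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        counts[(nr, nc)] = counts.get((nr, nc), 0) + 1
    # Birth on exactly 3 neighbours; survival on 2 or 3. Cells with no neighbours never appear in counts.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in alive)}


blinker = {(2, 1), (2, 2), (2, 3)}
print(sorted(life_step(blinker, 5, 5)))
```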

2606.19041 2026-06-19 stat.ME New submission

Efficient Cumulative Incidence Estimation in Biobank Studies Using All Prevalent and Incident Events

David M. Zucker, Malka Gorfine

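AI summary: Targets biobank data that contain both individuals with disease onset before recruitment (prevalent cases) and individuals with onset during follow-up. It proposes a new estimator of the cumulative incidence function that uses all cases and handles diseases with young onset and long post-onset survival. Asymptotic properties are established, and simulations and UK Biobank cancer data demonstrate the estimator's advantages.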

Abstract

Population-based biobanks, now established in many countries, offer opportunities for large-scale studies investigating the incidence of various diseases. Biobank data is typically collected from a study cohort recruited over a defined calendar period, with subjects entering the study at various ages falling between $R_L$ and $R_U$. This work focuses on biobank data that includes individuals in whom onset of the disease of interest occurred before recruitment, termed prevalent cases, along with individuals initially recruited as disease-free in whom disease onset occurred during the follow-up period. We propose a novel cumulative incidence function (CIF) estimator that goes beyond existing methods in that it incorporates all disease cases, both prevalent and incident, irrespective of their subsequent life course. In particular, the new method can handle situations involving diseases that can occur at young ages with long survival after disease onset. Asymptotic properties of the new method are established and a simulation study is presented examining the performance of the method. We illustrate the use of the method and highlight its advantages over existing methods with an application to cancer data from the UK Biobank.

8. Economic, Financial and Social Science Statistics (3 papers)

2606.20240 2026-06-19 econ.EM stat.AP New submission

Two-Sample IV: Efficient Two-Step Estimation and Tests for Overidentification and Weak-Instruments

Fatima Kasenally, Ruoxi Guan, Frank Windmeijer

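AI summary: For two-sample IV estimation, proposes an efficient two-step estimator and an overidentification test that are robust to heteroskedasticity and sample heterogeneity. Both require only summary statistics from linear regressions. The paper also extends weak-instrument tests to this setting.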

Abstract

Two-sample IV is a popular estimation method when the outcome and treatment variables are available in different samples, whereas instruments are available in both samples. The standard estimator is the two-sample two-stage least squares estimator, which is efficient under homoskedasticity and homogeneity of the samples. We develop a robust two-step procedure for efficient estimation under general heteroskedasticity and heterogeneity of the samples, and propose a related two-sample Hansen overidentification test. A key feature of our approach is that only summary statistics from the linear regressions of the reduced form and first stage in the two samples are needed. These are the six objects of the estimated coefficient vectors, and the homoskedastic and heteroskedasticity-robust estimated variance matrices. We further show that the first-stage F-statistic in the treatment sample can be used as a test for weak instruments in the standard way under homoskedasticity and homogeneity, with the relative bias here a proportional bias. We propose an extension of the effective F-statistic of Montiel-Olea and Pflueger (2013) for the heteroskedastic case, following the generalization in Windmeijer (2025). We illustrate the estimators and tests in an application studying the effect of education on voting behavior from Marshall (2019), with cluster-robust inference.
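
Because only summary statistics are needed, a generic two-step minimum-distance version of the idea fits in a few lines. The sketch makes these assumptions:

* a single endogenous regressor;
* independent samples, so that $\mathrm{Var}(\hat\gamma - \beta\hat\pi) = V_\gamma + \beta^2 V_\pi$;
* an identity weight in the first step instead of the two-sample 2SLS weight.

It is an illustration, not the authors' estimator, and the function names are hypothetical. Under valid instruments, the final statistic would typically be compared with a $\chi^2_{k-1}$ distribution.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve a small dense system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def two_step_md(gamma, V_gamma, pi, V_pi):
    """Two-step minimum-distance estimate of beta in gamma = beta * pi from summary statistics.

    gamma, V_gamma: reduced-form coefficients and variance matrix (outcome sample).
    pi, V_pi: first-stage coefficients and variance matrix (treatment sample).
    Returns (first-step beta, efficient beta, overidentification statistic).
    """
    k = len(gamma)
    beta1 = dot(pi, gamma) / dot(pi, pi)  # step 1: identity weight (assumption)
    # Variance of gamma_hat - beta * pi_hat when the two samples are independent.
    omega = [[V_gamma[i][j] + beta1 ** 2 * V_pi[i][j] for j in range(k)] for i in range(k)]
    w_gamma, w_pi = solve(omega, gamma), solve(omega, pi)  # omega^{-1} gamma, omega^{-1} pi
    beta2 = dot(pi, w_gamma) / dot(pi, w_pi)  # step 2: efficient weight omega^{-1}
    resid = [g - beta2 * p for g, p in zip(gamma, pi)]
    j_stat = dot(resid, solve(omega, resid))
    return beta1, beta2, j_stat


V_gamma = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.03]]
V_pi = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
print(two_step_md([0.52, 0.98, 1.55], V_gamma, [1.0, 2.0, 3.0], V_pi))
```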

2606.20420 2026-06-19 q-fin.CP stat.AP New submission

Advanced Calibration Analysis and Tools: Identifying Influential Observations in Stochastic Interest Rate Model Calibration

Philipp Mahler, Peter Ruckdeschel

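AI summary: Embeds calibration into nonlinear regression theory and shows that minimizing RMSRE is equivalent to weighted least squares. It develops a diagnostic framework built on the weighted hat matrix, influence functions and the functional delta method. The empirical analysis finds boundary-dominated leverage, losses of effective dimensionality and a post-2022 shift in parameter stability, and concludes that a low RMSRE is not sufficient to validate a calibration.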

Comments 47 pages, 9 figures, 1 table

Abstract

The accurate calibration of interest rate models is central to market-consistent valuation and Economic Scenario Generators (ESGs). Traditional calibration methods for multi-factor models such as the G2++ model often rely on point estimates, neglecting the influence of specific market data and the quantification of estimation uncertainty. This paper develops a diagnostic framework embedding the calibration problem into non-linear regression theory. It shows that the common industry practice of minimizing the Root Mean Squared Relative Error (RMSRE) is equivalent to a Weighted Least Squares (WLS) problem. This equivalence yields the corresponding formulations for diagnostic tools, including the Weighted Hat Matrix for leverage analysis, Influence Functions for local sensitivity diagnostics, and the Functional Delta Method for local, boundary-respecting confidence intervals. The implementation uses an efficient Jacobian factorization that exploits the analytical tractability of At-The-Money (ATM) caps. The framework is applied to a dataset of Euro ATM caps covering the period 2016--2025. Our empirical analysis reveals a boundary-dominated leverage profile, repeated losses of effective dimensionality due to active parameter constraints, and a diagnostic regime shift in local parameter stability around the post-2022 market transition. The resulting message for actuarial model governance is that low RMSRE is not sufficient for calibration validation. We conclude by discussing the framework's applicability to general least-squares problems while highlighting the computational challenges for instruments lacking closed-form gradients, such as swaptions.
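
Dividing each residual by its market quote shows the equivalence: $n \cdot \mathrm{RMSRE}^2 = \sum_i w_i (\text{model}_i - \text{market}_i)^2$ with $w_i = 1/\text{market}_i^2$, so both objectives share a minimiser. The sketch below checks this identity and computes weighted-hat-matrix leverages $h_{ii} = w_i j_i^\top (J^\top W J)^{-1} j_i$ from a Jacobian of model quotes with respect to the parameters. The quotes and the 2-parameter Jacobian are hypothetical placeholders; the G2++ model has five parameters. The leverages sum to the number of parameters.

```python
import math


def solve(A, b):
    """Solve a small dense system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def rmsre(model, market):
    """Root mean squared relative error between model and market quotes."""
    return math.sqrt(sum(((m - q) / q) ** 2 for m, q in zip(model, market)) / len(market))


def rmsre_weights(market):
    """WLS weights implied by relative errors: w_i = 1 / market_i**2."""
    return [1.0 / q ** 2 for q in market]


def wls_objective(model, market, weights):
    return sum(w * (m - q) ** 2 for w, m, q in zip(weights, model, market))


def weighted_leverages(jac, weights):
    """Diagonal of H = W^{1/2} J (J' W J)^{-1} J' W^{1/2}; jac[i] = d(model quote i)/d(params)."""
    p = len(jac[0])
    a = [[sum(w * r[s] * r[t] for w, r in zip(weights, jac)) for t in range(p)] for s in range(p)]
    return [w * sum(x * y for x, y in zip(r, solve(a, r))) for w, r in zip(weights, jac)]


market = [0.010, 0.020, 0.035, 0.050]  # hypothetical cap quotes
model = [0.011, 0.019, 0.036, 0.049]
weights = rmsre_weights(market)
jac = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # hypothetical Jacobian at the optimum
lev = weighted_leverages(jac, weights)
print(len(market) * rmsre(model, market) ** 2, wls_objective(model, market, weights))
print(lev, sum(lev))
```

Because $w_i$ is largest for the smallest quotes, the cheapest instruments tend to carry the highest leverage. This is the kind of pattern the weighted hat matrix makes visible.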

2606.19789 2026-06-19 math.OC stat.ME New submission

Dynamic Core Allocation for Malleable Jobs with Unknown Speed-up Parameters

具有未知加速参数的可变作业的动态核心分配

S. A. Bodas, J. L. Dorsman, M. Mandjes, L. Ravner

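AI summary: Considers malleable jobs with unknown speed-up parameters in multicore systems. It proposes an iterative learning-and-control framework that estimates the unknown parameters by maximum likelihood and solves a Markov decision process to update the allocation policy, with the goal of minimizing the long-run mean number of jobs.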

Abstract

We study dynamic resource allocation in a multicore computing system with a fixed number of processing cores and a stream of malleable jobs. Each job may adjust its level of parallelism during execution, allowing adaptive redistribution of resources across concurrently active jobs. Jobs belong to one of two observable classes, each characterized by a distinct speed-up function with unknown parameters. The objective is to learn a core-allocation policy that minimizes the long-run mean number of jobs in the system, equivalently the mean response time in steady state. To address the uncertainty in the speed-up parameters, we develop an iterative learning-and-control framework. The system alternates between estimating the unknown speed-up parameters from observed job completions and solving the associated Markov decision process (MDP) to update the allocation policy. Within each job class, cores are shared equally among active jobs; the fraction of capacity assigned to each class is obtained from the MDP formulation of [berg2017], evaluated at the current parameter estimates. We construct a maximum likelihood estimator based on state-dependent inter-departure times and prove its strong consistency under a fixed allocation policy. We further propose two learning algorithms that combine this estimation step with dynamic programming-based policy updates, and illustrate their performance through numerical experiments.

9. Data Privacy, Robustness and Fairness (1 paper)

2606.20427 2026-06-19 math.ST stat.ME stat.TH New submission

Private Rate-Double-Robust Inference

Máté Kormos, Aad van der Vaart

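AI summary: Protects individual privacy by injecting noise through a local privacy mechanism, while exploiting rate double robustness to achieve unbiased, semiparametrically efficient inference on the target parameter. It also develops private nonparametric and parametric nuisance estimators.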

Abstract

We reconcile privacy protection and rate-double-robust inference. The privacy of individuals is protected by a local privacy mechanism: injecting noise into their sensitive data, revealing only the noisy data for inference. Hence, privacy protection hinders inference. In contrast, the inference of a target parameter is rate-double-robust when the large-sample bias of an estimator of the parameter is characterised by a trade-off between the estimation errors of two other, nuisance, parameters. Hence, rate-double-robustness facilitates inference. Our starting point of reconciliation is a class of rate-double-robust target parameters indexed linearly by an infinite-dimensional and nonlinearly by a low-dimensional regression. Among others, this includes causal parameters. To infer these targets privately, we show how suitable privacy mechanisms transfer the semiparametric properties of the sensitive-data model to the private setting. Rate-double-robustness is transferred, enabling locally-private, unbiased and semiparametrically efficient inference of our target parameters. Finally, we transform general nonparametric nuisance estimators into private ones, which inherit convergence properties of their nonprivate counterparts. For parametric nuisance models, we develop a private method-of-moments estimator and its large-sample inference theory.
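
A local privacy mechanism of the kind described can be as simple as the classical Laplace mechanism, applied by each individual to their own bounded value before release. The sketch assumes one value per person in a known interval and $\varepsilon$-local differential privacy. The paper's mechanisms and estimators are more general, and the function names are hypothetical. Because the noise has mean zero, the sample mean of the released data remains unbiased. Its variance, however, grows by $2(\text{range}/\varepsilon)^2$ per observation.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) draw: the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, lower, upper, epsilon, rng):
    """Each individual clips their value to [lower, upper] and releases it plus Laplace noise
    with scale (upper - lower) / epsilon, which satisfies epsilon-local differential privacy."""
    scale = (upper - lower) / epsilon
    return [min(max(v, lower), upper) + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)
sensitive = [rng.random() for _ in range(10000)]  # hypothetical sensitive values in [0, 1]
released = privatize(sensitive, 0.0, 1.0, epsilon=1.0, rng=rng)
print(sum(sensitive) / len(sensitive), sum(released) / len(released))
```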

10. Datasets, Software and Applications (6 papers)

2606.20114 2026-06-19 stat.ME stat.AP New submission

Community detection in small-sample ordinal regimes: A benchmarking framework for Delphi data

Yuri Calleo, Simone Di Zio, Fabrizio Maturo

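AI summary: Addresses the rank deficiency caused by high-dimensional, small-sample Delphi data. It proposes shifting from variable-centric covariance models to network-centric connectivity models, using community detection algorithms to identify latent thematic structure and achieve structurally stable dimensionality reduction.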

Abstract

The statistical modeling of consensus in Delphi data faces a critical bottleneck: the high dimensionality of questionnaire items relative to the limited sample size of expert panels. This rank deficiency makes traditional latent variable models, such as Principal Component Analysis, structurally unstable and prone to overfitting. Addressing this methodological gap, this study proposes a transition from variable-centric covariance models to network-centric connectivity models. By mapping item correlations onto a weighted graph topology, we present a simulation-based benchmark that utilizes community detection algorithms to identify latent thematic structures, effectively addressing the spectral instability and rank deficiency typical of high-dimensional, low-sample-size regimes. The research systematically evaluates the robustness of topological approaches based on structural density, information flow, and spectral partitioning against synthetic datasets designed to replicate the pathological conditions of consensus data, including ordinal scales and systemic noise. The central methodological contribution lies in demonstrating that collinearity among expert judgments, traditionally treated as statistical redundancy to be regularized, can be effectively reinterpreted as a topological signal of cohesion. This framework provides researchers with a structured and automated procedure for dimensionality reduction, ensuring structural stability and psychometric consistency even in small-sample regimes where standard factor analysis breaks down.
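
The shift from variables to networks can be illustrated with its simplest possible instance. First build an item graph from absolute Pearson correlations across experts. Then take the connected components above a threshold as communities. This thresholded-components rule is a stand-in chosen for brevity; the study benchmarks density-, flow- and spectral-based algorithms instead. The ratings and names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 for a constant item (e.g. unanimous ratings)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def item_communities(responses, threshold):
    """Group questionnaire items via a thresholded correlation graph.

    responses[e][k] is expert e's ordinal rating of item k. Items k and l are linked when
    |r_kl| >= threshold (the weights |r_kl| themselves would feed weighted algorithms such as
    modularity-based ones); here communities are simply the connected components.
    """
    items = list(zip(*responses))  # one tuple of ratings per item
    n = len(items)
    adj = {k: [l for l in range(n) if l != k and abs(pearson(items[k], items[l])) >= threshold]
           for k in range(n)}
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first search over the item graph
            k = stack.pop()
            group.append(k)
            for l in adj[k]:
                if l not in seen:
                    seen.add(l)
                    stack.append(l)
        groups.append(sorted(group))
    return groups


# Five experts rate four items on a 1-5 scale (hypothetical data).
responses = [(1, 1, 2, 4), (2, 2, 5, 1), (3, 3, 1, 5), (4, 5, 4, 2), (5, 5, 3, 3)]
print(item_communities(responses, threshold=0.5))
```

Items 2 and 3 are perfectly anti-correlated, and the absolute value treats that collinearity as cohesion. This is the reinterpretation the abstract describes.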

2606.19775 2026-06-19 cs.SI stat.AP stat.OT New submission

Rethinking Sampling Strategy in Link Prediction

Yilin Bi, Zhenyu Deng, Xinshan Jiao, Tao Zhou

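AI summary: Proposes the β-sampling scheme to study how two-stage sampling affects link prediction performance. It finds that the structural characteristics of missing links strongly affect prediction accuracy and that the second-stage sampling strategy is crucial.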

Comments 19 pages, 5 figures, 3 tables

Abstract

Many real-world networks are incomplete, making link prediction a fundamental challenge in network science. To train parameters and evaluate algorithms, observed links are usually divided into three subsets, namely training, validation, and probe sets. This division implicitly involves two sampling processes: first-stage sampling yields the probe set and second-stage sampling obtains the validation set. To date, our understanding of how these two sampling processes affect algorithm performance remains quite limited. To address this issue, we propose a sampling scheme called $\beta$-sampling, where the sampling probability of a link is proportional to the product of the degrees of its two endpoints raised to the power of $\beta$. Experiments on 45 real-world networks reveal that the structural characteristics of missing links, as simulated via varying probe sets, substantially impact prediction accuracy. When missing links tend to connect high-degree nodes, such links can be predicted accurately with ease. Furthermore, even with a fixed probe set, second-stage sampling still exerts a significant influence on prediction accuracy. Notably, the optimal second-stage sampling strategy differs from both random sampling (which randomly selects links to form the validation set) and consistent sampling (which guarantees that links in the validation and probe sets share identical structural characteristics).
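
The $\beta$-sampling rule can be sketched as follows. Degrees are taken from the observed network. Links are drawn one at a time without replacement, each with probability proportional to $(k_u k_v)^\beta$ among those remaining. The sequential without-replacement scheme and the function names are assumptions, since the abstract only specifies the per-link probability. Setting $\beta = 0$ recovers uniform random sampling, and $\beta > 0$ favours links between high-degree nodes.

```python
import random


def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def beta_weights(edges, beta):
    """Unnormalised beta-sampling weights (k_u * k_v) ** beta for each link."""
    deg = degrees(edges)
    return [(deg[u] * deg[v]) ** beta for u, v in edges]


def beta_sample(edges, beta, m, rng):
    """Draw m distinct links; each successive draw picks a remaining link with
    probability proportional to (k_u * k_v) ** beta."""
    weights = beta_weights(edges, beta)
    remaining = list(range(len(edges)))
    chosen = []
    for _ in range(m):
        pos = rng.choices(range(len(remaining)), weights=[weights[i] for i in remaining], k=1)[0]
        chosen.append(edges[remaining.pop(pos)])
    return chosen


edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
weights = beta_weights(edges, beta=1.0)
probs = [w / sum(weights) for w in weights]
probe = beta_sample(edges, beta=1.0, m=2, rng=random.Random(0))  # first-stage sample
remaining = [e for e in edges if e not in probe]  # links left for training and validation
print(probs, probe, remaining)
```

The same function can be reused for the second stage by calling it on `remaining` to carve out a validation set, with a possibly different $\beta$.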

2606.19642 2026-06-19 physics.ao-ph stat.AP stat.ML New submission

Rigorous uncertainty quantification of probabilistic AI weather forecasts with conformal prediction

Anna Asch, Raphael Rossellini, Pedram Hassanzadeh, Rebecca Willett

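AI summary: Addresses the under-calibration of probabilistic AI weather forecasts, especially for extremes. It applies conformal prediction, which mathematically guarantees coverage without distributional assumptions, to temperature and precipitation forecasts from three global models (GenCast, NeuralGCM, AIFS-ENS). The result is calibrated uncertainty without sacrificing other probabilistic metrics.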

Abstract

Probabilistic weather forecasting is undergoing rapid transformation with artificial intelligence (AI). In traditional numerical weather prediction, computing power can limit how well ensemble forecasts approximate the unknown statistical distribution of future states. AI models facilitate larger ensembles and are trained with probabilistic considerations, ideally leading to better uncertainty quantification. Forecasts from these state-of-the-art models are often considered well-calibrated. However, here we show that the statistical coverage of such models, the ultimate measure of calibration, can struggle, especially on extreme events. To address this shortcoming, we employ conformal prediction, a class of statistical methods that mathematically guarantees coverage under no distributional assumptions, unlike previous post-processing techniques. We apply online conformal prediction to temperature and precipitation forecasts (including extremes) of three leading global weather models, GenCast, NeuralGCM, and AIFS-ENS, ensuring calibrated uncertainty at no expense to other probabilistic metrics. This post-processing method can be applied to any forecasting model.
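
One simple online conformal scheme is online quantile tracking, in the spirit of adaptive conformal inference. Whether it matches the authors' exact variant is an assumption. The scheme keeps a radius $q_t$ for the interval $[\hat y_t - q_t, \hat y_t + q_t]$. After each verification it increases $q$ by $\eta(1-\alpha)$ on a miss and decreases it by $\eta\alpha$ on a cover.

If the scores $|y_t - \hat y_t|$ lie in $[0, B]$ and $q_1 = 0$, then $q_t$ stays in $[-\eta\alpha, B + \eta(1-\alpha)]$. The realised miss rate therefore deviates from $\alpha$ by at most $(B + \eta)/(\eta T)$, with no distributional assumption. The forecasts and observations below are hypothetical.

```python
import random


def online_conformal(forecasts, observations, alpha=0.1, eta=0.05):
    """Online quantile tracking for intervals [yhat - q, yhat + q].

    After each verification, q grows by eta * (1 - alpha) on a miss and shrinks by
    eta * alpha on a cover, driving the long-run miss rate toward alpha.
    Returns the issued intervals and the realised miss rate.
    """
    q = 0.0
    intervals, misses = [], 0
    for yhat, y in zip(forecasts, observations):
        intervals.append((yhat - q, yhat + q))  # issued before y is revealed
        miss = 1 if abs(y - yhat) > q else 0
        misses += miss
        q += eta * (miss - alpha)
    return intervals, misses / len(observations)


rng = random.Random(0)
T = 20000
forecasts = [0.0] * T
observations = [rng.uniform(-1.0, 1.0) for _ in range(T)]  # scores |y - yhat| lie in [0, 1]
_, miss_rate = online_conformal(forecasts, observations, alpha=0.1, eta=0.05)
print(miss_rate)
```

The step size trades adaptivity against smoothness. A larger `eta` reacts faster to regime changes, such as a run of extremes, but produces noisier interval widths.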

2606.18544 2026-06-19 stat.AP New submission

Chess Signatures of Play

Christian Turk, Nicholas Polson

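AI summary: Uses the signature transform from rough-path theory to extract invariant features that capture the order and interaction of events in a chess game. On this basis it builds a signature-kernel two-sample test and an anytime-valid cheating-detection method that substantially improves detection power while controlling error rates.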

英文摘要

A game of chess is a stream: a time-ordered sequence of moves, each carrying an engine evaluation, a measure of accuracy, a measure of position complexity, and a clock reading. We model a game as a multivariate path and apply the signature transform of rough-path theory to obtain a reparametrization-invariant, graded feature set that records the order and interaction of in-game events without a parametric likelihood. We show that a player's law of play is identifiable from the expected signature up to tree-like equivalence, construct a signature-kernel two-sample test on path space, and recast cheating detection as an anytime-valid sequential test: a signature conformance score becomes an e-process whose error is controlled for every sample size at once by Ville's inequality, with fluctuations calibrated on the moderate-deviation scale. The discriminating information lives in the signature's Lévy areas, which measure whether accuracy rises precisely when positions become hard, which is the fingerprint of engine assistance that aggregate match-rate statistics discard. In a controlled study, the test holds exact type-I control and detection power rises from negligible for subtle assistance to 0.98 for blatant assistance, with a median detection time matching the growth-rate prediction. Calibrated to Magnus Carlsen's documented elite accuracy, the monitor does not flag world-champion-level play; and we exhibit cheating strategies that leave every aggregate statistic, including the best-move-frequency z-score of the Regan system, unchanged yet are caught cleanly by the signature, making precise how an order-aware, anytime-valid test strengthens the prevailing approach to chess anti-cheating.
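The Lévy area is the level-2 signature term that records order. Two paths with identical total increments can still have opposite Lévy areas, depending on which channel moved first. The sketch below computes the signed Lévy area of a piecewise-linear 2-D path. The channel names (complexity, accuracy) and the toy paths are hypothetical; this is not the authors' feature pipeline.

```python
def increment(path):
    """Level-1 signature of a 2-D path: the total increment of each channel."""
    (x0, y0), (x1, y1) = path[0], path[-1]
    return (x1 - x0, y1 - y0)


def levy_area(path):
    """Signed Levy area of a piecewise-linear 2-D path [(x0, y0), (x1, y1), ...].

    Equals 0.5 * (S^{12} - S^{21}) from the level-2 signature. For linear segments the
    integral 0.5 * int (x - x0) dy - (y - y0) dx reduces to the sum below.
    """
    x0, y0 = path[0]
    area = 0.0
    for (xa, ya), (xb, yb) in zip(path, path[1:]):
        area += (xa - x0) * (yb - ya) - (ya - y0) * (xb - xa)
    return 0.5 * area


# Hypothetical channels: x = position complexity, y = move accuracy.
hard_then_accurate = [(0, 0), (1, 0), (1, 1)]   # accuracy rises after positions get hard
accurate_then_hard = [(0, 0), (0, 1), (1, 1)]   # accuracy rises before positions get hard
```

Both toy paths have the same increment (1, 1), so any statistic built only from totals treats them identically. Their Lévy areas are +0.5 and −0.5, which separates the two orderings.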

2606.18436 2026-06-19 stat.ML cs.LG 新提交

Pointwise is Pointless? A Multimodal Ablation Study for Precipitation Nowcasting with Graph Neural Networks

逐点是否无意义?基于图神经网络的降水临近预报的多模态消融研究

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

发表机构 * Norwegian Meteorological Institute(挪威气象研究所)

AI总结 本研究通过多模态图神经网络系统,消融分析雷达、数值预报、地面观测、卫星数据及训练损失对降水临近预报的影响,发现各模态分别改善不同方面,点观测虽提升局部但需结合损失函数和不确定性表示才能优化雷达场。

详情
AI中文摘要

稀疏点观测在降水临近预报中日益可用,但尚不清楚它们能在多大程度上改善密集雷达场预报。我们通过北欧雷达区域的多模态图神经网络临近预报系统部分回答了这个问题。该模型预测未来两小时内每五分钟的降雨率,并采用雷达历史、MEPS数值天气预报、Netatmo地面观测、MSG卫星通道、随机噪声和基于CRPS的集合损失的不同组合进行训练。本研究设计为对操作相关信源和训练目标的消融。我们比较了仅雷达、NWP信息、站点信息、卫星信息、噪声增强和基于CRPS的配置,使用雷达网格、站点位置、降雨起始的互补诊断,以及oracle、位移和幅度评分。结果表明,每个信源改善了预报问题的不同方面。MEPS稳定了仅雷达外推,Netatmo观测改善了局部站点和起始诊断,卫星预测因子减少了某些站点级偏差,但在确定性使用时可能过早激活降雨。基于CRPS的配置提供了最一致的雷达网格增益,而卫星与CRPS的组合设置给出了最佳的整体oracle/DAS评分。这些结果不支持点观测对临近预报无用的结论,但表明局部观测技能和空间相干雷达场技能是不同的目标。实际意义是,稀疏观测可以提供有用的局部约束,但它们对雷达类场的益处取决于训练损失、不确定性表示以及观测支持在模型中的编码方式。

英文摘要

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
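CRPS-based ensemble losses reward ensembles that are both close to the observation and appropriately spread. Below is a minimal sketch of the standard empirical (energy-form) CRPS for one scalar observation. Averaging it over grid points and lead times to form a training loss is an assumption about how such a loss would be used here, not the paper's exact implementation.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of an ensemble forecast against a scalar observation.

    CRPS = mean|X_i - y| - 0.5 * mean|X_i - X_j|, using all m^2 member pairs.
    Lower is better; for a single member it reduces to the absolute error.
    """
    m = len(members)
    skill = sum(abs(x - obs) for x in members) / m
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return skill - spread
```

For example, crps_ensemble([0.0, 2.0], 1.0) evaluates to 1.0 − 0.5 = 0.5. The "fair" variant instead divides the spread term by 2m(m − 1), which removes the bias towards small ensembles.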

2606.18611 2026-06-19 cs.SD cs.AI cs.LG stat.ML 新提交

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

QC-GAN: 一种参数高效的四元数Conformer GAN用于高保真语音增强

Shogo Yamauchi, Hideaki Tamori, Makoto Sakai, Yosuke Yamano, Tohru Nitta

发表机构 * The Asahi Shimbun Company(朝日新闻社) Tokyo Woman's Christian University(东京女子基督教大学)

AI总结 提出参数高效的QC-GAN,结合四元数Conformer生成器和MetricGAN训练,通过汉密尔顿积共享权重减少参数量,在VoiceBank+DEMAND上以0.89M参数达到PESQ 3.48,性能媲美两倍大小模型。

Comments 10 pages, 6 figures and 5 tables. Accepted at Interspeech 2026

详情
AI中文摘要

我们提出了一种参数高效的语音增强框架——四元数Conformer GAN(QC-GAN),它将四元数Conformer生成器与基于MetricGAN的训练相结合。汉密尔顿积通过结构化权重共享对幅度和相位进行编码,在减少层参数数量的同时保持其相互依赖性。采用度量学习判别器,通过优化近似感知评估分数来最大化感知质量。在VoiceBank+DEMAND数据集上,QC-GAN仅用0.89M参数就达到了3.48的语音质量感知评估(PESQ)分数,其性能与最先进模型相当,而参数量不到后者的一半。一个35K参数的变体实现了3.23的PESQ分数,以显著更少的参数超越了传统方法。在DNS-Challenge 3数据集上的评估进一步证实了其在真实世界条件下的泛化能力。

英文摘要

We propose a parameter-efficient speech enhancement framework, Quaternion Conformer GAN (QC-GAN), which combines a Quaternion Conformer generator with MetricGAN-based training. The Hamilton product encodes the magnitude and phase via structured weight sharing, reducing the number of layer parameters while preserving their interdependencies. A metric-learning discriminator was employed to maximize perceptual quality by optimizing the approximate perceptual evaluation scores. On the VoiceBank+DEMAND dataset, QC-GAN achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering a performance comparable to state-of-the-art models at less than half their size. A 35K-parameter variant achieved a PESQ score of 3.23, surpassing conventional methods with significantly fewer parameters. Evaluation on the DNS-Challenge 3 dataset further confirmed generalization to real-world conditions.
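The parameter saving from the Hamilton product comes from weight sharing. Each quaternion weight has 4 real parameters but connects 4 real inputs to 4 real outputs, which a real layer would cover with 16 weights. The sketch below is a generic quaternion dense layer. How QC-GAN packs magnitude and phase features into the four components is an assumption not shown here, and the helper names are hypothetical.

```python
def hamilton(p, q):
    """Hamilton product of quaternions p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_dense(weights, inputs):
    """y_j = sum_i W[j][i] (x) x_i, where every W[j][i] is a quaternion (4 real weights)."""
    out = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            acc = tuple(s + t for s, t in zip(acc, hamilton(w, x)))
        out.append(acc)
    return out


def param_counts(n_in, n_out):
    """Real parameters of a quaternion layer vs. a real layer on the same 4x real units."""
    return 4 * n_in * n_out, (4 * n_in) * (4 * n_out)
```

For 8 quaternion inputs and outputs (32 real units each), param_counts(8, 8) gives 256 versus 1024, a 4x reduction. The product is non-commutative (i·j = k but j·i = −k), which is what lets the shared weights couple the four components.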

11. 其他/综合统计 13 篇

2606.19859 2026-06-19 cs.IT cs.LG math.IT math.PR math.ST stat.TH 新提交

Doeblin Curves

Doeblin 曲线

Dongmin Lee, William Lu, Anuran Makur, Japneet Singh

AI总结 提出 Doeblin 曲线概念,量化马尔可夫核在不同散度和功率水平下的收缩行为,并应用于噪声迭代优化、噪声电路可靠计算和差分隐私等领域的更细粒度收缩分析。

Comments 42 pages, 2 figures

Journal ref IEEE Transactions on Information Theory, vol. 72, no. 6, pp. 3556-3596, June 2026

详情
AI中文摘要

近期关于 Doeblin 系数的研究揭示了它们作为 TV 距离的 Dobrushin 收缩系数的多路泛化的有用性,这与它们在马尔可夫链遍历性理论中的经典作用不同。然而,要利用 Doeblin 系数建立信息收缩的存在性,通常需要强条件,例如 Doeblin 系数有界远离 0。基于最近提出的非线性信息收缩概念,我们旨在提出一种更细粒度的基于 Doeblin 的多路收缩行为刻画,即使对于 Doeblin 系数为 0 的信道,也能产生非平凡的收缩保证。为此,我们引入了 Doeblin 曲线的概念,即一种非线性函数,它量化了马尔可夫核在特定散度和功率水平下对输入分布集合的收缩行为。在分析过程中,我们发展了 Doeblin 系数的新变分刻画,提出了 Doeblin 曲线的若干性质,定义了功率约束 Doeblin 曲线的几个版本,并利用上述变分刻画推导了上下界。然后,我们将这些结果应用于不同领域,包括噪声迭代优化的泛化界、噪声电路可靠计算的误差界以及在线迭代算法的差分隐私保证。特别是,我们将这些领域的结果扩展到更广泛的领域或群体设置,利用 Doeblin 曲线揭示比 Doeblin 系数更细粒度的收缩现象。

英文摘要

Recent research on Doeblin coefficients has shed light on their usefulness as a multi-way generalization of the Dobrushin contraction coefficient for TV distance, in a separate vein from their classic role in the theory of Markov chain ergodicity. However, strong conditions, such as being bounded away from 0, are typically necessary for Doeblin coefficients to establish the existence of information contraction. Building on recently formulated concepts of nonlinear information contraction, we aim to propose a finer-grained Doeblin-based characterization of multi-way contraction behavior which yields non-vacuous contraction guarantees even for channels whose Doeblin coefficient is 0. To this end, we introduce the notion of a Doeblin curve -- a nonlinear function which quantifies the contraction behavior of a Markov kernel on collections of input distributions at specific levels of divergence and power. Through the course of our analysis, we develop a new variational characterization of Doeblin coefficients, present several properties of Doeblin curves, define several versions of power-constrained Doeblin curves, and derive upper and lower bounds using our aforementioned variational characterization. We then utilize these results in diverse areas, including generalization bounds for noisy iterative optimization, error bounds for reliable computation with noisy circuits, and differential privacy guarantees for online iterative algorithms. In particular, we extend results in these areas to broader domains or group settings, leveraging Doeblin curves to reveal finer-grained contraction phenomena than Doeblin coefficients.
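For a finite channel, the two coefficients in the abstract can be computed directly. The Doeblin coefficient takes a minimum over all inputs at once (the "multi-way" view). The Dobrushin coefficient compares inputs two at a time. The sketch below uses the usual finite-alphabet definitions and hypothetical example channels; it does not compute Doeblin curves themselves.

```python
from itertools import combinations


def doeblin_coefficient(K):
    """tau(K) = sum_y min_x K(x, y) for a row-stochastic matrix K (rows = inputs x)."""
    return sum(min(col) for col in zip(*K))


def dobrushin_coefficient(K):
    """eta_TV(K) = max over input pairs of TV(K(x, .), K(x', .))."""
    return max(
        0.5 * sum(abs(a - b) for a, b in zip(K[i], K[j]))
        for i, j in combinations(range(len(K)), 2)
    )


# Hypothetical examples: a binary symmetric channel and a 3-state "cyclic" channel.
bsc = [[0.9, 0.1], [0.1, 0.9]]
cyclic = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
```

For every input pair, 1 − TV equals the sum over y of min(K(x, y), K(x', y)), which is at least τ(K). Hence η_TV ≤ 1 − τ(K) always holds. The cyclic channel has τ = 0 yet η_TV = 0.5, so it still contracts. This is the kind of channel for which, according to the abstract, Doeblin curves give non-vacuous guarantees.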

2606.19726 2026-06-19 math.ST stat.TH 新提交

A Laplace equation approach to the Behrens--Fisher problem

Behrens-Fisher问题的拉普拉斯方程方法

Nagananda K G, Jong Sung Kim

AI总结 针对两独立正态样本方差未知且不等的情况,提出偏微分方程公式,通过正交分解和球面楔概率将分布问题转化为拉普拉斯-狄利克雷边值问题,导出累积分布函数和概率密度的精确有限样本表示,并得到尾部分布展开。

Comments 31 pages, 4 figures

详情
AI中文摘要

我们针对两个独立正态样本(方差未知且不等)的Behrens-Fisher问题,发展了一种偏微分方程公式。通过正交分解分离均值分量和残差分量(对应于去除均值方向后中心化的样本内变异),并将样本均值的学生化差异重新表述为尺度不变的几何约束。这种简化将分布问题转化为球面楔概率的评估,这些概率被识别为调和测度以及拉普拉斯-狄利克雷边值问题在原点的值。在此框架下,我们导出了累积分布函数和概率密度函数的精确有限样本表示,形式为贝塔函数,仅依赖于样本量和方差比。这些表示将Behrens-Fisher分布置于标准特殊函数形式中,可直接在广泛可用的商业软件(包括Microsoft Excel)中使用,从而便于分布评估和分位数计算。我们还得到了相关调和延拓及其阈值导数的Gegenbauer分离变量展开,系数为封闭的贝塔-伽马形式,并导出了具有显式首项常数和高阶修正的尖锐尾部分布展开。

英文摘要

We develop a partial differential equation formulation of the Behrens-Fisher problem for two independent normal samples with unknown and unequal variances. An orthogonal decomposition separates mean and residual components (corresponding to the centered within-sample variation left after removal of the mean directions) and recasts the studentized difference of sample means as a scale-invariant geometric constraint. This reduction transforms the distributional problem into the evaluation of spherical wedge probabilities, which are identified with harmonic measure and with the value at the origin of a Laplace-Dirichlet boundary value problem. From this framework, we derive exact finite-sample representations for the cumulative distribution function and the probability density function in terms of beta functions, with dependence only on the sample sizes and the variance ratio. These representations place the Behrens-Fisher law in a standard special-function form that is directly accessible in widely available commercial software -- including Microsoft Excel -- thereby facilitating distributional evaluation and quantile computation. We also obtain a Gegenbauer separation-of-variables expansion for the associated harmonic extension and its threshold derivative, with coefficients in closed Beta-Gamma form, and derive sharp tail expansions with explicit leading constants and higher-order corrections.
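The paper's exact beta-function representations are not reproduced here. As a hedged cross-check, the Behrens-Fisher statistic's CDF can be estimated by Monte Carlo. The sketch also illustrates the scale invariance behind the claim that the law depends only on the sample sizes and the variance ratio. Function names and default draw counts are hypothetical.

```python
import math
import random


def _mean_var(v):
    m = sum(v) / len(v)
    return m, sum((a - m) ** 2 for a in v) / (len(v) - 1)   # unbiased sample variance


def bf_statistic(x, y):
    """Studentized difference of means with unpooled variances (Behrens-Fisher form)."""
    mx, vx = _mean_var(x)
    my, vy = _mean_var(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


def mc_cdf(t, n1, n2, var_ratio, draws=10000, seed=0):
    """Monte Carlo estimate of P(T <= t) under equal means.

    By scale invariance the law depends only on n1, n2 and var_ratio = sigma1^2 / sigma2^2,
    so sigma2 is fixed to 1 without loss of generality.
    """
    rng = random.Random(seed)
    s1 = math.sqrt(var_ratio)
    hits = 0
    for _ in range(draws):
        x = [rng.gauss(0.0, s1) for _ in range(n1)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        hits += bf_statistic(x, y) <= t
    return hits / draws
```

This estimate converges only at rate 1/√draws. An exact closed form is valuable precisely because it avoids this sampling error in tail and quantile computations.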

2606.11171 2026-06-19 cs.LG cond-mat.stat-mech cs.IT math.IT math.OC math.ST stat.TH 新提交

Indexed Bellman Information Complexity

索引贝尔曼信息复杂度

Yunbei Xu

AI总结 提出索引贝尔曼信息复杂度,一种以信息指标和参考历史为中心的交互式决策表示层面理论,将UCB、E2D/DEC和AMS/EBO方法统一为同一贝尔曼恒等式的三种松弛,给出有效维度尺度的下界,并以核赌博机为重点应用,说明DEC应被视为该复杂度的一步松弛。

详情
AI中文摘要

我们发展了索引贝尔曼信息复杂度,这是一种以信息指标和参考历史为中心的交互式决策表示层面理论。该表示剥离了问题特定的语法,仅保留动态规划和信息核算所需的要素,从而统一了早先的索引算法信息比(AIR)框架。在上界方面,遗憾由贝尔曼超解或势函数恒等式控制,其梯度括号项由索引信息支付。上置信界(UCB)、估计到决策/决策估计系数(E2D/DEC)以及自适应极小极大采样或基于优化的探索(AMS/EBO)方法表现为同一恒等式的三种松弛。在下界方面,后验参考轨迹同时提供了信息伸缩和小遗憾轨迹的幽灵分位数。所得下界中的临界半径是有效维度尺度的量,类似于Fano下界和局部先验质量下界,而非两点Le Cam论证中的常数半径。示例表明,DEC最好被视为索引贝尔曼信息复杂度的一步松弛,而非普遍紧的转换机制。我们通过若干应用说明该框架,尤其侧重核赌博机。在此设置中,活跃动作边际为比较UCB、E2D和AMS/EBO提供了具体基础。

英文摘要

We develop indexed Bellman information complexity, a representation-level theory of interactive decision making centered on information indices and reference histories. The representation strips away problem-specific syntax and retains only the ingredients needed for dynamic programming and information accounting, thereby unifying the earlier framework of indexed algorithmic information ratios (AIR). On the upper-bound side, regret is controlled by Bellman supersolutions or potential identities whose gradient bracket is paid for by indexed information. Upper-confidence-bound (UCB), estimation-to-decision/decision-estimation-coefficient (E2D/DEC), and adaptive-minimax-sampling or exploration-by-optimization (AMS/EBO) methods appear as three relaxations of this same identity. On the lower-bound side, the posterior-reference trajectory supplies both the information telescope and the ghost quantile of small-regret trajectories. The resulting critical radius in the lower bound is an effective-dimension-scale quantity, as in Fano and local-prior-mass lower bounds, rather than the constant radius of a two-point Le Cam argument. The examples show that DEC is best viewed as a one-step relaxation of indexed Bellman information complexity, not as a universally tight conversion mechanism. We illustrate the framework through several applications, with particular emphasis on kernel bandits. In this setting, the active action marginal provides a concrete basis for comparing UCB, E2D, and AMS/EBO.

2507.15475 2026-06-19 eess.SP math.PR stat.AP

On the Distribution of a Two-Dimensional Random Walk with Restricted Angles

二维受限角度随机游走的分布

Karl-Ludwig Besser

AI总结 研究受限角度二维随机游走的分布,推导两步情形的精确联合与边缘分布,给出一般步数的数值解及大步数近似,并精确刻画任意步数下的支撑集。

Comments 14 pages, 14 figures

Journal ref IEEE Transactions on Signal Processing, vol. 74, pp. 2316-2330, 2026

详情
AI中文摘要

本文推导了二维(复数)随机游走的分布,其中每一步的角度被限制在圆的一个子集上。这种设置出现在多个领域,例如信号处理中的空中计算。特别地,我们推导了两步情形的精确联合分布和边缘分布,给出了一般步数的数值解,以及大步数下的近似。此外,我们给出了任意步数下支撑集的精确刻画。本文的结果可为今后涉及此类问题的研究提供参考。

英文摘要

In this paper, we derive the distribution of a two-dimensional (complex) random walk in which the angle of each step is restricted to a subset of the circle. This setting appears in various domains, such as in over-the-air computation in signal processing. In particular, we derive the exact joint and marginal distributions for two steps, numerical solutions for a general number of steps, and approximations for a large number of steps. Furthermore, we provide an exact characterization of the support for an arbitrary number of steps. The results in this work provide a reference for future work involving such problems.
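A Monte Carlo reference is useful for checking such distributions numerically. The sketch below assumes each unit step has an angle drawn uniformly from a single arc [−θ, θ]; the paper allows more general subsets of the circle. Two simple facts about the support can then be checked. First, |S_n| ≤ n. Second, for θ < π/2 every step has cos φ ≥ cos θ, so Re S_n ≥ n cos θ and the origin lies outside the support. Function names are hypothetical.

```python
import cmath
import math
import random


def restricted_walk(n_steps, theta, rng):
    """Endpoint of a unit-step planar walk whose angles are uniform on [-theta, theta]."""
    return sum(cmath.rect(1.0, rng.uniform(-theta, theta)) for _ in range(n_steps))


def simulate(n_steps, theta, draws=20000, seed=0):
    """Monte Carlo sample of walk endpoints (complex numbers)."""
    rng = random.Random(seed)
    return [restricted_walk(n_steps, theta, rng) for _ in range(draws)]


walks = simulate(5, math.pi / 4)
```

Under this uniform-arc assumption, E[e^{iφ}] = sin θ / θ, so the sample mean of the endpoints should be close to n·sin θ / θ on the real axis.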

2506.23396 2026-06-19 stat.ML cs.LG

AICO: Feature Significance Tests for Supervised Learning

Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

发表机构 * Stanford University, Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering(斯坦福大学管理科学与工程系和计算与数学工程研究所) Upstart, Inc.(Upstart公司) Stanford University, Institute for Computational and Mathematical Engineering(斯坦福大学计算与数学工程研究所)

详情
英文摘要

Machine learning is central to modern science, industry, and policy, yet its predictive power often comes at the cost of transparency: we rarely know which input features truly drive a model's predictions. Without such understanding, researchers cannot draw reliable conclusions, practitioners cannot ensure fairness or accountability, and policymakers cannot trust or govern model-based decisions. Existing tools for assessing feature influence are limited; most lack statistical guarantees, and many require costly retraining or surrogate modeling, making them impractical for large modern models. We introduce AICO, a broadly applicable framework that turns model interpretability into an efficient statistical exercise. AICO tests whether each feature genuinely improves predictive performance by masking its information and measuring the resulting change. The method provides exact, finite-sample feature p-values and confidence intervals for feature importance through a simple, non-asymptotic hypothesis testing procedure. It requires no retraining, surrogate modeling, or distributional assumptions, making it feasible for large-scale algorithms. In both controlled experiments and real applications, from credit scoring to mortgage-behavior prediction, AICO reliably identifies the variables that drive model behavior, providing a scalable and statistically principled path toward transparent and trustworthy machine learning.
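The masking idea can be illustrated with a generic exact test. Replace one feature with a fixed baseline, compute per-sample loss changes on held-out data, and apply a one-sided sign test. This is an assumption about the mechanics, not the AICO procedure itself. The mean-imputation mask, squared-error loss, sign test, toy data and all names are hypothetical. Exactness requires the per-sample differences to be independent, with median ≤ 0 under the null.

```python
import math
import random


def sign_test_pvalue(diffs):
    """One-sided exact sign test of H0: median(diff) <= 0 against median > 0."""
    nonzero = [d for d in diffs if d != 0]
    n, k = len(nonzero), sum(d > 0 for d in nonzero)
    if n == 0:
        return 1.0
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def masking_pvalue(predict, X, y, j):
    """p-value for 'feature j improves squared-error loss', masking j by its mean."""
    col_mean = sum(row[j] for row in X) / len(X)
    diffs = []
    for row, target in zip(X, y):
        masked = list(row)
        masked[j] = col_mean                     # remove feature j's information
        loss_full = (predict(row) - target) ** 2
        loss_masked = (predict(masked) - target) ** 2
        diffs.append(loss_masked - loss_full)    # > 0 when masking hurts
    return sign_test_pvalue(diffs)


# Hypothetical held-out data and a fitted model that only uses feature 0.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2 * r[0] + rng.gauss(0, 0.1) for r in X]


def model(row):
    return 2 * row[0]
```

No retraining is needed: the same fitted model is evaluated twice per sample. Masking feature 0 gives a tiny p-value. Masking the ignored feature 1 leaves every loss unchanged, so the p-value is 1.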

2412.20298 2026-06-19 cs.LG cs.CY stat.ML

An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban, Tai Le Quy

发表机构 * Banking Academy of Vietnam(越南银行学院) Vietnam Academy of Science and Technology(越南科学技术院) Hanoi University of Science and Technology(河内科学技术大学) University of Koblenz(科布伦茨大学)

Comments The manuscript has been submitted to a Springer Nature journal

详情
英文摘要

The digitalization of credit scoring has become essential for financial institutions and commercial banks, especially in the era of digital transformation. Machine learning techniques are commonly used to evaluate customers' creditworthiness. However, the predicted outcomes of machine learning models can be biased toward protected attributes, such as race or gender. Numerous fairness-aware machine learning models and fairness measures have been proposed. Nevertheless, their performance in the context of credit scoring has not been thoroughly investigated. In this paper, we present a comprehensive experimental study of fairness-aware machine learning in credit scoring. The study explores key aspects of credit scoring, including financial datasets, predictive models, and fairness measures. We also provide a detailed evaluation of fairness-aware predictive models and fairness measures on widely used financial datasets. The experimental results show that fairness-aware models achieve a better balance between predictive accuracy and fairness compared to traditional classification models.
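Two widely used group-fairness measures can be computed from predictions alone. Statistical parity compares approval rates; equal opportunity compares true-positive rates among creditworthy applicants. The sketch below uses hypothetical names and the convention "protected minus unprotected". The study may use other measures or sign conventions; ratio forms are also common.

```python
def _rate(values):
    return sum(values) / len(values) if values else float("nan")


def statistical_parity_difference(y_pred, group):
    """P(yhat = 1 | protected) - P(yhat = 1 | unprotected); group[i] == 1 marks protected."""
    prot = [p for p, g in zip(y_pred, group) if g == 1]
    unprot = [p for p, g in zip(y_pred, group) if g == 0]
    return _rate(prot) - _rate(unprot)


def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates between protected and unprotected applicants."""
    prot = [p for t, p, g in zip(y_true, y_pred, group) if t == 1 and g == 1]
    unprot = [p for t, p, g in zip(y_true, y_pred, group) if t == 1 and g == 0]
    return _rate(prot) - _rate(unprot)
```

A value of 0 means parity on that criterion. The two measures can disagree, which is one reason experimental comparisons report several of them.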

2510.05013 2026-06-19 stat.ML cs.LG

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

机器人通过自我探索实现好奇心驱动的行为与语言发展

Theodore Jerome Tinker, Kenji Doya, Jun Tani

发表机构 * Okinawa Institute of Science and Technology(冲绳科学技术大学院大学)

AI总结 本研究通过好奇心驱动的机器人自我探索,结合Q学习实现主动推理,揭示了组合泛化、快速学习、先配对后组合以及异常处理导致的U型发展模式,为人类高效语言习得提供解释。

Comments 27 pages, 22 pages of supplementary material

详情
AI中文摘要

婴儿通过极少的经验就能泛化习得语言,而大型语言模型需要数十亿的训练标记。人类高效发展的基础是什么?我们通过实验研究了这一问题,其中机器人代理通过好奇心驱动的自我探索学习执行与祈使句(例如,推红色立方体)相关的动作。我们的方法使用Q学习摊销主动推理,实现内在动机的发展性学习。模拟揭示了与发展心理学观察相对应的关键发现。i) 随着组合元素规模的增加,泛化能力显著提高。ii) 好奇心驱动的探索能够加速学习。iii) 句子和动作的机械配对先于组合泛化。iv) 异常处理导致U型发展表现,这种模式类似于儿童语言学习中的表征重述。这些结果表明,好奇心驱动的主动推理解释了内在动机的感觉运动-语言学习如何支持人类和人工代理中的可扩展组合泛化和异常处理。

英文摘要

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

2505.01318 2026-06-19 stat.ME

Modeling Large Nonstationary Spatial Data with the Full-Scale Basis Graphical Lasso

基于全尺度基图拉索的大规模非平稳空间数据建模

Matthew LeDuc, William Kleiber, Tomoko Matsuo

AI总结 本文提出一种结合隐含低秩过程与稀疏协方差模型的新方法,用于建模大规模非平稳空间数据:对低秩组件系数采用灵活的图高斯马尔可夫随机场模型,并结合全尺度近似与基图拉索,形成全尺度基图拉索方法(FSBGL);估计采用图拉索惩罚似然,通过凸差(DC)方案优化。在合成场和热层高分辨率模拟数据集上的验证表明,与现有空间模型相比,即使训练数据有限,该方法也能更好地捕捉热层温度场的显著特征。

详情
AI中文摘要

我们提出了一种新的方法,用于对大规模非平稳空间过程数据集进行建模,该方法结合了隐含的低秩过程和稀疏协方差模型。低秩组件的系数被赋予了灵活的图高斯马尔可夫随机场模型。所采用的低秩与紧支撑协方差结构结合了全尺度近似和基图拉索;我们称这种新方法为全尺度基图拉索(FSBGL)。估计采用图拉索惩罚似然,并通过凸差(difference-of-convex)方案进行优化。我们在合成场以及具有挑战性的高分辨率热层模拟数据集上展示了所提出的方法。在与最先进空间模型的比较中,即使在可用训练数据有限的情况下,FSBGL在捕捉热层温度场的显著特征方面也表现更好。

英文摘要

We propose a new approach for modeling large datasets of nonstationary spatial processes that combines a latent low-rank process and a sparse covariance model. The low-rank component coefficients are endowed with a flexible graphical Gaussian Markov random field model. The use of a low-rank and compactly supported covariance structure combines the full-scale approximation and the basis graphical lasso; we term this new approach the full-scale basis graphical lasso (FSBGL). Estimation employs a graphical lasso-penalized likelihood, which is optimized using a difference-of-convex scheme. We illustrate the proposed approach on synthetic fields as well as with a challenging high-resolution simulation dataset of the thermosphere. In a comparison against state-of-the-art spatial models, the FSBGL performs better at capturing salient features of the thermospheric temperature fields, even with limited available training data.
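One ingredient named above is a compactly supported covariance. A standard way to build one is tapering: multiply a covariance entrywise by a compactly supported positive-definite function. The product stays positive definite and becomes sparse. The sketch below assumes an exponential covariance, a Wendland taper, 1-D locations and the stated ranges. It does not show FSBGL's low-rank basis component or graphical-lasso estimation.

```python
import math


def wendland(r):
    """Wendland function (1 - r)_+^4 (4r + 1): positive definite in up to 3 dimensions,
    and exactly zero for r >= 1 (compact support)."""
    return (1.0 - r) ** 4 * (4.0 * r + 1.0) if r < 1.0 else 0.0


def tapered_covariance(locs, sill=1.0, scale=5.0, taper_range=2.5):
    """Exponential covariance multiplied entrywise by a Wendland taper (a sparse matrix)."""
    n = len(locs)
    return [
        [
            sill * math.exp(-abs(locs[i] - locs[j]) / scale)
            * wendland(abs(locs[i] - locs[j]) / taper_range)
            for j in range(n)
        ]
        for i in range(n)
    ]


locs = [float(i) for i in range(10)]   # hypothetical 1-D grid with unit spacing
C = tapered_covariance(locs)
```

With a taper range of 2.5 on a unit grid, only pairs at distance 0, 1 or 2 are nonzero: 44 of the 100 entries. This sparsity is what makes compactly supported components cheap for large datasets.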

2408.15920 2026-06-19 math.ST math.PR stat.TH

Nonlinear Filtering and Spatial Asymptotic Consistency for SPDEs Observed via Spatio-Temporal Point Processes

Jan Szalankiewicz, Cristina Martinez-Torres, Wilhelm Stannat

Comments Fixed several typos throughout the manuscript, substantially revised Section 4 with improved theoretical bounds, and updated simulations with corresponding code base improvements

Journal ref Stoch PDE: Anal Comp (2026)

详情
英文摘要

In this paper, we develop the mathematical framework for filtering problems arising from biophysical applications where data is collected from confocal laser scanning microscopy recordings of the space-time evolution of intracellular wave dynamics of biophysical quantities. In these applications, signals are described by stochastic partial differential equations (SPDEs) and observations can be modelled as functionals of marked point processes whose intensities depend on the underlying signal. We derive both the unnormalized and normalized filtering equations for these systems, and demonstrate asymptotic consistency and approximation results for finite-dimensional observation schemes and for partial observations, respectively. Our theoretical results are validated through extensive simulations using synthetic and real data. These findings contribute to a deeper understanding of filtering with point process observations and provide a robust framework for future research in this area.
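Point-process observations whose intensity depends on a latent signal can be simulated by Lewis-Shedler thinning. This is useful for generating synthetic data of the kind used to validate such filters. The sketch below is time-only. It assumes a hypothetical intensity driven by the signal u(t) = sin(t) and omits marks, spatial structure and the filter itself.

```python
import math
import random


def thinning(intensity, t_max, lam_max, rng):
    """Lewis-Shedler thinning: event times on [0, t_max] of a Poisson process with
    rate intensity(t) <= lam_max."""
    events, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)              # candidate from a homogeneous process
        if t > t_max:
            return events
        if rng.random() < intensity(t) / lam_max:  # keep with probability lambda(t)/lam_max
            events.append(t)


def signal_driven_intensity(t):
    """Hypothetical intensity driven by a latent signal u(t) = sin(t)."""
    return 5.0 + 4.0 * math.sin(t)


rng = random.Random(0)
runs = [thinning(signal_driven_intensity, 2 * math.pi, 9.0, rng) for _ in range(2000)]
```

The expected number of events is the integral of the intensity, here 10π ≈ 31.4. The bound lam_max = 9 must dominate the intensity everywhere for thinning to be exact.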

2307.06655 2026-06-19 stat.ME

Stochastic Reaction-Diffusion Systems in Biophysics: Towards a Toolbox for Quantitative Model Evaluation

Gregor Pasemann, Carsten Beta, Wilhelm Stannat

Journal ref In: Stich, M., Carballido-Landeira, J. (eds) Nonlinear Dynamics for Biological Systems. SEMA SIMAI Springer Series, vol 40, 2025, Springer, Cham

详情
英文摘要

We develop a statistical toolbox for the quantitative evaluation of stochastic reaction-diffusion systems that model the space-time evolution of biophysical quantities at the intracellular level. Starting from space-time data $X_N(t,x)$, such as those provided by fluorescence microscopy recordings, we discuss basic modelling principles for the conditional mean trend and fluctuations in the class of stochastic reaction-diffusion systems, and subsequently develop statistical inference methods for parameter estimation. With a view towards application to real data, we discuss estimation errors and confidence intervals, in particular as a function of the spatial resolution of the measurements, and investigate the impact of misspecified reaction terms and noise coefficients. We also briefly touch on implementation issues of the statistical estimators. As a proof of concept, we apply our toolbox to statistical inference on intracellular actin concentration in the social amoeba Dictyostelium discoideum.

1812.05678 2026-06-19 stat.ME

Objective-Driven Ensembles: Bridging the Gap Between Interpretable Sparsity and Algorithmic Prediction

目标驱动集成:弥合可解释稀疏性与算法预测之间的差距

Anthony Christidis, Stefan Van Aelst, Ruben Zamar

AI总结 本文提出目标驱动集成方法,通过将最优子集选择推广为联合数学优化问题,生成可解释的集成模型,并理论证明惩罚预测变量重叠可限制预测协方差、减轻有限样本虚假相关的影响,实现机器学习级精度与稀疏模型可解释性的兼顾。

详情
AI中文摘要

稀疏方法(如最优子集选择、弹性网)是获得可解释模型的标准方法,但可能遭受高方差和易受虚假相关影响的问题。另一方面,算法集成(如随机森林、梯度提升)实现了高预测精度,但产生了由随机化或顺序残差拟合驱动的不可解释黑箱。近年来,一种统一的范式出现了:目标驱动集成。通过将最优子集选择推广为联合数学优化问题,该方法通过将预测变量最优地分配到少量不同模型中来生成可解释的集成。在本文中,我们综合了这一日益增长的文献,并阐明了驱动其经验成功的统计原理。具体来说,我们利用有限样本界说明惩罚预测变量重叠如何控制集成协方差,并在数学上对冲虚假相关的影响。我们使用精确的组合预言机评估了这些机制,并回顾了最近的计算近似如何成功地将这一框架扩展到各种领域,包括高维数据、分类任务以及存在逐案例或逐单元污染的场景,实现了机器学习级别的精度,同时保留了稀疏模型的可解释性。

英文摘要

Sparse methods (e.g., Best Subset Selection, Elastic Net) are the standard approach for obtaining interpretable models, but they can suffer from high variance and vulnerability to spurious correlations. Alternatively, algorithmic ensembles (e.g., Random Forests, Gradient Boosting) achieve high prediction accuracy but yield uninterpretable black boxes driven by randomization or sequential residual fitting. In recent years, a unifying paradigm has emerged: Objective-Driven Ensembles. By generalizing best subset selection into a joint mathematical optimization problem, this approach generates interpretable ensembles by optimally splitting predictors across a small number of diverse models. In this paper, we synthesize this growing body of literature and illustrate the statistical principles driving its empirical success. Specifically, we utilize finite-sample bounds to demonstrate how penalizing predictor overlap controls ensemble covariance and provides a mathematical hedge against spurious correlations. We evaluate these mechanics using an exact combinatorial oracle, and review how recent computational approximations have successfully scaled this framework to a variety of domains, including high-dimensional data, classification tasks, and settings with casewise or cellwise contamination, achieving machine-learning-level accuracy while retaining the interpretability of sparse models.
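A toy version of the objective can be searched exhaustively, in the spirit of an exact combinatorial oracle. Consider two least-squares models of fixed size, scored by the sum of their residual sums of squares plus λ times the number of shared predictors. This simplified objective, the no-intercept fits, the two-model restriction, the toy data and all names are assumptions; the literature's objectives may count overlap differently.

```python
import itertools
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rss(X, y, subset):
    """Residual sum of squares of least squares (no intercept) on the given columns."""
    cols = list(subset)
    A = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in cols]
    beta = solve(A, b)
    return sum((t - sum(bk * row[c] for bk, c in zip(beta, cols))) ** 2 for row, t in zip(X, y))


def best_two_model_split(X, y, size, lam):
    """Exhaustive search over two models of `size` predictors, penalizing shared predictors.

    Returns (objective, subset_1, subset_2, overlap).
    """
    subsets = list(itertools.combinations(range(len(X[0])), size))
    cache = {s: rss(X, y, s) for s in subsets}
    best = None
    for s1, s2 in itertools.combinations_with_replacement(subsets, 2):
        overlap = len(set(s1) & set(s2))
        obj = cache[s1] + cache[s2] + lam * overlap
        if best is None or obj < best[0]:
            best = (obj, s1, s2, overlap)
    return best


# Hypothetical data: four useful predictors, two models with two predictors each.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [sum(r) + rng.gauss(0, 0.5) for r in X]
```

With λ = 0 both models collapse onto the same best subset (overlap 2), which is plain best-subset selection duplicated. A large λ forces disjoint models, giving the diversity that the overlap penalty is meant to buy.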

1909.03488 2026-06-19 math.AT cs.CG math.PR math.ST stat.TH

Probabilistic Convergence and Stability of Random Mapper Graphs

Adam Brown, Omer Bobrowski, Elizabeth Munch, Bei Wang

详情
英文摘要

We study the probabilistic convergence between the mapper graph and the Reeb graph of a topological space $\mathbb{X}$ equipped with a continuous function $f: \mathbb{X} \rightarrow \mathbb{R}$. We first give a categorification of the mapper graph and the Reeb graph by interpreting them in terms of cosheaves and stratified covers of the real line $\mathbb{R}$. We then introduce a variant of the classic mapper graph of Singh et al. (2007), referred to as the enhanced mapper graph, and demonstrate that such a construction approximates the Reeb graph of $(\mathbb{X}, f)$ when it is applied to points randomly sampled from a probability density function concentrated on $(\mathbb{X}, f)$. Our techniques are based on the interleaving distance of constructible cosheaves and topological estimation via kernel density estimates. Following Munch and Wang (2018), we first show that the mapper graph of $(\mathbb{X}, f)$, a constructible $\mathbb{R}$-space (with a fixed open cover), approximates the Reeb graph of the same space. We then construct an isomorphism between the mapper of $(\mathbb{X},f)$ and the mapper of a super-level set of a probability density function concentrated on $(\mathbb{X}, f)$. Finally, building on the approach of Bobrowski et al. (2017), we show that, with high probability, we can recover the mapper of the super-level set given a sufficiently large sample. Our work is the first to consider the mapper construction using the theory of cosheaves in a probabilistic setting. It is part of an ongoing effort to combine sheaf theory, probability, and statistics to support topological data analysis with random data.
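The classic mapper construction works as follows. Cover the range of f with overlapping intervals, cluster each preimage, make one node per cluster, and join clusters that share points. The sketch below is the classic version, not the enhanced mapper graph from the abstract. The interval cover, the ε-single-linkage clustering and all parameters are hypothetical. On a sampled circle with the height function, the Reeb graph is a single loop, so the mapper graph should have first Betti number 1.

```python
import math


def components(indices, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbourhood graph."""
    remaining, clusters = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return clusters


def mapper_graph(points, f, n_intervals=5, overlap=0.25, eps=0.2):
    """Nodes = clusters of each interval preimage; edges = clusters sharing a point."""
    values = [f(p) for p in points]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step - overlap * step
        b = lo + (k + 1) * step + overlap * step
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(components(members, points, eps))
    edges = {(u, v) for u in range(len(nodes)) for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]}
    return nodes, edges


def first_betti(n_nodes, edges):
    """Number of independent cycles of a graph: E - V + (number of connected components)."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for u, v in edges:
        parent[find(u)] = find(v)
    n_components = len({find(i) for i in range(n_nodes)})
    return len(edges) - n_nodes + n_components


circle = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60)) for k in range(60)]
nodes, edges = mapper_graph(circle, lambda p: p[0])
```

The two end intervals each give one arc, and the three middle intervals each split into an upper and a lower arc. The result is 8 nodes and 8 edges forming one cycle, matching the circle's Reeb graph.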

1406.0214 2026-06-19 eess.SY cs.SY math.AT stat.ML

Topological and Statistical Behavior Classifiers for Tracking Applications

用于跟踪应用的拓扑与统计行为分类器

Paul Bendich, Sang Chin, Jesse Clarke, Jonathan deSena, John Harer, Elizabeth Munch, Andrew Newman, David Porter, David Rouse, Nate Strawn, Adam Watkins

AI总结 本文提出基于多假设跟踪、拓扑数据分析和机器学习的统一理论,通过拓扑特征编码行为信息,利用统计模型拟合拓扑特征分布,并结合目标类型分类方法提升跟踪性能。

详情
AI中文摘要

我们介绍了首个基于多假设跟踪、拓扑数据分析和机器学习的目标跟踪统一理论。我们的创新包括:1)利用鲁棒的拓扑特征编码行为信息;2)对这些拓扑特征的分布拟合统计模型;3)采用Wigren和Bar-Shalom等人的目标类型分类方法,在跟踪过程中利用所得的拓扑特征似然值。为证明我们方法的有效性,我们在由Simulation of Urban Mobility软件包生成的合成车辆数据上进行了测试。

英文摘要

We introduce the first unified theory for target tracking using Multiple Hypothesis Tracking, Topological Data Analysis, and machine learning. Our innovations are as follows: 1) robust topological features are used to encode behavioral information; 2) statistical models are fitted to distributions over these topological features; and 3) the target type classification methods of Wigren and Bar-Shalom et al. are employed to exploit the resulting likelihoods for topological features inside the tracking procedure. To demonstrate the efficacy of our approach, we test our procedure on synthetic vehicular data generated by the Simulation of Urban Mobility package.
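A common example of a robust topological feature is 0-dimensional persistence of a 1-D signal, computed along its sublevel-set filtration. Each local minimum is born at its value and dies when it merges into an older component (the elder rule). The sketch below uses union-find. Applying it to a track-derived signal such as speed is an assumption; the paper's actual features may differ, and the function name is hypothetical.

```python
def sublevel_persistence(values):
    """0-dimensional persistence pairs (birth, death) of a 1-D signal's sublevel-set filtration."""
    parent, birth, pairs = {}, {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in sorted(range(len(values)), key=lambda k: values[k]):
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if not roots:
            parent[i], birth[i] = i, values[i]      # a new component is born at a local minimum
            continue
        roots = sorted(roots, key=lambda r: birth[r])
        elder = roots[0]
        parent[i] = elder
        for r in roots[1:]:                         # elder rule: the younger component dies
            pairs.append((birth[r], values[i]))
            parent[r] = elder
    survivors = {find(i) for i in parent}
    pairs += [(birth[r], float("inf")) for r in survivors]
    return sorted(pairs)
```

For example, the signal [0, 3, 1, 4, 2] yields the pairs (0, ∞), (1, 3) and (2, 4). Small perturbations of the signal move these pairs only slightly, which is the robustness that makes them usable as behavioral features.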