arXivDaily: arXiv daily academic digest, updated Monday through Friday
STAT Statistics 89
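2606.07466 2026-06-08 stat.ME New submission

Covariance-Adaptive Residualization and Stagewise Calibration for Dependent Multiple Testing

Prasenjit Ghosh, Arijit Chakrabarti

AI summary: For simultaneous testing of multivariate Gaussian means under arbitrary covariance dependence, the paper proposes a stagewise calibration procedure that combines covariance-adaptive residualization with generalized step-down critical constants, reducing computational complexity while achieving better signal recovery and error control.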

Abstract

In this paper, we study simultaneous hypothesis testing for multivariate Gaussian means under arbitrary covariance dependence. Building on the Maximum Residual Down (MRD) procedure of Cohen et al. (2009), we investigate a new calibration strategy based on the generalized step-down critical constants of Gavrilov et al. (2009). The resulting procedure retains the covariance-adaptive residualization mechanism of MRD while replacing the original model-dependent threshold specification with a simple stagewise calibration rule. Since the proposed procedure belongs to the class of monotone residual-based step-down procedures studied by Ghosh and Chakrabarti (2026), its admissibility follows directly from their theory. We also derive alternative representations of the MRD residual statistics that express all active residuals through a single active precision matrix, substantially reducing computational complexity. Simulation studies across a broad range of dependence structures show that the proposed methodology often achieves a lower normalized misclassification risk than several widely used marginal testing procedures. Under several structured dependence models, the procedure also exhibits strong signal-recovery behavior, attaining false discovery rates near the nominal level, extremely small false non-discovery rates, powers approaching one, and average numbers of rejections close to the expected number of true signals. These findings provide empirical evidence that covariance-adaptive residualization and stagewise calibration may interact in a highly favorable manner for dependent multiple testing.
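
Illustrative sketch (not from the paper): the abstract does not spell out the critical constants, so the code below assumes the form commonly associated with the adaptive step-down procedure of Gavrilov et al. (2009), alpha_i = i*q / (m + 1 - i*(1 - q)). It applies them to ordinary p-values, whereas the proposed procedure works with MRD residual statistics. The function names are hypothetical.

```python
def gbs_critical_constants(m, q):
    """Assumed step-down constants alpha_i = i*q / (m + 1 - i*(1 - q)), i = 1..m."""
    return [i * q / (m + 1 - i * (1 - q)) for i in range(1, m + 1)]


def step_down_rejections(p_values, q=0.05):
    """Return the indices rejected by a step-down rule on plain p-values."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    alphas = gbs_critical_constants(m, q)
    rejected = []
    for rank, idx in enumerate(order):
        if p_values[idx] <= alphas[rank]:
            rejected.append(idx)
        else:
            # Step-down: stop at the first p-value above its constant.
            break
    return rejected
```

A step-down rule stops at the first ordered p-value that exceeds its constant, so a large early p-value blocks all later rejections.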

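2606.07447 2026-06-08 stat.ME math.ST stat.TH New submission

Community Detection on a Randomly Growing Network

Jianxiang Wang, Min Xu

AI summary: For Markovian random networks outside the stochastic block model, the paper proposes a two-stage algorithm that first classifies high-degree nodes and then extends community labels to the remaining nodes; it proves that no algorithm can consistently recover all node labels, but that the labels of central subsets of nodes can be recovered.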

Comments
69 pages, 16 figures, 7 tables
Abstract

We study community detection on Markovian random networks outside of the Stochastic Block Model (SBM) framework. Specifically, we consider a random network growth process which generates $K$ separate preferential attachment trees and connects them with Erdős--Rényi edges, so that each tree represents a community and each node inherits the label of the tree to which it belongs. This model is able to produce many features of real world networks that are improbable under SBM, such as power law degree distribution and the existence of chains and hubs. Given only the final graph, without any knowledge of the growth process, we seek to recover the unobserved community membership of the nodes. We first prove that it is impossible for any algorithm to consistently recover the community label of all the nodes. However, we design algorithms which are provably able to recover the community labels of subsets of central nodes, for several different notions of node centrality such as arrival time or degree. Our procedure consists of two stages where, in the first stage, we classify high degree nodes and then, in the second stage, extend the community assignments to the remaining vertices. Numerical experiments and a real data application on a coauthorship network demonstrate the effectiveness of our proposed approach.
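
Illustrative sketch (not from the paper): a simplified simulation of the kind of network described above, with $K$ preferential attachment trees joined by Erdős-Rényi edges. Several details are assumptions: the trees are grown one after another rather than as one interleaved growth process, all trees have equal size, and the random edges are drawn only between nodes of different trees. The names `grow_pa_tree`, `simulate_network`, and `p_between` are hypothetical.

```python
import random


def grow_pa_tree(n_nodes, rng, offset=0):
    """Grow one preferential attachment tree and return its edge list."""
    edges = []
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from it picks an existing node with probability proportional to degree.
    endpoints = []
    for new in range(1, n_nodes):
        target = rng.choice(endpoints) if endpoints else 0
        edges.append((offset + target, offset + new))
        endpoints.extend([target, new])
    return edges


def simulate_network(k_trees, n_per_tree, p_between, seed=0):
    """Return (labels, edges, number_of_tree_edges) for the combined graph."""
    rng = random.Random(seed)
    labels, edges = [], []
    for k in range(k_trees):
        edges.extend(grow_pa_tree(n_per_tree, rng, offset=k * n_per_tree))
        labels.extend([k] * n_per_tree)
    tree_edge_count = len(edges)
    n = k_trees * n_per_tree
    # Assumed: independent random edges between nodes of different trees only.
    for u in range(n):
        for v in range(u + 1, n):
            if labels[u] != labels[v] and rng.random() < p_between:
                edges.append((u, v))
    return labels, edges, tree_edge_count
```

The community detection task is then to recover `labels` from `edges` alone.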

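2606.07406 2026-06-08 stat.ME New submission

Deriving the Variance-Minimizing Design for Standard Addition via c-Optimality

Gerhard Gössler, Vera Hofer, Walter Goessler

AI summary: Using c-optimality theory, the paper shows that for a linear response with non-decreasing measurement error the optimal design for standard addition is a two-point design, and it examines the effects of measurement allocation, concentration range, and weighted regression.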

Abstract

Knowledge about optimal designs for standard addition seems to be scattered across the literature and is, at least in part, available only in mathematical literature that is not readily accessible to readers unfamiliar with design optimality theory. Therefore, the idea for this work was to summarize what is already available in the analytical literature and to apply the relevant results from optimality theory, where needed, to the special case of standard addition. It is shown that, for non-decreasing measurement errors (e.g., errors that are constant or increase linearly or quadratically with increasing analyte concentration), the optimal design in the case of a linear response is a two-point design, irrespective of the particular behavior of the measurement error variance. In addition, it is demonstrated that the optimal allocation of measurements depends on the concrete setting, which means that the optimal distribution of measurements may deviate significantly from a 50:50 ratio. It is also investigated how the range, i.e., the largest added concentration, influences the result. Finally, the question of applying weighted regression is discussed, and it is shown that, in contrast to designs using more than two spiked concentrations, no weighting is necessary to achieve optimal results when a two-point design is used. While the focus lies on the precision of the concentration estimate, the implications for the bias are also investigated.
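
Illustrative sketch (not from the paper): in standard addition the unspiked concentration is estimated as intercept divided by slope of the fitted line. The code below uses made-up data at two added concentrations and shows that, for the point estimate, weighting does not change the result when the weights are constant within each concentration level. The data, the weights, and the function names are assumptions for illustration only.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def standard_addition_estimate(x, y, w=None):
    """Estimate the unspiked concentration as intercept / slope."""
    if w is None:
        w = [1.0] * len(x)
    intercept, slope = weighted_line_fit(x, y, w)
    return intercept / slope


# Hypothetical two-point design: three replicates at 0 and three at 4 units added.
added = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
signal = [1.0, 1.2, 0.8, 3.1, 2.9, 3.0]

unweighted = standard_addition_estimate(added, signal)
# Down-weight the spiked level, as one might if its variance were larger.
weights = [1.0 if a == 0.0 else 0.1 for a in added]
weighted = standard_addition_estimate(added, signal, weights)
```

With only two distinct concentrations the fitted line passes through the two level means whatever the level weights are, so `weighted` and `unweighted` agree up to rounding.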

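2606.07399 2026-06-08 stat.ML cs.LG New submission

Automatic, Debiased, and Invariant Counterfactual Generation under General Interventions

Raphael C Kim, Jingsen Zhu, Ramin Zabih, Michele Santacatterina

AI summary: Proposes the ADIGen framework, which combines Riesz regression, causal invariance, and orthogonal statistical learning to achieve automatic, debiased, and invariant counterfactual generation under general interventions, and provides excess-risk bounds.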

Abstract

Generative models for counterfactual outcomes have great potential to support decision-making under complex interventions, but existing approaches are limited by unstable estimation, poor generalization across environments, and bias from nuisance model misspecification. We introduce ADIGen, a framework for automatic, debiased, and invariant counterfactual generation under general interventions, including high-dimensional interventions and outcomes. ADIGen combines Riesz regression to avoid unstable density-ratio estimation, causal invariance to improve generalization under distribution shift, and orthogonal statistical learning to obtain doubly robust guarantees against nuisance model misspecification. We provide excess-risk bounds showing that ADIGen controls counterfactual risk under general interventions, with a product-bias nuisance remainder and an invariant risk bound across environments.

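2606.07373 2026-06-08 stat.ME New submission

Learning Collapsed Patterns in Compositional Data: A Bayesian Heterogeneous Relative-Shift Approach

Maoran Xu, Guanyu Hu

AI summary: Proposes a Bayesian heterogeneous relative-shift regression model that jointly learns latent clusters and parsimonious effect structures through a projection-based shrinkage prior and a mixture of finite mixtures prior, and develops a hybrid MCMC algorithm that embeds a deterministic surrogate collapse operator.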

Abstract

Relative-shift regression provides a principled framework for modeling compositional covariates by quantifying how the response changes when mass is reallocated from one component to another. Yet many emerging compositional data problems extend beyond this classical setting, involving high-dimensional predictors and regression effects that vary across latent subpopulations. This complexity poses a dual challenge unmet by existing methods: recovering latent cluster structure while simultaneously achieving dimension reduction within each cluster. We propose a Bayesian heterogeneous relative-shift regression model that jointly learns latent clusters and parsimonious effect structures. Methodologically, we combine a projection-based shrinkage prior on identifiable contrasts, which induces exact coefficient ties within mixture components, with a mixture of finite mixtures prior that infers the number of clusters. Computationally, we develop a scalable hybrid MCMC algorithm that embeds a deterministic surrogate collapse operator within NUTS. Theoretically, we establish posterior consistency for both the latent partition and cluster-specific effect structures. Simulations confirm accurate recovery and strong predictive performance, and applications to cross-country macroeconomic data and spatial transcriptomics demonstrate the method's interpretability and practical utility.

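2606.07364 2026-06-08 stat.AP New submission

S2A3: Thompson Sampling and Stochastic Exposure Control for High-Stakes CATs

James Sharpnack, Alexander Tsigler, J. R. Lockwood, Steven Nydick, Alina A. von Davier

AI summary: Proposes the S2A3 framework, which uses Thompson sampling to improve item selection, soft scoring to handle parameter uncertainty, and stochastic exposure control to balance efficiency against security, achieving rapid item calibration while preserving scoring reliability in high-stakes adaptive testing.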

Abstract

High-stakes computerized adaptive tests (CATs) require a continuous supply of calibrated items, yet traditional item piloting is slow, expensive, and operationally hazardous. We introduce the S2A3 framework -- Soft Scoring (S2) and Adaptive Adaptive Administration (A3) -- which unifies item calibration and test administration into a single online process. Thompson sampling enhances item selection by drawing provisional parameters from each item's posterior distribution and selecting the item maximizing expected Fisher information, naturally routing uncertain items to informative test-takers while maintaining measurement precision. Soft scoring integrates over parameter uncertainty so that incompletely calibrated items exert appropriately attenuated influence on ability estimates. A stochastic variant of Sympson-Hetter exposure control balances measurement efficiency against bank security via a tunable temperature parameter and item-specific weights. We validate S2A3 on Yes/No Vocabulary and Vocabulary-in-Context tasks from the Duolingo English Test, demonstrating rapid item calibration and preserved scoring reliability even when cold-start items constitute a significant fraction of the active pool.
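
Illustrative sketch (not from the paper): one way Thompson-sampling item selection could look for a two-parameter logistic (2PL) item response model. The 2PL model, the independent normal posteriors for discrimination `a` and difficulty `b`, the evaluation at a point estimate of ability rather than an expectation over it, and all names are assumptions; the abstract does not specify them.

```python
import math
import random


def fisher_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def thompson_select(theta_hat, item_posteriors, rng):
    """Draw provisional (a, b) per item and return the most informative index."""
    best_index, best_info = None, -1.0
    for index, post in enumerate(item_posteriors):
        # Provisional parameters sampled from the (assumed normal) posterior;
        # the discrimination is floored to keep it positive.
        a = max(rng.gauss(post["a_mean"], post["a_sd"]), 0.05)
        b = rng.gauss(post["b_mean"], post["b_sd"])
        info = fisher_information_2pl(theta_hat, a, b)
        if info > best_info:
            best_index, best_info = index, info
    return best_index


rng = random.Random(0)
bank = [
    {"a_mean": 1.0, "a_sd": 0.01, "b_mean": 0.0, "b_sd": 0.01},  # well calibrated
    {"a_mean": 1.0, "a_sd": 0.50, "b_mean": 0.0, "b_sd": 1.00},  # cold start
]
picks = [thompson_select(0.0, bank, rng) for _ in range(300)]
```

Because the cold-start item has a wide posterior, its sampled parameters sometimes look more informative than the calibrated item, so it is administered part of the time and gathers calibration data.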

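2606.07213 2026-06-08 stat.ME math.ST stat.ML stat.TH New submission

Principal Component Analysis for Multivariate Extremes

Dan Cooley, Anne Sabourin, Troy Wixson

AI summary: Presents a dimension-reduction approach for multivariate extreme-value data that uses principal component analysis to preserve information relevant to extremes, addressing the curse of dimensionality in high-dimensional extreme-value analysis.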

Comments
Chapter 11 in "Handbook of Statistics of Extremes", edited by Miguel de Carvalho, Raphaël Huser, Philippe Naveau, and Brian Reich
Abstract

This chapter explores ways to reduce the dimensionality of the data while preserving key information relevant to the analysis of multivariate extreme values.

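2606.07174 2026-06-08 stat.ME New submission

One-step Outcome Imputation: An Alternative to Multiple Imputation

Andreas Nordland, Klaus K. Holst, David Redek, Christian B. Pipper, Aske T. Iversen

AI summary: For missing outcomes in randomized controlled trials, proposes an influence-function-based one-step estimator that avoids the failure of Rubin's rules to give valid standard errors under multiple imputation and reduces the computational burden.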

Abstract

Missing outcomes in randomized controlled trials are often handled by multiple imputation (MI). Rubin's rules are routinely used to estimate standard errors but can fail to provide valid standard error estimates for some commonly used procedures, such as reference-based imputation. We propose a one-step alternative by explicitly targeting the treatment effect implied by a given imputation model and constructing an efficient one-step estimator for that treatment effect via its influence function. Unlike Rubin's rules, this approach yields asymptotically valid inference. Moreover, the proposed method circumvents the stochastic component and computational burden of MI. We illustrate the approach with examples spanning a range of imputation models, including reference-based imputation and intercurrent-event-dependent imputation.

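2606.07169 2026-06-08 stat.ME math.ST stat.TH New submission

When can a posterior predictive check identify the learning rate? Exact degeneracy in Gaussian models and implications for Generalised Bayesian Inference

Nam Anh Le

AI summary: Through an exact finite-sample analysis, the paper shows that in the Gaussian linear model the learning-rate selector based on a posterior predictive check is degenerate: the p-value does not depend on the learning rate or on the data, so the selector fails. It also proposes a data-free pre-screening diagnostic.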

Comments
6 pages, 4 figures
Abstract

Generalised Bayesian inference tempers the likelihood by a learning rate $\eta$ to mitigate model misspecification, and the choice of $\eta$ is consequential. Zafar and Nicholls (2024) proposed selecting $\eta$ by a posterior predictive check (PPC): one chooses the smallest $\eta$ at which a log-likelihood PPC $p$-value is not rejected. An exact, finite-sample analysis of this selector on the Gaussian linear model is given. With known variance and a flat prior, the PPC $p$-value equals $P(\chi^2_n > \mathrm{RSS}/\sigma_0^2)$ for every $\eta$, so the selector is $\eta$-invariant; under variance misspecification it is two-sided non-identifying. With unknown variance and the reference prior, the $p$-value depends only on $(n,d,\eta)$ and not on the realised data or the data-generating process. Consequently the selector's output is fixed before any data are seen, typically collapsing to the smallest grid value, which over-tempers and inflates predictive intervals relative to held-out selection. The phenomenon is a pivotality property specific to the Gaussian scale--location family and the reference prior; it disappears under informative priors. These results delineate the selector's scope, identify a canonical class on which it cannot identify the learning rate, and motivate a cheap, data-free pre-screening diagnostic.
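
Illustrative sketch (not from the paper): the known-variance result above says the PPC $p$-value is the upper tail probability of a chi-square distribution with $n$ degrees of freedom, evaluated at RSS divided by the known variance. The code evaluates that expression using the closed-form tail for even degrees of freedom only (a restriction chosen here to avoid external libraries). The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def ppc_p_value_known_variance(rss, sigma0_sq, n):
    """Tail probability P(chi2_n > RSS / sigma0^2) for even n."""
    return chi2_sf_even_df(rss / sigma0_sq, n)
```

The learning rate is not an argument of `ppc_p_value_known_variance`, which is the invariance the abstract describes: every value on a learning-rate grid would receive the same $p$-value.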

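2606.07062 2026-06-08 stat.CO cs.MS New submission

CATEKAPPA: An R Shiny Application for Design and Analysis of Consistency Tests Based on the Kappa Statistic for Categorical Responses

Zheng Gai, Li Xincheng, Jiang Wangyingjie, Zhao Panwei

AI summary: To address two obstacles in agreement testing for categorical data, namely sample size determination and kappa coefficient computation, the authors developed CATEKAPPA, an R Shiny application that integrates sample size planning with agreement analysis, supports Cohen's, Fleiss', and Light's kappa, and provides automatic interpretation.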

Comments
10 pages, 4 figures. The open-source R package CATEKAPPA is available on CRAN at https://CRAN.R-project.org/package=catekappa; the source code repository is hosted at https://github.com/satellite837/catekappa. Manuscript planned for submission to Journal of Statistical Software (JSS). Supplementary R package source code uploaded as ancillary file
Abstract

The kappa statistic is the most widely used measure of inter-rater agreement for categorical data. Despite its popularity, applied researchers often encounter two major hurdles: (i) determining the sample size required to achieve a desired level of agreement with given power, and (ii) computing appropriate kappa coefficients with proper interpretation. Existing R packages such as irr and kappaSize provide these functionalities but require programming skills and lack an integrated, user-friendly interface. We present CATEKAPPA, an R package that bridges this gap by combining sample size planning (via kappaSize) and agreement analysis (via irr) into a single Shiny-based web application. The package supports Cohen's kappa for two raters, Fleiss' kappa for three or more raters, and Light's kappa, and provides automatic interpretation using the Landis & Koch scale. Users can either launch an interactive graphical interface or use command-line functions for scripting. The package is freely available on CRAN.
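
Illustrative sketch (not the package's code or API): Cohen's kappa for two raters computed from scratch, with a label from the Landis & Koch scale. The cut-offs used are the commonly quoted ones (slight up to 0.20, fair up to 0.40, moderate up to 0.60, substantial up to 0.80, almost perfect above). The function names and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement from the product of the raters' marginal proportions.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the commonly quoted Landis & Koch wording."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings of 50 subjects by two raters.
rater_1 = ["yes"] * 25 + ["no"] * 25
rater_2 = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
kappa = cohen_kappa(rater_1, rater_2)
label = landis_koch_label(kappa)
```

Here the raters agree on 35 of 50 subjects (0.70) against a chance agreement of 0.50, which gives a kappa of about 0.40.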

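2606.07052 2026-06-08 stat.ME New submission

Influence of continuous predictor modelling methods on prediction stability in clinical prediction model development: an empirical comparison using real clinical data

Phichayut Phinyo, Pakpoom Wongyikul, Noraworn Jirattikanwong, Natthanaphop Isaradech, Wuttipat Kiratipaisarl, Suppachai Lawanaskol, Noppadon Seesuwan, Wachiranun Sirikul

AI summary: Using real clinical data, the study compares the effect of six approaches to modelling continuous variables (dichotomisation, tertile categorisation, linear terms, quadratic terms, multivariable fractional polynomials, and extreme gradient boosting) on prediction stability, finding that linear terms are more stable in smaller samples whereas more complex methods need larger samples.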

Comments
30 pages
Abstract

Background and objective: Prediction stability is increasingly recognised as important for reliable clinical prediction model development, but the effect of continuous predictor modelling choices is unclear. This study examined how approaches to modelling continuous predictors influence prediction stability. Methods: We used a real clinical dataset of 19,418 emergency department patients to create five sample size scenarios ranging from 437 to 8,739 patients. Six methods were compared: dichotomisation at the median (DIC), tertile categorisation (TER), linear terms (LIN), quadratic terms (QUA), multivariable fractional polynomials (MFP), and extreme gradient boosting (XGB). Prediction stability was evaluated using a bootstrap-based framework. Optimism-corrected AUC and calibration were estimated through internal validation. A method was considered stable when at least 90% of individual predictions had a mean absolute prediction error (MAPE) <=5%. Results: Stability increased with sample size and varied by method. At n = 437, no method met the stability criterion; LIN was the most stable, followed by DIC. At n = 874, DIC and LIN achieved stable predictions with similar calibration, although DIC had lower AUC. At n = 1,748, QUA achieved stability, whereas MFP and XGB did not. At n = 3,496 and n = 8,739, all methods achieved stability. LIN, QUA, MFP, and XGB generally had higher AUCs than DIC and TER, while XGB showed the highest AUC but persistent miscalibration. Conclusion: Continuous predictor modelling methods appeared to influence prediction stability. LIN achieved stable predictions from the base sample size onwards, whereas QUA, MFP, and XGB required larger samples. Although XGB showed high discrimination, calibration concerns persisted. These findings suggest that, in smaller datasets, simpler approaches, particularly LIN, may provide more stable predictions.
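
Illustrative sketch (not from the paper): the stability criterion stated above, coded under the assumption that each individual's MAPE is the mean absolute difference between the prediction from the original model and the predictions from models refitted on bootstrap samples. The function names and the input layout (one list of predictions per bootstrap model) are hypothetical.

```python
def individual_mape(original_preds, bootstrap_preds):
    """Per-individual mean absolute difference between the original prediction
    and the predictions from each bootstrap model."""
    n_boot = len(bootstrap_preds)
    return [
        sum(abs(boot[i] - p) for boot in bootstrap_preds) / n_boot
        for i, p in enumerate(original_preds)
    ]


def is_stable(original_preds, bootstrap_preds, mape_limit=0.05, share=0.90):
    """True when at least `share` of individuals have MAPE <= `mape_limit`."""
    mapes = individual_mape(original_preds, bootstrap_preds)
    within = sum(m <= mape_limit for m in mapes)
    return within / len(mapes) >= share
```

For example, with ten individuals the criterion tolerates one individual whose predictions move by more than five percentage points across bootstrap models, but not two.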

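2606.07016 2026-06-08 stat.AP cs.CV New submission

An Integrated Roadside Sensing and Communication Framework for Vulnerable Road User Safety at Signalized Intersections

Parvez Anowar

AI summary: Proposes a framework integrating multi-modal sensing, edge computing, V2X/P2X communication, and adaptive signal control. An analysis of 53,319 annotations from the public R-LiViT dataset finds that VRUs make up 49% of observations, that density differs strongly between day and night, that close-proximity events vary 10-fold, and that 83% of pedestrian bounding boxes are small, supporting multi-modal sensing and adaptive deployment.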

Comments
17 pages, 5 figures, 2 tables. Preprint
Abstract

Vulnerable road users (VRUs) account for approximately half of urban traffic deaths globally, with intersections concentrating a disproportionate share of these casualties. Recent reviews of sensing technology for VRU protection have cataloged dozens of single-sensor and dual-sensor deployments, yet none of the surveyed systems couples multi-modal sensing with edge-side near-miss analytics and bidirectional vehicle-to-everything (V2X) and pedestrian-to-everything (P2X) messaging in a single intersection cabinet. This paper presents an integrated framework for VRU protection at signalized intersections, combining LiDAR, radar, RGB camera, and thermal camera at the perception layer, edge-based prediction and surrogate-safety analytics at the computation layer, V2X and P2X messaging at the communication layer, and adaptive signal control at the actuation layer. The framework is grounded in an empirical case study using R-LiViT, the first publicly released roadside LiDAR-Visual-Thermal dataset, which provides 200 multi-modal sequences and 2,400 annotated RGB-T frames at three German intersections. Analysis of 53,319 detection annotations reveals that VRUs comprise approximately 49% of all road-user observations, that day-to-night density drops by 38% for pedestrians and 45% for vehicles while the night distribution shows a higher close-proximity share, that per-frame close-proximity event counts vary approximately 10-fold across the eight unique locations at three intersections, and that 83% of pedestrian bounding boxes are small in image space, indicating that VRUs are typically far from any single sensor. These findings support multi-modal sensing, edge-side analytics, and adaptive context-sensitive deployment rather than uniform single-sensor solutions.

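2606.07014 2026-06-08 stat.AP New submission

Networked Spatial Effects in European Electricity Price Forecasting

Sultan Mahmud Chomon, Florian Ziel

AI summary: Given the high degree of interconnection among European bidding zones, proposes the Networked Spatio-Temporal Model (NSTM), which uses a metric graph to map the spatial coverage of information. In high-resolution streaming forecasts across 39 bidding zones the model outperforms traditional island-based models, highlighting the key role of network structure in propagating information across markets.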

Abstract

As European bidding zones are highly interconnected by physical transmission lines, spatial influences propagate across neighboring nodes through a network. This is reflected in the day-ahead electricity prices across European bidding zones, as the auction algorithm also uses information beyond each bidding zone's geographic boundary. To capture how this interconnection affects the electricity prices in neighboring bidding zones, we use a metric graph to map the spatial coverage of information using a well-defined neighborhood measure. We propose the Networked Spatio-Temporal Model (NSTM), which maps irregular spatial nodes into an ordered network, enabling the systematic incorporation of neighborhood information. We implement the NSTM across 39 bidding zones covering the majority of European electricity markets in a high-resolution, streaming-forecasting setup. The model uses autoregressive, cross-hour, and seasonal effects, along with fuel and emission prices and day-ahead forecasts of fundamentals, as interconnected information to predict the day-ahead prices for each bidding zone. A Europe-wide study presented in this paper shows that the NSTM consistently outperforms traditional island-based pure local models. This paper provides a framework that demonstrates the critical role the networked structure plays in propagating information across interconnected markets and its far-reaching implications for day-ahead electricity price forecasting.

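2606.06961 2026-06-08 stat.ME New submission

Causal inference of Plackett-Burman designs in applications

Shuchen Chang, Zhi-ming Li

AI summary: Motivated by four applications of Plackett-Burman designs, proposes a potential-outcomes causal inference framework that defines causal effects under finite populations, gives Neymanian estimators with variance and covariance estimates, tests the sharp null hypothesis, and constructs Fisherian intervals.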

Abstract

Driven by four applications of Plackett-Burman (PB) designs, this paper proposes a causal inference framework based on potential outcomes. First, we define the causal effects of the PB designs under finite populations. The Neymanian estimator of causal effects is then obtained, including the estimated variance and covariance. Furthermore, we conduct a sharp null-hypothesis test and construct the Fisherian interval using an algorithm. Finally, the proposed methods are illustrated through these applications.
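
Illustrative sketch (not from the paper): a 12-run Plackett-Burman design built from the standard generator row by cyclic shifts plus a final all-minus run, with a simple difference-in-means estimate of each factor's main effect. The paper's finite-population estimator, its variance estimator, and its Fisherian interval algorithm are not reproduced here. The function names and the synthetic outcomes are assumptions.

```python
# Standard generator row for the 12-run Plackett-Burman design (+1 high, -1 low).
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12():
    """Return the 12 x 11 design: 11 cyclic shifts plus a run of all -1."""
    k = len(GENERATOR_12)
    rows = [[GENERATOR_12[(j - i) % k] for j in range(k)] for i in range(k)]
    rows.append([-1] * k)
    return rows


def difference_in_means(design, outcomes, factor):
    """Mean outcome at the high level minus mean outcome at the low level."""
    high = [y for row, y in zip(design, outcomes) if row[factor] == 1]
    low = [y for row, y in zip(design, outcomes) if row[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


design = plackett_burman_12()
# Synthetic outcomes: only factors 0 and 1 matter, with no noise.
outcomes = [10 + 2 * row[0] - 3 * row[1] for row in design]
effects = [difference_in_means(design, outcomes, j) for j in range(11)]
```

Because the design columns are balanced and mutually orthogonal, each factor's contrast is unaffected by the main effects of the other factors in this additive example.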

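2606.06957 2026-06-08 stat.ML cs.LG New submission

Deep Single-Index Fréchet Regression

Muqing Cui, Yidong Zhou, Su I Iao, Hans-Georg Müller

AI summary: Proposes the DeSI framework, which estimates a single-index direction with a deep neural network and performs Fréchet regression in the metric space, mitigating the curse of dimensionality while retaining interpretability. Convergence rates are established theoretically, and the method performs well on distributions, networks, and other data.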

Abstract

Predicting outputs that are located in non-Euclidean spaces, such as probability distributions, networks, and symmetric positive-definite matrices, is becoming increasingly important in modern data analysis, particularly when inputs are high-dimensional. We propose DeSI (Deep Single-Index Fréchet Regression), a semiparametric framework for regression with metric space-valued outputs and multivariate inputs that assumes a single-index structure for the conditional Fréchet mean. DeSI estimates an interpretable index direction, which quantifies the relative importance of inputs, using a deep neural network, and performs Fréchet regression along the resulting one-dimensional index in the target metric space. This structure mitigates the curse of dimensionality while retaining interpretability, which stands in contrast to standard deep neural networks. We establish theoretical guarantees for DeSI, including uniform approximation and convergence rates, and demonstrate its strong predictive performance through simulations on distributions, networks, and symmetric positive-definite matrices, as well as an application to compositional mood data from New Jersey.

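2606.06930 2026-06-08 stat.ME New submission

Testing Equality of Conditional Distributions via Generative Models

Hanjia Gao, Linjun Huang, Yun Yang, Xiaofeng Shao

AI summary: Proposes a generative-model-based test of whether two conditional distributions are equal. Cross-generation aligns covariates and avoids density-ratio estimation and high-dimensional smoothing; the paper develops an RKHS-based test statistic with bootstrap calibration and proves a double-robustness property.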

Comments
93 pages, 4 figures
Abstract

We study the problem of testing whether two conditional distributions are equal using generative models. The proposed method learns a conditional generator from each sample and uses it to create responses at covariate values observed in the other sample, allowing generated and observed responses to be compared directly. By aligning covariates through cross-generation, the approach avoids conditional density-ratio estimation and local smoothing over high-dimensional covariates. The population version of this construction yields a conditional discrepancy that characterizes equality of the two conditional distributions under suitable overlap conditions, while the sample version leads to a test statistic defined as the supremum of an RKHS-indexed empirical process with multiplier bootstrap calibration. A computationally efficient algorithm for evaluating the statistic and its bootstrap analogue is developed based on alternating maximization and the kernel trick. Theoretically, we derive the limiting distribution of the test statistic under both the null and alternative hypotheses, prove bootstrap validity and consistency of the resulting test, and show that the proposed procedure attains a double-robustness property with respect to conditional generator estimation errors. Simulations and real data applications suggest that the proposed method performs well for multivariate responses and high-dimensional covariates.

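2606.06855 2026-06-08 stat.ML cs.LG math.ST stat.TH New submission

Stability beyond Bounded Differences: Sharp Generalization Bounds under Finite $L_p$ Moments

Qianqian Lei, Soham Bonnerjee, Yuefeng Han, Wei Biao Wu

AI summary: For heavy-tailed or unbounded losses, proposes a stability framework that requires only a finite $L_p$ moment condition and derives sharp high-probability generalization bounds covering empirical risk minimization, transductive regression, and meta-learning.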

Abstract

While algorithmic stability is a central tool for understanding generalization of learning algorithms, existing high-probability guarantees typically rely on uniform boundedness or sub-Gaussian/sub-Weibull tail assumptions, which can be overly restrictive for modern settings with heavy-tailed or unbounded losses. We develop a stability-based framework that requires only a finite $L_p$ moment condition. Our first contribution is sharp concentration inequalities for functions of independent random variables under $L_p$ constraints, extending McDiarmid's bounded-differences techniques beyond the classical regime. Leveraging these results, we derive sharp high-probability generalization bounds across a range of learning paradigms, including empirical risk minimization, transductive regression, and meta-learning. These guarantees show that $L_p$ stability suffices for robust generalization even when boundedness fails, substantially weakening the standard assumptions in the stability literature.

2606.06814 2026-06-08 stat.ML cs.LG math.ST stat.AP stat.TH New submission

The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces

训练任务多样性对上下文学习的影响:基于低维子空间的视角

Soo Min Kwon, Alec S. Xu, Can Yaras, Dogyoon Song, Laura Balzano, Qing Qu

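AI summary: Using a low-rank Gaussian mixture model, the paper analyzes how training task diversity (defined by the number of non-overlapping columns between subspaces) improves the generalization and optimization of in-context learning with linear attention, explains why training diversity shortens the learning plateau and yields out-of-distribution generalization, and extends the findings to nonlinear settings.

Details
AI Chinese abstract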

Transformer执行上下文学习(ICL)的涌现能力引发了大量旨在理解其底层机制的研究。现有工作通常研究训练任务多样性(定义为ICL训练任务向量的数量或任务向量所来自的函数类数量)如何塑造ICL的学习动态和泛化能力。尽管这两种定义都揭示了许多有趣的现象,但后一定义下的许多观察结果在理论上仍未得到解释。本文提出了一个最小分析模型,在这些现象下,这些现象可以从训练数据的属性中可靠地涌现。通过将训练任务向量建模为低秩高斯的混合,我们展示了训练任务多样性(由参数化协方差矩阵的子空间之间的非重叠列数定义)如何改善线性注意力ICL的泛化和优化轨迹。特别地,我们表明我们的模型可以解释(i)为什么任务多样性训练缩短了ICL的平台期,以及(ii)为什么ICL似乎实现了分布外泛化。最后,我们通过实验证明了我们的结果如何扩展到非线性Transformer和非线性函数类。总体而言,我们的工作提出了一个可处理的框架来统一现有的观察结果。

English abstract

The transformer's emergent ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its underlying mechanisms. Existing works often study how training task diversity, defined either as the number of ICL training task vectors or as the number of function classes from which the task vectors are drawn, shapes both the learning dynamics and generalization capabilities of ICL. While both definitions have uncovered many interesting phenomena, many observations under the latter definition remain theoretically unexplained. This paper presents a minimal analytical model under which these phenomena provably emerge from the properties of the training data. By modeling the training task vectors as a mixture of low-rank Gaussians, we show how training task diversity, defined by the number of non-overlapping columns between subspaces that parameterize the covariance matrices, improves both the generalization and optimization trajectory of ICL with linear attention. In particular, we show that our model can explain (i) why training with task diversity shortens the ICL plateau and (ii) why ICL appears to achieve out-of-distribution generalization. We conclude by empirically demonstrating how our results extend to nonlinear transformers and nonlinear function classes. Overall, our work presents a tractable framework to unify existing observations.

2606.06785 2026-06-08 stat.ML cs.LG math.DS New submission

Empirical Transfer Operators and Finite-Sample Change Detection for Noisy Expanding Interval Maps

经验转移算子与含噪扩张区间映射的有限样本变化检测

Aparna Rajput

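AI summary: For one-dimensional noisy dynamical systems, the paper proposes a finite-sample change-detection method based on partition-based empirical transition matrices; it detects changes in the invariant density by comparing the $L_1$ distance between the stationary distributions of a sliding window and a baseline segment, and provides finite-sample bounds and a false-alarm guarantee.

Details
Comments
27 pages, 2 tables, 1 figure
AI Chinese abstract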

我们研究了一维含噪动力系统的有限样本变化检测,使用基于分区的经验近似来刻画平稳行为。给定区间值过程的观测,我们对状态空间进行划分,从观测到的分区元素之间的转移中估计一个有限转移矩阵,并应用一个小的Doeblin型正则化以确保唯一的平稳分布。从初始参考段,我们计算基线经验平稳分布\(\widehat{\pi}_{0,\rho}\)。对于每个后续滑动窗口,我们计算\(\widehat{\pi}_{t,\rho}\)并定义得分\[ S_t=\|\widehat{\pi}_{t,\rho}-\widehat{\pi}_{0,\rho}\|_1. \] \(S_t\)的大值表示相对于基线的平稳行为发生变化。该统计量检测不变密度或平稳定律的变化,但不检测转移动态的所有可能变化。在关于经验转移集中性、有限状态平稳分布稳定性、分区近似、正则化偏差和噪声稳定性的明确假设下,我们推导了经验平稳密度的有限样本界。该界将采样误差、正则化偏差、分区近似误差和噪声偏差分开。然后,我们得到了单窗口误报保证,以及当不变密度变化超过估计误差时的充分检测条件。我们在合成含噪beta映射变点实验中展示了该方法。

English abstract

We study finite-sample change detection for one-dimensional noisy dynamical systems using partition-based empirical approximations of stationary behaviour. Given observations from an interval-valued process, we partition the state space, estimate a finite transition matrix from observed transitions between partition elements, and apply a small Doeblin-type regularisation to ensure a unique stationary distribution. From an initial reference segment, we compute a baseline empirical stationary distribution \(\widehat{\pi}_{0,\rho}\). For each later sliding window, we compute \(\widehat{\pi}_{t,\rho}\) and define the score \[ S_t=\|\widehat{\pi}_{t,\rho}-\widehat{\pi}_{0,\rho}\|_1. \] Large values of \(S_t\) indicate a change in stationary behaviour relative to the baseline. The statistic detects changes in invariant density or stationary law, but not all possible changes in transition dynamics. Under explicit assumptions on empirical transition concentration, finite-state stationary distribution stability, partition approximation, regularisation bias, and noise stability, we derive a finite-sample bound for the empirical stationary density. The bound separates sampling error, regularisation bias, partition approximation error, and noise bias. We then obtain a single-window false-alarm guarantee and a sufficient detection condition when the invariant density changes by more than the estimation error. We illustrate the method on synthetic noisy beta-map change-point experiments.
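
The score described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the author's implementation: it assumes observations in $[0,1]$, an equal-width partition, uniform rows for unvisited cells, and a Doeblin-type regularisation that mixes the empirical matrix with the uniform kernel at weight `rho`. The function names and default values are hypothetical.

```python
import numpy as np

def stationary_distribution(x, n_bins=10, rho=0.05):
    """Regularised empirical stationary distribution of a series on [0, 1]."""
    # Map each observation to one of n_bins equal-width partition cells.
    cells = np.clip((np.asarray(x, dtype=float) * n_bins).astype(int), 0, n_bins - 1)
    # Count observed transitions between consecutive cells.
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (cells[:-1], cells[1:]), 1.0)
    # Row-normalise; cells never visited get a uniform row (an assumption).
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), 1.0 / n_bins)
    # Mixing with the uniform kernel makes every entry positive,
    # so the stationary distribution is unique.
    P_rho = (1.0 - rho) * P + rho / n_bins
    # Solve pi P = pi together with sum(pi) = 1 as a linear system.
    A = np.vstack([P_rho.T - np.eye(n_bins), np.ones(n_bins)])
    b = np.append(np.zeros(n_bins), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def change_scores(x, ref_len, window, step=1, **kwargs):
    """L1 distance between each sliding-window distribution and the baseline."""
    x = np.asarray(x, dtype=float)
    pi0 = stationary_distribution(x[:ref_len], **kwargs)
    starts = range(ref_len, len(x) - window + 1, step)
    return np.array([
        np.abs(stationary_distribution(x[s:s + window], **kwargs) - pi0).sum()
        for s in starts
    ])
```

A threshold on the returned scores would then flag a change; how that threshold should be chosen to obtain the false-alarm guarantee is part of the paper's analysis and is not reproduced here.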

2606.06772 2026-06-08 stat.ML cs.AI cs.LG New submission

Generalization in Deep Neural Networks: Minimax Rates for Gradient Methods

深度神经网络的泛化:梯度方法的极小化最优速率

Junyu Zhou, Puyu Wang, Yunwen Lei, Marius Kloft, Yiming Ying

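AI summary: The paper establishes a connection between the learning dynamics of over-parameterized deep neural networks and those of kernel methods, and proves that gradient descent and stochastic gradient descent attain minimax-optimal generalization error at sufficient width.

Details
Comments
37 pages
AI Chinese abstract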

理解过参数化神经网络的泛化性能已成为深度学习理论的核心课题。尽管近期进展,特别是神经正切核(NTK)机制下的工作,揭示了浅层架构的行为,但深度神经网络(DNN)的统计泛化性质,尤其是在回归任务中,仍远未得到充分理解。本文通过提供使用梯度方法训练的DNN的全面泛化分析,在弥合这一差距方面取得了重大进展。首先,我们首次建立了使用梯度方法训练的、具有光滑激活函数的DNN的学习动态与核方法的学习动态之间的关键联系,表明过参数化DNN上的梯度方法可以完全继承其核对应物的有利学习动态。基于这一联系以及核方法已确立的最优性,我们推导出了梯度下降(GD)和随机梯度下降(SGD)的过量总体风险的第一个已知极小化最优速率,假设网络宽度与样本大小成多项式比例。我们的结果表明,在足够宽度下,由GD或SGD训练的DNN可以实现与基于核的方法相当的泛化性能。

English abstract

Understanding the generalization performance of over-parameterized neural networks has become a central topic in deep learning theory. While recent advances, particularly works under the Neural Tangent Kernel (NTK) regime, have shed light on the behavior of shallow architectures, the statistical generalization properties of deep neural networks (DNNs), especially in regression tasks, remain far less understood. In this paper, we make significant progress toward closing this gap by providing a comprehensive generalization analysis of DNNs trained using gradient-based methods. First, we establish, for the first time, a crucial connection between the learning dynamics of a DNN with smooth activation functions trained via gradient-based methods and those of kernel methods, showing that gradient-based methods on over-parameterized DNNs can fully inherit the favorable learning dynamics of their kernel counterparts. Building on this connection and the well-established optimality of kernel methods, we derive the first known minimax-optimal rates for the excess population risk of both gradient descent (GD) and stochastic gradient descent (SGD), under the assumption that network width scales polynomially with the sample size. Our results demonstrate that, with sufficient width, DNNs trained by GD or SGD can achieve generalization performance comparable to kernel-based methods.

2606.06764 2026-06-08 stat.ML cs.AI cs.LG New submission

Optimal Rates for Generalization of Gradient Descent Methods with Deep Neural Networks

深度神经网络梯度下降方法的泛化最优速率

Junyu Zhou, Puyu Wang, Yunwen Lei, Yiming Ying, Ding-Xuan Zhou

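AI summary: For deep ReLU networks in the neural tangent kernel (NTK) regime, the paper establishes the first minimax-optimal generalization error rates for gradient descent (GD) and stochastic gradient descent (SGD), proving that sufficiently wide networks attain the optimal rates of kernel methods.

Details
Comments
39 pages, 1 table
AI Chinese abstract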

近年来,在神经正切核(NTK)机制下,对于过参数化神经网络的梯度下降方法的统计泛化性能的理解取得了进展。然而,现有关于回归问题的工作大多局限于浅层网络架构,在深度神经网络理论中留下了显著的空白。本文通过为使用梯度下降(GD)和随机梯度下降(SGD)训练的深度ReLU网络提供全面的泛化分析来填补这一空白。具体来说,我们首次建立了深度ReLU网络的GD和SGD在总体风险超额上的极小化最优速率,假设网络宽度与网络深度和训练样本规模呈多项式关系。我们的结果表明,在足够宽度下,深度ReLU网络的梯度下降方法能够达到与核方法相当的泛化最优速率。

English abstract

Recent progress has been made in understanding the statistical generalization performance of gradient descent methods for overparameterized neural networks within the neural tangent kernel (NTK) regime. However, most of the existing work on regression problems is limited to shallow network architectures, leaving a notable gap in the theory of deep neural networks. This paper addresses this gap by presenting a comprehensive generalization analysis for deep ReLU networks trained using gradient descent (GD) and stochastic gradient descent (SGD). Specifically, we establish the first known minimax-optimal rates of excess population risk for both GD and SGD with deep ReLU networks, under the assumption that the network width scales polynomially with respect to the network depth and training sample size. Our results demonstrate that with sufficient width, gradient descent methods for deep ReLU networks can achieve optimal generalization rates on par with kernel methods.

2606.06753 2026-06-08 stat.ME New submission

Cluster-Aware Conformal Calibration for Spatio-Temporal Distributional Prediction

面向时空分布预测的聚类感知保形校准

Gooyoung Kim, Chae Young Lim, Wen-Ting Wang, Hao-Yun Huang, Wei-Ying Wu

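AI summary: To address the inefficiency of DeepKriging's spatial basis functions under non-uniform sampling, the paper proposes cluster-adaptive spatial bases and cluster-aware conformal calibration, improving coverage accuracy and tail reliability in spatio-temporal distributional prediction.

Details
AI Chinese abstract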

DeepKriging类模型(如时空DeepKriging)通过基函数嵌入和随机梯度学习提高了可扩展性;然而,在高度非均匀采样模式下,固定的规则网格空间基仍然效率低下,往往将容量过度分配给稀疏区域,而对密集簇的分辨不足。为解决这一局限,我们提出了一种DeepKriging的实用扩展,用于可靠的时空分布预测,结合了聚类自适应空间基——其中心和尺度从空间采样密度初始化——以更好地捕捉异质空间采样,以及聚类感知保形校准,该校准在空间簇内确定预测区间宽度(当校准样本不足时使用全局回退)。由此产生的校准流程明确针对空间异质性和局部误校准,实验(包括模拟研究和PM$_{2.5}$数据分析)表明,与全局保形基线相比,在聚类观测模式下覆盖精度和尾部可靠性显著提高。

English abstract

DeepKriging-style models, such as Spatio-Temporal DeepKriging, improve scalability through basis-function embeddings and stochastic gradient learning; however, fixed regular-grid spatial bases remain inefficient under highly non-uniform sampling patterns, often over-allocating capacity to sparse regions while under-resolving dense clusters. To address this limitation, we propose a practical extension of DeepKriging for reliable spatio-temporal distributional forecasting. It incorporates cluster-adaptive spatial bases, whose centers and scales are initialized from the spatial sampling density, to better capture heterogeneous spatial sampling, together with cluster-aware conformal calibration that determines prediction-interval widths within spatial clusters (with a global fallback when calibration samples are insufficient). The resulting calibration pipeline explicitly targets spatial heterogeneity and local miscalibration, and experiments, including simulation studies and PM$_{2.5}$ data analysis, demonstrate substantially improved coverage accuracy and tail reliability under clustered observation patterns compared with a global conformal baseline.
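
The cluster-aware calibration step with a global fallback can be illustrated with standard split-conformal quantiles. This is a sketch under stated assumptions rather than the authors' pipeline: it uses absolute residuals as conformity scores, and the `min_size` cutoff and function names are hypothetical.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile of nonnegative scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for the requested level.
        return np.inf
    return float(np.sort(scores)[k - 1])

def clusterwise_widths(residuals, clusters, alpha=0.1, min_size=30):
    """Half-width of the prediction interval for each spatial cluster."""
    residuals = np.abs(np.asarray(residuals, dtype=float))
    clusters = np.asarray(clusters)
    global_q = conformal_quantile(residuals, alpha)
    widths = {}
    for c in np.unique(clusters):
        local = residuals[clusters == c]
        # Use the cluster's own quantile only when it has enough
        # calibration samples; otherwise fall back to the global one.
        if len(local) >= min_size:
            widths[c.item()] = conformal_quantile(local, alpha)
        else:
            widths[c.item()] = global_q
    return widths, global_q
```

A prediction interval for a new location in cluster `c` would then be the point forecast plus or minus `widths[c]`.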

2606.06730 2026-06-08 stat.ME New submission

Bayesian genome-wide clustering and variable selection of transcriptomic data via rank-based mixtures

基于秩混合的转录组数据贝叶斯全基因组聚类与变量选择

Emilie Eliseussen, Haakon Muggerud, Luca Coraggio, Ida Scheel, Thomas Fleischer, Valeria Vitelli

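AI summary: The paper proposes lowBM3, the first rank-based model that extends the Bayesian Mallows model to jointly handle clustering and variable selection in ultra-high-dimensional data, provides a scalable Bayesian framework, and demonstrates its effectiveness in a cancer genomics application.

Details
Comments
60 pages, 25 figures
AI Chinese abstract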

随着排名数据可用性的增加,对能够处理高维数据集并为所有估计提供不确定性量化的无监督基于秩的推理框架的需求日益增长。基于秩的方法在组学分析中也越来越受欢迎,因为对连续测量进行排序提供了一种处理非正态分布数据的稳健方法。贝叶斯 Mallows 模型(BMM)因其对各种排名数据的适应性以及灵活框架(将聚类级排名聚合与个体级推理相结合)而成为一种有前景的选择。然而,BMM 在超高维设置(如组学分析)中的可扩展性仍然有限。本文通过引入第一个基于秩的模型来解决这一问题,该模型将 BMM 推广到联合处理聚类和变量选择,即低维贝叶斯 Mallows 模型混合(lowBM3)。所提出的方法提供了一种新颖的贝叶斯框架,能够以可扩展的方式同时处理样本异质性、无监督参数估计和模型选择,适用于超高维数据。此外,还引入了一个配套的后处理框架,以提供共识排名和变量选择器的离散后验分布的后验总结。通过模拟研究评估了该方法的性能。该方法在癌症基因组学特征发现中的应用也展示了其实用性,其中对乳腺癌患者获得的 RNA-seq 批量基因表达数据进行了全基因组聚类。

English abstract

With the increasing availability of ranking data, there has been a growing demand for appropriate unsupervised rank-based inferential frameworks capable of handling high-dimensional datasets and providing uncertainty quantification for all estimates. Rank-based methods have also seen a growing popularity in -omics pipelines, as ranking continuous measurements provides a robust means of handling non-normally distributed data. The Bayesian Mallows model (BMM) has emerged as a promising choice because of its adaptability to various types of ranking data and its flexible framework, integrating cluster-wise rank aggregation with inference at the individual level. However, the scalability of BMM to ultra-high-dimensional settings, such as -omics analyses, has remained limited. The present paper addresses this issue by introducing the first rank-based model generalizing BMM to jointly handle clustering and variable selection, namely the lower-dimensional Bayesian Mallows Model Mixture (lowBM3). The proposed method provides a novel Bayesian framework that simultaneously handles heterogeneity in the sample, unsupervised parameter estimation, and model selection in a scalable manner for ultra-high-dimensional data. Additionally, a companion postprocessing framework is introduced to provide posterior summaries of the discrete posterior distributions of both the consensus ranking and the variable selector. Simulation studies are performed to assess the performance of the method. The usefulness of the method is also shown in an application to signature discovery for cancer genomics, where RNA-seq bulk gene expression data obtained from breast cancer patients are clustered genome-wide.

2606.06699 2026-06-08 stat.ME stat.AP New submission

Robust inference for cyclic-stress accelerated life tests under interval monitoring with lognormal lifetimes

对数正态寿命区间监测下循环应力加速寿命试验的稳健推断

María Jaenada, Leandro Pardo, Kiran Prajapat

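AI summary: For cyclic-stress accelerated life tests with interval-censored lognormal lifetimes, the paper proposes a robust estimation method based on the weighted density power divergence, derives the asymptotic distribution, and gives confidence intervals; simulations and a real example confirm its resistance to outliers.

Details
Comments
35 pages, 7 figures, 6 tables
AI Chinese abstract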

高可靠性产品通常需要在加速条件下进行测试,以便在可行的时间范围内诱发失效。对于使用寿命涉及两个应力水平之间重复交替的产品,例如汽车空调、电池和航空航天部件,循环应力加速寿命试验(CyALT)提供了比传统加速试验更真实的负载曲线。在实践中,失效通常仅在计划的检查时间记录,导致区间删失计数而非精确寿命。此外,传统的最大似然估计对数据污染敏感,这在工业小样本实验中是一个实际问题。本文针对区间监测下具有对数正态寿命的CyALT模型,开发了稳健的推断程序。通过最小化加权密度功率散度(WDPD)获得稳健估计量,即加权最小密度功率散度估计量(WMDPDE)。我们建立了WMDPDE的渐近分布,推导了影响函数表达式以表征稳健性,并给出了重要寿命特征的渐近和自助法置信区间。模拟研究证实,WMDPDE在干净数据下保持高效率的同时,对异常值提供了实质性保护。通过分析空调可靠性数据集展示了该方法,证明了CyALT框架中稳健推断的实际优势。

English abstract

Highly reliable products are often tested under accelerated conditions to provoke failures within a feasible timeframe. For products whose service life involves repeated alternation between two stress levels, such as automotive air-conditioners, batteries, and aerospace components, cyclic-stress accelerated life testing (CyALT) provides a more realistic loading profile than conventional accelerated tests. In practice, failures are often recorded only at scheduled inspection times, leading to interval-censored counts rather than exact lifetimes. Moreover, traditional maximum likelihood estimation is sensitive to data contamination, which is a genuine concern in small-sample industrial experiments. This paper develops robust inferential procedures for CyALT models with lognormal lifetimes under interval monitoring. Robust estimators are obtained by minimizing a weighted density power divergence (WDPD), leading to the weighted minimum density power divergence estimator (WMDPDE). We establish the asymptotic distribution of the WMDPDE, derive influence function expressions to characterize the robustness, and present asymptotic and bootstrap confidence intervals for important lifetime characteristics. A simulation study confirms that the WMDPDE provides substantial protection against outliers while retaining high efficiency under clean data. The methodology is illustrated through the analysis of an air-conditioner reliability dataset, demonstrating the practical advantages of robust inference in the CyALT framework.

2606.06670 2026-06-08 stat.AP New submission

When Should Forecasting Models Be Re-Specified? A Cost-Sensitive Trigger for Adaptive Model-Form Updating

预测模型何时应重新指定?一种成本敏感的触发机制用于自适应模型形式更新

Harrison Katz

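AI summary: On the question of how often a forecasting system's model form should be updated, the paper proposes a cost-sensitive trigger rule based on specification debt that lowers computational cost and instability while maintaining forecast accuracy, and validates its effectiveness on M4 data.

Details
AI Chinese abstract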

预测系统通常在每个评审周期进行刷新,该刷新通常包含两个不同的操作:估计参数和选择模型形式。最近的证据表明,第二个操作通常是不必要的,因为中间更新策略可以在大致保持预测精度的同时降低计算成本和预测不稳定性。本技术说明探讨了补充性问题:一旦系统采用了减少更新的策略,何时应中断该策略并重新指定模型形式?我们将规范债务定义为针对部署模型形式积累的证据,并利用它构建一个成本敏感的重新指定触发机制。在封闭的离散模型空间中,该触发机制简化为对部署规范的后验概率负对数的阈值。在开放的生产环境中,相同的决策规则可以通过预测得分差距、堆叠权重或校准的监测诊断来运行。固定更新频率是该规则的一个特例,当针对部署形式的证据以恒定速率积累时恢复。我们在500个M4月度序列上说明了这一想法,比较了完全更新、固定模型形式更新频率、仅参数更新以及有上限的自适应得分触发更新,并在有限ETS网格内,根据候选形式的AIC和BIC权重计算了规范债务的信息准则类似物。在该示例中,最佳的有上限自适应策略在精度上与完全更新相当,运行时间约为完全更新的28%,降低了预测不稳定性,并且行为类似于具有少量基于证据的例外的固定调度。

English abstract

Forecasting systems are commonly refreshed at every review period, and that refresh usually bundles two distinct operations: estimating parameters and selecting the model form. Recent evidence suggests the second operation is often unnecessary, since intermediate updating strategies can hold forecast accuracy roughly fixed while cutting computational cost and forecast instability. This technical note takes up the complementary question. Once a system has adopted a reduced-update policy, when should it interrupt that policy and re-specify the model form? We define specification debt as the evidence accumulated against the deployed model form, and we use it to build a cost-sensitive trigger for re-specification. In a closed discrete model space the trigger reduces to a threshold on the negative log posterior probability of the deployed specification. In open production settings the same decision rule can be run with predictive score gaps, stacking weights, or calibrated monitoring diagnostics. Fixed update frequencies turn out to be a special case of the rule, recovered when evidence against the deployed form accumulates at a constant rate. We illustrate the idea on 500 monthly M4 series, comparing full updating, fixed model-form update frequencies, parameter-only updating, and capped adaptive score-triggered updating, and within the finite ETS grid we also compute information-criterion analogues of specification debt from AIC and BIC weights over the candidate forms. In that illustration the best capped adaptive policy is comparable to full updating in accuracy, runs in about 28 percent of full-update computational time, lowers forecast instability, and behaves like a fixed schedule with a small number of evidence-based exceptions.
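
For the closed discrete model space mentioned in this abstract, the trigger is a threshold on the negative log posterior probability of the deployed specification. A minimal sketch of that rule follows, assuming the evidence for each candidate form is summarised by a cumulative log predictive score and that the prior over forms is uniform unless given. The function names and the way `threshold` is set are hypothetical; the abstract does not spell out how costs map to the threshold.

```python
import numpy as np

def specification_debt(cum_log_scores, deployed, log_prior=None):
    """Negative log posterior probability of the deployed model form."""
    s = np.asarray(cum_log_scores, dtype=float)
    if log_prior is None:
        # Uniform prior over the candidate forms (an assumption).
        log_prior = np.full(s.shape, -np.log(s.size))
    log_post = s + np.asarray(log_prior, dtype=float)
    # Normalise in log space for numerical stability.
    log_post = log_post - np.logaddexp.reduce(log_post)
    return float(-log_post[deployed])

def should_respecify(cum_log_scores, deployed, threshold):
    """Trigger re-specification once the debt exceeds the threshold."""
    return specification_debt(cum_log_scores, deployed) > threshold
```

With two equally supported forms the debt equals $\log 2$; it grows as the competing forms accumulate better predictive scores than the deployed one.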

2606.06571 2026-06-08 stat.AP New submission

Counting the uncounted: How many were killed in Guatemala, 1978-1995?

计数未计数者:1978-1995年危地马拉有多少人被杀害?

Nils Lid Hjort

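AI summary: For multinomial settings in which the count for a null cell is missing, the paper proposes parametric inference methods for estimating the unknown number and applies them to estimate the death toll during the Guatemalan genocide (1978-1995).

Details
Comments
10 pages, 3 figures. Invited chapter, for invited talk, at the 40th International Workshop on Statistical Modelling, Oslo, June 28 to July 3, 2026; will be published in Conference Proceedings, in different layout etc
AI Chinese abstract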

在各种应用领域中,存在一个特定的“零单元格”,在多项分布设置中,其他单元格有观测记录,但无法计数零单元格的发生次数。我开发了推断理论,通过参数化建模,在可获得其他单元格计数的情况下,评估此类未知数量,即计数未计数者。这些方法用于估算危地马拉种族灭绝时期(1978-1995年)的死亡人数。有三份精心整理的遇害者名单,其信息可映射到一个包含$2^3=8$个单元格的维恩图。对七个观测单元格求和,可识别出$R=47,803$名遇害者,但$N_{0,0,0}$有多大,进而$N=N_{0,0,0}+R$是多少?

English abstract

In various application domains, there is a certain 'null cell', inside a multinomial setup, where observations are recorded for the other cells, but where one cannot count the number of occurrences for the null cell. I develop inference theory for assessing such unknown numbers, counting the uncounted, in situations where counts are available for the other cells, via parametric modelling. The methods are used to estimate the number of persons killed in Guatemala during the Genocidio guatemalteco years 1978-1995. There are three carefully curated lists of killed people, where the information can be mapped to a Venn diagram with $2^3=8$ cells. Summing over the seven observed cells, $R=47{,}803$ killed individuals can be identified, but how big is $N_{0,0,0}$, and hence $N=N_{0,0,0}+R$?

2606.07492 2026-06-08 cs.IR cs.LG stat.ML New submission

Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies

基于数据集分类学的推荐系统Bradley-Terry排名

Ekaterina Grishina, Stepan Kuznetsov, Askar Tsyganov, Ilya Ivanov, Daria Korovaitceva, Margarita Rusanova, Uliana Parkina, Alexander Derevyagin, Evgeny Frolov, Sergey Samsonov, Anton Lysenko

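AI summary: Because rankings of recommendation algorithms are sensitive to dataset characteristics, the paper proposes a data-driven ranking method based on the Bradley-Terry model and introduces a ranking-consistency metric and a method for ranking algorithms on unseen datasets.

Details
Comments
KDD'26
AI Chinese abstract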

推荐算法的排名是一个具有挑战性的问题,因为模型性能对数据集特征(如稀疏性、序列结构和规模)敏感。这驱动了对适当方法的需求,以公平比较算法。对性能指标(例如,在基准测试上平均NDCG)的简单聚合可能会产生误导性的排名,削弱实际选择。为解决此问题,我们引入了一种基于Bradley-Terry(BT)模型的新型数据驱动排名方法。我们证明所获得的排名取决于关键数据集统计量。此外,我们提出了一种新的排名一致性评估指标,并展示了我们的排名对不完整数据的鲁棒性。最后,我们引入了一种针对未见数据集的算法排名方法,无需运行模型,依赖于Bradley-Terry框架的扩展,包括BT树和带协变量的BT模型。

English abstract

The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical selection. To address this problem, we introduce a novel, data-driven ranking methodology based on the Bradley-Terry (BT) model. We demonstrate that the obtained ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate the robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
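
As background for the plain Bradley-Terry model that this work builds on, the sketch below fits BT strengths from a pairwise win matrix with the classical minorization-maximization (Zermelo) iteration. It is an illustration only: how the authors turn benchmark metrics into pairwise outcomes is not stated in the abstract, so the `wins` matrix and the function name are hypothetical, and the BT-tree and covariate extensions are not covered.

```python
import numpy as np

def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from wins[i, j] = times i beat j.

    Assumes every algorithm has at least one win and one loss, so that
    the maximum-likelihood estimate exists.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T            # comparisons played by each pair
    total_wins = wins.sum(axis=1)    # total wins of each algorithm
    p = np.full(n, 1.0 / n)          # start from equal strengths
    for _ in range(n_iter):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()         # fix the scale: strengths sum to one
        converged = np.abs(new_p - p).max() < tol
        p = new_p
        if converged:
            break
    return p
```

Sorting algorithms by the fitted strengths gives the ranking; under the model, `p[i] / (p[i] + p[j])` is the probability that algorithm `i` beats algorithm `j`.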

2606.07483 2026-06-08 cs.LG stat.ML New submission

Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach

从级联数据中恢复网络:一种基于去偏雅可比矩阵的机器学习方法

Lei Huang

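Affiliations: MIT Sloan School of Management

AI summary: The paper proposes the CascadeNet framework, which uses a debiased Jacobian of the estimated one-step transition function to recover the hidden influence network without specifying a diffusion model, outperforming existing methods on simulations and COVID-19 transmission data.

Details
AI Chinese abstract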

许多重要结果以动态级联的形式展开,包括产品采用、疾病传播、金融困境和信息扩散。一个核心挑战是恢复这些级联背后的隐藏影响网络。现有方法通常假设特定的扩散模型,当该假设错误时,其性能会大幅下降。我们提出了CascadeNet,一种基于雅可比矩阵的机器学习框架,用于网络恢复,无需指定扩散机制。关键思想是,潜在的影响结构可以通过一步转移函数的雅可比矩阵来刻画。CascadeNet首先构建转移函数的灵活估计量,然后通过Riesz表示应用Neyman正交去偏,使得去偏后的雅可比矩阵是$\sqrt{n}$一致且渐近正态的,从而能够对网络结构进行正式推断。我们在模拟实验和真实世界实证应用中验证了CascadeNet。在模拟中,数据生成过程已知,CascadeNet在九种常见数据生成过程中实现了最高的网络恢复准确率。在西班牙52个省份的COVID-19传播实证应用中,CascadeNet恢复的传播网络与真实的省际移动网络显著相关,而基线方法恢复的网络与真实情况无显著一致性。

English abstract

Many important outcomes unfold as dynamic cascades, including product adoption, disease spread, financial distress, and information diffusion. A central challenge is to recover the hidden influence network behind these cascades. Existing methods typically assume a specific diffusion model, and their performance degrades substantially when that assumption is misspecified. We propose CascadeNet, a Jacobian-based machine learning framework for network recovery that does not require specifying a diffusion mechanism. The key idea is that the underlying influence structure can be characterized by the Jacobian of the one-step transition function. CascadeNet first constructs a flexible estimator of the transition function, and further applies Neyman-orthogonal debiasing via the Riesz representer, so that the debiased Jacobian is $\sqrt{n}$-consistent and asymptotically normal, enabling formal inference on the network structure. We validate CascadeNet in both a simulation exercise and a real-world empirical application. In simulations, where the data-generating process is known, CascadeNet achieves the highest network recovery accuracy across nine common data-generating processes. In an empirical application to COVID-19 transmission across Spain's 52 provinces, CascadeNet recovers transmission networks that are significantly correlated with the true inter-province mobility network, whereas networks recovered by baseline methods show no significant alignment with the ground truth.

2606.07457 2026-06-08 cs.LG eess.SP stat.ML New submission

Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting

基于物理信息合成历史的时间序列基础模型用于冷启动光伏预测

Lorenzo Longarini, Alessandro Rongoni, Simone Silenzi, Emanuele Frontoni, Riccardo Rosati

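Affiliations: European Commission

AI summary: For cold-start forecasting at photovoltaic plants, the paper proposes using physics-informed synthetic histories together with time-series foundation models for zero-shot forecasting, achieving a roughly 1.7 to 2 times performance improvement across 440 sites.

Details
Comments
To be published in the 2nd ICML Workshop on Foundation Models for Structured Data
AI Chinese abstract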

在并网调试时,光伏运营商必须在目标站点观测数据可用之前预测发电量,这限制了标准监督预测器的直接使用。针对这种冷启动场景,我们提出了一种零样本流程,通过电站元数据和气象协变量生成合成发电历史,使时间序列基础模型(TSFMs)能够通过推理时条件化进行预测。我们在严格的冷启动基线、真实反馈和自预测反馈策略下,将五种TSFM与经典基线进行了基准测试。评估涵盖了四个数据集中$440$个光伏站点以及多种气候区域。协变量感知的基础模型比基线性能提升约$1.7-2$倍:TabPFN-TS在真实反馈下实现了最低误差(MAE $0.514$, RMSE $0.721$ $kWh$ ${kWp}^{-1}$ ${d}^{-1}$),而Chronos-2在自预测反馈下最为鲁棒。性能对合成历史来源基本不敏感,表明准确性更多取决于合理的时序上下文可用性,而非特定生成器。

English abstract

At commissioning time, photovoltaic (PV) operators must forecast production before target-site observations are available, limiting the direct use of standard supervised forecasters. This cold-start setting is addressed with a zero-shot pipeline that generates a synthetic production history from plant metadata and meteorological covariates, enabling time-series foundation models (TSFMs) to forecast through inference-time conditioning. Five TSFMs are benchmarked against classical baselines under strict Cold-Start Baseline, Real Feedback, and Self-Forecast Feedback strategies. The evaluation spans $440$ PV sites across four datasets and diverse climate regimes. Covariate-aware foundation models outperform baselines by a factor of approximately $1.7$ to $2$: TabPFN-TS achieves the lowest error under Real Feedback (MAE $0.514$, RMSE $0.721$ $\mathrm{kWh}\,\mathrm{kWp}^{-1}\,\mathrm{d}^{-1}$), while Chronos-2 is most robust under Self-Forecast Feedback. Performance is largely insensitive to the synthetic-history source, indicating that accuracy is driven more by the availability of plausible temporal context than by the specific generator.

2606.07392 2026-06-08 cs.AI cs.LG econ.EM stat.ML New submission

Online Pandora's Box for Contextual LLM Cascading

面向上下文LLM级联的在线潘多拉魔盒

Alexandre Belloni, Yan Chen, Yehua Wei

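Affiliations: The Fuqua School of Business, Duke University

AI summary: For LLM cascading, the paper proposes an online contextual Pandora's Box model that combines parametric reservation indices and GMM estimation with UCB bounds to achieve dimension-dependent $\widetilde O(\sqrt T)$ cumulative regret.

Details
AI Chinese abstract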

受大型语言模型(LLM)级联的启发,我们提出了一种在线上下文潘多拉魔盒模型,用于自适应地查询和选择LLM API。在每个周期中,决策者观察一个请求上下文,并面临一个两阶段决策问题。在查询阶段,决策者顺序查询API,每次查询揭示一个生成的输出,并且决策者承担(输出相关的)成本。在选择阶段,决策者选择一个生成的输出进行部署,并仅观察部署输出的下游奖励。这种输出介导的反馈结构不同于经典的在线上下文潘多拉魔盒模型,后者打开盒子直接揭示其奖励。我们不估计每个API的完整条件输出和成本分布,而是直接建模保留索引,并为查询阶段开发一种学习方法。具体地,我们对由经典Weitzman策略诱导的上下文保留索引函数施加参数化结构。我们的策略将这些保留索引的广义矩方法(GMM)类型估计与这些索引以及共享输出级奖励评估器的UCB风格置信界相结合。在正则条件下,我们证明所得策略在T个周期的时间范围内实现了维度相关的$\widetilde O(\sqrt T)$累积遗憾。

English abstract

Motivated by Large Language Model (LLM) cascading, we propose an online contextual Pandora's Box model for adaptively querying and selecting LLM APIs. In each period, a decision-maker observes a request context and faces a two-phase decision problem. In the query phase, the decision-maker sequentially queries APIs, where each query reveals a generated output and the decision-maker incurs an (output-dependent) cost. In the selection phase, the decision-maker selects one of the generated outputs to deploy and observes only the downstream reward of the deployed output. This output-mediated feedback structure differs from classical online contextual Pandora's Box models, in which opening a box directly reveals its reward. Rather than estimating the full conditional output and cost distributions of each API, we directly model the reservation index and develop a learning approach for the query phase. Specifically, we impose a parametric structure on the contextual reservation index functions induced by the classical Weitzman policy. Our policy combines generalized-method-of-moments (GMM) estimation of these reservation indices with UCB-style confidence bounds for both these indices and the shared output-level reward evaluator. Under regularity conditions, we prove that the resulting policy achieves dimension-dependent $\widetilde O(\sqrt T)$ cumulative regret over a horizon of $T$ periods.
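
For readers unfamiliar with the reservation index, the sketch below computes it in the classical Pandora's Box setting, where opening a box with reward $V$ costs a fixed $c>0$ and the index $\sigma$ solves $\mathbb{E}[(V-\sigma)^+]=c$. This is the textbook Weitzman index for a known discrete reward distribution, not the paper's contextual, output-mediated model with learned indices; the function name is hypothetical.

```python
import numpy as np

def reservation_index(values, probs, cost, tol=1e-10):
    """Solve E[(V - sigma)^+] = cost for sigma by bisection (cost > 0)."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)

    def expected_surplus(sigma):
        # Expected gain from opening the box when sigma is already in hand.
        return float(np.sum(probs * np.maximum(values - sigma, 0.0)))

    # The surplus is non-increasing in sigma, at least `cost` at `lo`,
    # and zero at `hi`, so the root lies in [lo, hi].
    lo, hi = values.min() - cost, values.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_surplus(mid) > cost:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In the classical policy, boxes are opened in decreasing order of their indices, and search stops once the best reward seen so far exceeds every remaining index.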

2606.07382 2026-06-08 cs.LG stat.ML New submission

Covariance Shrinkage via Stochastic Interpolation

通过随机插值的协方差收缩

Mathieu Chalvidal, Florentin Coeurdoux, Eric Vanden-Eijnden

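Affiliations: Capital Fund Management

AI summary: The paper recasts classical shrinkage for high-dimensional covariance estimation as empirical risk minimization over a parametric stochastic interpolant between a source and a target distribution, reveals three mechanisms for reducing statistical risk, and designs a neural estimator with an upper bound on its risk.

Details
Comments
18 pages
AI Chinese abstract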

我们将高维协方差估计器的经典收缩重述为基于源分布与目标分布之间参数化随机插值的经验风险最小化。该形式将已知的收缩估计器作为特例,并揭示了降低统计风险的三种不同机制:(i) 调度:插值调度决定了可容许协方差的类别,从而影响可实现的风险。(ii) 流映射和耦合:虽然朴素构造相当于假设分布之间的独立性,但特定的耦合结构(例如最优传输问题的解)可以降低经验风险。此外,实现这种耦合的非线性流映射使插值协方差摆脱经验估计的特征基,从而实现特征向量正则化。(iii) 提前停止:通过积分回归向量场定义的估计器通过近似真实插值分布提供了额外的偏差-方差权衡。然后,我们提出了一种插值器的神经估计器,并给出了其二次风险关于插值近似误差的上界,并在合成实验中进行了验证。最后,我们将该估计器应用于真实的神经影像数据,展示了该方法在实践中提供的额外正则化能力。

English abstract

We recast classical shrinkage of high-dimensional covariance estimators as empirical risk minimization over a parametric stochastic interpolant between a source and a target distribution. This formalism recovers known shrinkage estimators as special cases and reveals three distinct mechanisms for reducing statistical risk: (i) Scheduling: the interpolant schedule determines the class of admissible covariances, and hence the achievable risk. (ii) Flow maps and couplings: whereas naive constructions amount to assuming independence between the distributions, specific coupling structures (e.g., solutions of optimal transport problems) can lower the empirical risk. Moreover, non-linear flow maps realizing such couplings free the interpolant covariance from the eigenbasis of the empirical estimate, enabling eigenvector regularization. (iii) Early stopping: estimators defined by integrating a regressed vector field afford an additional bias-variance trade-off through approximation of the true interpolant distribution. We then propose a neural estimator of the interpolant, together with an upper bound on its quadratic risk in terms of the interpolant approximation error, and validate both on synthetic experiments. Finally, we apply the estimator to real neuroimaging data, demonstrating the additional regularization power this approach offers in practice.
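
To make the notion of classical shrinkage concrete, the sketch below implements linear shrinkage of the sample covariance toward a scaled identity, one familiar estimator of the kind the abstract says is recovered as a special case. It is not the paper's interpolant-based or neural estimator, and the function name and the fixed weight `lam` are illustrative assumptions. It also shows the limitation that the abstract's flow-map mechanism is meant to lift: linear shrinkage moves eigenvalues toward their mean but leaves the empirical eigenvectors untouched.

```python
import numpy as np

def linear_shrinkage(samples, lam):
    """Shrink the sample covariance toward a scaled identity target."""
    X = np.asarray(samples, dtype=float)
    S = np.cov(X, rowvar=False)              # empirical covariance (p x p)
    p = S.shape[0]
    # Isotropic target with the same trace as the empirical covariance.
    target = (np.trace(S) / p) * np.eye(p)
    # Convex combination: lam = 0 keeps S, lam = 1 returns the target.
    return (1.0 - lam) * S + lam * target
```

In practice `lam` would be chosen by a data-driven rule or by cross-validation rather than fixed by hand.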

2606.07058 2026-06-08 cs.LG cs.CV math.AT stat.ML New submission

Constructing VAE Latent Spaces with Prescribed Topology

构建具有指定拓扑的VAE潜在空间

Jilles S. van Hulst, Jakub M. Tomczak, W. P. M. H. Heemels, Duarte J. Antunes

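Affiliations: Eindhoven University of Technology Nature Innovation Laboratory (NatInLab)

AI summary: To address the mismatch between a standard Gaussian prior and the non-Euclidean topology of the data manifold, the paper proposes a constructive mathematical framework that uses factorized distributions and the reparameterization trick to design topology-matched priors for manifolds admitting a product covering space (such as cylinders, tori, and Möbius strips), improving reconstruction quality and representational faithfulness.

Details
Comments
16 pages, 7 figures
AI Chinese abstract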

变分自编码器(VAE)学习高维数据的低维潜在表示。当数据位于具有非欧几里得拓扑的流形上时,标准高斯先验会引入拓扑不匹配,从而降低重建质量并阻碍忠实表示。我们提出了一个构造性数学框架,解决了所有允许乘积覆盖空间的流形的这种不匹配问题。这些流形可表示为基本因子(圆、区间或直线)的乘积,或此类乘积在有限对称群下的商。该类包括圆柱、环面、莫比乌斯带、克莱因瓶和实射影空间。基本因子上的因子化分布产生具有闭式解耦KL散度的乘积拓扑,使得每个潜在因子可以独立塑造,同时保持训练可处理。我们为周期、有界和无界支撑编目了可重参数化的编码器-先验对,并提供了坐标变换,允许标准神经网络输出具有平滑梯度的非欧几里得参数。对于商流形,解码器接收覆盖空间坐标的群不变特征,使得识别点产生相同输出。锚点约束相对于数据固定坐标系或创建软拓扑孔。在合成流形和真实图像数据集(旋转和循环移位MNIST)上的实验证实,拓扑匹配的先验使KL正则化与数据流形对齐。所得到的拓扑感知模型在所有实际相关的正则化强度下均优于高斯基线。代码可从此https URL获取。

English abstract

Variational autoencoders (VAEs) learn low-dimensional latent representations of high-dimensional data. When the data lies on a manifold with non-Euclidean topology, the standard Gaussian prior introduces a topological mismatch that degrades reconstruction quality and prevents faithful representation. We present a constructive mathematical framework that resolves this mismatch for all manifolds that admit a product covering space. These are manifolds expressible as products of elementary factors (circles, intervals, or lines) or as quotients of such products by a finite symmetry group. The class includes cylinders, tori, Möbius strips, Klein bottles, and real projective spaces. Factorized distributions over the elementary factors yield product topologies with closed-form, decoupled KL divergences, so that each latent factor can be shaped independently while keeping training tractable. We catalogue reparametrizable encoder-prior pairs for periodic, bounded, and unbounded supports, and provide coordinate transformations that allow standard neural networks to output non-Euclidean parameters with smooth gradients. For quotient manifolds, the decoder receives group-invariant features of the covering-space coordinates, so that identified points produce identical outputs. Anchor constraints fix the coordinate system relative to the data or create soft topological holes. Experiments on synthetic manifolds and real-image datasets (rotated and cyclically shifted MNIST) confirm that a topology-matched prior aligns KL regularization with the data manifold. The resulting topology-aware models outperform the Gaussian baseline at all practically relevant regularization strengths. The code is available at https://github.com/JvHulst/VAE-Topology.

2606.06804 2026-06-08 cs.LG stat.AP New submission

Interpreting Learning Under Competing Models: Joint and Stepwise Approaches for Dynamic Cognitive Diagnosis

解释竞争模型下的学习:动态认知诊断的联合与逐步方法

Yawen Ma, Sahoko Ishida, Kate Cain, Gabriel Wallin

Affiliations: School of Mathematical Sciences, Lancaster University; Department of Computer Science, University of Oxford; Department of Psychology, Lancaster University

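AI summary: When the item-skill structure is unknown, the paper examines how estimating the Q-matrix jointly with the learning process, rather than fixing the Q-matrix first and studying learning afterwards, changes conclusions about learner development. Analysing reading-game data with dynamic cognitive diagnostic models, it finds the joint analysis more reliable.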

详情
AI中文摘要

数字学习环境记录学习者对单个项目的反应,使得研究特定技能的发展而非总体分数成为可能。从这些数据中得出关于学习的结论需要一个将反应与潜在技能联系起来的模型,并追踪掌握程度随时间的变化。当每个项目测量的技能未知时,分析者必须决定是联合估计这种结构(Q矩阵)与学习过程,还是先确定它再研究学习。我们表明,这一决定可以改变关于学习者如何发展的实质性结论。使用动态认知诊断模型,我们分析了两个阅读游戏的数据,这些游戏测量了从二年级到三年级的词汇和理解能力,项目文本嵌入为未知的Q矩阵提供了先验信息。联合分析和偏差校正的逐步分析一致认为,大多数学习者朝着掌握两种技能的方向发展,但在三年级时有多少人仍然只部分熟练的问题上存在分歧,从而改变了阅读进展的报告方式。模拟研究确定了两种分析何时出现分歧,并表明当项目-技能结构不确定且项目池在不同年级之间变化时,联合分析更可靠。我们提供了两种分析的R代码。

英文摘要

Digital learning environments record learners' responses to individual items, making it possible to study the development of specific skills rather than overall scores. Drawing conclusions about learning from these data requires a model that links responses to latent skills and tracks how mastery changes over time. When the skills measured by each item are unknown, the analyst must decide whether to estimate this structure, the Q-matrix, jointly with the learning process, or to establish it first and study learning afterwards. We show that this decision can change substantive conclusions about how learners develop. Using dynamic cognitive diagnostic models, we analyse data from two reading games measuring vocabulary and comprehension from Grade 2 to Grade 3, with item-text embeddings providing prior information for the unknown Q-matrix. A joint analysis and a bias-corrected stepwise analysis agree that most learners move toward mastering both skills, but disagree about how many remain only partially proficient at Grade 3, changing how reading progress would be reported. A simulation study identifies when the two analyses diverge and shows that joint analysis is more reliable when the item-skill structure is uncertain and the item pool changes between grades. We provide R code for both analyses.

2606.06705 2026-06-08 eess.SY cs.SY math.ST stat.ME stat.TH 新提交

Estimating Evolving Functions with Dynamic Gaussian Processes

使用动态高斯过程估计演化函数

J. S. van Hulst, W. P. M. H. Heemels, D. J. Antunes

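AI summary: Proposes the Dynamic Gaussian Process framework, which models evolving functions through integro-difference equations, extends Gaussian process regression to time-varying functions, uses separable kernel structure to reduce the problem to a finite-dimensional Kalman filter, and supports vector-valued states and higher-order PDEs.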

详情
Comments
This manuscript is a preprint submitted to a SIAM journal
AI中文摘要

本文发展了动态高斯过程(DGP),一个用于估计由积分-差分方程(IDE)支配的函数的框架。IDE 对具有离散时间动态的连续函数进行建模,并自然地从线性偏微分方程(PDE)的时间离散化中产生。DGP 将高斯过程回归扩展到时变函数,并将卡尔曼滤波扩展到无限维状态。DGP 后验仍为高斯过程,具有闭式均值和协方差更新,且可分离核结构将问题简化为基函数系数上的有限维卡尔曼滤波。本文将 DGP 扩展到向量值状态,从而能够处理高阶 PDE,并提供了基函数近似的稳定性和逼近误差分析。函数 L2 估计误差精确分解为子空间内和子空间外贡献,且所有逼近误差随基函数数量增长而消失。该框架在热方程和波动方程(后者具有向量值状态)上进行了演示。代码可在 https://github.com/JvHulst/Dynamic_Gaussian_Processes 获取。

英文摘要

This paper develops the Dynamic Gaussian Process (DGP), a framework for estimating functions governed by integro-difference equations (IDEs). IDEs model continuous functions that evolve with discrete-time dynamics and arise naturally from time-discretization of linear partial differential equations (PDEs). The DGP extends Gaussian process regression to time-varying functions and extends Kalman filtering to infinite-dimensional states. The DGP posterior remains a Gaussian process with closed-form mean and covariance updates, and separable kernel structure reduces the problem to a finite-dimensional Kalman filter on basis function coefficients. This paper extends the DGP to vector-valued states, enabling the treatment of higher-order PDEs, and provides a stability and approximation error analysis for the basis function approximation. The functional L2 estimation error decomposes exactly into in-subspace and out-of-subspace contributions, and all approximation errors vanish as the number of basis functions grows. The framework is demonstrated on the heat equation and on the wave equation, the latter with a vector-valued state. Code is available at https://github.com/JvHulst/Dynamic_Gaussian_Processes.
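The reduction to a finite-dimensional Kalman filter on basis-function coefficients can be illustrated with one generic predict/update step. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `kalman_step` and the matrices `A` (coefficient dynamics), `Q` (process noise), `H` (basis functions evaluated at the measurement locations), and `R` (measurement noise) are hypothetical placeholders.

```python
import numpy as np

def kalman_step(m, P, A, Q, H, R, y):
    """One Kalman predict/update step for coefficients c with
    c_next = A c + w, w ~ N(0, Q), and y = H c_next + v, v ~ N(0, R).
    All matrices are illustrative assumptions, not taken from the paper."""
    # Predict: propagate mean and covariance through the linear dynamics.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: condition on the new measurements y.
    S = H @ P_pred @ H.T + R                  # innovation covariance (symmetric)
    K = np.linalg.solve(S, H @ P_pred).T      # Kalman gain P_pred H^T S^{-1}
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new
```

In such a sketch the function estimate at a location `x` would be read off as the basis functions evaluated at `x` times `m_new`, so the filter never has to store the infinite-dimensional state.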

2606.06625 2026-06-08 cs.GT math.ST stat.TH 新提交

N-Player Binary Games with Unidirectional Dependencies: Cycle Robustness and Induced Indifference

具有单向依赖性的N人二元博弈:循环鲁棒性与诱导无差异

Jose Maria Sanchez-Saez, Nana Odishelidze, Francisco Criado-Aldeanueva

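AI summary: For N-player binary games with unidirectional dependencies, the paper gives a closed-form characterisation of Nash equilibria, focusing on directed cycle graphical games. It proposes a Robust Incentive Structure under which equilibria are resolved in O(N) time, and clarifies the roles of the Parity Condition and induced indifference.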

详情
Journal ref
Communications in Nonlinear Science and Numerical Simulation, Volume 161, Part 2, 2026, 110151, ISSN 1007-5704
AI中文摘要

本研究提供了具有单向依赖性的N人二元博弈中纳什均衡的闭式刻画。虽然一般网络博弈是PPAD完全的,但先前的工作已证明树或路径可通过动态规划在多项式时间内求解。我们为有向循环图博弈的子类提供了确定性刻画,表明非零边界激励将拓扑线性化为前馈传播。在这种鲁棒激励结构下,可在O(N)时间内求解:严格优势保证唯一均衡;在无严格优势时,纯策略均衡由奇偶条件支配,而通过诱导支付无差异保证唯一完全混合均衡。对于非鲁棒情形,我们给出了分支规则。转移矩阵公式可预先评估搜索树大小。这种透明性使得循环网络中目标均衡的逆向设计成为可能,明确了数值求解器中晦涩的机制。

英文摘要

The present study provides a closed-form characterisation of Nash equilibria in N-player binary games with unidirectional dependencies. While general network games are PPAD-complete, prior work has established that trees or paths admit polynomial-time solutions via dynamic programming. We provide a deterministic characterisation for the subclass of directed cycle graphical games, demonstrating that non-zero boundary incentives linearize the topology into a feed-forward propagation. Under this Robust Incentive Structure, resolution is achieved in O(N) time: strict dominance guarantees a unique equilibrium; in its absence, pure strategy equilibria are governed by the Parity Condition, while a unique fully mixed equilibrium is guaranteed via induced payoff indifference. For non-robust regimes, we deliver branching rules. The transition-matrix formulation evaluates the search tree size beforehand. This transparency enables the inverse design of target equilibria in circular networks, making explicit the mechanics that remain opaque in numerical solvers.

2606.06576 2026-06-08 cs.LG astro-ph.EP astro-ph.IM stat.ML 新提交

Gaussian Process Latent Factor Regression for Low-Data, High-Dimensional Output Problems

高斯过程潜在因子回归用于低数据高维输出问题

Edward T. Stevenson, Eric T. Wolf, Mei Ting Mak, N. J. Mayne, Miles Cranmer

Affiliations: University of Cambridge; University of Colorado Boulder; University of Oxford; University of Exeter

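AI summary: Proposes Gaussian process latent factor regression (GPLFR), which represents each output as a linear-Gaussian decoding of a low-dimensional latent state and jointly optimizes compression and prediction for low-data, high-dimensional-output regression, and builds the first spatially resolved emulator of global climate models for rocky exoplanets.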

详情
Comments
9 pages content + 22 pages appendix/references. Supporting code at https://github.com/edstevenson/GPLFR
AI中文摘要

在科学领域,回归任务通常需要从少量训练样本预测高维输出。多输出高斯过程在低数据场景中表现出色,但通常难以处理高维输出。PCA-GP(主成分分析加高斯过程回归)等压缩-预测流程处理了高维性,但依赖于为重构而非预测优化的基。为弥补这一差距,我们提出一个模型,将每个输出表示为从高斯过程先验中抽取的低维潜在状态的线性高斯解码。通过解析地边缘化解码器权重,我们将压缩和预测耦合在一个可扩展到高维输出的单一目标中。我们将此模型称为高斯过程潜在因子回归(GPLFR)。我们通过构建首个岩石系外行星全球气候模型的空间分辨仿真器来演示GPLFR。

英文摘要

In the sciences, regression tasks often require predicting high-dimensional outputs from few training examples. Multi-output Gaussian processes excel in low-data regimes but typically struggle with high-dimensional outputs. Compress-then-predict pipelines such as PCA-GP (principal component analysis plus Gaussian process regression) handle high dimensionality, but rely on bases optimized for reconstruction rather than prediction. To address this gap, we propose a model that represents each output as a linear-Gaussian decoding of a low-dimensional latent state drawn from a Gaussian process prior. By analytically marginalizing the decoder weights, we couple compression and prediction in a single objective that scales to high-dimensional outputs. We refer to this model as Gaussian process latent factor regression (GPLFR). We demonstrate GPLFR by building the first spatially resolved emulator of global climate models for rocky exoplanets.

2606.07499 2026-06-08 math.ST math.PR stat.TH 新提交

Non-asymptotic bounds for quasi-MLE, misspecified models, and dependence under group sequential sampling

分组序贯抽样下拟极大似然估计、误设定模型及依赖性的非渐近界

Julian Aronowitz, Jay Bartroff

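AI summary: For group sequential quasi-maximum likelihood estimators under possible model misspecification and within-group dependence, derives asymptotic multivariate normal limits and explicit non-asymptotic normal approximation bounds, with an application to epilepsy clinical trial data.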

详情
AI中文摘要

我们推导了分组序贯拟极大似然估计在可能的模型误设定和组内依赖性下的渐近多元正态极限和显式非渐近正态逼近界。这些界通过Stein方法获得,具有已知常数,并适用于一类依赖数据估计问题,其中用于估计的似然可能不同于真实数据生成机制。我们针对具有随机组效应的泊松广义线性混合模型明确计算了极限协方差结构和有限样本界,并使用癫痫临床试验数据说明了结果。

英文摘要

We derive asymptotic multivariate normal limits and explicit non-asymptotic normal approximation bounds for group sequential quasi-maximum likelihood estimators under possible model misspecification and within-group dependence. The bounds, obtained using Stein's method, have known constants and apply to a class of dependent-data estimating problems in which the likelihood used for estimation may differ from the true data-generating mechanism. We compute the limiting covariance structure and finite-sample bound explicitly for a Poisson generalized linear mixed model with random group effects and illustrate the results using data from an epilepsy clinical trial.

2606.07354 2026-06-08 math.ST stat.TH 新提交

Dependence Measures via Adapted Optimal Transport: Stability and Rates of Convergence

通过适应最优输运的依赖性度量:稳定性与收敛速率

Jonathan Ansari, Johannes Wiesel

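AI summary: Introduces a mode of convergence based on adapted optimal transport that captures weak convergence of conditional distributions and restores continuity of dependence measures, and derives O(N^{-1/3}) convergence rates for plug-in estimators of dependence measures.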

详情
AI中文摘要

最近研究的依赖性度量,如Chatterjee秩相关,同时刻画独立性和完全函数依赖,为检测非线性依赖提供了强大框架。然而,这些度量不能弱连续,这限制了基于经验分布的传统插件估计器的适用性。这种障碍是自然的,因为此类度量是通过条件分布而非仅通过其联合分布定义的。本文引入一种基于最优输运的收敛模式,捕捉条件分布的弱收敛,并恢复广泛依赖度量类的连续性。我们将此收敛模式与适应Wasserstein距离、Knothe-Rosenblatt距离以及copula上的d1度量联系起来。基于此视角,我们提出基于适应经验测度的copula估计器,并与经典的基于秩的棋盘估计器进行比较。对于这两种估计器,我们推导出关于捕捉条件弱连续性的度量的O(N^{-1/3})收敛速率。作为结果,我们为几类依赖度量的插件估计器(包括基于秩的和重排的依赖度量)获得了相同的速率。

英文摘要

Recently studied dependence measures, such as Chatterjee's rank correlation, that characterize both independence and perfect functional dependence, provide a powerful framework for detecting nonlinear dependencies. However, these measures cannot be weakly continuous, which limits the applicability of classical plug-in estimators based on empirical distributions. This obstruction is natural, as such measures are defined via conditional distributions and not through their joint law alone. In this paper, we introduce an optimal transport-based mode of convergence that captures weak convergence of conditional distributions and restores continuity for a broad class of dependence measures. We relate this mode of convergence to the adapted Wasserstein distance, the Knothe-Rosenblatt distance and the d1-metric on copulas. Building on this perspective, we propose a copula estimator based on the adapted empirical measure and compare it with the classical rank-based checkerboard estimator. For both estimators, we derive O(N^{-1/3})-rates of convergence with respect to metrics that capture conditional weak continuity. As a consequence, we obtain the same rates for plug-in estimators of several classes of dependence measures, including rank-based and rearranged dependence measures.
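For readers unfamiliar with Chatterjee's rank correlation, the plug-in statistic can be sketched in a few lines. This is a minimal illustration of the standard no-ties formula, $\xi_n = 1 - 3\sum_{i}|r_{i+1}-r_i|/(n^2-1)$, where $r_i$ is the rank of the $i$-th response after sorting by the covariate. It is not the adapted-empirical-measure or checkerboard estimator studied in the paper, and the function name `chatterjee_xi` is an assumption.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation for samples without ties (illustrative sketch)."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    order = np.argsort(x, kind="stable")             # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1     # ranks of y in x-sorted order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)
```

When `y` is a monotone function of `x`, consecutive ranks differ by exactly one, so the statistic equals $1 - 3/(n+1)$ and approaches 1 as $n$ grows.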

2606.07325 2026-06-08 math.ST cs.AI cs.IT math.IT stat.TH 新提交

A Temporal Spatial Minimax Rate for Smoothly-Varying Distributions in Wasserstein Space

Wasserstein空间中平滑变化分布的时空极小极大速率

Munsik Kim

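AI summary: Studies the minimax rate for estimating a future value of a curve in Wasserstein space from finitely many noisy snapshots of its past, establishing a temporal-spatial lower bound together with a matching upper bound at $k=0$ and, in a translation submodel, for all $k$.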

详情
AI中文摘要

我们研究了在$2$-Wasserstein空间$\mathcal{P}_2(\mathbb{R}^d)$中,从过去有限个噪声快照估计曲线$t\mapsto\mu_t$的未来值$\mu_{t_n+h}$的极小极大速率,在速度场的$k$阶协变导数满足绝热界$\|\nabla_t^k v\|\le\varepsilon$的条件下。我们的核心结果是统一的时空极小极大下界:在正则的、局部传输丰富的子类上,每个估计量都会遭受$W_2$-风险,其$M$-指数为$\gamma_d(k+1)/(k+1+\gamma_d)$,其中$\gamma_d=\min(1/d,1/2)$($M$为总样本量)。该下界源于时空约化:光滑性预算定义了一个可达的$W_2$-球,沿时间轴嵌入一个传输填充,整个快照实验的信息由Fano论证控制——空间填充是经典的,但其光滑性容许的时间嵌入和全窗口分析是新的。该界插值了一个与维数无关的外推下限$\varepsilon h^{k+1}$——即使过去完全已知,未来不可观测的不可约代价——以及空间估计的维数灾难$M^{-\gamma_d}$,当$k\to\infty$时恢复静态分布估计速率。我们以设计依赖的形式陈述下界——具有设计加权的有效样本量——适用于任意观测时间,并在密集(等间距)情形下得到闭式指数。匹配的上界在$k=0$(速率$M^{-1/(d+1)}$,$d\ge3$)和平移子模型中对所有$k$建立;对于$k\ge1$,协变估计量条件依赖于两个估计(比较几何偏差界和最优传输映射估计速率)达到该速率,将无条件的一般$k$上界留作开放问题。在合成弯曲和平坦族上的数值实验验证了预测的指数。

英文摘要

We study the minimax rate of estimating a future value $\mu_{t_n+h}$ of a curve $t\mapsto\mu_t$ in the $2$-Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$ from finitely many noisy snapshots of its past, under an adiabatic bound $\|\nabla_t^k v\|\le\varepsilon$ on the $k$-th covariant derivative of the velocity field. Our central result is a unified temporal-spatial minimax lower bound: over regular, locally transport-rich subclasses, every estimator incurs $W_2$-risk with $M$-exponent $\gamma_d(k+1)/(k+1+\gamma_d)$, $\gamma_d=\min(1/d,1/2)$ ($M$ the total sample size). It follows from a temporal-to-spatial reduction: the smoothness budget defines a reachable $W_2$-ball into which a transport packing is embedded along the time axis, and the information of the entire snapshot experiment is controlled by a Fano argument -- the spatial packing is classical, but its smoothness-admissible temporal embedding and the full-window analysis are new. The bound interpolates a dimension-free extrapolation floor of order $\varepsilon h^{k+1}$ -- the irreducible cost of an unobserved future, present even with the exact past -- and the spatial estimation curse $M^{-\gamma_d}$, recovering the static distribution-estimation rate as $k\to\infty$. We state the lower bound in a design-dependent form -- with a design-weighted effective sample size -- valid for arbitrary observation times, and obtain the closed-form exponent in the dense (equispaced) regime. The matching upper bound is established at $k=0$ (rate $M^{-1/(d+1)}$, $d\ge3$) and, in a translation submodel, for all $k$; for $k\ge1$ a covariant estimator attains the rate conditionally on two estimates (a comparison-geometry bias bound and an optimal-transport map-estimation rate), leaving the unconditional general-$k$ upper bound as an open problem. Numerical experiments on synthetic curved and flat families corroborate the predicted exponents.
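The two special cases quoted in the abstract can be checked directly from the stated exponent. The short derivation below uses only the formulas given above; the symbol $\alpha_d(k)$ for the exponent is introduced here purely for illustration.

```latex
% Exponent from the abstract, with \gamma_d = \min(1/d, 1/2):
\alpha_d(k) = \frac{\gamma_d\,(k+1)}{k+1+\gamma_d}.

% Case k = 0, d \ge 3, so that \gamma_d = 1/d:
\alpha_d(0) = \frac{1/d}{1 + 1/d} = \frac{1}{d+1},
\qquad\text{i.e. the rate } M^{-1/(d+1)}.

% Limit k \to \infty:
\lim_{k\to\infty} \alpha_d(k)
  = \lim_{k\to\infty} \frac{\gamma_d}{1 + \gamma_d/(k+1)}
  = \gamma_d,
\qquad\text{i.e. the static rate } M^{-\gamma_d}.
```

Both limits agree with the rates stated in the abstract: $M^{-1/(d+1)}$ at $k=0$ for $d\ge3$, and the static distribution-estimation rate $M^{-\gamma_d}$ as $k\to\infty$.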

2606.07276 2026-06-08 math.ST q-fin.RM stat.TH 新提交

The Balance Property: The Constrained Case, with a View on Risk Sharing

平衡性质:约束情形及风险分担视角

Mario V. Wüthrich

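AI summary: Proposes constrained GLM fitting to restore the balance property when it fails in insurance pricing, and highlights its connection to ex-post risk sharing rules.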

详情
AI中文摘要

平衡性质是用于保险定价的拟合统计模型的一个重要性质。它保证拟合模型中的总精算价格等于用于拟合模型的总观测损失。这可以视为一种样本内全局无偏性。使用典型连接函数的最大似然拟合广义线性模型自动满足平衡性质。Lindholm-Wüthrich (Scandinavian Actuarial Journal, 2026) 讨论了在平衡性质不成立时的两种流行的平衡校正方法。本文通过第三种方法——约束GLM拟合——扩展了这一讨论,该方法优于先前讨论的两种方法。此外,我们强调了平衡性质与事后风险分担规则之间的联系。

英文摘要

The balance property is an important property of fitted statistical models deployed for insurance pricing. It guarantees that the total actuarial price in the fitted model is equal to the total observed loss used to fit the model. This can be seen as an in-sample global unbiasedness property. Maximum likelihood fitted generalized linear models (GLMs) with canonical links automatically fulfill the balance property. Lindholm-Wüthrich (Scandinavian Actuarial Journal, 2026) discussed two popular balance correction methods in case the balance property fails to hold. This note extends this discussion with a third method, constrained GLM fitting, that turns out to be superior to the two previously discussed ones. Moreover, we highlight the connection between the balance property and ex-post risk sharing rules.
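The claim that maximum likelihood GLMs with canonical links satisfy the balance property can be checked numerically. The sketch below fits a Poisson GLM with log link by Newton-Raphson on simulated data and compares total fitted price with total observed loss. It illustrates only the unconstrained canonical-link case, not the constrained fitting method of the paper; the function name `fit_poisson_glm`, the simulated data, and the coefficient values are all assumptions.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50):
    """Newton-Raphson for a Poisson GLM with canonical log link.
    Column 0 of X is assumed to be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                 # gradient of the log-likelihood
        fisher = X.T @ (mu[:, None] * X)       # Fisher information matrix
        beta = beta + np.linalg.solve(fisher, score)
    return beta

# Simulated portfolio (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.poisson(np.exp(X @ np.array([0.3, 0.5, -0.2])))

beta_hat = fit_poisson_glm(X, y)
mu_hat = np.exp(X @ beta_hat)
balance_gap = mu_hat.sum() - y.sum()           # zero up to numerical error
```

The gap vanishes because the score equation for the intercept column reads $\sum_i (y_i - \hat\mu_i) = 0$ at the maximum likelihood estimate.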

2606.07124 2026-06-08 cs.IT math.IT stat.ML 新提交

Information-Theoretic Bounds for Sparse Covariance Estimation in the Vertical-Split Distributed Model

垂直分割分布式模型中稀疏协方差估计的信息论界

Jing Yee Tan, Guangyue Han

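AI summary: Studies the minimax estimation error for sparse covariance matrices in the vertical-split distributed setting, proves that sparsity reduces communication and sample complexity, and gives upper and lower bounds that match up to polylogarithmic factors.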

详情
AI中文摘要

我们研究了垂直分割(特征分割)设置下分布式协方差矩阵估计的极小化估计误差,其中两个智能体各自观测 $m$ 个独立同分布的子高斯样本的不同坐标,并向中心服务器传输有限比特数。虽然 Rahmani 等人 [2025] 对稠密(无结构)互协方差矩阵建立了近乎紧的界,但我们研究了在互协方差 $C_{21}$ 上施加元素级 $s$-稀疏性是否能降低所需的通信和样本复杂度。与水平分割设置(Braverman 等人 [2016] 表明稀疏性不能降低均值估计的通信成本)相反,我们证明在垂直分割中稀疏性确实有助于互协方差估计。具体地,我们建立了极小化下界,表明每个智能体的通信预算为 $B_k = \Omega(\sigma^4 d_k\, s' \log(d_1 d_2/s')/\varepsilon^2)$,互协方差估计的样本复杂度为 $m = \Omega(\sigma^4\, s' \log(d_1 d_2/s')/\varepsilon^2)$,其中 $s' = s \wedge d_{\min}$。对于 $1$-稀疏情况,与稠密率相比,这实现了从 $d_1 d_2$ 到 $\log(d_1 d_2)$ 的指数级改进。我们的下界通过 Fano 方法建立,使用基于 Varshamov--Gilbert 型论证的显式稀疏打包(针对符号部分置换矩阵)并结合 Rahmani 等人 [2025] 的条件强数据处理不等式。我们通过匹配的可实现方案证明了界的紧性,该方案基于覆盖网量化和逐元素硬阈值,在多项式对数因子内达到 $s$-稀疏下界。

英文摘要

We study the minimax estimation error for distributed covariance matrix estimation in the vertical-split (feature-split) setting, where two agents each observe different coordinates of $m$ i.i.d. sub-Gaussian samples and communicate a limited number of bits to a central server. While Rahmani et al. [2025] established nearly tight bounds for dense (unstructured) cross-covariance matrices, we investigate whether imposing elementwise $s$-sparsity on the cross-covariance $C_{21}$ can reduce the required communication and sample complexity. In contrast to the horizontal-split setting, where Braverman et al. [2016] showed that sparsity does not reduce communication cost for mean estimation, we prove that sparsity does help for cross-covariance estimation in the vertical split. Specifically, we establish minimax lower bounds showing that the communication budget per agent scales as $B_k = \Omega(\sigma^4 d_k\, s' \log(d_1 d_2/s')/\varepsilon^2)$ and the sample complexity for cross-covariance estimation as $m = \Omega(\sigma^4\, s' \log(d_1 d_2/s')/\varepsilon^2)$, where $s' = s \wedge d_{\min}$. For the $1$-sparse case, this yields an exponential improvement from $d_1 d_2$ to $\log(d_1 d_2)$ compared to the dense rate. Our lower bounds are established via Fano's method with an explicit sparse packing using a Varshamov--Gilbert-type argument for signed partial permutation matrices combined with the Conditional Strong Data Processing Inequality of Rahmani et al. [2025]. We show the bounds are tight with a matching achievable scheme, based on covering-net quantization and entry-wise hard thresholding, that attains the $s$-sparse lower bound up to polylogarithmic factors.

2606.07065 2026-06-08 math.ST math.PR stat.ME stat.TH 新提交

Ising Models on Inhomogeneous Random Graphs: Inference, Local Asymptotic Minimaxity, and Limit of Experiments

非均匀随机图上的伊辛模型:推断、局部渐近极小极大性和实验极限

Somabha Mukherjee, Sanchayan Bhowal, Anirban Chatterjee, Bhaswar B. Bhattacharya

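AI summary: For Ising models on inhomogeneous random graphs in the subcritical parameter regime, proposes a computationally efficient closed-form estimate, proves that it has the same asymptotic distribution and variance as the maximum likelihood estimate, and establishes local asymptotic minimax optimality.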

详情
Comments
82 pages, 2 figures. Abstract shortened to meet ArXiv requirements
AI中文摘要

在本文中,我们为亚临界参数区间内非均匀随机图上的伊辛模型开发了一个具有尖锐渐近最优性保证的推断框架。我们首先基于模型的一个样本,刻画了自然参数的极大似然估计的渐近分布,覆盖了稀疏和稠密网络两种情形。接着,为克服极大似然方法的计算困难,我们通过似然方程的一步近似提出一个简单的闭合形式估计量。我们证明该估计量达到与极大似然估计相同的渐近分布和方差,从而为自然参数提供了一个计算高效且渐近有效的置信区间。我们通过建立Hájek--Le Cam型局部渐近极小极大定理来补充这些推断结果,表明所提出的估计量在真实参数的收缩邻域内,在速率和领先常数上均达到最小的渐近最大风险。我们还推导了相应的实验极限。据我们所知,这是针对网络依赖数据的首批尖锐渐近最优性结果之一。最后,我们研究了自然参数的拟合优度检验,推导了似然比检验的局部功效和极小极大检测率。我们的分析依赖于非均匀随机图上伊辛模型的充分统计量(哈密顿量)和随机配分函数的新波动结果,这些结果本身也具有独立意义。

英文摘要

In this paper, we develop an inferential framework with sharp asymptotic optimality guarantees for Ising models on inhomogeneous random graphs in the subcritical parameter regime. We begin by characterizing the asymptotic distribution of the maximum likelihood (ML) estimate of the natural parameter, based on a single sample from the underlying model, covering both sparse and dense network regimes. Next, to overcome the computational intractability of the ML method, we propose a simple closed-form estimate obtained from a one-step approximation to the likelihood equation. We show that this estimate attains the same asymptotic distribution and variance as the ML estimate, thereby yielding a computationally efficient and asymptotically valid confidence interval for the natural parameter. We complement these inferential results by establishing a Hájek--Le Cam-type local asymptotic minimax theorem, showing that the proposed estimate achieves the smallest possible asymptotic maximum risk, both in rate and in leading constant, over shrinking neighborhoods of the true parameter. We also derive the corresponding limit of experiments. To the best of our knowledge, these are among the first sharp asymptotic optimality results for network-dependent data. Finally, we study goodness-of-fit testing for the natural parameter, deriving the local power of the likelihood ratio test and minimax detection rates. Our analysis relies on new fluctuation results for the sufficient statistic (Hamiltonian) and for the random partition function of Ising models on inhomogeneous random graphs, which are of independent interest.

2606.07022 2026-06-08 math.ST stat.TH 新提交

Integral stochastic orders of $m$-generalized order statistics from transform-ordered nonparametric families

来自变换有序非参数族的 $m$-广义次序统计量的积分随机序

Idir Arab, Tommaso Lando, Paulo Eduardo Oliveira, Tomasz Rychlik

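AI summary: Provides sufficient conditions for comparing $m$-generalized order statistics under the increasing concave, increasing convex, and star-shaped stochastic orders, covering classical order statistics, censored type-II order statistics, and records, under nonparametric transform-order assumptions.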

详情
AI中文摘要

我们提供了关于递增凹、递增凸和星形随机序下比较 $m$-广义次序统计量的充分条件。这些条件允许我们对经典次序统计量、选定的删失II型次序统计量和记录值进行排序。它们依赖于广义次序统计量的参数和潜在分布。我们采用非参数方法,假设某种随机变换有序性质,即某种合适的形状条件,而不是假设特定的参数形式。这个框架涵盖了许多通过变换序与广义和负广义帕累托分布相关的相关分布类。

英文摘要

We provide sufficient conditions for comparing $m$-generalized order statistics with respect to the increasing concave, increasing convex, and star-shaped stochastic orders. These conditions allow us to rank classical order statistics, selected censored type-II order statistics, and records. They depend on both the parameters of the generalized order statistics and the underlying distribution. Rather than assuming a specific parametric form, we adopt a nonparametric approach and assume some stochastic transform-ordered property, that is, some suitable shape condition. This framework encompasses many relevant classes of distributions that are related, via transform order, to the generalized and the negative generalized Pareto distribution.

2606.06782 2026-06-08 cs.IT cs.LG math.IT math.ST stat.ML stat.TH 新提交

The Sharp Phase Transition of Tyler's M-Estimator for Robust Subspace Recovery

Tyler's M-估计器在鲁棒子空间恢复中的尖锐相变

Gilad Lerman, Teng Zhang

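AI summary: Studies the behavior of Tyler's M-estimator at the critical signal-to-noise ratio DS-SNR = 1, proves that it converges to the true subspace, and establishes a sharp phase transition.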

详情
AI中文摘要

鲁棒子空间恢复(RSR)旨在从被异常值严重污染的数据集中识别潜在的d维子空间。复杂性理论结果基于维度缩放信噪比(DS-SNR)建立了问题计算难度的阈值:当DS-SNR严格小于1时,问题是SSE难的;当它大于1时,在一般位置假设下可通过实用算法求解。然而,在临界边界DS-SNR=1处实用算法的确切行为一直未知。本文解决了Tyler's M-估计器(TME)在此临界边界的行为,从而建立了尖锐相变。具体地,我们证明在一种新的稳定性条件下,当DS-SNR≥1时,TME精确收敛到真实子空间,该条件比先前文献中使用的一般位置假设更宽松。我们的分析利用了在majorization-minimization框架内对TME迭代的分解。

英文摘要

Robust Subspace Recovery (RSR) aims to identify an underlying d-dimensional subspace from a dataset heavily corrupted by outliers. Complexity-theoretic results establish a threshold for the problem's computational hardness based on the dimension-scaled signal-to-noise ratio (DS-SNR): the problem is SSE-hard when the DS-SNR is strictly less than 1, and solvable via practical algorithms when it is greater than 1 under general position assumptions. However, the exact behavior of practical algorithms at the critical boundary DS-SNR = 1 has remained unknown. This work resolves the behavior of Tyler's M-estimator (TME) at this critical boundary, consequently establishing a sharp phase transition. Specifically, we prove that TME converges exactly to the true subspace for DS-SNR $\geq 1$ under a new stability condition, which is less restrictive than the general position assumptions used in prior literature. Our analysis utilizes a decomposition of the TME iterates within a majorization-minimization framework.
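The TME iterates mentioned above follow a well-known fixed-point scheme, which can be sketched as follows. This is the textbook iteration with trace normalization on generic full-rank data, offered as an assumption-laden illustration. It does not handle the degenerate subspace-recovery regime analyzed in the paper, where the iterates become nearly singular, and the function name `tyler_m_estimator` is hypothetical.

```python
import numpy as np

def tyler_m_estimator(X, n_iter=500, tol=1e-10):
    """Fixed-point iteration for Tyler's M-estimator of scatter.
    X has shape (n, p), rows are centered samples, none equal to zero."""
    n, p = X.shape
    S = np.eye(p)
    for _ in range(n_iter):
        # Weights 1 / (x_i^T S^{-1} x_i) for every sample.
        w = 1.0 / np.einsum("ij,ij->i", X @ np.linalg.inv(S), X)
        S_new = (p / n) * (X * w[:, None]).T @ X
        S_new *= p / np.trace(S_new)           # fix the scale: trace(S) = p
        if np.linalg.norm(S_new - S) < tol:
            return S_new
        S = S_new
    return S
```

In a subspace-recovery setting one would typically read off the estimated subspace from the top `d` eigenvectors of the returned matrix, for example via `np.linalg.eigh`.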

2606.06769 2026-06-08 math.ST stat.TH 新提交

Sequential testing of conditionally constrained hypotheses

条件约束假设的序贯检验

Eugenio Clerico

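AI summary: For nonparametric hypotheses defined by finitely many conditional constraints, explicitly characterises all e-processes, proving that every e-process is point-wise dominated by a predictable product of affine one-step e-variables, so that arbitrary e-processes can be replaced without loss by test supermartingales.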

详情
AI中文摘要

我们显式刻画了用于检验由有限个条件约束定义的条件非参数假设的全体e过程类。主要结果是一个完全类定理:此类假设的每个e过程都被仿射一步e变量的可预测积点点控制。因此,对于一大类条件检验问题,任意e过程可以无损失地替换为检验超鞅。这扩展了之前从单步约束检验和有界一维条件均值检验到更广泛的条件序贯设置的结果。

英文摘要

We explicitly characterise the full class of e-processes for testing conditional non-parametric hypotheses, defined by finitely many conditional constraints. Our main result is a complete-class theorem: every e-process for such a hypothesis is point-wise dominated by a predictable product of affine one-step e-variables. Therefore, for a broad class of conditional testing problems, arbitrary e-processes can be replaced without loss by test supermartingales. This extends previous complete-class results from single-step constrained testing and bounded one-dimensional conditional mean testing to a broader conditional sequential setting.
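A predictable product of affine one-step e-variables is easiest to see in the bounded one-dimensional conditional mean case that the abstract cites as prior work. The sketch below assumes observations in $[0,1]$ and the null hypothesis that each conditional mean is at most `m`. The function name `wealth_process` and the argument names are hypothetical, and the bets are assumed to have been chosen before the corresponding observation is seen.

```python
def wealth_process(xs, m, bets):
    """Test supermartingale for H0: E[X_t | past] <= m with X_t in [0, 1].
    Each factor 1 + lam * (x - m) is an affine one-step e-variable,
    provided the bet lam lies in [0, 1/m] and is predictable."""
    wealth = 1.0
    path = []
    for x, lam in zip(xs, bets):
        if not 0.0 <= lam <= 1.0 / m:
            raise ValueError("bet must lie in [0, 1/m] to keep wealth nonnegative")
        wealth *= 1.0 + lam * (x - m)
        path.append(wealth)
    return path
```

Under the null each factor has conditional expectation at most one, so large wealth is evidence against the null, and by Ville's inequality rejecting once the wealth reaches $1/\alpha$ controls the type I error at level $\alpha$.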

2606.06632 2026-06-08 math.ST cs.NA eess.IV eess.SP math.NA stat.TH 新提交

Smooth Hard-Thresholding for Singular Values with Stein's Unbiased Risk Estimate

奇异值的平滑硬阈值与Stein无偏风险估计

Guanzhong Yang

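AI summary: For low-rank matrix denoising, proposes a smooth hard-threshold spectral estimator based on Stein's unbiased risk estimate (SURE), which addresses the discontinuity of conventional hard thresholding, and proves that it admits an unbiased risk estimate at a fixed threshold.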

详情
Comments
24 pages, 9 figures, 4 tables
AI中文摘要

低秩矩阵去噪是基于补丁的图像恢复和许多其他逆问题的核心原语。经典的基于SVD的图像去噪方法通常通过将残差奇异值能量与估计的噪声能量匹配来选择截断秩,但这一规则并非有限样本风险原则,因为拟合的低秩近似不可避免地吸收了部分噪声。本文基于Stein无偏风险估计(SURE)开发了一种数学上严格的替代方案。由于奇异值硬阈值是不连续的,且不满足Stein引理的假设,我们引入了一种逻辑平滑硬阈值谱估计器。我们证明了平滑收缩器满足Stein引理的谱估计器版本所需的正则条件,因此在高斯噪声下允许精确无偏的固定阈值风险估计。对于固定的观测矩阵和一组与观测奇异值分离的候选阈值,固定阈值平滑SURE目标的排序最终与一个简单的极限分数一致。该极限分数具有与有偏硬阈值SURE公式相同的代数形式,但在此仅用作排序有限候选的计算工具。选择最小化阈值是一个数据自适应的调整步骤;所选的SURE值不应被解释为最终选择的估计器的无偏风险估计。

英文摘要

Low-rank matrix denoising is a central primitive in patch-based image restoration and many other inverse problems. Classical SVD-based image denoising methods often choose a truncation rank by matching residual singular-value energy with an estimated noise energy, but this rule is not a finite-sample risk principle because a fitted low-rank approximation inevitably absorbs part of the noise. This paper develops a mathematically rigorous alternative based on Stein's unbiased risk estimate (SURE). Since singular value hard thresholding is discontinuous and does not satisfy the hypotheses of Stein's lemma, we introduce a logistic smooth hard-threshold spectral estimator. We prove that the smooth shrinker satisfies the regularity conditions required by a spectral-estimator version of Stein's lemma, and therefore admits an exactly unbiased fixed-threshold risk estimate under Gaussian noise. For a fixed observed matrix and a finite set of candidate thresholds separated from the observed singular values, the ordering of the fixed-threshold smooth SURE objective eventually agrees with a simple limiting score. The limiting score has the same algebraic form as the biased hard-threshold SURE formula, but here it is used only as a computational device for ranking finite candidates. Selecting the minimizing threshold is a data-adaptive tuning step; the selected SURE value should not be interpreted as an unbiased risk estimate of the finally selected estimator.
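The idea of replacing the discontinuous hard threshold with a logistic gate can be sketched as follows. The exact shrinker used in the paper is not given in the abstract, so the form $s \mapsto s\,\sigma((s-\tau)/h)$ with bandwidth `h`, as well as the function name `smooth_hard_threshold`, are assumptions made for illustration only.

```python
import numpy as np

def smooth_hard_threshold(Y, tau, h):
    """Spectral estimator with a logistic gate on the singular values.
    Assumed shrinker: s -> s * sigmoid((s - tau) / h); as h -> 0 this
    approaches hard thresholding at tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Logistic sigmoid written with tanh to avoid overflow for large arguments.
    gate = 0.5 * (1.0 + np.tanh((s - tau) / (2.0 * h)))
    return (U * (s * gate)) @ Vt
```

Because the gate is smooth in the singular values, the map from the observed matrix to the estimate is differentiable, which is the kind of regularity a Stein-type divergence formula requires.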

2606.07247 2026-06-08 cond-mat.dis-nn cond-mat.stat-mech stat.ML 新提交

Theory of learning of high-dimensional controlled non-linear dynamical systems (I): models and methods

高维受控非线性动力系统学习理论 (I): 模型与方法

Pierfrancesco Urbani

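AI summary: Introduces a class of theoretical models in which the training dynamics of neural ODEs under online stochastic gradient descent are solved via dynamical mean field theory, and derives learning curves in the high-dimensional limit.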

详情
Comments
28 pages, 2 figures
AI中文摘要

神经常微分方程(neural ODEs)迅速成为概念化人工神经网络的一个强大且统一的框架,优雅地将动力系统的连续时间建模与现代深度学习的离散数据驱动范式联系起来。除了实际优势外,它们还为神经网络的训练和泛化性质提供了新的理论见解。该框架的显著特征是其双重动力学性质:推理动力学(控制前向计算期间的ODE演化)和训练动力学(控制模型参数的优化)。这使得神经ODE成为研究多种设置(如多层神经网络(例如ResNet)、自回归模型(具有下一个token生成动力学)、生成模型以及理论神经科学中的递归神经网络)的特别合适的理论框架。在这项工作中,我们引入了一个基于理论的模型类,用于研究通过在线随机梯度下降训练的神经ODE。我们通过动态平均场理论求解这些模型的训练动力学,并推导出高维极限下的学习曲线。

英文摘要

Neural ordinary differential equations (neural ODEs) have rapidly gained prominence as a powerful and unifying framework for conceptualizing artificial neural networks, elegantly connecting the continuous-time modeling of dynamical systems with the discrete, data-driven paradigm of modern deep learning. Beyond their practical advantages they offer fresh theoretical insights into the training and generalization properties of neural networks. The distinctive feature of this framework is its dual dynamical nature: inference dynamics, which govern the ODE evolution during forward computation, and training dynamics, which control the optimization of model parameters. This makes neural ODEs a particularly well-suited theoretical framework for studying a large variety of settings such as multi-layer neural networks (ResNets for example), autoregressive models (with next-token generation dynamics), generative models, and recurrent neural networks in theoretical neuroscience. In this work, we introduce a theoretically grounded class of models for studying neural ODEs trained via online stochastic gradient descent. We solve the training dynamics of these models via dynamical mean field theory and derive learning curves in the high-dimensional limit.

2606.06395 2026-06-08 math.DG math.ST stat.TH 版本更新

Doubly Totally-Umbilical Statistical Submanifolds in the Probability Simplex

概率单纯形中的双全脐统计子流形

Ryu Ueno

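AI summary: Gives a complete classification of doubly totally-umbilical statistical submanifolds in the probability simplex, an important problem in the statistical submanifold theory of information geometry.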

详情
Comments
All comments are welcome!; 34 pages
AI中文摘要

概率单纯形是最标准的统计流形之一,由S. Amari和H. Nagaoka开创的信息几何研究了概率单纯形的统计子流形理论。另一方面,H. Furuhata在统计流形几何中定义了双全脐子流形。我们给出了概率单纯形中双全脐子流形的完全分类。

英文摘要

We give a complete classification of doubly totally-umbilical submanifolds in the probability simplex. The probability simplex is one of the most standard statistical manifolds, and information geometry initiated by S. Amari and H. Nagaoka studies the statistical submanifold theory of the probability simplex. On the other hand, H. Furuhata defined doubly totally-umbilical submanifolds in the geometry of statistical manifolds, inspired by the surface theory of Euclidean space.

2606.05967 2026-06-08 stat.ML cs.LG 版本更新

Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples

具有线性函数逼近、通用学习步长和独立同分布样本的TD(0)的快速鲁棒收敛速率

Ziad Kobeissi, Éloïse Berthier

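AI summary: For TD(0) with linear function approximation, i.i.d. samples, and a constant learning step, establishes a mean-square-error convergence rate that is fast (of order 1/k), robust (independent of the smallest eigenvalue), and sharp (multiplicative constant below 11), and introduces the PCTD(0) variant, which has better convergence properties under a strong mixing assumption.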

详情
Journal ref
AISTATS 2026, May 2026, Tanger, Morocco
Comments
This is an extended version of a paper accepted at AISTATS 2026
AI中文摘要

本文研究了具有线性函数逼近(LFA)的TD(0)时序差分方法的有限时间行为。我们考虑策略内独立同分布(i.i.d.)样本、常数学习步长和Polyak-Juditsky平均方法。我们为近似函数的均方误差(MSE)建立了一个新的收敛速率,该速率(i)快速,即具有迭代次数k的最优依赖性(即1/k阶),(ii)对病态条件鲁棒:仅依赖于初始误差和模型无关常数,以及(iii)尖锐,乘性常数小于11。特别地,与TD(0)文献中所有现有的O(1/k)速率不同,它不依赖于线性参数化的非中心协方差矩阵的最小特征值。我们还引入了PCTD(0),这是TD(0)的一个变体,在马尔可夫链的强混合附加假设下具有更好的收敛性质。

英文摘要

In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method. We establish a new convergence rate, for the Mean-Square Error (MSE) on the approximated function, that is (i) fast in the sense that it admits an optimal dependency in the number of iterations k (i.e., of order 1/k), (ii) robust to ill-conditioning: it only depends on an initial error and model-independent constants, and (iii) sharp up to a multiplicative constant lower than 11. In particular, it does not depend on the smallest eigenvalue of the uncentered covariance matrix of the linear parametrization, unlike all pre-existing O(1/k) rates in the TD(0) literature. We also introduce PCTD(0), a variant of TD(0), which benefits from better convergence properties under an additional assumption of strong mixing on the Markov Chain.
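The setting described above (linear features, i.i.d. transitions, a constant step, and averaging of the iterates) can be sketched in a few lines. This is a generic illustration of averaged TD(0), not the authors' code and not the PCTD(0) variant. The function name `td0_averaged` and the sample format `(phi, reward, phi_next)` are assumptions.

```python
import numpy as np

def td0_averaged(samples, dim, step, gamma):
    """TD(0) with linear function approximation, a constant step size,
    and a running average of the iterates.
    samples: iterable of (phi, reward, phi_next) with feature vectors of length dim."""
    theta = np.zeros(dim)
    theta_bar = np.zeros(dim)
    for k, (phi, reward, phi_next) in enumerate(samples, start=1):
        td_error = reward + gamma * (phi_next @ theta) - phi @ theta
        theta = theta + step * td_error * phi
        theta_bar += (theta - theta_bar) / k   # running mean of theta_1..theta_k
    return theta, theta_bar
```

The averaged iterate `theta_bar` is the quantity to which an $O(1/k)$ mean-square-error rate of the kind stated in the abstract would apply; the last iterate `theta` is returned only for comparison.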

2606.05919 2026-06-08 stat.ML cs.LG econ.EM stat.CO 版本更新

Finding Most Influential Sets

寻找最具影响力的集合

Lucas D. Konrad, Nikolas Kuschnig

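AI summary: For estimands with linear-fractional leave-set-out effects, proposes an efficient algorithm based on Dinkelbach's method that reduces the selection of the most influential set to a one-parameter sequence of top-$k$ problems and attains a globally optimal solution.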

详情
Comments
Published as a conference paper at ICML 2026, fixed ref
AI中文摘要

识别最具影响力的集合(MIS)——即移除后能最大程度改变目标估计量的大小为$k$的子集——通常是不可行的,因为需要搜索$\binom{n}{k}$个子集。对于具有线性分式留出效应的估计量,我们证明MIS选择可简化为一个单参数序列的top-k问题。Dinkelbach方法产生了一种每轮迭代成本为$\mathcal{O}(n)$且有限终止的算法。对于固定残差化输入,该算法返回单变量比率目标的全局最优集,包括预言机残差化偏线性模型。当存在估计的干扰函数时,均匀分母和生成得分稳定性意味着对一阶预言机正交得分目标的近似;在分离条件下,可精确恢复集合。模拟和应用表明,该方法恢复了以前计算上无法访问的精确MIS。

英文摘要

Identifying most influential sets (MIS) - size-$k$ subsets whose removal maximally changes a target estimand - is typically infeasible because it requires searching over $\binom{n}{k}$ subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection reduces to a one-parameter sequence of top-$k$ problems. Dinkelbach's method yields an algorithm with $\mathcal{O}(n)$ cost per iteration and finite termination. For fixed residualized inputs, the algorithm returns a globally optimal set for the univariate ratio objective, including the oracle-residualized partial linear model. With estimated nuisance functions, uniform denominator and generated-score stability imply approximation to the first-order oracle orthogonal-score objective; exact set recovery follows under a separation condition. Simulations and applications show that the method recovers exact MIS that were previously computationally inaccessible.
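The reduction from a ratio objective to a sequence of top-$k$ problems can be illustrated with a generic Dinkelbach iteration. The sketch maximizes $\sum_{i\in S} a_i / \sum_{i\in S} b_i$ over subsets of size $k$, assuming all $b_i > 0$. The paper's actual objective, with its leave-set-out numerator and denominator, is not reproduced here, and the function name `dinkelbach_top_k` and its arguments are hypothetical.

```python
import numpy as np

def dinkelbach_top_k(a, b, k, tol=1e-12, max_iter=100):
    """Maximize sum(a[S]) / sum(b[S]) over |S| = k, assuming every b[i] > 0.
    Each iteration solves a top-k problem for the scores a - lam * b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    idx = np.arange(k)                               # arbitrary starting subset
    lam = a[idx].sum() / b[idx].sum()
    for _ in range(max_iter):
        scores = a - lam * b
        cand = np.argpartition(scores, -k)[-k:]      # top-k scores in O(n)
        if scores[cand].sum() <= tol:                # F(lam) = 0: lam is optimal
            break
        idx = cand
        lam = a[idx].sum() / b[idx].sum()            # ratio strictly increases
    return np.sort(idx), lam
```

Each pass costs linear time because `np.argpartition` selects the top `k` scores without a full sort, and the ratio increases strictly until the parametric problem has optimal value zero, which is why the iteration stops after finitely many steps.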

2606.03559 2026-06-08 cs.LG math.OC stat.ML 版本更新

Analytical Evaluation of DCA Convergence Properties for Minimizing Prediction Functions of Gaussian RBF Support Vector Regression

高斯RBF支持向量回归预测函数最小化的DCA收敛性分析评估

Yohei Kakimoto, Yuto Omae, Hirotaka Takahashi

AI总结 针对以训练好的高斯RBF核支持向量回归(RBF-SVR)预测函数为目标函数的非凸优化问题,利用RBF核的解析结构构造显式DC分解,推导出DC分量强凸参数下界μ和子问题梯度Lipschitz常数上界L的闭式表达式,并通过数值实验表明特征量Cαρ主导DCA的收敛性和初始点依赖性。

详情
Comments
29 pages, 5 figures, 2 tables
AI中文摘要

对于目标函数为训练好的高斯径向基函数(RBF)核支持向量回归(SVR)模型(RBF-SVR)预测函数的非凸优化问题,我们提出一个框架,通过利用RBF核的解析结构构造显式的凸函数差(DC)分解,应用DC算法(DCA)。具体地,我们闭式推导了DC分量的强凸参数下界μ和子问题梯度Lipschitz常数上界L。μ和L完全由训练后的对偶系数之和Cα、RBF核参数γ以及DC分解参数ρ决定,且共享共同主导项Cαρ。通过在六个基准函数上的数值实验,我们表明Cαρ是表征DCA收敛性质和初始点依赖性的主要单一量,并进一步证明它分解为两个独立路径C→Cα和γ→ρ,其主要变化由SVR超参数(C,γ)控制。这些结果使得RBF-SVR上DCA的收敛性质可以通过单一标量Cαρ预先评估:训练前近似从(C,γ)得到,训练后精确闭式得到。

英文摘要

For nonconvex optimization problems whose objective is the prediction function of a trained Support Vector Regression (SVR) model with the Gaussian radial basis function (RBF) kernel (RBF-SVR), we present a framework that applies the difference of convex functions (DC) algorithm (DCA) by exploiting the analytical structure of the RBF kernel to construct an explicit DC decomposition. Specifically, we derive in closed form both the lower bound $μ$ of the strong convexity parameter of the DC components and the upper bound $L$ of the gradient Lipschitz constant of the subproblem. Both $μ$ and $L$ are determined solely by the post-training dual-coefficient sum $C_α$ and the RBF kernel parameter $γ$, together with the DC decomposition parameter $ρ$, and they share a common leading term $C_αρ$. Through numerical experiments on six benchmark functions, we show that $C_αρ$ is the primary single quantity characterizing both the convergence properties and the initial-point dependence of DCA, and further demonstrate that it decomposes into two independent pathways, $C \to C_α$ and $γ\to ρ$, with its primary variation governed by the SVR hyperparameters $(C, γ)$. Together, these results allow the convergence properties of DCA on RBF-SVR to be assessed in advance through the single scalar quantity $C_αρ$: approximately from $(C, γ)$ before training, and exactly in closed form after training.

2605.31583 2026-06-08 math.ST math.PR stat.TH 版本更新

Sharp minimax risks and phase transitions in sparse submatrix detection

稀疏子矩阵检测中的尖锐极小极大风险与相变

Subhajit Goswami, Rajarshi Mukherjee

AI总结 研究在含噪矩阵中检测稀疏高均值高斯子矩阵的极小极大风险,确定了风险趋于0或1的精确渐近速率及相变边界。

详情
Comments
26 pages, 1 figure
AI中文摘要

我们研究在更大的含噪矩阵中检测稀疏高均值高斯子矩阵的极小极大风险。当植入子矩阵大小为$n\times n$,环境矩阵大小为$N\times N$且$N = n^{1+\alpha}$时,Butucea等人(2013)的经典工作确定了尖锐检测边界,在该边界附近极小极大风险收敛于$0$或$1$。本文通过在整个两变量相图中确定极小极大风险的精确渐近速率,扩展了该零一理论。在检测边界之上,我们确定了风险拉伸或超指数衰减的精确指数。在边界之下(风险趋于1),我们确定了收敛速率的精确多项式阶(绝对乘法常数范围内)。在这两种区域中,尖锐渐近形式在直线$\alpha+\delta=1/2$附近发生变化,其中$\delta$表示到边界的带符号距离。最后,在检测边界上,我们证明在非常稀疏的情况下($n$固定,$N\to\infty$),极小极大风险收敛于非退化常数$\frac12$。这些速率中的每一个对应于适当校准的扫描或求和检验的风险,从而得出上界。为证明这些界的尖锐性,我们依赖于根据特定区域精心选择的随机变量的精细二阶矩方法。我们的结果也推广到张量设定。

英文摘要

We study the minimax risk for detecting a sparse elevated-mean Gaussian submatrix inside a larger noisy matrix. When the planted submatrix has size $n\times n$ and the ambient matrix has size $N\times N$ with $N = n^{1+α}$, the classical work of Butucea and Ingster (2013) identifies the sharp detection boundary around which the minimax risk converges to $0$ or $1$. This paper extends that zero-one theory by determining the precise asymptotic rate of the minimax risk throughout a two-variable phase diagram. Above the detection boundary, we determine the precise exponent for the stretched or super-exponential decay of the risk. Below the boundary, where the risk tends to 1, we identify the exact polynomial order of the rate of convergence up to absolute multiplicative constants. In both of these regimes, the form of the sharp asymptotics changes around the line $α + δ = 1/2$, where $δ$ indicates the signed distance from the boundary. Finally, on the detection boundary, we show that the minimax risk converges to the non-degenerate constant $\frac12$ in the very sparse case where $n$ remains fixed and $N \to \infty$. Each of these rates corresponds to the risk of a suitably calibrated scan or sum test, whence follow the upper bounds. To show the sharpness of these bounds, we rely on refined second-moment methods applied to random variables chosen carefully according to the particular regime. Our results also extend to the tensor setting.
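For readers unfamiliar with the two tests named in the abstract, here is a minimal sketch of a sum statistic and a scan statistic on an $N\times N$ matrix. The normalizations (dividing by $N$ and by $n$, so each statistic has unit variance under i.i.d. standard Gaussian noise) and the function names are assumptions; the calibrated thresholds from the paper are not reproduced.

```python
from itertools import combinations


def sum_statistic(Y):
    """Total of all entries of an N x N matrix, scaled by 1 / N."""
    N = len(Y)
    return sum(sum(row) for row in Y) / N


def scan_statistic(Y, n):
    """Largest n x n submatrix sum, scaled by 1 / n (brute force over rows)."""
    N = len(Y)
    best = float("-inf")
    for rows in combinations(range(N), n):
        # For fixed rows, the best n columns are those with the largest sums.
        col_sums = sorted(
            (sum(Y[i][j] for i in rows) for j in range(N)), reverse=True
        )
        best = max(best, sum(col_sums[:n]) / n)
    return best


# Hypothetical noiseless example: a 2 x 2 block of fives planted in a 3 x 3 matrix.
Y = [[5.0, 5.0, 0.0],
     [5.0, 5.0, 0.0],
     [0.0, 0.0, 0.0]]
t_sum = sum_statistic(Y)
t_scan = scan_statistic(Y, n=2)
```

The brute-force scan enumerates every row subset, so it is only usable for very small matrices; it is meant to show what the statistic measures, not how to compute it at scale.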

2603.29969 2026-06-08 stat.OT 版本更新

Hilbert's Sixth Problem and Soft Logic

希尔伯特第六问题与软逻辑

Moshe Klein, Oren Fivel

AI总结 本文提出基于软逻辑和软数的概率框架,解决经典概率理论中微观状态概率为零的难题,并探讨其对统计力学和希尔伯特第六问题的影响,同时引入无穷小概率公理和软数在处理不可能事件概率时的应用。

详情
Comments
20 pages, 8 figures, 1 table, 1 Python code
AI中文摘要

希尔伯特第六问题要求对物理进行公理化,特别是从微观机械原理推导宏观统计定律。在经典概率理论中,连续空间中每个微观状态的概率都为零,这构成了概念上的困难。本文介绍了一种基于软逻辑和软数的概率框架,其中点事件具有无穷小的软概率而非经典零。我们展示了软概率可以被视为经典概率的无穷小细化,并讨论其对统计力学和希尔伯特第六问题的影响。此外,我们还严谨地展示了如何基于软数构建莫比乌斯带,并讨论这种基于软数的莫比乌斯带表示如何更深入地理解希尔伯特第六问题的本质和特征。受经典概率坍缩为零的启发,我们建议将无穷小概率公理加入 Kolmogorov 的五个概率公理之中。此外,我们还提出了一种基于软数的概率框架,用于分配离散随机变量外延支持的不可能事件的概率值,这种分配基于对帕斯卡三角形的扩展,使其能够包含基于负数阶乘的软零。

英文摘要

Hilbert's sixth problem calls for the axiomatization of physics, particularly the derivation of macroscopic statistical laws from microscopic mechanical principles. A conceptual difficulty arises in classical probability theory: in continuous spaces every individual microstate has probability zero. In this paper, we introduce a probabilistic framework based on Soft Logic and Soft Numbers in which point events possess infinitesimal Soft probabilities rather than the classical zero. We show that Soft probability can be interpreted as an infinitesimal refinement of classical probability and discuss its implications for statistical mechanics and Hilbert's sixth problem. In addition, we show rigorously how to construct a Möbius strip based on Soft Numbers, and we discuss how this Möbius strip representation allows for a deeper understanding of the nature and character of Hilbert's sixth problem. Motivated by this collapse of classical probability to zero, we suggest adding an axiom of infinitesimal probability to the list of Kolmogorov's five probability axioms. Furthermore, we suggest a probabilistic framework based on Soft Numbers for assigning values to the probabilities of impossible events of a discrete random variable, that is, realizations outside its support (which, in ordinary probability, collapse to zero). This assignment of Soft Number values is based on an extension of the Pascal triangle that places soft zeros outside the regular Pascal triangle (with real values), using factorials of negative numbers.

2109.02644 2026-06-08 math.PR stat.ML 版本更新

Resolvent convergence for sample covariance matrices with general covariance profiles and quadratic-form control

具有一般协方差轮廓和二次型控制的样本协方差矩阵的预解收敛性

Cosme Louart

AI总结 研究独立但非同分布列随机矩阵的预解式收敛性,通过二次型矩控制误差,给出迹的确定性等价逼近。

详情
Comments
Main text 38p
AI中文摘要

我们研究预解式 \[ G^z = \left(\frac{1}{n}XX^T - zI_p\right)^{-1}, \qquad z\in\mathbb C,\ \Im(z)>0, \] 其中 $X=(x_1,\ldots,x_n)\in\mathcal M_{p,n}$ 是一个具有独立但不一定同分布列的随机矩阵。我们的界用中心化二次型 \[ q_i(A):=x_i^TAx_i-\mathbb E[x_i^TAx_i] \] 的矩表示,其中 $A$ 是单位 Hilbert--Schmidt 范数的确定性矩阵。特别地,我们不假设给定列 $x_i$ 的元素之间独立。在准渐近区域 $p\le O(n)$ 中,矩阵 $G^z$ 有一个自然的确定性等价 $\tilde G^z$,仅依赖于列向量 $x_1,\ldots,x_n$ 的二阶矩。我们证明,对于任意确定性矩阵 $B\in\mathcal M_p$,迹 $\text{Tr}(BG^z)$ 接近于 $\text{Tr}(B\tilde G^z)$,误差在二次型的一阶矩界下由 $\|B\|_{\text{HS}}$ 控制,在适当的二阶矩界下由 $\|B\|_{\text{HS}}/\sqrt n$ 控制。

英文摘要

We study the resolvent \[ G^z = \left(\frac{1}{n}XX^T - zI_p\right)^{-1}, \qquad z\in\mathbb C,\ \Im(z)>0, \] where $X=(x_1,\ldots,x_n)\in\mathcal M_{p,n}$ is a random matrix with independent, but not necessarily identically distributed, columns. Our bounds are expressed in terms of moments of the centered quadratic forms \[ q_i(A):=x_i^TAx_i-\mathbb E[x_i^TAx_i], \] for deterministic matrices $A$ with unit Hilbert--Schmidt norm. In particular, we do not assume independence between the entries of a given column $x_i$. In the quasi-asymptotic regime $p\le O(n)$, the matrix $G^z$ admits a natural deterministic equivalent $\tilde G^z$, depending only on the second moments of the column vectors $x_1,\ldots,x_n$. We show that, for any deterministic matrix $B\in\mathcal M_p$, the trace $\text{Tr}(BG^z)$ is close to $\text{Tr}(B\tilde G^z)$, with error controlled by $\|B\|_{\text{HS}}$ under first-moment bounds on the quadratic forms, and by $\|B\|_{\text{HS}}/\sqrt n$ under suitable second-moment bounds.

2604.26535 2026-06-08 stat.ME cs.NA math.NA 版本更新

ARMA approximation of a Non-separable Spatio-Temporal Model with Fractional Smoothnesses in Space and Time

具有空间和时间分数平滑度的非分离时空模型的ARMA逼近

S. Knutsen Furset, Geir-Arne Fuglstad, Espen R. Jakobsen

AI总结 针对具有分数平滑度的非分离时空模型,提出基于时间有理逼近的离散化方法,得到VARMA过程,证明协方差函数逐点收敛并给出收敛速率,通过数值验证和模拟研究展示低阶VARMA的精度及参数估计能力。

详情
AI中文摘要

Matérn协方差模型在空间建模中无处不在,但在时空建模中没有默认选择。本文考虑最近提出的基于扩散的空间Matérn协方差模型扩展到时空非分离协方差模型,该模型允许空间和时间上的分数平滑度。该模型通过时空分数随机偏微分方程描述,但当前提出的计算方法对时间上的可能平滑度有严格限制。我们提出一种基于时间有理逼近的离散化方法来处理任意平滑度,这导致向量自回归移动平均过程(VARMA)。我们证明了逼近的协方差函数逐点收敛,确定了作为空间和时间分辨率以及有理逼近精度的函数的显式收敛速率,并进行了数值验证以展示低阶VARMA过程的微小逐点误差。通过模拟研究,我们证明了参数可以被估计回来,并且正确指定时间平滑度对于预测尤其重要。该方法应用于法国大陆三个月的日平均温度数据。

英文摘要

The Matérn covariance model is ubiquitous in spatial modelling, but there is no default choice for spatio-temporal modelling. In this paper, we consider the recently proposed "diffusion-based" extension of the spatial Matérn covariance model to a spatio-temporal non-separable covariance model that allows fractional smoothnesses in space and in time. The model is described in terms of a space-time fractional stochastic partial differential equation, but currently proposed computational approaches have strong restrictions on the possible smoothnesses in time. We propose a discretization method based on rational approximations in time to handle arbitrary smoothnesses, which leads to a vector autoregressive moving average process (VARMA). We prove that the covariance function of the approximation converges pointwise, determine explicit convergence rates as a function of spatial and temporal resolutions and the accuracy of the rational approximation, and conduct numerical verification to demonstrate small pointwise error for low orders of the VARMA process. Through a simulation study, we demonstrate that the parameters can be recovered and that correctly specifying the temporal smoothness is especially important for forecasting. The approach is illustrated for three months of daily mean temperatures in mainland France.

2604.21407 2026-06-08 cs.LG stat.CO stat.ML 版本更新

Even More Guarantees for Variational Inference in the Presence of Symmetries

变分推断在对称性存在下的更多保证

Lena Zellinger, Antonio Vergari

AI总结 本文扩展了变分推断在目标对称性下的鲁棒性理论,证明了使用前向KL散度和α-散度时,即使模型误设也能精确恢复目标均值和相关矩阵,并放宽了对数凹假设,适用于多模态分布。

详情
AI中文摘要

当通过变分推断(VI)近似一个难以处理的密度时,变分族通常被选为一个简单的参数族,很可能不包含目标。这引发了一个问题:在模型误设的情况下,我们能在什么条件下恢复目标的特征?在这项工作中,我们在两个重要方面扩展了先前关于位置-尺度族在目标对称性下鲁棒VI的理论结果:(1)我们通过提供使用前向Kullback-Leibler散度和α-散度时精确恢复目标均值和相关矩阵的充分条件,将它们开放给更广泛的散度。(2)通过这样做,我们发现可以放弃先前工作中做出的对数凹目标的限制性假设,从而允许我们为更广泛的目标(包括多模态目标)提供保证。在我们的实验中,我们展示了我们的保证如何作为选择变分族和α值的指南,并通过一组多样化的例子说明了在缺乏我们的充分条件时优化如何以及为何会失败。

英文摘要

When approximating an intractable density via variational inference (VI), the variational family is typically chosen as a simple parametric family that very likely does not contain the target. This raises the question: Under which conditions can we recover characteristics of the target despite misspecification? In this work, we extend previous theoretical results on robust VI with location-scale families under target symmetries in two substantial ways: (1) We open them up to a wider range of divergences by providing sufficient conditions for exact recovery of the target mean and correlation matrix when using the forward Kullback-Leibler divergence and $α$-divergences. (2) By doing so, we find that we can drop the restrictive assumption of a log-concave target made in previous work, allowing us to give guarantees for a wider range of targets, including multi-modal ones. In our experiments, we show how our guarantees can serve as guidelines for the choice of the variational family and $α$-value, and we illustrate on a diverse set of examples how and why optimization can fail in the absence of our sufficient conditions.

2512.01667 2026-06-08 stat.ME stat.CO 版本更新

Detecting Model Misspecification in Bayesian Inverse Problems via Variational Gradient Descent

通过变分梯度下降检测贝叶斯逆问题中的模型误设定

Qingyang Liu, Matthew A. Fisher, Zheyang Shen, Xuebin Zhao, Katherine Tant, Andrew Curtis, Chris J. Oates

AI总结 提出一种通过比较标准贝叶斯后验与预测导向后验来检测模型误设定的诊断方法,并基于变分梯度下降实现高效数值算法。

详情
Comments
Several improvements to the text and fixed a typo in the statement of Theorem 2.3
AI中文摘要

当统计模型设定正确时,贝叶斯推断是最优的;而在设定错误的情况下,贝叶斯推断可能灾难性地失败。因此,人们提出了大量后贝叶斯方法。预测导向(PrO)方法将统计模型 $P_θ$ 提升为(无限)混合模型 $\int P_θ\; \mathrm{d}Q(θ)$,并通过最小化熵正则化目标泛函来拟合该预测分布。在设定正确的情况下,期望混合分布 $Q$ 在大数据极限下集中于真实数据生成参数;而如果模型误设定,则通常不会观察到这种奇异集中。我们的贡献在于证明,通过比较标准贝叶斯后验与 PrO“后验”$Q$,可以经验性地检测模型误设定,为标准贝叶斯工作流提供一种新颖且广泛适用的诊断工具。为实现这一目标,我们提出了一种基于变分梯度下降的高效数值算法。模拟研究以及涉及地震学中贝叶斯逆问题的更详细案例研究证实,使用该框架可以自动检测模型误设定。

英文摘要

Bayesian inference is optimal when the statistical model is well-specified, while outside this setting Bayesian inference can catastrophically fail; accordingly a wealth of post-Bayesian methodologies have been proposed. Predictively oriented (PrO) approaches lift the statistical model $P_θ$ to an (infinite) mixture model $\int P_θ\; \mathrm{d}Q(θ)$ and fit this predictive distribution via minimising an entropy-regularised objective functional. In the well-specified setting one expects the mixing distribution $Q$ to concentrate around the true data-generating parameter in the large data limit, while such singular concentration will typically not be observed if the model is misspecified. Our contribution is to demonstrate that one can empirically detect model misspecification by comparing the standard Bayesian posterior to the PrO "posterior" $Q$, providing a novel and widely applicable diagnostic tool for the standard Bayesian workflow. To operationalise this, we present an efficient numerical algorithm based on variational gradient descent. A simulation study, and a more detailed case study involving a Bayesian inverse problem in seismology, confirm that model misspecification can be automatically detected using this framework.

2604.03146 2026-06-08 stat.ML cs.LG 版本更新

Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

高维经验风险最小化中高斯普适性破坏的表征

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

AI总结 通过将凸高斯极小极大定理推广到非高斯数据,刻画了高维经验风险最小化估计量的渐近分布,揭示了高斯普适性的适用范围与局限。

详情
Journal ref
ICML 2026
Comments
28 pages, 5 figures, 1 table
AI中文摘要

我们研究了一般非高斯数据设计下的高维凸经验风险最小化(ERM)。通过启发式地将凸高斯极小极大定理(CGMT)扩展到非高斯设置,我们推导出关键统计量的渐近极小极大表征,从而能够近似ERM估计量 $\hat{\theta}$ 的均值 $\mu_{\hat{\theta}}$ 和协方差 $C_{\hat{\theta}}$。具体地,在数据矩阵的集中假设以及损失和正则化子的标准正则性条件下,我们证明:对于独立于训练数据的测试协变量 $x$,投影 $\hat{\theta}^\top x$ 近似遵循 $\mu_{\hat{\theta}}^\top x$ 的一般非高斯分布与一个独立中心高斯变量(方差为 $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$)的卷积。这一结果阐明了ERM高斯普适性的范围和局限。此外,我们证明任何 $\mathcal{C}^2$ 正则化子渐近等价于一个由其零点的Hessian矩阵和 $\mu_{\hat{\theta}}$ 处的梯度唯一确定的二次型。我们提供了跨不同损失和模型的数值模拟,以验证我们的理论预测和定性见解。

英文摘要

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $μ_{\hatθ}$ and covariance $C_{\hatθ}$ of the ERM estimator $\hatθ$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hatθ^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $μ_{\hatθ}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hatθ} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $μ_{\hatθ}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.

2603.20967 2026-06-08 stat.ML cs.LG math.ST stat.TH 版本更新

Hard labels sampled from sparse targets mislead rotation invariant algorithms

从稀疏目标采样的硬标签误导旋转不变算法

Avrajit Ghosh, Bin Yu, Manfred Warmuth, Peter Bartlett

AI总结 针对稀疏目标下的二分类问题,证明旋转不变算法(如逻辑损失梯度下降)的过风险下界为Ω((d-1)/n),而通过重参数化u_i v_i的非旋转不变算法可实现O(s log d / n)的上界。

详情
Journal ref
ICML-2026
AI中文摘要

最常见的机器学习设置之一是逻辑回归。在许多分类模型中,包括神经网络,最终预测是通过将逻辑链接函数应用于线性得分获得的。在二元逻辑回归中,反馈可以是软标签(对应于数据的真实条件概率,如在蒸馏中)或采样的硬标签(取值为$\pm 1$)。我们指出即使在特别有利的设置中也会出现一个基本问题,其中目标是学习形式为$\sigma(\mathbf{x}^{\top}\mathbf{w}^{\star})$的无噪声软目标。在过约束情况(即样本数$n$超过输入维度$d$)下,使用样本$(\mathbf{x}_i,\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star}))$足以恢复$\mathbf{w}^{\star}$,从而获得贝叶斯风险。然而,我们证明当样本由从相同条件分布$\sigma(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$采样的硬标签$y_i$标记,且$\mathbf{w}^{\star}$是$s$-稀疏时,旋转不变算法被证明是次优的:它们产生过风险$\Omega\!\left(\frac{d-1}{n}\right)$,而存在简单的非旋转不变算法,其过风险为$O(\frac{s\log d}{n})$。最简单的旋转不变算法是逻辑损失上的梯度下降(带早停)。针对稀疏目标实现上述上界的简单非旋转不变算法使用对权重$u_i,v_i$的梯度下降,其中线性权重$w_i$被重参数化为$u_i v_i$。

英文摘要

One of the most common machine learning setups is logistic regression. In many classification models, including neural networks, the final prediction is obtained by applying a logistic link function to a linear score. In binary logistic regression, the feedback can be either soft labels, corresponding to the true conditional probability of the data (as in distillation), or sampled hard labels (taking values $\pm 1$). We point out a fundamental problem that arises even in a particularly favorable setting, where the goal is to learn a noise-free soft target of the form $σ(\mathbf{x}^{\top}\mathbf{w}^{\star})$. In the over-constrained case (i.e., the number of samples $n$ exceeds the input dimension $d$) with examples $(\mathbf{x}_i,σ(\mathbf{x}_i^{\top}\mathbf{w}^{\star}))$, it is sufficient to recover $\mathbf{w}^{\star}$ and hence achieve the Bayes risk. However, we prove that when the examples are labeled by hard labels $y_i$ sampled from the same conditional distribution $σ(\mathbf{x}_i^{\top}\mathbf{w}^{\star})$ and $\mathbf{w}^{\star}$ is $s$-sparse, then rotation-invariant algorithms are provably suboptimal: they incur an excess risk $Ω\!\left(\frac{d-1}{n}\right)$, while there are simple non-rotation-invariant algorithms with excess risk $O(\frac{s\log d}{n})$. The simplest rotation-invariant algorithm is gradient descent on the logistic loss (with early stopping). A simple non-rotation-invariant algorithm for sparse targets that achieves the above upper bound uses gradient descent on the weights $u_i,v_i$, where the linear weight $w_i$ is now reparameterized as $u_iv_i$.
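The two algorithms contrasted in the abstract differ only in which parameters receive the gradient step. The sketch below shows one step of each on the logistic loss with $\pm 1$ labels; the function names, step size, and toy data are assumptions, and early stopping and initialization choices are not shown.

```python
import math


def logistic_loss(w, X, y):
    """Mean of log(1 + exp(-y_i * <w, x_i>)) over the sample."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        total += math.log1p(math.exp(-margin))
    return total / len(X)


def grad_w(w, X, y):
    """Gradient of the logistic loss with respect to the linear weights w."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        coef = -yi / (1.0 + math.exp(margin))
        for j, xj in enumerate(xi):
            g[j] += coef * xj / len(X)
    return g


def gd_step_w(w, X, y, lr):
    """Plain gradient descent on w (a rotation-invariant update)."""
    g = grad_w(w, X, y)
    return [wj - lr * gj for wj, gj in zip(w, g)]


def gd_step_uv(u, v, X, y, lr):
    """Gradient descent on (u, v) with w_i = u_i * v_i (chain rule per coordinate)."""
    w = [ui * vi for ui, vi in zip(u, v)]
    g = grad_w(w, X, y)
    u_new = [ui - lr * gi * vi for ui, vi, gi in zip(u, v, g)]
    v_new = [vi - lr * gi * ui for ui, vi, gi in zip(u, v, g)]
    return u_new, v_new


# Hypothetical toy data.
X = [[1.0, 0.0], [0.5, 1.0], [-1.0, 0.3]]
y = [1, 1, -1]
u0, v0 = [0.5, 0.5], [0.5, 0.5]
u1, v1 = gd_step_uv(u0, v0, X, y, lr=0.1)
w1 = gd_step_w([0.0, 0.0], X, y, lr=0.1)
```

In the reparameterized update each coordinate's step is scaled by the current size of $u_i$ and $v_i$, so coordinates near zero move slowly, which is one intuition for why this update is not rotation-invariant.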

2312.07762 2026-06-08 cs.LG cs.NA math.NA stat.AP 版本更新

Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology

临床问卷的可解释分解以识别精神病理学的潜在因素

Ka Chun Lam, Bridget W Mahony, Armin Raznahan, Francisco Pereira

AI总结 提出可解释性约束问卷分解(ICQF),一种非负矩阵分解方法,通过正则化提高因子可解释性和稳定性,并自动检测潜在维度,在真实数据中优于现有方法。

详情
AI中文摘要

精神病学研究旨在通过识别少量潜在因素来理解问卷数据中测量的行为精神病理学表现。虽然因子分析是传统工具,但所得因子可能不可解释,且可能受混杂变量影响。此外,缺失数据常见,通常需要显式插补。为克服这些限制,我们引入了可解释性约束问卷分解(ICQF),一种针对问卷数据正则化的非负矩阵分解方法。我们的方法旨在提高因子可解释性和解稳定性。我们提供了具有理论收敛保证的优化过程,以及自动准确检测潜在维度的程序。我们使用逼真的合成数据验证了这些程序。我们在两个独立数据集(健康大脑网络和青少年大脑认知发展研究)中展示了该方法在广泛使用的通用问卷中的有效性。具体而言,我们表明ICQF提高了领域专家定义的可解释性,同时保留了跨一系列障碍的诊断信息,并在较小数据集规模下优于竞争方法。这表明我们方法中的正则化与领域特征相匹配。ICQF的Python实现可在https://github.com/jefferykclam/ICQF获取。

英文摘要

Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable, and may also be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data. We demonstrate the effectiveness of our method in a widely used general-purpose questionnaire, in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. The Python implementation for ICQF is available at https://github.com/jefferykclam/ICQF.

2603.12507 2026-06-08 cs.LG math.OC stat.CO stat.ML 版本更新

Adaptive Conditional Forest Sampling for Spectral Risk Optimisation under Decision-Dependent Uncertainty

自适应条件森林采样用于决策依赖不确定性下的谱风险优化

Marcell T. Kurbucz

AI总结 提出ACFS框架,结合广义随机森林、CEM全局搜索和重加权聚焦增强,解决决策依赖分布下的谱风险最小化问题,在重尾和偏态基准上优于现有方法。

详情
Comments
18 pages, 3 figures, 10 tables
AI中文摘要

当不确定性分布依赖于决策时,最小化谱风险目标(定义为期望成本与条件风险价值(CVaR)的加权组合)具有挑战性,这使得代理建模和基于模拟的排序对尾部估计误差敏感。我们提出自适应条件森林采样(ACFS),一个四阶段模拟优化框架,集成了用于决策条件分布近似的广义随机森林、CEM引导的全局探索、秩加权聚焦增强以及代理到真实的两阶段重排序,然后进行多起点梯度优化。我们在两个结构不同的数据生成过程上评估ACFS:具有决策依赖学生t边际的高斯copula和具有对数正态边际的高斯copula,在三种惩罚权重配置和每种设置100次重复下,对每种方法可用的真实分布oracle抽取次数设置共同上限。在第二个基准测试中,ACFS在每个配置下均实现了最低的中位数oracle谱风险,中位数差距相对于GP-BO在8.6%到21.8%之间。在第一个基准测试中,ACFS和GP-BO在中位数目标上统计上无显著差异,但在较高惩罚权重下,ACFS相对于GP-BO将跨重复离散度降低了约1.9到2.5倍,在最低权重下接近持平,在第二个基准测试中整体降低了1.7到2.3倍,表明运行间可靠性显著提高。ACFS在几乎所有设置中也优于CEM-SO、SGD-CVaR和KDE-SO,而消融和敏感性分析支持设计的鲁棒性,并表明各组件贡献在偏斜的对数正态基准上最为显著。

英文摘要

Minimising a spectral risk objective, defined as a weighted combination of expected cost and Conditional Value-at-Risk (CVaR), is challenging when the uncertainty distribution is decision-dependent, making both surrogate modelling and simulation-based ranking sensitive to tail estimation error. We propose Adaptive Conditional Forest Sampling (ACFS), a four-phase simulation-optimisation framework that integrates Generalised Random Forests for decision-conditional distribution approximation, CEM-guided global exploration, rank-weighted focused augmentation, and surrogate-to-oracle two-stage reranking before multi-start gradient-based refinement. We evaluate ACFS on two structurally distinct data-generating processes: a Gaussian copula with decision-dependent Student-t marginals and a Gaussian copula with log-normal marginals, across three penalty-weight configurations and 100 replications per setting, under a common cap on the number of true-distribution oracle draws available to each method. ACFS achieves the lowest median oracle spectral risk on the second benchmark in every configuration, with median gaps over GP-BO ranging from 8.6% to 21.8%. On the first benchmark, ACFS and GP-BO are statistically indistinguishable in median objective, but ACFS reduces cross-replication dispersion relative to GP-BO by approximately 1.9 to 2.5 times at the higher penalty weights, with near-parity at the lowest, and by 1.7 to 2.3 times throughout on the second benchmark, indicating materially improved run-to-run reliability. ACFS also outperforms CEM-SO, SGD-CVaR, and KDE-SO in nearly all settings, while ablation and sensitivity analyses support the robustness of the design and indicate that component contributions are most pronounced on the skewed log-normal benchmark.
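The objective described in the abstract, a weighted combination of expected cost and CVaR, can be estimated from simulated costs as in the sketch below. The convex-combination form with a single `weight`, the empirical tail-average CVaR estimator, and the function names are assumptions; the paper's exact penalty-weight parameterization is not given in the abstract.

```python
import math


def cvar(costs, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of costs."""
    ordered = sorted(costs, reverse=True)
    tail = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:tail]) / tail


def spectral_risk(costs, alpha=0.9, weight=0.5):
    """Weighted combination of the mean cost and CVaR at level alpha."""
    mean = sum(costs) / len(costs)
    return (1.0 - weight) * mean + weight * cvar(costs, alpha)


# Hypothetical cost sample: 1, 2, ..., 10.
costs = [float(c) for c in range(1, 11)]
tail_mean = cvar(costs, alpha=0.8)
risk = spectral_risk(costs, alpha=0.8, weight=0.5)
```

Because the CVaR term averages only a few tail samples, its estimate is noisier than the mean, which is the sensitivity to tail estimation error that the abstract points to.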

2603.04109 2026-06-08 econ.EM stat.ML 版本更新

Testing Full Mediation of Treatment Effects and the Identifiability of Causal Mechanisms

治疗效应的完全中介检验与因果机制的可识别性

Martin Huber, Kevin Kloiber, Lukáš Lafférs

AI总结 提出检验随机分配治疗是否完全通过中介变量影响结果,以及不同中介的因果机制是否可识别,并扩展至非随机治疗情形。

详情
AI中文摘要

在因果分析中,理解干预或治疗影响结果的因果机制通常是核心关注点。我们提出一个检验,以评估(i) 在协变量条件下随机分配的治疗的因果效应是否完全由观测到的中间结果(称为中介或替代结果)中介,或仅通过这些中间结果运作,以及(ii) 通过不同中介运作的各种因果机制是否在协变量条件下可识别。我们证明,如果完全中介和因果机制的可识别性都成立,那么条件随机治疗在给定中介和协变量的条件下与结果条件独立。此外,我们将框架扩展到非随机分配治疗的情形。我们表明,在这种情况下,完全中介仍然可检验,而因果机制的可识别性不再有保证。我们提出一个双重机器学习框架来实现该检验,该框架可以纳入高维协变量,并在特定正则条件下具有根n一致性和渐近正态性。我们还通过一个模拟研究展示了我们方法良好的有限样本性能,并提供了两个实证应用,重新审视了关于产妇心理健康和社会规范的随机实验。

英文摘要

In causal analysis, understanding the causal mechanisms through which an intervention or treatment affects an outcome is often of central interest. We propose a test to evaluate (i) whether the causal effect of a treatment that is randomly assigned conditional on covariates is fully mediated by, or operates exclusively through, observed intermediate outcomes (referred to as mediators or surrogate outcomes), and (ii) whether the various causal mechanisms operating through different mediators are identifiable conditional on covariates. We demonstrate that if both full mediation and identification of causal mechanisms hold, then the conditionally random treatment is conditionally independent of the outcome given the mediators and covariates. Furthermore, we extend our framework to settings with non-randomly assigned treatments. We show that, in this case, full mediation remains testable, while identification of causal mechanisms is no longer guaranteed. We propose a double machine learning framework for implementing the test that can incorporate high-dimensional covariates and is root-n consistent and asymptotically normal under specific regularity conditions. We also present a simulation study demonstrating good finite-sample performance of our method, along with two empirical applications revisiting randomized experiments on maternal mental health and social norms.

2512.13246 2026-06-08 math.NA cs.NA math.ST stat.TH 版本更新

A geometric $q$-analogue of Hamiltonian Monte Carlo

哈密顿蒙特卡洛的几何 $q$-模拟

Xiaomei Yang, Zhiliang Deng

AI总结 提出哈密顿蒙特卡洛的几何 $q$-模拟,通过 $q$-微积分中的 $q$-变形哈密顿系统替代经典哈密顿动力学,构造 Metropolis 校正的 $q$-HMC 算法,并证明其满足细致平衡。数值实验表明,对于正尺度黑箱目标,$q$-HMC 具有优势。

详情
AI中文摘要

哈密顿蒙特卡洛 (HMC) 通过将哈密顿动力学与 Metropolis 校正相结合,生成高效的马尔可夫转移。本文通过将经典哈密顿动力学替换为来自 $q$-微积分的 $q$-变形哈密顿系统,发展了 HMC 的几何 $q$-模拟。从拉格朗日形式出发,我们推导出相应的 $q$-哈密顿方程,并证明了在 $q$-变形微分学中相关 $q$-辛形式的形式不变性。为了获得可计算的采样器,我们引入了 Jackson 导数实现,并构建了 Metropolis 校正的 $q$-HMC 算法。该提议在 $q\to1$ 时退化为经典 HMC,而当 $q\neq1$ 时,它将普通导数替换为 $q$-Jackson 有限差分。我们建立了细致平衡,确保生成的马尔可夫转移保持目标分布。数值实验检验了所提方法的计算行为。对于正尺度黑箱目标,$q$-Jackson 力具有尺度一致的解释:$s>0$ 的乘法扰动对应于 $y=\log s$ 中的中心有限差分。在此类例子中,$q$-HMC 紧密跟踪对数坐标有限差分 HMC 和精确梯度基准,而原始加法有限差分可能产生大的力和哈密顿误差。这些结果表明,所提出的 $q$-模拟为 HMC 型采样提供了一个有效的框架,对于正和乘法黑箱目标具有明显优势。

英文摘要

Hamiltonian Monte Carlo (HMC) generates efficient Markov transitions by combining Hamiltonian dynamics with a Metropolis correction. This paper develops a geometric \(q\)-analogue of HMC by replacing classical Hamiltonian dynamics with a \(q\)-deformed Hamiltonian system arising from \(q\)-calculus. Starting from a Lagrangian formulation, we derive the corresponding \(q\)-Hamiltonian equations and prove the formal invariance of the associated \(q\)-symplectic form within the \(q\)-deformed differential calculus. To obtain a computable sampler, we introduce a Jackson-derivative realization and construct a Metropolis-corrected \(q\)-HMC algorithm. The proposal reduces to classical HMC as \(q\to1\), while for \(q\neq1\) it replaces ordinary derivatives by \(q\)-Jackson finite differences. We establish detailed balance, which ensures that the resulting Markov transition preserves the target distribution. Numerical experiments examine the computational behavior of the proposed method. For positive-scale black-box targets, the \(q\)-Jackson force has a scale-consistent interpretation: multiplicative perturbations of \(s>0\) correspond to centered finite differences in \(y=\log s\). In such examples, \(q\)-HMC closely tracks log-coordinate finite-difference HMC and the exact-gradient benchmark, whereas raw additive finite differences may produce large force and Hamiltonian errors. These results suggest that the proposed \(q\)-analogue provides a valid HMC-type sampling framework with a visible advantage for positive and multiplicative black-box targets.
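To make the finite-difference reading of the $q$-derivative concrete, the sketch below implements the standard Jackson derivative $D_q f(x) = \frac{f(qx) - f(x)}{(q-1)x}$ and a symmetric variant. Writing $g(y) = f(e^y)$ and $h = \log q$, the symmetric numerator equals $g(y+h) - g(y-h)$, a centered difference in log coordinates. Which variant the paper's sampler uses is not stated in the abstract, so both function names and the choice to show both are assumptions.

```python
def jackson_derivative(f, x, q):
    """Standard Jackson q-derivative: (f(qx) - f(x)) / ((q - 1) x)."""
    if x == 0 or q == 1:
        raise ValueError("requires x != 0 and q != 1")
    return (f(q * x) - f(x)) / ((q - 1.0) * x)


def symmetric_q_derivative(f, x, q):
    """Symmetric q-derivative: (f(qx) - f(x/q)) / ((q - 1/q) x)."""
    if x == 0 or q == 1:
        raise ValueError("requires x != 0 and q != 1")
    return (f(q * x) - f(x / q)) / ((q - 1.0 / q) * x)


def square(x):
    return x * x


d_forward = jackson_derivative(square, 3.0, 1.1)      # equals (q + 1) * x
d_symmetric = symmetric_q_derivative(square, 3.0, 1.1)  # equals (q + 1/q) * x
d_near_one = jackson_derivative(square, 3.0, 1.0 + 1e-6)
```

For $f(x) = x^2$ the Jackson derivative is $(q+1)x$, which tends to the ordinary derivative $2x$ as $q \to 1$, consistent with the abstract's statement that the proposal reduces to classical HMC in that limit.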

2602.21132 2026-06-08 stat.ME 版本更新

Robust and Sparse Generalized Linear Models for High-Dimensional Data via Maximum Mean Discrepancy

基于最大均值差异的高维数据鲁棒稀疏广义线性模型

Xiaoning Kang, Lulu Kang

AI总结 针对高维数据中的异常值和重尾噪声,提出基于最大均值差异(MMD)的惩罚广义线性模型,通过ℓ1惩罚和ADMM算法实现鲁棒估计与变量选择,在模拟中优于传统方法。

详情
Comments
22 pages, 5 tables, 2 figures
AI中文摘要

高维数据集经常受到异常值和重尾噪声的污染,这可能会严重偏倚如Lasso等标准正则化估计量。尽管最大均值差异(MMD)最近被引入作为鲁棒回归的“通用”框架,但其在高维广义线性模型(GLM)中的应用仍未得到充分探索,特别是在变量选择方面。在本文中,我们提出了一种用于GLM中鲁棒估计和特征选择的惩罚MMD框架。我们引入了一个ℓ1惩罚的MMD目标,并开发了两种版本的估计量:一个完整的O(n²)版本和一个计算高效的O(n)近似版本。为了解决由此产生的非凸优化问题,我们采用了一种基于交替方向乘子法(ADMM)结合AdaGrad的算法。通过涉及高斯线性回归和二元逻辑回归的广泛模拟研究,我们证明了所提出的方法与经典惩罚GLM和现有鲁棒基准方法相比具有很强的竞争力。我们的方法在保持估计精度和变量选择之间的平衡方面表现出特别的韧性,特别是在处理高杠杆点和重尾误差分布时,传统方法的性能可能会波动。

英文摘要

High-dimensional datasets are frequently subject to contamination by outliers and heavy-tailed noise, which can severely bias standard regularized estimators like the Lasso. While Maximum Mean Discrepancy (MMD) has recently been introduced as a "universal" framework for robust regression, its application to high-dimensional Generalized Linear Models (GLMs) remains largely unexplored, particularly regarding variable selection. In this paper, we propose a penalized MMD framework for robust estimation and feature selection in GLMs. We introduce an $\ell_1$-penalized MMD objective and develop two versions of the estimator: a full $O(n^2)$ version and a computationally efficient $O(n)$ approximation. To solve the resulting non-convex optimization problem, we employ an algorithm based on the Alternating Direction Method of Multipliers (ADMM) combined with AdaGrad. Through extensive simulation studies involving Gaussian linear regression and binary logistic regression, we demonstrate that our proposed methods are highly competitive with classical penalized GLMs and existing robust benchmarks. Our approach shows particular resilience in maintaining a balance between estimation accuracy and variable selection across diverse contamination scenarios, especially in handling high-leverage points and heavy-tailed error distributions where traditional methods may fluctuate in performance.
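The quadratic cost mentioned in the abstract comes from summing a kernel over all pairs of points. The sketch below computes a generic biased (V-statistic) estimate of squared MMD between two one-dimensional samples with a Gaussian kernel. It is a textbook estimator shown under assumed names and an assumed bandwidth, not the paper's penalized GLM objective or its $O(n)$ approximation.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-(x - y)^2 / (2 * bandwidth^2)), bounded by 1."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased squared-MMD estimate; cost grows with the number of pairs."""

    def mean_kernel(a, b):
        total = sum(gaussian_kernel(s, t, bandwidth) for s in a for t in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical samples: two well-separated clusters.
near_zero = [0.0, 0.1, -0.1]
near_five = [5.0, 5.1, 4.9]
same = mmd2_biased(near_zero, near_zero)
apart = mmd2_biased(near_zero, near_five)
```

Because the kernel is bounded, a single extreme observation can shift the estimate by only a limited amount, which is the usual intuition behind the robustness of MMD-based estimators.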

2509.11208 2026-06-08 stat.ML cs.LG 版本更新

Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication

可预测的压缩失败:基于证据的二元裁决的顺序敏感性与信息预算

Leon Chlon, Ahmed Karim, Maggie Chlon, MarcAntonio Awada

AI总结 研究证据顺序对基于Transformer的二元裁决模型的影响,提出QMV界和EDFL定律,通过信息充分率门控实现低幻觉率下的答案/弃权决策。

详情
AI中文摘要

用于基于证据的二元裁决(例如,支持/反驳、是/否或验证器支持的通过/失败决策)的Transformer可能对可交换证据呈现的顺序敏感,在验证器相关的伯努利谓词下产生跨排列的分散性和不可靠的尝试答案。我们将证据顺序视为一个干扰变量,并形式化了一个期望-实现差距:下一个词训练可以最小化顺序上的期望条件描述长度,而固定顺序仍保持位置敏感性。我们的量化鞅违反(QMV)界预测了由相邻秩位置敏感性引起的分散性,在调和区具有$O(\log n)$增长;我们的期望级解压定律(EDFL)将KL凸性/数据处理界专门化到伯努利谓词,产生信任比特(B2T)、幻觉风险(RoH)以及用于答案/弃权决策的信息充分率(ISR)门。在来自FEVER、HotpotQA、NQ-Open、PopQA和Controls的3,059个有依据项目上,我们观察到对数分散性和均匀排列混合的正Jensen增益。在一个预先指定的保留审计(528个项目)中,分析固定的ISR$=1$门实现了0.0-0.7%的幻觉率,20.6-27.9%的弃权率(95%置信区间),支持该操作点,但未声称对所有模型系列或不受限生成具有通用校准。

英文摘要

Transformers used for evidence-grounded binary adjudication (e.g., support/refute, yes/no, or verifier-backed pass/fail decisions) can be sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations and unreliable attempted answers under a verifier-relative Bernoulli predicate. We treat evidence order as a nuisance variable and formalize an expectation-realization gap: next-token training can minimize expected conditional description length over orderings while a fixed ordering remains position-sensitive. Our Quantified Martingale Violation (QMV) bound predicts the dispersion induced by adjacent-rank positional sensitivity, with $O(\log n)$ growth in the harmonic regime; our Expectation-level Decompression Law (EDFL) specializes a KL convexity/data-processing bound to Bernoulli predicates, yielding Bits-to-Trust (B2T), Risk-of-Hallucination (RoH), and an Information Sufficiency Ratio (ISR) gate for answer/abstain decisions. On 3,059 grounded items from FEVER, HotpotQA, NQ-Open, PopQA, and Controls, we observe logarithmic dispersion and positive Jensen gains from uniform permutation mixtures. In one pre-specified held-out audit (528 items), the analytically fixed ISR$=1$ gate attains 0.0-0.7% hallucination with 20.6-27.9% abstention (95% CIs), supporting the operating point without claiming universal calibration across all model families or unrestricted generation.

2602.06245 2026-06-08 stat.ML cs.LG 版本更新

Inheritance Between Feedforward and Convolutional Networks via Model Projection

前馈网络与卷积网络之间的继承关系:通过模型投影

Nicolas Ewen, Jairo Diaz-Rodriguez, Kelly Ramsay

AI总结 提出模型继承概念,证明广义前馈网络是广义卷积网络的子集,并通过模型投影实现反向继承,用于参数高效的迁移学习。

详情
AI中文摘要

神经网络技术通常通过类比在不同架构家族之间转移,但这种转移仅在技术所需假设被保留时才有效。我们将这一思想引入为模型类之间的继承。使用统一的节点级框架和张量值激活,我们证明广义前馈网络(GFFN)是广义卷积网络(GCNN)的严格子集,因此GCNN的性质直接转移到GFFN。反向方向并非自动:标准CNN节点使用空间核,而FFN节点对每个输入贡献使用一个标量权重。我们引入模型投影来恢复受限的反向继承路径。投影冻结每个卷积输入通道子函数,并为每个输入-输出通道贡献学习一个标量系数,使投影后的CNN节点具有标量加权输入重组的GFFN风格可训练结构。这种继承结构自然导致参数高效的迁移学习。在多个ImageNet预训练CNN骨干网络和下游图像分类数据集上,模型投影与标准和PEFT基线竞争,并为后续全微调提供有效的初始化。

英文摘要

Neural-network techniques are often transferred across architecture families by analogy, but such transfer is valid only when the assumptions required by a technique are preserved. We introduce this idea as inheritance between model classes. Using a unified node-level framework with tensor-valued activations, we prove that generalized feedforward networks (GFFNs) form a strict subset of generalized convolutional networks (GCNNs), so GCNN properties transfer directly to GFFNs. The reverse direction is not automatic: standard CNN nodes use spatial kernels, while FFN nodes use one scalar weight per input contribution. We introduce model projection to recover a restricted reverse inheritance path. Projection freezes each convolutional input-channel sub-function and learns one scalar coefficient for each input-output channel contribution, giving projected CNN nodes the GFFN-style trainable structure of scalar-weighted input recombination. This inherited structure leads naturally to parameter-efficient transfer learning. Across multiple ImageNet-pretrained CNN backbones and downstream image-classification datasets, model projection is competitive with standard and PEFT baselines and provides an effective initialization for subsequent full fine-tuning.

2602.02819 2026-06-08 cs.LG stat.ML 版本更新

Causal Evaluation of Membership Inference Attacks

成员推断攻击的因果评估

Mathieu Even, Clément Berenfeld, Linus Bleistein, Tudor Cebere, Julie Josse, Aurélien Bellet

AI总结 将成员推断攻击评估视为因果推断问题,定义记忆化为包含数据点的因果效应,提出多轮、单轮和零轮设置下的实用估计器并验证其有效性。

详情
Comments
Fixed ref label problems
AI中文摘要

成员推断攻击(MIA)旨在区分训练点(成员)和未见数据(非成员),并广泛用于量化记忆化和评估隐私风险。标准MIA评估需要重复训练,对于大型模型计算成本高昂。单轮(单次训练,随机数据包含)和零轮(事后评估)方法常被用作替代,但其统计有效性尚不清楚。我们通过将MIA评估框架化为因果推断问题来填补这一空白,将\emph{记忆化定义为在训练集中包含一个数据点的因果效应}。这一新颖的表述揭示并形式化了现有协议中偏差的关键来源:单轮方法受到联合包含点之间的干扰,而零轮评估还受到成员与非成员评估数据之间分布偏移的混淆。我们推导了标准MIA指标的因果类比,并提出了多轮、单轮和零轮设置下的实用估计器,具有非渐近一致性保证。我们在多个设置中验证了我们的方法,包括预训练和微调的大型语言模型,表明它能够在无需重新训练且存在分布偏移的情况下可靠地测量MIA性能。总体而言,我们的框架为现代AI系统中的隐私评估提供了原则性基础。

英文摘要

Membership Inference Attacks (MIAs) aim to distinguish training points (members) from unseen data (non-members), and are widely used to quantify memorization and assess privacy risks. Standard MIA evaluation requires repeated retraining, which is computationally costly for large models. One-run (single training with randomized data inclusion) and zero-run (post hoc evaluation) methods are often used instead, but their statistical validity remains unclear. We address this gap by framing MIA evaluation as a causal inference problem, defining \emph{memorization as the causal effect of including a data point in the training set}. This novel formulation reveals and formalizes key sources of bias in existing protocols: one-run methods suffer from interference between jointly included points, while zero-run evaluations are additionally confounded by distribution shift between member and non-member evaluation data. We derive causal analogues of standard MIA metrics and propose practical estimators for multi-run, one-run, and zero-run regimes with non-asymptotic consistency guarantees. We validate our approach in several settings, including pretrained and fine-tuned LLMs, showing that it enables reliable measurement of MIA performance without retraining and under distribution shift. Overall, our framework provides a principled foundation for privacy evaluation in modern AI systems.

2601.13782 2026-06-08 math.ST cs.NA math.NA stat.TH 版本更新

Moving Least Squares without Quasi-Uniformity: A Stochastic Approach

无需拟均匀性的移动最小二乘法:一种随机方法

Shir Tapiro-Moshe, Yariv Aizenbud, Barak Sober

AI总结 针对随机采样破坏确定性假设的问题,提出移动最小二乘法的随机分析,证明在温和条件下经典收敛性和光滑性仍以高概率成立。

详情
AI中文摘要

局部多项式回归(LPR)和移动最小二乘法(MLS)是密切相关的非参数估计方法,分别独立发展于统计学和逼近理论。统计LPR分析侧重于在概率假设下克服采样噪声,而确定性MLS理论研究关于\textit{填充距离}(分辨率参数)的光滑性性质和收敛速度。尽管有相似性,MLS背后的确定性假设在随机采样下不成立。我们首先量化独立同分布随机样本的填充距离$h_n$和\textit{分离距离}$δ_n$的概率行为。即,对于满足温和正则性条件的分布,依概率有$h_n\propto n^{-1/d}\log^{1/d}(n)$和$δ_n \propto n^{-2/d}$。然后我们证明,对于$k-1$次MLS,与阶$|m|\le k-1$的微分算子$Q$相关的逼近误差以$h_n^{k-|m|}$衰减,建立了经典MLS估计的随机类比。此外,我们证明MLS逼近函数以高概率局部光滑。这项工作提供了MLS的第一个统一随机分析,表明尽管确定性采样假设失效,经典收敛性和光滑性性质在自然概率模型下仍然保持。

英文摘要

Local Polynomial Regression (LPR) and Moving Least Squares (MLS) are closely related nonparametric estimation methods, developed independently in statistics and approximation theory. While statistical LPR analysis focuses on overcoming sampling noise under probabilistic assumptions, the deterministic MLS theory studies smoothness properties and convergence rates with respect to the \textit{fill distance} (a resolution parameter). Despite this similarity, the deterministic assumptions underlying MLS fail to hold under random sampling. We begin by quantifying the probabilistic behavior of the fill distance $h_n$ and \textit{separation} $δ_n$ of an i.i.d. random sample. That is, for a distribution satisfying a mild regularity condition, $h_n\propto n^{-1/d}\log^{1/d}(n)$ and $δ_n \propto n^{-2/d}$ in probability. We then prove that, for MLS of degree $k-1$, the approximation error associated with a differential operator $Q$ of order $|m|\le k-1$ decays as $h_n^{k-|m|}$, establishing stochastic analogues of the classical MLS estimates. Additionally, we show that the MLS approximant is locally smooth with high probability. This work provides the first unified stochastic analysis of MLS, demonstrating that, despite the failure of deterministic sampling assumptions, the classical convergence and smoothness properties persist under natural probabilistic models.
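The two sampling quantities in this abstract can be made concrete with a small sketch. It is an illustration only, not the authors' code, and it rests on several assumptions: dimension $d=1$, uniform samples on $[0,1]$, the fill distance taken as the largest distance from any point of the interval to its nearest sample, and the separation taken as the smallest distance between two samples (some references use half of that). The function names are hypothetical.

```python
import random


def fill_and_separation_1d(points, lo=0.0, hi=1.0):
    """Fill distance and separation of a point set inside [lo, hi].

    Needs at least two distinct points.
    """
    xs = sorted(points)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    # Fill distance: the farthest any location in [lo, hi] can be from its
    # nearest sample. Candidates are the two boundary gaps and half of
    # every interior gap.
    fill = max([xs[0] - lo, hi - xs[-1]] + [g / 2 for g in gaps])
    # Separation: smallest distance between two neighbouring samples
    # (assumed convention; some texts halve this value).
    separation = min(gaps)
    return fill, separation


def mean_fill_and_separation(n, trials=50, seed=0):
    """Average both quantities over several i.i.d. uniform samples of size n."""
    rng = random.Random(seed)
    fills, seps = [], []
    for _ in range(trials):
        f, s = fill_and_separation_1d([rng.random() for _ in range(n)])
        fills.append(f)
        seps.append(s)
    return sum(fills) / trials, sum(seps) / trials


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mean_fill_and_separation(n))
```

Running the loop for growing `n` lets one compare the empirical decay with the rates quoted in the abstract for $d=1$: the separation shrinks much faster than the fill distance, which is why a random sample is not quasi-uniform.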

2601.17612 2026-06-08 stat.ME 版本更新

A Regularised Latent-Class Item Response Model for Detecting Measurement Non-Invariance in Ordinal Response Scales

用于检测有序反应量表中测量非不变性的正则化潜在类别项目反应模型

Gabriel Wallin, Qi Huang

AI总结 提出一种正则化潜在类别项目反应模型,通过ℓ1惩罚边际似然和EM算法检测有序量表上的差异项目功能,无需已知分组标签或锚定项目。

详情
AI中文摘要

当量表的心理测量属性在不同子组间存在差异时,会出现测量非不变性,从而削弱组间比较的有效性。在项目层面,这表现为差异项目功能(DIF),即在控制潜在特质后,项目反应在不同组间存在差异。本文开发了一个框架,用于检测有序量表中的DIF,无需已知组标签或锚定项目。我们构建了一个比例优势潜在类别项目反应模型,其中个体被概率性地分配到潜在类别。DIF通过类别特定的截距和斜率变化来捕捉,允许均匀和非均匀DIF。通过在稀疏性假设下使用ℓ1惩罚边际似然实现识别,并使用定制的EM算法进行估计。由于类别特定斜率使每个潜在类别的位置和尺度均无法识别,稀疏性在锚定潜在度量的同时选择DIF效应。模拟研究显示,项目参数及两种类型的DIF均能准确恢复。对一项人格测试的实证应用揭示了具有不同反应模式的潜在子组,并识别出显示潜在类别特定测量非不变性的项目。该框架为在比较组未观测或定义不明确时评估有序量表的测量不变性提供了一种灵活的方法。

英文摘要

Measurement non-invariance arises when the psychometric properties of a scale differ across subgroups, undermining the validity of group comparisons. At the item level, this manifests as differential item functioning (DIF), where item responses differ across groups after controlling for the latent trait. This paper develops a framework for detecting DIF in ordinal scales without requiring known group labels or anchor items. We formulate a proportional-odds latent-class item response model in which individuals are assigned probabilistically to latent classes. DIF is captured through class-specific intercept and slope shifts, allowing both uniform and non-uniform DIF. Identification is achieved through an \(\ell_1\)-penalised marginal likelihood under a sparsity assumption, with estimation implemented using a tailored EM algorithm. Because class-specific slopes leave both the location and scale of each latent class unidentified, sparsity anchors the latent metric while selecting DIF effects. Simulation studies demonstrate accurate recovery of item parameters and both types of DIF. An empirical application to a personality test reveals latent subgroups with distinct response patterns and identifies items displaying potential class-specific measurement non-invariance. The framework provides a flexible approach for assessing measurement invariance in ordinal scales when comparison groups are unobserved or poorly defined.

2601.05669 2026-06-08 stat.ME 版本更新

Two-Stage Robust Sparse Gradient Methods for Regression Under Heavy-Tailed Designs

重尾设计下回归的两阶段鲁棒稀疏梯度方法

Kaiyuan Zhou, Xiaoyu Zhang, Wenyang Zhang, Di Wang

AI总结 针对重尾协变量和噪声下的高维稀疏回归,提出两阶段RIGHT方法,利用逐坐标中位数均值梯度估计和延迟样本分裂,实现阶段自适应收敛,并揭示设计尾指数与噪声尾指数对梯度稳定性和统计速率的解耦影响。

详情
AI中文摘要

我们研究同时具有重尾协变量和噪声的高维稀疏回归。重尾数据以两种不同方式影响稀疏优化:极端协变量可能在全局定位期间破坏梯度场,而重尾噪声在局部细化期间限制最终统计精度。受这种两阶段结构的启发,我们提出两阶段RIGHT,一种基于逐坐标中位数均值(MoM)梯度估计和延迟样本分裂的鲁棒稀疏一阶方法。MoM梯度估计器计算简单,与硬阈值更新兼容,并允许阶段自适应浓度界,其速率取决于当前定位半径。延迟分裂在全局定位期间重用数据,并为较短的细化阶段保留新批次,从而降低样本分裂成本。理论结果揭示了解耦的速率结构:设计尾指数控制梯度稳定性和样本复杂度,而噪声尾指数控制最终统计速率。我们还提供了分阶段下界基准,表明设计驱动的定位障碍是内在的。广泛的模拟实验和真实数据分析展示了所提方法相对于现有竞争者的有效性。

英文摘要

We study high-dimensional sparse regression under simultaneous heavy-tailed covariates and noise. Heavy-tailed data affect sparse optimization in two different ways: extreme covariates can destabilize the gradient field during global localization, while heavy-tailed noise limits the final statistical accuracy during local refinement. Motivated by this two-phase structure, we propose two-stage RIGHT, a robust sparse first-order method based on coordinate-wise median-of-means (MoM) gradient estimation and delayed sample splitting. The MoM gradient estimator is computationally simple, compatible with hard-thresholded updates, and admits phase-adaptive concentration bounds whose rates depend on the current localization radius. Delayed splitting reuses data during global localization and reserves fresh batches for the shorter refinement stage, reducing the sample-splitting cost. The theoretical results reveal a decoupled rate structure: the design-tail index controls gradient stability and sample complexity, whereas the noise-tail index controls the final statistical rate. We also provide phase-wise lower-bound benchmarks showing that the design-driven localization barrier is intrinsic. Extensive simulation experiments and real data analysis showcase the efficacy of the proposed method over existing competitors.
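A minimal sketch of the two ingredients named in the abstract, coordinate-wise median-of-means gradient estimation and a hard-thresholded update, may help fix ideas. It is not the two-stage RIGHT procedure itself: the squared-error loss, the interleaved block assignment, the step size, and all function names are assumptions made for illustration, and delayed sample splitting is not implemented.

```python
from statistics import median


def mom_gradient(X, y, beta, num_blocks):
    """Coordinate-wise median-of-means estimate of the least-squares gradient.

    X is a list of rows, y a list of responses, beta the current iterate.
    Requires len(X) >= num_blocks so that no block is empty.
    """
    n, d = len(X), len(beta)
    block_means = []
    for b in range(num_blocks):
        idx = range(b, n, num_blocks)  # interleaved blocks (assumed scheme)
        g = [0.0] * d
        for i in idx:
            resid = sum(X[i][j] * beta[j] for j in range(d)) - y[i]
            for j in range(d):
                g[j] += resid * X[i][j] / len(idx)
        block_means.append(g)
    # Take the median across blocks separately for every coordinate.
    return [median(bm[j] for bm in block_means) for j in range(d)]


def hard_threshold(v, s):
    """Keep the s entries of v with largest magnitude and zero out the rest."""
    keep = set(sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s])
    return [v[j] if j in keep else 0.0 for j in range(len(v))]


def iht_step(X, y, beta, s, step, num_blocks):
    """One robust sparse first-order update: MoM gradient step, then thresholding."""
    g = mom_gradient(X, y, beta, num_blocks)
    return hard_threshold([b - step * gj for b, gj in zip(beta, g)], s)
```

Because the median is taken per coordinate, a single block contaminated by an extreme observation does not move the gradient estimate, which is the behaviour the abstract relies on during global localization.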

2505.21423 2026-06-08 cs.LG stat.ML 版本更新

Conflicting Biases at the Edge of Stability: Norm versus Sharpness Regularization

稳定性边缘的冲突偏差:范数与锐度正则化

Maria Matveev, Vit Fojtik, Hung-Hsu Chou, Gitta Kutyniok, Johannes Maly

AI总结 本文研究过参数化网络中梯度下降的隐式正则化,证明学习率在低范数与低锐度之间插值,且单一偏差不足以解释泛化,需考虑动态权衡。

详情
Comments
Accepted at ICML 2026
AI中文摘要

过参数化网络显著的泛化性能通常归因于隐式偏差,例如小学习率下的范数最小化和稳定性边缘(Edge-of-Stability)状态下的低锐度。在这项工作中,我们认为全面理解梯度下降的泛化性能需要分析这些不同形式的隐式正则化之间的相互作用。我们通过实验证明,学习率在训练模型的低参数范数和低锐度之间插值。此外,我们证明对于在简单回归任务上训练的对角线性网络,单独的隐式偏差都不能最小化泛化误差。这些发现表明,仅关注单一隐式偏差不足以解释良好的泛化,并促使我们采用更广阔的隐式正则化视角,捕捉由不可忽略的学习率引起的范数与锐度之间的动态权衡。

英文摘要

The remarkable generalization properties of overparameterized networks are often attributed to implicit biases, such as norm minimization at small learning rates and low sharpness in the Edge-of-Stability regime. In this work, we argue that a comprehensive understanding of the generalization performance of gradient descent requires analyzing the interaction between these various forms of implicit regularization. We empirically demonstrate that the learning rate interpolates between low parameter norm and low sharpness of the trained model. We furthermore prove that neither implicit bias alone minimizes the generalization error for diagonal linear networks trained on a simple regression task. These findings demonstrate that focusing on a single implicit bias is insufficient to explain good generalization, and they motivate a broader view of implicit regularization that captures the dynamic trade-off between norm and sharpness induced by non-negligible learning rates.

2512.06974 2026-06-08 math.ST stat.TH 版本更新

On-line Pick-Freeze Mirror algorithm for Sensitivity Analysis

在线冻结-镜像算法用于灵敏度分析

Manon Costa, Sébastien Gadat, Xavier Gendre, Thierry Klein

AI总结 提出一种在线随机镜像下降算法,通过将Sobol'指数重写为单纯形上的优化问题,同时估计所有Sobol'指数,并证明其一致性和收敛速率。

详情
Comments
33 pages, 5 figures
AI中文摘要

本文的主要目标是提出一种新方法,用于同时估计Sobol'指数的整个集合。我们的方法利用了Sobol'指数可以重写为$\mathbb{R}^d$单纯形上的优化问题解这一事实,通过随机镜像下降算法构建在线估计序列。我们证明了我们的估计过程是一致的,并给出了其收敛速率的非渐近上界。此外,我们展示了该方法的数值准确性,并与其他经典估计方法进行了比较。

英文摘要

The main objective of this paper is to propose a new approach for estimating the entire collection of Sobol' indices simultaneously. Our approach exploits the fact that Sobol' indices can be rewritten as solutions to an optimisation problem over the simplex of $\mathbb{R}^d$, to construct an online sequence of estimators using a stochastic mirror descent algorithm. We prove that our estimation procedure is consistent and provide a non-asymptotic upper bound for its rate of convergence. Furthermore, we demonstrate the numerical accuracy of our method and compare it with other classical estimation procedures.
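For orientation, the classical batch pick-freeze estimator of first-order Sobol' indices, one of the standard procedures such a method would be compared against, can be sketched as follows. This is not the online mirror-descent algorithm of the paper. The independent standard normal inputs, the test function, and the function names are assumptions chosen for illustration.

```python
import random


def pick_freeze_sobol(f, d, n, seed=0):
    """Batch pick-freeze estimates of the first-order Sobol' indices of f.

    Inputs are assumed to be d independent standard normal variables.
    """
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    x_prime = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [f(row) for row in x]
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    indices = []
    for i in range(d):
        # "Freeze" coordinate i and redraw every other coordinate
        # from the independent copy x_prime.
        y_i = [
            f([x[k][j] if j == i else x_prime[k][j] for j in range(d)])
            for k in range(n)
        ]
        mean_yi = sum(y_i) / n
        cov = sum(a * b for a, b in zip(y, y_i)) / n - mean_y * mean_yi
        indices.append(cov / var_y)
    return indices


def linear_model(v):
    """Hypothetical test function with known indices 1/5 and 4/5."""
    return v[0] + 2.0 * v[1]


if __name__ == "__main__":
    print(pick_freeze_sobol(linear_model, d=2, n=20000))
```

For the additive test function the exact indices are $1/5$ and $4/5$, so the estimates sum to roughly one; this is the kind of constraint that makes a reformulation over the simplex natural.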

2211.02192 2026-06-08 stat.ME stat.AP 版本更新

A Mixed Model Approach for Estimating Regional Functional Connectivity from Voxel-level BOLD Signals

基于体素级BOLD信号的区域功能连接估计的混合模型方法

Ruobin Liu, Chao Zhang, Chau Tran, Sophie Achard, Wendy Meiring, Alexander Petersen

AI总结 提出线性混合效应模型,显式建模区域间和区域内相关性及测量误差,通过最大似然估计实现无偏且置信区间覆盖准确的功能连接估计,优于传统平均相关法。

详情
AI中文摘要

静息态脑功能连接量化不同脑区活动模式之间的同步性。在功能磁共振成像中,每个区域包含一组空间连续的体素,用于采集血氧水平依赖信号。普遍使用的平均相关(CA)估计器及其他类似指标,是通过每个区域内空间聚合信号计算得出的,并且仍然是神经科学家最常用的区域间连接量化方法。它们的流行主要归因于计算简单,尽管存在明显的偏差且缺乏统计原理上的合理性。通过利用线性混合效应模型,可以将区域间和区域内相关性以及测量误差明确建模为信号变异性来源。开发了一种新颖的计算流程,聚焦于受试者水平的区域间相关参数,以应对将最大似然估计应用于这种结构化、高维时空数据的挑战。模拟结果证实,与CA相比,所提出的估计器在减少偏差和准确置信区间覆盖方面均具有优越性。该方法还应用于构建来自人类连接组项目重测数据库的个体人脑网络。区域间相关估计的一致性表明,所提出的方法可能具有显著的科学优势,对于同一受试者的重测扫描,其产生的结果比CA更可靠。

英文摘要

Resting-state brain functional connectivity quantifies the synchrony between activity patterns of different brain regions. In functional magnetic resonance imaging, each region comprises a set of spatially contiguous voxels at which blood-oxygen-level-dependent signals are acquired. The ubiquitous Correlation of Averages (CA) estimator, and other similar metrics, are computed from spatially aggregated signals within each region, and remain the quantifications of inter-regional connectivity most used by neuroscientists. Their popularity is primarily due to computational simplicity despite their demonstrable bias and lack of statistically principled justification. By leveraging linear mixed-effects models, both inter-regional and intra-regional correlation and measurement error can be explicitly modeled as signal variability sources. A novel computational pipeline, focused on subject-level inter-regional correlation parameters of interest, is developed to address the challenges of applying maximum likelihood estimation to such structured, high-dimensional spatiotemporal data. Simulation results confirm the superiority of the proposed estimator relative to CA in terms of both decreased bias and accurate confidence interval coverage across simulation settings. The proposed method is also applied to construct individual human brain networks for subjects from a Human Connectome Project test-retest database. Concordances between inter-regional correlation estimates demonstrate the potentially substantial scientific benefits of the proposed approach that reliably produces more consistent results than CA for test-retest scans of the same subject.
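The Correlation of Averages baseline described in this abstract is simple enough to sketch directly: average the voxel signals within each region, then correlate the two averaged time series. The data layout (one list of time points per voxel) and the function names are assumptions; the mixed-model estimator proposed in the paper is not reproduced here.

```python
import math


def region_average(voxel_series):
    """Average a region's voxel time series at each time point.

    voxel_series is a list of equal-length lists, one per voxel.
    """
    n_vox = len(voxel_series)
    return [sum(col) / n_vox for col in zip(*voxel_series)]


def pearson(a, b):
    """Pearson correlation between two equal-length, non-constant series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def correlation_of_averages(region_a, region_b):
    """CA estimator: correlate the spatially averaged signals of two regions."""
    return pearson(region_average(region_a), region_average(region_b))
```

The averaging step discards within-region correlation and measurement error, which are the variability sources the abstract says the mixed model represents explicitly.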

2510.15093 2026-06-08 math.NA cs.NA physics.comp-ph stat.ML 版本更新

Fast spectral separation method for kinetic equation with anisotropic non-stationary collision operator retaining micro-model fidelity

保留微观模型保真度的各向异性非平稳碰撞算子的动力学方程快速谱分离方法

Yue Zhao, Huan Lei

AI总结 提出一种从分子动力学学习的数据驱动碰撞算子,通过各向异性非平稳核和快速谱分离方法(O(N log N)算法)扩展动力学模型至弱耦合之外,保持结构守恒与H定理。

详情
AI中文摘要

我们提出了一种通用的、数据驱动的单组分等离子体碰撞算子,该算子从分子动力学模拟中学习,将碰撞动力学模型扩展到弱耦合区域之外。所提出的算子具有各向异性、非平稳的碰撞核,考虑了经典Landau公式中通常忽略的粒子相关性。为了实现高效的数值评估,我们开发了一种快速谱分离方法,将核表示为单变量基函数的低秩张量积。该公式通过快速傅里叶变换实现了$O(N \log N)$算法,并通过保持结构的中心差分离散化保留了关键的物理性质,包括离散守恒律和H定理。数值实验表明,所提出的模型在中等耦合区域准确捕捉了超越标准Landau模型的等离子体动力学,同时保持了高计算效率和结构保持性质。

英文摘要

We present a generalized, data-driven collisional operator for one-component plasmas, learned from molecular dynamics simulations, to extend the collisional kinetic model beyond the weakly coupled regime. The proposed operator features an anisotropic, non-stationary collision kernel that accounts for particle correlations typically neglected in classical Landau formulations. To enable efficient numerical evaluation, we develop a fast spectral separation method that represents the kernel as a low-rank tensor product of univariate basis functions. This formulation admits an $O(N \log N)$ algorithm via fast Fourier transforms and preserves key physical properties, including discrete conservation laws and the H-theorem, through a structure-preserving central difference discretization. Numerical experiments demonstrate that the proposed model accurately captures plasma dynamics in the moderately coupled regime beyond the standard Landau model while maintaining high computational efficiency and structure-preserving properties.

2505.21285 2026-06-08 cs.LG stat.ML 版本更新

Learnable Kernel Density Estimation for Graphs and Its Application to Graph-Level Anomaly Detection

可学习图核密度估计及其在图级异常检测中的应用

Xudong Wang, Ziheng Sun, Chris Ding, Jicong Fan

AI总结 提出LGKDE框架,通过图神经网络表示图分布并利用最大均值差异学习多尺度核密度估计,在理论保证下有效捕获结构模式和语义变化,在图异常检测任务中优于现有方法。

详情
Comments
Accepted in the Forty-Third International Conference on Machine Learning (ICML 2026), Main Track
AI中文摘要

本文提出一个名为LGKDE的框架,用于学习图的核密度估计。图密度估计的关键挑战在于有效捕获结构模式和语义变化,同时保持理论保证。结合图核和核密度估计(KDE)是图密度估计的标准方法,但由于核的手工设计和固定特征,性能不佳。我们的方法LGKDE利用图神经网络将每个图表示为离散分布,并利用最大均值差异学习多尺度KDE的图度量,其中所有参数通过最大化图相对于其精心设计的扰动版本的密度来学习。扰动在节点特征和图谱上进行,有助于更好地刻画正常密度区域的边界。理论上,我们为LGKDE建立了一致性和收敛性保证,包括均方积分误差界、鲁棒性和泛化性。我们通过展示其在恢复合成图分布底层密度方面的有效性,并将其应用于多个基准数据集上的图异常检测来验证LGKDE。广泛的实证评估表明,在大多数基准数据集上,LGKDE相比最先进的基线方法表现出优越的性能。

英文摘要

This work proposes a framework, LGKDE, that learns kernel density estimation for graphs. The key challenge in graph density estimation lies in effectively capturing both structural patterns and semantic variations while maintaining theoretical guarantees. Combining graph kernels and kernel density estimation (KDE) is a standard approach to graph density estimation, but it performs unsatisfactorily because the kernel features are handcrafted and fixed. Our method, LGKDE, leverages graph neural networks to represent each graph as a discrete distribution and utilizes maximum mean discrepancy to learn the graph metric for multi-scale KDE, where all parameters are learned by maximizing the density of graphs relative to the density of their well-designed perturbed counterparts. The perturbations are conducted on both node features and graph spectra, which helps better characterize the boundary of normal density regions. Theoretically, we establish consistency and convergence guarantees for LGKDE, including bounds on the mean integrated squared error, robustness, and generalization. We validate LGKDE by demonstrating its effectiveness in recovering the underlying density of synthetic graph distributions and applying it to graph anomaly detection across diverse benchmark datasets. Extensive empirical evaluation shows that LGKDE outperforms state-of-the-art baselines on most benchmark datasets.

2404.02141 2026-06-08 stat.ME cs.LG econ.EM stat.CO stat.ML 版本更新

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

使用Rashomon分区稳健估计因子数据中的异质性

Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick

AI总结 提出Rashomon分区集(RPS)贝叶斯框架,通过枚举后验密度接近最大后验模型的所有模型来量化模型不确定性,实现稳健的异质性估计。

详情
AI中文摘要

在观测数据和随机对照试验中,研究人员选择统计模型来阐述感兴趣的结果如何随可观测协变量的组合而变化。选择过于简单的模型可能会掩盖协变量组之间结果的重要异质性,而过于复杂则可能识别出虚假模式。在本文中,我们提出了一种新颖的贝叶斯模型不确定性框架,称为Rashomon分区集(RPS)。RPS包含所有后验密度接近最大后验(MAP)模型的模型。我们通过枚举而非采样来构建RPS,这确保我们探索数据中具有高证据的所有模型,即使它们提供截然不同的实质性解释。我们使用l0先验,该先验允许我们在不对效应之间的关联施加强假设的情况下捕获复杂的异质性,并从信息论角度证明该先验是极小化最优的。我们刻画了在RPS内相对于整个后验条件计算的参数(的函数)的近似误差。我们提出了一种算法,从可解释且唯一的模型类中枚举RPS,然后给出RPS大小的界限。我们提供了模拟证据以及三个实证例子:价格对慈善捐赠的影响、染色体结构的异质性以及小额信贷的引入。

英文摘要

In both observational data and randomized control trials, researchers select statistical models to articulate how the outcome of interest varies with combinations of observable covariates. Choosing a model that is too simple can obfuscate important heterogeneity in outcomes between covariate groups, while too much complexity risks identifying spurious patterns. In this paper, we propose a novel Bayesian framework for model uncertainty called Rashomon Partition Sets (RPSs). The RPS consists of all models that have posterior density close to the maximum a posteriori (MAP) model. We construct the RPS by enumeration, rather than sampling, which ensures that we explore all models with high evidence in the data, even if they offer dramatically different substantive explanations. We use an $\ell_0$ prior, which allows us to capture complex heterogeneity without imposing strong assumptions about the associations between effects, and we show that this prior is minimax optimal from an information-theoretic perspective. We characterize the approximation error of (functions of) parameters computed conditional on being in the RPS relative to the entire posterior. We propose an algorithm to enumerate the RPS from the class of models that are interpretable and unique, then provide bounds on the size of the RPS. We give simulation evidence along with three empirical examples: price effects on charitable giving, heterogeneity in chromosomal structure, and the introduction of microfinance.

2508.05818 2026-06-08 math.ST stat.TH 版本更新

Validity and Power of Heavy-Tailed Combination Tests under Asymptotic Dependence

渐近相依下重尾组合检验的有效性与功效

Lin Gui, Tiantian Mao, Jingshu Wang, Ruodu Wang

AI总结 针对重尾组合检验在渐近相依p值下的有效性缺失问题,提出基于多元正则变化Copula的统一框架,证明当变换分布尾部指数γ≤1时检验渐近有效,且γ=1时功效优于Bonferroni方法。

详情
AI中文摘要

重尾组合检验,如柯西组合检验和调和平均p值方法,广泛用于通过聚合相依p值来检验全局零假设。然而,现有的理论保证主要局限于渐近独立p值的情况,使得这些检验在更广泛相依结构下的行为尚不清楚。我们基于多元正则变化Copula(一个由p值在零附近联合行为的温和正则条件定义的灵活类)开发了一个统一框架,该框架能够适应广泛的相依结构。在此框架内,当变换分布的尾部指数γ≤1时,重尾组合检验是渐近有效的,其中γ=1在保持有效性的同时最大化功效。我们进一步证明,当且仅当p值不是渐近独立且信号不是极其稀疏时,γ=1的组合检验比Bonferroni方法具有严格更大的渐近功效,且随着相依性增强,功效优势增大。Bonferroni方法作为γ→0的极限出现,并在渐近相依下变得过于保守。这些结果为使用截断柯西或帕累托组合检验提供了理论支持,为在复杂相依下增强功效同时控制假阳性提供了原则性方法。

英文摘要

Heavy-tailed combination tests, such as the Cauchy combination test and harmonic mean p-value method, are widely used for testing global null hypotheses by aggregating dependent p-values. Existing theoretical guarantees, however, are largely restricted to the case of asymptotically independent p-values, leaving the behavior of these tests under broader dependence structures poorly understood. We develop a unified framework based on multivariate regularly varying copulas, a flexible class defined by a mild regularity condition on the joint behavior of p-values near zero, that accommodates a wide range of dependence structures. Within this framework, heavy-tailed combination tests are asymptotically valid when the transformation distribution has tail index $γ\leq 1$, with $γ= 1$ maximizing power while preserving validity. We further show that combination tests with $γ= 1$ achieve strictly greater asymptotic power than Bonferroni's method if and only if the p-values are not asymptotically independent and signals are not extremely sparse, with the power advantage growing as dependence strengthens. Bonferroni emerges as the $γ\to 0$ limit and becomes overly conservative under asymptotic dependence. These results provide theoretical support for using truncated Cauchy or Pareto combination tests, offering a principled approach to enhance power while controlling false positives under complex dependence.
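The combination rules compared in this abstract can be written in a few lines. The sketch below uses the standard textbook forms of the Cauchy combination test, the raw harmonic mean p-value, and Bonferroni's method; the equal default weights and the function names are assumptions, the raw harmonic mean is not itself a calibrated p-value, and the truncated variants mentioned in the abstract are not implemented.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination test: map p-values to Cauchy scores and average them.

    p-values are assumed to lie strictly between 0 and 1.
    """
    m = len(pvalues)
    if weights is None:
        weights = [1.0 / m] * m  # equal weights (assumed default)
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Upper-tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(pvalues):
    """Raw harmonic mean of the p-values (no calibration applied)."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)


def bonferroni(pvalues):
    """Bonferroni global p-value: m times the smallest p-value, capped at 1."""
    return min(1.0, len(pvalues) * min(pvalues))


if __name__ == "__main__":
    identical = [0.03] * 4  # perfectly dependent p-values
    print(cauchy_combination(identical))
    print(harmonic_mean_p(identical))
    print(bonferroni(identical))
```

With four identical p-values, the extreme case of dependence, the two heavy-tailed rules return the common value while Bonferroni multiplies it by four, which illustrates the conservativeness under dependence that the abstract describes.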

2508.02039 2026-06-08 cs.LG stat.ML 版本更新

Model Recycling Framework for Multi-Source Data-Free Supervised Transfer Learning

多源无数据监督迁移学习的模型回收框架

Sijia Wang, Ricardo Henao

AI总结 提出模型回收框架,在无源数据情况下,通过识别相关源模型子集实现白盒和黑盒设置下的参数高效迁移学习,支持多源无数据监督迁移学习。

详情
AI中文摘要

对数据隐私的日益关注以及与检索源数据进行模型训练相关的其他困难,催生了无源迁移学习的需求,在这种学习中,只能访问预训练模型,而不能访问原始源域的数据。这种设置带来了许多挑战,因为许多现有的迁移学习方法通常依赖于对源数据的访问,这限制了它们直接应用于源数据不可用的场景。此外,实际问题使其更加困难,例如在没有源数据信息的情况下有效选择迁移模型,以及在没有完全访问源模型的情况下进行迁移。受此启发,我们提出了一个模型回收框架,用于参数高效的模型训练,该框架在白盒和黑盒设置中识别要重用的相关源模型的子集。因此,我们的框架使模型即服务(MaaS)提供商能够构建高效预训练模型的库,从而为多源无数据监督迁移学习创造了机会。

英文摘要

Increasing concerns for data privacy and other difficulties associated with retrieving source data for model training have created the need for source-free transfer learning, in which one only has access to pre-trained models instead of data from the original source domains. This setting introduces many challenges, as many existing transfer learning methods rely on access to source data, which limits their direct applicability to scenarios where source data is unavailable. Further, practical concerns make the problem more difficult, for instance, efficiently selecting models for transfer without information on source data, and transferring without full access to the source models. So motivated, we propose a model recycling framework for parameter-efficient training of models that identifies subsets of related source models to reuse in both white-box and black-box settings. Consequently, our framework makes it possible for Model as a Service (MaaS) providers to build libraries of efficient pre-trained models, thus creating an opportunity for multi-source data-free supervised transfer learning.

2406.13826 2026-06-08 econ.EM stat.ME 版本更新

Testing identification in mediation and dynamic treatment models

中介和动态处理模型中的识别检验

Martin Huber, Kevin Kloiber, Lukas Laffers

AI总结 基于Huber和Kueck(2022)的检验,提出一种利用两组观测变量(协变量和疑似工具)来检验中介和动态处理模型中因果效应识别的方法,并应用于斯洛伐克劳动力市场数据。

详情
AI中文摘要

我们提出了一种检验中介和动态处理模型中因果效应识别的方法,该方法基于两组观测变量,即要控制的协变量和疑似工具,建立在Huber和Kueck(2022)针对单一处理模型的检验之上。我们考虑具有处理和中介变量顺序分配的模型,以评估直接处理效应(排除中介)、间接处理效应(通过中介)或处理和中介的联合效应。我们建立了在观测数据中识别这些效应的可检验条件。这些条件共同意味着(1)处理和中介在给定协变量下的外生性,以及(2)处理和中介的不同工具的有效性,即工具不直接影响结果(除了通过处理或中介)并且在给定协变量下是无混杂的。我们的框架扩展到当用选择指标替换中介以观察结果时的处理后样本选择或损耗问题,从而能够联合检验处理和损耗的选择性。我们提出了一种基于机器学习的检验,以数据驱动的方式控制协变量,并在模拟研究中分析其有限样本性能。此外,我们将我们的方法应用于斯洛伐克劳动力市场数据,发现对于动态处理评估中通常考虑的一系列培训项目,我们的可检验含义未被拒绝。

英文摘要

We propose a test for the identification of causal effects in mediation and dynamic treatment models that is based on two sets of observed variables, namely covariates to be controlled for and suspected instruments, building on the test by Huber and Kueck (2022) for single treatment models. We consider models with a sequential assignment of a treatment and a mediator to assess the direct treatment effect (net of the mediator), the indirect treatment effect (via the mediator), or the joint effect of both treatment and mediator. We establish testable conditions for identifying such effects in observational data. These conditions jointly imply (1) the exogeneity of the treatment and the mediator conditional on covariates and (2) the validity of distinct instruments for the treatment and the mediator, meaning that the instruments do not directly affect the outcome (other than through the treatment or mediator) and are unconfounded given the covariates. Our framework extends to post-treatment sample selection or attrition problems when replacing the mediator by a selection indicator for observing the outcome, enabling joint testing of the selectivity of treatment and attrition. We propose a machine learning-based test to control for covariates in a data-driven manner and analyze its finite sample performance in a simulation study. Additionally, we apply our method to Slovak labor market data and find that our testable implications are not rejected for a sequence of training programs typically considered in dynamic treatment evaluations.

2206.08598 2026-06-08 cs.LG stat.ML 版本更新

Characterizing Learning Dynamics under Relative Reparameterization of Singular Models

奇异模型相对重参数化下的学习动态表征

Pascal Mattia Esser, Frank Nielsen

AI总结 针对奇异模型参数空间与模型空间非一一对应导致收敛慢的问题,提出相对重参数化方法提取正则子模型,并在高斯混合模型和神经网络上理论分析梯度下降收敛率。

详情
AI中文摘要

分析统计模型学习的一种常见方法是考虑模型参数空间中的操作,但当参数空间与底层统计模型空间之间不存在一一映射时,这变得具有挑战性。这种“奇异模型”经常出现,并且由于吸引子行为,学习轨迹的收敛速度会特征性地降低。在这项工作中,我们考虑了参数空间的相对重参数化技术,该技术提供了一种从奇异模型中提取正则子模型的通用方法。以高斯混合模型和神经网络为例,我们从理论和数值上分析了两种参数化下梯度下降的收敛率。通过分析二阶方法和Fisher信息矩阵的显式性质,我们区分了由算法和内在信息几何方面引起的收敛行为差异。

英文摘要

A common way to analyze the learning of statistical models is to consider operations in the model's parameter space; however, this becomes challenging when there is no one-to-one mapping between the parameter space and the underlying statistical model space. Such "singular models" occur frequently and exhibit a characteristic decrease in the convergence speed of learning trajectories due to attractor behaviors. In this work, we consider a relative reparameterization technique of the parameter space, which yields a general method for extracting regular sub-models from singular models. Using Gaussian Mixture Models and Neural Networks as examples, we theoretically and numerically analyze the convergence rate of Gradient Descent under both parameterizations. By analyzing second-order methods and explicit properties of the Fisher Information Matrix, we distinguish between differences in convergence behavior arising from algorithmic and intrinsic information-geometric aspects.

2604.26559 2026-06-08 stat.ME math.ST stat.TH

Principled Estimation and Prediction with Competing Risks: a Bayesian Nonparametric Approach

基于竞争风险的原理性估计与预测:一种贝叶斯非参数方法

Claudio Del Sole, Antonio Lijoi, Igor Prünster

AI总结 本文提出一种贝叶斯非参数方法,用于竞争风险下的估计与预测,通过多状态模型框架和灵活的非参数先验模型,构建预测曲线并评估生存函数和病因特异性发病率。

详情
AI中文摘要

在生存分析中,当存在多种死亡原因时会出现竞争风险。本文采用多状态模型框架处理竞争风险,引入通过分层完全随机措施定义的灵活非参数先验模型,以建模转移概率并确定特定的共轭成员。进一步确定数据和潜在随机分割的联合边缘分布,并表征模型的后验分布。利用这些分布结果,评估未来事件为特定类型(如特定原因死亡)的概率,作为事件发生时间的函数。所得函数基于坚实原理,称为预测曲线,是文献中的重要创新。此外,我们还提供生存函数、病因特异性发病率和亚分布函数的后验估计。还设计了适合后验推断的模拟算法。通过模拟研究评估模型性能及算法有效性。最后,我们在临床数据集上展示了我们的方法。

英文摘要

Competing risks occur in survival analysis when multiple causes of death are present. They play a prominent role in several domains extending beyond biostatistics to encompass epidemiology, actuarial sciences, and reliability theory. This paper adopts a multi-state modeling framework to competing risks. We introduce a class of flexible nonparametric priors, defined through hierarchical completely random measures, to model the transition probabilities, and identify the specific (conditionally) conjugate member of this general class. Furthermore, we determine the joint marginal distribution of the data and of a latent random partition, and characterize the posterior distribution of the model. Leveraging these distributional results, we evaluate the predictive probability that a future event is of a specific type (e.g. death from a particular cause), as a function of the time at which the event occurs. The resulting function, derived on sound principles, is termed the prediction curve, and represents a major innovation in the literature. In addition, we provide posterior estimates for the survival function, and for the cause-specific incidence and subdistribution functions. Suitable simulation algorithms for posterior inference are also devised. The model's performance, as well as the algorithms' effectiveness, is evaluated through simulation studies. Finally, we illustrate our approach on clinical datasets.

2507.12878 2026-06-08 eess.SP cs.LG stat.ML

Bayesian Modeling and Estimation of Linear Time-Varying Systems using Neural Networks and Gaussian Processes

基于神经网络和高斯过程的线性时变系统贝叶斯建模与估计

Yaniv Shulman

AI总结 本文提出一种统一的贝叶斯框架,通过将系统脉冲响应建模为随机过程,利用变分推断和高斯过程,实现了对线性时变系统的鲁棒估计。

详情
AI中文摘要

本文提出了一种统一的贝叶斯框架,通过将系统脉冲响应建模为随机过程,利用变分推断和高斯过程,实现了对线性时变系统的鲁棒估计。

英文摘要

The identification of Linear Time-Varying (LTV) systems from input-output data is a fundamental yet challenging ill-posed inverse problem. This work introduces a unified Bayesian framework that models the system's impulse response, $h(t, τ)$, as a stochastic process. We decompose the response into a posterior mean and a random fluctuation term, a formulation that provides a principled approach for quantifying uncertainty, unifies intrinsic channel variability and epistemic uncertainty through a common posterior representation, and naturally defines a new, useful system class we term Linear Time-Invariant in Expectation (LTIE). To perform inference, we leverage modern machine learning techniques, including Bayesian neural networks and Gaussian Processes, using scalable variational inference. We demonstrate through a series of experiments that our framework can infer the properties of an LTI system from a single noisy input-output pair, including under deliberate additive-noise misspecification, achieve a lower overall error floor than the classical CCF stacking baseline in a simulated ambient noise tomography setting, and track a continuously varying LTV impulse response by using a structured Gaussian Process prior. This work provides a flexible and robust methodology for uncertainty-aware system identification in dynamic environments.

2602.10680 2026-06-08 stat.ML cond-mat.dis-nn cs.LG

A solvable high-dimensional model where nonlinear autoencoders learn structure invisible to PCA while test loss misaligns with generalization

一个可解的高维模型,其中非线性自编码器学习到结构对PCA不可见,而测试损失与泛化不一致

Vicente Conde Mendes, Lorenzo Bardone, Cédric Koller, Jorge Medina Moreira, Vittorio Erba, Emanuele Troiani, Lenka Zdeborová

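AI summary: This paper proposes a high-dimensional model showing that nonlinear autoencoders can learn structure that linear methods such as PCA cannot capture, even though their test loss is misaligned with generalization performance.

Details
Journal ref
ICML 2026
AI abstract (translated from Chinese)

Many real-world datasets contain hidden structure that cannot be detected through simple linear correlations between input features. For example, latent factors may influence the data in a coordinated way even though their effect is invisible to covariance-based methods such as PCA. In practice, nonlinear neural networks often succeed in extracting such hidden structure in unsupervised and self-supervised learning. However, constructing a minimal high-dimensional model in which this advantage can be analyzed rigorously remains an open theoretical challenge. We introduce a solvable high-dimensional spiked model with two latent factors: one visible to the covariance, and another that is statistically dependent yet uncorrelated and appears only in higher-order moments. PCA and linear autoencoders cannot recover the latter, whereas a minimal nonlinear autoencoder provably extracts both. We analyze both population risk and empirical risk minimization. Our model also provides a solvable example in which the self-supervised test loss is misaligned with representation quality: nonlinear autoencoders recover structure that linear methods cannot capture, even though their reconstruction loss is higher.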

English abstract

Many real-world datasets contain hidden structure that cannot be detected by simple linear correlations between input features. For example, latent factors may influence the data in a coordinated way, even though their effect is invisible to covariance-based methods such as PCA. In practice, nonlinear neural networks often succeed in extracting such hidden structure in unsupervised and self-supervised learning. However, constructing a minimal high-dimensional model where this advantage can be rigorously analyzed has remained an open theoretical challenge. We introduce a tractable high-dimensional spiked model with two latent factors: one visible to covariance, and one statistically dependent yet uncorrelated, appearing only in higher-order moments. PCA and linear autoencoders fail to recover the latter, while a minimal nonlinear autoencoder provably extracts both. We analyze both the population risk and empirical risk minimization. Our model also provides a tractable example where self-supervised test loss is poorly aligned with representation quality: nonlinear autoencoders recover latent structure that linear methods miss, even though their reconstruction loss is higher.
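
The phrase "statistically dependent yet uncorrelated" can be made concrete with a textbook toy case, sketched below. It is not the paper's spiked model: the variables `z` and `w = z^2 - 1`, the sample size, and the helper `sample_cov` are assumptions chosen only to show why a covariance-based method sees nothing while a second-order feature does.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 200_000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]

# w is a deterministic function of z, so the two are fully dependent.
w = [v * v - 1.0 for v in z]

# Linear view: the population covariance is E[z^3] = 0.
cov_linear = sample_cov(z, w)

# Nonlinear view: squaring z first exposes the link (population value Var(z^2) = 2).
cov_square = sample_cov([v * v for v in z], w)

print(f"cov(z, w)   = {cov_linear:.4f}")
print(f"cov(z^2, w) = {cov_square:.4f}")
```

The first covariance is close to zero and the second is close to 2, which is the kind of gap between linear and nonlinear features that the abstract attributes to higher-order moments.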

2509.24914 2026-06-08 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT

Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws

高维中的单头注意力:一般化、权重谱和扩展定律的理论

Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani, Yizhou Xu, Florent Krzakala, Lenka Zdeborová

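AI summary: This paper studies the spectral structure of the weights of attention layers trained on high-dimensional sequence tasks. Using tools such as random matrix theory, it characterizes the training error, the interpolation threshold, and the spectrum of the key and query matrices in high dimensions, and predicts the emergence of power-law scaling laws.

Details
Journal ref
ICML 2026
AI abstract (translated from Chinese)

Trained attention layers exhibit striking and reproducible spectral structure in their weights, including low-rank collapse, bulk deformation, and isolated spectral outliers, but the origin of this structure and its implications for generalization remain unclear. This paper trains a single-head tied-attention layer on synthetic high-dimensional sequence tasks and, using tools from random matrix theory, spin-glass theory, and approximate message passing, obtains a high-dimensional characterization of training and test error, interpolation and recovery thresholds, and the spectrum of the key and query matrices. The theory predicts the full singular-value distribution of the trained query-key map, including low-rank structure and isolated spectral outliers, in qualitative agreement with observations in more realistic transformers. Finally, for targets with power-law spectra, learning is shown to proceed through sequential spectral recovery, leading to the emergence of power-law scaling laws.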

English abstract

Trained attention layers exhibit striking and reproducible spectral structure of the weights, including low-rank collapse, bulk deformation, and isolated spectral outliers, yet the origin of these phenomena and their implications for generalization remain poorly understood. We study empirical risk minimization in a single-head tied-attention layer trained on synthetic high-dimensional sequence tasks generated from the attention-indexed model. Using tools from random matrix theory, spin-glass theory, and approximate message passing, we obtain an exact high-dimensional characterization of training and test error, interpolation and recovery thresholds, and the spectrum of the key and query matrices. Our theory predicts the full singular-value distribution of the trained query-key map, including low-rank structure and isolated spectral outliers, in qualitative agreement with observations in more realistic transformers. Finally, for targets with power-law spectra, we show that learning proceeds through sequential spectral recovery, leading to the emergence of power-law scaling laws.

2411.05729 2026-06-08 cs.LG stat.ML

Graph-Dictionary Signal Model for Sparse Representations of Multivariate Data

图词典信号模型用于多变量数据的稀疏表示

William Cappelletti, Pascal Frossard

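AI summary: This paper proposes a Graph-Dictionary signal model that describes relationships in multivariate data through graph structures, represents them as sparse combinations of graph atoms, and outperforms popular baselines at reconstructing graphs from signals.

Details
AI abstract (translated from Chinese)

Representing and exploiting multivariate signals requires capturing the relationships between variables, which we represent with graphs. Graph dictionaries allow complex relational information to be represented as a sparse sum of simpler structures, but no prior model exists for inferring such underlying structural elements from data. We define a new Graph-Dictionary signal model in which a finite set of graphs describes the relationships in the data distribution through sparse combinations of the weighted sum of their Laplacians. We propose a framework for inferring the graph-dictionary representation from observed node signals, which allows prior knowledge about signal properties, the underlying graphs, and their coefficients to be included. We introduce a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem. We show the ability of the method to reconstruct graphs from signals in several synthetic settings, where our model outperforms popular baselines. We then use graph-dictionary representations in an illustrative motor imagery decoding task on brain activity data, classifying imagined motion better than standard methods that rely on more features. Our Graph-Dictionary model bridges the gap between sparse representations of multivariate data and structured decompositions of sample-varying relationships.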

English abstract

Representing and exploiting multivariate signals requires capturing relations between variables, which we can represent by graphs. Graph dictionaries allow complex relational information to be described as a sparse sum of simpler structures, but no prior model exists to infer such underlying structure elements from data. We define a novel Graph-Dictionary signal model, where a finite set of graphs characterizes relationships in the data distribution as filters on the weighted sum of their Laplacians. We propose a framework to infer the graph-dictionary representation from observed node signals, which allows a priori knowledge about signal properties, and about the underlying graphs and their coefficients, to be included. We introduce a bilinear generalization of the primal-dual splitting algorithm to solve the learning problem. We show the capability of our method to reconstruct graphs from signals in multiple synthetic settings, where our model outperforms popular baselines. Then, we exploit graph-dictionary representations in an illustrative motor imagery decoding task on brain activity data, where we classify imagined motion better than standard methods relying on many more features. Our graph-dictionary model bridges a gap between sparse representations of multivariate data and a structured decomposition of sample-varying relationships into a sparse combination of elementary graph atoms.

2412.03246 2026-06-08 stat.ME stat.AP

Nonparametric estimation of the Patient Weighted While-Alive Estimand

非参数估计患者存活期间事件估量

Alessandra Ragni, Torben Martinussen, Thomas Scheike

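AI summary: This paper studies the patient weighted while-alive estimand, develops efficient estimation methods for it, and uses real-world case studies to show its advantages in recurrent-event settings.

Details
AI abstract (translated from Chinese)

In clinical trials with recurrent events, such as repeated hospitalizations ending in death, the patient's overall event history must be considered to assess treatment effects thoroughly. This paper focuses on the patient weighted while-alive estimand, the expected number of events divided by the time alive within a target window, and develops efficient estimation methods for it. Specifically, we derive the corresponding efficient influence function and develop a one-step estimator, initially applied to the irreversible illness-death model. For the broader recurrent-event setting, the added complexity makes this one-step estimator difficult to apply in practice, because the conditional transition intensities are likely to be misspecified. We therefore suggest an alternative estimator that is expected to be highly efficient, focusing on the randomized treatment setting. In addition, we apply the proposed estimator to two real-world case studies, demonstrating the practicality of this second estimator and the advantages of the while-alive approach over existing alternatives.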

English abstract

In clinical trials with recurrent events, such as repeated hospitalizations terminating with death, it is important to consider the patient's overall event history for a thorough assessment of treatment effects. The occurrence of fewer events due to early deaths can lead to misinterpretation, emphasizing the importance of a while-alive strategy as suggested in Schmidli et al. (2023). In this study, we focus on the patient weighted while-alive estimand, represented as the expected number of events divided by the time alive within a target window, and develop efficient estimation for this estimand. Specifically, we derive the corresponding efficient influence function and develop a one-step estimator initially applied to the simpler irreversible illness-death model. For the broader context of recurrent events, the increased complexity makes this one-step estimator practically intractable, because the needed conditional transition intensities, which depend on a patient's unique history, are likely to be misspecified. Therefore, we suggest an alternative estimator that is expected to have high efficiency, focusing on the randomized treatment setting. Additionally, we apply our proposed estimator to two real-world case studies, demonstrating the practical applicability of this second estimator and benefits of this while-alive approach over currently available alternatives.
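
To make the estimand's definition concrete, here is a minimal sketch that computes a plug-in version on fully observed toy data. It assumes no censoring and is not the efficient one-step estimator from the paper. The function names, the toy patients, and the comparison with a ratio-of-totals version (sometimes called exposure weighted) are illustrative assumptions.

```python
def patient_weighted_while_alive(event_times, death_times, tau):
    """Mean over patients of (events up to min(T, tau)) / min(T, tau)."""
    rates = []
    for events, death in zip(event_times, death_times):
        time_alive = min(death, tau)  # time alive inside the window [0, tau]
        n_events = sum(1 for t in events if t <= time_alive)
        rates.append(n_events / time_alive)
    return sum(rates) / len(rates)


def ratio_of_totals_while_alive(event_times, death_times, tau):
    """Total events in the window divided by total time alive in the window."""
    total_events = 0
    total_time = 0.0
    for events, death in zip(event_times, death_times):
        time_alive = min(death, tau)
        total_events += sum(1 for t in events if t <= time_alive)
        total_time += time_alive
    return total_events / total_time


tau = 4.0
# Patient A dies early after two events; patient B survives the window with one.
event_times = [[0.5, 0.8], [3.0]]
death_times = [1.0, 10.0]

pw = patient_weighted_while_alive(event_times, death_times, tau)
rt = ratio_of_totals_while_alive(event_times, death_times, tau)
print(pw, rt)
```

On this toy data the patient-weighted value is (2/1 + 1/4)/2 = 1.125, while the ratio of totals is 3/5 = 0.6. The early death weighs far more heavily when each patient's own rate is averaged, which is the sensitivity to early deaths that motivates a while-alive strategy.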

2402.16362 2026-06-08 stat.ME math.ST stat.TH

Penalized GEE for Complex Carry-Over in Repeated-Measures Crossover Designs

惩罚GEE用于重复测量交叉设计中的复杂残留效应

N. A. Cruz, K. Mylona, O. O. Melo

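AI summary: This paper proposes a penalized-GEE-based model for estimating complex carryover effects in crossover designs, and shows through simulation that the method improves the precision of treatment-effect estimates and reduces bias.

Details
AI abstract (translated from Chinese)

It has long been argued that models used to analyze data from crossover designs are not appropriate when simple carryover effects are assumed. Moreover, no statistical model capable of estimating complex carryover effects in crossover designs had been found. In this paper, however, estimability conditions for complex carryover effects, and theoretical results supporting them, are established. In addition, a simulation example is developed in a nonlinear dose-response test for a typical AB/BA crossover design with repeated measures. The simulation shows that a semiparametric model can detect complex carryover effects and that this estimation improves the precision of the treatment-effect estimators. It is concluded that when there are at least five replicates in each observation period per individual, semiparametric statistical models provide good estimates of the treatment effect and reduce bias relative to models that assume no carryover effects or simple carryover effects. An application of the methodology is also presented, showing the analytical gains obtained by estimating complex carryover effects.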

English abstract

It has been argued for many years that models used to analyze data from crossover designs are not appropriate when simple carryover effects are assumed. Furthermore, a statistical model that could estimate complex carryover effects in crossover designs had never been found. However, in this paper, the estimability conditions of the complex carryover effects and a theoretical result that supports them are found. In addition, a simulation example is developed in a non-linear dose-response test for a typical AB/BA crossover design with repeated measures. This simulation shows that a semiparametric model can detect complex carryover effects and that this estimation improves the precision of the estimators of the treatment effect. It is concluded that when there are at least five replicates in each observation period per individual, semiparametric statistical models provide a good estimator of the treatment effect and reduce bias with respect to models that assume the absence of carryover effects or simple carryover effects. Furthermore, an application of the methodology is shown and the wealth of analysis gained by estimating complex carryover effects is evident.

2506.12454 2026-06-08 stat.ML cond-mat.dis-nn cs.CR cs.LG

On the existence of consistent adversarial attacks in high-dimensional linear classification

高维线性分类中一致对抗攻击存在的存在性研究

Matteo Vilucchio, Lenka Zdeborová, Bruno Loureiro

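AI summary: This paper studies, in high-dimensional binary classification, the distinction between adversarial attacks and misclassification caused by limited model expressivity. It proposes new error metrics that reveal model vulnerability to perturbations preserving the ground-truth labels, and its theoretical analysis shows that the more overparameterized the model, the more sensitive it is to label-preserving perturbations.

Details
Journal ref
ICML 2026
AI abstract (translated from Chinese)

This paper investigates what fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or limited data in high-dimensional binary classification. It proposes new error metrics that precisely capture this distinction, quantifying model vulnerability to perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous characterization of the asymptotic behavior of these metrics in well-specified models and latent space models, revealing vulnerability patterns that differ from those of standard robust error measures. The theoretical results show that as models become increasingly overparameterized, their vulnerability to label-preserving perturbations grows, providing theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.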

English abstract

What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where statistical effects due to limited data availability play a central role. We introduce new error metrics that precisely capture this distinction, quantifying model vulnerability to consistent adversarial attacks -- perturbations that preserve the ground-truth labels. Our main technical contribution is an exact and rigorous asymptotic characterization of these metrics in both well-specified models and latent space models, revealing different vulnerability patterns compared to standard robust error measures. The theoretical results demonstrate that as models become more overparameterized, their vulnerability to label-preserving perturbations grows, offering theoretical insight into the mechanisms underlying model sensitivity to adversarial attacks.

2201.00892 2026-06-08 math.ST stat.TH

An extreme value approach to CoVaR estimation

基于极值理论的CoVaR估计方法

Natalia Nolde, Chen Zhou, Menglin Zhou

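AI summary: This paper proposes a semiparametric method that estimates CoVaR using multivariate extreme value theory, addresses data sparsity by modeling the tail dependence function, and illustrates the method's performance through simulation and an empirical example.

Details
Journal ref
Journal of the American Statistical Association (2026)
Comments
44 pages, 5 figures, 6 tables
AI abstract (translated from Chinese)

The 2007-2009 global financial crisis highlighted the key role of systemic risk in safeguarding the stability of financial markets. Accurate assessment of systemic risk can help regulators design appropriate policies to mitigate risk and allow institutions to monitor their sensitivity to market movements. CoVaR, a common measure of systemic risk, was proposed by Adrian and Brunnermeier (2011). This paper develops a semiparametric estimation method within the framework of multivariate extreme value theory. By definition, CoVaR can be viewed as a high quantile of the conditional distribution of the potential loss of an institution or the financial system, where the conditioning event corresponds to a large loss in the financial system (or the given financial institution). The paper relates this conditional distribution to the tail dependence function between the system and the institution, and then models the tail dependence function parametrically to address data sparsity in the joint tail region. It proves consistency of the proposed estimator and illustrates its performance through a simulation study and a real-data example.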

English abstract

The global financial crisis of 2007-2009 highlighted the crucial role systemic risk plays in ensuring stability of financial markets. Accurate assessment of systemic risk would enable regulators to introduce suitable policies to mitigate the risk as well as allow individual institutions to monitor their vulnerability to market movements. One popular measure of systemic risk is the conditional value-at-risk (CoVaR), proposed in Adrian and Brunnermeier (2011). We develop a methodology to estimate CoVaR semi-parametrically within the framework of multivariate extreme value theory. According to its definition, CoVaR can be viewed as a high quantile of the conditional distribution of one institution's (or the financial system's) potential loss, where the conditioning event corresponds to having large losses in the financial system (or the given financial institution). We relate this conditional distribution to the tail dependence function between the system and the institution, then use parametric modelling of the tail dependence function to address data sparsity in the joint tail regions. We prove consistency of the proposed estimator, and illustrate its performance via simulation studies and a real data example.
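
The description of CoVaR as a high quantile of a conditional loss distribution can be sketched with a naive empirical version, shown below. This is deliberately not the paper's extreme-value estimator. The conditioning event "loss at or above its own p-level quantile", the quantile convention, and the names `empirical_quantile` and `empirical_covar` are assumptions made for illustration.

```python
import math


def empirical_quantile(values, level):
    """Smallest order statistic whose empirical CDF is at least `level`."""
    ordered = sorted(values)
    index = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


def empirical_covar(target_losses, conditioning_losses, p, q):
    """Return (VaR of the conditioning losses at level p,
    q-quantile of the target losses on days where the conditioning loss >= that VaR)."""
    var_cond = empirical_quantile(conditioning_losses, p)
    tail = [t for t, c in zip(target_losses, conditioning_losses) if c >= var_cond]
    return var_cond, empirical_quantile(tail, q)


# Toy paired losses (hypothetical numbers, ten observations).
institution = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
system = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

var_inst, covar = empirical_covar(system, institution, p=0.75, q=0.5)
print(var_inst, covar)
```

With ten observations and p = 0.75, only three points fall in the conditioning tail, so the conditional quantile rests on very little data. That is the joint-tail sparsity the abstract refers to when it turns to a parametric model of the tail dependence function.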