arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.27270 2026-05-27 stat.AP stat.ML

Inverse Control Constrained Optimization of Vessel Speed Decisions Under Environmental Risk: Evidence from Arctic Shipping

Mauli Pant, Linda Fernandez, Indranil Sahoo

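AI summary: Using an inverse control constrained optimization framework, the paper estimates the risk parameters behind vessel speed decisions from more than 14 million AIS observations, revealing how different vessel types and navigational statuses trade off operational efficiency, ice risk, and whale-related ecological risk.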

English abstract

Understanding how decision makers balance operational efficiency with environmental and ecological risks is central to vessel navigation. We model vessel speed as a control variable in a constrained optimization framework in which vessel operators balance multiple competing objectives, including transit efficiency, ice-related navigational risk, and whale-related ecological risk. The underlying risk parameters are estimated using over 14 million Automatic Identification System (AIS) observations from the United States Arctic (2010-2019), together with environmental covariates and spatially explicit whale density estimates. The framework incorporates a nonlinear risk objective, vessel heterogeneity, and regularization to ensure stable and interpretable results. The inferred trade-offs reveal distinct decision-making patterns across vessel groups and navigational statuses. Vessel types such as Tug Tow and Cargo balance operational speed with environmental and ecological considerations. In contrast, several vessel groups, including Fishing, Passenger, and Unspecified vessels, are strongly influenced by ice-related risk, while Pleasure Craft and Tankers exhibit higher sensitivity to whale-related risk. Across navigational status categories, similar heterogeneity is observed. The dominant status, under way using engine, displays a clear trade-off, whereas other statuses, such as aground and undefined, are strongly shaped by ice-related constraints. Statuses including restricted maneuverability and engaged in fishing exhibit higher estimated sensitivity to whale-related risk, though with substantial uncertainty. Sensitivity analysis indicates that increasing whale-related risk weighting produces limited changes in model-implied optimal speed, whereas increasing ice-related risk leads to more consistent reductions.

2605.27352 2026-05-27 cs.LG stat.ML

From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models

Yuchen Liang, Ness Shroff, Yingbin Liang

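AI summary: Proposes Gibbs-Accelerated Discrete Diffusion (GADD), which uses the concrete score function to construct Gibbs posterior likelihoods, accelerating sampling for uniform-rate discrete diffusion models without additional training and achieving a sampling complexity of $\mathcal{O}(\mathrm{polylog} (\varepsilon^{-1}))$.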

English abstract

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this work, we propose a novel Gibbs-based corrector for discrete diffusion models, termed Gibbs-Accelerated Discrete Diffusion (GADD). GADD leverages the structure of the concrete score function to construct Gibbs posterior likelihoods directly, without requiring any additional training beyond standard score estimation. We show that GADD achieves an overall sampling complexity of $\mathcal{O}(\mathrm{polylog} (\varepsilon^{-1}))$, yielding the first such rate for diffusion-based samplers for uniform-rate discrete diffusion models. We also conduct numerical experiments demonstrating the practical advantages of GADD across synthetic data, zero-shot text sampling, and zero-shot conditional music generation. These results corroborate the theory and show that GADD consistently improves sample quality and wall-clock efficiency over standard baselines, including vanilla Euler methods and CTMC correctors. Beyond this, our theoretical analysis introduces a novel framework for analyzing predictor-corrector methods in discrete diffusion models, which may be of independent interest. Unlike existing approaches that rely on the Girsanov change-of-measure technique, our method is based on an induction argument that tracks error propagation across predictor iterations while accounting for inaccuracies in the corrector updates.

2605.27335 2026-05-27 stat.AP

Beyond average warming: Two-sample inference for dense-sparse functional data reveals changes in intraday temperature patterns

Kevin Wilk, Hajo Holzmann

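AI summary: To handle the mismatch in sampling frequency between historical and current temperature data, the paper proposes a transfer-learning-based two-sample inference method for functional data that estimates the difference of the mean functions and constructs uniform confidence bands, finding that climate change affects not only average temperatures but also intraday temperature patterns.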

English abstract

Modern weather stations in Germany record daily temperatures every 10 minutes, whereas measurements from historical reference periods are often only available at much coarser temporal resolutions, typically hourly. This discrepancy must be accounted for when comparing historical and current daily temperature patterns. Motivated by this problem, we develop two-sample inference procedures for functional data under sampling schemes where one sample is densely observed while the other is relatively sparse. Building on recent ideas from transfer learning for functional data, we derive estimators of the difference of the mean functions that attain optimal convergence rates in the supremum norm. We further establish a functional central limit theorem in the space of continuous functions and develop multiplier bootstrap methods for constructing uniform confidence bands. Extensions to functional time series are also discussed. Applying the proposed methodology to daily temperature curves from German weather stations, analyzed separately by month, reveals that climate change has altered not only average temperatures but also intraday temperature patterns. In particular, for stations such as Berlin, warming from morning to early afternoon exceeds the daily average increase, whereas evening and nighttime temperatures exhibit comparatively smaller increases.

2605.27330 2026-05-27 stat.ME

Two-Phase Sampling Designs and Analysis Approaches for Ordinal Outcomes

Yunbi Nam, Nathan I. Shapiro, Eric P. Schmidt, Wesley H. Self, Ran Tao, Jonathan S. Schildcrout

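AI summary: For ordinal outcomes, proposes three outcome-informed phase 2 sampling designs (ODS, covariate-stratified ODS, and residual-dependent sampling) together with corresponding corrected analysis methods, substantially improving estimation efficiency.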

English abstract

Modern clinical trials and cohort studies gather low-cost data on all participants but may have limited resources to assess expensive exposures such as biomarkers or genomic data. When interest lies in associations involving expensive exposures, two-phase designs provide a cost-effective framework by using information available on all participants to guide the targeted selection of a subset for additional measurements. We extend this framework to studies with ordinal outcomes, a common yet previously unexplored setting. We propose three outcome-informed phase 2 sampling designs, namely outcome-dependent sampling (ODS), covariate-stratified ODS, and residual-dependent sampling, that leverage phase 1 data to enrich phase 2 selection with informative subjects. We then develop analysis methods for valid and efficient estimation/inference, including conditional likelihood methods with ascertainment-corrected maximum likelihood estimation, multiple imputation, and a full likelihood method using sieve maximum likelihood estimation. Across a range of scenarios, simulation studies show that the proposed methods substantially improve efficiency over simple random sampling with standard maximum likelihood estimation. We further demonstrate their practical utility by examining the association between interleukin-6 and a four-level clinical status outcome (discharged, hospitalized but not in the ICU, hospitalized in the ICU, and death) 14 days after randomization into the Crystalloid Liberal or Vasopressors Early Resuscitation in Sepsis trial.

2605.27293 2026-05-27 cs.LG stat.ML

BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

Shijin Gong, Erhan Xu, Kai Ye, Francesco Quinzan, Giulia Livieri, Chengchun Shi

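AI summary: Proposes BASIS, an algorithm that improves value function estimation through single rollouts and within-batch information sharing, reducing computational cost while improving policy optimization performance.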

Comments
17 pages, 7 figures

English abstract

Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We introduce BASIS, a critic-free post-training algorithm designed to address this tradeoff. At each online training step, BASIS samples only one rollout per prompt, but leverages rich information across prompts in the entire batch to improve value function estimation. Our experiments demonstrate that BASIS reduces MSE in value function estimation by 69% compared to REINFORCE++, a representative single-rollout baseline, and achieves lower MSE with one rollout than group mean estimators with 8 rollouts. This improvement in value estimation translates to better policy optimization: using substantially less training time, BASIS achieves performance close to multi-rollout GRPO-type baselines and often outperforms single-rollout REINFORCE-type baselines.
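
The contrast between multi-rollout group-mean baselines and single-rollout baselines can be sketched as follows. This is only an illustration under stated assumptions: `group_mean_advantages` is a generic GRPO-type group-mean baseline, and `batch_mean_advantages` is a plain batch-mean stand-in for the single-rollout setting. It is not the BASIS estimator, whose construction the abstract does not describe. Function names and reward values are hypothetical.

```python
def group_mean_advantages(rewards_per_prompt):
    """Multi-rollout baseline: each prompt has several rollouts, and the
    baseline for a prompt is the mean reward of its own rollouts."""
    advantages = []
    for rewards in rewards_per_prompt:
        baseline = sum(rewards) / len(rewards)
        advantages.append([r - baseline for r in rewards])
    return advantages


def batch_mean_advantages(single_rewards):
    """Single-rollout stand-in (assumption, not BASIS): one reward per
    prompt, with the batch-wide mean reward used as a shared baseline."""
    baseline = sum(single_rewards) / len(single_rewards)
    return [r - baseline for r in single_rewards]


# Hypothetical verifiable rewards (1 = correct answer, 0 = incorrect).
multi = group_mean_advantages([[1, 0, 1, 0], [1, 1, 1, 0]])
single = batch_mean_advantages([1, 0, 0, 1])
```

The group-mean version needs several rollouts per prompt to form its baseline, which is the computational cost the abstract refers to. The single-rollout version is cheaper but uses a baseline that ignores differences in difficulty between prompts.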

2605.27281 2026-05-27 cs.LG stat.ML

Causal Risk Minimization for High-Dimensional Treatments

Nikita Dhawan, Arnav Paruthi, Andrew Kim, Lovedeep Gondara, Jekaterina Novikova, Chris J. Maddison

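AI summary: For causal inference over high-dimensional treatment spaces such as text, the paper decomposes causal error into a series of moment-balancing errors and optimizes higher-order balance objectives, and it projects high-dimensional treatments onto lower-dimensional attributes, enabling causal estimation without attribute-specific training.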

Comments
18 pages, 4 figures

English abstract

Predicting the effect of interventions with many possible variations, e.g., therapeutic content that affects mental health outcomes or an earnings call transcript that drives movement in share price, is useful across several domains. However, classical causal estimators tend to assume that all possible interventions are observed, which is infeasible when interventions vary widely, for instance, in the space of all text strings. We adapt a well-known approach of recasting causal inference as a learning problem, to address high-dimensional treatment spaces. Specifically, under standard assumptions like no unobserved confounding, we show that causal error decomposes into a series of moment-balancing errors of increasing order, and design objectives that directly improve causal estimation. We also show how to project the effect of a high-dimensional treatment onto lower-dimensional treatment attributes, which allows a single model to answer several causal questions without additional attribute-specific training. We empirically evaluate our estimators in settings with high-dimensional continuous, discrete, and text treatments, the last of which used a semi-synthetic dataset of Amazon Reviews. Our experiments demonstrate the benefit of higher-order balance error optimization and competitive performance of projected causal estimates with attribute-specific estimators.

2605.27272 2026-05-27 stat.ME

Causally-interpretable meta-analysis using aggregate data

Qingyang Shi, Wouter van Amsterdam, Sacha la Bastide-van Gemert, Talitha Feenstra, Issa J. Dahabreh

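AI summary: Proposes a new version of causally-interpretable meta-analysis that requires only aggregate trial data, building moment equations to estimate the conditional average treatment effect function, obtaining the average treatment effect in the target population, and establishing asymptotic properties.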

English abstract

Evidence syntheses and meta-analyses are used to inform clinical practice guidelines and health economic evaluations. However, heterogeneity of treatment effects poses a significant challenge. Conventional meta-analysis addresses heterogeneity through random-effect assumptions, which are not supported by design and lead to estimates that may not apply to any real-world population. Causally-interpretable meta-analysis (CIMA) offers a rigorous framework for specification, identification, and estimation of causal effects when combining information from multiple randomized trials. Initial development of CIMA focused on using individual data from randomized trials, but such data are often unavailable in practice. Here, we propose a new version of CIMA that only requires aggregate data from trials, addressing the limitations of traditional meta-analysis methods while relying only on aggregate data. The method leverages the trials' reported estimates of marginal and one-at-a-time subgroup treatment effects and descriptive statistics for baseline covariates to build moment equations for identifying and estimating a parametric conditional average treatment effect (CATE) function. The average treatment effect in a new target population is obtained by marginalizing the CATE function over the individual covariate data that defines the target population. The method can also be used to obtain causally-interpretable indirect treatment comparisons in the target population. We establish the asymptotic properties of the method, assess its finite-sample performance in simulation studies, and illustrate the application of the method by re-analyzing a published meta-analysis for SGLT2 inhibitors in patients with heart failure.
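
The final marginalization step, averaging a parametric CATE function over the individual covariates of the target population, can be sketched as follows. This is a minimal illustration that assumes a linear CATE function with already-estimated coefficients. The linear form, the coefficient values, and the covariates are hypothetical, and the moment-equation estimation from aggregate trial data is not shown.

```python
def cate(x, intercept, slopes):
    """Hypothetical linear CATE: tau(x) = intercept + sum_j slopes[j] * x[j]."""
    return intercept + sum(b * xj for b, xj in zip(slopes, x))


def target_ate(target_covariates, intercept, slopes):
    """Average the CATE function over individual-level covariate rows
    that define the target population."""
    effects = [cate(x, intercept, slopes) for x in target_covariates]
    return sum(effects) / len(effects)


# Hypothetical target population: (age in years, diabetes indicator).
target = [(60.0, 1.0), (70.0, 0.0), (80.0, 1.0)]
ate = target_ate(target, intercept=-0.5, slopes=(0.005, -0.1))
```

Because the average is taken over the target sample's own covariate rows, the same fitted CATE function can be reused for a different target population by swapping in that population's covariates.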

2605.27269 2026-05-27 cs.LG stat.AP

Transfer Learning using 66 Diseases for Disease Forecasting Applications

Lauren J Beesley, Alexander C Murph, Dave Osthus, Lauren A Castro

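AI summary: This study uses transfer learning to integrate 66 infectious diseases and multiple data streams, finding that adding other data streams improves forecasting performance in most cases but that data quality is critical, and it compiles a public database.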

English abstract

Disease forecasting models typically rely on a single data stream, making models brittle when histories are short or noisy. Recent top-performing models have shown that synthesizing multiple reporting systems for the same disease improves performance. Other recent work takes this idea a step further, using transfer learning to train a forecasting model for one disease using data from a different disease. We expand upon each of these approaches greatly, training machine learning models on data that span 66 infectious diseases and several data streams. We investigate the value of incorporating different data streams for forecasting 20 different disease data streams. We find that incorporating other data streams improves forecasting in the vast majority (84.9%) of time series and model structures considered. However, our work highlights that the quality of the added data matters: adding data extremely different from the target data stream can sometimes degrade forecast performance. A major contribution of this work is the compilation of a publicly available database for use by the infectious disease forecasting community.

2605.27253 2026-05-27 math.ST math.PR stat.TH

An Entropy-Energy Identity for Predictive Kullback-Leibler Regret in Infinitely Divisible Location Models

Kōsaku Takanashi, Kenichiro McAlinn

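AI summary: For infinitely divisible location models, presents an entropy-energy identity that expresses the regret of a Bayes predictive density as the Dirichlet-form energy of a symmetric Markov semigroup, and gives tail conditions for admissibility.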

English abstract

We consider predictive density estimation under logarithmic score for $d$-dimensional infinitely divisible location models. Taking the formal Bayes predictive density under the Lebesgue prior as a benchmark, we study the Kullback-Leibler regret of competing Bayes predictive densities. Our main contribution is an exact entropy-energy identity: the integrated regret of a Bayes predictive density $\hat{p}^{\pi}$ under prior $\pi$ relative to the benchmark admits an exact representation as the Dirichlet-form energy of the square-rooted marginal distribution $\sqrt{M^{\pi}}$ for the symmetric Markov semigroup induced by the benchmark kernel. This converts regret comparisons into a potential-theoretic problem and yields a sharp recurrence/transience characterization of when the benchmark predictive density can or cannot be uniformly improved. We introduce an $\mathcal{A}$-harmonic class of improper priors, defined through the generator $\mathcal{A}$ of the induced process, and give explicit tail conditions (an integral test on the induced marginal, equivalent to power-law prior decay in heavy-tailed models) that guarantee admissibility of the resulting Bayes predictive density. We illustrate the theory with new results for several distributions.

2605.27248 2026-05-27 stat.ME

Space-filling foldover designs for order-of-addition experiments under Kendall tau distance criteria

Hui Shao, Yaping Wang, Qian Xiao

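AI summary: For order-of-addition experiments, proposes a maximin criterion and dispersion criteria based on the Kendall tau distance and designs an efficient foldover simulated annealing algorithm (FSA-KD) to generate space-filling fractional permutation designs, which perform well in surrogate modeling and permutation optimization.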

English abstract

Order-of-addition experiments arise when the response depends on the order in which a set of components is added. Since the number of possible orders increases factorially with the number of components, full permutation designs are rarely feasible except for small problems. This paper studies space-filling fractional designs for order-of-addition experiments based on the Kendall tau distance, a natural metric for comparing permutations through pairwise ordering disagreements. We consider the maximin Kendall tau distance criterion and related dispersion criteria, and establish their connections with statistical optimality under the pairwise ordering model and a Gaussian process model with the Mallows kernel. To construct such designs, we propose an efficient foldover simulated annealing algorithm, denoted by FSA-KD, based on swap moves in the permutation space, together with foldover and incremental updating strategies. Numerical studies show that the resulting FSA-KD designs have large minimum pairwise Kendall tau distances, denoted by k_min(D), and stable pairwise distance distributions, and perform well in surrogate modeling and permutation-based optimization tasks.
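
The Kendall tau distance and the minimum pairwise distance of a design can be sketched as follows. This is a minimal illustration, not the FSA-KD algorithm: it only evaluates the maximin criterion on a tiny hand-picked design. It assumes that the foldover of a run is the same order reversed, which the abstract does not spell out. The function names and the example design are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(p, q):
    """Number of component pairs that the two orders rank oppositely."""
    pos_p = {c: i for i, c in enumerate(p)}
    pos_q = {c: i for i, c in enumerate(q)}
    return sum(
        1
        for a, b in combinations(p, 2)
        if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0
    )


def min_pairwise_distance(design):
    """Smallest Kendall tau distance over all pairs of runs in the design."""
    return min(kendall_tau_distance(p, q) for p, q in combinations(design, 2))


def foldover(design):
    """Assumption: the foldover appends each run in reversed order."""
    return design + [tuple(reversed(run)) for run in design]


# Hypothetical half design for four components, then its foldover.
half = [(0, 1, 2, 3), (1, 0, 3, 2)]
design = foldover(half)
k_min = min_pairwise_distance(design)
```

Under the reversal assumption, a run and its foldover partner disagree on every pair, so they sit at the maximum distance of m(m-1)/2 for m components. The minimum distance of the full design is therefore determined by the remaining pairs of runs.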

2605.27219 2026-05-27 cs.LG stat.ML

Nonlinear Data Integration via Kernel Methods for Data Collaboration Analysis

Yamato Suetake, Yuta Kawakami, Shunnosuke Ikeda, Yuichi Takano

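AI summary: To address the high reconstruction risk of linear integration methods in collaborative analysis of decentralized confidential data and their inability to align nonlinear transformations, proposes nonlinear kernel integration (NKI), which obtains a globally optimal solution via kernel ridge regression and an eigenvalue problem and adds graph regularization and a centering constraint to capture geometric and target-variable information, improving accuracy and reducing reconstruction risk on image classification tasks.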

Comments
50 pages, 7 figures

English abstract

Collaborative analysis of decentralized confidential datasets is important, but direct sharing of original datasets is often restricted by privacy and institutional constraints. Data collaboration (DC) analysis transforms each dataset into privacy-preserving intermediate representations via party-specific obfuscation functions and integrates them into common collaboration representations using an anchor dataset. However, many existing DC analysis methods rely on linear transformations for data obfuscation and integration, which may increase reconstruction risk. Although nonlinear dimensionality reduction can mitigate this risk, conventional linear integration methods cannot accurately align intermediate representations produced by nonlinear transformations. Moreover, existing integration methods mainly minimize discrepancies among parties and do not explicitly incorporate geometric or target-variable information useful for downstream analysis. To overcome these limitations, we first formulate linear kernel integration (LKI) as a linear integration method and then kernelize it to obtain nonlinear kernel integration (NKI). NKI admits a globally optimal solution via kernel ridge regression and an eigenvalue problem. We also introduce graph regularization and a centering constraint so that the target representation can capture geometric and target-variable information useful for downstream analysis. Experiments on image classification tasks demonstrate that NKI improves classification accuracy over existing linear integration methods under nonlinear dimensionality reduction, with further gains from target-variable-aware graph regularization and centering. The results also show that dimensionality reduction choices substantially affect both classification accuracy and reconstruction risk.

2605.27184 2026-05-27 stat.ME stat.AP

Posterior Quantification of Borrowing from Multiple Historical Control Data in Bayesian Dynamic Borrowing Methods: A Scoping Review

Tomohiro Ohigashi, Wataru Murasaki, Masahiko Gosho

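AI summary: This scoping review focuses on posterior quantification of borrowing from multiple historical controls in Bayesian dynamic borrowing methods, distinguishes overall from source-level borrowing measures, and uses case studies to show how methods differ in the total amount and source-specific pattern of borrowing.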

English abstract

Bayesian dynamic borrowing methods incorporate historical control data into current clinical trial analyses while allowing the degree of borrowing to depend on the compatibility between historical and current data. Although many methods have been proposed, the degree of borrowing is often difficult to interpret, especially when multiple historical control sources are available. This scoping review focuses on posterior quantification of borrowing from multiple historical controls. We discuss overall borrowing summaries based on effective historical sample size, together with method-specific source-level summaries of borrowing, information contribution, or compatibility arising from power priors, unit information priors, multisource exchangeability models, Dirichlet process mixture models, and potential bias models. We distinguish posterior borrowing measures from quantities describing prior information allocation or source-specific conflict. Two case studies, one with a binary endpoint and one with a continuous endpoint, illustrate that methods with broadly similar posterior treatment effect estimates may differ in both the overall amount and source-specific pattern of borrowing. These examples show that large overall borrowing may reflect selective borrowing from compatible historical sources rather than uniform borrowing from all sources. We recommend reporting treatment effect estimates together with overall and source-specific borrowing summaries, when available, to improve transparency in posterior inference.
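
The difference between overall and source-level borrowing can be illustrated with a fixed-weight power prior for a binary control endpoint. This is a simplified sketch under stated assumptions: the discount weights are fixed by hand, whereas dynamic borrowing methods let the data determine them, and the counts are hypothetical. With a Beta(a, b) initial prior, each historical source k with weight a_k contributes a_k * n_k effective subjects.

```python
def power_prior_posterior(current, historical, weights, a=1.0, b=1.0):
    """Beta posterior for a control response rate under a fixed-weight
    power prior. `current` and each historical source are
    (responders, sample size). Returns (alpha, beta, borrowed), where
    borrowed[k] is the effective sample size taken from source k."""
    y, n = current
    alpha = a + y
    beta = b + (n - y)
    borrowed = []
    for (y_k, n_k), a_k in zip(historical, weights):
        alpha += a_k * y_k          # discounted historical responders
        beta += a_k * (n_k - y_k)   # discounted historical non-responders
        borrowed.append(a_k * n_k)  # source-level effective sample size
    return alpha, beta, borrowed


# Hypothetical data: one current control arm and two historical sources.
alpha, beta, borrowed = power_prior_posterior(
    current=(12, 40),
    historical=[(30, 100), (10, 50)],
    weights=[0.5, 0.1],
)
overall_borrowed = sum(borrowed)
posterior_mean = alpha / (alpha + beta)
```

In this example, nearly all of the overall borrowed sample size comes from the first source. An overall summary alone would hide that pattern, which is why the review recommends reporting both levels.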

2605.27172 2026-05-27 math.ST math.PR stat.TH

Convergence Rates of Ordering, Testing and Estimation Procedures for Graphons With Fast Boundary Decay Rates

Jeannette Janssen, Na Lin, Aaron Smith

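AI summary: For latent-position random graph models with latent space $[0,1]$, studies three problems (vertex ordering, graphon estimation, and model testing), shows that ordering estimates can converge faster than $1/\sqrt{n}$ for some graphon families, and proposes a computationally efficient graphon-estimation algorithm.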

Comments
47 pages, 2 figures

English abstract

In latent-position random graph models (LPMs), latent vertex positions $U_{1},\ldots,U_{n}$ are sampled from some distribution on a latent space $\Omega$, then edges of an observed graph $G = ([n],E)$ are sampled with some probability $\mathbb{P}[(i,j) \in E ]=w(U_i,U_j)$ that depends on the unobserved latent positions. LPMs are ubiquitous in the statistical analysis of networks, offering models that have good empirical performance, strong theoretical guarantees, and tractable algorithms. The special case $\Omega = [0,1]$ is important, as it corresponds to graphs with temporal or preference-based structure. In this paper, we study three problems related to LPMs with latent space $[0,1]$: ordering the vertices according to the latent positions, estimating the generating graphon $w$, and testing whether an observed graph $G$ could have come from an LPM with state space $[0,1]$. Our results on the ordering problem greatly generalize two observations of Janssen/Smith (2022): (i) for some families of graphons, the best estimate of the ordering converges much faster than the usual statistical rate of $\frac{1}{\sqrt{n}}$, and (ii) this occurs even though, for the same families of graphons, the best estimate of the latent positions still converges at the usual $\frac{1}{\sqrt{n}}$ rate. As a main consequence, we develop a computationally efficient graphon-estimation algorithm and show that it has the same convergence rate as the non-explicit optimal algorithm of Gao et al. (2015). We also derive and analyze a testing procedure.
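
The generative model described at the start of this abstract can be sketched as a short simulation. This is a minimal illustration under stated assumptions: latent positions are drawn uniformly on $[0,1]$, and `example_graphon` is a hypothetical choice of $w$ in which edge probability decays with latent distance. Neither is taken from the paper.

```python
import random


def sample_lpm(n, w, seed=0):
    """Sample latent positions U_1..U_n uniformly on [0, 1], then include
    each edge (i, j), i < j, independently with probability w(U_i, U_j)."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < w(u[i], u[j]):
                edges.add((i, j))
    return u, edges


def example_graphon(x, y):
    """Hypothetical graphon: nearby latent positions connect more often."""
    return max(0.0, 1.0 - abs(x - y))


positions, edges = sample_lpm(30, example_graphon, seed=1)
```

Only `edges` would be available to an analyst. The ordering problem asks how well the ranking of `positions` can be recovered from it.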

2605.27163 2026-05-27 cs.LG stat.ML

The Role of Causal Features in Strategic Classification for Robustness and Alignment

Antonio Gois, Sophia Gunluk, Nir Rosenfeld, Nidhi Hegde, Simon Lacoste-Julien, Dhanya Sridhar

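AI summary: Analyzes distribution shift in strategic classification through causal models, shows that causal classification attains optimal error when the noise is bounded, decomposes the OOD cross-entropy risk, and reveals the advantage of causal features for long-term incentive alignment.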

Comments
Accepted at AISTATS 2026. 20 pages, 5 figures

English abstract

In strategic classification, an institution (e.g., a bank) anticipates adaptation from users who change their features to increase utility in a classification task (e.g., loan repayment). Since a key challenge is the distribution shift induced by users, we turn to causal models, which have been shown to bound the worst-case out-of-distribution (OOD) risk, and establish several new results that link causality and strategic classification. First, we show that causal classification leads to optimal classification error after any sufficiently large adaptation, when the noise is bounded in a certain way. Second, when these assumptions do not hold, we show OOD cross-entropy risk of optimal classifiers decomposes into an OOD bias term and a term arising from not using all observable features, allowing us to understand when causal classifiers have an advantage. Finally, we show that the use of causal features can allow alignment of long-term incentives between institutions and users, contrasting with previous work that highlights social costs of such approaches. We validate our theory empirically on synthetic data, finding that our results predict behavior in practice.

2605.27137 2026-05-27 math.ST stat.ME stat.TH

Bernstein-von Mises Theorem for Sparse Generalized Linear Model

Hanqing Li, Xuewen Lu

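AI summary: Studies spike-and-slab priors in generalized linear models and, through an oracle Bernstein-von Mises theorem for the fractional posterior under supportwise likelihood assumptions, obtains contraction, support recovery, and Gaussian mixture approximation.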

Comments
99 pages

English abstract

We study spike-and-slab priors for generalized linear models with possible grouped sparsity. The main result is an oracle Bernstein-von Mises theorem for the fractional posterior under supportwise likelihood assumptions. The proof develops sparse local asymptotic normality and Laplace approximation around support-specific pseudo-true centers, and combines them with fixed-prior mass, support penalization, recovery geometry, and beta-min separation to obtain contraction, support recovery, Gaussian mixture approximation, and collapse to the oracle Gaussian law. Model-entry verifications are given for Gaussian regression and for logistic, Poisson, probit, Gamma log-link, and negative-binomial log-link regression under stated sufficient conditions. The ordinary posterior is treated only through restricted Gaussian and canonical-link extensions, with coverage under additional active-dimension and moment conditions.

2605.27120 2026-05-27 stat.ME

Copula and spatial-regularized variational autoencoder for mapping disease comorbidity in West Africa

Copula与空间正则化变分自编码器用于西非疾病共病映射

Osafu Augustine Egbon, Bassey David Ita, Faith Eshofonie, Ezra Gayawan

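AI summary: Proposes a variational autoencoder that combines a bivariate Gumbel copula with spatial regularization to model spatial heterogeneity and asymmetric dependence in the comorbidity of diarrhea, fever, and acute respiratory infection among West African children, and to identify high-risk areas and risk factors.

Details
AI Chinese abstract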

地理空间健康不均衡仍然是一个关键的公共卫生问题,因为社区由于暴露于不同的不利社会经济和环境条件而面临异质性的疾病风险。尽管已采用统计模型来识别风险因素,但考虑共病模式中固有的复杂非线性依赖和空间规律性的研究尚不充分。在这项工作中,我们提出了一种新颖的空间正则化变分自编码器(VAE),用于表征和映射西非儿童共病的地理空间不均衡,重点关注腹泻、发烧和急性呼吸道感染(ARI)。为了模拟这些疾病之间的依赖性,本研究将双变量Gumbel copula集成到VAE框架中,从而能够灵活建模非对称依赖性并量化联合和条件发病风险。此外,框架内的协变量效应被量化,以促进风险因素的流行病学解释。所提出的方法与常用方法进行了基准比较,并应用于使用人口与健康调查数据表征西非的共病情况。结果显示,西非儿童共病可能性存在显著的空间异质性,其中发烧和ARI之间的共现最强。家庭财富、母亲教育程度和改善水源的可及性与共病可能性相关。这些模式突出了高风险区域,并强调了需要有针对性的、特定地点的公共卫生干预措施。

English abstract

Geospatial health disproportionality remains a critical public health concern, as communities face heterogeneous illness risks due to varying exposures to adverse socioeconomic and environmental conditions. While statistical models have been adopted to identify risk factors, studies that account for the complex, non-linear dependencies and spatial regularities inherent in comorbid disease patterns are underdeveloped. In this work, we propose a novel spatially regularized variational autoencoder (VAE) to characterize and map the geospatial disproportion of childhood comorbidity in West Africa, focusing on diarrhea, fever, and acute respiratory infection (ARI). To model dependence between these conditions, this study integrates a bivariate Gumbel copula into the VAE framework, enabling flexible modeling of asymmetric dependence and quantification of joint and conditional morbidity risks. Additionally, covariate effects within the framework were quantified to facilitate epidemiological interpretation of risk factors. The proposed method was benchmarked against commonly used methods and applied to characterize comorbidity in West Africa using the Demographic and Health Survey data. Findings reveal pronounced spatial heterogeneity in the likelihood of comorbidity among West African children, with the strongest co-occurrence observed between fever and ARI. Household wealth, maternal education, and access to improved water sources were associated with the likelihood of comorbidity. These patterns highlight high-risk areas and underscore the need for targeted, location-specific public health interventions.
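
As a rough illustration of the bivariate Gumbel copula named in the abstract, the sketch below evaluates its standard closed form and the upper tail dependence coefficient that makes the dependence asymmetric. It is not the authors' VAE; the function names and the uniform-margin setting are assumptions made for this example only.

```python
import math


def gumbel_copula_cdf(u: float, v: float, theta: float) -> float:
    """Bivariate Gumbel copula C(u, v) for u, v in [0, 1] and theta >= 1."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # the copula is grounded: C(0, v) = C(u, 0) = 0
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def joint_exceedance(u: float, v: float, theta: float) -> float:
    """P(U > u, V > v) for uniform margins coupled by the Gumbel copula."""
    return 1.0 - u - v + gumbel_copula_cdf(u, v, theta)


def upper_tail_dependence(theta: float) -> float:
    """Upper tail coefficient 2 - 2^(1/theta); the lower tail coefficient is 0."""
    return 2.0 - 2.0 ** (1.0 / theta)
```

With theta = 1 the copula reduces to independence, C(u, v) = uv; larger theta strengthens dependence in the upper tail only, which is the asymmetry the abstract refers to.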

2605.27097 2026-05-27 cs.LG stat.ML

Mildly Overparameterized ReLU Networks on Orthogonal Data: Incremental Learning and Implicit Bias

正交数据上的轻度过参数化ReLU网络:增量学习与隐式偏差

James Town, Etienne Boursier, Ben Lewis, Matthias Englert, Ranko Lazic

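AI summary: Studies the gradient flow dynamics of two-layer ReLU networks from small initialization on orthogonal data, shows that the limiting flow converges to a saddle-to-saddle jump process as the initialization scale tends to zero, and proves that the network interpolates the training data with high probability once the width m ≳ log(n), with the squared ℓ2-norm of the learned interpolator scaling as √n, within a constant factor of the minimal ℓ2-norm interpolator.

Details
Comments
66 pages, 6 figures
AI Chinese abstract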

神经网络的成功训练依赖于一阶优化方法的使用,但这些方法的理论刻画仍不完整,尤其是在轻度过参数化设置下。本文研究从微小初始化出发的两层ReLU网络在正交训练数据上的梯度流动力学。我们证明,当初始化尺度趋近零时,极限流收敛到鞍点间跳跃过程,揭示了在每个鞍点处激活一个新神经元的增量学习现象。该分析恢复了Dana等人(2025, arXiv:2502.16977)的已知结果:只要$m \gtrsim \log(n)$(其中$m$是网络宽度,$n$是训练样本数),网络就以高概率插值训练数据。这一增量过程刻画还使我们能够推导出一个新的隐式偏差结果:学习到的插值器具有平方$\ell_2$范数缩放为$\sqrt{n}$,这处于最小$\ell_2$范数插值器的常数因子内。更广泛地,我们的工作为ReLU网络的增量学习过程提供了首个严格证明,同时表明轻度过参数化网络可以收敛到复杂度与最优插值器同阶的插值解。

English abstract

The successful training of neural networks hinges on the use of first order optimization methods, yet the theoretical characterization of these methods remains incomplete. This is especially true in settings with mild overparameterization. In this work, we study the gradient flow dynamics of two-layer ReLU networks from small initialization with orthogonal training data. We prove the limiting flow converges to a saddle-to-saddle jump process as the initialization scale tends to zero, revealing an incremental learning phenomenon in which a new neuron activates at each saddle. This analysis recovers the known result of Dana et al. (2025, arXiv:2502.16977) that the network interpolates the training data with high probability as soon as $m \gtrsim \log(n)$, where $m$ is the network width and $n$ is the number of training samples. This incremental process characterization also allows us to derive a novel implicit bias result: the learned interpolator has a squared $\ell_2$-norm scaling as $\sqrt{n}$, which is within a constant factor of the minimal $\ell_2$-norm interpolator. More broadly, our work provides the first rigorous proof of an incremental learning process for ReLU networks, whilst suggesting mildly overparameterized networks can converge to interpolating solutions whose complexity is of the same order as that of the optimal interpolator.

2605.27093 2026-05-27 stat.ML cs.LG

Gaussian Process-based learning with new MCMC-based implementation of Wishart prior on correlation matrix

基于高斯过程的学习:相关矩阵上Wishart先验的新MCMC实现

Kane Warrior, Dalia Chakrabarty

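AI summary: Proposes a self-assembled Wishart prior for the covariance matrix, combined with MCMC-based Bayesian inference on the kernel hyperparameters; a look-back window introduces adaptiveness, and the approach can help diagnose weakly informative inputs.

Details
AI Chinese abstract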

在输入-输出关系的概率监督学习中(作为高斯过程(GP)的样本函数),通常为核的超参数指定先验,这些超参数参数化GP的协方差函数,其中(所得多元正态)似然的诱导协方差矩阵控制学习和预测。当所寻求的函数高度多元时,必须同时学习多个长度尺度参数,使得推断困难。我们为协方差矩阵开发了一种“自组装”Wishart先验,同时使用MCMC对核超参数进行贝叶斯推断。该构造使用最近MCMC迭代的回溯窗口来定义依赖于时间步长的尺度矩阵,从而为链引入自适应性。结果表明,在基于GP的学习范式中,对协方差矩阵的直接先验指定可用于诊断弱信息输入。我们通过两个不同的实证示例支持我们的先验开发——一个基于合成数据,另一个基于真实世界数据集。

English abstract

In probabilistic supervised learning of an input-output relationship - as a sample function of a Gaussian Process (GP) - priors are typically specified for the hyperparameters of the kernel that parametrises the covariance function of the GP, where the induced covariance matrix of the (resulting multivariate Normal) likelihood governs the learning and prediction. When the sought function is highly multivariate, multiple lengthscale parameters must be learnt simultaneously, making inference difficult. We develop a "self-assembled" Wishart prior for the covariance matrix, while undertaking Bayesian inference on the kernel hyperparameters using MCMC. The construction uses a look-back window over recent MCMC iterations to define a time-step-dependent scale matrix, thereby introducing adaptiveness to the chain. Results suggest that direct prior specification on the covariance matrix can be useful for diagnosing weakly informative inputs within the GP-based learning paradigm. We support our prior development with two distinct empirical illustrations - one on synthetic data, and another on a real-world dataset.

2605.27085 2026-05-27 stat.ME math.ST stat.AP stat.TH

Estimation and Inference for Win Measures with Multiple Ordinal Endpoints Subject to Missingness

具有缺失数据的多个有序终点胜率指标的估计与推断

Yi Liu, Huiman Barnhart, Sean O'Brien, Yuliya Lokhnygina, Roland A. Matsouaka

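AI summary: For multiple hierarchical ordinal endpoints subject to missing data, proposes inverse probability weighting (IPW) and augmented IPW (AIPW) estimators that correct the bias of the standard pairwise-comparison approach, with the AIPW estimator achieving double robustness.

Details
AI Chinese abstract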

胜率指标,包括胜率(WR)、胜算(WO)、净收益(NB)和结局排序期望(DOOR),越来越多地用于具有多个层次有序终点的随机临床试验中。然而,在实践中,一个或多个组分终点可能存在缺失数据。标准的成对比较方法将缺失结局的对视为平局,即使数据是完全随机缺失(MCAR),也可能产生有偏估计。尽管针对删失生存终点已开发了逆删失概率加权(IPCW)方法,但针对缺失层次有序终点的相应方法尚不可用。为填补这一空白,我们针对具有缺失数据的层次有序终点,开发了逆概率加权(IPW)和增强逆概率加权(AIPW)估计量,允许缺失依赖于治疗分配和基线协变量。IPW估计量通过使用定义胜率指标的联合单元格概率估计中涉及的联合非缺失概率对完整观测结局进行重新加权,从而纠正偏差。AIPW估计量额外纳入结局建模,提高了效率并实现了双重稳健性。在推断方面,我们基于影响函数推导了两种方法的闭式方差估计量。模拟研究表明,标准方法可能存在显著偏差,而所提出的IPW和AIPW估计量保持一致,且覆盖概率接近名义水平。此外,AIPW估计量通常比IPW估计量更有效。在SCOUT-CAP和ACTT-1试验中的应用说明了所提出方法的实用性。提供了R包WinMO以供实现。

English abstract

Win measures, including the win ratio (WR), win odds (WO), net benefit (NB), and desirability of outcome ranking (DOOR), are increasingly used in randomized clinical trials with multiple hierarchical ordinal endpoints. In practice, however, one or more component endpoints may have missing data. The standard pairwise-comparison approach, which treats pairs with missing outcomes as ties, can produce biased estimates, even if the data are missing completely at random (MCAR). Although inverse probability of censoring weighting (IPCW) methods have been developed for censored survival endpoints, corresponding methods for addressing missing hierarchical ordinal endpoints are not yet available. To address this gap, we develop inverse probability weighting (IPW) and augmented IPW (AIPW) estimators for win measures with hierarchical ordinal endpoints subject to missing data, allowing missingness to depend on treatment assignment and baseline covariates. The IPW estimator corrects bias by reweighting complete observed outcomes using joint non-missingness probabilities involved in estimating the joint cell probabilities that define the win measures. The AIPW estimator additionally incorporates outcome modeling, improving efficiency and achieving double robustness. For inference, we derive closed-form variance estimators for both methods based on influence functions. Simulation studies show that the standard approach can be substantially biased, whereas the proposed IPW and AIPW estimators remain consistent with near-nominal coverage. Furthermore, the AIPW estimator is generally more efficient than the IPW estimator. Applications to the SCOUT-CAP and ACTT-1 trials illustrate the practical utility of the proposed methods. An R package, WinMO, is provided for implementation.
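
To make the "standard pairwise-comparison approach" concrete, here is a minimal sketch of how WR, WO, and NB can be computed from all treated-versus-control pairs, with a missing component scored as a tie. This is not the WinMO package or the proposed IPW/AIPW estimators; the tuple encoding (higher is better, `None` for missing) and the rule that a pair becomes a tie as soon as a missing component is reached are assumptions of this example.

```python
from itertools import product


def compare(a, b):
    """Compare two outcome tuples endpoint by endpoint, in priority order.

    Returns 1 if a wins, -1 if a loses, 0 for a tie. A missing component
    (None) makes the pair a tie, mimicking the standard approach.
    """
    for x, y in zip(a, b):
        if x is None or y is None:
            return 0
        if x != y:
            return 1 if x > y else -1
    return 0


def win_measures(treated, control):
    """Win ratio, win odds, and net benefit over all treated-control pairs."""
    wins = losses = ties = 0
    for a, b in product(treated, control):
        result = compare(a, b)
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1
    total = wins + losses + ties
    return {
        "WR": wins / losses if losses else float("inf"),
        "WO": (wins + 0.5 * ties) / (losses + 0.5 * ties),
        "NB": (wins - losses) / total,
    }


treated = [(2, 1), (1, 0)]
control = [(1, 1), (1, None)]
measures = win_measures(treated, control)
```

In this toy data the pair (1, 0) versus (1, None) is counted as a tie only because the second endpoint is missing, which is the mechanism the abstract identifies as a source of bias.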

2605.27043 2026-05-27 stat.ML cs.LG stat.ME

Causal Representation Learning for Generalisable Recommendation

因果表示学习用于可泛化推荐

Yorgos Felekis, Michael O'Riordan, Oriol Corcoll, Ciarán M. Gilligan-Lee

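AI summary: To address the generalisation gap caused by the mismatch between training and deployment distributions in recommender systems, proposes an information-theoretic disentanglement criterion based on causal representation learning, together with a tractable variational lower bound; using only confounded logs, it improves generalisation under distribution shift, validated in a Spotify A/B test, on the KuaiRand dataset, and on a synthetic benchmark.

Details
AI Chinese abstract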

基于观测数据训练的预测模型在部署时往往无法泛化到所遇到的分布,尤其是当训练数据是被优化系统的产物时。推荐系统是一个典型例子:它们是在被部署策略、过去用户行为和平台过滤混淆的交互日志上训练的。因此,训练分布与在服务时评分的候选分布存在显著差异,这种差距使得离线指标无法可靠预测在线性能。我们通过一种受因果表示学习(CRL)启发的方法来解决分布偏移问题。我们提出了一种信息论解缠标准,并证明其最优值仅取决于输入的因果成分。然后,我们推导出一个可处理的变分下界,使得该标准仅从有限观测数据中即可优化。我们的方法范围比大多数CRL文献更窄,因为我们目标是改善分布偏移下的泛化能力,而非完全识别所有潜在因果因素。这个更窄的目标使得该方法实用,仅需要现有的混淆日志,适用于任何标准监督模型,且不增加推理时间成本。我们的主要评估是在Spotify上对数百万用户进行的A/B测试,应用于个性化播放列表生成的排序器。一个容量匹配的CRL变体在离线性能上相当,但在在线听众参与度上带来了显著提升。在公开的KuaiRand推荐数据集和具有已知因果结构的合成基准上的补充证据显示了相同模式:与基线离线持平,在分布偏移下获得收益。在所有三种设置中,加入我们的因果解缠目标都带来了更有意义的分布外泛化。

English abstract

Predictive models trained on observational data often fail to generalise to the distributions they encounter when deployed, especially when the training data is a product of the system being optimised. Recommender systems are a canonical example: they are trained on interaction logs confounded by the deployed policy, past user behaviour, and platform filtering. As a result, the training distribution differs substantially from the candidate distribution scored at serving time, a gap that makes offline metrics unreliable predictors of online performance. We address the distribution shift problem with a method motivated by causal representation learning (CRL). We propose an information-theoretic disentanglement criterion and prove that its optimum depends only on the causal components of the input. We then derive a tractable variational lower bound that makes the criterion optimisable from finite observational data alone. The scope of our method is narrower than that of much of the CRL literature, in that we target better generalisation under distribution shift, not full identification of all latent causal factors. This narrower target is what makes the method practical, requiring only the existing confounded logs, applying to any standard supervised model, and adding no inference-time cost. Our headline evaluation is an A/B test with millions of users on Spotify, applied to a production ranker for personalised playlist generation. A capacity-matched CRL variant performed on par offline but delivered substantial online gains in listener engagement. Complementary evidence on the public KuaiRand recommendation dataset and a synthetic benchmark with known causal structure shows the same pattern: offline parity with baseline, gains under distribution shift. Across all three settings, adding our causal disentanglement objective yields meaningfully better out-of-distribution generalisation.

2605.27016 2026-05-27 cs.CL cs.AI cs.LG stat.ML

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

评估不确定性估计器与LLM幻觉的相关性

Yedidia Agnimo, Anna Korba, Annabelle Blangero, Nicolas Chesneau, Karteek Alahari

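AI summary: A systematic empirical study of the association between uncertainty estimators (information-theoretic, sampling-based, and reflexive) and LLM hallucinations finds that the association is highly variable and often weak, challenging the use of uncertainty as a direct signal of hallucination.

Details
Comments
35 pages, 7 figures, 9 tables
AI Chinese abstract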

大型语言模型(LLM)容易产生幻觉,即与输入或训练数据不符的陈述,阻碍了可靠部署。同时,许多不确定性估计(UE)方法被提出来量化模型置信度,并常被隐含地视为模型失败的代理。然而,不确定性与幻觉之间的关系尚未得到充分表征。我们对不确定性估计器与LLM幻觉之间的关联进行了系统的实证研究。我们不是假设这种关联,而是直接评估它在何时以及在多大程度上成立。我们考虑了多种不确定性估计器,包括信息论、基于采样和反思性估计器,并检查了它们在幻觉设置中的行为。我们的实验涵盖了内在幻觉(违反输入忠实性)和外在幻觉(相对于训练数据的无根据主张),使用了四个互补基准,包括RAGTruth和HalluLens。我们发现,这种关联性高度可变且通常较弱,取决于幻觉类型和所评估的LLM。这些结果挑战了将不确定性作为幻觉直接信号的做法,并阐明了何时它能提供可操作的信息。

English abstract

Large language models (LLMs) are prone to hallucinations, i.e., statements unsupported by the input or training data, hindering reliable deployment. In parallel, numerous uncertainty estimation (UE) methods have been proposed to quantify model confidence and are often implicitly treated as proxies for model failure. However, the relationship between uncertainty and hallucinations remains insufficiently characterized. We present a systematic empirical study of the association between uncertainty estimators and hallucinations in LLMs. Rather than assuming this association, we evaluate directly when and to what extent it holds. We consider a diverse set of uncertainty estimators, including information-theoretic, sampling-based, and reflexive estimators, and examine their behavior across hallucination settings. Our experiments cover both intrinsic hallucinations (violations of input faithfulness) and extrinsic hallucinations (unsupported claims relative to training data), using four complementary benchmarks, including RAGTruth and HalluLens. We find that the association is highly variable and often weak, depending on the hallucination type and the LLM under evaluation. These results challenge the use of uncertainty as a direct signal of hallucination and clarify when it provides actionable information.

2605.27012 2026-05-27 math.ST stat.ME stat.TH

Conformalized Large-Scale Selective Inference with Informative and Trustworthy Prediction Sets

具有信息性和可信预测集的共形化大规模选择性推断

Wangcheng Li, Guanlan Zhao, Xu Guo, Wenguang Sun

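AI summary: Proposes SCIP, which uses an informative set constructor, a trust score, and generalized conformal p-values to select informative prediction sets while controlling the false coverage rate, for both regression and classification tasks.

Details
AI Chinese abstract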

在大规模预测问题中,对所有测试单元进行详尽跟进通常不切实际且效率低下,因此需要一种满足信息性和可信性双重需求的选择性报告策略。在InfoFCR(具有错误覆盖率控制的信息性预测)框架内,我们提出了SCIP(选择性共形推断用于信息性预测),该过程基于三个关键组成部分:(i)信息性集构造器,根据用户指定的信息性约束为单个测试单元定制预测集;(ii)信任评分,对候选信息性集的可信性进行原则性量化;以及(iii)广义共形p值,用于执行FCR分析以选择最有希望的候选集。我们证明SCIP保证了有限样本下的FCR控制,并且是渐近反保守的,相比现有方法实现了更高的统计功效。该框架高度通用,适用于回归和分类任务中的各种误差度量。在模拟和真实数据上的大量数值实验证明了我们方法的有效性。

English abstract

In large-scale prediction problems, exhaustively following up on all test units is often impractical and inefficient, motivating a selective reporting strategy that fulfills the dual requirements of informativeness and trustworthiness. Within the InfoFCR (Informative prediction with False Coverage Rate control) framework, we propose SCIP (Selective Conformal Inference for Informative Predictions), a procedure built on three key components: (i) an informative set constructor that tailors prediction sets to individual test units according to user-specified informativeness constraints; (ii) a trust score that provides a principled quantification of the trustworthiness of candidate informative sets; and (iii) generalized conformal p-values that are used to perform FCR analysis for selecting the most promising candidates. We establish that SCIP guarantees finite-sample FCR control and is asymptotically anti-conservative, achieving higher statistical power than existing methods. The framework is highly versatile, accommodating a wide range of error metrics across both regression and classification tasks. Extensive numerical experiments on simulated and real data demonstrate the effectiveness of our approach.
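
For readers unfamiliar with the building blocks, the sketch below shows an ordinary (not generalized) conformal p-value and a Benjamini-Hochberg step-up selection. It is a generic stand-in and not the SCIP procedure: the abstract does not specify how its generalized p-values or FCR analysis are constructed, so the score convention (larger means less conforming) and both function names are assumptions.

```python
from bisect import bisect_left


def conformal_p_value(cal_scores, test_score):
    """Standard conformal p-value: (1 + #{calibration scores >= test}) / (n + 1)."""
    cal = sorted(cal_scores)
    n_ge = len(cal) - bisect_left(cal, test_score)
    return (1 + n_ge) / (len(cal) + 1)


def benjamini_hochberg(p_values, alpha):
    """Indices selected by the BH step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose ordered p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])
```

A test unit whose score exceeds every calibration score receives the smallest possible p-value, 1 / (n + 1), so the calibration set size bounds how strong the evidence for selection can be.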

2605.27006 2026-05-27 cs.LG cond-mat.dis-nn stat.ML

Sampling Data with Chains of Forward-Backward Diffusion Steps

通过前向-反向扩散步骤链采样数据

Hyunmo Kang, Noam Itzhak Levi, Corinna Elena Wegner, Daniel J. Korchinski, Matthieu Wyart

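AI summary: Introduces U-turn chains, Markov chains built by iterating short forward-backward steps of a diffusion model and paired with a Metropolis-Hastings correction to sample from energy-modified targets, and finds that minimal U-turn dynamics undergoes an ergodicity-breaking phase transition driven by fragmentation of the data manifold.

Details
AI Chinese abstract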

从学习到的高维分布中采样是一个基础的计算问题。我们引入U-turn链:通过迭代扩散模型的短前向-反向步骤获得的马尔可夫链,其中每一步提出一个保持在所学数据流形上的移动,并与Metropolis-Hastings校正配对,从能量修正目标中采样。对于合成语言,我们表明最小U-turn动力学经历由数据流形碎片化驱动的遍历性破缺相变;在更大的U-turn幅度下遍历性得以恢复。在非遍历区域,低层特征比高层特征松弛得更快,这种顺序仅在足够大的U-turn幅度下才会反转。我们在自然语言和自然图像上测试这些预测。在两种模态中,最小U-turn松弛缓慢,尤其是对于由CNN或LLM中深层表示近似的高层特征。层序反转仅在噪声足够大且混合高效时出现——这些特征与强约束、弱混合的局部动力学一致。我们讨论了这些结果对使用扩散模型采样的启示。

English abstract

Sampling from learned high-dimensional distributions is a foundational computational problem. We introduce U-turn chains: Markov chains obtained by iterating short forward-backward steps of a diffusion model, in which each step proposes a move that remains on the learned data manifold and, paired with a Metropolis-Hastings correction, samples from energy-modified targets. For synthetic languages, we show that minimal U-turn dynamics undergoes an ergodicity-breaking phase transition driven by fragmentation of the data manifold; ergodicity is restored at larger U-turn magnitude. In the non-ergodic regime, low-level features relax faster than high-level ones, an ordering that inverts only at sufficiently large U-turn magnitude. We test these predictions on natural language and natural images. In both modalities, minimal U-turns relax slowly, especially for high-level features approximated by deep representations in CNNs or LLMs. The layer-ordering inversion appears only at large noise when mixing is efficient -- signatures consistent with strongly constrained, weakly mixing local dynamics. We discuss the implications of these results for sampling with diffusion models.

2605.26990 2026-05-27 stat.ML cs.LG

Constrained Bayesian Experimental Design via Online Planning

通过在线规划的约束贝叶斯实验设计

Yujia Guo, Daolang Huang, Xinyu Zhang, Sammie Katt, Samuel Kaski, Ayush Bharti

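AI summary: Proposes a method that combines offline pre-training of an amortized policy and a posterior network with online multi-step lookahead planning over scenario trees to optimize Bayesian experimental design under dynamic constraints, yielding more informative design sequences than existing methods at modest computational overhead.

Details
Comments
24 pages, 9 figures. Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)
AI Chinese abstract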

贝叶斯实验设计(BED)是一个用于数据高效顺序实验设计的理论框架。然而,现有的BED方法无法适应实际任务中由于预算限制、成本变化或物理约束(限制设计随时间演化)而产生的动态约束。在本文中,我们介绍了一种新的BED方法,通过将离线预训练的摊销策略和后验网络与使用场景树的在线多步前瞻规划相结合,实现了实验设计的约束优化。我们通过实验证明,在多种约束BED任务中,我们的方法相比现有方法产生了更信息丰富的设计序列,同时仅增加了适度的额外计算开销。

English abstract

Bayesian experimental design (BED) is a principled framework for data-efficient design of sequential experiments. However, existing BED methods are unable to adapt to dynamic constraints inherent in real-world tasks due to budget limitations, varying costs, or physical constraints that restrict how designs evolve over time. In this paper, we introduce a novel approach to BED that enables constrained optimization of experimental designs by combining offline pre-training of an amortized policy and a posterior network with online multi-step lookahead planning using scenario trees. We empirically demonstrate that our method yields substantially more informative design sequences than existing methods across a range of constrained BED tasks, while incurring only a modest additional computational overhead.

2605.26973 2026-05-27 stat.ML cond-mat.dis-nn cs.LG cs.NE q-bio.NC

Signal-to-Noise Ratio and Sample Size Govern Representational Alignment in Neural Networks

信噪比与样本量控制神经网络中的表征对齐

Ali Hussaini Umar, Alessandro Laio

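AI summary: Shows theoretically and experimentally that representational alignment in neural networks varies monotonically with the signal-to-noise ratio and non-monotonically with the training sample size, is minimized near the interpolation threshold, and is decoupled from generalization error.

Details
AI Chinese abstract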

已知神经网络会发展出潜在表征,这些表征是$对齐$的,即在不同架构、训练协议或训练数据集训练的网络之间结构相似。我们在一个受控环境中研究这一现象,使用被噪声过程的独立实现扰动的训练集,训练一组网络执行回归和分类任务。我们表明,信噪比(SNR)和训练样本量以定性相似的方式影响对齐,无论是在真实世界数据集上训练的网络,还是在极其简单的具有单个隐藏层的$线性$网络中(其对齐可以解析估计)。在线性和非线性网络、回归和分类任务以及合成和真实数据中,我们一致观察到,对齐随SNR单调变化,但随训练样本量非单调变化。特别地,对齐在插值阈值附近最小,且更强的对齐不一定对应更好的泛化误差。这些发现揭示了数据质量和数量对对齐的非平凡依赖关系,且与泛化性能解耦。

English abstract

Neural networks are known to develop latent representations that are $aligned$, namely structurally similar across networks trained with different architectures, training protocols, or training datasets. We study this phenomenon in a controlled setting, where we train an ensemble of networks on regression and classification tasks using training sets perturbed by independent realizations of a noise process. We show that the signal-to-noise ratio (SNR) and the training sample size influence the alignment in qualitatively similar ways in networks trained on real-world datasets and in an extremely simple $linear$ network with a single hidden layer, for which the alignment can be estimated analytically. Across linear and nonlinear networks, regression and classification tasks, and both synthetic and real-world data, we consistently observe that alignment varies monotonically with SNR but non-monotonically with training sample size. In particular, the alignment is minimized near the interpolation threshold, and a stronger alignment does not necessarily correspond to better generalization error. These findings reveal a non-trivial dependence of alignment on data quality and quantity, decoupled from generalization performance.

2605.26895 2026-05-27 cs.LG cs.AI stat.ML

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

微不足道的大小,显著的效果:大型语言模型中的尺度向量

Mingze Wang, Shuchen Zhu, Yuxin Fang, Binghui Li, Kai Shen, Shu Zhong

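AI summary: A systematic study of scale vectors in large language models finds that, despite accounting for a negligible share of parameters, they are crucial for pre-training because they improve optimization through a self-amplifying preconditioning effect; three lightweight improvements are proposed that consistently improve performance across model scales.

Details
Comments
36 pages
AI Chinese abstract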

现代大型语言模型(LLM)中的归一化层由确定性归一化操作和可学习的尺度向量组成。尽管归一化操作已被广泛研究,但尺度向量尽管被普遍使用,其作用仍未被充分理解。在这项工作中,我们从表达能力、优化和架构结构的角度对LLM中的尺度向量进行了系统研究。首先,我们通过实验表明,虽然尺度向量仅占模型参数的极小部分,但移除它们会显著降低LLM的预训练效果。我们的理论进一步表明,在Pre-Norm架构中,尺度向量并不增加表达能力;相反,它们通过对后续线性映射产生自放大预条件效应来改善优化。其次,我们研究了权重衰减对尺度向量的作用。通过区分Input-Norm和Output-Norm层,我们从理论上证明,由于它们在优化和表达能力中的不同作用,权重衰减对前者有益但对后者有害。第三,受此理解的启发,我们提出了三种轻量级且互补的尺度向量改进方法:分支特异性异质性、线性映射周围的改进放置以及幅度-方向重参数化。理论和实验均表明,每种改进都能带来一致的收益。最后,我们将这些改进整合为一个统一的尺度向量策略,并通过在0.12B到2B参数的密集和混合专家模型上进行大规模LLM预训练实验,使用多种优化器和学习率调度,在工业级token预算下进行评估。该统一策略始终比精心调整的基线获得更低的终端损失,并展现出更有利的扩展行为,同时增加可忽略的参数和计算开销。

English abstract

Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While the normalization operation has been extensively studied, the scale vector remains poorly understood despite its ubiquitous use. In this work, we present a systematic study of scale vectors in LLMs from the perspectives of expressivity, optimization, and architectural structure. First, we show empirically that although scale vectors constitute only a negligible fraction of model parameters, removing them substantially degrades LLM pre-training. Our theory further shows that, in Pre-Norm architectures, scale vectors do not increase expressivity; instead, they improve optimization through a self-amplifying preconditioning effect on subsequent linear mappings. Second, we investigate the role of weight decay for scale vectors. By distinguishing Input-Norm and Output-Norm layers, we theoretically show that weight decay is beneficial for the former but harmful for the latter, due to their distinct roles in optimization and expressivity. Third, motivated by this understanding, we propose three lightweight and complementary improvements to scale vectors: branch-specific heterogeneity, improved placement around linear mappings, and magnitude-direction reparameterization. Both theory and experiments show that each improvement yields consistent gains. Finally, we combine these improvements into a unified scale-vector strategy and evaluate it through extensive LLM pre-training experiments on dense and mixture-of-experts models ranging from 0.12B to 2B parameters, across multiple optimizers and learning rate schedules, under industrial-scale token budgets. The unified strategy consistently achieves lower terminal loss than well-tuned baselines and exhibits more favorable scaling behavior, while adding negligible parameter and computational overhead.
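
One way to see why a scale vector need not add expressivity when a linear map follows the normalization is that the scale can be folded into that map's weights. The sketch below checks this numerically; it is an illustration and not the paper's argument, and the RMS-style normalization, the shapes, and the names `rms_normalize`, `g`, and `W` are all assumptions.

```python
import numpy as np


def rms_normalize(x, eps=1e-6):
    """Deterministic normalization: divide each row by its root mean square."""
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)


rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x = rng.normal(size=(3, d_in))      # a small batch of activations
g = rng.normal(size=d_in)           # learnable scale vector (hypothetical values)
W = rng.normal(size=(d_out, d_in))  # weights of the subsequent linear map

# Normalization followed by the scale vector, then the linear map.
with_scale = (rms_normalize(x) * g) @ W.T

# Same function with the scale folded into the weights: W diag(g).
absorbed = rms_normalize(x) @ (W * g).T
```

Both expressions define the same function of `x`, yet gradient descent on `(g, W)` and on the folded weights follows different trajectories, which is consistent with the abstract attributing the benefit to optimization instead of expressivity.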

2605.26890 2026-05-27 q-fin.CP stat.AP stat.ML

Nonlinear and Heavy-Tailed Predictability in Transition-Energy Financial Markets

转型能源金融市场中的非线性和重尾可预测性

Kpante Emmanuel Gnandi, Fredy Pokou, Jules Sadefo Kamdem

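AI summary: To handle nonlinear dependence and heavy tails in transition-energy financial markets, proposes a hybrid forecasting framework combining Student-t vector autoregressions with nonlinear recurrent residual learning; empirically, the framework outperforms conventional Gaussian-linear models, with larger gains during periods of macro-financial stress.

Details
AI Chinese abstract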

与转型相关的金融市场日益面临突然的重定价事件、波动性加剧和异质性宏观金融冲击。在此条件下,传统的高斯线性预测框架可能无法完整描述化石能源、可再生能源、技术和公用事业部门资产之间的依赖结构。本文研究了转型相关金融收益在控制重尾多变量线性动态后是否表现出残差非线性可预测性。为解决这一问题,我们开发了一个混合预测框架,将Student-t向量自回归与非线性循环残差学习架构相结合。实证分析考虑了六只主要交易所交易基金,代表广泛股票市场和关键的转型敏感行业。结果揭示了与高斯线性行为的显著偏离,包括超额峰度、波动率聚类以及经过计量过滤后仍存在的非线性依赖。样本外预测实验表明,所提出的框架相对于传统VAR模型、独立机器学习方法和替代混合规范,持续提高了预测准确性。预测收益在宏观金融压力期更为显著,特别是在COVID-19危机和与乌克兰相关的能源冲击期间。总体而言,研究结果表明,转型相关金融系统表现出对制度敏感且重尾的预测动态,这些动态仅靠标准高斯线性模型无法充分捕捉。

English abstract

Transition-related financial markets are increasingly exposed to abrupt repricing episodes, elevated volatility, and heterogeneous macro-financial shocks. Under such conditions, conventional Gaussian-linear forecasting frameworks may provide an incomplete representation of the dependence structure linking fossil-energy, renewable-energy, technology, and utility-sector assets. This paper investigates whether transition-related financial returns exhibit residual non-linear predictability after controlling for heavy-tailed multivariate linear dynamics. To address this question, we develop a hybrid forecasting framework combining Student-t Vector Autoregressions with nonlinear recurrent residual learning architectures. The empirical analysis considers six major exchange-traded funds representing broad equity markets and key transition-sensitive sectors. The results reveal substantial departures from Gaussian-linear behavior, including excess kurtosis, volatility clustering, and remaining nonlinear dependence after econometric filtering. Out-of-sample forecasting experiments show that the proposed framework consistently improves predictive accuracy relative to conventional VAR models, standalone machine-learning methods, and alternative hybrid specifications. The forecasting gains become more pronounced during periods of macro-financial stress, particularly during the COVID-19 crisis and the Ukraine-related energy shock. Overall, the findings suggest that transition-related financial systems exhibit regime-sensitive and heavy-tailed predictive dynamics that are insufficiently captured by standard Gaussian-linear models alone.

2605.26888 2026-05-27 math.ST stat.TH

INARMA Models for Count Random Fields -- a Survey

计数随机场的INARMA模型——综述

Angelika Silbernagel, Christian H. Weiß

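AI summary: Surveys thinning-based integer-valued autoregressive moving-average (INARMA) models for count random fields on two-dimensional grids, covering different thinning operators, first- and higher-order models, and unilateral and multilateral model structures.

Details
Comments
11 pages, 2 figures
AI Chinese abstract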

基于稀疏算子的整数自回归滑动平均(INARMA)模型在计数时间序列中很受欢迎。最近,INARMA模型也被发展用于计数随机场,即位于规则二维网格上的空间计数数据。本文对现有的INARMA随机场进行了全面综述,涵盖了不同稀疏算子的方法、一阶和高阶模型,以及单边和多边模型结构。

English abstract

The thinning-based integer-valued autoregressive moving-average (INARMA) models are popular for count time series. Recently, types of INARMA models have also been developed for count random fields, i.e., for spatial count data located on a regular two-dimensional grid. This article provides a comprehensive survey on existing INARMA random fields, covering approaches with different thinning operators, first- and higher-order models, as well as unilateral and multilateral model structures.
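
The thinning idea is easiest to see in the simplest time-series case, the Poisson INAR(1) model X_t = α ∘ X_{t-1} + ε_t, where α ∘ X is a Binomial(X, α) draw. The sketch below simulates it; this is the classical one-dimensional model and not one of the random-field extensions surveyed, and the function name and parameter values are assumptions.

```python
import numpy as np


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate a Poisson INAR(1) path of length n with binomial thinning."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=np.int64)
    # Start from the stationary marginal, Poisson(lam / (1 - alpha)).
    x[0] = rng.poisson(lam / (1.0 - alpha))
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # thinning: alpha ∘ X_{t-1}
        x[t] = survivors + rng.poisson(lam)        # plus new arrivals
    return x


path = simulate_inar1(n=20000, alpha=0.5, lam=2.0)
```

Thinning keeps the process integer-valued, which multiplying a count by α would not, while still giving a lag-one autocorrelation of α as in a Gaussian AR(1).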

2605.26881 2026-05-27 math.ST math.DS stat.ME stat.TH

Robust ensemble Kalman filtering under observation noise misspecification via diffusion score matching

基于扩散分数匹配的观测噪声误设定下鲁棒集成卡尔曼滤波

Hans Reimann, Sebastian Reich

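AI summary: To address observation noise misspecification in Bayesian filtering of dynamical systems, proposes a robust Kalman filter that uses diffusion score matching to adjust information processing in the analysis step; establishes conjugacy, robustness, covariance stability, and high-dimensional consistency in linear Gaussian state space systems; derives EnKF, ESRF, and LETKF variants via ensemble approximations; and examines the trade-off between robustness and stability on target tracking and the Lorenz 63 and Lorenz 96 systems.

Details
AI Chinese abstract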

我们通过广义贝叶斯推断的最新进展,解决了动态系统贝叶斯滤波中的观测噪声误设定问题。真实数据生成过程与假设观测模型之间尾部衰减的不匹配(通常表现为频繁的异常值)会严重影响卡尔曼滤波中的贝叶斯更新和分析。现有方法通常采用检测-删除方案或协方差膨胀来避免吸收有影响的误设定实例。在分析更新勉强足以抵消预报不确定性的挑战性环境中,这些策略可能不稳定或难以提供可靠的不确定性量化。我们提出一种新颖的卡尔曼滤波器,通过在分析步骤中采用扩散分数匹配进行推断来调整信息处理,从而在保持良好量化不确定性的同时获得鲁棒性。我们提供了扩散分数匹配卡尔曼滤波器在线性高斯状态空间系统中的理论性质,涵盖分析步骤中的共轭性和闭式参数更新、鲁棒性、协方差稳定性以及调参和高维一致性。我们通过随机和确定性耦合以及实现局域化推导出集成近似,得到EnKF、ESRF和LETKF变体。我们在目标跟踪、混沌Lorenz 63系统和40维Lorenz 96系统的适当模拟研究中评估了这些方法。我们的见解突显了贝叶斯滤波中鲁棒性与稳定性之间的关键权衡。采用广义贝叶斯推断的方法可以驾驭这种平衡,并在结合非线性动力学和潜在非高斯观测噪声的挑战性环境中改进数据同化。

English abstract

We address the problem of observation noise misspecification in Bayesian filtering of dynamical systems via recent advances in generalized Bayesian inference. Mismatch in tail decay between the true data generating process and an assumed observation model, often showing via frequent outliers, can strongly impact Bayesian updates and analysis in Kalman filtering. Existing approaches often employ detect-and-delete schemes or covariance inflation to avoid assimilation of influential instances of misspecification. In challenging settings where the analysis updates are barely sufficient to counteract the induced forecast uncertainty, these strategies may destabilize or struggle to provide reliable uncertainty quantification. We consider a novel Kalman filter adjusting information processing in the analysis step by employing diffusion score matching for inference to obtain robustness while maintaining well-quantified uncertainties. We provide theoretical properties of the diffusion score matching Kalman filter in linear Gaussian state space systems covering conjugacy and closed-form parameter update in the analysis step, robustness, covariance stability, and tuning as well as high-dimensional consistency. We derive ensemble approximations via stochastic and deterministic coupling as well as implementing localization to obtain EnKF, ESRF and LETKF varieties. We evaluate the methods in appropriate simulation studies on target-tracking, the chaotic Lorenz 63 system and the Lorenz 96 system in 40 dimensions. Our insights highlight a critical trade-off between robustness and stability in Bayesian filtering. Methods employing generalized Bayesian inference can navigate this balance and improve data assimilation in challenging environments combining non-linear dynamics and potentially non-Gaussian observation noise.
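
The sensitivity to outliers described above can be seen in the textbook Kalman analysis step, where the update to the mean is linear in the innovation. The sketch below implements that standard step only; it is not the diffusion score matching filter proposed in the paper, and the function name and toy numbers are assumptions.

```python
import numpy as np


def kalman_analysis(m_f, P_f, y, H, R):
    """Standard Kalman analysis step for a linear Gaussian observation model."""
    S = H @ P_f @ H.T + R                  # innovation covariance
    # Gain K = P_f H^T S^{-1}, computed via a solve (S and P_f are symmetric).
    K = np.linalg.solve(S, H @ P_f).T
    m_a = m_f + K @ (y - H @ m_f)          # mean moves linearly with the innovation
    P_a = (np.eye(len(m_f)) - K @ H) @ P_f
    return m_a, P_a


m_f = np.array([0.0])
P_f = np.array([[1.0]])
H = np.array([[1.0]])
R = np.array([[1.0]])

m_typical, P_typical = kalman_analysis(m_f, P_f, np.array([2.0]), H, R)
m_outlier, P_outlier = kalman_analysis(m_f, P_f, np.array([100.0]), H, R)
```

With equal forecast and observation variances the gain is 0.5, so an observation at 100 pulls the analysis mean to 50 while the analysis covariance is the same as for the typical observation; the filter has no mechanism to discount the outlier.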

2605.26843 2026-05-27 stat.AP stat.ME

A warning system for risk prediction of metabolic syndrome in a healthy population of blood donors

健康献血人群代谢综合征风险预测预警系统

Simone Colombara, Ilenia Epifani, Alessandra Guglielmi, Ettore Lanzarone

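AI summary: Proposes a Bayesian statistical model that uses longitudinal data to predict the probability of metabolic syndrome among blood donors, and builds a traffic-light warning system to support early intervention.

Details
AI Chinese abstract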

代谢综合征是一种复杂的临床状况,其特征是同时存在多种代谢风险因素,是一个主要的公共卫生问题。该综合征悄然发展,可能长期未被诊断,这凸显了在明显疾病发作前研究早期代谢变化的重要性。对主要健康个体的纵向监测可能有助于早期识别代谢风险。本文提出了一种贝叶斯统计模型,用于在献血前筛查中估计献血者患代谢综合征的概率,并结合了先前访视收集的信息。利用意大利主要献血者协会之一AVIS米兰的纵向数据,我们分析了主要健康献血者群体的重复临床和生活方式测量值。具体而言,我们拟合了一个贝叶斯多元模型,该模型联合表示代谢综合征五个诊断成分的对数。该模型考虑了重复访视中献血者内部的依赖性,并提供了个体风险的概率估计。我们的框架旨在为AVIS米兰的临床医生在献血前筛查中提供一个可解释的交通灯预警系统(低、中、高风险),以促进识别未来访视中有代谢综合征风险的个体,并在常规献血者评估中支持有针对性的预防干预,最终为意大利国家医疗系统长期降低医疗成本。

English abstract

Metabolic syndrome is a complex clinical condition characterized by the simultaneous presence of multiple metabolic risk factors and represents a major public health concern. The syndrome develops silently and may remain undiagnosed for long periods, highlighting the importance of investigating early metabolic alterations before overt disease onset. Longitudinal monitoring of predominantly healthy individuals may help identify metabolic risk early. The paper proposes a Bayesian statistical model to estimate the probability of metabolic syndrome among blood donors during pre-donation screening, incorporating information collected at previous visits. Using longitudinal data from one of the main blood donor associations in Italy, AVIS Milan, we analyze repeated clinical and lifestyle measurements from a predominantly healthy population of donors. In particular, we fit a Bayesian multivariate model that jointly represents the logarithm of the five diagnostic components of metabolic syndrome. The model accounts for within-donor dependence across repeated visits and provides probabilistic estimates of individual risk. Our framework aims to provide clinicians at AVIS Milan with an interpretable traffic-light warning system (low, intermediate, high risk) during pre-donation screening to facilitate the identification of individuals at risk of metabolic syndrome at future visits and to support targeted preventive interventions during routine donor assessment, ultimately contributing to a long-term reduction in healthcare costs for the Italian national healthcare system.

2605.26800 2026-05-27 math.ST stat.TH

Accelerated Schrödinger-Föllmer samplers

加速的薛定谔-福尔默采样器

Haotian Lin, Xiaojie Wang, Xiaoyan Zhang

AI总结 提出一种高效的随机龙格-库塔方案来加速薛定谔-福尔默采样器,用于复杂高维多模态分布采样,并证明其L^2-Wasserstein距离下的收敛阶为O(h^{3/2}|ln h|)。

详情
AI中文摘要

采样是科学计算、统计学和机器学习等多个学科广泛应用中的基本算法任务。本文提出了一种高效的随机龙格-库塔方案来加速薛定谔-福尔默采样器,该采样器设计用于从复杂的高维多模态分布中采样。所得到的随机龙格-库塔薛定谔-福尔默采样器(SRKSFS)被证明在$L^2$-Wasserstein距离下达到$\mathcal{O}(h^{3/2}|\ln h|)$阶的收敛速度,显著改进了现有欧拉型采样器的$\mathcal{O}(h)$阶。然而,获得增强的收敛速度并非易事,因为注意到扩散过程的漂移关于时间变量不可微,仅具有$\frac{1}{2}$-Hölder连续性。为了解决这一困难,我们依赖于精细的误差估计来克服漂移时间导数引起的奇异性,代价是引入对数因子。此外,该框架被扩展到基于经验度量的数据驱动薛定谔-福尔默生成,使得无需已知密度即可进行数据驱动采样。报告了各种数值实验以验证所提出采样算法的有效性。

英文摘要

Sampling is a fundamental algorithmic task in wide-ranging applications across multiple disciplines such as scientific computing, statistics and machine learning. In this paper, an efficient stochastic Runge-Kutta scheme is proposed to accelerate the Schrödinger-Föllmer sampler, designed for sampling from complex and high-dimensional multimodal distributions. The resulting stochastic Runge-Kutta Schrödinger-Föllmer sampler (SRKSFS) is proved to achieve a convergence rate of order $\mathcal{O} ( h^{3/2} |\ln h|)$ in the $L^2$-Wasserstein distance, considerably improving on the order $\mathcal{O}(h)$ of the existing Euler-type sampler. Obtaining the enhanced convergence rate is, however, not trivial, because the drift of the diffusion process is not differentiable but only $\frac{1}{2}$-Hölder continuous with respect to the time variable. To address this difficulty, we rely on delicate error estimates to overcome the singularity due to time derivatives of the drift, at the expense of the logarithmic factor. Furthermore, the framework is extended to data-driven Schrödinger-Föllmer generation with empirical measures, enabling data-driven sampling without a known density. A variety of numerical experiments are reported to validate the effectiveness of the proposed sampling algorithms.

2605.26753 2026-05-27 math.ST stat.TH

Estimating the logistic regression equation when the model is incorrect

当模型不正确时估计逻辑回归方程

Nils Lid Hjort

AI总结 研究当逻辑回归模型仅为真实函数的近似时,基于似然的估计量(如最大似然估计)的一致性和渐近正态性,并指出最小化加权距离的“最小假”参数值。

详情
Comments
8 pages, 0 figures. Statistical Research Report, Department of Mathematics, University of Oslo, January 1990; the material foreshadows later developments regarding local likelihood (for regression and for densities), weighted likelihood, robust inference, and more
AI中文摘要

温和地反对精确正确参数模型的概念,采用逻辑回归方程仅仅是潜在真实函数近似的观点。在此一般框架下研究基于似然的估计量的行为。证明了最大似然估计对于某个最小化加权平均距离(衡量真实模型与参数模型之间差异的量)的“最小假”参数值具有一致性。还证明了渐近正态性。最后提供了一些附加评论,其中一些指向自然的推广,另一些指向新的研究问题,如加权和局部似然估计方法。

英文摘要

Protesting mildly against the notion of an exactly correct parametric model, the view is adopted that the logistic regression equation is merely an approximation to the underlying, true function. The behaviour of likelihood-based estimators is investigated in such a general framework. The maximum likelihood estimator is shown to be consistent for a certain least false parameter value minimising a weighted average of quantities that measure the distance from the true to the parametric model. Asymptotic normality is also demonstrated. Finally, a number of additional remarks are offered, some pointing to natural generalisations and some to new questions for research, like weighted and local likelihood estimation methods.

2605.26723 2026-05-27 stat.ME math.ST stat.CO stat.TH

Marginal likelihoods for finite-support Huber contamination

有限支撑Huber污染的边际似然

Jaehoan Kim

AI总结 针对有限样本空间上的Huber污染,通过Dirichlet先验和Beta先验解析积分得到结构参数的精确边际似然,并利用动态规划实现高效计算。

详情
Comments
16 pages, 3 figures
AI中文摘要

对于已知有限样本空间上的Huber污染,无限制的污染律是支撑原子上的概率向量,且对所有可测子集的支配简化为逐点不等式。在此概率向量上放置Dirichlet先验,在污染比例上放置Beta先验,在对两个 nuisance 量进行解析积分后,得到结构参数的精确边际似然。该似然是观测计数在结构分量和污染分量之间分配的有限加权和。对于固定支撑大小,该和及其得分可以通过样本量二次成本的动态规划进行评估,从而实现基于梯度的后验采样。

英文摘要

For Huber contamination on a known finite sample space, the unrestricted contaminating law is a probability vector on the support atoms, and domination over all measurable subsets reduces to atomwise inequalities. Placing a Dirichlet prior on this probability vector and a Beta prior on the contamination proportion gives an exact marginal likelihood for the structural parameter after analytic integration of both nuisance quantities. The likelihood is a finite weighted sum over allocations of the observed counts between the structural and contaminating components. For fixed support size, this sum and its score can be evaluated by a dynamic program with quadratic cost in the sample size, enabling gradient-based posterior sampling.

2605.26713 2026-05-27 stat.ML cs.LG

Transformers Can Learn Posterior Predictive Distributions In-Context

Transformer可以在上下文中学习后验预测分布

Gyeonghun Kang, Changwoo J. Lee, Xiang Cheng

AI总结 本文通过构造证明Transformer能够实现针对后验预测均值和方差的梯度下降算法,并研究其逼近后验预测分布的误差界,揭示了归一化和注意力深度对泛化能力的关键作用。

详情
AI中文摘要

先验数据拟合网络(PFN)最近已成为贝叶斯预测任务的一种强大方法,通过上下文学习近似后验预测分布(PPD)。尽管它们具有强大的实证性能和超越点预测的能力,但对Transformer在上下文中学习分布的算法能力的理论理解仍然缺乏。聚焦于高斯过程回归问题,我们通过构造证明Transformer可以实现针对后验预测均值和方差的梯度下降算法,随后通过非线性映射产生PPD的分箱概率。我们根据注意力深度和分箱分辨率研究了近似PPD的误差界。基于这些结果,我们进一步证明了归一化和注意力深度的选择在使Transformer能够超越预训练样本大小范围进行外推中的关键作用。我们进行了模拟实验,验证了我们的发现,为针对PPD的PFN的表达能力以及架构选择如何影响泛化能力提供了见解。

英文摘要

Prior-data fitted networks (PFNs) have recently emerged as a powerful approach for Bayesian prediction tasks, approximating the posterior predictive distribution (PPD) through in-context learning. Despite their strong empirical performance and ability to go beyond point predictions, a theoretical understanding of the algorithmic capability of transformers to learn distributions in context is still lacking. Focusing on Gaussian process regression problems, we show by construction that transformers can implement a gradient descent algorithm targeting the posterior predictive mean and variance, followed by nonlinear mappings that yield binned probabilities of the PPD. We study the error bounds of the approximated PPD in terms of attention depth and bin resolution. Based on these results, we further demonstrate the key role of normalization and the choice of attention depth in enabling the extrapolation abilities of transformers beyond the pretraining sample size range. We conduct simulations that corroborate our findings, providing insight into the expressivity of PFNs targeting PPDs and how architectural choices may influence generalization capabilities.
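
For orientation, the target that such a network approximates can be written down in closed form. The sketch below is not the transformer construction from the paper: it computes the exact Gaussian process posterior predictive mean and variance directly and then converts them to binned probabilities. The RBF kernel, the noise level, and the function names are assumptions chosen only for illustration.

```python
import math
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential kernel with unit prior variance (assumed kernel).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    # Exact posterior predictive mean and variance of a noisy observation.
    K = rbf(x_train, x_train) + noise**2 * np.eye(len(x_train))
    k_star = rbf(x_train, x_test)                      # shape (n_train, n_test)
    mean = k_star.T @ np.linalg.solve(K, y_train)
    var = 1.0 + noise**2 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, var

def binned_probs(mean, var, edges):
    # Gaussian probability mass falling into each bin [edges[i], edges[i+1]).
    sd = math.sqrt(max(var, 1e-12))
    cdf = [0.5 * (1.0 + math.erf((e - mean) / (sd * math.sqrt(2.0)))) for e in edges]
    return np.diff(cdf)
```

A finer grid of `edges` gives a closer discrete approximation to the Gaussian predictive density, which is the role the abstract assigns to bin resolution.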

2605.25852 2026-05-27 stat.ME

A Post-Processing Conformal Prediction Approach for Conditional Coverage via Pivotal Scores

通过枢轴分数实现条件覆盖的后处理共形预测方法

Félix Laplante

AI总结 本文提出PIT-CP后处理校正方法,通过将非共形分数映射为近似特征不变的分数,在保持几何结构、可解释性和边际覆盖的同时实现条件覆盖,并利用一维条件密度估计替代原始结果空间的全条件密度估计。

详情
Comments
33 pages, 4 figures
AI中文摘要

虽然共形预测(CP)已被证明是用于不确定性量化的强大框架,但保证条件覆盖仍然是一个核心挑战。尽管已知在没有结构假设的情况下,有限样本、分布自由的条件有效性是不可能的,但我们表明它在本质上等价于构造一个其分布与特征无关的非共形分数。这一理论特征激发了PIT-CP,一种新的后处理校正方法,它将任何基础非共形分数映射为近似不变的分数,同时保持其几何结构、可解释性和边际覆盖。这一视角在实践中特别有吸引力,因为当强大的预测驱动模型已经提供高度准确的点估计时,重新训练完整的生成模型可能既不经济也不省时。我们的程序将问题简化为对诱导分数的一维条件密度估计,而不是对原始结果空间的完整条件密度估计。我们展示了如何在实践中估计这种变换,并推导了条件覆盖差距的界限,以及体积和对称差界限。我们介绍了已知的极小极大最优条件估计技术,同时激励使用现代条件密度估计器,包括混合密度网络和条件归一化流。最后,我们在各种数据集上实证表明,我们的PIT-CP程序以最小的努力和计算成本匹配或超越了多种最先进的共形预测策略。

英文摘要

While Conformal Prediction (CP) has proven to be a powerful framework for uncertainty quantification, guaranteeing conditional coverage remains a central challenge. Although finite-sample, distribution-free conditional validity is known to be impossible without structural assumptions, we show that it is fundamentally equivalent to constructing a nonconformity score whose distribution is independent of the features. This theoretical characterization motivates PIT-CP, a new post-processing correction that maps any base nonconformity score to an approximately invariant one while preserving its geometry, interpretability, and marginal coverage. This perspective is particularly appealing in practice, since it may be neither economical nor time-effective to retrain a full generative model when a strong prediction-driven model already provides highly accurate point estimates. Our procedure reduces the problem to one-dimensional conditional density estimation on the induced score, rather than full conditional density estimation on the original outcome space. We show how to estimate this transform in practice and derive bounds on the conditional coverage gap, alongside volumetric and symmetric-difference bounds. We present known minimax-optimal conditional estimation techniques while also motivating the use of modern conditional density estimators, including Mixture Density Networks and Conditional Normalizing Flows. Finally, we empirically demonstrate on various datasets that our PIT-CP procedure matches or outperforms many state-of-the-art conformal prediction strategies with minimal effort and computational cost.
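
The post-processing idea can be sketched on top of ordinary split conformal prediction. The example below is a minimal illustration, not the authors' implementation: the conditional CDF of the base score is taken to be exponential with a feature-dependent scale, which is a purely hypothetical stand-in for a fitted conditional density estimator, and all function names are invented here.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    # Standard split-conformal rank: the ceil((n+1)(1-alpha))-th smallest score.
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return 1.0  # transformed scores live in [0, 1]; this yields the trivial set
    return float(np.sort(scores)[k - 1])

def pit_scores(base_scores, scale):
    # Probability integral transform under the assumed conditional CDF
    # F(s | x) = 1 - exp(-s / scale(x)).
    return 1.0 - np.exp(-base_scores / scale)

def pit_radius(u_hat, scale):
    # Invert the assumed CDF to map the calibrated level back to a
    # feature-dependent threshold on the base score.
    with np.errstate(divide="ignore"):
        return -scale * np.log1p(-u_hat)
```

With base score `|y - mu(x)|`, the prediction set at `x` is the interval `mu(x) ± pit_radius(u_hat, scale(x))`. Because the calibration step is still a conformal quantile, marginal coverage does not depend on the assumed CDF being correct; only how well coverage holds conditionally does.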

2605.13522 2026-05-27 math.ST stat.TH

Dependence functions based on Chatterjee's rank correlation

基于Chatterjee秩相关的依赖函数

Carsten Limbach

AI总结 本文通过Markov积重新解释Chatterjee的ξ系数,引入两个依赖函数φ和κ,以几何方式量化响应变量Y与预测变量X之间的随机依赖。

详情
AI中文摘要

我们研究了Chatterjee的$ξ$系数的几何和分布重新解释,该系数衡量响应变量$Y$与预测向量$\mathbf{X}$之间的函数依赖性。为此,我们分析了Markov积$(Y,Y')$,其中$Y'$是$Y$的一个副本,在给定$\mathbf{X}$的条件下与$Y$条件独立。基于这一构造,我们引入并研究了两个依赖函数,记为$ϕ_{(Y,\mathbf{X})}$和$κ_{(Y,\mathbf{X})}$。所提出的框架提供了Markov积的几何解释,并将Chatterjee相关系数扩展为一个更丰富、更可解释的对象,用于分析有向随机依赖。特别是,所提出的依赖函数不仅衡量$Y$可以被表示为$\mathbf{X}$的函数的程度,还额外量化了相应的Markov积集中在对角线附近的强度。

英文摘要

We investigate a geometric and distributional reinterpretation of Chatterjee's $ξ$-coefficient, which measures functional dependence between a response variable $Y$ and a predictor vector $\mathbf{X}$. For this purpose, we analyze the Markov product $(Y,Y')$, where $Y'$ is a copy of $Y$ that is conditionally independent of $Y$ given $\mathbf{X}$. Based on this construction, we introduce and study two dependence functions, denoted by $ϕ_{(Y,\mathbf{X})}$ and $κ_{(Y,\mathbf{X})}$. The proposed framework provides a geometric interpretation of the Markov product and extends Chatterjee's correlation coefficient to a richer and more interpretable object for the analysis of directed stochastic dependence. In particular, rather than only measuring how well $Y$ can be represented as a function of $\mathbf{X}$, the proposed dependence functions additionally quantify how strongly the corresponding Markov product is concentrated near the diagonal.
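
As background, the classical sample version of Chatterjee's coefficient for a single predictor can be computed in a few lines. This sketch covers only that scalar, tie-free case; it does not implement the Markov product or the dependence functions introduced in the paper, and the function name is chosen here for illustration.

```python
import numpy as np

def chatterjee_xi(x, y):
    # Chatterjee's rank correlation xi_n, assuming no ties in x or y.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    order = np.argsort(x, kind="stable")            # arrange pairs by increasing x
    ranks = np.argsort(np.argsort(y[order])) + 1    # ranks of y in that order, 1..n
    return 1.0 - 3.0 * np.sum(np.abs(np.diff(ranks))) / (n**2 - 1)
```

The statistic is close to 0 when `y` is independent of `x` and approaches 1 when `y` is a noiseless function of `x`, which is the directed notion of dependence the abstract refers to.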

2605.04932 2026-05-27 stat.ML cs.LG

Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift

协变量漂移下部署风险的雅可比-速度界

Jonathan R. Landers

AI总结 针对动态协变量漂移下冻结预测器的长期部署风险,提出基于时域庞加莱不等式和雅可比-速度定理的路径控制方法,并设计漂移对齐切线正则化(DTR)以降低风险波动。

详情
Comments
8 pages, 4 figures, 4 tables
AI中文摘要

我们研究了动态协变量漂移下冻结预测器的长期部署问题。时域庞加莱不等式首先将时间风险波动降低为导数能量。然后,雅可比-速度定理提供了相应的路径控制。在明确的规则性和支配假设下,该定理将沿部署路径的方向切线能量识别为控制量。在低秩漂移下,该量减少为漂移子空间中的方向雅可比能量,从而激发了漂移对齐切线正则化(DTR)和匹配的监测代理。DTR不是各向同性地平滑网络,而是仅沿估计的漂移方向惩罚敏感性。我们通过四个实验验证了从定理到方法的流程:一个用于时域不等式的合成基准,一个与各向同性雅可比正则化对比的受控合成实验,以及在UCI空气质量数据集和Tetouan电力消耗数据集上的两个冻结部署研究。DTR在受控低秩区域降低了风险波动和方向增益,并优于各向同性平滑。它还在两个真实数据集上给出了验证选择的部署增益,其中空气质量子空间是从目标正交传感器运动估计的。适度的漂移子空间错误指定是可容忍的,而正交错误指定则基本消除了收益。

英文摘要

We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincare inequality first reduces temporal risk volatility to derivative energy. A Jacobian-velocity theorem then supplies the corresponding pathwise control. Given explicit regularity and domination assumptions, the theorem identifies directional tangent energy along the deployment path as the governing quantity. Under low-rank drift, that quantity reduces to directional Jacobian energy in the drift subspace, motivating drift-aligned tangent regularization (DTR) and a matched monitoring proxy. Rather than smoothing the network isotropically, DTR penalizes sensitivity only along estimated drift directions. We validate the theorem-to-method pipeline in four experiments: a synthetic benchmark for the time-domain inequality, a controlled synthetic comparison against isotropic Jacobian regularization, and two frozen-deployment studies on the UCI Air Quality and Tetouan power-consumption datasets. DTR reduces risk volatility and directional gain in the controlled low-rank regime and beats isotropic smoothing there. It also gives validation-selected deployment gains on both real datasets, with the Air Quality subspace estimated from target-orthogonal sensor motion. Moderate drift-subspace misspecification is tolerable while orthogonal misspecification largely removes the benefit.

2605.26693 2026-05-27 cs.LG cs.AI stat.ML

Model Merging on Loss Landscape: A Geometry Perspective

损失景观上的模型合并:几何视角

Juanwu Lu, Anand Bhaskar, Brian Axelrod, Ekaterina Tolstaya, Tristan Emrich

AI总结 提出EpiMer框架,将模型合并视为黎曼流形上的Fréchet均值,利用任务向量张成的低秩子空间和期望Hessian度量,理论证明曲率感知合并优于平坦几何方法,并在八个图像分类任务上验证了性能提升。

详情
Comments
CVPR 2026 Findings Track. 18 pages, 4 figures, 6 tables
AI中文摘要

模型合并为无需重新训练的知识集成和并行开发提供了有前景的途径。然而,现有方法要么忽略损失景观的几何结构,要么依赖于难以处理的全空间Hessian近似。我们提出EpiMer,一个将模型合并视为黎曼流形上Fréchet均值求解的框架,并将计算限制在由任务向量张成的低秩子空间内。以期望Hessian作为度量,我们揭示了局部曲率与参数认知不确定性之间的联系。我们的理论分析将合并误差界分解为子空间Fréchet方差和残差能量,并提供了曲率感知合并何时在理论上优于平坦几何方法的闭式刻画。此外,我们的框架将曲率感知方法和最近的谱方法统一为不同几何度量下子空间Fréchet均值的特例。在八个图像分类任务上合并微调的CLIP-ViT模型,Epistemic Merging在匹配秩下严格优于所有三个CLIP-ViT骨干网络的基线,提高了每个骨干网络上的跨任务平均准确率和最差任务准确率。

英文摘要

Model merging offers a promising avenue for knowledge integration and parallel development without retraining. Yet, existing methods either ignore the geometry of the loss landscape or rely on intractable full-space Hessian approximations. We propose EpiMer, a framework that casts model merging as solving the Fréchet mean on a Riemannian manifold and restricts the computation to a low-rank subspace spanned by the task vectors. With the expected Hessian as the metric, we reveal a connection between local curvature and epistemic uncertainty of the parameters. Our theoretical analysis decomposes the merging error bound into the subspace Fréchet variance and the residual energy, and provides a closed-form characterization of when curvature-aware merging provably outperforms flat-geometry methods. In addition, our framework unifies both curvature-aware methods and recent spectral methods as special cases of the subspace Fréchet mean with different geometric metrics. Merging fine-tuned CLIP-ViT models on eight image classification tasks, Epistemic Merging strictly outperforms the baselines on all three CLIP-ViT backbones at matched rank, improving the across-task average accuracy and worst-task accuracy on every backbone.

2605.26675 2026-05-27 stat.ML cs.LG

CART Random Forests as Sequential Allocation over Random Opportunity Sets: A Stochastic-Control Theory of Ensemble Risk

CART随机森林作为随机机会集上的序贯分配:集成风险的随机控制理论

Tianxing Mei, Yingying Fan, Mingming Leng, Jinchi Lv

AI总结 本文从随机控制视角将CART随机森林建模为随机机会集上的序贯分配过程,通过分离特征子采样和信息分裂策略两个设计杠杆,揭示了森林均方误差的构成,并证明了CART策略的局部稳定性与全局次优性。

详情
Comments
69 pages, 1 figure
AI中文摘要

CART随机森林是最广泛使用的现代预测方法之一,具有充分记录的经验成功。然而,在机制层面,由于其复杂性,该算法通常被视为黑箱。在本文中,我们发展了特征子采样CART随机森林的随机控制视角,称为CART随机机会集分配(CART-ROSA)。在每个节点,特征的随机子集被解释为随机可行动作集,CART分裂规则被解释为掩码动作分配策略。该策略在信息性分裂计数状态上诱导出一个受控的随机过程,其终末分布决定了森林均方误差(MSE)中的单棵树误差和树间交互项。这种表示通过分离两个设计杠杆——特征子采样引起的信息性机会率和掩码内分裂策略的收缩强度——打开了CART森林的黑箱。我们证明CART策略是局部稳定的:它收缩了信息性分裂分配中的不平衡,并集中了终末树的几何结构。然而,在系统层面,它对森林目标可能是全局次优的。针对线性模型,我们显式推导了MSE风险展开。我们的结果表明,运筹学视角如何使从CART森林的标准算法描述难以触及的理论缺口变得可处理。

英文摘要

CART random forests are among the most widely used modern predictive methods, with well-documented empirical success. Yet, at the mechanistic level, the algorithm is often treated as a black box because of its complexity. In this paper, we develop a stochastic-control perspective on feature-subsampled CART random forests, named CART random opportunity-set allocation (CART-ROSA). At each node, the random subset of features is interpreted as a random feasible action set, and the CART split rule as a masked-action allocation policy. This policy induces a controlled stochastic process over informative split-count states, whose terminal law determines both single-tree error and cross-tree interaction terms in the forest mean squared error (MSE). Such representation opens the black box of CART-forests by separating two design levers: the informative-opportunity rate induced by feature subsampling, and the contraction strength from the within-mask split policy. We establish that the CART policy is locally stabilizing: it contracts imbalances in informative split allocations and concentrates terminal tree geometry. At the system level, however, it can be globally suboptimal for the forest objective. Specializing to the linear model, we derive the MSE risk expansion explicitly. Our results show how an operations-research perspective makes tractable a theoretical gap difficult to access from the standard algorithmic description of CART forests.

2605.26654 2026-05-27 cs.LG cs.AI math.OC stat.ML

Bilevel Optimization over Saddle Points of Zero-Sum Markov Games

零和马尔可夫博弈鞍点上的双层优化

Zihao Zheng, Irwin King, Songtao Lu

AI总结 针对下层为零和马尔可夫博弈的双层优化问题,提出基于惩罚的Nikaido-Isoda下降-上升方法(PANDA),避免计算超梯度且无需二阶信息,在无凸性假设下收敛到平稳点,达到与单策略下层MDP双层RL相当的最优速率。

详情
Comments
Accepted to the International Conference on Machine Learning (ICML 2026)
AI中文摘要

强化学习(RL)通常具有层次结构,其中上层(UL)学习器选择模型参数,下层(LL)决策过程做出响应,自然形成双层优化问题。大多数现有的双层RL方法假设下层为单策略马尔可夫决策过程(MDP),因此无法捕捉激励设计等应用中出现的竞争结构,其中多个策略相互交互。我们研究了下层问题为正则化极小极大零和马尔可夫博弈、上层目标通过下层博弈诱导的鞍点均衡进行优化的双层优化问题。在这项工作中,我们提出了惩罚增强的Nikaido-Isoda下降-上升(PANDA),一种基于Nikaido-Isoda函数的惩罚一阶策略梯度方法。通过利用极小极大博弈结构,PANDA避免了计算上层超梯度,且不需要二阶信息。我们证明了PANDA在无需对上层或下层目标做凸性假设的情况下收敛到平稳点。此外,PANDA在$\tilde{\mathcal{O}}(ε^{-1})$次迭代内达到$ε$-平稳点,样本复杂度为$\tilde{\mathcal{O}}(ε^{-3})$,与单策略下层MDP的双层RL的最佳已知速率相匹配。实验表明PANDA优于密切相关基线方法。

英文摘要

Reinforcement learning (RL) often has a hierarchical structure, where an upper-level (UL) learner selects model parameters and a lower-level (LL) decision-making process responds, naturally leading to a bilevel optimization problem. Most existing bilevel RL methods assume a single-policy LL Markov decision process (MDP), and therefore fail to capture competitive structures arising in applications such as incentive design, where multiple policies interact. We study bilevel optimization problems in which the LL problem is a regularized min-max zero-sum Markov game and the UL objective is optimized through the saddle-point equilibrium induced by the LL game. In this work, we propose penalty-augmented Nikaido-Isoda descent-ascent (PANDA), a penalty-based first-order policy-gradient method based on the Nikaido-Isoda function. By exploiting the min-max game structure, PANDA avoids computing UL hypergradients and does not require second-order information. We prove that PANDA converges to stationary points without convexity assumptions on either the UL or LL objectives. Moreover, PANDA reaches an $ε$-stationary point in $\tilde{\mathcal{O}}(ε^{-1})$ iterations with sample complexity $\tilde{\mathcal{O}}(ε^{-3})$, matching the best-known rates for bilevel RL with single-policy LL MDPs. Experiments demonstrate the superior performance of PANDA over closely related baselines.
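
To make the Nikaido-Isoda idea concrete, the sketch below evaluates the gap obtained from the Nikaido-Isoda function for a one-shot zero-sum matrix game. This is a deliberately simplified setting chosen for illustration: it is unregularized, has no Markov dynamics, and is not the PANDA algorithm. The function name and the matrix-game formulation are assumptions.

```python
import numpy as np

def nikaido_isoda_gap(A, x, y):
    # Game: the row player picks x to minimize x^T A y, the column player
    # picks y to maximize it, both over probability simplices.
    # Gap = (best deviation payoff for y) - (best deviation payoff for x).
    best_for_max_player = np.max(A.T @ x)   # max over y' of x^T A y'
    best_for_min_player = np.min(A @ y)     # min over x' of x'^T A y
    return float(best_for_max_player - best_for_min_player)
```

The gap is nonnegative and equals zero exactly at a saddle point, which is what makes it usable as a penalty term for the lower-level equilibrium condition.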

2605.26647 2026-05-27 cs.LG cs.AI stat.ML

More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations

更具表达力的前馈层:第一部分。激活的令牌自适应混合

Mingze Wang, Jinbo Wang, Yikuan Xia, Kai Shen, Shu Zhong

AI总结 提出令牌自适应激活混合(MoA)和可学习激活(LA)方法,通过轻量级输入相关门混合多个激活函数,在理论和实验上证明其比固定激活FFN具有更强的表达能力和更优的缩放行为。

详情
Comments
31 pages
AI中文摘要

前馈网络(FFN)层在基于Transformer的大语言模型(LLMs)中占据了大部分参数和非线性表达能力。尽管从ReLU和GELU发展到门控变体如SwiGLU,大多数FFN设计仍使用单一固定激活函数,对所有令牌应用相同的非线性变换。在这项工作中,我们提出了激活混合(MoA),一种令牌自适应的FFN设计,它使用轻量级输入相关门混合一个激活函数字典,同时共享相同的线性投影。作为输入无关的对应,我们还引入了可学习激活(LA),它为ReLU型和SwiGLU型FFN形成激活函数的线性组合。理论上,我们在固定激活FFN、LA和MoA之间建立了严格的有限宽度表达分离:LA严格包含固定激活FFN,而MoA严格包含LA,额外的表达能力来自于输入相关的非线性混合。实验上,我们通过在不同令牌预算、优化器和学习率调度下,对0.12B到2B参数的密集和MoE语言模型进行广泛的预训练实验来评估MoA。与调整良好的基线相比,MoA始终获得更低的最终损失,并表现出更有利的缩放行为,且参数和计算开销极小。这些结果表明,令牌自适应激活混合是提高LLMs中FFN表达能力的一种简单而有效的机制。

英文摘要

Feedforward network (FFN) layers account for a large fraction of parameters and nonlinear expressivity in Transformer-based large language models (LLMs). Despite the evolution from ReLU and GELU to gated variants such as SwiGLU, most FFN designs still use a single fixed activation function, applying the same nonlinear transformation to all tokens. In this work, we propose Mixture of Activations (MoA), a token-adaptive FFN design that mixes a dictionary of activation functions using lightweight input-dependent gates while sharing the same linear projections. As an input-independent counterpart, we also introduce learnable activations (LA), which form linear combinations of activation functions for both ReLU-type and SwiGLU-type FFNs. Theoretically, we establish strict finite-width expressive separations among fixed-activation FFNs, LA, and MoA: LA strictly contains fixed-activation FFNs, while MoA strictly contains LA, with the additional expressivity arising from input-dependent nonlinear hybridization. Empirically, we evaluate MoA through extensive pre-training experiments on dense and MoE language models ranging from 0.12B to 2B parameters under different token budgets, optimizers, and learning rate schedules. MoA consistently achieves lower terminal loss and exhibits more favorable scaling behavior than well-tuned baselines, with minimal parameter and computational overhead. These results suggest that token-adaptive activation mixing is a simple and effective mechanism for improving FFN expressivity in LLMs.
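
A rough forward pass for a token-adaptive activation mixture might look as follows. The abstract does not specify the gate parameterization, so the softmax gate, the three-function dictionary, and the weight names below are all assumptions made for this sketch rather than the paper's design.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gelu(z):
    # Common tanh approximation of GELU.
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def silu(z):
    return z / (1.0 + np.exp(-z))

ACTIVATIONS = (relu, gelu, silu)   # assumed dictionary of activations

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moa_ffn(x, w_in, w_out, w_gate):
    # x: (tokens, d_model), w_in: (d_model, d_ff),
    # w_out: (d_ff, d_model), w_gate: (d_model, n_activations)
    pre = x @ w_in                                   # one shared up-projection
    gates = softmax(x @ w_gate)                      # per-token mixing weights
    acts = np.stack([f(pre) for f in ACTIVATIONS], axis=-1)
    hidden = np.einsum("tfk,tk->tf", acts, gates)    # token-dependent mixture
    return hidden @ w_out
```

Replacing `gates` with a single learned vector shared by all tokens would give an input-independent combination in the spirit of the learnable activations (LA) variant.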

2605.26640 2026-05-27 eess.SY cs.LG cs.SY math.OC stat.ML

Sample Complexity of Policy Gradient for Log-Growth Control

对数增长控制的策略梯度样本复杂度

Qiuhua Pan, Yukai Shen, Liwei Zhang, Cailian Chen, Xinping Guan

AI总结 针对乘性噪声驱动标量线性系统的对数增长控制问题,利用奇点对称性消除梯度估计发散,证明了策略梯度的样本复杂度。

详情
Comments
43 pages, 4 figures, 2 tables; includes supplementary material
AI中文摘要

我们研究了策略梯度在对数增长控制中的样本复杂度——即从观测到的状态转移中学习一个反馈增益,该增益能够最优稳定一个通过乘性噪声驱动通道的标量线性系统。目标函数 $J(K) = \mathbb{E}[\log|1+BK|]$ 是闭环系统的顶部李雅普诺夫指数。该问题存在一个我们称为尖点障碍的结构性困难:最优增益 $K^*$ 总是将噪声奇点 $b_{\rm sing}(K) = -1/K$ 置于支撑集内部。在这个奇异最优处,策略梯度仅作为柯西主值存在,而非勒贝格积分,且自然的单样本梯度估计量具有无穷方差。因此,标准的一阶随机优化分析在最优处不适用,仅对目标函数进行平滑处理无法解决这一困难。然而,该障碍具有可利用的对称性:柯西核是关于移动极点位移的奇函数,因此将每个观测值与其关于极点的反射配对可以抵消发散部分。这一抵消同时控制了总体曲率、梯度估计量方差以及估计噪声密度时产生的偏差。结合这些界与一个闭式单转移梯度预言,我们证明:当噪声密度已知时,投影小批量策略梯度(初始化于稳定区域的任意紧子集内)的总样本复杂度为 $\tilde{O}(1/\eta)$;当噪声密度需估计时,对于 $C^s$ 噪声密度($s \geq 2$),样本复杂度为 $\tilde{O}(\eta^{-(2s+1)/(2s)})$。

英文摘要

We study the sample complexity of policy gradient for log-growth control -- the problem of learning, from observed state transitions, a feedback gain that optimally stabilizes a scalar linear system driven through a multiplicative-noise actuation channel. The objective $J(K) = \mathbb{E}[\log|1+BK|]$ is the top Lyapunov exponent of the closed loop. This problem carries a structural difficulty we call the cusp obstruction: the optimal gain $K^*$ always places the noise singularity $b_{\rm sing}(K) = -1/K$ in the interior of the support. At this singular optimum the policy gradient exists only as a Cauchy principal value, not as a Lebesgue integral, and the natural single-sample gradient estimator has infinite variance. Standard first-order stochastic-optimization analysis is thus inapplicable at the optimum, and merely smoothing the objective does not resolve the difficulty. The obstruction, however, has an exploitable symmetry: the Cauchy kernel is an odd function of the displacement from the moving pole, so pairing each observation with its reflection through the pole cancels the divergent part. This one cancellation simultaneously controls the population curvature, the gradient-estimator variance, and the bias incurred when the noise density is estimated. Combining these bounds with a closed-form single-transition gradient oracle, we prove that projected mini-batch policy gradient, initialized in any compact subset of the stabilizing region, attains total sample complexity $\tilde{O}(1/η)$ when the noise density is known and $\tilde{O}(η^{-(2s+1)/(2s)})$ when it must be estimated, for $C^s$ noise densities with $s \geq 2$.
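
The objective and the symmetry used in the argument can be checked numerically. The sketch below assumes closed-loop dynamics of the form x_{t+1} = (1 + B_t K) x_t, which is one reading of the stated objective, and it only estimates $J(K)$ and verifies the odd symmetry of the kernel under reflection through the pole; it is not the paper's estimator. Function names are illustrative.

```python
import numpy as np

def log_growth(K, b_samples):
    # Monte Carlo estimate of J(K) = E[log|1 + B K|].
    return float(np.mean(np.log(np.abs(1.0 + b_samples * K))))

def reflect_through_pole(b, K):
    # Reflect b through the singularity b_sing(K) = -1/K.
    return -2.0 / K - b

def cauchy_kernel(b, K):
    # The singular factor 1 / (1 + b K), which blows up at b = -1/K.
    return 1.0 / (1.0 + b * K)
```

Since `1 + reflect_through_pole(b, K) * K` equals `-(1 + b * K)`, the kernel takes opposite values at a point and at its reflection. That cancellation is what the pairing described in the abstract exploits.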

2605.26631 2026-05-27 stat.AP cs.LG

Data-driven sparse identification of governing PDEs via knockoff filters and multi-criteria trade-offs

基于Knockoff滤波器与多准则权衡的数据驱动稀疏识别控制偏微分方程

Pongpisit Thanasutives, Naichang Ke, Yoshinobu Kawahara

AI总结 提出KO-PDE-IDENT框架,通过模型-X knockoff滤波器控制错误发现率,结合递归特征消除和多准则决策,从噪声数据中稀疏识别偏微分方程。

详情
Comments
42 pages, 5 figures, 10 tables
AI中文摘要

我们提出KO-PDE-IDENT,一个用于识别简洁偏微分方程(PDE)并控制错误发现率(FDR)的数据驱动框架。从噪声观测中发现PDE常常受到候选项之间极端多重共线性的阻碍,这导致典型的稀疏回归方法选择虚假项。为了解决这个问题,KO-PDE-IDENT首先通过具有有限样本FDR控制的模型-X knockoff滤波器挖掘潜在候选项的支持集,然后对存活的PDE备选方案进行细化和排序。该框架整合了三个组成部分。首先,通过将$\ell_{0}$约束的自适应最佳子集选择与SHapley Additive exPlanations(SHAP)相结合,构建knockoff特征统计量,产生有效且计算高效的差异统计量。其次,递归特征消除(RFE)过程去除边际贡献可省略的项,并通过knockoff扰动假设检验评估统计必要性。第三,最终模型选择被表述为一个多准则决策(MCDM)问题,其中最优控制方程是在预测精度、模型复杂度和系数不确定性等广泛准则之间取得最佳平衡的备选方案。我们在严重噪声污染下对五个经典PDE验证了KO-PDE-IDENT。实验结果表明,我们的框架可以精确恢复真实的PDE结构,消除错误发现同时保留所有真实潜在项,且系数估计误差低。

英文摘要

We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hindered by extreme multicollinearity among candidate terms, which causes typical sparse-regression methods to select spurious terms. To address this problem, KO-PDE-IDENT initially mines a support set of potential candidate terms via model-X knockoff filters with finite-sample FDR control, then refines and ranks the surviving PDE alternatives. The framework integrates three components. First, knockoff feature statistics are constructed by coupling $\ell_{0}$-constrained adaptive best-subset selection with SHapley Additive exPlanations (SHAP), yielding an effective and computationally efficient difference statistic. Second, a recursive feature elimination (RFE) procedure removes terms whose marginal contributions are dispensable and assesses statistical necessity through knockoff-perturbed hypothesis testing. Third, the final model selection is formulated as a multi-criteria decision-making (MCDM) problem, where the optimal governing equation is the alternative that best balances a wide range of criteria such as predictive accuracy, model complexity and coefficient uncertainty. We validate KO-PDE-IDENT on five canonical PDEs under severe noise corruption. Empirical results show that our framework can exactly recover the true PDE structure, eliminating false discoveries while retaining all true underlying terms, with low coefficient estimation error.
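
The selection step of a knockoff filter is simple once the feature statistics are available. The sketch below implements the standard knockoff+ threshold, which is the variant that gives finite-sample FDR control. It takes the statistics `W` as given, whereas in the paper they come from best-subset selection combined with SHAP, which is not reproduced here. The function name is illustrative.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    # W[j] > 0 suggests candidate term j beats its knockoff copy.
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        # Conservative estimate of the false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)   # no threshold meets the target level
```

The returned indices are the candidate terms kept at target FDR level `q`. An empty result means the statistics are too weak to select anything at that level.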

2605.26608 2026-05-27 stat.ME

Statistical Inference and Stability Boundaries of Multi-cellular Interaction Hypergraphs from Asynchronous Event Streams

异步事件流中多细胞相互作用超图的统计推断与稳定性边界

Zihan Xu

AI总结 提出超边触发霍克斯过程(HTH)用于从异步事件数据推断多细胞系统中的高阶交互结构,通过闭合形式的EM算法和CP张量分解实现高效推断,在合成实验中成对恢复误差低于5%,并在视网膜神经节细胞数据上获得+20.6 nats的似然增益。

详情
Comments
8 pages, 5 figures, 1 table
AI中文摘要

我们引入了超边触发霍克斯(HTH)过程,用于从异步事件时间数据推断多细胞系统中的高阶交互结构。除了标准的成对激发外,HTH强度还包括一个由时间窗口内细胞组同时共发激活的项。我们推导了一个闭合形式的期望最大化算法,其关键成分是一个分段补偿器,消除了朴素积分公式中存在的系统性偏差。CP张量分解将超边参数计数从O(N^K)减少到O(NR)。在十一个合成实验中,该框架实现了低于5%的成对恢复误差,同时揭示了超边权重上系统性的-22%偏差,该偏差在核衰减率上非单调,排除了简单的时间重叠解释,并激发了自适应核方法。在小鼠视网膜神经节细胞的多电极记录中,该模型相对于成对基线获得了+20.6 nats的似然增益,为高阶交互提供了提示性但非决定性的证据。代码和所有实验公开于https://github.com/Hanii0210/hypergraph-hawkes。

英文摘要

We introduce the Hyperedge-triggered Hawkes (HTH) process for inferring higher-order interaction structure in multi-cellular systems from asynchronous event-time data. Beyond standard pairwise excitation, the HTH intensity includes a term activated by the simultaneous co-firing of a cell group within a temporal window. We derive a closed-form Expectation-Maximisation algorithm whose key ingredient is a piecewise compensator that eliminates the systematic bias present in the naive integral formulation. A CP tensor decomposition reduces the hyperedge parameter count from O(N^K) to O(NR). Across eleven synthetic experiments the framework achieves pairwise recovery error below 5%, while revealing a systematic -22% bias on hyperedge weights that is non-monotonic in the kernel decay rate, ruling out a simple temporal-overlap explanation and motivating adaptive kernel methods. On multi-electrode recordings of mouse retinal ganglion cells, the model yields a +20.6 nat likelihood gain over the pairwise baseline, providing suggestive but not decisive evidence for higher-order interactions. Code and all experiments are publicly available at https://github.com/Hanii0210/hypergraph-hawkes.
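
For readers unfamiliar with the pairwise baseline, a multivariate Hawkes intensity can be evaluated as below. This sketch covers only the standard pairwise excitation term with an exponential kernel, which is an assumed kernel form. The hyperedge term, the piecewise compensator, and the EM updates from the paper are not shown, and the function signature is hypothetical.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    # events: iterable of (time, cell_index) pairs
    # mu: baseline rates, shape (N,)
    # alpha[i, j]: excitation of cell i caused by an event of cell j
    # beta: decay rate of the exponential kernel
    lam = np.array(mu, dtype=float)
    for s, j in events:
        if s < t:
            lam += alpha[:, j] * beta * np.exp(-beta * (t - s))
    return lam
```

The HTH model described above adds to this a term that fires when a whole group of cells is active within a time window. Its weights form an order-K tensor, which is where the CP decomposition reduces the parameter count.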

2605.26607 2026-05-27 stat.CO stat.AP

Log-linear Model for Dual System Estimation and Computational Considerations

双重系统估计的对数线性模型及计算考虑

Zhiyuan Lu

AI总结 针对双重系统估计中因数据缺失导致的计算瓶颈,提出一种比EM算法计算量更低的极大似然估计方法。

详情
AI中文摘要

双重系统估计(DSE)在美国人口普查局的操作中被广泛使用。使用DSE方法时,重要的是实施方法来推断从一个或两个数据源中缺失数据的人群规模。如Van der Heijden等人(2022)所示,通过EM算法计算的对数线性模型提供了一种估计所有记录不完整组别计数的方法。不幸的是,涉及的数值计算随着人口划分的增加而扩展性极差,以至于同时分析多个人口和地理因素(如居住州和种族)在计算上变得不可行。这里,将提供一种替代方法来计算对数线性估计,该方法可以在比EM算法低得多的计算量下计算极大似然估计。

英文摘要

Dual system estimation (DSE) is heavily used in Census Bureau operations. With DSE methods, it is important to infer the population size among those with missing data from one or both data sources. Log-linear models, fitted through EM algorithms, promise a way to estimate counts among all groups with incompletely recorded data, as shown by Van der Heijden et al. (2022). Unfortunately, the numerical computations involved scale very poorly as the population is divided more finely, to the point where simultaneous analysis of several demographic and geographic factors, such as state of residence and ethnicity, becomes computationally infeasible. Here, an alternative method to calculate the log-linear estimates is provided, which can compute the maximum likelihood estimator with orders of magnitude less computation than the EM algorithm.
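
The simplest instance of dual system estimation is the classical two-list estimator under independence of the lists. It is shown below only as background: it uses complete two-way counts and does not address the missing-data setting or the faster algorithm the abstract describes. The function and argument names are illustrative.

```python
def lincoln_petersen(n_both, n_only_a, n_only_b):
    # n_both: people found in both sources
    # n_only_a / n_only_b: people found in exactly one source
    if n_both <= 0:
        raise ValueError("the estimator needs at least one person on both lists")
    n_a = n_both + n_only_a
    n_b = n_both + n_only_b
    total = n_a * n_b / n_both               # estimated population size
    missed = n_only_a * n_only_b / n_both    # estimated count missed by both lists
    return total, missed
```

The estimated total equals the three observed cells plus the estimated unobserved cell, which is the fitted value from the independence log-linear model for a 2x2 table with one structurally missing cell.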

2605.26589 2026-05-27 cs.LG cs.AI stat.ML

Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift

分布漂移下儿童贫血预测的表格机器学习与基础模型的少样本跨国家泛化

Yusuf Brima, Marcellin Atemkeng, Lansana Hassim Kallon, David Niyukuri, Antoine Vacavant, Samuel Saidu, Ding-Geng Chen

AI总结 本研究评估了基于Transformer的表格基础模型TabPFN在跨国家、数据稀缺环境下预测儿童贫血的性能,发现其优于经典监督方法,尤其在低数据场景下表现出更好的区分度和校准能力。

详情
AI中文摘要

儿童贫血影响全球约40%的6-59个月儿童,且由异质性因素引起,限制了模型的泛化能力。我们在跨国家和数据稀缺环境下,评估了基于Transformer的表格基础模型与经典监督方法。我们使用了来自非洲、亚洲、拉丁美洲、高加索和中东16个国家的DHS数据(n=68,856)。比较了逻辑回归、XGBoost、LightGBM和TabPFN v2.6。性能通过AUC-ROC、Brier评分和ECE评估。泛化性通过留一国家法(LOCO)、反向LOCO和少样本设置评估。亚组分析包括性别、年龄、居住地、母亲教育和财富。特征重要性通过SHAP估计。TabPFN在低数据场景(<200样本)中优于经典模型,显示出更高的区分度和更好的校准。在各国中,它实现了最低的Brier评分(0.042)和ECE(0.203)。在全数据设置下,AUC-ROC范围为0.59-0.76,模型间差异较小(≤0.05)。LOCO性能稳定(0.58-0.69),受国家背景驱动。反向LOCO显示出不对称的可转移性。亚组性能一致,无系统性人口统计偏差。SHAP识别出儿童年龄、海拔和年龄别身高Z分数为主要预测因子,其次是财富和母亲教育。儿童贫血预测的性能更多由人群变异驱动而非模型选择。TabPFN在低资源环境中通过改进的区分度和校准提供了优势,突显了基础模型作为数据稀缺全球健康预测的有前景工具。

英文摘要

Childhood anemia affects around 40% of children aged 6-59 months globally and arises from heterogeneous factors, limiting model generalizability. We evaluate a transformer-based tabular foundation model against classical supervised methods under cross-country and data-scarce settings. We used DHS data from 16 countries across Africa, Asia, Latin America, the Caucasus, and the Middle East (n=68,856). We compared Logistic Regression, XGBoost, LightGBM, and TabPFN v2.6. Performance was assessed using AUC-ROC, Brier score, and ECE. Generalization was evaluated using leave-one-country-out (LOCO), reverse-LOCO, and few-shot settings. Subgroup analyses included sex, age, residence, maternal education, and wealth. Feature importance was estimated using SHAP. TabPFN outperformed classical models in low-data regimes (<200 samples), showing higher discrimination and better calibration. Across countries, it achieved the lowest Brier score (0.042) and ECE (0.203). Under full-data settings, AUC-ROC ranged from 0.59-0.76 with small between-model differences ($\leq 0.05$). LOCO performance was stable (0.58-0.69), driven by country context. Reverse-LOCO showed asymmetric transferability. Subgroup performance was consistent with no systematic demographic bias. SHAP identified child age, altitude, and height-for-age z-score as dominant predictors, followed by wealth and maternal education. Performance in childhood anemia prediction is driven more by population variation than model choice. TabPFN provides advantages in low-resource settings through improved discrimination and calibration, highlighting foundation models as promising tools for data-scarce global health prediction.
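
The two calibration-oriented metrics named above can be computed as in the sketch below. The abstract does not state the binning scheme used for ECE, so the ten equal-width bins here are an assumption, and the function names are illustrative.

```python
import numpy as np

def brier_score(y_true, p_pred):
    # Mean squared difference between predicted probability and binary outcome.
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

def expected_calibration_error(y_true, p_pred, n_bins=10):
    # Weighted average gap between mean predicted probability and observed
    # event rate, over equal-width probability bins.
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(p_pred, edges[1:-1])   # bin index in 0..n_bins-1
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - p_pred[mask].mean())
    return float(ece)
```

Both metrics are lower-is-better. ECE depends on the number of bins, so values are only comparable across models when the same binning is used.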

2605.26572 2026-05-27 stat.ME

Using Transcripts for Nonparametric Monitoring of Serial Dependence

利用转录本对序列依赖进行非参数监控

Christian H. Weiß, José M. Amigó

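AI summary: Proposes novel nonparametric control charts, based on transcripts and algebraic distances, for monitoring serial dependence, and evaluates their performance through simulations and a chemical-industry example.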

详情
AI中文摘要

过程监控的控制图在实践中被广泛使用。大多数控制图要求被监控的(残差)过程是序列独立的(并满足指定的分布假设),而未检测到的依赖(或违反分布假设)可能会严重影响控制图的性能。因此,用于监控序列依赖的(分布自由)控制图在实践中至关重要。最近,为此目的提出了各种基于序数模式的非参数控制图,这些控制图在检测不同类型的序列依赖方面表现出良好的性能。在本研究中,我们进一步朝这个方向推进,开发了基于转录本和代数距离(源自序数模式)的新型非参数控制图。通过模拟研究评估了新提出的控制图的性能,并通过化工行业的实际数据示例说明了它们在实践中的应用。

英文摘要

Control charts for process monitoring are widely used in practice. Most control charts require the monitored (residual) process to be serially independent (and to satisfy specified distributional assumptions), whereas undetected dependence (or violations of distributional assumptions) may severely affect the charts' performance. Therefore, (distribution-free) control charts for monitoring serial dependence are of utmost relevance for practice. Recently, various nonparametric control charts based on ordinal patterns have been proposed for this purpose, and they have shown appealing performance in detecting different types of serial dependence. In this research, we make further progress in this direction and develop novel nonparametric control charts based on transcripts and algebraic distances (as derived from ordinal patterns). The performance of the newly proposed control charts is evaluated in a simulation study, and their application in practice is illustrated with a real-world data example from the chemical industry.
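The charts in this abstract build on ordinal patterns. As a minimal sketch of that building block only (it does not implement transcripts, algebraic distances, or the proposed charts; the function names and the tie-breaking rule are assumptions), the ordinal pattern of each window and the relative pattern frequencies can be computed as follows:

```python
from collections import Counter
from itertools import permutations

def ordinal_pattern(window):
    """Indices of the window sorted by value; ties are broken by position
    (a common convention, assumed here)."""
    return tuple(sorted(range(len(window)), key=lambda i: (window[i], i)))

def pattern_frequencies(series, order=3):
    """Relative frequency of every ordinal pattern of the given order
    over all overlapping windows of the series."""
    n_windows = len(series) - order + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than the pattern order")
    counts = Counter(
        ordinal_pattern(series[t:t + order]) for t in range(n_windows)
    )
    return {
        pattern: counts.get(pattern, 0) / n_windows
        for pattern in permutations(range(order))
    }
```

For an i.i.d. continuous series, every one of the `order!` patterns is equally likely, so marked deviations of these frequencies from `1 / order!` point to serial dependence.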

2605.26568 2026-05-27 stat.ME cs.IT math.IT math.ST stat.TH

Target-Oriented Statistical Compression: Sufficiency, Reverse Martingales, and Sequential Monitoring

面向目标的统计压缩:充分性、逆鞅与序贯监测

Yuan-chin Ivan Chang

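AI summary: Proposes a unified framework of target-oriented statistical compression, quantifies compression loss through reverse martingales and the conditional target process, and applies it to sequential boundary monitoring problems.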

详情
Comments
28 pages, 9 figures
AI中文摘要

统计过程很少保留观测数据的所有特征。充分统计量剔除与参数无关的信息;极大似然估计将经验目标压缩为优化点;序贯模型中的隐状态将过去观测压缩为学习表示。本文在统一概念“面向目标的统计压缩”下发展这些实践:有用的摘要保留对推断、预测或决策相关目标重要的信息,而非实现数据路径的每个细节。核心对象是条件目标过程 \(M_n=\mathbb{E}(Z\mid\mathcal{G}_n)\),其中 \(Z\) 是目标,\(\mathcal{G}_n=\sigma(T_n)\) 是由压缩映射 \(T_n\) 保留的信息。当 \((\mathcal{G}_n)\) 是递减滤过时,\((M_n)\) 是极限为 \(M_\infty=\mathbb{E}(Z\mid\mathcal{G}_\infty)\) 的逆鞅。精确充分性对应无损压缩,而近似摘要(如惩罚估计量、主成分和神经网络隐状态)产生逆拟鞅缺陷,衡量跨压缩水平的相干性损失。诊断量 \(r_n=|M_n-M_{n-1}|\) 被视为可观测的稳定性代理,而非理论缺陷的无偏估计。序贯二元问题中的边界退化被发展为一项核心应用。实际的边界声明需要联合评估边界接近度、不确定性控制和轨迹稳定性。配套论文 \citet{chang2025rm} 发展了相应的停止程序、有限样本界和数值证据;本文提供更广泛的理论基础设施,并将框架扩展到高斯、泊松和拟鞅监测问题。

英文摘要

Statistical procedures rarely retain all features of the observed data. A sufficient statistic removes information irrelevant to a parameter; a maximum likelihood estimate compresses an empirical objective into an optimizing point; and a hidden state in a sequential model compresses past observations into a learned representation. This article develops these practices under the unified notion of target-oriented statistical compression: a useful summary preserves what matters for an inferential, predictive, or decision-relevant target, rather than every detail of the realized data path. The central object is the conditional target process \(M_n=\mathbb{E}(Z\mid\mathcal{G}_n)\), where \(Z\) is the target and \(\mathcal{G}_n=\sigma(T_n)\) is the information retained by the compression map \(T_n\). When \((\mathcal{G}_n)\) is a decreasing filtration, \((M_n)\) is a reverse martingale with limit \(M_\infty=\mathbb{E}(Z\mid\mathcal{G}_\infty)\). Exact sufficiency corresponds to lossless compression, while approximate summaries such as penalized estimators, principal components, and neural-network hidden states produce reverse quasi-martingale defects measuring coherence loss across compression levels. The diagnostic \(r_n=|M_n-M_{n-1}|\) is treated as an observable stability proxy, not as an unbiased estimator of the theoretical defect. Boundary degeneracy in sequential binary problems is developed as a central application. Practical boundary claims require joint assessment of boundary closeness, uncertainty control, and trajectory stability. The companion paper \citet{chang2025rm} develops the corresponding stopping procedures, finite-sample bounds, and numerical evidence; the present paper provides the broader theoretical infrastructure and extends the framework to Gaussian, Poisson, and quasi-martingale monitoring problems.

2605.26532 2026-05-27 stat.ME

Global Average Treatment Effects for Individualized Randomization Experiments with Aggregate Data

基于聚合数据的个体随机实验的全局平均处理效应

Shuguang Yu, Ting Li, Yuchen Lu, Chengchun Shi, Fan Zhou, Zhichao Zou, Peng Zhen, Hongtu Zhu

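AI summary: For individualized randomized experiments with spatiotemporal interference where only aggregated data are available, proposes the IRE-VCDP model to estimate the global average treatment effect (GATE), together with estimation and inference procedures and theoretical guarantees.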

详情
AI中文摘要

个体随机实验是在复杂环境中优化个性化决策的在线平台的核心。然而,在双边市场中,由于强烈的时间和跨单元干扰,标准的处理效应估计往往无效,而当由于隐私或系统限制只能获得聚合数据时,这一问题更加复杂。为了解决这些问题,我们仅使用来自处理组和对照组的群体级数据来识别全局平均处理效应(GATE)。我们首先基于聚合观测建立识别条件,然后提出个体随机实验变系数决策过程(IRE-VCDP)模型,该模型通过供需动态来解释干扰。在此框架基础上,我们开发了GATE估计和统计推断的完整程序,并给出了所提检验的理论保证。利用来自领先网约车平台的数据进行的大量模拟和真实实验证明了我们方法的有效性。

英文摘要

Individualized randomized experiments are central to online platforms for optimizing personalized decisions in complex environments. In two-sided markets, however, standard treatment effect estimation is often invalid due to strong temporal and cross-unit interference, a challenge compounded when only aggregated data are available because of privacy or system constraints. To address these issues, we identify the Global Average Treatment Effect (GATE) using only group-level data from treatment and control groups. We first establish identification conditions based on aggregated observations, and then propose the Individualized Randomized Experiment Varying Coefficient Decision Process (IRE-VCDP) model, which accounts for interference through supply-demand dynamics. Building on this framework, we develop a complete procedure for estimation and statistical inference of the GATE, along with theoretical guarantees for the proposed test. Extensive simulations and real-world experiments using data from a leading ridesharing platform demonstrate the effectiveness of our approach.

2605.26515 2026-05-27 stat.ME

Learning a directed acyclic graph with additive heteroscedastic errors

学习具有加性异方差误差的有向无环图

Xintao Xia, Li Chen, Yue Hu, Chunlin Li

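AI summary: For structural equation models with additive heteroscedastic errors, proposes a heteroscedasticity-based way to identify causal direction and an iterative algorithm, RESQUE, which recursively identifies sink nodes via the invariance of conditional scale coefficients across quantiles, with theoretical guarantees for recovering the topological order and graph structure.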

详情
Comments
33 pages, 4 figures
AI中文摘要

本文研究结构方程模型下具有加性异方差误差的有向无环图的因果发现。我们首先建立了位置-尺度噪声模型的新可识别性结果,表明异方差性可用于恢复因果方向。基于这些见解,我们提出了一种新颖的迭代过程——残差同时分位数估计(RESQUE),其中每次迭代由残差构建阶段和复合分位数回归阶段组成,通过条件尺度系数在不同分位数上的不变性递归识别汇节点。然后,我们建立了其在恢复拓扑顺序和图结构方面的理论保证,即使变量数量随样本量发散。模拟研究和基准数据集应用表明,RESQUE与现有方法相比表现良好,特别是当因果信息部分编码在方差分量中时。这些结果突出了利用结构化方差信号进行因果发现,并为超越基于均值建模的多变量因果发现提供了原则性框架。

英文摘要

This paper studies causal discovery for a directed acyclic graph under a structural equation model with additive heteroscedastic errors. We first establish new identifiability results for location-scale noise models, showing that heteroscedasticity can be leveraged to recover causal directions. Based on these insights, we propose a novel iterative procedure, Residual Simultaneous Quantile Estimation (RESQUE), where each iteration consists of a residual-construction stage and a composite quantile regression stage, enabling recursive identification of sink nodes via the invariance of conditional scale coefficients across quantiles. We then establish its theoretical guarantees for recovering topological order and graph structure, even when the number of variables diverges with the sample size. Simulation studies and applications to benchmark datasets show that RESQUE performs favorably compared with existing methods, especially when causal information is partly encoded in the variance component. These results highlight the value of exploiting structured variance signals for causal discovery and provide a principled framework for multivariate causal discovery beyond mean-based modeling.

2605.26509 2026-05-27 cs.LG math.PR stat.CO

SIKA-GP: Accelerating Gaussian Process Inference with Sparse Inducing Kernel Approximations for Bayesian Deep Learning

SIKA-GP:利用稀疏诱导核近似加速贝叶斯深度学习中的高斯过程推断

Wenyuan Zhao, Rui Tuo, Chao Tian

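AI summary: Proposes SIKA-GP, which uses sparse inducing kernel approximations based on a dyadic ordered template basis so that Gaussian process inference depends on the number of inducing points $M$ only through an $O(\log M)$ complexity term. The method supports efficient tensorized GPU computation, embeds naturally into Bayesian neural networks, and substantially speeds up training and inference on vision and transformer-based language benchmarks without sacrificing predictive performance.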

详情
Comments
20 pages, 8 figures; accepted to International Conference on Machine Learning (ICML) 2026
AI中文摘要

高斯过程(GPs)为不确定性估计提供了原则性的贝叶斯框架,但其计算复杂度严重限制了在大规模数据集上的可扩展性。我们提出SIKA-GP,该方法使用基于二元有序模板基的稀疏诱导核近似来加速GP推断,对诱导点数量的复杂度依赖仅为${O}(\log M)$。我们的方法从稀疏激活基构建紧凑且表达力强的核表示,从而实现高效的张量化GPU计算,并与现代大规模模型无缝集成。SIKA-GP可以自然地嵌入具有稀疏激活的贝叶斯神经网络(BNNs)中,在训练和推断中均实现显著加速,且不牺牲预测性能。该方法自然地扩展到深度特征学习,解决了深度架构和高维特征表示带来的可扩展性挑战。在视觉和基于Transformer的语言基准上的实验结果表明,我们的方法始终提供快速且准确的GP模型,为可扩展核学习提供了一条原则性路径。

英文摘要

Gaussian processes (GPs) provide a principled Bayesian framework for uncertainty estimation, but their computational complexity severely limits scalability to large datasets. We propose SIKA-GP, which accelerates GP inference using sparse inducing kernel approximations based on a dyadic ordered template basis, incurring only ${O}(\log M)$ complexity dependence on the number of inducing points. Our approach constructs compact and expressive kernel representations from sparsely activated bases, enabling efficient tensorized GPU computation and seamless integration with modern large-scale models. SIKA-GP can be naturally embedded into Bayesian neural networks (BNNs) with sparse activations, yielding significant speedups in both training and inference without sacrificing predictive performance. The method naturally extends to deep feature learning, addressing the scalability challenges introduced by deep architectures and high-dimensional feature representations. Empirical results on vision and transformer-based language benchmarks demonstrate that our approach consistently delivers fast and accurate GP models, providing a principled path toward scalable kernel learning.

2605.26507 2026-05-27 stat.ME

Improving inverse probability of censoring weighting for win statistics with composite survival outcomes

改进复合生存结局胜统计量的逆删失概率加权方法

Xi Fang, Fan Li

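AI summary: Inverse probability of censoring weighting estimators of win statistics for composite endpoints (win ratio, net benefit, win odds) discard indeterminate pairs and lose efficiency. This paper proposes an estimator that recovers partially observed pairs through conditional tie probabilities, and derives large-sample theory and closed-form variance estimators.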

详情
AI中文摘要

胜统计量,包括胜率、净收益和胜率比,通过按临床重要性排序的组分结局依次比较患者对,仅在较高优先级结局平局时才考虑较低优先级结局,从而总结层次复合终点的治疗效果。将比较限制在预先指定的临床时间范围内,可得到与删失机制分离的明确定义的估计目标,而在估计过程中处理右删失至关重要。现有的逆删失概率加权方法完全丢弃不确定对,导致随删失和限制时间范围增加而增长的效率损失。我们提出一种新估计器,用给定观测数据下的条件平局概率替代较高优先级平局的确认,将部分观测对恢复为分数贡献。基于带有估计 nuisance 函数的两样本 U 统计量建立了大样本理论,并在新加权方案下得到了胜率、净收益和胜率比的闭合形式三明治方差估计。模拟显示,基于新估计器,从轻度删失到高删失率,效率增益显著增加,我们进一步将估计器应用于重新分析一项已完成的随机临床试验。

英文摘要

Win statistics, including the win ratio, net benefit, and win odds, summarize treatment effects on hierarchical composite endpoints by sequentially comparing patient pairs on component outcomes ordered by clinical importance, proceeding to lower priority components only when higher priority ones are tied. Restricting comparisons to a pre-specified clinical horizon yields well-defined estimands separated from the censoring mechanism, and it is critically important to address right censoring during estimation. Existing inverse probability of censoring weighting methods discard indeterminate pairs entirely, incurring avoidable efficiency loss that grows with censoring and restriction horizon length. We propose a novel estimator that replaces the confirmation of higher priority ties with a conditional tie probability given the observed data, recovering partially observed pairs as fractional contributions. Large sample theory is developed based on two-sample U-statistics with estimated nuisance functions, and closed-form sandwich variance estimators are obtained for the win ratio, net benefit, and win odds under our new weighting scheme. Simulations show that the new estimator yields sizable efficiency gains, which grow sharply as censoring increases from light to heavy, and we further apply our estimator to reanalyze a completed randomized clinical trial.
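To make the three win statistics concrete, here is a minimal sketch of the hierarchical pairwise comparison for fully observed data. It deliberately ignores censoring, the restriction horizon, and the weighting scheme the abstract proposes; it also assumes each patient is a tuple of component outcomes ordered by priority, with larger values being better. All names are illustrative.

```python
def compare_pair(treated, control):
    """Return 1 (win), -1 (loss), or 0 (tie) for the treated patient,
    moving to the next component only when the current one is tied."""
    for t, c in zip(treated, control):
        if t > c:
            return 1
        if t < c:
            return -1
    return 0

def win_statistics(treated_group, control_group):
    """Win ratio, net benefit, and win odds over all treated-control pairs."""
    wins = losses = ties = 0
    for a in treated_group:
        for b in control_group:
            result = compare_pair(a, b)
            if result > 0:
                wins += 1
            elif result < 0:
                losses += 1
            else:
                ties += 1
    total = wins + losses + ties
    if total == 0:
        raise ValueError("both groups must be non-empty")
    inf = float("inf")
    return {
        # Win ratio ignores ties; win odds splits each tie evenly.
        "win_ratio": wins / losses if losses else inf,
        "net_benefit": (wins - losses) / total,
        "win_odds": (wins + 0.5 * ties) / (losses + 0.5 * ties)
        if (losses or ties) else inf,
    }
```

With right censoring, some pairs cannot be classified as a win, loss, or tie; those are the indeterminate pairs the abstract refers to.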

2605.26429 2026-05-27 stat.ME cs.AI cs.LG stat.ML

Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

面向大规模分布外检测的结构自适应共形推断

Rongyi Sun, Wenguang Sun, Zinan Zhao

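AI summary: Proposes the structure-adaptive conformal q-value (SCQ) and pseudo-score-guided transductive automated model selection (P-TAMS), which together provide finite-sample error-rate control, improved power, and enhanced interpretability for structured out-of-distribution testing under pairwise exchangeability.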

详情
AI中文摘要

本文针对高风险机器学习应用中的结构化分布外(OOD)检测问题。传统共形方法依赖于联合可交换性,难以融入时空或分组结构等辅助信息。为克服这一局限,我们提出结构自适应共形q值(SCQ),这是一种整合个体检验证据与结构模式的显著性指标。我们还开发了伪分数引导的直推式自动模型选择(P-TAMS),将共形化模型选择适应于候选模型工具箱中的结构化OOD检测。SCQ和P-TAMS共同在成对可交换性下形成一个统一框架,提供有限样本错误率控制、改进的功效和增强的可解释性。在模拟和真实数据上的实验表明,所提方法控制了错误发现率,并在多种设置下表现良好。

英文摘要

This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as spatiotemporal or grouping structures. To overcome this limitation, we propose the structure-adaptive conformal q-value (SCQ), a significance index that integrates individual test evidence with structural patterns. We also develop pseudo-score-guided transductive automated model selection (P-TAMS), which adapts conformalized model selection to structured OOD testing across a toolbox of candidate models. Together, SCQ and P-TAMS form a unified framework under pairwise exchangeability, providing finite-sample error-rate control, improved power, and enhanced interpretability. Experiments on simulated and real data demonstrate that the proposed approach controls the false discovery rate and performs well across diverse settings.
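For context, the conventional pipeline that the abstract contrasts with can be sketched as conformal p-values followed by the Benjamini-Hochberg step-up procedure. This is not SCQ or P-TAMS; it is a baseline sketch that assumes a held-out calibration set of in-distribution nonconformity scores, with larger scores meaning more anomalous. Function names are illustrative.

```python
import numpy as np

def conformal_p_values(calib_scores, test_scores):
    """p_j = (1 + #{calibration scores >= test score j}) / (n + 1)."""
    calib = np.asarray(calib_scores, dtype=float)
    test = np.asarray(test_scores, dtype=float)
    n_at_least = (calib[None, :] >= test[:, None]).sum(axis=1)
    return (1.0 + n_at_least) / (calib.size + 1.0)

def benjamini_hochberg(p_values, alpha=0.1):
    """Boolean mask of rejected hypotheses under the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if passed.size:
        # Reject every hypothesis up to the largest index that passes.
        rejected[order[: passed[-1] + 1]] = True
    return rejected
```

Test points flagged by `benjamini_hochberg` are declared out-of-distribution. This baseline treats all test points symmetrically and uses no spatiotemporal or grouping structure.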

2605.26422 2026-05-27 stat.CO

Fast Computational Methods for Regularized Estimating Equations

正则化估计方程的快速计算方法

Weihua Shi, Yixuan Li, Yi Lian, Archer Y. Yang, Yue Zhao

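AI summary: Reviews the application areas of estimating equations and organizes computational methods for regularized estimating equations into four formulations (minimization-type, Dantzig-type, regularization-type, and fixed-point-type), discusses the main numerical strategies for each, and highlights the connection between regularized estimating equations and fixed-point problems.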

详情
AI中文摘要

估计方程出现在广泛的统计应用中,包括纵向和聚类数据分析、生存分析、计量经济学和半参数推断。在高维设置中,添加稀疏性诱导的正则化通常会导致计算挑战,这些挑战无法通过标准的惩罚优化例程完全解决。这些挑战与底层估计问题的结构形式密切相关:主要地,估计函数不必是标量目标的梯度,并且可能涉及非对称雅可比矩阵、过度识别、非光滑性、非凸性或嵌套优化。本文首先回顾了估计方程的应用领域,然后通过将正则化估计方程的计算方法组织为四种广泛的形式:最小化型、Dantzig型、正则化型和不动点型方法,讨论了每种方法的主要数值策略,包括惩罚优化、约束线性规划、迭代求根和近端不动点迭代。我们还强调了正则化估计方程与不动点问题之间的联系,这为分析和求解正则化估计方程提供了一个统一的计算视角。

英文摘要

Estimating equations arise in a wide range of statistical applications, including longitudinal and clustered data analysis, survival analysis, econometrics, and semiparametric inference. In high-dimensional settings, adding sparsity-inducing regularization often leads to computational challenges that are not fully addressed by standard penalized optimization routines. These challenges are closely tied to the structural form of the underlying estimating problem: chiefly, the estimating function need not be the gradient of a scalar objective and may involve asymmetric Jacobians, overidentification, nonsmoothness, nonconvexity, or nested optimization. This article first reviews the application areas of estimating equations, and then surveys the computational methods for regularized estimating equations, organizing them into four broad formulations: minimization-type, Dantzig-type, regularization-type, and fixed-point-type approaches. We discuss the main numerical strategies associated with each formulation, including penalized optimization, constrained linear programming, iterative root-solving, and proximal fixed-point iteration. We also highlight the connection between regularized estimating equations and fixed-point problems, which provides a unified computational perspective for analyzing and solving regularized estimating equations.

2605.26413 2026-05-27 stat.ME cs.AI cs.LG stat.ML

Confounder Detection via Treatment Intent: A New Observational Study Design

通过治疗意图进行混杂检测:一种新的观察性研究设计

Drago Plecko, Patrik Okanovic, Torsten Hoefler, Elias Bareinboim

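AI summary: Proposes a new study design that uncovers unobserved confounders by asking treatment decision makers to compare matched pairs of units, and provides a proof of concept using ICU data.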

详情
AI中文摘要

理解干预的效果是科学进步的核心,随机对照试验(RCT)在许多应用领域被视为因果推断的金标准。然而,RCT成本高、耗时长,且常受伦理或实际限制,这促使我们需要能够从观察性数据中得出结论的因果方法。尽管此类数据收集规模日益扩大,但将其用于因果推断常因并非所有影响治疗分配和结果的变量都被观测到而受阻,这一问题称为未观测混杂。在本文中,我们介绍了一种称为通过治疗意图进行混杂检测的新研究设计。其思路是询问做出治疗决策的人类专家,并要求他们比较由原则性匹配策略提出的单元对,目的是引出解释治疗决策为何不同的未观测变量。我们为此类程序提供了理论基础,确定了此类研究设计可能引出未观测混杂因素的条件。基于这些新建立的基础,我们研究了重症监护病房(ICU)中干预的治疗效果。首先,我们展示了强烈表明ICU中收集的电子健康记录(EHR)存在未观测混杂的经验证据。通过使用临床文本笔记作为医生知识的代理并利用自然语言处理,我们在已知真实情况的半合成环境中为我们的方法提供了概念验证。

英文摘要

Understanding the effects of interventions is central to scientific progress, with randomized controlled trials (RCTs) regarded as the gold standard for causal inference in many applied fields. However, RCTs are costly, time-consuming, and often constrained by ethical or practical limitations, motivating the need for causal methods able to draw conclusions from observational data. While such data is collected at ever larger scale, its use for causal inference is often hindered by the fact that not all variables affecting treatment allocation and the outcome are observed: an issue known as unobserved confounding. In this paper, we introduce a new study design called confounder detection via treatment intent. The idea is to query a human expert who makes treatment decisions, and ask them to compare pairs of units proposed by a principled matching strategy, with the goal of eliciting unobserved variables that explain why treatment decisions differ. We provide a theoretical basis for such a procedure, ascertaining conditions under which such a study design may elicit unobserved confounders. Building on these newly established foundations, we study treatment effects of interventions in the intensive care unit (ICU). First, we show empirical evidence strongly indicating that electronic health records (EHRs) collected in ICUs are subject to unobserved confounding. By using clinical text notes as a proxy for physicians' knowledge and leveraging natural language processing, we provide a proof of concept for our methodology in a semi-synthetic environment with a known ground truth.

2605.26401 2026-05-27 stat.AP stat.ME

Small-Area Precipitation Forecasting and Drought-Flood Early Warning with Reverse-Martingale Regularized Recurrent Networks

基于逆鞅正则化循环网络的小区域降水预报与旱涝预警

Foo Hui-Mean, Yuan-chin Ivan Chang

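AI summary: Proposes a reverse-martingale regularized recurrent neural network (RMRNN) for probabilistic precipitation forecasting and sequential drought-flood early warning, improving warning performance through a backward-coherence penalty and a Shiryaev-Roberts detector.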

详情
Comments
4 figures
AI中文摘要

小区域降水预报支持水库调度、灌溉规划、干旱监测和山洪响应的实时决策。其操作价值不仅取决于点精度,还取决于校准的超越概率和预警规则,这些规则在局部天气状况偏离训练气候学时保持稳定。我们评估了一种逆鞅正则化循环神经网络(RMRNN)用于概率降水预报和序贯早期预警。在循环隐藏状态中添加了后向一致性惩罚;由此产生的残差过程驱动Shiryaev-Roberts(SR)检测器,因此产生预报的同一潜在轨迹也提供了持续更新的干旱或洪水状态指标。该框架在台湾CWA密集雨量计网络、台湾和非洲之角的CHIRPS v2日网格降水数据以及德克萨斯丘陵地区的NOAA GHCN-Daily站点上进行了测试。在1000次重复实验中,RMRNN在1小时至72小时预报时效的RMSE、MAE和CRPS上匹配或略优于GRU基线,同时显著改善了警报特性。在匹配检测功率下,SR检测器将误报率降低了三到五倍。在2020-2021年台湾干旱中,干旱开始比SPI-3阈值法提前8-12天被标记;在2023年海葵台风洪水中,山洪风险比CWA操作警报提前4小时被发出信号。

英文摘要

Small-area precipitation forecasts support real-time decisions for reservoir operation, irrigation planning, drought monitoring, and flash-flood response. Operational value depends not only on point accuracy, but also on calibrated exceedance probabilities and warning rules that remain stable when local weather regimes depart from the training climatology. We evaluate a reverse-martingale regularized recurrent neural network (RMRNN) for probabilistic precipitation forecasting and sequential early warning. A backward-coherence penalty is added to the recurrent hidden state; the resulting residual process drives a Shiryaev-Roberts (SR) detector, so the same latent trajectory that produces the forecast also supplies a continuously updated drought or flood-regime indicator. The framework is tested on the Taiwan CWA dense rain-gauge network, CHIRPS v2 daily gridded precipitation over Taiwan and the Horn of Africa, and NOAA GHCN-Daily stations over the Texas Hill Country. Across 1,000 replications, RMRNN matches or slightly improves the GRU baseline in RMSE, MAE, and CRPS at 1 h to 72 h lead while substantially improving alarm characteristics. The SR detector reduces false-alarm ratios by a factor of three to five at matched detection power. In the 2020-2021 Taiwan drought, onset is flagged 8-12 days earlier than SPI-3 thresholding; in the 2023 Typhoon Haikui flood, flash-flood risk is signalled 4 h before the CWA operational alert.
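The Shiryaev-Roberts detector mentioned in the abstract follows the standard recursion R_n = (1 + R_{n-1}) Λ_n with R_0 = 0, raising an alarm once R_n crosses a threshold. The sketch below illustrates only that recursion. The likelihood ratio Λ_n used here, a unit-variance Gaussian mean shift from 0 to `shift`, is an assumption chosen for illustration; the abstract says the detector is driven by a residual process from the network but does not give the likelihood model.

```python
import math

def shiryaev_roberts(xs, shift=1.0, threshold=100.0):
    """Run the SR recursion on observations xs.

    Returns (alarm_index, path): alarm_index is the 1-based time of the
    first threshold crossing (None if there is none) and path holds R_1..R_n.
    """
    r = 0.0
    path = []
    alarm = None
    for n, x in enumerate(xs, start=1):
        # Likelihood ratio of N(shift, 1) against N(0, 1) at observation x.
        lr = math.exp(shift * x - 0.5 * shift ** 2)
        r = (1.0 + r) * lr
        path.append(r)
        if alarm is None and r >= threshold:
            alarm = n
    return alarm, path
```

Raising `threshold` lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off behind comparisons made at matched detection power.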

2605.26385 2026-05-27 cs.IR cs.AI stat.ML

Credit-assigned Policy Gradient for Early Stage Retrieval in Two-stage Ranking

两阶段排序中早期检索的信用分配策略梯度

Haruka Kiyohara, Mihaela Curmei, Ariel Evnine, Shankar Kalyanaraman, Israel Nir, Ana-Roxana Pop, Nitzan Razin, Sarah Dean, Thorsten Joachims, Udi Weinsberg

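AI summary: To address the difficulty of end-to-end training of the early-stage ranker (ESR) in two-stage ranking, proposes the credit-assigned policy gradient (CA-PG), which takes gradients with respect to the probability that the target item is selected, reducing variance and improving training stability and convergence speed.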

详情
Comments
ICML2026
AI中文摘要

大规模搜索、推荐和检索增强生成(RAG)系统通常采用两阶段架构:早期排序器(ESR)生成候选集,随后由后期排序器(LSR)重新排序。虽然有许多强化学习(RL)方法用于训练LSR,但ESR的端到端训练被证明具有挑战性。特别是,朴素应用“普通”策略梯度(V-PG)对于实际使用的候选集大小不可扩展,因为方差爆炸。该问题源于V-PG将梯度传播到候选集的联合概率,忽略了候选集中每个特定项对奖励的贡献。为缓解此问题,我们提出了一种新颖的“信用分配”策略梯度(CA-PG),它计算相对于目标项在任何候选集中被选中的概率的梯度,即边际化所有包含它的候选集。我们的理论分析表明,CA-PG通过边际化候选集的具体组成显著降低了V-PG的方差,同时保留了在合理对齐的LSR策略下学习正确排序项的能力。在合成和真实数据上的实验表明,CA-PG提高了使用经典Plackett-Luce模型的ESR的收敛速度和训练稳定性,特别是在候选集大小较大时。

英文摘要

Large-scale search, recommendation, and retrieval-augmented generation (RAG) systems typically employ a two-stage architecture: an early-stage ranker (ESR) generates a candidate set, which is subsequently re-ranked by a late-stage ranker (LSR). While there are many reinforcement learning (RL) methods for training the LSR, end-to-end training of the ESR has proven challenging. In particular, naive application of "vanilla" policy gradient (V-PG) is not scalable for candidate-set sizes relevant for practical use due to exploding variance. This issue arises because V-PG propagates the gradient to the joint probability of the candidate sets, ignoring the contribution of each specific item in the candidate set to the reward. To mitigate this issue, we propose a novel "credit-assigned" policy gradient (CA-PG), which computes gradients with respect to the probability that the target item is chosen in any candidate set, i.e. marginalizing over all candidate sets that contain it. Our theoretical analysis reveals that CA-PG significantly reduces the variance of V-PG by marginalizing over the specific composition of the candidate set, while preserving the ability to learn the correct ranking of items under a reasonably aligned LSR policy. Experiments on both synthetic and real-world data demonstrate that CA-PG improves the convergence speed and training stability for ESRs utilizing the canonical Plackett-Luce model, especially when the candidate-set size is large.
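As background for the Plackett-Luce model named in the abstract, the sketch below shows the joint log-probability of an ordered candidate set drawn without replacement (the quantity a vanilla policy gradient would differentiate) and a Gumbel top-k sampler for such sets. It does not implement CA-PG or its marginalization over candidate sets; the function names and the use of NumPy are assumptions.

```python
import numpy as np

def pl_log_prob(scores, ordering):
    """Joint log-probability of drawing the items in `ordering`, one at a
    time without replacement, with probability proportional to exp(score)."""
    scores = np.asarray(scores, dtype=float)
    remaining = np.ones(scores.size, dtype=bool)
    total = 0.0
    for item in ordering:
        if not remaining[item]:
            raise ValueError("ordering must not repeat items")
        s = scores[remaining]
        # Numerically stable log-sum-exp over the items still available.
        log_norm = s.max() + np.log(np.exp(s - s.max()).sum())
        total += scores[item] - log_norm
        remaining[item] = False
    return float(total)

def pl_sample(scores, k, rng):
    """Sample an ordered size-k candidate set via the Gumbel top-k trick."""
    scores = np.asarray(scores, dtype=float)
    perturbed = scores + rng.gumbel(size=scores.size)
    return [int(i) for i in np.argsort(-perturbed)[:k]]
```

The joint probability is a product of `k` softmax terms, so a score-function gradient built on it is tied to the full composition of the sampled set. This is the coupling the abstract identifies as the source of variance growth.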

2605.26379 2026-05-27 stat.ML cs.LG

When Does LeJEPA Learn a World Model?

LeJEPA 何时学习世界模型?

David Klindt, Yann LeCun, Randall Balestriero

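AI summary: Proves that LeJEPA (alignment plus Gaussian regularization) linearly recovers the latent variables (linear identifiability) in worlds where latents evolve under stationary additive-noise transitions, shows that the Gaussian is the only latent distribution that guarantees this property, and establishes approximate identifiability and optimal latent-space planning.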

详情
AI中文摘要

一种混淆世界真实自由度的表示无法支持可靠的规划或组合泛化。我们证明,在潜变量服从平稳加性噪声演化的一类广泛世界中,LeJEPA(对齐加高斯正则化)能从非线性观测中线性恢复世界的潜变量,这一性质称为线性可识别性。我们的主要结果是:在所有此类世界中,高斯分布是唯一保证该性质的潜分布。正向方向依赖于谱分解,其中每个非线性度都受到对齐的严格惩罚,使得线性映射成为最优;反向方向排除了所有非高斯替代。我们进一步证明了近似可识别性结果,其中保证会优雅地退化,并表明线性正交可识别性能够实现最优潜空间规划。我们通过从二维示例到1024维潜变量的实验验证了理论,包括分布消融和基于像素的机器人控制。我们的理论将经验上成功的配方转化为数学保证,为构建能够可证明恢复世界结构的世界模型提供了基础。

英文摘要

A representation that scrambles the true degrees of freedom of the world cannot support reliable planning or compositional generalization. We prove that LeJEPA (alignment plus Gaussian regularization) linearly recovers the world's latent variables from nonlinear observations, a property known as linear identifiability, in a broad class of worlds where latents evolve under stationary, additive-noise transitions. Our main result is that among all such worlds, the Gaussian is the unique latent distribution for which this guarantee holds. The forward direction rests on a spectral decomposition in which each degree of nonlinearity is strictly penalized by alignment, making the linear map the optimum; the converse rules out every non-Gaussian alternative. We further prove an approximate identifiability result where the guarantee degrades gracefully, and show that linear, orthogonal identifiability enables optimal latent-space planning. We validate the theory with experiments ranging from 2D examples to 1024-dimensional latents, including distributional ablations and pixel-based robotic control. Our theory turns an empirically successful recipe into a mathematical guarantee, providing the foundation for building World Models that provably recover the structure of the world.

2605.26373 2026-05-27 cs.LG math.OC stat.ML

Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

通过算法等价性在隐凸损失上的在线学习:最优遗憾、几何障碍与Bandit反馈

Anas Barakat, Andreas Kontogiannis, Vasilis Pollatos, Ioannis Panageas, Antonios Varvitsiotis

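AI summary: Via a sharper discrete-time algorithmic equivalence argument, proves that online gradient descent achieves the optimal $\mathcal{O}(\sqrt{T})$ regret on hidden-convex losses, clarifies the geometric condition required, and extends the analysis to one-point bandit feedback with $\mathcal{O}(T^{3/4})$ expected regret.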

详情
Comments
43 pages
AI中文摘要

我们研究具有隐凸损失的对抗性在线学习,即经过非线性重参数化后变为凸的非凸损失。Ghai, Lu和Hazan (2022)证明,在几何和光滑性假设下,此类非凸损失上的在线梯度下降(OGD)近似模拟了具有适当正则化器的底层凸损失上的在线镜像下降(OMD),得到$\mathcal{O}(T^{2/3})$遗憾。他们留下了是否可以在隐凸设置中恢复在线凸优化的最优$\Theta(\sqrt{T})$遗憾的开放问题。我们肯定地回答了这个问题。更具体地,通过更尖锐的离散时间算法等价性论证,我们证明在相同假设下OGD达到$\mathcal{O}(\sqrt{T})$遗憾,匹配对抗性在线凸优化的最坏情况最优速率。我们还解决了Ghai, Lu和Hazan (2022)的另一个开放问题,澄清了这种算法等价性所需的几何条件。我们将对角雅可比充分条件替换为必要且充分的Hessian相容性条件,从而扩展了可允许重参数化的类别。我们用下界补充了紧的遗憾界,表明Hessian相容性假设对OGD是必要的;当该条件不成立时,我们构造一个光滑的重参数化和一个对抗性的隐凸损失序列,使得OGD遭受$\Omega(T)$遗憾。最后,我们将分析扩展到单点Bandit反馈,并证明使用球形平滑的Bandit OGD的$\mathcal{O}(T^{3/4})$期望遗憾界,匹配其在凸损失上的经典速率。

英文摘要

We study adversarial online learning with hidden-convex losses, i.e., nonconvex losses that become convex after a nonlinear reparameterization. Ghai, Lu and Hazan (2022) proved that, under geometric and smoothness assumptions, online gradient descent (OGD) on such nonconvex losses approximately simulates online mirror descent (OMD) on the underlying convex losses with a suitable regularizer, yielding $\mathcal{O}(T^{2/3})$ regret. They left open whether the optimal $\Theta(\sqrt{T})$ regret from online convex optimization can be recovered in this hidden-convex setting. We answer this question affirmatively. More specifically, via a sharper discrete-time algorithmic equivalence argument, we prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret under the same assumptions, matching the optimal worst-case rate for adversarial online convex optimization. We also address another open question of Ghai, Lu and Hazan (2022) by clarifying the geometry required for this algorithmic equivalence. We replace the diagonal-Jacobian sufficient condition with a necessary-and-sufficient Hessian compatibility condition, thereby expanding the class of admissible reparameterizations. We complement our tight regret bound with a lower bound showing that the Hessian compatibility assumption is essential for OGD; when it fails, we construct a smooth reparameterization and an adversarial sequence of hidden-convex losses for which OGD suffers $\Omega(T)$ regret. Finally, we extend our analysis to one-point bandit feedback and prove an $\mathcal{O}(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing, matching its classical rate on convex losses.

2605.26361 2026-05-27 math.OC stat.ML

Fast Convergence of Policy Regret in Learning Stochastic Optimal Control

学习随机最优控制中策略遗憾的快速收敛

Shengbo Wang, Jose Blanchet, Peter Glynn

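AI summary: Studies value-based policy learning in stochastic optimal control, where a greedy policy is induced by an estimate of the optimal action-value function $Q^*$, and shows that in continuous action spaces three geometric structures (the growth exponent $p$, the margin-mass exponent $m$, and the action-wise regularity exponent $q$) drive fast convergence of policy regret, yielding the minimax-optimal rate.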

详情
AI中文摘要

现代运营环境中的策略学习面临有限运营数据与必须识别和部署良好决策的大规模、通常连续的状态和动作空间之间的基本张力。我们研究随机最优控制中基于值的策略学习:由最优动作值函数$Q^*$的估计诱导的贪婪策略被部署,其性能通过遗憾来衡量。该方法的经验成功需要统计洞察哪些结构能够实现快速遗憾收敛。我们证明,在连续动作空间中,快速策略学习由三种几何结构驱动:增长指数$p$,量化$Q^*$将次优动作与其最大化者分离的速度;边际质量指数$m$,控制部署质量落在增长较弱状态上的程度;以及动作方向正则性指数$q$,衡量$Q^*$估计误差在动作间的平滑度。给定$n^{-1/2}$精度的$Q^*$估计量,我们证明极小化最优策略遗憾收敛速率为\[ \widetilde{\Theta}\left( n^{-\min\left\{\frac{p}{2(p-q)},\frac{m+1}{2m}\right\}} \right), \] 在两个区域边界处存在对数因子。指数$q$至关重要:$q>0$产生快于$n^{-1/2}$的遗憾。这一区域在运营应用中很自然。特别是,我们在动态库存控制和服务分配示例中验证了在温和正则条件下$q>0$,而这一快速速率机制背后的原理超出了这些设置。

英文摘要

Policy learning in modern operations environments faces a fundamental tension between limited operational data and the large, often continuous, state and action spaces over which good decisions must be identified and deployed. We study value-based policy learning in stochastic optimal control: a greedy policy induced by an estimate of the optimal action-value function $Q^*$ is deployed, and its performance is measured by regret. The empirical success of this approach calls for statistical insight into the structures that enable fast regret convergence. We show that, in continuous action spaces, fast policy learning is induced by three geometric structures: a growth exponent $p$, which quantifies how quickly $Q^*$ separates suboptimal actions from its maximizers; a margin-mass exponent $m$, which controls how much deployment mass lies on states with weak growth; and an action-wise regularity exponent $q$, which measures the smoothness of the $Q^*$-estimation error across actions. Given an $n^{-1/2}$-accurate estimator of $Q^*$, we show that the minimax-optimal policy regret convergence rate is \[ \widetilde{\Theta}\left( n^{-\min\left\{\frac{p}{2(p-q)},\frac{m+1}{2m}\right\}} \right), \] up to a logarithmic factor at the boundary between the two regimes. The exponent $q$ is crucial: $q>0$ yields faster-than-$n^{-1/2}$ regret. This regime is natural in operations applications. In particular, we verify $q>0$ under mild regularity conditions in dynamic inventory control and service allocation examples, while the mechanism underlying this fast rate regime extends beyond these settings.
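The rate exponent in the displayed formula can be evaluated directly to see why $q>0$ matters. The sketch below just evaluates $\min\{p/(2(p-q)),\,(m+1)/(2m)\}$. The parameter ranges it enforces ($p > q \ge 0$, $m > 0$) are assumptions made so the expression is well defined, not ranges stated in the abstract.

```python
def regret_rate_exponent(p, m, q):
    """Exponent r such that the stated regret rate is n**(-r), up to log factors.

    p: growth exponent, m: margin-mass exponent, q: action-wise regularity
    exponent. The domain check below is an assumption for well-definedness.
    """
    if not (p > q >= 0 and m > 0):
        raise ValueError("expected p > q >= 0 and m > 0")
    growth_term = p / (2.0 * (p - q))
    margin_term = (m + 1.0) / (2.0 * m)
    return min(growth_term, margin_term)
```

With $q=0$ the first term equals $1/2$ and the second exceeds $1/2$ for every finite $m$, so the exponent is exactly $1/2$, matching the accuracy of the $Q^*$ estimator. For example, $p=2$, $m=2$, $q=0.5$ gives $\min\{2/3,\,3/4\}=2/3$, a rate faster than $n^{-1/2}$.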

2605.26341 2026-05-27 cs.LG stat.ML

A PAC-Bayesian View of Generalisation for Physics-Informed Machine Learning

物理信息机器学习的泛化性的PAC-Bayesian视角

Thien V. Nguyen, Amaury Habrard, Benjamin Guedj

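AI summary: Using a PAC-Bayesian framework for regression with unbounded losses, derives high-probability generalisation bounds for physics-informed machine learning and proposes a self-bounding-aware learning algorithm; on standard PDE benchmarks the bounds are non-vacuous and tighter than union-bound baselines.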

详情
AI中文摘要

物理信息机器学习(PIML)将机械知识(通常以偏微分方程(PDE)的形式)整合到数据驱动模型中。尽管经验性能强劲,但其统计泛化性质仍未被充分理解,尤其是在具有无界损失的回归设置中。现有分析依赖于近似或稳定性论证,未能完全捕捉物理结构如何影响有限数据的泛化。在这项工作中,我们为PIML开发了一个PAC-Bayesian框架,在存在无界损失的情况下提供高概率泛化保证。我们采用多任务视角,联合处理数据保真度、PDE残差、初始条件和边界条件,避免了标准联合界方法导致的松散性。我们的分析利用物理信息目标的结构,推导出新的界,其中复杂度与损失的输入梯度范数成比例,揭示了物理正则性与泛化之间的直接联系。我们在Sobolev和Poincaré型假设下实例化该框架,得到两类界,在不同机制下权衡统计复杂性和光滑性。基于这些结果,我们提出了一种自界感知学习算法,直接优化推导界的可处理代理,以及一种在实际设置中估计相关常数的实用程序。在标准PDE基准上的实证评估表明,我们的界是非平凡的,显著比联合界基线更紧,并且可以在训练过程中有效最小化。总体而言,我们的结果为物理信息模型的泛化提供了原则性的统计基础。

英文摘要

Physics-informed machine learning (PIML) integrates mechanistic knowledge, typically in the form of partial differential equations (PDE), into data-driven models. Despite strong empirical performance, its statistical generalisation properties remain poorly understood, particularly in the regression setting with unbounded losses. Existing analyses rely on approximation or stability arguments and do not fully capture how physical structure influences generalisation from finite data. In this work, we develop a PAC-Bayesian framework for PIML that provides high-probability generalisation guarantees in the presence of unbounded losses. We adopt a multi-task perspective that jointly treats data fidelity, PDE residuals, initial and boundary conditions, avoiding the looseness induced by standard union-bound approaches. Our analysis leverages the structure of physics-informed objectives to derive novel bounds where the complexity scales with input-gradient norms of the losses, revealing a direct link between physical regularity and generalisation. We instantiate this framework under Sobolev and Poincaré-type assumptions, yielding two classes of bounds that trade off statistical complexity and smoothness in different regimes. Building on these results, we propose a self-bounding-aware learning algorithm that directly optimises tractable surrogates of the derived bounds, along with a practical procedure to estimate the associated constants in realistic settings. Empirical evaluations on standard PDE benchmarks demonstrate that our bounds are non-vacuous, significantly tighter than union-bound baselines, and can be effectively minimised during training. Overall, our results provide a principled statistical foundation for the generalisation of physics-informed models.

2605.26335 2026-05-27 stat.ME

Unobserved Heterogeneity in Threshold Regression Based on the Hitting Times of a Reflected Brownian Motion for Recurrent Hypoglycemia

基于反射布朗运动首次击中时间的阈值回归中未观测到的异质性:以复发性低血糖为例

Yingfa Xie, Haoda Fu, Yuan Huang, Jun Yan

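AI summary: To address the inadequate treatment of between-individual heterogeneity in recurrent hypoglycemia data, the paper proposes a finite mixture model for the first hitting times of a reflected Brownian motion, with component-specific regression coefficients and frailty parameters, and uses Bayesian inference to identify subgroups with different risk profiles.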

详情
AI中文摘要

复发性低血糖的分析对于糖尿病患者的有效治疗管理至关重要。通常,此类分析中的受试者内依赖性通过受试者水平的脆弱性来捕捉。最近的研究使用反射布朗运动的首次击中时间对复发性低血糖进行建模。对该方法的仔细检查表明,它未能充分解释个体间不同的脆弱性,这指示了显著的异质性。为解决这一不足,我们提出了反射布朗运动首次击中时间分布的有限混合模型。该模型允许成分特异性的回归系数和脆弱参数,从而提供关于风险因素如何不同地影响患者亚组的细致见解。我们采用贝叶斯框架进行推断,利用马尔可夫链蒙特卡洛进行估计。模型选择使用偏差信息准则和伪边际似然对数进行。通过模拟研究评估这些准则的有效性。应用于复发性低血糖建模揭示了两个具有不同风险特征的亚组,如其波动性所反映。贝叶斯模型比较准则倾向于具有成分特异性波动性回归系数的模型。波动性较低的亚组表现出较大的方差,因此具有更高水平的异质性。

英文摘要

Analyses of recurrent hypoglycemia are critical for effective treatment management in diabetic patients. Typically, within-subject dependency in such analyses is captured through subject-level frailty. Recent research has modeled recurrent hypoglycemia using the first hitting times of a reflected Brownian motion. A close examination of this approach reveals that it does not adequately account for varying frailties among individuals, which indicate notable heterogeneity. To address this gap, we propose a finite mixture model of the first hitting time distribution of the reflected Brownian motion. This model allows for component-specific regression coefficients and frailty parameters, providing nuanced insights into how risk factors differently affect patient subgroups. We employ a Bayesian framework for inference, utilizing Markov chain Monte Carlo for estimation. Model selection is conducted using the Deviance Information Criterion and the Logarithm of the Pseudo-Marginal Likelihood. The effectiveness of these criteria is assessed through simulation studies. Application to recurrent hypoglycemia modeling reveals two subgroups with different risk profiles, as reflected in their volatilities. Bayesian model comparison criteria favor the model with component-specific regression coefficients for volatilities. The subgroup with lower volatility exhibits a larger variance and, hence, a greater level of heterogeneity.

2605.26312 2026-05-27 stat.ME

Cross-modal dependence analysis with asynchronous longitudinal multimodal data

异步纵向多模态数据的跨模态依赖分析

Kun Qian, Hyung G. Park

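AI summary: Proposes a Bayesian latent variable model for estimating covariate-assisted dependence structures in asynchronously observed multimodal data; applied to Alzheimer's Disease Neuroimaging Initiative data, it reveals clinically meaningful patterns of longitudinal cross-modal biomarker dependence.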

详情
AI中文摘要

我们提出一种贝叶斯潜变量模型,用于估计可能异步观测的多变量数据跨模态的协变量辅助依赖结构。这种设置常见于纵向生物医学研究,尤其是在复杂疾病的观察性和临床研究中,其中生物标志物模态之间的动态和异质性依赖可能具有病理学和临床信息。例如,阿尔茨海默病的生物学诊断和分期需要综合评估多模态生物标志物,包括影像和生物流体生物标志物,而阿尔茨海默病神经影像学倡议(ADNI)研究已纵向收集生物标志物数据超过二十年。然而,由于研究设计和数据收集的限制,多模态谱的异步收集常常给定量分析带来挑战。常见的分析策略,如将推断限制在完整观测或分别分析每个模态,可能会丢失信息并引入偏差。因此,我们旨在联合建模所有可用数据,并估计随时间演变且在不同人口学或临床组间变化的总体水平跨模态依赖结构,其中模态对的互协方差矩阵是主要关注量。所提出的模型使用模态特定的低秩载荷结构,结合共享潜变量,以跨模态、访视和受试者借用信息,同时考虑重复测量。应用于ADNI数据揭示了纵向跨模态生物标志物依赖中具有临床意义的模式,模拟研究显示在有限模态同步性下恢复效果得到改善。

英文摘要

We propose a Bayesian latent variable model to estimate covariate-assisted dependence structures across multiple modalities of multivariate data that may be observed asynchronously. This setting commonly arises in longitudinal biomedical research, especially in observational and clinical studies of complex diseases, where dynamic and heterogeneous dependence across biomarker modalities can be pathologically and clinically informative. For example, the biological diagnosis and staging of Alzheimer's disease require integrated evaluation of multimodal biomarkers, including imaging and biofluid biomarkers, and the Alzheimer's Disease Neuroimaging Initiative (ADNI) study has collected biomarker data longitudinally for over two decades. However, quantitative analysis is often challenged by asynchronous collection of multimodal profiles due to study design and data collection constraints. Common analytic strategies that restrict inference to complete observations or analyze each modality separately can lose information and introduce bias. Therefore, we aim to jointly model all available data and estimate the population-level cross-modal dependence structure that evolves over time and varies across demographic or clinical groups, where the cross-covariance matrices for modality pairs serve as the primary quantities of interest. The proposed model uses modality-specific low-rank loading structures with shared latent variables to borrow information across modalities, visits, and subjects while accounting for repeated measurements. The application to ADNI data reveals clinically meaningful patterns in longitudinal cross-modal biomarker dependence, and the simulation study shows improved recovery under limited modality synchrony.

2605.26288 2026-05-27 stat.ML cs.LG stat.ME

Beyond Differences: Doubly Robust Meta-Learners for Ratio-Based Treatment Effects

超越差异:基于比率的治疗效应的双重稳健元学习器

Michael Fuchs, Dominik Kreiss

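AI summary: For estimating ratio-based conditional average treatment effects (CATE), the paper proposes the Q-Learner, which decomposes the ratio into a product of two odds ratios, and derives doubly robust augmentations; the methods perform well in low-conversion regimes and on confounded observational data.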

详情
Comments
13+5 pages, 5 figures, 6 tables. Code: https://github.com/michaelfuchs90/ratiobasedcate
AI中文摘要

当治疗效应自然表达为比率时——如在医学、定价和营销中——基于比率的CATE $τ(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ 是合适的估计目标。然而,现有估计器要么施加对数线性参数结构,要么应用通用回归而不对该泛函提供稳健性保证。我们引入了Q-Learner,它将$τ(x)$分解为两个优势比的乘积,将二元结果的比率CATE估计简化为两个倾向性分类任务。我们进一步推导了S/T型和Q型比率学习器的双重稳健增强,并刻画了它们不同的稳健性性质。在七个RCT数据集的基准测试中,Q-Learner在低转化率场景下是最持续有竞争力的方法,其仅基于倾向性的构造规避了伤害基于结果估计器的不平衡回归。在四个观测数据集上,其中倾向性必须估计且混杂无法排除,本文引入的DR学习器明确胜出,使其成为实践者在混杂观测数据中的自然默认选择。

英文摘要

When treatment effects are naturally expressed as ratios -- as in medicine, pricing, and marketing -- the ratio-based CATE $τ(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ is the appropriate estimand. Yet existing estimators either impose a log-linear parametric structure or apply generic regression without robustness guarantees for this functional. We introduce the Q-Learner, which decomposes $τ(x)$ into a product of two odds ratios, reducing ratio-CATE estimation for binary outcomes to two propensity classification tasks. We further derive doubly robust augmentations for both S/T- and Q-style ratio learners and characterize their distinct robustness properties. In benchmarks on seven RCT datasets, the Q-Learner is the most consistently competitive method in low-conversion regimes, where its propensity-only construction sidesteps the imbalanced regression that hurts outcome-based estimators. On four observational datasets, where propensity must be estimated and confounding cannot be ruled out, the DR learners introduced here decisively come out on top, making them practitioners' natural default for confounded observational data.
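
One way to see how a ratio of conversion rates can be rewritten using treatment propensities alone is Bayes' rule. The sketch below checks, on a hypothetical discrete population at a single covariate value, the identity that $τ(x)$ equals the odds of treatment among converters divided by the overall odds of treatment. It only illustrates the general idea: the exact decomposition and estimators used by the Q-Learner are defined in the paper and may differ, and every name and number here is invented.

```python
# Hypothetical joint probabilities P(W=w, Y=y | X=x) at one fixed covariate value x.
# Keys are (w, y). The numbers are invented for illustration, not taken from the paper.
joint = {
    (1, 1): 0.06,  # treated, converted
    (1, 0): 0.34,  # treated, not converted
    (0, 1): 0.03,  # control, converted
    (0, 0): 0.57,  # control, not converted
}


def ratio_cate_direct(joint):
    """tau(x) = E[Y | W=1, x] / E[Y | W=0, x], computed from outcome rates."""
    p_w1 = joint[(1, 1)] + joint[(1, 0)]
    p_w0 = joint[(0, 1)] + joint[(0, 0)]
    return (joint[(1, 1)] / p_w1) / (joint[(0, 1)] / p_w0)


def ratio_cate_from_propensities(joint):
    """Same quantity from treatment odds only (two classification-style targets)."""
    # Odds of treatment among converters: P(W=1 | Y=1, x) / P(W=0 | Y=1, x)
    odds_given_converted = joint[(1, 1)] / joint[(0, 1)]
    # Overall odds of treatment: P(W=1 | x) / P(W=0 | x)
    p_w1 = joint[(1, 1)] + joint[(1, 0)]
    p_w0 = joint[(0, 1)] + joint[(0, 0)]
    odds_marginal = p_w1 / p_w0
    return odds_given_converted / odds_marginal


tau_direct = ratio_cate_direct(joint)
tau_propensity = ratio_cate_from_propensities(joint)
print(tau_direct, tau_propensity)
```

In this toy population the conversion rates are 0.15 under treatment and 0.05 under control, so both routes give a ratio of about 3. The propensity route never divides by the small control conversion rate directly, which hints at why such constructions may be attractive when conversions are rare.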

2605.26271 2026-05-27 stat.ML cs.LG econ.EM

Learning Nonlinear Factor Models with Unknown Monotone Links from Incomplete and Noisy Data

从不完整和含噪数据中学习具有未知单调链接的非线性因子模型

Yutong Chao, Resat Gökhan, Jalal Etesami, Ali Habibnia

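AI summary: Studies the joint recovery of low-rank factors, loadings, and an unknown monotone link function from noisy and incomplete data, proposing a projected block coordinate descent algorithm with convergence guarantees.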

详情
AI中文摘要

我们研究了一个非线性因子模型,其中观测响应通过未知的单调链接函数依赖于低秩潜在因子。由于严重的非凸性和可识别性问题,这一设置具有挑战性且在很大程度上未被充分探索。链接函数假设位于再生核希尔伯特空间(RKHS)中,从而在保持可识别性的同时实现灵活的非参数建模。我们将问题表述为从可能不完整和含噪的观测中联合恢复低秩因子、载荷和非线性链接函数,并提出一种带有显式正则化的投影块坐标下降(BCD)算法以解决尺度和旋转模糊性。在因子的弱不相干性和标准采样条件下,我们建立了无噪声和有噪声情况下的收敛保证,以及链接函数更新的次线性遗憾界。我们的结果将经典线性因子模型推广到广泛的非线性领域,并为学习非线性潜在结构提供了一个原则性框架。我们通过受控的合成实验评估了所提出的方法,显示出有希望的性能。

英文摘要

We study a nonlinear factor model in which observed responses depend on low-rank latent factors through an unknown monotone link function. This setting is challenging and largely underexplored due to severe nonconvexity and identifiability issues. The link function is assumed to lie in a reproducing kernel Hilbert space (RKHS), enabling flexible nonparametric modeling while preserving identifiability. We formulate the problem as the joint recovery of the low-rank factors, loadings, and the nonlinear link function from possibly incomplete and noisy observations and propose a projected block coordinate descent (BCD) algorithm with explicit regularization to address scale and rotational ambiguities. Under mild incoherence of factors and standard sampling conditions, we establish convergence guarantees in both noiseless and noisy regimes, along with sublinear regret bounds for the link-function updates. Our results extend classical linear factor models to a broad nonlinear regime and provide a principled framework for learning nonlinear latent structures. We evaluate the proposed approach using controlled synthetic experiments, indicating promising performance.

2605.26253 2026-05-27 stat.ME stat.AP

Length-biased Birnbaum-Saunders quantile regression with application to water evaporation

长度偏差Birnbaum-Saunders分位数回归及其在水蒸发中的应用

Helton Saulo, Tailine Nonato, Roberto Vila

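AI summary: Proposes a quantile regression model based on the length-biased Birnbaum-Saunders distribution, reparameterized so that covariate effects on conditional quantiles of the response can be interpreted directly, and applies it to Brazilian meteorological data.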

详情
Comments
21 pages, 3 figures
AI中文摘要

长度偏差分布自然出现在环境、可靠性和经济研究中,其中抽样机制倾向于较大的观测单元。本文提出了一种基于长度偏差Birnbaum-Saunders (QLBS) 分布的分位数回归模型。该模型通过将长度偏差Birnbaum-Saunders分布重新参数化为其分位数函数来构建,从而允许直接解释协变量对响应变量条件分位数的影响。我们推导了对数似然函数和相应的得分方程,并通过数值优化获得最大似然估计。考虑了渐近和自助法置信区间。提出了两种用于模型评估的残差,即广义Cox-Snell残差和随机分位数残差。进行了详细的蒙特卡洛模拟研究,以评估不同样本量和分位数水平下最大似然估计的性能。通过巴西的真实气象数据集说明了所提出的方法。

英文摘要

Length-biased distributions arise naturally in environmental, reliability, and economic studies where the sampling mechanism favors larger observational units. In this paper, we propose a quantile regression model based on the length-biased Birnbaum--Saunders (QLBS) distribution. The model is constructed through a reparameterization of the length-biased Birnbaum--Saunders distribution in terms of its quantile function, thereby allowing direct interpretation of covariate effects on conditional quantiles of the response variable. We derive the log-likelihood function and the corresponding score equations, and obtain maximum likelihood estimators via numerical optimization. Asymptotic and bootstrap confidence intervals are considered. Two types of residuals are proposed for model assessment, namely the generalized Cox--Snell and randomized quantile residuals. An elaborate Monte Carlo simulation study is carried out to evaluate the performance of the maximum likelihood estimators for several sample sizes and quantile levels. The proposed methodology is illustrated with a real meteorological data set from Brazil.

2605.26222 2026-05-27 cs.LG stat.ML

From Privacy to Generalization: Linear Max-Information Bounds for DP-SGD

从隐私到泛化:DP-SGD的线性最大信息界

Christoph H. Lampert, Hossein Zakerinia

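AI summary: Proves a finite-sample bound on the approximate max-information of DP-SGD that is at most linear in the dataset size, and derives from it a PAC-Bayes generalization bound and an explicit generalization bound for DP-SGD-trained models.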

详情
Comments
22 pages
AI中文摘要

理解泛化与隐私之间的关系仍然是现代机器学习理论中的一个核心挑战,特别是对于通过差分隐私随机梯度下降(DP-SGD)变体训练的深度网络。在这项工作中,我们通过证明DP-SGD的近似最大信息量的有限样本界,该界展现出与(Dwork et al, 2015)关于$ε$-差分隐私算法的经典结果相当的缩放性质,即最多与数据集大小成线性关系,从而在这个长期存在的开放问题上取得了进展。根据我们的结果,我们得到了一个通用的PAC-Bayes泛化界,其中所需的先验分布可以由DP-SGD学习,以及一个针对DP-SGD训练模型本身的泛化界,其复杂度项完全显式且由优化超参数控制。

英文摘要

Understanding the relationship between generalization and privacy remains a central challenge in modern machine learning theory, particularly for deep networks trained by variants of differentially private stochastic gradient descent (DP-SGD). In this work we make progress on this persistent open problem by proving a finite-sample bound on the approximate max-information of DP-SGD whose scaling is comparable to that of the classic result of Dwork et al. (2015) for $ε$-differentially private algorithms, namely at most linear in the dataset size. From our result we obtain a general-purpose PAC-Bayes generalization bound in which the necessary prior distribution can be learned by DP-SGD, as well as a generalization bound for DP-SGD-trained models themselves, with a complexity term that is fully explicit and controlled by the optimization hyperparameters.

2605.26123 2026-05-27 eess.SY cs.SY stat.CO

Low Latency Stand Alone Compute-Efficient Forecasting of Marine Engine Time Series Data

低延迟独立计算高效的船舶发动机时间序列数据预测

Y. Harsha Vardhana Reddy, Soumyendu Raha

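AI summary: Proposes a nonlinear state-space forecasting framework for marine engine parameters based on adaptive-window multi-particle stochastic differential equations, improving multi-step prediction stability while remaining computationally efficient.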

详情
Comments
10 pages, 9 figures, 4 tables
AI中文摘要

高性能船舶的运行可靠性关键取决于其船舶推进系统的健康状况,而推进系统日益承受多样化的运行载荷和环境压力。本文提出了一种鲁棒的数学框架,用于使用自适应窗口多粒子随机微分方程对船舶发动机参数进行非线性状态空间预测。传统的时序模型如向量自回归积分滑动平均,由于依赖固定窗口线性假设,往往无法捕捉复杂系统固有的随机性和瞬态动力学。为了解决这一问题,我们开发了一种双层估计方法:首先,自适应回溯机制根据瞬时漂移幅度动态调整学习窗口大小,确保在非平稳状态下的响应性。其次,通过欧拉-丸山离散化演化多粒子集成,其中每个粒子轨迹代表系统状态的一个随机实现。为了优化集成均值并减轻原始估计器的“噪声追逐”行为,实施了由吉尔萨诺夫变换诱导的概率测度变化,为符合物理漂移的粒子分配更高的概率权重。理论评估和实证基准测试表明,所提出的自适应SDE框架在多步预测稳定性和计算效率上显著优于经典统计基线。该模型为具有高频波动和非线性转变的系统中的实时风险量化提供了一种可扩展的“灰盒”解决方案。

英文摘要

The operational reliability of a high-performance marine vessel depends critically on the health of its propulsion systems, which are increasingly subjected to diverse operational loads and environmental stressors. This paper proposes a robust mathematical framework for non-linear state-space forecasting of marine engine parameters using adaptive-window multi-particle stochastic differential equations. Traditional time-series models such as Vector Autoregressive Integrated Moving Average often fail to capture the inherent stochasticity and transient dynamics of complex systems due to their reliance on fixed-window linear assumptions. To address this, we develop a dual-layered estimation approach. First, an adaptive lookback mechanism dynamically adjusts the learning window size based on the instantaneous drift magnitude, ensuring responsiveness during non-stationary regimes. Second, a multi-particle ensemble is evolved via Euler-Maruyama discretization, where each particle trajectory represents a stochastic realization of the system state. To refine the ensemble mean and mitigate the "noise-chasing" behavior of raw estimators, a Girsanov-transform-induced change of probability measure is implemented, assigning higher probabilistic weights to particles that align with the physical drift. Theoretical evaluation and empirical benchmarking demonstrate that the proposed adaptive SDE framework significantly outperforms classical statistical baselines in multi-step prediction stability and computational efficiency. The model provides a scalable, "grey-box" solution for real-time risk quantification in systems characterized by high-frequency volatility and non-linear transitions.
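
The multi-particle Euler-Maruyama step mentioned in the abstract can be sketched in a few lines. The example below is a minimal illustration only: it evolves an ensemble under a hypothetical mean-reverting drift and returns the ensemble mean at each step. The drift, diffusion, parameter values, and function names are assumptions made for this sketch, and it includes neither the adaptive lookback window nor the Girsanov reweighting described by the authors.

```python
import math
import random


def euler_maruyama_ensemble(x0, drift, diffusion, dt, n_steps, n_particles, seed=0):
    """Evolve independent particles of dX = drift(X) dt + diffusion(X) dW.

    Returns the final particle states and the ensemble mean after every step.
    """
    rng = random.Random(seed)
    particles = [x0] * n_particles
    means = [x0]
    for _ in range(n_steps):
        updated = []
        for x in particles:
            dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
            updated.append(x + drift(x) * dt + diffusion(x) * dw)
        particles = updated
        means.append(sum(particles) / n_particles)
    return particles, means


# Hypothetical Ornstein-Uhlenbeck dynamics standing in for an engine signal:
# the state relaxes toward the level mu at rate theta with noise scale sigma.
theta, mu, sigma = 2.0, 80.0, 1.5
particles, means = euler_maruyama_ensemble(
    x0=70.0,
    drift=lambda x: theta * (mu - x),
    diffusion=lambda x: sigma,
    dt=0.01,
    n_steps=500,
    n_particles=200,
)
print(round(means[-1], 2))
```

Each particle is one stochastic realization of the state, and the ensemble mean serves as the point forecast. In this toy setup the mean relaxes from 70 toward the assumed long-run level of 80.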

2605.25678 2026-05-27 stat.ML cs.DS cs.LG math.ST stat.TH

PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

带强盗反馈的PAC学习:可实现设置下的精确样本复杂度

Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri

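AI summary: Studies multiclass PAC learning with bandit feedback in the realizable setting; by defining a new combinatorial dimension (the bandit DS dimension) and building on the ListCascade algorithm, it characterizes the optimal sample complexity up to logarithmic factors.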

详情
Comments
18 pages
AI中文摘要

我们研究了可实现设置下带强盗反馈的多类PAC学习问题。在该框架中,存在一个实例空间$\mathcal{X}$和标签空间$\mathcal{Y}$上的未知数据分布,与经典多类PAC学习相同,但学习器无法观察到独立同分布训练样本的标签。相反,在每一轮中,它接收一个无标签实例,预测其标签,并接收仅指示预测是否正确的强盗反馈。尽管有此限制,目标仍与经典PAC学习相同。我们对该问题的最优样本复杂度给出了一个一般性刻画,对于每个概念类至多相差对数因子。该刻画基于一个新的组合维度,称为强盗$\mathrm{DS}$维度,通过我们称为伪盒子的广义组合结构定义。这些结构扩展了$\mathrm{DS}$维度所依赖的伪立方体,允许每个坐标有不同数量的邻居。与通过计数伪立方体中坐标数量来刻画完全信息设置的$\mathrm{DS}$维度不同,强盗$\mathrm{DS}$维度聚合了各坐标的邻居数量,从而得到样本复杂度与邻居总数成比例的刻画。我们还提出了一种通用的学习算法,称为ListCascade,实现了上界,该算法将强盗学习与列表学习联系起来,可能具有独立意义。

英文摘要

We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical multiclass PAC learning, but the learner does not observe the labels of the i.i.d. training examples. Instead, in each round, it receives an unlabeled instance, predicts its label, and receives bandit feedback indicating only whether the prediction is correct. Despite this restriction, the goal remains the same as in classical PAC learning. We provide a general characterization of the optimal sample complexity of this problem, sharp for every concept class up to logarithmic factors. Our characterization is based on a new combinatorial dimension, termed the bandit $\mathrm{DS}$ dimension, defined via generalized combinatorial structures we call pseudo-boxes. These extend the pseudo-cubes underlying the $\mathrm{DS}$ dimension by allowing a different number of neighbors in each coordinate. In contrast to the $\mathrm{DS}$ dimension, which governs the full-information setting by counting the number of coordinates in the pseudo-cube, the bandit $\mathrm{DS}$ dimension aggregates the number of neighbors across coordinates, leading to a characterization in which the sample complexity scales with the total number of neighbors. We also propose a general learning algorithm achieving the upper bound, based on an algorithmic principle called ListCascade, which connects bandit learning to list learning and may be of independent interest.

2605.24644 2026-05-27 math.OC stat.ML

Quadratically Regularized Optimal Transport: Localization Bounds and Affine Case Analysis

二次正则化最优传输:局部化界与仿射情况分析

Long Nguyen-Chi, Nam Nguyen, Binh Nguyen

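AI summary: Establishes a lower bound on how fast the support of the quadratically regularized optimal transport (QOT) optimizer can localize around the Monge coupling, and shows that the optimal exponent is attained in the affine Brenier regime.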

详情
Journal ref
Forty-Third International Conference on Machine Learning (ICML 2026)
AI中文摘要

二次正则化已成为计算最优传输中流行的熵正则化的潜在替代方案,通过其铰链密度结构提供了产生稀疏耦合的理论优势。尽管在一维设置和一般上界方面取得了近期进展,但关于QOT优化器在Monge耦合附近局部化速率的基本问题仍然悬而未决。在这项工作中,我们建立了一个一般的下界,表明QOT优化器的支撑集在定向Hausdorff距离下不能以快于$\varepsilon^{\frac{1}{d+2}}$阶的速度集中在Monge图附近,这与\citet{wiesel2025sparsity}中标准正则性假设下的猜想最优指数相匹配。我们还证明了QOT值差距控制均方偏差$\mathbb E_{π_\varepsilon}\|y-T(x)\|^2$的尺度为$\varepsilon^{\frac{2}{d+2}}$。作为推论,在仿射Brenier情形(包括高斯到高斯传输)中,通过将问题简化为自传输并应用最近的自传输稀疏性结果,我们推导出阶为$\varepsilon^{\frac{1}{d+2}}$的尖锐逐点管界。最后,我们通过高维设置下的合成实验验证了我们的理论界。

英文摘要

Quadratic regularization has emerged as a potential alternative to the popular entropic regularization in computational optimal transport, offering the theoretical advantage of producing sparse couplings through its hinge density structure. Despite recent progress in one-dimensional settings and general upper bounds, fundamental questions about the localization rate of QOT optimizers around the Monge coupling have remained open. In this work, we establish a general lower bound showing that the support of the QOT optimizer cannot concentrate around the Monge graph faster than order $\varepsilon^{\frac{1}{d+2}}$ in the directed Hausdorff distance, matching the conjectured optimal exponent under standard regularity assumptions in \citet{wiesel2025sparsity}. We also show that the QOT value gap controls the mean-squared deviation $\mathbb E_{π_\varepsilon}\|y-T(x)\|^2$ by the scale of $\varepsilon^{\frac{2}{d+2}}$. As a corollary, in the affine Brenier regime, which includes Gaussian-to-Gaussian transport, we derive a sharp pointwise tube bound of order $\varepsilon^{\frac{1}{d+2}}$ by reducing the problem to self-transport and applying recent self-transport sparsity results. Finally, we validate our theoretical bound with a synthetic experiment in high-dimensional settings.
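
For orientation, the quadratically regularized problem and the "hinge density" form of its optimizer are usually written as below. This is the formulation commonly used in the QOT literature, given here on the assumption that the paper adopts the same normalization. The cost $c$, the marginals $\mu$ and $\nu$, and the potentials $f_\varepsilon$, $g_\varepsilon$ are notation introduced for this sketch and do not appear in the abstract.

```latex
\min_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi
  + \frac{\varepsilon}{2} \left\| \frac{d\pi}{d(\mu \otimes \nu)} \right\|_{L^2(\mu \otimes \nu)}^{2},
\qquad
\frac{d\pi_\varepsilon}{d(\mu \otimes \nu)}(x,y)
  = \frac{\bigl( f_\varepsilon(x) + g_\varepsilon(y) - c(x,y) \bigr)_{+}}{\varepsilon}.
```

The positive part $(\cdot)_+$ is what makes the coupling sparse: the density vanishes wherever $f_\varepsilon(x) + g_\varepsilon(y) \le c(x,y)$, so the support is a neighborhood of the Monge graph whose width is the quantity the localization bounds control.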

2605.19052 2026-05-27 stat.ML cs.LG

Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming

可证明数据驱动的混合整数线性规划拉格朗日松弛

Tung Quoc Le, Anh Tuan Nguyen, Viet Anh Nguyen

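AI summary: For Lagrangian relaxation of mixed integer linear programs, the paper uses the data-driven algorithm design framework to analyze generalization bounds and minimax-optimal rates for learned multipliers, and proves that stochastic gradient ascent and learning to warm-start attain the optimal rates.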

详情
Comments
Accepted to ICML 2026
AI中文摘要

拉格朗日松弛(LR)是求解大规模混合整数线性规划(MILP)的强大技术,特别是那些具有可分解结构的问题,如车辆路径或机组组合问题。通过松弛耦合约束,LR能够并行求解子问题,并且通常比标准线性规划松弛产生更紧的对偶界,这对于高效的分支定界剪枝至关重要。虽然最近的实证工作显示出使用机器学习预测这些乘子的有希望的结果,但对此类方法的理论理解仍然是一个开放问题。在这项工作中,我们通过数据驱动算法设计的视角分析学习LR的问题来弥合这一差距,即在一个问题实例分布上的统计学习问题。我们的贡献如下:首先,我们推导出学习乘子的泛化界为$\mathcal{O}(s^{1.5}/\sqrt{N})$,其中$s$是耦合约束的数量,$N$是样本量。其次,我们提供了极小极大下界$\Omega(s/\sqrt{N})$,证明线性依赖是不可避免的。第三,我们通过证明带有平均的随机梯度上升(SGA)达到了极小极大最优速率$\Theta(s/\sqrt{N})$,建设性地缩小了这一理论差距。最后,我们将框架扩展到学习热启动设置,证明其达到了快速、极小极大最优速率$\Theta(s/N)$,并确立了相对于直接乘子预测的理论优势。

英文摘要

Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP) problems, particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the coupling constraints, LR enables parallel subproblem solving and often yields tighter dual bounds than standard linear programming relaxations, which is crucial for efficient branch-and-bound pruning. While recent empirical work has shown promising results using machine learning to predict these multipliers, a theoretical understanding of such methods remains an open question. In this work, we bridge this gap by analyzing the problem of learning LR through the lens of Data-driven Algorithm Design, i.e., as a statistical learning problem over a distribution of problem instances. Our contributions are as follows: first, we derive a generalization bound of $\mathcal{O}(s^{1.5}/\sqrt{N})$ for the learned multipliers, where $s$ is the number of coupling constraints and $N$ is the sample size. Second, we provide a minimax lower bound of $\Omega(s/\sqrt{N})$, proving that a linear dependency is unavoidable. Third, we constructively close this theoretical gap by proving that Stochastic Gradient Ascent (SGA) with averaging achieves the minimax optimal rate $\Theta(s/\sqrt{N})$. Finally, we extend our framework to the learning-to-warm-start setting, proving that it achieves a fast, minimax-optimal rate of $\Theta(s/N)$ and establishing a theoretical advantage over direct multiplier prediction.
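
To make the role of the multipliers concrete, the sketch below relaxes a single coupling constraint in a tiny binary program and maximizes the resulting dual bound by projected subgradient ascent with iterate averaging. It is a deterministic, single-instance illustration: the paper's SGA setting instead draws instances from a distribution. The problem data, step-size schedule, and function names are all assumptions made for this example.

```python
import itertools
import math

# Hypothetical toy problem: minimize c.x over binary x subject to one coupling
# constraint a.x >= b. All numbers are invented for illustration.
c = [4.0, 3.0, 6.0]
a = [2.0, 1.0, 3.0]
b = 4.0


def dual_value_and_subgradient(lam):
    """Evaluate L(lam) = min_x c.x + lam * (b - a.x); the minimum splits per variable."""
    x = [1 if ci - lam * ai < 0 else 0 for ci, ai in zip(c, a)]
    value = lam * b + sum((ci - lam * ai) * xi for ci, ai, xi in zip(c, a, x))
    subgradient = b - sum(ai * xi for ai, xi in zip(a, x))
    return value, subgradient


def averaged_subgradient_ascent(n_iters=2000, step0=0.5):
    """Projected subgradient ascent on lam >= 0, returning the average iterate."""
    lam, lam_sum = 0.0, 0.0
    for t in range(n_iters):
        lam_sum += lam
        _, g = dual_value_and_subgradient(lam)
        lam = max(0.0, lam + step0 / math.sqrt(t + 1) * g)  # project onto lam >= 0
    return lam_sum / n_iters


lam_avg = averaged_subgradient_ascent()
dual_bound, _ = dual_value_and_subgradient(lam_avg)

# Brute-force primal optimum for comparison (feasible only for tiny problems).
best_primal = min(
    sum(ci * xi for ci, xi in zip(c, x))
    for x in itertools.product([0, 1], repeat=len(c))
    if sum(ai * xi for ai, xi in zip(a, x)) >= b
)
print(lam_avg, dual_bound, best_primal)
```

By weak duality the dual value at any nonnegative multiplier is a lower bound on the primal optimum, so a better multiplier means a tighter bound for pruning. Here the best achievable dual bound is 8 against a primal optimum of 9, and the averaged iterate lands close to the maximizing multiplier.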

2605.18468 2026-05-27 stat.ML cs.LG

Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

浅层ReLU$^s$网络在$L^p$型空间和Sobolev空间中的逼近与路径范数控制的泛化

Weizhao Li, Fanghui Liu, Lei Shi

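AI summary: Studies the approximation power of shallow ReLU$^s$ networks in $L^p$-type and Sobolev spaces, and shows that $\ell_1$ path-norm control yields minimax-optimal generalization error in nonparametric regression.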

详情
Comments
42 pages, 1 figure. Updated Theorem 2 and fixed some typos. Authors are listed in alphabetical order and contributed equally
AI中文摘要

本文研究浅层ReLU$^s$网络($\sigma_s(t)=\max\{0,t\}^s$)的逼近性质及其在$\ell_1$路径范数控制下的泛化行为。对于$L^p$型积分空间$\widetilde{\mathcal{F}}_{p,\tau_d,s}$($1\le p\le2$),球谐分析给出了浅层网络的逼近界。特别地,当$\tau_d$为均匀测度且$1\le p<2$时,逼近率为:当$1\le p\le p^*$时为$O\!\left(m^{-\frac{p(2s+2d+1)-2d}{2dp}}\right)$,当$p^*<p<2$时为$O\!\left(m^{-\frac{p(4s+3d-1)-2d+2}{4dp}}\right)$,其中$p^*=\frac{2d+2}{d+3}$。通过嵌入到谱Barron空间,得到了Sobolev空间$W^{\alpha,p}$($1\le p<2$)的逼近界。对于亚高斯噪声下的非参数回归,路径范数正则化的浅层ReLU$^s$网络在$\mathscr{B}_s$上达到极小极大最优速率$O\!\left(n^{-\frac{d+2s+1}{2d+2s+1}}\log n\right)$,在$W^{\alpha,\infty}$上达到$O\!\left(n^{-\frac{2\alpha}{2\alpha+d}}\log n\right)$,且下界匹配至对数因子。

英文摘要

This paper studies approximation by shallow ReLU$^s$ networks, $σ_s(t)=\max\{0,t\}^s$, together with their generalization behavior under $\ell_1$ path-norm control. For the $L^p$-type integral spaces $\widetilde{\mathcal{F}}_{p,τ_d,s}$, $1\le p\le2$, spherical harmonic analysis yields approximation bounds for shallow networks. In particular, when $τ_d$ is the uniform measure and $1\le p<2$, the approximation rate is $O\!\left(m^{-\frac{p(2s+2d+1)-2d}{2dp}}\right)$ for $1\le p\le p^*$ and $O\!\left(m^{-\frac{p(4s+3d-1)-2d+2}{4dp}}\right)$ for $p^*<p<2$, where $p^*=\frac{2d+2}{d+3}$. Approximation bounds for Sobolev spaces $W^{α,p}$, $1\le p<2$, are obtained through embeddings into spectral Barron spaces. For nonparametric regression with sub-Gaussian noise, path-norm-regularized shallow ReLU$^s$ networks achieve minimax-optimal rates $O\!\left(n^{-\frac{d+2s+1}{2d+2s+1}}\log n\right)$ over $\mathscr{B}_s$ and $O\!\left(n^{-\frac{2α}{2α+d}}\log n\right)$ over $W^{α,\infty}$, with matching lower bounds up to logarithmic factors.

2604.24660 2026-05-27 stat.ML econ.EM math.ST stat.ME stat.TH

Nonparametric Instrumental Variable Analysis Without Structural Equations: Debiased Inference on Functionals of Inverse Problems with No Solutions

无结构方程的非参数工具变量分析:无解反问题泛函的去偏推断

Zikai Shen, Nathan Kallus, Dimitri Meunier, Houssam Zenati, Arthur Gretton, Aurélien Bibaut

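AI summary: For inverse problems without exact solutions, the paper proposes debiased inference on finite-dimensional functionals that avoids assuming the structural equations hold exactly, so that inference remains valid when the model fails.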

详情
AI中文摘要

我们考虑对反问题中无穷维最小二乘解的有限维泛函进行去偏推断,以避免必须假设精确解存在。这种假设是实质性的且并非无害,当我们将它们强加于统计模型时,其失败可能会危及推断。我们的方法允许我们对一个无论解是否存在都定义的量进行推断,并且当解存在时,该量与通常的估计量一致。对于工具变量的情况,这意味着我们可以用结构模型来激励分析,但这些模型不需要精确成立,半参数推断程序仍然有效。

英文摘要

We consider debiased inference on finite-dimensional functionals of infinite-dimensional least-squares solutions to inverse problems as a way to avoid having to assume exact solutions exist. Such assumptions are substantive and not innocuous, and their failure may imperil inference when we impose them on the statistical model. Our approach instead allows us to conduct inference on a quantity that is defined regardless of solutions existing and coincides with the usual estimands when they do. For the case of instrumental variables, this means we can motivate the analysis with structural models but these do not need to hold exactly for the semiparametric inferential procedure to remain valid.

2605.15902 2026-05-27 econ.EM stat.ME

Tweedie's Formula and Score-Driven Updating

Tweedie公式与得分驱动更新

Peter Reinhard Hansen, Chen Tong

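AI summary: Provides a Bayesian interpretation of score-driven models through Tweedie's formula, showing that for natural exponential families and general conditional densities the score update is either an exact Bayesian filter or a local approximation.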

详情
AI中文摘要

得分驱动模型使用条件似然得分更新时变参数。本文通过Tweedie公式为这类更新提供了贝叶斯解释,该公式将后验均值修正与边际得分联系起来。在高斯信号提取中,这给出了精确的后验修正恒等式。对于自然指数族,相关恒等式刻画了自然参数空间和期望参数空间中的后验均值。基于这些恒等式,我们证明在局部精度折扣下,期望空间中的共轭贝叶斯滤波恰好与逆Fisher缩放的条件得分更新一致。对于一般条件密度,精确的贝叶斯修正涉及通常不可用的预测边际得分。局部高斯近似表明,条件似然得分提供了该后验修正的主要近似;在局部精度折扣下,预测协方差与逆Fisher信息成正比,从而得到熟悉的逆Fisher缩放得分递归。结果澄清了何时得分驱动更新是精确的贝叶斯滤波,以及何时应将其视为易处理的局部近似。

英文摘要

Score-driven models update time-varying parameters using conditional likelihood scores. This paper develops a Bayesian interpretation of such updates through Tweedie's formula, which connects posterior mean corrections with marginal scores. In Gaussian signal extraction, this gives an exact posterior-correction identity. For natural exponential families, related identities characterize posterior means in natural- and expectation-parameter spaces. Building on these identities, we show that conjugate Bayesian filtering in expectation space coincides exactly with an inverse-Fisher-scaled conditional score update under local precision discounting. For general conditional densities, the exact Bayesian correction involves a generally unavailable predictive-marginal score. A local Gaussian approximation shows that the conditional likelihood score provides the leading approximation to this posterior correction; under local precision discounting, the predictive covariance becomes proportional to inverse Fisher information, yielding the familiar inverse-Fisher-scaled score recursion. The results clarify when score-driven updates are exact Bayesian filters and when they should instead be viewed as tractable local approximations.
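
The Gaussian case of Tweedie's formula is easy to verify numerically. For an observation $y \mid θ \sim N(θ, σ^2)$, the formula states that the posterior mean equals $y + σ^2 \, \frac{d}{dy} \log m(y)$, where $m$ is the marginal density of $y$. The sketch below checks this against the textbook conjugate posterior mean under a hypothetical normal prior; the prior parameters and function names are assumptions for illustration and are not taken from the paper.

```python
def tweedie_posterior_mean(y, sigma2, marginal_score):
    """Tweedie's formula for y | theta ~ N(theta, sigma2).

    E[theta | y] = y + sigma2 * d/dy log m(y), with m the marginal density of y.
    """
    return y + sigma2 * marginal_score(y)


# Hypothetical conjugate setup: theta ~ N(mu0, tau2), so y is marginally
# N(mu0, sigma2 + tau2) and its score is available in closed form.
mu0, tau2, sigma2 = 1.0, 4.0, 1.0


def gaussian_marginal_score(y):
    """d/dy log m(y) for the N(mu0, sigma2 + tau2) marginal."""
    return -(y - mu0) / (sigma2 + tau2)


def conjugate_posterior_mean(y):
    """Precision-weighted average of the observation and the prior mean."""
    return (tau2 * y + sigma2 * mu0) / (sigma2 + tau2)


y_obs = 3.0
via_tweedie = tweedie_posterior_mean(y_obs, sigma2, gaussian_marginal_score)
via_conjugacy = conjugate_posterior_mean(y_obs)
print(via_tweedie, via_conjugacy)
```

Both routes give 2.6 for these numbers. The correction term is the marginal score scaled by the observation variance, which mirrors the structure of a score-driven update: a step from the observation in the direction of a score.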

2605.06152 2026-05-27 cs.LG cs.CL math.OC stat.ML

Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

Grokking 还是 Glitching?低精度如何驱动 Slingshot 损失尖峰

Liu Hanqing, Jianjun Cao, Yuanze Li, Zijian Zhou

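AI summary: Proves that Slingshot loss spikes in deep neural network training are caused by Numerical Feature Inflation (NFI), a mechanism arising from floating-point precision limits, and explains phenomena such as rapid parameter-norm growth and vanishing gradients.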

详情
Comments
28 pages, 13 figures; ICML 2026 Workshop on High-dimensional Learning Dynamics (Spotlight)
AI中文摘要

深度神经网络在无正则化的长期训练中会出现周期性的损失尖峰,这种现象被称为“Slingshot 机制”。现有工作通常将其归因于内在的优化动力学,但其触发机制仍不清楚。本文证明这种现象是浮点算术精度限制的结果。当训练进入高置信度阶段时,正确类别的 logit 与其他 logit 之间的差异可能超过吸收误差阈值。然后在反向传播中,正确类别的梯度被精确舍入为零,而错误类别的梯度保持非零。这打破了跨类别的梯度零和约束,并在分类器层的参数更新中引入了系统性漂移。我们证明这种漂移与特征形成正反馈循环,导致全局分类器均值和全局特征均值呈指数增长。我们将这种机制称为数值特征膨胀(NFI)。该机制解释了 Slingshot 尖峰前的快速范数增长、随后梯度的重新出现以及由此产生的损失尖峰。我们进一步表明,NFI 并不等同于观察到的损失尖峰:在更实际的任务中,部分吸收可能不会产生可见的尖峰,但它仍然可以打破零和约束并驱动参数范数的快速增长。我们的结果将 Slingshot 重新解释为有限精度训练的一种数值动力学,并为训练后期异常参数增长和 logit 发散提供了可检验的解释。

英文摘要

Deep neural networks exhibit periodic loss spikes during unregularized long-term training, a phenomenon known as the "Slingshot Mechanism." Existing work usually attributes this to intrinsic optimization dynamics, but its triggering mechanism remains unclear. This paper proves that this phenomenon is a result of floating-point arithmetic precision limits. As training enters a high-confidence stage, the difference between the correct-class logit and the other logits may exceed the absorption-error threshold. Then during backpropagation, the gradient of the correct class is rounded exactly to zero, while the gradients of the incorrect classes remain nonzero. This breaks the zero-sum constraint of gradients across classes and introduces a systematic drift in the parameter update of the classifier layer. We prove that this drift forms a positive feedback loop with the feature, causing the global classifier mean and the global feature mean to grow exponentially. We call this mechanism Numerical Feature Inflation (NFI). This mechanism explains the rapid norm growth before a Slingshot spike, the subsequent reappearance of gradients, and the resulting loss spike. We further show that NFI is not equivalent to an observed loss spike: in more practical tasks, partial absorption may not produce visible spikes, but it can still break the zero-sum constraint and drive rapid growth of parameter norms. Our results reinterpret Slingshot as a numerical dynamic of finite-precision training, and provide a testable explanation for abnormal parameter growth and logit divergence in late-stage training.
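
The absorption effect described above can be reproduced with ordinary Python floats. The sketch below computes the softmax cross-entropy gradient with respect to the logits for a two-class example. It uses double precision, so the logit gap needed to trigger absorption is larger than it would be in the low-precision formats the paper is concerned with. The specific logit values and the function name are assumptions for this illustration.

```python
import math


def softmax_cross_entropy_grad(logits, target):
    """Gradient of cross-entropy w.r.t. logits: softmax(logits) - one_hot(target)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]


# Moderate confidence: the two gradient entries cancel up to rounding error.
moderate = softmax_cross_entropy_grad([10.0, 0.0], target=0)

# Extreme confidence: exp(-40) is absorbed when added to 1.0, so the
# correct-class probability is exactly 1.0 and its gradient is exactly 0.0,
# while the incorrect-class gradient stays tiny but nonzero.
extreme = softmax_cross_entropy_grad([40.0, 0.0], target=0)

print(moderate, sum(moderate))
print(extreme, sum(extreme))
```

In exact arithmetic the gradient entries always sum to zero. Once the correct-class entry is rounded to exactly zero, the sum becomes strictly positive, which is the broken zero-sum constraint that the abstract identifies as the source of systematic drift.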

2508.00223 2026-05-27 math.ST stat.ME stat.TH

Structural Causal Models for Extremes: an Approach Based on Exponent Measures

极值结构因果模型:基于指数测度的方法

Shuyang Bai, Fei Fang, Tiandong Wang

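AI summary: Proposes the extremal structural causal model (eSCM), which uses exponent measures to describe causal relationships among extremes, abstracts the single-big-jump principle through activation variables, and exploits an inherent asymmetry to identify causal directions.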

详情
Comments
Updated the statement of Theorem 5
AI中文摘要

我们提出了一种新的极值结构因果模型,称为极值结构因果模型(eSCM)。与传统的结构因果模型不同,传统模型中随机性由概率分布控制,而eSCM使用指数测度,这是一种在多元极值分析中自然出现的无限质量定律。该框架的核心是激活变量,它抽象了单大跳原理,以及额外的随机化,丰富了eSCM定律的类别。在最近引入的极值条件独立概念下,该公式涵盖了有向图模型的所有可能定律。我们还识别了自然假设下eSCM中固有的不对称性,从而能够识别因果方向,这是因果推断中的一个核心挑战。最后,我们提出了一种利用这种因果不对称性的方法,并在模拟和真实数据集中证明了其有效性。

英文摘要

We introduce a new formulation of structural causal models for extremes, called the extremal structural causal model (eSCM). Unlike conventional structural causal models, where randomness is governed by a probability distribution, eSCMs use an exponent measure, an infinite-mass law that naturally arises in the analysis of multivariate extremes. Central to this framework are activation variables, which abstract the single-big-jump principle, along with additional randomization that enriches the class of eSCM laws. This formulation encompasses all possible laws of directed graphical models under the recently introduced notion of extremal conditional independence. We also identify an inherent asymmetry in eSCMs under natural assumptions, enabling the identifiability of causal directions, a central challenge in causal inference. Finally, we propose a method that utilizes this causal asymmetry and demonstrate its effectiveness in both simulated and real datasets.

2604.18751 2026-05-27 cs.LG cs.AI stat.ME stat.ML

Beyond Coefficients: Forecast-Necessity Testing for Interpretable Causal Discovery in Nonlinear Time-Series Models

超越系数:非线性时间序列模型中可解释因果发现的预测必要性检验

Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge

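AI summary: To address the misreading of causal scores in nonlinear time-series models as regression coefficients, the paper proposes a forecast-necessity testing framework based on edge ablation and forecast comparison that assesses whether a causal relationship is actually needed for prediction.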

详情
AI中文摘要

非线性机器学习模型越来越多地用于发现时间序列数据中的因果关系,但其输出的解释仍不明确。特别是,正则化神经自回归模型产生的因果分数常被视为回归系数的类比,导致误导性的统计显著性声明。在本文中,我们认为非线性时间序列模型中的因果相关性应通过预测必要性而非系数大小来评估,并提出了一种实用的评估程序。我们提出了一个基于系统边消融和预测比较的可解释评估框架,用于测试候选因果关系是否对准确预测是必要的。以神经加性向量自回归作为案例研究模型,我们将该框架应用于一个关于民主发展的真实世界案例研究,该案例将面板数据(139个国家的民主指标)建模为多元时间序列。我们表明,具有相似因果分数的关系由于冗余、时间持久性和特定制度效应,其预测必要性可能差异巨大。我们的结果展示了预测必要性检验如何支持应用AI系统中更可靠的因果推理,并为在高风险领域解释非线性时间序列模型提供实用指导。

英文摘要

Nonlinear machine-learning models are increasingly used to discover causal relationships in time-series data, yet the interpretation of their outputs remains poorly understood. In particular, causal scores produced by regularized neural autoregressive models are often treated as analogues of regression coefficients, leading to misleading claims of statistical significance. In this paper, we argue that causal relevance in nonlinear time-series models should be evaluated through forecast necessity rather than coefficient magnitude, and we present a practical evaluation procedure for doing so. We present an interpretable evaluation framework based on systematic edge ablation and forecast comparison, which tests whether a candidate causal relationship is required for accurate prediction. Using Neural Additive Vector Autoregression as a case study model, we apply this framework to a real-world case study of democratic development, modeled as a multivariate time series of panel data - democracy indicators across 139 countries. We show that relationships with similar causal scores can differ dramatically in their predictive necessity due to redundancy, temporal persistence, and regime-specific effects. Our results demonstrate how forecast-necessity testing supports more reliable causal reasoning in applied AI systems and provides practical guidance for interpreting nonlinear time-series models in high-stakes domains.
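
The idea of edge ablation followed by forecast comparison can be sketched with a much simpler model than the one used in the paper. The example below substitutes a linear autoregression fitted by least squares for Neural Additive Vector Autoregression, simulates data in which `x` drives `y` with a one-step lag, and compares held-out forecast error with and without the `x -> y` edge. The data-generating process, the coefficient values, and all function names are assumptions made for this illustration.

```python
import random


def simulate(n, seed=0):
    """Hypothetical system in which y depends on the previous value of x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0))
    return x, y


def solve_normal_equations(rows, targets):
    """Least squares without intercept via Gauss-Jordan on [X^T X | X^T y]."""
    k = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for i in range(k):  # no pivoting; adequate for this well-conditioned toy problem
        pivot = aug[i][i]
        aug[i] = [v / pivot for v in aug[i]]
        for j in range(k):
            if j != i:
                factor = aug[j][i]
                aug[j] = [vj - factor * vi for vj, vi in zip(aug[j], aug[i])]
    return [aug[i][k] for i in range(k)]


def holdout_mse(x, y, keep_x_edge, n_train):
    """Fit y[t] on lagged predictors, optionally ablating the x -> y edge."""
    rows, targets = [], []
    for t in range(1, len(y)):
        row = [y[t - 1]]
        if keep_x_edge:
            row.append(x[t - 1])
        rows.append(row)
        targets.append(y[t])
    coef = solve_normal_equations(rows[:n_train], targets[:n_train])
    errors = [
        (t - sum(c * v for c, v in zip(coef, r))) ** 2
        for r, t in zip(rows[n_train:], targets[n_train:])
    ]
    return sum(errors) / len(errors)


x, y = simulate(600)
mse_full = holdout_mse(x, y, keep_x_edge=True, n_train=400)
mse_ablated = holdout_mse(x, y, keep_x_edge=False, n_train=400)
print(mse_full, mse_ablated)
```

A large increase in held-out error after removing an edge indicates that the edge is necessary for forecasting, independently of how large its fitted coefficient or causal score happens to be. Redundant edges, by contrast, can carry sizable scores yet be removable with little loss.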

2604.11481 2026-05-27 astro-ph.CO math-ph math.MP nlin.PS physics.data-an stat.AP

Emergence of Complex Web Structures

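Francisco-Shu Kitaura

AI summary: Using a unified framework that combines semi-microscopic phase-space dynamics, transport geometry, information theory, and coarse-grained effective modeling, the paper explains the apparent tension between entropy growth and ordering when complex structures emerge from homogeneous or weakly correlated states.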

Details
Comments
38 pages, 8 figures, revised manuscript after referee report
Abstract

Complex structures often emerge from initially homogeneous or weakly correlated states. We address the apparent tension between this ordering and entropy growth through a unified framework combining semi-microscopic phase-space dynamics, transport geometry, information theory, and coarse-grained effective modeling. The key point is that entropy depends on the level of description: a coarse-grained spatial field may become more ordered as structure forms, even while the full phase-space description becomes more complex through shell crossing, multistreaming, and the activation of velocity degrees of freedom. Using a Lagrangian--Eulerian transport map, we show how density amplification is governed by the Jacobian of the deformation and how anisotropic collapse arises from the eigenvalues of a hierarchy of deformation tensors. Long-range interaction or information flow is encoded in the displacement field, so that nonlocality enters directly through transport. We connect this geometric description to a maximum-entropy Gaussian baseline and show how nonlinear transport and nonlocal coupling generate scale coupling, higher-order correlations, and non-Gaussianity. We then formulate a Landau--Ginzburg description in which the growth of seed anisotropies is interpreted as the activation of lower effective free-energy branches, providing a coarse-grained realization of self-organization. Applied to generated cosmological fields, this framework indicates that the nonlocal tidal level becomes relevant already at moderate overdensity. Although cosmological structure formation is the main realization considered here, the framework is intended more broadly as a mesoscopic language for systems in which transport, anisotropy, nonlocality, and self-organization are central.

2508.16444 2026-05-27 stat.AP

Dynamic Financial Analysis (DFA) of General Insurers under Climate Change

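Benjamin Avanzi, Yanfeng Li, Greg Taylor, Bernard Wong

AI summary: Because traditional DFA does not account for climate change, the paper proposes an extended DFA framework that integrates the physical and economic dimensions of climate risk to give a holistic assessment of long-term impacts on the general insurance industry, and demonstrates its advantages empirically on Australian data.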

Details
Abstract

Climate change is expected to significantly affect the physical, financial, and economic environments over the long term, posing risks to the financial health of general insurers. While general insurers typically use Dynamic Financial Analysis (DFA) for a comprehensive view of financial impacts, traditional DFA as presented in the literature does not consider the impact of climate change. To address this gap, we extend the stationary DFA framework to integrate climate risk, enabling a holistic assessment of the long-term impact of climate change on the general insurance industry and offering a foundational architecture for the DFA of individual insurers. Our framework captures the long-term impact of climate change on the assets and liabilities of general insurers by considering both physical and economic dimensions across different climate scenarios within an interconnected structure. Furthermore, it addresses the uncertainty of climate change impacts using stochastic simulations within climate scenario analysis that are useful for actuarial applications. Our extensions are tailored to the general insurance sector and address its unique characteristics. To demonstrate the practical application of our model, we conduct an extensive empirical study using Australian data and assess the long-term financial impact of climate change on the general insurance market under various climate scenarios. The results are benchmarked against those of a stationary DFA framework and show that the interaction between economic growth and physical risk plays a key role in shaping general insurers' risk-return profiles. They highlight the advantages of the climate-dependent DFA over the stationary DFA in generating financial projections under climate change impacts. Limitations of our framework are thoroughly discussed.

2603.01800 2026-05-27 cs.LG cs.AI stat.ML stat.OT

Phase-Type Variational Autoencoders for Heavy-Tailed Data

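Abdelhakim Ziani, András Horváth, Paolo Ballarini

AI summary: Proposes the Phase-Type Variational Autoencoder (PH-VAE), which models the decoder distribution as a latent-conditioned Phase-Type distribution (the absorption time of a continuous-time Markov chain) so that it adapts flexibly to heavy-tailed behavior, and outperforms Gaussian, Student-t, and extreme-value VAE decoders on synthetic and real-world benchmarks.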

Details
Abstract

Heavy-tailed distributions are ubiquitous in real-world data, where rare but extreme events dominate risk and variability. However, standard Variational Autoencoders (VAEs) employ simple decoder distributions, such as Gaussian distributions, that fail to capture heavy-tailed behavior, while existing heavy-tail-aware extensions remain restricted to predefined parametric families whose tail behavior is fixed a priori. We propose the Phase-Type Variational Autoencoder (PH-VAE), whose decoder distribution is a latent-conditioned Phase-Type (PH) distribution, defined as the absorption time of a continuous-time Markov chain (CTMC). This formulation composes multiple exponential time scales, yielding a flexible and analytically tractable decoder that adapts its finite-range tail behavior directly from the observed data. Experiments on synthetic and real-world benchmarks demonstrate that PH-VAE accurately approximates diverse heavy-tailed distributions, significantly outperforming Gaussian, Student-t, and extreme-value-based VAE decoders in modeling observed tail behavior and extreme quantiles. In multivariate settings, PH-VAE captures realistic cross-dimensional tail dependence through its shared latent representation. To our knowledge, this is the first work to integrate Phase-Type distributions into deep generative modeling, bridging applied probability and representation learning.
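The definition used in this abstract, a Phase-Type distribution as the absorption time of a continuous-time Markov chain, can be made concrete by direct simulation. The sketch below is not the PH-VAE decoder: it only samples from two fixed, hand-picked Phase-Type distributions, and the function name `sample_phase_type` and both parameter sets are illustrative assumptions. It shows how mixing a fast and a slow exponential phase produces occasional very large draws, even though every Phase-Type tail eventually decays exponentially.

```python
import random

def sample_phase_type(alpha, S, rng):
    """Draw one absorption time of a CTMC.

    alpha: initial probabilities over the transient states.
    S: sub-generator matrix (off-diagonals >= 0, row sums <= 0).
    """
    n = len(alpha)
    state = rng.choices(range(n), weights=alpha)[0]
    t = 0.0
    while True:
        hold_rate = -S[state][state]          # total rate of leaving this state
        t += rng.expovariate(hold_rate)       # exponential holding time
        to_absorb = max(0.0, -sum(S[state]))  # rate of jumping to absorption
        weights = [S[state][j] if j != state else 0.0 for j in range(n)]
        nxt = rng.choices(range(n + 1), weights=weights + [to_absorb])[0]
        if nxt == n:                          # index n stands for absorption
            return t
        state = nxt

# Erlang-2: two exponential phases in series, each with rate 1 (mean 2).
erlang_alpha = [1.0, 0.0]
erlang_S = [[-1.0, 1.0],
            [0.0, -1.0]]

# Hyperexponential: a fast phase (rate 1) or, with probability 0.1,
# a slow phase (rate 0.05).
hyper_alpha = [0.9, 0.1]
hyper_S = [[-1.0, 0.0],
           [0.0, -0.05]]

rng = random.Random(0)
erlang_draws = [sample_phase_type(erlang_alpha, erlang_S, rng) for _ in range(20000)]
hyper_draws = [sample_phase_type(hyper_alpha, hyper_S, rng) for _ in range(20000)]
erlang_mean = sum(erlang_draws) / len(erlang_draws)
hyper_max = max(hyper_draws)
```

Comparing `max(erlang_draws)` with `hyper_max` gives a quick feel for how separated time scales stretch the observed tail over a finite range.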

2602.15919 2026-05-27 stat.ML cs.AI cs.LG

Assessing Per-Sample Membership Inference Vulnerability without Retraining

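Valentin Dorseuil, Jamal Atif, Olivier Cappé

AI summary: Proposes a per-sample membership inference vulnerability score based on a data-dependent geometric measure; it needs only a single trained model to identify high-risk samples efficiently.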

Details
Abstract

Recent work in the privacy literature shows that sample-targeted membership inference attacks (MIAs) outperform untargeted approaches by a wide margin. Motivated by this observation, we address the following question: can the privacy vulnerability of individual training points be assessed without training shadow models? We show that per-sample exposure to MIA is governed not only by a point's loss, but also by a data-dependent geometric measure. In the linear setting, we derive a closed-form decomposition of individual black-box MIA vulnerability into a population leverage score and a residual loss term, making explicit how sample-dependent geometry translates into privacy exposure. Since the final layer of most modern architectures is linear, we extend this framework to deep networks and propose a surrogate score operating on last-layer representations that requires only a single trained model and no shadow models. Empirical evaluations across diverse datasets and architectures show that our score outperforms loss and gradient-norm baselines at identifying the highest-risk points under state-of-the-art attacks, providing a computationally efficient and theoretically grounded tool for per-sample privacy risk assessment.

2506.15199 2026-05-27 cs.LG stat.ML

Interpretability and Generalization Bounds for Learning Spatial Physics

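Alejandro Francisco Queiruga, Theo Gutman-Solo, Shuai Jiang

AI summary: Using numerical analysis techniques, the paper rigorously quantifies the accuracy, convergence rates, and generalization bounds of ML models applied to linear differential equations for parameter discovery or solution finding, and introduces an interpretability lens for scientific models based on Green's function representations.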

Details
Comments
To appear in ICML 2026. 18 pages, 13 figures
Abstract

While there are many applications of ML to scientific problems that look promising, visuals can be deceiving. Using numerical analysis techniques, we rigorously quantify the accuracy, convergence rates, and generalization bounds of certain ML models applied to linear differential equations for parameter discovery or solution finding. Beyond the quantity and discretization of data, we identify that the function space of the data is critical to the generalization of the model. A similar lack of generalization is empirically demonstrated for commonly used models, including physics-specific techniques. Counterintuitively, we find that different classes of models can exhibit opposing generalization behaviors. Based on our theoretical analysis, we also introduce a new mechanistic interpretability lens on scientific models whereby Green's function representations can be extracted from the weights of black-box models. Our results inform a new cross-validation technique for measuring generalization in physical systems, which can serve as a benchmark.

2602.03202 2026-05-27 math.ST stat.ML stat.TH

Sharp Inequalities between Total Variation and Hellinger Distances for Gaussian Mixtures

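Joonhyuk Jung, Chao Gao

AI summary: Establishes an upper bound relating the Hellinger and total variation distances between Gaussian location mixtures with compactly supported mixing distributions, and constructs sequences showing the bound is sharp. This resolves an open problem raised in Jia et al. (2023) and yields an entropic characterization of learning Gaussian mixtures in total variation as well as optimal robust estimation in Hellinger distance.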

Details
Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML), Seoul, South Korea. PMLR 306, 2026
Comments
36 pages
Abstract

We study the relation between the total variation (TV) and Hellinger distances between two Gaussian location mixtures. Our first result establishes a general upper bound: for any two mixing distributions supported on a compact set, the Hellinger distance between the two mixtures is controlled by the TV distance raised to a power $1-o(1)$, where the $o(1)$ term is of order $1/\log\log(1/\mathrm{TV})$. We also construct two sequences of mixing distributions that demonstrate the sharpness of this bound. Taken together, our results resolve an open problem raised in Jia et al. (2023) and thus lead to an entropic characterization of learning Gaussian mixtures in total variation. Our inequality also yields optimal robust estimation of Gaussian mixtures in Hellinger distance, which has a direct implication for bounding the minimax regret of empirical Bayes under Huber contamination.
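For readers who want to see the two distances side by side, the following sketch approximates them numerically for a pair of unit-variance Gaussian location mixtures. It assumes the normalization $H^2 = \tfrac12\int(\sqrt{p}-\sqrt{q})^2$, which may differ from the paper's convention by a constant, and it checks only the classical distribution-free inequalities $H^2 \le \mathrm{TV} \le H\sqrt{2-H^2}$; it does not reproduce the sharper mixture-specific bound of the abstract. The mixtures, grid, and function names are illustrative choices.

```python
import math

def normal_pdf(x, mean):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def mixture_pdf(x, atoms, weights):
    """Density of a Gaussian location mixture with a finite mixing distribution."""
    return sum(w * normal_pdf(x, a) for a, w in zip(atoms, weights))

def tv_and_hellinger(p, q, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint Riemann-sum approximations of TV and Hellinger distance on [lo, hi]."""
    h = (hi - lo) / steps
    tv = 0.0
    hell_sq = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        px, qx = p(x), q(x)
        tv += 0.5 * abs(px - qx) * h
        hell_sq += 0.5 * (math.sqrt(px) - math.sqrt(qx)) ** 2 * h
    return tv, math.sqrt(hell_sq)

# Both mixing distributions are supported on the compact set [-1, 1].
def p(x):
    return mixture_pdf(x, [-1.0, 1.0], [0.5, 0.5])

def q(x):
    return mixture_pdf(x, [0.0], [1.0])

tv, hellinger = tv_and_hellinger(p, q)
same_tv, same_hellinger = tv_and_hellinger(q, q)
```

The generic bound gives only $H \le \sqrt{\mathrm{TV}}$; the abstract's result says that for such mixtures the exponent $1/2$ can be improved to $1-o(1)$.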

2602.00827 2026-05-27 cs.LG stat.ML

Over-Alignment vs Over-Fitting: The Role of Feature Learning Strength in Generalization

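Taesun Yeom, Taehyeok Ha, Jaeho Lee

AI summary: Through experiments and theory, the paper shows that feature learning strength in deep networks has an optimal value: too large a value causes over-alignment and too small a value causes over-fitting, and both hurt generalization.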

Details
Comments
ICML 2026
Abstract

Feature learning strength (FLS), i.e., the inverse of the effective output scaling of a model, plays a critical role in shaping the optimization dynamics of neural nets. While its impact has been extensively studied under the asymptotic regimes -- both in training time and FLS -- existing theory offers limited insight into how FLS affects generalization in practical settings, such as when training is stopped upon reaching a target training risk. In this work, we investigate the impact of FLS on generalization in deep networks under such practical conditions. Through empirical studies, we first uncover the emergence of an $\textit{optimal FLS}$ -- neither too small nor too large -- that yields substantial generalization gains. This finding runs counter to the prevailing intuition that stronger feature learning universally improves generalization. To explain this phenomenon, we develop a theoretical analysis of gradient flow dynamics in two-layer ReLU nets trained with logistic loss, where FLS is controlled via initialization scale. Our main theoretical result establishes the existence of an optimal FLS arising from a trade-off between two competing effects: An excessively large FLS induces an $\textit{over-alignment}$ phenomenon that degrades generalization, while an overly small FLS leads to $\textit{over-fitting}$.

2509.21906 2026-05-27 math.ST cs.LG stat.ML stat.TH

Error Analysis of Discrete Flow with Generator Matching

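Zhengyan Wan, Yidong Ouyang, Qiang Yao, Liyan Xie, Fang Fang, Hongyuan Zha, Guang Cheng

AI summary: Building on stochastic calculus, the paper uses a Girsanov-type theorem to analyze the convergence of discrete flow models in a unified way, gives non-asymptotic error bounds covering transition rate estimation error and early stopping error, and provides the first error analysis for discrete flow models.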

Details
Abstract

Discrete flow models offer a powerful framework for learning distributions over discrete state spaces and have demonstrated superior performance compared to the discrete diffusion models. However, their convergence properties and error analysis remain largely unexplored. In this work, we develop a unified framework grounded in stochastic calculus theory to systematically investigate the theoretical properties of discrete flow models. Specifically, by leveraging a Girsanov-type theorem for the path measures of two continuous-time Markov chains (CTMCs), we present a comprehensive error analysis that accounts for both transition rate estimation error and early stopping error. In fact, the estimation error of transition rates has received little attention in existing works. Unlike discrete diffusion models, discrete flow incurs no initialization error caused by truncating the time horizon in the noising process. Building on generator matching and uniformization, we establish non-asymptotic error bounds for distribution estimation without the boundedness condition on oracle transition rates. Furthermore, we derive a faster rate of total variation convergence for the estimated distribution with the boundedness condition, yielding a nearly optimal rate in terms of sample size. Our results provide the first error analysis for discrete flow models. We also investigate model performance under different settings based on simulation results.

2601.21789 2026-05-27 cs.LG cs.AI stat.ML

ECSEL: Explainable Classification via Signomial Equation Learning

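Adia Lumadjeng, Ilker Birbil, Erman Acar

AI summary: Proposes ECSEL, which performs explainable classification by learning closed-form expressions in the form of signomial equations; on symbolic regression benchmarks it recovers more target equations with less computation while keeping classification accuracy and interpretability.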

Details
Comments
9 pages, 4 figures, accepted at ICML 2026
Abstract

We introduce ECSEL, an explainable classification method that learns formal expressions in the form of signomial equations, motivated by the observation that many symbolic regression benchmarks admit compact signomial structure. ECSEL directly constructs a structural, closed-form expression that serves as both a classifier and an explanation. On standard symbolic regression benchmarks, our method recovers a larger fraction of target equations than competing state-of-the-art approaches while requiring substantially less computation. Leveraging this efficiency, ECSEL achieves classification accuracy competitive with established machine learning models without sacrificing interpretability. Further, we show that ECSEL satisfies some desirable properties regarding global feature behavior, decision-boundary analysis, and local feature attributions. Experiments on benchmark datasets and two real-world case studies, namely e-commerce and fraud detection, demonstrate that the learned equations expose dataset biases, support counterfactual reasoning, and yield actionable insights.
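A signomial is a finite sum of terms $c_k \prod_j x_j^{a_{kj}}$ over strictly positive inputs, where coefficients may be negative and exponents may be any real numbers. The sketch below evaluates one and thresholds it to obtain a class label. This is only an illustration of the expression family: the thresholding rule, the example coefficients, and the function names are assumptions, and the abstract does not specify how ECSEL fits or post-processes its equations.

```python
import math

def signomial(x, terms):
    """Evaluate sum_k c_k * prod_j x_j ** a_kj for strictly positive inputs x.

    terms: list of (coefficient, exponents) pairs.
    """
    return sum(c * math.prod(xj ** a for xj, a in zip(x, exps))
               for c, exps in terms)

def classify(x, terms, threshold=0.0):
    """Predict class 1 when the signomial exceeds the threshold, else class 0."""
    return int(signomial(x, terms) > threshold)

# Hypothetical decision rule: 2 * x0^0.5 * x1^(-1) - 1 > 0
terms = [(2.0, [0.5, -1.0]), (-1.0, [0.0, 0.0])]

label_a = classify([4.0, 2.0], terms)   # 2 * 2 * 0.5 - 1 = 1.0, so class 1
label_b = classify([1.0, 4.0], terms)   # 2 * 1 * 0.25 - 1 = -0.5, so class 0
```

Because the expression is closed-form, the decision boundary `2 * x0**0.5 / x1 = 1` can be read off directly, which is the kind of interpretability the abstract refers to.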

2512.08371 2026-05-27 cs.LG stat.ML

A Multivariate Bernoulli-Based Sampling Method for Multi-Label Data with Application to Meta-Research

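Simon Chung, Colby J. Vorland, Donna L. Maney, Andrew W. Brown

AI summary: For multi-label data whose labels differ widely in frequency and depend on one another, the paper proposes a weighted sampling algorithm based on the multivariate Bernoulli distribution that estimates a weight for each label combination to reach target distribution characteristics, and shows on Web of Science research articles that it improves the representation of minority categories.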

Details
Abstract

Datasets may contain observations with multiple labels. If the labels are not mutually exclusive, and if the labels vary greatly in frequency, obtaining a sample that includes sufficient observations with scarcer labels to make inferences about those labels, and which deviates from the population frequencies in a known manner, creates challenges. In this paper, we consider a multivariate Bernoulli distribution as our underlying distribution of a multi-label problem. We present a novel sampling algorithm that takes label dependencies into account. It uses observed label frequencies to estimate multivariate Bernoulli distribution parameters and calculates weights for each label combination. This approach ensures the weighted sampling acquires target distribution characteristics while accounting for label dependencies. We applied this approach to a variety of datasets, including a sample of research articles from Web of Science labeled with 64 biomedical topic categories. We aimed to preserve category frequency order, reduce frequency differences between most and least common categories, and account for category dependencies. This approach produced a more balanced sub-sample, enhancing the representation of minority categories.

2512.03333 2026-05-27 quant-ph cs.NA math.NA math.ST physics.comp-ph stat.ML stat.TH

Sketch Tomography: Hybridizing Classical Shadow and Matrix Product State

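Xun Tang, Haoxuan Chen, Yuehaw Khoo, Lexing Ying

AI summary: Proposes Sketch Tomography, which builds on the classical shadow protocol and a matrix product state assumption to reconstruct quantum states efficiently through a tensor train ansatz and observable estimation. Sample complexity grows quadratically with system size, and numerical experiments show more accurate observable estimation than classical shadows and maximum likelihood estimation.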

Details
Abstract

We introduce Sketch Tomography, an efficient procedure for quantum state tomography based on the classical shadow protocol used for quantum observable estimations. The procedure applies to the case where the ground truth quantum state is a matrix product state (MPS). The density matrix of the ground truth state admits a tensor train ansatz as a result of the MPS assumption, and we estimate the tensor components of the ansatz through a series of observable estimations, thus outputting an approximation of the density matrix. The procedure is provably convergent with a sample complexity that scales quadratically in the system size. We conduct extensive numerical experiments to show that the procedure outputs an accurate approximation to the quantum state. For observable estimation tasks involving moderately large subsystems, we show that our procedure gives rise to a more accurate estimation than the classical shadow protocol. We also show that sketch tomography is more accurate in observable estimation than quantum states trained from the maximum likelihood estimation formulation.

2508.00542 2026-05-27 physics.soc-ph cs.IT math.IT physics.data-an physics.med-ph stat.ME

Assessing (im)balance in signed brain networks

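Marzio Di Vece, Emanuele Agrimi, Samuele Tatullo, Tommaso Gili, Miguel Ibáñez-Berganza, Tiziano Squartini

AI summary: Proposes a method based on information theory and hypothesis testing that projects multivariate time series onto signed graphs; applied to brain networks, it finds that they are frustrated and that the negative subgraph comes mainly from subcortical structures.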

Details
Comments
43 pages, 19 figures, 1 table
Abstract

Many complex systems - be they financial, natural, or social - are composed of units - such as stocks, neurons, or agents - whose joint activity can be represented as a multivariate time series. An issue of both practical and theoretical importance concerns the possibility of inferring the presence of a static relationship between any two units solely from their dynamic state. The present contribution aims at tackling such an issue within the frame of traditional hypothesis testing: in brief, we suggest linking any two units if they behave in a sufficiently similar way. To achieve such a goal, we project a multivariate time series onto a signed graph by i) comparing the empirical properties of the former with those expected under a suitable benchmark and ii) linking any two units with a positive (negative) edge if the corresponding series share a significantly large number of concordant (discordant) values. To define our benchmarks, we adopt an information-theoretic approach that is rooted in the constrained maximisation of Shannon entropy, a procedure inducing an ensemble of multivariate time series that preserves some of the empirical properties on average, while randomising everything else. We showcase the possible applications of our method by addressing one of the most timely issues in neuroscience, i.e., determining whether brain networks are frustrated and, if so, to what extent. As our results suggest, this is indeed the case, with the major contribution to the underlying negative subgraph coming from the subcortical structures (and, to a lesser extent, from the limbic regions). At the mesoscopic level, the minimisation of the Bayesian Information Criterion, instantiated with the Signed Stochastic Block Model, reveals that brain areas gather into modules aligning with the statistical variant of the Relaxed Balance Theory.
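One common way to make "frustration" in a signed graph concrete is the classical triangle criterion of structural balance: a triangle is balanced when the product of its three edge signs is positive and frustrated otherwise. The sketch below counts both kinds on a tiny hand-made graph. It is an assumption that this proxy is relevant here; the abstract refers to a statistical variant of the Relaxed Balance Theory, which this code does not implement, and the node labels and edge signs are invented for illustration.

```python
from itertools import combinations

def triangle_balance(signs):
    """Count balanced and frustrated triangles in a signed graph.

    signs maps an unordered node pair (a frozenset) to +1 or -1.
    A triangle is balanced when the product of its edge signs is positive.
    """
    nodes = sorted({v for pair in signs for v in pair})
    balanced = frustrated = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in signs for e in edges):
            if signs[edges[0]] * signs[edges[1]] * signs[edges[2]] > 0:
                balanced += 1
            else:
                frustrated += 1
    return balanced, frustrated

# Hypothetical four-region example.
signs = {
    frozenset(("A", "B")): +1,
    frozenset(("B", "C")): +1,
    frozenset(("A", "C")): -1,   # makes triangle A-B-C frustrated
    frozenset(("C", "D")): -1,
    frozenset(("A", "D")): +1,   # triangle A-C-D: (-1) * (-1) * (+1) > 0
}
balanced, frustrated = triangle_balance(signs)
```

Here triangle A-B-C is frustrated and triangle A-C-D is balanced; the other two node triples are skipped because edge B-D is absent.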

2511.17852 2026-05-27 cs.LG stat.ML

Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently

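Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu

AI summary: Through a unified analysis of the dynamics of fine-tuning a transformer with RL (process rewards) and SFT to learn recursively decomposable $k$-sparse Boolean functions, the paper proves that both can learn functions such as $k$-PARITY, $k$-AND, and $k$-OR, but RL learns the whole CoT chain simultaneously whereas SFT learns it step by step.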

Details
Comments
50 pages, 12 figures
Abstract

Transformers can acquire Chain-of-Thought (CoT) capabilities to solve complex reasoning tasks through fine-tuning. Reinforcement learning (RL) and supervised fine-tuning (SFT) are two primary approaches to this end. In this work, we specifically examine RL with process rewards and SFT for learning $k$-sparse Boolean functions with a one-layer transformer through intermediate reasoning steps akin to CoT. In particular, we consider $k$-sparse Boolean functions that can be recursively decomposed into fixed 2-sparse Boolean functions. We first analyze the learning dynamics of RL fine-tuning with process reward and SFT in a unified way. This allows us to identify sufficient conditions under which the transformer provably learns these sparse Boolean functions. We then verify that these conditions hold for three basic examples, including $k$-PARITY, $k$-AND, and $k$-OR, thus demonstrating their learnability via both RL and SFT. Notably, we reveal that RL and SFT exhibit distinct learning behaviors: RL learns the whole CoT chain simultaneously, whereas SFT naturally learns the CoT chain step by step. Overall, our findings provide insights on the mechanisms underlying RL and SFT and how they differ in triggering the CoT capabilities of transformers, and suggest that the comparison between RL and SFT may need to consider the reward design and the use of teacher forcing.
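The phrase "recursively decomposed into fixed 2-sparse Boolean functions" can be pictured as folding one two-input gate over the $k$ relevant coordinates, with each partial result playing the role of an intermediate reasoning step. The sketch below does exactly that for PARITY, AND, and OR. The left-to-right folding order, the chosen support, and the names `cot_chain` and `GATES` are assumptions made for illustration; the paper's precise decomposition may differ.

```python
from itertools import product

def cot_chain(bits, support, gate):
    """Fold a fixed 2-input gate over the k relevant coordinates.

    Returns every intermediate value, mimicking a chain of reasoning steps;
    the last entry is the value of the k-sparse function.
    """
    steps = [bits[support[0]]]
    for idx in support[1:]:
        steps.append(gate(steps[-1], bits[idx]))
    return steps

GATES = {
    "PARITY": lambda a, b: a ^ b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
}

support = [0, 2, 5]            # hypothetical k = 3 relevant inputs out of 6
x = [1, 0, 1, 1, 0, 1]
parity_steps = cot_chain(x, support, GATES["PARITY"])   # [1, 0, 1]

# Brute-force check against the direct definitions on all 64 inputs.
all_inputs = list(product([0, 1], repeat=6))
parity_ok = all(cot_chain(b, support, GATES["PARITY"])[-1]
                == sum(b[i] for i in support) % 2 for b in all_inputs)
and_ok = all(cot_chain(b, support, GATES["AND"])[-1]
             == int(all(b[i] for i in support)) for b in all_inputs)
or_ok = all(cot_chain(b, support, GATES["OR"])[-1]
            == int(any(b[i] for i in support)) for b in all_inputs)
```

Each chain has $k$ entries, so supervising or rewarding every entry corresponds to step-level feedback, while looking only at the last entry corresponds to outcome-level feedback.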

2510.25956 2026-05-27 math.OC math.AP stat.ML

Gradient Flow Sampler-based Distributionally Robust Optimization

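Zusen Xu, Jia-Jie Zhu

AI summary: Proposes a PDE gradient flow framework for distributionally robust optimization that draws on MCMC sampling and gradient flow theory to sample from worst-case distributions, and unifies existing DRO methods.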

Details
Abstract

We propose a mathematically principled PDE gradient flow framework for distributionally robust optimization (DRO). Exploiting recent advances at the intersection of Markov Chain Monte Carlo sampling and gradient flow theory, we show that our theoretical framework can be implemented as practical algorithms for sampling from worst-case distributions and, consequently, DRO. While numerous previous works have proposed various reformulation techniques and iterative algorithms, we contribute a sound gradient flow view of the distributional optimization that can be used to construct new algorithms. As an example of applications, we solve a class of Wasserstein and Sinkhorn DRO problems using the recently discovered Wasserstein Fisher-Rao and Stein variational gradient flows. Notably, we also show that some simple reductions of our framework exactly recover previously proposed popular DRO methods and provide new insights into their theoretical limit and optimization dynamics. Numerical studies based on stochastic gradient descent provide empirical backing for our theoretical findings.

2510.17759 2026-05-27 cs.CR cs.CL cs.CV cs.LG stat.ML

VERA-V: Variational Inference Framework for Jailbreaking Vision-Language Models

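Qilin Liao, Anamika Lochab, Ruqi Zhang

AI summary: Proposes VERA-V, a variational inference framework that learns a joint posterior distribution to generate stealthy paired text-image adversarial inputs and systematically uncover multimodal vulnerabilities in vision-language models, improving attack success rate by up to 53.75% across benchmarks.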

Details
Comments
18 pages, 7 figures
Abstract

Vision-Language Models (VLMs) extend large language models with visual reasoning, but their multimodal design also introduces new, underexplored vulnerabilities. Existing multimodal red-teaming methods largely rely on brittle templates, focus on single-attack settings, and expose only a narrow subset of vulnerabilities. To address these limitations, we introduce VERA-V, a variational inference framework that recasts multimodal jailbreak discovery as learning a joint posterior distribution over paired text-image prompts. This probabilistic view enables the generation of stealthy, coupled adversarial inputs that bypass model guardrails. We train a lightweight attacker to approximate the posterior, allowing efficient sampling of diverse jailbreaks and providing distributional insights into vulnerabilities. VERA-V further integrates three complementary strategies: (i) typography-based text prompts that embed harmful cues, (ii) diffusion-based image synthesis that introduces adversarial signals, and (iii) structured distractors to fragment VLM attention. Experiments on HarmBench and HADES benchmarks show that VERA-V consistently outperforms state-of-the-art baselines on both open-source and frontier VLMs, achieving up to 53.75% higher attack success rate (ASR) over the best baseline on GPT-4o. We include the code on the project page available here: https://github.com/kxwhiowo/VERA-V

2501.06547 2026-05-27 math.ST math.PR stat.TH

Pathwise guessing in categorical time series with unbounded alphabets

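J.-R. Chazottes, S. Gallo, D. Takahashi

AI summary: Proposes a non-parametric guessing function whose learning rate does not depend on the alphabet size. For a broad class of time series models (finite-order Markov chains, some hidden Markov chains, Poisson regression for count processes, and one-dimensional Gibbs measures), it gives a margin condition that controls the convergence rate of the risk, together with a matching minimax lower bound that shows the estimator is near-optimal.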

Details
Comments
25 pages. This is the final version. To appear in IEEE Trans. Inform. Th
Abstract

The following learning problem arises naturally in various applications: Given a finite sample from a categorical or count time series, can we learn a function of the sample that (nearly) maximizes the probability of correctly guessing the values of a given portion of the data using the values from the remaining parts? Unlike classical approaches in statistical inference, our approach avoids explicitly estimating the conditional probabilities. We propose a non-parametric guessing function with a learning rate independent of the alphabet size. Our analysis focuses on a broad class of time series models that encompasses finite-order Markov chains, some hidden Markov chains, Poisson regression for count processes, and one-dimensional Gibbs measures. We provide a margin condition that controls the rate of convergence for the risk. Additionally, we establish a minimax lower bound for the convergence rate of the risk associated with our guessing problem. This lower bound matches the upper bound achieved by our estimator up to a logarithmic factor, demonstrating its near-optimality.

2505.07894 2026-05-27 cs.NI cs.ET cs.LG eess.SP math.ST stat.TH

EnvCDiff: Joint Refinement of Environmental Information and Channel Fingerprints via Conditional Generative Diffusion Model

EnvCDiff:通过条件生成扩散模型联合优化环境信息与信道指纹

Zhenzhou Jin, Li You, Xiang-Gen Xia, Xiqi Gao

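AI summary To address coarse-grained environmental information and channel fingerprints, proposes a conditional generative diffusion model (CDiff) that refines both simultaneously, reconstructing a fine-grained EnvCF from its coarse-grained counterpart; experiments show a significant performance improvement.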

Details
Journal ref
IEEE Transactions on Vehicular Technology, vol. 75, no. 4, pp. 6846-6851, Apr. 2026
Comments
6 pages, 2 figures
AI Chinese abstract

从环境无感知通信向智能环境感知通信的范式转变有望促进未来无线通信中信道状态信息的获取。信道指纹(CF)作为环境感知通信的新兴使能技术,为目标通信区域内潜在位置提供信道相关知识。然而,由于用于感知环境信息和测量信道相关知识的实际设备有限,大多数获取的环境信息和CF是粗粒度的,不足以指导无线传输设计。为此,本文提出一种深度条件生成学习方法,即定制的条件生成扩散模型(CDiff)。所提出的CDiff同时细化环境信息和CF,从其粗粒度对应物重建包含环境信息的细粒度CF,称为EnvCF。实验结果表明,与基线相比,所提方法显著提高了EnvCF构建的性能。

English abstract

The paradigm shift from environment-unaware communication to intelligent environment-aware communication is expected to facilitate the acquisition of channel state information for future wireless communications. Channel Fingerprint (CF), as an emerging enabling technology for environment-aware communication, provides channel-related knowledge for potential locations within the target communication area. However, due to the limited availability of practical devices for sensing environmental information and measuring channel-related knowledge, most of the acquired environmental information and CF are coarse-grained, insufficient to guide the design of wireless transmissions. To address this, this paper proposes a deep conditional generative learning approach, namely a customized conditional generative diffusion model (CDiff). The proposed CDiff simultaneously refines environmental information and CF, reconstructing a fine-grained CF that incorporates environmental information, referred to as EnvCF, from its coarse-grained counterpart. Experimental results show that the proposed approach significantly improves the performance of EnvCF construction compared to the baselines.

2510.01168 2026-05-27 math.OC cs.LG cs.NA math.NA stat.ML

A first-order method for constrained nonconvex-nonconcave minimax optimization

约束非凸-非凹极小极大优化的一阶方法

Zhaosong Lu, Xiangyuan Wang

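AI summary For nonconvex-nonconcave minimax problems whose inner maximization involves complex constraints, uses a lifted reformulation and a local KL condition to propose an inexact proximal gradient method based on sequential convex programming, and proves its convergence.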

Details
Comments
27 pages
AI Chinese abstract

我们研究一类约束非凸-非凹极小极大优化问题,其中内层最大化涉及潜在复杂约束。在假设新型提升极小极大重构的内层问题满足局部Kurdyka-Lojasiewicz (KL)条件的情况下,我们证明原问题的极大函数具有局部广义Hölder光滑性。我们还提出了一种求解约束优化问题的序列凸规划(SCP)方法,并在局部KL条件下建立了其收敛速率。利用这些结果,我们为原始极小极大问题开发了一种不精确近端梯度法,其中极大函数的不精确梯度通过将SCP方法应用于局部KL结构的子问题来计算。最后,我们为所提方法在计算原始极小极大问题的近似稳定点方面建立了复杂度保证。

English abstract

We study a class of constrained nonconvex-nonconcave minimax optimization problems in which the inner maximization involves potentially complex constraints. Under the assumption that the inner problem of a novel lifted minimax reformulation satisfies a local Kurdyka-Lojasiewicz (KL) condition, we show that the maximal function of the original problem enjoys a local generalized Hölder smoothness property. We also propose a sequential convex programming (SCP) method for solving constrained optimization problems and establish its convergence rate under a local KL condition. Leveraging these results, we develop an inexact proximal gradient method for the original minimax problem, where the inexact gradient of the maximal function is computed via the SCP method applied to a locally KL-structured subproblem. Finally, we establish complexity guarantees for the proposed method in computing an approximate stationary point of the original minimax problem.

2509.24144 2026-05-27 q-fin.PM q-fin.CP q-fin.ST stat.ML

From Headlines to Holdings: Deep Learning for Smarter Portfolio Decisions

从头条到持仓:用于更明智投资组合决策的深度学习

Yun Lin, Jiawei Lou, Jinghe Zhang

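AI summary Proposes an end-to-end framework that combines LSTM, graph attention networks, and news sentiment analysis to learn portfolio weights directly, avoiding the instability of the traditional two-step approach, and achieves higher cumulative returns and Sharpe ratios on nine U.S. stocks.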

Details
Comments
22 pages, 9 figures
AI Chinese abstract

深度学习为投资组合优化提供了新工具。我们提出了一个端到端框架,通过结合长短期记忆网络(LSTM)建模时间模式、图注意力网络(GAT)捕捉股票间动态关系以及金融新闻情感分析反映市场心理,直接学习投资组合权重。与先前方法不同,我们的模型将这些元素统一在单个流水线中,生成每日资产配置。它避免了传统的两步过程——先预测资产收益,然后应用均值-方差优化(MVO),这一序列可能引入不稳定性。我们在覆盖六个行业的九只美国股票上评估该框架,选择旨在平衡行业多样性和新闻覆盖。在此设置中,该模型相比等权重和基于CAPM的MVO基准,实现了更高的累计收益和夏普比率。尽管股票池有限,但结果凸显了整合价格、关系和情感信号对投资组合管理的价值,并为将该方法扩展到更大、更多样化的资产集指明了有前景的方向。

English abstract

Deep learning offers new tools for portfolio optimization. We present an end-to-end framework that directly learns portfolio weights by combining Long Short-Term Memory (LSTM) networks to model temporal patterns, Graph Attention Networks (GAT) to capture evolving inter-stock relationships, and sentiment analysis of financial news to reflect market psychology. Unlike prior approaches, our model unifies these elements in a single pipeline that produces daily allocations. It avoids the traditional two-step process of forecasting asset returns and then applying mean-variance optimization (MVO), a sequence that can introduce instability. We evaluate the framework on nine U.S. stocks spanning six sectors, chosen to balance sector diversity and news coverage. In this setting, the model delivers higher cumulative returns and Sharpe ratios than equal-weighted and CAPM-based MVO benchmarks. Although the stock universe is limited, the results underscore the value of integrating price, relational, and sentiment signals for portfolio management and suggest promising directions for scaling the approach to larger, more diverse asset sets.

2503.21510 2026-05-27 cs.LG cs.CV stat.ML

An uncertainty-aware Bayesian framework for machine learning classification models: A case study in land cover classification

一种不确定性感知的贝叶斯机器学习分类模型框架:以土地覆盖分类为例

Samuel Bilson, Miles McCrory, Anna Pustogvar

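AI summary Proposes a Bayesian framework for generative classification models that accounts for input measurement uncertainty, validated with a Bayesian quadratic discriminant analysis model on land cover datasets; the model outperforms random forests and neural networks in interpretability, uncertainty modeling, and computational efficiency.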

Details
Comments
38 pages, 16 figures
AI Chinese abstract

确保机器学习分类模型的预测伴随不确定性估计是可信任人工智能的主要支柱之一。当前不确定性量化研究主要关注ML模型的认知不确定性,但很少考虑输入测量不确定性,而这对于计量学的可追溯性至关重要。在这项工作中,我们提出了一种考虑输入测量不确定性的生成式ML分类模型的贝叶斯框架。我们以贝叶斯二次判别分析(BQDA)模型为例,并将其应用于来自Copernicus Sentinel-2的2020年和2021年计量土地覆盖数据集。我们将该模型的性能与土地覆盖图中更流行的分类模型(如随机森林和神经网络)进行基准测试。为了验证和评估此类模型的泛化能力,我们还在合成分类数据上进行了模拟,改变了输入测量噪声的分布类型和强度。我们发现,对于真实和合成数据,所提出的BQDA模型更可信,因为它更具可解释性,显式建模了输入测量不确定性,并在不同领域和大小的数据集上保持了类别概率输出的预测性能,同时计算效率更高。

English abstract

Ensuring that predictions of machine learning (ML) classification models are accompanied by uncertainty estimates is one of the main pillars of trustworthy AI. Current research in uncertainty quantification focuses mainly on epistemic uncertainty of the ML model, but rarely takes account of input measurement uncertainty, which is vital for traceability in metrology. In this work we propose a Bayesian framework for generative ML classification models that takes account of input measurement uncertainty. We take the specific case of a Bayesian quadratic discriminant analysis (BQDA) model, and apply it to metrological land cover datasets from Copernicus Sentinel-2 from 2020 and 2021. We benchmark the performance of the model against more popular classification models used in land cover maps such as random forests and neural networks. To validate and assess the generalisability of such a model, we also run simulations over synthetic classification data, varying distribution type and strength of the input measurement noise. We find for both real and synthetic data, the BQDA model presented is more trustworthy, in the sense that it is more interpretable, explicitly models the input measurement uncertainty, and maintains predictive performance of class probability outputs across datasets over different domains and sizes, whilst also being more computationally efficient.

2503.11673 2026-05-27 math.ST math.PR stat.AP stat.TH

Crossing the Kolmogorov-Smirnov Boundary: Exact Tails, Sharp Bounds, and Broken Pivots

跨越 Kolmogorov-Smirnov 边界:精确尾部、尖锐界与破坏的枢轴

Elvis Han Cui, Yihao Li, Zhuang Liu

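AI summary Rewrites the exact distribution of the Kolmogorov-Smirnov statistic through a finite-sample crossing ledger, giving exact one-sample and two-sample tail computations and an interpretation of the exponential bounds, and shows that under a composite null hypothesis parameter fitting alters the path, so the distribution-free property is lost.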

Details
AI Chinese abstract

Kolmogorov-Smirnov 统计量通常被引入为 supremum,但其有限样本行为由更局部的问题支配:经验过程首次跨越边界的位置?本文通过一个有限样本穿越账本给出部分答案。该账本将 Smirnov-Birnbaum-Tingey 单样本公式重写为显式击中时间律,并产生稳定的对数尺度尾部评估器。对于双样本,它给出了任意样本大小的单壁和双壁精确格点递归,平衡反射公式作为特例出现。同一观点将 Dvoretzky-Kiefer-Wolfowitz-Massart 不等式解释为精确穿越和的指数压缩,并指出精确无分布计数停止之处:在复合零假设下,拟合参数改变了路径本身。模拟和两个小数据诊断说明了由此产生的校准警告。

English abstract

The Kolmogorov-Smirnov statistic is usually introduced as a supremum, but its finite-sample behavior is governed by a more local question: where does the empirical process first cross a boundary? This letter gives a partial answer through a finite-sample crossing ledger. The ledger rewrites the Smirnov-Birnbaum-Tingey one-sample formula as an explicit hitting-time law and yields a stable log-scale tail evaluator. For two samples, it gives one-wall and two-wall exact lattice recursions for arbitrary sample sizes, with the balanced reflection formula appearing as a special closed form. The same viewpoint explains the Dvoretzky-Kiefer-Wolfowitz-Massart inequality as an exponential compression of exact crossing sums and shows where exact distribution-free counting stops: under a composite null, fitted parameters change the path itself. Simulations and two small data diagnostics illustrate the resulting calibration warning.
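
As an illustration of the one-sample tail computation mentioned in the abstract, the following is a minimal sketch of the classical Birnbaum-Tingey sum for the one-sided statistic, P(D_n^+ >= x) = x * sum_{j=0}^{floor(n(1-x))} C(n,j) (x + j/n)^(j-1) (1 - x - j/n)^(n-j), evaluated in log scale and compared with the one-sided Dvoretzky-Kiefer-Wolfowitz-Massart bound exp(-2 n x^2). The function names are hypothetical, and this is the textbook formula under a continuous simple null, not the paper's crossing ledger or its evaluator.

```python
import math


def one_sided_ks_tail(n, x):
    """P(D_n^+ >= x) under a continuous simple null (Birnbaum-Tingey sum).

    Each term is accumulated in log scale and combined with a
    log-sum-exp step, which avoids overflow in the binomial
    coefficients for larger n.
    """
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    log_terms = []
    for j in range(int(math.floor(n * (1.0 - x))) + 1):
        a = x + j / n
        b = 1.0 - x - j / n
        if b <= 0.0:
            continue  # b**(n - j) is zero here because n - j > 0
        log_binom = (math.lgamma(n + 1) - math.lgamma(j + 1)
                     - math.lgamma(n - j + 1))
        log_terms.append(log_binom + (j - 1) * math.log(a)
                         + (n - j) * math.log(b))
    m = max(log_terms)  # the j = 0 term is always present for 0 < x < 1
    log_sum = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(math.log(x) + log_sum)


def dkw_massart_bound(n, x):
    """One-sided exponential bound exp(-2 n x^2)."""
    return math.exp(-2.0 * n * x * x)


if __name__ == "__main__":
    for n, x in [(10, 0.3), (50, 0.2), (200, 0.1)]:
        print(n, x, one_sided_ks_tail(n, x), dkw_massart_bound(n, x))
```

For n = 1 the sum collapses to 1 - x, which gives a quick sanity check. The one-sided exponential bound is only guaranteed when exp(-2 n x^2) <= 1/2, so comparisons outside that range should not be read as a test of the inequality.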

2306.13985 2026-05-27 stat.ML cs.AI cs.LG stat.ME

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

使用数据自适应能量距离的高维数据鲁棒分类

Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta

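AI summary For high-dimensional, low-sample-size data, proposes robust classifiers that are free of tuning parameters and moment conditions, achieve perfect classification in the asymptotic regime, and show advantages in simulations and real data.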

Details
Journal ref
In: ECML PKDD 2023: Research Track. Lecture Notes in Computer Science, vol 14173. Springer, Cham (2023)
Comments
Published at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2023
AI Chinese abstract

高维低样本量数据的分类在基因表达研究、癌症研究和医学成像等多种实际场景中构成挑战。本文开发并分析了一些专门为HDLSS数据设计的分类器。这些分类器无需调参且具有鲁棒性,即它们不依赖于底层数据分布的任何矩条件。研究表明,在相当一般的条件下,它们在HDLSS渐近框架下能实现完美分类。还研究了所提分类器的比较性能。我们的理论结果得到了广泛的模拟研究和真实数据分析的支持,这些分析表明所提出的分类技术相对于几种广泛认可的方法具有显著优势。

English abstract

Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and analysis of some classifiers that are specifically designed for HDLSS data. These classifiers are free of tuning parameters and are robust, in the sense that they are devoid of any moment conditions of the underlying data distributions. It is shown that they yield perfect classification in the HDLSS asymptotic regime, under some fairly general conditions. The comparative performance of the proposed classifiers is also investigated. Our theoretical results are supported by extensive simulation studies and real data analysis, which demonstrate promising advantages of the proposed classification techniques over several widely recognized methods.

2502.06567 2026-05-27 stat.ML cs.LG

Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study

量化模型中的成员推断风险:理论与实证研究

Eric Aubinais, Philippe Formont, Pablo Piantanida, Elisabeth Gassiat

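AI summary Studies, through theoretical analysis and empirical methods, how post-training quantization affects the membership inference privacy risk of machine learning models, and proposes a new membership inference security indicator.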

Details
Journal ref
AISTATS 2026
AI Chinese abstract

量化机器学习模型已被证明在降低内存和推理成本的同时,能够保持与原始模型相当的性能水平。在这项工作中,我们研究了量化过程对数据驱动模型隐私的影响,重点关注它们对成员推断攻击的脆弱性。成员推断安全(MIS)最近被提出,用于表征机器学习模型针对最强大(且可能未知)攻击的隐私性。然而,量化MIS在计算上似乎非常困难。在本文中,我们针对最小化经验损失的机器学习模型的后训练量化过程,提出了一种新的MIS指标。该新指标是此背景下MIS理论渐近分析的副产品。我们还提出了一种经验估计MIS指标的方法。使用合成数据集和真实世界数据(在药物发现背景下),我们证明了我们的方法在评估和排序不同量化器的MIS方面的有效性。

English abstract

Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of quantization procedures on privacy in data-driven models, focusing on their vulnerability to membership inference attacks. Membership Inference Security (MIS) has recently been proposed to characterize the privacy of machine learning models against the most powerful (and possibly unknown) attacks. However, quantifying MIS appears to be computationally very difficult. In this paper, we propose a new MIS indicator for post-training quantization procedures of machine learning models that minimize an empirical loss. This new indicator is a byproduct of a theoretical asymptotic analysis of the MIS in this context. We also present a methodology for empirically estimating our MIS indicator. Using synthetic datasets and real-world data (in the context of drug discovery), we demonstrate the effectiveness of our approach in assessing and ranking the MIS of different quantizers.

2411.00657 2026-05-27 stat.ML cs.NA math.NA

Fast Spectrum Estimation of Some Kernel Matrices

某些核矩阵的快速谱估计

Mikhail Lepilov

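AI summary Proposes a framework that estimates the magnitude of all eigenvalues of a kernel matrix without explicitly forming the full matrix, applicable to matrices generated by rapidly decaying kernel functions on uniformly distributed point sets, with theoretical guarantees and experimental validation.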

Details
AI Chinese abstract

在数据科学中,通常假设单个观测值独立地来自一个潜在的概率空间。由大量此类观测值构成的核矩阵经常出现,例如在分类任务中。在不显式构造这些矩阵的情况下了解其特征值衰减特性是可取的,例如在确定低秩近似是否可行时。在这项工作中,我们为某些核矩阵引入了一种新的特征值分位数估计框架。该框架为核矩阵的所有特征值提供了有意义的界,同时避免了构造完整矩阵的成本。所考虑的核矩阵来自一个在远离对角线处快速衰减的核函数,应用于任意维欧几里得空间中均匀分布的点集。我们证明了在核函数满足某些界限时该框架的有效性,并提供了其准确性的经验证据。在此过程中,我们还证明了有限数集的一个一般交错型定理。此外,我们指出了该框架在数据内在维度研究中的应用,以及推广本工作的几个其他方向。

English abstract

In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example during classification tasks. It is desirable to know the eigenvalue decay properties of these matrices without explicitly forming them, such as when determining if a low-rank approximation is feasible. In this work, we introduce a new eigenvalue quantile estimation framework for some kernel matrices. This framework gives meaningful bounds for all the eigenvalues of a kernel matrix while avoiding the cost of constructing the full matrix. The kernel matrices under consideration come from a kernel with quick decay away from the diagonal applied to uniformly-distributed sets of points in Euclidean space of any dimension. We prove the efficacy of this framework given certain bounds on the kernel function, and we provide empirical evidence for its accuracy. In the process, we also prove a general interlacing-type theorem for finite sets of numbers. Additionally, we indicate an application of this framework to the study of the intrinsic dimension of data, as well as several other directions in which to generalize this work.

2410.00357 2026-05-27 cs.LG stat.ML

Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study

深度ReLU和深度算子网络的神经缩放定律:一项理论研究

Hao Liu, Zecheng Zhang, Wenjing Liao, Hayden Schaeffer

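AI summary By analyzing the approximation and generalization errors of deep operator networks, establishes a theoretical framework that quantifies neural scaling laws, relates the errors to network model size and training data size, and extends the results to deep ReLU networks.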

Details
AI Chinese abstract

神经缩放定律在深度神经网络的性能中起着关键作用,并在广泛的任务中被观察到。然而,理解这些缩放定律的完整理论框架仍不完善。在本文中,我们探索了深度算子网络的神经缩放定律,这些网络涉及学习函数空间之间的映射,重点关注Chen和Chen风格的架构。这些方法包括流行的深度算子网络(DeepONet),它们使用可学习基函数和依赖于输入函数的系数的线性组合来近似输出函数。我们建立了一个理论框架,通过分析其逼近和泛化误差来量化神经缩放定律。我们阐述了深度算子网络的逼近和泛化误差与网络模型大小和训练数据大小等关键因素之间的关系。此外,我们处理了输入函数表现出低维结构的情况,从而能够推导出更紧的误差界。这些结果也适用于深度ReLU网络和其他类似结构。我们的结果为算子学习中的神经缩放定律提供了部分解释,并为其应用提供了理论基础。

English abstract

Neural scaling laws play a pivotal role in the performance of deep neural networks and have been observed in a wide range of tasks. However, a complete theoretical framework for understanding these scaling laws remains underdeveloped. In this paper, we explore the neural scaling laws for deep operator networks, which involve learning mappings between function spaces, with a focus on the Chen and Chen style architecture. These approaches, which include the popular Deep Operator Network (DeepONet), approximate the output functions using a linear combination of learnable basis functions and coefficients that depend on the input functions. We establish a theoretical framework to quantify the neural scaling laws by analyzing its approximation and generalization errors. We articulate the relationship between the approximation and generalization errors of deep operator networks and key factors such as network model size and training data size. Moreover, we address cases where input functions exhibit low-dimensional structures, allowing us to derive tighter error bounds. These results also hold for deep ReLU networks and other similar structures. Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
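
The linear-combination structure described in the abstract can be written compactly as follows. This is a sketch in assumed notation that does not appear in the abstract: $G$ is the target operator, $u$ an input function, $y$ a point in the domain of the output function, $p$ the number of basis functions, $\tau_k$ the learnable basis functions, and $b_k$ the coefficients that depend on the input function.

```latex
G(u)(y) \;\approx\; \sum_{k=1}^{p} b_k(u)\,\tau_k(y)
```

In DeepONet terminology, the coefficients $b_k$ are typically produced by a branch network and the basis functions $\tau_k$ by a trunk network.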

2408.05560 2026-05-27 cs.LG math.OC stat.ML

Incremental Gauss-Newton Descent for Machine Learning

增量高斯-牛顿下降法在机器学习中的应用

Mikalai Korbit, Mario Zanon

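AI summary For scalar-output losses evaluated one sample at a time, proposes Incremental Gauss-Newton Descent (IGND), which obtains efficient updates through a closed-form scalar normalization of the stochastic gradient without storing or solving with a curvature matrix, and proves its convergence.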

Details
AI Chinese abstract

随机梯度更新因其高效性和可扩展性被广泛使用,但其有效步长可能严重依赖于特征缩放和局部模型敏感性。高斯-牛顿方法通过曲率信息处理此类尺度效应,但在标准小批量形式中需要矩阵-向量乘积、线性求解或结构化近似。本文研究每次评估一个样本的标量输出损失的特殊情况。在此设置下,广义高斯-牛顿矩阵的秩至多为1,其唯一可能的非零曲率方向与随机梯度对齐。因此,阻尼高斯-牛顿方向简化为样本梯度的闭式标量归一化。由此产生的更新,即增量高斯-牛顿下降法(IGND),不需要曲率矩阵存储、分解或迭代线性求解。我们推导了该更新,描述了其行为,并将其与归一化梯度下降、自适应一阶方法、随机Polyak步长和小批量高斯-牛顿更新联系起来。在显式光滑性、对齐性和随机逼近假设下,我们证明了IGND更新的平稳性结果。在监督学习、尺度鲁棒性的受控测试以及线性二次控制案例研究上的实验表明,IGND提高了对敏感性缩放的鲁棒性,并且可以在保持简单增量更新的同时,与常见的随机优化器竞争或互补。

English abstract

Stochastic gradient updates are widely used for their efficiency and scalability, but their effective step sizes can depend strongly on feature scaling and local model sensitivity. Gauss-Newton methods address such scale effects through curvature information, but in their standard mini-batch form they require matrix-vector products, linear solves, or structured approximations. This paper studies the special case of scalar-output losses evaluated one sample at a time. In this setting, the generalized Gauss-Newton matrix has rank at most one, and its only possible nonzero curvature direction is aligned with the stochastic gradient. As a result, the damped Gauss-Newton direction reduces to a closed-form scalar normalization of the sample gradient. The resulting update, Incremental Gauss-Newton Descent (IGND), requires no curvature matrix storage, factorization, or iterative linear solve. We derive the update, characterize its behavior, and relate it to normalized gradient descent, adaptive first-order methods, stochastic Polyak step sizes, and mini-batch Gauss-Newton updates. Under explicit smoothness, alignment, and stochastic approximation assumptions, we prove a stationarity result for the IGND update. Experiments on supervised learning, a controlled test of scale robustness, and a linear-quadratic control case study show that IGND improves robustness to sensitivity scaling and can be competitive with, or complementary to, common stochastic optimizers while retaining a simple incremental update.
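
The closed-form normalization described in the abstract can be illustrated with a small sketch. For a scalar-output model f(w; x) with loss l(f, y), the per-sample generalized Gauss-Newton matrix is l'' g g^T with g the gradient of f with respect to w, and the Sherman-Morrison identity gives (l'' g g^T + lambda I)^{-1} l' g = l' g / (lambda + l'' ||g||^2). The code below applies this to a linear model with squared loss. The model, the function names, the step size `lr`, and the `damping` value are all assumptions made for illustration; the exact update in the paper may differ.

```python
import random


def ignd_step(w, x, y, lr=0.5, damping=1e-8):
    """One damped Gauss-Newton step on a single sample.

    Assumed setup: f(w; x) = w . x and l = 0.5 * (f - y)^2, so that
    dl/df = f - y, d2l/df2 = 1, and grad_w f = x.
    """
    pred = sum(wi * xi for wi, xi in zip(w, x))
    residual = pred - y          # dl/df
    curvature = 1.0              # d2l/df2 for the squared loss
    grad_f = x                   # gradient of f with respect to w
    sq_norm = sum(g * g for g in grad_f)
    # Scalar normalization replacing the matrix solve
    scale = residual / (damping + curvature * sq_norm)
    return [wi - lr * scale * gi for wi, gi in zip(w, grad_f)]


def fit(samples, dim, epochs=20, lr=0.5, seed=0):
    """Run incremental updates over shuffled samples."""
    rng = random.Random(seed)
    w = [0.0] * dim
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w = ignd_step(w, x, y, lr=lr)
    return w


def make_data(scale, n=100, seed=1):
    """Noiseless linear data whose features are multiplied by `scale`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        u = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 2.0 * u[0] - 3.0 * u[1]
        data.append(([scale * ui for ui in u], y))
    true_w = [2.0 / scale, -3.0 / scale]
    return data, true_w


if __name__ == "__main__":
    for scale in (1.0, 1000.0):
        data, true_w = make_data(scale)
        print(scale, fit(data, dim=2), true_w)
```

With the same `lr`, the sketch recovers the weights whether the features are scaled by 1 or by 1000, because the normalization cancels the overall gradient magnitude. A plain stochastic gradient step with a fixed step size would need retuning between the two cases. For this particular model and loss the update coincides with a relaxed Kaczmarz (normalized LMS) step.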

1909.08210 2026-05-27 cs.LG stat.ML

Reformulation of RBM to Unify Linear and Nonlinear Dimensionality Reduction

RBM的重新表述以统一线性和非线性降维

Jiangsheng You, Chun-Yen Liu

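AI summary Reformulates the restricted Boltzmann machine as a deterministic model using maximum a posteriori estimation and the expectation-maximization algorithm, proposes a contrastive divergence algorithm without MCMC, and unifies linear and nonlinear dimensionality reduction for scalar and vector variables.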

Details
Comments
16 pages with 7 figures
AI Chinese abstract

受限玻尔兹曼机(RBM)是一种具有共享权重的两层神经网络,在文献中已被广泛研究用于降维、数据表示和推荐系统。传统的RBM需要对两层上的值进行概率解释,并在训练期间使用马尔可夫链蒙特卡洛(MCMC)过程生成样本。对比散度(CD)算法能高效训练RBM,但其收敛性尚未得到数学证明。在本文中,利用最大后验(MAP)估计和期望最大化(EM)算法,我们证明了无MCMC的CD算法对于条件似然目标函数是收敛的。本文的另一个关键贡献是将RBM重新表述为确定性模型。在重新表述的RBM中,无MCMC的CD算法近似于梯度下降(GD)方法。这种重新表述的RBM可以在节点上采用连续的标量和向量变量,并灵活选择激活函数。数值实验显示了其在线性和非线性降维中的能力,并且对于非线性降维,通过选择合适的激活函数,重新表述的RBM可以优于主成分分析(PCA)。最后,我们展示了其在CIFAR-10数据集(彩色图像)和多变量序列数据上的向量值节点应用,这些应用无法用传统RBM自然配置。这项工作不仅为传统RBM提供了理论见解,而且统一了标量和向量变量的线性和非线性降维。

English abstract

A restricted Boltzmann machine (RBM) is a two-layer neural network with shared weights and has been extensively studied for dimensionality reduction, data representation and recommendation systems in the literature. The traditional RBM requires a probabilistic interpretation of the values on both layers and a Markov chain Monte Carlo (MCMC) procedure to generate samples during training. The contrastive divergence (CD) algorithm trains the RBM efficiently, but its convergence has not been proved mathematically. In this paper, using a maximum a posteriori (MAP) estimate and the expectation maximization (EM) algorithm, we show that the CD algorithm without MCMC is convergent for the conditional likelihood objective function. Another key contribution in this paper is the reformulation of the RBM into a deterministic model. Within the reformulated RBM, the CD algorithm without MCMC approximates the gradient descent (GD) method. This reformulated RBM can take continuous scalar and vector variables on the nodes, with flexibility in choosing the activation functions. Numerical experiments show its capability in both linear and nonlinear dimensionality reduction, and, for nonlinear dimensionality reduction, the reformulated RBM can outperform principal component analysis (PCA) when proper activation functions are chosen. Finally, we demonstrate its application to vector-valued nodes for the CIFAR-10 dataset (color images) and multivariate sequence data, which cannot be configured naturally with the traditional RBM. This work not only provides theoretical insights regarding the traditional RBM but also unifies linear and nonlinear dimensionality reduction for scalar and vector variables.