arXivDaily: arXiv daily academic digest, updated Monday to Friday
2606.02550 2026-06-02 stat.AP physics.ao-ph

Probabilistic storyline attribution using machine learning

Frieder Loer, Maybritt Schillinger, Sebastian Sippel

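AI summary: Proposes the distributional autoencoder (DAE) method, which generates climate counterfactuals conditioned on the atmospheric circulation state and the global warming level for probabilistic storyline attribution, and uses the 2003 European heatwave to illustrate changes in conditional intensity and probability ratios.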

Comments
main text: 19 pages and 4 figures

Abstract

A fundamental goal in climate attribution is to estimate how forced climate change contributes to observed extreme weather events. The storyline attribution method compares an observed weather event, conditional on its atmospheric dynamic state (i.e., atmospheric circulation), in the current, 'factual' climate to an event with very similar circulation conditions in a hypothetical, 'counterfactual' climate. However, physical climate models cannot directly transfer these storyline counterfactuals across different climate forcing states. Statistical and machine learning techniques may overcome this limitation; yet, emulating circulation-conditional extreme events under different climate states is challenging. Here, we demonstrate distributional autoencoders (DAEs) as a versatile method for generating climate counterfactuals. They model the full distribution of spatially resolved European temperature fields conditional on the atmospheric circulation state and the mean global warming level. These distributions allow for deriving meaningful conditional probability ratios, which is a particular advantage of the DAE-based storyline approach. We train DAEs on fully coupled climate model simulations, and we evaluate the modelled distributions across different factual and storyline-based counterfactual climate model simulations. In an illustrative case study, we revisit the 2003 European heatwave and generate counterfactuals for a hypothetical '2003-like European heatwave' using ERA5 circulation, which we hypothesize to occur a quarter century (2028) and a half century (2053) after 2003. The conditional intensity would increase from 29.3 °C in 2003 to 30.3 °C and 32.1 °C in 2028 and 2053, respectively, and the conditional probability ratios would be 2.1 and 3.2 when compared to 2003.
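A conditional probability ratio compares the probability of exceeding the same event threshold under two conditional temperature distributions that share one circulation state. The sketch below assumes samples from both distributions are already available; the Gaussian samples are hypothetical stand-ins for DAE output, and the means, spread, and the helper names `exceedance_probability` and `probability_ratio` are illustrative, not taken from the paper.

```python
import random


def exceedance_probability(samples, threshold):
    """Empirical P(T >= threshold) from a list of temperature samples."""
    return sum(1 for s in samples if s >= threshold) / len(samples)


def probability_ratio(reference, comparison, threshold):
    """Conditional PR = P_comparison(T >= u) / P_reference(T >= u)."""
    p_ref = exceedance_probability(reference, threshold)
    p_cmp = exceedance_probability(comparison, threshold)
    if p_ref == 0:
        return float("inf")  # event never reached in the reference sample
    return p_cmp / p_ref


# Hypothetical stand-ins for DAE samples of a regional-mean temperature (°C),
# all conditional on the same circulation state; NOT the paper's distributions.
rng = random.Random(0)
reference_2003 = [rng.gauss(27.0, 1.5) for _ in range(100_000)]
warmer_later = [rng.gauss(28.5, 1.5) for _ in range(100_000)]

pr = probability_ratio(reference_2003, warmer_later, threshold=29.3)
print(f"conditional probability ratio ~ {pr:.2f}")
```

As the abstract notes, having full conditional distributions is what makes such a ratio derivable; a single deterministic counterfactual field would only yield an intensity shift.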

2606.02533 2026-06-02 stat.ME

Space-Filling One-Factor-At-A-Time Designs

Wei-Yang Yu, V. Roshan Joseph

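AI summary: Addressing the lack of space-filling properties in existing screening designs for deterministic computer experiments, proposes a new class of designs that improves space-fillingness while retaining screening capability, and demonstrates its advantages through numerical examples.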

Abstract

Space-filling designs are commonly used in deterministic computer experiments. However, they are ineffective for factor screening, which makes them inefficient when only a small subset of input factors is influential to the output. Recently developed screening designs, such as MOFAT designs, are effective at identifying important factors but lack space-filling properties, limiting their usefulness for surrogate modeling. In this article, we propose a new class of screening designs that improves the space-fillingness while retaining their screening capability. Through several numerical examples, we demonstrate that the proposed designs offer clear advantages over existing designs.
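To see the tension the abstract describes, the sketch below contrasts a single one-factor-at-a-time (OFAT) path, which makes screening trivial through elementary effects, with a midpoint Latin hypercube, which spreads every factor over many levels. The path, the toy output function, and all helper names are hypothetical; neither MOFAT designs nor the proposed designs are reproduced here.

```python
import random


def ofat_path(start, step):
    """Hypothetical OFAT path: starting at `start`, change one factor per run by `step`."""
    runs = [tuple(start)]
    for k in range(len(start)):
        nxt = list(runs[-1])
        nxt[k] += step
        runs.append(tuple(nxt))
    return runs


def elementary_effects(f, path, step):
    """Screening: effect of factor k = (y_{k+1} - y_k) / step along the OFAT path."""
    ys = [f(x) for x in path]
    return [(ys[k + 1] - ys[k]) / step for k in range(len(path) - 1)]


def distinct_levels(design):
    """Distinct values per factor: a crude check of 1-D projection space-filling."""
    return [len({run[k] for run in design}) for k in range(len(design[0]))]


def latin_hypercube(n_runs, n_factors, rng):
    """Midpoint Latin hypercube: each factor takes n_runs distinct levels in [0, 1]."""
    cols = []
    for _ in range(n_factors):
        levels = [(i + 0.5) / n_runs for i in range(n_runs)]
        rng.shuffle(levels)
        cols.append(levels)
    return [tuple(col[r] for col in cols) for r in range(n_runs)]


f = lambda x: 3.0 * x[0]  # hypothetical output: only factor 0 is active
path = ofat_path((0.1, 0.1, 0.1), step=0.8)
print(elementary_effects(f, path, 0.8))  # approximately [3.0, 0.0, 0.0]
print(distinct_levels(path))             # [2, 2, 2]
print(distinct_levels(latin_hypercube(4, 3, random.Random(0))))  # [4, 4, 4]
```

The OFAT path identifies the single active factor exactly, but each factor takes only two values, which is the kind of poor projection coverage that limits such designs for surrogate modeling.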

2606.02508 2026-06-02 stat.AP

AI and physics-based weather forecasting: A comparative study

Mátyás Kocsis, Sándor Baran

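AI summary: Systematically compares ECMWF's physics-based IFS model with the AI-based AIFS model for medium-range 10-m wind-speed ensemble forecasts, finding that raw IFS forecasts clearly outperform AIFS forecasts but that post-processing substantially narrows the gap.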

Comments
24 pages, 9 figures, 1 table

Abstract

In the last few years, AI-based models have become the centre of attention in weather forecasting due to their increasing accuracy and efficiency. Pioneering among weather services, ECMWF has developed its Artificial Intelligence Forecasting System (AIFS) model, which was the first to provide data-driven ensemble forecasts in June 2024. Since July 2025, the AIFS ensemble model has been operational and runs in parallel with ECMWF's physics-based Integrated Forecasting System (IFS), which is considered the gold standard in weather prediction. The new AIFS model can generate forecasts ten times faster than the classical numerical weather prediction model, while consuming approximately a thousand times less energy. We present the results of our systematic assessment of the performance of the IFS and AIFS models by comparing the accuracy of raw and post-processed medium-range 10-m wind-speed ensemble forecasts generated operationally by the two models between July and November 2025 at more than 9000 synoptic observation stations across the globe. Post-processing uses both the parametric ensemble model output statistics (EMOS) approach and the non-parametric quantile regression (QR) approach to correct systematic inaccuracies in the raw forecasts. The predictive performance of raw IFS ensemble forecasts proves to be substantially superior to the skill of the raw AIFS predictions for all investigated forecast horizons. As expected, post-processing significantly improves the skill of both IFS and AIFS predictions, and, across most verification metrics, EMOS is superior to QR, especially for short lead times. Compared to the raw ensemble, the differences in skill between the matching IFS and AIFS predictions are substantially decreased by post-processing and are mostly significant at short lead times, when the IFS forecasts outperform their AIFS counterparts.
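The abstract does not list its verification metrics. Assuming, for illustration only, that the continuous ranked probability score (CRPS) is among them, the sketch scores a raw ensemble with the empirical CRPS and a Gaussian EMOS predictive distribution with the closed-form normal CRPS. The Gaussian choice, the coefficients `a`, `b`, `c`, `d`, and the data are hypothetical; for wind speed a nonnegative predictive law such as a truncated normal is more usual, and EMOS coefficients are normally fitted on training data rather than fixed by hand. QR post-processing is not shown.

```python
import math


def crps_ensemble(members, obs):
    """Empirical CRPS of a raw ensemble: mean|X - y| - 0.5 * mean|X - X'| (lower is better)."""
    m = len(members)
    spread_to_obs = sum(abs(x - obs) for x in members) / m
    spread_within = sum(abs(x - z) for x in members for z in members) / (m * m)
    return spread_to_obs - 0.5 * spread_within


def crps_normal(mu, sigma, obs):
    """Closed-form CRPS of a N(mu, sigma^2) predictive distribution."""
    z = (obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def emos_gaussian(members, a, b, c, d):
    """Gaussian EMOS: mean = a + b * ensemble mean, variance = c + d * ensemble variance."""
    m = len(members)
    mean = sum(members) / m
    var = sum((x - mean) ** 2 for x in members) / m
    return a + b * mean, math.sqrt(c + d * var)


# Hypothetical 10-m wind-speed ensemble (m/s), observation, and EMOS coefficients.
ens = [4.1, 5.0, 5.3, 6.2, 4.8]
obs = 6.9
mu, sigma = emos_gaussian(ens, a=0.8, b=1.0, c=0.5, d=1.2)
print(crps_ensemble(ens, obs), crps_normal(mu, sigma, obs))
```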

2606.02472 2026-06-02 math.PR cs.SI math.ST stat.TH

Correlated uniform attachment trees

Johannes Bäumler, Miklós Z. Rácz, Nathan Ross, Anirudh Sridhar

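AI summary: Studies a correlated uniform attachment tree model in which two trees grow in parallel with correlated attachments, and consistently estimates the correlation parameter α as the tree size tends to infinity, using statistics built from Jordan centrality and fringe subtree sizes.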

Comments
45 pages, 2 figures

Abstract

We introduce and study a new model of correlated uniform attachment (UA) trees, where correlation is sprinkled throughout the time evolution of the process. In this model, two UA trees are grown in parallel, and at each time step a new node is added to each tree, with an edge between it and a uniformly chosen existing vertex in the respective tree. The two choices of attachment are correlated: with probability $α$, the edges attach to nodes with the same time label in both trees, and with probability $1-α$, the choices are made independently. We study fundamental detection and estimation questions for this model, given two unlabeled trees. In our main result, we construct a consistent estimator of the correlation parameter $α$, as the size of the trees goes to infinity. The construction of our statistic relies on two key ideas. First, we use Jordan centrality to identify subsets of vertices of each tree whose intersection has a sufficient number of common early vertices. The second idea is that, across multiple time scales, it is possible to approximately determine the labels of vertices that have attached to these early vertices, using the sizes of fringe subtrees. Our analysis includes novel quantitative bounds on the fraction of early vertices that remain central, which are of independent interest in the network archaeology literature.
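The generative model described above can be simulated directly. This is a minimal sketch with hypothetical helper names; it keeps the time labels, so the agreement statistic at the end is only an oracle sanity check (its expectation is about α plus a small term from early vertices) and not the paper's estimator, which works with unlabeled trees via Jordan centrality and fringe subtree sizes.

```python
import random


def correlated_ua_trees(n, alpha, seed=None):
    """Grow two UA trees on time-labelled vertices 0..n-1.

    parent[t] is the earlier vertex that vertex t attaches to. With probability
    alpha both trees attach t to the same time label; otherwise each tree picks
    an independent uniform earlier vertex.
    """
    rng = random.Random(seed)
    parent1, parent2 = [None], [None]  # vertex 0 is the root of both trees
    for t in range(1, n):
        if rng.random() < alpha:
            u = rng.randrange(t)  # shared, correlated choice
            parent1.append(u)
            parent2.append(u)
        else:
            parent1.append(rng.randrange(t))  # independent uniform choices
            parent2.append(rng.randrange(t))
    return parent1, parent2


def oracle_agreement(parent1, parent2):
    """Fraction of vertices with the same parent label (requires labels: oracle only)."""
    n = len(parent1)
    return sum(parent1[t] == parent2[t] for t in range(1, n)) / (n - 1)


p1, p2 = correlated_ua_trees(20_000, alpha=0.3, seed=42)
print(round(oracle_agreement(p1, p2), 3))  # close to alpha = 0.3
```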

2606.02455 2026-06-02 cs.LG cond-mat.mtrl-sci physics.chem-ph physics.comp-ph stat.CO

Speculative Sampling For Faster Molecular Dynamics

Arthur Kosmala, Stephan Günnemann, Meng Gao, Brandon Wood

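AI summary: Proposes Langevin Speculative Dynamics (LSD), a distributed, model-agnostic speculative sampling method in which a draft model proposes fast simulation steps and a target model verifies them in parallel, achieving a 3-9x speedup of molecular dynamics simulations without adding relative error.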

Comments
Forty-Third International Conference on Machine Learning (ICML 2026). 32 pages, 14 figures, 8 tables

Abstract

Molecular dynamics (MD) is a key tool for simulating the dynamical behavior of atomic systems. However, MD is inherently serial, which makes it difficult to increase single-system throughput with concurrent compute. To address this, we introduce Langevin Speculative Dynamics (LSD), a distributed and model-agnostic speculative sampler for accelerating MD without adding relative error. Inspired by speculative methods in language and diffusion modeling, LSD uses a draft model to propose fast simulation steps and verifies them in parallel with a slower target model, applying a transport map from the draft to the target distribution. We extend speculative sampling to second-order Langevin dynamics, derive the achievable speedup as a function of physical parameters, show that LSD generalizes across different systems and draft-target combinations with a 3-9x speedup, and confirm theoretically and empirically that LSD samples trajectories from its target model distribution.
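The draft-then-verify idea that LSD borrows from language modeling is easiest to see in its discrete form: accept a draft sample with probability min(1, p/q), otherwise resample from the normalized residual (p - q)^+. The sketch below shows that standard discrete rule with hypothetical distributions; it is not LSD itself, which applies a transport map to continuous second-order Langevin steps.

```python
import random


def speculative_step(p, q, rng):
    """Draft-then-verify for discrete distributions; the returned index is distributed as p."""
    x = rng.choices(range(len(q)), weights=q)[0]  # cheap proposal from draft q
    if rng.random() < min(1.0, p[x] / q[x]):      # target accepts w.p. min(1, p/q)
        return x
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0]  # resample from (p - q)^+


def output_distribution(p, q):
    """Exact law of speculative_step: min(p, q) + (1 - beta) * (p - q)^+ / (1 - beta) = p."""
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]
    beta = sum(accepted)  # overall acceptance probability
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    z = sum(residual)
    return [a + (1 - beta) * (r / z if z > 0 else 0.0) for a, r in zip(accepted, residual)]


target = [0.5, 0.3, 0.2]  # hypothetical slow "target model" distribution
draft = [0.2, 0.3, 0.5]   # hypothetical fast "draft model" distribution
print(output_distribution(target, draft))  # equals target up to rounding
```

The acceptance probability `beta` governs how often the cheap draft is kept, which is why a closer draft yields a larger speedup.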

2606.02410 2026-06-02 stat.ME stat.AP

Optimal sequential two-stage Bayes Factor Design for two-arm clinical Phase II Trials with binary Endpoints

Riko Kelter

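AI summary: Proposes a simulation-free method for optimizing Bayes-factor-based two-stage designs for two-arm phase II trials, computing operating characteristics from exact expressions and minimizing the expected sample size under the null hypothesis.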

Comments
50 pages, 9 figures

Abstract

Two-arm phase II clinical trials often benefit from an interim analysis that allows early stopping for futility, but Bayesian calibration of such designs is usually based on computationally intensive Monte Carlo simulation. In this work, a simulation-free methodology is developed to obtain Bayesian optimal two-stage designs in two-arm phase II trials with binary endpoints using Bayes factors as the primary measure of evidence. Building on recent matrix-search methods for fixed-sample two-arm Bayes factor designs and earlier correction formulas for one-arm two-stage designs, the proposed approach derives exact expressions for the operating characteristics of a two-stage two-arm design with a single futility interim. Bayesian power and type-I error are obtained by correcting the corresponding fixed-sample quantities for trajectories that would have been removed by early stopping, yielding a fully numerical calibration procedure that avoids Monte Carlo error entirely. The resulting method searches over admissible interim and final sample sizes to identify the optimal design that satisfies target constraints on Bayesian power, type-I error, and the probability of compelling evidence in favour of the null hypothesis, while minimizing the expected sample size under the null hypothesis. The methodology is illustrated in realistic phase II settings, including a detailed re-analysis of the riociguat trial in systemic sclerosis. Overall, the approach extends simulation-free Bayes factor design methodology to the practically important setting of two-arm two-stage phase II trials and provides a transparent basis for Bayesian design calibration and sensitivity analysis.
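The simulation-free idea can be illustrated for a single-stage, fixed-sample design: because binary outcomes take finitely many values, probabilities such as Bayesian power are exact sums over all outcomes, with no Monte Carlo error. This sketch assumes Beta(1, 1) priors, a two-sided Bayes factor comparing a common rate with independent rates, and a hypothetical evidence threshold k = 3; the paper's two-stage correction for futility stopping is not reproduced.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bf01(x1, n1, x2, n2, a=1.0, b=1.0):
    """Bayes factor for H0: p1 = p2 ~ Beta(a, b) against H1: p1, p2 iid Beta(a, b).

    Binomial coefficients appear in both marginal likelihoods and cancel.
    """
    log_m0 = log_beta(a + x1 + x2, b + n1 + n2 - x1 - x2) - log_beta(a, b)
    log_m1 = (log_beta(a + x1, b + n1 - x1) + log_beta(a + x2, b + n2 - x2)
              - 2 * log_beta(a, b))
    return exp(log_m0 - log_m1)


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_compelling_h1(n1, n2, p1, p2, k=3.0):
    """Exact P(BF10 > k) under true response rates (p1, p2): sum over every outcome."""
    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            if 1.0 / bf01(x1, n1, x2, n2) > k:
                total += binom_pmf(x1, n1, p1) * binom_pmf(x2, n2, p2)
    return total


# Hypothetical single-stage trial with 20 patients per arm.
type_i = prob_compelling_h1(20, 20, 0.3, 0.3)
power = prob_compelling_h1(20, 20, 0.2, 0.6)
print(f"Bayesian type-I error {type_i:.3f}, power {power:.3f}")
```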

2606.02382 2026-06-02 cs.CC cs.HC stat.CO

Attention Dynamics and Adaptive Decision Support in C5ISR: A Recurrence Quantification Analysis of Visual and Multimodal Attention Guidance Effects on Mission Performance

Hyun-Gee Jei, Caleb J. Armstrong, Farzan Sasangohar

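AI summary: Uses recurrence quantification analysis of eye-tracking data to study visual-only and multimodal adaptive decision support tools in a C5ISR environment, finding that the multimodal tool significantly improves performance and that recurrence metrics show both linear and nonlinear relationships with performance.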

Comments
11 Figures, 3 Tables

Abstract

Modern command, control, communications, computers, cyber, intelligence, surveillance, and reconnaissance (C5ISR) environments place substantial attentional demands on mission commanders. Failures in attention allocation in these high-risk settings can have severe operational consequences. This study investigates the efficacy of gaze-driven, attention-guided adaptive decision support tools, including visual-only and multimodal designs, in a high-fidelity simulated military command center. To characterize gaze and attentional dynamics during interaction with these tools, recurrence quantification analysis was applied to eye-tracking data. Stepwise regression using the Bayesian information criterion was then used to identify recurrence-based gaze metrics associated with performance. Results showed that the multimodal adaptive decision support tool was associated with significantly higher performance than the visual-only attention-guided tool. Average diagonal line length showed a negative linear association with performance, whereas entropy showed a positive linear association. Recurrence rate, determinism, and entropy also showed nonlinear quadratic relationships with performance. In particular, recurrence rate and determinism followed an inverted-U pattern consistent with the Yerkes-Dodson law. These findings suggest that effective performance in dynamic C5ISR contexts depends on a balance between structured and flexible visual scanning, and that recurrence-based gaze metrics can help characterize attentional dynamics during interaction with adaptive decision support systems.
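The recurrence metrics named above can be computed from a gaze sequence once a recurrence rule is fixed. This sketch assumes fixations are coded by area of interest (AOI), counts two fixations as recurrent when they hit the same AOI, uses the upper triangle without the main diagonal, and a minimum line length of 2; these are common choices but assumptions here, since the study's exact recurrence definition is not stated in the abstract.

```python
import math
from collections import Counter


def recurrence_matrix(seq):
    """R[i][j] = 1 if fixations i != j land on the same area of interest (AOI)."""
    n = len(seq)
    return [[int(i != j and seq[i] == seq[j]) for j in range(n)] for i in range(n)]


def diagonal_lines(R, l_min=2):
    """Lengths of diagonal runs of recurrent points above the main diagonal."""
    n = len(R)
    lengths = []
    for offset in range(1, n):
        run = 0
        for i in range(n - offset):
            if R[i][i + offset]:
                run += 1
            else:
                if run >= l_min:
                    lengths.append(run)
                run = 0
        if run >= l_min:
            lengths.append(run)
    return lengths


def rqa(seq, l_min=2):
    """Recurrence rate (RR), determinism (DET), mean diagonal length (L), entropy (ENTR)."""
    n = len(seq)
    R = recurrence_matrix(seq)
    recurrent = sum(R[i][j] for i in range(n) for j in range(i + 1, n))
    lines = diagonal_lines(R, l_min)
    counts = Counter(lines)
    ent = -sum(c / len(lines) * math.log(c / len(lines)) for c in counts.values())
    return {
        "RR": recurrent / (n * (n - 1) / 2),
        "DET": sum(lines) / recurrent if recurrent else 0.0,
        "L": sum(lines) / len(lines) if lines else 0.0,
        "ENTR": ent,
    }


print(rqa("ABCABCABC"))  # strictly periodic scanning over AOIs A, B, C: DET = 1
```

A rigidly repeating scan path drives determinism to its maximum, which is one way to read the inverted-U finding: very high values indicate overly structured scanning.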

2606.02363 2026-06-02 cs.LG stat.ML

Minimax-Optimal Policy Regret in Partially Observable Markov Games

Raman Arora

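AI summary: For partially observable Markov games, proposes an epoch-based optimistic maximum-likelihood algorithm that achieves $\tilde{O}(\sqrt{T})$ policy regret with dependence on the aggregate Eluder dimension, and proves a matching lower bound.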

Abstract

We study sequential decision-making in partially observable environments against strategic, adaptive opponents, modeled as partially observable Markov games (POMGs). The central challenge is to learn latent dynamics from partial observations while facing an adversary whose behavior depends on the learner's strategy, making standard regret notions inadequate. We prove that an epoch-based optimistic maximum-likelihood algorithm achieves $\tilde{O}(\sqrt{T})$ policy regret for fixed problem parameters, with explicit dependence on the horizon, adversary memory, confidence radius, and the aggregate Eluder dimension of the observable-operator class. The algorithm selects one policy per geometrically growing epoch using confidence sets built cumulatively from past data, which keeps the cost of comparing adversary responses across policies logarithmic in $T$. We also prove a lower bound matching the $\sqrt{T}$ and aggregate-Eluder-dimension dependence, up to problem-dependent and logarithmic factors. Finally, we extend the framework to horizon-adaptive guarantees and adversaries with geometric fading memory.
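The logarithmic cost of switching policies comes from the epoch schedule: if epoch lengths double, only about log2(T) policy changes occur in T rounds. The sketch below shows only that schedule, with a hypothetical initial length of 1 and growth ratio of 2; the confidence sets and the optimistic maximum-likelihood step are not shown.

```python
import math


def geometric_epochs(T, first=1, ratio=2):
    """Split rounds 1..T into contiguous epochs whose lengths grow geometrically."""
    epochs, start, length = [], 1, first
    while start <= T:
        end = min(T, start + length - 1)
        epochs.append((start, end))
        start, length = end + 1, length * ratio
    return epochs


for T in (10, 1_000, 1_000_000):
    # number of epochs (= number of policy switches + 1) versus ceil(log2(T + 1))
    print(T, len(geometric_epochs(T)), math.ceil(math.log2(T + 1)))
```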

2606.02345 2026-06-02 stat.ML cs.LG

Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization

Louise Davy, Stephan Clémençon, Charlotte Laclau

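AI summary: Uses survey sampling techniques that sample pairs directly rather than individual observations, achieving estimation or optimization performance comparable to full pairwise evaluation while retaining only a small fraction of the information, and provides a theoretically grounded trade-off between accuracy and computational cost.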

Abstract

Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve estimation or optimization performance comparable to that obtained by using all pairs, by leveraging survey sampling techniques. A central finding, supported by both theory and experiments, is that such sampling plans must target pairs directly rather than individual observations. In particular, for pairwise losses between high-dimensional vectors such as embeddings in vision or graph learning, assigning higher inclusion probabilities to informative pairs using suitable auxiliary information yields performance close to full pairwise evaluation, providing a principled and theoretically grounded trade-off between accuracy and computational cost.
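Sampling pairs directly means each pair gets its own inclusion probability, and the Horvitz-Thompson estimator reweights each sampled pair by the inverse of that probability to stay unbiased. This is a minimal sketch assuming Poisson sampling of pairs and a squared-difference loss on scalars; the auxiliary rule `aux` is a toy, hypothetical proxy (in practice it would be cheaper than the loss), and the paper's specific sampling plans are not reproduced.

```python
import itertools
import random


def full_pairwise_mean(xs, loss):
    """Exact mean of loss over all n(n-1)/2 pairs (quadratic cost)."""
    pairs = list(itertools.combinations(range(len(xs)), 2))
    return sum(loss(xs[i], xs[j]) for i, j in pairs) / len(pairs)


def ht_pairwise_mean(xs, loss, incl_prob, rng):
    """Horvitz-Thompson estimate of the same mean under Poisson sampling of pairs.

    Pair (i, j) is kept independently with probability incl_prob(i, j) > 0 and
    weighted by 1 / incl_prob(i, j), so the estimate is unbiased. The loss is only
    evaluated on kept pairs; looping over all pairs here is just for clarity.
    """
    n_pairs = len(xs) * (len(xs) - 1) // 2
    total = 0.0
    for i, j in itertools.combinations(range(len(xs)), 2):
        pi = incl_prob(i, j)
        if rng.random() < pi:
            total += loss(xs[i], xs[j]) / pi
    return total / n_pairs


rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(30)]
sq = lambda a, b: (a - b) ** 2
# Hypothetical auxiliary rule: favour pairs that look far apart on a cheap proxy.
aux = lambda i, j: min(1.0, max(0.05, 0.5 * abs(xs[i] - xs[j])))
print(full_pairwise_mean(xs, sq), ht_pairwise_mean(xs, sq, aux, rng))
```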

2606.02340 2026-06-02 math.PR math.CO math.ST stat.TH

Transitivity in Inhomogeneous Random Tournaments

Sayak Chatterjee, Bhaswar B. Bhattacharya

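AI summary: Using the W-random tournament model, studies the fluctuations of the number of circular triads (directed 3-cycles) in inhomogeneous random tournaments, proposes an inference framework for the consistency coefficient based on a tournamenton multiplier bootstrap, and constructs an algorithm for asymptotically valid confidence intervals.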

Comments
41 pages, 5 figures

Abstract

Paired-comparison data are naturally represented by tournaments, where transitivity corresponds to the existence of a global ranking consistent with all pairwise outcomes. Accordingly, the classical Kendall-Smith coefficient of consistency measures deviations from transitivity in a tournament by counting the number of circular triads (directed $3$-cycles). In this paper, we characterize the fluctuations of the number of circular triads in inhomogeneous random tournaments and develop an inferential framework for the consistency coefficient. Specifically, we consider the $W$-random tournament model, where the comparison probabilities are determined by a tournamenton $W$, the analogue of a graphon in the tournament setting. We show that, for a $W$-random tournament on $n$ vertices, the number of circular triads exhibits three different fluctuation regimes, determined by suitable notions of regularity and uniformity of $W$. We further develop a novel tournamenton multiplier bootstrap that consistently approximates the limiting distribution of the circular-triad count in the relevant asymptotic regime. Combining this with procedures for testing regularity and uniformity, we design an algorithm for constructing confidence intervals for the consistency coefficient that is asymptotically valid for all tournamentons. We also obtain structural characterizations of tournamentons for which the limiting distribution of the number of circular triads exhibits specific degeneracies. These results can also be viewed through the lens of tournament quasirandomness and may be of independent interest.
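The circular-triad count has a simple closed form in the score sequence: every transitive triad has exactly one vertex beating both others, so the count is C(n, 3) minus the sum of C(s_i, 2) over the scores s_i. The sketch samples a W-random tournament from a hypothetical tournamenton `W`, checks the closed form against brute force, and computes the Kendall-Smith consistency coefficient; the multiplier bootstrap and confidence intervals are not shown.

```python
import itertools
import random
from math import comb


def w_random_tournament(n, W, rng):
    """Sample a W-random tournament: beats[i][j] is True when i beats j.

    Each vertex gets a latent U_i ~ Uniform(0, 1); i beats j w.p. W(U_i, U_j),
    where the tournamenton satisfies W(x, y) + W(y, x) = 1.
    """
    u = [rng.random() for _ in range(n)]
    beats = [[False] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < W(u[i], u[j]):
            beats[i][j] = True
        else:
            beats[j][i] = True
    return beats


def circular_triads(beats):
    """C(n, 3) minus transitive triads (each has one vertex beating both others)."""
    n = len(beats)
    scores = [sum(row) for row in beats]
    return comb(n, 3) - sum(comb(s, 2) for s in scores)


def consistency_coefficient(beats):
    """Kendall-Smith zeta = 1 - c / c_max (requires n >= 3)."""
    n = len(beats)
    c_max = (n**3 - n) / 24 if n % 2 else (n**3 - 4 * n) / 24
    return 1 - circular_triads(beats) / c_max


def brute_force_triads(beats):
    return sum(
        (beats[i][j] and beats[j][k] and beats[k][i]) or (beats[i][k] and beats[k][j] and beats[j][i])
        for i, j, k in itertools.combinations(range(len(beats)), 3)
    )


W = lambda x, y: 0.5 + 0.4 * (x - y)  # hypothetical tournamenton: higher U tends to win
T = w_random_tournament(40, W, random.Random(0))
print(circular_triads(T), round(consistency_coefficient(T), 3))
```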

2606.02295 2026-06-02 stat.AP

Bandwidth selection with a frequency-domain version of the AIC

Erhard Reschenhofer

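AI summary: Proposes using a frequency-domain version of the AIC to select the bandwidth for nonparametric spectral density estimation automatically, and shows on real and synthetic time series that it performs comparably to the standard parametric approach.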

Comments
14 pages, 7 figures

Abstract

When it comes to estimating an unknown spectral density as simply and reliably as possible, parametric spectral density estimation using AR models and order selection via AIC is the method of choice. In contrast, no standard method has yet emerged for automatic nonparametric spectral density estimation, and there seems to be little willingness to weigh the advantages and disadvantages of different risk functions and the various methods for estimating them on a case-by-case basis, particularly because it is unclear whether the effort is even worthwhile without concrete prior information about the unknown spectral density. As a result, subjective visual methods are still widely used in practice to determine the appropriate smoothing parameter for a nonparametric estimation. This article aims to encourage the increased use of objective automatic methods by presenting evidence that using what is arguably the simplest and most straightforward frequency-domain version of the AIC for the automatic determination of an appropriate bandwidth enables results that are comparable to those obtained using the standard parametric approach. This evidence is based on both real-world time series and synthetic time series with spectral densities of varying complexity.
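The abstract does not give the exact form of its frequency-domain AIC, so the sketch below swaps in a related objective criterion that is easy to state: choose the half-width of a Daniell (flat) periodogram smoother by leave-one-out Whittle likelihood cross-validation. It is meant only to show what "automatic bandwidth selection" replaces visual tuning with; it is not the author's method, and the AR(1) test series and candidate range are hypothetical.

```python
import cmath
import math
import random


def periodogram(x):
    """I_j = |sum_t (x_t - mean) e^{-i w_j t}|^2 / n at w_j = 2*pi*j/n, j = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    values = []
    for j in range(1, n // 2 + 1):
        w = 2 * math.pi * j / n
        s = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(xc))
        values.append(abs(s) ** 2 / n)
    return values


def loo_daniell(I, j, m):
    """Flat (Daniell) average of I over |k - j| <= m, leaving out k = j; truncated at edges."""
    window = [I[k] for k in range(max(0, j - m), min(len(I), j + m + 1)) if k != j]
    return sum(window) / len(window)


def whittle_cv(I, m):
    """Leave-one-out Whittle criterion sum_j [log f_j + I_j / f_j]; lower is better."""
    total = 0.0
    for j, Ij in enumerate(I):
        f = loo_daniell(I, j, m)
        total += math.log(f) + Ij / f
    return total


def choose_bandwidth(x, candidates=range(1, 21)):
    I = periodogram(x)
    return min(candidates, key=lambda m: whittle_cv(I, m))


# Hypothetical AR(1) series with coefficient 0.7.
rng = random.Random(0)
x = [0.0]
for _ in range(255):
    x.append(0.7 * x[-1] + rng.gauss(0, 1))
print(choose_bandwidth(x))
```

Leaving out the centre ordinate matters: without it the in-sample criterion would always prefer the narrowest window, which is the overfitting a penalty such as the AIC's is meant to prevent.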

2606.02286 2026-06-02 math-ph math.AP math.DG math.MP physics.flu-dyn stat.AP

Exponential thermalisation of viscous fluids on negatively curved manifolds

Samuel L. Braunstein, Zhi-Wei Wang

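AI summary: On compact Riemannian manifolds with negative Ricci curvature, fixes the noise in the stochastic Navier-Stokes equations via the fluctuation-dissipation relation, proves that the unique stationary distribution of the spectrally truncated system is the Gibbs measure, and shows exponential convergence to equilibrium at a rate of at least $2\nu\lambda_{\mathrm{Def}}$, where $\lambda_{\mathrm{Def}}$ is the spectral gap of the deformation Laplacian.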

Comments
13 pages

Abstract

The deterministic incompressible Navier-Stokes equations are physically incomplete: any viscous fluid at finite temperature must exhibit thermal fluctuations whose form is dictated by the fluctuation-dissipation relation. We formulate the stochastic Navier-Stokes equations with the kinematically selected deformation Laplacian on compact Riemannian manifolds with strictly negative Ricci curvature. The fluctuation-dissipation relation, derived from a topological (Poincaré lemma) argument, uniquely determines the noise from the viscous operator. For the spectrally truncated system, we prove that the unique stationary distribution is the Gibbs measure (Gaussian in the mode amplitudes, because the nonlinear convective terms preserve energy), and that convergence to equilibrium is exponentially fast with rate at least $2\nu\lambda_{\mathrm{Def}}$, where $\nu$ is the kinematic viscosity and $\lambda_{\mathrm{Def}}$ is the spectral gap of the deformation Laplacian. The spectral gap satisfies $\lambda_{\mathrm{Def}} \geq \kappa^2$ when $\mathrm{Ric} \leq -\kappa^2 g$, and is independent of the volume of the domain. On flat space, the analogous thermalisation rate vanishes in the infinite-volume limit. The equilibrium velocity-velocity correlation function decays exponentially in geodesic distance, in contrast to the algebraic decay on flat space. These results provide a rigorous statistical-mechanical foundation for viscous fluids on negatively curved manifolds and illustrate how the geometry of the domain controls not only the deterministic dynamics but also the approach to thermal equilibrium.
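As a heuristic check on where a factor of 2 in such a rate can come from, consider one linear mode in isolation, ignoring the energy-preserving convective coupling and assuming the mode amplitude $a_k$ is damped at rate $\nu\lambda_k$, with $\lambda_k \geq \lambda_{\mathrm{Def}}$ an eigenvalue of the deformation Laplacian. If the noise amplitude is tied to the damping so that the equilibrium variance is $\sigma^2$ (a stand-in for the temperature-dependent value), the mode is an Ornstein-Uhlenbeck process and its variance relaxes at twice the damping rate. This is a sketch under those assumptions, not the paper's proof.

```latex
\begin{aligned}
\mathrm{d}a_k &= -\nu\lambda_k\, a_k\,\mathrm{d}t + \sqrt{2\nu\lambda_k\sigma^2}\,\mathrm{d}W_k,\\
\frac{\mathrm{d}}{\mathrm{d}t}\operatorname{Var}[a_k] &= -2\nu\lambda_k \operatorname{Var}[a_k] + 2\nu\lambda_k\sigma^2,\\
\operatorname{Var}[a_k(t)] &= \sigma^2 + \bigl(\operatorname{Var}[a_k(0)] - \sigma^2\bigr)\, e^{-2\nu\lambda_k t}.
\end{aligned}
```

Since every $\lambda_k \geq \lambda_{\mathrm{Def}}$, each variance in this linear picture approaches equilibrium at least as fast as $e^{-2\nu\lambda_{\mathrm{Def}} t}$.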

2606.02247 2026-06-02 stat.ML cs.LG

ShaplEIG: Bayesian Experimental Design for Shapley Value Estimation

David Rundel, Fabian Fumagalli, Maximilian Muschalik, Bernd Bischl, Matthias Feurer

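AI summary: Proposes ShaplEIG, which uses a Gaussian process surrogate and expected information gain to select coalitions adaptively for efficient Shapley value estimation, substantially improving sample efficiency in low-budget settings.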

Comments
Accepted at the Forty-Third International Conference on Machine Learning (ICML 2026)

Abstract

Shapley values are a principled attribution measure widely used in interpretable machine learning, but their exact computation scales exponentially with the number of players, motivating a wide range of approximation methods based on value function evaluations of sampled coalitions. This raises the question of whether approximation accuracy can be improved by adaptively selecting coalitions for evaluation based on previous evaluations. This is particularly relevant in settings where the value function is costly and the number of evaluations is severely limited, such as retraining-based feature importance, data valuation, and hyperparameter importance. For this purpose, we propose ShaplEIG, a Bayesian experimental design approach that approximates the expensive value function using a Gaussian process surrogate and adaptively selects coalitions based on their expected information gain about the Shapley values. By the linearity of the Shapley values in the value function, we show that the expected information gain is available in closed form. Furthermore, we propose an efficient computation scheme that reduces the complexity from exponential to polynomial in the number of players via elementary symmetric polynomials. In extensive experiments across diverse costly applications, our method consistently improves sample efficiency in the low-budget regime over state-of-the-art baselines.
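For reference, exact Shapley values weight each marginal contribution v(S ∪ {p}) - v(S) by |S|!(n - |S| - 1)!/n! and enumerate every coalition, which is the exponential cost that approximation methods avoid. The sketch below uses two small hypothetical games and also checks linearity in the value function, the property the abstract uses to obtain the expected information gain in closed form. The Gaussian process surrogate and the coalition-selection rule are not shown.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions (exponential in len(players))."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                S = frozenset(coalition)
                total += weight * (v(S | {p}) - v(S))
        phi[p] = total
    return phi


players = ["a", "b", "c"]
v1 = lambda S: len(S) ** 2               # hypothetical symmetric game
v2 = lambda S: 5.0 if "a" in S else 0.0  # hypothetical game where only "a" matters
print(shapley_values(players, v1))  # each player gets v1(N) / 3 = 3, up to rounding
```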

2606.02234 2026-06-02 econ.EM stat.ME

When Do Treatment Changes Identify Causal Effects?

Martin Huber

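AI summary: Clarifies the identifying assumptions for causal inference based on treatment changes rather than treatment levels, shows how they relate to conventional identification strategies, and derives a double robustness result under non-nested assumptions.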

Abstract

This paper clarifies the identifying assumptions underlying causal inference based on treatment changes rather than treatment levels, and their relationship to conventional identification strategies. We characterize two distinct structural models, with non-nested identifying assumptions, under which treatment-change identification is valid conditional on observed covariates. We demonstrate that the identifying assumptions relying on treatment changes are generally not nested with those of methods relying on treatment levels, such as selection-on-observables strategies that control for past outcomes, treatments, and covariates, or difference-in-differences approaches that difference outcomes rather than treatments over time. We show, however, that under a random-walk restriction on the treatment process, conditioning on treatment changes is equivalent to conditioning on treatment levels given lagged treatment. This and other equivalence results motivate overidentification tests by jointly considering methods based on treatment levels and changes. Beyond these tests, the non-nesting results carry a structural double robustness implication: an estimator that differences both the outcome and the treatment over time, such as two-way fixed effects regression, remains consistent if either the treatment-change assumption or the parallel-trends assumption holds, without requiring both simultaneously. We characterize the causal models consistent with each method, investigate finite-sample behavior in a simulation study, and present an empirical application to cigarette demand.
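An estimator that differences both the outcome and the treatment can be sketched with two periods, where first differences coincide with two-way fixed effects. The data-generating process below is hypothetical and chosen so that the unit effect is correlated with treatment levels while a common time shock satisfies parallel trends; under these assumptions a levels regression is biased, whereas regressing the outcome change on the treatment change recovers the true effect of 2. It does not reproduce the paper's double-robustness analysis.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


def first_difference_slope(y0, y1, d0, d1):
    """Difference both outcome and treatment over two periods, then regress dY on dD.

    With two periods this matches two-way fixed effects: differencing removes the
    unit effect and the intercept absorbs the common time shock.
    """
    dy = [b - a for a, b in zip(y0, y1)]
    dd = [b - a for a, b in zip(d0, d1)]
    return ols_slope(dd, dy)


# Hypothetical DGP: Y_it = alpha_i + gamma_t + beta * D_it + e_it, with treatment
# levels correlated with the unit effect alpha_i; beta = 2.
rng = random.Random(0)
n, beta, gamma = 5_000, 2.0, (0.0, 1.5)
alpha = [rng.gauss(0, 1) for _ in range(n)]
d0 = [a + rng.gauss(0, 1) for a in alpha]
d1 = [d + rng.gauss(0, 1) for d in d0]
y0 = [a + gamma[0] + beta * d + rng.gauss(0, 1) for a, d in zip(alpha, d0)]
y1 = [a + gamma[1] + beta * d + rng.gauss(0, 1) for a, d in zip(alpha, d1)]
print(round(ols_slope(d1, y1), 2), round(first_difference_slope(y0, y1, d0, d1), 2))
```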

2606.02231 2026-06-02 stat.ML cs.LG stat.ME

Identifiable Markov Switching Models with Instantaneous Effects and Exponential Families

具有瞬时效应和指数族的可识别马尔可夫切换模型

Roel Hulsman, Carles Balsells-Rodas, Sara Magliacane

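AI summary: For non-stationary time series, develops an identifiability theory for Markov switching models with instantaneous effects under exponential-family noise, and introduces the FlowMSM framework for detecting latent regimes and recovering regime-dependent causal structures.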

Comments
International Conference on Machine Learning (ICML) 2026

Abstract

Temporal systems often exhibit non-stationary behaviour, such as seasonal climate variation or glucose fluctuations in patients with type-1 diabetes. One way to model non-stationarity is through discrete latent regimes, i.e., stationary segments of time. Such systems induce a Markov Switching Model (MSM), a class of Hidden Markov Models with autoregressive dependencies among latent regimes and observed variables. Identifying latent regimes is challenging in the presence of frequent regime switches and nonlinear and non-Gaussian dynamics, particularly when there are instantaneous effects between the variables, e.g., due to slow rates of measurements. In this work, we establish the identifiability of both latent regimes and regime-dependent causal structures under temporal regime dependencies, nonlinear lagged and instantaneous effects, and independent noise from the exponential family. Our identifiability theory subsumes non-temporal mixtures of causal models. Furthermore, we introduce FlowMSM, a regime detection framework that can be paired with any stationary causal discovery method to recover regime-dependent causal structures. Experiments on synthetic benchmarks and a financial economics dataset demonstrate the effectiveness of our approach to detect latent regimes and discover causal structures from non-stationary time series.
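To make the notion of latent regimes concrete, the sketch filters regime probabilities in a simple Markov switching model with the classic forward (Hamilton-type) recursion. It is a deliberately simplified, swapped-in setting: two regimes, linear Gaussian AR(1) dynamics, known parameters, and no instantaneous effects. The transition matrix, coefficients, and noise levels are hypothetical, and FlowMSM is not reproduced.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def filter_regimes(y, trans, phi, sd, init):
    """Forward filter for a Markov-switching AR(1).

    y_t | regime k, y_{t-1} ~ N(phi[k] * y_{t-1}, sd[k]^2) and
    trans[i][j] = P(regime_t = j | regime_{t-1} = i). Returns, for t = 1..len(y)-1,
    the filtered probabilities P(regime_t = k | y_0..y_t).
    """
    K = len(init)
    prev, out = list(init), []
    for t in range(1, len(y)):
        pred = [sum(prev[i] * trans[i][j] for i in range(K)) for j in range(K)]
        joint = [pred[k] * normal_pdf(y[t], phi[k] * y[t - 1], sd[k]) for k in range(K)]
        z = sum(joint)
        prev = [v / z for v in joint]
        out.append(prev)
    return out


# Hypothetical two-regime system: calm (0) and volatile (1), with sticky switching.
trans = [[0.95, 0.05], [0.05, 0.95]]
phi, sd = (0.5, -0.5), (0.2, 2.0)
rng = random.Random(3)
regimes, y = [0], [0.0]
for _ in range(499):
    r = regimes[-1] if rng.random() < 0.95 else 1 - regimes[-1]
    regimes.append(r)
    y.append(phi[r] * y[-1] + rng.gauss(0, sd[r]))

probs = filter_regimes(y, trans, phi, sd, init=[0.5, 0.5])
hits = sum((p[1] > 0.5) == (r == 1) for p, r in zip(probs, regimes[1:]))
print(f"filtered regime accuracy: {hits / len(probs):.2f}")
```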

2606.02228 2026-06-02 stat.ML cs.CV cs.LG

Bayesian meta-learning for modeling Alzheimer's disease progression

Clara Hoffmann, Nadja Klein

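AI summary: Proposes a Bayesian meta-learning approach that uses an individual's historical MRI volumes and disease trajectory to predict the distribution of disease scores, enabling dynamic prediction without retraining and reducing overconfidence in long-term predictions.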

English abstract

Predicting whether an individual with Alzheimer's disease will experience mild or severe disease progression is essential for personalized treatment. Typically, practitioners seek to predict the distribution of a discrete disease score, conditional on an individual's current MRI volume and their historical disease trajectory. Classical statistical regression models and single-task neural networks are not well-suited for this purpose because fitting separate models is infeasible (since each individual typically has few observations), while ignoring individual-level correlation leads to poor generalization. Meta-learning, in contrast, provides a natural avenue to dynamically predict distributions without retraining and model nonlinear relationships between the outcome and covariates. Motivated by this, we propose a Bayesian meta-learner that is trained on multiple individuals but tailors the predictive disease score distribution to each individual's historical data. Our model predicts on unseen individuals without retraining, scales linearly with the number of historical observations, and is guaranteed to be less overconfident when predicting long-term disease scores compared to its deterministic counterpart. On real-world data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, our model achieves performance competitive with both single-task models and deterministic meta-learners, while substantially improving performance when predicting long-term disease progression.

2606.02223 2026-06-02 cs.LG math.ST stat.ME stat.TH

Network Learning with Semi-relaxed Gromov-Wasserstein

Charles Dufour, Ulysse Naepels, Leonardo V. Santoro

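AI summary: Addresses missing node labels when estimating the generative mechanism of large-scale networks. It proposes a semi-relaxed Gromov-Wasserstein objective that relaxes the assignment problem through probabilistic couplings, and solves it with a block-coordinate conditional gradient algorithm. The optimality gap between the relaxed solution and the deterministic assignment is shown to vanish at rate O(1/n), which yields consistency and minimax-optimal convergence rates for stochastic block models and Hölder-smooth graphons.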

English abstract

Estimating the generative mechanism of large-scale networks is a fundamental challenge in statistical machine learning. It requires the identification of the latent connectivity structure, which is in general an NP-hard combinatorial problem due to the absence of canonical node labels. We address this challenge by allowing for probabilistic couplings, thereby relaxing the assignment problem. Our estimation framework can be formulated as a semi-relaxed Gromov-Wasserstein objective and provides a low-dimensional representation of the generative structure. We solve this via a block-coordinate conditional gradient algorithm. Despite the relaxation, the resulting solution is typically deterministic: in fact, we show that the optimality gap between the relaxed solution and the deterministic assignment vanishes at rate $O(1/n)$, where $n$ is the number of nodes. This allows for tractable recovery of the underlying model and enables rigorous statistical analysis: we establish consistency and minimax-optimal convergence rates for both stochastic block models and Hölder-smooth graphons. Our implementation scales efficiently with $n$, as demonstrated on both synthetic and real-world datasets.

2606.02199 2026-06-02 stat.ME

A Contaminated Model for Overdispersed Multinomial Microbiome Count Data

Ockert van Heerden, Andriëtte Bekker, Seite Makgai, Arno Otto, Antonio Punzo

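AI summary: Addresses anomalies in overdispersed multinomial count data by proposing the contaminated Dirichlet-multinomial (CDM) model. This two-component mixture downweights the influence of anomalies and enables anomaly detection.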

English abstract

Multinomial count data, such as microbial composition profiles derived from sequencing studies, frequently contain anomalous observations that distort parameter estimates. The Dirichlet-multinomial (DM) distribution is widely used in this setting but remains sensitive to such contamination. We propose the contaminated Dirichlet-multinomial (CDM) distribution, a two-component mixture in which the regular data come from a DM component with a lower dispersion and the irregular data come from a DM component with an inflated dispersion parameter. This construction accommodates anomalies without requiring their removal, and yields a natural rule for anomaly detection via posterior probabilities. Through sensitivity analyses involving both single-point anomalies and background noise, we demonstrate that the CDM distribution effectively downweights the influence of anomalous observations on the parameter estimates. The model is applied to gut microbiome data from a colorectal carcinogenesis study, where it consistently outperforms the DM distribution across all information criteria and identifies biologically plausible anomaly proportions in both the healthy and carcinoma subsets.
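
The anomaly rule described above can be sketched in a few lines of Python. The sketch makes several assumptions. It uses a DM parametrization with concentration vector `theta * pi`, where `pi` holds the mean proportions and `theta` is the precision. It models the "inflated dispersion" of the irregular component by dividing the precision by a factor `eta > 1`. The abstract does not give the paper's exact parametrization or how `pi`, `theta`, `eta` and the mixing weight `rho` are estimated, and all function names here are hypothetical. An observation would be flagged as anomalous when its posterior probability of belonging to the regular component falls below 0.5.

```python
import numpy as np
from scipy.special import gammaln


def dm_logpmf(x, alpha):
    """Log-pmf of the Dirichlet-multinomial with concentration vector alpha."""
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, a0 = x.sum(), alpha.sum()
    return (gammaln(n + 1) - gammaln(x + 1).sum()
            + gammaln(a0) - gammaln(n + a0)
            + (gammaln(x + alpha) - gammaln(alpha)).sum())


def cdm_posterior_regular(x, pi, theta, eta, rho):
    """Posterior probability that count vector x comes from the regular component.

    pi: mean proportions; theta: precision of the regular DM component;
    eta > 1: assumed dispersion inflation (irregular precision is theta / eta);
    rho: prior weight of the regular component.
    """
    pi = np.asarray(pi, dtype=float)
    log_reg = np.log(rho) + dm_logpmf(x, theta * pi)
    log_irr = np.log1p(-rho) + dm_logpmf(x, (theta / eta) * pi)
    # Work with the log-likelihood difference to avoid underflow for large counts.
    return 1.0 / (1.0 + np.exp(log_irr - log_reg))
```

A count vector near the mean proportions should receive a high posterior probability of being regular, whereas an extreme composition should be attributed to the inflated-dispersion component.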

2606.02144 2026-06-02 math.ST stat.TH

Sharp Support Thresholds for Smeariness of Absolutely Continuous Measures on Spheres

Susovan Pal

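AI summary: Studies support thresholds for full and directional smeariness of absolutely continuous probability measures on spheres. An analysis of the Hessian and fourth-order terms gives sharp radius conditions under which smeariness occurs for rotationally symmetric densities, and counterexamples are constructed.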

English abstract

We investigate support thresholds for fully smeary and directionally smeary absolutely continuous probability measures on the sphere \(\mathbb{S}^m\). The motivation is inferential: smeariness is caused by degeneracy of the Hessian of the Fréchet function, and such degeneracy can invalidate the classical central limit theorem (CLT) for Fréchet means and the corresponding Wald-type \(\chi^2\) inference. For rotationally symmetric densities, we show that full and directional smeariness are equivalent. The Hessian and fourth-order terms are governed by two explicit geometry-dependent radii \(R_m<S_m\). In dimensions \(m=2,3\), rotationally symmetric smeariness cannot occur. For \(m\ge4\), support contained in the geodesic ball of radius \(S_m\), centered at the Fréchet mean, rules out smeariness; conversely, for every \(\varepsilon>0\) with \(S_m+\varepsilon<\pi\), we construct examples of rotationally symmetric \(2\)-smeary densities supported in the ball of radius \(S_m+\varepsilon\). For general densities, closed hemispherical support rules out both full and directional smeariness. Support contained in the closed ball of radius \(S_m\) rules out full smeariness, while we construct explicit, directionally \(2\)-smeary examples supported in balls of radius \(\pi/2+\varepsilon\). As a byproduct, the explicit Hessian formulas in this paper also provide a practical diagnostic for detecting proximity to the Hessian-degenerate, non-classical regime.

2606.02130 2026-06-02 stat.ME

Methods for adjusting for covariate measurement error in flexible modelling of functional form: designing a blinded, controlled neutral comparison simulation study

Anne C M Thiébaut, Aris Perperouglou, Mohammed Sedki, Steve Ferreira Guerra, Paul Gustafson, Frank E Harrell, Willi Sauerbrei, Michal Abrahamowicz, Laurence S Freedman

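AI summary: A three-stage blinded simulation study that estimates the functional relationship between a continuous error-prone exposure and a binary outcome. It compares combinations of four measurement error correction methods (SIMEX, regression calibration, multiple imputation, and a Bayesian method) with flexible regression models (B-splines, P-splines, and fractional polynomials).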

English abstract

This article describes the design of a neutral comparison study for empirical settings in which the goal is to learn the functional relationship between a continuous error-prone exposure variable and a binary outcome. A simulation study compared the performance of combinations of measurement error correction methods and flexible regression modeling techniques. The project involved four independent teams. One team was devoted to data generation and evaluation, and each of the other three to specific methods of measurement error correction (simulation-extrapolation, regression calibration and multiple imputation, Bayesian method). The study was conducted in three successive stages. In Stage 1, the first team simulated five datasets that differed only in the true exposure-outcome functional form and the distribution of the true exposure. The implementation of the flexible modeling methods (B-splines, P-splines, and fractional polynomials) was also standardized. The three methods teams, blinded to the underlying data-generating process, wrote code to implement their methods and provided their results to the first team, which evaluated them. The first team then used this code in the subsequent stages of the project. In Stage 2, the team simulated 150 additional datasets in which other design parameters varied, while the same five exposure-outcome functions were retained. Stage 3 consisted of simulating independent replications of each of the 150 scenarios considered in Stage 2 to quantify the sampling variance of the estimates. This work emphasizes the relevance of neutral comparison studies for fairly evaluating statistical methods that address a complex analytical challenge, and demonstrates their feasibility through a large collaborative project.
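
Simulation-extrapolation is the compared correction that fits most easily into a few lines. The sketch below is a simplified illustration, not the study's code. It assumes a linear outcome model with classical additive measurement error of known standard deviation `sigma_u`; the study itself uses a binary outcome and flexible fits. The sketch adds extra noise at levels `lambda` and then extrapolates the average naive slope back to `lambda = -1` with a quadratic. The function name and defaults are hypothetical.

```python
import numpy as np


def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), n_rep=200, seed=0):
    """SIMEX estimate of a linear slope when w = x + u with u ~ N(0, sigma_u^2).

    Returns (simex_estimate, naive_estimate).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    mean_slopes = []
    for lam in lambdas:
        reps = n_rep if lam > 0 else 1          # no extra noise is added at lambda = 0
        slopes = []
        for _ in range(reps):
            # Simulation step: total error variance becomes (1 + lam) * sigma_u^2.
            w_lam = w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.shape)
            slopes.append(np.polyfit(w_lam, y, 1)[0])
        mean_slopes.append(np.mean(slopes))
    # Extrapolation step: quadratic in lambda, evaluated at lambda = -1 (no error).
    coef = np.polyfit(lambdas, mean_slopes, 2)
    return float(np.polyval(coef, -1.0)), float(mean_slopes[0])
```

The quadratic extrapolant is a common default. It typically removes much, but not all, of the attenuation bias.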

2606.02118 2026-06-02 stat.CO

Observed Fisher Information in hidden Markov models - Application to a noisy Gaussian random walk

Alexandra Lefebvre, Grégory Nuel

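AI summary: Proposes a forward-backward algorithm based on Oakes' identity that computes exactly the score and the observed Fisher information matrix of a Gaussian random walk observed through Gaussian noise. The Newton-Raphson algorithm is then used for parameter estimation and confidence intervals.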

English abstract

In this work we provide analytical, closed-form expressions for the exact computation of the score and the observed Fisher information matrix in a Gaussian random walk observed through Gaussian noise. Our method is based on Oakes' identity. As with the computation of the log-likelihood, its time complexity is linear in the length of the sequence when the forward-backward (or Baum-Welch) algorithm is used. We illustrate the method in various simulation studies and provide parameter estimates computed with the Newton-Raphson algorithm, along with confidence intervals.
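
The linear-time claim is easier to see with a concrete forward pass. The log-likelihood of the noisy Gaussian random walk can be computed in one pass using the Kalman filter's prediction-error decomposition. The sketch assumes the model `x_t = x_{t-1} + w_t`, `y_t = x_t + v_t` with `w_t ~ N(0, q)`, `v_t ~ N(0, r)` and `x_0 ~ N(m0, p0)`. The observed information is then approximated by central finite differences. This numerical stand-in is swapped in for illustration and is not the paper's exact Oakes-identity computation. Function names are hypothetical.

```python
import numpy as np


def rw_loglik(y, q, r, m0=0.0, p0=1.0):
    """Kalman-filter log-likelihood of a Gaussian random walk seen through Gaussian noise.

    One forward pass over the data, so the cost is linear in len(y).
    """
    m, p, ll = m0, p0, 0.0
    for yt in np.asarray(y, dtype=float):
        p_pred = p + q                  # predict: Var(x_t | y_1..y_{t-1})
        s = p_pred + r                  # one-step predictive variance of y_t
        innov = yt - m                  # prediction error
        ll += -0.5 * (np.log(2.0 * np.pi * s) + innov ** 2 / s)
        k = p_pred / s                  # Kalman gain
        m = m + k * innov               # filtered mean
        p = (1.0 - k) * p_pred          # filtered variance
    return ll


def observed_information(y, q, r, h=1e-4, **kw):
    """Central-difference Hessian of -loglik in (q, r); a numerical stand-in, not Oakes' identity."""
    theta = np.array([q, r], dtype=float)
    f = lambda t: -rw_loglik(y, t[0], t[1], **kw)
    info = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i] * h, np.eye(2)[j] * h
            info[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                          - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return info
```

The filter output can be checked against the direct multivariate normal log-likelihood. That direct computation uses `Cov(y_s, y_t) = p0 + q * min(s, t) + r * [s == t]` and costs cubic rather than linear time.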

2606.02117 2026-06-02 stat.ML cs.LG stat.ME

ProbRes: Volatility Learning for Probabilistic Time-Series Forecasting

Tingting Wang, Yunyi Zhang, Benyou Wang

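AI summary: Proposes ProbRes, a post-hoc probabilistic calibration method that improves probabilistic forecasts by explicitly learning volatility dynamics. It handles heteroskedastic data effectively and is validated both theoretically and empirically.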

English abstract

Probabilistic time series forecasting has attracted increasing attention in financial applications due to the need to quantify risk and uncertainty in future observations. We propose ProbRes, a post-hoc probabilistic calibration method that explicitly learns and incorporates volatility dynamics into probabilistic forecasting, enabling effective handling of heteroskedastic data. During training, ProbRes employs two architecture-agnostic modules to separately model the conditional mean and conditional volatility. At the inference stage, it generates predictive distributions by resampling normalized residuals. ProbRes is applicable to both univariate and multivariate time series and remains robust under a wide range of error distributions, including non-Gaussian innovations with conditional heteroskedasticity. Theoretical results demonstrate ProbRes's validity and experiments on both synthetic and real-world datasets show that ProbRes accurately captures predictive distributions and produces well-calibrated prediction intervals.
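
The inference step, which rescales resampled standardized residuals by the forecast volatility, can be sketched as follows. This is an illustration under assumptions, not the ProbRes implementation. The conditional-mean and conditional-volatility predictions are taken as given arrays that stand in for the two architecture-agnostic modules. Residuals are resampled i.i.d. with replacement, and intervals are read off empirical quantiles. The function name is hypothetical.

```python
import numpy as np


def resampled_predictive(y_train, mu_train, sigma_train, mu_new, sigma_new,
                         alpha=0.1, n_samples=2000, seed=0):
    """Predictive samples and (1 - alpha) intervals from resampled standardized residuals.

    mu_* and sigma_* stand in for the outputs of separately fitted conditional-mean
    and conditional-volatility models; fitting those models is not shown.
    """
    z = ((np.asarray(y_train, dtype=float) - np.asarray(mu_train, dtype=float))
         / np.asarray(sigma_train, dtype=float))           # standardized residuals
    mu_new = np.asarray(mu_new, dtype=float)
    sigma_new = np.asarray(sigma_new, dtype=float)
    rng = np.random.default_rng(seed)
    draws = rng.choice(z, size=(n_samples, mu_new.size), replace=True)
    samples = mu_new + sigma_new * draws     # rescale residuals by the forecast volatility
    lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2], axis=0)
    return samples, lo, hi
```

Because the residuals are rescaled by `sigma_new`, interval widths grow with the forecast volatility. This is how heteroskedasticity is reflected, and it works without assuming Gaussian innovations.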

2606.02115 2026-06-02 stat.ML cs.LG

Error Bounds for a Diffusion Model-Based Drift Estimator

Ioar Casado-Telletxea, Omar Rivasplata

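AI summary: Addresses drift estimation in stochastic differential equations with a known diffusion parameter. Using diffusion-model theory, it derives an explicit risk bound on the time-averaged mean-squared error and decomposes the risk into four terms: discretization, score approximation, noise initialization, and sampling variance.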

Comments
Preprint

English abstract

Parameter estimation in stochastic differential equations is a classical statistical problem of much importance in many scientific fields. Recent work of Tapia Costa et al. (2026) introduced a novel technique for estimating the drift when the diffusion parameter is known, using discrete samples from multiple trajectories. Their method treats drift estimation as a denoising problem, and leverages tools from (conditional) score-matching diffusion models. Although their experiments showed promising results across different drift classes, the question of theoretical guarantees for their estimator was left unanswered. In this note, we address this gap by exploiting techniques from diffusion model theory. More concretely, we derive an explicit risk bound for the time-averaged mean-squared error of said drift estimator. Our bound decomposes the risk into the (i) Euler-Maruyama discretization, (ii) score/denoiser approximation, (iii) noise initialization, and (iv) sampling variance, revealing the trade-offs between the different hyperparameters and sources of error in the estimator.

2606.02106 2026-06-02 cs.LG stat.ML

When Tabular Foundation Models Transfer Across Modalities: A Systematic Evaluation Across 95 Datasets, 7 Modalities, and Two Regimes

Julien Lafrance

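AI summary: Proposes a classification pipeline that combines Equiangular Tight Frame preprocessing with a tabular foundation model. The pipeline is evaluated on data from multiple modalities and shown to strike a good balance between speed and quality.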

Comments
24 pages, 5 figures. Code and data available at https://doi.org/10.5281/zenodo.19982636

English abstract

We present a single classification pipeline that combines an Equiangular Tight Frame (ETF) preprocessing stage with a tabular foundation model for in-context inference, applied identically across modalities once data is mapped to fixed vector representations. We evaluate it on 95 datasets spanning seven signal modalities -- vision, audio, speech, text, molecular, time-series, and tabular. The main methodological contribution is to fix the comparison object: throughout the paper, performance is judged against the strongest lightweight tuned baseline on the same frozen features, while oracle selection, deployed selection, and specialized fine-tuning are reported separately. The pipeline is broadly competitive with strong lightweight tuned baselines on the same frozen features. It does not match the very best specialized models or heavily tuned pipelines on every task, but it stays close, and it runs much faster -- typically 4 to 200 times faster than full backbone fine-tuning, often at comparable quality. We describe how to deploy the pipeline in practice: when to apply ETF preprocessing, how to stop its training without a validation split, how to set up the in-context classifier, and how to calibrate the resulting probabilities. The calibration step is non-cosmetic: TabICL produces well-calibrated probabilities by construction, ETF preprocessing initially disrupts that calibration, and the post-hoc rescaling restores it -- yielding a per-prediction confidence signal that practitioners can use as a trust threshold for confidence-gated deployment. We also report where the pipeline should not be expected to help, and how to identify those cases in advance.
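
The abstract does not spell out the ETF preprocessing itself. The sketch below therefore only illustrates the geometric object it is named after: a simplex equiangular tight frame of `K` unit vectors whose pairwise cosines all equal `-1/(K-1)`. The construction shown, which maps the centered identity through a random orthonormal basis, is a standard one assumed here. How the pipeline maps features onto such a frame is not shown, and the function name is hypothetical.

```python
import numpy as np


def simplex_etf(num_classes, dim, seed=0):
    """Rows form a simplex ETF: unit norm, pairwise cosine -1 / (num_classes - 1)."""
    k = num_classes
    if dim < k:
        raise ValueError("this construction needs dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Random matrix with k orthonormal columns in R^dim.
    u, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    centering = np.eye(k) - np.ones((k, k)) / k      # removes the all-ones direction
    m = np.sqrt(k / (k - 1)) * u @ centering          # columns are the ETF vectors
    return m.T
```

The vectors are maximally and equally separated and sum to zero, which is the property ETF-based methods typically exploit.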

2606.02101 2026-06-02 stat.ML cs.LG stat.AP

It does what it says on the tin: safe synthetic data from coarsened margins

Gillian M Raab

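AI summary: Proposes generating synthetic data by coarsening margins and applying the iterative proportional fitting algorithm. The approach ensures transparency and freedom from disclosure risk.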

English abstract

This paper proposes a method of creating synthetic data (SD) that offers the user two important advantages over other methods currently available. The first is transparency: unlike with other methods, the person who receives the SD will know which relationships between variables in the original data are approximately maintained in the SD. The second is a guarantee that the SD is derived from information that has already been judged to be free of disclosure risk. This is achieved by first defining and calculating the margins through which relationships between variables will be maintained in the SD. Each margin is then subjected to statistical disclosure control (SDC) to the standards defined by the data custodian, e.g. top-coding and bottom-coding, combination of small categories and/or modification of small counts. The curated margins should then be further adjusted by coarsening all counts in each table to multiples of the disclosure limit. These adjusted margins are used to create SD with the Iterative Proportional Fitting (IPF) algorithm. The practical steps involved in creating such SD are illustrated using data from the 1901 Census of Scotland.
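
The sketch below shows two of the ingredients named above: coarsening margin counts to multiples of the disclosure limit, and fitting a full table to curated margins with iterative proportional fitting. Several choices are assumptions made for illustration: the rounding rule (nearest multiple), the uniform starting table, and the function names. Coarsened margins may no longer agree on their shared totals, and in that case IPF can only fit them approximately. The abstract does not state how the paper resolves this, or how records are drawn from the fitted table.

```python
import numpy as np


def coarsen(counts, limit):
    """Round each count to the nearest multiple of the disclosure limit (assumed rule)."""
    return limit * np.round(np.asarray(counts, dtype=float) / limit)


def ipf(shape, margins, n_iter=100, tol=1e-8):
    """Iterative proportional fitting of a full table to target margins.

    margins: list of (axes, target); axes is an increasing tuple of the kept
    dimensions and target has the shape of those dimensions.
    """
    table = np.ones(shape)                      # uniform starting table
    for _ in range(n_iter):
        max_gap = 0.0
        for axes, target in margins:
            other = tuple(a for a in range(len(shape)) if a not in axes)
            current = table.sum(axis=other, keepdims=True)
            target = np.reshape(np.asarray(target, dtype=float), current.shape)
            max_gap = max(max_gap, float(np.abs(current - target).max()))
            # Rescale so this margin matches its target (empty cells stay empty).
            ratio = np.divide(target, current, out=np.zeros_like(current), where=current > 0)
            table = table * ratio
        if max_gap < tol:                       # every margin was already matched
            break
    return table
```

With consistent margins (for example the AB and BC margins of a three-way table), IPF started from a uniform table reproduces each margin exactly. The result is the usual fit `n_AB * n_BC / n_B`.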

2606.02081 2026-06-02 math.OC stat.ML

Decision-calibrated prediction sets for robust power system operations

Akylas Stratigakos, Honglin Wen, Elina Spyrou, Pierre Pinson

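AI summary: Proposes decision-calibrated prediction sets as uncertainty sets in robust optimization. Conditional prediction sets are learned with partially input-convex neural networks, and the threshold is calibrated via conformal risk control to bound the expected violations of downstream operational constraints. This gives better constraint-satisfaction accuracy and cost efficiency than conventional coverage-based calibration.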

Comments
25 pages, 6 figures

English abstract

Robust optimization offers a tractable approach to balance operating costs and reliability in power systems dominated by weather-dependent renewable uncertainty, but its performance depends critically on the uncertainty set. Standard data-driven approaches often calibrate uncertainty sets to attain predictive coverage, which can produce unnecessarily large sets and costly operating decisions. In contrast, we introduce decision-calibrated prediction sets and embed them as uncertainty sets in robust optimization problems; these are conditional multivariate prediction sets where calibration is defined in terms of the reliability of downstream decisions, rather than in terms of the coverage. First, we learn these conditional prediction sets as sub-level sets of norm-based score functions represented by partially input-convex neural networks, capturing contextual information and multivariate dependence while preserving convexity and tractability in downstream robust formulations. Second, inspired by conformal risk control, we calibrate a score-threshold parameter that sets the volume of the uncertainty set, thereby controlling the expected violations of downstream operational constraints. We apply our approach to 15-minute-ahead reserve scheduling with network-constrained deliverability, which we formulate as a robust DC optimal power flow problem with affine recourse. Numerical experiments show that decision-calibrated sets attain prescribed constraint-satisfaction targets within about three percentage points, whereas standard coverage-based calibration systematically exceeds these targets by more than eleven percentage points, leading to larger sets and higher operating costs.
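
The threshold-calibration step follows the generic conformal risk control recipe. For a loss that is bounded by `B` and non-increasing in the threshold, choose the smallest threshold whose inflated empirical risk `n/(n+1) * mean_loss + B/(n+1)` is at most the target `alpha`. In the sketch below, `losses[i, j]` stands for the (hypothetical) constraint-violation loss of calibration sample `i` when the uncertainty set uses `lambdas[j]`. Computing those losses would require solving the robust DC-OPF for each threshold and is not shown. The function name is hypothetical.

```python
import numpy as np


def conformal_risk_threshold(losses, lambdas, alpha, B=1.0):
    """Smallest lambda on the grid whose conformal risk bound is at most alpha.

    losses: array (n_calibration, n_lambdas); each row is non-increasing in lambda
    and bounded above by B. lambdas: sorted in increasing order.
    """
    losses = np.asarray(losses, dtype=float)
    n = losses.shape[0]
    bound = n / (n + 1) * losses.mean(axis=0) + B / (n + 1)
    feasible = np.nonzero(bound <= alpha)[0]
    if feasible.size == 0:
        return np.inf          # no grid value achieves the target risk
    # Monotone losses make the feasible set an upper interval, so take its first point.
    return lambdas[feasible[0]]
```

With a 0/1 miscoverage loss this reduces to split conformal prediction. A graded violation loss is what lets calibration target decision reliability rather than coverage.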

2606.02076 2026-06-02 stat.ME

Modelling multi-cancer screening data to infer on natural history of disease: when can valid, identifiable and precise inference be obtained?

MO Soares, J Lange, K Gogebakan, S Dias, NJ Welton, R Etzioni, AE Ades, S Palmer

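AI summary: Uses simulated multi-cancer early detection trial data to evaluate the inferential capacity of multistate models at different levels of clinical disaggregation. Higher-dimensional models (e.g., 9-state) show poorer convergence and identifiability. Hierarchical models and prior information improve performance, although informative priors can introduce bias.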

Comments
26 pages, 5 Tables, 1 Figure, 2 Boxes

English abstract

Background: Multistate models (MSMs) applied to screening data can characterise the natural history of cancer and predict "stage-shifts" from screening. However, inferring parameters like mean sojourn time (MST) is challenging as disease onset is inherently unobserved in these data. This is even more challenging when characterising heterogeneity between cancer types in multicancer early detection (MCED) trial data. Methods: We utilised simulated longitudinal MCED screening datasets to evaluate the inferential bounds of MSMs under increasing clinical disaggregation: a 3-state (overall MST), 5-state (early/late stage), and 9-state (stages I-IV) model. Bayesian estimation was performed via Markov chain Monte Carlo. Robustness was assessed through chain convergence, parameter identifiability (via profile likelihood), and precision of estimates. We also explored hierarchical models and the use of informative priors to improve identifiability. Results: Based only on MCED trial data, many cancer types exhibited inferential challenges. Generally, the 5-state model was as robust as the 3-state model, showing slight improvements to convergence and identifiability while maintaining precision for overall MST. In contrast, the 9-state model showed worsened convergence and identifiability, and a significant reduction in the precision of overall MST estimates. Hierarchical models successfully improved performance, as did informative prior models, but the latter introduced bias towards the prior values. Conclusions: While disaggregating natural history models by individual cancer stages is desirable for policy, these higher-dimensional models show a greater reliance on external data/assumptions. We recommend explicit identifiability assessments and assessments of the influence of external data/assumptions to support inference for MCED screening evaluations.

2606.02065 2026-06-02 stat.ME math.PR

Inverting Poisson-Laguerre tessellations

Thomas van der Jagt, Geurt Jongbloed, Martina Vittorietti

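AI summary: Proposes a consistent inversion method for Poisson-Laguerre tessellations that recovers the weighted generators from observed tessellation cells. The recovered generators are then used to estimate the weight distribution nonparametrically.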

English abstract

While it is well known how to compute the cells of a Laguerre tessellation for a given set of weighted generator points, it is not obvious how to invert a Laguerre tessellation. That is, given an observed Laguerre tessellation, how can one retrieve the weighted generators corresponding to the observed cells? In this paper, we consider inversion of a class of random Laguerre tessellations known as Poisson-Laguerre tessellations. The weighted generators of the observed cells of a Poisson-Laguerre tessellation are of interest because knowing them is useful for statistical inference on Poisson-Laguerre tessellations. For general Laguerre tessellations, we provide a characterization of all configurations of weighted generator points that yield the same Laguerre tessellation. For Poisson-Laguerre tessellations, we propose a method for consistent inversion: as the tessellation is observed through increasing observation windows, increasingly close approximations of the original weighted generators are obtained. In a simulation study, we examine both the performance of the inversion procedure and the use of the resulting approximate weighted generators for nonparametrically estimating the weight distribution function of a Poisson-Laguerre tessellation.
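
The forward direction mentioned in the first sentence is simple: a point belongs to the cell of the generator that minimizes the power distance `||x - g_i||^2 - w_i`. The sketch below computes cell labels this way. It also illustrates one simple source of the non-uniqueness that the characterization addresses: adding the same constant to every weight leaves the tessellation unchanged. The inversion procedure itself is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def laguerre_labels(points, generators, weights):
    """Index of the Laguerre cell containing each point.

    points: (N, d); generators: (M, d); weights: (M,).
    Cell i = {x : ||x - g_i||^2 - w_i <= ||x - g_j||^2 - w_j for all j}.
    """
    sq_dist = ((points[:, None, :] - generators[None, :, :]) ** 2).sum(axis=-1)
    power = sq_dist - weights[None, :]          # power distance to each generator
    return np.argmin(power, axis=1)
```

Raising one generator's weight enlarges its cell. With generators at (0, 0) and (2, 0), weights (1, 0) move the cell boundary from x = 1 to x = 1.25.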

2606.02062 2026-06-02 stat.ME

Evaluating the role of correlation among markers in prediction models

Sergio Sabroso-Lasa, Luis Mariano Esteban, Tomás Alcalá-Nalvaiz, Francisco J. Jurado, Núria Malats

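AI summary: Uses theoretical derivation, simulation, and real-data analysis to study how correlation among markers affects the AUC of prediction models. Negative correlation maximizes the combined AUC, while positive correlation performs worst.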

Comments
31 pages, 1 table, 4 figures

English abstract

Different methods have been employed to estimate models maximizing the area under the receiver operating characteristic curve (ROC-AUC). Once a model is developed, integrating novel biomarkers may improve its diagnostic ability. However, the discrimination improvement from adding a new biomarker is not always evident, even if the marker itself has good discriminatory power. The sign and magnitude of correlations between biomarkers may impact model performance. In this paper, we assess the effect of such correlations on the discrimination ability of predictive models. Under multivariate normality, we derive an expression for the maximum AUC as a function of the correlations between markers, illustrated graphically using surfaces. Logarithmic folded bivariate normal and Gamma simulations address skewed data cases. Additionally, AUC improvement was assessed combining 1934 blood lipid metabolites determined by liquid chromatography in 44 pancreatic cancer cases and 38 controls from the PanGenMic Study. Our results show that negative correlations consistently maximize the combined AUC, offering the greatest improvements when markers have equal predictive ability, while positive correlations yield the least favorable results. Negative correlations remain optimal for markers with differing abilities, though positive correlations show slight benefits. Simulations with skewed distributions confirm these trends, emphasizing the role of asymmetry in marker selection. Real-world analysis of serum lipid-derived metabolites for detecting pancreatic ductal adenocarcinoma (PDAC) reinforces the influence of correlations on AUC optimization. These findings suggest that the sign and magnitude of inter-biomarker correlations should be considered when incorporating new markers into predictive algorithms.
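
Suppose, as an additional assumption, that cases and controls share the same covariance matrix `Sigma`. The best linear combination of binormal markers then has AUC `Phi(sqrt(delta' Sigma^{-1} delta / 2))`, where `delta` is the vector of mean differences. The sketch below applies this to two standardized markers specified by their individual AUCs and their correlation `rho`. The paper's expression may be more general than this equal-covariance special case, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def max_auc_two_markers(auc1, auc2, rho):
    """Maximal AUC of a linear combination of two binormal markers.

    Assumes unit variances and correlation rho in both cases and controls,
    so that each marker's own AUC is Phi(delta_k / sqrt(2)).
    """
    delta = np.sqrt(2.0) * norm.ppf([auc1, auc2])      # mean differences
    sigma = np.array([[1.0, rho], [rho, 1.0]])         # shared covariance
    mahal2 = delta @ np.linalg.solve(sigma, delta)     # delta' Sigma^{-1} delta
    return float(norm.cdf(np.sqrt(mahal2 / 2.0)))
```

With equal marginal AUCs, `delta' Sigma^{-1} delta = 2 delta^2 / (1 + rho)`. The combined AUC therefore increases as `rho` decreases, consistent with the finding that negative correlations give the largest improvement.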

2606.02059 2026-06-02 stat.ME

ICCDesign: An R Package for the Design and Analysis of ICC-Based Reliability Studies with Continuous Responses

Ziyu Liu, Ruilin Ma, Yundan Zhang, Chenge Gao

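AI summary: Introduces the ICCDesign R package, which integrates point estimation, confidence intervals, hypothesis testing, sample size planning, and reliability evaluation for ICC-based reliability studies. It also provides an interactive Shiny app.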

Comments
22 pages, 3 tables, R package

English abstract

The intraclass correlation coefficient (ICC) is among the most widely used statistics in reliability research, playing a central role in medical measurement, psychological assessment, and behavioral science. However, practical application of ICC faces two major obstacles. First, ICC can be organized into multiple forms under the McGraw and Wong (1996) framework -- including six widely reported standard forms and four additional design combinations -- and researchers must select the appropriate form based on their study design, yet existing guidelines are not always operationalized in software interfaces. Second, available R tools are highly fragmented: sample size calculation, ICC estimation with confidence intervals, and reliability evaluation are distributed across separate packages, compelling researchers to switch between tools and increasing the risk of analytical errors. This paper introduces the ICCDesign package, designed specifically to provide an integrated workflow for ICC-based reliability studies with continuous responses, assuming one continuous rating per subject-rater cell. The package integrates four core functionalities: (1) point estimation, ANOVA-based confidence intervals, and implemented hypothesis tests for supported ICC design combinations following the McGraw and Wong (1996) framework, with a built-in four-step decision framework guiding users toward an appropriate ICC form; (2) sample size planning based on Zou's (2012) closed-form formulas, supporting two planning modes and an inverse assurance calculation; (3) automated reliability evaluation based on Koo and Li (2016) criteria, with an uncertainty notification when the confidence interval spans the 0.75 good-reliability threshold; and (4) an interactive Shiny web application covering the main analysis and planning functionalities. ICCDesign is available from GitHub at https://github.com/KlariZhang/ICCDesign.
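
ICCDesign itself is an R package, and the abstract does not describe its API. The following Python sketch therefore only illustrates the standard two-way ANOVA formulas from the McGraw and Wong framework, assuming one rating per subject-rater cell. It covers single-rater consistency, ICC(C,1), and absolute agreement, ICC(A,1), together with the Koo and Li (2016) labels. Treating each cut-off as the lower bound of the next category is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np


def icc_two_way(ratings):
    """ICC(C,1) and ICC(A,1) from an n_subjects x k_raters matrix (one rating per cell)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # between raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols   # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    icc_c1 = (msr - mse) / (msr + (k - 1) * mse)
    icc_a1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return icc_c1, icc_a1


def koo_li_label(icc):
    """Koo and Li (2016) categories; boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# Small illustrative rating matrix: 6 subjects rated by 4 raters.
example = np.array([[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                    [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]])
c1, a1 = icc_two_way(example)
print(round(c1, 3), round(a1, 3), koo_li_label(c1))
```

For this matrix, ICC(C,1) is about 0.71 and ICC(A,1) about 0.29. The large gap comes from systematic rater differences, which penalize absolute agreement but not consistency. This is why the choice of ICC form matters.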

2606.02055 2026-06-02 cs.IT cs.LG cs.SI math.IT stat.ML

Query-Limited Community Recovery in Stochastic Block Models

Sabyasachi Basu, Manuj Mukherjee, Lutz Oettershagen, Suhas Thejaswi

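AI summary: Studies exact community recovery in the two-community stochastic block model using adaptive query strategies under limited and noisy access to network data. It proves that adaptive querying can go beyond the information-theoretic limits of non-adaptive benchmarks.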

Abstract

We study exact community recovery in the two-community stochastic block model on $n$ vertices under limited and noisy access to network data. The learner may query a noisy neighborhood oracle that reveals each true neighbor of a queried vertex independently with fixed probability and never returns non-neighbors, subject to a finite query budget. We consider both oracle-only access and a combined model where the learner also observes a single subsampled copy of the underlying graph. For oracle-only access, balanced uniform querying gives a sharp non-adaptive benchmark: when each vertex is queried the same integer number of times, the observations reduce to an SBM with attenuated edge probabilities and the Abbe-Bandeira-Hall exact-recovery threshold applies. We show that this benchmark is not adaptively optimal: a two-stage adaptive strategy succeeds with $n+o(n)$ queries in a regime where balanced uniform querying requires $m n$ queries for some $m>1$. With an additional subsampled graph, we prove a sublinear-query adaptivity gap: balanced data-independent uniform querying with a sublinear budget does not improve over the subsampled graph alone, whereas adaptive querying can target a small set of uncertain vertices and achieve exact recovery. Thus adaptive data acquisition can strictly improve the information-theoretic limits of exact recovery.
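The balanced uniform benchmark can be made concrete with the Abbe-Bandeira-Hall condition. For two equal communities with within- and between-community edge probabilities a log(n)/n and b log(n)/n, exact recovery is possible when (sqrt(a) - sqrt(b))^2 > 2 and impossible below it. The sketch below rests on an illustrative assumption that is not taken from the paper: each of the 2m queries made to an edge's two endpoints reveals that edge independently with probability `reveal_prob`, and the edge is observed if any of them does. Under that assumption both edge probabilities are scaled by the same factor. The helper then returns the smallest integer m, and hence m * n total queries, that clears the threshold. All names are hypothetical.

```python
import math


def abh_exact_recovery(a, b):
    """Abbe-Bandeira-Hall condition for two balanced communities with edge probabilities
    a*log(n)/n (within) and b*log(n)/n (between): (sqrt(a) - sqrt(b))^2 > 2."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 > 2.0


def attenuation(reveal_prob, m):
    """Probability that an edge is seen at least once when each endpoint is queried m times.
    Assumption: each of the 2m queries reveals the edge independently with reveal_prob."""
    return 1.0 - (1.0 - reveal_prob) ** (2 * m)


def min_uniform_queries(a, b, reveal_prob, max_m=1000):
    """Smallest per-vertex query count m (total m * n queries) meeting the threshold, or None."""
    for m in range(1, max_m + 1):
        f = attenuation(reveal_prob, m)
        if abh_exact_recovery(f * a, f * b):
            return m
    return None
```

Scaling both probabilities by f scales (sqrt(a) - sqrt(b))^2 by f, so the loop could be replaced by a closed form. The loop simply keeps the logic explicit.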

2606.02047 2026-06-02 stat.ML cs.LG math.ST stat.ME stat.TH

Convex Distance Operator Transport: A Convex and Geometry-Preserving Formulation

Junhyoung Chung, Euijong Song, Won Hwa Kim, Gunwoong Park

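AI summary: Proposes Convex Distance Operator Transport (CDOT), which aligns heterogeneous distributions by using operator regularization to jointly preserve feature correspondence and intrinsic geometric structure, and proves its pseudometric property and its relationship to Gromov-Wasserstein.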

Comments
This paper is 41 pages long, contains 6 figures, and has been accepted to ICML 2026

Abstract

We introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation operators. Consequently, the proposed regularization improves the robustness to local geometric variations. We further prove that the resulting CDOT discrepancy is a valid pseudometric on the space of attributed compact metric-measure spaces. In addition, we characterize the relationship between CDOT and Gromov--Wasserstein (GW) through a new notion of dispersion gap, formally elucidating the geometric source of non-convexity in GW compared to the convexity of CDOT. In the finite-sample regime, we derive a non-asymptotic risk bound decomposed into optimization and statistical errors, establishing risk consistency under a globally convergent Frank--Wolfe algorithm. Experiments on synthetic point clouds, brain connectomes, and graph classification benchmarks demonstrate improved performance over existing methods, with stable and reliable behavior in practice.

2606.02017 2026-06-02 stat.ME stat.AP stat.ML

PliableBVS: A flexible Bayesian variable selection method for modeling interactions with mandatory modifying variables

Theophilus Quachie Asenso, Zhi Zhao, Maren-Helene Langeland Degnes, Marie Cecilie Paasche Roland, Trond Melbye Michelsen, Manuela Zucknick

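AI summary: Proposes PliableBVS, which uses hierarchical spike-and-slab priors to select high-dimensional main and interaction effects simultaneously within a Bayesian framework while preserving the weak hierarchical constraint, and which outperforms the pliable lasso.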

Abstract

High-dimensional interaction models are useful for studying, for example, how a large set of variables of interest, such as gene expression or other omics features, interacts with a smaller set of modifying variables, such as clinical covariates. In this context, the pliable lasso has recently been proposed as an efficient method for screening large numbers of potential interaction terms under an asymmetric weak hierarchical constraint. In this work, we extend this framework by introducing PliableBVS, a Bayesian variable selection approach that preserves the hierarchical structure of the pliable lasso while inducing sparsity through spike-and-slab priors. The proposed model combines the continuous shrinkage effect of the Bayesian lasso with a hierarchical spike-and-slab prior formulation that has two layers of decision variables: one governing the inclusion of main effects and another governing the inclusion of interaction effects, conditional on the inclusion of the corresponding main effects. This structure enables simultaneous selection of high-dimensional main and interaction effects within a coherent probabilistic framework. In simulation studies, the proposed method outperforms the original pliable lasso in most scenarios in identifying active main and interaction effects, reducing false discoveries, and improving prediction accuracy. Applications with data from a labor onset study and a preeclampsia study demonstrate that PliableBVS selects biologically meaningful features and interactions.
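The two-layer indicator structure can be illustrated by sampling from a prior of this shape. The sketch below is an assumption-laden illustration, not the authors' exact prior. Here `gamma[j]` switches main effect j on, and `delta[j, k]` switches on the interaction between variable j and modifier k. Each `delta[j, k]` is forced to zero whenever `gamma[j]` is zero. Included coefficients use a Laplace (Bayesian-lasso-style) slab with scale `slab_scale`. The linear predictor follows the pliable lasso form eta = Z theta0 + sum_j X_j (beta_j + Z theta_j), with the intercept omitted. All function and parameter names are hypothetical.

```python
import numpy as np


def sample_indicators(p, K, pi_main, pi_inter, rng):
    """Main-effect indicators gamma (p,) and interaction indicators delta (p, K).
    delta[j, :] can only be nonzero when gamma[j] == 1 (the hierarchy)."""
    gamma = rng.random(p) < pi_main
    delta = (rng.random((p, K)) < pi_inter) & gamma[:, None]
    return gamma.astype(int), delta.astype(int)


def sample_coefficients(gamma, delta, slab_scale, rng):
    """Spike at exactly zero, Laplace slab for included terms."""
    beta = gamma * rng.laplace(0.0, slab_scale, size=gamma.shape)
    theta = delta * rng.laplace(0.0, slab_scale, size=delta.shape)
    return beta, theta


def pliable_linear_predictor(X, Z, beta, theta, theta0):
    """eta_i = Z_i theta0 + sum_j X_ij (beta_j + Z_i theta_j)."""
    modified = beta[None, :] + Z @ theta.T  # (n, p): per-observation coefficients
    return Z @ theta0 + np.sum(X * modified, axis=1)
```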

2606.02008 2026-06-02 stat.ML cs.LG

Provable Data Scaling Law for Meta Learning via Complexity Minimization

Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui

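AI summary: Proposes a complexity minimization framework that minimizes the worst-case downstream model complexity across source domains, and proves theoretically that increasing the scale of pre-training data in meta-learning improves few-shot adaptation performance.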

Abstract

Pre-training has become a fundamental paradigm in modern machine learning, with one of its key empirical benefits being reduced downstream sample complexity as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior, which learns representations by evaluating the downstream model complexity best suited to each domain and minimizing the worst-case such complexity across source domains. Our end-to-end theoretical analysis, spanning pre-training through downstream regression, shows that this framework provably captures this scaling behavior; in particular, we show that the error rate of few-shot adaptation improves as the amount of meta-training data grows. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.

2606.01990 2026-06-02 stat.ME math.ST stat.TH

Testing for Single-Population Ancestry in the Admixture Model

Holger Dette, Carola Sophia Heinzel, Zoe Lange, Peter Pfaffelhuber

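AI summary: Proposes a statistical test of the single-population ancestry hypothesis in the Admixture Model, calibrated by a constrained parametric bootstrap, that controls the error rate and reliably detects dominant ancestry components.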

Abstract

The Admixture Model describes genetic marker data by representing each individual's genome as a mixture of contributions from $K$ ancestral populations, with the individual admixture vector summarizing the corresponding ancestry proportions. In population and forensic genetics, a key question is whether an individual's genome supports a predominantly single-ancestry interpretation or whether an admixed interpretation is more appropriate. We propose a statistical test for single-population ancestry in the supervised Admixture Model, where ancestral allele frequencies are treated as known. The test assesses whether the largest admixture component exceeds a practitioner-chosen dominance threshold, giving precise meaning to the notion of a sufficiently strong single-population contribution. To calibrate the test, we develop a constrained parametric bootstrap procedure that generates data under a null-constrained maximum likelihood estimator, accounting for the constrained hypothesis structure, the marker-wise heterogeneity and small sample sizes. Under standard regularity conditions, we prove that the proposed test has asymptotic level $α$ and is consistent, ensuring control of false single-ancestry declarations while reliably detecting dominant ancestry components. Simulation studies demonstrate good finite-sample performance across different numbers of ancestral populations, marker-panel sizes, dominance thresholds, and allele-frequency distributions. We further illustrate the practical utility of the method using data from the 1000 Genomes Project. The proposed framework delivers interpretable, threshold-based ancestry assessment with rigorous error control, and extends constrained bootstrap methodology to the independent but non-identically distributed setting of genetic marker data.
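The test targets the largest admixture component, which is computed from the supervised-model maximum likelihood estimate of the individual's admixture vector. A minimal sketch of that estimation step uses the standard EM update for supervised admixture. It assumes biallelic diploid markers, genotypes coded as reference-allele counts, and known allele frequencies `P`. The null-constrained estimator and the parametric bootstrap calibration described in the abstract are not shown. `admixture_q_em` and `dominance_gap` are hypothetical names.

```python
import numpy as np


def admixture_q_em(g, P, n_iter=500, tol=1e-10):
    """EM estimate of one individual's admixture proportions in the supervised model.

    g: (M,) genotypes coded as reference-allele counts in {0, 1, 2}.
    P: (M, K) known reference-allele frequencies, strictly inside (0, 1).
    Model: g_m ~ Binomial(2, sum_k q_k P[m, k]) independently across markers.
    """
    g = np.asarray(g, dtype=float)
    M, K = P.shape
    q = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        p = P @ q  # individual-specific allele frequency at each marker
        # Expected share of each marker's allele copies attributed to population k.
        ref = g[:, None] * P * q / p[:, None]
        alt = (2.0 - g)[:, None] * (1.0 - P) * q / (1.0 - p)[:, None]
        q_new = (ref + alt).sum(axis=0) / (2.0 * M)
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q


def dominance_gap(q, tau):
    """Largest admixture component minus the dominance threshold tau."""
    return float(np.max(q) - tau)
```

Each EM update stays on the probability simplex automatically, because the per-marker responsibilities sum to the two allele copies.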

2606.01960 2026-06-02 stat.ME math.ST stat.TH

Return-to-Baseline Testing via Empirically Calibrated e-processes

Marta Regis, Paulo Serra

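AI summary: Proposes a sequential, distribution-free testing procedure that uses universal inference and supermartingales to define a discrepancy measure. The measure is empirically calibrated on baseline data to form an e-process that detects when the data distribution returns to baseline in high-frequency monitoring data collected before and after an intervention, with anytime-valid error control.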

Abstract

We consider the problem of detecting a Return to Baseline (RtB) in high-frequency monitoring data preceding and following an intervention, where the aim is to identify the time at which the data-generating distribution realigns with its pre-intervention distribution. We propose a sequential, distribution-free testing procedure that does not rely on specifying a null model and provides anytime-valid error control. The method relies on ideas from universal inference to define a discrepancy measure that is aggregated into a non-negative supermartingale and then empirically calibrated to form an e-process. The calibration is performed using the baseline data and is thus subject-specific. We establish finite-sample bounds for the calibration error (under a flexible non-parametric assumption), discuss the impact of tuning parameters and computational complexity, and illustrate through simulations and a clinical case study that the procedure accurately detects RtB from monitoring data.
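The anytime-valid guarantee rests on Ville's inequality. A nonnegative (super)martingale that starts at 1 ever reaches 1/alpha with probability at most alpha under the null, so a test that rejects at that crossing stays valid under optional stopping. The sketch below shows only this generic mechanism, using a plug-in likelihood-ratio martingale for the simple null that observations are i.i.d. N(0, sigma^2) with known sigma. It is not the authors' universal-inference discrepancy or their empirical calibration on baseline data, and the function name is hypothetical.

```python
import math


def plugin_e_process(xs, sigma=1.0, alpha=0.05):
    """Sequential test of H0: x_1, x_2, ... i.i.d. N(0, sigma^2).

    Each factor is the likelihood ratio N(x_t; mu_t, sigma^2) / N(x_t; 0, sigma^2), where
    mu_t is the running mean of x_1..x_{t-1} (a prior pseudo-observation at 0 gives mu_1 = 0).
    Because mu_t uses only past data, the running product has expectation 1 under H0, and
    by Ville's inequality it ever reaches 1/alpha with probability at most alpha.
    Returns (rejection time or None, path of e-values).
    """
    log_e, total, count = 0.0, 0.0, 1.0
    path = []
    for t, x in enumerate(xs, start=1):
        mu = total / count  # predictable: depends on x_1..x_{t-1} only
        log_e += (mu * x - 0.5 * mu * mu) / sigma ** 2
        path.append(math.exp(log_e))
        if log_e >= math.log(1.0 / alpha):
            return t, path
        total += x
        count += 1.0
    return None, path
```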

2606.01954 2026-06-02 cs.LG stat.ML

Flow-Transformed Implicit Processes for Function-Space Variational Inference

Luis A. Ortega, Andrés R. Masegosa, Thomas D. Nielsen

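AI summary: Proposes Flow-Transformed Implicit Processes (FTIP), which enrich the variational distribution over combination weights with a normalizing flow to capture asymmetric, heavy-tailed, and multimodal posterior structure in function space, and optimizes them with a black-box α objective.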

Comments
24 pages, 4 figures, 10 tables. Pre-print submitted for revision

Abstract

Implicit-process priors define distributions over functions through flexible generative mechanisms, making them attractive for Bayesian function-space modelling. However, performing posterior inference with such priors is challenging because their induced function-space distributions are typically not available in closed form. One practical strategy is to approximate the prior using a finite collection of sampled functions, and then represent posterior functions as learned combinations of these samples. Existing approaches commonly place a Gaussian variational distribution over the combination weights. While tractable, this choice limits the shapes of posterior uncertainty that can be represented, especially when the true posterior is asymmetric, heavy-tailed, or multimodal. We propose Flow-Transformed Implicit Processes (FTIP), a variational inference method that makes this finite-dimensional function-space approximation more expressive. Instead of using a Gaussian distribution over the combination weights, FTIP uses a normalizing flow to define a richer variational distribution. This induces a flexible posterior distribution over functions while preserving tractable optimization. We train the model using a Black-Box α objective, allowing us to compare mass-covering and mode-seeking variational behaviour. Experiments show that FTIP captures asymmetric and multimodal posterior structure in function space that Gaussian coefficient approximations tend to smooth or collapse.

2606.01932 2026-06-02 stat.ME stat.AP

Spatial Capture-Recapture With Penalized Regression Splines to Flexibly Model Wildlife Density and Distribution

Andrew E. Seaton, David L. Borchers, Milou Groenenberg, Ben Stevenson

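AI summary: Proposes a spatial capture-recapture framework that combines penalized regression splines with Laplace-approximate penalized marginal maximum likelihood. By approximating a log-Gaussian Cox process, it flexibly models the activity-centre distribution and improves estimates of animal spatial distribution.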

Abstract

Spatial capture-recapture models are routinely used to estimate the abundance and distribution of wild animal populations and involve a latent spatial point process of animal activity centres that describes the spatial distribution of individuals. While traditional spatial capture-recapture models use a Poisson process, the assumption of conditional independence between points is often violated in practice due to factors not included in the point process, such as social clustering, territoriality, or preferential selection of habitat due to unobserved covariates. Log-Gaussian Cox processes are commonly used in spatial statistics to overcome weaknesses of Poisson processes, but methods to fit them within spatial capture-recapture do not currently exist. Here, we present a spatial capture-recapture framework that allows for the use of penalized regression splines to describe the activity centre distribution, with model fitting via a Laplace-approximate penalized marginal maximum likelihood approach. Our method approximates the use of a log-Gaussian Cox process for activity centres and allows flexible modelling of nonlinear covariate effects on density. We illustrate the use of our method with a simulation study and two case studies. We demonstrate that, while population size estimates from traditional approaches are robust to density model misspecification, our approach substantially improves the estimation of spatial animal distributions.
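The observation side of a spatial capture-recapture model links the latent activity-centre density to the expected number of detected animals through a detection function. The sketch below assumes a half-normal detection function with baseline probability `g0` and scale `sigma`, independent detections across traps and occasions, and a density surface evaluated on a grid of cell centres. In the authors' framework that density would come from a penalized regression spline. Here it is simply passed in as an array. Function names are hypothetical.

```python
import numpy as np


def detection_prob(centres, traps, g0, sigma, occasions):
    """P(an animal with activity centre s is detected at least once), for each centre.
    centres: (S, 2) grid coordinates; traps: (K, 2) trap coordinates."""
    d2 = ((centres[:, None, :] - traps[None, :, :]) ** 2).sum(axis=2)  # (S, K) squared distances
    p_trap = g0 * np.exp(-d2 / (2.0 * sigma ** 2))                      # per occasion, per trap
    p_never = np.prod((1.0 - p_trap) ** occasions, axis=1)              # missed at every trap and occasion
    return 1.0 - p_never


def expected_detected(density, centres, cell_area, traps, g0, sigma, occasions):
    """Riemann-sum approximation of the integral of D(s) * p.(s) over the region."""
    p_dot = detection_prob(centres, traps, g0, sigma, occasions)
    return float(np.sum(density * p_dot) * cell_area)
```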

2606.01827 2026-06-02 math.OC cs.LG stat.ML

Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler

Dimitris Oikonomou, Nicolas Loizou

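AI summary: Addresses the sensitivity of Sharpness-Aware Minimization (SAM) to the learning rate. Inspired by the stochastic Polyak step size, it derives Polyak schedulers for SAM, yielding adaptive algorithms with convergence proofs in deterministic and stochastic settings. Experiments show performance comparable to or better than tuned SAM baselines.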

Comments
43rd International Conference on Machine Learning (ICML 2026)

Abstract

Sharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while delivering strong empirical performance. However, SAM and its variants, like most training algorithms, are sensitive to the choice of learning rate, which is typically selected through extensive hyperparameter tuning or predefined schedulers. In this work, motivated by recent advances on the effectiveness of stochastic Polyak step sizes for Stochastic Gradient Descent (SGD), we derive Polyak schedulers tailored to SAM-style updates, yielding novel adaptive algorithms in both deterministic and stochastic settings. In the smooth setting, we prove linear convergence for strongly convex objectives and an $\mathcal{O}(1/T)$ convergence rate for convex objectives in the deterministic case. In the stochastic setting, we establish analogous convergence guarantees up to a neighborhood of the optimum. Numerical experiments demonstrate that the proposed Polyak schedulers achieve performance comparable to or better than carefully tuned SAM baselines, while substantially reducing the need for learning-rate tuning.
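The sketch below shows how a Polyak-type rule can replace a hand-tuned learning rate. It applies the classical Polyak step gamma_t = (f(x_t) - f*) / (c ||g_t||^2) to an unnormalized SAM update, where g_t is the gradient at the perturbed point x_t + rho * grad f(x_t). This particular combination, the constant `c`, and the assumption that the optimal value `f_star` is known are illustrative choices, not the schedulers derived in the paper. The function name is hypothetical.

```python
import numpy as np


def sam_polyak(f, grad, x0, rho=0.05, c=1.0, f_star=0.0, n_steps=100, tiny=1e-30):
    """Unnormalized SAM with a Polyak-type step size (illustrative variant).

    Perturbed gradient: g = grad(x + rho * grad(x)).
    Step size: gamma = (f(x) - f_star) / (c * ||g||^2), so no learning rate is tuned.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        g = grad(x + rho * grad(x))
        g_sq = float(g @ g)
        if g_sq < tiny:  # (near-)stationary point: stop to avoid dividing by zero
            break
        gamma = (f(x) - f_star) / (c * g_sq)
        x = x - gamma * g
    return x
```

On the isotropic quadratic f(x) = ||x||^2 / 2 with f* = 0, each step multiplies x by the constant factor 1 - 1/(2c(1 + rho)). The iterates therefore converge linearly without any learning-rate tuning.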

2606.01799 2026-06-02 cs.LG stat.ML

Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

Pu Wang, Yao-Xiang Ding

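AI summary: For N-armed stochastic dueling bandits under the Condorcet-winner assumption, proposes Tree-Guided Identify-Then-Exploit (TG-ITE), a unified framework that uses a shared tree-guided identification method to find a high-confidence incumbent within O(N) comparisons and designs exploitation strategies for different objectives. It is the first to simultaneously achieve O(N) sample complexity for best-arm identification, O(N) weak regret, and O(N log T) strong regret guarantees, and it eliminates the suboptimal O(log N) gap of existing methods.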

Abstract

We study $N$-armed stochastic dueling bandits under the Condorcet-winner assumption, considering three widely adopted objectives: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Identify-Then-Exploit (TG-ITE), to our knowledge the first unified framework to tackle all three objectives. Without requiring stronger assumptions, we propose a shared tree-guided identification approach that finds a high-confidence incumbent within $O(N)$ comparisons. We further propose different exploitation strategies that use this warm-start stage to optimize each specific objective. This methodology enables our approach to (1) achieve $O(N)$ sample complexity in BAI without the commonly adopted stronger assumptions; (2) yield the first winner-stays-style algorithm to achieve $O(N)$ weak regret; (3) enjoy the same $O(N \log T)$ guarantee as specialized strong-regret approaches; and (4) realize the joint optimization of BAI and weak regret with $O(N)$ guarantees for both, eliminating the sub-optimal $O(\log N)$ gap in the existing approach. Our results provide evidence that the trade-off between BAI and regret minimization is relatively benign in dueling bandits.
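A tree-shaped identification phase with a linear comparison budget can be sketched as a single-elimination tournament. N arms play N - 1 matches, so if each match uses a fixed number of duels the total budget is O(N). The Python sketch below is a generic knockout that assumes an odd, fixed `duels_per_match`. In practice that number would have to depend on the preference gaps and the target confidence. This is not the TG-ITE procedure, and all names are hypothetical. `P[i][j]` denotes the probability that arm i beats arm j.

```python
import random


def duel(P, i, j, rng):
    """One noisy comparison: True if arm i beats arm j, where P[i][j] = Pr(i beats j)."""
    return rng.random() < P[i][j]


def knockout_incumbent(P, duels_per_match, rng):
    """Single-elimination tree over all arms.

    N arms play N - 1 matches; each match is decided by majority over an odd, fixed
    number of duels, so the total number of comparisons is (N - 1) * duels_per_match.
    """
    if duels_per_match % 2 == 0:
        raise ValueError("duels_per_match must be odd so that majorities are strict")
    alive = list(range(len(P)))
    comparisons = 0
    while len(alive) > 1:
        winners = []
        for pos in range(0, len(alive) - 1, 2):
            i, j = alive[pos], alive[pos + 1]
            wins_i = sum(duel(P, i, j, rng) for _ in range(duels_per_match))
            comparisons += duels_per_match
            winners.append(i if 2 * wins_i > duels_per_match else j)
        if len(alive) % 2 == 1:
            winners.append(alive[-1])  # odd arm out gets a bye this round
        alive = winners
    return alive[0], comparisons
```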

2606.01796 2026-06-02 stat.ME

LoopPerm-CPD: A Robust Loop Permutation Framework for Automatic Multiple Change-Point Detection in Longitudinal Data

Xuejun Sun, Oliver Li, Qianhui Zheng, Xiaojing Zheng, Fei Zou

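AI summary: To address sparse and irregular sampling, small sample sizes, and outliers in longitudinal data, proposes LoopPerm-CPD, a robust loop-permutation-based change-point detection method that automatically detects multiple change points.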

Abstract

Human viral challenge studies, in which participants are deliberately inoculated with influenza strains such as H1N1 or H3N2 and monitored through longitudinal transcriptomic profiling before and after inoculation, are critical for characterizing dynamic biological immune responses to viral infection. A key analytical goal in such settings is to detect critical transition times, or change points, at which an underlying trajectory shifts direction or rate, indicating events such as the onset of an immune response or recovery. However, change-point detection in these longitudinal data is fundamentally challenging because observations are often sparse and irregularly spaced, sample sizes are small, outliers are common, and the number of change points is unknown in advance. To address these challenges, we propose LoopPerm-CPD, a robust change-point detection approach with a built-in loop permutation procedure for automatic multiple change-point detection. The method evaluates candidate slope change points and assesses their significance using within-subject circular permutation combined with binary segmentation, jointly estimating both the number and locations of change points. The accompanying R package, LoopPerm-CPD, implements this framework and flexibly accommodates generalized least squares, quantile regression, and quantile rank-score statistics for different types of longitudinal outcomes. The proposed approach is evaluated through simulations, demonstrating Type I error control and improved power compared with competing methods. Applied to real data, the framework identifies interpretable transition points in multiple human respiratory viral inoculation studies. Together, these results establish LoopPerm-CPD and its companion software as a robust and user-friendly tool for change-point detection in complex human longitudinal cohort data.

2606.01724 2026-06-02 stat.AP cs.NI

Mapping the Storm: Geospatial Impacts of Severe Weather on LEO Network Performance

Sina Ehsani, Bhanu Pallakonda, Pragyana K. Mishra

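AI summary: Using more than 870,000 terminal hours of telemetry together with spatial joins and time-aligned analysis, this study shows that severe weather such as thunderstorms significantly degrades Starlink LEO network performance, with more than 55% of affected terminals experiencing severe degradation. It also proposes a framework for geospatial prediction and weather-aware network planning.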

Journal ref
Proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems, pp. 163-172. 2025
Comments
10 pages, 8 figures, presented at the proceedings of the 33rd ACM International Conference on Advances in Geographic Information Systems

Abstract

LEO satellite constellations, led by deployments such as Starlink, are playing an increasingly pivotal role in enabling global broadband connectivity. However, the reliability and performance of these space-based networks are highly sensitive to environmental dynamics, particularly localized weather phenomena that exhibit strong spatio-temporal variability. In this study, we present a continental-scale geospatial analysis of weather-induced performance degradation in the Starlink LEO network, with a focus on the contiguous United States. Leveraging a unique dataset comprising more than 870,000 terminal hours of minute-level telemetry from 1,292 Starlink terminals, we integrate high-resolution localized weather observations to quantify the impact of various meteorological conditions. We evaluate key performance indicators (KPIs), including ping latency, ping drop rate, and signal quality, using spatial join techniques and time-aligned correlation with classified weather events. Our analysis reveals that severe weather events, such as thunderstorms with heavy rain or snow, have a pronounced effect on network performance. In particular, more than 55% of affected terminals experienced substantial degradation. Temporal continuity analysis at the minute level shows that such degradation can lead to sustained impairments or full service outages lasting from several minutes to multiple hours. This work constitutes the first large-scale empirical study linking LEO satellite Internet performance with fine-grained weather data in both space and time. Our findings offer actionable insights for geospatial predictive modeling, weather-aware network provisioning, and resilient satellite communication system design. We also propose a framework for incorporating weather-inferred performance variability into future geospatial planning and service-level forecasting tools for LEO-based Internet systems.

2606.01674 2026-06-02 stat.ME

Higher-Order Efficient Estimators: A Review and Simulation-Based Benchmark Study

Zeyi Wang, Mark J. van der Laan

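AI summary: Reviews second-order efficient estimators, using treatment-specific mean estimation as a causal inference and missing-data problem. It compares higher-order influence function estimators, kernel-based higher-order targeted minimum loss-based estimators, and highly adaptive lasso-based higher-order targeted minimum loss-based estimators, and uses simulations to evaluate their higher-order debiasing performance and stability.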

Abstract

Higher-order efficient estimators extend standard first-order semiparametric estimators by replacing second-order residuals with third- or higher-order terms, potentially enabling asymptotic efficiency under slower nuisance function convergence rates and improving finite-sample performance. Existing methods achieve higher-order expansions through structurally different approximation strategies, including basis truncation, kernel smoothing, and highly adaptive lasso (HAL) representations, making direct theoretical and practical comparison difficult. In this manuscript, we provide a focused review and a simulation-based empirical benchmark for second-order efficient estimators, using treatment-specific mean estimation as a canonical causal inference and missing data problem. We compare how higher-order influence function (HOIF) estimators, kernel-based higher-order targeted minimum loss-based estimator (HOTMLE), and HAL-based HOTMLE construct higher-order expansions and the approximation or regularization burdens they introduce. The asymptotic and numerical study evaluates first-order and empirical second-order estimators under controlled nuisance errors with constant or increasing sectional variation complexity. Results show that higher-order debiasing can substantially reduce first-order estimation bias; however, gains depend strongly on stability of the approximation or regularization required for higher-order correction. Empirical HAL-based HOTMLE shows relatively stable performance, while empirical HOIF remains sensitive to basis truncation and tuning choices. Overall, this manuscript clarifies when higher-order asymptotic improvements are attained in theory, when they may be practically visible, and when implementation instability may offset theoretical advantages.
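For the treatment-specific mean E[Y(1)], the first-order baseline that higher-order estimators refine is the one-step (augmented inverse-probability-weighted) estimator built from the efficient influence function. The sketch below takes user-supplied nuisance estimates `mu1_hat` (outcome regression among the treated) and `pi_hat` (propensity score). It returns the estimate with an influence-function standard error. It does not implement HOIF or HOTMLE corrections, and the function name is hypothetical.

```python
import numpy as np


def aipw_treated_mean(y, a, mu1_hat, pi_hat):
    """First-order one-step (AIPW) estimate of E[Y(1)] and its influence-function SE.

    y: outcomes; a: treatment indicators in {0, 1};
    mu1_hat: estimates of E[Y | A = 1, X]; pi_hat: estimates of P(A = 1 | X).
    """
    phi = mu1_hat + a / pi_hat * (y - mu1_hat)  # uncentred efficient influence function
    est = float(np.mean(phi))
    se = float(np.std(phi, ddof=1) / np.sqrt(len(y)))
    return est, se
```

The bias of this estimator is governed by a second-order remainder involving the product of the propensity and outcome-regression errors. Higher-order estimators target that remainder.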

2606.01669 2026-06-02 stat.ME

Beyond principal ignorability: Nonparametric sensitivity bounds for principal stratification

Xinyuan Chen, Michael O. Harhay, Fan Li

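AI summary: Addresses the non-identifiability of principal causal effects in the principal stratification framework by proposing a nonparametric sensitivity analysis based on a margin-free bounding factor parameterized by the relative risks of an unmeasured confounder. It derives sharp nonparametric bounds for each principal causal effect and generalizes them to principal generalized causal effects.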

Abstract

Principal stratification is an effective framework addressing intermediate variables in causal inference. However, point identification of the principal causal effects (PCEs) often requires the untestable principal ignorability (PI) assumption. This article develops a nonparametric sensitivity analysis framework for evaluating PI violations. We introduce a margin-free bounding factor parameterized by the selection and outcome relative risks of an unmeasured confounder. Using this bounding factor, we derive sharp nonparametric bounds for each PCE. We prove that these bounds nest within the worst-case nonparametric bounds with and without the monotonicity assumption. We then discuss Cornfield-type conditions and principal E-values that quantify the minimum joint magnitude of unmeasured confounding required to nullify the target PCE. Furthermore, we generalize this methodology to principal generalized causal effects, extending the sensitivity bounds and falsification thresholds to the recent pairwise comparison estimands evaluated over a product space.
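For orientation, the classical (non-principal) versions of these quantities are the Ding and VanderWeele joint bounding factor, BF = RR_EU * RR_UY / (RR_EU + RR_UY - 1), and the E-value, RR + sqrt(RR (RR - 1)). The E-value is the smallest common value of the two confounder associations whose bounding factor equals the observed risk ratio. The paper's principal E-values for principal causal effects are built on its own bounding factor and may differ from these. The sketch below only reproduces the classical formulas, with hypothetical function names.

```python
import math


def bounding_factor(rr_eu, rr_uy):
    """Ding-VanderWeele joint bounding factor for an unmeasured confounder U.

    rr_eu: maximal risk ratio relating exposure to U; rr_uy: maximal risk ratio
    relating U to the outcome. Both are assumed to be >= 1.
    """
    return rr_eu * rr_uy / (rr_eu + rr_uy - 1.0)


def e_value(rr):
    """Smallest equal strength of both associations whose bounding factor reaches rr."""
    if rr < 1.0:
        rr = 1.0 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1.0))
```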

2606.01661 2026-06-02 q-bio.NC q-bio.QM stat.ME

Feature leakage and the identifiability of direct-dependency entropy models of neural activity

Houman Safaai, Bernardo L. Sabatini

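AI summary: Studies the limitations of direct-dependency entropy models for predicting neural activity. It shows that their predictive success reflects only predictive ability under the sampled input distribution rather than mechanism identification, and proposes diagnostics that separate in-distribution prediction from recovery of the response rule.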

Abstract

Biological neurons receive thousands of synaptic inputs on branching, electrically excitable dendrites, yet population activity is often modeled with direct input-output rules in which each input contributes independently to a scalar drive. We study what successful prediction by such models does, and does not, reveal about neural computation. For conditional maximum-entropy models that match output rates and pairwise output-input coactivities, the entropy explained by a direct model is a prediction measure under the sampled input distribution, not a mechanism-identification test. A restricted MaxEnt fit is an information projection: omitted interaction, temporal, or hidden-state terms can be absorbed into fitted first-order parameters whenever they are correlated with the included sufficient statistics. For sparse correlated binary inputs, this absorption has an explicit coskewness form. We introduce diagnostics that separate in-distribution prediction from recovery of the response rule: state reweighting that holds P(y|x) fixed while changing P(x), conditional log-odds contrasts for local additivity, and temporal leakage controls. In ground-truth simulations, purely higher-order responses can pass first-order entropy and raw coactivity tests under leakage-prone sampling, but are correctly classified after reweighting. Applied to selected, leakage-enriched local tables from CA1 hippocampal recordings, approximately half of tables that appear first-order under empirical weights become distribution-sensitive under balanced reweighting, far above a matched additive-surrogate null. Thus direct entropy-explained fractions and raw coactivity predictions should be interpreted as predictions under the observed state distribution, not as evidence that mechanisms outside the direct model are absent or small.
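Two of the diagnostics have simple cores. For two binary inputs, local additivity on the log-odds scale can be checked with the interaction contrast logit P(y|1,1) - logit P(y|1,0) - logit P(y|0,1) + logit P(y|0,0). This contrast is zero exactly when the conditional log-odds are additive. State reweighting can give each sample a weight that depends only on its input state, so that every observed state receives the same total weight. Because the weights are constant within a state, P(y|x) is unchanged while P(x) becomes uniform over the observed states. The sketch below assumes discrete input states and uses hypothetical names. It is not the authors' implementation.

```python
import math
from collections import Counter


def logit(p):
    return math.log(p / (1.0 - p))


def interaction_contrast(p_y_given_x):
    """Log-odds interaction contrast for two binary inputs.

    p_y_given_x maps (x1, x2) -> P(y = 1 | x1, x2). The contrast is zero exactly when
    logit P(y = 1 | x) is additive in x1 and x2.
    """
    p = p_y_given_x
    return logit(p[(1, 1)]) - logit(p[(1, 0)]) - logit(p[(0, 1)]) + logit(p[(0, 0)])


def balanced_state_weights(states):
    """Per-sample weights giving every observed input state equal total weight.

    Weights are constant within a state, so P(y | x) is unchanged while P(x) becomes
    uniform over the observed states. The weights sum to 1.
    """
    counts = Counter(states)
    n_states = len(counts)
    return [1.0 / (n_states * counts[s]) for s in states]
```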

2606.01659 2026-06-02 econ.EM stat.ME stat.ML

Data-Automated Policy Learning for Nonlinear Welfare

Chunrong Ai, Zeqi Wu, Zheng Zhang

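AI summary: For nonlinear welfare criteria in a binary-treatment setting, this paper proposes an automated policy-learning method for observational data that, through reweighting-based debiasing and sieve approximation, attains theoretical guarantees over infinite-dimensional policy spaces.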

Abstract

This paper explores policy learning from observational data, focusing on a nonlinear welfare criterion in a binary treatment setting. The nonlinear criterion is inspired by scenarios where policymakers prioritize specific population segments. We model this criterion using a utility function that encompasses potential outcomes and intermediate parameters, with the latter capturing higher moments of the outcome distributions. When formulated in the context of observational data, both the intermediate parameters and the welfare criterion depend on the propensity score, which we estimate using machine-learning techniques. To address bias in machine learning estimates, we introduce a novel reweighting-based debiasing approach that offers a promising alternative to traditional orthogonality-based methods. To tackle the complexities of infinite-dimensional policy spaces, we employ sieve approximations and $K$-fold cross-validation for model selection, thereby fully automating the policy-learning process. Despite these complexities, we demonstrate that both the welfare regret and the average welfare regret of our proposed policy learning method satisfy an oracle inequality, thereby providing theoretical guarantees on the performance of the estimated policy relative to the best possible policy. This finding extends the existing results from linear to nonlinear welfare criteria, from finite-dimensional to infinite-dimensional policy spaces, and from a known propensity score to a machine-learned one.
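
To make "intermediate parameters that depend on the propensity score" concrete, here is a minimal sketch with an assumed criterion of our own choosing: welfare = mean minus lam times variance of the outcome under the policy. Both moments are estimated with self-normalized inverse-propensity weights. The propensity is known rather than machine-learned, and there is no debiasing, sieve, or cross-validation step; the simulated data and function names are hypothetical.

```python
import numpy as np


def policy_moments(y, d, e, treat):
    """Self-normalized (Hajek) inverse-propensity estimates of E[Y(pi)] and
    E[Y(pi)^2] for a deterministic policy, where treat[i] = pi(X_i) in {False, True},
    d is the observed treatment and e = P(D = 1 | X) is the propensity score."""
    follows = d == treat                    # unit's observed arm agrees with the policy
    p_follow = np.where(treat, e, 1.0 - e)  # P(D = pi(X) | X)
    w = follows / p_follow
    w = w / w.sum()
    return np.sum(w * y), np.sum(w * y**2)


def mean_variance_welfare(y, d, e, treat, lam=0.5):
    """Illustrative nonlinear welfare: E[Y(pi)] - lam * Var[Y(pi)].
    The variance plays the role of an intermediate (higher-moment) parameter."""
    m1, m2 = policy_moments(y, d, e, treat)
    return m1 - lam * (m2 - m1**2)


# Simulated observational data with a known propensity score (hypothetical).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(size=n)
e = 0.3 + 0.4 * x
d = rng.uniform(size=n) < e
y = np.where(d, rng.normal(1.0, 2.0, n), rng.normal(0.0, 1.0, n))

welfare_treat_all = mean_variance_welfare(y, d, e, np.ones(n, dtype=bool))    # target: 1 - 0.5 * 4 = -1
welfare_treat_none = mean_variance_welfare(y, d, e, np.zeros(n, dtype=bool))  # target: 0 - 0.5 * 1 = -0.5
```

Under this criterion, treating everyone raises the mean outcome but lowers welfare because of the added variance, a distinction a linear (mean-only) criterion would not register.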

2606.01655 2026-06-02 math.OC cs.AI cs.LG stat.ML

MINTS: Minimalist Thompson Sampling

Kaizheng Wang

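AI summary: Addressing the limitations of Bayesian methods under complex structural constraints, the paper proposes a minimalist Bayesian framework that places a prior only on the location of the optimum and eliminates nuisance parameters via profile likelihood. Instantiated as the MINTS algorithm, it achieves near-optimal non-asymptotic regret guarantees and sharp almost-sure asymptotic regret characterizations for multi-armed bandits with mean constraints.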

Comments
29 pages

Abstract

The Bayesian paradigm offers principled tools for sequential decision-making under uncertainty, but its reliance on a probabilistic model for all parameters can hinder the incorporation of complex structural constraints. We introduce a minimalist Bayesian framework that places a prior only on the location of the optimum, while eliminating nuisance parameters through profile likelihood. This yields a generalized posterior that naturally accommodates structural constraints. As a direct instantiation, we develop MINimalist Thompson Sampling (MINTS). For multi-armed bandits with mean constraints, we establish near-optimal non-asymptotic regret guarantees and sharp almost-sure asymptotic regret characterizations. In particular, MINTS attains the classical Lai–Robbins constant in the unstructured setting and automatically adapts to unimodal structure, achieving the sharp constant determined only by the immediate neighbors of the optimal arm.
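
One way to read "a prior only on the location of the optimum, with nuisance parameters profiled out" is sketched below for an unstructured Gaussian bandit with known unit variance. For each arm k, the profile deviance is the smallest squared-error misfit among mean vectors in which arm k is optimal; the generalized posterior weights arm k by prior(k) * exp(-deviance_k / 2), and a Thompson-style step samples the arm to pull from it. This is our interpretation, not the authors' implementation: the function names are hypothetical, and structural constraints such as unimodality and any tuning from the paper are omitted.

```python
import numpy as np


def profile_deviance(means, counts, k):
    """Minimum over mean vectors with mu_k >= mu_j (all j) of
    sum_j n_j * (xbar_j - mu_j)^2, i.e. -2 x profile log-likelihood (up to a
    constant) of 'arm k is optimal' under unit-variance Gaussian rewards."""
    level_num, level_den = counts[k] * means[k], counts[k]
    c = means[k]                         # common level of arm k and arms pooled with it
    for j in np.argsort(-means):         # arms in decreasing order of sample mean
        if j == k:
            continue
        if means[j] <= c:
            break                        # remaining arms already satisfy mu_j <= mu_k
        level_num += counts[j] * means[j]
        level_den += counts[j]
        c = level_num / level_den        # pooled (weighted-average) level
    excess = np.maximum(means - c, 0.0)
    excess[k] = 0.0
    return counts[k] * (means[k] - c) ** 2 + np.sum(counts * excess**2)


def optimum_posterior(means, counts, prior=None):
    """Generalized posterior over which arm is best: prior(k) * exp(-deviance_k / 2)."""
    K = len(means)
    prior = np.full(K, 1.0 / K) if prior is None else np.asarray(prior)
    dev = np.array([profile_deviance(means, counts, k) for k in range(K)])
    log_w = np.log(prior) - 0.5 * dev
    w = np.exp(log_w - log_w.max())
    return w / w.sum()


# A short run on a hypothetical 3-armed Gaussian bandit.
rng = np.random.default_rng(1)
true_means = np.array([0.0, 0.3, 0.6])
counts = np.ones(3)
sums = rng.normal(true_means)            # one initial pull per arm
for _ in range(2000):
    probs = optimum_posterior(sums / counts, counts)
    arm = rng.choice(3, p=probs)         # Thompson step: sample the optimum's location
    sums[arm] += rng.normal(true_means[arm])
    counts[arm] += 1
```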

2606.01645 2026-06-02 stat.ML cs.LG

Self-Regulating Annealing in Heavy-Tailed Diffusion Models

Keito Wakatsuki, Hideaki Shimazaki

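AI summary: Proposes an SDE-based sampler for heavy-tailed diffusion models whose state-dependent diffusion coefficient induces a self-regulating annealing mechanism, improving generation fidelity on heavy-tailed data.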

Comments
6 pages, 3 figures, IJCNN2026

Abstract

Diffusion models have emerged as a leading framework for deep generative modeling. While the standard Gaussian formulation is theoretically convenient, its suitability for heavy-tailed datasets remains unclear. To address this, heavy-tailed diffusion models (HTDMs) extend the standard formulation by replacing the Gaussian distribution with a Student's t-distribution, thereby improving tail fidelity on heavy-tailed datasets. Although stochastic differential equation (SDE)-based sampling is possible in HTDMs, it has not been fully explored. In this paper, we propose an SDE-based sampler for HTDMs that explicitly incorporates a state-dependent diffusion coefficient. This state dependence naturally induces a self-regulating annealing mechanism by adaptively modulating the effective noise scale. We theoretically explore this mechanism and experimentally verify its necessity for reproducing samples from a heavy-tailed distribution.

2606.01554 2026-06-02 math.ST stat.TH

Fast Near-Optimal Estimation over Symmetric Norm Balls

Matey Neykov

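AI summary: Proposes a polynomial-time algorithm for near-optimal Euclidean estimation of a signal constrained to the unit ball of a symmetric norm, and extends it to random-design, moderate-dimensional linear regression.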

Abstract

This short note proposes a polynomial-time algorithm for near-optimal Euclidean estimation of a signal constrained to lie in the unit ball of a symmetric norm, where the symmetry is with respect to a known basis and the norm is accessible through an evaluation oracle. We further extend the method to a random-design, moderate-dimensional linear regression setting, where the regression parameter is likewise assumed to belong to a constraint set defined by a symmetric norm.

2606.01553 2026-06-02 stat.ME econ.EM

Structural Change Detection in High-Dimensional Transformed Factor Models via Canonical Correlation Analysis

Lei Jia, Shouri Hu, Zhaoxing Gao

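AI summary: Proposes a canonical-correlation-based method that detects structural change points in high-dimensional transformed factor models via an eigenvalue-ratio criterion, together with an alternating iterative procedure that jointly estimates the change point and the number of factors.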

Comments
35 pages

Abstract

This paper develops a canonical-correlation-based method for detecting structural changes in high-dimensional transformed factor models. The proposed approach exploits the low-rank canonical-correlation structure induced by dynamically dependent common factors, while serially uncorrelated idiosyncratic components correspond to a noise subspace with zero canonical correlations. We construct an eigenvalue-ratio criterion that measures residual dynamic dependence in the estimated noise subspace and identifies the true change point under sufficient separation of the regime-specific loading spaces or dynamic canonical correlation structures. Since the change-point location and the regime-specific factor numbers are both unknown, we further propose an alternating iterative estimation procedure that updates them sequentially until convergence. Under suitable mixing and moment conditions, we establish asymptotic properties of the proposed estimators, with convergence rates depending explicitly on factor strength, cross-sectional dimension, and sample size. Monte Carlo experiments and empirical applications to intraday stock returns and U.S. temperature series demonstrate the finite-sample performance.
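
The key premise, that serially uncorrelated idiosyncratic noise carries no lagged dependence while factors do, can be illustrated with a simpler relative of the paper's criterion. The sketch below is not the authors' method: it uses no canonical correlations and no change-point search. It is a single-regime, autocovariance-based eigenvalue-ratio estimator of the number of factors, of the kind used earlier in the factor-model literature. The panel sizes, AR coefficient, `max_lag` and `k_max` are arbitrary choices.

```python
import numpy as np


def eigen_ratio_num_factors(Y, max_lag=3, k_max=8):
    """Estimate the number of serially dependent factors in a T x p panel Y.

    Builds M = sum_{l=1..max_lag} G(l) G(l)^T from lag-l sample autocovariances
    G(l); white noise contributes little to M, so its leading eigenvalues come
    from the factors. Returns argmin_k lambda_{k+1} / lambda_k."""
    Yc = Y - Y.mean(axis=0)
    T, p = Yc.shape
    M = np.zeros((p, p))
    for lag in range(1, max_lag + 1):
        G = Yc[lag:].T @ Yc[:-lag] / T         # p x p lag-`lag` autocovariance
        M += G @ G.T
    ev = np.sort(np.linalg.eigvalsh(M))[::-1]  # eigenvalues in decreasing order
    ratios = ev[1:k_max + 1] / ev[:k_max]      # lambda_{k+1} / lambda_k, k = 1..k_max
    return int(np.argmin(ratios)) + 1


# Hypothetical single-regime panel: 3 AR(1) factors plus white noise.
rng = np.random.default_rng(0)
T, p, r = 500, 50, 3
f = np.zeros((T, r))
for t in range(1, T):
    f[t] = 0.8 * f[t - 1] + rng.normal(size=r)
A = rng.normal(size=(p, r))
Y = f @ A.T + rng.normal(size=(T, p))
r_hat = eigen_ratio_num_factors(Y)
```

A change-point procedure in the spirit of the abstract would evaluate a criterion like this on candidate pre- and post-break segments; how the paper does so is not shown here.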

2606.01539 2026-06-02 stat.ME cs.LG

Scalable Counterfactual Risk Estimation for Rare Events in Longitudinal Data

Xiaohui Yin, Avijit Mitra, Ying Zhou, Kun Chen, Hong Yu

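AI summary: To address the class imbalance and computational burden caused by rare events in longitudinal survival data, proposes a scalable subsampling-and-reweighting strategy for causal effect estimators such as ICE that preserves consistency while improving stability.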

Comments
Accepted at KDD-2026, 12 pages

Abstract

Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, leading to instability and convergence issues in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it using a large-scale EHR cohort study on social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
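
The generic idea behind subsample-and-reweight for rare outcomes can be sketched at a single time point: keep every event, keep non-events with probability `keep_prob`, and weight kept rows by the inverse of their inclusion probability so that weighted estimating equations target full-data quantities. This is a minimal sketch under our own assumptions (one time point, a simulated logistic model, hypothetical function names). It is not the paper's longitudinal scheme and does not implement the ICE estimator.

```python
import numpy as np


def subsample_rare(y, keep_prob, rng):
    """Keep every event (y == 1) and a Bernoulli(keep_prob) subsample of non-events.
    Returns kept row indices and inverse-probability weights 1 / P(kept)."""
    incl = np.where(y == 1, 1.0, keep_prob)
    kept = rng.uniform(size=y.shape[0]) < incl
    return np.flatnonzero(kept), 1.0 / incl[kept]


def weighted_logistic(X, y, w, iters=30):
    """Weighted logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - mu))
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical single time point with a rare outcome (about 0.4% prevalence).
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(6.0 - x))).astype(float)  # logit = -6 + x

idx, w = subsample_rare(y, keep_prob=0.02, rng=rng)
X_sub = np.column_stack([np.ones(idx.size), x[idx]])
beta_hat = weighted_logistic(X_sub, y[idx], w)       # compare with the true (-6, 1)
prevalence_hat = np.sum(w * y[idx]) / np.sum(w)      # compare with y.mean()
```

The fit runs on roughly 24,000 rows instead of one million, which is where the savings come from when the model must be refit many times (for example, inside a bootstrap).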

2606.01530 2026-06-02 math.ST stat.TH

A flexible and robust approach to univariate Gaussian splitting using parameterised Gaussian mixtures

Dmitry Mikhin, Athena Xiourouppa

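AI summary: Proposes approximating a Gaussian distribution by a homoscedastic Gaussian mixture through $L^2$-norm minimization; the mixture is parameterized to reduce optimization complexity and rapidly approaches the original function.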

Comments
37 pages, 11 figures

Abstract

We consider approximation of a Gaussian distribution with a mixture of homoscedastic Gaussians of smaller variance. The solution is obtained by minimising the $L^2$ norm between the original Gaussian and the mixture, which is parameterised to reduce the complexity of the optimisation problem. The developed technique is straightforward, sufficiently robust and yields Gaussian mixtures that rapidly approach the original function as the number of mixands is increased. The proposed solution is examined for multiple special cases of input parameters, resulting in further simplifications. Extension of the proposed method to approximating non-Gaussian distributions is discussed.
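
The $L^2$ objective has a closed form because the integral of a product of two Gaussian densities is itself a Gaussian density: $\int N(x; a, s)\,N(x; b, t)\,dx = N(a; b, s + t)$. The sketch below uses that identity for a standard normal target. The one-parameter family is our own assumption, not necessarily the authors' parameterisation: equally spaced means with spacing d and weights proportional to $N(m_i; 0, 1 - \sigma^2)$, motivated by the exact identity $N(0, 1) = \int N(\cdot\,; m, \sigma^2)\,N(m; 0, 1 - \sigma^2)\,dm$. The spacing is chosen by grid search.

```python
import numpy as np


def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)


def l2_error(weights, means, var):
    """Closed-form L2 distance between N(0, 1) and sum_i w_i N(m_i, var)."""
    self_term = 1.0 / np.sqrt(4 * np.pi)                         # int f^2
    cross = np.sum(weights * normal_pdf(means, 0.0, 1.0 + var))  # int f g
    diff = means[:, None] - means[None, :]
    mix = weights @ normal_pdf(diff, 0.0, 2 * var) @ weights     # int g^2
    return np.sqrt(max(self_term - 2 * cross + mix, 0.0))


def split_gaussian(K, var, spacings=np.linspace(0.01, 3.0, 3000)):
    """Hypothetical one-parameter splitting of N(0, 1) into K components of
    variance var < 1: means (i - (K-1)/2) * d, weights proportional to
    N(m_i; 0, 1 - var). Returns (error, d, weights, means) for the best d."""
    best = None
    for d in spacings:
        means = (np.arange(K) - (K - 1) / 2) * d
        w = normal_pdf(means, 0.0, 1.0 - var)
        w /= w.sum()
        err = l2_error(w, means, var)
        if best is None or err < best[0]:
            best = (err, d, w, means)
    return best


err_3, d_3, _, _ = split_gaussian(3, var=0.1)
err_9, d_9, _, _ = split_gaussian(9, var=0.1)
```

With component variance 0.1, nine components cover the target far better than three, consistent with the abstract's claim that the error falls quickly as mixands are added.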

2606.01525 2026-06-02 cs.LG stat.ML

Semi-Supervised Hyperbolic Hierarchical Clustering with Set-Level Structural Priors

Junjing Zheng, Xinyu Zhang, Xiangfeng Qiu, Chengliang Song, Weidong Jiang

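AI summary: Proposes a semi-supervised hyperbolic hierarchical clustering method that introduces sets as basic modeling units and uses set-level structural priors derived from leaf-level supervision to guide learning of the non-leaf hierarchy, improving label consistency and tree quality.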

Abstract

Semi-supervised hierarchical clustering aims to learn a tree structure consistent with data patterns and user-provided supervision. Supervision is usually given as leaf-level relations, such as pairwise must-link/cannot-link constraints or triplet-wise must-link-before constraints. Although useful for regulating local sample relations, such supervision does not directly indicate which samples should form coherent subtrees. Consequently, the non-leaf structure of the learned tree may deviate from the hierarchical organization preferred by ground-truth labels. To address this limitation, we propose a semi-supervised hyperbolic hierarchical clustering method with set-level structural priors. The main contribution is to introduce sets as basic modeling units for hierarchy learning. Each set denotes samples expected to cohere within a subtree and is induced from leaf-level supervision together with a learned constraint-consistent similarity structure. These sets act as soft structural priors for subtree-level supervision, allowing supervision to guide non-leaf hierarchy formation beyond local leaf-level relations. Specifically, we first learn constraint-consistent embeddings to obtain a reliable set partition, then construct constraint-induced sets and estimate inter-set similarities to form set-level structural priors. Finally, these priors are incorporated into a hyperbolic hierarchy objective for continuous tree optimization. Experiments on eleven benchmark datasets and ablation studies show that the proposed method consistently improves label consistency over representative hierarchical clustering baselines while also enhancing similarity-based tree quality.

2606.01521 2026-06-02 cs.LG stat.ML

Fast Generalization after Interpolation via Critically Damped Momentum Optimization

Luca Muscarnera, Silas Ruhrberg Estévez, Yuanzhang Xiao, Mihaela Van der Schaar

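AI summary: Proposes GROKtimizer, a biphasic strategy that combines rapid convergence to interpolation with critically damped momentum for post-interpolation norm minimization; under a local quadratic model it achieves a quadratic speedup over classical gradient descent and selects low-norm interpolating solutions to improve generalization.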

Abstract

A central problem in machine learning is that models can achieve near-perfect training performance while generalizing substantially less well to unseen examples. This gap is especially acute in high-dimensional, low-sample regimes, where many interpolating solutions exist and optimization must implicitly select among minima with different generalization properties. Following recent theoretical advances on optimization dynamics near the interpolation threshold, we note that the two-regime structure of risk minimization, with loss minimization followed by complexity minimization, motivates a biphasic optimization schedule. We thus theoretically demonstrate that GROKtimizer, a biphasic strategy that combines rapid convergence to interpolation with Critically Damped Momentum (CDM)-based post-interpolation norm minimization, offers a natural solution for selecting low-norm interpolating solutions. Under a local quadratic model of the post-interpolation basin, GROKtimizer provides a quadratic speedup over classical gradient descent, with provable optimality among first-order optimizers. To showcase the applicability of our method, we evaluate GROKtimizer on several synthetic benchmarks common in the classical grokking literature and on various real-world datasets. Finally, we reconcile our findings with the flat-minima hypothesis, highlighting the importance of post-interpolation dynamics in the construction of high-quality, generalizing models.
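
The "quadratic speedup on a local quadratic basin" can be checked numerically. With step size alpha = 1/L and heavy-ball momentum critically damped for the flattest curvature mu, beta = (1 - sqrt(alpha * mu))^2, every mode contracts at roughly 1 - sqrt(mu / L) per step, versus 1 - mu / L for gradient descent. That means about sqrt(kappa) instead of kappa iterations. This sketch covers only that local claim, on a hypothetical diagonal quadratic. It is not GROKtimizer, which also includes a first interpolation phase and a norm-minimization objective.

```python
import numpy as np


def iterations_to_tol(eigs, step, beta, tol=1e-6, max_iter=100_000):
    """Heavy-ball x_{t+1} = x_t - step * H x_t + beta * (x_t - x_{t-1}) on
    f(x) = 0.5 * sum_i eigs_i * x_i^2, started at x_0 = x_{-1} = 1; returns the
    first t with ||x_t|| <= tol * ||x_0||. beta = 0 is plain gradient descent."""
    x_prev = x = np.ones_like(eigs)
    x0_norm = np.linalg.norm(x)
    for t in range(1, max_iter + 1):
        x, x_prev = x - step * eigs * x + beta * (x - x_prev), x
        if np.linalg.norm(x) <= tol * x0_norm:
            return t
    return max_iter


eigs = np.geomspace(1e-3, 1.0, 20)            # hypothetical basin curvatures, kappa = 1000
L, mu = eigs.max(), eigs.min()
step = 1.0 / L
beta_crit = (1.0 - np.sqrt(step * mu)) ** 2   # critical damping for the flattest mode

gd_iters = iterations_to_tol(eigs, step, beta=0.0)
cdm_iters = iterations_to_tol(eigs, step, beta=beta_crit)
```

With kappa = 1000, gradient descent needs on the order of ten thousand steps here, while the critically damped run needs a few hundred.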

2606.01496 2026-06-02 astro-ph.GA physics.data-an stat.AP

The Information Content of Quasar Variability Light Curves: How Well Can we Infer Stochastic Model Parameters?

Brendon J. Brewer, Geraint F. Lewis, Xiang Yu, Yuan Li

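AI summary: An information-theoretic analysis finds that quasar light curves constrain the short-term volatility parameter η far more tightly than the timescale τ; the authors recommend that future studies focus on η and use a hierarchical Bayesian regression model to show how η varies with bolometric luminosity, rest wavelength, and redshift.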

Comments
Submitted. 17 pages, 7 figures, 3 tables

Abstract

Quasar variability, driven by multi-scale physical processes within a relativistic accretion disk, is commonly modelled with stochastic time series models. The simplest of these is the Damped Random Walk (DRW), also known as the Ornstein-Uhlenbeck (OU) process. Here, we demonstrate that, when fitting such a model to quasar light curve data, the mean of the light curve, $\mu$, should not be fixed (which is the typical approach), as this leads to overconfident inferences about the variability timescale $\tau$, with substantially underestimated uncertainties. However, the short-term volatility parameter $\eta$ is typically very well constrained from short light curves. Through simulations, we compute information-theoretic quantities such as the conditional entropy and the mutual information, confirming that light curves provide much more information about $\eta$ than about $\tau$. As a result, we recommend that future quasar variability studies focus on $\eta$ rather than $\tau$. To demonstrate this approach, we fit a hierarchical Bayesian regression model for $\eta$ as a function of bolometric luminosity and rest wavelength to a dataset of 570 light curves measured over decades. We perform the fit using a likelihood function that uses the light curves directly, rather than using intermediate $\eta$ values from individual light curve fits. We find that volatility decreases as a function of both bolometric luminosity and rest wavelength. The volatility also decreases more steeply with redshift than time dilation alone would suggest, pointing to an increase in intrinsic volatility as quasars evolve over cosmic time.
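
For readers who want to experiment with the point about $\mu$, here is a minimal sketch of the DRW/OU model $dX = -(X - \mu)/\tau\,dt + \sigma\,dW$: exact simulation at irregular epochs, the exact Gaussian log-likelihood, and the closed-form maximum-likelihood $\mu$ for a fixed $\tau$. This lets $\mu$ be estimated jointly rather than fixed at, for example, the sample mean. Assumptions: we write the diffusion amplitude as $\sigma$ and do not claim it coincides with the paper's $\eta$, whose definition the abstract does not give. Measurement noise, present in real light curves, is omitted, and all numbers are hypothetical.

```python
import numpy as np


def simulate_drw(times, mu, tau, sigma, rng):
    """Exact simulation of dX = -(X - mu)/tau dt + sigma dW at increasing,
    possibly irregular, observation times (no measurement noise)."""
    var_stat = 0.5 * sigma**2 * tau                     # stationary variance
    x = np.empty(len(times))
    x[0] = mu + np.sqrt(var_stat) * rng.normal()
    for i in range(1, len(times)):
        a = np.exp(-(times[i] - times[i - 1]) / tau)    # decay over the gap
        x[i] = mu + a * (x[i - 1] - mu) + np.sqrt(var_stat * (1.0 - a**2)) * rng.normal()
    return x


def drw_loglik(x, times, mu, tau, sigma):
    """Exact Gaussian log-likelihood of the sampled path under the same model."""
    var_stat = 0.5 * sigma**2 * tau
    a = np.exp(-np.diff(times) / tau)
    var = np.concatenate([[var_stat], var_stat * (1.0 - a**2)])
    resid = np.concatenate([[x[0] - mu], x[1:] - mu - a * (x[:-1] - mu)])
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + resid**2 / var)


def mu_hat(x, times, tau):
    """MLE of mu for fixed tau (it does not depend on sigma): weighted least squares
    of z_i = x_i - a_i x_{i-1} on c_i = 1 - a_i, with weights 1 / (1 - a_i^2)."""
    a = np.exp(-np.diff(times) / tau)
    z = np.concatenate([[x[0]], x[1:] - a * x[:-1]])
    c = np.concatenate([[1.0], 1.0 - a])
    w = np.concatenate([[1.0], 1.0 / (1.0 - a**2)])
    return np.sum(w * c * z) / np.sum(w * c**2)


# Hypothetical light curve: 300 irregular epochs over 2000 days.
rng = np.random.default_rng(0)
times = np.sort(rng.uniform(0.0, 2000.0, 300))
x = simulate_drw(times, mu=18.0, tau=300.0, sigma=0.02, rng=rng)
m_hat = mu_hat(x, times, tau=300.0)   # estimated with the model, not fixed beforehand
```

Profiling (or marginalizing) $\mu$ over a grid of $\tau$ values, instead of plugging in the sample mean, is what propagates the uncertainty in $\mu$ into the inference on $\tau$.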

2606.01489 2026-06-02 stat.AP

Model complexity in econometrics - a combinatorial analysis

Vahidin Jeleskovic

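AI summary: Addressing underfitting and overfitting in specification selection for regression and vector autoregressive models, proposes a combinatorial framework for counting a model's possible states, highlighting an overlooked aspect of model specification.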

Abstract

Regression models and Vector Autoregressive Models (VARs) play crucial roles in econometrics by allowing the analysis of multiple variables simultaneously. Despite their utility, these models face challenges like underfitting and overfitting, especially when determining the optimal model specification, which can lead to significant computational costs. To address these challenges, econometricians often rely on widely adopted model selection criteria such as the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). These criteria help balance model complexity and goodness of fit, aiding in the selection of the most suitable model specification for the given data. Nonetheless, there is a notable gap in existing research concerning the correct specification of these models, particularly in determining the optimal number of states a system can assume. Addressing this gap, we introduce a combinatorial framework designed to calculate the potential number of states in such econometric models. Our approach involves delineating four distinct stages in model development, each offering a range of specifications. This method enables a comprehensive combinatorial calculation of all possible states. The aim of this paper is to highlight this overlooked aspect of model specification and to spark a constructive dialogue within the empirical research community. By doing so, we hope to inspire further research that enhances the precision and applicability of econometric models. A theoretical complexity criterion is necessary to elucidate fundamental limitations and propose new objectives to pursue.

2606.01474 2026-06-02 stat.CO

Voronoi-Elitism Genetic Algorithm: A Generic Derivative-Free Routine With Theory and Implementation for Statistical Optimization

Anthony Haitao Zou, Yizhou Jake Cai, Ting Fung Ma

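AI summary: Proposes the Voronoi-Elitism Genetic Algorithm (VEGA), a genetic search that embeds geometric information, for optimizing objective functions with two parameter blocks (one amenable to analytic optimization, the other irregular or computationally expensive); it analyzes high-dimensional behavior theoretically and demonstrates advantages in spatial coverage and distance bounds.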

Abstract

In this paper, we propose a generic optimization approach for challenging objective functions, with applications to various statistical problems. We focus on objective functions with two parameter blocks: one amenable to analytic optimization and another that is irregular or computationally expensive. To address this setting, we propose the Voronoi-Elitism Genetic Algorithm (VEGA), a derivative-free optimization method that embeds geometric information into genetic search. The proposed algorithm retains elite candidates and constructs Voronoi-based neighborhoods around them, within which crossover and self-adaptive mutation balance exploitation of promising solutions with exploration of under-covered regions. We study the high-dimensional behavior of genetic search by analyzing distance concentration and the effects of population size and shrinking mutation; this analysis shows that the algorithm improves spatial coverage and yields sharper distance bounds under limited computational budgets. Simulation studies compare VEGA with two competing genetic-type algorithms in finite samples. A real-data application to Stack Exchange activity data further illustrates its ability to identify stable structural changes, suggesting that the algorithm is computationally flexible for high-dimensional, derivative-free optimization and applicable to various statistical problems.

2606.01468 2026-06-02 stat.ML cs.AI cs.LG

Computation-Aware Kalman Filtering with Model Selection for Neural Dynamics

JR Huml, Jonathan Wenger, John P. Cunningham

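AI summary: Proposes the Computation-Aware State-Space Model (CASSM), which enables model selection through a new training loss and optimization scheme; in the scale-imbalanced regime where trials are far fewer than recorded neurons, it delivers competitive predictions and better uncertainty calibration at tractable computational cost.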

Comments
24 pages, Proceedings of 2nd International Conference on Probabilistic Numerics (2026)

Abstract

Due to their explicit priors and ability to model uncertainty, Bayesian methods have played a major role in dynamical latent variable modeling of single-cell neural recordings. However, modern-sized datasets have made overparameterized deep networks the methods of choice due to their predictive power and favorable computational scaling. While many posterior approximations exist, all incur approximation error. Recent work accounts for this error in the form of computational uncertainty, but at the cost of quadratic complexity and under the assumption of fixed model hyperparameters. Here we extend this line of work to model selection, including a novel training loss and optimization scheme, which yields tractable inference in large state spaces. We introduce a framework, the Computation-Aware State-Space Model (CASSM), specifically designed for the scale-imbalanced regime, where the number of trials is significantly lower than the number of recorded neurons. In this regime, for both synthetic and real data, we show that our method is competitive with data-hungry deep networks, with significantly improved uncertainty calibration over previous attempts to scale Bayesian methods. Our experiments provide a roadmap for neuroscience researchers choosing among a host of potential dynamical latent variable models given key dataset properties and constraints.

2606.01465 2026-06-02 stat.ME

Comb Test: Histogram Uniformity Testing Based on Discrete Total Variation

Nikola Banić, Neven Elezović

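AI summary: Proposes the comb test, based on discrete total variation, which computes the exact null distribution by dynamic programming and extends to large samples via a gamma approximation with Monte Carlo estimation; for comb-like (alternating) deviations its statistical power is up to 67% higher than that of Pearson's chi-square test.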

Comments
5 pages, 5 figures

Abstract

Histogram uniformity testing is a common statistical task usually performed using Pearson's chi-square test. This paper proposes a new test based on the discrete total variation that is easy to compute and, for comb-like (alternating) deviations, achieves up to 67% higher statistical power than Pearson's chi-square test, making it a complement to standard tests. The exact null distribution is computed via dynamic programming, and a gamma approximation with Monte Carlo estimation extends the test to arbitrarily large sample sizes. Experiments on simulated ADC alternating differential nonlinearity and on rounding bias detection in scientific data confirm the claims. The Python source code and precomputed data are available at https://github.com/DiscreteTotalVariation/CombTest.
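
As a quick illustration of why total variation reacts to comb-like histograms, the sketch below computes the sum of absolute differences between adjacent bins and a Monte Carlo p-value under a uniform multinomial null. This is a simpler stand-in for the paper's exact dynamic-programming null and gamma approximation. We assume the statistic has no wrap-around term between the last and first bins, which may differ from the authors' definition. The repository linked in the abstract is the reference implementation; the function names here are hypothetical.

```python
import numpy as np


def discrete_total_variation(counts):
    """Sum of absolute differences between adjacent histogram bins."""
    return np.abs(np.diff(counts)).sum()


def comb_test_mc(counts, n_sim=20_000, seed=0):
    """Monte Carlo p-value for H0: counts ~ Multinomial(n, uniform bin probabilities).
    Large total variation (e.g. an alternating, comb-like pattern) counts against H0."""
    counts = np.asarray(counts)
    n, k = int(counts.sum()), counts.size
    rng = np.random.default_rng(seed)
    null = rng.multinomial(n, np.full(k, 1.0 / k), size=n_sim)
    tv_null = np.abs(np.diff(null, axis=1)).sum(axis=1)
    tv_obs = discrete_total_variation(counts)
    return (1 + np.sum(tv_null >= tv_obs)) / (n_sim + 1)


p_comb = comb_test_mc([20, 0] * 5)   # strongly alternating histogram, n = 100
p_flat = comb_test_mc([10] * 10)     # perfectly flat histogram
```

Pearson's chi-square statistic ignores bin order, so it cannot tell an alternating pattern apart from the same counts shuffled; total variation is order-sensitive by construction.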

2606.01457 2026-06-02 cs.AI cs.LG stat.ML

Transferring Information Across Interventions in Causal Bayesian Optimization

Mohammad Ali Javidian

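AI summary: Proposes graph-coupled causal Bayesian optimization, which links different intervention effects through uncertainty about shared causal parameters so that information transfers across interventions; for identifiable linear Gaussian causal models it proves a low-rank kernel property and a sublinear regret bound.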

Abstract

Bayesian optimization is a popular way to optimize expensive systems, where every experiment, simulation, or intervention costs time or money. In its standard form, it treats the variables we control as plain inputs to a black box and cannot distinguish mere correlation from genuine cause and effect. Causal Bayesian optimization closes part of this gap by using a known causal graph together with observational data to decide which variables are worth intervening on. Existing methods, however, learn the effect of each possible intervention almost in isolation, even though in a causal system these effects usually share the same underlying mechanisms. We propose graph-coupled causal Bayesian optimization, which ties the different intervention effects together through the uncertainty we have about a small set of shared causal parameters. The result is a causal kernel that lets evidence collected from one intervention improve our estimate of related interventions. For identifiable linear Gaussian causal models, we show that this kernel has low rank, bounded by the number of shared parameters rather than by the size of the intervention menu. This in turn yields an information-gain bound that grows only logarithmically in the optimization horizon, and a regret bound that cleanly separates three sources of error: optimization, causal estimation, and the choice of which intervention sets to consider. We also describe nonlinear and adaptive extensions. Across theory-aligned Gaussian systems, shared-mechanism stress tests, and standard causal optimization benchmarks, the method keeps the benefits of causal Bayesian optimization while transferring information across related interventions, with the clearest gains when direct interventions on the target's parents are unavailable and sparse interventional data must be reused across a large family of candidate interventions.
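
The low-rank and transfer claims can be seen in a hand-built toy example. Take a hypothetical three-node linear SCM in which every interventional mean is linear in the same two uncertain coefficients theta. A Gaussian prior on theta then induces a kernel over the whole intervention menu whose rank is at most 2, and observing only do(X2 = b) experiments shrinks the predictive variance for an untried do(X1 = a). The graph, the known values `GAMMA` and `MEAN_X1`, the noise level, and the function names are all assumptions for illustration. This is not the paper's kernel construction or acquisition rule.

```python
import numpy as np

# Hypothetical linear Gaussian SCM (not from the paper): X1 -> X2 -> Y and X1 -> Y,
#   X2 = GAMMA * X1 + noise,   Y = theta1 * X1 + theta2 * X2 + noise.
# GAMMA and E[X1] are treated as known (e.g. from observational data); the shared,
# uncertain causal parameters are theta = (theta1, theta2).
GAMMA, MEAN_X1 = 0.8, 1.0


def features(kind, a=0.0, b=0.0):
    """phi such that E[Y | do(.)] = phi @ theta, for three kinds of intervention."""
    if kind == "x1":                      # do(X1 = a); X2 responds as GAMMA * a
        return np.array([a, GAMMA * a])
    if kind == "x2":                      # do(X2 = b); X1 keeps its mean
        return np.array([MEAN_X1, b])
    return np.array([a, b])               # do(X1 = a, X2 = b)


grid = np.linspace(-2.0, 2.0, 20)
candidates = ([("x1", a, 0.0) for a in grid] + [("x2", 0.0, b) for b in grid]
              + [("both", a, -a) for a in grid])
Phi = np.array([features(*c) for c in candidates])     # 60 x 2

prior_cov = np.eye(2)                                  # Gaussian prior on theta
K = Phi @ prior_cov @ Phi.T                            # 60 x 60 kernel over interventions
kernel_rank = np.linalg.matrix_rank(K)                 # at most dim(theta) = 2

# Observe only do(X2 = b) experiments, then predict the untried do(X1 = 1.5).
noise_var = 0.1
Phi_obs = np.array([features("x2", b=b) for b in (-1.0, 0.5, 2.0)])
post_cov = np.linalg.inv(np.linalg.inv(prior_cov) + Phi_obs.T @ Phi_obs / noise_var)
phi_new = features("x1", a=1.5)
prior_var = phi_new @ prior_cov @ phi_new
post_var = phi_new @ post_cov @ phi_new
```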

2606.01432 2026-06-02 cs.LG eess.IV eess.SP stat.ML

Leaf Spectral Reflectance Prediction Using Multi-Head Attention Neural Networks

Parastoo Farajpoor, Alireza Pourreza, Mohammadreza Narimani, Ashraf El-Kereamy, Matthew W. Fidelibus

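AI summary: For specific crops such as grapevines, proposes a leaf trait-to-spectrum prediction model based on a multi-head attention neural network that achieves high accuracy on a grapevine dataset ($R^2$ = 0.84, NRMSE = 1.52%) and outperforms the conventional radiative transfer model PROSPECT-PRO.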

Journal ref
Proc. SPIE 13475, 134750V (2025)
Comments
8 pages, 5 figures. Author-accepted version of the SPIE conference paper

Abstract

Accurate modeling of leaf spectral reflectance from physiological and biochemical traits is essential for advancing remote sensing applications in plant science and precision agriculture. Widely used radiative transfer models, such as PROSPECT-PRO, rely on generalized trait-reflectance relationships developed from a wide range of species, which may not fully capture the spectral behavior of specific crops like grapevines. In this study, we developed a trait-to-spectra prediction model using a multi-head attention neural network trained on a grapevine-specific dataset that includes 16 leaf traits measured across multiple varieties, growth stages, and years. The model was evaluated using stratified 5-fold cross-validation and achieved an average coefficient of determination ($R^2$) of 0.84 and a normalized root mean squared error (NRMSE) of 1.52%, demonstrating high accuracy and generalizability. When compared to PROSPECT-PRO in forward mode, the neural network exhibited lower mean absolute error (MAE), especially in the near-infrared (NIR) and shortwave-infrared (SWIR) regions. These results emphasize the importance of species-specific modeling approaches and show that integrating biochemical and structural traits into data-driven architectures can significantly improve spectral prediction. The proposed model provides a robust framework for generating accurate leaf-level reflectance data, with potential applications in canopy trait retrieval, vineyard monitoring, and remote sensing-driven crop management.

2606.01428 2026-06-02 stat.ME stat.AP

Quantifying Evidential Rigor in Meta-Analytic Corpora: A Simulation-Characterized, Bias-Robust Bayesian Workflow with a Nutrition Case Study

Matt Hester

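AI summary: Proposes a corpus-scale Bayesian evidential-audit workflow that quantifies evidential rigor through a joint Bayes factor combining effect/no-effect evidence with the absence of a bias component, characterizes it with an ADEMP-framed simulation/resampling design, and uses a nutrition-intervention corpus as a case study.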

Comments
59 pages, 13 figures; supplementary material included as ancillary file; companion repository archived at Zenodo, DOI: 10.5281/zenodo.20467258

Abstract

Conventional meta-analysis summarizes evidence through pooled estimates, intervals, and p-values, but these outputs do not directly measure evidence for an effect, evidence for no effect, or the degree to which conclusions depend on publication selection or small-study effects. We introduce a corpus-scale Bayesian evidential-audit workflow for meta-analytic corpora. The workflow reconstructs or accepts study-level effects and standard errors, harmonizes directions, fits a matched Bayesian random-effects baseline and a bias-aware model-averaged ensemble, and reports paired estimates with component and joint model-family evidence. The central estimand is rigor: a joint Bayes-factor summary combining resolved effect/no-effect evidence with absence of an explicit bias component in the fitted ensemble. Rigor is not a positive-finding score; no-effect evidence can score highly, whereas inconclusive or bias-dependent evidence scores poorly. We characterize the workflow using an ADEMP-framed simulation/resampling design with known-cell synthetic simulation, empirical registry resampling, and empirical fitted-profile-weighted synthetic sampling. A nutrition intervention corpus provides the worked case study, where bias-aware fitting often attenuates conventional estimates and many nominally meaningful effects lose clean evidential support. A public companion repository provides empirical inputs, generated artifacts, simulation source/design files, and documentation for reproducing and adapting the audit.

2606.01427 2026-06-02 stat.ML cs.LG

On the Uncertainty Quantification Ability of Tabular Foundation Models

Tyler R. Johnson, Kian Ben-Jacob, Nima Negarandeh, Oriol Vendrell-Gallart, Ramin Bostanabad

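AI summary: An empirical comparison of TabPFN and Gaussian processes on regression tasks reveals a trade-off between explicit and learned priors. TabPFN excels on complex, high-dimensional problems, while Gaussian processes give better predictive accuracy and uncertainty quantification when data are scarce.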

Comments
12 pages, 2 figures, 2 tables

Abstract

Foundation models (FMs) have achieved substantial success in generalizing across tasks without problem-specific training or fine-tuning. However, many critical applications in mechanics and computational science require not only accurate predictions but also reliable uncertainty quantification (UQ). Herein we investigate the UQ capabilities of tabular FMs in regression tasks through a comprehensive empirical study comparing Tabular Prior-Data Fitted Networks (TabPFN) against Gaussian processes (GPs). We systematically evaluate these two methods across a host of regression problems with varying complexity, dataset sizes, and input dimensionalities. We build all GPs with a default setting to ensure a fair comparison against TabPFN v2.5. Our findings highlight an important trade-off between explicit and learned priors: while TabPFN achieves highly competitive performance for complex, high-dimensional problems with sufficient data, GPs often provide superior predictive accuracy and UQ in data-scarce settings. Moreover, when the chosen kernel constitutes a good prior for the underlying function, GP performance can substantially exceed that of TabPFN. Our results can be reproduced from https://github.com/kianswarehouse/GPvsPFN.

2606.01346 2026-06-02 stat.ME stat.ML

FlowSDR: Sufficient Dimension Reduction via Conditional Normalizing Flows

Yuexiao Dong, Kenichiro Mcalinn, Edoardo Airoldi, Lei Li

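AI summary: Proposes FlowSDR, which performs sufficient dimension reduction by jointly learning the projection and the conditional density through maximization of a conditional log-likelihood. It outperforms existing methods on simulated and real data.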

Comments
20 pages, 8 tables

Abstract

Sufficient dimension reduction (SDR) seeks a low-dimensional linear projection of predictors that preserves the conditional distribution of the response. Existing methods target this conditional distribution indirectly, via inverse moments, local forward regression, or neural ensemble regression. We propose FlowSDR, a likelihood-based framework that jointly learns the projection and the conditional density by maximizing a conditional log-likelihood, with the density parameterized by monotone rational-quadratic spline flows. The estimator is Fisher consistent under the SDR model, and its sample objective admits a population interpretation in terms of mutual information. As a complementary model within the same likelihood framework, we introduce the neural Gaussian SDR, a heteroscedastic conditional Gaussian model whose mean and variance are parameterized by shared neural-network functions of the projected predictors. In simulations spanning Gaussian errors, heavy-tailed distributions, two-component mixtures, and settings with tail behavior not captured by mean-variance structure, FlowSDR recovers the central subspace more accurately than existing SDR methods and the neural Gaussian SDR baseline. We further validate these advantages on a face-age prediction task using the UTKFace dataset.

2606.01328 2026-06-02 stat.ME physics.soc-ph

Scale-Free Priors and Survival Dynamics: A Bayesian Framework for Conflict Duration

Tomasz F. Stepinski

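AI summary: Proposes a fully Bayesian survival-analysis framework for inferring system lifetimes from sparse data. It is built on a scale-free prior and extended to interacting actors, and it is illustrated with the 2026 US/Israel–Iran conflict.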

Comments
10 pages, 1 figure

Abstract

We have developed a fully Bayesian survival-analysis framework that reformulates inference about system lifetimes in terms of hazard and survival functions, and extends this representation to interacting actors. Starting from J. Richard Gott's Copernican principle, we express the scale-free prior as a baseline hazard $\lambda(t)=1/t$, thereby linking a static prior over lifetimes to the dynamic language of survival analysis. In this formulation, Bayesian updating corresponds to conditioning on survival, while the resulting posterior distribution admits a natural representation in terms of hazard and survival functions. The approach is intended for settings where data are sparse or unreliable, and where a scale-free, assumption-light baseline is preferable to heavily parameterized models. Building on this foundation, we derive general expressions for two-actor systems that characterize joint survival, conditional lifetimes, and comparative outcomes without requiring a specific parametric form of interaction. This yields a flexible and modular framework in which baseline dynamics are separated from interaction effects, allowing different mechanisms to be incorporated transparently. Thus, the primary contribution is a general hazard-based formulation of Bayesian updating and its extension to interacting systems. To illustrate the framework, we consider a multiplicative resource-depletion specification in which interaction modifies the baseline hazard through cumulative engagement intensity. This example demonstrates how interaction terms can be embedded while preserving analytical tractability, including closed-form expressions under simplifying assumptions. We further provide a stylized application to an asymmetric two-actor conflict, the 2026 US/Israel–Iran hostilities, to highlight the qualitative implications of the approach.
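Consider the baseline hazard $\lambda(t)=1/t$ alone, with no interaction term. Conditioning on survival to time $t_0$ gives $P(T>t \mid T>t_0)=\exp(-\int_{t_0}^{t} ds/s)=t_0/t$ for $t\ge t_0$. This reproduces Gott's familiar 95% interval $[t_0/39,\ 39\,t_0]$ for the remaining duration. The minimal Python sketch below covers only that baseline. The function names and the 30-day input are illustrative assumptions, and the two-actor extension is not shown.

```python
def conditional_survival(t, t0):
    """P(T > t | T > t0) under the baseline hazard lambda(t) = 1/t.

    exp(-integral from t0 to t of ds/s) = exp(-log(t/t0)) = t0/t for t >= t0.
    """
    return 1.0 if t <= t0 else t0 / t


def lifetime_quantile(q, t0):
    """Total lifetime t solving P(T <= t | T > t0) = q, i.e. 1 - t0/t = q."""
    return t0 / (1.0 - q)


def remaining_duration_interval(t0, level=0.95):
    """Central credible interval for the remaining duration T - t0."""
    alpha = 1.0 - level
    lower = lifetime_quantile(alpha / 2, t0) - t0
    upper = lifetime_quantile(1 - alpha / 2, t0) - t0
    return lower, upper


# Hypothetical: a conflict that has lasted 30 days so far
lower, upper = remaining_duration_interval(30.0)  # about 30/39 and 39 * 30 days
median_total = lifetime_quantile(0.5, 30.0)        # total lifetime 2 * t0 = 60 days
```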

2606.01257 2026-06-02 math.ST stat.ME stat.ML stat.TH

Statistical Inference on Gradient Flows

Tongyu Li, Alexander Giessing

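AI summary: Develops a theory of time-uniform statistical inference for gradient flows arising from empirical risk minimization. The paper proves a uniform central limit theorem and constructs a covariance estimator that needs no matrix inversion, providing uncertainty-quantification tools for gradient-based methods.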

Abstract

Gradient-based algorithms are central to modern statistical estimation, yet their statistical analysis is often restricted to fixed-time behavior, such as convergence to a population target or fluctuations at a prescribed iteration. In many applications, however, uncertainty quantification is needed along the entire optimization path, especially when the stopping time is data-dependent or divergent. In this paper, we develop a theory for time-uniform statistical inference on gradient flows arising from empirical risk minimization. We prove a uniform central limit theorem that characterizes the deviation between empirical and population gradient flows as a continuous-time Gaussian process over the entire nonnegative real line. Building on this result, we introduce an algorithm-aware covariance estimator that evolves jointly with the gradient flow and avoids matrix inversion, resampling, or sample splitting. We show that the covariance estimator is uniformly consistent over time and use it to construct confidence intervals for the target parameter with asymptotically valid coverage. Our results connect optimization dynamics with statistical inference and provide practical tools for uncertainty quantification in gradient-based methods.

2606.01256 2026-06-02 stat.ML cs.LG stat.ME

Distribution-free changepoint localization after sequential change detection

Aytijhya Saha, Aaditya Ramdas

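AI summary: Proposes a distribution-free framework for constructing post-detection confidence sets for changepoints once a sequential change detection procedure stops. It requires no distributional assumptions and guarantees finite-sample coverage and an asymptotically bounded expected size.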

Abstract

This paper introduces a distribution-free framework for constructing post-detection confidence sets for changepoints after stopping a sequential change detection procedure. It is well known that conformal test martingales can be used to sequentially detect changes in distribution, but by themselves they provide no inference for the time at which a proclaimed change occurred. Past work on post-detection inference requires pre- and post-change classes of distributions to be known, whereas this paper localizes the changepoint without any distributional assumptions. We establish finite-sample coverage guarantees (conditional on correct detection). We provide non-asymptotic bounds on the conditional expected size of the confidence sets. Under suitable asymptotic regimes, we prove that the conditional expected size of the confidence set remains uniformly bounded, and we demonstrate strong empirical performance on simulated and real data. To the best of our knowledge, this is the first general distribution-free framework for sequential changepoint localization with a valid post-detection coverage guarantee.

2606.01244 2026-06-02 stat.ML cs.LG cs.NA math.FA math.NA math.ST stat.TH

Efficient Approximation for Encoder--Decoder Neural Operators via Variation Spaces

Jia-Qi Yang, Lei Shi

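AI summary: Introduces a variation space as an infinite-dimensional structural class for nonlinear operators and establishes approximation bounds for encoder–decoder two-layer networks in the Bochner $L^q$ norm. The error decomposes into an input encoding error, an output encoding error, and a finite-width approximation term of order $N^{-1/2}$. This gives theoretical guarantees for efficient neural operator learning beyond general Lipschitz or Fréchet differentiable operator classes.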

Comments
14 pages

Abstract

We study operator learning using encoder--decoder neural networks. Inspired by the function-space theory of neural networks, we introduce a variation space as an infinite-dimensional structural class for nonlinear operators. This space is defined through vector-valued measures directly on the input and output spaces. For operators in this space, we establish approximation bounds for encoder--decoder two-layer networks in the Bochner $L^q$ norm. The resulting error bound decomposes into the input encoding error, the output encoding error, and a finite-width approximation term of order $N^{-1/2}$, with a constant independent of the input and output encoding dimensions. When the input and output encoding errors decay polynomially in the encoding dimensions, these estimates yield algebraic approximation and learning rates. The results provide theoretical guarantees for efficient neural operator learning beyond general Lipschitz or Fréchet differentiable operator classes.

2606.01239 2026-06-02 stat.ME stat.ML

Functional Clustering of Survival Data via Smoothed Log-Hazard Trajectories: A Risk-Dynamics Perspective

Anna De Magistris, Elvira Romano, Fabrizio Maturo

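AI summary: Proposes a clustering framework based on functional principal component analysis of smoothed log-hazard trajectories, which clusters survival data from a risk-dynamics perspective. Its effectiveness and robustness are validated on simulated and real clinical data.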

Comments
23 pages, 16 figures, 4 tables

Abstract

This paper investigates clustering in survival data by shifting the analytical focus from cumulative survival probabilities to instantaneous risk, as characterized by the hazard function. We model smoothed log-hazard trajectories as functional objects that capture the temporal evolution of risk and propose a clustering framework based on Functional Principal Component Analysis applied to B-spline smoothed log-hazard trajectories. The number of retained functional principal components is selected before clustering using a 95% cumulative explained-variance rule, and clustering is then performed on the unstandardized FPCA scores. The proposed methodology is evaluated through simulation studies covering progressively complex scenarios, including overlapping and crossing hazard functions, cohort imbalance, heterogeneous risk profiles, and outlier contamination. The framework is further illustrated on two real-world clinical datasets, the German Breast Cancer Study and the Primary Biliary Cirrhosis dataset. Results show that the proposed log-hazard-based functional clustering framework provides an interpretable representation of relative temporal risk dynamics, with competitive internal cohesion and explicit robustness diagnostics when compared with cumulative-survival-based benchmarks.
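The 95% cumulative explained-variance rule amounts to picking the smallest number of leading components whose eigenvalues reach the threshold. The minimal sketch below assumes the FPCA eigenvalues have already been computed from the smoothed log-hazard trajectories. The values are hypothetical.

```python
def n_components_for_variance(eigenvalues, threshold=0.95):
    """Smallest k such that the k largest eigenvalues explain >= threshold of the total."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    cumulative = 0.0
    for k, value in enumerate(values, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(values)  # guard against floating-point shortfall at threshold = 1.0


# Hypothetical FPCA eigenvalues of smoothed log-hazard trajectories
eigs = [5.0, 2.5, 1.5, 0.6, 0.3, 0.1]
k = n_components_for_variance(eigs)  # cumulative shares 0.50, 0.75, 0.90, 0.96 -> 4
```

The first k FPCA scores of each trajectory would then be left unstandardized, as the abstract specifies, and passed to a clustering algorithm such as k-means. The abstract does not name the algorithm, so k-means is only an example.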

2606.01217 2026-06-02 cs.CV cs.LG stat.AP

Analysis of Ethnic Disparities in Autism Spectrum Disorder among Toddlers

Aadithya Prabha Ramaharsha, Deevna Reddy, Uma Ranjan

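AI summary: Uses logistic regression to study how ethnicity, behavioral scores, sex, and neonatal jaundice relate to autism spectrum disorder (ASD) in toddlers. Compared with Asians, White Europeans have an 81% higher ASD risk and Middle Easterners a 79% lower risk. Neonatal jaundice and male sex are confirmed as significant risk factors.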

Comments
Third International Conference on Biomedical Engineering Science and Technology

Abstract

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by challenges in communication and behavior. This study examines the relationship between ethnicity and ASD traits, along with behavioural scores, sex and neonatal jaundice, across three ethnic groups: White Europeans, Asians, and Middle Eastern individuals. We perform a logistic regression and show that ethnicity has a significant effect on the incidence of ASD. Compared with Asians, White Europeans are at 81% increased risk of ASD and Middle Easterners are at 79% reduced risk. We also confirm earlier studies showing that neonatal jaundice is a significant predictor of ASD, while male children are at much higher risk of ASD than female children. These results suggest the need for diagnostic frameworks and interventions that account for ethnic differences in the presentation and assessment of ASD traits.
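A logistic regression reports odds ratios $e^{\beta}$. An 81% increase and a 79% decrease therefore correspond to odds ratios of 1.81 and 0.21. The sketch below converts between coefficients and percent changes in odds. It assumes the abstract's "risk" percentages are read off the fitted odds ratios. Odds ratios approximate relative risks only when the outcome is rare.

```python
import math


def percent_change_in_odds(coef):
    """Percent change in the odds of ASD for a one-unit change in a predictor."""
    return (math.exp(coef) - 1.0) * 100.0


def coef_from_percent_change(pct):
    """Logistic coefficient implied by a reported percent change in odds."""
    return math.log(1.0 + pct / 100.0)


# Ethnicity dummies with Asians as the reference category
beta_white = coef_from_percent_change(81.0)     # log(1.81), about 0.593
beta_mideast = coef_from_percent_change(-79.0)  # log(0.21), about -1.561
```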

2606.01214 2026-06-02 stat.AP

Markovianity-Based Conditioning Depth Diagnostics for Hidden Confounding in Observational Datasets

S. A. Adedayo

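AI summary: Proposes a model-checking tool that diagnoses hidden confounding by analyzing how stable inferred causal graphs remain as the conditioning depth varies.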

Comments
19 pages, 7 figures and 2 tables

Abstract

Reliable causal discovery in time series depends on whether the conditioning set adequately represents the system state. If relevant history or unobserved processes are omitted, residual dependence can appear as direct causal links. We study this failure mode for prominent constraint-based causal discovery methods through a simple premise: how much does the inferred graph change as conditioning depth increases? When the observed process is described approximately by a finite-order Markovian representation, inferred graphs should stabilize once sufficient past observations are included. Hidden confounding and other hidden-memory mechanisms should remain sensitive to depth when the observed state is incomplete. We formalise this behavior with graph instability statistics computed over the conditioning-depth grid. The empirical study covers synthetic systems with known ground truth and calcium imaging recordings with unknown causal structure. In simulations, both Markovian and non-Markovian systems broadly upheld our premise. With known ground truth, we evaluate recovery using confusion-matrix metrics, while in real data without ground truth we use descriptive graph instability summaries. Across synthetic Markovian and hidden-memory systems, c-GC variants give the clearest separation, while PCMCI variants show weaker but compatible trends. In real data, inferred connectivity drops sharply with conditioning depth and then levels off. This method, however, does not recover latent graphs, nor does it clearly separate latent confounding from lag-order misspecification, non-stationarity, or measurement error. Its contribution is more modest and practical: an explicit model-checking tool for deciding when causal claims are stable and when they should be treated cautiously.
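One way to picture a graph instability statistic over the conditioning-depth grid is to compare the edge sets inferred at consecutive depths. The abstract does not define the exact statistic, so the Jaccard distance used here and the toy graphs are hypothetical. A profile that falls to zero suggests a stabilized graph. Persistent change is the depth sensitivity the method flags.

```python
def edge_jaccard_distance(edges_a, edges_b):
    """1 - |A & B| / |A | B| for two sets of directed edges (0 if both are empty)."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def instability_profile(graphs_by_depth):
    """Distance between the graphs inferred at consecutive conditioning depths.

    graphs_by_depth: dict mapping depth -> iterable of (source, target) edges.
    Returns a list of (depth, distance to the graph at the previous depth).
    """
    depths = sorted(graphs_by_depth)
    return [
        (d, edge_jaccard_distance(graphs_by_depth[prev], graphs_by_depth[d]))
        for prev, d in zip(depths, depths[1:])
    ]


# Hypothetical inferred graphs: spurious edges vanish with depth, then the graph stabilizes
graphs = {
    1: {("X", "Y"), ("Y", "Z"), ("Z", "X"), ("X", "Z")},
    2: {("X", "Y"), ("Y", "Z"), ("X", "Z")},
    3: {("X", "Y"), ("Y", "Z")},
    4: {("X", "Y"), ("Y", "Z")},
}
profile = instability_profile(graphs)
```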

2606.01090 2026-06-02 stat.ME cs.LG

Measuring the Symmetry--Data Exchange Rate

Ahmed M. Adly

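AI summary: Experiments on a controlled C_n-symmetric task yield three findings. Wrong-group constraints are harmful, and the architecture-vs-augmentation gap depends on asymmetric test-time computation. The symmetry exchange rate is consistent with theory, but its confidence interval includes zero. Methodological contributions include a relative-rate estimator, a wrong-group control, and a failure taxonomy.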

Comments
19 pages, 9 figures. Exploratory study. Code and data at https://github.com/AhmedMostafa16/symmetry-exchange

Abstract

Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|; this is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute is worse than no constraint (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators); misaligned constraint is actively harmful, not merely unhelpful. Second, an augmentation baseline equipped with test-time orbit averaging matches the equivariant model exactly -- bit-identical per-epoch validation curves across matched cells -- so the architecture-vs-augmentation gap is conditional on asymmetric test-time computation, not unconditional. Third, the relative exchange rate beta_diff = 1.28 is consistent in sign and order of magnitude with the theoretical 1.0 (single-level CI [+0.92, +2.05]); the more conservative two-level bootstrap (seeds x group sizes) widens this to [-0.63, +1.72], including zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive (point estimate -0.82). The methodological contributions -- the relative-rate estimator that cancels the shared-difficulty confound, the wrong-group control, and a pre-specified failure taxonomy -- transfer to any inductive bias whose strength can be parameterised. Honest scoping: the primary estimator beta_diff was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the design was never externally pre-registered; and the headline number rests on an OLS slope over seven group sizes on a coarse N grid. This is an exploratory study, not a confirmatory measurement; the wrong-group result is the cleanest finding and the one we report with the most confidence. A registered replication on fresh seeds is future work.

2606.01078 2026-06-02 cs.LG stat.CO stat.ME

Non-Vacuous Certification of Transport MCMC via Oscillation-Controlled Normalizing Flows

Jun Hu

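AI summary: Proposes an oscillation-controlled normalizing-flow framework that provides the first rigorous, non-vacuous spectral-gap bounds for transport MCMC samplers. It combines spectral normalization, a coverage-based empirical oscillation bound, and oscillation-regularized training to achieve certifiable sampling efficiency on several posteriors.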

Comments
36 pages, includes appendix

Abstract

Transport MCMC trains a normalizing flow to precondition Metropolis–Hastings proposals, achieving high empirical efficiency on challenging posteriors; yet no prior work produces a numerically non-vacuous, rigorous spectral-gap bound for such samplers. We establish the first such bounds. For independence MH on the banana family, we certify $\gamma^\ast = 0.828$ at $D = 2$ (covering in the original space) and $\gamma^\ast \ge 7.6\times 10^{-4}$ at $D = 5$ (covering in an analytically unwarped Gaussian space with a grid-certified gradient bound, under the stated numerical Lipschitz certification), both rigorous at 95% confidence. The framework rests on three pillars: (i) spectral normalization with reduced scale clips constrains the flow Lipschitz constant from $10^{47}$ to $10^{4}$; (ii) a coverage-based empirical oscillation bound replaces the vacuous analytical bound with a data-dependent certificate; and (iii) oscillation-regularised training cuts the empirical oscillation by 60–90% at no cost to density fit, extending practical certificates through $D = 20$ ($\gamma^\ast \ge 1.7\times 10^{-4}$). Tests on four further targets (Gaussian mixture, shear-building, Neal's funnel, Bayesian logistic regression) identify three precise barriers: boundary curvature, target stiffness, and tail-coverage mismatch. An affine-vs-spline comparison shows that simpler architectures yield tighter certificates at identical NLL, inverting the usual expressiveness hierarchy.
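For an independence sampler with importance weight $w(x)=\pi(x)/q(x)$, a classical result gives a spectral gap of at least $1/w^\ast$, where $w^\ast=\sup_x w(x)$. Suppose target and proposal are positive on the same states. The weights then average to 1 under $q$, so $\min w\le 1$ and $w^\ast\le e^{\mathrm{osc}(\log w)}$. This is one way an oscillation bound can turn into a gap certificate. The finite-state sketch below illustrates only that relationship. It is not the paper's coverage-based certification procedure, and the distributions are hypothetical.

```python
import math


def independence_mh_gap_bound(target, proposal):
    """Spectral-gap lower bound 1 / w* for independence Metropolis-Hastings.

    target, proposal: probabilities on the same finite state space, assumed
    strictly positive on the same states; w(x) = target(x) / proposal(x).
    """
    w_star = max(p / q for p, q in zip(target, proposal))
    return 1.0 / w_star


def oscillation_gap_bound(target, proposal):
    """Weaker bound exp(-osc), with osc = max log w - min log w.

    The weights average to 1 under the proposal, so min w <= 1 and hence
    w* <= exp(osc), giving 1 / w* >= exp(-osc).
    """
    log_w = [math.log(p / q) for p, q in zip(target, proposal)]
    return math.exp(-(max(log_w) - min(log_w)))


# Hypothetical three-state target and proposal
target = [0.5, 0.3, 0.2]
proposal = [0.4, 0.4, 0.2]
gap_lb = independence_mh_gap_bound(target, proposal)  # w = (1.25, 0.75, 1.0), so 1 / 1.25
osc_lb = oscillation_gap_bound(target, proposal)      # exp(-log(1.25 / 0.75)) = 0.6
```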

2606.01034 2026-06-02 cs.CL stat.ME

A Finite-Calibration Regime Map for LLM Judge Panels

Bin Zhu, Yanghui Rao

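AI summary: Studies the trade-off between low-dimensional stackers and joint output tables for calibrating LLM judge panels under finite human-label budgets, and proposes Finite-Calibration Panel Selection. Experiments indicate that most judge outputs are additive or redundant.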

Comments
Work in Progress

Abstract

We study when LLM judge panels should be calibrated with low-dimensional stackers versus joint output tables under finite human-label budgets. Low-dimensional stackers have small estimation cost but miss interactions, whereas joint-table calibrators can represent interactions but pay for cell counts and unseen patterns. We cast this tradeoff as a finite-calibration regime map and instantiate it as Finite-Calibration Panel Selection, a deployable validation selector over judge path, prefix size, and aggregator family with table and parametric estimation diagnostics. On RewardBench, LLMBar, SummEval, and Arena100K with a seven-judge pool including DeepSeek V4 Flash, scalar/reliability aggregation wins 16 of 20 real dataset–budget cells, indicating that current judge outputs are often additive or redundant. Controlled calibration-growth data show the complementary regime: additive labels remain scalar-favored, whereas a six-way interaction selects a larger joint table and its test MSE drops from 0.224 to 0.061 once unseen mass vanishes. Thus the practical question is not "how many judges?" but whether the next judge's information is estimable under the available human labels.
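The tradeoff between a low-dimensional stacker and a joint output table can be sketched with two table calibrators. One is keyed on the full pattern of judge verdicts; the other is keyed only on the vote count. The data are hypothetical and built so that the label depends on which judge said yes, an effect the vote count cannot represent. Neither calibrator is the paper's Finite-Calibration Panel Selection.

```python
from collections import defaultdict


def fit_table(keys, labels):
    """Empirical mean label per key; unseen keys fall back to the overall mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, y in zip(keys, labels):
        sums[key] += y
        counts[key] += 1
    overall = sum(labels) / len(labels)
    table = {key: sums[key] / counts[key] for key in sums}
    return lambda key: table.get(key, overall)


def joint_calibrator(outputs, labels):
    """One cell per full pattern of judge verdicts (can represent interactions)."""
    predict = fit_table([tuple(o) for o in outputs], labels)
    return lambda o: predict(tuple(o))


def vote_count_calibrator(outputs, labels):
    """One cell per number of 'yes' verdicts (low-dimensional, ignores who voted)."""
    predict = fit_table([sum(o) for o in outputs], labels)
    return lambda o: predict(sum(o))


def mse(model, outputs, labels):
    return sum((model(o) - y) ** 2 for o, y in zip(outputs, labels)) / len(labels)


# Hypothetical binary verdicts from two judges; the label is 1 only for the pattern (1, 0)
outputs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [1.0 if o == (1, 0) else 0.0 for o in outputs]
joint = joint_calibrator(outputs, labels)
scalar = vote_count_calibrator(outputs, labels)
```

On these data the joint table reaches zero error, while the vote-count calibrator averages the (0, 1) and (1, 0) cells. With k binary judges, however, the joint table has 2^k cells to fill from the same label budget. Unseen patterns fall back to the overall mean, which is the kind of estimation cost the regime map weighs.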

2606.01011 2026-06-02 math.ST stat.ME stat.TH

Semiparametric Efficiency of Residual Correlation Testing under Gaussian Additive Noise Models

Yin Tang, Yanyuan Ma, Bing Li

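AI summary: Shows that, under the Gaussian additive noise model, the residual Pearson correlation estimator is semiparametrically efficient. The paper also establishes the asymptotic properties of the resulting conditional independence test and an inference procedure for it.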

Abstract

This paper studies conditional independence testing under the Gaussian additive noise model (GANM), where two variables are modeled as nonlinear functions of covariates with independent bivariate Gaussian regression errors. Under this framework, conditional independence can be characterized by the correlation coefficient of the regression errors, which motivates a test based on the Pearson correlation coefficient computed from the fitted residuals. Despite its simple form, the asymptotic behavior and statistical efficiency of the resulting test have not been well understood. In this paper, we develop the semiparametric efficiency theory under GANM and show, surprisingly, that the efficient estimator coincides exactly with the ordinary residual Pearson correlation estimator. We further establish the asymptotic properties of the proposed test and develop the corresponding inference procedure. Simulation studies demonstrate that the proposed method achieves near-oracle efficiency and competitive empirical power while maintaining valid Type I error control. We further apply the proposed test to conditional dependence analysis of U.S. stock returns.
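A minimal sketch of the residual-correlation test works in three steps. Regress each variable on the covariate, correlate the residuals, and compare $\sqrt{n}\,r$ with a standard normal under the null of conditional independence. Simple linear least squares stands in for the nonlinear regressions of the GANM. The simulated data, with error correlation 0.8, are hypothetical.

```python
import math
import random


def ols_residuals(x, y):
    """Residuals of a simple linear regression of y on x (stand-in for a nonlinear fit)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def residual_correlation_test(x, y, z):
    """Test Y independent of Z given X via the correlation of regression residuals.

    Under H0 with Gaussian errors, sqrt(n) * r is approximately N(0, 1).
    """
    r = pearson(ols_residuals(x, y), ols_residuals(x, z))
    stat = math.sqrt(len(x)) * r
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))  # two-sided normal p-value
    return r, p_value


rng = random.Random(0)
n = 500
x = [rng.uniform(-2, 2) for _ in range(n)]
e1 = [rng.gauss(0, 1) for _ in range(n)]
e2 = [0.8 * a + 0.6 * rng.gauss(0, 1) for a in e1]  # unit-variance errors with correlation 0.8
y = [2.0 * xi + a for xi, a in zip(x, e1)]
z = [-1.0 * xi + b for xi, b in zip(x, e2)]
r, p = residual_correlation_test(x, y, z)
```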

2606.01002 2026-06-02 stat.ME cs.LG math.ST stat.TH

Theoretical Analysis of Engression and Reverse Markov Engression

Jiaqi Huang, Gongjun Xu, Ji Zhu

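AI summary: Establishes nonasymptotic convergence bounds for Engression and its Reverse Markov extension under deep neural network parameterizations. Error propagation is analyzed through an energy-distance chain rule, yielding near-optimal excess-risk bounds.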

Abstract

Engression is a recently proposed and effective framework for conditional distribution learning. Its multi-step Reverse Markov extension further improves generative flexibility by decomposing complex conditional sampling into sequential reverse transitions. Despite their strong empirical performance, rigorous finite-sample statistical guarantees for these methods remain unavailable. In this paper, under deep neural network parameterizations, we establish nonasymptotic convergence bounds for Engression by directly controlling the Energy Distance between the learned and target conditional distributions. For the Reverse Markov framework, we further develop an Energy-Distance-based chain rule that enables a rigorous analysis of error propagation across reverse steps. Our analysis yields corresponding excess-risk bounds that are near-optimal up to logarithmic factors relative to the classical minimax rate over a general Hölder class.
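The energy distance that the analysis controls is $2E|X-Y|-E|X-X'|-E|Y-Y'|$. For distributions with finite first moments, it is zero exactly when the two distributions coincide. A plug-in (V-statistic) estimate for two one-dimensional samples is sketched below. The paper's conditional version would apply this at each value of the covariate, which the sketch does not attempt.

```python
def energy_distance(xs, ys):
    """V-statistic estimate of the energy distance between two 1-D samples:
    2 E|X - Y| - E|X - X'| - E|Y - Y'|.
    """
    def mean_abs_diff(a, b):
        # Average |u - v| over all pairs (u, v) drawn from a and b
        return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))

    return 2 * mean_abs_diff(xs, ys) - mean_abs_diff(xs, xs) - mean_abs_diff(ys, ys)
```

For example, the samples [0, 1] and [2, 3] give 2 * 2 - 0.5 - 0.5 = 3.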

2606.00984 2026-06-02 stat.ML cs.LG

Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

Sanghoon Yu, Min-hwan Oh

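AI summary: For linear contextual bandits with a limited number of parameter updates, proposes two algorithms that need only O(log log T) parameter updates. Both attain minimax-optimal regret under a static schedule while substantially reducing computational complexity.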

Comments
Accepted at ICML 2026

Abstract

We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a practical distinction that is often blurred in the literature: many "strictly batched" methods additionally restrict within-interval context adaptivity, meaning that the action rule inside an interval cannot depend on the sequence of realized contexts/actions in that interval (beyond the current round's context). For linear contextual bandits, we propose two practical algorithms with only $O(\log\log T)$ parameter updates. Our first algorithm BLCE-G attains minimax-optimal regret (up to polylogarithmic factors in $T$) simultaneously in both the small-$K$ and large-$K$ regimes under a static schedule. Our second algorithm BLCE removes the near G-optimal design step -- a dominant computational bottleneck in prior strictly batched static-grid methods -- yet preserves minimax-optimal regret and achieves the lowest known runtime complexity among optimal algorithms. We further extend these rare-update and computational principles to generalized linear contextual bandits. Overall, our results yield statistically optimal algorithms under $O(\log\log T)$ parameter updates that are also computationally efficient in practice.
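To see why $O(\log\log T)$ updates can suffice, consider a static grid $t_k = \lceil T^{1-2^{-k}}\rceil$. Once $k\ge\log_2\log_2 T$, we have $T^{2^{-k}}\le 2$ and so $t_k\ge T/2$. The horizon is therefore covered by very few batches. The sketch below builds such a grid. It follows the style of schedule common in the batched-bandit literature, assumed here for illustration, and is not necessarily the schedule BLCE or BLCE-G uses.

```python
import math


def loglog_update_schedule(T):
    """Static grid of parameter-update times t_k = ceil(T ** (1 - 2 ** -k)).

    Uses M = ceil(log2(log2(T))) + 1 batches (a hypothetical choice), so the
    number of updates grows like log log T. The final update time is T itself.
    """
    if T < 2:
        return [T]
    M = max(1, math.ceil(math.log2(math.log2(T)))) + 1
    grid = [math.ceil(T ** (1.0 - 2.0 ** -k)) for k in range(1, M)]
    grid.append(T)
    return sorted(set(grid))


grid = loglog_update_schedule(10**6)  # a handful of update times for a million rounds
```

Between grid points, contexts are still observed and actions chosen online; only the parameter estimate is frozen.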

2606.00965 2026-06-02 stat.ME

Design-based edge-level causal inference with machine learning assisted covariate adjustment

Haoyang Yu, Yilin Li, Lu Deng, Yong Wang, Xin Lu, Hanzhong Liu

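AI summary: For edge-level outcomes in directed networks under dyadic interference, achieves valid inference and improved efficiency through design-based Horvitz-Thompson estimators, three-fold sample splitting with cross-fitting, and covariate adjustment.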

AI Chinese abstract

我们研究了有向网络中二元干扰下边级结果的设计因果推断。在此设定中,结果定义在有向边上,并依赖于单位对的联合处理分配,这导致了复杂的依赖结构,使得为标准节点级数据开发的估计和推断程序失效。我们为一般类别的边级因果效应构建了Horvitz-Thompson估计量,并在温和正则条件下建立了其渐近正态性。为了实现有效推断,我们开发了利用网络依赖的可识别成分的方差估计量,从而得到比经典方法保守性更低的界。为了提高效率,我们通过样本分裂和交叉拟合程序纳入辅助协变量。一个关键的技术挑战是,由于共享单位引起的依赖,标准的二折样本分裂在存在边级结果时失败。为了解决这个问题,我们引入了一种三折样本分裂和交叉拟合方案,恢复了无偏估计所需的条件独立性。在稳定性条件下,得到的协变量调整估计量是渐近正态的,并适用于线性调整和灵活的机器学习方法。我们进一步引入了一个校准步骤,保证了相对于未调整估计量没有渐近效率损失。模拟研究和实际数据应用证实了理论结果,并展示了显著的效率提升。

English abstract

We study design-based causal inference for edge-level outcomes in directed networks under dyadic interference. In this setting, outcomes are defined on directed edges and depend on the joint treatment assignments of pairs of units, inducing a complex dependence structure that invalidates standard estimation and inference procedures developed for node-level data. We construct Horvitz--Thompson estimators for a general class of edge-level causal effects and establish their asymptotic normality under mild regularity conditions. To enable valid inference, we develop variance estimators that exploit identifiable components of network dependence, yielding substantially less conservative bounds than classical approaches. To improve efficiency, we incorporate auxiliary covariates through a sample splitting and cross-fitting procedure. A key technical challenge is that standard two-fold sample splitting fails in the presence of edge-level outcomes due to the dependence induced by shared units. To address this issue, we introduce a three-fold sample splitting and cross-fitting scheme that restores the conditional independence required for unbiased estimation. Under a stability condition, the resulting covariate-adjusted estimator is asymptotically normal and accommodates both linear adjustment and flexible machine learning methods. We further introduce a calibration step that guarantees no asymptotic efficiency loss relative to the unadjusted estimator. Simulation studies and a real-data application confirm the theoretical results and demonstrate substantial efficiency gains.
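Here is a minimal sketch of the Horvitz-Thompson idea for edges. It assumes the simplest design, in which each node is treated independently with probability $p$ (a Bernoulli design), and targets the contrast between "both endpoints treated" and "both endpoints untreated". Each observed edge outcome is weighted by the inverse probability of its exposure. The paper handles a general class of designs and estimands, and `ht_edge_effect` is a hypothetical name.

```python
def ht_edge_effect(edges, y_obs, z, p):
    """Horvitz-Thompson estimate of the mean edge-level effect Y_ij(1,1) - Y_ij(0,0).

    edges: list of directed edges (i, j) with i != j
    y_obs: dict mapping each edge to its observed outcome
    z:     realized node assignments (0/1), each drawn independently with P(z=1) = p
    """
    total = 0.0
    for i, j in edges:
        y = y_obs[(i, j)]
        if z[i] == 1 and z[j] == 1:
            total += y / p ** 2           # P(both endpoints treated) = p^2
        elif z[i] == 0 and z[j] == 0:
            total -= y / (1 - p) ** 2     # P(both endpoints control) = (1 - p)^2
        # mixed exposures (1,0) and (0,1) do not enter this contrast
    return total / len(edges)
```

Averaging this estimator over all $2^n$ assignments, weighted by their design probabilities, recovers the true mean effect exactly. The dependence between edges that share a node affects the variance, not the unbiasedness, which is why the paper needs dedicated variance estimators.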

2606.00934 2026-06-02 stat.ML cs.LG stat.AP stat.ME

Efficient Synthetic Network Generation via Latent Embedding Reconstruction

通过潜在嵌入重建的高效合成网络生成

Feifan Jiang, Yinan Bu, Shihao Wu, Gongjun Xu, Ji Zhu

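AI summary: Proposes SyNGLER, a framework built on latent space network models that generates synthetic networks by reconstructing latent node embeddings, balancing efficiency with structural fidelity.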

AI Chinese abstract

网络数据在社会科学、生物学和信息系统中无处不在。生成逼真的合成网络数据具有从网络模拟到科学发现的广泛应用。然而,许多现有的黑盒网络生成方法倾向于过拟合观测数据,同时忽视特征网络结构,并在大规模下产生大量计算开销。这些实际挑战要求合成网络生成方法既高效又能捕捉网络的结构特性。在本文中,我们介绍了通过潜在嵌入重建的合成网络生成(SyNGLER),这是一个基于潜在空间网络模型的通用且高效的合成网络生成框架。给定一个观测网络,SyNGLER首先通过潜在空间网络模型学习低维潜在节点嵌入,然后通过在这些嵌入上构建无分布生成器来重建潜在空间。对于生成,SyNGLER首先从潜在空间中的生成器采样(或重采样)节点嵌入,然后使用潜在空间网络模型生成合成网络。通过潜在空间框架,SyNGLER保留了网络中的独特特征,如稀疏性和节点度异质性,同时允许以比许多现有深度架构更低的计算成本进行高效训练。我们通过开发真实边缘分布与合成边缘分布之间距离的一致性结果来提供理论保证。实证研究进一步证明了SyNGLER的有效性,与现有方法相比,它高效地生成了更好地保留关键网络特征(如网络矩和度分布)的网络。代码可在 https://github.com/FeifanJiang/syngler 获取。

English abstract

Network data are ubiquitous across the social sciences, biology, and information systems. Generating realistic synthetic network data has broad applications from network simulation to scientific discovery. However, many existing black-box approaches for network generation tend to overfit observed data while overlooking characteristic network structure, and incur substantial computational overhead at scale. These practical challenges call for synthetic network generation methods that are both efficient and capable of capturing structural properties of networks. In this paper, we introduce Synthetic Network Generation via Latent Embedding Reconstruction (SyNGLER), a general and efficient framework for synthetic network generation that builds on latent space network models. Given an observed network, SyNGLER first learns low-dimensional latent node embeddings via a latent space network model and then reconstructs the latent space by building a distribution-free generator over these embeddings. For generation, SyNGLER first samples (or resamples) node embeddings from the generator in the latent space and then produces synthetic networks using the latent space network model. Through the latent space framework, SyNGLER preserves unique characteristics in networks such as sparsity and node degree heterogeneity, while allowing for efficient training with lower computational cost than many existing deep architectures. We provide theoretical guarantees by developing consistency results on the distance between the true and synthetic edge distributions. Empirical studies further demonstrate the effectiveness of SyNGLER, which efficiently produces networks that better preserve key network characteristics such as network moments and degree distributions compared with existing approaches. Code is available at https://github.com/FeifanJiang/syngler.
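Here is a rough sketch of the generation step. It makes three assumptions:

- The latent space model is logistic, with $P(A_{ij}=1) = \sigma(\alpha_i + \alpha_j + z_i^\top z_j)$.
- The degree parameters $\alpha$ and embeddings $z$ have already been fitted.
- A smoothed bootstrap (resample fitted nodes, add Gaussian jitter) stands in for the paper's distribution-free generator.

The model form, the `bandwidth` jitter, and the function names are illustrative assumptions and not the SyNGLER API. The authors' repository has the real implementation.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def generate_network(alphas, embeddings, n_new, bandwidth=0.1, seed=0):
    """Sample an undirected synthetic network from resampled latent node parameters."""
    rng = random.Random(seed)
    n = len(alphas)
    nodes = []
    for _ in range(n_new):
        k = rng.randrange(n)  # resample a fitted node
        a = alphas[k] + rng.gauss(0.0, bandwidth)
        z = [v + rng.gauss(0.0, bandwidth) for v in embeddings[k]]
        nodes.append((a, z))
    edges = []
    for i in range(n_new):
        for j in range(i + 1, n_new):
            (ai, zi), (aj, zj) = nodes[i], nodes[j]
            logit = ai + aj + sum(u * v for u, v in zip(zi, zj))
            if rng.random() < sigmoid(logit):
                edges.append((i, j))
    return edges
```

In this model, degree heterogeneity comes from the per-node $\alpha_i$ and sparsity from strongly negative $\alpha$ values. These are the properties the abstract says the latent space framework preserves.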

2606.00913 2026-06-02 stat.ML cs.LG

Bandit Simulation for Average Reward Inference

平均奖励推断的赌博机模拟

Samya Praharaj, Chih-Yu Chang, Koulik Khamaru, Kelly W. Zhang

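AI summary: Proposes the BSI framework, which fits a simulator of the environment and propagates parameter uncertainty to construct asymptotically valid confidence intervals for adaptive bandit algorithms.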

AI Chinese abstract

多臂赌博机算法越来越多地用于在线平台、临床试验和社会科学实验,但对其性能的有效统计推断仍然是一个开放挑战。部署赌博机后,一个自然的问题是能否为其平均奖励构建置信区间,并评估其是否可靠地优于基线策略。任何单次赌博机部署中获得的总奖励是随机的,由于奖励的随机性,在同一人群上部署两次赌博机通常会产生不同的奖励轨迹。标准统计推断方法无法使用,因为赌博机算法在收集的数据中引入了复杂的依赖性,违反了经典方法所依赖的独立同分布假设。此外,现有的自适应收集数据推断方法仅适用于不依赖于数据收集算法的估计量(例如固定动作下的平均奖励)。我们提出了用于推断的赌博机模拟(BSI),这是一个框架,它从观测数据(在线或离线)中拟合赌博机环境的模拟器,并用于估计任何评估策略(包括自适应黑盒算法)下的平均奖励。BSI将估计的模拟器参数的不确定性正式传播到置信区间构建中。此外,BSI的有效性仅需要对行为策略的弱探索假设,并避免了重要性加权。我们证明BSI产生渐近有效的置信区间,并通过实验证明在标准离线策略评估方法失败的情况下,BSI能保持名义覆盖。

English abstract

Multi-armed bandit algorithms are increasingly used in online platforms, clinical trials, and social science experiments, but valid statistical inference on their performance remains an open challenge. After deploying a bandit algorithm, a natural question is whether one can construct a confidence interval for its mean reward and assess whether it reliably outperforms a baseline policy. The total reward achieved in any single bandit deployment is random, and deploying a bandit twice on the same population typically yields different reward trajectories due to stochastic rewards. Standard statistical inference methods cannot be used because bandit algorithms introduce complex dependencies into the collected data, which violate the i.i.d. assumption underlying many classical approaches. Moreover, existing inference methods for adaptively collected data only apply to estimands that do not depend on the data-collection algorithm (such as the mean reward under a fixed action). We propose Bandit Simulation for Inference (BSI), a framework that fits a simulator of the bandit environment from observed data (either on-policy or off-policy) and uses it to estimate the mean reward under any evaluation policy, including adaptive black-box algorithms. BSI formally propagates uncertainty in the estimated simulator parameters into the confidence interval construction. Furthermore, BSI requires only weak exploration assumptions on the behavior policy to be valid, and it avoids importance weighting. We prove that BSI yields asymptotically valid confidence intervals, and demonstrate empirically that it maintains nominal coverage in settings where standard off-policy evaluation methods fail.
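The simulate-and-propagate idea can be illustrated with a deliberately simple, hypothetical sketch. It is not the BSI procedure and carries no coverage guarantee. It assumes:

- Bernoulli arms summarized by logged per-arm success counts.
- A normal approximation to the sampling distribution of the fitted arm means.
- An epsilon-greedy evaluation policy.
- A percentile interval over the simulated values.

All function names and defaults are illustrative.

```python
import random
import statistics


def epsilon_greedy_mean_reward(theta, horizon, eps, rng):
    """One episode of epsilon-greedy in a Bernoulli-arm simulator; returns mean reward."""
    k = len(theta)
    pulls, wins, total = [0] * k, [0] * k, 0
    for t in range(horizon):
        if t < k:
            arm = t                                  # pull every arm once first
        elif rng.random() < eps:
            arm = rng.randrange(k)                   # explore
        else:
            arm = max(range(k), key=lambda a: wins[a] / pulls[a])  # exploit
        reward = 1 if rng.random() < theta[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total / horizon


def simulation_interval(successes, counts, horizon=100, eps=0.1,
                        n_param=200, n_runs=10, level=0.95, seed=0):
    """Percentile interval for the evaluation policy's expected mean reward."""
    rng = random.Random(seed)
    theta_hat = [s / n for s, n in zip(successes, counts)]
    values = []
    for _ in range(n_param):
        # Perturb the fitted simulator using a normal approximation to theta_hat
        theta = [min(1.0, max(0.0, rng.gauss(th, (th * (1 - th) / n) ** 0.5)))
                 for th, n in zip(theta_hat, counts)]
        # Monte Carlo estimate of the policy's expected reward in this simulator
        runs = [epsilon_greedy_mean_reward(theta, horizon, eps, rng) for _ in range(n_runs)]
        values.append(statistics.fmean(runs))
    values.sort()
    lo = values[int((1 - level) / 2 * n_param)]
    hi = values[int((1 + level) / 2 * n_param) - 1]
    return lo, hi
```

The evaluation policy can be any adaptive algorithm, because it is simply run inside the simulator. This is the property the abstract highlights, and it avoids importance weighting.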

2606.00900 2026-06-02 stat.ME

Notes on Randomized Controlled Trials for Studying Social Media Harms

关于研究社交媒体危害的随机对照试验的注记

Chris Felton

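AI summary: Points out limitations of randomized controlled trials and person-level observational studies for studying social media harms, emphasizes the distinction between local and global interventions, and proposes triangulating different forms of imperfect evidence to better understand social media's aggregate effect on adolescent mental health.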

Comments
34 pages, 2 figures
AI Chinese abstract

随机对照试验(RCTs)和个人层面观察性研究在关于社交媒体危害的辩论中占据重要地位。我强调了此类证据的一些未被充分认识的局限性。最重要的是,已发表的RCTs通常识别的是\textit{局部}或小规模干预的效果:一个人被分配退出社交媒体,但她的直接同伴仍然大量使用它。相比之下,社交媒体的批评者关注的是\textit{全局}或大规模干预:美国青少年中社交媒体的广泛采用。这种全局干预既改变了近端社会环境,也改变了更广泛的文化,可能伤害完全戒除社交媒体的青少年。本文详细讨论了局部-全局的区别,并就现有RCTs和个人层面观察性研究了解社交媒体危害的局限性提供了其他注记。我建议,三角验证不同形式的不完美证据可能为社交媒体对青少年心理健康的总体影响提供最深入的见解。

English abstract

Randomized controlled trials (RCTs) and person-level observational studies feature prominently in debates over social media harms. I highlight some under-acknowledged limitations of such evidence. Most important is that published RCTs typically identify effects of a \textit{local}, or small-scale, intervention: a person is assigned to quit social media, but her immediate peers continue using it in large numbers. Critics of social media, in contrast, focus on a \textit{global}, or large-scale, intervention: the mass adoption of social media among U.S. teenagers. Such global interventions alter both the proximal social environment and the broader culture, potentially harming teenagers who abstain from social media entirely. This paper discusses the local--global distinction at length and offers other notes on the limits of learning about social media harms from existing RCTs and person-level observational studies. I suggest that triangulating different forms of imperfect evidence may provide the deepest insights about social media's aggregate effect on teen mental health.

2606.00887 2026-06-02 stat.ME math.ST stat.TH

Hypothesis Testing for a Functional Parameter via Self-normalization

通过自归一化对函数参数进行假设检验

Yi Zhang, Xiaofeng Shao

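AI summary: Proposes a sample-splitting-based self-normalization method for testing simple or composite hypotheses on a functional parameter, and derives the limiting null distribution and the power function under local alternatives.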

Journal ref
Journal of the American Statistical Association 2025 120 (552)
AI Chinese abstract

对函数参数进行简单或复合假设检验在时间序列分析中引起了广泛关注。为了适应未知的时间依赖性,经典的非参数方法如块自助法和子抽样法都涉及带宽参数,其选择会显著影响有限样本性能。自归一化(SN)方法在应用于有限维参数的推断时无需调整参数,但其对函数参数的适用性尚不清楚。在本文中,我们提出了一种基于样本分割的方法,将SN方法推广到函数参数的假设检验。我们的SS-SN(样本分割加自归一化)思想广泛适用于许多函数参数的检验问题,包括检验边际累积分布函数的简单/复合假设、检验时间可逆性以及检验多元时间序列谱分布的变化点。具体地,我们推导了在简单和复合零假设下SS-SN检验统计量的渐近分布,并推导了局部备择下的渐近功效函数。数值模拟表明,与许多现有方法相比,我们的新检验倾向于产生准确的检验水平并具有有竞争力的功效性能。

English abstract

Testing simple or composite hypotheses on a functional parameter has attracted considerable attention in time series analysis. To accommodate the unknown temporal dependence, classical nonparametric approaches such as block bootstrapping and subsampling all involve a bandwidth parameter, the choice of which can substantially affect finite-sample performance. The self-normalization (SN) method is tuning-parameter free when applied to inference on a finite-dimensional parameter, but its applicability to a functional parameter is unknown. In this paper, we propose a sample splitting based approach to generalize the SN method to hypothesis testing of a functional parameter. Our SS-SN (sample splitting plus self-normalization) idea is broadly applicable to many testing problems for functional parameters, including testing simple/composite hypotheses on the marginal cumulative distribution function, testing for time-reversibility, and testing for a change point in the spectral distribution of a multivariate time series. Specifically, we derive the pivotal limiting distributions of our SS-SN test statistics under the null for both simple and composite null hypotheses, and derive the limiting power function under local alternatives. Numerical simulations show that our new tests tend to yield accurate size with competitive power compared with many existing tests.
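For reference, here is the finite-dimensional SN building block that the abstract generalizes, in its standard form for a scalar mean. The long-run variance is replaced by a normalizer built from recursive partial sums, so no bandwidth is needed. Under $H_0$ and weak dependence, $T_n \to W(1)^2 / \int_0^1 (W(r) - rW(1))^2\,dr$, a pivotal but non-standard limit, where $W$ is standard Brownian motion. This sketch covers only the scalar-mean case. The paper's SS-SN extension to functional parameters is not reproduced.

```python
def sn_statistic(x, mu0=0.0):
    """Self-normalized statistic for H0: E[X_t] = mu0 (scalar series).

    T_n = n (xbar - mu0)^2 / W_n,  W_n = n^{-2} sum_t (S_t - t * xbar)^2,
    where S_t is the t-th partial sum. No bandwidth is needed.
    """
    n = len(x)
    xbar = sum(x) / n
    partial, w = 0.0, 0.0
    for t, xt in enumerate(x, start=1):
        partial += xt
        w += (partial - t * xbar) ** 2
    w /= n ** 2
    if w == 0.0:
        raise ValueError("degenerate series: self-normalizer is zero")
    return n * (xbar - mu0) ** 2 / w
```

Because the numerator and the normalizer scale the same way, the statistic is invariant to rescaling the data (with $\mu_0 = 0$). Large values reject $H_0$, using tabulated critical values of the non-standard limit.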

2606.00878 2026-06-02 stat.ME

Anytime-valid testing with e-values and confirmatory adaptive designs

基于e值的任意时刻验证检验与确认性自适应设计

Werner Brannath, Lasse Fischer

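AI summary: Compares anytime-valid testing based on e-values with confirmatory adaptive designs, revealing their formal and methodological connections and differences, and identifies the strengths of each and what each approach can learn from the other.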

AI Chinese abstract

确认性自适应设计在30多年前被引入,允许在临床试验过程中进行样本量重新评估以及治疗、终点和亚群的选择。最近,基于e值的序贯检验被开发用于任意时刻有效推断,承诺提供看似相似甚至更大的灵活性和实用性。在本文中,我们比较这两个独立发展的概念,揭示它们在形式和方法上的联系与差异。具体来说,我们表明自适应设计工具如条件误差函数和组合检验在形式上等价于基于e值的任意时刻有效序贯检验。然而,尽管它们共同的基本意图是为统计推断带来灵活性,但它们有相当不同的侧重点:使用组合检验和条件误差函数的假设检验通常旨在在所提供灵活性下耗尽I类错误率,而基于e值的检验则侧重于可选继续、所选水平以及最近扩展中要控制的损失函数方面的额外灵活性。我们还指出最近的e值成果如何能够丰富临床试验方法学,以及自适应设计方法学如何能够启发和改进基于e值的检验。

English abstract

Confirmatory adaptive designs were introduced more than 30 years ago and enable, for example, sample size re-assessment and the selection of treatments, endpoints, and subpopulations during the course of a clinical trial. Recently, sequential tests based on e-values for anytime-valid inference have been developed, promising seemingly similar or even greater flexibility and utility. In this note, we compare these two independently developed concepts, shedding light on their formal and methodological connections and differences. Specifically, we show that adaptive design tools like conditional error functions and combination tests are formally equivalent to e-value based, anytime-valid sequential tests. However, despite their common fundamental intention to bring flexibility into statistical inference, they have quite different emphases: hypothesis testing with combination tests and conditional error functions usually intends to exhaust the type I error rate under the offered flexibility, whereas e-value based testing aims at additional flexibility with regard to optional continuation, the chosen level and, in recent extensions, the loss functions to be controlled. We also indicate how recent e-value achievements could enrich clinical trial methodology, and how adaptive design methodology could inspire and improve e-value based testing.
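Here is a minimal example of an anytime-valid e-value test. It assumes Gaussian data with known unit variance, $H_0: \mu = 0$, and a fixed alternative $\mu = \delta$. The running product of likelihood ratios $M_t$ is a nonnegative martingale with mean 1 under $H_0$. By Ville's inequality, $P(\sup_t M_t \ge 1/\alpha) \le \alpha$, so rejecting the first time $M_t \ge 1/\alpha$ controls the type I error at any data-dependent stopping time. The choice of $\delta$ and the function name are illustrative assumptions.

```python
import math


def lr_eprocess_test(xs, delta=0.5, alpha=0.05):
    """Return the first time t at which the likelihood-ratio e-process reaches 1/alpha.

    Each factor exp(delta * x - delta**2 / 2) is the N(delta, 1) vs N(0, 1)
    likelihood ratio, an e-value with expectation 1 under H0.
    Returns None if the threshold is never crossed.
    """
    log_m = 0.0
    threshold = math.log(1.0 / alpha)
    for t, x in enumerate(xs, start=1):
        log_m += delta * x - delta ** 2 / 2
        if log_m >= threshold:
            return t
    return None
```

Multiplying conditional e-values across stages in this way parallels how a product-type combination test accumulates evidence across trial stages. This is one intuition for the formal equivalence the note establishes.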

2606.00867 2026-06-02 stat.ML cs.LG eess.SP

Statistical Analysis of using the Shapley Value for Sensor Anomaly Localization with Accurate Classifiers

使用Shapley值进行传感器异常定位与准确分类器的统计分析

Xubin Fang, Rick S. Blum

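AI summary: Analyzes the performance of the Shapley value for sensor anomaly localization using mathematically defined optimal binary classifiers, proving that the Shapley-based test is equivalent to a lower-complexity test under independent observations but fundamentally different in correlated bivariate Gaussian/Laplacian scenarios, and provides the first theoretical statistical results on this approach.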

AI Chinese abstract

最近的出版物建议使用Shapley值进行传感器异常/攻击定位。我们通过在Shapley值计算中使用数学定义的二元最优分类器来研究这种方法的性能。为了判断定位性能,我们研究给定传感器观测的Shapley值确定该观测是否异常的能力。首先,我们证明对于独立传感器观测的情况,使用Shapley值的优化异常测试等价于使用Shapley值计算中单个项的优化低复杂度异常测试,产生完全相同的错误概率。对于涉及两个传感器的一些流行的相关观测情况,包括相关双变量高斯/拉普拉斯概率密度函数和常数/高斯攻击/异常,我们证明这两个测试本质上是不同的,产生不同的决策区域和错误概率。此外,我们证明在某些统计相关的双变量高斯场景中,当相关幅度较大且存在加性攻击/异常时,Shapley值测试有时严格劣于另一个(Shapley计算中的单个项)测试,而在其他情况下则严格优于它,具体取决于相关的符号。在这些情况下,可以结合这两种方法以获得严格更好的方法。这些结果首次提供了基于Shapley定位的理论统计分析,鉴于许多研究人员广泛接受Shapley值,这些结果似乎非常有趣,并应鼓励对该主题的进一步研究。提供了数值结果以说明我们的发现。

English abstract

Recent publications have suggested using the Shapley value for sensor anomaly/attack localization. We study the performance of such an approach by using mathematically defined optimum binary classifiers in the Shapley value calculation. To judge localization performance, we study the ability of the Shapley value of a given sensor observation to determine if that observation is anomalous. First, we prove that for cases with independent sensor observations, an optimized anomaly test using the Shapley value is equivalent to an optimized lower-complexity anomaly test using a single term in the Shapley value calculation, yielding the exact same probability of error. For some popular dependent observation cases involving two sensors, including correlated bivariate Gaussian/Laplacian probability density functions and constant/Gaussian attacks/anomalies, we prove that these two tests are fundamentally different, yielding different decision regions and error probabilities. Further, we prove that the Shapley value test is sometimes strictly inferior to the other (single term in Shapley calculation) test in certain statistically dependent bivariate Gaussian scenarios with large correlation magnitude and additive attacks/anomalies, while it is strictly superior in others, depending on the sign of the correlation. One can combine these two approaches to obtain a strictly better approach in these cases. These results, which provide the first theoretical statistical analysis of Shapley-based localization, seem very interesting based on the wide acceptance of the Shapley value by many researchers and should encourage further research on this topic. Numerical results are provided which illustrate our findings.
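For two sensors, the Shapley value of sensor 1 averages two marginal contributions:

$$\phi_1 = \tfrac{1}{2}[v(\{1\}) - v(\emptyset)] + \tfrac{1}{2}[v(\{1,2\}) - v(\{2\})].$$

The "single term" test in the abstract uses just one such term. Below is a generic exact computation over all coalitions, with a value function `v` supplied by the caller. In the paper, `v` is built from optimum binary classifiers, which this sketch does not reproduce, and the example payoffs in any usage are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1.

    v maps a frozenset of players (a coalition) to a real-valued payoff.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition S of this size not containing i: |S|!(n-|S|-1)!/n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi
```

The cost grows as $2^{n-1}$ coalitions per player. This is one practical reason a single-term test is attractive whenever it performs as well, as the abstract proves for independent observations.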

2606.00864 2026-06-02 stat.ME math.ST stat.TH

Another Look at Bandwidth-free Inference: a Sample Splitting Approach

再论无带宽推断:一种样本分割方法

Yi Zhang, Xiaofeng Shao

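AI summary: Bandwidth-free tests for a multi-dimensional parameter suffer severe size distortion when the sample size is moderate and both the dimension and the temporal dependence are moderate. This paper proposes a sample-splitting approach that reduces the parameter to one dimension, constructs L∞-type and L2-type SS-SN test statistics, derives their limiting distributions, and effectively alleviates the size distortion.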

Journal ref
Journal of the Royal Statistical Society Series B: Statistical Methodology 2024 86 (1)
AI Chinese abstract

多维参数的无带宽检验/推断在计量经济学和统计学文献中引起了广泛关注。由于无需调节参数,这些检验易于实施,并且与传统的基于HAC的方法(涉及一致长期方差估计)相比,具有更准确的尺寸。然而,当样本量较小/中等时,如果参数维度和时间相依性的幅度适中,这些无带宽检验会出现较大的尺寸扭曲,使其在实践中不可靠。在本文中,我们提出了一种基于样本分割的方法,将参数维度降至一维,以便进行后续的无带宽推断。我们的SS-SN(样本分割加自归一化)思想广泛适用于时间序列的许多检验问题,包括均值检验、零自相关检验、时间序列回归模型中的线性假设检验以及多元均值的变点检验。具体地,我们提出了L∞型和L2型SS-SN检验统计量,推导了它们在原假设和备择假设下的极限分布,并通过模拟显示了它们在缓解尺寸扭曲方面的有效性。作为一个重要的理论贡献,当维度随样本量增长而发散时,我们得到了多元均值检验问题中两种SS-SN检验统计量的极限分布。此外,我们证明了在增长维度设定下,L∞型和L2型SS-SN检验统计量在原假设下的渐近独立性。

English abstract

The bandwidth-free tests/inferences for a multi-dimensional parameter have attracted considerable attention in the econometrics and statistics literature. These tests can be conveniently implemented due to their tuning-parameter-free nature and possess more accurate size than the traditional HAC-based approaches, which involve consistent long-run variance estimation. However, when the sample size is small or medium, these bandwidth-free tests exhibit large size distortion when both the dimension of the parameter and the magnitude of temporal dependence are moderate, making them unreliable in practice. In this paper, we propose a sample splitting based approach to reduce the dimension of the parameter to one for the subsequent bandwidth-free inference. Our SS-SN (sample splitting plus self-normalization) idea is broadly applicable to many testing problems for time series, including mean testing, testing for zero autocorrelation, linear hypothesis testing in a time series regression model, and testing for a change point in the multivariate mean. Specifically, we propose $L_{\infty}$-type and $L_2$-type SS-SN test statistics, derive their limiting distributions under both the null and alternatives, and show their effectiveness in alleviating size distortion via simulations. As an important theoretical contribution, we obtain the limiting distributions of both SS-SN test statistics in the multivariate mean testing problem when the dimension is allowed to diverge as the sample size grows to infinity. In addition, we show the asymptotic independence of the $L_{\infty}$-type and $L_2$-type SS-SN test statistics under the null in the growing-dimensional setting.
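Here is a minimal sketch of the sample-splitting step for mean testing. It assumes $H_0: E[X_t] = 0$ for a $d$-dimensional series and proceeds in three steps:

1. Estimate a direction from the first contiguous block.
2. Project the second block onto that direction.
3. Apply the scalar SN statistic to the resulting one-dimensional series.

This simple projection rule is an illustrative assumption. The paper's $L_{\infty}$-type and $L_2$-type statistics are constructed differently and are not reproduced, and the function names are hypothetical.

```python
import math


def sn_statistic(y):
    """Self-normalized statistic for H0: E[Y_t] = 0 on a scalar series."""
    n = len(y)
    ybar = sum(y) / n
    partial, w = 0.0, 0.0
    for t, yt in enumerate(y, start=1):
        partial += yt
        w += (partial - t * ybar) ** 2
    if w == 0.0:
        raise ValueError("degenerate series: self-normalizer is zero")
    return n * ybar ** 2 / (w / n ** 2)


def ss_sn_mean_test(xs, split=0.5):
    """Sample-splitting + SN sketch for H0: E[X_t] = 0, X_t a d-vector."""
    n1 = int(len(xs) * split)
    first, second = xs[:n1], xs[n1:]          # contiguous blocks keep time order
    d = len(xs[0])
    # Direction estimated from the first block only
    direction = [sum(x[k] for x in first) / n1 for k in range(d)]
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("first-block mean is zero; no direction to project on")
    direction = [c / norm for c in direction]
    # Project the second block to a scalar series, then self-normalize
    projected = [sum(c * v for c, v in zip(direction, x)) for x in second]
    return sn_statistic(projected)
```

Because only a scalar series is self-normalized, the limiting null distribution no longer depends on $d$. This is the mechanism by which the abstract's dimension reduction targets size distortion.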

2606.00858 2026-06-02 stat.ME math.ST stat.TH

Change-Point Detection for Object-valued Time Series

对象值时间序列的变点检测

Yi Zhang, Changbo Zhu, Xiaofeng Shao

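AI summary: For object-valued time series in metric spaces, proposes a self-normalization-based statistic for detecting changes in the marginal distribution, combines it with the Wild Binary Segmentation algorithm to estimate multiple change points, proves its asymptotic properties, and demonstrates broad applicability.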

Journal ref
Journal of Business and Economic Statistics 2026 44 (1)
AI Chinese abstract

本文关注度量空间中对象值数据的变点检测,该问题近期在统计学和计量经济学文献中引起兴趣。现有方法要么专注于独立数据,要么只能检测Fréchet均值或方差的变化。本文提出基于自归一化(SN)的统计量,用于检测对象值时间序列边际分布的变化。我们的检验普遍适用于各种对象值数据,如分布数据和网络数据,并能处理弱序列相关性。此外,所提出的检验统计量几乎无需调整参数,具有枢轴极限零分布,且仅使用成对距离。当与Wild Binary Segmentation算法(WBS)结合时,我们的统计量可用于估计多个变点的数量和位置。在单一变点设定下,推导了基于SN的统计量在原假设和局部备择假设下的渐近结果。首次在非参数设定下,对广泛的对象值时间序列类证明了WBS估计的一致性,这需要新的非标准理论论证。通过大量数值实验和真实数据分析,展示了我们方法的有效性和广泛适用性。

English abstract

This article is concerned with change point detection for object-valued data that reside in a metric space, which has attracted recent interest in the statistics and econometrics literature. The existing methods either focus on independent data or can only detect changes in the Fréchet mean or variance. In this paper, we propose a self-normalization (SN, hereafter) based statistic for detecting a shift in the marginal distribution of object-valued time series. Our test is universally applicable to a wide range of object-valued data, such as distributional and network data, and can accommodate weak serial dependence. In addition, the proposed test statistic is almost tuning-parameter free, has a pivotal limiting null distribution, and only uses pairwise distances. When combined with the Wild Binary Segmentation algorithm (WBS, hereafter), our statistic can be used to estimate the number and locations of multiple change points. Asymptotic results for our SN based statistic are derived under both the null and local alternatives in the single change point setting. For the first time, WBS estimation consistency is shown for a broad class of object-valued time series in a nonparametric setting, which requires new non-standard theoretical arguments. Extensive numerical experiments and real data analysis are conducted to illustrate the effectiveness and broad applicability of our proposed method.

2606.00847 2026-06-02 stat.ME

Partial Identification under High-Dimensional Potential Outcomes and Confounders via Optimal Transport

高维潜在结果和混杂因素下的部分识别:基于最优传输的方法

Yunfeng Wang, Zhiheng Zhang, Zijun Gao

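AI summary: To address the computational and statistical intractability of optimal transport methods with high-dimensional potential outcomes and confounders, proposes decomposing the transport problem into a low-dimensional signal subspace and a high-dimensional residual subspace, and recovering the residual transport energy with the sliced Wasserstein distance, yielding tighter causal bounds.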

Comments
19 pages, 3 figures
AI Chinese abstract

当点识别不可能时,部分识别提供了有信息量的因果保证,但基于最优传输(OT)的现有方法在高维设置中变得计算和统计上不可行。当潜在结果和混杂因素都是高维时,这一局限性尤为严重,经典OT界遭受维数灾难和不利的收敛速度。为了应对这一挑战,我们提出了一种新颖的估计器,将传输问题分解为低维信号子空间和高维残差子空间。与丢弃残差信息的现有基于投影的方法不同,我们使用切片Wasserstein距离恢复残差传输能量,该距离计算高效且对高维鲁棒。我们建立了基于残差结构控制近似间隙的可解释条件,并提供了信号维度选择的数据驱动规则。实证结果表明,我们的估计器通过恢复丢失的传输能量,始终优于仅投影的基线,在高维中产生更有信息量的因果界,同时保持计算可行性。

English abstract

Partial identification provides informative causal guarantees when point identification is impossible, but existing approaches based on optimal transport (OT) become computationally and statistically intractable in high-dimensional settings. This limitation is particularly severe when both potential outcomes and confounders are high-dimensional, where classical OT-based bounds suffer from the curse of dimensionality and unfavorable convergence rates. To address this challenge, we propose a novel estimator that decomposes the transport problem into a low-dimensional signal subspace and a high-dimensional residual subspace. Unlike existing projection-based methods that discard residual information, we recover the residual transport energy using the Sliced Wasserstein distance, which is computationally efficient and robust to high dimensions. We establish interpretable conditions controlling the approximation gap based on residual structure and provide a data-driven rule for signal dimension selection. Empirical results show that our estimator consistently outperforms projection-only baselines by recovering lost transport energy, yielding more informative causal bounds while remaining computationally tractable in high dimensions.
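The sliced Wasserstein distance mentioned above replaces one high-dimensional transport problem with many one-dimensional ones. Each 1-D problem between empirical measures is solved exactly by sorting. Here is a minimal Monte Carlo sketch, assuming two point clouds of equal size and uniformly random projection directions. The paper's signal/residual decomposition is not reproduced, and the function name is hypothetical.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_proj=200, p=2, seed=0):
    """Monte Carlo sliced p-Wasserstein distance between equal-size point clouds."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    d = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_proj):
        # Uniform random direction on the unit sphere
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        px = sorted(sum(a * b for a, b in zip(theta, x)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(theta, y)) for y in ys)
        # 1-D Wasserstein between empirical measures = matching sorted samples
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_proj) ** (1.0 / p)
```

Each slice costs $O(n \log n)$ regardless of dimension. This is why the abstract calls the distance computationally efficient and robust to high dimensions. For a pure shift $s$ in $d$ dimensions with $p = 2$, the value is approximately $\lVert s \rVert / \sqrt{d}$.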

2606.00839 2026-06-02 stat.ME

Sequential multiple testing with multiple hypotheses and prior information on the hypothesis configuration

具有多个假设和假设配置先验信息的序贯多重检验

Yiming Xing

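AI summary: For multiple independent, sequentially observed data streams, each with multiple candidate hypotheses, uses prior information on the hypothesis configuration to design a sequential testing procedure that is reliable, computationally efficient, and asymptotically optimal.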

Comments
20 pages, 4 figures
AI Chinese abstract

本文研究了多个独立、序贯观测数据流的边际分布检验问题,其中每个数据流有多个候选假设可供选择,并且存在关于未知假设配置的先验信息。目标是理解此类信息的益处,并设计一种有效利用该信息的序贯检验程序。我们从任意先验信息出发,并具体化为若干实例,包括已知每个假设对应的数据流数量或下界,以及存在互斥假设的情况。所设计的程序具有三重特性:(i) 可靠,即能够将各类族系错误概率控制在用户指定的任意水平以下;(ii) 计算高效,即在决策时专注于最小备择假设配置集;(iii) 渐近最优,即当错误水平趋近于零时,在所有可靠程序中实现最小期望样本量。文中给出了数值研究以作说明。

English abstract

In this work, we study the problem of testing the marginal distributions of multiple independent, sequentially observed data streams, where for each stream there are multiple candidate hypotheses to select from, in the presence of prior information on the unknown hypothesis configuration. The goal is to understand the benefit of such information and to design a sequential testing procedure that effectively leverages it. We start with arbitrary prior information and specialize to concrete examples, including a known number of, or a known lower bound on the number of, streams following each hypothesis, and the presence of exclusive hypotheses. The designed procedure has three properties: (i) it is reliable, i.e., it controls all types of familywise error probabilities below arbitrary user-specified levels; (ii) it is computationally efficient, i.e., it focuses on minimal sets of alternative hypothesis configurations in making decisions; and (iii) it is asymptotically optimal, i.e., it achieves the minimum expected sample size among all reliable procedures as the error levels go to zero. Numerical studies are presented for illustration.
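As a point of reference, here is the classical building block for a single stream with only two hypotheses: Wald's sequential probability ratio test. This example assumes Gaussian observations with known variance and uses Wald's approximate thresholds $\log\frac{1-\beta}{\alpha}$ and $\log\frac{\beta}{1-\alpha}$. The paper's multi-stream, multi-hypothesis procedure and its use of prior information on the configuration are not reproduced.

```python
import math


def sprt_gaussian(xs, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05, beta=0.05):
    """Wald SPRT of N(mu0, sigma^2) vs N(mu1, sigma^2) on one stream.

    Returns (decision, number of observations used).
    """
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for t, x in enumerate(xs, start=1):
        # Log-likelihood ratio increment for one Gaussian observation
        llr += ((mu1 - mu0) * x - (mu1 ** 2 - mu0 ** 2) / 2) / sigma ** 2
        if llr >= upper:
            return "H1", t
        if llr <= lower:
            return "H0", t
    return "undecided", len(xs)
```

Prior knowledge, such as how many streams follow each hypothesis, lets a joint procedure stop some streams earlier than independent per-stream SPRTs would. This is the benefit the abstract sets out to quantify.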

2606.00834 2026-06-02 stat.AP cs.AI cs.LG math.PR

Hybrid Probabilistic Forecasting of Under-Five Malaria Admissions in Ghana: A Gaussian Process Regression with Holt-Winters Smoothing

加纳五岁以下儿童疟疾住院人数的混合概率预测:高斯过程回归与Holt-Winters平滑

T. Ansah-Narh, Y. Asare Afrane, J. Bremang Tandoh

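AI summary: To address seasonality and data uncertainty in malaria forecasting in Ghana, proposes a hybrid model combining Gaussian Process Regression with Holt-Winters exponential smoothing, producing probabilistic forecasts and evaluating their performance.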

Comments
24 pages, 8 figures, accepted for publication in Artificial Intelligence in Medicine
AI Chinese abstract

准确的疟疾预测在撒哈拉以南非洲仍是一个重大挑战,那里强烈的季节性、报告不确定性和非平稳传播动态降低了传统模型的可靠性。在加纳,地区级疟疾监测需要概率上严谨且数据有限时稳健的预测框架。本研究提出了一个混合框架,将高斯过程回归(GPR)与Holt-Winters指数平滑相结合,用于建模每月五岁以下儿童疟疾住院人数。GPR捕捉非线性行为和预测不确定性,而Holt-Winters稳定长期预测并保留季节结构。使用十年(2014-2023年)的地区级数据,通过滚动起点扩展窗口验证评估性能。混合模型实现了$R^2 = 0.9906$,而单独Holt-Winters为$0.8213$,$94.2\%$的残差在$\pm 2σ$范围内。2024-2028年的预测显示月平均住院人数约为8,000至12,200例。时空分析揭示了显著的生态异质性:北部高负担地区尽管绝对波动较大,但相对模式稳定。该框架为疟疾流行地区的早期预警和运营规划提供了一种可扩展的概率方法,支持加纳国家疟疾控制战略。

English abstract

Accurate malaria forecasting remains a major challenge in sub-Saharan Africa, where strong seasonality, reporting uncertainty, and non-stationary transmission dynamics reduce the reliability of conventional models. In Ghana, district-level malaria surveillance requires forecasting frameworks that are probabilistically rigorous and robust under limited data. This study proposes a hybrid framework integrating Gaussian Process Regression (GPR) with Holt-Winters exponential smoothing for modelling monthly under-five malaria admissions. GPR captures non-linear behaviour and predictive uncertainty, while Holt-Winters stabilises long-horizon forecasts and preserves seasonal structure. Using ten years of district-level data (2014-2023), performance was evaluated via rolling-origin expanding-window validation. The hybrid model achieved $R^2 = 0.9906$ versus $0.8213$ for Holt-Winters alone, with $94.2\%$ of residuals within $\pm 2\sigma$ bounds. Forecasts for 2024-2028 project average monthly admissions from approximately 8,000 to 12,200 cases. Spatio-temporal analysis revealed pronounced ecological heterogeneity: northern high-burden districts exhibited stable relative patterns despite large absolute fluctuations. The framework provides a scalable probabilistic approach for malaria early warning and operational planning in endemic settings, supporting Ghana's national malaria control strategy.
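Here is the Holt-Winters half of the hybrid, written as the standard additive recursion for level, trend, and seasonal indices. For monthly admissions, the season length would be $m = 12$. The smoothing constants and the initialisation from the first two seasons are illustrative assumptions, not the study's fitted values. The GPR component, and how the paper combines it with Holt-Winters, are not reproduced.

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2, horizon=12):
    """Additive Holt-Winters forecasts for series y with season length m.

    alpha, beta, gamma are illustrative smoothing constants, not fitted values.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")
    # Initialise from the first two seasons
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / m ** 2
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        s = season[t % m]                      # seasonal index from one season ago
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    n = len(y)
    # h-step-ahead forecast: level + h * trend + matching seasonal index
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]
```

On a purely periodic series the recursion reproduces the seasonal pattern. This is the "preserves seasonal structure" role the abstract assigns to Holt-Winters.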

2606.00797 2026-06-02 stat.ME stat.AP stat.ML

Robust inference for risk heterogeneity under group imbalance

群体不平衡下风险异质性的稳健推断

Mengqi Xu, Subha Maity, Joel Dubin

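AI summary: Addressing group imbalance, proposes a Neyman-orthogonality-based framework for inferring risk heterogeneity that reduces bias and improves inferential stability through locally insensitive estimators, and finds ethnicity-specific risk heterogeneity in the eICU database that standard methods fail to detect.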

AI Chinese abstract

人群层面的异质性在生物医学数据中普遍存在,不同人口统计学或临床亚组之间的差异可能显著改变风险模式。例如,在重症监护病房(ICU)研究中,特定入院诊断相关的死亡风险可能因种族群体而异。现有的风险异质性检测方法通常对基线模型误设和正则化偏差敏感,而这两者在实践中经常出现。本文中,我们提出一个稳健框架,利用Neyman正交性推断两个群体之间的风险异质性,该框架得到的估计量对 nuisance 参数估计误差局部不敏感。所提出的估计量一致且渐近正态,模拟研究表明,在有限样本下,我们的方法相比标准基于似然的方法显著降低了偏差并提高了推断稳定性。在eICU合作研究数据库的应用中,我们的方法揭示了标准基于似然方法无法检测到的、具有临床意义的种族特异性入院诊断死亡风险异质性。

English abstract

Population-level heterogeneity is ubiquitous in biomedical data, where differences across demographic or clinical subgroups can substantially alter risk patterns. For example, in intensive care unit (ICU) studies, the mortality risk associated with specific admission diagnoses can vary across ethnic groups. Existing approaches for detecting risk heterogeneity are often sensitive to baseline model misspecification and regularization bias, both of which commonly arise in practice. In this paper, we propose a robust framework for inferring risk heterogeneity between two populations using Neyman orthogonality, which yields estimators that are locally insensitive to nuisance parameter estimation error. The proposed estimator is consistent and asymptotically normal, and simulation studies demonstrate that in finite samples our method substantially reduces bias and improves inferential stability compared with standard likelihood-based approaches. In an application to the eICU Collaborative Research Database, our method reveals clinically meaningful ethnicity-specific heterogeneity in admission diagnoses for in-hospital mortality that standard likelihood-based methods fail to detect.

2606.00783 2026-06-02 stat.AP cs.AI math.PR stat.CO

Bayesian Inference of Nonlinear Malaria Dynamics in Ghana via an Ensemble Markov Chain Monte Carlo Sampler

加纳非线性疟疾动力学的贝叶斯推断:基于集成马尔可夫链蒙特卡洛采样器

T. Ansah-Narh, Y. Asare Afrane, J. Bremang Tandoh

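AI summary: To address short, noisy, and spatially heterogeneous malaria surveillance data in Ghana, proposes a Bayesian nonlinear inference framework that combines a cubic baseline with a damped oscillatory kernel, with parameters estimated by an affine-invariant ensemble Markov chain Monte Carlo sampler. The framework achieves high-accuracy fits and probabilistic forecasts, reveals spatial heterogeneity, and projects a malaria resurgence over 2024-2026.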

Journal ref
Expert Systems with Applications, Volume 312, 131540 (2026)
Comments
27 pages, 15 figures, published in Expert Systems with Applications
AI中文摘要

可靠量化撒哈拉以南非洲疟疾动态受到短、噪声大且空间异质的监测记录阻碍。在加纳,2014年至2023年的卫生设施数据揭示了住院人数的非线性和年龄特异性波动,然而现有方法难以捕捉随机变异或提供可信的不确定性区间。本研究开发了一个贝叶斯非线性推断框架,该框架将三次基线与阻尼振荡核相结合,通过仿射不变集成马尔可夫链蒙特卡洛采样器进行估计。该框架适应有限数据,建模参数不确定性,并为五岁以下儿童和五岁及以上个体生成概率预测。结果显示较强的经验充分性(五岁以下:$R^2 = 0.9958$;五岁及以上:$R^2 = 0.9956$),残差低于$2\%$,且混合良好的后验分布确认了收敛性。区级分析揭示了显著的空间异质性,变异系数从库马西等城市中心的$<0.07$到姆波霍尔和东比亚等边缘地区的$>3.3$。2024-2026年的预测表明逐步回升:五岁以下儿童病例从137,000例增至149,000例,年长个体从348,000例增至375,000例,不确定性随时间扩大。通过生成概率预测,该贝叶斯框架为预测疟疾波动和加强加纳国家疟疾控制战略中的数据驱动决策提供了原则性工具。

English abstract

Reliable quantification of malaria dynamics in sub-Saharan Africa is hindered by short, noisy, and spatially heterogeneous surveillance records. In Ghana, health-facility data from 2014 to 2023 reveal non-linear and age-specific fluctuations in hospital admissions, yet existing approaches struggle to capture stochastic variability or provide credible uncertainty bounds. This study develops a Bayesian nonlinear inference framework that integrates a cubic baseline with a damped oscillatory kernel, estimated via an affine-invariant ensemble Markov Chain Monte Carlo sampler. The framework accommodates limited data, models parameter uncertainty, and generates probabilistic forecasts for children under five years and individuals aged five years or more. Results show strong empirical adequacy ($R^2 = 0.9958$ for $<5$ years; $R^2 = 0.9956$ for $\geq 5$ years) with residual errors below $2\%$ and well-mixed posteriors confirming convergence. District-level analysis reveals pronounced spatial heterogeneity, with coefficients of variation ranging from $<0.07$ in urban centres such as Kumasi to $>3.3$ in peripheral districts such as Mpohor and Bia East. Forecasts for 2024-2026 indicate a gradual resurgence: from 137,000 to 149,000 cases among children under five years and from 348,000 to 375,000 cases among older individuals, with uncertainty widening over time. By producing probabilistic forecasts, this Bayesian framework provides a principled tool for anticipating malaria fluctuations and strengthening data-driven decision-making in Ghana's national malaria control strategy.
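In its common form, the affine-invariant ensemble sampler is the Goodman-Weare stretch move, the algorithm implemented by the `emcee` package. Here is a minimal serial sketch, assuming a user-supplied log-posterior function. The abstract does not say which implementation the authors used, and their cubic-baseline-plus-damped-oscillation model is not reproduced. The function name is hypothetical.

```python
import math
import random


def stretch_move_sampler(log_prob, walkers, n_steps, a=2.0, seed=0):
    """Goodman-Weare affine-invariant ensemble sampler (serial stretch move).

    log_prob: function mapping a parameter list to its log posterior density.
    walkers:  initial positions (at least 3, not all on one line/plane).
    Returns the ensemble state after every step.
    """
    rng = random.Random(seed)
    walkers = [list(w) for w in walkers]
    k, d = len(walkers), len(walkers[0])
    logp = [log_prob(w) for w in walkers]
    chain = []
    for _ in range(n_steps):
        for i in range(k):
            # Pick a complementary walker j != i uniformly at random
            j = rng.randrange(k - 1)
            if j >= i:
                j += 1
            # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a] by inverse CDF
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = [xj + z * (xi - xj) for xi, xj in zip(walkers[i], walkers[j])]
            lp = log_prob(proposal)
            # Accept with probability min(1, z^(d-1) * p(proposal) / p(current))
            log_ratio = (d - 1) * math.log(z) + lp - logp[i]
            if math.log(1.0 - rng.random()) < log_ratio:
                walkers[i], logp[i] = proposal, lp
        chain.append([list(w) for w in walkers])
    return chain
```

Because proposals are built from differences between walkers, the sampler behaves the same under any affine reparameterisation. This helps with strongly correlated parameters, such as the coefficients of a cubic baseline. In practice, discard an initial burn-in and pool the remaining walker states as posterior draws.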

2606.00768 2026-06-02 astro-ph.IM physics.data-an stat.AP

Bayesian estimation of spectral parameters of the 6.7-GHz methanol maser G339.884-1.259 from GRAO observations

基于GRAO观测的6.7 GHz甲醇脉泽G339.884-1.259谱参数贝叶斯估计

Theophilus Ansah-Narh, Stephen Sottie, Nia Imara, Emmanuel Proven-Adzri

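AI summary: Conventional Gaussian fitting in methanol maser spectral decomposition cannot capture non-Gaussian structure and lacks uncertainty quantification. This work proposes a Bayesian spectral decomposition framework using Gaussian, Lorentzian, and Voigt line profiles with Markov chain Monte Carlo sampling for model comparison and uncertainty estimation. Applied to GRAO observations of the 6.7 GHz methanol maser G339.884-1.259, it finds seven velocity-coherent components, with the Voigt model statistically preferred.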

Comments
15 pages, 7 figures, accepted for publication in Monthly Notices of the Royal Astronomical Society
AI Chinese abstract

准确分解甲醇脉泽谱对于理解高质量恒星形成区域至关重要,特别是在复杂混合谱中微小差异会改变物理解释。传统高斯拟合通常无法捕捉非高斯结构且缺乏不确定性量化。我们开发了一个贝叶斯谱分解框架,使用高斯、洛伦兹和Voigt线型,结合马尔可夫链蒙特卡洛采样,实现模型比较和不确定性估计。应用于加纳射电天文台观测的6.7 GHz甲醇脉泽G339.884$-$1.259,我们的方法揭示了七个速度相干分量。Voigt模型在统计上更优,产生最低的AIC和BIC(分别约为$1.98 \times 10^{4}$和$1.99 \times 10^{4}$)、最小的RMSE(约11.1 Jy)和最高的$R^{2}$(0.985)。纯高斯或洛伦兹模型留下系统性残差。升高的约化$\chi^{2}_{\nu}$值表明存在未解析的子结构和非理想噪声。贝叶斯推断为脉泽谱分析提供了稳健框架,可扩展到其他分子谱线并与高分辨率干涉测量结合。

English abstract

Accurate decomposition of methanol maser spectra is essential for understanding high-mass star-forming regions, especially in complex blended spectra where small differences alter physical interpretation. Conventional Gaussian fitting often fails to capture non-Gaussian structure and lacks uncertainty quantification. We develop a Bayesian spectral decomposition framework using Gaussian, Lorentzian, and Voigt profiles with Markov Chain Monte Carlo sampling, enabling model comparison and uncertainty estimation. Applied to the 6.7 GHz methanol maser G339.884$-$1.259 observed with the Ghana Radio Astronomy Observatory, our method reveals seven velocity-coherent components. The Voigt model is statistically preferred, yielding the lowest AIC and BIC ($\approx 1.98 \times 10^{4}$ and $1.99 \times 10^{4}$), the smallest RMSE ($\approx 11.1$ Jy), and the highest $R^{2}$ (0.985). Purely Gaussian or Lorentzian models leave systematic residuals. Elevated reduced $\chi^{2}_{\nu}$ values indicate unresolved substructure and non-ideal noise. Bayesian inference provides a robust framework for maser spectral analysis, extendable to other molecular lines and combinable with high-resolution interferometry.
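Here is a sketch of how such profile models can be compared by information criteria. It assumes a least-squares fit with i.i.d. Gaussian errors, for which $\mathrm{AIC} = n\ln(\mathrm{RSS}/n) + 2k$ and $\mathrm{BIC} = n\ln(\mathrm{RSS}/n) + k\ln n$, up to an additive constant shared by all models on the same data. Only Gaussian and Lorentzian line shapes are coded here. A Voigt profile is their convolution, and SciPy provides it as `scipy.special.voigt_profile(x, sigma, gamma)`. The paper's exact likelihood and constants may differ, so absolute values are not comparable to those quoted above.

```python
import math


def gaussian(v, amp, v0, sigma):
    """Gaussian line profile at velocity v."""
    return amp * math.exp(-0.5 * ((v - v0) / sigma) ** 2)


def lorentzian(v, amp, v0, gamma):
    """Lorentzian line profile at velocity v (gamma = half width at half maximum)."""
    return amp * gamma ** 2 / ((v - v0) ** 2 + gamma ** 2)


def info_criteria(y, y_model, k):
    """AIC and BIC for a least-squares fit with Gaussian errors and k free parameters."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_model))
    aic = n * math.log(rss / n) + 2 * k
    bic = n * math.log(rss / n) + k * math.log(n)
    return aic, bic
```

Lower values are preferred. BIC penalises extra parameters more heavily once $n > e^2 \approx 7.4$, which matters when adding components to a seven-component blend.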

2606.00767 2026-06-02 stat.ME

The Effect of Choice of Metric and Scan Length on Reliability in Resting-State fMRI

Yu Huang, Philip T. Reiss, Seonjoo Lee, R. Todd Ogden

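AI summary: This study uses the distance-based intraclass correlation coefficient (dbICC) to assess the reliability of functional connectivity in resting-state fMRI data. It compares the Frobenius metric with the affine-invariant Riemannian metric (AIRM) and analyzes how scan length and inter-session interval affect reliability.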

English abstract

Resting-state fMRI (rs-fMRI) is widely used to investigate brain functional connectivity, but the reliability of these measurements remains a key concern for ensuring reproducibility. The distance-based intraclass correlation coefficient (dbICC) generalizes classical ICC to more general data types, making it well-suited for assessing the reliability of measures of functional connectivity. In this study, we applied dbICC to assess the reliability of rs-fMRI data from the Midnight Scanning Club (MSC) dataset, which consists of 10 subjects, each undergoing 10 sessions of 30-minute rs-fMRI scans. The functional connectivity was estimated using Pearson's correlation coefficients between all pairs of brain regions, resulting in a correlation matrix for each session. We compared two distance metrics, the widely used Frobenius metric and the Affine Invariant Riemannian Metric (AIRM), which was selected to respect the geometry of the space of covariance matrices, to evaluate how the choice of metric affects the reliability of estimating correlation. In addition, we investigated the impact of scan length and time interval between sessions on reliability. Results based on each metric agreed in some respects but disagreed in others, illustrating the impact of choice of metric. We also found that longer scan lengths significantly improve reliability, while the time interval between sessions has less impact.
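Below is a minimal sketch of a distance-based ICC. It assumes the form dbICC = 1 − (mean squared distance between sessions of the same subject) / (mean squared distance between sessions of different subjects). Only the Frobenius distance is implemented. The distance is a parameter, so an AIRM implementation, which needs matrix logarithms (for example via an eigendecomposition), could be passed in instead. Matrices are plain nested lists, so the example has no dependencies.

```python
import itertools
import math


def frobenius(a, b):
    """Frobenius distance between two equally sized matrices given as nested lists."""
    return math.sqrt(sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)))


def db_icc(subjects, dist=frobenius):
    """subjects[i] is the list of session-level matrices for subject i."""
    within = [dist(x, y) ** 2
              for sessions in subjects
              for x, y in itertools.combinations(sessions, 2)]
    between = [dist(x, y) ** 2
               for s1, s2 in itertools.combinations(subjects, 2)
               for x in s1 for y in s2]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return 1.0 - mean_within / mean_between


if __name__ == "__main__":
    # Two hypothetical subjects, two sessions each, with 2x2 correlation matrices.
    a1 = [[1.0, 0.50], [0.50, 1.0]]
    a2 = [[1.0, 0.45], [0.45, 1.0]]
    b1 = [[1.0, -0.20], [-0.20, 1.0]]
    b2 = [[1.0, -0.25], [-0.25, 1.0]]
    print(db_icc([[a1, a2], [b1, b2]]))
```

The value is 1 when every subject's sessions coincide, and it can be negative when sessions of the same subject are farther apart than sessions of different subjects.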

2606.00758 2026-06-02 stat.ML cs.LG eess.SP stat.ME

Statistical Testing on Directed Graphs by Surrogate Data Generation

Chun Hei Michael Chan, Alexandre Cionca, Dimitri Van De Ville

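AI summary: For directed graphs, the paper defines wide-sense stationary signals through the eigendecomposition of the graph shift operator. It then proposes a covariance-preserving surrogate data generation framework for constructing null distributions of test statistics, and real-data experiments show that it outperforms existing methods.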

Comments
Submitted to IEEE Transactions on Signal and Information Processing over Networks

English abstract

In recent years, graph signal processing has emerged as a powerful framework at the intersection of signal processing and graph theory, providing tools for the analysis of signals defined on nodes while accounting for their relationships represented by edges. These tools have been successfully applied to various settings, including statistical hypothesis testing. In particular, non-parametric approaches based on surrogate generation have been proposed for signals on undirected graphs. However, they have yet to be extended to directed graphs. In this work, we first revisit the notion of stationary graph signals on directed graphs. Specifically, through the eigendecomposition of the graph shift operator, we define wide-sense stationary signals on directed graphs. Then, we propose a new framework to generate surrogate graph signals that preserve the covariance structure under stationarity assumptions. Null distributions of the test metric can then be constructed from these surrogates and serve as a reference for the empirical data. Finally, we provide guiding examples and an application to real data, in which we compare the performance of our framework with existing techniques designed for undirected graphs or based on naive permutation, demonstrating the feasibility and superiority of the proposed approach.
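Consider the simplest directed graph, the directed cycle, whose shift operator is the cyclic shift. Its eigenvectors form the discrete Fourier basis. Under the usual definition, wide-sense stationarity on this graph therefore reduces to classical circular stationarity, and a covariance-preserving surrogate reduces to classical Fourier phase randomization. The sketch below covers only this special case as an illustration. It is not the paper's general construction for arbitrary directed graphs. The O(N²) DFT keeps it dependency-free, and the test statistic in the usage part is a hypothetical choice.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT, i.e. the graph Fourier transform of a directed cycle up to normalization."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(coeffs):
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def cycle_surrogate(x, rng):
    """Randomize spectral phases while keeping every |X_k|, hence the circular autocovariance.

    Phases are drawn in conjugate-symmetric pairs so that the surrogate stays real;
    the k = 0 coefficient (and k = n/2 for even n) keeps its original phase.
    """
    n = len(x)
    coeffs = dft(x)
    phases = [0.0] * n
    for k in range(1, (n + 1) // 2):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        phases[k] = phi
        phases[n - k] = -phi
    shifted = [c * cmath.exp(1j * p) for c, p in zip(coeffs, phases)]
    return [value.real for value in idft(shifted)]


if __name__ == "__main__":
    rng = random.Random(0)
    signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 2.0, 1.0]
    # Null distribution of a hypothetical test statistic (the maximum value) from 200 surrogates.
    null = sorted(max(cycle_surrogate(signal, rng)) for _ in range(200))
    print("observed:", max(signal), "95th percentile of null:", null[189])
```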

2606.00754 2026-06-02 stat.ME cs.AI cs.LG

Causal Density Functions

Sridhar Mahadevan

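AI summary: The paper proposes causal density functions, defined as the Radon-Nikodym derivative of the interventional distribution with respect to the observational distribution. These functions act as local density ratios for measuring causal effects, and the paper provides methods to estimate and test them.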

Comments
25 pages

English abstract

We introduce causal density functions: Radon-Nikodym derivatives that compare interventional laws to observational laws and therefore act as local density ratios for causal effects. Whereas many causal-strength measures compare whole distributions after graph surgery, causal density functions provide a pointwise change-of-measure object that can be estimated, calibrated, and used to score directed influence. The basic identity \[ \mathbb{E}_{\mathrm{do}}[f(Y)] = \mathbb{E}_{\mathrm{obs}}\!\left[f(Y)ρ(X,Y)\right] \] makes causal density directly testable: if the estimated density ratio is correct, observational expectations reweighted by $ρ$ reproduce interventional expectations. We derive practical estimators for do-curves and directed edge scores, relate the construction to Radon-Nikodym/Kan semantics for conditioning and intervention, and evaluate the resulting estimators on synthetic and real perturbation benchmarks.
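The reweighting identity can be checked exactly on a toy discrete model. Everything below is a hypothetical example: the confounded model, its probabilities, and the randomized intervention that draws X ~ Bernoulli(0.5) independently of the confounder are our choices. Here ρ is taken as the ratio of the interventional to the observational joint pmf of (X, Y). Reweighting observational expectations by ρ reproduces the interventional expectation, whereas the unweighted observational expectation differs because of confounding.

```python
# Hypothetical discrete model with a binary confounder: Z -> X, Z -> Y, X -> Y.
P_Z = {0: 0.5, 1: 0.5}
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}                                   # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}
Q_X1 = 0.5                                                        # intervention: X ~ Bernoulli(0.5)


def bern(p1, value):
    return p1 if value == 1 else 1.0 - p1


def p_obs(x, y):
    return sum(P_Z[z] * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y) for z in P_Z)


def p_do(x, y):
    # Graph surgery: the Z -> X edge is cut and X is drawn from Q instead.
    return bern(Q_X1, x) * sum(P_Z[z] * bern(P_Y1_GIVEN_XZ[(x, z)], y) for z in P_Z)


def rho(x, y):
    """Density ratio of the interventional law with respect to the observational law."""
    return p_do(x, y) / p_obs(x, y)


def expectation(weight, f):
    """Sum of weight(x, y) * f(y) over the binary support of (X, Y)."""
    return sum(weight(x, y) * f(y) for x in (0, 1) for y in (0, 1))


if __name__ == "__main__":
    f = lambda y: float(y)
    lhs = expectation(p_do, f)                                    # E_do[f(Y)]
    rhs = expectation(lambda x, y: p_obs(x, y) * rho(x, y), f)    # E_obs[f(Y) rho(X, Y)]
    naive = expectation(p_obs, f)                                 # E_obs[f(Y)], confounded
    print(lhs, rhs, naive)
```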

2606.00741 2026-06-02 cs.LG cs.AI stat.ML

Quantum Tunneling-Aware Machine Learning: Physics-Derived Noise Models for Robust Deployment

Uiwon Hwang, Jaeho Hwang

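AI summary: The paper introduces quantum tunneling-aware machine learning (QTAML), which derives the deployment-time weight-error distribution via the WKB approximation. It also designs the Tunneling-Aware Compensation (TAC) algorithm, which restores model accuracy with lower ECC overhead and requires neither retraining nor labels.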

English abstract

Transistor scaling is approaching a quantum-mechanical limit, as thin gate oxides induce electron leakage through quantum tunneling. Unlike conventional digital systems, AI inference can tolerate such errors provided their structure is modeled correctly. In this paper, we introduce quantum tunneling-aware machine learning (QTAML). We derive the deployment-time weight-error distribution from first principles using the Wentzel-Kramers-Brillouin (WKB) approximation and show that it has structure that generic Gaussian noise models miss: an exact affine mean drift, a per-bit variance hierarchy dominated by the most-significant bit, and a per-layer dependence on $\|W_\ell\|_\infty$ and the trained-network Jacobian. We package these three structural properties into a single deployment-time algorithm, Tunneling-Aware Compensation (TAC), that combines closed-form mean correction with an optimal layer-adaptive bit-budget allocation derived from the WKB variance decomposition. Across four convolutional architectures at $p_\mathrm{flip}$=0.10 and a transformer encoder at $p_\mathrm{flip}$=0.05, TAC reaches $95\%$ of clean accuracy with 3.4$\times$ to 33.6$\times$ less ECC overhead than Uniform-MSP, the natural baseline derived from the same physics. The closed-form saturation ratio $ρ^*$ predicts these gains in advance, and on heterogeneous architectures WKB-derived scoring outperforms magnitude-based allocation by up to 24 percentage points at small budgets. The algorithm requires no retraining, no labels, and no inference-time overhead. We also verify the WKB-derived distributional theorems to Monte Carlo precision. These results connect WKB tunneling physics with noise-aware deep learning and suggest a principled path toward hardware--software co-design beyond conventional scaling limits.
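An "exact affine mean drift" also appears under a much cruder noise model than the paper's WKB derivation. For illustration only, assume a weight is stored as an unsigned B-bit integer q and that each bit flips independently with probability p. Bit i then has expected value p + (1 − 2p)b_i, so E[q′] = (1 − 2p)q + p(2^B − 1), and a closed-form correction can invert this. Bit i contributes variance 4^i p(1 − p), so the most significant bit dominates. The sketch checks the formula by exhaustive enumeration. It is not the TAC algorithm.

```python
def expected_after_flips(q, bits, p):
    """Exact E[q'] for an unsigned `bits`-bit code q whose bits flip independently with prob. p."""
    total = 0.0
    for mask in range(2 ** bits):          # every possible flip pattern
        k = bin(mask).count("1")
        prob = p ** k * (1.0 - p) ** (bits - k)
        total += prob * (q ^ mask)         # XOR applies the flips
    return total


def affine_mean(q, bits, p):
    """Closed form under the assumed model: E[q'] = (1 - 2p) q + p (2**bits - 1)."""
    return (1.0 - 2.0 * p) * q + p * (2 ** bits - 1)


def correct_mean(observed, bits, p):
    """Invert the affine drift (requires p != 0.5)."""
    return (observed - p * (2 ** bits - 1)) / (1.0 - 2.0 * p)


def per_bit_variance(bits, p):
    """Variance contributed by bit i is 4**i * p * (1 - p), so the MSB dominates."""
    return [4 ** i * p * (1.0 - p) for i in range(bits)]


if __name__ == "__main__":
    bits, p = 4, 0.10
    for q in (0, 5, 15):
        print(q, expected_after_flips(q, bits, p), affine_mean(q, bits, p))
    print(per_bit_variance(bits, p))
```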

2606.00717 2026-06-02 cs.LG cs.AI stat.ML

Multi-Agent Conformal Prediction with Personalized Statistical Validity

Martin V. Vejling, Christophe A. N. Biscio, Adrien Mazoyer, Petar Popovski, Shashi Raj Pandey

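AI summary: The paper proposes a personalized federated weighted conformal prediction framework. It uses local density-ratio weighting and weighted quantile aggregation to correct for data heterogeneity while preserving privacy, and it provides asymptotically valid marginal and calibration-conditional coverage guarantees for each participating agent.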

English abstract

Uncertainty quantification is essential in high-stakes machine learning tasks. However, one of the principled solutions, conformal prediction, faces challenges under limited local calibration data, privacy constraints, and data heterogeneity. In multi-agent settings, existing works do not address these challenges simultaneously and satisfactorily: their guarantees are either limited to averages across agents or lose validity in heterogeneous settings. Hence, we propose personalized federated weighted conformal prediction (PFWCP), a framework that combines local density ratio weighting with weighted quantile aggregation to correct for heterogeneity while preserving privacy. The method yields asymptotically valid marginal and calibration-conditional coverage guarantees for each participating agent and supports protocols with one-shot communication. Theoretical analysis presents an adjustment to the coverage variance, governed by an effective sample size expression, which is necessary in the context of weighted conformal prediction, and experiments on synthetic and real datasets show improved calibration quality over state-of-the-art federated conformal baselines.
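The building block that such a framework aggregates is a weighted conformal quantile. The sketch below follows the standard single-agent weighted split-conformal recipe for covariate shift: calibration scores are weighted by density ratios, and the test point's own weight is placed at +∞. It does not reproduce the federated aggregation or the paper's effective-sample-size adjustment. `kish_ess` is only the common Kish formula, included here as an assumption for illustration, and the weights in the usage part are hypothetical.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Smallest calibration score s whose normalized weight mass at or below s is >= 1 - alpha.

    The test point's weight is a point mass at +infinity, so the threshold is infinite
    when the calibration weights alone cannot reach 1 - alpha.
    """
    total = sum(weights) + test_weight
    target = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= target - 1e-12:   # small tolerance for floating-point sums
            return score
    return math.inf


def kish_ess(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


if __name__ == "__main__":
    scores = [0.2, 0.9, 0.4, 1.5, 0.7, 0.3, 1.1, 0.8, 0.6]
    weights = [1.0, 0.5, 1.2, 0.3, 1.0, 1.1, 0.4, 0.9, 1.0]    # hypothetical density ratios
    q = weighted_conformal_quantile(scores, weights, test_weight=1.0, alpha=0.1)
    print("prediction set: |y - f(x)| <=", q, " ESS:", kish_ess(weights))
```

With equal weights this reduces to the usual split-conformal rule, which takes the ⌈(n + 1)(1 − α)⌉-th smallest score.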

2606.00715 2026-06-02 stat.ME stat.ML

Rate-optimal neural boundary detection from unlabeled noisy images

Kyeongho Kim, Ilsang Ohn

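AI summary: For unlabeled noisy images, the paper proposes a deep neural network boundary detection method based on a hinge-type surrogate loss and proves that it attains the minimax-optimal boundary recovery rate.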

English abstract

We study boundary detection for unlabeled noisy images from a statistical perspective. The aim is to recover an unknown object region from raw intensity observations without pixel-wise annotation labels or a parametric model for the intensity distributions. Motivated by robust Gibbs posterior approaches based on thresholded misclassification losses, we propose a continuous hinge-type surrogate loss for boundary detection. The proposed loss is amenable to gradient-based optimization and can be combined with deep neural networks to represent complex object boundaries. We prove that the proposed loss function is Fisher consistent under a mild separation assumption and obtain a calibration inequality linking excess surrogate risk to the symmetric difference error of the estimated region. Under a piecewise smooth boundary model, we prove that the resulting deep neural network estimator achieves the minimax-optimal boundary recovery rate, up to logarithmic factors. The piecewise smooth formulation accommodates boundaries with corners and kinks, thereby extending beyond globally smooth boundary models. Numerical experiments demonstrate that the proposed method accurately and stably recovers object boundaries across a range of noise levels and shape configurations, and compares favorably with existing unsupervised boundary detection methods.

2606.00661 2026-06-02 stat.ML cs.LG

On Median of Incomplete U-Statistics

Nong Minh Hieu

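AI summary: The paper establishes finite-sample concentration rates for the median of incomplete U-statistics (MIU), an efficient robust estimator of the expectation of symmetric kernels.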

English abstract

We establish the finite-sample concentration rate for the Median-of-Incomplete-U-Statistics (MIU), an efficient robust estimator for the expectation of symmetric kernels.
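The abstract names the estimator but does not describe its construction. One common way to build a median of incomplete U-statistics, assumed here, has four steps:

1. Shuffle the sample.
2. Split it into disjoint blocks, dropping any remainder.
3. Average a symmetric kernel over a random subset of pairs in each block.
4. Take the median of the block averages.

The kernel h(x, y) = (x − y)²/2, whose expectation is the variance, is only an example. The usage part compares the estimate with the ordinary sample variance on data containing three gross outliers.

```python
import random
import statistics


def variance_kernel(x, y):
    """Symmetric kernel whose expectation is the variance: h(x, y) = (x - y)**2 / 2."""
    return 0.5 * (x - y) ** 2


def incomplete_u(block, kernel, n_pairs, rng):
    """Average the kernel over n_pairs randomly sampled distinct pairs within one block."""
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(block)), 2)
        total += kernel(block[i], block[j])
    return total / n_pairs


def median_of_incomplete_u(data, kernel, n_blocks, n_pairs, seed=0):
    """Shuffle, split into disjoint blocks (remainder dropped), and take the median of block estimates."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_blocks
    blocks = [shuffled[b * size:(b + 1) * size] for b in range(n_blocks)]
    return statistics.median([incomplete_u(block, kernel, n_pairs, rng) for block in blocks])


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)] + [1e6] * 3   # three gross outliers
    print("MIU:", median_of_incomplete_u(sample, variance_kernel, n_blocks=15, n_pairs=500))
    print("sample variance:", statistics.variance(sample))
```

The outliers can corrupt at most three of the fifteen block estimates, so the median stays near the true variance of 1. The complete sample variance, by contrast, is dominated by the outliers.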

2606.00643 2026-06-02 stat.ML cs.LG cs.NA math.NA math.OC math.ST stat.TH

Taming the Loss Landscape of PINNs with Noisy Feynman-Kac Supervision: Operator Preconditioning and Non-Asymptotic Error Bounds

Nathanael Tepakbong, Hanyu Hu, Chengyu Liu, Xiang Zhou

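AI summary: The paper shows that adding a pointwise data-fidelity term acts as an operator-level preconditioner that substantially improves the condition number of the PINN loss landscape. It generates the pointwise labels from the Feynman-Kac representation, yielding the FK-PINN method, and derives non-asymptotic error bounds under gradient descent training.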

Comments
accepted in ICML 2026 (poster), 59 pages

English abstract

Physics-Informed Neural Networks (PINNs) often train slowly or fail to converge on challenging partial differential equations (PDEs), a behavior recently linked to severely ill-conditioned loss landscapes inherited from the underlying differential operator. We study PINNs augmented with a pointwise data-fidelity term, added at a few points in the domain to the standard residual and boundary losses. We show that this supervision term acts as an operator-level preconditioner: for suitable weights, our comparison bounds guarantee a substantially smaller condition number than under the standard PINN loss, independently of how the pointwise labels are obtained. For a broad class of PDEs admitting a Feynman-Kac (FK) representation, we generate such labels by Monte Carlo averages of the FK functional, resulting in what we call ``FK-PINNs'', and using the excess risk decomposition approach, we derive non-asymptotic $L^2(Ω)$-error bounds for FK-PINNs with $\tanh$ activation trained by finitely many steps of gradient descent. Along the way, we establish pseudo-dimension bounds for first- and second-order derivatives of $\tanh$ neural networks, which are of independent interest and, to the best of our knowledge, new. Numerical experiments on Poisson, Schrödinger, mean exit time, and committor problems corroborate the theory, and show that FK-PINNs can successfully solve PDEs for which standard PINNs exhibit severe failure modes.
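The Feynman-Kac labels can be illustrated in one dimension. We assume the model problem ½u″ + f = 0 on (a, b) with Dirichlet data g; this is our choice, and the paper treats broader PDE classes. A label at x₀ is then the Monte Carlo average of g at the exit point plus the integral of f along a Brownian path started at x₀. With f = 1 and g = 0 this is the mean exit time, whose exact value (x − a)(b − x) makes the estimate checkable. Labels like these would enter a PINN loss as the pointwise data-fidelity term. The crude Euler scheme below overestimates exit times by O(√dt).

```python
import math
import random


def fk_label(x0, f, g, a, b, dt, n_paths, rng):
    """Monte Carlo Feynman-Kac estimate of u(x0) for (1/2) u'' + f = 0 on (a, b), u = g at a and b.

    u(x0) = E[ g(X_tau) + integral_0^tau f(X_s) ds ], where X is a standard Brownian motion
    started at x0 and tau is its exit time from (a, b). Euler steps of size dt are used,
    so the label carries an O(sqrt(dt)) discretization bias.
    """
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, running = x0, 0.0
        while a < x < b:
            running += f(x) * dt                 # accumulate the source term along the path
            x += sqrt_dt * rng.gauss(0.0, 1.0)   # Brownian increment
        exit_value = g(a) if x <= a else g(b)
        total += exit_value + running
    return total / n_paths


if __name__ == "__main__":
    rng = random.Random(0)
    # Mean exit time problem: f = 1, g = 0, exact solution u(x) = (x - a)(b - x).
    est = fk_label(0.5, lambda x: 1.0, lambda x: 0.0, 0.0, 1.0, 1e-3, 1000, rng)
    print("FK label:", est, "exact:", 0.25)
```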

2606.00630 2026-06-02 cs.CV stat.ML

A Systematic Benchmark of Intraoperative Ultrasound-to-MR Synthesis for Brain Tumour Surgery

Olga Esteban-Sinovas, Santiago Cepeda, Ignacio Arrese, Rosario Sarabia

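AI summary: For intraoperative ultrasound (ioUS)-to-MR image synthesis in brain tumour surgery, this study systematically compares 6 generators, 4 inference regimes, and 2 targets on the public ReMIND dataset. It combines image-fidelity metrics with a downstream segmentation evaluation. Perceptual quality (LPIPS) correlates most closely with downstream utility, whereas SSIM is negatively correlated with utility, and SynDiff-2.5D performs best in downstream segmentation.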

English abstract

Intraoperative ultrasound (ioUS) is a versatile, cost-effective modality in brain tumour surgery, but its interpretation is difficult: acquisition planes are non-standard, artefacts are modality-specific, and its appearance differs markedly from the preoperative MRI on which surgical-planning tools, segmentation models and the surgeon's experience rely. Synthesising MRI-like images from ioUS could let this MRI-based infrastructure be reused intraoperatively without an extra scan. Most prior work evaluates a single architecture in isolation; to our knowledge, no benchmark has spanned architectural paradigms, inference regimes and downstream-task endpoints under a common protocol. We address this gap on the public ReMIND data set (76 patients; 153 paired ioUS/T2w and 104 paired ioUS/FLAIR studies; 60/16 patient-level train/held-out split). Six generators (four GAN baselines: Pix2Pix, SwinPix2Pix, CycleGAN, CUT; the transformer-augmented ResViT; and the few-step diffusion model SynDiff) were each trained under four inference regimes (2D, 2.5D, 2D + 3D-refinement, full-3D) and two targets (T2w only; T2w + FLAIR multi-task), yielding 48 experiments. Image-fidelity metrics (SSIM, PSNR, MAE, LPIPS) were complemented by an nnU-Net v2 downstream segmentation evaluation (tumour and resection cavity) and by subgroup analyses by histological grade and reoperation. No architecture dominated every axis, and, critically, perceptual quality tracked downstream utility most closely (LPIPS, r=-0.66, p<0.001), whereas higher SSIM was associated with worse utility (r=-0.64, p<0.001); SynDiff-2.5D best preserved downstream segmentation (U_Dice=0.55). Perceptual and downstream-task metrics should therefore be reported alongside or in preference to global SSIM, and architecture choice should be conditioned on surgical phase, patient history and clinical objective.

2606.00605 2026-06-02 cs.LG stat.ML

Looped Transformers with Layer Normalization Provably Learn the Power Method

Lyumin Wu, Chenyang Zhang, Yuan Cao

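AI summary: Using principal component prediction as the task, the paper proves that looped linear transformers with layer normalization, trained by gradient descent, converge to a solution that implements the power method. This reveals an algorithmic implicit bias induced by layer normalization.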

Comments
70 pages, 8 figures

English abstract

Transformers have achieved remarkable success across a wide range of applications, and a growing body of work suggests that part of their strength comes from their ability to learn and execute algorithmic procedures. However, our understanding of how transformers learn such algorithms remains limited, especially in the presence of layer normalization (LN). In this work, we study principal component prediction as a concrete testbed for understanding the training dynamics of transformers with LN. We prove that a looped linear transformer with LN, trained by gradient descent, converges to a solution that implements the power method, with each self-attention layer performing one power iteration. Notably, the model is trained only for principal component prediction, rather than being explicitly supervised to implement the power method. Our finding thus reveals an "algorithmic implicit bias" of looped transformers with LN: principal-component prediction can in principle be achieved by many mechanisms, yet gradient descent selects one that realizes the power method. We further provide a concrete comparison between transformers with and without LN: even with layerwise guidance from power iterations, a transformer without LN cannot exactly learn the power method, whereas the corresponding transformer with LN can, leading to a provable performance gap in principal component prediction. Our results provide, to our knowledge, the first theoretical analysis of the training dynamics of looped and single-layer transformers with LN, and shed light on the role of LN in transformer models.
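For reference, the algorithm that the looped transformer is shown to implement is the classical power method. In the sketch below, each loop iteration applies the covariance matrix and then rescales the vector to unit norm. The abstract attributes this per-step rescaling to layer normalization, but the correspondence is only a loose analogy here, and the code is not the transformer construction itself. The matrix in the usage part is hypothetical.

```python
import math


def mat_vec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_method(a, v0, n_iters):
    """Repeat v <- normalize(A v); each pass is one power iteration."""
    v = normalize(v0)
    for _ in range(n_iters):
        v = normalize(mat_vec(a, v))
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))   # Rayleigh quotient
    return v, eigenvalue


if __name__ == "__main__":
    cov = [[3.0, 1.0], [1.0, 2.0]]   # hypothetical sample covariance
    vec, lam = power_method(cov, [1.0, 0.0], 100)
    print(vec, lam)
```

The error shrinks geometrically with the ratio of the second to the first eigenvalue, so more loop iterations are needed when the spectral gap is small.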

2606.00584 2026-06-02 stat.ML cs.LG

Spectra-Guided Neural Tucker Factorization

Fusheng Wang, Yikai Hou

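AI summary: The paper proposes Spectra-Guided Neural Tucker Factorization (SG-NTF), which completes high-dimensional incomplete tensors efficiently by mapping timestamps into a continuous spectral space and applying a spatio-temporal co-gating mechanism.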

English abstract

This paper proposes Spectra-Guided Neural Tucker Factorization (SG-NTF) for High-Dimensional and Incomplete (HDI) tensor completion. Circumventing discrete representational limits, SG-NTF maps scalar timestamps into a continuous spectral space to abstract temporal periodicities. Concurrently, a Spatio-Temporal Co-Gating (STCG) mechanism explicitly filters latent interactions via multiplicative modulation on spatiotemporal contexts. Evaluations on real-world HDI tensors verify that SG-NTF maintains competitive completion accuracy with parameter efficiency.
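The abstract does not specify the spectral map. A common way to turn a scalar timestamp into a continuous periodic representation, assumed here for illustration, is a bank of sine/cosine features at chosen periods. The resulting vector can then serve as the time-mode factor of a Tucker model. The periods (24 and 168, i.e. daily and weekly for hourly timestamps), the embeddings, and the tiny core are hypothetical, and the co-gating mechanism is not sketched.

```python
import math


def spectral_time_features(t, periods):
    """Map a scalar timestamp t to [sin, cos] pairs, one pair per (assumed) period."""
    feats = []
    for period in periods:
        angle = 2.0 * math.pi * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def tucker_entry(core, u, v, w):
    """One entry of a 3-way Tucker model: sum_{a,b,c} G[a][b][c] * u[a] * v[b] * w[c]."""
    return sum(core[a][b][c] * u[a] * v[b] * w[c]
               for a in range(len(u))
               for b in range(len(v))
               for c in range(len(w)))


if __name__ == "__main__":
    periods = [24.0, 168.0]                       # hypothetical daily/weekly periods (hours)
    w = spectral_time_features(30.0, periods)     # time factor: 4 continuous features
    u, v = [0.5, -0.2], [1.0, 0.3]                # hypothetical row/column embeddings
    core = [[[0.1 * (a + b + c + 1) for c in range(len(w))] for b in range(2)] for a in range(2)]
    print(tucker_entry(core, u, v, w))
```

Because the features are smooth functions of t, unseen timestamps between training points still receive a meaningful time factor. A lookup table of discrete time indices cannot provide this.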

2606.00578 2026-06-02 stat.ME

When Do Generalized Permutation Tests Achieve Optimal Power? A Dispersion Characterization

Yongmin Kim, Ilmun Kim

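AI summary: The paper introduces two dispersion measures to characterize when the conditional distribution of a generalized Monte Carlo permutation test converges under a non-uniform permutation distribution. It proves that the test attains optimal Pitman local power when the dispersions vanish asymptotically; otherwise, optimal local power cannot be guaranteed.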

Comments
34 pages, 3 figures, 1 table

English abstract

We study generalized Monte Carlo permutation tests under a non-uniform distribution on permutations. Focusing on the difference-in-means statistic, we introduce two scalar dispersion measures that quantify departures from complete randomization at the individual and pairwise levels. We show that if both dispersions vanish asymptotically, then the conditional permutation distribution converges to its Gaussian benchmark, the critical value stabilizes, and the test attains optimal Pitman local power. Conversely, if these dispersions fail to vanish, the permutation distribution does not self-average, the critical value need not stabilize, and optimal local power cannot in general be guaranteed. We further show that beyond the standard Pitman local model, suitably chosen non-uniform permutation distributions can strictly dominate the uniform distribution by exploiting nuisance structure in the data.
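The uniform case, which serves as the benchmark in this paper, is the ordinary Monte Carlo permutation test for a difference in means. The sketch below implements that baseline with the usual add-one p-value. The `sampler` argument is a hypothetical hook where a non-uniform permutation distribution could be plugged in. The dispersion measures themselves are not implemented.

```python
import random


def mean_difference(x, y):
    return sum(x) / len(x) - sum(y) / len(y)


def permutation_test(x, y, n_perm=999, seed=0, sampler=None):
    """Two-sided Monte Carlo permutation test for the difference in means.

    `sampler(pooled, rng)` returns a permuted copy of the pooled data. The default draws
    permutations uniformly; a non-uniform sampler can be passed instead (hypothetical hook).
    """
    rng = random.Random(seed)
    if sampler is None:
        sampler = lambda pooled, r: r.sample(pooled, len(pooled))
    pooled = list(x) + list(y)
    observed = abs(mean_difference(x, y))
    exceed = 0
    for _ in range(n_perm):
        perm = sampler(pooled, rng)
        stat = abs(mean_difference(perm[:len(x)], perm[len(x):]))
        if stat >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)   # add-one correction keeps the test valid


if __name__ == "__main__":
    treated = [10.0, 11.0, 12.0, 13.0]
    control = [0.0, 1.0, 2.0, 3.0]
    print(permutation_test(treated, control))
```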

2606.00563 2026-06-02 cs.LG cs.AI stat.ML

A Practical Upper Bound on Selection Bias Effects in Medical Prediction Models

Kara Liu, Maggie Wang, Russ B. Altman

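AI summary: Selection bias can make models generalize poorly. The paper proposes a new upper bound on worst-case model performance in the target population under the realistic condition that the selection mechanism and the target distribution are only partially observed, and it validates the bound's validity and practicality on synthetic and real data.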

Comments
32 pages, 27 figures, will be published at ACM SIGKDD '26

English abstract

Selection bias is a common and often unavoidable aspect of real-world data that challenges the generalizability of machine learning models. When models trained on biased data are deployed in the broader target population, poor model generalization may lead to real harm, particularly in high-risk settings such as healthcare. This risk highlights the need for practitioners to reliably assess model generalizability prior to deployment. However, existing methods for predicting model performance rely on unrealistic access to the target distribution or knowledge of the selection mechanism causing bias. To address these limitations, we propose a novel upper bound on the worst-case model performance on the target population under the realistic setting where the selection mechanism and the target population data are only partially observed. We demonstrate the validity and practical utility of our method through experiments on fully synthetic data, semi-synthetic data derived from the All of Us Research Program, and real-world selection bias in MIMIC-IV. Our work offers a principled and practical tool to estimate the impact of selection bias in an otherwise intractable setting, thereby enabling practitioners to build safer and more generalizable models in healthcare and beyond.

2606.00539 2026-06-02 cs.LG math.OC stat.ML

GNMR: Runtime Stability Control for Low-Precision Large Language Model Training

Boao Kong, Weichen Jia, Engao Zhang, Guohong Li, Yonghan Dong, Yao Wang, Yaoyuan Wang, Yunke Peng, Kun Yuan

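AI summary: To address the stability bottleneck in low-precision language model training, the paper proposes a lightweight runtime controller based on the ratio of each unit's gradient norm to its historical mean (GNMR). The controller maps local risk signals to bounded recovery actions and improves training stability without changing the numerical format or the backend.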

Comments
29 pages, 4 figures, 15 tables

English abstract

Training stability is a key bottleneck in low-precision language model training: efficient low-cost paths can still produce short-lived numerical risks at a small set of operators. We formulate this as runtime stability control and present Gradient Norm-to-Mean Ratio (GNMR), a lightweight controller that compares each recoverable unit's current gradient norm with its historical mean. Together with $Δ$-GNMR for abrupt short-window increases, GNMR maps local risk signals to bounded recovery actions under a hard $\mathrm{maxO}$ budget and a short lock interval, without changing the numerical format, kernel, or backend recipe. Across activation-quantization stress, DeepSeek-style recipe-level training, and LLaMA-2 13B fine-tuning, GNMR preserves high-fidelity quality with sparse, budgeted recovery. These results support GNMR as a backend-agnostic controller to improve low-precision training stability while preserving low-cost execution.
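The abstract describes the controller only at a high level. The sketch below is a hypothetical reading of it for a single recoverable unit, with four ingredients:

- the ratio of the current gradient norm to its running historical mean;
- a Δ-ratio check for abrupt jumps;
- a cap on the number of recovery actions, standing in for the maxO budget;
- a lock interval after each action.

The thresholds, the use of a cumulative mean, and what a "recovery action" actually does (for example, re-running the unit at higher precision) are all assumptions, not the paper's settings.

```python
class GNMRSketch:
    """Hypothetical single-unit controller based on a gradient-norm-to-mean ratio."""

    def __init__(self, ratio_threshold=3.0, delta_threshold=2.0, max_actions=2, lock_steps=3):
        self.ratio_threshold = ratio_threshold
        self.delta_threshold = delta_threshold
        self.max_actions = max_actions      # stands in for a hard action budget
        self.lock_steps = lock_steps        # no new action within this many steps
        self.mean = None
        self.count = 0
        self.prev_ratio = None
        self.actions_used = 0
        self.locked_until = -1

    def step(self, t, grad_norm):
        """Return True if a (bounded) recovery action should be taken at step t."""
        ratio = 1.0 if self.mean is None else grad_norm / max(self.mean, 1e-12)
        delta = 0.0 if self.prev_ratio is None else ratio - self.prev_ratio
        # Update the running (cumulative) mean of the gradient norm.
        self.count += 1
        if self.mean is None:
            self.mean = grad_norm
        else:
            self.mean += (grad_norm - self.mean) / self.count
        self.prev_ratio = ratio
        risky = ratio > self.ratio_threshold or delta > self.delta_threshold
        if risky and t > self.locked_until and self.actions_used < self.max_actions:
            self.actions_used += 1
            self.locked_until = t + self.lock_steps
            return True
        return False


if __name__ == "__main__":
    ctrl = GNMRSketch()
    norms = [1.0, 1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 20.0, 30.0, 40.0]
    print([ctrl.step(t, g) for t, g in enumerate(norms)])
```

In this trace the spike at step 7 is ignored because of the lock interval, and the spike at step 9 is ignored because the lock and the action budget both apply. Recovery therefore stays sparse even under repeated spikes.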

2606.00520 2026-06-02 math.OC cs.LG stat.ML

In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise

Zijian Liu

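AI summary: The paper studies the convergence of stochastic gradient methods under heavy-tailed noise (finite p-th moment, p∈(1,2)). It proves in-expectation convergence of stochastic mirror descent (SMD) and accelerated stochastic mirror descent (ASMD) in convex optimization, and of SGD and SGD with momentum (SGDM) in nonconvex optimization, without algorithmic modifications or bounded-domain assumptions.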

English abstract

Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some recent studies have found that Stochastic Gradient Descent ($\textsf{SGD}$), without any modification to its update rule, can surprisingly converge in expectation for convex problems with bounded domains, highlighting the potential of classical stochastic gradient methods. Inspired by this recent progress, we provide a comprehensive study of stochastic optimization under heavy-tailed noise and establish new in-expectation convergence results for Stochastic Mirror Descent ($\textsf{SMD}$) and Accelerated Stochastic Mirror Descent ($\textsf{ASMD}$) in convex optimization, and for $\textsf{SGD}$ and Stochastic Gradient Descent with Momentum ($\textsf{SGDM}$) in nonconvex optimization. Notably, our results not only hold without algorithmic changes but also avoid restrictive assumptions, such as bounded domains, imposed in prior work. More importantly, our analysis provides a new, elegant, and powerful framework for studying heavy-tailed stochastic optimization, opening a new route to understanding first-order stochastic gradient methods.
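The following toy experiment uses the heavy-tailed setting of the abstract. The gradient noise is symmetric with tail index 1.8, so it has a finite p-th moment only for p < 1.8 and infinite variance. Even so, unmodified SGD on a one-dimensional quadratic still drifts to the minimizer. The objective, the noise law, and the 1/(t + 1) step size are our choices. A single run only illustrates the phenomenon; it does not reproduce the paper's in-expectation bounds. With this step size the iterate equals minus the running average of the noise, which converges to zero, but more slowly than 1/√n.

```python
import random


def symmetric_pareto(rng, tail_index):
    """Symmetric noise with P(|xi| > s) = s**(-tail_index) for s >= 1.

    Its p-th moment is finite only for p < tail_index, so with 1 < tail_index < 2
    the mean is zero but the variance is infinite.
    """
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return sign * rng.paretovariate(tail_index)


def sgd_quadratic(x0, n_steps, tail_index, seed=0):
    """Plain SGD on f(x) = x**2 / 2 with gradient oracle x + xi and step size 1 / (t + 1)."""
    rng = random.Random(seed)
    x = x0
    for t in range(n_steps):
        grad = x + symmetric_pareto(rng, tail_index)
        x -= grad / (t + 1)
    return x


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, sgd_quadratic(5.0, n, tail_index=1.8))
```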

2606.00512 2026-06-02 cs.LG cs.IT math.IT stat.ML

Semi-Supervised Learning with Noisy Proxy Covariates: Generalization Bounds and Distribution Regression

Kwangho Kim, Jisu Kim

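AI summary: For semi-supervised regression with noisy proxy covariates, the paper proposes a two-stage estimator that learns kernel eigenfeatures from all proxy covariates and fits ridge regression on the labeled data. The theory shows that fast labeled-sample rates are recovered when the proxy perturbation is controlled and unlabeled proxy covariates are sufficiently abundant. Experiments show gains over supervised and semi-supervised baselines at low label rates.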

Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

English abstract

In many modern machine learning pipelines, abundant pretrained representations serve as noisy proxy covariates, while task-specific labels remain scarce. We study semi-supervised regression in this setting, and propose a simple two-stage estimator that learns kernel eigenfeatures from all proxy covariates and fits a ridge predictor on labeled data. We derive finite-sample bounds showing that fast labeled-sample rates are recovered when proxy perturbation is controlled and unlabeled proxy covariates are sufficiently abundant. We also show that distribution regression is a direct special case, with analogous guarantees when the finite bag size is large enough. Experiments show consistent gains over supervised and semi-supervised baselines, especially in low-label regimes.

2606.00500 2026-06-02 cs.DS cs.LG math.ST stat.ML stat.TH

Easy, robust approximate message passing for planted spike models

Misha Ivkov, Tselil Schramm

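AI summary: For spiked matrix models with adversarial noise, the paper proposes an algorithm that combines spectral preprocessing with a robust spectral initialization. With these inputs, approximate message passing (AMP) becomes robust without modification and outputs a vector close to the result of AMP on the uncorrupted matrix.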

Comments
32 pages

English abstract

We present a simple and efficient algorithm for robust approximate message passing (AMP) in the spiked matrix setting. In particular, let $\varepsilon$ be a sufficiently small constant, and suppose that $X \in \mathbb R^{n \times n}$ is a Gaussian matrix with a planted rank-$1$ spike, and $E \in \mathbb R^{n \times n}$ is an adversarially chosen matrix supported on an $\varepsilon n \times \varepsilon n$ principal minor. Let $v_{\mathrm{AMP}}(X)$ be the output of an AMP iteration on the uncorrupted matrix $X$. We give a procedure that, given access only to the corrupted matrix $Y = X + E$, computes a vector $v_{\mathrm{ALG}}(Y)$ which is $\tilde{O}(\sqrt{\varepsilon})$-close to $v_{\mathrm{AMP}}(X)$, for any of a class of AMP iterations which includes sparse Principal Component Analysis (PCA), non-negative PCA, and $\mathbb Z_2$ synchronization. Our algorithm consists of a spectral pre-processing step combined with a robust spectral initialization procedure; given these inputs, we prove that (perhaps surprisingly) AMP is robust out-of-the-box.

2606.00481 2026-06-02 cs.CR cs.SY eess.SY math.PR stat.AP

Stochastic Analysis of Cybersecurity Defense Strategies Under Single Attack Scenario

Song-Kyoo Kim

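AI summary: For a single-attack scenario, the paper proposes a stochastic framework built on a continuous observation mechanism. It uses Laplace-Carson transforms and first-excess theory to derive the joint detection function for the defense moment. Marginalizing over Markovian Poisson arrivals then gives the probability density of the defense moment and the associated conditional expectations. These results allow defense timing to be assessed quantitatively against threat intensity and observation parameters to be calibrated for low-latency proactive defense.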

详情
Comments
Target to submit an international journal
AI中文摘要

本研究提出了一种新颖的随机框架,用于在单攻击场景下进行主动网络安全防御时机的决策。该方法将防御过程建模为连续观测机制,其中防御时刻和随后的观测时隙服从独立的指数分布。结合Laplace-Carson变换与首次超出理论,得到包围攻击时刻的联合检测函数。在马尔可夫泊松到达下的边际化产生了防御时刻的概率密度以及攻击前和攻击后观测时间的条件期望。这些闭式结果能够定量评估防御时机对威胁强度的敏感性,并支持低延迟主动防御措施中观测参数的精确校准。主要贡献包括边际分布和期望值的显式推导、防御时刻密度的可视化,以及将随机对决方法与实际网络安全应用相衔接。

英文摘要

This research presents a novel stochastic framework for proactive cybersecurity defense timing under a single attack scenario. The approach models the defense process as a continuous observation mechanism in which the defense instant and the subsequent observation slot follow independent exponential distributions. Laplace-Carson transforms combined with first-excess theory yield the joint detection function that brackets the attack moment. Marginalization under Markovian Poisson arrivals then produces the probability density of the defense moment and conditional expectations of pre-attack and post-attack observation times. These closed-form results enable quantitative assessment of defense timing sensitivity to threat intensity and support precise calibration of observation parameters for low-latency proactive measures. Major contributions include the explicit derivation of marginal distributions and expected values, visualization of defense moment density, and the bridging of stochastic duel methodology with practical cybersecurity applications.
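As a rough illustration of the exponential-timing setup, the sketch below uses a deliberately simplified, hypothetical model that is not the paper's: the attack time A ~ Exp(lam) (first arrival of a Poisson process), the defense instant D ~ Exp(mu), and the observation slot S ~ Exp(nu) are independent, and we ask how likely the window [D, D + S] is to bracket the attack. Memorylessness gives P(D < A < D + S) = mu/(lam + mu) · lam/(lam + nu); the Monte Carlo function is only a sanity check of that closed form.

```python
import random


def bracket_probability(lam: float, mu: float, nu: float) -> float:
    """Closed form for P(D < A < D + S) with independent exponentials (toy model).

    A ~ Exp(lam): attack time (first arrival of a Poisson process, assumed)
    D ~ Exp(mu):  defense instant
    S ~ Exp(nu):  length of the observation slot that follows D
    """
    # P(A > D) = mu / (lam + mu); given A > D, memorylessness makes A - D ~ Exp(lam),
    # so the slot catches the attack with probability lam / (lam + nu).
    return (mu / (lam + mu)) * (lam / (lam + nu))


def bracket_probability_mc(lam: float, mu: float, nu: float,
                           n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        a = rng.expovariate(lam)
        d = rng.expovariate(mu)
        s = rng.expovariate(nu)
        if d < a < d + s:
            hits += 1
    return hits / n


if __name__ == "__main__":
    exact = bracket_probability(1.0, 2.0, 3.0)   # (2/3) * (1/4) = 1/6
    approx = bracket_probability_mc(1.0, 2.0, 3.0)
    print(f"exact={exact:.4f}  monte_carlo={approx:.4f}")
```

The closed form is what makes parameter calibration cheap; the simulation only confirms it for one setting.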

2606.00480 2026-06-02 math.DS cs.NA math.NA math.OC stat.ML

Continuous Data Assimilation with Learned Surrogate Dynamics

Wenwen Li, Daniel Sanz-Alonso

AI summary: For systems whose dynamics are unknown or computationally expensive, studies nudging algorithms that use learned surrogate models, develops a unified finite-dimensional analysis, and proves exponential convergence up to an explicit error bound.

Comments
75 pages, 14 figures, including appendices

Abstract

Continuous data assimilation seeks to estimate the state of a dynamical system from partial observations. In many applications, however, the state dynamics are unknown or prohibitively expensive to simulate at the required resolution, leading to model error. Motivated by this challenge and the increasing adoption of machine learning surrogates in data assimilation, this paper develops a unified finite-dimensional analysis of nudging algorithms that employ learned surrogate models of the dynamics. We first establish general conditions on the dynamics and observations that guarantee accurate tracking for nudging with the true dynamics model, both in the noise-free and noisy settings. We then show that nudging algorithms that employ surrogate models retain exponential convergence up to an explicit error floor that quantifies the effects of surrogate approximation error and observation noise. Finally, we analyze surrogate models obtained by learning either the vector field or the short-time solution map of the system, and quantify the amount of training data needed to ensure accurate nudging in the noise-free setting. Numerical experiments support the theory.
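To make "exponential convergence up to an error floor" concrete, here is a toy sketch under assumptions of our own, not the paper's setting: the true system is a 2D harmonic oscillator with frequency omega, the surrogate uses a perturbed frequency omega + delta, the full state is observed without noise, and both are integrated with forward Euler. The nudged surrogate dz/dt = f_hat(z) + mu (u - z) forgets its wrong initial state exponentially fast, and the remaining error shrinks as the gain mu grows relative to the surrogate error delta.

```python
import math


def oscillator(state, omega):
    """Vector field of a 2D harmonic oscillator: (x, y)' = (omega*y, -omega*x)."""
    x, y = state
    return (omega * y, -omega * x)


def nudging_error(mu, delta, omega=1.0, dt=1e-3, t_end=20.0):
    """Run truth and nudged surrogate side by side; return the final tracking error.

    truth:     u' = f(u)                      with frequency omega
    surrogate: z' = f_hat(z) + mu * (u - z)   with frequency omega + delta (model error)
    """
    u = (1.0, 0.0)      # true initial state
    z = (0.0, 0.0)      # surrogate starts from a wrong guess
    for _ in range(int(round(t_end / dt))):
        fu = oscillator(u, omega)
        fz = oscillator(z, omega + delta)
        # Forward Euler step; the nudging term pulls z toward the observed state u.
        z = tuple(zi + dt * (fzi + mu * (ui - zi)) for zi, fzi, ui in zip(z, fz, u))
        u = tuple(ui + dt * fui for ui, fui in zip(u, fu))
    return math.dist(u, z)


if __name__ == "__main__":
    for mu in (1.0, 10.0):
        print(f"mu={mu:5.1f}  exact model: {nudging_error(mu, 0.0):.2e}  "
              f"surrogate (delta=0.1): {nudging_error(mu, 0.1):.2e}")
```

With the exact model (delta = 0) the error decays to numerical noise; with the surrogate it settles at a floor of roughly delta / sqrt(delta^2 + mu^2) in this linear toy.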

2606.00478 2026-06-02 math.ST stat.TH

Online Sparse Regression with Expanding Observables

Ying Yang, Fang Yao

AI summary: Proposes the RAVAS framework, which dynamically updates feature selection through a recurrent procedure to handle variables that become observable gradually in online high-dimensional regression, with efficient computation and theoretical guarantees.

Abstract

Online high-dimensional regression has gained increasing attention in recent years, yet existing methods typically assume that all candidate features, including important ones, are observed from the outset of data collection. This assumption is often violated in real-world scenarios, where new variables become available gradually as data accumulate. To address this gap, we introduce a novel framework, Recurrent Adaptive Variable Selection (RAVAS), for online regression with expanding observability. RAVAS employs a recurrent procedure that dynamically updates feature selection as both the sample size and the observable feature set grow. The algorithm is designed to be computationally efficient and memory-light, relying only on low-dimensional sufficient statistics that are updated online. A key advantage of the method lies in its ability to detect and incorporate important variables that emerge later, thereby mitigating the effect of early-stage missingness. We establish theoretical guarantees on model selection, estimation error, and feature coverage, and develop an adaptive online tuning strategy. Extensive simulations and real-world experiments verify the effectiveness of RAVAS for high-dimensional streaming data.

2606.00467 2026-06-02 cs.CL cs.AI cs.LG stat.ML

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

Etienne Casanova, Rafal Kocielnik, R. Michael Alvarez

AI summary: Using toxicity-detection experiments, studies three dimensions of the interaction between LLMs' internalized priors and user instructions. Finds that nearly two-thirds of zero-shot errors resist correction through prompting, and introduces the Definition-Specific Familiarity (DSF) metric, which is positively associated with performance, whereas text-memorization metrics show no such association.

Comments
Accepted at ICML 2026 (Oral & Spotlight); PMLR vol. 306. 9 pages, 4 figures

Abstract

Large Language Models (LLMs) are increasingly used for zero-shot annotation and LLM-as-a-judge tasks, yet their reliability hinges on how model-internalized priors interact with user-provided instructions. We investigate three dimensions of this interaction: (1) how an LLM's familiarity with data and task definitions affects performance, (2) the extent to which additional information in prompts can correct zero-shot errors ("decision stickiness"), and (3) model susceptibility to misaligned task definitions. Through experiments on toxicity detection across diverse datasets (spanning social media, gaming, news, and forums) using both dense and mixture-of-experts models, we find that nearly two-thirds of zero-shot errors are resistant to correction, with an overall rescue rate (fraction of initial errors corrected by prompting) of only 34.8%. High-confidence errors prove especially resistant to correction. When given misaligned definitions, LLMs follow them while maintaining confidence levels unchanged from the aligned condition. Crucially, we introduce Definition-Specific Familiarity (DSF), which measures alignment between a model's internal concept and the task definition. After controlling for dataset-level confounds, DSF shows a positive association with model performance (partial r = +0.41), while three distinct memorization metrics (ROUGE-L, BERTScore, and embedding cosine similarity) all fail to show a positive association. These findings show the limitations of prompt-based correction in annotation tasks, highlighting the importance of definition alignment over text-level memorization.
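Both headline statistics here are simple to compute. The sketch below shows the rescue rate as the abstract defines it (fraction of initial zero-shot errors corrected after prompting) and a first-order partial correlation computed from pairwise correlations while controlling for a single confound. The function names and the single-confound form are illustrative assumptions; the paper's controls for dataset-level confounds may be more involved.

```python
import math


def rescue_rate(zero_shot_correct, prompted_correct):
    """Fraction of initial (zero-shot) errors that become correct after prompting."""
    initial_errors = [i for i, ok in enumerate(zero_shot_correct) if not ok]
    if not initial_errors:
        return float("nan")
    rescued = sum(1 for i in initial_errors if prompted_correct[i])
    return rescued / len(initial_errors)


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y controlling for a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    zero_shot = [True, False, False, False, True, False]
    prompted = [True, True, False, False, True, False]
    print(rescue_rate(zero_shot, prompted))   # 1 of 4 initial errors rescued -> 0.25
    print(partial_corr(0.30, 0.50, 0.60))     # x-y association fully explained by z -> 0.0
```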

2606.00465 2026-06-02 stat.OT

Extrinsic Analysis on BHV4

Tingan Chen, Garett Ordway, Vic Patrangenaru

AI summary: Building on SPED, a new representation of the BHV4 tree space, derives an exact solution for the extrinsic mean under the Veronese-Whitney embedding and applies it to a yeast genome dataset to study the phylogenetic trees of four yeast clades.

Comments
30 pages

Abstract

We investigate extrinsic statistical analysis on the Billera-Holmes-Vogtmann tree space with four leaves (T4 or BHV4), based on its recently proposed representation (see [1]), the Spiky Projective Excavated Dodecahedron (SPED). Due to the symmetry of the SPED, the Veronese-Whitney (VW) embedding we consider here produces a natural extrinsic metric for statistical analysis on BHV4. We derive the exact solution for the VW extrinsic mean and apply this novel method to a yeast genome dataset to study the phylogenetic trees of four distinct yeast clades.
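For orientation: on a real projective space, the Veronese-Whitney embedding sends a unit vector x (identified with -x) to the matrix x xᵀ, and the extrinsic mean of a sample is the eigenvector for the largest eigenvalue of the average of the x_i x_iᵀ (unique when that eigenvalue is simple). The sketch below computes this generic projective-space mean by power iteration in plain Python. It does not implement the SPED representation or the paper's BHV4-specific solution, and the sample points are made up.

```python
import math


def vw_extrinsic_mean(points, iters=500):
    """Extrinsic mean on real projective space under the Veronese-Whitney embedding.

    Each point x (a nonzero vector, with x and -x identified) is embedded as x x^T.
    The extrinsic mean is the top eigenvector of the averaged embedded matrices,
    found here by power iteration (valid because the average is positive semidefinite).
    """
    m = len(points[0])
    avg = [[0.0] * m for _ in range(m)]
    for p in points:
        norm = math.sqrt(sum(c * c for c in p))
        u = [c / norm for c in p]                     # representative on the unit sphere
        for i in range(m):
            for j in range(m):
                avg[i][j] += u[i] * u[j] / len(points)
    v = [1.0 / (k + 1) for k in range(m)]             # arbitrary fixed starting vector
    for _ in range(iters):
        w = [sum(avg[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v                                          # defined only up to sign


if __name__ == "__main__":
    # Points clustered around the axis spanned by (1, 0, 0); signs are irrelevant.
    sample = [(1.0, 0.1, 0.0), (-1.0, 0.0, 0.1), (0.9, -0.1, 0.05), (-1.0, 0.05, -0.1)]
    print([round(c, 3) for c in vw_extrinsic_mean(sample)])
```

Working with x xᵀ rather than x itself is what makes the sign ambiguity of projective points harmless.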

2606.00442 2026-06-02 cs.LG math.OC stat.ML

Exploiting weight-space symmetries for approximating curvature

Artem Artemev, Rui Xia, Benjamin M. Boyd, Youjing Yu, Felix Dangel, Guillaume Hennequin, Alberto Bernacchia

AI summary: Exploits weight-space symmetries to approximate loss curvature: by analytically averaging over group actions that leave the loss invariant, constructs structured Hessian approximations from single gradients.

Comments
Published at ICML 2026. 35 pages, 11 figures. Code: https://github.com/mtkresearch/symm_opt

Abstract

Many machine learning techniques rely on approximating a loss function's curvature, but this is notoriously hard to do at the scale of modern deep networks. Surprisingly, no previous work has exploited the curvature constraints that arise from well-known weight-space symmetries in loss landscapes. By analytically averaging over group actions that leave the loss invariant, we construct structured Hessian approximations from single gradients that can be tractably estimated, stored, and inverted. The choice of user-specified symmetry group directly governs the trade-off between approximation accuracy and computational cost. Moreover, our framework provides a unifying theoretical lens for viewing existing methods; in particular, a specific choice of symmetry group recovers Shampoo/Muon-like curvature estimates. We validate our method on a range of network architectures, and deploy it to second-order optimization benchmarks, including a small language model. Our curvature estimation framework might find applications in other machine learning problems such as uncertainty estimation, continual learning, compression/pruning, training data attribution, and more.
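A small example of the averaging idea, not the paper's construction: if a loss is invariant under all permutations of n coordinates, averaging a curvature matrix H over the group (H ↦ mean over P of P H Pᵀ) projects it onto matrices of the form a·I + b·11ᵀ. Such a matrix needs only two numbers to store and can be inverted in closed form via Sherman-Morrison. The brute-force average below uses `itertools.permutations` and is only meant for tiny n; the closed-form parameters show what it collapses to.

```python
from itertools import permutations


def group_average(h):
    """Average P H P^T over all n x n permutation matrices P (brute force, tiny n only)."""
    n = len(h)
    perms = list(permutations(range(n)))
    avg = [[0.0] * n for _ in range(n)]
    for perm in perms:
        # (P H P^T)[i][j] = H[perm[i]][perm[j]] for the permutation matrix of perm.
        for i in range(n):
            for j in range(n):
                avg[i][j] += h[perm[i]][perm[j]] / len(perms)
    return avg


def closed_form_params(h):
    """The two parameters of the symmetric average: diagonal mean and off-diagonal mean."""
    n = len(h)
    diag = sum(h[i][i] for i in range(n)) / n
    off = (sum(map(sum, h)) - n * diag) / (n * (n - 1))
    return diag, off


def structured_inverse(diag, off, n):
    """Inverse of the n x n matrix with `diag` on the diagonal and `off` elsewhere.

    Writing it as a*I + b*11^T with a = diag - off and b = off, Sherman-Morrison gives
    (a*I + b*11^T)^{-1} = (1/a) * (I - b / (a + n*b) * 11^T).
    """
    a, b = diag - off, off
    c = b / (a + n * b)
    return (1 - c) / a, -c / a   # (diagonal entry, off-diagonal entry) of the inverse
```

Larger symmetry groups give coarser but cheaper structures, which mirrors the accuracy/cost trade-off described in the abstract.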

2606.00436 2026-06-02 stat.ME math.ST stat.ML stat.TH

Weighted Conformal Clustering

Anirban Nath, YoonHaeng Hur, Genevera I. Allen

AI summary: Proposes a weighted conformal method for constructing valid confidence sets for cluster labels. It corrects the mismatch between calibration labels and true labels by formulating the problem as a conditional label-distribution shift, achieving finite-sample marginal coverage.

Abstract

Clustering is a central tool for discovering latent structure in unlabeled data; yet modern clustering pipelines often end with a hard assignment of each observation to a cluster without rigorous measures of assignment uncertainty. We propose a novel weighted conformal approach for constructing valid confidence sets for cluster labels. The key difficulty is that the labels available for calibration are not observed ground-truth labels, but synthetic labels produced by a data-dependent clustering algorithm. Our method develops a conformal inference algorithm that corrects the resulting mismatch with the latent target labels through weights by formulating conformal clustering as a conditional label-distribution shift problem. We first derive an oracle procedure that attains finite-sample marginal coverage and then develop a computationally tractable and implementable version using estimated conditional label probabilities and novel augmented calibration. We show that the coverage of the estimated-weight procedure depends on the estimator, giving an explicit bound on the loss relative to the nominal level. Empirical studies demonstrate that the proposed weighted approach offers improvements over the recently proposed split conformal clustering procedure in terms of informative confidence set size, especially in nonlinear and high-dimensional clustering applications.
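The weighting step rests on a weighted conformal quantile. Below is a generic sketch of that ingredient, in the style of weighted split conformal prediction under distribution shift, and not the paper's clustering-specific procedure: calibration nonconformity scores carry weights, the test point's weight is placed on +infinity, and a candidate cluster label enters the set when its score does not exceed the weighted (1 - alpha) quantile. How the weights are obtained (the paper uses estimated conditional label probabilities) is left abstract, and the scores in the example are made up.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration scores plus a point mass at +inf.

    scores[i] has (unnormalized) weight weights[i]; the test point contributes
    test_weight at +inf, as in weighted split conformal prediction.
    """
    total = sum(weights) + test_weight
    target = (1 - alpha) * total
    cumulative = 0.0
    for s, w in sorted(zip(scores, weights)):
        cumulative += w
        if cumulative >= target - 1e-12:   # small tolerance for floating-point sums
            return s
    return math.inf


def conformal_label_set(label_scores, q):
    """Keep every candidate cluster label whose nonconformity score is <= q."""
    return sorted(label for label, s in label_scores.items() if s <= q)


if __name__ == "__main__":
    cal_scores = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0]
    q = weighted_conformal_quantile(cal_scores, [1.0] * 9, test_weight=1.0, alpha=0.1)
    print(q, conformal_label_set({"A": 0.3, "B": 1.8, "C": 2.5}, q))
```

With equal weights this reduces to the usual split conformal rule (the ceil((n + 1)(1 - alpha))-th smallest score); unequal weights are where a distribution-shift correction enters.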

2606.00425 2026-06-02 stat.ME stat.ML

Empirical Likelihood with Generative AI

Jiguang Li, Sid Kankanala, Veronika Rockova

AI summary: Proposes a nonparametric Bayesian framework based on exponentially tilted empirical likelihood that projects Dirichlet process posterior draws onto a moment-restricted model for efficient, parallel inference. Establishes Bernstein-von Mises and consistency theorems and applies the method to predicting stock returns from news headlines.

Abstract

Moment conditions are widely used to identify parameters in models where the full likelihood is either unknown or intentionally left unspecified. Empirical likelihood methods address this problem by assigning probability weights to the observed data so that the sample moment conditions hold exactly. Building on this idea, we propose a nonparametric Bayesian framework based on exponentially tilted empirical likelihood. This Bayesian formulation is particularly appealing in settings where prior information is more naturally specified on the observables rather than on the underlying parameters. Such settings arise in the presence of auxiliary data sources or synthetic data generated by modern generative AI models. Inference proceeds by projecting posterior draws from a Dirichlet process onto the moment-restricted model, yielding a computationally efficient procedure that is naturally amenable to parallelization. We establish new Bernstein-von Mises and consistency theorems for the resulting projection posterior under both vanishing-prior and persistent-prior regimes. In an application to return prediction using overnight news headlines, we show that AI-generated auxiliary data can provide a useful source of indirect regularization when informative priors on the parameter itself are unavailable.
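For a single moment condition E[X - theta] = 0, exponentially tilted empirical likelihood puts weights p_i ∝ exp(lambda · g_i) on the observations, with g_i = x_i - theta and lambda chosen so that the tilted sample moment is exactly zero. The sketch below solves for lambda by bisection, which works because the tilted moment is increasing in lambda (its derivative is the tilted variance of g). It is a scalar illustration only and omits the Dirichlet-process projection and the generative-AI prior used in the paper.

```python
import math


def etel_weights(xs, theta, tol=1e-12, max_iter=200):
    """Exponentially tilted empirical likelihood weights for the moment E[X - theta] = 0.

    Requires min(xs) < theta < max(xs); otherwise no tilt can satisfy the moment.
    """
    g = [x - theta for x in xs]
    if not (min(g) < 0 < max(g)):
        raise ValueError("theta must lie strictly inside the range of the data")

    def tilted(lam):
        # Subtract the max exponent before exponentiating to avoid overflow.
        top = max(lam * gi for gi in g)
        w = [math.exp(lam * gi - top) for gi in g]
        total = sum(w)
        return [wi / total for wi in w]

    def moment(lam):
        return sum(p * gi for p, gi in zip(tilted(lam), g))

    lo, hi = -1.0, 1.0
    while moment(lo) > 0:      # moment(lam) is increasing in lam, so widen the bracket
        lo *= 2
    while moment(hi) < 0:
        hi *= 2
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if moment(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return tilted(0.5 * (lo + hi))


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 10.0]
    p = etel_weights(xs, theta=3.0)
    print([round(pi, 4) for pi in p], sum(pi * x for pi, x in zip(p, xs)))
```

When theta equals the sample mean, the tilt is zero and the weights are uniform; moving theta away from the mean reweights the data toward the side that satisfies the moment.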

2606.00413 2026-06-02 stat.ML cs.LG

Riemannian Stochastic Optimization for Sufficient Dimension Reduction

Thibault Pautrel, François Portier

AI summary: Proposes SMAVE, a Riemannian stochastic gradient ascent algorithm that recasts sufficient dimension reduction as a smooth maximization on the Stiefel manifold, enabling efficient low-dimensional subspace recovery.

Abstract

Sufficient dimension reduction (SDR) makes high-dimensional regression tractable by projecting the covariates onto a low-dimensional subspace that preserves the conditional mean of the response. Existing gradient-based estimators either operate in the ambient space and suffer from the curse of dimensionality, or localize in the reduced space at a per-outer-iteration cost at least quadratic in the sample size. We show that minimizers of the population Minimum Average Variance Estimation (MAVE) risk approximate the same Grassmannian target as the Outer Product of Gradients (OPG), and recast the empirical criterion as a smooth maximization on the Stiefel manifold with closed-form Riemannian gradient. The resulting algorithm, SMAVE, combines sparse projected-space nearest-neighbor localization with Riemannian stochastic gradient ascent. A simplified version comes with almost-sure convergence and a non-asymptotic rate matching the standard non-convex stochastic first-order scaling. Empirically, SMAVE matches or improves on RMAVE's synthetic subspace recovery at moderate-to-high ambient dimension, and on four real datasets it uniformly improves over OPG and is competitive with or outperforms RMAVE at orders of magnitude lower runtime.
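The basic mechanics of Riemannian gradient ascent on a Stiefel manifold can be seen in its simplest case, a single column (the unit sphere): project the Euclidean gradient onto the tangent space, take a step, and retract by renormalizing. The sketch below maximizes the Rayleigh quotient uᵀAu this way. The objective, step size, and iteration count are illustrative assumptions standing in for the MAVE-type criterion, which is not reproduced here, and the update is deterministic rather than stochastic.

```python
import math


def riemannian_ascent_sphere(a, u0, step=0.1, iters=500):
    """Maximize f(u) = u^T A u over unit vectors by Riemannian gradient ascent.

    a: symmetric matrix as a list of rows; u0: nonzero starting vector.
    """
    n = len(u0)
    norm = math.sqrt(sum(c * c for c in u0))
    u = [c / norm for c in u0]
    for _ in range(iters):
        egrad = [2 * sum(a[i][j] * u[j] for j in range(n)) for i in range(n)]  # Euclidean gradient
        inner = sum(g * c for g, c in zip(egrad, u))
        rgrad = [g - inner * c for g, c in zip(egrad, u)]   # projection onto tangent space at u
        v = [c + step * g for c, g in zip(u, rgrad)]       # ascent step in the ambient space
        norm = math.sqrt(sum(c * c for c in v))
        u = [c / norm for c in v]                          # retraction back onto the sphere
    return u


if __name__ == "__main__":
    a = [[3.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
    print([round(c, 6) for c in riemannian_ascent_sphere(a, [1.0, 1.0, 1.0])])
```

For k > 1 columns the same three steps apply, with the tangent projection using the Stiefel metric and a QR or polar retraction in place of normalization.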

2606.00402 2026-06-02 stat.ME cs.AI stat.AP

A Distribution-Free Framework for Rewrite-Based Human-text Detection via Knockoff Filtering

Yi Liu

AI summary: Proposes a distribution-free statistical framework that converts any rewrite-based detector into one with finite-sample FDR guarantees, without retraining, by treating rewrite-based detection as a multiple hypothesis testing problem with knockoff structure.

Abstract

We propose a distribution-free statistical framework that converts arbitrary rewrite-based detectors into detectors with finite-sample FDR guarantees without retraining. Our key observation is that rewrite-based detection implicitly constructs knockoff samples, enabling LLM-generated text detection to be formulated as a multiple hypothesis testing problem with knockoff structure. This perspective separates the design of detection statistics from the control of false discoveries, allowing existing rewrite detectors to inherit finite-sample false discovery rate (FDR) guarantees through a simple calibration procedure. We demonstrate reliable FDR control with meaningful detection power across three detection models, 19 domains, and four LLMs.
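Once per-text statistics W_j are available, with large positive values indicating evidence against the null and signs symmetric under the null as in the knockoff framework, the selection step can use the standard knockoff+ threshold. The sketch below is that generic filter under those assumptions; how the paper builds W_j from rewrites, and its exact calibration procedure, are not reproduced.

```python
import math


def knockoff_plus_threshold(w, q):
    """Knockoff+ threshold: smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in sorted({abs(x) for x in w if x != 0}):
        false_est = 1 + sum(1 for x in w if x <= -t)
        discoveries = max(1, sum(1 for x in w if x >= t))
        if false_est / discoveries <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Indices of hypotheses rejected by the knockoff+ filter at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    w = [5.0, 4.0, 3.0, 2.0, 1.0, -0.5]
    print(knockoff_select(w, 0.2), knockoff_select(w, 0.1))
```

The count of negative statistics estimates the number of false discoveries, which is why the filter only needs sign symmetry under the null rather than a calibrated score distribution.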

2606.00394 2026-06-02 hep-lat stat.CO

Probing and graph coloring techniques for trace estimation in Lattice QCD

Mario Papace, Andreas Frommer, Jose Jimenez-Merchan, Bruno Lang, Gustavo Ramirez-Hidalgo, Christian Schneider

AI summary: Addressing the high variance of stochastic estimates of the trace of the inverse Wilson-Dirac matrix in Lattice QCD, reviews stochastic probing and proposes a multiplier-based coloring scheme that achieves valid colorings at arbitrary distances with fewer colors. Proves strictly lower variance than hierarchical probing at intermediate colorings; numerical experiments confirm a smooth, monotone variance reduction and improved accuracy.

Abstract

The computation of $\mathrm{Tr}[D^{-1}]$, where $D$ is the Wilson-Dirac matrix of Lattice QCD, is a fundamental and computationally demanding task with applications to disconnected hadronic correlation functions. Since $D^{-1}$ is a dense matrix of prohibitive size, its trace cannot be computed exactly, and one must resort to stochastic estimation via the Hutchinson estimator. The variance of the resulting estimation, however, can be large, as it is dominated by the off-diagonal entries of $D^{-1}$. We review the stochastic probing technique, which reduces the variance by constructing structured sampling vectors from distance-$d$ colorings of the graph associated with $D$, exploiting the exponential off-diagonal decay of $D^{-1}$ to eliminate dominant short-range contributions to the variance. We then present a novel multiplier-based coloring scheme, which achieves valid distance-$d$ colorings at arbitrary distances with significantly fewer colors than the established hierarchical probing construction. We prove that at any intermediate coloring falling between two consecutive hierarchical levels, the multiplier-based estimator achieves strictly lower variance than the partial hierarchical estimator, for large enough $d$. This is confirmed by numerical experiments showing that the multiplier-based variance decreases smoothly and monotonically with the number of colors, avoiding the irregular behavior affecting hierarchical probing at intermediate colorings, and achieving a substantial improvement in relative accuracy.
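The variance-reduction idea behind probing can be seen on a banded matrix. With a distance-1 coloring of the path graph (color = index mod 2), entries coupling two indices of the same color are zero for a tridiagonal matrix, so summing v_cᵀ A v_c over the color indicator vectors v_c recovers the trace exactly, while plain Hutchinson with random ±1 vectors is only correct on average. For D^{-1} in Lattice QCD the off-diagonal decay is exponential rather than exact, so probing removes only the dominant short-range terms. The matrix and coloring below are toy assumptions, not the multiplier-based scheme.

```python
import random


def quad_form(a, v):
    """Compute v^T A v for a dense matrix given as a list of rows."""
    n = len(v)
    return sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))


def hutchinson_trace(a, num_samples, seed=0):
    """Plain Hutchinson estimator with Rademacher (+1/-1) probe vectors."""
    rng = random.Random(seed)
    n = len(a)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += quad_form(a, v)
    return total / num_samples


def probing_trace(a, colors):
    """Probing estimator: sum of v_c^T A v_c over the indicator vectors of each color class.

    It is exact whenever A[i][j] == 0 for every pair i != j that shares a color.
    """
    n = len(a)
    return sum(quad_form(a, [1.0 if colors[i] == c else 0.0 for i in range(n)])
               for c in set(colors))


if __name__ == "__main__":
    n = 8
    # Toy tridiagonal matrix: 4 on the diagonal, -1 on the first off-diagonals.
    a = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    colors = [i % 2 for i in range(n)]        # distance-1 coloring of the path graph
    print(probing_trace(a, colors), hutchinson_trace(a, 10), 4.0 * n)
```

Using a single color for every index turns the probing estimate into the sum of all entries, which shows why the coloring distance must match the reach of the off-diagonal couplings.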

2606.00346 2026-06-02 stat.ME stat.AP

Network knockoffs: controlling false discovery in dyadic space

Justin Van Ee, Yoichiro Kanno, Jacob Rash, Mevin Hooten

AI summary: Proposes a knockoff variable selection method that simulates synthetic features directly on the topological network to control the false discovery rate in high-dimensional dyadic regression.

Comments
20 pages, 6 figures

Abstract

Phenomena such as epidemiological processes, hydrologic systems, social platforms, utility services, and supply chains can be represented as topological networks. A central question about these networks concerns connectivity and the permeability of edges. Dyadic regression and related approaches have been proposed to identify network features associated with pairwise node-level differences. In high-dimensional settings, it is important to control the number of spuriously selected features. However, controlling the false discovery rate for dyadic outcomes is challenging because dependence among dyads invalidates classic asymptotic procedures and complicates standard data splitting and knockoff approaches. We propose a novel knockoff variable selection procedure that simulates synthetic features directly on the topological network prior to constructing the augmented design matrix in dyadic space. Empirically, our method controls the false discovery rate for both node- and edge-level features. The Benjamini-Hochberg, Benjamini-Yekutieli, Storey Q-value, data-splitting, and standard knockoff procedures were all anticonservative. We applied our network knockoffs to assess the impassability of over 1000 stream barriers in North Carolina for Salvelinus fontinalis. Compared to data splitting and traditional knockoff approaches, our proposed approach selected a higher proportion of barriers previously assessed to impede fish movement.

2606.00343 2026-06-02 math.ST stat.CO stat.ML stat.TH

Polar Depth for Potentially Heavy-Tailed Data

Stephan Clemençon, Carlos Fernándes, Pavlo Mozharovskyi, Anne Sabourin

AI summary: Proposes the polar depth function for analyzing extremes of multivariate heavy-tailed distributions, proves that above large thresholds it converges to the polar depth of the limiting distribution, and applies it to anomaly detection.

Abstract

Motivated by the analysis of the behaviour of extremes from multivariate heavy-tailed distributions, we introduce a novel notion of statistical depth, referred to as Polar Depth. The polar depth function is naturally expressed in polar coordinates, as is the limiting distribution of a regularly varying random variable, beyond asymptotically large thresholds, once its marginals have been appropriately normalized. Not only does the polar depth function make it easy to order the extreme values taken by a heavy-tailed random variable X and find natural applications in anomaly detection, but it is also possible to show, as we prove under appropriate assumptions in this article, that the polar depth of the largest observations, i.e. observations X whose norm is larger than t>0, converges to the polar depth of the limiting distribution as t tends to infinity. Although designed to quantify the depth of multivariate extremes, the polar depth is interesting in its own right, insofar as this notion is more relevant than the alternatives proposed in the literature, the halfspace depth in particular, for distributions whose support is included in a halfspace. Here, we demonstrate its properties and analyze statistical issues related to its estimation from both finite-sample and asymptotic points of view. We present numerical results to empirically demonstrate its relevance, particularly for the statistical analysis of extreme observations and more specifically for the identification of anomalies among them.

2606.00329 2026-06-02 eess.SY cs.LG cs.SY stat.ML

Benchmarking Recursive-Collapse Warning Claims Under Matched False-Positive Control

David Mullett

AI summary: Proposes Loopzero, a benchmark framework that evaluates collapse-warning claims for recursive systems via a directional telemetry pattern (gain G, recursive persistence p, diversity δ) under a matched false-positive budget, and reports that the tested standard detectors do not reach an accepted operating point.

Comments
29 pages, 7 figures, 2 tables; supplementary materials: 9 pages, 1 figure, 4 tables. Code, derived data packets, and Lean artifact: https://github.com/davidmullett/loopzero-paper-public (release tag lean-v1.0)

Abstract

Recursive systems can enter collapse-like regimes -- self-reinforcing amplification, persistent recursion, and narrowing diversity that mask accelerating internal degradation -- before overt failure becomes visible. We introduce Loopzero, a claim-bounded benchmark framework for testing whether recursive failures follow a directional telemetry pattern: rising gain (G), recursive persistence (p), and declining diversity ($δ$). The claim boundary is specified in Lean; the Lean artifact does not verify real telemetry, benchmark validity, or detector performance. We evaluate the bridge on two frozen public-artifact benchmarks: a segmented public-markets benchmark (Volmageddon 2018, COVID MWCB 2020) and a MovieLens-25M offline deterministic recommender replay. Detectors are evaluated under a locked equal-false-positive contract (FP $\in$ [0.03, 0.07], pre-registered) so all configurations face the same alert budget. Neither tested standard comparators nor Loopzero's pre-registered quantile detector achieved an accepted operating point. Directional witness alignment held on both canonical benchmarks, with adjacent-horizon and row-level limitations disclosed. Digitized Shumailov et al. (2024) LLM training-loop trajectories are directionally consistent with the pattern; matched-FP evaluation in that domain is deferred. The contribution is a reproducible, falsifiable benchmark framework for evaluating recursive-collapse warning claims under an explicit alert-budget contract -- non-acceptance reported as a first-class scientific outcome.
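A matched false-positive contract can be enforced by calibrating each detector's alarm threshold on alert-free (null) periods. The sketch below, with hypothetical helper names and made-up scores, places the threshold at an empirical upper quantile of the null scores and then checks that the realized false-positive rate lies in the pre-registered band [0.03, 0.07]. It is not Loopzero's code, and a real contract would be checked on held-out null data rather than on the calibration scores themselves.

```python
import math


def calibrate_threshold(null_scores, target_fp=0.05):
    """Threshold at the empirical (1 - target_fp) quantile of the null scores,
    so that at most a target_fp fraction of them exceed it (alarm = score > threshold)."""
    ranked = sorted(null_scores)
    n = len(ranked)
    k = math.ceil((1 - target_fp) * n) - 1      # index of the empirical quantile
    return ranked[min(max(k, 0), n - 1)]


def false_positive_rate(null_scores, threshold):
    """Fraction of null scores that would raise an alarm."""
    return sum(1 for s in null_scores if s > threshold) / len(null_scores)


def within_contract(fp_rate, band=(0.03, 0.07)):
    """True if the realized false-positive rate lies in the agreed alert-budget band."""
    return band[0] <= fp_rate <= band[1]


if __name__ == "__main__":
    null = [float(i) for i in range(100)]       # made-up null detector scores
    thr = calibrate_threshold(null)
    fp = false_positive_rate(null, thr)
    print(thr, fp, within_contract(fp))
```

Fixing the alert budget first and comparing detection only afterwards is what lets different detectors be compared on equal terms.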

2606.00327 2026-06-02 stat.ME cs.LG stat.AP stat.ML

Cluster Analysis with Resampling for Validation and Exploration (CARVE)

Kai R. Wycik, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

AI summary: Presents CARVE, an open-source package that uses resampling to assess clustering stability and generalizability, provides diagnostics at the global, cluster, and sample levels, and outperforms traditional clustering validation indices.

Abstract

Clustering is widely used across the sciences as the foundation for downstream data-driven scientific discoveries. However, clustering results are highly sensitive to the choice of algorithm, preprocessing, and the number of clusters $k$, producing scientific claims that are often not reproducible. The current state of the art for validating clustering solutions consists of clustering validation indices (CVIs) such as Silhouette, Davies-Bouldin, and Calinski-Harabasz, which rely on geometric assumptions that break down on the heavy-tailed, high-dimensional, and nonlinearly structured data encountered in biomedical research. Resampling-based alternatives - grounded in the ideas of clustering stability and generalizability - have been proposed but remain scattered across specialized tools with no unified, accessible software. We fill this gap with CARVE (Cluster Analysis with Resampling for Validation and Exploration), an open-source Python and R package that jointly evaluates multiple clustering algorithms and hyperparameters, returning stability and generalizability diagnostics at the global, cluster, and sample level together with principled selection rules and consensus-based cluster labels. Across six synthetic benchmarks CARVE consistently recovers near-optimal clusterings where classical indices degrade substantially. On experimental genomics and proteomics data sets, CARVE recovers finer biological structure when classical CVIs collapse entirely. CARVE is available with a scikit-learn-compatible Python API and an analogous R interface compatible with Seurat workflows.
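Resampling-based stability checks need a way to compare two clusterings whose label names are arbitrary. A common choice, assumed here rather than taken from CARVE's documentation, is the adjusted Rand index (ARI). The sketch below computes it from the contingency counts and averages it over all pairs of labelings of the same points, for example clusterings produced on perturbed copies of the data.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same points (label names are arbitrary)."""
    n = len(labels_a)
    pairs = comb(n, 2)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:          # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def pairwise_stability(labelings):
    """Mean ARI over all pairs of labelings (e.g. runs on resampled or perturbed data)."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))   # same partition, renamed labels
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))   # worse than chance
```

Because ARI is chance-corrected, values near 0 indicate agreement no better than random relabeling, which is a natural reference point for calling a clustering unstable.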

2606.00322 2026-06-02 cs.LG stat.ML

Perturbative methods for non-parametric instrumental variable

Wei Bu, Arthur Gretton

AI summary: Proposes a nonparametric instrumental variable estimation method inspired by perturbation theory in physics, improving kernel ridge regression with systematic higher-order perturbative corrections and reducing prediction error by up to 99% in high-dimensional ill-posed problems.

Comments
8+24 pages, 4 figures, comments welcome

Abstract

We introduce a perturbative approach for nonparametric instrumental variable (NPIV) estimation. Drawing inspiration from perturbation theory in physics, we extend standard kernel ridge methods with systematic higher-order perturbative corrections that significantly improve estimation accuracy. Spectrally, the perturbation introduces mixing between different eigenmodes of the expectation integral operator, which becomes especially useful when the integral equation is ill-defined. One source of such ill-definedness can be the curse of dimensionality. Our method is effective across various dimensionality regimes, particularly when the dimensionality parameter $β$, defined through the number of samples $n$ and the dimension $d$ as $n^β = d$, becomes large. Experimental results show that our first-order perturbative corrections can reduce prediction error by up to 99\% in high-dimensional ill-defined cases ($β > 0.7$) compared to standard ridge regression approaches. The performance improvement is maintained across a wide range of dimensions, with the advantage becoming more pronounced as dimensionality increases.

2606.00309 2026-06-02 cs.LG stat.ML

Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo

Xiaoyu Wang, Jonathan H. Huggins

AI summary: Addressing the lack of theoretical guidance for tuning the hyperparameters of the SGLD-Gibbs algorithm in latent variable models, derives a statistical scaling-limit theory and proposes tuning rules that ensure meaningful uncertainty quantification.

Journal ref
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

Abstract

Stochastic gradient Langevin dynamics combined with Gibbs updates (SGLD--Gibbs) provides a highly scalable approach to approximate Bayesian inference in latent variable models. However, it remains unclear how to tune the algorithm's hyperparameters in a principled manner to ensure the uncertainty estimates are statistically meaningful. In this work, we address this gap in tuning guidance by developing a statistical scaling limit theory for SGLD--Gibbs. We derive a joint asymptotic limit for the global parameters and latent variables under appropriate space-time rescaling. We show that global parameters converge to a diffusion-type limit, while each latent variable converges to a jump process, reflecting the use of intermittent Gibbs updates. This joint jump-diffusion structure reveals how latent-variable randomness contributes to the stationary distribution of the global parameters. We leverage our results to propose explicit guidance on hyperparameter tuning for SGLD--Gibbs that ensures meaningful uncertainty quantification. Numerical experiments show that SGLD--Gibbs with our tuning guidance leads to better parameter estimates, uncertainty quantification, and predictive performance than stochastic variational inference.
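For readers unfamiliar with the global-parameter half of SGLD-Gibbs, the sketch below runs plain SGLD on a toy conjugate model (x_i ~ N(theta, 1), prior theta ~ N(0, 10)) using the update theta ← theta + (eps/2)·(∇log prior + (N/n)·Σ_batch ∇log lik) + N(0, eps). The model, step size eps, and batch size n are illustrative assumptions; there are no latent variables, so the Gibbs step and the paper's tuning rules are not shown. With a constant step size, minibatch gradient noise can inflate the spread of the chain relative to the exact posterior, so only the chain mean is compared with the conjugate answer.

```python
import random


def sgld_gaussian_mean(data, prior_var=10.0, step=1e-4, batch_size=10,
                       num_steps=5000, burn_in=1000, seed=0):
    """Stochastic gradient Langevin dynamics for theta in x_i ~ N(theta, 1), theta ~ N(0, prior_var)."""
    rng = random.Random(seed)
    n_data = len(data)
    theta = 0.0
    samples = []
    for t in range(num_steps):
        batch = rng.sample(data, batch_size)
        grad_prior = -theta / prior_var
        # Rescale the minibatch log-likelihood gradient by N / n to estimate the full-data gradient.
        grad_lik = (n_data / batch_size) * sum(x - theta for x in batch)
        theta += 0.5 * step * (grad_prior + grad_lik) + rng.gauss(0.0, step ** 0.5)
        if t >= burn_in:
            samples.append(theta)
    return samples


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
    samples = sgld_gaussian_mean(data)
    exact_mean = sum(data) / (len(data) + 1 / 10.0)   # conjugate posterior mean
    print(sum(samples) / len(samples), exact_mean)
```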

2606.00302 2026-06-02 stat.ML cs.LG

ERICA: Quantifying Replicability of Cluster Analysis

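Siamak K. Sorooshyari, Manuel A. Rivas, Robert Tibshirani

AI summary: Proposes the ERICA framework, which computes a statistic from iterative clustering assignments to quantify whether the cluster structure in a dataset is replicable. Applied to synthetic data and breast cancer gene expression data, it finds the synthetic data replicable, while some of the real data show non-replicability.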

Abstract

Despite being ubiquitous in science, clustering remains a technique whose results are not quantitatively scrutinized via a framework. We present an analysis called evaluating replicability via iterative clustering assignments (ERICA) that is applied to a dataset to determine whether clusters are identified in a replicable manner. The pipeline computes a statistic that describes whether structure is found in a dataset. Quantitative visualization methods are presented to answer important questions such as the similarity between clusters, and the identity of points that may be outliers. When tested on synthetic data, the findings show clusters being discovered in a replicable manner. However, we note a possibility for non-replicable results when the pipeline is applied to three gene expression datasets for breast cancer subtype validation. The study underscores the need for rigorous inspection and offers a practical tool for doing so.

2606.00296 2026-06-02 stat.ML cs.LG math.AP

Is Zero-Shot Super-Resolution Possible in Operator Learning?

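Unique Subedi, Ambuj Tewari

AI summary: Systematically studies zero-shot super-resolution in operator learning. It proves that this can be information-theoretically impossible, identifies Hölder smoothness of the output functions as a sufficient condition, and gives generalization bounds.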

Abstract

Neural operators are often reported to exhibit zero-shot super-resolution, a phenomenon in which a model trained on coarse grids produces accurate predictions on finer testing grids without additional retraining. Despite strong empirical evidence, the theoretical foundations of this phenomenon remain unclear. In this work, we provide a systematic theoretical study of zero-shot super-resolution in operator learning. We first show that zero-shot super-resolution can be information-theoretically impossible even in benign settings such as when the input functions are available over the entire continuum and the ground truth is a simple rank-one linear operator. We then identify Hölder smoothness of the output functions as a sufficient condition for zero-shot super-resolution and derive corresponding generalization bounds. Finally, we also validate the identified failure modes through experimental results.

2606.00293 2026-06-02 cs.LG stat.ME stat.ML

Accurate Large-sample Uncertainty Quantification using Stochastic Gradient Markov Chain Monte Carlo

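Yu Wang, Jie Ding, Jonathan H. Huggins

AI summary: Addresses the difficulty of tuning stochastic gradient descent and stochastic gradient Langevin dynamics with large batches or misspecified models. It proposes new discrete-time approximations that accurately predict the stationary covariance, iterate-average covariance, and integrated autocorrelation time, and it gives non-asymptotic error bounds.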

Journal ref
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

Abstract

Tuning algorithms such as stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD) for approximate sampling and uncertainty quantification remains challenging, particularly in the practically relevant settings when the batch size is large or the model is misspecified. Existing theory that provides tuning guidance relies on continuous-time limits or strong statistical assumptions, which can become quantitatively inaccurate in these regimes. We address these shortcomings by proposing new discrete-time approximations to SG(L)D with and without momentum, which enable accurate predictions of the stationary covariance, iterate average covariance, and integrated autocorrelation time. Moreover, we prove quantitative, non-asymptotic error bounds showing that these estimates are sufficiently accurate for practical tuning and uncertainty quantification. Numerical experiments demonstrate that our theory yields improved tuning guidance across a range of models and data-generating distributions where existing approaches fail, including when using the β-divergence rather than log-loss to obtain statistically robust inferences.
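The following scalar illustration is not the paper's approximation. It shows why continuous-time limits can mislead at large step sizes. Consider constant-step SGD on the quadratic loss a·θ²/2 with additive gradient noise of variance s². This is the AR(1) recursion θ_{t+1} = (1 - h·a)·θ_t - h·ξ_t. Its exact stationary variance is h·s² / (a·(2 - h·a)), while the small-step Ornstein-Uhlenbeck approximation gives h·s² / (2a). The helper names are hypothetical.

```python
import random


def exact_stationary_var(h, a, s2):
    """Exact stationary variance of theta_{t+1} = (1 - h*a) * theta_t - h * xi_t with Var(xi_t) = s2."""
    rho = 1.0 - h * a
    if abs(rho) >= 1.0:
        raise ValueError("unstable step size: need 0 < h*a < 2")
    return h * h * s2 / (1.0 - rho * rho)  # equals h*s2 / (a*(2 - h*a))


def continuous_time_var(h, a, s2):
    """Small-step (Ornstein-Uhlenbeck) approximation of the same stationary variance."""
    return h * s2 / (2.0 * a)


def simulated_var(h, a, s2, n_steps=200_000, seed=0):
    """Empirical variance of one long simulated run of the recursion."""
    rng = random.Random(seed)
    theta, total, total_sq = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        theta = (1.0 - h * a) * theta - h * rng.gauss(0.0, s2 ** 0.5)
        total += theta
        total_sq += theta * theta
    mean = total / n_steps
    return total_sq / n_steps - mean * mean


for h in (0.01, 0.5, 1.5):
    print(h, exact_stationary_var(h, 1.0, 1.0), continuous_time_var(h, 1.0, 1.0))
```

The ratio of the two formulas is 2 / (2 - h·a). At h·a = 0.01 they agree to within about 0.5%. At h·a = 1.5 the continuous-time formula underestimates the stationary variance by a factor of 4.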

2606.00265 2026-06-02 stat.ML cs.LG

Out-of-Distribution generalization of quantile regression with heavy tailed inputs: an SVM approach

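Baptiste Leroux, Clément Dombry, Anne Sabourin

AI summary: Addresses extrapolation in quantile regression when the covariate takes unusually large values. It proposes a support vector machine (SVM) framework that uses reproducing kernel Hilbert spaces to handle high-dimensional, nonlinear settings, and it establishes finite-sample learning guarantees.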

Comments
48 pages, 5 figures

Abstract

We study quantile regression in an extrapolation regime where the covariate takes unusually large values. Under regular variation assumptions, extreme observations can be effectively characterized through their angular components, enabling learning strategies that focus on the angle of the most extreme observations. This approach is formalized through the minimization of an asymptotic conditional risk that localizes learning in the tail of the covariate distribution. We propose a novel Support Vector Machine (SVM) framework for extreme quantile regression, leveraging reproducing kernel Hilbert spaces to handle high-dimensional and nonlinear settings. Our method also accommodates unbounded response variables and avoids restrictive transformations. We establish finite-sample learning guarantees under mild regularity assumptions. The proposed framework unifies ideas from statistical learning and multivariate extremes, providing a tractable and theoretically grounded approach to extrapolation. We complement our theoretical findings with an empirical study on river flow data from the Danube, demonstrating the practical relevance of our methods.
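Two ingredients named in the abstract are easy to show in isolation. The first is the pinball (check) loss, whose minimizer is a quantile. The second is the polar split x = r·w of a covariate into a radius r = ||x|| and an angle w = x / ||x||. This sketch is not the SVM estimator. It assumes the Euclidean norm, uses hypothetical helper names, and fits only a constant quantile, with no kernel and no covariates.

```python
import math


def pinball_loss(residual, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_quantile(y, tau):
    """Minimise the empirical pinball risk over candidate constants taken from the sample."""
    return min(y, key=lambda c: sum(pinball_loss(v - c, tau) for v in y))


def tail_angles(points, k):
    """Split each point as x = r * w with r = ||x||_2 and return the angles w of the k largest-norm points."""
    ranked = sorted(points, key=lambda p: math.hypot(*p), reverse=True)[:k]
    return [tuple(c / math.hypot(*p) for c in p) for p in ranked]


print(best_constant_quantile([float(i) for i in range(1, 101)], 0.9))
print(tail_angles([(3.0, 4.0), (0.1, 0.1), (-6.0, 8.0)], 2))
```

For the sample 1, ..., 100 and τ = 0.9, the empirical pinball risk is flat between 90 and 91, so either endpoint is a minimizer.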

2606.00262 2026-06-02 cs.LG cs.AI stat.AP stat.ML

When Softmax Fails at the Top: Extreme Value Corrections for InfoNCE

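Melihcan Erol, Suat Evren, Oktay Ozel, Alexander Morgan, Jongha Jon Ryu, Lizhong Zheng

AI summary: Addresses the mismatch between the softmax assumption in InfoNCE and the embedding setting used in contrastive learning. It proposes WEINCE, a correction based on extreme value theory that improves frozen-feature evaluation on five vision benchmarks.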

Comments
Presented in ICML 2026

Abstract

InfoNCE is the standard contrastive learning objective, but its softmax form is not only a computational convenience: it also encodes a statistical assumption about how the top-scoring example is selected. Using extreme value theory, we show that this assumption is often misaligned with the normalized embedding setting used in modern contrastive learning. Motivated by this mismatch, we propose WEINCE, a simple modification of InfoNCE that uses anchor-wise online batch statistics to blend the usual softmax logits with an endpoint shortfall correction, adding no trainable parameters. Across five vision benchmarks, WEINCE yields consistent improvements in frozen-feature evaluation. These results show that a more faithful statistical treatment of hard negatives can improve contrastive objectives.
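For reference, the standard InfoNCE loss for a single anchor is the cross-entropy of a softmax over temperature-scaled similarities, with the positive as the target class. This sketch assumes cosine similarity and a temperature of 0.1. The abstract does not give enough detail to reproduce the WEINCE endpoint-shortfall correction, so it is not included here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor: -log softmax probability of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max before exponentiating (stable log-sum-exp)
    log_partition = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_partition - logits[0]


anchor = [1.0, 0.0]
print(info_nce(anchor, [1.0, 0.0], [[-1.0, 0.0], [0.0, 1.0]]))
```

When every candidate scores the same, the loss equals log K for K candidates (the positive plus K - 1 negatives). This is the usual chance-level reference.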

2606.00243 2026-06-02 cs.NE q-bio.NC stat.ML

Dynamics and Representation Structure of Local Approximations to Gradient-Based Learning in Linear Recurrent Neural Networks

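Ezekiel Williams, Alexandre Payeur, Guillaume Lajoie

AI summary: Applies dynamical systems theory to analyze how local learning algorithms (RFLO and tBPTT) differ from BPTT in linear RNNs. It finds that RFLO solutions are restricted to low-rank perturbations of the initial parameters.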

Comments
accepted to ICML 2026 as poster. Current version is camera-ready submission

Abstract

Biological and neuromorphic recurrent neural networks (RNNs) are subject to spatial and temporal locality constraints on the information that can plausibly be used during learning. A common strategy to satisfy these constraints is to modify gradient descent by neglecting non-local terms to varying degrees, as in random feedback local online (RFLO) learning and truncated backpropagation through time (tBPTT). However, the learning dynamics of these algorithms, and how they compare with BPTT, remain poorly understood. We apply dynamical systems theory to data-aligned linear RNNs -- whose dynamics can be separated into orthogonal modes -- to compare stationary solutions, stability properties, and convergence rates, finding qualitatively distinct behaviour for RFLO versus BPTT and one-step tBPTT. We further observe that the solutions learned by RFLO are restricted to low-rank perturbations of initial parameters, a result which holds beyond the data-aligned setting. Our work provides analytical insight into how locality constraints shape learning dynamics, with implications for neuroscientific models of learning and alternative optimization approaches for RNNs.

2606.00241 2026-06-02 cs.LG cs.AI stat.ML

InfoAtlas: A Foundation Model for Zero-Shot Statistical Dependence Estimate

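Zhengyang Hu, Yanzhi Chen, Hanxiang Ren, Qunsong Zeng, Youyi Zheng, Adrian Weller, Kaibin Huang, Yanchao Yang

AI summary: Proposes InfoAtlas, a foundation-model architecture that infers mutual information directly in a single forward pass. This enables zero-shot estimation with a 100× speedup while maintaining accuracy.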

Comments
Accepted to ICML 2026

Abstract

Measuring statistical dependency between high-dimensional random variables is a fundamental task in data science and machine learning. Neural mutual information (MI) estimators offer a promising avenue, but they typically require costly iterative optimization for each new dataset, making them impractical for real-time applications. We present InfoAtlas, a foundation model-like architecture that eliminates this bottleneck by directly inferring MI in a single forward pass. Pretrained on large-scale synthetic data with rich dependence patterns, InfoAtlas learns to identify diverse dependence structures and predict MI directly from the dataset. Comprehensive experiments demonstrate that InfoAtlas matches state-of-the-art neural estimators in accuracy while achieving a 100× speedup, can flexibly handle varying dimensions and sample sizes through a single unified model, and generalizes effectively to complex, real-world scenarios. By reformulating MI estimation as an inference task, InfoAtlas establishes a foundation for real-time dependency analysis.
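The following is a classical baseline for comparison, not InfoAtlas. It computes the closed-form mutual information of a bivariate Gaussian, -½·log(1 - ρ²), and compares it with a plug-in histogram estimate that uses equal-frequency bins. The bin count and sample size are arbitrary assumptions. Plug-in estimators like this one are simple, but they are biased and scale poorly with dimension.

```python
import math
import random


def gaussian_mi(rho):
    """Closed-form mutual information (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho * rho)


def plugin_mi(x, y, bins=10):
    """Plug-in MI estimate (in nats) after equal-frequency binning of each variable."""
    n = len(x)

    def rank_bins(values):
        order = sorted(range(n), key=lambda i: values[i])
        labels = [0] * n
        for rank, i in enumerate(order):
            labels[i] = rank * bins // n
        return labels

    bx, by = rank_bins(x), rank_bins(y)
    joint = {}
    for cell in zip(bx, by):
        joint[cell] = joint.get(cell, 0) + 1
    px = [bx.count(i) / n for i in range(bins)]
    py = [by.count(j) / n for j in range(bins)]
    return sum((c / n) * math.log((c / n) / (px[i] * py[j])) for (i, j), c in joint.items())


rng = random.Random(0)
rho, n = 0.5, 20_000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rho * xi + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for xi in x]
print(gaussian_mi(rho), plugin_mi(x, y))
```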

2606.00233 2026-06-02 math.ST stat.TH

Density Evolution: A Multiscale View of Density Estimation

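Kisung You

AI summary: Proposes a density-evolution perspective in which a data set is studied through a path of densities indexed by smoothing scale, diffusion time, and similar parameters. The paper reviews connections among Gaussian kernel density estimation, scale-space methods, finite mixture models, cluster trees, and persistent homology. It also discusses inference for feature lifetimes, high-dimensional complications, and links with score-based generative diffusion.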

Abstract

Density estimation is often presented as a choice among parametric summaries, finite mixtures, and nonparametric smoothers. This review argues for a complementary view: a data set can be studied through a path of densities indexed by smoothing scale, diffusion time, model complexity, density level, or noise level. We call this perspective density evolution. Under this lens, Gaussian kernel density estimation is heat flow from the empirical measure; scale-space methods, critical bandwidths, mode trees, and derivative-significance displays describe the evolution of modal and derivative structure; finite mixtures and mixture reduction provide compressed representations of kernel-like estimates; and cluster trees and persistent homology summarize evolving level-set topology. We review these connections and discuss inference for feature lifetimes, high-dimensional complications, and links with score-based generative diffusion. We also include three elementary structural results: nondegenerate modes move along smooth branches, a natural moment-preserving Gaussianization semigroup is forced to be Ornstein-Uhlenbeck, and shared-covariance Gaussian mixtures become log-concave once component means are sufficiently concentrated. Together, these ideas shift attention from choosing one density estimate to studying the multiscale probability landscape.
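The claim that Gaussian kernel density estimation is heat flow from the empirical measure can be checked numerically. With bandwidth h, the KDE solves ∂f/∂t = ½·∂²f/∂x² in the variance t = h², which is equivalent to ∂f/∂h = h·∂²f/∂x². The sketch below verifies this identity with finite differences. It also counts modes as h grows, which is the one-dimensional idea behind mode trees and critical bandwidths. The data, grid, and helper names are illustrative.

```python
import math


def kde(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


def count_modes(data, h, lo=-10.0, hi=10.0, step=0.01):
    """Count strict local maxima of the KDE on a regular grid over [lo, hi]."""
    n = int(round((hi - lo) / step))
    f = [kde(lo + i * step, data, h) for i in range(n + 1)]
    return sum(1 for i in range(1, n) if f[i - 1] < f[i] > f[i + 1])


data = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]

# Heat equation in t = h^2 with generator (1/2) d^2/dx^2, rewritten in h: df/dh = h * d^2f/dx^2.
h, x, dh, dx = 0.7, 0.3, 1e-4, 1e-3
df_dh = (kde(x, data, h + dh) - kde(x, data, h - dh)) / (2.0 * dh)
d2f_dx2 = (kde(x + dx, data, h) - 2.0 * kde(x, data, h) + kde(x - dx, data, h)) / dx ** 2
print(df_dh, h * d2f_dx2)

# Mode count along the bandwidth path: two separated modes merge into one at large h.
print([count_modes(data, bw) for bw in (0.5, 1.0, 5.0)])
```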

2606.00231 2026-06-02 stat.ME

On Asymptotic Outlier Rejection in Bayesian Mixed Poisson Regression Models Under Extreme Target and Covariate Values

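Ilaria Pia, Jarno Vanhatalo

AI summary: Studies the robustness of mixed Poisson regression models when covariates or target values are outlying. It finds that the models are robust to target outliers but not to covariate outliers, so new methods are needed.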

Comments
42 pages, 8 figures

Abstract

Bayesian models are claimed to be fully robust against outliers if, asymptotically, observations infinitely far from the other data do not influence the posterior. Early works in robust Bayesian inference concentrated on continuous distributions and i.i.d. observations. Robustness results were then extended to linear regression in the presence of infinite residuals, either through an outlying outcome or an outlying covariate. Recently, Hamura et al. (2025, arXiv:2106.10503) presented a count regression model, with a Poisson-Rescaled Beta (Poisson-RSB) target distribution and Gaussian latent variables (GLVs), which is robust against infinitely large counts and able to handle zero-inflation. We continue from the work of Hamura et al. and study the robustness properties of mixed Poisson regression models with GLVs in the presence of outlying data points arising from either corrupted covariates or corrupted target values. In linear regression the two cases are interchangeable, since either an infinite target or an infinite covariate leads to an infinite residual. We show, however, that in count regression an infinite covariate is not symmetric to an infinite target. Specifically, we show that mixed Poisson models are not asymptotically robust to outliers resulting from infinite covariates. We then consider three alternative mixed Poissons (Poisson-Gamma, Poisson-log-t, and Poisson-RSB) as the target distribution and examine, both theoretically and via simulations as well as real-world case studies, their behavior in the presence of outliers of three alternative types: large target values as well as large and small covariate values. Our results show that models robust to data points with an anomalous target are not robust to data points with anomalous covariates, calling for methodological development of models that are robust to covariate outliers.

2606.00183 2026-06-02 cs.LG cs.AI math.OC stat.ML

Agentic Transformers Provably Learn to Search via Reinforcement Learning

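Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

AI summary: Constructs a two-head transformer that implements randomized depth-first search and analyzes its policy-gradient training dynamics. It proves that this search mechanism emerges in stages from sparse reinforcement feedback and generalizes to greater depths.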

Abstract

Tree search is a central abstraction behind many language-agent reasoning and decision-making tasks: agents must explore actions, remember failures, and backtrack toward promising alternatives. Yet, we lack a theoretical understanding of how transformer-based policies acquire such search capabilities from the training dynamics of reinforcement learning (RL). We study this question in a stochastic $k$-ary tree environment, where an agentic transformer observes only its trajectory history through interaction and receives a terminal reward for reaching a hidden leaf goal node. We first construct a two-head transformer that implements randomized depth-first search (DFS): one head tracks previous actions, while the other detects failure outcomes and triggers backtracking. We then analyze the training dynamics of policy gradient under a depth-wise curriculum, showing that this same DFS mechanism emerges in stages from sparse reinforcement feedback without expert demonstrations. The resulting policy exhibits depth generalization: after training only on depth-$1$ and depth-$2$ trees, it succeeds on deeper full trees. We further show that, under imbalanced goal distributions, discounting the return leads to a ranked DFS policy that prioritizes higher-probability branches. Overall, our results identify a mechanistic normal form for transformer-based search, in which attention heads specialize and cooperate to extract decision-relevant traces from context and convert them into agentic action selection via RL training.
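For reference, the procedure that the two-head transformer is constructed to implement, randomized depth-first search with backtracking, can be written out directly. Here the tree is a full k-ary tree whose nodes are tuples of child indices; this encoding is hypothetical. Unlike the agent in the paper, this function reads the branching factor and depth explicitly rather than inferring them from its trajectory history.

```python
import random


def randomized_dfs(k, depth, goal, rng):
    """Randomised depth-first search with backtracking on a full k-ary tree of the given depth.

    Nodes are tuples of child indices (the root is ()). Returns the list of nodes the
    searcher occupies step by step, including backtracking moves, ending at `goal`.
    """
    path = [()]          # current root-to-node path (the DFS stack)
    visited = [()]
    tried = {(): set()}  # children already explored from each node
    while path[-1] != goal:
        node = path[-1]
        untried = [c for c in range(k) if c not in tried[node]] if len(node) < depth else []
        if untried:
            child = node + (rng.choice(untried),)
            tried[node].add(child[-1])
            tried[child] = set()
            path.append(child)
        else:
            path.pop()   # non-goal leaf or exhausted subtree: backtrack to the parent
            if not path:
                raise ValueError("goal is not a node of this tree")
        visited.append(path[-1])
    return visited


walk = randomized_dfs(k=3, depth=3, goal=(2, 0, 1), rng=random.Random(0))
print(len(walk), walk[-1])
```

Loosely, the `tried` sets play the role the abstract assigns to the two heads: they remember which actions were already taken and which ones led to dead ends.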

2606.00181 2026-06-02 stat.ME

Infinite-Dimensional Spherical Kernel ridge Regression

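Beatrice Matteo, Almond Stoecker, Shahin Tavakoli

AI summary: Proposes an intrinsic spherical regression framework that uses vector-valued reproducing kernel Hilbert space theory to turn the infinite-dimensional estimation problem into a finite-dimensional one, and develops a BFGS algorithm for the resulting problem.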

Abstract

We introduce a novel regression framework designed to model non-linear responses situated on a sphere $\mathbb{S}$ of finite or infinite dimension. Unlike traditional tangent-space regressions, which lift responses to a tangent space $T_o \mathbb{S}$ and thereby violate intrinsic spherical distances, our proposed method employs an intrinsic approach. We model the conditional mean through an intercept $o \in \mathbb{S}$ and a linear predictor function $f: \mathfrak{X} \to T_o \mathbb{S}$. This formulation transforms the estimation problem into finding a linear predictor within a function space, but utilizing a metric defined by spherical geometry rather than standard Euclidean distance. Leveraging vector-valued reproducing kernel Hilbert space theory, our approach reduces the infinite-dimensional estimation challenge to a manageable finite-dimensional problem via the representer theorem, leading to an efficient BFGS-based estimation algorithm. We establish convergence rates and analyze the finite-sample behavior of our estimator, concluding with a practical application to density regression. The full implementation is available in R.
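This sketch gives background geometry for intrinsic spherical regression. It assumes, though the abstract does not say so, that the tangent-space predictor f(x) ∈ T_o S is mapped to the sphere by the Riemannian exponential map at o. On the unit sphere, exp_o(v) = cos(||v||)·o + sin(||v||)·v/||v|| for v orthogonal to o, and the log map inverts it everywhere except at the antipode -o. The same formulas hold on the unit sphere of any Hilbert space; the sketch uses plain Python lists in three dimensions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def exp_map(o, v):
    """Riemannian exponential map on the unit sphere at base point o; v must be orthogonal to o."""
    norm_v = math.sqrt(dot(v, v))
    if norm_v == 0.0:
        return list(o)
    return [math.cos(norm_v) * a + math.sin(norm_v) * b / norm_v for a, b in zip(o, v)]


def log_map(o, p):
    """Inverse of exp_map (defined for p != -o): tangent vector at o of length d(o, p) pointing to p."""
    c = max(-1.0, min(1.0, dot(o, p)))
    angle = math.acos(c)
    if angle == 0.0:
        return [0.0] * len(o)
    w = [b - c * a for a, b in zip(o, p)]  # component of p orthogonal to o
    norm_w = math.sqrt(dot(w, w))
    return [angle * x / norm_w for x in w]


def geodesic_distance(p, q):
    """Great-circle distance between two unit vectors."""
    return math.acos(max(-1.0, min(1.0, dot(p, q))))


o = [0.0, 0.0, 1.0]   # intercept on the sphere
v = [0.3, -0.4, 0.0]  # a tangent vector at o with length 0.5
p = exp_map(o, v)
print(p, geodesic_distance(o, p), log_map(o, p))
```

Distances are measured along great circles. This intrinsic metric is what tangent-space regression distorts when it works with Euclidean distances in T_o S.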

2606.00157 2026-06-02 stat.ML cs.AI cs.LG math.PR

Interpreting FCDNNs via RG on Exponential Family

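Fuzhou Gong, Zigeng Xia

AI summary: Establishes a correspondence between the renormalization group (RG) method in statistical physics and the training of deep neural networks. For continuous input data from the exponential family, it proves that after training, the characteristic parameters of the feature-layer output of a fully connected DNN equal the fixed points under the RG method. This explains the feature-extraction ability of DNNs.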

Comments
18 pages, 2 figures

Abstract

We consider establishing the interpretability theory of deep learning through constructing a corresponding relationship between the renormalization group (RG) method in statistical physics and the training process of deep neural networks (DNNs). We have proved the constructed relationship using the one-dimensional Ising model as the input data. In this paper we generalize our results to the case of continuous input data, which is a necessary preparation for applying the corresponding framework to real-world data. To be representative, we consider a class of data distributions in the exponential family. We prove that when the parameters of fully connected (FC) DNNs achieve their optimal value after training, the characteristic parameters of the feature layer output of DNNs are equal to the fixed points of the characteristic parameters of input data under RG method for continuous fields. This conclusion shows that the training process of DNNs is equivalent to RG calculation on this kind of data and therefore the network can extract main features from the input data just like RG. Also, the equivalence further validates the correspondence framework we have established, providing an explanation for the outstanding performance of DNNs on real-world data.

2606.00128 2026-06-02 stat.AP

Exploring the periodicity of flight patterns

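Sarah M. Coleman, H. Sherry Zhang, Lydia R. Lucchesi, Saptarshi Roy

AI summary: Uses more than 35 years of commercial flight data from the U.S. Bureau of Transportation Statistics to analyze the periodicity of flight scheduling at airline hub airports, through visualization and quantification with Shannon entropy. It also compares the four largest U.S. airlines.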

Abstract

Each year the American Statistical Association (ASA) hosts the Annual Data Challenge Expo, which tasks participants with analyzing a given dataset and presenting their work at the Joint Statistical Meetings (JSM). The 2025 Data Challenge Expo tasked participants with analyzing over 35 years of commercial flight data from the United States Bureau of Transportation Statistics (BTS). These data provide extensive geographic coverage and operational details for the U.S. domestic aviation market. For millions of past flights, there is information about the flight's date, origin, destination, carrier, plane, departure, and arrival. In this article, we present our analysis for the 2025 JSM Data Challenge Expo. We chose to explore patterns in the daily scheduling of departures and arrivals across airlines, airports, and time. In doing so, we observed distinct scheduling "waves", or periodic structures, at major airline hubs as well as large Federal Aviation Administration (FAA) hubs. In the remainder of this article, we detail the process of visualizing periodicity in flight scheduling as well as quantifying it through the calculation of Shannon entropy. An additional element of the 2025 Data Challenge Expo is the incorporation of a second dataset, chosen by the participants. We detail the use of a BTS dataset with passenger enplanement (boarding) information to determine FAA hub classification (as opposed to airline-specific hubs). Furthermore, we discuss results from this visual and quantitative analysis, highlighting noticeable differences in scheduling periodicity and entropy across airports for the "big four", the four largest carriers in U.S. aviation: American Airlines, Delta Air Lines, United Airlines, and Southwest Airlines.
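One simple way to turn a daily schedule into an entropy number is to bin departures by clock hour and take the Shannon entropy of that distribution. The abstract does not state the binning or the logarithm base. This sketch assumes hourly bins and base 2 (bits), and the function name is hypothetical. Lower entropy means departures are concentrated in fewer hours, as in pronounced scheduling waves.

```python
import math
from collections import Counter


def hourly_entropy(departure_hours, base=2.0):
    """Shannon entropy of departures binned by clock hour (0-23); base 2 gives bits."""
    counts = Counter(h % 24 for h in departure_hours)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


spread_out = list(range(24)) * 10                      # same number of departures every hour
banked = [7] * 50 + [12] * 50 + [17] * 50 + [21] * 50  # four concentrated waves
print(hourly_entropy(spread_out), hourly_entropy(banked))
```

A perfectly even schedule reaches the maximum of log2(24) ≈ 4.585 bits. Four equal banks give exactly 2 bits.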

2606.00115 2026-06-02 cs.CV cs.LG stat.ML

Physics from Video: Identifiability of Time-Invariant Second-Order ODEs under Minimal Trajectory Conditions

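Yuanyuan Wang, Wenjie Wang, Kun Zhang, Mingming Gong

AI summary: Studies the structural identifiability of continuous-time physical laws from raw pixels. It proves that under minimal trajectory conditions an encoder-only pipeline can uniquely recover the parameters of second-order linear ODEs, and it introduces a variance-floor regularizer to stabilize the decoder-free objective.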

Comments
Accepted at ICML 2026

Abstract

Bridging the gap between visual realism and physical understanding is a core challenge for video-based world models. We study the structural identifiability of continuous-time physical laws from raw pixels, focusing on whether an encoder-only pipeline can uniquely recover the parameters of second-order linear ODEs. We prove that a level-set slope-coverage condition ensures the learned latent space is locally affine to the true physical state, enabling exact parameter recovery. Our theory provides the first characterization of minimal data requirements across damping regimes, establishing that underdamped systems are identifiable from a single video clip, whereas other regimes require three diverse trajectories. We further introduce a variance-floor regularizer to stabilize the decoder-free objective and prevent latent collapse. Validated on synthetic and real-world data, our approach demonstrates that interpretable physical constants can be reliably estimated from video without the need for compute-intensive pixel reconstruction, ensuring both physical correctness and transparency. Code is available at https://github.com/wenjiewang3/PhysicsFromVideo.
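Once the physical state x(t) is available, recovering the coefficients of x'' + 2ζω·x' + ω²·x = 0 is a linear least-squares problem. The sketch assumes x(t) is observed directly and without noise. This sidesteps the part the paper actually studies, which is learning the state from pixels with an encoder. It uses an analytic underdamped trajectory and central finite differences, and all names are hypothetical.

```python
import math


def damped_trajectory(omega, zeta, dt, n):
    """Underdamped solution x(t) = exp(-zeta*omega*t) * cos(omega_d*t) of x'' + 2*zeta*omega*x' + omega^2*x = 0."""
    omega_d = omega * math.sqrt(1.0 - zeta ** 2)
    return [math.exp(-zeta * omega * i * dt) * math.cos(omega_d * i * dt) for i in range(n)]


def fit_second_order(x, dt):
    """Least-squares fit of x'' = -a*x - b*x' from central differences; returns (a, b)."""
    rows = []
    for i in range(1, len(x) - 1):
        vel = (x[i + 1] - x[i - 1]) / (2.0 * dt)
        acc = (x[i + 1] - 2.0 * x[i] + x[i - 1]) / dt ** 2
        rows.append((x[i], vel, acc))
    # Normal equations for acc ~ alpha*x + beta*vel, with alpha = -a and beta = -b.
    sxx = sum(r[0] * r[0] for r in rows)
    sxv = sum(r[0] * r[1] for r in rows)
    svv = sum(r[1] * r[1] for r in rows)
    sxa = sum(r[0] * r[2] for r in rows)
    sva = sum(r[1] * r[2] for r in rows)
    det = sxx * svv - sxv * sxv
    alpha = (sxa * svv - sxv * sva) / det
    beta = (sxx * sva - sxv * sxa) / det
    return -alpha, -beta


omega, zeta, dt = 2.0, 0.1, 1e-3
a, b = fit_second_order(damped_trajectory(omega, zeta, dt, 5000), dt)
print(a, b)  # estimates of omega^2 and 2*zeta*omega
```

From the fitted a = ω² and b = 2ζω, the physical constants follow as ω = √a and ζ = b / (2√a).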

2606.00082 2026-06-02 cs.LG cs.AI stat.ML

Hoeffding Concept Bottleneck Models with Applications to Overhead Images

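Clément Bénard, Manon Arfib, Christophe Labreuche, Victor Quétu

AI summary: Addresses the poor explainability and information leakage of linear concept bottleneck models. It proposes HCBM, a nonlinear sparse aggregation method based on the Hoeffding functional decomposition, proves its robustness to interconcept leakage, and shows that it outperforms conventional linear CBMs in classification and in object detection on overhead images.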

Abstract

Explainability of deep learning algorithms is critical for computer-vision applications with high-stakes decisions. Concept bottleneck models (CBM) have recently shown promising performance in providing explainable and accurate predictions for classification problems, based on a bottleneck of high-level concepts. Existing CBM methods rely on a linear aggregation of the concept scores to compute predictions. However, a large number of concepts is often used in this linear approach, which undermines explainability and favors information leakage. In general, the underlying relation between concepts and output logits is not linear. Therefore, we introduce Hoeffding Concept Bottleneck Models (HCBM), which build on the Hoeffding functional decomposition of gradient-boosted trees to provide non-linear and sparse aggregations of concept scores, and generate compact predictions using prime implicants. HCBM are proved to be robust to interconcept leakage, and outperform standard linear CBM in practice, as shown in extensive experiments. Beyond classification, HCBM can be adapted to object detection, and we focus on a challenging case with overhead images to show the high performance of HCBM in these settings.
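The Hoeffding (functional ANOVA) decomposition splits a function of several inputs into a constant, main effects, and interactions that each average to zero. This toy assumes two binary concepts with independent uniform distributions and a hypothetical logit table. It does not reproduce HCBM, which applies the decomposition to gradient-boosted trees over concept scores. It only shows what the main-effect and interaction terms are.

```python
from itertools import product


def hoeffding_2d(f):
    """Hoeffding (functional ANOVA) decomposition of f on {0,1}^2 with independent uniform inputs.

    `f` maps (x1, x2) to a value; returns (f0, f1, f2, f12) such that
    f[(a, b)] == f0 + f1[a] + f2[b] + f12[(a, b)] and every component
    averages to zero over each of its own arguments.
    """
    points = list(product((0, 1), repeat=2))
    f0 = sum(f[p] for p in points) / 4.0
    f1 = {a: (f[(a, 0)] + f[(a, 1)]) / 2.0 - f0 for a in (0, 1)}
    f2 = {b: (f[(0, b)] + f[(1, b)]) / 2.0 - f0 for b in (0, 1)}
    f12 = {(a, b): f[(a, b)] - f0 - f1[a] - f2[b] for a, b in points}
    return f0, f1, f2, f12


# Hypothetical logit that fires only when both concepts are present (an AND).
logit = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 4.0}
f0, f1, f2, f12 = hoeffding_2d(logit)
print(f0, f1, f2, f12)
```

A linear aggregation of the two concept scores can represent only f0 + f1 + f2. The nonzero f12 is the nonlinear interaction that such an aggregation misses.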

2605.31200 2026-06-02 cs.LG stat.ML

Beyond Additive Decompositions: Interpretability Through Separability

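Jinyang Liu, Munir Eberhardt Hiabu

AI summary: Proposes Tensor Separation Learning (TSL), a regression model that learns a sum of rank-1 products of univariate per-feature functions via a stagewise greedy procedure with orthogonal refitting. TSL avoids the signal cancellation and extrapolation problems of additive decompositions, produces visualizations faithful to the fitted components, and comes with approximation-rate guarantees.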

Comments
To appear in Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Interpretable machine learning requires models that are accurate and structurally faithful to the data. Existing explainability methods rely heavily on additive representations (e.g., Generalized Additive Models (GAMs), SHapley Additive exPlanations (SHAP), functional ANOVA), which can suffer from signal cancellation and off-support extrapolation in the presence of strong interactions. We propose Tensor Separation Learning (TSL), a regression model that learns a sum of rank-1 products of univariate per-feature functions via a stagewise greedy procedure with orthogonal refitting. By enforcing separability, TSL avoids the information loss inherent in additive projections caused by marginalizing higher-order interactions. The learned TSL model can be fully reconstructed from first-order partial dependence functions, up to constant factors. This stage-wise correspondence ensures that the resulting visualizations are faithful to the fitted components. We establish approximation-rate guarantees for functions with bounded mixed $p$-th order partial derivatives and demonstrate that TSL competes with black-box models on regression benchmarks.
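For a single separable (rank-1) component f(x1, x2) = g(x1)·h(x2), the first-order partial dependence functions determine f up to a constant factor. Since PD1 = g·mean(h) and PD2 = mean(g)·h, it follows that f = PD1·PD2 / mean(f) whenever mean(f) ≠ 0. The sketch checks this identity on a product grid with uniform averaging. It is not the TSL fitting procedure, and it covers only one rank-1 term; the function names are hypothetical.

```python
def partial_dependence(f, grid1, grid2):
    """First-order partial dependence functions of f(x1, x2), averaging uniformly over a product grid."""
    pd1 = [sum(f(a, b) for b in grid2) / len(grid2) for a in grid1]
    pd2 = [sum(f(a, b) for a in grid1) / len(grid1) for b in grid2]
    return pd1, pd2


def reconstruct_rank1(f, grid1, grid2):
    """Rebuild a separable f(x1, x2) = g(x1) * h(x2) as PD1 * PD2 / mean(f); requires mean(f) != 0."""
    pd1, pd2 = partial_dependence(f, grid1, grid2)
    mean_f = sum(pd1) / len(pd1)  # the average of PD1 equals the grand mean of f on the grid
    return [[p1 * p2 / mean_f for p2 in pd2] for p1 in pd1]


def separable(x1, x2):
    """Example rank-1 function with g(x1) = 1 + x1^2 and h(x2) = 2 + x2."""
    return (1.0 + x1 * x1) * (2.0 + x2)


grid = [i / 10 for i in range(11)]
table = reconstruct_rank1(separable, grid, grid)
```

For a sum of several rank-1 terms, the partial dependence functions add up across terms, so this single-term identity no longer applies directly. The abstract attributes the stage-wise correspondence to TSL's own fitting procedure.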

2605.24583 2026-06-02 cs.LG cs.CL stat.ML

Measuring Alignment-Induced Activation Shifts Correctly: A Template-Controlled Difference-in-Differences Protocol

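Yuki Nakamura

AI summary: Addresses the chat-template confound in comparisons of a model's internal activations before and after alignment. It proposes a template-controlled difference-in-differences protocol that separates the alignment shift from formatting effects, recovers the refusal direction, and raises its cosine alignment.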

Comments
11 pages, 1 figure. v3: substantially revised and reframed as a measurement-methodology paper. Code, data, and an immutable Zenodo archive are available at https://github.com/Nakammura/effective-rank-audit (DOI: 10.5281/zenodo.20341444)

Abstract

Comparing a model's internal activations before and after alignment is a natural way to ask what safety training changes: one forms the matrix of paired aligned-minus-base activations on safety-relevant inputs and reads off its effective rank or top direction. We show the obvious way to form this matrix is confounded. The aligned model is evaluated under a chat template the base model never saw, so the naive difference conflates the alignment shift with chat formatting. We introduce a four-variant decomposition of the modification matrix (naive, template-controlled, within-aligned, and difference-in-differences, DiD) that separates the two effects. Template control alone removes a 2.0-3.9x inflation of the measured effective rank across Llama-3.1-8B, Gemma-2-9B, and Qwen-2.5-7B; the DiD contrast is what recovers the refusal direction of Arditi et al. (2024), lifting its cosine alignment from 0.18-0.39 to 0.50-0.86. Projection-ablation across the three families confirms the recovered subspace is behaviorally active and that singular-value order is not causal order. We validate the protocol on a controlled testbed and distill it into measurement recommendations for activation-difference studies of alignment.

2605.20716 2026-06-02 cs.LG stat.ML

Decision-Path Patterns as Tree Reliability Signals: Path-based Adaptive Weighting for Random Forest Classification

决策路径模式作为树可靠性信号:基于路径的自适应加权用于随机森林分类

Youngjoon Park

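AI summary: Proposes using the structural pattern of each tree's decision path as an instance-adaptive reliability signal, so that more reliable trees can be weighted differently. This corrects random-forest errors that occur when trees with incorrect representations outnumber correct ones. Accuracy improves significantly on 36 binary classification benchmarks.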

Comments
32 pages, 3 figures. Code and data: https://github.com/DavidParkYJ/dwarfp
AI Chinese abstract

随机森林通过每棵树对特征空间的不同随机化表示进行构建。当具有错误表示的树在概率上超过正确表示的树时,即使集成整体拥有足够正确的信息,其统一投票也无法纠正这些区域的错误——这是本文解决的一种可约错误。我们提出利用每棵树决策路径的结构模式作为实例自适应可靠性信号,以识别并对更可靠的树进行差异化加权。在推理时,随机森林通过样本在每棵树中遍历的根到叶路径得出预测,因此路径级可靠性提供了比树级加权更细的粒度。我们表明,该信号反映了每棵树决策的实际可靠性,并且使用它能在36个二分类基准上比RF获得统计显著的准确率提升(Wilcoxon p < 0.0001)。测量了类别召回率回归——RF校正方法的典型失败模式:在0.2个百分点阈值下,零个少数类召回率回归和一个多数类召回率回归,表明偏差减少而非类别权衡。我们进一步量化了该方法从拟合RF本身可访问的可约错误;该估计与每个数据集的增益强相关(Pearson r = +0.840, p < 0.0001)。在它识别的合格组上,该方法平均准确率提升+0.99个百分点,且在每个数据集上严格获胜(7/0/0);可选的放大机制进一步将其提升至+1.48个百分点。

English abstract

Random forests construct each tree with a different, randomised representation of the feature space. Their uniform voting cannot correct errors in regions where trees with incorrect representations probabilistically outnumber correct ones, even when the ensemble collectively holds enough correct information - a reducible error that this paper addresses. We propose using the structural pattern of each tree's decision path as an instance-adaptive reliability signal to identify and differentially weight the more reliable trees. At inference, a random forest reaches its prediction through the root-to-leaf path the sample traverses in each tree, so path-level reliability offers a finer granularity than tree-level weighting can access. We show that this signal reflects the actual reliability of each tree's decision, and that using it yields a statistically significant accuracy improvement over RF on 36 binary classification benchmarks (Wilcoxon p < 0.0001). We also measure class-recall regression, the typical failure mode of RF correction methods, and find zero minority-recall regressions and a single majority-recall regression at the 0.2 pp threshold, indicating bias reduction rather than a class trade-off. We further quantify the reducible error accessible to the method from the fitted RF alone; this estimate correlates strongly with per-dataset gain (Pearson r = +0.840, p < 0.0001). On the qualifying group it identifies, the method delivers a mean +0.99 pp accuracy improvement with strict wins on every dataset (7/0/0); an optional amplification mechanism further raises this to +1.48 pp.
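The sketch below shows the failure mode of uniform voting and how per-instance reliability weights can reverse it. It is a minimal illustration. Each tree is assumed to return a `(label, reliability)` pair for one instance. The paper's contribution is how reliability is derived from decision-path patterns, and that step is not reproduced here. The reliability numbers are hypothetical.

```python
from collections import defaultdict


def uniform_vote(outputs):
    """Plain random-forest majority vote over (label, reliability) pairs."""
    counts = defaultdict(float)
    for label, _ in outputs:
        counts[label] += 1.0
    return max(counts, key=counts.get)


def weighted_vote(outputs):
    """Each tree's vote counts with its (hypothetical) per-instance reliability."""
    scores = defaultdict(float)
    for label, reliability in outputs:
        scores[label] += reliability
    return max(scores, key=scores.get)


# One test instance: three low-reliability trees outvote two reliable ones.
outputs = [(0, 0.2), (0, 0.2), (0, 0.3), (1, 0.9), (1, 0.8)]
print(uniform_vote(outputs), weighted_vote(outputs))  # 0 1
```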

2605.26919 2026-06-02 cs.LG stat.ML

Agile Online Model Selection: Resolving Adaptation Lag via Safeguarded Large Learning Rates

敏捷在线模型选择:通过受保护的大学习率解决适应滞后

Kei Takemura, Ryuta Matsuno, Keita Sakuma

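AI summary: Proposes an optimistic online mirror descent algorithm that uses safeguarded large learning rates (up to Θ(T)) together with a post-hoc penalty mechanism. It adapts quickly in non-stationary environments while keeping near-optimal regret bounds.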

Comments
Accepted to KDD 2026
AI Chinese abstract

在非平稳环境中保持预测准确性需要在线模型选择来自主适应未知的分布变化。然而,现有的免调参算法在鲁棒性和敏捷性之间存在根本性权衡。具体来说,为了确保动态遗憾界,它们必须将学习率限制为小常数(例如,$O(1)$)。这种限制不可避免地会在突变期间导致显著的适应滞后。为了解决这个问题,我们提出了一种新颖的乐观在线镜像下降算法,该算法利用受保护的大学习率,最高可达$Θ(T)$,其中$T$是轮数。我们的关键技术贡献是一种事后惩罚机制,该机制动态监控不稳定的更新,并排除导致过度遗憾的学习率,从而消除了限制性先验约束的需要。我们证明了累积惩罚仍为$O(\log T)$,使得我们的算法在良性情况下实现优越的速率的同时,匹配接近最优的最坏情况保证。在三个合成数据集和十一个多样化真实世界数据集上的实证评估表明,我们的方法将适应滞后从数百轮减少到几轮,始终优于免调参基线。

English abstract

Maintaining predictive accuracy in non-stationary environments requires online model selection to adapt autonomously to unknown distribution shifts. However, existing tuning-free algorithms face a fundamental trade-off between robustness and agility. Specifically, to ensure dynamic regret bounds, they must restrict learning rates to small constants (e.g., $O(1)$). This restriction inevitably causes significant adaptation lag during abrupt changes. To resolve this, we propose a novel optimistic online mirror descent that utilizes safeguarded large learning rates up to $Θ(T)$, where $T$ is the number of rounds. Our key technical contribution is a post-hoc penalty mechanism that dynamically monitors unstable updates and excludes learning rates incurring excessive regret, eliminating the need for restrictive a priori constraints. We show that the cumulative penalty remains $O(\log T)$, allowing our algorithm to match near-optimal worst-case guarantees while achieving superior rates in benign cases. Empirical evaluations on three synthetic and eleven diverse real-world datasets demonstrate that our approach reduces the adaptation lag from hundreds of rounds to a few rounds, consistently outperforming tuning-free baselines.
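Adaptation lag can be seen in a toy experiment. The sketch below deliberately swaps in a different, standard technique: exponential weights with fixed-share mixing (Herbster and Warmuth). It is neither the paper's optimistic OMD nor its post-hoc penalty. The function names, the loss sequence and the mixing rate `alpha` are hypothetical. The sketch only illustrates one point. Once weights cannot collapse to zero, a larger learning rate cuts the number of rounds needed to move weight onto the new best model after an abrupt change.

```python
import math


def hedge_fixed_share(loss_rows, eta, alpha):
    """Exponential weights over k models with fixed-share mixing.
    Returns the weight vector played at each round (before that round's loss is seen)."""
    k = len(loss_rows[0])
    w = [1.0 / k] * k
    played = []
    for losses in loss_rows:
        played.append(list(w))
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses)]
        total = sum(w)
        # Mixing keeps every weight >= alpha / k, so a model can recover after a shift.
        w = [(1.0 - alpha) * wi / total + alpha / k for wi in w]
    return played


def adaptation_lag(played, best, start, threshold=0.9):
    """Rounds after `start` until model `best` first gets more than `threshold` weight."""
    for t in range(start, len(played)):
        if played[t][best] > threshold:
            return t - start
    return None


# Abrupt change at round 100: model 0 is best before it, model 1 after it.
losses = [[0.0, 1.0]] * 100 + [[1.0, 0.0]] * 200
for eta in (0.05, 5.0):
    print(eta, adaptation_lag(hedge_fixed_share(losses, eta, alpha=0.001), best=1, start=100))
```

With the small rate the lag runs to many dozens of rounds. With the large rate it is a couple of rounds. The price of a large rate in general is instability, which is the problem the paper's safeguard targets.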

2605.26431 2026-06-02 cs.CL stat.AP

Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent

探究LLM中的最简阶段结构:普遍依赖关系无法表示的内容

Yuanhao Chen, Peter Chin

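AI summary: Uses structural probes to evaluate LLMs on wh-movement stimuli designed so that UD distances are invariant across conditions. It finds a phase-count gradient and a phase-internal cohesion effect, suggesting that distributional pretraining can induce representations of formal-syntactic abstractions beyond UD.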

AI Chinese abstract

结构探针在普遍依赖关系(UD)上进行训练,而UD不编码形式句法抽象,如阶段边界或阶段内凝聚性。大型语言模型(LLM)是否编码这些抽象仍是一个开放问题,基于UD的探针在构造上无法回答。我们在wh-移位刺激上评估结构探针,这些刺激中UD距离在设计上跨条件不变——因此任何非零效应都反映了超越UD的结构。三个条件——裸小句、不定式和限定式——按wh-元素跨越的最简方案(MP)阶段边界数量排序。在来自四个家族的13个LLM中,我们在跨从句对上发现了阶段计数梯度(12/13模型),并在一个从句内对上发现了13/13的符号不对称性,该从句内对的UD距离跨条件相同——后者特别由阶段内凝聚性预测,这是MP抽象,在构造上对UD不可见。激活修补证实这些表征在12/13模型中因果活跃。这些发现表明,分布预训练可以诱导与形式句法抽象一致的表征,这些抽象超出了基于注释的探针所能达到的范围;基于UD的探针提供了句法编码的下界,而非上界。

English abstract

Structural probes train on Universal Dependencies (UD), which does not encode formal-syntactic abstractions such as phase boundaries or phase-internal cohesion. Whether large language models (LLMs) encode these remains an open question that UD-based probing cannot answer by construction. We evaluate structural probes on wh-movement stimuli where UD distances are invariant across conditions by design -- any non-zero effect therefore reflects structure beyond UD. The three conditions -- bare small clause, infinitival, and finite -- are ordered by the number of Minimalist Program (MP) phase boundaries the wh-element crosses. Across 13 LLMs from four families, we find a phase-count gradient on a cross-clause pair (12/13 models) and a 13/13 sign asymmetry on a within-clause pair whose UD distance is identical across conditions -- the latter specifically predicted by phase-internal cohesion, an MP abstraction invisible to UD by construction. Activation patching confirms the representations are causally active in 12/13 models. These findings suggest that distributional pretraining can induce representations aligned with formal-syntactic abstractions beyond the reach of annotation-based probing; UD-grounded probes provide a lower bound on syntactic encoding, not an upper bound.
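The sketch below shows the quantity such probes read out. It uses the squared distance of Hewitt and Manning's structural probe, ||B(h_i - h_j)||^2, computed from a learned linear map B and two token representations. The condition contrast and the gradient check are hypothetical helpers. They encode the logic stated above: when the UD distance is identical across conditions, a nonzero mean difference in probe distance reflects structure beyond UD. The matrices and numbers are illustrative only.

```python
def probe_sq_distance(B, h_i, h_j):
    """Structural-probe squared distance ||B (h_i - h_j)||^2."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    proj = [sum(b_rk * d for b_rk, d in zip(row, diff)) for row in B]
    return sum(p * p for p in proj)


def condition_effect(dist_a, dist_b):
    """Mean paired difference of probe distances between two conditions
    whose UD tree distance is identical by design."""
    return sum(a - b for a, b in zip(dist_a, dist_b)) / len(dist_a)


def is_increasing(means):
    """True if mean probe distances rise with the number of phase boundaries crossed."""
    return all(x < y for x, y in zip(means, means[1:]))


B = [[1.0, 0.0], [0.0, 2.0]]
print(probe_sq_distance(B, [1.0, 1.0], [0.0, 0.0]))  # 5.0
print(is_increasing([1.0, 1.2, 1.5]))  # True
```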

2605.30242 2026-06-02 stat.AP

Multi-source land-use emissions reveal rising airborne fraction

多源土地利用排放揭示上升的大气份额

J. Eduardo Vera-Valdes

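AI summary: Uses all available land-use change emission series from the Global Carbon Budget 2025 to estimate airborne-fraction trends with a mixed-effects model. It finds that the airborne fraction rose from about 0.40 to about 0.47 over 1959-2024, and that this conclusion is robust.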

AI Chinese abstract

大气份额是指人为二氧化碳排放中留在大气中的比例,是碳循环响应和持续排放下剩余碳预算的关键指标。该比例是否在上升仍存在争议,因为推断对土地利用和土地覆盖变化(LULC)排放的不确定性敏感。这里我们使用全球碳预算2025中所有可用的LULC测量序列,并通过混合效应模型(对LULC序列具有随机截距和斜率)估计大气份额趋势。我们发现大气份额在1959-2024年间从约0.40增加到约0.47,并且这一结论在排除最后一年以及明确传播分母不确定性的替代设定下仍然稳健。这些结果澄清了为何早期研究报告了微弱或不确定的趋势证据,并加强了对以下观点的支持:越来越多的排放二氧化碳正在大气中积累,而非被陆地和海洋碳汇吸收,这对碳预算评估和近期减排需求具有影响。

English abstract

The airborne fraction is the share of anthropogenic carbon dioxide emissions that remains in the atmosphere and is a key indicator of carbon-cycle response and remaining carbon budgets under continued emissions. Whether this share is rising remains debated because inference is sensitive to uncertainty in land-use and land-cover change (LULC) emissions. Here we use all available LULC measurement series from Global Carbon Budget 2025 and estimate airborne-fraction trends with a mixed-effects model with random intercepts and slopes by LULC series. We find that the airborne fraction increased over 1959-2024, from about 0.40 to about 0.47, and that this conclusion is robust to excluding the final year and to alternative specifications that explicitly propagate denominator uncertainty. These results clarify why earlier studies reported weak or inconclusive trend evidence and strengthen support for the view that an increasing share of emitted carbon dioxide is accumulating in the atmosphere rather than being taken up by land and ocean sinks, with implications for carbon-budget assessment and near-term mitigation requirements.
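The yearly airborne fraction is atmospheric CO2 growth divided by total anthropogenic emissions (fossil plus LULC). Each LULC series therefore yields its own airborne-fraction series. The sketch below fits an ordinary least-squares trend per series and averages the slopes. This is only a crude stand-in for the fixed slope of the paper's mixed-effects model with random intercepts and slopes. The inputs are synthetic numbers, not Global Carbon Budget data.

```python
def airborne_fraction(growth, fossil, luc):
    """Yearly airborne fraction: atmospheric growth / (fossil + land-use emissions)."""
    return [g / (f + l) for g, f, l in zip(growth, fossil, luc)]


def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def mean_trend(years, growth, fossil, luc_series):
    """Average of per-series OLS slopes, one slope per LULC emission series."""
    slopes = [ols_slope(years, airborne_fraction(growth, fossil, luc)) for luc in luc_series]
    return sum(slopes) / len(slopes), slopes


# Synthetic inputs in GtC/yr (illustrative only).
years = [0, 1, 2, 3]
growth = [4.0, 4.2, 4.4, 4.6]
fossil = [8.0, 8.0, 8.0, 8.0]
luc_series = [[2.0] * 4, [1.0] * 4]
avg, per_series = mean_trend(years, growth, fossil, luc_series)
print(avg, per_series)
```

A larger LULC series enlarges the denominator and so lowers both the level and the trend of the airborne fraction. This is why inference depends on which LULC estimate is used.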

2605.30188 2026-06-02 cs.LG cs.AI stat.ML

CalArena: A Large-Scale Post-Hoc Calibration Benchmark

CalArena:大规模事后校准基准

Eugène Berta, David Holzmüller, Francis Bach, Michael I. Jordan

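AI summary: Introduces CalArena, a large-scale standardized benchmark of nearly 2000 experiments that compares calibration methods using the Post-Hoc Improvement (PHI) principle. It finds that smooth calibration functions outperform binning methods and that dedicated multiclass methods are essential in high-dimensional settings.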

Comments
30 pages, 9 figures
AI Chinese abstract

可靠的概率估计在许多机器学习应用中至关重要,但现代分类器往往校准不佳。事后校准提供了一种简单且广泛使用的解决方案,但由于提出的方法众多,加上小规模和不一致的评估,很难确定哪些方法在实践中真正有效。我们引入了一个大规模、标准化的事后校准基准,涵盖表格和计算机视觉任务的近2000个实验,包括二分类、多分类和大规模分类设置。我们的基准汇集了来自多种经典模型、现代深度学习架构和基础模型的预测,并在通用评估框架内提供了数十种校准方法的统一、可重复实现。我们认为,在适当评分规则下的事后改进(PHI)为比较事后方法提供了传统校准误差估计器的原则性替代方案,同时捕捉校准质量和模型预测性能的潜在退化。利用这一框架,我们进行了迄今为止最全面的事后校准实证研究。我们的结果揭示了跨领域的一致模式:平滑校准函数优于基于分箱的方法,专用多类方法在高维设置中至关重要,而通用机器学习模型在没有校准特定设计的情况下不具备竞争力。为促进未来研究,我们发布了所有数据、代码和评估工具,为开发和比较校准方法提供了一个即插即用的基准。

English abstract

Reliable probability estimates are critical in many machine learning applications, yet modern classifiers are often poorly calibrated. Post-hoc calibration provides a simple and widely used solution, but the large number of proposed methods, combined with small-scale and inconsistent evaluations, makes it difficult to determine which approaches are truly effective in practice. We introduce a large-scale, standardized benchmark for post-hoc calibration, covering nearly 2000 experiments across tabular and computer vision tasks, including binary, multiclass, and large-scale classification settings. Our benchmark aggregates predictions from a diverse set of classical models, modern deep learning architectures, and foundation models, and provides unified, reproducible implementations of dozens of calibration methods within a common evaluation framework. We argue that Post-Hoc Improvement (PHI) in proper scoring rules offers a principled alternative to traditional calibration error estimators for comparing post-hoc methods, capturing both calibration quality and potential degradation to the model's predictive performance. Using this framework, we conduct the most comprehensive empirical study of post-hoc calibration to date. Our results reveal consistent patterns across domains: smooth calibration functions outperform binning-based approaches, dedicated multiclass methods are essential in high-dimensional settings, and generic machine learning models are not competitive without calibration-specific design. To facilitate future research, we release all data, code, and evaluation tools, providing a plug-and-play benchmark for developing and comparing calibration methods.
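The sketch below shows the idea of measuring a post-hoc calibrator by its improvement in a proper scoring rule. It assumes PHI is the reduction in a proper score (here binary log loss) after calibration; the benchmark's exact definition or normalization may differ. Temperature scaling, fitted by grid search, stands in for one of the many calibrators compared. In practice the temperature is fitted on a calibration split and PHI is measured on a separate test split. The toy reuses one small dataset for brevity.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(probs, labels):
    """Mean negative log-likelihood, a strictly proper scoring rule, for binary labels."""
    eps = 1e-12
    return -sum(math.log(max(p if y == 1 else 1.0 - p, eps)) for p, y in zip(probs, labels)) / len(labels)


def fit_temperature(logits, labels, grid):
    """Pick the temperature on `grid` that minimizes log loss."""
    return min(grid, key=lambda t: log_loss([sigmoid(z / t) for z in logits], labels))


def post_hoc_improvement(logits, labels, t):
    """Score before calibration minus score after; positive means calibration helped."""
    before = log_loss([sigmoid(z) for z in logits], labels)
    after = log_loss([sigmoid(z / t) for z in logits], labels)
    return before - after


# Overconfident toy classifier: |logit| = 4 everywhere, but only 4 of 6 predictions are right.
logits = [4.0, 4.0, -4.0, -4.0, 4.0, -4.0]
labels = [1, 0, 0, 1, 1, 0]
t = fit_temperature(logits, labels, grid=[0.5, 1.0, 2.0, 4.0, 8.0])
print(t, round(post_hoc_improvement(logits, labels, t), 3))
```

A proper score rewards calibration and penalizes lost discrimination at the same time. This is the reason the abstract gives for preferring PHI over standalone calibration-error estimators.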

2506.02075 2026-06-02 stat.ME cs.LG

Position: Stop Chasing the C-index when Evaluating Survival Analysis Models

立场:评估生存分析模型时停止追逐C指数

Christian Marius Lillelund, Shi-ang Qi, Russell Greiner, Christian Fischer Pedersen

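AI summary: Critically examines evaluation practice in survival analysis and argues that concordance metrics such as the C-index are overused and misaligned with modeling objectives. It proposes a double-helix ladder framework for aligning evaluation metrics with model assumptions and shows experimentally that misalignment leads to misleading comparisons.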

Comments
ICML 2026 Position Paper Track (Spotlight)
AI Chinese abstract

当前生存分析评估的现状受到持续使用与既定建模目标不一致的评估指标的困扰。此外,许多此类评估基于隐含或不合理的删失假设。这意味着报告的性能可能具有误导性,并且可能无法回答评估旨在解决的科学或建模问题。在这篇立场论文中,我们批判性地审视了生存分析中的评估实践,并强调了删失如何使评估从根本上不同于标准回归或分类。我们特别关注基于一致性的度量,如C指数,我们证明其在文献中被过度使用。为了帮助确定合适的度量,我们提出了一组关键需求,并引入了一个双螺旋阶梯,其中有效评估需要度量与模型假设之间的对齐。通过控制实验,我们表明这种对齐的违反可能导致误导性的模型比较。最后,我们提供了关于如何评估生存模型的实用指导。

English abstract

The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. In addition, many such evaluations are based on censoring assumptions that are left implicit or unjustified. This means that the reported performance can be misleading and may fail to answer the scientific or modeling question the evaluation was intended to address. In this position paper, we critically examine evaluation practices in survival analysis and highlight how censoring makes evaluation fundamentally different from standard regression or classification. We place particular focus on concordance-based measures, such as the C-index, which we show are heavily overused in the literature. To help identify appropriate metrics, we propose a set of key desiderata and introduce a double-helix ladder, in which valid evaluation requires alignment between metric and modeling assumptions. Through controlled experiments, we show that violations of this alignment can lead to misleading model comparisons. We conclude by providing practical guidance on how to evaluate a survival model.
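It helps to see what a concordance measure actually scores. The sketch below computes Harrell's C-index with right censoring. A pair is comparable when the subject with the shorter time had an observed event. It counts as concordant when that subject received the higher risk, and a tie in risk counts as one half. The sketch simplifies by skipping pairs with tied times. The data are a made-up example.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: among comparable pairs, the share where the subject who failed
    first was assigned the higher risk (risk ties count 1/2).
    Simplification: pairs with tied times are skipped."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]  # the third subject is censored
risks = [0.9, 0.5, 0.7, 0.1]
print(harrell_c_index(times, events, risks))  # 0.8
# A strictly increasing transform of the risks leaves the C-index unchanged,
# so it cannot tell whether predicted survival times or probabilities are accurate.
print(harrell_c_index(times, events, [100 * r ** 3 for r in risks]))  # 0.8
```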

2605.29388 2026-06-02 stat.ME

Gaussian Differentially Private $e$-values: Construction, Threshold Calibration, and Multiple Testing

高斯差分隐私 $e$-值:构造、阈值校准与多重检验

Qi Kuang, Bowen Gang, Yin Xia

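AI summary: Builds a framework for differentially private $e$-values under Gaussian differential privacy ($\mu$-GDP). It derives a globally sharp rejection threshold from the Gaussian distribution of the optimal multiplicative perturbation. It also proposes a recursive peeling algorithm for multiple testing that improves power while controlling the false discovery rate.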

AI Chinese abstract

本文在高斯差分隐私($\mu$-GDP)下发展了一个差分隐私 $e$-值的框架。我们刻画了典型噪声机制,建立了最优乘法扰动服从高斯分布。利用该分布,我们推导了一个全局尖锐的拒绝阈值,该阈值严格优于标准的马尔可夫界。渐近分析表明,在低灵敏度情况下,校准后的私有检验相对于非私有基线实现了净功效增益。对于多重检验,我们引入了一种递归剥离算法,该算法自适应地将隐私预算集中在最有希望的假设上。这种构造保证了严格的 $\mu$-GDP,并产生与标准多重检验程序兼容的有效私有 $e$-值。模拟和全基因组关联研究证实,该方法在控制错误发现率的同时,优于朴素的全部加噪私有化,并恢复接近非私有基准的功效。

English abstract

This paper develops a framework for differentially private $e$-values under Gaussian differential privacy ($μ$-GDP). We characterize the canonical noise mechanism, establishing that optimal multiplicative perturbation follows a Gaussian distribution. Using this distribution, we derive a globally sharp rejection threshold that strictly improves upon the standard Markov bound. Asymptotic analysis shows that in low-sensitivity regimes, the calibrated private test achieves a net power gain over the non-private baseline. For multiple testing, we introduce a recursive peeling algorithm that adaptively concentrates the privacy budget on the most promising hypotheses. This construction guarantees rigorous $μ$-GDP and yields valid private $e$-values compatible with standard multiple testing procedures. Simulations and a genome-wide association study confirm that the method controls the false discovery rate while improving upon naive all-noisy privatization and recovering power close to non-private benchmarks.
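The sketch below shows one simple way to privatize an e-value under $\mu$-GDP. It assumes the log e-value changes by at most Δ between neighbouring datasets. Gaussian noise with standard deviation Δ/μ is added to log E; this is the standard Gaussian mechanism, which satisfies μ-GDP. Then σ²/2 is subtracted, so the multiplicative factor exp(Z - σ²/2) has mean one and the output remains an e-value under the null. This log-normal perturbation is an assumed construction, not necessarily the paper's canonical mechanism. The sketch also uses the plain Markov threshold 1/α, not the paper's sharper threshold.

```python
import math
import random


def private_e_value(log_e, sensitivity, mu, rng):
    """Gaussian mechanism on log E with noise sd = sensitivity / mu, mean-corrected
    so that E[exp(Z - sigma^2 / 2)] = 1 and validity under the null is preserved."""
    sigma = sensitivity / mu
    z = rng.gauss(0.0, sigma)
    return math.exp(log_e + z - sigma ** 2 / 2.0)


def markov_reject(e_value, alpha):
    """Reject H0 when E >= 1/alpha; valid for any e-value by Markov's inequality."""
    return e_value >= 1.0 / alpha


rng = random.Random(0)
draws = [private_e_value(0.0, sensitivity=0.5, mu=1.0, rng=rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 1: the noise factor has mean one
```

Smaller μ (stronger privacy) means larger σ. That makes the noise factor more dispersed and costs power at a fixed threshold, which is the trade-off a sharper threshold aims to recover.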

2605.28952 2026-06-02 cs.CR cs.DS cs.IT cs.LG math.IT math.ST stat.TH

Optimal Rates for Differentially Private Hypothesis Testing with E-values

基于E值的差分隐私假设检验的最优速率

Ben Jacobsen, Tomas Gonzalez, Gavin Brown, Kassem Fawaz, Aaditya Ramdas

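AI summary: Studies the maximum achievable e-power in hypothesis testing with e-values under ε-differential privacy, and gives the optimal rate together with a matching algorithm.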

Comments
Corrected typos; updated references; generalized proposition 3.1
AI Chinese abstract

近年来,e值作为支持任意有效和自适应数据分析的灵活工具引起了广泛关注。假设检验是许多此类应用的核心,而这些应用通常涉及私有或敏感数据。在这项工作中,我们回答了一个简单但重要的问题:给定两个分布 $\mathbb{P}$ 和 $\mathbb{Q}$,当使用满足 $\varepsilon$-差分隐私的e值检验 $X\sim \mathbb{P}^n$ 对 $X\sim\mathbb{Q}^n$ 时,所能达到的最大e-power是多少?我们刻画了该问题的最优速率,并提供了一个精确匹配的算法。在顺序设置中,当观测值逐个到达且分析者选择何时停止时,我们给出了任何私有e过程的停止时间的匹配上下界。数值实验证实了我们算法的实用性,在多种顺序检验问题和隐私水平下,我们的算法所需数据少于最近提出的DP-SPRT。

English abstract

E-values have attracted considerable interest in recent years as flexible tools for enabling anytime-valid and adaptive data analysis. Hypothesis testing is at the core of many of these applications, which can often involve private or sensitive data. In this work, we answer a simple but important question: given two distributions $\mathbb{P}$ and $\mathbb{Q}$, what is the maximum achievable e-power when testing $X\sim \mathbb{P}^n$ against $X\sim\mathbb{Q}^n$ with e-values that satisfy $\varepsilon$-differential privacy? We characterize the optimal rate for this problem and provide an algorithm which matches it exactly. In the sequential setting, when observations arrive one-by-one and the analyst chooses when to halt, we give matching upper and lower bounds on the stopping times of any private e-process. Numerical experiments confirm the practicality of our algorithms, which require less data than the recently proposed DP-SPRT across a range of sequential testing problems and privacy levels.
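As a reference point, the sketch below computes the non-private e-power E_Q[log E] of the likelihood-ratio e-value, which equals n·KL(Q‖P). It checks this by exact enumeration for Bernoulli P and Q. It also confirms E_P[E] = 1, the property that makes E an e-value. It does not implement the paper's private algorithm. The distribution parameters are illustrative.

```python
import math


def kl_bernoulli(q, p):
    """KL(Bern(q) || Bern(p)) in nats, for 0 < p, q < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def log_lr(k, n, p, q):
    """log of prod_i q(x_i)/p(x_i) when k of the n observations equal 1."""
    return k * math.log(q / p) + (n - k) * math.log((1 - q) / (1 - p))


def binom_pmf(k, n, r):
    return math.comb(n, k) * r ** k * (1 - r) ** (n - k)


def e_power(n, p, q):
    """E_Q[log E] for the likelihood-ratio e-value, by exact summation."""
    return sum(binom_pmf(k, n, q) * log_lr(k, n, p, q) for k in range(n + 1))


def null_mean(n, p, q):
    """E_P[E]; equals 1, which is what makes E a valid e-value."""
    return sum(binom_pmf(k, n, p) * math.exp(log_lr(k, n, p, q)) for k in range(n + 1))


print(e_power(20, 0.5, 0.7), 20 * kl_bernoulli(0.7, 0.5))  # the two agree
```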

2605.24847 2026-06-02 stat.AP

Logistic regression is not enough: The need for Bayesian nonparametric modelling for causal inference using observational data, exemplified by the 'gateway' effect

逻辑回归是不够的:使用观测数据进行因果推断需要贝叶斯非参数建模——以“门户”效应为例

Floe Foxon, Raymond Niaura

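AI summary: Uses theoretical analysis and an empirical study to show the limitations of logistic regression models for causal inference. It models the "gateway" effect from e-cigarette use to smoking with Bayesian Additive Regression Trees (BART). The effect disappears under causal inference, which resolves the contradiction with population-level trends.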

Comments
51 pages, 5 figures
AI Chinese abstract

引言:通过电子烟使用到吸烟的所谓“门户”效应的视角,从理论和实证上解释了逻辑回归(LR)型模型在因果推断中的局限性。先前的研究报告称,在青少年纵向队列(LCs)的LR型模型中,基线电子烟使用使随访吸烟(二值化)的几率增加了四倍,因此电子烟使用的增加将抵消吸烟的下降。然而,美国人口层面的趋势显示,当电子烟使用增加时,吸烟下降加速至历史最低点,呈现明显的悖论。方法:使用贝叶斯加性回归树(BART)分析美国烟草与健康人口评估(美国)青少年第3至4波数据,对从未吸烟和曾经吸烟的受访者(组效应)建模基线电子烟使用(处理)与从基线到随访吸烟天数的变化(数值响应),并调整混杂风险因素(社会人口学、个体内、行为、同伴影响和家庭背景)。与LR型模型不同,BART提供非线性、非参数建模,包含反事实,并提供具有原则性不确定性估计的因果效应估计。结果:在曾经吸烟的青少年中,电子烟使用对吸烟的平均效应在临床和统计上均显著(吸烟天数减少2天[转移效应;与门户相反]),而在从未吸烟的青少年中,临床不显著(吸烟天数绝对变化小于1天[零效应])。结论:当使用因果推断技术分析LC数据时,门户效应消失,与人口层面趋势一致。这很可能解释了为什么先前LR型研究预测的门户效应并未在美国青少年吸烟下降中实现人口层面的逆转或意外减缓,从而解决了这一悖论。

English abstract

Introduction: Logistic regression (LR)-type model limitations for causal inference are explained theoretically and empirically through the lens of the purported gateway effect from e-cigarette use to smoking. Previous studies have reported that baseline e-cigarette use quadruples odds of follow-up smoking (binarized) in LR-type models of adolescent longitudinal cohorts (LCs), such that increased e-cigarette use would counteract smoking declines. However, US population-level trends show accelerated smoking declines to record-lows when e-cigarette use increased, presenting an apparent paradox. Methods: Population Assessment of Tobacco and Health (USA) Youth Waves 3 to 4 were analyzed with Bayesian Additive Regression Trees (BART) to model baseline e-cigarette use (treatment) and change in number of days smoking from baseline to follow-up (numerical response) among never- and ever-smoking respondents (group effects), adjusting for confounding risk factors (socio-demographic, intra-individual, behavioural, peer influence, and family background). Unlike LR-type models, BART provides nonlinear, nonparametric modelling with counterfactuals and provides causal effect estimates with principled uncertainty estimation. Results: The average effect of e-cigarette use on smoking was both clinically and statistically significant among ever-smoking adolescents (-2 days smoking [diversionary effect; opposite to gateway]) and was not clinically significant among never-smoking adolescents (<1-day absolute change in days smoking [null effect]). Conclusions: When LC data are analyzed with causal inference techniques, the gateway effect disappears, consistent with population-level trends. This likely explains why gateway effects predicted in previous LR-type studies have not materialized in a population-level reversal/unexpected slowing of the US adolescent smoking decline, resolving the paradox.
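A fitted outcome model yields counterfactual, group-specific effects of the kind reported above. For each respondent, predict the outcome with treatment set to 1 and to 0, then average the difference within each group. With BART one would repeat this for every posterior draw of the model to obtain uncertainty intervals. Here `toy_model` is a hypothetical deterministic function whose coefficients only mirror the reported sign pattern; it is not fitted to PATH data.

```python
def average_effect(model, rows):
    """Mean of model(x, treated=1) - model(x, treated=0) over the rows."""
    return sum(model(x, 1) - model(x, 0) for x in rows) / len(rows)


def effects_by_group(model, rows, group_of):
    groups = {}
    for x in rows:
        groups.setdefault(group_of(x), []).append(x)
    return {g: average_effect(model, xs) for g, xs in groups.items()}


def toy_model(x, treated):
    """Hypothetical outcome model: days smoking at follow-up.
    x = (ever_smoker, age); the treatment effect differs by smoking history."""
    ever_smoker, age = x
    effect = -2.0 if ever_smoker else 0.3
    return 5.0 * ever_smoker + 0.1 * age + effect * treated


rows = [(1, 15), (1, 16), (0, 14), (0, 17)]
print(effects_by_group(toy_model, rows, group_of=lambda x: "ever" if x[0] else "never"))
```

Unlike an odds ratio from a logistic model, this contrast is on the outcome scale (days smoking) and is averaged over the covariate distribution of each group.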

2605.24377 2026-06-02 stat.AP

Trustworthy AI/ML Regression and Unbiased Causal Inference for Real-World Data

可信赖的AI/ML回归与真实世界数据的无偏因果推断

Yifei Xu, Hwiyoung Lee, Zhenyao Ye, Yezhi Pan, Jingsong Zhou, Yun Yang, Chixiang Chen, Shuo Chen

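AI summary: Targets the systematic prediction bias of AI/ML regression under high-dimensional confounding in real-world data. It proposes an unbiased ML/AI regression-based causal inference framework that ensures unbiased estimation of the average treatment effect.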

Comments
17 pages, 4 figures, 4 tables; includes supplementary material
AI Chinese abstract

真实世界数据(RWD)凭借其大样本量和丰富的临床细节,为研究多样化复杂患者群体的治疗效果提供了随机对照试验(RCT)的引人注目的替代方案。然而,其观察性质引入了混杂因素,阻碍了直接的比较有效性研究。目标试验模拟利用RWD来估计RCT无法实现的人群规模和多样性下的平均处理效应(ATE),但其有效性关键取决于在高维混杂下无偏的ATE估计。许多因果推断流程通过机器学习和人工智能(ML/AI)结果回归来处理高维混杂。然而,常用的ML/AI回归模型表现出系统性预测偏差,预测结果向边际结果均值收缩。这种结构性偏差会传播到ATE估计中,且无法通过交叉拟合、集成方法或任何标准ML实践来纠正。在这项工作中,我们首先定量描述了ML/AI结果回归中的系统性预测偏差如何导致因果推断模型中有偏的ATE估计。我们进一步提出了一种无偏的ML/AI回归因果推断框架,以确保观察性研究中ATE的无偏估计。我们通过使用英国生物银行数据研究阿片类药物对慢性疼痛患者心血管健康的影响来展示我们的方法。

English abstract

Real-World Data (RWD), with its large sample sizes and rich clinical detail, offers a compelling alternative to randomized controlled trials (RCTs) for studying treatment effects in diverse and complex patient populations. However, its observational nature introduces confounding that prevents straightforward comparative effectiveness research. Target trial emulation leverages RWD to estimate average treatment effects (ATE) at the population scale and diversity that RCTs cannot achieve, yet its validity depends critically on unbiased ATE estimation under high-dimensional confounding. Many causal inference pipelines address high-dimensional confounding through machine learning and artificial intelligence (ML/AI) outcome regression. However, commonly used ML/AI regression models exhibit systematic prediction bias, with predicted outcomes shrinking toward the marginal outcome mean. This structural bias propagates into ATE estimation and cannot be corrected by cross-fitting, ensemble methods, or any standard ML practice. In this work, we first quantitatively characterize how systematic prediction bias in ML/AI outcome regression leads to biased ATE estimates in causal inference models. We further propose an unbiased ML/AI regression-based causal inference framework to ensure unbiased ATE estimation for observational studies. We demonstrate our approach by studying the effects of opioids on cardiovascular health in patients with chronic pain using UK Biobank data.
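The mechanism described above can be shown with a toy example. Suppose a regressor pulls every prediction toward a common center by a factor λ < 1. Then the plug-in (g-computation) ATE is attenuated to exactly λ·ATE, because the center cancels in the treated-minus-control contrast. The linear shrinkage model and all names are hypothetical. The sketch does not implement the paper's correction.

```python
def plug_in_ate(mu, xs):
    """g-computation ATE: average of mu(x, 1) - mu(x, 0)."""
    return sum(mu(x, 1) - mu(x, 0) for x in xs) / len(xs)


def shrunk(mu, lam, center):
    """Hypothetical biased regressor: predictions pulled toward `center` by factor lam."""
    return lambda x, a: center + lam * (mu(x, a) - center)


def true_mu(x, a):
    return 2.0 * x + 3.0 * a  # true ATE = 3


xs = [0.0, 1.0, 2.0, 3.0]
print(plug_in_ate(true_mu, xs), plug_in_ate(shrunk(true_mu, 0.8, center=4.0), xs))
```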

2605.21648 2026-06-02 cs.LG cond-mat.dis-nn cs.NE stat.ML

Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos

Dropout 普适性:混沌边缘的缩放定律与最优调度

Lucas Fernandez Sarmiento

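AI summary: Develops a mean-field theory of dropout as a perturbation of critical signal propagation. It finds that front-loaded dropout schedules at a fixed budget reduce the test loss of MLPs and Vision Transformers by 18-35%, and derives the associated scaling laws and universality classes.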

Comments
Accepted at the 43rd International Conference on Machine Learning (ICML 2026). 36 pages, 11 figures. Camera-ready version
AI Chinese abstract

我们发展了 dropout 作为混沌边缘临界信号传播扰动的平均场理论,并表明它预测了一个简单、零成本的实践改变:前端加载的 dropout 调度在固定预算下,在 MLP 和 Vision Transformer 中比恒定 dropout 降低测试损失 18-35%。理论机制是 dropout 移动了完美对齐固定点,使得即使在临界初始化下信息传播的深度尺度也变得有限。我们推导了相关衰减的临界和交叉缩放定律,并建立了平滑激活和带拐点的 ReLU 类激活构成不同的普适类,具有不同的临界指数以及在失谐和 dropout 强度下的通用两参数缩放塌缩。这种区别追溯到相关映射的解析结构:平滑激活在完美对齐附近允许泰勒展开,而带拐点的激活则出现具有普适非解析性的分支点。作为推论,该框架在固定预算下产生饱和的 dropout 轮廓;然后通过正则化可达性论证选择前端加载的调度,精度提升作为一致的次要效果。我们还讨论了相同的高斯核结构如何将理论从 MLP 扩展到 CNN 和残差架构。

English abstract

We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos, and show that it predicts a simple, no-cost change to standard practice: front-loaded dropout schedules cut test loss by 18-35% over constant dropout in MLPs and Vision Transformers at fixed budget. The theoretical mechanism is that dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization. We derive critical and crossover scaling laws for correlation decay and establish that smooth activations and kinked, ReLU-like activations constitute distinct universality classes, with different critical exponents and a universal two-parameter scaling collapse in detuning and dropout strength. The distinction traces to the analytic structure of the correlation map: smooth activations admit a Taylor expansion near perfect alignment, while kinked activations develop a branch point with universal non-analyticity. As a corollary, the framework yields saturated dropout profiles under fixed budget; a regularization-reach argument then selects front-loaded schedules, with accuracy gains as a consistent secondary effect. We also discuss how the same Gaussian-kernel structure extends the theory beyond MLPs toward CNNs and residual architectures.
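The sketch below shows what a saturated, front-loaded profile under a fixed budget can look like, next to a constant profile with the same budget. It makes two assumptions that the abstract does not state. The budget is the sum of dropout rates over positions (layers or training phases; the sketch is agnostic). Each rate is capped at `p_max`. Both `front_loaded_profile` and `constant_profile` are hypothetical helpers, not the paper's schedule.

```python
def front_loaded_profile(budget, n_positions, p_max):
    """Fill positions from the front, each saturated at p_max, until the budget is spent."""
    if budget > n_positions * p_max:
        raise ValueError("budget exceeds n_positions * p_max")
    profile, remaining = [], budget
    for _ in range(n_positions):
        p = min(p_max, remaining)
        profile.append(p)
        remaining -= p
    return profile


def constant_profile(budget, n_positions):
    """Same total budget spread evenly."""
    return [budget / n_positions] * n_positions


print(front_loaded_profile(1.0, 5, 0.3))  # approximately [0.3, 0.3, 0.3, 0.1, 0.0]
print(constant_profile(1.0, 5))           # [0.2, 0.2, 0.2, 0.2, 0.2]
```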

2605.20615 2026-06-02 stat.ME

Evaluating causal indirect effects when mediators are left-censored by assay limit of quantification

当中介变量因检测定量下限而左删失时评估因果间接效应

Cong Jiang, Michael D. Hughes, Nima S. Hejazi

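AI summary: For mediators left-censored by an assay's limit of quantification, proposes a semi-parametric framework that combines fractional imputation with a semi-parametric EM algorithm to estimate natural direct and indirect effects. Robust inference uses a data-adaptive bootstrap.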

AI Chinese abstract

因果中介分析对于解析研究性治疗和预防药物影响临床结果的机制至关重要。然而,生物中介变量的测量常因技术测量限制而受到左删失,最常见的是检测的定量下限。这种删失形式对因果中介估计量的识别和估计都构成严重挑战,尤其是当删失机制是确定性的且由此产生的缺失为非随机缺失或不可忽略时。受评估病毒RNA在加速COVID-19治疗和疫苗(ACTIV)-2平台试验中单克隆抗体疗法作用机制中作用的启发,我们开发了一个半参数框架,用于在感兴趣的中介变量部分受到这种左删失时估计自然直接和间接效应。我们提出的策略将分数插补与半参数EM算法相结合,以灵活估计因子化数据似然的关键组成部分。应用所提出的策略来规避左删失,我们讨论了直接和间接效应估计量的传统插件估计和渐近有效估计,并引入了一种数据自适应的$m$-out-of-$n$自助法,用于在插补程序下进行稳健推断。我们在数值实验中证明,我们的方法显著减少了偏差并允许可靠的推断。对ACTIV-2平台试验数据的应用证实,单克隆抗体疗法降低了因COVID-19住院和死亡的风险,同时表明病毒RNA的变化仅介导了总体治疗效果的一小部分。

English abstract

Causal mediation analysis is essential for disentangling the mechanisms by which investigational therapeutic and preventive agents impact clinical outcomes. However, the measurement of biological mediators is often subject to left-censoring by technical measurement limitations, most commonly an assay's limit of quantification. This form of censoring can pose severe challenges for both identification and estimation of causal mediation estimands, particularly when the censoring mechanism is deterministic and the resulting missingness is missing not at random (MNAR) or nonignorable. Motivated by the question of assessing the role of viral RNA in the action mechanism of monoclonal antibody therapies for COVID-19 in the Accelerating COVID-19 Therapeutics and Vaccine (ACTIV)-2 platform trial, we develop a semi-parametric framework for estimation of the natural direct and indirect effects when the mediator of interest is partially subject to this form of left-censoring. Our proposed strategy combines fractional imputation with a semi-parametric EM algorithm to flexibly estimate key components of the factorized data likelihood. Applying the proposed strategy to circumvent the left-censoring, we discuss both traditional plug-in and asymptotically efficient estimators of the direct and indirect effect estimands, introducing a data-adaptive $m$-out-of-$n$ bootstrap for robust inference under the imputation procedure. We demonstrate in numerical experiments that our approach significantly reduces bias and allows for reliable inference. An application to data from the ACTIV-2 platform trial confirms that monoclonal antibody therapies reduce the risk of hospitalization and death due to COVID-19, while suggesting that changes in viral RNA mediate only a modest proportion of the overall treatment effect.
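The sketch below illustrates the fractional-imputation step for a left-censored mediator. Each value reported as below the limit of quantification is replaced by several weighted copies drawn from the mediator's distribution truncated above at the LOQ. Observed values keep weight 1. In the paper that conditional distribution comes from a semi-parametric EM fit given covariates. Here `mean` and `sd` are fixed hypothetical values, and the draws are deterministic quantiles, a simple variant of fractional imputation. The weighted rows would then feed a downstream mediation estimator.

```python
from statistics import NormalDist


def fractional_draws(loq, mean, sd, n_draws):
    """Deterministic quantile draws from Normal(mean, sd) truncated above at loq,
    each carrying weight 1 / n_draws."""
    dist = NormalDist(mean, sd)
    upper = dist.cdf(loq)  # probability mass below the limit of quantification
    values = [dist.inv_cdf(upper * (k + 0.5) / n_draws) for k in range(n_draws)]
    return values, [1.0 / n_draws] * n_draws


def expand(mediators, loq, mean, sd, n_draws=5):
    """Rows (subject_id, mediator_value, weight); None marks a value below the LOQ."""
    rows = []
    for i, m in enumerate(mediators):
        if m is None:
            values, weights = fractional_draws(loq, mean, sd, n_draws)
            rows.extend((i, v, w) for v, w in zip(values, weights))
        else:
            rows.append((i, m, 1.0))
    return rows


# Hypothetical log10 viral RNA values with LOQ = 2.0; mean and sd stand in for an EM fit.
rows = expand([2.5, None, 3.1], loq=2.0, mean=2.8, sd=0.6)
print(rows)
```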

2605.18694 2026-06-02 math.OC cs.LG stat.ML

Can Adaptive Gradient Methods Converge under Heavy-Tailed Noise? A Case Study of AdaGrad

自适应梯度方法能否在重尾噪声下收敛?以 AdaGrad 为例

Zijian Liu

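AI summary: Studies the convergence of AdaGrad under heavy-tailed gradient noise. It proves the first convergence rate for non-convex optimization when the tail index p satisfies 4/3 < p ≤ 2, without prior knowledge of p, and gives an algorithm-dependent lower bound.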

Comments
ICML 2026. v2: simplification of the proof
AI Chinese abstract

现代机器学习中的许多任务在优化过程中观察到涉及重尾梯度噪声。为了应对这一现实且具有挑战性的场景,引入了新的机制,如梯度裁剪和梯度归一化,以确保一阶算法的收敛性。然而,自适应梯度方法,一类著名的现代优化器,包括流行的 $\mathtt{Adam}$ 和 $\mathtt{AdamW}$,即使没有上述任何额外操作,通常也表现良好。因此,自然要问:自适应梯度方法能否在重尾噪声下收敛而无需任何算法更改?在这项工作中,我们通过研究一个特例 $\mathtt{AdaGrad}$(自适应梯度方法的起源)迈出了回答这个问题的第一步。我们首次证明了当尾指数 $p$ 满足 $4/3 < p \leq 2$ 时,$\mathtt{AdaGrad}$ 在非凸优化中的可证明收敛率。值得注意的是,这一结果无需任何关于 $p$ 的先验知识,因此对尾指数是自适应的。此外,我们开发了一个算法相关的下界,表明现有的重尾优化极小极大速率无法由 $\mathtt{AdaGrad}$ 达到。最后,我们考虑了 $\mathtt{AdaGrad}\text{-}\mathtt{Norm}$(理论研究中 $\mathtt{AdaGrad}$ 的一个流行变体),并证明了在额外温和假设下,对于任何 $1 < p \leq 2$ 都成立的改进速率。

English abstract

Many tasks in modern machine learning are observed to involve heavy-tailed gradient noise during the optimization process. To manage this realistic and challenging setting, new mechanisms, such as gradient clipping and gradient normalization, have been introduced to ensure the convergence of first-order algorithms. However, adaptive gradient methods, a famous class of modern optimizers that includes popular $\mathtt{Adam}$ and $\mathtt{AdamW}$, often perform well even without any extra operations mentioned above. It is therefore natural to ask whether adaptive gradient methods can converge under heavy-tailed noise without any algorithmic changes. In this work, we take the first step toward answering this question by investigating a special case, $\mathtt{AdaGrad}$, the origin of adaptive gradient methods. We provide the first provable convergence rate for $\mathtt{AdaGrad}$ in non-convex optimization when the tail index $p$ satisfies $4/3<p\leq2$. Notably, this result is achieved without requiring any prior knowledge of $p$ and is hence adaptive to the tail index. In addition, we develop an algorithm-dependent lower bound, suggesting that the existing minimax rate for heavy-tailed optimization is not attainable by $\mathtt{AdaGrad}$. Lastly, we consider $\mathtt{AdaGrad}\text{-}\mathtt{Norm}$, a popular variant of $\mathtt{AdaGrad}$ in theoretical studies, and show an improved rate that holds for any $1<p\leq2$ under an extra mild assumption.
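The sketch below shows the AdaGrad-Norm variant mentioned above. It uses a single scalar step size η/b_t, where b_t² accumulates the squared norms of all stochastic gradients seen so far, and it applies no clipping or normalization. The heavy-tailed noise generator is a hypothetical choice: symmetric Pareto-type noise with shape 1.5, which has finite moments only of order below 1.5 and hence infinite variance. The toy run demonstrates only the update rule. It is not evidence for the paper's rates.

```python
import math
import random


def adagrad_norm(grad, x0, eta, b0, steps, noise=None):
    """AdaGrad-Norm: x <- x - eta * g / b_t, with b_t^2 = b_{t-1}^2 + ||g_t||^2."""
    x = list(x0)
    b_sq = b0 ** 2
    for _ in range(steps):
        g = grad(x)
        if noise is not None:
            g = [gi + noise() for gi in g]
        b_sq += sum(gi * gi for gi in g)
        b = math.sqrt(b_sq)
        x = [xi - eta * gi / b for xi, gi in zip(x, g)]
    return x


def quad_grad(x):
    return list(x)  # gradient of f(x) = ||x||^2 / 2


rng = random.Random(1)


def heavy_tailed_noise(shape=1.5):
    """Symmetric Pareto-type noise: moments finite only below `shape` (infinite variance here)."""
    return rng.choice((-1.0, 1.0)) * (rng.paretovariate(shape) - 1.0)


x_clean = adagrad_norm(quad_grad, [5.0, -3.0], eta=1.0, b0=0.1, steps=500)
x_noisy = adagrad_norm(quad_grad, [5.0, -3.0], eta=1.0, b0=0.1, steps=2000, noise=heavy_tailed_noise)
print(x_clean, x_noisy)
```

Large noise spikes inflate b_t permanently and shrink all later steps. This built-in damping is one intuition for why adaptive methods can cope with heavy tails without explicit clipping.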

2510.12999 2026-06-02 cs.LG stat.ML

AMORE: Adaptive Multi-Output Operator Network for Stiff Chemical Kinetics

AMORE: 自适应多输出算子网络用于刚性化学动力学

Kamaljyoti Nath, Additi Pandey, Bryan T. Susi, Hessam Babaee, George Em Karniadakis

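AI summary: To address the high computational cost of time integration in stiff chemical kinetics, proposes the AMORE framework. It uses adaptive loss functions and an invertible map to make multi-output operator learning reliable, and is validated on syngas and GRI-Mech 3.0.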

AI Chinese abstract

刚性系统的时间积分是燃烧、高超声速及其他反应输运系统中计算成本的主要来源。这种刚性会引入远小于其他物理过程的时间尺度,导致显式格式需要极小的步长或隐式方法计算量大。因此,缓解刚性挑战的策略至关重要。虽然神经算子(DeepONet)可以作为刚性动力学的替代模型,但需要可靠的算子学习策略来适当考虑输出变量和样本之间的误差差异。本文开发了AMORE(自适应多输出算子网络),一个包含能够预测多个输出的算子和确保可靠算子学习的自适应损失函数的框架。该算子从给定初始条件预测所有热化学状态。我们提出了两种自适应损失函数,考虑每个状态变量和样本的误差来惩罚损失函数。我们设计了主干网络以自动满足单位分解。为了精确满足质量分数总和为1的约束,我们提出了一个可逆解析映射,将n维物种质量分数向量变换到(n-1)维空间。我们将所提出的自适应损失函数扩展到具有多输出的DeepONet的两步训练中的主干和分支训练。我们还通过预测质量分数上的softmax函数精确实现了另一个质量分数总和为1的约束。我们通过两个示例证明了模型的有效性和适用性:合成气(12个状态)、GRI-Mech 3.0(54个中的24个活跃状态)。所提出的DeepONet将成为未来CFD研究加速湍流燃烧模拟的骨干。AMORE是一个通用框架,本文也将其应用于FNO。

English abstract

Time integration of stiff systems is a primary source of computational cost in combustion, hypersonics, and other reactive transport systems. This stiffness can introduce time scales significantly smaller than those associated with other physical processes, requiring extremely small time steps in explicit schemes or computationally intensive implicit methods. Consequently, strategies to alleviate challenges posed by stiffness are important. While neural operators (DeepONets) can act as surrogates for stiff kinetics, a reliable operator learning strategy is required to appropriately account for differences in error between output variables and samples. Here, we develop AMORE, Adaptive Multi-Output Operator Network, a framework comprising an operator capable of predicting multiple outputs and adaptive loss functions ensuring reliable operator learning. The operator predicts all thermochemical states from given initial conditions. We propose two adaptive loss functions within the framework, which use the error of each state variable and of each sample to penalize the loss function. We designed the trunk to automatically satisfy the partition of unity. To enforce the unity mass-fraction constraint exactly, we propose an invertible analytical map that transforms the $n$-dimensional species mass-fraction vector into an ($n-1$)-dimensional space. We extend the proposed adaptive loss functions to trunk and branch training in the two-step training of DeepONet with multiple outputs. As an alternative, we also enforce the unity mass-fraction constraint exactly by applying a softmax function to the predicted mass fractions. We demonstrate the efficacy and applicability of our models through two examples: syngas (12 states) and GRI-Mech 3.0 (24 active states out of 54). The proposed DeepONet will be a backbone for future CFD studies to accelerate turbulent combustion simulations. AMORE is a general framework, and here we also demonstrate it for FNO.
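The abstract does not give the formula of the invertible n to (n-1) mass-fraction map. One standard invertible map between strictly positive fractions that sum to one and R^(n-1) is the additive log-ratio transform. Its inverse is a softmax with the last logit fixed at zero, so any output automatically sums to one. The sketch shows it as an assumed stand-in, not as AMORE's own map. Species with zero mass fraction would need a small floor before the log.

```python
import math


def to_unconstrained(y):
    """Additive log-ratio: n positive mass fractions summing to 1 -> n-1 reals."""
    last = y[-1]
    return [math.log(v / last) for v in y[:-1]]


def to_simplex(z):
    """Inverse map: a softmax with the last logit fixed at 0 (outputs sum to 1)."""
    logits = list(z) + [0.0]
    top = max(logits)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in logits]
    s = sum(e)
    return [v / s for v in e]


y = [0.2, 0.3, 0.5]
z = to_unconstrained(y)
print(z, to_simplex(z))  # the round trip recovers y up to rounding
```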

2605.14432 2026-06-02 quant-ph math.ST stat.ME stat.TH

Singular Asymptotics of SPADE in Quantum Source Discrimination

量子源鉴别中SPADE的奇异渐近性

Natsuki Kariya

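AI summary: Studies the finite-photon behavior of far-field discrimination with spatial-mode demultiplexing (SPADE) in the singular regime of weak, closely spaced emitters. It uses singular learning theory to analyze Bayes free-energy asymptotics and Neyman-Pearson test performance in the aligned and misaligned cases.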

Comments
13 pages, 2 figures
AI Chinese abstract

我们研究在弱且间距小的发射源的奇异区域中,一个和两个非相干点源之间的远场鉴别。在理想对齐下,空间模式解复用(SPADE)达到了量子最优的大样本Stein指数,但靠近单源边界的有限光子行为以及现实缺陷的影响仍不太清楚。使用奇异学习理论,我们分析了对齐和未对齐问题。在对齐高斯情况下,我们推导了直接成像和SPADE的zeta函数极点,表明两者共享相同的实对数规范阈值$λ=1/2$但多重性不同,并得到了相应的贝叶斯自由能渐近性。这揭示了在对齐局部先验加权区域中,SPADE具有普遍的子领先优势。在未对齐设置中,我们研究了一个物理动机的二进制SPADE简化,它保留了靠近对齐时的完整领先$O(s^2)$泄漏对比,而详细的高阶模式重新分布的修正仅在$O(s^4)$阶出现。我们表明,未对齐的二进制SPADE和直接成像在不同的内在尺度上获得非平凡的局部幂,分别为$s=O(n^{-1/4})$和$s=O(n^{-1/2})$。然而,在常见物理条件下的有限$n$ Neyman-Pearson比较表明,直接成像在绘制的网格上更强,且未对齐的二进制SPADE表现出一个精确的盲分离$s^\ast=2θ$,其幂降至$α$。这些结果将模型奇异性识别为有限光子量子鉴别的结构组织原则,并阐明了理想对齐的SPADE基准如何在未对齐情况下无法转化为有限$n$优势。

English abstract

We study far-field discrimination between one and two incoherent point sources in the singular regime of weak and closely spaced emitters. Under ideal alignment, spatial-mode demultiplexing (SPADE) attains the quantum-optimal large-sample Stein exponent, but the finite-photon behavior near the one-source boundary and the effect of realistic imperfections remain less understood. Using singular learning theory, we analyze both the aligned and misaligned problems. In the aligned Gaussian case, we derive the zeta-function poles for direct imaging and SPADE, show that both share the same real log canonical threshold $λ=1/2$ but differ in multiplicity, and obtain the corresponding Bayes free-energy asymptotics. This yields a universal subleading advantage of aligned SPADE in the local prior-weighted regime. In the misaligned setting, we study a physically motivated binary-SPADE reduction that retains the full leading $O(s^2)$ leakage contrast near alignment, with corrections from the detailed higher-mode redistribution entering only at $O(s^4)$. We show that misaligned binary-SPADE and direct imaging acquire nontrivial local power on different intrinsic scales, $s=O(n^{-1/4})$ and $s=O(n^{-1/2})$, respectively. However, finite-$n$ Neyman--Pearson comparisons under common physical conditions reveal that direct imaging is stronger on the plotted grids and that misaligned binary-SPADE exhibits an exact blind separation $s^\ast=2θ$, where its power collapses to $α$. These results identify model singularity as a structural organizing principle for finite-photon quantum discrimination and clarify how ideal aligned SPADE benchmarks can fail to translate into finite-$n$ advantages under misalignment.

2605.13430 2026-06-02 stat.ME cs.AI cs.LG

Towards a holistic understanding of Selection Bias for Causal Effect Identification

Yiwen Qiu, Filip Kovačević, Shimeng Huang, Peter Spirtes, Francesco Locatello

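AI summary: Studies how weak assumptions can characterize the propensity score and the selection probability when observational studies suffer from selection bias. Gives necessary and sufficient conditions for identifiability of the average treatment effect, extending existing graphical identification criteria.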

Comments
9 pages for the main text, ICML 2026

English abstract

Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit "healthy volunteer bias" when respondents are healthier and of higher socio-economic status than the population they are meant to represent. Recovering causal effects from such sub-populations is an important problem in causal inference, as estimating average treatment effects (ATE) from selected populations can result in a severely biased estimate of the ATE for the whole population. In this paper, we investigate the identifiability of the ATE under selection bias. We provide necessary and sufficient conditions for ATE identifiability, leveraging weak assumptions on probability classes to characterize the propensity score and the selection probability. Compared to previous works, our results extend existing graphical identifiability criteria and offer a more comprehensive understanding of causal effect identification with strictly weaker conditions in the presence of selection bias.

2605.13397 2026-06-02 stat.ME stat.CO

Stabilised weighted data subsampling for accelerated inference in models with recursive likelihoods

Matias Quiroz, Aishwarya Bhaskaran, Zixuan Wang, Thomas Goodwin

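AI summary: Addresses the high computational cost of models with recursively defined likelihoods. Proposes a stabilised weighted subsampling method based on an unbiased log-likelihood estimator. Assigning higher sampling probabilities to early observations reduces the recursion depth, and a theoretically supported stabilisation framework prevents variance inflation. The result is substantial speed-ups in conditional volatility models while inferential accuracy is preserved.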

Comments
Version 2: Revised and shortened for journal submission. Some technical material has been moved from the main paper to the appendix and supplementary material. Minor improvements to exposition and presentation. No substantive changes to the methodology, theoretical results, or conclusions. This version includes the main manuscript, appendix, and supplementary material in a single file

English abstract

Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased estimator of the log-likelihood. By assigning higher sampling probabilities to early observations, the method reduces the effective depth of recursive likelihood evaluations and hence computational cost. However, sampling probabilities that decay too slowly yield limited savings, while overly aggressive decay can substantially inflate estimator variance. We develop a stabilisation framework, supported by theory, that restricts the decay to avoid both computational and variance pathologies through principled hyperparameter tuning. We also derive an unbiased subsampling estimator of the log-likelihood gradient, enabling gradient-based inference. The methodology can be embedded within a range of inferential frameworks. We illustrate its use in variational Bayes and subsampling Markov chain Monte Carlo for conditional volatility models, including leverage effects. Empirical results show substantial computational speed-ups relative to full-data methods while maintaining inferential accuracy. We also compare with recent stochastic gradient MCMC and divide-and-conquer MCMC methods for temporally dependent data, observing favourable empirical performance.
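
The sketch below illustrates only the unbiasedness idea. It draws indices with replacement from sampling probabilities that decay geometrically in time, so early observations are favoured. Each drawn term is then reweighted by the inverse of its probability, which is a Hansen-Hurwitz estimator. The decay parameter `rho` and all function names are hypothetical. The per-observation terms are precomputed here, so the sketch shows no speed-up. The paper's stabilisation framework and its gradient estimator are not reproduced.

```python
import math
import random


def decaying_probs(n, rho):
    """Sampling probabilities proportional to rho**t, t = 0..n-1; rho < 1 favours early observations."""
    weights = [rho ** t for t in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def subsample_loglik(loglik_terms, probs, m, rng):
    """Hansen-Hurwitz estimate of sum(loglik_terms) from m draws with replacement.

    Each drawn term l_i is reweighted by 1 / (m * p_i), so the estimate is unbiased
    for any strictly positive probs. With decaying probs, most drawn indices are small,
    which is what would keep a recursive likelihood evaluation shallow.
    """
    idx = rng.choices(range(len(loglik_terms)), weights=probs, k=m)
    return sum(loglik_terms[i] / probs[i] for i in idx) / m


def exact_expectation(loglik_terms, probs):
    """Exact expectation of a single-draw estimate: sum_i p_i * (l_i / p_i)."""
    return sum(p * (l / p) for p, l in zip(probs, loglik_terms))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    # Hypothetical per-observation log-likelihood contributions (Gaussian-like terms).
    terms = [-0.5 * math.log(2 * math.pi) - 0.5 * (0.01 * t) ** 2 for t in range(n)]
    probs = decaying_probs(n, rho=0.999)
    print("full-data log-likelihood:", sum(terms))
    print("subsampled estimate:     ", subsample_loglik(terms, probs, m=200, rng=rng))
```

Faster decay (smaller `rho`) concentrates draws on early observations but inflates the weights `1 / p_i` of late ones. This is the computation-versus-variance tension that the abstract's stabilisation framework addresses.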

2605.13203 2026-06-02 stat.ME

Double Descent and Ensemble Emergence in Model Averaging Prediction

Ke Chen, Dandan Jiang, Xinyu Zhang

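AI summary: Uses random matrix theory to derive the limiting risk of model averaging in high-dimensional linear regression. Shows that simple weighting leads to double descent and variance explosion, whereas strategic weighting triggers ensemble emergence that suppresses the risk peak. On this basis, proposes the Large Model Averaging (LaMA) method.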

English abstract

This paper investigates the predictive performance of model averaging in high-dimensional linear regression where the number of regressors is comparable to the sample size. Leveraging tools from random matrix theory, we derive the exact limiting out-of-sample risk under a nested model setting and comprehensively characterize the risk landscape. This limiting risk helps to reveal two phenomena: simple weighting inherits the double descent trajectory and its associated variance explosion near the interpolation boundary; strategic weighting triggers an ensemble emergence that suppresses the localized risk surge and yields a globally flat risk surface. Building on this limiting risk, we also propose the Large Model Averaging (LaMA) method, in which we consider the discrepancy between in-sample and out-of-sample risks in the high-dimensional regime. Numerical studies and real data applications confirm that LaMA achieves superior predictive accuracy in high-dimensional environments.

2605.12895 2026-06-02 cs.LG cs.AI cs.CY stat.AP

RISED: A Pre-Deployment Evaluation Framework for High-Stakes AI Decision-Support Systems, with Application to Healthcare

Rohith Reddy Bellibatlu, Manpreet Singh, Yash Jajoo, Shyamal Lakhanpal, Abhishek Israni

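AI summary: Proposes the RISED framework, which evaluates high-stakes AI decision-support systems along five dimensions. It uses BCa bootstrap confidence intervals, literature-based thresholds, and Holm-Bonferroni-corrected PASS/FAIL/INCONCLUSIVE verdicts. On healthcare and other datasets, it uncovers failure modes that AUROC cannot reveal.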

Comments
39 pages, 7 figures, 15 tables. Code at https://github.com/rohithreddybc/rised-healthcare-eval and dataset at https://doi.org/10.57967/hf/8734 (Hugging Face). To be submitted to Expert Systems with Applications (Elsevier)

English abstract

Clinical decision-support systems are expert systems whose recommendations clinicians act on directly, yet they are usually cleared on one aggregate accuracy number from a held-out test set. That number says nothing about input reliability under encoding shifts, subgroup gaps, threshold sensitivity, or operational feasibility. We present RISED, a pre-deployment evaluation framework operationalising five dimensions (Reliability, Inclusivity, Sensitivity, Equity, Deployability) through BCa bootstrap 95% confidence intervals, literature-grounded thresholds, and Holm-Bonferroni-corrected PASS / FAIL / INCONCLUSIVE verdicts; Equity is a proxy-dependence diagnostic rather than a gating test. Applied to seven cohorts spanning 35 years (n from 303 to 99,492), RISED surfaces failures invisible to AUROC: on Diabetes 130, Reliability passes by three orders of magnitude (PSS = 0.0004) while Inclusivity (AUC parity gap = 0.262) and Sensitivity (max threshold-flip rate 49.1%) fail decisively; both NHIS cohorts reproduce this. NHANES 2021-2023, with a complete feature profile, achieves INCONCLUSIVE verdicts; BRFSS 2024 produces the suite's most severe Sensitivity failure (max threshold-flip rate 64.2%) after instrument rotation removed hypertension and cholesterol. The pattern recurs on credit- and income-prediction cohorts, confirming domain-agnosticity; a multi-model check shows the failures are data-driven, not model-specific. RISED ships as an open-source Python package complementing TRIPOD+AI, FUTURE-AI, and Fairlearn with the structured numerical evidence those standards require but do not prescribe.
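
The Holm-Bonferroni correction named above is a standard step-down procedure, and the sketch below implements it. It also adds a hypothetical interval-versus-threshold rule that yields PASS / FAIL / INCONCLUSIVE for a "lower is better" metric. The verdict rule and all names are illustrative assumptions, not RISED's API or its actual decision logic. That logic also involves BCa bootstrap intervals, which are not shown here.

```python
def holm_adjusted(pvalues):
    """Holm-Bonferroni step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by the number of hypotheses still in play.
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # adjusted p-values must be non-decreasing
        adjusted[i] = running_max
    return adjusted


def holm_reject(pvalues, alpha=0.05):
    """Reject H_i when its Holm-adjusted p-value is at most alpha (controls the FWER at alpha)."""
    return [p <= alpha for p in holm_adjusted(pvalues)]


def verdict(ci_low, ci_high, threshold):
    """Hypothetical three-way verdict for a 'lower is better' metric such as a subgroup AUC gap."""
    if ci_high < threshold:
        return "PASS"          # whole confidence interval below the threshold
    if ci_low > threshold:
        return "FAIL"          # whole confidence interval above the threshold
    return "INCONCLUSIVE"      # interval straddles the threshold
```

For example, `holm_reject([0.01, 0.04, 0.03])` rejects only the first hypothesis at the 5% level. An unadjusted test would reject all three.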

2605.12768 2026-06-02 stat.ML cs.LG

ISOMORPH: A Supply Chain Digital Twin for Simulation, Dataset Generation, and Forecasting Benchmarks

Zhizhen Zhang, Hyemin Gu, Benjamin J. Zhang, Daniel Elenius, Michael Tyrrell, Theo J. Bourdais, Houman Owhadi, Markos A. Katsoulakis, Tuhin Sahai

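AI summary: Presents ISOMORPH, the first public digital twin of a multi-echelon logistics network. It uses configurable parameters and a modular topology to generate datasets with dynamics such as the bullwhip effect. The paper also evaluates the zero-shot forecasting performance of foundation models on these datasets.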

English abstract

Open time-series forecasting (TSF) benchmarks cover retail, energy, weather, and traffic, but supply-chain logistics remains underserved. We introduce ISOMORPH, the first public digital twin of a multi-echelon logistics network with interpretable, user-configurable parameters and modular topology, demand, and control rules. The simulator advances a directed routing graph in discrete time: demand is served from inventory or recorded as backlog and triggers replenishment throughout the network. The state tracks inventory, outstanding orders, in-transit shipments, and a smoothed demand estimate, yielding Markovian dynamics on a tractable state space. The released data reproduces the bullwhip effect at empirically consistent magnitudes, while three conservation laws provide verification tools for simulator extensions. We release datasets at two catalogue scales ($C=50$ and $C=200$), six scenario sweeps, and 20 Latin-hypercube perturbations. These datasets exhibit dynamics largely absent from fixed TSF benchmarks, including variance amplification, cascading bottlenecks, regime shifts, and cross-channel coupling through shared macro shocks. Zero-shot evaluation of four foundation models (Chronos, Moirai, TimesFM, and Lag-Llama) yields MASE values exceeding public GIFT-Eval references at low-to-moderate horizons, supporting incorporation into existing benchmark suites. The same models provide forecast confidence bands through Latin-hypercube perturbations of demand-side parameters, enabling forward uncertainty quantification (UQ) unavailable on standard TSF datasets and demonstrating that foundation models can serve as fast surrogates for digital-twin-based UQ. Code (MIT): https://github.com/tuhinsahai/ISOMORPH. Interactive demo: https://huggingface.co/spaces/HyeminGu/ISOMORPH-demo.
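
Two quantities in this abstract are easy to make concrete. One is MASE, the error metric reported for the zero-shot evaluation. The other is the bullwhip effect: orders placed upstream vary more than the demand that triggers them. The sketch below uses the standard MASE definition (forecast MAE scaled by the in-sample MAE of a seasonal naive forecast). It pairs this with a hypothetical single-echelon order-up-to rule driven by an exponentially smoothed demand estimate. It is not ISOMORPH's simulator, control rules, or API.

```python
from statistics import mean, pvariance


def mase(train, actual, forecast, season=1):
    """Mean absolute scaled error: forecast MAE over the in-sample MAE of the seasonal naive forecast."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    scale = mean(abs(train[t] - train[t - season]) for t in range(season, len(train)))
    return mean(abs(a - f) for a, f in zip(actual, forecast)) / scale


def bullwhip_ratio(orders, demand):
    """Variance amplification: Var(orders placed upstream) / Var(demand faced)."""
    return pvariance(orders) / pvariance(demand)


def order_up_to_orders(demand, lead_time=2, alpha=0.3):
    """Toy single-echelon order-up-to policy with an exponentially smoothed demand estimate.

    Base-stock level S_t = lead_time * forecast_t and order_t = d_t + (S_t - S_{t-1}).
    Orders may go negative (treated as returns); this is not ISOMORPH's control rule.
    """
    forecast = demand[0]
    prev_level = lead_time * forecast
    orders = []
    for d in demand:
        forecast += alpha * (d - forecast)
        level = lead_time * forecast
        orders.append(d + level - prev_level)
        prev_level = level
    return orders
```

Feeding i.i.d. demand through `order_up_to_orders` gives a `bullwhip_ratio` above 1. The reason is that every forecast revision is passed upstream, amplified by the lead time.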

2605.03781 2026-06-02 math.ST stat.TH

Empirical Bernstein Confidence Intervals for Kernel Smoothers: A Safe and Sharp Way to Exhaust Assumed Smoothness

Zihao Yuan, Sven Klaassen

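AI summary: Proposes empirical Bernstein confidence intervals (EBCIs), which replace standard-normal critical-value calibration with empirical Bernstein tail control. Stochastic variability is controlled on the original estimation scale, which avoids normalized-bias amplification and achieves an optimal trade-off between coverage accuracy and interval length.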

Comments
This is the fourth version of the working paper

English abstract

Using standard-normal critical-value calibration (SNC) to construct a kernel-smoother-based confidence interval faces a fundamental challenge: the normalization makes a small estimation bias become a non-negligible inferential bias. This paper takes a different route by replacing the SNC control with empirical Bernstein tail control. The resulting confidence intervals control stochastic variability on the original estimation scale, so that deterministic smoothing bias enters the radius as an estimation-scale approximation error rather than as a normalized inferential bias. We develop this idea for pointwise inference on univariate density and regression functions. The proposed empirical Bernstein confidence intervals (EBCIs) combine empirical Bernstein calibration with bias-aware fixed-length radius construction under a local Taylor-remainder class. Uniformly over functions with $S$-th order local smoothness, both one-sided and two-sided intervals attain the nominal coverage level up to a remainder of order $n^{-\frac{2S}{2S+1}}$ or an exponential remainder in bounded or sub-Gaussian settings. Their widths shrink at the minimax rate $n^{-\frac{S}{2S+1}}$. Moreover, in the small-$α$ regime, the EBCI radius is first-order aligned with the radii of bias-aware-type fixed-length confidence intervals. For one-sided inference, the leading term coincides, while, for two-sided inference, the only difference is the usual replacement of $\log(\frac{1}{α})$ by $\log(\frac{2}{α})$. Thus, EBCI safely converts correctly specified smoothness into both coverage accuracy and interval-length efficiency. The contribution is not a new bias-control approach, but a new calibration method that can inherit existing ideas such as bias-aware inference (BA) and robust bias correction (RBC) while avoiding the normalized-bias inflation induced by SNC.
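
The empirical Bernstein idea can be illustrated on the simplest case, the mean of bounded i.i.d. data. The sketch uses the well-known Maurer-Pontil form of the empirical Bernstein bound. It does not reproduce the paper's kernel-weighted construction. `bias_bound` is a hypothetical stand-in for a smoothness-based bias bound. Adding it to the radius on the original scale is our reading of "bias enters the radius as an estimation-scale approximation error", not the paper's exact formula.

```python
import math
from statistics import mean, variance


def eb_radius(values, delta, lower, upper):
    """One-sided empirical Bernstein radius (Maurer-Pontil form) for i.i.d. values in [lower, upper].

    With probability at least 1 - delta: true_mean <= sample_mean + radius.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    log_term = math.log(2.0 / delta)
    v = variance(values)  # unbiased sample variance (n - 1 denominator)
    return math.sqrt(2.0 * v * log_term / n) + 7.0 * (upper - lower) * log_term / (3.0 * (n - 1))


def eb_interval(values, alpha, lower, upper, bias_bound=0.0):
    """Two-sided interval: union bound with alpha / 2 per tail, widened by a worst-case bias bound.

    bias_bound is a hypothetical stand-in for a smoothness-based bound on the smoothing bias.
    """
    center = mean(values)
    r = eb_radius(values, alpha / 2.0, lower, upper) + bias_bound
    return center - r, center + r
```

Unlike a normal-quantile interval, the radius is computed directly on the data scale. It adapts to the observed variance, and the bias term simply widens it rather than shifting a standardized statistic.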

2605.07818 2026-06-02 stat.ML math.ST stat.TH

Expectation-Maximization as a Spectrally Governed Relaxation Flow

Qiao Wang

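AI summary: Makes explicit the latent-variable operator that links EM's global monotonicity with its local linear convergence. From this operator, proposes two parameter-free, spectral-gap-based accelerated EM algorithms, which achieve speedups of more than 8× on Gaussian mixture models.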

Comments
This is an updated version in which the G-Accelerator is enhanced by a novel adaptive geometry method

English abstract

The expectation-maximization (EM) algorithm combines global monotonicity, local linear convergence, and strong practical robustness, but these features are usually analyzed separately. Global descent is nonlinear, whereas local convergence is governed by the spectrum of the linearized EM map. How these two levels fit into a single dynamical picture has remained less transparent. We make explicit the latent-variable operator that connects them. Along the EM trajectory, the likelihood increment admits a global energy decomposition in terms of posterior-relative entropy. Linearization at a nondegenerate maximizer $θ^\ast$ reveals the local operator \[ \mathcal G_{θ^\ast}=I-DT(θ^\ast), \] which coincides with both the missing-information ratio and the information-geometric Hessian of the observed likelihood. From this operator we derive two acceleration strategies. The G-Accelerator uses the spectral gap to obtain an optimal Nesterov-type momentum $β^* = (1-\sqrt{λ_*})/(1+\sqrt{λ_*})$. The Geo-Adaptive accelerator extends the geometric EM framework of Zhou, Alexander & Lange by replacing their fixed correction strength $γ=8$ with the adaptive rule $γ_k = 1/\hatλ_k$, where $\hatλ_k$ is estimated online from the parameter trajectory. Both methods are parameter-free; Geo-Adaptive achieves dramatic acceleration precisely when the spectral gap is smallest. Numerical experiments on Gaussian mixtures demonstrate that both accelerators consistently outperform standard EM and fixed-$γ$ DCC-EM, with Geo-Adaptive attaining speedups exceeding $8\times$ in the most challenging regimes.
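
The G-Accelerator's momentum formula can be checked on a toy problem. Near $θ^\ast$, EM behaves like the linear iteration x <- x - G(x - θ*). The sketch takes G diagonal with eigenvalues in (0, 1]; the smallest eigenvalue plays the role of the spectral gap λ*. It then compares plain iteration with Nesterov-style momentum β*. `estimate_gap` is a hypothetical contraction-ratio estimator, included only to suggest how $\hatλ_k$ (and hence $γ_k = 1/\hatλ_k$) might be tracked online. It is not the paper's estimator, and nothing here reproduces the Gaussian-mixture experiments.

```python
import math


def g_accelerator_momentum(lam_star):
    """Nesterov-type momentum beta* = (1 - sqrt(lam*)) / (1 + sqrt(lam*)) for a spectral gap lam* in (0, 1]."""
    if not 0.0 < lam_star <= 1.0:
        raise ValueError("lam_star must lie in (0, 1]")
    s = math.sqrt(lam_star)
    return (1.0 - s) / (1.0 + s)


def estimate_gap(x0, x1, x2):
    """Hypothetical online estimate of lam*: one minus the observed contraction ratio of successive steps."""
    return 1.0 - math.dist(x2, x1) / math.dist(x1, x0)


def iterations_to_converge(eigs, beta, tol=1e-8, max_iter=100_000):
    """Iterate y = x + beta * (x - x_prev), x <- T(y) on a toy linearised EM map.

    T(x) = x - G x has fixed point 0, with G = diag(eigs) standing in for I - DT(theta*).
    beta = 0 gives the plain (unaccelerated) fixed-point iteration.
    """
    x_prev = [1.0] * len(eigs)
    x = [1.0] * len(eigs)
    for k in range(1, max_iter + 1):
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        x_prev, x = x, [(1.0 - g) * yi for g, yi in zip(eigs, y)]
        if max(abs(v) for v in x) < tol:
            return k
    return max_iter
```

With `eigs = [0.01, 0.5, 1.0]`, plain iteration contracts the slowest direction by 0.99 per step. With β* the contraction improves to roughly 1 - sqrt(0.01) = 0.9 per step, so the accelerated run needs far fewer iterations.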

2605.00696 2026-06-02 stat.ML cs.CL cs.LG

Adaptive Querying with AI Persona Priors

Kaizheng Wang, Yuhang Wu, Assaf Zeevi

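AI summary: Proposes a latent variable model induced by AI personas, whose response distributions are generated by a large language model. The model enables efficient Bayesian design for learning user-dependent quantities under a limited query budget.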

Comments
ICML 2026

English abstract

We study adaptive querying for learning user-dependent quantities of interest, such as responses to held-out items and psychometric indicators, within tight query budgets. Classical Bayesian design and computerized adaptive testing typically rely on restrictive parametric assumptions or expensive posterior approximations, limiting their use in heterogeneous, high-dimensional, and cold-start settings. We introduce a persona-induced latent variable model that represents a user's state through membership in a finite dictionary of AI personas, each offering response distributions produced by a large language model. This yields expressive priors with closed-form posterior updates and efficient finite-mixture predictions, enabling scalable Bayesian design for sequential item selection. Experiments on synthetic data and WorldValuesBench demonstrate that persona-based posteriors deliver accurate probabilistic predictions and an interpretable adaptive elicitation pipeline.
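
The closed-form update referred to above is Bayes' rule over a finite persona dictionary. Each persona's weight is multiplied by the probability it assigns to the observed answer, then renormalised, and predictions are mixtures over personas. The sketch below uses hypothetical hand-written response tables in place of LLM-generated ones. It selects the next item by expected reduction in persona-posterior entropy, one common Bayesian-design criterion; the paper's actual design criterion may differ.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def posterior_over_personas(prior, likelihoods, answers):
    """Closed-form posterior over a finite persona dictionary (Bayes' rule for a finite mixture).

    prior[k]          : prior weight of persona k
    likelihoods[k][q] : dict answer -> probability that persona k gives that answer to item q
    answers           : dict item -> observed answer
    """
    logw = [_log(p) for p in prior]
    for q, a in answers.items():
        for k in range(len(prior)):
            logw[k] += _log(likelihoods[k][q].get(a, 0.0))
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    z = sum(w)
    return [v / z for v in w]


def predictive(post, likelihoods, item):
    """Finite-mixture predictive distribution over answers to an item."""
    out = {}
    for k, wk in enumerate(post):
        for a, p in likelihoods[k][item].items():
            out[a] = out.get(a, 0.0) + wk * p
    return out


def entropy(dist):
    return -sum(p * math.log(p) for p in dist if p > 0)


def next_query(post, likelihoods, candidates):
    """Pick the item with the largest expected reduction in persona-posterior entropy."""
    h0 = entropy(post)
    best, best_gain = None, -1.0
    for q in candidates:
        expected_h = 0.0
        for a, pa in predictive(post, likelihoods, q).items():
            if pa > 0:
                expected_h += pa * entropy(posterior_over_personas(post, likelihoods, {q: a}))
        if h0 - expected_h > best_gain:
            best, best_gain = q, h0 - expected_h
    return best
```

An item to which every persona responds identically has zero expected information gain, so `next_query` skips it in favour of items that separate personas.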

2604.17838 2026-06-02 cs.LG stat.CO stat.ML

Efficient Diffusion Models under Nonconvex Equality and Inequality constraints via Landing

Kijung Jeon, Michael Muehlebach, Molei Tao

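AI summary: Proposes a unified framework for diffusion models under equality and inequality constraints on nonconvex feasible sets. Projections are replaced by a computationally efficient landing mechanism, and underdamped dynamics accelerate mixing, which substantially reduces computational cost.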

Comments
58 pages

English abstract

Generative modeling within constrained sets is essential for scientific and engineering applications involving physical, geometric, or safety requirements (e.g., molecular generation, robotics). We present a unified framework for constrained diffusion models on generic nonconvex feasible sets $Σ$ that simultaneously enforces equality and inequality constraints throughout the diffusion process. Our framework incorporates both overdamped and underdamped dynamics for forward and backward sampling. A key algorithmic innovation is a computationally efficient landing mechanism that replaces costly and often ill-defined projections onto $Σ$, ensuring feasibility without iterative Newton solves or projection failures. By leveraging underdamped dynamics, we accelerate mixing toward the prior distribution, effectively alleviating the high simulation costs typically associated with constrained diffusion. Empirically, this approach reduces function evaluations and memory usage during both training and inference while preserving sample quality. On benchmarks featuring equality and mixed constraints, our method achieves comparable sample quality to state-of-the-art baselines while significantly reducing computational cost, providing a practical and scalable solution for diffusion on nonconvex feasible sets.

2604.09041 2026-06-02 cs.LG cs.AI physics.ao-ph stat.ML

U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster

Salva Rühling Cachay, Duncan Watson-Parris, Rose Yu

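AI summary: Proposes U-Cast, a probabilistic weather forecasting model built on a standard U-Net backbone. Through deterministic pre-training followed by short probabilistic fine-tuning, it matches or exceeds the forecast skill of GenCast and IFS ENS at less than one tenth of the computational cost.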

Comments
ICML 2026. Our code is available at: https://github.com/Rose-STL-Lab/u-cast

English abstract

AI-based weather forecasting now rivals traditional physics-based ensembles, but state-of-the-art (SOTA) models rely on specialized architectures and massive computational budgets, creating a high barrier to entry. We demonstrate that such complexity is unnecessary for frontier performance. We introduce U-Cast, a probabilistic forecaster built on a standard U-Net backbone trained with a simple recipe: deterministic pre-training on Mean Absolute Error followed by short probabilistic fine-tuning on the Continuous Ranked Probability Score (CRPS) using Monte Carlo Dropout for stochasticity. As a result, our model matches or exceeds the probabilistic skill of GenCast and IFS ENS at $1.5^\circ$ resolution while reducing training compute by over $10\times$ compared to leading CRPS-based models and inference latency by over $10\times$ compared to diffusion-based models. U-Cast trains in under 12 H200 GPU-days and generates a 15-day ensemble forecast in 3 seconds. These results suggest that scalable, general-purpose architectures paired with efficient training curricula can match complex domain-specific designs at a fraction of the cost, opening the training of frontier probabilistic weather models to the broader community.
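
CRPS is the fine-tuning objective named above. For an m-member ensemble x_1..x_m (in U-Cast's case, members come from Monte Carlo Dropout) and a scalar observation y, a standard estimator is mean|x_i - y| minus a pairwise-spread term. The sketch computes both the plain empirical-CDF variant and the "fair" variant. Which variant U-Cast uses is not stated in the abstract, and this code is not taken from the U-Cast repository.

```python
def crps_ensemble(members, obs, fair=True):
    """CRPS of an ensemble forecast for a scalar observation (lower is better).

    CRPS = mean_i |x_i - y| - c * sum_{i,j} |x_i - x_j|, with c = 1 / (2 m (m - 1)) for the
    'fair' estimator and c = 1 / (2 m^2) for the plain empirical-CDF version.
    """
    m = len(members)
    accuracy = sum(abs(x - obs) for x in members) / m          # rewards members close to y
    spread = sum(abs(xi - xj) for xi in members for xj in members)  # O(m^2) pairwise spread
    if fair:
        if m < 2:
            raise ValueError("the fair estimator needs at least two members")
        return accuracy - spread / (2.0 * m * (m - 1))
    return accuracy - spread / (2.0 * m * m)
```

For a single member the plain CRPS reduces to the absolute error. The spread term is what lets a probabilistic loss reward a calibrated ensemble instead of a single sharp guess.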

2604.08149 2026-06-02 cs.LG stat.ML

A Direct Approach for Handling Contextual Bandits with Latent State Dynamics

Zhen Li, Gilles Stoltz

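AI summary: Proposes a direct approach to linear contextual bandits driven by a hidden Markov chain. A simplified model is reduced to standard linear contextual bandits, with the theoretical analysis extended to account for HMM parameter estimation. For the more complex model with direct hidden-state dependence, the paper introduces an algorithm with periodic parameter updates.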

Journal ref
ICML 2026 - Forty-Third International Conference on Machine Learning, Jul 2026, Seoul, South Korea, France

English abstract

We consider a linear contextual bandit model where contexts and rewards are governed by a finite hidden Markov chain. We first revisit the simplified model by Nelson et al. (2022), in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts (called beliefs), rather than functions of the hidden states themselves. This simplified model can be handled through a direct reduction to standard linear contextual bandits. We extend the theoretical analysis of this reduction to account for the estimation of the hidden Markov model (HMM) parameters in the regret bound, and to provide high-probability bounds that no longer depend on the reward functions and depend on the model only through the estimation of the HMM parameters. Second, and most importantly, we study the more natural and more complex model that incorporates direct dependencies on the hidden states (on top of dependencies on the observed contexts, as is natural for contextual bandits). Under a classic HMM forgetting condition, the main algorithmic tool introduced to cope with the various statistical dependencies induced by the reward structure is to update the reward-model parameters only periodically.
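
In the simplified model, the belief vector (the posterior over hidden states given the contexts seen so far) plays the role of the context in a standard linear bandit. Below is a minimal sketch of the belief recursion, which is the standard HMM forward filter, followed by a greedy linear arm choice. `theta` is a hypothetical matrix of per-arm reward parameters. A real algorithm would add an exploration bonus and would estimate both `theta` and the HMM parameters rather than assume them.

```python
def belief_update(belief, transition, emission_probs):
    """One step of the HMM forward filter.

    belief[i]         : P(s_{t-1} = i | contexts up to t-1)
    transition[i][j]  : P(s_t = j | s_{t-1} = i)
    emission_probs[j] : likelihood of the newly observed context c_t under state j
    Returns P(s_t = j | contexts up to t).
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    unnorm = [p * e for p, e in zip(predicted, emission_probs)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def greedy_arm(belief, theta):
    """Pick the arm maximising the linear score <theta[a], belief> (hypothetical point estimates theta)."""
    scores = [sum(w * b for w, b in zip(theta_a, belief)) for theta_a in theta]
    return max(range(len(theta)), key=lambda a: scores[a])
```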

2601.16884 2026-06-02 cs.LG cs.NA math.NA stat.ML

Multigrade Neural Network Approximation

Shijun Zhang, Zuowei Shen, Yuesheng Xu

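AI summary: Proposes the multigrade deep learning (MGDL) framework, which achieves structured error refinement by freezing earlier grades and training each new subnetwork to fit the residual. Proves that fixed-width multigrade ReLU networks can uniformly approximate continuous functions.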

English abstract

We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly nonconvex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably certain one-hidden-layer ReLU models, training admits convex reformulations with global guarantees under appropriate settings, motivating learning paradigms that improve stability while scaling to depth. MGDL builds on this insight by training deep networks grade by grade: previously learned grades are frozen, and each newly added grade-wise subnetwork is composed on top of the previously learned grades and trained to fit the residual left by the current approximation, yielding a structured and interpretable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL and prove that, for any continuous target function defined on a hypercube, there exists a fixed-width multigrade ReLU scheme whose residuals are pointwise nonincreasing in magnitude and converge uniformly to zero, with strict $L^p$-norm decay at every nontrivial grade for $p\in [1,\infty)$. To the best of our knowledge, this work provides the first rigorous constructive approximation guarantee showing that a grade-wise residual refinement scheme can achieve vanishing error in a fixed-width multigrade ReLU architecture.
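
The sketch below is a toy, dependency-free illustration of grade-by-grade residual fitting. It assumes that, instead of training each grade by gradient descent, each new grade is a random ReLU layer stacked on the frozen features of the previous grade. Only that grade's linear output head is fitted to the current residual, by ridge regression. This guarantees that the training-set residual sum of squares never increases, because a zero head is always feasible. That is a discrete L2 analogue of the pointwise monotonicity the paper proves, not the paper's construction.

```python
import random


def relu_layer(inputs, weights, biases):
    """Apply a ReLU layer to a list of input vectors."""
    return [[max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weights, biases)]
            for x in inputs]


def ridge_fit(features, targets, eps=1e-4):
    """Solve (F^T F + eps I) c = F^T y by Gaussian elimination with partial pivoting."""
    d = len(features[0])
    a = [[sum(f[i] * f[j] for f in features) + (eps if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    rhs = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, d):
            factor = a[r][col] / a[col][col]
            for c in range(col, d):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    coef = [0.0] * d
    for i in reversed(range(d)):
        coef[i] = (rhs[i] - sum(a[i][j] * coef[j] for j in range(i + 1, d))) / a[i][i]
    return coef


def multigrade_fit(xs, ys, grades=3, width=8, seed=0):
    """Toy grade-by-grade residual fitting (hypothetical random-feature stand-in for MGDL training).

    Each grade applies a new random ReLU layer to the frozen features of the previous grade
    and fits only its linear output head to the current residual.
    Returns the residual sum of squares before any grade and after each grade.
    """
    rng = random.Random(seed)
    feats = [[x] for x in xs]  # grade-0 "features" are the raw inputs
    residual = list(ys)
    history = [sum(r * r for r in residual)]
    for _ in range(grades):
        in_dim = len(feats[0])
        weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(width)]
        biases = [rng.uniform(-1.0, 1.0) for _ in range(width)]
        feats = relu_layer(feats, weights, biases)  # stacked on top of the frozen earlier grades
        coef = ridge_fit(feats, residual)
        residual = [r - sum(c * f for c, f in zip(coef, fx)) for r, fx in zip(residual, feats)]
        history.append(sum(r * r for r in residual))
    return history
```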

2604.00578 2026-06-02 astro-ph.GA stat.AP

Revisiting Marked Galaxy Clustering from a Joint Point Process Perspective

Tsutomu T. Takeuchi

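AI summary: Treats galaxies as a joint point process of positions and marks and introduces the joint pair correlation function as the fundamental quantity. Defines a diagnostic that quantifies deviations from mark independence, which gives a unified interpretation of marked effects.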

Comments
10 pages, 4 figures, accepted for publication in MNRAS

English abstract

Marked correlation functions, in which galaxy properties such as luminosity or stellar mass are treated as marks, are widely used to test models of galaxy formation. In astronomy, however, these statistics are typically implemented as summary measures that do not preserve the joint structure of mark pairs conditioned on separation. In this work, we formulate galaxies as points $(x,m)$ on the product space $\mathbb{R}^3\times\mathcal{M}$, where $x$ denotes position and $m$ a mark, and introduce the joint pair correlation function $g(r;m_1,m_2)$ as the fundamental quantity describing mark-dependent clustering. We further define a diagnostic quantity $Δ_{\mathrm{ind}}(r;m_1,m_2)$ that locally quantifies deviations from the independence hypothesis relative to spatial clustering alone, thereby providing a projection-free description of which mark pairs are over- or underrepresented at a given separation scale. Within this framework, commonly used diagnostics such as the inhomogeneous cross-$J$ function are naturally interpreted as summary statistics obtained through averaging over mark sets and geometric-event-based reductions of the joint structure. This perspective clarifies that previously discussed marked effects, including assembly bias, correspond to projections of an underlying joint dependence, and that observationally accessible information is the existence of non-factorizable joint structure itself. The present formulation provides both a fundamental quantity and practical diagnostics for its characterization.

2601.15880 2026-06-02 stat.ME math.ST stat.TH

Estimating conditional Mann-Whitney effects using pseudo-observation-based regression

Dennis Dobler, Alina Schenk, Matthias Schmid

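AI summary: Proposes a distribution-free regression model that uses pseudo-observations to estimate a linear relationship between the Mann-Whitney effect and covariates. Develops bootstrap-based hypothesis tests and applies the methods to progression-free survival in breast cancer patients.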

Comments
32 pages, 10 figures, 7 tables

English abstract

The Mann-Whitney effect is an effect measure for the order of two sample-specific outcome variables. It can be interpreted as a probability and is also connected to the area under the ROC curve. In the literature, it has been considered for both ordinal and right-censored time-to-event outcomes. For both cases, the present paper introduces a distribution-free regression model that relates the Mann-Whitney effect to a linear combination of covariates. To fit the model, we develop a pseudo-observation-based procedure yielding consistent and asymptotically normal coefficient estimates. In addition, we propose bootstrap-based hypothesis tests to infer the effects of the covariates on the Mann-Whitney effect. A simulation study on the small-sample behavior of the proposed method demonstrates that the novel hypothesis tests perform on par with the z-test of a Cox regression model. The new methods are used to analyze progression-free survival in breast cancer patients enrolled in the randomized phase III SUCCESS-A trial.
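
For uncensored samples, the Mann-Whitney effect P(X < Y) + 0.5 P(X = Y) can be estimated by comparing all pairs. This is the same quantity as the empirical area under the ROC curve, which is the connection mentioned above. The `jackknife_pseudo_observations` helper shows only the generic one-sample leave-one-out construction behind pseudo-observation regression. The paper's pseudo-observations for the two-sample, possibly right-censored setting are not reproduced here.

```python
def mann_whitney_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two uncensored samples (equals the empirical AUC)."""
    total = 0.0
    for xi in x:
        for yj in y:
            total += 1.0 if xi < yj else 0.5 if xi == yj else 0.0
    return total / (len(x) * len(y))


def jackknife_pseudo_observations(data, estimator):
    """Generic leave-one-out pseudo-observations: n * theta_hat - (n - 1) * theta_hat(-i)."""
    n = len(data)
    full = estimator(data)
    return [n * full - (n - 1) * estimator(data[:i] + data[i + 1:]) for i in range(n)]
```

When the estimator is the sample mean, the pseudo-observations reproduce the data exactly. This is why regressing pseudo-observations on covariates generalises ordinary regression to functionals that are not simple averages.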

2510.03690 2026-06-02 cs.LG stat.ML

From Moments to Models: Graphon-Mixture Learning for Mixup and Contrastive Learning

Ali Azizpour, Reza Ramezanpour, Santiago Segarra

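AI summary: Proposes a unified framework that models graph data as a mixture of graphons and uses graph moments (motif densities) to cluster graphs and estimate the mixture components. Builds graphon-mixture-aware mixup (GMAM) and model-aware graph contrastive learning (MGCL), which achieve state-of-the-art or competitive performance on supervised and unsupervised tasks.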

English abstract

Real-world graph datasets often arise from mixtures of populations, where graphs are generated by multiple distinct underlying distributions. In this work, we propose a unified framework that explicitly models graph data as a mixture of probabilistic graph generative models represented by graphons. To characterize and estimate these graphons, we leverage graph moments (motif densities) to cluster graphs generated from the same underlying model. We establish a novel theoretical guarantee, deriving a tighter bound showing that graphs sampled from structurally similar graphons exhibit similar motif densities with high probability. This result enables principled estimation of graphon mixture components. We show how incorporating estimated graphon mixture components enhances two widely used downstream paradigms: graph data augmentation via mixup and graph contrastive learning. By conditioning these methods on the underlying generative models, we develop graphon-mixture-aware mixup (GMAM) and model-aware graph contrastive learning (MGCL). Extensive experiments on both simulated and real-world datasets demonstrate strong empirical performance. In supervised learning, GMAM outperforms existing augmentation strategies, achieving new state-of-the-art accuracy on 6 out of 7 datasets. In unsupervised learning, MGCL performs competitively across seven benchmark datasets and achieves the lowest average rank overall.
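
Motif densities are the graph moments used above to cluster graphs. The sketch samples a graph from a graphon W(u, v) and computes two of the simplest moments: the fraction of node pairs joined by an edge, and the fraction of node triples forming a triangle. For a constant graphon W = p these concentrate around p and p^3. The function names are illustrative; the paper's choice of motifs and its clustering step are not shown.

```python
import random


def sample_graph(graphon, n, rng):
    """Sample an n-node simple graph: u_i ~ U(0, 1), edge i~j with probability W(u_i, u_j)."""
    u = [rng.random() for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < graphon(u[i], u[j]):
                adj[i][j] = adj[j][i] = 1
    return adj


def motif_densities(adj):
    """Edge density (edges / node pairs) and triangle density (triangles / node triples)."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    triangles = sum(adj[i][j] * adj[j][k] * adj[i][k]
                    for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n))
    pairs = n * (n - 1) / 2
    triples = n * (n - 1) * (n - 2) / 6
    return edges / pairs, triangles / triples
```

Graphs drawn from structurally similar graphons, for example `lambda u, v: 0.5` versus `lambda u, v: u * v`, should produce nearby or clearly separated density vectors respectively. That separation is what makes moment-based clustering possible.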

2504.21688 2026-06-02 stat.AP stat.ME stat.ML

Assessing Racial Disparities in Healthcare Expenditures via Mediator Distribution Shifts

通过中介变量分布偏移评估医疗支出中的种族差异

Xiaxian Ou, Xinwei He, David Benkeser, Razieh Nabi

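AI summary: Proposes a framework that decomposes racial disparities in healthcare expenditures through shifts in mediator distributions and applies machine-learning methods to MEPS data, finding that socioeconomic status and health status are the main contributors.

Journal ref
Statistics in Medicine, 45(13-14), e70606, 2026
AI Chinese abstract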

医疗支出中的种族差异已有充分记录,但其潜在驱动因素仍然复杂。本研究开发了一个框架,通过中介变量分布的变化而非将种族本身视为可操纵暴露来分解此类差异。我们将差异定义为不同种族群体间协变量调整后的结果分布的差异,并将总差异分解为可归因于中介变量分布差异的部分,以及在均衡这些分布后仍然存在的残余部分。使用医疗支出面板调查(MEPS)数据,我们考察了如果社会经济地位(SES)、保险获取、健康行为或健康状况等中介变量在不同种族群体间均衡,支出差异将在多大程度上持续或减少。为确保有效推断,我们基于影响函数技术和灵活机器学习(包括超级学习者和为支出数据的零膨胀、右偏特性设计的两部分模型)推导了渐近线性估计量。将该框架应用于2009年和2016年的MEPS数据,在所有成对种族比较中均观察到显著差异,其中非西班牙裔白人与西班牙裔之间的差距在两年中均为最大。SES和健康状况的差异是这些差异的最大贡献因素,保险获取也发挥了重要作用,尤其对于西班牙裔人群,而健康行为的贡献最小。残余差异持续存在,尤其是在涉及非西班牙裔白人的比较中,表明存在未测量或结构性因素的影响。

English abstract

Racial disparities in healthcare expenditures are well-documented, yet the underlying drivers remain complex. This study develops a framework to decompose such disparities through shifts in the distributions of mediating variables, rather than treating race itself as a manipulable exposure. We define disparities as differences in covariate-adjusted outcome distributions across racial groups, and decompose the total disparity into a component attributable to differences in mediator distributions and a residual component that remains after equalizing those distributions. Using data from the Medical Expenditure Panel Survey (MEPS), we examine the extent to which expenditure disparities would persist or be reduced if mediators such as socioeconomic status (SES), insurance access, health behaviors, or health status were equalized across racial groups. To ensure valid inference, we derive asymptotically linear estimators based on influence-function techniques and flexible machine learning, including super learners and a two-part model designed for the zero-inflated, right-skewed nature of expenditure data. Applying this framework to MEPS data from 2009 and 2016, we observed substantial disparities across all pairwise racial comparisons, with the largest gaps between non-Hispanic Whites and Hispanics in both years. Differences in SES and health status were the largest contributors to these disparities, with insurance access also playing a meaningful role, particularly for Hispanic populations, whereas health behaviors contributed minimally. Residual disparities persisted, especially in comparisons involving non-Hispanic Whites, suggesting the influence of unmeasured or structural factors.

2603.24170 2026-06-02 stat.OT math.CO math.PR

Tackling the 6/49 Lottery and Debunking Common Myths with Probabilistic Methods and Combinatorial Designs

攻克6/49彩票并用概率方法和组合设计破除常见迷思

Ralph Stömmer

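AI summary: Analyzes the 6/49 lottery objectively with probabilistic methods and combinatorial designs, constructs a (49,6,5) covering design that meets the Schönheim bound, and shows that common attempts to beat the odds disproportionately reduce the chances of winning.

Comments
16 pages
AI Chinese abstract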

最终,庄家总是赢!这一简单真理适用于所有公共机会游戏。然而,自彩票存在以来,人们尝试了各种方法试图帮助运气。本文比较了攻克6/49彩票的客观科学方法:概率方法和组合设计。本文开发的数学模型可以修改并应用于其他彩票。介绍了新构建的(49,6,5)覆盖设计,该设计达到了Schönheim界。针对彩票设计和覆盖设计,提出了基于概率方法的基准。结果表明,常见的试图战胜赔率的方法对应于将数字限制到子集,这不成比例地降低了中奖机会。

English abstract

In the end, the house always wins! This simple truth holds for all public games of chance. Nevertheless, for as long as lotteries have existed, people have tried everything to give luck a helping hand. This article compares two objective, scientific approaches to tackling the 6/49 lottery: probabilistic methods and combinatorial designs. The mathematical models developed herein can be modified and applied to other lotteries. A newly constructed (49, 6, 5) covering design that meets the Schönheim bound is introduced. For lottery designs and for covering designs, a benchmark based on probabilistic methods is presented. It is demonstrated that common attempts to outwit the odds amount to restricting play to subsets of the numbers, which disproportionately reduces the chances of winning.
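
Two quantities behind the abstract can be computed exactly: the hypergeometric probability that a single 6/49 ticket matches exactly k of the drawn numbers, and the Schönheim lower bound on the size of a (v, k, t) covering design. The sketch below evaluates both with exact integer arithmetic. The function names are illustrative, and the article's actual covering design is not reproduced here.

```python
from fractions import Fraction
from math import comb


def match_probability(k: int, pool: int = 49, drawn: int = 6) -> Fraction:
    """Exact probability that one ticket matches exactly k of the drawn numbers.

    Hypergeometric: C(drawn, k) * C(pool - drawn, drawn - k) / C(pool, drawn).
    """
    return Fraction(comb(drawn, k) * comb(pool - drawn, drawn - k), comb(pool, drawn))


def schoenheim_bound(v: int, k: int, t: int) -> int:
    """Schönheim lower bound on the number of blocks in a (v, k, t) covering design.

    L(v, k, t) = ceil(v/k * ceil((v-1)/(k-1) * ... * ceil((v-t+1)/(k-t+1)))),
    evaluated from the innermost ceiling outwards with exact integer arithmetic.
    """
    bound = 1
    for i in range(t - 1, -1, -1):
        bound = -((-(v - i) * bound) // (k - i))    # ceil((v - i) * bound / (k - i))
    return bound


for k in range(7):
    print(k, "matches: 1 in", round(float(1 / match_probability(k)), 1))
print("Schönheim bound for (49, 6, 5):", schoenheim_bound(49, 6, 5))
```

For example, the jackpot probability is exactly 1/13,983,816. The bound for the Fano-plane case (7, 3, 2) evaluates to 7, which is a quick sanity check of the nested-ceiling formula.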

2602.16733 2026-06-02 econ.EM stat.ME

Scaling Reproducibility: An AI-Assisted Workflow for Large-Scale Replication and Reanalysis

规模化可重复性:一种用于大规模复制与再分析的人工智能辅助工作流

Yiqing Xu, Leo Yang Yang

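AI summary: Proposes an AI-assisted workflow for fully automated paper replication, applies it to 384 papers from top political science journals, and finds that journal verification requirements and data archiving policies substantially improve reproducibility.

AI Chinese abstract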

计算可重复性是科学可信度的核心,然而大规模验证已发表成果的成本仍然高昂。我们开发了一种AI辅助工作流,用于自动化的全文复制——包括检索材料、重建环境、执行代码,并将输出与回归表中报告的点估计进行匹配。我们定义了三大政治学顶级期刊(2010-2025年)所有实证与定量论文的总体,并通过自动提取衡量声明的数据可用性。对于384项研究的分层样本,我们应用该工作流进行全文复制,共计3,523个实证模型。我们发现,期刊验证要求与数据存档规定共同推动了可重复性:完全或大部分可复制的论文比例从DA-RT采纳前的20.8%上升至采纳后的82.5%,并且在可获取复制包的条件下,92.1%的论文完全或大部分可复制(234/254)。作为二次应用,我们对84项研究(在1,910个复制模型中的597个IV设定)应用标准化IV诊断,展示了自动执行如何实现跨异质经验情境的系统性再分析。

English abstract

Computational reproducibility is central to scientific credibility, yet verifying published results at scale remains costly. We develop an AI-assisted workflow for automated full-paper replication: retrieving materials, reconstructing environments, executing code, and matching outputs to point estimates reported in regression tables. We define a universe of all empirical and quantitative papers from the three top political science journals (2010-2025) and measure stated data availability using automated extraction. For a stratified sample of 384 studies, we apply the workflow to conduct full-paper replication, totaling 3,523 empirical models. We find that journal verification requirements, combined with data archiving mandates, drive reproducibility: the share of fully or largely reproducible papers rises from 20.8% before adoption of DA-RT (Data Access and Research Transparency) to 82.5% after, and, conditional on accessible replication packages, 92.1% of papers are fully or largely reproducible (234/254). As a secondary application, we apply standardized IV diagnostics to 84 studies (597 IV specifications among 1,910 replicated models), illustrating how automated execution enables systematic reanalysis across heterogeneous empirical settings.

2603.23398 2026-06-02 cs.LG cs.AI stat.ML

Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation

图能量匹配:用于图生成的传输对齐能量基建模

Michal Balcerak, Suprosanna Shit, Chinmay Prabhakar, Sebastian Kaltenbach, Michael S. Albergo, Yilun Du, Bjoern Menze

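AI summary: Proposes Graph Energy Matching (GEM), which learns a permutation-invariant potential energy from the JKO transport-map optimization perspective and uses an energy-based switching strategy for high-quality generation of discrete graphs, matching or surpassing discrete diffusion models on molecular graph benchmarks.

AI Chinese abstract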

离散数据(如图)的生成建模支撑着许多科学和工业应用,包括分子发现和材料设计。在这些领域中,概率推理尤其有价值,因为它能够实现可组合生成和原则性地融入期望的约束,例如结构或功能属性。能量基模型通过捕获相对似然并在推理过程中直接施加约束来支持可组合推理,自然符合这一目标。然而,离散能量基模型通常难以实现高效高质量的采样,因为支持区域外的区域常包含虚假局部最小值,会困住采样器并导致训练不稳定,从而与离散扩散模型相比存在保真度差距。为了解决这一差距,我们引入了Graph Energy Matching (GEM),这是一种受Jordan-Kinderlehrer-Otto (JKO)传输映射优化视角启发的离散生成框架。GEM学习一个置换不变势能,同时引导从噪声到高似然图区域的离散传输,并在这些区域内细化样本。我们进一步引入了一种利用能量基切换策略的采样协议,无缝衔接快速的梯度引导传输和用于有效探索的局部混合机制。在分子图基准上,GEM在大多数报告指标上匹配或超越了强离散扩散基线。除了提高生成质量,GEM的相对似然建模还支持定向探索,促进组合生成、属性约束采样以及图之间的插值。项目页面:https://michalbalcerak.ai/graph-energy-matching/。

English abstract

Generative modeling of discrete data, such as graphs, underpins many scientific and industrial applications, including molecular discovery and materials design. In these domains, probabilistic inference is particularly valuable, as it enables composable generation and principled incorporation of desired constraints, such as structural or functional properties. Energy-based models naturally support this goal: they capture relative likelihoods and enable composable inference by enforcing constraints directly during inference. However, discrete energy-based models typically struggle with efficient, high-quality sampling: off-support regions often contain spurious local minima that trap samplers and cause training instabilities, resulting in a fidelity gap compared with discrete diffusion models. To address this gap, we introduce Graph Energy Matching (GEM), a discrete generative framework inspired by the Jordan-Kinderlehrer-Otto (JKO) transport-map optimization perspective. GEM learns a permutation-invariant potential energy that simultaneously guides discrete transport from noise toward high-likelihood graph regions and refines samples within these regions. We further introduce a sampling protocol leveraging an energy-based switching strategy, seamlessly bridging rapid, gradient-guided transport and a local mixing regime for effective exploration. On molecular graph benchmarks, GEM matches or surpasses strong discrete diffusion baselines on most reported metrics. Beyond improving generation quality, GEM's relative likelihood modeling enables targeted exploration, facilitating compositional generation, property-constrained sampling, and interpolation between graphs. Project page: https://michalbalcerak.ai/graph-energy-matching/.

2603.22215 2026-06-02 stat.ME stat.AP

Multiview Graph Fusion with Covariates

协变量下的多视图图融合

Sharmistha Guha, Jose Rodriguez-Acosta, Ivo Dinov

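AI summary: Proposes an integrative Bayesian method that jointly models multiview graphs with vector-valued predictors, enabling information fusion, parameter estimation, and uncertainty quantification.

Comments
46 pages
AI Chinese abstract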

联合建模具有共同节点集和辅助预测变量的多视图图是统计方法论中一个基本但较少探索的领域。传统方法通常将不同视图的图视为独立或未能充分纳入预测变量,可能遗漏图视图内部及之间的复杂依赖关系,导致推断精度降低。受此类方法论缺陷的启发,我们引入一种集成贝叶斯方法,用于联合学习带有向量值预测变量的多视图图。我们的建模框架假设每个图视图具有共同的节点集,同时允许不同图视图之间的节点具有多样化的互连或边权重,适应二值和连续值的边权重。通过采用分层贝叶斯建模方法,我们的框架通过精心设计的模型参数先验分布,无缝整合来自不同图的信息。该方法能够估计定义这些图视图与预测变量之间关系的关键模型参数,并提供图视图的预测推断。至关重要的是,该方法在所有此类推断中提供了不确定性量化。理论分析表明,在真实数据生成密度和图节点数量相对于样本量增长的温和假设下,我们模型的后验预测密度渐近收敛于真实数据生成密度。模拟研究验证了我们的方法相对于依赖预测变量的张量学习和不同图视图与预测变量的独立学习所具有的推断优势。我们进一步通过分析认知控制任务下神经科学中的功能连接图,将任务相关的大脑连接与表型测量相关联,展示了模型的实用性。

English abstract

Joint modeling of multiview graphs that share a common set of nodes across views, together with auxiliary predictors, is an essential yet underexplored area in statistical methodology. Traditional approaches often treat graphs in different views as independent or fail to adequately incorporate predictors, potentially missing complex dependencies within and across graph views and reducing inferential accuracy. Motivated by these methodological shortcomings, we introduce an integrative Bayesian approach for joint learning of a multiview graph with vector-valued predictors. Our modeling framework assumes a common set of nodes for each graph view while allowing for diverse interconnections or edge weights between nodes across graph views, accommodating both binary and continuous-valued edge weights. By adopting a hierarchical Bayesian modeling approach, our framework seamlessly integrates information from diverse graphs through carefully designed prior distributions on model parameters. This approach enables the estimation of crucial model parameters defining the relationship between these graph views and predictors, and it also supports predictive inference for the graph views. Crucially, the approach provides uncertainty quantification in all such inferences. Theoretical analysis establishes that the posterior predictive density for our model asymptotically converges to the true data-generating density, under mild assumptions on the true data-generating density and the growth of the number of graph nodes relative to the sample size. Simulation studies validate the inferential advantages of our approach over predictor-dependent tensor learning and independent learning of different graph views with predictors. We further illustrate model utility by analyzing functional connectivity graphs in neuroscience under cognitive control tasks, relating task-related brain connectivity with phenotypic measures.

2602.10014 2026-06-02 cs.LG stat.ML

A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula

一种以任务为中心的迭代自改进理论,采用由易到难的课程

Chenruo Liu, Yijun Dong, Yiqiu Shen, Qi Lei

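AI summary: Models self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution, derives finite-sample guarantees for the expected reward, and proves that on reasoning tasks easy-to-hard curricula achieve better theoretical guarantees than training on fixed task mixtures.

AI Chinese abstract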

迭代自改进在由大型语言模型自身生成的、经过奖励验证的输出上微调自回归大型语言模型。与自改进的经验成功相比,这种生成性迭代过程在实际有限样本设置下的理论基础仍然有限。我们通过将每一轮自改进建模为在奖励过滤分布上的最大似然微调,并推导期望奖励的有限样本保证,朝这个目标取得了进展。我们的分析揭示了一个显式的反馈循环,其中更好的模型每轮接受更多数据,支持持续的自改进,同时解释了这种改进最终饱和的原因。通过采用以任务为中心的观点,考虑具有多个难度级别的推理任务,我们进一步证明了在模型初始化、任务难度和样本预算方面的可量化条件,在这些条件下,由易到难的课程比在固定任务混合上训练具有可证明的更好保证。我们的分析通过蒙特卡洛模拟以及涵盖合成图基推理任务和多个标准数学推理基准的实验得到了验证。

English abstract

Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improvement, the theoretical foundation of this generative, iterative procedure in a practical, finite-sample setting remains limited. We make progress toward this goal by modeling each round of self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution and deriving finite-sample guarantees for the expected reward. Our analysis reveals an explicit feedback loop where better models accept more data per iteration, supporting sustained self-improvement while explaining eventual saturation of such improvement. Adopting a task-centric view by considering reasoning tasks with multiple difficulty levels, we further prove quantifiable conditions on model initialization, task difficulty, and sample budget where easy-to-hard curricula provably achieve better guarantees than training on fixed mixtures of tasks. Our analyses are validated through Monte-Carlo simulations and experiments spanning a synthetic graph-based reasoning task and multiple standard mathematical reasoning benchmarks.

2603.14798 2026-06-02 stat.ML cs.LG cs.NA math.NA

Preconditioned One-Step Generative Modeling for Bayesian Inverse Problems in Function Spaces

函数空间中贝叶斯逆问题的预处理一步生成建模

Zilan Cheng, Li-Lian Wang, Zhongjian Wang

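AI summary: Proposes a machine-learning algorithm based on one-step generative transport that uses a prior-aligned Gaussian random field as the source and approximates the posterior with a neural operator, efficiently solving Bayesian inverse problems in function spaces.

AI Chinese abstract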

我们提出了一种用于函数空间贝叶斯逆问题的机器学习算法。基于一步生成传输,该方法学习一个摊销神经算子,其将高斯源的推送前推近似于以每个新观测为条件的后验分布。我们证明白噪声源与函数空间极限不兼容,因此采用先验对齐的GRF作为源。通过所得一步条件后验传输的Lipschitz正则性以及在线性逆问题和基于PDE的逆问题上的数值实验,我们证明了这一选择的合理性。该方法并非从MCMC中提炼:它仅使用先验样本和模拟的部分噪声观测进行训练。一旦训练完成,它能在约$10^{-3}$秒内生成一个$64\times64$的后验样本,避免了MCMC中重复的正向模型评估和多步生成采样器中重复的网络评估,同时匹配关键的后验摘要。

English abstract

We propose a machine-learning algorithm for Bayesian inverse problems in the function-space regime. Based on one-step generative transport, the method learns an amortized neural operator whose pushforward of a Gaussian source approximates the posterior distribution conditioned on each new observation. We show that white-noise sources are incompatible with the function-space limit, and therefore adopt a prior-aligned Gaussian random field (GRF) as the source. We justify this choice through the Lipschitz regularity of the resulting one-step conditional posterior transport and through numerical experiments on linear and PDE-based inverse problems. The method is not distilled from MCMC: it is trained only with prior samples and simulated partial noisy observations. Once trained, it generates a $64\times64$ posterior sample in $\sim 10^{-3}$ s, avoiding the repeated forward-model evaluations of MCMC and the repeated network evaluations of multistep generative samplers while matching key posterior summaries.
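
A small numerical check shows why a white-noise source behaves differently from a GRF source as the grid is refined. The sketch below assumes a Whittle-Matérn-type covariance operator $(-\Delta + \tau^2)^{-\alpha}$ on the periodic unit square. That is one common prior choice and not necessarily the paper's. The values of `alpha`, `tau` and the 64 x 64 grid are illustrative. The field is sampled by filtering white noise in Fourier space, and a simple "roughness" ratio is then compared with that of plain white noise.

```python
import numpy as np


def sample_grf(n: int, alpha: float, tau: float, rng: np.random.Generator) -> np.ndarray:
    """Sample a periodic GRF on an n x n grid, covariance proportional to (-Laplacian + tau^2)^(-alpha).

    White noise is filtered in Fourier space by the square root of the
    operator's eigenvalues, (4 pi^2 |k|^2 + tau^2)^(-alpha/2).
    """
    k = np.fft.fftfreq(n, d=1.0 / n)                # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    sqrt_eig = (4 * np.pi**2 * (kx**2 + ky**2) + tau**2) ** (-alpha / 2)
    white = rng.standard_normal((n, n))
    # The multiplier is symmetric in k, so the inverse FFT is real up to round-off.
    return np.fft.ifft2(sqrt_eig * np.fft.fft2(white)).real


def roughness(f: np.ndarray) -> float:
    """Mean squared neighbour difference relative to the field's variance."""
    return float(np.mean(np.diff(f, axis=0) ** 2) / np.var(f))


rng = np.random.default_rng(0)
grf = sample_grf(64, alpha=2.0, tau=3.0, rng=rng)
noise = rng.standard_normal((64, 64))
print(roughness(grf), roughness(noise))
```

For white noise the ratio stays near 2 at every resolution, so the samples never settle into a function. For the GRF it is far smaller and shrinks as the grid is refined, which is consistent with the abstract's point about the function-space limit.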

2603.09919 2026-06-02 stat.ME

A Bayesian adaptive enrichment design using aggregate historical data to inform individualized treatment recommendations

一种使用聚合历史数据指导个体化治疗推荐的贝叶斯自适应富集设计

Lara Maleyeff, Shirin Golchi, Erica E. M. Moodie

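AI summary: Proposes a Bayesian adaptive enrichment design that borrows external information through a normalized power prior anchored on aggregate historical data (such as the average treatment effect) and uses posterior probabilities to guide early stopping or continued recruitment, increasing power and reducing expected sample size.

Comments
13 pages, 4 tables, 0 figures
AI Chinese abstract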

自适应富集试验旨在根据不断变化的生物标志物证据识别并招募最可能从治疗中获益的参与者,以指导个体化治疗推荐。贝叶斯方法非常适合这些设计,因为它们允许以原则性的方式纳入外部信息。在实践中,先前的研究通常只提供汇总水平的信息,由于设计或隐私限制,无法获得亚组特定的估计值。因此,现有的动态借用方法依赖于聚合指标(如平均治疗效果),并隐含地假设历史信息直接映射到模型参数上。然而,在旨在识别个体化治疗效果的适应性富集设置中,当只有边际历史效应可用时,亚组特定的治疗参数是不可识别的。为了解决这一差距,我们提出了一种贝叶斯自适应富集设计,该设计使用基于一个或多个汇总指标(如平均治疗效果)锚定的归一化幂先验,从外部研究中借用信息。据我们所知,现有方法没有解决这一差距。中期分析使用后验概率来指导因有效性或无效性而提前停止,或在有希望的生物标志物定义的亚组中继续招募。模拟研究评估了历史偏倚、样本量和先验信息量下的操作特征。结合一项即将在阻塞性睡眠呼吸暂停中进行的试验,结果显示与非借用设计相比,效率提高,包括提高功效、更早停止和减少期望样本量。

English abstract

Adaptive enrichment trials aim to identify and recruit participants most likely to benefit from treatment based on evolving biomarker evidence, with the goal of informing individualized treatment recommendations. Bayesian methods are well suited to these designs because they allow external information to be incorporated in a principled manner. In practice, prior studies often provide only summary-level information, with subgroup-specific estimates unavailable due to design or privacy constraints. Existing dynamic borrowing approaches therefore rely on aggregate measures, such as the average treatment effect, and implicitly assume that historical information maps directly onto model parameters. In adaptive enrichment settings aimed at identifying individualized treatment effects, however, subgroup-specific treatment parameters are not identifiable when only marginal historical effects are available. To our knowledge, no existing method addresses this gap. To address it, we propose a Bayesian adaptive enrichment design that borrows information from external studies using a normalized power prior anchored on one or more summary measures, such as the average treatment effect. Interim analyses use posterior probabilities to guide early stopping for efficacy or futility, or continued recruitment within promising biomarker-defined subgroups. Simulation studies evaluate operating characteristics across historical bias, sample size, and prior informativeness. Using a planned trial in obstructive sleep apnea as the motivating example, the results show efficiency gains over non-borrowing designs, including improved power, earlier stopping, and reduced expected sample size.
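
The basic borrowing mechanism of a power prior can be sketched under a normal approximation. The historical summary estimate enters as a likelihood raised to a power a0 in [0, 1], which inflates its variance to se_hist**2 / a0. The posterior probability of benefit then drives the interim decision. This is a simplification with loudly stated assumptions. It uses a flat initial prior and a single scalar effect, and it fixes a0, whereas a normalized power prior treats a0 as random. It also ignores the subgroup non-identifiability that the paper's anchoring construction is designed to handle. All function names and numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power_prior_posterior(est_cur, se_cur, est_hist, se_hist, a0):
    """Normal-approximation posterior for a treatment effect with a fixed power a0.

    The historical likelihood is raised to the power a0, so its precision is
    multiplied by a0: a0 = 0 ignores the historical study, a0 = 1 pools it fully.
    A flat initial prior on the effect is assumed.
    """
    prec_cur = 1.0 / se_cur**2
    prec_hist = a0 / se_hist**2
    prec = prec_cur + prec_hist
    mean = (prec_cur * est_cur + prec_hist * est_hist) / prec
    return mean, math.sqrt(1.0 / prec)


def prob_benefit(mean, sd, threshold=0.0):
    """Posterior probability that the effect exceeds `threshold`."""
    return 1.0 - normal_cdf((threshold - mean) / sd)


for a0 in (0.0, 0.5, 1.0):
    m, s = power_prior_posterior(0.20, 0.15, 0.30, 0.10, a0)
    print(a0, round(m, 3), round(s, 3), round(prob_benefit(m, s), 3))
```

Increasing a0 shrinks the posterior standard deviation and pulls the mean toward the historical estimate. This is the source of both the power gains and the sensitivity to historical bias that the simulation studies examine.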

2603.07563 2026-06-02 stat.ME

Robust Wasserstein barycenter

鲁棒Wasserstein重心

Zixiong Cheng, Hang Liu

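AI summary: Addressing the sensitivity of the classical Wasserstein barycenter to outliers, proposes a robust Wasserstein barycenter (RWB) based on robust optimal transport, proves its existence and consistency, and shows superior robustness in experiments on image processing and financial data analysis.

AI Chinese abstract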

本文解决了经典Wasserstein重心的一个基本局限性——其对异常值的敏感性。为了克服这些问题,我们基于鲁棒最优传输的最新概念提出了鲁棒Wasserstein重心(RWB)。为所提出的RWB建立了理论保证,包括存在性和一致性。通过在模拟和真实数据(包括图像处理和金融数据分析)上的大量数值实验,我们证明RWB相比经典Wasserstein重心具有更优越的鲁棒性。

English abstract

In this paper, we address a fundamental limitation of the classical Wasserstein barycenter: its sensitivity to outliers. To overcome this limitation, we propose the robust Wasserstein barycenter (RWB), based on the recently introduced concept of robust optimal transport. Theoretical guarantees, including existence and consistency, are established for the proposed RWB. Through extensive numerical experiments on both simulated and real-world data, including image processing and financial data analysis, we demonstrate that the RWB exhibits superior robustness compared with the classical Wasserstein barycenter.
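
The outlier sensitivity that motivates the RWB is easy to see in one dimension. There, the classical 2-Wasserstein barycenter has a closed form: its quantile function is the weighted average of the input quantile functions. For equal-size empirical samples, this is simply the average of the sorted samples. The sketch shows only this classical barycenter, with a hypothetical helper name. It does not implement the paper's robust-OT-based RWB.

```python
import numpy as np


def w2_barycenter_1d(samples, weights=None):
    """Classical 2-Wasserstein barycenter of equal-size 1-D empirical measures.

    Averages the sorted samples (i.e. the empirical quantile functions)
    with the given weights; equal weights by default.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    if weights is None:
        weights = np.full(len(sorted_samples), 1.0 / len(sorted_samples))
    return np.average(sorted_samples, axis=0, weights=weights)


clean = w2_barycenter_1d([[0, 1, 2], [2, 3, 4]])
contaminated = w2_barycenter_1d([[0, 1, 2], [2, 3, 1000]])
print(clean)
print(contaminated)
```

A single corrupted point moves the barycenter's largest support point from 3 to 501. Bounding this effect is what a robust formulation has to achieve.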

2506.09267 2026-06-02 math.ST stat.TH

Consistent Infill Estimability of the Regression Slope Between Gaussian Random Fields Under Spatial Confounding

空间混杂下高斯随机场之间回归斜率的填充一致可估计性

Abhirup Datta, Michael L. Stein

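AI summary: Characterizes when the regression slope between Gaussian random fields is consistently estimable under spatial confounding. It gives sufficient conditions based on smoothness or local behavior, constructs consistent estimators by local differencing, derives a necessary condition from spectral tail behavior, and obtains necessary and sufficient conditions for covariance families such as Matérn.

AI Chinese abstract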

在未观测空间过程混杂下,估计两个空间过程之间回归斜率参数的问题近期在统计文献中受到广泛关注。然而,一个基本问题仍未解决:在空间混杂下,该斜率何时可一致估计,现有见解主要是经验性的或特定于估计量的。我们刻画了空间混杂下高斯随机场(GRF,空间过程的常见随机模型)之间回归斜率一致可估计的条件。在固定域(填充)渐近下,我们根据暴露和混杂过程的平滑性或局部行为给出了一致可估计的充分条件。当可估计性成立时,我们使用局部差分法(对过程取适当阶数的离散差分或拉普拉斯算子)提供了斜率的一致估计量。利用Paley-Wiener空间上的泛函分析结果,我们根据混杂和暴露的相对谱尾衰减给出了斜率一致可估计的一个易于验证的必要条件。作为副产品,我们建立了关于具有不同平滑度分量场的多元GRF路径上测度等价的一个新颖且一般的谱条件。我们证明,对于许多协方差类,如Matérn、幂指数、广义柯西和协同区域化族,必要和充分条件变得相同,从而为这些过程的斜率一致可估计性提供了清晰的刻画。结果被推广到多元斜率、考虑测量误差、常见的非平稳高斯随机场和一些非高斯随机场,以及不规则设计。

English abstract

The problem of estimating the slope parameter in regression between two spatial processes under confounding by an unmeasured spatial process has received widespread attention in the recent statistical literature. Yet a fundamental question remains unresolved: when is this slope consistently estimable under spatial confounding? Existing insights are largely empirical or estimator-specific. We characterize conditions for consistent estimability of the regression slope between Gaussian random fields (GRFs), the common stochastic model for spatial processes, under spatial confounding. Under fixed-domain (infill) asymptotics, we give sufficient conditions for consistent estimability in terms of the smoothness or local behavior of the exposure and confounder processes. When estimability holds, we provide consistent estimators of the slope using local differencing (taking discrete differences or Laplacians of the processes of suitable order). Using functional analysis results on Paley-Wiener spaces, we then provide an easy-to-verify necessary condition for consistent estimability of the slope in terms of the relative spectral tail decays of the confounder and exposure. As a by-product, we establish a novel and general spectral condition on the equivalence of measures on the paths of multivariate GRFs with component fields of varying smoothness. We show that for many covariance classes, such as the Matérn, power-exponential, generalized Cauchy, and coregionalization families, the necessary and sufficient conditions become identical, thereby providing a sharp characterization of consistent estimability of the slope for these processes. The results are extended to multivariate slopes, to settings with measurement error, to popular classes of non-stationary Gaussian random fields and some non-Gaussian random fields, and to irregular designs.
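
The intuition behind the local-differencing estimators can be illustrated in a 1-D toy setting. When the exposure is rougher than the confounder, small-lag differences remove most of the confounder's contribution while keeping the exposure signal. The sketch below is an illustrative analogue and not the paper's estimator. Its assumptions are: a Brownian-path exposure on a grid, a confounder built as a moving average of the exposure (smooth and correlated with it), first differences as the "suitable order", and all constants.

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, beta = 5000, 100, 2.0

# Rough exposure: a discretized Brownian path on [0, 1].
X = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)
# Smooth confounder correlated with X: a centred moving average of X over 2h+1 points.
U = np.convolve(X, np.ones(2 * h + 1) / (2 * h + 1), mode="valid")
X = X[h:n - h]                     # align X with the centre of each averaging window
Y = beta * X + 5.0 * U             # outcome confounded by the unmeasured U


def ols_slope(x, y):
    """Naive least-squares slope of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def differenced_slope(x, y):
    """Slope from first differences; the smooth U contributes little to dY."""
    dx, dy = np.diff(x), np.diff(y)
    return float(dx @ dy / (dx @ dx))


print(ols_slope(X, Y))          # strongly biased by U
print(differenced_slope(X, Y))  # close to beta = 2
```

In this construction, the leftover bias of the differenced estimator is about 5/(2h+1), roughly 0.025. If the confounder were as rough as the exposure, differencing would no longer separate them, which mirrors the role of relative smoothness in the abstract's conditions.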

2603.01157 2026-06-02 q-fin.RM stat.ML

Adaptive Window Selection for Financial Risk Forecasting

金融风险预测的自适应窗口选择

Yinhuan Li, Chenxin Lyu, Ruodu Wang

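AI summary: Proposes bootstrap-based adaptive window selection (BAWS), a data-driven online learning method that dynamically determines the look-back window size to cope with structural changes in financial data and improve risk forecasting.

AI Chinese abstract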

金融监管和内部管理中的风险预测通过历史数据计算。金融数据未知的结构性变化对选择合适回溯窗口进行风险建模和预测构成重大挑战。我们开发了一种数据驱动的在线学习方法,称为基于自助法的自适应窗口选择(BAWS),该方法以顺序方式自适应地确定窗口大小。BAWS的核心是将实现得分与基于自助法的数据相关阈值进行比较。我们为自助法阈值提供了渐近合理性证明,涵盖了非光滑得分,如VaR检查损失和联合VaR-ES得分,并通过移动块自助法扩展到平稳弱依赖数据。单断点分析进一步表明,BAWS会拒绝跨越足够大断点的过长窗口。所提方法适用于可单独或联合诱导的风险度量预测,例如风险价值(VaR)以及VaR和相应预期损失(ES)对。通过模拟研究和实证分析,我们证明BAWS通常优于标准滚动窗口方法和最近开发的基于稳定性的自适应窗口选择方法,特别是在数据生成过程中存在结构性变化时。

English abstract

Risk forecasts in financial regulation and internal management are computed from historical data. Unknown structural changes in financial data pose a substantial challenge in selecting an appropriate look-back window for risk modeling and forecasting. We develop a data-driven online learning method, called bootstrap-based adaptive window selection (BAWS), that adaptively determines the window size in a sequential manner. A central component of BAWS is the comparison of realized scores against a data-dependent threshold obtained by the bootstrap. We provide an asymptotic justification for the bootstrap threshold, covering non-smooth scores such as the VaR check loss and the joint VaR-ES score, with an extension to stationary weakly dependent data via the moving block bootstrap. A single-break analysis further shows that BAWS rejects overlong windows crossing sufficiently large breaks. The proposed method is applicable to forecasting risk measures that are elicitable individually or jointly, such as the Value-at-Risk (VaR) and the pair of VaR and the corresponding Expected Shortfall (ES). Through simulation studies and an empirical analysis, we demonstrate that BAWS often improves upon the standard rolling-window approach and the recently developed method of stability-based adaptive window selection, especially when there are structural changes in the data-generating process.
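
The VaR check loss mentioned above is the standard quantile score, and comparing its average across look-back windows is the basic comparison that window selection automates. The sketch below scores historical-simulation VaR for several fixed windows on a common out-of-sample period. It does not implement BAWS itself: there is no bootstrap threshold and no sequential selection. The data-generating process, the break point and the window lengths are hypothetical.

```python
import numpy as np


def check_loss(var_forecast: float, loss: float, alpha: float) -> float:
    """Quantile (check) score of a VaR forecast at level alpha; lower is better.

    S(v, x) = (1{x <= v} - alpha) * (v - x) is non-negative and is a consistent
    scoring function for the alpha-quantile of the loss distribution.
    """
    return (float(loss <= var_forecast) - alpha) * (var_forecast - loss)


def rolling_var_score(losses, window: int, alpha: float, start: int) -> float:
    """Average check loss of historical-simulation VaR with a fixed look-back window.

    Every window is evaluated on the same out-of-sample period losses[start:].
    """
    scores = [
        check_loss(np.quantile(losses[t - window:t], alpha), losses[t], alpha)
        for t in range(start, len(losses))
    ]
    return float(np.mean(scores))


rng = np.random.default_rng(0)
# Hypothetical daily losses with a volatility break at t = 1000.
losses = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(0.0, 3.0, 1000)])
for window in (50, 250, 1000):
    print(window, rolling_var_score(losses, window, alpha=0.99, start=1000))
```

Because the check loss is not differentiable at the forecast, standard smooth-score bootstrap arguments do not apply directly. This is why the abstract highlights that its asymptotic justification covers such non-smooth scores.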

2602.24219 2026-06-02 math.ST stat.TH

Asymptotic theory for multiple samples with flexible random membership

灵活随机成员关系的多样本渐近理论

Ha-Young Shin

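AI summary: Addresses asymptotic theory for statistics of multiple samples when group membership is random, proposing a flexible framework that handles both deterministic and random membership, proving asymptotic properties, and applying the framework to stratified sampling.

Comments
12 pages
AI Chinese abstract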

一个统计量可以是多个样本的函数。当组别成员关系随机时,关于此类统计量的渐近理论的现有工作很少。我们提出了一个灵活框架,可以处理确定性和随机成员关系。我们证明了一些渐近性质,并将该框架应用于分层抽样背景。

English abstract

A statistic can be a function of multiple samples. There is little existing work on asymptotic theory for such statistics when group membership is random. We propose a flexible framework that can handle both deterministic and random membership. We prove some asymptotic properties and apply the framework to the stratified sampling context.

2509.12734 2026-06-02 math.ST stat.TH

A Statistical Test for Comparing the Linkage and Admixture Model Based on Central Limit Theorems

基于中心极限定理比较连锁模型与混合模型的统计检验

Carola Sophia Heinzel

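AI summary: Proves consistency and asymptotic normality of the maximum likelihood estimators of ancestry parameters in the Linkage Model and uses these results to construct an asymptotic level-α test for model selection between the Admixture Model and the Linkage Model, demonstrating its practicality on 1000 Genomes Project data.

AI Chinese abstract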

在混合模型中,个体在特定标记上携带某个等位基因的概率取决于$K$个祖先群体中的等位基因频率以及个体基因组源自这些群体的比例。标记被假设为独立的。连锁模型是一种隐马尔可夫模型(HMM),它通过纳入相邻位点之间的连锁来扩展混合模型。我们证明了连锁模型中个体祖先的最大似然估计(MLE)的一致性和渐近正态性,补充了先前由\citep{pfaff2004information, pfaffelhuber2022central, HEINZEL2025}针对混合模型的结果。这些结果被用于证明一种允许在混合模型和连锁模型之间进行模型选择的统计检验是渐近水平$\alpha$检验。最后,我们通过将该检验应用于1000基因组计划的真实数据,展示了我们结果的实际相关性。

English abstract

In the Admixture Model, the probability that an individual carries a certain allele at a specific marker depends on the allele frequencies in $K$ ancestral populations and the proportion of the individual's genome originating from these populations. The markers are assumed to be independent. The Linkage Model is a Hidden Markov Model (HMM) that extends the Admixture Model by incorporating linkage between neighboring loci. We prove consistency and asymptotic normality of maximum likelihood estimators (MLEs) for the ancestry of individuals in the Linkage Model, complementing earlier results by \citep{pfaff2004information, pfaffelhuber2022central, HEINZEL2025} for the Admixture Model. These results are used to prove that a statistical test that allows for model selection between the Admixture Model and the Linkage Model is an asymptotic level-$\alpha$ test. Finally, we demonstrate the practical relevance of our results by applying the test to real-world data from the 1000 Genomes Project.
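
The Admixture Model likelihood described in the first sentences can be written down directly. At each marker, the allele probability is the ancestry-weighted mix of population allele frequencies. Markers are independent, and a diploid genotype is a binomial count out of 2. The sketch below assumes known allele frequencies and K = 2, and estimates the ancestry proportion by grid search. It does not implement the Linkage Model (the HMM) or the paper's model-selection test. All names and simulated values are illustrative.

```python
import numpy as np


def admixture_loglik(q, genotypes, freqs):
    """Log-likelihood (up to a constant) of one individual's genotypes under the Admixture Model.

    q         : ancestry proportions, shape (K,), summing to 1
    genotypes : allele counts in {0, 1, 2}, shape (M,)
    freqs     : allele frequencies per ancestral population, shape (K, M)
    At marker m the allele probability is sum_k q_k * freqs[k, m]; markers are independent.
    """
    p = np.clip(q @ freqs, 1e-12, 1 - 1e-12)
    g = np.asarray(genotypes, dtype=float)
    return float(np.sum(g * np.log(p) + (2 - g) * np.log(1 - p)))


def mle_two_populations(genotypes, freqs, grid_size=1001):
    """Grid-search MLE of the first ancestry proportion when K = 2."""
    grid = np.linspace(0.0, 1.0, grid_size)
    ll = [admixture_loglik(np.array([a, 1 - a]), genotypes, freqs) for a in grid]
    return float(grid[int(np.argmax(ll))])


rng = np.random.default_rng(0)
M = 2000
freqs = rng.uniform(0.05, 0.95, size=(2, M))      # hypothetical allele frequencies
q_true = np.array([0.3, 0.7])
genotypes = rng.binomial(2, q_true @ freqs)       # simulated diploid genotypes
print(mle_two_populations(genotypes, freqs))
```

The Linkage Model replaces the independence assumption with a hidden Markov chain over local ancestry along the chromosome. The independent-marker likelihood above is the null model that such a test compares against.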

2602.23197 2026-06-02 cs.CL cs.LG stat.ML

Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

微调不忘上下文学习:线性注意力模型的理论分析

Chungpa Lee, Jy-yong Sohn, Kangwook Lee

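AI summary: Through a theoretical analysis of linear attention models, reveals how fine-tuning objectives modify attention parameters and under which conditions few-shot performance degrades, and shows that updating only the value matrix preserves in-context learning.

Journal ref
International Conference on Machine Learning (ICML) 2026
AI Chinese abstract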

基于Transformer的大型语言模型展现出上下文学习能力,能够通过少量示例提示适应下游任务。实践中,这类模型常被微调以提升下游任务的零样本性能,使其无需示例即可解决问题,从而降低推理成本。然而,微调可能削弱上下文学习能力,限制微调模型在未见任务上的表现。利用线性注意力模型,我们提供了理论分析,刻画了微调目标如何修改注意力参数,并识别了导致少样本性能下降的条件。我们表明,微调所有注意力参数会损害上下文学习,而仅更新值矩阵可在保持上下文学习的同时提升零样本性能。我们进一步证明,引入辅助的少样本损失主要增强目标任务的上下文学习,但以牺牲微调未见任务上的上下文学习能力为代价。我们提供了来自合成和真实数据集的实验证据,与理论定性预测一致。

English abstract

Transformer-based large language models exhibit in-context learning, enabling adaptation to downstream tasks via few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks, allowing them to solve tasks without examples and thereby reducing inference costs. However, fine-tuning can degrade in-context learning, limiting the performance of fine-tuned models on tasks not seen during fine-tuning. Using linear attention models, we provide a theoretical analysis that characterizes how fine-tuning objectives modify attention parameters and identifies conditions under which this leads to degraded few-shot performance. We show that fine-tuning all attention parameters can harm in-context learning, whereas restricting updates to the value matrix improves zero-shot performance while preserving in-context learning. We further show that incorporating an auxiliary few-shot loss enhances in-context learning primarily on the target task, at the expense of degraded in-context learning ability on tasks not seen during fine-tuning. We provide empirical evidence from synthetic and real-world datasets consistent with the qualitative predictions of our theory.
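
To make "attention parameters" and "value matrix" concrete, the sketch below computes the output of one linear-attention head (no softmax) at a query token. Demonstrations are embedded as tokens z_i = [x_i; y_i] and the query as [x_q; 0]. This parametrization, with separate W_K, W_Q and W_V, is a common one in the linear-attention in-context-learning literature. It is assumed here and is not necessarily the paper's exact model. The sketch performs no fine-tuning. It only shows that one simple parameter setting reproduces a one-step gradient-descent prediction for linear regression.

```python
import numpy as np


def linear_attention_readout(X, y, x_query, W_K, W_Q, W_V):
    """Output of one linear-attention head at the query token (no softmax).

    Demonstration tokens are z_i = [x_i; y_i] and the query token is [x_q; 0];
    the head returns (1/n) * sum_i (W_V z_i) * ((W_K z_i) . (W_Q z_q)).
    """
    n = X.shape[0]
    Z = np.hstack([X, y[:, None]])        # (n, d+1) demonstration tokens
    z_q = np.append(x_query, 0.0)         # (d+1,) query token
    keys, values = Z @ W_K.T, Z @ W_V.T
    return values.T @ (keys @ (W_Q @ z_q)) / n


rng = np.random.default_rng(0)
n, d = 32, 4
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)
x_q = rng.standard_normal(d)

# One illustrative setting: keys and queries read the x-part, the value reads y.
P_x = np.zeros((d + 1, d + 1))
P_x[:d, :d] = np.eye(d)
W_V = np.zeros((d + 1, d + 1))
W_V[d, d] = 1.0
pred = linear_attention_readout(X, y, x_q, P_x, P_x, W_V)[d]
print(pred)
```

With this setting, the last output coordinate equals x_q . (1/n) sum_i y_i x_i. That is the prediction after one gradient step, with unit step size from zero, on the squared loss (1/2n) sum_i (y_i - w . x_i)^2. The paper's result concerns which of W_K, W_Q and W_V fine-tuning is allowed to change.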

2602.22768 2026-06-02 stat.ME

Asymptotic Theory and Sequential Testing for Adaptive Bandits

自适应Bandits的渐近理论与序贯检验

Li Yang, Xiaodong Yan, Dandan Jiang

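AI summary: Proposes the Urn Bandit process, which combines urn probability models with multi-armed bandits, establishes a joint functional central limit theorem under non-i.i.d. reward sequences, and develops asymptotic theory for sequential testing that maintains testing performance while improving reward accumulation.

AI Chinese abstract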

多臂老虎机(MAB)过程构成了强化学习问题的一个基础子类,并且是统计决策理论中的一个核心主题。然而,在自适应分配下进行有效的序贯检验仍然具有挑战性,原因在于非独立同分布奖励序列下缺乏渐近理论,以及某些臂的样本量呈次线性增长。为了解决这一开放性挑战,我们提出了一种瓮Bandit(UNB)过程,将瓮概率模型的强化机制与MAB原理相结合,确保分配比例几乎必然收敛到最优臂。我们在非独立同分布奖励序列、非次高斯尾部以及成对跨臂依赖的情况下,为期望奖励的一致估计量建立了一个联合泛函中心极限定理(FCLT)。为了克服现有方法主要关注累积遗憾、因此仅提供算法性能保证而不支持有效序贯检验的局限性,我们在所提出的UNB过程下发展了序贯检验统计量的渐近理论。由此产生的框架支持广泛的序贯推断程序,例如A/B测试和政策评估。模拟研究和真实数据分析表明,UNB在保持与等随机化(ER)设计相当的检验性能的同时,相对于ER实现了更好的奖励累积。

English abstract

Multi-armed bandit (MAB) processes constitute a foundational subclass of reinforcement learning problems and represent a central topic in statistical decision theory. Yet, conducting valid sequential testing under adaptive allocation remains challenging due to the lack of asymptotic theory under non-i.i.d. reward sequences and sublinear sample sizes for some arms. To address this open challenge, we propose an Urn Bandit (UNB) process to integrate the reinforcement mechanism of urn probabilistic models with MAB principles, ensuring almost sure concentration of allocation proportions on optimal arms. We establish a joint functional central limit theorem (FCLT) for consistent estimators of expected rewards under non-i.i.d. reward sequences with non-sub-Gaussian tails and pairwise cross-arm dependence. To overcome the limitations of existing methods that focus mainly on cumulative regret and therefore provide only algorithmic performance guarantees without supporting valid sequential testing, we develop an asymptotic theory for sequential test statistics under the proposed UNB process. The resulting framework enables a broad class of sequential inference procedures, such as A/B testing and policy evaluation. Simulation studies and real data analysis demonstrate that UNB maintains testing performance comparable to that of the equal randomization (ER) design while achieving improved reward accumulation relative to ER.
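
The urn-reinforcement idea can be illustrated with a classic success-only Pólya urn for Bernoulli arms. Each round, an arm is drawn with probability proportional to its ball count, and a ball of that arm's colour is added only when the pull succeeds. This rule is an assumption chosen for illustration. It is not the paper's UNB process, and the sketch omits the FCLT-based sequential test. The arm probabilities and horizon are hypothetical.

```python
import numpy as np


def polya_urn_bandit(success_probs, horizon: int, rng: np.random.Generator):
    """Success-only Pólya urn allocation for a Bernoulli multi-armed bandit.

    Each round an arm is drawn with probability proportional to its ball count,
    and one ball of that arm's colour is added only if the pull succeeds.
    Returns the pull counts and the per-arm sample-mean rewards.
    """
    k = len(success_probs)
    balls = np.ones(k)                     # one initial ball per arm
    pulls = np.zeros(k, dtype=int)
    rewards = np.zeros(k)
    for _ in range(horizon):
        arm = rng.choice(k, p=balls / balls.sum())
        success = rng.random() < success_probs[arm]
        pulls[arm] += 1
        rewards[arm] += success
        balls[arm] += success              # reinforce on success only
    means = np.divide(rewards, pulls, out=np.full(k, np.nan), where=pulls > 0)
    return pulls, means


pulls, means = polya_urn_bandit([0.8, 0.2], horizon=5000, rng=np.random.default_rng(0))
print("allocation shares:", pulls / pulls.sum())
print("estimated rewards:", means)
```

The per-arm sample means here come from adaptively collected data. The inferior arm is sampled only a sublinear number of times, and the observations are not i.i.d. These are exactly the features that, as the abstract notes, prevent standard asymptotics from supporting valid sequential tests.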

2508.01973 2026-06-02 math.ST stat.ME stat.TH

A New Class of Asymptotically Distribution-Free Smooth Tests

一类新的渐近分布自由光滑检验

Xiangyu Zhang, Sara Algeri

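AI summary: Uses recent developments in empirical process theory to construct a new class of asymptotically distribution-free smooth tests that keep this property even under parameter estimation, model selection, and moderate sample sizes, and discusses a computationally efficient alternative to the parametric bootstrap.

AI Chinese abstract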

本文展示了经验过程理论的最新发展如何使我们能够构建一类新的渐近分布自由光滑检验。即使当参数被估计、进行模型选择且样本量仅为中等大小时,它们的分布自由性质仍然得以保持。还讨论了一种计算高效的经典参数自助法的替代方案。

English abstract

This article demonstrates how recent developments in the theory of empirical processes allow us to construct a new family of asymptotically distribution-free smooth tests. Their distribution-free property is preserved even when the parameters are estimated, model selection is performed, and the sample size is only moderately large. A computationally efficient alternative to the classical parametric bootstrap is also discussed.
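
The classical starting point for smooth tests is Neyman's statistic with a fully specified null. Probability-integral-transform values u_i are expanded in orthonormal shifted Legendre polynomials. Under the null, the sum of squared normalized components is asymptotically chi-squared. When parameters are estimated, this limit generally no longer holds, which is the gap the article's construction addresses. The sketch covers only the classical, fully specified case. The choice of four components and the simulated data are assumptions.

```python
import numpy as np


def legendre_basis(u: np.ndarray) -> np.ndarray:
    """First four orthonormal shifted Legendre polynomials on [0, 1], shape (4, len(u))."""
    return np.stack([
        np.sqrt(3) * (2 * u - 1),
        np.sqrt(5) * (6 * u**2 - 6 * u + 1),
        np.sqrt(7) * (20 * u**3 - 30 * u**2 + 12 * u - 1),
        3 * (70 * u**4 - 140 * u**3 + 90 * u**2 - 20 * u + 1),
    ])


def neyman_smooth_statistic(u: np.ndarray) -> float:
    """Neyman's smooth statistic sum_j (n^{-1/2} sum_i phi_j(u_i))^2 for PIT values u.

    Under a fully specified null, u ~ Uniform(0, 1) and the statistic is
    asymptotically chi-squared with 4 degrees of freedom.
    """
    components = legendre_basis(u).sum(axis=1) / np.sqrt(len(u))
    return float(np.sum(components**2))


rng = np.random.default_rng(0)
u_null = rng.uniform(size=2000)          # PIT values when the hypothesized model is correct
u_alt = rng.uniform(size=2000) ** 3      # PIT values under a misspecified model
print(neyman_smooth_statistic(u_null), neyman_smooth_statistic(u_alt))
```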

2602.19126 2026-06-02 cs.LG math.PR math.ST stat.TH

Robust Predictive Uncertainty and Double Descent in Contaminated Bayesian Random Features

污染贝叶斯随机特征中的鲁棒预测不确定性与双重下降

Michele Caprio, Katerina Papagiannouli, Siu Lun Chau, Sayan Mukherjee

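AI summary: Proposes a robust Bayesian random feature regression method that handles prior and likelihood misspecification via Huber contamination sets, derives lower and upper bounds on the posterior predictive density, introduces imprecise highest density regions for robust uncertainty quantification, and shows that predictive uncertainty remains computationally tractable and inherits the classical double-descent phase structure.

AI Chinese abstract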

我们提出了一种随机特征(RF)回归的鲁棒贝叶斯公式,通过Huber风格的污染集明确考虑先验和似然的误设。从岭正则化RF训练与高斯先验和似然的贝叶斯推断之间的经典等价性出发,我们分别用ε-和η-污染信度集替换单一先验和似然,并使用悲观广义贝叶斯更新进行推断。我们推导出所得后验预测密度的下界和上界的显式且可处理的界限。这些界限表明,当污染适中时,先验和似然模糊性有效地直接污染后验预测分布,产生围绕经典高斯预测的不确定性包络。我们引入了一个不精确最高密度区域(IHDR)用于鲁棒预测不确定性量化,并证明它可以通过调整的高斯可信区间进行有效近似。我们进一步获得了预测方差界限(在温和截断近似下得到上界),并证明它们保留了RF模型已知的领先阶比例增长渐近性。这些结果共同建立了贝叶斯随机特征的鲁棒性理论:预测不确定性保持计算可行性,继承经典的双重下降相位结构,并在有界先验和似然误设下通过显式最坏情况保证得到改进。

English abstract

We propose a robust Bayesian formulation of random feature (RF) regression that accounts explicitly for prior and likelihood misspecification via Huber-style contamination sets. Starting from the classical equivalence between ridge-regularized RF training and Bayesian inference with Gaussian priors and likelihoods, we replace the single prior and likelihood with $\varepsilon$- and $\eta$-contaminated credal sets, respectively, and perform inference using pessimistic generalized Bayesian updating. We derive explicit and tractable bounds for the resulting lower and upper posterior predictive densities. These bounds show that, when contamination is moderate, prior and likelihood ambiguity effectively acts as a direct contamination of the posterior predictive distribution, yielding uncertainty envelopes around the classical Gaussian predictive. We introduce an Imprecise Highest Density Region (IHDR) for robust predictive uncertainty quantification and show that it admits an efficient approximation via an adjusted Gaussian credible interval. We further obtain predictive variance bounds (under a mild truncation approximation for the upper bound) and prove that they preserve the leading-order proportional-growth asymptotics known for RF models. Together, these results establish a robustness theory for Bayesian random features: predictive uncertainty remains computationally tractable, inherits the classical double-descent phase structure, and is improved by explicit worst-case guarantees under bounded prior and likelihood misspecification.
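
The classical equivalence the formulation starts from can be checked numerically. Ridge-regularized RF regression with penalty lambda = noise_var / prior_var gives the same weights as the posterior mean under a Gaussian prior w ~ N(0, prior_var I) and a Gaussian likelihood. The posterior also provides the Gaussian predictive mean and variance around which the contaminated envelopes are built. The sketch uses random Fourier features (approximating a Gaussian RBF kernel), and the data and variance values are illustrative assumptions. It does not implement the contaminated credal sets or the IHDR.

```python
import numpy as np


def random_fourier_features(x, W, b):
    """Random Fourier features phi(x) = sqrt(2/D) * cos(x W + b)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(x @ W + b)


rng = np.random.default_rng(0)
n, d, D = 50, 1, 200
x = rng.uniform(-3, 3, size=(n, d))
y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(n)
W = rng.standard_normal((d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Phi = random_fourier_features(x, W, b)

noise_var, prior_var = 0.01, 1.0
lam = noise_var / prior_var                  # ridge penalty implied by the Gaussian model

# Ridge-regularized RF training.
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ y)

# Bayesian posterior with prior w ~ N(0, prior_var I) and Gaussian likelihood.
post_cov = np.linalg.inv(Phi.T @ Phi / noise_var + np.eye(D) / prior_var)
w_post = post_cov @ Phi.T @ y / noise_var

# Gaussian posterior predictive at new inputs (variance includes observation noise).
x_new = np.linspace(-3, 3, 5)[:, None]
Phi_new = random_fourier_features(x_new, W, b)
pred_mean = Phi_new @ w_post
pred_var = noise_var + np.einsum("ij,jk,ik->i", Phi_new, post_cov, Phi_new)
```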

2602.16794 2026-06-02 stat.ML cs.LG

Beyond Procedure: Substantive Fairness in Conformal Prediction

Pengqi Liu, Zijun Yu, Mouloud Belbahri, Arthur Charpentier, Masoud Asgharian, Jesse C. Cresswell

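AI summary: Using a theoretical decomposition and LLM-assisted evaluation, this paper studies how label-clustered conformal prediction balances utility and substantive fairness, and finds that equalizing set sizes improves fairness more than equalizing coverage.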

Comments
Camera-ready version. Accepted at ICML 2026
English abstract

Conformal prediction (CP) offers distribution-free uncertainty quantification for machine learning models, yet its interplay with fairness in downstream decision-making remains underexplored. Moving beyond CP as a standalone operation (procedural fairness), we analyze the holistic decision-making pipeline to evaluate substantive fairness, i.e., the equity of downstream outcomes. Theoretically, we derive an upper bound that decomposes prediction-set size disparity into interpretable components, clarifying how label-clustered CP helps control method-driven contributions to unfairness. To facilitate scalable empirical analysis, we introduce an LLM-in-the-loop evaluator that approximates human assessment of substantive fairness across diverse modalities. Our experiments show that label-clustered CP often provides a favorable balance between utility and substantive fairness, while reducing set-size disparities in line with our theory. Finally, we empirically show that equalized set sizes, rather than coverage, strongly correlate with improved substantive fairness, enabling practitioners to design fairer CP systems. Our code is available at https://github.com/layer6ai-labs/llm-in-the-loop-conformal-fairness.
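
Label-clustered CP calibrates one conformal threshold per cluster of labels instead of a single global threshold. A minimal standard-library sketch, assuming the nonconformity score $1 - \hat p(y \mid x)$ and a hypothetical label-to-cluster map `cluster_of`; the paper's clustering procedure and its LLM-in-the-loop evaluator are not reproduced. The helper `set_size_gap` reports the difference in average set size between two groups, the quantity the abstract links to substantive fairness.

```python
import math
from collections import defaultdict

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else sorted(scores)[rank - 1]

def clustered_thresholds(cal_probs, cal_labels, cluster_of, alpha):
    """One threshold per label cluster, with score = 1 - p(true label | x)."""
    by_cluster = defaultdict(list)
    for probs, y in zip(cal_probs, cal_labels):
        by_cluster[cluster_of[y]].append(1 - probs[y])
    return {c: conformal_quantile(s, alpha) for c, s in by_cluster.items()}

def prediction_set(probs, cluster_of, thresholds):
    """Include label y when its score is within its own cluster's threshold."""
    return [y for y, p in enumerate(probs)
            if 1 - p <= thresholds.get(cluster_of[y], math.inf)]

def set_size_gap(sets_a, sets_b):
    """Difference in mean prediction-set size between two groups of inputs."""
    mean_size = lambda sets: sum(len(s) for s in sets) / len(sets)
    return mean_size(sets_a) - mean_size(sets_b)

# Toy calibration data: labels 0 and 1 share cluster "a"; label 2 is cluster "b".
cluster_of = {0: "a", 1: "a", 2: "b"}
cal_probs = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8],
             [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
cal_labels = [0, 1, 2, 0, 2]
thresholds = clustered_thresholds(cal_probs, cal_labels, cluster_of, alpha=0.5)
print(prediction_set([0.7, 0.2, 0.1], cluster_of, thresholds))
```

Pooling labels into clusters trades the per-label coverage guarantee of fully class-conditional CP for more calibration data per threshold.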

2601.11229 2026-06-02 stat.ME

ThSQCA: Threshold-Sweep Qualitative Comparative Analysis in R

Yuki Toyoda

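AI summary: Presents the ThSQCA R package, which automates threshold-sweep analysis by treating thresholds as explicit analytical variables, addressing the sensitivity of QCA results to threshold choice; it provides four sweep functions and integrates with the QCA package.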

Comments
27 pages, 2 figures, 7 tables. R package available on CRAN (https://cran.r-project.org/package=ThSQCA). v5: package renamed from TSQCA to ThSQCA (v2.0.0, now available on CRAN); updated all URLs and version numbers
English abstract

Qualitative Comparative Analysis (QCA) requires researchers to choose calibration and dichotomization thresholds, and these choices can substantially affect truth tables, minimization, and resulting solution formulas. Despite this dependency, threshold sensitivity is often examined only in an ad hoc manner because repeated analyses are time-intensive and error-prone. We present ThSQCA, an R package that automates threshold-sweep analyses by treating thresholds as explicit analytical variables. It provides four sweep functions (otSweep, ctSweepS, ctSweepM, dtSweep) to explore outcome thresholds, single-condition thresholds, multi-condition threshold grids, and joint outcome-condition threshold spaces, respectively. ThSQCA integrates with the established CRAN package QCA for truth table construction and Boolean minimization, while returning structured S3 objects with consistent print/summary methods and optional detailed results. The package also supports automated Markdown report generation and configuration-chart output to facilitate reproducible documentation of cross-threshold results.
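
The core of a threshold sweep (re-running the same analysis at every candidate threshold and recording how the result changes) can be illustrated outside R. The Python below is a hypothetical sketch, not the ThSQCA API: the package's `otSweep`, `ctSweepS`, `ctSweepM`, and `dtSweep` delegate truth-table construction and Boolean minimization to the QCA package, whereas this toy only tracks the crisp-set consistency of one sufficiency claim, X implies Y, as the outcome threshold varies.

```python
def dichotomize(values, threshold):
    """Crisp-set membership: 1 if the raw value reaches the threshold."""
    return [1 if v >= threshold else 0 for v in values]

def sufficiency_consistency(x, y):
    """Share of cases with condition X that also show outcome Y."""
    with_x = [yi for xi, yi in zip(x, y) if xi == 1]
    return sum(with_x) / len(with_x) if with_x else float("nan")

def outcome_threshold_sweep(cond_raw, out_raw, cond_thr, out_thresholds):
    """Consistency of X => Y for each candidate outcome threshold."""
    x = dichotomize(cond_raw, cond_thr)
    return {t: sufficiency_consistency(x, dichotomize(out_raw, t))
            for t in out_thresholds}

cond = [2, 5, 7, 8, 9, 3]
outcome = [1, 4, 6, 9, 5, 2]
print(outcome_threshold_sweep(cond, outcome, cond_thr=5, out_thresholds=[3, 5, 7]))
```

Here the same sufficiency claim moves from fully consistent to clearly inconsistent as the outcome threshold rises from 3 to 7, which is exactly the kind of sensitivity a sweep makes visible.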

2602.13906 2026-06-02 stat.ML cs.LG

How Accurately Can a Gaussian Approximate Stochastic Approximation Iterates?

Shaan Ul Haque, Zedong Wang, Zixuan Zhang, Siva Theja Maguluri

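AI summary: Approximates the finite-time distribution of stochastic approximation iterates by a sequence of Gaussians with recursively defined covariance and gives explicit Wasserstein-1 bounds, which yield tail bounds on the error and a convergence rate for asymptotic normality.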

Comments
63 pages, 6 figures
English abstract

Stochastic approximation (SA) is a method for finding the root of an operator perturbed by noise. The focus of this paper is studying the distribution of SA iterates in finite time. In general, it is not possible to characterize the exact distribution, and therefore our goal is to find an approximation which can yield useful tail bounds. Inspired by the rich literature on the asymptotic normality of rescaled SA iterates, we approximate the pre-limit distributions by a sequence of Gaussians whose covariance is recursively defined. In particular, we establish explicit bounds on the Wasserstein-1 distance between the rescaled iterate at time $k$ and the aforementioned Gaussian for various choices of step-sizes. Since these covariances converge to the classical asymptotic limit, our analysis also provides a convergence rate for asymptotic normality as a by-product. As an immediate consequence of our bounds, we obtain tail bounds on the error of SA iterates at any time. Finally, we establish the sharpness of our rates by providing matching lower bounds and validate our findings through simulations. We obtain the sharp rates by first studying the convergence rate of the discrete Ornstein-Uhlenbeck (O-U) process driven by general noise, whose stationary distribution is identical to the limiting Gaussian distribution of the rescaled SA iterates. We believe that this is of independent interest, given its connection to sampling literature. The analysis involves adapting Stein's method for Gaussian approximation to handle the matrix weighted sum of i.i.d. random variables. The desired finite-time bounds for SA are obtained by characterizing the error dynamics between the rescaled SA iterate and the discrete time O-U process and combining it with the convergence rate of the latter process.
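
In the simplest case, scalar linear SA, the error variance obeys an exact recursion, and the Gaussian with that variance is a natural finite-time approximation even when the noise is not Gaussian. The sketch assumes $x_{k+1} = x_k - \alpha_k(a x_k - w_k)$ with root $x^* = 0$, step sizes $\alpha_k = 0.5/(k+1)^{0.6}$, and Rademacher noise; these are illustrative choices, and the paper's covariance recursion for general SA may differ. The rescaled variance $v_k/\alpha_k$ should approach the classical limit $\sigma^2/(2a)$.

```python
import random

def step(k, a0=0.5, xi=0.6):
    """Illustrative step size alpha_k = a0 / (k + 1)**xi."""
    return a0 / (k + 1) ** xi

def variance_recursion(n_steps, a=1.0, sigma2=1.0):
    """Exact variance of x_n for x_{k+1} = x_k - alpha_k (a x_k - w_k),
    started at x_0 = 0, with i.i.d. zero-mean noise w_k of variance sigma2."""
    v = 0.0
    for k in range(n_steps):
        al = step(k)
        v = (1 - al * a) ** 2 * v + al ** 2 * sigma2
    return v

def empirical_variance(n_steps, n_paths, a=1.0, seed=0):
    """Monte Carlo variance of x_n under non-Gaussian (Rademacher) noise."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        x = 0.0
        for k in range(n_steps):
            x -= step(k) * (a * x - rng.choice((-1.0, 1.0)))
        finals.append(x)
    m = sum(finals) / n_paths
    return sum((f - m) ** 2 for f in finals) / (n_paths - 1)

n = 100_000
print(variance_recursion(n) / step(n), "vs sigma^2 / (2a) = 0.5")
print(empirical_variance(200, 2000), variance_recursion(200))
```

The second line compares the recursion with simulated paths: the variances agree up to Monte Carlo error, while the paper's Wasserstein bounds quantify how far the full distribution is from Gaussian.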

2510.06028 2026-06-02 cs.LG stat.ML

Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime

Andreas Maurer, Erfan Mirzaei, Massimiliano Pontil

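AI summary: In the overparameterized interpolation regime, derives data-dependent expected-error bounds showing that generalization at low temperature is already signaled by small training error at high temperature; the bounds remain stable when approximated with Langevin Monte Carlo and, on MNIST, CIFAR-10, and SVHN, give nontrivial predictions close to the test error for correctly labeled data.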

English abstract

This paper provides data-dependent bounds on the expected error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such as random labels in classification. The results show that generalization in the low-temperature regime is already signaled by small training errors in the noisier high-temperature regime. The bounds are stable under approximation with Langevin Monte Carlo algorithms. The analysis motivates the design of an algorithm to compute bounds, which on the MNIST, CIFAR-10, and SVHN datasets yield nontrivial, close predictions on the test error for true labeled data, while maintaining a correct upper bound on the test error for random labels.
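
The Gibbs algorithm samples weights from $\pi(w) \propto \exp(-\beta \hat L(w))\,\pi_0(w)$, where $\beta$ is the inverse temperature, and Langevin Monte Carlo approximates such samples with noisy gradient steps. A minimal sketch of the unadjusted Langevin algorithm on a one-parameter toy loss, assuming a standard normal prior $\pi_0$ so that the exact Gibbs mean is known in closed form; it does not implement the paper's bound computation.

```python
import random

def ula_gibbs_samples(grad_loss, beta, w0=0.0, eta=0.01, n=50_000, burn=1_000, seed=0):
    """Unadjusted Langevin algorithm targeting the Gibbs density
    pi(w) ∝ exp(-beta * L(w) - w**2 / 2), i.e. a standard normal prior."""
    rng = random.Random(seed)
    w, out = w0, []
    for t in range(n + burn):
        drift = beta * grad_loss(w) + w   # gradient of the potential beta*L + w^2/2
        w = w - eta * drift + (2 * eta) ** 0.5 * rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(w)
    return out

# Toy training loss L(w) = (w - m)^2 / 2 with m = 1.
m, beta = 1.0, 10.0
samples = ula_gibbs_samples(lambda w: w - m, beta)
# Here the exact Gibbs density is Gaussian with mean beta*m/(beta+1).
print(sum(samples) / len(samples), beta * m / (beta + 1))
```

Raising `beta` lowers the temperature and concentrates the samples near the training-loss minimizer; the step size `eta` controls the discretization bias of the chain.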

2602.10056 2026-06-02 cs.LG stat.ML

WildCat: Near-Linear Attention in Theory and Practice

Tobias Schröder, Lester Mackey

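AI summary: Proposes WildCat, which selects a weighted coreset with the randomly pivoted Cholesky algorithm to approximate exact attention in near-linear time with super-polynomial error decay, and applies it to image generation, image classification, and language-model KV cache compression.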

English abstract

We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. WildCat avoids these quadratic costs by only attending over a small weighted coreset. Crucially, we select the coreset using a fast but spectrally-accurate subsampling algorithm -- randomly pivoted Cholesky -- and weight the elements optimally to minimise reconstruction error. Remarkably, given bounded inputs, WildCat approximates exact attention with super-polynomial $O(n^{-\sqrt{\log(\log(n))}})$ error decay while running in near-linear $O(n^{1+o(1)})$ time. In contrast, prior practical approximations either lack error guarantees or require quadratic runtime to guarantee such high fidelity. We couple this advance with a GPU-optimized PyTorch implementation and a suite of benchmark experiments demonstrating the benefits of WildCat for image generation, image classification, and language model KV cache compression.
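
Randomly pivoted Cholesky is a published algorithm for low-rank approximation of positive semidefinite matrices: each pivot is drawn with probability proportional to the current residual diagonal. The NumPy sketch below implements only that selection step. How WildCat then weights the selected elements and assembles the attention output is not reproduced, and applying the routine to an exponential key-key kernel in the usage lines is an illustrative assumption.

```python
import numpy as np

def rp_cholesky(K, k, seed=0):
    """Randomly pivoted Cholesky: factor F (n x k) with K ≈ F @ F.T.

    Pivots are sampled proportionally to the residual diagonal, so columns
    that are still poorly approximated are the most likely to be picked.
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    F = np.zeros((n, k))
    d = np.diag(K).astype(float)              # residual diagonal (a copy)
    pivots = []
    for i in range(k):
        total = d.sum()
        if total <= 0:                        # K is already reproduced exactly
            return F[:, :i], pivots
        s = rng.choice(n, p=d / total)
        g = K[:, s] - F[:, :i] @ F[s, :i]     # residual column at pivot s
        F[:, i] = g / np.sqrt(g[s])
        d = np.clip(d - F[:, i] ** 2, 0.0, None)
        pivots.append(int(s))
    return F, pivots

rng = np.random.default_rng(1)
keys = rng.normal(size=(200, 8)) / np.sqrt(8)
K = np.exp(keys @ keys.T / np.sqrt(8))        # exponential (softmax) kernel, PSD
F, coreset = rp_cholesky(K, k=20)
print(len(coreset), np.linalg.norm(K - F @ F.T) / np.linalg.norm(K))
```

Each step costs one kernel column, so selecting $k$ pivots touches only $O(nk)$ kernel entries, which is what makes this style of coreset selection cheap compared with forming the full $n \times n$ matrix.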

2602.09651 2026-06-02 stat.ML cs.LG

The Entropic Signature of Class Speciation in Diffusion Models

Florian Handke, Dejan Stančević, Felix Koulischer, Thomas Demeester, Luca Ambrogioni

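AI summary: Detects semantic transition intervals in diffusion models by tracking the class-conditional entropy of a latent semantic variable, and validates the approach on Gaussian mixture models and trained models.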

Comments
Accepted at International Conference on Machine Learning (ICML) 2026
English abstract

Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical work attributes this transition to dynamical instabilities along class-separating directions, but practical methods to detect and exploit these windows in trained models are still limited. We show that tracking the class-conditional entropy of a latent semantic variable given the noisy state provides a reliable signature of these transition regimes. By restricting the entropy to semantic partitions, the entropy can furthermore resolve semantic decisions at different levels of abstraction. We analyze this behavior in high-dimensional Gaussian mixture models and show that the entropy rate concentrates on the same logarithmic time scale as the speciation symmetry-breaking instability previously identified in variance-preserving diffusion. We validate our method on EDM2-XS and Stable Diffusion 1.5, where class-conditional entropy consistently isolates the noise regimes critical for semantic structure formation. Finally, we use our framework to quantify how guidance redistributes semantic information over time. Together, these results connect information-theoretic and statistical physics perspectives on diffusion and provide a principled basis for time-localized control.
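
In a two-class Gaussian mixture the class posterior given a noisy sample has a closed form, so the class-conditional entropy $H(C \mid x_t)$ can be estimated directly. A minimal 1-D sketch, assuming a variance-preserving forward process $x_t = \sqrt{\bar\alpha}\,x_0 + \sqrt{1-\bar\alpha}\,\varepsilon$ and two well-separated, equally likely classes; it illustrates the move from $\log 2$ (ambiguity) to nearly zero (commitment), not the paper's estimator for trained networks such as EDM2-XS or Stable Diffusion 1.5.

```python
import math
import random

def class_posterior(x, alpha_bar, means, s2):
    """p(c | x_t) for equal-weight classes N(mu_c, s2) after the VP forward map."""
    scale, var = math.sqrt(alpha_bar), alpha_bar * s2 + 1 - alpha_bar
    logs = [-(x - scale * mu) ** 2 / (2 * var) for mu in means]
    top = max(logs)                              # stabilize the softmax
    w = [math.exp(l - top) for l in logs]
    z = sum(w)
    return [wi / z for wi in w]

def conditional_entropy(alpha_bar, means=(-3.0, 3.0), s2=0.25, n=20_000, seed=0):
    """Monte Carlo estimate of H(C | X_t) in nats."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        mu = rng.choice(means)
        x0 = rng.gauss(mu, math.sqrt(s2))
        xt = math.sqrt(alpha_bar) * x0 + math.sqrt(1 - alpha_bar) * rng.gauss(0.0, 1.0)
        p = class_posterior(xt, alpha_bar, means, s2)
        total -= sum(pi * math.log(pi) for pi in p if pi > 0)
    return total / n

for ab in (1e-4, 0.05, 0.2, 0.5, 0.99):   # from high noise to low noise
    print(ab, round(conditional_entropy(ab), 3))
```

Plotting this curve against log-SNR shows where the entropy drops fastest, which is the kind of narrow transition window the abstract describes.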

2602.03970 2026-06-02 stat.ML cs.LG cs.NE math.MG math.ST stat.TH

Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits

Anastasis Kratsios, Giulia Livieri, A. Martina Neuman

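AI summary: For reasoning probes on looped Boolean circuits, uses graph convolutional networks and metric embedding techniques to prove that the worst-case generalization error decays at the optimal rate, and that this rate is independent of the size of the computation graph.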

English abstract

We study the statistical behavior of reasoning probes in a stylized model of iterative computation inspired by neural algorithmic reasoning. The underlying computation is given by a looped Boolean circuit whose graph is a perfect $ν$-ary tree ($ν\ge 2$), with outputs recursively fed back as inputs across computation rounds. A probe observes a sampled subset of internal nodes and seeks to infer the latent operation at each node, represented as a probability distribution over a finite set of admissible Boolean gates. This partial observability induces a transductive generalization problem on a structured computation graph. We show that when the probe is parameterized by a graph convolutional network and queries $N$ nodes, the worst-case generalization error decays at the optimal rate $\mathcal{O}(\sqrt{\log(2/δ)}/\sqrt{N})$ with probability at least $1-δ$. Our analysis combines metric embedding techniques with tools from optimal transport. A key insight is that this rate is achievable independently of the size of the computation graph, enabled by a low-distortion one-dimensional snowflake embedding of the induced graph metric. These results highlight a geometric mechanism underlying statistical efficiency in probing structured, iterative computations.

2507.02552 2026-06-02 math.ST stat.ME stat.TH

Covariance scanning for adaptively optimal change point detection in high-dimensional linear models

Haeran Cho, Housen Li

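AI summary: Studies the detection and estimation of a single change point in high-dimensional linear models, achieving adaptively optimal detection in both the sparse and the dense regimes via covariance scanning.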

English abstract

This paper investigates the detection and estimation of a single change in high-dimensional linear models. We derive minimax lower bounds for the detection boundary and the estimation rate, which uncover a phase transition governed by the sparsity of the covariance-weighted differential parameter. This form of "inherent sparsity" captures a delicate interplay between the covariance structure of the regressors and the change in regression coefficients on the detectability of a change point. Complementing the lower bounds, we introduce two covariance scanning-based methods, McScan and QcScan, which achieve minimax optimal performance (up to possible logarithmic factors) in the sparse and the dense regimes, respectively. In particular, QcScan is the first method shown to achieve consistency in the dense regime; further, we devise a combined procedure which is adaptively minimax optimal across sparse and dense regimes without knowledge of the sparsity. Computationally, covariance scanning-based methods avoid costly computation of Lasso-type estimators and attain worst-case computation complexity that is linear in the dimension and sample size. Additionally, we consider the post-detection estimation of the differential parameter and the refinement of the change point estimator. Simulation studies support the theoretical findings and demonstrate the computational and statistical efficiency of the proposed covariance scanning methods.
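
One way to read "covariance scanning": for each candidate split $k$, compare the segment averages of $x_t y_t$, which estimate $\Sigma\beta$ on each side, so their difference targets the covariance-weighted differential parameter $\Sigma(\beta_1 - \beta_0)$ without fitting a Lasso. The NumPy sketch computes a max-norm (sparse-oriented) and a squared-norm (dense-oriented) scan over all splits with prefix sums, in $O(np)$ total time. The statistics, CUSUM scaling, and names are assumptions for illustration, not the definitions of McScan or QcScan, and the quadratic version omits the bias correction a dense-regime test would need.

```python
import numpy as np

def covariance_scan(X, y, min_seg=10):
    """CUSUM-type scan of segment means of x_t * y_t over all splits k."""
    n, _ = X.shape
    csum = np.cumsum(X * y[:, None], axis=0)     # prefix sums of x_t y_t
    ks = np.arange(min_seg, n - min_seg + 1)     # size of the left segment
    left = csum[ks - 1] / ks[:, None]
    right = (csum[-1] - csum[ks - 1]) / (n - ks)[:, None]
    diff = np.sqrt(ks * (n - ks) / n)[:, None] * (left - right)
    return ks, np.abs(diff).max(axis=1), (diff ** 2).sum(axis=1)

rng = np.random.default_rng(0)
n, p, tau = 300, 20, 150
X = rng.normal(size=(n, p))
beta0, beta1 = np.zeros(p), np.zeros(p)
beta1[0] = 2.0                                   # a sparse change in one coordinate
y = np.where(np.arange(n) < tau, X @ beta0, X @ beta1) + rng.normal(size=n)
ks, max_stat, quad_stat = covariance_scan(X, y)
print("estimated change point:", ks[max_stat.argmax()])
```

With identity covariance, $\Sigma(\beta_1 - \beta_0)$ equals the coefficient change itself; with correlated regressors the scanned quantity mixes coordinates, which is the "inherent sparsity" effect the abstract refers to.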

2602.06065 2026-06-02 stat.ML cond-mat.dis-nn cs.CL cs.LG

Deep networks learn to parse uniform-depth context-free languages from local statistics

Jack T. Parley, Francesco Cagnetta, Matthieu Wyart

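AI summary: Introduces a tunable class of probabilistic context-free grammars and an inference algorithm inspired by deep convolutional networks, revealing how language structure emerges from local statistics, and validates the predictions on deep convolutional and Transformer architectures.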

Comments
Accepted as regular paper at ICML 2026
English abstract

Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text when predicting the next word, while representing semantic notions independently of surface form. Yet, which data statistics make these feats possible, and how much data is required, remain largely unknown. Probabilistic context-free grammars (PCFGs) provide a tractable testbed for studying these questions. However, prior work has focused either on the post-hoc characterization of the parsing-like algorithms used by trained networks, or on the learnability of PCFGs with fixed syntax, where parsing is unnecessary. Here, we (i) introduce a tunable class of PCFGs in which both the degree of ambiguity and the correlation structure across scales can be controlled; (ii) provide a learning mechanism, namely an inference algorithm inspired by the structure of deep convolutional networks, that links learnability and sample complexity to specific language statistics; and (iii) validate our predictions empirically across deep convolutional and transformer-based architectures. Overall, we propose a unifying framework where correlations at different scales lift local ambiguities, enabling the emergence of hierarchical representations of the data.
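
To make "uniform depth" concrete: in the toy grammar below every symbol at level $\ell$ rewrites into exactly $s$ symbols of level $\ell + 1$, so every sentence has length $s^L$ and the same tree shape. Ambiguity arises because rule right-hand sides are drawn independently, so different symbols can produce the same tuple. The generator, vocabulary size, and uniform rule probabilities are hypothetical choices for illustration; they do not reproduce the paper's tunable PCFG class or its inference algorithm.

```python
import random

def make_grammar(levels, vocab, n_rules, s, seed=0):
    """For each level, give every symbol n_rules random s-tuples of next-level
    symbols. Tuples are drawn independently, so different symbols can share
    a right-hand side, which makes parsing ambiguous."""
    rng = random.Random(seed)
    return [
        {a: [tuple(rng.randrange(vocab) for _ in range(s)) for _ in range(n_rules)]
         for a in range(vocab)}
        for _ in range(levels)
    ]

def sample_sentence(grammar, root, rng):
    """Expand level by level, choosing each rule with equal probability."""
    symbols = [root]
    for rules in grammar:
        symbols = [t for a in symbols for t in rng.choice(rules[a])]
    return symbols

grammar = make_grammar(levels=3, vocab=4, n_rules=2, s=2)
sentence = sample_sentence(grammar, root=0, rng=random.Random(1))
print(sentence)   # a list of 2**3 = 8 terminal tokens
```

Increasing `n_rules` relative to `vocab**s` raises the chance that two symbols share a tuple, which is one simple knob for the degree of ambiguity.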

2511.21140 2026-06-02 cs.LG cs.CL stat.AP stat.ML

How to Correctly Report LLM-as-a-Judge Evaluations

Chungpa Lee, Thomas Zeng, Jongwon Jeong, Jy-yong Sohn, Kangwook Lee

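AI summary: Addresses the bias that arises when LLMs are used as judges with a plug-in correction framework that yields unbiased estimates and statistically principled uncertainty quantification, and proves that it remains unbiased under distribution shift.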

Journal ref
International Conference on Machine Learning (ICML) 2026
English abstract

Large language models (LLMs) are widely used as scalable evaluators of model responses in lieu of human annotators. However, imperfect sensitivity and specificity of the LLM judges induce bias in naive evaluation scores. We propose a simple plug-in framework that corrects this bias and enables statistically principled uncertainty quantification. Our framework constructs confidence intervals that account for uncertainty from both the test dataset and a human-labeled calibration dataset. Additionally, it uses an adaptive strategy to allocate calibration samples for tighter intervals. Importantly, we characterize parameter regimes defined by the true evaluation score and the LLM judge's sensitivity and specificity in which our LLM-based evaluation yields more reliable estimates than human-only evaluation. Moreover, we show that our framework remains unbiased under distribution shift between the test and calibration datasets, in contrast to existing approaches.
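
A classical way to correct a proportion measured by an imperfect classifier is the Rogan-Gladen estimator, which inverts $p = \theta q_1 + (1-\theta)(1-q_0)$ for sensitivity $q_1$ and specificity $q_0$; the delta method then gives a variance combining the test-set uncertainty with the uncertainty in the calibration estimates of $q_0$ and $q_1$. The sketch implements that textbook correction. Whether it coincides with the paper's plug-in estimator is not claimed, and the adaptive allocation of calibration samples is omitted.

```python
import math
from statistics import NormalDist

def corrected_score(p_hat, q0, q1, n_test, n0, n1, level=0.95):
    """Bias-corrected accuracy with a delta-method confidence interval.

    p_hat:  fraction of n_test responses the LLM judge marks as correct.
    q0, q1: judge specificity and sensitivity, estimated on n0 truly incorrect
            and n1 truly correct human-labeled calibration items.
    """
    j = q0 + q1 - 1
    if j <= 0:
        raise ValueError("judge must beat chance: q0 + q1 > 1")
    theta = (p_hat + q0 - 1) / j
    # Delta method: d theta/dp = 1/j, d/dq1 = -theta/j, d/dq0 = (1 - theta)/j.
    var = (p_hat * (1 - p_hat) / n_test
           + theta ** 2 * q1 * (1 - q1) / n1
           + (1 - theta) ** 2 * q0 * (1 - q0) / n0) / j ** 2
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var)
    clip = lambda v: min(1.0, max(0.0, v))
    return clip(theta), (clip(theta - half), clip(theta + half))

# A judge with specificity 0.8 and sensitivity 0.9 reports 69% "correct".
print(corrected_score(0.69, q0=0.8, q1=0.9, n_test=2000, n0=300, n1=300))
```

The naive score of 0.69 is corrected to 0.70 here, and the interval widens as $q_0 + q_1$ approaches 1, reflecting a judge that is barely better than chance.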

2602.07218 2026-06-02 cs.LG cs.AI stat.ML

Collaborative and Efficient Fine-tuning: Leveraging Task Similarity

Gagik Magakyan, Amirhossein Reisizadeh, Chanwoo Park, Pablo A. Parrilo, Asuman Ozdaglar

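AI summary: Proposes CoLoRA, which exploits task similarity for collaborative fine-tuning via a shared adapter and personalized adapters, improving model performance under data scarcity, with theoretical and experimental validation.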

English abstract

Adaptability has been regarded as a central feature of foundation models, enabling them to adapt effectively to unseen downstream tasks. Parameter-efficient fine-tuning methods such as the celebrated LoRA enable efficient adaptation of large foundation models using labeled, high-quality, and generally scarce task data. To mitigate data scarcity in fine-tuning of foundation models, we propose to leverage task similarity across multiple downstream users. Intuitively, users with similar tasks should be able to help each other by increasing the effective fine-tuning data size. We propose Collaborative Low-Rank Adaptation, or CoLoRA, which exploits task similarity to collaboratively and efficiently fine-tune personalized foundation models. The main idea in CoLoRA is to train one shared adapter capturing underlying task similarities across all tasks, together with personalized adapters tailored to user-specific tasks. We theoretically study CoLoRA on heterogeneous linear regression and provide provable guarantees for ground-truth recovery. We also conduct several natural language experiments with varying task similarity, which further demonstrate that when trained together with similar tasks, individual performance is significantly boosted.
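
A minimal NumPy sketch of the shared-plus-personalized structure, assuming that each user's effective weight is the frozen base weight plus an additive shared low-rank update and an additive user-specific low-rank update. The ranks, the LoRA-style initialization (zero $B$ factors), and the additive combination are assumptions; the collaborative training procedure is omitted.

```python
import numpy as np

def lora_pair(d_in, d_out, rank, rng):
    """LoRA-style factors (A, B): A random, B zero, so the update starts as a no-op."""
    return rng.normal(scale=0.01, size=(rank, d_in)), np.zeros((d_out, rank))

def adapted_forward(x, W0, shared, personal):
    """y = (W0 + B_s A_s + B_u A_u) x for one user u, with inputs in the rows of x."""
    (A_s, B_s), (A_u, B_u) = shared, personal
    return x @ (W0 + B_s @ A_s + B_u @ A_u).T

rng = np.random.default_rng(0)
d_in, d_out, n_users = 16, 8, 3
W0 = rng.normal(size=(d_out, d_in))                        # frozen base weight
shared = lora_pair(d_in, d_out, rank=4, rng=rng)           # updated with every user's data
personal = [lora_pair(d_in, d_out, rank=2, rng=rng) for _ in range(n_users)]
x = rng.normal(size=(5, d_in))
y = adapted_forward(x, W0, shared, personal[1])            # forward pass for user 1
```

Under this structure, gradients from every user's data reach the shared factors, while each personal pair sees only its own user's data, which is how similar tasks can pool their effective sample size.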

2602.06837 2026-06-02 cs.LG stat.ML

Sharpness-Aware Hybrid Model Learning for Architecture-Agnostic Parameter Estimation

Naoya Takeishi

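AI summary: Proposes an architecture-agnostic method based on sharpness-aware minimization that exploits loss flatness to estimate the scientific parameters of hybrid models accurately.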

English abstract

Hybrid modeling, the combination of machine learning models and scientific mathematical models, enables flexible and robust data-driven prediction with partial interpretability. However, the unknown parameters of the scientific model cannot necessarily be estimated properly, since the flexibility of the machine learning model might cause the scientific part of the model to be effectively ignored in prediction. This can be avoided by applying regularization, but the formulation of such regularizers typically depends on model architectures and domain knowledge. In this paper, we propose an architecture-agnostic method to learn hybrid models while properly estimating the scientific parameters. The idea is to use the flatness of loss minima to achieve model simplicity, following Occam's razor. We employ the idea of sharpness-aware minimization (SAM) and adapt it to the hybrid modeling setting. Numerical experiments demonstrate the effectiveness of SAM-based hybrid model learning for scientific parameter estimation.
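
Sharpness-aware minimization replaces the plain gradient with the gradient at an adversarially perturbed point $w + \rho\,g/\lVert g\rVert$, which biases training toward flat minima. A minimal NumPy sketch of one SAM step on a generic loss, assuming a toy quadratic with one sharp and one flat direction; adapting SAM to hybrid models and scientific parameter estimation, as the paper does, is not reproduced here.

```python
import numpy as np

def sam_step(w, grad, lr=0.1, rho=0.05):
    """One sharpness-aware minimization step.

    Move to the (first-order) worst point within an L2 ball of radius rho,
    then descend from w using the gradient evaluated at that perturbed point.
    """
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad(w + eps)

# Toy loss 0.5 * w^T H w with a sharp (10.0) and a flat (0.1) direction.
H = np.diag([10.0, 0.1])
grad = lambda w: H @ w
loss = lambda w: 0.5 * w @ H @ w
w = np.array([1.0, 1.0])
for _ in range(500):
    w = sam_step(w, grad, lr=0.05)
print(w, loss(w))
```

Because the perturbation has fixed length `rho`, SAM keeps a small residual oscillation along the sharp direction instead of converging exactly; the extra cost is one additional gradient evaluation per step.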

2602.05970 2026-06-02 cs.LG cs.AI math.DS stat.ML

Inverse Depth Scaling From Most Layers Being Similar

Yizhou Liu, Sara Kangaslahti, Ziming Liu, Jeff Gore

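AI summary: By analyzing large language models and toy residual networks, finds that loss is inversely proportional to depth, attributes this to functionally similar layers reducing error through ensemble averaging rather than compositional learning or the discretization of smooth dynamics, and suggests that architectural innovations are needed to encourage compositional use of depth.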

Comments
Camera-ready version, ICML 2026
English abstract

Neural scaling laws relate loss to model size in large language models (LLMs), yet depth and width may contribute to performance differently, requiring more detailed studies. Here, we quantify how depth affects loss via analysis of LLMs and toy residual networks. We find that loss scales inversely with depth in LLMs, probably because functionally similar layers reduce error through ensemble averaging rather than through compositional learning or the discretization of smooth dynamics. This regime is inefficient yet robust and may arise from the architectural bias of residual networks and from target functions incompatible with smooth dynamics. The findings suggest that improving LLM efficiency may require architectural innovations to encourage compositional use of depth.
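
The ensemble-averaging explanation has a simple arithmetic core: if $L$ layers each contribute an independent, zero-mean error of variance $\sigma^2$ and their effects are averaged, the mean squared error of the average is $\sigma^2/L$, i.e. inversely proportional to depth. The Monte Carlo sketch checks only this idealized arithmetic; it is a caricature under that independence assumption, not a model of residual networks or of the paper's analysis.

```python
import random

def ensemble_mse(depth, sigma=1.0, trials=20_000, seed=0):
    """MSE of the average of `depth` independent N(0, sigma^2) layer errors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        avg = sum(rng.gauss(0.0, sigma) for _ in range(depth)) / depth
        total += avg ** 2
    return total / trials

for depth in (1, 4, 16, 64):
    print(depth, round(ensemble_mse(depth), 4), "theory:", 1 / depth)
```

A compositional use of depth could in principle reduce error much faster than $1/L$, which is why the $1/L$ signature points to averaging.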

2602.05395 2026-06-02 stat.ML cs.AI cs.LG

Optimal Bayesian Stopping for Efficient Inference of Consistent LLM Answers

Jingkai Huang, Will Ma, Zhengyuan Zhou

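AI summary: Uses Bayesian prior information and an L-aggregated stopping policy to stop sampling early once sufficient consistency is reached, minimizing sampling cost while efficiently identifying the most consistent LLM answer.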

Journal ref
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)
Comments
Accepted to ICML 2026. Camera-ready version
English abstract

A simple strategy for improving LLM accuracy, especially in math and reasoning problems, is to sample multiple responses and submit the answer most consistently reached. In this paper we leverage Bayesian prior information to save on sampling costs, stopping once sufficient consistency is reached. Although the exact posterior is computationally intractable, we further introduce an efficient "L-aggregated" stopping policy that tracks only the L-1 most frequent answer counts. Theoretically, we prove that L=3 is all you need: this coarse approximation is sufficient to achieve asymptotic optimality, and strictly dominates prior-free baselines, while having a fast posterior computation. Empirically, this identifies the most consistent (i.e., mode) LLM answer using fewer samples, and can achieve similar answer accuracy while cutting the number of LLM calls (i.e., saving on LLM inference costs) by up to 50%.
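
The bookkeeping behind an L-aggregated policy can be sketched as follows: after each sampled answer, keep only the $L-1$ largest answer counts and decide whether to stop from those. The margin rule used below (stop when the leader is `margin` votes ahead of the runner-up, or when a budget is exhausted) is a hypothetical stand-in; the paper's policy is derived from a Bayesian posterior with a prior, which this sketch omits.

```python
from collections import Counter

def top_counts(counts, L):
    """The L - 1 largest answer counts, padded with zeros."""
    tops = sorted(counts.values(), reverse=True)[: L - 1]
    return tops + [0] * (L - 1 - len(tops))

def sample_until_consistent(answer_stream, L=3, margin=3, budget=64):
    """Draw answers until the leader is `margin` votes ahead of the runner-up.

    Only the L - 1 largest counts feed the stopping decision; the margin rule
    is a hypothetical stand-in for a Bayes-optimal stopping policy.
    """
    if L < 3:
        raise ValueError("this sketch compares the top two counts, so L >= 3")
    counts, n = Counter(), 0
    for n, answer in enumerate(answer_stream, start=1):
        counts[answer] += 1
        tops = top_counts(counts, L)
        if tops[0] - tops[1] >= margin or n >= budget:
            break
    return counts.most_common(1)[0][0], n

# In practice each item would come from a fresh LLM call.
print(sample_until_consistent(iter(["42", "42", "7", "42", "42", "7"])))
```

Compared with always drawing a fixed number of samples, any rule of this shape saves calls on easy questions where the answers agree early.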

2602.03685 2026-06-02 cs.LG cs.AI stat.ML

Universal One-third Time Scaling in Learning Peaked Distributions

Yizhou Liu, Ziming Liu, Cengiz Pehlevan, Jeff Gore

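AI summary: Through theoretical analysis and experiments, shows that learning peaked distributions with softmax and cross-entropy yields power-law decay of the loss and gradients, creating a universal bottleneck with a loss time-scaling exponent of 1/3 and offering a mechanistic explanation of neural scaling.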

Comments
Camera-ready version, ICML 2026
English abstract

Training large language models (LLMs) is computationally expensive, partly because the loss exhibits slow power-law convergence whose origin remains debatable. Through systematic analysis of toy models and empirical evaluation of LLMs, we show that this behavior can arise intrinsically from the use of softmax and cross-entropy. When learning peaked probability distributions, e.g., next-token distributions, these components generically yield power-law vanishing losses and gradients, regardless of many microscopic details, creating a fundamental optimization bottleneck. This ultimately leads to power-law time scaling of the loss with a universal exponent of $1/3$. Our results provide a mechanistic explanation for observed neural scaling and suggest new directions for improving LLM training efficiency.

2602.00878 2026-06-02 stat.CO

Complexity bounds for Dirichlet process slice samplers

Beatrice Franzolini, Francesco Gaffi

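AI summary: For Dirichlet process models, proves that the per-iteration computational overhead of slice samplers, relative to the number of posterior clusters, is $O_{\mathbb P}(\log n)$, providing a theoretical guarantee of practical scalability.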

English abstract

Slice sampling is a standard Monte Carlo technique for Dirichlet process (DP)-based models, widely used in posterior simulation. However, formal assessments of the scalability of posterior slice samplers have remained largely unexplored, primarily because the computational cost of a slice-sampling iteration is random and potentially unbounded. In this work, we obtain high-probability bounds on the computational complexity of DP slice samplers. Our main results show that, uniformly across posterior cluster-growth regimes, the overhead induced by slice variables, relative to the number of clusters supported by the posterior, is $O_{\mathbb P}(\log n)$. As a consequence, even in worst-case configurations, superlinear blow-ups in per-iteration computational cost occur with vanishing probability. Our analysis applies broadly to DP-based models without any likelihood-specific assumptions, while still providing complexity guarantees for posterior sampling on arbitrary datasets. These results establish a theoretical foundation for assessing the practical scalability of slice sampling in DP-based models.
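
The randomness in per-iteration cost comes from how many stick-breaking weights must be instantiated before the unbroken mass falls below the smallest slice variable; after that, every component heavier than the threshold is represented. For a DP with concentration $\alpha$ and a fixed threshold $u$, each factor contributes $-\log(1-V_k) \sim \mathrm{Exp}(\alpha)$, so the count is $1 + \mathrm{Poisson}(\alpha \log(1/u))$ and grows only logarithmically as $u$ shrinks. The sketch checks this one ingredient for a fixed $u$; it does not reproduce the paper's posterior analysis, in which the slice variables are random.

```python
import math
import random

def sticks_needed(u_min, alpha, rng):
    """Number of stick-breaking weights w_k = V_k * prod_{j<k}(1 - V_j),
    V_k ~ Beta(1, alpha), instantiated until the unbroken mass is < u_min,
    so that every component with weight >= u_min is represented."""
    remaining, k = 1.0, 0
    while remaining >= u_min:
        remaining *= 1.0 - rng.betavariate(1.0, alpha)
        k += 1
    return k

rng = random.Random(0)
alpha, u_min = 1.0, 1e-6
mean_k = sum(sticks_needed(u_min, alpha, rng) for _ in range(2000)) / 2000
print(mean_k, "vs 1 + alpha*log(1/u) =", 1 + alpha * math.log(1 / u_min))
```

Halving $u$ adds only about $\alpha \log 2$ sticks on average, which is the logarithmic behavior behind the abstract's $O_{\mathbb P}(\log n)$ overhead.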

2601.22945 2026-06-02 math.ST cs.CR econ.TH stat.TH

Persuasive Privacy

Joshua J Bon, James Bailie, Judith Rousseau, Christian P Robert

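AI summary: Proposes a new framework for measuring privacy from a Bayesian game-theoretic perspective that covers differential privacy and extends to deterministic algorithms.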

Journal ref
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026
Comments
24 pages, accepted as regular paper in ICML 2026
English abstract

We propose a novel framework for measuring privacy from a Bayesian game-theoretic perspective. This framework enables the creation of new, purpose-driven privacy definitions that are rigorously justified, while also allowing for the assessment of existing privacy guarantees through game theory. We show that pure and probabilistic differential privacy are special cases of our framework, and provide new interpretations of the post-processing inequality in this setting. Further, we demonstrate that privacy guarantees can be established for deterministic algorithms, which are overlooked by current privacy standards.

2601.22784 2026-06-02 stat.ML cs.LG

Approximating $f$-Divergences with Rank Statistics

Viktor Stein, José Manuel de Frutos

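AI summary: Proposes a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the distribution of ranks, proves its monotonicity, lower-bound property, and convergence rates, and extends it to a sliced version for high-dimensional data.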

Comments
40 pages, 16 figures, 6 tables, accepted at ICML'26. Comments welcome!
English abstract

We introduce a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the distribution of ranks. For a resolution parameter $K$, we map the mismatch between two univariate distributions $μ$ and $ν$ to a rank histogram on $\{ 0, \ldots, K\}$ and measure its deviation from uniformity via a discrete $f$-divergence, yielding a rank-statistic divergence estimator. We prove that the resulting estimator of the divergence is monotone in $K$, is always a lower bound of the true $f$-divergence, and we establish quantitative convergence rates for $K\to\infty$ under mild regularity of the quantile-domain density ratio. To handle high-dimensional data, we define the sliced rank-statistic $f$-divergence by averaging the univariate construction over random projections, and we provide convergence results for the sliced limit as well. We also derive finite-sample deviation bounds along with asymptotic normality results for the estimator. Finally, we empirically validate the approach by benchmarking against neural baselines and illustrating its use as a learning objective in generative modeling experiments.
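
A sketch of one reading of the construction: draw one point from $\mu$ and $K$ points from $\nu$, record the rank of the $\mu$-point, and compare the resulting histogram on $\{0,\ldots,K\}$ with the uniform distribution it would follow if $\mu = \nu$. Since the rank is a function of draws whose joint laws differ only through $\mu$ versus $\nu$, the data-processing inequality keeps this discrete divergence below $D_f(\mu\,\|\,\nu)$. The sketch uses KL as the $f$-divergence and plain Monte Carlo; the paper's estimator details, finite-sample corrections, and sliced extension are not reproduced.

```python
import math
import random

def rank_histogram(sample_mu, sample_nu, K, m, rng):
    """Empirical law of the rank of one mu-draw among K nu-draws, on {0, ..., K}."""
    hist = [0] * (K + 1)
    for _ in range(m):
        x = sample_mu(rng)
        hist[sum(sample_nu(rng) < x for _ in range(K))] += 1
    return [h / m for h in hist]

def kl_to_uniform(q):
    """Discrete KL divergence KL(q || Uniform{0, ..., K})."""
    k1 = len(q)
    return sum(qi * math.log(qi * k1) for qi in q if qi > 0)

rng = random.Random(0)
mu = lambda r: r.gauss(1.0, 1.0)
nu = lambda r: r.gauss(0.0, 1.0)
est = kl_to_uniform(rank_histogram(mu, nu, K=10, m=20_000, rng=rng))
print(est, "<= KL(N(1,1) || N(0,1)) = 0.5")
```

Increasing `K` refines the rank histogram and, in the population limit, moves the estimate up toward the true divergence, consistent with the monotonicity in $K$ stated in the abstract.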

2601.21959 2026-06-02 stat.ML cs.LG

Near-Optimal Private Tests for Simple and MLR Hypotheses

Yu-Wei Chen, Raghu Pasupathy, Jordan Awan

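AI summary: Under the Gaussian differential privacy framework, for simple, one-sided, and two-sided hypothesis tests under monotone likelihood ratio conditions, proposes a private mean estimator with data-driven clamping bounds and constructs private test statistics that achieve the same asymptotic relative efficiency as the non-private most powerful tests while conservatively controlling type I error.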

Abstract

We develop a near-optimal testing procedure under the framework of Gaussian differential privacy for simple as well as one- and two-sided tests under monotone likelihood ratio conditions. Our mechanism is based on a private mean estimator with data-driven clamping bounds, whose population risk matches the private minimax rate up to logarithmic factors. Using this estimator, we construct private test statistics that achieve the same asymptotic relative efficiency as the non-private, most powerful tests while maintaining conservative type I error control. In addition to our theoretical results, our numerical experiments show that our private tests outperform competing DP methods and offer comparable power to the non-private most powerful tests, even at moderately small sample sizes and privacy loss budgets.
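The clamped-mean mechanism can be sketched as follows under simplifying assumptions: the clamping bounds are fixed in advance rather than chosen from the data (a data-driven choice would itself consume privacy budget), the data have unit variance, and the test ignores clamping bias, so it reproduces neither the paper's procedure nor its conservative type I error control. `gdp_clamped_mean` and `private_one_sided_z_test` are hypothetical names. The noise scale follows the Gaussian mechanism for $\mu$-GDP: noise standard deviation equal to the sensitivity divided by $\mu$.

```python
import numpy as np
from statistics import NormalDist

def gdp_clamped_mean(x, lower, upper, mu, rng):
    """Release the mean of x under mu-GDP (illustrative sketch).

    Clamping to [lower, upper] bounds the sensitivity of the mean to
    (upper - lower) / n when one record is replaced; adding Gaussian noise
    with standard deviation sensitivity / mu is the Gaussian mechanism for mu-GDP.
    """
    x = np.clip(np.asarray(x, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(x)
    noise_sd = sensitivity / mu
    return float(x.mean() + rng.normal(0.0, noise_sd)), noise_sd

def private_one_sided_z_test(x, lower, upper, mu, alpha, rng):
    """Hypothetical helper: test H0: mean = 0 vs H1: mean > 0 for unit-variance data.

    The reference variance adds the privacy-noise variance to the sampling
    variance; clamping bias is ignored for simplicity.
    """
    estimate, noise_sd = gdp_clamped_mean(x, lower, upper, mu, rng)
    se = np.sqrt(1.0 / len(x) + noise_sd**2)
    z = estimate / se
    return estimate, bool(z > NormalDist().inv_cdf(1 - alpha))

rng = np.random.default_rng(1)
data = rng.normal(0.5, 1.0, size=10_000)
estimate, reject = private_one_sided_z_test(data, -4.0, 4.0, mu=1.0, alpha=0.05, rng=rng)
print(round(estimate, 3), reject)
```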

2601.21696 2026-06-02 stat.ME stat.ML

Independent Component Discovery in Temporal Count Data

Alexandre Chaussard, Anna Bonnet, Sylvain Le Corff

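AI summary: Proposes a generative framework for independent component analysis of temporal count data that combines regime-adaptive dynamics with Poisson log-normal emissions, achieving identifiable component separation with regime-dependent contributions, and learns the parameters efficiently via amortized variational inference.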

Comments
9 pages, 7 figures, Appendix provided
Abstract

Advances in data collection are producing growing volumes of temporal count observations, making adapted modeling increasingly necessary. In this work, we introduce a generative framework for independent component analysis of temporal count data, combining regime-adaptive dynamics with Poisson log-normal emissions. The model identifies disentangled components with regime-dependent contributions, enabling representation learning and perturbation analysis. Notably, we establish the identifiability of the model, supporting principled interpretation. To learn the parameters, we propose an efficient amortized variational inference procedure. Experiments on simulated data evaluate recovery of the mixing function and latent sources across diverse settings, while real-world applications to gut microbiome and climate datasets reveal co-variation patterns and regime shifts consistent with domain-specific knowledge.
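A toy simulator may help fix ideas about the generative side of such a model. In this sketch, which is an assumption rather than the authors' specification, independent Laplace sources (non-Gaussian, as ICA identifiability typically requires) are mixed by a regime-specific matrix, the regime follows a simple Markov chain, Gaussian noise on the log scale supplies the log-normal layer, and counts are Poisson given the intensity. `simulate_pln_ica` and its parameters are hypothetical; inference (amortized variational or otherwise) is not shown.

```python
import numpy as np

def simulate_pln_ica(T, mixing, offset, p_stay, noise_sd, rng):
    """Simulate counts from a regime-switching Poisson log-normal ICA model (sketch).

    mixing:   array (R, d, k), one d x k mixing matrix per regime (assumed form).
    offset:   array (d,), baseline log-intensity per count variable.
    p_stay:   probability of keeping the current regime at each step.
    noise_sd: sd of the Gaussian noise on the log-intensity (log-normal layer).
    """
    n_regimes, d, k = mixing.shape
    regimes = np.zeros(T, dtype=int)
    for t in range(1, T):
        # Markov regime dynamics: stay, or jump to a uniformly drawn regime.
        regimes[t] = regimes[t - 1] if rng.random() < p_stay else rng.integers(n_regimes)
    sources = rng.laplace(0.0, 1.0, size=(T, k))  # independent, non-Gaussian sources
    # Regime-dependent mixing: row t uses the mixing matrix of regime regimes[t].
    log_rate = np.einsum("tdk,tk->td", mixing[regimes], sources) + offset
    log_rate += rng.normal(0.0, noise_sd, size=(T, d))
    counts = rng.poisson(np.exp(log_rate))        # Poisson emissions
    return regimes, sources, counts

rng = np.random.default_rng(2)
mixing = rng.normal(0.0, 0.3, size=(2, 5, 2))     # 2 regimes, 5 count series, 2 sources
regimes, sources, counts = simulate_pln_ica(500, mixing, np.full(5, 1.0), 0.95, 0.1, rng)
print(counts[:3])
```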

2509.13805 2026-06-02 cs.LG cs.AI stat.ML

Towards a Physics Foundation Model

Florian Wiesner, Zoë J. Gray, Matthias Wessling, Stephen Baek

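AI summary: Proposes the General Physics Transformer (GPhyT), which, trained on large-scale, diverse simulation data, lets a single model achieve zero-shot generalization and stable long-term prediction across multiple physics domains (e.g., fluid-solid interaction, shock waves, thermal convection, and multi-phase flow), outperforming specialized architectures by more than 7x.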

Comments
ICML-AI4Physics 2026
Abstract

Foundation models have revolutionized natural language processing through a "train once, deploy anywhere" paradigm, where a single pre-trained model adapts to countless downstream tasks without retraining. Access to a Physics Foundation Model (PFM) would be transformative: it would democratize access to high-fidelity simulations, accelerate scientific discovery, and eliminate the need for specialized solver development. Yet current physics-aware machine learning approaches remain fundamentally limited to single, narrow domains and require retraining for each new system. We present the General Physics Transformer (GPhyT), trained on 1.8 TB of diverse simulation data, and show that foundation model capabilities are achievable for physics. Our key insight is that transformers can learn to infer governing dynamics from context, enabling a single model to simulate fluid-solid interactions, shock waves, thermal convection, and multi-phase dynamics without being told the underlying equations. GPhyT achieves three critical breakthroughs: (1) superior performance across multiple physics domains, outperforming specialized architectures by more than 7x, (2) plausible zero-shot generalization to entirely unseen physical systems through in-context learning, and (3) more stable long-term predictions through long-horizon rollouts. By establishing that a single model can learn generalizable physical principles from data alone, this work opens the path toward a universal PFM that could transform computational science and engineering.

2505.20893 2026-06-02 stat.ME stat.AP

A longitudinal Bayesian framework for estimating causal dose-response relationships

Yu Luo, Kuan Liu, Ramandeep Singh, Daniel J. Graham

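AI summary: Proposes a scalable nonparametric Bayesian framework that handles time-varying confounding through the generalized propensity score to estimate marginal longitudinal causal dose-response functions under continuous exposures, applied to a causal analysis of metro ridership and COVID-19 case counts.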

Abstract

Existing causal methods for time-varying exposure and time-varying confounding focus on estimating the average causal effect of a time-varying binary treatment on an end-of-study outcome, offering limited tools for characterizing marginal causal dose-response relationships under continuous exposures. We propose a scalable, nonparametric Bayesian framework for estimating marginal longitudinal causal dose-response functions with repeated outcome measurements. Our approach targets the average potential outcome at any fixed dose level and accommodates time-varying confounding through the generalized propensity score. The proposed approach embeds a Dirichlet process specification within a generalized estimating equations structure, capturing temporal correlation while making minimal assumptions about the functional form of the continuous exposure. We apply the proposed methods to monthly metro ridership and COVID-19 case data from major international cities, identifying causal relationships and the dose-response patterns between higher ridership and increased case counts.
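The role of the generalized propensity score (GPS) for a continuous exposure can be sketched at a single time point. This assumes normal linear models for the exposure given covariates and for the marginal exposure, and uses stabilized inverse-GPS weights in a weighted linear regression; it is a parametric, cross-sectional illustration, not the longitudinal Dirichlet-process/GEE method proposed in the paper, and all function names are hypothetical.

```python
import numpy as np

def normal_density(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def stabilized_gps_weights(a, X):
    """Stabilized weights f(a) / f(a | x), both modelled as normal (assumption)."""
    design = np.column_stack([np.ones(len(a)), X])
    beta, *_ = np.linalg.lstsq(design, a, rcond=None)
    fitted = design @ beta
    resid_var = np.mean((a - fitted) ** 2)
    gps = normal_density(a, fitted, resid_var)        # generalized propensity score
    marginal = normal_density(a, a.mean(), a.var())   # numerator for stabilization
    return marginal / gps

def weighted_slope(a, y, w):
    """Slope of a weighted least-squares regression of y on a."""
    a_bar = np.average(a, weights=w)
    y_bar = np.average(y, weights=w)
    return np.sum(w * (a - a_bar) * (y - y_bar)) / np.sum(w * (a - a_bar) ** 2)

rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)                        # confounder
a = 0.5 * x + rng.normal(size=n)              # continuous exposure
y = 2.0 * a + 3.0 * x + rng.normal(size=n)    # true dose-response slope = 2
naive = weighted_slope(a, y, np.ones(n))      # confounded
ipw = weighted_slope(a, y, stabilized_gps_weights(a, x))
print(round(naive, 2), round(ipw, 2))
```

In this simulated example the unweighted slope is confounded (about $2 + 3 \cdot 0.5/1.25 = 3.2$), while the weighted slope should land near the true value 2, up to sampling error.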

2512.23371 2026-06-02 stat.OT cs.GR

Domain matters: Towards domain-informed evaluation for link prediction

Yilin Bi, Junhao Bian, Shuyan Wan, Shuaijia Wang, Tao Zhou

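AI summary: Systematically evaluates 12 mainstream link prediction algorithms on 740 real-world networks from 7 domains, finds that algorithm rankings are highly consistent within domains but not across them, proposes the Winner Score metric to identify the best algorithm for each domain, and stresses the importance of matching algorithm mechanisms to network structure.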

Journal ref
Physica A: Statistical Mechanics and its Applications, 693, 131551 (2026)
Abstract

Link prediction, a foundational task in complex network analysis, has extensive applications in critical scenarios such as social recommendation, drug target discovery, and knowledge graph completion. However, existing evaluations of algorithms often rely on experiments conducted on a limited number of networks and assume consistent performance rankings across domains. Despite significant disparities in generative mechanisms and semantic contexts, previous studies often improperly highlight "universally optimal" algorithms based solely on naive averages over networks from different domains. This paper systematically evaluates 12 mainstream link prediction algorithms across 740 real-world networks spanning seven domains. We present substantial empirical evidence on the performance of algorithms in specific domains. These findings reveal a notably low degree of consistency in algorithm rankings across domains, in stark contrast to the high consistency observed within individual domains. Principal Component Analysis shows that response vectors formed by the rankings of the 12 algorithms cluster distinctly by domain in low-dimensional space, confirming domain attributes as a pivotal factor affecting algorithm performance. We propose a metric called Winner Score that identifies the best algorithm in each domain: Non-Negative Matrix Factorization for social networks, Neighborhood Overlap-aware Graph Neural Networks for economics, Graph Convolutional Networks for chemistry, and L3-based Resource Allocation for biology. However, these domain-specific top performers tend to perform suboptimally in other domains. This finding underscores the importance of aligning an algorithm's mechanism with the network structure.

2403.08927 2026-06-02 stat.ME

Principal stratification with U-statistics under principal ignorability

Xinyuan Chen, Fan Li

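AI summary: For causal inference with an intermediate outcome, introduces principal generalized causal effect estimands to accommodate nonlinear contrast functions, develops nonparametric identification results and efficient influence functions under principal ignorability, and proposes multiply robust and debiased machine learning estimators based on U-statistics.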

Abstract

Principal stratification is a popular framework for causal inference in the presence of an intermediate outcome. While the principal average treatment effects are the standard target of inference, they may be insufficient when interest lies in the relative ordering of potential outcomes within a principal stratum. We introduce the principal generalized causal effect estimands to accommodate nonlinear contrast functions, providing robust, probability-scale summaries suitable for ordinal outcomes and win-loss comparisons with composite endpoints. Under principal ignorability, we expand the theoretical results in Jiang et al. (2022, JRSSB) to a broader class of causal estimands in the presence of a binary intermediate variable. We develop nonparametric identification results and derive efficient influence functions for the generalized causal estimands in principal stratification analyses. These efficient influence functions motivate multiply robust estimators and lay the groundwork for obtaining efficient debiased machine learning estimators via cross-fitting based on U-statistics. The proposed methods are illustrated through simulations and the analysis of a data example.
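The win-loss summaries mentioned above are two-sample U-statistics with kernel $h(y_1, y_0) = 1\{y_1 > y_0\} + \tfrac{1}{2} 1\{y_1 = y_0\}$. The sketch below computes them over all treated-control pairs in a simple two-group comparison; it ignores principal strata, weighting, and cross-fitting, and `win_loss_summary` is a hypothetical helper.

```python
import numpy as np

def win_loss_summary(y_treated, y_control):
    """Two-sample U-statistic summaries over all (treated, control) pairs."""
    diff = np.subtract.outer(np.asarray(y_treated), np.asarray(y_control))
    wins, ties, losses = np.mean(diff > 0), np.mean(diff == 0), np.mean(diff < 0)
    win_prob = wins + 0.5 * ties                  # P(Y1 > Y0) + 0.5 * P(Y1 = Y0)
    win_ratio = wins / losses if losses > 0 else np.inf
    return float(win_prob), float(win_ratio)

# Ordinal outcomes: 3 wins, 1 tie, 2 losses out of 6 pairs.
win_prob, win_ratio = win_loss_summary([3, 2], [1, 2, 4])
print(win_prob, win_ratio)  # approximately 0.5833 and 1.5
```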

2504.06108 2026-06-02 stat.ME

Causal inference in connected populations with contagion

Subhankar Bhadra, Michael Schweinberger

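AI summary: Studies causal inference in connected populations with contagion, uses closed-form expressions to show how contagion affects causal effects, and discusses the asymptotic bias of model-based estimators and violations of the neighborhood exposure assumptions underlying design-based estimators.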

Abstract

Causal inference in connected populations is complicated by contagion and other real-world processes inducing dependence among outcomes. We address a gap in the literature on causal inference under contagion: while there is a growing body of work on estimating causal effects under contagion, little is known about how contagion impacts causal effects and inference. We provide insight into how contagion impacts causal effects and inference based on closed-form expressions for causal effects under contagion. These closed-form expressions reveal that the effects of interventions, spillover, and contagion are intertwined even in the simplest possible settings, and that contagion can decrease or increase causal effects. We discuss statistical implications, including asymptotic bias of model-based estimators ignoring dependence among outcomes due to contagion, violations of neighborhood exposure assumptions underlying design-based estimators by unrestricted contagion, and possible remedies.

2512.02342 2026-06-02 math.OC cs.LG stat.ML

Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients

Dimitris Oikonomou, Nicolas Loizou

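AI summary: For non-smooth convex optimization, proposes the safeguarded stochastic Polyak step size (SPS$_{safe}$) for the stochastic subgradient method, provides convergence guarantees without strong assumptions, incorporates momentum, and shows experimentally that it is robust to vanishing gradients in deep neural network training.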

Comments
43rd International Conference on Machine Learning (ICML 2026)
Abstract

The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. Comprehensive experiments on convex benchmarks and deep neural networks corroborate our theory: the proposed step size achieves performance competitive with existing adaptive baselines and exhibits stable behavior across a wide range of problem settings. Finally, in the context of deep neural network training, the gradient norms under our step size do not collapse to (near) zero, indicating robustness to vanishing gradients.
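For reference, the earlier capped stochastic Polyak rule (often called SPS$_{max}$) sets $\gamma_k = \min\{(f_i(x_k) - f_i^*)/(c\,\|g_k\|^2),\ \gamma_b\}$. The sketch applies it with subgradients to the non-smooth least absolute deviation loss under interpolation ($f_i^* = 0$), which is exactly the kind of assumption SPS$_{safe}$ is designed to avoid; the safeguard of SPS$_{safe}$ itself is not reproduced, and `sps_max_subgradient` is a hypothetical name. With $c = 1$ each step projects onto the hyperplane $a_i^\top x = b_i$, i.e. it reduces to randomized Kaczmarz.

```python
import numpy as np

def sps_max_subgradient(A, b, gamma_b, c, n_iters, rng):
    """Stochastic subgradient method on f(x) = mean_i |a_i^T x - b_i| with a capped Polyak step.

    Uses gamma = min((f_i(x) - f_i*) / (c * ||g||^2), gamma_b) and assumes
    interpolation, so every f_i* = 0.
    """
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)
        residual = A[i] @ x - b[i]
        loss_i = abs(residual)
        g = np.sign(residual) * A[i]     # a subgradient of |a_i^T x - b_i|
        g_norm2 = g @ g
        if g_norm2 == 0.0:               # sample already fit exactly; skip
            continue
        gamma = min(loss_i / (c * g_norm2), gamma_b)
        x = x - gamma * g
    return x

rng = np.random.default_rng(4)
A = rng.normal(size=(200, 10))
x_star = rng.normal(size=10)
b = A @ x_star                           # consistent system, so interpolation holds
x_hat = sps_max_subgradient(A, b, gamma_b=1.0, c=1.0, n_iters=3000, rng=rng)
print(np.linalg.norm(x_hat - x_star))
```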

2511.07438 2026-06-02 cs.CV cs.NA math.NA stat.ME

Two Datasets Are Better Than One: Method of Double Moments for 3-D Reconstruction in Cryo-EM

Joe Kileel, Oscar Mickelin, Amit Singer, Sheng Xu

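AI summary: Proposes the method of double moments (MoDM), which uses second-order moments of projection images under a uniform and a non-uniform orientation distribution to uniquely determine the molecular structure (generically, up to a global rotation and reflection), and develops a convex-relaxation-based algorithm for accurate reconstruction.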

Abstract

Cryo-electron microscopy (cryo-EM) is a powerful imaging technique for reconstructing three-dimensional molecular structures from noisy tomographic projection images of randomly oriented particles. We introduce a new data fusion framework, termed the method of double moments (MoDM), which reconstructs molecular structures from two instances of the second-order moment of projection images obtained under distinct orientation distributions: one uniform, the other non-uniform and unknown. We prove that these moments generically uniquely determine the underlying structure, up to a global rotation and reflection, and we develop a convex-relaxation-based algorithm that achieves accurate recovery using only second-order statistics. Our results demonstrate the advantage of collecting and modeling multiple datasets under different experimental conditions, illustrating that leveraging dataset diversity can substantially enhance reconstruction quality in computational imaging tasks.

2412.02105 2026-06-02 stat.ME

The causal effects of modified treatment policies under network interference

Salvador V. Balkus, Scott W. Delaney, Nima S. Hejazi

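AI summary: Proposes induced modified treatment policies to identify causal effects under network interference, develops semiparametric efficient estimators, and applies them to the effect of zero-emission vehicle uptake on air pollution in California.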

Journal ref
Journal of the Royal Statistical Society, Series B: Statistical Methodology (2026)
Comments
30 pages, 5 figures
Abstract

Modified treatment policies are a widely applicable class of interventions useful for studying the causal effects of continuous exposures. Approaches to evaluating their causal effects assume no interference, meaning that such effects cannot be learned from data in settings where the exposure of one unit affects the outcomes of others, as is common in spatial or network data. We introduce a new class of intervention, induced modified treatment policies, which we show identify such causal effects in the presence of network interference. Building on recent developments for causal inference in networks, we provide flexible, semi-parametric efficient estimators of the statistical estimand. Numerical experiments demonstrate that an induced modified treatment policy can eliminate the causal, or identification, bias that results from network interference. We use the methodology developed to evaluate the effect of zero-emission vehicle uptake on air pollution in California, strengthening prior evidence.
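A stylized illustration of why interference matters for shift-type interventions: if every unit's exposure is shifted by $-\delta$, the sum of exposures over each unit's neighbors shifts by $-\delta$ times its degree, so an outcome model that ignores neighbors misses part of the effect. The sketch assumes a linear outcome model in own exposure and neighbor-exposure sum and uses a plug-in regression estimator; it is not the paper's semiparametric efficient estimator, and the function names are hypothetical.

```python
import numpy as np

def induced_shift_effect(a, y, adj, delta):
    """Plug-in effect of shifting every unit's exposure by -delta (illustrative sketch).

    Assumes a linear outcome model in own exposure a_i and neighbor exposure
    sum s_i = sum_j adj[i, j] * a_j; the shift induces s_i -> s_i - delta * degree_i.
    """
    s = adj @ a
    design = np.column_stack([np.ones_like(a), a, s])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    shifted = np.column_stack([np.ones_like(a), a - delta, adj @ (a - delta)])
    return float(np.mean(shifted @ coef - design @ coef))

def no_interference_shift_effect(a, y, delta):
    """Same plug-in contrast, but with an outcome model that ignores neighbors."""
    design = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(-delta * coef[1])

rng = np.random.default_rng(5)
n = 2000
upper = np.triu(rng.random((n, n)) < 5.0 / n, k=1)
adj = (upper | upper.T).astype(float)               # undirected graph, mean degree about 5
a = rng.normal(size=n)
y = 1.0 + 2.0 * a + 0.5 * (adj @ a) + rng.normal(size=n)
truth = -(2.0 + 0.5 * adj.sum(axis=1).mean())       # effect of delta = 1 under this model
print(truth, induced_shift_effect(a, y, adj, 1.0), no_interference_shift_effect(a, y, 1.0))
```

Here `truth` is about $-4.5$; the neighbor-aware plug-in estimate should be close to it, whereas the estimate that ignores interference stays near $-2$.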

2511.01064 2026-06-02 stat.ML cs.LG stat.CO

Generalized Guarantees for Variational Inference in the Presence of Even and Elliptical Symmetry

Charles C. Margossian, Isaac E. Rankin, Lawrence K. Saul

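AI summary: Proves that, for all $f$-divergences, stationary points of variational inference recover the mean of the target density under even symmetry and its correlation matrix under elliptical symmetry, generalizing an earlier result for the reverse KL divergence.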

Abstract

Variational inference (VI) approximates a target density $p$ by the best match $q$ in a family of tractable distributions. The best variational approximation is found by minimizing a divergence between distributions, $D(p||q)$, and several divergences have been proposed as objective functions for VI, with different choices leading to different approximations. We show that even when these divergences have different minimizers, the resulting approximations all abide by certain symmetry-matching principles. Specifically, our results hold for all $f$-divergences, a broad class which includes the reverse and forward Kullback-Leibler divergences and the $α$-divergences. We show that in the presence of even symmetry, any stationary point of an $f$-divergence is guaranteed to recover the mean of $p$ and likewise, in the presence of elliptical symmetry, any stationary point is guaranteed to recover its correlation matrix. To obtain these guarantees we assume that $p$ and $q$ are unimodal, but notably we do not require them to be log-concave, light-tailed, or even everywhere-smooth. These guarantees generalize a previous result obtained for the reverse Kullback-Leibler divergence when $p$ is log-concave. They also extend to cases where the target density $p$ only exhibits symmetry along some but not all of its coordinates. These partial symmetries arise naturally in Bayesian hierarchical models, where the prior induces a challenging geometry but still possesses axes of symmetry.

2512.07842 2026-06-02 q-bio.NC math.DS math.PR stat.CO

State and Parameter Estimation for a Neural Model of Local Field Potentials

Daniele Avitabile, Gabriel J. Lord, Khadija Meddouni

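AI summary: For cortical local field potential data recorded from a mouse during natural sleep, combines a discretized Wilson-Cowan Amari neural field model with Bayesian data assimilation to jointly estimate states and parameters, and demonstrates feasibility on synthetic and real data.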

Abstract

The study of cortical dynamics during different states, such as decision making, sleep, and movement, is an important topic in neuroscience. Modelling efforts aim to relate the neural rhythms present in cortical recordings to the underlying dynamics responsible for their emergence. We present an effort to characterize the neural activity from the cortex of a mouse during natural sleep, captured through local field potential measurements. Our approach relies on using a discretized Wilson-Cowan Amari neural field model for neural activity, along with a data assimilation method that allows the Bayesian joint estimation of the state and parameters. We demonstrate the feasibility of our approach on synthetic measurements before applying it to a dataset available in the literature. Our findings suggest the potential of our approach to characterize the stimulus received by the cortex from other brain regions, while simultaneously inferring a state that aligns with the observed signal.

2511.08223 2026-06-02 stat.CO cs.LG cs.NA math.NA

High-Performance Variance-Covariance Matrix Construction Using an Uncentered Gram Formulation

Felix Reichel

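AI summary: Shows that the covariance expression built from the uncentered Gram matrix and a correction term is algebraically equivalent to the pairwise-difference definition, avoiding explicit centering and reducing the computation to one p×p outer product and a single subtraction, with clear runtime gains in Python benchmarks.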

Journal ref
Accepted at International Journal of Parallel, Emergent and Distributed Systems, 2026, Taylor & Francis, Unpublished
Comments
17 pages, 9 figures, 1 table
Abstract

Reichel (2025) defined the bariance as a pairwise-difference measure that can be rewritten in linear time using only scalar sums. We extend this idea to the covariance matrix by showing that the standard matrix expression involving the uncentered Gram matrix and a correction term is algebraically identical to the pairwise-difference definition while avoiding explicit centering. The computation then reduces to one outer product of dimension p-by-p and a single subtraction. Benchmarks in Python show clear runtime gains, especially when BLAS optimizations are absent. Optional faster Gram-matrix routines, such as RXTX (Rybin et al., 2025), further reduce the overall cost.
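The algebraic identity can be checked directly. Expanding $\sum_{i,j}(x_i - x_j)(x_i - x_j)^\top = 2n\,X^\top X - 2n^2\,\bar{x}\bar{x}^\top$ and dividing by $2n(n-1)$ gives $S = (X^\top X - n\,\bar{x}\bar{x}^\top)/(n-1)$. The NumPy sketch below compares this Gram form with the pairwise-difference definition and with `np.cov`; the function names are hypothetical. One caveat of the uncentered form is that subtracting two nearly equal matrices can lose floating-point precision when column means are large relative to the spread.

```python
import numpy as np

def gram_covariance(X):
    """Sample covariance via the uncentered Gram matrix: (X^T X - n m m^T) / (n - 1)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    m = X.mean(axis=0)
    return (X.T @ X - n * np.outer(m, m)) / (n - 1)   # one p x p outer product, one subtraction

def pairwise_covariance(X):
    """Pairwise-difference definition: sum_{i,j} (x_i - x_j)(x_i - x_j)^T / (2 n (n - 1))."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]             # shape (n, n, p); O(n^2 p) memory
    return np.einsum("ijk,ijl->kl", diffs, diffs) / (2 * n * (n - 1))

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 4))
print(np.allclose(gram_covariance(X), pairwise_covariance(X)))
print(np.allclose(gram_covariance(X), np.cov(X, rowvar=False)))
```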

2511.18065 2026-06-02 stat.ME

Sequential Bootstrap for Out-of-Bag Error Estimation: A 100-Seed Replication Study and Variance-Structure Analysis

Cheng Peng

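AI summary: Uses the sequential bootstrap to fix the number of distinct training observations in each bootstrap sample and systematically compares the means and variances of out-of-bag estimates over 100 random seeds, finding that the means are insensitive to this stabilization while the variances show a narrow but reproducible effect, and that conclusions drawn from few replications can be unstable.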

Comments
22 pages, 9 tables, 1 appendix. v2: replication budget extended from 3 to 100 seeds; statistical analyses re-derived under cross-seed paired tests; Section 5 entirely rewritten; new Section 6.3 and Appendix A document the 3-seed vs 100-seed comparison. Code and data: https://github.com/Cheng-Peng0718/SB-OOB-100seed
Abstract

Out-of-Bag (OOB) estimation is the standard internal diagnostic for bootstrap-aggregated tree ensembles. Under the classical multinomial bootstrap, the number of distinct training observations in each replicate, $U_b$, is itself random, but its contribution to OOB-based variability has rarely been isolated empirically. We use Sequential Bootstrap (SB), a resampling scheme that holds $U_b$ at a fixed target $k_n = \lfloor 0.632 n\rfloor$, as a controlled perturbation of the bootstrap mechanism, and ask whether stabilizing $U_b$ produces any measurable change in OOB-based diagnostics. We reproduce Breiman's five OOB experimental families on twelve synthetic and real datasets, but unlike the three-seed presentation common in this literature, we run 100 independent random seeds with 50 internal replications per seed, enabling formal paired statistical comparison (Wilcoxon signed-rank, paired-$t$, Pitman-Morgan variance test). We report three findings. First, OOB means are essentially insensitive to stabilization of $U_b$: of 57 (experiment, dataset, metric) cells under 100 seeds, only 6 reach $p<0.05$ on the paired mean comparison, and 4 of those 6 point in the opposite direction from what a 3-seed reading would suggest. Second, a narrow but reproducible effect survives at the variance level: SB reduces the cross-seed standard deviation of node-level classification diagnostics on real datasets while slightly increasing it on synthetic ones (permutation $p=0.026$); the Vehicle dataset exhibits a 21% cross-seed sd reduction (Pitman-Morgan $p=0.017$). Third, several directional claims that appear stable across three seeds flip sign under 100-seed replication, illustrating the cost of underpowered replication protocols. We therefore treat SB as a diagnostic tool for probing the distinct-sample-count term in the variance of OOB estimators, not as an alternative to the classical bootstrap.
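A minimal sketch of the sequential bootstrap as described above, assuming that sampling with replacement stops as soon as $k_n = \lfloor 0.632 n \rfloor$ distinct indices have been drawn; `sequential_bootstrap` is a hypothetical name. Unlike the classical multinomial bootstrap, the number of distinct in-bag observations $U_b$ is then fixed, while the total number of draws becomes random.

```python
import numpy as np

def sequential_bootstrap(n, rng, frac=0.632):
    """Draw indices with replacement until floor(frac * n) distinct ones have been seen.

    Returns the in-bag draws (with repeats) and the out-of-bag indices.
    """
    k_n = int(np.floor(frac * n))
    seen = set()
    draws = []
    while len(seen) < k_n:
        i = int(rng.integers(n))
        draws.append(i)
        seen.add(i)
    oob = np.setdiff1d(np.arange(n), np.fromiter(seen, dtype=int))
    return np.array(draws), oob

rng = np.random.default_rng(7)
n = 100
distinct_sb = [len(np.unique(sequential_bootstrap(n, rng)[0])) for _ in range(5)]
distinct_classical = [len(np.unique(rng.integers(n, size=n))) for _ in range(5)]
print(distinct_sb)         # always 63 = floor(0.632 * 100)
print(distinct_classical)  # varies from replicate to replicate
```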

2507.22842 2026-06-02 stat.ML cs.LG

Tricks and Plug-ins for Gradient Boosting in Image Classification

Biyi Fang, Truong Vo, Jean Utke, Diego Klabjan

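AI summary: Proposes a framework that combines dynamic feature selection with BoostCNN principles, using subgrid selection and importance sampling and embedding boosting weights into training with a least squares loss, to improve CNN accuracy and efficiency.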

Journal ref
2025 IEEE International Conference on Big Data (BigData), pp. 1382-1388
Comments
6 pages, 5 figures. Experimental results reported on CIFAR-10, SVHN, and ImageNetSub datasets
Abstract

Convolutional Neural Networks (CNNs) have achieved remarkable success across a wide range of machine learning tasks by leveraging hierarchical feature learning through deep architectures. However, the large number of layers and millions of parameters often make CNNs computationally expensive to train, requiring extensive time and manual tuning to discover optimal architectures. In this paper, we introduce a novel framework for boosting CNN performance that integrates dynamic feature selection with the principles of BoostCNN. Our approach incorporates two key strategies: subgrid selection and importance sampling, to guide training toward informative regions of the feature space. We further develop a family of algorithms that embed boosting weights directly into the network training process using a least squares loss formulation. This integration not only alleviates the burden of manual architecture design but also enhances accuracy and efficiency. Experimental results across several fine-grained classification benchmarks demonstrate that our boosted CNN variants consistently outperform conventional CNNs in both predictive performance and training speed.

2501.02409 2026-06-02 cs.LG cs.AI cs.CE q-bio.MN stat.ME

Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations

Zaikang Lin, Sei Chang, Aaron Zweig, Minseo Kang, Fabian J. Theis, Elham Azizi, David A. Knowles

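AI summary: Proposes PerturbODE, which uses interpretable neural ordinary differential equations to model cell state trajectories under perturbations and derives a causal gene regulatory network from the ODE parameters, enabling simulation of unseen genetic interventions.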

Abstract

Modern high-throughput biological datasets containing thousands of perturbations enable large-scale discovery of causal graphs that represent regulatory interactions between genes. Differentiable causal graphical models and regression-based methods have been developed to infer gene regulatory networks (GRNs) from interventional datasets. However, existing approaches fail to capture the non-linear dynamics of biological processes such as cellular differentiation. To address this limitation, we propose PerturbODE, a novel framework that employs interpretable neural ordinary differential equations (neural ODEs) to model cell state trajectories under perturbations and derive the underlying causal GRN from the neural ODE parameters, enabling downstream simulation of unseen genetic interventions. The GRN is encoded via a single-hidden-layer feedforward network, implicitly grouping genes into interpretable co-regulated modules. We demonstrate PerturbODE's efficacy in GRN inference and extension to perturbation response prediction across both simulated and real overexpression datasets.

2511.04873 2026-06-02 stat.ML cs.LG

Prototype Selection Using Topological Data Analysis

Jordan Eckert, Elvan Ceyhan, Henry Schenck

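AI summary: Proposes two persistent-homology-based prototype selection methods, TPS and BoundaryTPS, which compress a training set using its multi-scale topological structure while retaining decision-boundary and interior-typical points; BoundaryTPS achieves the best $H_1$ persistence-diagram preservation, and both methods remain stable under fold perturbation.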

Comments
Code will be made available upon request to Jordan Eckert
Abstract

Prototype selection methods compress a training set, but the existing taxonomy of condensation, edition, hybrid, competence-based, optimization-based, and clustering-based families does not include methods that operate on the multi-scale topological structure of the data. This paper introduces two different persistence-based prototype selector variants, Topological Prototype Selector (TPS) and Boundary-Conscious Topological Prototype Selector (BoundaryTPS). TPS uses two sequential Rips filtrations to retain boundary-relevant and interior-typical points. BoundaryTPS is a single-stage variant whose vertex-weighted filtration concentrates retention near the decision boundary. We evaluate both methods against seven classical baselines on fifteen real datasets and find that the topological methods occupy a different operating point in the prototype-selection design space than existing methods. BoundaryTPS achieves the lowest mean Friedman rank on $H_1$ persistence-diagram preservation and is significantly better than five of the seven baselines (Nemenyi, $α= 0.05$). TPS ranks third on the same endpoint. Both methods are more stable under fold perturbation than any chained-decision selector tested, and both inherit the source set's class proportions without label-aware machinery. On aggregate G-Mean both methods are competitive but not leading, with rank-1 frequencies of $11.3\%$ (TPS) and $9.9\%$ (BoundaryTPS) across fold combinations. Empirically, both methods scale sub-quadratically in sample size.

2506.22666 2026-06-02 cs.CR cs.CL cs.LG stat.ML

VERA: Variational Inference Framework for Jailbreaking Large Language Models

Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang

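AI summary: Proposes the VERA framework, which casts black-box jailbreak prompt generation as a variational inference problem and trains a small attacker LLM to approximate the target LLM's posterior over adversarial prompts, so that diverse, fluent jailbreak prompts can be generated without re-optimization.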

Comments
Accepted by NeurIPS 2025

English abstract

The rise of API-only access to state-of-the-art LLMs highlights the need for effective black-box jailbreak methods to identify model vulnerabilities in real-world settings. Without a principled objective for gradient-based optimization, most existing approaches rely on genetic algorithms, which are limited by their initialization and dependence on manually curated prompt pools. Furthermore, these methods require individual optimization for each prompt, failing to provide a comprehensive characterization of model vulnerabilities. To address this gap, we introduce VERA: Variational infErence fRamework for jAilbreaking. VERA casts black-box jailbreak prompting as a variational inference problem, training a small attacker LLM to approximate the target LLM's posterior over adversarial prompts. Once trained, the attacker can generate diverse, fluent jailbreak prompts for a target query without re-optimization. Experimental results show that VERA achieves strong performance across a range of target LLMs, highlighting the value of probabilistic inference for adversarial prompt generation.

2412.16209 2026-06-02 cs.LG stat.ML

Challenges in the calibration of tree-based models for imbalanced classification

Nathan Phelps, Daniel J. Lizotte, Douglas G. Woolford

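AI summary: Studies the bias caused by analytically calibrating random forests trained on undersampled data, finds that decision trees can be biased towards the minority class, and recommends calibration methods that learn the miscalibration pattern (e.g., beta calibration).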

English abstract

When using machine learning for imbalanced binary classification problems, it is common to subsample the majority class to create a (more) balanced training dataset. This biases the model's predictions because the model learns from data that is not fully representative of the underlying population of interest. One way of accounting for this bias is analytically mapping the resulting predictions to new values based on the sampling rate for the majority class. We show that calibrating a random forest this way has negative consequences, including prevalence estimates that depend on both the number of predictors considered at each split in the random forest and the sampling rate used. We explain the former using known properties of random forests and analytical calibration and the latter by demonstrating a bias in decision trees. In contradiction with much of the existing literature, we show that decision trees can be biased towards the minority class. These issues indicate that tree-based models trained on undersampled data should not be calibrated analytically. Calibration approaches that can learn a miscalibration pattern in the original model (e.g., beta calibration) are more suitable.
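For reference, the kind of analytical mapping the abstract argues against can be sketched as follows. This is a minimal Python sketch assuming the common odds-correction form, where `beta` is the fraction of majority-class examples kept during undersampling: undersampling multiplies the minority-class odds by $1/\beta$, so the correction multiplies them back by $\beta$. The paper's finding is that this fixed mapping is unreliable for tree-based models, and that learned calibrators such as beta calibration are more suitable.

```python
def analytic_calibration(p_s, beta):
    """Map a probability from a model trained on undersampled data back to
    the original prevalence scale (assumed odds-correction form).

    p_s  : predicted P(minority | x) from the model fit on undersampled data
    beta : fraction of majority-class examples kept, 0 < beta <= 1

    Undersampling scales the minority-class odds by 1 / beta, so the corrected
    odds are beta * p_s / (1 - p_s); converting back to a probability gives
    beta * p_s / (beta * p_s - p_s + 1). Works elementwise on NumPy arrays too.
    """
    if not 0 < beta <= 1:
        raise ValueError("beta must be in (0, 1]")
    return beta * p_s / (beta * p_s - p_s + 1)
```

With `beta = 0.1`, a raw score of 0.5 maps to 1/11 (about 0.09), which shows how strongly the correction shrinks predictions; with `beta = 1` (no undersampling) it is the identity.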

2504.19419 2026-06-02 cs.LG stat.ML

Advancing Local Clustering on Graphs via Compressive Sensing: Semi-supervised and Unsupervised Methods

Zhaiming Shen, Sung Ha Kang

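AI summary: Proposes compressive-sensing-based semi-supervised and unsupervised local clustering methods that obtain sparse solutions via random sampling, diffusion, and overlap analysis, proves their correctness, and achieves state-of-the-art performance at low label rates.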

English abstract

Local clustering aims to identify specific substructures within a large graph without any additional structural information about the graph. These substructures are typically small compared to the overall graph, enabling the problem to be approached by finding a sparse solution to a linear system associated with the graph Laplacian. In this work, we first propose a method for identifying specific local clusters when very little labeled data is given, which we term semi-supervised local clustering. We then extend this approach to the unsupervised setting when no prior information on labels is available. The proposed methods involve randomly sampling the graph, applying diffusion through local cluster extraction, then examining the overlap among the results to find each cluster. We establish the co-membership conditions for any pair of nodes, and rigorously prove the correctness of our methods. Additionally, we conduct extensive experiments to demonstrate that the proposed methods achieve state-of-the-art results in the low-label-rate regime.

2506.16704 2026-06-02 cs.LG stat.ML

How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension

Cynthia Dwork, Lunjia Hu, Han Shao

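AI summary: Introduces the domain shattering dimension in the PAC framework, characterizes the number of randomly sampled domains needed for domain generalization, and establishes a tight quantitative relationship with the VC dimension.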

Comments
Accepted to NeurIPS 2025

English abstract

We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.

2510.17303 2026-06-02 cs.LG stat.ML

Symmetries in PAC-Bayesian Learning

Armin Beck, Peter Ochs

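AI summary: Within the PAC-Bayes framework, extends generalization guarantees for symmetries to non-compact groups and non-invariant data distributions by adapting and tightening existing bounds, and validates the approach experimentally.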

English abstract

Symmetries are known to improve the empirical performance of machine learning models, yet theoretical guarantees explaining these gains remain limited. Prior work has focused mainly on compact group symmetries and often assumes that the data distribution itself is invariant, an assumption rarely satisfied in real-world applications. In this work, we extend generalization guarantees to the broader setting of non-compact symmetries, such as translations, and to non-invariant data distributions. Building on the PAC-Bayes framework, we adapt and tighten existing bounds, demonstrating the approach on McAllester's PAC-Bayes bound while showing that it applies to a wide range of PAC-Bayes bounds. We validate our theory with experiments on several datasets with non-uniform and non-compact transformations, where the derived guarantees not only hold but also improve upon prior results. These findings provide theoretical evidence that, for symmetric data, symmetric models are preferable beyond the narrow setting of compact groups and invariant distributions, opening the way to a more general understanding of symmetries in machine learning.
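As background for the bounds being adapted, here is a minimal Python sketch of one commonly quoted form of McAllester's PAC-Bayes bound (with Maurer's refinement of the logarithmic term), assuming a loss bounded in $[0, 1]$, a data-independent prior $P$, and $n$ i.i.d. samples: with probability at least $1 - \delta$, simultaneously for every posterior $Q$, $R(Q) \le \hat{R}(Q) + \sqrt{(\mathrm{KL}(Q \| P) + \ln(2\sqrt{n}/\delta)) / (2n)}$. The symmetry-aware adaptation developed in the paper is not reproduced here.

```python
import math


def pac_bayes_bound(emp_risk, kl, n, delta):
    """Upper bound on the true risk of a posterior Q (losses in [0, 1]).

    emp_risk : empirical risk of Q on the n training samples
    kl       : KL(Q || P) between the posterior Q and a data-independent prior P
    n        : number of i.i.d. training samples (this form is stated for n >= 8)
    delta    : confidence parameter; the bound holds with probability >= 1 - delta
    """
    if n < 8:
        raise ValueError("this form of the bound is stated for n >= 8")
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    # Complexity term: KL divergence plus the confidence penalty, scaled by 1 / (2n).
    complexity = (kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n)
    return emp_risk + math.sqrt(complexity)
```

For instance, an empirical risk of 0.1 with $\mathrm{KL} = 5$, $n = 1000$, and $\delta = 0.05$ gives a bound of about 0.18; the bound tightens as $n$ grows and loosens as the posterior moves away from the prior.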

2510.15762 2026-06-02 stat.AP

Incorporating estimands into meta-analyses of clinical trials

Antonio Remiro-Azócar, Pepa Polavieja, Emmanuelle Boutmy, Alessandro Ghiretti, Lise Lotte Nystrup Husemoen, Khadija Rerhou Rantell, Tatsiana Vaitsiakhovich, David M. Phillippo, Jay J. H. Park, Helle Lynggaard, Robert Bauer, Antonia Morga

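AI summary: Proposes a pragmatic framework for incorporating estimands into meta-analyses of clinical trials to systematically identify and mitigate sources of quantitative heterogeneity and to improve the applicability and external validity of pooled estimates.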

Comments
28 pages, 6 figures, 6 tables. Version accepted for publication by Research Synthesis Methods following minor revisions

English abstract

The estimand framework is increasingly established to pose research questions in confirmatory clinical trials. In evidence synthesis, the uptake of estimands has been modest, and the PICO (Population, Intervention, Comparator, Outcome) framework is more often applied. While PICOs and estimands have overlapping elements, the estimand framework explicitly considers different strategies for intercurrent events. We propose a pragmatic framework for the use of estimands in meta-analyses of clinical trials, highlighting the value of estimands to systematically identify and mitigate key sources of quantitative heterogeneity, and to enhance the applicability or external validity of pooled estimates. Focus is placed on the role of strategies for intercurrent events, within the specific context of meta-analyses for health technology assessment. We apply the estimand framework to a network meta-analysis of clinical trials, comparing the efficacy of semaglutide versus dulaglutide in type 2 diabetes. We explore the impact of a treatment policy strategy for treatment discontinuation or initiation of rescue medication versus a hypothetical strategy for the corresponding intercurrent events. The specification of different target estimands at the meta-analytical level allows us to be explicit about the source of heterogeneity, the intercurrent event strategy, driving any potential differences in results. We advocate for the integration of estimands into the planning of meta-analyses, while acknowledging that potential challenges exist in the absence of subject-level data. Estimands can complement PICOs to strengthen communication between stakeholders about what evidence syntheses seek to demonstrate, and to ensure that the generated evidence is maximally relevant to healthcare decision-makers.

2410.14483 2026-06-02 stat.ML cs.LG stat.ME

Interventional Processes for Causal Uncertainty Quantification

Hugh Dance, Peter Orbanz, Arthur Gretton

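AI summary: Proposes a Gaussian-process-based approach to uncertainty quantification for interventional functions, representing them as inner products of observational functions in a reproducing kernel Hilbert space, with closed-form posterior moments and tractable training and inference.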

English abstract

Reliable uncertainty quantification for causal effects is crucial in high-stakes applications, but remains challenging when the target is an entire function rather than a scalar estimand. In this work, we introduce a GP-based approach for uncertainty quantification of interventional functions. The central idea is to build on recent work representing interventional functions as an inner-product of observational functions in a reproducing kernel Hilbert space (RKHS), by constructing appropriate GP priors for such functions and inferring posteriors from observational data. Our approach yields closed-form posterior moments and tractable training and inference, while avoiding pathologies of previous GP prior constructions for RKHS functions. We further derive a practical procedure for posterior coverage calibration. Across synthetic benchmarks, causal Bayesian optimization tasks, and a large-scale real dataset, our method improves uncertainty quantification while remaining competitive in causal effect estimation.

2510.09288 2026-06-02 stat.ML cs.LG

A unifying Bayesian framework for adversarial robustness

Pablo G. Arce, Roi Naveiro, David Ríos Insua

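AI summary: Proposes a unifying Bayesian framework that models adversarial uncertainty through a stochastic channel, yielding two robustification strategies aligned with adversarial training and adversarial purification, and demonstrates the benefits of explicitly modeling adversarial uncertainty.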

English abstract

The vulnerability of machine learning models to adversarial attacks remains a critical societal security challenge. Traditional defenses, such as adversarial training, typically robustify models by minimizing a worst-case loss. These deterministic approaches do not account for uncertainty in the adversary's attack. While stochastic defenses placing a probability distribution on the adversary exist, they often lack statistical rigor and fail to make explicit their underlying assumptions. To resolve these issues, we introduce a formal Bayesian framework that models adversarial uncertainty through a stochastic channel, articulating all probabilistic assumptions. This yields two robustification strategies: a proactive defense enacted during training, aligned with adversarial training, and a reactive defense enacted during operations, aligned with adversarial purification. Several state-of-the-art defenses can be recovered as limiting cases of our model. We empirically validate our methodology, showcasing the benefits of explicitly modeling adversarial uncertainty.

2510.05566 2026-06-02 stat.ML cs.AI cs.CL cs.LG stat.AP

Domain-Shift-Aware Conformal Prediction for Large Language Models

Zhexiao Lin, Yuanyuan Li, Neeraj Sarna, Yuanyuan Gao, Michael von Gablenz

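AI summary: Proposes a domain-shift-aware conformal prediction framework that reweights calibration samples to handle distribution shift, improving coverage reliability on the MMLU benchmark.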

Comments
Accepted to Forty-Third International Conference on Machine Learning (ICML), 2026

English abstract

Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real-world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose a new framework called Domain-Shift-Aware Conformal Prediction (DS-CP). Our framework adapts conformal prediction to large language models under domain shift, by systematically reweighting calibration samples based on their proximity to the test prompt, thereby preserving validity while enhancing adaptivity. Our theoretical analysis and experiments on the MMLU benchmark demonstrate that the proposed method delivers more reliable coverage than standard conformal prediction, especially under substantial distribution shifts, while maintaining efficiency. This provides a practical step toward trustworthy uncertainty quantification for large language models in real-world deployment.
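The abstract does not give DS-CP's exact weighting scheme, so the following is a hedged sketch of the generic idea it builds on: weighted split conformal prediction, in which calibration scores are reweighted by their proximity to the test prompt and the test point contributes an extra mass at $+\infty$. Here proximity is assumed to be a Gaussian kernel between hypothetical prompt embeddings; the embeddings, `bandwidth`, and all function names are illustrative assumptions, not the paper's API.

```python
import numpy as np


def gaussian_similarity(cal_emb, test_emb, bandwidth):
    """Hypothetical weights: Gaussian kernel between calibration and test embeddings."""
    diff = np.asarray(cal_emb, float) - np.asarray(test_emb, float)
    return np.exp(-np.sum(diff**2, axis=1) / (2.0 * bandwidth**2))


def weighted_conformal_threshold(cal_scores, cal_weights, test_weight, alpha):
    """(1 - alpha) quantile of the normalized mixture
    sum_i w_i * delta(s_i) + w_test * delta(+inf)."""
    scores = np.asarray(cal_scores, float)
    w = np.asarray(cal_weights, float)
    order = np.argsort(scores)
    scores, w = scores[order], w[order]
    cdf = np.cumsum(w) / (w.sum() + test_weight)  # weighted CDF at each sorted score
    idx = np.searchsorted(cdf, 1.0 - alpha)       # first index with cdf >= 1 - alpha
    return scores[idx] if idx < len(scores) else np.inf


def prediction_set(option_scores, threshold):
    """Indices of answer options (e.g. multiple-choice letters) within the threshold."""
    return [i for i, s in enumerate(option_scores) if s <= threshold]
```

With the Gaussian kernel, the test point's self-similarity is 1, so `test_weight=1.0` is a natural choice; calibration prompts far from the test prompt then contribute little to the threshold, which is the adaptivity the abstract describes.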

2505.18102 2026-06-02 cs.LG cs.AI cs.CL stat.ME

CapBencher: Give Your LLM Benchmark a Built-in Alarm for Test-Set Overfitting

Takashi Ishida, Thanawat Lodkaew, Ikko Yamane

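AI summary: Proposes CapBencher, which lowers the Bayes accuracy by injecting randomness into the answers (preparing several logically correct answers but using only one as the solution), so that benchmarks can be published while guarding against test-set overfitting and detecting leakage or gaming.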

Comments
ICML 2026 camera ready version

English abstract

Publishing a large language model (LLM) benchmark (especially its ground-truth answers) on the Internet risks contaminating future LLMs and enabling evaluation gaming: it may be unintentionally (or intentionally) used to train or select a model, or exploited to overfit and hack leaderboards when labels are accessible. A common mitigation is to keep the benchmark private and let participants submit their models or predictions to the organizers, but this still permits test-set overfitting through feedback loops. To overcome this issue, we propose CapBencher, a way to publish benchmarks without fully disclosing the ground-truth answers, while preserving open evaluation of LLMs. The main idea is to reduce the best possible accuracy, i.e., Bayes accuracy, by injecting randomness into the answers: several logically correct answers are prepared, and only one of them is included as the solution in the benchmark. Not only does this obscure the ground-truth answers, but it also offers a test for leakage or gaming: since even fully capable models should not surpass the Bayes accuracy, any model that does is a strong signal of leakage or gaming. We show theoretically and empirically that CapBencher accurately detects test-set overfitting across diverse benchmarks, models, training methodologies, and scenarios.
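A hedged sketch of the detection logic, assuming each question has `k` equally valid answers and the released one is drawn uniformly at random, so no model can exceed an expected accuracy of $1/k$ on it. Treating questions as independent, a model at or below the Bayes accuracy has a score no larger (stochastically) than a Binomial$(n, 1/k)$ draw, so a one-sided binomial tail probability measures how surprising a high score is. The test and the `level` threshold are illustrative choices, not necessarily the statistic used in the paper.

```python
import math


def bayes_accuracy(num_valid_answers):
    """Best achievable expected accuracy when the released answer is drawn
    uniformly from num_valid_answers equally correct answers (assumed setup)."""
    return 1.0 / num_valid_answers


def binomial_tail(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flags_overfitting(num_correct, num_questions, bayes_acc, level=0.01):
    """One-sided binomial test: flag a score that is implausibly high for any
    model whose per-question accuracy is at most the Bayes accuracy
    (questions assumed independent)."""
    return binomial_tail(num_questions, num_correct, bayes_acc) < level
```

For example, with four valid answers per question (`bayes_accuracy(4) == 0.25`), a model answering 40 of 100 questions correctly is flagged, whereas 26 of 100 is consistent with an honest model.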

2510.03494 2026-06-02 cs.LG stat.ML

Trajectory Data Suffices for Statistically Efficient Policy Evaluation in Fixed-Horizon Offline RL with Linear $q^π$-Realizability and Concentrability

Volodymyr Tkachuk, Csaba Szepesvári, Xiaoqi Tan

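AI summary: Shows that, assuming trajectory data together with linear $q^π$-realizability and concentrability, statistically efficient policy evaluation is possible in fixed-horizon offline reinforcement learning, and improves the sample-complexity analysis for policy optimization.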

English abstract

We study finite-horizon offline reinforcement learning (RL) with function approximation for both policy evaluation and policy optimization. Prior work established that statistically efficient learning is impossible for either of these problems when the only assumptions are that the data has good coverage (concentrability) and the state-action value function of every policy is linearly realizable ($q^π$-realizability) (Foster et al., 2021). Recently, Tkachuk et al. (2024) gave a statistically efficient learner for policy optimization, if in addition the data is assumed to be given as trajectories. In this work we present a statistically efficient learner for policy evaluation under the same assumptions. Further, we show that the sample complexity of the learner used by Tkachuk et al. (2024) for policy optimization can be improved by a tighter analysis.

2506.21027 2026-06-02 stat.ME stat.AP

Simultaneous estimation of the effective reproduction number and the time series of daily infections: Application to Covid-19

Hans R. Künsch, Fabio Sigrist

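AI summary: Proposes a Bayesian method that jointly estimates daily new infections and the effective reproduction number with a novel MCMC algorithm, and is more accurate than a three-step approach on Covid-19 data.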

English abstract

The time-varying effective reproduction number is an important parameter for communication and policy decisions during an epidemic. In this paper, we present new statistical methods for estimating the reproduction number based on the popular model of Cori et al. (2013), which defines the effective reproduction number based on self-exciting dynamics of new infections. Such a model is conceptually simple and less susceptible to misspecification than more complicated multi-compartment models. However, statistical inference is challenging, and the previous literature has relied on proxy data and/or a two-step approach in which the number of infections is first estimated. In contrast, we present a coherent Bayesian method that approximates the joint posterior of daily new infections and reproduction numbers using a novel Markov chain Monte Carlo (MCMC) algorithm. Comparing our method to the state-of-the-art three-step estimation procedure of Huisman et al. (2022), using both daily confirmed Covid-19 cases from Switzerland and simulated data, we find that our method is more accurate in terms of point estimates and uncertainty quantification, especially near the beginning and end of an observation period.
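To make the renewal model concrete: in the model of Cori et al. (2013), new infections on day $t$ have mean $R_t \Lambda_t$, with infection pressure $\Lambda_t = \sum_{s \ge 1} w_s I_{t-s}$, where $w$ is the generation-interval distribution. The minimal Python sketch below simulates this model and computes the classic conjugate Gamma posterior mean of $R$ over a trailing window, assuming the daily infections $I_t$ are observed exactly. That assumption is precisely what the paper removes by sampling the unobserved infections jointly with $R_t$ via MCMC, which is not reproduced here. The Gamma prior with shape 1 and scale 5 is a common default and an assumption in this sketch.

```python
import numpy as np


def infection_pressure(infections, w):
    """Lambda_t = sum_{s>=1} w_s * I_{t-s}; w[0] is the weight for a one-day lag."""
    I = np.asarray(infections, float)
    lam = np.zeros_like(I)
    for t in range(len(I)):
        for s in range(1, min(t, len(w)) + 1):
            lam[t] += w[s - 1] * I[t - s]
    return lam


def simulate_renewal(R, w, I0, rng):
    """Simulate I_t ~ Poisson(R_t * Lambda_t), starting from I0 infections on day 0."""
    I = [I0]
    for t in range(1, len(R)):
        lam = sum(w[s - 1] * I[t - s] for s in range(1, min(t, len(w)) + 1))
        I.append(rng.poisson(R[t] * lam))
    return np.array(I)


def cori_posterior_mean(infections, w, t, tau, a=1.0, b=5.0):
    """Posterior mean of R over days t-tau+1..t under a Gamma(shape=a, scale=b)
    prior, treating the daily infections as exactly observed."""
    I = np.asarray(infections, float)
    lam = infection_pressure(I, w)
    window = slice(t - tau + 1, t + 1)
    shape = a + I[window].sum()           # prior shape + observed infections
    rate = 1.0 / b + lam[window].sum()    # prior rate + accumulated pressure
    return shape / rate
```

For a constant 100 infections per day with all infectiousness at a one-day lag (`w = [1.0]`), the posterior mean over a 7-day window is 701 / 700.2, essentially 1, as expected for a stable epidemic.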

2509.23544 2026-06-02 stat.ML cs.AI cs.LG stat.ME

End-to-End Deep Learning for Predicting Metric Space-Valued Outputs

Yidong Zhou, Su I Iao, Hans-Georg Müller

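AI summary: Proposes the E2M framework, which makes geometry-aware predictions of metric-space-valued outputs via weighted Fréchet means with weights learned by a neural network; it comes with theoretical guarantees and achieves state-of-the-art performance on several kinds of structured outputs.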

Journal ref
Journal of Machine Learning Research, 27:1--38, 2026
Comments
38 pages, 4 figures, 9 tables

English abstract

Many modern applications involve predicting structured, non-Euclidean outputs such as probability distributions, networks, and symmetric positive-definite matrices. These outputs are naturally modeled as elements of general metric spaces, where classical regression techniques that rely on vector space structure no longer apply. We introduce E2M (End-to-End Metric regression), a deep learning framework for predicting metric space-valued outputs. E2M performs prediction via weighted Fréchet means over training outputs, where the weights are learned by a neural network conditioned on the input. This construction provides a principled mechanism for geometry-aware prediction that avoids surrogate embeddings and restrictive parametric assumptions, while fully preserving the intrinsic geometry of the output space. We establish theoretical guarantees, including a universal approximation theorem that characterizes the expressive capacity of the model and a convergence analysis of the entropy-regularized training objective. Through extensive simulations involving probability distributions, networks, and symmetric positive-definite matrices, we show that E2M consistently achieves state-of-the-art performance, with its advantages becoming more pronounced at larger sample sizes. Applications to human mortality distributions and New York City taxi networks further demonstrate the flexibility and practical utility of this framework.
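As a hedged illustration of prediction via weighted Fréchet means, consider the special case of one-dimensional probability distributions under the 2-Wasserstein metric, where the weighted Fréchet mean has a closed form: its quantile function is the weighted average of the training quantile functions. The sketch assumes the weights are a softmax of scores that some network would output for the input; the scores, grid, and function names are hypothetical, and general metric spaces require solving the Fréchet-mean minimization instead.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax: nonnegative weights that sum to one."""
    z = np.asarray(z, float)
    e = np.exp(z - z.max())
    return e / e.sum()


def wasserstein_frechet_mean(quantile_fns, weights):
    """Weighted Fréchet mean of 1-D distributions under the 2-Wasserstein metric.

    quantile_fns : (n_train, n_grid) array; row i is the quantile function of
                   training output i evaluated on a shared probability grid
    weights      : (n_train,) nonnegative weights summing to one
    A nonnegative average of nondecreasing quantile functions is again a valid
    quantile function, so the result is a distribution on the same grid.
    """
    return np.asarray(weights, float) @ np.asarray(quantile_fns, float)


def predict_distribution(scores, train_quantile_fns):
    """Hypothetical E2M-style prediction: scores would come from a network applied
    to the input; softmax turns them into Fréchet-mean weights."""
    return wasserstein_frechet_mean(train_quantile_fns, softmax(scores))
```

For instance, equal scores for two training distributions that differ only by a shift of 2 produce their midpoint, shifted by 1, rather than a bimodal mixture; this is the geometry-aware behavior that averaging densities would not give.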

2509.18025 2026-06-02 math.OC cs.AI cs.LG math.LO stat.ML

Deep Learning as the Disciplined Construction of Tame Objects

Gilles Bareilles, Allen Gehret, Johannes Aspman, Jana Lepšová, Jakub Mareček

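AI summary: Uses the framework of tame geometry (o-minimality) to present the mathematical foundations of deep-learning models as compositions of functions, and shows how it yields convergence guarantees for stochastic gradient descent in nonsmooth, nonconvex, but tame settings.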

Comments
39 pages, 10 figures

English abstract

One can see deep-learning models as compositions of functions within the so-called tame geometry. In this expository note, we give an overview of some topics at the interface of tame geometry (also known as o-minimality), optimization theory, and deep learning theory and practice. To do so, we gradually introduce the concepts and tools used to build convergence guarantees for stochastic gradient descent in a general nonsmooth nonconvex, but tame, setting. This illustrates some ways in which tame geometry is a natural mathematical framework for the study of AI systems, especially deep learning.

2509.17557 2026-06-02 stat.AP

A Bayesian approach to aggregated chemical exposure assessment

Sophie Van Den Neucker, Alexander Grigoriev, Heidi Demaegdt, Jan Mast, Karlien Cheyns, Sofie De Broe, Roberto Cerina

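AI summary: Proposes a Bayesian framework that integrates multi-source data and simulates individual exposure scenarios to obtain robust estimates of aggregated chemical exposure, demonstrated on titanium dioxide.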

English abstract

Human exposure to chemicals commonly arises from multiple sources, yet traditional assessments often treat these sources in isolation, overlooking their combined impact. We introduce a Bayesian framework for aggregated chemical exposure assessment that explicitly accounts for these intertwined pathways. By integrating diverse datasets (such as consumption surveys, demographics, chemical measurements, and market presence), our approach addresses typical data challenges, including missing values, limited sample sizes, and inconsistencies, while incorporating relevant prior knowledge. Through a simulation-based strategy that reflects the full spectrum of individual exposure scenarios, we derive robust, population-level estimates of aggregated exposure. We demonstrate the value of this method using titanium dioxide, a chemical found in foods, dietary supplements, medicines, and personal care products. By capturing the complexity of real-world exposures, this comprehensive Bayesian approach provides decision-makers with more reliable probabilistic estimates to inform public health policies.
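A minimal Monte Carlo sketch of the aggregation step, assuming each source contributes consumption times concentration divided by body weight whenever the product actually contains the chemical (market presence). All samplers, source names, and parameter values are hypothetical placeholders. In the Bayesian framework described above, these quantities would be posterior predictive draws informed by the integrated data, which this sketch does not model.

```python
import numpy as np


def simulate_aggregate_exposure(sources, body_weight, n_sim, rng):
    """Monte Carlo draws of aggregated exposure (mg per kg body weight per day).

    sources : dict name -> (consumption, concentration, presence_prob), where
              consumption(rng, n)   returns daily amounts used (g/day),
              concentration(rng, n) returns chemical concentrations (mg/g),
              presence_prob is the probability a product contains the chemical.
    body_weight(rng, n) returns body weights (kg).
    """
    total = np.zeros(n_sim)
    for consumption, concentration, presence in sources.values():
        contains = rng.random(n_sim) < presence  # market-presence indicator per individual
        total += consumption(rng, n_sim) * concentration(rng, n_sim) * contains
    return total / body_weight(rng, n_sim)


if __name__ == "__main__":
    # Hypothetical usage: every number below is a placeholder, not data from the paper.
    rng = np.random.default_rng(42)
    sources = {
        "food": (lambda r, n: r.lognormal(3.0, 0.5, n), lambda r, n: r.lognormal(-4.0, 0.3, n), 0.3),
        "toothpaste": (lambda r, n: r.lognormal(0.0, 0.2, n), lambda r, n: r.lognormal(-3.0, 0.3, n), 0.6),
    }
    weight = lambda r, n: np.maximum(r.normal(70.0, 10.0, n), 40.0)
    exposure = simulate_aggregate_exposure(sources, weight, 10_000, rng)
    print("median and 95th percentile:", np.percentile(exposure, [50, 95]))
```

Summing per-source contributions within each simulated individual, rather than adding source-wise percentiles, is what lets the upper tail reflect people exposed through several pathways at once.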

2412.05998 2026-06-02 stat.ME

B-MASTER: Scalable Bayesian Multivariate Regression for Master Predictor Discovery in Colorectal Cancer Microbiome-Metabolite Profiles

Priyam Das, Tanujit Dey, Christine Peterson, Sounak Chakraborty

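AI summary: Proposes B-MASTER, a scalable Bayesian multivariate regression framework that combines L1 sparsity and L2 group shrinkage to identify microbial genera that systematically regulate the metabolome, and identifies key master predictors in colorectal cancer data.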

English abstract

Motivation: The gut microbiome shapes cancer therapy response through its influence on host metabolism. While prior studies examine pairwise associations between individual genera and metabolites, there is limited methodology for identifying microbial genera that systematically regulate the overall metabolome. Scalable statistical tools are needed to uncover such system-level 'master predictors' in high-dimensional microbiome-metabolome data. Results: We introduce B-MASTER, a scalable Bayesian multivariate regression framework combining L1 sparsity and L2 group shrinkage to identify essential cross-metabolite regulators. A Gibbs sampler enables near-linear computational scaling, supporting models with millions of parameters. The method is supported by theoretical guarantees, including posterior contraction and selection consistency. Analysis of colorectal cancer microbiome-metabolome data reveals key microbial genera that govern global and cancer-associated metabolite patterns, highlighting system-level regulatory structure. Availability: The B-MASTER code, including demonstration scripts, is available at https://github.com/priyamdas2/B-MASTER. An archived snapshot of the code corresponding to this manuscript is available on Zenodo with DOI: 10.5281/zenodo.20484958.

2509.07602 2026-06-02 stat.ME

Adaptive clinical trial design with delayed treatment effects using elicited prior distributions

James Salsbury, Jeremy Oakley, Steven Julious, Lisa Hampson

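AI summary: Targeting the delayed treatment effects common in modern therapies such as immuno-oncology, proposes an adaptive trial design framework that incorporates expert-elicited prior distributions and uses interim analyses to guide early stopping decisions, improving trial efficiency and robustness.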

English abstract

Clinical trials with time-to-event endpoints, such as overall survival (OS) or progression-free survival (PFS), are fundamental for evaluating new treatments, particularly in immuno-oncology. However, modern therapies, such as immunotherapies and targeted treatments, often exhibit delayed effects that challenge traditional trial designs. These delayed effects violate the proportional hazards assumption, which underpins standard statistical methods like the Cox proportional hazards model and the log-rank test. Careful planning is essential to ensure trials are appropriately designed to account for the timing and magnitude of these effects. Without this planning, interim analyses may lead to premature trial termination if the treatment effect is underestimated early in the study. We present an adaptive trial design framework that incorporates prior distributions, elicited from experts, for delayed treatment effects. By addressing the uncertainty surrounding delayed treatment effects, our approach enhances trial efficiency and robustness, minimizing the risk of premature termination and improving the detection of treatment benefits over time. We present an example illustrating how interim analyses, informed by prior distributions, can guide early stopping decisions. To facilitate the implementation of our framework, we have developed free, open-source software that enables researchers to integrate prior distributions into trial planning and decision-making. This software provides a flexible, accessible tool for designing trials that more accurately evaluate modern therapies through adaptive trial designs.
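A minimal sketch of how a delayed treatment effect breaks proportional hazards, assuming a piecewise-constant hazard: the treatment arm shares the control hazard until a delay, after which that hazard is multiplied by a hazard ratio `hr`. Event times are drawn by inverting the cumulative hazard. Drawing the delay and `hr` from prior distributions, as in the usage lines, mimics the role of elicited priors; the Gamma and lognormal priors shown are hypothetical placeholders, not the experts' elicited distributions, and the paper's interim decision rules are not reproduced.

```python
import numpy as np


def sample_delayed_effect_times(n, base_hazard, delay, hr, rng):
    """Event times with hazard base_hazard before `delay` and base_hazard * hr after.

    Inverse-transform sampling: draw a unit-exponential cumulative hazard E and
    find the time at which the piecewise-linear cumulative hazard reaches E.
    """
    e = rng.exponential(1.0, size=n)  # target cumulative hazard values
    h_at_delay = base_hazard * delay  # cumulative hazard accrued before the effect starts
    return np.where(
        e < h_at_delay,
        e / base_hazard,                               # event before the effect starts
        delay + (e - h_at_delay) / (base_hazard * hr),  # event after the effect starts
    )


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Hypothetical priors standing in for elicited ones.
    delay = rng.gamma(shape=4.0, scale=1.0)     # months until the effect starts
    hr = np.exp(rng.normal(np.log(0.6), 0.15))  # post-delay hazard ratio
    control = sample_delayed_effect_times(300, 0.05, 0.0, 1.0, rng)
    treated = sample_delayed_effect_times(300, 0.05, delay, hr, rng)
    print(f"delay={delay:.1f}, hr={hr:.2f}, "
          f"median control={np.median(control):.1f}, median treated={np.median(treated):.1f}")
```

Because the two arms have identical hazards until the delay, early interim looks see little separation between the survival curves, which is why an interim analysis that ignores the delay can stop a beneficial trial prematurely.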

2509.05563 2026-06-02 stat.ME math.ST stat.AP stat.ML stat.TH

Geometry-preserving and interpretable dimension reduction for compositional data

Junyoung Park, Cheolwoo Park, Jeongyoun Ahn

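AI summary: Addressing the simplex constraint and excess zeros of high-dimensional compositional data, proposes a geometry-preserving dimension-reduction framework that maps compositions directly to a lower-dimensional simplex, enabling interpretable soft amalgamation and dual visualization; estimates the central compositional subspace with a kernel-based method and proves consistency even when the estimated and target subspaces have different dimensions.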

Comments
61 pages, 4 figures

English abstract

High-dimensional compositional data pose unique statistical challenges due to the simplex constraint and excess zeros. While dimension reduction is indispensable for analyzing such data, conventional approaches often rely on log-ratio transformations that compromise interpretability and distort the data through ad hoc zero replacements. To address these issues, we introduce a geometry-preserving framework for dimension reduction of compositional data, mapping high-dimensional compositions directly to a lower-dimensional simplex. This framework is interpretable as a softened amalgamation of compositions and enables dual visualization -- showing both projected data and how variables contribute to reduced components -- for at-a-glance interpretation. Within this geometry, we define a new sufficient dimension reduction (SDR) approach for compositional predictors, whose identifiable object, termed the central compositional subspace, differs from the classical central subspace in Euclidean SDR. For estimation, we propose a kernel-based method that yields sparse solutions and comes with an intrinsic predictive model for direct downstream analyses. We prove consistency through a new subspace-comparison argument that allows the estimated and target subspaces to have different dimensions. Applications to real microbiome datasets demonstrate that our approach provides a powerful graphical exploration tool for uncovering meaningful biological patterns in high-dimensional compositional data.
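One way to read "softened amalgamation" (an assumption here, since the abstract does not define the map) is as multiplication by a nonnegative matrix whose columns sum to one. Each original part is then spread over the reduced parts, the output stays on the simplex, and zeros need no replacement because no logarithms are taken. The minimal sketch below shows only this mapping, not the paper's kernel-based estimator of the central compositional subspace.

```python
import numpy as np


def soft_amalgamate(X, A):
    """Map D-part compositions (rows of X, each summing to 1) to d-part compositions.

    A : (d, D) nonnegative matrix whose columns sum to 1, so column j spreads
        part j of the input over the d output parts. A 0/1 matrix gives a
        classical amalgamation (summing groups of parts); fractional entries
        give a softened version. Zeros in X are fine: no logarithms are taken.
    """
    X = np.asarray(X, float)
    A = np.asarray(A, float)
    if (A < 0).any() or not np.allclose(A.sum(axis=0), 1.0):
        raise ValueError("A must be nonnegative with columns summing to 1")
    return X @ A.T  # each output row is nonnegative and sums to 1
```

Reading the columns of `A` directly shows how much of each original part feeds each reduced component, which is the kind of at-a-glance interpretation the dual visualization aims for.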

2509.04631 2026-06-02 cs.LG cs.IT math.IT stat.ML

Fundamental bounds on efficiency-confidence trade-off for transductive conformal prediction

Arash Behboodi, Alvaro H. C. Correia, Fabio Valerio Massoli, Christos Louizos

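AI summary: Proves a strict finite-sample bound on the trade-off between confidence and efficiency (prediction-set size) in transductive conformal prediction, showing that any non-trivial confidence level makes the prediction-set size grow exponentially for data with inherent uncertainty, and proposes a practical algorithm that approaches the bound.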

English abstract

Transductive conformal prediction addresses simultaneous prediction for multiple data points. Given a desired confidence level, the objective is to construct a prediction set that includes the true outcomes with the prescribed confidence. We demonstrate a fundamental trade-off between confidence and efficiency in transductive methods, where efficiency is measured by the size of the prediction sets. Specifically, we derive a strict finite-sample bound showing that any non-trivial confidence level leads to exponential growth in prediction set size for data with inherent uncertainty. The exponent scales linearly with the number of samples and is proportional to the conditional entropy of the data. Additionally, the bound includes a second-order term, dispersion, defined as the variance of the log conditional probability distribution. We show that transductive methods based on the approximate conditional distribution can approach this bound. Inspired by this setup, we introduce a practical transductive prediction algorithm that surpasses Bonferroni methods.
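
The Bonferroni baseline can be sketched with split conformal prediction. Each of the m test points gets its own set at level α/m, so by the union bound all m sets contain their true labels with probability at least 1 - α. The joint prediction set is the Cartesian product of the per-point sets. Its size is therefore a product of set sizes, which grows exponentially in m whenever the per-point sets contain more than one label. The scores below are made-up inputs, where a lower score means a more plausible label. This sketch is the baseline, not the paper's algorithm.

```python
import math
from typing import Dict, List, Sequence


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return sorted(cal_scores)[k - 1]


def bonferroni_sets(
    cal_scores: Sequence[float], test_scores: List[Dict[str, float]], alpha: float
) -> List[List[str]]:
    """Per-point sets at level alpha / m; jointly they cover with prob. >= 1 - alpha."""
    tau = conformal_threshold(cal_scores, alpha / len(test_scores))
    return [[label for label, s in scores.items() if s <= tau] for scores in test_scores]


def joint_set_size(sets: List[List[str]]) -> int:
    """Size of the Cartesian-product (joint) prediction set."""
    return math.prod(len(s) for s in sets)


cal = [i / 100 for i in range(1, 100)]  # 99 hypothetical calibration scores
tests = [
    {"a": 0.10, "b": 0.60, "c": 0.999},
    {"a": 0.97, "b": 0.20, "c": 0.50},
]
sets = bonferroni_sets(cal, tests, alpha=0.1)
print(sets, joint_set_size(sets))  # [['a', 'b'], ['b', 'c']] 4
```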

2509.03456 2026-06-02 stat.ML cs.LG

Off-Policy Learning in Large Action Spaces: Optimization Matters More Than Estimation

Imad Aouali, Otmane Sakhi

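AI summary: Studies off-policy learning in offline contextual bandits, finds that existing methods face severe optimization problems in large action spaces, and shows that weighted log-likelihood objectives improve optimization while yielding competitive policies.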

Comments
ICML '26

English abstract

Off-policy evaluation (OPE) and off-policy learning (OPL) are foundational for decision-making in offline contextual bandits. Recent advances in OPL primarily optimize OPE estimators with improved statistical properties, assuming that better estimators inherently yield superior policies. Although theoretically justified, this estimator-centric approach neglects a critical practical obstacle: challenging optimization landscapes. In this paper, we provide theoretical insights and empirical evidence showing that current OPL methods encounter severe optimization issues, particularly as the action space grows. We show that estimator-aware policy parametrization can mitigate, but not fully resolve, optimization challenges. Building on this, we explore simpler weighted log-likelihood objectives and demonstrate that they enjoy substantially better optimization properties and still recover competitive, often superior, learned policies. Our findings emphasize the necessity of explicitly addressing optimization considerations in the development of OPL algorithms for large action spaces.
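
To make the contrast concrete, here is a small sketch that assumes a tabular softmax policy and logged tuples of the form (context, action, logging propensity, reward). It computes the inverse-propensity-scoring (IPS) estimate of the policy's value, a typical OPE objective, next to a reward-weighted log-likelihood. Using the observed reward as the weight is an assumption made for illustration; the paper studies weighted log-likelihood objectives more generally.

```python
import math
from typing import Dict, List, Tuple

Theta = Dict[str, List[float]]             # per-context logits of a tabular policy
Log = List[Tuple[str, int, float, float]]  # (context, action, logging propensity, reward)


def softmax_policy(theta: Theta, x: str) -> List[float]:
    logits = theta[x]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ips_value(theta: Theta, logs: Log) -> float:
    """Inverse-propensity-scoring estimate of the target policy's value."""
    return sum(softmax_policy(theta, x)[a] / p0 * r for x, a, p0, r in logs) / len(logs)


def weighted_log_likelihood(theta: Theta, logs: Log) -> float:
    """Reward-weighted log-likelihood of the logged actions (a supervised-style surrogate)."""
    return sum(r * math.log(softmax_policy(theta, x)[a]) for x, a, _p0, r in logs) / len(logs)


logs: Log = [("u1", 0, 0.5, 1.0), ("u1", 1, 0.5, 0.0), ("u2", 1, 0.25, 1.0)]
theta: Theta = {"u1": [0.0, 0.0], "u2": [0.0, 0.0, 0.0, 0.0]}
print(ips_value(theta, logs), weighted_log_likelihood(theta, logs))  # about 0.667 and -0.693
```

The IPS objective divides by logging propensities, which can be tiny when the action space is large. The weighted log-likelihood, by contrast, only needs log π(a|x) for the logged actions. This difference is one intuition for why the latter can be easier to optimize.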

2508.15508 2026-06-02 stat.AP

Post-processing of ensemble photovoltaic power forecasts with distributional and quantile regression methods

Martin János Mayer, Ágnes Baran, Sebastian Lerch, Nina Horat, Dazhi Yang, Sándor Baran

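AI summary: To correct systematic biases in ensemble PV power forecasts, systematically evaluates seven statistical post-processing methods (parametric and non-parametric, statistical and machine learning-based), finding that non-parametric methods outperform parametric ones, with advanced nonlinear quantile regression models performing best.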

Journal ref
Solar Energy 307 (2026), paper 114361
Comments
34 pages, 12 figures, 6 tables

English abstract

Accurate and reliable forecasting of photovoltaic (PV) power generation is crucial for grid operations, electricity markets, and energy planning, as solar systems now contribute a significant share of the electricity supply in many countries. PV power forecasts are often generated by converting forecasts of relevant weather variables to power predictions via a model chain. The use of ensemble simulations from numerical weather prediction models results in probabilistic PV forecasts in the form of a forecast ensemble. However, weather forecasts often exhibit systematic errors that propagate through the model chain, leading to biased and/or uncalibrated PV power predictions. These deficiencies can be mitigated by statistical post-processing. Using PV production data and corresponding short-term PV power ensemble forecasts at seven utility-scale PV plants in Hungary, we systematically evaluate and compare seven state-of-the-art methods for post-processing PV power forecasts. These include both parametric and non-parametric techniques, as well as statistical and machine learning-based approaches. Our results show that compared to the raw PV power ensemble, any form of statistical post-processing significantly improves the predictive performance. Non-parametric methods outperform parametric models, with advanced nonlinear quantile regression models showing the best results. Furthermore, machine learning-based approaches surpass their traditional statistical counterparts.
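
Comparisons of this kind rely on proper scoring rules. The sketch below shows two of them; the paper's exact evaluation metrics may differ. The pinball loss scores a single quantile forecast and is the quantity quantile regression minimizes. The continuous ranked probability score (CRPS) of a raw ensemble can be computed with the kernel form CRPS = mean|x_i - y| - (1/2)·mean|x_i - x_j|. The ensemble values are hypothetical.

```python
from typing import Sequence


def pinball_loss(q_pred: float, y: float, tau: float) -> float:
    """Pinball (quantile) loss of a tau-quantile forecast q_pred for outcome y."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def ensemble_crps(members: Sequence[float], y: float) -> float:
    """CRPS of the empirical distribution of an ensemble, in kernel form."""
    m = len(members)
    spread_to_obs = sum(abs(x - y) for x in members) / m
    spread_within = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return spread_to_obs - spread_within


ens = [10.0, 12.0, 14.0]  # hypothetical PV power ensemble (MW)
print(ensemble_crps(ens, 13.0), pinball_loss(12.0, 13.0, tau=0.9))  # about 0.778 and 0.9
```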

2410.17105 2026-06-02 econ.EM stat.AP

General Seemingly Unrelated Local Projections

Florian Huber, Christian Matthes, Michael Pfarrhofer

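AI summary: Proposes a Bayesian framework that jointly models all local projections as a seemingly unrelated system of equations and defines a flexible Gaussian-process prior, enabling joint inference on impulse responses with multiple shocks and instruments.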

Comments
Keywords: Local Projections, Impulse Responses, Instruments, Bayesian Methods, Gaussian Process; JEL: C11, C22, C26, E00

English abstract

We develop a flexible framework for Bayesian estimation of impulse responses using Local Projections (LPs) with instrumental variables. It accommodates multiple shocks and instruments, accounts for autocorrelation in multi-step forecasts by jointly modeling all LPs as a seemingly unrelated system of equations, defines a flexible yet parsimonious joint prior for impulse responses based on a Gaussian Process, and allows for joint inference about the entire vector of impulse responses. We show via Monte Carlo simulations that our approach delivers more accurate point and uncertainty estimates than standard methods. To address misspecification, we propose an optional robustification step based on power posteriors.
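
For orientation, the classical frequentist local projection runs one regression per horizon h, y_{t+h} = a_h + b_h·s_t + e_{t+h}, and reads the impulse response off the slopes b_h. The sketch below does only that, using OLS on a directly observed shock s_t with no instruments or controls. It does not implement the paper's Bayesian seemingly unrelated system or its Gaussian-process prior, which tie the horizons together. The simulated AR(1) data are hypothetical.

```python
import random
from typing import List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def local_projection_irf(y: Sequence[float], shock: Sequence[float], max_h: int) -> List[float]:
    """b_h for h = 0..max_h from separate regressions of y[t + h] on shock[t]."""
    return [ols_slope(shock[: len(shock) - h], y[h:]) for h in range(max_h + 1)]


rng = random.Random(0)
shock = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y, level = [], 0.0
for s in shock:
    level = 0.5 * level + s  # AR(1) response, so the true impulse response is 0.5 ** h
    y.append(level)
irf = local_projection_irf(y, shock, max_h=3)
print([round(b, 2) for b in irf])
```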

2507.21692 2026-06-02 stat.ME

Signal Detection under Composite Hypotheses with Identical Distributions for Signals and for Noises

Yiming Xing, Anamitra Chaudhuri, Yifan Chen

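AI summary: For multiple data streams in which all signals share one distribution and all noises share another, proposes a detection procedure that controls familywise error rates and asymptotically attains the minimum expected sample size.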

Comments
9 pages, 3 figures

English abstract

In this paper, we consider the problem of detecting signals in multiple, sequentially observed data streams, where the distribution of each stream lies in one of two common composite spaces, depending on whether it is a signal or a noise. For this problem, we study a practical yet underexplored setting in which it is known a priori that all signals have an identical distribution, as do all noises. Compared to the general setting where local distributions are free to take any values, this structure facilitates faster decision-making thanks to a smaller joint distribution space. However, it introduces additional challenges to the analysis of the problem and the design of tests, since the local distributions are now coupled. We first establish a universal lower bound on the minimum expected sample size, which characterizes the essential difficulty of the problem and involves constants that are neither the minimum Kullback-Leibler divergences from the signal/noise distribution to the noise/signal distribution space, which appear in the lower bound for the general setting, nor the Kullback-Leibler divergences between the signal distribution and the noise distribution. In addition, we propose a test that controls the two types of familywise error rates below arbitrary levels and achieves the minimum expected sample size asymptotically as the levels go to zero. Numerical studies compare the proposed test with the state-of-the-art test for the general setting and demonstrate robustness against model misspecification.

2209.00102 2026-06-02 stat.ME stat.AP

Bayesian Mixed Multidimensional Scaling for Auditory Processing

Giovanni Rebaudo, Fernando Llanos, Bharath Chandrasekaran, Abhra Sarkar

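AI summary: To address individual- and group-level heterogeneity in auditory processing, proposes a Bayesian mixed multidimensional scaling method that recovers identifiable latent features, determines the latent dimensionality automatically, and reveals differences by language background.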

English abstract

The human brain distinguishes speech sounds by mapping acoustic signals into a latent perceptual space. This space can be estimated via multidimensional scaling (MDS), preserving the similarity structure in lower dimensions. However, individual and group-level heterogeneity, especially between native and non-native listeners, remains poorly understood. Prior approaches often ignore such variability or cannot capture shared structure, limiting principled comparisons. Moreover, the literature often focuses on latent distances rather than the underlying features themselves. To address these issues, we develop a Bayesian mixed MDS method that accounts for both subject- and group-level heterogeneity, allows for the recovery of unique, identifiable latent features, facilitating their biological interpretability, while also determining the effective dimensionality of the latent space in an automated, data-adaptive manner. Simulations and an auditory neuroscience application demonstrate how these features reconstruct observed distances and vary with individual and language background, revealing novel insights.

2507.14464 2026-06-02 stat.ME

Exact conditional goodness-of-fit tests for the mixed membership stochastic block model

Sourav Majumdar

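AI summary: For directed mixed membership stochastic block models, proposes exact conditional goodness-of-fit tests based on block-pair edge totals to detect residual sender/receiver heterogeneity, reciprocity, and directed transitive closure.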

English abstract

We propose exact conditional goodness-of-fit tests for directed mixed membership stochastic block models. Given dyad-level sender and receiver roles, the block-pair edge totals are sufficient for the block probability matrix; conditioning on these totals gives a nuisance-free uniform law on a finite fiber. This yields finite-sample randomization tests for residual sender and receiver heterogeneity, reciprocity, and directed transitive closure. The procedure uses an independent fiber sampler, Monte Carlo rank \(p\)-values, and can be applied after drawing latent block-pair assignments from the posterior distribution. Simulations and the Sampson monastery network show that the tests are calibrated under the null and diagnostically useful for directed model misspecification.
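
The Monte Carlo rank p-value has a simple form. Let T_obs be the observed statistic and T_1, ..., T_B the statistics computed on independent draws from the conditional (fiber) distribution. Then p = (1 + #{b : T_b ≥ T_obs}) / (B + 1). This p-value is valid in finite samples because, under the null, the draws are exchangeable with the observed data. The sketch below assumes larger statistics are more extreme. It uses a toy fiber (all 0/1 vectors with the observed total, sampled by shuffling) rather than the paper's block-model fiber sampler.

```python
import random
from typing import Callable, List, Sequence


def mc_rank_pvalue(t_obs: float, t_null: Sequence[float]) -> float:
    """Monte Carlo rank p-value; larger statistics count as more extreme."""
    exceed = sum(1 for t in t_null if t >= t_obs)
    return (1 + exceed) / (len(t_null) + 1)


def toy_fiber_test(
    data: List[int], stat: Callable[[List[int]], float], n_draws: int = 999, seed: int = 0
) -> float:
    """Exact conditional test on the toy fiber of 0/1 vectors sharing data's total."""
    rng = random.Random(seed)
    t_obs = stat(data)
    t_null = []
    for _ in range(n_draws):
        draw = list(data)
        rng.shuffle(draw)  # a uniform shuffle is a uniform draw from this fiber
        t_null.append(stat(draw))
    return mc_rank_pvalue(t_obs, t_null)


# Hypothetical data: all ten 1s sit in the first half, which is unusual under the null.
data = [1] * 10 + [0] * 10
p = toy_fiber_test(data, stat=lambda v: sum(v[:10]))
print(p)
```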

2506.21278 2026-06-02 stat.ML cs.AI cs.LG math.ST stat.TH

Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution

Lukas Sablica, Kurt Hornik

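AI summary: Proposes hyperspherical variational autoencoders based on the spherical Cauchy distribution, using a Möbius transformation for differentiable reparameterization and avoiding Bessel-function evaluations, which gives efficient, stable training and inference while retaining heavy tails.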

English abstract

We propose spherical Cauchy (spCauchy) latent variables for variational autoencoders on hyperspherical latent spaces. The spCauchy family has heavy-tailed global behavior and admits an exact differentiable reparameterization by applying a Möbius transformation to uniform samples on the sphere. We show that, in the high-concentration limit, spCauchy recovers the local tangent-space geometry of the von Mises-Fisher (vMF) distribution under an explicit concentration parameter mapping, while avoiding the high-order Bessel-function evaluations required by vMF implementations. For training, the Kullback-Leibler divergence to a uniform spherical prior admits rapidly convergent series, stable quadrature, and high-concentration asymptotic forms. We further establish monotonicity of the concentration-dependent KL core and derive analytic brackets with closed-form surrogates and error control, supporting stable approximation in extreme regimes. Stress-test benchmarks show that the resulting latent-layer objective remains stable and faster to evaluate than vMF baselines on CPU and GPU. Experiments on image and molecular sequence data demonstrate that spCauchy-VAEs provide a robust and scalable alternative for generative modeling with hyperspherical latent representations.
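
The reparameterization can be sketched as follows. The sketch assumes the Möbius map commonly used to define the spherical Cauchy family, M_φ(u) = (1 - ‖φ‖²)(u + φ)/‖u + φ‖² + φ, with φ = ρμ, ‖μ‖ = 1 and 0 ≤ ρ < 1. A uniform point u on the sphere (a normalized Gaussian vector) is pushed through M_φ. This map is differentiable in φ and sends the unit sphere onto itself, and as ρ grows the draws concentrate around μ. The paper's exact parameterization may differ, and a real VAE would write this in an autodiff framework rather than with plain Python lists.

```python
import math
import random
from typing import List

Vec = List[float]


def uniform_on_sphere(dim: int, rng: random.Random) -> Vec:
    """Uniform point on the unit sphere: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def mobius(u: Vec, phi: Vec) -> Vec:
    """M_phi(u) = (1 - |phi|^2)(u + phi) / |u + phi|^2 + phi, for |phi| < 1."""
    s = [a + b for a, b in zip(u, phi)]
    scale = (1.0 - sum(p * p for p in phi)) / sum(v * v for v in s)
    return [scale * v + p for v, p in zip(s, phi)]


def sample_spcauchy(mu: Vec, rho: float, rng: random.Random) -> Vec:
    """Reparameterized draw: push a uniform point through the Mobius map with phi = rho * mu."""
    phi = [rho * m for m in mu]
    return mobius(uniform_on_sphere(len(mu), rng), phi)


def mean_cosine(draws: List[Vec]) -> float:
    return sum(z[2] for z in draws) / len(draws)  # cosine with mu = e_3


rng = random.Random(1)
mu = [0.0, 0.0, 1.0]
tight = [sample_spcauchy(mu, 0.9, rng) for _ in range(2000)]
loose = [sample_spcauchy(mu, 0.1, rng) for _ in range(2000)]
print(mean_cosine(tight), mean_cosine(loose))
```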

2412.06528 2026-06-02 math.ST stat.AP stat.TH

Highest Posterior Density Intervals of Unimodal Distributions As Analogues to Profile Likelihood Ratio Confidence Intervals

A. X. Venu

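AI summary: Proves that, under specific conditions, highest posterior density intervals of unimodal distributions have a transformation invariance under monotonic transformations similar to that of frequentist profile likelihood ratio confidence intervals.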

Comments
This paper needs to be revised so that the result is framed as transformation invariance with respect to relative likelihood, which is an acceptable concept within the likelihood framework but not within the orthodox frequentist framework

English abstract

In Bayesian statistics, the highest posterior density (HPD) interval is often used to describe properties of a posterior distribution. As a method for estimating confidence intervals (CIs), the HPD has two main desirable properties. First, it is the shortest interval with a specified coverage probability. Second, every point inside the HPD interval has a density greater than every point outside the interval. However, the HPD interval is sometimes criticized for not being transformation invariant. We make the case that, under certain conditions, the HPD interval is a natural analog to the frequentist profile likelihood ratio confidence interval (LRCI). Our main result is a proof that, under specified conditions, the HPD interval with respect to the density mode is transformation invariant for monotonic functions in a manner similar to a profile LRCI.
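
For a unimodal posterior represented by Monte Carlo draws, a common estimate of the HPD interval is the shortest interval containing a fraction 1 - α of the sorted draws. The sketch below applies this to hypothetical exponential draws. It also shows why invariance is not automatic: mapping the HPD endpoints through a monotone function g (here g = log) generally does not give the HPD interval of the transformed draws. Quantile-based equal-tailed intervals, by contrast, do map this way under increasing g. The sketch is not the construction analyzed in the paper.

```python
import math
from typing import Sequence, Tuple


def hpd_interval(draws: Sequence[float], mass: float = 0.9) -> Tuple[float, float]:
    """Shortest interval that contains ceil(mass * n) of the sorted draws."""
    xs = sorted(draws)
    n = len(xs)
    k = math.ceil(mass * n)  # number of draws the interval must cover
    if not 1 <= k <= n:
        raise ValueError("mass must lie in (0, 1]")
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


n = 2000
x = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]  # Exponential(1) quantiles
z = [math.log(v) for v in x]                            # monotone transform g = log
lo_x, hi_x = hpd_interval(x)
lo_z, hi_z = hpd_interval(z)
print("g(HPD of x):", (math.log(lo_x), math.log(hi_x)))
print("HPD of g(x):", (lo_z, hi_z))
```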

2507.07339 2026-06-02 stat.AP cs.LG

Benchmarking Waitlist Mortality Prediction in Heart Transplantation Through Time-to-Event Modeling using New Longitudinal UNOS Dataset

Yingtao Luo, Reza Skandari, Carlos Martinez, Arman Kilic, Rema Padman

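AI summary: Uses longitudinal waitlist history data and time-to-event models to predict heart transplant waitlist mortality; the best model reaches a C-index of 0.94 and an AUROC of 0.89, significantly outperforming previous models.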

Comments
Best Student Paper Finalist in Proceedings of AMIA Annual Symposium 2025

English abstract

Decisions about managing patients on the heart transplant waitlist are currently made by committees of doctors who consider multiple factors, but the process remains largely ad-hoc. With the growing volume of longitudinal patient, donor, and organ data collected by the United Network for Organ Sharing (UNOS) since 2018, there is increasing interest in analytical approaches to support clinical decision-making at the time of organ availability. In this study, we benchmark machine learning models that leverage longitudinal waitlist history data for time-dependent, time-to-event modeling of waitlist mortality. We train on 23,807 patient records with 77 variables and evaluate both survival prediction and discrimination at a 1-year horizon. Our best model achieves a C-Index of 0.94 and AUROC of 0.89, significantly outperforming previous models. Key predictors align with known risk factors while also revealing novel associations. Our findings can support urgency assessment and policy refinement in heart transplant decision making.
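
The C-index measures how well predicted risks order patients by event time. Below is a minimal sketch of Harrell's version for right-censored data. It assumes higher scores mean higher risk and counts tied risk scores as 1/2; conventions for tied times differ between implementations. A pair is comparable when the patient with the shorter observed time actually had the event. The data below are hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still event-free when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]  # the third patient is censored
print(harrell_c_index(times, events, [0.9, 0.7, 0.5, 0.2]))  # 1.0: perfectly ordered
print(harrell_c_index(times, events, [0.9, 0.2, 0.5, 0.7]))
```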

2506.17141 2026-06-02 stat.ME

The fundamental problem of risk prediction for individuals: health AI, uncertainty, and personalized medicine

Lasai Barreñada, Ewout W Steyerberg, Dirk Timmerman, Doranne Thomassen, Laure Wynants, Ben Van Calster

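AI summary: Using an ovarian cancer diagnosis example, shows empirically that model uncertainty and applicability uncertainty in individual risk estimates far exceed estimation uncertainty, even for models that perform well at the population level, and argues that prediction algorithms should inform rather than dictate clinical decisions.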

Comments
9 pages, 2 tables, 2 figures

English abstract

Background and Objective: Clinical prediction models are commonly evaluated regarding performance for a population, although decisions are made for individuals. The classic view relates uncertainty in risk estimates for individuals to sample size (estimation uncertainty) while other sources are model uncertainty (variability in modeling choices) and applicability uncertainty (variability in measurement procedures and between populations). We aim to illustrate the uncertainty of prediction models in estimating individual risks with an ovarian cancer example. Methods: We used real and synthetic data for ovarian cancer diagnosis to train 59400 models with variations in estimation, model, and applicability uncertainty. We then used these models to estimate the probability of ovarian cancer in a fixed test set of 100 patients and evaluate the variability in individual estimates. Results: We show empirically that estimation uncertainty can be strongly dominated by model uncertainty and applicability uncertainty, even for models that perform well at the population level. Estimation uncertainty decreased considerably with increasing training sample size, whereas model and applicability uncertainty remained large. Conclusion: Individual risk estimates are far more uncertain than often assumed. Model uncertainty and applicability uncertainty usually remain invisible when prediction models or algorithms are based on a single study. Predictive algorithms should inform, not dictate, care and support personalization through clinician-patient interaction.

2506.15578 2026-06-02 stat.AP

Statistical post-processing of operational dual-resolution wind-speed ensemble forecasts

Sándor Baran, Mária Lakatos

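AI summary: Using ensemble model output statistics, compares the skill of raw and post-processed ECMWF wind-speed ensemble forecasts at 9 km and 36 km horizontal resolution and their mixtures. Post-processing markedly improves probabilistic calibration and point-forecast accuracy, and spatial resolution matters more than ensemble size. Adding high-resolution members to low-resolution ensembles yields the largest gains.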

Journal ref
Quarterly Journal of the Royal Meteorological Society (2026), paper e70201
Comments
26 pages, 17 figures

English abstract

Weather forecasting presents several challenges, including the chaotic nature of the atmosphere and the high computational demands of numerical weather prediction models. To achieve the most accurate predictions, the ideal scenario involves the finest possible horizontal resolution (i.e., the smallest grid spacing) and the largest ensemble size. This study provides a detailed comparative analysis of the forecast skill of the raw and post-processed medium- and extended-range wind-speed ensemble forecasts of the European Centre for Medium-Range Weather Forecasts, issued at 9 km and 36 km horizontal resolution, respectively, and of their various mixtures. We utilized the ensemble model output statistics (EMOS) approach for forecast calibration with three different spatial training data selection techniques. First, we investigate the performance of the 50-member medium-range and 100-member extended-range predictions (referred to as high and low resolution, respectively) and their 150-member dual-resolution combination. Further, we examine whether the performance of raw and post-processed low-resolution forecasts can be improved by incorporating high-resolution ensemble members. Our results confirm that all post-processed forecasts outperform the raw ensemble predictions in terms of probabilistic calibration and point forecast accuracy and that post-processing considerably reduces the differences between the various configurations. We also show that spatial resolution matters more than ensemble size; augmenting a sufficiently large ensemble of high-resolution forecasts with low-resolution predictions does not necessarily result in a gain in forecast skill. However, our study also highlights the clear benefit of the other direction, namely, incorporating high-resolution members into low-resolution ensemble forecasts, where the most significant gains are observed in configurations with the highest number of high-resolution members.
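
Below is a minimal sketch of an EMOS predictive distribution for a dual-resolution ensemble. It assumes a normal distribution left-truncated at zero, a common choice for wind speed. The location is affine in the two sub-ensemble means, and the variance is affine in the pooled ensemble variance. All coefficients are hypothetical placeholders; in practice they are fitted over a training period, for example by minimizing the CRPS. The paper's exact model and training-data selection are not reproduced here.

```python
import math
from statistics import mean, pvariance
from typing import Sequence, Tuple


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def emos_tn_params(
    hi_res: Sequence[float],
    lo_res: Sequence[float],
    a: float = 0.2,     # hypothetical intercept
    b_hi: float = 0.6,  # hypothetical weight of the high-resolution mean
    b_lo: float = 0.3,  # hypothetical weight of the low-resolution mean
    c: float = 0.5,     # hypothetical variance intercept
    d: float = 0.8,     # hypothetical weight of the pooled ensemble variance
) -> Tuple[float, float]:
    """Location and scale of a zero-truncated normal EMOS forecast."""
    location = a + b_hi * mean(hi_res) + b_lo * mean(lo_res)
    variance = c + d * pvariance(list(hi_res) + list(lo_res))
    return location, math.sqrt(variance)


def tn_cdf(x: float, loc: float, scale: float) -> float:
    """CDF of N(loc, scale^2) left-truncated at zero."""
    if x <= 0.0:
        return 0.0
    p0 = std_normal_cdf(-loc / scale)  # mass of the untruncated normal below zero
    return (std_normal_cdf((x - loc) / scale) - p0) / (1.0 - p0)


hi = [5.0, 6.0, 7.0]  # hypothetical 9 km members (m/s)
lo = [4.0, 8.0]       # hypothetical 36 km members (m/s)
loc, scale = emos_tn_params(hi, lo)
print(loc, scale, tn_cdf(6.0, loc, scale))  # location about 5.6, scale about 1.449
```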

2506.10677 2026-06-02 stat.ML cs.LG

Exploiting Similarities in A/B Testing with Off-Policy Estimation

Otmane Sakhi, Alexandre Gilotte, David Rohde

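AI summary: Uses off-policy estimation to build a family of A/B testing estimators that exploit the similarity between the decision propensities of the new and baseline systems, improving concentration properties and statistical efficiency while remaining unbiased.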

Comments
KDD '26

English abstract

We study A/B testing, the standard protocol for measuring the performance gain of a new decision system relative to a baseline. Traditional A/B testing treats both systems as black boxes, ignoring potential similarities between them. In practice, however, new and baseline systems are rarely radically different and often share significant structure, which can be captured by their propensities to make similar decisions. We show that in such cases, the commonly used difference-in-means estimator, though unbiased, is statistically suboptimal. Leveraging off-policy estimation, we introduce a family of A/B testing estimators that exploit the propensities of the tested systems to achieve improved concentration properties. This family is flexible enough to be tailored to practical decision-making. The resulting estimators are simple, robust to propensity misspecification, substantially more accurate when the tested systems exhibit similarities, and gracefully fall back to the difference-in-means estimator when such similarities are absent. Our theoretical analysis and empirical studies confirm their efficiency and practicality.
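
Below is a small sketch of the basic idea. It assumes users are randomized between systems A and B independently of context and that both systems expose their action propensities. Besides the difference in means, every logged interaction can be reweighted by π_target(a|x)/π_logging(a|x), so traffic from both arms informs both value estimates. The equal-weight pooling below is a hypothetical simplification, not the tuned estimator family derived in the paper.

```python
from typing import Callable, List, Tuple

Propensity = Callable[[str, int], float]  # pi(context, action) -> probability
Record = Tuple[str, str, int, float]       # (arm "A" or "B", context, action, reward)


def difference_in_means(logs: List[Record]) -> float:
    ra = [r for arm, _, _, r in logs if arm == "A"]
    rb = [r for arm, _, _, r in logs if arm == "B"]
    return sum(rb) / len(rb) - sum(ra) / len(ra)


def pooled_ips_lift(logs: List[Record], pi_a: Propensity, pi_b: Propensity) -> float:
    """Estimate V(B) - V(A), using every record for both values via importance weights."""
    logging_policy = {"A": pi_a, "B": pi_b}
    v_a = v_b = 0.0
    for arm, x, a, r in logs:
        p_log = logging_policy[arm](x, a)
        v_a += pi_a(x, a) / p_log * r
        v_b += pi_b(x, a) / p_log * r
    return (v_b - v_a) / len(logs)


logs: List[Record] = [("A", "u", 0, 1.0), ("A", "u", 1, 0.0), ("B", "u", 0, 1.0), ("B", "u", 0, 1.0)]
uniform: Propensity = lambda x, a: 0.5
print(difference_in_means(logs), pooled_ips_lift(logs, uniform, uniform))  # 0.5 0.0
```

When the two systems behave identically, all weights equal 1 and the pooled lift is exactly 0, while the difference in means still fluctuates with the noise of each arm.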

2506.07825 2026-06-02 stat.ME

Identifiability in epidemic models with prior immunity and under-reporting

Fanny Bergström, Martina Favero, Tom Britton

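AI summary: Shows that, when only reported case data are available, three parameters of a modified SIR model (the under-reporting fraction, the proportion with prior immunity, and the community transmission rate) are jointly unidentifiable, and proves that supplementing the data with surveys of prior immunity or prevalence restores identifiability.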

English abstract

Identifiability is the property in mathematical modelling that determines if model parameters can be uniquely estimated from data. For infectious disease models, failure to ensure identifiability can lead to misleading parameter estimates and unreliable policy recommendations. We examine the identifiability of a modified SIR model that accounts for under-reporting and pre-existing immunity in the population. We provide a mathematical proof that three parameters, namely the under-reporting fraction, the proportion of the population with prior immunity, and the community transmission rate, cannot be jointly identified when only reported case data are available. We then show, analytically and with a simulation study, that the identifiability of all three parameters is achieved if the reported incidence is complemented with sample survey data of prior immunity or prevalence during the outbreak. Our results show the limitations of parameter inference in partially observed epidemics and the importance of identifiability analysis when developing and applying models for public health decision making.
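
The unidentifiability can be illustrated numerically with a toy deterministic SIR variant that takes Euler steps on population fractions. This model is assumed here for illustration and is not necessarily the paper's exact model. Suppose a fraction p is immune at the start and only a fraction q of new infections is reported. Then rescaling (β, p, q, i0) to (β(1 - p), 0, q(1 - p), i0/(1 - p)) leaves the reported daily cases unchanged. Reported cases alone therefore cannot separate the three parameters.

```python
from typing import List


def reported_daily_cases(
    beta: float, gamma: float, p_immune: float, q_report: float, i0: float,
    n_pop: float = 1e6, days: int = 100, steps_per_day: int = 10,
) -> List[float]:
    """Reported new cases per day in a toy SIR model with prior immunity and under-reporting."""
    dt = 1.0 / steps_per_day
    s, i = 1.0 - p_immune - i0, i0  # fractions of the whole population
    daily, new_today = [], 0.0
    for step in range(days * steps_per_day):
        infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= infections
        i += infections - recoveries
        new_today += infections
        if (step + 1) % steps_per_day == 0:
            daily.append(q_report * new_today * n_pop)  # only a fraction q is reported
            new_today = 0.0
    return daily


beta, gamma, p, q, i0 = 0.5, 0.2, 0.4, 0.3, 0.001
a = reported_daily_cases(beta, gamma, p, q, i0)
b = reported_daily_cases(beta * (1 - p), gamma, 0.0, q * (1 - p), i0 / (1 - p))
print(max(abs(x - y) / max(1.0, abs(x)) for x, y in zip(a, b)))  # tiny (rounding error only)
```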

2506.01498 2026-06-02 stat.ME stat.CO

Simulating Complex Cross-Sectional and Longitudinal Data using the simDAG R Package

Robin Denz, Nina Timmesfeld

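AI summary: Introduces the simDAG R package, which provides a standardized way to generate simple and complex data by defining structural equations in directed acyclic graphs with arbitrary functions or regression models, supports many data types and dependency structures, and simulates longitudinal data on a semi-continuous time scale.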

Comments
provisionally accepted for publication in "Journal of Statistical Software"

English abstract

Generating artificial data is a crucial step when performing Monte Carlo simulation studies. Depending on the planned study, complex data generation processes (DGPs) containing multiple, possibly time-varying, variables with various forms of dependencies and data types may be required. Simulating data from such DGPs may therefore become a difficult and time-consuming endeavor. The simDAG R package offers a standardized approach to generate data from simple and complex DGPs based on the definition of structural equations in directed acyclic graphs using arbitrary functions or regression models. The package offers a clear syntax with an enhanced formula interface and directly supports generating binary, categorical, count and time-to-event data with arbitrary dependencies, possibly non-linear relationships and interactions. It additionally includes a framework to conduct discrete-time based simulations, which allows the generation of longitudinal data on a semi-continuous time scale. This approach may be used to generate time-to-event data with recurrent and/or competing events and possibly multiple time-varying covariates, which may themselves have arbitrary data types. In this article we demonstrate the wide range of features included in simDAG by replicating the DGPs of multiple real Monte Carlo simulation studies.
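
The core idea, drawing each node from a structural equation of its parents in topological order, can be sketched in Python. This is not the simDAG API (simDAG is an R package). The node names, equations, and coefficients below are hypothetical.

```python
import math
import random
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Equation = Callable[[Row, random.Random], float]

# Hypothetical DAG: node -> (parents, structural equation using parent values).
DAG: Dict[str, Tuple[List[str], Equation]] = {
    "age": ([], lambda row, rng: rng.gauss(50.0, 10.0)),
    "smoking": (
        ["age"],
        # Bernoulli draw from a logistic model in age
        lambda row, rng: float(rng.random() < 1.0 / (1.0 + math.exp(2.0 - 0.03 * row["age"]))),
    ),
    "bp": (
        ["age", "smoking"],
        # linear model in the parents with Gaussian noise
        lambda row, rng: 90.0 + 0.5 * row["age"] + 5.0 * row["smoking"] + rng.gauss(0.0, 5.0),
    ),
}


def simulate(dag: Dict[str, Tuple[List[str], Equation]], n: int, seed: int = 0) -> List[Row]:
    """Draw n rows, evaluating each node only after all of its parents."""
    order = list(TopologicalSorter({node: spec[0] for node, spec in dag.items()}).static_order())
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row: Row = {}
        for node in order:
            row[node] = dag[node][1](row, rng)
        rows.append(row)
    return rows


data = simulate(DAG, n=1000, seed=1)
print(data[0])
```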

2505.19925 2026-06-02 stat.ME cs.LG

Cellwise and Casewise Robust Covariance in High Dimensions

Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

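AI summary: Proposes cellRCov, which combines a decomposition into principal and orthogonal subspaces with ridge regularization to handle casewise outliers, cellwise outliers, and missing data simultaneously in high dimensions, and establishes its theoretical properties.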

English abstract

The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is only feasible up to at most 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection. We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.

2505.14725 2026-06-02 q-bio.GN cs.LG stat.AP

HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity

Xuejun Sun, Yiran Song, Xiaochen Zhou, Ruilie Cai, Yu Zhang, Xinyi Li, Rui Peng, Jialiu Xie, Yuanyuan Yan, Muyao Tang, Prem Lakshmanane, Baiming Zou, James S. Hagood, Raymond J. Pickles, Didong Li, Fei Zou, Xiaojing Zheng

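AI summary: To address scattered and inconsistently processed transcriptomic data in respiratory viral infection research, builds the HR-VILAGE-3K3M dataset of 3,178 subjects from 66 studies. The dataset integrates bulk and single-cell transcriptomic data from vaccination, viral inoculation, and mixed exposures, with unified preprocessing and quality control. It is intended to support biomarker discovery, research on immune mechanisms, and development of analysis methods.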

English abstract

Respiratory viral infections pose a global health burden, yet the cellular immune mechanisms underlying protection and pathology remain unclear. Natural infection cohorts often lack pre-exposure baselines and time-controlled sampling, whereas inoculation and vaccination trials generate well-structured longitudinal transcriptomic data. However, these datasets are scattered across repositories and processed inconsistently, hindering integrative and AI-driven analyses. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready resource integrating bulk and single-cell transcriptomic profiles from 3,178 subjects across 66 studies. The dataset spans vaccination, inoculation, and mixed exposures, with samples from blood and nasal swabs collected from public repositories including GEO, ImmPort, and ArrayExpress. We curated and harmonized subject-level metadata, standardized outcome measures, and applied unified preprocessing with rigorous quality control. We further provide benchmark analyses illustrating its utility. This resource supports discovery of biomarkers, immune mechanisms, and methodological development. As one of the largest longitudinal transcriptomic resources for human respiratory viral immunization, HR-VILAGE-3K3M enables reproducible and scalable analyses to accelerate vaccine and antiviral research.

2505.10882 2026-06-02 cs.LG stat.ML

Global Convergence of Adaptive Sensing for Principal Eigenvector Estimation

主特征向量估计的自适应传感的全局收敛性

Alex Saad-Falcon, Brighton Ancelin, Justin Romberg

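AI summary: Analyses a compressed variant of Oja's algorithm that estimates the principal eigenvector of a covariance matrix from two adaptive measurements per sample, proves a convergence rate for the expected sine-squared error together with an information-theoretic lower bound, and identifies the dimensional cost of compression.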

Comments
Accepted at ICML 2026. 34 pages (9 main text + appendices), 4 figures, 2 tables. v2 (camera-ready) adds a matching information-theoretic lower bound and a non-adaptive lower-bound separation across three powers of d; substantially revised from v1
AI中文摘要

主成分分析经典地需要完整的$d$维样本,但在各种应用中,硬件限制每次采集只能获得少量标量测量。我们分析了Oja算法的一种压缩变体,用于估计数据协方差矩阵的主特征向量,每样本仅使用两个自适应测量。在每次迭代中,我们沿着当前估计方向进行一次测量,并在随机正交方向上进行一次测量。我们证明,经过$t$次迭代后,到真实特征向量的期望正弦平方误差为$\mathcal{O}(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$,其中$d$是环境维度,$\lambda_1, \lambda_2$是前导特征值,$\Delta = \lambda_1 - \lambda_2$是特征间隙。我们用一个匹配的信息论下界$\Omega(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$来补充这一结果——这是压缩特征向量估计的第一个下界——证明$d^2$因子(与完全观测的极小极大速率$\Theta(\lambda_1\lambda_2 d / (\Delta^2 t))$相比多了一个$d$因子)是压缩的基本代价,无法改进。相比之下,每次迭代两次测量的任何非自适应方案都会遭受$\Omega(\lambda_2^2 d^3 / (\Delta^2 t))$的误差,多了一个$d$的幂次。这通过$d$的三个幂次将完全观测PCA、自适应压缩PCA和非自适应压缩PCA区分开来。我们的分析处理了协方差具有非零尾部特征值的噪声设置,为无噪声情况之外的自适应压缩子空间跟踪提供了首个收敛性保证。

英文摘要

Principal component analysis classically requires full $d$-dimensional samples, yet in various applications hardware limits acquisition to a few scalar measurements per sample. We analyze a compressed variant of Oja's algorithm for estimating the principal eigenvector of the data covariance matrix using only two adaptive measurements per sample. At each iteration, we observe one measurement along the current estimate and one in a random orthogonal direction. We prove that after $t$ iterations, the expected sine-squared error to the true eigenvector is $\mathcal{O}(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$, where $d$ is the ambient dimension, $\lambda_1, \lambda_2$ are the leading eigenvalues, and $\Delta = \lambda_1 - \lambda_2$ is the eigengap. We complement this with a matching information-theoretic lower bound of $\Omega(\lambda_1\lambda_2 d^2 / (\Delta^2 t))$ (the first for compressed eigenvector estimation), proving that the $d^2$ factor, an additional factor of $d$ compared to the fully-observed minimax rate $\Theta(\lambda_1\lambda_2 d / (\Delta^2 t))$, is the fundamental cost of compression and cannot be improved. In contrast, any non-adaptive scheme with two measurements per iteration suffers $\Omega(\lambda_2^2 d^3 / (\Delta^2 t))$, an additional power of $d$. This separates fully-observed, adaptive-compressed, and non-adaptive-compressed PCA across three powers of $d$. Our analysis handles the noisy setting where the covariance has nonzero trailing eigenvalues, providing the first convergence guarantee for adaptive compressed subspace tracking beyond the noiseless case.
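
A minimal NumPy sketch of the two-measurement idea, under assumptions of our own: a diagonal covariance with eigenvalues 10 followed by nine 1s (so the principal eigenvector is the first axis), a constant step size `eta`, and an update rule we chose for illustration. Per sample, only the projection on the current estimate and the projection on one random orthogonal direction are used; the exact update, step schedule and constants analysed in the paper may differ.

```python
import numpy as np

def compressed_oja(sample, d, t, eta, rng):
    """Two-measurement Oja-style iteration (illustrative sketch, not the paper's exact update).

    Each step sees only y1 = <v, x> and y2 = <u, x>, where u is a random unit
    vector orthogonal to the current estimate v.
    """
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(t):
        x = sample()                      # hidden sample; only two projections are used
        g = rng.standard_normal(d)
        g -= (g @ v) * v                  # remove the component along the current estimate
        u = g / np.linalg.norm(g)         # uniform random direction orthogonal to v
        y1, y2 = v @ x, u @ x             # the two adaptive measurements
        # (d - 1) * y1 * y2 * u is an unbiased estimate of (I - v v^T) x x^T v,
        # the part of the Oja gradient orthogonal to v.
        v = v + eta * (d - 1) * y1 * y2 * u
        v /= np.linalg.norm(v)
    return v

rng = np.random.default_rng(0)
d = 10
lam = np.array([10.0] + [1.0] * (d - 1))  # assumed eigenvalues; principal eigenvector is e_1
sample = lambda: np.sqrt(lam) * rng.standard_normal(d)
v_hat = compressed_oja(sample, d, t=20_000, eta=1e-3, rng=rng)
sin2 = 1.0 - v_hat[0] ** 2                # sine-squared error to e_1
print(f"sin^2 error: {sin2:.4f}")
```

The factor `d - 1` rescales the single random direction so the update is unbiased for the projected Oja gradient; in this sketch its variance grows roughly like $d^2$, which is where the extra dimension dependence shows up.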

2501.18798 2026-06-02 stat.ME math.ST stat.ML stat.TH

Targeted Data Fusion for Region-Specific Survival Effects in the AMP HIV Prevention Trials

AMP HIV预防试验中区域特异性生存效应的目标数据融合

Yi Liu, Alexander W. Levis, Ke Zhu, Shu Yang, Peter B. Gilbert, Larry Han

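AI summary: To handle regional heterogeneity in the AMP trials, proposes an L1-regularized federated learning method that combines site-specific estimators through weighting, enabling privacy-preserving, region-adaptive inference on survival curves.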

AI中文摘要

抗体介导预防(AMP)试验通过显示被动给予的单克隆广泛中和抗体(bnAbs)可以预防HIV-1感染,开辟了新的科学前沿。该试验在美国、巴西、秘鲁、瑞士和撒哈拉以南非洲等多个地理区域进行,揭示了治疗效果的显著区域异质性。这些差异,加上中央数据池化的隐私和监管限制,需要在不共享个体级数据的情况下跨区域借用强度的方法。为了在分布异质性下估计区域和治疗特异性生存曲线,我们开发了一种联邦学习方法,该方法通过L1正则化准则组合站点特异性估计量,该准则对与目标不一致的数据源进行降权。我们进一步将该框架扩展到一般类别的因果对比,包括风险差(RD)、生存比(SR)和受限平均生存时间(RMST)差。通过广泛的模拟和对不同目标人群下AMP试验的分析,我们表明所提出的方法提供了隐私保护、区域自适应的推断,并提高了精度。

英文摘要

The Antibody Mediated Prevention (AMP) trials opened a new scientific frontier by showing that passively administered monoclonal broadly neutralizing antibodies (bnAbs) could prevent HIV-1 acquisition. Conducted across multiple geographic regions, including the United States, Brazil, Peru, Switzerland, and sub-Saharan Africa, the AMP trials revealed substantial regional heterogeneity in treatment efficacy. These differences, together with privacy and regulatory limits on central data pooling, call for methods that borrow strength across regions without sharing individual-level data. To estimate region- and treatment-specific survival curves under distributional heterogeneity, we develop a federated learning approach that combines site-specific estimators via an L1-regularized criterion that downweights data sources not aligned with the target. We further extend the framework to a general class of causal contrasts, including the risk difference (RD), survival ratio (SR), and restricted mean survival time (RMST) difference. Through extensive simulations and an analysis of the AMP trials under different target populations, we show that the proposed approach provides privacy-preserving, region-adaptive inference with improved precision.
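
To make the L1-regularized combination concrete, here is a hypothetical sketch at a single time point. We assume independent site estimators with known standard errors and a criterion of the form "variance plus `lam` times a weighted sum of squared discrepancies from the target site"; this is an illustration in the spirit of adaptive-weighting federated estimators, not the exact criterion of the paper. The absolute-value penalty drives the weights of discordant sites to exactly zero.

```python
import numpy as np

def fusion_weights(theta, se, lam, target=0, tol=1e-12):
    """Hypothetical sketch: site weights w >= 0 with sum(w) = 1 that minimise
        sum_k w_k^2 se_k^2 + lam * sum_k |w_k| (theta_k - theta_target)^2,
    assuming independent site estimators."""
    theta = np.asarray(theta, dtype=float)
    var = np.asarray(se, dtype=float) ** 2
    pen = lam * (theta - theta[target]) ** 2        # per-source penalty slope

    def weights(mu):                                # KKT solution for multiplier mu
        return np.maximum(mu - pen, 0.0) / (2.0 * var)

    lo, hi = 0.0, 1.0
    while weights(hi).sum() < 1.0:                  # bracket the multiplier
        hi *= 2.0
    while hi - lo > tol:                            # bisection on sum(w) = 1
        mid = 0.5 * (lo + hi)
        if weights(mid).sum() < 1.0:
            lo = mid
        else:
            hi = mid
    w = weights(hi)
    return w / w.sum()

# Survival probability at one fixed time point (hypothetical values), target region first.
theta = [0.50, 0.52, 0.80]
se = [0.05, 0.05, 0.05]
w = fusion_weights(theta, se, lam=10.0)
fused = float(np.dot(w, theta))
print(np.round(w, 3), round(fused, 3))   # the discordant third site gets weight 0
```

With `lam=0` the weights reduce to inverse-variance pooling; increasing `lam` progressively drops sites whose estimates disagree with the target.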

2409.15532 2026-06-02 math.PR math.DS stat.ME

A theory of generalised coordinates for stochastic differential equations

随机微分方程的广义坐标理论

Lancelot Da Costa, Nathaël Da Costa, Conor Heins, Johan Medrano, Grigorios A. Pavliotis, Thomas Parr, Ajith Anil Meera, Karl Friston

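AI summary: For non-Markovian stochastic differential equations, proposes a pathwise theory based on generalised coordinates of motion, yielding Markovian realisations that are accurate on short timescales and global in the analytic case; applies it to solving linear SDEs and to generalised Bayesian filtering.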

Comments
39 pages of main text; 49 pages including abstract, table of contents, appendix and references
AI中文摘要

随机微分方程是物理学和科学中无处不在的建模工具。在大多数建模场景中,驱动动力学或运动的随机波动具有某种非平凡的时域相关结构,这使得SDE非马尔可夫;这种现象通常被称为“有色”噪声。因此,一个重要目标是开发有效的工具来数学和数值研究(可能是非马尔可夫的)SDE。在本报告中,我们形式化了一个基于所谓“广义运动坐标”的数学理论,用于分析和数值研究SDE。与粗糙路径理论类似,我们针对噪声的任何给定实现逐路径分析SDE,而不仅仅是概率性的。与已建立的马尔可夫实现理论类似,我们将非马尔可夫SDE实现为扩展空间中的马尔可夫过程。然而,与已建立的马尔可夫实现理论不同,这里的马尔可夫实现在短时间尺度上是精确的,并且当流和波动是解析时,可能在全局时间上精确。该理论对于具有解析流和波动的SDE是精确的,当流和波动可微时是近似的。它提供了有用的分析工具,我们利用这些工具来求解具有解析波动的线性SDE。它也可能有助于研究更粗糙的SDE,因为这些SDE可以被识别为更光滑SDE的极限。该理论为SDE的模拟、滤波和控制提供了有效且计算简便的方法;其中,我们重新推导了广义贝叶斯滤波,这是一种用于时间序列分析的最先进方法。展望未来,本报告表明广义坐标在随机微分方程中具有广泛的应用。

英文摘要

Stochastic differential equations are ubiquitous modelling tools in physics and the sciences. In most modelling scenarios, random fluctuations driving dynamics or motion have some non-trivial temporal correlation structure, which renders the SDE non-Markovian; a phenomenon commonly known as "colored" noise. Thus, an important objective is to develop effective tools for mathematically and numerically studying (possibly non-Markovian) SDEs. In this report, we formalise a mathematical theory for analysing and numerically studying SDEs based on so-called "generalised coordinates of motion". Like the theory of rough paths, we analyse SDEs pathwise for any given realisation of the noise, not solely probabilistically. Like the established theory of Markovian realisation, we realise non-Markovian SDEs as a Markov process in an extended space. Unlike the established theory of Markovian realisation, however, the Markovian realisations here are accurate on short timescales and may be exact globally in time, when flows and fluctuations are analytic. This theory is exact for SDEs with analytic flows and fluctuations, and is approximate when flows and fluctuations are differentiable. It provides useful analysis tools, which we employ to solve linear SDEs with analytic fluctuations. It may also be useful for studying rougher SDEs, as these may be identified as the limit of smoother ones. This theory supplies effective, computationally straightforward methods for simulation, filtering and control of SDEs; amongst others, we re-derive generalised Bayesian filtering, a state-of-the-art method for time-series analysis. Looking forward, this report suggests that generalised coordinates have far-reaching applications throughout stochastic differential equations.
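
As a small illustration of the idea that, with analytic fluctuations, a path is determined by its generalised coordinates (the stack of its time derivatives at one instant), consider the scalar linear equation $\dot x = a x + w(t)$. Differentiating gives $x^{(n+1)} = a x^{(n)} + w^{(n)}$, so the generalised coordinates of $x$ follow recursively from those of the noise, and a truncated Taylor series propagates the solution. The choice $w(t) = \sin t$ is an assumed smooth noise realisation, used only so the result can be checked against the closed-form solution $x(t) = \tfrac{3}{2} e^{-t} + \tfrac{1}{2}(\sin t - \cos t)$ for $a = -1$, $x(0) = 1$.

```python
import math

def generalised_coords(x0, a, noise_derivs):
    """Return [x, x', x'', ...] at t0 for x' = a*x + w(t), given w's derivatives at t0."""
    coords = [x0]
    for w_n in noise_derivs:
        coords.append(a * coords[-1] + w_n)   # x^{(n+1)} = a x^{(n)} + w^{(n)}
    return coords

def taylor_step(coords, h):
    """Propagate the path by h using its generalised coordinates (truncated Taylor series)."""
    return sum(c * h ** n / math.factorial(n) for n, c in enumerate(coords))

order = 20
w_derivs = [math.sin(n * math.pi / 2) for n in range(order)]  # derivatives of sin at t = 0
coords = generalised_coords(1.0, -1.0, w_derivs)
h = 0.5
approx = taylor_step(coords, h)
exact = 1.5 * math.exp(-h) + 0.5 * (math.sin(h) - math.cos(h))
print(approx, exact)
```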

2503.12270 2026-06-02 stat.ME

A Bayesian location-scale joint model for time-to-event and multivariate longitudinal data with association based on within-individual variability

基于个体内变异关联的贝叶斯位置尺度联合模型:用于事件时间与多元纵向数据

Marco Palma, Ruth H Keogh, Siobhán B Carr, Rhonda Szczesniak, David Taylor-Robinson, Angela M Wood, Graciela Muniz-Terrera, Jessica K Barrett

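AI summary: Proposes a Bayesian joint model that uses a mixed-effects location-scale model to analyse the within-individual variability of longitudinal markers and its association with time to event, addressing regression dilution; applied to cystic fibrosis data to assess how lung function and malnutrition relate to mortality risk.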

AI中文摘要

随时间测量的健康指标的个体内变异正逐渐用于提示疾病进展。通常使用简单的汇总统计量(如每个个体的标准差),但它们不适合解释时间变化。此外,当这些汇总统计量作为事件时间结果回归模型中的协变量时,风险比估计会受到回归稀释的影响。为克服这些问题,构建了一个联合模型,其中事件时间结果与多元纵向标志物之间的关联以后者的个体内变异来指定。使用混合效应位置尺度模型分析纵向生物标志物、其个体内变异及其相关性。事件时间采用比例风险回归模型建模,基线风险函数灵活指定,纵向标志物的信息通过随机效应共享。该模型可用于量化纵向标志物的个体内变异及其与事件时间结果的关联。通过模拟研究展示了该模型与具有恒定方差的标准联合模型相比的性能。该模型应用于英国囊性纤维化登记处的成年女性数据集,以评估肺功能、营养不良与死亡率之间的关联。

英文摘要

Within-individual variability of health indicators measured over time is increasingly used to inform about disease progression. Simple summary statistics (e.g. the standard deviation for each individual) are often used, but they are not suited to account for changes over time. In addition, when these summary statistics are used as covariates in a regression model for time-to-event outcomes, the estimates of the hazard ratios are subject to regression dilution. To overcome these issues, a joint model is built where the association between the time-to-event outcome and multivariate longitudinal markers is specified in terms of the within-individual variability of the latter. A mixed-effects location-scale model is used to analyse the longitudinal biomarkers, their within-individual variability and their correlation. The time to event is modelled using a proportional hazards regression model, with a flexible specification of the baseline hazard, and the information from the longitudinal biomarkers is shared as a function of the random effects. The model can be used to quantify within-individual variability for the longitudinal markers and their association with the time-to-event outcome. We show through a simulation study the performance of the model in comparison with the standard joint model with constant variance. The model is applied to a dataset of adult women from the UK cystic fibrosis registry, to evaluate the association between lung function, malnutrition and mortality.
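
A toy data-generating sketch of the model structure, with every numerical value assumed for illustration: one longitudinal marker whose residual standard deviation is $\exp(\tau_0 + \omega_i)$ with a subject-level random effect $\omega_i$ (here $\tau_0 = -0.5$), and an event time whose hazard is proportional to $\exp(\alpha\,\omega_i)$. The paper uses several correlated markers and a flexible baseline hazard; here there is a single marker and the baseline hazard is constant.

```python
import numpy as np

def simulate_location_scale(n=2000, n_obs=6, alpha=1.0, seed=0):
    """Toy data from a location-scale joint model with one longitudinal marker."""
    rng = np.random.default_rng(seed)
    b0 = rng.normal(0.0, 1.0, n)            # random intercept: subject-specific level
    omega = rng.normal(0.0, 0.5, n)         # random scale effect on the log residual SD
    t = np.arange(n_obs)                    # visit times
    sd = np.exp(-0.5 + omega)               # within-individual SD, varies by subject
    y = 2.0 + b0[:, None] - 0.1 * t + sd[:, None] * rng.standard_normal((n, n_obs))
    # Proportional hazards with a constant baseline (a simplification): the
    # hazard depends on the subject's within-individual variability via omega.
    hazard = 0.05 * np.exp(alpha * omega)
    event_time = rng.exponential(1.0 / hazard)
    return y, omega, event_time

y, omega, event_time = simulate_location_scale()
high = omega > np.median(omega)
print(event_time[high].mean(), event_time[~high].mean())  # more variable subjects fail sooner
```

Summarising each subject by the sample SD of its six residuals would give a noisy proxy for $\omega_i$; using that proxy as a covariate is exactly the setting where regression dilution appears, which sharing the random effect itself avoids.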

2503.11583 2026-06-02 stat.CO

A Unified Framework for Multiple-Try Metropolis: Construction and Empirical Benchmarks

多重尝试Metropolis的统一框架:构建与实证基准

Renny Doig, Liangliang Wang

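AI summary: Unifies multiple-try Metropolis algorithms within the involutive MCMC framework and uses simulation experiments to assess how the proposal-distribution configuration and the number of candidates affect sampling efficiency on non-Gaussian, multimodal targets.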

AI中文摘要

多重尝试Metropolis(MTM)算法使用包含多个候选样本的复合提议来提高局部采样效率。尽管已有若干方法论研究继续发展MTM及其特征性的多候选机制,但文献中缺乏对这些组件的统一比较。本文在involutive MCMC框架下提出了MTM的结构化表述,为基于提议机制推导有效接受概率提供了原则性方法。通过全面的模拟实验,我们评估了MTM配置对非高斯和多峰目标分布的影响。我们的结果表明,虽然权重函数是若干方法论发展的焦点,但它们对平稳采样效率的影响次于提议分布的配置。此外,我们发现增加候选数量虽然提高了每次迭代的效率,但实际性能提升被多重候选引入的计算开销所抵消,除非使用并行计算。我们的发现为针对复杂和非高斯目标配置MTM算法提供了实用指导。

英文摘要

The multiple-try Metropolis (MTM) algorithm uses a compound proposal with multiple candidate draws to improve local sampling efficiency. While several methodological works have continued to develop MTM and the multi-candidate mechanism that characterizes it, the literature lacks a unified comparison of these components. This paper presents a structured formulation of MTM within the involutive MCMC framework, providing a principled approach for deriving valid acceptance probabilities based on the proposal mechanism. Through a comprehensive simulation experiment, we evaluate the impact of MTM configurations on non-Gaussian and multimodal target distributions. Our results reveal that while weight functions are a focus of several methodological developments, their impact on stationary sampling efficiency is secondary to the configuration of the proposal distribution. Furthermore, we find that while increasing the number of candidates enhances per-iteration efficiency, the realized performance gains are offset by the computational overhead introduced by multiple candidacy unless parallel computing is used. Our findings offer practical guidance for configuring an MTM algorithm for complex and non-Gaussian targets.
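
A minimal sketch of the classic multiple-try Metropolis step (the standard textbook formulation, not the involutive construction of the paper), assuming a symmetric Gaussian random-walk proposal so that the weight $w(y, x) = \pi(y)$ yields a valid acceptance ratio. The bimodal target, an equal mixture of $N(-2, 1)$ and $N(2, 1)$, and all tuning values are our own choices.

```python
import numpy as np

def log_target(x):
    """Unnormalised log density of an equal mixture of N(-2, 1) and N(2, 1)."""
    return np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)

def multiple_try_metropolis(n_iter=20_000, k=5, step=2.5, x0=2.0, seed=1):
    """Classic MTM with a symmetric Gaussian proposal and weights w(y, x) = pi(y)."""
    rng = np.random.default_rng(seed)
    x = x0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        ys = x + step * rng.standard_normal(k)          # k candidates around x
        lw_y = log_target(ys)
        p = np.exp(lw_y - lw_y.max())
        y = ys[rng.choice(k, p=p / p.sum())]            # pick one candidate by weight
        refs = y + step * rng.standard_normal(k - 1)    # k - 1 reference points around y
        lw_ref = np.append(log_target(refs), log_target(x))  # current state is the k-th
        log_alpha = np.logaddexp.reduce(lw_y) - np.logaddexp.reduce(lw_ref)
        if np.log(rng.uniform()) < log_alpha:
            x = y
        chain[i] = x
    return chain

chain = multiple_try_metropolis()
print(chain.mean(), chain.var())   # target mean 0, target variance 5
```

Each iteration here costs about $2k$ target evaluations, which is the overhead the abstract refers to; the $k$ candidate (and $k-1$ reference) evaluations are independent of each other and could be computed in parallel.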

2503.07325 2026-06-02 cs.LG stat.ML

Non-vacuous Generalization Bounds for Deep Neural Networks without any modification to the trained models

无需对训练模型进行任何修改的深度神经网络非平凡泛化界

Khoat Than, Dat Phan

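AI summary: Proposes a new class of data-dependent generalization bounds that apply directly to unmodified trained models by decomposing the generalization error into a distributional-complexity term and local model-behaviour terms, giving the first non-vacuous guarantees for large unmodified deep networks.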

AI中文摘要

理解和认证现代深度神经网络的行为仍然是可靠机器学习中的一个基本挑战。我们引入了一类新的数据依赖泛化界,直接应用于训练模型,无需任何修改。特别地,我们提出了一个可精确计算的界,在所有评估的网络中(包括具有6亿参数的ImageNet规模模型)都是非平凡的。这是首次表明即使对于大型未修改的深度网络,也能实现有意义的泛化保证。我们的方法揭示了泛化由训练模型与数据分布几何之间的相互作用所支配。我们将泛化误差分解为两个可解释的组成部分:一个分布复杂度项,捕捉数据质量在输入空间中的分布;以及局部模型行为项,捕捉网络在单个区域内的行为。这种联合依赖识别出泛化差距出现的位置和原因。实验上,我们界的某些部分对真实测试误差具有高度预测性,并且当划分与内在数据几何对齐时,界会收紧,突出了数据依赖的局部正则性作为泛化的关键驱动因素。

英文摘要

Understanding and certifying the behavior of modern deep neural networks remains a fundamental challenge in reliable machine learning. We introduce a new class of data-dependent generalization bounds that apply directly to trained models, without any modification. In particular, we present an exactly computable bound that is non-vacuous across all evaluated networks, including ImageNet-scale models with 600M parameters. This is the first work showing that meaningful generalization guarantees are achievable even for large, unaltered deep networks. Our approach reveals that generalization is governed by the interaction between the trained model and the geometry of the data distribution. We decompose the generalization error into two interpretable components: a distributional complexity term, capturing how the data mass is distributed across the input space, and local model-behavior terms, capturing the network's behavior within individual regions. This joint dependence identifies where and why generalization gaps arise. Empirically, some components of our bound are highly predictive of the true test error, and the bound tightens when the partition aligns with the intrinsic data geometry, highlighting data-dependent local regularity as a key driver of generalization.

2501.08640 2026-06-02 cs.LG stat.ML

Quantum Reservoir Computing and Risk Bounds

量子储层计算与风险界

Naomi Mona Chmielewski, Nina Amini, Joseph Mikael

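AI summary: Uses Rademacher complexity to bound the generalisation error of quantum reservoir computing and analyses how these bounds scale as the number of qubits grows.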

AI中文摘要

我们提出了一种利用Rademacher复杂度来界定几类量子储层泛化误差的方法。我们给出了两个特定量子储层类别的具体参数依赖界。我们分析了泛化界随量子比特数增长的标度行为。将我们的结果应用于具有多项式读出函数的类别,我们发现风险界在训练样本数量上收敛。我们的界中对量子储层和读出参数的显式依赖可用于在一定程度上控制泛化误差。需要注意的是,这些界随量子比特数n呈指数增长。Rademacher复杂度的上界可应用于满足量子动力学和读出函数若干假设的其他储层类别。

英文摘要

We propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits. Applying our results to classes with polynomial readout functions, we find that the risk bounds converge in the number of training samples. The explicit dependence on the quantum reservoir and readout parameters in our bounds can be used to control the generalisation error to a certain extent. It should be noted that the bounds scale exponentially with the number of qubits n. The upper bounds on the Rademacher complexity can be applied to other reservoir classes that fulfill a few hypotheses on the quantum dynamics and the readout function.
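
For a linear readout with bounded weight norm, the empirical Rademacher complexity has a closed-form supremum, so it can be estimated by Monte Carlo over the sign vectors. The sketch below assumes random stand-in features in place of actual reservoir observables and a readout class $\{x \mapsto \langle w, \phi(x)\rangle : \|w\|_2 \le B\}$; it is a generic illustration, not the paper's bound, which covers specific quantum reservoir classes and polynomial readouts.

```python
import numpy as np

def empirical_rademacher_linear(features, bound, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of
    {x -> <w, phi(x)> : ||w||_2 <= bound} on fixed features phi(x_i).
    For this class the supremum has a closed form:
        sup_w (1/n) sum_i s_i <w, phi_i> = (bound / n) * ||sum_i s_i phi_i||_2."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    signs = rng.choice([-1.0, 1.0], size=(n_draws, n))     # Rademacher variables
    norms = np.linalg.norm(signs @ features, axis=1)
    return bound * norms.mean() / n

rng = np.random.default_rng(1)
n, dim = 200, 64                                  # hypothetical sample size and feature dimension
phi = np.tanh(rng.standard_normal((n, dim)))      # stand-in for reservoir features (hypothetical)
B = 1.0
estimate = empirical_rademacher_linear(phi, B)
upper = B * np.sqrt((phi ** 2).sum()) / n         # upper bound via Jensen's inequality
print(estimate, upper)
```

The Jensen bound $B\sqrt{\sum_i \|\phi(x_i)\|_2^2}/n$ can grow with the feature dimension when features are bounded entrywise; if the readout used all $4^q$ Pauli expectation values of $q$ qubits, that dimension would be exponential in $q$.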

2412.19444 2026-06-02 cs.LG math.OC stat.ML

Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

迈向简单且可证明的无参数自适应梯度方法

Yuanzhe Tao, Yifeng Liu, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu

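AI summary: Proposes AdaGrad++ and Adam++, two simple parameter-free adaptive gradient methods that achieve convergence guarantees comparable to those of AdaGrad and Adam without a preset learning rate.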

Comments
45 pages, 19 figures, 3 tables
AI中文摘要

诸如 AdaGrad 和 Adam 等优化算法通过在优化过程中动态调整学习率,显著推进了深度模型的训练。然而,学习率的临时调整带来了挑战并导致实际中的低效。为解决此问题,近期研究聚焦于开发无需学习率调整即可有效运行的“无参数”算法。尽管有这些努力,现有的 AdaGrad 和 Adam 无参数变体往往过于复杂且/或缺乏正式的收敛保证。在本文中,我们提出了 AdaGrad++ 和 Adam++,这是 AdaGrad 和 Adam 的新型简单无参数变体,具有收敛保证。我们证明 AdaGrad++ 在凸优化中无需预设学习率假设即可达到与 AdaGrad 相当的收敛速率。类似地,Adam++ 在不依赖任何学习率条件的情况下匹配 Adam 的收敛速率。跨多种深度学习任务的实验结果验证了 Adam++ 的竞争性能。

英文摘要

Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad-hoc tuning of learning rates poses a challenge and leads to inefficiencies in practice. To address this issue, recent research has focused on developing "parameter-free" algorithms that operate effectively without the need for learning rate tuning. Despite these efforts, existing parameter-free variants of AdaGrad and Adam tend to be overly complex and/or lack formal convergence guarantees. In this paper, we present AdaGrad++ and Adam++, novel and simple parameter-free variants of AdaGrad and Adam with convergence guarantees. We prove that AdaGrad++ achieves comparable convergence rates to AdaGrad in convex optimization without predefined learning rate assumptions. Similarly, Adam++ matches the convergence rate of Adam without relying on any conditions on the learning rates. Experimental results across various deep learning tasks validate the competitive performance of Adam++.

2412.04177 2026-06-02 cs.LG stat.ML

Fixed-Mean Gaussian Processes for Post-hoc Bayesian Deep Learning

固定均值高斯过程用于事后贝叶斯深度学习

Luis A. Ortega, Simón Rodríguez-Santana, Daniel Hernández-Lobato

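AI summary: Proposes fixed-mean Gaussian processes (FMGP), which fix the posterior mean to the output of a pre-trained DNN and use variational inference to estimate the predictive variance efficiently, giving architecture-agnostic post-hoc uncertainty quantification.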

Comments
32 pages, 6 figures and 6 tables. Submitted to for revision
AI中文摘要

近年来,对预训练深度神经网络(DNN)的预测进行事后不确定性估计的兴趣日益增加。给定通过反向传播预训练的DNN,这些方法通过添加输出置信度度量(如误差条)来增强原始网络,同时不损害其初始准确性。在此背景下,我们引入了一种新的稀疏变分高斯过程(GP)族,其中当使用通用核时,后验均值固定为任意连续函数。具体地,我们将该GP的均值固定为预训练DNN的输出,使我们的方法能够有效地拟合GP的预测方差以估计DNN预测的不确定性。我们的方法利用变分推断(VI)进行高效的随机优化,训练成本与训练点数无关,可高效扩展到ImageNet等大型数据集。所提出的方法称为固定均值GP(FMGP),与架构无关,仅依赖预训练模型的输出来调整预测方差。实验结果表明,与最先进的DNN事后贝叶斯推断方法相比,FMGP在不确定性估计和计算效率方面均有提升。

英文摘要

Recently, there has been an increasing interest in performing post-hoc uncertainty estimation about the predictions of pre-trained deep neural networks (DNNs). Given a DNN pre-trained via back-propagation, these methods enhance the original network by adding output confidence measures, such as error bars, without compromising its initial accuracy. In this context, we introduce a novel family of sparse variational Gaussian processes (GPs), where the posterior mean is fixed to any continuous function when using a universal kernel. Specifically, we fix the mean of this GP to the output of the pre-trained DNN, allowing our approach to effectively fit the GP's predictive variances to estimate the DNN prediction uncertainty. Our approach leverages variational inference (VI) for efficient stochastic optimization, with training costs that remain independent of the number of training points, scaling efficiently to large datasets such as ImageNet. The proposed method, called fixed-mean GP (FMGP), is architecture-agnostic, relying solely on the pre-trained model's outputs to adjust the predictive variances. Experimental results demonstrate that FMGP improves both uncertainty estimation and computational efficiency when compared to state-of-the-art methods for DNN post-hoc Bayesian inference.
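
A simplified sketch of the fixed-mean idea: the predictive mean is the pre-trained network's output, left untouched, while a GP supplies only the predictive variance. We assume an RBF kernel, `np.sin` as a stand-in for the network, and an exact GP posterior variance in place of the paper's sparse variational approximation (the part that keeps training cost independent of the number of training points).

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def fixed_mean_predict(x_train, x_test, dnn_predict, noise=1e-2):
    """Mean = pre-trained network output (unchanged); variance = exact GP posterior
    variance, an illustrative stand-in for the sparse variational GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_train, x_test)                        # shape (n_train, n_test)
    alpha = np.linalg.solve(K, k_star)
    var = rbf(x_test, x_test).diagonal() - np.sum(k_star * alpha, axis=0)
    return dnn_predict(x_test), var

dnn = np.sin                                # stand-in for a pre-trained DNN (hypothetical)
x_train = np.linspace(-3.0, 3.0, 30)
x_test = np.array([0.0, 10.0])              # one point inside the data, one far outside
mean, var = fixed_mean_predict(x_train, x_test, dnn)
print(mean, var)                            # mean equals dnn(x_test); variance grows away from data
```

Note that this GP posterior variance depends only on input locations, not on targets, which is one way to see why the mean can be fixed without touching the network.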

2411.12438 2026-06-02 cs.DS cs.LG stat.ML

Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures

通过平方和方法的降维及非球形混合物的改进聚类算法

Prashanti Anderson, Mitali Bafna, Rares-Darius Buhai, Pravesh K. Kothari, David Steurer

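AI summary: Proposes a sum-of-squares-based dimension-reduction subroutine that enables efficient clustering of non-spherical Gaussian mixtures, substantially reducing the dimension dependence of the sample complexity and running time.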

Comments
67 pages, updated to match camera-ready version at COLT 2026
AI中文摘要

我们开发了一种新的方法,通过基于平方和方法的子程序,对非球形(即任意分量协方差)高斯混合模型进行聚类,该子程序能够找到输入数据的低维分离保持投影。我们的方法给出了经典降维(基于奇异值分解)的非球形类比,该经典降维在众多应用中构成著名的Vempala和Wang [VW04]球形聚类算法的关键组成部分。作为应用,我们获得了以下算法:(1) 对任意全变差分离的$k$个中心化(即零均值)高斯混合,使用$n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$个样本和$\operatorname{poly}(n)$时间进行聚类;(2) 对任意全变差分离的$k$个具有相同但未知协方差的高斯混合,使用$n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$个样本和$n^{O(\log w_{\min}^{-1})}$时间进行聚类。这里,$w_{\min}$是输入混合的最小混合权重,$f$不依赖于维度$d$。我们的算法自然扩展到容忍与维度无关的任意异常值比例。在这项工作之前,最先进的非球形聚类算法中的技术需要$d^{O(k)} f(w_{\min}^{-1})$个样本和时间来聚类此类混合。我们的结果可能令人惊讶,因为针对非球形高斯混合聚类的$d^{Ω(k)}$统计查询和平方和下限 [DKS17, DKPP24] 通常被认为排除了该问题的$d^{o(k)}$代价算法,但我们的结果表明,对于一类非常广泛的高斯混合,这些下限实际上可以被规避。

英文摘要

We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that, among several other applications, forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04]. As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ samples and time for clustering such mixtures. Our results may come as a surprise in the context of the $d^{\Omega(k)}$ statistical query and sum-of-squares lower bounds [DKS17, DKPP24] for clustering non-spherical Gaussian mixtures. While these results are usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.
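
The classical SVD-based dimension reduction mentioned above can be sketched as projecting the data onto its top-$k$ right singular vectors. The sketch assumes a spherical mixture (identity covariances) with means along the first $k$ axes; it shows the classical baseline only, not the sum-of-squares subroutine for non-spherical mixtures.

```python
import numpy as np

def svd_project(X, k):
    """Project rows of X onto the top-k right singular subspace (classical, spherical case)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:k].T                       # d x k orthonormal basis
    return X @ V, V

rng = np.random.default_rng(0)
d, k, n_per = 50, 3, 1000
means = 6.0 * np.eye(d)[:k]            # k component means along the first k axes (assumed)
X = np.vstack([m + rng.standard_normal((n_per, d)) for m in means])
_, V = svd_project(X, k)
proj_means = means @ V                 # component means after projection
orig = np.linalg.norm(means[0] - means[1])
kept = np.linalg.norm(proj_means[0] - proj_means[1])
print(kept / orig)                     # close to 1: separation between means is preserved
```

For spherical components the top-$k$ eigenspace of the second-moment matrix is the span of the means, which is why separation survives the projection; with arbitrary component covariances that argument no longer applies.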

2411.08692 2026-06-02 cond-mat.soft cond-mat.stat-mech stat.ML

A Likelihood Approach for Inference of Population Heterogeneity in Particle Ensembles with Second-Order Langevin Dynamics

基于二阶朗之万动力学的粒子系综群体异质性推断的似然方法

Jan Albrecht, Manfred Opper, Robert Großmann

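AI summary: For discretely sampled stochastic trajectory data, proposes a maximum likelihood method that jointly infers nonlinear second-order Langevin dynamics and estimates the heterogeneity of a population of actively moving particles, outperforming existing methods on short trajectories.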

Journal ref
Commun. Phys. 9, 165 (2026)
Comments
14 pages, 4 figures
AI中文摘要

生物体的内在复杂性常常导致运动行为呈现随机成分。因此,需要稳健的随机推断方法从实验提供的时间离散轨迹数据中理解和预测运动模式。在许多情况下,需要二阶朗之万模型来充分捕捉运动性。此外,在分析来自多个个体生物的数据时,需要考虑群体异质性。在这项工作中,我们描述了一种最大似然方法,用于推断动态随机模型,同时从离散采样的随机轨迹中估计运动活性粒子群体的异质性。为此,我们提出了一种近似非线性二阶朗之万模型似然的方法。我们表明,这种最大似然方法优于替代方法,尤其是在短轨迹上。此外,我们展示了如何推导异质性估计的不确定性度量。因此,我们为基于轨迹数据的主动驱动实体动态模型的系统性数据驱动推断铺平了道路,揭示了时间波动和粒子间变异性。

英文摘要

The inherent complexity of biological agents often leads to motility behavior that appears to have random components. Robust stochastic inference methods are therefore required to understand and predict the motion patterns from time-discrete trajectory data provided by experiments. In many cases, second-order Langevin models are needed to adequately capture the motility. Additionally, population heterogeneity needs to be taken into account when analyzing data from several individual organisms. In this work, we describe a maximum likelihood approach to infer dynamical, stochastic models and, simultaneously, estimate the heterogeneity in a population of motile active particles from discretely sampled, stochastic trajectories. To this end, we propose a method to approximate the likelihood for non-linear second-order Langevin models. We show that this maximum likelihood ansatz outperforms alternative approaches, especially for short trajectories. Additionally, we demonstrate how a measure of uncertainty for the heterogeneity estimate can be derived. We thereby pave the way for the systematic, data-driven inference of dynamical models for actively driven entities based on trajectory data, deciphering temporal fluctuations and inter-particle variability.
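
A minimal sketch of likelihood-based inference in a linear special case, under strong simplifying assumptions: velocities are observed directly, the drift is linear friction $-\gamma v$, and the likelihood is the Gaussian Euler-Maruyama transition density, whose maximiser is a closed-form regression slope. Each particle gets its own $\gamma$ to mimic population heterogeneity. The paper's method handles the harder case of nonlinear models fitted to discretely sampled trajectories.

```python
import numpy as np

def simulate_velocities(gammas, sigma=1.0, dt=0.02, n_steps=100_000, seed=0):
    """Euler-Maruyama simulation of dv = -gamma v dt + sigma dW, one particle per gamma."""
    rng = np.random.default_rng(seed)
    gammas = np.asarray(gammas, dtype=float)
    noise = sigma * np.sqrt(dt) * rng.standard_normal((n_steps, gammas.size))
    v = np.zeros((n_steps + 1, gammas.size))
    for k in range(n_steps):
        v[k + 1] = v[k] - gammas * v[k] * dt + noise[k]
    return v

def friction_mle(v, dt):
    """Gaussian (Euler) pseudo-likelihood MLE of gamma for each particle (column)."""
    dv = np.diff(v, axis=0)
    return -np.sum(v[:-1] * dv, axis=0) / (dt * np.sum(v[:-1] ** 2, axis=0))

true_gammas = [0.5, 1.0, 2.0]          # heterogeneous population (hypothetical values)
v = simulate_velocities(true_gammas)
est = friction_mle(v, dt=0.02)
print(est)
```

Per-particle estimates like these are the raw material for a heterogeneity estimate; with short trajectories they become noisy, which is where a joint population-level likelihood helps.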

2205.08586 2026-06-02 econ.EM stat.ME

Treatment Choice with Nonlinear Regret

非线性遗憾下的治疗选择

Toru Kitagawa, Sokbae Lee, Chen Qiu

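AI summary: Because mean regret is sensitive to sampling uncertainty, proposes minimizing the mean of a nonlinear transformation of regret, derives closed-form fractions for finite-sample Bayes and minimax optimal rules, and applies them to a normal regression model and to sample size calculation.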

Journal ref
Biometrika, Volume 113, Issue 2, 2026
AI中文摘要

文献关注福利遗憾的均值,但由于对抽样不确定性的敏感性,可能导致不理想的治疗选择。我们提出最小化遗憾的非线性变换的均值,并表明单例规则对于非线性遗憾不是本质上完备的。关注均方遗憾,我们推导出有限样本贝叶斯和极小化最优规则的闭式分数。我们的方法基于决策理论,并扩展到极限实验。治疗分数可视为支持治疗的证据强度。我们将我们的框架应用于正态回归模型和样本量计算。

英文摘要

The literature focuses on the mean of welfare regret, which can lead to undesirable treatment choice due to sensitivity to sampling uncertainty. We propose to minimize the mean of a nonlinear transformation of regret and show that singleton rules are not essentially complete for nonlinear regret. Focusing on mean square regret, we derive closed-form fractions for finite-sample Bayes and minimax optimal rules. Our approach is grounded in decision theory and extends to limit experiments. The treatment fractions can be viewed as the strength of evidence favoring treatment. We apply our framework to a normal regression model and sample size calculation.
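
To see why squared regret can favour fractional rules, the sketch below compares, by Monte Carlo at one effect size, a singleton rule (treat everyone if the estimate is positive) with a fractional rule. We assume a normal estimate $X \sim N(\theta, 1)$ with $\theta = 1$, regret $\max(\theta, 0) - \theta\,\delta(X)$ for treated fraction $\delta(X)$, and a hypothetical linear-ramp fraction $\mathrm{clip}(0.5 + X/4, 0, 1)$; this ramp is not the paper's optimal rule.

```python
import numpy as np

def regret_moments(rule, theta, sigma=1.0, n_sim=200_000, seed=0):
    """Monte Carlo mean and mean-square regret of a treatment rule.
    X ~ N(theta, sigma^2) estimates the average treatment effect theta;
    rule(X) is the fraction treated; regret = max(theta, 0) - theta * rule(X)."""
    rng = np.random.default_rng(seed)
    x = theta + sigma * rng.standard_normal(n_sim)
    regret = max(theta, 0.0) - theta * rule(x)
    return regret.mean(), (regret ** 2).mean()

singleton = lambda x: (x > 0).astype(float)                  # treat all or none
fractional = lambda x: np.clip(0.5 + x / 4.0, 0.0, 1.0)     # hypothetical linear ramp

mr_s, msr_s = regret_moments(singleton, theta=1.0)
mr_f, msr_f = regret_moments(fractional, theta=1.0)
print(mr_s, msr_s)
print(mr_f, msr_f)   # higher mean regret, but lower mean square regret
```

The singleton rule's regret is all-or-nothing, so squaring does not shrink it; the fractional rule spreads the risk, which the squared criterion rewards.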

2403.07008 2026-06-02 cs.LG cs.AI cs.CL stat.ME

AutoEval Done Right: Using Synthetic Data for Model Evaluation

AutoEval 的正确做法:使用合成数据进行模型评估

Pierre Boyeau, Anastasios N. Angelopoulos, Nir Yosef, Jitendra Malik, Michael I. Jordan

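AI summary: Proposes efficient, statistically unbiased algorithms that use AI-labeled synthetic data to reduce the human annotation needed for model evaluation, increasing the effective sample size by up to 50% in GPT-4 experiments.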

Comments
camera-ready paper version
AI中文摘要

使用人工标注的验证数据评估机器学习模型可能成本高昂且耗时。AI标记的合成数据可用于减少此目的所需的人工标注数量,这一过程称为自动评估。我们为此提出了高效且统计上无偏的算法,在保持无偏性的同时提高样本效率。这些算法在GPT-4实验中使有效人工标注样本量增加高达50%。

英文摘要

The evaluation of machine learning models using human-labeled validation data can be expensive and time-consuming. AI-labeled synthetic data can be used to decrease the number of human annotations required for this purpose in a process called autoevaluation. We suggest efficient and statistically principled algorithms for this purpose that improve sample efficiency while remaining unbiased. These algorithms increase the effective human-labeled sample size by up to 50% on experiments with GPT-4.
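
The rectification idea behind this kind of autoevaluation can be sketched with a prediction-powered estimate of a model's accuracy: average the AI judge's labels over a large unlabeled set, then correct that average with the mean human-minus-AI difference on a small human-labeled set. The simulated judge, which disagrees with humans 10% of the time, and all sample sizes are assumptions; the paper's algorithms may include refinements not shown here.

```python
import numpy as np

def ppi_mean(y_labeled, yhat_labeled, yhat_unlabeled):
    """Prediction-powered estimate of a mean: AI-label mean on the unlabeled set,
    corrected by the average human-minus-AI residual on the labeled set."""
    rectifier = np.mean(y_labeled - yhat_labeled)
    return np.mean(yhat_unlabeled) + rectifier

rng = np.random.default_rng(0)
n, N, acc = 500, 20_000, 0.8           # labeled size, unlabeled size, true accuracy (assumed)

def simulate(m):
    """Hypothetical data: true correctness and a noisy AI judge's verdict."""
    correct = (rng.uniform(size=m) < acc).astype(float)
    flip = rng.uniform(size=m) < 0.1                    # judge disagrees 10% of the time
    judged = np.where(flip, 1.0 - correct, correct)
    return correct, judged

y_lab, yhat_lab = simulate(n)
_, yhat_unl = simulate(N)              # human labels unavailable for this set
naive = yhat_unl.mean()                # biased: about 0.8*0.9 + 0.2*0.1 = 0.74
estimate = ppi_mean(y_lab, yhat_lab, yhat_unl)
print(naive, estimate)
```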

2211.04697 2026-06-02 stat.ME math.ST stat.TH

An average-case sensitivity analysis for unmeasured confounding

未测量混杂的平均情况敏感性分析

Yao Zhang, Qingyuan Zhao

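AI summary: Proposes an average-case sensitivity model based on the second moment of the propensity score ratio, derives closed-form bounds on the average potential outcomes by solving an optimization problem, and develops one-step estimators and multiplier-bootstrap confidence bands.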

Journal ref
Biometrika, 2026
Comments
42 pages, 3 figures, 2 tables
AI中文摘要

对无混杂假设的敏感性分析在观察性研究中至关重要。为此,边际敏感性模型因其良好的可解释性和数学性质近年来受到欢迎。然而,大多数现有模型仅考虑一个最坏情况参数,该参数限制了观测数据与完整数据倾向得分的对数几率差,这可能无法完全捕捉未测量混杂的程度。我们提出一个新的敏感性模型,该模型由倾向得分比的二阶矩参数化,仅要求未测量混杂的平均强度有界。通过将相关的敏感性分析刻画为优化问题,我们推导出模型下平均潜在结果的尖锐闭式界。我们基于相应的有效影响函数提出了这些界的有效一步估计量。此外,我们应用乘子自举构建同时置信带,以覆盖由不同敏感性参数值下的界组成的敏感性曲线。通过一项真实数据研究,我们说明了这种平均情况敏感性分析如何提供更紧的界,并利用观测协变量促进结果的校准。

英文摘要

Sensitivity analysis for the unconfoundedness assumption is crucial in observational studies. For this purpose, the marginal sensitivity model has recently gained popularity owing to its good interpretability and mathematical properties. However, most existing models only consider a worst-case parameter that bounds the logit difference between the observed and full data propensity scores, which may not fully capture the extent of unmeasured confounding. We propose a new sensitivity model that is parameterized by the second moment of the propensity score ratio, requiring only the average strength of unmeasured confounding to be bounded. By characterizing the associated sensitivity analysis as an optimization problem, we derive sharp closed-form bounds of the average potential outcomes under our model. We propose efficient one-step estimators for these bounds based on the corresponding efficient influence functions. Additionally, we apply multiplier bootstrap to construct simultaneous confidence bands to cover the sensitivity curve that consists of bounds at different values of the sensitivity parameters. Through a real-data study, we illustrate how this average-case sensitivity analysis can provide tighter bounds and facilitate calibration of the results using observed covariates.

2310.20545 2026-06-02 cs.LG math.OC stat.ML

Optimizing accuracy and diversity: a multi-task approach to forecast combinations

优化准确性与多样性:一种多任务预测组合方法

Giovanni Felici, Antonio M. Sudoso

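AI summary: Proposes a multi-task optimization method based on a deep learning architecture that jointly selects and combines forecasting models while accounting for both accuracy and diversity, improving the accuracy of time series point forecasts.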

Journal ref
Annals of Operations Research, 2026
AI中文摘要

我们提出了一种基于深度学习架构的多任务优化方法,用于时间序列预测。我们利用大量时间序列集合来识别可组合的预测模型权重,从而为每个序列生成预测。该方法联合处理两个任务:选择不同的预测模型及其有效组合。在此过程中,它以一种新颖的方式兼顾了预测方法的准确性和多样性。对于给定的时间序列,模型组合模块提取特征并用于优化预测方法的权重。同时,模型选择模块提取其他特征以识别用于预测的方法子集。该选择过程被构建为一个分类问题,标签表示用于序列的模型集合。这些标签通过求解一个辅助优化问题来确定,该问题为每个时间序列识别准确且多样的方法。然后,两个模块的输出被组合,整个神经网络通过梯度下降优化最小化自定义损失函数进行联合训练。在M4竞赛数据集和真实道路交通数据的大量序列上的实验结果表明,与最先进的方法相比,我们的方法提高了点预测精度。

英文摘要

We present a multi-task optimization approach based on a deep learning architecture for time series forecasting. We leverage large collections of time series to identify the weights of forecasting models that can be combined to produce forecasts for each series. This method jointly addresses two tasks: the selection of different forecasting models, and their effective combination. In doing so, it takes into account, in an original way, both the accuracy and diversity of the forecasting methods. For a given time series, the model combination module extracts features and uses them to optimize the weights of the forecasting methods. Simultaneously, the model selection module extracts other features to identify the subset of methods to be used for the prediction. This selection process is framed as a classification problem, with the labels representing the set of models to be used for a series. These labels are determined by solving an auxiliary optimization problem that identifies accurate and diverse methods for each time series. The outputs of the two modules are then combined and the entire neural network is jointly trained by minimizing a custom loss function via gradient descent optimization. Experimental results on a large set of series from the M4 competition dataset and from real road traffic data show that our proposal enhances point forecast accuracy compared to state-of-the-art methods.

2304.10255 2026-06-02 cs.LG stat.ML

PED-ANOVA: Efficiently Quantifying Hyperparameter Importance in Arbitrary Subspaces

PED-ANOVA:高效量化任意子空间中超参数重要性

Shuhei Watanabe, Archit Bansal, Frank Hutter

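AI summary: Proposes PED-ANOVA, which uses the Pearson divergence to compute hyperparameter importance in closed form for arbitrary subspaces, accurately identifying key hyperparameters while remaining computationally efficient.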

Comments
Accepted by IJCAI2023
AI中文摘要

近年来,深度学习超参数优化(HPO)的流行凸显了良好超参数(HP)空间设计在训练强模型中的作用。而设计一个好的HP空间关键依赖于理解不同HP的作用。这激发了超参数重要性(HPI)的研究,例如使用流行的功能ANOVA(f-ANOVA)方法。然而,原始的f-ANOVA公式不适用于算法设计者最相关的子空间,例如由顶级性能定义的子空间。为解决此问题,我们推导了任意子空间下f-ANOVA的新公式,并提出一种使用Pearson散度(PED)实现HPI闭式计算的算法。我们证明,这种新算法称为PED-ANOVA,能够成功识别不同子空间中的重要HP,同时计算效率极高。

英文摘要

The recent rise in popularity of Hyperparameter Optimization (HPO) for deep learning has highlighted the role that good hyperparameter (HP) space design can play in training strong models. In turn, designing a good HP space is critically dependent on understanding the role of different HPs. This motivates research on HP Importance (HPI), e.g., with the popular method of functional ANOVA (f-ANOVA). However, the original f-ANOVA formulation is inapplicable to the subspaces most relevant to algorithm designers, such as those defined by top performance. To overcome this issue, we derive a novel formulation of f-ANOVA for arbitrary subspaces and propose an algorithm that uses Pearson divergence (PED) to enable a closed-form calculation of HPI. We demonstrate that this new algorithm, dubbed PED-ANOVA, is able to successfully identify important HPs in different subspaces while also being extremely computationally efficient.
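
A rough histogram-based sketch of a Pearson-divergence importance score, in the spirit of PED-ANOVA but not its actual estimator: for each hyperparameter, compare its marginal distribution among the best-performing configurations with its marginal over all configurations. The binning, the top-10% quantile and the toy loss (which depends on the first hyperparameter only) are assumptions; the paper's closed-form computation is not reproduced here.

```python
import numpy as np

def pearson_importance(values, losses, top_quantile=0.1, bins=10):
    """Histogram sketch of a Pearson-divergence importance score for one hyperparameter:
        sum_b p_all[b] * (p_top[b] / p_all[b] - 1)^2,
    where p_top is the HP's marginal among top configurations (lowest loss)."""
    edges = np.linspace(values.min(), values.max(), bins + 1)
    p_all, _ = np.histogram(values, bins=edges)
    top = losses <= np.quantile(losses, top_quantile)
    p_top, _ = np.histogram(values[top], bins=edges)
    p_all = p_all / p_all.sum()
    p_top = p_top / p_top.sum()
    mask = p_all > 0
    return float(np.sum(p_all[mask] * (p_top[mask] / p_all[mask] - 1.0) ** 2))

rng = np.random.default_rng(0)
configs = rng.uniform(0.0, 1.0, size=(2000, 2))   # two hyperparameters in [0, 1]
loss = (configs[:, 0] - 0.3) ** 2                 # toy loss: only HP 0 matters
scores = [pearson_importance(configs[:, j], loss) for j in range(2)]
print(scores)   # HP 0 scores far higher than HP 1
```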

2110.11074 2026-06-02 stat.ME

A Unified Framework for Regularized Estimating Equations via Fixed-Point and Variational Inequality Problems

正则化估计方程的统一框架:基于不动点和变分不等式问题

Archer Y. Yang, Yue Zhao, Yi Lian, Yuwen Gu, Jun Fan

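AI summary: Proposes an improved formulation of regularized estimating equations and proves its equivalence to a fixed-point problem defined by a proximal operator and to a generalized variational inequality problem, providing a unified theoretical and computational framework for nonconvex regularization.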

AI中文摘要

许多统计学问题是在估计方程框架而非最小化框架下提出的。然而,与正则化最小化问题相比,正则化估计方程(REE)的研究要少得多。在本文中,我们研究了一种改进的正则化估计方程形式,并探讨其后续等价性,包括:(1)通过相应正则化子的近端算子指定的不动点问题,以及(2)广义变分不等式问题。这些等价性在一般条件下成立,并适用于非凸正则化子。此外,这些等价性为研究REE时的理论分析和计算算法开辟了新的可能性。

英文摘要

Many statistics problems are formulated within an estimating equation framework instead of a minimization framework. However, regularized estimating equations (REE) have been studied much less extensively than regularized minimization problems. In this paper, we study an improved regularized estimating equation formulation and explore its equivalences with (1) a fixed-point problem specified via the proximal operator of the corresponding regularizer, and (2) generalized variational inequality problems. Such equivalences hold under general conditions and accommodate nonconvex regularizers. Moreover, these equivalences open up new possibilities in theoretical analysis and computational algorithms when studying the REE.
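
A minimal sketch of the fixed-point equivalence for the lasso penalty $P(\beta) = \lambda\|\beta\|_1$: $\beta$ solves $0 \in -U(\beta) + \partial P(\beta)$ exactly when $\beta = \mathrm{prox}_{sP}(\beta + s\,U(\beta))$ for a step $s > 0$, and the proximal operator of the $\ell_1$ norm is soft-thresholding. The example assumes a least-squares estimating function and an orthonormal design, chosen so the answer can be checked in closed form. The iteration can be written for any $U$, including ones that are not gradients of a loss, though convergence then needs its own argument.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ree_fixed_point(U, beta0, lam, step=0.5, n_iter=500, tol=1e-12):
    """Iterate beta <- prox_{step*lam*||.||_1}(beta + step * U(beta)).
    A fixed point solves the lasso-regularised estimating equation
    0 in -U(beta) + lam * d||beta||_1."""
    beta = beta0.copy()
    for _ in range(n_iter):
        new = soft_threshold(beta + step * U(beta), step * lam)
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta

rng = np.random.default_rng(0)
n, p = 200, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = np.sqrt(n) * Q                          # columns scaled so that X^T X / n = I
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + rng.standard_normal(n)
U = lambda b: X.T @ (y - X @ b) / n         # least-squares estimating function
lam = 0.3
beta_hat = ree_fixed_point(U, np.zeros(p), lam)
closed_form = soft_threshold(X.T @ y / n, lam)   # exact solution for this orthonormal design
print(np.round(beta_hat, 4))
```

When $U$ is the negative gradient of a smooth loss, as here, this iteration is ordinary proximal gradient descent (ISTA).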

1902.07030 2026-06-02 math.ST math.PR stat.TH

New statistical methodology for second level global sensitivity analysis

New statistical methodology for second-level global sensitivity analysis

Anouar Meynaoui, Amandine Marrel, Béatrice Laurent

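AI Summary: Proposes a statistical method based on a single Monte Carlo loop and weighted HSIC estimators. It efficiently quantifies the uncertainty of global sensitivity analysis results when the input probability distributions are themselves uncertain.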

Comments
This version previously appeared as arXiv:2212.12435, which was accidentally submitted as a new work.
AI Chinese Abstract

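Global sensitivity analysis of numerical simulators aims to study the global impact of input uncertainties on the output. Statistical tools based on input/output dependence measures are commonly used to perform it. We focus on a dependence measure based on reproducing kernel Hilbert spaces: HSIC (the Hilbert-Schmidt Independence Criterion). Sometimes the probability distributions that model the input uncertainties are themselves uncertain, and it is important to quantify the global impact of this uncertainty on the results of global sensitivity analysis. We call this second-level global sensitivity analysis. However, performing it with a double Monte Carlo loop requires a large number of model evaluations, which is infeasible for simulators with expensive CPU time. To overcome this limitation, we propose a new statistical methodology based on a single Monte Carlo loop with a limited computational budget. First, we build a single input sample from a carefully chosen probability distribution and compute the corresponding code outputs. From this input/output sample, global sensitivity analysis is performed for various assumed probability distributions of the inputs by using weighted HSIC measure estimators. The statistical properties of these weighted estimators are proven. Then, we define second-level HSIC-based measures between the input probability distributions and the global sensitivity analysis results; these measures constitute the second-level global sensitivity analysis indices. The efficiency of our second-level method is illustrated on an analytical example, and several technical options are compared. Finally, an application to a test case simulating a severe accident scenario in a nuclear reactor is provided.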
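For reference, the HSIC between an input X and the output Y with kernels k and l is given below, where (X', Y') is an independent copy of (X, Y). A natural importance-weighted plug-in estimate follows it. That estimate uses a single sample (x_i, y_i) drawn under a sampling density f, is evaluated for a target input density f̃, and writes K_ij = k(x_i, x_j) and L_ij = l(y_i, y_j). The normalized weights w_i are the generic importance-sampling choice and an assumption here. The paper's weighted HSIC estimators may be defined differently, for example as U-statistics.

```latex
\mathrm{HSIC}(X,Y) = \mathbb{E}\bigl[k(X,X')\,l(Y,Y')\bigr]
  + \mathbb{E}\bigl[k(X,X')\bigr]\,\mathbb{E}\bigl[l(Y,Y')\bigr]
  - 2\,\mathbb{E}_{X,Y}\Bigl[\mathbb{E}_{X'}\bigl[k(X,X')\bigr]\,\mathbb{E}_{Y'}\bigl[l(Y,Y')\bigr]\Bigr]

\widehat{\mathrm{HSIC}}_w = \sum_{i,j} w_i w_j K_{ij} L_{ij}
  + \Bigl(\sum_{i,j} w_i w_j K_{ij}\Bigr)\Bigl(\sum_{i,j} w_i w_j L_{ij}\Bigr)
  - 2 \sum_{i} w_i \Bigl(\sum_{j} w_j K_{ij}\Bigr)\Bigl(\sum_{j} w_j L_{ij}\Bigr),
\qquad
w_i = \frac{\tilde f(x_i) / f(x_i)}{\sum_{m} \tilde f(x_m) / f(x_m)}
```

With uniform weights w_i = 1/n this reduces to the usual biased estimator trace(KHLH)/n², where H = I − 11ᵀ/n.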

English Abstract

Global sensitivity analysis (GSA) of numerical simulators aims at studying the global impact of the input uncertainties on the output. To perform the GSA, statistical tools based on input/output dependence measures are commonly used. We focus here on dependence measures based on reproducing kernel Hilbert spaces: the Hilbert-Schmidt Independence Criterion (HSIC). Sometimes, the probability distributions modeling the uncertainty of the inputs may themselves be uncertain, and it is important to quantify the global impact of this uncertainty on GSA results. We refer to this as second-level global sensitivity analysis (GSA2). However, GSA2 performed with a double Monte Carlo loop requires a large number of model evaluations, which is intractable for simulators with expensive CPU time. To cope with this limitation, we propose a new statistical methodology based on a single Monte Carlo loop with a limited calculation budget. First, we build a single sample of inputs from a well-chosen probability distribution and compute the associated code outputs. From this input/output sample, we perform GSA for various assumed probability distributions of the inputs by using weighted HSIC measure estimators. Statistical properties of these weighted estimators are demonstrated. Then, we define second-level HSIC-based measures between the probability distributions of the inputs and the GSA results, which constitute GSA2 indices. The efficiency of our GSA2 methodology is illustrated on an analytical example, comparing several technical options. Finally, an application to a test case simulating a severe accident scenario in a nuclear reactor is provided.
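The single-loop idea can be sketched as follows: run the simulator once on a sample drawn from one input law, then reweight that same sample to estimate HSIC under other assumed input laws. This minimal sketch has several assumptions. It uses Gaussian kernels with standard-deviation bandwidths and a normalized importance-weighted V-statistic. The toy function, the Beta(2, 2) alternative law, and the names `gaussian_gram` and `weighted_hsic` are hypothetical. It is not the authors' estimator or code.

```python
import numpy as np


def gaussian_gram(z, bandwidth):
    """Gaussian-kernel Gram matrix exp(-(z_i - z_j)^2 / (2 * bandwidth^2)) for 1-D data."""
    z = np.asarray(z, dtype=float)
    d2 = (z[:, None] - z[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth**2))


def weighted_hsic(x, y, weights=None):
    """Importance-weighted V-statistic estimate of HSIC(X, Y) for scalar X and Y.

    With weights=None (uniform weights 1/n) this reduces to the usual biased
    estimator trace(K H L H) / n^2. Weights are normalized to sum to one.
    Bandwidths are the sample standard deviations (an assumed heuristic).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    K = gaussian_gram(x, x.std())
    L = gaussian_gram(y, y.std())
    Kw = K @ w  # Kw[i] = sum_j w_j K_ij
    Lw = L @ w  # Lw[i] = sum_j w_j L_ij
    return float(w @ (K * L) @ w + (w @ Kw) * (w @ Lw) - 2.0 * np.sum(w * Kw * Lw))


# Hypothetical toy "simulator", run once on inputs drawn from the sampling law U(0, 1).
rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(0.0, 1.0, n)
x2 = rng.uniform(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * x1) + 0.1 * x2

# Reuse the same runs for an assumed alternative law of x1, Beta(2, 2), whose
# density is 6 x (1 - x): importance weights = target density / sampling density.
w_beta = 6.0 * x1 * (1.0 - x1)
for name, xi in [("x1", x1), ("x2", x2)]:
    print(name, weighted_hsic(xi, y), weighted_hsic(xi, y, w_beta))
```

No new simulator runs are needed for the alternative law; only the weights change. This reuse is what makes a single Monte Carlo loop cheaper than a double loop.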