arXivDaily: daily arXiv digest, updated Monday to Friday
STAT (Statistics): 222

1. Statistical Theory and Methods: 32 papers

2606.09391 2026-06-09 math.ST physics.ao-ph stat.ME stat.TH New submission

Kling-Gupta linear regression

Hristos Tyralis, Georgia Papacharalampous

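AI summary: Formalizes the Kling-Gupta loss function, derives explicit formulas for the parameter estimates in multiple linear regression, shows how they differ from ordinary least squares, and establishes their asymptotic properties.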

Comments
64 pages, 8 figures, 3 tables
Abstract

Although the Kling-Gupta efficiency ($\mathrm{KGE}$) is widely adopted for model evaluation in hydrology, its properties as a statistical estimator remain unexplored. Investigating these properties is necessary because parameter estimation and forecast evaluation are inherently linked. To address this, we formalize the negatively oriented Kling-Gupta loss $L_\mathrm{KG} = (1 - \mathrm{KGE})^2$ within an extremum estimation framework (equivalent to maximizing $\mathrm{KGE}$) and analyze its behavior in multiple linear regression. We establish explicit formulas for the parameter estimates, showing that Kling-Gupta linear regression scales the ordinary least squares (OLS) coefficient vector by a variance-inflation factor governed by the sample variances and covariances of the predictors and the response. We show that Kling-Gupta linear regression predictions replicate the sample variance of the response on the training set, in contrast to the variance reduction inherent to OLS, while both estimators maintain the sample mean of the observations and achieve the same sample correlation between the predictions and the response. We show analytically that no single estimator can simultaneously maximize both the Nash-Sutcliffe efficiency $\mathrm{NSE}$ and $\mathrm{KGE}$: the OLS estimator attains the maximum possible $\mathrm{NSE}$ but not the maximum $\mathrm{KGE}$, while the Kling-Gupta estimator maximizes $\mathrm{KGE}$ at the cost of $\mathrm{NSE}$. We prove the almost sure convergence of the Kling-Gupta estimator to well-defined population limits and express those limits algebraically. Furthermore, we evaluate the training and test set performance metrics for both estimators, demonstrating that for each estimator the metrics on the training set and on an independent test set converge asymptotically to identical limits (though the limits differ between OLS and Kling-Gupta regression).
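A minimal numerical sketch of the OLS-versus-KGE contrast, assuming the standard Gupta et al. (2009) form $\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$ with $\alpha$ the ratio of standard deviations and $\beta$ the ratio of means. A positive rescaling of the OLS slopes leaves $r$ unchanged and the intercept can restore $\beta = 1$, so choosing the factor that makes $\alpha = 1$ gives $\mathrm{KGE} = r$. This is our reading of the "variance-inflation factor" result, not the paper's formula; the simulated data are hypothetical.

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency, assuming the Gupta et al. (2009) definition."""
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = np.std(sim) / np.std(obs)      # variability ratio
    beta = np.mean(sim) / np.mean(obs)     # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def ols_and_kg_predictions(X, y):
    """Fitted values of OLS and of a variance-matched rescaling of the OLS fit."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    pred_ols = Xd @ coef
    # Inflate the OLS slopes so the fitted values reproduce the sample SD of y;
    # a positive rescaling leaves the correlation with y unchanged.
    factor = np.std(y) / np.std(pred_ols)
    slopes = factor * coef[1:]
    # Re-set the intercept so the fitted values keep the sample mean of y.
    intercept = np.mean(y) - np.mean(X, axis=0) @ slopes
    pred_kg = intercept + X @ slopes
    return pred_ols, pred_kg

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 10.0 + X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=500)
pred_ols, pred_kg = ols_and_kg_predictions(X, y)
print(f"OLS: KGE={kge(pred_ols, y):.3f}  NSE={nse(pred_ols, y):.3f}")
print(f"KG : KGE={kge(pred_kg, y):.3f}  NSE={nse(pred_kg, y):.3f}")
```

The two printed rows show the trade-off the abstract describes: OLS wins on NSE, the rescaled fit wins on KGE, and both share the same correlation with the response.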

2606.08981 2026-06-09 stat.ME New submission

Divide-and-shrink: An efficient and heterogeneity-agnostic approach for transfer estimation using summary statistics

Ruoyu Wang, Xihong Lin

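AI summary: Proposes the Divide-and-shrink method, which uses summary statistics from the target and external populations to estimate target parameters in closed form; it is guaranteed to outperform the target-only estimator under arbitrary heterogeneity and needs neither a model nor tuning.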

Abstract

Knowledge transfer across data sources holds great promise for improving the estimation of target population parameters by leveraging the growing availability of data from different sources. However, the effectiveness of knowledge transfer is often challenged by the complex and pervasive heterogeneity between data sources and the lack of access to individual-level data. This paper proposes the divide-and-shrink (dShrink) method, a transfer estimation method that estimates target population parameters in a closed form using summary statistics from a target population and some external source populations while accounting for population heterogeneity. The dShrink estimator is guaranteed to outperform the estimator based solely on the target population in terms of expected quadratic error under arbitrary population heterogeneity. The gain can be substantial when the target and source populations are similar, or the underlying true parameter values are near zero. Notably, dShrink is model-free, requires no user-specified tuning parameters, is robust to various types of heterogeneity between data sources, and applies to a broad range of parameter estimation problems. dShrink remains effective even when the covariance matrix of the external summary statistics is not accessible, and it offers flexibility in incorporating side information and summary statistics from multiple source populations. Simulations and real data analyses demonstrate the superior performance of the dShrink estimator and its potential as a robust tool for transfer estimation.
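The abstract does not give the dShrink formula. To illustrate the general idea of shrinking a target estimate toward an external one by a data-driven, tuning-free amount, here is the classical positive-part James-Stein estimator shrinking toward an independent source estimate. It is a generic stand-in, not dShrink, and it assumes a known target covariance $\sigma^2 I_p$ with $p \ge 3$; the parameter values are hypothetical.

```python
import numpy as np

def shrink_toward_source(theta_target, theta_source, sigma2):
    """Positive-part James-Stein shrinkage of a target estimate toward a source estimate.

    Assumes theta_target ~ N(theta, sigma2 * I_p) with known sigma2, p >= 3, and a
    source estimate independent of the target data. Generic stand-in, not dShrink.
    """
    d = theta_target - theta_source
    p = d.size
    norm2 = float(d @ d)
    if norm2 == 0.0:
        return theta_source.copy()
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)   # data-driven amount of shrinkage
    return theta_source + factor * d

rng = np.random.default_rng(1)
p, sigma2, reps = 10, 1.0, 2000
theta = rng.normal(size=p)            # hypothetical target parameter
theta_source = theta + 0.1            # similar but slightly heterogeneous source
err_target = err_shrunk = 0.0
for _ in range(reps):
    theta_target = theta + np.sqrt(sigma2) * rng.normal(size=p)
    err_target += np.sum((theta_target - theta) ** 2) / reps
    err_shrunk += np.sum((shrink_toward_source(theta_target, theta_source, sigma2) - theta) ** 2) / reps
print(f"MSE target-only: {err_target:.2f}, MSE shrunk: {err_shrunk:.2f}")
```

When the source is close to the target, most of the target's sampling noise is removed; when it is far away, the factor approaches 1 and the estimate reverts to the target-only one.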

2606.08975 2026-06-09 stat.OT New submission

Strong Likelihood Principle: Strengthening a Principle or Misunderstanding the Likelihood Function

Paul William Vos

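AI summary: Revisits the strong likelihood principle, arguing that it stems from a confusion about the domain of the likelihood function; through a comparison of the binomial and negative binomial families and the geometry of the Fisher information metric, argues that the strong likelihood principle reduces to the weak likelihood principle.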

Comments
15 pages, 3 figures
Abstract

The strong likelihood principle (SLP) is conventionally derived from the sufficiency principle and a conditionality principle in an argument due to Birnbaum, and much of the literature contests whether the derivation is sound. We take a different approach. We ask what the SLP says when its terms are read carefully, and argue that the principle as ordinarily stated reflects a confusion about the domain of the likelihood function. The likelihood is naturally defined as a function on a family of distributions $M$, not on a parameter space, and once it is so defined the SLP collapses into its weak counterpart, the weak likelihood principle. The diagnosis is illustrated by analogy with monetary value, developed concretely through a comparison of the binomial and negative binomial families that share a parameter, and connected to the geometric structure of $M$ through the Fisher information metric. The same standardization emerges from a statistical argument about comparing measurements across populations and from a geometric argument about manifold distance; this convergence supplies the positive content of the weak likelihood principle.
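The binomial versus negative binomial comparison is the standard textbook case. The sketch below uses the classic illustrative numbers (3 successes in 12 trials, not taken from the paper): the two likelihoods are proportional in $p$, which is what the SLP exploits, while the two families carry different expected Fisher information, and hence different metrics, except at $p = r/n$.

```python
from math import comb

def binomial_lik(p, n=12, k=3):
    """Fixed design: n trials, k successes observed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def negbin_lik(p, n=12, r=3):
    """Sequential design: sample until the r-th success, which arrives on trial n."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

def fisher_binomial(p, n=12):
    """Expected Fisher information for the binomial family with n trials."""
    return n / (p * (1 - p))

def fisher_negbin(p, r=3):
    """Expected Fisher information for the negative binomial family with r successes."""
    return r / (p**2 * (1 - p))

for p in (0.1, 0.25, 0.5, 0.9):
    ratio = binomial_lik(p) / negbin_lik(p)   # constant in p: C(12,3) / C(11,2) = 4
    print(f"p={p:.2f}  likelihood ratio={ratio:.3f}  "
          f"I_bin={fisher_binomial(p):.1f}  I_nb={fisher_negbin(p):.1f}")
```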

2606.08551 2026-06-09 stat.ME New submission

Enhanced localized conformal prediction with imperfect auxiliary information

Yinjie Min, Liuhua Peng, Changliang Zou

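AI summary: Proposes Enhanced Localized Conformal Prediction (ELCP), which integrates auxiliary data through density-ratio-weighted kernel estimation, improving the reliability of local coverage while retaining finite-sample marginal coverage.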

Abstract

There is growing interest in constructing conformal prediction sets that provide approximate or asymptotic conditional coverage guarantees, capturing local data heterogeneity. However, methods like localized conformal prediction (LCP) may face challenges in ensuring reliable prediction sets in regions with sparse calibration data. This paper introduces Enhanced Localized Conformal Prediction (ELCP), a novel approach that incorporates auxiliary data to refine localized prediction sets while preserving finite-sample marginal coverage guarantees. By utilizing a density-ratio-weighted kernel estimator, ELCP seamlessly integrates auxiliary and calibration data, accommodating potential distributional shifts and improving the local reliability of prediction sets. Theoretical analysis confirms that ELCP maintains marginal coverage and enhances asymptotic test-conditional coverage. Simulation results demonstrate its superior local coverage and smaller prediction sets compared to standard LCP, highlighting its effectiveness in settings with limited calibration data but available auxiliary information from related tasks.
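For readers unfamiliar with localized conformal prediction, a minimal sketch of its core ingredient: a kernel-weighted split-conformal quantile, where calibration scores are weighted by their closeness to the test point and the test point itself carries a point mass at $+\infty$. This is not ELCP (no auxiliary data, no density-ratio weights); the Gaussian kernel, bandwidth, and simulated residuals are assumptions.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the normalized mass sum_i w_i*delta(s_i) + w_test*delta(+inf)."""
    order = np.argsort(scores)
    s = np.asarray(scores, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    total = w.sum() + test_weight
    cum = np.cumsum(w)
    hits = np.nonzero(cum >= (1.0 - alpha) * total - 1e-12)[0]
    return s[hits[0]] if hits.size else np.inf   # +inf: too little local calibration mass

def localized_interval(x_cal, resid_cal, x_test, y_pred, alpha=0.1, bandwidth=0.5):
    """Interval y_pred +/- q with Gaussian-kernel weights centred at x_test."""
    w = np.exp(-0.5 * ((x_cal - x_test) / bandwidth) ** 2)
    q = weighted_conformal_quantile(np.abs(resid_cal), w, 1.0, alpha)  # k(x_test, x_test) = 1
    return y_pred - q, y_pred + q

rng = np.random.default_rng(2)
x_cal = rng.uniform(-2.0, 2.0, size=500)
resid_cal = rng.normal(scale=0.2 + 0.5 * np.abs(x_cal))   # hypothetical heteroscedastic residuals
for x0 in (0.0, 1.8):
    lo, hi = localized_interval(x_cal, resid_cal, x0, y_pred=0.0)
    print(f"x={x0}: [{lo:.2f}, {hi:.2f}]")
```

With equal weights the quantile reduces to the usual split-conformal rank $\lceil (n+1)(1-\alpha) \rceil$. When few calibration points lie near $x_{\text{test}}$, the interval becomes infinite; this sparse-region failure is the one the abstract targets with auxiliary data.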

2606.08499 2026-06-09 stat.ME stat.CO New submission

A Transferability Criterion for Null-Optimized Variance Reduction in Cumulant-Based Error-Independence Testing

Serhii Zabolotnii

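AI summary: Proposes a closed-form criterion for when control-variate and polynomial-maximization estimators transfer from the null to the alternative hypothesis in hypothesis testing, and applies it to the Wiedermann-Shi third-order cumulant test: the second-order correction is unbiased under the null but inconsistent under the alternative, while the fourth-order correction reduces variance but fails to control a nuisance effect.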

Comments
16 pages; no figures; submitted manuscript version
Abstract

Control-variate and polynomial-maximization (PMM) estimators are optimized at a single fixed distribution, yet they are increasingly proposed to strengthen hypothesis tests, which decide between two regions of a parameter family. We give a closed-form criterion for when this transfer succeeds. For an $H_0$-centered augmentation of a target moment statistic with null-optimized weight vector $K_0$, the alternative-side expectation equals the target plus $K_0^\top \mu_{a,H_1}$, where $\mu_{a,H_1}$ is the alternative-side mean of the augmenting basis. Null-variance reduction therefore transfers without bias only under the orthogonality condition $K_0^\top \mu_{a,H_1} = 0$; requiring each augmenting function to remain mean-zero is sufficient but not necessary. We instantiate the criterion on the recently proposed Wiedermann-Shi third-order cumulant test for measurement-error independence. A second-order PMM correction is unbiased and lower-variance under the null (relative efficiency $\ge 1$ in all 36 conditions; aggregated mean ARE values 1.23-5.16; Type-I error rate 0.04-0.09), yet provably inconsistent under the alternative: the antisymmetric polynomial auxiliaries acquire nonzero means, attenuating the target by a closed-form factor and costing 7-52 percentage points of power, worst where the test is strongest and worsening under heavy tails. A fourth-order variant reduces variance (ratio 1.127) but fails a nuisance guard (rejection rate 0.295 versus 0.10). We derive a reusable alternative-consistency acceptance gate for variance-reduced test statistics.
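A toy instance of the criterion, constructed for illustration and not taken from the Wiedermann-Shi test: the target statistic is $T = X^3$ (zero mean under $H_0\colon X \sim N(0,1)$) and the auxiliary is $a = X$. The null-optimal weight is $K_0 = -\mathrm{Cov}(X^3, X)/\mathrm{Var}(X) = -3$, which cuts the variance from 15 to 6 (ratio 2.5). Under $H_1\colon X \sim N(\mu, 1)$, however, $E[T] = \mu^3 + 3\mu$ and $E[a] = \mu$, so the bias term $K_0 \mu_a = -3\mu$ attenuates the augmented mean to $\mu^3$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def augmented(x, k0):
    """Target statistic x**3 augmented by the H0-centred auxiliary x."""
    return x**3 + k0 * x

# Null-optimised weight K0 = -Cov(T, a) / Var(a), estimated under H0: x ~ N(0, 1).
x0 = rng.standard_normal(n)
k0 = -np.cov(x0**3, x0)[0, 1] / np.var(x0, ddof=1)
var_ratio = np.var(x0**3) / np.var(augmented(x0, k0))   # null variance reduction

# Under the alternative x ~ N(mu, 1) the auxiliary is no longer mean-zero.
mu = 0.5
x1 = mu + rng.standard_normal(n)
target_mean = np.mean(x1**3)                  # close to mu**3 + 3*mu
augmented_mean = np.mean(augmented(x1, k0))   # target mean + k0 * mean(x1), close to mu**3
print(f"K0={k0:.3f}  null variance ratio={var_ratio:.2f}")
print(f"H1 means: target={target_mean:.3f}  augmented={augmented_mean:.3f}")
```

The variance reduction is real under $H_0$, but the signal that the test relies on shrinks from about 1.625 to about 0.125 at $\mu = 0.5$, because $K_0 \mu_{a,H_1} \ne 0$.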

2606.08475 2026-06-09 q-bio.QM stat.ME New submission

Parameter uncertainty in dynamical models: a practical identifiability index

Hamed Karami, Alexandra Smirnova, Sunmi Lee, Gerardo Chowell

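AI summary: Proposes the Practical Identifiability Index (PII), which quantifies parameter uncertainty from the logarithmic span of confidence intervals, to assess how tightly parameters are constrained by limited, noisy data.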

Abstract

Ordinary differential equation models are widely used to understand and forecast complex dynamical systems, but their predictive value depends on reliable parameter estimation. Structural identifiability assesses whether parameters can be uniquely recovered from ideal observations, whereas practical identifiability depends on finite, noisy and partially observed data. We introduce the Practical Identifiability Index (PII), a marginal uncertainty-width metric based on the logarithmic span of confidence intervals. Expressed on an order-of-magnitude scale, the PII summarises how tightly individual positive-valued parameters are constrained by available observations, enabling comparison across parameters, models, error structures and observation designs. The PII is intended as a complementary diagnostic, not a standalone identifiability test, and should be interpreted alongside coverage, profile likelihoods, posterior summaries, sensitivity analysis or structural identifiability results. Using parametric bootstrap experiments across growth and compartmental epidemic models, we identify consistent principles: uncertainty decreases as calibration windows become more informative, increases with observation noise and parameter coupling, and remains high for latent or indirectly observed processes. Parameters governing early observable dynamics become constrained sooner, while additional observables can improve constraint for latent progression and recovery parameters. The PII provides a simple, reportable summary of marginal parameter uncertainty for dynamical modelling.
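The abstract describes the PII only as the logarithmic span of a confidence interval on an order-of-magnitude scale. A minimal sketch, assuming the form $\mathrm{PII} = \log_{10}(U/L)$ for a positive parameter with interval $[L, U]$ and a percentile interval over bootstrap re-estimates; the lognormal "bootstrap" draws below are hypothetical stand-ins for refits of an ODE model.

```python
import numpy as np

def practical_identifiability_index(lower, upper):
    """Order-of-magnitude width of a confidence interval for a positive parameter.

    Assumed form: log10(upper / lower); 0 is a point, 1 is one decade of uncertainty.
    """
    if lower <= 0 or upper <= 0:
        raise ValueError("bounds must be positive")
    return float(np.log10(upper / lower))

def bootstrap_pii(estimates, level=0.95):
    """PII from a percentile interval over (parametric) bootstrap re-estimates."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [tail, 1.0 - tail])
    return practical_identifiability_index(lower, upper)

rng = np.random.default_rng(3)
well_constrained = rng.lognormal(mean=np.log(0.3), sigma=0.1, size=2000)   # hypothetical refits
poorly_constrained = rng.lognormal(mean=np.log(0.3), sigma=1.0, size=2000)
print(bootstrap_pii(well_constrained), bootstrap_pii(poorly_constrained))
```

Because the index uses a ratio, it does not depend on the units of the parameter, which is what allows comparison across parameters and models.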

2606.08460 2026-06-09 stat.ML cs.LG New submission

LOTTERY: Learning from Reference-Only Samples in Two-Sample Testing under Size Asymmetry

Xunye Tian, Zhijian Zhou, Liuhua Peng, Feng Liu

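AI summary: For two-sample testing with abundant reference samples but very few query samples, proposes learning reference-dependent representations from the reference samples alone and weighting them adaptively, achieving type I error control and consistency for the permutation test.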

Journal ref
ICML 2026
Comments
16 pages, 1 figure
Abstract

Data-adaptive two-sample testing assesses whether two samples come from the same distribution, using a discrepancy learned from the data (e.g., via kernel-based feature representations). Such methods typically rely on data splitting to decouple learning from testing and control type I error. However, this paradigm is ill-suited to few-shot settings with severe sample-size imbalance: abundant reference samples are available, while only a handful of query samples arrive. In this paper, we show how this imbalance can be leveraged constructively. Using abundant reference data, we learn reference-dependent representations that summarize salient structure of the reference distribution and provide informative signals for detecting departures. We incorporate a collection of representation families that capture both global and local structure, and adaptively weight them using only reference samples via an uncertainty-guided principle. Theoretically, we establish permutation-based type I error control and show consistency of the aggregated test: as the sample sizes grow, the test power converges to one whenever the representation set contains at least one consistent representation. Empirically, our aggregation achieves strong performance across a range of benchmarks while retaining type I error control.
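As background for the permutation-based type I error control, a minimal sketch of a fixed-kernel MMD permutation test with a large reference sample and a tiny query sample. It is a generic baseline, not LOTTERY: there is no learned reference-dependent representation and no adaptive weighting, and the Gaussian bandwidth and simulated data are assumptions.

```python
import numpy as np

def gaussian_gram(Z, bandwidth):
    """Gram matrix of the Gaussian kernel over the rows of Z."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * bandwidth**2))

def mmd2_from_gram(K, idx_x, idx_y):
    """Biased (V-statistic) estimate of squared MMD from a precomputed Gram matrix."""
    return (K[np.ix_(idx_x, idx_x)].mean() + K[np.ix_(idx_y, idx_y)].mean()
            - 2 * K[np.ix_(idx_x, idx_y)].mean())

def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value for H0: X and Y are drawn from the same distribution."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    K = gaussian_gram(Z, bandwidth)
    n = len(X)
    idx = np.arange(len(Z))
    observed = mmd2_from_gram(K, idx[:n], idx[n:])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))        # relabel pooled points at random
        if mmd2_from_gram(K, perm[:n], perm[n:]) >= observed:
            count += 1
    return (1 + count) / (1 + n_perm)         # finite-sample valid p-value

rng = np.random.default_rng(3)
reference = rng.normal(size=(200, 2))              # abundant reference sample
query_null = rng.normal(size=(10, 2))              # few queries, same distribution
query_shift = rng.normal(loc=2.0, size=(10, 2))    # few queries, shifted distribution
p_null = mmd_permutation_test(reference, query_null)
p_shift = mmd_permutation_test(reference, query_shift)
print(p_null, p_shift)
```

The "+1" in numerator and denominator is what makes the permutation p-value valid at any sample size; the paper's contribution lies in choosing the representation so that such a test stays powerful with only a handful of queries.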

2606.08409 2026-06-09 stat.ME q-bio.PE New submission

Matrix representations and distance metrics for unlabeled ranked phylogenetic networks

Jiayang Wang, Julia A. Palacios, Claudia Solís-Lemus

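AI summary: For rooted, ranked, unlabeled phylogenetic networks, proposes a family of distance metrics based on a bijective triangular matrix representation; the metrics handle both isochronous and heterochronous networks and quantify differences in topology, timing, and number of hybridizations.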

Comments
25 pages, 11 figures. Submitted to the Proceedings of the National Academy of Sciences (PNAS)
Abstract

Phylogenetic networks are graphs inferred from molecular sequence data that represent ancestral histories shaped by reticulate processes such as recombination, hybridization, and horizontal gene transfer. We introduce a family of distance metrics for rooted, ranked, unlabeled phylogenetic networks, extending a previously developed distance for ranked trees. Our approach relies on a bijective triangular matrix representation of phylogenetic networks that captures the temporal order of internal events, speciations, and hybridizations. Our metrics, defined as standard matrix norms, allow efficient quantitative comparisons of network topologies, timed networks and networks with differing numbers of hybridizations. Our distance can be used for both isochronous networks where all tips are sampled at one time point, and heterochronous networks where tips are allowed to be sampled at different time points. We show that our metrics capture biologically meaningful differences among evolutionary histories in both simulations and empirical posterior distributions of viral phylogenetic networks. These tools fill a methodological gap, enabling principled comparisons of ranked, unlabeled phylogenetic networks, including ancestral recombination graphs.

2606.08407 2026-06-09 stat.ME stat.AP New submission

Topological Effective Connectivity Modeling in Brain Networks

Anass El-Yaagoubi, Moo K. Chung, Hernando Ombao

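AI summary: Proposes a nonparametric, information-theoretic framework that combines the discrete Hodge decomposition with lead-lag mutual information to split edge flow into gradient, curl, and harmonic components, distinguishing feed-forward drive from recurrent feedback, and uses permutation tests to identify nodes and triangular motifs whose information flow changes significantly between conditions.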

Comments
45 pages, 15 figures
Abstract

Characterizing directed information flow in brain networks is difficult because neural circuits are full of recurrent feedback loops. Many existing tools for directed dependence assume a directed acyclic graph (DAG) structure to resolve directional ambiguity, and therefore cannot represent these loops. We present a nonparametric, information-theoretic framework that addresses this by coupling the discrete Hodge decomposition with lead-lag mutual information, splitting the resulting edge flow into three orthogonal components: a gradient term capturing hierarchical, feed-forward relationships; a curl term isolating triangle-level feedback loops; and a harmonic term capturing cyclic flow around topological holes. This separation makes it possible to disentangle feed-forward drive from recurrent circulation, which conventional measures conflate. We further develop a permutation-based hypothesis-testing layer that identifies nodes and triangular motifs whose information-flow signatures change significantly between conditions. We validate the framework on simulations with known ground-truth structure and apply it to local field potential recordings from a rodent model of focal ischemic stroke. In three of four animals, we find a post-stroke shift toward hierarchical, source-driven propagation at the expense of recurrent feedback, while the fourth shows no significant change.
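A minimal sketch of the discrete Hodge decomposition step on a toy complex: 4 nodes, 5 oriented edges, one filled triangle $(0,1,2)$, and an unfilled cycle $0 \to 2 \to 3 \to 0$ that supports a harmonic flow. The edge-flow values are hypothetical; in the paper they would come from lead-lag mutual information, which is not computed here, and neither is the permutation-testing layer.

```python
import numpy as np

nodes = 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]   # oriented i -> j with i < j
triangles = [(0, 1, 2)]                             # filled 2-simplex; (0, 2, 3) is left empty

B1 = np.zeros((nodes, len(edges)))                  # node-edge incidence (divergence)
for e, (i, j) in enumerate(edges):
    B1[i, e], B1[j, e] = -1.0, 1.0

B2 = np.zeros((len(edges), len(triangles)))         # edge-triangle incidence (boundary)
edge_index = {edge: e for e, edge in enumerate(edges)}
for t, (i, j, k) in enumerate(triangles):
    B2[edge_index[(i, j)], t] = 1.0
    B2[edge_index[(j, k)], t] = 1.0
    B2[edge_index[(i, k)], t] = -1.0

def hodge_decompose(f):
    """Split an edge flow into gradient, curl and harmonic parts via least squares."""
    s = np.linalg.lstsq(B1.T, f, rcond=None)[0]     # node potentials
    c = np.linalg.lstsq(B2, f, rcond=None)[0]       # triangle circulations
    grad, curl = B1.T @ s, B2 @ c                   # orthogonal projections (B1 @ B2 = 0)
    return grad, curl, f - grad - curl

flow = np.array([0.3, -0.4, 1.0, 0.2, 0.5])         # hypothetical edge flow
grad, curl, harm = hodge_decompose(flow)
print(np.round(grad, 3), np.round(curl, 3), np.round(harm, 3))
```

The gradient part is the feed-forward, potential-driven flow; the curl part circulates around the filled triangle; the harmonic remainder circulates around the unfilled cycle that no triangle can explain.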

2606.08322 2026-06-09 cs.LG stat.ME New submission

Orthogonality and Dimensionality in Airline Cluster Analysis using PCA and Kernel PCA

Andreas Schlapbach

Affiliations: Swiss Federal Railways (SBB), University of Berne

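AI summary: Replicates the clustering experiment of Renold et al. on US airline profit cycles from 1995 to 2020; using PCA and kernel PCA, finds that the six-cluster taxonomy is geometrically robust in both the original 7-dimensional space and the 3-dimensional PC space, and confirms that the data lie on an intrinsically linear manifold.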

Abstract

To characterize US airline profit cycles from 1995 to 2020, Renold et al. (2023) combine k-means clustering, principal component analysis, and system dynamics modelling. We replicate their clustering experiment in three spaces (the original 7-dimensional raw-variable space, a 3-dimensional PC score space, and a 4-dimensional PC score space) using the dataset they generously included in the paper. We show that the six-cluster taxonomy is geometrically robust: k-means in 3-PC space produces cluster assignments bit-for-bit identical to those in the 7D raw space. As a nonlinearity check, we apply kernel PCA under six kernels spanning three families plus a linear baseline. All six kernels preserve the six-cluster assignment in 2D. A 1D diagnostic tightens this: the linear kernel conflates the COVID year $C_3$ with the peak-profit cluster $C_0$, whereas all five non-baseline kernels shift $C_3$ to overlap only the post-financial-crisis cluster $C_5$. Agreement across the kernel families confirms an intrinsically linear manifold with no hidden curvature. The silhouette criterion reveals that the dataset structurally supports only three clusters, not six. Collinearity in the raw 7D space suppresses the silhouette signal that would otherwise identify k=3 as the structurally motivated choice.
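Why can k-means agree bit-for-bit across raw and PC spaces? With Euclidean distance, k-means is invariant to rotations of the centred data, and the full set of PC scores is exactly such a rotation; truncating to the leading PCs perturbs squared distances only by the variance in the discarded components. A minimal sketch on 26 synthetic observations (not the Renold et al. data), assuming unstandardized variables; whether the original study standardized them is not stated here.

```python
import numpy as np

def pca_scores(X):
    """Centre the columns of X and return PC scores and explained-variance ratios (SVD)."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T, s**2 / np.sum(s**2)

def sq_dists(Z):
    """Matrix of squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z**2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * Z @ Z.T

rng = np.random.default_rng(4)
latent = rng.normal(size=(26, 3))                   # synthetic rank-3 structure
X = latent @ rng.normal(size=(3, 7)) + 0.01 * rng.normal(size=(26, 7))
scores, evr = pca_scores(X)
full_gap = np.max(np.abs(sq_dists(scores) - sq_dists(X)))          # full rotation: ~0
trunc_gap = np.max(np.abs(sq_dists(scores[:, :3]) - sq_dists(X))) / np.max(sq_dists(X))
print(f"variance in first 3 PCs: {evr[:3].sum():.4f}, relative distance gap: {trunc_gap:.2e}")
```

When the dropped PCs carry almost no variance, the distance geometry, and therefore the k-means partition, is essentially unchanged.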

2606.08202 2026-06-09 stat.ML cs.LG physics.data-an q-bio.NC New submission

Vector Space of Cycles

Moo K. Chung, Anass B. El-Yaagoubi, Hernando Ombao

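AI summary: Proposes a variational framework that represents cyclic interactions as edge flows on a simplicial complex and uses energy-minimizing dynamics to separate transient from persistent harmonic flows, yielding a low-dimensional cycle space that supports projection, averaging, comparison, and statistical inference on cyclic structure.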

Abstract

Most statistical and machine learning methods for directed interactions focus on pairwise effects among variables. Even existing cyclic models represent feedback primarily through node-level dependencies, making large-scale recurrent organization difficult to estimate and compare. This limitation is particularly acute in biological and neural systems, where interactions are highly recurrent and involve many overlapping cycles. We introduce a variational framework for statistical inference on cyclic interactions. Directed interactions are represented as edge flows on a simplicial complex and evolved under an energy-minimizing dynamical system. The resulting dynamics separate transient interaction components from persistent harmonic flows, yielding a low-dimensional cycle space that captures stable recurrent organization. Rather than enumerating individual cycles, the proposed framework represents cyclic interactions as elements of a Hilbert space, enabling projection, averaging, comparison, and population-level statistical inference. We establish theoretical properties of the harmonic projection, including characterization of the cycle space, variance reduction, and population inference. Simulations demonstrate substantially improved recovery of cyclic structure in dense recurrent systems compared with existing directed-interaction methods. Applied to resting-state fMRI from 400 human subjects, the framework reveals reproducible large-scale cyclic organization that is not detectable through edgewise averaging. These results provide a scalable statistical framework for studying recurrent interactions in high-dimensional dynamical systems.
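The abstract does not specify its energy-minimizing dynamics. A natural candidate, assumed here, is the heat flow $\dot f = -L_1 f$ with Hodge 1-Laplacian $L_1 = B_1^\top B_1 + B_2 B_2^\top$, which is the gradient flow of the energy $\tfrac12 f^\top L_1 f$. Its transient (gradient and curl) modes decay and its long-time limit is the orthogonal projection onto the harmonic space. The sketch uses a toy complex and simulated "subjects" with a hypothetical shared cyclic pattern, and shows that group averaging commutes with the projection.

```python
import numpy as np

# Toy complex: 4 nodes, oriented edges (0,1),(0,2),(0,3),(1,2),(2,3); triangle (0,1,2) filled.
B1 = np.array([[-1, -1, -1,  0,  0],
               [ 1,  0,  0, -1,  0],
               [ 0,  1,  0,  1, -1],
               [ 0,  0,  1,  0,  1]], dtype=float)    # node-edge incidence
B2 = np.array([[1.0], [-1.0], [0.0], [1.0], [0.0]])  # edge-triangle incidence

L1 = B1.T @ B1 + B2 @ B2.T          # Hodge 1-Laplacian
evals, evecs = np.linalg.eigh(L1)
H = evecs[:, evals < 1e-9]          # orthonormal basis of harmonic flows (kernel of L1)

def heat_flow(f0, t):
    """Exact solution of df/dt = -L1 f via the eigendecomposition of L1."""
    return evecs @ (np.exp(-evals * t) * (evecs.T @ f0))

def harmonic_projection(f):
    """Orthogonal projection of an edge flow onto the harmonic (cycle) space."""
    return H @ (H.T @ f)

rng = np.random.default_rng(5)
signal = 0.8 * H[:, 0]                        # hypothetical shared cyclic pattern
flows = signal + rng.normal(size=(400, 5))    # 400 hypothetical subjects
projected = flows @ H @ H.T                   # row-wise harmonic projections
group_mean = projected.mean(axis=0)
print(np.round(group_mean, 3), np.round(signal, 3))
```

Projecting each subject onto the low-dimensional cycle space before averaging discards the non-harmonic noise, which is the kind of variance reduction the abstract refers to.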

2606.08084 2026-06-09 math.ST stat.AP stat.ML stat.TH New submission

Assessing model calibration with boosting trees

Selim Gatti

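AI summary: Proposes using boosting trees to test necessary conditions for the calibration and auto-calibration of regression models, and demonstrates the approach on an insurance dataset.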

Comments
36 pages
Abstract

The main goal of regression modelling is to approximate the conditional mean of a response given a set of features. A regression function is said to be calibrated if the resulting mean estimates match the true conditional means for almost every set of features. Calibration seems unattainable in practice, since one typically works with finite samples of noisy observations. A weaker notion is auto-calibration: the expected response, given a particular mean estimate, equals that estimate. This notion matters, e.g., in insurance pricing, because it ensures that there is no cross-subsidization between different price cohorts. In this paper, we show that boosting trees can be used to test necessary conditions for calibration and auto-calibration, respectively. The practical relevance of our approach is supported by a numerical example in which the proposed tests prove very powerful on a large insurance dataset.
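To make auto-calibration concrete, a minimal binned check of $E[Y \mid m(X)] = m(X)$: group observations by quantiles of the prediction and compare mean response with mean prediction in each group. This is a simple diagnostic, not the paper's boosting-tree test; the Poisson model and the 30% overpricing are hypothetical.

```python
import numpy as np

def autocalibration_gaps(pred, y, n_bins=10):
    """Mean response minus mean prediction within quantile bins of the prediction.

    Under auto-calibration, E[Y | m(X)] = m(X), so every gap should be close to zero.
    """
    edges = np.quantile(pred, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, pred, side="right") - 1, 0, n_bins - 1)
    return np.array([y[bins == b].mean() - pred[bins == b].mean() for b in range(n_bins)])

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 2.0, size=50_000)
true_mean = np.exp(0.5 + 0.8 * x)                    # hypothetical Poisson mean
y = rng.poisson(true_mean)
gaps_good = autocalibration_gaps(true_mean, y)       # auto-calibrated predictions
gaps_bad = autocalibration_gaps(1.3 * true_mean, y)  # systematic 30% overpricing
print(np.round(gaps_good, 3))
print(np.round(gaps_bad, 3))
```

In pricing terms, a nonzero gap in a bin means policyholders in that price cohort are paying for, or being subsidized by, another cohort.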

2606.07984 2026-06-09 econ.EM cs.NA math.NA stat.CO New submission

Lagrange multipliers in Maximum likelihood estimations and Least squares problems with Constraints

Takeshi Fukasawa

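AI summary: Studies the statistical properties of Lagrange multipliers in constrained maximum likelihood estimation and least squares, shows that they converge to zero in large samples, and discusses the implications for initializing algorithms and for penalty methods.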

Abstract

This study investigates a statistical property of Lagrange multipliers in constrained Maximum Likelihood Estimation (MLE) and Least Squares (LS) problems from the perspective of numerical optimization. Building on large-sample theory, we show that the associated Lagrange multipliers converge to zero as the sample size increases, provided the distribution is correctly specified in MLE or the residuals are normally distributed in LS. Although this asymptotic behavior has long been recognized in statistics, it has received little explicit attention in numerical optimization and has rarely been exploited in algorithmic design. Importantly, the insight extends beyond classical low-dimensional settings: even in modern high-dimensional applications, such as deep learning, where the number of parameters may exceed the sample size, the same reasoning applies provided the generalization performance is good. This observation has two main implications. First, many constrained optimization algorithms, including the Augmented Lagrangian Method, Sequential Quadratic Programming, and Interior Point methods, require initial values for the multipliers, and choosing zero is statistically justified. Numerical experiments for constrained regressions and dynamic discrete choice model estimations support this implication by showing that initializing multipliers at zero usually leads to stable and efficient performance. Second, penalty-based approaches that convert constrained problems into unconstrained ones can perform well when the true multipliers are small. This helps explain why penalty-based methods often perform well in practice.
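A minimal sketch of the claim for equality-constrained least squares, solved through its KKT system. We assume the per-observation objective $\frac1n\lVert y - Xb\rVert^2$ (the scale at which the multiplier should vanish; with an unnormalized sum of squares it would instead grow like $\sqrt n$). When the constraint $\sum_k b_k = 0$ holds in the hypothetical data-generating model, $|\lambda|$ shrinks roughly like $1/\sqrt n$; when it is misspecified ($\sum_k b_k = 1$), it stays bounded away from zero.

```python
import numpy as np

def constrained_ls(X, y, C, d):
    """Minimise (1/n) * ||y - X b||^2 subject to C b = d by solving the KKT system.

    Stationarity: (2/n) X'X b - (2/n) X'y + C' lam = 0;  feasibility: C b = d.
    """
    n, p = X.shape
    m = C.shape[0]
    kkt = np.block([[2.0 / n * X.T @ X, C.T],
                    [C, np.zeros((m, m))]])
    rhs = np.concatenate([2.0 / n * X.T @ y, d])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:p], sol[p:]          # coefficients, Lagrange multipliers

beta_true = np.array([1.0, 2.0, -3.0])   # satisfies the constraint sum(beta) = 0
C = np.ones((1, 3))

def mean_abs_multiplier(n, d_value, reps=50, seed=7):
    """Average |lambda| over simulated data sets of size n with constraint sum(b) = d_value."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(reps):
        X = rng.normal(size=(n, 3))
        y = X @ beta_true + rng.normal(size=n)
        _, lam = constrained_ls(X, y, C, np.array([d_value]))
        total += abs(lam[0])
    return total / reps

for n in (100, 100_000):
    print(n, mean_abs_multiplier(n, 0.0), mean_abs_multiplier(n, 1.0))
```

A small true multiplier is also why a zero starting value for $\lambda$ in an augmented Lagrangian or SQP solver is usually already close to the solution.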

2606.07914 2026-06-09 stat.ML cs.LG New submission

Identifiability and Estimation for Unlabeled Finite Mixtures under Marginal Independence

Takafumi Kanamori, Yushi Hirose, Shohei Yamamoto

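AI summary: Studies how, in unlabeled finite mixture models, a marginal independence assumption can be used to recover the latent components and estimate the mixing matrix; proposes the PM-MMD estimator and proves its convergence.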

Abstract

We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights. The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures.
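A minimal sketch of the product-marginal discrepancy that, on our reading, underlies PM-MMD: the squared MMD between the joint law of a coordinate pair and the product of its marginals, with the product surrogate formed by permuting one column. The paper's estimator additionally searches over affine combinations of the observable mixtures, which is not shown; the Gaussian bandwidth and simulated samples are assumptions.

```python
import numpy as np

def mean_gaussian_kernel(A, B, bandwidth):
    """Average of k(a, b) = exp(-||a - b||^2 / (2 h^2)) over all pairs of rows."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * bandwidth**2)).mean()

def product_marginal_mmd2(Z, i, j, bandwidth=1.0, seed=0):
    """Biased squared MMD between the joint of (Z_i, Z_j) and a product-of-marginals surrogate.

    The surrogate breaks the (i, j) dependence by permuting column j across rows.
    """
    rng = np.random.default_rng(seed)
    joint = Z[:, [i, j]]
    product = np.column_stack([Z[:, i], rng.permutation(Z[:, j])])
    return (mean_gaussian_kernel(joint, joint, bandwidth)
            + mean_gaussian_kernel(product, product, bandwidth)
            - 2 * mean_gaussian_kernel(joint, product, bandwidth))

rng = np.random.default_rng(9)
x = rng.normal(size=1000)
Z_ind = np.column_stack([x, rng.normal(size=1000)])      # independent coordinate pair
Z_dep = np.column_stack([x, x + 0.3 * rng.normal(size=1000)])   # strongly dependent pair
mmd_ind = product_marginal_mmd2(Z_ind, 0, 1)
mmd_dep = product_marginal_mmd2(Z_dep, 0, 1)
print(f"independent pair: {mmd_ind:.4f}, dependent pair: {mmd_dep:.4f}")
```

A value near zero is consistent with the marginal-independence signal the identification argument relies on; computing it on held-out data gives the kind of candidate-level diagnostic the abstract mentions.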

2606.07809 2026-06-09 cs.SE stat.AP stat.ME New submission

Sensitivity Analysis White Paper

Nate Bade, Lindsay Erickson

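AI summary: Systematically surveys sensitivity analysis methods and organizes them into a framework for complex simulations (especially military applications), covering local and global methods, variance-based techniques, and more, and discusses sensitivity auditing.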

Comments
12 pages
Abstract

Sensitivity analysis is an important component of simulation-based decision support because it helps analysts determine which inputs most strongly influence model outcomes under uncertainty. This paper organizes the broad sensitivity analysis literature into a coherent framework for use in complex simulation settings, with particular attention to military applications. We review major classes of methods, including local and global approaches, variance-based techniques, screening methods, derivative-based methods, and uncertainty quantification tools, and relate them to common analytical objectives such as factor prioritization, factor fixing, variance reduction, and factor mapping. The paper also discusses sensitivity auditing as a complementary perspective that emphasizes transparency, assumption tracking, and responsible use of models in decision-relevant settings.
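As one concrete instance of the variance-based techniques reviewed, a minimal sketch of first-order Sobol indices with the pick-freeze estimator form popularized by Saltelli et al. (2010), for independent $U(0,1)$ inputs. The test model $Y = X_1 + 2X_2 + 0 \cdot X_3$ is hypothetical and chosen because its indices are known exactly: $S = (1/5, 4/5, 0)$.

```python
import numpy as np

def first_order_sobol(model, d, n=100_000, seed=0):
    """First-order Sobol indices for U(0,1)^d inputs via a Saltelli-type pick-freeze estimator."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n, d))
    B = rng.uniform(size=(n, d))
    fA, fB = model(A), model(B)
    var_y = np.var(np.concatenate([fA, fB]))
    indices = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]           # A with column i taken from B
        indices[i] = np.mean(fB * (model(ABi) - fA)) / var_y
    return indices

def linear_model(X):
    """Hypothetical simulator with a dominant, a minor, and an inert input."""
    return 1.0 * X[:, 0] + 2.0 * X[:, 1] + 0.0 * X[:, 2]

S = first_order_sobol(linear_model, d=3)
print(np.round(S, 3))
```

In the terminology of the abstract, factor prioritization ranks inputs by $S_i$, and factor fixing targets inputs whose indices are near zero, like $X_3$ here.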

2606.07680 2026-06-09 stat.ME cs.SI New submission

A Counting Process View of Relational Event Models: Practical Asymptotics

Cornelius Fritz, Alexander Fuchs-Kreiss

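AI summary: Revisits relational event models from a counting-process perspective, analyzes the conditions for asymptotic normality of the maximum likelihood estimator under different asymptotic regimes, and uses simulation studies to guide practical model specification.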

英文摘要

Relational Event Models (REMs) provide a rigorous framework for analyzing dyadic interactions observed in continuous time, capturing history-dependent dynamics such as triadic closure and reciprocity. Framing REMs through the lens of counting processes embeds the model in a rich theoretical foundation, facilitating its mathematical analysis. While Maximum Likelihood Estimation (MLE) is standard practice for estimating these models, the underlying statistical guarantees rely on specific asymptotic regimes, namely, whether the network size (n), the observational period (T), or both approach infinity. We review the theoretical foundations of such counting-process-based models, formalizing the core assumptions required to achieve asymptotic normality across these different limits. With a specific focus on Cox-type multiplicative models, we detail the circumstances under which these assumptions hold. Supported by simulation studies, we illustrate how structural modeling choices, including temporal windowing and logarithmic transformations, affect empirical coverage and estimator convergence. We thereby derive several guiding principles for specifying such models in realistic contexts, bridging theory and practice.
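The sketch below makes the Cox-type multiplicative form concrete. It evaluates the partial log-likelihood of an ordered event sequence in which each directed dyad (i, j) has intensity λ0(t)·exp(θ·s_ij(t)). The single reciprocity statistic, the full risk set of ordered dyads and the toy event list are illustrative assumptions, not the specification studied in the paper.

```python
import math
from itertools import permutations


def reciprocity(history, sender, receiver):
    """Number of past events from receiver back to sender (hypothetical statistic)."""
    return sum(1 for s, r in history if s == receiver and r == sender)


def rem_partial_loglik(events, n_actors, theta):
    """Cox-type partial log-likelihood of an ordered relational event sequence.

    Each dyad (i, j) has intensity lambda0(t) * exp(theta * s_ij(t)); the
    baseline lambda0 cancels, so only the order of events matters here.
    """
    risk_set = list(permutations(range(n_actors), 2))  # all ordered dyads i != j
    history = []
    loglik = 0.0
    for sender, receiver in events:
        score = theta * reciprocity(history, sender, receiver)
        denom = sum(math.exp(theta * reciprocity(history, i, j)) for i, j in risk_set)
        loglik += score - math.log(denom)
        history.append((sender, receiver))
    return loglik


events = [(0, 1), (1, 0), (0, 1)]
ll_null = rem_partial_loglik(events, n_actors=3, theta=0.0)
ll_recip = rem_partial_loglik(events, n_actors=3, theta=1.0)
print(f"theta=0: {ll_null:.4f}, theta=1: {ll_recip:.4f}")
```

With θ = 0, every event contributes −log n(n − 1), so the null value here is −3 log 6. A positive reciprocity effect raises the likelihood of this back-and-forth sequence.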

2606.07561 2026-06-09 cs.LG stat.ME stat.ML New submission

Boundary Variance Inflation Causes Acquisition Bias in Gaussian Processes

Maria Bånkestad, Sanna Jarl, Jens Sjölund

Affiliations: RISE Research Institutes of Sweden; Uppsala University

AI summary: Shows that boundary variance inflation in stationary-kernel Gaussian processes on bounded domains is rooted in truncation of the kernel correlation neighborhood, demonstrates that this geometric distortion systematically biases three classes of acquisition functions, and proposes a function-free selection-profile diagnostic.

Comments
14 pages, 8 figures; appendices included

Abstract

Gaussian processes with stationary kernels on bounded domains exhibit inflated posterior variance near the boundary. Despite being a long-recognized artifact in geostatistics and a source of over-exploration in Bayesian optimization, the causes and effects of boundary-induced acquisition bias are underexplored. We trace the root cause to a simple geometric mechanism: the truncation of the kernel correlation neighborhood at the domain boundary creates an observation-independent distortion that worsens with dimensionality. We show how this distortion manifests across three acquisition classes: variance maximization concentrates selections at the corners, whereas negative integrated posterior variance and expected predictive information gain move selections inward to axis-aligned interior shells. These patterns arise without reference to any objective function, meaning that acquisition behavior can be dominated by kernel geometry rather than the desired task-specific uncertainty. To quantify this, we introduce a function-free selection-profile diagnostic for arbitrary acquisitions, kernels, and bounded-domain geometries.
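The sketch below reproduces the boundary effect in a few lines. It assumes a unit-variance exponential (Matérn-1/2) kernel with lengthscale 0.2 and ten evenly spaced design points on [0, 1], then compares the posterior variance at the boundary x = 0 with the variance at the interior point x = 0.5. Both points are 0.05 from their nearest design point, but the boundary point has neighbors on one side only.

The kernel was chosen because its Markov property gives closed forms: 1 − e^(−0.5) at the boundary and 1 − 2e^(−0.5)/(1 + e^(−0.5)) in the interior. It is not the authors' experimental setup.

```python
import math


def exp_kernel(x, y, lengthscale=0.2):
    """Stationary exponential (Matern-1/2) kernel with unit variance."""
    return math.exp(-abs(x - y) / lengthscale)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def posterior_variance(x_star, design, jitter=1e-9):
    """GP posterior variance k(x*, x*) - k*^T K^{-1} k* (noise-free up to jitter)."""
    n = len(design)
    k_mat = [[exp_kernel(a, b) + (jitter if i == j else 0.0)
              for j, b in enumerate(design)] for i, a in enumerate(design)]
    k_star = [exp_kernel(x_star, a) for a in design]
    low = cholesky(k_mat)
    # Forward substitution: solve L v = k*, so that k*^T K^{-1} k* = v^T v.
    v = []
    for i in range(n):
        v.append((k_star[i] - sum(low[i][k] * v[k] for k in range(i))) / low[i][i])
    return exp_kernel(x_star, x_star) - sum(vi * vi for vi in v)


# Ten design points at cell centres of [0, 1]: 0.05, 0.15, ..., 0.95.
design = [(k + 0.5) / 10 for k in range(10)]
var_boundary = posterior_variance(0.0, design)  # nearest point 0.05 away, one side only
var_interior = posterior_variance(0.5, design)  # nearest points 0.05 away on both sides
print(f"boundary: {var_boundary:.4f}, interior: {var_interior:.4f}")
```

A variance-maximizing acquisition applied to this posterior would prefer the boundary, even though no objective function has been observed.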

2606.04875 2026-06-09 astro-ph.IM astro-ph.EP astro-ph.SR physics.data-an stat.ME Version update

A Model Selection Criterion for Multidimensional Gaussian Processes: Application to Radial Velocities

Oscar Barragán

AI summary: For multidimensional Gaussian process regression that jointly models ancillary activity indicators to separate stellar and planetary signals, proposes an information criterion, MGIC_rv, built from the conditional radial-velocity likelihood and an effective parameter count, for comparing multi-GP models and identifying the activity indicators that most effectively constrain the RV signal.

Comments
Accepted for publication in MNRAS Letters

Abstract

Multidimensional Gaussian Process (multi-GP) regression is widely used to disentangle stellar and planetary signals in radial velocities (RVs) by jointly modelling ancillary activity indicators. However, identifying the combination of indicators that best constrains the stellar signal in the RVs is non-trivial, as classical model comparison methods are not directly applicable when multi-GPs involve different time series combinations. In this work, we present an information criterion to compare multi-GP models based on their ability to explain the RV component, $\mathrm{MGIC}_{\rm rv}$. This metric combines the conditional RV likelihood with an effective parameter count that accounts for the regularisation imposed by the multi-GP model on the RV component. We demonstrate that $\mathrm{MGIC}_{\rm rv}$ provides a quantitative and robust framework for multi-GP model comparison, identifying the activity indicators that most effectively constrain the RV signal. Although developed in the context of RV analysis, the proposed criterion is general and applicable to multi-GP problems in which the inference focuses on a specific observable.

2605.27237 2026-06-09 math.OC stat.ME Version update

Feasibility Determination for Subjective Probability Constraints

Taehoon Kim, Sigrun Andradottir, Seong-Hee Kim, Yuwei Zhou

AI summary: For Bernoulli-distributed simulation output, proposes a feasibility determination procedure that uses the raw observations directly instead of relying on batch means to approximate normality, introduces subjective constraints with multiple thresholds, and establishes statistical validity and efficiency through theory and experiments.

Abstract

We consider the problem of determining feasible systems from a finite set of simulated alternatives with respect to probability constraints, where the observations from stochastic simulations are Bernoulli distributed. Most statistically valid procedures for feasibility determination focus on constraints on the means of normally distributed observations. Although these procedures can be adapted to Bernoulli-distributed data by treating batch means as basic observations, achieving approximate normality often requires a large batch size, potentially leading to the unnecessary waste of observations in reaching a decision. This paper proposes a procedure that utilizes the Bernoulli-distributed observations directly to determine feasibility. In addition, we incorporate subjective constraints, allowing for multiple thresholds for each constraint. We demonstrate that our proposed procedure is statistically valid and that it outperforms an existing feasibility determination procedure for subjective constraints originally developed for normally distributed observations. Furthermore, we propose two heuristic feasibility check approaches for thresholds that are sequentially added by decision makers, allowing thresholds to be tightened when many systems are feasible or relaxed when no feasible system exists. We show by experiments that the proposed procedures can efficiently provide feasibility decisions to systems with respect to all thresholds considered.

1606.04182 2026-06-09 stat.CO Version update

Robust and Efficient Estimation for a Discrete Distribution Using L2 Optimization

Jiwoong Kim

AI summary: Proposes an estimator of the Poisson rate parameter based on Cramer-von Mises-type optimization that is asymptotically normal and robust; simulations show it outperforms classical methods such as maximum likelihood.

Abstract

This paper proposes a novel method to estimate the rate parameter of the Poisson distribution. The proposed method employs the Cramer-von Mises type optimization which has been commonly used in estimating parameters of continuous distributions. Upon obtaining the estimator through the proposed method, its desirable properties such as asymptotic distribution and robustness are rigorously investigated. Simulation studies serve to demonstrate that the proposed method compares favorably with other well-celebrated methods including the maximum likelihood method.
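The sketch below is a minimal minimum-distance estimator in the Cramer-von Mises spirit. It assumes the objective Σ_k (F_n(k) − F_λ(k))², summed over k = 0, 1, ..., max(data), minimized by a simple grid search over λ. The author's exact objective and asymptotic analysis are not reproduced. The sketch compares the estimate with the sample mean (the Poisson MLE) under a hypothetical contamination scenario in which 5% of observations are replaced by the value 50.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplicative sampler; adequate for small rates."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def poisson_cdf_table(lam, k_max):
    """[F_lambda(0), ..., F_lambda(k_max)] built from the pmf recursion."""
    pmf = math.exp(-lam)
    total = pmf
    table = [total]
    for k in range(1, k_max + 1):
        pmf *= lam / k
        total += pmf
        table.append(total)
    return table


def cvm_poisson_estimate(data, step=0.01):
    """Grid-search minimizer of sum_k (F_n(k) - F_lambda(k))^2, k = 0..max(data)."""
    k_max = max(data)
    counts = [0] * (k_max + 1)
    for x in data:
        counts[x] += 1
    ecdf, running = [], 0
    for c in counts:
        running += c
        ecdf.append(running / len(data))
    best_lam, best_obj = step, float("inf")
    for i in range(1, int((k_max + 1) / step) + 1):
        lam = i * step
        model_cdf = poisson_cdf_table(lam, k_max)
        obj = sum((e - m) ** 2 for e, m in zip(ecdf, model_cdf))
        if obj < best_obj:
            best_lam, best_obj = lam, obj
    return best_lam


rng = random.Random(1)
clean = [sample_poisson(3.0, rng) for _ in range(2000)]
contaminated = clean[:1900] + [50] * 100  # hypothetical 5% gross outliers

for name, data in [("clean", clean), ("contaminated", contaminated)]:
    mean = sum(data) / len(data)
    print(f"{name}: sample mean = {mean:.2f}, CvM-type estimate = {cvm_poisson_estimate(data):.2f}")
```

On the contaminated data, the sample mean is pulled to about 5.35. The distance-based estimate stays near 3 because the outliers can shift the empirical CDF only by a bounded amount.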

2605.14610 2026-06-09 stat.ME eess.SP math.ST stat.TH Version update

Parametrically Adaptive Transition Polynomial: a Signed-Parity Continuous-alpha Extension of Kunchenko Stochastic Polynomials

Serhii Zabolotnii

AI summary: Introduces the Parametrically Adaptive Transition Polynomial (PATP), a signed-parity continuous-α extension of Kunchenko stochastic polynomials controlled by a continuous parameter α in [0, 1], for parameter estimation under non-Gaussian errors, and examines its limits of applicability for extremely heavy-tailed distributions.

Comments
35 pages, 8 figures. Code supplement: https://github.com/SZabolotnii/Ku-PATP-code-supplement

Abstract

Kunchenko's method of polynomial maximization provides a semiparametric apparatus for parameter estimation under non-Gaussian errors, but its classical power basis relies on finite higher-order integer moments. This paper introduces the Parametrically Adaptive Transition Polynomial (PATP), a signed-parity fractional-power family controlled by a continuous parameter alpha in [0,1]. The quadratic exponent map p_i(alpha) connects the fractional regime p_i(0)=1/i, the degenerate linear point p_i(1/2)=1, and the signed-parity integer-power regime p_i(1)=i. For the degree-S=2 case we derive a closed-form variance-reduction coefficient g_2(alpha) in terms of signed and absolute fractional moments, identify the singular behavior at alpha=1/2, and state the moment and regularity conditions under which the formula is meaningful. The construction should be read as a Form-B PATP analogue within Kunchenko's generalized apparatus, not as an exact recovery of the canonical even-power PMM basis at alpha=1. Numerical illustrations on canonical distributions are used to examine the finite-sample behavior of the signed-parity estimator and to mark the boundary of applicability for extremely heavy-tailed cases such as Cauchy.
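A quadratic is fixed by three points, so if the exponent map is an ordinary quadratic in α for each i, the anchors p_i(0) = 1/i, p_i(1/2) = 1 and p_i(1) = i determine it. In Lagrange form it reads p_i(α) = (1/i)(2α² − 3α + 1) + 4α(1 − α) + i(2α² − α). The sketch below reconstructs only that interpolant from the abstract's three anchor points. The signed-parity basis built from it and the variance-reduction coefficient g_2(α) are not shown.

```python
def exponent_map(alpha, i):
    """Quadratic p_i(alpha) through (0, 1/i), (1/2, 1) and (1, i), in Lagrange form."""
    l0 = 2.0 * (alpha - 0.5) * (alpha - 1.0)  # 1 at alpha = 0, 0 at 1/2 and 1
    l1 = 4.0 * alpha * (1.0 - alpha)          # 1 at alpha = 1/2, 0 at 0 and 1
    l2 = 2.0 * alpha * (alpha - 0.5)          # 1 at alpha = 1, 0 at 0 and 1/2
    return l0 / i + l1 + i * l2


for i in (1, 2, 3):
    row = [round(exponent_map(a, i), 4) for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(f"i={i}: {row}")
```

For i = 1 the map is identically 1, since all three anchor values coincide.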

2605.10406 2026-06-09 stat.ME stat.AP stat.ML Version update

Multi-Fidelity Quantile Regression

Yixiang Liu, Yao Zhang

AI summary: Proposes a two-stage, model-agnostic multi-fidelity quantile regression method that uses a local quantile link to express the high-fidelity quantile as a low-fidelity quantile evaluated at a covariate-dependent level, then estimates this level function to improve accuracy; experiments validate the approach.

Comments
69 pages, 12 figures, 3 tables

Abstract

High-fidelity (HF) data are often expensive to collect and therefore scarce, making conditional quantiles difficult to estimate accurately. We propose a two-stage, model-agnostic method for multi-fidelity quantile regression. The central idea is a local quantile link: at each covariate value, the HF quantile is represented as a low-fidelity (LF) quantile evaluated at a covariate-dependent level. This reformulation reduces the problem to estimating the level function, which can be smoother than the HF quantile itself when the LF and HF conditional distributions have similar shapes. We also study the complementary regime in which this advantage weakens and introduce a correction step to improve robustness. Our theory characterizes when the proposed estimator converges faster than direct quantile regression using HF data alone and when the correction step provides further improvement. Experiments on synthetic and real data show that our method yields more accurate quantile estimates and tighter conformal prediction intervals.
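The local quantile link can be illustrated at a single covariate value. If the HF τ-quantile equals the LF quantile at level h, then h is the LF CDF evaluated at the HF τ-quantile. The sketch below estimates h from samples, assuming LF ~ N(0, 1) and HF ~ N(0.5, 1) purely for illustration; for these distributions the exact level at τ = 0.5 is Φ(0.5) ≈ 0.691. The paper's two-stage estimator of the whole level function and its correction step are not reproduced.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sample, tau):
    """Smallest order statistic whose empirical CDF is at least tau."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[index]


def empirical_cdf(sample, value):
    """Fraction of the sample at or below value."""
    return sum(1 for s in sample if s <= value) / len(sample)


def quantile_link_level(lf_sample, hf_sample, tau):
    """Level h with q_LF(h) = q_HF(tau) at one covariate value: h = F_LF(q_HF(tau))."""
    return empirical_cdf(lf_sample, empirical_quantile(hf_sample, tau))


rng = random.Random(5)
lf = [rng.gauss(0.0, 1.0) for _ in range(20_000)]  # cheap, plentiful LF draws (assumed)
hf = [rng.gauss(0.5, 1.0) for _ in range(4_000)]   # assumed HF distribution
level = quantile_link_level(lf, hf, tau=0.5)
print(f"estimated level {level:.3f} vs exact {NormalDist().cdf(0.5):.3f}")
```

When the LF and HF distributions differ mainly by a location shift, as here, the level barely changes with the covariate. That is the situation in which the abstract argues the level function can be smoother than the HF quantile itself.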

2602.18364 2026-06-09 cs.IT cs.LG math.IT quant-ph stat.ML Version update

Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings

Sreejith Sreekumar, Nir Weinberger

AI summary: Studies quantum maximum likelihood prediction, proposing a unified framework that embeds empirical probability distributions into quantum states and minimizes quantum relative entropy, with non-asymptotic performance guarantees.

Comments
31+3 pages, 1 figure

Abstract

Maximum likelihood prediction (MLP) is a core task in modern large language models. Here, as a first step, we study a quantum version of this task for a simplified data model consisting of independent and identically distributed samples. The quantum maximum likelihood predictor is obtained by embedding empirical probability distributions into quantum states and minimizing the quantum relative entropy over a given class of states. We provide an interpretation of this predictor in terms of quantum reverse information projection and quantum Pythagorean theorem when the class of quantum models is sufficiently expressive. We further derive non-asymptotic performance guarantees in terms of convergence rates and concentration inequalities, both in trace norm and quantum relative entropy. Our approach provides a unified framework to handle MLP within both classical and quantum LLMs.

2602.05807 2026-06-09 stat.ME Version update

SpARCD: A Spectral Graph Framework for Revealing Differential Functional Connectivity in fMRI Data

Shira Yoffe, Ziv Ben-Zion, Guy Gurevitch, Talma Hendler, Malka Gorfine, Ariel Jaffe

AI summary: Proposes SpARCD, which uses distance correlation and spectral filtering to detect differences in brain connectivity between two experimental conditions, obtains region-level significance maps through permutation testing, and outperforms conventional methods under complex dependence structures.

Abstract

Identifying brain regions that exhibit altered functional connectivity across cognitive or emotional states is a key problem in neuroscience. Existing methods, such as edge-wise testing, seed-based psychophysiological interaction (PPI) analysis, or correlation network comparison, typically suffer from low statistical power, arbitrary thresholding, and limited ability to capture distributed or nonlinear dependence patterns. We propose SpARCD (Spectral Analysis of Revealing Connectivity Differences), a novel statistical framework for detecting differences in brain connectivity between two experimental conditions. SpARCD leverages distance correlation, a dependence measure sensitive to both linear and nonlinear associations, to construct a weighted graph for each condition. It then constructs a differential operator via spectral filtering and uncovers connectivity changes by computing its leading eigenvectors. Inference is achieved via a permutation-based testing scheme that yields interpretable, region-level significance maps. Extensive simulation studies demonstrate that SpARCD achieves superior power relative to conventional edge-wise or univariate approaches, particularly in the presence of complex dependency structures. Application to fMRI data from 113 early PTSD patients performing an emotional face-matching task reveals distinct networks associated with emotional reactivity and regulatory processes. Overall, SpARCD provides a statistically rigorous and computationally efficient framework for comparing high-dimensional connectivity structures, with broad applicability to neuroimaging and other network-based scientific domains.
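SpARCD builds its graphs from distance correlation, which is easy to compute directly. The sketch below implements the V-statistic sample version for scalar variables. It shows that distance correlation detects y = x² with x symmetric about zero, a case where Pearson correlation is zero in the population because cov(x, x²) = E[x³] = 0. The simulated data are assumptions. SpARCD's graph construction, spectral filtering and permutation test are not shown.

```python
import math
import random


def double_centered_distances(values):
    """|a_i - a_j| distance matrix, double-centred by row, column and grand means."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(r) / n for r in d]  # equal to column means by symmetry
    grand = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) for two scalar samples."""
    n = len(x)
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    if dvar_x == 0.0 or dvar_y == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


rng = random.Random(7)
n = 300
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y_nonlinear = [xi ** 2 + 0.05 * rng.gauss(0.0, 1.0) for xi in x]
y_independent = [rng.uniform(-1.0, 1.0) for _ in range(n)]
dcor_nonlinear = distance_correlation(x, y_nonlinear)
dcor_independent = distance_correlation(x, y_independent)
print(f"dCor(x, x^2 + noise) = {dcor_nonlinear:.3f}, dCor(x, independent) = {dcor_independent:.3f}")
```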

2602.02753 2026-06-09 stat.ME math.ST stat.TH Version update

Effect-Wise Inference for Smoothing Spline ANOVA on Tensor-Product Sobolev Space

Youngjin Cho, Meimei Liu

AI summary: Addressing effect-wise inference in multivariate nonparametric functional ANOVA, proposes a unified smoothing spline ANOVA framework on a subspace of tensor-product Sobolev space that provides convergence rates, pointwise confidence intervals and Wald tests for each effect function; main effects attain the optimal univariate rates, and interactions attain optimal rates up to logarithmic factors.

Abstract

Functional ANOVA provides a nonparametric modeling framework for multivariate covariates, enabling flexible estimation and interpretation of effect functions such as main effects and interaction effects. However, effect-wise inference in such models remains challenging. Existing methods focus primarily on inference for entire functions rather than individual effects. Methods addressing effect-wise inference face substantial limitations: the inability to accommodate interactions, a lack of rigorous theoretical foundations, or restriction to pointwise inference. To address these limitations, we develop a unified framework for effect-wise inference in smoothing spline ANOVA on a subspace of tensor product Sobolev space. For each effect function, we establish rates of convergence, pointwise confidence intervals, and a Wald-type test for whether the effect is zero, with power achieving the minimax distinguishable rate up to a logarithmic factor. Main effects achieve the optimal univariate rates, and interactions achieve optimal rates up to logarithmic factors. The theoretical foundation relies on an orthogonality decomposition of effect subspaces, which enables the extension of the functional Bahadur representation framework to effect-wise inference in smoothing spline ANOVA with interactions. Simulation studies and real-data application to the Colorado temperature dataset demonstrate superior performance compared to existing methods.

2501.04615 2026-06-09 stat.ME math.ST stat.TH Version update

Doubly Robust and Efficient Calibration of Prediction Sets for Right-Censored Time-to-Event Outcomes

Rebecca Farina, Eric J. Tchetgen Tchetgen, Arun Kumar Kuchibhotla

AI summary: For right-censored survival data, proposes lower prediction bounds based on inverse-probability-of-censoring weighting and on the semiparametric efficient influence function, achieving doubly robust and asymptotically efficient calibration of prediction sets.

Comments
48 pages, 11 figures

Abstract

Our objective is to construct well-calibrated prediction sets for a time-to-event outcome subject to right-censoring with guaranteed coverage. Inspired by modern conformal inference, our approach avoids the need for a well-specified parametric or semiparametric survival model. Unlike existing conformal methods for survival data, which assume Type-I censoring with fully observed censoring times, we consider the more common right-censoring setting in which only the censoring time or only the event time is observed, whichever comes first. Under a standard conditional independence censoring condition, we propose and analyze several lower prediction bounds for the survival time of a future observation, including inverse-probability-of-censoring weighting, and its augmented version based on the semiparametric efficient influence function for the relevant marginal quantile of the outcome accounting for dependent censoring. We formally establish asymptotic coverage guarantees of the proposed methods, and demonstrate, both theoretically and through empirical experiments, that the augmented approach substantially improves efficiency over all other proposed methods. Specifically, its coverage error bound is doubly robust, and therefore of second order, thus ensuring that it is asymptotically negligible relative to the coverage error of the other methods.

2512.10250 2026-06-09 stat.ME stat.AP stat.CO Version update

Time-Averaged Drift Approximations are Inconsistent for Inference in Drift Diffusion Models

Sicheng Liu, Alexander Fengler, Michael J. Frank, Matthew T. Harrison

AI summary: Proves that using the time-averaged drift approximation (TADA) for parameter inference in drift diffusion models yields inconsistent estimates, and uses numerical examples based on the attentional DDM to show that it causes systematic misestimation of attentional effects and can lead to false scientific conclusions.

Comments
37 pages. Includes updates for the first revision

Abstract

Drift diffusion models (DDMs) have found widespread use in computational neuroscience, cognitive science, mathematical psychology as well as other fields. They model evidence accumulation in simple decision tasks as a stochastic process drifting towards decision barriers. In models where the drift is both time-varying within a trial and variable across trials, the high computational cost for accurate likelihood evaluation has often led to the use of a computationally convenient surrogate for parameter inference, the time-averaged drift approximation (TADA). In each trial, TADA assumes that the time-varying drift rate can be replaced by its temporal average throughout the trial. This approach enables fast parameter inference using analytical likelihood formulas for DDMs with constant drift. In this work, we show that such an estimator is inconsistent: it does not converge to the true drift, posing a risk of biasing scientific conclusions when parameter estimates are obtained by TADA and similar approximations. We provide an elementary proof of this inconsistency in what is perhaps the simplest possible setting: a Brownian motion with piecewise constant drift hitting a one-sided upper boundary. Furthermore, numerical examples based on an attentional DDM (aDDM) show that using TADA leads to systematic misestimation of attentional effects in decision making and can lead to false conclusions in scientific hypothesis testing.

2512.03983 2026-06-09 stat.ME Version update

Statistical hypothesis testing for differences between layers in dynamic multiplex networks

Maximilian Baum, Francesco Sanna Passino, Axel Gandy

AI summary: To test whether layers (edge types) of a dynamic multiplex network differ in connectivity, proposes a hypothesis testing framework under a latent space network model, with a test statistic built from a spectral embedding of the unfolded adjacency matrices, enabling global testing of differences between layers.

Comments
12 pages, 3 figures

Abstract

With the emergence of dynamic multiplex networks, corresponding to graphs where multiple types of edges evolve over time, a key inferential task is to determine whether the layers associated with different edge types differ in their connectivity. In this work, we introduce a hypothesis testing framework, under a latent space network model, for assessing whether the layers share a common latent representation. The method we propose extends previous literature related to the problem of pairwise testing for random graphs and enables global testing of differences between layers in multiplex graphs. While we introduce the method as a test for differences between layers, it can easily be adapted to test for differences between time points. We construct a test statistic based on a spectral embedding of an unfolded representation of the graph adjacency matrices and demonstrate its ability to detect differences across layers in the asymptotic regime where the number of nodes in each graph tends to infinity. The finite-sample properties of the test are empirically demonstrated by assessing its performance on both simulated data and a biological dataset describing the neural activity of larval Drosophila.

2507.12843 2026-06-09 cs.LG stat.ML Version update

Are Two Datasets Close Enough With Statistical Significance? A Kernel Distributional Closeness Testing Approach

Zhijian Zhou, Liuhua Peng, Xunye Tian, Mingming Gong, Feng Liu

AI summary: To overcome the limitations of distribution closeness testing (DCT) on complex data, proposes NAMMD, a norm-adaptive refinement of the kernel maximum mean discrepancy (MMD), and builds NAMMD-based DCT, which improves test power while keeping the type-I error bounded.

Abstract

Are two distributions close to each other with statistical significance? Distribution closeness testing (DCT) formalizes this question by testing whether the distance between a distribution pair is at least epsilon-far. Existing DCT methods mainly measure discrepancies between distribution pairs defined on discrete spaces, for example using total variation, which limits their application to complex data such as images. To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measure of distributional discrepancy between complex distributions, into DCT scenarios. However, empirical results indicate that many distribution pairs can have the same MMD value despite having different norms in the same reproducing kernel Hilbert space (RKHS). These pairs may exhibit different finite-sample distinguishability and reflect different practical closeness levels, making MMD less informative for DCT. To mitigate this issue, we design a new measure of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales the MMD value using the RKHS norms of distributions. Based on the asymptotic distribution of NAMMD, we propose NAMMD-based DCT to assess the closeness level of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power than MMD-based DCT while maintaining bounded type-I error. This is further validated by extensive experiments on multiple types of data, including synthetic noise and real images. Our code is available at https://github.com/zhijianzhouml/NAMMD.

2505.01359 2026-06-09 stat.ME Version update

Dual system estimation using mixed effects loglinear models

Ceejay Hammond, Paul A. Smith, Peter G. M. van der Heijden

AI summary: To curb the number of parameters in dual system estimation when the stratifying variable has many levels, studies mixed effects loglinear models, finds in simulations a small reduction in mean squared error relative to fixed effects models, and extends the approach to multiple system estimation.

Abstract

In official statistics, dual system estimation (DSE) is a well-known tool to estimate the size of a population. Two sources are linked, and the number of units that are missed by both sources is estimated. Often dual system estimation is carried out in each of the levels of a stratifying variable, such as region. DSE can be considered a loglinear independence model, and, with a stratifying variable, a loglinear conditional independence model. The standard approach is to estimate parameters for each level of the stratifying variable. Thus, when the number of levels of the stratifying variable is large, the number of parameters estimated is large as well. Mixed effects loglinear models, where sets of parameters involving the stratifying variable are replaced by a distribution parameterised by its mean and a variance, have also been proposed, and we investigate their properties through simulation. In our simulation studies the mixed effects loglinear model outperforms the fixed effects loglinear model although only to a small extent in terms of mean squared error. We show how mixed effects dual system estimation can be extended to multiple system estimation.
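The standard per-stratum estimate mentioned above has a closed form. Under the loglinear independence model, the count missed by both sources is n10·n01/n11, which yields the Lincoln-Petersen total n1·n2/n11. The sketch below sums these fixed-effects estimates over strata using hypothetical counts. The mixed-effects model, which replaces the per-stratum parameters with a distribution, is not shown.

```python
def dual_system_estimate(n11, n10, n01):
    """Population size under the loglinear independence model for one stratum.

    n11: units in both sources; n10: only in source 1; n01: only in source 2.
    The estimated count missed by both sources is n10 * n01 / n11.
    """
    if n11 == 0:
        raise ValueError("no linked units: the independence estimate is undefined")
    n00_hat = n10 * n01 / n11
    return n11 + n10 + n01 + n00_hat


def stratified_dse(strata):
    """Fixed-effects approach: one independent estimate per stratum, then summed."""
    return sum(dual_system_estimate(*counts) for counts in strata)


# Hypothetical counts (n11, n10, n01) for three regions.
strata = [(40, 60, 40), (10, 5, 20), (25, 25, 25)]
print(stratified_dse(strata))  # 345.0
```

The first stratum reproduces the familiar Lincoln-Petersen arithmetic: n1 = 100, n2 = 80 and 40 linked units give 100 · 80 / 40 = 200.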

2311.02858 2026-06-09 math.ST stat.TH

Estimation of the rate parameter of the probability distribution on the regression setup

Jiwoong Kim

AI summary: For estimating regression parameters when the rate parameter of the exponential distribution depends on predictors, proposes a new estimator and discusses its asymptotic properties.

Abstract

When the rate parameter of the exponential distribution is associated with predictors, then the main interest will be how to estimate the regression parameter. In this paper, we will investigate how to estimate the parameter on the regression setup of the exponential distribution. To that end, we propose a new estimator, and its asymptotic properties will be discussed.
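The sketch below is a baseline for comparison, not the new estimator proposed in the paper. It computes the standard maximum likelihood estimate when the exponential rate is log-linear in a single predictor, λ_i = exp(β x_i), using Newton-Raphson. The log-likelihood Σ_i [β x_i − exp(β x_i) y_i] is concave in β. The no-intercept model, the predictor range and the simulated data are assumptions.

```python
import math
import random


def exp_regression_mle(x, y, beta=0.0, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for y_i ~ Exponential(rate = exp(beta * x_i)).

    Log-likelihood: sum_i [beta * x_i - exp(beta * x_i) * y_i], concave in beta.
    """
    for _ in range(max_iter):
        rates = [math.exp(beta * xi) for xi in x]
        score = sum(xi * (1.0 - r * yi) for xi, r, yi in zip(x, rates, y))
        information = sum(xi * xi * r * yi for xi, r, yi in zip(x, rates, y))
        step = score / information  # Newton step: score / observed information
        beta += step
        if abs(step) < tol:
            break
    return beta


rng = random.Random(3)
true_beta = 0.5
x = [1.0 + rng.random() for _ in range(5000)]               # assumed predictor on [1, 2)
y = [rng.expovariate(math.exp(true_beta * xi)) for xi in x]  # rate depends on x
beta_hat = exp_regression_mle(x, y)
print(f"beta_hat = {beta_hat:.3f} (true {true_beta})")
```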

2406.09195 2026-06-09 stat.ME math.ST physics.data-an stat.CO stat.TH

On the statistical analysis of grouped data: when Pearson $\chi^2$ and other divisible statistics are not goodness-of-fit tests

Sara Algeri, Estate V. Khmaladze

AI summary: Examines the limitations of divisible statistics such as Pearson's χ² in the statistical analysis of grouped data, proposes a unifying approach, and constructs distribution-free goodness-of-fit tests.

Abstract

Each year, thousands of experiments are analyzed and papers are published that involve the statistical analysis of grouped data. While this area of statistics is often perceived -- somewhat naively -- as saturated, several misconceptions still affect everyday practice, and new frontiers have so far remained unexplored. Researchers must be aware of the limitations affecting their analyses and of the new possibilities at hand. The article introduces a unifying approach to the analysis of divisible statistics -- that includes Pearson's $\chi^2$, the likelihood ratio, and spectral statistics, as special cases -- when a statistician deals with a large number of bins/groups, thus leading to a large number of small or moderate frequencies. Performance of the tests is analyzed against the class of contiguous (local) alternatives. Perhaps the most surprising result here is that, in this 'sparse' regime, most of the tests proposed in the literature can be modified to produce more powerful tests, and no single test based on a divisible statistic leads to a goodness-of-fit test. Distribution-free goodness-of-fit tests are also constructed.
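Divisible statistics can be written as sums over bins of a function of the observed count ν_j and the expected count n·p_j. The sketch below assumes that form and shows Pearson's χ² and the likelihood-ratio statistic G² as two choices of the per-bin function. It does not reproduce the paper's sparse-regime asymptotics or its distribution-free tests.

```python
import math


def divisible_statistic(observed, probs, g):
    """Sum over bins of g(nu_j, n * p_j), the common form of divisible statistics."""
    n = sum(observed)
    return sum(g(nu, n * p) for nu, p in zip(observed, probs))


def pearson_term(nu, expected):
    return (nu - expected) ** 2 / expected


def likelihood_ratio_term(nu, expected):
    # Empty bins contribute 0, using the convention 0 * log 0 = 0.
    return 2.0 * nu * math.log(nu / expected) if nu > 0 else 0.0


observed = [10, 20, 30]
probs = [1 / 3, 1 / 3, 1 / 3]
chi2 = divisible_statistic(observed, probs, pearson_term)
g2 = divisible_statistic(observed, probs, likelihood_ratio_term)
print(f"Pearson chi^2 = {chi2:.3f}, G^2 = {g2:.3f}")
```

With many bins and small expected counts, the sparse regime discussed above, the usual χ² reference distribution for these sums is no longer a reliable guide.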

2. Bayesian Statistics and Probabilistic Modeling: 9 papers

2606.09664 2026-06-09 cs.LG stat.ML New submission

In-Context Learning for Latent Space Bayesian Optimization

Tuan A. Vu, Harri Lähdesmäki, Julien Martinelli

Affiliation: Aalto University

AI summary: To address the mismatch between in-context learning surrogates and the optimization tasks arising in latent-space Bayesian optimization, proposes continued pretraining on synthetic optimization tasks defined on a molecular VAE's latent space, with a regularizer that preserves the original prior, improving molecular optimization performance.

Abstract

Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art regression performance and are increasingly used as BO surrogates. Because their Bayesian behavior is induced by large synthetic pretraining collections, the composition of this pretraining distribution is crucial. LSBO creates a distinctive mismatch: the induced map from latent code to objective value differs markedly from the regression tasks used to train current in-context models. We address this mismatch by complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE. The continued-pretraining objective features a regularizer that anchors the model to the original checkpoint, preserving its broad regression prior while avoiding overspecialization to the adaptation tasks. On held-out molecular optimization benchmarks, the resulting model achieves strong performance, supporting the relevance of LSBO-specific adaptation for in-context surrogates.

2606.09307 2026-06-09 stat.ME stat.AP stat.CO New submission

Robust high-dimensional Bayesian regression with non-Gaussian errors under global--local shrinkage priors

Mohammad Arashi

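AI summary: For non-Gaussian errors in high-dimensional multivariate regression, the paper proposes a robust Bayesian framework based on a scale-location mixture error distribution and horseshoe+ priors. The framework achieves joint sparsity in the regression coefficients and the residual precision matrix. Posterior contraction and selection consistency are proved, and simulations and empirical studies show gains over Gaussian methods.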

Comments
21 pages, 9 figures, 6 tables

Abstract

Multivariate regression with many correlated responses and predictors commonly violates Gaussian error assumptions due to heavy tails, outliers, and asymmetry. Gaussian procedures then lose efficiency in coefficient estimation and produce biased estimates of conditional dependence graphs. We develop a robust Bayesian framework using a scale-location mixture error distribution and horseshoe+ global-local priors on both the regression coefficients and off-diagonals of the error precision matrix, coupling sparsity in the regression map with sparsity in the residual dependence structure. Theoretical contributions include joint posterior contraction, selection consistency for both supports, a Kullback-Leibler risk bound showing the dominance of horseshoe+ over horseshoe, and bounded sensitivity, ensuring that a single large outlier has vanishing influence under t errors. Simulations across four error regimes, contamination, and varying dimensions show that our estimator matches Gaussian procedures under normality and dominates them under heavy tails and skewness. Applications to FRED-MD macroeconomic data and S&P 500 daily returns recover interpretable sparse coefficient maps and residual dependence graphs while automatically down-weighting crisis-period observations.
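The horseshoe+ prior adds one more half-Cauchy layer to the horseshoe. The minimal sketch below draws coefficients from both priors. It assumes the hierarchy of Bhadra et al. (2017) with a fixed global scale `tau`. It illustrates only the prior, not the paper's joint model for the coefficients and the error precision matrix.

```python
import numpy as np


def sample_horseshoe_plus(p: int, tau: float, rng: np.random.Generator) -> np.ndarray:
    """Draw p coefficients from the horseshoe+ prior:
    eta_j ~ C+(0, 1), lambda_j ~ C+(0, tau * eta_j), beta_j ~ N(0, lambda_j^2)."""
    eta = np.abs(rng.standard_cauchy(p))                # half-Cauchy(0, 1)
    lam = tau * eta * np.abs(rng.standard_cauchy(p))    # half-Cauchy(0, tau * eta_j)
    return rng.normal(0.0, lam)


def sample_horseshoe(p: int, tau: float, rng: np.random.Generator) -> np.ndarray:
    """Plain horseshoe for comparison: lambda_j ~ C+(0, tau), beta_j ~ N(0, lambda_j^2)."""
    lam = tau * np.abs(rng.standard_cauchy(p))
    return rng.normal(0.0, lam)


rng = np.random.default_rng(0)
beta_hsp = sample_horseshoe_plus(10_000, tau=0.1, rng=rng)
beta_hs = sample_horseshoe(10_000, tau=0.1, rng=rng)
# Compare how often draws are essentially zero under each prior.
print(np.mean(np.abs(beta_hsp) < 1e-3), np.mean(np.abs(beta_hs) < 1e-3))
```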

2606.08819 2026-06-09 stat.ME New submission

Model Selection for SLOPE Models: A Bayesian Perspective

Fabio Feser, Marina Evangelou

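AI summary: The paper proposes the Bayesian methods BGSLOPE and BSGS, which control the FDR by embedding SLOPE models in a spike-and-slab framework. It also introduces the Two-step Orthogonal (TSO) transformation. The methods outperform cross-validation and other model selection approaches on synthetic and real data.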

Abstract

Sorted $\ell_1$ Penalized Estimation (SLOPE) models, which perform either variable or group selection, control the false discovery rate (FDR) under orthogonal settings with known noise, but such settings are rare in practice. Under general conditions, cross-validation is the default model selection approach for SLOPE, yet it targets predictive performance rather than FDR control. We address this gap for the SLOPE family of models by proposing new Bayesian approaches, Bayesian Group SLOPE (BGSLOPE) and Bayesian Sparse-group SLOPE (BSGS). BGSLOPE and BSGS embed group-based SLOPE models into a spike-and-slab framework, with BSGS providing a continuous spike-and-slab framework for sparse-group models. We further introduce Two-step Orthogonal (TSO), which transforms a general setting into an orthogonal one to recover SLOPE's FDR control properties. Through extensive synthetic and real data studies comparing all major model selection strategies for SLOPE models, the proposed Bayesian models consistently control FDR, achieve higher power, and outperform competing methods in prediction.
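SLOPE penalizes the sorted absolute coefficients with a non-increasing sequence of weights. The minimal sketch below shows the sorted-$\ell_1$ penalty with the Benjamini-Hochberg-type sequence commonly paired with SLOPE for FDR control at level `q`. It covers plain variable-level SLOPE only. The group and Bayesian variants proposed in the paper are not shown.

```python
from statistics import NormalDist
from typing import List, Sequence


def bh_lambdas(p: int, q: float) -> List[float]:
    """BH-type sequence lambda_i = Phi^{-1}(1 - i q / (2p)), i = 1..p (non-increasing)."""
    z = NormalDist()
    return [z.inv_cdf(1 - i * q / (2 * p)) for i in range(1, p + 1)]


def slope_penalty(beta: Sequence[float], lambdas: Sequence[float]) -> float:
    """Sorted-L1 norm: sum_i lambda_i * |beta|_(i), with |beta|_(1) >= |beta|_(2) >= ..."""
    abs_sorted = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * a for lam, a in zip(lambdas, abs_sorted))


lams = bh_lambdas(p=4, q=0.1)
print([round(v, 3) for v in lams])
print(slope_penalty([0.5, -3.0, 0.0, 1.2], lams))
```

The largest coefficient is matched with the largest weight. With all weights equal, the penalty reduces to the ordinary lasso ($\ell_1$) penalty.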

2606.07947 2026-06-09 stat.ME math.ST stat.AP stat.TH New submission

Bayesian Global Fréchet Regression via Weak Conditional Expectations

Simon Fontaine, Bing Li, Lingzhou Xue

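AI summary: The paper proposes a Bayesian Fréchet regression framework that reduces object-valued regression to a collection of scalar regression tasks. It allows controlled interpolation between the prior and the data-driven estimate. Validity under moment conditions is established via weak conditional expectations, and the method improves predictive performance on microbiome data.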

Comments
34 pages, 4 figures

Abstract

Fréchet regression provides a versatile framework for modeling metric-space-valued responses with Euclidean predictors, yet current methodologies rely almost exclusively on frequentist approaches. We propose a Bayesian framework for Fréchet regression that offers a principled way of incorporating prior information into nonlinear global Fréchet regression. By targeting a novel Fréchet Bayes rule, we reduce the object-valued regression problem to a collection of tractable scalar regression tasks. Our approach allows for a controlled interpolation between the prior and the data-driven frequentist estimate, facilitating effective shrinkage toward informed values. While initially derived under Gaussian assumptions, we demonstrate that our framework is robust to model misspecification by establishing its validity under moment conditions via weak conditional expectations. The numerical properties of the proposed methodology are demonstrated in simulation studies and an application to microbiome compositional data, where we show that leveraging an auxiliary cohort to inform the prior significantly enhances predictive performance in a targeted, small-scale study.
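The sketch below shows the standard frequentist global Fréchet regression weights of Petersen and Müller (2019), which the abstract takes as its starting point. It does not implement the paper's Bayesian method. It assumes Euclidean responses, where the weighted Fréchet mean is simply a weighted average. Compositional or other non-Euclidean responses would need a space-specific minimizer, which is not shown.

```python
import numpy as np


def global_frechet_weights(X: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Empirical weights s_i(x) = 1 + (X_i - Xbar)^T Sigma^{-1} (x - Xbar)."""
    Xbar = X.mean(axis=0)
    Xc = X - Xbar
    Sigma = Xc.T @ Xc / X.shape[0]            # covariance with 1/n scaling
    return 1.0 + Xc @ np.linalg.solve(Sigma, x - Xbar)


def frechet_predict_euclidean(X: np.ndarray, Y: np.ndarray, x: np.ndarray) -> float:
    """With Euclidean responses, argmin_w sum_i s_i(x) (Y_i - w)^2 is the s-weighted mean."""
    s = global_frechet_weights(X, x)
    return float(s @ Y / s.sum())


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
Y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=50)
x_new = np.array([0.3, -0.7])
print(frechet_predict_euclidean(X, Y, x_new))
```

For real-valued responses, this prediction coincides with the ordinary least-squares fit at `x_new`. That is why global Fréchet regression is viewed as a generalization of linear regression.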

2606.07677 2026-06-09 stat.ML cs.LG stat.AP stat.ME New submission

Disentangling Latent Risk Pathways via Bayesian Hypergraph Inference

Shengxian Ding, Haonan Gao, Pangpang Liu, Xinyuan Tian, Yize Zhao

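AI summary: The paper proposes a Bayesian hypergraph inference framework that models multiple diseases through latent, risk-factor-modulated disease pathways. It yields interpretable higher-order structure, calibrated uncertainty estimates, and improved estimation for rare diseases.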

Comments
ICML 2026 Oral

Abstract

Electronic health records (EHR) pose large-scale multi-disease modeling problems in which many outcomes are rare and strongly influenced by shared risk factors. While modern approaches achieve strong predictive performance, they often treat diseases independently or rely on black-box architectures, offering limited insight into how risk factors organize disease risk and little principled uncertainty quantification. We introduce a Bayesian hypergraph inference framework that reframes multi-disease modeling around latent, risk-factor-modulated disease pathways. Risk factors act on hyperedges, latent disease subsets with shared risk patterns, allowing diseases to participate in multiple distinct pathways and enabling interpretable, higher-order structure beyond pairwise associations. A repulsion prior encourages parsimonious and identifiable structure, while posterior inference provides calibrated uncertainty over both disease groupings and risk-factor influence. To enable scalable inference on large EHR datasets, we develop a structured variational inference algorithm that preserves logical dependencies among hyperedge existence, disease membership, and pathway-level effects. Experiments on simulated data and UK Biobank demonstrate stable and interpretable disease pathway structure, well-calibrated uncertainty, improved estimation for rare diseases, and competitive predictive performance.

2606.02351 2026-06-09 cs.LG stat.ML Updated version

Local Preferential Bayesian Optimization

Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger, Sebastian Trimpe

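AI summary: To address the inefficiency of preferential Bayesian optimization on high-dimensional problems, the paper proposes local PBO methods based on trust regions and derivative information. These methods substantially reduce cumulative regret.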

Abstract

Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but requires the formulation of an explicit objective function. Preferential BO (PBO) removes this requirement by learning from pairwise human feedback, yet existing methods struggle to efficiently optimize beyond low- and medium-dimensional problems due to their global search approaches. We address this limitation by developing a family of local PBO methods that transfer key ideas from high-dimensional BO to the preferential setting. In particular, we introduce local PBO methods which adapt trust-region and derivative-informed local search to pairwise preference feedback, where the latter exploits first- and second-order derivatives of the Laplace-approximated GP posterior. Our benchmark on GP sample paths, standard optimization benchmark functions, and policy-search tasks shows that local PBO methods are especially effective in high-dimensional and complex landscapes with steep optima. Compared with global preference-based baselines, they can substantially reduce cumulative regret, making them particularly useful for real-world preference-based optimization tasks such as policy search.
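The abstract describes adapting trust-region local search to pairwise preference feedback. As a hedged illustration of the trust-region part only, the sketch below implements the expand/shrink bookkeeping popularized by TuRBO (Eriksson et al., 2019), with default constants that loosely follow TuRBO. Counting "the new query was preferred over the incumbent" as a success is an assumption, not necessarily the paper's rule.

```python
from dataclasses import dataclass


@dataclass
class TrustRegion:
    """Side length of a hypercube trust region, expanded or shrunk based on outcomes."""
    length: float = 0.8
    length_min: float = 0.5 ** 7
    length_max: float = 1.6
    succ_tol: int = 3       # consecutive successes before expanding
    fail_tol: int = 5       # consecutive failures before shrinking
    n_succ: int = 0
    n_fail: int = 0

    def update(self, improved: bool) -> None:
        # Assumption: "improved" means the new point won its pairwise comparison.
        if improved:
            self.n_succ += 1
            self.n_fail = 0
        else:
            self.n_fail += 1
            self.n_succ = 0
        if self.n_succ >= self.succ_tol:
            self.length = min(2.0 * self.length, self.length_max)
            self.n_succ = 0
        elif self.n_fail >= self.fail_tol:
            self.length /= 2.0
            self.n_fail = 0

    @property
    def needs_restart(self) -> bool:
        """Restart the local search once the region has collapsed."""
        return self.length < self.length_min


tr = TrustRegion()
for outcome in [True, True, True, False, False]:
    tr.update(outcome)
print(tr.length, tr.needs_restart)
```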

2603.06023 2026-06-09 math.PR stat.ML Updated version

Large deviation principles for convolutional Bayesian neural networks

Federico Bassetti, Vassili De Palma, Lucia Ladelli

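AI summary: The paper studies large deviation principles for convolutional neural networks in the infinite-channel limit. It establishes an LDP for the conditional covariance matrices and derives an LDP for the posterior distribution, the first such result for CNNs.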

Comments
updated version, simplified notation

Abstract

While suitably scaled CNNs with Gaussian initialization are known to converge to Gaussian processes as the number of channels diverges, little is known beyond this Gaussian limit. We establish a large deviation principle (LDP) for convolutional neural networks in the infinite-channel regime. We consider a broad class of multidimensional CNN architectures characterized by general receptive fields encoded through a patch-extractor function satisfying mild structural assumptions. Our main result establishes an LDP for the sequence of conditional covariance matrices under a Gaussian prior distribution on the weights. We further derive an LDP for the posterior distribution obtained by conditioning on a finite number of observations. In addition, we provide a streamlined proof of the concentration of the conditional covariances and of the Gaussian equivalence of the network. To the best of our knowledge, this is the first LDP established for convolutional neural networks.

2410.20169 2026-06-09 stat.ME Updated version

Bayes-assisted Confidence Regions: Focal Point Estimator and Bounded-influence Priors

Stefano Cortinovis, François Caron

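AI summary: The paper extends the FAB confidence-region framework. It proves that under Gaussian likelihoods the posterior mean lies in the FAB-CR and uses power-law tail conditions to obtain robust FAB-CRs. It then proposes a class of shrinkage priors satisfying these conditions, connecting robust FAB-CRs to Bayesian shrinkage methods.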

Comments
35 pages, 17 figures

Abstract

The Frequentist, Assisted by Bayes (FAB) framework constructs confidence regions that leverage prior information about parameter values. FAB confidence regions (FAB-CRs) have smaller volume for values of the parameter that are likely under the prior while maintaining exact frequentist coverage. This work introduces several methodological and theoretical contributions to the FAB framework. For Gaussian likelihoods, we show that the posterior mean of the mean parameter is contained in the FAB-CR. More generally, this result extends to the posterior mean of the natural parameter for likelihoods in the natural exponential family. These results provide a natural Bayes-assisted estimator to be reported alongside the FAB-CR. Furthermore, for Gaussian likelihoods, we show that power-law tail conditions on the marginal likelihood induce robust FAB-CRs that are uniformly bounded and revert to standard frequentist confidence intervals for extreme observations. We translate this result into practice by proposing a class of shrinkage priors for the FAB framework that satisfy this condition without sacrificing analytic tractability. The resulting FAB estimators equal prominent Bayesian shrinkage estimators, including the horseshoe estimator, thereby establishing insightful connections between robust FAB-CRs and Bayesian shrinkage methods.

2505.10849 2026-06-09 stat.ME econ.EM Updated version

A Tractable Unified Skew-t Distribution and Its Copula for Heterogeneous Asymmetries

Lin Deng, Michael Stanley Smith, Worapree Maneesoonthorn

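AI summary: The paper proposes a new tractable unified skew-t (TrUST) distribution that resolves parameter identification and computational hurdles. Its copula allows heterogeneous asymmetric dependence across variable pairs, and inference is Bayesian. The distribution and its copula outperform the skew-t on electricity price and equity return data.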

Abstract

Multivariate distributions that allow for asymmetry and heavy tails are important building blocks in many econometric and statistical models. The Unified Skew-t (UST) is a promising choice because it is both scalable and allows for a high level of flexibility in the asymmetry in the distribution. However, it suffers from parameter identification and computational hurdles that have to date inhibited its use for modeling data. In this paper we propose a new tractable variant of the unified skew-t (TrUST) distribution that addresses both challenges. Moreover, the copula of this distribution is shown to also be tractable, while allowing for greater heterogeneity in asymmetric dependence over variable pairs than the popular skew-t copula. We show how Bayesian posterior inference for both the distribution and its copula can be computed using an extended likelihood derived from a generative representation of the distribution. The efficacy of this Bayesian method, and the enhanced flexibility of both the TrUST distribution and its implicit copula, is first demonstrated using simulated data. Applications of the TrUST distribution to highly skewed regional Australian electricity prices, and the TrUST copula to intraday U.S. equity returns, demonstrate how our proposed distribution and its copula can provide substantial increases in accuracy over the popular skew-t and its copula in practice.

3. Causal Inference and Experimental Design (19 papers)

2606.09802 2026-06-09 cs.LG cs.AI stat.ML New submission

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

Udvas Das, Waris Radji, Debabrota Basu, Odalric-Ambrym Maillard

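AI summary: The paper studies linear contextual stochastic multi-armed bandits with personalized user preferences and context distributions that drift over time. It proposes Dri-MED, which handles non-stationary heteroskedastic noise via heteroskedastic regression and attains instance-dependent regret and constraint-violation bounds.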

Abstract

We consider a variant of the linear contextual stochastic multi-armed bandit in which the learner must provide recommendations to a group of users, each with a personalized preference vector, while the context distributions drift over time. Under practitioner-friendly assumptions, we reduce this setting to a linear bandit with stationary mean but heteroskedastic and non-stationary noise. We further study the case in which the learner must ensure that the mean reward of each decision exceeds that of a baseline strategy $\boldsymbol{\pi}_0$ at each decision step. We introduce Dri-MED, an algorithm inspired by the linear version of the MED strategy and carefully adapted to handle the non-stationary heteroskedastic noise. We show that the instance-dependent regret scales as $\tilde{\mathcal O}\left(\frac{\kappa}{\tilde{\Delta}}\, d^2 \log T\right)$, where $\tilde{\Delta}$ is the constraint-aware sub-optimality gap subject to the policy $\boldsymbol{\pi}_0$ and $\kappa$ is a variance-aware multiplicative term that we carefully handle using heteroskedastic regression. We further show that Dri-MED incurs $\tilde{\mathcal{O}}(d)$ expected constraint violations. Our numerical results suggest that Dri-MED significantly outperforms conservative baselines that ignore the drift and preference structure.
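Heteroskedastic regression down-weights noisy rounds. The minimal sketch below shows a variance-weighted ridge estimator of the kind used in variance-aware linear bandits. It assumes the per-round noise variances `sigma2` are known. Dri-MED's actual estimator, its variance estimates and its arm-selection rule are not reproduced here.

```python
import numpy as np


def weighted_ridge(X: np.ndarray, r: np.ndarray, sigma2: np.ndarray, lam: float = 1.0):
    """Variance-weighted ridge estimate
    theta = (sum_t x_t x_t^T / sigma_t^2 + lam I)^{-1} sum_t x_t r_t / sigma_t^2.

    Rounds with larger noise variance sigma_t^2 receive proportionally less weight.
    """
    w = 1.0 / np.asarray(sigma2, dtype=float)
    V = X.T @ (w[:, None] * X) + lam * np.eye(X.shape[1])   # weighted Gram matrix
    b = X.T @ (w * r)
    return np.linalg.solve(V, b), V    # V can also define confidence ellipsoids


rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
theta = np.array([0.5, -1.0, 2.0])
sigma2 = np.where(rng.random(n) < 0.5, 0.1, 10.0)   # two hypothetical noise levels
r = X @ theta + rng.normal(0.0, np.sqrt(sigma2))
theta_hat, _ = weighted_ridge(X, r, sigma2)
print(theta_hat.round(3))
```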

2606.09625 2026-06-09 econ.EM stat.ME New submission

A Synthetic Control Approach to Conditional Distributional Treatment Effects

Dominik Wied

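AI summary: The paper proposes a synthetic control framework based on distribution regression models to estimate conditional distributional treatment effects. The counterfactual distribution is estimated with least-squares weights. The paper derives the estimator's asymptotic distribution and proposes a test statistic for the null of no treatment effect.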

Abstract

This paper proposes a synthetic control (SC) framework for the estimation of conditional distributional treatment effects. Identification rests on a parallel trends condition formulated in the parameter space of the semiparametric distribution regression (DR) model, which keeps the counterfactual conditional distribution within the model class. The weights solve a least-squares problem subject to an adding-up constraint, yielding a closed-form estimator. We derive the asymptotic distribution of the counterfactual estimator, with DR estimation error and weight estimation error contributing at the same rate to the asymptotic variance. Moreover, we propose a supremum test for the null of no treatment effect, whose limit is the supremum of a Gaussian process. Simulations illustrate that conditioning on covariates can reveal effects that are difficult to detect from the unconditional distribution alone. An application to the 1992 New Jersey minimum wage increase using CPS data finds effects concentrated in the minimum-wage corridor for low-education, low-experience workers.
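The adding-up constrained least-squares problem has a closed form via a Lagrange multiplier. The minimal sketch below assumes that the quantities being matched are stacked into plain vectors, whereas in the paper they live in the parameter space of the DR model. It imposes no nonnegativity constraint, since the abstract mentions only the adding-up constraint.

```python
import numpy as np


def sc_weights_adding_up(Y0: np.ndarray, y1: np.ndarray) -> np.ndarray:
    """Minimize ||y1 - Y0 w||^2 subject to sum(w) = 1, in closed form.

    Y0: (T, J) matched quantities for J control units (T >= J, full column rank).
    y1: (T,) the same quantities for the treated unit.
    """
    G = Y0.T @ Y0
    ones = np.ones(Y0.shape[1])
    w_ols = np.linalg.solve(G, Y0.T @ y1)      # unconstrained least squares
    g1 = np.linalg.solve(G, ones)              # G^{-1} 1
    # Shift along G^{-1} 1 so that the weights add up to one.
    return w_ols + g1 * (1.0 - ones @ w_ols) / (ones @ g1)


rng = np.random.default_rng(0)
Y0 = rng.normal(size=(10, 3))
w_true = np.array([0.2, 0.5, 0.3])
y1 = Y0 @ w_true
print(sc_weights_adding_up(Y0, y1).round(6))
```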

2606.09283 2026-06-09 nlin.AO stat.AP New submission

Towards personalised intervention: A causal-dynamical framework to determine psychological treatment trajectories

Lourens Waldorp, Titus Mürtz, Anita Jansen, Jonas Haslbeck

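AI summary: Motivated by suboptimal outcomes in mental health treatment, the paper proposes a causal-dynamical framework that constructs causal graphs, estimates causal effects, and simulates intervention strategies. The goal is to tailor treatment focus to the individual patient.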

Abstract

For approximately half of the individuals receiving mental health care, the results are suboptimal, even when treatments align with evidence-based guidelines. These limited effects may partly stem from how clinical decisions on treatment focus are made in mental health care. Typically, treatment strategy is guided by the diagnostic classification combined with the individualized case conceptualization. While standard, this approach may fall short for several reasons such as biases on the part of both the patient and therapist, and treatment guidelines being based on average effects that may not (exactly) suit the individual patient. To address these challenges, we propose a novel framework that reduces biases in clinical decision-making and makes it genuinely possible to tailor treatment focus to the individual patient. This framework involves (a) constructing causal graphs and estimating causal effects from intensively collected, longitudinal patient data, (b) simulating new time series based upon the causal relationships, and (c) using these simulations to identify the most effective treatment focus for the individual patient. By simulating and comparing different intervention strategies and examining both the estimated individual's responsiveness and its long-term effectiveness, this approach may generate useful insights to guide treatment focus and strategy, which can lead to a significant improvement of treatment outcomes in mental health care.
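Steps (b) and (c), simulating time series from estimated causal relations and comparing interventions, can be illustrated with a deliberately simple model. In this sketch, a first-order vector autoregression stands in for the estimated causal-dynamical model. "Clamping" one symptom to zero stands in for a treatment focus. The coefficient matrix is hypothetical, not estimated from patient data.

```python
import numpy as np


def simulate_var(A, c, x0, steps, clamp=None, noise_sd=0.0, rng=None):
    """Simulate x_{t+1} = A x_t + c + eps_t.

    clamp=(node, value) fixes one variable after every step, a simple stand-in
    for an intervention that holds one symptom at a target level.
    """
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x0, dtype=float).copy()
    path = [x.copy()]
    for _ in range(steps):
        eps = rng.normal(0.0, noise_sd, size=x.shape) if noise_sd > 0 else 0.0
        x = A @ x + c + eps
        if clamp is not None:
            node, value = clamp
            x[node] = value
        path.append(x.copy())
    return np.array(path)


# Hypothetical 3-symptom network: 0 -> 1, and both 0 and 1 -> 2 (the target symptom).
A = np.array([[0.3, 0.0, 0.0],
              [0.5, 0.3, 0.0],
              [0.2, 0.4, 0.3]])
c = np.ones(3)
x0 = np.zeros(3)
baseline = simulate_var(A, c, x0, 200)[-1, 2]
target_0 = simulate_var(A, c, x0, 200, clamp=(0, 0.0))[-1, 2]
target_1 = simulate_var(A, c, x0, 200, clamp=(1, 0.0))[-1, 2]
print(round(baseline, 3), round(target_0, 3), round(target_1, 3))
```

Comparing the long-run level of the target symptom under each clamp is a toy version of ranking candidate treatment foci for one patient.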

2606.08941 2026-06-09 stat.ML cs.LG New submission

Estimate Collapsibility of Causal Effects in Completed Partial DAGs via Strong d-Convex Hulls

Yuxin Deng, Yi Sun, Zhiming Li, Huaxiong Liu

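AI summary: The paper proposes a collapsible method for causal effect estimation that preserves estimator consistency in completed partially directed acyclic graphs. It characterizes the minimal collapsible sets as strong d-convex hulls and designs an efficient algorithm that is combined with the IDA framework.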

Abstract

This paper proposes a collapsible method for estimating causal effects that maintains the estimator's consistency before and after marginalization over some variables in completed partially directed acyclic graphs (CPDAGs). We first introduce estimate collapsibility for CPDAGs and characterize the minimal collapsible sets as strong d-convex hulls. An efficient algorithm is devised to obtain such sets in DAGs and is then generalized to CPDAGs. Next, we combine the graph reduction procedure with the IDA framework. Finally, experiments and empirical analysis show the effectiveness of collapsibility for causal estimation in CPDAGs. Code is available at https://github.com/Jamyang-D/strongly-convex.

2606.08923 2026-06-09 stat.AP New submission

Scalable Network-Aware Experiment Design for Two-Sided Marketplaces

Yi Su, Zhen Yan

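AI summary: To address treatment interference in two-sided marketplaces, the paper proposes the EgoCluster V3 and MultiEgoCluster clustering algorithms. They reduce spillover by 3x and by a further ~56%, respectively, while improving statistical power. Both are deployed at LinkedIn.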

Abstract

Measuring causal effects in networked two-sided marketplaces is challenging due to treatment interference between market participants on different sides. When treatment is applied to one side (e.g., job seekers), their interactions with the other side (e.g., job posters) introduce spillover effects that violate the Stable Unit Treatment Value Assumption (SUTVA) and bias causal estimates. While cluster-based randomization mitigates this problem, prior approaches struggle with a fundamental trade-off: reducing spillover requires isolated clusters that will reduce the number of qualifying clusters, which decreases statistical power. This paper introduces EgoCluster V3, an iterative clustering algorithm that reduces spillover by 3x compared to prior versions while preserving node coverage and doubling test power. We further introduce MultiEgoCluster, which extends V3 through a two-stage procedure that first groups highly connected egos into multi-ego clusters before applying the iterative clustering algorithm. This achieves an additional ~56% spillover reduction and ~38% increase in sample size. Both methods are deployed in production at LinkedIn and have systematically enabled high-impact two-sided marketplace experiments. Since residual bias cannot be fully eliminated through clustering alone, we derive a theoretical bias correction method for average treatment effect (ATE) estimation based on graph structure and propose an approach to generalize results to the general population.

2606.08853 2026-06-09 econ.EM stat.ME New submission

AI-Assisted Variance Reduction in Randomized Experiments

David Arbour, Eli Ben-Michael, Avi Feller, Apoorva Lal, Lo-Hua Yuan

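AI summary: The paper proposes including AI predictions as covariates in standard regression adjustment to reduce variance in randomized experiments. The approach has a "do no harm" property, and its efficiency gains are verified in simulations and three empirical applications.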

Comments
camera ready for KDD 2026

Abstract

Generative AI and large language models can produce realistic predictions of human behavior from rich, unstructured inputs with little to no task-specific training data. Recent work uses these "digital twin" predictions to supplement human responses in surveys and experiments. We study the special case of using AI-generated predictions to reduce variance in randomized experiments. We argue that doing so requires no new estimators and that researchers can simply include AI predictions as covariates in standard regression adjustment, analogous to adjusting for a prognostic score. A benefit of this approach is a "do no harm" property whereby the adjusted estimator reverts to the unadjusted difference in means when predictions are uninformative. Other methods, such as variants of prediction-powered inference, do not have this guarantee. We provide implementation guidance, including how to obtain continuous scores from discrete LLM outputs and how to use LLMs to featurize unstructured inputs as auxiliary covariates. We demonstrate these ideas in simulations and three empirical applications: a survey mega-study, an email marketing A/B test, and a large-scale technology platform experiment. Overall, efficiency gains are real if modest, with greater benefits in studies that contain substantial text and other unstructured data. We also confirm the do no harm property empirically. Given these gains and limited costs, we recommend adjusting for AI-generated predictions as a regular empirical practice.
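The minimal sketch below illustrates the recipe. It assumes the fully interacted regression adjustment of Lin (2013) as the "standard regression adjustment". `ai_pred` stands in for a hypothetical AI-generated prediction of each unit's outcome, and the simulated data are purely illustrative.

```python
import numpy as np


def lin_adjusted_ate(y: np.ndarray, t: np.ndarray, x: np.ndarray) -> float:
    """Regress y on [1, t, xc, t * xc] with xc = x - mean(x); the coefficient on t
    is the regression-adjusted estimate of the average treatment effect."""
    xc = x - x.mean()
    D = np.column_stack([np.ones_like(y), t, xc, t * xc])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(coef[1])


def diff_in_means(y: np.ndarray, t: np.ndarray) -> float:
    """Unadjusted estimator for comparison."""
    return float(y[t == 1].mean() - y[t == 0].mean())


rng = np.random.default_rng(0)
n = 2000
ai_pred = rng.normal(size=n)                     # hypothetical AI prediction of the outcome
t = rng.integers(0, 2, size=n).astype(float)     # randomized treatment assignment
y = 1.0 + 0.5 * t + 2.0 * ai_pred + rng.normal(size=n)
print(diff_in_means(y, t), lin_adjusted_ate(y, t, ai_pred))
```

When `ai_pred` tracks the outcome, the adjusted estimate is typically much less variable across replications than the difference in means. No new estimator is needed, only an extra covariate.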

2606.08305 2026-06-09 stat.ML cs.LG New submission

MEC-Cox: Machine-Learning-Assisted Generalized Entropy Calibration for ATT Marginal Hazard-Ratio Estimation

Se Yoon Lee, Yonghyun Kwon, Jae Kwang Kim

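AI summary: The paper proposes MEC-Cox, which combines machine-learning-assisted generalized entropy calibration with inverse-probability-weighted Cox regression to estimate an ATT-type marginal hazard ratio. Calibrating on prognostic scores reduces bias and improves efficiency.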

Abstract

Externally controlled survival trials are increasingly used when concurrent randomized controls are infeasible, particularly in oncology and rare-disease settings with time-to-event endpoints. We target an average-treatment-effect-on-the-treated (ATT)-type marginal hazard-ratio estimand, comparing treatment with counterfactual control in the treated trial population, and estimate it using inverse-probability-weighted (IPW) Cox regression. Valid inference is challenging because IPW Cox regression depends on the weights through both event contributions and risk-set averages, making flexible machine-learning nuisance estimation difficult to incorporate directly. Building on machine-learning-assisted generalized entropy calibration (MEC) by Lee and Kim (2026), we propose MEC-Cox for ATT-weighted IPW Cox regression. The method begins with normalized source-propensity-score odds weights for external controls and then applies Bregman calibration to balance cross-fitted prognostic summaries between external controls and treated trial patients. The calibration basis may include control-survival predictions, Cox linear predictors, penalized-survival-model predictions, or other prognostic-score summaries. MEC-updated weights therefore play a dual role as source-transport and prognostic-score balancing weights. We establish consistency, characterize a calibration-induced efficiency gain, and develop a stacked sandwich variance estimator. Simulations show that MEC-Cox can reduce bias, increase efficiency, and improve coverage through flexible machine-learning-assisted adjustment.
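Calibration reweights units so that weighted covariate summaries match a target. As an assumption, the sketch below uses the Kullback-Leibler member of the generalized-entropy family (exponential tilting, as in entropy balancing) and solves the dual problem by Newton's method. MEC-Cox allows more general Bregman divergences and embeds the weights in an IPW Cox regression, neither of which is shown. Here `d` plays the role of base weights and the columns of `X` that of prognostic summaries.

```python
import numpy as np


def entropy_calibrate(d: np.ndarray, X: np.ndarray, target: np.ndarray,
                      iters: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Find weights p_i proportional to d_i * exp(lambda^T x_i), summing to one,
    whose weighted mean of X equals target (Newton's method on the convex dual)."""
    Z = X - target
    lam = np.zeros(X.shape[1])
    for _ in range(iters):
        a = Z @ lam
        p = d * np.exp(a - a.max())        # tilted weights, stabilized against overflow
        p /= p.sum()
        grad = Z.T @ p                     # weighted mean of X minus target
        if np.max(np.abs(grad)) < tol:
            break
        H = (Z * p[:, None]).T @ Z - np.outer(grad, grad)   # weighted covariance
        lam -= np.linalg.solve(H, grad)
    return p


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                     # e.g. prognostic scores of external controls
d = np.full(500, 1 / 500)                         # base weights (uniform here for simplicity)
target = X.mean(axis=0) + np.array([0.1, -0.1])   # e.g. hypothetical treated-arm means
w = entropy_calibrate(d, X, target)
print((w @ X).round(6), target.round(6))
```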

2606.08196 2026-06-09 stat.ML cs.AI cs.LG stat.ME New submission

Beyond Additivity: Causal Discovery in Location-Scale Noise Models with Hidden Variables

Mariyam Khan, Shohei Shimizu, Thong Pham

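AI summary: The paper studies causal discovery with hidden variables when the data-generating process follows a location-scale noise model (LSNM). It proves that acyclic directed mixed graphs (ADMGs) satisfying a bow-free condition are identifiable. It also proposes the two-stage algorithm LSNM-UV, which outperforms additive baselines on heteroscedastic data.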

Comments
33 pages, 4 figures

Abstract

We study causal discovery from observational data when some variables are hidden and the data-generating process follows a location-scale noise model (LSNM). Existing methods that handle hidden confounders typically assume additive noise, but in practice, causes often modulate not just the mean but also the variance of their effects. We prove that acyclic directed mixed graphs (ADMGs) satisfying a bow-free condition are identifiable under LSNM with hidden variables, establishing the first identifiability result for causally insufficient models beyond noise additivity. We further provide sufficient conditions for identifying causal direction even when the bow-free assumption is violated. Our two-stage algorithm, LSNM-UV, is sound and complete, and experiments demonstrate improved performance over additive baselines on heteroscedastic data.
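In a location-scale noise model, a cause shifts both the mean and the spread of its effect. The minimal data-generating sketch below uses a single observed cause and no hidden variables. The choices `f(x) = sin(x)` and `g(x) = 0.5 + x^2` are arbitrary illustrations, not taken from the paper.

```python
import numpy as np


def sample_lsnm(n: int, rng: np.random.Generator):
    """Draw (X, Y) from a toy LSNM Y = f(X) + g(X) * N,
    with f(x) = sin(x), g(x) = 0.5 + x^2 and N ~ N(0, 1) independent of X."""
    x = rng.uniform(-2.0, 2.0, size=n)
    noise = rng.normal(size=n)
    y = np.sin(x) + (0.5 + x ** 2) * noise
    return x, y


rng = np.random.default_rng(0)
x, y = sample_lsnm(20_000, rng)
resid = y - np.sin(x)
# The cause modulates the spread, not just the mean, of its effect.
print(resid[np.abs(x) < 0.5].std(), resid[np.abs(x) > 1.5].std())
```

An additive-noise model would force both printed spreads to be equal. The heteroscedasticity visible here is what additive baselines miss.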

2606.07693 2026-06-09 stat.ML cs.LG math.PR New submission

Transfer learning for causal forest

Bérénice-Alexia Jocteur, Véronique Maume-Deschamps, Pierre Ribereau

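AI summary: The paper proposes a transfer learning method for the causal forest HTERF that estimates the model shift between the source and target domains via an offset. It gives an upper bound on the CATE error on the target domain, and simulations and real data confirm the method's effectiveness.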

Abstract

Transfer learning addresses the challenge of transferring knowledge from one domain to another. Traditional transfer learning focuses on adapting models trained on a source domain (with many observations) to improve performance on a target domain (with few observations). In this work, we consider the case of a model shift and focus on transfer learning applied to a causal forest, namely HTERF, which aims to estimate the conditional average treatment effect (CATE). The approach considered is the offset method presented by Wang (2016), adapted to a causal context. This method relies on intermediate models to estimate the offset between the source and target distributions. Our main result is a bound on the CATE error of HTERF on the target domain in terms of the error of the intermediate models. Simulation studies in different settings, as well as an application to a real-world dataset, show that this approach performs well.
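The offset idea can be sketched outside the causal setting. Fit a model on the large source sample, fit a second model only to the target residuals, and add the two. In this minimal sketch, ordinary least-squares linear models stand in for HTERF and for the intermediate models, which is an assumption for illustration. The paper applies the idea to CATE estimation with a causal forest.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray):
    """Least-squares linear model with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def offset_transfer(Xs, ys, Xt, yt, fit=fit_linear):
    """Learn on the (large) source, then learn only the source-to-target offset
    from the (small) target sample, and predict with their sum."""
    f_src = fit(Xs, ys)                   # source model
    g_off = fit(Xt, yt - f_src(Xt))       # offset model fitted to target residuals
    return lambda Xn: f_src(Xn) + g_off(Xn)


rng = np.random.default_rng(0)
Xs = rng.uniform(-1, 1, size=(500, 1))
ys = 1.0 + 2.0 * Xs[:, 0] + rng.normal(0, 0.1, size=500)          # source domain
Xt = rng.uniform(-1, 1, size=(10, 1))
yt = 1.5 + 2.3 * Xt[:, 0] + rng.normal(0, 0.1, size=10)           # shifted target domain
model = offset_transfer(Xs, ys, Xt, yt)
print(model(np.array([[0.0], [0.5]])))
```

Because the offset is usually simpler than the full target function, it can be learned from few target observations. This is the motivation for the method.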

2606.05797 2026-06-09 cs.LG stat.ML Updated version

Causal Longitudinal Prior-Fitted Networks for Counterfactual Outcome Prediction

Amirhossein Zare, Amirhessam Zare, Herlock Rahimi, Reza Salarikia, Mohammad Kashkooli

AI summary: Proposes CausalLongPFN, a prior-fitted in-context predictor that is pretrained on synthetic causal models and predicts longitudinal counterfactual outcomes without gradient updates, achieving performance competitive with domain-trained models on several benchmarks.

Comments
31 pages, 10 tables

Abstract

Longitudinal treatment decisions from multivariate time-series data require predicting potential outcomes under future treatment sequences in the presence of time-varying confounding, heterogeneous patient dynamics, and limited domain-specific data. Existing longitudinal causal estimators typically address this problem by training a new model for each cohort or simulator. We introduce Causal Longitudinal Prior-Fitted Networks (CausalLongPFN), a prior-fitted network for time-series causal inference in longitudinal treatment-response data and zero-shot in-context counterfactual outcome prediction. The model is pretrained entirely on synthetic episodes sampled from a broad prior over temporal structural causal models, exposing it to treatment-confounder feedback, latent heterogeneity, nonlinear state evolution, delayed effects, and cumulative treatment responses. At test time, CausalLongPFN remains frozen and is used zero-shot: it conditions on support trajectories, a query history, and a planned future treatment sequence, and returns a predictive distribution over future outcomes without gradient updates or propensity-model fitting. Multi-step predictions are obtained by recursively applying the one-step predictor under the specified treatment sequence. We evaluate the model on branchable cancer, HIV, and warfarin benchmarks with ground-truth counterfactual labels, and on factual-only rolling-origin prediction in MIMIC-III ICU trajectories. CausalLongPFN is competitive with domain-trained longitudinal baselines on counterfactual benchmarks and performs strongly on factual MIMIC-III prediction, suggesting that broad synthetic causal pretraining can provide a frozen, amortized alternative for zero-shot longitudinal treatment-response prediction when repeated domain-specific training is costly or impractical.
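
The recursive multi-step scheme described above can be sketched as follows, assuming a generic one-step predictor that returns a point prediction (the model in the abstract returns a full predictive distribution). `rollout` and `toy_one_step` are hypothetical names, and the toy dynamics stand in for the frozen pretrained network; for simplicity only past outcomes are kept in the history.

```python
def rollout(one_step, history, treatments):
    """Multi-step forecast by recursively applying a one-step predictor.

    one_step(history, treatment) -> predicted next outcome.
    """
    history = list(history)  # copy so the caller's list is not modified
    predictions = []
    for treatment in treatments:
        y_next = one_step(history, treatment)
        predictions.append(y_next)
        history.append(y_next)  # feed the prediction back in as if observed
    return predictions


def toy_one_step(history, treatment):
    """Hypothetical dynamics: outcome decays by 10% and each treatment adds 2."""
    return 0.9 * history[-1] + 2.0 * treatment


# Planned future treatment sequence: treat, skip, treat.
preds = rollout(toy_one_step, [10.0], [1, 0, 1])
```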

2605.29348 2026-06-09 stat.ME Updated version

Efficient Inference for Incremental Causal Effects of Time to Treatment

Zhichen Zhao, Andrew Ying, Ronghui Xu

AI summary: For incremental causal effects of the time of treatment initiation, derives the efficient influence function and proposes an estimation framework that accommodates flexible machine learning methods while achieving fast convergence rates; valid confidence bands are obtained via empirical process theory.

Abstract

We consider continuous time to treatment initiation. This commonly arises in preventive medicine, such as disease screening and vaccination; it can also arise with non-fatal health conditions, such as HIV infection without the onset of AIDS. While traditional causal inference has focused on "when to treat" and its effects, we consider the incremental causal effect when the intensity of time to treatment initiation is intervened upon. We derive the efficient influence function for this estimand and develop an estimation framework that accommodates flexible machine learning methods while achieving fast convergence rates. Valid confidence bands are obtained by leveraging empirical process theory. We illustrate our approach via simulation and apply it to cervical cancer screening data to study the incremental effect of time to subsequent HPV testing on the detection of cervical intraepithelial neoplasia.
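
As a rough discrete-time illustration, assume the intervention multiplies the odds of initiating treatment in each period by a factor δ, in the spirit of incremental propensity-score interventions. The paper itself works with a continuous-time intensity, so this is only an analogy; the functions and hazard values below are hypothetical.

```python
def shift_hazard(h, delta):
    """Discrete-time hazard after multiplying its odds h / (1 - h) by delta."""
    return delta * h / (delta * h + 1 - h)


def prob_untreated(hazards, delta=1.0):
    """Probability of not having initiated treatment by the end of the periods."""
    surv = 1.0
    for h in hazards:
        surv *= 1 - shift_hazard(h, delta)
    return surv


hazards = [0.1, 0.2, 0.3]  # hypothetical per-period initiation probabilities
observed = prob_untreated(hazards)              # delta = 1: no intervention
accelerated = prob_untreated(hazards, delta=2)  # initiation odds doubled
```

With δ = 1 the observed regime is recovered; δ > 1 moves initiation earlier and δ < 1 delays it.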

2602.05553 2026-06-09 stat.ME stat.AP Updated version

Sensitivity analysis for contamination in egocentric-network randomized trials with interference

Bar Weinstein, Daniel Nevo

AI summary: Addresses contamination in egocentric-network randomized trials by proposing bias-corrected estimators indexed by sensitivity parameters, and assesses the robustness of causal estimates through grid sensitivity analysis and probabilistic bias analysis.

Abstract

Egocentric-Network Randomized Trials (ENRTs) are increasingly used to estimate causal effects under interference when measuring complete sociocentric network data is infeasible. ENRTs rely on egocentric network sampling, where a set of egos is first sampled, and each ego recruits a subset of its neighbors as alters. Treatments are then randomized across egos. While the observed ego-networks are disjoint by design, the underlying population network may contain edges connecting them, leading to contamination. Under a design-based framework, we show that the Horvitz-Thompson estimators of direct and indirect effects are biased whenever contamination is present. To address this, we derive bias-corrected estimators and propose a novel sensitivity analysis framework based on sensitivity parameters representing the probability or expected number of missing edges. This framework is implemented via both grid sensitivity analysis and probabilistic bias analysis, providing researchers with a flexible tool to assess the robustness of the causal estimators to contamination. We apply our methodology to the HIV Prevention Trials Network 037 study, finding that ignoring contamination may lead to underestimation of indirect effects and overestimation of direct effects.
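
For reference, a Horvitz-Thompson contrast among egos can be sketched as below, assuming egos are treated independently with a known probability p (a Bernoulli design; the trial's actual design may differ). This is the uncorrected estimator whose bias under contamination the abstract describes; the bias correction itself is not shown, and the function name and data are hypothetical.

```python
def horvitz_thompson_mean(outcomes, treated, p_treat, arm):
    """HT estimate of the mean potential outcome in `arm` (1 = treated, 0 = control)."""
    prob = p_treat if arm == 1 else 1 - p_treat
    n = len(outcomes)
    # Each observed unit in the arm is inverse-weighted by its assignment probability.
    return sum(y / prob for y, z in zip(outcomes, treated) if z == arm) / n


ego_outcomes = [3.0, 1.0, 4.0, 2.0]
ego_treated = [1, 0, 1, 0]
direct = (horvitz_thompson_mean(ego_outcomes, ego_treated, 0.5, 1)
          - horvitz_thompson_mean(ego_outcomes, ego_treated, 0.5, 0))
```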

2601.01830 2026-06-09 stat.ME stat.AP Updated version

Confounder-robust causal discovery and inference in Perturb-seq using proxy and instrumental variables

Kwangmoon Park, Hongzhe Li

AI summary: Proposes a method that uses proxy and instrumental variable strategies to accurately reconstruct causal gene networks from Perturb-seq data with unobserved confounding; it outperforms baseline methods on synthetic data and real experiments.

Abstract

Emerging single-cell technologies that combine CRISPR-based genetic perturbations with single-cell RNA sequencing, such as Perturb-seq, offer unprecedented opportunities to uncover cause-and-effect relationships among genes. Nonetheless, Perturb-seq experiments are subject to unobserved factors that, if not properly handled, can severely bias the inferred causal relationships between genes. These latent factors may arise not only from intrinsic molecular features of the regulatory elements, but also from unmeasured genes omitted due to cost-constrained experimental designs. Although methods for analyzing large-scale Perturb-seq data are rapidly maturing, approaches that explicitly account for such unobserved confounders when inferring causal gene networks are still lacking. Here, we propose a novel approach to accurately reconstruct causal gene networks from Perturb-seq data even when important confounders are missing. Our framework leverages proxy and instrumental variable strategies to exploit the rich information embedded in the perturbations, enabling unbiased estimation of the underlying directed acyclic graph (DAG) of gene expression. Applications to both comprehensive synthetic data and real CRISPR interference experiments in K562 cells demonstrate that our method outperforms baseline approaches that lack principled adjustments for unmeasured confounding, yielding more accurate and biologically relevant recovery of the true causal DAGs.

2507.00312 2026-06-09 stat.ME Updated version

Optimal Targeting in Dynamic Systems

Yuchen Hu, Shuangning Li, Stefan Wager

AI summary: For Markovian systems in which treatment decisions affect shared resources, proposes an algorithm that combines CATE estimation with state-level value iteration and comes with consistency guarantees; experiments show it outperforms individual-level CATE rules and generic offline reinforcement learning methods.

Abstract

Modern treatment targeting methods often rely on estimating a conditional average treatment effect (CATE) using machine learning tools. While effective in identifying who benefits from treatment on the individual level, these approaches typically overlook system-level dynamics that may arise when treatments induce strain on shared capacity. We study the problem of targeting in Markovian systems, where treatment decisions must be made one at a time as units arrive, and early decisions can impact later outcomes through delayed or limited access to resources. We show that optimal policies in such settings compare CATE-like quantities to state-specific thresholds, where each threshold reflects the expected cumulative impact on the system of treating an additional individual in the given state. We propose an algorithm that augments standard CATE estimation with state-level value iteration to estimate these thresholds from observational data. Theoretical results establish consistency and convergence guarantees, and empirical studies demonstrate that our method improves long-run outcomes considerably relative to individual-level CATE targeting rules and generic offline reinforcement learning algorithms.

2508.10331 2026-06-09 stat.ME Updated version

Synthesizing Evidence: Data-Pooling as a Tool for Treatment Selection in Online Experiments

Zhenkang Peng, Chengzhang Li, Ying Rong, Renyu Zhang

AI summary: Proposes the data pooling treatment roll-out (DPTR) framework, which reduces estimation variability by pooling data across experiments and supports both overlapping and non-overlapping traffic scenarios; theoretical analysis and empirical validation show that it outperforms traditional methods.

Abstract

Randomized experiments are the gold standard for causal inference but face significant challenges in business applications, including limited traffic allocation, the need for heterogeneous treatment effect estimation, and the complexity of managing overlapping experiments. These factors lead to high variability in treatment effect estimates, making data-driven policy roll-out difficult. To address these issues, we introduce the data pooling treatment roll-out (DPTR) framework, which enhances policy roll-out by pooling data across experiments rather than focusing narrowly on individual ones. DPTR can effectively accommodate both overlapping and non-overlapping traffic scenarios, under either linear or nonlinear model specifications. We demonstrate the framework's robustness through a three-pronged validation: (a) theoretical analysis shows that DPTR surpasses the traditional difference-in-means and ordinary least squares methods under non-overlapping experiments, particularly when the number of experiments is large; (b) synthetic simulations confirm its adaptability in complex scenarios with overlapping traffic, rich covariates, and nonlinear specifications; and (c) empirical applications to two experimental datasets from real-world platforms demonstrate its effectiveness in guiding customized policy roll-outs for subgroups within a single experiment, as well as in coordinating policy deployments across multiple overlapping experiments. By reducing estimation variability to improve decision-making, DPTR provides a scalable, practical solution for online platforms to better leverage their experimental data in today's increasingly complex business environments.
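
To illustrate why pooling reduces variability, here is a generic empirical-Bayes shrinkage sketch that pulls each experiment's effect estimate toward the pooled mean, assuming every estimate has the same known sampling variance (`noise_var > 0`). It is a stand-in for the general idea, not the DPTR estimator, and `pool_estimates` is a hypothetical name.

```python
from statistics import fmean, pvariance


def pool_estimates(estimates, noise_var):
    """Shrink per-experiment effect estimates toward their pooled mean.

    Between-experiment (signal) variance is estimated by the method of moments
    and truncated at zero; noise_var must be positive.
    """
    grand_mean = fmean(estimates)
    signal_var = max(0.0, pvariance(estimates) - noise_var)
    weight = signal_var / (signal_var + noise_var)  # 0 = full pooling, 1 = none
    return [grand_mean + weight * (e - grand_mean) for e in estimates]


raw = [1.0, 2.0, 3.0, 6.0]  # hypothetical per-experiment effect estimates
shrunk = pool_estimates(raw, noise_var=1.0)
```

Noisy experiments borrow strength from the others: the more the spread of raw estimates is explained by noise, the more strongly they are pulled together.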

2506.00149 2026-06-09 stat.ME Updated version

Generalizing causal effects with noncompliance: Application to deep canvassing experiments

Zhongren Chen, Melody Huang

AI summary: For settings with noncompliance, proposes identifying assumptions and inverse-probability-weighted estimators for the target complier average causal effect (T-CACE), and validates them through simulations and deep canvassing experiments.

Abstract

Standard approaches to generalizability often focus on generalizing the intent-to-treat (ITT) effect. However, in practice, a more policy-relevant quantity is the generalized impact of an intervention across compliers. While instrumental variable (IV) methods are commonly used to estimate the complier average causal effect (CACE) within samples, standard approaches cannot be applied to a target population whose distribution differs from that of the study sample. This paper makes several key contributions. First, we introduce a new set of identifying assumptions in the form of a population-level exclusion restriction that allows for identification of the target complier average causal effect (T-CACE) in both randomized experiments and observational studies. This allows researchers to identify the T-CACE without relying on standard principal ignorability assumptions. Second, we propose a class of inverse-weighted estimators for the T-CACE and derive their asymptotic properties. We provide extensions for settings in which researchers have access to auxiliary compliance information across the target population. Finally, we introduce a sensitivity analysis for researchers to evaluate the robustness of the estimators in the presence of unmeasured confounding, and we extend existing tests to evaluate instrument validity in this context. We illustrate our proposed method through extensive simulations and a study evaluating the impact of deep canvassing on reducing exclusionary attitudes.
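
A weighted Wald ratio conveys the basic structure of an inverse-weighted complier estimand, assuming weights that reweight the study sample toward the target population are already available (in practice they would be estimated, for example from a model of sample membership). This is an illustrative sketch, not necessarily the estimator class proposed in the paper, and the data are hypothetical.

```python
def weighted_arm_mean(values, z, w, arm):
    """Weighted mean of `values` among units assigned to `arm`."""
    num = sum(v * wi for v, zi, wi in zip(values, z, w) if zi == arm)
    den = sum(wi for zi, wi in zip(z, w) if zi == arm)
    return num / den


def weighted_wald(y, d, z, w):
    """Weighted ITT effect on the outcome divided by weighted ITT effect on uptake."""
    itt_y = weighted_arm_mean(y, z, w, 1) - weighted_arm_mean(y, z, w, 0)
    itt_d = weighted_arm_mean(d, z, w, 1) - weighted_arm_mean(d, z, w, 0)
    return itt_y / itt_d


# Hypothetical toy trial: half of the encouraged units comply; complier effect is 2.
z = [1, 1, 1, 1, 0, 0, 0, 0]   # randomized encouragement
d = [1, 1, 0, 0, 0, 0, 0, 0]   # treatment actually received
y = [3, 3, 1, 1, 1, 1, 1, 1]   # outcome
w = [2, 2, 1, 1, 1, 1, 1, 1]   # weights toward the target population (assumed given)

t_cace = weighted_wald(y, d, z, w)
```

With all weights equal to one this reduces to the usual sample Wald (CACE) estimator.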

2407.01765 2026-06-09 stat.ME stat.AP Updated version

A General Framework for Design-Based Treatment Effect Estimation in Paired Cluster-Randomized Experiments

Charlotte Z. Mann, Adam C. Sales, Johann A. Gagnon-Bartsch

AI summary: Presents a general design-based framework for estimating the average individual effect in paired cluster-randomized experiments, clarifies the bias-variance trade-off among point estimators, highlights the benefits of covariate adjustment, and compares estimators in simulation studies.

Abstract

Paired cluster-randomized experiments (pCRTs) are common in education program impact evaluation trials. Although common, there is surprisingly no clear consensus regarding how to analyze this randomization design to estimate average treatment effects. Variance estimation is also complicated by the dependency created by pairing clusters. Therefore, we aim to provide an intuitive and practical comparison of different estimation strategies for pCRTs to inform practitioners' choice of strategy. To this end, we present a general framework for design-based estimation of an average individual effect in pCRTs. This framework offers a novel and intuitive view of the bias-variance trade-off between point estimators and emphasizes the benefits of covariate adjustment for estimation with pCRTs. In addition to providing this general framework, we present point estimators that support fixed-sample unbiased estimation with precision similar to that of a common regression model, together with conservative variance estimators. Through simulation studies based on an educational efficacy trial, we compare the performance of the point and variance estimators reviewed. Our analysis and simulation studies inform the choice of point and variance estimators for analyzing pCRTs in practice.
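
The simplest pair-difference estimator for a pCRT, with the usual between-pair variance estimate, can be sketched as below. It assumes one treated and one control cluster per pair and takes cluster-level means as input; because pairs are weighted equally, it targets an average of pair-level effects, which differs from the individual-level estimand when cluster sizes vary. The function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean


def paired_difference(pairs):
    """pairs: list of (treated_cluster_mean, control_cluster_mean) tuples.

    Returns the mean pair difference and its standard error.
    """
    diffs = [t - c for t, c in pairs]
    n_pairs = len(diffs)
    estimate = fmean(diffs)
    # Between-pair spread of the differences (conservative under the design).
    var = sum((d - estimate) ** 2 for d in diffs) / (n_pairs * (n_pairs - 1))
    return estimate, sqrt(var)


est, se = paired_difference([(5.0, 3.0), (4.0, 3.0), (6.0, 3.0)])
```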

2208.05543 2026-06-09 stat.ME

A novel decomposition to explain heterogeneity in observational and randomized studies of causality

Brian Gilbert, Ivan Díaz, Kara E. Rudolph, Nicholas Williams, Tat-Thang Vo

AI summary: Proposes a new framework that decomposes between-study heterogeneity in causal effects to identify the sources of differences in treatment effects across observational and randomized studies, and demonstrates its value in a simulation study and the MTO study.

Comments
edits to data analysis section, minor textual changes throughout

Abstract

This paper introduces a novel decomposition framework to explain heterogeneity in causal effects observed across different studies, considering both observational and randomized settings. We present a formal decomposition of between-study heterogeneity, identifying sources of variability in treatment effects across studies. The proposed methodology allows for robust estimation of causal parameters under various assumptions, addressing differences in pre-treatment covariate distributions, mediating variables, and the outcome mechanism. Our approach is validated through a simulation study and applied to data from the Moving to Opportunity (MTO) study, demonstrating its practical relevance. This work contributes to the broader understanding of causal inference in multi-study environments, with potential applications in evidence synthesis and policy-making.

2103.11066 2026-06-09 stat.ME

Treatment Allocation under Uncertain Costs

Georgy Kalashnov, Evan Munro, Hao Sun, Shuyang Du, Stefan Wager

AI summary: Studies how to allocate treatments optimally when costs are uncertain, shows that the optimal rule thresholds on priority scores, and learns these scores from randomized trial data.

Abstract

We consider the problem of learning how to optimally allocate treatments whose cost is uncertain and can vary with pre-treatment covariates. This setting may arise in medicine if we need to prioritize access to a scarce resource that different patients would use for different amounts of time, or in marketing if we want to target discounts whose cost to the company depends on how much the discounts are used. Here, we show that the optimal treatment allocation rule under budget constraints is a thresholding rule based on priority scores (those with a higher score are treated first), and we propose a number of practical methods for learning these priority scores using data from a randomized trial. Our formal results leverage a statistical connection between our problem and that of learning heterogeneous treatment effects under endogeneity using an instrumental variable. We find that our methods perform well in a number of empirical evaluations.
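
A thresholding rule on priority scores under a budget can be sketched as follows, assuming per-unit estimates of expected benefit and expected cost are already available and using their ratio as a hypothetical priority score (the paper learns its scores from randomized-trial data, which is not shown). `allocate` is a hypothetical name.

```python
def allocate(benefits, costs, budget):
    """Treat units in order of benefit per unit cost until the budget runs out."""
    order = sorted(range(len(benefits)),
                   key=lambda i: benefits[i] / costs[i],
                   reverse=True)  # stable: ties keep their original order
    treated, spent = [], 0.0
    for i in order:
        if spent + costs[i] > budget:
            break  # threshold rule: stop at the first unit that no longer fits
        treated.append(i)
        spent += costs[i]
    return sorted(treated)


# Priority scores are [2, 3, 1, 1]; with budget 3 units 1 and 0 are treated.
chosen = allocate([4, 3, 2, 1], [2, 1, 2, 1], budget=3)
```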

4. High-Dimensional Statistics and Regularization (6 papers)

2606.09153 2026-06-09 math.ST stat.ME stat.TH New submission

The Asymptotic Distribution of Sample Canonical Directions in Gaussian Spiked High-dimensional CCA

Zhangni Pu, Zhangxiao Zhuo, Jiang Hu

AI summary: Studies the asymptotic behavior of sample canonical directions in Gaussian spiked high-dimensional CCA, derives a deterministic limit and a central limit theorem for the squared alignment, and constructs computable plug-in estimators.

Abstract

This paper studies the asymptotic behavior of sample canonical directions in a finite-rank spiked high-dimensional canonical correlation analysis model under a Gaussian population assumption. Under the asymptotic regime in which the dimensions of the two data blocks grow proportionally with the sample size, sample canonical directions are generally not consistent estimators of their population counterparts, even when the corresponding sample canonical correlations separate from the bulk spectrum. To quantify directional recovery, we investigate the squared alignment between a sample canonical direction and its associated population direction. For each simple population spike, we first establish a deterministic first-order limit for this squared alignment, which gives an explicit measure of the population-level directional information retained by the sample direction. We then prove a central limit theorem for its fluctuations around the deterministic limit, with an explicit asymptotic variance expressed through deterministic limits of resolvent trace functionals. To make the theoretical quantities computable from data, we further construct plug-in estimators for both the limiting mean and the asymptotic variance by inverting the deterministic outlier eigenvalue map, and prove their consistency. Numerical simulations and a real-data illustration support the theoretical results and demonstrate how the proposed estimators assess the recovery quality of sample canonical directions.

2606.09021 2026-06-09 math.ST stat.ME stat.TH New submission

Sparse Convexification for High-Dimensional Constrained Regression

Matey Neykov

AI summary: For high-dimensional linear regression, proposes a sparse convexification hierarchy and a penalized least-squares estimator that adapts to the best sparse convex approximation of the target, obtaining squared-error rates under sub-Gaussian assumptions.

Abstract

We study high-dimensional linear regression under a general symmetric convex constraint. Rather than imposing a specific sparsity-inducing penalty, we start from an arbitrary sign-symmetric and permutation-invariant convex body $K\subseteq \mathbb R^p$ and construct the sparse convexification hierarchy \[ K^{(s)} = \operatorname{conv}\{v\in K:\|v\|_0\le s\}. \] We propose a penalized least-squares estimator that searches over this hierarchy. Under standard sub-Gaussian assumptions on the random design and noise, we prove an oracle inequality showing that the estimator adapts to the best sparse convex approximation of the target. For an $s$-sparse target, the result yields a squared-error rate governed by the effective sparse dimension $s\log(ep/s)$, the noise level $\sigma$, and the Euclidean diameter $d_s$ of the sparse convexification $K^{(s)}$. The method applies broadly to symmetric norm balls and can be implemented using oracle access to the Minkowski functional of $K$. As a special case, the framework yields a consistency result for the constrained Lasso.
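
As a concrete instance, take $K=\{v\in\mathbb R^p:\|v\|_\infty\le 1\}$. For integer $1\le s\le p$, its sparse convexification has the closed form $K^{(s)}=\{v:\|v\|_\infty\le 1,\ \|v\|_1\le s\}$, so the Minkowski functional of $K^{(s)}$ is $\max(\|v\|_\infty, \|v\|_1/s)$ and its Euclidean diameter is $d_s = 2\sqrt{s}$. The sketch below evaluates that gauge; it illustrates one member of the hierarchy for one choice of $K$, not the paper's estimator, and the function name is hypothetical.

```python
def gauge_linf_sparse(v, s):
    """Minkowski functional of conv{v in the unit l_inf ball : ||v||_0 <= s}.

    For integer s >= 1 this set equals {v : ||v||_inf <= 1, ||v||_1 <= s},
    and the gauge of an intersection is the larger of the two gauges.
    """
    linf = max(abs(x) for x in v)
    l1 = sum(abs(x) for x in v)
    return max(linf, l1 / s)


dense = [0.5, 0.5, 0.5]
# A vector lies in K^(s) exactly when its gauge is at most 1.
inside_s3 = gauge_linf_sparse(dense, 3) <= 1
inside_s1 = gauge_linf_sparse(dense, 1) <= 1
```

Smaller $s$ shrinks $K^{(s)}$ toward the scaled $\ell_1$ ball, which is why dense vectors fall outside the low-sparsity levels of the hierarchy.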

2606.08468 2026-06-09 stat.ME math.ST stat.ML stat.TH New submission

Nonparametric undirected graphical model selection using diffusion models

Hyeok Kyu Kwon, Myeonggu Kang, Minwoo Chae, Wanjie Wang

AI summary: Proposes a nonparametric, diffusion-model-based method for undirected graphical model selection, proves its model selection consistency, and validates it through simulations and real data analyses.

Abstract

Undirected graphical models provide a fundamental framework for representing conditional independence structures among high-dimensional random variables. While undirected graphical model selection has become a central problem in high-dimensional statistics, most existing methods are restricted to parametric settings. In this paper, we develop a nonparametric approach to undirected graphical model selection based on diffusion models. Recent work has shown that diffusion models can adapt to the unknown graph structure of the underlying distribution, yet utilizing these models for explicit graph estimation remains unexplored. To bridge this gap, we introduce a novel diffusion-based method for nonparametric undirected graphical model selection. We establish the model selection consistency of the proposed method and demonstrate its empirical performance through extensive simulations and two real data analyses.
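
The link between scores and graph edges can be illustrated with a known fact: for a smooth, strictly positive density $p$, $X_i$ and $X_j$ are conditionally independent given the remaining variables exactly when $\partial^2 \log p / \partial x_i \partial x_j$ vanishes everywhere. A diffusion model supplies an estimate of the score $\nabla \log p$, so edges could in principle be read off its Jacobian. The sketch below uses an exact Gaussian score as a stand-in for a learned one and checks a single point; it is not the authors' selection procedure, and all names are hypothetical.

```python
def cross_partial(score, x, i, j, h=1e-4):
    """Central-difference estimate of d(score_i)/d(x_j) = d^2 log p / dx_i dx_j."""
    up = list(x)
    down = list(x)
    up[j] += h
    down[j] -= h
    return (score(up)[i] - score(down)[i]) / (2 * h)


# Stand-in for a learned score: Gaussian with precision matrix theta, whose
# exact score is -theta @ x, so the mixed partials equal -theta[i][j].
theta = [[2.0, -0.5, 0.0],
         [-0.5, 2.0, -0.5],
         [0.0, -0.5, 2.0]]


def gaussian_score(x):
    return [-sum(theta[i][k] * x[k] for k in range(3)) for i in range(3)]


point = [0.3, -0.1, 0.7]
edge_01 = cross_partial(gaussian_score, point, 0, 1)  # about 0.5: edge present
edge_02 = cross_partial(gaussian_score, point, 0, 2)  # about 0.0: no edge
```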

2606.07986 2026-06-09 stat.ME stat.ML New submission

Inference for High-Dimensional Sparse Spectral Precision Matrices

Navonil Deb, Younghoon Kim, Sumanta Basu

AI summary: For stationary high-dimensional time series, proposes an inference framework for sparse spectral precision matrices based on the full likelihood of neighboring discrete Fourier transforms; a debiased complex graphical lasso estimator enables tests of frequency-specific conditional dependence while controlling regularization, truncation, and smoothing biases.

Comments
47 pages, 5 figures, 5 tables

Abstract

Gaussian graphical models in the spectral domain offer a principled approach for recovering conditional dependence structures in stationary high-dimensional time series. Inference on the spectral precision matrix at a fixed frequency enables tests of frequency-specific conditional associations among time series components. The problem is challenging because finite-sample discrete Fourier transforms induce truncation and smoothing biases, while the complex-valued nature of the spectral precision matrix complicates high-dimensional variance estimation, rendering methods for i.i.d. samples not directly applicable. Existing approaches do not provide full likelihood-based inference for the discrete Fourier transforms. We propose a high-dimensional inference framework for sparse spectral precision matrices using the full likelihood of neighboring discrete Fourier transforms. We construct a debiased complex graphical lasso estimator at any fixed frequency. Using asymptotic theory for quadratic forms of multivariate time series, we establish its asymptotic normality and construct entry-wise consistent covariance estimators by aggregating information across neighboring frequencies. The key theoretical contribution is the simultaneous control of regularization, finite-sample truncation, and smoothing biases, enabling valid inference. Simulation studies show reliable coverage away from zero frequency and improved detection power over the benchmark, with false discovery rates near the desired level.
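
The basic building blocks, a normalized DFT of a p-dimensional series and a smoothed periodogram that averages over $2m+1$ neighboring Fourier frequencies, can be sketched in pure Python as below. The normalization $(2\pi n)^{-1/2}$ is one common convention (others exist), the function names are hypothetical, and the debiased complex graphical lasso itself is not shown.

```python
import cmath
import math


def dft(series, k):
    """Normalized DFT of a p-dimensional series at Fourier frequency 2*pi*k/n."""
    n = len(series)
    p = len(series[0])
    omega = 2 * math.pi * k / n
    scale = 1 / math.sqrt(2 * math.pi * n)
    return [scale * sum(series[t][a] * cmath.exp(-1j * omega * t) for t in range(n))
            for a in range(p)]


def smoothed_periodogram(series, k, m):
    """Average d(w_j) d(w_j)^* over the 2m + 1 Fourier frequencies around index k."""
    n = len(series)
    p = len(series[0])
    acc = [[0j] * p for _ in range(p)]
    for j in range(k - m, k + m + 1):
        d = dft(series, j % n)  # wrap indices around the frequency grid
        for a in range(p):
            for b in range(p):
                acc[a][b] += d[a] * d[b].conjugate()
    return [[v / (2 * m + 1) for v in row] for row in acc]


# Hypothetical bivariate series; the result is a 2 x 2 Hermitian matrix estimate.
series = [[math.cos(0.3 * t), math.sin(0.3 * t)] for t in range(64)]
S = smoothed_periodogram(series, k=3, m=2)
```

Inverting a (regularized) estimate of this Hermitian matrix is what yields a spectral precision matrix at the chosen frequency.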

2606.07816 2026-06-09 stat.ME New submission

High Dimensional Change Point Models for Two-Directional Data

Abhishek Kaul, Dipesh Baral, Stergios B. Fotopoulos, Venkata K. Jandhyala, Rebecca Killick

AI summary: For two-directional data in which changes may occur simultaneously along two time indices, proposes high-dimensional change point recovery methods, develops asymptotic estimation and inference theory, and applies them to climate data from the U.S. Pacific Northwest.

Comments
arXiv admin note: text overlap with arXiv:2105.10017

Abstract

We develop methodology for recovering change points in data observed on more than one temporal index, where changes may occur simultaneously in both indices and the spatial component may be high-dimensional. The work is motivated by climate monitoring problems in which long data series are available, e.g., daily observations (index 1) over several years (index 2). Such data may evolve over the annual time scale, along with dynamic seasonal changes on the shorter time scale. We model this as a high-dimensional mean process with change points, observed on a two-dimensional grid. Asymptotic estimation and inference results are developed under a single change point setup, including rates of convergence of the proposed method as well as the resulting limiting distributions. The method is extended to the case of multiple changes. Theoretical results are supported numerically with Monte Carlo simulations. We apply our method to large-scale climate data for the Pacific Northwest region of the United States.
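
For intuition, a least-squares estimate of a single mean change in one dimension can be sketched as follows; it scans every split point and keeps the one minimizing the within-segment sum of squares. This is a one-index, one-dimensional simplification, not the two-index high-dimensional estimator of the paper, and the function name is hypothetical.

```python
def single_change_point(x):
    """Least-squares estimate of one mean change.

    Returns tau such that x[:tau] and x[tau:] are modeled with different means.
    """
    n = len(x)
    best_tau, best_cost = None, float("inf")
    for tau in range(1, n):
        left, right = x[:tau], x[tau:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        cost = (sum((v - mean_left) ** 2 for v in left)
                + sum((v - mean_right) ** 2 for v in right))
        if cost < best_cost:
            best_tau, best_cost = tau, cost
    return best_tau


# Hypothetical series whose mean jumps from about 0 to about 5 at index 4.
tau_hat = single_change_point([0.1, -0.2, 0.0, 0.1, 5.2, 4.9, 5.0])
```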

2306.06756 2026-06-09 stat.ME stat.CO stat.ML Updated version

Semi-Parametric Inference for Doubly Stochastic Spatial Point Processes: An Approximate Penalized Poisson Likelihood Approach

Si Cheng, Jon Wakefield, Ali Shojaie

AI summary: Proposes a semi-parametric regression method based on an approximate penalized Poisson likelihood that efficiently estimates covariate effects in doubly stochastic spatial point processes, and proves consistency and asymptotic normality of the estimates under model misspecification.

Abstract

Doubly-stochastic point processes model the occurrence of events over a spatial domain as an inhomogeneous Poisson process conditioned on the realization of a random intensity function. They are flexible tools for capturing spatial heterogeneity and correlation. However, existing implementations of doubly-stochastic spatial models are computationally demanding, often have limited theoretical guarantees, and/or rely on restrictive assumptions. We propose a penalized regression method for estimating covariate effects in doubly-stochastic point processes that is computationally efficient and does not require a parametric form or stationarity of the underlying intensity. Our approach is based on an approximate (discrete and deterministic) formulation of the true (continuous and stochastic) intensity function. We show that consistency and asymptotic normality of the covariate effect estimates can be achieved despite the model misspecification, and develop a covariance estimator that leads to a valid, albeit conservative, statistical inference procedure. A simulation study shows the validity of our approach under less restrictive assumptions on the data generating mechanism, and an application to Seattle crime data demonstrates better prediction accuracy compared with existing alternatives.
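
The discrete and deterministic approximation can be illustrated by aggregating events into grid cells and fitting a log-linear Poisson regression by Newton-Raphson. This sketch assumes a single covariate and known cell areas, and it omits the penalty and the covariance estimator described in the abstract; the function name and data are hypothetical.

```python
import math


def fit_grid_poisson(counts, covariate, area, iters=50):
    """Newton-Raphson for a cell-level log-intensity b0 + b1 * x.

    Model: counts[j] ~ Poisson(area[j] * exp(b0 + b1 * covariate[j])).
    """
    b0 = math.log(sum(counts) / sum(area))  # start at the constant-intensity fit
    b1 = 0.0
    for _ in range(iters):
        mu = [a * math.exp(b0 + b1 * x) for a, x in zip(area, covariate)]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum(x * (y - m) for x, y, m in zip(covariate, counts, mu))
        # Fisher information (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(x * m for x, m in zip(covariate, mu))
        i11 = sum(x * x * m for x, m in zip(covariate, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve information @ step = score via the 2 x 2 inverse.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


# Hypothetical check: counts equal to their expectations under b0 = 0.5, b1 = 0.8.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
areas = [1.0] * len(xs)
ys = [a * math.exp(0.5 + 0.8 * x) for a, x in zip(areas, xs)]
b0_hat, b1_hat = fit_grid_poisson(ys, xs, areas)
```

The check uses non-integer counts equal to their expectations, so the fitted coefficients should match the generating values; with real event counts they would only be close.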

5. Time Series and Spatial Statistics (12 papers)

2606.09473 2026-06-09 stat.ML cs.LG New submission

Report the Floor: A Training-Free Conformal Interval Is a Mandatory Baseline for Probabilistic Time-Series Forecasting

Valery Manokhin

AI summary: Proposes a parameter-free, training-free conformal naive interval as a strong baseline for probabilistic forecasting; it beats several existing methods on 2,217 real series, and the author argues it should be a mandatory benchmark.

Abstract

Probabilistic forecasters are increasingly learned, yet the baselines they are compared against are often weak or omitted. We show that the simplest possible conformal interval (a last-value point forecast wrapped in a finite-sample split-conformal residual quantile, with no parameters and no training) is a far stronger baseline than its near-total absence from recent learned-forecasting and conformal-time-series comparisons would suggest. In one-step-ahead online forecasting across 2,217 real series from nine public sources (Monash, LOTSA, the LTSF traffic/electricity/weather suites, METR-LA, BOOM, nips/probts), this ConformalNaive interval decisively beats the naive value-quantile baselines, the entire NPTS family (NPTS on 73% and SeasonalNPTS on 64% of series), and the published Conformal Seasonal Pools (CSP) method (71% of series, bootstrap 95% CI [69, 73], paired Wilcoxon p ≈ 7.6e-135). It is on par with the simpler learned conformal predictors (RCI, quantile regression; median relative Winkler score within 2%) and is beaten only by the adaptive-online and ensemble methods (SPCI, ACI, AgACI), which track distribution shift and lead by 9-33% in relative Winkler score. It is also better calibrated than a trained neural forecaster: on the six datasets that introduced DeepNPTS, the trivial floors cover the truth 84-85% of the time at a nominal 95%, versus 66% for DeepNPTS. At multi-step seasonal horizons the picture inverts: the random-walk floor is the weakest method and the seasonal pool (CSP) wins, a boundary we map. Finally, we give ConformalNaive+, a one-line, training-free, horizon-adaptive selector that attains the better of two complementary floors at every horizon with restored coverage. We argue that the matching conformal naive floor must be a mandatory baseline whenever a learned probabilistic forecaster claims gains.
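
A ConformalNaive-style interval can be sketched in a few lines, assuming absolute one-step naive errors as conformity scores and the finite-sample rank $\lceil (n+1)(1-\alpha) \rceil$. This is an illustrative reading of the abstract, not the author's reference implementation, and `conformal_naive` is a hypothetical name.

```python
import math


def conformal_naive(history, alpha=0.05):
    """One-step interval: last value +/- split-conformal quantile of |naive errors|."""
    # Conformity scores: absolute errors of the last-value forecast on the history.
    scores = sorted(abs(b - a) for a, b in zip(history, history[1:]))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("-inf"), float("inf")  # too few scores for this alpha
    q = scores[rank - 1]
    last = history[-1]
    return last - q, last + q


# A steadily increasing series: every naive error equals 1.
lower, upper = conformal_naive(list(range(21)), alpha=0.05)
```

Because the interval needs nothing beyond past values, it can be recomputed at every step of an online evaluation at negligible cost.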

2606.08786 2026-06-09 stat.ME New submission

Inference for Balance in Dynamic Signed Networks

Ergan Shang, Yuan Zhang, Weijing Tang

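AI summary: For dynamic signed networks, proposes a nonparametric inference method for the degree of structural balance at specified time points. It uses a kernel-smoothed estimator to exploit temporal smoothness and an Edgeworth expansion to obtain a higher-order distributional approximation. The theory shows that temporal smoothing reduces the variance of observation noise in sparse networks.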

Abstract

Signed networks consist of both positive and negative relations, and structural balance theory provides an important conceptual framework for understanding their global tension structure. While existing statistical methods mainly focus on assessing empirical evidence of balance in a single observed network, many real-world signed relations evolve over time. This paper develops nonparametric inference for the population degree of structural balance at specified time points in dynamic signed networks, where the target time may or may not coincide with an observed snapshot. We consider a dynamic signed graphon model in which both edge formation and sign generation are governed by smoothly time-varying graphon functions. To exploit temporal smoothness, we construct a kernel-smoothed estimator that borrows information from snapshots near the target time point. Our theoretical analysis establishes a studentized inference procedure and a higher-order distributional approximation based on Edgeworth expansion, showing that temporal smoothing improves inference in sparse networks by reducing variance of observation noise, up to smoothing bias and time-discretization errors. We demonstrate the finite-sample performance and practical usefulness of the proposed method through extensive simulation studies and an application to a dynamic international relation network in political science.
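The following toy sketch makes the estimand concrete. It measures balance as the fraction of balanced triangles, meaning triangles whose sign product is positive, and it kernel-smooths triangle counts across snapshots near a target time. That target time need not coincide with a snapshot. The triangle-fraction measure, the Gaussian kernel, and the ratio-of-smoothed-counts form are assumptions chosen for illustration. The paper's graphon-based estimator and its studentization are not reproduced here.

```python
import itertools
import math


def triangle_counts(n_nodes, signs):
    """Count (balanced, total) triangles; `signs` maps frozenset({i, j}) to +1 or -1."""
    balanced = total = 0
    for i, j, k in itertools.combinations(range(n_nodes), 3):
        edge_signs = [signs.get(frozenset(pair)) for pair in ((i, j), (j, k), (i, k))]
        if None in edge_signs:
            continue  # not a closed triangle in this snapshot
        total += 1
        if edge_signs[0] * edge_signs[1] * edge_signs[2] > 0:
            balanced += 1  # even number of negative edges
    return balanced, total


def smoothed_balance(snapshots, times, t0, bandwidth):
    """Kernel-weighted fraction of balanced triangles at target time t0 (hypothetical helper)."""
    num = den = 0.0
    for (n_nodes, signs), t in zip(snapshots, times):
        weight = math.exp(-0.5 * ((t - t0) / bandwidth) ** 2)  # Gaussian kernel
        balanced, total = triangle_counts(n_nodes, signs)
        num += weight * balanced
        den += weight * total
    return num / den if den > 0 else float("nan")


pos = {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({0, 2}): 1}
neg = {frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({0, 2}): -1}
print(smoothed_balance([(3, pos), (3, neg)], [0.0, 1.0], t0=0.5, bandwidth=1.0))  # 0.5
```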

2606.08560 2026-06-09 stat.ME econ.EM stat.ML New submission

CP-factorization for high dimensional tensor time series and double projection iterations

Jinyuan Chang, Guanglin Huang, Qiwei Yao, Long Yu

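AI summary: Models high-dimensional tensor time series with the canonical polyadic (CP) decomposition and proposes a one-pass estimator based on the serial dependence structure. It also introduces a double-projection iterative algorithm that reduces estimation error, and it proves convergence rates and asymptotic distributions.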

Abstract

We adopt the canonical polyadic (CP) decomposition to model high-dimensional tensor time series. Our primary goal is to identify and estimate the factor loadings in the CP decomposition. We propose a one-pass estimation procedure through standard eigen-analysis for a matrix constructed based on the serial dependence structure of the data. The asymptotic properties of the proposed estimator are established under a general setting that requires only that the factor loading vectors be linearly independent, allowing the factors to be correlated and the factor loading vectors to be far from orthogonal. The procedure adapts to the sparsity of the factor loading vectors, accommodates weak factors, and demonstrates strong performance across a wide range of scenarios. To further reduce estimation errors, we also introduce an iterative algorithm based on a novel double projection approach. We theoretically justify the improved convergence rate of the iterative estimator, and derive the associated limiting distribution. A consistent estimator of the asymptotic variance is also provided, which plays a key role in the related inference problems. All results are validated through extensive simulations and two real data applications.

2606.08498 2026-06-09 math.ST stat.ME stat.TH New submission

Tests for Independence of High-Dimensional Nonstationary Time Series

Yunyi Zhang

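AI summary: Proposes a bimodal weighted-average test statistic for testing independence of high-dimensional nonstationary time series without prewhitening, and develops a dependent wild bootstrap for inference.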

Abstract

This manuscript studies the problem of independence testing between two high-dimensional time series without assuming weak stationarity, that is, allowing their autocovariances to vary over time. To this end, we propose a bimodal weighted-average test statistic that removes the bias induced by temporal dependence under the null hypothesis, thereby avoiding the need to whiten the time series prior to hypothesis testing -- a procedure that is challenging in high-dimensional and nonstationary settings. To facilitate statistical inference, we develop a dependent wild bootstrap procedure. On the theoretical side, we derive a concentration inequality for quadratic forms of time series data stemming from a class of high-dimensional, nonlinear, and nonstationary processes. This result enables us to derive the asymptotic null distribution of the proposed test statistic and to establish the validity of the bootstrap algorithm. Numerical results show that the proposed test attains desired size and good power performance even when the dimension exceeds the sample size or when the data-generating process exhibits time-varying autocovariances. In contrast, tests based on whitening time series fail to maintain correct size in the presence of unstable autocovariance structures. Since nonstationary autocovariances commonly arise in real-life time series data, our work offers a robust procedure for independence testing.
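A dependent wild bootstrap perturbs centred observations with a serially correlated multiplier sequence. The bootstrap statistic therefore inherits the temporal dependence without any parametric model. Below is a generic sketch for a scalar mean statistic that uses stationary AR(1) Gaussian multipliers with lag-k correlation exp(-k/l). The multiplier design, the bandwidth l (`block_len`), and the mean statistic are all assumptions. The sketch does not implement the paper's bimodal weighted-average statistic or its quadratic-form setting.

```python
import math
import random


def ar1_multipliers(n, block_len, rng):
    """Stationary Gaussian AR(1) multipliers: mean 0, variance 1, lag-k correlation exp(-k/block_len)."""
    rho = math.exp(-1.0 / block_len)
    innovation_sd = math.sqrt(1.0 - rho * rho)
    w = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        w.append(rho * w[-1] + innovation_sd * rng.gauss(0.0, 1.0))
    return w


def dwb_mean_pvalue(x, block_len, n_boot, rng):
    """Two-sided dependent wild bootstrap p-value for H0: E[x_t] = 0 (hypothetical helper)."""
    n = len(x)
    xbar = sum(x) / n
    stat = math.sqrt(n) * abs(xbar)
    exceed = 0
    for _ in range(n_boot):
        w = ar1_multipliers(n, block_len, rng)
        # Multiplying the centred data by w makes the bootstrap world satisfy H0.
        boot = math.sqrt(n) * abs(sum((xi - xbar) * wi for xi, wi in zip(x, w)) / n)
        exceed += boot >= stat
    return (1 + exceed) / (1 + n_boot)
```

Larger `block_len` values make the multipliers more persistent, which is how the method adapts to stronger serial dependence.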

2606.08385 2026-06-09 eess.SP cs.IT cs.SD cs.SY eess.SY math.IT stat.ML New submission

A Switching Beamformer for Highly Non-Stationary Environments

Manan Mittal, Ryan M. Corey, John R. Buck, Andrew C. Singer

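AI summary: Adaptive beamforming degrades under complex, rapidly changing interference. To address this, the paper proposes the Universal Switching Beamformer (USB), which uses competitive sequential prediction and a linear transition diagram to adjust its effective memory length dynamically. A regret upper bound is proven, and experiments show that the USB combines the agility of short windows with the precision of long windows.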

Comments
11 pages, 19 figures, under review
Abstract

Adaptive beamforming is a cornerstone of array signal processing, yet its performance often collapses in the face of complex, rapidly changing interference. When interferers appear or move unpredictably, conventional estimators encounter a fundamental memory trade-off: short windows enable rapid tracking but suffer from high estimation variance, while long windows provide stable rejection but fail to adapt to shifts. This challenge is resolved by introducing the Universal Switching Beamformer (USB), which integrates competitive sequential prediction into the beamforming architecture. By employing a linear transition diagram, the USB implicitly maintains an exponentially large family of candidate covariance histories and dynamically re-weights them based on their cumulative output power. This mechanism allows the beamformer to automatically vary its effective memory length without explicit change detection or heuristic parameter tuning. A theoretical upper bound is proven on the regret relative to an omniscient oracle that selects the best piecewise-stationary covariance model in hindsight. Extensive simulations and experiments on the SwellEx-96 dataset demonstrate that the USB achieves the agility of short-window estimators and the precision of long-term integration, providing a principled solution for tracking highly non-stationary scenes.

2606.08261 2026-06-09 stat.ME stat.AP New submission

Sparse Longitudinal Functional Principal Component Analysis for Episodic Ambulatory Behavioral Assessments

Nidhi Pai, Yu Fang, Srijan Sen, Zhenke Wu, Erjia Cui

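AI summary: Proposes sparse longitudinal functional principal component analysis, which estimates covariance via structured penalized spline regression to decompose the variability of episodically observed typing-speed trajectories. The analysis reveals participant- and day-level patterns that can be used to tailor behavioral interventions.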

Abstract

Accurately monitoring mental fatigue is critical for improving workplace safety and productivity. A recent study examined unobtrusively collected smartphone typing speed as a potential ambulatory proxy assessment of mental fatigue using data from the Intern Health Study (IHS). While population-level average typing speed patterns were found to be consistent with validated measures of mental fatigue, how these trajectories vary across participants and days may inform opportune moments for just-in-time interventions and remains an open question. Treating typing speed trajectories as sparsely observed functional data, we propose a novel sparse longitudinal functional principal component analysis (sparse LFPCA) method for decomposing variability and predicting individual curves. Specifically, sparse data are accommodated by casting covariance estimation as a structured penalized spline regression problem, enabling simultaneous estimation and smoothing of multiple covariance components while borrowing information across locations in the functional domain. Simulations show that sparse LFPCA (1) accurately estimates eigenfunctions and generates reasonable predictions for underlying curves, and (2) achieves similar or superior performance compared to existing alternatives. Our analysis of typing speed data collected from IHS reveals new and interpretable participant- and day-level patterns not captured by previous analyses and can be used to tailor behavioral interventions.

2605.16866 2026-06-09 stat.ME econ.EM Updated version

Heavy Tails and Predictive Ability Testing

Jonas F. Frederiksen, Muneya Matsui, Rasmus S. Pedersen

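AI summary: Studies the asymptotic behavior of widely used tests for evaluating and comparing predictive accuracy when forecast errors are heavy-tailed. When loss differentials have infinite variance, the Diebold-Mariano statistic converges to a nonstandard limit involving non-Gaussian stable random variables, so conventional critical values can severely distort inference. The authors develop a new stable limit theorem and build on it a subsampling-based inference method. This method remains valid under heavy tails and requires no estimation of long-run variances or tail indices. An application to risk forecasts for emerging-market exchange rates shows that accounting for heavy tails can substantially change conclusions about predictive performance.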

Comments
72 pages, 3 figures. Application in Econometrics
Abstract

We study the asymptotic behaviour of widely used tests for evaluating and comparing predictive accuracy when forecast errors exhibit heavy tails. In particular, when loss differentials have infinite variance, the Diebold-Mariano test statistic converges to a nonstandard limit involving non-Gaussian stable random variables. As a consequence, conventional critical values can yield severely distorted inference: a nominal 5% test may reject a true null as often as 70% of the time. To establish these results, we develop a new stable limit theorem for strongly mixing, infinite-variance time series processes. Building on this theory, we consider sub-sampling-based inference that remains valid irrespective of tail-heaviness and requires no estimation of long-run variances or tail indices. An application to risk forecasts for emerging-market exchange rates shows that accounting for heavy tails can substantially alter conclusions about predictive performance relative to standard procedures.
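For concreteness, the Diebold-Mariano statistic for one-step-ahead forecasts is the studentized mean of the loss differentials $d_t = L(e_{1,t}) - L(e_{2,t})$. The sketch below computes it with the plain sample variance. Multi-step forecasts would need a long-run variance estimator instead. The sketch pairs this statistic with a generic subsampling p-value built from overlapping blocks of a self-normalized statistic. The self-normalized subsampling construction, the helper names, and the block length `b` are assumptions made for illustration. They are not necessarily the procedure analyzed in the paper.

```python
import math


def _mean_sd(d):
    n = len(d)
    m = sum(d) / n
    v = sum((x - m) ** 2 for x in d) / (n - 1)
    return m, math.sqrt(v)


def dm_statistic(d):
    """Studentized mean of loss differentials (one-step-ahead DM statistic)."""
    m, s = _mean_sd(d)
    return math.sqrt(len(d)) * m / s  # undefined if all d_t are equal


def subsampling_pvalue(d, b):
    """Two-sided p-value for H0: E[d_t] = 0 from overlapping blocks of length b.

    Each block statistic is centred at the full-sample mean and
    self-normalized by the block's own standard deviation, so no tail
    index or long-run variance is estimated.
    """
    n = len(d)
    full_mean = sum(d) / n
    t_full = abs(dm_statistic(d))
    block_stats = []
    for start in range(n - b + 1):
        bm, bs = _mean_sd(d[start:start + b])
        if bs > 0:
            block_stats.append(abs(math.sqrt(b) * (bm - full_mean) / bs))
    return sum(t >= t_full for t in block_stats) / len(block_stats)
```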

2410.20885 2026-06-09 econ.EM stat.ME Updated version

A Distributed Lag Approach to the Generalised Dynamic Factor Model

Philipp Gersing

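AI summary: Proposes a simple estimator for the dynamic decomposition of the Generalized Dynamic Factor Model that avoids frequency-domain methods. Assuming that the dynamic common component can be represented by current and lagged statically pervasive factors reduces estimation to a regression of the observed variables on the factors, which are extracted by static principal components. The method naturally accommodates weak, non-pervasive factors, and the paper establishes consistency and asymptotic normality of the weak common components.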

Abstract

We propose a simple estimator for the dynamic decomposition of the Generalized Dynamic Factor Model that avoids frequency-domain methods. First, we show that it is a reasonable approximation to assume that the dynamic common component of the Generalized Dynamic Factor Model admits a representation in terms of current and lagged statically pervasive factors. Then, assuming finite lag order, this simplification reduces estimation to a regression of the observed variables on estimated factors and their lags, where the factors are extracted via static principal components. The proposed approach naturally accommodates weak, non-pervasive factors within the dynamic common space. We establish consistency and asymptotic normality for both the dynamic and weak common components under a new asymptotic framework that allows for such weak factors. In an application to three high-dimensional time series panels of European macroeconomic data we detect a sizeable weak common component share in several key macroeconomic indicators.

2603.21161 2026-06-09 stat.ME Updated version

An information criterion for detecting periodicities in functional time series

Rinka Sagawa, Yan Liu, Valentin Patilea

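AI summary: Proposes an information criterion for determining the unknown number of periodic components in functional time series, using an iterative procedure based on the residual process. Its effectiveness is verified by numerical simulations and real data.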

Journal ref
Computational Statistics & Data Analysis (2026) 108430
Abstract

We propose an information criterion for determining an unknown number of periodic components in functional time series. Identifying the number of frequencies in large-scale time series has been a central focus. To achieve this goal, we suggest an iterative procedure, utilizing the residual process obtained through least squares fitting. This iterative approach demonstrates broad applicability. We establish the consistency of the estimated number of periodic components by minimizing the information criterion. The efficacy of the procedure is illustrated through numerical simulations. In real data analysis, we apply this information criterion to temperature data and sunspot data.
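The iterative residual procedure can be illustrated on a scalar series, whereas the paper treats functional data. At each step, the sketch fits the sinusoid at the Fourier frequency with the largest periodogram ordinate by least squares and subtracts it from the residual process. It then evaluates a criterion of the form $n \log \hat\sigma_m^2 + c\, m \log n$ and picks the $m$ that minimizes it. The scalar setting, the restriction to Fourier frequencies (where the least-squares coefficients have the closed form used below), and the penalty constant $c = 5$ are assumptions for this sketch.

```python
import math
import random


def _fourier_fit(x, k):
    """Least-squares cosine/sine coefficients at Fourier frequency 2*pi*k/n (0 < k < n/2)."""
    n = len(x)
    a = 2.0 / n * sum(xt * math.cos(2 * math.pi * k * t / n) for t, xt in enumerate(x))
    b = 2.0 / n * sum(xt * math.sin(2 * math.pi * k * t / n) for t, xt in enumerate(x))
    return a, b


def select_periodic_components(x, max_m, c=5.0):
    """Return (m_hat, frequency indices) minimizing n*log(sigma2_m) + c*m*log(n)."""
    n = len(x)
    mean = sum(x) / n
    resid = [xt - mean for xt in x]
    freqs, ics = [], []
    for m in range(max_m + 1):
        sigma2 = max(sum(r * r for r in resid) / n, 1e-300)
        ics.append(n * math.log(sigma2) + c * m * math.log(n))
        if m == max_m:
            break
        # a^2 + b^2 is proportional to the periodogram ordinate at frequency k.
        best_k = max(range(1, (n - 1) // 2 + 1),
                     key=lambda k: sum(v * v for v in _fourier_fit(resid, k)))
        a, b = _fourier_fit(resid, best_k)
        freqs.append(best_k)
        resid = [r - a * math.cos(2 * math.pi * best_k * t / n)
                 - b * math.sin(2 * math.pi * best_k * t / n)
                 for t, r in enumerate(resid)]
    m_hat = min(range(len(ics)), key=ics.__getitem__)
    return m_hat, freqs[:m_hat]


rng = random.Random(0)
n = 128
x = [3 * math.cos(2 * math.pi * 5 * t / n) + 1.5 * math.sin(2 * math.pi * 17 * t / n)
     + rng.gauss(0.0, 0.3) for t in range(n)]
m_hat, freqs = select_periodic_components(x, max_m=4)
```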

2512.00239 2026-06-09 cs.LG stat.ML Updated version

Self-Supervised Dynamical System Representations for Physiological Time-Series

Yenho Chen, Maxwell A. Xu, James M. Rehg, Christopher J. Rozell

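AI summary: Proposes PULSE, a framework that exploits the information structure of a dynamical-systems generative model. A cross-reconstruction pretraining objective extracts information about shared system parameters while discarding sample-specific noise, which improves representation learning for physiological time series.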

Comments
Accepted to ICML 2026
Abstract

The effectiveness of self-supervised learning (SSL) for physiological time series depends on the ability of a pretraining objective to preserve information about the underlying physiological state while filtering out unrelated noise. However, existing strategies are limited due to reliance on heuristic principles or poorly constrained generative tasks. To address this limitation, we propose a pretraining framework that exploits the information structure of a dynamical systems generative model across multiple time series. This framework reveals our key insight that class identity can be efficiently captured by extracting information about the generative variables related to the system parameters shared across similar time series samples, while noise unique to individual samples should be discarded. Building on this insight, we propose PULSE, a cross-reconstruction-based pretraining objective for physiological time series datasets that explicitly extracts system information while discarding non-transferable, sample-specific information. We establish theory that provides sufficient conditions for the system information to be recovered, and empirically validate it using a synthetic dynamical systems experiment. Furthermore, we apply our method to diverse real-world datasets, demonstrating that PULSE learns representations that can broadly distinguish semantic classes, increase label efficiency, and improve transfer learning.

2302.01233 2026-06-09 econ.EM math.ST stat.ME stat.TH Updated version

Sparse High-Dimensional Vector Autoregressive Bootstrap

Robert Adamek, Stephan Smeekes, Ines Wilms

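AI summary: Proposes a high-dimensional multiplier bootstrap based on a sparse vector autoregressive model for inference with time series data. It proves the bootstrap's consistency for inference on high-dimensional means under sub-Gaussian and finite-moment conditions.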

Abstract

We introduce a high-dimensional multiplier bootstrap for time series data based on capturing dependence through a sparsely estimated vector autoregressive model. We prove its consistency for inference on high-dimensional means under two different moment assumptions on the errors, namely sub-Gaussian moments and a finite number of absolute moments. In establishing these results, we derive a Gaussian approximation for the maximum mean of a linear process, which may be of independent interest.

2406.03296 2026-06-09 stat.ME Updated version

Multi-relational Network Autoregression Model with Latent Group Structures

Yimeng Ren, Xuening Zhu, Ganggang Xu, Yanyuan Ma

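AI summary: Proposes the group tensor network autoregression model. Within each network, entities in the same group share parameters, and these parameters differ across networks. The model estimates group structure and parameters simultaneously, addressing both network heterogeneity and high-dimensional time series in multi-relational networks.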

Comments
arXiv admin note: text overlap with arXiv:2212.02107
Abstract

Multi-relational networks among entities are frequently observed in the era of big data. Quantifying the effects of multiple networks has attracted significant research interest recently. In this work, we model multiple network effects through an autoregressive framework for tensor-valued time series. To characterize the potential heterogeneity of the networks and handle the high dimensionality of the time series data simultaneously, we assume a separate group structure for entities in each network and estimate all group memberships in a data-driven fashion. Specifically, we propose a group tensor network autoregression (GTNAR) model, which assumes that within each network, entities in the same group share the same set of model parameters, and the parameters differ across networks. An iterative algorithm is developed to estimate the model parameters and the latent group memberships simultaneously. Theoretically, we show that the group-wise parameters and group memberships can be consistently estimated when the group numbers are correctly- or possibly over-specified. An information criterion for group number estimation of each network is also provided to consistently select the group numbers. Lastly, we implement the method on a Yelp dataset to illustrate the usefulness of the method.

6. Computational Statistics and MCMC (17 papers)

2606.09049 2026-06-09 stat.ME cs.LG math.ST stat.ML stat.TH New submission

Data augmented bootstrap: Unifying confidence interval construction by approximate invariance

Kevin Han Huang

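AI summary: Proposes the data augmented bootstrap (DAB), which builds confidence intervals from approximate invariances of the data. It unifies the theory of the classical bootstrap, conformal prediction, and related methods, and incorporates the data augmentation heuristic.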

Abstract

We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.
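One way to read the framework is that a confidence interval is calibrated from the spread of a statistic recomputed on transformed copies of the data. The transformation is chosen so that those copies are approximately distributed like the original. The sketch below uses the basic-bootstrap interval form and takes the transformation as a pluggable function. Resampling indices uniformly with replacement recovers the classical bootstrap. The interval form and the helper names (`invariance_interval`, `resample`, `identity`) are hypothetical and do not represent the paper's construction or coverage theory.

```python
import math
import random
import statistics


def invariance_interval(data, stat, transform, n_draws=1000, alpha=0.1, rng=None):
    """Basic-bootstrap-style interval from statistics of transformed data copies."""
    if rng is None:
        rng = random.Random()
    theta = stat(data)
    draws = sorted(stat(transform(data, rng)) for _ in range(n_draws))
    q_lo = draws[math.floor(alpha / 2 * (n_draws - 1))]
    q_hi = draws[math.ceil((1 - alpha / 2) * (n_draws - 1))]
    # Reflect the transformed-statistic quantiles around the observed statistic.
    return 2 * theta - q_hi, 2 * theta - q_lo


def resample(data, rng):
    """Uniform resampling of indices: the classical bootstrap transformation."""
    return [rng.choice(data) for _ in data]


def identity(data, rng):
    """Exact invariance with no randomness: the interval collapses to a point."""
    return list(data)


data = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 5.0, 1.5]
print(invariance_interval(data, statistics.mean, resample, rng=random.Random(0)))
```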

2606.08946 2026-06-09 stat.CO physics.chem-ph physics.comp-ph New submission

A Diffusion Monte Carlo algorithm employing depth first traversal and a stack instead of a swarm

Bastiaan J. Braams

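AI summary: Proposes DMCD, a diffusion Monte Carlo algorithm based on depth-first traversal and a stack. The stack manages the splitting history, which can be more memory-efficient than the traditional breadth-first swarm approach. The algorithm also unifies the treatment of eigenvalue and linear-equation problems.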

Comments
11 pages
Abstract

Diffusion Monte Carlo (DMC) and Monte Carlo for particle transport with importance sampling both involve simulations of weighted walkers that undergo birth and death processes (splitting and Russian Roulette). The established implementations of these methods are quite different: Particle simulation Monte Carlo employs a stack to handle the splitting history whereas in traditional DMC one follows a swarm of walkers. The particle simulation Monte Carlo approach involves a depth first traversal of the visited configurations whereas the traditional DMC approach may be seen as a breadth first traversal. In the present work the implementation of a depth first, stack based approach to DMC is described and a complete code is presented. The depth first approach, called DMCD here, can be more memory efficient than the breadth first approach, both for total memory and for use of a memory hierarchy and of co-processors. The implementation appears very natural for population control and for descendant weighting and it unifies algorithmic treatment of the eigenvalue problem (DMC) with the linear equation problem (particle transport). A concern with DMCD that is not present in the breadth first approach, and that is successfully addressed here, is the need to maintain a pool of starters for use when a new walker is required and the stack is empty. The DMCD approach appears to have the potential to become the preferred implementation for many DMC applications.
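The depth-first idea can be illustrated on a toy branching random walk. A single walker is followed through all time steps, and the extra copies created by splitting are pushed onto a stack together with their time step. A fresh starter is drawn only when the stack is empty. The Gaussian diffusion step, the stochastic-rounding branching rule, the fixed starting point, and the absence of population control are simplifying assumptions. This is not the paper's DMCD code.

```python
import math
import random


def dmc_depth_first(potential, e_ref, n_starters, n_steps, dt, rng):
    """Toy depth-first DMC: returns the number of walkers alive at each step.

    Walkers are processed one at a time; split copies wait on a stack as
    (position, step) pairs instead of living in a swarm.
    """
    tally = [0] * (n_steps + 1)
    for _ in range(n_starters):
        stack = [(0.0, 0)]  # fresh starter, used only when the stack is empty
        while stack:
            x, step = stack.pop()
            while True:
                tally[step] += 1
                if step == n_steps:
                    break
                x += math.sqrt(dt) * rng.gauss(0.0, 1.0)  # diffusion move
                step += 1
                weight = math.exp(-dt * (potential(x) - e_ref))
                copies = int(weight + rng.random())  # splitting / Russian roulette
                if copies == 0:
                    break  # walker killed
                stack.extend([(x, step)] * (copies - 1))  # defer extra copies
    return tally


rng = random.Random(0)
tally = dmc_depth_first(lambda x: 0.5 * x * x, 0.5, 50, 20, 0.05, rng)
```

Only the pending copies along the current branch are held in memory at any time, which is the source of the memory advantage the abstract describes. A swarm implementation would instead store every live walker at every step.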

2606.08438 2026-06-09 stat.ML cs.LG New submission

Improving Bayesian Optimization via Training-Aware Conditional Diffusion Models

Yilin Zheng, Haowei Wang, Szu Hui Ng, Enlu Zhou

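AI summary: Proposes using conditional diffusion models to approximate the distribution of the optimum efficiently, and develops BO-inherent training strategies together with a diffusion-based mode-seeking acquisition strategy. The paper provides a sub-optimality guarantee, and the method outperforms standard baselines in experiments.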

Abstract

Bayesian optimization (BO) is a widely used approach for black-box optimization that uses a Gaussian process (GP) as a surrogate and guides sequential evaluations via an acquisition function, with the ultimate goal of locating the global optimum $\mathbf{x}^{\star}$. To align with this goal, information-based acquisition functions such as Predictive Entropy Search (PES) model $\mathbf{x}^{\star}$ as a random variable and reduce the entropy of its distribution, but approximating this distribution via traditional GP posterior sampling is computationally expensive. To address this limitation, we leverage Conditional Diffusion Models (CDMs) to efficiently approximate the distribution of $\mathbf{x}^{\star}$ and develop BO-inherent training strategies for CDMs. Motivated by the structural properties of the CDM-learned distribution, we further develop an acquisition strategy termed Diffusion-based Mode Seeking (DMS) to guide the sequential evaluation. We establish a sub-optimality guarantee for the CDM-learned distribution and demonstrate through extensive experiments that DMS outperforms standard BO baselines.

2606.08418 2026-06-09 stat.ME New submission

TS-Neyman: Posterior Sampling for Adaptive Stratified Estimation

Kosuke Morikawa, Mst Moushumi Pervin, Jae Kwang Kim

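AI summary: For stratified pools where labels are costly, proposes TS-Neyman, which uses Thompson sampling to randomize over variance uncertainty and thereby achieve adaptive Neyman allocation. The paper proves almost-sure convergence to the optimal allocation and asymptotic optimality, and the method approaches oracle efficiency on several benchmarks.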

Abstract

Many model evaluation tasks reduce to estimating an average loss, error rate, or subgroup metric on a stratified pool when each label, human rating, or simulator call is costly. The precision-optimal Neyman allocation depends on within-stratum variances, which must be learned from the same observations used for estimation. We formulate this as a sequential allocation problem and use the exact one-step marginal variance reduction as the priority index. Replacing the unknown variances by independent inverse-chi-squared posterior draws yields TS-Neyman, a Thompson-sampling rule that preserves the oracle marginal-gain structure while randomizing over variance uncertainty. For any fixed finite number of strata, we prove almost-sure convergence of the TS-Neyman allocation proportions to the Neyman target, asymptotic optimality of the variance proxy, and a central limit theorem for the resulting adaptive stratified estimator. In two five-stratum budget-scaling benchmarks, one bounded-loss benchmark and one binary model-error benchmark in the spirit of Dai et al. 2023, TS-Neyman's relative efficiency stays within 5 percent of the oracle on the bounded-loss population and within about 15 percent on the binary benchmark. In an additional CivilComments real-data replay with confidence-based strata, it stays within about 8 percent of the oracle and improves on equal allocation by roughly 7 to 14 percent in MSE across budgets, while plug-in greedy and two-stage plug-in can degrade by over an order of magnitude under sparse pilots. Common-pilot warm-start and prior-sensitivity studies show that this behavior is stable under working-model and working-prior misspecification.
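The allocation rule described above fits in a few lines. Consider the stratified mean $\sum_h W_h \bar y_h$ with variance $\sum_h W_h^2 \sigma_h^2 / n_h$, ignoring finite-population corrections. One more draw in stratum $h$ reduces this variance by $W_h^2 \sigma_h^2 \left(\frac{1}{n_h} - \frac{1}{n_h + 1}\right) = \frac{W_h^2 \sigma_h^2}{n_h (n_h + 1)}$. The sketch uses that reduction as the priority index and replaces each $\sigma_h^2$ by a scaled inverse-chi-squared posterior draw with $n_h - 1$ degrees of freedom. The noninformative-prior posterior, the two-observation pilot per stratum, the helper names, and the simulated strata are assumptions for illustration.

```python
import random


def _posterior_sigma2(ys, rng):
    """Scaled inverse-chi-squared draw for a stratum variance (assumed flat prior)."""
    n = len(ys)
    mean = sum(ys) / n
    s2 = sum((y - mean) ** 2 for y in ys) / (n - 1)
    nu = n - 1
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu degrees of freedom
    return nu * s2 / chi2


def ts_neyman(weights, samplers, budget, rng, pilot=2):
    """Sequentially allocate `budget` draws across strata; return (estimate, counts)."""
    samples = [[draw(rng) for _ in range(pilot)] for draw in samplers]
    for _ in range(budget - pilot * len(samplers)):
        # One-step variance reduction W^2 * sigma^2 / (n (n + 1)) with sampled sigma^2.
        gains = [w * w * _posterior_sigma2(ys, rng) / (len(ys) * (len(ys) + 1))
                 for w, ys in zip(weights, samples)]
        h = max(range(len(gains)), key=gains.__getitem__)
        samples[h].append(samplers[h](rng))
    estimate = sum(w * sum(ys) / len(ys) for w, ys in zip(weights, samples))
    return estimate, [len(ys) for ys in samples]


rng = random.Random(0)
samplers = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(10.0, 4.0)]
estimate, counts = ts_neyman([0.5, 0.5], samplers, budget=1000, rng=rng)
# Neyman target share for stratum 2 is 0.5*4 / (0.5*1 + 0.5*4) = 0.8.
```

Replacing the sampled variance with the plug-in $s_h^2$ gives the plug-in greedy rule that the abstract contrasts with TS-Neyman.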

2606.08203 2026-06-09 math.NA cs.LG cs.NA stat.ML New submission

Stable and Scalable Probabilistic Numerical Solvers for Stiff and High-Dimensional ODEs

Nathanael Bosch

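AI summary: For stiff and high-dimensional ordinary differential equations, proposes two complementary strategies: a matrix-free update step that achieves linear scaling, and iterative re-linearization that improves stability. Together they yield probabilistic solvers that are both stable and scalable.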

Abstract

Filtering-based probabilistic numerical solvers for ordinary differential equations (ODEs) have been established as a flexible and efficient simulation framework with built-in numerical uncertainty quantification. However, problems that are both stiff and high-dimensional remain a challenge, as current methods are either stable and have cubic cost in the ODE dimension, or scale linearly at the expense of stability. In this paper, we close this gap and develop probabilistic ODE solvers that are both stable and scalable. We propose two complementary strategies. First, we develop a matrix-free update step that uses Jacobian-vector products, iterative linear solvers, and stochastic covariance estimation to enable linear scaling, all while retaining stability. Second, we propose iterative re-linearization to further improve stability without sacrificing scalability, turning probabilistic ODE solvers into fully implicit methods. We evaluate the proposed approaches on a range of stiff and high-dimensional problems and demonstrate improved stability and scalability over established probabilistic solvers.

2606.07981 2026-06-09 stat.ME stat.CO New submission

Making Recursive Bayesian Inference Robust

Myungsoo Yoo, Daniel Würzler Barreto, Mevin B. Hooten

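AI summary: Prior proposal-recursive Bayesian (PP-RB) inference can produce incorrect inferences when posterior distributions shift between stages. To address this, the paper proposes parallel-tempered prior proposal-recursive Bayesian (PPP-RB) inference, which extends PP-RB using the idea behind Metropolis-coupled Markov chain Monte Carlo. The paper shows theoretically that PPP-RB targets the true posterior and demonstrates its efficiency in numerical experiments and real-data analyses.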

Abstract

While Bayesian inference has become increasingly popular with advances in computational resources, its algorithms can be computationally prohibitive and may not scale with large datasets. This has led to growing interest in alternative algorithms, such as approximation methods and variants of Markov chain Monte Carlo. Among these approaches, prior proposal-recursive Bayesian (PP-RB) inference facilitates scalable Bayesian computation by recursively updating the posterior distribution across stages and utilizing parallel computing resources. While the well-known "degeneracy" issue in PP-RB has been studied, another limitation, namely that PP-RB can yield incorrect inferences when posterior distributions shift substantially between stages, has remained unresolved. To address this, we propose parallel-tempered prior proposal-recursive Bayesian (PPP-RB) inference, which extends PP-RB by leveraging the key idea underlying Metropolis-coupled Markov chain Monte Carlo. We show both theoretically and empirically that PPP-RB targets the true posterior distribution. We illustrate PPP-RB through numerical studies and real data analysis in application to earthquake count data and sea surface salinity in the North Atlantic region. In these applications, we compare PPP-RB with PP-RB and a standard MCMC, demonstrating that PPP-RB is more efficient in terms of effective sample size per elapsed time.
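The base PP-RB step that PPP-RB extends can be sketched for a normal mean with known variance. Stage-one posterior draws serve as an independence proposal for stage two. The prior and the stage-one likelihood therefore cancel, and only the new data's likelihood enters the Metropolis-Hastings ratio. The conjugate stage-one sampler, the model, the helper names, and the data are assumptions for illustration. The parallel-tempering layer that PPP-RB adds to handle large posterior shifts is not shown.

```python
import math
import random


def log_lik(mu, ys, sigma=1.0):
    """Normal log-likelihood of the mean mu, up to an additive constant."""
    return -0.5 * sum((y - mu) ** 2 for y in ys) / sigma ** 2


def stage_one_draws(ys, prior_var, n_draws, rng):
    """Exact conjugate posterior draws for mu under a N(0, prior_var) prior (stand-in for stage-one MCMC)."""
    precision = 1.0 / prior_var + len(ys)
    mean = sum(ys) / precision
    return [rng.gauss(mean, 1.0 / math.sqrt(precision)) for _ in range(n_draws)]


def pp_rb_stage(prev_draws, ys, n_iter, rng):
    """Independence MH with proposals resampled from the previous stage's draws."""
    current = rng.choice(prev_draws)
    current_ll = log_lik(current, ys)
    chain = []
    for _ in range(n_iter):
        proposal = rng.choice(prev_draws)
        proposal_ll = log_lik(proposal, ys)
        # Prior and earlier likelihoods cancel: accept with min(1, L2(prop) / L2(cur)).
        if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
            current, current_ll = proposal, proposal_ll
        chain.append(current)
    return chain


rng = random.Random(0)
y1 = [0.8, 1.2] * 10
y2 = [0.9, 1.3] * 10
draws1 = stage_one_draws(y1, prior_var=100.0, n_draws=5000, rng=rng)
chain = pp_rb_stage(draws1, y2, n_iter=5000, rng=rng)
# Full-data posterior mean: (20.0 + 22.0) / 40.01, about 1.0497.
```

If the stage-two data pulled the posterior far from the stage-one draws, almost every proposal would land in low-likelihood regions and the chain would stall on a handful of values. This is the failure mode the abstract attributes to large posterior shifts.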

2606.07841 2026-06-09 stat.CO cs.LG stat.ML New submission

Large-scale empirical tuning and comparison of default optimizers for variational inference

Trevor Campbell, Jonathan H. Huggins, Kyurae Kim, Charles C. Margossian

AI summary: Evaluates adaptive optimizers for variational inference through a large-scale experiment (56 optimizers, 1,092 problems, 550,000 runs), finding that no single method is best but that a combination of 5 algorithms comes close to the best observed performance.

Abstract

Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black box" inference algorithm. However, over the past decade, many new adaptive stochastic optimization algorithms have been developed that reduce or remove entirely the need for tuning. In this work, we investigate this new collection of adaptive methods in the context of BBVI, with the goal of establishing the current state of the art in tuning-free optimization-based inference. In particular, we present a large-scale empirical evaluation of 56 stochastic gradient-based optimization algorithms applied to 1092 Bayesian inference optimization problems, involving over 550,000 individual optimization runs and 15 core-years of compute. The optimization algorithms we evaluate are chosen to represent a wide spectrum of recent approaches and the benchmark problems are chosen to span a range of difficulty, with posterior target dimension 1-10^4, condition number 1-10^8, and a range of variational families. Our results show that no single method dominates, but running a selection of 5 algorithms suffices to reliably get close to the best-possible observed performance. We thus provide a strong baseline for applications where expert tuning is not possible and for comparison when developing new stochastic optimization algorithms.

2606.07578 2026-06-09 cs.LG stat.ME stat.ML New submission

MST-Direct at Scale: Multivariate and Conditional Geostatistical Simulation via Sinkhorn Optimal Transport

Tcharlies Bachmann Schmitz

AI summary: Extends MST-Direct with a sparse Sinkhorn matcher, multivariate tuple matching, and kriging-based conditioning. The result is large-scale, multivariate, conditional geostatistical simulation that exactly preserves the joint distribution.

Abstract

This paper extends MST-Direct, a Matching-via-Sinkhorn-Transport approach for multivariate geostatistical simulation, from the original bivariate, unconditional, small-grid formulation to multivariate, conditional, and large-grid settings. We address the three main limitations identified in the original work: (i) scalability beyond a few thousand nodes through a sparse, candidate-restricted Sinkhorn matcher with O(nC) memory complexity; (ii) extension to multiple variables by matching target value tuples onto an independent FFT-MA Gaussian backbone that reproduces a prescribed variogram; and (iii) hard-data conditioning by fixing observed data tuples at their spatial locations while conditioning the backbone through kriging. Because the transport plan remains a permutation of the target tuples, the multivariate joint distribution is preserved exactly. The method is validated using the same six-variate, heteroscedastic, strongly nonlinear reference distribution employed in Direct Multivariate Simulation (DMS), under both unconditional (200×200) and conditional (100×100, 200 hard-data samples) scenarios, and is benchmarked against the Projection Pursuit Multivariate Transform (PPMT). Results show that MST-Direct reproduces the joint distribution with zero histogram error, exactly honours hard data, and accurately reproduces the prescribed spatial correlation structure, whereas PPMT remains an approximation. Index Terms: optimal transport, Sinkhorn algorithm, geostatistical simulation, multivariate simulation.
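
A permutation-valued transport plan preserves the joint distribution exactly. A deliberately simplified stand-in illustrates why: one-dimensional rank matching, which is the optimal transport assignment in 1D for convex costs. This sketch rests on that assumption and is not MST-Direct's sparse Sinkhorn matcher. `rank_match` is a hypothetical helper that orders grid nodes by a scalar Gaussian backbone value and assigns target tuples by their first component.

```python
def rank_match(backbone, targets):
    """Hypothetical 1D stand-in for a permutation-valued transport plan:
    the node with the k-th smallest backbone value receives the target tuple
    with the k-th smallest first component. Every tuple is used exactly once."""
    if len(backbone) != len(targets):
        raise ValueError("need exactly one target tuple per grid node")
    nodes = sorted(range(len(backbone)), key=lambda i: backbone[i])
    ranked = sorted(targets, key=lambda t: t[0])
    out = [None] * len(backbone)
    for node, tup in zip(nodes, ranked):
        out[node] = tup
    return out
```

Every tuple is placed exactly once, so any statistic of the tuples is unchanged, including marginal histograms and cross-variable dependence. The backbone only decides their spatial arrangement.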

2606.07574 2026-06-09 cs.DC cs.AI cs.LG stat.CO stat.ML New submission

Accelerating Birkhoff Projection for Manifold-Constrained Hyper-Connections

Chenrui Wang, Yixuan Qiu

AI summary: Targets the computational bottleneck of Birkhoff projection in manifold-constrained hyper-connections. Proposes an end-to-end acceleration framework based on the dual formulation and Newton's method, which, combined with implicit differentiation and a CUDA kernel, achieves speedups of over 20x.

Abstract

Manifold-constrained hyper-connections (mHCs) have recently been proposed as a principled extension of hyper-connections, where the residual mixing matrices are constrained to be doubly stochastic via projection onto the Birkhoff polytope. In practical mHC implementations, this constraint is enforced by Sinkhorn-Knopp iterations, and the backward pass relies on unrolling the iterative solver. This design introduces substantial computation and memory overhead, and may also yield inaccurate projections when the algorithm converges slowly on challenging inputs, undermining the intended norm-control and stability guarantees of mHCs. In this work, we focus on the practically important 4×4 Birkhoff projection setting and develop an end-to-end acceleration framework. By leveraging the dual formulation, we reduce the problem to a three-dimensional unconstrained convex problem and solve it with Newton's method, achieving fast convergence and high accuracy. For the backward pass, we replace the unrolled differentiation with implicit differentiation, yielding exact gradients without storing intermediate states. To exploit massive parallelism, we design a warp-level CUDA kernel that uses only register-level primitives, avoiding global and shared memory I/O. Extensive experiments against representative open-source baselines demonstrate that the proposed solver yields substantially more reliable doubly stochastic projections, especially when the input magnitude is large. It also achieves significant end-to-end speedups (including the backward pass), reaching over 20× acceleration at large batch sizes while keeping marginal errors orders of magnitude smaller.
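
For context, here is a minimal sketch of the Sinkhorn-Knopp baseline that the abstract says practical mHC implementations use. It exponentiates a square matrix of logits, then alternately normalizes rows and columns until the result is approximately doubly stochastic. Mapping logits through `exp` and the stopping rule are assumptions. The paper's dual Newton solver, implicit differentiation, and CUDA kernel are not reproduced here.

```python
import math


def sinkhorn_knopp(logits, n_iter=200, tol=1e-10):
    """Approximately project exp(logits) onto the Birkhoff polytope (doubly
    stochastic matrices) by alternately normalizing rows and columns."""
    n = len(logits)
    top = max(max(row) for row in logits)
    # Shift by the global max before exponentiating to avoid overflow.
    p = [[math.exp(v - top) for v in row] for row in logits]
    for _ in range(n_iter):
        for i in range(n):  # row normalization
            s = sum(p[i])
            p[i] = [v / s for v in p[i]]
        for j in range(n):  # column normalization
            s = sum(p[i][j] for i in range(n))
            for i in range(n):
                p[i][j] /= s
        # Columns now sum to 1 (up to rounding); stop once the rows do too.
        if max(abs(sum(row) - 1.0) for row in p) < tol:
            break
    return p
```

Logits of very large magnitude can underflow to zero after the shift, and a whole row of zeros would make the row normalization divide by zero. Convergence also slows on such badly scaled inputs, which is one concrete way the slow, inaccurate convergence mentioned in the abstract can arise.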

2605.18741 2026-06-09 stat.ME stat.CO Revised version

Robust Simulation Based Inference Through Robust Optimal Transport

Peter Matthew Jacobs, Lekha Patel, Anirban Bhattacharya, Debdeep Pati

AI summary: Studies parameter estimation with simulation-based inference (SBI) when the model is inaccurate. Proposes a divergence based on robust optimal transport and designs a parallelized SBI algorithm to quantify parameter uncertainty.

Abstract

When a statistical model $\{P_\theta : \theta \in \Theta\}$ lacks an analytically tractable likelihood, parametric statistical inference based on data generated from an unknown underlying distribution $P$ can still be performed as long as simulations from the model are possible. This approach is called Simulation Based Inference (SBI). Statistical models are rarely exactly correct (that is, $P \notin \{P_\theta : \theta \in \Theta\}$), and Robust SBI focuses on inferring a reasonable parameter even under model misspecification. We focus on the setting where $P$ may deviate from $P_{\theta^*}$ through both geometric and Total Variation type discrepancies. For this problem, we use a Kullback-Leibler informed robust Optimal Transport divergence, motivated by Empirical Likelihood considerations. We introduce a stochastic sub-gradient ascent algorithm with a convergence guarantee for estimating the semi-discrete version of this robust Optimal Transport divergence. We also design a parallelized SBI algorithm that employs the regular bootstrap on top of minimum semi-discrete robust Optimal Transport for parameter uncertainty quantification. We demonstrate mathematically why the divergence is robust under a joint geometric plus Total Variation type contamination and then illustrate the robustness of inferences on a complex benchmark SBI task.

2605.01446 2026-06-09 math.NA cs.NA stat.ML Revised version

Sequential Minimal Optimization for $\varepsilon$-SVR with MAPE Loss and Sample-Dependent Box Constraints

Pablo Benavides-Herrera, Riemann Ruiz-Cruz, Juan Diego Sánchez-Torres

AI summary: Proposes a sequential minimal optimization algorithm for ε-SVR with MAPE loss. It shows that the MAPE modification changes only working-set selection and analytic-update clipping (a structural-invariance result), improves efficiency and convergence, and validates the algorithm on multiple datasets.

Comments
82 pages, 3 figures, 13 tables

Abstract

Support vector regression with Mean Absolute Percentage Error (MAPE) loss is theoretically well-motivated for forecasting applications where accuracy is evaluated in relative terms, but the sample-dependent dual box constraints it induces have not been addressed in the published SMO literature. We derive a Sequential Minimal Optimization algorithm for this setting and prove a structural-invariance result: the MAPE modification affects exactly two components of the SMO iteration (working-set selection and analytic-update clipping), leaving gradient bookkeeping and curvature computation identical to classical epsilon-SVR. Building on this invariance, we establish four efficiency improvements (asymmetric freeze-counters, warm-starting, block working-set updates of size four, and per-pair tolerance scaling) and resolve a previously open convergence problem for the odd-symmetry kernel variant via adaptive spectral regularization. Numerical validation against three reference solvers across eleven synthetic configurations certifies solution agreement within standard tolerance. Wall-time benchmarks show that the present algorithm achieves the lowest median runtime on every tested configuration against OSQP, MOSEK, and Clarabel. At production scale, the algorithm converges on the California Housing benchmark while the patched LIBSVM reference implementation reaches its iteration ceiling without satisfying optimality, demonstrating the practical necessity of the theoretical efficiency mechanisms. An open-source R package and an explicit solver-adaptation recipe are provided.
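
The sample-dependent box can be made concrete under one assumption: that the MAPE weighting enters the ε-SVR primal only as a per-sample penalty $C/|y_i|$. Then the dual variable $\beta_i = \alpha_i - \alpha_i^*$ satisfies $|\beta_i| \le C/|y_i|$. An SMO pair step that conserves $\beta_i + \beta_j$ must therefore clip to the intersection of both boxes. The helpers below (`mape_box_bounds`, `clip_pair`) are hypothetical illustrations of that clipping step, not the authors' R package.

```python
def mape_box_bounds(y, C):
    """Per-sample bounds C_i = C / |y_i| on |beta_i| (assumed MAPE weighting).
    MAPE is undefined for zero targets, so those are rejected."""
    if any(v == 0 for v in y):
        raise ValueError("MAPE loss requires nonzero targets")
    return [C / abs(v) for v in y]


def clip_pair(beta_i, beta_j, beta_j_unclipped, c_i, c_j):
    """Clip an unconstrained SMO update of beta_j so the pair stays feasible:
    beta_i + beta_j is conserved (equality constraint sum(beta) = 0),
    |beta_i| <= c_i and |beta_j| <= c_j. Returns the new (beta_i, beta_j)."""
    s = beta_i + beta_j
    lo = max(-c_j, s - c_i)  # from beta_i = s - beta_j <= c_i
    hi = min(c_j, s + c_i)   # from beta_i = s - beta_j >= -c_i
    new_j = min(max(beta_j_unclipped, lo), hi)
    return s - new_j, new_j
```

When all samples share one bound $C$, this reduces to the usual ε-SVR pair clipping. That matches the abstract's point that clipping (together with working-set selection) is where the MAPE variant departs from the classical iteration.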

2604.01459 2026-06-09 eess.SY cs.SY stat.ML Revised version

Koopman Subspace Pruning in Reproducing Kernel Hilbert Spaces via Principal Vectors

Dhruv Shah, Jorge Cortes

AI summary: Proposes Koopman subspace pruning via principal vectors in a reproducing kernel Hilbert space. This addresses the limitations of conventional Euclidean methods and improves subspace invariance through principal-angle computations.

Abstract

Data-driven approximations of the infinite-dimensional Koopman operator rely on finite-dimensional projections, where the predictive accuracy of the resulting models hinges heavily on the invariance of the chosen subspace. Subspace pruning systematically discards geometrically misaligned directions to enhance this invariance proximity, which formally corresponds to the largest principal angle between the subspace and its image under the operator. Yet, existing techniques are largely restricted to Euclidean settings. To bridge this gap, this paper presents an approach for computing principal angles and vectors to enable Koopman subspace pruning within a Reproducing Kernel Hilbert Space (RKHS) geometry. We first outline an exact computational routine, which is subsequently scaled for large datasets using randomized Nyström approximations. Based on these foundations, we introduce the Kernel-SPV and Approximate Kernel-SPV algorithms for targeted subspace refinement via principal vectors. Simulation results validate our approach.
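
The paper's contribution is computing principal angles in RKHS geometry. For orientation, the sketch below shows the standard Euclidean computation it generalizes: orthonormalize both bases, then take the SVD of $Q_A^\top Q_B$, whose singular values are the cosines of the principal angles. Using a plain matrix `M` as a stand-in for a Koopman operator is an illustrative assumption.

```python
import numpy as np


def principal_angles(A, B):
    """Principal angles (radians, ascending) between span(A) and span(B) in R^n,
    computed from orthonormal bases and the SVD of Q_A^T Q_B."""
    qa, _ = np.linalg.qr(A)
    qb, _ = np.linalg.qr(B)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)  # descending
    # Clip to [0, 1] to guard against round-off before arccos.
    return np.arccos(np.clip(cosines, 0.0, 1.0))
```

A largest angle near zero means span(A) is nearly invariant under `M` (compare `principal_angles(A, M @ A)`). For very small angles the arccos-of-cosine formula loses accuracy, and a sine-based variant is usually preferred.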

2601.07013 2026-06-09 stat.ML cs.LG Revised version

Conditional Normalizing Flows for Forward and Backward Joint State and Parameter Estimation

Luke S. Lagunowich, Guoxiang Grayson Tong, Daniele E. Schiavazzi

AI summary: For nonlinear, non-Gaussian systems, studies state filtering based on conditional normalizing flows, with conditional embeddings generated by an MLP, a Transformer, or Mamba-SSM. Introduces an optimal-transport kinetic loss to mitigate overparameterization and validates effectiveness on autonomous driving and COVID-19 joint estimation.

Abstract

Traditional filtering algorithms for state estimation, such as classical Kalman filtering, unscented Kalman filtering, and particle filters, degrade in performance when applied to nonlinear systems whose uncertainty follows arbitrary non-Gaussian and potentially multimodal distributions. This study reviews recent approaches to state estimation via nonlinear filtering based on conditional normalizing flows, where the conditional embedding is generated by standard MLP architectures, transformers, or selective state-space models (like Mamba-SSM). In addition, we test the effectiveness of an optimal-transport-inspired kinetic loss term in mitigating overparameterization in flows consisting of a large collection of transformations. We investigate the performance of these approaches on applications relevant to autonomous driving and patient population dynamics, paying special attention to how they handle time inversion and chained predictions. Finally, we assess the performance of various conditioning strategies for an application to real-world COVID-19 joint SIR system forecasting and parameter estimation.

2311.05009 2026-06-09 physics.comp-ph cs.NA math.NA stat.ML Revised version

Consensus-based adaptive sampling and approximation for high-dimensional energy landscapes

Liyao Lyu, Huan Lei

AI summary: Proposes a consensus-based framework that jointly optimizes the surrogate model and the sampling through posterior-residual-based adaptive sampling and phase-space exploration. This efficiently constructs free energy surfaces over high-dimensional collective variables.

Abstract

We present a consensus-based framework that unifies phase space exploration with posterior-residual-based adaptive sampling for surrogate construction in high-dimensional energy landscapes. Unlike standard approximation tasks where sampling points can be freely queried, physical systems with complex energy landscapes such as molecular dynamics (MD) do not have direct access to arbitrary sampling regions due to physical constraints and energy barriers. Surrogate construction therefore relies on the dynamical exploration of phase space, posing a significant numerical challenge. We formulate the problem as a minimax optimization that jointly adapts both the surrogate approximation and residual-enhanced sampling. The construction of free energy surfaces (FESs) for high-dimensional collective variables (CVs) of MD systems is used as a motivating example to illustrate the essential idea. Specifically, the maximization step establishes a stochastic interacting particle system that imposes adaptive sampling through both exploitation of a Laplace approximation of the max-residual region and exploration of uncharted phase space via temperature control. The minimization step updates the FES surrogate with the new sample set. Numerical results demonstrate the effectiveness of the present approach for biomolecular systems with up to 30 CVs. While we focus on FES construction, the framework applies generally to efficient surrogate construction for complex systems with high-dimensional energy landscapes.

2505.24066 2026-06-09 math.ST stat.ME stat.ML stat.TH Revised version

Adaptive Resolution for Finite-Rank Gaussian Processes

Jaehoan Kim, Anirban Bhattacharya, Debdeep Pati

AI summary: Studies posterior contraction rates of finite-rank Gaussian processes and proposes a framework based on locally supported basis expansions. Proves that, under a prior on the resolution parameter, these processes attain the same posterior contraction rate as the parent process, and develops a posterior sampler that jointly updates resolution and bandwidth.

Comments
48 pages, 5 figures

Abstract

Finite-rank approximations are widely used to scale Gaussian process (GP) regression, but their posterior behavior can differ from that of the corresponding parent GP prior. We study a class of finite-rank GP priors built from locally supported basis expansions with dependent Gaussian coefficients. Our framework covers finite-element approximations based on the stochastic partial differential equation (SPDE) representation of Matérn GPs and regular-grid GP interpolation schemes. We show that, with a suitable prior on the resolution parameter $N$, these finite-rank expansions inherit the same posterior contraction rate as the corresponding parent GP prior under the same bandwidth specification used for that parent prior. Consequently, the interpolation construction under a squared-exponential parent GP attains the minimax-optimal rate up to logarithmic factors under a hierarchical prior on the bandwidth parameter and on $N$, while the SPDE construction attains the same rate under a bandwidth scaling depending on the sample size and the smoothness of the true function, together with a prior on $N$. We also develop a posterior sampler for the hierarchical interpolation model that jointly updates the resolution and bandwidth parameters, and we provide numerical studies that support the theory.
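
The following is a minimal sketch of one finite-rank construction of the kind described: a regular-grid interpolation prior $f(x) = \sum_j \phi_j(x) Z_j$. Here $\phi_j$ are piecewise-linear hat functions on $N$ grid points in $[0, 1]$, and $Z \sim \mathcal{N}(0, K_{UU})$ uses a squared-exponential parent kernel. It follows that $\operatorname{Cov}(f(x), f(x')) = \Phi(x) K_{UU} \Phi(x')^\top$. The hat basis, the unit interval, and all names are assumptions; the paper's interpolation scheme and SPDE construction may differ.

```python
import numpy as np


def se_kernel(x, y, bandwidth):
    """Squared-exponential parent kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return np.exp(-0.5 * (np.subtract.outer(x, y) / bandwidth) ** 2)


def hat_basis(x, grid):
    """Piecewise-linear hat functions on a regular grid; row i holds the
    (at most two) nonzero interpolation weights for x[i]."""
    h = grid[1] - grid[0]
    dist = np.abs(np.subtract.outer(x, grid)) / h
    return np.clip(1.0 - dist, 0.0, None)


def finite_rank_cov(x, n_grid, bandwidth):
    """Covariance of f(x) = sum_j phi_j(x) Z_j with Z ~ N(0, K(grid, grid)):
    Phi(x) K Phi(x)^T, a rank <= n_grid approximation of the parent kernel."""
    grid = np.linspace(0.0, 1.0, n_grid)
    phi = hat_basis(np.asarray(x, dtype=float), grid)
    k_uu = se_kernel(grid, grid, bandwidth)
    return phi @ k_uu @ phi.T
```

At grid points the covariance equals the parent kernel exactly. Off-grid, the discrepancy shrinks as the resolution $N$ grows, which is the role the resolution parameter plays in the abstract.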

2505.13410 2026-06-09 math.ST math.PR stat.ML stat.TH Revised version

Joint stochastic localization and applications

Tom Alberts, Yiming Xu, Qiang Ye

AI summary: Extends stochastic localization to a joint framework for coupling probability measures and defines Eldan's α-distance. Studies its theoretical properties, develops efficient estimators, and applies it to distributional data analysis.

Comments
68 pages; substantial revision including correcting an error in Theorem 3.1 (iii) in the previous version and adding a few new results

Abstract

Stochastic localization is a pathwise analysis technique that has emerged as a powerful tool in high-dimensional probability and sampling. In this work, we extend stochastic localization to a joint framework for coupling probability measures and explore its applications in distributional data analysis. We first unify existing stochastic localization processes under Eldan's $α$-scheme and characterize their localization rates. Building on this, we introduce a joint scheme to couple probability measures via concurrent $α$-schemes driven by a shared Brownian motion. This construction is canonical and induces a family of metrics on the space of probability measures, which we call Eldan's $α$-distance. Alternative variants that extrapolate optimal Gaussian couplings to log-concave measures are also discussed. We study the theoretical properties of Eldan's $α$-distance, including its restriction to Gaussian measures and its behavior under affine transformations. For $α= 0$, we show it is topologically equivalent to the $2$-Wasserstein distance for measures supported on a common compact set; we also relate its weighted variants to linearized optimal transport in Wiener space and to score-matching objectives in training diffusion models. Computationally, we develop efficient estimators for Eldan's $α$-distance in the cases $α=0$ and $α=1/2$, with rigorous error guarantees for log-concave and finitely supported measures in the former setting and Gaussian measures in the latter. Finally, we apply Eldan's $α$-distance as a scalable surrogate for the $2$-Wasserstein distance to enable fast pairwise distance estimation and approximate computation of Wasserstein barycenters.

2408.02122 2026-06-09 stat.CO stat.AP stat.ME Revised version

Graph-Enabled Efficient Federated Bayesian Modeling

Chenyang Zhong, Shouxuan Ji, Tian Zheng

AI summary: Proposes the FLaG-MCMC framework, which encodes historical posterior samples into a low-dimensional latent space and builds a k-nearest-neighbor graph to enable efficient federated Bayesian learning. Its ability to approximate the global posterior is validated on opioid use disorder prevalence estimation and federated topic modeling.

Comments
20 pages, 7 figures

Abstract

Federated Bayesian modeling requires combining evidence from distributed users into a coherent global posterior while keeping users' raw data on-device. We propose Federated Latent Graph MCMC (FLaG-MCMC), a computationally efficient framework for federated learning in which historical posterior samples of a shared global parameter are encoded into a learned low-dimensional latent space, connected via a $k$-nearest-neighbor graph, and transferred sequentially to new users as a nonparametric prior. Each user runs graph-based MCMC in the latent space guided by their own likelihood, returns updated global samples to the server, and retains local latent variables on-device. We demonstrate FLaG-MCMC on Bayesian meta-analysis for opioid use disorder prevalence estimation and on federated topic modeling, where the federated posterior closely approximates the pooled full-data posterior for both global parameters and local user-level inference.
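
"Graph-based MCMC in the latent space" can be illustrated with a hypothetical sketch. It builds a symmetrized $k$-nearest-neighbor graph over stored latent samples, then runs a Metropolis-Hastings random walk whose proposal is a uniformly chosen neighbor. A degree correction makes the chain target node weights proportional to a user's likelihood. The helpers and the choice of target (uniform over stored samples, times likelihood) are assumptions, not FLaG-MCMC itself.

```python
import math
import random


def knn_graph(points, k):
    """Symmetrized k-nearest-neighbor graph over latent points (lists of floats)."""
    n = len(points)
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            nbrs[i].add(j)
            nbrs[j].add(i)  # symmetrize so every proposed move can be reversed
    return [sorted(s) for s in nbrs]


def graph_mh(nbrs, log_lik, n_iter, start=0, seed=0):
    """Metropolis-Hastings random walk on the graph targeting pi(i) ∝ exp(log_lik(i)).
    Proposes a uniformly random neighbor and corrects for unequal node degrees."""
    rng = random.Random(seed)
    cur, chain = start, []
    for _ in range(n_iter):
        prop = rng.choice(nbrs[cur])
        log_acc = (log_lik(prop) - log_lik(cur)
                   + math.log(len(nbrs[cur])) - math.log(len(nbrs[prop])))
        if math.log(1.0 - rng.random()) < log_acc:
            cur = prop
        chain.append(cur)
    return chain
```

The degree terms are needed because proposing uniformly among neighbors is not symmetric when nodes have different degrees. The graph should also be connected, otherwise the walk cannot reach every stored sample.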

7. Statistical Foundations of Machine Learning: 53 papers

2606.09820 2026-06-09 math.FA cs.LG math.PR q-fin.MF stat.ML New submission

Weighted universal approximation of differentiable maps on infinite-dimensional manifolds

Philipp Schmocker, Josef Teichmann

AI summary: Using a weighted Nachbin theorem, generalizes the universal approximation theorem for functional input neural networks to differentiable maps, including the approximation of derivatives. Applies the result to approximating non-anticipative functionals and path space functionals.

Comments
77 pages, 3 figures

Abstract

We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the derivatives. An FNN maps the input from a possibly infinite-dimensional weighted manifold to the real-valued hidden layer, on which a non-linear scalar activation function is applied, and then returns the output into a Banach space via some linear readouts. By proving a weighted Nachbin theorem, we establish a universal approximation theorem (UAT) for differentiable maps, which goes beyond the usual formulation on compact sets and also includes the approximation of the derivatives. This leads us to approximation results for non-anticipative functionals including the horizontal and vertical derivatives. As a further application, we show that linear functions of the signature are able to approximate path space functionals including their directional derivatives.

2606.09404 2026-06-09 stat.ML cs.AI cs.LG New submission

SAILS: Surrogate-based Analysis of Interactions via Local Effect Smooths

Timo Heiß, Julia Herbinger, Bernd Bischl, Giuseppe Casalicchio

AI summary: Proposes the SAILS framework, which analyzes pairwise interactions in black-box models through interpretable generalized additive model surrogates, enabling interaction detection, form categorization, and visualization.

Abstract

Feature interactions drive much of the predictive power of machine learning models, yet existing explanation methods only detect and quantify interactions without revealing their functional form, or visualize only restricted interaction types. We propose Surrogate-based Analysis of Interactions via Local effect Smooths (SAILS), a model-agnostic framework that analyzes pairwise interactions through interpretable generalized additive model (GAM) surrogates fitted to the local effects of a black-box model. For each interval of a feature of interest, the surrogate smooth terms isolate the interaction components on derivative level, enabling (i) interaction detection through a heuristic derived from significance tests on smooth terms, (ii) interaction form categorization into linear, product-separable, and non-product-separable types, and (iii) tailored, interpretable visualizations for each interaction type. We empirically validate the framework through controlled simulations and a real-world task, demonstrating its effectiveness for pairwise interactions, with limitations under strong feature correlations and higher-order interactions. SAILS fills a notable gap in the XAI toolbox, going beyond detection of interactions alone to characterizing their functional form.

2606.09191 2026-06-09 cs.LG stat.ML New submission

Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards

Joel Q. L. Chang

AI summary: Proves that an anchor-free nonparametric Thompson Sampling algorithm attains instance-dependent asymptotically optimal regret in risk-averse multi-armed bandits. The result applies to any continuous risk functional, requires only a continuity condition, and improves on earlier parametric approaches.

Comments
10 pages, 4 figures

Abstract

We prove that $ρ\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $ρ$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of $ρ$: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first as a stepping stone sharing the same proof structure. The key technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails), each projecting the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, holding all polynomial prefactors at fixed degree independent of sample size and breaking the super-exponential barrier that blocked prior proofs.
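
The general idea of nonparametric Thompson Sampling for a risk-averse objective works as follows. Each round, reweight every arm's observed rewards with Dirichlet$(1, \dots, 1)$ weights and evaluate a risk functional on the reweighted empirical distribution (here lower-tail CVaR at level `alpha`). Then pull the arm with the largest sampled value. This is a hedged sketch: the simplified variant, its initialization, and all names are assumptions. The paper's $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$ and its discretisation lemmas are not reproduced.

```python
import random


def cvar_lower(values, weights, alpha):
    """Lower-tail CVaR at level alpha of a discrete distribution: the average
    reward over the worst alpha fraction of probability mass."""
    acc, mass = 0.0, 0.0
    for v, w in sorted(zip(values, weights)):
        take = min(w, alpha - mass)
        acc += take * v
        mass += take
        if mass >= alpha:
            break
    return acc / alpha


def dirichlet_uniform(n, rng):
    """Dirichlet(1, ..., 1) sample via normalized Gamma(1, 1) draws."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    s = sum(g)
    return [x / s for x in g]


def npts_cvar(arms, horizon, alpha=0.25, seed=0):
    """Simplified nonparametric Thompson Sampling for a risk-averse objective.
    `arms` are reward samplers taking an RNG; returns the pull counts."""
    rng = random.Random(seed)
    history = [[arm(rng)] for arm in arms]  # pull each arm once to initialize
    pulls = [1] * len(arms)
    for _ in range(horizon - len(arms)):
        scores = [cvar_lower(h, dirichlet_uniform(len(h), rng), alpha) for h in history]
        a = max(range(len(arms)), key=scores.__getitem__)
        history[a].append(arms[a](rng))
        pulls[a] += 1
    return pulls
```

Swapping `cvar_lower` for another functional of a weighted empirical distribution (mean-variance, a Sharpe ratio) changes the risk preference without touching the sampling step.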

2606.09052 2026-06-09 cs.LG cs.AI cs.CL cs.GT stat.ML New submission

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

Siyu Chen, Miao Lu, Beining Wu, Heejune Sheen, Fengzhuo Zhang, Shuangning Li, Zhiyuan Li, Jose Blanchet, Tianhao Wang, Zhuoran Yang

AI summary: Proposes the INFUSER framework, in which a generator and a solver co-evolve. Using influence scores and DuGRPO optimization, it adaptively generates training data from a document pool and significantly improves model reasoning performance.

Comments
66 pages, 17 figures

Abstract

Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with only minimal external supervision. Yet existing methods either depend on extensively curated or teacher-generated training data, or, when the generator runs unsupervised, reward it by a difficulty heuristic that need not improve the solver. We introduce INFUSER, an iterative co-training framework with two co-evolving roles: a Generator that drafts questions and reference golden answers from a pool of unstructured, automatically collected documents, and a Solver that improves by training on them. The solver is trained with standard correctness rewards against the generator-provided answers, while the generator is rewarded by an optimizer-aware influence score that measures whether each proposed question would actually improve the solver on the target distribution. Because this continuous, noisy influence score is poorly served by standard GRPO, we propose DuGRPO, a dual-normalized variant of GRPO, for generator training. Together, these turn the document pool into an adaptive curriculum that favors questions useful to the current solver, not just hard ones. On Qwen3-8B-Base, INFUSER outperforms strong self-evolution baselines with over 20% relative improvement on Olympiad and SuperGPQA benchmarks, and an 8B INFUSER co-evolving generator outperforms a frozen 32B thinking generator on math and coding. Ablations confirm each design choice is necessary, and two extensions, applying INFUSER to an instruction-finetuned anchor and augmenting it with rule-verifiable RLVR data, further demonstrate the flexibility and generalizability of the framework. Code is available at https://github.com/FFishy-git/INFUSER.
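
DuGRPO is described only as a dual-normalized variant of GRPO. For reference, the sketch below shows the standard GRPO group-relative advantage that such a variant starts from: each reward in a group of responses to the same prompt is standardized by the group mean and standard deviation. The population standard deviation and `eps` are assumptions, since implementations differ. The dual normalization itself is not specified in the abstract, so it is not attempted here.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Standard GRPO group-relative advantages: standardize the rewards of all
    responses sampled for the same prompt by the group mean and std."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]
```

Standardization maps any nonzero spread to unit scale. Rewards of 0.501, 0.500, 0.499 receive almost the same advantages as 1.0, 0.5, 0.0, which is one way small differences in a continuous, noisy score can be amplified.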

2606.09012 2026-06-09 cs.LG cs.AI math.OC stat.ML New submission

Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin

Hanyang Li, Jianhao Ma, Ying Cui

Affiliations: University of California, Berkeley; University of Pennsylvania

AI总结 提出统一几何框架解释后训练量化失败与量化感知训练恢复机制,揭示量化感知训练通过梯度感知谷壁使量化点返回低损失盆地。

详情
Comments
31 pages, 10 figures
AI中文摘要

后训练量化(PTQ)将训练好的全精度模型转换为低比特权重,无需任务级重训练,而量化感知训练(QAT)将量化纳入训练循环。尽管PTQ在中等比特宽度下高效且通常准确,但在激进比特宽度下可能急剧失败;QAT成本更高但通常能恢复丢失的精度。我们提出了一个统一的几何框架,同时解释PTQ失败和QAT恢复。我们将全精度训练建模为在更宽的\emph{山谷}内沿着低损失\emph{河流}:河流的法向邻域形成近乎平坦的\emph{盆地},而离开该盆地会导致损失急剧增加。当量化网格与盆地宽度相当时,局部PTQ目标(包括舍入和基于Hessian的二阶重建)可能选择盆地外的高损失部署量化点,即使附近存在低损失量化点。在这种情况下,基于直通估计器的QAT具有有用的偏差:它在部署的量化权重处评估梯度,同时更新潜在的全精度权重,导致梯度感知谷壁并获得向内分量,从而将后续量化迭代引导回盆地。我们通过局部景观模型形式化这一机制,构造了几何PTQ失败模式,并在局部量化器兼容性假设下证明了有限时间QAT恢复。在多种神经网络量化方案下的视觉和语言模型实验,证实了预测的PTQ跨盆地失败以及相应的QAT恢复机制。

English abstract

Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-level retraining, while quantization-aware training (QAT) incorporates quantization into the training loop. Although PTQ is efficient and often accurate at moderate bitwidths, it can fail sharply at aggressive bitwidths; QAT is more expensive but can often recover the lost accuracy. We propose a unified geometric framework that explains both PTQ failure and QAT recovery. We model full-precision training as following a low-loss \emph{river} inside a wider \emph{valley}: a normal neighborhood of the river forms a nearly flat \emph{basin}, while leaving this basin incurs a sharp loss increase. When the quantization grid is comparable to the basin width, local PTQ objectives, including rounding and Hessian-based second-order reconstruction, can select a high-loss deployed quantized point outside the basin even when nearby low-loss quantized points exist. In this regime, straight-through-estimator-based QAT has a useful bias: it evaluates gradients at the deployed quantized weights while updating latent full-precision weights, causing the gradient to sense the valley wall and acquire an inward component that steers subsequent quantized iterates back into the basin. We formalize this mechanism through a local landscape model, construct a geometric PTQ failure mode, and prove finite-time QAT recovery under local quantizer-compatibility assumptions. Experiments across vision and language models under multiple neural-network quantization schemes corroborate the predicted basin-crossing failure of PTQ and the corresponding recovery mechanism of QAT.
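The "useful bias" of straight-through-estimator QAT described above can be sketched on a toy problem. This is a minimal illustration, not the authors' local landscape model: it assumes a uniform round-to-nearest quantizer and a quadratic loss, and the names `quantize` and `ste_qat` are hypothetical. The key line evaluates the gradient at the deployed quantized weights `w_q` while updating the latent full-precision weights `w`.

```python
import numpy as np


def quantize(w: np.ndarray, step: float) -> np.ndarray:
    """Round-to-nearest onto a uniform grid with spacing `step` (assumed quantizer)."""
    return step * np.round(w / step)


def ste_qat(grad_fn, w_init, step: float, lr: float = 0.1, iters: int = 200):
    """Straight-through-estimator QAT on a toy loss.

    The gradient is evaluated at the deployed quantized weights, but the
    update is applied to the latent full-precision weights (the STE treats
    d quantize(w) / dw as the identity).
    """
    w = np.array(w_init, dtype=float)
    for _ in range(iters):
        w_q = quantize(w, step)      # deployed (quantized) point
        w = w - lr * grad_fn(w_q)    # gradient sensed at w_q, applied to latent w
    return w, quantize(w, step)


# Toy quadratic loss 0.5 * ||w - target||^2 whose minimizer lies between grid points.
target = np.array([0.3, -0.7])
latent, deployed = ste_qat(lambda w: w - target, np.zeros(2), step=0.5)
print(latent, deployed)
```

In this toy case the latent weights settle near a rounding boundary, and the deployed point stays on one of the grid points adjacent to the minimizer.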

2606.09002 2026-06-09 stat.ML cs.LG math.ST stat.TH new submission

Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees

带有到达臂的多臂老虎机:顺序筛选、动态遗憾与次线性保证

Deqi Zheng, Xiaoyang Xu, Yuhong Yang

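AI summary: For stochastic multi-armed bandits whose set of available arms grows over time, proposes UCB-AA, an elimination-based algorithm that preliminarily screens newly arrived arms and accounts for arrival information discrepancy and a drifting benchmark, achieving sublinear dynamic-regret bounds.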

Comments
24 pages, 4 figures
AI-generated Chinese abstract

我们研究了一个随机多臂老虎机问题,其中可用臂的集合随时间扩展。这一设置出现在当新动作或治疗在正在进行的研究中变得可用时的顺序实验中,使得对事后单一最佳臂的遗憾不恰当。我们转而评估相对于当前可用最佳臂的性能,从而为到达臂环境引入了一个动态遗憾准则。为了解决到达信息差异(AID)和漂移基准(DB)带来的挑战,我们提出了用于到达臂的UCB(UCB-AA),这是一个基于消除的过程,并包含一个辅助的初步筛选步骤,用于新到达的臂在与现有臂完全竞争之前。我们证明UCB-AA获得的遗憾界明确依赖于到达过程,在间隙演化的正则条件下实现了次线性动态遗憾,并允许对未知时间范围进行在线扩展。仿真结果表明,UCB-AA减少了浪费的拉取次数,保持了较小的活动臂集,同时保持了有竞争力的遗憾性能。

English abstract

We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight inappropriate. We instead evaluate performance relative to the best arm currently available, leading to a dynamic-regret criterion for arriving-arm environments. To address the resulting challenges of arrival information discrepancy (AID) and a drifting benchmark (DB), we propose UCB for Arriving Arms (UCB-AA), an elimination-based procedure with an auxiliary preliminary screening step for newly arrived arms before full competition with incumbent arms. We show that UCB-AA attains regret bounds that depend explicitly on the arrival process, achieves sublinear dynamic regret under regularity conditions on gap evolution, and admits an online extension for unknown horizons. Simulation results show that UCB-AA reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.
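The dynamic-regret criterion above can be made concrete. The sketch below only computes the regret of a given pull sequence against the best arm available in each round; it is not UCB-AA itself, and the function name and argument layout (`means`, `arrival`, `pulls`) are hypothetical.

```python
from typing import Sequence


def dynamic_regret(means: Sequence[float], arrival: Sequence[int],
                   pulls: Sequence[int]) -> float:
    """Cumulative regret against the best arm available at each round.

    means[a]:   expected reward of arm a
    arrival[a]: first round in which arm a can be pulled
    pulls[t]:   arm pulled in round t
    """
    total = 0.0
    for t, a in enumerate(pulls):
        if arrival[a] > t:
            raise ValueError(f"arm {a} pulled in round {t} before it arrived")
        # Benchmark: best arm among those that have already arrived.
        best_now = max(m for m, s in zip(means, arrival) if s <= t)
        total += best_now - means[a]
    return total


# Arm 1 (mean 0.9) only becomes available in round 2.
print(dynamic_regret([0.5, 0.9], [0, 2], [0, 0, 0, 1]))
```

A static benchmark that compares against arm 1 in every round would also charge 0.4 in rounds 0 and 1, before arm 1 existed; the dynamic criterion charges only round 2.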

2606.08934 2026-06-09 cs.LG stat.AP stat.CO stat.ME stat.ML new submission

Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory

递归神经网络中的反向相干性与隐藏状态稳定性:拟逆鞅理论

Yuan-chin Ivan Chang

Affiliations: Institute of Statistical Science, Academia Sinica

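AI summary: Introduces backward coherence, uses quasi-reverse-martingale theory to prove that the hidden-state sequence converges almost surely, and designs a regularization method that reaches stability earlier and with lower error across several tasks.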

AI-generated Chinese abstract

递归神经网络维护一个隐藏状态 $h_t$,但其概率意义通常不明确。我们通过\emph{反向相干性}研究隐藏状态稳定性:即通过学习的反向投影器 $g_ϕ$ 从 $h_{t+1}$ 重构 $h_t$ 的程度。在收缩性和可和反向漂移条件下,隐藏状态序列构成拟逆鞅。这导致几乎必然收敛、混合下的速率、可解释的极限表示、有限路径停止时间以及时间一致置信序列的理论框架。模拟支持该理论。反向相干性正则化将经验拟鞅总和 $\hat Q$ 降低 $43$--$58\%$,比未正则化的 RNN 早 $28$--$44\%$ 达到稳定,并提供与几何界一致的跟踪误差恢复。额外测试证实回波状态遗忘率受 $ρ$ 限制,并验证增量总和管 $R_t$ 具有 $100\%$ 同时覆盖率,尽管 $R_t$ 是保守的;实践中,缺陷尾代理 $\hat Q_t$ 是更有用的监控指标。反向相干性损失也等价于在高斯反向模型中最小化 Kullback--Leibler 散度,将该方法与变分推断联系起来。扩展涵盖 $ϕ$-混合输入、变点检测和有限样本集中度。三项真实数据研究进一步验证了该方法。在 PhysioNet 2012 ICU 数据上,逆鞅 RNN (RMRNN) 与 RNN 的死亡率预测 AUC 相当,同时提前 13 小时达到稳定表示。在 FRED-MD 上,它在概念漂移下将一个月前预测误差降低约四倍。在 UCI 人类活动识别上,它保持较低的后转换跟踪误差并具有几何衰减。这些保证在所述假设下成立;不声称普适性。

English abstract

Recurrent neural networks maintain a hidden state $h_t$, but its probabilistic meaning is often unclear. We study hidden-state stability through \emph{backward coherence}: the extent to which $h_t$ can be reconstructed from $h_{t+1}$ by a learned backward projector $g_ϕ$. Under contraction and summable backward drift, the hidden-state sequence forms a quasi-reverse-martingale. This yields almost-sure convergence, rates under mixing, an interpretable limiting representation, finite pathwise stopping times, and a theoretical framework for time-uniform confidence sequences. Simulations support the theory. Backward-coherence regularisation reduces the empirical quasi-martingale total $\hat Q$ by $43$--$58\%$, reaches stability $28$--$44\%$ earlier than an unregularised RNN, and gives tracking-error recovery consistent with geometric bounds. Additional tests confirm echo-state forgetting rates bounded by $ρ$ and verify the increment-sum tube $R_t$ with $100\%$ simultaneous coverage, although $R_t$ is conservative; in practice, the defect-tail proxy $\hat Q_t$ is the more useful monitor. The backward-coherence loss is also equivalent to minimising a Kullback--Leibler divergence in a Gaussian backward model, linking the method to variational inference. Extensions cover $ϕ$-mixing inputs, change-point tracking, and finite-sample concentration. Three real-data studies further validate the approach. On PhysioNet 2012 ICU data, the Reverse Martingale RNN (RMRNN) matches RNN mortality-prediction AUC while reaching stable representations 13 hours earlier. On FRED-MD, it reduces one-month-ahead forecast error by a factor of about four under concept drift. On UCI Human Activity Recognition, it maintains lower post-transition tracking error with geometric decay. The guarantees apply under the stated assumptions; universality is not claimed.
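Backward coherence, as defined above, measures how well $h_t$ can be recovered from $h_{t+1}$. The sketch below is an assumption-laden stand-in: where the paper learns a projector $g_ϕ$ and uses it as a regulariser, this code fits an affine least-squares projector to a recorded state trajectory and reports its mean squared error. `backward_coherence_mse` and `run_rnn` are hypothetical names.

```python
import numpy as np


def backward_coherence_mse(H: np.ndarray) -> float:
    """MSE of the best affine backward map g(h_{t+1}) = A h_{t+1} + b ~ h_t.

    H has shape (T, d); row t is the hidden state h_t.
    """
    prev, nxt = H[:-1], H[1:]
    X = np.hstack([nxt, np.ones((len(nxt), 1))])     # affine features of h_{t+1}
    coef, *_ = np.linalg.lstsq(X, prev, rcond=None)  # least-squares projector
    return float(np.mean((prev - X @ coef) ** 2))


def run_rnn(W, U, inputs, h0):
    """Roll out h_{t+1} = tanh(W h_t + U x_t) and return all states."""
    states = [h0]
    for x in inputs:
        states.append(np.tanh(W @ states[-1] + U @ x))
    return np.array(states)


rng = np.random.default_rng(0)
d, T = 4, 300
W = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)   # small weights: contractive
U = rng.standard_normal((d, 2))
H = run_rnn(W, U, rng.standard_normal((T, 2)), np.zeros(d))
print(backward_coherence_mse(H))
```

For an invertible linear recurrence without inputs, the backward map is exact and the error is numerically zero. Fresh inputs and the `tanh` nonlinearity make $h_t$ only partly recoverable from $h_{t+1}$.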

2606.08854 2026-06-09 cs.LG cs.AI cs.CL stat.ML new submission

sGPO: Trading Inference FLOPs for Training Efficiency in RLVR

sGPO: 在RLVR中用推理FLOPs换取训练效率

Shivchander Sudalairaj, Kai Xu, Akash Srivastava, Giorgio Giannone

Affiliations: Red Hat; IBM

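AI summary: Proposes sGPO, which estimates query difficulty with a small amount of inference compute and allocates the training rollout budget adaptively, cutting total training compute by a factor of three while matching or improving performance.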

AI-generated Chinese abstract

标准的可验证奖励强化学习(RLVR)训练为每个查询分配固定的展开预算,而不考虑每个查询的难度对当前策略的意义。这导致两种对称的失败模式:简单查询产生接近零的优势,因为策略已经解决了它们;而无法解决的查询不产生信号,因为策略从未解决它们。这两种情况都浪费了训练FLOPs,而没有贡献学习梯度。我们引入了排序组策略优化(sGPO),一种计算高效的策略,用少量推理FLOPs换取大量减少浪费的训练FLOPs。关键见解是,廉价的推理计算可以作为查询难度的单一离线代理。通过在初始策略下为每个查询生成一小批并行样本,我们获得了模型感知的经验成功率。这激励将训练展开组大小设置为该成功率的倒数,这是一个实用的规则,通过从每个生成的展开中提取最大优势来最大化样本效率。这一单次性能分析过程同时驱动数据过滤(移除琐碎查询和子采样无法解决的查询)、自适应组大小分配和课程构建(从易到难调度查询)。sGPO匹配或超过基线性能,同时将总训练计算量减少三倍,包括前期的推理性能分析成本。

English abstract

Standard Reinforcement Learning with Verifiable Rewards (RLVR) training allocates a fixed rollout budget to every query, without regard for what each query's difficulty means for the current policy. This leads to two symmetric failure modes: easy queries produce near-zero advantage because the policy already solves them, while unsolvable queries produce no signal because the policy never solves them. Both regimes waste training FLOPs without contributing to a learning gradient. We introduce sorted Group Policy Optimization (sGPO), a compute-efficient strategy that trades a small budget of inference FLOPs for a large reduction in wasted training FLOPs. The key insight is that cheap inference compute can serve as a single offline proxy for query difficulty. By generating a small batch of parallel samples per query under the initial policy, we obtain a model-aware empirical success rate. This motivates setting the training rollout group size to the inverse of this success rate, a practical rule that maximizes sample efficiency by extracting the most advantage per generated rollout. This single profiling pass simultaneously drives data filtering (removing trivial queries and sub-sampling unsolvable ones), adaptive group size allocation, and curriculum construction (scheduling queries from easy to hard). sGPO matches or exceeds baseline performance while reducing total training compute by a factor of three, with the upfront inference profiling cost included.
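The profiling pass above can be sketched as a function that turns per-query success rates into a training plan. Several details here are assumptions not stated in the abstract: the group-size clamp `[min_group, max_group]`, the deterministic rule that keeps one in every `keep_unsolvable_every` unsolvable queries, and the name `plan_rollouts`.

```python
import math


def plan_rollouts(success_rates, min_group=2, max_group=64, keep_unsolvable_every=4):
    """Turn profiled per-query success rates into (query, group size) pairs.

    - p == 1: dropped (every rollout succeeds, so group advantages are zero)
    - p == 0: sub-sampled, kept ones get the maximum group size
    - otherwise: group size ceil(1 / p), clamped to [min_group, max_group]
    The plan is ordered easy (high p) to hard (low p) as a curriculum.
    """
    plan = []
    n_unsolvable = 0
    for qid, p in success_rates.items():
        if p >= 1.0:
            continue
        if p <= 0.0:
            n_unsolvable += 1
            if (n_unsolvable - 1) % keep_unsolvable_every:
                continue  # keep only every k-th unsolvable query
            group = max_group
        else:
            group = min(max_group, max(min_group, math.ceil(1.0 / p)))
        plan.append((qid, p, group))
    plan.sort(key=lambda item: -item[1])  # curriculum: easy -> hard
    return [(qid, group) for qid, _, group in plan]


rates = {"a": 1.0, "b": 0.5, "c": 0.125, "d": 0.0, "e": 0.25}
print(plan_rollouts(rates))
```

Setting the group size to roughly $1/\hat p$ means a query with success rate $\hat p$ is expected to yield about one success per group, which is what gives GRPO-style advantages a non-zero signal.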

2606.08850 2026-06-09 cs.LG cs.AI cs.CL stat.ML new submission

Intrinsic Selection and Particle Resampling for Inference-Time Scaling Beyond Domain Verifiability

内在选择与粒子重采样:超越领域可验证性的推理时扩展

Giorgio Giannone, Mustafa Eyceoz, Shabana Baig, Shivchander Sudalairaj, Anna C. Doris, Faez Ahmed, Akash Srivastava, Kai Xu

Affiliations: MIT; Red Hat; IBM

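AI summary: Proposes inference-time scaling methods driven by intrinsic statistics of parallel sample sets (length-adjusted tail entropy), using post-hoc candidate ranking and step-level resampling to improve performance on open-ended tasks without external verification.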

Comments
preprint
AI-generated Chinese abstract

推理时扩展(ITS)在数学和编程等可验证领域取得了很大成功,其中廉价验证使得可扩展输出选择成为可能。然而,将ITS扩展到容易发生系统性失败的任务——由错误初始假设或未满足的多维约束驱动——通常依赖于昂贵的外部求解器或脆弱的基于模型的验证器。我们的关键洞察是,并行样本集的内在统计量,特别是长度调整尾熵,提供了关于解质量的稳健判别信号,而无需访问真实标签。至关重要的是,这些统计量作为自适应计算分配的难度门控,动态地将问题路由到不同的扩展规模。首先,内在选择(iS)事后对候选进行排序,在三个领域匹配基于共识的算法,并将工程设计选择性能比pass@1基线提高20%。其次,内在粒子滤波(iPF)将其推广到步骤级重采样,引导生成走向高置信度推理轨迹,在困难数学问题上平均将pass@1提高6.1个百分点。最后,粒子蒸馏(dPF)通过早期logit混合和KL引导重采样注入特权指导,引导生成绕过系统性推理错误以满足专家评分标准,在复杂临床响应上获得高达26.5%的提升。我们的流程无缝适用于通用、领域专用和多模态架构,成功将ITS扩展到开放领域,而无需训练奖励模型或精确的真实标签验证。

English abstract

Inference-Time Scaling (ITS) has largely succeeded in verifiable domains like math and coding, where cheap verification enables scalable output selection. However, extending ITS to tasks prone to systematic failure, driven by faulty initial assumptions or unmet multidimensional constraints, typically relies on costly external solvers or brittle, model-based verifiers. Our key insight is that the intrinsic statistics of parallel sample sets, specifically length-adjusted tail entropy, provide a robust discriminative signal for solution quality without access to ground truth. Crucially, these statistics serve as a difficulty gate for adaptive compute allocation, dynamically routing problems across scaling regimes. First, Intrinsic Selection (iS) ranks candidates post-hoc, matching consensus-based algorithms across three domains and improving engineering design selection by 20% over pass@1 baselines. Second, Intrinsic Particle Filtering (iPF) generalizes this to step-level resampling, guiding generation toward high-confidence reasoning trajectories to improve pass@1 by 6.1 points on average on hard math problems. Finally, Particle Distillation (dPF) injects privileged guidance via early logit blending and KL-guided resampling, steering generation past systematic reasoning errors to satisfy expert rubrics, yielding up to 26.5% gains on complex clinical responses. Our pipeline applies across broad-purpose, domain-specialized, and multimodal architectures, extending ITS to open-ended domains without requiring trained reward models or exact ground-truth verification.

2606.08691 2026-06-09 cs.LG stat.ME new submission

Hierarchical Projection for Adaptive Knowledge Transfer

自适应知识迁移的分层投影

Samhita Pal, Tian Gu

Affiliations: Vanderbilt University Medical Center; Columbia University

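AI summary: Proposes ProjectionTL, which combines hierarchical Bayesian modeling with adaptive projection to perform source selection and feature selection jointly, mitigating negative transfer and improving the accuracy, stability, and interpretability of cross-domain learning.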

AI-generated Chinese abstract

现代数据驱动应用越来越多地涉及从多个异质源中学习,其中目标数据集有限,但跨域可获得相关信息。当相关性变化或存在虚假信号时,简单组合这些源会降低性能,这对可信的跨域学习构成了根本性挑战。我们提出了投影迁移学习(ProjectionTL),这是一个统一框架,将分层贝叶斯建模与自适应投影相结合,用于选择性知识迁移。关键思想是在两个层次上解耦迁移:首先,我们构建一个源引导的分层先验,通过数据驱动的权重聚合跨源信息,捕捉每个源与目标之间的全局对齐;其次,我们通过后验投影步骤在特征层面细化这种借用,选择性地保留与目标信号局部一致的坐标。这种两阶段设计使该方法能够同时进行源选择和特征选择,从而减轻负迁移,同时保持可解释性。ProjectionTL提供了一种跨域整合异质数据的原则性方法,桥接了统计建模和现代机器学习范式,以实现鲁棒且可解释的迁移。通过模拟和真实世界的生物医学应用,我们证明了与现有方法相比,准确性、稳定性和可解释性的提升。我们的框架为高维设置下的可信跨域学习提供了一种可扩展且通用的策略。

English abstract

Modern data-driven applications increasingly involve learning from multiple heterogeneous sources, where a target dataset is limited but related information is available across domains. Naively combining these sources can degrade performance when relevance varies or spurious signals are present, posing a fundamental challenge for trustworthy cross-domain learning. We propose Projection Transfer Learning (ProjectionTL), a unified framework that integrates hierarchical Bayesian modeling with adaptive projection for selective knowledge transfer. The key idea is to decouple transfer at two levels: first, we construct a source-guided hierarchical prior that aggregates information across sources using data-driven weights, capturing global alignment between each source and the target; second, we refine this borrowing through a posterior-projection step that operates at the feature level, selectively retaining coordinates that exhibit local agreement with the target signal. This two-stage design enables the method to simultaneously perform source selection and feature selection, thereby mitigating negative transfer while preserving interpretability. ProjectionTL provides a principled approach to integrating heterogeneous data across domains, bridging statistical modeling and modern machine learning paradigms for robust and interpretable transfer. Through simulations and real-world biomedical applications, we demonstrate improved accuracy, stability, and interpretability compared to existing methods. Our framework offers a scalable and generalizable strategy for trustworthy cross-domain learning in high-dimensional settings.

2606.08679 2026-06-09 stat.ML cs.CL cs.LG stat.ME new submission

Rank Intervals for Leaderboards: A Hierarchical Framework for Model Evaluation

排行榜的排名区间:模型评估的分层框架

Bitya Neuhof, Yuval Benjamini

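AI summary: Proposes a hierarchical framework that quantifies uncertainty in model rankings with statistical guarantees, using task-level confidence intervals and leaderboard-level prediction intervals.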

AI-generated Chinese abstract

预训练模型通常在多任务排行榜上评估,以衡量其在不同场景中的适用性。然而,当前将跨任务性能聚合为排行榜级排名的方法并未解决任务层面的不确定性和变异性。尽管近期工作提出了基于区间的模型排名,但从单个任务到排行榜级排名的不确定性的原则性聚合仍未解决,且模型在不同任务上的性能变化常被掩盖。本文引入一个分层框架,在两层上构建具有统计保证的模型排名区间:通过成对比较构建任务级排名置信区间,以及使用共形方法构建排行榜级排名预测区间。这使得能够对每个观测任务和新潜在任务进行可靠的模型排名量化。在模拟数据以及TabArena和PromptEval(MMLU)基准上的实验表明,我们的方法产生统计有效且信息丰富的区间,从而在排行榜上实现可靠、具有不确定性意识的模型排名。

English abstract

Pretrained models are often evaluated on multi-task leaderboards to measure their applicability in diverse contexts. However, current methods for aggregating performance across tasks into leaderboard-level rankings do not address the uncertainty and variability at the task level. While recent works have proposed interval-based model rankings, the principled aggregation of uncertainty from individual tasks to leaderboard-level rankings remains unaddressed, and variation in models' performance across tasks is frequently obscured. In this work, we introduce a hierarchical framework that constructs model rank intervals with statistical guarantees at both levels: task-level rank confidence intervals from pairwise comparisons, and leaderboard-level rank prediction intervals using a conformal approach. This enables reliable quantification of model rank for each observed task and for new potential tasks. Experiments on simulated data and the TabArena and PromptEval (MMLU) benchmarks show that our method yields statistically valid and informative intervals, enabling reliable, uncertainty-aware model ranking on leaderboards.
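The leaderboard-level idea, an interval for a model's rank on a new task, can be illustrated with a generic order-statistic (split-conformal-style) construction. This is not necessarily the authors' procedure. It assumes only that the new task's rank is exchangeable with the $n$ observed task ranks, and `rank_prediction_interval` is a hypothetical name.

```python
import math


def rank_prediction_interval(task_ranks, n_models, alpha=0.1):
    """Order-statistic interval for a model's rank on a new, exchangeable task.

    With k = floor(alpha * (n + 1) / 2), the interval [r_(k), r_(n+1-k)]
    covers the new task's rank with probability at least 1 - alpha. When
    k == 0 there are too few tasks, and the interval falls back to the
    trivial range [1, n_models].
    """
    r = sorted(task_ranks)
    n = len(r)
    k = math.floor(alpha * (n + 1) / 2)
    if k == 0:
        return 1, n_models
    return r[k - 1], r[n - k]   # k-th smallest and (n + 1 - k)-th smallest


# Ranks of one model on 15 observed tasks (out of 10 models).
ranks = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7, 8]
print(rank_prediction_interval(ranks, n_models=10, alpha=0.25))
```

Coverage follows because, under exchangeability, the new rank falls below $r_{(k)}$ or above $r_{(n+1-k)}$ with probability at most $k/(n+1)$ each. Ties only make the inclusive interval more conservative.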

2606.08390 2026-06-09 cs.LG stat.ML new submission

When Are Neural Interaction Discoveries Real? Identifiability, Recoverability, and a Pre-Fit Diagnostic

神经交互发现何时是真实的?可辨识性、可恢复性与拟合前诊断

Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge

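AI summary: Studies whether interactions discovered by neural time-series models are real, develops an identifiability theory based on the geometry of the input support, and proposes effective rank as a pre-fit diagnostic.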

Comments
11 pages, 3 figures
AI-generated Chinese abstract

当神经时间序列模型报告一个变量调节另一个变量对目标的影响时,发现的交互是数据的属性还是模型灵活性的伪影?我们认为这本质上是一个可辨识性问题,由观测输入支持的几何结构决定,而非特定的神经架构。我们在神经加性向量自回归(GNAVAR)的乘法门控扩展中研究该问题,其中源贡献由其他滞后变量调节。我们表明表示能力不等于可辨识性:依赖输入会在边特定交互项之间引入泄漏,低维支持允许不同的交互分解,这些分解在观测数据上一致但在其他地方不同。然后,我们在显式支持条件下(包括共享调节器设置)证明了归一化最小GNAVAR分解的总体可辨识性定理。该理论产生了一个简单的面向实践者的诊断:联合滞后块协方差的有效秩在拟合前预测对于给定候选集交互恢复是否可行。当候选集未知时,双种子稳定性检查提供了实用的操作测试。相同的支持条件将经验结果组织成理论预测的三种状态。我们的结果表明,交互可恢复性取决于支持几何,有效秩提供了实用的拟合前诊断,并且独立拟合之间的不稳定性是非可辨识交互发现的特征标志。可辨识性现象、支持条件和不稳定性标志是模型无关的;GNAVAR是使它们可证明的载体。

English abstract

When a neural time-series model reports that one variable modulates another's effect on a target, is the discovered interaction a property of the data or an artifact of model flexibility? We argue that this is fundamentally a question of identifiability, governed by the geometry of the observed input support rather than by the specific neural architecture. We study the problem in a multiplicative-gating extension of neural additive vector autoregression (GNAVAR), in which source contributions are modulated by other lagged variables. We show that representational capacity is not identifiability: dependent inputs induce leakage between edge-specific interaction terms, and low-dimensional support permits distinct interaction decompositions that agree on the observed data while differing elsewhere. We then prove a population identifiability theorem for normalized minimal GNAVAR decompositions under explicit support conditions, including settings with shared modulators. The theory yields a simple practitioner-facing diagnostic: the effective rank of the joint lag-block covariance predicts, before fitting, whether interaction recovery is feasible for a given candidate set. When the candidate set is unknown, a two-seed stability check provides a practical operational test. The same support condition organizes empirical outcomes into the three states predicted by the theory. Our results show that interaction recoverability depends on support geometry, that effective rank provides a practical pre-fit diagnostic, and that instability across independent fits is a characteristic signature of non-identifiable interaction discovery. The identifiability phenomenon, the support condition, and the instability signature are model-agnostic; GNAVAR is the vehicle that makes them provable.
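The pre-fit diagnostic above can be sketched as follows: build the joint lag-block design, take its covariance, and compute an effective rank. The abstract does not define effective rank. This code assumes the entropy-based definition $\exp(H(p))$, with $p$ the normalized eigenvalues, and does not reproduce any threshold the paper may use. `lag_block_design` and `effective_rank` are hypothetical names.

```python
import numpy as np


def lag_block_design(X: np.ndarray, lags: int) -> np.ndarray:
    """Stack lags 1..lags of a (T, d) series into a (T - lags, d * lags) matrix."""
    T = X.shape[0]
    return np.hstack([X[lags - l:T - l] for l in range(1, lags + 1)])


def effective_rank(C: np.ndarray, tol: float = 1e-10) -> float:
    """Entropy effective rank exp(H(p)), with p the normalized eigenvalues of C."""
    eig = np.clip(np.linalg.eigvalsh(C), 0.0, None)
    eig = eig[eig > tol * eig.max()]   # drop numerically zero directions
    p = eig / eig.sum()
    return float(np.exp(-np.sum(p * np.log(p))))


rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
X_dep = X.copy()
X_dep[:, 2] = X_dep[:, 0]              # third series duplicates the first
for series in (X, X_dep):
    C = np.cov(lag_block_design(series, lags=2), rowvar=False)
    print(effective_rank(C))
```

With three independent series and two lags, the covariance is close to full rank (about 6). Duplicating one series collapses the support to at most 4 dimensions, which is the kind of degenerate geometry the abstract links to non-identifiable interactions.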

2606.08388 2026-06-09 cs.LG math.OC stat.ML new submission

The Spectral Dynamics and Noise Geometry of Muon

Muon的谱动力学与噪声几何

Pierfrancesco Beneventano, Mahmoud Abdelmoneum, Tomaso Poggio

Affiliations: Massachusetts Institute of Technology

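AI summary: Studies the Muon optimizer, which replaces a matrix gradient by its polar factor. The paper proves that this biases updates toward a flat spectrum, derives singular-value dynamics in underdetermined regression, and shows experimentally that the benefit depends on how many spectral directions need to remain active.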

Comments
24 pages, 11 figures
AI-generated Chinese abstract

Muon将矩阵梯度$G=UΣV^\top$替换为其极因子$UV^\top$。这保留了梯度选择的奇异方向,但使更新谱平坦。我们研究此操作产生的优化偏置。在显式对齐假设下,我们证明在利用梯度奇异方向且不适应当前权重谱的有界更新中,极更新是单步熵最大化的选择。在欠定回归模型中,我们推导了连续时间Muon的精确奇异值动力学,并识别出一个依赖于测量的条件,在该条件下归一化谱趋向于相等的非零奇异值。这种几何也排除了常见的低秩解释:在固定Frobenius范数下,Muon的区分状态具有平坦谱,而核范数最小化则偏好谱集中。受控矩阵感知实验将效应与简单梯度缩放分离,表明范数匹配的梯度下降不能复现Muon,并在广泛消融中恢复预测的平坦化趋势。在小型NanoGPT预训练中,Muon保持稳定秩,具有宽学习率平台,并相对于AdamW改善验证损失;在匹配的小型ViT对照中,排名反转。由此得出的图景是依赖于区域的:Muon并非普遍优越,但其平坦谱偏置在需要保持许多谱方向活跃时可能有所帮助。

English abstract

Muon replaces a matrix gradient $G=UΣV^\top$ by its polar factor $UV^\top$. This keeps the singular directions selected by the gradient, but makes the update spectrum flat. We study the optimization bias created by this operation. Under explicit alignment assumptions, we prove that the polar update is the one-step entropy-maximizing choice among bounded updates that use the gradient singular directions and do not adapt to the current weight spectrum. In an underdetermined regression model, we derive exact singular-value dynamics for continuous-time Muon and identify a measurement-dependent condition under which the normalized spectrum moves toward equal nonzero singular values. This geometry also rules out a common low-rank interpretation: at fixed Frobenius norm, Muon's distinguished state has a flat spectrum, whereas nuclear-norm minimization favors spectral concentration. Controlled matrix-sensing experiments separate the effect from simple gradient rescaling, show that norm-matched gradient descent does not reproduce Muon, and recover the predicted flattening trend across broad ablations. In small NanoGPT pretraining, Muon preserves stable rank, has a broad learning-rate plateau, and improves validation loss relative to AdamW; in a matched small-ViT control, the ranking reverses. The resulting picture is regime-dependent: Muon is not universally superior, but its flat-spectrum bias can help when many spectral directions need to remain active.
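The polar-factor replacement $G=UΣV^\top \mapsto UV^\top$ can be checked directly with an SVD. This is a minimal sketch for a full-rank gradient matrix. Practical Muon implementations approximate this orthogonalization with a Newton-Schulz-type iteration rather than an explicit SVD, so the sketch shows the operation, not how Muon computes it. `polar_factor` is a hypothetical name.

```python
import numpy as np


def polar_factor(G: np.ndarray) -> np.ndarray:
    """Orthogonal polar factor U V^T of a full-rank matrix G = U diag(s) V^T."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
O = polar_factor(G)
print(np.linalg.svd(O, compute_uv=False))                           # flat spectrum: all ones
print(np.sum(O * G), np.linalg.svd(G, compute_uv=False).sum())    # <O, G> = ||G||_*
```

The update keeps the singular directions $U, V$ of $G$ but sets every singular value to one. Its inner product with $G$ equals the nuclear norm $\operatorname{tr}(Σ)$.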

2606.08218 2026-06-09 cs.LG cs.AI math.ST stat.ML stat.TH new submission

How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs

深度高斯过程到底有多深?组合高斯过程的尖锐阈值与非高斯极限

Mark Kozdoba, Shie Mannor

Affiliations: Technion, IIT; NVIDIA

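AI summary: Studies the limiting behavior of deep Gaussian process priors as depth grows. It identifies a sharp threshold on the RBF kernel bandwidth below which the prior converges to a non-degenerate, non-Gaussian distribution with non-vanishing dependence between coordinates.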

AI-generated Chinese abstract

组合先验描述了深度贝叶斯模型中分层函数的通用属性,其中随机权重的深度神经网络是一个典型例子。在宽网络极限下,先验是一个具有深度相关核的高斯过程,其随深度增长的行为已通过该核得到广泛研究。这里,我们研究另一种情况,其中每一层本身是一个向量值高斯过程,我们的目标类似地理解先验随深度增长的极限行为。先前的高斯过程工作已确定,对于RBF核和一定范围的带宽$r$,先验在极限下退化,收敛到常数函数集——这作为概率模型是无用的。在本文中,我们建立了几个新结果。首先,我们识别出一个尖锐的带宽阈值$r_c(d) = Θ(\sqrt{d})$,高于该阈值极限是退化的,加强了先前的界限。其次,更重要的是,我们证明对于低于阈值$r_c(d)$的$r$,先验收敛到极限分布$π_{\bar{Z}}$。我们还证明这些分布是非退化且非高斯的,坐标之间具有非消失的依赖性。与先前已知的退化机制相反,深度高斯过程先验因此可以允许非平凡极限。实验上,我们在维度$d$的范围内验证了该阈值,并展示了极限分布$π_{\bar{Z}}$的复杂多模态行为——该机制随$d$增长而变得狭窄,且在不了解阈值的情况下难以识别。

English abstract

Compositional priors describe the generic properties of layered functions in deep Bayesian models, where deep neural networks with random weights are a canonical example. In the wide-network limit, the prior is a Gaussian process with a depth-dependent kernel, and its behaviour as depth grows has been extensively studied through this kernel. Here, we study another case, where each layer itself is a vector-valued Gaussian process, and our aim is similarly to understand the limiting behaviour of the prior as depth grows. Previous GP work has established that for the RBF kernel and a certain range of bandwidths $r$, the prior degenerates in the limit, converging to the set of constant functions, which is not useful as a probabilistic model. In this paper we establish several new results. First, we identify a sharp bandwidth threshold $r_c(d) = Θ(\sqrt{d})$ above which the limit is degenerate, strengthening the earlier bounds. Second, and more importantly, we show that for $r$ below the threshold $r_c(d)$ the prior converges to a limit distribution $π_{\bar{Z}}$. We also prove that these distributions are non-degenerate and non-Gaussian, with non-vanishing dependence between coordinates. In contrast to the previously known degenerate regime, deep Gaussian process priors can therefore admit non-trivial limits. Empirically, we verify the threshold across a range of dimensions $d$ and demonstrate complex multimodal behaviour of the limit distributions $π_{\bar{Z}}$, a regime that becomes increasingly narrow with $d$ and would be hard to identify without knowing the threshold.

2606.08032 2026-06-09 stat.ML cs.LG new submission

Variational Proximal Policy Optimization

变分近端策略优化

Ousmane Amadou Dia

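AI summary: Proposes Variational Proximal Policy Optimization (VP₂O), which combines particle-based variational inference with a Mixture-of-Experts architecture and a geometry-based proximal-control mechanism to address policy mode collapse and distribution drift in reinforcement learning. It reports gains on complex reasoning tasks.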

AI-generated Chinese abstract

通过近端策略优化进行的人类反馈强化学习经常遭受策略模式崩溃、脆弱的探索循环和分布漂移。本文引入了变分近端策略优化(\(\textsc{VP}_2\textsc{O}\)),这是一种基于粒子的变分推理框架,将策略优化映射到专家混合架构中的Stein变分梯度下降。通过利用局部化专家原型上的函数核以及专家正交化损失,\(\textsc{VP}_2\textsc{O}\)引入了一种基于几何的近端控制机制,可以减少对固定裁剪或KL计划的依赖。我们在33B/4B稀疏专家混合模型上的结果显示,在复杂推理基准测试中取得了多项改进,在Codeforces上建立了\(+\mathbf{179}\) ELO增益,并在AIME数学推理任务上减少了\(\mathbf{32\%}\)的令牌数量。

English abstract

Reinforcement Learning from Human Feedback via Proximal Policy Optimization often suffers from policy mode collapse, brittle exploration loops, and distribution drift. This paper introduces Variational Proximal Policy Optimization (\(\textsc{VP}_2\textsc{O}\)), a particle-based variational inference framework that maps policy optimization to Stein Variational Gradient Descent within a Mixture-of-Experts architecture. By leveraging functional kernels over localized expert prototypes alongside an expert orthogonalization loss, \(\textsc{VP}_2\textsc{O}\) introduces a geometry-based proximal-control mechanism that can reduce reliance on fixed clipping or KL schedules. Our results on a 33B/4B sparse Mixture-of-Experts model show several improvements across complex reasoning benchmarks, establishing a \(+\mathbf{179}\) ELO gain on Codeforces and a \(\mathbf{32\%}\) reduction in token count on AIME mathematical reasoning tasks.

2606.07926 2026-06-09 stat.ML cs.LG new submission

Barycentric Projections of Optimal Transport Plans on Riemannian Manifolds

黎曼流形上最优传输计划的重心投影

Kisung You

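AI summary: Proposes a framework for barycentric projections of transport couplings on Riemannian manifolds. It obtains the best deterministic map via conditional Fréchet means and defines a conditional-variance Monge defect; experiments confirm the distinct roles of the intrinsic and tangential projections.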

AI-generated Chinese abstract

最优传输耦合是概率对象,而许多学习流程需要确定性映射。在欧几里得空间中,重心投影通过取条件期望将耦合转换为映射,但在黎曼流形上,曲率和割迹使这一操作变得不平凡。我们开发了一个黎曼流形上传输耦合的重心投影框架。内在投影将每个源点映射到其目标分布的条件Fréchet均值,并证明它是平方测地线损失下的最佳确定性代表。相应的最小值是积分条件Fréchet方差,该方差对于由映射诱导的耦合恰好为零,因此定义了一个条件方差Monge缺陷。我们还研究了一个切向log-exp投影,证明了其欧几里得精确性、在Monge情况下与Brenier-McCann映射的兼容性,以及其作为内在目标的第一单位黎曼梯度更新的解释。对于离散耦合,两种构造都按行分解为加权Fréchet均值和log-exp问题。在球面数据、合成SPD数据和真实EEG协方差矩阵上的实验支持所提出的角色分工:内在投影是变分代表,而切向投影是有用的局部位移代理。

English abstract

Optimal transport couplings are probabilistic objects, while many learning pipelines require deterministic maps. In Euclidean space, barycentric projection converts a coupling into a map by taking conditional expectations, but on a Riemannian manifold curvature and cut loci make this operation nontrivial. We develop a framework for barycentric projections of transport couplings on Riemannian manifolds. The intrinsic projection maps each source point to the conditional Fréchet mean of its destination law and is shown to be the best deterministic representative under squared geodesic loss. The corresponding minimum value is an integrated conditional Fréchet variance, which vanishes exactly for map-induced couplings and therefore defines a conditional-variance Monge defect. We also study a tangential log-exp projection, prove its Euclidean exactness, its compatibility with Brenier-McCann maps in the Monge case, and its interpretation as the first unit Riemannian gradient update for the intrinsic objective. For discrete couplings, both constructions decompose row-wise into weighted Fréchet mean and log-exp problems. Experiments on spherical data, synthetic SPD data, and real EEG covariance matrices support the proposed division of roles: the intrinsic projection is the variational representative, while the tangential projection is a useful local displacement surrogate.
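The row-wise structure for discrete couplings is easiest to see in the Euclidean special case, where the conditional Fréchet mean reduces to a weighted average. The sketch below covers only that case; on a manifold each row would instead require a weighted Fréchet mean or log-exp computation. It returns the projected points and the integrated conditional variance, which is zero for map-induced couplings. `barycentric_projection` is a hypothetical name.

```python
import numpy as np


def barycentric_projection(plan: np.ndarray, Y: np.ndarray):
    """Euclidean barycentric projection of a discrete coupling.

    plan: (n, m) nonnegative coupling, plan[i, j] = mass moved from x_i to y_j
    Y:    (m, k) target points
    Returns the projected points T (n, k) and the Monge defect
    sum_i mass_i * Var(Y | X = x_i).
    """
    mass = plan.sum(axis=1)                                    # source marginal
    W = plan / mass[:, None]                                   # conditional law of Y given x_i
    T = W @ Y                                                  # conditional means
    sq = ((Y[None, :, :] - T[:, None, :]) ** 2).sum(axis=-1)   # ||y_j - T_i||^2
    cond_var = (W * sq).sum(axis=1)
    return T, float(mass @ cond_var)


# Splitting each source's mass evenly between two targets gives a positive defect.
T_split, defect_split = barycentric_projection(np.full((2, 2), 0.25), np.array([[0.0], [2.0]]))
print(T_split, defect_split)
```

A permutation coupling moves each source to a single target, so every conditional variance is zero. This is the Euclidean counterpart of the abstract's statement that the defect vanishes exactly for map-induced couplings.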

2606.07890 2026-06-09 cs.LG stat.ML new submission

Partially Performative Prediction

部分表现性预测

Jaewook Lee, Tijana Zrnic

Affiliations: Stanford University

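AI summary: Proposes partially performative prediction, a framework that jointly models endogenous distribution shift caused by model deployment and exogenous shift caused by external change over time. It defines online performative stability and optimality and analyzes when heuristics such as repeated retraining adapt successfully.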

AI-generated Chinese abstract

表现性预测研究当预测模型部署在重要领域时产生的反馈循环。在这些设置中,部署模型可能会改变模型旨在预测其模式的人群,导致学习系统内生的分布偏移。这种视角不同于经典的分布偏移处理,其中偏移通常被建模为数据生成过程中的外生变化。然而,在实践中,分布偏移很少是单一类型的。预测模型可能通过其支持的决策影响未来数据,而世界本身也因学习者无法控制的原因持续漂移。我们研究部分表现性预测,这是一个捕捉内源和外源分布偏移源的框架。该框架通过允许数据分布既响应部署的模型又根据外部时变过程演化,推广了表现性预测。我们通过定义在线类比来跟踪演化的部分表现性环境,将表现性稳定性和表现性最优性的核心概念扩展到这一设置。我们分析了实用的学习启发式方法,包括重复训练,并刻画了它们何时成功适应部分表现性环境。

English abstract

Performative prediction studies feedback loops that arise when predictive models are deployed in consequential domains. In these settings, deploying a model can change the population whose patterns the model aims to predict, inducing a distribution shift that is endogenous to the learning system. This perspective departs from classical treatments of distribution shift, where shifts are typically modeled as exogenous changes in the data-generating process. Yet, in practice, distribution shift is rarely one or the other. Predictive models may influence future data through the decisions they support, while the world itself continues to drift for reasons beyond the learner's control. We study partially performative prediction, a framework that captures both endogenous and exogenous sources of distribution shift. The framework generalizes performative prediction by allowing the data distribution to evolve both in response to the deployed model and according to an external, time-varying process. We extend the central notions of performative stability and performative optimality to this setting by defining their online analogues that track the evolving partially performative environment. We analyze practical learning heuristics, including repeated retraining, and characterize when they successfully adapt to partially performative environments.
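Repeated retraining in a partially performative environment can be illustrated with a toy location model, which is an assumption, not the paper's setting. At round $t$ the data mean is $\mu_t + \varepsilon\theta_t$: an exogenous drift plus an endogenous response to the deployed model. Retraining on squared loss sets the next model to that mean. `repeated_retraining` is a hypothetical name, and the sketch works at the population level with no sampling noise.

```python
def repeated_retraining(mu, eps, theta0=0.0):
    """Population-level repeated retraining in a toy location model.

    At round t the data have mean mu[t] + eps * theta_t, where mu[t] is the
    exogenous drift and eps * theta_t is the endogenous response to the
    deployed model. Retraining on squared loss sets theta_{t+1} to that mean.
    """
    thetas = [theta0]
    for m in mu:
        thetas.append(m + eps * thetas[-1])
    return thetas


static = repeated_retraining([1.0] * 60, eps=0.5)
drifting = repeated_retraining([0.01 * t for t in range(200)], eps=0.5)
print(static[-1], drifting[-1])
```

With a constant environment and $|\varepsilon| < 1$, the iterates converge to the performatively stable point $\mu/(1-\varepsilon) = 2$. Under a linear drift of 0.01 per round, they trail the moving point $\mu_t/(1-\varepsilon)$ by a constant lag (0.02 here), which is the kind of tracking behavior the online analogues are meant to capture.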

2606.07694 2026-06-09 cs.LG stat.ML new submission

Vessel Traffic Flow Prediction on Sparse Data via Spatio-Temporal Graph Neural Networks with a Learnable Tweedie Head

基于可学习Tweedie头的时空图神经网络在稀疏数据上的船舶交通流预测

Kyeongjun Lee, Heeyoung Kim

Affiliations: Korea Advanced Institute of Science and Technology (KAIST)

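AI summary: To handle highly sparse, intermittently bursty vessel traffic flow data, proposes a model-agnostic learnable Tweedie head as a plug-and-play output module. The head optimizes the closed-form Tweedie unit deviance, predicts the mean, and learns a node-level variance power to capture heterogeneity across port areas, improving RMSE on real-world AIS data.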

AI-generated Chinese abstract

准确的船舶交通流预测对于智能港口运营和航行安全至关重要。然而,海上交通流数据通常高度稀疏且具有间歇性爆发,使得稳健预测具有挑战性。在这种条件下,传统的时空图神经网络(ST-GNNs)可能退化为保守的接近零的预测,无法捕获非零活动。尽管零膨胀负二项(ZINB)模型部分解决了过多零值问题,但其两部分公式在突变附近仍可能保持保守。为了解决这些问题,我们提出了一种模型无关的可学习Tweedie头,它可以作为即插即用的输出模块附加到任意ST-GNN骨干网络上。与通常需要替代目标的基于似然的Tweedie训练不同,我们的方法优化了闭合形式的Tweedie单元偏差,并预测均值以进行点预测,同时学习节点级方差幂以捕获港口区域间的异质性变异性。在由洛杉矶和长滩港口的真实AIS数据构建的海上交通图上的实验表明,所提出的头在多个ST-GNN骨干网络上一致地提高了RMSE,特别是在非零事件上,从而为实际海上交通控制提供了更可靠的预测。

English abstract

Accurate vessel traffic flow prediction is crucial for smart port operations and navigational safety. However, maritime traffic flow data are often highly sparse with intermittent bursts, making robust forecasting challenging. Under such conditions, conventional spatio-temporal graph neural networks (ST-GNNs) can degrade toward conservative near-zero predictions and fail to capture non-zero activity. Although zero-inflated negative binomial (ZINB) models partially address excess zeros, their two-part formulation can still remain conservative around abrupt transitions. To address these issues, we propose a model-agnostic learnable Tweedie head that can be attached as a plug-and-play output module to arbitrary ST-GNN backbones. Instead of likelihood-based Tweedie training, which typically requires surrogate objectives, our approach optimizes the closed-form Tweedie unit deviance and predicts the mean for point forecasting while learning a node-level variance power to capture heterogeneous variability across port areas. Experiments on a maritime traffic graph constructed from real-world AIS data in the Port of Los Angeles and Long Beach show that the proposed head consistently improves RMSE across multiple ST-GNN backbones, especially on non-zero events, leading to more reliable forecasts for practical maritime traffic control.
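The closed-form Tweedie unit deviance mentioned above, for $1 < p < 2$, is $d(y,\mu) = 2\left[\frac{y^{2-p}}{(1-p)(2-p)} - \frac{y\,\mu^{1-p}}{1-p} + \frac{\mu^{2-p}}{2-p}\right]$, which stays finite at $y = 0$. The sketch below implements it with per-node variance powers. The abstract does not say how the node-level power is parameterized, so the sigmoid map into $(1, 2)$ is an assumption, and `tweedie_unit_deviance` and `variance_power` are hypothetical names.

```python
import numpy as np


def tweedie_unit_deviance(y, mu, p):
    """Closed-form Tweedie unit deviance for 1 < p < 2 (y >= 0, mu > 0)."""
    y, mu, p = np.asarray(y, float), np.asarray(mu, float), np.asarray(p, float)
    return 2.0 * (np.power(y, 2 - p) / ((1 - p) * (2 - p))
                  - y * np.power(mu, 1 - p) / (1 - p)
                  + np.power(mu, 2 - p) / (2 - p))


def variance_power(theta):
    """Map an unconstrained per-node parameter to p in (1, 2) via a sigmoid (assumed)."""
    return 1.0 + 1.0 / (1.0 + np.exp(-np.asarray(theta, float)))


theta = np.array([-1.0, 0.0, 2.0])                  # one parameter per node
p = variance_power(theta)                           # broadcasts over the node axis
y = np.array([[0.0, 3.0, 0.0], [5.0, 0.0, 1.0]])    # (time, node) observed flows
mu = np.array([[0.4, 2.5, 0.2], [4.0, 0.3, 0.8]])   # predicted means
print(tweedie_unit_deviance(y, mu, p).mean())
```

Zeros contribute $2\mu^{2-p}/(2-p)$, so the loss penalizes over-prediction on empty periods without a separate zero-inflation component.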

2606.07630 2026-06-09 cs.LG cs.AI stat.ML new submission

Active Learning with Foundation Model Priors: Efficient Learning under Class Imbalance

基于基础模型先验的主动学习:类别不平衡下的高效学习

Jiancheng Zhang, Meiqing Li, Qi Zhang, Yinglun Zhu

Affiliations: University of California, Riverside; Carnegie Mellon University; Worcester Polytechnic Institute

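AI summary: To address class imbalance and noisy annotations in real-world data, proposes an active learning framework that uses foundation-model priors and imbalance-aware co-decisions to select the most informative samples, achieving over 50% annotation savings on image and text datasets.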

Comments
To appear at ICML 2026
AI-generated Chinese abstract

现实世界中图像和文本领域的数据集通常具有偏斜的类别分布和噪声标注,这共同降低了模型性能,尤其是对少数类。在现有解决方案中,主动学习通过选择性地查询信息最丰富且平衡的样本进行标注,提供了一种有效且高效的范式。我们提出了一种创新的主动学习框架,该框架减轻了类别不平衡,并选择信息量最大的样本进行标注。利用基础模型先验,我们的算法使得基础模型和小模型之间能够进行不平衡感知的协同决策,以处理跨领域的有噪声和不平衡标签。我们首次系统性地研究了在图像和文本领域中标签噪声和类别不平衡双重挑战下的主动学习。在不平衡数据集上的大量实验表明,我们的方法实现了显著的标注节省——与最佳主动学习基线相比超过50%——同时保持了对标签噪声的性能和鲁棒性。

English abstract

Real-world datasets across image and text domains are often characterized by skewed class distributions and noisy annotations, which jointly degrade model performance, particularly on minority classes. Among existing solutions, active learning offers an effective and efficient paradigm by selectively querying the most informative and balanced samples for annotation. We propose an active learning framework that mitigates class imbalance and selects the most informative samples to annotate. Leveraging foundation model priors, our algorithm enables imbalance-aware co-decisions between a foundation model and a small model to tackle noisy and imbalanced labels across domains. We present the first systematic study of active learning under the dual challenges of label noise and class imbalance across image and text domains. Extensive experiments on imbalanced datasets show that our method achieves substantial annotation savings (over 50% compared to the best active learning baseline) while preserving performance and robustness to label noise.

2606.05441 2026-06-09 cs.LG cs.AI stat.ML Updated version

GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data

Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Kumar Gyawali, Gianfranco Doretto, Donald A. Adjeroh

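AI summary: For high-dimensional, low-sample-size tabular prediction, proposes GOTabPFN, which obtains a compact representation through graph-guided ordering and neuro-inspired subunit compression, improving the stability and accuracy of TabPFN under tight token budgets.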

Comments
Accepted to the 43rd International Conference on Machine Learning (ICML 2026). Code and resources GitHub https://github.com/zadid6pretam/GOTabPFN PyPI https://pypi.org/project/gotabpfn Project webpage https://www.zadidhabib.com/gotabpfn.html Hugging Face ZeroGPU https://huggingface.co/spaces/zadid6pretam/GOTabPFN CPU backup https://huggingface.co/spaces/zadid6pretam/GOTabPFN_CPU
Abstract

We investigate how to make small tabular foundation models effective for High-Dimensional, Low-Sample-Size (HDLSS) tabular prediction without retraining large backbones. We introduce Graph-guided Ordering with Local Refinement (GO-LR), show its equivalence to weighted Minimum Linear Arrangement, and interpret the practical solver as a TSP-path-style surrogate. Building on GO-LR, we propose GOTabPFN, which uses a Neuro-Inspired Subunit Compression (NSC) unit to pool locally adjacent ordered features into meta-features, yielding a compact representation that makes TabPFN-style prediction practical in HDLSS regimes. Across tabular benchmarks, GOTabPFN improves stability and accuracy under tight token budgets.
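To make the weighted Minimum Linear Arrangement (MLA) objective and the pooling step concrete, here is a small sketch under stated assumptions: `w` is a symmetric feature-similarity matrix, the ordering comes from a greedy nearest-neighbour path (a simple TSP-path-style heuristic, not GO-LR's local refinement), and meta-features are plain means over fixed-size windows of the ordered features, which only stands in for the NSC unit. All function names are hypothetical.

```python
from typing import List, Sequence


def mla_cost(order: Sequence[int], w: List[List[float]]) -> float:
    """Weighted MLA cost: sum over feature pairs of w[i][j] * |pos(i) - pos(j)|."""
    pos = {feature: idx for idx, feature in enumerate(order)}
    n = len(order)
    return sum(
        w[i][j] * abs(pos[i] - pos[j]) for i in range(n) for j in range(i + 1, n)
    )


def greedy_path_order(w: List[List[float]], start: int = 0) -> List[int]:
    """Greedy nearest-neighbour path: repeatedly append the most similar unused feature."""
    order = [start]
    remaining = set(range(len(w))) - {start}
    while remaining:
        last = order[-1]
        # Ties are broken toward the smaller feature index for determinism.
        nxt = max(remaining, key=lambda j: (w[last][j], -j))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def pool_adjacent(row: Sequence[float], order: Sequence[int], group_size: int) -> List[float]:
    """Average consecutive windows of the reordered features into meta-features."""
    ordered = [row[i] for i in order]
    windows = (ordered[k:k + group_size] for k in range(0, len(ordered), group_size))
    return [sum(chunk) / len(chunk) for chunk in windows]
```

On a toy 4-feature matrix where features 0 and 2 (and 1 and 3) are strongly similar, the greedy path places each strongly linked pair next to each other and lowers the MLA cost relative to the identity ordering, so that pooling adjacent features then averages related columns.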

2606.01619 2026-06-09 cs.AI cs.LG stat.ML Updated version

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL

Zelin He, Haotian Lin, Boran Han, Wei Zhu, Haoyang Fang, Bernie Wang, Xuan Zhu, Runze Li, Matthew Reimherr

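AI summary: Proposes the ReSkill framework, which uses the group structure of GRPO to embed assertion-driven skill creation, within-group rollout sampling, and adaptive Thompson sampling, enabling skills and the policy to co-evolve and outperforming existing methods across several domains.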

Abstract

Agentic reinforcement learning (RL) enables LLM agents to improve continuously from environment rewards, yet the resulting policies do not systematically accumulate reusable strategies that generalize across tasks. Modular skills can provide such reusable strategies, yet existing skill-augmented RL methods decouple skill creation from policy optimization, risking the adoption of skills that conflict with the evolving policy. Inspired by Anthropic's Skill Creator, we introduce ReSkill, an RL-in-the-loop skill creation framework that reconciles skill evolution with policy learning. ReSkill exploits the group-wise structure of GRPO to naturally embed three mechanisms with only marginal additional overhead: (1) an assertion-driven skill creator that diagnoses failures from past experience and proposes conditional, trigger-based skill revisions; (2) within-group rollout sampling that enables controlled comparison of skill versions, capturing which version best supports the policy's ongoing learning; and (3) Thompson Sampling with adaptive discounting to balance exploration and exploitation in skill version selection as the policy evolves. Across several domains, ReSkill consistently outperforms existing memory- and skill-based RL methods, with the largest gains on unseen tasks. Analysis of the skill lifecycle shows skills being automatically created, tested, refined, and pruned as the policy improves, demonstrating reconciled skill-policy co-evolution.
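Mechanism (3) can be illustrated with a discounted Beta-Bernoulli Thompson sampler over skill versions. This is a minimal sketch assuming binary rollout rewards and a *fixed* discount factor `gamma`; the adaptive discounting rule and the coupling to GRPO groups described above are not reproduced, and the class name is hypothetical.

```python
import random
from typing import List, Optional


class DiscountedThompsonSampler:
    """Beta-Bernoulli Thompson sampling whose evidence decays toward the prior,
    so that stale outcomes matter less as the underlying policy changes."""

    def __init__(self, n_versions: int, gamma: float = 0.95,
                 prior: float = 1.0, seed: Optional[int] = None) -> None:
        self.gamma = gamma
        self.prior = prior
        self.alpha: List[float] = [prior] * n_versions
        self.beta: List[float] = [prior] * n_versions
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Draw one plausible success rate per skill version and pick the best draw.
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, version: int, reward: float) -> None:
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must lie in [0, 1]")
        # Discount every version's pseudo-counts toward the prior ...
        for k in range(len(self.alpha)):
            self.alpha[k] = self.prior + self.gamma * (self.alpha[k] - self.prior)
            self.beta[k] = self.prior + self.gamma * (self.beta[k] - self.prior)
        # ... then add the new observation to the version that was used.
        self.alpha[version] += reward
        self.beta[version] += 1.0 - reward
```

With `gamma = 0.95` each version effectively remembers about the last 20 observations, so a version that was good under an older policy can lose its lead once its recent rollouts stop succeeding.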

2606.00469 2026-06-09 math.OC stat.ML Updated version

Constructive interpolation and generalization rates for neural ODEs: a control perspective

Antonio Álvarez-López, Lorenzo Liverani, Enrique Zuazua

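AI summary: From a control-theoretic perspective, constructively proves that semi-autonomous neural ODEs (SA-NODEs) satisfy simultaneous cell controllability (SCC), which yields exact interpolation and generalization risk bounds comparable to those of histogram and nearest-neighbor estimators.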

Comments
36 pages, 8 figures
Abstract

We study supervised regression with neural ODEs (NODEs) from a control-theoretic perspective to derive explicit population-risk bounds. We focus on a widely used class of non-autonomous models with constant parameters and explicit time dependence, which we call semi-autonomous NODEs (SA-NODEs). We constructively prove that SA-NODEs are capable of \emph{exact} interpolation of admissible finite datasets, and even satisfy a stronger property that we call \emph{simultaneous cell controllability} (SCC): their flows can map prescribed disjoint cells into arbitrarily small target balls. This property is the mechanism that upgrades interpolation into quantitative generalization, by allowing SA-NODEs to emulate piecewise-constant nonparametric estimators. Consequently, our risk bounds recover the rates of histogram and nearest-neighbor estimators, provided the network width satisfies a conservative scaling with the sample size. Numerical experiments show that trained SA-NODEs achieve test errors competitive with, and often lower than, those of these baselines. Finally, we show that the explicit time dependence is essential. Although two-layer autonomous NODEs can interpolate geometrically nondegenerate datasets, structural obstructions prevent them from achieving SCC. These limitations, further confirmed numerically, support the view that SA-NODEs provide a minimal effective architecture for learning.
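For reference, the histogram estimator mentioned above is the simplest piecewise-constant regressor of the kind SA-NODEs are shown to emulate: partition the input range into cells and predict the mean response within each cell. The sketch below is one-dimensional and assumes inputs in a known interval `[lo, hi]`; falling back to the global mean for empty cells is an illustrative choice, not something the abstract specifies.

```python
from typing import Callable, List, Sequence


def fit_histogram_regressor(
    xs: Sequence[float], ys: Sequence[float], n_bins: int,
    lo: float = 0.0, hi: float = 1.0,
) -> Callable[[float], float]:
    """Piecewise-constant regression: average the responses that fall in each cell."""
    width = (hi - lo) / n_bins

    def cell(x: float) -> int:
        # Clamp so that x == hi (or slightly outside the interval) maps to a boundary cell.
        return min(max(int((x - lo) / width), 0), n_bins - 1)

    sums: List[float] = [0.0] * n_bins
    counts: List[int] = [0] * n_bins
    for x, y in zip(xs, ys):
        k = cell(x)
        sums[k] += y
        counts[k] += 1
    overall = sum(ys) / len(ys)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[cell(x)]
```

In this picture, an SA-NODE that maps each cell into its own small target ball can then read off one constant per cell, which is why the cell-controllability property translates into histogram-type risk rates.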

2606.00419 2026-06-09 stat.ML cs.LG Updated version

Parameter-Free and Group Conditional Online Conformal Prediction

Beepul Bharti, Ambar Pal, Jacopo Teneggi, Jeremias Sulam

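AI summary: Proposes a parameter-free algorithm for group-conditional online conformal prediction that guarantees group-conditional coverage without hyperparameter tuning, and validates its effectiveness and reliability on synthetic and real data.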

Abstract

Uncertainty quantification (UQ) is critical for the deployment of machine learning predictors in real-world scenarios where the data distribution may shift over time (i.e., data may not be exchangeable). Online conformal prediction (OCP) methods address this issue at the expense of either (i) group-wise error control or (ii) a learning-rate-independent implementation. Group-conditional coverage is essential for fairness across different collections of data points and for providing finer UQ guarantees. Parameter-free optimization is crucial for robustness to adversarial and unknown data shifts. We propose a parameter-free algorithm for group-conditional OCP and demonstrate that it achieves the best group-conditional coverage guarantees. We evaluate our algorithm on synthetic and real-world data, demonstrating that our method not only improves the reliability of existing parameter-free OCP methods but also provides prediction intervals that are comparable in size to those of well-tuned group-conditional approaches. By unifying group-conditional coverage with parameter-free online algorithms, our work lays a foundation for fair and robust uncertainty quantification in shifting environments.
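To show what "learning-rate dependence" refers to, here is a sketch of the kind of baseline the proposed method is meant to improve on, not the paper's parameter-free algorithm: per-group online quantile tracking in the style of adaptive conformal inference, where each group's score threshold moves by a fixed step `lr` after every miss or hit. Overlapping groups are handled by predicting with the largest active threshold; the score stream, the group function, and all names are illustrative assumptions.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def online_group_conformal(
    stream: Iterable[Tuple[int, float]],
    groups_of: Callable[[int], List[Hashable]],
    alpha: float = 0.1,
    lr: float = 0.01,
) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Return final per-group score thresholds and empirical per-group coverage."""
    q: Dict[Hashable, float] = {}
    seen: Dict[Hashable, int] = {}
    misses: Dict[Hashable, int] = {}
    for x, score in stream:
        active = groups_of(x)
        # Predict with the most conservative threshold among the groups x belongs to.
        threshold = max(q.setdefault(g, 0.0) for g in active)
        covered = score <= threshold
        for g in active:
            seen[g] = seen.get(g, 0) + 1
            misses[g] = misses.get(g, 0) + (0 if covered else 1)
            # Online (pinball-loss) quantile step; its behaviour hinges on lr.
            err = 1.0 if score > q[g] else 0.0
            q[g] += lr * (err - alpha)
    coverage = {g: 1.0 - misses[g] / seen[g] for g in seen}
    return q, coverage
```

Too small an `lr` makes the thresholds adapt slowly after a shift, while too large an `lr` makes them jumpy; removing the need to choose this step size is the motivation for a parameter-free method.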

2605.26703 2026-06-09 econ.TH cs.GT cs.LG stat.ML Updated version

Proper Calibeating

Dean P. Foster, Sergiu Hart

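AI summary: Extends the classical notions of calibrated forecasts and calibeating to proper scoring rules, defining proper-calibration and proper-calibeating; proves that calibration implies proper-calibration whereas calibeating need not imply proper-calibeating; shows how to guarantee proper-calibeating and proper-multicalibeating; and proves the equivalence between proper-calibration and universal no regret when best replying to forecasts in decision-making under uncertainty.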

Comments
v2: Updated section 6 "Decision Making Under Uncertainty"
Abstract

The classic concept of "calibrated forecasts" and its more recent refinement, "calibeating," are defined with respect to the standard quadratic scoring rule. We extend these notions to the class of $\textit{proper}$ scoring rules (for which the best forecast is the true distribution) and define $\textit{proper-calibration}$ and $\textit{proper-calibeating}$ by requiring the errors to converge to zero uniformly over all bounded proper scoring rules. We first establish that calibration always implies proper-calibration, whereas calibeating need not imply proper-calibeating. Second, we show how to guarantee proper-calibeating and proper-multicalibeating. Finally, we demonstrate the equivalence between proper-calibration and universal no regret when best replying to forecasts in decision-making under uncertainty.
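For binary outcomes, the two most familiar proper scoring rules are the quadratic (Brier) score and the logarithmic score; propriety means that the expected score is minimized by forecasting the true probability. The sketch below checks this on a grid and computes the classical quadratic calibration error over forecast bins. It illustrates the definitions only; the notion of proper-calibration in the abstract (uniform over all bounded proper scoring rules) is not implemented here.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Score = Callable[[float, int], float]  # (forecast probability, outcome in {0, 1}) -> loss


def brier(p: float, y: int) -> float:
    return (p - y) ** 2


def log_score(p: float, y: int) -> float:
    return -math.log(p if y == 1 else 1.0 - p)


def expected_score(score: Score, forecast: float, true_p: float) -> float:
    return true_p * score(forecast, 1) + (1.0 - true_p) * score(forecast, 0)


def best_forecast(score: Score, true_p: float, steps: int = 100) -> float:
    """Grid search over forecasts in (0, 1); for a strictly proper rule this
    returns true_p whenever true_p lies on the grid."""
    grid = [k / steps for k in range(1, steps)]
    return min(grid, key=lambda f: expected_score(score, f, true_p))


def quadratic_calibration_error(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Frequency-weighted squared gap between each forecast value and the
    empirical frequency of y = 1 among the rounds where it was issued."""
    bins: Dict[float, List[int]] = defaultdict(list)
    for f, y in zip(forecasts, outcomes):
        bins[f].append(y)
    n = len(forecasts)
    return sum(len(ys) / n * (f - sum(ys) / len(ys)) ** 2 for f, ys in bins.items())
```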

2604.25965 2026-06-09 stat.ML cs.LG Updated version

Adversarial Robustness of NTK Neural Networks

Yuxuan Hou

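AI summary: Studies the adversarial robustness of NTK neural networks in nonparametric regression, derives minimax optimal rates for adversarial regression in Sobolev spaces, and proves that NTK networks trained by gradient flow with early stopping attain this optimal rate, whereas in the overfitting regime the minimum-norm interpolant is vulnerable to adversarial perturbations.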

Abstract

Deep learning models are widely deployed in safety-critical domains, but remain vulnerable to adversarial attacks. In this paper, we study the adversarial robustness of NTK neural networks in the context of nonparametric regression. We establish minimax optimal rates for adversarial regression in Sobolev spaces and then show that NTK neural networks, trained via gradient flow with early stopping, can achieve this optimal rate. However, in the overfitting regime, we prove that the minimum norm interpolant is vulnerable to adversarial perturbations.

2603.25157 2026-06-09 cs.LG cs.AI cs.CV stat.ML Updated version

Vision Hopfield Memory Networks for Image Recognition

Jianfeng Wang, Amine M'Charrak, Luk Koska, Xiangtao Wang, Daniel Petriceanu, Ruizhi Wang, Michael Bumbar, Luca Pinchetti, Thomas Lukasiewicz

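AI summary: Proposes the brain-inspired Vision Hopfield Memory Network (V-HMN), which integrates hierarchical memory mechanisms with iterative refinement updates to model local and global dynamics in a unified framework, improving interpretability and data efficiency.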

Abstract

Recent vision backbones, such as Transformer families and state-space models like Mamba, have achieved remarkable progress on image recognition. Despite their empirical success, these architectures remain far from the computational principles of the human brain, often demanding enormous amounts of training data while offering limited interpretability. We propose the Vision Hopfield Memory Network (V-HMN), a brain-inspired vision backbone that integrates hierarchical memory mechanisms across layers with iterative refinement updates. Specifically, V-HMN incorporates local Hopfield modules that provide associative memory dynamics at the image-patch level, global Hopfield modules that function as episodic memory for contextual modulation, and a predictive-coding-inspired refinement rule for iterative error correction. By organizing these memory-based modules hierarchically, V-HMN captures both local and global dynamics in a unified framework. Explicit memory retrieval exposes the relationship between inputs and stored patterns, providing a prototype-based form of interpretability, while the reuse of stored patterns improves data efficiency; both are advantages over existing self-attention- and state-space-based approaches. We conducted extensive experiments on public image classification benchmarks. V-HMN achieves strong performance on small- and medium-scale benchmarks and remains competitive with widely adopted backbone architectures on ImageNet despite minimal architectural tuning. These findings highlight the potential of V-HMN as a memory-centric alternative to standard vision backbones, thereby bridging brain-inspired computation with modern machine learning.
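The associative-memory step at the heart of such modules can be sketched with the modern (continuous) Hopfield update, in which a query is repeatedly replaced by a softmax-weighted average of stored patterns. This assumes the standard update $\xi \leftarrow X\,\mathrm{softmax}(\beta X^\top \xi)$ with inverse temperature `beta`; the abstract does not give the exact equations of V-HMN's local or global modules. The returned weights are what make retrieval inspectable: they show which stored prototype the input was matched to.

```python
import math
from typing import List, Sequence, Tuple


def softmax(v: Sequence[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def hopfield_retrieve(
    patterns: Sequence[Sequence[float]], query: Sequence[float],
    beta: float = 8.0, steps: int = 3,
) -> Tuple[List[float], List[float]]:
    """Iterate xi <- sum_k w_k * pattern_k with w = softmax(beta * <pattern_k, xi>).

    Returns the retrieved state and the weights used in the last iteration."""
    xi = list(query)
    weights: List[float] = []
    for _ in range(steps):
        sims = [beta * sum(p_d * x_d for p_d, x_d in zip(p, xi)) for p in patterns]
        weights = softmax(sims)
        xi = [sum(w * p[d] for w, p in zip(weights, patterns)) for d in range(len(xi))]
    return xi, weights
```

Starting from a noisy version of the first stored pattern, the update converges to that pattern within a step or two, and the weight vector concentrates on index 0.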

2507.00260 2026-06-09 stat.ML cs.LG math.ST stat.ME stat.TH Updated version

Disentangled Feature Importance

Jin-Hong Du, Kathryn Roeder, Larry Wasserman

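AI summary: Proposes Disentangled Feature Importance (DFI) for explaining how predictive signal is attributed across correlated measurement channels; importance is computed through an independent latent representation under an entropic optimal transport geometry, yielding stable and interpretable attributions.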

Comments
29 main and 44 supplementary pages
Abstract

When predictors are statistically dependent, the appropriate definition of feature importance depends on the operational goal. Conditional-incremental measures are well-suited for feature selection, acquisition, and compression, where shared predictive information is treated as redundancy. For post-hoc interpretation, however, the goal is often to attribute predictive signals across correlated measurement channels. We introduce Disentangled Feature Importance (DFI), a population-level attribution framework for this setting. DFI maps covariates to an independent latent representation under a specified entropic optimal transport geometry, computes latent importance, and attributes it back to the original covariates through barycentric sensitivities. We show that broad conditional-incremental FI functionals target conditional incremental predictive value under squared-error loss, and therefore answer a different question from attribution of shared predictive signal under dependence. Under fixed transport cost, reference law, and regularization level, DFI defines a well-specified family of estimands. Latent scores admit a functional ANOVA interpretation, and in the Gaussian linear case, the attributed DFI recovers the classical $R^2$ decomposition for correlated regressors. We derive influence-function-based inference under nuisance-rate and smoothness conditions, and show in simulations and an HIV-1 neutralization-resistance analysis that DFI yields stable, interpretable, uncertainty-quantified attributions of shared predictive signal.

2502.01226 2026-06-09 cs.LG stat.ML Updated version

Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

Jack Sandberg, Morteza Haghir Chehreghani

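AI summary: Studies two algorithms for joint prior selection and regret minimization in Gaussian process bandits, proves a sublinear regret bound for HP-GP-TS, and validates the algorithms experimentally.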

Comments
30 pages, 12 figures
Abstract

Gaussian process (GP) bandits provide a powerful framework for black-box optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the literature assumes that this prior is known, but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior, an approach that lacks theoretical guarantees. In this work, we study two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS), which disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS), which uses a bi-level Thompson sampling scheme. We theoretically analyze the algorithms and establish a sublinear regret bound for HP-GP-TS. In addition, we demonstrate the effectiveness of these algorithms compared to the alternatives through extensive experiments with synthetic and real-world data.
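The bi-level idea behind HP-GP-TS can be sketched as: first sample a prior from its posterior given the data, then run ordinary GP Thompson sampling under that prior. The sketch below assumes a finite set of candidate RBF length-scales with a uniform hyperprior (so the posterior over priors is proportional to the marginal likelihood), a fixed noise variance, and a discretized 1-D domain. The function names are hypothetical and this is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rbf(a: float, b: float, ls: float) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)


def cholesky(m: Matrix) -> Matrix:
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(s, 1e-12))  # guard against rounding
            else:
                low[i][j] = s / low[j][j]
    return low


def solve_lower(low: Matrix, b: Sequence[float]) -> List[float]:
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def fit(xs: Sequence[float], ys: Sequence[float], ls: float, noise: float) -> Tuple[Matrix, List[float]]:
    """Cholesky factor L of K + noise*I and whitened targets L^{-1} y."""
    k = [[rbf(a, b, ls) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    return low, solve_lower(low, ys)


def log_marginal_likelihood(xs, ys, ls: float, noise: float) -> float:
    low, white = fit(xs, ys, ls, noise)
    return (-0.5 * sum(v * v for v in white)
            - sum(math.log(low[i][i]) for i in range(len(xs)))
            - 0.5 * len(xs) * math.log(2.0 * math.pi))


def sample_posterior(xs, ys, grid, ls: float, noise: float, rng: random.Random) -> List[float]:
    """One joint draw from the GP posterior evaluated on `grid`."""
    low, white = fit(xs, ys, ls, noise)
    v = [solve_lower(low, [rbf(x, g, ls) for x in xs]) for g in grid]
    mean = [sum(p * q for p, q in zip(vg, white)) for vg in v]
    cov = [[rbf(gi, gj, ls) - sum(p * q for p, q in zip(v[i], v[j]))
            + (1e-6 if i == j else 0.0) for j, gj in enumerate(grid)]
           for i, gi in enumerate(grid)]
    lc = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in grid]
    return [mean[i] + sum(lc[i][k] * z[k] for k in range(i + 1)) for i in range(len(grid))]


def hp_gp_ts_step(xs, ys, grid, lengthscales, noise: float,
                  rng: random.Random) -> Tuple[int, List[float]]:
    # Level 1: sample a prior (length-scale) from its posterior under a uniform hyperprior.
    logml = [log_marginal_likelihood(xs, ys, ls, noise) for ls in lengthscales]
    top = max(logml)
    weights = [math.exp(v - top) for v in logml]
    total = sum(weights)
    probs = [w / total for w in weights]
    ls = rng.choices(lengthscales, weights=probs)[0]
    # Level 2: ordinary GP Thompson sampling under the sampled prior.
    f = sample_posterior(xs, ys, grid, ls, noise, rng)
    return max(range(len(grid)), key=f.__getitem__), probs
```

Sampling the prior instead of fixing it at the maximum-likelihood choice lets the algorithm keep exploring alternative priors while the data are still ambiguous about them.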

2602.15327 2026-06-09 cs.LG cs.AI cs.CL stat.ML Updated version

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

Hanlin Zhang, Jikai Jin, Vasilis Syrgkanis, Sham Kakade

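AI summary: Using large-scale observational evaluations and quantile regression, proposes prescriptive scaling laws that map a pre-training compute budget to attainable downstream accuracy, validates their temporal stability, and introduces a balanced I-optimal sampling algorithm that reduces evaluation cost.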

Comments
ICML 2026 Oral. Blog Post: https://jkjin.com/prescriptive-scaling
Abstract

Machine learning model performance improvements tend to arise from competition and application. For deployment, we consider prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping as the field evolves? Using large-scale observational evaluations of 5k existing and 2k newly evaluated model checkpoints spanning 2022-2026 across six benchmarks, we estimate capability boundaries (high conditional quantiles of benchmark scores as a function of log pre-training FLOPs) via smoothed quantile regression with a monotone, saturating sigmoid parameterization. We validate temporal reliability by fitting on earlier model generations and evaluating on later releases: on four of six tasks, the out-of-distribution coverage error remains below 2%, while math reasoning exhibits a boundary that advances consistently over time. For instance, at a budget of $10^{24}$ FLOPs, the estimated attainable accuracies are 0.83 on IFEval and 0.54 on MATH Lvl 5. We then extend our approach to analyze task-dependent saturation and to probe contamination-related shifts on math reasoning tasks. Finally, we introduce a balanced I-optimal sampling algorithm that recovers near-full-data frontiers using roughly 20% of the parameter-count-weighted evaluation budget (as low as 5% on some tasks) while maintaining comparable calibration. In addition, we release Proteus-2k, the latest model performance evaluation dataset, and introduce a practical methodology for translating compute budgets into reliable performance expectations and for monitoring when capability boundaries shift over time.
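A capability boundary of this kind can be sketched as a high conditional quantile of benchmark score, modeled as a monotone, saturating sigmoid in log-FLOPs and fitted by minimizing the pinball (check) loss. This sketch makes several simplifying assumptions: plain rather than smoothed pinball loss, a brute-force grid search in place of a continuous optimizer, a three-parameter curve `s * sigmoid(a * (x - b))` with `a > 0`, and synthetic data. It is not the authors' estimator.

```python
import math
import random
from itertools import product
from typing import Sequence, Tuple


def pinball(residual: float, tau: float) -> float:
    """Check loss: tau * r if r >= 0 else (tau - 1) * r."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def boundary(x: float, s: float, a: float, b: float) -> float:
    """Monotone (a > 0), saturating curve in x = log10(pre-training FLOPs)."""
    return s / (1.0 + math.exp(-a * (x - b)))


def fit_boundary(xs: Sequence[float], ys: Sequence[float],
                 tau: float = 0.9) -> Tuple[float, float, float]:
    s_grid = [0.5 + 0.05 * i for i in range(11)]   # ceiling: 0.50 ... 1.00
    a_grid = [0.5, 1.0, 1.5, 2.0, 3.0]             # slope, kept positive for monotonicity
    b_grid = [21.0 + 0.5 * i for i in range(9)]    # midpoint: 21.0 ... 25.0

    def loss(params: Tuple[float, float, float]) -> float:
        s, a, b = params
        return sum(pinball(y - boundary(x, s, a, b), tau) for x, y in zip(xs, ys))

    return min(product(s_grid, a_grid, b_grid), key=loss)
```

Because the pinball loss at level `tau` is minimized by the `tau`-quantile, roughly a fraction `tau` of the observed models should fall on or below the fitted boundary, which is the coverage property the abstract validates out of distribution.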

2602.12107 2026-06-09 cs.LG cs.AI stat.ML Updated version

On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage

Haolin Liu, Braham Snyder, Chen-Yu Wei

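AI summary: Proves via an information-theoretic lower bound that $Q^\star$-realizability and Bellman completeness are not sufficient for sample-efficient offline reinforcement learning under partial coverage, and proposes a general decision-estimation framework that unifies and improves existing results.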

Abstract

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative via an information-theoretic lower bound. To identify additional structure that enables sample-efficient offline RL under partial coverage, we introduce a general decision-estimation framework, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). Our framework decomposes offline RL complexity into decision complexity and value estimation error, which allows modular study of both sub-problems. Our results not only unify existing results (Chen and Jiang, 2022; Uehara et al., 2023) but also improve and generalize them. On the decision complexity side, our improvements include: the first $\epsilon^{-2}$ sample complexity bound for soft $Q$-learning under partial coverage, improving on the $\epsilon^{-4}$ bound of Uehara et al. (2023); the removal of the need for additional online interaction in the value-gap setting of Chen and Jiang (2022); and new learnable settings beyond these two cases. On the value estimation side, we provide a new characterization of the role of Bellman completeness under partial coverage, and the first characterization of offline learnability for general low-Bellman-rank MDPs (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021). The latter is a canonical online RL setting that has remained unexplored in offline RL except for special cases. As a side contribution, our techniques give the first analysis of CQL in the function approximation setting.

2602.04402 2026-06-09 stat.ML cs.AI cs.CY cs.LG math.ST stat.TH Updated version

Performative Learning Theory

Julian Rodemann, Unai Fischer-Abaigar, James Bailie, Krikamol Muandet

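AI summary: Embeds performative prediction into statistical learning theory, proves generalization bounds under performative effects on the sample and on the population, reveals a trade-off in which the more a model affects data the less it can learn from it, and shows how retraining can improve generalization guarantees.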

Comments
ICML 2026. v2: corrected typo in author list; v3: added explanation of condition 3.2, modified condition 3.3 and fixed lemma 3.4, added examples and explanations in sections 2, 5, and 6
Abstract

Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole population (e.g., all potential app users). This raises the question of how well models generalize under performativity. For example, how well can we draw insights about new app users based on existing users when both of them react to the app's predictions? We address this question by embedding performative predictions into statistical learning theory. We prove generalization bounds under performative effects on the sample, on the population, and on both. A key intuition behind our proofs is that in the worst case, the population negates predictions, while the sample deceptively fulfills them. We cast such self-negating and self-fulfilling predictions as min-max and min-min risk functionals in Wasserstein space, respectively. Our analysis reveals a fundamental trade-off between performatively changing the world and learning from it: the more a model affects data, the less it can learn from it. Moreover, our analysis results in a surprising insight on how to improve generalization guarantees by retraining on performatively distorted samples. We illustrate our bounds in a case study on prediction-informed assignments of unemployed German residents to job trainings, drawing upon administrative labor market records from 1975 to 2017 in Germany.

2602.05869 2026-06-09 stat.ML cs.LG cs.NA math.NA math.PR math.ST stat.TH Updated version

Wedge Sampling: Efficient Tensor Completion with Nearly-Linear Sample Complexity

Hengrui Luo, Anna Ma, Ludovic Stephan, Yizhe Zhu

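AI summary: Proposes wedge sampling, a non-adaptive scheme that allocates observations to structured length-two patterns (wedges), strengthening the spectral signal in regimes where uniform sampling is too sparse and enabling tensor completion with nearly linear sample complexity.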

Comments
COLT 2026 arXiv version. 65 pages, 3 figures
Abstract

We introduce Wedge Sampling, a new non-adaptive sampling scheme for low-rank tensor completion. We study recovery of an order-$k$ low-rank tensor of dimension $n \times \cdots \times n$ from a subset of its entries. Unlike the standard uniform entry model (i.e., i.i.d. samples from $[n]^k$), wedge sampling allocates observations to structured length-two patterns (wedges) in an associated bipartite sampling graph. By directly promoting these length-two connections, the sampling design strengthens the spectral signal that underlies efficient initialization, in regimes where uniform sampling is too sparse to generate enough informative correlations. Our main result shows that this change in sampling paradigm enables polynomial-time algorithms to achieve both weak and exact recovery with nearly linear sample complexity in $n$. The approach is also plug-and-play: wedge-sampling-based spectral initialization can be combined with existing refinement procedures (e.g., spectral or gradient-based methods) using only an additional $\tilde{O}(n)$ uniformly sampled entries, substantially improving over the $\tilde{O}(n^{k/2})$ sample complexity typically required under uniform entry sampling for efficient methods. Overall, our results suggest that the statistical-to-computational gap highlighted in Barak and Moitra (2022) is, to a large extent, a consequence of the uniform entry sampling model for tensor completion, and that alternative non-adaptive measurement designs that guarantee a strong initialization can overcome this barrier.

2602.03682 2026-06-09 stat.ML cs.DC cs.LG cs.NA math.NA Updated version

Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA

Pierre Aguié, Mathieu Even, Laurent Massoulié

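AI summary: Improves the analysis of the accelerated noisy power method, preserving the accelerated convergence rate under milder perturbation conditions, and proposes the first decentralized PCA algorithm with provably accelerated convergence.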

Abstract

We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in decentralized PCA. While previous works have established that acceleration can improve convergence rates compared to the standard Noisy Power Method, these guarantees require overly restrictive upper bounds on the magnitude of the perturbations, limiting their practical applicability. We provide an improved analysis of this algorithm, which preserves the accelerated convergence rate under much milder conditions on the perturbations. We show that our new analysis is worst-case optimal, in the sense that the convergence rate cannot be improved, and that the noise conditions we derive cannot be relaxed without sacrificing convergence guarantees. We demonstrate the practical relevance of our results by deriving an accelerated algorithm for decentralized PCA, which has similar communication costs to non-accelerated methods. To our knowledge, this is the first decentralized algorithm for PCA with provably accelerated convergence.
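The accelerated (momentum) power iteration being analyzed can be sketched as $x_{t+1} = A x_t - \beta x_{t-1} + \text{noise}$, with both iterates rescaled by the same factor at each step so that the three-term recurrence is preserved. The sketch assumes a symmetric matrix, Gaussian perturbations standing in for the inexact matrix-vector products (in decentralized PCA these might come from, for example, approximate averaging across nodes), and the common choice $\beta = \lambda_2^2/4$. It does not reproduce the paper's noise conditions or its decentralized algorithm.

```python
import math
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(a: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def accelerated_noisy_power(
    a: Matrix, beta: float, iters: int = 50, noise: float = 1e-4, seed: int = 0,
) -> List[float]:
    """Momentum power iteration with additive Gaussian noise on each product."""
    rng = random.Random(seed)
    n = len(a)
    cur = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in cur))
    cur = [v / norm for v in cur]
    prev = [0.0] * n
    for _ in range(iters):
        # Inexact product A @ cur, minus the momentum term beta * prev.
        ax = matvec(a, cur)
        nxt = [ax_i - beta * p_i + noise * rng.gauss(0.0, 1.0) for ax_i, p_i in zip(ax, prev)]
        scale = math.sqrt(sum(v * v for v in nxt))
        # Rescale both iterates by the same factor to keep the recurrence linear.
        prev = [v / scale for v in cur]
        cur = [v / scale for v in nxt]
    return cur
```

For `A = diag(3, 2, 1)` and $\beta = 2^2/4 = 1$, the leading direction grows like $(3 + \sqrt{5})/2 \approx 2.62$ per step while the second grows only polynomially, so the iterate aligns with the first coordinate axis despite the injected noise.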

2602.02431 2026-06-09 stat.ML cs.LG Updated version

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning

Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli

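AI summary: Studies the sample-complexity gap between full-batch GD and one-pass SGD in single-index learning, showing that with a truncated activation, full-batch GD achieves weak recovery with $n \simeq d$ samples, outperforming one-pass SGD, which requires $n \gtrsim d \log d$ samples.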

Comments
Accepted to ICML 2026
Abstract

It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a loss modification mechanism achieved by the first two passes on the data. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency. We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery.
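To make the setting concrete, here is a sketch of full-batch spherical GD on the correlation loss $L(w) = -\frac{1}{n}\sum_i y_i\,\sigma(\langle w, x_i\rangle)$ for a single-index target with a quadratic link. Assumptions: Gaussian inputs, the centered target $y = \langle \theta, x\rangle^2 - 1$, and an optional clipped activation $\sigma(z) = \min(z^2, c)$ as a hypothetical stand-in for the truncation, since the abstract does not specify the exact truncated activation. With the untruncated activation the update reduces to power iteration on the empirical matrix $\frac{1}{n}\sum_i y_i x_i x_i^\top$; the sketch shows the mechanics only and does not reproduce the sample-complexity separation.

```python
import math
import random
from typing import List, Optional, Tuple


def make_data(n: int, d: int, rng: random.Random) -> Tuple[List[List[float]], List[float], List[float]]:
    theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in theta))
    theta = [t / norm for t in theta]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    ys = [sum(t * xj for t, xj in zip(theta, x)) ** 2 - 1.0 for x in xs]
    return xs, ys, theta


def spherical_gd(xs: List[List[float]], ys: List[float], steps: int = 30, lr: float = 0.5,
                 clip: Optional[float] = None, seed: int = 1) -> List[float]:
    rng = random.Random(seed)
    d, n = len(xs[0]), len(xs)
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for _ in range(steps):
        # Negative gradient of the correlation loss: mean of y_i * sigma'(<w, x_i>) * x_i.
        direction = [0.0] * d
        for x, y in zip(xs, ys):
            z = sum(wj * xj for wj, xj in zip(w, x))
            dsigma = 2.0 * z if clip is None or z * z < clip else 0.0
            coef = y * dsigma / n
            for j in range(d):
                direction[j] += coef * x[j]
        w = [wj + lr * gj for wj, gj in zip(w, direction)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]  # project back onto the unit sphere
    return w
```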

2602.00797 2026-06-09 stat.ML cs.LG Updated version

Zero-Flow Encoders

Yakun Wang, Leyang Wang, Song Liu, Taiji Suzuki

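AI summary: Proposes a flow-based representation learning framework that uses the zero-flow criterion to certify conditional independence and thereby extract sufficient information from data, learning amortized Markov blankets in graphical models and latent representations in self-supervised learning tasks.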

Comments
Yakun Wang and Leyang Wang contributed equally to this work; As published at ICML 2026
English abstract

Flow-based methods have achieved significant success in various generative modeling tasks, capturing nuanced details within complex data distributions. However, few existing works have exploited this unique capability to resolve fine-grained structural details beyond generation tasks. This paper presents a flow-inspired framework for representation learning. First, we demonstrate that a rectified flow trained using independent coupling is zero everywhere at $t=0.5$ if and only if the source and target distributions are identical. We term this property the \emph{zero-flow criterion}. Second, we show that this criterion can certify conditional independence, thereby extracting \emph{sufficient information} from the data. Third, we translate this criterion into a tractable, simulation-free loss function that enables learning amortized Markov blankets in graphical models and latent representations in self-supervised learning tasks. Experiments on both simulated and real-world datasets demonstrate the effectiveness of our approach. The code reproducing our experiments can be found at: https://github.com/probabilityFLOW/zfe.
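The intuition behind the zero-flow criterion can be shown in a simple case. At $t=0.5$, $X_t=(X_0+X_1)/2$. If $X_0$ and $X_1$ are i.i.d., the pair is exchangeable given its sum, so the rectified-flow velocity $\mathbb{E}[X_1-X_0\mid X_t]$ is zero. The sketch below is our own illustration and assumes 1D Gaussians, $X_0\sim N(0,1)$ and $X_1\sim N(m,s^2)$, coupled independently. In that setting the velocity has a closed form because $(X_1-X_0, X_t)$ is jointly Gaussian. It checks the criterion only within this Gaussian family and is not the authors' learned estimator.

```python
import numpy as np

def rf_velocity_gaussian(x, t, m, s):
    """E[X1 - X0 | X_t = x] for X0 ~ N(0, 1), X1 ~ N(m, s**2) independent,
    with X_t = (1 - t) * X0 + t * X1 (independent-coupling rectified flow)."""
    var_t = (1 - t) ** 2 + (t * s) ** 2          # Var(X_t)
    cov = t * s ** 2 - (1 - t)                   # Cov(X1 - X0, X_t)
    return m + cov / var_t * (x - t * m)         # Gaussian conditional mean

xs = np.linspace(-3, 3, 7)
same = rf_velocity_gaussian(xs, 0.5, m=0.0, s=1.0)      # source == target
shifted = rf_velocity_gaussian(xs, 0.5, m=1.0, s=1.0)   # means differ
scaled = rf_velocity_gaussian(xs, 0.5, m=0.0, s=2.0)    # variances differ
```

In this family the flow at $t=0.5$ equals $m + \frac{2(s^2-1)}{1+s^2}\,(x - m/2)$. It vanishes for every $x$ exactly when $m=0$ and $s=1$.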

2601.21522 2026-06-09 cs.LG cond-mat.dis-nn cs.AI stat.ML Updated

More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)

Sagi Meir, Tommer D. Keidar, Noam Levi, Shlomi Reuveni, Barak Hirshberg

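AI summary: Motivated by diminishing returns in LLM inference at a fixed budget, the paper proposes Reset-and-Discard (ReD), a query method that improves how attempts are allocated so as to increase coverage. Its cost savings are validated on coding, math, and reasoning benchmarks.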

English abstract

The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically observed power-law behavior in pass@k leads to a sublinear growth of the coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD), a query method of LLMs that increases coverage@cost for a given budget, regardless of the pass@k form. Moreover, given a pass@k, we can quantitatively predict the savings in the total number of attempts using ReD. If pass@k is not available for the model, ReD can infer its power-law exponent. Experiments on three LLMs across coding (HumanEval), math (GSM8K), and reasoning (MMLU-Pro) benchmarks demonstrate that ReD substantially reduces the required attempts, tokens, and USD cost to reach a desired coverage, while also offering an efficient way to measure inference power-laws. ReD's advantage persists with imperfect verifiers, and ReD outperforms the tested allocation baselines.
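The two metrics can be made concrete with a short sketch. The first function is the widely used unbiased pass@k estimator from $n$ sampled attempts with $c$ correct ones. The second computes expected coverage when a budget is split evenly across questions, assuming independent attempts with hypothetical per-question success rates. This uniform split is only the naive baseline whose diminishing returns the abstract describes. ReD's own allocation rule is not reproduced here.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n sampled attempts with c correct ones."""
    if n - c < k:
        return 1.0                     # every size-k subset contains a correct attempt
    return 1.0 - comb(n - c, k) / comb(n, k)

def uniform_coverage(success_probs, budget):
    """Expected number of distinct questions solved when `budget` independent
    attempts are split evenly across questions (naive baseline, not ReD)."""
    k = budget // len(success_probs)   # attempts per question
    return sum(1.0 - (1.0 - p) ** k for p in success_probs)

probs = [0.9, 0.3, 0.05, 0.01]         # hypothetical per-question success rates
cov_4 = uniform_coverage(probs, 4)     # one attempt per question
cov_8 = uniform_coverage(probs, 8)     # two attempts per question
```

Doubling the budget here raises expected coverage from 1.26 to about 1.62 questions. Most of the extra attempts are spent on the easy question that was already solved, which is the diminishing-returns effect that ReD targets.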

2507.20975 2026-06-09 stat.ML cs.LG Updated

Locally Adaptive Conformal Inference for Operator Models

Trevor Harris, Yan Liu

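AI summary: Proposes Local Sliced Conformal Inference (LSCI), a distribution-free framework that produces function-valued, locally adaptive prediction sets for operator models. On synthetic and real tasks, LSCI yields tighter and more adaptive sets than conformal baselines.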

Comments
12 pages, 3 figures, 2 tables, Preprint
English abstract

Operator models are regression algorithms between Banach spaces of functions. They have become an increasingly critical tool for spatiotemporal forecasting and physics emulation, especially in high-stakes scenarios where robust, calibrated uncertainty quantification is required. We introduce Local Sliced Conformal Inference (LSCI), a distribution-free framework for generating function-valued, locally adaptive prediction sets for operator models. We prove finite-sample validity and derive a data-dependent upper bound on the coverage gap under local exchangeability. On synthetic Gaussian-process tasks and real applications (air quality monitoring, energy demand forecasting, and weather prediction), LSCI yields tighter sets with stronger adaptivity compared to conformal baselines. We also empirically demonstrate robustness against biased predictions and certain out-of-distribution noise regimes.
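As a point of reference for "conformal baselines", here is plain split conformal prediction for function-valued outputs on a common grid, using the sup-norm of the residual curve as the nonconformity score. This is a simple global baseline of our own choosing. LSCI's local, sliced construction is not reproduced. The residual generator is hypothetical and stands in for the errors of some fitted operator model.

```python
import numpy as np

def split_conformal_band(residuals_cal, alpha=0.1):
    """Half-width q of a simultaneous band y_hat(s) +/- q.

    residuals_cal: array (n_cal, n_grid) of y(s) - y_hat(s) on a shared grid.
    Score = sup-norm of each residual curve; q is the finite-sample conformal
    quantile, so a new exchangeable curve lies inside the band at every grid
    point with probability >= 1 - alpha.
    """
    scores = np.sort(np.abs(residuals_cal).max(axis=1))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))        # conformal rank
    if k > n:
        return np.inf                              # too few calibration curves
    return scores[k - 1]

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)

def residual_curves(n):
    # hypothetical model errors: random amplitude times a smooth shape, plus noise
    amp = rng.normal(size=(n, 1))
    return amp * np.sin(2 * np.pi * grid) + 0.3 * rng.normal(size=(n, grid.size))

q = split_conformal_band(residual_curves(500), alpha=0.1)
test_res = residual_curves(2000)
coverage = np.mean(np.abs(test_res).max(axis=1) <= q)
```

Because the band has the same width $q$ everywhere, it cannot widen where errors are large and narrow where they are small. The locally adaptive sets in the abstract are designed to address that limitation.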

2504.05349 2026-06-09 stat.ML cs.AI cs.LG Updated

Hyperflux: Pruning Reveals Importance

Eugen Barbulescu, Antonio Alexoaie, Lucian Busoniu

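AI summary: Proposes Hyperflux, which models pruning as a continuously evolving system governed by flux and pressure. This model explains pruning behavior at both the microscopic and macroscopic levels. A pressure scheduler reaches a target sparsity, and the method achieves competitive results on several datasets.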

English abstract

Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most methods focus on empirical results at the expense of understanding the pruning process. We introduce Hyperflux, a novel $L_0$ method which models pruning as a continuously evolving system determined by flux, the gradient response to a weight's removal, and pressure, a global regularization driving weights toward pruning. By exploiting this model, Hyperflux's pruning behavior becomes understandable at both microscopic (weight regrowth/pruning) and macroscopic (sparsity convergence, etc.) levels. We also introduce a novel pressure scheduler that reliably targets desired sparsities. Hyperflux achieves competitive results with ResNet-50, VGG-19 and DeiT-T/S on CIFAR-10, CIFAR-100 and ImageNet datasets.

2510.12744 2026-06-09 stat.ML cs.LG math.ST stat.CO stat.ME stat.TH Updated

Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency Without Model Sweeps

Do Tien Hai, Trung Nguyen Mai, TrungTin Nguyen, Nhat Ho, Binh T. Nguyen, Christopher Drovandi

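AI summary: For softmax-gated Gaussian mixture of experts, the paper proposes a unified statistical framework based on Voronoi-type loss functions. It addresses parameter non-identifiability and model selection, and uses dendrograms of mixing measures to select the number of experts consistently without multi-size training.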

Comments
Do Tien Hai, Trung Nguyen Mai, and TrungTin Nguyen are co-first authors. In Proceedings of The 29th International Conference on Artificial Intelligence and Statistics, AISTATS 2026 Spotlight (acceptance rate 2.5% of 2102 submissions)
English abstract

We develop a unified statistical framework for softmax-gated Gaussian mixture of experts (SGMoE) that addresses three long-standing obstacles in parameter estimation and model selection: (i) non-identifiability of gating parameters up to common translations, (ii) intrinsic gate-expert interactions that induce coupled differential relations in the likelihood, and (iii) the tight numerator-denominator coupling in the softmax-induced conditional density. Our approach introduces Voronoi-type loss functions aligned with the gate-partition geometry and establishes finite-sample convergence rates for the maximum likelihood estimator (MLE). In over-specified models, we reveal a link between the MLE's convergence rate and the solvability of an associated system of polynomial equations characterizing near-nonidentifiable directions. For model selection, we adapt dendrograms of mixing measures to SGMoE, yielding a consistent, sweep-free selector of the number of experts that attains pointwise-optimal parameter rates under overfitting while avoiding multi-size training. Simulations on synthetic data corroborate the theory, accurately recovering the expert count and achieving the predicted rates for parameter estimation while closely approximating the regression function. Under model misspecification (e.g., $ε$-contamination), the dendrogram selection criterion is robust, recovering the true number of mixture components, while the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood tend to overselect as sample size grows. On a maize proteomics dataset of drought-responsive traits, our dendrogram-guided SGMoE selects two experts, exposes a clear mixing-measure hierarchy, stabilizes the likelihood early, and yields interpretable genotype-phenotype maps, outperforming standard criteria without multi-size training.
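Obstacle (i) can be checked numerically. The sketch below evaluates a softmax-gated Gaussian mixture of experts with linear expert means, $p(y\mid x)=\sum_k \mathrm{softmax}_k(\beta_{1}x+\beta_{0})\,N(y;\,a_k^\top x+b_k,\sigma_k^2)$. It then shows that shifting every expert's gating parameters by the same $(t_0,t_1)$ leaves the density unchanged. The parameterisation and all values are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def sgmoe_density(y, x, beta0, beta1, a, b, sigma):
    """Conditional density p(y | x) of a softmax-gated Gaussian mixture of experts.

    beta0: (K,) gating intercepts, beta1: (K, d) gating slopes,
    a: (K, d) and b: (K,) linear expert means, sigma: (K,) expert std devs.
    """
    logits = beta1 @ x + beta0
    logits = logits - logits.max()                 # stabilise the softmax
    gates = np.exp(logits) / np.exp(logits).sum()  # softmax gating weights
    means = a @ x + b                              # expert regression means
    comps = np.exp(-0.5 * ((y - means) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return float(gates @ comps)

rng = np.random.default_rng(0)
K, d = 3, 2
beta0, beta1 = rng.normal(size=K), rng.normal(size=(K, d))
a, b, sigma = rng.normal(size=(K, d)), rng.normal(size=K), np.array([0.5, 1.0, 1.5])
x = rng.normal(size=d)

# Shift all gating parameters by a common translation: every logit moves by
# t1 @ x + t0, which the softmax cancels.
t0, t1 = 2.0, np.array([1.0, -3.0])
p_orig = sgmoe_density(0.3, x, beta0, beta1, a, b, sigma)
p_shift = sgmoe_density(0.3, x, beta0 + t0, beta1 + t1, a, b, sigma)
```

Because of this invariance, gating parameters are identifiable only up to a common translation. Losses and convergence rates therefore have to be stated modulo such shifts.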

2510.09783 2026-06-09 cs.LG cs.AI stat.ML Updated

Large Language Models for Imbalanced Classification: Diversity makes the difference

Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, Taylor Braund, Alexis Whitton, Svetha Venkatesh

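AI summary: Proposes an LLM-based oversampling method that increases diversity through conditional sampling, permutation-based fine-tuning, and interpolated samples. It outperforms 8 baseline methods on 10 tabular datasets.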

English abstract

Oversampling is one of the most widely used approaches for addressing imbalanced classification. The core idea is to generate additional minority samples to rebalance the dataset. Most existing methods, such as SMOTE, require converting categorical variables into numerical vectors, which often leads to information loss. Recently, large language model (LLM)-based methods have been introduced to overcome this limitation. However, current LLM-based approaches typically generate minority samples with limited diversity, reducing robustness and generalizability in downstream classification tasks. To address this gap, we propose a novel LLM-based oversampling method designed to enhance diversity. First, we introduce a sampling strategy that conditions synthetic sample generation on both minority labels and features. Second, we develop a new permutation strategy for fine-tuning pre-trained LLMs. Third, we fine-tune the LLM not only on minority samples but also on interpolated samples to further enrich variability. Extensive experiments on 10 tabular datasets demonstrate that our method significantly outperforms eight SOTA baselines. The generated synthetic samples are both realistic and diverse. Moreover, we provide theoretical analysis through an entropy-based perspective, proving that our method encourages diversity in the generated samples.
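The abstract notes that SMOTE-style methods need numeric vectors. The minimal sketch of the classic SMOTE interpolation below shows why. Each synthetic point is a convex combination of a minority sample and one of its nearest minority neighbours, and that operation is undefined for raw categorical values. The function name and parameters are our own, and this is not the paper's LLM-based method or its interpolation scheme.

```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=None):
    """SMOTE-style oversampling on numeric features: interpolate between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                   # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]     # k nearest minority neighbours
    out = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                        # random minority anchor
        nb = neighbours[i, rng.integers(k)]        # random neighbour of the anchor
        lam = rng.random()                         # position along the segment
        out[j] = X_min[i] + lam * (X_min[nb] - X_min[i])
    return out

X_min = np.random.default_rng(0).normal(size=(12, 3))
synthetic = smote_like(X_min, n_new=40, k=5, seed=1)
```

Every synthetic row lies on a segment between two existing minority points. That limits diversity, which is the weakness the abstract's diversity-oriented generation aims to address.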

2502.15131 2026-06-09 math.ST cs.LG stat.ME stat.ML stat.TH Updated

Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling

Yufan Li, Pragya Sur

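AI summary: For linear binary classifiers with high-dimensional Gaussian features, the paper proposes angular calibration, which is based on the angle between the estimated and true weights. It proves that this method is calibrated and uniquely Bregman-optimal, and shows that Platt scaling converges to this optimal solution in high dimensions.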

English abstract

We study the fundamental problem of calibrating a linear binary classifier of the form $σ(\hat{w}^\top x)$, where the feature vector $x$ is Gaussian, $σ$ is a link function, and $\hat{w}$ is an estimator of the true linear weight $w_\star$. By interpolating with a noninformative $\textit{chance classifier}$, we construct a well-calibrated predictor whose interpolation weight depends on the angle $\angle(\hat{w}, w_\star)$ between the estimator $\hat{w}$ and the true linear weight $w_\star$. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the numbers of samples and features both diverge at comparable rates. The angle $\angle(\hat{w}, w_\star)$ can be consistently estimated. Furthermore, the resulting predictor is uniquely $\textit{Bregman-optimal}$, minimizing the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that provably satisfies both calibration and optimality properties in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution. Thus, Platt scaling also provably inherits these desirable properties in high dimensions.
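For reference, classical Platt scaling fits a two-parameter logistic map $p(y=1\mid s)=\mathrm{sigmoid}(a s+b)$ to the raw scores $s=\hat{w}^\top x$ on held-out data. Below is a minimal maximum-likelihood version using Newton's method, run on synthetic scores drawn from a known $(a,b)$. Platt's original recipe additionally smooths the 0/1 targets, which is omitted here. The angular calibration predictor itself is not reproduced.

```python
import numpy as np

def platt_scale(scores, labels, n_iter=25):
    """Fit p(y = 1 | s) = sigmoid(a * s + b) by maximum likelihood (Newton's method).

    Plain MLE variant; Platt's original recipe also smooths the 0/1 targets.
    """
    S = np.column_stack([scores, np.ones_like(scores)])
    theta = np.zeros(2)                            # (a, b)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-S @ theta))
        grad = S.T @ (labels - p)                  # gradient of the log-likelihood
        H = S.T @ (S * (p * (1 - p))[:, None])     # negative Hessian
        theta = theta + np.linalg.solve(H, grad)   # Newton step
    return theta

rng = np.random.default_rng(0)
s = rng.normal(size=20_000)                        # stand-in for raw scores w_hat^T x
y = (rng.random(20_000) < 1 / (1 + np.exp(-(2.0 * s - 0.5)))).astype(float)
a, b = platt_scale(s, y)
```

On this synthetic data the fitted $(a, b)$ should land close to the generating values $(2, -0.5)$.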

2509.24467 2026-06-09 cs.LG stat.ML Updated

Interpretable Self-Supervised Learning via Representer Landmarks and Nyström Approximation

Maedeh Zarvandi, Michael Timothy, Theresa Wasserer, Debarghya Ghoshdastidar

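AI summary: Proposes KREPES, a framework that uses Representer Landmarks and the Nyström approximation to interpret the representations learned by self-supervised objectives (SimCLR, BYOL, VICReg). It also introduces new metrics to quantify the transparency of those representations.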

Comments
24 pages, 10 figures. Accepted to the 43rd International Conference on Machine Learning (ICML 2026)
English abstract

Self-supervised learning (SSL) learns representations from massive unlabeled data, yet the resulting models typically operate as black boxes, necessitating domain-specific explanations. We introduce KREPES, a unified framework to analytically interpret the learned representations of SSL objectives, including SimCLR, BYOL, and VICReg. By bridging empirical neural tangent kernel approximations of neural networks with the Representer Theorem for kernels, we express the learned latent space directly via "Representer Landmarks", which are the representations of influential unlabeled training examples. We introduce novel metrics, "Sample-Specific Influence Score", "Concept-Conditioned Influence Score" and "Feature Alignment Gap", to quantify the transparency of the learned representations. KREPES enables direct audit of the latent space without supervision, for example, revealing an algorithmic bias in the Adult-1M dataset where SSL uses demographic proxies for income. Finally, to ensure scalability to benchmarks with 1M+ samples (ImageNet-1K, Adult-1M), KREPES introduces a novel Nyström approximation-based analytical inference framework for SSL objectives.
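The scalability step relies on the Nyström approximation. The generic version approximates an $n\times n$ kernel matrix from $m$ landmark points as $K\approx C W^{+} C^\top$, where $C$ is the $n\times m$ cross-kernel and $W$ is the $m\times m$ landmark kernel. The sketch below uses an RBF kernel and first-$m$ landmarks, both of which are illustrative assumptions. It shows only this textbook construction, not KREPES's NTK-based inference framework.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(X, landmark_idx, gamma=0.5):
    """Nystrom approximation K ~= C W^+ C^T built from a subset of landmark points."""
    L = X[landmark_idx]
    C = rbf_kernel(X, L, gamma)          # n x m cross-kernel
    W = rbf_kernel(L, L, gamma)          # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_kernel(X, X)
K_full = nystrom(X, np.arange(20))       # every point is a landmark: exact
K_5 = nystrom(X, np.arange(5))           # 5 landmarks: rank-5 approximation
```

With all points as landmarks the approximation is exact. With fewer landmarks, $K - C W^{+} C^\top$ is a Schur complement and therefore positive semidefinite, so the approximation never overestimates the kernel in any direction.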

2506.01052 2026-06-09 cs.LG math.OC stat.ML Updated

A Robust $\widetilde{\mathcal{O}}(1/\sqrt{T})$ Rate for Unprojected TD Learning with Linear Function Approximation

Wei-Cheng Lee, Francesco Orabona

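AI summary: For temporal difference learning with linear function approximation, the paper proves an expected convergence rate of $\widetilde{\mathcal{O}}(\|\theta^*\|^2_2/\sqrt{T})$ without projection. The result requires only a minor logarithmic correction to the learning rate and no additional regularity conditions.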

English abstract

We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone of reinforcement learning. We are interested in the so-called "robust" setting, where the convergence guarantee does not depend on the potential function's minimal curvature. While prior work has established convergence guarantees in this setting, these results typically rely on the artificial assumption that each iterate is projected onto a bounded set. Removing such a condition was left as an open problem by Bhandari et al. (COLT'18), who hypothesized the need for additional "regularity conditions". In this paper, we show that the simple unprojected TD(0) converges with a rate of $\widetilde{\mathcal{O}}\left(\frac{\|θ^*\|^2_2}{\sqrt{T}}\right)$ in expectation, even in the presence of Markovian noise. We do not require an additional regularity condition, but only a minor polylog correction to the learning rate. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
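To fix ideas, here is unprojected TD(0) with linear features, $\theta \leftarrow \theta + \alpha\,(r_t + \gamma\,\phi(s_{t+1})^\top\theta - \phi(s_t)^\top\theta)\,\phi(s_t)$, with no projection step. It runs on a hypothetical 3-state chain whose next state is drawn uniformly at random, and compares the averaged iterate with the TD fixed point $A\theta^*=b$, where $A=\Phi^\top D(\Phi-\gamma P\Phi)$ and $b=\Phi^\top D r$. The constant step size and iterate averaging are our simplifications and do not follow the paper's polylog-corrected schedule.

```python
import numpy as np

def td0_linear(phi, rewards, states, gamma, alpha):
    """Unprojected TD(0) with linear features along one trajectory.

    phi: (S, d) feature matrix; rewards: (S,) reward received in each state;
    states: visited state indices s_0, ..., s_T. Returns every iterate.
    """
    theta = np.zeros(phi.shape[1])
    iterates = np.empty((len(states) - 1, phi.shape[1]))
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        td_error = rewards[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta = theta + alpha * td_error * phi[s]      # no projection onto a bounded set
        iterates[t] = theta
    return iterates

# Hypothetical chain: 3 states, next state uniform, features cannot represent V exactly.
phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rewards = np.array([1.0, 0.0, -1.0])
gamma = 0.5
rng = np.random.default_rng(0)
states = rng.integers(0, 3, size=100_001)
iterates = td0_linear(phi, rewards, states, gamma, alpha=0.01)
theta_bar = iterates[50_000:].mean(axis=0)        # average the second half

# TD fixed point A theta* = b; the stationary distribution is uniform here.
P = np.full((3, 3), 1 / 3)
D = np.diag(np.full(3, 1 / 3))
A = phi.T @ D @ (phi - gamma * P @ phi)
b = phi.T @ D @ rewards
theta_star = np.linalg.solve(A, b)
```

For this chain the fixed point works out to $\theta^*=(0.2,-0.8)$. The averaged iterate should sit close to it, even though nothing in the loop keeps $\theta$ inside a bounded set.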

2508.10888 2026-06-09 stat.ML math.MG Updated

Conic Formulations of Transport Metrics for Unbalanced Measure Networks and Hypernetworks

Mary Chriselda Antony Oliver, Emmanuel Hartman, Tom Needham

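AI summary: Proposes a new semi-coupling formulation of the Conic Gromov-Wasserstein distance and extends it to the comparison of networks and hypernetworks. The paper proves scaling, convergence, and robustness properties and designs a block coordinate ascent algorithm.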

Comments
41 pages, 6 figures
English abstract

The Gromov-Wasserstein (GW) variant of optimal transport, designed to compare probability densities defined over distinct metric spaces, has emerged as an important tool for the analysis of data with complex structure, such as ensembles of point clouds or networks. To overcome certain limitations, such as the restriction to comparisons of measures of equal mass and sensitivity to outliers, several unbalanced or partial transport relaxations of the GW distance have been introduced in the recent literature. This paper is concerned with the Conic Gromov-Wasserstein (CGW) distance introduced by Séjourné, Vialard, and Peyré. We provide a novel formulation in terms of semi-couplings, and extend the framework beyond the metric measure space setting, to compare more general network and hypernetwork structures. With this new formulation, we establish several fundamental properties of the CGW metric, including its scaling behavior under dilation, variational convergence in the limit of volume growth constraints, and comparison bounds with established optimal transport metrics. We further derive quantitative bounds that characterize the robustness of the CGW metric to perturbations in the underlying measures. The hypernetwork formulation of CGW admits a simple and provably convergent block coordinate ascent algorithm for its estimation, and we demonstrate the computational tractability and scalability of our approach through experiments on synthetic and real-world high-dimensional and structured datasets.
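To fix notation, the balanced (squared-loss) Gromov-Wasserstein cost of a coupling $\pi$ between spaces with pairwise-dissimilarity matrices $C_1$ and $C_2$ is $\sum_{i,j,k,l}(C_1[i,k]-C_2[j,l])^2\,\pi_{ij}\pi_{kl}$. The GW distance minimises this cost over couplings, up to convention-dependent constants and a square root. The sketch below only evaluates the cost for a fixed coupling. The conic, semi-coupling relaxation studied in the paper is not implemented.

```python
import numpy as np

def gw_objective(C1, C2, pi):
    """Squared-loss GW cost of coupling pi:
    sum_{i,j,k,l} (C1[i, k] - C2[j, l])**2 * pi[i, j] * pi[k, l]."""
    diff = C1[:, None, :, None] - C2[None, :, None, :]   # diff[i, j, k, l]
    return float(np.einsum("ijkl,ij,kl->", diff ** 2, pi, pi))

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 2))
C1 = np.linalg.norm(pts[:, None] - pts[None], axis=-1)   # pairwise distances
perm = rng.permutation(5)
C2 = C1[np.ix_(perm, perm)]          # same space, points relabelled: C2[j, l] = C1[perm[j], perm[l]]

pi_match = np.zeros((5, 5))
pi_match[perm, np.arange(5)] = 1 / 5                     # pair point perm[j] with point j
pi_indep = np.full((5, 5), 1 / 25)                       # product coupling
```

The matching coupling aligns the two copies isometrically and has zero cost. The product coupling ignores the geometry and has positive cost. No coordinates are compared directly, which is why GW can relate distinct metric spaces.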

2506.04480 2026-06-09 stat.ML cs.LG stat.ME Updated

On the Wasserstein Geodesic Principal Component Analysis of probability measures

Nina Vesseron, Elsa Cazelles, Alice Le Brigant, Thierry Klein

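AI summary: Uses the Otto-Wasserstein geometry to perform geodesic principal component analysis on collections of probability distributions, identifying geodesics in the space of probability measures that capture the data's modes of variation. Computational methods are given for Gaussian distributions and for absolutely continuous probability measures.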

English abstract

This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of the underlying dataset. We first address the case of a collection of Gaussian distributions, and show how to lift the computations to the space of invertible linear maps. For the more general setting of absolutely continuous probability measures, we leverage a novel approach to parameterizing geodesics in Wasserstein space with neural networks. Finally, we compare to classical tangent PCA through various examples and provide illustrations on real-world datasets.
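For Gaussians the Wasserstein geometry has closed forms, which helps explain why linear maps enter the picture. The squared 2-Wasserstein distance is $\|m_1-m_2\|^2+\operatorname{tr}\big(S_1+S_2-2(S_2^{1/2}S_1S_2^{1/2})^{1/2}\big)$. The optimal map is affine with linear part $T=S_1^{-1/2}(S_1^{1/2}S_2S_1^{1/2})^{1/2}S_1^{-1/2}$, and the geodesic pushes $N(m_1,S_1)$ through $(1-t)I+tT$. The sketch below implements these standard formulas, assuming $S_1$ is positive definite. It is not the authors' GPCA algorithm.

```python
import numpy as np

def sqrtm_psd(S):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def w2_gaussian(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r2 = sqrtm_psd(S2)
    cross = sqrtm_psd(r2 @ S1 @ r2)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))

def gaussian_geodesic(m1, S1, m2, S2, t):
    """Point at time t on the Wasserstein geodesic from N(m1, S1) to N(m2, S2),
    obtained by pushing N(m1, S1) through the linear map (1 - t) I + t T."""
    r1 = sqrtm_psd(S1)
    r1_inv = np.linalg.inv(r1)
    T = r1_inv @ sqrtm_psd(r1 @ S2 @ r1) @ r1_inv   # linear part of the optimal map
    L = (1 - t) * np.eye(len(m1)) + t * T
    return (1 - t) * m1 + t * m2, L @ S1 @ L.T

rng = np.random.default_rng(0)
G1, G2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
S1, S2 = G1 @ G1.T + 0.5 * np.eye(2), G2 @ G2.T + 0.5 * np.eye(2)
m1, m2 = np.zeros(2), np.array([1.0, -1.0])
m_end, S_end = gaussian_geodesic(m1, S1, m2, S2, 1.0)   # should recover N(m2, S2)
m_t, S_t = gaussian_geodesic(m1, S1, m2, S2, 0.3)
```

Geodesics move at constant speed, so the squared distance from the start to the point at time $t$ equals $t^2$ times the squared endpoint distance. In 1D, the midpoint between $N(0,1)$ and $N(0,4)$ has standard deviation $1.5$.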

2412.16457 2026-06-09 stat.ML cs.DS cs.LG math.PR math.ST stat.TH Updated

Robust Random Graph Matching in Dense Graphs via an Approximate Message Passing Type Algorithm

Zhangsong Li

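AI summary: For a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence, the paper proposes an approximate message passing type iterative algorithm. The algorithm recovers the matching in polynomial time under adversarial perturbations of size up to $n^{1-o(1)}$.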

Comments
46 pages; accepted by IEEE Trans. Inf. Theory
English abstract

In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation is a perturbed input $(A+E,B+F)$ where $(A,B)$ is a pair of correlated Gaussian Wigner matrices and $E,F$ are adversarially chosen matrices supported on an unknown $εn \times εn$ principal minor of $A,B$, respectively. We propose an approximate message passing (AMP) type iterative algorithm that succeeds in polynomial time as long as the correlation $ρ$ between $(A,B)$ is a non-vanishing constant and $ε= o\big( \tfrac{1}{(\log n)^{20}} \big)$. A key distinction from standard AMP is the introduction of a time-dependent matrix multiplication step within the iteration, which simultaneously enlarges the feature dimension and cancels the correlation during the iteration. The main methodological inputs for our result are the iterative random graph matching algorithm proposed in \cite{DL22+, DL23+} and the spectral preprocessing procedure proposed in \cite{IS24+}. To the best of our knowledge, our algorithm is the first efficient random graph matching type algorithm that is robust under any adversarial perturbations of $n^{1-o(1)}$ size.
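The unperturbed input can be simulated under one common parameterisation of the model, which we assume here. $A$ and an independent $Z$ are GOE-type matrices with off-diagonal variance $1/n$, and $B_{\pi(i)\pi(j)}=\rho A_{ij}+\sqrt{1-\rho^2}\,Z_{ij}$ for a hidden permutation $\pi$. The adversarial blocks $E,F$ and the AMP-type matching algorithm are not modelled. The sketch only shows what "latent vertex correspondence" means in practice.

```python
import numpy as np

def correlated_wigner_pair(n, rho, rng):
    """Sample (A, B, perm) with B[perm[i], perm[j]] = rho * A[i, j] + sqrt(1 - rho**2) * Z[i, j],
    where A and Z are independent symmetric Gaussian matrices (off-diagonal variance 1/n)."""
    def goe():
        G = rng.normal(size=(n, n))
        return (G + G.T) / np.sqrt(2 * n)
    A, Z = goe(), goe()
    perm = rng.permutation(n)                      # hidden vertex correspondence
    B = np.empty((n, n))
    B[np.ix_(perm, perm)] = rho * A + np.sqrt(1 - rho ** 2) * Z
    return A, B, perm

rng = np.random.default_rng(0)
A, B, perm = correlated_wigner_pair(200, 0.8, rng)
iu = np.triu_indices(200, k=1)
B_aligned = B[np.ix_(perm, perm)]                  # undo the hidden relabelling
corr_true = np.corrcoef(A[iu], B_aligned[iu])[0, 1]
corr_naive = np.corrcoef(A[iu], B[iu])[0, 1]       # comparing without the permutation
```

Edge weights are strongly correlated only after the hidden permutation is undone. Without it, the entrywise correlation is essentially zero, so the matching must be recovered from structure alone.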

2405.17823 2026-06-09 stat.ML cs.LG math.OA Updated

Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines

Yuka Hashimoto, Ayoub Hafid, Masahiro Ikeda, Hachem Kadri

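AI summary: Proposes spectral truncation kernels, built on spectral truncation and $C^*$-algebras. By allowing noncommutative products, the kernels model interactions across the function domain, fill the gap between separable and commutative kernels, and reduce computational cost.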

English abstract

A central question in vector- and function-valued learning is how to design kernels that capture both local and non-local interactions while remaining computationally tractable. Existing operator-valued kernels offer only partial answers: separable kernels are efficient but fail to model interactions across the function domain, while commutative kernels capture only pointwise structure. To address this, we propose spectral truncation kernels, a new class of positive definite kernels for vector- and function-valued learning based on spectral truncation and $C^*$-algebra. By allowing noncommutative products in the kernel construction, the proposed kernels induce interactions across the data function domain and fill the gap between existing separable and commutative kernels. In addition, by using the $C^*$-algebraic framework, we reduce the computational cost compared to the existing vector-valued RKHS framework with operator-valued kernels.

2402.13425 2026-06-09 cs.LG cs.AI stat.ML Updated

Investigating the Histogram Loss in Regression

Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White

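AI summary: Through theoretical and empirical analysis, the paper investigates why the Histogram Loss improves regression performance. It finds that the benefit comes from improved optimization rather than from modeling extra information, and demonstrates the loss's effectiveness in common deep learning applications.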

Journal ref
JMLR, 2026
Comments
52 pages
English abstract

It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction. This additional modeling often comes with performance gain and the reasons behind the improvement are not fully known. This paper investigates a recent approach to regression, the Histogram Loss, which involves learning the conditional distribution of the target variable by minimizing the cross-entropy between a target distribution and a flexible histogram prediction. We design theoretical and empirical analyses to determine why and when this performance gain appears, and how different components of the loss contribute to it. Our results suggest that the benefits of learning distributions in this setup come from improvements in optimization rather than modelling extra information. We then demonstrate the viability of the Histogram Loss in common deep learning applications without a need for costly hyperparameter tuning.
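A minimal sketch of a histogram loss follows. The network outputs logits over fixed bins, and the target is a histogram, assumed here to be a Gaussian of width $\sigma$ centred at the label and integrated over each bin. This Gaussian target is one common choice, sometimes called HL-Gaussian. The loss is the cross-entropy between the target and the softmax prediction, and the point prediction is the mean over bin centres. Bin layout and $\sigma$ are hypothetical choices.

```python
import numpy as np
from math import erf, sqrt

def gaussian_histogram_target(y, edges, sigma):
    """Bin probabilities of N(y, sigma**2) over `edges`, renormalised to the bin range."""
    cdf = np.array([0.5 * (1.0 + erf((e - y) / (sqrt(2.0) * sigma))) for e in edges])
    p = np.diff(cdf)
    return p / p.sum()

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def histogram_loss(logits, target):
    """Cross-entropy between the target histogram and the predicted softmax histogram."""
    return float(-(target * log_softmax(logits)).sum())

def histogram_mean(logits, edges):
    """Point prediction: mean of the predicted histogram, using bin centres."""
    centres = 0.5 * (edges[:-1] + edges[1:])
    return float(np.exp(log_softmax(logits)) @ centres)

edges = np.linspace(-1.0, 1.0, 21)           # 20 bins on [-1, 1]
target = gaussian_histogram_target(0.0, edges, sigma=0.3)
```

The loss is minimised when the predicted histogram equals the target, and the mean of that histogram recovers the label. In practice, gradients flow through all bins rather than through a single squared error.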

2407.01718 2026-06-09 stat.ML cs.LG math.ST stat.TH Updated

Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets

Boris Landa, Yuval Kluger, Rong Ma

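AI summary: Proposes Entropic Optimal Transport (EOT) eigenmaps, which align and jointly embed two datasets using the singular vectors of the EOT plan matrix, with theoretical guarantees. The paper proves concentration results under a generative model and demonstrates advantages on simulated and real biological data.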

English abstract

Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studies or experimental conditions. Such datasets may share underlying structures of interest but exhibit individual distortions, resulting in misaligned embeddings using traditional techniques. In this work, we propose Entropic Optimal Transport (EOT) eigenmaps, a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure and align them in a common embedding space. We interpret our approach as an inter-data variant of the classical Laplacian eigenmaps and diffusion maps embeddings, showing that it enjoys many favorable analogous properties. We analyze a generative model in which two observed high-dimensional datasets share latent variables supported on a common low-dimensional manifold, while each dataset is subject to translation, geometric distortion, orthogonal nuisance structure, and noise. In a large-sample, high-dimensional regime, we prove that the EOT plan concentrates around a population kernel on an effective manifold determined by the geometric mean of the distortions, with invariance to translations, orthogonal nuisance structure, and noise. Subsequently, we relate our embedding to eigenfunctions of population-level operators encoding the density and geometry of the shared manifold. Finally, we showcase the performance of our approach for data integration and embedding through simulations and analyses of real-world biological data, demonstrating its advantages over alternative methods in challenging scenarios.
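The core computation can be sketched as follows. Compute an entropic OT plan between uniform measures on the two point clouds with plain Sinkhorn scaling, then embed both datasets with the leading singular vectors of the rescaled plan $\sqrt{nm}\,P$. For uniform marginals, that rescaled plan has a top singular value of 1 with constant singular vectors, which carry no information. We therefore skip that pair, in analogy with diffusion maps. The squared Euclidean cost, $\varepsilon$, the rescaling, and the singular-value weighting are our assumptions, not the authors' exact normalisation.

```python
import numpy as np

def sinkhorn_plan(X, Y, eps=2.0, n_iter=2000):
    """Entropic OT plan between uniform measures on the rows of X and Y,
    with squared Euclidean cost, via plain Sinkhorn scaling."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)                                 # Gibbs kernel
    a, b = np.full(len(X), 1 / len(X)), np.full(len(Y), 1 / len(Y))
    u, v = np.ones(len(X)), np.ones(len(Y))
    for _ in range(n_iter):
        u = a / (K @ v)                                  # match row marginals
        v = b / (K.T @ u)                                # match column marginals
    return u[:, None] * K * v[None, :]

def eot_embedding(X, Y, dim=2, eps=2.0):
    """Embed both clouds with singular vectors of sqrt(n m) * P, skipping the
    leading constant pair and weighting coordinates by singular values."""
    P = sinkhorn_plan(X, Y, eps)
    n, m = P.shape
    U, svals, Vt = np.linalg.svd(np.sqrt(n * m) * P)
    emb_X = U[:, 1:dim + 1] * svals[1:dim + 1]
    emb_Y = Vt[1:dim + 1].T * svals[1:dim + 1]
    return emb_X, emb_Y, P, svals

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
Y = rng.normal(size=(50, 2)) + 0.5                       # hypothetical shifted copy
emb_X, emb_Y, P, svals = eot_embedding(X, Y)
```

Because the plan couples every point of one dataset to points of the other, both embeddings come from a single SVD and land in a shared coordinate system.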

2402.08922 2026-06-09 cs.LG stat.ML Updated

The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes

Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia

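AI summary: Proposes the Mirrored Influence Hypothesis, which recasts the influence of training data on test predictions as an inverse problem. Data influence is then estimated efficiently from gradients on test samples combined with forward passes over the training samples.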

Journal ref
The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
English abstract

Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models. In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches. We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.
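A toy reading of the "mirrored" computation for logistic regression follows. It takes one gradient step on a test example (the only backward pass) and then scores every training example by how much its loss changes, using forward passes only. This is our own simplified illustration of the hypothesis, not the authors' Forward-INF method. The function name, step size, and data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_log_loss(theta, X, y):
    p = sigmoid(X @ theta)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def mirrored_influence_scores(theta, X_train, y_train, x_test, y_test, lr=0.1):
    """Hypothetical forward-pass influence proxy: one gradient step on the test
    example, then re-evaluate every training loss. Negative = training loss went down."""
    g = (sigmoid(x_test @ theta) - y_test) * x_test   # test-loss gradient (one backward pass)
    theta_new = theta - lr * g                        # "train on the test sample"
    return (per_example_log_loss(theta_new, X_train, y_train)
            - per_example_log_loss(theta, X_train, y_train))   # forward passes only

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = (rng.random(100) < 0.5).astype(float)
theta = rng.normal(size=3)
x_test, y_test = X_train[7], y_train[7]               # test point duplicated in training set
scores = mirrored_influence_scores(theta, X_train, y_train, x_test, y_test)
```

The cost is a single gradient for the test point plus one cheap forward pass per training point. This reflects the asymmetry the abstract exploits when there are far fewer test samples than training samples.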

2410.14949 2026-06-09 cs.LG stat.ML

On the Convergence and Straightness of Rectified Flow

Vansh Bansal, Saptarshi Roy, Alessandro Rinaldo, Purnamrita Sarkar

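AI summary: Introduces a Piecewise Straightness parameter $\gamma_{2,T}$ and establishes the first Wasserstein convergence bound that links the discretization error of flow models to $\gamma_{2,T}$. The paper shows that minimizing curvature is the key to high-fidelity one-step sampling and provides a theoretical framework for analyzing the straightness of Rectified Flow (RF).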

Comments
37 pages
English abstract

Flow Matching has become a cornerstone of modern generative models like Stable Diffusion 3, largely due to the efficiency of its Rectified Flow (RF) variant. The success of RF hinges on iteratively learning straight trajectories, pushing generation towards fewer sampling steps. However, the theoretical link between path geometry and sampling efficiency has been underexplored. This paper fills this gap by introducing a novel \textit{Piecewise Straightness} parameter, $γ_{2,T}$. We establish the first Wasserstein convergence bound that explicitly links the discretization error of \textit{any} general flow-model to $γ_{2,T}$, proving that minimizing curvature is the key to achieving high-fidelity, one-step sampling. Building on this theory, we establish the first theoretical framework to analyze the straightness of RF. We begin by offering intuitive geometric arguments for simple cases before identifying sufficient conditions under which a single rectification step (1-RF) yields a perfectly straight or even a Monge optimal coupling. While whether these sufficient conditions are met depends on the problem geometry, they enable the first concrete proofs in this area. Critically, fulfilling these conditions makes the subsequent flow (2-RF) perfectly straight ($γ_{2,T}=0$). This eliminates the discretization error in our bound and makes flawless, single-step sampling possible.
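A 1D Gaussian example makes the link between straightness and one-step sampling concrete. The example is our own illustration and assumes $X_0\sim N(0,1)$ and $X_1\sim N(m,s^2)$ coupled independently. In that case the 1-RF velocity $\mathbb{E}[X_1-X_0\mid X_t=x]$ is linear in $x$. Because ODE flows in 1D preserve order, the exact flow map equals the increasing (Monge-optimal) map $x\mapsto m+sx$. The same velocity field is curved, however, and a single Euler step collapses every sample to $m$.

```python
import numpy as np

def rf_velocity(x, t, m, s):
    """1-RF velocity E[X1 - X0 | X_t = x] for X0 ~ N(0, 1), X1 ~ N(m, s**2) independent."""
    var_t = (1 - t) ** 2 + (t * s) ** 2          # Var(X_t)
    cov = t * s ** 2 - (1 - t)                   # Cov(X1 - X0, X_t)
    return m + cov / var_t * (x - t * m)

def euler(x0, n_steps, m, s):
    """Integrate dx/dt = rf_velocity(x, t) from t = 0 to 1 with explicit Euler."""
    x, h = np.array(x0, dtype=float), 1.0 / n_steps
    for i in range(n_steps):
        x = x + h * rf_velocity(x, i * h, m, s)
    return x

m, s = 2.0, 0.5
x0 = np.linspace(-2, 2, 9)
monge = m + s * x0                       # increasing map N(0, 1) -> N(m, s**2)
fine = euler(x0, 2000, m, s)             # accurate solution of the 1-RF ODE
one_step = euler(x0, 1, m, s)            # one Euler step on the same curved flow
```

Rectifying once more would pair each $x_0$ with $m+sx_0$. Those paths are straight lines, so in this example a single Euler step on the 2-RF flow would be exact. This matches the abstract's claim that a straight 2-RF removes the discretization error.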

2401.14591 2026-06-09 cs.LG stat.ML

Ricci flow regularization in latent spaces for the forward learning of partial differential equations

Andrew Gracyk

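AI summary: This paper proposes a manifold-based machine learning encoder-decoder method that learns temporal dynamics, particularly partial differential equations, by evolving the latent space according to Ricci flow. By parameterizing the latent manifold and simulating Ricci flow under physics-informed constraints, the method achieves low-dimensional representation learning and adversarial robustness.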

Comments
Fixed a small error in appendix; some improvements to experiments
Abstract

We present a manifold-based machine learning encoder-decoder method for learning dynamics in time, notably partial differential equations (PDEs), in which the manifold latent space evolves according to Ricci flow. This can be accomplished by parameterizing the latent manifold stage and subsequently simulating Ricci flow in a physics-informed setting, matching manifold quantities so that Ricci flow is empirically achieved. We emphasize dynamics that admit low-dimensional representations. With our method, the manifold, induced by the metric, is discerned through the training procedure, while the latent evolution due to Ricci flow provides an accommodating representation. By use of this flow, we sustain a canonical manifold latent representation for all values in the ambient PDE time interval continuum. We showcase that the Ricci flow facilitates qualities such as learning for out-of-distribution data and adversarial robustness on select PDE data. Moreover, we provide a thorough expansion of our methods in regard to special cases which allow higher-dimensional representations, such as Ricci flow on the hypersphere and neural discovery of non-parametric geometric flows with entropic strategies.

2205.01970 2026-06-09 cs.LG stat.ML

Non-Stationary Bandit Learning via Predictive Sampling

Yueyang Liu, Xu Kuang, Benjamin Van Roy

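AI summary: This paper proposes predictive sampling, an algorithm that improves bandit learning in non-stationary environments by distinguishing actions according to how quickly the information they yield loses its usefulness; it proves a performance guarantee and demonstrates effectiveness in complex environments.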

Abstract

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We show that such failures arise because, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to nonstationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. A theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulations, we demonstrate that predictive sampling outperforms Thompson sampling in all non-stationary environments examined.

8. Biostatistics and Medical Statistics: 8 papers

2606.09089 2026-06-09 stat.ME stat.CO New submission

Supervised Low-Rank Structure Discovery for Developmental Epigenetic Aging in Ultra-High-Dimensional DNA Methylation Data

Priyam Das, Jiyeon Song, Lathika Mohanraj, Karolina A. Aberg, Yi Li, Subharup Guha

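AI summary: Proposes SOLAR, a framework combining orthogonal low-rank regression with adaptive penalization that automatically identifies CpG methylation structure associated with residualized DNAm age in ultra-high-dimensional data, with scalability and theoretical guarantees.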

Abstract

Ultra-high-dimensional array-based CpG methylation studies require statistical frameworks that simultaneously provide supervised structure discovery, interpretability, scalable latent-dimension identification, and computational feasibility. We propose SOLAR (Supervised Orthogonal Low-rank Adaptive Regression), a supervised low-rank latent-factor framework for identifying CpG-level methylation structure associated with residualized DNAm age. SOLAR combines orthogonal low-rank regression with a penalized maximum a posteriori formulation, dimension-adaptive BIC-type penalization, and a trans-dimensional simulated-annealing strategy for automatic latent-rank selection, together with theoretical guarantees including identifiability, fixed-rank recovery, and rank-selection consistency under suitable regularity conditions. The framework additionally incorporates computationally and memory-efficient optimization strategies demonstrating scalability up to $p=10^7$, while analyses at $p=10^6$ remain feasible on standard desktop computing environments. Simulation studies demonstrate stable rank recovery, competitive supervised signal recovery, and strong scalability across moderate-, high-, and ultra-high-dimensional regimes. Using longitudinal EPIC-array CpG methylation data from the GUSTO birth cohort, comprising $n=1051$ methylation profiles collected across infancy and early childhood with approximately 860,000 assayed CpGs per sample, SOLAR identifies heterogeneous supervised methylation structure associated with residualized DNAm age beyond chronological age alone, together with biologically coherent CpG signatures and enrichment patterns.

2606.08966 2026-06-09 stat.ME New submission

Class Imbalance Corrections Failed to Enhance Discrimination, Model Calibration, and Prediction Stability: An Empirical Simulation Study Based on Clinical Dataset

Wachiranun Sirikul, Natthanaphop Isaradech, Wuttipat Kiratipaisarl, Pakpoom Wongyikul, Noraworn Jirattikanwong, Phichayut Phinyo

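AI summary: Simulating the development of clinical prediction models, this study finds that class imbalance corrections (such as resampling and algorithm-level adjustments) do not improve discrimination but instead cause miscalibration and prediction instability, and recommends against routine imbalance correction.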

Comments
47 pages
Abstract

Class imbalance is common when developing clinical prediction models (CPMs) and is often assumed to lead to poor predictive performance. Several methods have been proposed to correct data imbalance during CPM development. However, it remains unclear whether correcting class imbalance improves or harms CPM performance. This study investigated how imbalance correction affects classification performance and prediction stability. We simulated the development and internal validation of CPMs using penalised logistic regression under different imbalance-correction strategies, including algorithm-level rebalancing, data-level rebalancing by oversampling, and combined over- and under-sampling. The simulation dataset was derived from the GUSTO-I trial, which included 40,830 patients and 2,851 events. All imbalance-correction strategies were evaluated across sample-size scenarios ranging from 500 to 40,830. Model performance and prediction stability were assessed using 200 bootstrap resamples, including discrimination, calibration, calibration stability, mean absolute prediction error (MAPE), and classification instability index (CII). Class imbalance correction did not meaningfully improve model discrimination. Both data-level and algorithm-level correction led to miscalibration, risk overestimation, and increased prediction instability, as shown by prediction stability, MAPE, and CII plots, compared with models developed without correction. These findings suggest that class imbalance correction does not necessarily improve CPM performance and may compromise calibration and prediction stability. Class imbalance should not be treated as a pathology that automatically requires correction. In clinical prediction modelling, routine imbalance correction by default is generally not advisable.
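One textbook mechanism consistent with the risk overestimation reported here (the paper's own analysis may differ): if every event is given k times its weight, for example by replicating it k times, a correctly specified, unpenalised logistic model fitted to the resampled data converges in large samples to odds that are k times too large, so its intercept shifts by log k. The sketch below computes that shift using the GUSTO-I event counts quoted in the abstract. The helper names are hypothetical, and penalised fits such as those in the study will follow this relation only approximately.

```python
import math


def oversampled_risk(p, k):
    """Risk implied after giving every event k times its weight (odds scale by k)."""
    odds = k * p / (1.0 - p)
    return odds / (1.0 + odds)


def prior_correct(p_star, k):
    """Undo the oversampling shift by subtracting log(k) on the logit scale."""
    logit = math.log(p_star / (1.0 - p_star)) - math.log(k)
    return 1.0 / (1.0 + math.exp(-logit))


base_rate = 2851 / 40830           # GUSTO-I event rate quoted in the abstract (~7%)
k = (40830 - 2851) / 2851          # replication factor that balances the two classes
p_star = oversampled_risk(base_rate, k)
print(f"{base_rate:.3f} -> {p_star:.3f}")  # 0.070 -> 0.500
print(f"{prior_correct(p_star, k):.3f}")   # 0.070
```

A balanced training set therefore pushes a typical 7% risk towards 50% unless the predictions are recalibrated afterwards.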

2606.08693 2026-06-09 stat.AP stat.CO New submission

An exploration into how susceptibility distribution misspecifications impact epidemic forecasting

Ibrahim Mohammed, Chris Robertson, M. Gabriela M. Gomes

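AI summary: Comparing gamma and lognormal susceptibility distributions in an SEIR framework, the study finds that epidemic trajectories diverge markedly once heterogeneity is moderate or high, but that misspecification has only a small effect on forecasts, and recommends adopting heterogeneous models.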

Comments
18 pages, 8 figures, 4 tables
Abstract

Heterogeneous susceptibility models for epidemic dynamics preferentially assume that individual susceptibility follows a gamma distribution, which permits analytical reduction to a low-dimensional system. However, the true empirical distributional form in any given population is unknown. Here we investigate the consequences of misspecifying the susceptibility distribution by comparing gamma and lognormal specifications in a Susceptible-Exposed-Infectious-Removed (SEIR) framework. When both distributions are matched on mean and coefficient of variation ($\nu$), we find that their epidemic trajectories diverge once heterogeneity is moderate or high ($\nu\gtrsim 1$), with the lognormal producing a later, larger peak and a greater final size. We then assess the impact of distributional misspecification on statistical inference. Using synthetic datasets, we fit correctly specified and misspecified models by maximum likelihood. In a default scenario, where inference is based on simulated data for a single epidemic, both models can reproduce the data by compensating through correlated shifts in heterogeneity and intervention parameters. When inference is based on two simulated epidemics, however, this compensation may be reduced by known constraints on how parameters are related across epidemics. In these cases, the correctly specified model recovers all parameters accurately, while the misspecified model tends to give biased estimates. These inference biases propagate into forecasts, but predictions remain relatively accurate compared with homogeneous models, which, for instance, more than double peak incidence in scenarios where $\nu\approx 1$. We conclude that the deviations resulting from the susceptibility distribution misspecifications assessed here are minor, and we encourage the adoption of heterogeneous models in future epidemic forecasting.
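The analytical convenience of the gamma assumption can be sketched as follows. Assuming susceptibility is gamma-distributed with mean 1 and coefficient of variation $\nu$ at the outset, the mean susceptibility of those still susceptible equals $(S/S_0)^{\nu^2}$, so the infection term becomes $\beta S (S/S_0)^{\nu^2} I$ and the model stays four-dimensional. The forward-Euler sketch below uses hypothetical parameters ($R_0 = 3$, 5-day latency, 7-day infectious period) and no interventions; the lognormal case has no such closed form and is not shown.

```python
def seir_gamma(nu, r0=3.0, latent_days=5.0, infectious_days=7.0,
               days=365.0, dt=0.05, i0=1e-4):
    """SEIR fractions with gamma-distributed susceptibility (mean 1, CV nu).

    Integrates with forward Euler and returns (peak infectious fraction,
    cumulative fraction ever infected). nu = 0 gives the homogeneous model.
    """
    gamma = 1.0 / infectious_days
    beta = r0 * gamma          # R0 = beta / gamma when mean susceptibility is 1
    sigma = 1.0 / latent_days
    s, e, i, r = 1.0 - i0, 0.0, i0, 0.0
    s0, peak = s, i
    for _ in range(int(days / dt)):
        # Remaining susceptibles are, on average, less susceptible: (s/s0)**nu^2
        infections = beta * s * (s / s0) ** (nu ** 2) * i
        s, e, i, r = (s - dt * infections,
                      e + dt * (infections - sigma * e),
                      i + dt * (sigma * e - gamma * i),
                      r + dt * gamma * i)
        peak = max(peak, i)
    return peak, 1.0 - s


for nu in (0.0, 1.0):
    peak, size = seir_gamma(nu)
    print(f"nu={nu}: peak={peak:.3f}, final size={size:.3f}")
```

With $\nu = 1$ the final-size relation becomes $S_\infty = 1/(1 + R_0(1 - S_\infty))$, which for $R_0 = 3$ gives a final size of about 2/3 instead of roughly 0.94 in the homogeneous model.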

2606.08642 2026-06-09 stat.ME stat.AP New submission

A Practical Framework for Sensitivity Analysis in Externally Controlled Trials: An Illustration with a Bayesian Hybrid Evidence Synthesis Case Study

Xuemin Gu, Kitty Guo, Jane Zhang

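AI summary: Because sensitivity analysis of borrowing assumptions in externally controlled trials lacks a structured template, the paper proposes a three-pillar framework with eight modular analyses and demonstrates its application with a simulated-data example.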

Abstract

Externally controlled trials (ECTs), including single-arm studies augmented with historical data and hybrid randomized designs with partial external augmentation, are increasingly used when concurrent randomized controls are infeasible or unethical. Regulatory guidance from the FDA, EMA, and NMPA calls for sensitivity analysis of borrowing assumptions, yet provides no structured template for which analyses to run or how to interpret them together. We propose a three-pillar framework organized around three questions: was the borrowing appropriate, did it contribute meaningful value, and are the conclusions robust to perturbation? The framework comprises eight modular analyses covering heterogeneity diagnostics, source influence, no-borrowing references, effective sample size, prior sensitivity, tipping points, alternative borrowing methods, and structural model sensitivity. It is method-agnostic and applies to both Bayesian and frequentist borrowing in patient-level or hybrid settings. We illustrate the framework using simulated data that mimic a hybrid evidence synthesis from the historical approval of an ethnic-bridging submission under a real-world-evidence regulatory pathway. That original analysis combined individual patient data from a global pivotal study and a regional real-world study with aggregate data from two published cohorts, fitted via a Bayesian longitudinal model with ethnic-difference parameters. The worked example provides a reproducible template for sensitivity analysis in ECT submissions.
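To make the effective-sample-size and tipping-point modules concrete, here is a minimal sketch using a fixed-weight power prior for normal outcomes with known variance, one of many borrowing methods such a framework could cover. It is not the paper's Bayesian longitudinal model; the function names, the success threshold of 0.975 and all trial numbers are hypothetical. The weight $a_0 \in [0, 1]$ makes $n_e$ external controls count as $a_0 n_e$ patients, and the tipping point is taken here to be the smallest $a_0$ at which the conclusion flips.

```python
from statistics import NormalDist


def prob_benefit(a0, y_t, n_t, y_c, n_c, y_e, n_e, sigma):
    """Posterior P(mu_t - mu_c > 0) for normal outcomes with known sigma.

    Flat priors on both means; the n_e external controls enter the control-arm
    likelihood with a fixed power-prior weight a0, so they count as a0 * n_e
    patients (the effective sample size borrowed).
    """
    n_eff = n_c + a0 * n_e
    mean_c = (n_c * y_c + a0 * n_e * y_e) / n_eff
    sd_diff = (sigma ** 2 / n_t + sigma ** 2 / n_eff) ** 0.5
    return 1.0 - NormalDist(y_t - mean_c, sd_diff).cdf(0.0)


def tipping_point(data, threshold=0.975, grid=101):
    """Smallest borrowing weight a0 on a grid at which P(benefit) >= threshold."""
    for k in range(grid):
        a0 = k / (grid - 1)
        if prob_benefit(a0, **data) >= threshold:
            return a0
    return None  # the conclusion never flips for a0 in [0, 1]


# Hypothetical trial: treated arm, small concurrent control, larger external control
data = dict(y_t=1.0, n_t=60, y_c=0.6, n_c=20, y_e=0.4, n_e=200, sigma=1.5)
for a0 in (0.0, 0.5, 1.0):
    print(f"a0={a0}: ESS borrowed={a0 * data['n_e']:.0f}, "
          f"P(benefit)={prob_benefit(a0, **data):.3f}")
print("tipping point a0 =", tipping_point(data))
```

Here the result is a success only if at least about a fifth of the external controls are borrowed, which is the kind of statement a tipping-point module is meant to surface.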

2602.09267 2026-06-09 stat.AP Updated version

Estimating the distance at which narwhal respond to disturbance: a penalized threshold hidden Markov model

Fanny Dupont, Marianne Marcoux, Nigel E. Hussey, Jackie Dawson, Marie Auger-Méthé

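AI summary: This paper proposes a penalized threshold hidden Markov model for quantifying wildlife responses to stimuli and, using simulations and empirical data, demonstrates its use in estimating the distance at which narwhal respond to vessels.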

Comments
22 pages
Abstract

Understanding behavioural responses to disturbances is vital for wildlife conservation. For example, in the Arctic, the decrease in sea ice has opened new shipping routes, increasing the need for impact assessments that quantify the distance at which marine mammals react to vessel presence. This information can then guide targeted mitigation policies, such as vessel slow-down regulations and delineation of avoidance areas. Using telemetry data to determine distances linked to deviations from normal behaviour requires advanced statistical models, such as threshold hidden Markov models (THMMs). While these are powerful tools, they do not assess whether the estimated threshold reflects a meaningful behavioural shift. We introduce a lasso-penalized THMM that builds on computationally efficient methods to impose penalties on HMMs and present a new, efficient penalized quasi-restricted maximum-likelihood estimator. Our framework is capable of estimating thresholds and assessing whether the disturbance effects are distinguishable from baseline behaviour. With simulations, we demonstrate that our lasso method effectively shrinks spurious threshold effects towards zero. When applied to narwhal movement data, our analysis suggests that narwhal react to vessels up to 4 kilometres away by decreasing movement persistence and spending more time in deeper waters (average maximum depth of 356m). Overall, we provide a broadly applicable framework for quantifying behavioural responses to stimuli, with applications ranging from determining reaction thresholds to disturbance to estimating the distances at which terrestrial species, such as elephants, detect water.
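To show what "threshold" means in a THMM, the sketch below computes the forward-algorithm log-likelihood of a two-state hidden Markov model whose transition matrix switches when a distance covariate falls below a threshold $\tau$, assuming Gaussian step-length emissions. The lasso penalty and the quasi-restricted estimator are not reproduced, and all parameter names and values are hypothetical. Profiling this likelihood over a grid of $\tau$ values is one simple way to estimate the threshold.

```python
import math


def normal_pdf(x, mean, sd):
    """Gaussian density, used here for step-length emissions."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def thmm_loglik(obs, dist, tau, gamma_near, gamma_far, delta, means, sds):
    """Log-likelihood of a threshold HMM via the scaled forward algorithm.

    The transition from time t-1 to t uses gamma_near when dist[t] < tau and
    gamma_far otherwise; state k emits N(means[k], sds[k]).
    """
    states = range(len(delta))
    alpha = [delta[k] * normal_pdf(obs[0], means[k], sds[k]) for k in states]
    loglik = 0.0
    for t in range(1, len(obs)):
        scale = sum(alpha)                  # rescale to avoid underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
        gam = gamma_near if dist[t] < tau else gamma_far
        alpha = [sum(alpha[i] * gam[i][j] for i in states)
                 * normal_pdf(obs[t], means[j], sds[j]) for j in states]
    return loglik + math.log(sum(alpha))


# Hypothetical track: state 0 = travelling (long steps), state 1 = slow
steps = [2.0, 2.1, 0.3, 0.2, 1.9]
dist_km = [10.0, 8.0, 3.0, 2.0, 9.0]
params = dict(gamma_near=[[0.5, 0.5], [0.1, 0.9]],
              gamma_far=[[0.9, 0.1], [0.2, 0.8]],
              delta=[0.5, 0.5], means=[2.0, 0.3], sds=[0.2, 0.2])
for tau in (1.0, 2.5, 4.0, 6.0):
    print(tau, round(thmm_loglik(steps, dist_km, tau, **params), 3))
```

When the near and far matrices are equal, $\tau$ drops out of the likelihood entirely; a lasso penalty on their difference is what lets the paper's method shrink such spurious thresholds towards zero.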

2410.23786 2026-06-09 stat.ME stat.AP Updated version

Conformal inference for cell type annotation with graph-structured constraints

Daniela Corbetta, Livio Finos, Ludwig Geistlinger, Davide Risso

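AI summary: Proposes a method that exploits the graph structure of cell ontologies to make conformal prediction sets more interpretable, achieves graph-structured prediction through conformal risk control, handles non-exchangeability between training and test distributions, and provides the R package scConform.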

Abstract

Conformal prediction is a framework for constructing prediction sets for machine learning models, relying solely on the exchangeability of training and test data and without requiring a parametric distribution to be specified. Despite its wide applicability and popularity, its application in single-cell transcriptomics remains underexplored. This paper addresses this gap by developing an approach that leverages the rich information about cell-type relations, encoded in the graph structure of cell ontologies, to enhance the interpretability of reference-based cell-type annotation. Leveraging conformal risk control, we develop a novel conformal algorithm for graph-structured predictions and we demonstrate how incorporating graph constraints can improve the interpretation of cell-type predictions. This approach aims to generate more coherent conformal sets that align with the inherent relationships among classes, facilitating clearer and more intuitive interpretations of model predictions. Additionally, we provide a technique to address non-exchangeability, particularly when the cell-type distribution changes between training and test datasets. We implemented our method in the open-source R package scConform, available at https://bioconductor.posit.co/packages/release/bioc/html/scConform.html.
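For readers new to conformal prediction, the basic split-conformal recipe for classification, which graph-structured variants build on, is sketched below: calibrate a threshold on held-out nonconformity scores $1 - \hat p(\text{true class})$, then include every class whose score falls below it. This is the standard unstructured version, not the paper's conformal-risk-control algorithm or the scConform API; the cell-type labels (T, B, NK) and probabilities are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold for scores s = 1 - p(true class).

    Returns the ceil((n + 1)(1 - alpha))-th smallest calibration score, which
    gives marginal coverage >= 1 - alpha under exchangeability.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return 1.0  # too few calibration points: every class is included
    return scores[k - 1]


def prediction_set(probs, qhat):
    """Classes whose nonconformity score 1 - p does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


# Hypothetical calibration cells: (predicted class probabilities, true label)
cal = [({"T": 0.9, "B": 0.05, "NK": 0.05}, "T"),
       ({"T": 0.7, "B": 0.2, "NK": 0.1}, "T"),
       ({"T": 0.1, "B": 0.8, "NK": 0.1}, "B"),
       ({"T": 0.3, "B": 0.6, "NK": 0.1}, "B"),
       ({"T": 0.2, "B": 0.1, "NK": 0.7}, "NK"),
       ({"T": 0.5, "B": 0.1, "NK": 0.4}, "NK"),
       ({"T": 0.6, "B": 0.3, "NK": 0.1}, "T"),
       ({"T": 0.1, "B": 0.45, "NK": 0.45}, "B"),
       ({"T": 0.05, "B": 0.05, "NK": 0.9}, "NK")]
cal_probs = [p for p, _ in cal]
cal_labels = [y for _, y in cal]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print(prediction_set({"T": 0.6, "B": 0.35, "NK": 0.05}, qhat))   # {'T'}
print(prediction_set({"T": 0.48, "B": 0.47, "NK": 0.05}, qhat))  # T and B
```

An ambiguous cell yields a two-label set such as {T, B}; the graph-structured approach described above would instead aim to report a coherent ontology node covering such labels.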

2507.18683 2026-06-09 stat.AP astro-ph.CO astro-ph.IM Updated version

Bayesian Deep Gaussian Processes for Correlated Functional Data: A Case Study in Cosmological Matter Power Spectra

Stephen A. Walsh, Annie S. Booth, David Higdon, Jared Clark, Kelly R. Moran, Katrin Heitmann

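AI summary: Proposes a Bayesian deep Gaussian process hierarchical model that synthesizes multi-fidelity simulation data to estimate matter power spectra with uncertainty quantification, and uses a basis-function representation to train a separate GP emulator that predicts power spectra for new cosmologies.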

Comments
22 pages, 14 figures. Revised and accepted version for publication in Data Science in Science
Abstract

Understanding the structure of our universe and the distribution of matter is an area of active research. As cosmological surveys grow in complexity, the development of emulators to efficiently and effectively predict matter power spectra is essential. We are particularly motivated by the Mira-Titan Universe simulation suite that, for a specified cosmological parameterization (termed a "cosmology"), provides multiple response curves of various fidelities, including correlated functional realizations. Our objective is two-fold. First, we estimate the underlying matter power spectra, with appropriate uncertainty quantification (UQ), from all of the provided curves. To this end, we propose a novel Bayesian deep Gaussian process (DGP) hierarchical model which synthesizes all the simulation information to estimate the underlying matter power spectra while providing effective UQ. Our model extends previous work on Bayesian DGPs from scalar responses to correlated functional outputs. Second, we leverage our predicted power spectra from various cosmologies in order to accurately predict the entire matter power spectra for an unobserved cosmology. For this task, we use basis function representations of the functional spectra to train a separate Gaussian process emulator. Our method performs well in synthetic exercises and against the benchmark cosmological emulator (Cosmic Emu).

2503.02245 2026-06-09 stat.AP Updated version

Identification of Genetic Factors Associated with Corpus Callosum Morphology: Conditional Strong Independence Screening for Non-Euclidean Responses

Zhe Gao, Jin Zhu, Yue Hu, Wenliang Pan, Xueqin Wang

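AI summary: For ultrahigh-dimensional predictors and non-Euclidean responses, proposes a conditional strong independence screening method based on a new notion of conditional metric dependence, and applies it to identify genetic factors associated with corpus callosum morphology.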

Abstract

The corpus callosum, the largest white matter structure in the brain, plays a critical role in interhemispheric communication. Variations in its morphology are associated with various neurological and psychological conditions, making it a key focus in neurogenetics. Age is known to influence the structure and morphology of the corpus callosum significantly, complicating the identification of specific genetic factors that contribute to its shape and size. We propose a conditional strong independence screening method to address these challenges for ultrahigh-dimensional predictors and non-Euclidean responses. Our approach incorporates prior knowledge, such as age. It introduces a novel concept of conditional metric dependence, quantifying non-linear conditional dependencies among random objects in metric spaces without relying on predefined models. We apply this framework to identify genetic factors associated with the morphology of the corpus callosum. Simulation results demonstrate the efficacy of this method across various non-Euclidean data types, highlighting its potential to drive genetic discovery in neuroscience.

9. Economic, Financial, and Social Science Statistics: 9 papers

2606.09274 2026-06-09 q-fin.RM q-fin.ST stat.ME New submission

Reverse Stress Testing for Multivariate Scenarios: A Conditional Framework for Stressed Time Series

Michele Sparviero, Lorenzo Viola

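AI summary: Proposes a reverse stress testing method that starts from an exogenous shock to a single asset class and reconstructs multivariate stress scenarios consistent with the market's empirical dependence structure, by maximizing the conditional density under three distributional assumptions.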

Comments
26 pages, 5 figures, 2 tables
Abstract

This paper develops a methodological framework for reverse stress testing (RST) in which a multivariate stress scenario, coherent with the empirical dependence structure of a market, is reconstructed from a single exogenous shock prescribed on one asset class. The problem is formulated as the maximisation of the conditional density given the imposed shock, and is solved under three progressively weaker distributional assumptions. In the parametric setting, joint Gaussianity of the returns yields a closed-form modal scenario coinciding with the conditional mean of the non-shocked components. In the semiparametric setting, the modal scenario is estimated nonparametrically through the empirical likelihood methodology and the surrounding stressed trajectories are generated via a Gaussian or Student-t local sampling scheme. In the fully nonparametric setting, stressed trajectories are obtained by inverse-distance resampling of the historical observations within a Mahalanobis neighbourhood of the estimated scenario. The three variants are validated on real market data. The simulated scenarios prove to be economically coherent and capable of reproducing the standard risk-reward asymmetry observed in stressed market regimes.
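In the parametric setting the modal scenario is the Gaussian conditional mean (for a Gaussian, mode and mean coincide). With a single shocked component 0, the standard formula reduces to $\mu_j + \Sigma_{j0}\,(x_0 - \mu_0)/\Sigma_{00}$ for every other component $j$. The sketch below applies it with hypothetical means, covariances and asset labels; the semiparametric and nonparametric variants are not shown.

```python
def modal_scenario(mu, cov, shocked, shock):
    """Gaussian conditional mean (= mode) of all components given one shocked component.

    mu: means; cov: covariance matrix as a list of rows; shocked: index of the
    asset class hit by the exogenous shock; shock: the imposed value.
    """
    scale = (shock - mu[shocked]) / cov[shocked][shocked]
    return [shock if j == shocked else mu[j] + cov[j][shocked] * scale
            for j in range(len(mu))]


# Hypothetical daily returns (%) for equities, credit, rates; equities shocked to -6%
mu = [0.0, 0.0, 0.0]
cov = [[4.0, -1.0, 0.5],
       [-1.0, 1.0, 0.0],
       [0.5, 0.0, 2.0]]
print(modal_scenario(mu, cov, shocked=0, shock=-6.0))  # [-6.0, 1.5, -0.75]
```

Components uncorrelated with the shocked asset stay at their means, which is why richer dependence structures motivate the semiparametric and nonparametric variants.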

2606.08692 2026-06-09 stat.AP New submission

Logistic Credibility with Temporal Decay: Extending Bühlmann--Straub for Commercial Lines

Jake Morris

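AI summary: Bühlmann-Straub credibility assumes that all accounts share the same credibility constant $K$ and ignores temporal decay. The proposed logistic credibility model instead expresses the weights as a logistic function of account characteristics and adds an EWMA decay parameter; on commercial auto data it restores the calibration slope to 1.00 and reduces prediction error by 38%.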

Comments
68 pages, 18 figures
Abstract

Bühlmann--Straub (B-S) credibility assigns each account a weight $Z_i = E_i/(E_i+K)$, where $K$ is a single portfolio-wide ratio. The formula assumes $K$ is the same for every account regardless of size, history length, or volatility, and that recent and older years carry equal weight. On a held-out US commercial auto dataset these assumptions fail: standard B-S applied to 96 companies produces a calibration slope of 29 for small accounts, a signature of severe under-crediting. We propose a joint framework that retains B-S interpretability while addressing these limitations. The credibility weight $Z_i$ is modelled as a logistic function of account characteristics; historical experience is discounted by an EWMA decay parameter $\lambda$ estimated from the data; and $Z$, $\lambda$, and the complement are optimised in a single likelihood pass. The framework formally nests Bühlmann--Straub as a special case, admitting a likelihood-ratio test for any proposed extension. On a two-year held-out test set the proposed model restores calibration (slope = 1.00) and reduces exposure-weighted prediction error by 38% (90% bootstrap interval: 26%--50%). A size gradient in the decay rate emerges ($\hat\lambda\approx 0.6$, $0.84$, $0.13$ for Small, Mid, Large) and replicates qualitatively on Other Liability. A simulation study confirms the mechanisms. The model requires only account-year summaries and delivers three transparent outputs: credibility weight, complement, and recommended renewal rate.
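To make the quantities concrete, the sketch below computes the classical weight $Z_i = E_i/(E_i+K)$ and an exposure-weighted experience in which a year $a$ periods before the latest is further discounted by $\lambda^a$. This decay convention, the function names and the numbers are assumptions; the paper's logistic form for $Z_i$ and its joint likelihood fit are not reproduced. Setting $\lambda = 1$ recovers the equal-year weighting of classical Bühlmann-Straub.

```python
def bs_weight(exposure, k):
    """Classical Buhlmann-Straub credibility weight Z = E / (E + K)."""
    return exposure / (exposure + k)


def decayed_experience(loss_ratios, exposures, lam):
    """Exposure-weighted mean loss ratio with geometric decay.

    Years are ordered oldest -> newest; a year `age` periods before the latest
    gets weight exposure * lam**age (assumed convention). lam = 1 gives the
    equal-year weighting of classical Buhlmann-Straub.
    """
    latest = len(loss_ratios) - 1
    weights = [e * lam ** (latest - t) for t, e in enumerate(exposures)]
    return sum(w * x for w, x in zip(weights, loss_ratios)) / sum(weights)


def credibility_rate(loss_ratios, exposures, k, lam, complement):
    """Blend decayed account experience with the complement of credibility."""
    z = bs_weight(sum(exposures), k)
    return z * decayed_experience(loss_ratios, exposures, lam) + (1 - z) * complement


history, exposure = [0.8, 0.6, 0.4], [100.0, 100.0, 100.0]  # hypothetical account
print(credibility_rate(history, exposure, k=300.0, lam=1.0, complement=0.65))
print(credibility_rate(history, exposure, k=300.0, lam=0.5, complement=0.65))
```

With an improving loss history, decay ($\lambda < 1$) moves the indicated rate towards the most recent years.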

2606.07614 2026-06-09 cs.LG stat.AP New submission

Measuring Poverty and Inequality with Reduced Data: A Machine Learning Approach Using Nigerian Household Data

Vanesa Jordá, Miguel Niño-Zarazúa

Affiliations: Cantabria University; SOAS University of London; United Nations University World Institute for Development Economics Research (UNU-WIDER)

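AI summary: Applying Random Forest Recursive Feature Elimination to Nigerian survey data, the paper finds that a small number of predictors identifies poverty status and position relative to the inequality line with high accuracy, suggesting that machine learning can help optimize survey design and reduce data requirements.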

Abstract

Reliable measurement of income and consumption is essential for monitoring poverty and inequality in low- and middle-income countries, yet full household surveys are costly and difficult to implement regularly. This paper examines whether reduced survey instruments can preserve key distributional information. We apply Random Forest Recursive Feature Elimination (RF-RFE) to the 2018/19 Nigeria General Household Survey-Panel to identify the income sources, consumption categories and household characteristics that best classify individuals within the welfare distribution. The analysis focuses on three outcomes: poverty status, location in the quintile distribution and position relative to the Gini-based inequality line. The survey's post-planting and post-harvest periods allow us to assess performance under different seasonal contexts. Results show that RF-RFE achieves strong classification accuracy with few predictors. For consumption, poverty status and inequality-line position are accurately predicted using a small set of expenditure categories, while quintile classification reaches about 80 percent accuracy for seasonal consumption and 60--65 percent for annual consumption predicted from a single seasonal visit. For income, poverty status reaches around 90 percent accuracy with five predictors, and inequality-line position is largely captured by labour earnings. The findings suggest that machine-learning methods can help improve survey design and reduce data requirements while retaining much of the distributional information needed to measure and monitor poverty and inequality.

2606.07572 2026-06-09 physics.soc-ph cs.LG stat.AP New submission

Forecasting Japanese elections: A nonlinear machine-learning approach

Sota Kato, Xuan Luo, Budrul Ahsan, Asahi Obata, Takafumi Nakanishi

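AI summary: This study introduces nonlinear machine-learning models based on decision trees and ensemble learning to forecast Japanese House of Representatives (lower-house) elections, achieving moderately but consistently better predictive accuracy than the traditional linear model in both in-sample and out-of-sample evaluations.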

Abstract

Despite Japan being one of the world's largest advanced democracies, the development of election forecasting models for its national elections remains limited. This study introduces nonlinear machine-learning forecasting models, based on decision tree and ensemble learning methods, for predicting the outcomes of Japanese lower-house elections. To assess the methodological benefits of our approach, we replicated the theoretical framework and dataset of Lewis-Beck and Tien's (LBT) foundational statistical forecasting model for Japanese elections. Our models demonstrated moderately but consistently improved predictive accuracy compared to LBT's model in both in-sample and out-of-sample evaluations, suggesting that nonlinear algorithms offer an alternative approach to classical linear methods in capturing complex electoral dynamics. This study represents one of the earlier applications of nonlinear machine-learning techniques to single-country election forecasting. It offers a replicable framework that, when combined with the country-specific electoral theories of other nations, may enhance the predictive performance of forecasting models in broader national contexts.

2504.05912 2026-06-09 q-fin.ST stat.AP Updated version

Financial resilience of agricultural and food production companies in Spain: A compositional cluster analysis of the impact of the Ukraine-Russia war (2021-2023)

Mike Hernandez-Romero, Germà Coenders

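AI summary: Using compositional cluster analysis based on financial ratios, this study examines the impact of the Ukraine-Russia war on the financial resilience of Spanish agricultural and food production companies and finds that the number of resilient firms had increased by 2023, indicating the sector's adaptation to the conflict's economic challenges.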

Journal ref
European Accounting and Management Review, 11, 1 (2025), 55-80
Abstract

This study analyses the financial resilience of agricultural and food production companies in Spain amid the Ukraine-Russia war using cluster analysis based on financial ratios. This research utilizes centred log-ratios to transform financial ratios for compositional data analysis. The dataset comprises financial information from 1197 firms in Spain's agricultural and food sectors over the period 2021-2023. The analysis reveals distinct clusters of firms with varying financial performance, characterized by metrics of solvency and profitability. The results highlight an increase in resilient firms by 2023, underscoring sectoral adaptation to the conflict's economic challenges. These findings together provide insights for stakeholders and policymakers to improve sectoral stability and strategic planning.
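The centred log-ratio (clr) transform maps a composition of $D$ positive parts to $\log(x_j/g(x))$, where $g$ is the geometric mean of the parts; the resulting coordinates sum to zero and do not change when all parts are rescaled. A minimal sketch with a hypothetical three-part firm follows; which accounting parts and which clustering algorithm the paper uses are not stated here.

```python
import math


def clr(parts):
    """Centred log-ratio transform: log of each part over the geometric mean."""
    logs = [math.log(x) for x in parts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


firm = [2.0, 8.0, 4.0]                # hypothetical positive accounting parts
print(clr(firm))                      # approx [-0.693, 0.693, 0.0]
print(clr([10 * x for x in firm]))    # same values: clr ignores overall scale
```

The clr coordinates can then be passed to an ordinary Euclidean clustering method, which is what makes the transform convenient for cluster analysis of financial structure.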

2603.24215 2026-06-09 q-fin.ST stat.AP Updated version

Adapting Altman's bankruptcy prediction model to the compositional data methodology

Fatemeh Keivani, Germà Coenders, Geòrgia Escaramís

AI summary: This paper adapts the classical Altman bankruptcy prediction model and its extensions to the compositional-data methodology, using pairwise log-ratios. It compares three statistical and machine learning tools (logistic regression, k-nearest neighbours and random forests) to study how compositional methods perform in predicting bankruptcy and related outcomes.

Comments
22 pages, 2 figures
Abstract

Using standard financial ratios as variables in statistical analyses has been associated with several serious problems, such as extreme outliers, asymmetry, non-normality, and non-linearity. The compositional-data methodology has been successfully applied to solve these problems and has always yielded substantially different results from standard financial ratios. An under-researched area is the use of financial log-ratios computed with the compositional-data methodology to predict bankruptcy or the related concepts of business default, insolvency or failure. Another under-researched area is the use of machine learning methods in combination with compositional log-ratios. The present article adapts the classical Altman bankruptcy prediction model and some of its extensions to the compositional methodology. It uses pairwise log-ratios and three common statistical and machine learning tools (logistic regression models, k-nearest neighbours, and random forests), and compares the results with those obtained from standard financial ratios. The data come from the Iberian Balance sheet Analysis System and cover the sector of the Spanish economy with the largest number of bankrupt firms, as defined by the first two digits of the NACE code (46XX "wholesale trade, except of motor vehicles and motorcycles"). The sample (31,131 firms, of which 97 were bankrupt) was divided into training and validation datasets. The training dataset was downsampled to one healthy firm per bankrupt firm. No outliers were removed. In terms of predictive performance, compositional methods outperform standard ratios in sensitivity (recall), with mixed results for specificity. Compositional random forests and compositional logistic regression perform best.
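
A pairwise log-ratio replaces a standard ratio x_a/x_b with log(x_a/x_b). Below is a minimal sketch, assuming strictly positive accounting figures. The account names, values and ratio pairs are hypothetical illustrations, not the article's specification of the Altman variables.

```python
import math

def pairwise_log_ratios(parts, pairs):
    """Return log(parts[a] / parts[b]) for each (a, b) pair of part names."""
    out = {}
    for a, b in pairs:
        if parts[a] <= 0 or parts[b] <= 0:
            raise ValueError(f"parts {a!r} and {b!r} must be strictly positive")
        out[f"log({a}/{b})"] = math.log(parts[a] / parts[b])
    return out

# Hypothetical firm accounts; the pairs loosely mimic Altman-style ratios over total assets
firm = {"ebit": 50.0, "sales": 800.0, "total_assets": 1000.0, "liabilities": 600.0}
features = pairwise_log_ratios(firm, [("ebit", "total_assets"), ("sales", "total_assets")])
print(features)
```

Swapping the numerator and denominator only flips the sign of a log-ratio. The raw ratios 0.05 and 20, by contrast, sit at very different distances from 1.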

2504.01148 2026-06-09 stat.AP Updated version

Methodological insights in Bayesian Age-Period-Cohort analysis: an application to the case of Puerto Rico's fertility decline

Jomarie Jiménez González, Angélica M. Rosario Santos, Luis R. Pericchi Guerra, Hernando Mattei

AI summary: This paper proposes a Bayesian age-period-cohort framework for analysing demographic and epidemiological phenomena that improves on existing statistical methods. It uses the fertility decline in Puerto Rico to examine the complex relationships among age, period and cohort effects.

Abstract

Age-Period-Cohort (APC) models are of special importance in Demography and Epidemiology for analyzing panel data according to three different factors: biological (age), technological (period) and cultural (cohort). The main goal of APC modeling is to separate the contributions of period and cohort effects to the phenomenon. The objective of this paper is to develop a Bayesian Age-Period-Cohort framework that can model a wide range of demographic and epidemiological phenomena and improve upon existing statistical methodologies. The APC framework must address three main challenges: (1) the identification problem common to all APC models, usually managed by imposing constraints on effect groups, (2) incorporating expert knowledge into the model definition, and (3) solving computational issues efficiently. A Bayesian methodology manages these concerns by allowing full parameter uncertainty, using robust priors, and relying on an efficient computational implementation. Bayesian models also produce results that allow intuitive implementation and support theoretical knowledge. Our original methodology consists of (i) a Scaled Beta2 prior distribution for the scale parameters, (ii) imposing different period and cohort constraints and comparing them, (iii) a user-friendly implementation that can be easily adapted to the event under study, and (iv) various model comparison criteria that lead to a reasonable interpretation of APC effects. We examine the dramatic collapse of fertility in Puerto Rico. This application is difficult to model because of the accelerated changes, and it has interesting demographic implications: it challenges the predominance of period effects in lowest-low fertility countries, emphasizing the cohort (cultural) momentum. The scope of the methodology introduced here is wide and includes, for example, applications to obesity or smoking studies.
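
The identification problem in point (1) comes from the exact linear relation cohort = period - age. Because of it, linear age, period and cohort trends cannot all be estimated from the data alone. The minimal numpy sketch below builds a hypothetical 5 x 5 age-by-period grid and shows the resulting rank deficiency of the design matrix. It does not implement the paper's Bayesian model or its constraints.

```python
import numpy as np

# Hypothetical grid: ages 20-24 observed in calendar periods 2000-2004
ages = np.arange(20, 25)
periods = np.arange(2000, 2005)
A, P = np.meshgrid(ages, periods, indexing="ij")
C = P - A  # birth cohort is fixed by period and age

# Linear-trend design matrix with intercept, age, period and cohort columns
X = np.column_stack([np.ones(A.size), A.ravel(), P.ravel(), C.ravel()]).astype(float)
rank = np.linalg.matrix_rank(X)
print(X.shape, rank)  # 4 columns but only rank 3
```

Dropping or constraining one of the three trend columns restores full column rank. Constraints play this role in APC models.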

2411.03026 2026-06-09 econ.TH econ.EM stat.AP Updated version

Robust Market Interventions

Andrea Galeotti, Benjamin Golub, Sanjeev Goyal, Eduard Talamàs, Omer Tamuz

AI summary: This paper studies how to design market interventions that increase surplus with high probability when information is imprecise. It proposes the recoverable-structure condition and uses a principal-component decomposition of the Slutsky matrix to design robust interventions.

Abstract

When can interventions in markets be designed to increase surplus robustly -- i.e., with high probability -- accounting for uncertainty due to imprecise information about economic primitives? In a setting with many strategic firms, each possessing some market power, we present conditions for such interventions to exist. The key condition, recoverable structure, requires large-scale complementarities among families of products. The analysis works by decomposing the incidence of interventions in terms of principal components of a Slutsky matrix. Under recoverable structure, a noisy signal of this matrix reveals enough about these principal components to design robust interventions. Our results demonstrate the usefulness of spectral methods for analyzing imperfectly observed strategic interactions with many agents.

2502.06530 2026-06-09 econ.TH math.ST stat.TH

Ranking Statistical Experiments via the Linear Convex Order and the Lorenz Zonoid: Economic Applications

Kailin Chen

AI summary: This paper proposes the linear-Blackwell order for comparing experiments in binary-action decision problems and in decision problems with quasi-concave payoffs. It also applies the order to moral hazard problems and to screening problems with ex post signals.

Comments
The main text ends on page 44, and the supplementary material follows thereafter. This paper was previously circulated under the title "Experiments in the Linear Convex Order".
Abstract

This paper introduces a novel ranking of statistical experiments, the linear-Blackwell (LB) order, which can equivalently be characterized by (i) the dispersion of the induced posterior and likelihood ratios in the sense of the linear convex order, (ii) the size of the Lorenz zonoid (the set of statewise expectation profiles), or (iii) the variability of the posterior mean. We apply the LB order to compare experiments in binary-action decision problems and in decision problems with quasi-concave payoffs, as analyzed by Kolotilin, Corrao, and Wolitzky (2025). We also use it to compare experiments in moral hazard problems, building on Holmström (1979) and Kim (1995), and in screening problems with ex post signals.

10. Data Privacy, Robustness and Fairness (4 papers)

2606.09582 2026-06-09 cs.LG stat.ML New submission

On Choosing the $μ$ Parameter in Gaussian Differential Privacy

Bogdan Kulynych, Antti Honkela

AI summary: This paper provides principled mappings from pure-DP ε to GDP μ by matching the worst-case success of a strong-adversary membership inference attack. It recommends μ ≈ ε/5 as a conservative general-purpose conversion.

Abstract

Recent work argues for using Gaussian differential privacy (GDP) to report the privacy guarantees in privacy-preserving machine learning. We provide principled mappings from pure-DP $\varepsilon$ to GDP $μ$ by matching the worst-case success of a strong-adversary membership inference attack in terms of three metrics: multiplicative advantage at fixed FPR, precision at fixed recall, and the standard privacy profile. We tabulate $μ$ values across a useful range of parameters and recommend $μ\approx \varepsilon/5$ as a conservative general-purpose conversion.
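
The f-DP framework describes privacy with tradeoff functions. For a membership test with false-positive rate alpha, each function gives the smallest achievable false-negative rate:

- μ-GDP: Phi(Phi^-1(1 - alpha) - mu).
- Pure ε-DP: max(0, 1 - e^eps * alpha, e^-eps * (1 - alpha)).

Below is a minimal stdlib sketch that evaluates both curves with the recommended μ ≈ ε/5. It does not reproduce the paper's metric-matching procedure or its tables.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)

def gdp_tradeoff(alpha: float, mu: float) -> float:
    """Smallest type II error at type I error alpha under mu-GDP."""
    return _STD.cdf(_STD.inv_cdf(1.0 - alpha) - mu)

def pure_dp_tradeoff(alpha: float, eps: float) -> float:
    """Tradeoff function of pure eps-DP."""
    return max(0.0, 1.0 - math.exp(eps) * alpha, math.exp(-eps) * (1.0 - alpha))

def eps_to_mu(eps: float) -> float:
    """Conservative rule of thumb from the abstract: mu ~= eps / 5."""
    return eps / 5.0

# Compare the two curves at a few false-positive rates for eps = 2
eps = 2.0
mu = eps_to_mu(eps)
for alpha in (0.01, 0.05, 0.1):
    print(f"alpha={alpha:.2f}  pure-DP beta={pure_dp_tradeoff(alpha, eps):.3f}  "
          f"GDP beta={gdp_tradeoff(alpha, mu):.3f}")
```

A higher curve means a weaker attacker, that is, stronger privacy. Printing both curves over the range of interest is a quick way to see what the rule of thumb implies for a given ε.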

2602.16061 2026-06-09 stat.ML cs.LG econ.EM stat.ME Updated version

Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models

Hongyu Chen, David Simchi-Levi, Ruoxuan Xiong

AI summary: To address estimation bias from missing-not-at-random (MNAR) data, this paper proposes a partial identification framework. It tightens bounds via linear programs that incorporate predictions from pretrained models (such as LLMs) as weak shadow variables. It also designs a set-expansion estimator with guaranteed coverage; experiments show identification intervals shrinking by 75-83%.

Abstract

Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical shadow-variable methods. When predictions are sufficiently informative, the bounds collapse to a point, recovering standard identification as a special case. In finite samples, to provide valid coverage of the identified set, we propose a set-expansion estimator that achieves slower-than-$\sqrt{n}$ convergence rate in the set-identified regime and the standard $\sqrt{n}$ rate under point identification. In simulations and semi-synthetic experiments on customer-service dialogues, we find that LLM predictions are often ill-conditioned for classical shadow-variable methods yet remain highly effective in our framework. They shrink identification intervals by 75-83% while maintaining valid coverage under realistic MNAR mechanisms.
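
Before any predictions are added, the identified set for a bounded mean under MNAR is the classical worst-case (Manski) interval. The paper's linear programs tighten bounds of this kind with extra constraints. Below is a minimal sketch of that no-assumption baseline only, assuming outcomes lie in a known range [lo, hi]. It does not implement the authors' LP formulation or the weak-shadow-variable constraints.

```python
def worst_case_mean_bounds(observed, n_missing, lo=0.0, hi=1.0):
    """No-assumption (Manski) bounds on a population mean when n_missing outcomes
    are unobserved and every outcome is known to lie in [lo, hi]."""
    n = len(observed) + n_missing
    total_obs = sum(observed)
    lower = (total_obs + n_missing * lo) / n  # every missing outcome at the minimum
    upper = (total_obs + n_missing * hi) / n  # every missing outcome at the maximum
    return lower, upper

# Hypothetical feedback: 4 responders with satisfaction in {0, 1}, 4 non-responders
print(worst_case_mean_bounds([1.0, 0.0, 1.0, 1.0], n_missing=4))
```

The interval width is always the non-response share times (hi - lo), whatever the observed data. Additional information is therefore needed to narrow it.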

2506.20573 2026-06-09 stat.ML cs.LG Updated version

LARP: Learner-Agnostic Robust Data Prefiltering

Kristian Minchev, Dimitar I. Dimitrov, Nikola Konstantinov

AI summary: This paper proposes LARP, a prefiltering framework that protects the performance of multiple downstream learners. It proves feasibility and analyses the resulting performance loss theoretically, then measures this cost empirically on image and tabular tasks.

Comments
Published in Transactions on Machine Learning Research (06/2026). URL: https://openreview.net/forum?id=gI6VOV3jfO
Abstract

Public datasets, crucial for modern machine learning and statistical inference, often contain low-quality or contaminated samples that can harm model performance. This creates a need for principled prefiltering procedures that a data provider can apply to protect the accuracy of a range of potential downstream statistical and learning procedures simultaneously. In this work, we formalize and analyze Learner-Agnostic Robust data Prefiltering (LARP), the problem of designing prefiltering procedures with guarantees on the worst-case loss over a pre-specified set of learners. We establish the feasibility of LARP in two theoretical settings, by providing upper-bound guarantees on the worst-case loss. Our theoretical results indicate that protecting heterogeneous learner sets via LARP comes at the price of some performance loss compared to individual, learner-specific prefiltering; we call this gap the price of LARP. To assess this gap in performance, we empirically measure the price of LARP across image and tabular tasks. We further explore potential benefits of LARP from the perspective of saving on repeated data curation efforts, in a game-theoretic model where the downstream learners can split the cost of the single prefiltering.

2506.23033 2026-06-09 cs.LG stat.ML Updated version

How Reliable are Fairness Audits with Unreliable Data?

Yash Vardhan Tomar

AI summary: This paper studies how missing protected labels affect fairness mitigation audits. It proposes a seed-calibrated stress test that separates missingness effects from random variation, with three findings:
- Positive-availability missingness usually does not change which mitigation method is selected.
- The no-label endpoint behaves differently.
- Threshold optimization can turn single-axis fairness gains into intersectional harm.

Abstract

Fairness audits are a key component of responsible machine-learning deployment. Yet, the reliability of audit recommendations under incomplete protected-label access is still poorly understood. In this work, we focused on protected-label missingness in fairness mitigation audits. We introduced a seed-calibrated stress test to separate missingness effects from seed-to-seed movement that is already present under complete labels. Across ACS/Folktables tasks, we found that positive-availability missingness usually does not move selected mitigation methods beyond the complete-label seed floor. The no-label endpoint behaves differently, exposing ERM-equivalent candidates and deterministic tie-breaking rather than a broad missingness effect. We also found that threshold optimization can turn single-axis fairness gains into above-null intersectional harm, a sharper failure pattern that appears to remain visible under random-forest validation. Overall, our results highlight that protected-label missingness should be reported with seed-null calibration, candidate-set context, and intersectional consequences before it is treated as evidence of audit fragility.

11. Datasets, Software and Applications (20 papers)

2606.09638 2026-06-09 cs.LG cs.SC math-ph math.MP physics.comp-ph stat.AP New submission

Data-driven discovery of governing differential equations across physical systems

Siyu Lou, Hao Xu, Wenguan Wang, Lu Lu, Hao Sun, Yang Liu, Linfeng Zhang, Dongxiao Zhang, Yuntian Chen

Affiliations: School of Computer Science, Shanghai Jiao Tong University; Ningbo Key Laboratory of Advanced Manufacturing Simulation, Eastern Institute of Technology; The State Key Lab of Brain-Machine Intelligence, Zhejiang University; Department of Statistics and Data Science, Yale University; Department of Chemical and Environmental Engineering, Yale University; Gaoling School of Artificial Intelligence, Renmin University of China; School of Engineering Sciences, University of Chinese Academy of Sciences; DP Technology

AI summary: This review proposes a problem-oriented perspective that organizes equation discoverability in a two-dimensional phase diagram. It introduces the representation-evaluation-optimization (REO) framework as an abstraction of the discovery process. The aim is to infer physical laws from data and to drive theory revision and the formation of new concepts.

Abstract

Differential equations play a critical role in scientific discovery because they provide a mathematical framework to describe the behaviour of physical phenomena. As a promising alternative to traditional first principles, data-driven differential equation discovery has attracted increasing attention for its ability to infer governing laws directly from experimental or simulated data, especially when the underlying physics is unclear. However, the field has expanded rapidly along diverse methodological directions, particularly with the emergence of AI-based approaches, and still lacks a clear organizing perspective. In this Review, we propose a problem-oriented perspective on data-driven differential equation discovery. We first introduce a two-dimensional phase diagram of equation discoverability, where discovery problems are organized according to structural complexity and coefficient complexity. This phase diagram shows how the field has moved from the discovery of sparse equations with simple coefficients toward more complex governing laws with richer structures and more flexible parameterizations. It also clarifies why different methodological families succeed or fail in different problem settings. We then present the representation-evaluation-optimization (REO) framework as a fundamental abstraction of the discovery process. By identifying the core problems of equation discovery that persist across algorithmic variations, REO shifts the discussion from individual algorithms to the fundamental principles that determine discoverability. We connect these perspectives to applications across physics and adjacent sciences, and argue that the next challenge is not merely recovering equations, but using them to revise existing theories, distil mechanisms and form new scientific concepts.
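
Sparse-regression methods in the SINDy family operate at the "sparse equations with simple coefficients" end of the phase diagram. One common optimizer of that kind is sequentially thresholded least squares. Below is a minimal sketch of it, assuming a hand-picked polynomial library and noise-free synthetic data from the hypothetical system dx/dt = -2x. It illustrates this classical baseline only, not the review's REO framework.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: sparse xi with theta @ xi ~= dxdt."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune negligible library terms
        keep = ~small
        if not keep.any():
            break
        # Refit only the surviving terms
        xi[keep] = np.linalg.lstsq(theta[:, keep], dxdt, rcond=None)[0]
    return xi

# Synthetic trajectory of dx/dt = -2x
t = np.linspace(0.0, 2.0, 201)
x = np.exp(-2.0 * t)
dxdt = np.gradient(x, t)[1:-1]  # central differences; drop the one-sided edge estimates
x_in = x[1:-1]
library = np.column_stack([np.ones_like(x_in), x_in, x_in**2, x_in**3])  # candidate terms
coef = stlsq(library, dxdt)
print(dict(zip(["1", "x", "x^2", "x^3"], coef.round(4))))
```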

2606.09351 2026-06-09 cs.CL stat.ME New submission

In-Context Learning for the Imputation of Public Opinion Data with Large Language Models

Tobias Holtdirk, Georg Ahnert, Joseph W Sakshaug, Anna-Carolina Haensch

Affiliations: LMU Munich; Munich Center for Machine Learning; University of Mannheim; Institute for Employment Research (IAB); University of Maryland, College Park

AI summary: This paper proposes imputing missing survey data via in-context learning (ICL) and evaluates it on 150 opinion variables. Compared with MICE PMM, ICL achieves lower absolute error under all missingness mechanisms, with the largest advantage under missingness not at random.

Abstract

Large language models have been widely evaluated as simulators of individual survey responses. In practice, however, fully unobserved responses are rare; the dominant problem is partial non-response. Imputation aims to restore the overall structure of a survey dataset by filling in these missing values. It has its own well-defined evaluation criteria and differs fundamentally from prediction. We propose to impute missing survey data through in-context learning (ICL). We systematically evaluate ICL design choices across different missingness mechanisms (MCAR, MAR, MNAR) on 150 opinion variables spanning 15 waves of the American Trends Panel. Compared to well-established statistical methods for data imputation like MICE PMM, our ICL approach consistently reduces absolute error across all missingness mechanisms, with the largest gains under non-random missingness (MNAR). Notably, the best-performing specification (gpt-oss-120b with 100 in-context examples) achieves near-nominal aggregate coverage (approaching the 95% level) with confidence intervals two to five times narrower than MICE PMM. We publish a Python package with an sklearn-like API to enable easy deployment of our method using local and proprietary LLMs.
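
MICE PMM is the statistical baseline here. Below is a minimal single-variable sketch of predictive mean matching, assuming one fully observed predictor and a linear model. Real MICE implementations iterate over variables and draw regression coefficients from their posterior, which this simplification skips. It shows the baseline idea only, not the authors' ICL method or their package's API.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Simplified, non-iterative predictive mean matching for one variable.

    Fits y ~ a + b*x on rows where y is observed, then fills each missing y with
    the observed value of a donor drawn from the k observed rows whose
    predictions are closest to the missing row's prediction.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float).copy()
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    obs = ~np.isnan(y)
    beta = np.linalg.lstsq(X[obs], y[obs], rcond=None)[0]  # point estimate, no posterior draw
    yhat = X @ beta
    donor_pred, donor_val = yhat[obs], y[obs]
    for i in np.flatnonzero(~obs):
        nearest = np.argsort(np.abs(donor_pred - yhat[i]))[:k]
        y[i] = donor_val[rng.choice(nearest)]  # imputed values are always observed values
    return y
```

Because every imputed value is copied from a real respondent, PMM never produces out-of-range answers on an ordinal opinion scale.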

2606.09313 2026-06-09 cs.LG stat.AP New submission

Machine-Learning Emulation of Satellite Greenhouse Gas Retrievals: Stability over Time

Nugzar Gognadze, Motonobu Kanagawa, Yu Someya, Hisashi Yashiro

Affiliations: EURECOM; National Institute for Environmental Studies

AI summary: This paper studies the temporal stability of machine-learning emulators of satellite greenhouse-gas retrieval algorithms. Prediction accuracy declines over time, and adding a time feature improves XCH4 prediction for Lasso and neural-network models. A simple Lasso model performs as well as or better than more complex methods and is more stable.

Comments
48 pages, 9 figures, 15 tables
Abstract

Retrieval algorithms are used to estimate atmospheric concentrations of greenhouse gases (GHGs), such as carbon dioxide (CO2) and methane (CH4), by solving inverse problems from high-spectral-resolution satellite radiance measurements. However, these algorithms are computationally expensive, which makes real-time estimation at scale difficult. Machine-learning models have therefore been proposed as fast emulators of retrieval algorithms. Most existing studies, however, evaluate them only on test data from the same period as the training data. We study the stability over time of such emulators using data from the Greenhouse Gases Observing SATellite (GOSAT). We show that prediction accuracy generally deteriorates when the test period moves away from the training period. We also show that including time as an input feature substantially improves XCH4 prediction for Lasso and neural-network models. Among the methods considered, a simple Lasso model performs as well as or better than more complex methods such as neural networks, and yields more stable predictions over time. We further validate the results using the Total Carbon Column Observing Network (TCCON), a ground-based observation network. On the TCCON-matched dataset, the time-augmented Lasso achieves errors against TCCON that are comparable to the disagreement between GOSAT and TCCON for both XCO2 and XCH4.

2606.09257 2026-06-09 cs.LG cs.AI stat.ML New submission

BSTabDiff: Block-Subunit Diffusion Priors for High-Dimensional Tabular Data Generation

Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Gyawali, Gianfranco Doretto, Donald A. Adjeroh

Affiliations: West Virginia University; The University of Utah

AI summary: For high-dimensional low-sample-size tabular data, this paper proposes BSTabDiff. The framework partitions features into latent blocks and generates each block from a shared low-dimensional subunit variable. Combining diffusion priors with copula-based dependence, it achieves stable synthesis and controllable benchmark generation.

Comments
Published as a paper at the 2nd DeLTa Workshop, ICLR 2026
Abstract

High-Dimensional Low-Sample Size (HDLSS) tabular domains (e.g., omics) are characterized by $n \ll m$, where $n$ is the number of samples and $m$ is the number of features. Such domains often exhibit strong local correlation groups, sparse cross-group dependencies, heavy-tailed non-Gaussian marginals, heteroscedastic noise, and structured missingness, making direct density learning in $\mathbb{R}^m$ ill-conditioned since $n \ll m$. We propose BSTabDiff, a block-subunit generative framework that partitions the $m$ observed features into $M$ latent blocks ($M \ll m$) and generates each block via a shared low-dimensional subunit variable, concentrating global dependence learning in the compact block-latent space $\mathbb{R}^M$ while decoding to the full feature space with copula-driven dependence, flexible per-feature marginals, and explicit missingness mechanisms. BSTabDiff supports modern deep priors on block latents, including diffusion and normalizing flows, enabling stable synthesis and controllable benchmark generation in the HDLSS regime. Empirically, BSTabDiff produces more realistic and stable high-dimensional synthetic data when compared with unstructured tabular generators on HDLSS data.

2606.08660 2026-06-09 stat.AP stat.ME stat.OT New submission

Active Learning with Bayesian Reasoning: A POGIL-Based Pedagogy in Introductory Statistics

Cheng-Han Yu, Angela Ebeling

AI summary: This paper introduces a Process Oriented Guided Inquiry Learning (POGIL) activity for teaching Bayesian reasoning in introductory statistics through conditional probability, Bayes' theorem and belief updating. A quasi-experiment comparing POGIL with lecture-based instruction finds no clear differences in exam performance or satisfaction.

Abstract

We introduce a Process Oriented Guided Inquiry Learning (POGIL)-style activity for teaching Bayesian reasoning in introductory statistics through conditional probability, Bayes' theorem, and belief updating. The activity is self-contained, uses hand-computable probabilities organized in two-way tables, and engages students in structured team roles. We evaluated the activity in four sections of an undergraduate introductory statistics course using a quasi-experimental comparison of POGIL-style and lecture-based instruction for a Bayes' theorem unit. Outcomes included student performance on Bayes' theorem final exam questions and satisfaction with instruction. We used a Bayesian bivariate generalized linear model to compare the two approaches while accounting for major type, gender, and race. The results indicated similar exam performance and similar probabilities of high satisfaction across instructional styles and demographic groups, with considerable uncertainty and no clear evidence of meaningful differences. These findings suggest that the POGIL-style activity performed comparably to lecture-based instruction for this unit while offering an active and classroom-ready way to introduce Bayesian reasoning without requiring difficult computation or simulation. We provide adaptable instructional materials and a reproducible Bayesian analytic framework for evaluating active learning innovations in introductory statistics. Our study supports the feasible inclusion of Bayesian reasoning in introductory courses and may help instructors considering active learning.
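
The activity's core computation, reading a posterior off a two-way table, takes only a few lines. Below is a minimal sketch with hypothetical counts chosen to be hand-computable. The numbers are illustrative and are not taken from the activity's materials.

```python
from fractions import Fraction

def posterior_from_table(table, hypothesis, evidence):
    """P(hypothesis | evidence) read off a two-way table of counts.

    table[h][e] is the number of cases with hypothesis state h and evidence state e.
    """
    joint = table[hypothesis][evidence]
    marginal = sum(row[evidence] for row in table.values())  # column total
    return Fraction(joint, marginal)

# Hypothetical screening example: 10,000 people, 1% prevalence,
# 90% sensitivity, 5% false-positive rate
counts = {
    "disease":    {"positive": 90,  "negative": 10},
    "no_disease": {"positive": 495, "negative": 9405},
}
p = posterior_from_table(counts, "disease", "positive")
print(p, float(p))
```

Using exact fractions keeps the result identical to what students obtain by hand: 90 true positives out of 585 positives in total.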

2606.08654 2026-06-09 cs.LG cs.NA math.AP math.NA stat.AP New submission

Operator learning for the 2D incompressible Navier-Stokes equations: a conformal prediction approach in the data-scarce regime

Weinan Wang, Bowen Gang, Hao Deng

Affiliations: University of Oklahoma; Fudan University

AI summary: For uncertainty quantification in operator learning with scarce data, this paper proposes a perturbation-based conformal prediction framework. On a 2D Navier-Stokes benchmark, it produces narrower conformal bands than existing methods while maintaining target coverage.

Abstract

In this paper, we propose a perturbation-based conformal prediction framework for uncertainty quantification in operator learning, with a focus on the 2D Navier-Stokes equations. While neural operators provide fast surrogates for expensive PDE solvers, they do not by themselves provide calibrated uncertainty for spatiotemporal field predictions. Our approach wraps a trained Fourier Neural Operator (FNO) with split conformal prediction and constructs the local uncertainty scale by comparing the predictions of two operators trained on nearly identical datasets: one on the original labels and one on labels perturbed by small Gaussian noise. We consider this procedure in the data-scarce regime, where the total label budget is fixed and methods that require a separate uncertainty network must divide training data between multiple models. On the 2D Navier-Stokes benchmark, the perturbation-based method produces substantially narrower conformal bands than existing methods under matched total data budgets while maintaining the target simultaneous coverage. These results suggest that perturbation sensitivity is a practical and sample-efficient uncertainty proxy for conformalized neural operators.
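
The method's central step is a normalized split-conformal score whose local scale is the disagreement between the original-label model and its perturbed-label twin. Below is a minimal numpy sketch, assuming both models' predictions are already available as arrays of shape (samples, grid points). It uses a max-over-grid score to target simultaneous coverage. FNO training is omitted, and the paper's exact score may differ.

```python
import math
import numpy as np

def perturbation_conformal_band(y_cal, f_cal, fp_cal, f_test, fp_test, alpha=0.1, eps=1e-8):
    """Simultaneous split-conformal band for field predictions.

    All arrays have shape (n_samples, n_grid_points). f_* are predictions of the
    operator trained on the original labels; fp_* come from the twin trained on
    labels perturbed with small Gaussian noise.
    """
    scale_cal = np.abs(f_cal - fp_cal) + eps                      # local uncertainty scale
    scores = np.max(np.abs(y_cal - f_cal) / scale_cal, axis=1)   # sup-norm score per field
    n = scores.shape[0]
    k = math.ceil((n + 1) * (1 - alpha))                          # conformal rank
    q = np.inf if k > n else np.sort(scores)[k - 1]
    half_width = q * (np.abs(f_test - fp_test) + eps)
    return f_test - half_width, f_test + half_width
```

Taking the maximum over grid points gives one score per calibration field. The resulting band therefore covers a whole test field at the 1 - alpha level, rather than each grid point separately.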

2606.08587 2026-06-09 stat.ML cs.LG New submission

Improving the sharpness in neural network-based parametric post-processing of ensemble forecasts

Ágnes Baran, Máté Mihalina

AI summary: To counter the loss of sharpness in ensemble forecast post-processing, this paper adds a penalty term to the loss function. The central prediction interval becomes 8.2%-12.5% narrower in relative terms without worsening CRPS or RMSE.

Comments
18 pages
Abstract

Statistical post-processing has proven to be an effective tool for improving ensemble forecasts of different weather variables. Case studies show that post-processing can remedy the typically underdispersive and potentially biased behaviour of the ensemble while optimizing a proper scoring rule expressing the forecast skill. The price of these positive effects is generally a deterioration in sharpness: the width of the central prediction intervals and the uncertainty of the predictions increase, especially for shorter lead times. This work aims to reduce this loss of sharpness for neural network-based parametric post-processing methods by extending the network's loss function with a penalty term. We demonstrate the effect of the proposed technique on 2 m temperature ensemble forecasts of the European Centre for Medium-Range Weather Forecasts, downloaded from the EUPPBench benchmark dataset and verified against synoptic observations. Here, the predictive distribution is Gaussian, and we use the continuous ranked probability score (CRPS) as the loss function. The case studies confirm a substantial relative decrease (8.2%-12.5%) in the width of the nominal central prediction interval. The comparison is against the width of the predictive distribution computed without the penalty term. Neither the mean CRPS of the probabilistic forecasts nor the RMSE of the predictive mean deteriorates.
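
For a Gaussian predictive distribution N(mu, sigma^2) the CRPS has a closed form, which makes it convenient as a training loss. Below is a minimal stdlib sketch of that formula plus a penalty on the predicted spread. The penalty form lam * mean(sigma) and the weight lam are hypothetical, since the abstract does not specify the penalty term. A real network would compute this inside an autodiff framework.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of the forecast N(mu, sigma^2) for observation y (lower is better)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * _STD.cdf(z) - 1.0) + 2.0 * _STD.pdf(z) - 1.0 / math.sqrt(math.pi))

def penalized_loss(mus, sigmas, ys, lam=0.1):
    """Mean CRPS plus a hypothetical sharpness penalty lam * mean(sigma)."""
    n = len(ys)
    mean_crps = sum(crps_gaussian(m, s, y) for m, s, y in zip(mus, sigmas, ys)) / n
    return mean_crps + lam * sum(sigmas) / n

# Hypothetical forecasts for three observations: same means, two spreads
ys, mus = [1.0, -0.5, 0.3], [0.8, -0.2, 0.0]
print(penalized_loss(mus, [0.5] * 3, ys), penalized_loss(mus, [2.0] * 3, ys))
```

With lam = 0 this reduces to plain CRPS training. Increasing lam trades some calibration pressure for narrower predictive distributions.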

2606.08289 2026-06-09 stat.ME New submission

Direct domain estimation via regression-tree-assisted estimators in the production of official statistics

官方统计生产中的回归树辅助直接域估计

Juan Pablo Ferreira

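AI summary: Within the model-assisted estimation framework and from a design-based perspective, the paper studies regression trees as the assisting model for direct estimation in unplanned domains, proposes two strategies, shows that both preserve the uni-weight property and additivity, and finds in simulations that domain-specific models can substantially reduce variance.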

AI中文摘要

国家统计机构(NSO)在单一加权系统(单权方法)下生成其估计值:一组独立于目标变量的权重用于估计多个参数和多个子总体(域)。在本文中,我们在模型辅助估计族中,从直接估计的设计角度研究使用回归树作为辅助模型来估计非计划域的总量。我们区分两种策略:(i)在总体水平上拟合单棵树并从中推导出适用于任何域的单权权重,以及(ii)拟合域特定树。我们证明这两种估计量都可以写成加权和,其权重不依赖于$y$,保留了单权属性和相对于总体总量的可加性基准。将经典结果扩展到树,我们解释了为什么基于总体水平模型构建的估计量在域内倾向于像霍维茨-汤普森估计量那样表现,而域特定模型可以实现显著的方差减少。基于乌拉圭连续家庭调查(ECH)微观数据的模拟研究说明了这些估计量在总体水平和部门层面的行为。

英文摘要

National statistical offices (NSOs) produce their estimates under a single weighting system (uni-weight approach): one set of weights, independent of the variable of interest, is used to estimate multiple parameters and multiple subpopulations (domains). In this paper we study, within the family of model-assisted estimators and from a design-based perspective of direct estimation, the use of regression trees as the assisting model for estimating totals in unplanned domains. We distinguish two strategies: (i) fitting a single tree at the population level and deriving from it uni-weight weights applicable to any domain, and (ii) fitting a domain-specific tree. We show that both estimators can be written as weighted sums with weights that do not depend on $y$, preserving the uni-weight property and additivity benchmarking with respect to the population total. Extending the classical result to trees, we explain why the estimator built from a population-level model tends to behave like the Horvitz-Thompson estimator within domains, whereas the domain-specific model can achieve substantial variance reductions. A simulation study based on microdata from the Uruguayan Continuous Household Survey (ECH) illustrates the behavior of the estimators at the population level and by department.
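The uni-weight property can be illustrated with a minimal sketch, assuming the tree's leaves are already fixed, every sampled unit's leaf is known, and the population count $N_\ell$ of each leaf is available from the frame. When the tree predicts leaf means fitted with design weights $d_k$, the model-assisted estimator reduces to post-stratification on the leaves, with weights $w_k = d_k N_\ell / \hat N_\ell$ that do not involve $y$. The function names are hypothetical and this is not the authors' code.

```python
from collections import defaultdict

def tree_assisted_weights(design_w, leaf, pop_leaf_counts):
    """Weights d_k * N_l / N_hat_l for a leaf-mean tree (post-stratification on leaves).

    design_w: design weights d_k; leaf: leaf label of each sampled unit;
    pop_leaf_counts: known population size N_l of each leaf.
    """
    n_hat = defaultdict(float)
    for d, l in zip(design_w, leaf):
        n_hat[l] += d  # Horvitz-Thompson estimate of the leaf size
    return [d * pop_leaf_counts[l] / n_hat[l] for d, l in zip(design_w, leaf)]

def domain_total(weights, y, in_domain):
    """Domain total using the same (y-independent) weights for every domain."""
    return sum(w * yk for w, yk, dk in zip(weights, y, in_domain) if dk)
```

Because every domain uses the same weights, domain totals over a partition add up to the population total, which is the additivity benchmarking property mentioned above.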

2606.07994 2026-06-09 cs.DL stat.AP 新提交

The Rising Dominance of Methods Across Science

方法在科学中日益增长的主导地位

Alexander Krauss, Ariel Rosenfeld, Lutz Bornmann

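AI summary: An analysis of more than 3 million papers published between 1980 and 2019 finds that the share of methods papers has doubled, marking a structural shift toward methods-driven science beginning in the early 1990s.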

AI中文摘要

科学进步传统上是通过理论见解和实验发现的相互作用来叙述的。然而,这种科学观低估了进步的第三个核心支柱:支撑概念进展和经验证据的方法。通过分析1980年至2019年间发表的超过300万篇科学文章,我们发现科学经历了一个根本性的结构转变。主要贡献新方法的论文(方法论文)的比例在过去四十年中翻了一番,在各个学科和引用影响水平上普遍上升。这种转变并非渐进演变,而是从1990年代初开始的一个关键转变,与计算革命和数据密集型科学的出现相一致。方法论研究的激增并不局限于被引用最多的精英出版物;它跨越了科学产出的全谱系。这些发现揭示了科学生态系统的系统性重新定向,其中可重复使用的方法日益成为科学进步的基本基础设施,挑战了理论和实验研究的传统二分法。随着科学变得越来越方法驱动,我们的结果呼吁重新思考如何评估、资助和组织研究——以更好地激励方法创新。尤其是在扩展人工智能必须与科学仪器有效集成以充分发挥其潜力的情况下,这一点尤为重要。

英文摘要

Scientific progress is traditionally narrated through the interplay of theoretical insights and experimental findings. Yet this view of science underplays a third and central pillar of progress: the methods that underlie both conceptual advances and empirical evidence. By analysing more than 3 million articles across science published between 1980 and 2019, we find that science has undergone a fundamental structural transition. The share of papers that primarily contribute new methods (methods papers) has doubled across science over the past four decades, rising universally across disciplines and citation impact levels. Rather than a gradual evolution, this transition marks a pivotal shift beginning in the early 1990s, aligning with the computational revolution and the emergence of data-intensive science. The surge in methodological research is not confined to the most cited, elite publications; it spans the full spectrum of scientific output. These findings reveal a systemic reorientation of the scientific ecosystem in which reusable methods increasingly serve as the essential infrastructure of scientific advances, challenging the traditional dichotomy of theory and experimental research. As science becomes increasingly methods-driven, our results call for rethinking how research is evaluated, funded and organised, so as to better incentivise method innovation. This is especially important as expanding AI must be effectively integrated with scientific instruments to realise its full potential.

2606.07865 2026-06-09 cs.LG cs.AI physics.comp-ph stat.ML 新提交

Instrumented data for causal scientific machine learning

因果科学机器学习的仪器化数据

Daniel N. Wilke

Affiliations: University of the Witwatersrand

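AI summary: Proposes instrumented data as a third option alongside observational and template synthetic data, in which each datum carries the mechanistic model that produced it, an explicit uncertainty, and an executable family of counterfactuals; it is realised through V&V-instrumented image-to-simulation pipelines and supports causal interventions.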

Comments
10 pages, 2 figures
AI中文摘要

科学机器学习受限于训练数据而非模型大小。观测数据记录发生了什么但不记录原因;模板合成数据具有已知的生成过程,但仅适用于模拟器的模板,而非用户面对的情况。我们认为第三种选择现在在操作上是可行的:仪器化数据,其中每个数据点携带产生它的机制模型、对该模型的显式不确定性以及可执行的反事实族。验证与确认(V&V)仪器化图像到模拟管道是一种实现:传感器观测成为完全指定、求解器支持的模拟,具有显式、可编辑的参数以及传播的偶然/认知不确定性。该基底是案例特定的、机制监督的,并通过Pearl的do算子支持因果干预。在验证、审计和替代训练方面的近期影响涵盖计算生物学、气候、材料、流体力学和医学成像;长期可证伪的推论涉及科学推理的基础模型。

英文摘要

Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are one realisation: a sensor observation becomes a fully specified, solver-backed simulation with explicit, editable parameters and a propagated aleatoric/epistemic uncertainty. The substrate is case-specific, mechanistically supervised, and supports causal interventions through Pearl's do-operator. Near-term consequences for validation, auditing, and surrogate training span computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term, falsifiable implication concerns foundation models for scientific reasoning.
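As a data-structure illustration only, the sketch below shows what it could mean for a datum to carry its mechanism, its parameter uncertainty, and executable counterfactuals. The class, its fields, and the spring example are hypothetical; uncertainty is assumed to be independent Gaussian noise on parameters, and `do` fixes parameters exogenously in the spirit of Pearl's do-operator.

```python
import random
from dataclasses import dataclass, field, replace
from typing import Callable, Dict

@dataclass(frozen=True)
class InstrumentedDatum:
    """Hypothetical record: a datum plus the mechanism and uncertainty that produced it."""
    simulator: Callable[[Dict[str, float]], float]  # mechanistic model
    params: Dict[str, float]                        # explicit, editable parameters
    param_sd: Dict[str, float] = field(default_factory=dict)  # assumed Gaussian parameter uncertainty

    def value(self) -> float:
        """Re-run the mechanism at the recorded parameters."""
        return self.simulator(self.params)

    def do(self, **interventions: float) -> "InstrumentedDatum":
        """Counterfactual: fix some parameters and return a new datum (original unchanged)."""
        return replace(self, params={**self.params, **interventions})

    def sample(self, rng: random.Random) -> float:
        """Propagate parameter uncertainty by sampling parameters and re-simulating."""
        drawn = {k: rng.gauss(v, self.param_sd.get(k, 0.0)) for k, v in self.params.items()}
        return self.simulator(drawn)
```

For example, a spring datum with `simulator=lambda p: p["force"] / p["stiffness"]` returns 5.0 for force 10 and stiffness 2, and `d.do(stiffness=5.0).value()` answers "what if the spring had been stiffer?" without touching the original record.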

2606.07789 2026-06-09 cs.LG stat.ML 新提交

A Framework for Evaluating and Benchmarking Concept Drift Detection Methods

概念漂移检测方法的评估与基准测试框架

Vitor Cerqueira, Heitor Murilo Gomes, Marco Heyden, Bernhard Pfahringer, Albert Bifet

Affiliations: University of Coimbra; Victoria University of Wellington; Commerzbank; University of Waikato; AI Institute, University of Waikato

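AI summary: Proposes a benchmarking framework comprising drift simulation, timing-aware evaluation, and a hyperparameter optimization protocol; it evaluates 14 drift detection methods on 7 real-world datasets, revealing their strengths and weaknesses and establishing baseline performance.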

Comments
Accepted in KDD'26
AI中文摘要

数据流挖掘从根本上受到概念漂移的挑战,其中分布变化可能降低模型性能。尽管漂移检测方法层出不穷,但该领域的进展受到不一致评估实践的阻碍:研究依赖于过度简化的合成数据生成器,采用不兼容的指标,并且在超参数选择上缺乏透明度,使得公平比较变得困难。我们通过一个新颖的基准测试框架来解决这一差距,该框架包含三个贡献:(1)一种漂移模拟方法,通过蒙特卡洛试验将受控的分布变化注入真实世界数据集,在保留真实数据复杂性的同时实现监督评估;(2)一种用于漂移检测的评估协议,具有时序感知标准,包括推导出跨流可比较的新指标(例如,F1检测分数、归一化检测时间);(3)我们提倡一种留一数据集超参数优化协议,用于漂移检测方法,以促进跨异构流动态的配置鲁棒性。我们在7个真实世界数据集上对14种广泛使用的漂移检测方法进行了基准测试,涵盖4种漂移类型(类别先验、标签交换、特征排列、特征过滤),每种类型均包括突变和渐变转换。我们的实验结果揭示了当前漂移检测方法的优缺点,同时为该领域的未来研究建立了基线性能指标。所有代码和实验均公开可用。

英文摘要

Data stream mining is fundamentally challenged by concept drift, where distributional changes can degrade model performance. Despite the proliferation of drift detection methods, progress in the field is hindered by inconsistent evaluation practices: studies rely on oversimplified synthetic data generators, adopt incompatible metrics, and lack transparency in hyperparameter selection, making fair comparisons difficult. We address this gap with a novel benchmarking framework comprising three contributions: (1) a drift simulation method that injects controlled distributional changes into real-world datasets via Monte Carlo trials, enabling supervised evaluation while preserving real-world data complexity; (2) an evaluation protocol for drift detection with timing-aware criteria, including the derivation of new metrics (e.g., F1 detection score, normalized detection time) that are comparable across streams; and (3) a leave-one-dataset-out hyperparameter optimization protocol for drift detection methods, which we advocate because it promotes configuration robustness across heterogeneous stream dynamics. We benchmark 14 widely used drift detection methods on 7 real-world datasets across 4 drift types (class prior, label swap, feature permutation, feature filtering), each under both abrupt and gradual transitions. Our experimental results provide insights into the strengths and weaknesses of current drift detection approaches while establishing baseline performance metrics for future research in this area. All code and experiments are publicly available.
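The abstract names timing-aware metrics but does not define them. A minimal sketch of one plausible operationalization, assuming a detection counts as a hit only if it falls in the window `[t, t + tolerance]` after a true drift point `t`, and a missed drift is charged the full window, could look like this (function names and the tolerance rule are assumptions):

```python
def match_detections(true_points, detections, tolerance):
    """Match each true drift point to the earliest unused detection in [t, t + tolerance]."""
    used, delays = set(), {}
    for t in sorted(true_points):
        hits = [i for i, d in enumerate(detections) if t <= d <= t + tolerance and i not in used]
        if hits:
            i = min(hits, key=lambda j: detections[j])
            used.add(i)
            delays[t] = detections[i] - t
    return delays, used

def detection_f1(true_points, detections, tolerance):
    """F1 over matched detections: unmatched detections are false positives."""
    delays, used = match_detections(true_points, detections, tolerance)
    tp = len(delays)
    fp = len(detections) - len(used)
    fn = len(true_points) - tp
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def normalized_detection_time(true_points, detections, tolerance):
    """Mean delay as a fraction of the tolerance window; a missed drift counts as 1."""
    delays, _ = match_detections(true_points, detections, tolerance)
    return sum(delays.get(t, tolerance) / tolerance for t in true_points) / len(true_points)
```

Dividing by the tolerance makes the delay unitless, which is one way to make it comparable across streams of different lengths.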

2606.07622 2026-06-09 cs.LG stat.AP 新提交

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

机场航站楼登机口与安检点旅客排队预测

Juhwan Lee, Seokbin Yoon, Keumjin Lee, Hojong Baik, Seyeon Jung

Affiliations: Korea Aerospace University; Korea Airports Corporation

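AI summary: Proposes a Transformer-based framework that uses historical queue length, waiting time, and passenger throughput data to forecast queue length and waiting time at departure gates and security checkpoints up to two hours ahead, supporting proactive queue management.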

Comments
9 pages, 6 figures, accepted at DASC 2026
AI中文摘要

准确的机场航站楼旅客排队预测对于高效的离港运营至关重要,因为它能够实现主动的拥堵管理。然而,时变的旅客需求以及多个离港设施中异构的设施使用情况使得预测具有挑战性。在这项工作中,我们提出了一种旅客排队预测框架,该框架从运营数据中学习历史旅客流量模式。所提出的模型采用基于Transformer的架构,利用过去登机口和安检点的队列长度和等待时间,以及值机岛的旅客吞吐量,来捕捉时间依赖性和设施间相关性。学习到的表示被映射到两个设施特定的MLP头部,以预测登机口和安检点的队列长度和等待时间。实验结果表明,该模型能够准确预测未来两小时内的排队情况。所提出的方法为机场航站楼运营中的主动排队管理和人员重新分配提供了实用的实时决策支持。

英文摘要

Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific MLP heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.
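The shared-encoder, two-head layout described above can be sketched at the shape level in NumPy. This is not the authors' architecture: it assumes a single attention head with untrained random weights, no positional encoding or layer normalization, five input features per time step (queue length and waiting time at gates and at security, plus check-in throughput), and a horizon of 8 fifteen-minute steps to cover two hours. All names are hypothetical.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over time. x: (T, d_in)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

def mlp_head(h, w1, w2):
    """Facility-specific head: one ReLU hidden layer."""
    return np.maximum(h @ w1, 0.0) @ w2

def init_params(d_in, d_model, d_hidden, horizon, rng):
    def w(*shape):
        return rng.normal(scale=0.1, size=shape)
    return {
        "attn": (w(d_in, d_model), w(d_in, d_model), w(d_in, d_model)),
        "gate_head": (w(d_model, d_hidden), w(d_hidden, horizon * 2)),
        "security_head": (w(d_model, d_hidden), w(d_hidden, horizon * 2)),
    }

def forecast(x, params, horizon):
    """Return (gate, security), each (horizon, 2) = [queue length, waiting time]."""
    h = self_attention(x, *params["attn"])
    summary = h[-1]  # representation at the most recent time step
    gate = mlp_head(summary, *params["gate_head"]).reshape(horizon, 2)
    security = mlp_head(summary, *params["security_head"]).reshape(horizon, 2)
    return gate, security
```

Sharing the attention encoder lets both heads use inter-facility correlations, while separate heads keep gate and security outputs on their own scales.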

2606.07556 2026-06-09 cs.NI cs.AI stat.ME 新提交

Selecting New Measurement Locations to Diversify Traffic-Pattern Coverage: A Real-World Evaluation for Total Traffic Volume Estimation

选择新的测量位置以多样化交通模式覆盖:总交通量估计的实际评估

Masaaki Inoue, Akifumi Okuno, Shintaro Fukushima

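AI summary: To address the limited coverage of fixed traffic counters, proposes using widely available device data to select new counter locations that increase the diversity of observed traffic patterns, improving city-wide traffic volume estimation accuracy, as verified by new field measurements.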

Comments
12 pages, 7 figures
AI中文摘要

准确测量交通量和流量对于现代智能交通至关重要。然而,尽管传感器设备最近取得了技术进步,安装和维护固定交通计数器的成本仍然很高。因此,它仅限于可以安装计数器的一小部分位置点,这严重限制了在城市范围内掌握和预测总交通量的可能性。相比之下,具有位置历史的设备(如智能手机和联网车辆)现在被广泛使用,并提供更广泛的空间覆盖。然而,这些设备的数据通常是部分且嘈杂的,因此不足以直接估计总交通量和流量。在本文中,我们利用这些广泛可用设备的信息来帮助决定在何处放置额外的交通计数器,并研究选择新的测量位置如何改善城市范围的交通估计性能。为此,我们提出了一种算法,该算法选择额外的计数器位置以增加观测到的交通信号模式的多样性,而不是简单地将计数器均匀分布在空间上。目标是捕获当前计数器集中稀有的交通模式类型,并使收集的观测结果对后续估计和预测更具代表性。我们还进行了实际评估;在一个目标城市中,我们选择了预期能改善交通预测的新位置,然后自费在这些位置进行了新的实地测量。所得数据提高了不同保真度下交通量估计的准确性。

英文摘要

Accurate measurement of traffic volumes and flows is vital for modern intelligent transportation. However, despite recent technological advances in sensor devices, fixed traffic counters remain expensive to install and maintain. Counting is therefore restricted to the small number of locations where counters can be installed, which severely limits the ability to grasp and predict total traffic volume at the city-wide level. By contrast, devices with location history, such as smartphones and connected vehicles, are now widely used and provide much wider spatial coverage. However, the data from these devices are usually partial and noisy, so they are insufficient for directly estimating total traffic volumes and flows. In this paper, we use the information from these widely available devices to help decide where to place additional traffic counters, and we study how selecting new measurement locations can improve city-wide traffic estimation performance. To achieve this, we propose an algorithm that chooses additional counter locations to increase the diversity of observed traffic signal patterns, rather than simply spreading counters evenly over space. The goal is to capture traffic-pattern types that are rare in the current counter set and to make the collected observations more representative for later estimation and forecasting. We also present a real-world evaluation: in a target city, we selected new locations expected to improve traffic prediction and then commissioned new field measurements at those locations at our own expense. The resulting data improved traffic volume estimation accuracy across different fidelities.
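The abstract does not specify the selection algorithm. One simple way to "increase pattern diversity rather than spread counters evenly" is greedy max-min (farthest-point) selection in pattern space, swapped in here as a stand-in for the authors' method. The sketch assumes each candidate location has a pattern vector, for example a normalized daily volume profile derived from device data; the names are hypothetical.

```python
import math

def select_diverse_locations(patterns, existing, k):
    """Greedy max-min selection of up to k new counter sites.

    patterns: location -> pattern vector (tuple of floats).
    existing: locations that already have counters.
    Each step adds the candidate whose pattern is farthest from every observed pattern.
    """
    chosen = list(existing)
    candidates = [loc for loc in patterns if loc not in set(existing)]
    new = []
    for _ in range(min(k, len(candidates))):
        def coverage_gap(loc):
            # distance to the most similar pattern already covered by a counter
            return min((math.dist(patterns[loc], patterns[c]) for c in chosen), default=math.inf)
        best = max(candidates, key=coverage_gap)
        chosen.append(best)
        new.append(best)
        candidates.remove(best)
    return new
```

Note that distance is measured between traffic patterns, not geographic coordinates, so two adjacent roads with very different daily profiles can both be selected.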

2606.07379 2026-06-09 cs.LG cs.AI cs.CL stat.ME 版本更新

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

编码智能体会欺骗我们吗?通过带随机测试的上限评估检测和防止作弊

Thanawat Lodkaew, Johannes Ackermann, Soichiro Nishimori, Nontawat Charoenphakdee, Masashi Sugiyama, Takashi Ishida

Affiliations: The University of Tokyo; RIKEN

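AI summary: Proposes the CapCode framework, which detects cheating by models on coding tasks through capped evaluation, and designs the CapReward reward mechanism to prevent cheating; experiments show the approach effectively detects and reduces cheating.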

AI中文摘要

在智能体评估和训练中,一个日益增长的失败模式是模型可以通过利用捷径而非解决预期任务来获得高评估分数,产生欺骗性表现。这使得评估分数作为真实任务解决能力的度量不可靠。我们提出CapCode,一个构建带有随机测试的编码数据集的框架,其最佳可达的非作弊性能被故意限制在1以下。这种上限性能设计赋予评估分数更清晰的解释:显著高于上限的分数是不可信的,因此提供了作弊的证据。为了防止作弊,我们提出CapReward,一种基于CapCode原则的奖励设计,以抑制超出上限的优化。跨多个数据集的实验表明,CapCode能够检测作弊同时保持模型的性能排名,CapReward减少了作弊行为,产生了更好地遵循预期任务规范的模型。

英文摘要

A growing failure mode in agent evaluation and training is that models can achieve high evaluation scores by exploiting shortcuts instead of solving the intended task, producing deceptive performance. This makes evaluation scores unreliable as measures of true task-solving ability. We propose CapCode, a framework for constructing coding datasets with randomized tests whose best achievable non-cheating performance is deliberately capped below one. This capped-performance design gives evaluation scores a clearer interpretation: scores substantially above the cap are implausible and therefore provide evidence of cheating. To prevent cheating, we propose CapReward, a reward design based on the CapCode principle to discourage optimization beyond the cap. Experiments across multiple datasets show that CapCode detects cheating while preserving performance ranking of models, and CapReward reduces cheating behavior, yielding models that better follow the intended task specification.
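Why a score "substantially above the cap" is evidence of cheating can be quantified with a binomial tail, under the assumption (ours, not stated in the abstract) that an honest solver passes each randomized test independently with probability at most `cap`. The reward shaping below is likewise a hypothetical illustration of "discouraging optimization beyond the cap", not the paper's CapReward definition.

```python
from math import comb

def cheating_p_value(passed, n_tests, cap):
    """P(X >= passed) for X ~ Binomial(n_tests, cap).

    A small value means the observed pass count is implausible for an honest
    solver whose per-test pass probability is at most cap.
    """
    return sum(comb(n_tests, k) * cap**k * (1 - cap) ** (n_tests - k)
               for k in range(passed, n_tests + 1))

def capped_reward(score, cap, penalty=1.0):
    """Hypothetical shaping: reward grows up to the cap and is penalized beyond it."""
    return score if score <= cap else cap - penalty * (score - cap)
```

For instance, passing all 10 tests when the cap is 0.5 has probability 1/1024 for an honest solver, which is strong evidence of shortcut exploitation.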

2606.05450 2026-06-09 stat.AP stat.ME 版本更新

Eigenvector Spatial Filters Nuclear Norm Matrix Completion with Application to Air Quality Data

特征向量空间滤波核范数矩阵补全及其在空气质量数据中的应用

Rodolfo Metulini

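AI summary: For reliable reconstruction of missing observations in environmental panel data, proposes Eigenvector Spatial Filters Nuclear Norm Matrix Completion, which captures spatial autocorrelation through Moran-type eigenvectors and solves the resulting multi-convex optimization problem by block-coordinate descent; it substantially improves imputation accuracy in simulations and is applied to real air quality data.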

Comments
29 pages, 5 figures, 14 tables, draft version (please do not cite yet)
AI中文摘要

环境面板数据中缺失观测的可靠重建对于准确的暴露评估和政策分析至关重要。传统的核范数矩阵补全方法能有效插补低秩矩阵中的缺失条目,但往往忽略了空气质量过程固有的空间依赖性。本文介绍了特征向量空间滤波核范数矩阵补全(ESFNNMC)方法,它是核范数固定效应矩阵补全的扩展,用一组捕获数据中空间自相关的Moran型特征向量替代单位特定的截距项。为了估计该模型,我们提出了一种用于多凸优化问题的块坐标下降(BCD)方法,结合软阈值奇异值分解和交叉验证的正则化。通过改变缺失模式、空间和时间自相关水平以及矩阵的维度、形状和秩结构的综合模拟,ESFNNMC在插补精度上比标准固定效应方法有显著提升,同时计算成本基本保持不变。该方法应用于插补2021年意大利伦巴第大区64个监测站每日PM10测量值中的缺失条目。

英文摘要

Reliable reconstruction of missing observations in environmental panel datasets is essential for accurate exposure assessment and policy analysis. Traditional nuclear norm matrix completion methods effectively impute missing entries in low-rank matrices, yet often overlook the spatial dependence inherent to air quality processes. This paper introduces the Eigenvector Spatial Filters Nuclear Norm Matrix Completion (ESFNNMC) method, an extension of nuclear norm fixed-effects matrix completion that replaces unit-specific intercepts with a set of Moran-type eigenvectors capturing spatial autocorrelation in the data. To estimate the model, we propose a Block-Coordinate Descent (BCD) approach for multiconvex optimization problems, with soft-thresholded singular value decomposition and cross-validated regularization. Through comprehensive simulations varying missingness patterns, the level of spatial and temporal autocorrelation, and the dimension, shape, and rank structure of the matrices, ESFNNMC demonstrates substantial improvements in imputation accuracy over the standard fixed-effects approach, while keeping the computational cost approximately unchanged. The method is applied to impute missing entries in daily PM10 measurements at 64 monitoring stations in Lombardy, Italy, during 2021.
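A minimal NumPy sketch of the idea, under stated assumptions: the model is taken to be $Y \approx L + (E\beta)\mathbf{1}^\top + \mathbf{1}\delta^\top$, with $E$ the top Moran eigenvectors of $MWM$ ($M = I - \mathbf{1}\mathbf{1}^\top/n$, $W$ a symmetric spatial weight matrix) replacing unit intercepts, time effects $\delta$ kept as in fixed-effects matrix completion, and $L$ penalized by the nuclear norm. The three blocks are updated in turn; the $L$ update is a soft-impute (singular value soft-thresholding) step. The block order, fixed `lam`, and all names are our assumptions; the paper cross-validates the regularization.

```python
import numpy as np

def moran_eigenvectors(W, k):
    """Top-k eigenvectors of M W M, M = I - 11'/n; they map positive spatial autocorrelation."""
    n = W.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ W @ M)  # symmetric -> real, ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:k]]

def svt(A, lam):
    """Singular value soft-thresholding: proximal operator of lam * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def esf_nnmc(Y, mask, E, lam, n_iter=200):
    """Block-coordinate descent sketch; Y is (units, time), mask True where observed."""
    Y0 = np.where(mask, Y, 0.0)  # placeholder values where missing
    obs = mask.astype(float)
    rows, cols = np.nonzero(mask)
    L = np.zeros_like(Y0)
    beta = np.zeros(E.shape[1])
    for _ in range(n_iter):
        # Block 1: time effects = observed column means of the current residual
        R = Y0 - L - (E @ beta)[:, None]
        delta = (R * obs).sum(axis=0) / np.maximum(obs.sum(axis=0), 1.0)
        # Block 2: spatial-filter coefficients by least squares on observed cells
        R = Y0 - L - delta[None, :]
        beta = np.linalg.lstsq(E[rows], R[rows, cols], rcond=None)[0]
        # Block 3: soft-impute step for the low-rank component
        fitted = (E @ beta)[:, None] + delta[None, :]
        L = svt(np.where(mask, Y0 - fitted, L), lam)
    return L + (E @ beta)[:, None] + delta[None, :]
```

Replacing $n$ unit intercepts by $k \ll n$ eigenvector coefficients is what lets spatial structure inform the imputation at stations with many missing days.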

2606.00384 2026-06-09 cs.AI cs.CL cs.CV cs.LG stat.CO 版本更新

VESTA: Visual Exploration with Statistical Tool Agents

VESTA: 基于统计工具代理的视觉探索

William Rudman, Abhishek Divekar, Kanishk Jain, Sebastian Joseph, Stella S. R. Offner, Matthew Lease, Kyle Mahowald, Greg Durrett, Junyi Jessy Li

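AI summary: Proposes the VESTA framework, which uses a dynamically growing toolkit to guide data transformations, hypothesis-driven visualizations, and statistical tests, improving the performance of vision-language models on complex statistical modeling tasks.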

AI中文摘要

将定量模型拟合到数据上是科学工作流程中的核心步骤,但它仍然是最少自动化的步骤之一。最近的基于代理的系统利用语言和视觉语言模型(VLM)来迭代地提出和优化统计模型,但这些系统在更具挑战性的建模任务上表现不佳。为了解决这些限制,我们引入了VESTA:基于统计工具代理的视觉探索,这是一个框架,为VLM配备了一个动态增长的探索工具包,通过数据变换、假设驱动的可视化和稳健的统计检验来指导模型优化。与之前仅依赖迭代批评的系统不同,VESTA在优化之前和优化过程中通过选择或创建诊断工具主动探索数据,这些工具会累积在模型的上下文中,并可在以后重用。我们在三种工具配置下评估VESTA与已建立的基线:无工具、静态专家编写的工具和动态模型编写的工具。为了支持这一评估,我们引入了DAWN(自动工作流和数值建模数据集),这是一个针对分布拟合和时间序列建模的基准,具有不同的难度等级,并最终涉及真实世界的天文学任务,包括建模初始质量函数和引力波啁啾信号。我们发现VESTA的动态工具创建优于先前的代理流水线,在复杂和特定领域的任务上取得了最大的收益。我们进一步表明,动态生成的工具比现有视觉工具创建系统生成的工具复杂得多,每个函数覆盖更多的诊断类别,并且强烈倾向于VLM批评者可以直接推理的视觉输出。

英文摘要

Fitting quantitative models to data is a central step in scientific workflows, yet it remains one of the least automated. Recent agent-based systems leverage language and vision-language models (VLMs) to iteratively propose and refine statistical models, but these systems struggle on more challenging modeling tasks. To address these limitations, we introduce VESTA: Visual Exploration with Statistical Tool Agents, a framework that equips VLMs with a dynamically growing exploration toolkit to guide model refinement through data transformations, hypothesis-driven visualizations, and robust statistical tests. Unlike prior systems that rely on iterative critique alone, VESTA actively explores data before and during refinement by selecting or creating diagnostic tools, which accumulate in the model's context and can be reused later. We evaluate VESTA against established baselines in three toolkit configurations: no tools, static expert-written tools, and dynamic model-written tools. To support this evaluation, we introduce DAWN (Dataset for Automated Workflows and Numerical Modeling), a benchmark targeting distribution fitting and time series modeling with varying difficulty tiers, and culminating in real-world astronomy tasks including modeling initial mass functions and gravitational-wave chirp signals. We find that VESTA's dynamic tool creation outperforms prior agentic pipelines, with the largest gains on complex and domain-specific tasks. We further show that dynamically generated tools are substantially more sophisticated than those produced by existing visual tool-creation systems, covering more diagnostic categories per function and strongly preferring visual outputs that the VLM critic can reason over directly.
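The "dynamically growing toolkit" is essentially a registry whose tools persist across refinement rounds and are reused instead of regenerated. The sketch below is a hypothetical illustration of that data structure, not VESTA's implementation; in VESTA the factories would be model-written code, whereas here `make_skew_check` is a hand-written example with an assumed skewness threshold.

```python
import statistics
from typing import Callable, Dict, List

Tool = Callable[[List[float]], dict]

class Toolkit:
    """Registry of diagnostic tools that accumulates across refinement rounds."""

    def __init__(self) -> None:
        self.tools: Dict[str, Tool] = {}

    def get_or_create(self, name: str, factory: Callable[[], Tool]) -> Tool:
        # Reuse a tool created in an earlier round; only build it if it is new.
        if name not in self.tools:
            self.tools[name] = factory()
        return self.tools[name]

def make_skew_check() -> Tool:
    def skew_check(xs: List[float]) -> dict:
        m, s = statistics.fmean(xs), statistics.pstdev(xs)
        skew = sum(((x - m) / s) ** 3 for x in xs) / len(xs)
        # Hypothetical heuristic: strongly right-skewed positive data suggest a log transform.
        return {"skewness": skew, "suggest_log_transform": skew > 1.0 and min(xs) > 0}
    return skew_check
```

Keeping tools in a named registry is what allows a diagnostic written for one dataset to be reused on later, harder tasks.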

2604.06278 2026-06-09 stat.ME cs.CY stat.AP 版本更新

Predictive Volatility of Machine Learning in Micro-Samples: A Regularised Assessment of Regional Poverty

机器学习在微样本中的预测波动性:区域贫困的正则化评估

A. H. Jamaluddin, A. T. R. Dani, N. I. Mahat, V. Ratnasari, S. S. M. Fauzi

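AI summary: Uses regularisation methods to assess the structural drivers of regional poverty in Indonesia, finding that simple linear shrinkage models outperform complex machine-learning ensembles in predictive performance and that ICT skills are the most stable proxy for predicting provincial poverty.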

Comments
Corrections are needed
AI中文摘要

在区域数据集确定贫困的结构性驱动因素时,小样本量和高多维共线性常导致不稳定且误导性的政策建议。本文通过解决这些特定的统计风险,评估了印度尼西亚省份贫困的原因。我们采用了一个为小样本(n=34)和高共线性设计的严格模型比较框架,比较了标准线性模型、频率学惩罚、贝叶斯收缩先验、调整后的空间内在条件自回归(ICAR)模型和复杂的机器学习集成。为确保稳健的评估,我们使用严格的留一验证(LOOCV)测量预测性能。结果表明,算法复杂性在区域数据集中本质上具有风险:简单的线性收缩模型(Ridge、Elastic Net、LASSO)在样本外预测中表现最佳,而复杂的集成如BART则遭受严重的过拟合。在所有成功的正则化模型中,ICT技能始终是预测较低省级贫困最稳定的代理变量。本文的主要贡献是证明,在数据受限的区域分析中,参数正则化的线性收缩模型比简单的OLS或无约束的机器学习提供了更可靠的数学基础,以隔离结构性发展优先事项,如ICT。

英文摘要

Small regional datasets pose a dual statistical problem: correlated predictors inflate estimation variance, while flexible learners can become unstable because the available information per adaptive degree of freedom is limited. We examine this issue through predictive volatility, defined as the cross-sample dispersion and upper-tail behaviour of out-of-sample loss. Using simulation evidence reported for sparse linear, near-linear and heavy-tailed settings, we compare ordinary least squares, frequentist penalties, Bayesian shrinkage models, bounded-response and spatial specifications, and flexible machine-learning procedures. In the reported simulation results, regularised linear estimators generally dominate in the linear high-collinearity micro-sample settings and remain the most reliable overall, whereas tree-based methods become more competitive only when the signal is weakly nonlinear and the sample size is larger. In the empirical application to 34 Indonesian provinces, ridge yields the best leave-one-out performance, followed by elastic net and lasso. Across the Bayesian shrinkage specifications, ICT skills show the most consistent negative association with poverty, with the strongest support under horseshoe and spike-and-slab formulations. These results suggest that, in micro-sample regional modelling, the main constraint is limited information per effective degree of freedom rather than insufficient algorithmic flexibility.
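Two computations mentioned above can be sketched in NumPy: exact leave-one-out residuals for ridge regression via the hat-matrix shortcut $e_{(i)} = e_i / (1 - H_{ii})$ with $H = X(X^\top X + \lambda I)^{-1}X^\top$, and a summary of predictive volatility as the dispersion and an upper quantile of out-of-sample losses across samples. The sketch assumes centred data with no intercept, and the 0.9 quantile is our choice; neither is taken from the paper.

```python
import numpy as np

def ridge_loo_errors(X, y, lam):
    """Exact leave-one-out residuals for ridge (no intercept) via the hat matrix."""
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))

def predictive_volatility(losses, q=0.9):
    """Cross-sample dispersion and upper-tail level of out-of-sample loss."""
    losses = np.asarray(losses, dtype=float)
    return {"sd": float(losses.std(ddof=1)), "upper_quantile": float(np.quantile(losses, q))}
```

With n = 34 provinces, the shortcut avoids refitting 34 times while giving the same LOO residuals; repeating it over simulated samples and feeding the mean squared LOO errors to `predictive_volatility` yields the two volatility summaries.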

2602.17640 2026-06-09 stat.AP cs.SE 版本更新

huff: A Python package for Market Area Analysis

huff:用于市场区域分析的Python包

Thomas Wieland

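AI summary: The huff package provides a complete market area analysis workflow, including data import, origin-destination (OD) matrix construction, model analysis, parameter estimation, and visualization, for research in economic geography and health geography.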

Comments
v1.2.1; added references, updated scientific usage and PyPI usage statistics
AI中文摘要

市场区域模型,如Huff模型及其扩展,广泛用于估算零售和服务地点的市场份额和客户流量。huff Python包提供完整的市场区域分析工作流,包括数据导入、OD交互矩阵构建、基本模型分析、参数估计、距离或旅行时间指标计算以及地图可视化。此外,该包还提供多种空间可达性分析方法。该包模块化且面向对象,适用于经济地理学、区域经济学、空间规划、营销、地理信息科学和健康地理学研究。软件通过Python Package Index(PyPI)公开可用(https://pypi.org/project/huff/);其开发和版本历史管理在公共GitHub仓库(https://github.com/geowieland/huff_official)中,并在Zenodo(https://doi.org/10.5281/zenodo.18639559)中归档。

英文摘要

Market area models, such as the Huff model and its extensions, are widely used to estimate regional market shares and customer flows of retail and service locations. Another, now very common, area of application is the analysis of catchment areas, supply structures and the accessibility of healthcare locations. The huff Python package provides a complete workflow for market area analysis, including data import, construction of origin-destination interaction matrices, basic model analysis, parameter estimation from empirical data, calculation of distance or travel time indicators, and map visualization. Additionally, the package provides several methods of spatial accessibility analysis. The package is modular and object-oriented. It is intended for researchers in economic geography, regional economics, spatial planning, marketing, geoinformation science, and health geography. The software is openly available via the Python Package Index (PyPI) (https://pypi.org/project/huff/); its development and version history are managed in a public GitHub repository (https://github.com/geowieland/huff_official) and archived at Zenodo (https://doi.org/10.5281/zenodo.18639559).
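For orientation, the basic Huff model computes the probability that customers at origin $i$ choose location $j$ as $P_{ij} = A_j^{\gamma} d_{ij}^{-\lambda} / \sum_k A_k^{\gamma} d_{ik}^{-\lambda}$, where $A_j$ is attractiveness (e.g. sales area) and $d_{ij}$ is distance or travel time. The sketch below implements this textbook formula in plain Python; it deliberately does not use the huff package, whose API is not described here, and the default exponents are arbitrary.

```python
def huff_probabilities(attractiveness, distances, gamma=1.0, lam=2.0):
    """Huff model choice probabilities.

    attractiveness: A_j for each location j.
    distances: distances[i][j] from origin i to location j (must be > 0).
    gamma: attractiveness exponent; lam: distance-decay exponent.
    """
    probs = []
    for row in distances:
        utilities = [a ** gamma * d ** (-lam) for a, d in zip(attractiveness, row)]
        total = sum(utilities)
        probs.append([u / total for u in utilities])
    return probs

def expected_flows(populations, probs):
    """Expected customers from each origin to each location: population_i * P_ij."""
    return [[pop * p for p in row] for pop, row in zip(populations, probs)]
```

With two equally attractive stores at distances 1 and 2 and `lam=2`, the nearer store captures 80% of the origin's customers; estimating `gamma` and `lam` from observed flows is the parameter-estimation step mentioned above.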

2603.10823 2026-06-09 stat.ML cs.LG 版本更新

ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning

ReTabSyn:通过强化学习实现真实表格数据合成

Xiaofeng Lin, Seungbae Kim, Zhuoya Li, Zachary DeSoto, Charles Fleming, Guang Cheng

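AI summary: ReTabSyn uses reinforcement learning to prioritize learning the conditional distribution, improving the efficiency of tabular data synthesis on small datasets and outperforming existing baselines.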

AI中文摘要

深度生成模型可通过生成合成训练数据缓解数据稀缺和隐私问题,但在低数据、不平衡的表格设置中难以完全学习复杂的数据分布。我们认为追求完整的联合分布可能过于苛刻;为了提高数据效率,模型应优先学习条件分布$P(y\mid \mathbf{X})$,这由最近的理论分析所支持。因此,我们通过ReTabSyn,一个提供合成器训练过程中特征相关性保留直接反馈的Reinforced Tabular Synthesis流程,克服了这一限制。这一目标鼓励生成器在数据有限时优先考虑最有用的预测信号,从而增强下游模型的实用性。我们通过这种做法对基于语言模型的生成器进行经验微调,并在具有小样本量、类别不平衡和分布偏移的基准测试中,ReTabSyn始终优于最先进的基线方法。此外,我们的方法可以轻松扩展到控制合成表格数据的各种方面,例如应用专家指定的生成观测约束。

英文摘要

Deep generative models can help with data scarcity and privacy by producing synthetic training data, but they struggle in low-data, imbalanced tabular settings to fully learn the complex data distribution. We argue that striving for the full joint distribution could be overkill; for greater data efficiency, models should prioritize learning the conditional distribution $P(y\mid \mathbf{X})$, as suggested by recent theoretical analysis. Therefore, we overcome this limitation with ReTabSyn, a Reinforced Tabular Synthesis pipeline that provides direct feedback on feature correlation preservation during synthesizer training. This objective encourages the generator to prioritize the most useful predictive signals when training data is limited, thereby strengthening downstream model utility. We empirically fine-tune a language model-based generator using this approach, and across benchmarks with small sample sizes, class imbalance, and distribution shift, ReTabSyn consistently outperforms state-of-the-art baselines. Moreover, our approach can be readily extended to control various aspects of synthetic tabular data, such as applying expert-specified constraints on generated observations.

2402.06428 2026-06-09 stat.ME 版本更新

Smooth Transformation Models for Survival Analysis: A Tutorial Using R

生存分析的平滑变换模型:使用R的教程

Sandra Siegfried, Bálint Tamási, Torsten Hothorn

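AI summary: Introduces the smooth transformation model framework in R as a unified treatment of many survival models, including complex settings such as non-proportional hazards and dependent censoring, and demonstrates its implementation using data from a rectal cancer trial.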

AI中文摘要

在过去的五十年里,生存分析方法取得了重大进展,使用了参数方法以及更突出的基于非参数/半参数估计的方法。随着方法论格局的不断演变,在众多方法中导航并识别可用的软件资源变得越来越具有挑战性——尤其是在更复杂的场景中,例如处理区间删失或聚类生存数据、非比例风险或相依删失。本教程探讨了在R统计计算系统中使用平滑变换模型框架进行生存分析的潜力。该框架提供了一种统一的最大似然方法,涵盖了广泛的生存模型,包括成熟的Weibull模型和著名的Cox比例风险模型的完全参数版本,以及针对更复杂场景的各种扩展。我们探索了该框架内的非比例/交叉风险模型、相依删失、聚类观测以及个性化医学的扩展。使用一项关于直肠癌治疗的两臂随机对照试验的生存数据,我们演示了如何利用“tram”包及少数相关包提供的实现,在该框架内无缝地在R中完成生存分析任务。

英文摘要

Over the last five decades, we have seen strong methodological advances in survival analysis, using parametric methods and, more prominently, methods based on non-/semi-parametric estimation. As the methodological landscape continues to evolve, the task of navigating through the multitude of methods and identifying available software resources is becoming increasingly challenging, especially in more complex scenarios such as interval-censored or clustered survival data, non-proportional hazards, or dependent censoring. This tutorial explores the potential of using the framework of smooth transformation models for survival analysis in the R system for statistical computing. This framework provides a unified maximum-likelihood approach that covers a wide range of survival models, including well-established ones such as the Weibull model and a fully parametric version of the famous Cox proportional hazards model, and various extensions for more complex scenarios. We explore models for non-proportional/crossing hazards, dependent censoring, clustered observations and extensions towards personalised medicine within this framework. Using survival data from a two-arm randomised controlled trial on rectal cancer therapy, we demonstrate how survival analysis tasks can be seamlessly navigated in R within this framework using the implementation provided by the "tram" package and a few related packages.
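The core idea of a transformation model is $P(T \le t \mid x) = F_Z(h(t) + x^\top\beta)$ with a monotone transformation $h$. Choosing $F_Z$ as the minimum extreme value distribution $F_Z(z) = 1 - \exp(-e^{z})$ gives $S(t\mid x) = \exp(-e^{h(t)} e^{x^\top\beta})$, a proportional hazards model with baseline cumulative hazard $e^{h(t)}$; taking $h(t) = \theta_0 + \theta_1 \log t$ gives the Weibull model. The plain-Python sketch below uses this sign convention (so $\beta$ is a log-hazard ratio); tram uses its own parameterization of the linear predictor, so check its documentation before comparing signs, and it uses a more flexible smooth $h$ (e.g. Bernstein polynomials) in place of $\log t$.

```python
import math

def mev_cdf(z):
    """Minimum extreme value CDF F_Z(z) = 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))

def weibull_transformation(t, theta0, theta1):
    """Log-linear transformation h(t) = theta0 + theta1 * log(t), theta1 > 0."""
    return theta0 + theta1 * math.log(t)

def survival(t, x, beta, theta0, theta1):
    """S(t | x) = 1 - F_Z(h(t) + x * beta): Weibull proportional hazards,
    with log-hazard ratio beta under this sign convention."""
    return 1.0 - mev_cdf(weibull_transformation(t, theta0, theta1) + x * beta)
```

The proportional hazards structure shows up as $S(t\mid x) = S(t\mid 0)^{\exp(x\beta)}$, which the usage check confirms numerically.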

12. Other / General Statistics (33 papers)

2606.09737 2026-06-09 math.ST stat.TH 新提交

Online change point detection under heavy-tailedness and contamination

重尾与污染下的在线变点检测

Edwin Yiu Nam Tang, Yudong Chen, Mengchu Li, Yi Yu

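AI summary: Proposes online robust mean change point detection under a dynamic Huber contamination model. In the univariate case, the parameter space is partitioned to characterise the detection delay, with procedures that are optimal up to poly-logarithmic factors; in the multivariate case, a robust mean testing procedure is designed that is the first to handle Huber contamination and heavy tails simultaneously.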

AI中文摘要

我们研究了在动态Huber污染模型下的在线鲁棒均值变点检测问题,该模型具有任意污染分布和具有指数或多项式衰减尾部的内点分布。这种鲁棒性框架在变点文献中首次被系统研究。对于单变量数据,我们通过根据真实变点位置、信号大小和污染水平将参数空间划分为四个区域来刻画检测延迟。高效的检测程序伴随匹配的下界,直到多对数因子。对于多变量设置,我们设计了一个高效的鲁棒均值检验程序,并将其应用于鲁棒在线变点问题。该鲁棒均值检验程序的理论分析是首次同时处理Huber污染和重尾性,因此具有独立的研究价值。进行了大量的数值实验以支持我们的理论发现。

英文摘要

We study an online version of the robust mean change point detection problem under a dynamic Huber contamination model with arbitrary contamination distribution and inlier distribution possessing exponentially- or polynomially-decaying tails. This robustness framework is systematically studied for the first time in the change point literature. For univariate data, we characterise the detection delay by partitioning the parameter space into four regimes, in terms of the true change location, signal size and contamination level. Efficient detection procedures are accompanied by matching lower bounds, up to poly-logarithmic factors. For the multivariate setting, we devise an efficient robust mean testing procedure and apply it to the robust online change point problem. The theoretical analysis of the robust mean testing procedure is the first to deal with both Huber contamination and heavy-tailedness, and is thus of independent interest. Extensive numerical experiments are conducted to support our theoretical findings.
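Why robustness matters online can be seen with a deliberately simple detector. The sketch below is a swapped-in technique, not the authors' procedure and without their guarantees: it compares the median of a sliding window against the median of an initial segment assumed to be change-free. Isolated contaminated points cannot move the median, whereas a sustained mean shift does once it fills more than half the window. `window` and `threshold` are illustrative tuning parameters.

```python
from collections import deque
from statistics import median

def robust_online_detector(stream, n_ref, window, threshold):
    """Return the first time t at which the sliding-window median differs from the
    reference median by more than threshold, or None if no change is declared.

    The first n_ref observations are assumed to be pre-change (reference) data.
    """
    history, buf, ref = [], deque(maxlen=window), None
    for t, x in enumerate(stream):
        if t < n_ref:
            history.append(x)
            if t == n_ref - 1:
                ref = median(history)
            continue
        buf.append(x)
        if len(buf) == window and abs(median(buf) - ref) > threshold:
            return t
    return None
```

For a stream of zeros with occasional outliers of magnitude 100 that shifts to 5 at time 100, a window of 9 declares the change at time 104, once the post-change points form a majority; the single outliers never trigger an alarm. The detection delay grows with the window size, which mirrors the delay-versus-contamination trade-off analysed in the paper.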

2606.09660 2026-06-09 math.ST stat.TH 新提交

New Baire category results for stochastic orders on bivariate copulas

二元连接函数上随机序的新的Baire范畴结果

María del Rosario Rodríguez-Griñolo, Manuel Úbeda-Flores

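AI summary: Proves that, in the sense of Baire category, the set of pairs of bivariate copulas comparable under the increasing convex order is nowhere dense, extends the result to the bivariate convex order and the stop-loss order, and concludes that a topologically generic pair of copulas is incomparable under all three orders.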

AI中文摘要

在Baire范畴的意义下,我们证明了在一致度量下,所有二元连接函数对的空间中,那些在递增凸序下可比较(无论方向)的二元连接函数对集合是无处稠密的。因此,一个拓扑一般的二元连接函数对在该序下是不可比较的。我们进一步将Baire范畴程序推广到二元连接函数空间上的另外两个随机序:二元凸序和分量和的停止损失序。对于这些序中的每一个,我们证明了可比较对集合是闭集且无处稠密,并且表明一个拓扑一般的二元连接函数对同时在这三个序下都是不可比较的。这些结果补充了[F. Durante, J. Fernández-Sánchez, C. Ignazzi (2022). Baire category results for stochastic orders. Rev. Real Acad. Cienc. Exactas Fis. Nat. Ser. A-Mat. 116, article 188]中关于连接函数的下象限序的结果。

英文摘要

In the sense of Baire categories, we prove that the set of pairs of bivariate copulas that are comparable -- in either direction -- under the increasing convex order is nowhere dense in the space of all pairs of bivariate copulas equipped with the uniform metric. As a consequence, a topologically generic pair of bivariate copulas is not comparable under this order. We further extend the Baire-category programme to two additional stochastic orders on the space of bivariate copulas: the bivariate convex order and the stop-loss order on the sum of the components. For each of these orders, we establish that the set of comparable pairs is closed and nowhere dense, and we show that a topologically generic pair of bivariate copulas is simultaneously incomparable in all three orders. These results complement those obtained in [F. Durante, J. Fernández-Sánchez, C. Ignazzi (2022). Baire category results for stochastic orders. Rev. Real Acad. Cienc. Exactas Fis. Nat. Ser. A-Mat. 116, article 188] for the lower orthant order on copulas.
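For reference, the three orders can be written with the standard definitions for random vectors, where $(U_C, V_C) \sim C$ and $(U_D, V_D) \sim D$ and expectations are assumed to exist. This is our summary of textbook definitions rather than notation taken from the paper; a pair is "comparable" if $C \preceq D$ or $D \preceq C$ in the given order.

```latex
\begin{aligned}
C \le_{\mathrm{icx}} D &\iff \mathbb{E}\,\varphi(U_C,V_C) \le \mathbb{E}\,\varphi(U_D,V_D)
  \quad \text{for all increasing convex } \varphi:[0,1]^2\to\mathbb{R},\\
C \le_{\mathrm{cx}} D &\iff \mathbb{E}\,\varphi(U_C,V_C) \le \mathbb{E}\,\varphi(U_D,V_D)
  \quad \text{for all convex } \varphi:[0,1]^2\to\mathbb{R},\\
C \le_{\mathrm{sl}} D &\iff \mathbb{E}\,(U_C+V_C-t)_+ \le \mathbb{E}\,(U_D+V_D-t)_+
  \quad \text{for all } t\in\mathbb{R}.
\end{aligned}
```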

2606.09594 2026-06-09 math.ST cond-mat.stat-mech cs.NA math-ph math.MP math.NA stat.TH 新提交

Constraint residuals, graph posteriors, and determinant-corrected full-space targets in Bayesian inverse problems

贝叶斯逆问题中的约束残差、图后验和行列式校正的全空间目标

Jonathon Cottom, Emilia Olsson

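AI summary: For equality-constrained Bayesian inverse problems, shows that residual-penalty formulations are not equivalent to the reduced posterior, and derives determinant corrections whose hard-constraint limit is the graph-lifted reduced posterior.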

AI中文摘要

受状态方程约束的贝叶斯逆问题通常通过惩罚残差在全参数-状态空间中采样,而不是在消除状态的约化空间中采样。我们表明这些公式作为后验测度并不自动等价。对于等式约束逆问题的有限维离散化,假设状态方程 \(c(\theta,u)=0\) 有唯一解 \(u=G(\theta)\) 且状态雅可比 \(D_u c\) 非奇异。那么约化后验、其图提升以及零噪声残差后验是不同的。局部变量变换表明,未校正的高斯残差惩罚在边缘化 \(u\) 后收敛到乘以 \(\lvert\det D_u c(\theta,G(\theta))\rvert^{-1}\) 的约化密度。因此,代数等价的残差可以定义相同的可行集但不同的极限后验。我们推导了无权重、加权和重新缩放的残差惩罚的行列式校正,这些校正具有图提升的约化后验作为其硬约束极限。该结果将可行性与后验校准分开:将残差驱动到零不足以精确采样图提升的约化后验,除非采样或校正步骤针对相应的校正密度。

英文摘要

Bayesian inverse problems constrained by state equations are often sampled in a full parameter-state space by penalising the residual, rather than in a reduced space where the state is eliminated. We show that these formulations are not automatically equivalent as posterior measures. For finite-dimensional discretisations of equality-constrained inverse problems, assume the state equation \(c(\theta,u)=0\) has a unique solution \(u=G(\theta)\) and nonsingular state Jacobian \(D_u c\). The reduced posterior, its graph lift, and the zero-noise residual posterior are then distinct. A local change of variables shows that an uncorrected Gaussian residual penalty converges, after marginalisation over \(u\), to the reduced density multiplied by \(\lvert\det D_u c(\theta,G(\theta))\rvert^{-1}\). Thus algebraically equivalent residuals can define the same feasible set but different limiting posteriors. We derive determinant corrections for unweighted, weighted, and rescaled residual penalties that have the graph-lifted reduced posterior as their hard-constraint limit. The result separates feasibility from posterior calibration: driving the residual to zero is not sufficient for exact sampling of the graph-lifted reduced posterior unless the sampling or correction step targets the corresponding corrected density.
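The determinant factor can be checked numerically in the simplest scalar case. For a normalized Gaussian penalty $\mathcal{N}(c(\theta,u); 0, \varepsilon^2)$, integrating over $u$ gives $1/\lvert D_u c(\theta, G(\theta))\rvert$ when $c$ is linear in $u$, so two residuals with the same zero set reweight the $\theta$-marginal differently. The toy state map $G(\theta)=\theta^2$ and the rescaled residual $c_2 = (1+\theta^2)(u - G(\theta))$ are hypothetical examples; the "correction" shown is the simplest unweighted scalar case, multiplying the penalty by $\lvert D_u c\rvert$.

```python
import math

def G(theta):
    """Hypothetical state map u = G(theta)."""
    return theta ** 2

def c1(theta, u):
    return u - G(theta)                        # D_u c1 = 1

def c2(theta, u):
    return (1.0 + theta ** 2) * (u - G(theta))  # same feasible set, D_u c2 = 1 + theta^2

def penalty_mass(residual, theta, eps, lo=-5.0, hi=5.0, n=40001):
    """Riemann-sum integral over u of N(residual(theta, u); 0, eps^2)."""
    h = (hi - lo) / (n - 1)
    total = sum(math.exp(-0.5 * (residual(theta, lo + i * h) / eps) ** 2) for i in range(n))
    return total * h / (math.sqrt(2.0 * math.pi) * eps)
```

At $\theta = 1$, `penalty_mass(c1, 1.0, 0.01)` is about 1 while `penalty_mass(c2, 1.0, 0.01)` is about 0.5, so the uncorrected $c_2$ penalty silently multiplies the posterior by $1/(1+\theta^2)$; multiplying that penalty by $\lvert D_u c_2\rvert = 1+\theta^2$ restores a mass of 1.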

2606.09328 2026-06-09 math.ST math.PR stat.TH New submission

Parameter estimation in generalized fractional neuronal models

Pauliina Ilmonen, Milla Laurikkala, Enrica Pirozzi, Luigia Caputo, Lauri Viitasaari

AI summary: For a generalized stochastic fractional neuronal model that combines fractional dynamics with correlated stochastic inputs, proposes a two-step parameter estimation method based on discrete observations. The method uses the asymptotic behavior of Mittag-Leffler functions and fractional differentiation techniques. The paper also derives error bounds and analyzes the stability of the reconstruction.

Abstract

We investigate a generalized stochastic fractional neuronal model combining fractional dynamics with correlated stochastic inputs. The proposed framework is described by a fractional differential equation driven by a latent stochastic process with stationary increments and mean-reverting structure. This formulation allows the inclusion of both short-range and long-range dependence structures and naturally produces non-exponential relaxation phenomena. The main goal is the development of a feasible parameter estimation procedure based on discrete observations of the neuronal state process. We propose a two-step methodology. First, the parameters governing the fractional dynamics are estimated by exploiting the asymptotic behavior of Mittag-Leffler functions near the origin. Subsequently, the latent stochastic input is reconstructed through fractional differentiation techniques, allowing the estimation of the parameters governing the hidden noise dynamics. We derive quantitative error bounds for the estimators and analyze the reconstruction error of the latent process under suitable regularity assumptions on the driving noise. In particular, the interplay between the order of the fractional derivative and the Hölder regularity of the noise process naturally emerges in the stability analysis of the reconstruction procedure. Finally, simulation studies illustrate the applicability of the proposed methodology and highlight the influence of memory effects and noise regularity on the quality of statistical inference. The results support the relevance of fractional stochastic analysis for the modeling and inference of neuronal systems with memory and correlated inputs.
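The first step of the procedure uses the small-time behavior of Mittag-Leffler relaxation. Near the origin, \(1 - E_\alpha(-\lambda t^\alpha) \approx \lambda t^\alpha/\Gamma(\alpha+1)\), so a log-log slope recovers \(\alpha\). The sketch below shows this under several assumptions: a noise-free relaxation curve \(x(t) = E_\alpha(-\lambda t^\alpha)\), a two-point estimator, and hypothetical function names. It is an illustration of the idea, not the paper's estimator.

```python
import math


def mittag_leffler(alpha, z, terms=60):
    """One-parameter Mittag-Leffler function E_alpha(z) = sum_k z^k / Gamma(alpha*k + 1),
    via its truncated power series (only accurate for small |z|)."""
    return sum(z ** k / math.gamma(alpha * k + 1.0) for k in range(terms))


def estimate_from_two_points(t1, x1, t2, x2):
    """Hypothetical small-time estimator: near the origin
    1 - E_alpha(-lam * t^alpha) ~ lam * t^alpha / Gamma(alpha + 1),
    so the log-log slope of 1 - x(t) estimates alpha and the level estimates lam."""
    alpha_hat = (math.log(1.0 - x2) - math.log(1.0 - x1)) / (math.log(t2) - math.log(t1))
    lam_hat = math.gamma(alpha_hat + 1.0) * (1.0 - x1) / t1 ** alpha_hat
    return alpha_hat, lam_hat


# Noise-free synthetic relaxation curve x(t) = E_alpha(-lam * t^alpha).
alpha_true, lam_true = 0.7, 2.0
t1, t2 = 1e-6, 1e-5
x1 = mittag_leffler(alpha_true, -lam_true * t1 ** alpha_true)
x2 = mittag_leffler(alpha_true, -lam_true * t2 ** alpha_true)
alpha_hat, lam_hat = estimate_from_two_points(t1, x1, t2, x2)
print(alpha_hat, lam_hat)
```

With noisy observations, the choice of small times trades the approximation bias against noise amplification. Handling that tradeoff is where the paper's error bounds matter.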

2606.09121 2026-06-09 math.PR math.ST stat.TH New submission

Truncated Signature Information for Mixed Fractional Brownian Paths

Chunhao Cai

AI summary: Studies finite expected-signature information for mixed fractional Brownian motion paths with Hurst indices above 1/4. Proves a scale tradeoff together with separation and local inverse bounds.

Abstract

We study finite expected-signature information for mixed fractional Brownian motion (mixed-fBm) paths with Hurst indices above $1/4$. Up to level three, the only parameter-dependent expected features are the variance transform $q_θ$ and the time-ordered transform $R_θ$. We prove a scale tradeoff between $2K$ level-two scales and $K$ selected level-two/three scales, together with separation and local inverse bounds.
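Expected-signature features are expectations of iterated integrals of the path. As generic background, the sketch below computes levels one and two of the signature of a piecewise-linear path using Chen's identity. This is not the paper's estimator. Averaging `s2` over many simulated mixed-fBm paths would give a Monte Carlo estimate of a level-two expected signature. The shuffle identity fixes the symmetric part of level two through level one, so the new level-two information sits in the antisymmetric (Lévy-area) part. The function name is hypothetical.

```python
def signature_levels_1_2(points):
    """Level-1 and level-2 signature terms of the piecewise-linear path through `points`
    (a list of equal-length coordinate tuples), accumulated with Chen's identity:
    S2(X * Y) = S2(X) + S2(Y) + S1(X) (x) S1(Y), where S2 = inc (x) inc / 2 on a segment."""
    d = len(points[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(points, points[1:]):
        inc = [q[i] - p[i] for i in range(d)]
        # Use the level-1 terms accumulated *before* this segment in the cross term.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


s1, s2 = signature_levels_1_2([(0, 0), (1, 0), (1, 1)])
levy_area = 0.5 * (s2[0][1] - s2[1][0])  # antisymmetric part of level two
print(s1, s2, levy_area)
```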

2606.08895 2026-06-09 cs.IT math.IT math.ST stat.TH New submission

Optimal Regret Exponents for Bayesian Statistical Decision Problems

Hyun-Young Park, Si-Hyeon Lee

AI summary: Studies finite-state, finite-action Bayesian statistical decision problems. Characterizes the exact exponent of the optimal regret for arbitrary loss functions as the minimum multivariate Chernoff information over incompatible subsets of states. Obtains the first exact exponent for list hypothesis testing.

Comments
5 pages. This work has been submitted to the IEEE for possible publication

Abstract

We study finite-state finite-action Bayesian statistical decision problems. While exact error-exponent characterizations are known for several special cases, including hypothesis testing and hypothesis exclusion, the asymptotic behavior of the optimal Bayes regret is largely unknown for general decision problems. In this paper, we show that the optimal regret always decays exponentially fast and characterize its exact exponent for arbitrary loss functions. The exponent is given by the minimum multivariate Chernoff information over the minimal incompatible subsets of states, where an incompatible subset is a collection of states for which no single action is optimal for all states in the subset. Our result recovers the classical pairwise-minimum Chernoff exponent for symmetric multiple hypothesis testing and the multivariate Chernoff exponent for hypothesis exclusion, while also yielding, to the best of our knowledge, the first exact exponent characterization for list hypothesis testing.
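The exponent in the abstract is a (multivariate) Chernoff information. The sketch below computes its two-hypothesis form, assuming finite observation alphabets with overlapping support. It also computes the classical pairwise-minimum exponent, which the abstract says is recovered for symmetric multiple hypothesis testing. It does not enumerate incompatible subsets for general losses. The function names are hypothetical.

```python
import math
from itertools import combinations


def chernoff_information(p, q, iters=200):
    """Chernoff information C(P, Q) = -min_{0<=lam<=1} log sum_x p(x)^lam q(x)^(1-lam)
    for finite distributions with overlapping support. The objective is convex in lam
    (a log-sum-exp of affine functions), so ternary search finds its minimum."""
    def log_moment(lam):
        return math.log(sum(px ** lam * qx ** (1.0 - lam)
                            for px, qx in zip(p, q) if px > 0 and qx > 0))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if log_moment(m1) < log_moment(m2):
            hi = m2
        else:
            lo = m1
    return -log_moment(0.5 * (lo + hi))


def min_pairwise_chernoff(dists):
    """Error exponent of Bayesian testing among finitely many hypotheses:
    the smallest pairwise Chernoff information."""
    return min(chernoff_information(p, q) for p, q in combinations(dists, 2))


hyps = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
print(min_pairwise_chernoff(hyps))
```

The closest pair of hypotheses, not the farthest, sets the exponent. In this example the pair involving the uniform distribution is the bottleneck.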

2606.08730 2026-06-09 math.ST stat.TH New submission

Statistical Optimality of Prediction-Powered Inference

Se Yoon Lee, Jae Kwang Kim

AI summary: Studies the statistical optimality of prediction-powered inference (PPI) by framing it as an M-estimation problem. Proves that PPI attains the semiparametric efficiency lower bound when the predictor is score-calibrated. Develops asymptotic theory for cross-fitting and for variance correction.

Abstract

Prediction-powered inference (PPI), proposed by Angelopoulos et al. (2023), is a popular method that leverages a small number of labeled samples together with machine learning predictions for semi-supervised inference. While several variants of PPI have appeared in the literature, a rigorous statistical theory for it has not been fully developed. In this paper, we study the statistical optimality of PPI. Our contributions span both foundational theory and new methodology. First, we frame PPI as an M-estimation problem, revealing a link between the bias-corrected PPI estimating equation and the ideal full-data estimating equation. This connection leads to the consistency and asymptotic normality of the PPI estimator under simple random sampling without replacement. Next, we identify the efficient influence function and prove that PPI can attain the semiparametric efficiency lower bound when the predictor is score-calibrated, that is, when the predictor's output aligns with the true conditional expectation of the estimating function. Finally, for learned prediction rules, we develop asymptotic theory for cross-fitting and for a single-fit variant with variance correction in the special case of semiparametric mean estimation. Simulation experiments and a real-data application support these findings.
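For mean estimation, the PPI point estimate adds a labeled-data bias correction (the "rectifier") to the mean prediction on the unlabeled data. The sketch below assumes independent i.i.d. labeled and unlabeled samples. It omits the without-replacement sampling, cross-fitting, and variance corrections analyzed in the paper, and the function name is hypothetical.

```python
import math
from statistics import fmean, variance


def ppi_mean(y_labeled, f_labeled, f_unlabeled, z=1.96):
    """Prediction-powered estimate of E[Y]: the mean prediction on unlabeled data plus the
    mean labeled residual y - f(x), with a normal-approximation interval.
    Assumes the labeled and unlabeled samples are independent i.i.d. draws."""
    rect = [y - f for y, f in zip(y_labeled, f_labeled)]
    est = fmean(f_unlabeled) + fmean(rect)
    se = math.sqrt(variance(f_unlabeled) / len(f_unlabeled) + variance(rect) / len(rect))
    return est, (est - z * se, est + z * se)


est, ci = ppi_mean(y_labeled=[1.0, 2.0, 3.0], f_labeled=[1.5, 2.0, 2.5],
                   f_unlabeled=[2.0, 4.0])
print(est, ci)
```

When the predictions are accurate, the rectifier has small variance. The interval width is then driven mainly by the large unlabeled sample.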

2606.08668 2026-06-09 math.ST stat.TH New submission

Biweighted Poisson Subsampling for Convoluted Rank Regression with Massive Data

Jialiang Li, Xiaochao Xia, Wei Zhong

AI summary: For pairwise-loss problems on massive data, such as convoluted rank regression, proposes a biweighted Poisson subsampling (BIPS) framework that assigns weights to pairs of observations. Proves consistency and asymptotic normality of the resulting estimator and develops a distributed estimator.

Abstract

Optimal subsampling efficiently selects the most informative data points, enabling accurate statistical inference while significantly reducing the computational burden for massive datasets. However, existing methods cannot be applied directly to pairwise loss problems, in particular convoluted rank regression (CRR), because of the double-summation structure of the objective function. To this end, we first propose a new BIweighted Poisson Subsampling (BIPS) framework for such problems, which assigns a weight to each pair of observations in the objective function rather than to each single observation. Two concrete inverse probability weighting strategies are considered. Second, we focus on CRR models, under which the BIPS estimator (BIPS-CRR) is formulated. We establish consistency and asymptotic normality for BIPS-CRR, derive its optimal Poisson subsampling probabilities under the L-optimality criterion, and provide a practical algorithm to facilitate implementation. Third, we develop a distributed estimator for CRR that incorporates BIPS as a pilot subsampling strategy. This estimator is globally efficient and robust to both randomly and non-randomly distributed datasets in distributed computing environments. Extensive simulations and a real-world application demonstrate the excellent finite-sample performance of the proposed methodology. Additionally, BIPS can be readily extended to other U-statistic optimization problems and pairwise learning tasks.
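The sketch below shows why pairs need their own weights. Under independent Poisson subsampling, a pair \(i \ne j\) survives with probability \(\pi_i \pi_j\), so weighting retained pairs by \(1/(\pi_i\pi_j)\) makes the subsampled double sum unbiased. This is a generic Horvitz-Thompson weight. It is not claimed to be either of the paper's two weighting strategies or its L-optimal probabilities. The function `pair_loss` is a hypothetical stand-in for the CRR pairwise loss.

```python
import random


def poisson_subsample(probs, rng):
    """Poisson subsampling: keep unit i independently with probability probs[i]."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


def ipw_pairwise_sum(h, probs, rng):
    """Horvitz-Thompson estimate of the double sum over i != j of h(i, j): each retained
    pair gets weight 1 / (probs[i] * probs[j]), the inverse of its retention probability."""
    kept = poisson_subsample(probs, rng)
    total = 0.0
    for i in kept:
        for j in kept:
            if i != j:
                total += h(i, j) / (probs[i] * probs[j])
    return total


residuals = [0.0, 1.0, 3.0, -2.0]


def pair_loss(i, j):
    """Stand-in pairwise loss: absolute difference of residuals (CRR would smooth this)."""
    return abs(residuals[i] - residuals[j])


full = sum(pair_loss(i, j) for i in range(4) for j in range(4) if i != j)
rng = random.Random(0)
reps = 20_000
avg = sum(ipw_pairwise_sum(pair_loss, [0.5] * 4, rng) for _ in range(reps)) / reps
print(full, avg)
```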

2606.08114 2026-06-09 quant-ph stat.AP New submission

Robust applicability of continuous dynamical decoupling to decoherence reduction in longitudinal and transverse-noise settings: The role of anisotropy

S. Afonso, J. M. Gomez Llorente, J. Plata

AI summary: Analytically evaluates how efficiently continuous dynamical decoupling (CDD) suppresses decoherence in generic qubit setups with multiple noise sources, including transverse and anisotropic noise inputs. Analyzes how the control-field parameters shape the effective noise spectrum, and shows that an appropriate choice of control parameters makes CDD robust.

Journal ref
Phys. Rev. A 113, 062412 (2026)

Abstract

We analytically evaluate the efficiency of continuous dynamical decoupling (CDD) in curbing decoherence in generic qubit setups where diverse sources of noise can be present. Previous theoretical approaches to CDD have mainly focused on its potential to cope with longitudinal fluctuations. Here, the basic scenario tackled with CDD is generalized. Apart from dealing with pure dephasing induced by diagonal noise, we consider the impact of transverse fluctuations, which are usually present in practical setups. In particular, the implications of anisotropic noisy inputs are studied. Additionally, we analyze the role of the fluctuations in the dressing of the qubit by the CDD control field: since the driving field is usually switched on through linear ramps of its characteristic parameters, the associated dressing of the original states can be described in terms of noisy Landau-Zener transitions. In our approach, based on a sequence of unitary transformations, the noise entering the system is cast into effective stochastic terms whose spectral characteristics depend on the driving parameters. This description allows the design of strategies that mitigate the impact of the fluctuations through controlled changes in the effective-noise properties. An appropriate choice of the control parameters makes CDD significantly robust against this generalization of the basic scenario.

2606.07931 2026-06-09 math.PR cond-mat.stat-mech cs.IT cs.LG math.IT math.ST stat.TH New submission

Pointwise Complexity for Gaussian Fields: Upper Envelopes, Algorithmic Lower Bounds, and Separation

Yunbei Xu

AI summary: Proves a variance-aware pointwise majorizing-measure theorem that gives high-probability upper envelopes for Gaussian processes. Through a Bayesian algorithmic lower bound and a weighted-basis example, exhibits a separation between pointwise complexity and global minimax risk.

Abstract

We prove a variance-aware pointwise majorizing-measure theorem for centered Gaussian processes. Classical generic chaining characterizes the scalar quantity $\mathbb E\sup_{x\in T}X_x$; the theorem here gives a simultaneous high-probability envelope for the entire field. For an ambient prior $μ$, the envelope at $x$ is governed by a pointwise Fernique-Talagrand functional \[Φ_μ(x):=\int_0^{4σ(x)}\sqrt{\log\frac{1}{μ(B_d(x,\varepsilon))}}\,d\varepsilon,\] together with the corresponding Gaussian tail term. The theorem provides a reusable field-level refinement of classical generic chaining and a Gaussian-process counterpart of pointwise empirical-process bounds for deep neural networks. We also record a Bayesian algorithmic lower envelope from the interactive Fano/data-processing principle. For a known prior $π$, an observation channel, and a concrete estimator $\widehat t(Y)$, the lower bound is expressed through the exact ghost small-ball mass $\mathbb E_{Y\sim Q}π(B_d(\widehat t(Y),Δ))$, rather than a worst-case covering number. In Gaussian location experiments, comparison decoders convert Bayes location error into lower bounds on decision-aligned Gaussian ranges. We then construct an elementary weighted-basis example separating the usual Fano relaxation for a fixed prior, the Bayesian algorithmic lower envelope, the pointwise Gaussian envelope on the selected subatlas, and the full-class minimax risk/global Gaussian scale. Together, these results show that algorithmic lower bounds provide local-geometric certificates of pointwise complexity for fixed estimators in overparameterized ambient classes, precisely in regimes where classical minimax theory becomes either too coarse or oracle-dependent.
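The functional \(Φ_μ(x)\) is a one-dimensional integral once the ball masses \(μ(B_d(x,\varepsilon))\) are available. The sketch below evaluates only this deterministic functional, not the probabilistic envelope. It assumes a finite index set with a discrete prior, closed balls, and a value of \(σ(x)\) supplied by hand rather than computed from a covariance. The function name is hypothetical.

```python
import math


def pointwise_ft_functional(x, points, weights, sigma_x, dist, n=4000):
    """Midpoint-rule approximation of the pointwise Fernique-Talagrand functional
    Phi_mu(x) = int_0^{4 sigma(x)} sqrt(log(1 / mu(B_d(x, eps)))) d eps
    for a discrete prior mu = sum_k weights[k] * delta(points[k]) and closed balls.
    Assumes x carries positive prior mass, so every ball around x has mu > 0."""
    h = 4.0 * sigma_x / n
    total = 0.0
    for k in range(n):
        eps = (k + 0.5) * h
        mass = sum(w for p, w in zip(points, weights) if dist(x, p) <= eps)
        total += math.sqrt(max(0.0, -math.log(mass)))  # clamp rounding when mass ~ 1
    return total * h


def line_dist(a, b):
    return abs(a - b)


# Two points at distance 1 with equal prior mass and sigma(x) = 1: the ball around x = 0
# has mass 1/2 for eps < 1 and mass 1 afterwards, so Phi = sqrt(log 2).
phi = pointwise_ft_functional(0.0, [0.0, 1.0], [0.5, 0.5], sigma_x=1.0, dist=line_dist)
print(phi, math.sqrt(math.log(2.0)))
```

Points that the prior treats as isolated (small ball mass at small radii) receive a larger functional. This is the sense in which the envelope is pointwise rather than a single supremum bound.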

2606.07901 2026-06-09 math.ST math.DS math.PR stat.TH New submission

Ergodic Theory in Classical and Bayesian Inference

Artur O. Lopes

AI summary: An expository text on the mathematical foundations of classical deductive inference and of the Bayesian inference framework. Surveys results at the interface of statistics and ergodic theory for predicting and analyzing phenomena from random data, using Hölder equilibrium measures to go beyond i.i.d. processes.

Abstract

We begin by presenting the mathematical rationale underlying classical deductive inference. We then introduce the foundational ideas of the Bayesian inference framework. Results lying at the interface of Statistics and Ergodic Theory are outlined, providing a theoretical framework applicable to the prediction and analysis of real-world phenomena from random data. This text is expository in nature: no new results are presented, and recently published results are instead described in a didactic manner. Throughout, we work with Hölder equilibrium measures, which encompass a substantially more general class of processes than i.i.d. ones.

2606.07847 2026-06-09 math.ST stat.TH New submission

Revisiting the Behrens-Fisher Problem: Validity-First Optimality

Xiao Wang, Chuanhai Liu

AI summary: Revisits the Behrens-Fisher problem through the inferential model framework and recasts the solution as a cylindrical two-dimensional predictive random set. Proves that, among prior-free procedures with exact finite-sample validity, the resulting interval is the shortest.

Abstract

The Behrens-Fisher problem concerns inference on the difference of two normal means when both variances are unknown and unequal. It is a classical example in which nuisance parameters prevent ordinary exact fixed-sample inference, and it has long served as a benchmark for the foundations of inference. We revisit it through the inferential model (IM) framework of Martin and Liu. After conditioning and regular marginalization, the exact association is two-dimensional, with one coordinate for the standardized mean contrast and one for the variance ratio. Their one-dimensional generalized marginal IM is then best understood as a cylindrical two-dimensional predictive random set: sharp in its mean-contrast projection, by Hsu's stochastic domination, and vacuous in the variance ratio. Our main result is a precise validity-first optimality: among prior-free procedures that retain exact, uniform, finite-sample validity, the IM interval is the shortest. We prove minimaxity and admissibility in the cylindrical class and, by a projection argument, extend this to rectangular and general two-dimensional predictive random sets. A companion tradeoff principle shows that any adaptive procedure can only redistribute interval width across variance-ratio regimes, never shorten it uniformly. A Monte Carlo study bears this out: the Welch and bootstrap intervals under-cover, whereas the conservative fiducial interval does not dominate the IM interval, being shorter only where the latter over-covers and longer where validity binds.
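The Welch procedure, which the Monte Carlo study finds to under-cover, is built from the statistic and approximate degrees of freedom computed below. This is a standard construction, not the paper's IM interval. A confidence interval would pair these with a Student-t quantile (for example from SciPy), which this standard-library sketch omits. The function name is hypothetical.

```python
import math
from statistics import fmean, variance


def welch(x, y):
    """Welch t statistic for mean(x) - mean(y) and its Welch-Satterthwaite degrees of freedom."""
    a = variance(x) / len(x)  # estimated variance of mean(x)
    b = variance(y) / len(y)  # estimated variance of mean(y)
    t = (fmean(x) - fmean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df


t, df = welch([1, 2, 3, 4], [2, 4, 6, 8])
print(t, df)
```

The degrees of freedom depend on the estimated variance ratio. The abstract's point is that such approximations do not deliver exact finite-sample validity.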

2606.03360 2026-06-09 math.ST stat.TH Updated version

Structured drift design for denoising diffusion models

Mahsa Taheri

AI summary: Proposes the geometry-aware Ornstein-Uhlenbeck process, which embeds data geometry through a variance-aware anisotropic drift. The process improves mode separation and correlation preservation of diffusion models on multimodal, highly correlated distributions.

Abstract

Diffusion-based generative models have achieved remarkable success in high-dimensional data generation; however, they fundamentally rely on isotropic diffusion processes that destroy meaningful geometric structures in the forward process. For complex, multimodal, and highly correlated distributions such as biologically constrained genetic data, isotropic noise merges distinct modes and distorts intrinsic dependencies. This forces the reverse process to recover structure from heavily degraded signals, leading to slow convergence, mode averaging, and biologically implausible samples. To address this, we introduce the Geometry-aware Ornstein-Uhlenbeck (GOU) process, a structured drift design that embeds data geometry into forward and backward dynamics. By employing a variance-aware anisotropic drift, GOU contracts low-variance directions rapidly while preserving high-variance directions longer, maintaining key multimodal structures as stable channels over time. Crucially, we show that GOU's backward initialization error is governed by local rather than global variance. This geometry-adaptive initialization improves convergence rates by reducing initial mismatch and preserving cluster-level structures. Synthetic and real-world genetic experiments demonstrate that GOU significantly improves mode separation, correlation preservation, and statistical validity over standard isotropic models.

2510.01015 2026-06-09 math.ST stat.TH Updated version

Quantifying the noise sensitivity of the Wasserstein metric for images

Erik Lager, Gilles Mordant, Amit Moscovich

AI summary: Studies the sensitivity of Wasserstein metrics to pixel-wise additive noise and derives finite-sample expectation bounds under a Gaussian noise model. Proves that the error in the signed 2-Wasserstein discrepancy scales with the square root of the noise standard deviation, compared with linear scaling for the Euclidean metric.

Abstract

Wasserstein metrics are increasingly adopted as similarity scores for images. We consider the sensitivity of Wasserstein metrics with respect to pixel-wise additive noise when the images are treated as discrete measures on the pixel grid. We derive finite-sample expectation bounds for a Gaussian noise model. Among other results, we prove that the error in the signed 2-Wasserstein discrepancy scales with the square root of the noise standard deviation. This is favorable compared to the Euclidean metric that scales linearly, and thus provides a theoretical basis for the benefits of optimal transport distances in noisy settings. We present experiments that support our theoretical findings and point to a peculiar phenomenon where increasing the level of noise can decrease the Wasserstein distance. A case study on cryo-electron microscopy images demonstrates that the Wasserstein metric can capture the geometry of the data manifold in high noise settings even when the Euclidean metric fails.
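The geometric advantage of transport distances is already visible in one dimension. The sketch below treats signals as measures on a 1-D grid and compares W1 with the Euclidean distance. It is simpler than the paper's signed 2-Wasserstein discrepancy on a 2-D pixel grid, and it assumes nonnegative intensities, whereas noisy images can have negative pixels. The function names are hypothetical.

```python
import math


def wasserstein1_1d(a, b, spacing=1.0):
    """W1 distance between two nonnegative 1-D histograms on the same grid, each
    normalized to unit mass: the L1 distance between their cumulative sums."""
    sa, sb = sum(a), sum(b)
    cum, total = 0.0, 0.0
    for ai, bi in zip(a, b):
        cum += ai / sa - bi / sb
        total += abs(cum)
    return total * spacing


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def spike(n, k):
    """Length-n signal with a single unit spike at index k."""
    return [1.0 if i == k else 0.0 for i in range(n)]


for shift in (1, 3, 6):
    a, b = spike(10, 2), spike(10, 2 + shift)
    print(shift, wasserstein1_1d(a, b), euclidean(a, b))
```

W1 grows with the displacement of the spike, while the Euclidean distance is the same for every nonzero shift. This is the geometric information that the cryo-EM case study relies on.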

2602.05483 2026-06-09 eess.SY cs.CY cs.SY stat.AP Updated version

Toward Operationalizing Rasmussen: Drift Observability on the Simplex for Evolving Systems

Anatoly A. Krasnovsky

AI summary: Addresses drift monitoring in software operations with an automated, artifact-derived drift-monitor design that maps software artifacts into a stable compositional monitoring state. The monitor analyzes boundary-directed drift in log-ratio coordinates and reports drift direction, step-to-boundary, balance-level attribution, and model-health indicators.

Abstract

Software operations increasingly rely on SLOs, traces, deployment specifications, and change events, yet dashboards and thresholding practices often expose share-like operational signals as separate scalar panels or baseline distances. This can create false alarms under benign redistribution and miss movement toward policy boundaries. Rasmussen's dynamic safety model motivates drift under competing pressures, but operationalizing it for software is difficult because relevant state variables (remaining margin, engineering effort, and risk/impact) are often compositional and their parts evolve. We formulate an automated, artifact-derived drift-monitor design that maps changing software artifacts into a stable compositional monitoring state: it extracts a current part inventory and policy constraints, maps telemetry to a positive composition, stabilizes splits, merges, and renames through lineage-aware canonical groups, and analyzes boundary-directed drift in log-ratio coordinates. The proposed monitor would report drift direction, step-to-boundary, balance-level attribution, and model-health indicators under architectural churn. We specify the approach, identify its zero/noise/lineage assumptions, and report a reproducible synthetic sanity check of boundary-aware drift and controlled part churn.
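Log-ratio coordinates make share-like signals comparable across rescalings. The sketch below uses centered log-ratio (clr) coordinates and their Euclidean (Aitchison) norm as a hypothetical drift summary. It is not the paper's monitor. It assumes strictly positive parts, and zero handling is among the assumptions the abstract flags. The component shares are invented for illustration.

```python
import math


def clr(parts):
    """Centered log-ratio coordinates of a positive composition: log(x_i / geometric mean)."""
    logs = [math.log(p) for p in parts]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def clr_drift(baseline, current):
    """Hypothetical drift summary: difference of clr coordinates and its Aitchison norm."""
    d = [c - b for b, c in zip(clr(baseline), clr(current))]
    return d, math.sqrt(sum(v * v for v in d))


# Shares of, e.g., an error budget across three hypothetical components.
baseline = [0.5, 0.3, 0.2]
rescaled = [5.0, 3.0, 2.0]   # same composition in different units: no drift
shifted = [0.3, 0.3, 0.4]    # mass moving toward the third part
print(clr_drift(baseline, rescaled)[1], clr_drift(baseline, shifted))
```

Because clr coordinates ignore overall scale, a change of units registers no drift. A genuine redistribution yields a direction vector whose largest entries point at the parts gaining or losing share.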

2505.02197 2026-06-09 math.ST stat.TH Updated version

Uniform central limit theorems for non-stationary processes via relative weak convergence

Nicolai Palm, Thomas Nagler

AI summary: Addresses the failure of classical central limit theorems for non-stationary data by introducing relative weak convergence. Establishes relative CLTs and their variants for random vectors and empirical processes, providing simple replacements for use in nonparametric trend estimation and hypothesis testing.

Abstract

Statistical inference for non-stationary data is hindered by the failure of classical central limit theorems (CLTs), not least because there is no fixed Gaussian limit to converge to. To resolve this, we introduce relative weak convergence, an extension of weak convergence that compares a statistic or process to a sequence of evolving processes. Relative weak convergence retains the essential consequences of classical weak convergence and coincides with it under stationarity. Crucially, it applies in general non-stationary settings where classical weak convergence fails. We establish concrete relative CLTs for random vectors and empirical processes, along with sequential, weighted, and bootstrap variants that parallel the state-of-the-art in stationary settings. Our framework and results offer simple, plug-in replacements for classical CLTs whenever stationarity is untenable, as illustrated by applications in nonparametric trend estimation and hypothesis testing.

2505.08908 2026-06-09 math.ST cs.LG econ.TH stat.TH Updated version

Statistical Decision Theory with Counterfactual Loss

Benedikt Koch, Kosuke Imai

AI summary: Extends classical statistical decision theory, which ignores counterfactuals, by showing that under strong ignorability counterfactual risk is identifiable if and only if the loss is additive in the potential outcomes. Shows that additive counterfactual losses capture decision difficulty, and gives a symbolic linear inverse program that checks identifiability without data.

Abstract

Many researchers apply classical statistical decision theory to evaluate treatment choices and learn optimal policies. However, because this framework relies solely on realized outcomes under chosen actions and ignores counterfactuals, it cannot assess the quality of a decision relative to feasible alternatives at the unit level, which is an important requirement in some settings. For example, in pretrial bail decisions, a judge must balance crime prevention upon release against the risk of imposing unnecessary burdens on arrestees. A central challenge in this framework is identification: since only one potential outcome is observed per unit, counterfactual risk is typically not identifiable. We show that, under strong ignorability, counterfactual risk is identifiable if and only if the loss is additive in the potential outcomes. We further demonstrate that additive counterfactual losses can yield treatment recommendations that differ from those based on standard losses when more than two treatment options are available. We show that additive counterfactual losses capture not only decision accuracy but also decision difficulty, whereas standard losses reflect accuracy alone. Finally, we introduce a symbolic linear inverse program that determines whether a given counterfactual loss yields an identifiable risk, without requiring data.
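The following toy check illustrates the additivity condition, using hypothetical losses and distributions. Under strong ignorability the data identify only the marginal laws of Y(0) and Y(1). An identifiable risk must therefore agree across every coupling with those marginals. The sketch compares two couplings with identical marginals under an additive loss and a non-additive "harm" loss.

```python
def risk(loss, action, joint):
    """Expected loss of `action` under a joint pmf {(y0, y1): prob} of potential outcomes."""
    return sum(prob * loss(action, y0, y1) for (y0, y1), prob in joint.items())


def additive_loss(a, y0, y1):
    """Hypothetical additive counterfactual loss: 1 if the chosen action fails, plus 0.5 if
    the forgone action would have succeeded; a sum of a term in y0 and a term in y1."""
    realized, forgone = (y1, y0) if a == 1 else (y0, y1)
    return (1 - realized) + 0.5 * forgone


def harm_loss(a, y0, y1):
    """Hypothetical non-additive loss: 1 only when the chosen action fails while the other
    would have succeeded, so it depends on (y0, y1) jointly."""
    realized, forgone = (y1, y0) if a == 1 else (y0, y1)
    return 1.0 if (realized == 0 and forgone == 1) else 0.0


# Same marginals P(Y(0)=1) = P(Y(1)=1) = 0.5, different couplings.
concordant = {(0, 0): 0.5, (1, 1): 0.5}
discordant = {(0, 1): 0.5, (1, 0): 0.5}

for loss in (additive_loss, harm_loss):
    print(loss.__name__,
          [risk(loss, a, concordant) for a in (0, 1)],
          [risk(loss, a, discordant) for a in (0, 1)])
```

The additive loss gives the same risk under both couplings. The harm loss does not, so its risk cannot be recovered from observed data alone.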

2503.18754 2026-06-09 q-bio.NC cond-mat.dis-nn stat.ML Updated version

Dynamics of learning to integrate in linear recurrent neural networks

Blake Bordelon, Jordan Cotler, Cengiz Pehlevan, Jacob A. Zavatone-Veth

AI summary: Theoretically analyzes the dynamics of linear RNNs learning to integrate white noise, revealing how slow modes are acquired by gradient descent, and extends the analysis to damped oscillatory filters.

Comments
17+9 pages, 7+1 figures

Abstract

Learning recurrent connectivity that supports memory over long intrinsic timescales is a basic problem in the theory of dynamical computation. While continuous attractor and integrator models describe how tuned recurrent circuits can maintain information, less is known about how such slow modes are acquired by gradient-based learning. Here we study this question in an analytically tractable setting: we build a mathematical theory of the learning dynamics of linear RNNs trained to integrate white noise. We show that when the initial recurrent weights are small, the dynamics of learning are described by a low-dimensional system that tracks a single outlier eigenvalue of the recurrent weights. This reveals the precise manner in which the long timescale associated with white noise integration is learned. We extend our analyses to RNNs learning a damped oscillatory filter, and find low-dimensional effective dynamical equations for the evolution of a conjugate pair of outlier eigenvalues. Taken together, our analyses build a rich mathematical framework for studying dynamical learning problems relevant to both machine learning and neuroscience.
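The sketch below is a one-dimensional caricature, not the paper's model. It uses a single recurrent weight `w` in place of a matrix, a fixed integration window, and plain gradient descent on the exact expected loss. Starting from a small `w`, progress is slow at first and then accelerates as `w` approaches the integrator value 1. This loosely mirrors the single outlier eigenvalue that the abstract says governs learning, but the matrix-valued dynamics are not modeled. All names and hyperparameters are assumptions.

```python
def integration_loss(w, horizon):
    """Expected squared error when a scalar linear RNN h_t = w * h_{t-1} + x_t (h_0 = 0)
    should output the running sum of `horizon` unit-variance white-noise inputs: input
    x_{T-k} enters with weight w^k instead of 1, so the loss is sum_k (w^k - 1)^2."""
    return sum((w ** k - 1.0) ** 2 for k in range(horizon))


def integration_grad(w, horizon):
    """Derivative of integration_loss with respect to w."""
    return sum(2.0 * (w ** k - 1.0) * k * w ** (k - 1) for k in range(1, horizon))


def train(w0=0.1, lr=2e-4, horizon=20, steps=10_000, log_every=1_000):
    """Plain gradient descent on the exact expected loss, starting from a small weight."""
    w = w0
    for step in range(steps):
        if step % log_every == 0:
            print(step, round(w, 4), round(integration_loss(w, horizon), 4))
        w -= lr * integration_grad(w, horizon)
    return w


w_final = train()
```

The step size is chosen so that `lr` times the curvature at `w = 1` stays below 1. This keeps the approach to the integrator monotone in this toy setting.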

2110.06250 2026-06-09 math.ST stat.TH Updated version

On the Minimum Attainable Risk in Permutation Invariant Problems

Asaf Weinstein

AI summary: Introduces a class of permutation invariant problems by extending decision-theoretic definitions to selective inference tasks. Proves that the risk minimizer is the Bayes rule under a uniform prior over permutations, which yields a lower bound on the risk. For estimating the parameter of a selected population, this bound is asymptotically equivalent to a computable bound.

Comments
1 figure

Abstract

We introduce a broad class of permutation invariant problems by extending the standard decision theoretic definition to allow also selective inference tasks, where the target is specified only after seeing the data. For any such problem, the minimizer of the risk at $\boldsymbol{θ}$ among all permutation invariant (equivariant) procedures is shown to be the Bayes rule that posits a uniform prior over all permutations of $\boldsymbol{θ}$. This gives an explicit form of the greatest lower bound on the risk of any sensible procedure in a wide range of problems. From a practical perspective, approximations to the exact bound are required because of its computational cost. In a specific example of estimating the parameter of a selected population, we prove that our bound coincides asymptotically with the computationally tractable bound attained by the Bayes rule which replaces the uniform prior on all permutations of $\boldsymbol{θ}$ by the i.i.d. prior with the same marginals. This generalizes results previously known only for the very special case of compound decision problems. The possibility of asymptotically attaining the latter bound by an empirical Bayes rule is discussed.
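The tractable bound in the abstract is attained by the Bayes rule whose prior is i.i.d. with the same marginals as the uniform prior over permutations of \(\boldsymbol{θ}\). That marginal is the empirical distribution of the entries of \(\boldsymbol{θ}\). The sketch below assumes unit-variance Gaussian observations, a model the abstract does not fix. Under that assumption the rule acts coordinatewise as a posterior mean. It is an oracle because it uses the unknown \(\boldsymbol{θ}\), whereas an empirical Bayes rule would estimate that prior from the data. Applying it at the largest observation is a simplified stand-in for estimating the parameter of a selected population. The names are hypothetical.

```python
import math


def separable_bayes(x, thetas):
    """Posterior mean of theta given x ~ N(theta, 1) when the prior puts equal mass on each
    entry of `thetas` (the i.i.d. prior whose marginal is the empirical law of theta)."""
    logw = [-0.5 * (x - t) ** 2 for t in thetas]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]  # stabilized Gaussian likelihood weights
    return sum(wi * t for wi, t in zip(w, thetas)) / sum(w)


thetas = [0.0, 0.0, 5.0, 5.0]
xs = [0.4, -1.1, 5.9, 3.2]                           # hypothetical observations
selected = max(range(len(xs)), key=xs.__getitem__)  # population with the largest x
print(selected, separable_bayes(xs[selected], thetas))
```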

2401.01599 2026-06-09 cs.LG math.ST stat.TH Updated version

Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay

Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin

AI summary: Under mild assumptions, fully characterizes the generalization error curves of analytic spectral algorithms, such as kernel gradient descent, in kernel regression. Clarifies the inconsistency of kernel interpolation and the saturation effects of algorithms with higher qualification, and, through neural tangent kernel theory, improves understanding of how wide neural networks generalize.

Abstract

The generalization error curve of a kernel regression method aims to determine the exact order of the generalization error under various source conditions, noise levels, and choices of the regularization parameter, rather than only the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, among other results. Through neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of trained wide neural networks. A novel technical contribution, the analytic functional argument, may be of independent interest.
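Spectral algorithms act on each kernel eigenvalue \(\lambda\) through a filter function. As a standard illustration, the sketch below gives the filters for kernel gradient descent and kernel ridge regression. These are not taken from the paper. It also checks that \(t\) gradient steps on one eigendirection reproduce the closed-form filter \(\varphi_t(\lambda) = (1 - (1-\eta\lambda)^t)/\lambda\). The function names are hypothetical.

```python
def gd_filter(lam, eta, t):
    """Spectral filter of t gradient-descent steps with step size eta on eigenvalue lam:
    phi_t(lam) = (1 - (1 - eta * lam)^t) / lam."""
    return (1.0 - (1.0 - eta * lam) ** t) / lam


def ridge_filter(lam, reg):
    """Kernel ridge regression filter 1 / (lam + reg), shown for comparison."""
    return 1.0 / (lam + reg)


def gd_coefficient(lam, y, eta, t):
    """Gradient descent from 0 on the one-dimensional objective 0.5*lam*b^2 - y*b,
    i.e. a single eigendirection of the kernel normal equations."""
    b = 0.0
    for _ in range(t):
        b -= eta * (lam * b - y)
    return b


# lam * phi(lam) is the fraction of each eigendirection that gets fitted (1 = fully fitted).
for lam in (1.0, 0.1, 0.01):
    print(lam, lam * gd_filter(lam, eta=0.5, t=100), lam * ridge_filter(lam, reg=0.01))
```

Both filters fit large-eigenvalue directions almost completely and shrink small ones. How the residual \(1 - \lambda\varphi(\lambda)\) decays is what the notion of qualification, and hence the saturation effect, refers to.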

2312.04903 2026-06-09 math.ST stat.TH Updated version

Differential privacy statistical inference for a directed graph network model with covariates

Jing Luo, Hong Qin, Zhimeng Xu

AI summary: For the directed β-model with covariates, proposes a joint Laplace mechanism that achieves differential privacy. Estimates the node-heterogeneity and covariate-homogeneity parameters by the method of moments and proves consistency and asymptotic normality of the estimators.

Abstract

Network data typically contain sensitive relational information, and releasing or sharing them directly may lead to non-negligible privacy violations without proper statistical safeguards. While differential privacy has emerged as a powerful framework for privacy-preserving network data analysis, theoretical understanding remains limited, particularly for models incorporating both network structure and nodal attributes. This paper bridges this gap by investigating a directed $\beta$-model with covariates under differential privacy constraints. Our model accounts for both node-level heterogeneity (via $2n$-dimensional degree parameters $\theta$) and covariate-driven homogeneity (via a $p$-dimensional parameter $\gamma$). To protect privacy, we introduce a joint Laplace mechanism for releasing network statistics while satisfying differential privacy constraints. Leveraging moment-based estimation techniques, we estimate the parameters of both degree heterogeneity and homogeneity. We also derive the consistency and asymptotic normality of the differentially private estimators as the network size tends to infinity. Numerical simulations and real-world case studies support these theoretical results.
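For intuition about the Laplace mechanism, here is a minimal sketch that releases only the bi-degree sequence (out-degrees followed by in-degrees) under edge-level $\epsilon$-differential privacy. This is a deliberate simplification: the paper's joint mechanism also covers covariate-related statistics, and the function names here are hypothetical. Adding or removing one directed edge changes one out-degree and one in-degree by 1, so the $\ell_1$ sensitivity is 2 and the noise scale is $2/\epsilon$.

```python
import numpy as np

def bidegree_sequence(adj):
    # Out-degrees are row sums, in-degrees are column sums of the adjacency matrix
    adj = np.asarray(adj)
    return adj.sum(axis=1), adj.sum(axis=0)

def laplace_release(adj, epsilon, rng=None):
    """Release (out-degrees, in-degrees) with Laplace noise (edge-level epsilon-DP sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    out_deg, in_deg = bidegree_sequence(adj)
    stats = np.concatenate([out_deg, in_deg]).astype(float)
    sensitivity = 2.0  # one directed edge touches one out-degree and one in-degree
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=stats.shape)
    return stats + noise
```

Smaller `epsilon` means stronger privacy and proportionally larger noise. The moment-based estimators in the paper are then computed from such noisy statistics rather than from the raw graph.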

2003.01772 2026-06-09 math.ST stat.TH Version update

Global Sensitivity Analysis: a new generation of mighty estimators based on rank statistics

Global Sensitivity Analysis: A New Generation of Powerful Estimators Based on Rank Statistics

Fabrice Gamboa, Pierre Gremaud, Thierry Klein, Agnès Lagnoux

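AI summary: Proposes a framework based on rank statistics and Chatterjee's correlation coefficient that estimates many global sensitivity indices in a unified way. The resulting estimators are consistent and numerically efficient, especially for small samples.

Details
Comments
This paper has been superseded by another paper on arXiv. Ref arXiv:2605.23760
AI-translated abstract

We propose a new statistical estimation framework for a large class of global sensitivity analysis methods. Our approach is based on rank statistics and uses the empirical correlation coefficient recently introduced by Sourav Chatterjee. We show how to apply the method to compute several kinds of indices:

- the Cramér-von-Mises indices, which are directly related to Chatterjee's notion of correlation;
- Sobol indices of any order;
- higher-order moment indices;
- Shapley effects.

We prove the consistency of the resulting estimators and demonstrate their numerical efficiency, especially for small samples.

English abstract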

We propose a new statistical estimation framework for a large family of global sensitivity analysis methods. Our approach is based on rank statistics and uses an empirical correlation coefficient recently introduced by Sourav Chatterjee. We show how to apply this approach to compute not only the Cramér-von-Mises indices, which are directly related to Chatterjee's notion of correlation, but also Sobol indices at any order, higher-order moment indices, and Shapley effects. We establish consistency of the resulting estimators and demonstrate their numerical efficiency, especially for small sample sizes.
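Chatterjee's coefficient itself is short to compute. The sketch below assumes no ties and follows three steps:

1. Sort the pairs by $x$.
2. Let $r_i$ be the rank of the $i$-th rearranged $y$ value.
3. Return $\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$.

In a sensitivity-analysis setting, $x$ would be one model input and $y$ the model output. The function name is ours, not the paper's.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(X, Y), assuming no ties in x or y."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    y_sorted = y[np.argsort(x)]  # rearrange the pairs so that x is increasing
    # r_i = #{j : y_j <= y_(i)}; without ties this is the rank 1..n
    ranks = np.argsort(np.argsort(y_sorted)) + 1
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)
```

For a strictly monotone relation with $n = 10$, the ranks move by 1 at each step, so $\xi_n = 1 - 3 \cdot 9/99 = 8/11$. The value tends to 1 as $n$ grows, whatever the direction of the dependence.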

2510.27011 2026-06-09 stat.ME math.OC

Refined thresholds for inconsistency: The effect of the graph associated with incomplete pairwise comparisons

Refined Inconsistency Thresholds: The Effect of the Graph Associated with Incomplete Pairwise Comparisons

Kolos Csaba Ágoston, László Csató

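AI summary: For incomplete pairwise comparison matrices, this paper finds that inconsistency thresholds depend on three things: the matrix size, the number of missing entries, and the undirected graph representing the known comparisons. It also shows a strong association between the thresholds and the spectral radius of this graph, which can be used to monitor inconsistency in real time.

Details
Journal ref
Expert Systems with Applications, 328: 132938, 2026
Comments
25 pages, 6 figures, 6 tables
AI-translated abstract

Without acceptability thresholds, the inconsistency of pairwise comparisons remains difficult to interpret. The popular 10% cut-off rule proposed by Saaty has recently been applied to incomplete pairwise comparison matrices, which contain some unknown comparisons. This paper refines these inconsistency thresholds: we find that they depend not only on the size of the matrix and the number of missing entries, but also on the undirected graph whose edges represent the known pairwise comparisons. Therefore, using our exact thresholds is particularly important if, as recommended in the literature, the filling-in pattern is the same for a large number of matrices. A strong association between the new threshold values and the spectral radius of the representing graph is also shown. Our results can be integrated into software to continuously monitor inconsistency while pairwise comparisons are being collected and to detect potential errors immediately.

English abstract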

The inconsistency of pairwise comparisons remains difficult to interpret in the absence of acceptability thresholds. The popular 10% cut-off rule proposed by Saaty has recently been applied to incomplete pairwise comparison matrices, which contain some unknown comparisons. This paper refines these inconsistency thresholds: we show that they depend not only on the size of the matrix and the number of missing entries, but also on the undirected graph whose edges represent the known pairwise comparisons. Therefore, using our exact thresholds is especially important if the filling-in patterns coincide for a large number of matrices, as has been recommended in the literature. The strong association between the new threshold values and the spectral radius of the representing graph is also demonstrated. Our results can be integrated into software to continuously monitor inconsistency during the collection of pairwise comparisons and to detect potential errors immediately.
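This sketch makes the quantities concrete. It computes two things:

- Saaty's consistency index $\mathrm{CI} = (\lambda_{\max} - n)/(n-1)$ for a complete pairwise comparison matrix;
- the spectral radius of the undirected graph of known comparisons.

The random index needed for $\mathrm{CR} = \mathrm{CI}/\mathrm{RI}$ is left as an input. The incomplete case, which requires handling the missing entries before computing $\lambda_{\max}$, is not shown. The function names are hypothetical.

```python
import numpy as np

def saaty_ci(A):
    """Saaty's consistency index CI = (lambda_max - n) / (n - 1) for a complete PCM."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # For a positive matrix the Perron root has the largest real part
    lam_max = np.max(np.linalg.eigvals(A).real)
    return (lam_max - n) / (n - 1)

def consistency_ratio(A, random_index):
    # CR = CI / RI; the random index for size n must be supplied by the caller
    return saaty_ci(A) / random_index

def graph_spectral_radius(n, known_pairs):
    """Spectral radius of the undirected graph whose edges are the known comparisons."""
    G = np.zeros((n, n))
    for i, j in known_pairs:
        G[i, j] = G[j, i] = 1.0
    return np.max(np.abs(np.linalg.eigvalsh(G)))
```

A perfectly consistent matrix $a_{ij} = w_i/w_j$ has $\lambda_{\max} = n$ and hence $\mathrm{CI} = 0$. For graphs, a path on three objects has spectral radius $\sqrt{2}$, while the complete graph on three objects has spectral radius 2.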

2109.13785 2026-06-09 physics.soc-ph stat.AP

Reducing the non-uniformity of the group draw in sports tournaments

Reducing the Non-uniformity of the Group Draw in Sports Tournaments

László Csató

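AI summary: This paper addresses the non-uniform distribution caused by the Skip mechanism in sports tournament group draws. Using mathematical analysis and case studies, it proposes considering the pots in decreasing order of the number of teams from the set S to minimize the distortion.

Details
Journal ref
Applied Soft Computing, 201(A): 115535, 2026
Comments
30 pages, 4 figures, 9 tables
AI-translated abstract

The group draw of a sports tournament requires assigning teams to groups of (almost) equal size. The most important criteria for a draw procedure are balance, randomness, and transparency, which cannot all be satisfied at the same time when draw constraints exist. Organizers usually use the so-called Skip mechanism to ensure balance and transparency; it is based on drawing teams from pots in random sequential order. However, the Skip mechanism is non-uniformly distributed: valid assignments are not necessarily equally likely. We quantify this distortion when a group may contain at most two teams from a given set S, a constraint that poses a serious challenge for the Skip mechanism.

Our study provides two kinds of results:

- exact results for an arbitrary number of teams when there are three pots and two of them contain only one team from the set S;
- a complete enumeration of small problems with three pots and at most five teams per pot.

We also analyze three real-world case studies from basketball and football. The results show that the optimal design considers the pots in decreasing order of the number of teams from the set S. These results can be used to identify the least distorted transparent draw procedure and to decide whether the degree of non-uniformity calls for further action.

English abstract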

The group draw of a sports tournament requires assigning teams to groups of (almost) the same size. The most important criteria for a draw procedure are balance, randomness, and transparency, which cannot be satisfied simultaneously if draw constraints exist. Organisers usually use the so-called Skip mechanism, a method based on a random sequential draw of the teams from pots, in order to ensure balance and transparency. However, the Skip mechanism is non-uniformly distributed: the valid assignments are not necessarily equally likely. We quantify this distortion if a group can contain at most two teams from a given set S, which poses a serious challenge for the Skip mechanism. Our study provides exact results for an arbitrary number of teams when there are three pots and two pots contain only one team from the set S, as well as complete enumeration for small problems with three pots and at most five teams per pot. We also analyse three real-world case studies from basketball and football. It turns out that the optimal design considers the pots in decreasing order according to the number of teams in the set S. These results can be used to identify the least distorted transparent draw procedure, and to decide whether the extent of non-uniformity calls for further action.
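The non-uniformity of the Skip mechanism can be checked by brute force on a toy instance. The sketch below makes these assumptions:

- There are three pots of three teams each, and every group receives one team per pot.
- A group may hold at most two teams from S.
- Each drawn team goes to the first group (groups tried in index order) that still admits a valid completion of the draw, which is found by backtracking.

The team labels, the membership of S, and the helper names are hypothetical. Enumerating all $6^3 = 216$ equally likely draw orders gives the exact distribution of the resulting assignments.

```python
import itertools
from collections import Counter
from fractions import Fraction

def fits(group, team, in_s, cap):
    # Constraint: a group may hold at most `cap` teams from the set S
    return sum(in_s[t] for t in group) + in_s[team] <= cap

def extendable(groups, queue, in_s, cap):
    """Backtracking: can the teams still in `queue` be placed validly?"""
    if not queue:
        return True
    pot, team = queue[0]
    for g in groups:
        # A group receives its team from pot p once it already holds p teams
        if len(g) == pot and fits(g, team, in_s, cap):
            g.append(team)
            ok = extendable(groups, queue[1:], in_s, cap)
            g.pop()
            if ok:
                return True
    return False

def skip_draw(order, k, in_s, cap):
    """Skip mechanism: each drawn team goes to the first group that keeps the draw feasible."""
    groups = [[] for _ in range(k)]
    for idx, (pot, team) in enumerate(order):
        for g in groups:
            if len(g) == pot and fits(g, team, in_s, cap):
                g.append(team)
                if extendable(groups, order[idx + 1:], in_s, cap):
                    break
                g.pop()
    return tuple(tuple(g) for g in groups)

def all_orders(pots):
    # Pots are emptied in sequence; within a pot, every permutation is equally likely
    for perms in itertools.product(*(itertools.permutations(p) for p in pots)):
        yield [(i, t) for i, perm in enumerate(perms) for t in perm]

def skip_distribution(pots, in_s, cap=2):
    k = len(pots[0])
    counts = Counter(skip_draw(order, k, in_s, cap) for order in all_orders(pots))
    total = sum(counts.values())
    return {a: Fraction(c, total) for a, c in counts.items()}

def valid_assignments(pots, in_s, cap=2):
    # Group g receives team perms[p][g] from pot p; keep assignments meeting the cap
    k = len(pots[0])
    result = []
    for perms in itertools.product(*(itertools.permutations(p) for p in pots)):
        groups = [tuple(perm[g] for perm in perms) for g in range(k)]
        if all(sum(in_s[t] for t in grp) <= cap for grp in groups):
            result.append(tuple(groups))
    return result
```

Take pots A, B, C where S contains A1, A2, B1, B2 and C1. Here 120 of the 216 labelled assignments are valid, so a uniform draw would put C1 in the first group with probability 1/3. Enumerating the Skip mechanism instead gives 7/27.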

2501.18383 2026-06-09 stat.ME stat.AP

A tutorial on conducting sample size and power calculations for detecting treatment effect heterogeneity in cluster randomized trials with linear mixed models

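How to Perform Sample Size and Power Calculations for Detecting Treatment Effect Heterogeneity in Cluster Randomized Trials Based on Linear Mixed Models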

Mary Ryan Baumann, Monica Taljaard, Patrick J. Heagerty, Michael O. Harhay, Guangyu Tong, Rui Wang, Fan Li

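AI summary: This paper discusses sample size and power calculation methods for detecting treatment effect heterogeneity with linear mixed models in cluster randomized trials, and provides computational tools and key considerations.

Details
Journal ref
International Journal of Epidemiology (2026) Volume 55, Issue 3
Comments
v3: accepted, 33 pages (19 main, supplemental 14); v2: revision under review, 36 pages (main 22, supplemental 14); v1: 28 pages, 4 tables, 5 figures
AI-translated abstract

Cluster randomized trials (CRTs) are an established design type for evaluating community interventions. When planning these trials, a key task is to determine the number of clusters and the cluster size required to achieve sufficient statistical power to detect a clinically relevant effect size. Methods for evaluating the average treatment effect (ATE) over the entire study population are well established. In contrast, sample size methods for heterogeneity of treatment effects (HTEs) in CRTs have only recently been developed. Pre-specified HTE analyses in CRTs should ideally be accompanied by sample size or power calculations to ensure that the trial has adequate power. Because additional design parameters must be specified, power analysis for HTEs is more complex than for ATEs.

Power and sample size formulas for detecting HTEs with linear mixed effects (LME) models have been derived separately for different cluster randomized designs and for continuous and binary outcomes. The designs include single- and multi-period parallel designs, crossover designs, and stepped-wedge designs. This article provides a consolidated reference guide and improves the accessibility of these methods through an online R Shiny calculator. We further discuss key considerations for conducting sample size and power calculations to test pre-specified HTE hypotheses. We emphasize the importance of specifying advanced estimates of intracluster correlation coefficients for both outcomes and covariates, and their impact on power. The sample size methods and calculator functionality are demonstrated with a real CRT example.

English abstract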

Cluster-randomized trials (CRTs) are a well-established class of designs for evaluating community-based interventions. An essential task in planning these trials is determining the number of clusters and cluster sizes needed to achieve sufficient statistical power for detecting a clinically relevant effect size. Methods for evaluating the average treatment effect (ATE) for the entire study population are well established. In contrast, sample size methods for testing heterogeneity of treatment effects (HTEs) in CRTs have only recently been developed; HTEs here means treatment-covariate interactions or differences in subpopulation-specific treatment effects. Pre-specified analyses of HTEs by effect-modifying covariates in CRTs should ideally be accompanied by sample size or power calculations to ensure the trial has adequate power for the planned analyses. Power analysis for testing HTEs is more complex than for ATEs due to the additional design parameters that must be specified. Power and sample size formulas for testing HTEs via linear mixed effects (LME) models have been separately derived for different cluster-randomized designs and for continuous and binary outcomes. These designs include single and multi-period parallel designs, crossover designs, and stepped-wedge designs. This tutorial provides a consolidated reference guide for these methods and enhances their accessibility through an online R Shiny calculator. We further discuss key considerations for conducting sample size and power calculations to test pre-specified HTE hypotheses in CRTs. We highlight the importance of specifying advanced estimates of intracluster correlation coefficients for both outcomes and covariates, and their implications for power. The sample size methodology and calculator functionality are demonstrated through a real CRT example.
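The sketch below is a hedged illustration of why HTE power needs extra design parameters. It assumes:

- a single-period parallel CRT with $n$ clusters of equal size $m$, a fraction $\pi$ of which are treated;
- a random-intercept LME;
- an individual-level covariate with variance $\sigma_x^2$ and ICC $\rho_x$;
- an outcome with (covariate-adjusted) variance $\sigma^2$ and ICC $\rho_y$.

Under these assumptions it uses the following large-sample variance for the treatment-by-covariate interaction estimator:

$$\operatorname{Var}(\hat\beta_{\mathrm{HTE}}) \approx \frac{\sigma^2 (1-\rho_y)\{1+(m-1)\rho_y\}}{n m \pi(1-\pi)\sigma_x^2\{1+(m-2)\rho_y-(m-1)\rho_x\rho_y\}}$$

We obtained this expression by combining within- and between-cluster information under those assumptions. Check it against the tutorial's formulas and calculator before relying on it. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def var_hte(n, m, sigma2, sigma2_x, icc_y, icc_x, pi=0.5):
    """Approximate variance of the treatment-by-covariate interaction estimator.

    Assumes a single-period parallel CRT, n clusters of equal size m, a fraction pi
    of clusters treated, a random-intercept LME, an individual-level covariate
    (variance sigma2_x, ICC icc_x), and outcome variance sigma2 with ICC icc_y.
    """
    num = sigma2 * (1 - icc_y) * (1 + (m - 1) * icc_y)
    den = (n * m * pi * (1 - pi) * sigma2_x
           * (1 + (m - 2) * icc_y - (m - 1) * icc_x * icc_y))
    return num / den

def power_z(effect, variance, alpha=0.05):
    # Power of a two-sided Wald z-test, ignoring the negligible opposite-tail term
    z = NormalDist()
    return z.cdf(abs(effect) / sqrt(variance) - z.inv_cdf(1 - alpha / 2))
```

Setting $\rho_x = 1$ (a cluster-level covariate) reduces the variance to the individually randomized value times the familiar design effect $1 + (m-1)\rho_y$. Smaller $\rho_x$ lowers the variance, which is why both ICCs matter for power.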

2510.17641 2026-06-09 econ.GN physics.soc-ph q-fin.EC stat.AP

Are penalty shootouts better than a coin toss? Evidence from international club football in Europe

Are Penalty Shootouts Better Than a Coin Toss? Evidence from European International Club Football

László Csató, Dóra Gréta Petróczy

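AI summary: Using data on penalty shootouts in European club football between 2000 and 2025, this paper finds that shootout outcomes cannot be predicted, in contrast to earlier findings.

Details
Journal ref
Journal of Sports Economics, 2026, forthcoming
Comments
23 pages, 5 figures, 7 tables
AI-translated abstract

Penalty shootouts play a crucial role in the knockout stages of major football tournaments. Their importance has increased substantially since the 2021/22 season, when UEFA abolished the away goals rule. This paper examines whether the outcome of a penalty shootout is predictable. Based on data on all penalty shootouts between 2000 and 2025, no effect of the kicking order, the venue of the match, or psychological momentum is found. Unlike previous results, no relationship is found between shootout success and relative team strength, quantified by differences in Elo ratings and the implied winning probability. Therefore, the hypothesis that penalty shootouts are close to a coin toss in European international club competitions cannot be rejected.

English abstract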

Penalty shootouts play a crucial role in the knockout stage of major football tournaments. Their importance has substantially increased since the 2021/22 season, when the Union of European Football Associations (UEFA) scrapped the away goals rule. Our paper examines whether the outcome of a penalty shootout can be predicted in UEFA club competitions. Based on all shootouts between 2000 and 2025, no evidence is found for an effect of the kicking order, the venue of the match, or psychological momentum. In contrast to previous results, we do not detect any relationship between shootout success and relative team strength, quantified by differences in Elo ratings and the implied winning probability. Thus, the hypothesis that penalty shootouts are close to a coin toss in international competitions for European football clubs cannot be rejected.
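Two ingredients of such an analysis can be sketched briefly, under our own assumptions rather than the authors' exact specification:

- the Elo-implied winning probability on the usual 400-point logistic scale;
- an exact two-sided binomial test of whether, for example, the team kicking first wins more often than a fair coin would predict.

Function names are hypothetical.

```python
from math import comb

def elo_win_probability(r_a, r_b):
    # Standard Elo expected score of team A against team B (400-point logistic scale)
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no likelier than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A tiny relative tolerance keeps outcomes tied with the observed one in floating point
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))
```

A 200-point rating advantage implies roughly a 76% expected score. The coin-toss hypothesis corresponds to $p = 0.5$ in the binomial test.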

2309.10370 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML

Geometric structure of shallow neural networks and constructive ${\mathcal L}^2$ cost minimization

Geometric Structure of Shallow Neural Networks and Constructive ${\mathcal L}^2$ Cost Minimization

Thomas Chen, Patrícia Muñoz Ewald

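AI summary: This paper studies cost minimization for shallow ReLU networks in the underparametrized regime. It reveals the geometric structure of classification data through constructed upper bounds, without relying on gradient descent. It proves an upper bound on the minimum of the cost function in terms of the signal-to-noise ratio of the training data, and it identifies a constructively trained network that metrizes a particular subspace.

Details
Journal ref
Phys. D, 490, Article No. 135176 (2026)
Comments
AMS Latex, 29 pages. Experimental evidence added. To appear in Physica D: Nonlinear Phenomena
AI-translated abstract

This paper studies the problem of cost (loss) minimization in underparametrized shallow ReLU networks by explicitly constructing upper bounds, without using gradient descent. The focus is on elucidating the geometric structure of approximate and exact minimizers. We consider an $L^2$ cost function, input space $\mathbb{R}^M$, and output space ${\mathbb R}^Q$ with $Q\leq M$, where the training input sample size can be arbitrarily large. We prove an upper bound of order $O(\delta_P)$ on the minimum of the cost function, where $\delta_P$ measures the signal-to-noise ratio of the training data. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function. We show that this exact value differs from the upper bound obtained for $Q\leq M$ by a relative error of $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace of the input space $\mathbb{R}^M$. We also comment on the characterization of the global minimum of the cost function in this context.

English abstract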

In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow ReLU networks through the explicit construction of upper bounds which appeal to the structure of classification data, without use of gradient descent. A key focus is on elucidating the geometric structure of approximate and precise minimizers. We consider an $L^2$ cost function, input space $\mathbb{R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and training input sample size that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(δ_P)$ where $δ_P$ measures the signal-to-noise ratio of training data. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(δ_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace in the input space ${\mathbb R}^M$. We comment on the characterization of the global minimum of the cost function in the given context.
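For readers who want the setup in code, here is a minimal sketch of a shallow ReLU network $\mathbb{R}^M \to \mathbb{R}^Q$ and its empirical $L^2$ cost over $N$ training pairs. The $1/N$ normalization and the names are our assumptions. The paper's constructive upper bounds are not reproduced here.

```python
import numpy as np

def shallow_relu(X, W1, b1, W2, b2):
    # X: (N, M) inputs; hidden layer ReLU(W1 x + b1); linear output W2 h + b2 in R^Q
    H = np.maximum(X @ W1.T + b1, 0.0)
    return H @ W2.T + b2

def l2_cost(X, Y, W1, b1, W2, b2):
    """Empirical L^2 cost: (1/N) * sum_j |f(x_j) - y_j|^2, with Y of shape (N, Q)."""
    R = shallow_relu(X, W1, b1, W2, b2) - Y
    return np.mean(np.sum(R**2, axis=1))
```

Choosing $Q \leq M$ here means `W2` has at most as many rows as the input has coordinates, matching the underparametrized setting described in the abstract.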

2405.07098 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML

Interpretable global minima of deep ReLU neural networks on sequentially separable data

Interpretable Global Minima of Deep ReLU Neural Networks on Sequentially Separable Data

Thomas Chen, Patrícia Muñoz Ewald

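AI summary: This paper constructs zero-loss classifiers whose cumulative parameters determine truncation maps. With them, it describes the global minima of deep ReLU networks for data consisting of small, well-separated clusters and for sequentially linearly separable equivalence classes.

Details
Journal ref
J. Mach. Learn. Res., 26 (173): 1-31 (2025)
Comments
AMS Latex, 31 pages, 3 figures
AI-translated abstract

We explicitly construct zero-loss neural network classifiers. We express the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on the input space. The training data configurations considered include (i) sufficiently small, well-separated clusters corresponding to each class, and (ii) sequentially linearly separable equivalence classes. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, the global minimizers can be described with $Q(M+2)$ parameters.

English abstract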

We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which determine truncation maps acting recursively on input space. The configurations for the training data considered are (i) sufficiently small, well separated clusters corresponding to each class, and (ii) equivalence classes which are sequentially linearly separable. In the best case, for $Q$ classes of data in $\mathbb{R}^M$, global minimizers can be described with $Q(M+2)$ parameters.

2311.07065 2026-06-09 cs.LG cs.AI math-ph math.MP math.OC stat.ML

On non-approximability of zero loss global ${\mathcal L}^2$ minimizers by gradient descent in Deep Learning

On the Non-approximability of Zero-Loss Global ${\mathcal L}^2$ Minimizers by Gradient Descent in Deep Learning

Thomas Chen, Patricia Muñoz Ewald

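AI summary: This paper analyzes geometric properties of the gradient descent algorithm in deep learning. It points out that zero-loss minimization generically cannot be attained in underparametrized networks, so the training input distribution must be non-generic to produce zero-loss minimizers.

Details
Journal ref
Theor. Appl. Mech., 52 (1), 67-73 (2025)
Comments
AMS Latex, 7 pages. Typos corrected, Corollary 1.6 upgraded to Theorem, acknowledgment added
AI-translated abstract

We analyze geometric aspects of the gradient descent algorithm in deep learning. We discuss in detail the circumstance that, in underparametrized deep learning networks, zero-loss minimization generically cannot be attained. As a consequence, we conclude that the training input distribution must be non-generic in order to produce zero-loss minimizers. This holds both for the method constructed in [Chen-Munoz Ewald 2023, 2024] and for gradient descent [Chen 2025], which assume clustering of the training data.

English abstract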

We analyze geometric aspects of the gradient descent algorithm in Deep Learning (DL), and give a detailed discussion of the circumstance that, in underparametrized DL networks, zero loss minimization generically cannot be attained. As a consequence, we conclude that the distribution of training inputs must necessarily be non-generic in order to produce zero loss minimizers, both for the method constructed in [Chen-Munoz Ewald 2023, 2024] and for gradient descent [Chen 2025] (which assume clustering of training data).

2411.18385 2026-06-09 cs.LG cs.CV stat.ML

Federated Learning with Uncertainty and Personalization via Efficient Second-order Optimization

Uncertainty and Personalization in Federated Learning via Efficient Second-Order Optimization

Shivam Pal, Aishwarya Gupta, Saqib Sarwar, Piyush Rai

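AI summary: This paper proposes an efficient federated learning method that uses second-order optimization to reduce computation and communication costs while retaining the uncertainty and personalization benefits of Bayesian approaches.

Details
Journal ref
Transactions on Machine Learning Research (TMLR), 2025
AI-translated abstract

Federated learning (FL) has developed into a promising approach for collaboratively learning from distributed, heterogeneous data across different clients without the data leaving the clients. Recent FL research advocates a Bayesian approach, because learning posterior distributions over the client and/or server models offers a principled way to account for model and predictive uncertainty. Moreover, Bayesian FL naturally enables personalization: each client learns its own personalized model, which handles data heterogeneity across clients. In particular, the hierarchical Bayesian approach lets all clients learn their personalized models while a prior distribution provided by the server captures their commonalities. Despite these advantages, Bayesian approaches to FL can be computationally expensive and communication-intensive, because posterior distributions must be computed and transmitted.

We propose a new Bayesian FL method based on efficient second-order optimization. Its computational cost is similar to that of first-order optimizers such as Adam. At the same time, it offers the benefits of the Bayesian approach (e.g., uncertainty, personalization) and is more efficient and accurate than state-of-the-art Bayesian FL methods in both standard and personalized FL settings. Our method outperforms baselines, including both optimization-based and Bayesian FL methods, in predictive accuracy and uncertainty estimation.

English abstract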

Federated Learning (FL) has emerged as a promising method to collaboratively learn from decentralized and heterogeneous data available at different clients without the data ever leaving the clients. Recent works on FL have advocated taking a Bayesian approach to FL, as it offers a principled way to account for model and predictive uncertainty by learning a posterior distribution for the client and/or server models. Moreover, Bayesian FL naturally enables personalization in FL to handle data heterogeneity across clients by having each client learn its own distinct personalized model. In particular, the hierarchical Bayesian approach enables all the clients to learn their personalized models while also taking into account their commonalities via a prior distribution provided by the server. However, despite their promise, Bayesian approaches for FL can be computationally expensive and can have high communication costs, because posterior distributions must be computed and sent. We present a novel Bayesian FL method using an efficient second-order optimization approach. Its computational cost is similar to that of first-order optimization methods such as Adam, yet it provides the various benefits of the Bayesian approach for FL (e.g., uncertainty, personalization). It is also significantly more efficient and accurate than state-of-the-art (SOTA) Bayesian FL methods, in both standard and personalized FL settings. Our method achieves improved predictive accuracy and better uncertainty estimates compared with baselines that include both optimization-based and Bayesian FL methods.
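One common building block of Bayesian FL is aggregating diagonal Gaussian client posteriors by multiplying them. The precisions add, and the mean is the precision-weighted average of the client means. This is a generic sketch, not the authors' algorithm. In second-order methods the diagonal precision can be taken from a curvature (Hessian) estimate, which is one reason such methods pair naturally with Gaussian posteriors. Names and array shapes are hypothetical.

```python
import numpy as np

def aggregate_gaussians(means, precisions):
    """Product of diagonal Gaussian client posteriors N(mu_k, 1/lambda_k).

    means, precisions: arrays of shape (K clients, D parameters).
    Returns the precision-weighted mean and the summed precision.
    """
    means = np.asarray(means, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    total_precision = precisions.sum(axis=0)
    mean = (precisions * means).sum(axis=0) / total_precision
    return mean, total_precision
```

Clients whose curvature estimates are sharper (higher precision) pull the aggregate mean toward their own estimate.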

2312.07928 2026-06-09 eess.SP cs.AI stat.AP

Bayesian inversion of GPR waveforms for sub-surface material characterization: an uncertainty-aware retrieval of soil moisture and overlaying biomass properties

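Bayesian Inversion of GPR Waveforms for Subsurface Property Characterization: An Uncertainty-Aware Method for Retrieving Soil Moisture Content and Overlying-Layer Properties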

Ishfaq Aziz, Elahe Soltanaghai, Adam Watts, Mohamad Alipour

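AI summary: This paper proposes a GPR waveform inversion method based on Bayesian model updating to predict the moisture content and depth of the soil and the overlying layer. Validated with laboratory and field data, its results agree with TDR and gravimetric measurements, and it provides probabilistic estimates of uncertainty.

Details
Comments
34 pages, 17 figures. This paper is under review at a journal and has not yet been published.
AI-translated abstract

Accurate estimation of subsurface properties, such as the moisture content and depth of soil and vegetation layers, is crucial for subsurface condition monitoring, precision agriculture, and wildfire risk assessment. Because soil is often covered by vegetation and organic material, its characterization is challenging. In addition, estimating the properties of the overlying layer is crucial for wildfire risk assessment. This paper proposes a GPR waveform inversion method based on Bayesian model updating to predict the moisture content and depth of the soil and the overlying layer. Because permittivity is highly correlated with moisture content, the method predicts the dielectric permittivity of both layers, along with other parameters including layer depth and electrical conductivity. The Bayesian model updating approach provides probabilistic estimates of these parameters, which convey the confidence and uncertainty of the estimates.

The method was evaluated with diverse experimental data collected in laboratory and field investigations. The laboratory investigations varied the soil moisture, the overlying layer depth, and the coarseness of its material. The field investigation measured field soil moisture over sixteen days. The predictions were consistent with time-domain reflectometry (TDR) measurements and the conventional gravimetric method. The depth of the surface layer can also be predicted reasonably well. The proposed method offers a promising approach for uncertainty-aware subsurface parameter estimation that can support risk-assessment decisions across a wide range of applications.

English abstract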

Accurate estimation of sub-surface properties such as moisture content and depth of soil and vegetation layers is crucial for applications spanning sub-surface condition monitoring, precision agriculture, and effective wildfire risk assessment. Soil in nature is often covered by overlaying vegetation and surface organic material, making its characterization challenging. In addition, the estimation of the properties of the overlaying layer is crucial for applications like wildfire risk assessment. This study thus proposes a Bayesian model-updating-based approach for ground penetrating radar (GPR) waveform inversion to predict moisture contents and depths of soil and overlaying material layer. Because of its high correlation with moisture content, the dielectric permittivity of both layers was predicted with the proposed method, along with other parameters, including depth and electrical conductivity of layers. The proposed Bayesian model updating approach yields probabilistic estimates of these parameters that can provide information about the confidence and uncertainty related to the estimates. The methodology was evaluated for a diverse range of experimental data collected through laboratory and field investigations. Laboratory investigations included variations in soil moisture values, depth of the overlaying surface layer, and coarseness of its material. The field investigation included measurement of field soil moisture for sixteen days. The results demonstrated predictions consistent with time-domain reflectometry (TDR) measurements and conventional gravimetric tests. The depth of the surface layer could also be predicted with reasonable accuracy. The proposed method provides a promising approach for uncertainty-aware sub-surface parameter estimation that can enable decision-making for risk assessment across a wide range of applications.
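The Bayesian updating idea can be illustrated with a much simpler forward model than full-waveform GPR simulation. The sketch makes these assumptions:

- a single layer of known depth $d$;
- observed two-way travel times $t = 2d\sqrt{\varepsilon}/c$ with Gaussian noise, using $c \approx 0.2998$ m/ns;
- a uniform prior on the relative permittivity $\varepsilon$, evaluated as a grid posterior.

It then maps permittivity to volumetric moisture with the empirical Topp et al. (1980) relation. None of these choices is taken from the paper, and the function names are hypothetical.

```python
import numpy as np

C_M_PER_NS = 0.2998  # speed of light in vacuum, metres per nanosecond

def travel_time(eps, depth):
    # Two-way travel time (ns) to a reflector at `depth` (m) below a layer of permittivity eps
    return 2.0 * depth * np.sqrt(eps) / C_M_PER_NS

def topp_moisture(eps):
    # Topp et al. (1980) empirical relation: volumetric water content from permittivity
    return -5.3e-2 + 2.92e-2 * eps - 5.5e-4 * eps**2 + 4.3e-6 * eps**3

def grid_posterior(t_obs, depth, sigma, eps_grid):
    """Posterior weights over eps_grid: uniform prior, Gaussian travel-time noise sigma (ns)."""
    t_obs = np.asarray(t_obs, dtype=float)
    resid = t_obs[None, :] - travel_time(eps_grid, depth)[:, None]
    log_like = -0.5 * np.sum((resid / sigma) ** 2, axis=1)
    w = np.exp(log_like - log_like.max())  # subtract the max for numerical stability
    return w / w.sum()
```

With $d = 0.3$ m and a true permittivity of 10, the posterior concentrates near 10, and `topp_moisture(10.0)` gives a volumetric water content of about 0.19. The spread of the posterior weights is the uncertainty statement that a Bayesian inversion adds over a point estimate.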

2310.20699 2026-06-09 physics.chem-ph cs.LG physics.comp-ph physics.data-an stat.AP

Bayesian Multistate Bennett Acceptance Ratio Methods

Bayesian Multistate Bennett Acceptance Ratio Methods

Xinqiang Ding

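AI summary: This paper proposes Bayesian multistate Bennett acceptance ratio methods that combine configurations sampled from thermodynamic states with a prior distribution to compute the posterior distribution of free energies, improving the uncertainty assessment of free energy estimates.

Details
Journal ref
Journal of Chemical Theory and Computation 2024 20 (5), 1878-1888
AI-translated abstract

The multistate Bennett acceptance ratio (MBAR) method is a widely used approach for computing the free energies of thermodynamic states. This paper introduces BayesMBAR, a Bayesian generalization of MBAR. By combining configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimates and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers the MBAR result but provides more accurate uncertainty estimates. In addition, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation process through non-uniform prior distributions. As an example, we show that, by incorporating prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than MBAR. Given the widespread use of MBAR in free energy calculations, we expect BayesMBAR to become an important tool in a wide range of free energy applications.

English abstract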

The multistate Bennett acceptance ratio (MBAR) method is a prevalent approach for computing free energies of thermodynamic states. In this work, we introduce BayesMBAR, a Bayesian generalization of the MBAR method. By integrating configurations sampled from thermodynamic states with a prior distribution, BayesMBAR computes a posterior distribution of free energies. Using the posterior distribution, we derive free energy estimates and compute their associated uncertainties. Notably, when a uniform prior distribution is used, BayesMBAR recovers MBAR's result but provides more accurate uncertainty estimates. Additionally, when prior knowledge about free energies is available, BayesMBAR can incorporate this information into the estimation procedure by using non-uniform prior distributions. As an example, we show that, by incorporating prior knowledge about the smoothness of free energy surfaces, BayesMBAR provides more accurate estimates than the MBAR method. Given MBAR's widespread use in free energy calculations, we anticipate BayesMBAR to be an essential tool in various applications of free energy calculations.
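Since BayesMBAR with a uniform prior recovers the MBAR estimate, it helps to see that estimate. The sketch below solves the MBAR self-consistent equations by fixed-point iteration in log space, fixing $f_0 = 0$:

$$f_i = -\ln \sum_n \frac{\exp(-u_i(x_n))}{\sum_k N_k \exp(f_k - u_k(x_n))}$$

It computes the point estimate only, not the posterior or its uncertainties. Function names and the harmonic test system in the usage note are our assumptions.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))

def mbar(u_kn, n_k, tol=1e-10, max_iter=10000):
    """MBAR free energies by fixed-point iteration.

    u_kn[k, n]: reduced potential of pooled sample n evaluated in state k.
    n_k[k]: number of samples drawn from state k. Returns f with f[0] = 0.
    """
    u_kn = np.asarray(u_kn, dtype=float)
    log_n = np.log(np.asarray(n_k, dtype=float))
    f = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        # log of the mixture denominator sum_k N_k exp(f_k - u_k(x_n)) for every sample n
        log_den = logsumexp(log_n[:, None] + f[:, None] - u_kn, axis=0)
        f_new = -logsumexp(-u_kn - log_den[None, :], axis=1)
        f_new = f_new - f_new[0]
        if np.max(np.abs(f_new - f)) < tol:
            return f_new
        f = f_new
    return f
```

For harmonic states $u_k(x) = \kappa_k x^2/2$, the exact reduced free energy differences are $\tfrac12 \ln(\kappa_k/\kappa_0)$. This makes them a convenient check for the estimate.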

1909.02747 2026-06-09 eess.IV cs.CV cs.LG stat.ML

Eelgrass beds and oyster farming at a lagoon before and after the Great East Japan Earthquake 2011: potential to apply deep learning at a coastal area

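Eelgrass Beds and Oyster Farming at Mangoku-ura Lagoon, Miyagi, before and after the 2011 Great East Japan Earthquake: Potential for Applying Deep Learning in Coastal Areas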

Takehisa Yamakita

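AI summary: This paper studies automatic land cover classification of seagrass beds, sandy areas, and oyster farming rafts at Mangoku-ura Lagoon, Miyagi, Japan. It compares manual tracing, simple image segmentation, and deep-learning-based image transformation, and it demonstrates the potential of deep learning for extracting spatial patterns in coastal areas after the earthquake.

Details
AI-translated abstract

This paper compares manual tracing, simple image segmentation, and deep-learning-based image transformation for automatic land cover classification of seagrass beds, sandy areas, and oyster farming rafts at Mangoku-ura Lagoon, Miyagi, Japan. It demonstrates the potential of deep learning for extracting spatial patterns in coastal areas after the earthquake. The image transformation method gave the best output resolution. Assessed with random points on independent test data, it reached more than 69% accuracy for vegetation classification. The distribution of oyster farming rafts was detected by the segmentation model. Comparing the manual tracing and image transformation results from before and after the earthquake revealed an increase in sand area and a decrease in vegetation. The decrease in oyster farming was detected only by the segmentation model. These results demonstrate the potential of deep learning for extracting spatial patterns after an earthquake and tsunami.

English abstract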

There are few case studies of automatic land cover classification in coastal areas. Here, I test the extraction of seagrass beds, sandy areas, and oyster farming rafts at Mangoku-ura Lagoon, Miyagi, Japan, by comparing manual tracing, simple image segmentation, and image transformation using deep learning. The results were used to extract the changes before and after the earthquake and tsunami. The output resolution was best with the image transformation method, which showed more than 69% accuracy for vegetation classification in an assessment using random points on independent test data. The distribution of oyster farming rafts was detected by the segmentation model. Assessing the change before and after the earthquake with the manual tracing and image transformation results revealed an increase in sand area and a decrease in vegetation. The decrease in oyster farming was detected only by the segmentation model. These results demonstrate the potential to extract the spatial pattern of these elements after an earthquake and tsunami. Index Terms: Great East Japan Earthquake of 2011, Land use land cover (LULC), Zosteraceae seagrass, cultured oyster, deep learning, Mangoku Bay
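Accuracy assessment at random points reduces to a confusion matrix. This sketch uses hypothetical class labels. From reference and predicted labels sampled at the same points, it computes:

- the overall accuracy;
- the per-class producer's accuracy (recall).

```python
import numpy as np

def accuracy_report(y_true, y_pred, classes):
    """Confusion matrix, overall accuracy, and per-class producer's accuracy."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1  # rows: reference label, columns: predicted label
    overall = np.trace(cm) / cm.sum()
    with np.errstate(invalid="ignore", divide="ignore"):
        producers = np.diag(cm) / cm.sum(axis=1)  # NaN for classes absent from the reference
    return cm, overall, producers
```

A figure such as "more than 69% accuracy for vegetation" would correspond to one entry of `producers` or to `overall`, depending on how the assessment is defined.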