arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.18751 2026-05-19 math.PR math.ST stat.TH

Kernel Characterisations of Stochastic Orders Within Parametric Density Families

Zakaria Derbazi

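AI summary: This paper proposes kernel-based criteria that characterise the likelihood-ratio, hazard-rate, usual stochastic, and relative log-concavity orders within parametric density families, and demonstrates applications to settings such as compound sums.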

Details
Comments
21 pages, 2 tables
Abstract

We develop kernel criteria for the likelihood-ratio, hazard-rate, usual stochastic, and relative log-concavity orders in parametric families of univariate probability laws with densities. The score is the derivative of the log density with respect to the parameter, and a kernel equals the score up to an additive term depending only on the parameter. Kernel monotonicity gives likelihood-ratio order, kernel concavity gives relative log-concavity, and two tail-conditional mean inequalities give the hazard-rate and usual stochastic orders. The same construction applies along joint-parameter paths and to comparisons between two laws whose densities admit parameter-dependent factors, where the log-factor ratio is used as the kernel. For compound sums with a random number of i.i.d. terms, the induced kernel is the posterior mean of the kernel of the summand count. The applications recover standard one-parameter orderings, give likelihood-ratio comparisons for compound laws, and handle nonmonotone examples through the tail-conditional criteria.
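The link between a monotone kernel and the likelihood-ratio order can be checked numerically in a simple case. The sketch below assumes the Poisson family with mean `lam`, which is a choice made here for illustration and is not taken from the paper. Its score is `x / lam - 1`, so `x / lam` differs from the score only by a term depending on the parameter and would qualify as a kernel under the abstract's definition. The function names are hypothetical.

```python
import math


def poisson_log_pmf(x, lam):
    """Log probability mass of Poisson(lam) at the integer x."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def poisson_score(x, lam):
    """Derivative of the log pmf with respect to lam: x / lam - 1."""
    return x / lam - 1.0


def likelihood_ratio(x, lam_low, lam_high):
    """Ratio f(x; lam_high) / f(x; lam_low)."""
    return math.exp(poisson_log_pmf(x, lam_high) - poisson_log_pmf(x, lam_low))


def is_nondecreasing(values):
    """True if every element is at least as large as its predecessor."""
    return all(b >= a for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    xs = range(20)
    # The score is increasing in x for a fixed parameter ...
    print(is_nondecreasing([poisson_score(x, 2.0) for x in xs]))
    # ... and the ratio of the larger-mean pmf to the smaller-mean pmf is too.
    print(is_nondecreasing([likelihood_ratio(x, 2.0, 3.5) for x in xs]))
```

Both checks print `True` here because the ratio equals `(3.5 / 2.0) ** x * exp(-1.5)`, which increases in `x`.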

2605.18728 2026-05-19 stat.AP

Bayesian Sparse Regression for Microbiome-Metabolite Data Integration

Kai Jiang, Satabdi Saha, Christine B. Peterson

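AI summary: This paper proposes a new Bayesian regression method that addresses missing values and compositionality in the integration of microbiome and metabolite data. Simulated and real data show that the method accurately imputes metabolite values and selects relevant microbiome predictors.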

Details
Comments
28 pages including references
Abstract

Numerous studies have shown that microbial metabolites, which represent the products of bacteria in the human gut, play a key role in shaping cancer risk and response to treatment. However, metabolite data typically contain a large proportion of missing values, which may result from either low abundance or technical challenges in data processing. Moreover, given the compositionality of microbiome data, where the observed abundances can only be interpreted on a relative scale, standard variable selection methods are not applicable. In this project, we propose a novel Bayesian regression method to address these challenges in the integration of metabolite and microbiome data. Key features of our proposed model include modeling the two different mechanisms of missingness for the metabolite data and adopting a Bayesian prior designed to address the compositional characteristics of microbiome data. We demonstrate on simulated data that our proposed model can accurately impute the unobserved true metabolite values and correctly select the relevant microbiome predictors. We further illustrate our method using real data from a study focused on understanding the interplay between the microbiome and metabolome in colorectal cancer.

2605.18691 2026-05-19 stat.ME stat.CO

Finite Population Sampling as n to N: Empirical Evidence for the Transition from Inference to Accuracy

Mike Crowhurst

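AI summary: This paper studies the behavior of estimators as the sampling fraction approaches 1 and argues for a reassessment of inferential assumptions in high-coverage, large-scale data settings.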

Details
Comments
12 pages, 2 Figures, 3 Tables
Abstract

The Central Limit Theorem provides a foundation for inferential statistics and hypothesis testing. It describes how standardized statistics behave under repeated sampling from large populations. However, if the size of the sample (n) becomes so large that it approaches the size of the population (N), sampling variability becomes very small, and standard errors and margins of error both approach zero. The purpose of this project was to investigate the behavior of estimators as the sampling fraction (f = n/N) approaches 1, motivated by modern data streams from administrative records, transaction logs, sensor systems, and institutional databases that capture large portions of finite populations. We constructed two finite populations with known parameters and drew repeated samples across a range of sampling fractions. We then examined the resulting randomization distributions of the sample mean to understand how sampling variability collapses. Additional experiments were conducted using various CPU- and GPU-based methods to evaluate the deviation of the sample mean from the defined population mean under different computational conditions. The results confirm that sampling variability diminishes as expected under finite population theory and becomes negligible well before full enumeration is reached. Once sampling variability is minimized, remaining deviations in estimators are primarily related to numerical precision and computational structure rather than random sampling. These findings support a reassessment of inferential assumptions in high-coverage, large-scale data settings.
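The collapse of sampling variability as f = n/N approaches 1 can be illustrated with the standard finite population correction for simple random sampling without replacement, SE = sqrt((1 - f) * S^2 / n). The sketch below is a minimal illustration under that textbook formula. The population, seed, and function names are assumptions made here and do not reproduce the paper's experiments.

```python
import math
import random
import statistics


def fpc_standard_error(population, n):
    """Standard error of the sample mean under simple random sampling
    without replacement: sqrt((1 - n/N) * S^2 / n)."""
    N = len(population)
    s2 = statistics.variance(population)  # S^2, computed with divisor N - 1
    return math.sqrt((1.0 - n / N) * s2 / n)


def simulated_standard_error(population, n, reps, rng):
    """Empirical standard deviation of the sample mean over repeated draws."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.pstdev(means)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical finite population with known values.
    population = [rng.gauss(50.0, 10.0) for _ in range(2000)]
    for f in (0.1, 0.5, 0.9, 0.99, 1.0):
        n = int(f * len(population))
        print(f, fpc_standard_error(population, n),
              simulated_standard_error(population, n, 200, rng))
```

At f = 1 the formula gives exactly zero, so any remaining deviation of a computed mean from the population mean can only come from arithmetic, which is the regime the abstract attributes to numerical precision and computational structure.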

2605.18656 2026-05-19 stat.ML cs.AI cs.LG stat.ME

Statistical Limits and Efficient Algorithms for Differentially Private Federated Learning

Arnab Auddy, Xiangni Peng, Subhadeep Paul

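AI summary: This paper studies the trade-offs among estimation accuracy, privacy constraints, and communication cost in differentially private federated learning. It proposes two efficient algorithms, FedHybrid and FedNewton, that improve accuracy at reduced communication cost, and establishes upper and lower bounds on the mean-squared error to assess their performance.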

Details
Abstract

Federated Learning is a leading framework for training ML and AI models collaboratively across numerous user devices or databases. We study the trade-offs among estimation accuracy, privacy constraints, and communication cost for differentially private (DP) federated M estimation. The two standard methods in the literature are FedAvg, which may suffer from high federation bias, and FedSGD, which can incur high communication cost. Aimed at improving accuracy at a reduced communication cost, we propose FedHybrid, which uses FedSGD starting with an improved initialization by the FedAvg estimator. We also propose FedNewton, which averages local Newton iterations to reduce bias in FedAvg, achieving an estimation accuracy comparable to FedSGD with far fewer communication rounds when the number of clients grows sufficiently slowly. We establish finite sample upper bounds on the mean-squared error rates of the DP versions of these estimators as functions of the number of clients, local sample sizes, privacy budget, and number of iterations. We further derive a minimax lower bound on the MSE of any iterative private federated procedure that provides a benchmark to assess the optimality gap of these methods. We numerically evaluate our methods for training a logistic regression and a neural network on the computer vision datasets MNIST and CIFAR-10.
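The contrast between averaging local estimators and averaging gradients can be sketched on a toy problem. The code below is a non-private illustration only: it assumes a scalar least-squares model through the origin, represents FedAvg by one-shot averaging of local estimators, and adds no differential-privacy noise. All function names and parameters are hypothetical and are not the authors' implementation.

```python
import random


def local_estimate(xs, ys):
    """Least-squares slope through the origin for one client's data."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def local_gradient(theta, xs, ys):
    """Gradient of the client's mean loss 0.5 * (y - theta * x)^2."""
    return sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def fed_avg(clients):
    """One communication round: average the clients' local estimators."""
    return sum(local_estimate(xs, ys) for xs, ys in clients) / len(clients)


def fed_sgd(clients, theta0, step, rounds):
    """One communication round per step: average the clients' gradients."""
    theta = theta0
    for _ in range(rounds):
        grad = sum(local_gradient(theta, xs, ys) for xs, ys in clients) / len(clients)
        theta -= step * grad
    return theta


def fed_hybrid(clients, step, rounds):
    """Gradient averaging warm-started at the averaged local estimator."""
    return fed_sgd(clients, fed_avg(clients), step, rounds)


def make_clients(num_clients, n_local, theta_true, rng):
    """Simulated clients with equal local sample sizes (an assumption here)."""
    clients = []
    for _ in range(num_clients):
        xs = [rng.uniform(0.5, 1.5) for _ in range(n_local)]
        ys = [theta_true * x + rng.gauss(0.0, 0.1) for x in xs]
        clients.append((xs, ys))
    return clients
```

With equal local sample sizes, the averaged-gradient iteration converges to the pooled least-squares estimate, and the warm start shrinks the initial error, which is the intuition behind using fewer communication rounds.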

2605.18655 2026-05-19 stat.ME astro-ph.IM

Self-Supervised Conformal Prediction with Equivariant Bootstrapping for Image Uncertainty Quantification

Henry J. Aldridge, Tobías I. Liaudat, Marcelo Pereyra, Jason D. McEwen

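AI summary: This paper proposes a self-supervised conformal prediction method based on equivariant bootstrapping for image uncertainty quantification. It exploits data symmetries to generate heuristic coverages, refines them through a conformal prediction calibration step, and avoids reliance on ground truth data. Its effectiveness is demonstrated on weak lensing mass-mapping.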

Details
Comments
9 pages, 2 figures; submitted conference proceedings for MaxEnt 2025
Abstract

Inverse problems are ubiquitous in modern scientific studies and involve recovering an underlying signal from noisy observations often transformed by a measurement operator. These problems are frequently ill-posed, particularly in imaging, leading to multiple plausible solutions and considerable uncertainty in reconstructed images. In fields like the physical and biological sciences, accurate uncertainty quantification (UQ) is critical for trustworthy scientific analyses and confident diagnoses. Current UQ methods for imaging often fall short; they can be inaccurate, or require unavailable or difficult-to-acquire ground truth data for calibration, which can introduce hidden biases due to distribution shifts between calibration and observed data. We introduce a UQ approach that leverages equivariant bootstrapping to generate heuristic coverages by exploiting data symmetries. We then refine these coverages through a conformal prediction calibration step, while crucially employing a self-supervised approach to avoid the need for ground truth calibration data. We demonstrate this method with weak lensing mass-mapping, where we aim to reconstruct the convergence field from shear measurements of distant galaxies weakly-lensed by gravitational fields. Mass-mapping in particular benefits from the self-supervised approach, as simulating calibration data is expensive and relies on specific cosmological models that could introduce biases in downstream cosmological inference tasks.

2605.18633 2026-05-19 stat.ME stat.ML

Stable Causal Discovery via Directed Acyclic Graph Aggregation

Yunan Wu, Yue Wang, Chunlin Li, Chenglong Ye

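AI summary: This paper proposes DAGgr, a model averaging framework that aggregates multiple candidate DAGs into a stable representation. Candidate graphs are weighted by out-of-sample predictive likelihood, a thresholding rule on edge-importance scores guarantees that the aggregated graph is acyclic, and the approach is supported by theoretical analysis and experiments.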

Details
Abstract

Directed Acyclic Graphs (DAGs) are central to uncovering causal structure in complex systems, yet learning a single DAG from data is often challenging: model uncertainty, finite samples, and a combinatorially large search space frequently yield unstable estimates. We propose DAGgr, a model averaging framework that aggregates multiple candidate DAGs into a single stable representation. Candidate graphs are weighted by their out-of-sample predictive likelihood across repeated data splits, and a thresholding rule on the resulting edge-importance scores guarantees that the aggregated graph is itself acyclic. We establish a finite-sample risk bound, prove that the procedure preserves acyclicity, and show that edge selection is consistent under mild conditions on the weights. Simulations across random, hub, and chain structures, together with an analysis of the Sachs et al. (2005) protein-signaling network, show that DAGgr matches or exceeds the best individual candidate while consistently outperforming bootstrap-aggregation baselines across structural recovery metrics.
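The aggregation step can be sketched as weighted edge voting followed by a threshold. The abstract does not state the thresholding rule that guarantees acyclicity, so the sketch below does not attempt to reproduce it and instead checks acyclicity explicitly with Kahn's algorithm. The weights are taken as given rather than computed from predictive likelihoods, and all names are hypothetical.

```python
def edge_scores(candidates, weights):
    """Edge importance: total weight of the candidate DAGs containing each edge."""
    scores = {}
    for edges, w in zip(candidates, weights):
        for edge in edges:
            scores[edge] = scores.get(edge, 0.0) + w
    return scores


def threshold_edges(scores, tau):
    """Keep the edges whose importance score strictly exceeds tau."""
    return {edge for edge, s in scores.items() if s > tau}


def is_acyclic(edges):
    """Kahn's algorithm: a directed graph is acyclic iff all nodes can be peeled off."""
    nodes = {v for edge in edges for v in edge}
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for u, v in edges:
        indegree[v] += 1
        children[u].append(v)
    queue = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return removed == len(nodes)


if __name__ == "__main__":
    # Three acyclic candidates, each containing two edges of the cycle a -> b -> c -> a.
    candidates = [
        {("a", "b"), ("b", "c")},
        {("b", "c"), ("c", "a")},
        {("c", "a"), ("a", "b")},
    ]
    weights = [1 / 3, 1 / 3, 1 / 3]
    scores = edge_scores(candidates, weights)
    print(is_acyclic(threshold_edges(scores, 0.5)))  # majority vote keeps the full cycle
    print(is_acyclic(threshold_edges(scores, 0.7)))  # a stricter threshold removes it
```

The example shows why the choice of threshold matters: every candidate is acyclic, yet a simple majority threshold keeps all three edges of a cycle.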

2605.18619 2026-05-19 stat.ME stat.CO

Random spanning tree Markov random field priors for Bayesian inverse problems in imaging

Jasper Marijn Everink

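AI summary: This paper proposes a Markov random field prior based on random spanning trees for Bayesian inverse problems in imaging. The prior couples continuous and discrete random variables and is applied to tasks such as image denoising, deblurring, and inpainting.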

Details
Abstract

Markov random fields are common prior distributions used in Bayesian inverse imaging problems. In particular, difference priors assign probability distributions to differences between neighbouring pixels, such as Gaussian, Laplace, or Cauchy distributions. Depending on the chosen difference distribution, these priors have smoothing or edge-preserving properties. In this work, we propose a hyperprior on the connectivity graph of the pixel grid in the form of a random spanning tree, i.e., a random connected graph with the minimal number of edges, thereby coupling continuous and discrete random variables in the prior. By using random spanning trees, only a sparse random subset of edges is regularized, which helps preserve edges in the image with reduced contrast loss compared to standard difference-based Markov random fields. We discuss how fractal-like interfaces arise in high-resolution prior samples due to the random-tree connectivity. Finally, we propose a Gibbs sampler that alternates between the discrete tree updates and continuous pixel updates to efficiently explore the posterior distribution. We apply the method to various standard test image restoration problems, including denoising, deblurring, and inpainting, to study the impact of the proposed prior in comparison with existing Markov random fields.
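The idea of regularizing only the edges of a random spanning tree can be sketched as follows. The code samples a spanning tree of the pixel grid with Wilson's algorithm (loop-erased random walks), which draws uniformly from all spanning trees. Whether the paper uses the uniform distribution is not stated in the abstract, so that choice, the absolute-difference (Laplace-type) energy, and all function names are assumptions made for illustration.

```python
import random


def grid_neighbours(v, rows, cols):
    """4-connected neighbours of pixel v = (row, col) inside the grid."""
    r, c = v
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < rows and 0 <= b < cols]


def wilson_spanning_tree(rows, cols, rng):
    """Sample a spanning tree of the rows x cols grid via Wilson's algorithm."""
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    in_tree = {nodes[0]}
    edges = []
    for start in nodes:
        if start in in_tree:
            continue
        # Random walk until the tree is hit; overwriting next_step erases loops.
        next_step = {}
        v = start
        while v not in in_tree:
            next_step[v] = rng.choice(grid_neighbours(v, rows, cols))
            v = next_step[v]
        # Retrace the loop-erased path and attach it to the tree.
        v = start
        while v not in in_tree:
            in_tree.add(v)
            edges.append((v, next_step[v]))
            v = next_step[v]
    return edges


def tree_difference_energy(image, edges):
    """Sum of absolute pixel differences over the tree edges only."""
    return sum(abs(image[u[0]][u[1]] - image[v[0]][v[1]]) for u, v in edges)


if __name__ == "__main__":
    rng = random.Random(0)
    tree = wilson_spanning_tree(4, 5, rng)
    image = [[float(c >= 2) for c in range(5)] for _ in range(4)]  # vertical step edge
    print(len(tree), tree_difference_energy(image, tree))
```

A full 4 x 5 grid has 31 neighbouring pixel pairs, while the tree keeps only 19 of them, so a sharp edge in the image is penalized only where a tree edge happens to cross it.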

2605.18598 2026-05-19 cs.LG cond-mat.stat-mech math.FA math.PR math.ST stat.TH

Pointwise Generalization in Deep Neural Networks

Shaojie Li, Yunbei Xu

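AI summary: This paper proposes a theoretical framework for pointwise generalization in deep neural networks. By analyzing the pointwise Riemannian Dimension of fully connected networks, it builds a new statistical foundation for representation learning and yields tighter generalization bounds.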

Details
Abstract

We address the fundamental question of why deep neural networks generalize by establishing a pointwise generalization theory for fully connected networks. This framework resolves long-standing barriers to characterizing the rich nonlinear feature-learning regime and builds a new statistical foundation for representation learning. For each trained model, we characterize the hypothesis via a pointwise Riemannian Dimension, derived from the eigenvalues of the learned feature representations across layers. This establishes a principled framework for deriving hypothesis-dependent, representation-aware generalization bounds. These bounds offer a systematic upgrade over approaches based on model size, products of norms, and infinite-width linearizations, yielding guarantees that are orders of magnitude tighter in both theory and experiment. Analytically, we identify the structural properties and mathematical principles that explain the tractability of deep networks. Empirically, the pointwise Riemannian Dimension exhibits substantial feature compression, decreases with increased over-parameterization, and captures the implicit bias of optimizers. Taken together, our results indicate that deep networks are mathematically tractable in practical regimes and that their generalization is sharply explained by pointwise, feature-spectrum-aware complexity.

2605.18588 2026-05-19 stat.ME q-bio.QM

OSSMM: An Open-Source Sleep Monitor and Modulator

Jonny Giordano, Fergal Stapleton, Gabriel Palma, Barak A. Pearlmutter

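AI summary: This paper presents OSSMM, an open-source hardware and software platform for accessible sleep research. The platform consists of a small wearable headband built from low-cost 3D prints and commercial off-the-shelf components, paired with an Android application. It requires no conductive gels, disposable electrodes, or specialized equipment and captures multiple biosignals with wireless connectivity, for sleep stage classification and potential sleep modulation.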

Details
Comments
8 pages
Abstract

We present the Open-Source Sleep Monitor and Modulator (OSSMM), an open-source hardware and software platform for accessible sleep research. The OSSMM comprises a small wearable headband built from 3D prints and affordable commercial-off-the-shelf (COTS) components at a material cost under 40 euros, supported by a companion Android application. The system requires no conductive gels, disposable electrodes, or specialized equipment, and captures multiple biosignals, namely movement, pulse, electrooculography (EOG), and putative electroencephalography (EEG), with wireless connectivity for data storage and potential sleep modulation capability via an onboard vibration motor. A proof-of-concept single-participant evaluation across 15 nights demonstrated that the captured biosignals support four-stage sleep classification (Wake, Light Sleep, Deep Sleep, REM) using conventional machine learning methods, with the best-performing model achieving a Macro F1-score of 0.770 and accuracy of 0.776 against a validated non-contact sleep monitor ($\kappa = 0.63$ with PSG). Two technical findings are of particular note. First, inexpensive, reusable conductive thermoplastic polyurethane (CTPU) electrodes from commercial fitness chest straps captured a differential signal whose spectral properties in canonical EEG frequency bands, including signatures consistent with sleep spindles, are the principal features driving classification. Second, this signal is obtained from just two frontal electrodes without a dedicated ground reference, suggesting that practical sleep staging is achievable with simpler configurations than typically employed. All hardware designs, software, and build instructions are openly available to support replication and modification by the research community.

2605.18562 2026-05-19 stat.ME cs.AI cs.LG stat.AP

Estimating Item Difficulty with Large Language Models as Experts

Diana Kolesnikova, Kirill Fedyanin, Abe D. Hofman, Matthieu J. S. Brinkhuis, Maria Bolsinova

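AI summary: This paper studies how large language models can be used to estimate the difficulty of newly created items. Comparing model performance across configurations, it finds that pairwise comparison performs better in the absence of additional refinements, while absolute judgement that incorporates token probabilities and examples of known difficulty also shows moderate-to-high alignment.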

Details
Comments
24 pages, 2 figures, 9 tables
Abstract

Accurate estimates of item difficulty are essential for valid assessment and effective adaptive learning. However, for newly created tasks, response data are typically unavailable. Pretesting and expert judgement can be costly and slow, while machine learning methods often require large labelled training datasets. Recent work suggests that large language models (LLMs) may help. However, there is limited evidence on the elicitation procedures and prompt configurations used to emulate experts for difficulty estimation. This study addresses this gap by evaluating three off-the-shelf LLMs as difficulty raters for newly created items without access to response data. Using an item bank from an online learning system, the study examined 6 domains of primary-school mathematics, with empirical difficulty estimates treated as empirical reference. The study used a full factorial design crossing three factors: judgement format (absolute vs pairwise), decision type (hard decisions vs token-probability-based estimates), and prompting strategy (zero-shot vs few-shot). LLM-derived difficulty estimates were compared with empirical difficulties using Spearman rank correlations. Across domains, LLM-based estimates exhibited moderate to strong positive correlations with empirical item difficulties. For simpler arithmetic tasks, some configurations approached the upper end of the accuracy range reported for human experts in previous research. Pairwise comparison consistently outperformed absolute judgement in the absence of additional refinements. However, when token-level probabilities were incorporated and examples of items with known empirical difficulty were provided, the absolute judgement configuration likewise demonstrated moderate-to-high alignment. The study positions LLMs as a promising tool for initial item calibration and offers insights into effective workflow configuration.
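The evaluation metric named in the abstract, the Spearman rank correlation, is the Pearson correlation of the ranks. A minimal pure-Python sketch follows. It assigns average ranks to ties, and the function names and toy numbers are assumptions for illustration, not data from the study.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    empirical = [1, 2, 3, 4, 5]   # hypothetical empirical difficulties
    estimated = [2, 1, 4, 3, 5]   # hypothetical LLM-derived estimates
    print(spearman(empirical, estimated))
```

Because only ranks enter the statistic, it rewards a rater that orders items correctly even when its difficulty scale differs from the empirical one, which suits comparisons between absolute and pairwise judgement formats.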

2605.18554 2026-05-19 cs.LG stat.ML

Federated Martingale Posterior Sampling

Boning Zhang, Matteo Zecchin, Mingzhao Guo, Dongzhu Liu, Osvaldo Simeone

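AI summary: This paper proposes federated martingale posterior sampling, which recovers parameter uncertainty from a predictive distribution without sharing local data sets, thereby improving model calibration in federated learning.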

Details
Comments
5 pages
Abstract

Federated Bayesian neural networks require fixing a prior on the model parameters together with a likelihood. Eliciting meaningful priors on the weight space of modern overparameterized models is notoriously difficult, and misspecification of either component can severely degrade accuracy and calibration. Motivated by the rapid progress of predictive models such as large language models, the martingale posterior, also known as predictive Bayes, replaces the prior-likelihood pair with a predictive distribution and recovers parameter uncertainty by repeatedly drawing predictive samples and refitting the model. A direct federated implementation, however, would require clients to share the local data sets. This letter proposes federated martingale posterior (FMP) sampling, a one-shot embarrassingly parallel protocol in which each client uploads a small set of trainable data embeddings and the server runs the predictive sampler centrally. Experiments on MNIST, CIFAR-10, and CIFAR-100 show that FMP closely matches the centralized counterpart and significantly improves calibration over consensus-style baselines.

2605.18537 2026-05-19 cs.LG cs.AI stat.ML

Probing for Representation Manifolds in Superposition

Alexander Modell

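AI summary: This paper proposes the Manifold Probe, a method for discovering representation manifolds in superposition. It learns the space of features that can be linearly predicted from the representations, together with the directions that encode them, and thereby reveals manifolds that are causally involved in model behaviour.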

Details
Comments
19 pages, 7 figures
Abstract

This paper introduces the Manifold Probe, a supervised method for discovering representation manifolds in superposition. The method generalizes linear regression probes by learning the space of features of a concept that can be linearly predicted from the representations, and then learning the directions used to encode them. We demonstrate the probe on representations of time and space in Llama 2-7b, finding manifolds which linearly represent an interpretable set of features in each case. In the case of time, we show that by steering along the manifold, we can influence the model's completions about the years in which famous songs, movies and books were released, providing evidence that the Manifold Probe can discover manifolds which are causally involved in model behaviour.

2605.18530 2026-05-19 cs.CL cs.AI cs.LG stat.ML

Continuous Diffusion Scales Competitively with Discrete Diffusion for Language

Zhihan Yang, Wei Guo, Shuibai Zhang, Subham Sekhar Sahoo, Yongxin Chen, Arash Vahdat, Morteza Mardani, John Thickstun

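AI summary: This paper studies how continuous diffusion models scale for language modeling. By improving Plaid into RePlaid, it shows that continuous diffusion models can compete with discrete models in compute efficiency and performance, and it provides supporting theory.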

Details
Abstract

While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20\times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs and superior generation quality. These results suggest that continuous diffusion, when trained via likelihood, is a highly competitive and scalable alternative to discrete DLMs. Moreover, we offer theoretical insights to understand the advantage of likelihood-based training. We show that optimizing the noise schedule to minimize the ELBO's variance naturally yields linear cross-entropy (information loss) over time. This evenly distributes denoising difficulty without any case-specific time reparameterization. In addition, we find that optimizing embeddings via likelihood creates structured geometries and drives the most significant likelihood gain.

2605.18476 2026-05-19 stat.CO cs.AI cs.LG

AI4BayesCode: From Natural Language Descriptions to Validated Modular Stateful Bayesian Samplers

Jungang Zou, Alex Ziyu Jiang, Qixuan Chen

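AI summary: This study presents AI4BayesCode, a system that generates runnable, validated MCMC samplers from natural-language descriptions. A modular design and a recursively stateful coding paradigm improve the reliability and extensibility of Bayesian model implementations.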

Details
Abstract

Coding and computation remain major bottlenecks in Markov chain Monte Carlo (MCMC) workflows, especially as modern sampling algorithms have become increasingly complex and existing probabilistic programming systems remain limited in model support, extensibility, and composability. We introduce AI4BayesCode, an extensible LLM-driven system that translates natural-language Bayesian model descriptions into runnable, validated MCMC samplers. To improve reliability, AI4BayesCode adopts a modular design that decomposes models into modular sampling blocks and maps each block to a built-in sampling component, reducing the need to implement complex sampling algorithms from scratch. Reliability is further improved through pre-generation validation of model specifications and post-generation validation of generated sampler code. AI4BayesCode also introduces a novel recursively stateful coding paradigm for MCMC, allowing modular sampling components, potentially developed by different contributors, to be composed coherently within larger MCMC procedures. We develop a benchmark suite to evaluate AI4BayesCode for sampler generation. Experiments show that AI4BayesCode can implement a wide range of Bayesian models from natural-language descriptions alone. As an open-ended system, its capability can continue to expand with improvements in the underlying AI agent and the addition of new built-in blocks.

2605.18472 2026-05-19 stat.ML cs.AI cs.LG

Flowing with Confidence

Friso de Kruiff, Dario Coscia, Max Welling, Erik Bekkers

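AI summary: This paper proposes Flow Matching with Confidence (FMwC), which injects input-dependent multiplicative noise at selected layers and propagates its variance through the network in closed form, yielding a per-sample confidence score at standard sampling cost. The score is used for filtering that improves image quality and the thermodynamic stability of crystals, for trajectory editing, and for adaptive stepping.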

Details
Abstract

Generative models can produce nonsensical text, unrealistic images, and unstable materials faster than simulation or human review can absorb; without per-sample confidence, trust erodes. Existing fixes run $k$ ensembles or stochastic trajectories at $k\times$ compute, measuring variability between models, not model confidence. We propose Flow Matching with Confidence (FMwC). FMwC injects input-dependent multiplicative noise at selected layers, propagates its variance through the network in closed form, and integrates it along the ODE trajectory, yielding a per-sample confidence score at standard sampling cost. The score supports multiple uses: filtering improves image quality and thermodynamic stability of crystals; editing rewinds trajectories to the points where the model commits and redirects them; and adaptive stepping concentrates ODE compute where the flow is ambiguous. We find that the confidence score correlates with the magnitude of the divergence of the learned velocity field, which gives us a window to understand the generative process, opening up surgical forms of guidance that target the moments that matter, new sampling algorithms and interpretability of generative models.

2605.18459 2026-05-19 cs.LG stat.ML

Adaptive Experimentation for Censored Survival Outcomes

Yuxin Wang, Dennis Frauen, Jonas Schweisthal, Maresa Schröder, Emil Javurek, Stefan Feuerriegel

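AI summary: This paper proposes a new adaptive experimentation framework for estimating causal effects under right censoring. It derives the semiparametric efficiency bound for the average survival effect curve, obtains a closed-form efficiency-optimal allocation policy, and shows consistent efficiency gains over uniform randomization and censoring-agnostic baselines in numerical experiments.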

Details
Abstract

Adaptive experimentation enables efficient estimation of causal effects, but existing methods are not designed for survival data with censoring, where event times are only partially observed (e.g., overall survival in cancer trials but with dropout). In this paper, we develop a novel framework for adaptive experimentation to estimate causal effects under right censoring. For this, we derive the semiparametric efficiency bound for the average survival effect curve as a function of the treatment allocation policy and thereby obtain a closed-form efficiency-optimal allocation policy. The policy generalizes classical Neyman allocation to survival settings by prioritizing patient strata where both event and censoring dynamics induce high uncertainty. Building on this, we propose the Adaptive Survival Estimator (ASE), an adaptive framework that learns the allocation policy and estimates the average survival effect curve sequentially. Our framework has three main benefits: (i) it accommodates arbitrary machine learning models for nuisance estimation; (ii) it is guided by a closed-form efficiency-optimal allocation policy; and (iii) it admits strong theoretical guarantees, including asymptotic normality via a martingale central limit theorem. We demonstrate our framework across various numerical experiments to show consistent efficiency gains over uniform randomization and censoring-agnostic baselines.
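For context, the classical Neyman allocation that the abstract says is generalized can be written in a few lines. The sketch below covers only the uncensored two-arm case: treating a fraction pi = sd_treated / (sd_treated + sd_control) minimizes the variance of the difference in means. It is not the paper's survival allocation policy, and the function names are hypothetical.

```python
def neyman_allocation(sd_treated, sd_control):
    """Treatment probability that minimizes the difference-in-means variance."""
    return sd_treated / (sd_treated + sd_control)


def difference_in_means_variance(pi, sd_treated, sd_control, n):
    """Variance of the difference in means when a fraction pi of n units is treated."""
    return sd_treated ** 2 / (n * pi) + sd_control ** 2 / (n * (1.0 - pi))


if __name__ == "__main__":
    sd_treated, sd_control, n = 3.0, 1.0, 100
    pi = neyman_allocation(sd_treated, sd_control)
    print(pi, difference_in_means_variance(pi, sd_treated, sd_control, n))
    print(difference_in_means_variance(0.5, sd_treated, sd_control, n))
```

The noisier arm receives more units. In this example, treating 75% of units gives a variance of 0.16, compared with 0.20 under uniform randomization.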

2605.18448 2026-05-19 math.ST econ.EM stat.TH

Fixed-order PCA: Theory for Overestimated Factor Models

固定阶PCA:对高维因子模型中高估因子模型的理论

Yuan Liao, Xin Tong, Wanjie Wang, Dacheng Xiu

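AI summary: This paper develops asymptotic theory for fixed-order PCA in high-dimensional factor models. By introducing an expanded and a compressed rotation map, it proves consistency of the estimated factors and shows asymptotic normality of factor-augmented regression at a fixed order, giving theoretical support for using a conservative upper bound on the number of factors.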

详情
AI中文摘要

我们开发了高维因子模型中主成分分析(PCA)的渐近理论,其中工作维度R是固定的,并且只需满足R≥r,其中r是真实因子数。基于随机矩阵理论中的各向异性局部定律,我们证明了第r个特征值之外的“额外”经验特征值在渐近上受噪声支配、不相干且几乎正交于因子负载。我们引入了两种旋转,一个扩展的r×R映射H'和一个压缩的R×r映射H⁺,并在两种情况下建立了估计因子的一致性。作为应用,我们分析了因子增强回归用于处理效应推断,并证明对于每个固定的R≥r,具有√T渐近正态性。这些结果为采用保守的因子数上界提供了理论基础,并将分析负担从一致的维度选择转移到更温和的对r的上界约束。

英文摘要

We develop asymptotic theory for principal component analysis (PCA) of a high-dimensional factor model in which the working dimension $R$ is fixed and only required to satisfy $R \ge r$, where $r$ is the true number of factors. Building on anisotropic local laws from random matrix theory, we show that the ``extra'' empirical eigencomponents beyond the $r$-th are asymptotically noise-governed, incoherent, and nearly orthogonal to the factor loadings. We introduce two rotations, an expanded $r\times R$ map $H'$ and a compressed $R\times r$ map $H^{+}$, and establish consistency of the estimated factors under both. As an application, we analyze a factor-augmented regression for treatment-effect inference and prove $\sqrt{T}$-asymptotic normality for every fixed $R \ge r$. These results provide a theoretical underpinning for the common empirical practice of adopting a conservative upper bound on the number of factors, and shift the analytical burden from consistent dimension selection to the milder requirement of bounding $r$ from above.

2605.18425 2026-05-19 cs.LG math.ST stat.TH

Generative Adversarial Learning from Deterministic Processes

从确定性过程生成对抗学习

Joris C. Kühl, Hanno Gottschalk

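AI summary: This paper studies why generative adversarial networks succeed on non-i.i.d. data. It proves that an infinite-dimensional model of generative adversarial learning can learn the invariant distribution of a chaotic dynamical system from a single deterministic time series, and it gives convergence rates.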

详情
Comments
37 pages, 3 figures
AI中文摘要

物理人工智能正被成功应用于不遵循传统独立同分布(i.i.d.)样本 paradigm 的数据。事实上,物理人工智能常常在非随机数据上进行训练,这些数据来源于混沌动力系统,如湍流。我们旨在通过生成对抗网络(GANs)的例子来解释这些方法的实证成功,其统计学习理论在i.i.d.假设下通常被很好地理解。我们证明了使用无限维的生成对抗学习(GAL)模型,可以从单个确定性演变的时间序列中学习足够混沌的动力系统的不变分布,并以詹森-香农散度给出收敛到解的显式速率。

英文摘要

Physical AI is being successfully applied to data which does not follow the traditional paradigm of independent and identically distributed (i.i.d.) samples. In fact, physical AI is often trained on data which is not random at all, and is instead derived from chaotic dynamical systems like turbulence. We aim to explain the empirical success of these methods using the example of generative adversarial networks (GANs), whose statistical learning theory under the i.i.d. assumption is generally well understood. We prove that it is possible, using an infinite-dimensional model of generative adversarial learning (GAL), to learn the invariant distribution of a sufficiently chaotic dynamical system from a single deterministically evolving time series of its states or measurements thereof, and give explicit rates for the convergence to the solution in terms of the Jensen-Shannon divergence.

2605.18422 2026-05-19 stat.ML cs.LG math.ST stat.TH

Generalized Functional ANOVA in Closed-Form: A Unified View of Additive Explanations

广义函数ANOVA的闭式表达:加性解释的统一视角

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes

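AI summary: This paper proposes a closed-form generalized functional ANOVA, giving a unified framework for additive explanations that can decompose model predictions when the inputs are dependent.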

详情
Comments
34 pages, 23 Figures, 101 equations, 8 Tables
AI中文摘要

函数ANOVA,或Hoeffding分解,提供了一个原理性的框架用于可解释性,通过将模型预测分解为主效应和高阶交互作用。对于独立输入,这种经典分解是显式的。它与SHAP值、广义加性模型和正交多项式展开密切相关,因此构成了加性可解释性的重要工具。然而,在更一般和现实的依赖设置中,获得可处理的表示并从数据中估计分解仍然具有挑战性。在本文中,我们针对连续输入解决了这个问题。通过结合Hilbert空间方法与广义函数ANOVA,我们构建了一个显式的Riesz基分解,使得分解计算变得容易。我们的方法恢复了经典独立情况及其相关的正交分解。基于此表示,我们提出了一种简单但强大的算法,能够在模型无关的设置下从数据样本中估计分解,并通过与几种最先进的解释方法进行实证比较,展示了该方法的威力。

英文摘要

The functional ANOVA, or Hoeffding decomposition, provides a principled framework for interpretability by decomposing a model prediction into main effects and higher-order interactions. For independent inputs, this classical decomposition is explicit. It is closely connected to SHAP values, generalized additive models, and orthogonal polynomial expansions, and therefore constitutes a fundamental tool for additive explainability. In the more general and realistic dependent setting, however, obtaining a tractable representation and estimating the decomposition from data remain challenging. In this work, we address this problem for continuous inputs. By combining Hilbert space methods with the generalized functional ANOVA, we build an explicit Riesz basis for the decomposition, which makes the decomposition easy to compute. Our formulation recovers the classical independent case and its associated orthogonal decomposition. Building on this representation, we propose a simple but mighty algorithm to estimate the decomposition from a data sample in a model-agnostic setting and we compare it empirically with several state-of-the-art explanation methods, demonstrating the power of the approach.

2605.18406 2026-05-19 math.NA cs.NA stat.ML

Computational aspects of the Volterra Signature

Volterra签名的计算方面

Paul P. Hager, Fabian N. Harang, Luca Pelizzari, Samy Tindel

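AI summary: This paper studies how to compute the Volterra signature. It decomposes the Chen-type convolution relation and introduces several efficient algorithms (an approximation scheme with quadratic complexity O(J²), an FFT-based acceleration, and an exact recursion) to address the algorithmic challenges that the kernel introduces.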

详情
AI中文摘要

Volterra签名扩展了经典的路径签名,通过将其迭代积分结构中的通用矩阵值核纳入其中,从而获得时间序列的灵活记忆概念。其组成部分可以视为线性受控Volterra方程的连续Picard迭代,使得其精确计算具有额外的数学兴趣。然而,核的引入带来了显著的算法挑战。我们通过首先将[arXiv:2603.04525]中建立的Chen型卷积关系分解为解析和算术部分,然后引入几种高效的算法:一种通用的近似方案,其复杂度为O(J²),其中J是时间步数;一种基于FFT的加速方案,其复杂度为O(J log J),适用于在均匀网格上的卷积核;以及一种精确递归方案,其复杂度为O(JR²),适用于具有状态空间表示维度为R的核;保留标准签名复杂度在路径维度和截断级别N中的标准复杂度。我们进一步证明,矩阵值核形式为K(t,s)=∑_p k_p(t-s)A_p的因子数量不会增加J和N的渐近复杂度。最后,我们推导了与相关Volterra签名核相关的有限差分预测-校正方案。所有算法均在公开可用的、基于JAX的软件包“tensordev”中实现。

英文摘要

The Volterra signature extends the classical path signature by incorporating a general matrix-valued kernel into its iterated integral structure, yielding a flexible notion of memory for time series. Its components can be viewed as successive Picard iterates of linear controlled Volterra equations, making their exact computation of additional mathematical interest. However, the kernel introduces substantial algorithmic challenges. We provide a resolution by first decomposing the Chen-type convolution relation established in [arXiv:2603.04525] into analytic and arithmetic parts, and then introducing several efficient algorithms: a general approximative scheme with quadratic complexity $O(J^2)$ in the number of time steps $J$, an FFT-based acceleration with complexity $O(J\log J)$ for convolution kernels on uniform grids, and an exact recursion with complexity $O(JR^2)$ for kernels admitting a state-space representation of dimension $R$; retaining standard signature complexity in the path dimension and truncation level $N$. We further show that the number of factors in matrix-valued kernels of the form $K(t,s)=\sum_p k_p(t-s)A_p$ does not increase the asymptotic complexity in $J$ and $N$. Finally, we derive a finite-difference predictor--corrector scheme for the associated Volterra signature kernel. All algorithms are implemented in the publicly available JAX-based package "tensordev".

2605.18358 2026-05-19 math.ST stat.TH

Multi-state model with temporal-consistent survival analysis for homogeneous Markov chains

具有时间一致生存分析的多状态模型用于同质马尔可夫链

Mikael Escobar-Bach, Alexandre Popier, Malo Sahin

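AI summary: This paper proposes a new method, based on temporal-consistent survival analysis, for estimating first hitting-time distributions to specified terminal states in time-homogeneous Markov chains. It also discusses cured individuals, proposes an estimator of the cure rate, and gives non-asymptotic theoretical guarantees.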

详情
AI中文摘要

在本研究中,我们考虑从时间同质马尔可夫链中抽取的序列,并引入了一种新的方法来估计指定终端状态的首次到达时间分布。我们的方法基于时间一致的生存分析,这使得能够从任何转移率和转移概率的估计中构建一致的分布估计器。在这一工作中,我们还讨论了那些从未达到终端状态的链中的治愈个体问题,并提出了治愈率估计器。此外,我们为我们的方法推导了非渐近的理论保证,并应用了核型估计器。后者方法在使用通用数据和涉及接受骨髓移植的患者的真实应用中进行了模拟研究。

英文摘要

In this study, we consider sequences drawn from time-homogeneous Markov chains and introduce a novel approach for estimating first hitting-time distributions to specified terminal states. Our methodology is based on the temporal-consistent survival analysis that facilitates the construction of consistent estimators of the distributions from any estimates of the transition rate and transition probabilities. In this line of work, we also discuss the issue of cured individuals with chains that never reach a terminal state, and propose an estimator of the cure rate. Furthermore, we derive non-asymptotic theoretical guarantees for our approach and apply our methodology with kernel type estimators. The latter approach is illustrated in a simulation study using generic data and a real-life application involving patients undergoing bone marrow transplants.

2605.18339 2026-05-19 stat.ME math.ST stat.TH

Compositional Periodic Spline Approximation for Circular Density Data in Bayes Spaces

基于贝叶斯空间的组合周期样条近似用于圆密度数据

Jitka Machalová, Jana Heckenbergerová, Karel Hron

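AI summary: This paper proposes a new framework for approximating and analyzing circular density data with compositional periodic splines in Bayes spaces equipped with a Hilbert space structure. The centered log-ratio transformation represents densities in a subspace of the standard L² space, so functional data analysis tools can be applied while preserving the relative nature of distributions and their periodic structure.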

详情
AI中文摘要

本文提出了一种利用贝叶斯空间中的希尔伯特空间结构,通过组合周期样条对圆密度数据进行近似和分析的新框架。通过应用中心对数比变换,密度被表示为标准L²空间的子空间,这使得能够使用函数数据分析工具,同时保持分布的相对性质和周期结构。开发了具有零积分约束的系数基周期样条构造,以及用于平滑样条和惩罚样条的矩阵公式,允许高效估计和实现。该方法应用于长期风向数据,提供了平滑且可解释的密度估计,并支持进一步的统计分析,包括函数回归。结果展示了所提出方法的实用相关性和扩展到更复杂密度值数据的潜力。

英文摘要

This paper proposes a novel framework for the approximation and analysis of circular density data using compositional periodic splines within Bayes spaces with the Hilbert space structure. By applying the centered log-ratio transformation, densities are represented in a subspace of the standard $L^2$ space of real-valued functions, which enables the use of functional data analysis tools while preserving the relative nature of distributions and their periodic structure. A coefficient-based construction of periodic splines with a zero-integral constraint is developed, together with matrix formulations for both smoothing splines and penalized splines, allowing efficient estimation and implementation. The methodology is applied to long-term wind direction data, where it provides smooth and interpretable density estimates and supports further statistical analysis, including functional regression. The results demonstrate the practical relevance of the proposed approach and its potential for extensions to more complex density-valued data.

2605.18338 2026-05-19 stat.AP cs.LG

Robust Player-Conditional Champion Ranking for League of Legends: Style Similarity, Mastery Priors, and Archetype-Constrained Discovery

《英雄联盟中稳健的玩家条件冠军排名:风格相似性、熟练度先验知识和范式约束发现》

Min Heo, Pranav Kadiyam, Prasun Panthi

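AI summary: This paper proposes a robust player-conditional champion ranking method that combines style similarity, mastery priors, and archetype constraints to address champion recommendation in League of Legends.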

详情
Comments
11 pages, 3 figures
AI中文摘要

在多人在线战斗竞技场游戏中,冠军推荐通常被非正式地视为元游戏强度、个人舒适度或全局胜率的问题。我们正式将《英雄联盟》中的冠军推荐建模为一个可解释的、玩家条件的排名问题,该问题在稀疏、嘈杂和非平稳的行为数据下进行。所提出的框架结合了四个信息源:人口强度代理、玩家风格相似性、直接和间接熟练度先验知识以及范式级的保护措施。该方法使用稳健的中位数/MAD标准化、对数转换用于偏斜事件计数、近期加权的玩家风格向量、熟练度加权的冠军池向量、加权余弦相似度、排名缩放的得分组件以及k-means++聚类用于粗略的范式支持。实现原型使用Python/Pandas建模层、Supabase支持的存储以及面向网页的推荐接口。与黑箱监督胜利预测系统不同,所提出的方法返回分解的推荐评分,可以作为预期性能代理、拟合、熟练度和范式兼容性的检查。包含一个单人案例研究,针对玩家标识符DIVINERAINRACCON的100场比赛历史进行端到端的合理性检查。因此,本文是一项方法和系统贡献:它指定了一个可重复、模块化和可审计的冠军推荐器,并通过时间训练-测试分割、下一冠军恢复、校准分析和消融研究提供了未来大规模评估的验证协议。

英文摘要

Champion recommendation in multiplayer online battle arena games is usually framed informally as a problem of metagame strength, personal comfort, or global win rate. We formalize champion recommendation in League of Legends as an interpretable, player-conditional ranking problem under sparse, noisy, and non-stationary behavioral data. The proposed framework combines four information sources: a population-strength proxy, player-style similarity, direct and indirect mastery priors, and archetype-level guardrails. The method uses robust median/MAD normalization, logarithmic transforms for skewed event counts, recency-weighted player style vectors, mastery-weighted champion-pool vectors, weighted cosine similarity, rank-scaled score components, and k-means++ clustering for coarse archetype support. The implemented prototype uses a Python/Pandas modeling layer, Supabase-backed storage, and a web-facing recommendation interface. Unlike black-box supervised win-prediction systems, the proposed method returns decomposed recommendation scores that can be inspected as expected-performance proxy, fit, mastery, and archetype compatibility. A single-player case study on a 100-game history for the player identifier DIVINERAINRACCON is included as an end-to-end sanity check. The manuscript is therefore a methods and systems contribution: it specifies a reproducible, modular, and auditable champion recommender and gives a validation protocol for future large-scale evaluation through temporal train-test splits, next-champion recovery, calibration analysis, and ablation studies.
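
Three of the building blocks named in the abstract (median/MAD normalization, logarithmic transforms for skewed counts, and weighted cosine similarity) can be sketched with the Python standard library. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the 1.4826 consistency constant and the use of `log1p` are conventional choices that the abstract does not confirm.

```python
import math
from statistics import median
from typing import List, Sequence

# Conventional factor that makes the MAD comparable to a standard deviation
# for Gaussian data (an assumption; the abstract does not state the scaling).
MAD_TO_SD = 1.4826


def robust_z(values: Sequence[float]) -> List[float]:
    """Center by the median and scale by the (rescaled) median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = MAD_TO_SD * mad
    if scale == 0.0:
        return [0.0 for _ in values]  # degenerate feature: no spread to scale by
    return [(v - med) / scale for v in values]


def log_counts(counts: Sequence[float]) -> List[float]:
    """Compress skewed, non-negative event counts with log(1 + x)."""
    return [math.log1p(c) for c in counts]


def weighted_cosine(u: Sequence[float], v: Sequence[float], w: Sequence[float]) -> float:
    """Cosine similarity in which feature i contributes with non-negative weight w[i]."""
    dot = sum(wi * ui * vi for ui, vi, wi in zip(u, v, w))
    norm_u = math.sqrt(sum(wi * ui * ui for ui, wi in zip(u, w)))
    norm_v = math.sqrt(sum(wi * vi * vi for vi, wi in zip(v, w)))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(robust_z([1, 2, 3, 4, 100]))  # the outlier barely moves the center or scale
    print(weighted_cosine([1, 1], [1, -1], [3, 1]))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme game does not shift them much, which matters for sparse and noisy match histories.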

2605.18324 2026-05-19 cs.CV cs.AI cs.GR cs.LG stat.ML

Improved Baselines with Representation Autoencoders

改进的基于表示自动编码器的基线

Jaskirat Singh, Boyang Zheng, Zongze Wu, Richard Zhang, Eli Shechtman, Saining Xie

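AI summary: This paper examines design choices for Representation Autoencoders (RAE) and reports three insights that simplify and improve RAE. First, it studies a generalized formulation that defines the representation as the sum of the last k encoder layers rather than the final layer alone. Second, it examines the assumption that RAE replaces representation alignment (REPA) and finds that the two have complementary working mechanisms. Third, it improves RAE under classifier-free guidance (CFG): re-parameterizing the output of the DiT model provides guidance without training a second model. RAEv2 reaches a gFID of 1.06 on ImageNet-256 with markedly better training efficiency.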

详情
英文摘要

Representation Autoencoders (RAE) replace traditional VAE with pretrained vision encoders. In this paper, we systematically investigate several design choices and find three insights which simplify and improve RAE. First, we study a generalized formulation where the representation is defined as sum of the last k encoder layers rather than solely the final layer. This simple change greatly improves reconstruction without encoder finetuning or specialized data (e.g., text, faces). Second, we study the prevalent assumption that RAE (using pretrained representation as encoder) replaces representation alignment (REPA), which distills the same representation to intermediate layers instead. Through large-scale empirical analysis, we uncover a surprising finding: RAE and REPA exhibit complementary working mechanisms, allowing the same representation to be used as both encoder and target for intermediate diffusion layers. Finally, the original RAE struggles with classifier-free guidance (CFG) and requires training a second, weaker diffusion model for AutoGuidance (AG). We show that REPA itself can be viewed as x-prediction in RAE latent space. By simply re-parameterizing the output of the DiT model, it can provide guidance for "free". Overall, RAEv2 leads to more than 10x faster convergence over the original RAE, achieving a state-of-the-art gFID of 1.06 in just 80 epochs on ImageNet-256. On FDr^k, RAEv2 achieves a state-of-the-art 2.17 at just 80 epochs compared to the previous best 3.26 (800 epochs) without any post-training. This motivates EP_FID@k (epochs to reach unguided gFID <= k) as a measure of training efficiency. RAEv2 attains an EP_FID@2 of 35 epochs, versus 177 for the original RAE. We also validate our approach across diverse settings for text-to-image generation and navigation world models, showing consistent improvements. Code is available at https://raev2.github.io.

2605.18315 2026-05-19 math.OC stat.ML

Attention-based PCA

基于注意力的PCA

Rodrigo Maulen-Soto, Claire Boyer

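AI summary: This paper studies attention mechanisms on the unsupervised problem of PCA. It proves that, when trained on Gaussian data, softmax and linear attention layers learn parameters aligned with the principal eigenvectors of the covariance matrix, which establishes a direct connection with PCA, and it extends the analysis to an in-context setting.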

详情
AI中文摘要

我们通过一个经典无监督问题——主成分分析(PCA)的视角研究注意力机制。我们证明,当在高斯数据上训练时,softmax和线性注意力层学习的参数与协方差矩阵的主特征向量对齐,从而建立了与PCA的直接且明确的联系。我们的分析涵盖了有限和无限提示范围。在无限提示极限下,我们证明收敛到与主谱方向对齐的全局最优解;而在有限提示设置中,我们显示相同的行为在采样效应范围内出现。我们进一步将分析扩展到具有突出Wishart协方差的上下文设置中,其中注意力成功地恢复了底层信号方向。这些结果表明,在无监督目标下,注意力本质上执行类似于PCA的计算,为其实现表示学习能力提供了理论基础。

英文摘要

We study attention mechanisms through the lens of a canonical unsupervised problem: principal component analysis (PCA). We show that, when trained on Gaussian data, both softmax and linear attention layers learn parameters that align with the principal eigenvectors of the covariance matrix, thereby establishing a direct and explicit connection with PCA. Our analysis covers both finite and infinite prompt regimes. In the infinite-prompt limit, we prove convergence to globally optimal solutions aligned with the leading spectral direction, while in the finite-prompt setting we show that the same behavior emerges up to sampling effects. We further extend the analysis to an in-context setting with spiked Wishart covariances, where attention successfully recovers the underlying signal direction. These results demonstrate that attention inherently performs PCA-like computations under unsupervised objectives, providing a theoretical foundation for its representation-learning capabilities.

2605.18276 2026-05-19 stat.ML cs.LG

Geometric Dictionary Learning of Dynamical Systems with Optimal Transport

通过最优传输的几何字典学习动力系统

Thibaut Germain, Sami Chemlal, Rémi Flamary, Vladimir R. Kostic, Karim Lounici

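AI summary: This paper proposes DOODL, a framework that uses geometric dictionary learning to learn a low-dimensional manifold in spectral operator space, yielding efficient representations of complex dynamical systems and interpretable operator estimation.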

详情
AI中文摘要

通过算子理论表示学习动力系统提供了一个强大的框架,用于分析复杂动态,因为诸如特征值和不变结构等谱量编码了特征时间尺度和长期行为。然而,动力算子通常独立地为每个系统估计,阻止了发现相关动态中的共享结构。为了解决这一限制,我们提出相关动力系统位于谱算子空间中的低维流形附近。基于这一假设,我们引入DOODL(Dynamical OperatOr Dictionary Learning),一个框架,学习一组特征谱动态的字典,其组合近似该流形并产生紧凑、可解释的个体系统嵌入。除了表征学习外,DOODL通过将估计限制在学习的算子流形上,使从短且部分观测轨迹中快速且可解释地估计算子成为可能。在metastable Langevin动力学和湍流等离子体模拟中的实验表明,DOODL能够扩展到高度复杂的多尺度区域,同时捕捉支配动态的特征谱结构,而不是仅仅拟合轨迹,在具有挑战性的低数据区域中,其误差比独立算子估计方法低一个到两个数量级。

英文摘要

Learning dynamical systems through operator-theoretic representations provides a powerful framework for analyzing complex dynamics, as spectral quantities such as eigenvalues and invariant structures encode characteristic time scales and long-term behavior. However, dynamical operators are typically estimated independently for each system, preventing the discovery of shared structure across related dynamics. To address this limitation, we posit that related dynamical systems lie near a low-dimensional manifold in spectral operator space. Based on this hypothesis, we introduce DOODL (Dynamical OperatOr Dictionary Learning), a framework that learns a dictionary of characteristic spectral dynamics whose combinations approximate this manifold and yield compact, interpretable embeddings of individual systems. Beyond representation learning, DOODL enables fast and interpretable operator estimation from short and partially observed trajectories by constraining the estimation to the learned operator manifold. Experiments on metastable Langevin dynamics and turbulent plasma simulations demonstrate that DOODL scales to highly complex multiscale regimes while capturing characteristic spectral structure governing the dynamics rather than merely fitting trajectories, achieving errors one to two orders of magnitude lower than independent operator estimation methods in challenging low-data regimes.

2605.18206 2026-05-19 stat.ME

A tool to determine the degrees of freedom in tree-structured varying coefficient models

确定树状结构变系数模型自由度的工具

Nikolai Spuck, Moritz Berger

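AI summary: This paper proposes a formula for the degrees of freedom of tree-structured varying coefficient models, uses it for model selection with the Bayesian information criterion, and shows in a simulation study that it gives more accurate selection and better predictive ability than the naive approach of counting free parameters.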

详情
AI中文摘要

树状结构变系数(TSVC)模型是一种灵活的广义回归方法,其中协变量的线性效应允许随着效应修饰变量的值而变化。相关效应修饰因子和交互作用通过递归分割来识别。在TSVC模型中,如同其他半参数和非参数回归方法一样,需要考虑数据驱动模型构建的成本以推导模型自由度(DoF)。为了解决这一问题,我们开发了一种易于应用的公式来近似TSVC模型的自由度。该公式用于基于贝叶斯信息准则(BIC)的模型选择,并在模拟研究中与将自由度设为自由模型参数的朴素解进行比较。为了说明所提出的自由度方法,使用BIC基于选择的TSVC模型被拟合到欧洲健康、老龄化和退休调查的数据上。结果表明,使用所提出公式计算自由度导致了更准确的选择结果,并提高了预测能力。

英文摘要

The tree-structured varying coefficient (TSVC) model is a flexible approach for generalized regression, where the linear effects of the covariates are allowed to vary with the values of effect modifiers. Relevant effect modifiers and interactions are identified using recursive partitioning. In TSVC models, analogously to other semi- and nonparametric regression approaches, one needs to account for the cost of data-driven model building when deriving the model degrees of freedom (DoF). To address this issue, we develop an easy-to-apply formula to approximate the DoF of a TSVC model. This formula is employed for model selection based on the Bayesian information criterion (BIC) and compared to the naive solution, setting the DoF to the number of free model parameters, in a simulation study. To illustrate the proposed DoF method, TSVC models using BIC-based selection were fitted to data from the Survey of Health, Ageing, and Retirement in Europe. Results indicated that calculation of the DoF using the proposed formula resulted in more accurate selection results with improved predictive ability.
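
To see why the degrees of freedom matter for BIC-based selection, the sketch below applies the standard BIC formula to two candidate models. The log-likelihoods, sample size, and the "corrected" DoF value are invented purely for illustration; the paper's approximation formula is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, dof: float, n_obs: int) -> float:
    """Bayesian information criterion: -2 * log-likelihood + dof * log(n)."""
    return -2.0 * log_likelihood + dof * math.log(n_obs)


def select_by_bic(candidates: Dict[str, Tuple[float, float]], n_obs: int) -> str:
    """Return the name of the candidate (log-likelihood, dof) with the smallest BIC."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))


if __name__ == "__main__":
    n = 500  # hypothetical sample size
    # Naive DoF: the tree model is charged only for its 6 free parameters.
    naive = {"linear": (-490.0, 3), "tree": (-480.0, 6)}
    # Hypothetical corrected DoF: the split search is charged as well.
    corrected = {"linear": (-490.0, 3), "tree": (-480.0, 11)}
    print(select_by_bic(naive, n), select_by_bic(corrected, n))
```

In this invented example the tree model wins under the naive count but loses once the data-driven split search is charged for, which is the kind of reversal a more careful DoF is meant to capture.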

2605.18204 2026-05-19 stat.ML cs.LG

Forward-Learned Discrete Diffusion: Learning how to noise to denoise faster

前向学习离散扩散:学习如何更快地噪声去噪声

Grigory Bartosh, Teodora Pandeva, Sushrut Karmalkar, Javier Zazo

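AI summary: This paper proposes Forward-Learned Discrete Diffusion (FLDD), which introduces a learnable forward (noising) process to narrow the gap between the target and model distributions and enable few-step generation. The method uses a non-Markovian formulation with learnable marginal and posterior distributions, so the generative process stays factorized while matching the target defined by the noising process. Experiments show that, for the same number of sampling steps, FLDD produces higher-quality samples than conventional discrete diffusion models.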

详情
AI中文摘要

离散扩散模型是一类强大的生成模型,在许多领域表现出色。然而,为了效率,离散扩散通常用因子化分布参数化生成(反向)过程,这使得模型难以在少量步骤内学习目标过程,并需要长且计算成本高的采样过程。为减少目标与模型分布之间的差距并实现少步生成,我们提出前向学习离散扩散(FLDD),引入可学习的前向(噪声)过程。不同于固定马尔可夫前向链,我们采用非马尔可夫形式,结合可学习的边缘和后验分布。这使生成过程保持因子化,同时匹配由噪声过程定义的目标。我们通过标准变分目标端到端训练所有参数。在各种基准测试中,实验表明,对于给定的采样步数,我们的方法生成的样本质量优于使用相同反向参数化的传统离散扩散模型。

英文摘要

Discrete diffusion models are a powerful class of generative models with strong performance across many domains. For efficiency, however, discrete diffusion typically parameterizes the generative (reverse) process with factorized distributions, which makes it difficult for the model to learn the target process in a small number of steps and necessitates a long, computationally expensive sampling procedure. To reduce the gap between the target and model distributions and enable few-step generation, we propose Forward-Learned Discrete Diffusion (FLDD), which introduces discrete diffusion with a learnable forward (noising) process. Rather than fixing a Markovian forward chain, we adopt a non-Markovian formulation with learnable marginal and posterior distributions. This allows the generative process to remain factorized while matching the target defined by the noising process. We train all parameters end-to-end under the standard variational objective. Experiments on various benchmarks show that, for a given number of sampling steps, our approach produces higher-quality samples than conventional discrete diffusion models using the same reverse parameterization.

2605.18180 2026-05-19 stat.ML cs.LG

Canonical Regularisation of Wide Feature-Learning Neural Networks

宽特征学习神经网络的规范正则化

George Whittle, Pranav Vaidhyanathan, Juliusz Ziomek, Natalia Ares, Maike A. Osborne

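AI summary: This paper studies the regularisation implied by gradient flow training in wide feature-learning neural networks. It shows that ridge regularisation, well studied in the kernel regime, distorts the inductive bias in the feature-learning regime, and it proposes arc ridge as a scalable surrogate that extends ridge regularisation to the feature-learning regime.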

详情
AI中文摘要

宽神经网络在特征学习范式中推动了现代深度学习的发展,但它们的研究远少于核范式中的网络。我们考虑了这两个范式之间一个关键但研究不足的差异:梯度流训练所隐含的正则化和先验。这种规范正则化性质在核范式网络中已被广泛研究——在所有无限全局极小点中,梯度流精确选择消失的岭解——并支撑了著名的NN-GP对应关系,精确允许在训练过程中建模噪声。然而,我们证明在特征学习范式网络中,岭正则化会扭曲梯度流的诱导偏差,即使在正则化趋于零的极限下也是如此。在训练过程中,岭正则化会扭曲网络的诱导偏差,尤其对预训练网络造成损害,因为隐含的先验信息是有信息的。我们通过将规范正则化作为一种无关范式函数空间能量和提升函数来公理化,这在核范式中唯一识别岭解,并且关键地扩展到特征学习范式。通过研究特征学习网络的黎曼几何,我们从框架中推导出黎曼几何岭,将岭扩展到特征学习范式。相应地,我们证明规范函数空间先验是一个黎曼-高斯过程,扩展了更熟悉的高斯过程。作为实际贡献,我们提出了弧岭作为最小最大鲁棒、可扩展的替代方案,揭示了早停和规范正则化在学习范式中的深刻关系。最后,我们在图像处理和NLP迁移学习问题上展示了我们的理论后果。

英文摘要

Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well-studied in kernel regime networks -- of the infinitely many global minima, gradient flow selects exactly the vanishing ridge solution -- and underpins the celebrated NN-GP correspondence, precisely allowing the modelling of noise during training. However, we prove ridge regularisation biases gradient flow in feature-learning regime networks, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with particular damage done to pretrained networks where the implicit prior is informative. We resolve this by axiomatising the canonical regulariser as a regime-agnostic function-space energy and lift, which uniquely identifies ridge in the kernel regime, and crucially generalises to the feature-learning regime. By studying the Riemannian geometry of feature-learning networks, we derive geodesic ridge from our framework, generalising ridge to the feature-learning regime. Correspondingly, we prove the canonical function-space prior is a Riemannian Gibbs Process, generalising the more familiar Gaussian Process. As a practical contribution, we propose arc ridge as a minimax-robust, scalable surrogate to geodesic ridge, revealing a deep relationship between early stopping and canonical regularisation across learning regimes. Finally, we demonstrate the consequences of our theory empirically on both image processing and NLP transfer-learning problems.

2605.18174 2026-05-19 cs.LG cs.DC math.OC stat.ML

Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method

Ringmaster LMO: 异步线性最小化Oracle动量方法

Abdurakhmon Sadiev, Artavazd Maranjyan, Ivan Ilin, Peter Richtárik

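AI summary: This paper proposes Ringmaster LMO, an asynchronous Linear Minimization Oracle momentum method for unconstrained stochastic nonconvex optimization. It uses a delay-thresholding mechanism to improve on conventional synchronous methods in heterogeneous distributed systems, and experiments show that its advantage grows as system heterogeneity increases.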

详情
AI中文摘要

Muon最近作为一种强大的替代AdamW方法出现,展现出大规模预训练的良好结果和矩阵结构更新在实践中可能更快的证据。然而,Muon以及更一般的线性最小化Oracle(LMO)方法通常用于同步方式。这在异构分布式系统中存在问题,因为工人完成梯度计算的速度不同,同步训练必须反复等待较慢的工人。本文引入Ringmaster LMO,一种用于无约束随机非凸优化的异步LMO基于动量方法。我们的方法基于Ringmaster ASGD的延迟阈值思想。对于SGD类型方法,Ringmaster ASGD通过丢弃过于陈旧的梯度实现最优时间复杂度。Ringmaster LMO将这一机制扩展到一般LMO更新。我们建立了在广义$(L_0, L_1)$-平滑条件下的收敛保证,并进一步开发了参数无关变体,具有递减步长和自适应延迟阈值。最后,我们将我们的迭代保证转换为在异构工人计算时间下的时间复杂度界限。在经典欧几里得平滑设置中,这些界限恢复了Ringmaster ASGD的最优时间复杂度。在随机二次问题和NanoChat语言模型预训练中的实验表明,Ringmaster LMO的优势随着系统异构性增加而增强,并且该方法在同步和异步基线方法中表现更优。

英文摘要

Muon has recently emerged as a strong alternative to AdamW for training neural networks, with encouraging large-scale pretraining results and growing evidence that matrix-structured updates can be faster in practice. Yet Muon, and more generally Linear Minimization Oracle (LMO) based methods, are typically used synchronously. This is problematic in heterogeneous distributed systems, where workers complete gradient computations at different speeds and synchronous training must repeatedly wait for slower workers. In this work, we introduce Ringmaster LMO, an asynchronous LMO-based momentum method for unconstrained stochastic nonconvex optimization. Our method builds on the delay-thresholding idea of Ringmaster ASGD. For SGD-type methods, Ringmaster ASGD achieves optimal time complexity by discarding overly stale gradients. Ringmaster LMO extends this mechanism to general LMO-based updates. We establish convergence guarantees under generalized $(L_0, L_1)$-smoothness and further develop a parameter-agnostic variant with decreasing stepsizes and adaptive delay thresholds. Finally, we translate our iteration guarantees into time complexity bounds under heterogeneous worker computation times. In the classical Euclidean smooth setting, these bounds recover the optimal time complexity of Ringmaster ASGD. Experiments on stochastic quadratic problems and NanoChat language-model pretraining show that the advantages of Ringmaster LMO grow with system heterogeneity and that the method outperforms strong synchronous and asynchronous baselines.
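
The delay-thresholding idea described in the abstract (discard gradients that are too stale, apply the rest through an LMO step with momentum) can be illustrated with a toy, single-process simulation. This is only a sketch under assumptions and not the authors' algorithm: the l-infinity-ball oracle is swapped in for the matrix-structured oracle used by Muon, the momentum form is a plain exponential average, and all names and parameter values (`max_delay`, `step`, `beta`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sign_lmo(direction: Sequence[float]) -> Vector:
    """LMO over the unit l-infinity ball: the d with ||d||_inf <= 1 minimizing <direction, d>."""
    return [-1.0 if g > 0 else 1.0 if g < 0 else 0.0 for g in direction]


def async_lmo_momentum(
    x0: Sequence[float],
    grad: Callable[[Sequence[float]], Vector],
    delays: Sequence[int],
    max_delay: int,
    step: float = 0.1,
    beta: float = 0.9,
) -> Tuple[Vector, int, int]:
    """Simulate a server that receives one gradient per entry of `delays`.

    A delay of d means the gradient was computed at the iterate from d
    server updates ago. Gradients with delay >= max_delay are discarded.
    Returns the final iterate and the counts of applied and discarded gradients.
    """
    history = [list(x0)]
    momentum = [0.0] * len(x0)
    applied = discarded = 0
    for delay in delays:
        # Too stale, or referring to an iterate that never existed: skip it.
        if delay >= max_delay or delay >= len(history):
            discarded += 1
            continue
        g = grad(history[-1 - delay])  # gradient evaluated at the stale iterate
        momentum = [beta * m + (1.0 - beta) * gi for m, gi in zip(momentum, g)]
        d = sign_lmo(momentum)
        history.append([xi + step * di for xi, di in zip(history[-1], d)])
        applied += 1
    return history[-1], applied, discarded


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    print(async_lmo_momentum([1.0, -2.0], lambda x: list(x), [0, 1, 5, 0, 3, 1], max_delay=2))
```

In the toy run, the two arrivals with delays 5 and 3 are dropped and the other four move each coordinate a fixed step toward the minimizer.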

2605.18167 2026-05-19 stat.ME

1-truncated C-vine copula mixed models for network meta-analysis of multiple diagnostic tests

1-truncated C-vine copula混合模型用于多诊断测试网络元分析

Aristidis K. Nikoloulopoulos

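AI summary: This paper proposes flexible and powerful 1-truncated C-vine copula mixed models for network meta-analysis of multiple diagnostic tests, with the aim of improving comparisons of the accuracy of multiple diagnostic tests.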

详情
AI中文摘要

随着多诊断测试的元分析对临床决策和患者健康的影响日益增加,统计模型在整合比较多个诊断测试的研究证据方面受到越来越多的关注。为了在单个研究中比较多个诊断测试的准确性,三种设计被广泛使用:(i)多测试比较设计;(ii)随机设计;(iii)非比较设计。广义线性混合模型(GLMMs)目前是联合元分析这三种设计数据的推荐方法,能够实现同时推断。在此背景下,提出1-truncated C-vine copula混合模型作为一种灵活且强大的替代方法。这些模型通过允许随机效应的任意单变量分布,并捕捉尾部依赖性和不对称性,扩展了GLMM框架。我们通过广泛的模拟研究和对深静脉血栓诊断测试网络元分析案例的深入重新分析,展示了我们方法的实用性。结果表明,1-truncated C-vine copula混合模型在GLMMs之上可以提供改进,支持其在多诊断测试网络元分析中的采用。

英文摘要

As meta-analysis of multiple diagnostic tests impacts clinical decision making and patient health, there is growing interest in statistical models that synthesize evidence from studies comparing multiple diagnostic tests. To compare the accuracy of multiple diagnostic tests in a single study, three designs are commonly used: (i) the multiple test comparison design; (ii) the randomized design, and (iii) the non-comparative design. Generalized linear mixed models (GLMMs) are currently the recommended approach for jointly meta-analyzing data from all three designs, enabling simultaneous inference. In this context, 1-truncated C-vine copula mixed models are proposed as a flexible and powerful alternative. These models generalize the GLMM framework by allowing for arbitrary univariate distributions of the random effects and capturing tail dependencies and asymmetries. We demonstrate the utility of our methods with an extensive simulation study and by insightfully re-analysing a case study on the network meta-analysis of diagnostic tests for deep vein thrombosis. Findings indicate that 1-truncated C-vine copula mixed models can offer improvements over GLMMs, supporting their adoption for network meta-analysis of multiple diagnostic tests.

2605.14565 2026-05-19 stat.ME math.ST stat.AP stat.TH

A Bayesian Longitudinal Spatial Normative Model for Individualized Brain Deviation Mapping

一个用于个性化大脑偏差映射的贝叶斯纵向空间规范模型

J. T. Korley

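AI summary: This paper proposes a Bayesian longitudinal spatial normative model that jointly captures within-subject temporal dependence and spatially structured subject-specific deviations in a unified hierarchical framework. It reduces deviation-map reconstruction error across several simulation scenarios and, in an application to OASIS-3 structural MRI data, lowers RMSE by 54% relative to the independent cross-sectional model and by 45% relative to the longitudinal non-spatial model.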

详情
AI中文摘要

规范建模通过将受试者与参考人群进行比较而不是群体平均来实现对结构性大脑偏差的个性化表征。大多数现有实现将大脑区域独立处理且保持横断面,尽管有重复神经影像测量可用以及神经解剖变异的已知空间组织。我们提出了一种贝叶斯纵向空间规范模型,该模型在一个统一的分层框架中联合捕捉个体内部的时间依赖性和空间结构化的个体偏差。个体化偏差图被视为一个具有显式后验分布的潜在空间过程,从而在平方误差损失下获得一个原理性的贝叶斯估计器,而不是任意的残差总结。在六个涵盖不同空间依赖性、非线性轨迹、不规则访问计划和缺失随访的模拟场景中,所提出的模型在独立的横断面和纵向非空间基准上一致地减少了偏差图重建误差,同时保持了稳定的校准。在OASIS-3结构MRI数据的应用中,该模型相对于独立的横断面模型将RMSE降低了54%,相对于纵向非空间模型降低了45%。区域偏差负担集中在颞极、海马回、下颞叶皮层、后扣带回和旁海马回,这些区域与早期阿尔茨海默病型神经退行性变相关。个体层面的概况揭示了区域异常模式的显著异质性,包括显著的多区域偏差但保持全球认知分数。

英文摘要

Normative modeling enables individualized characterization of structural brain deviations by evaluating subjects against a reference population rather than a group average. Most existing implementations treat brain regions independently and remain cross-sectional, despite the availability of repeated neuroimaging measurements and the well-documented spatial organization of neuroanatomical variation. We propose a Bayesian longitudinal spatial normative model that jointly captures within-subject temporal dependence and spatially structured subject-specific deviations within a unified hierarchical framework. The individualized deviation map is treated as a latent spatial process with an explicit posterior distribution, yielding a principled Bayes estimator under squared error loss rather than an ad hoc residual summary. Across six simulation scenarios encompassing varying spatial dependence, nonlinear trajectories, irregular visit schedules, and missing follow-up, the proposed model consistently reduced deviation-map reconstruction error relative to independent cross-sectional and longitudinal non-spatial benchmarks while maintaining stable calibration. In an application to OASIS-3 structural MRI data, the model reduced RMSE by 54% relative to the independent cross-sectional model and by 45% relative to the longitudinal non-spatial model. Regional deviation burden was concentrated in the temporal pole, entorhinal cortex, inferior temporal cortex, posterior cingulate, and parahippocampal cortex, consistent with regions implicated in early Alzheimer-type neurodegeneration. Subject-level profiles revealed substantial heterogeneity in regional abnormality patterns, including marked multiregional deviation with preserved global cognitive scores.

2605.11617 2026-05-19 cs.LG math.ST stat.TH

MIST: Reliable Streaming Decision Trees for Online Class-Incremental Learning via McDiarmid Bound

MIST:通过McDiarmid界实现可靠的流决策树用于在线类增量学习

Phu-Hoa Pham, Chi-Nguyen Tran, Nguyen Lam Phu Quy, Dao Sy Duy Minh, Huynh Trung Kiet, Long Tran-Thanh

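AI summary: This paper proposes MIST, which addresses the unreliability of streaming decision trees in online class-incremental learning through three integrated components: a K-independent McDiarmid confidence radius for Gini splitting, a Bayesian inheritance protocol, and per-leaf KLL quantile sketches. The result is improved robustness on non-Gaussian geometry.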

详情
Comments
9 pages of main text, 5 figures
AI中文摘要

流决策树是开放世界持续学习的自然候选者,因为它们执行局部更新,具有有界内存,并且具有静态决策边界。尽管如此,它们仍然在在线类增量学习中失败,由于两个耦合的校准问题:(i)随着类别数K的增加,其分裂标准逐渐变得不可靠;(ii)在分裂时间缺乏知识转移。这两种失败的共同根源是信息增益的范围本质上与log2 K成比例。因此,任何基于它的Hoeffding式置信半径必然随着类别数的增长而增长,使得结构上独立于K的分裂标准不可能,从而剥夺了应用流决策树进行持续学习的潜在优势。为了解决这个问题,我们提出了MIST(McDiarmid增量流树),通过三个集成组件解决这两种失败:(i)一个紧致且独立于K的McDiarmid置信半径用于Gini分裂,作为结构正则化器;(ii)一个贝叶斯继承协议,通过截断高斯矩将父统计信息投影到子节点,方差减少保证在最保守的分裂时最强;(iii)每个叶子的KLL量化图支持连续阈值评估和几何自适应的叶子预测。在标准和压力测试表格流上,MIST在近高斯基准上与全局参数方法竞争,并在非高斯几何中表现出独特鲁棒性,其中SOTA基准崩溃。

英文摘要

Streaming decision trees are natural candidates for open-world continual learning, as they perform local updates, use bounded memory, and have static decision boundaries. Despite these properties, they still fail in online class-incremental learning because of two coupled miscalibrations: (i) their split criterion grows unreliable as the class count K expands, and (ii) no knowledge is transferred at split time. Both failures share a common root: the range of Information Gain intrinsically scales with log2 K. Consequently, any Hoeffding-style confidence radius derived from it must grow with the class count, which makes a K-independent split criterion structurally impossible and removes the potential benefits of applying streaming decision trees to continual learning. To fix this issue, we present MIST (McDiarmid Incremental Streaming Tree), which resolves both failures through three integrated components: (i) a tight, K-independent McDiarmid confidence radius for Gini splitting that acts as a structural regulariser; (ii) a Bayesian inheritance protocol that projects parent statistics to child nodes via truncated-Gaussian moments, with variance reduction guarantees strongest precisely when splitting is most conservative; and (iii) per-leaf KLL quantile sketches that support both continuous threshold evaluation and geometry-adaptive leaf prediction from a single data structure. On standard and stress-test tabular streams, MIST is competitive with global parametric methods on near-Gaussian benchmarks and uniquely robust on non-Gaussian geometry where SOTA benchmarks collapse.
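
The scaling argument in this abstract can be illustrated with the textbook Hoeffding radius, range × sqrt(ln(1/δ) / (2n)). The sketch below is not MIST's McDiarmid radius, which the abstract does not spell out. It only shows how a radius built on the information-gain range log2 K widens as K grows. The values of `DELTA` and `N_SAMPLES` are hypothetical.

```python
import math


def hoeffding_radius(value_range: float, delta: float, n: int) -> float:
    """Hoeffding radius for the mean of n samples of a quantity with the given range."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def info_gain_range(num_classes: int) -> float:
    """Information gain lies in [0, log2 K] because entropy is at most log2 K."""
    return math.log2(num_classes)


DELTA = 1e-6      # hypothetical failure probability
N_SAMPLES = 1000  # hypothetical number of examples seen at a leaf

for k in (2, 10, 100, 1000):
    radius = hoeffding_radius(info_gain_range(k), DELTA, N_SAMPLES)
    print(f"K={k:4d}  Hoeffding radius for information gain = {radius:.3f}")
```

By contrast, the Gini impurity 1 − Σ p_i² never exceeds 1 for any K, which is consistent with the abstract's choice of Gini splitting for a K-independent radius.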

2605.11365 2026-05-19 cs.AI cs.LG stat.ML

Causal Bias Detection in Generative Artificial Intelligence

生成人工智能中的因果偏见检测

Drago Plecko

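AI summary: This paper studies causal fairness in generative AI. It derives new causal decomposition results that quantify fairness impacts along different causal pathways and from the replacement of real-world mechanisms by the generative model's mechanisms, and it demonstrates the methodology by analyzing race and gender bias in large language models.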

详情
AI中文摘要

基于人工智能构建的自动化系统越来越多地应用于高风险领域,引发了关于公平性和现实世界中存在的人口差异持续存在的关键担忧。在此背景下,因果推断提供了一个有原则的框架来思考公平性,因为它将观察到的不平等与潜在机制联系起来,并自然与人类直觉和法律上的歧视观念相一致。先前关于因果公平性的研究主要集中在标准机器学习设置中,其中决策者为结果变量Y构建单一预测机制f_Ŷ,同时继承其他协变量的因果机制。然而,生成人工智能的设置却更加复杂:生成模型可以从任意条件下对任何变量集进行采样,隐式地构建了自己对所有因果机制的看法,而不是学习单一预测函数。这种根本性的差异要求因果公平性方法论有新的发展。我们正式定义了生成人工智能中的因果公平性问题,并在统一的理论框架下将其与标准机器学习设置相结合。然后,我们推导了新的因果分解结果,使能够对不同因果路径以及现实机制被生成模型机制替代的公平性影响进行精细量化。我们建立了识别条件并引入了用于因果感兴趣的量的高效估计器,并通过分析不同数据集中的大型语言模型中的种族和性别偏见来证明了我们方法的价值。

英文摘要

Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{\widehat Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly constructing their own beliefs about all causal mechanisms rather than learning a single predictive function. This fundamental difference requires new developments in causal fairness methodology. We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.

2605.09782 2026-05-19 cs.DS stat.ME

Near-Linear Time Generalized Sinkhorn Algorithms for Bounded Genus Graphs

近线性时间的广义Sinkhorn算法用于有界亏格图

Krzysztof Choromanski, Derek Long, Ananya Parashar, Dwaipayan Saha

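AI summary: This paper presents GenusSink, a class of near-linear-time approximate generalized Sinkhorn algorithms for bounded-genus (e.g. planar) graphs. It avoids the quadratic time complexity of the brute-force approach by combining separator-based graph decomposition, computational geometry techniques, and fast matrix-vector multiplication, and it yields more accurate optimal transport computations than other efficient Sinkhorn algorithms.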

详情
AI中文摘要

我们提出了GenusSink,一种新的近似广义Sinkhorn算法,用于具有最短路径距离成本的有界亏格(如平面图)图,提供近线性时间:(1)预处理,(2)迭代步骤,(3)最终运输计划矩阵查询和近线性内存。GenusSink处理的图包括特别是平面图和逼近3D对象的有界亏格网格。GenusSink通过利用图分离分解、计算几何技术以及新的快速矩阵向量乘法结果(特别是傅里叶分析和低位移秩理论)来解决其暴力方法的总二次时间复杂度问题。它受到最近在图论中对用小树宽度度量近似有界亏格度量的突破性进展的启发。图中心的方法使我们能够针对在由加权图近似表示的流形上定义的相应分布的最优运输问题。我们进行了严格的理论分析,提供了实际实现,利用了本文中引入的新数据结构分离图场积分器(S-GFIs),并展示了经验验证。GenusSink提供的计算精度比其他高效的Sinkhorn算法高多个数量级,同时在与基线相比时仍保证了显著的计算改进。作为所开发方法的副产品,我们证明GenusSink在具有O(log log n)树宽的n-顶点图上(例如树)与暴力地理Sinkhorn算法在数值上是等价的。

英文摘要

We present GenusSink, a new class of approximate generalized Sinkhorn algorithms with shortest-path-distance costs for bounded genus (e.g. planar) graphs, providing near-linear time (1) pre-processing, (2) iteration steps, and (3) final transport plan matrix querying, together with near-linear memory. Graphs handled by GenusSink include in particular planar graphs and bounded-genus meshes approximating 3D objects. GenusSink addresses the total quadratic time complexity of its brute-force counterpart by leveraging separator-based decomposition of graphs, computational geometry techniques, and new results on fast matrix-vector multiplications with generalized distance matrices, using, in particular, Fourier analysis and low displacement rank theory. It is inspired by recent breakthroughs in graph theory on approximating bounded genus metrics with small treewidth metrics \citep{minor-free-paper}. The graph-centric approach enables us to target optimal transport problems with the corresponding distributions defined on manifolds approximated by weighted graphs and with cost functions given by geodesic distances. We conduct a rigorous theoretical analysis of GenusSink, provide practical implementations that leverage the separation graph field integrator (S-GFI) data structures introduced in this paper, and present empirical verification. GenusSink provides orders of magnitude more accurate computations than other efficient Sinkhorn algorithms, while still guaranteeing significant computational improvements compared to the baseline. As a by-product of the developed methods, we show that GenusSink is numerically equivalent to the brute-force geodesic Sinkhorn algorithm on $n$-vertex graphs with treewidth $O(\log \log (n))$ (e.g. on trees).
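
For orientation, here is a minimal sketch of the brute-force baseline that the abstract contrasts against: dense Sinkhorn iterations with a shortest-path cost matrix. It is not GenusSink. The path graph, the marginals `a` and `b`, and the regularisation `eps` are hypothetical choices, and the dense n × n kernel is exactly the quadratic cost the paper aims to avoid.

```python
import numpy as np


def shortest_path_matrix(weights: np.ndarray) -> np.ndarray:
    """All-pairs shortest paths via Floyd-Warshall; np.inf marks a missing edge."""
    dist = weights.astype(float).copy()
    np.fill_diagonal(dist, 0.0)
    for k in range(dist.shape[0]):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    return dist


def sinkhorn_plan(a, b, cost, eps=1.0, iters=500):
    """Dense Sinkhorn iterations; each step costs O(n^2) because the kernel is n x n."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (kernel @ v)    # rescale rows towards marginal a
        v = b / (kernel.T @ u)  # rescale columns towards marginal b
    return u[:, None] * kernel * v[None, :]


# Hypothetical example: a 5-vertex path graph (a tree) with unit edge weights.
n = 5
weights = np.full((n, n), np.inf)
for i in range(n - 1):
    weights[i, i + 1] = weights[i + 1, i] = 1.0

cost = shortest_path_matrix(weights)
a = np.full(n, 1.0 / n)
b = np.array([0.4, 0.1, 0.1, 0.1, 0.3])
plan = sinkhorn_plan(a, b, cost)
print(plan.sum(axis=1), plan.sum(axis=0))
```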

2605.07855 2026-05-19 stat.AP

Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis

科学同行评审中的锯齿AI:来自POMP数据分析的证据

Jin Wook Lee, William Szegda, Zhisheng Song, Edward L. Ionides

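AI summary: This study examines AI performance in scientific peer review using 72 POMP data analysis projects. AI reviewers showed a jagged capability profile: they caught technical errors that human reviewers overlooked but fell short of human standards on interpretive errors, narrative coherence, and domain-informed model critique.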

详情
AI中文摘要

尽管人工智能在学术写作和统计分析中日益普及,但其在科学同行评审中的性能仍鲜有研究。一个关键挑战是锯齿AI现象,即AI在某些领域表现出强劲的能力跃升,而在其他领域则表现不佳。为了在实际数据科学背景下研究这种锯齿性,我们考虑了审查部分观测马尔可夫过程(POMP)数据分析的任务。POMP模型,也称为状态空间模型或隐藏马尔可夫模型,被用于拟合各种应用中的机理动态模型,包括疾病传播、生态动态和金融风险评估。高质量的同行评审需要评估科学背景、识别复杂算法实现中的错误,并做出关于方法学最佳实践的决策。我们研究了来自密歇根大学研究生时间序列课程四个学期的72个POMP项目,这些项目的报告、源代码和学生同行评审已匿名且开放获取。我们比较了人类评审与四个AI评审代理,使用Claude Code配合不同的指令实现为技能文件。我们发现AI评审员表现出锯齿型能力谱,能够高效地发现人类忽视的技术错误和无效推断方法,但在检查解释性错误、叙述连贯性和领域驱动的模型批评方面无法达到人类标准。锯齿性在所有代理中发现相似,这与它主要是一种底层AI模型属性而非特定指令有关。技能文件配置改变了代理强调的弱点,但并未消除锯齿性。

英文摘要

Despite their growing use in academic writing and statistical analysis, the performance of artificial intelligence (AI) tools in scientific peer review remains a largely unexplored area. A key challenge is jagged AI, a phenomenon where AI exhibits strong ability spikes in some domains while remaining deficient in others. To study this jaggedness in a practical data science context, we considered the task of reviewing partially observed Markov process (POMP) data analyses. POMP models, also known as state-space models or hidden Markov models, are used to fit mechanistic dynamic models to time series data in diverse applications including disease transmission, ecological dynamics, and financial risk assessment. High-quality peer review in this area entails assessment of scientific context, identification of errors in implementing complex algorithms, and decisions concerning methodological best practices. We studied 72 POMP projects from four semesters of a University of Michigan graduate time series course for which the project reports, the source code, and student peer reviews are anonymized and open-access. We compared the human reviews with four AI reviewing agents, using Claude Code with differing instructions implemented as skill files. We found that AI reviewers exhibited a jagged capability profile, proficiently catching human-overlooked technical errors and invalid inference methodology, while failing to match human standards in checking interpretive errors, narrative coherence, and domain-informed model critique. The jaggedness was found to be similar for all agents, consistent with it being primarily a property of the underlying AI model rather than the specific instructions. Skill file configuration shifted which weaknesses agents emphasized, without removing the jaggedness.

2605.07263 2026-05-19 eess.SP cs.AI cs.DC cs.LG stat.ML

Resource-Element Energy Difference for Noncoherent Over-the-Air Federated Learning

非协作空中联邦学习的资源元素能量差

Hao Chen, Zavareh Bozorgasl

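AI summary: This paper introduces resource-element energy difference (REED), a noncoherent physical-layer primitive for continuous signed aggregation. REED maps the positive and negative parts of each real-valued update to transmit energies on paired orthogonal resource elements and estimates the signed sum by subtracting the corresponding received energies. It relies on slow-timescale calibration of average channel powers but requires no instantaneous transmitter- or receiver-side CSI and no channel inversion. For independent Rayleigh fading, the paper derives exact first- and second-moment expressions for single-shot REED and for its chip-diverse extension.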

详情
Comments
Preprint; under review. Code to replicate the results is available at: https://github.com/zavareh1/REED

英文摘要

Over-the-air federated learning (OTA-FL) reduces uplink latency by aggregating client updates directly over the wireless multiple-access channel. Coherent analog aggregation realizes this idea by aligning the phases and amplitudes of simultaneously transmitted waveforms, which typically requires synchronization, instantaneous channel-state information (CSI), phase compensation, and power control. Noncoherent energy detection removes the need for phase-coherent combining, but a single energy measurement is nonnegative and, therefore, cannot represent signed model updates. This paper introduces resource-element energy difference (REED), a noncoherent physical-layer primitive for continuous signed aggregation. REED maps the positive and negative parts of each real-valued update to transmit energies on paired orthogonal resource elements and estimates the signed sum by subtracting the corresponding received energies. The construction uses slow-timescale calibration of average channel powers, but does not require instantaneous transmitter- or receiver-side CSI or channel inversion. For independent Rayleigh fading, we derive exact first- and second-moment expressions for single-shot REED and for a chip-diverse extension that spreads each coordinate over multiple independently faded paired chips. The resulting variance laws separate fading-induced self-noise, signal-noise interaction, and receiver-noise fluctuation, giving an explicit diversity-resource tradeoff. More->The rest of abstract is in the paper.
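
The energy-difference idea can be sketched with a toy simulation. This is a simplified model under stated assumptions, not the authors' implementation (their code is at the repository linked in the comments): unit average channel power for every client, equal receiver noise power on both resource elements, and plain averaging over independently faded chips. The function name `reed_estimate` and all numeric values are hypothetical.

```python
import numpy as np


def reed_estimate(updates, num_chips, noise_power, rng):
    """Estimate sum(updates) from received energies on paired resource elements."""
    x = np.asarray(updates, dtype=float)
    pos = np.maximum(x, 0.0)   # positive parts, sent as energy on the "plus" element
    neg = np.maximum(-x, 0.0)  # negative parts, sent as energy on the "minus" element

    def complex_gaussian(shape, power):
        # Circularly symmetric complex Gaussian with E|z|^2 = power.
        return np.sqrt(power / 2.0) * (
            rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
        )

    # Independent Rayleigh fading per chip, per client, per resource element
    # (assumed unit average channel power).
    h_pos = complex_gaussian((num_chips, x.size), 1.0)
    h_neg = complex_gaussian((num_chips, x.size), 1.0)

    # Clients transmit amplitude sqrt(energy); the channel superimposes the signals.
    y_pos = h_pos @ np.sqrt(pos) + complex_gaussian(num_chips, noise_power)
    y_neg = h_neg @ np.sqrt(neg) + complex_gaussian(num_chips, noise_power)

    # Subtracting received energies cancels the common noise floor in expectation.
    return float(np.mean(np.abs(y_pos) ** 2 - np.abs(y_neg) ** 2))


rng = np.random.default_rng(0)
updates = [1.5, -0.7, 0.4, -2.0]  # hypothetical client updates for one coordinate
estimate = reed_estimate(updates, num_chips=200_000, noise_power=0.1, rng=rng)
print(f"true sum = {sum(updates):.3f}, REED-style estimate = {estimate:.3f}")
```

In this model the cross terms between clients vanish in expectation because the fading coefficients are independent and zero-mean, so the expected energy difference equals the signed sum. A single chip is very noisy, which is why the sketch averages over many chips.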

2603.17577 2026-05-19 cs.LG cs.AI stat.ML

Identifying Latent Actions and Dynamics from Offline Data via Demonstrator Diversity

通过示范多样性从离线数据中识别潜在动作和动态

Felix Schur

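AI summary: This paper studies whether latent actions and environment dynamics can be recovered from offline trajectories when actions are never observed. Under a demonstrator-diversity assumption, it proves that the latent transitions and demonstrator policies are identifiable up to a permutation of the latent action labels when scattering and rank conditions hold, which gives a principled basis for learning latent actions and dynamics from offline RL data.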

详情
AI中文摘要

在动作未被观察的情况下,能否从离线轨迹中恢复潜在动作和环境动态?我们研究了在轨迹无动作但带有示范者身份标签的设置中这一问题。我们假设每个示范者遵循不同的策略,而环境动态在所有示范者之间是共享的,身份仅通过所选动作影响下一个观测。在这些假设下,条件下一个观测分布 $p(o_{t+1}\mid o_t,e)$ 是潜在动作条件化转移核的混合,具有示范者特定的混合权重。我们证明,这导致每个状态的可观测条件分布具有列随机非负矩阵分解。通过充分分散的策略多样性和秩条件,我们证明潜在转移和示范策略在潜在动作标签的排列下是可识别的。通过Gram行列式最小体积准则,我们将结果扩展到连续观测空间,并证明在连接的状态空间上转移映射的连续性将局部排列模糊性提升为单一全局排列。少量标记的动作数据足以消除最终的模糊性。这些结果确立了示范多样性作为从离线强化学习数据中学习潜在动作和动态的原理性可识别性来源。

英文摘要

Can latent actions and environment dynamics be recovered from offline trajectories when actions are never observed? We study this question in a setting where trajectories are action-free but tagged with demonstrator identity. We assume that each demonstrator follows a distinct policy, while the environment dynamics are shared across demonstrators and identity affects the next observation only through the chosen action. Under these assumptions, the conditional next-observation distribution $p(o_{t+1}\mid o_t,e)$ is a mixture of latent action-conditioned transition kernels with demonstrator-specific mixing weights. We show that this induces, for each state, a column-stochastic nonnegative matrix factorization of the observable conditional distribution. Using sufficiently scattered policy diversity and rank conditions, we prove that the latent transitions and demonstrator policies are identifiable up to permutation of the latent action labels. We extend the result to continuous observation spaces via a Gram-determinant minimum-volume criterion, and show that continuity of the transition map over a connected state space upgrades local permutation ambiguities to a single global permutation. A small amount of labeled action data then suffices to fix this final ambiguity. These results establish demonstrator diversity as a principled source of identifiability for learning latent actions and dynamics from offline RL data.
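
The mixture structure described in this abstract can be written, for a fixed state, as a product of two column-stochastic matrices. The sketch below uses hypothetical sizes and randomly drawn factors. It checks that the observable matrix is column-stochastic and that relabelling the latent actions leaves it unchanged, which is the permutation ambiguity the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
num_obs, num_actions, num_demos = 6, 3, 5  # hypothetical sizes

# Column a of T is the transition kernel p(o' | o, a) for one fixed state o.
T = rng.dirichlet(np.ones(num_obs), size=num_actions).T      # shape (num_obs, num_actions)
# Column e of Pi is demonstrator e's policy over latent actions at that state.
Pi = rng.dirichlet(np.ones(num_actions), size=num_demos).T   # shape (num_actions, num_demos)

# Observable conditional p(o' | o, e): a mixture of kernels with policy weights.
P = T @ Pi

# Relabelling the latent actions changes both factors but not the observable P.
perm = np.array([2, 0, 1])
P_relabelled = T[:, perm] @ Pi[perm, :]

print(P.sum(axis=0))
print(np.allclose(P, P_relabelled))
```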

2603.17041 2026-05-19 stat.ML cs.AI cs.LG stat.ME

When Marginals Match but Structure Fails: Covariance Fidelity in Generative Models

当边缘匹配但结构失败:生成模型中的协方差保真度

Nazia Riasat

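AI summary: This paper proposes covariance-level dependence fidelity as an evaluation criterion that fills the gap left by marginal-distribution matching, and it shows empirically that the criterion distinguishes structure-preserving from structure-discarding generative models where standard marginal diagnostics show little separation.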

详情
Comments
44 pages, 25 figures. Extended version of paper accepted at MathAI 2026 (International Conference on Mathematics of Artificial Intelligence), March 30 - April 3, 2026
AI中文摘要

生成模型正越来越多地被用作真实数据的替代品用于下游科学流程,但标准评估标准仍然集中在边缘分布匹配上。我们主张这代表了一个根本性的差距:下游推断很少是边缘操作,且一个通过所有单变量诊断的模型仍可能产生结构不可靠的合成数据。我们引入了协方差层面的依赖保真度,通过D_Sigma(P,Q) = ||Sigma_P - Sigma_Q||_F来衡量生成模型是否在超出单变量边缘之外保留数据的联合结构。三个结果正式化了这一准则。首先,边缘保真度对依赖结构没有任何约束:D_Sigma可以被任意增大,同时所有单变量边缘完全匹配。其次,协方差分歧会引起可量化的下游不稳定性,包括总体回归系数的符号反转。第三,通过Davis-Kahan型界提供对依赖敏感过程如PCA的正向稳定性保证。在三个领域,图像数据(Fashion-MNIST VAE,n = 60,000)、批量RNA-seq(TCGA-BRCA,n = 1,111)和小样本压力测试(阿尔茨海默症基因表达,n = 113)的实证验证显示,D_Sigma/delta在标准边缘诊断显示很少分离的情况下,能一致地区分结构丢弃与结构保留的生成器,确认了协方差层面保真度在跨领域和样本大小上提供了与现有评估指标正交的信息。

英文摘要

Generative models are increasingly deployed as substitutes for real data in downstream scientific workflows, yet standard evaluation criteria remain focused on marginal distribution matching. We argue that this represents a fundamental gap: downstream inference is rarely a marginal operation, and a model that passes every univariate diagnostic can still produce structurally unreliable synthetic data. We introduce covariance-level dependence fidelity, measured by D_Sigma(P,Q) = ||Sigma_P - Sigma_Q||_F, as a principled, computable criterion for evaluating whether a generative model preserves the joint structure of data beyond its univariate marginals. Three results formalise this criterion. First, marginal fidelity provides no constraint on dependence structure: D_Sigma can be made arbitrarily large while all univariate marginals match exactly. Second, covariance divergence induces quantifiable downstream instability, including sign reversals in population regression coefficients. Third, bounding D_Sigma provides positive stability guarantees for dependence-sensitive procedures such as PCA via Davis-Kahan-type bounds. Empirical validation across three domains, image data (Fashion-MNIST VAE, n = 60,000), bulk RNA-seq (TCGA-BRCA, n = 1,111), and a small-sample stress test (Alzheimer's gene expression, n = 113), shows that D_Sigma/delta consistently distinguishes structure-discarding from structure-preserving generators in cases where standard marginal diagnostics show little separation, confirming that covariance-level fidelity provides information orthogonal to existing evaluation metrics across domains and sample sizes.
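
A small worked example of the criterion, assuming two zero-mean bivariate Gaussian laws whose covariance matrices are chosen here for illustration. Both have standard normal marginals, yet the correlation flips sign, so D_Sigma is large and the population regression slope reverses. The helper `regression_slope` is a hypothetical name introduced for this sketch.

```python
import numpy as np


def d_sigma(cov_p, cov_q) -> float:
    """D_Sigma(P, Q) = ||Sigma_P - Sigma_Q||_F, the Frobenius norm of the difference."""
    return float(np.linalg.norm(np.asarray(cov_p) - np.asarray(cov_q), ord="fro"))


def regression_slope(cov) -> float:
    """Population slope when the second coordinate is regressed on the first."""
    cov = np.asarray(cov)
    return float(cov[0, 1] / cov[0, 0])


# Hypothetical "real" and "synthetic" covariances with identical unit variances.
cov_real = np.array([[1.0, 0.8], [0.8, 1.0]])
cov_synth = np.array([[1.0, -0.8], [-0.8, 1.0]])

print("marginal variances match:", np.allclose(np.diag(cov_real), np.diag(cov_synth)))
print("D_Sigma =", d_sigma(cov_real, cov_synth))
print("slopes  =", regression_slope(cov_real), regression_slope(cov_synth))
```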

2603.06984 2026-05-19 stat.ML cs.AI cs.GT cs.LG cs.SI

Masking Causality and Conditional Dependence

掩盖因果关系与条件依赖

Zou Yang, Sophia Xiao, Bijan Mazaheri

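AI summary: This paper studies the enforcement of conditional-independence requirements through a single averaged constraint. From the regulator's side, averaged-constraint optimization almost surely yields policies that violate the stratum-wise requirement while satisfying the averaged one. From the optimizer's side, such masked policies retain most of the unconstrained reward while being far harder to detect. The paper concludes that regulating direct dependence through averaged statistics on observed decisions is structurally limited and that enforcement must operate at the level of the decision rule.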

详情
AI中文摘要

许多监管和分析问题要求被禁止的变量只能通过指定的允许渠道影响决策——这是一种出现在路径特定公平性、处理敏感信息和监管非公开信息交易等场景中的条件独立性要求。这些要求可以通过分层方式执行,或更常见且更高效地通过单个平均约束来执行。本文从监管者的角度将因果掩盖建模为一个线性规划,并证明平均约束优化几乎总是产生违反分层要求但恰好满足平均约束的政策。掩盖收益随着混淆和结果异质性增加而增长,检测需要精确的条件独立性测试,而平均约束旨在避免这些测试。从优化者的角度来看,相同的构造表明,被掩盖的政策恢复了大部分无约束利用的收益,但更难被检测到,因此在决策基础本身敏感的任何设置中都具有吸引力。这些结果表明,通过观测决策的平均统计来监管直接依赖在结构上是有限的,有意义的监管必须在决策规则本身层面进行。

英文摘要

Many regulatory and analytic problems require that a prohibited variable influence a decision only through a designated allowable channel -- a conditional-independence requirement that arises in path-specific fairness, the handling of classified information, and the regulation of trading on non-public information, among other settings. Such requirements may be enforced either stratum-by-stratum or, more commonly (and more efficiently), through a single averaged constraint on the conditional effect. We study the resulting enforcement problem from two perspectives. From the regulator's side, we formulate causal masking as a linear program and show that averaged-constraint optimization almost surely produces policies that violate the stratum-wise requirement while satisfying the averaged one exactly. The gains from masking grow with confounding and outcome heterogeneity, and detection requires precisely the conditional-independence tests that average constraints aim to avoid. From the optimizer's side, the same construction shows that masked policies recover most of the reward of unconstrained exploitation while being far harder to detect, making them attractive in any setting where the basis of decisions is itself sensitive. Together, these results argue that regulating direct dependence through averaged statistics on observed decisions is structurally limited, and that meaningful enforcement must operate at the level of the decision rule itself.

2602.22307 2026-05-19 stat.ME astro-ph.CO astro-ph.GA astro-ph.IM

Global structure of the time delay likelihood

时间延迟似然的全局结构

Namu Kroupa, Will Handley

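AI summary: This paper identifies an inherent pathology in the likelihood for time delay inference that challenges standard inference methods, and it offers practical remedies for ensuring convergence, notably increasing the number of live points in nested sampling.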

详情
Journal ref
Phys. Rev. Research 8, 023175 (2026)
Comments
21 pages, 8 figures
AI中文摘要

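We identify a fundamental pathology in the likelihood for time delay inference that challenges standard inference methods. Analysing the time delay likelihood under Gaussian process light curve models, we show that it generically develops a boundary-driven "W" shape, with a global maximum at the true delay and gradual rises towards the edges of the observation window.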

英文摘要

We identify a fundamental pathology in the likelihood for time delay inference which challenges standard inference methods. By analysing the likelihood for time delay inference with Gaussian process light curve models, we show that it generically develops a boundary-driven "W"-shape with a global maximum at the true delay and gradual rises towards the edges of the observation window. This arises because time delay estimation is intrinsically extrapolative. In practice, global samplers such as nested sampling are steered towards spurious edge modes unless strict convergence criteria are adopted. We demonstrate this with simulations and show that the effect strengthens with higher data density over a fixed time span. To ensure convergence, we provide concrete guidance, notably increasing the number of live points. Further, we show that methods implicitly favouring small delays, for example optimisers and local MCMC, induce a bias towards larger $H_0$. Our results clarify failure modes and offer practical remedies for robust fully Bayesian time delay inference.

2602.07618 2026-05-19 cs.LG stat.ML

Neural Networks With Dense Weights Are Not Universal Approximators

具有密集权重的神经网络不是通用逼近器

Levi Rauchwerger, Stefanie Jegelka, Ron Levie

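AI summary: This paper studies the approximation capabilities of dense neural networks. It shows that ReLU networks subject to natural constraints on weights and on input and output dimensions cannot approximate certain Lipschitz continuous functions, which exposes an intrinsic limitation of dense layers and motivates sparse connectivity as a necessary ingredient for true universality.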

详情
AI中文摘要

我们研究了密集神经网络的逼近能力。虽然通用逼近定理表明,如果对权重值没有限制,足够大的架构可以逼近任意连续函数,但我们证明密集神经网络并不具备这种普遍性。我们的论证基于一种模型压缩方法,结合弱正则性引理与将前馈网络解释为消息传递图神经网络的解释。我们考虑具有自然权重、输入和输出维度约束的ReLU神经网络,这建模了一种密集连接的概念。在此设置中,我们展示了存在无法被此类网络逼近的Lipschitz连续函数。这突显了密集层神经网络的固有局限性,并推动了稀疏连接作为实现真正通用性的必要成分的使用。

英文摘要

We investigate the approximation capabilities of dense neural networks. While universal approximation theorems establish that sufficiently large architectures can approximate arbitrary continuous functions if there are no restrictions on the weight values, we show that dense neural networks do not possess this universality. Our argument is based on a model compression approach, combining the weak regularity lemma with an interpretation of feedforward networks as message passing graph neural networks. We consider ReLU neural networks subject to natural constraints on weights and input and output dimensions, which model a notion of dense connectivity. Within this setting, we demonstrate the existence of Lipschitz continuous functions that cannot be approximated by such networks. This highlights intrinsic limitations of neural networks with dense layers and motivates the use of sparse connectivity as a necessary ingredient for achieving true universality.

2602.05172 2026-05-19 stat.ML cs.LG math.ST stat.TH

Finite-Particle Rates for Regularized Stein Variational Gradient Descent

有限粒子率的正则化Stein变分梯度下降

Ye He, Krishnakumar Balasubramanian, Sayan Banerjee, Promit Ghosal

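AI summary: This paper derives finite-particle rates for regularized Stein variational gradient descent (R-SVGD), which corrects the constant-order bias of SVGD by applying a resolvent-type preconditioner. It establishes non-asymptotic bounds for time-averaged empirical measures and, under a $\mathrm{W}_1\mathrm{I}$ condition on the target, proves $\mathrm{W}_1$ convergence for a large class of smooth kernels.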

详情
AI中文摘要

我们推导了He等人(2024)提出的正则化Stein变分梯度下降(R-SVGD)算法的有限粒子率,该算法通过在核化Wasserstein梯度上应用树脂型预条件器来校正SVGD的常数阶偏差。对于由此得到的相互作用N粒子系统,我们建立了时间平均(退火)经验测度的显式非渐近界,展示了在真正的(非核化)Fisher信息上的收敛,并在目标满足W₁I条件下,对于一大类光滑核函数,对应W₁收敛。我们的分析涵盖了连续时间和离散时间动力学,并给出了正则化参数、步长和平均时间范围的原理性调整规则,这些规则量化了近似Wasserstein梯度流和控制有限粒子估计误差之间的权衡。

英文摘要

We derive finite-particle rates for the regularized Stein variational gradient descent (R-SVGD) algorithm introduced by He et al. (2024) that corrects the constant-order bias of the SVGD by applying a resolvent-type preconditioner to the kernelized Wasserstein gradient. For the resulting interacting $N$-particle system, we establish explicit non-asymptotic bounds for time-averaged (annealed) empirical measures, illustrating convergence in the \emph{true} (non-kernelized) Fisher information and, under a $\mathrm{W}_1\mathrm{I}$ condition on the target, corresponding $\mathrm{W}_1$ convergence for a large class of smooth kernels. Our analysis covers both continuous- and discrete-time dynamics and yields principled tuning rules for the regularization parameter, step size, and averaging horizon that quantify the trade-off between approximating the Wasserstein gradient flow and controlling finite-particle estimation error.

2601.16022 2026-05-19 stat.ME

Approximate Likelihood-Based Inference for Spatial Generalized Linear Mixed Models

基于空间广义线性混合模型的近似似然推断

Samuel I. Watson, Yixin Wang, Emanuele Giorgi

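AI summary: This paper studies maximum likelihood estimation for spatial generalized linear mixed models using a stochastic Newton-Raphson algorithm. It compares two Gaussian process approximations, spectral Gaussian process approximations and stochastic partial differential equations (SPDE), and proposes a new stopping criterion and a Monte Carlo estimator of fixed-effect standard errors.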

详情
AI中文摘要

我们研究了使用随机牛顿-拉夫森算法进行具有高斯过程近似的空间广义线性混合模型的最大似然估计。我们考虑了两种高斯过程近似方法:频谱高斯过程近似和随机偏微分方程(SPDE)。我们改进了随机最大似然算法,并提出了一种新的停止准则以高效终止,防止在平稳后收敛阶段长时间采样,并提出了一种固定效应标准误差的蒙特卡洛估计器。我们运行了一系列空间统计模型的模拟比较,同时对比了流行的贝叶斯嵌套拉普拉斯近似方法,该方法结合了SPDE。我们表明,HSGP在平滑的潜在场中为固定和随机效应参数提供名义覆盖,但在粗糙场中性能下降。在随机最大似然框架中使用SPDE保持名义覆盖,并在贝叶斯嵌套拉普拉斯近似方法中匹配或改进其性能。

英文摘要

We study maximum likelihood estimation for spatial generalized linear mixed models with Gaussian process approximations using a stochastic Newton-Raphson algorithm. We consider two Gaussian process approximations in this context: spectral Gaussian process approximations (HSGP) and stochastic partial differential equations (SPDE). We refine the stochastic maximum likelihood algorithm and propose a new stopping criterion, which terminates efficiently and prevents long runs of sampling in the stationary post-convergence phase, and a Monte Carlo estimator of fixed-effect standard errors. We run a series of simulation comparisons of spatial statistical models alongside the popular Bayesian integrated nested Laplace approximation (INLA) method, which incorporates SPDE. We show that HSGP provides nominal coverage of fixed and random effect parameters with smooth latent fields but that performance degrades for rough fields. SPDE in a stochastic maximum likelihood framework maintains nominal coverage and matches or improves upon the performance of INLA.

2512.13506 2026-05-19 cs.LG stat.ML

Learning under Distributional Drift: Prequential Reproducibility as an Intrinsic Statistical Resource

在分布漂移下学习:预quential可再现性作为内在统计资源

Sofiya Zaichyk

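AI summary: This paper studies learning under distributional drift. It introduces an intrinsic drift budget $C_T$ that quantifies the cumulative information-geometric motion of the data distribution along the realized learner-environment trajectory, measured in Fisher-Rao distance. The budget separates exogenous environmental change from feedback induced by the learner's actions and gives a rate-based characterization of prequential reproducibility. The paper proves a drift-feedback bound and a matching lower bound showing that the dependence on the average Fisher-Rao motion rate is tight. It also proves an information-theoretic indistinguishability result and shows experimentally that suitably chosen monitoring channels can retain risk-relevant drift signal.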

详情
Comments
Revised: Added additional experiment. Clarified lower bound
AI中文摘要

在分布漂移下统计学习仍然缺乏充分的描述,尤其是在闭环设置中,学习会改变数据生成规律。我们引入了一个内在的漂移预算$C_T$,用于量化数据分布沿实际学习者-环境轨迹的累积信息-几何运动,以Fisher-Rao距离衡量。该预算将外生环境变化与由学习者动作引起的反馈分离。这给出了基于速率的预quential可再现性特征:当使用实际流上的性能来预测下一步分布下的一步 ahead 性能时,漂移贡献通过平均运动率$C_T/T$,而不是单独的累积漂移。我们证明了一个漂移反馈界,其顺序为$T^{-1/2}+C_T/T$,至多有受控的二阶余项。我们还建立了在标准正则子类上的匹配尖锐下界。因此,对平均Fisher-Rao运动率的依赖性在常数范围内是紧的:$C_T/T$足够用于上界控制,并且在正则困难子类上是不可避免的。我们进一步证明了一个信息论上的不可区分性结果,表明在一步 ahead 目标上的顺序$C/T$效应不需要仅从实际性能流中识别。最后,我们表明固定监控通道诱导了收缩的可观察Fisher运动,并通过实验,包括一个不正确的现实数据反馈设置,表明适当选择的通道可以在内在数据生成规律不可用时保留风险相关的漂移信号。由此产生的理论将外生漂移、自适应数据分析和表现反馈视为沿同一学习者-环境轨迹的Fisher-Rao运动的不同来源。

英文摘要

Statistical learning under distributional drift remains poorly characterized, especially in closed-loop settings where learning alters the data-generating law. We introduce an intrinsic drift budget $C_T$ that quantifies cumulative information-geometric motion of the data distribution along the realized learner-environment trajectory, measured in Fisher-Rao distance. The budget separates exogenous environmental change from policy-sensitive feedback induced by the learner's actions. This gives a rate-based characterization of prequential reproducibility: when performance on the realized stream is used to predict one-step-ahead performance under the next distribution, the drift contribution enters through the average motion rate $C_T/T$, not through cumulative drift alone. We prove a drift-feedback bound of order $T^{-1/2}+C_T/T$, up to controlled second-order remainder terms, and establish a matching sharpness lower bound for the same prequential reproducibility gap on a canonical regular subclass. Thus the dependence on the average Fisher-Rao motion rate is tight up to constants: $C_T/T$ is sufficient for upper control and unavoidable on regular hard subclasses. We further prove an information-theoretic indistinguishability result showing that order-$C/T$ effects on the one-step-ahead target need not be identifiable from the realized performance stream alone. Finally, we show that fixed monitoring channels induce contracted observable Fisher motion, and experiments, including a misspecified real-data feedback setting, indicate that appropriately chosen channels can retain risk-relevant drift signal when the intrinsic data-generating law is unavailable. The resulting theory treats exogenous drift, adaptive data analysis, and performative feedback as different sources of Fisher-Rao motion along the same learner-environment trajectory.
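
To make the drift budget concrete, the sketch below computes a cumulative Fisher-Rao path length for a one-parameter Bernoulli family, where the Fisher-Rao distance has the closed form 2|arcsin√p − arcsin√q|. Treating $C_T$ as the sum of consecutive distances and T as the number of steps is an assumption made for illustration. The abstract does not give the paper's exact definition, and the trajectory values are hypothetical.

```python
import math


def fisher_rao_bernoulli(p: float, q: float) -> float:
    """Fisher-Rao distance between Bernoulli(p) and Bernoulli(q)."""
    return 2.0 * abs(math.asin(math.sqrt(p)) - math.asin(math.sqrt(q)))


def drift_budget(params) -> float:
    """Cumulative Fisher-Rao motion along a sequence of Bernoulli parameters."""
    return sum(fisher_rao_bernoulli(p, q) for p, q in zip(params, params[1:]))


# Hypothetical trajectory of the data-generating parameter over time.
trajectory = [0.2, 0.25, 0.3, 0.5, 0.45, 0.8]
steps = len(trajectory) - 1

c_t = drift_budget(trajectory)
print(f"C_T = {c_t:.4f}, average motion rate C_T/T = {c_t / steps:.4f}")
```

Because the trajectory backtracks once (0.5 to 0.45), the cumulative motion exceeds the distance between its endpoints. A budget defined as path length therefore captures oscillation that an endpoint comparison would miss.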

2510.26745 2026-05-19 cs.LG cs.AI cs.CL stat.ML

Deep sequence models tend to memorize geometrically; it is unclear why

深度序列模型倾向于记忆几何学;不清楚为何

Shahriar Noroozizadeh, Vaishnavh Nagarajan, Elan Rosenfeld, Sanjiv Kumar

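AI summary: This paper examines how deep sequence models store atomic facts. It finds that geometric memory encodes global relationships between entities, including entities that never co-occur in training, which challenges the conventional associative-memory view.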

详情
Comments
Forty-third International Conference on Machine Learning (ICML 2026)
AI中文摘要

深度序列模型被认为主要通过关联记忆存储原子事实,即通过暴力查找共现实体。我们识别出一种不同的存储形式,称为几何记忆。在此模型中,嵌入编码了所有实体之间的新型全局关系,包括训练中未共现的实体。这种存储形式强大:例如,我们展示了它如何将涉及ℓ-折叠组合的困难推理任务转化为易于学习的一步导航任务。从这一现象中,我们提取了神经嵌入几何学中难以解释的基本方面。我们认为,这种几何的出现,与局部关联的查找相比,不能简单归因于典型的监督、架构或优化压力。反直觉的是,即使几何比暴力查找更复杂,它仍然会被学习。然后,通过分析与Node2Vec的联系,我们展示了几何起源于一种光谱偏见,这与主流理论相反,确实自然产生,尽管缺乏各种压力。这一分析也指出了从业者在使Transformer记忆更几何化方面的可见空间。我们希望几何视角的参数记忆鼓励重新审视指导知识获取、容量、发现和遗忘等领域的默认直觉。

英文摘要

Deep sequence models are said to store atomic facts predominantly in the form of associative memory: a brute-force lookup of co-occurring entities. We identify a dramatically different form of storage of atomic facts that we term as geometric memory. Here, the model has synthesized embeddings encoding novel global relationships between all entities, including ones that do not co-occur in training. Such storage is powerful: for instance, we show how it transforms a hard reasoning task involving an $\ell$-fold composition into an easy-to-learn $1$-step navigation task. From this phenomenon, we extract fundamental aspects of neural embedding geometries that are hard to explain. We argue that the rise of such a geometry, as against a lookup of local associations, cannot be straightforwardly attributed to typical supervisory, architectural, or optimizational pressures. Counterintuitively, a geometry is learned even when it is more complex than the brute-force lookup. Then, by analyzing a connection to Node2Vec, we demonstrate how the geometry stems from a spectral bias that -- in contrast to prevailing theories -- indeed arises naturally despite the lack of various pressures. This analysis also points out to practitioners a visible headroom to make Transformer memory more strongly geometric. We hope the geometric view of parametric memory encourages revisiting the default intuitions that guide researchers in areas like knowledge acquisition, capacity, discovery, and unlearning.

2509.23068 2026-05-19 stat.ML cs.LG

Sparse Deep Additive Model with Interactions: Enhancing Interpretability and Predictability

稀疏深度加法模型与交互:增强可解释性和预测性

Yi-Ting Hung, Li-Hsiang Lin, Vince D. Calhoun

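AI summary: The paper proposes the Sparse Deep Additive Model with Interactions (SDAMI), which combines sparsity-driven feature selection with deep subnetworks and uses a three-stage strategy to improve interpretability and predictive performance in high-dimensional regression.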

详情
AI中文摘要

近年来深度学习的进步突显了需要能够从少量样本中学习、处理高维特征并保持可解释性的个性化模型。为此,我们提出了稀疏深度加法模型与交互(SDAMI)框架,该框架结合了以稀疏性驱动的特征选择与深度子网络以实现灵活的功能近似。SDAMI的核心是效应足迹原理,该原理认为高阶交互会在构成变量上留下可检测的边际痕迹,从而无需穷尽搜索即可发现它们。SDAMI通过三阶段策略执行这一原理:(1)筛选足迹变量,(2)通过组Lasso分离主效应与交互,(3)使用专用深度子网络建模组件。理论分析证实,足迹仅在测度零对称条件下消失,而这些条件在实践中极为罕见,从而确保了一致的交互恢复。广泛模拟显示,SDAMI能够成功识别出基于遗传的基线方法根本无法识别的纯交互,以接近零的假阳性率恢复复杂的效应结构。这些结果将SDAMI定位为一种原理上适用于高维回归的可解释框架。

英文摘要

Recent advances in deep learning highlight the need for personalized models that can learn from small samples, handle high-dimensional features, and remain interpretable. To address this, we propose the Sparse Deep Additive Model with Interactions (SDAMI), a framework that combines sparsity-driven feature selection with deep subnetworks for flexible function approximation. Central to SDAMI is the Effect Footprint principle, which posits that higher-order interactions leave detectable marginal traces on constituent variables, enabling their discovery without exhaustive search. SDAMI executes this principle through a three-stage strategy: (1) screening for footprint variables, (2) disentangling main effects from interactions via group lasso, and (3) modeling components with dedicated deep subnetworks. Theoretical analysis confirms that footprints vanish only under measure-zero symmetry conditions that are rare in practice, ensuring consistent interaction recovery. Extensive simulations demonstrate that SDAMI successfully identifies pure interactions that heredity-based baselines fundamentally miss, recovering complex effect structures with near-zero false positive rates. Together, these results position SDAMI as a principled framework for interpretable high-dimensional regression.
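Stage (2) of the abstract relies on the group lasso. As a rough illustration of why that penalty selects or drops whole groups of coefficients, the sketch below implements block soft-thresholding, the standard proximal operator of the group lasso penalty. It is not taken from SDAMI: the function name `group_soft_threshold` and the parameter `lam` are hypothetical, and how the paper forms its groups is not stated in the abstract.

```python
import math


def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2 applied to one coefficient group.

    The whole group is set to zero when its Euclidean norm is at most lam;
    otherwise every entry is shrunk by the same factor (1 - lam / norm).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * x for x in v]


# A group with norm 5 is shrunk by half when lam = 2.5 ...
shrunk = group_soft_threshold([3.0, 4.0], 2.5)
# ... and removed entirely once lam reaches the group norm.
dropped = group_soft_threshold([3.0, 4.0], 5.0)
```

Because all entries of a group share one shrinkage factor, a group is either kept as a unit or removed as a unit, which is the behaviour one would want when separating main effects from interaction terms.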

2509.22459 2026-05-19 stat.ML cs.LG

Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs)

通用逆向蒸馏用于匹配模型与真实数据监督(无GANs)

Nikita Kornilov, David Li, Tikhon Mavrin, Aleksei Leonov, Nikita Gushchin, Evgeny Burnaev, Iaroslav Koshelev, Alexander Korotin

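AI summary: The paper proposes RealUID, a universal distillation framework for matching models that incorporates real data into the distillation procedure without GANs. It covers previous distillation methods for flow matching and diffusion models and extends to modifications such as Bridge Matching and Stochastic Interpolants.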

详情
AI中文摘要

尽管生成质量优异,现代扩散、流及其他匹配模型在推理时速度较慢,因为它们需要许多迭代生成步骤。最近的蒸馏方法通过在预训练教师模型指导下训练高效的单步生成器来解决这个问题。然而,这些方法通常局限于特定框架,例如仅限于扩散或仅限于流模型。此外,这些方法原本是数据无关的,为了利用真实数据,需要使用额外的复杂对抗训练和额外的判别器模型。在本文中,我们提出了RealUID,一种适用于所有匹配模型的通用蒸馏框架,能够无缝地将真实数据整合到蒸馏过程中而无需GANs。我们的RealUID方法提供了一个简单的理论基础,涵盖了流匹配和扩散模型之前的蒸馏方法,并可扩展到其变种,如桥接匹配和随机插值。代码可在https://github.com/David-cripto/RealUID中找到。

英文摘要

While achieving exceptional generative quality, modern diffusion, flow, and other matching models suffer from slow inference, as they require many steps of iterative generation. Recent distillation methods address this problem by training efficient one-step generators under the guidance of a pre-trained teacher model. However, these methods are often constrained to only one specific framework, e.g., only to diffusion or only to flow models. Furthermore, these methods are originally data-free, and to benefit from the usage of real data, it is required to use an additional complex adversarial training with an extra discriminator model. In this paper, we present RealUID, a universal distillation framework for all matching models that seamlessly incorporates real data into the distillation procedure without GANs. Our RealUID approach offers a simple theoretical foundation that covers previous distillation methods for Flow Matching and Diffusion models, and can also be extended to their modifications, such as Bridge Matching and Stochastic Interpolants. The code can be found at https://github.com/David-cripto/RealUID.

2508.08080 2026-05-19 cs.LG cs.NE stat.AP

Symbolic Quantile Regression for the Interpretable Prediction of Conditional Quantiles

符号分位数回归:条件分位数的可解释预测

Cas Oude Hoekstra, Floris den Hengst

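AI summary: The paper proposes Symbolic Quantile Regression (SQR), which predicts conditional quantiles with symbolic regression and helps explain how predictors affect the outcome. An airline fuel usage case study compares models that predict extreme and central outcomes, illustrating the method's relevance to high-stakes applications.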

详情
Journal ref
Transactions on Machine Learning Research, May 2026, https://openreview.net/pdf?id=x9OYbyPJOG
AI中文摘要

符号回归(SR)是一种生成可解释(白盒)预测模型的成熟框架。尽管SR已被成功用于构建结果均值的可解释估计,但目前尚不清楚如何利用它来估计目标变量分布中其他位置上变量之间的关系。对中位数或极端值等的估计能更全面地刻画预测变量如何影响结果,并且在高风险、安全关键的应用领域中必不可少。本文提出符号分位数回归(SQR),一种利用SR预测条件分位数的方法。在广泛的评估中,我们发现SQR优于透明模型,并且在不牺牲透明性的前提下与强大的黑盒基线表现相当。我们还在一个航空燃油使用的案例研究中展示了如何通过比较预测极端结果与中心结果的模型,利用SQR解释目标分布中的差异。我们的结论是,SQR适用于预测条件分位数,并有助于理解不同分位数下值得关注的特征影响。

英文摘要

Symbolic Regression (SR) is a well-established framework for generating interpretable or white-box predictive models. Although SR has been successfully applied to create interpretable estimates of the average of the outcome, it is currently not well understood how it can be used to estimate the relationship between variables at other points in the distribution of the target variable. Such estimates of e.g. the median or an extreme value provide a fuller picture of how predictive variables affect the outcome and are necessary in high-stakes, safety-critical application domains. This study introduces Symbolic Quantile Regression (SQR), an approach to predict conditional quantiles with SR. In an extensive evaluation, we find that SQR outperforms transparent models and performs comparably to a strong black-box baseline without compromising transparency. We also show how SQR can be used to explain differences in the target distribution by comparing models that predict extreme and central outcomes in an airline fuel usage case study. We conclude that SQR is suitable for predicting conditional quantiles and understanding interesting feature influences at varying quantiles.
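The abstract does not say which objective SQR optimizes. A common way to target a conditional quantile is the pinball (quantile) loss, so the sketch below shows that loss under the assumption that something similar scores candidate expressions. The function name `pinball_loss` and the parameter `tau` are illustrative and not taken from the paper.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau in (0, 1).

    Under-predictions (y > prediction) are weighted by tau and
    over-predictions by (1 - tau), so minimizing the loss pushes the
    prediction towards the tau-quantile of the target.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# The same under-prediction is penalized far more when a high quantile is targeted.
high = pinball_loss([10.0], [5.0], tau=0.9)
low = pinball_loss([10.0], [5.0], tau=0.1)
```

Comparing models fitted at, say, `tau=0.5` and `tau=0.95` is one way to read the abstract's contrast between central and extreme outcomes.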

2508.03833 2026-05-19 math.ST math.PR stat.TH

Computable Bounds for Strong Approximations with Applications

可计算的强逼近界及其应用

Haoyu Ye, Morgane Austern

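AI summary: The paper proposes a computable version of the KMT inequality for partial sums of bounded i.i.d. random variables, together with an empirical version for the case where the standard deviation is unknown, and demonstrates applications to online change point detection and first hitting time probabilities.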

详情
AI中文摘要

Komlós–Major–Tusnády (KMT) 不等式对于部分和是概率论中最受推崇的结果之一。然而,其实际应用受到缺乏实用常数的限制。本文针对有界独立同分布随机变量解决了这一限制。通过付出额外的对数因子,我们提出了一种仅依赖于变量范围和标准差的可计算KMT不等式版本。我们还推导出一个经验版本的不等式,即使在标准差未知时也能实现名义覆盖。然后,我们通过在线突变点检测和首次击中时间概率的应用来展示我们边界的实用性。作为我们分析的副产品,我们获得了归一化中心部分和的Cramér型中等偏差界。

英文摘要

The Komlós–Major–Tusnády (KMT) inequality for partial sums is one of the most celebrated results in probability theory. Yet its practical application has been hindered by a lack of practical constants. This paper addresses this limitation for bounded i.i.d. random variables. At the cost of an additional logarithmic factor, we propose a computable version of the KMT inequality that depends only on the variables' range and standard deviation. We also derive an empirical version of the inequality that achieves nominal coverage even when the standard deviation is unknown. We then demonstrate the practicality of our bounds through applications to online change point detection and first hitting time probabilities. As a byproduct of our analysis, we obtain a Cramér-type moderate deviation bound for normalized centered partial sums.

2507.20982 2026-05-19 math.PR math.ST stat.TH

Bernstein-type dimension-free concentration for self-normalised martingales

伯恩斯坦型无维集中不等式用于自归一化鞅

Arya Akhavan, Amitis Shidani, Alex Ayoub, David Janz

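AI summary: The paper introduces a dimension-free Bernstein-type tail inequality for self-normalised martingales, where the normalisation uses the predictable quadratic variation and the radius depends on the information gain of the observed covariance. Applications include ellipsoidal confidence sequences for logistic regression with adaptively chosen Hilbert-valued covariates and instance-adaptive regret bounds for Hilbert-armed logistic bandits.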

详情
AI中文摘要

我们提出了一种与维度无关的伯恩斯坦型尾部不等式,适用于自归一化鞅,其中归一化采用可预测二次变差,半径取决于观测协方差的信息增益。作为应用,我们为具有自适应选择的希尔伯特值协变量的逻辑回归给出了椭球置信序列,并为希尔伯特臂逻辑赌博机(logistic bandits)给出了实例自适应的遗憾界。

英文摘要

We introduce a dimension-free Bernstein-type tail inequality for self-normalised martingales, where the normalisation uses the predictable quadratic variation and the radius depends on the information gain of the observed covariance. As applications, we provide ellipsoidal confidence sequences for logistic regression with adaptively chosen Hilbert-valued covariates, and give instance-adaptive regret bounds for Hilbert-armed logistic bandits.

2507.05482 2026-05-19 cs.LG stat.ML

Stein Diffusion Guidance: Training-Free Posterior Correction for Sampling Beyond High-Density Regions


Van Khoa Nguyen, Lionel Blondé, Alexandros Kalousis

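AI summary: The paper proposes Stein Diffusion Guidance (SDG), a training-free posterior correction method for sampling beyond high-density regions. It combines stochastic optimal control with Stein variational inference and, through a new theoretical bound and a novel running cost functional, enables effective guidance in low-density regions.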

详情
Comments
Revised version accepted to the ICML 2026 main track; prior version accepted to two ICLR 2026 workshops: ReALM-GEN and DeLTa
AI中文摘要

Training-free diffusion guidance offers a flexible framework for leveraging off-the-shelf classifiers without additional training. Yet, current approaches hinge on posterior approximations via Tweedie's formula, which often yield unreliable guidance, particularly in low-density regions. Stochastic optimal control (SOC), in contrast, enables principled posterior sampling but remains computationally prohibitive for efficient inference. In this work, we reconcile the strengths of these paradigms by introducing Stein Diffusion Guidance (SDG), a novel training-free framework grounded in a surrogate SOC objective. We establish a new theoretical bound on the SOC value function, revealing the necessity of correcting approximate posteriors to reflect true diffusion dynamics. Building on Stein variational inference, SDG computes the steepest descent direction that minimizes the Kullback-Leibler divergence between approximate and true posteriors. By integrating a principled Stein correction mechanism along with a novel running cost functional, SDG enables effective guidance in low-density regions. Our experiments on diverse image-guidance tasks and on challenging small-ligand sampling for protein docking suggest that SDG consistently outperforms standard training-free guidance methods and highlights its potential for broader posterior sampling problems beyond high-density regimes.

英文摘要

Training-free diffusion guidance offers a flexible framework for leveraging off-the-shelf classifiers without additional training. Yet, current approaches hinge on posterior approximations via Tweedie's formula, which often yield unreliable guidance, particularly in low-density regions. Stochastic optimal control (SOC), in contrast, enables principled posterior sampling but remains computationally prohibitive for efficient inference. In this work, we reconcile the strengths of these paradigms by introducing Stein Diffusion Guidance (SDG), a novel training-free framework grounded in a surrogate SOC objective. We establish a new theoretical bound on the SOC value function, revealing the necessity of correcting approximate posteriors to reflect true diffusion dynamics. Building on Stein variational inference, SDG computes the steepest descent direction that minimizes the Kullback-Leibler divergence between approximate and true posteriors. By integrating a principled Stein correction mechanism along with a novel running cost functional, SDG enables effective guidance in low-density regions. Our experiments on diverse image-guidance tasks and on challenging small-ligand sampling for protein docking suggest that SDG consistently outperforms standard training-free guidance methods and highlights its potential for broader posterior sampling problems beyond high-density regimes.

2506.12201 2026-05-19 cs.IT eess.SP math.IT math.ST stat.TH

Functional Multi-Reference Alignment via Deconvolution

基于去卷积的功能多参考对齐

Omar Al-Ghattas, Anna Little, Daniel Sanz-Alonso, Mikhail Sweeney

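AI summary: The paper studies multi-reference alignment (MRA), the problem of estimating a signal function from shifted, noisy observations. It proposes a deconvolution approach based on Kotlarski's formula, which connects MRA to deconvolution, and validates the approach through theory and numerical experiments.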

详情
Comments
48 pages, 9 figures
AI中文摘要

本文研究了多参考对齐(MRA)问题,即从移位和噪声观测中估计信号函数。我们的功能形式化揭示了MRA与去卷积之间的新联系:信号可以通过Kotlarski公式通过二阶统计量进行估计,该公式是去卷积领域的重要识别结果,适用于重复测量。为了设计我们的MRA算法,我们扩展了Kotlarski公式以适用于一般维度,并研究了具有消失傅里叶变换的信号估计,从而也对去卷积文献做出了贡献。我们通过理论和数值实验验证了我们的去卷积方法在MRA中的应用。

英文摘要

This paper studies the multi-reference alignment (MRA) problem of estimating a signal function from shifted, noisy observations. Our functional formulation reveals a new connection between MRA and deconvolution: the signal can be estimated from second-order statistics via Kotlarski's formula, an important identification result in deconvolution with replicated measurements. To design our MRA algorithms, we extend Kotlarski's formula to general dimension and study the estimation of signals with vanishing Fourier transform, thus also contributing to the deconvolution literature. We validate our deconvolution approach to MRA through both theory and numerical experiments.

2506.08244 2026-05-19 cs.LG cs.AI stat.ML

Algebraic Priors for Approximately Equivariant Networks

代数先验用于近似等变网络

Riccardo Ali, Pietro Liò, Jamie Vicary

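AI summary: The paper proposes a parameter-free algebraic approach, grounded in group representation theory, that imposes the group's regular representation as an inductive bias through an auxiliary loss. Experiments show that it matches or outperforms specialized models in several cases, including some designed for infinite groups.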

详情
AI中文摘要

等变神经网络通过群作用来整合对称性,将其作为归纳偏差以提高性能。现有方法在潜在空间中学习等变作用,或设计具有等变结构的架构。这些方法通常能获得良好的经验结果,但可能涉及架构特定的约束、大量参数和高计算成本。我们挑战复杂等变架构范式,提出一种无参数的方法,基于群表示理论。我们证明,对于有限群上的等变编码器,潜在空间几乎必然包含每个线性无关数据轨道的一个副本,我们通过多个实验证明这一点。利用这一基础的代数洞察,我们通过辅助损失将群的正则表示作为归纳偏差,不增加可学习参数。我们的广泛评估显示,该方法在多个任务中表现优异,甚至在无限群情况下也优于专门设计的模型。我们进一步通过消融研究验证了正则表示的选择,显示其在所有情况下均优于定义和平凡群表示的基线模型。

英文摘要

Equivariant neural networks incorporate symmetries through group actions, embedding them as an inductive bias to improve performance. Existing methods learn an equivariant action on the latent space, or design architectures that are equivariant by construction. These approaches often deliver strong empirical results but can involve architecture-specific constraints, large parameter counts, and high computational cost. We challenge the paradigm of complex equivariant architectures with a parameter-free approach grounded in group representation theory. We prove that for an equivariant encoder over a finite group, the latent space must almost surely contain one copy of its regular representation for each linearly independent data orbit, which we explore with a number of empirical studies. Leveraging this foundational algebraic insight, we impose the group's regular representation as an inductive bias via an auxiliary loss, adding no learnable parameters. Our extensive evaluation shows that this method matches or outperforms specialized models in several cases, even those for infinite groups. We further validate our choice of the regular representation through an ablation study, showing it consistently outperforms defining and trivial group representation baselines.

2504.03035 2026-05-19 stat.ML cs.LG math.PR math.ST stat.ME stat.TH

High-dimensional ridge regression with random features for non-identically distributed data with a variance profile

具有随机特征的高维岭回归:非同分布数据的方差轮廓

Issa-Mbenard Dabo, Jérémie Bigot

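AI summary: The paper studies high-dimensional ridge regression with random features for non-identically distributed data. Using a variance-profile model, it derives asymptotic equivalents for the training and test risks and shows how heterogeneous variance profiles affect generalization.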

详情
AI中文摘要

随机特征岭回归通常在高维情形下基于同质采样模型$x_i=Σ^{1/2}x_i'$进行分析,其中向量$x_i'$的各分量独立同分布,且所有样本共享同一协方差矩阵$Σ$。本文超越了这一设定,通过方差轮廓模型研究非同分布数据,其中训练和测试协变量具有依赖于行的对角协方差矩阵$Σ_i=\diag(γ_{i1}^2,\ldots,γ_{ip}^2)$和$\widetilde{Σ}_i=\diag(\tilde{γ}_{i1}^2,\ldots,\tilde{γ}_{ip}^2)$。我们的主要贡献是推导了当$n$、$p$和$m$按比例增长时,具有随机特征的岭回归的训练和测试风险的渐近等价量。第一组等价量由线性加混沌近似与交通概率(traffic probability)论证相结合得到,而第二组是确定性的,由算子值自由概率通过对角线上的融合(amalgamation)论证得到。这些等价量在数值实验中是精确的。它们还揭示了异质方差轮廓(包括受MNIST启发的混合型轮廓)如何改变泛化性能,并在岭参数较小时表现出双下降行为。

英文摘要

Random feature ridge regression is often analyzed in the high-dimensional regime under the homogeneous sampling model $x_i=Σ^{1/2}x_i'$, where the vectors $x_i'$ have iid entries and the same covariance matrix $Σ$ is shared by all samples. In this paper, we move beyond this setting and study non-identically distributed data through a variance-profile model in which the training and test covariates have row-dependent diagonal covariance matrices $Σ_i=\diag(γ_{i1}^2,\ldots,γ_{ip}^2)$ and $\widetildeΣ_i=\diag(\tildeγ_{i1}^2,\ldots,\tildeγ_{ip}^2)$. Our main contribution is the derivation of asymptotic equivalents for the training and test risks of ridge regression with random features when $n$, $p$, and $m$ grow proportionally. The first set of equivalents is obtained by combining the linear-plus-chaos approximation with traffic-probability arguments, whereas the second set is deterministic and follows from operator-valued free probability through an amalgamation-over-the-diagonal argument. These equivalents are sharp in numerical experiments. They also reveal how heterogeneous variance profiles, including mixture-type profiles inspired by MNIST, can modify generalization and exhibit double-descent behavior when the ridge parameter is small.

2502.02463 2026-05-19 stat.ML cs.LG

Distribution Transformers: Fast Approximate Bayesian Inference With On-The-Fly Prior Adaptation

分布变换器:通过实时先验适应实现快速近似贝叶斯推断

George Whittle, Juliusz Ziomek, Jacob Rawling, Maike A. Osborne

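AI summary: The paper proposes the Distribution Transformer, a novel architecture that can learn arbitrary distribution-to-distribution mappings. It performs fast approximate Bayesian inference with on-the-fly prior adaptation, reducing computation times from minutes to milliseconds while achieving log-likelihood performance on par with or superior to existing methods.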

详情
Comments
Spotlight acceptance at ICML 2026
AI中文摘要

尽管贝叶斯推断为在不确定性下的推理提供了原理性框架,但其广泛应用受到精确后验计算不可行的限制,需要使用近似推断。然而,现有方法通常计算成本高,或在先验变化时需要昂贵的重新训练,限制了其在如实时传感器融合等连续推断问题中的实用性。为了解决这些挑战,我们引入了分布变换器——一种新型架构,能够学习任意分布到分布的映射。我们的方法可以训练为将先验映射到对应的后验,条件于某些数据集——从而执行近似贝叶斯推断。我们的新型架构将先验分布表示为(通用近似)高斯混合模型(GMM),并将其实变为后验的GMM表示。GMM的组成部分通过自注意力机制相互关注,并通过交叉注意力机制与数据点相互作用。我们证明分布变换器在保持先验变化的灵活性的同时,显著减少了计算时间——从分钟到毫秒——并在序列推断、量子系统参数推断以及具有超先验的高斯过程预测后验推断等任务中实现了与现有近似推断方法相当或更优的对数似然性能。

英文摘要

While Bayesian inference provides a principled framework for reasoning under uncertainty, its widespread adoption is limited by the intractability of exact posterior computation, necessitating the use of approximate inference. However, existing methods are often computationally expensive, or demand costly retraining when priors change, limiting their utility, particularly in sequential inference problems such as real-time sensor fusion. To address these challenges, we introduce the Distribution Transformer -- a novel architecture that can learn arbitrary distribution-to-distribution mappings. Our method can be trained to map a prior to the corresponding posterior, conditioned on some dataset -- thus performing approximate Bayesian inference. Our novel architecture represents a prior distribution as a (universally-approximating) Gaussian Mixture Model (GMM), and transforms it into a GMM representation of the posterior. The components of the GMM attend to each other via self-attention, and to the datapoints via cross-attention. We demonstrate that Distribution Transformers both maintain flexibility to vary the prior and significantly reduce computation times -- from minutes to milliseconds -- while achieving log-likelihood performance on par with or superior to existing approximate inference methods across tasks such as sequential inference, quantum system parameter inference, and Gaussian Process predictive posterior inference with hyperpriors.

2501.14993 2026-05-19 math.OC stat.ML

Convergence Analysis of the Wasserstein Proximal Algorithm beyond Geodesic Convexity

超越测地凸性的Wasserstein近端算法收敛性分析

Shuailong Zhu, Xiaohui Chen

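AI summary: The paper gives a simple, self-contained convergence analysis of the Wasserstein proximal algorithm that does not assume geodesic convexity of the objective functional. Under a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality, the algorithm achieves an unbiased and linear convergence rate, improving on existing rates obtained under strong geodesic convexity. The analysis also extends to the inexact proximal algorithm for geodesically semiconvex objectives.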

详情
AI中文摘要

近端算法是一种强大的工具,用于在一般的度量空间中最小化非线性和非光滑泛函。受最近在均场 regime 下研究噪声梯度下降算法训练动态在两层神经网络中的进展启发,本文提供了一种简单且自包含的分析,用于分析一般用途的Wasserstein近端算法的收敛性,而无需假设目标泛函的测地凸性。在自然的Wasserstein类欧几里得Polyak-Łojasiewicz不等式的前提下,我们证明了近端算法具有无偏和线性收敛速率。我们的收敛速率优于现有在强测地凸性下求解Wasserstein梯度流的近端算法的收敛率。我们还扩展了我们的分析到半测地凸目标的近端算法。在我们的数值实验中,近端训练在均场神经网络上的收敛速率比噪声梯度下降算法更快。

英文摘要

The proximal algorithm is a powerful tool to minimize nonlinear and nonsmooth functionals in a general metric space. Motivated by the recent progress in studying the training dynamics of the noisy gradient descent algorithm on two-layer neural networks in the mean-field regime, we provide in this paper a simple and self-contained analysis for the convergence of the general-purpose Wasserstein proximal algorithm without assuming geodesic convexity of the objective functional. Under a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality, we establish that the proximal algorithm achieves an unbiased and linear convergence rate. Our convergence rate improves upon existing rates of the proximal algorithm for solving Wasserstein gradient flows under strong geodesic convexity. We also extend our analysis to the inexact proximal algorithm for geodesically semiconvex objectives. In our numerical experiments, proximal training demonstrates a faster convergence rate than the noisy gradient descent algorithm on mean-field neural networks.

2410.16307 2026-05-19 q-fin.ST stat.AP stat.ME

Functional Clustering of Discount Functions for Behavioral Investor Profiling

基于折扣函数的功能聚类用于行为投资者画像

Annamaria Porreca, Viviana Ventre, Roberta Martino, Salvador Cruz Rambaud, Fabrizio Maturo

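AI summary: The paper uses Functional Data Analysis to study temporal discounting behaviour across temperament types. It finds heterogeneity within each temperament, indicating that investor profiles are more diverse than previously thought, and draws practical implications for financial advisors tailoring strategies to individuals.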

详情
Journal ref
Applied Stochastic Models in Business and Industry 42(3), e70101 (2026)
AI中文摘要

经典金融模型基于投资者理性决策和利用所有可用信息的假设,但这些模型往往无法捕捉跨时期选择和不确定性决策中的异常现象,尤其是在考虑个人偏好和消费模式差异时。此类限制阻碍了传统金融理论回答关键问题:个人偏好如何影响投资决策?投资者行为的驱动力是什么?个体如何选择其投资组合?Pompian的四种行为投资者类型(BITs)模型是一个重要贡献,它将行为金融学研究与Keirsey的性格理论联系起来,强调了性格在金融决策中的作用。然而,传统参数模型难以捕捉这些不同性格如何影响跨时期决策,如个体如何评估现在与未来结果之间的权衡。为填补这一空白,本文采用功能数据分析(FDA)专门研究时间折扣行为,揭示不同性格类型在时间不确定性感知和管理中的细微模式。我们的发现表明每种性格类型内部都存在异质性,表明投资者画像比以往认为的更加多样。这种细化的分类提供了更深入的见解,揭示了性格在塑造跨时期金融决策中的作用,为金融顾问更好地制定针对个体风险偏好和决策风格的策略提供了实用意义。

英文摘要

Classical finance models are based on the premise that investors act rationally and utilize all available information when making portfolio decisions. However, these models often fail to capture the anomalies observed in intertemporal choices and decision-making under uncertainty, particularly when accounting for individual differences in preferences and consumption patterns. Such limitations hinder traditional finance theory's ability to address key questions like: How do personal preferences shape investment choices? What drives investor behaviour? And how do individuals select their portfolios? One prominent contribution is Pompian's model of four Behavioral Investor Types (BITs), which links behavioural finance studies with Keirsey's temperament theory, highlighting the role of personality in financial decision-making. Yet, traditional parametric models struggle to capture how these distinct temperaments influence intertemporal decisions, such as how individuals evaluate trade-offs between present and future outcomes. To address this gap, the present study employs Functional Data Analysis (FDA) to specifically investigate temporal discounting behaviours revealing nuanced patterns in how different temperaments perceive and manage uncertainty over time. Our findings show heterogeneity within each temperament, suggesting that investor profiles are far more diverse than previously thought. This refined classification provides deeper insights into the role of temperament in shaping intertemporal financial decisions, offering practical implications for financial advisors to better tailor strategies to individual risk preferences and decision-making styles.

2406.16859 2026-05-19 stat.ME

On the extensions of the Chatterjee-Spearman test

关于Chatterjee-Spearman检验的扩展

Qingyang Zhang

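AI summary: The paper examines three extensions of the rank-based Chatterjee-Spearman combined test: a symmetrized version, combinations of Chatterjee's correlation with other rank correlations such as Kendall's tau and quadrant correlation, and two possible multivariate extensions.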

详情
Comments
46 pages, 8 figures
AI中文摘要

Chatterjee (2021) 引入了一种新颖的独立性检验,该检验基于秩,渐近正态且对所有替代假设一致。Chatterjee检验的一个局限性是其在检测单调关系时统计功效较低。为了解决这一局限性,在我们之前的工作(Zhang, 2024, Commun. Stat. - Theory Methods)中,我们提出了将Chatterjee和Spearman相关性结合为最大型检验,并建立了渐近联合正态性。本工作考察了联合检验的三个关键扩展。首先,受其原始非对称形式的启发,我们将Chatterjee-Spearman检验扩展为对称版本,并推导了对称统计量的渐近零分布。其次,我们研究了Chatterjee相关性与其他流行秩相关性(包括Kendall's tau和 quadrant 相关性)之间的关系。我们证明,在独立性下,Chatterjee相关性和这些秩相关性渐近联合正态且独立。模拟研究显示,Chatterjee-Kendall检验的效力优于Chatterjee-Spearman检验。最后,我们探讨了两种可能的多变量扩展。这些扩展扩展了基于秩的联合检验在更广泛场景中的适用性。

英文摘要

Chatterjee (2021) introduced a novel independence test that is rank-based, asymptotically normal and consistent against all alternatives. One limitation of Chatterjee's test is its low statistical power for detecting monotonic relationships. To address this limitation, in our previous work (Zhang, 2024, Commun. Stat. - Theory Methods), we proposed to combine Chatterjee's and Spearman's correlations into a max-type test and established the asymptotic joint normality. This work examines three key extensions of the combined test. First, motivated by its original asymmetric form, we extend the Chatterjee-Spearman test to a symmetric version, and derive the asymptotic null distribution of the symmetrized statistic. Second, we investigate the relationships between Chatterjee's correlation and other popular rank correlations, including Kendall's tau and quadrant correlation. We demonstrate that, under independence, Chatterjee's correlation and any of these rank correlations are asymptotically joint normal and independent. Simulation studies demonstrate that the Chatterjee-Kendall test has better power than the Chatterjee-Spearman test. Finally, we explore two possible extensions to the multivariate case. These extensions expand the applicability of the rank-based combined tests to a broader range of scenarios.
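To make the two ingredients of the combined test concrete, the sketch below computes Chatterjee's rank correlation and Spearman's rho for samples without ties, using the usual no-ties formulas. It is only an illustration: the helper names are hypothetical, and the standardization and max-type combination used in the paper are not reproduced here because the abstract does not spell them out.

```python
def ranks(values):
    """Ranks 1..n of the entries of values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def chatterjee_xi(x, y):
    """Chatterjee's xi for samples without ties.

    Sort the pairs by x, take the ranks r_i of the corresponding y values,
    and return 1 - 3 * sum(|r_{i+1} - r_i|) / (n^2 - 1).
    """
    n = len(x)
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch assumes no ties in x or y")
    y_ranks = ranks(y)
    by_x = sorted(range(n), key=lambda i: x[i])
    r = [y_ranks[i] for i in by_x]
    jumps = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1.0 - 3.0 * jumps / (n * n - 1)


def spearman_rho(x, y):
    """Spearman's rho for samples without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# A monotone relationship: rho is 1, while xi equals (n - 2) / (n + 1) at this sample size.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * v + 1.0 for v in xs]
xi_monotone, rho_monotone = chatterjee_xi(xs, ys), spearman_rho(xs, ys)
```

On small monotone samples Spearman's rho reaches its maximum while xi stays well below 1, which is consistent with the power gap for monotone alternatives that motivates combining the two statistics.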

2307.08643 2026-05-19 cs.LG stat.ML

Corruptions of Supervised Learning Problems: Typology and Mitigations

监督学习问题的腐败:类型与缓解方法

Laura Iacovissi, Nan Lu, Robert C. Williamson

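AI summary: The paper develops a general theory of corruption for supervised learning problems. By modelling changes to the underlying probability distributions through Markov kernels, it unifies existing corruption models and studies mitigations for various corruption types.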

详情
Comments
73 pages. To be published in Journal of Machine Learning Research 27 (2026) 1-73
AI中文摘要

腐败在数据收集中普遍存在。尽管已有大量研究,现有文献主要集中在特定设置和学习场景,缺乏对腐败建模和缓解的统一视角。本文开发了一种通用的腐败理论,涵盖监督学习问题的所有修改,包括模型类和损失的变化。通过分析底层概率分布的变化,我们的方法带来了三个新机会:首先,构建了一个新型且可证明的腐败框架,区分不同类型的腐败;其次,通过比较清洁和受污染场景下的贝叶斯风险,系统分析了腐败对学习任务的影响;第三,基于这些结果,我们研究了各种腐败类型的缓解方法。我们扩展了现有的标签腐败损失修正方法以处理依赖性腐败类型。我们的发现强调了将经典腐败修正学习框架推广到更宽松的范式以涵盖更多腐败类型的必要性。我们提供了这种范式以及属性和联合腐败情况下的损失修正公式。

英文摘要

Corruption is notoriously widespread in data collection. Despite extensive research, the existing literature predominantly focuses on specific settings and learning scenarios, lacking a unified view of corruption modelization and mitigation. In this work, we develop a general theory of corruption, which incorporates all modifications to a supervised learning problem, including changes in model class and loss. Focusing on changes to the underlying probability distributions via Markov kernels, our approach leads to three novel opportunities. First, it enables the construction of a novel, provably exhaustive corruption framework, distinguishing among different corruption types. This serves to unify existing models and establish a consistent nomenclature. Second, it facilitates a systematic analysis of corruption's consequences on learning tasks, by comparing Bayes risks in the clean and corrupted scenarios. Notably, while label corruptions affect only the loss function, attribute corruptions additionally influence the hypothesis class. Third, building upon these results, we investigate mitigations for various corruption types. We expand existing loss-correction methods for label corruption to handle dependent corruption types. Our findings highlight the necessity to generalize this classical corruption-corrected learning framework to a new paradigm with weaker requirements to encompass more corruption types. We provide such a paradigm as well as loss correction formulas in the attribute and joint corruption cases.

2305.18578 2026-05-19 stat.ME cs.LG stat.ML

Quick Adaptive Ternary Segmentation: An Efficient Decoding Procedure For Hidden Markov Models

快速自适应三元分割:一种适用于隐马尔可夫模型的高效解码过程

Alexandre Mösching, Housen Li, Axel Munk

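AI summary: The paper proposes Quick Adaptive Ternary Segmentation (QATS), a divide-and-conquer decoding procedure whose computational complexity is polylogarithmic in the sequence length and cubic in the size of the state space, making it suited to large-scale hidden Markov models with relatively few states. It approximately maximizes local likelihood scores through an adaptive search, and simulations show speedups over Viterbi and PMAP, accompanied by a precision analysis.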

详情
Journal ref
Journal of Computational and Graphical Statistics, 35(2), 865-879, 2026
AI中文摘要

隐马尔可夫模型(HMMs)由一个不可观测的马尔可夫链和一个可观测的过程组成——隐藏链的噪声版本。从噪声观测中解码原始信号是几乎所有基于HMM的数据分析的主要目标。现有的解码算法,如维特比算法和点最大后验(PMAP)算法,其计算复杂度在最坏情况下是观测序列长度的线性函数,或隐藏链状态空间大小的亚二次函数。我们提出了快速自适应三元分割(QATS),一种分治策略,其计算复杂度在序列长度上为多项对数,在状态空间大小上为三次方,因此特别适用于具有相对较少状态的大规模HMM。它还提出了一种有效的数据存储方法,即特定的累积和。本质上,估计的状态序列在所有最多三个段的局部路径中最大化局部似然得分,并且是可接受的。最大化仅通过自适应搜索过程近似进行。我们的模拟展示了QATS相比维特比和PMAP的速度提升,以及精度分析。QATS的实现可在GitHub上的R包QATS中找到。

英文摘要

Hidden Markov models (HMMs) are characterized by an unobservable Markov chain and an observable process -- a noisy version of the hidden chain. Decoding the original signal from the noisy observations is one of the main goals in nearly all HMM based data analyses. Existing decoding algorithms such as Viterbi and the pointwise maximum a posteriori (PMAP) algorithm have computational complexity at best linear in the length of the observed sequence, and sub-quadratic in the size of the state space of the hidden chain. We present Quick Adaptive Ternary Segmentation (QATS), a divide-and-conquer procedure with computational complexity polylogarithmic in the length of the sequence, and cubic in the size of the state space, hence particularly suited for large scale HMMs with relatively few states. It also suggests an effective way of data storage as specific cumulative sums. In essence, the estimated sequence of states sequentially maximizes local likelihood scores among all local paths with at most three segments, and is meanwhile admissible. The maximization is performed only approximately using an adaptive search procedure. Our simulations demonstrate the speedups offered by QATS in comparison to Viterbi and PMAP, along with a precision analysis. An implementation of QATS is in the R-package QATS on GitHub.
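For context on the baseline that QATS is compared against, here is a minimal log-space sketch of the classical Viterbi decoder, whose cost grows linearly with the sequence length. It is not QATS and not the authors' R implementation. The function signature and the toy two-state model in the usage lines are assumptions made for illustration.

```python
import math


def viterbi(obs, init, trans, emit):
    """Most likely hidden state path given the observations.

    init[s]: P(first state = s); trans[r][s]: P(next = s | current = r);
    emit[s][o]: P(observation o | state s). Scores are kept in log space.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    m = len(init)
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(m)]
    back = []  # back[t][s]: best predecessor of state s at time t + 1
    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(m):
            best = max(range(m), key=lambda r: prev[r] + log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(pointers)
    # Trace the best final state back to the start.
    state = max(range(m), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy two-state chain that rarely switches and emits its own label 80% of the time.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
decoded = viterbi([0, 0, 1, 1, 1], init, trans, emit)
```

The outer loop visits every observation once, which is the linear dependence on sequence length that the abstract contrasts with the polylogarithmic cost of QATS.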

2211.09284 2026-05-19 eess.SP cs.NA math.NA stat.ME

Iterative execution of discrete and inverse discrete Fourier transforms with applications for signal denoising via sparsification

迭代执行离散和反向离散傅里叶变换及其在信号去噪中的应用

H. Robert Frost

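AI summary: The paper describes a family of iterative algorithms that repeatedly execute discrete and inverse discrete Fourier transforms. One variant applies a sparsification operation to both the real-domain and frequency-domain data and is useful for signal denoising, in particular recovering a periodic spike signal in Gaussian noise.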

详情
AI中文摘要

我们描述了一类迭代算法,其涉及离散和反向离散傅里叶变换的反复执行。这一家族中的一员受到离散傅里叶变换不确定性原理的启发,通过在时域和频域数据中应用稀疏化操作,当时域稀疏性达到稳定模式时收敛。这种稀疏化变体在信号去噪中有实际应用,特别是恢复在高斯噪声中存在的周期性尖峰信号。通过模拟研究展示了通用收敛性质和与现有方法相比的去噪性能。实现该技术的R包及相关资源可在https://hrfrost.host.dartmouth.edu/IterativeFT找到。

英文摘要

We describe a family of iterative algorithms that involve the repeated execution of discrete and inverse discrete Fourier transforms. One interesting member of this family is motivated by the discrete Fourier transform uncertainty principle and involves the application of a sparsification operation to both the real domain and frequency domain data with convergence obtained when real domain sparsity hits a stable pattern. This sparsification variant has practical utility for signal denoising, in particular the recovery of a periodic spike signal in the presence of Gaussian noise. General convergence properties and denoising performance relative to existing methods are demonstrated using simulation studies. An R package implementing this technique and related resources can be found at https://hrfrost.host.dartmouth.edu/IterativeFT.
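The loop described in the abstract can be sketched as follows: sparsify in the real domain, transform, sparsify in the frequency domain, transform back, and stop once the real-domain sparsity pattern no longer changes. This is only a sketch under stated assumptions. The thresholding rule (zero everything below the mean magnitude), the function names, and the `max_iter` cap are hypothetical choices, and the IterativeFT R package may work differently. A naive O(n^2) transform is used to keep the example self-contained.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def sparsify(values):
    """Zero every entry whose magnitude is below the mean magnitude (assumed rule)."""
    mags = [abs(v) for v in values]
    cutoff = sum(mags) / len(mags)
    return [v if m >= cutoff else 0.0 for v, m in zip(values, mags)]


def support(values):
    """Indices of the non-zero entries."""
    return [i for i, v in enumerate(values) if v != 0.0]


def iterative_ft_sparsify(signal, max_iter=100):
    """Alternate sparsification in both domains until the real-domain support is stable."""
    x = sparsify(list(signal))
    pattern = support(x)
    for _ in range(max_iter):
        spectrum = sparsify(dft(x))
        x = sparsify([z.real for z in dft(spectrum, inverse=True)])
        new_pattern = support(x)
        if new_pattern == pattern:
            break
        pattern = new_pattern
    return x


# A clean periodic spike train is already a fixed point of the iteration.
spikes = [1.0 if i % 4 == 0 else 0.0 for i in range(16)]
recovered = iterative_ft_sparsify(spikes)
```

A periodic spike train is sparse in both domains, which is why thresholding on each side of the transform tends to preserve it while suppressing noise that is spread across all coefficients.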

2605.18134 2026-05-19 stat.CO stat.ME

Optimal Sampling for Kernel Quadrature on Unbounded Domains

无界域上核求积的最优采样

Edoardo Bandoni, Christian Robert, Julien Stoehr

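AI summary: The paper studies randomized quadrature methods with a focus on robustness rather than kernel-specific optimality. It constructs an explicit, n-dependent sampling distribution that achieves minimax rates for the worst-case error without knowledge of the kernel, covers unbounded domains, and provides both theoretical guarantees and a practical recipe for robust, rate-optimal randomized quadrature.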

详情
AI中文摘要

核 quadrature 广泛用于近似光滑函数的积分,其最坏情况误差通常以最小最大速率 $n^{-α/d}$ 衰减,其中 α 是平滑度,d 是维度。现有最优速率方法通常依赖于针对特定核定制的确定性点集,使其对规格错误敏感且在实践中不够稳健。在本文中,我们研究了随机 quadrature 方法,重点在于鲁棒性而非核特定的最优性。我们构造了一个显式且依赖于 n 的采样分布,能够在不需了解核的情况下实现最小最大误差率,适用于平滑性类。我们的分析包括无界采样措施,如高斯和学生 t 分布,扩展到紧域之外。结果提供了理论保证和实用的鲁棒最优随机 quadrature 方法的配方。

英文摘要

Kernel quadrature is widely used to approximate integrals of smooth functions, with worst-case error typically decaying at the minimax rate $n^{-α/d}$ for smoothness $α$ in dimension $d$. Existing rate-optimal methods often depend on deterministic point sets tailored to a specific kernel, making them sensitive to misspecification and less robust in practice. In this work, we study randomized quadrature methods with a focus on robustness rather than kernel-specific optimality. We construct an explicit, $n$-dependent sampling distribution that achieves minimax rates for worst-case error over smoothness classes without requiring knowledge of the kernel. This kernel-agnostic design improves robustness while retaining optimal rates. Our analysis includes unbounded sampling measures such as Gaussian and Student-$t$ distributions, extending beyond compact domains. The results provide both theoretical guarantees and a practical recipe for robust, rate-optimal randomized quadrature.

2605.18100 2026-05-19 math.ST stat.TH

Uncertainty functionals revisited: Concavity and Jensen's inequality

重新审视不确定性泛函:凹性与Jensen不等式

Julien Bect, Xujia Zhu

AI总结 本文研究一般可测空间上的不确定性泛函及其在实验设计和全局敏感性分析中的作用,指出当底层可测空间为无限时,凹性是Jensen不等式成立的必要条件但并非充分条件,并给出了可行的充分条件。

详情
AI中文摘要

本文对一般可测空间上的不确定性泛函进行了理论研究。这些泛函是实验设计和全局敏感性分析的基础,用于量化概率模型中的变异性与信息含量。正如DeGroot在1962年的开创性文章中首次阐明的,一个自然的要求是:在获得额外信息时,不确定性平均而言应当减少。这一要求等价于概率测度空间上概率形式的Jensen不等式。我们的主要结果表明,只要底层可测空间是无限的,凹性就是Jensen不等式成立的必要条件,但并不充分。我们还给出了可行的充分条件,以保证所期望的性质成立。这些结果有助于为不确定性量化建立更清晰的数学基础。文中还提出了若干开放性问题。

英文摘要

This article presents a theoretical study of uncertainty functionals on general measurable spaces. These functionals are fundamental in experimental design and global sensitivity analysis, where they are used to quantify variability and information content in probabilistic models. As first articulated in DeGroot's seminal 1962 article, a natural requirement is that uncertainty should decrease on average when additional information is obtained. This requirement is equivalent to the probabilistic form of Jensen's inequality on the space of probability measures. Our main results show that concavity is necessary but not sufficient for Jensen's inequality to hold whenever the underlying measurable space is infinite. We also provide practicable sufficient conditions under which the desired property holds. These results contribute to a clearer mathematical foundation for uncertainty quantification. Several open questions are formulated.
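
The requirement that uncertainty decrease on average can be checked numerically in the simplest setting. The sketch below assumes a finite space and takes Shannon entropy as the uncertainty functional; both choices, the function names and the toy numbers are illustrative assumptions, and the infinite-space subtleties that the abstract is actually about are not covered.

```python
import math

def entropy(p):
    """Shannon entropy of a finite probability vector (natural log)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def expected_posterior_entropy(prior, likelihood):
    """Average entropy of the posterior over X after observing Y.

    likelihood[y][x] is P(Y = y | X = x); each column over y sums to 1.
    """
    total = 0.0
    for lik_y in likelihood:
        joint = [px * l for px, l in zip(prior, lik_y)]
        p_y = sum(joint)  # marginal probability of observing y
        if p_y > 0:
            total += p_y * entropy([j / p_y for j in joint])
    return total

# Toy example (hypothetical numbers): three states, a binary observation.
prior = [0.5, 0.3, 0.2]
likelihood = [[0.9, 0.2, 0.1],
              [0.1, 0.8, 0.9]]
```

Here `expected_posterior_entropy(prior, likelihood)` does not exceed `entropy(prior)`, which is Jensen's inequality for the concave entropy functional applied to the posterior averaged over observations.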

2605.18069 2026-05-19 stat.ML cs.LG math.PR math.ST stat.TH

Wasserstein bounds for denoising diffusion probabilistic models via the Föllmer process

通过Föllmer过程研究去噪扩散概率模型的Wasserstein界限

Yuta Koike

AI总结 本文研究了去噪扩散概率模型(DDPMs)在2-Wasserstein距离下的采样误差界限,提出了三种核心贡献:一是基于一般Lipschitz型条件和广泛方差调度(包括余弦调度),建立了最优的上界;二是证明了相同的Lipschitz型条件蕴含对数Sobolev不等式和二次运输成本不等式;三是展示了对于一般的对数凹目标分布,即使没有二次运输成本不等式,最优的Wasserstein误差界限仍可达到。

详情
Comments
45 pages
AI中文摘要

本文研究了去噪扩散概率模型(DDPMs)在2-Wasserstein距离下的采样误差界限。我们的贡献有三个方面。 (i) 在一般Lipschitz型条件和广泛方差调度(包括余弦调度)下,我们建立了最优的上界,该上界在维度和步骤数上都是最优的,并恢复了文献中已获得的几个最优误差界限。 (ii) 我们证明了相同的Lipschitz型条件,涵盖了通常施加于(学习的)得分函数的条件,蕴含对数Sobolev不等式以及DDPM的二次运输成本不等式。因此,在现有工作的覆盖设置中,最优的Wasserstein界限(在对数因子范围内)可以从最近在Kullback-Leibler散度下的最优误差界限中推导出来。 (iii) 我们展示了对于一般的对数凹目标分布,即使没有目标的二次运输成本不等式,最优的Wasserstein误差界限仍可达到。我们的分析基于将DDPM采样器视为Föllmer过程的离散化,而不是传统的反向Ornstein-Uhlenbeck过程。

英文摘要

This paper studies sampling error bounds for denoising diffusion probabilistic models (DDPMs) in the 2-Wasserstein distance. Our contributions are threefold. (i) Under general Lipschitz-type conditions on the score function and for a broad class of variance schedules, including the cosine schedule, we establish sharp upper bounds that are optimal in both the dimension and the number of steps, and recover several sharp error bounds previously obtained in the literature. (ii) We prove that the same Lipschitz-type conditions, which encompass those commonly imposed on the (learned) score, imply a logarithmic Sobolev inequality and hence a quadratic transportation cost inequality for the DDPM. As a consequence, in settings covered by existing work, an optimal Wasserstein bound, up to a logarithmic factor, follows from the recently obtained sharp error bound in the Kullback-Leibler divergence under geometric-type variance schedules. (iii) We show that for general log-concave target distributions, the optimal Wasserstein error bound remains attainable even without a quadratic transportation cost inequality for the target. Our analysis is based on viewing the DDPM sampler as a discretization of the Föllmer process rather than the conventional reverse Ornstein-Uhlenbeck process.
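
For readers unfamiliar with the cosine schedule mentioned above, a minimal sketch follows. It uses the parameterization commonly attributed to Nichol and Dhariwal (offset `s = 0.008`, clipping at `0.999`); that the paper uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math

def cosine_f(t, T, s=0.008):
    """Unnormalized cumulative signal level f(t) = cos^2(((t/T + s)/(1 + s)) * pi/2)."""
    return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

def cosine_betas(T, s=0.008, max_beta=0.999):
    """Per-step variances beta_t = 1 - f(t)/f(t-1), clipped at max_beta."""
    betas = []
    for t in range(1, T + 1):
        ratio = cosine_f(t, T, s) / cosine_f(t - 1, T, s)
        betas.append(min(1.0 - ratio, max_beta))
    return betas
```

The resulting variances are small at early steps and grow toward the clipping value at the final step, which is the qualitative behaviour any analysis covering this schedule has to accommodate.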

2605.18042 2026-05-19 cs.DS stat.ML

On efficient robust regression with subquadratic samples

关于使用亚二次样本的高效鲁棒回归

Deeksha Adil, Jarosław Błasiok, Hongjie Chen, Deepak Narayanan Sridharan

AI总结 本文研究高斯协变量下协方差矩阵未知、条件数为κ的鲁棒线性回归问题,提出了一种近线性时间算法,使用Õ(d/ε^4)个样本,在εκ≲1的条件下达到预测误差O(√(εκ)),改进了所有先前工作。同时,统计查询(SQ)下界表明,当εκ≲1时,误差达到o(√(εκ))的高效SQ算法所需的查询要用Ω(d²)个样本才能模拟。最后,低次多项式下界表明,在不假设εκ≲1的情况下,高效算法可能需要Ω̃(min{dε²κ², ε²d²})个样本才能显著优于始终猜测0的平凡估计器。

详情
Comments
Accepted at COLT 2026
AI中文摘要

我们重新审视高斯协变量下的鲁棒线性回归问题,其中协方差矩阵未知,条件数为κ。对于这一基本问题,我们对高效算法在样本复杂度、条件数、运行时间和预测误差之间的权衡的理解仍存在显著空白。我们的第一个结果是一个近线性时间算法,使用Õ(d/ε^4)个样本,其中d是维度,ε是污染率,并在条件εκ≲1下达到预测误差O(√(εκ)),改进了所有先前工作。作为补充,我们给出一个统计查询(SQ)下界,表明当εκ≲1时,误差达到o(√(εκ))的高效SQ算法所需的查询要用Ω(d²)个样本才能模拟。最后,我们证明了一个低次多项式下界,它提供了细粒度的证据,表明在没有εκ≲1这类假设的情况下,高效算法可能需要Ω̃(min{dε²κ², ε²d²})个样本才能显著优于始终猜测0的平凡估计器。

英文摘要

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $κ$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/ε^4)$ samples, where $d$ is the dimension and $ε$ is the corruption rate, and achieves prediction error $O(\sqrt{εκ})$ under the condition $εκ\lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{εκ})$ when $εκ\lesssim 1$ require queries that take $Ω(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $εκ\lesssim 1$, efficient algorithms may require $\widetilde{Ω}\left(\min\{dε^{2}κ^{2},\ ε^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.

2605.18040 2026-05-19 stat.ML cs.LG math.PR

A note on connections between the Föllmer process and the denoising diffusion probabilistic model

关于Föllmer过程与去噪扩散概率模型之间联系的注记

Yuta Koike

AI总结 本文探讨了Föllmer过程与去噪扩散概率模型(DDPM)之间的联系,指出离散化的Föllmer过程可以作为DDPM采样器的自然超参数设置,并系统地恢复了DDPM采样误差界的结果。

详情
Comments
32 pages
AI中文摘要

Föllmer过程是以时刻1处具有预先指定的分布为条件的布朗运动。该过程可以被解释为去噪扩散概率模型(DDPM)的逆向随机微分方程(SDE)的“增强”时间压缩版本。尽管这一事实已被间接用于通过逆向SDE的离散化来分析DDPM采样误差,但Föllmer过程的直接离散化与DDPM采样器之间的联系尚未得到充分探讨。本文旨在澄清这一点,并综述现有工作中的相关结果。我们证明离散化的Föllmer过程给出了DDPM采样器的自然超参数设置。此外,这使我们能够系统地恢复DDPM采样误差界的最新结果,并略有改进。

英文摘要

The Föllmer process is a Brownian motion conditioned to have a pre-specified distribution at time 1. This process can be interpreted as an "augmented" time-compressed version of the reverse stochastic differential equation (SDE) for the denoising diffusion probabilistic model (DDPM). While this fact has been indirectly used to analyze DDPM sampling errors via discretization of the reverse SDE, connections between direct discretization of the Föllmer process and the DDPM sampler have not yet been fully explored. This note aims to clarify this point while surveying relevant results from existing work. We show that discretized Föllmer processes give natural hyper-parameter settings of the DDPM sampler. Moreover, this allows us to systematically recover state-of-the-art results on DDPM sampling error bounds with slight improvements.
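
A one-dimensional toy case makes the first sentence concrete. Writing the Föllmer drift as the gradient of log P_{1-t} f, where f is the density of the target with respect to the standard Gaussian and P is the heat semigroup, the target N(m, 1) gives f(x) = exp(mx - m^2/2) and hence a constant drift m. The sketch below discretizes that case with Euler-Maruyama; the choice of target, the function name and the step count are assumptions made purely for illustration, and this is not the DDPM parameterization discussed in the note.

```python
import math
import random

def follmer_gaussian_shift(m, n_paths=20000, n_steps=10, seed=0):
    """Euler-Maruyama for dX_t = m dt + dB_t on [0, 1], started at X_0 = 0.

    For the target N(m, 1) the Follmer drift is the constant m, so X_1
    should be distributed as N(m, 1).
    """
    rng = random.Random(seed)
    h = 1.0 / n_steps
    finals = []
    for _ in range(n_paths):
        x = 0.0
        for _ in range(n_steps):
            x += m * h + math.sqrt(h) * rng.gauss(0.0, 1.0)
        finals.append(x)
    return finals
```

With `m = 1.5`, the sample mean and variance of the returned terminal values should be close to 1.5 and 1 respectively. For general targets the drift depends on both time and state and has to be estimated, which is where the connection to learned scores enters.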

2605.18030 2026-05-19 stat.ME

A robust nonparametric test for spatial isotropy in lattice data

一种用于格点数据空间各向同性的稳健非参数检验

Jana Gierse, Roland Fried

AI总结 本文针对二维规则网格上的空间数据,提出了一种基于变异函数的稳健各向同性检验。该方法比较相同距离、不同方向上的变异函数估计值,采用稳健的变异函数估计量,并提出块置换这一重采样方法,从而对异常值具有鲁棒性。

详情
Comments
33 pages, 11 figures, 7 tables
AI中文摘要

本文针对二维规则网格上的空间数据,提出了一种基于变异函数的稳健各向同性检验。该检验以Guan等人(2004)提出的非稳健子采样各向同性检验为基础,后者的思想是比较相同距离、不同方向上的变异函数估计值。稳健检验采用基于单变量或多变量散布估计量的稳健变异函数估计量,在存在孤立异常值或块状异常值时表现良好。此外,本文还提出了另一种重采样方法,称为块置换。与子采样检验相比,块置换检验即使在数据存在强相关性时也能保持显著性水平,并且对异常值具有鲁棒性。文中以Landsat 8卫星数据为例说明了这些方法,其中可能因云层等原因出现异常值块。

英文摘要

This paper proposes a robust test for assessing isotropy based on the variogram of spatial data on a two-dimensional regular grid. The test is based on the non-robust subsampling test for isotropy of Guan et al. (2004), which uses the idea of comparing variogram estimates in different directions at the same distance. The robust test employs robust variogram estimators which are based on estimators of univariate or multivariate scatter and perform well in the presence of isolated or block outliers. Additionally, a different resampling method, called block permutation, is proposed. Compared with the subsampling test, the block permutation test maintains the significance level even for strong dependencies in the data and is robust to outliers. The methods are illustrated by an application to Landsat 8 satellite data, where outlier blocks may occur due to, for example, clouds.
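
The idea of comparing variogram estimates in different directions at the same distance can be illustrated as follows. This sketch uses the classical (non-robust) Matheron estimator on a rectangular grid; the robust estimators and the block permutation scheme of the paper are not reproduced, and the function name is hypothetical.

```python
def directional_variogram(grid, dr, dc):
    """Classical semivariogram estimate at lag (dr, dc), with dr, dc >= 0.

    grid is a list of equally long rows; the estimate is half the mean
    squared difference over all pairs of cells separated by the lag.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    squared_diffs = [
        (grid[r + dr][c + dc] - grid[r][c]) ** 2
        for r in range(n_rows - dr)
        for c in range(n_cols - dc)
    ]
    return 0.5 * sum(squared_diffs) / len(squared_diffs)

# A field that varies only from row to row is clearly anisotropic.
grid = [[r for _ in range(4)] for r in range(5)]
```

For this grid the estimate at lag `(1, 0)` is 0.5 while the estimate at lag `(0, 1)` is 0, so the two directions disagree at the same distance. An isotropy test turns such differences into a test statistic and calibrates it by resampling.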

2605.18022 2026-05-19 cs.LG cs.AI stat.ML

Unveiling Memorization-Generalization Coexistence: A Case Study on Arithmetic Tasks with Label Noise

揭示记忆与泛化共存:在带有标签噪声的算术任务中的案例研究

Linyu Liu, Pinyan Lu

AI总结 本文研究了在高过参数化模型中如何同时记忆噪声标签和泛化,通过模运算任务中的实验发现,适当优化和模型配置下大模型泛化能力更强,噪声标签被更快记忆,而过参数化模型内部形成泛化结构,但输出被拟合噪声标签的需求所抑制。通过频率方法提取内部结构可实现高准确率,提出任务无关方法将网络分为泛化和记忆组件,尽管该子网络提升泛化能力,但相比频率提取方法仍有局限,表明泛化结构分布于神经元中,需要新工具来检索过参数化网络中的可泛化知识。

详情
Comments
27 pages, 32 figures
AI中文摘要

高度过参数化的模型可以同时记忆噪声标签并良好泛化,但如何这些行为共存仍不明确。本文通过模运算任务在重噪声标签下研究其内在机制。通过在两层神经网络上的广泛实验发现,适当优化和模型配置下大模型泛化能力更强,而噪声标签被更快记忆。过参数化模型内部形成泛化结构,但其在输出中的表达被拟合噪声标签的需求所抑制。值得注意的是,即使在80%的标签噪声下,通过频率方法提取内部结构也可实现接近完美的测试准确率。我们进一步提出一种任务无关的方法将网络分为泛化和记忆组件。尽管该子网络提升泛化能力,但相比频率提取方法仍有局限,表明泛化结构分布于神经元中,需要新工具来检索过参数化网络中的可泛化知识。

英文摘要

Highly over-parameterized models can simultaneously memorize noisy labels and generalize well, yet how these behaviors coexist remains poorly understood. In this work, we investigate the underlying mechanisms of this coexistence using modular arithmetic tasks under heavy label noise. Through extensive experiments on two-layer neural networks, we find that larger models tend to generalize better under appropriate optimization and model configurations, while noisy labels are memorized faster than clean data. Over-parameterized models internally form a generalization structure, but its expression in the output is suppressed by the need to fit noisy labels. Remarkably, even with 80% label noise, near-perfect test accuracy can be achieved by extracting this internal structure using frequency-based methods. We further propose a task-agnostic method to partition networks into generalization and memorization components. Although this subnetwork improves generalization, it is limited compared with frequency-based extraction, indicating that the generalization structure is distributed across neurons and motivating the development of new tools to retrieve generalizable knowledge from over-parameterized networks.
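
To make the experimental setting concrete, the following sketch builds a modular-addition dataset with a fixed fraction of corrupted labels. The choice of addition as the task, the rule that a corrupted label is always wrong, and the function name are assumptions for illustration; the abstract does not specify the noise model.

```python
import random

def noisy_modular_addition(p, noise_rate, seed=0):
    """Return (a, b, label, clean_label) for every pair a, b in Z_p.

    A fraction noise_rate of the pairs receives a label drawn uniformly
    from the p - 1 incorrect residues (assumed noise model).
    """
    rng = random.Random(seed)
    pairs = [(a, b) for a in range(p) for b in range(p)]
    n_noisy = round(noise_rate * len(pairs))
    noisy_idx = set(rng.sample(range(len(pairs)), n_noisy))
    data = []
    for i, (a, b) in enumerate(pairs):
        clean = (a + b) % p
        if i in noisy_idx:
            label = (clean + rng.randrange(1, p)) % p  # shifted by 1..p-1, never correct
        else:
            label = clean
        data.append((a, b, label, clean))
    return data
```

Keeping the clean label alongside the noisy one makes it possible to measure memorization (agreement with `label` on corrupted rows) and generalization (agreement with `clean_label` on held-out pairs) separately.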

2605.18019 2026-05-19 stat.ML q-fin.CP

A data-driven Fourier-mixture neural-network method for density estimation

一种数据驱动的傅里叶-混合神经网络方法用于密度估计

Duy-Minh Dang, Volter Entoma

AI总结 本文提出了一种数据驱动的傅里叶训练神经网络方法,用于从经验特征函数信息估计固定期限的概率密度。该估计器是一种具有闭式特征函数的正高斯-拉普拉斯混合分布,可以在傅里叶空间中直接训练,同时保持非负性和单位质量。研究考虑了两种采样设置,并分析了不同情况下误差界和计算复杂性。

详情
Comments
27 pages, 4 figures
AI中文摘要

我们提出了一种数据驱动的傅里叶训练神经网络方法,用于从经验特征函数(CF)信息估计固定期限概率密度。估计器是一种具有闭式特征函数的正高斯-拉普拉斯混合分布,因此可以在傅里叶空间中直接训练,同时保持非负性和单位质量。我们考虑了两种采样设置。在直接的i.i.d.采样设置中,该方法是针对由i.i.d.样本构造的经验CF进行训练的。在基于重采样的伪采样设置中,它则是针对由依赖数据通过重采样构造的经验伪CF进行训练的。对于直接的i.i.d.情况,我们推导出一个期望的L2误差界,该界将傅里叶截断、经验训练误差、离散化和CF采样误差分开。对于伪采样情况,我们获得了一个条件类比,其中包含两个额外的伪定律差异项。我们开发了该框架的多维扩展,并分析了其计算复杂性。数值实验显示,该方法在高斯混合基准上表现与期望最大化相当,在重尾目标上具有明显优势,L2误差衰减与理论一致,在明确设定下,且能够有效估计从重采样依赖数据中的一年澳大利亚股票收益分布。

英文摘要

We propose a data-driven Fourier-trained neural-network method for estimating fixed-horizon probability densities from empirical characteristic-function (CF) information. The estimator is a positive Gaussian-Laplace mixture with closed-form CF, so training can be performed directly in Fourier space while preserving nonnegativity and unit mass. We consider two sampling settings. In the direct i.i.d. sampling setting, the method is trained against an empirical CF constructed from i.i.d. samples. In the resampling-based pseudo-sampling setting, it is trained against an empirical pseudo-CF constructed from dependent data by resampling. For the direct i.i.d. case, we derive an expected $L_2$ error bound that separates Fourier truncation, empirical training error, discretization, and CF sampling error. For the pseudo-sampling case, we obtain a conditional analogue with two additional pseudo-law discrepancy terms. We develop a multidimensional extension of the framework and analyze its computational complexity. Numerical experiments show competitive performance relative to Expectation-Maximization on Gaussian-mixture benchmarks, clear gains on heavy-tailed targets, $L_2$ error decay consistent with the theory in a well-specified setting, and effective estimation of the one-year Australian equity return law from resampled dependent data.
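
The reason training can happen directly in Fourier space is that both mixture components have closed-form characteristic functions. The sketch below shows those formulas and a squared-error loss against the empirical CF on a frequency grid. The tuple layout of `components`, the loss and all function names are illustrative assumptions; the neural-network parameterization in the paper is not reproduced.

```python
import cmath

def empirical_cf(samples, t):
    """Empirical characteristic function: mean of exp(i t x)."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

def gaussian_cf(t, mu, sigma):
    """CF of N(mu, sigma^2): exp(i mu t - sigma^2 t^2 / 2)."""
    return cmath.exp(1j * mu * t - 0.5 * (sigma * t) ** 2)

def laplace_cf(t, mu, b):
    """CF of Laplace(mu, b): exp(i mu t) / (1 + b^2 t^2)."""
    return cmath.exp(1j * mu * t) / (1 + (b * t) ** 2)

def mixture_cf(t, components):
    """components: (weight, kind, location, scale); nonnegative weights summing to 1."""
    total = 0j
    for weight, kind, loc, scale in components:
        cf = gaussian_cf if kind == "gauss" else laplace_cf
        total += weight * cf(t, loc, scale)
    return total

def cf_loss(samples, components, grid):
    """Mean squared modulus of the CF mismatch over a frequency grid."""
    return sum(
        abs(mixture_cf(t, components) - empirical_cf(samples, t)) ** 2 for t in grid
    ) / len(grid)
```

Because the weights are nonnegative and sum to one, the fitted object is a valid density by construction, which matches the nonnegativity and unit-mass property stated in the abstract.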

2605.18008 2026-05-19 cs.LG stat.ML

Uncertainty Reliability Under Domain Shift: An Investigation for Data-Driven Blood Pressure Estimation in Photoplethysmography

域偏移下的不确定性可靠性:面向光电容积脉搏波描记中数据驱动血压估计的研究

Mohammad Moulaeifard, Ciaran Bench, Philip J. Aston, Nils Strodthoff

AI总结 本文研究了域偏移下基于深度学习、利用光电容积脉搏波描记(PPG)信号进行血压估计时的不确定性可靠性,比较了深度集成和蒙特卡洛dropout方法,并探讨了不确定性重校准的重要性。

详情
Comments
23 pages, 2 figures
AI中文摘要

不确定性量化(UQ)对于医疗等安全关键领域至关重要,但很少在现实的分布外(OOD)条件下进行评估。本文在分布内(ID)和分布外(OOD)两种设置下,评估了基于深度学习、利用光电容积脉搏波描记(PPG)信号进行血压(BP)估计的预测性能和不确定性可靠性。我们使用在PulseDB上训练的XResNet1D-50模型,在四个外部数据集上进行测试,比较了采用高斯负对数似然(GNLL)和均方误差(MSE)损失的深度集成(DE)与蒙特卡洛dropout(MCD)方法,并可选地通过共形预测(CP)、温度缩放(TS)和保序回归(IR)进行事后重校准。我们的主要发现如下:(1)在域偏移下,DE比MCD具有更强的预测鲁棒性,这一优势主要在外部偏移下才变得明显。(2)经过重校准的基于GNLL的方法给出最佳的不确定性校准(例如,收缩压(SBP)采用GNLL+DE+CP,舒张压(DBP)采用GNLL+DE+TS),而基于MSE的不确定性需要重校准才具有实用价值。(3)在各种设置下,CP和TS带来的提升最为一致,IR在若干情况下仍具竞争力。总体而言,我们的结果表明,基于DE的方法在域偏移下的预测性能最为稳健,GNLL在原生UQ方面最强,而重校准对于使基于MSE的不确定性具有实用价值至关重要。这些发现突显了在外部数据上联合评估预测准确性和校准的必要性,以实现可信的无袖带血压估计。

英文摘要

Uncertainty quantification (UQ) is critical for safety-critical domains like healthcare, yet it is rarely evaluated under realistic out-of-distribution (OOD) conditions. Here, we assessed predictive performance and uncertainty reliability for deep learning-based blood pressure (BP) estimation from photoplethysmography (PPG) signals under both in-distribution (ID) and OOD settings. Using an XResNet1D-50 trained on PulseDB and tested on four external datasets, we compared deep ensembles (DE) and Monte Carlo dropout (MCD) with Gaussian negative log-likelihood (GNLL) and mean squared error (MSE) losses, optionally followed by post-hoc recalibration via conformal prediction (CP), temperature scaling (TS), and isotonic regression (IR). The key findings of our study are as follows: (1) DE provides stronger predictive robustness under domain shift than MCD, an advantage that becomes clear primarily under external shift. (2) Recalibrated GNLL-based methods yield the best uncertainty calibration (e.g., GNLL+DE+CP for systolic blood pressure (SBP), GNLL+DE+TS for diastolic blood pressure (DBP)), while MSE-based uncertainty requires recalibration to become practically useful. (3) Across settings, CP and TS offer the most consistent gains, with IR remaining competitive in several cases. Overall, our results identify DE-based methods as most robust for predictive performance under domain shift, GNLL as strongest for native UQ, and recalibration as essential for making MSE-based uncertainty practical. These findings highlight the need to jointly assess predictive accuracy and calibration on external data for trustworthy cuffless BP estimation.
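
As a rough illustration of post-hoc recalibration via conformal prediction, the sketch below rescales a model's predicted standard deviation using a held-out calibration set. It uses split conformal prediction with normalized absolute residuals as scores; whether the study uses this particular score or procedure is an assumption, and the function names and numbers are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile: the ceil((n + 1)(1 - alpha))-th
    smallest score, or infinity when that rank exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def recalibrated_interval(mu, sigma, calibration, alpha=0.1):
    """Interval mu +/- q * sigma, with q chosen on held-out data.

    calibration is a list of (y_true, mu_pred, sigma_pred) triples.
    """
    scores = [abs(y - m) / s for y, m, s in calibration]
    q = conformal_quantile(scores, alpha)
    return mu - q * sigma, mu + q * sigma
```

Under exchangeability of calibration and test points, the interval covers the true value with probability at least `1 - alpha`. That guarantee can fail under domain shift, which is consistent with the abstract's emphasis on evaluating calibration on external data.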

2605.18005 2026-05-19 cs.LG stat.ML

Scalable Decision-Focused Learning through Cost-Sensitive Regression

通过成本敏感回归实现可扩展的决策聚焦学习

Noah Schutte, Senne Berden, Tias Guns, Krzysztof Postek, Neil Yorke-Smith

AI总结 本文提出了一种基于成本敏感多输出回归的方法,用于解决包含多个不确定参数的组合优化问题,通过引入成本敏感的损失函数组件,提高了决策聚焦学习的效率和可扩展性。

详情
Comments
12 pages, 7 figures
AI中文摘要

许多现实世界中的组合问题涉及不确定参数,这些参数可以根据上下文特征和历史数据进行预测。这些'预测后优化'或'上下文优化'问题已获得显著关注:端到端训练方法现在可以最小化下游任务成本而不是预测误差。然而,尽管这些决策聚焦学习(DFL)方法有效,但它们通常在训练过程中依赖于重复解决底层组合优化问题,这使得它们计算成本高且难以扩展。我们重新将学习问题视为一个成本敏感的多输出回归问题:多输出是因为组合问题有多个不确定参数,而成本敏感是因为下游任务成本是真正的目标。我们的技术贡献是正式化了多个损失函数组件,这些组件来自于这种重新框架:成本不敏感的归一化、决策意识的不对称惩罚过预测和欠预测,以及实例化的成本,这些成本在本地模仿真正的下游任务损失。这些组件需要每个训练数据实例零或一次求解,而训练过程中不需要进一步求解。实验表明,损失组件的组合在下游任务质量上与最先进的方法相当,同时显著更高效,使能够扩展到以前无法用DFL解决的问题规模。

英文摘要

Many real-world combinatorial problems involve uncertain parameters, which can be predicted given contextual features and historical data. These 'predict-then-optimize' or 'contextual optimization' problems have gained significant attention: end-to-end training methods can now minimize the downstream task cost rather than the predictive error. However, despite their effectiveness, these decision-focused learning (DFL) approaches often rely on repeated solving of the underlying combinatorial optimization problem during training, making them computationally expensive and difficult to scale. We reframe the learning problem as a cost-sensitive multi-output regression problem: multi-output due to the combinatorial problem having multiple uncertain parameters, and cost-sensitive due to the downstream task cost being the real target. Our technical contribution is the formalization of multiple loss function components that follow from this reframing: cost-insensitive normalization, decision-aware asymmetric penalization of over- and underpredictions, and instance-based costs that mimic the true downstream task-based loss locally. These components require zero or one solve per training data instance, while requiring no further solves during training. Experiments show that the combination of loss components achieves comparable downstream task quality to the state of the art, while being significantly more efficient, enabling scaling to problem sizes that have not been tackled before with DFL.

2605.17963 2026-05-19 math.OC stat.ML

From Saddle Points Toward Global Minima: A Newton-Type Method on Wasserstein Space

从鞍点走向全局极小值:Wasserstein空间上的牛顿型方法

Razvan-Andrei Lascu, Taiji Suzuki

AI总结 本文研究Wasserstein空间上非凸泛函的最小化问题,提出Wasserstein无鞍牛顿法(WSFN),用Wasserstein Hessian平方的正则化平方根对Wasserstein梯度进行预条件处理,从而克服标准Wasserstein牛顿动力学被鞍点吸引的倾向,并证明该方法能在多项式时间内到达全局极小点的α-邻域。

详情
Comments
83 pages, 3 figures
AI中文摘要

我们研究Wasserstein空间上非凸泛函的最小化问题。尽管最近的研究表明,对于良性景观,带扰动的Wasserstein梯度方法可以避开鞍点,但现有方法本质上仍是一阶的,一旦迭代进入全局极小点的邻域,便无法提供快速的局部收敛。我们提出Wasserstein无鞍牛顿法(Wasserstein Saddle-Free Newton,WSFN),这是一种二阶方法,用Wasserstein Hessian平方的正则化平方根对Wasserstein梯度进行预条件处理。这种构造在正曲率方向上保持吸引,在负曲率方向上产生排斥,从而克服了标准Wasserstein牛顿动力学被鞍点吸引的倾向。我们还建立了Wasserstein空间上严格局部极小性的二阶充分最优性条件。在正则性和良性景观假设下,我们证明WSFN能在多项式时间内逃离鞍点区域并到达全局极小点的$α$-邻域,且对鞍点参数的依赖优于先前的带扰动一阶方法。一旦进入该邻域,我们证明WSFN在$L^2$-Wasserstein距离下线性收敛到非退化的全局极小点。最后,我们给出了该方法的基于粒子的实现。

英文摘要

We study the minimization of non-convex functionals over the Wasserstein space. While recent work has shown that perturbed Wasserstein gradient methods can avoid saddle points for benign landscapes, existing approaches remain essentially first-order and do not provide fast local convergence once the iterates enter a neighborhood of a global minimizer. We propose Wasserstein Saddle-Free Newton (WSFN), a second-order method that preconditions the Wasserstein gradient by a regularized square root of the squared Wasserstein Hessian. This construction preserves attraction toward directions of positive curvature while inducing repulsion along directions of negative curvature, thereby overcoming the tendency of standard Wasserstein Newton dynamics to be attracted to saddles. We also establish second-order sufficient optimality conditions on Wasserstein space for strict local minimality. Under regularity and benign landscape assumptions, we prove that WSFN escapes saddle regions and reaches an $α$-neighborhood of a global minimizer in polynomial time, with improved dependence on saddle parameters compared with prior perturbed first-order methods. Once inside this neighborhood, we show that WSFN converges linearly in $L^2$-Wasserstein distance to a non-degenerate global minimizer. Finally, we present a particle-based implementation of the method.
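
The preconditioning idea has a simple finite-dimensional analogue, sketched below in the plane rather than on Wasserstein space. The test function f(x, y) = x^2 + y^4/4 - y^2/2 (saddle at the origin, minima at (0, 1) and (0, -1)), the regularization `|H| + lam * I` and all names are assumptions chosen for illustration; the Hessian here is diagonal, so the square root of its square is just the entrywise absolute value.

```python
def grad(x, y):
    """Gradient of f(x, y) = x^2 + y^4/4 - y^2/2."""
    return 2.0 * x, y ** 3 - y

def hess_diag(x, y):
    """Diagonal of the (diagonal) Hessian of f."""
    return 2.0, 3.0 * y ** 2 - 1.0

def newton_step(x, y):
    """Plain Newton step: divides by the signed curvature."""
    gx, gy = grad(x, y)
    hx, hy = hess_diag(x, y)
    return x - gx / hx, y - gy / hy

def saddle_free_step(x, y, lam=0.1):
    """Saddle-free step: divides by |curvature| + lam, so negative-curvature
    directions are pushed away from the saddle instead of toward it."""
    gx, gy = grad(x, y)
    hx, hy = hess_diag(x, y)
    return x - gx / (abs(hx) + lam), y - gy / (abs(hy) + lam)

def run(step, x, y, n_iter=50):
    for _ in range(n_iter):
        x, y = step(x, y)
    return x, y
```

Started at `(0.5, 0.1)`, `run(newton_step, 0.5, 0.1)` is drawn to the saddle at the origin, whereas `run(saddle_free_step, 0.5, 0.1)` moves along the negative-curvature direction and settles at the minimizer `(0, 1)`.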

2605.17938 2026-05-19 cs.LG cs.AI stat.ML

Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew

基于镜像反学习和噪声一致偏斜的扩散模型训练数据归因

Joan Serrà, Dipam Goswami, Fabio Morreale, Wei-Hsiang Liao, Yuki Mitsufuji

AI总结 本文提出了一种基于镜像反学习和噪声一致偏斜(MUCS)的方法,以提升扩散模型训练数据归因的可靠性与鲁棒性。该方法在三个不同的数据集上大幅优于现有方法;文中还分析了有影响力的实例在不同生成样本之间的重叠,以及集成多种归因方法的潜力。

详情
Comments
21 pages, 5 figures, 9 tables (includes appendix)
AI中文摘要

训练数据归因(TDA)应能够促进生成模型的可解释性,并推动各种相关下游任务的发展。然而,当前的TDA方法缺乏可靠性和鲁棒性,阻碍了其在实际应用中的采用。在本文中,我们采取了关键步骤,以实现更可靠和鲁棒的扩散模型TDA。我们提出通过镜像反学习和噪声一致偏斜(MUCS)进行TDA。该方法的核心思想是使用受限的镜像梯度上升微调第二个模型,并通过一致的噪声样本测量该模型相对于原始模型的归一化偏斜。我们展示了,尽管概念上简单且通用,MUCS在三个不同的数据集上系统性地大幅优于现有方法。此外,我们研究了核心设计选择对最终性能的影响,并分析了影响实例在生成项目中的重叠以及整合TDA方法的潜力。我们相信,我们的发现可能对更一般的反学习设置以及需要比较扩散损失的任务具有更广泛的意义。

英文摘要

Training data attribution (TDA) should enable generative model interpretability and foster a variety of related downstream tasks. Nonetheless, current TDA approaches lack reliability and robustness, preventing their adoption in real-world setups. In this paper, we take a decisive step towards more reliable and robust TDA for diffusion models. We propose to perform TDA with mirrored unlearning and noise-consistent skew (MUCS). The idea is to fine-tune a second model with bounded mirrored gradient ascent, and to measure the normalized skew of this model with respect to the original one using consistent noise samples. We show that, while being conceptually simple and generic, MUCS systematically outperforms existing methods on three different datasets by a large margin. We additionally study the effect that core design choices have on final performance, and analyze novel aspects regarding the overlap of influential instances across generated items and the potential of ensembling TDA approaches. We believe that our findings may have broader implications for more general unlearning setups, as well as for tasks requiring the comparison of diffusion losses.

2605.17934 2026-05-19 stat.ME math.ST stat.ML stat.TH

Conditional Predictive Inference for General Structured Data with Group Symmetries

具有群对称性的通用结构数据的条件预测推断

Yichen Shen, Mengxin Yu

AI总结 本文研究具有群对称性的数据的无分布预测推断,提出C-SymmPI框架,在超越可交换性的一般数据结构(如网络和聚类数据)下实现近条件覆盖。该方法将条件覆盖重新表述为用户指定函数类上的误覆盖误差,在分布不变性和分布偏移下给出理论保证,并提供基于投影和基于采样的两种高效算法变体。

详情
AI中文摘要

我们研究具有群对称性的数据的无分布预测推断,旨在为结构化数据建立超越可交换性的近条件覆盖保证。虽然许多预测推断方法可以达到目标覆盖水平,但大多数只能提供边缘覆盖。在实践中,条件预测推断往往更受青睐,因为它在给定观测属性的条件下量化黑盒预测的不确定性,从而能够适应异质性。尽管已有许多工作致力于实现高效的条件覆盖,但现有方法依赖于i.i.d.或可交换性假设,而这些假设在网络、聚类和成像数据等结构化场景中常常不成立。最近,SymmPI提出了一种在超越可交换性的群对称性下进行预测推断的统一方法;然而,其保证仍然只是边缘的,没有考虑总体异质性。为了填补这一空白,我们提出C-SymmPI框架,该框架在具有群对称性的一般数据结构下实现近条件覆盖,超越了可交换性,涵盖网络、聚类级数据及相关结构。受松弛的多重准确性(multi-accuracy)启发,我们的方法将条件覆盖重新表述为用户指定函数类上的误覆盖误差。我们在分布不变性和分布偏移下建立了理论保证,并推导了线性函数类和RKHS函数类的收敛速率,将可交换情形下的最先进结果作为特例恢复。为了提高计算效率,我们开发了两种变体:一种用于高维观测的基于投影的算法,以及一种用于大群或无限群的基于采样的算法。我们在层级数据和网络数据上展示了方法的有效性。实验结果表明,与现有方法相比,C-SymmPI提供了信息量更大、更稳定的条件覆盖,且准确性有所提高。

英文摘要

We study distribution-free predictive inference for data with group symmetries, aiming to establish near-conditional coverage guarantees beyond exchangeability for structured data. While many predictive inference methods achieve a target coverage level, most provide marginal coverage. In practice, conditional predictive inference is often preferred, as it quantifies uncertainty for black-box predictions given observed attributes, thereby accommodating heterogeneity. Although many efforts have pursued efficient conditional coverage, existing methods rely on the i.i.d. or exchangeable assumption, often violated in structured settings such as networks, clusters, and imaging data. Recently, SymmPI introduced a unified approach to predictive inference under group symmetries beyond exchangeability; nevertheless, its guarantees remain marginal and do not account for population heterogeneity. To bridge this gap, we introduce C-SymmPI, a framework that achieves near-conditional coverage under general data structures with group symmetries, extending beyond exchangeability to cover networks, cluster-level data, and related structures. Inspired by relaxed multi-accuracy, our approach reformulates conditional coverage as miscoverage error over a user-specified function class. We establish theoretical guarantees under distributional invariance and distribution shift, and derive convergence rates for linear and RKHS function classes, recovering state-of-the-art results in the exchangeable setting as special cases. For computational efficiency, we develop two variants: a projection-based algorithm for high-dimensional observations, and a sampling-based algorithm for large or infinite groups. We demonstrate effectiveness on hierarchical and network data. Empirical results show that C-SymmPI delivers more informative and stable conditional coverage with improved accuracy compared to existing methods.

2605.17920 2026-05-19 stat.ME stat.AP

Multivariate reconciliation for hierarchical time series

多变量层级时间序列的重新协调

Ana Caroline Pinheiro, Rodrigo de Souza Bulhões, Rob J. Hyndman, Paulo Canas Rodrigues

AI总结 本文提出了一种多变量重新协调方法,用于确保层级时间序列的预测一致性,并考虑变量间的关系。通过数值模拟和实际数据验证,该方法在模拟数据和实际应用中均优于传统方法。

详情
Comments
22 pages, 7 figures, 8 tables
AI中文摘要

某些时间序列可以根据某些特征(如地理或其它属性)进行层次化组织,这些序列称为层级时间序列。通常,所有层级的预测都会被生成,以确保一致性,即预测应满足与观测数据相同的汇总约束。各种方法已提出,通过使用一组基础预测来保证这种一致性,这一过程称为预测重新协调。类似于单变量情况,多变量时间序列也可以进行层次化结构。然而,所有现有方法都局限于单个变量。因此,确保一致的预测需要分别重新协调每个变量。然而,这一过程不考虑多个变量之间的相关性。为了解决这一限制,本文提出了一种多变量重新协调方法,以确保一致的预测并纳入变量间的关系。所提出的方法通过数值模拟进行测试,考虑了系列层次中的不同场景和多个变量之间的差异。此外,一些基础预测模型也被评估。该方法还应用于巴西实际就业数据中的录取和解雇数据。结果表明,多变量重新协调在模拟数据和实际应用中均比其他方法更准确。

英文摘要

Some time series can be hierarchically organized into levels based on certain characteristics, such as geography or other attributes of interest. These series are referred to as hierarchical time series. Typically, forecasts are generated at all levels to ensure coherence, meaning that the forecasts should satisfy the same aggregation constraints as the observed data. Various approaches have been proposed to guarantee this coherence by using a set of base forecasts. The process through which these forecasts are adjusted to become coherent is known as forecast reconciliation. Similar to the univariate case, multivariate time series can also be structured hierarchically. However, all existing approaches are limited to a single variable. As a result, ensuring coherent forecasts requires reconciling each variable separately. However, this process does not account for correlations among multiple variables. To address this limitation, this paper proposes a multivariate reconciliation methodology that ensures coherent forecasts and incorporates relationships among variables. The proposed methodology was tested through numerical simulations, considering distinct scenarios within the series hierarchy and across multiple variables. Additionally, some base forecasting models were evaluated. The methodology was also applied to real employment data of admissions and dismissals in Brazil. The results demonstrated that multivariate reconciliation yielded more accurate outcomes than the other methods considered, both in simulated data and in practical applications.
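
Forecast reconciliation is easiest to see in the smallest hierarchy, a total with two bottom series A and B. The sketch below applies standard single-variable OLS reconciliation, which projects the base forecasts onto the coherent subspace via S (S'S)^{-1} S' with summing matrix S = [[1, 1], [1, 0], [0, 1]]. It is the classical univariate baseline, not the multivariate method proposed in the paper, and the function name and numbers are hypothetical.

```python
def ols_reconcile(total, a, b):
    """OLS reconciliation for the hierarchy total = a + b.

    Closed form of S (S'S)^{-1} S' applied to the base forecasts
    (total, a, b); returns the coherent triple (total, a, b).
    """
    a_rec = (total + 2 * a - b) / 3
    b_rec = (total + 2 * b - a) / 3
    return a_rec + b_rec, a_rec, b_rec
```

For incoherent base forecasts `(10, 4, 3)` this returns `(9.0, 5.0, 4.0)`, which satisfies the aggregation constraint, while already coherent forecasts such as `(7, 4, 3)` are left unchanged. Reconciling several variables this way, one at a time, ignores their correlations, which is the limitation the abstract sets out to address.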

2605.17910 2026-05-19 stat.ME

Double/Debiased Machine Learning for Continuous Treatment Effects in Panel Data with Endogeneity

面板数据中存在内生性时连续处理效应的双重/去偏机器学习

Peikai Wu, Kuan Sun, Zhiguo Xiao

AI总结 本文提出了一种双重/去偏机器学习框架,用于估计具有双向固定效应的非参数面板模型中的平均导数效应。该框架将工具变量方法扩展到面板数据设置,可处理连续处理变量和多种形式的内生性,并引入交叉拟合方案,在消除时间固定效应后恢复独立性;带惩罚的GMM去偏项使存在内生性时的自动去偏机器学习成为可能。所提出的同期效应、动态效应和聚合效应估计量具有一致性和渐近正态性,并配有有效的方差估计量。模拟显示正则化偏差减小、置信区间准确。对ECLS-K数据的应用揭示了家庭社会经济地位对儿童BMI影响的丰富动态。

详情
AI中文摘要

我们提出了一种双重/去偏机器学习框架,用于估计具有双向固定效应的非参数面板模型中的平均导数效应。该框架将工具变量方法扩展到面板数据设置,可处理连续处理变量和多种形式的内生性,并引入交叉拟合方案,在消除时间固定效应后恢复独立性。带惩罚的GMM去偏项使存在内生性时的自动去偏机器学习成为可能。我们提出的同期效应、动态效应和聚合效应估计量具有一致性和渐近正态性,并配有有效的方差估计量。模拟显示正则化偏差减小、置信区间准确。对ECLS-K数据的应用揭示了家庭社会经济地位(SES)对儿童BMI影响的丰富动态。

英文摘要

We propose a double/debiased machine learning framework to estimate average derivative effects in nonparametric panel models with two-way fixed effects. It extends instrumental variable methods to panel settings, handles continuous treatments and various forms of endogeneity, and introduces a cross-fitting scheme to restore independence after eliminating time fixed effects. A penalized GMM debiasing term enables automatic debiased machine learning with endogeneity. Our estimators for contemporaneous, dynamic, and aggregated effects are consistent and asymptotically normal with a valid variance estimator. Simulations show reduced regularization bias and accurate confidence intervals. An application to ECLS-K data reveals rich dynamics in the effect of family SES on childhood BMI.

2605.17864 2026-05-19 stat.ME

Wavelet Based Time Series Models with Time-Varying Thresholds

基于小波的时间序列模型与时间变化阈值

Rhea Davis, N. Balakrishna

AI总结 本文提出了一种具有时间变化阈值的小波时间序列模型,通过小波级数展开表示阈值,能够更好地捕捉不规则和突发变化以及阈值参数的平滑变化,比傅里叶方法更具灵活性。通过模拟实验和实际数据应用评估了该模型的性能。

详情
AI中文摘要

本文开发了一种具有时间变化阈值的阈值模型,该模型通过小波级数展开进行表示。该模型能够充分捕捉不规则和突发性变化,以及阈值参数的平滑变化,从而比傅里叶方法具有更大的灵活性。通过模拟实验和实际数据应用来评估该模型的性能。

英文摘要

This paper develops a threshold model with a time-varying threshold, represented using a wavelet series expansion. The model adequately captures irregular and abrupt variations, as well as smooth changes in the threshold parameter, allowing greater flexibility than Fourier-based approaches. Simulation experiments and real-data applications are used to evaluate the model's performance.

2605.17850 2026-05-19 stat.ML cs.CV cs.LG cs.NA math.NA math.PR

Simple Approximation and Derivative Free Inference-Time Scaling for Diffusion Models via Sequential Monte Carlo on Path Measures

通过路径测度的序列蒙特卡洛实现扩散模型的简单近似与无导数推理时间缩放

Chenyang Wang, Weizhong Wang, Yinuo Ren, Jose Blanchet, Yiping Lu

AI总结 本文提出URGE算法,一种无需梯度的推理时间缩放方法,通过路径重要性重加权提升扩散模型样本质量,同时在合成测试和扩散模型基准中表现出色,且实现简单且无梯度依赖。

详情
Comments
accepted by ICML 2026
AI中文摘要

基于扩散的生成模型越来越依赖推理时引导,即添加漂移项或对专家混合进行重加权,以提高针对特定任务目标的样本质量。然而,大多数现有技术需要反复计算得分或梯度,从而引入偏差、高计算开销,或两者兼有。我们提出URGE(Unbiased Resampling via Girsanov Estimation),一种无导数的推理时缩放算法,通过Girsanov测度变换进行路径级重要性重加权。与先前工作中计算基于梯度的粒子权重不同,URGE为每条模拟轨迹附加一个简单的乘性权重,并定期进行重采样,无需任何得分、Hessian或PDE的计算。我们建立了路径级SMC与粒子级SMC之间的等价性:Girsanov路径权重的后向条件期望恢复了先前的粒子级权重,从而保证两种方案产生相同的无偏终端分布。在实验中,URGE在合成测试和扩散模型基准上优于现有的推理时引导基线,生成质量更好,同时实现起来显著更简单,且完全不依赖梯度。

英文摘要

Diffusion-based generative models increasingly rely on inference-time guidance, adding a drift term or reweighting a mixture of experts, to improve sample quality on task-specific objectives. However, most existing techniques require repeated score or gradient evaluations, introducing bias, high computational overhead, or both. We introduce URGE (Unbiased Resampling via Girsanov Estimation), a derivative-free inference-time scaling algorithm that performs path-wise importance reweighting via a Girsanov change of measure. Instead of computing gradient-based particle weights as in previous work, URGE attaches a simple multiplicative weight to each simulated trajectory and periodically resamples. No score, Hessian, or PDE evaluation is required. We establish an equivalence between path-wise and particle-wise SMC: the Girsanov path weight admits a backward conditional expectation that recovers the previous particle-level weights, guaranteeing that both schemes produce the same unbiased terminal law. Empirically, URGE outperforms existing inference-time guidance baselines on synthetic tests and diffusion-model benchmarks, achieving better generation quality, while being significantly simpler to implement and fully gradient-free.
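
The "attach a multiplicative weight to each trajectory and periodically resample" pattern is the generic sequential Monte Carlo step sketched below. The sketch works with accumulated log-weights and multinomial resampling; how URGE actually computes its Girsanov weights and which resampling scheme it uses are not specified in the abstract, so those parts are deliberately left out and the function names are hypothetical.

```python
import math
import random

def normalize(log_weights):
    """Turn log-weights into probabilities, subtracting the max for stability."""
    top = max(log_weights)
    unnormalized = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def effective_sample_size(log_weights):
    """ESS = 1 / sum of squared normalized weights; equals n for equal weights."""
    probs = normalize(log_weights)
    return 1.0 / sum(p * p for p in probs)

def resample(particles, log_weights, rng):
    """Multinomial resampling: draw n particles with probability proportional
    to their weights; the caller then resets all log-weights to zero."""
    probs = normalize(log_weights)
    return rng.choices(particles, weights=probs, k=len(particles))
```

A common design choice, assumed here rather than taken from the paper, is to resample only when `effective_sample_size` drops below a fraction of the particle count, since resampling at every step adds variance.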

2605.17845 2026-05-19 stat.AP

Quantifying Officiating Impact in the NBA: A Referee Impact Metric Analysis Using ESPN Win-Probability Data

量化NBA裁判影响:使用ESPN胜率数据的裁判影响指标分析

Nirek Duma, Leo Benaharon

AI总结 本文提出了一种裁判影响指标(RIM),用于量化NBA比赛中裁判决策对比赛结果的影响,通过整合胜率变化数据,分析裁判表现,并探讨不同因素对裁判影响的异质性。

详情
AI中文摘要

在过去一个世纪中,篮球分析从简单的比分统计发展到复杂的、考虑上下文的度量方法,这些方法评估事件对比赛结果的预期影响。然而,裁判分析并未经历这一转变:现有研究和公众讨论仍然严重依赖于犯规率、犯规差异、赛后复核的晚场比赛正确性标签,或球队/球员从判罚中受益的情况。这留下了经验上的空白,因为一场比赛中低影响的犯规不应等同于在紧要关头改变胜率的判罚。为了解决这一空白,我们引入了裁判影响指标(RIM),这是一个比赛层面的统计指标,整合了与犯规事件相关的绝对胜率变化,以衡量每位裁判在每场比赛中的影响。利用ESPN比赛总结和胜率数据,我们展示了RIM在经验上与犯规数量和犯规差异不同,识别了常规赛和季后赛裁判分布,并探讨了主客场、球队侧和裁判-球队异质性。然后,我们使用线性控制作为压力测试:在考虑主客场状态、球队、对手、赛季和季后赛系列状态等因素后,哪些描述性异常值在基本上下文调整后仍然存在。结果表明,一些球队侧和裁判-球队模式在条件后仍然可见,但遗漏变量稳健性诊断表明,这些模式应被解释为观测筛选信号,而不是任何单一官员有意、违规或吹哨责任的证据。本文对文献的贡献是基础性的,我们强调该框架应使用不同的胜率模型和进一步的因果推断进行测试。

英文摘要

Over the past century, basketball analytics has moved from simple box-score rates toward complex context-aware measures that evaluate events by their expected effect on game outcomes. Officiating analysis has not made the same transition: existing work and public discussion still rely heavily on foul rates, foul differentials, reviewed late-game correctness labels, or team/player benefit from calls. This leaves an empirical gap because a low-leverage foul in a decided game should not be treated as equivalent to a whistle that materially shifts win probability in a close game. To address this gap, we introduce the Ref Impact Metric (RIM), a game-level statistic that aggregates the absolute win-probability movement attached to foul events, measuring the impact of each referee for each game. Using ESPN game-summary and win-probability data for NBA seasons 2021-2022 through 2024-2025, we show that RIM is empirically distinct from both foul volume and foul disparity, identify regular-season and postseason referee distributions, and examine home/away, team-side, and referee-team heterogeneity. We then use linear controls intentionally as stress tests: conditioning on home status, team, opponent, season, and postseason series state asks which descriptive outliers persist after basic contextual adjustment. The results show that several team-side and referee-team patterns remain visible after conditioning, but omitted-variable robustness diagnostics indicate that these patterns should be interpreted as observational screening signals rather than evidence of intent, misconduct, or whistle-level responsibility by any single official. Our contribution to the literature is foundational, and we emphasize that this framework should be tested with different win probability models and further causal inference.
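
The game-level aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the event records, their field names (`type`, `wp_before`, `wp_after`), and the function name are hypothetical, and the abstract does not specify how a game's total is attributed to individual referees.

```python
from typing import Iterable, Mapping


def ref_impact_metric(events: Iterable[Mapping]) -> float:
    """Sum the absolute win-probability movement over foul events in one game.

    Each event is assumed (hypothetically) to carry the win probability
    just before and just after the play.
    """
    total = 0.0
    for ev in events:
        if ev["type"] == "foul":
            total += abs(ev["wp_after"] - ev["wp_before"])
    return total


# Toy game log: only the two fouls contribute to the metric.
game = [
    {"type": "foul", "wp_before": 0.50, "wp_after": 0.53},
    {"type": "shot", "wp_before": 0.53, "wp_after": 0.60},
    {"type": "foul", "wp_before": 0.60, "wp_after": 0.58},
]
rim = ref_impact_metric(game)
```

Using the absolute movement means a foul in a decided game contributes almost nothing, while a whistle in a close game contributes in proportion to how far it moves the win probability, whichever team benefits.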

2605.17808 2026-05-19 cs.LG stat.ML

A Unified Framework for Data-Free One-Step Sampling via Wasserstein Gradient Flows

通过Wasserstein梯度流构建数据免费一步采样的统一框架

Chenguang Wang, Tianshu Yu

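AI summary: This paper proposes a unified theoretical framework for data-free one-step sampling based on Wasserstein gradient flows. It gives the universal form of the induced velocity field under f-divergence objectives, uses a regional-response theory for a soft under-coverage functional to derive a compression-elasticity identity linking divergence choice to the geometry of mass transport, extends the framework to the Log-Variance divergence, and enables one-step inference through a KDE-based implementation and a normalizing-flow route.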

详情
AI中文摘要

我们开发了一种基于Wasserstein梯度流的数据免费一步采样的统一理论框架。对于广泛的标准f-分歧度目标,我们证明诱导速度场具有通用形式V(x)=w(r(x))β(x),其中β(x)=∇log(p(x)/q(x))在不同目标中共享,而w仅由分歧度的选择决定。这种分解表明标准f-分歧度漂移共享相同的渐近目标分布p,并主要区别于如何在欠覆盖区域重新分配瞬时修复努力。为了正式化这种区别,我们推导了软欠覆盖功能的一步区域响应理论,并获得了一个将分歧度选择与质量运输进入欠覆盖区域的几何联系的压缩-弹性恒等式。我们进一步将该框架扩展到Log-Variance (LV)分歧度,分析参考分布如何改变最终的漂移结构,并提出一个实用的LV启发式替代方案用于数据免费训练。基于此理论,我们通过KDE实现该框架,并描述了互补的归一化流路线,从而在训练后实现一步推断。在多模态高斯混合基准测试中的实验结果与理论预测一致,并在这些目标上展示了有效的一步采样。

英文摘要

We develop a unified theoretical framework for data-free one-step sampling from unnormalized target distributions based on Wasserstein gradient flows. For a broad class of standard f-divergence objectives, we show that the induced velocity field admits the universal form $\mathbf{V}(x)=w(r(x))\,β(x)$, where $β(x)=\nabla \log (p(x)/q(x))$ is shared across objectives and $w$ is determined solely by the choice of divergence. This decomposition shows that standard f-divergence drifts share the same asymptotic target distribution $p$ and differ primarily in how they redistribute transient repair effort across under-covered regions. To formalize this distinction, we derive a one-step regional-response theory for a soft under-coverage functional and obtain a compression--elasticity identity that links divergence choice to the geometry of mass transport into under-covered regions. We further extend the framework beyond the f-divergence family to the Log-Variance (LV) divergence, analyze how the reference distribution alters the resulting drift structure, and motivate a practical LV-inspired surrogate for data-free training. Based on this theory, we instantiate the framework with a KDE-based implementation and describe a complementary normalizing-flow route, enabling one-step inference after training. Experiments on multimodal Gaussian-mixture benchmarks are consistent with the theoretical predictions and demonstrate effective one-step sampling on these targets.
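
The decomposition $V(x)=w(r(x))\,\beta(x)$ can be made concrete in one dimension with Gaussian $p$ and $q$. The sketch below is an illustration under stated assumptions, not the paper's code: the abstract does not define $r$, so it is taken here to be $q(x)/p(x)$, and the default weight $w\equiv 1$ corresponds to the reverse KL objective $\mathrm{KL}(q\,\|\,p)$, whose Wasserstein gradient flow has velocity $\nabla\log(p/q)$. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def grad_log_gaussian(x, mean, std):
    # d/dx log N(x; mean, std^2)
    return -(x - mean) / std ** 2


def beta(x, p, q):
    # beta(x) = grad log(p(x)/q(x)); p and q are (mean, std) pairs.
    return grad_log_gaussian(x, *p) - grad_log_gaussian(x, *q)


def velocity(x, p, q, w=lambda r: 1.0):
    # r is ASSUMED to be q/p; w = 1 is the reverse-KL case.
    r = gaussian_pdf(x, *q) / gaussian_pdf(x, *p)
    return w(r) * beta(x, p, q)


def euler_step(particles, p, q, dt, w=lambda r: 1.0):
    # One explicit Euler step of dx/dt = V(x) for each particle.
    return [x + dt * velocity(x, p, q, w) for x in particles]


target = (0.0, 1.0)   # p
current = (2.0, 1.0)  # q
moved = euler_step([1.5, 2.0, 2.5], target, current, dt=0.1)
```

With equal variances, $\beta$ is the constant $-2$ here, so every particle drawn near $q$'s mean moves left toward the target. Passing a different `w` changes only how strongly each region is pushed, not the shared direction $\beta$.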

2605.17778 2026-05-19 math.ST cs.LG stat.ME stat.ML stat.TH

Self-Distillation is Optimal Among Spectral Shrinkage Estimators in Spiked Covariance Models

自蒸馏在带噪协方差模型中的谱收缩估计器中是最优的

Radu Lecoiu, Debarghya Mukherjee, Pragya Sur

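AI summary: This paper studies self-distillation in spiked covariance models, proves that $s$-step self-distillation is optimal among spectral shrinkage estimators, and shows that it outperforms well-known estimators from statistics and machine learning.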

详情
Comments
103 pages, 8 figures
AI中文摘要

自蒸馏已经 emerged 为提高现代机器学习系统模型性能的一种有前景的技术。我们通过引入并分析一个广泛的估计器类别,即谱收缩估计器,建立了自蒸馏在带噪协方差模型中的统计基础。我们证明了对于具有s个脊的带噪协方差矩阵,s步自蒸馏在谱收缩估计器中达到最优性能,优于统计和机器学习中已知的估计器。此外,我们还显示s步是必要的,任何(s-k)步蒸馏估计器对于1 ≤ k ≤ s都是严格次优的。对于等方差协方差的特殊子类,我们证明了最优调优的岭回归在谱收缩估计器中表现最佳。我们还研究了一种联邦方法,其中多个数据中心共享谱收缩估计器,并且一个共同的服务器试图聚合它们以实现最优性能。在这种情况下,我们发现最佳的本地规则再次采用自蒸馏的形式,尽管当数据集中在单一服务器上时,它与最优规则不同。总之,我们的结果阐明了自蒸馏如何提高预测性能,并提供了一个更广泛的统计框架,将自蒸馏与经典收缩方法联系起来。

英文摘要

Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing a broad class of estimators, namely spectral shrinkage estimators. We establish that for spiked covariance matrices with $s$ spikes, $s$-step self-distillation achieves optimal performance among spectral shrinkage estimators, outperforming well-known estimators in statistics and machine learning. Moreover, we show that $s$ steps are necessary for optimality: any $(s-k)$-step distilled estimator is strictly suboptimal for $1 \leq k \leq s$. For the special subclass of isotropic covariances, we show that optimally tuned Ridge regression performs best among spectral shrinkage estimators. We also study a federated approach where multiple data centers share spectral shrinkage estimators and a common server seeks to aggregate them to achieve optimal performance. In this case, we find that the best local rule again takes the form of self-distillation, though it differs from the optimal rule when data are hosted centrally on a single server. Together, our results elucidate why self-distillation improves predictive performance and provide a broader statistical framework connecting it with classical shrinkage-based methods.

2605.17771 2026-05-19 stat.AP

Multi-Class Neurological Disorder Prediction with Tensor Network Feature Engineering

多类神经系统疾病预测的张量网络特征工程

Keshav Balakrishna, Aaryan Chityala, Vivan Kanna, Ishan Pathak, Harshit Ravula, Aaron Lee, Alessandro Hammond, Moemal Al-Wishah, Leo Anthony Celi

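AI summary: This paper combines tensor decompositions with an ensemble classifier for multi-class neurological disorder prediction, reports robustness to tensor network expressivity, and shows performance competitive with recent classical approaches on a clinical dataset.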

详情
AI中文摘要

准确诊断神经系统疾病依赖于先进的成像模态,如磁共振成像(MRI),其通常利用稀疏成像技术从有限数据中重建图像,从而减少存储和采集时间。然而,管理噪声和保留关键诊断特征仍具挑战性。在本研究中,一种集成分类器被增强为PARAFAC CP张量分解,其数学灵感来自量子神经网络架构,但完全采用经典方法实现。该模型在包含55,160张图像的大型平衡临床数据集上进行了评估,涵盖8种诊断类别,采用高和低PARAFAC秩配置。通过5折分层嵌套交叉验证评估,两种配置均表现出强大的验证性能,展示了张量网络表达性的鲁棒性。此外,所提模型在最近的经典方法中表现具有竞争力,进一步凸显了受量子启发的经典框架在增强医学图像分析和支持可靠临床诊断中的潜力。未来的工作将探索先进编码方案的整合、在真实量子硬件上的部署以及使用更多样化的神经系统数据集。

英文摘要

Accurate diagnosis of neurological disorders is contingent upon advanced imaging modalities such as Magnetic Resonance Imaging (MRI), which commonly utilize sparse imaging techniques to reconstruct images from limited data, thus reducing storage and acquisition time. However, challenges remain in managing noise and preserving critical diagnostic features for effective analysis. In this study, an ensemble classifier is enriched with PARAFAC CP tensor decompositions, drawing mathematical inspiration from quantum neural network architectures but implemented entirely classically. The model was evaluated on a large, balanced clinical dataset comprising 55,160 images across 8 diagnostic categories, employing both higher and lower PARAFAC rank configurations. Evaluated through 5-fold nested stratified cross-validation, both configurations achieved strong validation performance, demonstrating robustness to tensor network expressivity. Additionally, the proposed model achieved competitive performance relative to recent classical approaches, further underscoring the potential of quantum-inspired classical frameworks to enhance medical image analysis and support reliable clinical diagnosis. Future work will explore the integration of advanced encoding schemes, deployment on real quantum hardware, and the use of more diverse neurological datasets.

2605.17764 2026-05-19 stat.ME

Stationary birth-death processes generating inflation-deflation distributions: Avoiding the issue of dominance

静止的生灭过程生成通胀-贬值分布:避免主导问题

Wanrudee Skulpakdee, Mongkol Hunkrajok

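AI summary: This paper studies how modifying the birth and death rates of birth-death processes generates inflation-deflation distributions, and introduces two new distributions of this kind to address the dominance issue that can arise with existing approaches.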

详情
AI中文摘要

两种或更多计数分布的混合已深深嵌入超额计数分析中,通常相对于生灭过程(如几何分布、泊松分布、泊松-林德利分布、负二项分布、超泊松分布和康韦-马克斯韦尔-泊松分布)的平稳(平衡)分布。然而,超额计数产生的机制——即通过修改基础分布的出生和死亡率——尚未在文献中直接被研究。所有已知的通胀混合分布实际上都是生灭过程平稳分布的参数化。因此,尽管所得到的分布具有相同的形状,但它们来自不同的机制,并在回归分析中不等价。本文聚焦于由修改的生灭过程生成的通胀-贬值平稳分布,这些过程形成指数族,并引入了两种此类分布。

英文摘要

A mixture of two or more count distributions has become deeply embedded in the analysis of excess counts, often relative to the stationary (equilibrium) distributions of birth-death processes such as the geometric, Poisson, Poisson-Lindley (PL), negative binomial (NB), hyper-Poisson (HP), and Conway-Maxwell-Poisson (CMP) distributions. However, the mechanism by which excess counts arise--namely, through modifications of the birth and death rates in the base distributions--has not yet been directly examined in the research literature. All well-known inflation mixture distributions are, in fact, parameterizations of the stationary distributions of birth-death processes. Thus, although the resulting distributions share the same shapes, they arise from distinct mechanisms and are not equivalent in regression analyses. This paper focuses on inflation-deflation stationary distributions arising from modified birth-death processes that form an exponential family and introduces two types of such distributions.

2605.17763 2026-05-19 stat.ME stat.ML

Comparing Two Categorical Gini Correlations with Applications to Classification Problems

比较两种分类Gini相关性及其在分类问题中的应用

Sameera Hewage, Yongli Sang

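AI summary: This paper proposes an inferential framework for comparing predictor importance in classification problems. Based on the categorical Gini correlation (CGC) of Dang et al. (2020), it assesses importance by testing differences in CGCs across predictor groups, and validates the method through simulation studies and real-data applications.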

详情
AI中文摘要

本文提出了一种用于比较分类问题中预测变量重要性的推断框架。该方法基于Dang等人(2020)提出的分类Gini相关性(CGC),这是一种衡量数值预测变量与分类结果之间依赖性的指标。通过测试不同预测变量组之间的CGC差异来评估预测变量的重要性。所提出的方法可以处理任意维度和不等维度的预测变量,并允许预测变量组之间存在依赖性。在原假设和备择假设下均建立了检验统计量的渐近正态性,并证明了该检验的一致性。此外,除了推导渐近分布外,还开发了一种非参数自助法作为另一种推断方法。通过模拟研究以及乳腺癌和人类活动识别数据集的应用,展示了所提出框架的有效性。

英文摘要

This article proposes an inferential framework for comparing predictor importance in classification problems with categorical response variables. The approach is based on the categorical Gini correlation (CGC) proposed by Dang et al. (2020), a measure of dependence between numerical predictors and categorical outcomes. Predictor importance is evaluated by testing differences in CGCs across competing predictor groups. The proposed methodology accommodates predictors of arbitrary and unequal dimensions and allows for dependence between predictor groups. Asymptotic normality of the test statistic is established under both the null and alternative hypotheses, and the resulting test is shown to be consistent. In addition to deriving the asymptotic distribution, a nonparametric bootstrap procedure is developed as an alternative approach to inference. Simulation studies, along with applications to breast cancer and human activity recognition datasets, demonstrate the effectiveness of the proposed framework.

2605.17749 2026-05-19 cs.LG stat.ML

Testable and Actionable Calibration for Full Swap Regret

可检验且可操作的全面交换懊悔校准

Konstantina Bairaktari, Lunjia Hu, Huy L. Nguyen, Jonathan Ullman

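AI summary: This paper introduces a new calibration measure, Soft-Binned Calibration Decision Loss (SCDL), that is both actionable and testable without weakening either requirement, satisfies desirable properties such as continuity and consistency, and performs better in practice in the reported experiments.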

详情
AI中文摘要

人工智能生成的预测越来越多地影响关键任务中的决策制定,因此必须具有可信度。校准是衡量可信度的一种广泛使用的度量标准,要求预测与真实频率匹配,并可以像真实概率一样对待某一结果。然而,定义校准是微妙的,设计良好的校准误差度量标准一直是最近研究的活跃主题。第一个目标是找到可操作的校准度量标准,即能够向决策者说明当预测被视为真实概率时的效用损失,这被称为交换懊悔。第二个目标是找到可检验的校准度量标准,即校准误差可以从少量预测和结果中测量出来。尽管这些是基本要求,但目前没有现有的校准度量标准能够完全满足这两个属性,所有现有的度量标准都通过限制交换懊悔的弱化观念来放松可操作性,或通过具有次优估计误差来放松可检验性。我们介绍了一种新的校准度量标准,称为软分箱校准决策损失(SCDL),我们证明其在不削弱任何要求的前提下是完全可操作的,并且可检验性具有几乎最优的误差率。此外,SCDL还满足其他理想属性,如连续性和一致性。我们还提供了一组实验,证明了SCDL与其他度量标准的理论优势在实践中导致更好的性能。

英文摘要

AI generated predictions increasingly inform decision making in critical tasks, and therefore must be trustworthy. One widely used measure of trustworthiness is calibration, which requires that the predictions match the true frequencies and can be treated like real probabilities of a given outcome. However, defining calibration is subtle, and designing good measures of calibration error has been an active topic of recent research. The first goal is to find calibration measures that are actionable, meaning they can inform decision makers about their utility loss when predictions are treated as true probabilities, which is known as swap regret. The second goal is to find calibration measures that are testable, meaning that calibration error can be measured from a small sample of predictions and outcomes. Although these are very basic requirements, there is no existing calibration measure that fully satisfies both properties, and all existing measures relax actionability by bounding a weaker notion of swap regret, or relax testability by having suboptimal estimation error. We introduce a new calibration measure, Soft-Binned Calibration Decision Loss (SCDL), which we prove is fully actionable without weakening either requirement, and testable with nearly optimal error rate. In addition, SCDL satisfies other desired properties such as continuity and consistency. We also provide a set of experiments confirming that the theoretical advantages of SCDL compared to other measures lead to better performance in practice.

2605.17745 2026-05-19 stat.ML cs.LG

StatQAT: Statistical Quantizer Optimization for Deep Networks

StatQAT: 深度网络的统计量化优化

Mehmet Aktukmak, Daniel Huang, Ke Ding

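AI summary: This paper presents a statistical error analysis framework for uniform and floating-point quantization that gives theoretical insight into error behavior across quantization configurations. Building on it, the authors propose iterative quantizers for arbitrary data distributions and analytic quantizers for Gaussian-like weight distributions, enabling efficient low-error quantization of activations and weights. Integrated into quantization-aware training and evaluated on integer and floating-point formats, the quantizers improve accuracy and stability.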

详情
AI中文摘要

量化对于减少深度神经网络的计算成本和内存使用至关重要,使低精度硬件上的高效推断成为可能。尽管统一和浮点量化方案的广泛应用,选择最优的量化参数仍是一个关键挑战,尤其是在训练和推断过程中遇到的多样化数据分布。本文提出了一种新的统计误差分析框架,用于统一和浮点量化,提供了对量化配置下误差行为的理论洞察。基于此分析,我们提出了适用于任意数据分布的迭代量化器和适用于高斯似分布权重的分析量化器。这些方法使高效、低误差的量化成为可能,适用于激活和权重。我们将我们的量化器整合到量化感知训练中,并在整数和浮点格式中进行了评估。实验表明,精度和稳定性得到了提高,突显了我们的方法在训练低精度神经网络中的有效性。

英文摘要

Quantization is essential for reducing the computational cost and memory usage of deep neural networks, enabling efficient inference on low-precision hardware. Despite the growing adoption of uniform and floating-point quantization schemes, selecting optimal quantization parameters remains a key challenge, particularly for diverse data distributions encountered during training and inference. This work presents a novel statistical error analysis framework for uniform and floating-point quantization, providing theoretical insight into error behavior across quantization configurations. Building on this analysis, we propose iterative quantizers designed for arbitrary data distributions and analytic quantizers tailored for Gaussian-like weight distributions. These methods enable efficient, low-error quantization suitable for both activations and weights. We incorporate our quantizers into quantization-aware training and evaluate them across integer and floating-point formats. Experiments demonstrate improved accuracy and stability, highlighting the effectiveness of our approach for training low-precision neural networks.
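
To illustrate why the choice of quantization parameters matters, the sketch below quantizes Gaussian-like weights with a symmetric uniform quantizer and picks the clipping range by a plain grid search over empirical mean squared error. This is a stand-in for illustration only: it is not the paper's iterative or analytic quantizer, and the function names and grid are assumptions.

```python
import random


def quantize(x, clip, bits):
    """Symmetric uniform quantizer with range [-clip, clip]."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 levels each side for 4 bits
    scale = clip / qmax
    q = round(x / scale)
    q = max(-qmax, min(qmax, q))        # saturate values outside the range
    return q * scale


def mse(data, clip, bits):
    return sum((x - quantize(x, clip, bits)) ** 2 for x in data) / len(data)


def best_clip(data, bits, n_grid=50):
    """Grid search over clipping ranges up to the largest magnitude seen."""
    max_abs = max(abs(x) for x in data)
    candidates = [max_abs * (i + 1) / n_grid for i in range(n_grid)]
    return min(candidates, key=lambda c: mse(data, c, bits))


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(2000)]
max_abs = max(abs(w) for w in weights)
clip4 = best_clip(weights, bits=4)
err_best = mse(weights, clip4, bits=4)
err_maxabs = mse(weights, max_abs, bits=4)
```

Comparing `err_best` with `err_maxabs` shows the trade-off a quantizer has to balance: a wide range avoids clipping the tails but spends levels on rarely visited values, while a narrower range gives finer resolution near zero at the cost of saturation.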

2605.17718 2026-05-19 stat.ML cs.LG

How does feature learning reshape the function space?

特征学习如何重塑函数空间?

João Lobo, Bruno Loureiro, Long Tran-Than, Fanghui Liu

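AI summary: This paper studies how feature learning under gradient descent training changes the function space of two-layer neural networks, showing that it acts as a distributional transformation in parameter space or input space and describing its effect on the spectral structure of the function space.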

详情
Comments
59 pages, 1 figure
AI中文摘要

特征学习被广泛认为是区分神经网络与固定核方法的关键机制,但其对诱导函数空间的影响仍不明确。本文精确刻画了两层神经网络特征所张开的函数空间在梯度下降训练中的演变。我们证明,在高维比例 regime 中,经过大梯度步后,更新后的特征分布可近似为一个依赖目标的 spiked Gaussian 协方差。这诱导出一个数据自适应的核,重塑函数空间并修改其谱结构。我们的分析揭示,特征学习可以被解释为参数空间或输入空间中的分布变换,等价于引入一个依赖目标的核。特别是,它会选择性地放大与目标方向对齐的本征值,并混合主导本征函数,将顶部径向模式与目标对齐的二次谐波耦合。总体而言,我们的结果为早期阶段的特征学习提供了精确的函数空间视角:而不是仅仅缩放固定核,梯度下降诱导出一种数据自适应的变形,优先增强与数据中的信号方向对齐的方向。

英文摘要

Feature learning is widely regarded as the key mechanism distinguishing neural networks from fixed-kernel methods, yet its impact on the induced function space remains poorly understood. In this work, we precisely characterize how the function space spanned by the features of a two-layer neural network evolves during gradient descent training. We prove that, in the high-dimensional proportional regime, after a large gradient step the post-update feature distribution is well approximated by a target-dependent spiked Gaussian covariance. This induces a data-adaptive kernel that reshapes the function space and modifies its spectral structure. Our analysis reveals that feature learning can be interpreted as a distributional transformation in either parameter space or input space, equivalently as the introduction of a target-dependent kernel. In particular, it selectively amplifies eigenvalues aligned with the target direction and mixes leading eigenfunctions, coupling the top radial mode with a target-aligned quadratic harmonic. Overall, our results provide a precise function-space perspective on early-stage feature learning: rather than just rescaling a fixed kernel, gradient descent induces a data-adaptive deformation that preferentially enhances directions aligned with the signal in the data.

2605.17705 2026-05-19 stat.ML cs.LG stat.ME

Online Conformal Prediction for Non-Exchangeable Panel Data

在线非交换面板数据的符合预测

Daohong Tu, Kay Giesecke

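AI summary: This paper proposes a simple online conformal framework for non-exchangeable panel data. It exploits a key feature of online panel prediction: when a forecast is needed for one unit, contemporaneous outcomes from related units may already be observed and can serve as a calibration panel. Prediction sets are formed with adaptive quantities, yielding a long-run coverage guarantee.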

详情
Comments
34 pages, 5 figures
AI中文摘要

面板数据,其中多个单位在时间上被反复观察,出现在科学和工程中。在这样的设置中量化预测不确定性具有挑战性,因为符合预测,虽然分布无关且模型无关,但传统上依赖于可交换性假设,这些假设在时间依赖性和单位异质性下失效。我们提出了一种简单的在线符合框架用于非交换面板数据。该方法利用了在线面板预测的一个关键特征:当需要对一个单位进行预测时,相关单位的同期结果可能已经观察到,并可以作为校准面板。在每一轮中,使用当前观察到的校准单位以及两个适应性量来形成预测集:基于历史的相似性权重,强调与目标相似的校准单位,以及一个适应性的误覆盖水平,当目标反馈被揭示时会更新。这种双状态设计产生了一种逐步覆盖界和长期覆盖保证。在经验上,跨合成和真实面板数据集,该方法通过适应性区间宽度分配改进了最差覆盖目标单位的覆盖,而不是均匀膨胀。两个状态是互补的:相似性权重在目标反馈稀疏时保护覆盖,而适应性水平进一步在反馈积累时提高覆盖。

英文摘要

Panel data, in which multiple units are repeatedly observed over time, arise throughout science and engineering. Quantifying predictive uncertainty in such settings is challenging because conformal prediction, while distribution-free and model-agnostic, classically relies on exchangeability assumptions that fail under temporal dependence and unit heterogeneity. We propose a simple online conformal framework for non-exchangeable panel data. The method exploits a key feature of online panel prediction: when a forecast is required for one unit, contemporaneous outcomes from related units may already be observed and can serve as a calibration panel. At each round, prediction sets are formed using currently observed calibration units together with two adaptive quantities: history-based similarity weights that emphasize calibration units resembling the target, and an adaptive miscoverage level that is updated whenever target feedback is revealed. This two-state design yields a stepwise coverage bound and a long-run coverage guarantee. Empirically, across synthetic and real panel data sets, the method improves coverage on the worst-covered target units through adaptive interval-width allocation rather than uniform inflation. The two states are complementary: similarity weights protect coverage when target feedback is sparse, while the adaptive level further improves coverage as feedback accumulates.
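
The two adaptive quantities named in the abstract can be sketched in a few lines. The code below is an assumption-laden illustration rather than the authors' algorithm: the interval uses a similarity-weighted quantile of absolute residuals from the calibration units, and the miscoverage level is updated with the standard adaptive conformal rule $\alpha_{t+1}=\alpha_t+\gamma(\alpha-\mathrm{err}_t)$, which may differ from the paper's update. How the similarity weights are computed from history is left out, and all names are hypothetical.

```python
def weighted_quantile(scores, weights, level):
    """Smallest score whose cumulative weight share reaches `level`."""
    level = min(max(level, 0.0), 1.0)   # keep the level inside [0, 1]
    pairs = sorted(zip(scores, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for score, w in pairs:
        cum += w
        if cum >= level * total:
            return score
    return pairs[-1][0]


def prediction_interval(point, cal_scores, cal_weights, alpha_t):
    """Symmetric interval around a point forecast for the target unit."""
    q = weighted_quantile(cal_scores, cal_weights, 1.0 - alpha_t)
    return point - q, point + q


def update_level(alpha_t, alpha_target, covered, step=0.05):
    """Adaptive-conformal-style update once target feedback is revealed."""
    err = 0.0 if covered else 1.0
    return alpha_t + step * (alpha_target - err)


# One round: four calibration units with equal similarity weights.
lo, hi = prediction_interval(5.0, [1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], 0.25)
after_hit = update_level(0.1, 0.1, covered=True)
after_miss = update_level(0.1, 0.1, covered=False)
```

A miss lowers the working level, which widens the next interval; a hit raises it slightly. When no target feedback arrives, the level stays put and only the weights adapt the interval.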

2605.17689 2026-05-19 stat.ME math.ST stat.TH

Do Stationarity Transformations Actually Improve Time Series Forecasts? A Controlled Experimental Evaluation

平稳性变换真的能提升时间序列预测吗?一种受控的实验评估

Bhanu Suraj Malla, Yuqing Hu

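AI summary: This paper builds synthetic datasets with known properties (trend, seasonality, heteroscedasticity, and combinations) and applies 14 transformation configurations across seven models and three forecast horizons (3,528 experiments) to assess how stationarity transformations affect forecast accuracy across non-stationarity types and model families. Matched transforms improve forecasts only 18% of the time, variance stabilization performs better on heteroscedastic data, and differencing linear-trend series worsens forecasts. The results suggest choosing transformations by empirical out-of-sample evaluation rather than by theoretical stationarity assumptions.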

详情
AI中文摘要

平稳性变换是时间序列预测中的标准预处理步骤,但其在不同非平稳性类型和模型家族中的实际影响尚未受到严格控制的评估。我们构建了具有已知性质(趋势、季节性、异方差性及组合)的合成数据集,并在七种模型和三种预测时间跨度(3,528次实验)上应用14种变换配置。平稳性通过十种统计检验的共识比率量化,每种变换-数据集对根据变换是否针对数据集的已知非平稳性被分类为匹配或不匹配。对于匹配对,变换仅在18%的情况下提升预测。主要例外是方差稳定化:在异方差数据上,对数和Box-Cox变换在60-65%的情况下提升准确性。差分线性趋势序列(教科书用例)在所有测试情况下均降低预测精度。中介分析确认,尽管变换实现趋势平稳性,但并未降低预测误差;机制是信号衰减。在TSA机场乘客数据上的现实验证证实了这些发现。我们的结果表明,变换选择应基于经验性外样本评估,而非理论平稳性假设。

英文摘要

Stationarity transformations are standard preprocessing in time series forecasting, yet their actual impact on accuracy across different non-stationarity types and model families has received little controlled evaluation. We construct synthetic datasets with known properties - trend, seasonality, heteroscedasticity, and combinations - and apply fourteen transformation configurations across seven models and three forecast horizons (3,528 experiments). Stationarity is quantified via consensus ratios from ten statistical tests, and each transform-dataset pair is classified as matched or mismatched based on whether the transform targets the dataset's known non-stationarity. For matched pairs, transforms improve forecasts only 18% of the time. The primary exception is variance stabilization: log and Box-Cox on heteroscedastic data improve accuracy in 60-65% of cases. Differencing a linear-trend series - a textbook use case - worsens forecasts in all cases tested. Mediation analysis confirms that while transforms achieve trend stationarity, this does not translate into lower forecast error; the mechanism is signal attenuation. Real-world validation on TSA airport passenger data corroborates these findings. Our results suggest transformation selection should be guided by empirical out-of-sample evaluation rather than theoretical stationarity assumptions.
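
One simple setting in which differencing a linear-trend series can hurt is sketched below. It is not the paper's experimental design: the series is a straight line plus white noise, the two forecasters (an ordinary least squares trend fit on the levels, and a drift estimate on the first differences) and all parameter values are assumptions chosen for illustration.

```python
import random


def ols_trend_forecast(y, h):
    """Fit y_t = a + b*t by least squares and extrapolate h steps ahead."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + h)


def differenced_forecast(y, h):
    """Forecast from first differences: last level plus h times mean change."""
    diffs = [b - a for a, b in zip(y, y[1:])]
    drift = sum(diffs) / len(diffs)
    return y[-1] + h * drift


def simulate(n=100, h=5, reps=500, seed=0):
    rng = random.Random(seed)
    se_trend = se_diff = 0.0
    for _ in range(reps):
        series = [1.0 + 0.5 * t + rng.gauss(0.0, 1.0) for t in range(n + h)]
        train, target = series[:n], series[-1]
        se_trend += (ols_trend_forecast(train, h) - target) ** 2
        se_diff += (differenced_forecast(train, h) - target) ** 2
    return se_trend / reps, se_diff / reps


mse_trend, mse_diff = simulate()
```

In this setup the differenced forecast starts from the last observation, so it carries that observation's noise in full, whereas the trend fit averages noise over the whole sample. Whether the same effect drives the paper's results is not something this sketch can establish.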

2605.17678 2026-05-19 stat.ML cs.LG

On Gaussian approximation for entropy-regularized Q-learning with function approximation

关于熵正则化Q学习与函数逼近的高斯近似

Artemy Rubtsov, Rahul Singh, Eric Moulines, Alexey Naumov, Sergey Samsonov

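AI summary: This paper studies rates of convergence in the high-dimensional central limit theorem for entropy-regularized asynchronous Q-learning with linear function approximation and polynomial step sizes, establishes a Gaussian approximation bound in the convex distance, and derives high-order moment bounds for the algorithm's last iterate.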

详情
AI中文摘要

在本文中,我们推导了在高维中心极限定理下,由熵正则化异步Q学习生成的Polyak-Ruppert平均迭代的收敛速率。假设观测到的三元组序列$(s_k,a_k,s_{k+1})_{k \geq 0}$形成一个均匀几何递归的马尔可夫链,并在适当的正则性条件下,针对投影软贝尔曼方程,我们建立了在凸距离下的高斯近似界,其收敛速率的顺序为$n^{-1/4}$,至多包含多项对数因子,其中$n$是算法所用的样本数量。为了获得这一结果,我们结合了软贝尔曼递归的线性化与对主导测度项的高斯近似。最后,我们推导了算法最后迭代的高阶矩界,这可能具有独立兴趣。

英文摘要

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak--Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize $k^{-ω}$, $ω\in (1/2,1)$. Assuming that the sequence of observed triples $(s_k,a_k,s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order $n^{-1/4}$, up to polylogarithmic factors in $n$, where $n$ is the number of samples used by the algorithm. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's last iterate, which might be of independent interest.

2605.17660 2026-05-19 math.OC cs.AI cs.LG stat.ML

Training Infinitely Deep and Wide Transformers

训练无限深且宽的Transformer

Raphaël Barboni, Maarten V. de Hoop, Takashi Furuya, Gabriel Peyré

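AI summary: This paper develops a rigorous mathematical framework for analyzing gradient-based training dynamics of transformers in the mean-field regime. Studying a mean-field model of infinitely deep and wide transformers, it derives an explicit formula for the conditional Wasserstein gradient of the training risk and proves that, under an NTK injectivity assumption, gradient flow converges to a global minimum when the initial loss is sufficiently small.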

详情
AI中文摘要

Transformers已成为现代机器学习中占主导地位的架构,但其训练动态的理论理解仍然有限。本文开发了一个严格的数学框架,用于分析在均场 regime 中Transformer的梯度基于训练动态,其中深度(层数)和宽度(注意头数)趋于无穷大。虽然ResNet训练可以理解为控制神经ODE,但Transformer训练对应于控制神经PDE,因为通过注意力机制耦合了多个token分布。我们的均场模型特征两种类型的测度表示:通过层演变的token分布和每层的注意力参数。我们建立了无限深Transformer前向传递的well-posedness,通过流映射来表征token演变,这些流映射满足函数空间中的ODE。利用伴随敏感度分析,我们推导出训练风险的条件Wasserstein梯度的显式公式,该公式涉及由反向ODE控制的伴随变量。我们证明了在条件Wasserstein度量空间中梯度流曲线的存在性和唯一性,建立了梯度基于Transformer训练的严格基础。一个关键技术贡献是提供了注意力机制的神经切线核(NTK)注入性的必要且充分条件:我们证明NTK注入性等同于log-sum-exp函数的线性独立性模仿射函数,这一条件由多种token分布满足,包括离散分布、均匀分布和高斯混合分布。在NTK注入性假设下,我们证明当初始损失足够小时,梯度流收敛到全局极小值,消除了优化景观中的虚假局部极小值。

英文摘要

Transformers have become the dominant architecture in modern machine learning, yet the theoretical understanding of their training dynamics remains limited. This paper develops a rigorous mathematical framework for analyzing gradient-based training of transformers in the mean-field regime, where both the depth (number of layers) and width (number of attention heads) tend to infinity. While ResNet training can be understood as controlling a neural ODE, transformer training corresponds to controlling a neural PDE, due to the coupling of multiple token distributions through the attention mechanism. Our mean-field model features two types of measure representations: token distributions evolving through layers and attention parameters at each layer. We establish well-posedness of the forward pass through infinitely deep transformers, characterizing token evolution via flow maps that satisfy ODEs in function spaces. Using adjoint sensitivity analysis, we derive an explicit formula for the conditional Wasserstein gradient of the training risk, involving adjoint variables governed by backward ODEs. We prove the existence and uniqueness of gradient flow curves in the conditional Wasserstein metric space, establishing a rigorous foundation for gradient-based transformer training. A key technical contribution is providing necessary and sufficient conditions for injectivity of the Neural Tangent Kernel (NTK) for attention mechanisms: we show that NTK injectivity is equivalent to linear independence of log-sum-exp functions modulo affine functions, a condition satisfied by diverse token distributions, including discrete distributions, uniform distributions, and Gaussian mixtures. Under this NTK injectivity assumption, we prove that gradient flow converges to global minima when the initial loss is sufficiently small, eliminating spurious local minima from the optimization landscape.

2605.17646 2026-05-19 stat.ME

Starshaped Mean Residual Life Models for Non-Monotonic Survival Data: A Bayesian PMRL Regression Framework with Applications to Teacher Retention

具有非单调生存数据的星形平均残余寿命模型:一种贝叶斯PMRL回归框架及其在教师留任中的应用

Mohammad Sepehrifar

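AI summary: This paper proposes a Starshaped Mean Residual Life (SMEL) framework for survival data with non-monotonic hazard patterns and, through Bayesian PMRL regression, demonstrates an application to teacher retention; the approach is suited to complex temporal dynamics.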

详情
AI中文摘要

我们开发了一种适用于具有非单调危险模式的生存数据的星形平均残余寿命(SMEL)框架,其中早期阶段的流失后是中期职业的稳定期。与Cox比例危险模型或标准平均残余寿命模型要求单调性不同,SMEL通过仅要求m(t)/t非递减,正式化了从易感性到平衡的过渡。我们通过比例平均残余寿命(PMRL)模型将SMEL扩展到回归设置,m(t|Z)=m0(t)exp(Z⊤γ),使用三参数Weibull-韧性分布和No-U-Turn Sampler进行自适应贝叶斯估计。在48,000个数据集上的蒙特卡洛模拟显示,SMEL-PMRL在40%右删失下保持偏差≤0.02,比Cox模型减少19%的整合Brier分数(2.34 vs. 2.88×10⁻²),并实现5.4%的AIC改进。通过共享 frailty 的纵向-生存扩展,可以同时建模相关的时间到事件和连续结果。应用于169名农村STEM教师(2018-2023,NSF Noyce)确认了星形平衡(Λ=12.47,p=0.002),其中38%的早期职业任期下降(第1-3年)。联合模型(θ̂=0.41,95% CI: [0.35, 0.47])显示在第3年后持续存在可带来四年内31点的累积成就提升(0.56 SD)。SMEL-PMRL为工作动态和高流失率设置提供了灵活且理论支持的替代方案,特别是在平衡过程主导长期稳定性的场景中。

英文摘要

We develop a Starshaped Mean Residual Life (SMEL) framework for survival data with non-monotonic hazard patterns, where early-stage attrition is followed by mid-career stabilization. Unlike Cox proportional hazards models or standard mean residual life models requiring monotonicity, SMEL accommodates complex temporal dynamics by requiring only that $m(t)/t$ be nondecreasing, formalizing the transition from vulnerability to equilibrium. We extend SMEL to regression settings via proportional mean residual life (PMRL) models, $m(t\mid Z)=m_0(t)\exp(Z^\topγ)$, with adaptive Bayesian estimation using three-parameter Weibull--resilience distributions and the No-U-Turn Sampler. Monte Carlo simulations across 48,000 datasets show SMEL-PMRL maintains bias $\leq 0.02$ under 40\% right-censoring, reduces integrated Brier score by 19\% over Cox models ($2.34$ vs.\ $2.88\times10^{-2}$), and achieves 5.4\% AIC improvement. Joint longitudinal-survival extensions via shared frailty enable simultaneous modeling of correlated time-to-event and continuous outcomes. Application to 169 rural STEM teachers (2018--2023, NSF Noyce) confirms starshaped equilibrium ($Λ=12.47$, $p=0.002$), with 38\% early-career tenure decline (years 1--3). The joint model ($\hatθ=0.41$, 95\% CI: $[0.35,\,0.47]$) shows persistence beyond year~3 yields 31-point cumulative achievement gains (0.56~SD) over four years. SMEL-PMRL offers a flexible, theoretically grounded alternative to proportional hazards for workforce dynamics and high-attrition settings where equilibrium processes govern long-term stability.

2605.17592 2026-05-19 math.FA math-ph math.MP quant-ph stat.ME

Ordered POVMs and Residual Collapse

有序POVMs与残余坍缩

James Tian

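AI summary: This paper studies ordered realizations of discrete POVMs through a residual transform generated by sequential tests, introduces the collapsed POVM obtained by iterating the transform, and characterizes the associated equivalence relation, range, and fibers.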

详情
AI中文摘要

通过连续测试生成的残余变换研究了有序离散POVMs的实现。变换的一种应用是将每个坐标替换为所有先前测试失败后获得的效果,并将剩余质量作为终端结果。在自然假设下,迭代变换产生一个坍缩的POVM,其非逃逸坐标是原始效果经过所有先前测试后存活的部分。所得到的坍缩映射在有序POVM实现上定义了等价关系。其范围和纤维被刻画。范围由坍缩的POVM组成,其非逃逸坐标彼此正交且其支撑投影强和为恒等元。坍缩POVM的纤维由所有具有相同残余可见压缩的有序实现组成。特别是,不同的有序实现,包括具有不同非对角耦合数据的实现,可以有相同的坍缩图像。坍缩后,非逃逸坐标在进一步的残余迭代下保持不变。剩余的动力学发生在逃逸效果中,该效果被通用的标量函数 calculus 分裂。

英文摘要

Ordered realizations of discrete POVMs are studied through a residual transform generated by sequential tests. One application of the transform replaces each coordinate by the effect obtained after all earlier tests have failed, and appends the remaining mass as a terminal outcome. Under natural hypotheses, iterating the transform produces a collapsed POVM whose non-escape coordinates are the parts of the original effects that survive all earlier tests. The resulting collapse map gives an equivalence relation on ordered POVM realizations. Its range and fibers are characterized. The range consists of collapsed POVMs, whose non-escape coordinates are mutually orthogonal and whose support projections strongly sum to the identity. The fiber over a collapsed POVM consists of all ordered realizations with the same residually visible compressions. In particular, different ordered realizations, including ones with different off-diagonal coupling data, can have the same collapsed image. After collapse, the non-escape coordinates are fixed under further residual iteration. The remaining dynamics takes place in the escape effect, which is fragmented by a universal scalar functional calculus.

2605.17585 2026-05-19 stat.ME math.ST stat.TH

Modelling pairs of Poissons and binomials with negative correlation

用负相关建模泊松分布和二项分布对

Nils Lid Hjort

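AI summary: This paper constructs bivariate distributions with negative correlation by introducing adjustment functions, applies the construction to Poisson and binomial pairs, and illustrates it on seed competition data from soil plots and on the Audit-C screening questionnaire.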

详情
Comments
14 pages, 4 figures, 3 tables; Statistical Research Report, Department of Mathematics, University of Oslo, 17 May 2026; submitted for publication
AI中文摘要

假设有给定的边缘分布$f_1(x)$和$f_2(y)$对于对$(x,y)$。我考虑构造$f_1(x)f_2(y)\{1+αh_1(x)h_2(y)\}$,其中$h_1$和$h_2$被视为有界的调整函数,归一化为在$f_1$和$f_2$下具有零均值。这定义了一个双变量分布$(X,Y)$,具有指定的边缘密度$f_1$和$f_2$,并具有可取的α值区间,正负都有;特别是,独立性对应于调整参数区域内的内点。讨论了应用于双变量泊松分布,允许正负相关。作为示例,我提供了一个更准确和扩展的分析,关于竞争种子和植物的数据集,涉及n=958块土壤,此前在被广泛引用的论文Lakshminarayana, Pandit, Rao, Srinivasa (1999)中分析过。一般的装置也显示适用于负相关的二项分布。这些方法在两个-by-两个表格的元分析框架中得到示例,涉及不同研究中的审计-C筛查问卷,再次展示了负相关,即X为正确“是”的数量,Y为正确“否”的数量。

英文摘要

Suppose $f_1(x)$ and $f_2(y)$ are given marginals for pairs $(x,y)$. I consider the construction $f_1(x)f_2(y)\{ 1+\alpha h_1(x)h_2(y) \}$, where $h_1$ and $h_2$ are seen as bounded adjustment functions, normalised to have means zero under $f_1$ and $f_2$. This defines a bivariate distribution for $(X,Y)$ with the specified marginal densities $f_1$ and $f_2$, with an interval of permissible values of $\alpha$, both positive and negative; in particular, independence corresponds to an inner point of the adjustment parameter region. Applications to bivariate Poisson distributions, allowing both positive and negative correlation, are discussed. As illustration I provide a more accurate and extended analysis of a Poisson pairs dataset, pertaining to competing seeds and plants, for $n=958$ plots of soil, earlier analysed in the well-cited paper Lakshminarayana, Pandit, Rao, Srinivasa (1999). The general apparatus is also shown to work for negatively correlated binomials. Those methods are illustrated in a meta-analysis framework for two-by-two tables across different studies, pertaining to the Audit-C screening questionnaire for alcohol use disorders, where again negative correlation is demonstrated, between $X$, the number of correct `yes', and $Y$, the number of correct `no'.
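
The construction can be tried out numerically for a pair of Poissons. The sketch below is illustrative only: the adjustment function $h(k)=e^{-k}-\mathrm{E}[e^{-X}]$ is an assumption made here (the abstract does not say which bounded functions the paper uses), chosen because it is bounded and has mean zero under the Poisson marginal, with $\mathrm{E}[e^{-X}]=\exp\{-\lambda(1-e^{-1})\}$. The range returned by `alpha_bounds` is a sufficient range that keeps the factor $1+\alpha h_1h_2$ nonnegative.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def mean_exp_neg(lam):
    # E[exp(-X)] for X ~ Poisson(lam)
    return math.exp(-lam * (1.0 - math.exp(-1.0)))


def h(k, lam):
    # Assumed adjustment function: bounded, with mean zero under Poisson(lam).
    return math.exp(-k) - mean_exp_neg(lam)


def alpha_bounds(lam1, lam2):
    # h takes values in (-c, 1 - c], so the product h1*h2 is bounded.
    c1, c2 = mean_exp_neg(lam1), mean_exp_neg(lam2)
    largest = max(c1 * c2, (1 - c1) * (1 - c2))
    smallest = -max(c1 * (1 - c2), (1 - c1) * c2)
    return -1.0 / largest, -1.0 / smallest


def joint_pmf(x, y, lam1, lam2, alpha):
    base = poisson_pmf(x, lam1) * poisson_pmf(y, lam2)
    return base * (1.0 + alpha * h(x, lam1) * h(y, lam2))


def correlation(lam1, lam2, alpha, kmax=80):
    # Truncated sum; the Poisson means and variances are lam1 and lam2.
    exy = sum(x * y * joint_pmf(x, y, lam1, lam2, alpha)
              for x in range(kmax) for y in range(kmax))
    return (exy - lam1 * lam2) / math.sqrt(lam1 * lam2)


alpha_lo, alpha_hi = alpha_bounds(2.0, 3.0)
rho_neg = correlation(2.0, 3.0, alpha_lo)
rho_pos = correlation(2.0, 3.0, alpha_hi)
marginal_x1 = sum(joint_pmf(1, y, 2.0, 3.0, alpha_lo) for y in range(80))
```

Because $h_2$ has mean zero under $f_2$, summing the joint mass over $y$ returns $f_1(x)$ for any permissible $\alpha$, and the sign of $\alpha$ sets the sign of the correlation for this choice of $h$. Setting $\alpha=0$ recovers independence, which lies strictly inside the permissible range.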

2605.17559 2026-05-19 stat.ME cs.AI q-bio.QM stat.ML

Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels

通过再生核来控制任意结构假设空间中的假发现

Binyamin Perets, Shie Mannor

AI总结 本文提出了一种基于再生核的框架,用于在任意结构的假设空间中控制假发现率,通过将结构FDR控制转化为正则化学习问题,实现了对连续域、图和层次结构的统一处理,提高了发现能力。

详情
Comments
9 pages
AI中文摘要

大规模假设检验是现代科学的核心,其中控制假发现率(FDR)已成为管理多个同时检验中假阳性的一种标准方法。假设很少是孤立存在的;它们通常通过接近性、连接性或层次结构表现出结构。这种结构既是挑战也是机会:虽然经典方法将这些依赖性视为需要保守校正的障碍,但利用它们可以显著提高发现能力。本文将结构化的FDR控制重新表述为一个正则化学习问题。通过在合适的再生核希尔伯特空间(RKHS)中优化,我们引入了一个框架,通过仅选择合适的核,将连续域、图和层次结构统一到单一算法中。这种形式化使我们能够用平滑的解决方案替代先前方法的分段常数拟合,通过原理化的基于似然的超参数选择而不是启发式调整,并在未观测位置进行推断,从而支持样本效率的实验设计。在该估计器的基础上,我们提供了两个决策规则,我们证明它们能够控制FDR。我们验证了我们的方法在两个来源上:来自高维现实数据集的空间位置,以及利用蛋白质-蛋白质相互作用图的差异基因表达任务。

英文摘要

Large-scale hypothesis testing is central to modern science, where controlling the False Discovery Rate (FDR) has become the standard approach to managing false positives across many simultaneous tests. Hypotheses rarely exist in isolation; they often exhibit structure through proximity, connectivity, or hierarchy. This structure represents both a challenge and an opportunity: while classical methods treat these dependencies as obstacles requiring conservative correction, leveraging them can substantially increase discovery power. Here, we reframe structured FDR control as a regularized learning problem. By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules which we prove to control the FDR. We validate our method on two sources: spatial locations derived from high-dimensional real-world datasets, and a differential gene expression task utilizing protein-protein interaction graphs.
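For context, the classical unstructured baseline that structured methods are usually compared against is the Benjamini-Hochberg step-up procedure. The sketch below shows that baseline only; it is not the RKHS estimator or either decision rule from the paper, and the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under the BH line q * rank / m.
        if pvalues[index] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvalues, q=0.05))
```

The procedure treats all hypotheses as exchangeable, which is exactly the information loss that structure-aware approaches try to avoid.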

2605.17518 2026-05-19 physics.data-an stat.AP stat.ML

Integrating Bayesian Spectral Deconvolution and Expert Scientific Reasoning for Robust Peak Estimation

将贝叶斯频谱反卷积与专家科学推理相结合以实现稳健的峰估计

Hayato Okubo, Yoshifumi Amamoto, Toshimitsu Aritake, Hiroyuki Kumazoe, Shiryu Nakano, Evan Jamison, Satoshi Tanaka, Yoh-ichi Mototake

AI总结 本文提出一种结合频谱反卷积与专家科学推理的贝叶斯框架,用于在存在高强度噪声或未知背景时稳健地估计峰结构,通过将物理性质回归层与贝叶斯频谱反卷积结合,提升峰识别的准确性。

详情
Comments
55 pages, 26 figures
AI中文摘要

频谱反卷积对于提取编码材料性质和化学结构的峰结构至关重要,但传统自动化方法在频谱包含高强度噪声或未知背景成分时往往失效。在实践中,科学家很少孤立地解释频谱。相反,他们通过将频谱结构与辅助信息如物理性质值、化学结构和相关测量中的趋势联系起来,来识别具有物理意义的峰。本文提出了一种将频谱反卷积与专家科学推理模型相结合的贝叶斯框架。在此工作中,专家科学推理指的是通过候选频谱结构与独立测量的物理性质值的一致性来评估它们,而不是在推断过程中进行手动专家干预。我们将这种推理形式化为一个物理性质回归层,通过高斯过程回归实现,并将其与贝叶斯频谱反卷积结合。通过在从贝叶斯频谱反卷积推断出的后验预测频谱上平均物理性质似然,所提出的方法根据推断频谱结构与物理性质信息之间的一致性来选择频谱模型。我们使用具有高强度噪声或未知背景的合成频谱和聚乳酸的红外频谱验证了该框架。该方法恢复了传统贝叶斯频谱反卷积所遗漏或误识别的具有物理意义的峰结构,包括与测量降解率相关的聚乳酸红外频谱中的弱峰。这些结果表明,将专家科学推理与贝叶斯频谱反卷积相结合,能够在频谱单独推断不可靠的条件下实现稳健的峰估计。

英文摘要

Spectral deconvolution is essential for extracting peak structures that encode material properties and chemical structures, but conventional automated methods often fail when spectra contain high-intensity noise or unknown background components. In practice, scientists rarely interpret spectra in isolation. Instead, they identify physically meaningful peaks by relating spectral structures to auxiliary information such as physical-property values, chemical structures, and trends across related measurements. Here, we propose a Bayesian framework that integrates spectral deconvolution with a model of expert scientific reasoning. In this work, expert scientific reasoning refers to the practice of evaluating candidate spectral structures by their consistency with independently measured physical-property values, rather than to manual expert intervention during inference. We formalize this reasoning as a physical-property regression layer, implemented using Gaussian process regression, and couple it with Bayesian spectral deconvolution. By averaging the physical-property likelihood over posterior predictive spectra inferred from Bayesian spectral deconvolution, the proposed method selects spectral models according to the consistency between inferred spectral structures and physical-property information. We validate the framework using synthetic spectra with high-intensity noise or unknown backgrounds and infrared spectra of poly(lactic acid). The method recovers physically meaningful peak structures that conventional Bayesian spectral deconvolution misses or misidentifies from spectra alone, including weak peaks in poly(lactic acid) IR spectra related to measured degradation rates. These results demonstrate that integrating expert scientific reasoning with Bayesian spectral deconvolution enables robust peak estimation under conditions where spectrum-only inference is unreliable.

2605.17474 2026-05-19 math.ST stat.ME stat.TH

Multivariate EDF tests for uniformity, normality, spherical and elliptical symmetry, and independence based on a Brownian sheet deconstruction

多变量EDF检验用于均匀性、正态性、球面和椭球对称性及独立性的Brownian sheet分解

Alejandra Cabaña, Enrique M. Cabaña

AI总结 本文扩展了一种最近提出的基于EDF的适配性检验方法,用于超立方体[0,1]^p,即m检验和s检验,这些方法基于对p参数Brownian sheet的独特分解。通过将原假设所隐含的联合分布经过适当映射后分解为独立的连续分量,可以将问题转化为超立方体上的均匀性检验。本文引入并分析了基于这些原理的新检验方法,用于检验超球面S^p上的均匀性、多变量正态性、球面和椭球对称性以及R^p中的独立性。该方法基于将有限的有符号测度分解为零边际分量,以隔离坐标间相互作用。实证检验显示,这些扩展方法在统计文献中与现有方法具有竞争力,特别是在检测基于坐标的依赖性和联合依赖结构方面表现出高度敏感性。

详情
Comments
Accompanying R package: https://github.com/emcabana/MuniCandS
AI中文摘要

本文扩展了一种最近提出的基于EDF的适配性检验方法,用于超立方体[0,1]^p,即m检验和s检验,这些方法基于对p参数Brownian sheet的独特分解。我们利用这样一个事实:当原假设隐含的联合分布经过适当映射后可以分解为独立的连续分量时,问题可以转化为通过分量概率积分变换在超立方体上进行均匀性检验。具体来说,我们引入并分析了基于这些原理的新检验方法,用于检验超球面S^p上的均匀性,以及多变量正态性、球面和椭球对称性以及R^p中的独立性。该方法基于将有限的有符号测度分解为零边际分量,以隔离坐标相互作用。实证检验显示,这些扩展方法在统计文献中与现有方法具有竞争力,特别是在检测基于坐标的依赖性和联合依赖结构方面表现出高度敏感性。

英文摘要

This paper extends a recently proposed family of EDF-based goodness-of-fit procedures for the hypercube $[0,1]^p$, the m-test and the s-test, which are based on a unique deconstruction of the $p$-parameter Brownian sheet into independent Gaussian processes. We use the fact that whenever a null hypothesis implies a joint distribution that factorizes into independent continuous components after a suitable mapping, the problem can be reduced to a uniformity test on the hypercube via componentwise probability integral transforms. Specifically, we introduce and analyze new procedures derived from these principles for testing uniformity on the hypersphere $S^p$, as well as multivariate normality, spherical and elliptical symmetry, and independence in $\mathbb{R}^p$. The methodology is based on the decomposition of finite signed measures into zero-marginal components to isolate coordinate interactions. Empirical power comparisons show that these extended procedures are highly competitive with existing methods in the statistical literature, demonstrating particular sensitivity to coordinate-based dependencies and joint dependency structures.
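The reduction to a uniformity test can be illustrated in one dimension. This sketch applies the probability integral transform under a normal null and then computes the Kolmogorov-Smirnov distance to the uniform law. The KS statistic is a stand-in chosen for brevity; it is not the m-test or the s-test, and the sample values are made up.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of N(mean, sd^2), used here as the probability integral transform."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_uniform_statistic(u):
    """Kolmogorov-Smirnov distance between the EDF of u and Uniform(0, 1)."""
    u = sorted(u)
    n = len(u)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
transformed = [normal_cdf(x) for x in sample]  # uniform on [0, 1] under the null
print(ks_uniform_statistic(transformed))
```

In the multivariate setting described in the abstract, the same transform is applied componentwise after a mapping that makes the components independent under the null.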

2605.17404 2026-05-19 cs.DS cs.CR math.NT math.ST quant-ph stat.TH

Module Lattice Security (Part III): Structured CVP Distance on the Log-Unit Lattice

模块格安全(第三部分):对数单位格上的结构化CVP距离

Ming-Xing Luo

AI总结 本文研究了对数单位格上结构化CVP距离的性质,证明了在随机短环元到对数单位格的L²距离收敛到特定值,并展示了其在Voronoi单元内的位置,同时给出了关于L∞范数的近似因子和粗格定理,以及模块行列式理想中的三角函数定理,最终将ML-KEM的CDPR因子从指数级降低到亚多项式级。

详情
Comments
26 pages (simplified version). Most important part in this series
AI中文摘要

我们证明了从随机短环元到Q(ζ_{2^k})的对数单位格的L²CVP距离随着n=2^{k-1}→∞收敛到(π/(2√6))√n。然后我们证明对于k≥4,该目标位于原点的Voronoi单元内。对于L∞范数,n个子高斯坐标上的最大值给出O(√(log n)),这转化为短生成元问题的亚多项式近似因子。我们给出了粗格定理:Babai算法对所有结构化目标返回零,但能精确恢复任意大小的单位扰动。对于模块行列式理想,我们进一步证明了Trigamma定理,该定理表明内在不平衡σ_{g_0}=O(1)与模数q无关。最后,结合前两部分,我们将ML-KEM的CDPR因子从exp(Õ(√n))降低到亚多项式值。

英文摘要

We prove that the $L^2$ CVP distance from a random short ring element to the log-unit lattice of $\mathbb{Q}(\zeta_{2^k})$ converges to $\frac{\pi}{2\sqrt{6}}\sqrt{n}$ as $n=2^{k-1}\to\infty$. We then show that this target lies inside the Voronoi cell of the origin for $k\ge 4$. For the $L^\infty$ norm, the maximum over $n$ sub-Gaussian coordinates yields $O(\sqrt{\log n})$, which translates into a sub-polynomial approximation factor for the Short Generator Problem. We show a Coarse Lattice Theorem: Babai's algorithm returns zero for all structured targets, yet exactly recovers unit perturbations of arbitrary size. For module determinant ideals, we further prove the Trigamma Theorem, which establishes an intrinsic imbalance $\sigma_{g_0}=O(1)$ independent of the modulus $q$. Finally, combined with Parts I and II, we reduce the CDPR factor for ML-KEM from $\exp(\tilde{O}(\sqrt{n}))$ to a sub-polynomial value.
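The $O(\sqrt{\log n})$ rate for the $L^\infty$ norm is what a standard union bound over sub-Gaussian coordinates gives. The derivation below is the textbook argument, stated under the assumption that each coordinate $X_i$ is centered and sub-Gaussian with variance proxy $\sigma^2$; it is not taken from the paper's proof, and $\delta$ is a failure probability introduced here for illustration.

```latex
\begin{align*}
\Pr\bigl(|X_i| > t\bigr) &\le 2\exp\!\Bigl(-\frac{t^2}{2\sigma^2}\Bigr)
  && \text{(sub-Gaussian tail, each } i\text{)} \\
\Pr\Bigl(\max_{1\le i\le n}|X_i| > t\Bigr) &\le 2n\exp\!\Bigl(-\frac{t^2}{2\sigma^2}\Bigr)
  && \text{(union bound, no independence needed)} \\
t = \sigma\sqrt{2\log(2n/\delta)} \;&\Longrightarrow\;
  \Pr\Bigl(\max_{1\le i\le n}|X_i| > t\Bigr) \le \delta .
\end{align*}
```

For fixed $\delta$ and $\sigma = O(1)$ this gives $\max_i |X_i| = O(\sqrt{\log n})$ with probability at least $1-\delta$.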

2605.14692 2026-05-19 math.ST stat.ME stat.TH

Asymptotic Anytime-Valid Inference for U-statistics

关于U统计量的渐近任意时刻有效推断

Leheng Cai, Qirui Hu, Weijia Li

AI总结 本文研究了在连续监控下二阶U统计量的渐近任意时刻有效置信序列,通过Hoeffding投影将非退化情况转化为时间均匀的中心极限理论,同时在退化情况下提出SAGE边界以解决二次高斯-混沌近似问题,最终实现非退化和退化情况下的最优置信区间宽度。

详情
AI中文摘要

我们研究了在连续监控下二阶U统计量的渐近任意时刻有效的置信序列。在非退化情况下,Hoeffding的投影将问题转化为对一级投影部分和的时间均匀中心极限理论,同时在较弱的矩假设下证明了标准余项可以忽略。通过leave-one-out jackknife估计器,得到一个完全数据驱动的程序,从而得到具有渐近覆盖保证的参数置信序列。在退化情况下,我们证明U统计量近似于一个中心化的二次高斯-混沌,而非简单的高斯分布,这给顺序推断带来了重大挑战。为了解决这个问题,我们新颖地开发了Spectrally Allocated Gaussian-chaos Excursion(SAGE)边界,并基于截断谱估计提供插件实现,具有一致性保证。所得到的宽度可以达到预期的时间均匀最优速率:在非退化情况下为√(log log n/n),在退化情况下为log log n/n。讨论了几种广泛使用的U统计量,并通过数值实验进一步支持所推导的理论。

英文摘要

We study asymptotic anytime-valid confidence sequences for degree-two U-statistics under continuous monitoring. In the nondegenerate case, Hoeffding's projection reduces the problem to a time-uniform central limit theory for the partial sums of the first-order projection, while the canonical remainder is shown to be negligible under mild moment assumptions. A leave-one-out jackknife estimator then yields a fully data-driven procedure, leading to confidence sequences with an asymptotic coverage guarantee for the parameter of interest. In the degenerate case, we show that the U-statistic is approximated by a centered quadratic Gaussian chaos rather than by a simple Gaussian, which poses significant challenges for sequential inference. To address this issue, we develop the Spectrally Allocated Gaussian-chaos Excursion (SAGE) boundary and provide plug-in implementations based on truncated spectrum estimation with consistency guarantees. The resulting widths can attain the expected time-uniform optimal rates: $\sqrt{\log\log n/n}$ in the nondegenerate regime and $\log\log n/n$ in the degenerate regime. Several widely used U-statistics are discussed within the proposed framework, and numerical experiments further support the validity of the derived theory.
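As a reminder of the object being monitored, a degree-two U-statistic averages a symmetric kernel over all unordered pairs. The sketch below uses the textbook kernel h(x, y) = (x - y)^2 / 2, whose U-statistic equals the unbiased sample variance. It only computes the statistic; the confidence sequences and the SAGE boundary are not implemented here, and the data are made up.

```python
from itertools import combinations
from statistics import variance

def u_statistic(sample, kernel):
    """Average of a symmetric kernel over all unordered pairs of observations."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)

def variance_kernel(x, y):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2

data = [2.0, 4.0, 4.0, 5.0, 7.0]
print(u_statistic(data, variance_kernel), variance(data))
```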

2604.12288 2026-05-19 stat.ML cs.LG stat.ME

SMART Fine-tuning Factor Augmented Neural Lasso

SMART Fine-tuning Factor Augmented Neural Lasso

Jinhang Chai, Jianqing Fan, Cheng Gao, Qishuo Yin

AI总结 本文提出了一种结合预训练源模型作为增强特征的残差调优框架(SMART),用于高维非参数回归中的变量选择问题,通过引入低秩因子结构和残差调优分解,实现了协变量和后验偏移的联合处理,并推导了最小最大最优的超额风险界。

详情
Comments
Authors are listed in alphabetical order
AI中文摘要

细调是一种广泛用于将预训练模型适应到新任务的策略,然而在高维非参数设置中,其方法论和理论性质在变量选择方面尚未得到发展。我们提出了一种源模型增强残差调优(SMART)框架,该框架将预训练源模型作为增强特征纳入目标学习者,并仅估计残差目标特定组件。该方法广泛适用,从参数和稀疏模型到神经网络和黑箱机器学习模型。我们专注于细调因子增强神经Lasso的发展,从而得到SMART-FAN-Lasso。这种用于高维非参数回归的迁移学习框架,同时处理协变量和后验偏移。我们使用低秩因子结构来管理高维依赖协变量,并在残差调优分解中将目标函数表示为源模型和其他目标特定变量的函数,从而降低目标任务的有效复杂性。我们推导了最小最大最优的超额风险界,刻画了在相对样本量和函数复杂性条件下,细调在统计加速方面优于单任务学习的精确条件。在广泛的不同协变量和后验偏移场景中进行的大量数值实验表明,SMART-FAN-Lasso在严重的目标样本量约束下仍能超越标准基线,实现接近 oracle 的性能,经验上验证了推导的速率。

英文摘要

Fine-tuning is a widely used strategy for adapting pre-trained models to new tasks, yet its methodology and theoretical properties in high-dimensional nonparametric settings with variable selection have not yet been developed. We propose a source-model-augmented residual tuning (SMART) framework, which incorporates the pre-trained source model as an augmented feature into the target learner and estimates only the residual target-specific component. The approach is widely applicable, from parametric and sparse models to neural networks and black-box machine learning models. We focus on the development of fine-tuning factor-augmented neural Lasso, resulting in SMART-FAN-Lasso. This transfer-learning framework for high-dimensional nonparametric regression with variable selection simultaneously handles covariate and posterior shifts. We use a low-rank factor structure to manage high-dimensional dependent covariates and a residual tuning decomposition in which the target function is expressed as a function of the source model and other target-specific variables, thereby reducing the effective complexity of the target task. We derive minimax-optimal excess risk bounds, characterizing the precise conditions, in terms of relative sample sizes and function complexities, under which fine-tuning yields statistical acceleration over single-task learning. Extensive numerical experiments across diverse covariate- and posterior-shift scenarios demonstrate that SMART-FAN-Lasso consistently outperforms standard baselines and achieves near-oracle performance even under severe target sample size constraints, empirically validating the derived rates.

2604.07630 2026-05-19 physics.geo-ph stat.AP

Diffusional earthquakes and their slip-distance scaling

扩散型地震及其滑动距离标度

Dye SK Sato, Keisuke Yoshida

AI总结 研究通过分析扩散型地震的滑动距离标度,揭示了地震活动区域的扩散迁移特性,并建立了统一的标度关系。

详情
Comments
33 pages, 10 figures
AI中文摘要

地震的最终规模通常无法从其持续的地震辐射中预测。不断扩展的观测结果揭示了一些明显的例外,如慢地震、注入诱发地震活动和地震群,其中断层滑动存在上限。这些异常的共同点是其活跃区域的扩散迁移。本文报告了适用于这些扩散型地震的统一标度关系。通过跟踪日本东北地区持续的地震群,我们约束了其活跃地震区域的时间演化和累积地震矩。它们的矩-持续时间轨迹与全球地震群和诱发地震在各种尺度下的最终状态一致。当以地震矩对地震活动区域作图时,它们的轨迹坍缩到慢地震的轨迹上,这可由一个扩散的恒定滑动模型统一解释。这种恒定滑动标度界定了一类独特的扩散型地震,其中最终可用的地震能量由滑动距离预先决定。

英文摘要

The final size of an earthquake typically cannot be predicted from its ongoing seismic radiation. Expanding observations reveal distinct exceptions, such as slow earthquakes, injection-induced seismicity, and earthquake swarms, in which fault slip has an upper bound. A common thread among these anomalies is the diffusive migration of their active areas. Here, we report a unified scaling relation for these diffusional earthquakes. By tracking prolonged earthquake swarms in Northeast Japan, we constrained the time evolution of their active seismicity areas and cumulative seismic moments. Their moment-duration trajectories coincide with the final states documented for global swarms and induced seismicity across various scales. When plotted as seismic moment versus seismicity area, their trajectories collapse onto those of slow earthquakes, uniformly explained by a diffusional constant-slip model. This constant-slip scaling carves out a unique class of diffusional earthquakes, where the final available seismic energy is predetermined by slip distance.
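One way to read the "diffusional constant-slip" scaling is through the standard definition of seismic moment. The short derivation below is a sketch under stated assumptions, not the authors' model: $\mu$ (rigidity), $D_0$ (a constant slip), $\kappa$ (a migration diffusivity), and $c$ (a geometric constant) are symbols introduced here for illustration.

```latex
\begin{align*}
M_0 &= \mu\,\bar{D}\,A
  && \text{(seismic moment: rigidity} \times \text{mean slip} \times \text{area)} \\
A(t) &\approx c\,\kappa\,t
  && \text{(assumed diffusive growth, front radius} \propto \sqrt{\kappa t}\text{)} \\
\bar{D} = D_0 \;&\Longrightarrow\; M_0(t) = \mu D_0\,A(t) \approx \mu D_0\,c\,\kappa\,t .
\end{align*}
```

Under these assumptions the moment is proportional to the seismicity area at every stage, and it grows linearly with duration, which is the kind of collapse onto a single moment-area trend that the abstract describes.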

2604.01160 2026-05-19 stat.ME

Machine learning methods for finite population parameter estimation in survey sampling

用于调查抽样中有限总体参数估计的机器学习方法

Mehdi Dagdoug, David Haziza

AI总结 本文探讨了机器学习方法在调查抽样有限总体推断中的应用,重点在于基于设计的有效性与统计推断。虽然灵活的预测工具能显著提高估计准确性,但也带来了重要挑战,主要是由于拟合预测器与样本之间的依赖性。本文聚焦于预测如何通过模型辅助估计、项目非响应插补和单位非响应调整进入调查估计的场景。对于模型辅助估计和项目非响应,展示了交叉拟合和奈曼正交估计方程如何借鉴双重/去偏机器学习的思想,使高维或非参数学习器得以应用,同时在适当条件下保持根n一致性与渐近正态性。相比之下,对于单位非响应,标准逆概率加权方法是结果无关且操作上具有吸引力的,但这一特性使得双重稳健和正交构造在官方统计中更难部署。此外,还简要讨论了小区域估计和概率/非概率数据整合的相关发展。总体而言,本文突显了机器学习的潜力及其对调查实践提出的根本推断挑战。

详情
AI中文摘要

本文是一篇教学性的综述,探讨了在调查抽样中有限总体推断中使用机器学习方法的应用,重点在于基于设计的有效性与统计推断。虽然灵活的预测工具在估计准确性上带来了显著的提升,但它们也引入了重要的挑战,主要由于拟合的预测器与样本之间的依赖性。我们关注的是预测如何通过模型辅助估计、项目非响应插补和单位非响应调整进入调查估计的场景。对于模型辅助估计和项目非响应,我们展示了交叉拟合和奈曼正交估计方程如何借鉴双重/去偏机器学习的思想,使高维或非参数学习器得以应用,同时在适当条件下保持根n一致性与渐近正态性。相比之下,对于单位非响应,标准逆概率加权方法是结果无关且操作上具有吸引力的,但这一特性使得双重稳健和正交构造在官方统计中更难部署。此外,还简要讨论了小区域估计和概率/非概率数据整合的相关发展。总体而言,本文突显了机器学习的潜力及其对调查实践提出的根本推断挑战。

英文摘要

This pedagogical review examines the use of machine learning methods in finite-population inference for survey sampling, with an emphasis on design-based validity and statistical inference. While flexible prediction tools offer substantial gains in estimation accuracy, they also introduce important challenges, primarily due to the dependence between the fitted predictors and the sample. We focus on settings in which such predictions enter survey estimation through model-assisted estimation, item nonresponse imputation, and unit nonresponse adjustment. For model-assisted estimation and item nonresponse, we show how cross-fitting and Neyman-orthogonal estimating equations can adapt ideas from double/debiased machine learning to survey data, allowing the use of high-dimensional or nonparametric learners while preserving root-n consistency and asymptotic normality under suitable conditions. In contrast, for unit nonresponse, standard inverse-probability weighting remains outcome-agnostic and operationally attractive, but this same feature makes doubly robust and orthogonal constructions harder to deploy in official statistics. We also briefly discuss related developments in small area estimation and probability/nonprobability data integration. Overall, the paper highlights both the promise of machine learning and the fundamental inferential challenges it raises for survey practice.
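To make the model-assisted idea concrete, the sketch below contrasts the Horvitz-Thompson total with a difference-type model-assisted total that adds design-weighted residuals to the sum of predictions over the population. The predictor is passed in as a plain function. In a cross-fitted version, each sampled unit would be predicted by a model trained on the other folds; that step is omitted here. All names and numbers are hypothetical.

```python
def horvitz_thompson_total(y_sample, pi_sample):
    """Design-weighted total using inclusion probabilities pi."""
    return sum(y / pi for y, pi in zip(y_sample, pi_sample))

def model_assisted_total(x_population, sample_indices, y_sample, pi_sample, predict):
    """Sum of predictions over the population plus design-weighted sample residuals."""
    predicted_total = sum(predict(x) for x in x_population)
    correction = sum(
        (y - predict(x_population[i])) / pi
        for i, y, pi in zip(sample_indices, y_sample, pi_sample)
    )
    return predicted_total + correction

x_population = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sample_indices = [0, 2, 5]
y_sample = [2.0, 6.0, 12.0]   # here y = 2x exactly
pi_sample = [0.5, 0.5, 0.5]

print(horvitz_thompson_total(y_sample, pi_sample))
print(model_assisted_total(x_population, sample_indices, y_sample, pi_sample, lambda x: 2.0 * x))
```

When the predictor is accurate the residuals are small, so the correction term contributes little variance; with a useless predictor (always zero) the estimator falls back to the Horvitz-Thompson total.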

2603.11276 2026-05-19 stat.ML cs.LG

RIE-Greedy: Regularization-Induced Exploration for Contextual Bandits

RIE-Greedy: 基于正则化的探索策略用于上下文老虎机

Tong Li, Thiago de Queiroz Casanova, Eric M. Schwartz, Victor Kostyuk, Dehan Kong, Joseph J. Williams

AI总结 本文提出了一种基于正则化的探索策略(RIE-Greedy),利用模型拟合过程中的随机性作为内在探索源,理论证明其在两臂老虎机情况下等价于Thompson Sampling,并在大规模商业环境中优于epsilon-greedy等基准方法。

详情
AI中文摘要

现实中的复杂奖励模型的上下文老虎机问题通常使用迭代训练的模型(如提升树)来解决。然而,直接应用简单的有效探索策略(如Thompson Sampling或UCB)在这些黑箱估计器上很困难。现有方法依赖于复杂的假设或不可行的程序,难以在实践中验证和实现。本文探讨了一种无探索(纯贪婪)的动作选择策略,利用模型拟合过程中的随机性作为内在探索源。更具体地说,我们注意到基于交叉验证的正则化过程中的随机性可以自然地诱导出Thompson Sampling-like的探索。我们证明了这种正则化诱导的探索在两臂老虎机情况下在理论上等价于Thompson Sampling,并在大规模商业环境中相对于epsilon-greedy和其他最先进的方法在经验上实现了可靠的探索。总体而言,本文揭示了正则化估计器训练本身如何诱导有效的探索,为上下文老虎机设计提供了理论洞察和实践指导。

英文摘要

Real-world contextual bandit problems with complex reward models are often tackled with iteratively trained models, such as boosted trees. However, it is difficult to directly apply simple and effective exploration strategies, such as Thompson Sampling or UCB, on top of these black-box estimators. Existing approaches rely on sophisticated assumptions or intractable procedures that are hard to verify and implement in practice. In this work, we explore an exploration-free (pure-greedy) action selection strategy that exploits the randomness inherent in the model-fitting process as an intrinsic source of exploration. More specifically, we note that the stochasticity in the cross-validation-based regularization process can naturally induce Thompson Sampling-like exploration. We show that this regularization-induced exploration is theoretically equivalent to Thompson Sampling in the two-armed bandit case and empirically leads to reliable exploration in large-scale business environments compared to benchmark methods such as epsilon-greedy and other state-of-the-art approaches. Overall, our work reveals how regularized estimator training itself can induce effective exploration, offering both theoretical insight and practical guidance for contextual bandit design.
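For reference, the Thompson Sampling baseline named in the abstract can be sketched for a two-armed Bernoulli bandit with Beta(1, 1) priors. This is the standard textbook algorithm, not RIE-Greedy; the success probabilities, round count, and seed are made up for illustration.

```python
import random

def thompson_two_armed(success_probs, rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on two arms; returns pull counts per arm."""
    rng = random.Random(seed)
    wins = [0, 0]
    losses = [0, 0]
    pulls = [0, 0]
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        draws = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(2)]
        arm = 0 if draws[0] >= draws[1] else 1
        reward = 1 if rng.random() < success_probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

print(thompson_two_armed([0.1, 0.9], rounds=2000))
```

Exploration here comes from posterior sampling noise; the abstract's claim is that randomness in cross-validated regularization can play the same role without an explicit posterior.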

2603.09089 2026-05-19 stat.CO math.PR q-bio.NC

Sampling on Discrete Spaces with Temporal Point Processes

在离散空间中使用时间点过程进行采样

Cameron A. Stewart, Maneesh Sahani

AI总结 本文提出了一种基于时间点过程的离散空间采样方法,通过构造多变量时间点过程,使其在固定长度滑动窗口内的事件计数向量收敛于目标分布,同时引入辅助随机性将采样器转化为退化出生-死亡过程,并在多个目标分布上验证了其优越性。

详情
Comments
20 pages, 1 figure. Minor revisions to wording, notation, and formatting. No substantive changes
AI中文摘要

时间点过程为从离散分布中采样提供了一个强大的框架,但在现有文献中仍未被充分利用。我们展示了如何为任何具有向下闭合支持的多变量计数分布构造一个多变量时间点过程,其在固定长度滑动窗口内的事件计数向量随着时间趋于无穷大时收敛于目标分布。该采样器被结构化为一组可能相互耦合的无限服务器队列,具有确定性服务时间,表现出一种离散形式的动量,抑制了随机游走行为。允许的进程家族既包括可逆动态也包括不可逆动态。作为应用,我们推导出一个递归的随机神经网络,其动态实现基于采样的计算,并表现出一些生物合理特征,包括相对抑制期和振荡。引入辅助随机性将采样器转化为出生-死亡过程,从而将后者确立为退化情况,具有相同的极限分布。在63个目标分布的模拟中,我们的采样器始终优于这些出生-死亡过程,并在多变量有效样本量方面频繁优于Zanella过程,进一步在归一化CPU时间下获得进一步增益。

英文摘要

Temporal point processes offer a powerful framework for sampling from discrete distributions, yet they remain underutilized in existing literature. We show how to construct, for any target multivariate count distribution with downward-closed support, a multivariate temporal point process whose event-count vector in a fixed-length sliding window converges in distribution to the target as time tends to infinity. Structured as a system of potentially coupled infinite-server queues with deterministic service times, the sampler exhibits a discrete form of momentum that suppresses random-walk behaviour. The admissible families of processes permit both reversible and non-reversible dynamics. As an application, we derive a recurrent stochastic neural network whose dynamics implement sampling-based computation and exhibit some biologically plausible features, including relative refractory periods and oscillations. The introduction of auxiliary randomness reduces the sampler to a birth-death process, establishing the latter as a degenerate case with the same limiting distribution. In simulations on 63 target distributions, our sampler always outperforms these birth-death processes and frequently outperforms Zanella processes in multivariate effective sample size, with further gains when normalized by CPU time.
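The simplest uncoupled instance of the queueing picture is a single infinite-server queue with Poisson arrivals at rate λ and deterministic service time T: its occupancy equals the number of arrivals in the last T time units, which is Poisson(λT). The sketch below simulates that special case only and is not the paper's coupled sampler; the rate, window, horizon, and seed are illustrative assumptions.

```python
import bisect
import random

def sliding_window_counts(rate, window, horizon, seed=0):
    """Event counts of a homogeneous Poisson process in consecutive windows (s - window, s]."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    counts = []
    s = window
    while s <= horizon:
        upper = bisect.bisect_right(times, s)
        lower = bisect.bisect_right(times, s - window)
        counts.append(upper - lower)
        s += window
    return counts

counts = sliding_window_counts(rate=3.0, window=1.0, horizon=2000.0)
print(sum(counts) / len(counts))  # should be close to rate * window
```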

2602.21426 2026-05-19 cs.LG stat.CO

Proximal-IMH: Proximal Posterior Proposals for Independent Metropolis-Hastings with Approximate Operators

Proximal-IMH: 用于独立Metropolis-Hastings的近端后验提议

Youguang Chen, George Biros

AI总结 本文提出了一种改进的独立Metropolis-Hastings算法,通过引入辅助优化问题来消除近似后验分布中的偏差,从而在保持精确模型的同时提高稳定性和采样效率。

详情
AI中文摘要

我们考虑了在科学、工程和成像中的贝叶斯反问题中从后验分布采样的问题。我们的方法属于独立Metropolis-Hastings(IMH)采样算法家族,常用于贝叶斯推断。依赖于存在一个更便宜但可能有显著偏差的近似后验分布,我们引入了Proximal-IMH,通过辅助优化问题纠正近似后验的样本,从而在精确模型和近似参考点周围获得局部调整。对于理想化设置,我们证明了近端校正能够收紧近似和精确后验之间的匹配,从而提高接受率和混合性。该方法适用于线性和非线性输入-输出算子,并特别适用于精确后验采样成本过高的反问题。我们展示了包含多模态和数据驱动先验的数值实验,结果表明Proximal-IMH在现有IMH变体中表现更优。

英文摘要

We consider the problem of sampling from a posterior distribution arising in Bayesian inverse problems in science, engineering, and imaging. Our method belongs to the family of independence Metropolis-Hastings (IMH) sampling algorithms, which are common in Bayesian inference. Relying on the existence of an approximate posterior distribution that is cheaper to sample from but may have significant bias, we introduce Proximal-IMH, a scheme that removes this bias by correcting samples from the approximate posterior through an auxiliary optimization problem. This yields a local adjustment that trades off adherence to the exact model against stability around the approximate reference point. For idealized settings, we prove that the proximal correction tightens the match between approximate and exact posteriors, thereby improving acceptance rates and mixing. The method applies to both linear and nonlinear input-output operators and is particularly suitable for inverse problems where exact posterior sampling is too expensive. We present numerical experiments including multimodal and data-driven priors with nonlinear input-output operators. The results show that Proximal-IMH reliably outperforms existing IMH variants.
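The base algorithm that Proximal-IMH builds on can be sketched as follows. This is plain independence Metropolis-Hastings with a fixed proposal and no proximal correction step; the Gaussian target and the wider Gaussian proposal are stand-ins chosen for illustration, and all function names are hypothetical.

```python
import math
import random

def independence_mh(log_target, sample_proposal, log_proposal, n_steps, x0, seed=0):
    """Independence MH: proposals ignore the current state; accept with ratio w(y) / w(x)."""
    rng = random.Random(seed)
    x = x0
    log_w = log_target(x) - log_proposal(x)      # log importance weight of current state
    chain, accepted = [], 0
    for _ in range(n_steps):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        if rng.random() < math.exp(min(0.0, log_w_y - log_w)):
            x, log_w = y, log_w_y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps

# Unnormalized log densities: target N(0, 1), proposal N(0, 2^2).
chain, rate = independence_mh(
    log_target=lambda x: -0.5 * x * x,
    sample_proposal=lambda rng: rng.gauss(0.0, 2.0),
    log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
    n_steps=20000,
    x0=0.0,
)
print(sum(chain) / len(chain), rate)
```

The acceptance rate depends on how well the proposal matches the target, which is why tightening the match between approximate and exact posteriors, as the abstract describes, improves mixing.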

2602.05742 2026-05-19 stat.ML cs.LG math.ST stat.TH

Fast Rates for Nonstationary Weighted Risk Minimization

非平稳加权风险最小化中的快速收敛速率

Tobias Brock, Thomas Nagler

AI总结 本文研究了非平稳条件下加权经验风险最小化方法的样本外预测误差,提出了一种将超额风险分解为学习项和分布漂移相关项的通用分解方法,并在混合条件下证明了学习误差的Oracle不等式,考虑了权重向量的有效样本量、权重和假设类的复杂性以及数据依赖性。

详情
AI中文摘要

加权经验风险最小化是一种在分布漂移下进行预测的常见方法。本文研究其非平稳条件下的样本外预测误差。我们提供了一个将超额风险分解为学习项和与分布漂移相关项的通用分解,并在混合条件下证明了学习误差的Oracle不等式。学习界在任意权重类上均匀成立,并考虑了权重向量诱导的有效样本量、权重和假设类的复杂性以及潜在的数据依赖性。我们在回归问题中展示了结果的适用性和精确性,使用线性模型、基函数逼近和神经网络,当专门应用于无权和平稳设置时,恢复了最小最大最优速率(除对数因子外)

英文摘要

Weighted empirical risk minimization is a common approach to prediction under distribution drift. This article studies its out-of-sample prediction error under nonstationarity. We provide a general decomposition of the excess risk into a learning term and an error term associated with distribution drift, and prove oracle inequalities for the learning error under mixing conditions. The learning bound holds uniformly over arbitrary weight classes and accounts for the effective sample size induced by the weight vector, the complexity of the weight and hypothesis classes, and potential data dependence. We illustrate the applicability and sharpness of our results in (auto-)regression problems with linear models, basis approximations, and neural networks, recovering minimax-optimal rates (up to logarithmic factors) when specialized to unweighted and stationary settings.
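A common way to quantify the effective sample size induced by a weight vector is Kish's formula, (Σw)² / Σw². The sketch below uses that definition together with exponentially decaying weights, a typical choice under drift. Both are assumptions made for illustration; the paper's own definition may differ.

```python
def effective_sample_size(weights):
    """Kish effective sample size (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def exponential_weights(n, decay):
    """Weights decay ** age, with the most recent observation (age 0) last."""
    return [decay ** (n - 1 - i) for i in range(n)]

def weighted_mean(values, weights):
    """Minimizer of the weighted squared-error empirical risk over constants."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

weights = exponential_weights(100, decay=0.95)
print(effective_sample_size(weights))
```

Uniform weights recover the full sample size, while putting all the weight on one observation gives an effective sample size of one; down-weighting old data trades variance (smaller effective sample) for less drift error.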

2602.04872 2026-05-19 stat.ML cs.AI cs.LG

Multi-layer Cross-attention is Provably Optimal for Multi-modal In-context Learning

多层交叉注意力是多模态上下文学习中可证明最优的

Nicholas Barnfield, Subhabrata Sen, Pragya Sur

AI总结 本文研究了多模态上下文学习中多层交叉注意力机制的理论最优性,证明了在多模态数据下,交叉注意力机制在梯度流优化下可达到贝叶斯最优,同时指出单层线性自注意力无法在任务分布下统一恢复贝叶斯最优预测。

详情
AI中文摘要

近期进展迅速推动了我们对现代基于注意力的神经网络中上下文学习机制的理解。然而,现有结果仅专注于单模态数据;相比之下,多模态数据的上下文学习的理论基础仍不清晰。我们引入了一个数学上可处理的框架来研究多模态学习,并探讨了在何种情况下Transformer-like架构可以在上下文中恢复贝叶斯最优性能。为了建模多模态问题,我们假设观测数据来自一个潜在因子模型。我们的第一个结果是对表达性的否定:我们证明单层线性自注意力无法在任务分布下统一恢复贝叶斯最优预测。为了解决这一限制,我们引入了一种新的线性化交叉注意力机制,并在交叉注意力层和上下文长度都较大的情况下进行研究。我们证明,当使用梯度流优化时,这种交叉注意力机制可证明是贝叶斯最优的。我们的结果强调了深度对上下文学习的好处,并确立了交叉注意力在多模态分布中的可证明效用。

英文摘要

Recent progress has rapidly advanced our understanding of the mechanisms underlying in-context learning in modern attention-based neural networks. However, existing results focus exclusively on unimodal data; in contrast, the theoretical underpinnings of in-context learning for multi-modal data remain poorly understood. We introduce a mathematically tractable framework for studying multi-modal learning and explore when transformer-like architectures can recover Bayes-optimal performance in-context. To model multi-modal problems, we assume the observed data arises from a latent factor model. Our first result comprises a negative take on expressibility: we prove that single-layer, linear self-attention fails to recover the Bayes-optimal predictor uniformly over the task distribution. To address this limitation, we introduce a novel, linearized cross-attention mechanism, which we study in the regime where both the number of cross-attention layers and the context length are large. We show that this cross-attention mechanism is provably Bayes optimal when optimized using gradient flow. Our results underscore the benefits of depth for in-context learning and establish the provable utility of cross-attention for multi-modal distributions.

2602.02830 2026-05-19 cs.LG stat.ME

SC3D: Dynamic and Differentiable Causal Discovery for Temporal and Instantaneous Graphs

SC3D:动态和可微的因果发现用于时序和瞬时图

Sourajit Das, Dibyajyoti Chakraborty, Romit Maulik

AI总结 本文提出SC3D,一种动态和可微的因果发现方法,用于处理时序和瞬时图,通过两阶段可微框架联合学习滞后特定的邻接矩阵和瞬时有向无环图,提升了因果结构的稳定性和准确性。

详情
Comments
12 pages
AI中文摘要

从多变量时间序列中发现因果结构是一个关键问题,因为相互作用跨越多个滞后并可能涉及瞬时依赖。此外,动态图的搜索空间本质上是组合性的。在本研究中,我们提出稳定因果动态可微发现(SC3D),一种两阶段可微框架,联合学习滞后特定的邻接矩阵以及如果存在的话瞬时有向无环图(DAG)。在第一阶段,SC3D通过节点级预测进行边预选以获得滞后和瞬时边的掩码,而第二阶段通过优化具有稀疏性的似然函数并强制瞬时块的无环性来细化这些掩码。在合成SVAR系统、非线性和混沌基准、非平稳动态和现实世界数据集上的数值结果表明,SC3D在稳定性和准确性方面优于现有基线,能够更准确地恢复滞后和瞬时因果结构。

英文摘要

Discovering causal structures from multivariate time series is a key problem because interactions span multiple lags and may involve instantaneous dependencies. Additionally, the search space of dynamic graphs is combinatorial in nature. In this study, we propose Stable Causal Dynamic Differentiable Discovery (SC3D), a two-stage differentiable framework that jointly learns lag-specific adjacency matrices and, if present, an instantaneous directed acyclic graph (DAG). In Stage 1, SC3D performs edge preselection through node-wise prediction to obtain masks for lagged and instantaneous edges, whereas Stage 2 refines these masks by optimizing a sparsity-promoting likelihood objective while enforcing acyclicity on the instantaneous block. Numerical results across synthetic SVAR systems, nonlinear and chaotic benchmarks, nonstationary dynamics, and real-world datasets demonstrate that SC3D achieves improved stability and more accurate recovery of both lagged and instantaneous causal structures compared to existing baselines.
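Differentiable causal discovery methods typically enforce acyclicity through a smooth penalty that vanishes exactly on DAGs. The sketch below uses the polynomial form tr((I + W∘W/d)^d) − d known from the DAG-GNN line of work. Whether SC3D uses this penalty or another variant is not stated in the abstract, so treat the choice as an assumption; the example matrices are made up.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def acyclicity_penalty(weights):
    """tr((I + W∘W / d)^d) - d: zero for a DAG, positive if the graph has a cycle."""
    d = len(weights)
    base = [
        [(1.0 if i == j else 0.0) + weights[i][j] ** 2 / d for j in range(d)]
        for i in range(d)
    ]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(d):
        power = matmul(power, base)
    return sum(power[i][i] for i in range(d)) - d

dag = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]]      # edges 0->1->2
cyclic = [[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [1.0, 0.0, 0.0]]   # adds edge 2->0
print(acyclicity_penalty(dag), acyclicity_penalty(cyclic))
```

Because the penalty is a polynomial in the entries of W, it can be added to a likelihood objective and minimized with gradient-based optimizers.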

2602.01733 2026-05-19 stat.ML cs.LG

ST-BCP: Tightening Coverage Bound for Backward Conformal Prediction via Non-Conformity Score Transformation

ST-BCP:通过非一致性分数转换紧缩后向符合预测的覆盖界

Junxian Liu, Hao Zeng, Hongxin Wei

AI总结 本文提出ST-BCP方法,通过引入数据依赖的非一致性分数转换来缩小后向符合预测中的覆盖界差距,实验表明该方法有效减少了覆盖差距。

详情
AI中文摘要

符合预测(CP)提供了一个用于不确定性量化的统计框架,能够构造具有覆盖保证的预测集。尽管CP会产生不受控的预测集大小,后向符合预测(BCP)通过强制设定预测集大小的上界并估计由此产生的覆盖保证来反转这一范式。然而,BCP框架内马尔可夫不等式引起的松散性导致估计的覆盖界与经验覆盖之间存在显著差距。在本文中,我们提出ST-BCP,一种新颖的方法,引入数据依赖的非一致性分数转换以缩小覆盖差距。具体而言,我们开发了一种可计算的转换并证明其优于基线的恒等转换。广泛的实验展示了我们方法的有效性,在常见基准上将平均覆盖差距从4.20%降至1.12%。

英文摘要

Conformal Prediction (CP) provides a statistical framework for uncertainty quantification that constructs prediction sets with coverage guarantees. While CP yields uncontrolled prediction set sizes, Backward Conformal Prediction (BCP) inverts this paradigm by enforcing a predefined upper bound on set size and estimating the resulting coverage guarantee. However, the looseness induced by Markov's inequality within the BCP framework causes a significant gap between the estimated coverage bound and the empirical coverage. In this work, we introduce ST-BCP, a novel method that applies a data-dependent transformation of nonconformity scores to narrow the coverage gap. In particular, we develop a computable transformation and prove that it outperforms the baseline identity transformation. Extensive experiments demonstrate the effectiveness of our method, reducing the average coverage gap from 4.20% to 1.12% on common benchmarks.
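For readers new to the forward paradigm that BCP inverts, standard split conformal prediction fixes the miscoverage level α and lets the set size vary. The sketch below shows that forward procedure only; it is not BCP or ST-BCP, and the calibration scores and labels are made up.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the set must contain every label
    return sorted(calibration_scores)[k - 1]

def prediction_set(label_scores, threshold):
    """Labels whose nonconformity score does not exceed the threshold."""
    return [label for label, score in label_scores.items() if score <= threshold]

calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
threshold = conformal_threshold(calibration, alpha=0.2)
print(threshold, prediction_set({"cat": 0.3, "dog": 0.95, "fox": 0.9}, threshold))
```

In the backward setting the roles swap: the set size is capped in advance and the achievable coverage level becomes the quantity to estimate.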

2601.20888 2026-05-19 stat.ML cs.LG math.ST stat.CO stat.TH

Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate Operators

Latent-IMH: 高效的贝叶斯推断用于具有近似算子的反问题

Youguang Chen, George Biros

AI总结 本文研究了在贝叶斯线性反问题中如何高效地从后验分布采样,其中参数到观测算子A计算成本高。通过将A分解为可构造低成本近似算子A~的方式,提出了一种基于Metropolis-Hastings独立采样器的Latent-IMH方法,通过近似算子生成中间潜在变量并利用精确算子进行优化,从而将计算成本转移到离线阶段,理论分析表明其在KL散度和混合时间上表现优异,实验显示其在计算效率上优于NUTS等现有方法。

详情
AI中文摘要

我们研究了在贝叶斯线性反问题中从后验分布采样,其中A,参数到观测算子,计算成本高。在许多应用中,A可以分解为一种方式,从而构造出一个成本效益高的近似A~。在该框架中,我们引入了Latent-IMH,一种基于Metropolis-Hastings独立(IMH)采样器的采样方法。Latent-IMH首先使用近似A~生成中间潜在变量,然后利用精确A进行优化。其主要优势是将计算成本转移到离线阶段。我们通过KL散度和混合时间界限理论分析了Latent-IMH的性能。使用多个模型问题的数值实验,我们表明,在合理假设下,它在计算效率上优于NUTS等现有方法。在某些情况下,Latent-IMH比现有方案快几个数量级。

英文摘要

We study sampling from posterior distributions in Bayesian linear inverse problems where $A$, the parameter-to-observable operator, is computationally expensive. In many applications, $A$ can be factored in a manner that facilitates the construction of a cost-effective approximation $\tilde{A}$. In this framework, we introduce Latent-IMH, a sampling method based on the Metropolis-Hastings independence (IMH) sampler. Latent-IMH first generates intermediate latent variables using the approximate $\tilde{A}$, and then refines them using the exact $A$. Its primary benefit is that it shifts the computational cost to an offline phase. We theoretically analyze the performance of Latent-IMH using KL divergence and mixing time bounds. Using numerical experiments on several model problems, we show that, under reasonable assumptions, it outperforms state-of-the-art methods such as the No-U-Turn sampler (NUTS) in computational efficiency. In some cases, Latent-IMH can be orders of magnitude faster than existing schemes.

2601.10100 2026-05-19 math.ST stat.ME stat.TH

Prediction Suboptimality of the Lasso in Sparse Linear Regression

Lasso在稀疏线性回归中的预测次优性

Guo Liu

AI总结 研究Lasso在高维线性回归中选择调参方式时的预测性能次优问题,发现通过简单改进可在高概率事件和均方误差上提升性能,分析了高斯极大值在选定或局部化支撑上的作用,并讨论了设计矩阵结构因素对次优现象的影响及扩展至其他估计器和更一般噪声结构的可能性。

详情
Comments
26 pages; revised version
AI中文摘要

在高维线性回归中,Lasso的调参选择对其统计性能至关重要。本文研究了Lasso在哪些调参条件下会表现出预测性能次优,即简单的改进在高概率事件和均方预测误差上均优于Lasso。我们的分析表明,相关的随机尺度由选定或局部化支撑上的高斯极大值决定,这可能比Lasso理论中的通用速率更具信息量。我们进一步展示了设计矩阵的结构性质如何影响次优现象,并讨论了扩展到其他估计器和更一般的噪声结构的可能性。

英文摘要

The choice of the tuning parameter in the Lasso is central to its statistical performance in high-dimensional linear regression. In this work, we study tuning regimes under which the Lasso exhibits suboptimal prediction performance, in the sense that a simple refinement improves upon it both on high-probability events and in mean squared prediction error. Our analysis shows that the relevant stochastic scale is governed by Gaussian maxima on the selected or localized support, which may be more informative than the universal rate in Lasso theory. We further illustrate how structural factors in the design matrix can influence the suboptimality phenomenon and discuss extensions to other estimators and more general noise structures.
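The role of the tuning parameter is easiest to see in the orthonormal-design case, where the Lasso for the objective (1/2n)‖y − Xβ‖² + λ‖β‖₁ reduces to coordinatewise soft-thresholding of z = Xᵀy/n at level λ. The sketch below shows that special case with the usual universal choice λ = σ√(2 log p / n). It is a textbook illustration under these assumptions, not the refinement studied in the paper, and the numbers are made up.

```python
import math

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, setting it to zero when |z| <= lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def universal_lambda(sigma, n, p):
    """Universal tuning level sigma * sqrt(2 log p / n)."""
    return sigma * math.sqrt(2.0 * math.log(p) / n)

lam = universal_lambda(sigma=1.0, n=200, p=1000)
z = [1.2, -0.05, 0.3, -0.9]          # hypothetical entries of X^T y / n
print(lam, [soft_threshold(v, lam) for v in z])
```

Every surviving coefficient is shrunk by the full amount λ, which is one reason a threshold calibrated to all p coordinates can be too conservative for the coordinates actually selected.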

2512.24497 2026-05-19 cs.AI cs.LG cs.RO stat.ML

What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?

在联合嵌入预测世界模型中成功因素是什么?

Basile Terver, Tsung-Yen Yang, Jean Ponce, Adrien Bardes, Yann LeCun

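AI summary: This paper studies what drives success in physical planning with joint-embedding predictive world models (JEPA-WMs). It analyzes how the model architecture, the training objective, and the planning algorithm affect planning success, and combines the findings into a model that outperforms existing baselines on navigation and manipulation tasks.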

详情
Comments
V2: added AdaLN-zero; added a table comparing JEPA-WMs with baselines, with std reflecting per-seed variability only (no variability across epochs); reordered figures in the main body of the paper. V3: added data scaling experiments and a theoretical appendix section on autoregressive rollout; accepted at TMLR.
AI中文摘要

人工智能领域长期存在的挑战是开发能够解决广泛物理任务并泛化到新、未见过的任务和环境的智能体。一种流行的近期方法是通过状态-动作轨迹训练世界模型,然后使用规划算法解决新任务。规划通常在输入空间中进行,但最近出现的一类方法引入了在学习的表示空间中优化的规划算法,其承诺通过抽象无关细节来提高规划效率。在本工作中,我们将此类模型称为JEPA-WMs,并研究使此类算法有效技术选择。我们提出了一项全面研究几个关键组件,旨在找到该类中的最佳方法。我们使用模拟环境和真实世界机器人数据进行了实验,并研究了模型架构、训练目标和规划算法对规划成功的影响。我们结合发现,提出了一种在导航和操作任务中优于两个现有基线方法(DINO-WM和V-JEPA-2-AC)的模型。代码、数据和检查点可在https://github.com/facebookresearch/jepa-wms上获得。

英文摘要

A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach involves training a world model from state-action trajectories and subsequently using it with a planning algorithm to solve new tasks. Planning is commonly performed in the input space, but a recent family of methods has introduced planning algorithms that optimize in the learned representation space of the world model, with the promise that abstracting irrelevant details yields more efficient planning. In this work, we characterize models from this family as JEPA-WMs and investigate the technical choices that make algorithms from this class work. We propose a comprehensive study of several key components with the objective of finding the optimal approach within the family. We conduct experiments using both simulated environments and real-world robotic data, and study how the model architecture, the training objective, and the planning algorithm affect planning success. We combine our findings to propose a model that outperforms two established baselines, DINO-WM and V-JEPA-2-AC, in both navigation and manipulation tasks. Code, data and checkpoints are available at https://github.com/facebookresearch/jepa-wms.

2512.23178 2026-05-19 math.OC cs.LG stat.ML

Clipped Gradient Methods for Nonsmooth Convex Optimization under Heavy-Tailed Noise: A Refined Analysis

针对重尾噪声下的非光滑凸优化的截断梯度方法:一种细化分析

Zijian Liu

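AI summary: This paper gives a refined analysis of Clipped SGD for nonsmooth convex optimization under heavy-tailed noise, obtaining faster high-probability and in-expectation convergence rates together with new lower bounds.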

详情
Comments
A preliminary conference version is accepted at ICLR 2026. This full version includes the formal statements of lower bounds and their proofs. v3: fixed some typos
AI中文摘要

在重尾噪声下的优化问题近年来变得流行,因为它更好地拟合了许多现代机器学习任务,如经验观察所捕获的。具体来说,而不是对梯度噪声有有限的二阶矩,已被认识到一个有界的p阶矩,其中p∈(1,2]更现实(例如上界由σ_l^p对于某些σ_l≥0)。一个简单而有效的操作,梯度截断,已知能成功处理这个新的挑战。具体来说,截断随机梯度下降(Clipped SGD)保证了非光滑凸(resp.强凸)问题的高概率速率O(σ_l ln(1/δ)T^{1/p-1})(resp. O(σ_l^2 ln^2(1/δ)T^{2/p-2})),其中δ∈(0,1]是失败概率,T∈N是时间范围。在本文中,我们为Clipped SGD提供了一种细化分析,并提供了两个速率,O(σ_l d_{eff}^{-1/(2p)} ln^{1-1/p}(1/δ) T^{1/p-1})和O(σ_l^2 d_{eff}^{-1/p} ln^{2-2/p}(1/δ) T^{2/p-2}),比上述最佳结果更快,其中d_{eff}≥1是我们称为“广义有效维度”的量。我们的分析在两个方面优于现有方法:更有效地利用Freedman不等式和更精细的截断误差界在重尾噪声下。此外,我们将细化分析扩展到期望收敛,并获得新的速率,突破了已知的下界。最后,为了补充研究,我们为高概率和期望收敛建立了新的下界。值得注意的是,期望下界与我们的新上界相匹配,表明我们的细化分析在期望收敛方面是最佳的。

英文摘要

Optimization under heavy-tailed noise has become popular recently, since it better fits many modern machine learning tasks, as captured by empirical observations. Concretely, instead of a finite second moment on gradient noise, a bounded $\mathfrak{p}$-th moment where $\mathfrak{p}\in(1,2]$ has been recognized to be more realistic (say being upper bounded by $\sigma_{\mathfrak{l}}^{\mathfrak{p}}$ for some $\sigma_{\mathfrak{l}}\ge0$). A simple yet effective operation, gradient clipping, is known to handle this new challenge successfully. Specifically, Clipped Stochastic Gradient Descent (Clipped SGD) guarantees a high-probability rate $\mathcal{O}(\sigma_{\mathfrak{l}}\ln(1/\delta)T^{1/\mathfrak{p}-1})$ (resp. $\mathcal{O}(\sigma_{\mathfrak{l}}^2\ln^2(1/\delta)T^{2/\mathfrak{p}-2})$) for nonsmooth convex (resp. strongly convex) problems, where $\delta\in(0,1]$ is the failure probability and $T\in\mathbb{N}$ is the time horizon. In this work, we provide a refined analysis for Clipped SGD and offer two rates, $\mathcal{O}(\sigma_{\mathfrak{l}}d_{\rm eff}^{-1/(2\mathfrak{p})}\ln^{1-1/\mathfrak{p}}(1/\delta)T^{1/\mathfrak{p}-1})$ and $\mathcal{O}(\sigma_{\mathfrak{l}}^2d_{\rm eff}^{-1/\mathfrak{p}}\ln^{2-2/\mathfrak{p}}(1/\delta)T^{2/\mathfrak{p}-2})$, faster than the aforementioned best results, where $d_{\rm eff}\ge1$ is a quantity we call the generalized effective dimension. Our analysis improves upon the existing approach on two sides: better utilization of Freedman's inequality and finer bounds for clipping error under heavy-tailed noise. In addition, we extend the refined analysis to convergence in expectation and obtain new rates that break the known lower bounds. Lastly, to complement the study, we establish new lower bounds for both high-probability and in-expectation convergence. Notably, the in-expectation lower bounds match our new upper bounds, indicating the optimality of our refined analysis for convergence in expectation.
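
To make the gradient clipping operation concrete, the following is a minimal Python sketch of Clipped SGD on a one-dimensional nonsmooth convex toy problem, $f(x)=|x-3|$, with infinite-variance gradient noise. The objective, the symmetric Pareto noise model, the clipping threshold `tau`, and the step-size schedule are all illustrative assumptions, not the tuned choices analyzed in the paper.

```python
import math
import random


def clip(g, tau):
    """Rescale g so that its magnitude is at most tau (the clipping operator)."""
    norm = abs(g)
    return g if norm <= tau else g * (tau / norm)


def noisy_subgradient(x, rng, x_star=3.0, tail_index=1.5):
    """Subgradient of f(x) = |x - x_star| plus symmetric Pareto noise.

    With tail_index = 1.5 the noise has a finite p-th moment only for
    p < 1.5, so its variance is infinite (a heavy-tailed toy model).
    """
    sub = 0.0 if x == x_star else math.copysign(1.0, x - x_star)
    noise = rng.choice((-1.0, 1.0)) * rng.paretovariate(tail_index)
    return sub + noise


def clipped_sgd(n_steps, tau=5.0, step0=0.1, seed=0):
    """Clipped SGD with step size step0 / sqrt(t + 1); returns the averaged iterate."""
    rng = random.Random(seed)
    x, running_sum = 0.0, 0.0
    for t in range(n_steps):
        g = clip(noisy_subgradient(x, rng), tau)
        x -= step0 / math.sqrt(t + 1) * g
        running_sum += x
    return running_sum / n_steps


x_avg = clipped_sgd(20000)
```

Clipping bounds every update by `tau` times the step size, so a single extreme noise draw cannot throw the iterate far from the minimizer; the price is a bias in the clipped gradient, which is the trade-off that the choice of threshold has to balance.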

2512.22098 2026-05-19 stat.ME math.PR math.ST q-bio.PE stat.CO stat.TH

Exact inference via quasi-conjugacy in two-parameter Poisson-Dirichlet hidden Markov models

通过准共轭性在双参数泊松-狄利克雷隐马尔可夫模型中实现精确推断

Marco Dalla Pria, Matteo Ruggiero, Dario Spanò

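AI summary: This paper introduces a nonparametric hidden Markov model for inferring time-evolving, unobserved probability distributions from discrete-time data consisting of unlabelled partitions, with a two-parameter Poisson-Dirichlet diffusion as the latent process. Exploiting a duality between the diffusion and a pure-death process on partitions, together with coagulation operators, it derives closed-form recursive updates that give exact posterior and predictive distributions without MCMC or sequential Monte Carlo, and it reports higher accuracy, lower variance, and substantial computational gains over particle filtering.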

详情
Comments
Final accepted version. To appear in JASA
AI中文摘要

我们介绍了一种非参数模型,用于从由无标签划分组成的离散时间数据中推断时间演变的未观察概率分布。潜在过程是一个双参数泊松-狄利克雷扩散过程,观测通过可交换抽样产生。应用包括社会和遗传数据,其中仅观察到聚类汇总信息。为了解决不可行的似然,我们开发了一个可计算的推断框架,避免了标签枚举和直接模拟潜在状态。我们利用扩散过程与在划分上的纯死亡过程之间的对偶性,以及编码新数据影响的凝集算子,从而得到前向和后向推断的闭式递归更新。我们计算了任意时间点潜在状态的精确后验分布和未来或插值划分的预测分布。这使我们能够进行在线和离线推断和预测,并完全量化不确定性,绕过MCMC和序列蒙特卡罗方法。与粒子滤波相比,我们的方法在准确性、方差和计算效率方面都有显著优势。我们通过合成实验和社会网络应用展示了该方法,恢复了时间变化异质性的可解释模式。

英文摘要

We introduce a nonparametric model for inferring time-evolving, unobserved probability distributions from discrete-time data consisting of unlabelled partitions. The latent process is a two-parameter Poisson-Dirichlet diffusion, and observations arise via exchangeable sampling. Applications include social and genetic data where only aggregate clustering summaries are observed. To address the intractable likelihood, we develop a tractable inferential framework that avoids label enumeration and direct simulation of the latent state. We exploit a duality between the diffusion and a pure-death process on partitions, together with coagulation operators that encode the effect of new data. These yield closed-form, recursive updates for forward and backward inference. We compute exact posterior distributions of the latent state at arbitrary times and predictive distributions of future or interpolated partitions. This enables online and offline inference and forecasting with full uncertainty quantification, bypassing MCMC and sequential Monte Carlo. Compared to particle filtering, our method achieves higher accuracy, lower variance, and substantial computational gains. We illustrate the methodology with synthetic experiments and a social network application, recovering interpretable patterns in time-varying heterozygosity.

2512.11089 2026-05-19 stat.ML cs.LG

TPV: Parameter Perturbations Through the Lens of Test Prediction Variance

TPV:通过测试预测方差的透镜进行参数扰动分析

Devansh Arpit

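AI summary: This paper introduces test prediction variance (TPV), the first-order sensitivity of a trained model's outputs to parameter perturbations, as a unifying framework for analyzing post-training robustness. It places SGD noise, label noise, quantization, and pruning under a single lens, and derives a TPV-based pruning criterion and a model selection signal.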

详情
Comments
ICML 2026
AI中文摘要

我们引入测试预测方差(TPV)——训练模型输出对参数扰动的一阶敏感性——作为分析训练后鲁棒性的统一框架。TPV是一个完全标签无关的对象,其迹形式将训练好的模型几何结构与特定扰动机制分离,将SGD噪声、标签噪声、量化和剪枝置于同一个视角下。所得到的表达式恢复了SGD和量化噪声的宽谷假设,并给出了标签噪声的Jacobian谱特征,将标签噪声TPV与非线性网络中的良性过拟合联系起来。理论上,我们证明在过参数化极限下,训练集TPV收敛到其测试集对应值,无论泛化性能如何,提供了首个结果:预测方差在局部参数扰动下可以通过训练输入单独推断。经验上,这种稳定性在更广泛的范围内成立,包括非常低的宽度。此外,TPV与测试损失相关联,使其具有实际应用价值:JBR,一种基于TPV几何匹配的无标签剪枝准则,实现了最先进的基线;以及基于训练集的模型选择信号,适用于分布内和迁移学习场景。代码可在github.com/devansharpit/TPV获得。

英文摘要

We introduce test prediction variance (TPV), the first-order sensitivity of a trained model's outputs to parameter perturbations, as a unifying framework for analyzing post-training robustness. TPV is a fully label-free object whose trace form separates the geometry of the trained model from the specific perturbation mechanism, placing SGD noise, label noise, quantization, and pruning under a single lens. The resulting expressions recover the wide-minima hypothesis for SGD and quantization noise, and yield a distinct Jacobian-spectral characterization for label noise connecting label-noise TPV with benign overfitting in nonlinear networks. Theoretically, we prove that training-set TPV converges to its test-set counterpart in the overparameterized limit, irrespective of generalization performance, providing the first result that prediction variance under local parameter perturbations can be inferred from training inputs alone. Empirically, this stability holds far more broadly, including at very low widths. Further, TPV correlates well with test loss, enabling practical applications: JBR, a label-free pruning criterion derived from TPV geometry that matches state-of-the-art baselines, and a training-set-based model selection signal for in-distribution and transfer learning scenarios. Code is available at github.com/devansharpit/TPV.

2510.10140 2026-05-19 cs.LG cs.CR stat.ML

Adversarial Attacks on Downstream Weather Forecasting Models: Application to Tropical Cyclone Trajectory Prediction

对下游天气预测模型的对抗攻击:应用于热带气旋轨迹预测

Yue Deng, Francisco Santos, Pang-Ning Tan, Lifeng Luo

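AI summary: This paper studies the vulnerability of deep learning weather forecasting models to adversarial attacks and proposes Cyc-Attack, a method that perturbs upstream forecasts so that downstream tropical cyclone trajectory predictions follow attacker-specified trajectories, matching the targets more accurately while making the perturbations harder to detect.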

详情
Comments
Compared with the previous version, we added zeroth-order optimization methods as baselines, clarified the motivation for using a surrogate model, and provided a more detailed investigation of the upstream attack
AI中文摘要

基于深度学习的天气预测(DLWF)模型利用过去的天气观测数据生成未来的预测,支持广泛的应用,包括热带气旋(TC)预测。在本文中,我们研究了这些模型对对抗攻击的脆弱性,其中对上游预测的细微扰动可以改变下游TC轨迹预测。尽管最近对DLWF模型的对抗攻击研究有所增长,但仍然具有挑战性,即创建扰动的上游预测,使下游输出朝向攻击者指定的轨迹。首先,传统的TC检测系统是不透明的、非可微的黑箱,这使得标准的梯度基攻击不可行。其次,TC事件的极端稀有性导致严重的类别不平衡问题,使得开发扰动上游预测的方法变得困难,这些扰动产生的轨迹看起来真实并与攻击者的目标轨迹一致。为了克服这些限制,我们提出了Cyc-Attack,一种新的方法,用于扰动DLWF模型的上游预测以生成对抗性轨迹。所提出的方法使用可微的替代模型来近似TC检测器的输出,使梯度基攻击的应用成为可能。Cyc-Attack还采用了一种考虑偏度的损失函数和核扩张策略来解决不平衡问题。最后,基于距离的梯度加权方案和正则化用于约束扰动并消除不真实的轨迹,从而使对抗性上游预测更难以检测。我们的实验表明,Cyc-Attack在匹配攻击者目标轨迹方面具有更高的真实阳性率,同时具有更低的误报率和更隐蔽的扰动,优于传统攻击方法。

英文摘要

Deep learning-based weather forecasting (DLWF) models leverage past weather observations to generate future forecasts, supporting a wide range of downstream applications, including tropical cyclone (TC) prediction. In this paper, we investigate their vulnerability to adversarial attacks, where subtle perturbations to the upstream forecasts can alter the downstream TC trajectory predictions. Although research into adversarial attacks on DLWF models has grown recently, it remains challenging to craft perturbed upstream forecasts that steer the downstream outputs toward attacker-specified trajectories. First, conventional TC detection systems are opaque, non-differentiable black boxes, making standard gradient-based attacks infeasible. Second, the extreme rarity of TC events leads to a severe class imbalance problem, making it difficult to develop attack methods for perturbing upstream forecasts that produce realistic-looking cyclone paths aligned with the attacker's target trajectories. To overcome these limitations, we propose Cyc-Attack, a novel method for perturbing the upstream forecasts of DLWF models to generate adversarial trajectories. The proposed method uses a differentiable surrogate model to approximate the TC detector's output, enabling the application of gradient-based attacks. Cyc-Attack also employs a skewness-aware loss function with a kernel dilation strategy to address the imbalance problem. Finally, a distance-based gradient weighting scheme and regularization are used to constrain the perturbations and eliminate unrealistic-looking trajectories, thereby making the adversarial upstream forecasts less easily detectable. Our experiments show that Cyc-Attack achieves a higher true positive rate in matching the attacker's target trajectories, along with lower false alarm rates and stealthier perturbations than conventional attack methods.

2510.06388 2026-05-19 cs.LG cs.DS stat.ML

Truthful Calibration Errors for Multi-Class Prediction

多类预测中的诚实校准误差

Yuxuan Lu, Yifan Wu, Jason Hartline, Lunjia Hu

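AI summary: This paper studies the practical role of truthfulness for calibration errors in multiclass prediction. It introduces perfectly truthful calibration errors for multidimensional linear properties of the label distribution, analyzes their decision-theoretic implications, and thereby explains and mitigates the ranking robustness problem of binned calibration errors.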

详情
AI中文摘要

校准预测之所以有用,是因为其数值可以被解释为概率。校准误差因此被广泛用于评估、比较和调整概率预测器。最近,Haghtalab等人(2024)引入了一个额外的要求:诚实性。如果预测器通过报告真实的条件标签分布来最小化其预期测量误差,则校准度量是诚实的。许多标准的经验校准误差是非诚实的:预测器可能通过扭曲其概率而不是报告真实值来显得更校准。我们研究了诚实性在多类预测中校准测量的实用作用。首先,我们引入了完美诚实校准误差以处理标签分布的多维线性属性,推广了Hartline等人(2025)中二元预测的诚实校准误差。此框架包括完整的多类校准和类内校准。我们还确定了置信度校准的诚实修正。其次,我们分析了这些诚实误差的决策理论影响。对于校准预测器,诚实校准误差保持了Blackwell主导性:更信息丰富的校准预测器不会产生更大的预期误差。第三,我们表明这种决策理论解释解释并缓解了已观察到的分箱校准误差的排名鲁棒性问题。经验上,非诚实的置信度校准误差在分箱数量变化时可能逆转模型排名,而我们的诚实误差在不同分箱选择下提供更稳定的排名。

英文摘要

Calibrated predictions are useful because their numerical values can be interpreted as probabilities. Calibration errors are therefore widely used to evaluate, compare, and tune probabilistic predictors. Recently, Haghtalab et al. (2024) introduced an additional requirement for such measures: truthfulness. A calibration measure is truthful if a predictor minimizes its expected measured error by reporting the true conditional label distribution. Many standard empirical calibration errors are non-truthful: a predictor may appear better calibrated by distorting its probabilities rather than reporting them truthfully. We study the practical role of truthfulness for calibration measurement in multiclass prediction. First, we introduce perfectly truthful calibration errors for multidimensional linear properties of the label distribution, generalizing the truthful calibration error for binary predictions in Hartline et al. (2025). This framework includes full multiclass calibration and classwise calibration. We also identify a truthful correction for confidence calibration. Second, we characterize the decision-theoretic implications of these truthful errors. For calibrated predictors, truthful calibration errors preserve Blackwell dominance: a more informative calibrated predictor receives no larger expected error. Third, we show that this decision-theoretic interpretation explains and mitigates the widely observed ranking robustness problem of binned calibration errors. Empirically, non-truthful confidence-based errors can reverse model rankings when the number of bins changes, while our truthful errors give more stable rankings across binning choices.
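
For readers unfamiliar with binned calibration errors, the following Python sketch computes the standard binned confidence calibration error with equal-width bins and shows how its value moves with the number of bins. This is the conventional estimator that the abstract describes as non-truthful, not the truthful error proposed in the paper; the function name and the six toy predictions are assumptions made for illustration.

```python
def binned_calibration_error(confidences, correct, n_bins):
    """Standard binned confidence calibration error, equal-width bins on [0, 1]."""
    sums_conf = [0.0] * n_bins
    sums_acc = [0.0] * n_bins
    counts = [0] * n_bins
    for c, y in zip(confidences, correct):
        b = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        sums_conf[b] += c
        sums_acc[b] += y
        counts[b] += 1
    n = len(confidences)
    error = 0.0
    for b in range(n_bins):
        if counts[b] > 0:
            # Gap between empirical accuracy and mean confidence in this bin.
            gap = abs(sums_acc[b] / counts[b] - sums_conf[b] / counts[b])
            error += counts[b] / n * gap
    return error


# Toy data: top-class confidence and whether the top class was correct.
confidences = [0.55, 0.65, 0.75, 0.85, 0.95, 0.95]
correct = [1, 0, 1, 1, 1, 0]
ece_1_bin = binned_calibration_error(confidences, correct, 1)
ece_10_bins = binned_calibration_error(confidences, correct, 10)
```

On this toy data the same predictions score about 0.117 with a single bin and 0.4 with ten bins, a small illustration of the sensitivity to binning choices that the abstract refers to.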

2509.03151 2026-05-19 math.NA cs.NA stat.ML

Convergence for adaptive resampling of random Fourier features

自适应重采样的随机傅里叶特征收敛性

Xin Huang, Aku Kammonen, Anamika Pandey, Mattias Sandberg, Erik von Schwerin, Anders Szepessy, Raúl Tempone

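AI summary: This paper studies the random Fourier feature method for machine learning with high-dimensional data. It proves convergence of an adaptive method that resamples the frequencies as the number of nodes and the amount of data tend to infinity, and confirms the analysis with numerical experiments on regression and classification problems.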

详情
Comments
50 pages, 19 figures
AI中文摘要

对于高维数据中的机器学习随机傅里叶特征方法,在计算和理论上具有吸引力,因为优化基于一个凸的标准最小二乘问题和独立采样的傅里叶频率。挑战在于如何良好地采样傅里叶频率。本文证明了基于渐近最优重采样频率的自适应方法的收敛性,当节点数和数据量趋于无穷时。基于重采样和自适应随机游走步骤以及通过共轭梯度迭代近似最小二乘问题的数值结果验证了回归和分类问题的分析。

英文摘要

The random Fourier feature method for machine learning with high-dimensional data is computationally and theoretically attractive, since the optimization is based on a standard convex least squares problem and independent sampling of Fourier frequencies. The challenge is to sample the Fourier frequencies well. This work proves convergence of a data-adaptive method based on resampling the frequencies asymptotically optimally, as the number of nodes and the amount of data tend to infinity. Numerical results, based on resampling and adaptive random walk steps together with approximations of the least squares problem by conjugate gradient iterations, confirm the analysis for regression and classification problems.

2506.01523 2026-05-19 cs.LG stat.ML

Beyond RLHF: A Unified Theoretical Framework of Alignment

超越RLHF:对齐的统一理论框架

Jihun Yun, Juno Kim, Jongho Park, Junhyuck Kim, Jongha Jon Ryu, Jaewoong Cho, Kwang-Sung Jun

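AI summary: This paper proposes a unified theoretical framework for alignment that treats alignment as distribution learning from pairwise preferences. It derives three principled alignment objectives, proves non-asymptotic $O(1/n)$ convergence for them, and provides theoretical justification for RLHF.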

详情
AI中文摘要

通过强化学习从人类反馈(RLHF)对大型语言模型(LLMs)输出质量进行控制已成为主流方法。然而,现有理论未能为RLHF目标本身提供有力的理论依据,并且由于不同方法通常在不同框架下分析,难以比较各种方法的保证。为建立统一的对齐框架,本文探讨在何种假设下可以推导出现有或新的训练目标并获得理论保证。为此,本文将对齐重新定义为基于成对偏好的分布学习,这建立了一个概率假设,描述了偏好如何揭示关于目标LM的信息。这导致我们提出三种原理性的对齐目标:偏好最大似然估计、偏好蒸馏和反KL最小化。我们证明了它们都自然地避免退化,并具有O(1/n)的收敛性。特别是,反KL高度类似于RLHF目标,为RLHF提供了有力的理论支持。此外,本文的理论首次解释了实证发现:在策略性目标(如RLHF)通常优于似然式目标(如DPO)。最后,实验结果表明,所提出的目标在多个任务和模型上与强基线竞争。

英文摘要

Alignment via reinforcement learning from human feedback (RLHF) has become the dominant paradigm for controlling the quality of outputs from large language models (LLMs). However, existing theories do not provide strong justification for the RLHF objective itself and do not allow comparisons of the guarantees between various methods because different methods are often analyzed under different frameworks. Toward a unified framework for alignment, we ask under what assumptions we can derive existing or new training objectives and obtain theoretical guarantees. To this end, we reframe alignment as distribution learning from pairwise preferences, which makes a probabilistic assumption describing how preferences reveal information about the target LM. This leads us to propose three principled alignment objectives: preference maximum likelihood estimation, preference distillation, and reverse KL minimization. We prove that they all enjoy strong non-asymptotic $O(1/n)$ convergence to the target LM, naturally avoiding degeneracy. In particular, reverse KL highly resembles the RLHF objective, providing strong justification for RLHF. Furthermore, our theory explains, for the first time, the empirical finding that on-policy objectives (e.g., RLHF) typically outperform likelihood-style objectives (e.g., DPO). Finally, empirical results indicate that the proposed objectives are competitive with strong baselines across several tasks and models.

2505.11143 2026-05-19 stat.ML cs.LG

Nash: Neural Adaptive Shrinkage for Structured High-Dimensional Regression

Nash: 用于结构高维回归的神经自适应收缩

William R. P. Denault

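AI summary: This paper proposes Nash, a framework that uses neural networks to integrate covariate-specific side information into sparse high-dimensional regression, improving adaptability and accuracy.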

详情
AI中文摘要

稀疏线性回归是数据分析中的基本工具。然而,传统方法在协变量具有结构或来自异质来源时往往表现不佳。在生物医学应用中,协变量可能来自不同的模态或根据潜在图结构进行组织。我们引入了神经自适应收缩(Nash),一种统一的框架,通过神经网络将协变量特定的侧信息整合到稀疏回归中。Nash在每个协变量的基础上自适应地调节惩罚项,学习调整正则化而无需交叉验证。我们使用一种分裂变分经验贝叶斯算法,将先验学习与后验推断解耦,将每轮扫描的M步骤从每个神经网络传递的O(p)次减少到一次批量传递,相对于之前提出的坐标上升CAVI方法,在p在10²到10⁴之间时,实测时间加速了74到106倍。在真实数据上的实验表明,Nash在准确性和适应性上优于现有方法。

英文摘要

Sparse linear regression is a fundamental tool in data analysis. However, traditional approaches often fall short when covariates exhibit structure or arise from heterogeneous sources. In biomedical applications, covariates may stem from distinct modalities or be structured according to an underlying graph. We introduce Neural Adaptive Shrinkage (Nash), a unified framework that integrates covariate-specific side information into sparse regression via neural networks. Nash adaptively modulates penalties on a per-covariate basis, learning to tailor regularization without cross-validation. We use a split variational empirical Bayes algorithm that decouples prior learning from posterior inference, reducing the M-step from $\mathcal{O}(p)$ neural-network passes per sweep to a single batched pass, a 74 to 106x wall-clock speedup over previously proposed coordinate ascent CAVI for $p$ between $10^2$ and $10^4$. Experiments on real data demonstrate that Nash improves accuracy and adaptability over existing methods.

2505.06852 2026-05-19 cs.LG stat.ML

Improving Random Forests by Smoothing

通过平滑改进随机森林

Ziyi Liu, Phuc Luong, Mario Boley, Daniel F. Schmidt

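AI summary: This paper proposes a kernel-based smoothing mechanism that adds local regularity to random forest predictions while preserving their adaptive partitioning, improving predictive performance especially in data-scarce settings.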

详情
Comments
v2: Accepted manuscript. 30 pages (18 main + 12 appendix), 6 figures
AI中文摘要

随机森林回归是一种强大的非参数方法,通过数据驱动的分区适应局部数据特征,在各种应用领域中表现出色。然而,随机森林预测的分段常数性质意味着每个分区都是独立预测的,忽略了潜在的函数平滑性。特别是在小数据情况下,输入空间内缺乏信息共享可能导致性能不佳。在本文中,我们提出了一种基于核的平滑机制,通过引入局部正则性来增强随机森林,同时保留其自适应分区能力。我们的方法将核平滑应用于随机森林的分段常数输出,有效地结合了基于树的方法的适应性和核方法的平滑性假设。我们证明这种平滑过程可以被解释为在重新采样训练输入的情况下捕捉树切分点的变异性/不确定性。实验证实,所提出的平滑随机森林模型在各种测试案例中一致提高了预测性能,特别是在数据稀缺的情况下。代码、数据集和实验结果可在 https://github.com/Neal-Liu-Ziyi/SmoothedRandomForest.git 公开获取。

英文摘要

Random forest regression is a powerful non-parametric method that adapts to local data characteristics through data-driven partitioning, making it effective across diverse application domains. However, the piecewise constant nature of random forest predictions means each partition is predicted independently, ignoring potential smoothness in the underlying function. Particularly in the small data regime, this lack of information sharing across the input space can lead to suboptimal performance. In this work, we propose a kernel-based smoothing mechanism that enhances random forests by introducing local regularity to their predictions while preserving their adaptive partitioning capabilities. Our approach applies kernel smoothing to the piecewise constant outputs of random forests, effectively combining the adaptability of tree-based methods with the smoothness assumptions of kernel methods. We show that this smoothing procedure can be interpreted as capturing the variability/uncertainty in the tree cut points under resampling of the training inputs. Empirical results demonstrate that the proposed smoothed random forest model consistently improves predictive performance across diverse test cases, particularly in data-scarce settings. Code, datasets, and experiment results are publicly available at https://github.com/Neal-Liu-Ziyi/SmoothedRandomForest.git.
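
The idea of kernel-smoothing a piecewise constant predictor can be sketched in a few lines of Python. The example below smooths a single one-dimensional step function with a Gaussian kernel in closed form; it is an illustration of the general idea only, not the authors' implementation (see their repository for that), and the names `step_predict`, `smoothed_predict`, and the `bandwidth` parameter are assumptions.

```python
import bisect
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def step_predict(x, cuts, values):
    """Piecewise constant prediction: values[i] on the i-th interval between cuts."""
    return values[bisect.bisect_right(cuts, x)]


def smoothed_predict(x, cuts, values, bandwidth):
    """Gaussian-kernel smoothing of the step function: E[f(x + bandwidth * Z)].

    Each leaf value is weighted by the Gaussian probability mass that
    falls inside its interval when centred at x.
    """
    edges = [-math.inf] + list(cuts) + [math.inf]
    total = 0.0
    for i, v in enumerate(values):
        lo = normal_cdf((edges[i] - x) / bandwidth)
        hi = normal_cdf((edges[i + 1] - x) / bandwidth)
        total += v * (hi - lo)
    return total


cuts = [0.0]            # one split point
values = [0.0, 1.0]     # leaf predictions left and right of the split
raw_left = step_predict(-0.1, cuts, values)
raw_right = step_predict(0.1, cuts, values)
smooth_at_cut = smoothed_predict(0.0, cuts, values, bandwidth=0.5)
```

The raw predictor jumps from 0 to 1 at the cut, whereas the smoothed predictor passes through 0.5 there and approaches the leaf values away from it, so nearby inputs share information across the partition boundary.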

2505.02621 2026-05-19 cs.LG math.OC stat.ML

Mirror Mean-Field Langevin Dynamics

镜像均场 Langevin 动力学

Anming Gu, Juno Kim

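AI summary: This paper proposes mirror mean-field Langevin dynamics (MMFLD) for optimizing probability measures constrained to a convex subset of $\mathbb{R}^d$. It obtains linear convergence guarantees for the continuous MMFLD via a uniform log-Sobolev inequality, and uniform-in-time propagation of chaos results for its time- and particle-discretized counterpart.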

详情
Comments
ICML 2026
AI中文摘要

均场 Langevin 动力学(MFLD)在 $\mathbb{R}^d$ 上的 Wasserstein 空间上最小化一个熵正则化的非线性凸函数,并最近因其作为无限宽度两层神经网络等相互作用粒子系统的梯度下降动力学模型而受到关注。然而,许多感兴趣的问题具有受限的域,而现有的均场算法由于全局扩散项无法解决此类问题。我们通过将 MFLD 扩展到镜像 Langevin 框架,提出镜像均场 Langevin 动力学(MMFLD),以研究受限在 $\mathbb{R}^d$ 的凸子集上的概率测度的优化。我们通过统一的对数 Sobolev 不等式获得了连续 MMFLD 的线性收敛性保证,并获得了其时间-粒子离散化版本的统一时间传播混沌结果。

英文摘要

The mean-field Langevin dynamics (MFLD) minimizes an entropy-regularized nonlinear convex functional on the Wasserstein space over $\mathbb{R}^d$, and has gained attention recently as a model for the gradient descent dynamics of interacting particle systems such as infinite-width two-layer neural networks. However, many problems of interest have constrained domains, which are not solved by existing mean-field algorithms due to the global diffusion term. We study the optimization of probability measures constrained to a convex subset of $\mathbb{R}^d$ by proposing the mirror mean-field Langevin dynamics (MMFLD), an extension of MFLD to the mirror Langevin framework. We obtain linear convergence guarantees for the continuous MMFLD via a uniform log-Sobolev inequality, and uniform-in-time propagation of chaos results for its time- and particle-discretized counterpart.

2504.07347 2026-05-19 stat.ML cs.LG math.PR

Throughput-Optimal Scheduling Algorithms for LLM Inference and AI Agents

面向LLM推理和AI代理的吞吐量最优调度算法

J. G. Dai, Tianze Deng, Yueying Li, Tianyi Peng

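AI summary: This paper studies throughput optimization of LLM inference systems from a queueing-theoretic perspective. It proves that work-conserving scheduling algorithms achieve maximum throughput, including for AI-agent workloads with DAG and fork-join routing topologies, develops a fluid-limit framework for batched processing networks under $K$-FCFS scheduling, evaluates the throughput optimality of Orca and Sarathi-Serve, and points out how batch size limits and cyclic routing topologies affect throughput.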

详情
AI中文摘要

随着大型语言模型(LLM)和AI代理的需求迅速增长,优化高效LLM推理系统变得至关重要。尽管已有大量针对系统级工程的努力,但从数学建模和排队视角进行探索的却很少。本文开发了LLM推理的排队基础。特别地,我们研究了LLM推理系统的吞吐量方面。我们证明了一类广泛的'工作保持'调度算法在单个请求和AI代理工作负载中都能实现最大吞吐量,建立了'工作保持'作为从业者的关键设计原则。技术上,我们开发了在K-FCFS调度下的多类批量处理网络的流极限框架,这可能具有独立价值。对实际系统的评估证实Orca和Sarathi-Serve是吞吐量最优的,使从业者放心,而FasterTransformer和原生vLLM则不是最大稳定,应谨慎使用。我们的分析还揭示了诸如批量大小限制和循环路由拓扑等约束如何复杂化吞吐量的图景,指向排队论与LLM系统设计交汇处丰富的开放问题。

英文摘要

As demand for Large Language Models (LLMs) and AI agents grows rapidly, optimizing systems for efficient LLM inference becomes critical. While significant efforts have targeted system-level engineering, little has been explored from a mathematical modeling and queueing perspective. In this paper, we develop the queueing fundamentals for LLM inference. In particular, we study the throughput aspect of LLM inference systems. We prove that a large class of 'work-conserving' scheduling algorithms achieve maximum throughput for both individual requests and AI-agent workloads with directed acyclic graph (DAG) and fork-join routing topologies, establishing 'work-conserving' as a key design principle for practitioners. Technically, we develop a fluid-limit framework for multi-class batched processing networks under $K$-FCFS scheduling, which may be of independent interest. Evaluations of real-world systems confirm that Orca and Sarathi-Serve are throughput-optimal, reassuring practitioners, while FasterTransformer and vanilla vLLM are not maximally stable and should be used with caution. Our analysis also reveals how constraints such as batch size limits and cyclic routing topologies complicate the throughput picture, pointing to rich open questions at the intersection of queueing theory and LLM system design.

2501.11181 2026-05-19 stat.ME

Sample size and power calculations for causal inference of observational studies

观察性研究因果推断的样本量和功效计算

Bo Liu, Chengxin Yang, Fan Li

AI总结 本文研究了观察性研究中因果推断的样本量和功效计算,通过分析逆概率加权估计量的方差,将功效计算分解为三个组成部分:倾向分数分布、潜在结果分布及其相关性。提出通过两个参数量化混杂因素-治疗和混杂因素-结果关联强度,并开发了R包和在线计算器。

详情
AI中文摘要

本文探讨了观察性研究中因果推断的样本量和功效计算的理论基础,并开发了分析公式。通过分析逆概率加权估计量的方差,我们将功效计算分解为三个组成部分:倾向分数分布、潜在结果分布及其相关性。我们证明,除了随机试验中标准输入外,确定观察性研究的最小样本量只需两个参数,分别量化混杂因素-治疗和混杂因素-结果关联强度。对于前者,我们提出使用Bhattacharyya系数,该系数测量协变量重叠,并与治疗比例结合,导致唯一可识别且易于计算的倾向分数分布。对于后者,我们提出一个受协变量回归R-squared统计量限制的敏感性参数。我们的方法依赖于参数化的倾向分数模型和半参数化的限制均值结果模型,但不需要对多变量协变量进行分布假设。我们开发了相关的R包PSpower和在线计算器。

英文摘要

This paper investigates the theoretical foundation and develops analytical formulas for sample size and power calculations for causal inference with observational data. By analyzing the variance of an inverse probability weighting estimator of the average treatment effect, we decompose the power calculation into three components: propensity score distribution, potential outcome distribution, and their correlation. We show that to determine the minimal sample size of an observational study, in addition to the standard inputs in the power calculation of randomized trials, it is sufficient to have two parameters, which quantify the strength of the confounder-treatment and the confounder-outcome association, respectively. For the former, we propose using the Bhattacharyya coefficient, which measures the covariate overlap and, together with the treatment proportion, leads to a uniquely identifiable and easily computable propensity score distribution. For the latter, we propose a sensitivity parameter bounded by the R-squared statistic of the regression of the outcome on covariates. Our procedure relies on a parametric propensity score model and a semiparametric restricted mean outcome model, but does not require distributional assumptions on the multivariate covariates. We develop an associated R package PSpower and an online calculator.
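
As a small illustration of the Bhattacharyya coefficient as an overlap measure, the Python sketch below evaluates it in closed form for two univariate Gaussian distributions and checks the result by numerical integration. The Gaussian setting and the function names are assumptions chosen for illustration; this is not how the PSpower package computes or uses the coefficient.

```python
import math


def bhattacharyya_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form Bhattacharyya coefficient of N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    scale = math.sqrt(2.0 * sigma1 * sigma2 / var_sum)
    return scale * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bhattacharyya_numeric(mu1, sigma1, mu2, sigma2, lo=-20.0, hi=20.0, n=40000):
    """Midpoint-rule approximation of the integral of sqrt(p(x) * q(x))."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += math.sqrt(normal_pdf(x, mu1, sigma1) * normal_pdf(x, mu2, sigma2))
    return total * h


bc_same = bhattacharyya_gaussian(0.0, 1.0, 0.0, 1.0)      # identical groups
bc_shifted = bhattacharyya_gaussian(0.0, 1.0, 2.0, 1.0)   # means two units apart
bc_check = bhattacharyya_numeric(0.0, 1.0, 2.0, 1.0)
```

The coefficient equals 1 for identical distributions and decreases toward 0 as they separate, so a smaller value indicates weaker covariate overlap between the two groups being compared.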

2501.08492 2026-05-19 stat.ME math.ST stat.TH

Bayesian Sphere-on-Sphere Regression with Optimal Transport Maps

基于最优传输映射的贝叶斯球面回归

Tin Lok James Ng, Kwok-Kun Kwong, Jiakun Liu, Andrew Zammit-Mangion

AI总结 本文提出了一种基于最优传输的贝叶斯球面回归方法,通过划分球面领域并局部建模回归映射,以捕捉复杂的球面关系,从而实现灵活且具有表现力的回归模型。

详情
AI中文摘要

球面回归,其中协变量和响应变量都位于球面上,在许多科学应用中出现,并且近年来引起了相当多的方法论关注。尽管有进展,构建灵活且具有表现力的球面领域之间的回归模型仍然具有挑战性,特别是因为单一的全局映射通常不足以捕捉整个球面上的复杂关系。因此,一种自然的策略是划分球面领域并在每个区域中允许不同的映射,尽管这引入了建模划分结构本身的额外挑战。为了解决这些问题,我们提出了一种基于最优传输的方法来建模球面划分,并结合在每个区域中定义的参数映射。我们采用贝叶斯框架,共同建模划分和相关的回归映射。该框架使能够识别球面上的异质区域,同时提供原则性的不确定性量化。通过实际数据应用,我们证明所提出的方法实现了强大的预测性能,产生了有意义的不确定性估计,并揭示了球面数据中的可解释聚类结构。

英文摘要

Spherical regression, in which both covariates and responses lie on the sphere, arises in many scientific applications and has attracted considerable methodological attention in recent years. Despite this progress, constructing flexible and expressive regression models between spherical domains remains challenging, particularly because a single global mapping is often insufficient to capture complex relationships across the entire sphere. A natural strategy is therefore to partition the spherical domain and allow distinct mappings within each region, though this introduces the additional challenge of modeling the partition structure itself. To address these issues, we propose an approach based on optimal transport to model spherical partitions, combined with parametric mappings defined locally within each region. We adopt a Bayesian framework to jointly model both the partitioning and the associated regression maps. This framework enables the identification of heterogeneous regions on the sphere while providing principled uncertainty quantification. Through real-data applications, we demonstrate that the proposed method achieves strong predictive performance, yields meaningful uncertainty estimates, and reveals interpretable clustering structure in spherical data.

2412.13731 2026-05-19 stat.CO stat.ME stat.ML

Reliability analysis for non-deterministic limit-states using stochastic emulators

使用随机模拟器进行非确定性极限状态的可靠性分析

Anderson V. Pires, Maliki Moustapha, Stefano Marelli, Bruno Sudret

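AI summary: This paper proposes a reliability analysis method for stochastic simulators that uses suitable surrogate models to reduce the computational cost, and validates generalized lambda models and stochastic polynomial chaos expansions on case studies that include a wind turbine.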

详情
Journal ref
Structural Safety, 117,102621,pp. 1-14, 2025
AI中文摘要

可靠性分析是不确定性量化的一个子领域,用于评估系统在各种不确定性下的预期性能概率。传统上,这种分析依赖于确定性模型,其中实验是可重复的,即给定输入集下产生一致的输出。然而,现实系统往往表现出随机行为,导致不可重复的结果。这些所谓的随机模拟器每次运行模型时都会产生不同的输出,即使输入固定。本文正式引入了对随机模型的可靠性分析,并通过使用合适的替代模型来解决这一问题,以降低通常较高的计算成本。具体而言,我们专注于最近引入的广义lambda模型和随机多项式展开。这些模拟器旨在学习模拟器响应的内在随机性,并在比传统蒙特卡洛模拟低得多的成本下实现高效的不确定性量化。我们通过三个案例研究验证了我们的方法。首先,使用具有闭式解的分析函数,我们证明模拟器收敛到正确解。其次,我们使用简支梁的玩具示例展示了替代模型的成果。最后,我们将模拟器应用于一个现实的风力涡轮机案例研究,其中只有模拟结果的数据集可用。

英文摘要

Reliability analysis is a sub-field of uncertainty quantification that assesses the probability of a system performing as intended under various uncertainties. Traditionally, this analysis relies on deterministic models, where experiments are repeatable, i.e., they produce consistent outputs for a given set of inputs. However, real-world systems often exhibit stochastic behavior, leading to non-repeatable outcomes. These so-called stochastic simulators produce different outputs each time the model is run, even with fixed inputs. This paper formally introduces reliability analysis for stochastic models and addresses it by using suitable surrogate models to lower its typically high computational cost. Specifically, we focus on the recently introduced generalized lambda models and stochastic polynomial chaos expansions. These emulators are designed to learn the inherent randomness of the simulator's response and enable efficient uncertainty quantification at a much lower cost than traditional Monte Carlo simulation. We validate our methodology through three case studies. First, using an analytical function with a closed-form solution, we demonstrate that the emulators converge to the correct solution. Second, we present results obtained from the surrogates using a toy example of a simply supported beam. Finally, we apply the emulators to perform reliability analysis on a realistic wind turbine case study, where only a dataset of simulation results is available.
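
The notion of a failure probability for a stochastic simulator can be illustrated with a crude Monte Carlo estimate, the expensive baseline that surrogate models aim to avoid. The Python sketch below uses a toy limit-state function whose output changes between runs even for a fixed input; the function, its Gaussian latent noise, and all names are hypothetical and chosen so that the exact answer is available for comparison.

```python
import math
import random


def stochastic_limit_state(x, rng):
    """Toy stochastic simulator: repeated calls with the same x give different outputs."""
    return 3.0 - x + rng.gauss(0.0, 1.0)


def failure_probability_mc(n_samples, seed=0):
    """Crude Monte Carlo estimate of P(g(X, noise) <= 0) with X ~ N(0, 1)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # random input
        if stochastic_limit_state(x, rng) <= 0.0:
            failures += 1
    return failures / n_samples


def failure_probability_exact():
    """Failure means x - noise >= 3, where x - noise ~ N(0, 2)."""
    z = 3.0 / math.sqrt(2.0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_mc = failure_probability_mc(200000)
p_exact = failure_probability_exact()
```

Here the randomness of the input and the simulator's own latent randomness both contribute to failure, and the estimate needs hundreds of thousands of simulator calls for a probability of order $10^{-2}$, which is the cost that motivates replacing the simulator with a cheap emulator.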

2411.03936 2026-05-19 cs.LG stat.ML

GUIDE-VAE: Advancing Data Generation with User Information and Pattern Dictionaries

GUIDE-VAE:利用用户信息和模式词典推进数据生成

Kutay Bölat, Simon Tindemans

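AI summary: This paper proposes GUIDE-VAE, a conditional generative model built on user embeddings and a pattern dictionary. By incorporating user information and complex feature dependencies, it improves generation performance and sample realism on multi-user datasets.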

详情
AI中文摘要

多用户数据集的生成建模在科学和工程中变得突出。生成特定用户的样本需要利用用户信息,而传统生成模型,包括变分自编码器(VAEs),通常忽略这一点。本文介绍了GUIDE-VAE,一种新的条件生成模型,利用用户嵌入生成用户引导的数据。通过利用用户之间的共享模式,GUIDE-VAE在多用户设置中提升了性能,即使在数据不平衡显著的情况下。除了整合用户信息外,GUIDE-VAE还采用基于模式词典的协方差组成(PDCC)来提高生成样本的真实性和捕捉复杂特征依赖性。虽然用户嵌入推动了性能提升,但PDCC解决了VAEs中常见的噪声和过平滑问题。所提出的GUIDE-VAE在具有显著用户数据不平衡的多用户智能电表数据集上进行了评估。定量结果表明,GUIDE-VAE在合成数据生成和缺失记录填补任务中表现良好,而定性评估表明其生成的数据更加合理且噪声更少。这些结果确立了GUIDE-VAE作为多用户数据集可控、真实数据生成的有前景工具,具有跨领域应用的潜力。

英文摘要

Generative modelling of multi-user datasets has become prominent in science and engineering. Generating a data point for a given user requires employing user information, and conventional generative models, including variational autoencoders (VAEs), often ignore this. This paper introduces GUIDE-VAE, a novel conditional generative model that leverages user embeddings to generate user-guided data. By leveraging shared patterns across users, GUIDE-VAE improves performance in multi-user settings, even under significant data imbalance. In addition to integrating user information, GUIDE-VAE incorporates a pattern dictionary-based covariance composition (PDCC) to improve the realism of generated samples by capturing complex feature dependencies. While user embeddings drive performance gains, PDCC addresses common issues such as noise and over-smoothing typically seen in VAEs. The proposed GUIDE-VAE was evaluated on a multi-user smart meter dataset characterised by substantial data imbalance across users. Quantitative results show that GUIDE-VAE performs effectively on both synthetic data generation and missing-record imputation tasks, while qualitative evaluations indicate that it produces more plausible and less noisy data. These results establish GUIDE-VAE as a promising tool for controlled, realistic data generation in multi-user datasets, with potential applications across domains that require user-informed modelling.

2410.01223 2026-05-19 stat.CO cs.LG

Statistical Taylor Expansion: A New and Path-Independent Method for Uncertainty Analysis

统计泰勒展开:一种新的、路径无关的不确定性分析方法

Chengpu Wang

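AI summary: This paper proposes a path-independent method for uncertainty analysis. It replaces precise input variables with random variables of known distributions and sample counts, computes the mean, deviation, and reliable factor of each result, and tracks the propagation of input uncertainties so that the final result is path independent, unlike conventional approaches.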

详情
Comments
47 pages, 40 figures
AI中文摘要

作为一种严谨的统计方法,统计泰勒展开扩展了传统泰勒展开,通过将精确输入变量替换为具有已知分布和样本数的随机变量来计算每个结果的均值、偏差和可靠因子。它通过中间步骤追踪输入不确定性的传播,使最终的解析结果成为路径无关的。因此,它与传统数学方法根本不同,后者为每项计算优化计算路径。统计泰勒展开可能为解析表达式的数值计算提供标准化方法。本研究还介绍了称为方差算术的统计泰勒展开的实现,并在广泛的数学应用中展示了相应测试结果。此外,本研究还得出一个重要结论,即库函数中的数值误差可能显著影响结果。理想情况下,每个库函数的值都应通过不确定性偏差来完成。此外,统计泰勒展开与量子物理之间的可能联系也进行了讨论。

英文摘要

As a rigorous statistical approach, statistical Taylor expansion extends the conventional Taylor expansion by replacing precise input variables with random variables of known distributions and sample counts to compute the mean, the deviation, and the reliable factor of each result. It tracks the propagation of the input uncertainties through intermediate steps, so that the final analytic result becomes path independent. Therefore, it differs fundamentally from common approaches in applied mathematics that optimize the computational path for each calculation. Statistical Taylor expansion may standardize numerical computations for analytic expressions. This study also introduces an implementation of statistical Taylor expansion, termed variance arithmetic, and presents corresponding test results across a wide range of mathematical applications. Another important conclusion of this study is that numerical errors in library functions can significantly affect results. It is desirable that each value from library functions be accompanied by an uncertainty deviation. The possible link between statistical Taylor expansion and quantum physics is discussed as well.

2406.19152 2026-05-19 stat.ME stat.AP

Mixture priors for replication studies

混合先验用于复制研究

Roberto Macrì-Demartino, Leonardo Egidi, Leonhard Held, Samuel Pawel

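AI summary: This paper proposes a Bayesian approach to replication studies based on mixture priors. The prior mixes the posterior distribution from the original study with a non-informative distribution, the mixture weight determines how far the original and replication data are pooled, and Bayes factors are used to test scientific hypotheses.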

详情
AI中文摘要

科学研究的复制对于评估其结果的可信度至关重要。然而,尚无共识如何量化复制研究复制原始结果的程度。我们提出了一种基于混合先验的新型贝叶斯方法用于复制研究。该方法的思路是使用原始研究的后验分布与非信息性分布的混合作为复制研究分析的先验。混合权重则决定了原始数据与复制数据的混合程度。本文提出了两种不同的策略:一种是固定混合权重,另一种是通过为混合权重分配先验分布引入不确定性。此外,还展示了如何在该框架内利用贝叶斯因子进行相关科学假设的正式检验,如检验效应的存在与否或混合权重是否为零(完全忽略原始数据)或一(完全混合原始数据)。为了展示该方法的实用性,我们分析了三个复制研究的数据。我们的发现表明,混合先验是分析复制研究的一种有价值且直观的替代方法,如层次模型和功率先验。我们提供了免费且开源的R包repmix,实现了所提出的方法。

英文摘要

Replication of scientific studies is important for assessing the credibility of their results. However, there is no consensus on how to quantify the extent to which a replication study replicates an original result. We propose a novel Bayesian approach for replication studies based on mixture priors. The idea is to use a mixture of the posterior distribution based on the original study and a non-informative distribution as the prior for the analysis of the replication study. The mixture weight then determines the extent to which the original and replication data are pooled. Two distinct strategies are presented: one with fixed mixture weights, and one that introduces uncertainty by assigning a prior distribution to the mixture weight itself. Furthermore, it is shown how within this framework Bayes factors can be used for formal testing of relevant scientific hypotheses, such as tests on the presence or absence of an effect or whether the mixture weight equals zero (completely discounting the original data) or one (fully pooling with the original data). To showcase the practical application of the methodology, we analyze data from three replication studies. Our findings suggest that mixture priors are a valuable and intuitive alternative to other Bayesian methods for analyzing replication studies, such as hierarchical models and power priors. We provide the free and open source R package repmix that implements the proposed methodology.
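
The mixture-prior idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repmix implementation: the replication estimate is taken to be normally distributed around the true effect, the original study's posterior is taken to be normal, and a wide normal stands in for the non-informative component. The function names and all numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_posterior(theta_r, se_r, theta_o, se_o, w, vague_mean=0.0, vague_sd=10.0):
    """Posterior for an effect theta given a replication estimate
    theta_r ~ N(theta, se_r^2) under the prior
    w * N(theta_o, se_o^2) + (1 - w) * N(vague_mean, vague_sd^2).

    Returns a list of (posterior weight, posterior mean, posterior variance),
    one entry per mixture component. The wide normal is an assumed stand-in
    for the non-informative component.
    """
    components = [(w, theta_o, se_o ** 2), (1.0 - w, vague_mean, vague_sd ** 2)]
    updated = []
    for weight, mean, var in components:
        # Marginal likelihood of the replication estimate under this component.
        marginal = normal_pdf(theta_r, mean, var + se_r ** 2)
        # Conjugate normal update within the component.
        post_var = 1.0 / (1.0 / var + 1.0 / se_r ** 2)
        post_mean = post_var * (mean / var + theta_r / se_r ** 2)
        updated.append((weight * marginal, post_mean, post_var))
    total = sum(u[0] for u in updated)
    return [(wt / total, m, v) for wt, m, v in updated]


# Hypothetical numbers: original estimate 0.40 (SE 0.10), replication 0.10 (SE 0.05).
posterior = mixture_posterior(theta_r=0.10, se_r=0.05, theta_o=0.40, se_o=0.10, w=0.5)
for weight, mean, var in posterior:
    print(f"weight={weight:.3f} mean={mean:.3f} sd={math.sqrt(var):.3f}")
```

Setting `w=1.0` reproduces full pooling with the original data, and `w=0.0` discards the original study entirely, which corresponds to the two boundary hypotheses mentioned in the abstract.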

2406.09241 2026-05-19 math.OC cs.LG math.PR stat.ML

What is the long-run distribution of stochastic gradient descent? A large deviations analysis

随机梯度下降的长期分布是什么?一种大偏差分析

Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

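AI summary: This paper studies the long-run distribution of stochastic gradient descent (SGD) in general non-convex problems. Using an approach based on large deviations theory and randomly perturbed dynamical systems, the authors show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics, with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective); (c) all other connected components of critical points are visited with frequency exponentially proportional to their energy level; and (d) any component of local maximizers or saddle points is dominated by a component of local minimizers that is visited exponentially more often.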

详情
Comments
71 pages, 3 figures; presented in ICML 2024
AI中文摘要

在本文中,我们研究了随机梯度下降(SGD)在一般非凸问题中的长期分布。具体而言,我们试图了解SGD更可能访问问题状态空间的哪些区域,以及程度如何。通过基于大偏差理论和随机扰动动力系统的方法,我们证明SGD的长期分布类似于热力学平衡态的玻尔兹曼-吉布斯分布,其中温度等于方法的步长大小,能量水平由问题的目标函数和噪声的统计特性决定。特别地,我们证明在长期中,(a)问题的临界区域比任何非临界区域被访问的次数指数级更多;(b)SGD的迭代结果在问题的最低能量状态上指数级集中(该状态不总是对应于目标函数的全局最小值);(c)所有其他临界点的连通分量被访问的频率与它们的能量水平呈指数比例关系;最后,(d)任何局部极大值或鞍点的连通分量都被局部最小值的连通分量所主导,后者被访问的次数指数级更多。

英文摘要

In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective); (c) all other connected components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally (d) any component of local maximizers or saddle points is "dominated" by a component of local minimizers which is visited exponentially more often.

2405.06415 2026-05-19 stat.ML cs.LG

Generalization analysis with deep ReLU networks for metric and similarity learning

基于深度ReLU网络的度量与相似性学习的泛化分析

Junyu Zhou, Puyu Wang, Ding-Xuan Zhou

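AI summary: This paper studies the generalization performance of metric and similarity learning. It constructs structured deep ReLU networks that approximate the true metric and derives explicit excess risk bounds, which the authors describe as the first such generalization analysis for this setting.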

详情
Comments
15 pages, 1 figure
AI中文摘要

尽管度量与相似性学习已从多个理论角度被广泛研究,但对其泛化性能的深入理解仍显不足。本文通过利用真实度量(即目标函数)的特定结构,研究了度量与相似性学习的泛化行为。特别地,通过推导具有hinge损失的度量与相似性学习的真实度量的显式形式,我们构建了一个结构化的深度ReLU神经网络作为真实度量的近似,其近似能力取决于网络复杂度。这里,网络复杂度通过网络深度、非零权重数量和计算单元数量来表征。基于由此类结构化深度ReLU网络构成的假设空间,我们通过仔细控制近似误差和估计误差,建立了度量与相似性学习的超额风险界。通过选择适当的构造假设空间的容量,推导出显式的超额风险率。迄今为止,这是首次为度量与相似性学习提供显式超额风险界的泛化分析。此外,我们还研究了在更一般损失函数下度量与相似性学习的真实度量的性质。实验表明,所提出模型在经验上具有竞争力,并能更好地捕捉底层的相似性结构。

英文摘要

While metric and similarity learning has been extensively studied from several theoretical perspectives, a rigorous understanding of its generalization performance is still lacking. In this paper, we investigate the generalization behavior of metric and similarity learning by exploiting the specific structure of the true metric (i.e., the target function). In particular, by deriving the explicit form of the true metric for metric and similarity learning with the hinge loss, we construct a structured deep ReLU neural network as an approximation of the true metric, whose approximation ability depends on the network complexity. Here, the network complexity is characterized by the network depth, the number of nonzero weights, and the number of computational units. Based on the hypothesis space consisting of such structured deep ReLU networks, we establish excess risk bounds for metric and similarity learning by carefully controlling both the approximation error and the estimation error. An explicit excess risk rate is derived by choosing the proper capacity of the constructed hypothesis space. To the best of our knowledge, this is the first generalization analysis that provides explicit excess risk bounds for metric and similarity learning. In addition, we investigate properties of the true metric for metric and similarity learning under more general loss functions. Experiments show that the proposed model is empirically competitive and better captures the underlying similarity structure.

2402.14390 2026-05-19 stat.CO stat.ME

Composite likelihood inference for the Poisson log-normal model

泊松对数正态模型的复合似然推断

Julien Stoehr, Stephane S. Robin

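AI summary: This paper proposes an inference procedure for the Poisson log-normal model that combines the EM framework with composite likelihood and importance sampling estimates. It circumvents the high-dimensional integration bottleneck and keeps computation feasible while preserving the asymptotic properties of maximum likelihood estimators.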

详情
AI中文摘要

泊松对数正态模型是一种隐变量模型,为多变量计数数据的分析提供了通用框架。推断其参数是一项艰巨的任务,因为给定观测值的隐变量条件分布难以处理(intractable)。对于此模型,变分方法是金标准的解决方案,因为它们在计算上是高效的,但缺乏对估计量的理论保证。基于采样的解决方案则恰恰相反。我们首先定义了一个蒙特卡洛EM算法,可以得到最大似然估计量,但仅在低维隐变量空间中计算高效。然后我们提出了一种新的推断程序,将EM框架与复合似然和重要性采样估计相结合。该算法保持了最大似然估计量的良好渐近性质,同时绕过了高维积分瓶颈,从而在中等规模的数据集上保持计算可行性。这种方法使有理论依据的参数估计、置信区间和假设检验成为可能。对巴伦支海鱼类数据集的应用展示了该算法识别显著环境效应和残余种间相关性的能力。

英文摘要

The Poisson log-normal model is a latent variable model that provides a generic framework for the analysis of multivariate count data. Inferring its parameters can be a daunting task since the conditional distribution of the latent variables given the observed ones is intractable. For this model, variational approaches are the gold-standard solution as they prove to be computationally efficient but lack theoretical guarantees on the estimates. Sampling-based solutions are quite the opposite. We first define a Monte Carlo EM algorithm that can achieve maximum likelihood estimators, but that is computationally efficient only for low-dimensional latent spaces. We then propose a novel inference procedure combining the EM framework with composite likelihood and importance sampling estimates. The algorithm preserves the desirable asymptotic properties of maximum likelihood estimators while circumventing the high-dimensional integration bottleneck, thus maintaining computational feasibility for moderately large datasets. This approach enables grounded parameter estimation, confidence intervals, and hypothesis testing. Application to the Barents Sea fish dataset demonstrates the algorithm's capacity to identify significant environmental effects and residual interspecies correlations.

2201.04982 2026-05-19 stat.OT

An empirical exploration of the diversified R ecosystem

对多样化R生态系统的实证探索

Tian-Yuan Huang, Zhilan Lou

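AI summary: Using CRAN metadata and bibliometric data on literature citing R, this paper examines what has driven the development of the R ecosystem and its cross-disciplinary reach, and describes collaboration patterns among R developers and their possible effects.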

详情
AI中文摘要

R诞生于20世纪末,是统计计算和图形学中最受欢迎的软件之一。随着信息技术的发展和大数据时代的到来,R生态系统发生了巨大变化。基于综合R存档网络(CRAN)的元信息和引用R文献的文献计量数据,我们发现虽然R起源于统计学,但其发展受益于计算机科学,而学术界的主要用户群体来自农业科学、生物学、环境科学和医学等不同学科。此外,我们展示了R开发者之间的协作模式,并分析了协作在R社区中的可能影响。

英文摘要

Born in the late 20th century, R is one of the most popular software environments for statistical computing and graphics. With the development of information technology and the advent of the big data era, great changes have taken place in the R ecosystem. Based on the meta information of the Comprehensive R Archive Network (CRAN) and the bibliometric data of literature citing R, we find that while R originated in statistics, its development has benefited greatly from computer science, and the main user group in academia comes from various disciplines such as agricultural science, biological science, environmental science and medical science. In addition, we present the collaboration patterns among R developers and analyze the possible effects of collaboration in the R community.

1810.06433 2026-05-19 stat.CO stat.ME

Calibration procedures for approximate Bayesian credible sets

近似贝叶斯可信集的校准程序

Jeong Eun Lee, Geoff K. Nicholls, Robin J. Ryder

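AI summary: This paper develops and applies two calibration procedures for checking the coverage of approximate Bayesian credible sets, including intervals estimated by Monte Carlo methods. The realised posterior coverage is estimated either by semi-parametric logistic regression or by importance sampling.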

详情
Journal ref
Bayesian Analysis 14(4): 1245-1269 (2019)
Comments
28 pages, 6 Figures, 1 Table, 4 Algorithm boxes. Revision improves clarity of presentation and adds relevant citations
AI中文摘要

我们开发并应用了两种校准程序,用于检查使用蒙特卡洛方法估计的可信区间覆盖性。用户拥有理想的先验和似然,但生成了一个近似后验的可信集,该后验不与理想似然和先验的乘积成比例。我们估计由近似可信集实现的后验覆盖,即如果数据是用户理想观测模型在参数条件下的实现,且参数是从用户理想先验中抽取的,那么未知的“真实”参数的覆盖性。在一种方法中,我们通过半参数逻辑回归对二元覆盖结果进行回归,以估计数据点的后验覆盖,该回归基于模拟数据的汇总统计量。在另一种方法中,我们使用重要性采样从近似后验中进行重要性采样,并将模拟数据窗口化以接近观测数据。我们通过四个例子展示了我们的方法。

英文摘要

We develop and apply two calibration procedures for checking the coverage of approximate Bayesian credible sets including intervals estimated using Monte Carlo methods. The user has an ideal prior and likelihood, but generates a credible set for an approximate posterior which is not proportional to the product of ideal likelihood and prior. We estimate the realised posterior coverage achieved by the approximate credible set. This is the coverage of the unknown "true" parameter if the data are a realisation of the user's ideal observation model conditioned on the parameter, and the parameter is a draw from the user's ideal prior. In one approach we estimate the posterior coverage at the data by making a semi-parametric logistic regression of binary coverage outcomes on simulated data against summary statistics evaluated on simulated data. In another we use Importance Sampling from the approximate posterior, windowing simulated data to fall close to the observed data. We illustrate our methods on four examples.

2605.17269 2026-05-19 cs.LG stat.ML

Calibeating for general proper losses: A Bregman divergence approach

基于Bregman散度的方法:一般恰当损失的校准

Maximilian Fichtl, Cristóbal Guzmán, Nishant A. Mehta

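AI summary: This paper proposes a general framework for calibeating based on regret minimization. It covers a broad family of proper losses that includes α-Tsallis losses (α∈[1,2]) and Lipschitz losses, and presents a new regret equality for Be The Regularized Leader.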

详情
Comments
31 pages
AI中文摘要

本文介绍了一种基于懊悔最小化的通用校准框架。与Foster和Hart的开创性校准工作相比,后者专门处理Brier分数(平方损失)和log损失,我们考虑了一类包含α-Tsallis损失(α∈[1,2])和Lipschitz损失的广泛恰当损失家族。我们的结果对于Tsallis损失也适用于未缩放的Tsallis损失,该损失恢复log损失。我们的分析围绕恰当损失的Bregman散度观点展开。技术上,我们考虑的Tsallis损失家族的结果是U-calibration结果,同时在所有损失家族中获得对数懊悔,同时与先前结果相比具有更弱的维度依赖性。潜在的独立兴趣点是,我们还展示了新的关于Be The Regularized Leader的懊悔等式。该懊悔等式适用于一般恰当损失,并且本身基于两个与广义方差的在线更新公式相关的结果,后者是基于Bregman散度的方差泛化。

英文摘要

This work introduces a general framework for calibeating based on regret minimization. As compared to Foster and Hart's seminal calibeating work which had specialized treatments of Brier score (squared loss) and log loss, we consider a large family of proper losses that includes $α$-Tsallis losses (for $α\in [1, 2]$) and Lipschitz losses. Our results for Tsallis losses also hold for an unscaled version of Tsallis loss that recovers log loss. Our analysis is oriented around the Bregman divergence view of a proper loss. Technically, our results for the family of Tsallis losses that we consider are U-calibration results, simultaneously obtaining logarithmic regret for all losses in this family while having a weaker dependence on the dimension compared to previous results. Of potential independent interest, we also show a new regret equality for the regret of Be The Regularized Leader. This regret equality holds for general proper losses and itself is based on two results related to online updating formulas for the generalized variance, the latter being a previously introduced generalization of variance based on Bregman divergences.

2605.17240 2026-05-19 stat.ME

The FORSS Framework for Sample Size and Power Calculations With Win Statistics for Hierarchical Endpoints

具有Win统计的分层终点的样本量和功效计算的FORSS框架

Baoshan Zhang, Huiman X. Barnhart, Yuan Wu, Roland A. Matsouaka

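AI summary: This paper proposes FORSS, a formula-based super-sample framework for sample size and power calculations in clinical trials with hierarchical endpoints. Investigators specify marginal treatment effects with familiar metrics together with a flexible joint working distribution, which addresses limitations of existing methods.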

详情
AI中文摘要

Win统计量作为以分层终点(HEs)为主要终点的临床试验的主要分析方法,正日益流行。然而,现有试验设计中的样本量和功效计算方法仍面临几个限制和挑战:基于模拟的方法计算成本高,而现有的基于公式的方法通常依赖于简化假设,如HEs之间的独立性,或需要指定总体Win统计量和平局概率,这些在实践中难以事先获得。为了解决这些挑战,我们提出了FORSS框架,一种基于公式的超样本方法,允许研究者使用熟悉的度量(如危险比、均值差异和风险差异)指定边际治疗效应,同时为HEs结合灵活的联合工作分布。FORSS不在每个候选样本量上反复模拟完整试验,而是使用超样本来估计解析公式所需的总体层面插入(plug-in)量,用于功效和样本量计算。我们通过广泛的模拟研究评估了所提FORSS的性能。结果表明,基于公式的FORSS在广泛的情景中与经验功效紧密吻合,同时保持I类错误率接近名义的5%水平。基于HEART-FID试验的示例进一步表明,当规划具有HEs的试验时,终点依赖性的设定可能对预期功效和所需样本量产生实质性影响。

英文摘要

Win statistics have gained increasing popularity as primary analysis methods for clinical trials with hierarchical endpoints (HEs) as primary endpoints. However, existing sample size and power calculation approaches in trial design still face several limitations and challenges: simulation-based approaches are computationally intensive, while existing formula-based methods often rely on simplifying assumptions such as independence among HEs, or require specification of overall win statistics and tie probability that are difficult to elicit a priori in practice. To address these challenges, we propose the FORSS framework, a FORmula-based Super-Sample approach that allows investigators to specify marginal treatment effects using familiar metrics (e.g., hazard ratios, mean differences, and risk differences) together with a flexible joint working distribution for the HEs. Rather than repeatedly simulating full trials at each candidate sample size, FORSS uses super-samples to estimate the population-level plug-in quantities required by analytical formulas for both power and sample size calculation. We evaluated the performance of the proposed FORSS through extensive simulation studies. The results show that the formula-based FORSS closely matches empirical power across a wide range of scenarios while maintaining Type I error rates near the nominal 5% level. An illustration based on the HEART-FID trial further shows that endpoint-dependence specifications can materially affect projected power and required sample size when planning trials with HEs.

2605.17238 2026-05-19 cs.LG stat.ML

Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects

位置感知多项Logit老虎机中的学习:从乘法位置效应到一般位置效应

Xi Chen, Shibo Dai, Jiameng Lyu, Yuan Zhou

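AI summary: This paper studies the dynamic joint assortment selection and positioning problem, in which the attraction of each product depends on both its intrinsic appeal and its display position under a Multinomial Logit (MNL) choice framework. The study ranges from a multiplicative position effects model to a general position effects model. For both, it designs round-based learning algorithms and establishes the first regret-optimal characterization. For the multiplicative model, it develops a cross-position pairwise maximum likelihood estimator with a clipping mechanism and proves that the algorithm P2MLE-UCB attains a regret of $\tilde{O}(\sqrt{NT})$, matching the lower bound and closing the $\sqrt{K}$ gap left by prior epoch-based analyses. For the general model, it establishes a minimax lower bound and proposes GP2-UCB with a matching upper bound. It also designs an efficient subroutine for the per-round joint assortment and positioning optimization based on Dinkelbach's method and maximum-weight bipartite matching. Numerical experiments on synthetic data and the Expedia dataset show that the algorithms consistently outperform state-of-the-art benchmarks.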

详情
AI中文摘要

我们研究了动态联合品类选择与排列问题,其中每个产品的吸引力取决于其内在吸引力和显示位置,在多项逻辑(MNL)选择框架下。我们的研究从乘法位置效应模型开始,其中每个产品的吸引力由位置特定因子缩放,扩展到一般位置效应模型,该模型为每个产品-位置对分配独立吸引力参数以捕捉异质协同效应。对于两种模型,我们设计了基于轮次的学习算法,在每次反馈后更新决策,并建立了首个最优后悔分析。此外,我们的基于轮次算法为现代平台提供了必要的实时操作。对于乘法模型,我们开发了具有截断机制的交叉位置成对最大似然估计器,并证明我们的算法P2MLE-UCB达到$\tilde{O}(\sqrt{NT})$的后悔,匹配下限并弥补了先前基于周期的分析留下的$\sqrt{K}$差距。对于一般模型,我们建立了最小最大下界并提出了GP2-UCB算法,具有匹配的上界。此外,我们设计了基于Dinkelbach方法和最大权二分图匹配的高效子程序,用于每轮联合品类和排列优化。在合成数据和Expedia数据集上的数值实验表明,我们的算法在性能上始终优于最先进的基准。

英文摘要

We study the dynamic joint assortment selection and positioning problem, where the attraction of each product depends on both its intrinsic appeal and its display position under a Multinomial Logit (MNL) choice framework. Our study ranges from the multiplicative position effects model, in which each product's attraction is scaled by a position-specific factor, to a general position effects model assigning independent attraction parameters to every product--position pair to capture heterogeneous synergies. For both models, we design round-based learning algorithms that update decisions after every single feedback, and establish the first regret-optimal characterization. Besides, our round-based algorithms provide the prompt operations needed by modern platforms. For the multiplicative model, we develop a cross-position pairwise maximum likelihood estimator with a clipping mechanism, and prove that our algorithm P2MLE-UCB attains a regret of $\tilde{O}(\sqrt{NT})$, matching the lower bound and closing the $\sqrt{K}$ gap left by prior epoch-based analyses. For the general model, we establish a minimax lower bound and propose GP2-UCB with a matching upper bound. Moreover, we design an efficient subroutine for the per-round joint assortment and positioning optimization based on Dinkelbach's method and maximum-weight bipartite matching. Numerical experiments on synthetic data and the Expedia dataset show that our algorithms consistently outperform state-of-the-art benchmarks.
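
To make the multiplicative position-effects model concrete, here is a minimal Python sketch of the choice probabilities it implies. It assumes the common MNL convention that the no-purchase option has weight 1; the function name, the appeal values, and the position factors are hypothetical and are not taken from the paper.

```python
def choice_probabilities(appeal, position_factor, ranking):
    """MNL choice probabilities under a multiplicative position effect.

    appeal[i]          intrinsic attraction of product i
    position_factor[k] scaling applied to whatever is shown at position k
    ranking[k]         index of the product displayed at position k

    Returns (dict product -> purchase probability, no-purchase probability).
    The no-purchase weight is normalised to 1, which is an assumption here.
    """
    weights = [position_factor[k] * appeal[i] for k, i in enumerate(ranking)]
    denom = 1.0 + sum(weights)
    probs = {i: w / denom for i, w in zip(ranking, weights)}
    return probs, 1.0 / denom


# Hypothetical example: three products, two display slots.
appeal = [1.0, 0.5, 2.0]
position_factor = [1.0, 0.5]      # the second slot is half as visible
probs, no_purchase = choice_probabilities(appeal, position_factor, ranking=[2, 0])
print(probs, no_purchase)
```

In the general position-effects model described in the abstract, the product `position_factor[k] * appeal[i]` would be replaced by a separate parameter for every product-position pair.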

2605.17232 2026-05-19 cs.LG math.ST stat.ML stat.TH

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

离散扩散模型的维度无关收敛性:伴随方程诱导了正确的空间

Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Markos A. Katsoulakis

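AI summary: This paper develops a unified adjoint-equation-based framework that gives dimension-free convergence guarantees for discrete diffusion models in any integral probability metric (IPM), overcoming the limitations of KL- and TV-based analyses on large state spaces.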

详情
AI中文摘要

离散扩散已成为生成建模中的领先框架,广泛应用于语言、视觉和生物学等领域。然而,现有的收敛理论存在根本性局限。基于KL的分析在奇异先验如掩码分布下会发散,而总变差(TV)的界依赖于状态空间大小S,并在现代语言任务中变得无效,因为词汇表包含数以万计的标记。我们开发了一种统一的基于伴随方程的框架,建立了任何积分概率度量(IPM)下的维度无关收敛保证。到目前为止,我们的界是首个完全不依赖S且适用于掩码和均匀先验的。重要的是,我们的理论仅依赖于一个标准的速率矩阵正则性假设,并且兼容时间非齐次调度。四个新颖的技术推动了我们的改进:通过伴随方程在可观测空间中工作而不是直接处理概率测度,一种产生任何IPM界正则性分析,一种耦合论证在均匀转移下去除S依赖性,以及一种分数-边际抵消技术在掩码转移下去除S依赖性。因此,我们的框架与先前分析显著不同,并避免了路径空间-KL和现有TV方法的不足。除了收敛界外,我们的框架还提供了一种灵活的工具包,用于进一步理论研究离散扩散模型。

英文摘要

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.

2605.17186 2026-05-19 math.NA cs.NA math.DS math.PR q-bio.QM stat.CO

Solving linear-rate ODE hierarchies (like master equations) using closures and operator splitting

利用闭合条件和算子分裂求解线性速率ODE层级(如同主方程)

Joshua C Chang

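AI summary: This paper solves linear-rate ODE hierarchies, which arise as forward equations of continuous-time Markov processes, using closures and operator splitting, and demonstrates the method on applications including branching processes, matrix-valued telegraph models, and G/R elongation.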

详情
Comments
An implementation exists at https://github.com/nih-niddk-mbs/StochasticGene.jl
AI中文摘要

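可数无限的线性常微分方程组是许多连续时间马尔可夫过程的前向方程。标准做法是截断到有限上限N再求矩阵指数,其代价随N呈三次增长,并带有随时间增长的边界反馈偏差。我们给出速率的一个结构性条件 L_{n+r,n} = alpha_r n + beta_r(“线性速率”),在该条件下,生成函数 G(z,t) = sum_n x_n(t) z^n 满足关于z的一阶线性偏微分方程,特征线法给出复合-乘子表示 G(z,t) = K_t(z) G(Phi_t(z), 0)。Phi_t 和 K_t 在任意输出窗口 {0,...,N} 上的泰勒系数由 R^{2(N+1)} 上一个封闭的下三角多项式常微分方程精确确定,与N以上的任何系数无关。截断仅通过初始分布的支撑 M_0 引入,且 M_0 的选取与N无关。对于二元生灭过程,闭合退化为几何尾 p_n(t) = p_1(t) rho(t)^{n-1},其中 rho(t) = lambda(1 - e^{-(mu-lambda)t})/(mu - lambda e^{-(mu-lambda)t})。线性速率类涵盖带迁入的马尔可夫分支过程、多类型分支过程、矩阵值电报模型与G/R延伸,以及带符号或非随机的层级。当生成元分解为 L = A + B,其中A为线性速率、B为非仿射(Schlogl双稳态、捕食者-猎物、格点反应-扩散)时,我们将闭合与对B的Strang分裂配合使用;Richardson外推将时间精度提升到 Delta-t^4,墙钟时间约为3倍。在 V=500、N=8,000 的Schlogl问题上,分裂方法比稠密Pade快6.3倍,比稀疏Krylov expv快20倍。对于平稳情形,闭合-Strang幂迭代将同一套方法推广到多维乘积状态空间生成元,此时稀疏LU在可用上限下会遇到内存不足/超时或边界投影偏差。数值实验确定了每种方法在何处占优、在何处不及标准工具。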

英文摘要

Countably infinite systems of linear ODEs arise as forward equations for many continuous-time Markov processes. The standard recipe -- truncate to a finite cap N and exponentiate -- pays cubic cost in N and a time-growing boundary-feedback bias. We identify a structural condition on the rates, L_{n+r,n} = alpha_r n + beta_r ("linear-rate"), under which the generating function G(z,t) = sum_n x_n(t) z^n satisfies a first-order linear PDE in z, and the method of characteristics yields a composition-multiplier representation G(z,t) = K_t(z) G(Phi_t(z), 0). The Taylor coefficients of Phi_t and K_t on any output window {0,...,N} are determined exactly by a closed lower-triangular polynomial ODE on R^{2(N+1)}, independent of any coefficients above N. Truncation enters only through the support M_0 of the initial law, set independently of N. For binary birth-death the closure collapses to the geometric tail p_n(t) = p_1(t) rho(t)^{n-1} with rho(t) = lambda(1 - e^{-(mu-lambda)t})/(mu - lambda e^{-(mu-lambda)t}). The linear-rate class spans Markov branching with immigration, multi-type branching, matrix-valued telegraph and G/R elongation, and signed or non-stochastic hierarchies. When the generator decomposes as L = A + B with A linear-rate and B non-affine (Schlogl bistable, predator-prey, lattice reaction-diffusion), we pair the closure with Strang splitting on B; Richardson extrapolation lifts the time order to Delta-t^4 at ~3x wall clock. On the Schlogl problem at V=500, N=8,000, the split runs 6.3x faster than dense Pade and 20x faster than sparse Krylov expv. For the stationary regime, a closure-Strang power iteration extends the same machinery to multi-dimensional product-state-space generators where sparse LU hits OOM/OOT or boundary-projection bias at usable caps. Numerical experiments locate where each route wins and where it is dominated by standard tools.
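
The geometric-tail formula quoted above can be checked numerically against a brute-force truncated master equation. The sketch below is not the closure method from the paper. It assumes a linear birth-death process with per-capita birth rate `lam` and death rate `mu`, started from a single individual, and integrates the truncated forward equations with a hand-written RK4 step. The cap, step count, and rate values are illustrative choices.

```python
import math


def rho(t, lam, mu):
    """Geometric ratio rho(t) as quoted in the abstract (requires lam != mu)."""
    decay = math.exp(-(mu - lam) * t)
    return lam * (1.0 - decay) / (mu - lam * decay)


def rhs(p, lam, mu):
    """Forward (master) equation for linear birth-death, truncated at len(p) - 1."""
    cap = len(p) - 1
    dp = [0.0] * (cap + 1)
    for n in range(cap + 1):
        from_below = lam * (n - 1) * p[n - 1] if n >= 1 else 0.0
        from_above = mu * (n + 1) * p[n + 1] if n < cap else 0.0
        dp[n] = from_below + from_above - (lam + mu) * n * p[n]
    return dp


def solve(t_end, lam, mu, cap=60, steps=1000):
    """Classical RK4 integration from a single individual at time 0."""
    p = [0.0] * (cap + 1)
    p[1] = 1.0
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(p, lam, mu)
        k2 = rhs([x + 0.5 * h * k for x, k in zip(p, k1)], lam, mu)
        k3 = rhs([x + 0.5 * h * k for x, k in zip(p, k2)], lam, mu)
        k4 = rhs([x + h * k for x, k in zip(p, k3)], lam, mu)
        p = [x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p


lam, mu, t_end = 1.0, 2.0, 1.0
p = solve(t_end, lam, mu)
r = rho(t_end, lam, mu)
print("rho(t)      :", r)
print("p_2 / p_1   :", p[2] / p[1])
print("p_3 / p_2   :", p[3] / p[2])
```

The successive ratios of the numerically integrated probabilities should agree with `rho(t)` for n >= 1, which is what the geometric tail p_n(t) = p_1(t) rho(t)^{n-1} asserts under the single-individual initial condition assumed here.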

2605.17180 2026-05-19 cs.LG math.OC stat.ML

The Geometry of Projection Heads: Conditioning, Invariance, and Collapse

投影头的几何学:条件性、不变性与坍缩

Faris Chaudhry

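AI summary: This paper develops a geometric theory of projection heads in self-supervised learning by modeling the head as a trainable Riemannian metric. It uses this view to study conditioning, invariance, and collapse, including how head depth affects adaptive capacity and why collapsed equilibria can be unstable.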

详情
Comments
Accepted at ICML 2026. 29 pages, 8 figures, 7 tables
AI中文摘要

我们通过将头建模为可训练的黎曼度量来发展投影头在自监督学习中的几何理论。我们证明线性头执行隐式的子空间白化,而非线性头适应局部度量以满足损失函数的特定拓扑约束,且头的深度经验上决定了这种能力。通过分析维度坍缩,我们证明平滑的非线性头在坍缩平衡点会自然诱导Hessian矩阵的负特征值,使其不稳定。我们通过连续跟踪训练过程中的优化几何来验证这一点,发现Swish等平滑激活函数可以生成显式的负曲率以逃离坍缩,而线性和ReLU头在连续时间梯度流中无法做到这一点,而是依赖于离散时间优化动态和BatchNorm。最后,我们从几何上表征了度量退化如何支配信息不变性之间的权衡,解释了为什么必须丢弃头。在基础模型上对比和去相关目标的评估表明,投影头起到通用几何缓冲器的作用,将语义骨干与预训练目标的刚性破坏约束解耦。

英文摘要

We develop a geometric theory of projection heads in self-supervised learning by modeling the head as a trainable Riemannian metric on the backbone representation manifold. We show that linear heads perform implicit subspace whitening, while nonlinear heads adapt local metrics to satisfy the specific topological constraints of the loss, with head depth empirically dictating this capacity. Analyzing dimensional collapse, we prove that smooth nonlinear heads natively induce negative eigenvalues in the Hessian at collapsed equilibria, making them unstable. We empirically validate this by continuously tracking the optimization geometry during training, which reveals that smooth activations like Swish can generate explicit negative curvature to escape collapse, whereas linear and ReLU heads under continuous-time gradient flow cannot, relying instead on discrete-time optimization dynamics and BatchNorm. Finally, we geometrically characterize how metric degeneracy governs the information-invariance trade-off, explaining why the head must be discarded. Evaluated across contrastive and decorrelation-based objectives on foundation models, our results demonstrate that the projection head acts as a universal geometric buffer, decoupling the semantic backbone from the rigid, destructive constraints of the pretraining objective.

2605.17177 2026-05-19 math.OC cs.LG math.ST stat.ML stat.TH

High-dimensional Limit of SGD for Diagonal Linear Networks

SGD在对角线线性网络中的高维极限

Begoña García Malaxechebarría, Courtney Paquette, Maryam Fazel, Dmitriy Drusvyatskiy

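AI summary: This paper studies SGD on diagonal linear networks in the high-dimensional regime. It derives a stochastic differential equation that approximates the SGD dynamics and a partial differential equation that describes the time evolution of the iterates and of observable statistics, and shows that under a suitable parametrization the dynamics are globally well posed and converge exponentially fast to zero risk with high probability.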

详情
Comments
91 pages, 5 figures
AI中文摘要

理解随机梯度方法的行为是现代机器学习中的核心问题。最近的研究强调了对角线线性网络作为一种简化且具有表现力的设置,用于分析神经模型的优化和泛化特性。在本文中,我们证明在高维情况下,对角线线性网络上的随机梯度下降可以被由随机微分方程(SDE)控制的连续动力学近似,该方程显式地将漂移与梯度噪声分离。我们进一步推导了一个确定性偏微分方程,其解传播迭代状态并描述了广泛可观测统计量的时间演化,包括风险、曲率和其他最优性度量。最后,我们证明在合适的参数化下,随机动力学是全局良好的,并以高概率指数收敛到零风险,从而得到其长时间行为的完全显式非渐近描述。数值模拟验证了我们的理论发现。

英文摘要

Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate our theoretical findings.

2605.17154 2026-05-19 stat.ME stat.ML

Learning Gaussian Graphical Models under Total Positivity via Spectral Graph Sparsification

通过谱图稀疏化学习总正性高斯图模型

Ignacio Echave-Sustaeta Rodríguez, Aida Abiad, Frank Röttger

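AI summary: This paper proposes Spectral-MTP2, a method for learning Gaussian graphical models based on spectral graph sparsification. Under the total positivity (MTP2) constraint it yields sparser, more interpretable graphs while retaining most of the fit quality.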

详情
Comments
16 pages
AI中文摘要

许多实际数据分析任务归结为从观测样本中学习变量间的依赖关系。广泛使用的做法是拟合高斯图模型,该模型将依赖结构表示为连接变量的图。在许多重要应用中,如金融回报、基因共表达和气候或网络分析,依赖性往往呈正向:变量倾向于同向变动而非抵消。通过多变量总正性阶数二(MTP2)约束编码这种正性,可以得到一种吸引人的估计器,能够产生准确的拟合且无需调参。然而,所得到的图通常比底层真实模型更密集,这使得它们难以解释并在任何基于图的操作中使用缓慢。在本文中,我们提出了一种新颖的高可扩展性方法,用于通过谱图稀疏化从数据中学习高斯图模型;我们称之为Spectral-MTP2。谱图稀疏化是一种基本方法,旨在通过更稀疏的子图保留密集图的有意义属性。我们理论和经验地研究并验证了我们的方法,并展示了在MTP2约束下使用谱稀疏化学习高斯图模型能够保留MTP2并以Kullback-Leibler散度和高斯对数似然度近似原始模型。在模拟和应用于股票回报和基因表达的实验中,我们发现Spectral-MTP2保留了更密集的MTP2基线的大部分拟合质量,同时生成显著更稀疏和可解释的图结构。

英文摘要

Many practical data analysis tasks reduce to learning, from observed samples, how a collection of variables depend on each other. A widely used approach is to fit a Gaussian graphical model, which represents the dependence structure as a graph connecting the variables. In a number of important applications, such as financial returns, gene co-expression, and climate or network analysis, the dependencies tend to be positive: variables move together rather than offset each other. Encoding this positivity through the constraint of multivariate total positivity of order two (MTP2) yields an attractive estimator that produces accurate fits with no tuning required. The resulting graphs are, however, typically much denser than the underlying ground-truth model, which makes them hard to interpret and slow to use in any downstream task that operates on the graph. In this work, we propose a novel highly-scalable approach for learning Gaussian graphical models from data using spectral sparsification; we call it Spectral-MTP2. Spectral graph sparsification is a fundamental method which aims to preserve meaningful properties of a dense graph with a sparser subgraph. We theoretically and empirically investigate and validate our method, and show that learning Gaussian Graphical Models under MTP2 using spectral sparsification preserves MTP2 and approximates well the original model in terms of Kullback-Leibler divergence and Gaussian log-likelihood. In simulations and applications to equity returns and gene expression, we observe that Spectral-MTP2 retains most of the fit quality of the denser MTP2 baseline, while producing substantially sparser and more interpretable graphs.
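
For readers unfamiliar with the MTP2 constraint: a Gaussian distribution is MTP2 exactly when its precision matrix is an M-matrix, that is, symmetric positive definite with non-positive off-diagonal entries. The Python sketch below only checks that sign pattern and lists the edges of the implied graph. Positive definiteness is assumed rather than verified, the function names are hypothetical, and this is not an implementation of Spectral-MTP2.

```python
def has_mtp2_sign_pattern(precision, tol=1e-12):
    """True if the matrix is symmetric with non-positive off-diagonal entries.

    Positive definiteness is assumed, not checked.
    """
    n = len(precision)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if abs(precision[i][j] - precision[j][i]) > tol:
                return False
            if precision[i][j] > tol:
                return False
    return True


def graph_edges(precision, tol=1e-12):
    """Edges (i, j), i < j, where the precision matrix has a nonzero entry."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > tol]


# Hypothetical 3-variable chain: variables 0 and 2 are conditionally independent.
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
print(has_mtp2_sign_pattern(K), graph_edges(K))
```

A sparsification step that removes or reweights edges while keeping the off-diagonal entries non-positive stays within this sign pattern, which is one way to read the abstract's claim that sparsification preserves MTP2.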

2605.17118 2026-05-19 cs.LG stat.CO stat.ML

Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning

可微优化层用于深度学习中的保证公平性

David Troxell, Noah Roemer, Guido Montúfar

AI总结 本文提出了一种称为"公平性层"的可微优化层:将其附加到神经网络的输出层后,可保证所选的输出均等性概念得到满足;同时提出了一种在线原始-对偶推理算法,即使批量大小任意小,也能为流式预测提供可证明的总体公平性保证。

详情
Comments
To be published in International Conference on Machine Learning (ICML), 2026
AI中文摘要

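可微优化层传统上被集成在"先预测后优化"框架中:神经网络模型估计参数,这些参数随后作为固定输入用于下游的决策优化问题。在本工作中,我们引入了"公平性层"的概念:这是一个附加在模型输出层之后的可微优化层,集成到神经网络中时可保证所选的输出均等性概念得到满足。此外,我们提出了一种在线原始-对偶推理算法,在传统的逐批约束变得过于严格的任意小批量情形下,为流式预测提供可证明的总体公平性保证。数值实验展示了公平性层及相关算法的有效性,理论分析刻画了该层在模型训练和反向传播过程中的可微性与稳定性。实验代码已在GitHub上公开(https://github.com/dtroxell19/FairDL-ICML-2026.git),公开的Python包文档见 https://dtroxell19.github.io/fairness_training/ 。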

英文摘要

Differentiable optimization layers are traditionally integrated in predict-then-optimize frameworks where a neural model estimates parameters that subsequently serve as fixed inputs to downstream decision-making optimization problems. In this work, we introduce the concept of a "fairness layer": a differentiable optimization layer appended to a model's output layer that guarantees a chosen notion of output parity is satisfied when integrated into a neural network. Additionally, we introduce an online primal-dual inference algorithm that provides provable aggregate fairness guarantees for streaming predictions with arbitrarily small batch sizes, where traditional per-batch constraints become overly restrictive. Numerical experiments demonstrate the effectiveness of the fairness layer and associated algorithm, and theoretical analysis characterizes the layer's differentiability and stability properties during model training and backpropagation. Our code for these experiments is publicly available on GitHub (https://github.com/dtroxell19/FairDL-ICML-2026.git) and our public Python package documentation can be found online: https://dtroxell19.github.io/fairness_training/.

2605.17107 2026-05-19 stat.ML cs.LG math.OC math.PR

Diffusion-Based Stochastic Operator Networks for Uncertainty Quantification in Stochastic Partial Differential Equations

基于扩散的随机算子网络用于随机偏微分方程中的不确定性量化

Phuoc-Toan Huynh, Richard Archibald, Feng Bao

AI总结 本文提出了一种新的框架,用于随机偏微分方程(SPDEs)解算子的不确定性量化。尽管SPDEs在建模具有不确定性的复杂物理系统中起着核心作用,但其实际应用通常需要指定模型不确定性的幅度和结构,而这些通常是未知且难以从噪声测量中推断出来的。为此,本文开发了一种随机算子学习框架,直接从噪声数据中学习,并输出均值解场和不确定性量化。所提出的方法,即随机算子网络(SON),通过结合深度算子网络(DeepONet)的结构与随机神经网络(SNNs)来建模随机性并实现概率预测。训练过程通过最小化一种哈密顿型损失并使用随机最大原理优化所得目标进行。在多个不确定性源下的基准SPDEs上的数值实验展示了所提出方法在捕捉解结构和量化预测不确定性方面的准确性和鲁棒性。

详情
AI中文摘要

我们介绍了一种新颖的框架,用于随机偏微分方程(SPDEs)解算子的不确定性量化。尽管SPDEs在建模具有不确定性的复杂物理系统中起着核心作用,但其实际应用通常需要指定模型不确定性的幅度和结构,而这些通常是未知且难以从噪声测量中推断出来的。为此,我们开发了一种随机算子学习框架,直接从噪声数据中学习,并输出均值解场和不确定性量化。所提出的方法,即随机算子网络(SON),是通过将深度算子网络(DeepONet)的结构与随机神经网络(SNNs)相结合来建模随机性并实现概率预测。训练过程是通过最小化一种哈密顿型损失并使用随机最大原理优化所得目标进行。在多个不确定性源下的基准SPDEs上的数值实验展示了所提出方法在捕捉解结构和量化预测不确定性方面的准确性和鲁棒性。

英文摘要

We introduce a novel framework for uncertainty quantification of solution operators associated with stochastic partial differential equations (SPDEs). Although SPDEs play a central role in modeling complex physical systems under uncertainty, their practical use typically requires specifying the magnitude and structure of model uncertainties that are often unknown and difficult to infer from noisy measurements. To address this challenge, we develop a stochastic operator-learning framework that learns directly from noisy data and outputs both a mean solution field and a quantification of uncertainty. The proposed method, namely the Stochastic Operator Network (SON), is constructed by combining the structure of the Deep Operator Network (DeepONet) with Stochastic Neural Networks (SNNs) to model stochasticity and enable probabilistic prediction. The training procedure is carried out by minimizing a Hamiltonian-type loss and optimizing the resulting objective using the Stochastic Maximum Principle. Numerical experiments on benchmark SPDEs under multiple uncertainty sources demonstrate the accuracy and robustness of the proposed method in capturing solution structure and quantifying predictive uncertainty.

2605.17086 2026-05-19 econ.GN cs.AI cs.CY q-fin.EC stat.AP

Global Automation Atlas

全球自动化图谱

Prashant Garg, Tommaso Crosta, Jasmin Baier

AI总结 本文提出了一种基于任务且因国家而异的方法,在全球范围内对自动化暴露进行分类,以区分劳动替代型与劳动增强型自动化、相关的技术渠道以及人工智能的实质性作用。该度量涵盖124个国家,生成了233万个任务-国家标签,所覆盖经济体占全球人口和GDP的99%。

详情
Comments
65 pages, 6 figures. Data and code: https://automationatlas.org/
AI中文摘要

自动化对工作中劳动内容的影响因情境而异。然而,大多数现有的暴露度量为任务或职业赋予固定分数,限制了自动化暴露的跨国比较。我们开发了一种基于任务且因国家而异的方法,在全球范围内对自动化暴露进行分类,以区分劳动替代型与劳动增强型自动化、相关的技术渠道以及人工智能的实质性作用。我们的度量涵盖124个国家,生成了一份包含233万个任务-国家标签的图谱,所覆盖经济体占全球人口和GDP的99%。我们给出五个描述性结果。第一,暴露程度高度不均,从南苏丹的3.3%的任务到中国的61.6%,并随收入显著上升,不过同一收入组内仍存在相当大的差异。第二,在各国中,受暴露的任务偏向替代而非增强,但低收入国家受替代的影响格外突出,而中等收入国家则更为异质。第三,技术上较不先进的自动化形式在低收入国家占受暴露任务的一半以上,而在高收入国家约占四分之一;其他更复杂的渠道则总体上随收入水平上升。第四,人工智能在较简单的自动化渠道中较少出现,但在较低收入环境中更多出现于劳动替代的边际上,而在较高收入环境中更多用于增强劳动。第五,我们发现女性受劳动替代型自动化的影响似乎比男性更大。我们的方法为比较不同发展阶段的自动化暴露提供了基础,可将其与跨国数据相关联,并使我们能够把暴露水平、劳动边际、技术渠道和人工智能参与作为相互独立的维度来处理。

英文摘要

Automation affects the labour content of work differently across different contexts. Yet, most existing exposure measures assign fixed scores to tasks or occupations, limiting comparisons of automation exposure across countries. We develop a task-based and country-specific approach to classify automation exposure across the world to disentangle labor-substituting from labor-augmenting automation, the relevant technology channel, and the material role of AI. Our measure spans 124 countries, generating an atlas of 2.33 million task-country labels for economies covering 99% of world population and GDP. We present five descriptive results. First, exposure is highly uneven, ranging from 3.3% of tasks in South Sudan to 61.6% in China, and rises strongly with income, although substantial variation remains within income groups. Second, across countries, exposed tasks are skewed towards substitution rather than augmentation, but low-income countries are disproportionately exposed to substitution, whereas middle-income countries are more heterogeneous. Third, less technologically advanced forms of automation account for more than half of exposed tasks in low-income countries but about one quarter in high-income countries; while other more complex channels generally rise with income levels. Fourth, AI tends to be less prevalent in simpler channels of automation, but also more prevalent in labour-substituting margins in lower income settings and to augment labour in higher income settings. Fifth, we find that females seem to be disproportionately more exposed to labour-substituting automation than males. Our methodology provides a basis for comparing automation exposure across development stages, linking it with cross-country data and allowing us to treat exposure levels, labour margins, technological channels and AI involvement as separate dimensions.

2605.17050 2026-05-19 stat.ME

Single World Intervention Graphs as Distributions: A Framework for Causal Identification

单世界干预图作为分布:因果识别的一个框架

Christian Bartels

AI总结 本文将单世界干预图(SWIGs)视为观测数据分布和干预分布的表示,并据此提出一个因果识别框架,可系统地推导由干预所定义的估计目标的识别表达式;其中后门推导与现有文献一致,而前门推导提供了一条更易推广到复杂情形的不同路径。

详情
AI中文摘要

因果推断旨在利用观测数据估计干预对结果的影响,通常借助Rubin的潜在结果框架或Pearl的do演算。沿用Richardson和Robins(2013)第9节的做法,本文将单世界干预图(SWIGs)视为观测数据分布和干预分布二者的表示,而不是通往潜在结果的桥梁。我们证明这一视角提供了一种系统的方法,用于推导由对选定变量的干预所定义的估计目标(estimand)的识别表达式。后门推导与现有文献中的推导一致,而前门推导则提供了一条更易推广到复杂情形的不同路径。在概念上,该方法与Rubin的框架和Pearl的演算既有联系又有区别。

英文摘要

Causal inference seeks to estimate the effect of an intervention on an outcome using observed data, typically via Rubin's potential-outcome framework or Pearl's do-calculus. Following section 9 of Richardson and Robins (2013), this essay treats single-world intervention graphs (SWIGs) as representations of both the observed-data distribution and the interventional distribution, rather than as a bridge to potential outcomes. We demonstrate that this perspective provides a systematic way to derive identifying expressions for estimands defined by interventions on selected variables. Back-door derivations mirror those in existing literature, while front-door derivations offer a distinct pathway that extends more readily to complex settings. Conceptually, the method is simultaneously related to and distinct from Rubin's framework and Pearl's calculus.

2605.16970 2026-05-19 math.ST stat.TH

Quantifying Dependence Between Random Vectors: A New Index with Applications

对随机向量之间依赖性的量化:一个新的指数及其应用

Chuancun Yin

AI总结 本文提出一个新的指数,用于量化随机向量之间的依赖程度。该指数在[0,1]区间内取值,且当且仅当随机向量子独立(sub-independent)时取零。子独立性是比单纯不相关更强的条件,但仍严格弱于完全独立。该指数通过特征函数构造,并可用矩给出简化表示。文章建立了其理论性质,推导了相应经验度量的计算高效的公式,研究了估计量的渐近行为,并通过在机器学习、精算科学和更新理论中的应用展示了其实用价值。

详情
Comments
31 pages
AI中文摘要

本文提出一个新的指数,用于量化随机向量之间的依赖程度。该指数在[0,1]区间内取值,且当且仅当随机向量子独立(sub-independent)时取零。与单纯的不相关不同,子独立性是一个更强的条件,但仍严格弱于完全独立。所提出的指数通过特征函数构造,并可用矩给出简化表示。我们建立了其理论性质,并为相应的经验度量推导了计算高效的公式。此外,我们研究了估计量的渐近行为,并通过在机器学习、精算科学和更新理论中的应用展示了其实用价值。

英文摘要

This article proposes a new index for quantifying the degree of dependence between random vectors. The index takes values in [0,1] and equals zero if and only if the random vectors are sub-independent. Unlike mere uncorrelatedness, sub-independence implies a stronger form of dependence while remaining strictly weaker than full independence. The proposed index is constructed via characteristic functions and admits a simplified representation in terms of moments. We establish its theoretical properties and derive a computationally efficient formula for the corresponding empirical measure. Furthermore, we investigate the asymptotic behavior of the estimator and demonstrate its practical utility through applications in machine learning, actuarial science, and renewal theory.

2605.16919 2026-05-19 stat.ML cs.LG

CAST: Causal Anchored Simplex Transport for Distribution-Valued Time Series

CAST:面向分布值时间序列的因果锚定单纯形传输

Jiecheng Lu, Jieqi Di, Runhua Wu, Yuwei Zhou

AI总结 该研究提出CAST(因果锚定单纯形传输)方法,用于分布值时间序列的因果(在线)预测,识别出潜在转移核混叠这一结构性失效模式,并在11个基准上取得了最佳平均排名。

详情
AI中文摘要

许多面向决策的随机系统是通过聚合分布而非标量轨迹来观测的:队列占用、出行份额、公共卫生混合构成、发电来源份额、生态组成和空气质量严重程度分布都位于概率单纯形上并随时间演化。我们研究这类分布值时间序列的因果(在线)预测,并主张转移算子本身应当围绕单纯形来构造。我们提出CAST(因果锚定单纯形传输),一种后继局部(successor-local)算子,它(i)从因果上下文中检索经验后继,(ii)用持续性锚点对其加以稳定,(iii)在有序支撑上施加有界的局部随机传输;每个阶段都在构造上保持单纯形。我们识别出一种结构性失效模式,即潜在转移核混叠:相似的观测分布在不同的上下文状态(regime)下以不同方式演化。我们证明,任何仅依赖于混叠摘要的预测器都会承受一个不可约的加权Jensen-Shannon超额风险下界,而CAST假设类包含能感知状态的贝叶斯后继;对于有序支撑,只要经传输的后继位于无传输锚点凸包之外,还额外成立一个Pinsker分离。在涵盖生态、能源、饮食、死亡率、就业、空气质量、恶劣天气、出行以及G/G/1、G_t/G/1队列占用的11个公开和模拟基准上,CAST在一步KL(1.27)和自回归滚动预测JSD(1.91)上均取得最佳平均排名,面对包含统计、成分数据、循环、卷积和Transformer模型的广泛基线,在每个指标上赢得11个部分中的8个,并在离线KL上于全部11个部分中位列前二。组件消融实验和一个受控的合成混叠实验印证了该理论。

英文摘要

Many decision-facing stochastic systems are observed through aggregate distributions rather than scalar trajectories: queue occupancies, mobility shares, public-health mixtures, generation-source shares, ecological compositions, and air-quality severity profiles all live on the probability simplex and evolve over time. We study causal (online) forecasting for these distribution-valued time series and argue that the transition operator itself should be structured around the simplex. We introduce CAST (Causal Anchored Simplex Transport), a successor-local operator that (i) retrieves empirical successors from causal context, (ii) stabilizes them with a persistence anchor, and (iii) applies a bounded local stochastic transport on ordered supports; every stage preserves the simplex by construction. We identify a structural failure mode, latent transition-kernel aliasing, where similar observed distributions evolve differently under different contextual regimes, and prove that any forecaster depending only on an aliased summary incurs an irreducible weighted Jensen-Shannon excess-risk lower bound, while the CAST hypothesis class contains the regime-aware Bayes successor; for ordered supports an additional Pinsker separation holds whenever the transported successor lies outside the no-transport anchor hull. On eleven public and simulated benchmarks spanning ecology, energy, diet, mortality, employment, air quality, severe weather, mobility, and G/G/1, G_t/G/1 queue occupancy, CAST attains the best average rank on both one-step KL (1.27) and autoregressive rollout JSD (1.91), winning 8/11 sections on each metric against a broad statistical, compositional, recurrent, convolutional, and Transformer baseline set, and top-2 on all 11 sections for offline KL. Component ablations and a controlled synthetic aliasing experiment corroborate the theory.
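
The claim that every stage preserves the simplex can be illustrated with a small sketch. This is not the CAST operator; it only shows, under assumed toy inputs, why a convex combination with an anchor followed by a row-stochastic transport kernel keeps a forecast on the probability simplex. The names `mix_with_anchor`, `transport`, `on_simplex` and all numbers are hypothetical.

```python
def mix_with_anchor(successor, anchor, weight):
    """Convex combination of two simplex points; weight in [0, 1] is put on the anchor."""
    return [weight * a + (1.0 - weight) * s for s, a in zip(successor, anchor)]


def transport(p, kernel):
    """Push mass p through a row-stochastic kernel: kernel[i][j] is the share of bin i sent to bin j."""
    n = len(p)
    return [sum(p[i] * kernel[i][j] for i in range(n)) for j in range(n)]


def on_simplex(p, tol=1e-9):
    """True if p is nonnegative and sums to one, up to tol."""
    return all(x >= -tol for x in p) and abs(sum(p) - 1.0) <= tol


retrieved = [0.2, 0.5, 0.3]   # hypothetical retrieved successor distribution
last_obs = [0.1, 0.6, 0.3]    # hypothetical persistence anchor (last observation)
# Banded kernel: mass only moves to neighbouring bins of the ordered support.
local_kernel = [[0.9, 0.1, 0.0], [0.05, 0.9, 0.05], [0.0, 0.1, 0.9]]

forecast = transport(mix_with_anchor(retrieved, last_obs, 0.4), local_kernel)
print(forecast, on_simplex(forecast))
```

Both steps are linear maps with nonnegative weights that sum to one, so no projection or renormalisation is needed afterwards.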

2605.16913 2026-05-19 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG math.PR

A Fourier perspective on the learning dynamics of neural networks: from sample complexities to mechanistic insights

从样本复杂性到机理洞察的神经网络学习动态的傅里叶视角

Fabiola Ricci, Claudia Merger, Sebastian Goldt

AI总结 本文从傅里叶视角研究神经网络的学习动态,将自然图像的近似平移不变性和幂律谱纳入分析,表明简单神经网络在图像分类任务中先依赖幅度信息、再利用相位信息,证明了在各向同性高维输入下仅凭相位信息分类的困难性,并说明幂律谱如何加速相位信息的学习。

详情
Journal ref
ICML 2026
AI中文摘要

用基于梯度的方法训练的神经网络表现出强烈的简单性偏好:它们先学习数据中较简单的统计特征,再转向更复杂的特征。以往对这一现象的分析主要集中于(准)各向同性输入的情形。在本文中,我们从傅里叶视角研究简单性偏好,从而能够把自然图像的两个关键特性纳入分析:近似平移不变性和幂律谱。我们首先通过实验表明,在图像分类任务上训练的简单神经网络先依赖幅度信息(与像素之间的两两相关性有关),然后才利用相位信息,后者编码了边缘和高阶相关性。有鉴于此,我们为平移不变输入引入了一个合成数据模型,它允许对幅度和相位进行精确控制,同时保持可解析处理。我们严格证明,对于各向同性的高维输入,仅凭相位信息进行分类是一项真正困难的任务:在线随机梯度下降(SGD)在 $n \ll N^3$ 步内无法将结构化输入与噪声区分开,而至少需要 $n \gg N^3 \log^2{N}$ 步。相比之下,我们从实验和理论两方面表明,即使谱本身对分类没有帮助,幂律谱也能显著加快相位信息的学习速度。对在纹理数据上训练的两层网络以及在ImageNet和CIFAR100上训练的深度卷积网络的模拟证实了幅度与相位之间这种非平凡的相互作用,为深度神经网络如何高效学习自然图像分布提供了机理层面的洞察。

英文摘要

Neural networks trained with gradient-based methods exhibit a strong simplicity bias: they learn simpler statistical features of their data before moving to more complex features. Previous analyses of this phenomenon have largely focused on settings with (quasi-)isotropic inputs. In this work, we study the simplicity bias from a Fourier perspective, which allows us to include two key features of natural images in the analysis: approximate translation-invariance and power-law spectra. We first show experimentally that simple neural networks trained on image classification tasks first rely on amplitude information -- related to pair-wise correlations between pixels -- before exploiting phase information, which encodes edges and higher-order correlations. In view of this, we introduce a synthetic data model for translation-invariant inputs that allows precise control over amplitudes and phases while remaining tractable. We rigorously establish that for isotropic and high-dimensional inputs, classification based on phase information alone is a genuinely hard task: online stochastic gradient descent (SGD) cannot distinguish the structured inputs from noise within $n \ll N^3$ steps, but needs at least $n \gg N^3 \log^2{N}$ steps. In contrast, we show both experimentally and theoretically that power-law spectra can dramatically accelerate the speed of learning phase information, even if the spectra do not help with classification. Simulations with two-layer networks trained on textures and with deep convolutional networks on ImageNet and CIFAR100 confirm this non-trivial interaction between amplitudes and phases, providing mechanistic insights into how deep neural networks can learn natural image distributions efficiently.
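
To make the amplitude/phase split concrete, here is a minimal sketch on a one-dimensional toy signal (an assumption for illustration; the paper works with images and its own synthetic data model). By the DFT shift theorem, a circular translation leaves every Fourier amplitude unchanged and only rotates the phases, which is why amplitudes carry the translation-invariant, pairwise-correlation information.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


signal = [1.0, 3.0, 0.0, -2.0, 0.5, 1.5]   # hypothetical toy signal
shifted = signal[-2:] + signal[:-2]         # circular shift by two positions

amp = [abs(c) for c in dft(signal)]
amp_shifted = [abs(c) for c in dft(shifted)]
phase = [cmath.phase(c) for c in dft(signal)]
phase_shifted = [cmath.phase(c) for c in dft(shifted)]

# Amplitudes agree up to rounding; phases at nonzero frequencies generally do not.
print(max(abs(a - b) for a, b in zip(amp, amp_shifted)))
print(phase[1], phase_shifted[1])
```

The naive O(n^2) transform is used only to keep the sketch dependency-free; an FFT routine would be the usual choice in practice.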

2605.16906 2026-05-19 math.ST stat.ME stat.TH

Differentially private hypothesis testing in survival analysis

生存分析中的差分隐私假设检验

Elly K. H. Hung, Yi Yu

AI总结 本文研究生存分析中的差分隐私假设检验,针对Cox回归系数提出了满足差分隐私的偏似然比检验和得分型检验,针对累积风险函数提出了满足差分隐私的分布式两样本检验,证明了差分隐私性、有限样本检验保证以及极小极大下界,并指出隐私代价何时在统计上可忽略、何时主导检验速率,以及半参数生存模型中最优隐私检验速率在哪些方面仍未解决。

详情
AI中文摘要

生存分析广泛用于涉及敏感个体数据的应用,但针对右删失数据的差分隐私假设检验在很大程度上仍未得到发展。我们开创了生存分析应用中隐私假设检验的有限样本理论。对于Cox回归系数,我们提出了满足差分隐私的偏似然比检验和得分型检验,其中包括用于确定拒绝阈值的隐私校准程序。对于累积风险函数,我们提出了一种满足差分隐私的分布式两样本检验。对于上述各问题,我们证明了差分隐私性和有限样本检验保证,以及极小极大下界。我们的结果指出了隐私代价何时在统计上可忽略、何时主导检验速率,以及半参数生存模型中最优隐私检验速率在哪些方面仍未解决。理论分析之外还辅以在模拟数据上的数值实验。

英文摘要

Survival analysis is widely used in applications involving sensitive individual-level data, yet differentially private hypothesis testing for right-censored data remains largely undeveloped. We initiate a finite-sample theory of private hypothesis testing in survival analysis applications. For Cox regression coefficients, we develop private partial-likelihood-ratio and score-type tests, including a private calibration procedure for the rejection threshold. For cumulative hazard functions, we propose a private distributed two-sample test. Across these problems, we prove differential privacy and finite-sample testing guarantees, as well as minimax lower bounds. Our results identify when privacy is statistically negligible, when it dominates the testing rate, and where optimal private rates for testing in semiparametric survival models remain open. This theoretical analysis is accompanied by numerical experiments on simulated data.

2605.16900 2026-05-19 stat.ME math.ST stat.TH

Splitting schemes and estimators for stochastic differential equations with Hölder multiplicative noise

具有Hölder乘性噪声的随机微分方程的分裂方案和估计器

Bowen Fang, Dario Spanò, Massimiliano Tamborrino

AI总结 本文研究具有局部Lipschitz漂移和Hölder连续乘性扩散的单变量随机微分方程的参数估计问题,提出了首个基于数值分裂格式的显式伪似然估计器;这些格式同时具有强均方收敛性和状态空间保持性,而传统的Euler-Maruyama离散化不具备这两点。模拟表明所提估计器在准确性和计算效率上均优于现有方法。

详情
Comments
53 pages, 12 figures, 2 tables
AI中文摘要

我们研究具有局部Lipschitz漂移和Hölder连续乘性扩散的单变量随机微分方程的参数估计问题,这类方程常见于多种应用。现有的推断方法通常要么依赖Euler-Maruyama离散化,尽管它缺乏强收敛性且不能保持状态空间;要么依赖近似方法,例如高斯近似或Hermite展开的截断,这会影响其稳定性和计算效率。我们提出了首个基于数值分裂格式的显式伪似然估计器,这些格式对这类SDE同时具有强均方收敛性和状态空间保持性。我们的方法基于对SDE的一种新分解,它利用了可约性和Lamperti变换,由此得到Lie-Trotter(LT)和Strang分裂格式,并进一步得到显式的伪似然以及基于它们的最大似然估计器。我们证明了强均方收敛性、状态空间保持性,以及相对于离散化步长比基于Euler-Maruyama的方法更好的稳健性。我们还建立了LT估计器的相合性和渐近正态性。由于所提出的数值格式在伪似然中将漂移参数与扩散参数耦合在一起,渐近分析需要新的证明技术。大量模拟表明,所提出的估计器在准确性和计算效率上均优于现有方法。

英文摘要

We study parameter estimation for univariate stochastic differential equations with locally Lipschitz drift and Hölder continuous multiplicative diffusion, a class commonly arising in several applications. Existing inference methods typically rely on either the Euler-Maruyama discretisation, despite its lack of strong convergence and failure to preserve the state space, or on approximations, e.g. Gaussian approximation or truncation of Hermite's expansions, impacting on their stability and computational efficiency. We introduce the first explicit pseudo-likelihood estimators based on numerical splitting schemes that are both strong mean-square convergent and state space preserving for this class of SDEs. Our approach is based on a novel decomposition of the SDE that exploits reducibility and the Lamperti transform, leading to Lie-Trotter (LT) and Strang splitting schemes yielding explicit pseudo-likelihoods and maximum likelihood estimators based on them. We prove strong mean-square convergence, state space preservation, and improved robustness with respect to the discretisation step compared to Euler-Maruyama-based methods. We further establish consistency and asymptotic normality of the LT estimator. Because the proposed numerical scheme couples drift and diffusion parameters in the pseudo-likelihood, the asymptotic analysis requires new proof techniques. Extensive simulations demonstrate that the proposed estimators outperform existing methods in both accuracy and computational efficiency.
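
For readers unfamiliar with the Lamperti transform mentioned above, a worked example may help. The Cox-Ingersoll-Ross (CIR) model is used here only as a familiar instance of an SDE with Hölder-1/2 multiplicative noise; whether and how the paper treats this model is not stated in the abstract, so the choice of example and the symbols $\kappa$, $\theta$, $\sigma$ are assumptions. Applying Itô's formula to $Y_t = 2\sqrt{X_t}/\sigma$ turns the state-dependent diffusion into unit diffusion:

```latex
\begin{aligned}
dX_t &= \kappa(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t,
\qquad Y_t = \frac{2\sqrt{X_t}}{\sigma},\\[4pt]
dY_t &= \left[\left(\frac{2\kappa\theta}{\sigma^{2}} - \frac{1}{2}\right)\frac{1}{Y_t}
        - \frac{\kappa}{2}\,Y_t\right]dt + dW_t .
\end{aligned}
```

The transformed drift separates into a singular part proportional to $1/Y_t$ and a linear part, which is the kind of structure a splitting scheme can exploit by treating the pieces as separately solvable subproblems.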

2605.16885 2026-05-19 stat.AP stat.ME

A Workflow for Evaluating Regional Treatment Effect Heterogeneity in Multi-Regional Clinical Trials

多地区临床试验中区域治疗效应异质性评估的工作流程

Cong Zhang, Meihua Long, Tianyu Zheng, Konstantinos Sechidis, Xiaoni Liu, Sophie Sun, Yao Chen, Xinyi Zhang, Shuhei Kaneko, Björn Bornkamp, Yan Hou

AI总结 本文提出了一种结构化、问题导向的框架,用于指导多地区临床试验中区域异质性的探索性评估,通过四个关键问题明确分析目标,并提出相应的统计方法来解决这些问题。

详情
AI中文摘要

多地区临床试验(MRCTs)通过在单一协议中评估不同地区内的治疗效应,实现了高效的全球药物开发。虽然MRCTs在总体疗效方面具有统计效力,但通常并不设计用于提供关于地区差异的确认证据,因此对观察到的地区异质性的评估大多是探索性的,并容易受到采样变异性的影响。尽管存在这一挑战,理解地区异质性对于解释和监管决策仍然很重要。本文提出了一种结构化、问题导向的框架,用于指导MRCTs中区域异质性的探索性评估。我们提出了四个关键问题以明确此类分析的目标,并提出了一组统计方法来解决这些问题。模拟研究评估了在无异质性和由观察到或未观察到的治疗效应修饰因素驱动的异质性场景下的性能,展示了结构化方法如何支持透明且谨慎的解释。

英文摘要

Multi-regional clinical trials (MRCTs) enable efficient global drug development by assessing treatment effects across regions within a single protocol. While powered for overall efficacy, MRCTs are typically not designed to provide confirmatory evidence on regional differences, making an assessment of observed regional heterogeneity largely exploratory and susceptible to sampling variability. Despite this challenge, understanding regional heterogeneity remains important for interpretation and regulatory decision-making. This paper proposes a structured, question-driven framework to guide exploratory assessments of regional heterogeneity in MRCTs. We formulate four key questions to clarify the objectives of such analyses and propose a set of statistical methods to address them. Simulation studies evaluate performance under scenarios with no heterogeneity and heterogeneity driven by observed or unobserved treatment effect modifiers, illustrating how a structured approach can support transparent and cautious interpretation.

2605.16836 2026-05-19 stat.ML cs.LG

HYVINT: Intensity-Driven Hypergraph Generation with Variational Representations

HYVINT: 基于变分表示的强度驱动超图生成

Xinyi Hong, Shuntuo Xu, Zhou Yu

AI总结 本文提出HYVINT框架,通过强度驱动的超图生成机制和变分估计器,解决超图生成中节点-超边关系的建模问题,实现高保真且具有多样性的生成。

详情
AI中文摘要

超图为多元(polyadic)交互的建模提供了一个有原则的框架,应用于推荐系统、社交网络和分子建模等领域。超图生成仍然具有挑战性,因为关联(incidence)结构是离散、稀疏的,并受异质的高阶交互支配。现有的生成器通常依赖隐式潜在空间或连续的关联解码器,对节点-超边关联如何产生只能给出有限的机理解释。为了解决这些局限,我们提出HYVINT,一种强度驱动的超图生成框架。我们的关键创新有两点:(i)我们为超图提出了一种强度驱动的关联形成机制,将潜在交互强度与二值关联联系起来;(ii)我们推导出一个易于处理的变分下界估计器,用于学习潜在表示。我们给出了带有渐近收敛速率的生成误差界,并通过实验表明,HYVINT在合成和真实超图上实现了较高的保真度,同时保持了可观的新颖性和多样性。

英文摘要

Hypergraphs provide a principled framework for modeling polyadic interactions, with applications in recommendation systems, social networks, and molecular modeling. Hypergraph generation remains challenging because incidence structures are discrete, sparse, and governed by heterogeneous higher-order interactions. Existing generators often rely on implicit latent spaces or continuous incidence decoders, which provide limited mechanistic interpretation of how node-hyperedge incidences arise. To address these limitations, we propose HYVINT, an intensity-driven hypergraph generative framework. Our key innovations are twofold: (i) we develop an intensity-driven incidence formation mechanism for hypergraphs that links latent interaction strength to binary incidence, and (ii) we derive a tractable lower-bound variational estimator for learning latent representations. We provide generation error bounds with asymptotic convergence rates and empirically show that HYVINT achieves strong fidelity while maintaining substantial novelty and diversity on synthetic and real-world hypergraphs.

2605.16828 2026-05-19 stat.ML cs.AI cs.LG stat.ME

Prediction-Intervention Games and Invariant Sets

预测-干预博弈与不变集

Linus Kühne, Felix Schur, Jonas Peters

AI总结 本文研究预测-干预博弈中领导者如何选择预测函数以应对跟随者的干预,证明了对两类常见的跟随者目标,基于稳定毯的预测器总是优于或不差于基于因果父节点的预测器,并讨论了图已知与图未知情形下的实用策略。

详情
AI中文摘要

我们考虑如下的双人博弈:领导者利用观测数据,为响应变量 $Y$ 选择一个基于给定协变量的预测函数;随后跟随者在底层的结构因果模型中对部分协变量进行干预,以最大化自身的目标。领导者知道干预的目标变量,但对跟随者的目标函数可能了解有限。我们称这种设定为预测-干预博弈,它是Stackelberg博弈的一个特例。为领导者找到最优策略通常很困难。为避免严重的性能损失,领导者可以基于 $Y$ 的因果父节点,或更一般地基于协变量的某个不变子集进行预测。我们证明,对于两类常见的跟随者目标,基于稳定毯(stable blanket,一种特定的不变子集)的预测器总是优于或不差于基于因果父节点的预测器。我们进一步用所有允许干预下的最坏情况风险给出领导者干预后风险的上界,并加强了现有的分布泛化结果来分析这一上界:我们给出了稳定毯预测器达到最坏情况最优的充分条件,并通过例子说明这些条件一般不能去掉。最后,我们讨论了图已知和图未知两种情形下的实用策略,并在模拟数据和真实数据上对其进行了检验。

英文摘要

We consider the following two-player game: using observational data, the leader chooses a prediction function for a response variable $Y$ from given covariates. The follower then reacts with an intervention on some covariates in the underlying structural causal model to maximize their own objective. The leader knows the intervention targets, but may have limited knowledge of the follower's objective. We call this setup a prediction-intervention game, a special case of a Stackelberg game. Finding an optimal strategy for the leader is generally difficult. To avoid severe performance loss, the leader may base their prediction on the causal parents of $Y$, or more generally on an invariant subset of covariates. We prove, for two common classes of follower objectives, that predictors based on the stable blanket, a specific invariant subset, are always better or as good as those based on the causal parents. We further upper bound the leader's post-intervention risk by a worst-case risk over allowed interventions and strengthen existing distribution generalization results to analyze this bound: we give sufficient conditions under which stable-blanket predictors are worst-case optimal, and show by examples that these conditions cannot in general be dropped. Finally, we discuss practical strategies for settings with known and unknown graph, and test them on simulated and real-world data.

2605.16804 2026-05-19 stat.AP

Multi-resolution Spatial Graphical Regression Models for Hierarchical Spatial Transcriptomics Data

多分辨率空间图回归模型用于分层空间转录组数据

Liying Chen, Satwik Acharyya, Allison M. May, Aaron M. Udager, Evan T. Keller, Veerabhadran Baladandayuthapani

AI总结 本文提出了一种基于多分辨率空间图回归的贝叶斯框架,用于从多分辨率空间转录组数据中推断空间变化的基因网络,通过引入空间结构的边选择策略和高斯过程先验,提高了对空间变化的建模能力,并在模拟研究和肾癌数据中展示了改进的网络结构恢复和肿瘤梯度中hub基因的识别。

详情
AI中文摘要

空间转录组(ST)技术的进步使得对肿瘤微环境、肿瘤梯度和基因调控网络进行系统的分子表征成为可能。已知癌症进展会沿病理梯度变化,但现有的基因网络推断方法通常忽略了肿瘤内部的分层空间组织。我们提出了一种贝叶斯多分辨率空间图回归(mSGR)框架,用于从多分辨率ST数据中推断随空间变化的基因网络。所提出的模型允许精度矩阵在分层结构的空间域之间变化,从而同时捕捉肿瘤内的局部和全局组织。为了识别随空间变化的调控关系,我们引入了一种具有空间结构的边选择策略,它依据空间邻近性和病理梯度在各区域之间借用信息,同时用高斯过程先验灵活地刻画边强度的空间变化。通过带有逐节点并行回归的增广平均场变分贝叶斯算法实现了可扩展的推断,从而能在高维情形下高效估计。模拟研究表明,与竞争方法相比,该方法能更好地恢复网络结构。将mSGR应用于肾癌的多分辨率ST数据,发现上皮-间质转化通路在过渡区域具有更强的调控连接,并识别出沿肿瘤梯度分布的枢纽(hub)基因,说明空间分辨的网络分析能够为理解肿瘤微环境的组织方式提供关键见解。

英文摘要

Advances in spatial transcriptomics (ST) technologies enable systematic molecular characterization of tumor microenvironment, tumor gradients and gene regulatory networks. Cancer progression is known to vary along pathological gradients, yet existing network approaches for gene network inference typically ignore hierarchical spatial organization across the tumor. We develop a Bayesian multi-resolution spatial graphical regression (mSGR) framework to infer spatially varying gene networks from multi-resolution ST data. The proposed model allows precision matrices to vary across hierarchically structured spatial domains, capturing both local and global organization within the tumor. To identify spatially varying regulatory relationships, we introduce a spatially structured edge selection strategy that borrows strength across regions according to spatial proximity and pathological gradients, while Gaussian-process priors flexibly model spatial variation in edge strengths. Scalable inference is achieved through an augmented mean-field variational Bayes algorithm with node-wise parallel regressions, enabling efficient estimation in high-dimensional settings. Simulation studies demonstrate improved recovery of network structures compared with competing approaches. Applying mSGR to multi-resolution ST data from kidney cancer reveals stronger regulatory connectivity in transitional regions of epithelial-mesenchymal transition pathway and identifies hub genes along the tumor gradient, illustrating how spatially resolved network analysis can provide key insights into tumor microenvironment organization.

2605.16757 2026-05-19 cs.AI cs.MA stat.ME stat.ML

NeuroMAS: Multi-Agent Systems as Neural Networks with Joint Reinforcement Learning

NeuroMAS:将多智能体系统视为神经网络并进行联合强化学习

Haoran Lu, Luyang Fang, Wenxuan Zhong, Ping Ma

AI总结 本文提出NeuroMAS,一种将多智能体系统视为可训练和可扩展的神经网络架构的方法,通过联合强化学习提升多智能体系统的性能和可扩展性。

详情
AI中文摘要

多智能体语言系统通常被构建为人工设计的工作流,其中智能体被赋予语义角色,通信协议也事先指定。我们提出NeuroMAS,这是首个将多智能体语言系统视为可训练、可扩展的类神经网络架构的方法,其中LLM智能体作为节点,中间文本信号作为边。在NeuroMAS中,智能体节点没有预设角色,但能感知结构:拓扑只决定信息总体上可以如何流动,而强化学习训练决定节点如何通信、分工和协调。这种表述将多智能体设计从工作流工程转向架构设计,其中深度、宽度、连接方式和增长方案成为可扩展的能力来源。此外,我们提供了一个理论视角,说明当任务允许层次化分解时,这种模块化文本计算为何具有更高的参数效率。实验表明,NeuroMAS相较于推理时多智能体基线和经过训练的多智能体基线均有显著提升。我们还发现,组织规模的扩展具有路径依赖性:较大的系统从头训练可能很困难,但从较小的已训练系统逐步生长而来时则变得可行。这些结果表明,可学习的神经多智能体系统是LLM一个有前景的扩展方向。

英文摘要

Multi-agent language systems are often built as hand-designed workflows, where agents are assigned semantic roles and communication protocols are specified in advance. We propose NeuroMAS, a method that first treats a multi-agent language system as a trainable and scalable neural-network-like architecture with LLM agents as nodes and intermediate textual signals as edges. In NeuroMAS, agent nodes are role-free but structure-aware: the topology only determines how information can flow in general, while reinforcement learning training determines how nodes communicate, specialize, and coordinate. This formulation shifts multi-agent design from workflow engineering toward architecture design, where depth, width, connectivity, and growth protocol become scalable sources of capability. Further, we provide a theoretical perspective showing why such modular textual computation is more parameter-efficient when tasks admit hierarchical decompositions. Experiments show that NeuroMAS improves significantly over both inference-time and trained multi-agent baselines. We further find that organizational scaling is path-dependent: larger systems can be challenging to train from scratch, but become feasible when grown progressively from smaller trained systems. These results suggest that learned neural multi-agent systems are a promising scaling axis for LLMs.

2605.16747 2026-05-19 cs.LG math.AP math.OC math.PR math.ST stat.TH

Propagation of Chaos in Contextual Flow Maps

上下文流映射中的混沌传播

Shi Chen, Zhengjiang Lin, Kaizhao Liu, Philippe Rigollet

AI总结 本文采用上下文流映射(CFMs)这一抽象,建立了大上下文情形下transformer的定量统计理论。利用动力学的McKean-Vlasov结构和混沌传播的经典工具,文章给出了沿深度一致的前向界和沿在线梯度下降迭代一致的后向界;这两个界对一般的CFM达到最优的Wasserstein速率 n^{-1/d},对包含transformer在内的一类受限CFM达到参数速率 n^{-1/2}。

详情
Comments
31 pages, 1 figure
AI中文摘要

我们采用上下文流映射(CFMs)这一抽象,建立了大上下文情形下transformer的定量统计理论。CFM是一类动力系统,它在上下文测度存在的条件下,沿一叠注意力块演化一个特定的token。在此框架下,有限上下文模型近似于一个理想化的无限上下文系统,后者的上下文测度被其底层总体分布所取代,因此上下文长度 $n$ 成为一种统计资源。利用动力学的McKean-Vlasov结构和混沌传播的经典工具,我们建立了一个前向界,沿深度一致地控制有限上下文与无限上下文CFM之间的偏差;以及一个后向界,在在线梯度下降的各次迭代中一致地控制相应训练轨迹之间的偏差。对于一般的CFM,这两个界均达到最优的Wasserstein速率 $n^{-1/d}$;对于一类受限的CFM(transformer是其特例),则达到参数速率 $n^{-1/2}$。该分析依赖于损失梯度的一种新的欧拉型伴随表述,以及对由此得到的前向-伴随系统的稳定性估计,二者可能都具有独立的意义。

英文摘要

We develop a quantitative statistical theory of transformers in the large-context regime by adopting the abstraction of contextual flow maps (CFMs): dynamical systems that evolve a distinguished token in the presence of a contextual measure across a stack of attention blocks. Within this framework, the finite-context model approximates an idealized infinite-context system in which the contextual measure is replaced by its underlying population, so that the context length $n$ becomes a statistical resource. Exploiting the McKean--Vlasov structure of the dynamics and the classical machinery of propagation of chaos, we establish a forward bound controlling the deviation between the finite- and infinite-context CFMs uniformly along depth, and a backward bound controlling the deviation between the corresponding training trajectories uniformly across iterations of online gradient descent. Both bounds achieve the optimal Wasserstein rate $n^{-1/d}$ for general CFMs and parametric rate $n^{-1/2}$ for a restricted class of CFMs that includes transformers as a special case. The analysis rests on a new Eulerian adjoint formulation of the loss gradient and stability estimates for the resulting forward--adjoint system, both of which may be of independent interest.

2605.16742 2026-05-19 cs.CV stat.ME

Diffeomorphic Cortical Alignment via Direct Warping of Streamline Endpoints

通过直接形变流线端点实现微分同胚皮层对齐

Yang Xiang, Martin Cole, Zhengwu Zhang

AI总结 本文提出了一种基于连接性的皮层对齐方法,通过直接操作白质纤维束端点来对齐皮层表面,以提高纤维束层面的对应性,并在主要纤维束上实现更高的连接性重叠系数和更强的鲁棒性。

详情
AI中文摘要

皮层表面配准通常由局部几何描述子(例如脑沟深度和曲率)驱动。这种方法虽然实现了几何上的对应,却忽略了白质解剖结构所施加的长程连接约束。扩散磁共振纤维束成像提供了这些关键约束;然而,以往利用连接性信息的流程通常是对预先计算好的连接矩阵进行对齐,使得优化对连接性估计及其分辨率高度敏感。在本文中,我们提出了一种新的基于连接性的表面配准方法,通过直接作用于白质纤维束端点来对齐皮层表面。我们将纤维束端点建模为乘积流形 $Ω\times Ω$ 上的点云,其中 $Ω$ 表示膨胀后皮层半球的球面域。我们的对齐方法迭代地(i)通过最小化连接性失配为 $Ω$ 计算一个小的微分同胚形变,并(ii)根据该形变更新端点。该方法依托一个保证输出形变为微分同胚的几何框架,其最终目标是优化已知纤维束的匹配。在人类连接组计划(HCP)数据上的实验表明,与ENCORE和MSMAll等最先进方法相比,该方法改善了纤维束层面的对应,在主要纤维束上取得了更高的连接性层面重叠系数,并且对 $Ω$ 的不同网格分辨率具有更强的鲁棒性。

英文摘要

Cortical surface registration is often driven by local geometric descriptors (e.g., sulcal depth and curvature). While this approach achieves geometric correspondence, it neglects the long-range wiring constraints imposed by white-matter anatomy. Diffusion MRI tractography offers these crucial constraints; however, prior connectivity-informed pipelines typically align precomputed connectivity matrices, making the optimization highly sensitive to connectivity estimation and its resolution. In this paper, we introduce a novel connectivity-based surface registration method that aligns cortical surfaces by operating directly on white-matter fiber-tract endpoints. We model tract endpoints as a point cloud on the product manifold $Ω\times Ω$, where $Ω$ represents the spherical domain of the inflated cortical hemispheres. Our alignment method iteratively (i) computes a small diffeomorphic warp for $Ω$ by minimizing connectivity mismatch, and (ii) updates the endpoints based on this warp. The method relies on a geometric framework that ensures output warps are diffeomorphisms and has a final goal that optimizes the matching of well-known fiber bundles. Experiments on Human Connectome Project (HCP) data demonstrate improved tract-level correspondence, achieving higher connectivity-level overlap coefficients on major fiber bundles and stronger robustness across grid resolutions for $Ω$ compared to state-of-the-art methods such as ENCORE and MSMAll.

2605.16733 2026-05-19 math.PR math.ST stat.TH

Concentration Inequalities for Sample Cross-Covariances

样本交叉协方差的集中不等式

Jiaheng Chen, Daniel Sanz-Alonso

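AI summary: This paper studies concentration inequalities and expectation bounds for the deviation of a sample cross-covariance matrix from its mean. For sub-Gaussian random vectors, it proves a high-probability operator-norm bound governed by the effective ranks of the two marginal covariance matrices; in the Gaussian case, it proves a matching lower bound in expectation, allowing arbitrary correlation between the two random vectors.

Details
Comments
13 pages
AI Chinese abstract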

本文建立了样本交叉协方差矩阵偏离其均值的精确维数无关的集中不等式和期望界。对于子高斯随机向量,我们证明了一个由两个边缘协方差矩阵有效秩决定的高概率算子范数界。在高斯情况下,我们证明了一个匹配的期望下界,允许两个随机向量之间任意的相关性。

English abstract

This paper establishes sharp dimension-free concentration and expectation bounds for the deviation of a sample cross-covariance matrix from its mean. For sub-Gaussian random vectors, we prove a high-probability operator-norm bound governed by the effective ranks of the two marginal covariance matrices. In the Gaussian case, we prove a matching expectation lower bound, allowing arbitrary correlation between the two random vectors.
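
To make the quantities named in this abstract concrete, the following minimal Python sketch computes a sample cross-covariance matrix, its operator-norm deviation from the true cross-covariance, and the effective ranks of the two marginal covariances. It assumes the common definition of effective rank, tr(Σ)/‖Σ‖, and a synthetic Gaussian model; the function names and every parameter are illustrative and not taken from the paper, and the sketch does not reproduce the paper's bound.

```python
import numpy as np

def effective_rank(cov):
    """Effective rank tr(cov) / ||cov||_op (assumed definition)."""
    return np.trace(cov) / np.linalg.norm(cov, 2)

def sample_cross_covariance(x, y):
    """Centered sample cross-covariance of paired rows of x (n, dx) and y (n, dy)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    return xc.T @ yc / x.shape[0]

rng = np.random.default_rng(0)
n, dx, dy = 2000, 30, 20

# Hypothetical marginal spectra with fast decay, so effective ranks stay far below dx and dy.
var_x = 1.0 / np.arange(1, dx + 1)
var_e = 1.0 / np.arange(1, dy + 1)
x = rng.standard_normal((n, dx)) * np.sqrt(var_x)
y = 0.5 * x[:, :dy] + rng.standard_normal((n, dy)) * np.sqrt(var_e)

cov_x = np.diag(var_x)
cov_y = np.diag(0.25 * var_x[:dy] + var_e)
true_cross = 0.5 * cov_x[:, :dy]  # Cov(x, y) for this synthetic model

deviation = np.linalg.norm(sample_cross_covariance(x, y) - true_cross, 2)
print("effective ranks:", effective_rank(cov_x), effective_rank(cov_y))
print("operator-norm deviation:", deviation)
```

Rerunning with larger ambient dimensions but the same decaying spectra is one way to see informally why a dimension-free bound stated in terms of effective ranks is the natural object here.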

2605.16708 2026-05-19 cs.LG stat.ML

Isolating Nonlinear Independent Sources in fMRI with $β$-TCVAE Models

利用β-TCVAE模型在fMRI中分离非线性独立源

Qiang Li, Shujian Yu, Jesus Malo, Jingyu Liu, Tülay Adali, Vince D. Calhoun

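AI summary: This paper applies a β-TCVAE model to nonlinear fMRI data to separate mixed spatial and temporal brain signals, recovers biologically meaningful nonlinear spatial components, and assesses the interpretability of the latent structure through functional network connectivity.

Details
Comments
6 pages, 2 figures
AI Chinese abstract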

从非线性fMRI数据中学习有意义的潜在表示仍然是神经影像分析中的基本挑战。传统独立成分分析(ICA)因其能估计可解释的功能脑网络而被广泛使用,但其依赖于线性混合假设,限制了其捕捉大脑动态内在非线性和复杂组织的能力。近年来,深度表示学习方法作为非线性潜在结构建模的有希望替代方案出现。然而,许多方法主要在模拟数据集或自然图像基准上评估,对真实世界神经影像数据如fMRI的验证相对有限。本文受β-TCVAE(总相关变分自编码器)的启发,这是β-VAE框架的改进,用于学习潜在表示而不引入额外超参数。我们调整并修改该模型以适应fMRI数据,旨在分离混合的空间和时间脑信号为可解释的成分。我们证明β-TCVAE框架可以恢复具有生物学意义的非线性空间成分,包括已建立的内在连接网络如默认模式网络。此外,我们通过功能网络连接性评估学习的表示,显示潜在结构捕捉了连贯且可解释的大脑组织模式。本研究提供了一项将非线性表示学习与fMRI分析连接的初步调查。

English abstract

Learning meaningful latent representations from nonlinear fMRI data remains a fundamental challenge in neuroimaging analysis. Traditional independent component analysis, widely used due to its ability to estimate interpretable functional brain networks, relies on a linear mixing assumption for latent sources, limiting its ability to capture the inherently nonlinear and complex organization of brain dynamics. More recently, deep representation learning methods have emerged as promising alternatives for modeling nonlinear latent structure. However, many of these approaches have been evaluated primarily on simulated datasets or natural image benchmarks, with comparatively limited validation on real-world neuroimaging data such as fMRI. In this work, we are motivated by the $β$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the $β$-VAE framework for learning latent representations without introducing additional hyperparameters during training. We adapt and modify this model to fMRI data for nonlinear source disentanglement, aiming to separate mixed spatial and temporal brain signals into interpretable components. We show that the $β$-TCVAE framework can recover meaningful nonlinear spatial components with biological relevance, including well-established intrinsic connectivity networks such as the default mode network. Furthermore, we evaluate the learned representations using functional network connectivity, showing that the latent structure captures coherent and interpretable brain organization patterns. This study provides a pilot investigation that bridges nonlinear representation learning and fMRI analysis.

2605.16699 2026-05-19 cs.LG q-fin.RM stat.ML

Your SaaS Is an Insurance Product: A Modeling Framework

你的 SaaS 是一种保险产品:一种建模框架

Caio Gomes

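AI summary: This paper treats capped-usage SaaS products as insurance products and proposes a SaaS pricing framework built on frequency-severity decomposition, premium calculation principles, and Monte Carlo reserve adequacy.

Details
Comments
23 pages, 2 figures, 7 tables. Companion code archived at DOI 10.5281/zenodo.20213155
AI Chinese abstract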

Capped-usage SaaS 产品——如 Claude Code 和 ChatGPT 等大语言模型订阅、Vercel 和 Cloudflare Workers 等云平台、企业福利平台、具有责任转移的身份验证服务——与保险产品有相同的结构性特征:固定保费与实际消费解耦、用户层面的随机需求具有厚尾严重性、非同质的上限在固定时间表重置、以及需要在尾部风险下具备充足储备的组合层面暴露。我们主张这不是类比,而是 actuarial science 已经几十年来试图解决的问题,用新的依赖变量(如 tokens、带宽字节、函数调用、健身房打卡)替代医疗索赔。本文提出一个基于频率-严重性分解、保费计算原理和蒙特卡洛储备充足性的建模框架,将其映射到两个领域(LLM 服务和云平台)的公开可观察的订阅层级,基于经典的健康保险经济学(Arrow 1963; Pauly 1968; Manning 等 1987; Brot-Goldberg 等 2017),并通过一个工作示例展示与传统单位经济的差异。贡献是操作性的而非理论性的:不是新的定理,而是目前缺失于 cs.LG/stat.ML 实践中的词汇和工具。

English abstract

Capped-usage SaaS products -- LLM subscriptions such as Claude Code and ChatGPT, cloud platforms such as Vercel and Cloudflare Workers, corporate benefit platforms, identity-verification services with liability transfer -- share a structural signature with insurance products: a fixed premium decoupled from realized consumption, stochastic per-user demand with heavy-tailed severity, a non-fungible cap that resets on a fixed schedule, and a portfolio-level exposure that requires reserve adequacy under tail risk. We argue that this is not an analogy. It is the same operational problem actuarial science has been tooled for decades to address, restated with new dependent variables (tokens, bandwidth bytes, function-invocations, gym check-ins) in place of medical claims. This paper proposes a modeling framework for capped-usage SaaS pricing built from frequency-severity decomposition, premium calculation principles, and Monte Carlo reserve adequacy. We map the framework to publicly observable subscription tiers in two domains (LLM services and cloud platforms), ground it in canonical health-insurance economics (Arrow 1963; Pauly 1968; Manning et al. 1987; Brot-Goldberg et al. 2017), and demonstrate divergence from traditional unit economics through a worked example. The contribution is operational rather than theoretical: not a new theorem, but vocabulary and tools currently absent from cs.LG/stat.ML practice.
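
The three ingredients named in this abstract (frequency-severity decomposition, a premium principle, and Monte Carlo reserve adequacy) can be sketched in a few lines of Python. This is a minimal illustration under assumed choices, not the paper's companion code: Poisson frequency, lognormal severity, the expected-value premium principle with a 20% loading, a 99.5% reserve quantile, and all numeric parameters are hypothetical.

```python
import numpy as np

def simulate_portfolio_cost(n_users, n_sims, freq_mean, sev_mu, sev_sigma, cap, rng):
    """Total capped cost of a user portfolio, one value per simulated billing period."""
    totals = np.empty(n_sims)
    for s in range(n_sims):
        counts = rng.poisson(freq_mean, size=n_users)                  # frequency: events per user
        severities = rng.lognormal(sev_mu, sev_sigma, size=counts.sum())  # severity: cost per event
        user_ids = np.repeat(np.arange(n_users), counts)
        usage = np.bincount(user_ids, weights=severities, minlength=n_users)
        totals[s] = np.minimum(usage, cap).sum()                       # the cap limits per-user exposure
    return totals

rng = np.random.default_rng(1)
n_users, cap = 200, 50.0
totals = simulate_portfolio_cost(n_users, 1000, freq_mean=10, sev_mu=0.0,
                                 sev_sigma=1.0, cap=cap, rng=rng)

expected_cost_per_user = totals.mean() / n_users
premium = 1.2 * expected_cost_per_user               # expected-value principle, 20% loading (assumed)
reserve = max(0.0, np.quantile(totals, 0.995) - n_users * premium)

print("premium per user:", premium)
print("reserve needed at the 99.5% level:", reserve)
```

Comparing `premium` with a flat average cost per user is one way to see how a tail-aware view can diverge from traditional unit economics.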

2605.16652 2026-05-19 stat.ME

Semiparametric Regression for Misclassified Competing Risks Data

半参数回归用于误分类竞争风险数据

Theofanis Balanos, Constantin T. Yiannoutsos, Felix M. Pabon-Rodriguez, Hongmei Nan, Giorgos Bakoyannis

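AI summary: This paper proposes a semiparametric regression method for misclassified competing risks data when no internal validation sample is available; it uses misclassification probabilities estimated from an external validation study and improves estimation efficiency.

Details
Comments
Original Article, Biostatistics - Survival Analysis, 2 figures
AI Chinese abstract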

竞争风险数据的分析常因失败原因的误分类而复杂化。此问题可能导致严重偏倚和无效结论。一种处理误分类的方法是在部分非右删截参与者中使用金标准失败原因确定程序(内部验证样本)以及处理缺失金标准确定的缺失数据方法。然而,这种方法成本高且耗时,无法在许多研究中实施。本文提出了一种半参数回归分析方法,用于无内部验证样本的情况。我们的方法利用外部验证研究中估计的误分类概率来调整当前研究中的误分类。这些概率被纳入基于B样条的筛伪似然函数中,通过最大化该函数联合估计所有事件类型的模型。利用经验过程理论,我们证明了所提估计器的一致性。广泛的模拟实验显示,该方法在现实样本量下表现良好,比之前的方法提供了更高效的估计。该方法应用于一项大型HIV观察研究的竞争风险数据,其中事件类型因重大死亡漏报而被误分类。

English abstract

The analysis of competing risks data is often complicated by misclassification of the cause of failure. This issue can lead to seriously biased estimates and invalid conclusions. One way to deal with such misclassification is to use a gold-standard cause of failure ascertainment procedure in a subset of the non-right-censored participants (internal validation sample) along with methods for missing data to deal with the missing gold-standard ascertainments. However, this approach can be costly and time-consuming and, therefore, cannot be implemented in many studies. In this work, we propose a semiparametric regression analysis methodology for the case where no internal validation sample exists. Our approach leverages estimates of the misclassification probabilities from an external validation study to adjust for misclassification in the study at hand. These probabilities are incorporated in a B-spline-based sieve pseudo-likelihood function, which is maximized to jointly estimate models for all event types. Using empirical process theory, we show that the proposed estimator is consistent. Extensive simulation experiments demonstrate that the method performs well with realistic sample sizes and provides substantially more efficient estimates compared to previously proposed approaches. The methodology is applied to competing risks data from a large HIV observational study in sub-Saharan Africa, where event type is misclassified due to significant death under-reporting.

2605.16645 2026-05-19 math.ST cs.IT cs.LG math.IT stat.ML stat.TH

Statistical Unlearning of Distributions: A Hypothesis Testing Approach

分布统计遗忘:一种假设检验方法

Aaradhya Pandey, Sanjeev Kulkarni

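AI summary: This paper proposes a statistical framework for distributional unlearning that uses hypothesis testing to select samples for removal, reducing the influence of an unwanted distribution while preserving performance on a desired one, and characterizes the region of allowable edited data distributions and the Pareto frontier.

Details
Comments
Comments welcome
AI Chinese abstract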

机器学习系统越来越多地面临要求遗忘不仅单个数据点,还包括整个信息领域的需求,例如有毒语言、受版权保护的语料库或人口统计数据偏见。这提出了统计-计算权衡的根本困境:移除所有不需要领域的样本可能是计算上不可行的,而随机移除一部分可能无法提供分布层面的统计保证。我们提出了一种分布遗忘的统计框架,其中领域被建模为概率分布,目标是移除精心选择的样本子集,以减少不需要分布的影响,同时保持所需分布的性能。我们通过假设检验编辑数据与所需和不需要的领域,从而得到可解释且稳健的样本移除标准。在该统计框架中,我们表征了允许的编辑数据分布区域以及广泛分布族的移除-保留帕累托前沿。这包括参数族如任意维度的位移高斯分布、一维位置族带有对数凹噪声以及一维泊松族。它还包含非参数族,如高斯白噪声模型,这是非参数回归的通用模型。我们证明了组合规则,描述了分布遗忘在多模式不需要领域中的行为,并引入了当组合大量此类族时移除-保留基线的中心极限行为。最后,我们通过提供某些选择算法的帕累托前沿来提供有限样本保证,并观察到信息-计算差距。

English abstract

Machine learning systems increasingly face requirements to forget not only individual data points, but entire domains of information, such as toxic language, copyrighted corpora, or demographic biases. This raises a fundamental dilemma of statistical-computational tradeoffs: removing all samples from an unwanted domain may be computationally prohibitive, while randomly removing a subset may not provide distribution-level statistical guarantees. We propose a statistical framework for distributional unlearning, in which domains are modeled as probability distributions, and the goal is to remove a carefully chosen subset of samples that reduces the effect of an unwanted distribution while preserving performance on a desired one. We formalize this using a hypothesis test of the edited data with the desired and unwanted domains, leading to an interpretable and robust criterion for selecting samples to remove. Within this statistical framework, we characterize the fundamental region of the allowable edited data distributions and the removal-preservation Pareto frontier for a broad class of distribution families. This includes parametric families such as shifted Gaussians of arbitrary dimension, a one-dimensional location family with log-concave noise, and the one-dimensional Poisson family. It also includes nonparametric families such as the Gaussian white noise model, a canonical model for nonparametric regression. We prove composition rules that describe how distributional unlearning behaves across multimodal unwanted domains, and introduce a central-limit behavior for the removal-preservation baselines when composing a large number of such families. Finally, we provide finite sample guarantees by providing Pareto frontiers for some selection algorithms, and observe an information-computation gap.

2605.16644 2026-05-19 eess.SY cs.LG cs.SY math.OC stat.ML

The Score Kalman Filter

分数卡尔曼滤波器

Kaito Iwasaki, Anthony Bloch, Taeyoung Lee, Maani Ghaffari

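AI summary: This paper proposes the Score Kalman Filter, which combines score matching with Stein's identity to avoid computing the partition function, enabling efficient filtering for nonlinear systems in higher dimensions.

Details
Comments
56 pages, 27 figures
AI Chinese abstract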

非线性贝叶斯滤波的核心难题在于表示信念分布。基于矩的滤波器通过传播多项式矩并从它们中重建密度来解决这一问题。最近的工作通过最大熵原理完成预测-更新循环,但每一步都需要分区函数及其梯度,均为n维积分,其成本呈指数增长,限制了最大熵矩滤波器的演示到n≤4。我们通过将分数匹配与斯蒂恩恒等式结合,完全避免了分区函数。在我们的设置中,分数匹配将密度拟合减少到一个线性求解,其系数直接从传播的矩中组装。相同的参数随后驱动斯蒂恩恒等式在预测期间关闭矩层次结构,并在每次贝叶斯更新后恢复后验矩,使完整的预测-更新循环免于分区函数评估。所得到的分数卡尔曼滤波器(SKF)作为特殊情况退化为经典的信息形式卡尔曼滤波器,并通过线性代数完成每一步。在非线性耦合振荡器网络上,SKF能够运行n=20,并在测试的合成基准上报告比EKF、UKF、EnKF和粒子滤波基线更低的RMSE。

English abstract

A central obstacle in nonlinear Bayesian filtering is representing the belief distribution. Moment-based filters address this by propagating polynomial moments and reconstructing a density from them. Recent work completes the predict-update loop via the maximum-entropy (MaxEnt) principle, but each step requires the partition function and its gradient, both $n$-dimensional integrals whose cost scales exponentially, restricting the demonstrated MaxEnt moment filtering to $n \le 4$. We avoid the partition function entirely by combining score matching with Stein's identity. In our setting, score matching reduces the density fit to a single linear solve whose coefficients are assembled directly from the propagated moments. The same parameters then drive Stein's identity to close the moment hierarchy during prediction and to recover posterior moments after each Bayesian update, keeping the full predict-update loop free of partition function evaluation. The resulting Score Kalman Filter (SKF) reduces to the classical information-form Kalman filter as a special case and performs every step through linear algebra. On nonlinear coupled-oscillator networks, the SKF runs through $n=20$ and reports lower RMSE than the EKF, UKF, EnKF, and particle-filter baselines on the tested synthetic benchmarks.
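
The claim that score matching reduces the density fit to a linear solve assembled from moments can be checked on a one-dimensional Gaussian toy case. The sketch below is not the paper's filter; it assumes the model log p(x) = θ1·x + θ2·x² + const and the standard (Hyvärinen) score matching objective, and the function name is hypothetical.

```python
import numpy as np

def score_matching_gaussian_1d(m1, m2):
    """Fit log p(x) = theta1*x + theta2*x^2 + const by score matching from moments.

    The objective E[0.5*(theta1 + 2*theta2*x)^2 + 2*theta2] is quadratic in theta,
    so its minimizer solves a 2x2 linear system whose entries are the raw moments
    m1 = E[x] and m2 = E[x^2]. No partition function is evaluated.
    """
    a = np.array([[1.0, 2.0 * m1],
                  [2.0 * m1, 4.0 * m2]])
    b = np.array([0.0, -2.0])
    return np.linalg.solve(a, b)

mu, var = 1.5, 0.25
theta1, theta2 = score_matching_gaussian_1d(mu, mu**2 + var)
print(theta1, theta2)  # information-form parameters mu/var and -1/(2*var)
```

In this toy case the solution is exactly the information form (θ1 = μ/σ², θ2 = −1/(2σ²)), which is consistent with the abstract's remark that the filter reduces to the information-form Kalman filter as a special case.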

2605.14943 2026-05-19 stat.ME

Piece-wise linear isotonic regression

分段线性单调回归

Timo Kuosmanen, Juan F. Monge, José L. Ruiz, Xun Zhou

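AI summary: This paper proposes a piece-wise linear smoothing framework that addresses the inability of standard isotonic regression to provide marginal properties, using a bilevel optimization formulation that improves estimation accuracy in both convex and non-convex settings.

Details
AI Chinese abstract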

单调回归提供了一种灵活的无调参方法来估计单调函数,但估计的回归函数本质上是阶梯函数。本文针对此类估计器的关键局限性:无法提供有意义的边际属性,如影子价格或弹性。我们提出了一种新的分段线性平滑框架,即使在非凸情况下也能恢复有意义的边际估计。基于确定性前沿分析中最初开发的条件凸性概念,我们将平滑过程建模为一个双层优化问题,以拟合连续、单调、分段线性的函数到初始单调回归预测。蒙特卡洛模拟显示,所提出的方法在单变量和多变量数据的凸和非凸情况下显著提高了估计精度。我们将其应用于分析芬兰市政的集聚经济效应,展示了其实际价值。

English abstract

Isotonic regression provides a flexible, tuning-free approach to estimating monotonic functions without imposing global curvature constraints, yet the estimated regression function is inherently a step function. This paper addresses a key limitation of such estimators: their inability to provide meaningful marginal properties, such as shadow prices or elasticities. We propose a novel piece-wise linear smoothing framework that recovers meaningful marginal estimates even in non-convex settings. Building on the concept of conditional convexity originally developed in deterministic frontier analysis, we formulate the smoothing process as a bilevel optimization problem that fits a continuous, monotonic, piece-wise linear function to the initial isotonic regression predictions. Monte Carlo simulations demonstrate that the proposed approach can significantly improve estimation accuracy in both convex and non-convex settings for univariate and multivariate data. We apply this approach to analyze agglomeration economies in Finnish municipalities, illustrating its practical value.
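
The step-function limitation described in this abstract is easy to see in a minimal implementation of plain isotonic regression. The sketch below uses the standard pool-adjacent-violators algorithm with unit weights; it illustrates only the initial isotonic fit, not the paper's bilevel smoothing step, and the function name is hypothetical.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y (unit weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)  # constant value on each pooled block
    return fit

print(pava([1, 3, 2, 4]))
```

Within each pooled block the fitted value is constant, so the fit has zero slope almost everywhere and jumps at block boundaries. That is why marginal quantities such as elasticities cannot be read off it directly.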

2605.12547 2026-05-19 econ.EM cs.LG q-fin.ST stat.AP

The Payment Heterogeneity Index: An Integrated Unsupervised Framework for High-Volume Procurement Oversight and Decision Support

支付异质性指数:一种用于高 volume 采购监督和决策支持的集成无监督框架

Kyriakos Christodoulides

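AI summary: This paper proposes the Payment Heterogeneity Index (PHI), which combines Gaussian mixture model parameters with non-parametric statistics to characterise payment structure and latent regimes for high-volume procurement oversight and decision support.

Details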
Comments
Request category change from econ.EM -> stat.ML. Paper is methodological, introducing a new unsupervised ML/stat framework (SHI/PHI index) for distributional structure. Methodology is general; procurement is the application. stat.ML is more appropriate primary; econ.EM as cross-list
AI Chinese abstract

公共采购易受错误、欺诈和腐败影响,特别是在高交易量超出监督能力时。尽管研究常关注招标阶段异常,但中标后付款监控仍被忽视。由于标记数据稀缺且如本福特定律等方法假设限制多,需要可解释的无监督框架用于高 volume 采购监督和决策支持。本文引入结构异质性指数(SHI),一种一维样本复合统计量,及其支付特定实例支付异质性指数(PHI),用于表征支付结构和潜在模式。它整合高斯混合模型(GMM)参数和非参数统计,整合四个可解释组件:模态、不对称性、尾部行为和结构分散性。独特的是,尾部行为组件捕捉分布厚重和极值集中,而结构分散性结合了潜在支付模式的变异性、普遍性和分离度。应用于英国市政采购数据,PHI识别出一个财务显著的供应商群体(0.6%的供应商;10.1%的高 volume 供应商)具有结构不同的支付模式。统计检验进一步支持这些差异,针对性的人工验证确认了优先案例的合理性。比较分析显示PHI揭示了被变异系数(ρ=0.310)掩盖的模式分离。PHI提供了一个透明、可分解且计算轻量的框架用于采购完整性监督和目标审计优先级。

English abstract

Public procurement is vulnerable to error, fraud, and corruption, particularly as high transaction volumes overwhelm oversight. While research often focuses on tender-stage anomalies, post-award payment monitoring remains underexplored. Since labelled datasets are rare and methods like Benford's Law face restrictive assumptions, there is a need for interpretable, unsupervised frameworks for high-volume procurement oversight and decision support. This paper introduces the Structural Heterogeneity Index (SHI), a composite statistic for one-dimensional samples, and its payment-specific instantiation, the Payment Heterogeneity Index (PHI), characterising payment structure and latent regimes. It incorporates Gaussian Mixture Model (GMM) parameters alongside non-parametric statistics, integrating four interpretable components: modality, asymmetry, tail behaviour, and structural dispersion. Uniquely, the tail-behaviour component captures both distributional heaviness and extreme-value concentration, while structural-dispersion combines the variability, prevalence, and separation of latent payment regimes. Applied to UK municipal procurement data, PHI identifies a financially significant cohort (0.6% of suppliers; 10.1% of high-volume vendors) with structurally distinct payment patterns. Statistical testing further supports these differences, and targeted human verification confirms the plausibility of prioritised cases. Comparative analysis shows PHI reveals regime separation obscured by the Coefficient of Variation ($ρ = 0.310$). PHI provides a transparent, decomposable, and computationally lightweight framework for procurement integrity oversight and targeted audit prioritisation.

2605.10088 2026-05-19 stat.ME

Sample size and power calculations for causal inference with time-to-event outcomes

基于时间到事件结果的因果推断样本量和功效计算

Chengxin Yang, Bo Liu, Fan Li

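AI summary: This paper develops sample size and power formulas for causal inference with time-to-event outcomes, including a new analytical sample size formula that applies to both randomized trials and observational studies and corrects the mischaracterization in classic log-rank-based formulas.

Details
AI Chinese abstract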

本文开发了用于因果推断的时间到事件结果的样本量和功效公式。目标估计量是边际危险比:边际结构Cox比例危险模型中的系数,其中治疗是唯一预测因子。我们扩展了稳健 Sandwich 方差理论,推导出逆概率加权偏似然估计量的渐近方差的解析形式。在此基础上,我们推导了一个新的分析样本量公式,适用于任何预指定的效果大小,适用于随机试验和观察性研究。对于随机试验,该公式仅需要治疗比例、效果大小和事件率的常规输入。新的公式修正了经典log-rank方法的误判。对于观察性研究,一个额外的输入足够:一个总结比较组之间协变量相似性的重叠系数。我们进一步开发了一种适用于任何倾向分数平衡权重的方差膨胀方法,锚定在修正的基线方差上。我们提供了一个在线计算器和一个R包'PSpower'来实现该方法。

English abstract

This paper develops power and sample size formulas for causal inference with time-to-event outcomes. The target estimand is the marginal hazard ratio: the coefficient of a marginal structural Cox proportional hazard model with treatment as the only predictor. We extend the robust sandwich variance theory and derive the analytical form of the asymptotic variance for the inverse probability weighted partial likelihood estimator. Building on this, we derive a new analytical sample size formula valid at any prespecified effect size, applicable to both randomized trials and observational studies. For randomized trials, the formula requires only the canonical inputs of treatment proportion, effect size, and event rate. The new formula corrects the mischaracterization of classic log-rank-based formulas. For observational studies, one additional input suffices: an overlap coefficient summarizing covariate similarity between comparison groups. We further develop a variance inflation approach applicable to any propensity score balancing weights, anchored to the corrected baseline variance. We provide an online calculator and an R package 'PSpower' to implement the method.
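
For context, the kind of classic log-rank-based calculation this abstract refers to can be written down directly. The sketch below implements Schoenfeld's well-known formula for the required number of events, D = (z_{1-α/2} + z_{power})² / (p(1 − p)(log HR)²), taking the same three inputs the abstract lists (treatment proportion, effect size, event rate). It is the classic baseline only, not the paper's new formula or the PSpower package; the function name and the event rate of 0.4 are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def schoenfeld_events(hazard_ratio, treat_prop, alpha=0.05, power=0.8):
    """Required number of events under the classic Schoenfeld log-rank formula."""
    z = NormalDist().inv_cdf
    numerator = (z(1 - alpha / 2) + z(power)) ** 2
    return numerator / (treat_prop * (1 - treat_prop) * log(hazard_ratio) ** 2)

events = schoenfeld_events(hazard_ratio=2.0, treat_prop=0.5)
event_rate = 0.4                          # hypothetical overall probability of observing the event
sample_size = ceil(events / event_rate)   # convert required events to required participants

print(round(events, 2), sample_size)
```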

2605.08550 2026-05-19 cs.LG stat.ML

A Call to Lagrangian Action: Learning Population Mechanics from Temporal Snapshots

对拉格朗日作用的呼吁:从时间快照中学习群体动力学

Vincent Guan, Lazar Atanackovic, Kirill Neklyudov

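AI summary: This paper proposes a new approach to learning population dynamics from temporal snapshots, based on minimizing a population-level action under a damped Wasserstein Lagrangian. The proposed WLM algorithm forecasts and interpolates unseen marginals and outperforms existing methods across a range of dynamics.

Details
Comments
Accepted at ICML 2026 (spotlight)
AI Chinese abstract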

分子、细胞和生物体的群体动力学由若干未知力支配。过去十年中,群体动力学主要通过韦瑟斯特拉格梯度流建模。然而,由于梯度流最小化自由能,它们无法捕捉重要的动态特性,如周期性。本文提出通过考虑在阻尼韦瑟斯特拉格拉格朗日下最小化群体层面作用的动力学,推导对应的哈密顿方程,正式化韦瑟斯特拉格拉格梯度流力学,即一类包含经典力学、量子力学和梯度流的结构化二阶动力学。随后提出WLM作为首个能从观测边际中学习这些二阶动力学的算法,无需指定拉格朗日量。通过直接学习群体动力学,WLM能够预测和插值未见边际,并在广泛动态中优于现有梯度流和流匹配方法,包括涡旋动力学、胚胎发育和鸟群行为。

English abstract

The population dynamics of molecules, cells, and organisms are governed by a number of unknown forces. In the last decade, population dynamics have predominantly been modeled with Wasserstein gradient flows. However, since gradient flows minimize free energy, they fail to capture important dynamical properties, such as periodicity. In this work, we propose a change in perspective by considering dynamics that minimize a population-level action under a damped Wasserstein Lagrangian. By deriving the corresponding Hamiltonian equations of motion, we formalize Wasserstein Lagrangian Mechanics, a structured class of second-order dynamics that encompasses classical mechanics, quantum mechanics, and gradient flows. We then propose WLM as the first algorithm that learns these second-order dynamics from observed marginals, without specifying the Lagrangian. By directly learning the population mechanics, WLM can both forecast and interpolate unseen marginals, and outperforms existing gradient flow and flow matching methods across a wide range of dynamics, including vortex dynamics, embryonic development, and flocking.

2605.07285 2026-05-19 stat.ME

Transporting treatment effects by calibrating large-scale observational outcomes

通过校准大规模观测结果来运输治疗效应

Harrison H Li

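AI summary: This paper proposes a method that estimates a transported treatment effect by calibrating observational outcomes to experimental data, and shows stable performance under varying degrees of positivity.

Details
Comments
37 pages, 5 figures
AI Chinese abstract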

一个高质量的实验数据集通常比相应观测数据集小。当后者可能有偏的结局测量时,我们提出了一种估计和推断运输治疗效应的程序。我们的点估计器可通过以下步骤计算:首先,通过普通最小二乘法(OLS)校准观测结果中的治疗-对照对比,估计条件平均治疗效应(CATE)。然后,计算该估计CATE在观测数据集上的样本平均值。我们证明,即使OLS校准不准确,极限估计量仍是一个加权运输平均治疗效应。此外,当实验数据集规模增长速度慢于观测数据集时,我们的推断在渐近上是有效的且半参数高效,无论两个数据集之间是否存在正性(重叠)。我们通过数值模拟和一个使用田间实验和卫星基产量估计的数据示例,展示了该方法在不同正性程度下的稳定实证表现,以估计美国中西部地区大规模范围内轮作对玉米产量的平均效应。

English abstract

A high-quality experimental dataset is often much smaller than a corresponding observational dataset. When this holds with possibly biased measurements of the outcome of interest in the latter, we propose an estimation and inference procedure for a transported treatment effect. Our point estimator can be computed as follows. First, we estimate the conditional average treatment effect (CATE) by calibrating a treatment-control contrast estimated using the observational outcomes to the experimental dataset using ordinary least squares (OLS). Then, we compute the sample average of this estimated CATE over the observational dataset. We show that the limiting estimand is a weighted transported average treatment effect even when the OLS calibration is misspecified. Furthermore, our inference for this estimand is asymptotically valid and semiparametrically efficient when the size of the experimental dataset grows more slowly than the size of the observational dataset, regardless of the existence of positivity (overlap) between the two datasets. We illustrate the stable empirical performance of our method under varying degrees of positivity using numerical simulations and a data example using field experiments and satellite-based yield estimates to estimate the average effect of crop rotation on maize (corn) yields over a large area of the Midwestern United States.

2605.04317 2026-05-19 math.ST stat.TH

The Threshold Breakdown Point

阈值破裂点

Tianjun Ke, Marco Avella Medina

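AI summary: This paper proposes a new approach to finite-sample robustness, defining the threshold breakdown point and the finite-sample m-sensitivity, extending the decision breakdown point of Zhang (1996), and relating these notions to finite-sample counterparts of breakdown functions for hypothesis tests.

Details
AI Chinese abstract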

我们介绍了一种新的有限样本鲁棒性方法,以避免传统破裂分析的悲观性。我们定义了阈值破裂点,即引起预定偏差所需的最小污染分数,以及有限样本m-敏感性,即在m个观测值被污染后估计器可能产生的最坏偏差。我们推导了这些度量标准用于常用M-估计量、其标准误差及相关检验统计量。这使我们能够扩展Zhang(1996)的决策破裂点,以获得假设检验的一般破裂特征,并展示这些概念如何对应于He、Simpson和Portnoy(1990)的幂和水平破裂函数的有限样本对应物。我们补充了阈值破裂和m-敏感性的推断框架,该框架提供了一致性和渐近正态性结果,以及用于不确定性量化的有效乘数自助法。我们通过各种数值示例和一个用于血压数据集的两样本检验问题展示了我们方法的实际效用。

English abstract

We introduce a novel approach to finite sample robustness that avoids the pessimism of traditional breakdown analyses. We define the threshold breakdown point, the smallest contamination fraction needed to induce a prescribed deviation, and the finite sample m-sensitivity, the worst-case deviation that an estimator can incur after m observations are contaminated. We derive these measures for commonly used M-estimators, their standard errors and related test statistics. This allows us to extend the decision breakdown point of Zhang (1996) to obtain general breakdown characterizations for hypothesis testing, and show how these notions correspond to finite sample counterparts of the power and level breakdown functions of He, Simpson and Portnoy (1990). We complement our work with an inferential framework for the threshold breakdown and m-sensitivity that yields consistency and asymptotic normality results, as well as a valid multiplier bootstrap for uncertainty quantification. We illustrate the practical utility of our methods in various numerical examples and an application to a two sample testing problem for a blood pressure dataset.

2605.00155 2026-05-19 cs.LG cs.CL math.OC stat.ML

Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback

Wasserstein分布鲁棒遗憾优化用于人类反馈的强化学习

Yikai Wang, Shang Liu, Jose Blanchet

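AI summary: This paper proposes Wasserstein distributionally robust regret optimization (DRRO) for reinforcement learning from human feedback. Studying the promptwise problem through a simplex allocation model, it shows that under an ℓ1-ground-cost Wasserstein ambiguity set the inner worst-case regret admits an exact solution and the optimal policy has a water-filling structure, which leads to an efficient policy-gradient algorithm.

Details
AI Chinese abstract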

强化学习从人类反馈(RLHF)已成为对齐大语言模型的核心后训练步骤,但RLHF中使用的奖励信号仅是真实人类效用的学得代理。从运筹学角度看,这形成了一个目标不准确的决策问题:策略是针对估计奖励优化,而部署性能由未观察的目标决定。由此产生的差距导致奖励过度优化,即Goodharting现象,即代理奖励在真正质量下降后仍继续改善。现有缓解方法通过不确定性惩罚、悲观奖励或保守约束,但这些方法计算上负担重且过于悲观。我们提出Wasserstein分布鲁棒遗憾优化(DRRO)用于RLHF。不同于标准DRO悲观最坏价值,DRRO悲观最坏遗憾相对于相同合理奖励扰动下的最佳策略。我们通过简单分配模型研究提示问题,展示在ℓ1-地面成本Wasserstein模糊集下,内最坏遗憾有精确解,最优策略具有水填充结构。这些结果导致具有简单采样奖金解释和仅小幅改动GRPO式RLHF训练的实用策略梯度算法。该框架还理论上澄清了为什么DRRO比DRO更不悲观,且实验显示DRRO比现有基线更有效缓解过度优化,而标准DRO系统性过悲观。

English abstract

Reinforcement learning from human feedback (RLHF) has become a core post-training step for aligning large language models, yet the reward signal used in RLHF is only a learned proxy for true human utility. From an operations research perspective, this creates a decision problem under objective misspecification: the policy is optimized against an estimated reward, while deployment performance is determined by an unobserved objective. The resulting gap leads to reward over-optimization, or Goodharting, where proxy reward continues to improve even after true quality deteriorates. Existing mitigations address this problem through uncertainty penalties, pessimistic rewards, or conservative constraints, but they can be computationally burdensome and overly pessimistic. We propose Wasserstein distributionally robust regret optimization (DRRO) for RLHF. Instead of pessimizing worst-case value as in standard DRO, DRRO pessimizes worst-case regret relative to the best policy under the same plausible reward perturbation. We study the promptwise problem through a simplex allocation model and show that, under an $\ell_1$-ground-cost Wasserstein ambiguity set, the inner worst-case regret admits an exact solution and the optimal policy has a water-filling structure. These results lead to a practical policy-gradient algorithm with a simple sampled-bonus interpretation and only minor changes to GRPO-style RLHF training. The framework also clarifies theoretically why DRRO is less pessimistic than DRO, and our experiments show that DRRO mitigates over-optimization more effectively than existing baselines while standard DRO is systematically over-pessimistic.

2604.20031 2026-05-19 math.OC cs.LG stat.ML

Decision-Focused Federated Learning Under Heterogeneous Objectives and Constraints

在异质目标和约束下聚焦决策的联邦学习

Konstantinos Ziliaskopoulos, Alexander Vinel

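AI summary: This paper studies decision-focused federated learning under heterogeneous objectives and constraints, derives heterogeneity bounds based on the SPO+ surrogate loss, shows that federation is more robust under strongly convex feasible sets, and supports the findings with experiments.

Details
AI Chinese abstract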

我们考虑了决策聚焦联邦学习(DFFL),这是一种预测后再优化的设置,在其中多个客户端协同训练预测模型以解决下游的线性优化问题,而无需交换原始数据。除了标准联邦学习中典型的数据异质性外,客户端还可能有不同的目标函数和可行区域。基于SPO+替代损失,我们推导出异质性界限,将目标偏移(通过成本向量距离测量)与可行集偏移(通过支撑函数和形状距离术语测量)分开。我们证明,对于一般的紧致可行集,小的目标扰动仍可引起非消失的决策聚焦损失差异,而强凸可行区域会产生更尖锐的基于稳定性界限。然后,我们将这些点状界限提升到局部与联邦的超额风险比较,显示当统计优势超过客户端特定的异质性惩罚时,联邦学习是有益的。在多面体和强凸问题上的计算实验证实,在强凸可行区域下联邦学习的鲁棒性显著增强。最后,我们评估了一个简单的基于验证的插值方法,用于本地和联邦DFFL模型之间。该插值方法缓解了理论权衡,减少了合成实验和PJM电力定价案例研究中的累积遗憾和最坏客户端损害。

English abstract

We consider Decision-Focused Federated Learning (DFFL), a predict-then-optimize setting in which multiple clients collaboratively train predictive models for downstream linear optimization problems without exchanging raw data. Besides the data heterogeneity typical of standard federated learning, clients may also have different objective functions and feasible regions. Building on the SPO+ surrogate loss, we derive heterogeneity bounds that separate objective shift, measured through cost-vector distances, from feasible-set shift, measured through support-function and shape-distance terms. We show that, for general compact feasible sets, small objective perturbations can still induce nonvanishing decision-focused loss discrepancies, while strongly convex feasible regions yield sharper stability-based bounds. We then lift these pointwise bounds to a local-versus-federated excess-risk comparison, showing that federation is beneficial when the statistical advantage of pooling exceeds a client-specific heterogeneity penalty. Computational experiments on polyhedral and strongly convex problems confirm that federation is substantially more robust under strongly convex feasible regions. Finally, we evaluate a simple validation-based interpolation between local and federated DFFL models. This interpolation mitigates the theoretical tradeoff and reduces aggregate regret and worst-client harm in both synthetic experiments and a PJM energy-pricing case study.

2604.13276 2026-05-19 stat.ME math.ST stat.TH

Addressing Confounding by Indication Through (Un)Measured Centre Characteristics in Learn-As-you-GO(LAGO) Trials

通过测量和未测量的中心特征处理指示偏倚的LAGO试验

Minh Thu Bui, Christopher T. Longenecker, Ante Bing, Donna Spiegelman, Allison R. Webel, Hayden B. Bosworth, Judith J. Lok

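AI summary: This paper introduces fixed center effects to control for confounding by indication in LAGO trials, unifies LAGO theory across continuous and binary outcome types, and provides hypothesis tests and an optimization method for the intervention package.

Details
AI Chinese abstract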

Learn-As-you-Go (LAGO) 设计是一种自适应临床试验设计,允许在不同阶段修改多组件干预包。在 LAGO 试验中,中心特征可能作为混杂因素,预测干预包和结果。本文通过引入固定中心效应来控制通过测量和未测量的中心特征引起的指示偏倚。通过包含固定中心效应来条件化中心特征,确保渐近结果成立而无需显式表征未测量的混杂因素。我们的方法即使在中心数量较少时也适用。LAGO 理论已建立在广义线性模型和二元结果的逻辑回归模型下,统一了不同结果类型的理论。推导了点估计和区间估计,并建立了一致性和渐近正态性。提供了总体干预效应的有效假设检验,并通过约束优化获得了最小化成本且满足目标结果均值的最优干预包。

English abstract

The Learn-As-you-Go (LAGO) design is an adaptive clinical trial design that allows modifications to multicomponent intervention packages across stages. Centers participate in more than one stage, as is common in large-scale implementation trials. In LAGO trials, center characteristics may act as confounders, predicting both the intervention package and the outcomes. We extend the LAGO theory by introducing fixed center effects to control for confounding by indication through measured and unmeasured center characteristics. Conditioning on center characteristics by including fixed center effects ensures asymptotic results hold without requiring explicit characterization of unmeasured confounders. Our methods apply even with small numbers of centers. LAGO theory is established for continuous outcomes following a generalized linear model and binary outcomes following a logistic regression model, unifying theory across outcome types. Point- and interval estimators are derived, and consistency and asymptotic normality are established. Valid hypothesis tests for the overall intervention effect are provided, and the optimal intervention package minimizing cost subject to a target outcome mean is obtained via constrained optimization.

2603.25860 2026-05-19 stat.ML cs.LG

On the Expressive Power of Contextual Relations in Transformers

Transformer中上下文关系的表达能力

Demián Fraiman

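AI summary: This paper proposes a measure-theoretic framework that models contextual relations as probabilistic objects, reveals a connection between softmax attention and entropy-regularized optimal transport, and proves that Transformers can approximate arbitrary contextual relation rules.

Details
AI Chinese abstract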

Transformer架构在建模上下文关系方面取得了显著的实证成功,但对其表达能力的理解仍不清晰。本文引入一种测度理论框架,将上下文关系建模为概率对象,无论是条件分布还是联合分布(耦合)。这一视角揭示了标准softmax注意力与熵正则化最优传输之间的自然联系,为注意力提供了一种统一的视图,即作为底层亲和函数的归一化。在此框架内,我们利用标准softmax注意力和交替Sinkhorn归一化建立了上下文系统的通用近似定理。这些结果表明,Transformer架构能够近似任意上下文关系规则,且归一化的选择决定了这些关系的表示方式。此外,它们还提供了Transformers在建模上下文关系上有效的原因的原理性解释。

English abstract

Transformer architectures have achieved remarkable empirical success in modeling contextual relations, yet a clear understanding of their expressive power is still lacking. In this work, we introduce a measure-theoretic framework in which contextual relations are modeled as probabilistic objects, either as conditional distributions or as joint distributions (couplings). This perspective reveals a natural connection between standard softmax attention and entropy-regularized optimal transport, providing a unified view of attention as a normalization of an underlying affinity function. Within this framework, we establish a universal approximation theorem for contextual systems using either standard softmax attention or Sinkhorn normalization. These results show that Transformer architectures can approximate arbitrary contextual relation rules, and that the choice of normalization determines how these relations are represented. Moreover, they provide a principled explanation for why Transformers are effective at modeling contextual relations.
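
The two normalizations contrasted in this abstract can be compared on a small affinity matrix. In the minimal sketch below, row-wise softmax turns each row of scores into a conditional distribution, while Sinkhorn iterations alternately rescale rows and columns until the matrix is approximately doubly stochastic (a coupling of uniform marginals, up to a factor 1/n). The function names, the random scores, and the iteration count are assumptions for illustration and not the paper's construction.

```python
import numpy as np

def softmax_rows(scores):
    """Row-wise softmax: each row becomes a conditional distribution over keys."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # subtract row max for stability
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def sinkhorn(scores, n_iters=200):
    """Alternating row/column normalization of exp(scores) for a square score matrix."""
    kernel = np.exp(scores - scores.max())
    for _ in range(n_iters):
        kernel = kernel / kernel.sum(axis=1, keepdims=True)  # normalize rows
        kernel = kernel / kernel.sum(axis=0, keepdims=True)  # normalize columns
    return kernel

rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 4))   # hypothetical query-key affinities

attention = softmax_rows(scores)
coupling = sinkhorn(scores)
print(attention.sum(axis=1))                       # rows sum to 1
print(coupling.sum(axis=1), coupling.sum(axis=0))  # rows and columns both close to 1
```

Both outputs start from the same affinity matrix; only the normalization differs, which is the sense in which the choice of normalization determines how the relation is represented.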

2603.20904 2026-05-19 stat.ME math-ph math.DS math.MP nlin.CD physics.data-an stat.ML

Weak-Form Recovery of Stochastic Generators and Dynamical Invariants

弱形式恢复随机生成器与动力学不变量

Eshwar R A, Gajanan V. Honnavar

AI总结 本文将控制方程弱投影到空间高斯核上,通过单次稀疏回归联合识别随机过程的漂移项和扩散项,得到可进行谱分析的显式符号生成元,并在基准系统上验证了其准确性。

详情
Comments
21 pages, 5 figures
AI中文摘要

谱间隙、Kramers逃逸率和位置依赖的弛豫时间尺度是编码在随机流无穷小生成元 $\mathcal{L}$ 中的动力学不变量。我们证明,将控制方程(Itô SDE)弱投影到时间测试函数上会引入量级为 $O(T\,\Delta t^{3/2})$ 的内生性偏差,该偏差随观测窗口增大而增长,且无法通过增加数据消除。改为投影到空间高斯核上则可精确消除该偏差:$\mathcal{F}_{t_n}$-可测性和塔性质保证每一步的回归行都是无偏的。由此得到的框架通过单次稀疏回归联合识别漂移项 $b(x)$ 和扩散项 $a(x)$,给出可进行谱分析的显式符号生成元。在三个基准系统上的验证表明,系数误差低于5%,平稳密度的全变差距离低于0.01,自相关函数忠实再现了真实的弛豫时间尺度。

英文摘要

Spectral gaps, Kramers escape rates, and position-dependent relaxation timescales are dynamical invariants encoded in the infinitesimal generator $\mathcal{L}$ of a stochastic flow. We show that weak projection of the governing Itô SDE onto temporal test functions produces an endogeneity bias of order $O(T\,\Delta t^{3/2})$ that grows with the observation window and cannot be eliminated by additional data. Projecting instead onto spatial Gaussian kernels removes the bias exactly: $\mathcal{F}_{t_n}$-measurability and the tower property guarantee unbiased regression rows at every step. The resulting framework jointly identifies the drift $b(x)$ and diffusion $a(x)$ from a single sparse regression, producing an explicit symbolic generator amenable to spectral analysis. Validation on three benchmark systems yields coefficient errors below 5%, stationary-density total-variation distances below 0.01, and autocorrelation functions that faithfully reproduce true relaxation timescales.

2603.14942 2026-05-19 eess.SY cs.SY stat.ME

A System-Theoretic Approach to Hawkes Process Identification with Guaranteed Positivity and Stability

基于系统理论的Hawkes过程识别方法:保证正性与稳定性

Xinhui Rong, Girish N. Nair

AI总结 本文提出基于系统理论的Hawkes过程识别方法,采用符号不定的正交Laguerre基,在保证正性和稳定性的约束下,通过半定规划高效求解估计问题。

详情
Comments
6 pages, 2 figures
AI中文摘要

Hawkes过程用于建模自激发事件流,要求随机强度严格非负且稳定。标准识别方法通过非负因果基来保证这些性质,导致参数约束保守,并使高阶模型下的最小二乘Gram矩阵严重病态。为此,本文引入基于符号不定正交Laguerre基的系统理论识别框架,保证渐近Gram矩阵良态且与模型阶数无关。我们构造了一个约束最小二乘问题,施加正性与稳定性的充要条件。通过Lyapunov方程构造经验Gram矩阵,并利用平方和迹等价性表示约束,所提估计器可通过半定规划高效计算。

英文摘要

The Hawkes process models self-exciting event streams, requiring a strictly non-negative and stable stochastic intensity. Standard identification methods enforce these properties using non-negative causal bases, yielding conservative parameter constraints and severely ill-conditioned least-squares Gram matrices at higher model orders. To overcome this, we introduce a system-theoretic identification framework utilizing the sign-indefinite orthonormal Laguerre basis, which guarantees a well-conditioned asymptotic Gram matrix independent of model order. We formulate a constrained least-squares problem enforcing the necessary and sufficient conditions for positivity and stability. By constructing the empirical Gram matrix via a Lyapunov equation and representing the constraints through a sum-of-squares trace equivalence, the proposed estimator is efficiently computed via semidefinite programming.
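The conditioning claim for an orthonormal basis can be illustrated numerically. The sketch below assumes the standard continuous-time Laguerre functions $\phi_k(t) = \sqrt{2a}\,e^{-at}L_k(2at)$, which may differ from the paper's exact parameterization; the helper name `laguerre_gram` is hypothetical. Substituting $x = 2at$ reduces the inner product of $\phi_j$ and $\phi_k$ to $\int_0^\infty e^{-x}L_j(x)L_k(x)\,dx$, so the pole $a$ drops out and the Gram matrix can be computed with Gauss-Laguerre quadrature.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss, lagval

def laguerre_gram(order, quad_points=32):
    # Gauss-Laguerre nodes and weights integrate f(x) * exp(-x) over [0, inf).
    x, w = laggauss(quad_points)
    # Row k holds the Laguerre polynomial L_k evaluated at the nodes.
    basis = np.stack([lagval(x, np.eye(order)[k]) for k in range(order)])
    # Entry (j, k) approximates the inner product of phi_j and phi_k.
    return (basis * w) @ basis.T

gram = laguerre_gram(6)
print(np.round(gram, 6))
print(np.linalg.cond(gram))
```

Under these assumptions the Gram matrix is the identity for every order, which is the idealized version of a Gram matrix whose conditioning does not degrade with model order. The empirical Gram matrix built from event data in the paper is a different object and is not reproduced here.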

2603.01388 2026-05-19 cs.LG stat.ML

Invariant-Stratified Propagation for Expressive Graph Neural Networks

不变量分层传播用于表达性图神经网络

Asela Hevapathige, Ahad N. Zehmakan, Asiri Wijesinghe, Saman Halgamuge

AI总结 本文提出不变量分层传播框架,通过改进的WL变体和高效神经网络实现,提升图神经网络的表达能力,解决结构异质性捕捉问题。

详情
Journal ref
Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)
AI中文摘要

图神经网络(GNNs)在表达性和捕捉结构异质性方面存在根本限制。标准消息传递架构受限于1维Weisfeiler-Leman(1-WL)测试,无法区分超过度序列的图,并且从邻居均匀聚合信息,无法捕捉节点在更高阶模式中的不同结构性位置。尽管存在实现更高表达性的方法,但它们带来了不可接受的计算成本,并缺乏统一的框架来灵活编码多样的结构属性。为了解决这些限制,我们引入不变量分层传播(ISP),该框架包括一种新的WL变体(ISP-WL)及其高效的神经网络实现(ISPGNN)。ISP根据图不变量分层节点,处理它们在层次结构中揭示的结构差异,这些差异对1-WL不可见。通过层次结构异质性编码,ISP量化节点在更高阶模式中的结构性位置差异,区分参与者占据不同角色的相互作用与参与者参与均匀的相互作用。我们提供了正式的理论分析,证明了超越1-WL的增强表达性,收敛保证以及固有的抗过平滑性。在图分类、节点分类和影响估计的广泛实验中,ISP在标准架构和最先进的表达性基线中表现出一致的改进。

英文摘要

Graph Neural Networks (GNNs) face fundamental limitations in expressivity and capturing structural heterogeneity. Standard message-passing architectures are constrained by the 1-dimensional Weisfeiler-Leman (1-WL) test, unable to distinguish graphs beyond degree sequences, and aggregate information uniformly from neighbors, failing to capture how nodes occupy different structural positions within higher-order patterns. While methods exist to achieve higher expressivity, they incur prohibitive computational costs and lack unified frameworks for flexibly encoding diverse structural properties. To address these limitations, we introduce Invariant-Stratified Propagation (ISP), a framework comprising both a novel WL variant (ISP-WL) and its efficient neural network implementation (ISPGNN). ISP stratifies nodes according to graph invariants, processing them in hierarchical strata that reveal structural distinctions invisible to 1-WL. Through hierarchical structural heterogeneity encoding, ISP quantifies differences in nodes' structural positions within higher-order patterns, distinguishing interactions where participants occupy different roles from those with uniform participation. We provide formal theoretical analysis establishing enhanced expressivity beyond 1-WL, convergence guarantees, and inherent resistance to oversmoothing. Extensive experiments across graph classification, node classification, and influence estimation demonstrate consistent improvements over standard architectures and state-of-the-art expressive baselines.
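The 1-WL limitation mentioned in this abstract is easy to reproduce. The sketch below implements plain 1-WL colour refinement (not ISP-WL, which the abstract does not specify in enough detail to reproduce); the function name `wl_histogram` and the example graphs are illustrative assumptions. A 6-cycle and two disjoint triangles are both 2-regular, so 1-WL assigns them identical colour histograms even though the graphs are not isomorphic.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    # adj maps each node to a list of its neighbours.
    # Every node starts with the same colour, the empty tuple.
    colors = {v: () for v in adj}
    for _ in range(rounds):
        # New colour = (old colour, sorted multiset of neighbour colours).
        # Nested tuples are used directly as colours, so histograms are
        # comparable across graphs without a shared relabelling table.
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

print(wl_histogram(six_cycle) == wl_histogram(two_triangles))
print(wl_histogram(six_cycle) == wl_histogram(path))
```

Using nested tuples as colours keeps the sketch short but grows quickly with the number of rounds; a practical implementation would hash or relabel colours with a table shared across the graphs being compared.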

2602.04353 2026-05-19 stat.OT

Anyone for chess? Analysing chess ratings above high thresholds

有人下棋吗?分析高于高阈值的国际象棋评级

Nils Lid Hjort

AI总结 本文分析了国际象棋评级中高于高阈值的玩家分布,提出新的模型来解释顶级玩家的差异。

详情
Comments
9 pages, 7 figures
AI中文摘要

假设某些聪明程度参数足够有趣以至于被定义和测量,可能针对不同专业层次或更广泛的人群。此类现象在所有玩家中可能呈高斯分布,但当关注极少数顶尖玩家时,需要不同的模型。本文开发了此类模型和工具,并应用于当前活跃的14671名男性和753名女性国际象棋玩家的前100名及2100分以上列表。即使两个或多个分布的期望值或中位数接近,微小的方差差异也可能解释顶尖玩家之间的差距。

英文摘要

Suppose some cleverness score parameter is sufficiently interesting to be defined and then measured, perhaps for different strata of specialists or for the broader population. Such phenomena could have Gaussian distributions, when it comes to all players in a stratum, but when interest focuses on the very tails, for the top few percent, those above certain high thresholds, different models are called for, along with the need to analyse such based on the listed top scores only. In this note I develop such models and tools, and apply them to the top-100 and above 2100 points lists for regular chess ratings, for the currently active 14671 men and 753 women, as given by the FIDE, January 2026. It is argued that even when two or more distributions have close to identical expected values, or medians, even smaller differences in variance may explain gaps for the few very best ones.

2601.21170 2026-05-19 cs.LG stat.ML

The Powers of Precision: Structure-Informed Detection in Complex Systems -- From Customer Churn to Seizure Onset

精度的威力:复杂系统中的结构引导检测,从客户流失到癫痫发作起始

Augusto Santos, Teresa Santos, Catarina Rodrigues, José M. F. Moura

AI总结 本文提出一种基于结构信息的机器学习方法,用于复杂系统中关键事件的早期检测,通过学习最优特征表示和分类模块,实现对隐藏因果结构的识别与利用,展示了在癫痫发作检测和客户流失预测中的有效性。

详情
AI中文摘要

涌现现象(如癫痫发作起始、突发客户流失或流行病爆发)往往源于复杂系统中隐藏的因果相互作用。我们提出了一种用于早期检测此类现象的机器学习方法,解决了一个核心挑战:在数据生成过程未知且仅被部分观测的情况下,揭示并利用系统潜在的因果结构。该方法从一个单参数估计器族(经验协方差或精度矩阵的幂)中学习最优特征表示,提供了一种原则性方法来捕捉驱动关键事件出现的底层结构。随后的监督学习模块对学习到的表示进行分类。我们证明了该族的结构一致性,并在癫痫发作检测和客户流失预测中展示了方法的实证有效性,在两项任务上均取得了有竞争力的结果。除了预测之外,在可解释性方面,我们还发现最优协方差幂显示出良好的可识别性,同时捕捉到结构特征,从而在预测性能与可解释的统计结构之间取得平衡。

英文摘要

Emergent phenomena -- onset of epileptic seizures, sudden customer churn, or pandemic outbreaks -- often arise from hidden causal interactions in complex systems. We propose a machine learning method for their early detection that addresses a core challenge: unveiling and harnessing a system's latent causal structure despite the data-generating process being unknown and partially observed. The method learns an optimal feature representation from a one-parameter family of estimators -- powers of the empirical covariance or precision matrix -- offering a principled way to tune in to the underlying structure driving the emergence of critical events. A supervised learning module then classifies the learned representation. We prove structural consistency of the family and demonstrate the empirical soundness of our approach on seizure detection and churn prediction, attaining competitive results in both. Beyond prediction, and toward explainability, we ascertain that the optimal covariance power exhibits evidence of good identifiability while capturing structural signatures, thus reconciling predictive performance with interpretable statistical structure.

2601.06009 2026-05-19 stat.ML cs.LG eess.SP math.PR stat.AP

Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem

通过非参数逃逸定理检测离散信号中的随机性

Sunia Tanweer, Firas A. Khasawneh

AI总结 本文提出一种基于连续半鞅逃逸和穿越定理的非参数方法,通过比较实测逃逸次数与理论期望比值,区分扩散过程与确定性信号,不依赖参数模型。

详情
AI中文摘要

我们开发了一个实用框架,仅使用单个离散时间序列区分扩散随机过程与确定性信号。该方法基于连续半鞅的经典逃逸和穿越定理,将逃逸次数$N_\varepsilon$与过程的二次变分$[X]_T$相关联。该标度定律适用于所有具有有限二次变分的连续半鞅,包括具有非线性或状态依赖波动率的一般伊藤扩散过程,但对确定性系统失效,从而提供了一种理论认证的方法来区分这些动态,而非基于主观熵或复发的最新方法。我们构建了一个稳健的数据驱动扩散测试,该方法将实测逃逸次数与理论期望进行比较。所得比值$K(\varepsilon)=N_{\varepsilon}^{\mathrm{emp}}/N_{\varepsilon}^{\mathrm{theory}}$通过log-log斜率偏差总结,测量$\varepsilon^{-2}$定律,从而分类为扩散样或非扩散样。我们在经典随机系统、某些周期性和混沌映射及加性白噪声系统,以及随机杜芬系统上展示了该方法。该方法是非参数、无模型的,仅依赖于连续半鞅的小尺度结构。

英文摘要

We develop a practical framework for distinguishing diffusive stochastic processes from deterministic signals using only a single discrete time series. Our approach is based on classical excursion and crossing theorems for continuous semimartingales, which relate the number $N_\varepsilon$ of excursions of magnitude at least $\varepsilon$ to the quadratic variation $[X]_T$ of the process. The scaling law holds universally for all continuous semimartingales with finite quadratic variation, including general Itô diffusions with nonlinear or state-dependent volatility, but fails sharply for deterministic systems, thereby providing a theoretically certified method of distinguishing between these dynamics, as opposed to the subjective entropy- or recurrence-based state-of-the-art methods. We construct a robust data-driven diffusion test that compares the empirical excursion counts against the theoretical expectation. The resulting ratio $K(\varepsilon)=N_{\varepsilon}^{\mathrm{emp}}/N_{\varepsilon}^{\mathrm{theory}}$ is then summarized by the deviation of its log-log slope from the $\varepsilon^{-2}$ law, which classifies the signal as diffusion-like or not. We demonstrate the method on canonical stochastic systems, some periodic and chaotic maps and systems with additive white noise, as well as the stochastic Duffing system. The approach is nonparametric, model-free, and relies only on the universal small-scale structure of continuous semimartingales.

2512.23978 2026-05-19 cs.LG math.OC stat.ML

Assured autonomy: How operations research powers and orchestrates generative AI systems

保障自主性:如何用运筹学赋能和协调生成式AI系统

Tinglong Dai, David Simchi-Levi, Michelle Xiao Wu, Yao Xie

AI总结 本文探讨生成式AI在向自主决策系统转变过程中,如何通过运筹学方法提升系统的可行性、鲁棒性和风险控制能力。

详情
Comments
Authors are listed alphabetically; Production and Operations Management (POM), 2026
AI中文摘要

生成式人工智能(GenAI)正从对话助手转向代理系统——能够在操作流程中感知、决策和行动的自主决策系统。这种转变带来了自主性悖论:随着GenAI系统获得更大的操作自主权,它们应通过设计体现更正式的结构、更明确的约束和更强的风险控制。我们论证,除非生成模型与提供可验证可行性、对抗鲁棒性和高后果场景下的压力测试机制相结合,否则随机生成模型在操作领域可能脆弱。为此,我们开发了一个以运筹学(OR)为基础的保障自主性框架,基于两种互补方法。首先,基于流的生成模型将生成过程框架为确定性传输,由常微分方程描述,从而实现可审计性、约束感知生成以及与最优传输、鲁棒优化和顺序决策控制的联系。其次,通过对抗鲁棒性视角制定操作安全性:决策规则在不确定性或模糊集内评估最坏扰动,使未建模风险成为设计的一部分。该框架阐明了增加自主性如何使OR的角色从求解器转变为护栏到系统架构师,负责控制逻辑、激励协议、监控制度和安全边界。这些元素定义了在安全关键、可靠性敏感的操作领域中保障自主性的研究议程。

英文摘要

Generative artificial intelligence (GenAI) is shifting from conversational assistants toward agentic systems -- autonomous decision-making systems that sense, decide, and act within operational workflows. This shift creates an autonomy paradox: as GenAI systems are granted greater operational autonomy, they should, by design, embody more formal structure, more explicit constraints, and stronger tail-risk discipline. We argue that stochastic generative models can be fragile in operational domains unless paired with mechanisms that provide verifiable feasibility, robustness to distribution shift, and stress testing under high-consequence scenarios. To address this challenge, we develop a conceptual framework for assured autonomy grounded in operations research (OR), built on two complementary approaches. First, flow-based generative models frame generation as deterministic transport characterized by an ordinary differential equation, enabling auditability, constraint-aware generation, and connections to optimal transport, robust optimization, and sequential decision control. Second, operational safety is formulated through an adversarial robustness lens: decision rules are evaluated against worst-case perturbations within uncertainty or ambiguity sets, making unmodeled risks part of the design. This framework clarifies how increasing autonomy shifts OR's role from solver to guardrail to system architect, with responsibility for control logic, incentive protocols, monitoring regimes, and safety boundaries. These elements define a research agenda for assured autonomy in safety-critical, reliability-sensitive operational domains.

2512.22473 2026-05-19 stat.ML cs.AI cs.LG

Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

注意力的梯度动力学:交叉熵如何塑造贝叶斯流形

Naman Agarwal, Siddhartha R. Dalal, Vishal Misra

AI总结 研究通过分析交叉熵训练如何重塑Transformer注意力分数和值向量,揭示了注意力评分的优势路由定律和值的职责加权更新,展示了梯度动力学如何塑造贝叶斯流形以支持概率推理。

详情
Comments
v2: Add dual-entropy connection - advantage signal drives $\rho$ down; fix duplicate bibliography entries (synced from Paper I)
AI中文摘要

Transformer在精心构建的『贝叶斯风洞』和大规模语言模型中表现出精确的概率推理能力,但梯度学习如何创建所需的内部几何仍不清楚。本文提供了一种完整的一阶分析,揭示了交叉熵训练如何重塑Transformer注意力头中的注意力评分和值向量。核心结果是注意力评分的『优势路由定律』,以及值的『责任加权更新』。这些方程诱导出正反馈循环,使路由和内容共同专业化:查询更强烈地路由到对其误差信号而言高于平均的值,而这些值被拉向使用它们的查询。本文展示了这种耦合专业化行为类似于两时间尺度EM过程:注意力权重实现E步(软责任),而值实现M步(责任加权原型更新),查询和键调整假设框架。通过受控模拟,包括一个粘性马尔可夫链任务(其中比较了闭合形式EM式更新与标准SGD),证明了相同的梯度动力学在最小化交叉熵的同时,塑造了本文配套工作所识别的低维流形,这些流形实现了贝叶斯推理。这给出了一个统一的画面:优化(梯度流)导致几何(贝叶斯流形),后者又支持功能(上下文概率推理)。

英文摘要

Transformers empirically perform precise probabilistic reasoning in carefully constructed "Bayesian wind tunnels" and in large-scale language models, yet the mechanisms by which gradient-based learning creates the required internal geometry remain opaque. We provide a complete first-order analysis of how cross-entropy training reshapes attention scores and value vectors in a transformer attention head. Our core result is an advantage-based routing law for attention scores, \[ \frac{\partial L}{\partial s_{ij}} = \alpha_{ij}\bigl(b_{ij}-\mathbb{E}_{\alpha_i}[b]\bigr), \qquad b_{ij} := u_i^\top v_j, \] coupled with a responsibility-weighted update for values, \[ \Delta v_j = -\eta\sum_i \alpha_{ij} u_i, \] where $u_i$ is the upstream gradient at position $i$ and $\alpha_{ij}$ are attention weights. These equations induce a positive feedback loop in which routing and content specialize together: queries route more strongly to values that are above-average for their error signal, and those values are pulled toward the queries that use them. We show that this coupled specialization behaves like a two-timescale EM procedure: attention weights implement an E-step (soft responsibilities), while values implement an M-step (responsibility-weighted prototype updates), with queries and keys adjusting the hypothesis frame. Through controlled simulations, including a sticky Markov-chain task where we compare a closed-form EM-style update to standard SGD, we demonstrate that the same gradient dynamics that minimize cross-entropy also sculpt the low-dimensional manifolds identified in our companion work as implementing Bayesian inference. This yields a unified picture in which optimization (gradient flow) gives rise to geometry (Bayesian manifolds), which in turn supports function (in-context probabilistic reasoning).
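The routing law quoted in this abstract can be checked numerically for a single query position. The sketch below is a minimal check under stated assumptions, not the authors' code: it uses a squared-error loss on the attention output as a stand-in for cross-entropy (the law only depends on the upstream gradient $u$), and the names `softmax`, `loss`, `scores`, `values` and `target` are hypothetical. It compares $\alpha_j\,(b_j - \mathbb{E}_\alpha[b])$ with a central finite-difference gradient with respect to the scores.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def loss(scores, values, target):
    # Attention output for one query, followed by a squared-error loss.
    out = softmax(scores) @ values
    return 0.5 * np.sum((out - target) ** 2)

rng = np.random.default_rng(0)
n, d = 5, 3
scores = rng.normal(size=n)
values = rng.normal(size=(n, d))
target = rng.normal(size=d)

alpha = softmax(scores)
u = alpha @ values - target             # upstream gradient dL/d(out)
b = values @ u                          # b_j = u^T v_j
analytic = alpha * (b - alpha @ b)      # advantage-based routing law

eps = 1e-6
numeric = np.array([
    (loss(scores + eps * e, values, target)
     - loss(scores - eps * e, values, target)) / (2 * eps)
    for e in np.eye(n)
])
print(np.max(np.abs(analytic - numeric)))
```

Because the advantages are centred under $\alpha$, the score gradients sum to zero: gradient descent on the scores moves attention mass between positions rather than changing the total.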

2512.22471 2026-05-19 cs.LG cs.AI stat.ML

The Bayesian Geometry of Transformer Attention

Transformer 注意力的贝叶斯几何

Naman Agarwal, Siddhartha R. Dalal, Vishal Misra

AI总结 本文通过构建贝叶斯风洞,验证了Transformer在上下文中的贝叶斯推理能力,发现其通过几何机制实现后验更新与路由,揭示了注意力机制的必要性及扁平架构的不足。

详情
Comments
v2: Add dual-entropy measurement framework ($H_I$, $H_P$, $\rho = H_P/H_I$); incorporate Overleaf revisions; fix duplicate bibliography entries (akyurek mashup; openai title; legacy aliases removed)
AI中文摘要

Transformer似乎能在上下文中进行贝叶斯推理,但一直无法对此进行严格验证:自然数据缺乏解析后验,而大模型又将推理与记忆混为一谈。为此,我们构建了贝叶斯风洞,即真实后验有闭式解且可证明无法靠记忆完成任务的可控环境。在这些设置中,小型Transformer以 $10^{-3}$ 至 $10^{-4}$ 比特的精度再现贝叶斯后验,而容量匹配的MLP则相差数个数量级,确立了明确的架构分离。在双射消除和隐马尔可夫模型(HMM)状态跟踪这两个任务中,我们发现Transformer通过一致的几何机制实现贝叶斯推理:残差流充当信念载体,前馈网络执行后验更新,注意力提供内容可寻址的路由。几何诊断揭示了正交的键基、逐步增强的查询-键对齐,以及由后验熵参数化的低维值流形。训练过程中该流形逐渐展开,而注意力模式保持稳定,这正是近期梯度分析所预测的框架-精度解离。综合来看,这些结果表明分层注意力通过几何设计实现贝叶斯推理,既解释了注意力的必要性,也解释了扁平架构的失败。贝叶斯风洞为在机制层面把小型可验证系统与大语言模型中观察到的推理现象联系起来提供了基础。

英文摘要

Transformers often appear to perform Bayesian reasoning in context, but verifying this rigorously has been impossible: natural data lack analytic posteriors, and large models conflate reasoning with memorization. We address this by constructing Bayesian wind tunnels: controlled environments where the true posterior is known in closed form and memorization is provably impossible. In these settings, small transformers reproduce Bayesian posteriors with $10^{-3}$ to $10^{-4}$ bit accuracy, while capacity-matched MLPs fail by orders of magnitude, establishing a clear architectural separation. Across two tasks, bijection elimination and Hidden Markov Model (HMM) state tracking, we find that transformers implement Bayesian inference through a consistent geometric mechanism: residual streams serve as the belief substrate, feed-forward networks perform the posterior update, and attention provides content-addressable routing. Geometric diagnostics reveal orthogonal key bases, progressive query-key alignment, and a low-dimensional value manifold parameterized by posterior entropy. During training this manifold unfurls while attention patterns remain stable, a frame-precision dissociation predicted by recent gradient analyses. Taken together, these results demonstrate that hierarchical attention realizes Bayesian inference by geometric design, explaining both the necessity of attention and the failure of flat architectures. Bayesian wind tunnels provide a foundation for mechanistically connecting small, verifiable systems to reasoning phenomena observed in large language models.

2511.11054 2026-05-19 math.ST stat.TH

Tuning free Catoni type joint robust estimation

无需调谐的Catoni类型联合鲁棒估计

Xiang Li, Jun S. Liu, Qiang Sun, Lihu Xu

AI总结 本文提出一种无需调谐的Catoni类型联合估计框架,用于具有厚尾噪声的参数模型,同时估计目标参数和未知噪声方差。在均值估计、线性回归和ℓ₂惩罚回归三种经典设置中应用该框架,并建立非渐近性子高斯型偏差界,证明其在厚尾情况下最优。

详情
AI中文摘要

本文提出了一种无需调谐的Catoni类型联合估计框架,用于具有厚尾噪声的参数模型,同时估计目标参数和未知噪声方差。在均值估计、线性回归和ℓ₂惩罚回归三种经典设置中应用该框架,并建立非渐近性子高斯型偏差界,证明其在厚尾情况下最优。

英文摘要

This paper develops a Catoni-type joint (tuning-free) estimation framework for parametric models with heavy-tailed noise, in which the target parameter and the unknown noise variance are estimated simultaneously through a system of two coupled Catoni-type estimating equations. We instantiate the framework in three canonical settings: mean estimation, linear regression, and $\ell_{2}$-penalized regression. Theoretically, we establish non-asymptotic, sub-Gaussian-type deviation bounds that hold jointly for the target parameter and the variance estimator, under only a finite $2\beta$-th moment assumption with $\beta\in (1,2]$. The resulting rates match, up to absolute constants, those of oracle procedures that know the variance in advance, thereby attaining optimality in the heavy-tailed regime. Methodologically, because the coupled equations are intrinsically non-convex and non-linear, classical convex M-estimation arguments are inapplicable. We develop a new analytical toolkit based on the Poincaré-Miranda theorem. The resulting proof strategy is of independent methodological interest, and we expect it to be applicable to a broad class of other statistical problems in which several parameters of heterogeneous nature must be estimated jointly.
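For orientation, the classical single-equation Catoni mean estimator, which this abstract builds on, can be sketched as follows. This is the oracle version that assumes the variance is known; it does not reproduce the paper's coupled tuning-free system. The function names, the scale choice `alpha = sqrt(2 log(1/delta) / (n * variance))`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def catoni_psi(x):
    # Catoni's influence function: odd, nondecreasing, logarithmic growth.
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x ** 2)

def catoni_mean(x, variance, delta=0.05, tol=1e-10):
    # Solve sum_i psi(alpha * (x_i - theta)) = 0 for theta by bisection.
    n = len(x)
    alpha = np.sqrt(2.0 * np.log(1.0 / delta) / (n * variance))
    lo, hi = float(x.min()), float(x.max())
    # The sum is nonincreasing in theta, >= 0 at lo and <= 0 at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if catoni_psi(alpha * (x - mid)).sum() > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 1000.0])  # one gross outlier
estimate = catoni_mean(data, variance=1.0)
print(estimate, data.mean())
```

The logarithmic growth of `catoni_psi` bounds the pull of the outlier, so the estimate stays far closer to the bulk of the data than the sample mean does. The need to supply `variance` is exactly the tuning dependence that the joint framework in the abstract aims to remove.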

2509.15480 2026-05-19 stat.ME stat.AP

A tree-based kernel for densities and its applications in clustering DNase-seq profiles

基于树的密度核及其在聚类DNase-seq轮廓中的应用

Yuliang Xu, Kaixuan Luo, Li Ma

AI总结 本文提出一种非参数密度核,用于在分层框架中建模多重采样密度,以捕捉TF足迹的复杂空间依赖性,提升聚类准确性。

详情
AI中文摘要

在分层框架中建模多个采样密度可跨样本借用信息。这些密度随机效应可作为潜在变量模型中的核,表示可交换的子群或聚类。这些核的关键特性是它们诱导的(函数)协方差,这决定了混合模型中密度如何分组。我们的动机问题是在高通量DNase-seq实验中聚类染色质可及性轮廓以检测转录因子(TF)结合。TF结合通常产生具有空间模式的足迹轮廓,产生基因组位置间的长程依赖性。现有非参数分层模型施加了限制性的协方差假设,无法容纳此类依赖性,常导致生物上无信息的聚类。我们提出了一种非参数密度核,足以捕捉多样的协方差结构,并适应各种TF足迹的空间模式。该核通过带稀疏精度矩阵的多元logit-normal模型指定二进树分裂概率。使用该核的潜在变量模型的贝叶斯推断通过吉布斯采样与Polya-Gamma增强实现。广泛的模拟显示,我们的核显著提高了聚类准确性。我们将所提出的混合模型应用于ENCODE项目中的DNase-seq数据,得到了对应于两种常见TF结合事件的、具有生物学意义的聚类。

英文摘要

Modeling multiple sampling densities within a hierarchical framework enables borrowing of information across samples. These density random effects can act as kernels in latent variable models to represent exchangeable subgroups or clusters. A key feature of these kernels is the (functional) covariance they induce, which determines how densities are grouped in mixture models. Our motivating problem is clustering chromatin accessibility profiles from high-throughput DNase-seq experiments to detect transcription factor (TF) binding. TF binding typically produces footprint profiles with spatial patterns, creating long-range dependency across genomic locations. Existing nonparametric hierarchical models impose restrictive covariance assumptions and cannot accommodate such dependencies, often leading to biologically uninformative clusters. We propose a nonparametric density kernel flexible enough to capture diverse covariance structures and adaptive to various spatial patterns of TF footprints. The kernel specifies dyadic tree splitting probabilities via a multivariate logit-normal model with a sparse precision matrix. Bayesian inference for latent variable models using this kernel is implemented through Gibbs sampling with Polya-Gamma augmentation. Extensive simulations show that our kernel substantially improves clustering accuracy. We apply the proposed mixture model to DNase-seq data from the ENCODE project, which results in biologically meaningful clusters corresponding to binding events of two common TFs.

2509.01629 2026-05-19 stat.ML cs.LG cs.NA math.NA

Lipschitz-Guided Design of Interpolation Schedules in Generative Models

基于Lipschitz性的生成模型插值调度设计

Yifan Chen, Eric Vanden-Eijnden, Jiawei Xu

AI总结 本文研究了生成模型中插值调度的设计,从统计和数值角度出发,提出通过最小化漂移场的平均平方Lipschitz性来设计调度,以提升生成模型的稳定性与准确性。

详情
AI中文摘要

我们从统计和数值角度研究了流和扩散生成模型中插值调度的设计。在随机插值框架下,我们证明在最优后验调优扩散系数后,标量插值调度在路径空间的Kullback-Leibler散度下是统计等价的。这一等价性促使我们关注漂移场的数值特性而非纯统计标准。我们提出最小化漂移的平均平方Lipschitz性作为调度设计的原理性标准,与最优传输中的动能最小化形成对比。一个简单的转换公式将一个调度的漂移表示为另一个调度的漂移,允许在不同(如线性)调度训练的模型上进行推断而不需重新训练。我们为高斯和高斯混合目标分析了最优调度:对于高斯分布,我们获得比线性调度在Lipschitz常数上指数级改进的调度;对于高斯混合,我们获得在少量步采样中缓解模式崩溃的调度。我们随后在高维不变测度的随机Allen-Cahn和Navier-Stokes方程中验证了该方法,其中设计的调度在固定积分器预算下显著提高了细粒度统计的准确性。

英文摘要

We study the design of interpolation schedules in flow and diffusion-based generative models from both statistical and numerical perspectives. Within the stochastic interpolants framework, we first show that scalar interpolation schedules are statistically equivalent under the Kullback--Leibler divergence in path space, after optimal a posteriori tuning of the diffusion coefficient. This equivalence motivates focusing on numerical properties of the drift field rather than purely statistical criteria. We propose minimizing the averaged squared Lipschitzness of the drift as a principled criterion for schedule design, in contrast with kinetic-energy minimization in optimal transport. A simple transfer formula expresses the drift of one schedule in terms of the drift of another, allowing the designed schedule to be used at inference time with a model trained under a different (e.g., linear) schedule, without retraining. We work out the optimal schedules analytically for Gaussian and Gaussian-mixture targets: for Gaussians, we obtain exponential improvements in the Lipschitz constant over linear schedules; for Gaussian mixtures, we obtain schedules that mitigate mode collapse in few-step sampling. We then validate the approach on high-dimensional invariant measures of stochastic Allen--Cahn and Navier--Stokes equations, where the designed schedule yields markedly more accurate fine-scale statistics at fixed integrator budget.

2507.21334 2026-05-19 stat.ML cs.LG

Graph neural networks for residential location choice: connection to classical logit models

图神经网络在住宅选址选择中的应用:与经典logit模型的联系

Zhanhong Cheng, Lingqian Hu, Yuheng Bu, Yuqi Zhou, Shenhao Wang

AI总结 本文提出基于图神经网络的住宅选址选择模型,通过捕捉空间替代关系,优于传统模型,展现深度学习与离散选择模型结合的潜力。

详情
AI中文摘要

研究人员已采用深度学习进行经典离散选择分析,因其能捕捉复杂特征关系并提高预测性能。然而,现有深度学习方法无法显式捕捉选择替代品之间的关系,这在经典离散选择模型中一直是重点。为解决这一差距,本文引入图神经网络(GNN)作为新框架分析住宅选址选择。GNN-DCMs提供了一种结构化方法,使神经网络能捕捉空间替代品间的依赖关系,同时保持与经典随机效用理论的明确联系。理论上,证明GNN-DCMs包含嵌套logit(NL)模型和空间相关logit(SCL)模型作为特定情况,通过替代品效用间的消息传递获得新的算法解释。实证上,GNN-DCMs在预测芝加哥77个社区区的住宅选址选择中优于基准MNL、SCL和前馈神经网络。在模型解释方面,GNN-DCMs能捕捉个体异质性和空间感知的替代模式。总体而言,这些结果突显了GNN-DCMs作为统一且表达性强的框架,可整合离散选择建模和深度学习,在复杂空间选择情境中的潜力。

英文摘要

Researchers have adopted deep learning for classical discrete choice analysis as it can capture complex feature relationships and achieve higher predictive performance. However, existing deep learning approaches cannot explicitly capture the relationship among choice alternatives, which has been a long-standing focus in classical discrete choice models. To address the gap, this paper introduces graph neural networks (GNNs) as a novel framework to analyze residential location choice. The GNN-based discrete choice models (GNN-DCMs) offer a structured approach for neural networks to capture dependence among spatial alternatives, while maintaining clear connections to classical random utility theory. Theoretically, we demonstrate that the GNN-DCMs incorporate the nested logit (NL) model and the spatially correlated logit (SCL) model as two specific cases, yielding a novel algorithmic interpretation through message passing among alternatives' utilities. Empirically, the GNN-DCMs outperform benchmark MNL, SCL, and feedforward neural networks in predicting residential location choices among Chicago's 77 community areas. Regarding model interpretation, the GNN-DCMs can capture individual heterogeneity and exhibit spatially-aware substitution patterns. Overall, these results highlight the potential of GNN-DCMs as a unified and expressive framework for synergizing discrete choice modeling and deep learning in complex spatial choice contexts.

2507.09148 2026-05-19 stat.ML cs.LG math.OC

A Randomized Algorithm for Sparse PCA based on the Basic SDP Relaxation

基于基本SDP松弛的稀疏PCA随机算法

Alberto Del Pia, Dekun Zhou

AI总结 本文提出基于基本SDP松弛的稀疏PCA随机近似算法,通过构造确定性和随机性解并输出最优解,实现高概率下的稀疏性常数近似比,并在特定条件下保证近似比受对数约束。

详情
Comments
29 pages, 2 figures
AI中文摘要

稀疏主成分分析(SPCA)是一种用于降维的基本技术,属于NP难问题。本文介绍了一种基于基本SDP松弛的随机近似算法,该算法通过构造确定性稀疏解和多个随机解,并输出最优解。该算法在足够多次调用时,近似比最多为稀疏常数。在技术假设下,平均近似比受O(log d)约束,其中d为特征数。我们证明若SDP解低秩或具有指数衰减特征值,则该技术假设成立。我们还展示了两类实例满足该假设,并在协方差模型中证明确定性解可达到近优近似比。通过在真实数据集上的数值测试验证了算法的有效性。

英文摘要

Sparse Principal Component Analysis (SPCA) is a fundamental technique for dimensionality reduction, and is NP-hard. In this paper, we introduce a randomized approximation algorithm for SPCA, which is based on the basic SDP relaxation. Our algorithm takes an (approximate) SDP solution, constructs one deterministic sparse solution and several randomized solutions, and outputs the best among them. Our algorithm has an approximation ratio of at most the sparsity constant with high probability, if called enough times. Under a technical assumption, which is consistently satisfied in our numerical tests, the average approximation ratio is also bounded by $\mathcal{O}(\log{d})$, where $d$ is the number of features. We show that this technical assumption is satisfied if the SDP solution is low-rank, or has exponentially decaying eigenvalues. We then present two classes of instances for which this technical assumption holds. We also demonstrate that in a covariance model, which generalizes the spiked Wishart model, the deterministic solution in our algorithm achieves a near-optimal approximation ratio. We demonstrate the efficacy of our algorithm through numerical tests on real-world datasets.

2506.11229 2026-05-19 stat.ME physics.ed-ph

Advancing clustering methods in physics education research: A case for mixture models

推动物理教育研究中的聚类方法:混合模型的案例

Minghui Wang, Meagan Sundstrom, Karen Nylund-Gibson, Marsha Ing

AI总结 本文探讨了混合模型在物理教育研究中的应用,对比了k-modes聚类与潜在类别分析的理论差异,并通过平行分析展示其在解决相同研究问题时的异同。

详情
AI中文摘要

聚类方法常用于物理教育研究(PER)中,以识别具有相似响应模式或特征的个体子群体。k-means(或k-modes,用于分类数据)是PER中最常用的聚类方法之一。然而,该算法并非基于模型:它依赖于算法划分,并将个体分配到子群体中具有确定隶属关系。研究人员还必须进行事后分析,以将子群体隶属关系与其他变量相关联。混合模型提供了一种基于模型的替代方法,能够考虑分类误差,并允许研究人员将子群体隶属关系直接整合到更广泛的潜在变量框架中。本文概述了k-modes聚类与潜在类别分析(一种用于分类数据的混合模型类型)之间的理论相似性和差异。我们还使用每种方法进行平行分析,以解决相同的研究问题,以展示这些相似性和差异。我们为有兴趣使用混合模型的研究人员提供了数据和R代码,以复制本文中所展示的示例。

英文摘要

Clustering methods are often used in physics education research (PER) to identify subgroups of individuals within a population who share similar response patterns or characteristics. K-means (or k-modes, for categorical data) is one of the most commonly used clustering methods in PER. This algorithm, however, is not model-based: it relies on algorithmic partitioning and assigns individuals to subgroups with definite membership. Researchers must also conduct post-hoc analyses to relate subgroup membership to other variables. Mixture models offer a model-based alternative that accounts for classification errors and allows researchers to directly integrate subgroup membership into a broader latent variable framework. In this paper, we outline the theoretical similarities and differences between k-modes clustering and latent class analysis (one type of mixture model for categorical data). We also present parallel analyses using each method to address the same research questions in order to demonstrate these similarities and differences. We provide the data and R code to replicate the worked example presented in the paper for researchers interested in using mixture models.

2506.10959 2026-05-19 cs.LG cs.AI math.ST stat.TH

Understanding In-Context Learning on Structured Manifolds: Bridging Attention to Kernel Methods

在结构流形上理解上下文学习:连接注意力机制与核方法

Zhaiming Shen, Alexander Hsu, Rongjie Lai, Wenjing Liao

AI总结 本文研究了在结构几何数据上上下文学习的理论,通过将注意力机制与核方法联系,揭示了transformers在流形上进行核预测的机制,并推导了泛化误差界。

详情
AI中文摘要

尽管上下文学习(ICL)在自然语言和视觉领域取得了显著成功,但其在结构几何数据中的理论理解仍不明确。本文首次对ICL在流形上回归Hölder函数的理论进行了研究。我们建立了注意力机制与经典核方法之间的新联系,证明transformers通过与提示的交互在新查询上进行基于核的预测。这一联系通过数值实验得到验证,显示学习的查询-提示分数与高斯核高度相关。基于此见解,我们推导了泛化误差界,以提示长度和训练任务数量为变量。当观察到足够多的训练任务时,transformers在流形上实现Hölder函数的最小最大回归率,该速率与提示长度呈指数关系,指数取决于流形的内在维度,而非外蕴空间维度。我们的结果还描述了泛化误差随训练任务数量的变化,揭示了transformers作为上下文核算法学习器的复杂性。我们的发现为理解几何在ICL中的作用提供了基础见解,并为研究非线性模型的ICL提供了新工具。

英文摘要

While in-context learning (ICL) has achieved remarkable success in natural language and vision domains, its theoretical understanding, particularly in the context of structured geometric data, remains unexplored. This paper initiates a theoretical study of ICL for regression of Hölder functions on manifolds. We establish a novel connection between the attention mechanism and classical kernel methods, demonstrating that transformers effectively perform kernel-based prediction at a new query through its interaction with the prompt. This connection is validated by numerical experiments, revealing that the learned query-prompt scores for Hölder functions are highly correlated with the Gaussian kernel. Building on this insight, we derive generalization error bounds in terms of the prompt length and the number of training tasks. When a sufficient number of training tasks are observed, transformers give rise to the minimax regression rate of Hölder functions on manifolds, which scales exponentially with respect to the prompt length with the exponent depending on the intrinsic dimension of the manifold, rather than the ambient space dimension. Our result also characterizes how the generalization error scales with the number of training tasks, shedding light on the complexity of transformers as in-context kernel algorithm learners. Our findings provide foundational insights into the role of geometry in ICL and novel tools to study ICL of nonlinear models.
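The attention-kernel connection described here has a simple exact instance. The sketch below is an illustration under assumptions, not the paper's construction: the scores are set by hand to $-\lVert x - x_i\rVert^2 / (2h^2)$ rather than learned, and the function names, bandwidth `h`, and circle data are hypothetical. With those scores, a single softmax attention readout over the prompt equals Gaussian-kernel (Nadaraya-Watson) regression at the query.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attention_predict(query, keys, values, h):
    # Scores are negative squared distances scaled by the bandwidth h.
    scores = -np.sum((keys - query) ** 2, axis=1) / (2 * h ** 2)
    return softmax(scores) @ values

def gaussian_kernel_regression(query, xs, ys, h):
    # Nadaraya-Watson estimator with a Gaussian kernel.
    w = np.exp(-np.sum((xs - query) ** 2, axis=1) / (2 * h ** 2))
    return (w @ ys) / w.sum()

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=50)
prompt_x = np.column_stack([np.cos(angles), np.sin(angles)])  # points on a circle
prompt_y = np.sin(3.0 * angles)                               # smooth target values
query = np.array([np.cos(1.0), np.sin(1.0)])

pred_attn = attention_predict(query, prompt_x, prompt_y, h=0.3)
pred_kernel = gaussian_kernel_regression(query, prompt_x, prompt_y, h=0.3)
print(pred_attn, pred_kernel)
```

The prompt points lie on a one-dimensional manifold embedded in the plane, echoing the abstract's setting in which the intrinsic dimension, rather than the ambient one, governs the regression rate.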

2505.12181 2026-05-19 stat.ME

Reliable fairness auditing with semi-supervised inference

基于半监督推断的可靠公平性审计

Jianhui Gao, Jessica Gronsbell

AI总结 本文提出Infairness框架,利用半监督推断在有限标注数据下实现公平性审计,通过回归与非线性基函数填补缺失结果,提升估计鲁棒性和效率,实验证明其在医疗数据中显著降低方差。

详情
AI中文摘要

机器学习模型常表现出加剧生物医学应用不平等的偏见。公平性审计是评估模型在子群体表现的关键步骤,但通常依赖大量标注数据,成本高且耗时。本文引入Infairness框架,结合小规模标注数据与大规模未标注数据,通过回归与精心选择的非线性基函数填补缺失结果,实现广泛公平性标准的审计。理论和实证分析显示,所提估计器对ML或填补模型的规格具有鲁棒性,并且在仅使用标注数据的监督估计基础上显著更高效。在两个真实世界公平性审计中,Infairness将方差降低约50%,证明其在有限标注数据下的可靠性。

英文摘要

Machine learning (ML) models often exhibit bias that can exacerbate inequities in biomedical applications. Fairness auditing, the process of evaluating a model's performance across subpopulations, is critical for identifying and mitigating these biases. However, audits typically rely on large volumes of labeled data, which are costly and labor-intensive to obtain. To address this challenge, we introduce $\textit{Infairness}$, a unified framework for auditing a wide range of fairness criteria using semi-supervised inference. Our approach combines a small labeled dataset with a large unlabeled dataset by imputing missing outcomes via regression with carefully selected nonlinear basis functions. Through extensive theoretical and empirical analyses, we show that our proposed estimator is (i) robust to specification of the ML or imputation model and (ii) substantially more efficient than supervised estimation based solely on the labeled data. In two real-world fairness audits using electronic health record and medical imaging data, Infairness reduces variance by approximately 50% compared to supervised estimation, underscoring its value for reliable fairness auditing with limited labeled data.

2505.03205 2026-05-19 cs.LG cs.NA math.NA math.ST stat.TH

Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights

用于噪声和任务级流形学习的Transformer:近似和泛化见解

Zhaiming Shen, Alex Havrilla, Rongjie Lai, Alexander Cloninger, Wenjing Liao

AI总结 本文研究了Transformer在噪声和任务级流形上的学习性能,证明了其在低维结构中泛化能力与任务级流形的内在维度密切相关。

详情
AI中文摘要

Transformers作为大语言和视频生成模型的基础架构,如GPT、BERT、SORA及其后续模型。实证研究表明,现实数据和学习任务具有低维结构,伴有噪声或测量误差。Transformer的性能依赖于数据/任务的内在维度,但理论理解仍待探索。本文通过分析回归任务中接近流形的噪声输入数据,建立了Transformer的理论基础。具体而言,输入数据位于流形的管状邻域中,而真实函数依赖于噪声数据在该流形上的投影,称为任务级流形。我们证明了近似和泛化误差,其关键依赖于任务级流形的内在维度。结果表明,即使输入数据受高维噪声扰动,Transformer仍能利用低复杂度结构进行学习。我们的新证明技术通过Transformer构建基本算术运算的表示,可能具有独立兴趣。

英文摘要

Transformers serve as the foundational architecture for large language and video generation models, such as GPT, BERT, SORA and their successors. Empirical studies have demonstrated that real-world data and learning tasks exhibit low-dimensional structures, along with some noise or measurement error. The performance of transformers tends to depend on the intrinsic dimension of the data/tasks, though a theoretical understanding of this dependence remains largely unexplored for transformers. This work establishes a theoretical foundation by analyzing the performance of transformers for regression tasks involving noisy input data near a manifold. Specifically, the input data are in a tubular neighborhood of a manifold, while the ground truth function depends on the projection of the noisy data onto this manifold, referred to as the task-level manifold. We prove approximation and generalization errors which crucially depend on the intrinsic dimension of the task-level manifold. Our results demonstrate that transformers can leverage low-complexity structures in learning tasks even when the input data are perturbed by high-dimensional noise. Our novel proof technique constructs representations of basic arithmetic operations by transformers, which may hold independent interest.

2501.09015 2026-05-19 stat.ME

Family-wise Error Rate Control with E-values

基于e值的家族错误率控制

Will Hartog, Lihua Lei

AI总结 本文提出基于e值的闭合检验框架,用于控制家族错误率,改进了传统方法在静态和动态设置中的性能,并开发了高效的算法。

详情
Comments
32 pages, 12 figures, 4 algorithms
AI中文摘要

闭合原理是多重检验问题中实现强家族错误率(FWER)控制的标准工具。我们开发了一种基于e值的闭合检验框架,继承了e值的优良性质,这些性质常见于顺序假设检验或不规则参数模型的通用推断中。我们证明了基于e值的闭合检验在静态设置中强控制后验FWER,并在顺序设置中具有更强的任何时间有效和始终有效FWER控制性质。此外,我们扩展了著名的图形方法用于FWER控制(Bretz等,2009),使用e值的加权平均作为局部检验,这是一种比使用逆e值作为p值的加权Bonferroni局部检验更强大的方法。一般来说,闭合检验的计算成本可能呈指数级增长于假设的数量。尽管p值基于的图形方法的计算快捷方式不适用,我们开发了使用动态规划的高效多项式时间算法用于e值基于的图形方法,适用于任何有向无环图,并为e-Holm程序(之前由Vovk和Wang研究)和e-Fallback程序开发了定制算法。

英文摘要

The closure principle is a standard tool for achieving strong family-wise error rate (FWER) control in multiple testing problems. We develop an e-value-based closed testing framework that inherits nice properties of e-values, which are common in settings of sequential hypothesis testing or universal inference for irregular parametric models. We prove that e-value-based closed testing strongly controls the post-hoc FWER in the static setting, and has stronger anytime-valid and always-valid FWER-controlling properties in the sequential setting. Furthermore, we extend the celebrated graphical approach for FWER control (Bretz et al. 2009), using the weighted average of e-values for the local test, a strictly more powerful approach than weighted Bonferroni local tests with inverse e-values as p-values. In general, the computational cost for closed testing can be exponential in the number of hypotheses. Although the computational shortcuts for the p-value-based graphical approach are not applicable, we develop an efficient polynomial-time algorithm using dynamic programming for e-value-based graphical approaches with any directed acyclic graph, and tailored algorithms for the e-Holm procedure previously studied by Vovk and Wang (2021) and the e-Fallback procedure.
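The comparison between the two local tests mentioned above can be sketched as follows. This is a minimal illustration of a single local (intersection) test, not the paper's graphical procedure or its dynamic-programming algorithm; the function names are assumptions, and the weights are assumed nonnegative and summing to one.

```python
def weighted_average_test(e_values, weights, alpha):
    """Reject the intersection hypothesis when the weighted average e-value
    reaches 1/alpha (valid by Markov's inequality, since a weighted average
    of e-values is itself an e-value)."""
    return sum(w * e for w, e in zip(weights, e_values)) >= 1.0 / alpha


def weighted_bonferroni_test(e_values, weights, alpha):
    """Reject when some p_i = min(1, 1/e_i) satisfies p_i <= w_i * alpha."""
    for e, w in zip(e_values, weights):
        p = min(1.0, 1.0 / e) if e > 0 else 1.0
        if p <= w * alpha:
            return True
    return False


# Hypothetical evidence: two moderately large e-values with equal weights.
e_values, weights, alpha = [30.0, 30.0], [0.5, 0.5], 0.05
print(weighted_average_test(e_values, weights, alpha))
print(weighted_bonferroni_test(e_values, weights, alpha))
```

Whenever the Bonferroni version rejects, some term satisfies w_i * e_i >= 1/alpha, so the weighted average also rejects; the example shows a case where only the averaging test rejects because evidence is pooled across hypotheses.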

2501.02475 2026-05-19 stat.CO stat.ME

Tactics for Improving Least Squares Estimation

提升最小二乘估计的策略

Qiang Heng, Hua Zhou, Kenneth Lange

AI总结 本文探讨了高维最小二乘回归中加速计算的策略,包括MM原理、Moreau包络平滑和用于约束估计的近端距离原理,通过迭代加权最小二乘等方法提高计算效率。

详情
AI中文摘要

本文讨论了在高维最小二乘回归中加速计算的策略。这些策略包括:(a) 优化-最小化 (MM) 原理,(b) 通过Moreau包络进行平滑,以及(c) 用于约束估计的近端距离原理。在迭代加权最小二乘中,MM原理可以构造一个替代函数,用调整后的响应值取代样本权重。由此归约为普通最小二乘后,便可在各次迭代中重复使用Gram矩阵及其Cholesky分解。此策略适用于L2E回归和广义线性模型的估计。对于分位数回归等问题,目标函数中的非光滑项可以用其Moreau包络近似替代,并以球面二次函数作为其上界。最后,带有到集合距离惩罚的惩罚回归也受益于这种视角。我们的数值实验验证了去权重和Moreau包络近似的速度和实用性。实现这些实验的Julia软件可在我们的网页上获得。

英文摘要

This paper deals with tactics for fast computation in least squares regression in high dimensions. These tactics include: (a) the majorization-minimization (MM) principle, (b) smoothing by Moreau envelopes, and (c) the proximal distance principle for constrained estimation. In iteratively reweighted least squares, the MM principle can create a surrogate function that trades case weights for adjusted responses. Reduction to ordinary least squares then permits the reuse of the Gram matrix and its Cholesky decomposition across iterations. This tactic is pertinent to estimation in L2E regression and generalized linear models. For problems such as quantile regression, non-smooth terms of an objective function can be replaced by their Moreau envelope approximations and majorized by spherical quadratics. Finally, penalized regression with distance-to-set penalties also benefits from this perspective. Our numerical experiments validate the speed and utility of deweighting and Moreau envelope approximations. Julia software implementing these experiments is available on our web page.
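The deweighting idea above (trading case weights for adjusted responses so that one Gram matrix serves every iteration) can be sketched in a few lines. This is an illustrative toy, not the authors' Julia implementation: it assumes fixed weights in (0, 1], a two-column design (intercept and slope), and an explicit 2x2 solve standing in for the Cholesky factorization; all function names are hypothetical. The surrogate uses the inequality w(y - m)^2 <= (z - m)^2 + c with z = w*y + (1 - w)*m_n, where m_n is the current fitted value.

```python
def solve2(G, b):
    """Solve the symmetric 2x2 system G x = b via the explicit inverse."""
    a, c, d = G[0][0], G[0][1], G[1][1]
    det = a * d - c * c
    return [(d * b[0] - c * b[1]) / det, (a * b[1] - c * b[0]) / det]


def deweighted_wls(x, y, w, iters=500):
    """MM iteration: ordinary least squares on adjusted responses z."""
    # Unweighted Gram matrix of the design [1, x]; formed once and reused.
    G = [[float(len(x)), sum(x)], [sum(x), sum(xi * xi for xi in x)]]
    beta = [0.0, 0.0]
    for _ in range(iters):
        fitted = [beta[0] + beta[1] * xi for xi in x]
        # Adjusted responses replace the case weights.
        z = [wi * yi + (1.0 - wi) * mi for wi, yi, mi in zip(w, y, fitted)]
        beta = solve2(G, [sum(z), sum(xi * zi for xi, zi in zip(x, z))])
    return beta


def direct_wls(x, y, w):
    """Reference solution of the weighted normal equations."""
    G = [[sum(w), sum(wi * xi for wi, xi in zip(w, x))],
         [sum(wi * xi for wi, xi in zip(w, x)),
          sum(wi * xi * xi for wi, xi in zip(w, x))]]
    b = [sum(wi * yi for wi, yi in zip(w, y)),
         sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))]
    return solve2(G, b)


# Hypothetical data: the last case is an outlier and gets a small weight.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 5.2, 6.8, 20.0]
w = [1.0, 1.0, 1.0, 1.0, 0.2]
print(deweighted_wls(x, y, w), direct_wls(x, y, w))
```

In a genuine iteratively reweighted scheme the weights change between iterations, which is exactly where reusing the unweighted Gram matrix pays off; the weights are held fixed here only so the result can be checked against the direct weighted solution.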

2412.19983 2026-05-19 q-fin.RM stat.AP

A Dynamic Spillover Effect Investigation on Cryptocurrency Market Before and After Pandemic

新冠疫情前后加密货币市场动态溢出效应研究

Wenjie Lan

AI总结 本文基于非对称断点方法区分加密货币市场中的风险共振与风险分散关系,分析极端事件下加密货币的风险传播机制,并探讨疫情前后加密货币风险关联的动态演变。

详情
Comments
This paper has been withdrawn because the current version contains errors in the framing and results that may mislead readers. The authors are preparing a corrected manuscript
AI中文摘要

本文基于新开发的非对称断点方法,区分了加密货币市场中的风险共振与风险分散关系,并分析了极端事件下加密货币之间的风险传播机制。此外,通过节点关联和网络结构的视角,本文探讨了疫情前后加密货币风险关联的动态演变关系。同时,通过疫情指标深入分析了加密货币风险运动的驱动机制。研究发现,在新冠爆发的影响下,加密货币之间的风险传播效应变得更加显著。同时,确诊病例的增加加剧了加密货币之间的风险溢出效应,而原油市场与加密货币市场之间的风险共振效应放大了疫情对加密货币的影响。然而,其他金融市场相对独立于加密货币市场。本文从公共卫生危机的角度提出应对加密货币风险传播的策略,为完善加密货币监管机制提供了有益的参考依据。

英文摘要

This paper distinguishes between risk resonance and risk diversification relationships in the cryptocurrency market based on the newly developed asymmetric breakpoint approach, and analyzes the risk propagation mechanism among cryptocurrencies under extreme events. In addition, through the lens of node association and network structure, this paper explores the dynamic evolution of cryptocurrency risk association before and after the pandemic. Furthermore, the driving mechanism of cryptocurrency risk movement is analyzed in depth using pandemic indicators. The findings show that risk propagation among cryptocurrencies became more significant under the influence of the COVID-19 outbreak. At the same time, the increase in the number of confirmed cases exacerbated the risk spillover effect among cryptocurrencies, while the risk resonance effect that exists between the crude oil market and the cryptocurrency market amplified the extent of the outbreak's impact on cryptocurrencies. However, other financial markets are relatively independent of the cryptocurrency market. This study proposes a strategy to deal with the spread of cryptocurrency risks from the perspective of a public health crisis, providing a useful reference for improving the regulatory mechanism of cryptocurrencies.

2411.18234 2026-05-19 cs.LG cs.AI cs.PF stat.CO

Time-Efficient Hybrid Hyperparameter Tuning Approach for Cardiovascular Disease Classification

用于心血管疾病分类的高效混合超参数调优方法

Abhay Kumar Pathak, Mrityunjay Chaubey, Manjari Gupta

AI总结 本文提出一种结合随机搜索和网格搜索的混合超参数调优方法,提升心血管疾病分类模型的准确性和效率,实验表明该方法在性能和计算时间上均优于传统方法。

详情
AI中文摘要

心血管疾病(CVDs)泛指各类严重的心脏疾病,需要准确诊断以防止致命后果。超参数调优在优化机器学习模型性能中起关键作用,通过选择最合适的参数配置来提高准确性、泛化性和可靠性。网格搜索系统地评估预定义的超参数组合,而随机搜索则从搜索空间中随机采样配置,实现更广泛的探索并减少计算成本。因此,在开发分类模型时,高效调优策略至关重要,因为时间和预测能力同样关键。本文提出了一种新的超参数调优方法,用于调优用于CVD分类的机器学习模型。所提出的随机网格搜索结合了随机搜索探索全局空间的能力和网格搜索在最有前途区域的集中和彻底搜索。这种混合方法在探索和利用之间找到最佳平衡,得到一个稳健且省时的机器学习分类模型。在最先进的模型上的实验结果表明,随机网格搜索比传统超参数调优方法表现更好。除了观察到的模型性能提升外,大多数模型的训练所需计算时间也显著减少。研究结果凸显了所提出的随机网格搜索方法在缩短训练时间和提高计算效率方面的优势。所提出的技术在医疗保健领域的机器学习应用中具有重大潜力,能够提供及时且准确的CVDs诊断。

英文摘要

Cardiovascular diseases (CVDs) are serious illnesses of the heart that require accurate diagnosis to prevent fatal consequences. Hyperparameter tuning plays a critical role in optimizing machine learning model performance by selecting the most suitable parameter configurations for improved accuracy, generalization, and reliability. Grid search systematically evaluates predefined hyperparameter combinations, whereas random search samples configurations randomly from the search space, enabling broader exploration at reduced computational cost. An efficient tuning strategy is therefore essential when developing classification models in which time plays a crucial role alongside predictive capability. In this work, we propose a new approach for tuning the hyperparameters of ML models for CVD classification. The proposed randomized grid search combines the power of random search to explore the global space with the focused and exhaustive search of grid search in the most promising areas. This hybrid approach finds an optimal balance between exploration and exploitation and yields a robust and time-efficient ML model for classification settings. Experimental results on state-of-the-art models demonstrate that randomized grid search performs better than traditional hyperparameter tuning methods. In addition to the observed improvement in model performance, the computational time required for training was substantially reduced for most of the models. The results highlight the reduced training time and improved computational efficiency of the proposed randomized grid search method. The proposed technique has significant potential to advance ML applications in healthcare by providing timely and accurate CVD diagnosis.
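One plausible reading of the hybrid strategy above is a two-stage search: random sampling over the whole space, followed by an exhaustive grid in a small neighborhood of the best random point. The sketch below follows that reading; the abstract does not spell out the procedure, so the function name, the `shrink` and `grid_steps` parameters, and the toy loss (a stand-in for cross-validated error) are all assumptions.

```python
import itertools
import random


def random_then_grid_search(loss, bounds, n_random=30, grid_steps=5,
                            shrink=0.1, seed=0):
    """Stage 1: random search over `bounds`. Stage 2: grid search around
    the best random point, within +/- shrink * (range) on each axis."""
    rng = random.Random(seed)
    candidates = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(n_random)]
    coarse_best = min(candidates, key=loss)

    half = max(grid_steps // 2, 1)
    axes = []
    for center, (lo, hi) in zip(coarse_best, bounds):
        step = shrink * (hi - lo) / half
        # Offsets run from -half to +half, so the center itself is on the grid.
        axes.append([min(hi, max(lo, center + k * step))
                     for k in range(-half, half + 1)])
    fine_best = min(itertools.product(*axes), key=loss)
    return coarse_best, fine_best


def toy_loss(params):
    """Hypothetical validation loss with its minimum at (-3, 6)."""
    log_lr, depth = params
    return (log_lr + 3.0) ** 2 + 0.1 * (depth - 6.0) ** 2


coarse, fine = random_then_grid_search(toy_loss, [(-6.0, 0.0), (1.0, 12.0)])
print(coarse, toy_loss(coarse))
print(fine, toy_loss(fine))
```

Because the grid contains the best random point, the second stage can never return a worse configuration than the first; the total cost is `n_random` plus the grid size rather than a full grid over the entire space.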

2410.20319 2026-05-19 stat.ME

High-dimensional partial linear model with trend filtering

高维部分线性模型与趋势过滤

Sang Kyu Lee, Erikka Loftfield, Hyokyoung G. Hong, Haolei Weng

AI总结 本文提出高维部分线性回归模型,结合线性模型的可解释性和非参数方法的适应性,利用趋势过滤处理局部平滑变化,实现最小最大最优率,用于复杂生物数据集中的生物标志物识别。

详情
Journal ref
Lee, S. K., Loftfield, E., Hong, H. G., and Weng, H. (2026) High-dimensional partial linear model with trend filtering, Electronic Journal of Statistics, 20(1), 1800-1850
Comments
52 pages, 8 figures
AI中文摘要

理解饮食、代谢变化与健康结果之间的联系是营养科学和更广泛生物研究的关键焦点。分析如超加工食品(UPF)摄入量与代谢物之间的关系,可为饮食相关疾病和公共健康应用提供潜在生物标志物的洞察。然而,这些分析因高维数据结构和协变量与健康结果之间复杂的、往往非线性关联而具有挑战性。传统线性模型和常规非参数方法往往缺乏灵活性,无法准确捕捉生物数据中的复杂性。为此,我们提出一个高维部分线性回归模型,能够捕捉线性和非线性效应,结合线性模型的可解释性和非参数方法的适应性。我们的模型利用趋势过滤有效处理局部平滑变化,并实现最小最大最优率,使其适用于复杂生物数据集。我们将该模型应用于AARP互动饮食与活动追踪(IDATA)研究的数据,展示了其在识别与UPF摄入量相关的生物标志物方面的实用性,并展示了其在饮食、代谢和健康相关研究中的潜在应用价值。

英文摘要

Understanding the links between diet, metabolic changes, and health outcomes is a key focus in nutritional science and broader biological research. Analyzing relationships, such as those between ultra-processed food (UPF) intake and metabolites, offers insights into potential biomarkers for diet-related diseases and public health applications. However, these analyses are challenging due to high-dimensional data structures and complex, often nonlinear associations between covariates and health outcomes. Traditional linear models and conventional nonparametric methods often lack the flexibility to accurately capture such complexities in biological data. To address these challenges, we propose a high-dimensional partial linear regression model that captures both linear and nonlinear effects, combining the interpretability of linear models with the adaptability of nonparametric approaches. Our model leverages trend filtering to handle local smoothness variations effectively and achieves minimax optimal rates, making it suitable for complex biological datasets. We apply this model to data from the Interactive Diet and Activity Tracking in AARP (IDATA) Study, demonstrating its utility in identifying biomarkers associated with UPF intake and illustrating its potential for broader applications in dietary, metabolic, and health-related research.

2410.07191 2026-05-19 cs.RO cs.LG stat.ME

Curb Your Attention: Causal Attention Gating for Robust Trajectory Prediction in Autonomous Driving

抑制注意力:因果注意力门控用于自动驾驶中的鲁棒轨迹预测

Ehsan Ahmadi, Ray Mercurius, Soheil Alizadeh, Kasra Rezaee, Amir Rasouli

AI总结 本文提出CRiTIC模型,通过因果发现网络识别agent间因果关系,并引入因果注意力门控机制提升轨迹预测的鲁棒性和泛化能力,实验表明模型在对抗非因果扰动时鲁棒性提升54%。

详情
Comments
Accepted ICRA 2025
AI中文摘要

自动驾驶中的轨迹预测模型易受非因果代理的扰动影响,此类扰动可能导致其他代理轨迹预测错误,进而影响自动驾驶决策的安全性和效率。本文提出CRiTIC模型,利用因果发现网络识别过去时间窗口内代理间的因果关系,并引入因果注意力门控机制,以选择性过滤Transformer架构中的信息。在两个自动驾驶基准数据集上进行了大量实验,评估了模型在对抗非因果扰动和泛化能力方面的鲁棒性。实验结果表明,预测鲁棒性可提升54%而对预测准确性影响不大。此外,本文展示了所提模型在跨域性能上的优越泛化能力,达到29%的改进。进一步细节请参见项目页面:https://ehsan-ami.github.io/critic。

英文摘要

Trajectory prediction models in autonomous driving are vulnerable to perturbations from non-causal agents whose actions should not affect the ego-agent's behavior. Such perturbations can lead to incorrect predictions of other agents' trajectories, potentially compromising the safety and efficiency of the ego-vehicle's decision-making process. Motivated by this challenge, we propose $\textit{Causal tRajecTory predICtion}$ $\textbf{(CRiTIC)}$, a novel model that utilizes a $\textit{Causal Discovery Network}$ to identify inter-agent causal relations over a window of past time steps. To incorporate discovered causal relationships, we propose a novel $\textit{Causal Attention Gating}$ mechanism to selectively filter information in the proposed Transformer-based architecture. We conduct extensive experiments on two autonomous driving benchmark datasets to evaluate the robustness of our model against non-causal perturbations and its generalization capacity. Our results indicate that the robustness of predictions can be improved by up to $\textbf{54%}$ without a significant detriment to prediction accuracy. Lastly, we demonstrate the superior domain generalizability of the proposed model, which achieves up to $\textbf{29%}$ improvement in cross-domain performance. These results underscore the potential of our model to enhance both robustness and generalization capacity for trajectory prediction in diverse autonomous driving domains. Further details can be found on our project page: https://ehsan-ami.github.io/critic.

2409.07014 2026-05-19 stat.ML cs.DB cs.LG

A Practical Theory of Generalization in Selectivity Learning

选择性学习中泛化理论的实用性研究

Peizhi Wu, Haoshu Xu, Ryan Marcus, Zachary G. Ives

AI总结 本文从理论与实践角度探讨选择性学习的泛化能力,提出基于有符号测度的可学习预测方法,并改进OOD泛化性能。

详情
Comments
15 pages. Technical Report (Extended Version)
AI中文摘要

查询驱动的机器学习模型已作为一种有前途的查询选择性估计技术出现。然而,从理论角度看,这些技术的有效性仍知之甚少,因为实际解决方案与基于Probably Approximately Correct (PAC) 学习框架的最先进理论之间存在显著差距。本文旨在弥合理论与实践之间的差距。首先,我们证明由符号测度诱导的选择性预测器是可学习的,这放松了PAC理论对概率测度的依赖。更重要的是,在此基础上,我们建立了在温和假设下,此类选择性预测器在分布外(OOD)泛化误差界上的有利表现。这些理论进步为我们提供了对查询驱动选择性学习的分布内和分布外泛化能力的更好理解,并促进了两种改进分布外泛化的通用策略的设计。我们实证验证了我们的技术在预测准确性和查询延迟性能方面显著帮助查询驱动选择性模型泛化到分布外查询,同时保持其优越的分布内泛化性能。

英文摘要

Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as there exist substantial gaps between practical solutions and state-of-the-art (SOTA) theory based on the Probably Approximately Correct (PAC) learning framework. In this paper, we aim to bridge the gaps between theory and practice. First, we demonstrate that selectivity predictors induced by signed measures are learnable, which relaxes the reliance on probability measures in SOTA theory. More importantly, beyond the PAC learning framework (which only allows us to characterize how the model behaves when both training and test workloads are drawn from the same distribution), we establish, under mild assumptions, that selectivity predictors from this class exhibit favorable out-of-distribution (OOD) generalization error bounds. These theoretical advances provide us with a better understanding of both the in-distribution and OOD generalization capabilities of query-driven selectivity learning, and facilitate the design of two general strategies to improve OOD generalization for existing query-driven selectivity models. We empirically verify that our techniques help query-driven selectivity models generalize significantly better to OOD queries both in terms of prediction accuracy and query latency performance, while maintaining their superior in-distribution generalization performance.

2407.07316 2026-05-19 cs.GT math.OC stat.AP

Fast Revenue Maximization

快速收益最大化

Achraf Bahamou, Omar Besbes, Omar Mouchtaki

AI总结 本文研究了基于数据的定价问题,通过有限历史价格数据确定单个物品的价格,量化信息价值并指导高效定价实验。核心方法是将无限维问题转化为一维优化问题,提供有保证的定价策略,并展示在动态定价中如何减少实验次数。

详情
AI中文摘要

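问题定义:我们研究一个数据驱动的定价问题,卖方依据在有限个历史价格上观测到的需求,为单个物品定价。我们的目标是量化此类信息的价值,并在实际约束下指导高效的价格实验。方法/结果:我们的主要方法论贡献是一个精确归约,用以刻画最大最小收益比,即仅使用历史数据可实现的最坏情况收益相对于完全信息下最优收益的比值。该归约将无限维问题转化为可处理的一维优化问题,使我们能够计算具有显式保证的近似最优定价策略,并精确量化历史数据的价值。管理启示:出于限制价格变动的实际约束,我们首先评估局部信息的价值,表明单一价格处收益梯度的符号即可提供重要指导。随后我们利用该框架设计高效的价格实验:提出一种选择下一个待测试价格的方法,以最大化未来的稳健表现,并展示如何在动态定价中大幅减少达到目标收益保证所需的实验次数。最后,我们表明该方法在需求数据含噪声时依然有效,每个价格仅需25至100个样本即可达到近似最优的表现。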

英文摘要

Problem definition: We study a data-driven pricing problem in which a seller sets a price for a single item based on demand observed at a limited number of historical prices. Our goal is to quantify the value of such information and to guide efficient price experimentation under practical constraints. Methodology/results: Our main methodological contribution is an exact reduction that characterizes the maximin revenue ratio, defined as the worst-case revenue achievable using only past data relative to the optimal revenue under full information. This reduction transforms an infinite-dimensional problem into a tractable one-dimensional optimization problem, allowing us to compute near-optimal pricing policies with explicit guarantees and to precisely quantify the value of historical data. Managerial implications: Motivated by practical constraints that limit price changes, we first evaluate the value of local information and show that the sign of the revenue gradient at a single price can provide significant guidance. We then use our framework to design efficient price experiments: we develop a method to select the next price to test so as to maximize future robust performance, and show how to substantially reduce the number of experiments needed to achieve target revenue guarantees in dynamic pricing. Finally, we show that our approach remains effective with noisy demand data, achieving near-optimal performance with as few as 25 to 100 samples per price.

2405.14657 2026-05-19 cs.LG stat.ML

Anchor-Based Heteroscedastic Noise for Preferential Bayesian Optimization

基于锚点的异方差噪声用于偏好贝叶斯优化

Marshal Arijona Sinaga, Julien Martinelli, Samuel Kaski

AI总结 本文提出一种异方差噪声模型用于偏好贝叶斯优化,通过用户提供的可靠示例(锚点)和核密度估计生成用户不确定性图,并推导出风险规避的获取函数,提升风险调整性能。

详情
Comments
Camera-ready version (ProbML 2026)
AI中文摘要

偏好贝叶斯优化(PBO)通过成对比较学习潜在效用,但现有方法假设比较噪声同方差,这在人机交互场景中不足,因为用户可能对某些设计可靠而对其他设计犹豫。本文提出PBO的异方差噪声模型:在优化前,用户提供少量可靠示例(锚点),核密度估计(KDE)将这些锚点转化为输入依赖的用户不确定性图。该图被整合到偏好高斯过程(GP)代理中,并推导出风险规避的获取函数,平衡效用和比较的便利性。进一步证明,风险调整的流行预期效用(EUBO)变体在一步贝叶斯最优性保证上至多加一个常数,且在理想化的独立同分布锚点模型下,KDE估计器具有标准一致性和集中率。在合成问题和人类偏好数据集上的实验显示,改进了风险调整性能,并澄清了锚点放置对方法的影响。

英文摘要

Preferential Bayesian optimization (PBO) learns latent utilities from pairwise comparisons, but most existing methods assume homoscedastic comparison noise. This is inadequate in human-in-the-loop settings, where a user may compare some designs reliably and others only hesitantly. We propose a heteroscedastic noise model for PBO: before optimization, the user provides a small set of reliable examples, called anchors, and a kernel density estimator (KDE) turns these anchors into an input-dependent map of user uncertainty. We incorporate this map into preferential GP surrogates and derive risk-averse acquisition functions that trade off utility and ease of comparison. We further show that a risk-adjusted variant of the popular expected utility of the best option (EUBO) preserves the one-step Bayes-optimality guarantee up to an additive constant, and that under an idealized i.i.d. anchor model the KDE estimator enjoys standard consistency and concentration rates. Experiments on synthetic problems and human-preference datasets show improved risk-adjusted performance and clarify how anchor placement affects the method.

2403.11782 2026-05-19 cs.LG stat.ML

A tutorial on learning from preferences and choices with Gaussian Processes

基于高斯过程的学习偏好与选择教程

Alessio Benavoli, Dario Azzimonti

AI总结 本文介绍了利用高斯过程进行偏好学习的框架,结合经济学和决策理论原理,提出新颖的模型以填补现有文献的空白。

详情
AI中文摘要

偏好建模处于经济学、决策理论、机器学习和统计学的交汇点。通过理解个体的偏好和选择方式,可以构建更符合预期的产品,推动在广泛领域内更高效和个性化应用的发展。本文旨在介绍一个连贯且全面的高斯过程(GPs)偏好学习框架,展示如何将理性原则无缝融入学习过程。通过适当调整似然函数,该框架能够构建包含随机效用模型、辨别极限以及多重冲突效用场景的偏好学习模型。本文在已有研究基础上,同时引入了一些新的基于高斯过程的模型,以解决现有文献中的特定缺口。

英文摘要

Preference modelling lies at the intersection of economics, decision theory, machine learning and statistics. By understanding individuals' preferences and how they make choices, we can build products that closely match their expectations, paving the way for more efficient and personalised applications across a wide range of domains. The objective of this tutorial is to present a cohesive and comprehensive framework for preference learning with Gaussian Processes (GPs), demonstrating how to seamlessly incorporate rationality principles (from economics and decision theory) into the learning process. By suitably tailoring the likelihood function, this framework enables the construction of preference learning models that encompass random utility models, limits of discernment, and scenarios with multiple conflicting utilities for both object- and label-preference. This tutorial builds upon established research while simultaneously introducing some novel GP-based models to address specific gaps in the existing literature.

2310.07983 2026-05-19 cs.LG math.OC stat.ML

Achieving Linear Speedup with ProxSkip in Distributed Stochastic Optimization

通过ProxSkip在分布式随机优化中实现线性加速

Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao

AI总结 本文研究了ProxSkip在非凸设置下的收敛性,证明其在节点数量上实现线性加速,并展示了局部更新对通信效率的提升作用。

详情
AI中文摘要

ProxSkip算法在分布式优化中因其减少通信的效果而受到越来越多的关注。然而,现有分析仅限于强凸设置,无法实现节点数量的线性加速。本文重新审视去中心化ProxSkip,回答了其在非凸设置下的行为及线性加速的可实现性问题。我们为随机非凸、凸和强凸问题提供了统一的收敛分析,揭示了梯度噪声、局部更新、网络连通性和数据异质性如何共同决定收敛行为。到目前为止,这是首次证明去中心化ProxSkip在随机梯度下实现节点数量线性加速的分析。此外,我们的结果表明,局部更新可以有效减少通信频率并提高通信效率。

英文摘要

The ProxSkip algorithm for distributed optimization is gaining increasing attention due to its effectiveness in reducing communication. However, existing analyses of ProxSkip are limited to the strongly convex setting and fail to achieve linear speedup with respect to the number of nodes. Key questions regarding its behavior in the non-convex setting and the achievability of linear speedup remain open. In this paper, we revisit decentralized ProxSkip and answer these questions affirmatively. We provide a unified convergence analysis for stochastic non-convex, convex, and strongly convex problems, revealing how gradient noise, local updates, network connectivity, and data heterogeneity jointly determine the convergence behavior. To the best of our knowledge, this is the first analysis showing that decentralized ProxSkip achieves linear speedup in the number of nodes under stochastic gradients. Moreover, our results demonstrate that local updates can effectively reduce communication frequency and improve communication efficiency.

2308.05534 2026-05-19 stat.ME

Collective Outlier Detection and Enumeration with Conformalized Closed Testing

集体异常检测与枚举的符合化封闭检验

Chiara G. Magnani, Matteo Sesia, Aldo Solari

AI总结 本文提出一种分布无关方法,用于集体异常检测与枚举,结合符合推断和多重检验等思想,通过自动选择分类器和检验程序,有效检测稀疏、弱或隐蔽的异常信号。

详情
AI中文摘要

本文开发了一种灵活的分布无关方法,用于集体异常检测和枚举,适用于即使异常信号稀疏、弱或隐蔽时仍能有效检测的情况。该方法基于最近的符合推断发展,并整合了其他领域的经典思想,包括多重检验、局部最强大和自适应秩检验以及非参数大样本渐近性。关键创新在于开发了一种原理上有效的方法,用于自动选择最适合给定数据集的机器学习分类器和两样本检验程序。通过广泛的实证演示,包括对LHCO高能粒子碰撞数据集的分析,评估了该方法的性能。

英文摘要

This paper develops a flexible distribution-free method for collective outlier detection and enumeration, designed for situations in which the presence of outliers can be detected powerfully even though their precise identification may be challenging due to the sparsity, weakness, or elusiveness of their signals. This method builds upon recent developments in conformal inference and integrates classical ideas from other areas, including multiple testing, locally most powerful and adaptive rank tests, and non-parametric large-sample asymptotics. The key innovation lies in developing a principled and effective approach for automatically choosing the most appropriate machine learning classifier and two-sample testing procedure for a given data set. The performance of our method is investigated through extensive empirical demonstrations, including an analysis of the LHCO high-energy particle collision data set.

2605.16622 2026-05-19 cs.LG math.OC stat.ML

Does Weight Decay Enhance Training Stability?

权重衰减是否增强训练稳定性?

Marius Saether, Amir Kolic, Tomaso Poggio, Pierfrancesco Beneventano

AI总结 本文研究权重衰减对训练动态稳定性的影响机制,发现其通过参数空间动态和损失尖锐度的变化影响训练稳定性,并揭示了架构依赖的相变现象。

详情
Comments
24 pages, 16 figures
AI中文摘要

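在现代深度学习中,权重衰减常被认为能够“稳定”训练动态,这有别于其作为静态正则化惩罚的经典角色。我们研究一个基本问题:权重衰减是否稳定训练动态?如果是,其机制是什么?事实上,文献中对训练稳定性有多种不同但相关的理解。我们通过分析权重衰减在稳定性边缘(Edge of Stability,EoS)处的作用,考察其对参数空间动态和损失尖锐度的影响。我们表明权重衰减能够稳健地减缓渐进锐化(progressive sharpening)。此外,我们发现了一种显著的、依赖于架构的相变现象:在CNN中,权重衰减抑制EoS处的振荡;而在MLP中,增大权重衰减会引发相变,使尖锐度稳定在显著低于理论边界 $\frac{2}{\eta}$ 的阈值处。我们建立了一个能够准确刻画这些现象的数学框架,并指出参数向量与尖锐度梯度的全局对齐是该相变的机制性驱动因素。重要的是,我们表明这些现象可转化为函数空间(NTK)搜索意义下的稳定性。最后,这表明在存在正则化时,由凸/二次启发式得到的曲率阈值未必是可靠的稳定性诊断指标。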

英文摘要

In modern deep learning, weight decay is often credited with "stabilizing" training dynamics, diverging from its classical role as a static regularization penalty. We investigate a fundamental question: *does weight decay stabilize training dynamics, and if so, through which mechanism?* Indeed, training stability is understood through different but related notions in the literature. We consider how weight decay affects the parameter-space dynamics and loss sharpness by analyzing its effects at the *Edge of Stability* (EoS). We show that weight decay robustly slows *progressive sharpening*. Furthermore, we uncover a striking architecture-dependent phase transition. In CNNs, weight decay dampens the oscillations at the EoS, while in MLPs, increasing weight decay causes a phase transition in which the sharpness stabilizes at a threshold significantly below the theoretical $\frac{2}{\eta}$ boundary. We develop a mathematical framework that accurately models these phenomena and identify the global alignment of the parameter vector and the sharpness gradient as the mechanistic driver of the phase transition. Importantly, we show that these phenomena translate into stability in terms of search in function-space (NTK). Last, this shows that curvature thresholds obtained from convex/quadratic heuristics may not be reliable stability diagnostics under regularization.
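For context, the classical quadratic heuristic behind the $2/\eta$ boundary can be checked numerically. The sketch below is only that textbook baseline, which the abstract argues may be unreliable under regularization; it does not reproduce the paper's framework. It assumes plain gradient descent on a one-dimensional quadratic with weight decay implemented as an L2 penalty, and the function name is hypothetical.

```python
def run_gd(curvature, lr, weight_decay=0.0, steps=200, theta0=1.0):
    """Gradient descent on 0.5 * curvature * theta^2 + 0.5 * weight_decay * theta^2.

    Each step multiplies theta by 1 - lr * (curvature + weight_decay), so the
    iterates shrink exactly when curvature + weight_decay < 2 / lr.
    """
    theta = theta0
    for _ in range(steps):
        grad = curvature * theta + weight_decay * theta
        theta -= lr * grad
    return theta


lr = 0.1  # the classical boundary 2 / lr is 20
print(abs(run_gd(curvature=19.0, lr=lr)))                    # below the boundary
print(abs(run_gd(curvature=19.0, lr=lr, weight_decay=2.0)))  # pushed above it
```

In this toy setting the L2 term adds to the curvature seen by the optimizer, so the admissible sharpness of the unregularized loss drops from 2/lr to 2/lr - weight_decay; whether and how such a shift appears in real networks is what the paper investigates.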

2605.16593 2026-05-19 stat.AP econ.EM stat.ML

Policy Learning with Observational Data: The Case of Hepatitis C Treatment for HIV/HCV Co-Infected Patients

基于观测数据的政策学习:HIV/HCV共感染患者丙型肝炎治疗的案例

Raphaël Langevin

AI总结 本文提出在弱假设下通过观测数据推导多行动政策规则的方法,应用于HIV/HCV共感染患者的丙型肝炎治疗,识别出一个无需治疗即有约80%概率自发清除HCV的患者亚组,并发现优化治疗分配可降低成本并提升健康效益。

详情
Comments
74 pages, 10 figures
AI中文摘要

决策者常需从有限的备选方案中选择单一行动,例如医生选择治疗方案。本文展示如何在较弱假设下从观测数据中推导多行动政策规则。条件平均处理效应(CATEs)通过加权K均值算法进行一致估计,前提是每个同质子群内的结果模型设定正确。随后通过标准决策树实施可行的政策规则,并允许完全或不完全的治疗依从性。该方法应用于HIV/HCV共感染患者的丙型肝炎治疗选择,这一领域对现代药物疗法尚无统一指南。结果识别出一个患者亚组,其在未经治疗的情况下自发清除HCV的概率约为80%。估计结果还显示,在已接受治疗的个体之间重新分配治疗,本可将总治疗成本降低360万至490万加元,同时相对于现状仍能提升整体健康效益。这些发现表明,所提出的方法可以生成改进的、数据驱动的治疗指南。

英文摘要

Decision-makers frequently must choose a single action from a finite set of alternatives -- for example, physicians selecting a treatment, investors choosing a portfolio risk level, or judges determining sentences. To improve outcomes, policymakers often issue policy rules or guidelines to inform such choices. In this paper, I show how to generally derive policy rules from observational data in a multi-action framework under relatively weak assumptions about the underlying structure of the heterogeneous sampled population. Conditional average treatment effects (CATEs) are consistently estimated via a weighted K-means algorithm, assuming the outcome model is correctly specified within each homogeneous subgroup. Feasible policy rules are then implemented via a standard decision tree, allowing for both perfect and imperfect adherence to treatment. The methodology is applied to treatment options for Hepatitis C (HCV) among patients co-infected with human immunodeficiency virus (HIV), a setting in which no uniform guideline exists for modern pharmaceutical therapies. The results identify a subgroup of patients with approximately an 80% probability of spontaneous HCV clearance without treatment. Estimation results also show that reallocating treatments among treated individuals could have reduced total treatment costs by CAN$3.6-4.9 million while still increasing aggregate health benefits relative to the status quo. These findings demonstrate that the proposed approach can generate improved, data-driven treatment guidelines for the management of HIV/HCV co-infected patients.

2605.16571 2026-05-19 stat.ML cs.AI cs.LG

Isotonic Survival Regression: Calibrated Survival Distributions from Deep Cox Models

保序生存回归:由深度Cox模型得到校准的生存分布

Anchit Jain, Kevin Zhang, Stephen Bates

AI总结 本文提出一种基于保序回归的事后校准方法,用于校准深度Cox模型预测的生存概率,并通过理论保证和实验验证了其有效性。

详情
AI中文摘要

时间到事件数据在生命科学和工程中普遍存在,但通常伴随删失,这使得标准机器学习方法的应用复杂化。深度Cox模型因能优雅处理删失并可与非结构化数据如临床文本报告、基因组序列和病理图像结合而成为分析时间到事件数据的流行方法。然而,其预测的生存概率往往校准不良,限制了实际应用。本文提出了一种新颖的事后校准方法,利用保序回归来改进预测生存概率而不影响判别能力。我们建立了有利的理论保证,包括双重鲁棒性属性和渐近校准。在合成和真实世界临床数据上的实验展示了我们方法的实证有效性。

英文摘要

Time-to-event data is widespread across the life sciences and engineering, but it is typically encountered together with censoring, which complicates the application of standard machine learning methods. Deep Cox models have emerged as a popular method for analyzing time-to-event data because they gracefully handle censoring and can be used with unstructured data such as clinical text reports, genomic sequences, and pathology images. However, their predicted survival probabilities are often poorly calibrated, thus limiting their practical utility. In this paper, we propose a novel post hoc calibration method for Deep Cox models that uses isotonic regression to refine predicted survival probabilities without affecting discriminative power. We establish favorable theoretical guarantees, including a double-robustness property and asymptotic calibration. Experiments on synthetic and real-world clinical data demonstrate the empirical effectiveness of our method.
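The isotonic-regression step at the core of such a calibration method can be sketched with the pool-adjacent-violators algorithm (PAVA). This is a simplified illustration rather than the paper's estimator: it ignores censoring entirely, treats the outcome as a binary indicator at one fixed time horizon, and assumes distinct predicted values; the function names are hypothetical.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def calibrate(predicted, outcomes):
    """Map each predicted probability to its isotonic-calibrated value.

    Assumption: `outcomes` are fully observed 0/1 indicators (no censoring).
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    fitted = pava([outcomes[i] for i in order])
    calibrated = [0.0] * len(predicted)
    for rank, i in enumerate(order):
        calibrated[i] = fitted[rank]
    return calibrated


# Hypothetical predicted event probabilities and observed indicators.
predicted = [0.9, 0.2, 0.6, 0.4, 0.8]
outcomes = [1, 0, 0, 1, 1]
print(calibrate(predicted, outcomes))
```

Because the fitted map is nondecreasing in the original prediction, the ordering of subjects is preserved up to ties, which is why this kind of recalibration leaves ranking-based discrimination essentially unchanged.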

2605.16570 2026-05-19 stat.CO stat.ML

A Cubing Strategy for Identifying Stable Hyperparameter Regions for Uncertainty Quantification in Spatial Deep Learning

一种用于空间深度学习中不确定性量化稳定超参数区域识别的立方策略

Isaac Amouzou, Ben Seiyon Lee

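AI summary: The paper proposes a cubing-based diagnostic framework that recursively partitions the hyperparameter space to identify stable regions where MC dropout yields well-calibrated predictive intervals, improving uncertainty quantification for spatial deep learning models.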

详情
AI中文摘要

空间参考数据集在许多领域中变得越来越普遍,主要得益于数据收集方法的进步,如卫星遥感。在许多应用中,未观测位置的预测伴随着可靠的不确定性估计。尽管深度学习方法为空间预测提供了可扩展且准确的模型,但在空间深度学习中仍缺乏明确的共识来解决不确定性量化问题。蒙特卡洛(MC)丢弃已成为不确定性量化的流行方法,但现有实现通常专注于调整丢弃率,而固定其他关键超参数,如权重衰减和预测标准差乘数,通常通过随意或手动调整。我们提出了一种基于立方体的诊断框架,通过递归划分超参数空间,以识别MC丢弃产生良好校准预测区间的稳定区域。该方法通过评分规则相对统计基线模型评估超参数区域,该基线模型作为校准锚点。通过涵盖多个空间依赖性制度的模拟研究以及一个大规模的遥感地表温度数据集,我们证明了我们的方法在预测区间上与基线模型相比具有竞争力或更优的表现。我们的方法为从业者提供了一种系统化的方法,将不确定性量化纳入空间深度学习模型中。

英文摘要

Spatially referenced datasets have become increasingly prevalent across many fields, largely driven by advances in data collection methods such as satellite remote sensing. In many applications, predictions at unobserved locations are accompanied by reliable uncertainty estimates. While deep learning methods provide both scalable and accurate models for spatial predictions, there remains no clear consensus for addressing uncertainty quantification in spatial deep learning. Monte Carlo (MC) dropout has become a popular approach for uncertainty quantification, yet existing implementations typically focus on tuning the dropout rate while fixing other influential hyperparameters, such as weight decay and the predictive standard deviation multiplier, often through ad-hoc or manual tuning. We propose a cubing-based diagnostic framework that recursively partitions the hyperparameter space to identify stable regions where MC dropout yields well-calibrated predictive intervals. The approach evaluates hyperparameter regions using scoring rules relative to a statistical baseline model, which serves as a calibration anchor. Through a simulation study spanning multiple spatial dependence regimes as well as a large remotely-sensed land surface temperature dataset, we demonstrate that our approach produces competitive or superior predictive intervals compared to the baseline model. Our methodology provides practitioners with a systematic procedure for incorporating uncertainty quantification into spatial deep learning models.

2605.16486 2026-05-19 stat.ML astro-ph.IM cs.LG

StAD: Stein Amortized Divergence for Fast Likelihoods with Diffusion and Flow

StAD:基于Stein算子的 amortized 散度用于具有扩散和流的快速似然

Gurjeet Jagwani, Stephen Thorp, Sinan Deger, Hiranya Peiris

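AI summary: The paper proposes StAD, which uses the Langevin-Stein operator to predict and learn the divergence of the PF-ODE without computing the Jacobian, improving the efficiency and stability of likelihood prediction.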

详情
Comments
24 pages, 10 figures
AI中文摘要

扩散和流基模型广泛用于生成建模和密度估计。它们允许确定性概率流常微分方程(PF-ODE),类似于连续归一化流(CNFs),描述了概率质量的传输。从这些模型中获得似然对于许多工作流程至关重要,尤其是贝叶斯分析,这需要求解雅可比矩阵的迹来计算学习PF-ODE的发散性,这要么是$\mathcal{O}(D^2)$精确计算,要么是$\mathcal{O}(D)$的噪声估计。我们引入StAD,一种新的蒸馏方法,利用兰格vin-斯坦算子预测和学习PF-ODE的发散性,而无需计算雅可比矩阵。我们证明我们的方法在CIFAR-10、ImageNet和其他密度估计任务上与Hutchinson和Hutch++竞争,一致提高了似然预测的方差和速度,优于Hutchinson。我们还证明我们的方法可以推广到各种生成模型,且在某些正则性条件下,这些学习的向量场可以满足斯坦类。

英文摘要

Diffusion and flow-based models are ubiquitously used for generative modelling and density estimation. They admit a deterministic probability flow ordinary differential equation (PF-ODE), analogous to continuous normalizing flows (CNFs), which describes the transport of the probability mass. Obtaining the likelihood from these models is of interest to many workflows, especially Bayesian analysis, and requires evaluating the trace of the Jacobian to compute the divergence of the learned PF-ODE, which costs either $\mathcal{O}(D^2)$ to compute exactly or $\mathcal{O}(D)$ with a noisy estimate. We introduce StAD, a new distillation method to predict and learn the divergence of the PF-ODE using the Langevin-Stein operator without ever computing the Jacobian. We show that our method is competitive with the Hutchinson and Hutch++ estimators on CIFAR-10, ImageNet and other density estimation tasks, consistently improving the variance and speed of the likelihood predictions compared to the Hutchinson estimator. We additionally show that our method generalizes to a varied class of generative models, and that under some regularity conditions these learned vector fields can be made to satisfy the Stein class.
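
For context, the Hutchinson baseline named in the abstract can be sketched in a few lines. It estimates a trace from matrix-vector products alone, using random Rademacher probes. This shows the baseline only, not StAD. The small matrix `A` is a hypothetical stand-in for the Jacobian of a learned vector field, and in practice `matvec` would be a Jacobian-vector product.

```python
import random


def hutchinson_trace(matvec, dim, num_probes, rng):
    """Estimate tr(A) as the average of z^T A z over Rademacher probes z."""
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        az = matvec(z)  # only products A z are needed, never A itself
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_probes


# Hypothetical 3x3 "Jacobian" used only to exercise the estimator.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 4.0]]


def matvec(v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


exact = sum(A[i][i] for i in range(3))  # 9.0
estimate = hutchinson_trace(matvec, 3, 2000, random.Random(0))
```

The estimate is unbiased but noisy: the off-diagonal entries of `A` contribute variance that shrinks only as more probes are averaged. This is the variance and speed trade-off the abstract refers to.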

2605.16473 2026-05-19 stat.ML cs.LG cs.NA math.NA math.PR

Dimension-Uniform Discretization Analysis of Preconditioned Annealed Langevin Dynamics for Multimodal Gaussian Mixtures

预处理退火 Langevin 动力学在多模高斯混合中的维度均匀离散化分析

Lorenzo Baldassari, Josselin Garnier, Knut Solna, Maarten V. de Hoop

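AI summary: The paper studies the stability of preconditioned annealed Langevin dynamics for Gaussian mixtures, comparing Euler-Maruyama discretization with an exponential-integrator scheme, and proves a dimension-uniform upper bound on the KL divergence for the latter under explicit spectral conditions.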

详情
AI中文摘要

在高维和无穷维设置中,获得稳定的扩散基采样器具有挑战性,因为高频率坐标上的误差累积会使动力学在有限维近似细化时变得不稳定。离散化是此类误差的典型来源,而使用合适的谱衰减预处理是控制其累积的一种方法。本文研究了预处理退火 Langevin 动力学(ALD)应用于高斯混合时的问题。我们首先证明 Euler-Maruyama(EM)离散化通过将退火分数的刚性线性部分用前向 Euler 步处理,施加了将预处理器与退火协方差尺度耦合的稳定性约束。结合确保退火动力学维度均匀控制的条件,该约束迫使初始平滑分布在不同维度上保持均匀接近目标。然后我们考虑了对退火分数的刚性线性部分进行精确积分的指数积分方案。在满足耦合平滑协方差、组件协方差谱和预处理器的显式谱可求和条件时,我们证明了该方案的 KL 散度具有维度均匀的上界。此上界可通过允许足够时间进行退火并相应细化时间网格来使其任意小。重要的是,这些条件允许 KL 散度在不同维度上发散的区域,表明 EM 限制是方案依赖的,而非 ALD 的固有属性。

英文摘要

Obtaining stable diffusion-based samplers in high- and infinite-dimensional settings is challenging because errors can accumulate across high-frequency coordinates and make the dynamics unstable under refinement of the finite-dimensional approximation of the underlying function-space problem. Discretization is a typical source of such errors, and preconditioning with a suitable spectral decay is one way to control their accumulation. In this paper, we study this problem for preconditioned annealed Langevin dynamics (ALD) applied to Gaussian mixtures. We first show that Euler-Maruyama (EM) discretization, by treating the stiff linear part of the annealed score with a forward Euler step, imposes a stability constraint coupling the preconditioner with the annealed covariance scale. Together with the conditions ensuring dimension-uniform control of the annealed dynamics, this constraint forces the initial smoothed law to remain uniformly close to the target across dimensions. We then consider an exponential-integrator scheme that integrates the stiff linear part of the annealed score exactly. Under explicit spectral summability conditions coupling the smoothing covariance, the component covariance spectra, and the preconditioner, we prove a dimension-uniform Kullback-Leibler (KL) bound for this scheme. This bound can be made arbitrarily small, uniformly in dimension, by allowing enough time for annealing and then refining the time mesh accordingly. Importantly, these conditions allow regimes in which the KL divergence between the target and the initial smoothed law diverges with dimension, showing that the restrictions imposed by EM are scheme-dependent rather than intrinsic to ALD.
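
The contrast between the two schemes can be seen in a one-dimensional toy problem. The sketch below tracks the variance of a scalar Ornstein-Uhlenbeck coordinate dX = -a X dt + sqrt(2) dW, whose stationary variance is 1/a, under each update rule. This is an assumed simplification with no annealing, no preconditioner, and no mixture, not the setting analysed in the paper. The function names are hypothetical.

```python
import math


def em_variance(a, h, steps, v0=1.0):
    """Variance under Euler-Maruyama: X <- (1 - a h) X + sqrt(2 h) * noise."""
    v = v0
    for _ in range(steps):
        v = (1.0 - a * h) ** 2 * v + 2.0 * h
    return v


def expint_variance(a, h, steps, v0=1.0):
    """Variance when the linear drift is integrated exactly over each step:
    X <- exp(-a h) X + sqrt((1 - exp(-2 a h)) / a) * noise."""
    decay = math.exp(-2.0 * a * h)
    v = v0
    for _ in range(steps):
        v = decay * v + (1.0 - decay) / a
    return v


stiff, step = 100.0, 0.05  # a * h = 5, beyond the forward-Euler limit a * h < 2
em_result = em_variance(stiff, step, 20)       # grows without bound
exp_result = expint_variance(stiff, step, 20)  # settles at 1 / a = 0.01
```

With a fixed step size, forward Euler is stable only for coordinates with a h < 2. High-frequency (large a) coordinates therefore force either a tiny step or a preconditioner that tames them, whereas the exact linear update is stable for any step size.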

2605.16390 2026-05-19 cs.CV cs.LG stat.ML

Inducing Spatial Locality in Vision Transformers through the Training Protocol

通过训练协议在视觉变换器中诱导空间局部性

Eduardo Santiago Toledo, Asael Fabian Martínez

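AI summary: By comparing training protocols, the study finds that CutMix makes attention in the early layers of Vision Transformers more local, lowering the Mean Attention Distance (MAD).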

详情
AI中文摘要

我们研究了是否可以通过训练协议在从头训练的视觉变换器(ViT)的早期层中诱导空间局部性,而无需大规模预训练。在CIFAR-10、CIFAR-100和Tiny-ImageNet上,我们比较了基线协议与现代协议(AutoAugment/ColorJitter、CutMix和Label Smoothing),通过均值注意力距离(MAD)和归一化熵来表征每个注意力头。在所有三个数据集中,现代协议在早期层产生更局部和更集中的注意力;在CIFAR-100上,最小MAD从0.316(基线)降至0.008(现代)。为了确定这种效果的来源,我们在CIFAR-100上进行了消融研究,分别添加或移除每个组件。结果表明CutMix是实验中的决定性组件:所有包含CutMix的条件均显示MAD为0.024,而所有不包含CutMix的条件仍保持在MAD 0.210。AutoAugment和Label Smoothing对局部性无独立影响。总体而言,这些发现表明,由CutMix诱导的从部分图像区域进行分类的压力,可以促进视觉变换器中局部注意力的出现。

英文摘要

We investigate whether the training protocol can induce spatial locality in the early layers of a Vision Transformer (ViT) trained from scratch, without large-scale pretraining. Keeping the architecture and optimization procedure fixed, we compare a Baseline protocol with a Modern protocol (AutoAugment/ColorJitter, CutMix, and Label Smoothing) on CIFAR-10, CIFAR-100, and Tiny-ImageNet, characterizing each attention head via Mean Attention Distance (MAD) and normalized entropy. Across all three datasets, the Modern protocol produces more local and more concentrated attention in early layers; on CIFAR-100, the minimum MAD drops from 0.316 (Baseline) to 0.008 (Modern). To identify the source of this effect, we conduct an ablation study on CIFAR-100 by adding or removing each component individually. The results identify CutMix as the determining component within our experiments: all conditions with CutMix exhibit MAD 0.024, while all conditions without CutMix remain at MAD 0.210. AutoAugment and Label Smoothing show no independent effect on locality. Taken together, these findings suggest that the pressure to classify from partial image regions, induced by CutMix, can promote the emergence of local attention in Vision Transformers.
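
The two per-head metrics can be sketched as follows. The abstract does not spell out their normalisation, so the definitions here are assumptions: MAD is taken as the attention-weighted Euclidean distance between patch positions, averaged over query patches and divided by the grid diagonal, and the entropy is divided by log(n). The function names are hypothetical.

```python
import math


def mean_attention_distance(attn, grid):
    """attn[q][k]: attention from patch q to patch k (each row sums to 1),
    with patches in row-major order on a grid x grid layout."""
    n = grid * grid
    diag = math.hypot(grid - 1, grid - 1)  # longest possible patch distance
    total = 0.0
    for q in range(n):
        qr, qc = divmod(q, grid)
        for k in range(n):
            kr, kc = divmod(k, grid)
            total += attn[q][k] * math.hypot(qr - kr, qc - kc)
    return total / (n * diag)


def normalized_entropy(row):
    """Shannon entropy of one attention row, scaled to lie in [0, 1]."""
    h = sum(-p * math.log(p) for p in row if p > 0.0)
    return h / math.log(len(row))


grid = 4
n = grid * grid
identity = [[1.0 if q == k else 0.0 for k in range(n)] for q in range(n)]
uniform = [[1.0 / n] * n for _ in range(n)]

local_mad = mean_attention_distance(identity, grid)   # 0.0: fully local head
global_mad = mean_attention_distance(uniform, grid)   # larger: global head
```

Under these definitions a head that attends only to its own patch scores 0 on both metrics, and a head with uniform attention has entropy 1, so lower values indicate more local and more concentrated attention.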

2605.16383 2026-05-19 cs.CV cs.AI stat.ML

A Neurosymbolic Approach with Epistemic Deep Learning for Hierarchical Image Classification

一种结合知识符号学习与认知深度学习的分层图像分类方法

Ezel Kilicdere, Shireen Kudukkil Manchingal, Fabio Cuzzolin

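AI summary: The paper proposes a unified neurosymbolic and epistemic modelling framework that augments Swin Transformers with focal set reasoning and differentiable fuzzy logic, maintaining accuracy while improving calibration and logical consistency in hierarchical image classification.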

详情
Comments
36 pages
AI中文摘要

深度神经网络在图像分类任务中实现高精度,但往往产生过于自信的预测,无法表达认知不确定性,并违反数据中存在的逻辑或结构约束。这些局限性在分层分类中尤为明显,因为细粒度和粗粒度的预测必须保持一致。本文首次提出一种统一的神经符号和认知建模框架,通过融合Swin Transformer、焦点集推理和可微模糊逻辑,将标签视为孤立类别,而是在学习的嵌入空间中诱导数据驱动的焦点集,帮助捕捉多个可能细粒度类别的认知不确定性。这些焦点集构成了一个基于信念理论的层,利用模糊隶属函数和t-范数合取来鼓励细粒度和粗粒度预测之间的一致性。可学习的损失进一步平衡校准、质量正则化和逻辑一致性,使模型能够自适应地权衡符号结构与数据驱动的证据。在分层图像分类实验中,本文框架在与Transformer基线相当的准确性的同时,提供更校准和可解释的预测,减少过度自信并强制在分层输出中保持高逻辑一致性。实验结果表明,结合焦点集推理与模糊逻辑为深度学习模型提供了实际步骤,使其既准确又具有认知意识。

英文摘要

Deep neural networks achieve high accuracy on image classification tasks. Yet they often produce overconfident predictions, which fail to express epistemic uncertainty and frequently violate logical or structural constraints present in the data. These limitations are particularly pronounced in hierarchical classification, where predictions across fine and coarse levels must remain coherent. We propose, for the first time, a unified neurosymbolic and epistemic modelling framework that augments Swin Transformers with focal set reasoning and differentiable fuzzy logic. Rather than treating labels as isolated categories, our method induces data-driven focal sets within the learnt embedding space, which helps capture epistemic uncertainty over multiple plausible fine-grained classes. These focal sets form the basis of a belief-theoretic layer that uses fuzzy membership functions and t-norm conjunctions to encourage consistency between fine- and coarse-grained predictions. A learnable loss further balances calibration, mass regularisation, and logical consistency, allowing the model to adaptively trade off symbolic structure with data-driven evidence. In experiments on hierarchical image classification, our framework maintains accuracy on par with transformer baselines while providing more calibrated and interpretable predictions, reducing overconfidence and enforcing high logical consistency across hierarchical outputs. Our experimental results show that combining focal set reasoning with fuzzy logic provides a practical step toward deep learning models that are both accurate and epistemically aware.

2605.16361 2026-05-19 cs.LG cs.AI stat.ML

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

TailedTS:用于重尾时间序列预测和周期性量化的大规模基准数据集

Xinyu Chen, HanQin Cai, Lijun Ding, Jinhua Zhao

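AI summary: TailedTS is a benchmark dataset for testing the robustness of time series forecasting models under heavy-tailed, zero-inflated, and non-Gaussian conditions; a sparse autoregression framework shows that frequently viewed pages have weaker periodicity, and standardized prediction benchmarks are provided under non-Gaussian loss functions.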

详情
AI中文摘要

我们介绍了TailedTS,一个基于2024年维基百科每小时页面浏览观测数据的大规模基准数据集,专门用于测试时间序列预测模型在重尾、零膨胀和非高斯条件下的性能。该数据集包含约2469亿个数据点,覆盖约300万个唯一维基百科页面,存储在高效的Apache Parquet格式中。维基百科流量遵循幂律分布,其中约5%的页面贡献了70%的总浏览量,为模型在极端波动下的鲁棒性提供了一个自然且严谨的测试环境。TailedTS支持多个研究任务:首先,我们引入了一个基于稀疏自回归的周期性量化框架,揭示高频页面的周期性结构显著弱于低频页面,这对大型数字平台的服务器分配和流量预测有直接意义。其次,我们提供了在一系列非高斯损失函数下的标准化预测基准,包括ℓ1范数、Huber、分位数和ℓp范数损失,表明基于高斯的估计器在高流量页面类别中性能显著下降,而鲁棒替代方案在所有流量规模上均提供一致的提升。TailedTS可在https://doi.org/10.5281/zenodo.17070469公开获取。

英文摘要

We present TailedTS, a large-scale benchmark dataset derived from Wikipedia hourly page view observations throughout 2024, specifically designed to test time series forecasting models under heavy-tailed, zero-inflated, and non-Gaussian conditions. The dataset comprises approximately 24.69 billion data points spanning roughly 3 million unique Wikipedia pages per month, stored in high-efficiency Apache Parquet format. Wikipedia traffic follows a pronounced power-law distribution where roughly 5% of pages account for over 70% of total page views, creating a natural and rigorous testbed for model robustness against extreme volatility, a condition that is absent from or underrepresented in existing benchmarks such as the M4, M5, and UCI electricity datasets. TailedTS enables several research tasks. First, we introduce a periodicity quantification framework based on sparse autoregression with sparsity and non-negativity constraints, revealing that frequently viewed pages exhibit significantly weaker periodic structure than their less-viewed counterparts, with direct implications for server allocation and traffic forecasting on large digital platforms. Second, we provide standardized prediction benchmarks evaluated under a suite of non-Gaussian loss functions, including $\ell_1$-norm, Huber, quantile, and $\ell_p$-norm losses, demonstrating that standard Gaussian-based estimators degrade substantially on high-volume page categories, while robust alternatives provide consistent gains across all traffic scales. TailedTS is publicly available at https://doi.org/10.5281/zenodo.17070469.
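
To make the robust losses in the abstract concrete, here is a minimal sketch of the Huber and quantile (pinball) losses next to the squared loss they replace. These are the textbook definitions, not code from the TailedTS benchmark. The function names and the toy residuals are hypothetical.

```python
def squared_loss(residual):
    return 0.5 * residual * residual


def huber_loss(residual, delta=1.0):
    """Quadratic near zero, linear in the tails."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def quantile_loss(y_true, y_pred, q=0.5):
    """Pinball loss: under-prediction costs q, over-prediction costs 1 - q."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)


# Toy residuals (assumed): mostly small errors plus one traffic spike.
residuals = [0.2, -0.1, 0.3, 50.0]
total_squared = sum(squared_loss(r) for r in residuals)
total_huber = sum(huber_loss(r) for r in residuals)
```

The single spike dominates the squared loss (1250 of its total) but contributes only 49.5 to the Huber total. This is why a Gaussian-style objective can be pulled around by rare bursts in heavy-tailed traffic.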

2605.16354 2026-05-19 cs.LG cs.AI cs.CL cs.HC stat.ML

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

通过LLM裁判增强人类评估:你真的需要多少人类评审?

Jane Paik Kim

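AI summary: The paper proposes using LLMs as auxiliary judges that augment human evaluation, with a two-stage sampling design for determining the human and LLM rating sample sizes needed to reach a target statistical power.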

详情
Comments
10 pages, 5 figures
AI中文摘要

大型语言模型(LLMs)越来越多地被用作AI系统的自动评估者,包括在高风险应用中。在这一角色中,LLMs用于生成关于模型输出质量、适当性甚至安全性的判断。这种做法受到实际限制的驱动。专家人类评分成本高且难以扩展,而LLM评分可以快速低成本地生成。然而,当前部署LLM评估者的方法是随意的,通常仅限于报告人类和LLM裁判之间的一致性度量作为替代人类评分的正当性,且缺乏正式的研究设计基础。本文(1)将LLM裁判的角色从替代性转为辅助性,并(2)将LLM作为裁判范式制定为通过两阶段抽样设计增强人类评估的一种方法,其中在第一阶段对所有观察进行LLM评估,在第二阶段对子样本进行部分人类评分。我们提出使用来自缺失数据文献的双重鲁棒估计器,利用预测模型的鲁棒性属性,因为缺失性模型是设计已知的。使用该估计器的渐近方差,我们提出如何确定人类和LLM评分的样本量以达到目标统计功效。我们还展示通过分配更多人类评分给LLM评分预测性不高的评估类型,可以高效地设计研究。据我们所知,关于在验证基准时应保留多少人类监督的指导非常有限。

英文摘要

Large language models (LLMs) are increasingly used as automated evaluators of AI systems, including in high-stakes applications. In this role, LLMs are used to generate judgments about the quality, appropriateness, or even safety of model outputs. This approach is motivated by practical constraints. Expert human ratings are costly and difficult to scale, whereas LLM ratings can be produced quickly at low cost. However, current approaches to deploying LLM evaluators are ad hoc, typically limited to reporting agreement metrics between human and LLM judges as a justification for substitution of human ratings, and lack a formal basis for study design. This paper (1) shifts the role of the LLM judge from substitutive to auxiliary, and (2) formulates the LLM-as-a-judge paradigm as one of augmenting human evaluation through a two-stage sampling design, where LLM evaluations are measured for all observations at the first stage and human ratings are partially observed for a subsample at the second stage. We propose to use a doubly robust estimator from the missing data literature, which takes advantage of the robustness property against the prediction model, since the missingness model is known by design. Using the asymptotic variance of this estimator, we propose how sample sizes of human and LLM ratings can be determined to achieve a targeted level of power. We also show that a study can be efficiently designed by allocating more human ratings for types of evaluations where the predictability of LLM ratings is not high. To the best of our knowledge, there is very little guidance on how much human oversight should be retained when validating benchmarks.
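
The two-stage design can be illustrated with a minimal sketch of an augmented inverse-probability-weighted (AIPW) mean. Using AIPW here is an assumption, since the abstract says only "a doubly robust estimator from the missing data literature". Every item carries a prediction of the human rating derived from its LLM score, and a subsample drawn with known probability also has a human rating. The function name and the toy numbers are hypothetical.

```python
def doubly_robust_mean(predictions, human, sampled, probs):
    """AIPW estimate of the mean human rating.

    predictions[i]: predicted human rating from the LLM score (all items)
    human[i]:       human rating, or None if item i was not sampled
    sampled[i]:     True if item i received a human rating
    probs[i]:       known design probability that item i is sampled
    """
    n = len(predictions)
    total = 0.0
    for m, y, r, p in zip(predictions, human, sampled, probs):
        total += m                 # model-based term, available for every item
        if r:
            total += (y - m) / p   # design-weighted correction on the subsample
    return total / n


# Toy example (assumed): four outputs, two of them human-rated with probability 0.5.
predictions = [0.5, 0.5, 0.5, 0.5]
human = [1.0, None, 0.0, None]
sampled = [True, False, True, False]
probs = [0.5, 0.5, 0.5, 0.5]
estimate = doubly_robust_mean(predictions, human, sampled, probs)
```

Because the sampling probabilities are fixed by design, the correction term has mean zero error even when the prediction model is poor. A better predictor only shrinks the variance, which is what drives the sample-size trade-off between human and LLM ratings.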

2605.16335 2026-05-19 stat.ME math.ST stat.TH

Tests for constancy of model parameters over time

对时间变化模型参数一致性的检验

Nils Lid Hjort, Alex J. Koning

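AI summary: The paper studies tests for changes in model parameters over time, constructing monitoring processes and goodness-of-fit statistics and discussing how to identify where and what type of changes have occurred; the methods apply to regular parametric models.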

详情
Journal ref
Journal of Nonparametric Statistics, 2002, vol. 14, pages 113-132
Comments
23 pages, 3 figures. This is a Statistical Research Report, Department of Mathematics, University of Oslo, from 2001, containing some more material than for the published version, in Journal of Nonparametric Statistics, 2002, vol. 14, pages 113-132. NLH honours Alex Koning (1959-2022) by making these Hjort-Koning methods more visible, via arXiv and other channels
AI中文摘要

假定一系列数据点服从某种参数形式的分布,但其中一些底层参数可能随时间变化。本文在该框架下探讨了各种自然问题。我们构建了监控过程,在无变化假设下,这些过程收敛于独立布朗桥。利用这些过程构建了适合检验的统计量。研究了加权版本,并推导了最优权重函数以获得最大局部效力。还讨论了如何利用结果确定变化的位置和类型,当初步筛查测试表明存在变化时。我们的统一大样本方法具有广泛适用性,适用于所有常规参数模型,包括回归、马尔可夫链和时间序列情况。

英文摘要

Suppose that a sequence of data points follows a distribution of a certain parametric form, but that one or more of the underlying parameters may change over time. This paper addresses various natural questions in such a framework. We construct canonical monitoring processes which under the hypothesis of no change converge in distribution to independent Brownian bridges, and use these to construct natural goodness-of-fit statistics. Weighted versions of these are also studied, and optimal weight functions are derived to give maximum local power against alternatives of interest. We also discuss how our results can be used to pinpoint where and what type of changes have occurred, in the event that initial screening tests indicate that such exist. Our unified large-sample methodology is quite general and applies to all regular parametric models, including regression, Markov chains, and time series situations.
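
The simplest instance of such a monitoring process is a CUSUM for a constant mean, sketched below. It uses the scaled partial sums of centred observations, which behave like a Brownian bridge when nothing changes. This special case is an assumption chosen for illustration, since the paper treats general parametric models through their scores. The function name and the toy series are hypothetical.

```python
import math


def max_cusum_statistic(x):
    """Return (max_k |S_k| / (sd * sqrt(n)), argmax k) for centred partial sums S_k."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    partial, largest, argmax = 0.0, 0.0, 0
    for k, xi in enumerate(x, start=1):
        partial += xi - mean
        value = abs(partial) / (sd * math.sqrt(n))
        if value > largest:
            largest, argmax = value, k
    return largest, argmax


# Toy series (assumed): the mean jumps from 0.5 to 2.5 after observation 100.
shifted = [0, 1] * 50 + [2, 3] * 50
stable = [0, 1] * 100

stat_shifted, where = max_cusum_statistic(shifted)
stat_stable, _ = max_cusum_statistic(stable)
```

Under the no-change hypothesis the maximum is approximately distributed as the supremum of the absolute value of a Brownian bridge, whose 5% critical value is about 1.358. The shifted series far exceeds it, and the maximising index points at the change location.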

2605.16332 2026-05-19 stat.AP

Data-Driven Climate Outage Risk Characterization and Resilience Analysis in Joint Power-Communication Networks

数据驱动的气候停电风险特征分析与联合电力-通信网络韧性分析

Yoneke Graham, Gelila Webster, Tina Tran, Sohini Roy

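AI summary: The paper presents a data-driven framework that combines empirical outage analysis with cascade failure simulation, documenting the rise in climate-related outages and the heavier burden on coastal states, identifying the main predictors with a logistic regression model, and evaluating resilience gaps across scenarios through cascade simulation.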

详情
AI中文摘要

气候驱动的停电事件对美国电网可靠性构成日益增长的威胁,但实证停电研究与基于互依赖性的韧性分析很少整合。本文提出一个数据驱动的框架,整合实证停电特征分析与级联故障模拟。利用EAGLE-I国家停电数据集(2015-2023年,超过525,000条记录),通过描述性分析和假设检验揭示气候-停电景观,发现气候相关停电事件每年增加约9,100次,对沿海地区影响更严重。通过可解释的逻辑回归模型识别主要的严重停电风险预测因素,其中恶劣天气是主导因素。基于这些发现,构建了四个具有地理代表性的故障场景,并利用基于MIIM的级联模拟在IEEE 118节点系统上进行评估。沿海场景产生的韧性缺口远大于内陆案例,极端沿海恶劣天气场景使级联后运行能力降至17.6%。结果表明,仅凭汇总停电统计数据低估沿海风险,因为跨层级联传播放大了地理损害,这种损害只能通过互依赖性意识的模拟揭示。

英文摘要

Climate-driven power outages pose a growing threat to U.S. grid reliability, yet empirical outage studies and interdependency-based resilience analyses are rarely integrated. This paper presents a data-driven framework that integrates empirical outage characterization with cascade failure simulation in joint power-communication networks. Using the EAGLE-I national outage dataset (2015-2023, more than 525,000 records), we characterize the climate-outage landscape through descriptive analysis and hypothesis testing, finding that climate-related outages increase by roughly 9,100 events per year and impose a significantly greater severity burden on coastal states. An interpretable logistic regression model then identifies the main predictors of severe outage risk, with Severe Weather emerging as the dominant factor. Guided by these findings, we construct four geographically representative failure scenarios and evaluate them using MIIM-based cascade simulation on the IEEE 118-bus system with a communication network overlay. Coastal scenarios produce substantially larger resilience gaps than the inland case, with the Extreme Coastal Severe Weather scenario reducing post-cascade operability to 17.6 percent. The results show that aggregate outage statistics alone underestimate coastal risk, as cross-layer cascade propagation amplifies geographic damage in ways revealed only through interdependency-aware simulation.

2605.16319 2026-05-19 cs.LG stat.AP stat.ML

Forecasting Medium-Horizon Alzheimer's Disease Progression: Residual Gap-Aware Transformers for 24-Month CDR-SB Change from ADNI Clinical and Biomarker Histories

中长期阿尔茨海默病进展预测:基于残差间隙感知的变换器用于ADNI临床和生物标志物历史的24个月CDR-SB变化

Ran Tong, Tong Wang, Lanruo Wang, Yin Ni

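AI summary: The paper proposes a residual gap-aware transformer that combines a mixed-effects statistical reference with transformer-based residual learning to predict 24-month CDR-SB change, improving prediction error and correlation.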

详情
Comments
Preprint; includes appendix, 4 figures, and 6 tables
AI中文摘要

中长期阿尔茨海默病进展预测困难,因为未来临床评分可能与基线严重程度保持一致,而生物标志物历史不规则且不完全观察。我们开发了一种基于锚点的分析,利用统一的阿尔茨海默病神经影像计划(ADNI)表格分析24个月临床痴呆评定总箱数(CDR-SB)变化。每个标记样本在轻度认知障碍访问时锚定,仅使用在锚点之前观察到的临床和生物标志物历史,并将响应定义为在18-30个月窗口内最接近24个月的未来访问的CDR-SB减去锚定CDR-SB。分析队列包含来自858名受试者的2,600个标记锚点和7,276个纵向行。我们提出了一种残差间隙感知变换器,结合混合效应统计参考与基于变换器的残差学习,从预锚点的临床和生物标志物历史中学习。模型使用受试者层面的随机截距在混合效应参考中,观察层面的三元组标记化用于不规则历史,并在自注意力中学习非负时间间隙惩罚。我们比较了所提模型与通过贝叶斯信息准则选择的线性混合效应基线、GRU-D和STraTS在重复的受试者层面训练-测试分割下的表现。在五个受试者层面随机种子下,所提模型在所有报告指标上实现了最佳的平均测试性能,相对于混合效应基线,将MSE降低了13.1%,预测-观测相关性提高了26.4%。此外,它在均值误差和相关性上优于GRU-D和STraTS。这些结果表明,统计锚定和间隙感知残差学习为中长期阿尔茨海默病进展预测提供了一个有用的结构。

英文摘要

Medium-horizon Alzheimer's disease progression prediction is difficult because future clinical scores can remain tied to baseline severity, while biomarker histories are irregular and incompletely observed. We develop an anchor-based analysis of 24-month Clinical Dementia Rating Sum of Boxes (CDR-SB) change using harmonized Alzheimer's Disease Neuroimaging Initiative (ADNI) tables. Each labeled sample is anchored at a mild cognitive impairment visit, uses only clinical and biomarker history observed at or before that anchor, and defines the response as CDR-SB at the future visit closest to 24 months within an 18--30 month window minus anchor CDR-SB. The analytic cohort contains 2,600 labeled anchors from 858 participants and 7,276 longitudinal rows. We propose a residual gap-aware transformer that combines a mixed-effects statistical reference with transformer-based residual learning from pre-anchor clinical and biomarker histories. The model uses participant-level random intercepts in the mixed-effects reference, observation-level triplet tokenization for irregular histories, and a learned nonnegative time-gap penalty inside self-attention. We compare the proposed model with a Bayesian-information-criterion-selected linear mixed-effects baseline, GRU-D, and STraTS under repeated participant-level train--test splits. Across five participant-level random seeds, the proposed model achieves the best mean test performance across all reported metrics, reducing MSE by 13.1% and increasing prediction--observation correlation by 26.4% relative to the mixed-effects baseline. It also improves over both GRU-D and STraTS in mean error and correlation. These results show that statistical anchoring and gap-aware residual learning provide a useful structure for medium-horizon Alzheimer's disease progression prediction.
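
One way a nonnegative time-gap penalty could enter self-attention is sketched below. The abstract does not give the exact form, so everything here is an assumption: the penalty is linear in the absolute time gap, its coefficient is kept nonnegative through a softplus, and it is subtracted from the attention logits before the softmax. The function names are hypothetical.

```python
import math


def softplus(x):
    """Smooth map from any real number to a positive one."""
    return math.log1p(math.exp(x))


def gap_aware_weights(scores, times, query_time, raw_penalty):
    """Attention weights for one query over past observations.

    scores:      content-based attention logits, one per observation
    times:       observation times (e.g. months before the anchor visit)
    raw_penalty: unconstrained parameter; softplus keeps the penalty >= 0
    """
    lam = softplus(raw_penalty)
    logits = [s - lam * abs(query_time - t) for s, t in zip(scores, times)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Three observations with equal content scores, taken 24, 12 and 0 months
# before a query at month 24: the most recent one receives the most weight.
weights = gap_aware_weights([0.0, 0.0, 0.0], [0.0, 12.0, 24.0], 24.0, -2.0)
```

With equal content scores, the weights decay with the time gap, so irregularly spaced histories are handled without resampling them onto a fixed grid.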

2604.28005 2026-05-19 cs.LG stat.ML

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

核化优势估计:从非参数统计到大语言模型推理

Shijin Gong, Kai Ye, Jin Zhu, Xinyu Zhang, Hongyi Zhou, Chengchun Shi

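AI summary: The paper applies nonparametric statistical methods to advantage estimation for LLM reasoning, using kernel smoothing for efficient value function estimation and policy optimization.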

详情
Comments
45 pages, 5 figures
AI中文摘要

近年来,大语言模型(LLM)在增强推理能力方面越来越多地依赖强化学习(RL)。三种方法被广泛采用:第一种依赖深度神经网络估计学习策略的价值函数以减少策略梯度的方差,但估计和维护此类价值网络会带来显著的计算和内存开销。第二种方法通过样本平均近似价值函数,但每个提示需要大量推理轨迹样本以实现准确的价值函数近似,这使计算成本很高。第三种方法每个提示仅采样一个推理轨迹,这降低了计算成本,但样本效率低下。本文聚焦于一个实际且资源受限的场景,其中每个提示只能采样少量推理轨迹,同时低方差梯度估计对于高质量策略学习至关重要。为解决这一挑战,我们引入经典的非参数统计方法,这些方法在计算和统计上都具有效率,应用于LLM推理。我们采用核平滑作为价值函数估计的具体例子,并进行后续的策略优化。数值和理论结果表明,我们的方法实现了准确的价值和梯度估计,从而提升了策略优化。

英文摘要

Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning (RL) to improve their reasoning capabilities. Three types of approaches have been widely adopted: The first relies on a deep neural network to estimate the value function of the learning policy in order to reduce the variance of the policy gradient. However, estimating and maintaining such a value network incurs substantial computational and memory overhead. The second avoids training a value network by approximating the value function using sample averages. However, it samples a large number of reasoning traces per prompt for accurate value function approximation, making it computationally expensive. The third samples only a single reasoning trajectory per prompt, which reduces computational cost but suffers from poor sample efficiency. This paper focuses on a practical, resource-constrained setting in which only a small number of reasoning traces can be sampled per prompt, while low-variance gradient estimation remains essential for high-quality policy learning. To address this challenge, we bring classical nonparametric statistical methods, which are both computationally and statistically efficient, to LLM reasoning. We employ kernel smoothing as a concrete example for value function estimation and the subsequent policy optimization. Numerical and theoretical results demonstrate that our proposal achieves accurate value and gradient estimation, leading to improved policy optimization.
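
A kernel-smoothed value estimate can be sketched with a Nadaraya-Watson average. It borrows rewards from similar prompts instead of relying on many samples per prompt. The abstract does not specify the paper's estimator, so the details here are assumptions: prompts are represented by embedding vectors, the kernel is Gaussian, and the bandwidth is fixed. All names are hypothetical.

```python
import math


def gaussian_kernel(u, v, bandwidth):
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def kernel_value(query, embeddings, rewards, bandwidth):
    """Nadaraya-Watson estimate of the expected reward at `query`."""
    weights = [gaussian_kernel(query, e, bandwidth) for e in embeddings]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Toy data (assumed): two traces from near-identical prompts, one from a distant prompt.
embeddings = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]
rewards = [1.0, 0.0, 1.0]

value = kernel_value((0.0, 0.0), embeddings, rewards, bandwidth=1.0)
advantages = [r - value for r in rewards[:2]]  # baseline-subtracted rewards
```

The distant prompt receives negligible weight, so the value at the query is close to 0.5 and the two nearby traces get advantages of about +0.5 and -0.5. The bandwidth controls how far the estimate borrows strength across prompts.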

2603.18053 2026-05-19 cs.SI cs.CY econ.GN q-fin.EC stat.ML

Auditing the Auditors: Does Community-based Moderation Get It Right?

审计审计者:基于社区的 moderation 是否正确?

Yeganeh Alimohammadi, Karissa Huang, Christian Borgs, Jennifer Chayes

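AI summary: The paper studies consensus-based auditing in X's Community Notes, finds that minority contributors drift toward the majority opinion on controversial topics, and proposes a two-stage algorithm that weights contributors by the stability of their past residuals to improve predictive performance.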

详情
AI中文摘要

在线社交平台越来越多地依赖众包系统来大规模标记误导性内容,但这些系统必须聚合用户的评估并决定信任哪些用户的评估。为解决后者,许多平台通过奖励与最终聚合结果一致来审计用户,我们称之为基于共识的审计。我们分析了这种设计在 X 的 Community Notes 中的后果,该平台在 2022 年 9 月采用了将用户参与资格与最终平台结果一致性的审计机制。我们发现证据表明存在策略性 conformity:少数贡献者的评估倾向多数意见,且在争议话题中其参与比例下降,其中独立信号最为重要。我们通过一个行为模型正式化了这一机制,其中贡献者在私人信念与预期分歧惩罚之间权衡。受这些发现启发,我们提出了一种两阶段审计和聚合算法,该算法根据贡献者过去残差的稳定性而非多数同意来加权贡献者。该方法首先考虑内容和贡献者之间的差异,然后衡量每个贡献者评估相对于潜在因子模型的可预测性。那些评估始终具有信息性的贡献者在聚合中获得更大的影响力,即使他们与主流共识相左。在 Community Notes 数据中,这种方法提高了离样预测性能,同时避免了对分歧的惩罚。

英文摘要

Online social platforms increasingly rely on crowd-sourced systems to label misleading content at scale, but these systems must both aggregate users' evaluations and decide whose evaluations to trust. To address the latter, many platforms audit users by rewarding agreement with the final aggregate outcome, a design we term consensus-based auditing. We analyze the consequences of this design in X's Community Notes, which in September 2022 adopted consensus-based auditing that ties users' eligibility for participation to agreement with the eventual platform outcome. We find evidence of strategic conformity: minority contributors' evaluations drift toward the majority and their participation share falls on controversial topics, where independent signals matter most. We formalize this mechanism in a behavioral model in which contributors trade off private beliefs against anticipated penalties for disagreement. Motivated by these findings, we propose a two-stage auditing and aggregation algorithm that weights contributors by the stability of their past residuals rather than by agreement with the majority. The method first accounts for differences across content and contributors, and then measures how predictable each contributor's evaluations are relative to the latent-factor model. Contributors whose evaluations are consistently informative receive greater influence in aggregation, even when they disagree with the prevailing consensus. In the Community Notes data, this approach improves out-of-sample predictive performance while avoiding penalization of disagreement.

2512.19929 2026-05-19 math.ST stat.TH

Deconvolution in unlinked linear models

未链接线性模型中的反卷积

Fadoua Balabdaoui, Antonio Di Noia, Cécile Durot

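AI summary: The paper studies nonparametric deconvolution within the unlinked linear regression framework and proposes a nonparametric estimator that achieves the parametric rate of convergence in the Wasserstein distance, unaffected by the smoothness of the noise.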

详情
AI中文摘要

未链接回归,即协变量和响应变量被分别观测且无已知对应关系,近年来受到越来越多关注。反卷积,另一方面,在非参数统计中是一个基本且具有挑战性的问题,旨在根据受某些加性噪声污染的观测值来估计潜在随机变量Z的分布。该任务的复杂性受到噪声分布平滑度的严重影响,通常导致缓慢的估计速率。在本文中,我们将最近的未链接线性回归问题与经典反卷积框架相结合。具体而言,我们研究在Z是可观测多维协变量的线性函数的假设下的非参数反卷积。这种结构约束允许我们引入一种非参数估计Z分布的估计器,该估计器在Wasserstein距离阶1中达到参数收敛速率,其中噪声的平滑度不影响收敛速率。此外,我们引入了Z的无条件密度和给定观测响应的条件密度的非参数估计器。这使我们能够研究估计潜在线性预测值的问题,其与观测响应的联系不可及。通过若干模拟,我们展示了我们反卷积估计器的快速收敛速率以及所提条件估计器在不同模拟场景中的性能。

英文摘要

Unlinked regression, in which covariates and responses are observed separately without known correspondence, has recently gained increasing attention. Deconvolution, on the other hand, is a fundamental and challenging problem in nonparametric statistics with the aim of estimating the distribution of a latent random variable $Z$ based on observations contaminated by some additive noise. The complexity of this task is heavily influenced by the smoothness of the noise distribution and often leads to slow estimation rates. In this paper, we combine the recent unlinked linear regression problem with the classical deconvolution framework. Specifically, we study nonparametric deconvolution under the assumption that $Z$ is a linear function of an observable multidimensional covariate. This structural constraint allows us to introduce a nonparametric estimator of the distribution of $Z$ which achieves the parametric rate of convergence in the Wasserstein distance of order 1, where the smoothness of the noise does not affect the rate. Furthermore, we introduce nonparametric estimators for the unconditional density of $Z$ and the conditional density of $Z$ given an observed response. This allows us to study the problem of estimating the value of the latent linear predictor, whose link to the observed response is not accessible. Through several simulations, we illustrate the fast convergence rate of our deconvolution estimator and the performance of the proposed conditional estimators of the latent predictor in different simulation scenarios.

2512.12572 2026-05-19 cs.LG stat.ML

On the Accuracy of Newton Step and Influence Function Data Attributions

关于牛顿步和影响函数数据归因的准确性

Ittai Rubinstein, Samuel B. Hopkins

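AI summary: The paper studies the accuracy of the Newton Step and Influence Function data attribution methods, derives scaling laws for their errors, and explains why NS is often more accurate.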

详情
AI中文摘要

数据归因旨在通过估计移除某些训练点时预测的变化来解释模型预测,广泛应用于可解释性、信用分配、遗忘和隐私等领域。即使在逻辑回归这种相对简单的案例中,现有对影响函数(IF)和单步牛顿步(NS)等主流数据归因方法的数学分析仍存在两个关键局限:首先,它们依赖于全局强凸性假设,这在实践中往往不成立;其次,所得的界限在参数数量(d)和移除样本数量(k)方面表现极差。因此,这些分析不够精确,无法回答诸如“每种方法的渐进行为误差如何”或“给定数据集哪种方法更准确”等基本问题。本文引入了针对凸学习问题的NS和IF数据归因方法的新分析。据我们所知,这是首个不假设全局强凸性且解释了[KATL19]和[RH25a]观察到NS数据归因常比IF更准确的分析。我们证明,对于足够良好的逻辑回归,我们的界限在多项对数因子范围内渐近紧致,从而得到平均样本移除情况下的误差缩放定律。[公式]

英文摘要

Data attribution aims to explain model predictions by estimating how they would change if certain training points were removed, and is used in a wide range of applications, from interpretability and credit assignment to unlearning and privacy. Even in the relatively simple case of logistic regressions, existing mathematical analyses of leading data attribution methods such as Influence Functions (IF) and single Newton Step (NS) remain limited in two key ways. First, they rely on global strong convexity assumptions which are often not satisfied in practice. Second, the resulting bounds scale very poorly with the number of parameters ($d$) and the number of samples removed ($k$). As a result, these analyses are not tight enough to answer fundamental questions such as "what is the asymptotic scaling of the errors of each method?" or "which of these methods is more accurate for a given dataset?" In this paper, we introduce a new analysis of the NS and IF data attribution methods for convex learning problems. To the best of our knowledge, this is the first analysis of these questions that does not assume global strong convexity and also the first explanation of [KATL19] and [RH25a]'s observation that NS data attribution is often more accurate than IF. We prove that for sufficiently well-behaved logistic regressions, our bounds are asymptotically tight up to poly-logarithmic factors, yielding scaling laws for the errors in the average-case sample removals: \[ \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{\theta}_T - \hat{\theta}_T^{\mathrm{NS}}\|_2 \bigr] = \widetilde{\Theta}\!\left(\frac{k d}{n^2}\right), \qquad \mathbb{E}_{T \subseteq [n],\, |T| = k} \bigl[ \|\hat{\theta}_T^{\mathrm{NS}} - \hat{\theta}_T^{\mathrm{IF}}\|_2 \bigr] = \widetilde{\Theta}\!\left( \frac{(k + d)\sqrt{k d}}{n^2} \right). \]
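
The difference between the two attributions can be seen in a one-parameter toy problem. The sketch below swaps the paper's logistic loss for a least-squares loss with a single coefficient, an assumption made so that the true leave-out solution has a closed form. With a quadratic loss the Newton step is exact, so the example illustrates only the direction of the NS versus IF gap, not the scaling laws. The function names and data are hypothetical.

```python
def fit(xs, ys):
    """Least-squares slope through the origin: argmin of sum (y - theta x)^2 / 2."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def attributions(xs, ys, removed):
    """Influence-function and Newton-step predictions of the refit slope."""
    theta = fit(xs, ys)
    hessian = sum(x * x for x in xs)  # full-data Hessian
    # Gradient and Hessian of the removed points' losses at the full-data fit.
    grad_removed = sum(-(ys[i] - theta * xs[i]) * xs[i] for i in removed)
    hess_removed = sum(xs[i] ** 2 for i in removed)
    theta_if = theta + grad_removed / hessian                   # keeps the old Hessian
    theta_ns = theta + grad_removed / (hessian - hess_removed)  # uses the leave-out Hessian
    return theta_if, theta_ns


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
removed = [3]

theta_if, theta_ns = attributions(xs, ys, removed)
kept = [i for i in range(len(xs)) if i not in removed]
theta_exact = fit([xs[i] for i in kept], [ys[i] for i in kept])
```

Both methods take one step from the full-data fit. IF reuses the full-data Hessian, while NS removes the dropped points' curvature first, which is why NS tracks the true refit more closely when the removed points carry much of the curvature.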

2512.06238 2026-05-19 cs.IT math.IT math.ST stat.TH

Non-Asymptotic Error Bounds for Causally Conditioned Directed Information Rates of Gaussian Sequences

关于高斯序列因果条件定向信息率的非渐近误差界

Yuping Zheng, Andrew Lamperski

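AI summary: This paper studies causally conditioned directed information rates of Gaussian sequences, gives an explicit formula based on optimal prediction, and presents an estimator with an error bound of $O(N^{-1/2}\log N)$.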

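Details
Comments
9 pages, 1 figure; accepted by IFAC World Congress 2026
AI abstract (translated from Chinese)

Directed information and its causally conditioned variants are often used to measure causal influence between random processes. In practice these quantities must be estimated from data. Non-asymptotic error bounds for such estimators are known for finite-alphabet sequences, but real-valued data have received less study. This paper treats the case where the data are sequences of Gaussian vectors. We give an explicit formula for the causally conditioned directed information rate based on optimal prediction and define an estimator from it. We show that, with high probability, the estimator's error is $O(N^{-1/2}\log N)$, where $N$ is the total sample size.

English abstract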

Directed information and its causally conditioned variations are often used to measure causal influences between random processes. In practice, these quantities must be measured from data. Non-asymptotic error bounds for these estimates are known for sequences over finite alphabets, but less is known for real-valued data. This paper examines the case in which the data are sequences of Gaussian vectors. We provide an explicit formula for causally conditioned directed information rate based on optimal prediction and define an estimator based on this formula. We show that our estimator gives an error of order $O\left(N^{-1/2}\log(N)\right)$ with high probability, where $N$ is the total sample size.
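The link between directed information and prediction for Gaussian data can be illustrated as follows. This is a sketch under stated assumptions, not the paper's estimator: the history is truncated to `p` lags, predictors are fitted by ordinary least squares, and the rate is taken as half the log-ratio of the determinants of the two prediction-error covariances. The function names, the lag order, and the convention that the side process `z` enters both predictors up to the present are all illustrative choices.

```python
import numpy as np

def lagged(a, p, include_present):
    """Stack a[t], a[t-1], ..., a[t-p] (or only the past) for t = p..N-1."""
    n = a.shape[0]
    start = 0 if include_present else 1
    return np.hstack([a[p - j : n - j] for j in range(start, p + 1)])

def resid_logdet(target, regressors):
    """Log-determinant of the least-squares prediction-error covariance."""
    design = np.hstack([regressors, np.ones((len(target), 1))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    cov = resid.T @ resid / len(target)
    return np.linalg.slogdet(cov)[1]

def directed_info_rate(x, y, z=None, p=2):
    """Plug-in estimate (in nats) of the directed information rate from x to y,
    optionally causally conditioned on z. Assumes jointly Gaussian, stationary data."""
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    target = y[p:]
    y_past = lagged(y, p, include_present=False)
    x_hist = lagged(x, p, include_present=True)
    side = [] if z is None else [lagged(z.reshape(len(z), -1), p, True)]
    without_x = np.hstack([y_past] + side)
    with_x = np.hstack([y_past, x_hist] + side)
    return 0.5 * (resid_logdet(target, without_x) - resid_logdet(target, with_x))
```

For a toy system in which `y[t] = 0.8 * x[t-1] + noise` with unit-variance white `x` and noise, the population value is $\tfrac12\ln 1.64$ in the direction from `x` to `y` and zero in the reverse direction, which gives a simple sanity check for the estimate.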

2510.21523 2026-05-19 cs.LG stat.ML

Interpretable epistemic uncertainty decomposition in sequential generative models via polynomial chaos surrogates

通过多项式混沌代理实现序列生成模型中可解释的epistemic不确定性分解

Ramón Nartallo-Kaluarachchi, Shashanka Ubaru, Małgorzata J Zimoń, Dongsung Huh, Robert Manson-Sawko, Lior Horesh, Yoshua Bengio

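AI summary: This paper uses polynomial chaos expansions to analyse the sources of epistemic uncertainty in sequential generative models, revealing how reward components influence generative decisions, a decomposition that deep ensembles, Bayesian neural networks, and Monte Carlo dropout do not provide; it reports efficiency and robustness on several real-world tasks.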

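Details
Comments
37 pages, 15 figures
AI abstract (translated from Chinese)

Sequential generative models conditioned on uncertain rewards are central to AI-driven scientific discovery, yet the epistemic uncertainty they inherit remains unquantified. We propagate this uncertainty through generative flow networks (GFlowNets) by fitting polynomial chaos expansions (PCEs) to small ensembles of trained models. The PCE coefficients yield analytical Sobol sensitivity indices, giving the first interpretable decomposition of which reward components drive which generative decisions, a capability that deep ensembles, Bayesian neural networks, and Monte Carlo dropout do not offer. Convergence guarantees are established theoretically, and four of the five are formally verified in the Lean 4 proof assistant. Across three real-world tasks the framework reveals actionable structure that ensembles alone cannot expose. On the Doyle-Dreher Buchwald-Hartwig catalyst selection task, catalyst selection is robust ($D_{\mathrm{catalyst}}\approx 71$) while additive selection is fragile ($D_{\mathrm{additive}}\approx 179$, $2.5\times$ higher). In fragment-based molecular design, the linker position is the most sensitive ($D_{\mathrm{linker}}\approx 28$) and the decoration positions are the most robust ($D\approx 14$-$18$), reversing the conventional scaffold-robust / decoration-fragile assumption. On the Sachs protein signalling network, MAPK-cascade edges and PKA/PKC hub edges separate into distinct sensitivity regimes, providing a targeted map for perturbation experiments. Calibration coverage at the 95% level reaches 0.97-1.00, and the surrogate evaluates 10,000 policy samples in milliseconds, $10^{3}$-$10^{4}\times$ faster than exhaustive retraining.

English abstract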

Sequential generative models conditioned on uncertain rewards are central to AI-driven scientific discovery, yet the epistemic uncertainty they inherit from imperfect reward estimates remains unquantified. We propagate this uncertainty through generative flow networks (GFlowNets) by fitting polynomial chaos expansions (PCEs) to small ensembles of trained models. The PCE coefficients yield analytical Sobol sensitivity indices, providing the first interpretable decomposition of which reward components drive which generative decisions, a capability unavailable from deep ensembles, Bayesian neural networks, or Monte Carlo dropout. Convergence guarantees are established theoretically, and four of five are formally verified in the Lean 4 proof assistant. Across three real-world tasks, the framework reveals actionable structure invisible to ensembles alone. On the Doyle-Dreher Buchwald-Hartwig dataset, catalyst selection is robust ($D_{\mathrm{catalyst}}\approx 71$) while additive selection is fragile ($D_{\mathrm{additive}}\approx 179$, $2.5\times$ higher). In fragment-based molecular design, the linker position is the most sensitive ($D_{\mathrm{linker}}\approx 28$) while decoration positions are the most robust ($D\approx 14$-$18$), reversing the conventional scaffold-robust / decoration-fragile assumption. On the Sachs protein signalling network, MAPK-cascade edges and PKA/PKC hub edges separate into distinct sensitivity regimes, providing a targeted map for perturbation experiments. Calibration coverage at the 95% level reaches 0.97-1.00 across the dominant steps, and the surrogate evaluates 10,000 policy samples in milliseconds, $10^{3}$-$10^{4}\times$ faster than exhaustive retraining.
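The step from PCE coefficients to Sobol indices can be sketched generically. The example below is not the paper's pipeline: a toy polynomial stands in for the output of a trained model, the uncertain inputs are assumed independent and uniform on $[-1, 1]$, and the basis is a total-degree orthonormal Legendre basis fitted by least squares. With an orthonormal basis the output variance is the sum of squared non-constant coefficients, so each Sobol index is a ratio of such sums. All function names are hypothetical.

```python
import itertools
import numpy as np
from numpy.polynomial.legendre import legval

def orthonormal_legendre(x, n):
    """Degree-n Legendre polynomial, scaled to unit variance under Uniform(-1, 1)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.sqrt(2 * n + 1) * legval(x, c)

def fit_pce(xi, y, degree):
    """Least-squares PCE fit; returns the multi-indices and their coefficients."""
    dim = xi.shape[1]
    alphas = [a for a in itertools.product(range(degree + 1), repeat=dim)
              if sum(a) <= degree]
    design = np.column_stack([
        np.prod([orthonormal_legendre(xi[:, j], a[j]) for j in range(dim)], axis=0)
        for a in alphas
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return alphas, coef

def sobol_indices(alphas, coef):
    """First-order and total Sobol indices read off the PCE coefficients."""
    dim = len(alphas[0])
    pairs = list(zip(alphas, coef))
    var = sum(c ** 2 for a, c in pairs if any(a))
    first = [sum(c ** 2 for a, c in pairs if a[i] > 0 and sum(a) == a[i]) / var
             for i in range(dim)]
    total = [sum(c ** 2 for a, c in pairs if a[i] > 0) / var for i in range(dim)]
    return np.array(first), np.array(total)
```

For the toy response $2\xi_1 + \xi_2 + \xi_1\xi_2$, the variance is $4/3 + 1/3 + 1/9 = 16/9$, so the first-order indices are $0.75$ and $0.1875$, and the remaining $0.0625$ is the interaction share that appears only in the total indices.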

2504.15879 2026-05-19 stat.ME

Multivariate Poisson intensity estimation via low-rank tensor decomposition

通过低秩张量分解估计多变量泊松强度函数

Haotian Xu, Carlos Misael Madrid Padilla, Oscar Hernan Madrid Padilla, Daren Wang

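AI summary: This paper proposes matrix- and tensor-based methods for estimating multivariate intensity functions of inhomogeneous point processes. Treating the intensity as an infinite-dimensional matrix or tensor in function space yields the optimal bias-variance trade-off, improving estimation accuracy while reducing computational cost.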

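Details
AI abstract (translated from Chinese)

In this paper we propose new matrix- and tensor-based methods for estimating multivariate intensity functions of inhomogeneous point processes. By viewing a multivariate intensity function as an infinite-dimensional matrix or tensor in function space, our algorithms achieve the optimal bias-variance trade-off and rate-optimal estimation error, with model complexity governed by the matrix or tensor rank. They substantially improve estimation accuracy while reducing computational cost. To illustrate the adaptivity of the framework, we show that many fundamental classes of multivariate functions, including additive and mean-field models, admit finite-rank tensor representations. We apply the method to a four-dimensional U.S. Geological Survey earthquake dataset with features such as latitude, longitude, depth, and magnitude. Our tensor estimator recovers localized seismicity patterns (California, Oklahoma, Pacific Northwest, north-central U.S.), whereas the kernel baseline oversmooths them.

English abstract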

In this work, we propose new matrix- and tensor-based methodologies for estimating multivariate intensity functions of inhomogeneous point processes. By viewing multivariate intensity functions as infinite-dimensional matrices or tensors within function spaces, our algorithms attain the optimal bias-variance trade-off, yielding rate-optimal estimation error, with model complexity governed by matrix or tensor ranks. They substantially improve estimation accuracy, while simultaneously reducing computational cost. To illustrate the adaptivity of the proposed framework, we show that many fundamental classes of multivariate functions, including additive and mean-field models, admit finite-rank tensor representations. We apply our method to a four-dimensional U.S. Geological Survey earthquake dataset, comprising features such as latitude, longitude, depth, and magnitude. Our tensor estimator recovers localized seismicity patterns (California, Oklahoma, Pacific Northwest, north-central U.S.), whereas the kernel baseline oversmooths them.

2502.17007 2026-05-19 cs.LG cs.AI stat.ML

Uncertainty Quantification as a Principled Foundation for Explainable Artificial Intelligence: A Case Study of Counterfactual Explanations

不确定性量化作为可解释人工智能的原理性基础:反事实解释的案例研究

Kacper Sokol, Santo M. A. R. Thies, Eyke Hüllermeier

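AI summary: Using uncertainty quantification in counterfactual explainability as a case study, this paper shows its potential as a unifying framework, proposes two explainer variants, and reports performance that is highly competitive with state-of-the-art methods.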

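Details
AI abstract (translated from Chinese)

This paper argues that transparency research overlooks foundational concepts of artificial intelligence. Taking uncertainty quantification in counterfactual explainability as an example, it shows that broader adoption of uncertainty quantification could address key challenges in the field. Expressing the core counterfactual properties in terms of uncertainty leads to two explainer variants, and experiments show that the framework is highly competitive with many state-of-the-art methods. The paper further shows that integrating artificial intelligence fundamentals into transparency research promises more reliable, robust, and understandable predictive models, and posits that making explainability truly uncertainty-aware is the first step towards this goal.

English abstract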

In this paper we argue that, to its detriment, transparency research overlooks many foundational concepts of artificial intelligence. As an illustrating example we focus on uncertainty quantification in the context of counterfactual explainability, demonstrating that its broader adoption could address key challenges in the field. To this end, we show how uncertainty can provide a principled unifying framework for counterfactual explainability by expressing the core counterfactual properties in terms of uncertainty, allowing us to build two variants of an explainer upon them -- one based solely on uncertainty estimates and another pairing them with distance measured in the feature space. Our comprehensive experiments illustrate highly competitive performance of our framework when compared to many state-of-the-art methods despite its radically simple design. More broadly, the paper demonstrates that integrating artificial intelligence fundamentals into transparency research promises to yield more reliable, robust and understandable predictive models. We posit that making artificial intelligence explainability truly uncertainty-aware is the first step towards this goal.

2411.18510 2026-05-19 stat.ME

A subgroup-aware scoring approach to the study of effect modification in observational studies

一种考虑子组的评分方法用于观察性研究中的效应修饰研究

Yijun Fan, Dylan S. Small

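AI summary: This paper proposes a new group M-statistic that scores matched pairs within each subgroup, addressing the way scaling pooled across subgroups can confuse effect modification with outliers. Extensive experiments support its performance, and the method is applied to a study of a malaria prevention treatment in West Africa.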

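Details
AI abstract (translated from Chinese)

Effect modification means that the size of a treatment effect varies with an observed covariate. In general, a larger treatment effect with more stable error terms is less sensitive to bias, so a study may be found less sensitive to unmeasured bias by using the subgroups that experience larger treatment effects. Lee et al. (2018) proposed the submax method, which uses the joint distribution of subgroup test statistics to reach firmer conclusions when effect modification is present. One version of the submax method uses M-statistics as the test statistics and is implemented in the R package submax (Rosenbaum, 2017). The scaling factor in these M-statistics is computed from all observations pooled across subgroups. We show that this pooling can confuse effect modification with outliers. We propose a new group M-statistic that addresses the issue by scoring the matched pairs within each subgroup. We examine the new scoring strategy across a wide range of settings to show its superior performance. The proposed method is applied to an observational study of the effect of a malaria prevention treatment in West Africa.

English abstract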

Effect modification means the size of a treatment effect varies with an observed covariate. Generally speaking, a larger treatment effect with more stable error terms is less sensitive to bias. Thus, we might be able to conclude that a study is less sensitive to unmeasured bias by using these subgroups experiencing larger treatment effects. Lee et al. (2018) proposed the submax method that leverages the joint distribution of test statistics from subgroups to draw a firmer conclusion if effect modification occurs. However, one version of the submax method uses M-statistics as the test statistics and is implemented in the R package submax (Rosenbaum, 2017). The scaling factor in the M-statistics is computed using all observations combined across subgroups. We show that this combining can confuse effect modification with outliers. We propose a novel group M-statistic that scores the matched pairs in each subgroup to tackle the issue. We examine our novel scoring strategy in extensive settings to show the superior performance. The proposed method is applied to an observational study of the effect of a malaria prevention treatment in West Africa.
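The difference between pooled and subgroup-wise scoring can be illustrated with a small sketch. It is not the implementation in the R package submax: it assumes a Huber-type score that divides each matched-pair difference by a scale and clips the result, with the scale taken as the median absolute difference. The clipping constant `trim`, the choice of the median as the scale, and the function names are assumptions made only for illustration.

```python
import numpy as np

def huber_scores(diffs, scale, trim=3.0):
    """Scale the matched-pair differences, then clip them to [-trim, trim]."""
    return np.clip(diffs / scale, -trim, trim)

def pooled_scores(diffs):
    """Single scale computed from all pairs, regardless of subgroup."""
    return huber_scores(diffs, np.median(np.abs(diffs)))

def groupwise_scores(diffs, groups):
    """Separate scale for each subgroup, computed from that subgroup's pairs only."""
    scores = np.empty(len(diffs), dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        scores[mask] = huber_scores(diffs[mask], np.median(np.abs(diffs[mask])))
    return scores

# Subgroup 1 has a much larger effect than subgroup 0.
diffs = np.array([1.0, -1.0, 2.0, 0.5, 8.0, 12.0, -4.0, 10.0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
pooled = pooled_scores(diffs)
by_group = groupwise_scores(diffs, groups)
```

In this toy data the pooled scale is 3, so the differences 12 and 10 in the large-effect subgroup are clipped as if they were outliers, whereas the subgroup-wise scale of 9 leaves them unclipped.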

2010.15538 2026-05-19 stat.ML cs.LG

Matérn Gaussian Processes on Graphs

图上的Matérn高斯过程

Viacheslav Borovitskiy, Iskander Azangulov, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth, Nicolas Durrande

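AI summary: This paper studies Matérn Gaussian processes on graphs via their stochastic partial differential equation characterization. The resulting processes inherit properties of their Euclidean and Riemannian counterparts and can be trained with standard methods, which makes them usable in mini-batch and non-conjugate settings.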

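Details
Journal ref
Artificial Intelligence and Statistics, 2021
AI abstract (translated from Chinese)

Gaussian processes are a flexible framework for learning unknown functions that allows prior information about the function's properties to be used. Although many Gaussian process models are readily available for Euclidean input spaces, the choice is far more limited when the input space is an undirected graph. In this paper we use the stochastic partial differential equation characterization of Matérn Gaussian processes, a model class widely used in the Euclidean setting, to study their analog on undirected graphs. We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs, and we provide techniques that allow them to be trained with standard methods such as inducing points. This lets graph Matérn Gaussian processes be used in mini-batch and non-conjugate settings, making them more accessible to practitioners and easier to deploy within larger learning frameworks.

English abstract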

Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the input space is Euclidean, the choice is much more limited for Gaussian processes whose input space is an undirected graph. In this work, we leverage the stochastic partial differential equation characterization of Matérn Gaussian processes - a widely-used model class in the Euclidean setting - to study their analog for undirected graphs. We show that the resulting Gaussian processes inherit various attractive properties of their Euclidean and Riemannian analogs and provide techniques that allow them to be trained using standard methods, such as inducing points. This enables graph Matérn Gaussian processes to be employed in mini-batch and non-conjugate settings, thereby making them more accessible to practitioners and easier to deploy within larger learning frameworks.
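A small numerical sketch may help show what a Matérn-type kernel on a graph looks like. It assumes the spectral form $(2\nu/\kappa^2 + \Delta)^{-\nu}$ applied to the graph Laplacian $\Delta$, computed through a dense eigendecomposition, and it rescales the kernel so that the average prior variance equals `sigma2`; that normalisation, the dense computation, and the function names are illustrative choices, not taken from the abstract.

```python
import numpy as np

def graph_matern_kernel(adj, nu=1.5, kappa=1.0, sigma2=1.0):
    """Kernel matrix over graph nodes from the Laplacian spectrum (assumed form)."""
    lap = np.diag(adj.sum(axis=1)) - adj           # combinatorial graph Laplacian
    lam, U = np.linalg.eigh(lap)
    lam = np.clip(lam, 0.0, None)                  # remove tiny negative round-off
    spectrum = (2.0 * nu / kappa ** 2 + lam) ** (-nu)
    K = (U * spectrum) @ U.T
    return sigma2 * K / np.mean(np.diag(K))        # average variance = sigma2

def gp_posterior_mean(K, train_idx, y, noise=1e-2):
    """Exact GP regression mean at every node, given noisy values at train_idx."""
    K_tt = K[np.ix_(train_idx, train_idx)] + noise * np.eye(len(train_idx))
    return K[:, train_idx] @ np.linalg.solve(K_tt, y)

def cycle_graph(n):
    """Adjacency matrix of an n-node cycle, used here as a toy input space."""
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
    return adj
```

For example, `gp_posterior_mean(graph_matern_kernel(cycle_graph(10)), np.array([0, 3, 7]), np.array([1.0, -0.5, 0.2]))` returns a smooth interpolation over all ten nodes. The dense eigendecomposition costs cubic time in the number of nodes, so this form is only practical for small graphs.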

1908.05387 2026-05-19 cs.LG stat.ML

HONEM: Learning Embedding for Higher Order Networks

HONEM:用于高阶网络的嵌入学习

Mandana Saebi, Giovanni Luca Ciampaglia, Lance M Kaplan, Nitesh V Chawla

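AI summary: This paper proposes HONEM, an embedding method designed for higher-order network structure that captures non-Markovian higher-order dependencies and improves node classification, network reconstruction, link prediction, and visualization.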

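Details
Journal ref
Big Data 8, no. 4 (2020): 255-269
AI abstract (translated from Chinese)

Representation learning on networks offers a powerful alternative to the often painstaking process of manual feature engineering and has enjoyed considerable success in recent years. However, all existing representation learning methods are based on the first-order network (FON), that is, a network that captures only pairwise interactions between nodes. As a result, these methods may fail to incorporate non-Markovian higher-order dependencies, and the embeddings they generate may not accurately represent the underlying phenomena in the network, leading to inferior performance in inductive or transductive learning tasks. To address this challenge, the paper presents HONEM, a higher-order network embedding method that captures non-Markovian higher-order dependencies in a network. HONEM is designed specifically for the higher-order network (HON) structure and outperforms other state-of-the-art methods in node classification, network reconstruction, link prediction, and visualization on networks that contain non-Markovian higher-order dependencies.

English abstract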

Representation learning on networks offers a powerful alternative to the oft painstaking process of manual feature engineering, and as a result, has enjoyed considerable success in recent years. However, all the existing representation learning methods are based on the first-order network (FON), that is, the network that only captures the pairwise interactions between the nodes. As a result, these methods may fail to incorporate non-Markovian higher-order dependencies in the network. Thus, the embeddings that are generated may not accurately represent the underlying phenomena in a network, resulting in inferior performance in different inductive or transductive learning tasks. To address this challenge, this paper presents HONEM, a higher-order network embedding method that captures the non-Markovian higher-order dependencies in a network. HONEM is specifically designed for the higher-order network structure (HON) and outperforms other state-of-the-art methods in node classification, network reconstruction, link prediction, and visualization for networks that contain non-Markovian higher-order dependencies.