arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.26093 2026-05-26 cs.LG stat.ML

Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty

Jinwoo Go, Xiaoning Qian, Byung-Jun Yoon

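AI summary: Proposes the GoBOED framework, which combines a variational posterior surrogate with a differentiable convex decision layer to optimize experimental designs directly for downstream decision quality, and shows theoretically that it is insensitive to decision-irrelevant parameter directions.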

Abstract

Bayesian optimal experimental design (BOED) selects experiments to maximize information gain about model parameters. However, in decision-critical settings, reducing parameter uncertainty does not necessarily improve downstream decisions, as only specific parameter directions relevant to the objective truly matter. We propose GoBOED, a goal-driven BOED framework that directly optimizes experimental designs for a specified decision-making objective. GoBOED combines an amortized variational posterior surrogate with a differentiable convex decision layer, enabling gradient-based design optimization that is fully decision-focused. We theoretically show that GoBOED gradients are insensitive to parameter directions irrelevant to the decision objective, providing a formal justification for why goal-driven design achieves equivalent decision quality over a wider set of experimental designs than information-gain maximization. Empirically, across source localization, epidemic management, and pharmacokinetic control, GoBOED identifies designs that better align with downstream decision objectives and reveals that near-optimal design windows are substantially wider than those predicted by goal-agnostic BOED approaches.

2605.26087 2026-05-26 stat.ML cs.LG

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma, Andrew Gordon Wilson, Pavel Izmailov, Carolina Cuesta-Lázaro

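AI summary: Proposes DiscoverPhysics, an interactive benchmark in which LLM agents explore simulated worlds whose physical laws deviate from reality, evaluating their ability to design experiments, revise hypotheses, and discover physical laws.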

Abstract

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately deviates from our own. We construct 22 worlds governed by, among others, screened and fractional-power gravity, multi-species couplings, hidden dark-matter-like particles, non-coordinate-free physics, and time-varying interactions. Each world is generated on demand by an N-body simulator, for which the agent proposes several rounds of experiments, observes raw trajectory data, and ultimately submits both a natural-language explanation of the world's physics and a Python implementation of the inferred law. Because solving a world requires the agent to design informative experiments and revise its hypotheses, the benchmark probes long-horizon reasoning over an experimental history. We evaluate submissions along two complementary axes: trajectory MSE on held-out particles and an LLM-judged explanation score following an expert-written rubric assessing conceptual understanding of each world. Across eleven frontier models, we find that the strongest agents pass only half of the worlds and consistently fail on those where latent structure must be uncovered. Open-source models lag substantially behind commercial models, both in their ability to design informative experiments and in extracting conclusions from the data. We further find that good predictive accuracy does not guarantee high explanation quality and that conceptual understanding depends on hypothesis refinement through well-chosen experiments.

2605.26052 2026-05-26 stat.CO

Quantile autoregressive moving average models for ratio-based bounded time series

Helton Saulo, Roberto Vila, Filidor Vilca

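AI summary: Proposes the quantile unit-log-symmetric autoregressive moving average (QULS-ARMA) model, which uses a quantile reparameterization to embed autoregressive and moving-average dynamics in the conditional quantile, for ratio data on the interval (0,1).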

Comments
24 pages, 1 figure
Abstract

This paper proposes the quantile unit-log-symmetric autoregressive moving average (QULS-ARMA) model for bounded time series on the open unit interval $(0,1)$. The model extends the unit-log-symmetric family by introducing a quantile-based reparameterization and embedding autoregressive and moving-average dynamics directly in the conditional quantile, thereby overcoming limitations of mean-based approaches and providing a coherent framework for proportion data arising from ratios of dependent positive variables. The proposed specification accommodates asymmetric behavior and heavy tails through flexible log-symmetric kernels, including the normal and Student-$t$ distributions. Parameter estimation is carried out via conditional maximum likelihood, and asymptotic properties are established. Monte Carlo simulations and an empirical application to hydroelectric energy storage proportions in Brazil assess the finite-sample performance and practical advantages of the QULS-ARMA model. The results show the good performance of the proposed estimators across a range of scenarios and kernel specifications.

2605.26023 2026-05-26 stat.ME

Considering causality in the construction of molecular signatures of lifestyle exposures

Diana Wu, Vivian Viallon

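AI summary: Uses directed acyclic graphs and d-separation arguments to show that omitting the univariate screening step when constructing molecular signatures can induce collider bias and admit non-causal features, and recommends applying univariate screening before signature construction to mitigate this bias.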

Comments
28 pages, 10 figures
Abstract

Molecular signatures derived from omics data are increasingly used in epidemiological studies to characterize lifestyle exposures, either as proxies of exposure or to provide insight into disease mechanisms. These signatures are typically constructed by regressing the exposure on high-dimensional omics features. In the literature, an initial univariate screening step has sometimes been applied prior to multivariate modelling, but the causal implications of this choice have not yet been considered. Focusing on settings where the exposure causally influences molecular features (and not the reverse), we use directed acyclic graphs (DAGs) and $d$-separation arguments to show that collider bias may arise when the screening step is ignored, leading to the inclusion of non-causal features in the signature. We further demonstrate that the screening step can mitigate this bias. Our simulation studies illustrate that screening reduces the inclusion of non-causal features, albeit at the cost of lower sensitivity and reduced correlation between the exposure and the resulting signature. Overall, we recommend applying univariate screening prior to signature construction, particularly when the inclusion of non-causal features is undesirable, such as in mechanistic studies.

2605.26000 2026-05-26 stat.ML cs.LG stat.ME

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

Jose Blanchet, Peter Glynn, Wenhao Yang

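AI summary: For stochastic gradient descent whose gradients may have infinite variance, proposes a model-agnostic method for constructing confidence regions based on joint weak convergence and a self-normalized statistic, with subsampling calibration yielding asymptotically valid inference.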

Abstract

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.
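
As a rough illustration of the Polyak-Ruppert averaged estimator mentioned in the abstract, the following minimal Python sketch runs SGD on a one-dimensional quadratic and returns the running average of the iterates. The objective, the step-size exponent, the symmetric Pareto noise, and all function names are assumptions made for illustration. The sketch does not implement the paper's self-normalized statistic or its subsampling calibration.

```python
import random


def averaged_sgd(sample, steps, x0=0.0, step_exponent=0.7):
    """Run SGD on f(x) = E[(x - Z)^2] / 2 and return the Polyak-Ruppert average.

    `sample` is a zero-argument callable returning one draw of Z, so the
    stochastic gradient at x is x - Z. The step size is t ** (-step_exponent).
    """
    x = x0
    running_mean = 0.0
    for t in range(1, steps + 1):
        grad = x - sample()
        x -= grad / t ** step_exponent
        # Online update of the average of the iterates x_1, ..., x_t.
        running_mean += (x - running_mean) / t
    return running_mean


def heavy_tailed_sampler(rng, location, tail_index=1.5):
    """Symmetric Pareto noise around `location` (hypothetical test noise).

    For 1 < tail_index <= 2 the mean is finite but the variance is infinite.
    """
    def draw():
        sign = rng.choice((-1.0, 1.0))
        return location + sign * rng.paretovariate(tail_index)
    return draw


if __name__ == "__main__":
    rng = random.Random(0)
    print(averaged_sgd(heavy_tailed_sampler(rng, location=2.0), steps=20000))
```

The minimiser of this toy objective is the mean of Z, so the averaged iterate can be compared with the known location parameter.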

2605.25997 2026-05-26 cs.LG stat.ML

Deployment-complete benchmarking

El Mustapha Mansouri, Keigo Arai

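AI summary: Proposes a deployment-complete benchmarking framework that uses evidence fibers and completion curves to quantify whether benchmark evidence suffices to determine a deployment action, and shows that scores alone are insufficient to support deployment decisions.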

Comments
33 pages, 5 figures, 1 table; supplementary tables and code available
Abstract

Benchmarks increasingly guide deployment, procurement and scientific screening, yet a score supports only the response it records, not necessarily the deployment action. We introduce deployment-complete benchmarking, which tests whether benchmark evidence determines a deployment action. A benchmark is complete for a claim exactly when the action is constant on each evidence fiber; mixed fibers expose missing deployment information, and completion curves quantify the evidence required to resolve ambiguity. In controlled response spaces, benchmark-channel conformal coverage of 94.98% transferred poorly to an unmeasured deployment channel (10.07%), whereas response-rank intervals achieved 94.91% coverage; even zero benchmark error certified only 45.4% of candidates at the largest residual size. Public audits revealed incompleteness, including 97.9% mixed Tox21 fibers and zero median certifiable fraction in main Matbench and JARVIS audits. In held-out replays, certify-then-acquire reduced false decisions from 1.19% to 0.027% in Tox21 and from 20.3% to 0.128% in JARVIS, while changing model choice and identifying deployment-relevant probes. Deployment-ready benchmarks should report evidence, supported actions, ambiguity and completion cost rather than scores alone.

2605.23082 2026-05-26 stat.ML cs.AI cs.LG

KAPLAN: Kolmogorov-Arnold Prognostic Learnable Activation Networks for Survival Analysis

Stelios Boulitsakis Logothetis, Angela Wood, Pietro Liò

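AI summary: Proposes KAPLAN-HR, a B-spline Kolmogorov-Arnold network that nonparametrically estimates the conditional hazard function. Deeper architectures capture interactions and time-varying effects automatically, the convergence rate is shown to depend only on the smoothness of the representation (mitigating the curse of dimensionality), and the model matches or exceeds existing methods on six clinical datasets.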

Comments
9 pages, 3 figures, 13 supplementary pages. Submitted to NeurIPS 2026
Abstract

Survival analysis aims to model how covariates and time jointly shape the time-to-event distribution under right censoring. Classical methods such as the Cox model and generalised additive models (GAMs) require interactions and time-varying effects to be manually specified, which is increasingly impractical on rich clinical datasets. We introduce KAPLAN-HR, a B-spline Kolmogorov-Arnold Network (KAN) for nonparametric estimation of the conditional hazard as a joint function of covariates and time. A single-layer KAPLAN-HR model recovers a GAM, while deeper architectures capture interactions and time-varying effects through composition. We establish a convergence rate for the nonparametric KAN hazard estimator that depends only on the smoothness of the underlying KAN representation and not on the covariate dimension, thereby mitigating the curse of dimensionality for KAN-representable targets. In evaluations over six clinical benchmark datasets, KAPLAN-HR matches or exceeds the predictive performance of established statistical and deep learning survival methods.

2605.25966 2026-05-26 cs.LG cs.CL stat.ML

Mapping the Schedule x Bit-Width Boundary in Sub-100M Quantisation-Aware Training

Christian Brandt Thomassen

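AI summary: Uses large-scale experiments to test whether the optimal learning-rate schedule for quantisation-aware training of sub-100M-parameter decoder language models depends on bit-width. INT6 QAT needs no different schedule, INT4 favours the wd33 schedule at 50M and above, and below 50M the choice is dominated by noise.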

Comments
20 pages, 6 figures, 4 tables. 1345 training runs total (720 + 625). Submitted for review at TMLR
Abstract

We test whether the optimal learning-rate schedule depends on bit-width during from-initialisation quantisation-aware training (QAT) for sub-100M decoder language models. A 720-run factorial grid (Phase 2) over bit-width x warmdown fraction x LR magnitude x model size x seed (FP16/INT8/INT6, 15M-100M, 5 seeds) finds the optimal warmdown is 33% at every (bit-width, size) cell. The primary hypothesis -- that INT6 QAT requires a different schedule than higher-precision training -- is falsified at FP16/INT8/INT6. A 625-run follow-up (Phase 5) probes the null along five axes: optimiser (AdamW), schedule shape (cosine), training length (up to 9x more iterations), an extended size sweep (5M-350M), and an INT4 sweep from 3M to 100M. The null is robust under all three setup changes. The INT6 penalty follows a log-linear scaling law whose fit on Phase 2 predicts the five held-out Phase 5 sizes (5M, 8M, 175M, 250M, 350M) within their 95% prediction intervals (5/5). For INT4 the picture is sharper than the higher precisions: at 50M and 100M, wd33 is decisively optimal (paired z ~ 12-15, 10/10 seeds); below 50M, across the six tested sizes from 3M to 30M, no individual size shows a statistically significant schedule preference and the per-size mean penalty oscillates within seed-level noise. The boundary is therefore a transition between a noise-dominated regime below 50M and a decisive wd33 regime at and above 50M, not a clean wd10 region. A weight-to-grid-distance probe falsifies the simplest mechanism for the FP16/INT8/INT6 null result (rapid grid-snapping): pre-warmdown, INT6-QAT weights sit at essentially the same distance from the INT6 grid as FP16 weights (ratio ~ 1.04). Practical recommendation: at sub-100M scale, tune the LR schedule once at FP16 and apply unchanged to INT8/INT6 QAT; for INT4 at 50M+ use wd33; for INT4 below 50M the schedule choice is in the noise.

2605.25897 2026-05-26 stat.ME

Nonparametric Estimation via Expected Order Statistics

Tommaso Lando, Lorenzo Tedesco

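AI summary: Proposes a nonparametric estimator that reduces the estimation error of the empirical distribution function by assigning mass to estimated expected order statistics, and establishes its finite-sample properties, asymptotic theory, and bootstrap validity.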

Abstract

The empirical distribution function assigns mass $1/n$ to each of the $n$ observations in a sample. As these are highly variable, estimation error may be reduced by replacing them with estimated observations that are asymptotically less variable. Motivated by this idea, we introduce a nonparametric estimator obtained by assigning mass $1/m$ to $m$ estimated expected order statistics, with $m$ chosen arbitrarily. The estimator enjoys several finite-sample properties and yields a rich asymptotic theory. Its estimation error relative to its population counterpart is controlled by the $L^1$ error of the empirical distribution. Moreover, every $L$-functional of the new estimator corresponds to an $L$-functional of the empirical distribution with updated weights. We establish almost sure convergence in $L^p$ norm and Wasserstein distance as $n \to \infty$, and derive weak convergence of the associated empirical quantile process in $L^p(0,1)$, for $p\in[1,\infty)$ and $m$ fixed, and for $p=1,2$ as $n,m \to \infty$. These results yield asymptotic distributions for distance-based functionals, including $L^p$ and Wasserstein metrics. Bootstrap validity is also established. Simulations show that the estimator often improves on the empirical distribution and remains competitive with kernel methods, with more stable performance across different distributional settings.

2605.25873 2026-05-26 stat.ME stat.CO

Bayesian perspectives on exponential random graph models

Alberto Caimo, Isabella Gollini

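AI summary: Reviews inference methods for Bayesian exponential random graph models, including auxiliary variable MCMC, adjusted pseudo-likelihood, and variational methods, and discusses model selection and extensions such as missing data and longitudinal dynamics.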

Comments
16 pages
Abstract

Exponential random graph models (ERGMs) are a widely used framework for network data, enabling hypothesis testing on the structural mechanisms underlying observed networks. Bayesian ERGMs provide principled uncertainty quantification and enable the incorporation of prior knowledge through fully probabilistic modelling. However, computation remains challenging because the posterior is doubly intractable, with a likelihood normalising constant that depends on unknown parameters. This paper reviews Bayesian approaches to ERGM inference, categorising inference methods into three broad classes: auxiliary variable MCMC methods, adjusted pseudo-likelihood approaches, and variational methods, alongside dedicated treatment of model selection. We also discuss modelling extensions for missing data, longitudinal dynamics, populations of networks, and weighted networks, highlighting applications across various scientific disciplines.

2605.25870 2026-05-26 eess.SP math.ST stat.AP stat.TH

The Symmetric Location Problem: a Song of Efficiency and Robustness

Stefano Fortunati

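AI summary: Addresses the symmetric location problem within the semiparametric statistical framework, estimating a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter while reconciling statistical efficiency with distribution-free robustness.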

Abstract

The aim of this Lecture Note is to introduce the Signal Processing (SP) community to a powerful yet still under-utilised tool: semiparametric statistics. In short, the semiparametric framework allows us to estimate or perform hypothesis testing on a finite-dimensional parameter in the presence of an infinite-dimensional nuisance parameter (i.e., a function), such as the density of the noise. Clearly, this framework is general enough to include almost every SP application. Remarkably, as the title suggests, drawing on George R. R. Martin's famous book series, the greatest advantage of semiparametric statistics over parametric and non-parametric ones lies in the fact that it is able to reconcile two seemingly dichotomous concepts: statistical efficiency and robustness. Here, robustness is understood in the sense of distribution-freeness, that is, the estimation performance must be robust with respect to the lack of knowledge of the functional form of the generating data distribution. To explain exactly what this means, in this Lecture Note we will focus our attention on the famous and fundamental symmetric location problem. The symmetric location problem is a fundamental problem that can be found (in various forms) in countless areas of SP: source localization, time synchronization, array signal processing, and distributed sensor networks, just to name a few. Furthermore, it is important to note that the methodology we will develop for this specific problem can be extended to much more general semiparametric estimation problems, such as the estimation of the location vector and covariance matrix in elliptical data.

2605.25859 2026-05-26 math.ST cs.LG stat.TH

Minimax Limits of k-Fold Cross-Validation via Majority

Ido Nachum, Rüdiger Urbanke, Thomas Weinberger

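AI summary: By analysing the cross-validation mean-squared error of the majority algorithm in binary classification, reveals minimax limits of k-fold cross-validation and proves that, when the number of folds k grows with the sample size n, a minimax mean-squared-error lower bound of order Ω(√k/n) is unavoidable for any empirical risk minimization algorithm.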

Abstract

We study the mean-squared error of $k$-fold cross-validation as a risk estimator, with particular emphasis on how its accuracy depends on the number of folds $k$. Despite the widespread use of cross-validation, principled guidance for choosing $k$ is largely absent, mainly due to the complex dependence between fold-wise error estimates. To obtain sharp and interpretable results, we focus on the majority algorithm in binary classification, a minimal yet nontrivial empirical risk minimization procedure. We provide a fine-grained analysis of its cross-validation behavior, showing that even this simple algorithm exhibits subtle and delicate phenomena for which existing theory provides loose and even vacuous bounds. Leveraging this analysis, we introduce a minimax framework for cross-validation risk estimation and prove that no empirical risk minimization algorithm can achieve an $O(1/n)$ minimax mean-squared error when the number of folds grows with the number of samples $n$; instead, a lower bound of order $\Omega(\sqrt{k}/n)$ is unavoidable. Our results reveal fundamental limitations of cross-validation as a data-reuse strategy, clarify gaps and inaccuracies in prior theoretical work, and position the majority algorithm as a natural benchmark that any tight analysis of cross-validation should be able to explain.
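
To make the object of study concrete, here is a minimal Python sketch of the $k$-fold cross-validation risk estimate for a majority rule on binary labels. The contiguous fold split, the tie-breaking rule, and the function names are assumptions for illustration and may differ from the paper's exact setup.

```python
def majority_label(labels):
    """Return the most frequent binary label; ties (and empty input) go to 1."""
    ones = sum(labels)
    return 1 if 2 * ones >= len(labels) else 0


def kfold_cv_error(labels, k):
    """k-fold cross-validation estimate of the majority rule's 0-1 risk.

    Folds are contiguous blocks of the label list; requires 1 <= k <= len(labels).
    """
    n = len(labels)
    fold_errors = []
    for i in range(k):
        start, stop = i * n // k, (i + 1) * n // k
        held_out = labels[start:stop]
        training = labels[:start] + labels[stop:]
        # "Train" by taking the majority label of the remaining folds.
        prediction = majority_label(training)
        mistakes = sum(1 for y in held_out if y != prediction)
        fold_errors.append(mistakes / len(held_out))
    # Average the fold-wise error rates.
    return sum(fold_errors) / k


if __name__ == "__main__":
    sample = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    print(kfold_cv_error(sample, k=5))
```

The fold-wise errors share training data, which is the dependence between folds that the abstract identifies as the main obstacle to analysing the estimator.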

2605.25855 2026-05-26 stat.ME math.ST stat.ML stat.TH

High-Dimensional Change-Point Detection via Angular Kernel Statistics

Jyotishka Ray Choudhury, Yao Xie

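AI summary: For high-dimensional, low sample size (HDLSS) data, proposes a dimension-averaged angular kernel scan framework that aggregates bounded one-dimensional angular discrepancies across coordinates to give nonparametric, hyperparameter-free, moment-agnostic change-point detection, with statistical inference guarantees for both offline and online procedures.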

Abstract

We study change-point detection for high-dimensional data in regimes where inference must be performed from small batches of observations. Our primary focus is the high-dimensional, low sample size (HDLSS) regime, where the sequence length is fixed while the ambient dimension diverges. We propose a dimension-averaged angular kernel scan framework for detecting marginal distributional shifts. The statistic aggregates bounded one-dimensional angular discrepancies across coordinates, yielding a fully nonparametric, hyperparameter-free, and moment-agnostic estimator that remains well-defined without specifying, estimating, or assuming finite marginal moments, for example under heavy-tailed or contaminated distributions. For the offline single-change problem, we derive an exact population mean factorization into a universal deterministic shape function and a scalar signal factor, characterize the null covariance structure up to a scalar long-run variance factor, and establish an HDLSS multivariate central limit theorem under cross-coordinate mixing. These results lead to plug-in Gaussian calibration, asymptotic type-I error control, and power and localization guarantees, including a $d^{-1/2}$ local detection scale. We further extend the offline procedure to a fixed-window sequential monitoring procedure for high-dimensional streaming data, and obtain ARL calibration and worst-case EDD bounds. Simulation studies demonstrate that the proposed method can accurately detect and localize changes in challenging HDLSS and streaming settings where moment-based or hyperparameter-sensitive procedures may be unreliable.

2605.25811 2026-05-26 stat.ME cs.LG stat.ML

Geometry Adaptive Counterfactual Distribution Learning with Diffusion-Guided Smoothing

Kwangho Kim

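AI summary: For high-dimensional counterfactual distribution learning, proposes two diffusion-guided, geometry-adaptive smoothing estimators whose leading error is governed by an effective dimension, supported by semi-synthetic experiments based on CelebA.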

Abstract

We study counterfactual distribution learning for high-dimensional outcomes whose counterfactual law may concentrate near lower-dimensional structure. Standard isotropic smoothing treats all ambient directions equally, leading to unfavorable scaling and unstable local inference. We propose two diffusion-guided estimators based on semiparametric debiasing: diffusion-informed smoothing for counterfactual densities and diffusion-informed score smoothing for counterfactual scores. The estimators combine causal nuisance adjustment with geometry-adaptive localization driven by diffusion score information, removing first-order nuisance bias while aligning smoothing with local outcome geometry. We establish asymptotic expansions, risk bounds, and inference procedures for smoothed density and score-based targets, with ambient density inference obtained under additional approximation conditions. Under structural geometry conditions, the leading stochastic error is governed by an effective dimension induced by the diffusion-guided kernel, rather than by the ambient dimension. Semi-synthetic experiments based on CelebA show steeper error decay for geometry-adaptive methods, supporting the proposed effective-dimension theory.

2605.25809 2026-05-26 math.NA cs.NA stat.CO

A multilevel sketch-and-solve method for overdetermined least squares problems

Irina-Beatrice Haas, Michael B. Giles, Yuji Nakatsukasa

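AI summary: Proposes a multilevel sketch-and-solve (MLSAS) method that combines SAS samples from small sketches with correction terms from larger sketches for overdetermined least squares problems. The variance of the correction terms decreases faster than that of the simple SAS estimator, but the overall computational cost is slightly higher than that of simple averaging.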

Comments
19 pages, 3 figures
Abstract

Sketch-and-solve (SAS) is a very successful method to efficiently estimate the solution of heavily overdetermined large linear least squares problems. It uses random sketching to reduce the size of the problem, hence reducing the computational cost. Several authors have shown that averaging several solutions from SAS further improves the accuracy, which is measured by the residual associated with the approximate solution. Going further, we combine solutions from sketch-and-solve in a multilevel manner, such that the approximate solution is a combination of SAS samples obtained from small sketches and more accurate correction terms obtained from larger sketches. We first consider the variance of the estimator, which depends on the variance of the coarse samples and the correction terms. We show that the variance of the correction terms on each level follows a trend and decreases faster than the variance of the simple SAS estimator. However, we then show that the overall computational cost of our multilevel framework is slightly higher than that of the simple average estimator, so a naive application of multilevel methods appears unattractive for least squares problems.
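
The baseline that the multilevel scheme builds on, plain sketch-and-solve and its simple average, can be sketched in pure Python as follows. This is an illustration under stated assumptions: a dense Gaussian sketch, a two-column matrix solved through the normal equations, and a synthetic problem, with all function names hypothetical. It does not implement the multilevel correction terms.

```python
import random


def solve_least_squares(rows, rhs):
    """Minimise ||A x - b||_2 for a two-column A via the normal equations."""
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    t1 = sum(r[0] * y for r, y in zip(rows, rhs))
    t2 = sum(r[1] * y for r, y in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def residual_norm(rows, rhs, x):
    """Euclidean norm of A x - b on the full (unsketched) problem."""
    return sum((r[0] * x[0] + r[1] * x[1] - y) ** 2
               for r, y in zip(rows, rhs)) ** 0.5


def sketch_and_solve(rows, rhs, sketch_size, rng):
    """Solve min ||S A x - S b||_2 for a Gaussian sketch S with sketch_size rows."""
    sk_rows, sk_rhs = [], []
    for _ in range(sketch_size):
        # One row of S; its scaling is omitted because it does not change the minimiser.
        g = [rng.gauss(0.0, 1.0) for _ in rows]
        sk_rows.append((sum(gi * r[0] for gi, r in zip(g, rows)),
                        sum(gi * r[1] for gi, r in zip(g, rows))))
        sk_rhs.append(sum(gi * y for gi, y in zip(g, rhs)))
    return solve_least_squares(sk_rows, sk_rhs)


def averaged_sketch_and_solve(rows, rhs, sketch_size, repeats, rng):
    """Average the solutions of several independent sketch-and-solve runs."""
    sols = [sketch_and_solve(rows, rhs, sketch_size, rng) for _ in range(repeats)]
    return tuple(sum(s[j] for s in sols) / repeats for j in range(2))


def make_problem(n, rng):
    """Synthetic overdetermined problem b = A x_true + noise (illustrative only)."""
    rows = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    rhs = [1.5 * a - 0.5 * c + rng.gauss(0.0, 0.1) for a, c in rows]
    return rows, rhs


if __name__ == "__main__":
    rng = random.Random(1)
    rows, rhs = make_problem(300, rng)
    for x in (solve_least_squares(rows, rhs),
              sketch_and_solve(rows, rhs, 30, rng),
              averaged_sketch_and_solve(rows, rhs, 30, 5, rng)):
        print(residual_norm(rows, rhs, x))
```

Accuracy is compared through the residual on the full problem, matching the accuracy measure named in the abstract. The exact solve is included only as a reference point for this small example.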

2605.25789 2026-05-26 cs.LG cs.AI cs.IT math.IT stat.ML

On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits

Yunlong Hou, Zixin Zhong, Vincent Y. F. Tan

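AI summary: Studies a multi-armed bandit problem in which cumulative regret is minimized after an initial free exploration phase, proposes the two-phase algorithm UFE-KLUCB-H, and proves that it accumulates strictly less regret than policies without free exploration.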

Comments
55 pages
Abstract

We study a stochastic multi-armed bandit problem where an agent is granted a free exploration budget before regret accumulates, a setting not captured by the classic regret minimization or pure exploration paradigms. The goal is to design an adaptive policy that strategically explores the bandit instance in the initial free exploration phase and minimizes the cumulative regret in the subsequent phase. We formalize this regret minimization with free exploration problem and identify an interesting regime where the free exploration budget scales logarithmically with the time horizon. To quantify the amount of regret saved with high probability as a result of the availability of the free exploration phase, we introduce a novel set of policies known as $(\alpha,\beta)$-probably saving policies. We propose a two-phase, probably saving algorithm, UFE-KLUCB-H, which consists of a principled free exploration policy, UFE, and a history-aware regret minimization policy KLUCB-H. Instance-dependent upper bounds on UFE-KLUCB-H are derived, showing that UFE-KLUCB-H accumulates strictly less regret than policies that do not have access to a free exploration phase. Complementarily, we derive instance-dependent lower bounds based on novel multi-instance perturbation arguments tailored to the free-exploration setting, demonstrating the near-optimality of UFE-KLUCB-H for two-valued bandits. Our upper and lower bounds reveal sharp phase transitions in the accumulated regret depending on the amount of available free exploration. Simulations are conducted to demonstrate that forced exploration and adaptivity in the algorithm lead to greater regret savings.
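
The abstract does not specify KLUCB-H, so the following Python sketch shows only the standard KL-UCB index for Bernoulli rewards, on which the name suggests it builds (an assumption). The exploration budget is taken as log(t) with no additional log-log term, and the function names are hypothetical.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), found by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid   # mid is still inside the confidence region
        else:
            hi = mid   # mid is too optimistic for the available evidence
    return lo


if __name__ == "__main__":
    # Index of an arm with empirical mean 0.5 after 10 and after 100 pulls at round 1000.
    print(kl_ucb_index(0.5, 10, 1000), kl_ucb_index(0.5, 100, 1000))
```

Bisection works because the divergence is increasing in q on [mean, 1]. A free exploration phase would enter such an index through the pull counts and empirical means carried over from that phase, which is one plausible reading of "history-aware".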

2605.25766 2026-05-26 math.ST q-fin.RM stat.TH

Measuring multivariate maximal tail dependence

测量多元最大尾部依赖性

Takaaki Koike, Marius Hofert, Haruki Tsunekawa

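AI summary: To address the limitation that the classical tail dependence coefficient evaluates the copula only along the diagonal, this paper introduces and studies the multivariate maximal tail concordance measure (MTCM), which extends the bivariate measure to the multivariate case, quantifies the largest tail mass over hyperrectangles of common unit volume, and comes with a proof that an optimal direction exists.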

详情
AI中文摘要

经典的尾部依赖系数(TDC)可能无法捕捉二元尾部依赖的非交换特征,因为它仅沿对角线评估底层连接函数。为解决这一局限性,二元情形下已提出几种最强尾部依赖表现形式的测度,包括基于底层二元连接函数的尾部连接函数的测度。本文引入并研究了多元最大尾部协调测度(MTCM),将二元测度扩展到多元情形。MTCM量化了公共单位体积下超矩形上的最大尾部质量,而相应的最大化器识别了最大尾部概率的方向。我们建立了多元情形下MTCM的基本性质,包括最优方向的存在性。我们还推导了几个重要模型类的解析表示。对于具有正则变化阿基米德生成元的生存Marshall-Olkin连接函数、Archimax和嵌套阿基米德连接函数,进一步得到了闭式表达式。应用于英格兰三变量年海平面最大值表明,MTCM可以揭示非对角应力方向以及基于似然或TDC的比较未能检测到的潜在极端依赖的显著差异。

英文摘要

The classical tail dependence coefficient (TDC) may fail to capture non-exchangeable features of bivariate tail dependence since it evaluates the underlying copula only along the diagonal. To address this limitation, several measures of strongest manifestation of tail dependence have been proposed in the bivariate case, including a measure based on the tail copula of the underlying bivariate copula. This paper introduces and investigates the multivariate maximal tail concordance measure (MTCM) which extends the bivariate measure to the multivariate case. The MTCM quantifies the largest tail mass over lower hyperrectangles of common unit volume, while the associated maximizer identifies the direction of maximal tail probability. We establish fundamental properties of the MTCM in the multivariate case, including existence of an optimal direction. We also derive analytical representations for several important model classes. Closed-form expressions are further obtained for survival Marshall-Olkin copulas, Archimax and nested Archimedean copulas with regularly varying Archimedean generators. An application to trivariate annual sea-level maxima in England shows that the MTCM can reveal off-diagonal stress directions and substantial differences in the underlying extremal dependence not detected by likelihood- or TDC-based comparisons.
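
The gap between the diagonal TDC and a maximal measure can be illustrated in the bivariate case. The sketch below assumes a tail copula of the form $\Lambda(x, y) = \min(\alpha x, \beta y)$, which is what a survival Marshall-Olkin copula gives under the usual parametrization, and takes the maximal measure to be the supremum of $\Lambda(b, 1/b)$ over $b > 0$ (rectangles of unit area). Both the functional form and the grid search are illustrative assumptions, not the paper's multivariate construction.

```python
import math

def tail_copula(x: float, y: float, alpha: float, beta: float) -> float:
    """Assumed tail copula Lambda(x, y) = min(alpha * x, beta * y)."""
    return min(alpha * x, beta * y)

def maximal_tail_mass(alpha: float, beta: float, grid: int = 20001):
    """Grid search of Lambda(b, 1/b) over b in [e^-3, e^3] (unit-area rectangles)."""
    best_value, best_b = -1.0, None
    for i in range(grid):
        b = math.exp(-3 + 6 * i / (grid - 1))
        value = tail_copula(b, 1 / b, alpha, beta)
        if value > best_value:
            best_value, best_b = value, b
    return best_value, best_b

alpha, beta = 0.9, 0.3
tdc = tail_copula(1.0, 1.0, alpha, beta)          # value on the diagonal
mtcm, direction = maximal_tail_mass(alpha, beta)  # maximum over off-diagonal directions
```

Under this assumed form the maximum is attained where $\alpha b = \beta / b$, so the maximal value is $\sqrt{\alpha\beta}$, which exceeds the diagonal value $\min(\alpha, \beta)$ whenever $\alpha \neq \beta$.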

2605.25739 2026-05-26 cs.LG cs.GT stat.ML

The Behavioral Credibility Trilemma: When Calibrated Autonomy Becomes Impossible

行为可信度三难困境:当校准自主性变得不可能

Lauri Lovén, Nam Do, Hassan Mehmood, Dinesh Kumar Sah, Sasu Tarkoma

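AI summary: The paper proves that, under rational oversight and whenever some tasks exceed the agent's reliable competence, no reinforcement learning policy with confidence-gated autonomy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy: the Behavioral Credibility Trilemma.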

详情
Comments
48 pages, 3 figures
AI中文摘要

我们证明,在理性监督下,当某些任务超出智能体的可靠能力时,任何具有置信门控自主性的强化学习策略都无法同时实现最大帮助性、最优校准和完全自主性:即行为可信度三难困境。这种不可能性是几何性的——向严格适当的评分规则添加任何非仿射自主性激励都会破坏严格适当性,因此,同时因校准置信度和自主行动而获得奖励的智能体,会在低于委托人批准阈值的任务上系统性地夸大其报告的置信度。行为扰动引理量化了这种膨胀(对于Brier分数,缩放比例为 $w_A/(2 w_C)$),并表明检测需要 $Ω(1/Δ^2)$ 次观测。我们证明委托人的最优监督规则必然是非仿射的,这使得不可能性是无条件的,并且在对数凹密度策略族中与优化器无关。我们形式化了置信门控决策问题,将现有方法映射到三难困境上,并确定了两种建设性的解决路径(承诺、领域分离)。一个540配置的Best-of-N实验测试了五个预注册假设,所有假设均得到强烈证实(效应量 $d = 1.10$ 至 $5.32$),并增加了对可达 $(H, C, A)$ 曲面几何的描述性分析,显示了一个与预测的膨胀饱和一致的平台截断前沿。

英文摘要

We prove that no reinforcement learning policy with confidence-gated autonomy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy under rational oversight, whenever some tasks exceed the agent's reliable competence: the Behavioral Credibility Trilemma. The impossibility is geometric: adding any non-affine autonomy incentive to a strictly proper scoring rule destroys strict properness, so an agent rewarded for both calibrated confidence and autonomous action systematically inflates its reported confidence on tasks below the principal's approval threshold. The Behavioral Perturbation Lemma quantifies the inflation (scaling as $w_A/(2 w_C)$ for the Brier score) and shows detection requires $Ω(1/Δ^2)$ observations. We prove the principal's optimal oversight rule is necessarily non-affine, making the impossibility unconditional and optimizer-independent across log-concave-density policy families. We formalize the Confidence-Gated Decision Problem, map existing methods onto the trilemma, and identify two constructive resolution pathways (commitment, domain separation). A 540-configuration Best-of-N experiment tests five pre-registered hypotheses, all strongly confirmed (effect sizes $d = 1.10$ to $5.32$), and adds a descriptive analysis of the achievable-$(H, C, A)$ surface geometry showing a plateau-truncated frontier consistent with the predicted inflation saturation.
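
The $w_A/(2 w_C)$ scaling can be reproduced in a toy model. The sketch below is not the paper's lemma: it assumes a binary outcome with true success probability `p`, a Brier-score calibration reward weighted by `w_c`, and an autonomy bonus modelled only by its local slope `w_a` in the reported confidence `q`. All names and the linear bonus are illustrative assumptions.

```python
def expected_reward(q, p, w_c, w_a):
    """-w_c * expected Brier loss + w_a * q (assumed locally linear autonomy bonus)."""
    brier = p * (1 - q) ** 2 + (1 - p) * q ** 2
    return -w_c * brier + w_a * q

def best_report(p, w_c, w_a, grid=100001):
    """Report q in [0, 1] that maximises the expected reward, by grid search."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=lambda q: expected_reward(q, p, w_c, w_a))

p, w_c, w_a = 0.5, 1.0, 0.2
honest = best_report(p, w_c, 0.0)    # Brier score alone: the truthful report is optimal
inflated = best_report(p, w_c, w_a)  # with the bonus: the optimum shifts upward
```

Setting the derivative $-2 w_C (q - p) + w_A$ to zero gives $q = p + w_A/(2 w_C)$ in this toy model, which matches the scaling quoted in the abstract for the Brier score.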

2605.25734 2026-05-26 stat.AP stat.ME stat.ML

Stein-Encoder: A White-Box Supervised Encoder via Stein Identities in Multi-Modal Studies

Stein-Encoder: 多模态研究中基于Stein恒等式的白盒监督编码器

Jiarui Zhang, Shuoxun Xu, Jiasheng Shi, Xinzhou Guo

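AI summary: To address the entanglement of genetic and clinical features in multi-modal biomedical data, the paper proposes Stein-Encoder, a white-box supervised framework that uses Stein's method and residualization to build an interpretable single index, achieving structural disentanglement and improved predictive performance.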

详情
AI中文摘要

在多模态生物医学研究中,将高维基因组数据与临床基线整合对于精准医学至关重要。然而,标准深度神经网络方法常常纠缠这些模态,掩盖了遗传特征的具体预测影响,并可能导致次优的预测性能。受里程碑式的METABRIC队列原发性乳腺肿瘤研究的启发,我们提出了Stein-Encoder,一种白盒监督框架,旨在隔离在混杂协变量条件下驱动临床结果的遗传信号。通过利用Stein方法和残差化技术,我们的方法构建了一个可解释的单指标,总结了相关的生物异质性,同时灵活地纳入临床因素,并可用于改进下游预测。我们建立了识别性、一致性和效率改进的理论保证。应用于METABRIC队列,Stein-Encoder在预测准确性上优于无监督基准。关键的是,它通过揭示响应特异性生物机制实现了结构解耦:我们发现肿瘤大小主要由有丝分裂网络驱动,而预后指数则依赖于不同的增殖-免疫轴。这项工作提供了一个统一、计算高效的框架,弥合了统计严谨性与神经网络表示能力之间的差距,使得可解释、任务特异性和高效的多模态健康数据压缩成为可能,适用于超越生物标志物发现的广泛精准医学应用。

英文摘要

In multi-modal biomedical research, integrating high-dimensional genomic data with clinical baselines is essential for precision medicine. However, standard deep neural network approaches often entangle these modalities, obscuring the specific predictive impact of genetic features and leading to possibly suboptimal predictive performance. Motivated by the landmark METABRIC cohort primary breast tumors study, we propose the Stein-Encoder, a white-box supervised framework designed to isolate the genetic signal driving clinical outcomes conditional on nuisance covariates. By leveraging Stein's method and residualization techniques, our approach constructs an interpretable single index that summarizes relevant biological heterogeneity while flexibly incorporating clinical factors and can be used to improve downstream prediction. We establish theoretical guarantees for identification, consistency and efficiency improvement. Applied to the METABRIC cohort, the Stein-Encoder outperforms unsupervised benchmarks in predictive accuracy. Crucially, it achieves structural disentanglement by revealing response-specific biological mechanisms: we find that tumor size is driven primarily by mitotic networks, whereas prognostic indices rely on a distinct proliferation-versus-immune axis. This work contributes a unified, computationally efficient framework that bridges statistical rigor with the representational power of neural networks, enabling interpretable, task-specific and efficient compression of multi-modal health data for a wide range of precision medicine applications, beyond biomarker discovery.

2605.25687 2026-05-26 math.ST stat.TH

Confidence intervals for causal effects in sequential decision making

序贯决策中因果效应的置信区间

Vladimir Vovk, Ruodu Wang

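AI summary: For settings where the back-door or front-door criterion applies, the paper derives confidence intervals and confidence sequences for causal effects and analyzes how the interval width changes with the way the data are collected.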

详情
Comments
15 pages, 4 figures
AI中文摘要

我们在后门或前门准则适用的情形下推导了因果效应的置信区间和置信序列。最紧的置信区间适用于标准设置,其中训练数据由给定因果图描述的系统的独立同分布观测组成。当干预允许依赖于过去数据时,我们的置信区间变宽,并包含来自重对数律的一项,即使观测数量事先已知。在观测数量未知的序贯设置中,我们的置信区间排列成因果效应的置信序列,包含更多的重对数律项,并且变得更宽。

英文摘要

We derive confidence intervals and confidence sequences for causal effects in situations where the back-door or front-door criteria are applicable. Our tightest confidence intervals hold in the standard setting where the training data consists of IID observations over a system described by a given causal diagram. When interventions are allowed to depend on the past data, our confidence intervals become wider and involve a term coming from the law of the iterated logarithm, even where the number of observations is known in advance. In the sequential setting where the number of observations is not given, our confidence intervals, arranged into a confidence sequence for causal effects, involve more iterated logarithm terms and become even wider.
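
For readers unfamiliar with the back-door criterion, the quantity being bounded can be sketched as a plug-in estimate of $P(Y=y \mid do(X=x)) = \sum_z P(y \mid x, z) P(z)$ for discrete data. The code below gives only this point estimate on made-up toy counts; it does not implement the paper's confidence intervals, and the function name and data layout are assumptions.

```python
from collections import Counter

def backdoor_estimate(samples, x, y=1):
    """Plug-in estimate of P(Y=y | do(X=x)) = sum_z P(y | x, z) P(z)."""
    n = len(samples)
    z_counts = Counter(z for z, _, _ in samples)
    xz_counts = Counter((z, xi) for z, xi, _ in samples)
    xyz_counts = Counter((z, xi, yi) for z, xi, yi in samples)
    total = 0.0
    for z, nz in z_counts.items():
        n_xz = xz_counts[(z, x)]
        if n_xz == 0:
            raise ValueError(f"no samples with X={x} in stratum Z={z}")
        total += (xyz_counts[(z, x, y)] / n_xz) * (nz / n)
    return total

# Toy data as (z, x, y) triples, where z is the adjustment variable.
samples = (
    [(0, 1, 1)] * 6 + [(0, 1, 0)] * 4 + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 8
    + [(1, 1, 1)] * 9 + [(1, 1, 0)] * 1 + [(1, 0, 1)] * 5 + [(1, 0, 0)] * 5
)
effect = backdoor_estimate(samples, x=1) - backdoor_estimate(samples, x=0)
```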

2605.25648 2026-05-26 stat.ML cs.LG

StrTransformer: Source-Wise Structured Transformers for Unsupervised Blind Source Recovery

StrTransformer: 面向无监督盲源恢复的源向结构化Transformer

Yuan-Hao Wei

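AI summary: Proposes the StrTransformer framework, which directly optimizes the latent source matrix together with source-wise structured Transformer branches and an observation-space mixer, for blind source recovery and branch-wise latent modeling.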

详情
AI中文摘要

本文提出StrTransformer,一种用于盲源恢复和分支潜在建模的源向结构化Transformer框架。StrTransformer不使用编码器推断潜在变量,而是直接优化潜在源矩阵,同时结合观测空间混合器和源向结构化Transformer分支。混合器强制重建一致性,而每个Transformer分支对一条潜在源轨迹施加可微的结构约束。具体来说,每个源被转换为多尺度补丁令牌,随机掩码,由局部偏置Transformer处理,并通过掩码补丁重建能量进行评估。该能量作为隐式的源向结构先验。为了鼓励不同潜在分支专门处理不同的时间模式,StrTransformer进一步引入有序多尺度控制器,学习分支特定的补丁尺度权重、有序尺度中心和局部注意力斜率。最终目标函数结合了观测重建、源向结构正则化以及用于分离和尺度专门化的模块化辅助惩罚。我们分析了目标函数的解耦和耦合结构、正则化精确重建纤维,以及由有序分支描述符引起的置换对称性减少。一个受控案例研究表明,学习到的分支收敛到不同的时间尺度结构,并在事后评估中恢复源对齐的潜在轨迹。

英文摘要

This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the latent source matrix together with an observation-space mixer and source-wise structural Transformer branches. The mixer enforces reconstruction consistency, while each Transformer branch imposes a differentiable structural constraint on one latent source trajectory. Specifically, each source is converted into multi-scale patch tokens, randomly masked, processed by a locality-biased Transformer, and evaluated through a masked patch reconstruction energy. This energy acts as an implicit source-wise structural prior. To encourage different latent branches to specialize into different temporal regimes, StrTransformer further introduces an ordered multi-scale controller that learns branch-specific patch-scale weights, ordered scale centers, and locality attention slopes. The resulting objective combines observation reconstruction, source-wise structural regularization, and modular auxiliary penalties for separation and scale specialization. We analyze the decoupling and coupling structure of the objective, the regularized exact-reconstruction fiber, and the reduction of permutation symmetry induced by ordered branch descriptors. A controlled case study shows that the learned branches converge to distinct temporal-scale structures and recover source-aligned latent trajectories under post-hoc evaluation.

2605.25633 2026-05-26 math.ST stat.TH

Exponential mixing properties of nonlinear functional autoregressive models

非线性函数自回归模型的指数混合性质

Shuntarou Suzuki, Yoshikazu Terada

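AI summary: The paper derives sufficient conditions for nonlinear functional autoregressive models to be exponentially mixing and applies them to deep-learning estimation of Urysohn operators, obtaining convergence rates.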

详情
AI中文摘要

近年来,函数型数据分析的重要性显著增加。在机器学习中,基于深度神经网络的非线性函数回归被称为算子学习,其许多应用涉及函数型时间序列数据。然而,函数型时间序列分析中非线性模型的理论理解仍然有限,因为现有工作大多关注线性模型。本文推导了分析非线性函数自回归(NFAR)模型中自适应学习的基本性质。具体而言,我们推导了NFAR模型指数混合的充分条件。我们给出了一个满足这些条件的Hammerstein算子示例。作为指数混合的一个应用,我们考虑了具有Urysohn算子的NFAR模型的算子学习,并推导了基于深度神经网络的自适应估计量的收敛速度。

英文摘要

The importance of functional data analysis has increased substantially in recent years. In machine learning, nonlinear function regression based on deep neural networks is referred to as operator learning, and many of its applications involve functional time series data. However, the theoretical understanding of nonlinear models in functional time series analysis remains limited, as most existing works focus on linear models. In this paper, we derive basic properties for analyzing adaptive learning in nonlinear functional autoregressive (NFAR) models. Specifically, we derive sufficient conditions for NFAR models to be exponentially mixing. We provide an example with a Hammerstein operator under which these conditions are satisfied. As an application of exponential mixing, we consider operator learning for NFAR models with Urysohn operators and derive convergence rates for adaptive estimators based on deep neural networks.

2605.25616 2026-05-26 cs.LG stat.ML

Courtroom Analogy: New Perspective on Uncertainty-Aware Classification

法庭类比:不确定性感知分类的新视角

Taeseong Yoon, Heeyoung Kim

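AI summary: Proposes the courtroom analogy, which models uncertainty aggregation in classification through a structured mixture of Dirichlet distributions, and designs MoDEX, a single-pass neural network, for efficient and interpretable uncertainty quantification.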

详情
Comments
ICML 2026
AI中文摘要

分类中的单次不确定性量化方法通过预测类概率向量上的可处理分布来表示不确定性。现有方法主要关注增强该分布的表示能力,但往往对预测不确定性如何结构化和聚合提供的见解有限,导致可解释性较弱。我们引入法庭类比,将不确定性感知分类概念化为类特定倡导者之间的结构化辩论。每位倡导者形成概率意见,并通过输入依赖的可信度权重聚合这些意见得出最终裁决。在此框架中,每位倡导者的意见被建模为狄利克雷分布,其浓度参数分解为共享证据和类特定倡导。这产生了具有语义可解释参数的结构化混合狄利克雷分布。为实例化该公式,我们提出了混合狄利克雷专家(MoDEX),一种预测法庭参数的单次前馈神经架构,能够在显式建模不确定性聚合的同时实现高效且表达力强的不确定性量化。我们证明MoDEX具有强大的理论性质,并在多种基准测试中实现了最先进的不确定性量化性能,产生具有有意义语义的可解释不确定性估计。

英文摘要

Single-pass uncertainty quantification (UQ) methods for classification represent uncertainty by predicting a tractable distribution over the class probability vector. While existing approaches primarily focus on enhancing the expressiveness of this distribution, they often provide limited insight into how predictive uncertainty is structured and aggregated, resulting in weak interpretability. We introduce the courtroom analogy, which conceptualizes uncertainty-aware classification as a structured debate among class-specific advocates. Each advocate forms a probabilistic opinion, and a final verdict is reached by aggregating these opinions using input-dependent plausibility weights. In this framework, each advocate's opinion is modeled as a Dirichlet distribution whose concentration parameter is decomposed into shared evidence and class-specific advocacy. This yields a structured mixture of Dirichlet distributions with semantically interpretable parameters. To instantiate this formulation, we propose Mixture of Dirichlet EXperts (MoDEX), a single-pass neural architecture that predicts the courtroom parameters, enabling efficient and expressive UQ while explicitly modeling uncertainty aggregation. We demonstrate that MoDEX enjoys strong theoretical properties and achieves state-of-the-art UQ performance across diverse benchmarks, yielding interpretable uncertainty estimates with meaningful semantics.

2605.25610 2026-05-26 physics.soc-ph econ.GN math.OC q-fin.EC stat.AP

Match classification in the last round of four-team round-robin tournaments

四队循环赛最后一轮的比赛分类

László Csató, András Gyimesi

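AI summary: By analyzing the FIFA World Cup group stage, the paper gives the first comparison of deterministic and probabilistic match classification methods and uses the probabilistic model to quantify the impact of the 2026 World Cup reforms.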

详情
Comments
22 pages, 4 figures, 6 tables
AI中文摘要

体育比赛最后一轮比赛分类是评估锦标赛设计的成熟工具。确定性和概率性方法均可用于此目的。本文通过分析最突出的四队循环赛例子——FIFA世界杯小组赛,首次对它们进行了比较。我们表明两种方法在实践中高度相关:2014年和2018年FIFA世界杯中分别出现了所有(四种)确定性和(六种)概率性比赛类型。考虑进攻和防守相对收益的概率模型提供了更深入的见解;例如,确定性方法中的竞争性比赛可以是六种概率类型中的任何一种。最后,利用概率框架量化并分解了2026年FIFA世界杯引入的主要改革的影响:扩军至48支球队,以及修改后的晋级和打破平局规则。

英文摘要

Classification of matches played in the last rounds of sports competitions is a well-established tool for evaluating tournament designs. Both deterministic and probabilistic approaches are available for this purpose. Our paper offers the first comparison of them by analysing the most prominent example of four-team round-robin competitions, the group stage of the FIFA World Cup. We show that both methods are highly relevant in practice: all (four) deterministic and (six) probabilistic match types occurred in the 2014 and 2018 FIFA World Cups, respectively. The probabilistic model, which accounts for the relative benefits of attacking and defending, provides deeper insights; for instance, the competitive matches from the deterministic approach can be of any of the six probabilistic types. Finally, the probabilistic framework is used to quantify and decompose the impact of the main reforms introduced for the 2026 FIFA World Cup: the expansion to 48 teams, as well as the modified qualification and tie-breaking rules.

2605.25608 2026-05-26 stat.ML cs.LG

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

利用范数约束神经网络学习稀疏组合函数

Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, Tomaso Poggio

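AI summary: Using norm-constrained deep neural networks, the paper establishes approximation rates and excess risk bounds for learning sparse compositional functions, showing that deep networks can avoid the curse of dimensionality through hierarchical representations.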

详情
AI中文摘要

深度神经网络学习层次特征的能力被广泛认为是其在高维学习中成功的关键机制。现有理论通过基于参数计数的逼近率和组合模型的无维数灾难样本复杂度保证,部分支持了这一观点。为了研究参数数量超过样本量的过参数化场景,我们开发了一个通过参数范数衡量复杂度的框架。在该方法中,我们使用Frobenius范数约束的深度神经网络,为学习稀疏组合函数建立了逼近率和过风险界,其中组合函数的组合结构由有向无环图表示。我们的结果具有广泛的适用性,因为每个可有效图灵计算的函数都具有稀疏组合表示。特别地,我们涵盖了一系列代表性模型,包括多指标模型、二叉树结构和一般组合架构。我们推导的速率表明,深度网络可以利用目标函数的组合结构,通过层次表示有效避免维数灾难。

英文摘要

The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates based on parameter counts and sample complexity guarantees for compositional models without incurring the curse of dimensionality (CoD). To study overparameterized regimes, where the number of parameters exceeds the sample size, we develop a framework that measures complexity via the parameter norm. Within this approach, we establish approximation rates and excess risk bounds for learning sparse compositional functions whose compositional structure is represented by directed acyclic graphs (DAGs), using Frobenius norm-constrained deep neural networks. Our results have broad applicability since every function that is efficiently Turing computable admits sparse compositional representations. In particular, we cover a range of representative models, including multi-index models, binary tree structures, and general compositional architectures. The rates we derive show that deep networks can exploit the compositional structure of the target functions, effectively avoiding the CoD through hierarchical representations.

2605.25592 2026-05-26 stat.ML cs.LG

Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

多项Logit模型的最优设计及其在最佳组合识别中的应用

Joongkyu Lee, Min-hwan Oh

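AI summary: For the multinomial logit (MNL) model, the paper proposes a computationally efficient optimal experimental design framework that attains statistical efficiency and scalability via a mixed-integer linear program and a polynomial-time relaxation, and applies it to best assortment identification with linear utilities and non-uniform revenues.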

详情
Comments
Accepted at ICML 2026
AI中文摘要

我们研究了多项Logit(MNL)赌博机的最优实验设计,其中智能体从大小为$N$的基集中重复选择$K$个物品的子集,并观察单选择反馈。与线性或广义线性赌博机不同,MNL赌博机具有组合动作空间,这使得经典的最优设计方法和对所有子集的朴素优化在计算上难以处理。我们为MNL模型提出了一种计算高效的最优设计框架,通过两种互补方法实现了统计效率和可扩展性:(i) 将设计预言精确或认证近似地重构为带有求解器认证早停的$0$-$1$混合整数线性规划(MILP),以及(ii) 一种完全多项式时间的提升设计,用可处理的替代目标替换非线性目标。利用Kiefer-Wolfowitz等价定理,我们建立了接近G-最优性的保证,并刻画了由此产生的统计-计算权衡。作为应用,我们为具有线性效用和非均匀收益的MNL赌博机开发了一种最佳组合识别算法,并证明了实例相关的样本复杂度为$\tilde{O}\big(\frac{d \log N}{\Delta^2}\big)$,其中$d$是特征维度,$N$是臂的数量,$\Delta$是最小收益差距。

英文摘要

We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear bandits, MNL bandits have a combinatorial action space, which makes classical optimal design approaches and naive optimization over all subsets computationally intractable. We propose a computationally efficient optimal design framework for MNL models that achieves both statistical efficiency and scalability through two complementary approaches: (i) an exact or certified-approximate reformulation of the design oracle as a $0$-$1$ mixed-integer linear program (MILP) with solver-certified early stopping, and (ii) a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, we establish near G-optimality guarantees and characterize the induced statistical-computational trade-offs. As an application, we develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, and prove an instance-dependent sample complexity of $\tilde{O}\big(\frac{d \log N}{\Delta^2}\big)$, where $d$ is the feature dimension, $N$ is the number of arms, and $\Delta$ is the minimum revenue gap.
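
As background for the single-choice feedback model, the following sketch computes MNL choice probabilities and expected revenue for one offered assortment with linear utilities $x_i^\top \theta$. The outside (no-purchase) option with utility zero is a common convention in the MNL bandit literature and is assumed here; the abstract does not specify it. Function names and numbers are illustrative.

```python
import math

def mnl_choice_probabilities(features, theta):
    """Return (p_outside, [p_item, ...]) for one offered assortment."""
    utilities = [sum(f * t for f, t in zip(x, theta)) for x in features]
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)  # the 1 is the assumed outside (no-purchase) option
    return 1.0 / denom, [w / denom for w in weights]

def expected_revenue(features, revenues, theta):
    """Expected revenue of the assortment under the MNL choice model."""
    _, probs = mnl_choice_probabilities(features, theta)
    return sum(r * p for r, p in zip(revenues, probs))

theta = [1.0, -0.5]
assortment = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one feature vector per item
p_none, p_items = mnl_choice_probabilities(assortment, theta)
revenue = expected_revenue(assortment, [1.0, 2.0, 3.0], theta)
```

Because the denominator depends on the whole offered set, the probability of choosing one item changes with the other items in the assortment, which is what makes the action space combinatorial.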

2605.25590 2026-05-26 stat.ML cs.LG

Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

基于折扣在线镜像下降的非平稳广义线性老虎机

Joongkyu Lee, Min-hwan Oh

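AI summary: Proposes DOMD-GLB, an algorithm that handles nonstationary generalized linear bandits with discounted online mirror descent, achieving dynamic regret bounds while keeping $O(1)$ per-round computation and memory costs.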

详情
AI中文摘要

我们研究非平稳广义线性老虎机(GLBs),其中期望奖励通过非线性链接函数与未知时变参数建模。该框架涵盖广泛的奖励模型,包括线性、伯努利和二项式奖励。现有方法主要基于最大似然估计(MLE),使用滑动窗口、重启或折扣机制处理非平稳性。尽管这些方法在统计上实现了高效的遗憾保证,但它们通常需要在每轮重新访问过去观测,导致计算和内存成本随时间增长;此外,其中一些方法依赖于非凸投影步骤。本文提出DOMD-GLB,一种用于非平稳GLBs的新算法,利用折扣在线镜像下降(DOMD)进行参数估计,从而每轮仅产生$O(1)$的计算和内存成本。我们证明了在漂移环境下的动态遗憾界为$\tilde{O} \big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$,在分段平稳环境下为$\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big)$,其中$d$表示特征维度,$T$表示时间范围,$P_T$表示路径长度,$\Gamma_T$表示变化点数量,$c_\mu$是与链接函数相关的曲率参数,同时相比已有工作显著提高了计算效率。据我们所知,这是首个每轮计算和内存成本与时间无关的非平稳GLBs算法。

英文摘要

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including linear, Bernoulli, and binomial rewards. Existing approaches are predominantly based on maximum-likelihood estimation (MLE), using sliding-window, restart, or discounting mechanisms to handle nonstationarity. Although these methods achieve statistically efficient regret guarantees, they generally require revisiting past observations at every round, which leads to computation and memory costs that grow with time; moreover, several of them rely on a non-convex projection step. In this paper, we propose DOMD-GLB, a new algorithm for nonstationary GLBs that uses discounted online mirror descent (DOMD) for parameter estimation, thereby incurring only $O(1)$ computation and memory costs per round. We prove dynamic regret bounds of order $\tilde{O} \big(c_\mu^{-1/2} d^{3/4} P_T^{1/4} T^{3/4}\big)$ in drifting environments and $\tilde{O}\big(c_\mu^{-1/3} d^{2/3} \Gamma_T^{1/3} T^{2/3}\big)$ in piecewise-stationary environments, where $d$ denotes the feature dimension, $T$ the time horizon, $P_T$ the path length, $\Gamma_T$ the number of change points, and $c_\mu$ a curvature parameter associated with the link function, while substantially improving computational efficiency over prior work. To the best of our knowledge, this is the first algorithm for nonstationary GLBs with per-round computation and memory costs independent of time.

2605.25533 2026-05-26 eess.SP cs.IT math.IT math.ST stat.TH

Projected multi-reference alignment

投影多参考对齐

Amnon Balanov, Josh Katz, Tamir Bendory, Dan Edidin

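AI summary: For the projected multi-reference alignment model in the high-noise regime, the paper shows that the first three moments determine the dihedral orbit of a generic signal, and its experiments are consistent with a sample complexity scaling as $σ^6$, that is, the cube of the noise variance $σ^2$.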

详情
AI中文摘要

受结构生物学应用启发,我们研究了投影多参考对齐(MRA)模型,其中未知信号通过含噪样本观测,每个样本由随机循环移位后接固定投影生成。投影合并反射对称的索引对,从而丢弃方向信息。目标是恢复信号的二面体轨道。我们证明在高噪声条件下,投影观测的前三阶矩确定一个一般的二面体轨道。主要机制是在矩层面将投影MRA约简为二面体MRA的反射不变相位耦合结构。在适应投影的傅里叶-余弦坐标中,一阶矩确定均值分量,二阶矩确定傅里叶幅度,选定的三阶矩给出二面体双谱中出现的余弦相位耦合关系。这些关系导致从三阶矩出发的构造性恢复方案。我们通过有限样本实验补充总体理论,比较期望最大化(EM)、直接矩优化和直接傅里叶-余弦矩优化。结果表明,在高噪声条件下,EM和直接矩优化均与预测的三阶矩样本复杂度标度$n \gtrsim σ^6$一致,其中$n$是观测数,$σ^2$是噪声方差。

英文摘要

Motivated by structural biology applications, we study the projected multi-reference alignment (MRA) model, in which an unknown signal is observed through noisy samples, each generated by applying a random cyclic shift followed by a fixed projection. The projection merges reflection-symmetric index pairs, thereby discarding orientation information. The goal is to recover the dihedral orbit of the signal. We prove that in the high-noise regime, the first three moments of the projected observations determine a generic dihedral orbit. The main mechanism is a reduction, at the moment level, from projected MRA to the reflection-invariant phase-coupling structure of dihedral MRA. In Fourier-cosine coordinates adapted to the projection, the first moment determines the mean component, the second moment determines the Fourier magnitudes, and selected third moments yield the cosine phase-coupling relations appearing in the dihedral bispectrum. These relations lead to a constructive recovery scheme from moments up to order three. We complement the population theory with finite-sample experiments comparing expectation-maximization (EM), direct moment optimization, and direct Fourier-cosine moment optimization. The results show that, in the high-noise regime, both EM and direct moment optimization are consistent with the predicted third-moment sample-complexity scaling $n \gtrsim σ^6$, where $n$ is the number of observations and $σ^2$ is the noise variance.
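
The observation model can be sketched as follows. The exact form of the projection is not given in the abstract, so this example assumes it adds each entry to its mirror image, `x[i] + x[-i]` (which doubles the self-paired entries), and assumes the Gaussian noise is added after the projection. Both choices, and all names, are illustrative. The point is that a signal and its reflection become indistinguishable, so only the dihedral orbit can be recovered.

```python
import random

def cyclic_shift(x, s):
    """Shift x cyclically by s positions."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]

def project(x):
    """Assumed projection: add each entry to its mirror image x[-i]."""
    n = len(x)
    return [x[i] + x[(-i) % n] for i in range(n // 2 + 1)]

def observe(x, sigma, rng):
    """One sample: random cyclic shift, projection, then Gaussian noise (assumed order)."""
    shifted = cyclic_shift(x, rng.randrange(len(x)))
    return [v + rng.gauss(0.0, sigma) for v in project(shifted)]

signal = [1.0, 4.0, -2.0, 0.5, 3.0, -1.0]
reflected = [signal[(-i) % len(signal)] for i in range(len(signal))]
sample = observe(signal, sigma=1.0, rng=random.Random(0))
```

Under this assumed projection, every shifted copy of the reflected signal projects to the same vector as some shifted copy of the original, so the two generate identical observation distributions.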

2605.25526 2026-05-26 stat.ML cs.LG

From DPPs to $k$-DPPs: identifiability analysis via spectral decomposition

从DPP到$k$-DPP:通过谱分解的可识别性分析

Hideitsu Hino, Keisuke Yano

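AI summary: Studies the geometry of determinantal point processes (DPPs) and their conditioned version, the $k$-DPP, through spectral decomposition, showing how the identifiability of the spectral and eigenspace-rotation parameters changes for $k$-DPPs and characterizing the identifiability gap.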

详情
Comments
10 pages
AI中文摘要

我们通过谱分解$L=UΛU^{\top}$研究行列式点过程(DPP)的几何结构。谱$Λ$通过初等对称多项式控制基数分布,而特征空间方向$U$控制每个固定基数层内的条件分布。在基数$k$上取条件得到$k$-DPP,其可识别性结构发生根本变化:谱参数仅在一个公共尺度下可识别,特征空间旋转参数仅通过特征向量矩阵的平方子式可识别。我们通过三个显式不变性(尺度、符号相似性和特征空间旋转)以及一个维数计数定理精确刻画了可识别性差距,该定理表明当$\binom{N}{k}<N(N+1)/2$时存在额外的连续不可识别性。相比之下,对于完整DPP,不可识别性仅来自离散的符号相似性。

英文摘要

We study the geometry of determinantal point processes (DPPs) through the spectral decomposition $L=UΛU^{\top}$. The spectrum $Λ$ governs the cardinality distribution via elementary symmetric polynomials, while the eigenspace orientation $U$ governs the conditional law within each fixed-cardinality stratum. Conditioning on cardinality $k$ yields the $k$-DPP, for which the identifiability structure changes fundamentally: the spectral parameter becomes identifiable only up to a common scale, and the eigenspace rotation parameter is identifiable only through squared minors of the eigenvector matrix. We characterize the identifiability gap precisely, via three explicit invariances (scale, sign similarity, and eigenspace rotation) and a dimension-counting theorem showing the existence of additional continuous non-identifiability whenever $\binom{N}{k}<N(N+1)/2$. In contrast, for the full DPP the non-identifiability comes only from the discrete sign similarity.
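
Two of the invariances mentioned above can be checked numerically on a small example. The sketch below assumes the standard L-ensemble facts that $P(|Y| = k) = e_k(\lambda)/\det(I + L)$ and that a $k$-DPP assigns probability proportional to $\det(L_S)$ to each size-$k$ subset $S$. It enumerates subsets by brute force, so it is only meant for tiny $N$. The kernel, seed, and function names are arbitrary choices for illustration.

```python
from itertools import combinations
import numpy as np

def elementary_symmetric(values, k):
    """e_k(values) via the standard dynamic-programming recursion."""
    e = [1.0] + [0.0] * k
    for v in values:
        for j in range(k, 0, -1):
            e[j] += v * e[j - 1]
    return e[k]

def cardinality_pmf(L):
    """P(|Y| = k) = e_k(eigenvalues) / det(I + L) for an L-ensemble DPP."""
    lam = np.linalg.eigvalsh(L)
    norm = float(np.prod(1.0 + lam))
    return [elementary_symmetric(lam, k) / norm for k in range(len(lam) + 1)]

def k_dpp_probabilities(L, k):
    """P(S) proportional to det(L_S) over all subsets S of size k."""
    subsets = list(combinations(range(L.shape[0]), k))
    dets = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    return dict(zip(subsets, dets / dets.sum()))

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
L = A @ A.T                                       # random positive semidefinite kernel
D = np.diag([1.0, -1.0, 1.0, -1.0])               # a sign matrix
pmf = cardinality_pmf(L)
p_original = k_dpp_probabilities(L, k=2)
p_rescaled = k_dpp_probabilities(3.0 * L, k=2)    # common scale change
p_signed = k_dpp_probabilities(D @ L @ D, k=2)    # sign similarity
```

Rescaling $L$ multiplies every $\det(L_S)$ with $|S| = k$ by the same factor, so the $k$-DPP cannot see the overall scale, whereas the cardinality distribution of the full DPP does change with it.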

2605.25509 2026-05-26 stat.ML cs.LG

Guided Flow Matching for Forward and Inverse PDE Problems with Sparse Observations: Algorithm and Theory

面向稀疏观测的正反PDE问题的引导流匹配:算法与理论

Xifeng Zhang, Jin Zhao

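AI summary: Proposes FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients and solutions and uses guided sampling for forward simulation and inverse recovery from sparse observations, with error guarantees.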

详情
Comments
50 pages, 8 figures, 4 tables
AI中文摘要

从稀疏观测中重建PDE解是科学计算中的核心挑战。我们提出FM4PDE,一种流匹配生成框架,学习PDE系数(或初始状态)与解(或最终状态)的联合分布,从而在有限配对数据下实现正向模拟和逆问题恢复。在推理时,采样由一个复合损失引导,该损失强制与稀疏测量一致并减少PDE残差;我们支持确定性、随机性和混合采样器。我们为这些引导过程提供误差保证。对于确定性优化器,一个强制条件确保轨迹有界,且逐阶段收缩导致目标精度的对数复杂度。对于随机采样器,我们引入自适应引导并假设速度场的耗散性,以获得与噪声基底参数无关的均匀矩界。这导致多项式时间误差界,且一个匹配的下界表明恒定引导会引入不可避免的正偏差,从而激发自适应性。还提供了混合确定性-随机分析。在静态和时变基准PDE上的实验表明,与基于扩散的生成模型相比,具有竞争性的精度和更快的推理速度。

英文摘要

Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and solutions (or final states), enabling both forward simulation and inverse recovery with limited paired data. At inference, sampling is guided by a composite loss that enforces agreement with sparse measurements and reduces the PDE residual; we support deterministic, stochastic, and hybrid samplers. We provide error guarantees for these guided procedures. For the deterministic optimizer, a coercivity condition ensures trajectory boundedness and a phase-wise contraction yields logarithmic complexity in the target accuracy. For the stochastic sampler, we introduce adaptive guidance and assume dissipativity of the velocity field to obtain uniform moment bounds independent of the noise-floor parameter. This leads to polynomial-time error bounds, and a matching lower bound shows constant guidance induces an unavoidable positive bias, motivating adaptivity. A hybrid deterministic-stochastic analysis is also provided. Experiments on static and time-dependent benchmark PDEs demonstrate competitive accuracy and faster inference than diffusion-based generative models.

2605.25496 2026-05-26 stat.ME

Estimation of Directed Acyclic Graphs by Frequentist Model Averaging

通过频率模型平均估计有向无环图

Huihang Liu, Wenhui Li, Xinyu Zhang

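AI summary: For estimating directed acyclic graphs under graph structural uncertainty, the paper proposes an optimal model averaging method based on a penalized negative log-likelihood criterion and proves asymptotic optimality, weight consistency, and parameter consistency, the last even when all candidate models are misspecified.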

详情
Comments
33 pages, 5 figures
AI中文摘要

有向无环图是表示多元网络数据中有向依赖关系的基本工具,广泛应用于金融和经济网络建模。然而,在图结构不确定性下,准确且可解释的估计仍然具有挑战性。我们提出了一种针对有向无环高斯图的最优模型平均方法。通过一组图结构不同的候选模型,我们使用最小化惩罚负对数似然准则的权重对候选模型的估计进行平均。与现有方法相比,我们不仅建立了所提方法的渐近最优性、权重一致性和参数一致性,还明确刻画了不同候选模型如何影响收敛速度。此外,即使所有候选图模型都被错误指定,我们也证明了参数一致性。模拟研究和基于银行国际负债数据的实证分析结果显示了所提方法的潜力。

英文摘要

Directed acyclic graphs provide a fundamental tool for representing directed dependence structures in multivariate network data, and are widely used to model financial and economic networks. However, accurate and interpretable estimation remains challenging under graph structural uncertainty. We propose an optimal model averaging method for directed acyclic Gaussian graphs. With a set of candidate models varying by graph structures, we average estimates from candidate models using weights that minimize a penalized negative log-likelihood criterion. In contrast to existing approaches, we not only establish the asymptotic optimality, weight consistency, and parameter consistency of the proposed method, but also explicitly characterize how different candidate models affect the convergence rate. Moreover, we prove parameter consistency even when all candidate graph models are misspecified. Results from simulation studies and a real-data analysis on the banks' international liability data show the promise of the proposed method.

2605.25478 2026-05-26 stat.ME

Transcripts and Algebraic Distances in Time Series: Stochastic Properties and Nonparametric Dependence Tests

时间序列中的转录本与代数距离:随机性质与非参数依赖性检验

Christian H. Weiß, José M. Amigó

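AI summary: The paper defines transcripts between successive ordinal patterns in a time series and the associated algebraic distances (Cayley and Kendall edit distances), derives their stochastic properties, and builds nonparametric tests for serial dependence on these statistics; simulations show the new tests often have better power than earlier ordinal-pattern tests.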

详情
AI中文摘要

近年来,使用序模式分析单变量连续分布过程的依赖结构已变得流行。本研究更进一步,考虑从时间序列中连续序模式计算出的转录本。转录本构成了连续序模式之间的一种“差异”,因此自然与序模式之间的两种代数距离——Cayley和Kendall编辑距离相关。原始时间序列被转换为转录本序列或距离序列,并推导了它们的重要随机性质。结果表明,这些性质在不同类型的原始过程之间存在显著差异。这激发了基于转录本和编辑距离的各种统计量的开发,以研究原始过程的依赖结构。特别地,推导了这些统计量在序列独立原假设下的渐近分布,并用于实现序列依赖的非参数检验。模拟研究表明,这些新颖的依赖检验具有吸引人的功效特性,通常优于以前基于序模式的依赖检验。最后,一个实际数据示例说明了所提出方法在实践中的应用和解释。

英文摘要

The use of ordinal patterns (OPs) for analyzing the dependence structure of univariate and continuously distributed processes has gained popularity in recent years. This research goes one step further and considers the transcripts being computed from successive OPs in the time series. Transcripts constitute a kind of "difference" between successive OPs and thus naturally relate to two algebraic distances between OPs, the Cayley and Kendall edit distances. The original time series is transformed into a sequence of transcripts or distances, respectively, and important stochastic properties thereof are derived. It is shown that these properties differ substantially among different types of original processes. This motivates the development of various statistics based on transcripts and edit distances in order to investigate the dependence structure of the original process. In particular, the asymptotic distribution of these statistics under the null hypothesis of serial independence is derived, which is then used to implement nonparametric tests for serial dependence. A simulation study shows that these novel dependence tests have appealing power properties, often outperforming former OP-based dependence tests. A concluding real-world data example illustrates the application and interpretation of the proposed approaches in practice.
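
The objects named in the abstract can be sketched in a few lines. The conventions below are assumptions, since the abstract does not fix them: an ordinal pattern is the rank vector of a window, the transcript of two patterns `alpha` and `beta` is the permutation `tau` with `tau[alpha[i]] == beta[i]`, and each distance is measured between the transcript and the identity. The paper's definitions may use a different composition order.

```python
def ordinal_pattern(window):
    """Ranks of the window entries, 0 = smallest (ties broken by position)."""
    order = sorted(range(len(window)), key=lambda i: (window[i], i))
    ranks = [0] * len(window)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return tuple(ranks)

def transcript(alpha, beta):
    """Permutation tau with tau[alpha[i]] == beta[i] (assumed convention)."""
    tau = [0] * len(alpha)
    for a, b in zip(alpha, beta):
        tau[a] = b
    return tuple(tau)

def cayley_distance(perm):
    """Minimum number of transpositions to sort perm: n minus its cycle count."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return len(perm) - cycles

def kendall_distance(perm):
    """Minimum number of adjacent transpositions to sort perm: its inversion count."""
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))

series = [0.3, 1.2, 0.7, 2.5, 1.9, 0.1]
m = 3  # pattern length
patterns = [ordinal_pattern(series[t:t + m]) for t in range(len(series) - m + 1)]
transcripts = [transcript(a, b) for a, b in zip(patterns, patterns[1:])]
cayley = [cayley_distance(t) for t in transcripts]
kendall = [kendall_distance(t) for t in transcripts]
```

The two distances can disagree: the reversal `(2, 1, 0)` is one transposition away from the identity in the Cayley sense but three adjacent transpositions away in the Kendall sense.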

2605.25460 2026-05-26 stat.ML cs.LG

Mean-Shift PCA by Knockoff Mean

通过Knockoff均值的Mean-Shift PCA

Mengda Li, Zeng Li, Jianfeng Yao

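AI summary: Proposes removing mean-shift noise from PCA by deliberately adding a knockoff mean perturbation, proves with random matrix theory that the mean-shift spikes are spectrally separable from the eigenvalues of the original covariance, and designs a two-stage PCA algorithm.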

详情
Comments
ICML 2026
AI中文摘要

去除噪声是困难的,但添加噪声是容易的。在这项工作中,我们展示了如何通过故意引入knockoff均值扰动来消除PCA中的均值偏移噪声成分。标准PCA对样本均值的偏移高度敏感:来自偏移分布的一小部分样本可能导致主成分方向的大偏差。在高维情况下,现有的鲁棒PCA方法无法处理混合模型中固有的均值偏移污染结构。利用随机矩阵理论工具,我们证明了均值偏移尖峰在谱上与原始协方差的稳定特征值可分离。此外,原始特征空间渐近地不受污染影响,与混合权重无关。利用这种谱稳定性,我们提出了一种简单的两阶段PCA算法,通过添加knockoff均值,仅使用标准PCA操作来识别和移除均值偏移成分。

英文摘要

Removing noise is difficult, but adding noise is easy. In this work, we show how to eliminate mean-shift noisy components from PCA by deliberately introducing knockoff mean-shift perturbation. Standard PCA is highly sensitive to shifts in the sample mean: a small fraction of samples from a shifted distribution can cause large deviations in the leading principal components. In high-dimensional regimes, existing Robust PCA approaches cannot handle the mean-shift contamination structure inherent in the mixture model. Using tools from Random Matrix Theory, we prove that the mean-shift spikes are spectrally separable from the stable eigenvalues of the original covariance. Furthermore, the original eigenspace remains asymptotically invariant to the contamination, independent of the mixture weight. Exploiting this spectral stability, we propose a simple, two-stage PCA algorithm by adding knockoff mean that identifies and removes the mean-shift component using only standard PCA operations.
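
Why a mean shift shows up as a spike can be seen at the population level. If a fraction `eps` of samples has mean `mu` and the rest has mean zero, with a shared covariance `sigma`, the mixture covariance is `sigma + eps * (1 - eps) * mu mu^T`, a rank-one perturbation. The sketch below checks this in a small, low-dimensional example with identity covariance. It does not reproduce the paper's high-dimensional random-matrix analysis or its knockoff algorithm, and the numbers are arbitrary.

```python
import numpy as np

def mixture_covariance(sigma, mu, eps):
    """Population covariance of the mixture (1 - eps) N(0, sigma) + eps N(mu, sigma)."""
    return sigma + eps * (1.0 - eps) * np.outer(mu, mu)

p, eps = 5, 0.1
sigma = np.eye(p)
mu = np.array([3.0, 0.0, 0.0, 0.0, 4.0])
eigvals, eigvecs = np.linalg.eigh(mixture_covariance(sigma, mu, eps))
spike = eigvals[-1]               # largest eigenvalue
spike_direction = eigvecs[:, -1]  # aligned with mu, up to sign
```

Here the leading eigenvalue equals $1 + \varepsilon(1-\varepsilon)\lVert\mu\rVert^2$ and its eigenvector points along $\mu$, so even a small contaminated fraction can take over the first principal component when the shift is large.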

2605.25452 2026-05-26 stat.ME cs.LG stat.ML

Different Statistical Perspectives for Understanding Generalisation in Graph Neural Networks

理解图神经网络泛化能力的不同统计视角

Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar

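AI summary: Surveys theoretical progress on the generalisation of graph neural networks through three statistical frameworks: learning theory, infinite-parameter or infinite-graph asymptotics, and random graph models.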

详情
Comments
15 pages, 4 figures, submission for Special Issue in AStA Advances in Statistical Analysis
AI中文摘要

图神经网络(GNN)是目前用于图结构数据学习和预测的最流行方法,已部署在从社交网络分析到药物发现的各种领域。然而,对GNN性能的数学理解仍然有限。我们讨论了用于研究GNN统计泛化性的各种视角。我们识别出三个广泛的框架。第一种方法根植于学习理论,依赖于一致收敛界和特定GNN架构假设类的复杂度。该方法还建立在GNN的表达性之上,通常通过图同构测试的视角进行研究。第二个原则是通过分析无限多参数或无限图大小渐近下的GNN来简化神经架构。该方法使用高斯过程、神经正切核或图神经网络算子来近似GNN,从而可以研究训练后GNN的泛化性或稳定性。第三个框架在随机图模型(通常是上下文随机块模型)下研究GNN,并利用高维统计工具推导非渐近误差率。我们强调了一些关键的理论结果,并讨论了每个视角的一些局限性和开放研究问题。

英文摘要

Graph Neural Networks (GNN) are currently the most popular approach for learning and prediction on graph-structured data and are deployed in various fields, from social network analysis to drug discovery. However, there is limited mathematical understanding of the performance of GNNs. We discuss the various perspectives used to study statistical generalisation in GNNs. We identify three broad frameworks. The first approach, rooted in learning theory, relies on uniform convergence bounds and the complexity of the hypothesis class of specific GNN architectures. This approach also builds on the expressivity of GNNs, typically studied through the lens of graph isomorphism tests. The second principle is to simplify the neural architecture by analysing GNNs under the asymptotics of infinitely many parameters or infinite graph size. This approach approximates GNNs using Gaussian processes, neural tangent kernels or graphon neural network operators, which allow studying the generalisation or stability of trained GNNs. The third framework studies GNNs under random graph models, often the contextual stochastic block model, and derives non-asymptotic error rates using tools from high-dimensional statistics. We highlight some key theoretical results and discuss a few limitations and open research questions for each perspective.

2605.25383 2026-05-26 stat.ML cs.LG math.ST stat.TH

Learning manifold diffusion semigroups from graph transition matrices

从图转移矩阵学习流形扩散半群

Xiuyuan Cheng, Nan Wu

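AI summary: Proposes approximating the manifold heat semigroup directly by iterating the graph transition matrix, gives infinity-norm error bounds under low regularity assumptions, and attains a convergence rate comparable to graph-Laplacian methods.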

详情
AI中文摘要

我们考虑由从嵌入欧氏空间的未知流形中抽取的有限独立同分布样本构建的图扩散过程,其中图亲和度由环境高斯核矩阵定义。我们证明,在测试函数 $f$ 仅具有低正则性假设(包括 $f \in L^\infty$ 的情况)下,流形热半群 $Q_t = e^{t\Delta}$ 可以通过迭代图转移矩阵 $P$ 直接逼近。我们以 $\infty$-范数界定了 $\| P^n f - Q_t f \|$,其中算子对 $f$ 的作用被适当定义,并且对于扩散时间 $t$ 至 $O(1)$ 及更长,我们恢复了经典图拉普拉斯逐点速率 $O(N^{-2/(d+6)})$(忽略对数因子)。该速率适用于样本内误差以及样本外泛化,其中新点处 $Q_t f$ 的估计量通过核卷积定义。为了处理流形上的非均匀采样密度,我们引入了图转移矩阵的右归一化;在采样密度 $p$ 为 $C^3$ 且远离零的假设下,相同的收敛速率成立。我们在模拟数据上数值验证了所提估计器的性能。

英文摘要

We consider graph diffusion processes constructed from finite i.i.d. samples drawn from an unknown manifold embedded in ambient Euclidean space, where the graph affinity is defined by an ambient Gaussian kernel matrix. We show that the manifold heat semigroup $Q_t = e^{t\Delta}$ can be approximated directly by iterating the graph transition matrix $P$, under only low regularity assumptions on the test function $f$, including the case $f \in L^\infty$. We bound $\| P^n f - Q_t f \|$ in $\infty$-norm, with the operator application to $f$ properly defined, and we recover the classical graph-Laplacian pointwise rate $O(N^{-2/(d+6)})$ up to logarithmic factors, for diffusion times $t$ up to $O(1)$ and longer. The rate holds for in-sample error as well as out-of-sample generalization, where the estimator of $Q_t f$ at a new point is defined via kernel convolution. To handle non-uniform sampling densities on the manifold, we introduce a right-normalization of the graph transition matrix; under the assumption that the sampling density $p$ is $C^3$ and bounded away from zero, the same convergence rates hold. We numerically demonstrate the performance of the proposed estimator on simulated data.
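A toy check of the idea in the abstract above can be run on the unit circle, where the heat semigroup is known in closed form: $Q_t \cos\theta = e^{-t}\cos\theta$. The sketch assumes uniform sampling, the kernel $\exp(-\|x-y\|^2/(4\epsilon))$, and the identification $t = n\epsilon$; this bandwidth convention and all numerical values are assumptions made for illustration and may differ from the paper's normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps, n_steps = 2000, 0.05, 10        # sample size, bandwidth, iterations (illustrative)
theta = rng.uniform(0.0, 2.0 * np.pi, size=N)
X = np.column_stack([np.cos(theta), np.sin(theta)])   # unit circle embedded in R^2

# Ambient squared distances; points have unit norm, so |x - y|^2 = 2 - 2 <x, y>.
sq_dist = np.clip(2.0 - 2.0 * X @ X.T, 0.0, None)
K = np.exp(-sq_dist / (4.0 * eps))      # ambient Gaussian affinity (assumed convention)
P = K / K.sum(axis=1, keepdims=True)    # row-stochastic graph transition matrix

f = np.cos(theta)                       # Laplacian eigenfunction on the circle, eigenvalue -1
Pn_f = f.copy()
for _ in range(n_steps):                # apply P repeatedly instead of forming P^n
    Pn_f = P @ Pn_f

t = n_steps * eps
Qt_f = np.exp(-t) * f                   # exact heat semigroup applied to cos(theta)
sup_error = np.max(np.abs(Pn_f - Qt_f)) # in-sample infinity-norm error
```

With a non-uniform sampling density the plain row normalization would pick up a drift term, which is the situation the right-normalization in the abstract is meant to handle; that correction is not implemented here.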

2605.25380 2026-05-26 stat.ME

Rank-Based Tests for Mutual Independence of High-Dimensional Random Vectors via $L_q$ Norm

基于$L_q$范数的高维随机向量相互独立的秩检验

Ping Zhao, Hongfei Wang, Long Feng

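AI summary: For testing mutual independence among the components of a high-dimensional random vector, builds on the rank-based max-sum framework to introduce finite-$L_q$ power-sum statistics and combines the $L_2, L_4, L_6$ and $L_\infty$ p-values through a Cauchy rule, yielding an $L_{2,4,6,\infty}$ procedure that is highly robust to the sparsity of the alternative.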

详情
AI中文摘要

我们考虑检验高维随机向量分量间相互独立的问题。基于秩的max-sum框架,我们在三类秩相关(简单线性秩统计量、非退化秩U统计量和退化秩U统计量)下引入了固定的有限$L_q$幂和统计量。所提出的统计量在$L_2$统计量对密集备选假设的敏感性和$L_\infty$统计量对稀疏备选假设的敏感性之间进行插值。我们建立了任意固定有限$L_q$块与对应$L_\infty$统计量之间的渐近独立性,并通过Cauchy规则组合$L_2, L_4, L_6$和$L_\infty$的p值。数值研究表明,所得的$L_{2,4,6,\infty}$程序对备选假设的稀疏性高度鲁棒,并且在所考虑的设计中具有强大的经验功效。

英文摘要

We consider the problem of testing mutual independence among the components of a high-dimensional random vector. Building on the rank-based max-sum framework, we introduce fixed finite-$L_q$ power-sum statistics under three general classes of rank-based correlations: simple linear rank statistics, non-degenerate rank-based U-statistics and degenerate rank-based U-statistics. The proposed statistics interpolate between the dense-alternative sensitivity of the $L_2$ statistic and the sparse-alternative sensitivity of the $L_\infty$ statistic. We establish the asymptotic independence between any fixed finite-$L_q$ block and the corresponding $L_\infty$ statistic, and combine $L_2,L_4,L_6$ and $L_\infty$ p-values through a Cauchy rule. Numerical studies show that the resulting $L_{2,4,6,\infty}$ procedure is highly robust to the sparsity of the alternative and has strong empirical power across the considered designs.
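The Cauchy rule mentioned in the abstract above can be sketched with the standard Cauchy combination formula, $T = \sum_i w_i \tan\{(0.5 - p_i)\pi\}$ with combined p-value $0.5 - \arctan(T)/\pi$. Equal weights are assumed here; the weights used in the paper are not stated in the abstract, and the example p-values are made up.

```python
import numpy as np

def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule (equal weights by default)."""
    p = np.asarray(p_values, dtype=float)
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)
    else:
        w = np.asarray(weights, dtype=float)
    stat = np.sum(w * np.tan((0.5 - p) * np.pi))   # heavy-tailed transform of each p-value
    return 0.5 - np.arctan(stat) / np.pi           # Cauchy tail probability of the statistic

# Hypothetical p-values from the L_2, L_4, L_6 and L_inf tests, in that order.
p_combined = cauchy_combine([0.001, 0.6, 0.7, 0.8])
```

A single very small p-value dominates the combined statistic, which is why this kind of rule is often used to pool tests that are sensitive to different alternatives.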

2605.25359 2026-05-26 math.ST stat.TH

A Quasi Maximum Likelihood Estimation Method for Bergomi-Type Volatility Models

Bergomi型波动率模型的拟极大似然估计方法

Masaaki Fukasawa, Haruki Tomita

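AI summary: For Bergomi-type stochastic volatility models, proposes a quasi maximum likelihood method that estimates the kernel parameters from high-frequency time-series observations of option prices, and proves consistency and asymptotic mixed normality of the estimator.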

详情
Comments
27 pages, 3 figures
AI中文摘要

我们提出了一种针对带有参数化核的Bergomi型随机波动率模型的拟极大似然估计方法,重点是从期权价格的高频时间序列观测中估计核参数。我们首先证明了累积远期方差(可从期权价格重建)在Bergomi型模型下解一个由一维布朗运动驱动的无穷维随机微分方程。为了克服这种退化性,我们基于Euler-Maruyama近似引入了一个非退化的代理似然,并通过相关的估计函数定义了一个估计量。我们在一个正则核类下建立了所提估计量的一致性和渐近混合正态性。模拟研究和SPXW期权数据的实证应用说明了该方法的有限样本性能以及该方法的实际相关性。

英文摘要

We propose a quasi maximum likelihood estimation method for Bergomi-type stochastic volatility models with parametrized kernels, focusing on the estimation of the kernel parameters from high-frequency time-series observations of option prices. We first show that the cumulative forward variance, which can be reconstructed from option prices, solves an infinite-dimensional stochastic differential equation driven by a one-dimensional Brownian motion under the Bergomi-type model. To overcome this degeneracy, we introduce a nondegenerate proxy likelihood based on the Euler-Maruyama approximation and define an estimator through the associated estimating function. We establish consistency and asymptotic mixed normality of the proposed estimator under a regular class of kernels. Simulation studies and an empirical application to SPXW option data illustrate the finite-sample performance of the method and the practical relevance of the approach.

2605.25313 2026-05-26 cs.LG cs.AI cs.RO stat.ML

UWM-JEPA: Predictive World Models That Imagine in Belief Space

UWM-JEPA:在信念空间中进行想象的世界预测模型

Santosh Kumar Radha, Oktay Goktas

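AI summary: For partially observed environments, proposes UWM-JEPA, which uses a density-matrix latent and a unitary predictor to preserve the joint-state spectrum in belief space, retaining the represented uncertainty under blind rollout and clearly outperforming vector-latent baselines.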

详情
Comments
14 pages, 6 figures, 7 tables. Code and data: https://github.com/santoshkumarradha/uwm-jepa
AI中文摘要

部分可观测环境下的世界模型必须想象多个兼容的隐藏未来,并在反事实动作下引导它们。联合嵌入预测架构(JEPAs)在潜在空间中实现这一点,但向量值潜变量没有内部结构来承载盲推演过程中隐藏连续性的信念。我们引入了酉世界模型JEPA(UWM-JEPA),这是一种JEPA世界模型,具有在联合系统-环境空间上的密度矩阵潜变量和学习的酉预测器。该结构在推演过程中精确保持联合状态谱,因此预测器本身不会耗散表示的不确定性。在一个需要根据给定动作序列进行五步前向模拟且目标观测被掩蔽的隐藏速度指示任务中,UWM-JEPA达到0.77的准确率,并且随着动作被扰动而单调下降;而参数匹配的LSTM-JEPA在相同的反事实目标目标和动作头训练下,在所有动作条件下都崩溃为多数类准确率(0.53)。在盲推演下,UWM-JEPA在短时域上损失不到十个点的探针R^2,而向量潜变量基线损失四十一个和六十八个点;两者在保留的上下文探针上表现相当,表明差异在于预测器而非编码器。动作敏感性本身需要针对反事实而非教师强制目标进行训练,这一发现适用于酉参数化之外。对于JEPA世界模型在部分可观测性下进行想象,潜变量几何和预测器动力学至关重要,而不仅仅是冻结的上下文编码能力。

英文摘要

World models for partially observed environments must imagine multiple compatible hidden futures and steer between them under counterfactual actions. Joint Embedding Predictive Architectures (JEPAs) do this in latent space, but a vector-valued latent has no internal structure for carrying the belief over hidden continuations through blind rollout. We introduce the Unitary World Model JEPA (UWM-JEPA), a JEPA world model with a density-matrix latent on a joint system-environment space and a learned unitary predictor. The construction preserves the joint-state spectrum exactly during rollout, so the predictor itself cannot dissipate the represented uncertainty. On a hidden-velocity indicator task requiring five-step forward simulation under a given action sequence with the target observation masked, UWM-JEPA reaches 0.77 accuracy and degrades monotonically as actions are perturbed; a parameter-matched LSTM-JEPA trained under the same counterfactual-target objective and action head collapses to majority-class accuracy (0.53) under every action condition. Under blind rollout, UWM-JEPA loses fewer than ten points of probe R^2 at short horizons while vector-latent baselines lose forty-one and sixty-eight; both nevertheless tie on a held-out context probe, locating the separation in the predictor rather than the encoder. Action sensitivity itself requires training against counterfactual rather than teacher-forced targets, a finding that applies beyond the unitary parameterisation. For JEPA world models to imagine under partial observability, latent geometry and predictor dynamics matter, not frozen context-encoding capacity alone.
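The spectrum-preservation property claimed in the abstract above follows from the update $\rho \mapsto U \rho U^\dagger$ being a similarity transform. A minimal numerical sketch, in which a random unitary stands in for the learned predictor and the dimension is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6                                              # illustrative latent dimension

# Random density matrix: positive semidefinite with unit trace.
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho = A @ A.conj().T
rho = rho / np.trace(rho).real

# Random unitary (stand-in for a learned unitary predictor): Q factor of a complex matrix.
B = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
U, _ = np.linalg.qr(B)

rho_next = U @ rho @ U.conj().T                    # one rollout step in the latent space

spec_before = np.linalg.eigvalsh(rho)              # eigenvalues, ascending
spec_after = np.linalg.eigvalsh(rho_next)

def von_neumann_entropy(spec):
    """Entropy of a spectrum, ignoring numerically zero eigenvalues."""
    s = spec[spec > 1e-12]
    return float(-np.sum(s * np.log(s)))
```

Because the eigenvalues are unchanged, any spectral measure of uncertainty such as the entropy above is constant along the rollout, whatever unitary is applied.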

2605.25290 2026-05-26 stat.ML cs.LG

Choosing Online Experiment Designs under Interference in Ads, Recommendations, and Member-Experience Systems

广告、推荐和会员体验系统中存在干扰时的在线实验设计选择

Prashant Shekhar, Caroline Howard

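AI summary: For ads, recommendation, and member-experience systems where the interference mechanism is unknown, proposes a robust design-selection framework that compares six implementable designs by worst-case planning risk, with a geometry-aware guarantee and a finite-catalog approximation theorem.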

详情
AI中文摘要

广告、推荐和会员体验系统中的在线实验通常是在主导干扰机制已知之前规划的。处理效应可能通过预算、库存、生产者曝光、图溢出或时间结转传播,使得随机化设计本身成为一个统计决策。我们将此问题形式化为在不确定曝光机制下的鲁棒设计选择。给定一个包含六种可实施设计的有限目录,选择器通过模糊集上的最坏情况规划风险比较每种设计。风险结合了曝光偏差、分配单元方差、最小可检测效应、污染或结转、操作成本和估计量不匹配。在理论证明方面,本文开发了一种几何感知保证,指出设计偏差受限于到发布曝光分布的Wasserstein距离,并且该惩罚在Lipschitz曝光响应下是极小极大紧的。我们还证明了有限目录近似和具有超额风险控制的鲁棒选择器定理、在分离条件下的精确恢复,以及当风险曲面平坦时的认证候选列表。实证上,同一选择器在来自公共数据集的样本上给出不同的推荐。它在Criteo广告上选择用户随机化,无量纲鲁棒风险为1.295;在Open Bandit-bts/men上选择切换设计,风险为2.105;在KuaiRand上选择聚类随机化,风险为2.240。Open Bandit案例强调了已知但不均匀的日志记录支持,倾向性从0.00006到0.594,IPS有效样本份额为5.17%。总体而言,本文贡献了一个基于机制鲁棒设计决策的干扰感知实验设计框架,输出要么是合理的设计选择,要么是不确定性候选列表。

英文摘要

Online experiments in ads, recommendation, and member-experience systems are often planned before the dominant interference mechanism is known. A treatment may propagate through budgets, inventory, producer exposure, graph spillovers, or temporal carryover, making the randomization design itself a statistical decision. We formulate this problem as robust design selection over uncertain exposure mechanisms. Given a finite catalog of six implementable designs, the selector compares each design by worst-case planning risk over an ambiguity set. The risk combines exposure bias, assignment-unit variance, minimum detectable effect, contamination or carryover, operational cost, and estimand mismatch. For theoretical justification, the paper develops a geometry-aware guarantee, stating that design bias is bounded by Wasserstein distance to the launch exposure distribution, and this penalty is minimax tight under Lipschitz exposure response. We also prove finite-catalog approximation and a robust selector theorem with excess-risk control, exact recovery under separation, and certified shortlists when the risk surface is flat. Empirically, the same selector gives different recommendations across samples from public datasets. It selects user-randomization on Criteo ads with dimensionless robust risk 1.295, switchbacks on Open Bandit-bts/men with risk 2.105, and cluster-randomization on KuaiRand with risk 2.240. The Open Bandit case stresses known but uneven logging support, with propensities from 0.00006 to 0.594 and a 5.17% IPS effective-sample share. Overall, the paper contributes an interference-aware experiment design framework based on mechanism-robust design decisions, where the output is either a justified design choice or an uncertainty shortlist.
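The selection rule described in the abstract above (compare designs by worst-case risk over an ambiguity set, then return either one design or a shortlist) can be sketched as a small minimax computation. The risk table, the three-mechanism ambiguity set, and the shortlist tolerance below are made-up placeholders, not values or rules from the paper.

```python
import numpy as np

# Three of the catalogued design types named in the abstract (the full catalog has six).
designs = ["user-randomization", "cluster-randomization", "switchback"]
mechanisms = ["budget", "graph spillover", "temporal carryover"]   # hypothetical ambiguity set

# risk[i, j]: hypothetical planning risk of design i if mechanism j dominates.
risk = np.array([[1.1, 2.6, 1.9],
                 [1.8, 1.5, 2.2],
                 [2.0, 2.1, 1.4]])

worst_case = risk.max(axis=1)            # robust risk of each design
best = int(np.argmin(worst_case))        # minimax choice

# When several designs are nearly tied, report a shortlist instead of a single winner.
tol = 0.15                               # hypothetical flatness tolerance
shortlist = [d for d, r in zip(designs, worst_case) if r <= worst_case[best] + tol]
```

In this toy table the minimax choice is the switchback design, but cluster randomization is within the tolerance, so both would be reported.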

2605.25272 2026-05-26 cs.AI cs.CY stat.AP

AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems

AI 制图:绘制 AI 基准生态系统的潜在景观

Michael Hardy, Anka Reuel, Lijin Zhang, Jodi M. Casabianca, Sang Truong, Yash Dave, Hansol Lee, Benjamin Domingue, Sanmi Koyejo

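AI summary: To address measurement noise in leaderboard scores, proposes a framework based on Confirmatory Factor Analysis and Generalizability Theory that decomposes the sources of ranking variance, reveals relationships between benchmarks, local dependence, and the influence of metadata, and compares the reliability of manifest-score and latent scaling laws.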

详情
AI中文摘要

虽然总体排行榜分数驱动着 AI 发展,但它们包含大量测量噪声,其来源和幅度尚未量化,使得排名何时反映真实能力差异何时反映评估伪像尚不明确。我们引入了一个用于测量 AI 基准生态系统中潜在景观的框架。将验证性因子分析(CFA)和概化理论应用于 Open LLM Leaderboard 上的 4000 多个模型,我们分解了排名方差的来源并确定:(1)当前报告实践中假设的结构低估了基准之间关系的强度;(2)排行榜项目之间存在局部依赖性的证据,这削弱了在当前评分系统下将基准用作测量工具的有效性;(3)在此背景下,贡献者元数据解释了比架构或部署类别更多的排名相关方差(约 9%);(4) 显式分数的“缩放律”斜率可靠性较低($R_β=0.53$);相比之下,潜在通用因子大小斜率在生态系统控制下高度稳定($R_g=0.97$)。我们能够提供对基准动态的独特见解,例如哪些基准是 LLM 规模的函数,哪些可能受到后训练实践的相反影响。我们提供了可操作的诊断方法,以确定如何信任基准排名以及如何改进基准设计。

英文摘要

While aggregate leaderboard scores drive AI development, they contain substantial measurement noise whose sources and magnitudes remain unquantified, making it unclear when rankings reflect genuine capability differences versus evaluation artifacts. We introduce a framework for measuring the latent landscape in AI benchmark ecosystems. Applying Confirmatory Factor Analysis (CFA) and Generalizability Theory to 4,000+ models from the Open LLM Leaderboard, we decompose sources of ranking variance and establish: (1) structures assumed in current reporting practice underestimate the strength of relationships between benchmarks; (2) evidence of local dependence among leaderboard items, undermining uses of benchmarks as measurement instruments under current scoring systems; (3) contributor metadata explains more rank-relevant variance ($\approx9\%$) than architecture or deployment categories in this context; (4) a manifest-score "scaling law" slope has low reliability ($R_β=0.53$); by contrast, the latent general-factor size slope is highly stable across ecosystem controls ($R_g=0.97$). We are able to provide unique insights into benchmark dynamics, such as which benchmarks are a function of LLM size and which can be oppositely impacted by post-training practices. We provide actionable diagnostics to determine how benchmark rankings can be trusted and how benchmark design can be improved.

2605.25261 2026-05-26 stat.AP

A Statistical Physics View of the S&P 500: Pairwise Interactions and Time-Varying Dynamics

S&P 500的统计物理学视角:成对相互作用与时变动力学

Sebin Oh, Marta C. González, Ziqi Wang

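AI summary: Applies static and kinetic Ising models to daily binary up/down movements of S&P 500 stocks, revealing a long-run maximum-entropy interaction network and short-horizon time-varying dynamics.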

详情
AI中文摘要

我们使用互补的静态和动力学伊辛模型,应用于1996年至2026年S&P 500固定股票池的日度二元开盘-收盘变动数据。静态成对模型提供了低阶依赖的长期最大熵总结,揭示了一个按行业组织的相互作用网络,具有适度的小世界结构,行业内耦合强度约为行业间耦合强度的2.8倍,其中房地产和能源行业尤为一致。动力学模型引入了平滑时变外场、自记忆和定向滞后耦合来描述次日动力学。它揭示了围绕三个主要市场范围扰动——互联网泡沫破裂、全球金融危机和COVID-19事件——的缓慢场机制转变。自记忆通常较弱,定向耦合结构比静态网络更少行业集中且更不对称,但仍能再现总体市场运动的广泛演变。综合来看,这两个互补模型刻画了长期市场组织和短期跨股票动力学,提供了S&P 500中相互作用结构和时变行为的紧凑统计物理学视角。

英文摘要

We analyze a fixed panel of S&P 500 stocks from 1996 to 2026 using complementary static and kinetic Ising models applied to daily binary open-to-close movements. The static pairwise model provides a long-run maximum-entropy summary of low-order dependence and reveals a sectorally organized interaction network with modest small-world structure and within-sector couplings about 2.8 times stronger than between-sector couplings, with especially coherent real estate and energy sectors. The kinetic model incorporates smooth time-varying external fields, self-memory, and directed lagged couplings to describe next-day dynamics. It reveals slow field-regime shifts around three major market-wide perturbations: the dot-com bust, the global financial crisis, and the COVID-19 episode. Self-memory is generally weak, and the directed coupling structure is much less sector-concentrated and more asymmetric than the static network, while still reproducing the broad evolution of aggregate market movement. Taken together, the two complementary models characterize both persistent market organization and short-horizon cross-stock dynamics, providing a compact statistical physics view of interaction structure and time-varying behavior in the S&P 500.

2605.25255 2026-05-26 math.OC stat.ML

Boosted Stochastic Frank-Wolfe for Constrained Nonconvex Optimization

增强型随机Frank-Wolfe算法用于约束非凸优化

Navil Nandhan, Abbas Khademi, Antonio Silveti-Falls

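AI summary: Proposes a step size strategy that does not depend on the Lipschitz constant of the gradient, extending boosted Frank-Wolfe to the stochastic setting; proves that convergence rates are retained when it is combined with a range of gradient estimators, and gives the first convergence rates for nonconvex and quasar-convex objectives.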

详情
AI中文摘要

增强型Frank-Wolfe算法通过更好地对齐更新方向与负梯度来加速经典Frank-Wolfe算法。然而,其分析仅限于确定性凸问题,且步长需要线搜索或知道梯度的Lipschitz常数。我们开发了一种不依赖梯度Lipschitz常数的新型步长策略,从而将增强型Frank-Wolfe算法扩展到随机设置。我们证明,这种步长策略的增强可以与许多现代梯度估计器(包括SAGA、L-SVRG、SAG、重球动量和零阶估计器等)结合,同时保持普通随机Frank-Wolfe的最坏情况收敛率。我们的分析还首次给出了增强型Frank-Wolfe在非凸和拟凸目标上的收敛率,这些结果即使对于确定性问题也是新的。在稀疏逻辑回归和量子过程层析成像上的实验表明,与未增强的基线相比,随机增强型Frank-Wolfe在每次梯度oracle调用(以及挂钟时间)上实现了更快的收敛。

英文摘要

The boosted Frank-Wolfe algorithm accelerates the classical Frank-Wolfe algorithm by better aligning the update direction with the negative gradient. Its analysis, however, has been limited to deterministic convex problems, with step sizes that require either line search or knowledge of the Lipschitz constant of the gradient. We develop a novel step size strategy that does not depend on the Lipschitz constant of the gradient, which allows us to extend the boosted Frank-Wolfe algorithm to the stochastic setting. We prove that boosting with this step size strategy can be combined with many modern gradient estimators, including SAGA, L-SVRG, SAG, Heavy Ball momentum, and zeroth-order estimators, among others, while retaining the worst-case convergence rates of ordinary stochastic Frank-Wolfe. Our analysis also yields the first convergence rates for boosted Frank-Wolfe on nonconvex and quasar-convex objectives, results which are new even for deterministic problems. Experiments on sparse logistic regression and quantum process tomography show that stochastic boosted Frank-Wolfe achieves faster convergence per gradient oracle call (and in wall-clock time) compared to the non-boosted baseline.
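For orientation, the classical (non-boosted, deterministic) Frank-Wolfe baseline that the abstract above builds on can be sketched as follows, using the probability simplex as constraint set and the standard open-loop step size $2/(k+2)$, which needs no Lipschitz constant. The boosted direction and the paper's own step size rule are not reproduced; the objective and target vector are illustrative.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=2000):
    """Classical Frank-Wolfe over the probability simplex."""
    x = x0.copy()
    for k in range(n_iters):
        g = grad(x)
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0          # linear minimization oracle: best simplex vertex
        gamma = 2.0 / (k + 2.0)        # open-loop step size, no Lipschitz constant needed
        x = (1.0 - gamma) * x + gamma * s   # convex combination stays feasible
    return x

# Minimise 0.5 * ||x - b||^2 over the simplex; b lies in the simplex, so b is the minimiser.
b = np.array([0.5, 0.3, 0.2])
x_hat = frank_wolfe_simplex(lambda x: x - b, np.array([1.0, 0.0, 0.0]))
```

Every iterate is a convex combination of vertices, so no projection is ever needed; boosting, as described in the abstract, replaces the single vertex direction with one better aligned with the negative gradient.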

2605.23650 2026-05-26 stat.ML cs.LG

Learning Kernel-Based MDPs from Episodic Preferential Feedback

从片段偏好反馈中学习基于核的MDP

Nikola Pavlovic, Sattar Vakili, Qing Zhao

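AI summary: Studies preference-only learning in episodic kernel MDPs, proposing preference-based value estimation and confidence sets, and proves sublinear regret bounds.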

详情
AI中文摘要

人类反馈通常以偏好形式而非校准的数值奖励出现,这推动了从偏好反馈中强化学习(也称为从人类反馈中强化学习,RLHF)。我们对片段核MDP中的纯偏好学习进行了严格的理论研究。在每个片段中,学习器从共同起始状态部署两个策略,并接收一个二进制标签,指示哪个轨迹更受偏好,该标签通过Bradley-Terry-Luce链接函数基于累积(未观测)奖励的差异建模。在奖励和转移函数基于核的假设下(这是最适宜理论分析的一般模型之一),我们开发了基于偏好的价值估计和专门针对片段结束比较的置信集。我们证明了遗憾界以高概率随片段数亚线性增长,这意味着学习策略的价值收敛到最优策略的价值。

英文摘要

Human feedback often arrives as preferences rather than calibrated numeric rewards, motivating reinforcement learning from preferential feedback, also referred to as reinforcement learning from human feedback (RLHF). We present a rigorous theoretical study of preference-only learning in episodic kernel MDPs. In each episode, the learner deploys two policies from a common start state and receives a single binary label indicating which trajectory is preferred, modeled by a Bradley-Terry-Luce link on the difference of cumulative (unobserved) rewards. Under kernel-based assumptions on the reward and transition functions (one of the most general models amenable to theoretical analysis), we develop preference-based value estimation and confidence sets tailored to end-of-episode comparisons. We prove high-probability regret bounds that scale sublinearly in the number of episodes, implying that the value of the learned policy converges to that of the optimal policy.
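The feedback model in the abstract above can be written in a few lines: under a Bradley-Terry-Luce link, the probability that the first trajectory is preferred is the logistic function of the difference in cumulative rewards. The function names and example rewards are illustrative, and in the paper's setting the rewards themselves are never observed by the learner.

```python
import numpy as np

def btl_preference_prob(rewards_a, rewards_b):
    """P(trajectory a preferred over b) under a Bradley-Terry-Luce (logistic) link."""
    diff = np.sum(rewards_a) - np.sum(rewards_b)   # difference of cumulative rewards
    return 1.0 / (1.0 + np.exp(-diff))

def sample_label(rewards_a, rewards_b, rng):
    """Single binary label per episode: 1 if a is preferred, 0 otherwise."""
    return int(rng.random() < btl_preference_prob(rewards_a, rewards_b))

rng = np.random.default_rng(0)
label = sample_label([0.2, 0.5, 0.1], [0.1, 0.1, 0.3], rng)   # hypothetical per-step rewards
```

Only `label` would reach the learner, which is why value estimation has to be built from end-of-episode comparisons rather than from per-step rewards.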

2605.22795 2026-05-26 stat.ML cs.AI cs.LG math.ST stat.TH

Finite-Particle Convergence Rates for Conservative and Non-Conservative Drifting Models

保守与非保守漂移模型的有限粒子收敛速率

Krishnakumar Balasubramanian

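AI summary: For one-step generative modeling, proposes a conservative drifting method that replaces the displacement-based velocity with a KDE-gradient velocity and proves continuous-time finite-particle convergence bounds; also analyzes the corresponding rate for the non-conservative method with Laplace kernel.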

详情
AI中文摘要

我们提出并分析了一种用于一步生成建模的保守漂移方法。该方法将原始的基于位移的漂移速度替换为核密度估计(KDE)梯度速度,即核平滑数据得分与核平滑模型得分之差。该速度为梯度场,解决了通用基于位移的漂移场中发现的非保守性问题。我们证明了在$\mathbb{R}^d$上保守方法的连续时间有限粒子收敛界:联合熵恒等式给出了经验Stein漂移、KDE的平滑Fisher差异以及中心速度平方的界。主要的有限粒子校正是倒数KDE自相互作用项,我们给出了确定性和高概率的局部占据条件,在此条件下该项可控。我们保持求积常数显式并追踪其可能的带宽依赖性:在额外的$h$均匀求积正则条件下,根残差速度率为$N^{-1/(d+4)}$;而更一般的增长条件产生优化根速率$N^{-(2-β)/(2(d+4-β))}$,其中$0\le β<2$。我们还分析了使用Laplace核的非保守漂移方法,对应于Deng等人2026年(arXiv:2602.04770)提出的原始基于位移的速度。对于该方法,一个尖锐的伴随核将速度分解为尖锐得分不匹配的正标量预处理加上Laplace尺度不匹配残差,产生类似的有限粒子速率,但带有一个不可避免的残差项。最后,我们解释了如何通过显式漂移大小$η$将连续时间残差速度界转化为一步生成保证。

英文摘要

We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of the kernel-smoothed data score and the kernel-smoothed model score. This velocity is a gradient field, addressing the non-conservatism issue identified for general displacement-based drifting fields. We prove continuous-time finite-particle convergence bounds for the conservative method on $\mathbb{R}^d$: a joint-entropy identity yields bounds for the empirical Stein drift, the smoothed Fisher discrepancy of the KDE, and the squared center velocity. The main finite-particle correction is a reciprocal-KDE self-interaction term, and we give deterministic and high-probability local-occupancy conditions under which this term is controlled. We keep the quadrature constants explicit and track their possible bandwidth dependence: the root residual-velocity rate $N^{-1/(d+4)}$ holds under an additional $h$-uniform quadrature regularity condition, while a more general growth condition yields the optimized root rate $N^{-(2-β)/(2(d+4-β))}$, where $0\le β<2$. We also analyze the non-conservative drifting method with Laplace kernel, corresponding to the original displacement-based velocity proposed in Deng et al., 2026 (arXiv:2602.04770). For this method, a sharp companion kernel decomposes the velocity into a positive scalar preconditioning of a sharp-score mismatch plus a Laplace scale-mismatch residual, producing an analogous finite-particle rate with an unavoidable residual term. Finally, we explain how the continuous-time residual-velocity bounds translate into one-step generation guarantees through the explicit drift size $η$.
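The KDE-gradient velocity described in the abstract above (kernel-smoothed data score minus kernel-smoothed model score) can be sketched as below. A Gaussian smoothing kernel with bandwidth `h` is assumed, for which the score of the KDE is a softmax-weighted average of displacements divided by $h^2$; the kernel choice, bandwidth, and toy data are assumptions and may differ from the paper's construction.

```python
import numpy as np

def kde_score(x, particles, h):
    """Gradient of the log Gaussian KDE built on `particles`, evaluated at the rows of x."""
    diff = particles[None, :, :] - x[:, None, :]           # shape (n_eval, n_particles, d)
    logw = -np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2)
    logw -= logw.max(axis=1, keepdims=True)                # stabilise the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("ij,ijd->id", w, diff) / h ** 2       # weighted mean displacement / h^2

def conservative_velocity(x, data, model, h):
    """Difference of kernel-smoothed data score and kernel-smoothed model score."""
    return kde_score(x, data, h) - kde_score(x, model, h)

rng = np.random.default_rng(0)
model = rng.standard_normal((300, 2))          # current model particles (toy)
data = model + np.array([1.0, 0.0])            # toy data: the same cloud shifted to the right
v = conservative_velocity(model, data, model, h=0.5)
```

In this toy setup the velocity at the model particles points, on average, toward the shifted data cloud, and it vanishes identically when the model particles coincide with the data.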

2605.22265 2026-05-26 math.DG math.AT math.PR math.ST stat.TH

Empirical Hodge Laplacians, Cohomology Ring, and Manifold Learning

经验Hodge拉普拉斯算子、上同调环与流形学习

Hông Vân Lê

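AI summary: For compact orientable smooth Riemannian submanifolds of Euclidean space, constructs a family of deformed Hodge Laplacians and proves their convergence to the classical operator, then establishes spectral convergence of empirical operators built from point clouds, giving consistent recovery of the de Rham cohomology ring, the second fundamental form, and the Pontryagin characteristic classes.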

详情
Comments
Revised version, condition $ n \ge 3$ added. 68 p
AI中文摘要

设 $M^n$ 是 $\mathbb R^d$ 中维数 $n\geq 3$ 的紧致可定向光滑黎曼子流形。我们构造了一族变形的 Hodge 拉普拉斯算子 $Δ_t^*$, $t>0$,它们作用于微分形式并通过 $M^n$ 的外在几何定义。我们证明这些算子在适当的算子拓扑下当 $t\to0^+$ 时一致收敛到经典的 Hodge 拉普拉斯算子 $Δ^*$。给定点云 $S_m \subset M^n$,我们定义经验算子 $Δ^*_{t, S_m}$ 并在合适的缩放机制 $t = m ^{-\frac{1}{2n}}$ 下建立其当 $t \to 0^+$ 时依概率谱收敛到 $Δ^*$。这严格地将标量的 Belkin--Niyogi 拉普拉斯特征映射框架推广到微分形式。作为应用,我们获得了从采样数据一致恢复 de Rham 上同调环 $H^* (M^n,\mathbf R)$、$M^n$ 的第二基本形式(从而黎曼曲率张量)以及庞特里亚金示性类和庞特里亚金数的程序。

英文摘要

Let $M^n$ be a compact orientable smooth Riemannian submanifold of dimension $n\geq 3$ in $\mathbb R^d$. We construct a family of deformed Hodge Laplacians $Δ_t^*$, $t>0$, acting on differential forms and defined through the extrinsic geometry of $M^n$. We prove that these operators converge uniformly, in the appropriate operator topology, to the classical Hodge Laplacian $Δ^*$ as $t\to0^+$. Given a point cloud $S_m \subset M^n$, we define empirical operators $Δ^*_{t, S_m}$ and establish their spectral convergence in probability to $Δ^*$, as $t \to 0^+$, under a suitable scaling regime $t = m ^{-\frac{1}{2n}}$. This rigorously extends the scalar Belkin--Niyogi Laplacian Eigenmaps framework to differential forms. As applications, we obtain consistent recovery procedures for the de Rham cohomology ring $H^* (M^n,\mathbf R)$, the second fundamental form of $M^n$, hence for the Riemannian curvature tensor, and consequently for the Pontryagin characteristic classes and Pontryagin numbers of $M^n$ from sampled data.

2605.18745 2026-05-26 stat.ML cs.LG cs.NA math.NA math.PR q-fin.MF stat.CO

SURGE: Approximation and Training Free Particle Filter for Diffusion Surrogate

SURGE: 扩散替代模型的近似与免训练粒子滤波

Lifu Wei, Yinuo Ren, Naichen Shi, Yiping Lu

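AI summary: Proposes an unbiased particle filtering method built on diffusion models, which reweights and resamples diffusion trajectories with Sequential Monte Carlo to fuse observations with model simulations and continuously correct the state estimate.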

详情
Comments
accepted by ICML 2026
AI中文摘要

数据同化(DA)解决从含噪声和不完整的观测中顺序估计动力系统状态的问题。本文采用扩散模型作为世界模型来模拟和预测系统动力学。最近,基于分数的扩散模型学习了全局扩散先验,能有效建模(随机)动力学,显示出数据同化的强大潜力。本文研究如何利用含噪观测信息,在使用扩散先验时实现对预测系统状态的连续校正和细化。受粒子滤波方法启发,我们使用一组粒子表示后验分布。接收到含噪观测后,利用观测似然引导扩散模型,使生成过程朝向与观测一致的状态。然而,这种引导并不能保证从真实后验中采样。因此,我们将扩散轨迹视为路径测度,采用序列蒙特卡洛方法对粒子进行重加权和重采样,从而纠正生成过程并确保收敛到所需的后验分布。这产生了一种无偏的粒子滤波方法,严格地将观测数据与扩散模型模拟融合。

英文摘要

Data assimilation (DA) addresses the problem of sequentially estimating the state of a dynamical system from noisy and incomplete observations. In this work, we employ a diffusion model as a world model to simulate and predict the system's dynamics. Recently, score-based diffusion models have learned global diffusion priors that effectively model (stochastic) dynamics, revealing strong potential for data assimilation. In this paper, we investigate how information from noisy observations can be incorporated to enable continuous correction and refinement of the predicted system state when using a diffusion prior. Motivated by particle filtering methods, we represent the posterior distribution using a set of particles. After receiving noisy observations, the diffusion model is guided using the observation likelihood to steer the generation process toward observation-consistent states. Nevertheless, such guidance does not guarantee sampling from the true posterior. We therefore employ a Sequential Monte Carlo approach over the diffusion trajectory, viewed as a path measure, to reweight and resample particles, thereby correcting the generation process and ensuring convergence toward the desired posterior distribution. This leads to an unbiased particle filtering method that rigorously fuses observational data with diffusion model simulations.
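The reweight-and-resample correction described in the abstract above can be illustrated with a generic importance-weighting step on a scalar toy problem. Samples from a standard normal stand in for diffusion-prior samples, the observation model is Gaussian, and multinomial resampling is used; the paper's weights along the diffusion trajectory are not reproduced, and all numbers are illustrative.

```python
import numpy as np

def reweight_and_resample(particles, log_weights, rng):
    """Normalise importance weights, report the effective sample size, and resample."""
    w = np.exp(log_weights - np.max(log_weights))   # subtract max for numerical stability
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                      # effective sample size
    idx = rng.choice(len(particles), size=len(particles), p=w)   # multinomial resampling
    return particles[idx], w, ess

rng = np.random.default_rng(0)
n = 5000
particles = rng.standard_normal(n)                  # stand-in for prior samples, N(0, 1)
y, sigma = 2.0, 0.5                                 # hypothetical observation and noise level
log_lik = -0.5 * ((y - particles) / sigma) ** 2     # Gaussian observation log-likelihood
resampled, w, ess = reweight_and_resample(particles, log_lik, rng)

weighted_mean = float(np.sum(w * particles))
exact_posterior_mean = y / (1.0 + sigma ** 2)       # conjugate Gaussian result
```

In this conjugate toy case the weighted particle mean can be checked against the closed-form posterior mean, which is the kind of consistency a guided sampler alone does not guarantee.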

2605.07233 2026-05-26 cs.LG cs.CR stat.ML

Modulated learning for private and distributed regression with just a single sample per client device

调制学习:每个客户端设备仅有一个样本的私有分布式回归

Praneeth Vepakomma, Amirhossein Reisizadeh, Samuel Horváth, Munther A. Dahleh

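AI summary: For distributed learning where each client holds a single sample, proposes a privacy-preserving method for learning a global model in which each client injects calibrated noise and shares a post-processed representation, matching the non-private centralized gradient update in expectation.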

详情
Comments
30 pages
AI中文摘要

本文聚焦于从大量设备中学习的问题,每个设备仅持有一个数据样本。这种每客户端一个样本的设置存在于多个实际应用中,包括从健身追踪器、数据/应用使用聚合器、可穿戴传感设备和日常事件监测器等学习。当客户端只有一个样本时,标准的联邦学习范式会失效,因为基于单个点的局部更新远非有用,尤其是在模型系数估计的早期轮次中。这种效用进一步被每轮添加的隐私诱导噪声削弱。本文针对这一问题,使此类客户端能够协作贡献,有效学习全局模型,同时不泄露其数据隐私。所提出的方法在每个客户端注入一个精心校准的噪声扰动来变换样本,然后共享经过后处理的表示给服务器。服务器聚合这些表示,处理得到无偏梯度更新,该更新在期望上匹配非私有中心化梯度,同时保护数据隐私。这种方法不同于传统的私有联邦学习,其中通信负载涉及模型系数而非私有变换的数据样本。该方法使数据极其有限的设备能够协作学习准确、保护隐私的模型,无需大量本地数据集或牺牲个体隐私。

英文摘要

This work focuses on the question of learning from a large number of devices, each holding only a single sample of data. Several real-world applications fit this one-sample-per-client setup, including learning from fitness trackers, data/app usage aggregators, body-worn sensing devices, and daily event monitors, to name a few. When a client has only one sample, the standard federated learning paradigm breaks down, as a local update based on that single point is far from useful, especially in the earlier rounds of estimating the model coefficients. This utility is further weakened by the privacy-inducing noise applied at every round. This work addresses this problem, enabling such clients to collaboratively and effectively learn a global model without leaking their private data. The proposed approach injects a single, carefully calibrated noisy perturbation to transform the sample at each client, followed by a post-processed representation which is shared with the server. These representations, aggregated at the server, are processed to obtain an unbiased gradient update that in expectation matches the non-private centralized gradient while preserving data privacy. This approach differs from traditional private federated learning, where the communication payloads involve model coefficients as opposed to privately transformed data samples. This method enables devices with extremely limited data to collaborate and learn accurate, privacy-preserving models without requiring large local datasets or sacrificing individual privacy.

2605.05076 2026-05-26 math.ST stat.CO stat.ME stat.ML stat.TH

High-Dimensional Statistics: Reflections on Progress and Open Problems

高维统计学:进展与开放问题的反思

Arian Maleki, Subhabrata Sen, Sivaraman Balakrishnan, Verena Zuber, Chao Gao, Rishabh Dudeja, Christos Thrampoulidis, Anru Zhang, Weijie Su, Jason M. Klusowski, Po-Ling Loh, Ali Shojaie

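AI summary: Reviews progress in high-dimensional statistics over the past two decades, summarizing representative advances, common themes, and open problems, and pointing to works that offer entry points into the field.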

详情
AI中文摘要

在过去的二十年中,高维统计学领域取得了实质性进展,这主要得益于技术进步,这些技术极大地降低了生物学、医学、天文学以及社会和环境科学等广泛领域的数据收集和存储成本与工作量。现代数据集日益复杂,通常表现出丰富的依赖性、异质性以及其他挑战传统统计方法的特征。作为回应,高维统计学已发展为解决更复杂的估计和推断问题。这种演变反过来又促进了与优化、测度集中、随机矩阵理论、信息论和理论计算机科学等广泛研究领域的深入联系和贡献。鉴于高维统计学近期发展的快速步伐,我们的目标是综合代表性进展,突出共同主题和开放问题,并指出进入该领域的重要文献。

英文摘要

Over the past two decades, the field of high-dimensional statistics has experienced substantial progress, driven largely by technological advances that have dramatically reduced the cost and effort for data collection and storage across a broad range of domains, including biology, medicine, astronomy, and the social and environmental sciences. Modern datasets are increasingly complex, often exhibiting rich dependency, heterogeneity, and other features that challenge traditional statistical methods. In response, high-dimensional statistics has evolved to address more sophisticated estimation and inference problems. This evolution has, in turn, fostered deep connections with and contributions to a wide range of research areas, including optimization, concentration of measure, random matrix theory, information theory, and theoretical computer science. Given the rapid pace of recent developments in high-dimensional statistics, our goal is to synthesize representative advances, highlight common themes and open problems, and point to important works that offer entry points into the field.

2605.02495 2026-05-26 cs.LG cs.AI stat.ML

Efficient Preference Poisoning Attack on Offline RLHF

高效偏好投毒攻击离线RLHF

Chenye Yang, Weiyu Xu, Lifeng Lai

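AI summary: For preference poisoning of offline RLHF, proposes two attacks built on binary sparse approximation over a gradient dictionary (BAL-A and BMP-A), enabling efficient label-flip attacks.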

详情
Comments
Accepted to ICML 2026
AI中文摘要

离线人类反馈强化学习(RLHF)流程(如直接偏好优化DPO)在预收集的偏好数据集上训练,使其容易受到偏好投毒攻击。我们研究了对数线性DPO的标签翻转攻击。首先说明翻转一个偏好标签会在DPO梯度中引起与参数无关的偏移。利用这一关键性质,我们可以将目标投毒问题转化为结构化的二进制稀疏近似问题。为解决该问题,我们开发了两种攻击方法:二进制感知格点攻击(BAL-A)和二进制匹配追踪攻击(BMP-A)。BAL-A将二进制翻转选择问题嵌入二进制感知格点,并应用Lenstra-Lenstra-Lovász约简和Babai最近平面算法;我们提供了强制二进制系数并恢复最小翻转目标的充分条件。BMP-A将二进制匹配追踪适应于我们的非归一化梯度字典,并给出基于相干性的恢复保证和$K$翻转预算的鲁棒性(不可能性)证书。在合成字典和斯坦福人类偏好数据集上的实验验证了理论,并突出了字典几何如何决定攻击成功。

英文摘要

Offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (DPO) train on a pre-collected preference dataset, which makes them vulnerable to preference poisoning attacks. We study label flip attacks against log-linear DPO. We first illustrate that flipping one preference label induces a parameter-independent shift in the DPO gradient. Using this key property, we can then convert the targeted poisoning problem into a structured binary sparse approximation problem. To solve this problem, we develop two attack methods: Binary-Aware Lattice Attack (BAL-A) and Binary Matching Pursuit Attack (BMP-A). BAL-A embeds the binary flip selection problem into a binary-aware lattice and applies Lenstra-Lenstra-Lovász reduction and Babai's nearest plane algorithm; we provide sufficient conditions that enforce binary coefficients and recover the minimum-flip objective. BMP-A adapts binary matching pursuit to our non-normalized gradient dictionary and yields coherence-based recovery guarantees and robustness (impossibility) certificates for $K$-flip budgets. Experiments on synthetic dictionaries and the Stanford Human Preferences dataset validate the theory and highlight how dictionary geometry governs attack success.
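The parameter-independent gradient shift stated in the abstract above can be checked numerically under a simplified model. The sketch assumes a log-linear policy $\pi_\theta(y \mid x) \propto \exp(\theta^\top \phi(x, y))$ and the standard logistic DPO loss, so the per-example loss is $-\log\sigma(\beta(\theta^\top \Delta\phi - c))$, where $\Delta\phi$ is the feature difference between preferred and rejected responses and $c$ is the reference log-ratio. Flipping the label negates both, and since $\sigma(z) + \sigma(-z) = 1$ the gradient changes by exactly $\beta\,\Delta\phi$. The names and numbers are illustrative and the paper's notation may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpo_grad(theta, dphi, beta, ref_margin=0.0):
    """Gradient in theta of -log sigmoid(beta * (theta @ dphi - ref_margin))."""
    z = beta * (theta @ dphi - ref_margin)
    return -beta * sigmoid(-z) * dphi

def flipped_grad(theta, dphi, beta, ref_margin=0.0):
    """Same example with the preference label flipped: winner and loser swap roles."""
    return dpo_grad(theta, -dphi, beta, -ref_margin)

rng = np.random.default_rng(0)
beta, ref_margin = 0.5, 0.3                 # hypothetical DPO temperature and reference log-ratio
dphi = rng.standard_normal(4)               # hypothetical feature difference phi_w - phi_l

# The shift caused by the flip is the same vector at two unrelated parameter values.
theta_1, theta_2 = rng.standard_normal(4), rng.standard_normal(4)
shift_1 = flipped_grad(theta_1, dphi, beta, ref_margin) - dpo_grad(theta_1, dphi, beta, ref_margin)
shift_2 = flipped_grad(theta_2, dphi, beta, ref_margin) - dpo_grad(theta_2, dphi, beta, ref_margin)
```

Because each flip contributes a fixed vector, choosing which labels to flip amounts to selecting a binary combination of such vectors, which is consistent with the sparse-approximation view taken in the abstract.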

2604.18419 2026-05-26 cs.LG cs.CL stat.ML

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

知道何时退出:LLM推理中动态弃权的原则性框架

Hen Davidov, Nachshon Cohen, Oren Kalinsky, Yaron Fairstein, Guy Kushilevitz, Ram Yazdi, Patrick Rebeschini

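AI summary: Proposes a principled dynamic abstention rule within a regularized reinforcement learning framework, deciding whether to terminate reasoning early by comparing the value function with an abstention reward; it outperforms existing methods on mathematical reasoning and toxicity avoidance tasks.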

详情
Journal ref
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026 by the author(s)
AI中文摘要

利用思维链推理的大型语言模型常常因产生冗长且错误的响应而浪费大量计算资源。弃权可以通过抑制可能不正确的输出来缓解这一问题。虽然大多数弃权方法在生成之前或之后决定是否保留输出,但动态的生成中弃权考虑在每个token位置提前终止无前途的推理轨迹。先前的工作探索了这一想法的经验变体,但缺乏对弃权规则的原则性指导。我们提出了LLM动态弃权的形式化分析,将弃权建模为正则化强化学习框架中的一个显式动作。弃权奖励参数控制计算与信息之间的权衡。我们证明,在一般条件下,当价值函数低于该奖励时弃权严格优于自然基线。我们进一步推导了一种原则性且高效的方法来近似价值函数。在数学推理和毒性避免任务上的实证结果支持我们的理论,并展示了相比现有方法改进的选择性准确性。

英文摘要

LLMs utilizing chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
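
The abstention rule described in the abstract (stop when the value function falls below the abstention reward) can be sketched as a simple threshold check. This is a minimal illustration only: the list `value_estimates` stands in for per-token value estimates, which are hypothetical inputs here, and the paper's method for approximating the value function is not reproduced.

```python
def first_abstention_step(value_estimates, abstain_reward):
    """Return the first token position whose estimated value is below the
    abstention reward, or None if generation should run to completion.

    value_estimates : iterable of floats, one hypothetical estimate per token
    abstain_reward  : float, the reward granted for abstaining
    """
    for t, value in enumerate(value_estimates):
        if value < abstain_reward:
            return t  # terminate the reasoning trace here
    return None


# A trace whose estimated value collapses at position 2.
stop_at = first_abstention_step([0.9, 0.7, 0.2, 0.8], abstain_reward=0.3)
# A trace that never drops below the abstention reward.
no_stop = first_abstention_step([0.9, 0.5], abstain_reward=0.3)
```

Raising `abstain_reward` makes the rule stop earlier and more often, which is the compute-versus-information trade-off that the parameter controls.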

2603.06798 2026-05-26 cs.LG cs.DC stat.ML

NEST: Network- and Memory-Aware Device Placement For Distributed Deep Learning

NEST: 面向分布式深度学习的网络与内存感知设备放置

Irene Wang, Vishnu Varma Venkata, Arvind Krishnamurthy, Divya Mahajan

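AI summary: Proposes the NEST framework, which unifies model parallelism, topology modeling, and memory feasibility via structured dynamic programming, achieving up to 2.43 times higher throughput across diverse hardware and networks.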

详情
Comments
Accepted to MLSys 2026
AI中文摘要

深度学习规模的不断增长要求分布式训练框架能够联合考虑并行性、内存和网络拓扑。先前的工作通常依赖启发式或拓扑无关的搜索,分别处理通信和内存。由于缺乏每设备内存感知,这些方法通常事后通过将参数和激活分片到多个设备上来确保可行性,从而增加同步、扩大通信、降低计算利用率,限制了实际数据中心网络上的可扩展性和效率。我们提出了NEST,一个网络、计算和内存感知的设备放置框架,通过结构化动态规划统一了模型并行、拓扑建模和内存可行性。NEST的动态规划在具有张量和专家并行配置、跨层次或任意网络的显式allreduce延迟以及内存/计算轮廓的算子图上运行。通过跨张量、流水线、数据和专家维度分解并行性,NEST为混合策略定义了一个原则性的搜索空间,同时联合优化共置、网络延迟和内存可行性。在多种硬件和网络上的评估表明,与最先进的基线相比,NEST实现了高达2.43倍的吞吐量提升、更好的内存效率和可扩展性,为下一代AI基础设施的并行化策略和数据中心互连协同设计提供了基础。NEST的源代码可在https://github.com/scai-tech/Nest获取。

英文摘要

The growing scale of deep learning demands distributed training frameworks that jointly reason about parallelism, memory, and network topology. Prior works often rely on heuristic or topology-agnostic search, handling communication and memory separately. Without per-device memory awareness, these methods typically ensure feasibility post hoc by sharding parameters and activations across many devices, which increases synchronization, inflates communication, and underutilizes compute, limiting scalability and efficiency on real datacenter networks. We present NEST, a network-, compute-, and memory-aware device placement framework that unifies model parallelism, topology modeling, and memory feasibility via structured dynamic programming. NEST's DP operates on operator graphs with tensor and expert parallel configurations, explicit allreduce latencies across hierarchical or arbitrary networks, and memory/compute profiles. By factoring parallelism across tensor, pipeline, data, and expert dimensions, NEST defines a principled search space for hybrid strategies while jointly optimizing co-location, network latency, and memory feasibility. Evaluations across diverse hardware and networks show NEST achieves up to 2.43 times higher throughput, better memory efficiency, and improved scalability over state-of-the-art baselines, providing a foundation for co-designing parallelization strategies and datacenter interconnects for next-generation AI infrastructure. The source code of NEST is available at: https://github.com/scai-tech/Nest

2602.23851 2026-05-26 stat.ME

Nonlinear Modal Interval Regression for Bivariate Data Analysis

双变量数据分析的非线性模态区间回归

Sai Yao, Yuko Araki, Osuke Iwata

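AI summary: Proposes a nonlinear modal interval regression method that robustly estimates conditional modal intervals using kernel density estimation and a quantile loss with smoothing splines, showing higher accuracy and stability in numerical experiments and on neonatal hormone data.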

详情
Comments
25 pages, 8 figures
AI中文摘要

真实数据的离散程度对于理解给定分布的变异性尤为重要。除了集中趋势外,变异性在生命科学、气象学和经济学等广泛领域都备受关注。模态区间(MI)描述分布的离散程度,代表单变量单峰分布的最集中区间。在本研究中,我们提出一种非线性模态区间回归(MIR)方法,以平滑估计条件MI,从而稳健描述数据分布的离散程度如何随协变量变化。首先,我们使用核密度估计(KDE)估计对应于条件MI边界的分位水平,这些分位水平作为分位数损失函数的输入。其次,我们使用平滑样条的分位数损失拟合上下界函数。数值实验结果表明,与传统的MIR和KDE方法相比,重新表述的MIR实现了更高的精度和稳定性。为了评估所提方法的有效性,我们将该方法应用于新生儿激素数据,并识别出出生后前十天内皮质醇和褪黑素水平的显著节律。

英文摘要

The dispersion of real data is particularly important for understanding the variability of a given distribution. In addition to the central tendency, variability is of considerable interest in a wide variety of fields such as life sciences, meteorology, and economics. The modal interval (MI) describes the dispersion or spread of a distribution and represents the most concentrated interval of a univariate unimodal distribution. In this study, we propose a nonlinear modal interval regression (MIR) method to smoothly estimate a conditional MI, providing a robust description of how the dispersion of a data distribution varies with the covariate. First, we use kernel density estimation (KDE) to estimate the quantile levels corresponding to the conditional MI bounds, which serve as input to the quantile loss function. Second, we fit upper and lower bound functions using the quantile loss with smoothing splines. The results of numerical experiments demonstrate that the reformulated MIR achieved higher accuracy and stability than both the conventional MIR and the KDE methods. To evaluate the effectiveness of the proposed approach, we applied the method to neonatal hormone data and identified notable rhythms in cortisol and melatonin levels during the first ten days after birth.
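
The two ingredients named in the abstract, the quantile levels of the modal interval bounds and the quantile loss, can be illustrated on an unconditional sample. The sketch below is a simplification under stated assumptions: it finds the shortest interval covering a given fraction of the sample by direct search over sorted points rather than by KDE, and it does no regression on a covariate. The function names and the default `mass` are hypothetical.

```python
import numpy as np


def shortest_interval(samples, mass=0.5):
    """Shortest interval containing a fraction `mass` of the sample.

    Returns (lower, upper, tau_lower, tau_upper), where the tau values are
    the empirical quantile levels of the two bounds.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # points the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k points
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1], i / n, (i + k) / n


def pinball_loss(y, q_pred, tau):
    """Mean quantile (pinball) loss of prediction q_pred at level tau."""
    diff = np.asarray(y, dtype=float) - q_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


data = [0.0, 1.0, 2.0, 2.1, 2.2, 2.3, 5.0, 9.0]
lower, upper, tau_lo, tau_hi = shortest_interval(data, mass=0.5)
loss = pinball_loss([1.0, 3.0], q_pred=2.0, tau=0.25)
```

In a regression setting, the levels `tau_lo` and `tau_hi` would be plugged into the quantile loss to fit the lower and upper bound functions separately.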

2602.07704 2026-05-26 stat.ME

Correcting for Nonignorable Nonresponse Bias in Ordinal Observational Survey Data

有序观测调查数据中不可忽略无应答偏差的校正

Lukáš Lafférs, Jozef Michal Mintal, Ivan Sutóris

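AI summary: Addresses nonignorable nonresponse bias in ordinal outcomes with a method that combines response-propensity proxies (such as cooperativeness) with post-stratification, generalizing the binary VRP framework of Peress (2010) to ordinal outcomes and estimating it by maximum likelihood.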

详情
Comments
17 pages
AI中文摘要

许多政治调查依赖事后分层、倾斜或相关加权调整来使受访者与目标人群对齐。但当受访者在结果本身与非受访者不同时(不可忽略无应答),这些调整可能失败,甚至引入偏差到基本描述统计中。我们提供了一种实用方法,通过利用在受访者中观察到的应答倾向代理变量(例如,访问员编码的合作性)来外推至非受访者,同时直接整合可观测协变量并保留已知人口份额的事后分层优势,从而校正不可忽略无应答。该方法将Peress(2010)的变应答倾向(VRP)框架从二元结果推广到有序结果,后者广泛用于测量信任、满意度和政策态度。得到的估计量通过最大似然计算,并在一个紧凑的R程序中实现,该程序处理有序和二元结果。使用2024年美国全国选举研究(ANES),我们表明考虑不可忽略无应答对生活满意度产生了实质性的有意义变化(估计潜在相关性$ρ\approx 0.53$),而对回顾性经济评估则产生可忽略的变化($ρ\approx 0$),突出了不可忽略无应答何时实质性地影响调查估计。

英文摘要

Many political surveys rely on post-stratification, raking, or related weighting adjustments to align respondents with the target population. But when respondents differ from nonrespondents on the outcome itself (nonignorable nonresponse), these adjustments can fail, introducing bias even into basic descriptives. We provide a practical method that corrects for nonignorable nonresponse by leveraging response-propensity proxies (e.g., interviewer-coded cooperativeness) observed among respondents to extrapolate toward nonrespondents, while directly integrating observable covariates and retaining the benefits of post-stratification with known population shares. The method generalizes the variable-response-propensity (VRP) framework of Peress (2010) from binary to ordinal outcomes, which are widely used to measure trust, satisfaction, and policy attitudes. The resulting estimator is computed by maximum likelihood and implemented in a compact R routine that handles both ordinal and binary outcomes. Using the 2024 American National Election Study (ANES), we show that accounting for nonignorable nonresponse produces substantively meaningful shifts for life satisfaction (estimated latent correlation $ρ\approx 0.53$), while yielding negligible changes for retrospective economic evaluations ($ρ\approx 0$), highlighting when nonignorable nonresponse substantively affects survey estimates.

2602.05938 2026-05-26 stat.ME stat.AP

DiPPER: A Bayesian approach to differential prevalence analysis with applications in microbiome studies

DiPPER:一种用于差异流行率分析的贝叶斯方法及其在微生物组研究中的应用

Juho Pelto, Kari Auranen, Janne V. Kujala, Leo Lahti

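AI summary: Addresses boundary cases and multiple testing in differential prevalence analysis for microbiome studies with DiPPER, a Bayesian hierarchical modeling method that keeps a well-calibrated family-wise error rate while offering high sensitivity and better cross-study replication than existing methods.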

详情
Comments
Source code and datasets: https://github.com/jepelt/differential-prevalence. R package: https://github.com/jepelt/DiPPER
AI中文摘要

最近的证据表明,分析分类特征的存在/缺失可以为微生物组研究中的差异丰度分析提供一种有吸引力的替代方案。然而,标准的差异流行率分析方法面临边界情况和多重检验的挑战。为了解决这些限制,我们开发了DiPPER(基于R的概率估计差异流行率),这是一种基于贝叶斯层次模型的方法。我们使用来自57个人类肠道微生物组研究的公开数据,将我们的方法与现有的差异流行率方法以及两种差异丰度工具进行了基准测试。我们观察到评估方法之间性能的显著差异。重要的是,DiPPER在检测潜在差异流行特征方面表现出高灵敏度,同时在全局零假设下保持良好校准的族系错误率。最值得注意的是,它在跨独立研究的发现复现方面优于替代方法。此外,DiPPER提供了差异流行率估计和不确定性区间,这些区间固有地针对多重检验进行了调整。

英文摘要

Recent evidence suggests that analyzing the presence/absence of taxonomic features can offer a compelling alternative to differential abundance analysis in microbiome studies. However, standard approaches to differential prevalence analysis face challenges with boundary cases and multiple testing. To address these limitations, we developed DiPPER (Differential Prevalence via Probabilistic Estimation in R), a method based on Bayesian hierarchical modeling. We benchmarked our method against existing differential prevalence methods, along with two differential abundance tools, using publicly available data from 57 human gut microbiome studies. We observed considerable variation in performance across the evaluated methods. Importantly, DiPPER demonstrated high sensitivity to detect potentially differentially prevalent features while maintaining a well-calibrated family-wise error rate under the global null hypothesis. Most notably, it outperformed the alternatives in the replication of findings across independent studies. Furthermore, DiPPER provides differential prevalence estimates and uncertainty intervals that are inherently adjusted for multiple testing.

2601.09525 2026-05-26 stat.ME stat.AP stat.CO

Sparse covariate-driven factorization of high-dimensional brain connectivity with application to site effect correction

高维脑连接性的稀疏协变量驱动因子分解及其在站点效应校正中的应用

Rongqian Zhang, Elena Tuzhilina, Jun Young Park

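AI summary: Proposes SLACC, a sparse latent covariate-driven connectome factorization that explicitly parameterizes covariate effects to correct site effects in multi-site brain imaging data.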

详情
AI中文摘要

大规模神经影像学研究通常从不同站点的多个扫描仪收集数据,不同站点之间的扫描仪、扫描程序和其他条件的差异可能会引入人为的站点效应。这些效应可能会偏倚脑连接性度量,例如功能连接(FC),它量化了从功能磁共振成像(fMRI)导出的功能网络组织。如何利用高维网络结构有效减轻站点效应尚未得到解决。在本文中,我们提出了SLACC(稀疏潜在协变量驱动连接组)因子分解,这是一种多元方法,在对应于从脑连接性导出的稀疏秩1潜在模式的潜在受试者分数中显式参数化协变量效应。所提出的方法识别脑网络内和跨网络的局部站点驱动变异性,从而实现有针对性的校正。我们开发了一种惩罚期望最大化(EM)算法进行参数估计,并结合贝叶斯信息准则(BIC)指导优化。大量模拟验证了SLACC在恢复真实参数和潜在连接模式方面的鲁棒性。应用于自闭症脑成像数据交换(ABIDE)数据集,SLACC展示了其减少站点效应的能力。

英文摘要

Large-scale neuroimaging studies often collect data from multiple scanners across different sites, where variations in scanners, scanning procedures, and other conditions across sites can introduce artificial site effects. These effects may bias brain connectivity measures, such as functional connectivity (FC), which quantify functional network organization derived from functional magnetic resonance imaging (fMRI). How to leverage high-dimensional network structures to effectively mitigate site effects has yet to be addressed. In this paper, we propose SLACC (Sparse LAtent Covariate-driven Connectome) factorization, a multivariate method that explicitly parameterizes covariate effects in latent subject scores corresponding to sparse rank-1 latent patterns derived from brain connectivity. The proposed method identifies localized site-driven variability within and across brain networks, enabling targeted correction. We develop a penalized Expectation-Maximization (EM) algorithm for parameter estimation, incorporating the Bayesian Information Criterion (BIC) to guide optimization. Extensive simulations validate SLACC's robustness in recovering the true parameters and underlying connectivity patterns. Applied to the Autism Brain Imaging Data Exchange (ABIDE) dataset, SLACC demonstrates its ability to reduce site effects.

2512.15605 2026-05-26 cs.LG stat.ML

Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction

自回归语言模型实际上是能量模型:对下一个词元预测的预见能力的洞察

Mathieu Blondel, Michael E. Sander, Germain Vivier-Ardisson, Tianlin Liu, Vincent Roulet

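AI summary: Establishes a bijection between autoregressive models and energy-based models, shedding light on why autoregressive models have lookahead capabilities under the next-token prediction paradigm, and provides theoretical error bounds.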

详情
AI中文摘要

自回归模型(ARMs)目前构成了大型语言模型(LLMs)的主导范式。能量模型(EBMs)代表了另一类模型,历史上在LLM发展中不太普遍,但自然地刻画了后训练对齐中的最优策略。在本文中,我们提供了这两类模型的统一视角。以概率链式法则为起点,我们在函数空间建立了ARMs和EBMs之间的显式双射,并证明这对应于最大熵强化学习中的软贝尔曼方程的一个特例。基于这一双射,我们推导了ARMs和EBMs的监督学习之间的等价性。此外,我们通过提供理论误差界分析了将EBMs蒸馏为ARMs的过程。我们的结果揭示了ARMs尽管基于下一个词元预测范式,却具备规划能力的原因。

英文摘要

Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) represent another class of models, which have historically been less prevalent in LLM development, yet naturally characterize the optimal policy in post-training alignment. In this paper, we provide a unified view of these two model classes. Taking the chain rule of probability as a starting point, we establish an explicit bijection between ARMs and EBMs in function space, which we show to correspond to a special case of the soft Bellman equation in maximum entropy reinforcement learning. Building upon this bijection, we derive the equivalence between supervised learning of ARMs and EBMs. Furthermore, we analyze the distillation of EBMs into ARMs by providing theoretical error bounds. Our results provide insights into the ability of ARMs to plan ahead, despite being based on the next-token prediction paradigm.
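
The chain-rule link between the two model classes can be seen in a toy tabular case. The sketch below is an illustration under assumptions and not the paper's construction: it assigns a random reward (negative energy) to every sequence over a tiny hypothetical vocabulary, defines a soft value for each prefix as the log-sum-exp of rewards over its completions, and reads next-token log-probabilities off as differences of soft values. The resulting autoregressive log-probability of each full sequence telescopes to the normalized energy-based log-probability.

```python
import itertools

import numpy as np

VOCAB, LENGTH = 3, 3  # toy sizes, chosen only to allow full enumeration
rng = np.random.default_rng(0)
seqs = list(itertools.product(range(VOCAB), repeat=LENGTH))
reward = {s: rng.normal() for s in seqs}  # reward = negative energy


def soft_value(prefix):
    """Log-sum-exp of rewards over all full sequences extending `prefix`."""
    vals = [reward[s] for s in seqs if s[: len(prefix)] == prefix]
    return float(np.log(np.sum(np.exp(vals))))


def arm_logprob(seq):
    """Sum of next-token log-probabilities, each a difference of soft values."""
    return sum(soft_value(seq[: t + 1]) - soft_value(seq[:t])
               for t in range(LENGTH))


log_z = soft_value(())  # log partition function of the energy-based model
ebm_logprob = {s: reward[s] - log_z for s in seqs}
max_gap = max(abs(arm_logprob(s) - ebm_logprob[s]) for s in seqs)

# Next-token probabilities after the prefix (0,) form a valid distribution.
cond_total = sum(np.exp(soft_value((0, y)) - soft_value((0,)))
                 for y in range(VOCAB))
```

Each next-token score depends on the soft value of the extended prefix, which already summarizes all future completions; this is one way to read the lookahead behaviour discussed in the abstract.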

2512.06823 2026-05-26 math.ST stat.TH

Double Local-to-Unity: Inference under Nearly Nonstationary Volatility

双局部单位根:近非平稳波动下的推断

Abir Sarkar, Martin T. Wells

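AI summary: Develops a moderate-deviation limit theory for autoregressive models with persistent mean and volatility dynamics, introducing a double localization framework, deriving the limiting distribution of the OLS estimator of the autoregressive coefficient under nearly nonstationary volatility, and establishing volatility-robust inference.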

详情
AI中文摘要

本文针对均值和波动率动态具有联合持久性的自回归模型,发展了中度偏离极限理论。自回归系数允许以比经典1/n速率更慢的速度向单位根漂移,同时波动率持久性参数以更慢的对数阶收敛到1,使得条件方差过程本身近乎非平稳且其无条件矩可能发散。这种双局部化允许方差过程近乎非平稳且缓慢演变,正如在金融数据和资产价格泡沫时期所观察到的。在标准正则性条件下,我们建立了自回归系数OLS估计量的相合性和分布极限,该估计量在高度持久随机波动存在下仍然有效。我们证明最小二乘推断的有效归一化由平均波动率尺度决定,并推导了在联合漂移和波动率动态下OLS估计量的鞅极限定理。在温和平稳情形(自回归根从下方接近1)下,OLS估计量渐近正态。在温和爆炸情形(自回归根从上方接近1)下,基于OLS的自归一化统计量收敛到柯西极限。引人注目的是,在两种情形下,我们的统计量的极限律对波动率过程的详细设定保持不变,即使条件方差本身近乎非平稳。总体而言,这些结果将中度偏离渐近理论扩展到波动率持久性漂移的情形,统一了局部单位根推断与近非平稳随机波动,并为接近不稳定和出现泡沫的实证工作提供了实际可用的波动率稳健统计量。

英文摘要

This article develops a moderate-deviation limit theory for autoregressive models with jointly persistent mean and volatility dynamics. The autoregressive coefficient is allowed to drift toward unity slower than the classical 1/n rate, while the volatility persistence parameter also converges to one at an even slower, logarithmic order, so that the conditional variance process is itself nearly nonstationary and its unconditional moments may diverge. This double localization allows the variance process to be nearly nonstationary and to evolve slowly, as observed in financial data and during asset price bubble episodes. Under standard regularity conditions, we establish consistency and distributional limits for the OLS estimator of the autoregressive coefficient, which remains valid in the presence of highly persistent stochastic volatility. We show that the effective normalization for least squares inference is governed by an average volatility scale, and we derive martingale limit theorems for the OLS estimator under joint drift and volatility dynamics. In a mildly stationary regime (where the autoregressive root approaches one from below), the OLS estimator is asymptotically normal. In a mildly explosive regime (where the root approaches one from above), an OLS-based self-normalized statistic converges to a Cauchy limit. Strikingly, in both regimes, the limiting laws of our statistics are invariant to the detailed specification of the volatility process, even though the conditional variance is itself nearly nonstationary. Overall, the results extend moderate-deviation asymptotics to settings with drifting volatility persistence, unify local-to-unity inference with nearly nonstationary stochastic volatility, and deliver practically usable volatility-robust statistics for empirical work in settings approaching instability and exhibiting bubbles.

2510.26051 2026-05-26 econ.EM math.ST stat.ME stat.TH

Estimation and Inference in Boundary Discontinuity Designs: Distance-Based Methods

边界不连续设计中的估计与推断:基于距离的方法

Matias D. Cattaneo, Rocio Titiunik, Ruiqi Rae Yu

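AI summary: Studies nonparametric distance-based (isotropic) local polynomial methods for estimating the boundary average treatment effect curve, establishing pointwise and uniform identification, estimation, and inference results and showing the central role of the boundary's geometric regularity in feasible convergence rates and valid inference.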

详情
AI中文摘要

我们研究非参数距离基(各向同性)局部多项式方法,用于估计边界平均处理效应曲线,该曲线是捕捉边界不连续设计中处理效应异质性的因果泛函。我们沿处理分配边界建立了点态和均匀的识别、估计与推断结果。我们证明,边界(一维流形)的几何正则性在确定可行收敛速率和有效推断程序中起着核心作用。我们的理论贡献有三方面。首先,我们推导了各向同性局部多项式估计量误设偏差收敛速率的均匀下界和上界。其次,我们获得了均匀分布近似,为边界稳健推断提供了依据。第三,我们为一类广泛的非参数各向同性回归估计量建立了极小极大下界。这些结果为实证实施提供了实用指导,包括适应处理分配边界局部不规则性的新带宽选择规则。我们通过模拟证据和一个实证应用说明了所提出的方法,并提供了配套的通用软件。

英文摘要

We study nonparametric distance-based (isotropic) local polynomial methods for estimating the boundary average treatment effect curve, a causal functional that captures treatment effect heterogeneity in boundary discontinuity designs. We establish identification, estimation, and inference results both pointwise and uniformly along the treatment assignment boundary. We show that the geometric regularity of the boundary, a one-dimensional manifold, plays a central role in determining feasible convergence rates and valid inference procedures. Our theoretical contributions are threefold. First, we derive uniform lower and upper bounds on the convergence rate of the misspecification bias of isotropic local polynomial estimators. Second, we obtain uniform distributional approximations that justify boundary-robust inference. Third, we establish minimax lower bounds for a broad class of nonparametric isotropic regression estimators. These results yield practical guidance for empirical implementation, including new bandwidth selection rules that adapt to local irregularities of the treatment-assignment boundary. We illustrate the proposed methods using simulation evidence and an empirical application, and provide companion general-purpose software.

2508.17090 2026-05-26 stat.ML cs.LG

Neural Stochastic Differential Equations on Compact State Spaces: Theory, Methods, and Application to Suicide Risk Modeling

紧致状态空间上的神经随机微分方程:理论、方法及其在自杀风险建模中的应用

Malinda Lu, Yue-Jane Liu, Matthew K. Nock, Yaniv Yacoby

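AI summary: Addresses domain-constraint violations and unstable training of stochastic differential equations on ecological momentary assessment data by proposing a new class of expressive SDEs whose drift and diffusion constraints keep solutions inside a compact polyhedral state space, together with a parameterization that maps arbitrary dynamics into constraint-satisfying SDEs, improving forecasts and optimization on real data.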

详情
Comments
Accepted at the Symposium on Probabilistic Machine Learning (ProbML) 2026, and at the Methods and Opportunities at Small Scale (MOSS), ICML 2025, Vancouver, Canada
AI中文摘要

生态瞬时评估(EMA)研究能够通过智能手机收集自杀想法和行为(STB)的高频自我报告。潜在随机微分方程(SDE)是EMA数据的一个有前景的模型类别,因为数据是不规则采样、有噪声且部分观测的。但基于SDE的模型存在两个关键限制。(a) 这些模型经常违反域约束,削弱了模型的科学有效性和临床信任。(b) 训练在数值上不稳定,除非采用临时修复(例如过度简化的动力学),而这些修复不适合高风险应用。在此,我们开发了一类新型表达性SDE,其解被证明被限制在预设的紧致多面体状态空间内,与EMA数据的域匹配。在这项工作中,(1) 我们从理论和经验上展示了为什么基于链式法则的紧致域上SDE构造会失败;(2) 我们推导了一般和稳态SDE的漂移和扩散约束,使其解保持在所需状态空间内;(3) 我们引入了一种参数化方法,将任意(神经或专家给出的)动力学映射为满足约束的SDE。在多个真实EMA数据集上,包括一项大型自杀风险研究,我们的参数化方法在预测和优化动力学方面优于标准潜在神经SDE基线。这些贡献为自杀风险和其他临床时间序列的原则性、可信赖的连续时间模型铺平了道路,并将基于SDE的方法(例如扩散模型)的应用扩展到具有硬状态约束的领域。

英文摘要

Ecological Momentary Assessment (EMA) studies enable the collection of high-frequency self-reports of suicidal thoughts and behaviors (STBs) via smartphones. Latent stochastic differential equations (SDEs) are a promising model class for EMA data, as such data are irregularly sampled, noisy, and partially observed. However, SDE-based models suffer from two key limitations. (a) These models often violate domain constraints, undermining the scientific validity of the model and clinical trust in it. (b) Training is numerically unstable without ad hoc fixes (e.g., oversimplified dynamics) that are ill-suited for high-stakes applications. Here, we develop a novel class of expressive SDEs whose solutions are provably confined to a prescribed compact polyhedral state space, matching the domains of EMA data. In this work, (1) we show why chain-rule-based constructions of SDEs on compact domains fail, theoretically and empirically; (2) we derive constraints on drift and diffusion for general and stationary SDEs so their solutions remain in the desired state space; and (3) we introduce a parameterization that maps arbitrary (neural or expert-given) dynamics into constraint-satisfying SDEs. On several real EMA datasets, including a large suicide-risk study, our parameterization improves forecasts and optimization dynamics over standard latent neural SDE baselines. These contributions pave the way for principled, trustworthy continuous-time models of suicide risk and other clinical time series and extend applications of SDE-based methods (e.g., diffusion models) to domains with hard state constraints.

2507.02215 2026-05-26 stat.ML cs.LG cs.NA math.NA

Hybrid least squares for learning functions from highly noisy data

混合最小二乘法:从高噪声数据中学习函数

Ben Adcock, Bernhard Hientzsch, Akil Narayan, Yiming Xu

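AI summary: For least-squares function approximation from highly noisy data, proposes a hybrid method combining Christoffel sampling with optimal experimental design that attains optimality properties for sample point generation and noise mollification, improves computational efficiency and sample complexity, and extends to convexity-constrained and adaptive random subspace settings.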

详情
Comments
30 pages
AI中文摘要

受高效估计条件期望需求的驱动,我们考虑一个数据严重污染的最小二乘函数逼近问题。在小噪声情况下有效的现有方法在存在大噪声时表现不佳。为了解决这个问题,我们提出了一种混合方法,将Christoffel采样与最优实验设计相结合。我们证明,所提出的算法在样本点生成和噪声平滑方面都具有适当的优化特性,与现有方法相比,提高了计算效率和样本复杂度。我们还将该算法扩展到凸性约束设置,并具有类似的理论保证。当目标函数定义为随机场的期望时,我们进一步扩展我们的方法以利用自适应随机子空间,并建立了自适应过程逼近能力的结果。我们的理论发现得到了数值研究的支持,包括合成数据以及计算金融中更具挑战性的随机模拟问题。

英文摘要

Motivated by the need for efficient estimation of conditional expectations, we consider a least-squares function approximation problem with heavily polluted data. Existing methods that are effective in the small-noise regime are suboptimal when large noise is present. To address this issue, we propose a hybrid approach that combines Christoffel sampling with optimal experimental design. We show that the proposed algorithm enjoys appropriate optimality properties for both sample point generation and noise mollification, leading to improved computational efficiency and sample complexity compared to existing methods. We also extend the algorithm to convexity-constrained settings with similar theoretical guarantees. When the target function is defined as the expectation of a random field, we further extend our approach to leverage adaptive random subspaces and establish results on the approximation capacity of the adaptive procedure. Our theoretical findings are supported by numerical studies on both synthetic data and on a more challenging stochastic simulation problem in computational finance.

2506.17326 2026-05-26 cs.LG stat.AP stat.ML

CopulaSMOTE: A Copula-Based Oversampling Approach for Imbalanced Classification in Diabetes Prediction

CopulaSMOTE:基于Copula的过采样方法用于糖尿病预测中的不平衡分类

Agnideep Aich, Md Monzur Murshed, Bruce Wade, Sameera Hewage

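AI summary: Proposes CopulaSMOTE, which uses truncated vine copulas to model the joint dependence structure of the minority class when generating synthetic samples; evaluation with multiple classifiers on three diabetes datasets shows improved minority-class recovery on large tabular datasets.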

详情
AI中文摘要

类别不平衡仍然是糖尿病等疾病临床预测模型开发中的一个实际障碍,其中确诊病例的数量通常远少于对照组。合成少数类过采样技术(SMOTE)及其变体被广泛用于解决这种不平衡,但它们通过特征空间中的局部插值生成合成观测值,并未显式建模少数类的联合依赖结构。为了解决这一挑战,我们的研究引入了一种基于copula的数据增强方法,该方法在生成合成样本时估计少数类的依赖结构,并与标准机器学习技术集成。具体来说,我们采用截断藤copula通过一系列双变量构建块来表示多元依赖。我们在三个公共糖尿病数据集上评估了所提出的方法,即Pima Indians糖尿病数据集、Iraqi糖尿病数据集和CDC BRFSS 2015糖尿病健康指标数据集,这些数据集涵盖了不同的样本量、维度和不平衡程度。对于每个数据集,使用5×2交叉验证协议和Dietterich配对t检验,在五个分类器上比较了五种重采样策略。我们的研究结果表明,CopulaSMOTE可以改善较大表格糖尿病数据集(尤其是CDC BRFSS数据集)中的少数类恢复,但其优势取决于分类器和评估指标。

英文摘要

Class imbalance remains a practical obstacle in the development of clinical prediction models for conditions such as diabetes mellitus, where the number of confirmed cases is often much smaller than the number of controls. The Synthetic Minority Over-sampling Technique (SMOTE) and its variants are widely used to address this imbalance, but they generate synthetic observations through local interpolation in feature space and do not explicitly model the joint dependence structure of the minority class. To address this challenge, our study introduces a copula-based data augmentation approach that estimates the minority-class dependence structure when generating synthetic samples and integrates with standard machine learning techniques. Specifically, we employ truncated vine copulas to represent multivariate dependence through a sequence of bivariate building blocks. We evaluate the proposed approach on three public diabetes datasets, namely the Pima Indians Diabetes dataset, the Iraqi Diabetes dataset, and the CDC BRFSS 2015 Diabetes Health Indicators dataset, which together cover a range of sample sizes, dimensionalities, and imbalance regimes. For each dataset, five resampling strategies are compared across five classifiers using a 5×2 cross-validation protocol with Dietterich's paired t-test. Our findings suggest that CopulaSMOTE can improve minority-class recovery in larger tabular diabetes datasets, particularly the CDC BRFSS dataset, but its advantages depend on the classifier and evaluation metric.
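
The evaluation protocol mentioned in the abstract, 5×2 cross-validation with Dietterich's paired t-test, reduces to a short computation once the per-fold score differences are available. The sketch below assumes a hypothetical 5-by-2 array `diffs` holding the difference in a chosen metric between two methods for each fold of each of five replications; how those scores are obtained (resampling, classifier, metric) is outside the sketch.

```python
import numpy as np


def five_by_two_cv_t(diffs):
    """Dietterich's 5x2cv paired t statistic.

    diffs : array-like of shape (5, 2); diffs[i][j] is the score difference
            between the two methods on fold j of replication i.
    The statistic is compared against a t distribution with 5 degrees
    of freedom.
    """
    d = np.asarray(diffs, dtype=float)
    if d.shape != (5, 2):
        raise ValueError("expected a 5x2 array of score differences")
    rep_mean = d.mean(axis=1, keepdims=True)      # mean difference per replication
    rep_var = ((d - rep_mean) ** 2).sum(axis=1)   # variance estimate per replication
    return float(d[0, 0] / np.sqrt(rep_var.mean()))


# Hypothetical differences: the same two fold results in every replication.
t_stat = five_by_two_cv_t([[0.02, 0.04]] * 5)
```

Only the first fold of the first replication enters the numerator, while all five replications contribute to the variance estimate in the denominator.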

2506.06840 2026-05-26 stat.ML cs.AI cs.LG stat.AP stat.OT

A Statistical Framework for Model Selection in LSTM Networks

LSTM网络中模型选择的统计框架

Fahad Mostafa

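AI summary: To address heuristic, computationally expensive model selection in LSTM networks, proposes a unified statistical framework that extends information criteria and shrinkage estimation to sequential neural networks, defining penalized likelihoods adapted to temporal structure and a generalized threshold approach for hidden state dynamics, with efficient estimation via variational Bayes and approximate marginal likelihood; biomedical examples demonstrate its flexibility and improved performance.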

详情
AI中文摘要

长短期记忆(LSTM)神经网络模型已成为从自然语言处理到时间序列预测等众多应用中序列数据建模的基石。尽管取得了成功,但模型选择问题,包括超参数调优、架构规范和正则化选择,仍然很大程度上是启发式的且计算昂贵。在本文中,我们提出了一个统一的统计框架,用于LSTM网络中的系统模型选择。我们的框架将经典的模型选择思想,如信息准则和收缩估计,扩展到序列神经网络。我们定义了适应时间结构的惩罚似然,提出了一个用于隐状态动态的广义阈值方法,并利用变分贝叶斯和近似边际似然方法提供了高效的估计策略。几个以生物医学数据为中心的示例展示了所提出框架的灵活性和改进的性能。

英文摘要

Long Short-Term Memory (LSTM) neural network models have become the cornerstone for sequential data modeling in numerous applications, ranging from natural language processing to time series forecasting. Despite their success, the problem of model selection, including hyperparameter tuning, architecture specification, and regularization choice, remains largely heuristic and computationally expensive. In this paper, we propose a unified statistical framework for systematic model selection in LSTM networks. Our framework extends classical model selection ideas, such as information criteria and shrinkage estimation, to sequential neural networks. We define penalized likelihoods adapted to temporal structures, propose a generalized threshold approach for hidden state dynamics, and provide efficient estimation strategies using variational Bayes and approximate marginal likelihood methods. Several examples centered on biomedical data demonstrate the flexibility and improved performance of the proposed framework.

2506.06454 2026-05-26 cs.LG cs.AI stat.ML

LETS Forecast: Learning Embedology for Time Series Forecasting

LETS Forecast:用于时间序列预测的嵌入学

Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV, Nada Magdi Elkordi, Yin Li

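AI summary: Proposes DeepEDM, a framework combining nonlinear dynamical systems modeling with deep learning that learns latent dynamics through time-delay embeddings and kernel regression for accurate time series forecasting.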

详情
Comments
Accepted at International Conference on Machine Learning (ICML) 2025
AI中文摘要

现实世界的时间序列通常受复杂的非线性动力学支配。理解这些潜在动力学对于精确的未来预测至关重要。虽然深度学习在时间序列预测中取得了重大成功,但许多现有方法并未显式建模动力学。为弥补这一差距,我们引入了DeepEDM,一个将非线性动力系统建模与深度神经网络相结合的框架。受经验动态建模(EDM)启发并基于Takens定理,DeepEDM提出了一种新颖的深度模型,该模型从时间延迟嵌入中学习潜在空间,并使用核回归来逼近潜在动力学,同时利用softmax注意力的高效实现,允许对未来时间步进行准确预测。为了评估我们的方法,我们在非线性动力系统的合成数据以及跨领域的真实世界时间序列上进行了全面实验。结果表明,DeepEDM对输入噪声具有鲁棒性,并在预测准确性上优于最先进的方法。我们的代码可在以下网址获取:https://abrarmajeedi.github.io/deep_edm。

英文摘要

Real-world time series are often governed by complex nonlinear dynamics. Understanding these underlying dynamics is crucial for precise future prediction. While deep learning has achieved major success in time series forecasting, many existing approaches do not explicitly model the dynamics. To bridge this gap, we introduce DeepEDM, a framework that integrates nonlinear dynamical systems modeling with deep neural networks. Inspired by empirical dynamic modeling (EDM) and rooted in Takens' theorem, DeepEDM presents a novel deep model that learns a latent space from time-delayed embeddings, and employs kernel regression to approximate the underlying dynamics, while leveraging efficient implementation of softmax attention and allowing for accurate prediction of future time steps. To evaluate our method, we conduct comprehensive experiments on synthetic data of nonlinear dynamical systems as well as real-world time series across domains. Our results show that DeepEDM is robust to input noise, and outperforms state-of-the-art methods in forecasting accuracy. Our code is available at: https://abrarmajeedi.github.io/deep_edm.

2412.06114 2026-05-26 stat.ME

Randomized interventional effects in semicompeting risks, with application to a hematopoietic cell transplantation study

半竞争风险中的随机干预效应及其在造血细胞移植研究中的应用

Yuhao Deng, Rui Wang, Tao Zhang, Xiang Zhan

AI总结 针对半竞争风险数据,提出基于随机干预的因果效应分解方法,通过非参数极大似然估计和敏感性分析评估中间事件对终末事件的中介效应。

详情
AI中文摘要

在临床研究中,主要(终末)事件的风险可能因中间事件而改变,导致半竞争风险。为了研究通过中间事件介导的终末事件的治疗效果,研究者希望将总效应分解为直接效应和间接效应。在本文中,我们将随机干预方法扩展到时间-事件结局,其中中间事件和终末事件都可能受到右删失。我们设想从参考分布中随机抽取中间事件过程,该分布可以是边际上随时间变化的混杂因素,也可以是条件于观测历史的。我们给出了干预效应的识别公式,并讨论了识别假设的一些变体。我们使用非参数极大似然估计来估计治疗效果,并提出了一种包含潜在脆弱性的敏感性分析。作为示例,我们在一项以移植物抗宿主病(GVHD)为时变混杂因素的造血细胞移植研究中,研究了匹配无关供体与半相合供体对由复发介导的死亡的影响。我们发现,对于淋巴瘤患者,在使用移植后环磷酰胺(PTCy)GVHD预防方案的情况下,匹配无关供体移植在生存率方面更优。

英文摘要

In clinical studies, the risk of the primary (terminal) event may be modified by intermediate events, resulting in semicompeting risks. To study the treatment effect on the terminal event mediated by the intermediate event, researchers wish to decompose the total effect into direct and indirect effects. In this article, we extend the randomized interventional approach to time-to-event outcomes, where both intermediate and terminal events are subject to right censoring. We envision a random draw for the intermediate event process from a reference distribution, either marginally over time-varying confounders or conditionally given the observed history. We present the identification formula for interventional effects and discuss some variants of the identification assumptions. We estimate the treatment effects using nonparametric maximum likelihood estimation and propose a sensitivity analysis that incorporates a latent frailty. As an illustration, we study the effect of matched unrelated donor versus haploidentical donor on death mediated by relapse in a hematopoietic cell transplantation study with graft-versus-host disease (GVHD) as the time-varying confounder. We find that, for lymphoma patients receiving post-transplant cyclophosphamide (PTCy) GVHD prophylaxis, matched unrelated donor transplantation is preferable in terms of survival rates.

2406.04374 2026-05-26 cs.IR cs.GT cs.LG stat.ML

Incentivized Exploration with Stochastic Covariates: A Two-Stage Mechanism Design for Recommender System

带随机协变量的激励探索:推荐系统的两阶段机制设计

Yuantong Li, Guang Cheng, Xiaowu Dai

AI总结 针对推荐系统中用户自利偏好下的探索-利用权衡问题,提出一种两阶段算法,通过激励相容的探索和逆比例间隙采样策略实现次线性遗憾并满足激励约束。

详情
Comments
ICML 2026
AI中文摘要

推荐系统通过连接用户与相关产品在互联网经济中扮演关键角色。然而,设计有效的推荐系统面临关键挑战:在确保探索新产品的激励与用户自利偏好之间的探索-利用权衡。先前工作解决了固定设计线性bandit中的贝叶斯激励相容性(Sellke & Slivkins, 2023),我们则应对在线采样的随机用户协变量的挑战。与标准的黑箱归约(Mansour et al., 2020)不同,我们的两阶段框架利用线性奖励结构,在满足激励约束的同时实现次线性遗憾。为解决该问题,我们提出一种两阶段算法,将激励探索与任何高效的即插即用离线学习算法相结合。在第一阶段,算法在保持激励相容性的同时探索产品以收集最优样本。第二阶段采用逆比例间隙采样策略(IPGS)与任何高效学习方法相结合,以确保次线性遗憾。理论上,我们证明算法RCB实现了$O(\sqrt{KdT})$遗憾,同时满足激励约束,并发现了激励预算与遗憾之间的权衡,实验验证了这一点。通过在个性化华法林剂量调整的实际应用和模拟中,我们展示了RCB的强激励增益、次线性遗憾和鲁棒性。

英文摘要

Recommender systems play a crucial role in internet economies by connecting users with relevant products. However, designing effective recommender systems faces a key challenge: the exploration-exploitation tradeoff between incentivizing the exploration of new products and respecting users' self-interested preferences. While prior work addresses Bayesian Incentive Compatibility (BIC) in fixed-design linear bandits (Sellke & Slivkins, 2023), we tackle the challenge of stochastic user covariates sampled online. Unlike standard black-box reductions (Mansour et al., 2020), our two-stage framework exploits the linear reward structure to achieve sublinear regret while satisfying incentive constraints. Specifically, we propose a two-stage algorithm that integrates incentivized exploration with any efficient plug-in offline learning algorithm. In the first stage, it explores products while maintaining incentive compatibility to gather optimal samples. The second stage employs an inverse proportional gap sampling (IPGS) strategy integrated with any efficient learning method to secure sublinear regret. Theoretically, we prove that the algorithm RCB achieves $O(\sqrt{KdT})$ regret while satisfying incentive constraints, and we uncover a tradeoff between incentive budget and regret, which we validate in experiments. We demonstrate RCB's strong incentive gain, sublinear regret, and robustness through simulations and a real application to personalized warfarin dosing.

2404.14328 2026-05-26 stat.CO physics.ao-ph physics.data-an stat.ME stat.ML

Preserving linear invariants in ensemble filtering methods

保持集合滤波方法中的线性不变量

Mathieu Le Provost, Jan Glaubitz, Youssef Marzouk

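AI summary: For non-Gaussian filtering problems, uses measure transport theory to introduce a class of nonlinear ensemble filters that preserve linear invariants, and shows how invariant preservation can be combined with regularization techniques in the ensemble Kalman filter.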

详情
Journal ref
Journal of Computational Physics (2026)
Comments
25 pages
AI中文摘要

数据同化将动力学模型与观测相结合以改进状态估计。集合滤波器通过随时间更新一组样本,在预报和分析步骤之间交替,顺序同化观测值。准确且稳健的预测通常需要保持关键不变量,如质量、化学物种的化学计量平衡和电荷。虽然现代数值求解器保持这些不变量,但现有的保持不变量分析步骤仅限于高斯设置。此外,它们可能与膨胀和协方差锥化等正则化技术不兼容。在这项工作中,我们专注于保持非高斯滤波问题中的线性不变量。利用测度传输理论的工具,我们引入了一类新颖的非线性集合滤波器,可以保持任何期望的线性不变量。值得注意的是,在高斯设置的特殊情况下,我们恢复了卡尔曼滤波器的约束形式。我们还展示了如何在集合卡尔曼滤波器中结合保持不变量与正则化技术。数值实验说明了在集合卡尔曼滤波器和基于传输的非线性集合滤波器中保持线性不变量的好处。

英文摘要

Data assimilation combines dynamical models with observations to improve state estimates. Ensemble filters sequentially assimilate observations by updating a set of samples over time, alternating between a forecast and an analysis step. Accurate and robust predictions often require preserving critical invariants such as mass, stoichiometric balance of chemical species, and electrical charge. While modern numerical solvers maintain these invariants, existing invariant-preserving analysis steps are limited to Gaussian settings. Furthermore, they can be incompatible with regularization techniques such as inflation and covariance tapering. In this work, we focus on preserving linear invariants in non-Gaussian filtering problems. Leveraging tools from measure transport theory, we introduce a novel class of nonlinear ensemble filters that preserve any desired linear invariants. Notably, we recover a constrained formulation of the Kalman filter for the special case of the Gaussian setting. We also demonstrate how to combine preserving invariants with regularization techniques in the ensemble Kalman filter. Numerical experiments illustrate the benefits of preserving linear invariants in both ensemble Kalman filters and transport-based nonlinear ensemble filters.

2404.12589 2026-05-26 math.PR cs.IT math.IT math.OC stat.CO

Geometry and factorization of multivariate Markov chains with applications to MCMC acceleration and approximate inference

多元马尔可夫链的几何与分解及其在MCMC加速和近似推断中的应用

Michael C. H. Choi, Youjia Wang, Geoffrey Wolfer

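AI summary: Analyzes the factorizability of transition matrices of multivariate Markov chains from an information-geometric viewpoint, proposes projection samplers that accelerate MCMC mixing, and designs a factored filtering scheme for approximate inference in high dimensions.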

详情
Comments
47 pages, 6 figures
AI中文摘要

本文分析了多元马尔可夫链转移矩阵的可分解性和几何结构。具体地,我们证明了乘积空间因子上的诱导链可以视为关于Kullback-Leibler散度的信息投影。这一视角导出了Han-Shearer型不等式和马尔可夫链熵率的子模性,以及在大型偏差和混合时间比较中的应用。作为马尔可夫链蒙特卡洛(MCMC)和近似推断中的具体算法应用,我们提供了三个基于提升MCMC、交换算法和因子滤波的实例,以展示投影采样器比原始采样器改善了混合。基于交换算法的投影采样器在每一步平稳时重新采样最高温度坐标,我们证明与原始交换算法相比,这种做法将混合时间加速了与温度数量和底层状态空间维度相关的乘法因子。通过对双峰目标分布的简单数值实验,我们展示了投影采样器有效混合,而提升MCMC和交换算法混合较差。在滤波中,我们提出的因子滤波方案能够扩展到高维,每一步的计算成本与维度呈线性关系,但代价是近似误差,该误差可通过与独立性的距离来跟踪,而精确滤波每一步的计算成本与维度呈指数关系。

英文摘要

This paper analyzes the factorizability and geometry of transition matrices of multivariate Markov chains. Specifically, we demonstrate that the induced chains on factors of a product space can be regarded as information projections with respect to the Kullback-Leibler divergence. This perspective yields Han-Shearer type inequalities and submodularity of the entropy rate of Markov chains, as well as applications in the context of large deviations and mixing time comparison. As concrete algorithmic applications in Markov chain Monte Carlo (MCMC) and approximate inference, we provide three illustrations based on lifted MCMC, the swapping algorithm, and factored filtering to demonstrate that projection samplers improve mixing over the original samplers. The projection sampler based on the swapping algorithm resamples the highest-temperature coordinate at stationarity at each step, and we prove that this practice accelerates the mixing time by multiplicative factors related to the number of temperatures and the dimension of the underlying state space when compared with the original swapping algorithm. Through simple numerical experiments on a bimodal target distribution, we show that the projection samplers mix effectively, in contrast to lifted MCMC and the swapping algorithm, which mix less well. In filtering, our proposed factored filtering scheme is able to scale to high dimensions with linear-in-dimension computational cost per step at the price of an approximation error that can be tracked using the distance to independence, compared with the exponential-in-dimension cost per step of the exact filter.

2404.08073 2026-05-26 math.OC cs.LG stat.ML

Spurious Stationarity and Hardness Results for Bregman Proximal-Type Algorithms

Bregman近端类型算法的伪平稳性和困难结果

He Chen, Jiajin Li, Anthony Man-Cho So

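AI summary: Shows that Bregman proximal-type algorithms such as mirror descent can get trapped near spurious stationary points under non-Euclidean geometries; when the gradient of the Bregman kernel is not Lipschitz continuous, the stagnation can persist for any finite number of iterations even in convex problems, and the phenomenon occurs generically in nonconvex problems with polyhedral constraints, exposing a blind spot in existing convergence theory.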

详情
AI中文摘要

Bregman近端类型算法(BPs),如镜像下降,已成为机器学习和数据科学中通过非欧几何利用问题结构的流行工具。在本文中,我们表明BPs可能被困在一类非平稳点附近,我们称之为\emph{伪平稳点}。如果Bregman核的梯度不是Lipschitz连续的,即使对于凸问题,这种停滞也可能持续任意有限次迭代。根本原因在于欧几里得几何和Bregman几何在下降行为上的根本对比:虽然欧几里得梯度下降确保在任何非平稳点附近充分下降,但BPs可能在伪平稳点附近表现出任意缓慢的下降。因此,常用的基于Bregman的平稳性度量,例如Bregman散度的相对变化,可能在伪平稳点附近消失。这可能误导性地表明收敛,即使迭代点仍远离任何真正的平稳点。我们的分析进一步揭示,伪平稳点并非病态,而是在具有多面体约束的广泛非凸问题类中普遍出现。综上所述,我们的发现揭示了基于Bregman的优化方法中的一个严重盲点,并呼吁新的理论工具和算法保障以确保可靠的收敛。

英文摘要

Bregman proximal-type algorithms (BPs), such as mirror descent, have become popular tools in machine learning and data science for exploiting problem structures through non-Euclidean geometries. In this paper, we show that BPs can get trapped near a class of non-stationary points, which we term spurious stationary points. Such stagnation can persist for any finite number of iterations if the gradient of the Bregman kernel is not Lipschitz continuous, even in convex problems. The root cause lies in a fundamental contrast in descent behavior between Euclidean and Bregman geometries: while Euclidean gradient descent ensures sufficient decrease near any non-stationary point, BPs may exhibit arbitrarily slow decrease around spurious stationary points. As a result, commonly used Bregman-based stationarity measures, such as the relative change in terms of Bregman divergence, can vanish near spurious stationary points. This may misleadingly suggest convergence, even when the iterates remain far from any true stationary point. Our analysis further reveals that spurious stationary points are not pathological, but rather occur generically in a broad class of nonconvex problems with polyhedral constraints. Taken together, our findings reveal a serious blind spot in Bregman-based optimization methods and call for new theoretical tools and algorithmic safeguards to ensure reliable convergence.
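The stagnation described above can be seen in a toy setting. The sketch below is an illustrative assumption, not the paper's construction: it runs entropic mirror descent (exponentiated gradient) on the convex problem of minimizing f(x) = x[0] over the two-dimensional simplex, starting at (1 - eps, eps) next to the non-stationary vertex (1, 0). The negative-entropy kernel has a gradient that is not Lipschitz near the boundary, and the number of steps needed to leave the vertex grows without bound as eps shrinks. All function names are hypothetical.

```python
import math


def md_step(x, grad, eta):
    """One entropic mirror-descent (exponentiated-gradient) step on the simplex."""
    w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    z = sum(w)
    return [wi / z for wi in w]


def kl(p, q):
    """KL divergence, the Bregman divergence of negative entropy on the simplex."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def steps_to_escape(eps, eta=1.0, max_steps=10_000):
    """Count steps until f(x) = x[0] drops to 0.5, starting at (1 - eps, eps)."""
    x, grad = [1.0 - eps, eps], [1.0, 0.0]  # gradient of f(x) = x[0]
    for t in range(max_steps):
        if x[0] <= 0.5:
            return t
        x = md_step(x, grad, eta)
    return max_steps


# The Bregman change after one step is tiny although f(x0) is far from its minimum 0.
x0 = [1.0 - 1e-10, 1e-10]
x1 = md_step(x0, [1.0, 0.0], 1.0)
first_kl = kl(x1, x0)

# Escape time grows roughly like log(1 / eps) / eta.
escape_times = {eps: steps_to_escape(eps) for eps in (1e-10, 1e-100)}
```

A stopping rule based on `first_kl` alone would report near-convergence at `x0`, even though the objective there is close to its maximum over the simplex.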

2403.04545 2026-05-26 cs.LG math.ST stat.TH

Branch Scaling Manifests as Implicit Architectural Regularization for Improving Generalization in Overparameterized ResNets

分支缩放表现为隐式架构正则化以改善过参数化ResNet的泛化能力

Zixiong Yu, Guhan Chen, Jianfa Lai, Bohan Li, Songtao Tian

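AI summary: Studies how branch scaling factors affect the generalization of overparameterized ResNets, proving that scaling factors with rapid depth-wise decay combined with early stopping achieve minimax-optimal generalization rates, with the analysis based on a Neural Tangent Kernel (NTK) approximation.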

详情
Comments
Accepted by ICML. This version incorporates content from the preprint arXiv:2305.18506. The contributors of the relevant content have consented to its inclusion and have been listed as authors
AI中文摘要

残差分支中的缩放因子已成为提升神经网络性能的流行方法,特别是在无归一化架构中。虽然先前的工作主要从优化角度研究缩放效应,本文通过泛化理论的视角探讨其在残差架构中的作用。具体来说,我们证明具有恒定缩放因子的宽残差网络(ResNet)随着深度增加渐近地变得不可学习。相反,当缩放因子表现出快速的深度方向衰减并结合早停时,过参数化ResNet实现了极小极大最优泛化率。为了建立这一结论,我们证明宽ResNet的泛化能力可以通过与神经正切核(NTK)相关的核回归来近似。我们的理论发现通过合成数据和真实世界分类任务(包括MNIST和CIFAR-100)的实验得到验证。

英文摘要

Scaling factors in residual branches have emerged as a prevalent method for boosting neural network performance, especially in normalization-free architectures. While prior work has primarily examined scaling effects from an optimization perspective, this paper investigates their role in residual architectures through the lens of generalization theory. Specifically, we establish that wide residual networks (ResNets) with constant scaling factors become asymptotically unlearnable as depth increases. In contrast, when the scaling factor exhibits rapid depth-wise decay combined with early stopping, over-parameterized ResNets achieve minimax-optimal generalization rates. To establish this, we demonstrate that the generalization capability of wide ResNets can be approximated by kernel regression associated with the Neural Tangent Kernel (NTK). Our theoretical findings are validated through experiments on synthetic data and real-world classification tasks, including MNIST and CIFAR-100.

2011.10254 2026-05-26 cs.LG cs.AI stat.ML

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

通过视图演化方案的不平衡不完整多视图聚类:弱视图为食,强视图为食

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

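AI summary: To handle views with unequal degrees of incompleteness, proposes UIMC, an unbalanced incomplete multi-view clustering method inspired by biological evolution theory and based on view evolution; it uses weighted multi-view subspace clustering and a low-rank, robust representation to recover the data, improving clustering performance by up to 40%.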

详情
Journal ref
IEEE Transactions on Emerging Topics in Computational Intelligence 2021
Comments
Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence
AI中文摘要

不完整多视图聚类是处理现实世界中不完整多视图数据的重要技术。以往的工作假设所有视图具有相同的不完整性,即平衡不完整性。然而,不同的视图往往具有不同的不完整性,即不平衡不完整性,这导致了强视图(低不完整性视图)和弱视图(高不完整性视图)。不平衡不完整性阻止我们直接使用先前的方法进行聚类。在本文中,受有效生物进化理论的启发,我们设计了新颖的视图演化方案来聚类强视图和弱视图。此外,我们提出了一种不平衡不完整多视图聚类方法(UIMC),这是第一个基于视图演化的有效方法,用于不平衡不完整多视图聚类。与先前的方法相比,UIMC有两个独特的优势:1)它提出了加权多视图子空间聚类来整合这些不平衡不完整的视图,有效解决了不平衡不完整多视图问题;2)它设计了低秩和鲁棒表示来恢复数据,减少了不完整性和噪声的影响。大量的实验结果表明,UIMC在三个评估指标上相比其他最先进的方法将聚类性能提高了高达40%。

英文摘要

Incomplete multi-view clustering is an important technique for dealing with real-world incomplete multi-view data. Previous works assume that all views have the same incompleteness, i.e., balanced incompleteness. However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views). Unbalanced incompleteness prevents us from directly using the previous methods for clustering. In this paper, inspired by the theory of biological evolution, we design a novel view-evolution scheme to cluster strong and weak views. Moreover, we propose an Unbalanced Incomplete Multi-view Clustering method (UIMC), which is the first effective method based on view evolution for unbalanced incomplete multi-view clustering. Compared with previous methods, UIMC has two unique advantages: 1) it proposes weighted multi-view subspace clustering to integrate these unbalanced incomplete views, which effectively solves the unbalanced incomplete multi-view problem; 2) it designs a low-rank and robust representation to recover the data, which diminishes the impact of incompleteness and noise. Extensive experimental results demonstrate that UIMC improves the clustering performance by up to 40% on three evaluation metrics over other state-of-the-art methods.

1708.07193 2026-05-26 stat.ML cs.CY

Applications of Trajectory Data from the Perspective of a Road Transportation Agency: Literature Review and Maryland Case Study

从道路运输机构视角的轨迹数据应用:文献综述与马里兰州案例研究

Nikola Marković, Przemysław Sekuła, Zachary Vander Laan, Gennady Andrienko, Natalia Andrienko

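AI summary: From the perspective of a road transportation agency, reviews the literature and visually explores 20 million GPS traces from Maryland, covering applications of trajectory data in six areas: demand estimation, modeling human behavior, designing public transit, traffic performance measurement and prediction, environment, and safety.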

详情
Comments
Revised after peer review
AI中文摘要

运输机构有机会利用日益可用的轨迹数据集来改进其分析和决策过程。然而,这些数据通常从供应商处购买,这意味着机构必须事先了解其潜在收益,以便正确评估其相对于获取成本的价值。虽然有关轨迹数据的文献丰富,但自然分散且侧重于小众领域的技术贡献,这使得政府机构难以评估其在不同运输领域的价值。为克服这一问题,本文从有意获取轨迹以增强其分析的道路运输机构视角探索轨迹数据。本文提供了文献综述,阐述了轨迹数据在道路运输系统分析的六个领域中的应用:需求估计、人类行为建模、公共交通设计、交通性能测量与预测、环境与安全。此外,本文还直观地探索了马里兰州的2000万条GPS轨迹,展示了轨迹数据的现有应用并提出了新的应用。

英文摘要

Transportation agencies have an opportunity to leverage increasingly available trajectory datasets to improve their analyses and decision-making processes. However, this data is typically purchased from vendors, which means agencies must understand its potential benefits beforehand in order to properly assess its value relative to the cost of acquisition. While the literature concerned with trajectory data is rich, it is naturally fragmented and focused on technical contributions in niche areas, which makes it difficult for government agencies to assess its value across different transportation domains. To overcome this issue, the current paper explores trajectory data from the perspective of a road transportation agency interested in acquiring trajectories to enhance its analyses. The paper provides a literature review illustrating applications of trajectory data in six areas of road transportation systems analysis: demand estimation, modeling human behavior, designing public transit, traffic performance measurement and prediction, environment, and safety. In addition, it visually explores 20 million GPS traces in Maryland, illustrating existing applications of trajectory data and suggesting new ones.

2605.25234 2026-05-26 cs.LG cs.AI stat.CO stat.ML

On the Epistemic Uncertainty of Overparametrized Neural Networks

关于过参数化神经网络的认知不确定性

David Rügamer

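AI summary: Analyzes the epistemic uncertainty of overparametrized neural networks through the lens of non-identifiability, characterizes discrete and continuous sources of residual uncertainty, and validates the theory on one-hidden-layer ReLU networks.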

详情
Comments
Accepted at ICML 2026 (Main Track)
AI中文摘要

认知不确定性通常被视为一种可减少的不确定性,随着数据增加而消失。这种观点隐含地假设参数可辨识,并将认知不确定性等同于预测变异性。然而,在过参数化神经网络中,由于对称性和冗余表示,模型参数通常不可辨识。因此,即使底层函数被完全识别,大量的参数不确定性仍然存在。在这项工作中,我们通过非可辨识性的视角分析认知不确定性,并刻画了残余不确定性的离散和连续来源。聚焦于单隐层ReLU网络,我们深入分析了由此产生的后验结构,并通过实证研究验证了我们的理论见解。

英文摘要

Epistemic uncertainty is often viewed as a reducible uncertainty that vanishes with increasing data. This perspective implicitly assumes parameter identifiability and equates epistemic uncertainty with predictive variability. In overparametrized neural networks, however, model parameters are typically non-identifiable due to symmetries and redundant representations. As a consequence, substantial parameter uncertainty can persist even when the underlying function is fully identified. In this work, we analyze epistemic uncertainty through the lens of non-identifiability and characterize both discrete and continuous sources of residual uncertainty. Focusing on one-hidden-layer ReLU networks, we thoroughly analyze the resulting posterior structure and validate our theoretical insights through empirical studies.

2605.25227 2026-05-26 math.HO math.PR math.ST stat.TH

From Coefficients to Distributions: De Moivre and the Operational View of Probability

从系数到分布:棣莫弗与概率的运算观点

R. Labouriau

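AI summary: Traces a conceptual genealogy from De Moivre's derivation of the normal curve to modern distributional statistics, describes a framework in which a probability law is represented by a distribution-kernel pair, and proves a distributional version of the De Moivre-Laplace theorem.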

详情
Comments
10 pages, two figures and two tables
AI中文摘要

我们追溯从亚伯拉罕·棣莫弗的正态曲线推导(1733年)到现代分布统计学方法的概念谱系。棣莫弗的《二项式项之和的逼近》首次系统推导了高斯密度、其归一化常数(由斯特林识别出$B = \sqrt{2π}$)以及精确到六位小数的尾部概率——比高斯早七十多年。他的方法——通过评估指示探针的和从概率律中提取信息——显然是分布统计学基础的运算观点的一个实例。 我们识别出一个四阶段链条:系数提取(棣莫弗)$\to$ 生成函数(欧拉、拉普拉斯)$\to$ 特征函数(傅里叶、莱维)$\to$ 分布配对$\langle T, φ\rangle$(施瓦茨)。在每个阶段,探针变得更加灵活,可研究的律类也变得更广。分布框架,其中概率律由分布-核对$(T, φ) \in \mathcal{S}'(\mathbb{R}) \times \mathcal{S}(\mathbb{R})$表示,是这一进展的自然终点。 我们阐述并证明了棣莫弗-拉普拉斯定理的分布版本:标准化二项分布在$\mathcal{S}'(\mathbb{R})$中收敛到高斯分布,棣莫弗的原始计算对应于指示测试函数的特例。我们还讨论了横截性框架,该框架通过退化层的无限余维数,为为什么参数统计模型中很少遇到矩不确定、不可识别和奇异Fisher信息等病态提供了几何解释。

英文摘要

We trace a conceptual genealogy from Abraham de Moivre's derivation of the normal curve (1733) to the modern distributional approach to statistics. De Moivre's Approximatio ad Summam Terminorum Binomii gave the first systematic derivation of the Gaussian density, its normalising constant (completed by Stirling's identification of $B = \sqrt{2π}$), and its tail probabilities computed to six decimal places -- more than seventy years before Gauss. His method -- extracting information from probability laws by evaluating sums against indicator probes -- is recognisably an instance of the operational viewpoint that underlies distributional statistics. We identify a four-stage chain: coefficient extraction (De Moivre) $\to$ generating functions (Euler, Laplace) $\to$ characteristic functions (Fourier, Lévy) $\to$ distributional pairings $\langle T, φ\rangle$ (Schwartz). At each stage the probes become more flexible and the class of laws that can be studied grows wider. The distributional framework, in which a probability law is represented by a distribution--kernel pair $(T, φ) \in \mathcal{S}'(\mathbb{R}) \times \mathcal{S}(\mathbb{R})$, is the natural endpoint of this progression. We formulate and prove a distributional version of the De Moivre--Laplace theorem: the standardised binomial distribution converges to the Gaussian in $\mathcal{S}'(\mathbb{R})$, with De Moivre's original computation corresponding to the special case of indicator test functions. We also discuss the transversality framework, which provides a geometric explanation -- via infinite codimension of degeneracy strata -- for why pathologies such as moment indeterminacy, non-identifiability, and singular Fisher information are rarely encountered in parametric statistical models.
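The special case of indicator test functions mentioned above can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not code from the paper: it pairs the standardised Binomial(n, p) law with the indicator of an interval [a, b] and compares the result with the standard normal mass of the same interval. The function names are hypothetical, and no continuity correction is applied, so the agreement is only to order 1/sqrt(n).

```python
import math


def binom_pmf(n, k, p):
    """Exact Binomial(n, p) probability of k successes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_probe(n, a, b, p=0.5):
    """Pair the standardised Binomial(n, p) law with the indicator of [a, b]."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    return sum(binom_pmf(n, k, p) for k in range(n + 1) if a <= (k - mu) / sd <= b)


def gaussian_probe(a, b):
    """Standard normal mass of [a, b], computed with the error function."""
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi(b) - phi(a)


# Gap between the binomial and Gaussian pairings for two sample sizes.
gaps = {n: abs(binomial_probe(n, -1, 1) - gaussian_probe(-1, 1)) for n in (100, 1000)}
```

The values of n are kept at or below 1000 so that the binomial coefficients still convert to floating point without overflow.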

2605.25210 2026-05-26 cs.LG cs.AI stat.ML

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

扩散模型的多目标学习:半监督学习下的统计理论

Ziheng Cheng, Yixiao Huang, Hanlin Zhu, Haoran Geng, Somayeh Sojoudi, Jitendra Malik, Pieter Abbeel, Xin Guo

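AI summary: To address the higher statistical cost of the larger model capacity that multi-objective learning with diffusion models can require, proposes a semi-supervised two-stage training procedure that fits specialist models on scarce paired data and distills them into a generalist model via pseudo-samples generated from abundant unlabeled condition data, and proves that the required number of paired samples depends only on the complexity of the specialist model classes.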

详情
AI中文摘要

扩散模型越来越多地被用作强大的条件生成器,然而实际部署通常涉及来自不同任务的多个目标分布,例如文本到图像生成中的多样化提示域,或机器人技术中具有扩散策略的多个环境。这自然引出了多目标学习(MOL)问题。一个关键挑战是,实现良好的帕累托权衡可能需要一个通用模型类,其容量远大于解决任何单个任务所需的容量,从而增加了统计成本,因为样本复杂度通常随模型复杂度而扩展。为了调和这一点,我们为有限数据下的扩散模型开发了一个原则性的多目标学习框架:一种半监督机制,其中配对(标记)样本稀缺,但(未标记)条件数据丰富。我们提出了一种两阶段训练程序,首先从有限的配对数据中拟合轻量级专家模型,然后通过生成伪样本将它们蒸馏成一个通用模型。我们建立了泛化界限,表明所需的配对样本数量仅取决于专家模型类的复杂度。我们进一步将理论扩展到用于序列决策的扩散策略,以考虑在线策略展开中的分布偏移。在机器人控制和图像恢复任务上进行了大量实验,以验证我们的理论结果。

英文摘要

Diffusion models are increasingly used as powerful conditional generators, yet real deployments often involve multiple target distributions arising from different tasks, e.g., diverse prompt domains in text-to-image generation, or multiple environments in robotics with diffusion policies. This naturally leads to a multi-objective learning (MOL) problem. A key challenge is that achieving good Pareto trade-offs can require a generalist model class with substantially larger capacity than what suffices for solving any individual task, thereby increasing statistical cost since sample complexity typically scales with the model complexity. To reconcile this, we develop a principled MOL framework for diffusion models with limited data: a semi-supervised regime where paired (labeled) samples are scarce, but (unlabeled) condition data are abundant. We propose a two-stage training procedure that first fits lightweight specialist models from limited paired data, and then distills them into a generalist model by generating pseudo-samples. We establish generalization bounds showing that the required number of paired samples only depends on the complexity of the specialist model classes. We further extend the theory to diffusion policies for sequential decision making to account for distribution shift in on-policy rollouts. Extensive experiments on robotic control and image restoration tasks are conducted to verify our theoretical results.

2605.25173 2026-05-26 stat.ML cs.LG math.ST stat.TH

Nyström Kernel Stein Discrepancy Tests

Nyström 核 Stein 散度检验

Florian Kalinke, Zoltán Szabó, Bharath K. Sriperumbudur

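AI summary: Proves that Nyström-accelerated kernel Stein discrepancy tests preserve the asymptotic level and local consistency of the quadratic-time bootstrapped test, and shows numerically that they require substantially less runtime.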

详情
AI中文摘要

核 Stein 散度(KSD)是通用域上最受欢迎的拟合优度(GoF)度量之一,已成功部署大量应用。KSD 的主要应用之一是构建强大的 GoF 检验。然而,依赖于经典 U-/V-统计量基 KSD 估计量的检验有两个主要缺点。(i)其运行时间随样本数量呈二次方增长。(ii)在大多数情况下,其渐近零分布计算上难以处理,通常通过自举法处理。虽然已知 Nyström 方法可以在温和条件下以无统计精度损失的方式加速 KSD 估计,但据我们所知,其对基于自举的 GoF 检验影响的基本问题尚未解决;解决此问题是本文的重点。特别地,我们证明了二次时间自举 KSD 基 GoF 检验的关键性质(渐近水平和局部一致性)由其 Nyström 加速版本保持。我们在球面数据和函数数据的 GoF 检验背景下数值展示了加速 KSD 估计量和自举的效率。我们的数值结果表明,Nyström 加速方法在统计性能上与二次时间方法相当,同时需要显著更小的运行时间。

英文摘要

Kernel Stein discrepancy (KSD) is among the most popular goodness-of-fit (GoF) measures on general domains, with a large number of successful deployments. One of the main applications of KSD is in constructing powerful GoF tests. However, tests relying on the classical U-/V-statistic-based KSD estimators have two major drawbacks. (i) Their runtime scales quadratically in the number of samples. (ii) Their asymptotic null distribution is computationally intractable in most cases and is typically handled by bootstrapping. While it is known that the Nyström method permits accelerating KSD estimation with no loss of statistical accuracy under mild conditions, to the best of our knowledge, the fundamental question of its impact on bootstrap-based GoF testing is open; resolving this question is the focus of the current paper. In particular, we prove that the key properties of the quadratic-time bootstrapped KSD-based GoF test (asymptotic level and local consistency) are preserved by its Nyström acceleration. We numerically demonstrate the efficiency of the accelerated KSD estimator and bootstrap in the context of GoF testing of spherical and functional data. Our numerical results show that the Nyström-accelerated method performs statistically on par with the quadratic-time approach, while requiring substantially less runtime.

2605.25172 2026-05-26 stat.AP cs.DL cs.LG

Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

回复:ICML 2023 排名实验:审视机器学习/人工智能同行评审中的作者自我评估

Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie Su

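AI summary: Responds to the discussion of the ICML 2023 ranking experiment by framing peer review as a statistical estimation problem, addressing equity and strategic concerns about the Isotonic Mechanism, incorporating reviewer rankings as complementary signals, and exploring a human-centered framework for peer review in the era of generative AI.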

详情
Comments
Rejoinder to the JASA Discussion of "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review" (arXiv:2408.13430)
AI中文摘要

本文是对即将发表在《美国统计协会杂志》并附有讨论的《ICML 2023排名实验:审视机器学习/人工智能同行评审中的作者自我评估》一文的回复。为了回应讨论者提出的实践和理论观点,我们围绕四个核心主题组织回应:(i) 将同行评审表述为统计估计问题;(ii) 缓解等渗机制部署中的公平性和策略性担忧;(iii) 整合补充信号,如审稿人排名和结构化元数据;(iv) 探索生成式AI时代以人为中心的同行评审框架。

英文摘要

This article is the rejoinder to "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review," to appear in the Journal of the American Statistical Association with discussion. To address the practical and theoretical points raised by the discussants, we organize our response around four core themes: (i) formulating peer review as a statistical estimation problem; (ii) mitigating equity and strategic concerns in the deployment of the Isotonic Mechanism; (iii) incorporating complementary signals such as reviewer rankings and structured metadata; and (iv) exploring a human-centered framework for peer review in the era of generative AI.

2605.25169 2026-05-26 cs.LG stat.ME stat.ML

Learning Treatment Effects during Resource Allocation via Priority-Queue Randomization

资源分配中通过优先级队列随机化学习处理效应

JungHo Lee, Johnna Sundberg, Pim Welle, Bryan Wilder

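AI summary: Proposes an experimental design framework based on priority-queue randomization that identifies causal effects while prioritizing higher-need individuals, and optimizes queue assignment to trade off statistical efficiency against prioritization.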

详情
AI中文摘要

公共服务项目通常在对其效益不确定的情况下分配有限资源,因此需要随机化来支持可信评估。然而在实践中,申请人通常进入等待名单,资源通过分层优先级队列优先分配给被认为需求更高的个体,这使得直接随机化变得困难。受此启发,我们开发了一个实验设计框架,用于在学习处理效应的同时优先治疗最需要帮助的个体,其中新申请人根据其评估的风险评分被随机分配到优先级队列。然后,在预算允许的情况下,按优先级顺序跨队列提供治疗,并在队列内按先到先得原则提供。我们的贡献有两方面。首先,我们描述了在这种优先级队列分配下哪些因果效应被识别。当到达是外生时,处理是条件随机化的,因此标准估计量被识别;当到达是内生时,队列随机化反而为处理提供了工具变量,识别出由排队过程引起的局部处理效应。其次,我们开发了优化的队列分配设计,以在统计效率与优先考虑高需求申请人之间进行权衡。在此过程中,我们表明,尽管设计导致的处理分配存在依赖性,但通常的独立同分布效率界限仍然是合理的设计目标。我们使用美国一个大县的住房分配项目的数据来说明所提出的设计。

英文摘要

Public service programs often allocate limited resources under uncertainty about their benefits, creating a need for randomization to support credible evaluation. In practice, however, applicants commonly enter waitlists where resources are prioritized toward individuals judged to have higher need through tiered priority queues, making direct randomization difficult. Motivated by this, we develop an experimental design framework for learning treatment effects while treating those most in need, in which incoming applicants are randomized into priority queues based on their assessed risk scores. Treatments are then provided across queues in priority order and first-in-first-out within each queue as budget becomes available. Our contributions are two-fold. First, we characterize what causal effects are identified under this priority-queue allocation. When arrivals are exogenous, treatments are conditionally randomized, and hence standard estimands are identified; when arrivals are endogenous, queue randomization instead provides an instrument for treatment, identifying local treatment effects induced by the queuing process. Second, we develop optimized queue-assignment designs that trade off statistical efficiency against prioritizing higher-need applicants. We show in the process that, despite the dependence in treatment assignments induced by the design, the usual i.i.d. efficiency bounds remain well-justified design objectives. We illustrate the proposed designs using data from a housing allocation program in a large U.S. county.

2605.25146 2026-05-26 math.ST quant-ph stat.TH

Kernel Embedding for Operator-Valued Measures and Its Application to Quantum Tomography

算子值测度的核嵌入及其在量子层析成像中的应用

Philipp Nikolas Mayer, Ho Yun

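AI summary: Embeds positive operator-valued measures into the tensor product of a reproducing kernel Hilbert space and the quantum state space via a tensorized Bochner integral, yielding the Quantum Covariance Embedding, and applies it to quantum state tomography for optimal density estimation without basis-dependent sparsity constraints.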

详情
AI中文摘要

本文介绍了量子协方差嵌入,该嵌入通过张量化的Bochner积分将正算子值测度嵌入再生核希尔伯特空间与量子态空间的张量积中。这一构造引出了量子最大差异,它度量了量子测量的空间。将该框架应用于量子态层析成像,我们将密度估计重新表述为张量化核回归,从而无需现有方法所依赖的基相关稀疏约束即可实现最优推断。我们为量子Gram超算子发展了一套统一的几何设计理论,确立了酉设计是严格E-最优实验设计,因此在统计上优于泡利可观测量。对于一般的无结构估计,我们推导了精确的极小极大下界,并证明了我们的张量化估计量达到了这一最优速率。此外,我们引入了量子核回归(QUARK)估计量以适应物理实现的谱几何,推导了中心极限定理和集中不等式。为促进实际估计,我们建立了迹保持投影的精确性,并展示了在互不偏基下通过快速沃尔什-阿达玛变换实现高效估计。

英文摘要

This paper introduces the Quantum Covariance Embedding, which embeds Positive Operator-Valued Measures into a tensor product of a Reproducing Kernel Hilbert Space and the quantum state space via a tensorized Bochner integral. This construction induces the Quantum Maximum Discrepancy, which metrizes the space of quantum measurements. Applying this framework to Quantum State Tomography, we reformulate density estimation as a tensorized kernel regression, enabling optimal inference without the basis-dependent sparsity constraints that restrict existing methods. We develop a unified geometric design theory for quantum Gram superoperators, establishing that Unitary Designs are strictly E-optimal experimental designs and thus statistically superior to Pauli observables. For general structure-free estimation, we derive the exact minimax lower bound and prove that our tensorized estimators achieve this optimal rate. Furthermore, we introduce the QUAntum Regression with Kernels (QUARK) estimator to accommodate the spectral geometry of physical implementations, deriving a central limit theorem and concentration inequalities. To facilitate practical estimation, we establish the exactness of trace-preserving projections and demonstrate efficient estimation under mutually unbiased bases via the fast Walsh-Hadamard transform.

2605.25123 2026-05-26 cs.LG cs.AI cs.CL cs.CV stat.ML

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

扩散模型的推理时对齐:基于信任区域迭代扭曲序贯蒙特卡洛方法

Weixin Wang, Yu Yang, Wei Deng, Pan Xu

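AI summary: Proposes Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a framework that iteratively learns twisting functions for inference-time alignment of diffusion models; under matched inference-time budgets it improves the primary alignment objectives on discrete diffusion text generation and text-to-image generation.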

详情
Comments
34 pages, 6 figures, and 7 tables
AI中文摘要

我们研究基于扩散的生成模型的推理时对齐,旨在引导基础模型产生高奖励输出而不更新其权重。最近的基于序贯蒙特卡洛(SMC)的引导方法以原则性的方式近似奖励倾斜的目标分布,但其提议仍主要依赖于基础采样器。由于奖励信息主要通过粒子重加权和重采样在传播后使用,这些方法可能需要大量粒子预算,并遭受权重退化和高方差估计的问题。降低方差和提高粒子效率的一种方法是迭代学习提供前瞻指导的扭曲函数,如扭曲SMC。然而,现有的可学习扭曲方法主要针对经典序贯推理开发,当应用于具有高维状态空间和终端、噪声或黑盒奖励的扩散对齐时可能不稳定。我们提出信任区域迭代扭曲序贯蒙特卡洛(TRI-TSMC),一种用于在基于SMC的推理时对齐中学习扭曲函数的信任区域框架。每次迭代在路径空间中计算精确的KL约束更新,通过温度重要性重加权得到闭式解,并通过加权最大似然将该目标投影回参数化扭曲族。理论上,我们形式化了最优扭曲函数的值函数解释,并表明它产生零方差采样器。我们证明信任区域更新沿着护航路径朝向目标分布,加权最大似然更新是前向KL投影,并且该路径降低了残差重要性权重方差。实验上,在匹配的推理时预算下,TRI-TSMC在离散扩散文本生成和文本到图像生成上改进了主要对齐目标。

英文摘要

We study inference-time alignment for diffusion-based generative models, aiming to steer a base model toward high-reward outputs without updating its weights. Recent Sequential Monte Carlo (SMC)-based steering methods approximate reward-tilted target distributions in a principled way, but their proposals remain largely tied to the base sampler. Since reward information is mainly used after propagation through particle reweighting and resampling, these methods can require large particle budgets and suffer from weight degeneracy and high-variance estimates. One way to reduce variance and improve particle efficiency is to iteratively learn twisting functions that provide look-ahead guidance, as in twisted SMC. However, existing learnable twisting methods are developed mainly for classical sequential inference and can be unstable when applied to diffusion-based alignment with high-dimensional state spaces and terminal, noisy, or black-box rewards. We propose Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a trust-region framework for learning twisting functions in SMC-based inference-time alignment. Each iteration computes an exact KL-constrained update in path space, which admits a closed-form solution by tempered importance reweighting, and projects this target back to the parameterized twisted family by weighted maximum likelihood. Theoretically, we formalize the value-function interpretation of the optimal twisting function and show that it yields a zero-variance sampler. We prove that the trust-region update follows an escort path toward the target distribution, that the weighted maximum-likelihood update is a forward-KL projection, and that the path reduces residual importance-weight variance. Empirically, TRI-TSMC improves primary alignment objectives on discrete diffusion text generation and text-to-image generation under matched inference-time budgets.

2605.25114 2026-05-26 stat.ML cs.LG

Counterfactually Safe Reinforcement Learning

反事实安全的强化学习

Jingyi Li, Peng Wu, Chengchun Shi

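AI summary: To address the harm that reinforcement learning policies may cause to individuals, defines individual harm from a counterfactual perspective and designs a two-stage learning procedure that maximizes expected return while controlling the harm rate; establishes finite-sample properties and an upper bound on the sub-optimality gap, and validates the approach experimentally.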

详情
AI中文摘要

强化学习算法通常被设计为最大化群体上的期望回报。然而,平均最优的策略可能对某些个体是次优的,导致潜在的安全问题。为了解决这个问题,我们首先从反事实角度形式化了个体伤害的概念,并将伤害定义为所选动作导致结果严格差于基线替代方案的事件。然后,我们提出了一种通用的两阶段过程来学习策略,该策略在考虑个体伤害的同时最大化期望回报。我们进一步建立了所学策略的有限样本性质,推导了其次优性差距的上界,并表明伤害率得到了良好控制。在模拟和真实数据集上的数值实验证明了所提出方法的有效性。

英文摘要

Reinforcement learning algorithms are generally designed to maximize the expected return across a population. However, a policy that is optimal on average may be suboptimal for certain individuals, leading to potential safety concerns. To address this, we first formalize the notion of individual harm from a counterfactual perspective and define harm as the event in which a chosen action results in a strictly worse outcome than a baseline alternative. We then propose a general two-stage procedure for learning policies that maximize the expected return while accounting for individual harm. We further establish the finite-sample properties of the learned policy, derive an upper bound on its sub-optimality gap, and show that the harm rate remains well-controlled. Numerical experiments on both simulated and real-world datasets demonstrate the effectiveness of the proposed approach.

2605.25050 2026-05-26 stat.AP cs.LG q-bio.QM stat.ML

Multimodality Stacking with Blockwise missing values and application to the PIONeeR biomarkers study for prediction of resistance to immunotherapy

具有分块缺失值的多模态堆叠及其在预测免疫治疗耐药性的PIONeeR生物标志物研究中的应用

Mohamed Boussena, Florence Monville, Jacques Fieschi-Meric, Frederic Vely, Pierre Milpied, Julien Mazieres, Maurice Perol, Eric Vivier, Laurent Greillier, Fabrice Barlesi, Sebastien Benzekry

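AI summary: Proposes MSB, a multimodality stacking framework that models modality-specific features independently and aggregates their predictions with a cross-validated stacking meta-learner to handle high dimensionality and blockwise missingness; in the PIONeeR study it predicts progression-free survival under immunotherapy for non-small cell lung cancer and outperforms baseline algorithms.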

详情
AI中文摘要

在临床肿瘤学中,整合多模态数据集常受到高维性和分块缺失的阻碍,即特定患者子集无法获得完整数据源。标准生存模型通常难以处理这些缺失,导致结果偏倚或患者排除。我们提出具有分块缺失值的多模态堆叠(MSB),一种用于生存分析的晚期融合框架,它独立建模模态特定特征,然后通过交叉验证的堆叠元学习器聚合预测。MSB在PIONeeR研究(n=443名患者,来自八个异质来源的378个生物标志物)中进行了验证,以预测接受免疫治疗的晚期非小细胞肺癌患者的无进展生存期。MSB产生了比基线算法更高的预测性能(C-index)。改进幅度因基线强度而异:线性模型提高了15.9%(Wilcoxon符号秩检验p<0.001),随机生存森林提高了5.4%(p=0.002),梯度提升方法提高了2.1%(p=0.030)。除了区分能力外,MSB还缩小了泛化差距(5折交叉验证重复3次的训练-测试差异:0.055 vs 线性模型的0.380)。置换重要性分析确定了常规实验室标志物、临床特征和PD-L1表达为主要预测驱动因素。缺失块指示器的重要性可忽略,表明模型从生物标志物值而非数据可用性模式中学习。MSB为具有分块缺失的多模态生存预测提供了一个统计验证的框架。通过无需完整数据即可进行系统性生物标志物评估,MSB为生物医学研究中的预测建模提供了实用工具,有待外部验证。实现代码可在https://github.com/MohamedBoussena/MSB 根据Inria许可证获取。

英文摘要

Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle with these gaps, leading to biased results or patient exclusion. We introduce Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that independently models modality-specific features before aggregating predictions via a cross-validated stacking meta-learner. MSB was validated on the PIONeeR study (n=443 patients, 378 biomarkers across eight heterogeneous sources) to predict progression-free survival in advanced non-small cell lung cancer patients receiving immunotherapy. MSB yielded higher predictive performance (C-index) than baseline algorithms. Improvements varied by baseline strength: linear models showed a 15.9% increase (p<0.001 for the Wilcoxon signed-rank test), random survival forests gained 5.4% (p=0.002), and gradient boosting methods improved by 2.1% (p=0.030). Beyond discrimination, MSB reduced the generalization gap (train-test difference in 5-fold cross-validation repeated 3 times: 0.055 vs 0.380 for linear models). Permutation importance analysis identified routine laboratory markers, clinical features, and PD-L1 expression as primary predictive drivers. Missing block indicators showed negligible importance, suggesting the model learned from biomarker values rather than data availability patterns. MSB provides a statistically validated framework for multimodal survival prediction with blockwise missingness. By enabling systematic biomarker evaluation without requiring complete data, MSB offers a practical tool for predictive modeling in biomedical research, pending external validation. Implementation is available at https://github.com/MohamedBoussena/MSB under an Inria license.

2605.25043 2026-05-26 stat.AP stat.ML

Shared Keyboard: An improved Bayesian design for phase I clinical trials via Beta kernel process

共享键盘设计:基于Beta核过程的I期临床试验改进贝叶斯设计

Jiangyan Zhao, Xian Shi, Jin Xu

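AI summary: Proposes the shared Keyboard design, which borrows information across doses through a Beta kernel process to improve the accuracy and safety of identifying the maximum tolerated dose.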

详情
Comments
42 pages, 14 figures, 9 tables
AI中文摘要

模型辅助区间设计(如键盘设计)在I期肿瘤学试验中透明且易于实施。然而,仅基于当前剂量数据的临时决策可能忽略相邻剂量的信息信号,导致不必要的升阶或降阶。我们提出共享键盘设计,这是一种贝叶斯模型辅助设计,用Beta核过程通过核加权伪计数诱导的后验替代每个剂量处的独立Beta-二项式更新方案。该设计保留了键盘设计的决策结构,同时允许在相邻剂量间进行受控借用。为优先控制过量给药,我们提出一种非对称核,在升阶期间对较高剂量观察到的毒性赋予更大权重。我们进一步扩展该设计,以适应初始剂量网格不足时的自适应剂量插入和存在迟发性毒性时的至事件时间结果。大量模拟研究表明,在识别最大耐受剂量方面,准确性和安全性均有显著提升。在涉及剂量插入的场景中,所提设计比自适应剂量修改更有效地识别插入的目标剂量,同时保持相当的修改率。

英文摘要

Model-assisted interval designs such as the Keyboard design are transparent and easy to implement in phase I oncology trials. However, interim decisions based solely on data from the current dose may overlook informative signals from neighbouring doses, leading to unnecessary escalation or de-escalation. We propose the shared Keyboard design, a Bayesian model-assisted design that replaces the independent beta--binomial updating scheme at each dose with a posterior induced by a Beta kernel process using kernel-weighted pseudo-counts. The design preserves the decision structure of the Keyboard design while enabling controlled borrowing across nearby doses. To prioritise overdose control, we propose an asymmetric kernel that assigns greater weight to toxicities observed at higher doses during escalation. We further extend the proposed design to accommodate adaptive dose insertion when the initial dose grid is inadequate and time-to-event outcomes when late-onset toxicities are present. Extensive simulation studies demonstrate substantial improvements in both accuracy and safety for identifying the maximum tolerated dose. In settings involving dose insertion, the proposed design identifies inserted target doses more effectively than adaptive dose modification while maintaining a comparable modification rate.
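
The kernel-weighted pseudo-count idea can be sketched as follows. This is only an illustration under stated assumptions: the symmetric Gaussian kernel on the dose index, the Beta(1, 1) prior, and the names `kernel_weight` and `shared_posterior` are hypothetical choices, not the kernel or notation of the paper (which also proposes an asymmetric kernel).

```python
import math

def kernel_weight(j, k, bandwidth):
    """Gaussian kernel on dose-index distance (hypothetical choice; weight 1 at the current dose)."""
    return math.exp(-0.5 * ((j - k) / bandwidth) ** 2)

def shared_posterior(j, tox, n, bandwidth, a0=1.0, b0=1.0):
    """Beta posterior parameters at dose j built from kernel-weighted pseudo-counts."""
    a, b = a0, b0
    for k in range(len(n)):
        w = kernel_weight(j, k, bandwidth)
        a += w * tox[k]              # weighted toxicity count borrowed from dose k
        b += w * (n[k] - tox[k])     # weighted non-toxicity count borrowed from dose k
    return a, b

tox = [0, 1, 2]   # toxicities observed at each dose (hypothetical data)
n = [3, 6, 3]     # patients treated at each dose (hypothetical data)

a_shared, b_shared = shared_posterior(1, tox, n, bandwidth=1.0)
# A vanishing bandwidth switches borrowing off and recovers the independent beta-binomial update.
a_indep, b_indep = shared_posterior(1, tox, n, bandwidth=1e-6)
```

With borrowing switched off the posterior at the middle dose is Beta(2, 6); with borrowing, neighbouring doses contribute fractional counts, and the interval-based decision rule of the Keyboard design can then be applied to the resulting Beta posterior unchanged.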

2605.24995 2026-05-26 stat.ME

Information-Theoretic Reliability is Robust to Analytic Choice: A 24-Specification Multiverse on Public Cognitive Test-Retest Data

信息论可靠性对分析选择具有鲁棒性:基于公共认知测试-重测数据的24种规范多重宇宙分析

Maria Westrin

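AI summary: Introduces the normalized information-theoretic reliability measure NLRΔ and, combining it with ICC(2,1) and a multiverse analysis, finds across 50 primary measures from five cognitive task families that information-theoretic reliability does not resolve the reliability paradox, a result that is robust to analytic choices.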

详情
Comments
12 pages, 2 figures, 3 tables; software and reproducibility materials archived at Zenodo DOI 10.5281/zenodo.20207371
AI中文摘要

背景。可靠性悖论描述了认知任务在产生稳健的群体水平效应时,往往在个体间可靠性上表现不佳的实证观察。现有方法主要依赖于组内相关系数(ICC),该系数仅捕捉测试与重测之间的线性、二阶矩依赖性。方法。我们引入ICC的归一化信息论补充指标NLRΔ,定义为经验估计互信息与测试-重测相关性隐含的分析高斯基线之间的差异。我们将NLRΔ与ICC(2,1)、偏差校正加速(BCa)自助法区间、Benjamini-Hochberg错误发现率(FDR)控制以及一个包含KSG最近邻参数、相关方法和最小样本阈值的24种多重宇宙分析相结合。整个流程由预先指定的声明合约、内容寻址溯源和SHA-256验证的原始数据摄取控制,并作为MixMind可靠性框架发布。结果。在来自Flanker、Stroop、Stop-Signal、Go/No-Go和Posner任务家族的50个可估计主要测量中,NLRΔ中位数为-0.138 nats,四分位距为[-0.257, -0.034]。50个主要测量中没有一个超过标题规则。伴随的ICC(2,1)分析重现了经典可靠性悖论模式,而24种规范多重宇宙分析中,1,200个可估计单元格中没有一个通过标题规则。结论。在这两个公共数据集上,用信息论可靠性指标替代或补充ICC并不能使认知任务摆脱可靠性悖论。稳健的零结果不受本文考察的分析选择影响。我们发布完整流程、原始数据哈希值和合约,以便精确复现并扩展到其他数据集和任务。

英文摘要

Background. The reliability paradox describes the empirical observation that cognitive tasks producing robust group-level effects often yield poor between-individual reliability. Existing approaches rely predominantly on the intraclass correlation coefficient (ICC), which captures only linear, second-moment dependence between test and retest. Methods. We introduce a normalized, information-theoretic complement to ICC, NLRΔ, defined as the difference between empirically estimated mutual information and the analytic Gaussian baseline implied by the test-retest correlation. We pair NLRΔ with ICC(2,1), bias-corrected and accelerated (BCa) bootstrap intervals, Benjamini-Hochberg false discovery rate (FDR) control, and a 24-cell multiverse over the KSG nearest-neighbour parameter, correlation method, and minimum-sample threshold. The full pipeline is governed by pre-specified claim contracts, content-addressed provenance, and SHA-256-verified raw data ingestion, and is released as the MixMind Reliability Framework. Results. Across 50 estimable primary measures from the Flanker, Stroop, Stop-Signal, Go/No-Go, and Posner task families, the median NLRΔ is -0.138 nats, with interquartile range [-0.257, -0.034]. Zero of 50 primary measures exceed the headline rule. The companion ICC(2,1) analysis recovers the classical reliability paradox pattern, and the 24-specification multiverse yields 0 of 1,200 estimable cells passing the headline rule. Conclusions. On these two public datasets, replacing or augmenting ICC with an information-theoretic reliability measure does not rescue cognitive tasks from the reliability paradox. The robust null is invariant to the analytic choices examined here. We release the full pipeline, raw-data hashes, and contracts to enable exact replication and extension to other datasets and tasks.
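
For orientation, the Gaussian baseline mentioned in the Methods can be written out explicitly. The first line is the standard mutual information of a bivariate Gaussian; the second line is a sketch of the gap described in the abstract, with the symbols $\hat{I}_{\mathrm{KSG}}$ and $\hat{\rho}$ introduced here for illustration. The abstract does not spell out the normalization, so it is omitted.

```latex
% Mutual information (in nats) of a bivariate Gaussian with correlation \rho
\[
  I_{\mathrm{Gauss}}(\rho) = -\tfrac{1}{2}\,\ln\!\left(1-\rho^{2}\right)
\]
% Gap between the KSG estimate and the baseline implied by the test-retest correlation
\[
  \Delta = \hat{I}_{\mathrm{KSG}}\!\left(X_{\mathrm{test}};\,X_{\mathrm{retest}}\right)
           - I_{\mathrm{Gauss}}(\hat{\rho})
\]
```

Under this reading, a negative value means the estimated mutual information falls below what the linear correlation alone would imply, which matches the sign of the reported median.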

2605.24955 2026-05-26 math.NA cs.NA stat.ML

Debiasing Random Oblique Projections for Subsampled OLS and Fast CUR in High Dimensions

高维子采样OLS和快速CUR中随机斜投影的去偏

Chengmei Niu, Sachin Garg, Michał Dereziński, Zhenyu Liao

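AI summary: Addresses the systematic bias that nonlinear inversion induces in random oblique projections used in subsampled least squares and fast low-rank approximation, proposing a unified non-asymptotic theory and a debiasing method, applied to subsampled least squares and fast CUR decomposition to improve accuracy.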

详情
Comments
52 pages, 4 figures, and 4 tables
AI中文摘要

随机抽样是现代机器学习和数值线性代数中降低大规模矩阵问题计算成本的基本工具。然而,现有分析主要依赖于子空间嵌入保证,未能精确刻画由抽样引起的非线性随机斜投影的统计偏差,这种偏差普遍存在于子采样最小二乘和快速低秩近似方法中。由于(伪)逆是非线性的,即使底层草图是无偏的,这些随机斜投影也可能存在系统性偏差,从而将隐藏偏差引入下游的最小二乘和低秩近似解。在这项工作中,我们发展了一个高维随机斜投影的统一非渐近理论。我们表明,标准的随机抽样方案通常会导致经典子空间嵌入风格分析所忽视的系统性统计偏差,并提出了一个原则性的去偏框架来纠正它。我们通过两个典型应用说明了该理论的力量。对于子采样最小二乘,我们获得了尖锐的偏差-方差特征,揭示了广泛使用的抽样方案中先前未被认识的统计次优性,并确定了去偏何时能带来可证明的改进。对于快速CUR分解,我们开发了一种具有改进近似精度的去偏方法。数值实验进一步验证了我们的理论发现。

英文摘要

Random sampling is a fundamental tool in modern machine learning and numerical linear algebra for reducing the computational cost of large-scale matrix problems. Existing analyses, however, rely primarily on subspace embedding guarantees, which do not precisely characterize the statistical bias of the nonlinear random oblique projections induced by sampling. Such projections arise ubiquitously in subsampled least squares and fast low-rank approximation methods. Because (pseudo)inversion is nonlinear, these random oblique projections can be systematically biased even when the underlying sketch is unbiased, thereby introducing hidden bias into downstream least squares and low-rank approximation solutions. In this work, we develop a unified non-asymptotic theory for random oblique projections in high dimensions. We show that standard random sampling schemes generally induce a systematic statistical bias overlooked by classical subspace embedding-style analyses, and we propose a principled debiasing framework to correct it. We illustrate the power of the theory through two canonical applications. For subsampled least squares, we obtain sharp bias--variance characterizations, reveal previously unrecognized statistical suboptimality in widely used sampling schemes, and identify when debiasing yields provable improvements. For fast CUR decomposition, we develop a debiased approach with improved approximation accuracy. Numerical experiments further validate our theoretical findings.
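
The claim that inversion makes subsampled estimators biased even when the sketch itself is unbiased can be checked exactly on a toy problem. The sketch below is a hypothetical one-dimensional example (regression through the origin, uniform sampling of 2 out of 3 rows), not the high-dimensional setting or the debiasing method of the paper; it enumerates every subsample with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

# Toy data (hypothetical): three observations, one covariate, no intercept.
x = [Fraction(1), Fraction(2), Fraction(3)]
y = [Fraction(1), Fraction(4), Fraction(3)]

def ols_slope(idx):
    """Least-squares slope through the origin using only the rows in idx."""
    num = sum(x[i] * y[i] for i in idx)
    den = sum(x[i] ** 2 for i in idx)
    return num / den

full = ols_slope(range(len(x)))

# Every row appears in 2 of the 3 subsets, so the subsampled numerator and
# denominator are each unbiased (up to the common factor 2/3) for their full versions.
subsets = list(combinations(range(len(x)), 2))
mean_subsampled = sum(ols_slope(s) for s in subsets) / len(subsets)

# The ratio of the two is nonetheless biased, because division is nonlinear.
bias = mean_subsampled - full
```

In this example the full-data slope is 9/7 while the average subsampled slope is 89/65, a bias of 38/455 that no amount of repetition averages away.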

2605.24937 2026-05-26 math.PR cs.NA math.NA math.OC stat.CO stat.ML

Error estimates for tamed Euler and Randomized Euler schemes for SDEs with locally Lipschitz drift with applications to non-logconcave sampling and optimization

具有局部Lipschitz漂移的SDE的驯服欧拉和随机欧拉格式的误差估计及其在非对数凹采样和优化中的应用

Iosif Lytras, Angelos Ntousis

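AI summary: For stochastic differential equations with locally Lipschitz, super-linearly growing drift, analyzes the KL-accelerated tamed unadjusted Langevin algorithm (kTULA) and introduces a tamed randomized midpoint scheme (tRLMC); building on the shifted-composition approach, it develops local-error frameworks that give finite-time non-asymptotic error estimates and a near-optimal iteration complexity under a logarithmic Sobolev inequality.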

详情
Comments
38 pages
AI中文摘要

本文研究了具有局部Lipschitz、超线性增长漂移的随机微分方程的数值离散化,以及由此产生的对满足对数Sobolev不等式的非对数凹分布采样的影响。在此情形下,未调整Langevin算法(ULA)基础的经典Euler-Maruyama格式已知是不稳定的。我们分析了KL加速驯服未调整Langevin算法(kTULA)并引入了一种新的驯服随机中点方案,称为tRLMC。基于\cite{chewi2024local}的移位-组合方法,我们开发了两个新的局部误差框架,针对一般局部Lipschitz漂移,得到了kTULA的KL散度和tRLMC的全变差距离下相对于底层SDE的有限时间非渐近误差估计。将这些框架专门用于对数Sobolev不等式下的采样问题,我们获得了kTULA在KL散度下的近最优$\widetilde{O}(\varepsilon^{-1/2})$迭代复杂度,以及相应的全变差距离和Wasserstein距离保证。我们进一步首次建立了在超线性漂移增长下驯服随机Langevin方案的全变差距离非渐近保证,以及相应的Wasserstein距离界,tRLMC的复杂度均为$\widetilde{O}(\varepsilon^{-1})$。因此,两种方案都给出了非凸超额风险优化问题的非渐近界。

英文摘要

In this paper, we study the numerical discretization of stochastic differential equations with locally Lipschitz, super-linearly growing drift, and the resulting implications for sampling from non-log-concave distributions satisfying a logarithmic Sobolev inequality. In this regime, the classical Euler--Maruyama scheme underlying the unadjusted Langevin algorithm (ULA) is known to be unstable. We analyze the KL-accelerated tamed unadjusted Langevin algorithm (kTULA) and introduce a new tamed randomized midpoint scheme, termed tRLMC. Building on the shifted-composition approach of \cite{chewi2024local}, we develop two new local-error frameworks that yield finite-time, non-asymptotic error estimates against the underlying SDE -- in KL divergence for kTULA, and in total variation for tRLMC -- valid for general locally Lipschitz drift. Specializing these frameworks to the sampling problem under a logarithmic Sobolev inequality, we obtain a near-optimal $\widetilde{O}(\varepsilon^{-1/2})$ iteration complexity for kTULA in KL divergence, with corresponding guarantees in total variation and Wasserstein distance. We further establish, for the first time, a non-asymptotic guarantee in total variation for a tamed randomized Langevin scheme under super-linear drift growth, together with the corresponding Wasserstein-distance bound, both with $\widetilde{O}(\varepsilon^{-1})$ complexity for tRLMC. As a consequence, both schemes yield non-asymptotic bounds for a non-convex excess-risk optimization problem.
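
To see why taming matters for super-linear drift, the following minimal sketch compares a plain Euler--Maruyama step with a classically tamed step for the Langevin dynamics of the potential $U(x) = x^4/4$. The taming factor $1/(1 + h|b(x)|)$ is the textbook choice and is an assumption here; it is not claimed to be the taming function used by kTULA or tRLMC, and the function names are hypothetical.

```python
import math
import random

def drift(x):
    """Drift -U'(x) for the potential U(x) = x**4 / 4 (super-linear growth)."""
    return -x ** 3

def euler_step(x, h, noise):
    """Plain Euler-Maruyama step of the Langevin SDE dX = b(X) dt + sqrt(2) dW."""
    return x + h * drift(x) + math.sqrt(2 * h) * noise

def tamed_euler_step(x, h, noise):
    """Tamed step: the drift increment h*b/(1 + h*|b|) is bounded by 1 in magnitude."""
    b = drift(x)
    return x + h * b / (1 + h * abs(b)) + math.sqrt(2 * h) * noise

def run(step, x0, h, n_steps, seed=0):
    """Iterate a one-step scheme with standard Gaussian noise from a seeded generator."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = step(x, h, rng.gauss(0.0, 1.0))
    return x

# From a far-out starting point, one untamed step overshoots by orders of magnitude,
# while the tamed step moves by less than 1 (noise switched off for clarity).
untamed_next = euler_step(100.0, 0.1, 0.0)
tamed_next = tamed_euler_step(100.0, 0.1, 0.0)
tamed_final = run(tamed_euler_step, 100.0, 0.1, 2000)
```

Iterating the untamed step from the same starting point diverges, whereas the tamed chain walks back toward the origin and stays there.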

2605.24929 2026-05-26 stat.ML cs.IT cs.LG math.IT

Estimating Mixture Distributions via Stochastic Mirror Descent

通过随机镜像下降估计混合分布

Mohammadreza Ahmadypour, Tara Javidi, Farinaz Koushanfar

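AI summary: For estimating an unknown distribution from samples, proposes a family of mixture-model estimators based on stochastic mirror descent (SMD) whose flexibility comes from the choice of Bregman divergence; the estimators remain efficient with a large number of candidate components and achieve near-optimal convergence rates in KL divergence and ℓ2 norm.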

详情
AI中文摘要

我们重新审视了从样本中估计未知分布的经典问题,通过拟合最小化交叉熵损失的混合模型。将该任务视为在$M$分量混合分布空间上的随机凸优化问题,我们提出了一族源自随机镜像下降(SMD)算法的估计器。这种基于优化的方法提供了一个原则性且灵活的框架,它推广了传统估计器,并通过选择Bregman散度提出了多种新颖的估计器。我们方法的一个关键优势是它能够随着候选分量$f_i$的数量高效扩展;也就是说,可以在混合模型中使用大量基分布,而不会产生显著的计算开销。这使得能够实现更丰富的近似和改进的估计精度。此外,在类别分布(离散结果)的情况下,我们的估计器不需要严格的下界,换句话说,我们的框架不需要精确知道分布的支持集。我们证明,在温和条件下,所提出的$φ$-SMD估计器在Kullback-Leibler(KL)散度和$\ell_2$范数下均能达到近最优的收敛速率,并在计算昂贵时提供实际优势。我们的数值分析突出了相对于经典估计器在样本效率和可扩展性方面的改进性能保证。

英文摘要

We revisit the classical problem of estimating an unknown distribution from its samples by fitting a mixture model that minimizes cross-entropy loss. Framing the task as a stochastic convex optimization problem over the space of $M$-component mixture distributions, we propose a family of estimators derived from the stochastic mirror descent (SMD) algorithm. This optimization-based approach provides a principled and flexible framework that generalizes traditional estimators and yields a variety of novel estimators through the choice of Bregman divergence. A key advantage of our method is that it scales efficiently with the number of candidate components $f_i$; that is, one can employ a large set of basis distributions in the mixture model without incurring significant computational overhead. This enables richer approximations and improved estimation accuracy. Moreover, for categorical distributions (discrete outcomes), our estimators do not require a strict lower bound; in other words, our framework does not require precise knowledge of the support of the distribution. We demonstrate that, under mild conditions, the proposed $φ$-SMD estimators achieve near-optimal convergence rates in both Kullback-Leibler (KL) divergence and $\ell_2$ norm and offer practical benefits when computation is expensive. Our numerical analysis highlights improved performance guarantees over classical estimators, particularly in terms of sample efficiency and scalability.
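
One concrete member of such a family is obtained by taking the negative-entropy mirror map, which turns SMD on the simplex into an exponentiated-gradient update of the mixture weights. The sketch below assumes fixed Gaussian candidate components and a constant step size; these choices and the name `smd_mixture_weights` are illustrative assumptions, not the paper's estimator.

```python
import math

def gaussian_pdf(x, mean):
    """Unit-variance Gaussian density, used here as a fixed candidate component."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)

def smd_mixture_weights(samples, means, step_size=0.1):
    """Entropic mirror descent (exponentiated gradient) on the mixture weights."""
    m = len(means)
    w = [1.0 / m] * m
    for x in samples:
        f = [gaussian_pdf(x, mu) for mu in means]
        mix = sum(wi * fi for wi, fi in zip(w, f))
        # The stochastic gradient of -log(sum_i w_i f_i(x)) w.r.t. w_i is -f_i(x) / mix,
        # so the multiplicative update uses exp(+step_size * f_i(x) / mix).
        w = [wi * math.exp(step_size * fi / mix) for wi, fi in zip(w, f)]
        total = sum(w)
        w = [wi / total for wi in w]   # Bregman projection onto the simplex = renormalization
    return w

# Hypothetical samples concentrated near 0; candidate components centred at 0 and 5.
samples = [0.1, -0.3, 0.2, 0.5, -0.1, 0.0, 0.4, -0.2] * 5
weights = smd_mixture_weights(samples, means=[0.0, 5.0])
```

Each update costs one density evaluation per component, which is the sense in which this kind of scheme scales with the number of candidates; other Bregman divergences would change only the update and projection lines.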

2605.24920 2026-05-26 cs.LG cs.AI stat.ML

Quaternion Self-Attention with Shared Scores

共享分数的四元数自注意力

Shogo Yamauchi, Tohru Nitta, Hideaki Tamori

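AI summary: Proposes a shared-score quaternion self-attention mechanism that computes a single real-valued score via the quaternion inner product and shares one attention distribution across components, substantially reducing computational cost while maintaining performance.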

详情
Comments
26 pages, 6 figures, and 15 tables. Accepted at ICML 2026
AI中文摘要

四元数神经网络通过将四个相关特征表示为一个单一实体,实现了参数高效并建模多维依赖关系。然而,现有的四元数自注意力计算每个分量的分数并对每个分量应用独立的softmax操作,这增加了计算成本并允许注意力分布在分量间发散。我们提出了一种共享分数的四元数自注意力机制,该机制使用四元数内积计算单一实值分数,并在所有分量上应用共享的注意力分布。这将分数计算乘法减少了75%,并将softmax操作次数从四次减少到一次。我们证明,当查询和键由诱导分量预混合的四元数线性投影产生时,分量级分数和共享分数位于相同的交互子空间中,表明独立的分量级注意力主要重新参数化相同的交互,而不是扩展特征交互空间。在语音增强中,我们的方法在GPU上将推理时间减少了高达44.3%,在CPU上减少了58.1%,同时保持了质量,并且在视觉和自然语言处理中呈现一致的趋势。

英文摘要

Quaternion neural networks are parameter-efficient and model multidimensional dependencies by representing four related features as a single entity. However, existing quaternion self-attention computes component-wise scores and applies independent softmax operations to each component, which increases the computational cost and allows attention distributions to diverge across components. We propose a shared-score quaternion self-attention mechanism that computes a single real-valued score using the quaternion inner product and applies a shared attention distribution across all components. This reduces score-computation multiplications by 75% and the number of softmax operations from four to one. We prove that, when queries and keys are produced by quaternion linear projections that induce component pre-mixing, the component-wise and shared scores lie in the same interaction subspace, indicating that independent component-wise attention primarily re-parameterizes the same interactions rather than expanding the feature interaction space. In speech enhancement, our method reduces inference time by up to 44.3% on a GPU and 58.1% on a CPU while maintaining quality, with consistent trends across vision and natural language processing.
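
The shared-score idea can be sketched in a few lines of plain Python. The data layout (each token stored as four real component vectors), the $1/\sqrt{4d}$ scaling, and the function names are assumptions made for illustration; learned projections, multiple heads, and batching are omitted.

```python
import math

def shared_score(q, k):
    """Real-valued quaternion inner product: sum of the four component-wise dot products."""
    return sum(qc * kc for q_comp, k_comp in zip(q, k) for qc, kc in zip(q_comp, k_comp))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def shared_score_attention(queries, keys, values):
    """Each token is a list of 4 component vectors (r, i, j, k), each of length d."""
    d = len(queries[0][0])
    d_v = len(values[0][0])
    scale = 1.0 / math.sqrt(4 * d)  # assumed scaling, by analogy with dot-product attention
    outputs = []
    for q in queries:
        # One score per key and one softmax per query, instead of one per component.
        attn = softmax([scale * shared_score(q, k) for k in keys])
        # The same attention weights are applied to all four value components.
        out = [[sum(a * v[c][t] for a, v in zip(attn, values)) for t in range(d_v)]
               for c in range(4)]
        outputs.append(out)
    return outputs
```

Per quaternion pair, the inner product needs four real multiplications, whereas a full Hamilton product producing four component scores needs sixteen, which is consistent with the 75% reduction quoted in the abstract.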

2605.24858 2026-05-26 stat.ME

Optimal Estimation of Discrete Multiview Distributions under Heteroskedastic Multinomial Sampling

异方差多项抽样下离散多视图分布的最优估计

Runshi Tang, Julien Chhor, Olga Klopp, Alexandre B. Tsybakov, Anru R. Zhang

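AI summary: For estimating the multiview density tensor from multinomial count data, proposes a general scaling framework, proves a Frobenius-norm upper bound for a spectral estimator that handles heteroskedasticity and negative dependence, and establishes minimax lower bounds showing that the dependence on fiber mass is unavoidable.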

详情
AI中文摘要

多视图潜变量模型为离散数据分析提供了基本框架,应用于潜在结构模型、主题模型和产品分布混合。在离散设置中,观测视图的联合分布可以表示为非负低秩张量,我们称之为多视图密度张量。我们研究从多项计数数据估计该张量的问题。一个关键挑战是多项抽样引入异方差和依赖噪声,因此估计的难度不仅取决于环境维度和秩,还取决于概率质量在样本空间不同位置上的分布。我们提出了一个用于多项抽样下密度张量估计的统一缩放框架。该框架导致一个谱估计器,我们证明了其Frobenius范数上界,直接处理异方差和负相关。对于原始多视图模型,我们得到了依赖于纤维质量的Frobenius上界和极小极大下界,表明这种依赖是不可避免的。在ℓ1损失下,我们基于相同的缩放原理开发了oracle和可行的数据驱动估计器,建立了极小极大下界,并展示了在固定秩下oracle规则和在有界切片-纤维不平衡下切片归一化的近最优性。模拟支持理论并证明了所提出方法的鲁棒性。

英文摘要

Multiview latent-variable models provide a fundamental framework for discrete data analysis, with applications to latent structure models, topic models, and mixtures of product distributions. In the discrete setting, the joint distribution of the observed views can be represented as a nonnegative low-rank tensor, which we call a multiview density tensor. We study the problem of estimating this tensor from multinomial count data. A key challenge is that multinomial sampling induces heteroskedastic and dependent noise, so the difficulty of estimation depends not only on the ambient dimensions and rank, but also on how the probability mass is distributed across different locations of sample space. We propose a general scaling framework for density tensor estimation under multinomial sampling. This framework leads to a spectral estimator for which we prove a Frobenius-norm upper bound that directly handles heteroskedasticity and negative dependence. For the original multiview model, we obtain fiber-mass-dependent Frobenius upper bounds and minimax lower bounds showing that this dependence is unavoidable. Under $\ell_1$ loss, we develop both oracle and feasible data-driven estimators based on the same scaling principle, establish minimax lower bounds, and show near-optimality for the oracle rule at fixed rank and for slice normalization under bounded slice-to-fiber imbalance. Simulations support the theory and demonstrate the robustness of the proposed methods.
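
As a reminder of the structure involved, a three-view model with a latent class taking $R$ values gives the low-rank form below, and multinomial sampling gives the stated moments. The notation ($\pi_r$, $a_r$, $b_r$, $c_r$, $N_x$) is introduced here for illustration, and three views are an assumption; the abstract does not fix the number of views.

```latex
% Views are conditionally independent given the latent class r
\[
  \mathcal{P}(x_1, x_2, x_3) \;=\; \sum_{r=1}^{R} \pi_r \, a_r(x_1)\, b_r(x_2)\, c_r(x_3)
\]
% Counts N_x from n multinomial draws: cell-dependent variance, negative covariance
\[
  \operatorname{Var}(N_x) = n\,\mathcal{P}(x)\bigl(1-\mathcal{P}(x)\bigr),
  \qquad
  \operatorname{Cov}(N_x, N_{x'}) = -\,n\,\mathcal{P}(x)\,\mathcal{P}(x') \quad (x \neq x')
\]
```

The variance depends on the mass of each cell and distinct cells are negatively correlated, which is the heteroskedastic, dependent noise the abstract refers to.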

2605.24854 2026-05-26 stat.ME

Deep Regression for Repeated Measurements under Covariate Shift

协变量偏移下重复测量的深度回归

Yingxuan Wang, Xiangyu Xing, Wangli Xu

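AI summary: For settings where target-domain responses are unobservable or costly, adopts a transfer learning framework that uses source-domain data, corrects the distribution shift via the density ratio, and estimates the target regression function with ReLU feedforward neural networks; the estimators are shown to achieve the minimax optimal convergence rate, and an approximation theory with polynomial dependence on dimension mitigates the curse of dimensionality.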

详情
Comments
59 pages, 2 figures, 2 tables, including appendix
AI中文摘要

本文研究了当目标域中的响应不可观测或收集成本高昂时,具有重复测量的非参数回归问题。我们采用迁移学习框架,利用在协变量偏移下具有可观测响应的源域。通过密度比校正分布偏移来估计目标回归函数。我们考虑了已知和未知密度比两种情况,这反映了非参数回归估计可用的不同数据。在这两种情况下,我们进一步处理了两种设置:一致有界密度比和具有有限矩条件的无界情况。在未知密度比场景下,使用修正线性单元(ReLU)前馈神经网络(FNN)同时估计密度比和目标回归函数;而在已知密度比场景下,仅使用ReLU FNN估计目标回归函数。理论上,我们为所提出的估计量建立了非渐近误差界,并证明它们在重复测量设置下达到了极小极大最优收敛速度。值得注意的是,我们发展了一种新的逼近理论,其中网络参数的常数以多项式方式(而非现有工作中的指数方式)依赖于维度,从而缓解了维度灾难。因此,我们推导出了更尖锐的随机误差非渐近界。通过数值模拟和实际数据应用,展示了所提方法的有限样本性能。

英文摘要

This paper studies nonparametric regression with repeated measurements when the response in the target domain is unobservable or costly to collect. We adopt a transfer learning framework that leverages a source domain with observable responses under covariate shift. The target regression function is estimated by correcting the distribution shift via the density ratio. We consider both known and unknown density ratio scenarios, which reflect different data available for nonparametric regression estimation. In both cases, we further address two settings: the uniformly bounded density ratio and the unbounded case with finite moment conditions. Under the unknown density ratio scenario, both the density ratio and the target regression function are estimated using rectified linear unit (ReLU) feedforward neural networks (FNNs), whereas under the known density ratio scenario, only the target regression function is estimated by ReLU FNNs. Theoretically, we establish non-asymptotic error bounds for the proposed estimators and prove that they achieve the minimax optimal convergence rate under the repeated measurements setting. Notably, we develop a novel approximation theory where the constants of the network parameters depend polynomially, rather than exponentially as in existing works, on the dimension, thereby mitigating the curse of dimensionality. Consequently, we derive sharper non-asymptotic bounds for the stochastic error. The finite sample performance of the proposed method is demonstrated through numerical simulations and a real data application.
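
The density-ratio correction rests on the standard importance-weighting identity under covariate shift. The sketch below uses notation introduced here ($p_S$, $p_T$, $\rho$, subjects $i$ with $m$ repeated measurements $j$, function class $\mathcal{F}$); the exact empirical criterion in the paper may differ.

```latex
% Covariate shift: p_T(y | x) = p_S(y | x), with density ratio \rho(x) = p_T(x) / p_S(x)
\[
  \mathbb{E}_{T}\!\left[(Y - f(X))^{2}\right]
  \;=\; \mathbb{E}_{S}\!\left[\rho(X)\,(Y - f(X))^{2}\right]
\]
% Weighted least squares over a network class, using source data only
\[
  \hat{f} \;\in\; \arg\min_{f \in \mathcal{F}}
  \frac{1}{n}\sum_{i=1}^{n}\frac{1}{m}\sum_{j=1}^{m}
  \hat{\rho}(X_{ij})\,\bigl(Y_{ij} - f(X_{ij})\bigr)^{2}
\]
```

When the ratio is known, $\hat{\rho}$ is replaced by $\rho$; when it is unbounded, large weights inflate the variance of the criterion, which is why the bounded and finite-moment cases are treated separately.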

2605.24850 2026-05-26 cs.CL cs.IT math.IT stat.AP

Repeated Sequences Reveal Gaps between Large Language Models and Natural Language

重复序列揭示大语言模型与自然语言之间的差距

Kumiko Tanaka-Ishii

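AI summary: Proposes a framework for evaluating the long-range statistical organization of text generated by large language models by analyzing the distribution of repeated subsequences and its relation to higher-order Rényi entropies, finding systematic differences in entropy-growth patterns between GPT-generated text and natural language.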

详情
Comments
ACL 2026
AI中文摘要

评估大语言模型(LLMs)是否捕捉到自然语言的结构(超越局部流畅性)仍然是一个开放的挑战。现有的评估方法主要基于任务性能或短上下文行为,对生成文本的长程统计组织提供的洞察有限。我们提出了一种基于重复子序列的补充评估框架。通过分析其跨尺度的分布并将其与高阶Rényi熵联系起来,我们探究文本在有限长度条件下如何重用先前建立的结构。对人类撰写的文本和长度匹配的GPT生成文本的实验表明,虽然幂律模型可以描述有限范围的块长度,但观察到的熵增长通常同样或更好地由对数-幂形式刻画。跨数据集,自然语言在可访问范围内表现出稳定的熵增长模式,尽管个体文本之间存在变异性,但平均行为一致。相比之下,GPT生成文本的估计指数随模型大小呈现系统性和统计显著的变化。这些结果表明,重复子序列熵提供了一种定量的结构诊断,揭示了长程组织中的系统性差异,从而在表面流畅性之外区分自然语言与最先进的LLM输出。

英文摘要

Evaluating whether large language models (LLMs) capture the structure of natural language beyond local fluency remains an open challenge. Existing evaluation methods, largely based on task performance or short-context behavior, provide limited insight into the long-range statistical organization of generated text. We propose a complementary evaluation framework based on repeated subsequences. By analyzing their distribution across scales and relating it to higher-order Rényi entropies, we probe how texts reuse previously established structure under finite-length conditions. Experiments on human-written texts and length-matched GPT-generated texts show that, while power-law models can describe restricted ranges of block length, the observed entropy growth is often equally or better characterized by logarithmic--power forms. Across datasets, natural language exhibits stable entropy-growth patterns over accessible ranges, with consistent average behavior despite variability across individual texts. In contrast, GPT-generated texts show systematic and statistically significant shifts in estimated exponents with model size. These results demonstrate that repeated-subsequence entropy provides a quantitative structural diagnostic that reveals systematic differences in long-range organization, distinguishing natural language from state-of-the-art LLM outputs beyond surface-level fluency.
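
The link between repeated subsequences and Rényi entropy is easiest to see at order 2, where the entropy is the negative log of the probability that two randomly chosen blocks coincide. The following is a naive plug-in sketch on characters; the function name is hypothetical, and the paper's estimator, tokenization, and finite-length corrections are not reproduced here.

```python
import math
from collections import Counter

def block_renyi2_entropy(text, block_len):
    """Plug-in order-2 Rényi entropy (nats) of overlapping blocks of length block_len."""
    blocks = [text[i:i + block_len] for i in range(len(text) - block_len + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    # Collision probability: chance that two blocks drawn with replacement are identical.
    collision_prob = sum((c / total) ** 2 for c in counts.values())
    return -math.log(collision_prob)

# "abab" has blocks "ab", "ba", "ab": collision probability (2/3)^2 + (1/3)^2 = 5/9.
h2 = block_renyi2_entropy("abab", 2)
```

Plotting such an estimate against `block_len` gives an entropy-growth curve of the kind that the abstract compares between human-written and generated text.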

2605.24849 2026-05-26 stat.AP

How Eviction Court Governs: A Statistical Analysis of Bargaining, Templates, and Debt in Philadelphia

驱逐法庭如何治理:费城谈判、模板和债务的统计分析

Marios Papamichalis, Regina Ruane

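AI summary: Analyzing 755,004 Philadelphia Municipal Court landlord-tenant records from 1969 to 2022, finds that case processing is organized by repeated courtroom relationships, judge and tenant-attorney regimes, reusable agreement templates, and repeated team-property units; judge and attorney identity significantly affect outcomes, while agreement text is highly standardized.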

详情
Comments
Preprint
AI中文摘要

我们利用1969年至2022年间提交的755,004份市政法庭房东-租户记录,分析了费城驱逐案件的下游法庭治理。提交后的案件处理由重复的法庭关系、法官和租户律师制度、可重复使用的协议模板以及重复的团队-财产单元组织。在双方均有律师代理的案件中,58.2%涉及一对原告方和租户方律师,他们在前一年曾相互对抗;更大的先前配对暴露预测更低的缺席判决、更高的协议判决和更高的送达令状率。与法官相关的案件显示出统计上不同的基线结果、延期、费用和裁决制度;租户律师身份解释了案件结果和协议条款中的显著方差。和解文本高度标准化:可重复使用的模板对严格性、弃权、驱逐触发、付款计划、截止日期和时间要素条款的解释力远强于原始律师身份。金钱负担集中在重复的原告律师-财产单元中。分配单元支持和平衡审计表明,与法官相关的证据反映了制度异质性而非纯粹的法官抽签,且法官-三人组交互作用在此案卷中不可估计。驱逐法庭表现为一个重复的制度场域,在案件进入法庭流程后组织谈判、文本、债务和执行。

英文摘要

We analyze downstream courtroom governance in Philadelphia eviction cases using 755,004 Municipal Court landlord--tenant records filed from 1969 through 2022. Post-filing case processing is organized by repeated courtroom relationships, judge and tenant-attorney regimes, reusable agreement templates, and repeated team-property units. Among both-represented, both-attorney-named cases, 58.2% involve a plaintiff-side and tenant-side attorney pair that had appeared against one another in the prior year, and greater prior pair exposure predicts lower default, higher judgment-by-agreement, and higher served-writ rates. Judge-linked cases display statistically distinct baseline outcome, continuance, fee, and award regimes; tenant-attorney identity explains meaningful variance in both case outcomes and agreement terms. Settlement text is highly standardized: reusable templates explain strictness, waiver, lockout-trigger, payment-plan, deadline, and time-is-essence language far more strongly than raw attorney identity. Monetary burden concentrates in repeated plaintiff-attorney-property units. Assignment-cell support and balance audits indicate that judge-linked evidence reflects institutional heterogeneity rather than a clean judge lottery, and judge--triad interactions are not estimable in this docket. Eviction court emerges as a repeated institutional field that organizes bargaining, text, debt, and enforcement after cases enter the courtroom pipeline.

2605.24848 2026-05-26 stat.ME math.ST stat.TH

Distributional Conformal Prediction for Markov Processes

马尔可夫过程的分布共形预测

Dehao Dai, Kejin Wu, Dimitris N. Politis

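AI summary: Proposes Markov distributional conformal prediction, which uses the probability integral transform to turn Markov data into an i.i.d. dataset, gives a non-asymptotic error bound under a β-mixing condition, and verifies the asymptotic validity of the conditional prediction interval.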

详情
Comments
54 pages, 5 figures
AI中文摘要

我们引入了马尔可夫分布共形预测(MDCP)方法,该方法将分布共形预测(先前为回归开发)扩展到严格平稳马尔可夫过程的设定。与依赖特定模型结构进行预测不同,分布共形预测区间的思想符合无模型(MF)预测原则。类似于马尔可夫过程的MF预测,我们的方法利用基于估计的转移分布函数的概率积分变换,将马尔可夫数据转化为独立同分布数据集。我们在β-混合条件及其他关于核估计量的标准假设下,展示了MDCP无条件覆盖率的非渐近误差界。条件预测区间的渐近有效性也得到了验证。此外,我们表明当马尔可夫过程满足$L^p$-$m$-可逼近性而非混合性质时,我们的条件预测区间仍然是渐近有效的。通过数值模拟和实际数据实验,我们实证说明了MDCP的有限样本性能,并将其与MF自助法预测方法进行了比较。

英文摘要

We introduce the Markov Distributional Conformal Prediction (MDCP) method, which extends distributional conformal prediction (previously developed for regression) to the setting of a strictly stationary Markov process. Instead of relying on a specific model structure for prediction, the idea of the distributional conformal prediction interval aligns with the Model-Free (MF) Prediction Principle. In analogy to MF prediction of Markov processes, our method exploits the probability integral transform based on estimated transition distribution functions to transform the Markov data to an i.i.d. dataset. We show a non-asymptotic error bound for MDCP's unconditional coverage rate under a $β$-mixing condition and other standard assumptions on the kernel estimators. The asymptotic validity of the conditional prediction interval is also verified. In addition, we show that our conditional prediction interval is still asymptotically valid when the Markov process is $L^p$-$m$-approximable instead of satisfying the mixing property. Numerical simulations and real data experiments are used to illustrate the finite-sample performance of MDCP empirically and to compare it with the MF bootstrap prediction method.
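
The probability integral transform step can be sketched with a simple kernel estimate of the transition distribution function. The Gaussian kernel, the unsmoothed indicator in the numerator, the bandwidth, and the function names are assumptions for illustration; the paper's kernel estimator and the conformal calibration built on top of it are not reproduced.

```python
import math

def conditional_cdf(y, x, series, bandwidth):
    """Nadaraya-Watson estimate of P(X_t <= y | X_{t-1} = x) from one observed path."""
    num = den = 0.0
    for prev, cur in zip(series[:-1], series[1:]):
        w = math.exp(-0.5 * ((prev - x) / bandwidth) ** 2)  # weight by closeness of the past state
        num += w * (1.0 if cur <= y else 0.0)
        den += w
    return num / den

def pit_transform(series, bandwidth=0.5):
    """u_t = F_hat(X_t | X_{t-1}); roughly i.i.d. Uniform(0, 1) when F_hat is accurate."""
    return [conditional_cdf(cur, prev, series, bandwidth)
            for prev, cur in zip(series[:-1], series[1:])]

# Hypothetical short path of a stationary process.
series = [0.0, 0.4, 0.1, -0.3, 0.2, 0.6, 0.3, -0.1, 0.0, 0.5]
u = pit_transform(series)
```

Quantiles of the transformed values can then be used for calibration as in the i.i.d. case and mapped back through the estimated conditional distribution to obtain a prediction interval for the next observation.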

2605.24838 2026-05-26 stat.ME

Adaptable High-Dimensional Change Point Detection via Ridge Regularization

通过岭正则化的自适应高维变点检测

Haoran Li, Haotian Xu

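AI summary: For detecting multiple change points in the mean vectors of a high-dimensional observation sequence, proposes a method based on ridge-regularized CUSUM statistics that stabilizes sample covariance normalization and adapts to the population covariance structure, selects the regularization parameter by maximizing asymptotic power, and compares favorably with existing methods in simulations and an empirical application.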

详情
Comments
49 pages
AI中文摘要

我们研究了独立高维观测序列均值向量中多个变点的检测问题。基于Li等人(Ann. Statist. 48 (2020) 1815-1847)的自适应岭正则化Hotelling T2检验,我们提出了一族岭正则化CUSUM统计量。所提出的检验专为维度与样本量相当的高维场景中的密集备择假设而设计。通过引入岭正则化,该方法实现了样本协方差归一化的稳定形式,并获得了对潜在总体协方差结构的自适应性。我们在温和条件下推导了所提统计量在原假设和一类局部备择假设下的极限分布。进一步,我们通过最大化渐近功效,建立了一个选择正则化参数的原则性框架。广泛的模拟研究表明,所提出的检验在各种设置下优于现有多种方法。通过应用于2007-2025年标普500成分股日对数收益率的组合数据,展示了所提检验程序的性能。

英文摘要

We study the problem of detecting multiple change points in the mean vectors of an independent sequence of high-dimensional observations. We propose a family of ridge-regularized CUSUM statistics built upon the adaptable ridge-regularized Hotelling's $T^2$ test of Li et al. (Ann. Statist. 48 (2020) 1815-1847). The proposed tests are designed for dense alternatives in the high-dimensional regime where the dimension is comparable to the sample size. By introducing ridge regularization, the procedure achieves a stable form of sample covariance normalization and attains adaptability with respect to the underlying population covariance structure. We derive the limiting distributions of the proposed statistics under mild conditions, both under the null hypothesis and under a class of local alternatives. We further develop a principled framework for selecting the regularization parameter by maximizing asymptotic power. Extensive simulation studies demonstrate that the proposed tests compare favorably with a wide range of existing methods across diverse settings. The performance of the proposed test procedure is illustrated through an application to a panel of daily log-returns from S&P 500 constituents spanning 2007-2025.
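
One natural form such a statistic could take is written below, purely as a sketch: the CUSUM contrast at a candidate split $t$, normalized by a ridge-regularized sample covariance. The symbols ($C_n(t)$, $S_n$, $\lambda$, $T_\lambda(t)$) are introduced here as assumptions; the paper's exact centering, scaling, and aggregation over $t$ may differ.

```latex
% CUSUM contrast between the means before and after a candidate split t
\[
  C_n(t) \;=\; \sqrt{\frac{t\,(n-t)}{n}}\,
  \Bigl(\bar{X}_{1:t} - \bar{X}_{(t+1):n}\Bigr)
\]
% Ridge-regularized quadratic form; S_n is a p x p sample covariance, \lambda > 0
\[
  T_{\lambda}(t) \;=\; C_n(t)^{\top}\,\bigl(S_n + \lambda I_p\bigr)^{-1}\,C_n(t)
\]
```

The ridge term keeps the inverse well defined when the dimension is comparable to the sample size, and varying $\lambda$ interpolates between a covariance-normalized and an essentially unnormalized statistic.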

2605.24810 2026-05-26 cs.LG cs.AI cs.RO stat.AP

Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning

跨域能量引导扩散生成用于动态偏移强化学习

Yu Yang, Yihong Guo, Anqi Liu, Pan Xu

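AI summary: Proposes the CEDGE framework, which uses an energy-guided diffusion model to generate target-domain trajectories, addressing domain adaptation in offline reinforcement learning under dynamics shift.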

详情
Comments
29 pages, 3 figures, and 14 tables
AI中文摘要

离动态离线强化学习旨在从大规模源数据集和有限目标数据集中学习目标域策略,但面临转移动态不匹配的问题。现有方法如奖励增强和数据过滤受限于源数据集,无法合成新的目标行为以改善超出收集源轨迹的覆盖范围。虽然近期基于模型的方法尝试通过学习目标感知动态来解决此问题,但生成的体验仅在转移层面构建,导致长时域上的累积误差。这些限制促使离动态离线RL转向轨迹级生成。我们提出CEDGE,一种跨域能量引导扩散生成框架。CEDGE在源域轨迹上训练轨迹扩散模型,并通过能量引导将生成样本适应到目标域。该引导通过最小化源域与期望目标域轨迹之间的分布不匹配得到,并分解为回报、域和行为能量成分。得到的能量引导轨迹既可用于直接规划,也可作为策略学习的合成数据。由于目标适应通过能量引导而非重新训练扩散模型实现,与先前方法相比,CEDGE能高效适应新的目标动态。在ODRL基准上的实验表明,轨迹级能量引导生成改善了动态偏移下的扩散规划,并产生提升下游目标策略学习的合成数据。

英文摘要

Off-dynamics offline reinforcement learning seeks to learn a target-domain policy from a large source dataset and a limited target dataset under mismatched transition dynamics. Existing approaches such as reward augmentation and data filtering are constrained to the source dataset and cannot synthesize new target behavior to improve coverage beyond the collected source trajectories. While recent model-based methods attempt to address this by learning target-aware dynamics, the generated experience is constructed only at the transition level, which leads to accumulated errors over long horizons. These limitations necessitate a shift toward trajectory-level generation for off-dynamics offline RL. We propose CEDGE, a Cross-domain Energy-guided Diffusion GEneration framework. CEDGE trains a trajectory diffusion model on source-domain trajectories and adapts the generated samples to the target domain through energy guidance. This guidance is derived by minimizing the distribution mismatch between the source and desired target-domain trajectories and is decomposed into return, domain, and behavior energy components. The resulting energy-guided trajectories are useful both for direct planning and as synthetic data for policy learning. Since target adaptation is achieved via energy guidance rather than by retraining the diffusion model, CEDGE adapts to new target dynamics more efficiently than previous methods. Experiments on the ODRL benchmark demonstrate that trajectory-level energy-guided generation improves diffusion planning under dynamics shifts and produces synthetic data that improves downstream target policy learning.

2605.24749 2026-05-26 stat.ML cs.LG

How Neural Reward Models Learn Features for Policy Optimization: A Single-Index Analysis

神经奖励模型如何学习策略优化的特征:单指标分析

Rei Higuchi, Ryotaro Kawata, Akifumi Wachi, Shokichi Takakura, Kohei Miyaguchi, Taiji Suzuki

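AI summary: Analyzes a two-stage neural reward model under a Gaussian single-index model, studies how exponential reward weighting affects feature learning, and derives tilted-policy value-gap bounds that yield an admissible range of deployment temperatures.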

详情
Comments
35 pages
AI中文摘要

奖励建模不仅是一个预测问题:在KL正则化策略优化中,学习到的奖励被指数化以定义部署策略,因此下游价值取决于奖励倾斜区域中的误差。我们在高斯单指标模型 $r^*(x) = σ^*(\langle θ^*, x\rangle)$ 且 $x \sim N(0, I_d)$ 下研究这种反馈。我们分析了一个两阶段神经奖励模型,该模型首先从奖励加权样本中学习隐藏方向 $θ^*$,然后通过加权岭回归拟合读出层。指数奖励加权改变了第一层可用的Hermite信号;对于任何高于无维度 $O(1)$ 阈值的特征学习温度 $β_1$,恒定比例的神经元恢复隐藏方向,弱恢复复杂度由生成指数控制。在特征恢复后,我们推导了理想化标签加权拟合(权重 $e^{y/β_2}$)和更实用的代理加权拟合(权重 $e^{r_{a_0}(x)/β_2}$)的倾斜策略价值差距界限。保持 $β_2$ 依赖性显式,得到一组可接受的部署温度,平衡降低 $β_2$ 带来的收益与指数加权放大的学习成本;在代理加权情况下,代理相关因子缩小了该可接受集。

英文摘要

Reward modeling is not only a prediction problem: in KL-regularized policy optimization, the learned reward is exponentiated to define the deployed policy, so downstream value depends on errors in reward-tilted regions. We study this feedback in a Gaussian single-index model with $r^*(x) = σ^*(\langle θ^*, x\rangle)$ and $x \sim N(0, I_d)$. We analyze a two-stage neural reward model that first learns the hidden direction $θ^*$ from reward-weighted samples and then fits the readout layer by weighted ridge regression. Exponential reward weighting changes the Hermite signal available to the first layer; for any feature-learning temperature $β_1$ above a dimension-free $O(1)$ threshold, a constant fraction of neurons recover the hidden direction, with weak-recovery complexity governed by the generative exponent. After feature recovery, we derive tilted-policy value-gap bounds for an idealized label-weighted fit with weights $e^{y/β_2}$ and a more practical surrogate-weighted fit with weights $e^{r_{a_0}(x)/β_2}$. Keeping the $β_2$-dependence explicit yields an admissible set of deployment temperatures, balancing the gain from lowering $β_2$ against the learning cost amplified by exponential weighting; in the surrogate-weighted case, proxy-dependent factors shrink this admissible set.
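
The role of the deployment temperature can be illustrated on a toy discrete action set. The sketch below tilts a uniform base policy by an exponentiated reward and measures the true value lost when a slightly wrong learned reward is used; the four actions, the reward values, and the function names are hypothetical, and the continuous Gaussian single-index setting of the paper is not modelled.

```python
import math

def tilted_policy(base_probs, rewards, beta):
    """pi(a) proportional to pi_0(a) * exp(r(a) / beta), the KL-regularized optimum."""
    m = max(r / beta for r in rewards)  # subtract the max exponent for numerical stability
    unnorm = [p * math.exp(r / beta - m) for p, r in zip(base_probs, rewards)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def value(policy, reward):
    """Expected reward of a policy."""
    return sum(p * r for p, r in zip(policy, reward))

base = [0.25, 0.25, 0.25, 0.25]
true_reward = [0.0, 0.5, 1.0, 0.9]
learned_reward = [0.0, 0.5, 0.9, 1.0]   # hypothetical error: the top two actions are swapped

def value_gap(beta):
    """True value lost by tilting with the learned reward instead of the true one."""
    ideal = value(tilted_policy(base, true_reward, beta), true_reward)
    deployed = value(tilted_policy(base, learned_reward, beta), true_reward)
    return ideal - deployed

gap_high_temperature = value_gap(1.0)
gap_low_temperature = value_gap(0.05)
```

Lowering the temperature raises the value attainable with the true reward but also magnifies the cost of the same reward error, which is the trade-off behind an admissible set of deployment temperatures.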

2605.24741 2026-05-26 math.ST cs.IT cs.LG math.IT stat.ML stat.TH

On the Sample Complexity of Robust Binary Hypothesis Testing

关于鲁棒二元假设检验的样本复杂度

Shankar Vallinayagam, Ankit Pensia, Varun Jog

AI总结 研究在三种污染模型下鲁棒二元假设检验的样本复杂度,证明最不利分布的存在性并给出显式公式,揭示样本复杂度对污染参数的不稳定性,并建立不同模型间样本复杂度的可比性。

详情
Comments
Comments welcome
AI中文摘要

我们研究了在三种标准污染模型下鲁棒二元假设检验的样本复杂度:$\varepsilon$-加性(Huber)、$\varepsilon$-减性和$\varepsilon$-全变差(TV),分别记为$n^*_{\mathrm{Hub}}(\varepsilon)$、$n^*_{\mathrm{Sub}}(\varepsilon)$和$n^*_{\mathrm{TV}}(\varepsilon)$。对于减性污染,我们证明最不利分布存在并给出显式公式,使该模型与经典的Huber和TV模型一致。接下来我们表明,在所有三种模型中,样本复杂度可能在污染参数$\varepsilon$上高度不稳定,即使对于$o(\varepsilon)$的扰动也会增加多项式因子。类似地,当$\varepsilon$精确已知与仅知道$o(\varepsilon)$误差时,样本复杂度之间可能存在多项式因子差距。尽管所有模型中样本复杂度不稳定,但我们表明,在$\varepsilon$的常数因子重新缩放下,各模型的样本复杂度是可比较的。具体地,对于任意固定的$\delta_0>0$,以下对所有分布$p$和$q$成立:(i) $n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(2\varepsilon)$,(ii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((2+\delta_0)\varepsilon)$,(iii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((1+\delta_0)\varepsilon)$,且缩放常数是紧的。最后,我们将结果扩展到污染模型的自适应版本。

英文摘要

We study the sample complexity of robust binary hypothesis testing under three standard contamination models: $\varepsilon$-additive (Huber), $\varepsilon$-subtractive, and $\varepsilon$-total variation (TV), denoted by $n^*_{\mathrm{Hub}}(\varepsilon)$, $n^*_{\mathrm{Sub}}(\varepsilon)$, and $n^*_{\mathrm{TV}}(\varepsilon)$, respectively. For subtractive contamination, we show that least favourable distributions exist and provide explicit formulas for the same, bringing this model in line with the classical Huber and TV models. Next we show that in all three models, sample complexity may be highly unstable in the contamination parameter $\varepsilon$, increasing by polynomial factors even for $o(\varepsilon)$ perturbations. Similarly, there may be polynomial factor gaps between the sample complexities when $\varepsilon$ is known exactly versus when it is known up to $o(\varepsilon)$ error. Despite the instability of the sample complexity in all models, we show that the sample complexities across models are comparable up to constant-factor rescaling of $\varepsilon$. Specifically, for any fixed $δ_0>0$, the following hold for all distributions $p$ and $q$: (i) $n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(2\varepsilon)$, (ii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{TV}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((2+δ_0)\varepsilon)$, and (iii) $n^*_{\mathrm{Sub}}(\varepsilon) \lesssim n^*_{\mathrm{Hub}}(\varepsilon) \lesssim n^*_{\mathrm{Sub}}((1+δ_0)\varepsilon)$, and the scaling constants are tight. Finally, we extend our results to adaptive versions of the contamination models.

2605.24734 2026-05-26 math.ST stat.TH

Consistent Identification of Top-$K$ Nodes in Noisy Networks

噪声网络中Top-$K$节点的一致性识别

Hui Shen, Eric D. Kolaczyk

AI总结 针对噪声网络,基于度中心性研究真实top-$k$节点集的恢复条件,揭示度间隙与噪声幅度的关系,并推导期望差异的上下界,扩展至特征向量中心性。

详情
AI中文摘要

识别网络中最具影响力的节点(通常使用中心性度量)是应用网络分析中的核心任务。然而,现实世界的网络通常由噪声或不完整的数据构建,这可能会扭曲排名并导致识别真实top-$k$节点时出现错误。在本文中,我们研究网络噪声如何影响基于度中心性的真实top-$k$节点集的恢复。具体来说,我们考虑一个噪声网络观测,其中边根据概率噪声模型随机添加或移除,并分析由此产生的经验top-$k$集。我们表明,网络噪声下的top-$k$恢复由度间隙与噪声幅度之间的关系决定,这区分了可恢复和不可恢复的区域。为了量化排名稳定性,我们在一般框架和特定网络模型下推导了经验top-$k$集与真实top-$k$集之间期望差异的上界和下界。我们还将分析扩展到特征向量中心性,表明谱排名中存在类似的噪声-间隙权衡。模拟研究支持了我们的理论发现,并说明了网络噪声在各种设置下的实际影响。

英文摘要

Identifying the most influential nodes in a network, typically using centrality measures, is a central task in applied network analysis. However, real-world networks are often constructed from noisy or incomplete data, which can distort rankings and lead to errors in identifying the true top-$k$ nodes. In this paper, we study how network noise affects the recovery of the true top-$k$ node set based on degree centrality. Specifically, we consider a noisy network observation in which edges are randomly added or removed according to a probabilistic noise model, and analyze the resulting empirical top-$k$ set. We show that top-$k$ recovery under network noise is governed by the relationship between the degree gap and the noise magnitude, which separates recoverable and unrecoverable regimes. To quantify ranking stability, we derive upper and lower bounds on the expected discrepancy between the empirical and true top-$k$ sets in a general framework and for specific network models. We also extend the analysis to eigenvector centrality, showing that similar noise-gap tradeoffs arise in spectral rankings. Simulation studies support our theoretical findings and illustrate the practical impact of network noise across a range of settings.
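
As a rough illustration of the setting above, the following sketch perturbs a small graph with independent edge additions and removals and compares the empirical top-$k$ set (by degree) with the true one. The noise model, the helper names, and the toy graph are assumptions made for illustration; the paper's exact model and discrepancy measure may differ.

```python
import random

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected graph on n nodes."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj

def top_k(adj, k):
    """Indices of the k highest-degree nodes (ties broken by smaller index)."""
    deg = [sum(row) for row in adj]
    order = sorted(range(len(adj)), key=lambda i: (-deg[i], i))
    return set(order[:k])

def add_noise(adj, p_add, p_remove, rng):
    """Flip each non-edge with probability p_add and each edge with p_remove."""
    n = len(adj)
    noisy = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_remove if adj[i][j] == 1 else p_add
            if rng.random() < p:
                noisy[i][j] = noisy[j][i] = 1 - adj[i][j]
    return noisy

def discrepancy(true_set, empirical_set):
    """Number of true top-k nodes missing from the empirical top-k set."""
    return len(true_set - empirical_set)

# Toy graph (hypothetical): degrees are 5, 3, 2, 2, 1, 1.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 3)]
adj = adjacency(6, edges)
true_top = top_k(adj, 2)
noisy = add_noise(adj, p_add=0.05, p_remove=0.05, rng=random.Random(0))
missed = discrepancy(true_top, top_k(noisy, 2))
```

In this toy graph the degree gap between the second and third nodes is 1, so even a few flipped edges can change the empirical top-2 set; a wider gap relative to the noise level makes recovery more stable.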

2605.24710 2026-05-26 cs.LG math.PR math.ST stat.ML stat.TH

Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

μP 下宽神经网络中的特征学习:平均场极限的可辨识性与稀疏字典分解

Akmal Xodarev

AI总结 本文在最大更新参数化(μP)下,针对宽两层神经网络,建立了特征学习的四个结构结果,包括平均场极限的全局存在唯一性、可辨识性刻画、稀疏字典分解以及总特征学习误差分解,并揭示了架构-数据对的自然学习单元。

详情
Comments
86 pages
AI中文摘要

我们在最大更新参数化($μ$P)下,为宽两层神经网络中的特征学习建立了四个结构结果。 第一,我们证明了在$μ$P下带噪声梯度下降的平均场极限的全局存在唯一性,确定了初始化矩序列上的最大可容许权重$w^*$作为参数-矩增长边界的倒数,从而也是流传播的最大加权矩类。有限粒子近似具有关于时间的均匀平方Wasserstein速率$O(N^{-1})$。 第二,我们刻画了平均场极限的可辨识性:两个可容许参数测度在$L^2$中诱导相同的网络函数当且仅当它们的活跃分量在模去架构的有限秩实现对称性后一致。轨道深度$D^*_{\mathrm{orb}}$与矩簇深度$D^*_{\mathrm{var}}$不同。 第三,在Barron-Hermite目标条件下,长时间极限测度的活跃支撑集允许一个稀疏字典分解:它在模去有限秩实现对称性后至多支撑在$S^*$个原子上,其中$S^*$由一个显式的系数阈值数界定。 第四,我们将总特征学习误差分解为统计、优化、混沌传播和稀疏残差分量,其中目标相关的Hermite/Barron尾部取代了任何仅初始化的残差。 这四个结果通过一个架构恒等式联系在一起:三元组$(w^*, D^*_{\mathrm{orb}}, S^*)$——最大可容许权重、轨道可辨识深度以及目标可实现时的稀疏字典深度——是架构-数据对$(\sigma, \rho)$的自然学习单元。证明是自包含的,除了来自$μ$P和平均场Langevin理论的标准结果。

英文摘要

We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($μ$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $μ$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has uniform-in-time squared-Wasserstein rate $O(N^{-1})$. Second, we characterize identifiability of the mean-field limit: two admissible parameter measures induce the same network function in $L^2$ exactly when their active components agree modulo the finite-rank realization symmetry of the architecture. The orbit depth $D^*_{\mathrm{orb}}$ is separated from the moment-variety depth $D^*_{\mathrm{var}}$. Third, under the Barron-Hermite target condition the active support of the long-time limit measure admits a sparse-dictionary decomposition: it is supported on at most $S^*$ atoms modulo finite-rank realization symmetry, with $S^*$ bounded by an explicit coefficient-threshold number. Fourth, we derive the total feature-learning-error decomposition into statistical, optimization, propagation-of-chaos, and sparse-residual components, with a target-dependent Hermite/Barron tail replacing any initialization-only residual. The four results are tied together by an architectural identity: the triple $(w^*, D^*_{\mathrm{orb}}, S^*)$ -- the maximal admissible weight, the orbit identifiability depth, and the sparse-dictionary depth at which the target is realizable -- is the natural learning cell of the architecture-data pair $(σ, ρ)$. The proofs are self-contained except for standard results from $μ$P and mean-field Langevin theory.

2605.24707 2026-05-26 stat.ME

Shared hidden-factor information framework for multiple behavioral tasks

多行为任务的共享隐因子信息框架

Yuan Bian, Yuanjia Wang, Xingche Guo

AI总结 提出共享隐因子信息框架(SHIFT),通过联合建模多个行为任务中的共享信息,提升估计精度并揭示跨任务依赖关系,应用于抑郁症研究。

详情
AI中文摘要

理解重度抑郁症(MDD)的认知过程通常依赖于行为任务,这些任务通常被单独分析,忽略了潜在的相关性和共享潜在结构。为解决这一局限,我们提出了多行为任务的共享隐因子信息框架(SHIFT),一种联合建模方法,利用任务间的共享信息,使每个任务能从其他任务学习到的信息中受益。SHIFT引入了受试者特定的潜在因子,以捕捉跨任务依赖性,同时容纳个体在决策、反应时间(RTs)和策略切换中的异质性。为解决计算挑战而不需要高维积分,我们开发了一种带有变分近似的期望最大化算法,该算法保留了时间结构和任务间依赖性。通过广泛的模拟研究,我们证明SHIFT相对于单任务分析显著提高了估计精度和效率。然后我们将SHIFT应用于一项MDD研究,联合建模概率奖励任务(PRT)和侧翼任务(FT)。结果表明,与健康对照组相比,MDD参与者在PRT中表现出较低的参与度,在FT中表现出较低的专注度。此外,当个体参与且专注时,他们表现出更长的RTs。尽管观察到的RTs不能预测治疗反应,但SHIFT恢复的共享参数显示出提示性的治疗调节模式,表明它们作为治疗结果探索性行为标志物的潜力。

英文摘要

Understanding cognitive processes in major depressive disorder (MDD) often relies on behavioral tasks, which are typically analyzed separately, overlooking potential correlations and shared latent structure. To address this limitation, we propose the Shared Hidden-factor Information Framework for Multiple Behavioral Tasks (SHIFT), a joint modeling approach that leverages shared information across tasks, allowing each task to benefit from information learned by the others. SHIFT introduces subject-specific latent factors that capture cross-task dependencies while accommodating individual heterogeneity in decision-making, response times (RTs), and strategy switching. To address computational challenges without requiring high-dimensional integration, we develop an expectation-maximization with variational approximation algorithm that preserves both temporal structure and between-task dependencies. Through extensive simulation studies, we demonstrate that SHIFT substantially improves estimation accuracy and efficiency relative to single-task analyses. We then apply SHIFT to a study of MDD to jointly model the Probabilistic Reward Task (PRT) and the Flanker Task (FT). Results indicate that MDD participants show lower engagement in the PRT and reduced focus in the FT compared with healthy controls. Moreover, when individuals are engaged and focused, they exhibit longer RTs. Although observed RTs do not predict treatment response, the shared parameters recovered by SHIFT showed suggestive treatment-modulation patterns, indicating their potential as exploratory behavioral markers for therapeutic outcomes.

2605.24673 2026-05-26 stat.ML cs.LG

Affinity Graph Connectivity in Convex Clustering

凸聚类中的亲和图连通性

Sam Rosen, Jason Xu

AI总结 研究凸聚类中亲和权重对应一般连通图时的有限样本界,通过随机游走理论分析聚类性能与图结构连通性的关系,并提出超参数调优应包括亲和权重的调整。

详情
Comments
28 pages, 6 figures
AI中文摘要

我们将凸聚类的有限样本界推广到目标函数中的亲和权重对应一般连通图的情形。这些界及其分析有助于更好地理解数据背后各种隐含连通结构下的聚类行为,并为质心恢复提供新的收敛速率。新的理论框架基于随机游走,这使得可以应用与随机图模型相关的集中不等式,并形式化了聚类性能与图结构连通性之间的关系。通过界的形式和实证结果,我们认为凸聚类问题的超参数调优还应包括输入亲和权重的调优。

英文摘要

We generalize finite-sample bounds for convex clustering to the setting where affinity weights appearing in the objective correspond to a general connected graph. These bounds and their analysis lead to a better understanding of clustering behavior under various implied connectivity structures behind the data and to new rates of convergence for centroid recovery. The new theoretical framework is based on random walks, which allow application of concentration inequalities related to random graph models, and formalizes the relationship between the clustering performance and the connectivity of the graph structures. Through the form of the bound and empirical results, we argue proper tuning of hyperparameters to convex clustering problems should also include tuning of input affinity weights.
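
For orientation, the sketch below evaluates the standard convex clustering objective, a squared-error fit term plus a weighted sum of pairwise centroid distances, and checks whether the graph induced by the nonzero affinity weights is connected. The function names and the toy inputs are hypothetical, and the Euclidean norm in the fusion penalty is an assumption; the paper may use a different norm or scaling.

```python
import math
from collections import deque

def convex_clustering_objective(x, u, weights, gamma):
    """0.5 * sum_i ||x_i - u_i||^2 + gamma * sum_{(i,j)} w_ij * ||u_i - u_j||.

    x, u: lists of equal-length coordinate tuples; weights: {(i, j): w_ij}.
    """
    fit = 0.5 * sum(math.dist(xi, ui) ** 2 for xi, ui in zip(x, u))
    fusion = sum(w * math.dist(u[i], u[j]) for (i, j), w in weights.items())
    return fit + gamma * fusion

def affinity_graph_connected(n, weights):
    """Breadth-first search over edges with strictly positive affinity weight."""
    neighbors = {i: set() for i in range(n)}
    for (i, j), w in weights.items():
        if w > 0:
            neighbors[i].add(j)
            neighbors[j].add(i)
    seen, queue = {0}, deque([0])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == n

# Toy example (hypothetical): two points at distance 5 joined by one edge.
x = [(0.0, 0.0), (3.0, 4.0)]
weights = {(0, 1): 1.0}
separate = convex_clustering_objective(x, x, weights, gamma=2.0)
merged = convex_clustering_objective(x, [(0.0, 0.0), (0.0, 0.0)], weights, gamma=2.0)
```

Treating `weights` as an input rather than a fixed constant mirrors the abstract's suggestion that affinity weights should be tuned alongside the other hyperparameters.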

2605.24601 2026-05-26 stat.ME math.ST stat.TH

Bayesian Conformal-Projective Prediction

贝叶斯共形-投影预测

Arkaprava Roy, Malay Ghosh

AI总结 提出一种结合贝叶斯预测建模与共形预测思想的鲁棒预测框架(CPP),通过分布共形性定义预测区间,并证明其在ε-污染模型下渐近方差优于任意有界影响插件预测器。

详情
AI中文摘要

我们提出一个通用的鲁棒预测框架,称为共形-投影预测(CPP),它将贝叶斯预测建模与共形预测的思想相结合。CPP 准则不通过基于残差的得分来评估共形性,而是从分布上定义共形性:未来响应的候选值被视为共形的程度,取决于将其包含在数据中是否使观测响应的留一预测分布保持不变。该框架仅要求留一预测分布和交换预测分布具有封闭形式,且交换预测均值对候选值可微。在这些条件下,我们建立了一个一般的有界影响命题和一个一般的局部凸性引理,并证明在ε-污染模型下,CPP 在渐近方差上优于任何具有无界影响的插件预测器。当后验均值是观测值的线性函数时(如高斯线性模型、基展开回归和高斯过程回归),交换预测均值是候选值的仿射函数,从而产生封闭形式或一维优化解以及高效的秩二计算更新;所有一般理论结果在此设定下都特化为显式推论。在高斯线性模型下的模拟实验和两个数据分析展示了所提方法的有限样本优势,验证了在不同污染水平、样本量和预测变量维度下的理论预测。

英文摘要

We propose a general robust prediction framework, termed conformal-projective prediction (CPP), that integrates Bayesian predictive modeling with ideas from conformal prediction. Rather than assessing conformity through residual-based scores, the CPP criterion defines conformity distributionally: a candidate value for a future response is considered conforming to the extent that its inclusion in the data leaves the leave-one-out predictive distributions of the observed responses undisturbed. The framework requires only that the leave-one-out and swapped predictive distributions are available in closed form and that the swapped predictive mean is differentiable in the candidate value. Under these conditions, we establish a general bounded-influence proposition and a general local convexity lemma, and prove that CPP dominates any plug-in predictor with unbounded influence in asymptotic variance under $ε$-contamination models. When the posterior mean is linear in the observations, as in Gaussian linear models, basis-expansion regression, and Gaussian process regression, the swapped predictive mean is affine in the candidate value, yielding closed-form or one-dimensional optimization solutions and an efficient rank-two computational update; all general theoretical results specialize to explicit corollaries in this setting. Simulation experiments and two data analyses under the Gaussian linear model illustrate the finite-sample advantages of the proposed method, confirming the theoretical predictions across contamination levels, sample sizes, and predictor dimensions.

2605.24590 2026-05-26 cs.CV cs.LG stat.ML

Physen-Noise2Noise: Physics-Guided Self-Supervised Defocus Deblurring with Bias Correction under Low-Light Conditions

Physen-Noise2Noise: 低光条件下带偏差校正的物理引导自监督散焦去模糊

Ziyan Huang, Lang Wu, Hongji Wang, Yifei Liu, Dongliang Tang, Hongqiao Wang

AI总结 提出一种基于物理模型的自监督散焦去模糊框架Physen-Noise2Noise,通过可学习噪声偏差参数和频域约束,在无干净参考图像的情况下联合校正偏差噪声并恢复高频细节。

详情
Comments
14 pages
AI中文摘要

低光、长曝光散焦去模糊由于同时存在严重模糊和复杂有偏噪声,仍然是一个具有挑战性的问题。现有方法通常依赖于简化的噪声假设,这限制了它们在真实成像条件下的有效性。在这项工作中,我们提出了Physen-Noise2Noise,一种由散焦成像物理模型引导的自监督去模糊框架,它利用有噪声的多帧观测,无需干净参考图像。与传统的基于Noise2Noise的方法假设零均值噪声不同,我们推导了散焦成像过程固有的频域约束,并通过可学习的噪声偏差参数将其纳入学习框架。此外,引入了一种多帧有噪初始化策略,在去模糊之前抑制复杂有偏噪声,为重建提供更稳定的起点。该公式显式建模有偏噪声,并在训练过程中实现联合偏差校正和高频细节恢复。此外,我们开发了一种预训练-微调变体,以增强在挑战性噪声条件下的鲁棒性和泛化能力。在模拟和真实数据集上的大量实验表明,所提出的方法在存在复杂有偏噪声的情况下,始终优于最先进的自监督散焦去模糊方法。

英文摘要

Low-light, long-exposure defocus deblurring remains a challenging problem due to the simultaneous presence of severe blur and complex biased noise. Existing methods typically rely on simplified noise assumptions, which limits their effectiveness under realistic imaging conditions. In this work, we propose Physen-Noise2Noise, a self-supervised deblurring framework guided by the physical model of defocus imaging, which leverages noisy multi-frame observations without requiring clean reference images. Unlike conventional Noise2Noise-based approaches that assume zero-mean noise, we derive a frequency-domain constraint inherent to the defocus imaging process and incorporate it into the learning framework via a learnable noise bias parameter. In addition, a multi-frame noisy initialization strategy is introduced to suppress complex biased noise prior to deblurring, providing a more stable starting point for reconstruction. This formulation explicitly models biased noise and enables joint bias correction and high-frequency detail recovery during training. Furthermore, we develop a pretrain-finetune variant to enhance robustness and generalization under challenging noise conditions. Extensive experiments on both simulation and real-world datasets demonstrate that the proposed method consistently outperforms state-of-the-art self-supervised approaches for defocus deblurring in the presence of complex biased noise.

2605.24587 2026-05-26 stat.ME

Synthetic Heterogeneous-Effects LASSO: A Fixed-effects Estimation Approach for High-dimensional Mixed-effects Models

合成异质效应LASSO:高维混合效应模型的固定效应估计方法

Shangyuan Ye, Cong Zhang, Ying Chen, Ye Liang, Guanbo Wang

AI总结 针对高维聚类数据,提出合成异质效应LASSO(SHEL)方法,通过纳入聚类级合成近似来解决边际模型LASSO因协变量异质分布导致的虚假选择问题,并建立理论性质与后选择推断程序。

详情
AI中文摘要

本文研究基于边际模型程序的高维聚类数据的变量选择和后选择推断。我们表明,当协变量在聚类间异质分布时,边际模型LASSO可能将其用作潜在聚类效应的稀疏代理,从而将估计目标从结构固定效应偏移并导致虚假选择。为解决此问题,我们提出合成异质效应LASSO(SHEL),一种纳入聚类级合成近似以处理潜在异质性的固定效应惩罚框架。我们在高维设置下建立了SHEL的理论性质,并开发了有效的后选择推断程序。通过广泛的模拟研究考察了所提方法的有限样本性能。分析了一项来自住院COVID-19患者富集血中性粒细胞的纵向bulk RNA-seq数据集,以在实际应用中演示该方法。

英文摘要

This paper studies variable selection and post-selection inference for high-dimensional clustered data using marginal-model-based procedures. We show that, when covariates are heterogeneously distributed across clusters, marginal-model LASSO may use them as sparse proxies for latent cluster effects, shifting the estimation target away from the structural fixed effects and inducing false selections. To address this problem, we propose Synthetic Heterogeneous-Effects LASSO (SHEL), a fixed-effects penalized framework that incorporates cluster-level synthetic approximations to the latent heterogeneity. We establish theoretical properties of SHEL in high-dimensional settings and develop procedures for valid post-selection inference. The finite sample performance of the proposed method is investigated through extensive simulation studies. A longitudinal bulk RNA-seq dataset of enriched blood neutrophils from hospitalized COVID-19 patients is analyzed to demonstrate the method in a real application.

2605.19938 2026-05-26 stat.ME cs.LG stat.ML

Variance-Reduced Manifold Sampling via Polynomial-Maximization Density Estimation

通过多项式最大化密度估计的方差缩减流形采样

Serhii Zabolotnii

AI总结 针对隐式定义流形上的均匀采样问题,提出一种基于多项式最大化矩估计的密度估计模块PMM-MASEM,通过门控机制在非平坦间距分布下替代传统插件估计,降低密度均方误差22-36%。

详情
Comments
16 pages, 5 figures, 3 tables. Code supplement: https://github.com/SZabolotnii/Ku-PMM-MASEM-code-supplement
AI中文摘要

在隐式定义流形上的均匀采样是运动规划、约束模拟和概率机器学习中的核心原语。MASEM通过熵最大化重采样解决该问题,但其重采样权重依赖于局部k近邻密度估计,而激进的重采样温度可能放大其误差。我们探究是否可以用多项式最大化矩估计器替代插件密度规则,而不改变周围的MASEM架构。所提出的PMM-MASEM模块从嵌套的k近邻半径计算壳间距,估计其标准化累积量,并仅在间距分布偏离平坦的Exp(1)分布时使用门控的PMM2/PMM3估计器;否则回退到插件/MLE规则。这种回退至关重要:在平坦齐次流形上,插件估计器已经是MLE,因此PMM不应优于它。局部已知DGP蒙特卡洛实验证实了该门控:选择器在平坦Exp(1)间距下返回MLE,并在非对称伽马和边界间距情况下将密度MSE降低22-36%。证据并非一致积极:PMM3在尖峰均匀间距法则下表现更差,而轻量级重采样代理实验改善了七瓣覆盖但降低了正弦和瑞士卷代理的性能。因此,当前证据支持的是适用边界结果,而非一般的MASEM改进主张。

英文摘要

Uniform sampling on implicitly defined manifolds is a core primitive in motion planning, constrained simulation, and probabilistic machine learning. MASEM addresses this problem by entropy-maximizing resampling, but its resampling weights depend on a local k-nearest-neighbour density estimate whose errors can be amplified by aggressive resampling temperatures. We ask whether a polynomial-maximization moment estimator can replace the plug-in density rule without changing the surrounding MASEM architecture. The proposed PMM-MASEM module computes shell spacings from nested k-nearest-neighbour radii, estimates their standardized cumulants, and uses a gated PMM2/PMM3 estimator only when the spacing distribution departs from the flat Exp(1) regime; otherwise it falls back to the plug-in/MLE rule. This fallback is essential: on a flat homogeneous manifold the plug-in estimator is already the MLE, so PMM should not outperform it. A local Known-DGP Monte Carlo experiment confirms this gate: the selector returns MLE on flat Exp(1) spacings and reduces density MSE by 22--36% on asymmetric gamma and boundary-spacing regimes. The evidence is not uniformly positive: PMM3 worsens a platykurtic uniform spacing law, and a lightweight resampling-proxy experiment improves seven-lobes coverage but degrades the sine and swiss-roll proxies. The current evidence therefore supports an applicability-boundary result rather than a general MASEM improvement claim.

2605.19170 2026-05-26 stat.ML cs.LG

Reducing Diffusion Model Memorization with Higher Order Langevin Dynamics

使用高阶朗之万动力学减少扩散模型记忆化

Benjamin Sterling, Mónica F. Bugallo, Tom Tirer

AI总结 本文研究高阶朗之万动力学(HOLD)对扩散模型记忆化的影响,通过理论分析表明HOLD通过低通滤波学习得分函数并随阶数增加平滑度,从而缓解记忆化,并在真实数据上验证了理论。

详情
AI中文摘要

扩散/基于分数的模型已成为强大的生成模型,能够生成模仿训练数据分布的高质量样本。然而,观察到它们容易重现训练样本——称为“记忆化”——可能违反版权和隐私。在本文中,我们研究了高阶朗之万动力学(HOLD)对这一现象的影响。HOLD扩散过程引入了辅助变量;如果数据变量被解释为“位置”,那么辅助变量可以解释为“速度”和“加速度”,具体取决于所选模型的阶数。它们最初是基于这样的直觉提出的:通过隐式施加额外的动力学约束来正则化数据变量的轨迹。据我们所知,我们的工作首次提供了HOLD正则化效应的理论刻画。具体来说,我们表明在HOLD中,数据变量的动力学由学习得分函数的低通滤波版本控制,其平滑度随HOLD阶数增加而增加。然后我们分析了最优经验得分和分布崩溃的可能性。总之,我们的结果解释了随着模型阶数增加记忆化的缓解。最后,我们在真实世界数据上进行了实证研究,支持了我们的理论,并突出了HOLD在实践中相对于标准扩散的这一独特优势。

英文摘要

Diffusion/score-based models have emerged as powerful generative models, capable of generating high-quality samples that mimic the training data distribution. However, it has been observed that they are prone to reproducing training samples, a phenomenon known as "memorization", potentially violating copyright and privacy. In this paper, we study the effect of Higher-Order Langevin Dynamics (HOLD) on this phenomenon. HOLD diffusion processes introduce auxiliary variables; if the data variable is interpreted as "position," then the auxiliary variables can be interpreted as "velocity" and "acceleration," depending on the chosen order of the model. They were originally proposed based on the intuition that they regularize the trajectories of the data variable by implicitly imposing additional dynamical constraints. Our work provides, to our knowledge, the first theoretical characterization of the regularization effect of HOLD. Specifically, we show that in HOLD, the dynamics of the data variable are governed by a low-pass-filtered version of the learned score function, with smoothness increasing with the order of HOLD. We then analyze the optimal empirical score and the possibility of distribution collapse. Together, our results explain the mitigation of memorization as the model order increases. Finally, we present an empirical study on real-world data that supports our theory and highlights this distinct advantage of HOLD over standard diffusion in practice.

2605.12764 2026-05-26 q-fin.MF cs.LG stat.ML

Yield Curves Dynamics Using Variational Autoencoders Under No-arbitrage

无套利条件下使用变分自编码器的收益率曲线动力学

Fusheng Luo, Hélyette Geman

AI总结 提出一种物理信息生成框架,通过两阶段架构(学生t条件变分自编码器+动态水平注入和神经随机微分方程)解决深度学习统计灵活性与固定收益理论约束的冲突,在多个主权货币上显著降低预测误差并实现无套利。

详情
Comments
This is the full script (version 2) of our paper, which is awaiting submission to financial journals/conferences, after modifying and double-checking the reference lists
AI中文摘要

本文引入了一个物理信息生成框架,解决了深度学习统计灵活性与固定收益建模严格理论约束之间的根本冲突。我们证明,标准生成模型和无约束统计外推在预测跨多种宏观经济体制的期限结构时,会遭受“流形崩溃”和严重的套利违规。为克服这一问题,我们提出了一种两阶段架构。首先,具有动态水平注入的学生t条件变分自编码器(CVAEsT+LS)提取了一个稳健、重尾的期限结构流形,有效解耦了宏观经济形状动态与绝对基准利率。其次,潜在动态演化由连续时间神经随机微分方程(SDE)控制,并受到无套利偏微分方程(PDE)的严格惩罚。跨多个主权货币(美元、英镑、日元)的实证结果证实,我们的协同方法大幅降低了样本外预测误差——实现了卓越的6.58个基点平均期限RMSE——并成功克服了经典HJM模型在极端环境中表现出的巨大平行漂移和零下限违规。此外,通过相空间向量场分析,我们展示了该模型在无监督宏观经济体制检测和高质量连续时间情景生成方面的卓越能力。最终,本研究为期限结构建模提供了一个高度可扩展、数学上合理的演化引擎。

英文摘要

This paper introduces a physics-informed generative framework that resolves the fundamental conflict between the statistical flexibility of deep learning and the rigorous theoretical constraints of fixed-income modeling. We demonstrate that standard generative models and unconstrained statistical extrapolations suffer from "manifold collapse" and severe arbitrage violations when forecasting term structures across diverse macroeconomic regimes. To overcome this, we propose a two-stage architecture. First, a Student-t Conditional Variational Autoencoder with Dynamic Level Injection (CVAEsT+LS) extracts a robust, heavy-tailed term structure manifold, effectively decoupling macroeconomic shape dynamics from absolute base rates. Second, the latent dynamic evolution is governed by a continuous-time Neural Stochastic Differential Equation (SDE) strictly penalized by a No-Arbitrage Partial Differential Equation (PDE). Empirical results across multiple sovereign currencies (USD, GBP, JPY) confirm that our synergistic approach drastically reduces out-of-sample forecasting errors -- achieving an exceptional 6.58 bps Mean Tenor RMSE -- and successfully overcomes the massive parallel drift and zero-lower-bound violations exhibited by the classical HJM model in extreme environments. Furthermore, through phase space vector field analysis, we demonstrate the model's superior capability in unsupervised macroeconomic regime detection and high-quality continuous-time scenario generation. Ultimately, this research provides a highly scalable, mathematically sound evolutionary engine for term structure modeling.

2605.12118 2026-05-26 stat.ML cs.LG

Keeping Score: Efficiency Improvements in Neural Likelihood Surrogate Training via Score-Augmented Loss Functions

保持分数:通过分数增强损失函数提高神经似然代理训练的效率

Alexander Shen, Mikael Kuusela

AI总结 针对随机过程模型,提出通过分数增强损失函数和自适应加权改进神经似然代理训练,在显著降低计算成本的同时提升代理质量,实现与10倍训练数据相当的推理性能。

详情
Comments
9 pages of main text, 9 pages of appendices, 13 figures
AI中文摘要

对于随机过程模型,参数推断通常受限于计算昂贵的似然函数。基于模拟的推断(SBI)通过构建摊销代理似然绕过了这一限制,但大多数SBI方法假设黑箱数据生成过程。虽然这些代理在无限训练数据下是精确的,但实际场景迫使在模型质量和模拟成本之间进行严格权衡。在这项工作中,我们放宽了SBI的黑箱假设,以改善结构化随机过程模型的这种权衡。具体而言,对于通过概率分类训练的神经网络似然代理,我们提出用精确的分数信息 $\nabla_θ\log p(x \mid θ)$ 和基于损失梯度的自适应加权来增强标准二元交叉熵损失。我们在涉及网络动力学和空间过程的案例研究中评估了我们的方法,证明我们的方法以远低于生成更多训练数据的计算成本提高了代理质量。值得注意的是,在某些情况下,我们的方法实现了与训练数据增加10倍相当的下游推理性能,而训练时间增加不到1.1倍。

英文摘要

For stochastic process models, parameter inference is often severely bottlenecked by computationally expensive likelihood functions. Simulation-based inference (SBI) bypasses this restriction by constructing amortized surrogate likelihoods, but most SBI methods assume a black-box data generating process. While these surrogates are exact in the limit of infinite training data, practical scenarios force a strict tradeoff between model quality and simulation cost. In this work, we loosen the black-box assumption of SBI to improve this tradeoff for structured stochastic process models. Specifically, for neural network likelihood surrogates trained via probabilistic classification, we propose to augment the standard binary cross-entropy loss with exact score information $\nabla_θ\log p(x \mid θ)$ and adaptive weighting based on loss gradients. We evaluate our approach on case studies involving network dynamics and spatial processes, demonstrating that our method improves surrogate quality at a drastically lower computational cost than generating more training data. Notably, in some cases, our approach achieves downstream inference performance equivalent to a 10x increase in training data with less than a 1.1x increase in training time.

2605.04536 2026-05-26 math.ST math.DG stat.ME stat.TH

Transversality and Geometric Regularisation in Distributional Statistical Models

分布统计模型中的横截性与几何正则化

R. Labouriau

AI总结 本文提出分布-核对框架,利用横截性定理证明核作为几何正则化器使参数模型避免退化,并给出可验证条件。

详情
Comments
22 pages, no figures, no tables. In the second version, some sketches were replaced by proofs and an example of M-determinacy was added
AI中文摘要

分布统计框架用分布-核对 $(T, φ)$ 替代经典概率密度,其中 $T$ 是缓增分布,$φ$ 是快速衰减核。我们论证核作为几何正则化器,使参数统计模型相对于编码不可识别性、奇异信息、矩不确定性和表示失效的退化轨迹处于一般(横截)位置。利用 Whitney、Thom 和 Mather 的横截性定理,我们证明了一个有限维弱横截性定理:对于任何足够丰富族中的一般核,核诱导的特征映射避免了足够高余维的退化层。我们建立了可验证的条件——表述为联合特征映射的雅可比矩阵的秩条件——在这些条件下可以检验横截性假设,并对位置族、对数正态分布、Stein 差异和图模型进行了验证。当前结果适用于参数模型;讨论了半参数和非参数设置的扩展。退化分类包括无闭式密度模型的表示退化(类型 0)和非弦图模型中的高阶不稳定性(类型 IV)。可识别性、鲁棒性、矩确定性、Fisher 信息正则性、Stein 差异、推断分离和 Behrens-Fisher 问题都统一地解释为特征映射上的横截性条件。本文作为一系列发展分布框架论文的几何伴侣。

英文摘要

The distributional statistical framework replaces classical probability densities by distribution-kernel pairs $(T, φ)$, where $T$ is a tempered distribution and $φ$ is a rapidly decaying kernel. We develop the thesis that the kernel acts as a geometric regulariser, placing parametric statistical models in generic (transversal) position relative to degeneracy loci encoding non-identifiability, singular information, moment indeterminacy, and representation failure. Using the transversality theorems of Whitney, Thom, and Mather, we prove a finite-dimensional weak transversality theorem: for a generic kernel in any sufficiently rich family, the kernel-induced feature map avoids degeneracy strata of sufficiently high codimension. We establish verifiable conditions -- formulated as rank conditions on the Jacobian of the joint feature map -- under which the transversality hypothesis can be checked, and verify them for location families, the log-normal, Stein discrepancies, and graphical models. The present results apply to parametric models; extensions to semiparametric and nonparametric settings are discussed. The degeneracy classification includes representation degeneracy (Type 0) for models without closed-form densities and higher-order instabilities (Type IV) in non-chordal graphical models. Identifiability, robustness, moment determinacy, Fisher information regularity, Stein discrepancy, inferential separation, and the Behrens-Fisher problem all admit a unified geometric interpretation as transversality conditions on the feature map. This paper serves as a geometric companion to a series of papers developing the distributional framework.

2603.24814 2026-05-26 stat.AP

Multiple-group (Controlled) Interrupted Time Series Analysis with Higher-Order Autoregressive Errors: A Simulation Study Comparing Newey-West and Prais-Winsten Methods

多组(受控)中断时间序列分析中的高阶自回归误差:比较Newey-West和Prais-Winsten方法的模拟研究

Ariel Linden

AI总结 通过蒙特卡洛模拟首次系统评估了在AR(2)和AR(3)误差结构下,Newey-West与Prais-Winsten方法在多组中断时间序列分析中的表现,发现Prais-Winsten在大多数条件下具有更好的推断校准性。

详情
AI中文摘要

先前在多组中断时间序列分析中对普通最小二乘法结合Newey-West标准误(OLS-NW)与Prais-Winsten(PW)回归的比较仅限于一阶自回归(AR[1])误差,因为之前无法进行高阶AR[k]过程的PW估计。我们通过蒙特卡洛模拟首次系统评估了OLS-NW和PW在AR[2]和AR[3]误差结构下的表现。模拟考察了在不同序列长度和效应量下的轻度正自相关、振荡自相关和高持久自相关。OLS-NW通常表现出更高的表观检验效能,但第一类错误率显著膨胀且覆盖率较差,尤其是在持久自相关下,随着AR阶数和序列长度的增加,推断性能恶化。PW在几乎所有条件下保持了明显更好的推断校准性。两种方法近似无偏。

英文摘要

Previous comparisons of ordinary least squares with Newey-West standard errors (OLS-NW) and Prais-Winsten (PW) regression in multiple-group interrupted time series analysis have been limited to first-order autoregressive (AR[1]) errors because PW estimation for higher-order AR[k] processes was previously unavailable. We conducted the first systematic evaluation of OLS-NW and PW under AR[2] and AR[3] error structures using Monte Carlo simulation. Simulations examined mild positive, oscillatory, and high persistent autocorrelation across varying series lengths and effect sizes. OLS-NW generally showed higher apparent power but substantially inflated Type I error and poor coverage, particularly under persistent autocorrelation, where inferential performance worsened with increasing AR order and series length. PW maintained substantially better inferential calibration across nearly all conditions. Both methods were approximately unbiased.

2603.16833 2026-05-26 stat.ME math.ST stat.TH

Refined Inference for Asymptotically Linear Estimators with Non-Negligible Second-Order Remainders

具有不可忽略二阶余项的渐近线性估计量的精细推断

Lin Li

AI总结 针对半参数模型中渐近线性估计量的二阶余项方差不可忽略的情形,提出了留一法刀切法和配对自助法的有效推断方法,并推导了聚类数据下的解析表达式。

详情
AI中文摘要

半参数模型中的渐近线性估计量通常通过冯·米塞斯展开进行研究,其中一阶推断基于影响函数方差。这种简化仅在二阶余项不仅在概率上可忽略,而且在方差上也可忽略时才有效,而通常确保渐近线性性的乘积率条件并不隐含这一要求。我们研究了余项贡献阶为$n^{-1}$方差的情形,此时总抽样方差与标准影响函数近似相差一个非消失的一阶项。我们推导了一个有限样本方差分解,将影响函数方差、余项方差及其协方差分开,并通过缩放余项方差的消失来刻画夹心估计量的有效性:在交叉项可忽略的情况下,当$n\,\mathrm{Var}(R_{\mathrm{rem}})\to 0$时,夹心估计量对总抽样方差一致估计,而在互补的近边界情形$n\,\mathrm{Var}(R_{\mathrm{rem}})\to c_R>0$中,它会显著低估。然后,我们在近边界情形下建立了两种精细过程(留一法刀切法和配对自助法)的渐近有效性。刀切法的有效性通过自归一化论证获得;自助法的有效性在Mallows--2条件下直接建立。我们还将理论扩展到聚类数据,并推导了一个解析表达式,显示组内相关性如何通过余项项放大夹心差距。模拟说明了该情形,并验证了竞争方差估计量的预测覆盖行为。

英文摘要

Asymptotically linear estimators in semiparametric models are usually studied through a von Mises expansion in which first-order inference is based on the influence-function variance. This reduction is valid only when the second-order remainder is negligible not only in probability but also in variance, a requirement not implied by the usual product-rate conditions ensuring asymptotic linearity. We study the regime in which the remainder contributes variance at order $n^{-1}$, so that the total sampling variance differs from the standard influence-function approximation by a non-vanishing first-order term. We derive a finite-sample variance decomposition separating the influence-function variance, the remainder variance, and their covariance, and characterize sandwich validity through the vanishing of scaled remainder variance: under a negligible cross term, the sandwich estimator is consistent for the total sampling variance when $n\,\mathrm{Var}(R_{\mathrm{rem}})\to 0$ and materially underestimates it in the complementary near-boundary regime $n\,\mathrm{Var}(R_{\mathrm{rem}})\to c_R>0$. We then establish asymptotic validity of two refined procedures in the near-boundary regime: the leave-one-out jackknife and the pairs bootstrap. Jackknife validity is obtained through a self-normalization argument; bootstrap validity is established directly under a Mallows--2 condition. We also extend the theory to clustered data and derive an analytic expression showing how intra-cluster correlation amplifies the sandwich gap through the remainder term. Simulations illustrate the regime and confirm the predicted coverage behaviour of the competing variance estimators.
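
The leave-one-out jackknife mentioned above can be sketched in a few lines. This is the textbook jackknife variance estimator applied to a generic estimator passed in as a function; the names `jackknife_variance` and `sample_mean` are hypothetical, and the sketch does not reproduce the paper's near-boundary analysis or its clustered-data extension.

```python
def jackknife_variance(data, estimator):
    """Leave-one-out jackknife estimate of Var(estimator(data)).

    Formula: (n - 1) / n * sum_i (theta_(-i) - mean of theta_(-i))^2,
    where theta_(-i) is the estimator recomputed without observation i.
    """
    n = len(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((t - center) ** 2 for t in leave_one_out)

def sample_mean(values):
    return sum(values) / len(values)

# Sanity check on a linear estimator: for the sample mean, the jackknife
# variance coincides with the usual s^2 / n.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
jk_var = jackknife_variance(data, sample_mean)
```

Because the jackknife recomputes the full estimator on each subsample, it also picks up variability from nonlinear (second-order) terms that an influence-function-based sandwich formula ignores, which is the motivation given in the abstract.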

2603.10941 2026-05-26 stat.ME

Covariate-adjusted statistical dependence representation through partial copulas: bounds and new insights

通过部分Copula的协变量调整统计依赖性表示:界限与新见解

Vinícius Litvinoff Justus, Felipe Fontana Vieira

AI总结 本文重新审视部分Copula的概念,证明其可作为偏相关的非线性类比,并揭示条件Copula的依赖性如何约束部分Copula的形式,通过模拟展示其在因果推断中恢复因果效应真实符号的潜力。

详情
AI中文摘要

在本文中,我们重新审视部分Copula的概念,该概念最初被引入用于检验条件独立性,强调其表示在去除与协变量的依赖性后两个随机变量之间依赖性的能力。基于文献中先前提出的结果,我们证明部分Copula可以被视为偏相关的非线性类比。然后,我们证明几个结果,展示条件Copula的依赖性如何约束部分Copula的形式。最后,进行模拟研究以说明结果并展示部分Copula作为描述协变量调整统计依赖性的一种方式的潜力。这突显了该方法在因果推断问题中以及恢复因果效应真实符号的潜力。

英文摘要

In this paper, we revisit the notion of partial copula, originally introduced to test conditional independence, highlighting its capability to represent the dependence between two random variables after removing their dependence with a covariate. Building upon results previously presented in the literature, we show that partial copulas can be seen as a nonlinear analogue of partial correlation. Then, we prove several results showing how dependence properties of the conditional copulas constrain the form of the partial copula. Finally, a simulation study is conducted to illustrate the results and to show the potential of the partial copula as a way to describe covariate-adjusted statistical dependence. This highlights the potential of the method to be used in causal inference problems and to recover the true sign of a causal effect.

2602.21479 2026-05-26 stat.ML cs.LG

Global Sequential Testing for Multi-Stream Auditing

多流审计的全局序贯检验

Beepul Bharti, Ambar Pal, Jeremias Sulam

AI总结 针对多数据流审计问题,提出基于鞅合并的序贯检验方法,在稀疏和密集备择假设下分别达到最优停止时间,并通过实验验证。

详情
AI中文摘要

在许多风险敏感领域,随着接收更多数据,持续审计机器学习系统以快速判断其是否按设计运行至关重要。该审计任务可建模为具有 $k$ 个数据流和全局零假设的序贯假设检验问题,其中全局零假设断言系统在所有 $k$ 个流上按预期运行。在备择假设下,使用 Bonferroni 校正的标准全局序贯检验,对于大 $k$ 和显著性水平 $α$,期望停止时间为 $O\left(\ln \frac{k}{α}\right)$。在这项工作中,我们证明了依赖于通过平均和乘积规则合并鞅的高效序贯检验提供了改进的停止时间,从而对零假设具有更强的检验能力。利用这些结果,我们表明平衡检验在稀疏情形(仅少数非零假设流)下可以达到 Bonferroni 的 $O\left(\ln \frac{k}{α}\right)$ 速率,同时在密集备择假设(许多非零假设流)下实现 $O\left(\frac{1}{k}\ln \frac{1}{α}\right)$。我们通过在合成数据和真实数据上的实验验证了我们的理论。

英文摘要

Across many risk-sensitive areas, it is critical to continuously audit machine learning systems as we receive more data to quickly determine if they are performing as designed. This auditing task can be modeled as a sequential hypothesis testing problem with $k$ data streams and a global null hypothesis that asserts the system operates as intended across all $k$ streams. Under the alternative, the standard global sequential test, which uses a Bonferroni correction, has an expected stopping time of $O\left(\ln \frac{k}α\right)$ for large $k$ and significance level $α$. In this work, we demonstrate that efficient sequential tests, relying on merging martingales via averaging and product rules, provide improved stopping times, and thus more powerful tests against the null. Using these results, we show that a balanced test can match the Bonferroni rate of $O\left(\ln \frac{k}α\right)$ in the sparse regime (just a few non-null streams) while achieving $O\left(\frac{1}{k}\ln \frac{1}α\right)$ under dense alternatives (many non-null streams). We validate our theory through experiments on both synthetic and real-world data.
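
The three merging rules named in the abstract can be compared with a minimal sketch. It assumes each stream maintains a nonnegative test martingale ("wealth") that starts at 1 under the null, and that the global null is rejected via Ville's inequality once the merged value reaches $1/α$. The function names are hypothetical, the product rule additionally assumes independent streams, and the paper's balanced test is not reproduced here.

```python
from math import prod

def bonferroni_rejects(wealth, alpha):
    """Reject if any single stream's martingale reaches k / alpha."""
    k = len(wealth)
    return max(wealth) >= k / alpha

def average_rejects(wealth, alpha):
    """The average of martingales is a martingale under the global null."""
    return sum(wealth) / len(wealth) >= 1 / alpha

def product_rejects(wealth, alpha):
    """The product of martingales is a martingale if streams are independent (assumed)."""
    return prod(wealth) >= 1 / alpha

# Sparse alternative: one stream has accumulated strong evidence.
sparse = [2000.0] + [1.0] * 9
# Dense alternative: every stream has accumulated a little evidence.
dense = [1.5] * 10

for name, wealth in [("sparse", sparse), ("dense", dense)]:
    print(name,
          bonferroni_rejects(wealth, 0.05),
          average_rejects(wealth, 0.05),
          product_rejects(wealth, 0.05))
```

In this toy example the averaging rule behaves like Bonferroni (it rejects whenever Bonferroni does, since a maximum of at least $k/α$ forces an average of at least $1/α$), while the product rule is the only one that pools weak evidence across many streams.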

2602.16340 2026-05-26 cs.LG stat.ML

The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks

Adam和Muon在光滑齐次神经网络上的隐式偏差

Eitan Gronich, Gal Vardi

AI总结 研究动量优化器在光滑齐次模型上的隐式偏差,证明Muon、MomentumGD和Signum在衰减学习率下近似于最速下降轨迹,并偏向于对应边际最大化问题的KKT点,同时将分析扩展到Adam和混合范数优化器。

详情
Comments
ICML 2026. 8 pages, 1 figure (with appendix: 45 pages, 3 figures)
AI中文摘要

我们研究了动量优化器在光滑齐次模型上的隐式偏差。我们证明,在衰减学习率调度下,像Muon(谱范数)、MomentumGD(ℓ2范数)和Signum(ℓ∞范数)这样的动量最速下降算法是近似最速下降轨迹,从而证明这些算法偏向于对应边际最大化问题的KKT点。我们将分析扩展到Adam(不含稳定性常数),它最大化ℓ∞边际,以及Muon-Signum和Muon-Adam,它们最大化混合范数。我们的实验证实了理论,并表明最大化的边际类型取决于优化器的选择。总体而言,我们的结果扩展了早期关于齐次模型中最速下降和线性模型中动量优化器的工作线。

英文摘要

We study the implicit bias of momentum-based optimizers on smooth homogeneous models. We show that \textit{momentum steepest descent} algorithms like Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) are \textit{approximate} steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms have a bias towards KKT points of the corresponding margin maximization problem. We extend the analysis to Adam (without the stability constant), which maximizes the $\ell_\infty$ margin, and to Muon-Signum and Muon-Adam, which maximize a hybrid norm. Our experiments corroborate the theory and show that the identity of the margin maximized depends on the choice of optimizer. Overall, our results extend earlier lines of work on steepest descent in homogeneous models and momentum-based optimizers in linear models.

2602.12604 2026-05-26 math.ST stat.ML stat.TH

Differentially Private Two-Stage Empirical Risk Minimization with Applications to Individualized Treatment Rule

差分隐私的两阶段经验风险最小化及其在个体化治疗规则中的应用

Joowon Lee, Guanhua Chen

AI总结 本文提出差分隐私两阶段经验风险最小化(DP-2ERM),通过目标扰动校准全流程的数据依赖敏感性,在因果推断和个体化治疗规则学习中实现隐私保护,并展示了优于逐阶段组合的隐私-效用权衡。

详情
Comments
25 pages, 2 figures. Technical proofs are omitted for the initial version. It will be included in future versions
AI中文摘要

差分隐私通过注入校准的随机噪声,为发布统计估计量提供了一个正式框架,限制任何单个观测对输出的影响。我们研究了因果推断和个体化治疗规则(ITR)学习中常见的两阶段程序的差分隐私估计,其中首先估计数据依赖的权重以强制执行协变量平衡,然后通过加权经验风险最小化获得感兴趣的参数。我们提出了差分隐私两阶段经验风险最小化(DP-2ERM),它通过目标扰动直接私有化最终估计量,该扰动校准到整个流程的数据依赖敏感性。分析将几种协变量平衡方法(逆概率加权、熵平衡加权和最大均值差异加权)的确定性权重扰动界与第二阶段解的随机敏感性界相结合。得到的校准比自然的逐阶段组合基线更尖锐,而相同的敏感性分析作为副产品提供了该基线。模拟研究和ITR学习的基准应用展示了改进的隐私-效用权衡。

英文摘要

Differential privacy provides a formal framework for releasing statistical estimators that limit how much any single observation can influence the output, by injecting calibrated random noise. We study differentially private estimation in two-stage procedures common in causal inference and individualized treatment rule (ITR) learning, in which data-dependent weights are first estimated to enforce covariate balance and a parameter of interest is then obtained by weighted empirical risk minimization. We propose Differentially Private Two-Stage Empirical Risk Minimization (DP-2ERM), which privatizes the final estimator directly through objective perturbation calibrated to the data-dependent sensitivity of the full pipeline. The analysis combines deterministic weight-perturbation bounds for several covariate-balancing methods (inverse propensity weighting, entropy balancing weighting, and maximum mean discrepancy weighting) with probabilistic sensitivity bounds for the second-stage solution. The resulting calibration is sharper than the natural stage-wise composition baseline, which the same sensitivity analysis supplies as a byproduct. Simulation studies and a benchmark application to ITR learning demonstrate the improved privacy--utility trade-off.

2602.02806 2026-05-26 stat.AP

De-Linearizing Agent Traces: Bayesian Inference of Latent Partial Orders for Efficient Execution

去线性化智能体轨迹:潜在偏序的贝叶斯推断以实现高效执行

Dongqing Li, Zheqiao Cheng, Geoff K. Nicholls, Quyu Kong

AI总结 提出BPOP贝叶斯框架,从噪声线性化轨迹中推断潜在依赖偏序,通过可处理的边界-softmax似然实现高效MCMC推理,在云基础设施任务和科学工作流上显著减少令牌使用和执行时间。

详情
AI中文摘要

AI智能体越来越多地将程序化工作流执行为顺序动作轨迹,这掩盖了潜在的并发性并导致重复的逐步推理。我们引入了BPOP,一个贝叶斯框架,从噪声线性化轨迹中推断潜在依赖偏序。BPOP将轨迹建模为底层图的随机线性扩展,并通过可处理的边界-softmax似然执行高效的MCMC推理,避免了线性扩展上的#P困难边缘化。我们在开源Cloud-IaC-6(一组具有异构LLM生成轨迹的云配置任务)和WFCommons科学工作流上进行了评估。BPOP比仅轨迹和过程挖掘基线更准确地恢复依赖结构,并且推断出的图支持一个编译执行器,该执行器修剪不相关的上下文,从而显著减少令牌使用和执行时间。

英文摘要

AI agents increasingly execute procedural workflows as sequential action traces, which obscures latent concurrency and induces repeated step-by-step reasoning. We introduce BPOP, a Bayesian framework that infers a latent dependency partial order from noisy linearized traces. BPOP models traces as stochastic linear extensions of an underlying graph and performs efficient MCMC inference via a tractable frontier-softmax likelihood that avoids #P-hard marginalization over linear extensions. We evaluate on our open-sourced Cloud-IaC-6, a suite of cloud provisioning tasks with heterogeneous LLM-generated traces, and WFCommons scientific workflows. BPOP recovers dependency structure more accurately than trace-only and process-mining baselines, and the inferred graphs support a compiled executor that prunes irrelevant context, yielding substantial reductions in token usage and execution time.
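
To make "traces as linear extensions of a partial order" concrete, the sketch below enumerates every execution order consistent with a small dependency graph. It is a brute-force illustration only: the task names are hypothetical, and the paper's frontier-softmax likelihood and MCMC sampler are not implemented. The exponential cost of this enumeration is the reason exact marginalization over linear extensions is avoided.

```python
def linear_extensions(nodes, edges):
    """Yield every total order of `nodes` that respects `edges` (u must precede v)."""
    preds = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)

    def extend(prefix, done):
        if len(prefix) == len(nodes):
            yield tuple(prefix)
            return
        # Frontier: tasks not yet executed whose prerequisites are all done.
        frontier = [v for v in nodes if v not in done and preds[v] <= done]
        for v in frontier:
            yield from extend(prefix + [v], done | {v})

    yield from extend([], frozenset())

# Hypothetical provisioning tasks: the subnet needs the VPC, the bucket is independent.
tasks = ["vpc", "subnet", "bucket"]
deps = [("vpc", "subnet")]
for order in linear_extensions(tasks, deps):
    print(" -> ".join(order))
```

Each printed order is one sequential trace that an agent could emit for the same underlying partial order; the independent task can appear at any position, which is the latent concurrency a linearized trace hides.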

2601.21924 2026-05-26 cs.LG stat.ML

One-Step Bellman Alignment Enables Provably Efficient Transfer in Online RL

一步贝尔曼对齐实现在线强化学习中的可证明高效迁移

Elynn Chen, Enpei Zhang, Jinhang Chai, Yujun Yan

AI总结 提出一步贝尔曼对齐作为在线强化学习中迁移的正确抽象,并通过重加权目标(RWT)实现算子级修正,在RKHS函数逼近下建立了与任务偏移复杂度相关的遗憾界。

详情
AI中文摘要

我们研究在情节马尔可夫决策过程中的在线迁移强化学习,其中在学习目标任务时,来自相关源任务的经验是可用的。一个基本困难在于任务相似性通常根据奖励或转移来定义,而在线RL算法作用于贝尔曼回归目标。因此,简单地重用源贝尔曼更新会引入系统性偏差并使遗憾保证失效。我们识别出一步贝尔曼对齐作为在线RL中迁移的正确抽象,并提出重加权目标(RWT),这是一种算子级修正,通过测度变换重新定位延续值并补偿转移不匹配。RWT将任务不匹配简化为固定的一步修正,并实现了源数据的统计上合理的重用。这种对齐产生了一个两阶段RWT Q学习框架,将方差减少与偏差修正分离。在RKHS函数逼近下,我们建立的遗憾界随任务偏移的复杂度而非目标MDP的复杂度变化。我们进一步证明了所需的密度比允许一个具有有限样本保证的构造性RKHS估计器,并经验验证了对估计和错误指定比率的鲁棒性。在表格和神经网络设置中的实证结果均显示,与单任务学习和朴素池化相比,持续改进,突出了贝尔曼对齐作为在线RL中模型无关的迁移原理。

英文摘要

We study online transfer reinforcement learning (RL) in episodic Markov decision processes, where experience from related source tasks is available during learning on a target task. A fundamental difficulty is that task similarity is typically defined in terms of rewards or transitions, whereas online RL algorithms operate on Bellman regression targets. As a result, naively reusing source Bellman updates introduces systematic bias and invalidates regret guarantees. We identify one-step Bellman alignment as the correct abstraction for transfer in online RL and propose re-weighted targeting (RWT), an operator-level correction that retargets continuation values and compensates for transition mismatch via a change of measure. RWT reduces task mismatch to a fixed one-step correction and enables statistically sound reuse of source data. This alignment yields a two-stage RWT $Q$-learning framework that separates variance reduction from bias correction. Under RKHS function approximation, we establish regret bounds that scale with the complexity of the task shift rather than the target MDP. We further show the required density ratios admit a constructive RKHS estimator with finite-sample guarantees, and empirically validate robustness to estimated and mis-specified ratios. Empirical results in both tabular and neural network settings demonstrate consistent improvements over single-task learning and naïve pooling, highlighting Bellman alignment as a model-agnostic transfer principle for online RL.

2601.20738 2026-05-26 cs.LG cs.DC eess.SP math.OC stat.ML

SA-PEF: Step-Ahead Partial Error Feedback for Efficient Federated Learning

SA-PEF:用于高效联邦学习的前瞻部分误差反馈

Dawit Kiros Redie, Reza Arablouei, Stefan Werner

AI总结 提出SA-PEF方法,通过结合前瞻校正和部分误差反馈,在非IID数据和部分客户端参与下加速联邦学习收敛,并理论证明其收敛速率与Fed-SGD相当。

详情
Journal ref
Transactions on Machine Learning Research, 2026
AI中文摘要

带误差反馈(EF)的有偏梯度压缩减少了联邦学习(FL)中的通信,但在非IID数据下,残差误差可能缓慢衰减,导致早期轮次中的梯度不匹配和进度停滞。我们提出前瞻部分误差反馈(SA-PEF),它集成了前瞻(SA)校正与部分误差反馈(PEF)。当前瞻系数$α=0$时,SA-PEF恢复为EF;当$α=1$时,恢复为前瞻EF(SAEF)。对于非凸目标和$δ$-收缩压缩器,我们建立了二阶矩界和残差递归,保证了在异构数据和部分客户端参与下收敛到平稳点。得到的速率与标准非凸Fed-SGD保证在常数因子内匹配,在固定内步长下实现$O((η,η_0TR)^{-1})$收敛到方差/异质性下界。我们的分析揭示了一个由前瞻控制的残差收缩$ρ_r$,解释了早期训练阶段观察到的加速。为了平衡SAEF的快速预热与EF的长期稳定性,我们选择接近理论预测最优的$α$。跨多种架构和数据集的实验表明,SA-PEF始终比EF更快达到目标精度。

英文摘要

Biased gradient compression with error feedback (EF) reduces communication in federated learning (FL), but under non-IID data, the residual error can decay slowly, causing gradient mismatch and stalled progress in the early rounds. We propose step-ahead partial error feedback (SA-PEF), which integrates step-ahead (SA) correction with partial error feedback (PEF). SA-PEF recovers EF when the step-ahead coefficient $α=0$ and step-ahead EF (SAEF) when $α=1$. For non-convex objectives and $δ$-contractive compressors, we establish a second-moment bound and a residual recursion that guarantee convergence to stationarity under heterogeneous data and partial client participation. The resulting rates match standard non-convex Fed-SGD guarantees up to constant factors, achieving $O((η,η_0TR)^{-1})$ convergence to a variance/heterogeneity floor with a fixed inner step size. Our analysis reveals a step-ahead-controlled residual contraction $ρ_r$ that explains the observed acceleration in the early training phase. To balance SAEF's rapid warm-up with EF's long-term stability, we select $α$ near its theory-predicted optimum. Experiments across diverse architectures and datasets show that SA-PEF consistently reaches target accuracy faster than EF.
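
For orientation, the classic error-feedback (EF) mechanism that the abstract builds on can be sketched as follows. This is only the baseline that SA-PEF is said to recover at $α=0$; the step-ahead correction and the partial feedback are not modeled, and the function names and the top-$k$ compressor are choices made here for illustration.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude coordinates and zero the rest (a contractive compressor)."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(vec)]

def ef_step(grad, residual, lr, k):
    """One classic EF step: returns (compressed message, new residual)."""
    # Add back what earlier rounds failed to transmit.
    corrected = [lr * g + e for g, e in zip(grad, residual)]
    message = top_k(corrected, k)
    # Whatever the compressor dropped is carried to the next round.
    new_residual = [c - m for c, m in zip(corrected, message)]
    return message, new_residual

residual = [0.0, 0.0, 0.0]
sent_total = [0.0, 0.0, 0.0]
for grad in [[1.0, -2.0, 0.5], [0.25, 0.5, -1.0]]:
    message, residual = ef_step(grad, residual, lr=0.5, k=1)
    sent_total = [s + m for s, m in zip(sent_total, message)]
print(sent_total, residual)
```

The invariant worth noticing is that the transmitted updates plus the current residual always equal the sum of the scaled gradients, so compression delays information rather than discarding it. A slowly decaying residual is the effect the abstract describes under non-IID data.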

2601.10494 2026-05-26 stat.ML cs.LG

CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data

CROCS:一种基于智能电表数据的以行为为中心的消费者细分的两阶段聚类框架

Luke W. Yerbury, Ricardo J. G. B. Campello, G. C. Livingston, Mark Goldsworthy, Lachlan O'Neil

AI总结 提出CROCS两阶段聚类框架,通过消费者日常负荷曲线的独立聚类和基于加权最小距离的集合间比较,实现鲁棒且可扩展的消费者行为细分。

详情
AI中文摘要

随着电网运营商面临可再生能源整合和电气化推广带来的不确定性增加,需求侧管理(DSM)——特别是需求响应(DR)——作为一种平衡现代电力系统的成本效益机制引起了广泛关注。全球持续部署的智能电表提供了前所未有的消费数据量,使得基于实际用电行为的消费者细分成为可能,有望为设计更有效的DSM和DR计划提供信息。然而,现有的基于聚类的细分方法未能充分反映消费者的行为多样性,通常依赖于严格的时间对齐,并且在存在异常值、缺失数据或大规模部署时表现不佳。为了解决这些挑战,我们提出了一种新颖的两阶段聚类框架——优化消费者细分的聚类表示(CROCS)。在第一阶段,每个消费者的每日负荷曲线被独立聚类,形成代表性负荷集(RLS),提供其典型日间消费行为的紧凑摘要。在第二阶段,使用加权最小距离和(WSMD)对消费者进行聚类,这是一种新颖的集合间度量,通过考虑这些行为的普遍性和相似性来比较RLS。最后,对WSMD诱导图进行社区检测,揭示体现定义消费者群体的共享日间行为的高阶原型,从而增强所得聚类的可解释性。在合成和真实澳大利亚智能电表数据集上的大量实验表明,CROCS能够捕捉消费者内部变异性,发现同步和异步行为相似性,对异常值和缺失数据保持鲁棒性,并通过自然并行化实现高效扩展。这些结果...

英文摘要

With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) -- particularly Demand Response (DR) -- has attracted significant attention as a cost-effective mechanism for balancing modern electricity systems. Unprecedented volumes of consumption data from a continuing global deployment of smart meters enable consumer segmentation based on real usage behaviours, promising to inform the design of more effective DSM and DR programs. However, existing clustering-based segmentation methods insufficiently reflect the behavioural diversity of consumers, often relying on rigid temporal alignment, and faltering in the presence of anomalies, missing data, or large-scale deployments. To address these challenges, we propose a novel two-stage clustering framework -- Clustered Representations Optimising Consumer Segmentation (CROCS). In the first stage, each consumer's daily load profiles are clustered independently to form a Representative Load Set (RLS), providing a compact summary of their typical diurnal consumption behaviours. In the second stage, consumers are clustered using the Weighted Sum of Minimum Distances (WSMD), a novel set-to-set measure that compares RLSs by accounting for both the prevalence and similarity of those behaviours. Finally, community detection on the WSMD-induced graph reveals higher-order prototypes that embody the shared diurnal behaviours defining consumer groups, enhancing the interpretability of the resulting clusters. Extensive experiments on both synthetic and real Australian smart meter datasets demonstrate that CROCS captures intra-consumer variability, uncovers both synchronous and asynchronous behavioural similarities, and remains robust to anomalies and missing data, while scaling efficiently through natural parallelisation. These results...

2512.24434 2026-05-26 math.CO math.ST stat.TH

The non-backtracking random walk and its usage for vertex clustering

非回溯随机游走及其在顶点聚类中的应用

Marianna Bolla

AI总结 本文通过非回溯矩阵和转移概率矩阵的特征值关系,提出基于非回溯随机游走的顶点聚类方法,并开发“膨胀-收缩”技术应用于实际图数据。

详情
Comments
18 pages
AI中文摘要

在稀疏图的情况下,考虑了非回溯矩阵的实特征值与非回溯转移概率矩阵的实特征值之间的关系,并应用于顶点聚类。为此,考虑了沿非回溯图的随机游走,其顶点是双向边,邻接关系取决于随机游走是否按照“下一步不回溯”的规则通过有向边。这被编码为非回溯矩阵,即非回溯图的邻接矩阵。转移概率矩阵的结构性实特征值与非回溯矩阵相应特征值的常数倍相关,其一致性表明图背后存在稀疏随机块模型。还开发了“膨胀-收缩”技术用于原始图的顶点聚类,并给出了实际应用。

英文摘要

In the case of sparse graphs, the relation between the real eigenvalues of the non-backtracking matrix and those of the non-backtracking transition probability matrix is considered with respect to vertex clustering. For this purpose, the random walk along the non-backtracking graph is considered, the vertices of which are the bioriented edges, and the adjacency relation depends on whether the random walk goes through the oriented edges with the rule of ``not going back in the next step''. This is encoded in the non-backtracking matrix, which is the adjacency matrix of the non-backtracking graph. The structural real eigenvalues of the transition probability matrix are related to constant multiples of those of the non-backtracking matrix, the concordance of which indicates the existence of a sparse stochastic block model behind the graph. ``Inflation--deflation'' techniques are also developed for clustering the vertices of the original graph, together with real-world applications.
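
The construction described in the abstract can be sketched directly: each undirected edge becomes two oriented edges, and the oriented edge $u \to v$ is adjacent to $v \to w$ whenever $w \neq u$. The row/column convention and the function names below are assumptions made for this illustration; some references use the transpose.

```python
def non_backtracking(edges):
    """Return (directed_edges, B) for a simple undirected graph given as (u, v) pairs."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    m = len(directed)
    B = [[0] * m for _ in range(m)]
    for (u, v) in directed:
        for (x, w) in directed:
            # The step u->v may be followed by v->w unless it returns to u.
            if x == v and w != u:
                B[index[(u, v)]][index[(x, w)]] = 1
    return directed, B

def transition_matrix(B):
    """Row-normalise B; rows with no allowed continuation (dead ends) stay zero."""
    P = []
    for row in B:
        s = sum(row)
        P.append([x / s if s else 0.0 for x in row])
    return P

# Triangle: from every oriented edge there is exactly one non-backtracking continuation.
directed, B = non_backtracking([(0, 1), (1, 2), (2, 0)])
print(len(directed), [sum(row) for row in B])
```

With this convention the row sum for the oriented edge $u \to v$ equals $\deg(v) - 1$, so the walk stops at leaves. This is one reason the transition probability matrix and the plain non-backtracking matrix can have different spectra on irregular graphs.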

2512.23956 2026-05-26 stat.ML cs.LG

Implicit geometric regularization in flow matching via density weighted Stein operators

通过密度加权Stein算子的流匹配中的隐式几何正则化

Shinto Eguchi

AI总结 提出γ-流匹配(γ-FM),通过动态密度加权策略隐式正则化高维空间中的回归几何,改善向量场平滑性和采样效率。

详情
Comments
Revised version
AI中文摘要

流匹配(FM)已成为连续归一化流的一个强大范式,但标准FM隐式地在整个环境空间上进行未加权的$L^2$回归。在高维空间中,这导致了一个根本性的低效:绝大多数积分区域由低密度的“空洞”区域组成,其中目标速度场通常是混沌或定义不良的。在本文中,我们提出了γ-流匹配(γ-FM),一种密度加权变体,它将回归几何与底层概率流对齐。虽然密度加权是可取的,但朴素实现需要评估难以处理的目标密度。我们通过引入一种动态密度加权策略来规避这一点,该策略直接从训练粒子估计目标密度。这种方法使我们能够动态降低空洞区域中的回归损失,而不损害FM的无模拟特性。理论上,我们证明了γ-FM在赋予γ-Stein度量的统计流形上最小化传输成本。谱分析进一步表明,这种几何结构引入了隐式Sobolev正则化,有效地抑制了空洞区域中的高频振荡。实验上,γ-FM显著改善了高维潜在数据集上的向量场平滑性和采样效率,同时展示了对异常值的内在鲁棒性。

英文摘要

Flow Matching (FM) has emerged as a powerful paradigm for continuous normalizing flows, yet standard FM implicitly performs an unweighted $L^2$ regression over the entire ambient space. In high dimensions, this leads to a fundamental inefficiency: the vast majority of the integration domain consists of low-density ``void'' regions where the target velocity fields are often chaotic or ill-defined. In this paper, we propose {$γ$-Flow Matching ($γ$-FM)}, a density-weighted variant that aligns the regression geometry with the underlying probability flow. While density weighting is desirable, naive implementations would require evaluating the intractable target density. We circumvent this by introducing a Dynamic Density-Weighting strategy that estimates the \emph{target} density directly from training particles. This approach allows us to dynamically downweight the regression loss in void regions without compromising the simulation-free nature of FM. Theoretically, we establish that $γ$-FM minimizes the transport cost on a statistical manifold endowed with the $γ$-Stein metric. Spectral analysis further suggests that this geometry induces an implicit Sobolev regularization, effectively damping high-frequency oscillations in void regions. Empirically, $γ$-FM significantly improves vector field smoothness and sampling efficiency on high-dimensional latent datasets, while demonstrating intrinsic robustness to outliers.

2512.18508 2026-05-26 stat.ME cs.AI cs.SY eess.SP eess.SY

Selection-Induced Contraction of Innovation Statistics in Gated Kalman Filters

门控卡尔曼滤波中创新统计量的选择诱导收缩

Barak Or

AI总结 本文证明在门控卡尔曼滤波中,经过门控后的创新统计量收敛于门控条件而非名义量,并推导了椭球门控下创新的一阶和二阶矩的精确表达式,揭示了门控引起的确定性协方差收缩,并扩展至最近邻关联分析。

详情
Comments
9 pages, preprint
AI中文摘要

验证门控是经典卡尔曼跟踪系统的基本组成部分。只有归一化创新平方(NIS)低于规定阈值的测量值才被考虑用于状态更新。虽然这个过程在统计上基于卡方分布,但它隐含地将无条件创新过程替换为条件观测过程,仅限于验证事件。本文表明,门控后计算的创新统计量收敛于门控条件量而非名义量。在线性高斯假设下,我们推导了椭球门控条件下创新的一阶和二阶矩的精确表达式,并证明门控会引起创新协方差的确定性、维度依赖的收缩。分析扩展至最近邻(NN)关联,后者被证明是一个额外的统计选择算子。我们证明,在多个门内测量中选择最小范数创新会引入不可避免的能量收缩,这意味着在非平凡门控和关联下,名义创新统计量无法保持。二维情况下的闭式结果量化了组合效应并说明了其实际意义。

英文摘要

Validation gating is a fundamental component of classical Kalman-based tracking systems. Only measurements whose normalized innovation squared (NIS) falls below a prescribed threshold are considered for state update. While this procedure is statistically motivated by the chi-square distribution, it implicitly replaces the unconditional innovation process with a conditionally observed one, restricted to the validation event. This paper shows that innovation statistics computed after gating converge to gate-conditioned rather than nominal quantities. Under classical linear--Gaussian assumptions, we derive exact expressions for the first- and second-order moments of the innovation conditioned on ellipsoidal gating, and show that gating induces a deterministic, dimension-dependent contraction of the innovation covariance. The analysis is extended to nearest-neighbor (NN) association, which is shown to act as an additional statistical selection operator. We prove that selecting the minimum-norm innovation among multiple in-gate measurements introduces an unavoidable energy contraction, implying that nominal innovation statistics cannot be preserved under nontrivial gating and association. Closed-form results in the two-dimensional case quantify the combined effects and illustrate their practical significance.
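
The covariance contraction under an ellipsoidal gate can be illustrated in two dimensions. The sketch assumes a zero-mean Gaussian innovation $ν \sim N(0, S)$ and a gate $ν^\top S^{-1} ν \le g$. By a standard truncated chi-square calculation (derived here, not quoted from the paper), the gated covariance is $c\,S$ with $c = F_{4}(g)/F_{2}(g)$, where $F_d$ is the chi-square CDF with $d$ degrees of freedom. The function names are hypothetical, and the Monte Carlo check works in whitened coordinates.

```python
import math
import random

def contraction_2d(g):
    """Factor c such that E[nu nu^T | NIS <= g] = c * S for 2-D Gaussian innovations."""
    tail = math.exp(-g / 2.0)
    cdf_2 = 1.0 - tail                       # chi-square CDF, 2 degrees of freedom
    cdf_4 = 1.0 - tail * (1.0 + g / 2.0)     # chi-square CDF, 4 degrees of freedom
    return cdf_4 / cdf_2

def monte_carlo_contraction_2d(g, n=200_000, seed=0):
    """Estimate the same factor from gated samples of a whitened innovation."""
    rng = random.Random(seed)
    total, kept = 0.0, 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        nis = z1 * z1 + z2 * z2
        if nis <= g:
            total += nis
            kept += 1
    # The trace of c * I is 2c, so the mean gated NIS divided by 2 estimates c.
    return total / (2.0 * kept)

gate = 9.21  # roughly the 99% chi-square gate in two dimensions
print(contraction_2d(gate), monte_carlo_contraction_2d(gate))
```

Even with a gate that accepts about 99% of nominal innovations, the factor is strictly below one, so an innovation-covariance estimate computed from gated measurements is biased low unless the gate is accounted for.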

2511.08870 2026-05-26 math.ST stat.TH

Gaussian Approximation for High-Dimensional Second-Order $U$- and $V$-statistics with Size-Dependent Kernels under i.n.i.d. Sampling

高维二阶 $U$- 和 $V$-统计量在独立非同分布抽样下具有大小依赖核的高斯逼近

Shunsuke Imai

AI总结 针对核依赖于样本大小的高维二阶 $U$- 和 $V$-统计量,在独立非同分布抽样下建立高斯逼近,统一处理非退化和退化情形,并涵盖加权、两样本统计量等特例。

详情
AI中文摘要

我们为二阶 $U$- 和 $V$-统计量形成的高维向量开发了高斯逼近,这些统计量的核在独立但非同分布(i.n.i.d.)抽样下依赖于样本大小。我们的结果不依赖于 Hoeffding 分解中哪个分量占主导地位,因此涵盖了非退化和退化情形作为特例。通过允许 i.n.i.d. 抽样,我们分析的统计量类别包括加权 $U$- 和 $V$-统计量以及两样本 $U$- 和 $V$-统计量作为特例,这些涵盖了具有许多协变量的回归模型中的参数估计、许多弱工具变量以及广泛的平滑两样本检验和分别可交换数组等。此外,我们在 i.n.i.d. 设置下开发了具有大小依赖核的高维 $U$-统计量的极大不等式,其形式在广泛的应用中保持尖锐,这可能具有独立的意义。

英文摘要

We develop Gaussian approximations for high-dimensional vectors formed by second-order $U$- and $V$-statistics whose kernels depend on sample size under independent but not identically distributed (i.n.i.d.) sampling. Our results hold irrespective of which component of the Hoeffding decomposition is dominant, thereby covering both non-degenerate and degenerate regimes as special cases. By allowing i.n.i.d.~sampling, the class of statistics we analyze includes weighted $U$- and $V$-statistics and two-sample $U$- and $V$-statistics as special cases, which cover estimators of parameters in regression models with many covariates, many-weak instruments as well as a broad class of smoothed two-sample tests and the separately exchangeable arrays, among others. In addition, we develop maximal inequalities for high-dimensional $U$-statistics with size-dependent kernels under the i.n.i.d.~setting, in a form that remains sharp across a broad range of applications, which may be of independent interest.

2511.03963 2026-05-26 stat.ML cs.LG

Robust inference using density-powered Stein operators

使用密度驱动的Stein算子进行稳健推断

Shinto Eguchi

AI总结 提出基于γ-散度的γ-Stein算子,通过密度加权实现未归一化概率模型的稳健推断,并应用于稳健拟合优度检验和贝叶斯后验近似。

详情
Comments
Revised version
AI中文摘要

我们引入了Stein算子的密度幂加权变体,称为γ-Stein算子。这是一类从γ-散度导出的新型算子,旨在为未归一化概率模型构建稳健的推断方法。该算子的构造(通过模型密度的正幂γ进行加权)固有地降低了异常值的影响,提供了一种稳健性的原则性机制。应用该算子产生了得分匹配的稳健推广,保留了不依赖于模型归一化常数的关键性质。我们将此框架扩展到两个关键应用:用于稳健拟合优度检验的γ-核化Stein散度,以及用于稳健贝叶斯后验近似的γ-Stein变分梯度下降。在受污染的高斯和四次势模型上的实验结果表明,我们的方法在稳健性和统计效率上显著优于标准基线。

英文摘要

We introduce a density-power weighted variant for the Stein operator, called the $γ$-Stein operator. This is a novel class of operators derived from the $γ$-divergence, designed to build robust inference methods for unnormalized probability models. The operator's construction (weighting by the model density raised to a positive power $γ$) inherently down-weights the influence of outliers, providing a principled mechanism for robustness. Applying this operator yields a robust generalization of score matching that retains the crucial property of being independent of the model's normalizing constant. We extend this framework to develop two key applications: the $γ$-kernelized Stein discrepancy for robust goodness-of-fit testing, and $γ$-Stein variational gradient descent for robust Bayesian posterior approximation. Empirical results on contaminated Gaussian and quartic potential models show our methods significantly outperform standard baselines in both robustness and statistical efficiency.

2510.20954 2026-05-26 stat.ML cs.LG eess.SP

A Spectral Framework for Graph Neural Operators: Convergence Guarantees and Tradeoffs

图神经算子的谱框架:收敛保证与权衡

Roxanne Holden, Luana Ruiz

AI总结 本文提出统一谱框架,分析图神经算子在无正则性、全局Lipschitz连续和分段Lipschitz连续假设下的收敛率与权衡。

详情
AI中文摘要

图极限(Graphons)作为图序列的极限,为分析图神经算子的渐近行为提供了算子理论框架。采样图到图极限的谱收敛诱导了相应神经算子的收敛,从而实现了图神经网络(GNN)的可迁移性分析。本文开发了一个统一的谱框架,将不同假设下(包括无正则性、全局Lipschitz连续和分段Lipschitz连续)的收敛结果整合在一起。该框架将这些结果置于公共算子环境中,便于直接比较其假设、收敛率和权衡。我们进一步在合成图和真实世界图上展示了这些率的经验紧致性。

英文摘要

Graphons, as limits of graph sequences, provide an operator-theoretic framework for analyzing the asymptotic behavior of graph neural operators. Spectral convergence of sampled graphs to graphons induces convergence of the corresponding neural operators, enabling transferability analyses of graph neural networks (GNNs). This paper develops a unified spectral framework that brings together convergence results under different assumptions on the underlying graphon, including no regularity, global Lipschitz continuity, and piecewise-Lipschitz continuity. The framework places these results in a common operator setting, enabling direct comparison of their assumptions, convergence rates, and tradeoffs. We further illustrate the empirical tightness of these rates on synthetic and real-world graphs.

2510.17578 2026-05-26 math.ST stat.TH

A robust and scalable estimation for high-dimensional volatility models

高维波动率模型的稳健可扩展估计

Kejun Chen, Yuchang Lin, Qianqian Zhu

AI总结 针对BEKK-ARCH类高维波动率模型,提出一种结合数据截断和正则化最小二乘的稳健估计框架,建立了非渐近误差界和极小极大最优收敛率,并引入稳健BIC和岭型估计器实现模型选择,在计算速度和预测精度上优于现有方法。

详情
AI中文摘要

本文针对BEKK-ARCH类高维波动率模型,引入了一种稳健且计算高效的估计框架。所提方法采用数据截断以确保对重尾分布的稳健性,并利用正则化最小二乘法在高维设置中实现高效优化。在重尾机制下,建立了所得估计量的非渐近误差界,并推导了极小极大最优收敛率。此外,引入了稳健BIC和岭型估计器,分别用于选择模型阶数和BEKK分量数量,并在重尾设置下建立了它们的选择一致性。模拟研究展示了所提方法的有限样本性能,两个实证应用说明了其实用性。结果表明,新框架在计算速度和预测精度上均优于现有替代方法。

英文摘要

This paper introduces a robust and computationally efficient estimation framework for high-dimensional volatility models in the BEKK-ARCH class. The proposed approach employs data truncation to ensure robustness against heavy-tailed distributions and utilizes a regularized least squares method for efficient optimization in high-dimensional settings. Non-asymptotic error bounds are established for the resulting estimators under heavy-tailed regimes, and the minimax optimal convergence rate is derived. Moreover, a robust BIC and a Ridge-type estimator are introduced for selecting the model order and the number of BEKK components, respectively, with their selection consistency established under heavy-tailed settings. Simulation studies demonstrate finite-sample performance of the proposed method, and two empirical applications illustrate its practical utility. The results show that the new framework outperforms existing alternatives in both computational speed and forecasting accuracy.

2510.15284 2026-05-26 cs.LG math.ST stat.TH

Small Ensemble-based Data Assimilation: A Machine Learning-Enhanced Data Assimilation Method with Limited Ensemble Size

基于小集合的数据同化:一种机器学习增强的有限集合数据同化方法

Zhilin Li, Zhou Yao, Xianglong Li, Zeng Liu, Zhaokuan Lu, Shanlin Xu, Seungnam Kim, Guangyao Wang

AI总结 提出一种结合集合卡尔曼滤波与全连接神经网络的机器学习数据同化方法,通过小集合生成初步分析状态并用神经网络预测修正项,在几乎不增加计算成本下提升精度。

详情
AI中文摘要

基于集合的数据同化方法因其处理非线性动态问题的固有能力而日益流行。然而,这些方法通常面临分析精度与计算效率之间的权衡,因为更高精度所需的更大集合规模也会导致更高的计算成本。在本研究中,我们提出了一种新颖的基于机器学习的数据同化方法,将传统的集合卡尔曼滤波与全连接神经网络相结合。具体而言,我们的方法使用相对较小的集合规模通过EnKF生成初步但次优的分析状态。然后利用FCNN学习并预测这些状态的修正项,从而减轻有限集合规模导致的性能下降。我们通过涉及Lorenz系统和非线性海浪场模拟的数值实验评估了所提出的EnKF-FCNN方法的性能。结果一致表明,新方法在相同集合规模下比传统EnKF实现了更高的精度,同时几乎不增加额外计算成本。此外,EnKF-FCNN方法通过与不同模型耦合以及使用替代的基于集合的数据同化方法,可适应多种应用。

英文摘要

Ensemble-based data assimilation (DA) methods have become increasingly popular due to their inherent ability to address nonlinear dynamic problems. However, these methods often face a trade-off between analysis accuracy and computational efficiency, as larger ensemble sizes required for higher accuracy also lead to greater computational cost. In this study, we propose a novel machine learning-based data assimilation approach that combines the traditional ensemble Kalman filter (EnKF) with a fully connected neural network (FCNN). Specifically, our method uses a relatively small ensemble size to generate preliminary yet suboptimal analysis states via EnKF. An FCNN is then employed to learn and predict correction terms for these states, thereby mitigating the performance degradation induced by the limited ensemble size. We evaluate the performance of our proposed EnKF-FCNN method through numerical experiments involving Lorenz systems and nonlinear ocean wave field simulations. The results consistently demonstrate that the new method achieves higher accuracy than traditional EnKF with the same ensemble size, while incurring negligible additional computational cost. Moreover, the EnKF-FCNN method is adaptable to diverse applications through coupling with different models and the use of alternative ensemble-based DA methods.

2510.00479 2026-05-26 physics.flu-dyn stat.ML

On the joint estimation of flow fields and particle properties from Lagrangian data

关于从拉格朗日数据联合估计流场和粒子属性的研究

Ke Zhou, Samuel J. Grauer

AI总结 本文通过数据同化框架联合估计流场和未知粒子属性(位置、大小、密度),在湍流边界层、均匀各向同性湍流和可压缩激波主导流中验证了可行性,并分析了种子密度、噪声水平和斯托克斯数对重建精度的影响。

详情
AI中文摘要

我们数值研究了从拉格朗日粒子追踪(LPT)数据联合估计流场和未知粒子属性(如位置、大小和密度)的可行性和极限。LPT提供时间分辨的粒子轨迹体积测量,这些粒子是载流流体运动的标记。然而,实验轨迹在空间上是稀疏的且可能有噪声,而流场重建问题可能因惯性粒子输运而进一步复杂化,因此必须确定粒子滑移速度才能获取载流流体的速度场。为了解决这个问题,我们开发了一个数据同化框架,将流的欧拉表示与拉格朗日粒子模型耦合,从而在弥散多相流控制方程下同时推断载流场和粒子属性。我们展示了在三种代表性情况下可以联合估计流场和粒子属性:(1)在带有噪声示踪轨迹的湍流边界层中(St→0),联合估计流场和真实粒子位置,这相当于一个物理信息粒子追踪问题;(2)在播种了惯性粒子(St~1-5)的均匀各向同性湍流中,我们演示了同时恢复流态和粒子直径,显示了隐式粒子表征的可行性;(3)在可压缩、激波主导的流中,我们首次报告了速度、压力、密度和惯性粒子属性(直径和密度)的联合重建,突出了超音速区域联合估计的潜力和某些限制。系统的敏感性研究揭示了种子密度、噪声水平和斯托克斯数如何影响我们方法的重建精度。

英文摘要

We numerically investigate the feasibility and limits of jointly estimating flow fields and unknown particle properties (e.g., position, size, and density) from Lagrangian particle tracking (LPT) data. LPT offers time-resolved, volumetric measurements of particle trajectories, which are markers of the carrier fluid motion. However, experimental tracks are spatially sparse and potentially noisy, and the problem of reconstructing flow fields may be further complicated by inertial particle transport, such that particle slip velocities must be determined to access the velocity field of the carrier fluid. To address this problem, we develop a data assimilation framework that couples an Eulerian representation of the flow with Lagrangian particle models, enabling the simultaneous inference of carrier fields and particle properties under the governing equations of disperse multiphase flow. We show that flow fields and particle properties can be jointly estimated in three representative regimes: (1) In a turbulent boundary layer with noisy tracer tracks (St → 0), flow fields and true particle positions are jointly estimated, which amounts to a physics-informed particle tracking problem; (2) in homogeneous isotropic turbulence seeded with inertial particles (St ~ 1-5), we demonstrate simultaneous recovery of flow states and particle diameters, showing the feasibility of implicit particle characterization; and (3) in a compressible, shock-dominated flow, we report the first joint reconstructions of velocity, pressure, density, and inertial particle properties (diameter and density), highlighting both the potential and certain limits of joint estimation in supersonic regimes. A systematic sensitivity study reveals how the seeding density, noise level, and Stokes number govern reconstruction accuracy for our method.

2509.25507 2026-05-26 stat.ML cs.LG math.ST stat.ME stat.TH

One-shot Conditional Sampling: MMD meets Nearest Neighbors

一次性条件采样:MMD 遇见最近邻

Anirban Chatterjee, Sayantan Choudhury, Rohan Hore

AI总结 提出 CGMMD 框架,通过最小化最大均值差异(MMD)实现一次性条件采样,理论保证收敛性,并在图像去噪和超分辨率等任务中表现优异。

详情
Comments
Accepted at the 43rd International Conference on Machine Learning (ICML 2026)
AI中文摘要

我们如何从从未完全观察到的条件分布中生成样本?这个问题在现代机器学习和经典统计学的广泛应用中都会出现,包括计算机视觉中的图像后处理、基于模拟的推理中的近似后验采样以及复杂数据设置中的条件分布建模。在这种情况下,与无条件采样相比,可以利用额外的特征信息来实现更自适应和高效的采样。基于此,我们引入了使用 MMD 的条件生成器(CGMMD),一种用于条件采样的新颖框架。与许多当代方法不同,我们的方法将训练目标设定为一个简单的、无对抗的直接最小化问题。CGMMD 的一个关键特性是它能够在生成器的单次前向传播中产生条件样本,从而实现实际的一次性采样,测试时复杂度低。我们建立了从 CGMMD 采样器采样时产生的损失的严格理论界限,并证明了估计分布向真实条件分布的收敛性。在此过程中,我们还开发了基于最近邻的泛函的一致集中结果,这可能具有独立的研究价值。最后,我们展示了 CGMMD 在涉及复杂条件密度的合成任务以及实际应用(如图像去噪和图像超分辨率)中具有竞争力的表现。

英文摘要

How can we generate samples from a conditional distribution that we never fully observe? This question arises across a broad range of applications in both modern machine learning and classical statistics, including image post-processing in computer vision, approximate posterior sampling in simulation-based inference, and conditional distribution modeling in complex data settings. In such settings, compared with unconditional sampling, additional feature information can be leveraged to enable more adaptive and efficient sampling. Building on this, we introduce Conditional Generator using MMD (CGMMD), a novel framework for conditional sampling. Unlike many contemporary approaches, our method frames the training objective as a simple, adversary-free direct minimization problem. A key feature of CGMMD is its ability to produce conditional samples in a single forward pass of the generator, enabling practical one-shot sampling with low test-time complexity. We establish rigorous theoretical bounds on the loss incurred when sampling from the CGMMD sampler, and prove convergence of the estimated distribution to the true conditional distribution. In the process, we also develop a uniform concentration result for nearest-neighbor based functionals, which may be of independent interest. Finally, we show that CGMMD performs competitively on synthetic tasks involving complex conditional densities, as well as on practical applications such as image denoising and image super-resolution.
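As background for the discrepancy that CGMMD minimizes, the following is a minimal sketch of the plain (unconditional) squared MMD between two samples. It is not the authors' conditional, nearest-neighbor-based objective. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions chosen for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of MMD^2 between samples xs and ys."""

    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        # Average kernel value over all pairs (u, v) with u in a, v in b.
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    near = [(0.0,), (1.0,)]
    far = [(5.0,), (6.0,)]
    print(mmd_squared(near, near), mmd_squared(near, far))
```

The V-statistic form is used here because it is non-negative by construction; an unbiased U-statistic variant would drop the diagonal terms of the two within-sample averages.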

2506.01945 2026-05-26 econ.EM cs.LG stat.AP

Stock Market Telepathy: Graph Neural Networks Predicting the Secret Conversations between MINT and G7 Countries

股市读心术:图神经网络预测MINT与G7国家之间的秘密对话

Nurbanu Bursa

AI总结 使用MTGNN图神经网络分析2012-2024年G7与MINT国家股市指数,揭示美国、加拿大、印尼和土耳其的影响力,并证明该方法优于传统预测模型。

详情
Journal ref
Communications in Statistics: Case Studies, Data Analysis and Applications (2026)
AI中文摘要

新兴经济体,特别是MINT国家(墨西哥、印度尼西亚、尼日利亚和土耳其),在全球股市中的影响力日益增强,尽管它们仍易受G7(加拿大、法国、德国、意大利、日本、英国和美国)等发达国家经济状况的影响。金融市场的这种相互关联性和敏感性使得理解这些关系对于投资者和政策制定者准确预测股价走势至关重要。为此,我们研究了2012年至2024年G7和MINT国家的主要股市指数,使用了一种称为多元时间序列图神经网络(MTGNN)的最新图神经网络算法。该方法允许考虑多元时间序列中复杂的时空连接。在实现中,MTGNN揭示出,在预测过程中,美国和加拿大是对股市指数最具影响力的G7国家,而印度尼西亚和土耳其是最具影响力的MINT国家。此外,我们的结果表明,MTGNN在预测MINT和G7国家股市指数价格方面优于传统方法。因此,该研究为各经济集团的市场提供了宝贵的见解,并提出了一种使用MTGNN分析全球股市动态的令人信服的实证方法。

英文摘要

Emerging economies, particularly the MINT countries (Mexico, Indonesia, Nigeria, and Türkiye), are gaining influence in global stock markets, although they remain susceptible to the economic conditions of developed countries like the G7 (Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States). This interconnectedness and sensitivity of financial markets make understanding these relationships crucial for investors and policymakers to predict stock price movements accurately. To this end, we examined the main stock market indices of G7 and MINT countries from 2012 to 2024, using a recent graph neural network (GNN) algorithm called multivariate time series forecasting with graph neural network (MTGNN). This method allows for considering complex spatio-temporal connections in multivariate time series. In the implementations, MTGNN revealed that the US and Canada are the most influential G7 countries regarding stock indices in the forecasting process, and Indonesia and Türkiye are the most influential MINT countries. Additionally, our results showed that MTGNN outperformed traditional methods in forecasting the prices of stock market indices for MINT and G7 countries. Consequently, the study offers valuable insights into economic blocs' markets and presents a compelling empirical approach to analyzing global stock market dynamics using MTGNN.

2505.01357 2026-05-26 stat.ME math.ST stat.TH

Weight-calibrated estimation for factor models of high-dimensional time series

高维时间序列因子模型的权重校准估计

Xinghao Qiao, Zihan Wang, Qiwei Yao, Bo Zhang

AI总结 针对高维时间序列因子模型,提出一种权重校准方法,通过线性投影和降秩自回归改进估计性能,并首次系统比较不同因子强度下的协方差基、标准自协方差基和权重校准自协方差基方法。

详情
Comments
This version is the accepted version by Journal of the American Statistical Association
AI中文摘要

高维时间序列的因子建模在发现潜在公共成分以进行降维和信息提取方面非常强大。大多数现有的估计方法可分为两类:基于协方差的方法(在渐近可识别假设下)和基于自协方差的方法(具有白噪声特质成分)。本文遵循基于自协方差的框架,开发了一种新颖的权重校准方法以提高估计性能。它采用线性投影来处理高维性,并采用降秩自回归公式。建立了所提出方法的渐近理论,放宽了白噪声假设。此外,我们在文献中首次尝试,在存在不同强度的因子的情况下,对基于协方差、标准基于自协方差和我们提出的权重校准基于自协方差的方法进行了系统的理论比较。进行了大量模拟,以展示我们提出的方法优越的有限样本性能,并验证新建立的理论。通过对一个金融数据集和一个宏观经济数据集的分析,进一步说明了我们提出的方法的优越性。

英文摘要

Factor modeling for high-dimensional time series is powerful in discovering latent common components for dimension reduction and information extraction. Most available estimation methods fall into two categories: covariance-based methods, which rely on an asymptotic identifiability assumption, and autocovariance-based methods, which assume white idiosyncratic noise. This paper follows the autocovariance-based framework and develops a novel weight-calibrated method to improve the estimation performance. It adopts a linear projection to tackle high-dimensionality, and employs a reduced-rank autoregression formulation. The asymptotic theory of the proposed method is established, relaxing the assumption on white noise. Additionally, we make the first attempt in the literature to provide a systematic theoretical comparison among the covariance-based, the standard autocovariance-based, and our proposed weight-calibrated autocovariance-based methods in the presence of factors with different strengths. Extensive simulations are conducted to showcase the superior finite-sample performance of our proposed method, as well as to validate the newly established theory. The superiority of our proposal is further illustrated through the analysis of one financial data set and one macroeconomic data set.

2504.03390 2026-05-26 math.ST stat.TH

Estimation of Population Linear Spectral Statistics by Marchenko--Pastur Inversion

通过Marchenko-Pastur逆变换估计总体线性谱统计量

Ben Deitmar

AI总结 提出一种从高维数据估计总体线性谱统计量的新方法,在非参数设定下首次证明收敛速度为O(n^{ε-1}),并对高斯数据给出标准化因子n的估计误差中心极限定理。

详情
Comments
78 pages, 14 figures
AI中文摘要

提出一种从高维数据估计总体线性谱统计量的新方法。当维度$d$随样本量$n$增长且满足$\frac{d}{n} \to c>0$时,该方法是首个在一般非参数设定下具有$\mathcal{O}(n^{\varepsilon - 1})$(对任意$\varepsilon > 0$)收敛速度的方法。对于高斯数据,给出了估计误差以$n$为标准化因子的中心极限定理。

英文摘要

A new method of estimating population linear spectral statistics from high-dimensional data is introduced. When the dimension $d$ grows with the sample size $n$ such that $\frac{d}{n} \to c>0$, the proposed method is the first with proven convergence rate of $\mathcal{O}(n^{\varepsilon - 1})$ for any $\varepsilon > 0$ in a general nonparametric setting. For Gaussian data, a CLT for the estimation error with normalization factor $n$ is shown.

2503.19605 2026-05-26 cs.LG cs.CL math.ST stat.TH

Lean Formalization of Generalization Error Bound by Rademacher Complexity and Dudley's Entropy Integral

Rademacher复杂度和Dudley熵积分的泛化误差界的Lean形式化

Sho Sonoda, Kazumi Kasaura, Yuma Mizuno, Kei Tsukamoto, Naoto Onda

AI总结 本文在Lean 4中形式化了基于Rademacher复杂度的泛化误差界,通过形式化对称化论证、有界差异分析和McDiarmid不等式,并扩展到可数假设类及可分离拓扑索引集,最后应用得到线性预测器的经验Rademacher界和Dudley熵积分界。

详情
Comments
accepted at ITP2026
AI中文摘要

理解和证明机器学习算法的泛化性能——即从训练误差获得测试误差的理论估计——是统计学习理论的核心主题。在用于推导此类保证的众多复杂度度量中,Rademacher复杂度提供了尖锐的、数据相关的界,其适用范围远超经典的VC维理论。在本研究中,我们基于Mathlib库中可用的测度论概率论,在Lean 4中形式化了Rademacher复杂度的泛化误差界。我们的开发提供了一个经过机械检查的流水线,从经验和期望Rademacher复杂度的定义开始,经过形式化的对称化论证和有界差异分析,通过形式化证明的McDiarmid不等式得到高概率一致偏差界。一个关键的技术贡献是可重用机制,通过归约到可数稠密子集,将结果从可数假设类(其中上确界的可测性在Mathlib中直接成立)提升到可分离拓扑索引集。作为抽象定理的工作应用,我们机械化了$\ell_2$和$\ell_1$正则化下线性预测器的标准经验Rademacher界,并且我们还形式化了基于覆盖数和链式构造的Dudley型熵积分界。

英文摘要

Understanding and certifying the generalization performance of machine learning algorithms -- i.e. obtaining theoretical estimates of the test error from the training error -- is a central theme of statistical learning theory. Among the many complexity measures used to derive such guarantees, Rademacher complexity yields sharp, data-dependent bounds that apply well beyond classical VC-dimension theory. In this study, we formalize the generalization error bound by Rademacher complexity in Lean 4, building on measure-theoretic probability theory available in the Mathlib library. Our development provides a mechanically-checked pipeline from the definitions of empirical and expected Rademacher complexity, through a formal symmetrization argument and a bounded-differences analysis, to high-probability uniform deviation bounds via a formally proved McDiarmid inequality. A key technical contribution is a reusable mechanism for lifting results from countable hypothesis classes (where measurability of suprema is straightforward in Mathlib) to separable topological index sets via a reduction to a countable dense subset. As worked applications of the abstract theorem, we mechanize standard empirical Rademacher bounds for linear predictors under $\ell_2$ and $\ell_1$ regularizations, and we also formalize a Dudley-type entropy integral bound based on covering numbers and a chaining construction.
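To make the $\ell_2$ linear-predictor bound concrete, here is a small numerical sketch in Python (not the Lean formalization). For the class $\{x \mapsto \langle w, x\rangle : \|w\|_2 \le B\}$, Cauchy-Schwarz gives the supremum in closed form, so the empirical Rademacher complexity equals $\frac{B}{n}\,\mathbb{E}_\sigma\|\sum_i \sigma_i x_i\|_2$, and Jensen's inequality bounds it by $\frac{B}{n}\sqrt{\sum_i \|x_i\|_2^2}$. The function names are illustrative assumptions, and the exact enumeration over all $2^n$ sign vectors is only practical for very small $n$.

```python
import itertools
import math
from typing import Sequence

Vector = Sequence[float]


def empirical_rademacher_l2(points: Sequence[Vector], radius: float) -> float:
    """Exact empirical Rademacher complexity of {x -> <w, x> : ||w||_2 <= radius}.

    Enumerates every sign vector, so the cost grows as 2^n.
    """
    n = len(points)
    dim = len(points[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        # Signed sum of the sample points for this sign pattern.
        signed_sum = [sum(s * p[j] for s, p in zip(signs, points)) for j in range(dim)]
        total += math.sqrt(sum(c * c for c in signed_sum))
    return radius * total / (n * 2 ** n)


def l2_rademacher_bound(points: Sequence[Vector], radius: float) -> float:
    """Standard upper bound: radius * sqrt(sum_i ||x_i||^2) / n."""
    sum_sq = sum(c * c for p in points for c in p)
    return radius * math.sqrt(sum_sq) / len(points)


if __name__ == "__main__":
    sample = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(empirical_rademacher_l2(sample, 2.0), l2_rademacher_bound(sample, 2.0))
```

With a single sample point the two quantities coincide, which is a quick sanity check on both functions.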

2503.05632 2026-05-26 stat.ME stat.ML

A Functional Approach to Curve Alignment and Shape Analysis

曲线对齐与形状分析的函数型方法

Issam-Ali Moindjié, Cédric Beaulac, Marie-Hélène Descary

AI总结 提出一种基于函数数据分析(FDA)的框架,通过基展开技术解析估计缩放、平移、旋转和重参数化等变形变量,实现曲线对齐,并利用主成分分析构建随机轮廓生成模型,在模拟数据和MPEG-7数据库上优于传统FDA方法。

详情
AI中文摘要

在许多图像分析问题中,物体的轮廓承载着关于形状的重要统计信息。这些轮廓通常受到缩放、平移、旋转和重参数化等变形变量的影响。以往统计形状分析的研究主要集中于通过离散观测来分析轮廓和形状。虽然这种方法可能具有计算优势,但它忽略了这些对象的连续性质及其底层几何结构。它还忽略了变形变量之间的潜在依赖关系及其对形状的影响,可能导致统计信息损失和可解释性降低。在本文中,我们引入了一个在函数数据分析(FDA)背景下分析形状的新框架。采用基展开技术来推导变形变量(即缩放、平移、旋转和重参数化)估计的解析解,从而实现曲线对齐。然后,利用主成分分析技术开发了一个随机轮廓的生成模型。在模拟数据和MPEG-7数据库上的数值实验表明,我们的方法成功识别了变形参数,并捕捉了传统FDA方法失效情况下的随机轮廓的潜在分布。

英文摘要

In many image analysis problems, the contours of objects carry important statistical information about shape. Such contours are typically affected by deformation variables including scaling, translation, rotation, and reparametrization. Previous studies in statistical shape analysis have mainly focused on analyzing contours and shapes through discrete observations. While this approach might offer computational advantages, it overlooks the continuous nature of these objects and their underlying geometric structure. It also ignores potential dependencies between the deformation variables and their effect on the shape, which may result in a loss of statistical information and reduced interpretability. In this paper, we introduce a novel framework for analyzing shapes within the context of Functional Data Analysis (FDA). Basis expansion techniques are employed to derive analytic solutions for the estimation of deformation variables, namely scaling, translation, rotation, and reparametrization, thereby achieving curve alignment. A generative model for random contours is then developed using principal component analysis techniques. Numerical experiments on simulated data and the MPEG-7 database demonstrate that our method successfully identifies deformation parameters and captures the underlying distribution of random contours in settings where traditional FDA methods fail.

2503.05067 2026-05-26 stat.ME

Inverse sampling intensity weighting for preferential sampling adjustment

用于优先采样调整的逆采样强度加权

Thomas W. Hsiao, Lance A. Waller

AI总结 提出逆采样强度加权(ISIW)方法,通过两阶段加权似然调整和Vecchia近似实现高效预测,在模型误设下优于传统方法,并应用于铅浓度和PM2.5空间预测。

详情
Journal ref
Journal of Agricultural, Biological, and Environmental Statistics, 2026
AI中文摘要

传统地统计方法假设观测位置与感兴趣的空间过程独立。违反这一独立性假设称为优先采样(PS)。解决PS的标准方法依赖于估计复杂的共享潜变量模型,且在实践中难以应用。我们研究了在基于模型的地统计学中使用逆采样强度加权(ISIW)进行PS调整。ISIW是一种两阶段方法,首先估计观测位置的采样强度,然后在加权似然调整中定义基于强度的权重。通过将调整后的参数估计代入克里金法进行预测。我们引入了一种基于Vecchia近似的ISIW实现,在保持强预测精度的同时实现计算增益。有趣的是,我们发现ISIW在采样设计误设下优于标准PS方法,并且准确的参数估计与预测性能相关性很小,这引发了关于在PS下驱动基于克里金预测器最优实现条件的问题。我们的工作凸显了ISIW以直观、快速和有效的方式调整PS的潜力。我们通过西班牙加利西亚苔藓生物监测数据中铅浓度和加利福尼亚州美国EPA空气质量系统网络中PM2.5浓度的空间预测来说明这些想法。

英文摘要

Traditional geostatistical methods assume independence between observation locations and the spatial process of interest. Violations of this independence assumption are referred to as preferential sampling (PS). Standard methods to address PS rely on estimating complex shared latent variable models and can be difficult to apply in practice. We study the use of inverse sampling intensity weighting (ISIW) for PS adjustment in model-based geostatistics. ISIW is a two-stage approach wherein we estimate the sampling intensity of the observation locations then define intensity-based weights within a weighted likelihood adjustment. Prediction follows by substituting the adjusted parameter estimates within kriging. We introduce an implementation of ISIW based on the Vecchia approximation, enabling computational gains while maintaining strong predictive accuracy. Interestingly, we found that ISIW outpredicts standard PS methods under misspecification of the sampling design, and that accurate parameter estimation had little correlation with predictive performance, raising questions about the conditions driving optimal implementation of kriging-based predictors under PS. Our work highlights the potential of ISIW to adjust for PS in an intuitive, fast, and effective manner. We illustrate these ideas on spatial prediction of lead concentrations measured through moss biomonitoring data in Galicia, Spain, and PM2.5 concentrations from the U.S. EPA Air Quality System network in California.

2503.01684 2026-05-26 nucl-th cs.LG physics.comp-ph stat.ML

An Efficient Learning Method to Connect Observables

一种连接可观测量的高效学习方法

Hang Yu, Takayuki Miyagi

AI总结 提出多参数本征值问题(MEP)仿真器,通过连接不同仿真器实现从可观测量到可观测量的直接预测,并利用特征向量延续(EC)和参数矩阵模型(PMM)数据进行训练,在一维格点模拟和$^{28}$O示例中验证了性能与预测概率分布获取的简便性。

详情
Journal ref
Phys. Rev. Lett. 136, 202502 (2026)
Comments
5+2 pages, 4 figures, matched published version. Shared data and toy model code in the source file (shared.zip)
AI中文摘要

构建快速准确的替代模型是许多主题中做出稳健预测的关键要素。我们引入了一种新模型,即多参数本征值问题(MEP)仿真器。新方法连接了仿真器,并可以直接从可观测量到可观测量进行预测。我们展示了MEP仿真器可以使用来自特征向量延续(EC)和参数矩阵模型(PMM)仿真器的数据进行训练。在一维格点上的简单模拟证实了MEP仿真器的性能。以$^{28}$O为例,我们还证明了通过新仿真器可以轻松获得目标可观测量的预测概率分布。

英文摘要

Constructing fast and accurate surrogate models is a key ingredient for making robust predictions in many fields. We introduce a new model, the Multiparameter Eigenvalue Problem (MEP) emulator. The new method connects emulators and can make predictions directly from observables to observables. We show that the MEP emulator can be trained with data from Eigenvector Continuation (EC) and Parametric Matrix Model (PMM) emulators. A simple simulation on a one-dimensional lattice confirms the performance of the MEP emulator. Using $^{28}$O as an example, we also demonstrate that the predictive probability distribution of the target observables can be easily obtained through the new emulator.

2502.18645 2026-05-26 stat.ME math.ST stat.TH

A Matsuoka-Based GARMA Model for Hydrological Forecasting: Theory, Estimation, and Applications

基于Matsuoka的GARMA水文预报模型:理论、估计与应用

Guilherme Pumi, Danilo Hiroshi Matsuoka, Taiane Schaedler Prass, Bruna Gregory Palm

AI总结 提出Matsuoka自回归移动平均(MARMA)模型,扩展GARMA框架,通过部分极大似然估计和自助法预测区间,用于单位区间(0,1)上的时间序列数据,并在巴西Guarapiranga水库有效水量预测中验证了实用性。

详情
AI中文摘要

自然科学中的时间序列,如水文学、气候学和其他环境应用,通常包含约束在单位区间(0,1)内的连续观测。传统的高斯模型无法捕捉这些边界,需要更灵活的方法。本文介绍了Matsuoka自回归移动平均(MARMA)模型,通过假设Matsuoka分布的随机分量取值在(0,1)内以及允许随机时变协变量的ARMA类系统结构,扩展了GARMA框架。参数估计通过部分极大似然估计(PMLE)进行,我们给出了其渐近理论,从而支持统计推断,包括置信区间和模型选择。为了构建预测区间,我们提出了一种新的基于自助法的方法,考虑了依赖结构的不确定性。全面的蒙特卡洛模拟研究评估了所提方法的有限样本性能,而应用于巴西Guarapiranga水库有效水量预测的实例展示了其实用价值。

英文摘要

Time series in natural sciences, such as hydrology and climatology, and other environmental applications, often consist of continuous observations constrained to the unit interval (0,1). Traditional Gaussian-based models fail to capture these bounds, requiring more flexible approaches. This paper introduces the Matsuoka Autoregressive Moving Average (MARMA) model, extending the GARMA framework by assuming a Matsuoka-distributed random component taking values in (0,1) and an ARMA-like systematic structure allowing for random time-dependent covariates. Parameter estimation is performed via partial maximum likelihood (PMLE), for which we present the asymptotic theory. It enables statistical inference, including confidence intervals and model selection. To construct prediction intervals, we propose a novel bootstrap-based method that accounts for dependence structure uncertainty. A comprehensive Monte Carlo simulation study assesses the finite sample performance of the proposed methodologies, while an application to forecasting the useful water volume of the Guarapiranga Reservoir in Brazil showcases their practical usefulness.

2411.18957 2026-05-26 stat.ME stat.CO

Bayesian Cluster Weighted Gaussian Models

贝叶斯聚类加权高斯模型

Panagiotis Papastamoulis, Konstantinos Perrakis

AI总结 提出一类新的贝叶斯混合正态线性回归模型,通过引入预测变量分布的额外高斯随机成分,利用lasso和graphical-lasso先验分别处理回归系数和协方差结构,并采用跨维度伸缩采样器自动确定聚类数,在模拟和生物医学数据上优于EM、回归混合和专家混合方法。

详情
Journal ref
Journal of Classification, 2026
Comments
accepted to Journal of Classification
AI中文摘要

我们引入了一类新的贝叶斯混合正态线性回归模型,该模型为预测变量的分布纳入了一个额外的高斯随机成分。所提出的聚类加权模型旨在涵盖响应变量分布以及协变量多元分布中的潜在异质性,以检测与潜在结构相关的信号。特别感兴趣的是源自以下两方面的潜在信号:(i)回归模型的线性预测结构,和(ii)协变量的协方差结构。我们使用回归系数的lasso收缩先验和协方差矩阵的graphical-lasso收缩先验来建模这两个成分。通过将混合成分数量视为随机变量并实现跨维度伸缩采样器,采用完全贝叶斯方法估计聚类数。还考虑了基于过拟合混合模型或使用信息准则选择成分数量的替代贝叶斯方法。将所提方法与EM类型实现、回归混合和专家混合进行了比较。通过一组模拟研究和生物医学数据集说明了该方法。

英文摘要

We introduce a novel class of Bayesian mixtures for normal linear regression models which incorporates a further Gaussian random component for the distribution of the predictor variables. The proposed cluster-weighted model aims to encompass potential heterogeneity in the distribution of the response variable as well as in the multivariate distribution of the covariates for detecting signals relevant to the underlying latent structure. Of particular interest are potential signals originating from: (i) the linear predictor structures of the regression models and (ii) the covariance structures of the covariates. We model these two components using a lasso shrinkage prior for the regression coefficients and a graphical-lasso shrinkage prior for the covariance matrices. A fully Bayesian approach is followed for estimating the number of clusters, by treating the number of mixture components as random and implementing a trans-dimensional telescoping sampler. Alternative Bayesian approaches based on overfitting mixture models or using information criteria to select the number of components are also considered. The proposed method is compared against EM type implementation, mixtures of regressions and mixtures of experts. The method is illustrated using a set of simulation studies and a biomedical dataset.

2311.15487 2026-05-26 cs.LG cs.AI math-ph math.MP math.OC stat.ML

Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

全局 $\mathcal{L}^2$ 最小化:通过深度学习中的几何自适应梯度下降实现均匀指数速率

Thomas Chen

AI总结 本文利用微分几何中黎曼度量的任意性,提出两种改进的梯度下降流(过参数化和欠参数化设置),在秩条件成立时证明其以均匀指数收敛速率驱动 $\mathcal{L}^2$ 代价到全局最小值,并推广到秩条件不成立的情形。

详情
Comments
AMS Latex, 21 pages. Typos corrected, references and comments added
AI中文摘要

我们考虑深度学习网络中的监督学习场景,并利用黎曼度量选择的任意性(微分几何的一般事实)来定义梯度下降流。在标准的深度学习方法中,参数空间(权重和偏置)上的梯度流是相对于欧几里得度量定义的。而在这里,我们选择相对于深度学习网络输出层中的欧几里得度量的梯度流。这自然地在参数空间中诱导出两种改进的梯度下降流版本,一种适用于过参数化设置,另一种适用于欠参数化设置。在过参数化情况下,我们证明,只要秩条件成立,改进的梯度下降的所有轨道都以均匀指数收敛速率将 ${\mathcal L}^2$ 代价驱动到其全局最小值;因此,对于任何预先指定的接近全局最小值的程度,可以获得一个先验的停止时间。我们指出了后者与亚黎曼几何的关系。此外,我们将上述框架推广到秩条件不成立的情况;特别地,我们表明局部平衡只有在秩损失发生时才能存在,并且通常它们不是孤立点,而是参数空间中临界子流形的元素。

英文摘要

We consider the scenario of supervised learning in Deep Learning (DL) networks, and exploit the arbitrariness of choice in the Riemannian metric relative to which the gradient descent flow can be defined (a general fact of differential geometry). In the standard approach to DL, the gradient flow on the space of parameters (weights and biases) is defined with respect to the Euclidean metric. Here instead, we choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network. This naturally induces two modified versions of the gradient descent flow in the parameter space, one adapted for the overparametrized setting, and the other for the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the ${\mathcal L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry. Moreover, we generalize the above framework to the situation in which the rank condition does not hold; in particular, we show that local equilibria can only exist if a rank loss occurs, and that generically, they are not isolated points, but elements of a critical submanifold of parameter space.

2310.01285 2026-05-26 q-fin.CP cs.LG q-fin.MF stat.ML

Automated regime classification in multidimensional time series data using sliced Wasserstein k-means clustering

多维时间序列数据中的自动制度分类:基于切片Wasserstein k-means聚类

Qinmeng Luan, James Hamp

AI总结 提出切片Wasserstein k-means聚类方法,通过近似多维Wasserstein距离,实现多维时间序列数据的自动制度分类,并在合成数据和真实外汇数据中验证有效性。

详情
Journal ref
Data Science in Finance and Economics 2025, Volume 5, Issue 3: 387-418
AI中文摘要

最近的研究提出Wasserstein k-means(Wk-means)聚类作为对时间序列数据(特别是一维资产收益)进行制度分类的强大方法。本文首先详细研究应用于合成一维时间序列数据的Wasserstein k-means聚类算法的行为。我们通过详细研究聚类算法的动态以及超参数变化如何影响不同随机初始化的性能,扩展了先前的工作。我们计算简单的度量,发现这些度量有助于识别高质量的聚类。然后,我们将Wasserstein k-means聚类技术扩展到多维时间序列数据,通过将多维Wasserstein距离近似为切片Wasserstein距离,得到一种称为“切片Wasserstein k-means(sWk-means)聚类”的方法。我们将sWk-means聚类方法应用于多维时间序列数据中的自动制度分类问题,使用合成数据证明该方法的合理性和有效性。最后,我们以公开的外汇即期汇率数据作为案例研究,表明sWk-means方法能够识别真实多维金融时间序列中的不同市场制度。我们最后评论了该方法的一些局限性以及潜在的补充或替代方法。

英文摘要

Recent work has proposed Wasserstein k-means (Wk-means) clustering as a powerful method to classify regimes in time series data, and one-dimensional asset returns in particular. In this paper, we begin by studying in detail the behaviour of the Wasserstein k-means clustering algorithm applied to synthetic one-dimensional time series data. We extend the previous work by studying, in detail, the dynamics of the clustering algorithm and how varying the hyperparameters impacts the performance over different random initialisations. We compute simple metrics that we find to be useful in identifying high-quality clusterings. We then extend the technique of Wasserstein k-means clustering to multidimensional time series data by approximating the multidimensional Wasserstein distance as a sliced Wasserstein distance, resulting in a method we call 'sliced Wasserstein k-means (sWk-means) clustering'. We apply the sWk-means clustering method to the problem of automated regime classification in multidimensional time series data, using synthetic data to demonstrate the validity and effectiveness of the approach. Finally, we show that the sWk-means method is able to identify distinct market regimes in real multidimensional financial time series, using publicly available foreign exchange spot rate data as a case study. We conclude with remarks about some limitations of our approach and potential complementary or alternative approaches.
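The sliced approximation mentioned above can be sketched as follows: project both point clouds onto random unit directions, compute the one-dimensional Wasserstein distance of each pair of projections by sorting, and average. This is a generic Monte Carlo sketch, not the authors' implementation. The equal-sample-size restriction, the number of projections, the fixed seed, and all function names are assumptions.

```python
import math
import random
from typing import Sequence

Vector = Sequence[float]


def wasserstein_1d_pow(u: Sequence[float], v: Sequence[float], p: int = 2) -> float:
    """W_p^p between two equal-size 1D empirical distributions (sort and match)."""
    if len(u) != len(v):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) ** p for a, b in zip(sorted(u), sorted(v))) / len(u)


def sliced_wasserstein(xs: Sequence[Vector], ys: Sequence[Vector],
                       n_projections: int = 100, p: int = 2, seed: int = 0) -> float:
    """Monte Carlo estimate of the sliced p-Wasserstein distance."""
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # A normalized Gaussian vector is uniformly distributed on the unit sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction))
        direction = [d / norm for d in direction]
        proj_x = [sum(d * c for d, c in zip(direction, x)) for x in xs]
        proj_y = [sum(d * c for d, c in zip(direction, y)) for y in ys]
        total += wasserstein_1d_pow(proj_x, proj_y, p)
    return (total / n_projections) ** (1.0 / p)


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    shifted = [(x + 3.0, y + 3.0) for x, y in cloud]
    print(sliced_wasserstein(cloud, cloud), sliced_wasserstein(cloud, shifted))
```

Sorting makes each one-dimensional distance cheap to compute, which is the main reason the sliced distance scales to multidimensional data where the full Wasserstein distance would require solving an optimal transport problem.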

2605.24477 2026-05-26 cs.LG cs.IT math.IT math.ST stat.TH

The Normalized Maximum Likelihood for Regular Non-Smooth Models: Measure-Theoretic Foundations and Geometric Sampling

正则非光滑模型的归一化最大似然:测度理论基础与几何采样

Trenton Lau, Gary P. T. Choi

AI总结 针对现代机器学习中非光滑估计器(如Lasso、稀疏SVM)的归一化最大似然(NML)编码长度计算问题,本文利用几何测度论和保守雅可比矩阵建立严格框架,并提出一种几何MCMC采样器(PDL-PPMH)以精确计算非光滑模型的随机复杂度。

详情
AI中文摘要

归一化最大似然(NML)编码长度,或称随机复杂度,代表了通用编码的一个原则性准则。虽然最近基于余面积公式的公式化为光滑模型提供了计算方法,但该框架对于现代机器学习中普遍存在的非光滑估计器(例如Lasso、稀疏SVM)失效。在这项工作中,我们为正则路径可微Lipschitz(PDL)估计器提供了计算NML的严格框架。通过应用经典几何测度论并将余面积公式与保守雅可比矩阵联系起来,我们证明了非光滑模型的随机复杂度是适定的,并且在理论上与现代自动微分的输出一致。为了精确计算该量,我们引入了提议-投影Metropolis-Hastings(PDL-PPMH)采样器,这是一种能够遍历最大似然估计器非可微水平集的几何MCMC算法。我们在理论上证明了其组成部分的合理性,包括随机切空间提议和可证明收敛的非光滑投影求解器。我们通过从高维Lasso后验($P=2000$)中采样来展示该方法的鲁棒性,同时量化了控制精确性与混合时间之间权衡的计算规模。关键的是,我们通过实验证明,我们的精确NML准则提供了一种高度数据高效的交叉验证替代方案,无需数据分割即可获得统计上不可区分的预测最优值。总之,我们的工作为正则非光滑模型的NML编码长度理论分析铺平了道路。

英文摘要

The Normalized Maximum Likelihood (NML) codelength, or stochastic complexity, represents a principled criterion for universal coding. While recent coarea-based formulations provided a calculation method for smooth models, this framework collapses for the non-smooth estimators ubiquitous in modern machine learning (e.g., Lasso, Sparse SVMs). In this work, we provide a rigorous framework for computing the NML for regular path-differentiable Lipschitz (PDL) estimators. By applying classical geometric measure theory and bridging the coarea formula with conservative Jacobians, we prove that the stochastic complexity for non-smooth models is well-posed and theoretically consistent with the outputs of modern Automatic Differentiation. To compute this quantity exactly, we introduce the Propose-and-Project Metropolis-Hastings (PDL-PPMH) sampler, a geometric MCMC algorithm capable of traversing the non-differentiable level sets of the maximum likelihood estimator. We theoretically justify its components, including a stochastic tangent space proposal and a provably convergent non-smooth projection solver. We demonstrate the method's robustness by sampling from a high-dimensional Lasso posterior ($P=2000$), while simultaneously quantifying the computational scaling that governs the trade-off between exactness and mixing time. Crucially, we empirically demonstrate that our exact NML criterion provides a highly data-efficient alternative to cross-validation, achieving statistically indistinguishable predictive optima without requiring data splitting. Altogether, our work paves the way for the theoretical analysis of the NML codelength for regular non-smooth models.

2605.24445 2026-05-26 math.PR math.ST stat.TH

Matrix concentration inequalities for time-inhomogeneous Markov chains

非齐次马尔可夫链的矩阵集中不等式

Luca Zanetti

AI总结 针对非齐次马尔可夫链生成的厄米随机矩阵和的最大特征值,建立了Chernoff型界,并应用于Elo评分系统的分析。

详情
AI中文摘要

我们建立了由非齐次马尔可夫链生成的厄米随机矩阵和的最大特征值的Chernoff型界。我们的主要框架假设状态空间紧致且每个马尔可夫核在Wasserstein距离下具有收缩性,即正Ollivier-Ricci曲率。这一假设在非齐次设置中易于验证,并且被多个实际感兴趣的链所满足,例如强凸光滑目标上的随机梯度下降。我们还针对非紧致状态空间,在Saloff-Coste和Zuniga引入的非齐次链谱间隙概念下,建立了类似的界。最后,我们通过分析Elo评分系统(体育分析中一种流行的玩家排名方法)在动态Bradley-Terry-Luce模型下的应用,展示了我们结果的实用性。

英文摘要

We establish Chernoff-type bounds for the largest eigenvalue of sums of Hermitian random matrices generated by a time-inhomogeneous Markov chain. Our primary regime assumes a compact state space and contractivity of each Markov kernel in Wasserstein distance, i.e., positive Ollivier-Ricci curvature. This assumption is convenient to verify in inhomogeneous settings and is satisfied by several chains of practical interest, such as stochastic gradient descent on strongly convex smooth objectives. We also develop analogous bounds for noncompact state spaces under a notion of spectral gap for inhomogeneous chains introduced by Saloff-Coste and Zuniga. Finally, we illustrate the utility of our results through an analysis of the Elo rating system, a popular method for ranking players in sports analytics, under a dynamic version of the Bradley-Terry-Luce model.
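For readers unfamiliar with the Elo system analyzed above, a minimal sketch of the textbook update rule follows. The logistic expected score has the Bradley-Terry-Luce form; the scale of 400 and the K-factor of 32 are conventional chess values used here as assumptions, and the sketch does not model the time-varying player strengths of the dynamic setting studied in the paper.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k_factor: float = 32.0) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.0 for a loss, and 0.5 for a draw.
    """
    delta = k_factor * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    print(elo_update(1500.0, 1500.0, 1.0))
```

Because each rating depends only on the previous ratings and the latest game outcome, the rating vector evolves as a Markov chain, which is what makes concentration results for Markov chains applicable.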

2605.24422 2026-05-26 stat.ML cs.LG

Clustering based on Stochastic Dominance with application for risk averters and risk seekers

基于随机占优的聚类及其对风险规避者和风险寻求者的应用

Hua Li, Xue Jia, Yilin Kang, Wing-Keung Wong

AI总结 针对传统聚类方法无法捕捉资产间风险占优关系的问题,提出基于随机占优检验统计量的聚类分析框架,通过构造随机占优系数矩阵并改进K-means和层次聚类算法,实现面向不同风险偏好投资者的定制化资产配置。

详情
AI中文摘要

随机占优(SD)理论为选择适合不同风险偏好(即风险规避、风险寻求和风险中性)投资者资产配置需求的优质资产提供了严格框架。然而,传统的股票聚类方法通常依赖欧氏距离等几何度量,往往无法有效捕捉资产间的内在风险占优关系。为解决这一局限,本文提出一种基于SD检验统计量的创新聚类分析框架。方法上,本研究将SD理论与机器学习算法深度融合。超越传统依赖几何距离的限制,我们创新性地利用一阶、二阶和三阶SD的检验统计量构建“随机占优系数矩阵”。在此矩阵基础上,我们修改了经典的K-means和层次聚类算法。具体地,针对不同阶次的SD关系,我们推导出12种不同的算法变体。同时,我们构建了SD-SC系数和SD-DBI指数作为专门的有效性指标来评估聚类性能。实证上,我们分析了代表性发达市场(美国纳斯达克指数)和新兴市场(中国沪深100指数)的成分股数据。结果验证了所提方法的有效性和稳健性。此外,我们将聚类结果应用于单指数模型的修正和全局最小方差投资组合(GMVP)的构建。结果表明,所提方法有效促进了投资者的定制化资产配置,具有重要的理论价值和实践意义。

英文摘要

Stochastic Dominance (SD) theory provides a rigorous framework for selecting superior assets tailored to the asset allocation needs of investors with varying risk preferences (i.e., risk-averse, risk-seeking, and risk-neutral). However, traditional stock clustering methods typically rely on geometric metrics such as Euclidean distance, which often fail to effectively capture the intrinsic risk dominance relationships among assets. To address this limitation, this paper proposes an innovative clustering analysis framework based on SD test statistics. Methodologically, this study deeply integrates SD theory with machine learning algorithms. Transcending the limitations of traditional reliance on geometric distance, we innovatively utilize test statistics from first-, second-, and third-order SD to construct a "Stochastic Dominance Coefficient Matrix." Building upon this matrix, we modify the classic K-means and Hierarchical Clustering algorithms. Specifically, we derive 12 distinct algorithm variants tailored to different orders of SD relationships. Simultaneously, we construct the SD-SC coefficient and the SD-DBI index as specialized validity indices to evaluate the clustering performance. Empirically, we analyze constituent stock data from a representative developed market (the US NASDAQ Index) and an emerging market (China's CSI 100 Index). The results verify the effectiveness and robustness of the proposed method. Furthermore, we apply the clustering results to the modification of the Single Index Model and the construction of Global Minimum Variance Portfolios (GMVP). The findings demonstrate that the proposed method effectively facilitates customized asset allocation for investors, holding significant theoretical value and practical implications.
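The dominance relations underlying the framework can be illustrated with a definitional check on empirical distributions. This sketch tests weak first- and second-order dominance directly from the empirical CDFs; it does not implement the SD test statistics or the coefficient matrix used in the paper, and the function names and the numerical tolerance are assumptions.

```python
from typing import Sequence


def ecdf(sample: Sequence[float], x: float) -> float:
    """Empirical CDF of the sample evaluated at x."""
    return sum(1 for s in sample if s <= x) / len(sample)


def integrated_cdf(sample: Sequence[float], x: float) -> float:
    """Integral of the empirical CDF up to x, equal to mean(max(x - s, 0))."""
    return sum(max(x - s, 0.0) for s in sample) / len(sample)


def dominates_first_order(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a weakly first-order dominates b: F_a(x) <= F_b(x) for all x."""
    grid = sorted(set(a) | set(b))
    return all(ecdf(a, x) <= ecdf(b, x) for x in grid)


def dominates_second_order(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a weakly second-order dominates b (preferred by risk averters)."""
    # The integrated CDFs are piecewise linear with kinks at sample points,
    # so checking the pooled sample points is sufficient.
    grid = sorted(set(a) | set(b))
    return all(integrated_cdf(a, x) <= integrated_cdf(b, x) + 1e-12 for x in grid)


if __name__ == "__main__":
    safe, risky = [2.0, 2.0], [1.0, 3.0]
    print(dominates_first_order(safe, risky), dominates_second_order(safe, risky))
```

In the usage example the two return samples have the same mean, so neither dominates at first order, but the constant one dominates at second order, matching the intuition that a risk averter prefers it.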

2605.24381 2026-05-26 cs.LG cs.AI stat.AP stat.ML

Assessing the Operational Viability of Foundation Models for Time Series Forecasting

评估基础模型在时间序列预测中的操作可行性

Kavin Soni, Debanshu Das, Vamshi Guduguntla

AI总结 通过对比基础模型与监督学习方法在四种操作场景下的性能,提出基于经验特征的复杂度路由器以实现精度与效率的平衡。

详情
Comments
21 pages, 8 Figures, Code available at [https://github.com/kavin-soni/timeseries-zeroshot-eval]
AI中文摘要

时间序列预测驱动着金融、交通和能源等领域的操作决策。虽然监督学习方法表现出色,但它们需要特定领域的训练、特征工程和持续维护。大规模基础模型最近作为一种零样本替代方案出现,像LLM一样避免了任务特定训练。在这项工作中,我们评估了基础模型与标准监督方法的对比。我们不仅关注总体精度,还分析了四种操作场景下的性能:周期性人机系统、物理约束过程、随机金融市场和异构需求预测。我们的结果描述了最优部署区域。基础模型在具有可迁移周期结构的领域中表现良好,并且对于冷启动或长尾场景效率高。相反,监督专家在受严格物理约束的系统中保持更高的精度。在金融领域,较新的基础模型正在迅速缩小与监督专家的性能差距。我们进一步量化了推理延迟、数据漂移适应性和部署约束之间的权衡。最后,我们提出了一个复杂度路由器,它利用经验特征将每个序列分配给最优模型类别。我们证明,与部署通用基础模型相比,这种选择性路由实现了更高的精度和显著更低的推理成本,为平衡泛化性和效率提供了一个实用框架。

英文摘要

Time series forecasting drives operational decisions in areas like finance, transportation, and energy. While supervised learning approaches achieve strong performance, they require domain-specific training, feature engineering, and ongoing maintenance. Large-scale foundation models have recently emerged as a zero-shot alternative, avoiding task-specific training much like LLMs. In this work, we evaluate foundation models against standard supervised approaches. Rather than focusing solely on aggregate accuracy, we analyze performance across four operational regimes: periodic human-centric systems, physically constrained processes, stochastic financial markets, and heterogeneous demand forecasting. Our results characterize optimal deployment areas. Foundation models perform well in domains with transferable periodic structures and are efficient for cold-start or long-tail scenarios. Conversely, supervised specialists maintain higher precision in systems governed by strict physical constraints. In financial domains, newer foundation models are rapidly closing the performance gap with supervised specialists. We further quantify trade-offs in inference latency, data drift adaptability, and deployment constraints. Finally, we propose a Complexity Router that assigns each series to the optimal model class using empirical features. We demonstrate that this selective routing achieves higher accuracy and significantly lower inference costs compared to deploying a universal foundation model, providing a practical framework for balancing generalization and efficiency.

2605.24364 2026-05-26 stat.ML cs.LG

Multicalibration Boosting: Theory, Convergence, and Transferability

多校准提升:理论、收敛性和可迁移性

Hanxuan Ye, Hongzhe Li

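AI summary: This paper unifies the theoretical framework of multicalibration boosting (MCBoost), reveals a calibration-risk trade-off, and establishes convergence rates, finite-sample guarantees, and transfer guarantees under covariate shift, all under weaker assumptions.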

详情
AI中文摘要

多校准通过要求预测在一组丰富的函数(包括预测切片和子群体)上无偏,扩展了经典校准。它已成为公平性、鲁棒性和可靠预测的强大框架,然而多校准提升(MCBoost)的理论理解仍然零散且常依赖限制性假设。在这项工作中,我们发展了一个统一且精细的MCBoost视角,涵盖了现有变体,包括多精度、BatchGCP和BatchMVP。我们揭示了几个现象,为其实际行为提供了新见解:即使高度准确和灵活的预测器也可能保持显著的不校准;强制多校准引入了校准-风险权衡;早停在这一权衡的控制中起核心作用。在理论方面,我们在更弱且更现实的条件下建立了MCBoost的通用框架。我们证明提升迭代收敛到审计类生成的累积跨度上总体最优预测器的Bregman投影,从而显式刻画了实现多校准的函数空间。我们进一步推导了不同光滑性假设下的收敛率、有限样本保证以及确保终止时多校准的原则性停止规则。最后,我们将通用适应性理论扩展到协变量偏移下,提供了更一般的迁移保证,并阐明了多校准预测器何时跨领域泛化。这些结果为多校准提升提供了更完整的理论基础和实践指导,将其定位为现代预测模型的统一框架和可靠的后处理方法。

英文摘要

Multicalibration extends classical calibration by requiring predictions to be unbiased over a rich collection of functions, encompassing both prediction slices and subpopulations. It has emerged as a powerful framework for fairness, robustness, and reliable prediction, yet the theoretical understanding of multicalibration boosting (MCBoost) remains fragmented and often relies on restrictive assumptions. In this work, we develop a unified and refined perspective on MCBoost that subsumes existing variants, including multiaccuracy, BatchGCP, and BatchMVP. We uncover several phenomena that provide new insights into its practical behavior: even highly accurate and flexible predictors can remain substantially miscalibrated; enforcing multicalibration introduces a calibration-risk trade-off; and early stopping plays a central role in controlling this trade-off. On the theoretical side, we establish a general framework for MCBoost under weaker and more realistic conditions. We show that the boosting iterates converge to a Bregman projection of the population-optimal predictor onto the cumulative span generated by the audit class, thereby explicitly characterizing the function space on which multicalibration is achieved. We further derive convergence rates under different smoothness assumptions, finite-sample guarantees, and principled stopping rules that ensure multicalibration at termination. Finally, we extend the theory of universal adaptability under covariate shift, providing more general transfer guarantees and clarifying when multicalibrated predictors generalize across domains. These results provide a more complete theoretical foundation and practical guidance for multicalibration boosting, positioning it as both a unifying framework and a reliable post-processing approach for modern predictive models.
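As a rough illustration of the boosting idea in this abstract, the sketch below runs a multiaccuracy-style residual-correction loop for squared loss. It is not the authors' MCBoost: the audit class (three hypothetical group masks), the synthetic data, and the stopping tolerance `tol` are all assumptions made for the example.

```python
import numpy as np

def multiaccuracy_boost(f0, y, groups, tol=1e-3, max_iter=1000):
    """Greedy residual-correction loop for squared loss.

    groups: list of boolean masks (a hypothetical audit class).
    Returns the updated predictions and the number of steps taken.
    """
    f = f0.astype(float).copy()
    for step in range(max_iter):
        resid = y - f
        # Mean residual on each audited group is that group's violation.
        viol = np.array([resid[g].mean() for g in groups])
        worst = int(np.argmax(np.abs(viol)))
        if abs(viol[worst]) < tol:
            return f, step          # stopping rule: all violations below tol
        # Shift predictions on the worst group to cancel its bias.
        f[groups[worst]] += viol[worst]
    return f, max_iter

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
y = 1.0 * (x[:, 0] > 0) + 0.5 * (x[:, 1] > 0) + rng.normal(scale=0.1, size=n)
groups = [np.ones(n, dtype=bool), x[:, 0] > 0, x[:, 1] > 0]

f, steps = multiaccuracy_boost(np.zeros(n), y, groups)
max_violation = max(abs((y - f)[g].mean()) for g in groups)
```

Stopping as soon as every audited violation falls below `tol` is a simple stand-in for the stopping rules the abstract mentions; a looser tolerance stops earlier and leaves more residual bias on the audited groups.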

2605.24356 2026-05-26 eess.SY cs.SY econ.GN q-fin.EC stat.AP stat.OT

Contested Temporalities in Critical Minerals and Resource Extraction for Electric Vehicles

电动汽车关键矿产与资源开采中的时间性争议

Joseph Nyangon

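AI summary: This chapter examines the temporal conflict between short-term economic incentives and long-term sustainability in the extraction of critical minerals for electric vehicles, such as cobalt and lithium, and proposes community-centred governance, sustainable mining, and circular-economy strategies to align resource security with equity.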

详情
Comments
31 Pages, 2 Figures
AI中文摘要

全球对电动汽车(EVs)的推动急剧增加了对钴和锂等关键矿产的需求,造成了快速工业增长与长期可持续性之间的紧张关系。开采集中在少数地区,特别是刚果民主共和国(DRC)、智利和阿根廷,在这些地区,开采已造成严重的社会环境危害,包括生态系统退化、劳动剥削以及原住民社区的流离失所。在刚果民主共和国,钴矿开采经常与童工和危险工作条件相关;在智利,锂开采加剧了水资源短缺,威胁当地农业和生物多样性。美国《通胀削减法案》(IRA)等政策工具旨在促进道德采购,但以开采为导向的模式继续加深全球不平等。本章考察了转型中的时间性争议,其中开采的短期经济激励与长期环境和社会目标相冲突。它主张建立一个基于地方、以社区为中心的治理、可持续采矿实践和循环经济策略(包括回收和材料替代)的框架,以协调资源安全与公平,并确保向电动汽车的转型不会重现其旨在解决的不公正现象。

英文摘要

The global push for electric vehicles (EVs) has sharply increased demand for critical minerals such as cobalt and lithium, creating a tension between rapid industrial growth and long-term sustainability. Extraction is concentrated in a few regions -- notably the Democratic Republic of Congo (DRC), Chile, and Argentina -- where it has produced serious socio-environmental harms, including ecosystem degradation, labour exploitation, and the displacement of Indigenous communities. In the DRC, cobalt mining is frequently linked to child labour and hazardous working conditions; in Chile, lithium extraction intensifies water scarcity and threatens local agriculture and biodiversity. Policy instruments such as the U.S. Inflation Reduction Act (IRA) seek to promote ethical sourcing, but an extraction-driven model continues to deepen global inequalities. This chapter examines the contested temporalities of the transition, in which the short-term economic incentives of extraction conflict with longer-term environmental and social goals. It argues for a place-based framework built on community-centred governance, sustainable mining practices, and circular-economy strategies, including recycling and material substitution, to align resource security with equity and ensure that the shift to EVs does not reproduce the injustices it aims to address.

2605.24346 2026-05-26 stat.ME

Using the target trial framework for combining information: external comparator analyses and other applications

使用目标试验框架进行信息整合:外部对照分析及其他应用

Lawson Ung, Miguel A. Hernán, Issa J. Dahabreh

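AI summary: This paper proposes adding an explicit specification of the target population and its sampling model to the target trial framework to support causal analyses that combine information from multiple sources, and stresses the importance of identifying and handling misalignments between data sources during data mapping.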

详情
AI中文摘要

我们描述了如何利用目标试验框架来规划和报告通过整合来自多个不同来源的信息来回答因果问题的分析。这类分析可能涉及对不同人群中评估的治疗进行比较,例如当将索引试验与其他数据源结合进行外部对照分析时,或者在可推广性和可迁移性分析中将因果推断从随机试验扩展到新的目标人群时。在规划此类分析时,目标试验的规范支持通过相关的抽样模型明确定义目标人群。我们提出将此作为目标试验框架的一个额外组成部分,尤其适用于整合信息的分析,因为它影响资格标准的选择、因果模型的规范、因果对比的选择以及对识别策略的推理。此外,该框架鼓励将来自多个数据源的数据元素仔细映射到单个目标试验。这一映射过程可以突出显示数据源之间在框架特定组成部分(例如资格标准、治疗分配和治疗接受的定义)上可能存在的不可调和的不一致。当试图指定与特定数据源一致的目标试验时,可能会引入或加剧与其他拟议数据源的不一致。这种不一致的程度可能值得切换到其他数据源,或前瞻性地获取数据,以模拟拟议的目标试验。我们得出结论,目标试验框架促进了关于通过整合来自不同来源的信息来回答因果问题的分析的设计和假设的透明讨论。

英文摘要

We describe how the target trial framework can be used to plan and report analyses that attempt to answer causal questions by combining information from multiple, diverse sources. Such analyses may involve comparisons of treatments evaluated in different populations, for example when an index trial is combined with other data sources in external comparator analyses, or when extending causal inferences from a randomized trial to a new target population in generalizability and transportability analyses. When planning such analyses, the specification of the target trial supports the explicit definition of the target population with an associated sampling model. We propose this as an additional component for the target trial framework, especially relevant for analyses that combine information, because it influences the choice of eligibility criteria, the specification of the causal model, the choice of causal contrasts, and reasoning about identification strategies. Furthermore, the framework encourages careful mapping of data elements from multiple data sources to a single target trial. This mapping process can highlight potentially irreconcilable misalignments between data sources with respect to specific components of the framework -- for example, in the definitions of eligibility criteria, treatment assignment, and treatment receipt. Such misalignments can arise when attempts to specify a target trial that aligns with a specific data source introduce or worsen misalignments with other proposed data sources. The extent of such misalignments may warrant switching to other data sources, or prospectively obtaining data, to emulate the proposed target trial. We conclude that the target trial framework promotes transparent discussion about the design of and assumptions made in analyses that answer causal questions by combining information from diverse sources.

2605.24331 2026-05-26 cs.LG stat.ML

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

CurveRL: 用于LLM推理的基于分布感知的上下文重加权原则

Ke Sun, Yizhou Zhao, Jiayi Xin, Qi Long, Weijie Su

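AI summary: This paper proposes CurveRL, which performs distribution-aware prompt reweighting through a quantile coordinate transform, unifies the optimality theory within the RLVR framework, and markedly improves reasoning performance.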

详情
AI中文摘要

上下文或提示级别的重加权已成为使用验证奖励的强化学习(RLVR)中提升大型语言模型推理能力的关键算法杠杆,但决定最优加权的原则仍不清楚。我们通过将提示重加权公式化为通过率函数空间中定义的效用泛函的泛函导数来解决这一差距,从而产生一个统一的优化框架,该框架能够容纳现有方案,包括REINFORCE和GRPO。在此优化框架的基础上,我们提出了一种基于分位数坐标变换的分布感知提示重加权方法,称为CurveRL,其中分配给每个提示的权重不取决于通过率的绝对值,而是取决于其排名和密度,以反映学习动态中通过率的分布结构。跨多个基准的大量实验表明,我们提出的CurveRL始终优于GRPO和其他RLVR基线。我们的研究将上下文分布控制确定为分析和设计提示重加权RLVR算法的原则性轴心。代码发布在https://github.com/zhyzmath/CurveRL。

英文摘要

Context or prompt-level reweighting has emerged as a central algorithmic lever in Reinforcement Learning with Verified Rewards (RLVR) for improving the reasoning capability of large language models, yet the principle determining what constitutes an optimal weighting remains poorly understood. We address this gap by formulating prompt reweighting as a functional derivative of a utility functional defined in the pass-rate function space, yielding a unified optimality framework that accommodates existing schemes, including REINFORCE and GRPO. Building on this optimality framework, we propose a distribution-aware prompt reweighting approach, called CurveRL, based on a quantile coordinate transform, in which the weight assigned to each prompt depends not on the absolute value of pass rates but on its rank and density to reflect the distributional structure of the pass rates in the learning dynamics. Extensive experiments across multiple benchmarks demonstrate that our proposed CurveRL consistently outperforms GRPO and other RLVR baselines. Our study identifies context-distribution control as a principled axis for analyzing and designing prompt-reweighted RLVR algorithms. The code is released at https://github.com/zhyzmath/CurveRL.

2605.24295 2026-05-26 cs.LG stat.ML

Private Adaptive Covariance Estimation via Gaussian Graphical Models

通过高斯图模型进行私有自适应协方差估计

Cecilia Ferrando, Miguel Fuentes, Brett Mullins, Cameron Musco, Daniel Sheldon

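AI summary: Proposes PACE-GGM, a data-adaptive differentially private method for covariance estimation. It concentrates the privacy budget on the most informative entries of the empirical covariance matrix: in each round it selects a poorly approximated entry, measures it with the Gaussian mechanism, and reconstructs the full covariance matrix through a maximum-entropy objective, which substantially reduces estimation error in high-dimensional and low-to-moderate privacy-budget regimes.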

详情
AI中文摘要

我们提出了PACE-GGM,一种数据自适应的差分隐私协方差估计方法,该方法将隐私预算集中在经验协方差矩阵信息量最大的条目上,而不是扰动所有条目。这适用于建模者为每个变量提供单独边界的自然场景,因此各个条目可以比整个矩阵以更少的噪声进行测量。在每一轮中,我们的方法选择一个近似较差的条目,使用高斯机制对其进行测量,然后通过最大熵重建目标重构完整的协方差矩阵,从而得到高斯图模型结构。在多个真实世界数据集上的实验表明,与高斯机制和其他基线相比,该方法在估计误差方面持续改进,特别是在高维和低到中等隐私预算的情况下。

英文摘要

We propose PACE-GGM, a data-adaptive differentially private method for covariance estimation that concentrates its privacy budget on the most informative entries of the empirical covariance matrix, rather than perturbing all entries. This applies in the natural setting where the modeler supplies separate bounds for each variable, so that individual entries can be measured with less noise than the full matrix. In each round, our method selects a poorly approximated entry, measures it using the Gaussian mechanism, and then reconstructs a full covariance matrix using a maximum-entropy reconstruction objective, leading to a Gaussian graphical model structure. Experiments on diverse real-world datasets demonstrate consistent improvements in estimation error with respect to the Gaussian mechanism and other baselines, particularly in high-dimensional and low-to-moderate privacy regimes.
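To make the "measure one entry with the Gaussian mechanism" step concrete, here is a minimal sketch for a single uncentered second-moment entry. It is not the PACE-GGM implementation: the function name, the replace-one neighbouring relation, the clipping step, and the use of the classical noise calibration (valid for 0 < eps < 1) are assumptions made for illustration.

```python
import math
import numpy as np

def private_second_moment(x_i, x_j, b_i, b_j, eps, delta, rng):
    """Noisy estimate of mean(x_i * x_j) under per-variable bounds.

    Assumes |x_i| <= b_i and |x_j| <= b_j, replace-one neighbours, and
    0 < eps < 1 (the range of the classical Gaussian-mechanism bound).
    """
    n = len(x_i)
    x_i = np.clip(x_i, -b_i, b_i)   # enforce the modeler-supplied bounds
    x_j = np.clip(x_j, -b_j, b_j)
    # Replacing one record moves the mean product by at most 2*b_i*b_j/n.
    sensitivity = 2.0 * b_i * b_j / n
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    noisy = float(np.mean(x_i * x_j) + rng.normal(scale=sigma))
    return noisy, sigma

rng = np.random.default_rng(0)
n = 10_000
x_i = rng.uniform(-1.0, 1.0, size=n)
x_j = 0.5 * x_i + 0.5 * rng.uniform(-1.0, 1.0, size=n)
exact = float(np.mean(x_i * x_j))
noisy, sigma = private_second_moment(x_i, x_j, 1.0, 1.0, eps=0.5, delta=1e-5, rng=rng)
```

Because the sensitivity scales with the product of the two per-variable bounds, a single entry whose variables have tight bounds needs little noise, which is the setting the abstract describes.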

2605.24285 2026-05-26 q-fin.ST stat.ML

Memory, Roughness, and Information Persistence in Financial Markets: A Structural Approach to Volatility Forecasting

金融市场中的记忆、粗糙性与信息持久性:一种波动率预测的结构性方法

Akash Deep, Nicholas Appiah, Svetlozar T. Rachev

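AI summary: By combining long-memory estimation, rough-volatility diagnostics, and structured forecasting regressions, this paper studies the predictive value of persistence measures in equity volatility modeling and finds that they yield moderate but statistically significant out-of-sample forecast improvements, especially during stress periods.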

详情
AI中文摘要

本文研究了长记忆动态、粗糙波动率行为以及基于持久性的预测特征在股票波动率建模中的联合作用。我们结合半参数长记忆估计、粗糙波动率诊断和结构化预测回归,检验持久性度量是否包含超越传统波动率预测因子的经济上有意义的预测信息。使用2001年11月至2026年4月期间115只标普500成分股的面板数据,我们发现波动率代理变量表现出显著的长记忆行为和局部粗糙动态。记忆参数的横截面均值Geweke-Porter-Hudak估计为$\hat{d}=0.226$,而相应的局部Whittle估计为$\hat{d}=0.440$,且几乎整个面板均具有统计显著性。持久性的滚动估计在全球金融危机和COVID期间显著上升,并与VIX呈正相关。然后,我们检验持久性相关特征是否能在标准HAR和HAR-X基准之外改进样本外波动率预测。纳入横截面持久性聚合、行业持久性度量以及持久性与压力交互项,产生了适度但统计上显著的预测改进,尤其是在较长预测期和压力时期。预测收益在市场波动加剧时期和波动率管理的投资组合应用中最为显著。结果表明,持久性度量可作为金融市场不确定性持续时间和传播的有用简化指标,但本文未对产生持久性的经济机制进行结构性识别。

英文摘要

This paper studies the joint role of long-memory dynamics, rough-volatility behavior, and persistence-based forecasting features in equity volatility modeling. We combine semiparametric long-memory estimation, rough-volatility diagnostics, and structured forecasting regressions to examine whether persistence measures contain economically meaningful forecasting information beyond conventional volatility predictors. Using a panel of 115 S&P 500 constituents from November 2001 through April 2026, we document that volatility proxies exhibit substantial long-memory behavior and locally rough dynamics. The cross-sectional mean Geweke-Porter-Hudak estimate of the memory parameter is $\hat{d} = 0.226$, while the corresponding local-Whittle estimate is $\hat{d} = 0.440$, with statistical significance observed across nearly the entire panel. Rolling estimates of persistence rise substantially during the global financial crisis and the COVID period and display a positive contemporaneous association with the VIX. We then examine whether persistence-related features improve out-of-sample volatility forecasts beyond standard HAR and HAR-X benchmarks. Incorporating cross-sectional persistence aggregates, sectoral persistence measures, and persistence-by-stress interaction terms produces moderate but statistically significant forecasting improvements, particularly at longer horizons and during stress regimes. Forecast gains are strongest during periods of elevated market volatility and in volatility-managed portfolio applications. The results suggest that persistence measures may serve as useful reduced-form indicators of the duration and propagation of uncertainty in financial markets, although the paper does not claim structural identification of the economic mechanisms generating persistence.
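For readers unfamiliar with the Geweke-Porter-Hudak (GPH) estimate quoted above, the sketch below shows the standard log-periodogram regression on synthetic data. It is not the paper's pipeline: the bandwidth rule `m = n ** 0.5`, the truncated moving-average simulator, and all names are assumptions chosen for the example.

```python
import numpy as np

def gph_estimate(x, power=0.5):
    """Log-periodogram (GPH) estimate of the memory parameter d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)                        # number of low frequencies used
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    periodogram = np.abs(dft) ** 2 / (2.0 * np.pi * n)
    # log I(lam_j) = c + d * regressor_j + error, so the slope estimates d.
    regressor = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
    slope, _intercept = np.polyfit(regressor, np.log(periodogram), 1)
    return float(slope)

def simulate_fractional_noise(n, d, rng, n_lags=2000):
    """ARFIMA(0, d, 0) via a truncated MA(infinity) expansion."""
    k = np.arange(1, n_lags + 1)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    eps = rng.normal(size=n + n_lags)
    return np.convolve(eps, psi, mode="valid")  # length n

rng = np.random.default_rng(0)
d_white = gph_estimate(rng.normal(size=4096))
d_long = gph_estimate(simulate_fractional_noise(4096, 0.4, rng))
```

White noise should give an estimate near zero and the simulated long-memory series a clearly positive one; the exact values depend on the seed and on the bandwidth `m`.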

2605.24284 2026-05-26 stat.AP physics.geo-ph

Scalable Gaussian Process for Learning Non-Ergodic Ground Motion Model from Physics-Based Simulations with Application to Power Infrastructure Assessment

基于物理模拟的非遍历性地面运动模型的可扩展高斯过程学习及其在电力基础设施评估中的应用

Jinyan Zhao, Grigorios Lavrentiadis, Domniki Asimaki

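AI summary: Proposes a scalable Gaussian process regression model that uses sparse Cholesky inversion, parallel computing, GPU acceleration, and stochastic gradient descent to learn non-ergodic ground-motion effects from large-scale physics-based simulation data. The model is validated for interpolation and prediction in the Los Angeles area and applied to power network assessment.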

详情
AI中文摘要

本研究介绍了洛杉矶地区非遍历性地面运动模型(NGMM)的开发和应用。该模型基于近期加州全州地震中心(SCEC)CyberShake研究中的物理模拟地面运动数据进行训练和验证。NGMM被构建为高斯过程(GP)回归模型,其中先验中位数定义为ASK14遍历性地面运动模型,后验中位数通过学习训练数据中嵌入的非遍历性效应获得。这些非遍历性效应包括系统性场地和路径效应,在GP中使用Matérn和专门协方差核进行表示,这些核显式表征路径向量。实施NGMM需要对大型数据集(约百万级数据点或更多)进行超参数调优和推断,这对传统GP方法构成了重大计算挑战。为解决这一可扩展性问题,本文提出了一套计算策略,包括稀疏Cholesky求逆、并行计算、GPU加速和随机梯度下降最小化。尽管取得了这些进展,完整的CyberShake数据集(约数亿级数据点)在计算上仍然难以处理。因此,使用混合效应模型单独建模随机变异性,以表示事件内和事件间变异性。所开发的NGMM有两个主要应用:部分观测地面运动场的插值和未观测地震场景中地面运动的预测建模。在独立数据集上的验证结果表明,这两种应用均具有准确性能。对Mw 6.7 Puente Hill场景中电力传输网络评估的案例研究进一步表明,所开发的NGMM能够紧密复现物理模拟结果。

英文摘要

This study presents the development and application of a scalable non-ergodic ground motion model (NGMM) for the Los Angeles area. The NGMM is trained and validated on physics-based simulated ground-motion data from a recent Statewide California Earthquake Center (SCEC) CyberShake study. The NGMM is formulated as a Gaussian Process (GP) regression model, where the prior median is defined as the ASK14 ergodic ground-motion model and the posterior median is obtained by learning the non-ergodic effects embedded in the training data. These non-ergodic effects include systematic site and path effects, which are represented in the GP using Matérn and specialized covariance kernels that explicitly characterize path vectors. Implementing the NGMM requires hyperparameter tuning and inference on large datasets (on the order of one million data points or more), posing significant computational challenges for conventional GP approaches. To address this scalability issue, this paper presents a suite of computational strategies, including sparse Cholesky inversion, parallel computing, GPU acceleration, and stochastic gradient descent minimization. Despite these advances, the full CyberShake dataset (on the order of hundreds of millions of data points) remains computationally prohibitive. Therefore, aleatory variability is modeled separately using a mixed-effects formulation to represent within-event and between-event variability. The developed NGMM has two primary applications: interpolation of partially observed ground-motion fields and predictive modeling for ground motions in unobserved earthquake scenarios. Validation results on independent datasets demonstrate accurate performance in both applications. A case study of power transmission network assessment in an Mw 6.7 Puente Hill scenario further demonstrated that the developed NGMM closely reproduces physics-based simulation results.

2605.24274 2026-05-26 cs.LG stat.ML

A lift for input-convex neural network training

输入凸神经网络训练的提升方法

Ali Siahkoohi, Anirudh Thatipelli

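AI summary: To address the training difficulties caused by the non-negative weight constraint in input-convex neural networks (ICNNs), proposes a "lift" that extends the parameters through a hypernetwork, softening the loss landscape, avoiding gradient attenuation, and reaching lower test loss on several tasks.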

详情
AI中文摘要

输入凸神经网络(ICNN)广泛用于对数凹密度估计、凸势归一化流、最优传输以及高维贝叶斯后验的传输图反演。这些任务共享一个结构约束:ICNN的层间权重必须保持非负。标准方法——投影梯度下降(PGD)到非负锥——应用硬非光滑投影(ADMM风格约束分裂的刚性惩罚极限),其经典收敛保证不适用于非光滑的ICNN训练景观;可微替代方案——softplus重参数化——以权重幅度指数方式衰减梯度,导致层间权重死亡和损失平台,从而停滞训练。受PDE约束反问题的参数扩展提升启发,我们提出“提升”:不是直接约束层间权重,而是训练一个无约束的超网络,该超网络从输入批次的置换不变摘要中生成这些权重。这为训练动态增加了随机性,软化了损失景观,使迭代能够逃离直接softplus停滞的梯度衰减区域。我们将这种软化追溯到三个结构要素——作为松弛变量的可学习偏置、条件于目标批次的超网络主体、以及通过批次随机性耦合两者的交叉协方差——并证明每个要素都是必要的:删除任何单个要素都会破坏承载软化的交叉协方差。在一维玩具目标到图像风格潜在变量的对数凹能量建模,以及21维表格基准上的凸势归一化流实验中,我们展示了提升方法比PGD和直接softplus达到更低的测试损失,并将平台受限的训练轨迹转变为下降谷底的轨迹。

英文摘要

Input-convex neural networks (ICNNs) are widely used for log-concave density estimation, convex-potential normalizing flows, optimal transport, and transport-map inversion for high-dimensional Bayesian posteriors. These tasks share a structural constraint: the inter-layer weights of the ICNN must remain non-negative. The standard recipe, projected gradient descent (PGD) onto the non-negative cone, applies a hard, non-smooth projection -- the stiff-penalty limit of an ADMM-style constraint splitting -- and its classical convergence guarantees do not transfer to the non-smooth ICNN training landscape; the differentiable alternative, softplus reparametrization, attenuates the gradient exponentially in the weight magnitude, stalling training with dead inter-layer weights and plateaued loss. Inspired by parameter-extension lifts of PDE-constrained inverse problems, we propose the lift: instead of constraining the inter-layer weights directly, we train an unconstrained hypernetwork that emits them from a permutation-invariant summary of the input batch. This adds stochasticity to the training dynamics that softens the loss landscape, letting the iterates escape the gradient-attenuated region where direct softplus stalls. We trace this softening to three structural ingredients -- a learnable bias acting as slack, a hypernetwork body that conditions on the target batch, and a cross-covariance coupling the two through batch stochasticity -- and prove each one necessary: deleting any single ingredient collapses the cross-covariance that carries the softening. On log-concave energy-based modeling from one-dimensional toy targets to image-flavored latents, and convex-potential normalizing flows on a 21-dimensional tabular benchmark, we show that the lift reaches a lower test loss than both PGD and direct softplus, and turns a plateau-bounded training trajectory into a valley-descending one.
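The sketch below illustrates the baseline this abstract calls direct softplus reparametrization, not the proposed lift. It is a toy two-hidden-layer ICNN with random, hypothetical parameters: the inter-layer weights pass through softplus so they stay non-negative, and the chain-rule factor `sigmoid(v)` shows how the gradient with respect to a raw parameter `v` shrinks as `v` becomes very negative.

```python
import numpy as np

def softplus(v):
    # Numerically stable log(1 + exp(v)); always strictly positive.
    return np.logaddexp(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def icnn_forward(x, params):
    """Two-hidden-layer ICNN; convex in x because the z-to-z weights are >= 0."""
    W0, b0, V1, W1, b1, v_out, w_out = params
    z1 = np.maximum(W0 @ x + b0, 0.0)
    # Inter-layer weights pass through softplus, so they stay non-negative.
    z2 = np.maximum(softplus(V1) @ z1 + W1 @ x + b1, 0.0)
    return float(softplus(v_out) @ z2 + w_out @ x)

rng = np.random.default_rng(0)
d, h = 3, 8
params = (rng.normal(size=(h, d)), rng.normal(size=h),
          rng.normal(size=(h, h)), rng.normal(size=(h, d)), rng.normal(size=h),
          rng.normal(size=h), rng.normal(size=d))

# d softplus(v)/dv = sigmoid(v), which shrinks like exp(v) as v -> -infinity.
raw = np.array([0.0, -5.0, -20.0])
grad_scale = sigmoid(raw)
```

A raw parameter that drifts far into the negative range therefore receives almost no gradient, which is one way to picture the stalled training that the abstract attributes to this reparametrization.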

2605.24243 2026-05-26 cs.CV cs.AI stat.ML

GIBLy: Improving 3D Semantic Segmentation through an Architecture-Agnostic Lightweight Geometric Inductive Bias Layer

GIBLy: 通过架构无关的轻量级几何归纳偏置层改进3D语义分割

Diogo Lavado, Alessandra Micheletti, Clàudia Soares

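AI summary: Proposes GIBLy, a lightweight geometric inductive bias layer that improves 3D segmentation by integrating learnable geometric priors, giving consistent gains on multiple benchmarks while adding only a small number of parameters.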

详情
AI中文摘要

在3D场景理解中,深度学习模型依赖大型模型和大量训练来捕捉3D数据中存在的几何结构。然而,现有方法缺乏显式机制来融入几何信息(例如可学习的基元形状),往往需要更大的模型和更多的训练数据,这增加了成本并可能限制泛化能力。我们引入了GIBLy,一种轻量级几何归纳偏置层,将可学习的几何先验集成到3D分割流程中。GIBLy通过提供与简单几何形状(因此可解释)对齐的特征来增强现有架构——无论是基于MLP、卷积还是Transformer——以最小的计算开销提升分割性能。我们在多个3D语义分割基准上验证了我们的方法,展示了一致的性能提升,包括在TS40K上使用PTV3时mIoU提升高达+11.5%,而仅增加58K额外参数。我们的结果突显了显式编码几何结构以支持准确高效的3D场景理解的优势,且仅需一个轻量级的附加层。

英文摘要

In 3D scene understanding, deep learning models rely on large models and extensive training to capture basic geometric structures that are present in the 3D data. However, existing methods lack explicit mechanisms to incorporate geometric information, such as learnable primitive shapes, often necessitating large models and more training data which in turn increases cost and can limit generalization. We introduce GIBLy, a lightweight geometric inductive bias layer that integrates learnable geometric priors into 3D segmentation pipelines. GIBLy enhances existing architectures -- whether MLP-based, convolution-based, or transformer-based -- by providing features aligned with simple geometric shapes (and thus human-interpretable) that improve segmentation performance with minimal computational overhead. We validate our approach across multiple 3D semantic segmentation benchmarks, demonstrating consistent performance gains, including up to +11.5% mIoU on TS40K with PTV3, while adding only 58K extra parameters. Our results highlight the benefit of explicitly encoding geometric structure to support accurate and efficient 3D scene understanding, with a lightweight add-on layer.

2605.24212 2026-05-26 stat.AP cs.AI cs.LG stat.ML

Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction

分布鲁棒迁移学习在结构缺失协变量中的应用:以跨国心脏骤停预测为例

Siqi Li, Chuan Hong, Ziye Tian, Benjamin Sieu-Hon Leong, Koshi Nakagawa, Hideharu Tanaka, Sang Do Shin, Khuong Quoc Dai, Do Ngoc Son, Marcus Eng Hock Ong, Nan Liu, Molei Liu

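AI summary: Proposes the DRUM framework, which handles covariates that are structurally missing in the target domain through distributionally robust optimization and a neural network generator, enabling transfer of prediction models to unlabeled target domains; its effectiveness is validated on cross-national cardiac arrest prediction.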

详情
AI中文摘要

当关键训练协变量在部署时不可用且目标域中标记结果有限时,跨医疗系统部署临床预测模型常常失败。例如,院外心脏骤停(OHCA)的高性能模型依赖于高资源环境中常规收集的详细院前测量数据,但在许多国际登记处中不可用。现有方法要么丢弃缺失协变量,牺牲预测信息,要么依赖于关于其目标分布的可检验假设。我们提出了DRUM(具有结构缺失协变量的分布鲁棒无监督迁移学习),这是一个将预测模型迁移到某些协变量结构缺失且结果标签不可用的目标群体的框架。DRUM将协变量划分为共享组件($X$,在所有环境中观察到)和缺失组件($A$,仅在源域中观察到)。DRUM不进行缺失协变量插补,而是使用神经网络生成器优化未知目标分布$A \mid X$上的最坏情况预测性能,并通过鲁棒性参数控制与源条件允许的偏差。我们进一步开发了一种偏差校正程序,以减少对干扰估计误差的敏感性。模拟显示,在分布偏移下,平均和最坏情况预测误差均有显著改善。应用于跨国OHCA预测,将模型从美国登记处迁移到多个未记录院前变量的亚洲登记处,DRUM在各个站点产生了更校准的预测和改进的临床分类性能。

英文摘要

Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and labeled outcomes are limited in the target domain. For example, high-performing models for out-of-hospital cardiac arrest (OHCA) rely on detailed prehospital measurements routinely collected in high-resource settings but unavailable in many international registries. Existing methods either discard missing covariates, sacrificing predictive information, or rely on untestable assumptions about their target distribution. We propose DRUM (\underline{D}istributionally \underline{R}obust \underline{U}nsupervised transfer learning with structurally \underline{M}issing covariates), a framework that transfers prediction models to target populations where certain covariates are structurally absent and outcome labels are unavailable. DRUM partitions covariates into shared components ($X$), observed across all settings, and missing components ($A$), observed only in the source. Rather than imputing missing covariates, DRUM optimizes worst-case predictive performance over the unknown target distribution of $A \mid X$ using a neural network generator, with a robustness parameter controlling allowable deviation from the source conditional. We further develop a bias correction procedure that reduces sensitivity to nuisance estimation error. Simulations show substantial improvements in both mean and worst-case prediction error under distribution shift. Applied to cross-national OHCA prediction, transferring models from a US registry to multiple Asian registries where prehospital variables are unrecorded, DRUM yields better-calibrated predictions and improved clinical classification performance across sites.

2605.24210 2026-05-26 cs.LG stat.ML

Characterizing the Representational Capacity of Neural Processes

神经过程表示能力的刻画

Robin Young

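AI summary: Through a strict-hierarchy analysis, this paper characterizes the representational capacity of Conditional, Attentive, and Transformer Neural Processes and their latent variants, revealing the inclusion relations and limitations among the architectures in the functions they can represent.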

详情
Comments
To appear at ProbML/AABI 2026
AI中文摘要

神经过程能表示哪些函数?我们分析了流行的NP架构的表示能力:条件神经过程(CNPs)、注意力神经过程(ANPs)、Transformer神经过程(TNPs)及其潜在变体。我们证明这些架构形成了一个严格的层级结构。CNP可表示的函数恰好是那些依赖于上下文分布的有限多个期望特征的函数。ANPs通过查询相关的重新加权严格推广了CNPs,从而实现了核平滑器。ConvCNPs和ANPs不可比较;每个都包含对方之外的函数,通过平稳性与平移等变性区分。具有$L$个自注意力层的TNPs捕获$L$跳上下文交互。对于潜在NPs,我们证明有限维潜在变量提供一致的采样,但不能规避编码器的限制;匹配GP后验分布需要潜在维度随上下文大小缩放。这些结果为基于任务结构的架构选择提供了理论基础。

英文摘要

What functions can Neural Processes represent? We analyze the representational capacity of popular NP architectures: Conditional Neural Processes (CNPs), Attentive Neural Processes (ANPs), Transformer Neural Processes (TNPs), and their latent variants. We prove these architectures form a strict hierarchy. CNP-representable functions are exactly those depending on finitely many expected features of the context distribution. ANPs strictly generalize CNPs via query-dependent reweighting, enabling kernel smoothers. ConvCNPs and ANPs are incomparable; each contains functions outside the other, separated by stationarity versus translation equivariance. TNPs with $L$ self-attention layers capture $L$-hop context interactions. For latent NPs, we show finite-dimensional latents provide coherent sampling but do not circumvent encoder limitations; matching GP posterior distributions requires latent dimension scaling with context size. These results provide a theoretical foundation for architecture selection based on task structure.
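The statement that CNPs depend on "expected features of the context distribution" can be pictured with the usual mean-aggregation encoder. The sketch below is an illustration only: the one-layer `tanh` feature map and its random weights are hypothetical, and the code shows the two invariances implied by averaging rather than anything proved in the paper.

```python
import numpy as np

def cnp_representation(xc, yc, W, b):
    """Mean of per-point features h(x_i, y_i): an empirical expectation."""
    pairs = np.stack([xc, yc], axis=1)     # shape (n_context, 2)
    feats = np.tanh(pairs @ W + b)         # hypothetical one-layer encoder
    return feats.mean(axis=0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 4)), rng.normal(size=4)
xc, yc = rng.normal(size=10), rng.normal(size=10)
perm = rng.permutation(10)

r = cnp_representation(xc, yc, W, b)
r_perm = cnp_representation(xc[perm], yc[perm], W, b)                # reordered context
r_dup = cnp_representation(np.tile(xc, 2), np.tile(yc, 2), W, b)     # duplicated context
```

Reordering or duplicating the context leaves the representation unchanged, because the encoder sees the context only through a finite vector of averaged features.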

2605.24169 2026-05-26 stat.ME

Post-Processing Posterior Predictive P-values

后处理后验预测p值

Nils Lid Hjort, Fredrik A. Dahl, Gunnhildur Högnadóttir Steinbakk

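AI summary: To address the non-uniform distribution of posterior predictive p-values (ppp values) in Bayesian models, proposes calibrated cppp values that are uniformly distributed under model conditions, which makes model assessment and comparison easier.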

详情
Journal ref
Journal of the American Statistical Association, 2006, vol. 101, pp 1157-1174
Comments
35 pages, 5 figures. This is the authors' Statistical Research Report, Department of Mathematics, University of Oslo, from 2005, later accepted in modified form in Journal of the American Statistical Association, 2006, vol. 101, pp 1157-1174
AI中文摘要

本文讨论了贝叶斯背景下的模型批评和模型比较问题,重点关注所谓的后验预测p值(ppp值)的使用。这些值涉及一般的差异或冲突度量,并依赖于先验、模型和数据。它们在统计实践中用于量化数据中的意外或冲突程度,以及比较先验和模型的不同组合。然而,正如我们在不同模型中所展示的,这种ppp值的分布远非均匀,使得它们的解释和比较变得困难。我们提出了一种自然的ppp值校准方法,得到的cppp值在模型条件下在单位区间上均匀分布。cppp值的计算通常依赖于双重模拟方案,然后可用于评估和比较不同的先验和模型。我们的方法还使得参数模型与非参数模型规范之间的比较成为可能,因为真正的“意外度量”被置于相同的规范均匀尺度上。我们的技术通过一些实际数据应用进行了说明。我们还提供了关于ppp和cppp各种性质的补充理论结果。

英文摘要

This article addresses issues of model criticism and model comparison in Bayesian contexts, and focusses on the use of the so-called posterior predictive p-values (ppp values). These involve a general discrepancy or conflict measure and depend on the prior, the model, and the data. They are used in statistical practice to quantify the degree of surprise or conflict in data, and for purposes of comparing different combinations of prior and model. The distribution of such ppp values is however far from uniform, as we demonstrate for different models, making their interpretation and comparison a difficult matter. We propose a natural calibration of the ppp values, where the resulting cppp values are uniform on the unit interval under model conditions. The cppp values, which in general rely on a double simulation scheme for their computation, may then be used to assess and compare different priors and models. Our methods also make it possible to compare parametric with nonparametric model specifications, in that genuine `measures of surprise' are put on the same canonical uniform scale. Our techniques are illustrated for some applications to real data. We also present supplementing theoretical results on various properties of the ppp and cppp.
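The double simulation scheme mentioned above can be sketched for a toy conjugate model. This is one plausible reading of the calibration, not the authors' code: the model (y_i ~ N(theta, 1) with theta ~ N(0, tau2)), the discrepancy max|y_i|, the use of the prior predictive distribution for the outer simulation, and the simulation sizes are all assumptions.

```python
import numpy as np

def ppp_value(y, tau2, rng, n_rep=200):
    """Inner simulation: posterior predictive p-value for T(y) = max |y_i|."""
    n = len(y)
    post_var = 1.0 / (n + 1.0 / tau2)          # conjugate normal posterior
    post_mean = post_var * y.sum()
    theta = rng.normal(post_mean, np.sqrt(post_var), size=n_rep)
    y_rep = rng.normal(theta[:, None], 1.0, size=(n_rep, n))
    t_rep = np.abs(y_rep).max(axis=1)
    return float(np.mean(t_rep >= np.abs(y).max()))

def cppp_value(y_obs, tau2, rng, n_outer=200, n_rep=200):
    """Outer simulation: where does ppp(y_obs) fall among ppp values of model data?"""
    n = len(y_obs)
    ppp_obs = ppp_value(y_obs, tau2, rng, n_rep)
    theta = rng.normal(0.0, np.sqrt(tau2), size=n_outer)
    ppp_sim = np.array([
        ppp_value(rng.normal(t, 1.0, size=n), tau2, rng, n_rep) for t in theta
    ])
    return float(np.mean(ppp_sim <= ppp_obs)), ppp_obs

rng = np.random.default_rng(0)
y_obs = rng.normal(0.3, 1.0, size=20)
y_obs[0] = 8.0                      # an outlier the model should find surprising
cppp, ppp = cppp_value(y_obs, tau2=1.0, rng=rng)
```

The cost is `n_outer * n_rep` replicated data sets, which is why the abstract notes that cppp values generally rely on double simulation.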

2605.24167 2026-05-26 stat.ME

Modified treatment policies that depend on the natural history of treatment

依赖于治疗自然史的修正治疗策略

Iván Díaz, Nicholas T. Williams, Paweł Morzywołek, Kara E. Rudolph

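AI summary: For longitudinal modified treatment policies that depend on the natural history of treatment, including past values, this paper proposes targeted learning estimators based on the efficient influence function and provides $\sqrt{n}$ inference under standard doubly robust rate assumptions.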

详情
AI中文摘要

纵向修正治疗策略(LMTP)是一类干预措施,允许在一般设置中定义、识别和估计因果效应,例如连续或多变量暴露、需要宽限期的治疗方案。针对将时间$t$的暴露分配为时间$t$的治疗自然值的函数的LMTP,已经制定了目标机器学习估计量(即双重/去偏)。然而,重要的应用,如估计治疗开始延迟的效果,需要制定不仅依赖于时间$t$的治疗自然值,还依赖于时间$t$之前治疗自然值的\textit{历史}的LMTP。本文针对这种一般情况开发了目标学习估计量。我们讨论了效应的定义,并提出了使用纵向g计算公式的序贯回归形式的增广数据版本的估计量。我们的估计量基于有效影响函数,并在结果和治疗回归收敛的标准双稳健速率假设下提供$\sqrt{n}$推断。我们将新估计量应用于评估将风险疼痛治疗延迟一个月对12个月阿片类药物使用障碍发生率的影响。

英文摘要

Longitudinal modified treatment policies (LMTP) are a class of interventions that allow the definition, identification, and estimation of causal effects in general settings, such as those with continuous or multivariate exposures or treatment regimens that require grace periods. Targeted machine learning estimators (i.e., double/debiased) have been formulated for LMTPs that assign the exposure at time $t$ as a function of the natural value of treatment at time $t$. However, important applications such as estimating the effect of a delay in the start of a treatment require formulating LMTPs that depend not only on the natural value of treatment at time $t$ but also on the \textit{history} of the natural value of treatment prior to time $t$. This paper develops targeted learning estimators for this general case. We discuss the definition of the effects, and propose estimators that use an augmented-data version of the sequential regression form of the longitudinal g-computation formula. Our estimators are based on the efficient influence function and provide $\sqrt{n}$ inference under standard doubly robust rate assumptions on the convergence of the outcome and treatment regressions. We apply the new estimators to assess the effect of delaying a risky pain treatment by one month on 12-month incidence of opioid use disorder.

2605.24156 2026-05-26 math.ST stat.TH

Long Memory in Intrinsically Dynamic Factor Models

内在动态因子模型中的长记忆性

Qin Wen, Clifford M. Hurvich

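AI summary: Studies estimation of the generalized dynamic factor model in a long-memory setting, using a two-sided estimation method to recover the common component and relying on the eigengap and $L^p$ norms to handle the unboundedness of the spectral density.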

详情
Comments
92 pages, 11 figures
AI中文摘要

我们研究了长记忆设定下的广义动态因子模型。与大多数近期工作假设有限维因子空间和短记忆不同,我们的框架允许因子空间是无限维的,并且公共成分表现出长记忆。我们采用Forni、Hallin、Lippi和Reichlin(2000,Review of Economics and Statistics)的双侧估计方法来恢复公共成分。公共成分的长记忆结构带来了挑战,因为它引入了谱密度的无界性/不连续性。我们通过利用两个关键事实来解决这个问题:第一,估计算子是对前导特征空间的投影,因此特征间隙提供了内在的缩放,部分缓解了爆炸。第二,我们主要在$L^p$范数下进行估计,而不是逐点估计。实验结果提供了支持该理论的证据,以及对其潜在改进的启示。

英文摘要

We study the generalized dynamic factor model in a long-memory setting. Unlike most recent work, which assumes a finite-dimensional factor space and short memory, our framework allows the factor space to be infinite-dimensional and the common components to exhibit long memory. We employ the two-sided estimation method of Forni, Hallin, Lippi and Reichlin (2000, Review of Economics and Statistics) to recover the common component. The long memory structure of the common component poses a challenge, as it introduces unboundedness/discontinuity in the spectral density. We address this issue by leveraging two key facts: First, the estimated operator is a projection onto the leading eigenspace and thus the eigengap provides an intrinsic scaling that partially mitigates the blow-up. Second, we perform most of our estimation in $L^p$-norm, rather than pointwise. Experimental results are presented to provide evidence supporting the theory, as well as potential improvements to it.

2605.24136 2026-05-26 stat.ML cs.LG stat.CO

Detecting Metastable Basins in High Dimensions via Marginal Trajectory Distribution Discrimination

通过边际轨迹分布判别检测高维亚稳态盆地

Taj Jones-McCormick

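AI summary: Proposes a discriminative method based on comparing marginal trajectory distributions, in which a neural network approximates the Bayes classifier to identify metastable basins in high-dimensional Markov processes, overcoming the limitations of traditional spectral methods in high dimensions and under nonlinear geometry.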

详情
AI中文摘要

我们研究仅使用轨迹采样来识别高维时间齐次马尔可夫过程中动态不同的吸引盆地的问题。该问题是亚稳态动力系统分析的基础,其中过程在盆地内快速混合,而盆地之间的转换在感兴趣的时间尺度上很少发生,甚至当状态空间可约时也是如此。现有方法通常依赖于空间离散化或估计转移算子的谱分析,这在高维设置或底层盆地几何高度非线性时可能变得不可靠。我们提出了一种基于边际轨迹分布比较的盆地识别判别方法。我们证明了一个简单的风险分离结果:如果两个初始状态属于同一盆地,则区分其边际轨迹分布的贝叶斯最优分类器达到接近1/2的风险,而如果它们位于不同的盆地,则最优风险接近零。这一观察将盆地检测简化为边际轨迹分布之间的两样本判别问题。基于这一原理,我们开发了一种神经算法,该算法接收一组候选盆地代表,并通过神经网络近似贝叶斯分类器估计分类风险,迭代地合并它们。我们在各种亚稳态系统上评估了该方法。这些系统包括通过将低维动力学嵌入高维噪声环境空间构建的合成系统。在这些设置中,标准的谱和聚类方法常常失败,而我们的方法准确恢复了底层盆地结构。这些结果显示了现有方法的缺点,并突出了轨迹判别作为识别高维随机系统中动态盆地的有效工具。

英文摘要

We study the problem of identifying dynamically distinct basins of attraction in high-dimensional time-homogeneous Markov processes using only trajectory sampling. This problem is fundamental in the analysis of metastable dynamical systems, where the process rapidly mixes within basins while transitions between basins occur rarely on the timescale of interest, or even when the state space is reducible. Existing approaches typically rely on spatial discretization or spectral analysis of estimated transition operators, which can become unreliable in high-dimensional settings or when the underlying basin geometry is highly nonlinear. We propose a discriminative approach to basin identification based on marginal trajectory distribution comparison. We prove a simple risk separation result: if two initial states belong to the same basin, the Bayes-optimal classifier distinguishing their marginal trajectory distributions achieves risk close to 1/2, whereas if they lie in distinct basins, the optimal risk is close to zero. This observation reduces basin detection to a two-sample discrimination problem between marginal trajectory distributions. Motivated by this principle, we develop a neural algorithm that receives a set of candidate basin representatives and iteratively merges them by estimating classification risk with a neural network that approximates the Bayes classifier. We evaluate the method on various metastable systems. These include synthetic systems constructed by embedding low-dimensional dynamics into high-dimensional noisy ambient spaces. In these settings, standard spectral and clustering-based methods often fail, while our approach accurately recovers the underlying basin structure. These results expose a shortcoming of existing methods and highlight trajectory discrimination as an effective tool for identifying dynamical basins in high-dimensional stochastic systems.
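The risk-separation idea can be tried on a one-dimensional double-well system. The sketch below is a toy stand-in, not the paper's neural algorithm: it swaps the neural classifier for a nearest-centroid rule, and the potential V(x) = (x^2 - 1)^2, the noise level, the step size, and the start points are assumptions chosen so that no barrier crossings occur in the simulated window.

```python
import numpy as np

def simulate_marginal(x0, n_traj, rng, n_steps=500, dt=0.01, noise=0.3):
    """Euler-Maruyama samples of X_T for dX = -V'(X) dt + noise dW, V = (x^2 - 1)^2."""
    x = np.full(n_traj, float(x0))
    for _ in range(n_steps):
        drift = -4.0 * x * (x ** 2 - 1.0)
        x = x + drift * dt + noise * np.sqrt(dt) * rng.normal(size=n_traj)
    return x

def discrimination_risk(a, b):
    """Held-out error of a nearest-centroid rule separating samples a and b."""
    half = len(a) // 2
    mu_a, mu_b = a[:half].mean(), b[:half].mean()   # fit on the first half
    test_a, test_b = a[half:], b[half:]             # evaluate on the second half
    err_a = np.mean(np.abs(test_a - mu_a) > np.abs(test_a - mu_b))
    err_b = np.mean(np.abs(test_b - mu_b) >= np.abs(test_b - mu_a))
    return float(0.5 * (err_a + err_b))

rng = np.random.default_rng(0)
left_a = simulate_marginal(-1.0, 2000, rng)   # two starts in the left well
left_b = simulate_marginal(-0.8, 2000, rng)
right = simulate_marginal(1.0, 2000, rng)     # one start in the right well

risk_same = discrimination_risk(left_a, left_b)
risk_diff = discrimination_risk(left_a, right)
```

Start points in the same well give a risk near 1/2 and start points in different wells give a risk near zero, so thresholding the estimated risk gives a merge rule of the kind the abstract describes.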

2605.24123 2026-05-26 stat.AP

Heritability: A Counterfactual Perspective

遗传度:一个反事实视角

Haochen Lei, Jieru Shi, Hongyuan Cao, Qingyuan Zhao

AI总结 提出基于反事实框架的遗传度定义,利用潜在结果模型量化遗传继承的重要性,并给出可从观测数据计算的界限,与常见遗传度概念进行比较。

详情
Comments
46 pages
AI中文摘要

遗传度是生物和社会科学中关于先天与后天长期争论的核心概念。然而,现有的遗传度概念基于强假设,且未使用显式因果模型。我们通过采用因果推断中的潜在结果模型,提出了一种新的反事实遗传度定义。我们的反事实遗传度通过个体与其假设的、暴露于完全相同环境的“非同卵双胞胎”之间的平均差异大小来衡量遗传继承的重要性。我们给出了原则上可从观测数据计算的反事实遗传度界限。然后,我们将反事实遗传度及其相关界限与基于人群的研究、双胞胎和同胞研究以及植物育种实验中常见的遗传度概念进行了比较。我们的结果和比较强调了在推理遗传度时澄清因果结构假设和反事实比较的重要性。

英文摘要

Heritability is a central concept in the long-standing debate about nature versus nurture in biological and social sciences. However, existing notions of heritability are based on strong assumptions and do not use explicit causal models. We propose a new, counterfactual definition of heritability by adopting the potential outcomes model in causal inference. Our counterfactual heritability measures the importance of genetic inheritance by the average magnitude of difference between an individual and their hypothetical "non-identical twin" that is exposed to the exact same environment. We provide bounds on the counterfactual heritability that can, in principle, be computed from observational data. We then compare counterfactual heritability and its associated bounds with common notions of heritability in population-based studies, twin and sibling studies, and plant breeding experiments. Our results and comparisons highlight the importance of clarifying the causal structural assumptions and counterfactual comparisons in reasoning about heritability.

2605.24118 2026-05-26 stat.ME

PCA score regression: the art of losing power

主成分得分回归:失去统计效能的艺术

Yu Lu, Nidhi Pai, Erjia Cui, Ciprian Crainiceanu

AI总结 本文通过理论分析和模拟实验,证明主成分得分回归(RPCS)相比函数对标量回归(FoSR)在检测函数型数据与协变量关联时存在统计效能损失、α水平膨胀及推断无效的问题,而FoSR通过特定建模工具可避免这些问题。

详情
AI中文摘要

主成分得分对协变量的回归(RPCS)是一种广泛使用的分析方法,用于检测和检验函数型测量与研究对象特征之间的关联。这里我们证明:(1)RPCS相对于函数对标量回归(FoSR)会损失统计效能;(2)效能损失的程度取决于主成分与真实效应之间的相关性;(3)如果不进行多重性校正,RPCS的α水平会膨胀;(4)当前的RPCS方法无法为真实效应提供有效的推断。相比之下,我们证明函数对标量回归(FoSR)通过使用特定的建模工具组合可以避免这些问题。我们通过大量模拟验证了这些理论发现,并使用国家健康与营养调查(NHANES)的分钟级加速度计数据说明了其实际意义。

英文摘要

The regression of principal component scores (RPCS) on covariates is a widely used analytic approach to detect and test for associations between functional measurements and study participant characteristics. Here we show that: (1) RPCS loses power relative to Function on Scalar Regression (FoSR); (2) the amount of power loss depends on the correlation between the PCs and the true effect; (3) if not corrected for multiplicity, RPCS has inflated $α$-level; and (4) current RPCS methods do not provide valid inference for the true effect. In contrast, we show that Function on Scalar Regression (FoSR) can avoid these problems using a particular combination of modeling tools. We validate these theoretical findings through extensive simulations and illustrate their practical implications using minute-level accelerometry data from the National Health and Nutrition Examination Survey (NHANES).

2605.24113 2026-05-26 cs.LG math.DG math.OC math.ST stat.TH

Riemannian Archetypal Analysis: Interpretable non-linear data analysis on deformed star distributions

黎曼原型分析:变形星形分布上的可解释非线性数据分析

Willem Diepeveen, Deanna Needell

AI总结 提出黎曼原型分析并定义黎曼原型映射(RAM),通过数据驱动的拉回几何将经典原型分析扩展到非线性流形,结合可解释性与非线性表达能力,并采用先凸松弛、后非凸细化的优化方案实现。

详情
AI中文摘要

经典原型分析因其可解释性而具有吸引力,但其线性几何在处理强非线性结构的数据时可能限制性能;同时,现有的神经扩展提高了灵活性,但往往削弱了原型和插值的几何意义。在这项工作中,我们基于数据驱动的拉回几何,针对实值数据开发了黎曼版本的原型分析,旨在结合经典原型分析的可解释性与现代非线性模型的表达能力。我们引入了一类变形星形分布及其相关的拉回黎曼几何,以提供所得流形映射的统计解释,将黎曼原型映射(RAM)定义为投影到原型的测地凸组合流形上,并提出了基于凸松弛后接非凸细化的实用优化方案。我们进一步提出了一种学习方案,从数据中产生合理但通常次优的变形星形分布。在合成示例和MNIST上的实验表明,所提出的框架产生了有意义的测地线、有用的去噪投影和几何感知分类,同时也明确了当前优化限制所在。

英文摘要

Classical archetypal analysis is appealing for its interpretability, but its linear geometry can limit performance on data with strongly non-linear structure; at the same time, existing neural extensions improve flexibility while often weakening the geometric meaning of archetypes and interpolations. In this work, we develop a Riemannian version of archetypal analysis based on data-driven pullback geometry for real-valued data, with the goal of combining the interpretability of classical archetypal analysis with the expressive power of modern non-linear models. We introduce a class of deformed star distributions together with associated pullback Riemannian geometry to provide a statistical interpretation of the resulting manifold mappings, define the Riemannian archetypal mapping (RAM) as a projection onto the manifold of geodesically convex combinations of archetypes, and propose a practical optimization scheme based on convex relaxation followed by non-convex refinement. We further propose a learning scheme that yields reasonable, albeit generally suboptimal, deformed star distributions from data. Experiments on synthetic examples and MNIST show that the resulting framework produces meaningful geodesics, useful denoising projections, and geometry-aware classifications, while also clarifying where current optimization limitations remain.

2605.24076 2026-05-26 stat.ML cs.LG

Causality as the Statistical Conscience of Artificial Intelligence: From Pearl's Ladder to Trustworthy Machines

因果关系作为人工智能的统计良知:从珀尔的阶梯到可信机器

Ernest Fokoué

AI总结 本文论证因果推断是AI不可或缺的统计良知,通过统计必要性定理、统一因果统计估计框架以及三种AI失败模式的因果盲区分析,提出可信AI本质上是因果统计问题。

详情
Comments
18 pages, 4 figures, 1 table
AI中文摘要

现代人工智能通过优化大规模语料库上的统计风险泛函实现了卓越的预测能力。然而,这与真正的智能之间存在差距:无法区分相关性与因果关系。本文认为,因果推断(识别干预下不变的机制)是人工智能不可或缺的统计良知。没有因果基础,AI系统只是相关性机器:在熟悉领域强大,在分布偏移下脆弱,在高风险场景中存在偏见。三个贡献发展了这一论点。首先,因果泛化的统计必要性定理:任何实现分布外泛化的算法必须编码因果结构,形式化了预测P(Y|X)与智能P(Y|do(X))之间的区别。其次,一个统一框架将珀尔的do演算、潜在结果框架、双机器学习以及不变风险最小化连接为一系列因果统计估计量,每个估计量在不同假设下识别干预分布。第三,三种AI失败模式(大语言模型中的幻觉、基于人类反馈的强化学习中的奖励黑客以及分布偏移下的退化)是因果盲区的表现,每种都有原则性的统计补救措施。可信AI的核心是一个因果统计问题。统计界不仅有能力解决它,而且是唯一拥有严格解决该问题所需基础工具的群体。

英文摘要

Modern Artificial Intelligence achieves remarkable predictive power by optimizing statistical risk functionals over vast corpora. Yet a gap separates this from genuine intelligence: the inability to distinguish correlation from causation. This paper argues that causal inference (identifying mechanisms invariant under intervention) is AI's indispensable statistical conscience. Without causal grounding, AI systems are correlation machines: powerful in familiar domains, brittle under distribution shift, and biased in high-stakes settings. Three contributions develop this argument. First, a Statistical Necessity Theorem for Causal Generalization: any algorithm achieving out-of-distribution generalization must encode causal structure, formalizing the distinction between prediction P(Y|X) and intelligence P(Y|do(X)). Second, a unified framework connects Pearl's do-calculus, the Potential Outcomes framework, Double Machine Learning, and Invariant Risk Minimization as a family of Causal Statistical Estimators, each identifying interventional distributions under different assumptions. Third, three AI failure modes (hallucination in large language models, reward hacking in reinforcement learning from human feedback, and degradation under distribution shift) are manifestations of causal blindness, each admitting a principled statistical remedy. Trustworthy AI is, at its core, a problem of causal statistics. The statistical community is not merely equipped to solve it -- it is the only community with the foundational tools to do so rigorously.
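
The distinction between P(Y|X) and P(Y|do(X)) drawn in the abstract can be made concrete with a toy linear model. The sketch below is an illustration only, not the paper's framework: it assumes a single observed confounder Z with made-up coefficients and uses plain back-door adjustment by regression. Regressing Y on X alone recovers the associational slope, while adjusting for Z recovers the causal slope that an intervention on X would reveal.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(1)
n = 100_000
# Assumed toy structural model: Z -> X, Z -> Y, and X -> Y with causal slope 1.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Associational slope: regress Y on X alone (describes E[Y | X]).
slope_obs = cov(x, y) / cov(x, x)

# Back-door adjustment: regress Y on (X, Z) and read off the X coefficient.
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
sxy, szy = cov(x, y), cov(z, y)
slope_adj = (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)
```

In this model the population value of the unadjusted slope is 2 and that of the adjusted slope is 1, so a predictor fitted on X alone would overstate the effect of intervening on X by a factor of two.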

2605.24072 2026-05-26 stat.ML cs.LG math.PR

Optimal Non-Asymptotic Edgeworth Expansions for Multivariate Neural Network Outputs

多元神经网络输出的最优非渐近 Edgeworth 展开

Lucia Celli

AI总结 针对有限宽度全连接神经网络输出,利用任意阶 Edgeworth 展开逼近其与高斯极限的偏差,并给出总变差距离的上下界。

详情
Comments
34 pages, 2 figures
AI中文摘要

具有高斯初始化权重的有限宽度全连接神经网络偏离其无限宽度高斯极限,表现出非消失的高阶累积量。我们针对在有限个输入上评估的神经网络,使用任意阶 $4m-1$($m\in\mathbb{N}$)的多维 Edgeworth 展开来逼近这些偏差。假设相应的高斯极限具有可逆协方差矩阵且激活函数为多项式有界,我们在真实网络输出分布与其 Edgeworth 逼近之间的总变差距离上建立了 $n^{-m}$ 阶的界,并给出了匹配的下界。作为一个应用,我们量化了当先验被其 Edgeworth 展开替代时贝叶斯后验分布的误差。我们的结果更具一般性,也适用于收敛到具有可逆协方差的高斯向量的条件高斯向量序列。

英文摘要

Finite-width fully connected neural networks with Gaussian-initialized weights deviate from their infinite-width Gaussian limit, exhibiting non-vanishing higher-order cumulants. We approximate these deviations, for a neural network evaluated in a finite number of inputs, using multidimensional Edgeworth expansions of arbitrary order $4m-1$, with $m\in\mathbb{N}$. Assuming that the corresponding Gaussian limit has an invertible covariance matrix and that the activation function is polynomially bounded, we establish a bound of order $n^{-m}$ on the total variation distance between the law of the true network output and its Edgeworth approximation, with matching lower bounds. As an application, we quantify the error in Bayesian posterior distributions when the prior is replaced by its Edgeworth expansion. Our results are more general and also apply to sequences of conditionally Gaussian vectors converging to a Gaussian vector with invertible covariance.

2605.24070 2026-05-26 stat.CO cs.NA math.NA math.PR

Convergence and non-asymptotic error analysis for kinetic Langevin samplers using the exact harmonic Langevin integrator

基于精确调和朗之万积分器的动力学朗之万采样器的收敛性与非渐近误差分析

Katharina Schuh

AI总结 提出一种基于精确调和朗之万积分器的分裂方案的新型动力学朗之万采样器,针对强对数凹目标测度,建立L^2-Wasserstein距离下的收敛速率和非渐近误差界。

详情
Comments
36 pages, 4 figures
AI中文摘要

我们提出了一种基于特定分裂方案的新型动力学朗之万采样器,该方案使用精确调和朗之万积分器。对于强对数凹目标测度,该采样器将强凸势能分解为二次部分和具有Lipschitz连续梯度的凸扰动。对于由该分裂得到的一阶和二阶格式,我们建立了$L^2$-Wasserstein距离下的收敛速率以及非渐近误差界。特别地,收缩率与底层连续动力学的收缩率同阶。为了达到$\varepsilon$精度,二阶格式所需的步长与已有的分裂方案(如OBABO或UBU)相当,这些方案广泛应用于机器学习和分子动力学。

英文摘要

We propose a novel kinetic Langevin sampler based on a specific splitting scheme using the exact harmonic Langevin integrator. For strongly log-concave target measures, the sampler exploits a decomposition of the strongly convex potential into a quadratic part and a convex perturbation with Lipschitz continuous gradient. For the resulting first- and second-order schemes associated with this splitting we establish convergence rates in $L^2$-Wasserstein distance as well as non-asymptotic error bounds. In particular, the contraction rate is of the same order as that of the underlying continuous dynamics. To achieve $\varepsilon$-accuracy, the required step size for the second-order scheme is comparable to that of established splitting schemes such as OBABO or UBU, which are widely used in machine learning and molecular dynamics.
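
For orientation, here is a minimal sketch of OBABO, one of the established splitting schemes the abstract uses as a point of comparison. It is not the paper's exact harmonic Langevin integrator, whose details the abstract does not give. The target (a one-dimensional standard Gaussian), the step size `h`, and the friction `gamma` are assumptions chosen for illustration.

```python
import math
import random

def obabo_step(x, v, h, gamma, grad_u, rng):
    """One OBABO step for dx = v dt, dv = -grad U(x) dt - gamma v dt + sqrt(2 gamma) dW."""
    eta = math.exp(-gamma * h / 2)          # exact OU decay over a half step
    sigma = math.sqrt(1.0 - eta * eta)
    v = eta * v + sigma * rng.gauss(0, 1)   # O: half-step velocity refresh
    v -= 0.5 * h * grad_u(x)                # B: half-step kick
    x += h * v                              # A: full-step drift
    v -= 0.5 * h * grad_u(x)                # B: half-step kick
    v = eta * v + sigma * rng.gauss(0, 1)   # O: half-step velocity refresh
    return x, v

def grad_u(x):
    return x  # U(x) = x^2 / 2, i.e. a standard Gaussian target (assumed)

rng = random.Random(2)
h, gamma = 0.2, 1.0
finals = []
for _ in range(4000):                       # independent chains
    x, v = 3.0, 0.0                         # deliberately far from equilibrium
    for _ in range(200):
        x, v = obabo_step(x, v, h, gamma, grad_u, rng)
    finals.append(x)

m = sum(finals) / len(finals)
var = sum((x - m) ** 2 for x in finals) / (len(finals) - 1)
```

The final positions should have mean near 0 and variance near 1, up to Monte Carlo error and a small step-size bias. Step-size bias of this kind is what non-asymptotic error bounds quantify.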

2605.24056 2026-05-26 stat.AP

Possession-Level Player Impact in the Pre-Play-by-Play NBA Era: A Video-Reconstructed RAPM Database, 1984--1996

控球级别球员影响力在NBA赛前数据时代:基于视频重建的RAPM数据库,1984-1996

Justin Jacobs

AI总结 通过手动从广播视频重建比赛数据,构建了1984-1996赛季首个控球级别球员影响力数据库,并采用加权岭回归估计RAPM,为历史分析提供了基础。

详情
AI中文摘要

正则化调整正负值(RAPM)是评估篮球运动员个人影响力的标准框架。其应用需要控球级别的轮换(stint)数据,即记录每个连续控球序列中场上是哪五名球员,而NBA直到20世纪90年代末才开始系统记录这种数据。本文描述了首个面向逐回合(play-by-play)记录出现之前的NBA时代的控球级别球员影响力数据库的构建、方法和验证,覆盖1984-85至1995-96赛季的常规赛,共12个已发布赛季。截至撰写时,已在这12个已发布赛季中重建了2,179场常规赛,包括总计435,760个记录的控球和1,012个不同的球员赛季。每场比赛均从转播视频手动重建:在每次死球换人时记录阵容变化,直接从录像统计控球次数,并记录每个阵容的得分。RAPM通过对重建的轮换数据应用加权岭回归来估计,所用数学框架与用于现代逐回合记录的框架完全相同。我们严格阐述了重建协议、估计过程的形式性质、基于后验可信区间的不确定性量化、多准则验证框架以及部分覆盖下的抽样性质分析。所得数据库是该时代唯一的控球级别个人影响力记录,为此前在技术上无法开展的历史分析提供了基础。

英文摘要

Regularized Adjusted Plus-Minus (RAPM) is the standard framework for estimating individual player impact in basketball. Its application requires possession-level stint data -- records of which five players shared the court for each contiguous sequence of possessions -- a form of data the NBA did not systematically record until the late 1990s. This paper describes the construction, methodology, and validation of the first possession-level player impact database for the pre-play-by-play NBA era, covering the regular seasons from 1984--85 through 1995--96, spanning twelve published seasons. As of this writing, 2,179 regular-season games have been reconstructed across twelve published seasons, comprising 435,760 total logged possessions and 1,012 distinct player-seasons. Every game was manually reconstructed from broadcast video: lineup changes were logged at every dead-ball substitution, possessions were tallied directly from footage, and points scored by each lineup were recorded. RAPM is estimated via weighted ridge regression applied to the reconstructed stint data, using the identical mathematical framework applied to modern play-by-play records. We provide a rigorous treatment of the reconstruction protocol, the formal properties of the estimation procedure, uncertainty quantification through posterior credible intervals, a multi-criterion validation framework, and an analysis of sampling properties at partial coverage. The resulting database is the only possession-level individual impact record for this era and provides a foundation for historical analysis that has until now been technically inaccessible.
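
The weighted ridge regression behind RAPM can be sketched on synthetic stint data. Everything specific here is an assumption for illustration: the +1/-1 encoding of the two five-player lineups, possessions used as weights, the penalty `lam`, and the simulated outcomes. None of it reproduces the paper's database or tuning.

```python
import numpy as np

def weighted_ridge(X, y, w, lam):
    """Solve argmin_b sum_i w_i (y_i - x_i.b)^2 + lam * ||b||^2 in closed form."""
    XtW = X.T * w                       # scales column i of X.T by w_i
    A = XtW @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, XtW @ y)

rng = np.random.default_rng(3)
n_players, n_stints = 20, 3000
beta_true = rng.normal(0.0, 2.0, n_players)
beta_true -= beta_true.mean()           # impacts are only identified up to a shift

X = np.zeros((n_stints, n_players))
for i in range(n_stints):
    on_court = rng.choice(n_players, size=10, replace=False)
    X[i, on_court[:5]] = 1.0            # five players on one side
    X[i, on_court[5:]] = -1.0           # five players on the other

poss = rng.integers(1, 30, n_stints).astype(float)      # possessions per stint
# Simulated scoring margin per stint; noise shrinks with more possessions.
y = X @ beta_true + rng.normal(0.0, 10.0, n_stints) / np.sqrt(poss)

beta_hat = weighted_ridge(X, y, poss, lam=50.0)
```

Because every row of `X` sums to zero, adding a constant to all player coefficients leaves the fit unchanged. The ridge penalty resolves that ambiguity by preferring the smallest-norm solution, which is one reason regularization is natural for this design.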

2605.24050 2026-05-26 cs.SE cs.AI stat.AP

More Skills, Worse Agents? Skill Shadowing Degrades Performance When Expanding Skill Libraries

更多技能,更差智能体?扩展技能库时技能遮蔽降低性能

Hongwen Song, Wei Song

AI总结 本文研究LLM智能体技能库扩展导致性能下降的现象,提出将性能下降分解为技能遮蔽和上下文开销两种效应,并通过实验证明技能遮蔽是主要瓶颈。

详情
AI中文摘要

技能库允许LLM智能体按需加载任务特定指令,使非专家用户能够通过自然语言解决领域特定任务,而无需知道存在哪些技能或它们如何工作。然而,随着技能库的增长,性能会下降:当从一小组有用技能扩展到包含202个技能的库时,性能下降高达21%。在这项工作中,我们将这种性能下降定义为加载已知有用技能库与加载完整技能库之间的通过率下降。此外,我们提出以技能调用(即智能体在轨迹中选择了哪些技能)为条件,将通过率下降分解为两种效应:技能遮蔽,即随着技能库扩展,智能体更频繁地选择错误技能;以及上下文开销,即即使选择正确,扩大的上下文也会降低执行性能。我们推导了这两种效应的上界,以刻画它们对通过率下降的影响程度。我们对效应及其上界的经验估计均表明,技能遮蔽效应随技能库大小增长,并对性能下降有显著贡献,而上下文开销效应仍然很小且与零无显著差异。这种观察到的非对称性表明,技能选择失败(而非上下文扩大)是扩展技能库时的主要瓶颈。

英文摘要

Skill libraries allow LLM agents to load task-specific instructions on demand, letting non-expert users solve domain-specific tasks through natural language without knowing which skills exist or how they work. However, performance degrades as libraries grow, by up to 21% when scaling from a small set of helpful skills to a 202-skill library. In this work, we formulate this performance degradation as the pass rate drop between loading a library of known-helpful skills and the full library. Moreover, we propose to decompose the pass rate drop by conditioning on skill invocation (which skills the agent selects during a trajectory) into two effects: skill shadowing, where the agent selects wrong skills more often as the library expands, and context overhead, where the enlarged context degrades execution even when selection is correct. We derive upper bounds on both effects to characterize how much each contributes to the pass rate drop. Our empirical estimates of the effects and their upper bounds both show that the skill shadowing effect grows with library size and significantly contributes to the performance degradation, whereas the context overhead effect remains small and indistinguishable from zero. This observed asymmetry establishes that skill selection failure, not the enlarged context, is the primary bottleneck when expanding skill libraries.
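
One way to see how conditioning on skill invocation splits a pass rate drop is the exact identity sketched below. It is an assumed decomposition written for illustration, not necessarily the one defined in the paper, and all probabilities are made-up numbers. The function `decompose_drop` and its argument names are hypothetical.

```python
def decompose_drop(sel_small, pass_ok_small, pass_bad_small,
                   sel_full, pass_ok_full, pass_bad_full):
    """Split the pass-rate drop into a selection term and an execution term.

    sel_*      : probability that the agent invokes the correct skill(s)
    pass_ok_*  : pass rate given a correct invocation
    pass_bad_* : pass rate given an incorrect invocation
    """
    p_small = sel_small * pass_ok_small + (1 - sel_small) * pass_bad_small
    p_full = sel_full * pass_ok_full + (1 - sel_full) * pass_bad_full
    # Selection term: change in selection accuracy, priced at small-library pass rates.
    shadowing = (sel_small - sel_full) * (pass_ok_small - pass_bad_small)
    # Execution term: change in conditional pass rates, weighted by full-library selection.
    overhead = (sel_full * (pass_ok_small - pass_ok_full)
                + (1 - sel_full) * (pass_bad_small - pass_bad_full))
    return p_small - p_full, shadowing, overhead

# Made-up numbers for illustration only.
drop, shadowing, overhead = decompose_drop(0.95, 0.80, 0.20, 0.70, 0.78, 0.20)
```

The two terms always add up to the total drop, so their relative size shows whether selection or execution accounts for most of the degradation in a given setting.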

2605.24003 2026-05-26 cs.CV cs.AI stat.AP

Remote sensing data imputation using deep learning for multispectral imagery

基于深度学习的多光谱遥感数据插补

Shuang Liu, Fiona Johnson, Rohitash Chandra

AI总结 针对云覆盖导致的光学卫星数据缺失问题,本研究比较了线性插值与多种深度学习模型(CNN、Inception Resnet、Autoencoder及其与LSTM的组合)在四个有藻华历史记录的湖泊中重建缺失光谱波段的效果,发现深度学习模型显著优于基线方法,其中CNN表现最佳,且基于插补图像的藻华指数与观测数据吻合良好。

详情
AI中文摘要

近年来,遥感技术在水体应用中得到越来越多的利用。使用光学卫星数据的一个常见挑战是由于云覆盖导致的观测缺失。这些数据缺口可能导致错过对水资源管理部门高度关注的湖泊中关键事件(如藻华)的检测。因此,提高光学卫星数据集的完整性对于改善藻华的监测和预测至关重要。在本研究中,我们比较了传统数据插补方法(即线性插值)与深度学习模型在四个有藻华历史记录的湖泊中重建缺失光谱波段的效果。采用的深度学习模型包括基于CNN的架构(即CNN、Inception Resnet和Autoencoder)以及基于CNN-LSTM的架构(即CNN-LSTM、Resnet-LSTM和Autoencoder-LSTM)。我们的结果表明,在人工掩膜区域内插补光谱波段值时,深度学习模型显著优于基线线性插值方法。在这些模型中,CNN在大多数湖泊中表现最佳。此外,我们通过将插补图像与观测数据进行比较,评估了基于插补图像的藻华指数(即Green/Red和NDCI)的性能。我们的结果表明,深度学习模型对于插补PlanetScope SuperDove影像中的缺失数据是有效的,从而能够实现更可靠的水体监测应用。

英文摘要

Remote sensing techniques have been increasingly utilised in aquatic applications in recent years. A common challenge in using optical satellite data is the presence of missing observations due to cloud cover. These data gaps can lead to missed detection of critical events, such as algal blooms, in lakes of high interest to water authorities. As a result, enhancing the completeness of optical satellite datasets is crucial for improving the monitoring and prediction of algal blooms. In this study, we compared a traditional data imputation method (i.e., linear interpolation) with deep learning models for reconstructing missing spectral bands across four lakes with historical records of algal blooms. The deep learning models adopted include CNN-based architectures (i.e., CNN, Inception Resnet, and Autoencoder) and CNN-LSTM-based architectures (i.e., CNN-LSTM, Resnet-LSTM, and Autoencoder-LSTM). Our results demonstrated that deep learning models substantially outperformed the baseline linear interpolation method in imputing spectral band values within artificially masked regions. Among these models, CNN delivered the best performance across most lakes. Furthermore, we evaluated the performance of algal bloom indices (i.e., Green/Red and NDCI) derived from the imputed imagery by comparing them with the observed data. Our results demonstrate that deep learning models are effective for imputing missing data in PlanetScope SuperDove imagery, enabling more reliable applications in water monitoring.
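
The baseline and the two indices named in the abstract are simple enough to sketch directly. The example assumes equally spaced acquisition dates, a per-pixel time series with cloud-masked values stored as NaN, and the usual NDCI form (red edge minus red, over their sum). The reflectance values are invented, and the function names are hypothetical.

```python
import math

def interpolate_gaps(series):
    """Linearly interpolate NaN gaps in a per-pixel time series (interior gaps only)."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if not math.isnan(v)]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = (1 - t) * series[left] + t * series[right]
    return filled

def ndci(red_edge, red):
    """Normalized Difference Chlorophyll Index from red-edge and red reflectance."""
    return (red_edge - red) / (red_edge + red)

def green_red_ratio(green, red):
    """Green/Red band ratio."""
    return green / red

nan = float("nan")
red_edge = interpolate_gaps([0.10, nan, nan, 0.16])   # cloud-masked dates are NaN
red = interpolate_gaps([0.08, nan, 0.08, 0.08])
index = [ndci(re, r) for re, r in zip(red_edge, red)]
```

Linear interpolation can only draw a straight line between clear observations, so a short-lived bloom that falls entirely inside a cloudy gap is invisible to it. That limitation is the motivation for comparing it against learned imputation models.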

2605.22800 2026-05-26 cs.LG cs.AI stat.ML

The Matching Principle: A Geometric Theory of Loss Functions for Nuisance-Robust Representation Learning

匹配原则:面向干扰鲁棒表示学习的损失函数几何理论

Vishal Rajput

AI总结 提出匹配原则,通过估计任务协方差矩阵并匹配惩罚矩阵的像空间,统一了多种鲁棒性方法,并在线性高斯模型中证明最优性。

详情
Comments
58 pages, 13 pre-specified empirical blocks. v2: partial-pass framing, geometry-task dissociation, T2B protocol v3, layout/figure fixes; core theorems unchanged. Code: matching-pmh (PyPI). Related note: arXiv:2604.21395
AI中文摘要

鲁棒性、领域自适应、光度/遮挡不变性、传感器漂移和对齐风格被视为独立的文献领域,拥有各自独立的方法族。在标签保持的部署偏移下,它们共享一个几何对象:协方差 Sigma_task = Cov_{Q_n}(n),即输入在标签不变的情况下可以变化的方式。CORAL、对抗训练、数据增强、度量学习、雅可比惩罚和对齐约束并非独立的技巧——它们都是 Sigma_task 的估计量。固定该对象后,雅可比惩罚由一个矩阵 Sigma' 确定,其像空间必须覆盖 range(Sigma_task)——即匹配原则。我们在线性高斯模型中证明了最优性(定理A),证明了任何能够消除部署漂移的二次惩罚都需要像空间覆盖(定理G),并在全局最小值处证明了相同的二分性(定理A*_global)。错误方向/信号对齐控制(引理C;推论E/E*)以及七个估计量(引理D1-D7),加上无标签TDI,为需要学习 Sigma_task 的情况提供了可证伪的配方。在十三个模块(从ML到Qwen2.5-7B)上,测试了匹配的、各向同性的和错误方向的惩罚对几何和部署漂移的影响。其中十二个模块与可识别性成立的理论一致;Office-31是一个命名的特征间隙失败案例。部分通过:几何可以在不改善每个头条任务指标的情况下提升。一次初步的7B DPO运行(一个epoch,240对):匹配风格-PMH保持了风格TDI,而标准DPO则使其退化。我们不声称标准训练达到全局最小值(假设(O)是开放的),不声称估计的 Sigma_task 总是可识别的,也不声称在每个排行榜上占优。我们提出一个可证伪的设计配方:估计 Sigma_task,匹配 Sigma',运行控制,分别报告任务和几何指标。

英文摘要

Robustness, domain adaptation, photometric/occlusion invariance, sensor drift, and alignment style are treated as separate literatures with separate method families. Under label-preserving deployment shift they share one geometric object: the covariance Sigma_task = Cov_{Q_n}(n) of ways inputs can change without changing the label. CORAL, adversarial training, augmentation, metric learning, Jacobian penalties, and alignment constraints are not independent tricks--they are estimators of Sigma_task. Fix that object and the Jacobian penalty is pinned by a matrix Sigma' whose range must cover range(Sigma_task)--the matching principle. We prove optimality in a linear-Gaussian model (Thm. A), necessity of range coverage for any quadratic penalty that zeros deployment drift (Thm. G), and the same dichotomy at global minima (Thm. A*_global). Wrong-direction/signal-aligned controls (Lemma C; Cor. E/E*) and seven estimators (Lemmas D1--D7), plus label-free TDI, yield a falsifiable recipe when Sigma_task must be learned. Thirteen blocks (ML through Qwen2.5-7B) test matched vs isotropic vs wrong-direction penalties on geometry and deployment drift. Twelve match theory where identifiability holds; Office-31 is a named eigengap failure. Partial passes: geometry can improve without every headline task metric moving. A pilot 7B DPO run (one epoch, 240 pairs): matched style-PMH preserves Style TDI where standard DPO degrades it. We do not claim standard training reaches global minima (assumption (O) is open), that estimated Sigma_task is always identifiable, or dominance on every leaderboard. We claim a falsifiable design recipe: estimate Sigma_task, match Sigma', run the controls, report task and geometry separately.

2605.10430 2026-05-26 cs.LG cs.AI stat.ML

Real vs. Semi-Simulated: Rethinking Evaluation for Treatment Effect Estimation

真实 vs. 半模拟:重新思考治疗效果估计的评估

George Panagopoulos

AI总结 通过大规模实证研究,比较了半模拟基准和真实数据集上使用反事实指标与可观测指标评估治疗效果估计模型的效果,揭示了两种评估体系之间的差距,并发现简单元学习器与强基础模型结合具有竞争力。

详情
AI中文摘要

利用机器学习估计异质性治疗效果在学术研究和工业实践中都引起了广泛关注。然而,这两个领域通常在不同条件下评估模型。方法论工作通常依赖于半模拟基准和需要反事实结果的指标,而实际应用则依赖于基于排名或测试结果的可观测指标。尽管方法论进展与实际部署之间存在众所周知的差距,但这些评估体系之间的关系尚未得到系统研究。我们对标准半模拟基准系列和真实数据集上的治疗效果评估进行了大规模实证研究。我们的基准涵盖了与多个基础学习器配对的元学习器,以及专门的因果机器学习模型。我们使用应用导向文献中常见的可观测指标以及方法论文中常用的反事实指标来评估这些方法。我们的结果揭示了两个互补的差距。首先,即使在相同的半模拟基准上,反事实指标也不能可靠地恢复可观测指标偏好的估计器。其次,在半模拟基准上获得的排名不能迁移到真实数据集。我们还发现,具有强大基础模型的简单元学习器始终具有竞争力,这与专门的因果模型形成对比。总体而言,我们的发现表明,治疗效果估计研究的进展不应仅通过反事实指标和半模拟基准来评估,而应结合可观测指标和真实数据验证。

英文摘要

Estimating heterogeneous treatment effects with machine learning has attracted substantial attention in both academic research and industrial practice. However, the two communities often evaluate models under markedly different conditions. Methodological work typically relies on semi-simulated benchmarks and metrics that require counterfactual outcomes, whereas real-world applications rely on observable metrics based on ranking or test outcomes. Despite the well-known gap between methodological progress and practical deployment, the relationship between these evaluation regimes has not been examined systematically. We conduct a large-scale empirical study of treatment effect evaluation across standard semi-simulated benchmark families and real-world datasets. Our benchmark covers meta-learners paired with multiple base learners, as well as specialized causal machine learning models. We evaluate these methods using observable metrics common in application-oriented literature, alongside counterfactual metrics commonly used in methods papers. Our results reveal two complementary gaps. First, counterfactual metrics do not reliably recover the estimators preferred by observable metrics, even on the same semi-simulated benchmarks. Second, rankings obtained on semi-simulated benchmarks do not transfer to real datasets. We further find that simple meta-learners with strong base models are consistently competitive, in contrast to specialized causal models. Overall, our findings suggest that progress in treatment effect estimation research should not be assessed solely through counterfactual metrics and semi-simulated benchmarks, but it would benefit from incorporating observable metrics and real-data validation.
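
The two evaluation regimes contrasted in the abstract can be put side by side in a small simulation. This sketch assumes a randomized treatment, one covariate, and a T-learner with linear base models; the true effect `tau` exists only because the data are simulated. The observable metric shown (difference in mean outcomes among the units ranked in the top half) is one simple choice, not necessarily a metric used in the paper.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

rng = random.Random(4)
n = 5000
x = [rng.uniform(-1, 1) for _ in range(n)]
t = [rng.random() < 0.5 for _ in range(n)]                 # randomized treatment
tau = [1.0 + 2.0 * xi for xi in x]                         # true effect, known only in simulation
y = [xi + ti * taui + rng.gauss(0, 0.5) for xi, ti, taui in zip(x, t, tau)]

# T-learner: one outcome model per arm, effect = difference of predictions.
a1, b1 = fit_line([xi for xi, ti in zip(x, t) if ti], [yi for yi, ti in zip(y, t) if ti])
a0, b0 = fit_line([xi for xi, ti in zip(x, t) if not ti], [yi for yi, ti in zip(y, t) if not ti])
tau_hat = [(a1 + b1 * xi) - (a0 + b0 * xi) for xi in x]

# Counterfactual metric: root PEHE needs the true effect tau.
pehe = math.sqrt(sum((th - tr) ** 2 for th, tr in zip(tau_hat, tau)) / n)

# Observable metric: treated-minus-control mean outcome among the units
# the model ranks in its top half; uses only (x, t, y).
cut = sorted(tau_hat)[n // 2]
top = [(ti, yi) for th, ti, yi in zip(tau_hat, t, y) if th >= cut]
uplift_top = (sum(yi for ti, yi in top if ti) / sum(1 for ti, _ in top if ti)
              - sum(yi for ti, yi in top if not ti) / sum(1 for ti, _ in top if not ti))
```

Only `uplift_top` could be computed on a real dataset, since `pehe` requires counterfactual knowledge. The abstract's finding is that rankings of estimators under the two kinds of metric need not agree.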

2605.07107 2026-05-26 cs.IT math.IT math.ST stat.ML stat.TH

Sub-Gaussian Concentration and Entropic Normality of the Maximum Likelihood Estimator

最大似然估计的次高斯集中与熵正态性

Leighton P. Barnes, Alex Dytso

AI总结 本文在标准正则条件下,通过建立归一化最大似然估计的次高斯尾界、所有矩收敛以及熵中心极限定理,强化了经典渐近正态性结果。

详情
AI中文摘要

众所周知,在标准正则条件下,最大似然估计满足中心极限定理,并随着样本量的增长依分布收敛于高斯随机变量。本文通过发展归一化最大似然估计的几种更强形式的渐近正态性,强化了这一经典结果。在得分函数上附加假设后,我们首先建立了归一化估计误差的次高斯尾界和所有矩的收敛性。然后,我们证明了估计量平滑版本的熵中心极限定理,显示其相对熵收敛到极限高斯分布。当归一化估计的Fisher信息有界或其密度具有有界一阶导数时,我们进一步表明可以去除平滑,从而得到最大似然估计本身的熵正态性。证明过程中开发了辅助工具,可能具有独立意义,包括指数一致性界、高阶矩估计以及估计量的熵控制论证。

英文摘要

It is well known that, under standard regularity conditions, the maximum likelihood estimator (MLE) satisfies a central limit theorem and converges in distribution to a Gaussian random variable as the sample size grows. This paper strengthens this classical result by developing several stronger forms of asymptotic normality for the normalized MLE. With additional assumptions on the score, we first establish sub-Gaussian tail bounds and convergence of all moments for the normalized estimation error. We then prove an entropic central limit theorem for a smoothed version of the estimator, showing convergence in relative entropy to the limiting Gaussian law. When the Fisher information of the normalized estimate is bounded, or its density has bounded first derivative, we further show that the smoothing can be removed, yielding entropic normality of the MLE itself. The proofs develop auxiliary tools that may be of independent interest, including exponential consistency bounds, high-moment estimates, and entropy-control arguments for the estimator.
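
The classical statement that the paper strengthens can be checked numerically for a simple model. The sketch assumes i.i.d. exponential data with rate `theta`, for which the MLE is the reciprocal of the sample mean and the Fisher information is 1/theta^2. The sample size and number of replications are arbitrary choices.

```python
import math
import random

rng = random.Random(5)
theta, n, reps = 2.0, 400, 4000

errors = []
for _ in range(reps):
    xs = [rng.expovariate(theta) for _ in range(n)]
    theta_hat = n / sum(xs)                      # MLE of the exponential rate
    errors.append(math.sqrt(n) * (theta_hat - theta))

mean_err = sum(errors) / reps
var_err = sum((e - mean_err) ** 2 for e in errors) / (reps - 1)
limit_var = theta ** 2                           # inverse Fisher information, 1 / I(theta)
# Empirical frequency of errors beyond three limiting standard deviations.
tail_freq = sum(abs(e) > 3 * theta for e in errors) / reps
```

The variance of the normalized error should be close to `limit_var`, and large deviations should be rare. Sub-Gaussian tail bounds of the kind described in the abstract make the second observation quantitative at finite sample sizes.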

2604.22948 2026-05-26 cs.LG stat.CO stat.ML

Score-Repellent Monte Carlo: Toward Efficient Non-Markovian Sampler with Constant Memory in General State Spaces

分数排斥蒙特卡洛:面向一般状态空间中具有恒定内存的高效非马尔可夫采样器

Jie Hu, Lingyun Chen, Geeho Kim, Jinyoung Choi, Bohyung Han, Do Young Eun

AI总结 提出分数排斥蒙特卡洛(SRMC)框架,通过分数评估的运行平均值总结轨迹历史,利用指数分数倾斜构建替代目标,实现恒定内存下的非马尔可夫采样,降低渐近方差并改善模式覆盖。

详情
Comments
Accepted at ICML 2026 (Spotlight); GitHub Repo: https://github.com/srmc-project/Score-Repellent-Monte-Carlo
AI中文摘要

历史依赖采样可以通过阻止冗余重访来降低长期蒙特卡洛方差,但现有方案通常通过有限状态空间上的经验度量编码历史,这在高维离散配置空间中不可行或在连续域中不适定。我们提出分数排斥蒙特卡洛(SRMC)框架,该框架通过 $\mathbb{R}^d$ 中分数评估的运行平均值总结轨迹历史,其中 $d$ 是分数和状态表示的维度。该历史通过指数分数倾斜转换为替代目标,以 $α$ 为索引,表示排斥强度,控制基于历史的排斥幅度。替代族在标准MCMC意义上是无需归一化的,从而产生一个通用包装器:在每次迭代中,任何针对 $π$ 的基础核都可以在当前替代 $π_{θ_n}$ 上运行,同时在线更新历史。我们使用带有受控马尔可夫噪声的随机逼近分析历史递归和蒙特卡洛估计器的耦合演化,建立了几乎必然收敛和联合中心极限定理。我们进一步确定了渐近协方差随 $α$ 增加而减小的区域,缩放比例为 $O(1/α)$,将有限状态历史依赖采样器的近零方差效应扩展到具有恒定内存的一般状态空间。在连续目标和离散能量基模型上的实验表明,估计器方差和模式覆盖得到改善,同时保持 $O(d)$ 内存使用和适度的每次迭代开销。

英文摘要

History-dependent sampling can reduce long-run Monte Carlo variance by discouraging redundant revisits, but existing schemes typically encode history through empirical measure on finite state spaces, which is infeasible in high-dimensional discrete configuration spaces or ill-posed in continuous domains. We propose Score-Repellent Monte Carlo (SRMC) framework that summarizes trajectory history by a running average of score evaluations in $\mathbb{R}^d$, where $d$ is the dimension of the score and state representation. This history is converted into a surrogate target through an exponential score tilt, indexed with $α$ that represents the strength of repellence in controlling the magnitude of the history-based repulsion. The surrogate family is normalization-free in the standard MCMC sense, yielding a generic wrapper: at each iteration, any base kernel targeting $π$ can instead be run on the current surrogate $π_{θ_n}$ while the history is updated online. We analyze the coupled evolution of the history recursion and Monte Carlo estimators using stochastic approximation with controlled Markovian noise, establishing almost sure convergence and a joint central limit theorem. We further identify regimes in which the asymptotic covariance decreases as $α$ increases, with scaling $O(1/α)$, extending the near-zero-variance effect of finite-state history-dependent samplers to general state spaces with constant memory. Experiments on continuous targets and discrete energy-based models demonstrate improved estimator variance and mode coverage, while retaining $O(d)$ memory usage and modest per-iteration overhead.

2603.14561 2026-05-26 stat.ME math.ST stat.TH

Variance Inference Beyond the Sandwich for Asymptotically Linear Estimators with Second-Order Remainders

具有二阶余项的渐近线性估计量的方差推断:超越Sandwich方法

Lin Li, Pengcheng Wu

AI总结 针对半参数估计中二阶余项贡献不可忽略方差的情况,提出有限样本方差分解,并证明留一法Jackknife和配对聚类Bootstrap可估计总方差,通过阶梯楔形聚类随机试验验证改进覆盖。

详情
Comments
15 main text pages, 1 supplement
AI中文摘要

具有von Mises展开的半参数估计量通常将推断简化为影响函数方差。当二阶余项在方差中可忽略时,这种简化是合理的,该条件比保证经典渐近线性的通常乘积率要求更强。当余项贡献不可忽略的方差时,标准sandwich会低估总抽样方差,Wald区间会覆盖不足;我们称此为近边界情形。我们推导了分离影响函数和余项分量的有限样本方差分解,给出了sandwich方差何时失效的实用刻画,并证明了在明确的正则条件下,留一法jackknife和配对聚类bootstrap可以估计总方差。对于jackknife,一致性来自自归一化论证;对于bootstrap,我们在Mallows-2一致性条件下工作。针对聚类数据,推导了sandwich差距被组内相关放大的解析表达式。使用阶梯楔形聚类随机试验中代理辅助目标学习估计量的模拟研究说明了该情形:方差比$\hat{V}_{\rm JK}/\hat{V}_{\rm Sand}$为1.14--1.38,且在不同聚类数量下持续存在,改进后的程序显著提高了覆盖度。

英文摘要

Semiparametric estimators admitting a von Mises expansion often reduce inference to the influence-function variance. This reduction is justified when the second-order remainder is negligible in variance, a condition that is stronger than the usual product-rate requirement guaranteeing classical asymptotic linearity. When the remainder contributes non-negligible variance, the standard sandwich can underestimate the total sampling variance and Wald intervals can undercover; we call this the near-boundary regime. We derive a finite-sample variance decomposition separating influence-function and remainder components, give a practical characterization of when sandwich variance can fail, and show that the leave-one-out jackknife and pairs cluster bootstrap can estimate the total variance under explicit regularity conditions. For the jackknife, consistency follows from a self-normalization argument; for the bootstrap, we work under a Mallows-2 consistency condition. An analytic expression for the amplification of the sandwich gap by intra-cluster correlation is derived for clustered data. A simulation study using a surrogate-assisted targeted learning estimator in stepped-wedge cluster-randomized trials illustrates the regime: the variance ratio $\hat{V}_{\rm JK}/\hat{V}_{\rm Sand}$ is 1.14--1.38 and persistent across cluster counts, and the refined procedures substantially improve coverage.
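
The leave-one-out jackknife mentioned in the abstract has a compact generic form, sketched here at the cluster level. The estimator plugged in (a plain sample mean) and the toy data are assumptions for illustration; the paper applies the idea to a targeted learning estimator, which is not reproduced here.

```python
def jackknife_variance(clusters, estimator):
    """Leave-one-cluster-out jackknife variance of `estimator`.

    clusters  : list of lists, one inner list of observations per cluster
    estimator : function mapping a flat list of observations to a number
    """
    g = len(clusters)
    leave_out = []
    for k in range(g):
        kept = [obs for j, c in enumerate(clusters) if j != k for obs in c]
        leave_out.append(estimator(kept))
    centre = sum(leave_out) / g
    return (g - 1) / g * sum((v - centre) ** 2 for v in leave_out)

def sample_mean(values):
    return sum(values) / len(values)

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
singletons = [[v] for v in data]                 # each observation is its own cluster
v_jk = jackknife_variance(singletons, sample_mean)
```

For the sample mean with singleton clusters, the jackknife reproduces the textbook variance estimate s^2/n exactly. Because the jackknife recomputes the whole estimator on each reduced sample, it also reflects variability that a plug-in influence-function formula leaves out.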

2603.03245 2026-05-26 math.PR math.ST stat.TH

Testing the mixture model hypothesis via spectral gap

通过谱间隙检验混合模型假设

March T. Boedihardjo, Joe Kileel, Vandy Tombs

AI总结 本文通过引入四阶矩算子的特征值,研究如何检验给定概率测度能否分解为两个二阶统计量显著不同的概率测度的混合,并给出了非渐近界和渐近版本的完整解。

详情
AI中文摘要

本文研究检验给定概率测度 $μ$ 在 $\mathbb{R}^{d}$ 上是否可以分解为两个概率测度的混合,且这两个测度的二阶统计量显著不同的问题。我们称之为检验混合模型假设问题。为了解决这个问题,我们引入了一组新的可计算的正交不变量,即与测度相关的四阶矩算子 $T_μ$ 的特征值。我们证明最大特征值始终是一个离群特征值。此外,我们展示了 $T_μ$ 的第一和第二大特征值如何为该问题提供非渐近界,并在 $L^{8}$-$L^{2}$ 等价假设下给出了该问题渐近版本的完整解。

英文摘要

In this paper, we study the problem of testing whether or not a given probability measure $μ$ on $\mathbb{R}^{d}$ can be decomposed as a mixture of two probability measures whose second order statistics are significantly different. We call this the problem of testing the mixture model hypothesis. To tackle it, we introduce a new set of computable orthogonal invariants of $μ$, namely, the eigenvalues of the 4th moment operator $T_μ$ associated with the measure. We prove that the largest eigenvalue is always an outlier eigenvalue. Further, we show how the first and second largest eigenvalues of $T_μ$ give nonasymptotic bounds for this problem and give a complete resolution of the asymptotic version of the problem under the $L^{8}$-$L^{2}$ equivalence assumption.

2602.16376 2026-05-26 econ.EM stat.AP

Two-way Clustering Robust Variance Estimator in Quantile Regression Models

分位数回归模型中的双向聚类稳健方差估计量

Ulrich Hounyo, Jiahao Lin

AI总结 针对双向聚类数据,提出一种基于核密度估计和投影分解的双向聚类稳健三明治方差估计量,并证明其在高斯机制下的一致性及推断有效性。

详情
AI中文摘要

我们研究线性分位数回归中双向聚类数据的推断。利用单独可交换数组框架和分位数得分的投影分解,我们刻画了依赖于机制的收敛速度,并建立了自归一化高斯逼近。我们提出了一种双向聚类稳健三明治方差估计量,其中包含基于核密度的“面包”和与投影匹配的“肉”,并证明了在高斯机制下推断的一致性和有效性。我们还展示了在非高斯交互机制下统一推断的不可能性结果。

英文摘要

We study inference for linear quantile regression with two-way clustered data. Using a separately exchangeable array framework and a projection decomposition of the quantile score, we characterize regime-dependent convergence rates and establish a self-normalized Gaussian approximation. We propose a two-way cluster-robust sandwich variance estimator with a kernel-based density "bread" and a projection-matched "meat", and prove consistency and validity of inference in Gaussian regimes. We also show an impossibility result for uniform inference in a non-Gaussian interaction regime.

2602.10538 2026-05-26 stat.ML cs.LG

Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models

为什么智能体定理证明器有效:数学推理模型的统计可证明性理论

Sho Sonoda, Shunta Akiyama, Yuya Uezato

AI总结 本文通过统计可证明性理论,将形式化证明搜索建模为有限视界可达性MDP,分析了智能体定理证明器中各组件对有限预算下证明成功率的影响,并给出了成功率差距的误差界。

详情
Comments
Accepted at ICML 2026
AI中文摘要

智能体定理证明器结合了推理模型、检索、搜索和证明助手验证器,但目前尚不清楚哪些组件实际上提高了有限预算下的证明成功率,以及它们为何在真实数学工作负载上有效。我们通过统计可证明性来研究这个问题:在指定定理实例流上,在预算内达到已验证证明的概率。我们将形式化证明搜索建模为具有确定性验证器动态的有限视界可达性MDP,并表明在忠实状态抽象下,最优成功概率与普通句法可证明性一致。然后我们分析了一个简单但实际重要的流程:按深度进行离线动作值回归,随后在测试时贪婪地证明。我们的主要定理用一致动作值误差的占用加权和来界定学习证明器与最优证明器之间的可证明性差距;在常见的一致误差解读中,主要复杂度乘子是学习证明器的平均截断证明长度。误差分解为逼近误差、训练分布的几何覆盖和蒙特卡洛标签噪声,并在动作间隔边界条件下以快速速率改进。该结果给出了一个组件敏感的解释,说明为什么验证器反馈、检索、表示几何和证明缩短机制在有偏的定理工作负载上有帮助,而不与经典的最坏情况困难性相矛盾。

英文摘要

Agentic theorem provers combine a reasoning model, retrieval, search, and a proof assistant verifier, yet it remains unclear which components actually improve finite-budget proof success and why they help on real mathematical workloads. We study this question through statistical provability: the probability of reaching a verified proof within a budget on a specified stream of theorem instances. We model formal proof search as a finite-horizon reachability MDP with deterministic verifier dynamics, and show that under a faithful state abstraction the optimal success probability coincides with ordinary syntactic provability. We then analyze a simple but practically important pipeline: depth-wise offline action-value regression followed by greedy test-time proving. Our main theorem bounds the provability gap between the learned prover and the optimal prover by an occupancy-weighted sum of uniform action-value errors; in the common uniform-error reading, the leading complexity multiplier is the learned prover's average truncated proof length. The error decomposes into approximation error, geometric coverage of the training distribution, and Monte Carlo label noise, and improves to a fast rate under an action-gap margin condition. The result gives a component-sensitive account of why verifier feedback, retrieval, representation geometry, and proof-shortening mechanisms help on biased theorem workloads, without contradicting classical worst-case hardness.
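
The deterministic-dynamics claim in this abstract can be illustrated with a minimal sketch. It assumes a toy proof-state graph; the state names, tactic names, and the `optimal_success` helper are hypothetical and are not taken from the paper. With deterministic transitions, the optimal finite-budget success value is 1 when a terminal "QED" state is reachable within the budget and 0 otherwise.

```python
from functools import lru_cache

# Hypothetical toy proof-state graph: state -> {tactic: next_state}.
# Transitions are deterministic, mimicking a proof-assistant verifier.
TRANSITIONS = {
    "goal0": {"intro": "goal1", "simp": "stuck"},
    "goal1": {"apply_lemma": "goal2", "simp": "stuck"},
    "goal2": {"rfl": "QED"},
    "stuck": {},
    "QED": {},
}


def optimal_success(state, budget):
    """Optimal probability (0.0 or 1.0) of reaching QED within `budget` steps."""

    @lru_cache(maxsize=None)
    def value(s, h):
        if s == "QED":
            return 1.0
        if h == 0:
            return 0.0
        # Bellman backup for a reachability objective: best action wins.
        successors = TRANSITIONS[s].values()
        return max((value(nxt, h - 1) for nxt in successors), default=0.0)

    return value(state, budget)
```

In this toy graph the shortest proof from `goal0` takes three tactic applications, so a budget of two fails and a budget of three succeeds. A learned prover would replace the exact `max` with a greedy choice under estimated action values, which is where the gap analyzed in the abstract would arise.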

2601.07044 2026-05-26 stat.ME

Semiparametric Analysis of Interval-Censored Data Subject to Inaccurate Diagnoses with A Terminal Event

带有终端事件的区间删失数据在诊断不准确情况下的半参数分析

Yuhao Deng, Donglin Zeng, Yuanjia Wang

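AI summary: For interval-censored data with inaccurate diagnoses and a terminal event, proposes a semiparametric framework based on the Cox proportional hazards model, using nonparametric maximum likelihood estimation and an EM algorithm, and proves asymptotic normality of the estimators.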

详情
Comments
This paper has been accepted by the Annals of Applied Statistics
AI中文摘要

区间删失常见于慢性病研究中,疾病状态通过间歇收集的生物标志物推断。尽管已有许多方法分析此类数据,但它们通常假设疾病诊断完美,而在实践中,由于认知功能的临床诊断固有缺陷或脑脊液等生物标志物的测量误差,这一假设往往不成立。本文引入一个使用Cox比例风险模型的半参数建模框架,以处理存在不准确疾病诊断的区间删失数据。我们的模型结合了诊断的敏感性和特异性,以解释区间是否真正包含疾病发病的不确定性。此外,该框架适用于涉及终端事件以及诊断准确(例如通过尸检分析)的场景。我们提出一种用于推断的非参数最大似然估计方法,并开发了一种高效的EM算法以确保计算可行性。回归系数估计量被证明是渐近正态的,达到了半参数效率界。我们通过广泛的模拟研究和评估阿尔茨海默病(AD)风险的应用进一步验证了该方法。我们发现β-淀粉样蛋白与AD显著相关,而Tau蛋白对AD和死亡率均有预测作用。

英文摘要

Interval-censoring frequently occurs in studies of chronic diseases where disease status is inferred from intermittently collected biomarkers. Although many methods have been developed to analyze such data, they typically assume perfect disease diagnosis, which often does not hold in practice due to the inherently imperfect clinical diagnosis of cognitive functions or measurement errors of biomarkers such as cerebrospinal fluid. In this work, we introduce a semiparametric modeling framework using the Cox proportional hazards model to address interval-censored data in the presence of inaccurate disease diagnosis. Our model incorporates sensitivity and specificity of the diagnosis to account for uncertainty in whether the interval truly contains the disease onset. Furthermore, the framework accommodates scenarios involving a terminal event and scenarios in which the diagnosis is accurate, such as through postmortem analysis. We propose a nonparametric maximum likelihood estimation method for inference and develop an efficient EM algorithm to ensure computational feasibility. The regression coefficient estimators are shown to be asymptotically normal, achieving semiparametric efficiency bounds. We further validate our approach through extensive simulation studies and an application assessing Alzheimer's disease (AD) risk. We find that amyloid-beta is significantly associated with AD, while Tau is predictive of both AD and mortality.

2512.17064 2026-05-26 cs.CE math.ST stat.CO stat.TH

Flux-Preserving Adaptive Finite State Projection for Multiscale Stochastic Reaction Networks

多尺度随机反应网络的保通量自适应有限状态投影

Aditya Dendukuri, Shivkumar Chandrasekaran, Linda Petzold

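AI summary: Proposes an adaptive finite state projection method based on probability flux that uses flux-driven state-space pruning and time-step selection to handle bottleneck states and stiff dynamics in multiscale stochastic reaction networks.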

详情
AI中文摘要

有限状态投影(FSP)方法通过将动力学限制在(通常是无限的)状态空间的一个有限子集上来近似化学主方程(CME),从而能够直接数值求解并具有可计算的误差界。自适应变体随时间更新该子集,但反应速率相差悬殊的多尺度系统仍然具有挑战性,因为低概率瓶颈状态可能携带重要的概率通量,并且动力学在快速瞬态和缓慢演化的刚性状态之间交替。我们提出了一种基于通量的自适应FSP方法,该方法使用概率通量来驱动状态空间剪枝和时间步长选择。剪枝规则保护具有大出射通量的低概率状态,从而保持瓶颈系统中的连通性,而时间步长规则则根据瞬时总通量进行调整,以处理跨越多个数量级的速率常数。在刚性、振荡和瓶颈反应网络上的数值实验表明,该方法在保持精度的同时使用了显著更小的状态空间。

英文摘要

The Finite State Projection (FSP) method approximates the Chemical Master Equation (CME) by restricting the dynamics to a finite subset of the (typically infinite) state space, enabling direct numerical solution with computable error bounds. Adaptive variants update this subset in time, but multiscale systems with widely separated reaction rates remain challenging, as low-probability bottleneck states can carry essential probability flux and the dynamics alternate between fast transients and slowly evolving stiff regimes. We propose a flux-based adaptive FSP method that uses probability flux to drive both state-space pruning and time-step selection. The pruning rule protects low-probability states with large outgoing flux, preserving connectivity in bottleneck systems, while the time-step rule adapts to the instantaneous total flux to handle rate constants spanning several orders of magnitude. Numerical experiments on stiff, oscillatory, and bottleneck reaction networks show that the method maintains accuracy while using substantially smaller state spaces.
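
As a rough illustration of the plain (non-adaptive) FSP idea that this abstract builds on, the sketch below truncates a birth-death process to a finite set of states and reports the probability mass that has leaked out of the retained set as an error indicator. Everything here is an assumption made for illustration: the birth-death model, the explicit Euler time stepping, and the function name `fsp_birth_death`. It does not implement the flux-based pruning or time-step rules proposed in the paper.

```python
def fsp_birth_death(k, g, n_states, t_end, dt=1e-3):
    """Truncated master equation for production rate k and degradation rate g*n.

    Returns the probability vector over states 0..n_states-1 and the mass lost
    to the truncated states, which serves as the FSP-style error indicator.
    Explicit Euler is used for simplicity; it needs dt * (k + g * n_states) < 1.
    """
    p = [0.0] * n_states
    p[0] = 1.0  # start with zero molecules
    steps = int(round(t_end / dt))
    for _ in range(steps):
        new = p[:]
        for n in range(n_states):
            out_rate = k + g * n  # total propensity of leaving state n
            new[n] -= dt * out_rate * p[n]
            if n + 1 < n_states:
                new[n + 1] += dt * k * p[n]  # birth stays inside the projection
            # otherwise the birth flux leaves the retained set and is lost
            if n > 0:
                new[n - 1] += dt * g * n * p[n]  # degradation to state n-1
        p = new
    return p, 1.0 - sum(p)
```

Calling `fsp_birth_death(5.0, 1.0, 30, 2.0)` keeps nearly all of the mass, whereas a projection with only 5 states loses a visible fraction. That lost mass is the quantity an adaptive scheme would monitor when deciding whether to grow or prune the state set.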

2512.12398 2026-05-26 stat.ME stat.CO

Scalable Spatial Stream Network (S3N) Models

可扩展空间河流网络(S3N)模型

Jessica P. Kunke, Julian D. Olden, Tyler H. McCormick

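AI summary: Proposes the S3N model, which extends nearest-neighbor Gaussian processes and streamlines preprocessing to estimate spatial and covariance parameters 2-3 orders of magnitude faster than existing models; applied to estimating the population sizes and geographic distributions of 285 fish species in the Ohio River Basin.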

详情
AI中文摘要

理解栖息地如何塑造河流网络中物种分布和丰度仍然是生态学中一个长期且基本的挑战,对有效的生物多样性管理和保护具有直接影响。我们引入了一种可扩展的空间河流网络(S3N)模型,该模型能够以比以往更高的计算效率进行估计、推断和预测。S3N将近邻高斯过程(NNGP)扩展到包含生态上显著的河流网络依赖结构。此外,S3N实现了比SSN更高效的预处理;虽然估计的计算成本是观测点数量的函数,而不是河段数量的函数,但预处理则是两者的函数。我们证明,S3N准确恢复空间和协方差参数的速度比现有空间河流网络模型快2-3个数量级。然后,我们应用S3N在一台笔记本电脑上估算了整个俄亥俄河流域(>4000河流公里,约170,000个河段和9,000个观测点)285种鱼类的种群规模和地理分布。这些结果表明S3N在绘制淡水变量图以及量化环境驱动因素对广泛、复杂且具有许多观测点的河流网络的影响方面具有前景。

英文摘要

Understanding how habitats shape species distributions and abundances across river networks remains a longstanding and fundamental challenge in ecology, with direct implications for effective biodiversity management and conservation. We introduce a scalable spatial stream network (S3N) model that enables estimation, inference, and prediction with greater computational efficiency than previously possible. S3Ns extend nearest-neighbor Gaussian processes (NNGPs) to include ecologically salient stream network dependence structure. Additionally, S3Ns implement more efficient preprocessing than SSNs; while the computational cost of estimation is a function of the number of observation points and not of the number of reaches, the preprocessing is a function of both. We demonstrate that S3Ns accurately recover spatial and covariance parameters 2-3 orders of magnitude faster than existing spatial stream network models. We then apply S3Ns to estimate the population sizes and geographic distributions of 285 fish species in the entire Ohio River Basin (>4,000 river km, approximately 170,000 reaches and 9,000 observation points) on a laptop. These results indicate the promise of S3Ns for mapping freshwater variables and quantifying the influence of environmental drivers across extensive, complex river networks with many observation points.

2509.12783 2026-05-26 q-bio.NC cs.LG math.DS stat.ML

Fast reconstruction of degenerate populations of conductance-based neuron models from spike times

基于电导的神经元模型退化群体的尖峰时间快速重建

Julien Brandoit, Damien Ernst, Guillaume Drion, Arthur Fyon

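AI summary: Combines deep learning with Dynamic Input Conductance theory to rapidly reconstruct degenerate populations of high-dimensional conductance-based models from spike times, giving accurate, robust, and scalable inference.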

详情
Journal ref
PLOS Computational Biology 22(5): e1014337 (2026)
AI中文摘要

从实验可获取的记录中推断电导模型(CBMs)的生物物理参数仍然是计算神经科学的一个核心挑战。尖峰时间是最广泛可用的数据,但它们很少揭示哪些离子通道电导组合产生了观察到的活动。这一逆问题因神经元退化而进一步复杂化,其中多个不同的电导集产生相似的尖峰模式。我们引入了一种方法,通过将深度学习与动态输入电导(DICs)相结合来解决这一挑战,DICs是一个理论框架,将复杂的CBMs简化为三个可解释的反馈组件,控制兴奋性和尖峰模式。我们的方法首先使用一个神经网络将尖峰时间映射到阈值处的DIC密度,该网络学习神经元活动的低维表示。然后,预测的DIC值通过迭代补偿算法用于生成退化的CBM群体,确保与中间目标DIC兼容,从而再现相应的尖峰模式,即使在高维模型中也是如此。应用于两个模型,该算法流程以高精度和鲁棒性重建尖峰和爆发模式,包括在模拟生理随机性的噪声电流注入下生成的尖峰序列。它在标准硬件上毫秒级内产生多样的退化群体,实现了仅从尖峰记录进行可扩展且高效的推断。总之,这项工作将DICs定位为实验观察活动与机制模型之间的实用且可解释的桥梁。通过实现直接从尖峰时间快速且可扩展地重建退化群体,我们的方法提供了一种强大的方式来研究神经元如何利用电导变异性实现可靠计算。

英文摘要

Inferring the biophysical parameters of conductance-based models (CBMs) from experimentally accessible recordings remains a central challenge in computational neuroscience. Spike times are the most widely available data, yet they reveal little about which combinations of ion channel conductances generate the observed activity. This inverse problem is further complicated by neuronal degeneracy, where multiple distinct conductance sets yield similar spiking patterns. We introduce a method that addresses this challenge by combining deep learning with Dynamic Input Conductances (DICs), a theoretical framework that reduces complex CBMs to three interpretable feedback components governing excitability and firing patterns. Our approach first maps spike times to DIC densities at threshold using a neural network that learns a low-dimensional representation of neuronal activity. The predicted DIC values are then used to generate degenerate CBM populations via an iterative compensation algorithm, ensuring compatibility with the intermediate target DICs, and thereby reproducing the corresponding firing patterns, even in high-dimensional models. Applied to two models, this algorithmic pipeline reconstructs spiking and bursting regimes with high accuracy and robustness to variability, including spike trains generated under noisy current injection mimicking physiological stochasticity. It produces diverse degenerate populations within milliseconds on standard hardware, enabling scalable and efficient inference from spike recordings alone. Together, this work positions DICs as a practical and interpretable link between experimentally observed activity and mechanistic models. By enabling fast and scalable reconstruction of degenerate populations directly from spike times, our approach provides a powerful way to investigate how neurons exploit conductance variability to achieve reliable computation.

2509.11379 2026-05-26 stat.ML cs.LG math.ST stat.TH

Some Robustness Properties of Label Cleaning

标签清理的一些鲁棒性性质

Chen Cheng, John Duchi

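AI summary: Shows that learning procedures relying on aggregated labels (e.g., label information distilled from noisy responses) enjoy robustness properties that are impossible without data cleaning, as seen in risk consistency and in convergence under model misspecification.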

详情
Comments
41 pages, 3 figures. Accepted to Transactions on Machine Learning Research (TMLR)
AI中文摘要

我们证明,依赖聚合标签(例如从噪声响应中提炼的标签信息)的学习过程具有不经数据清理便无法实现的鲁棒性。这种鲁棒性以多种方式体现。在风险一致性的背景下——当采用机器学习中标准的做法,即最小化替代(通常是凸的)损失函数来代替期望的任务损失(如0-1误分类误差)时——使用标签聚合的过程获得了比使用原始标签更强的相合性保证。而在经典统计场景中,拟合完全正确指定的模型表明,纳入所有可能信息(即对标签不确定性建模)在统计上是有效的,但一旦要最小化的损失函数有轻微误设,“标准”方法就会失去相合性。然而,利用聚合信息的过程仍然收敛到最优分类器,这突显了纳入更全面的数据分析流程(从数据收集到模型拟合再到预测时间)如何通过精炼噪声信号来产生更鲁棒的方法论。

英文摘要

We demonstrate that learning procedures that rely on aggregated labels, e.g., label information distilled from noisy responses, enjoy robustness properties impossible without data cleaning. This robustness appears in several ways. In the context of risk consistency -- when one takes the standard approach in machine learning of minimizing a surrogate (typically convex) loss in place of a desired task loss (such as the zero-one mis-classification error) -- procedures using label aggregation obtain stronger consistency guarantees than those even possible using raw labels. And while classical statistical scenarios of fitting perfectly-specified models suggest that incorporating all possible information -- modeling uncertainty in labels -- is statistically efficient, consistency fails for "standard" approaches as soon as a loss to be minimized is even slightly mis-specified. Yet procedures leveraging aggregated information still converge to optimal classifiers, highlighting how incorporating a fuller view of the data analysis pipeline, from collection to model-fitting to prediction time, can yield a more robust methodology by refining noisy signals.

2506.00181 2026-05-26 cs.LG stat.ML

On the Interaction of Batch Noise, Adaptivity, and Compression, under $(L_0,L_1)$-Smoothness: An SDE Approach

关于批噪声、自适应性和压缩在$(L_0,L_1)$-光滑性下的相互作用:一种SDE方法

Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi, Antonio Orvieto, Eduard Gorbunov

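AI summary: Using a stochastic differential equation (SDE) framework, this paper gives a unified analysis of distributed compressed SGD and its sign variant under $(L_0,L_1)$-smoothness, characterizes the interaction of gradient noise, communication compression, and adaptive updates, and proposes new SDE models that capture the relationship between learning-rate restrictions and loss-landscape geometry.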

详情
Comments
Accepted at ICML 2026 (Poster)
AI中文摘要

分布式随机优化交织了(i)随机梯度噪声、(ii)通信压缩和(iii)自适应/归一化更新。虽然每个因素已被单独研究,但在现实假设下它们的联合效应仍然知之甚少。在这项工作中,我们在最近引入的$(L_0, L_1)$-光滑性条件下,为分布式压缩SGD(DCSGD)及其符号变体分布式符号SGD(DSignSGD)开发了一个统一的理论框架。从概念角度,我们表明文献中的一阶和二阶修正方程不能准确建模离散时间步长/稳定性限制,特别是在$(L_0,L_1)$-光滑性下。从技术角度,我们通过将曲率相关项仔细纳入其漂移中,提出了新的一阶SDE:这有助于捕捉学习率限制、梯度噪声、压缩和损失景观几何之间的细粒度关系。重要的是,我们在一般梯度噪声假设下进行,包括重尾和仿射方差区域,这超出了经典的有限方差设置。我们的结果表明,归一化DCSGD的更新作为稳定性的自然条件出现,归一化程度由梯度噪声结构、景观正则性和压缩率精确决定。相比之下,DSignSGD即使在重尾噪声下也能以标准学习率调度收敛。这些发现共同提供了新的理论见解和视角,以及实践指导。

英文摘要

Distributed stochastic optimization intertwines (i) stochastic gradient noise, (ii) communication compression, and (iii) adaptive/normalized updates. While each factor has been studied in isolation, their joint effect under realistic assumptions remains poorly understood. In this work, we develop a unified theoretical framework for Distributed Compressed SGD (DCSGD) and its sign variant Distributed SignSGD (DSignSGD) under the recently introduced $(L_0, L_1)$-smoothness condition. From a conceptual perspective, we show that the first- and second-order modified equations from the literature do not accurately model the discrete-time step-size/stability restrictions, especially under $(L_0,L_1)$-smoothness. From a technical perspective, we propose new first-order SDEs by carefully incorporating curvature-dependent terms into their drift: This helps capture the fine-grained relationship between learning rate restrictions, gradient noise, compression, and the geometry of the loss landscape. Importantly, we do so under general gradient noise assumptions, including heavy-tailed and affine-variance regimes, which extend beyond the classical bounded-variance setting. Our results suggest that normalizing the updates of DCSGD emerges as a natural condition for stability, with the degree of normalization precisely determined by the gradient noise structure, the landscape's regularity, and the compression rate. In contrast, DSignSGD converges even under heavy-tailed noise with standard learning rate schedules. Together, these findings offer both new theoretical insights and perspectives, and practical guidance.

2405.07026 2026-05-26 stat.ME

Selective Randomization Inference for Adaptive Experiments

自适应实验的选择性随机化推断

Tobias Freidling, Qingyuan Zhao, Zijun Gao

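AI summary: To address the difficulty of statistical inference for adaptive experiments, proposes selective randomization inference, which applies conditional post-selection inference to randomization tests and controls the selective type-I error without modelling assumptions.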

详情
AI中文摘要

自适应实验利用数据的初步分析来指导后续行动,广泛应用于医学和社会科学等多个学科。由于零假设和实验设计依赖于数据,长期以来人们认识到自适应实验的统计推断并不直接。大多数现有方法仅适用于特定的自适应设计,并依赖强假设。在这项工作中,我们提出选择性随机化推断作为分析自适应实验的通用框架。简而言之,我们的方法将条件后选择推断应用于随机化检验。通过使用有向无环图描述数据生成过程,我们推导出选择性随机化p值,控制选择性第一类错误。由于推断仅依赖于处理分配的随机性,不需要建模假设或独立同分布数据。我们详细阐述了使所提出的p值可计算的条件,并提供了拒绝抽样和MCMC算法来寻找蒙特卡洛近似。此外,本文展示了如何估计和构建同质处理效应的置信区间。最后,我们使用合成数据和真实数据演示了我们的方法,并与其他随机化检验进行了比较。

英文摘要

Adaptive experiments use preliminary analyses of the data to inform the further course of action and are commonly used in many disciplines, including the medical and social sciences. Because the null hypothesis and experimental design are data-dependent, it has long been recognized that statistical inference for adaptive experiments is not straightforward. Most existing methods only apply to specific adaptive designs and rely on strong assumptions. In this work, we propose selective randomization inference as a general framework for analysing adaptive experiments. In a nutshell, our approach applies conditional post-selection inference to randomization tests. By using directed acyclic graphs to describe the data generating process, we derive a selective randomization p-value that controls the selective type-I error. As inference only relies on the randomness in the treatment assignment, no modelling assumptions or independent and identically distributed data are needed. We elaborate on conditions that render the proposed p-value computable and provide rejection sampling and MCMC algorithms to find a Monte Carlo approximation. Moreover, this article shows how to estimate and construct confidence intervals for a homogeneous treatment effect. Lastly, we demonstrate our method and compare it with other randomization tests using synthetic and real-world data.
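
The rejection-sampling idea mentioned in this abstract can be sketched for one simple, hypothetical two-stage design. The assumptions are: treatment is assigned by independent fair coin flips, stage 1 selects the subgroup ("A" or "B") with the larger estimated effect, stage 2 enrolls only the selected subgroup, and the sharp null of no effect holds so that outcomes stay fixed under re-randomization. All function names are illustrative and not from the paper.

```python
import random


def diff_in_means(y, z):
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    if not treated or not control:
        return 0.0  # convention for degenerate assignments (an assumption)
    return sum(treated) / len(treated) - sum(control) / len(control)


def select_subgroup(y1, z1, g1):
    """Stage-1 selection rule: pick the subgroup with the larger estimate."""
    est = {}
    for g in ("A", "B"):
        idx = [i for i, gi in enumerate(g1) if gi == g]
        est[g] = diff_in_means([y1[i] for i in idx], [z1[i] for i in idx])
    return "A" if est["A"] >= est["B"] else "B"


def statistic(y1, z1, g1, y2, z2, chosen):
    """Pooled difference in means within the selected subgroup."""
    idx = [i for i, gi in enumerate(g1) if gi == chosen]
    y = [y1[i] for i in idx] + list(y2)
    z = [z1[i] for i in idx] + list(z2)
    return diff_in_means(y, z)


def selective_randomization_pvalue(y1, z1, g1, y2, z2, draws=2000, seed=0):
    rng = random.Random(seed)
    chosen = select_subgroup(y1, z1, g1)
    t_obs = statistic(y1, z1, g1, y2, z2, chosen)
    accepted = exceed = 0
    for _ in range(draws):
        z1_star = [rng.randint(0, 1) for _ in z1]
        if select_subgroup(y1, z1_star, g1) != chosen:
            continue  # rejection step: condition on the observed selection
        z2_star = [rng.randint(0, 1) for _ in z2]
        accepted += 1
        exceed += statistic(y1, z1_star, g1, y2, z2_star, chosen) >= t_obs
    # The +1 terms count the observed assignment itself.
    return (1 + exceed) / (1 + accepted)
```

Re-randomized assignments that would have led to a different subgroup being selected are discarded, so the reference distribution is conditional on the selection event. An unconditional randomization test would keep every draw and ignore the data-dependent choice of hypothesis.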

2401.14684 2026-05-26 stat.ME

Inference for Cumulative Incidences and Treatment Effects in Randomized Controlled Trials with Time-to-Event Outcomes under ICH E9 (R1)

ICH E9 (R1) 下随机对照试验中时间至事件结局的累积发生率和治疗效应的推断

Yuhao Deng, Shasha Han, Xiao-Hua Zhou

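AI summary: For intercurrent events in randomized controlled trials with time-to-event outcomes, derives mathematical formulations of the causal estimands under the five ICH E9 (R1) strategies, proposes nonparametric estimation and inference methods, and illustrates them on the LEADER trial.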

详情
Comments
Accepted by Statistics in Medicine
AI中文摘要

在关注时间至事件结局的随机对照试验中,并发事件可能以两种方式出现:作为半竞争事件(改变主要结局事件的风险)或作为竞争事件(使主要结局事件的定义不明确)。尽管ICH E9 (R1)增补文件中提出了五种策略来处理随机对照试验中的并发事件,但这些策略在追求因果解释时不易应用于时间至事件结局。在本研究中,我们展示了如何定义、估计和推断这些背景下具有因果解释的目标。具体来说,我们推导了对应于五种策略的因果估计量的数学公式,并阐明了识别这些因果估计量所需的数据结构。此外,我们介绍了用于估计和推断这些因果估计量的非参数方法,包括估计量的渐近方差和假设检验。最后,我们使用LEADER试验的数据说明了我们的方法,该试验旨在研究利拉鲁肽对心血管结局的影响。

英文摘要

In randomized controlled trials (RCTs) that focus on time-to-event outcomes, intercurrent events can arise in two ways: as semi-competing events, which modify the hazard of the primary outcome events, or as competing events, which make the definition of the primary outcome events unclear. Although five strategies have been proposed in the ICH E9 (R1) addendum to address intercurrent events in RCTs, these strategies are not easily applicable to time-to-event outcomes when aiming for causal interpretations. In this study, we show how to define, estimate, and make inferences concerning objectives that have causal interpretations within these contexts. Specifically, we derive the mathematical formulations of the causal estimands corresponding to the five strategies and clarify the data structure needed to identify these causal estimands. Furthermore, we introduce nonparametric methods for estimating and making inferences about these causal estimands, including the asymptotic variance of estimators and hypothesis tests. Finally, we illustrate our methods using data from the LEADER Trial, which aims to investigate the effect of liraglutide on cardiovascular outcomes.

2302.03089 2026-05-26 stat.AP astro-ph.GA

Statistical methods for partitioning ribbon and globally-distributed flux using data from the Interstellar Boundary Explorer

利用星际边界探测器数据划分带状和全球分布通量的统计方法

Lauren J. Beesley, Dave Osthus, Kelly R. Moran, Madeline A. Stricklin, Grant David Meadors, Thomas K. Kim, Sung Jun Noh, Nehpreet K. Walia, Paul H. Janzen, Eric J. Zirnstein, Brian P. Weaver, Daniel B. Reisenfeld

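AI summary: Proposes statistical algorithms that separate total ENA intensity maps into ribbon and globally-distributed flux maps and estimate the associated uncertainty, with greater model flexibility and improved uncertainty propagation.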

详情
AI中文摘要

NASA的星际边界探测器(IBEX)卫星收集高能中性原子(ENA)数据,可揭示太阳系与星际空间之间的日光层边界。利用这些数据,科学家可以构建所有方向观测到的ENA强度(通常以通量表示)图。这些图中观测到的ENA通量被认为来自至少两个不同的源:一个源表现为带状集中ENA通量,另一个源(或可能多个)产生平滑变化的全球分布通量。每个ENA源类型及其对应的ENA强度图具有独立的科学意义。本文开发了统计算法,将总ENA强度图分离为两个特定源图(带状和全球分布通量),并估计相应不确定性。所提方法的主要优势包括增强模型灵活性和改进估计不确定性的传播。我们在模拟数据上评估了所提方法,这些数据旨在模拟真实数据设置。我们还提出了估计天空中近椭圆带状中心的新方法,未来可用于研究局部星际磁场的位置和变化。

英文摘要

NASA's Interstellar Boundary Explorer (IBEX) satellite collects data on energetic neutral atoms (ENAs) that can provide insight into the heliosphere boundary between our solar system and interstellar space. Using these data, scientists can construct maps of the ENA intensities (often expressed in terms of flux) observed in all directions. The ENA flux observed in these maps is believed to come from at least two distinct sources: one source that manifests as a ribbon of concentrated ENA flux and one source (or possibly several) that results in a smoothly-varying globally-distributed flux. Each ENA source type and its corresponding ENA intensity map is of separate scientific interest. In this paper, we develop statistical algorithms for separating the total ENA intensity maps into two source-specific maps (ribbon and globally-distributed flux) and estimating corresponding uncertainty. Key advantages of the proposed method include enhanced model flexibility and improved propagation of estimation uncertainty. We evaluate the proposed methods on simulated data designed to mimic realistic data settings. We also propose new methods for estimating the center of the near-elliptical ribbon in the sky, which can be used in the future to study the location and variation of the local interstellar magnetic field.

2604.05639 2026-05-26 stat.ME

Estimating Dynamic Marginal Policy Effects under Sequential Unconfoundedness

估计顺序无混淆下的动态边际政策效应

I-han Lai, Stefan Wager

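AI summary: Shows that, under sequential unconfoundedness, dynamic marginal policy effects can be identified via tractable reduced-form expressions, and develops a doubly robust estimator that avoids the exponential curse of horizon.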

详情
Comments
Fix typos
AI中文摘要

我们开发了估计动态系统中无穷小政策变化如何影响长期结果的方法。我们证明动态边际政策效应(MPEs)可以通过可处理的简化形式表达式识别,并可以在一般的顺序无混淆假设下进行估计。我们还提出了动态MPEs的双重稳健估计器。我们的方法不需要观察完整的动态状态信息(如马尔可夫决策过程中的离策略评估通常假设的那样),也不会遭受指数级的时间跨度诅咒(如非马尔可夫离策略评估中常见的那样)。我们在多个模拟中展示了我们方法的实用性和稳健性,包括一个由动态定价应用驱动的模拟,其中人们使用过去的价格来形成当前价格的参考水平。

英文摘要

We develop methods for estimating how infinitesimal policy changes affect long-term outcomes in dynamic systems. We show that dynamic marginal policy effects (MPEs) can be identified via tractable reduced-form expressions, and can be estimated under a general sequential unconfoundedness assumption. We also propose a doubly robust estimator for dynamic MPEs. Our approach does not require observing full dynamic state information (as is typically assumed for off-policy evaluation in Markov decision processes), and does not incur an exponential curse of horizon (as is typical in non-Markovian off-policy evaluation). We demonstrate practicality and robustness of our approach in a number of simulations, including one motivated by a dynamic pricing application where people use past prices to form a reference level for current prices.

2604.10845 2026-05-26 stat.ME econ.EM

Learning Preferences from Conjoint Data: A Structural Deep Learning Approach

从联合数据中学习偏好:一种结构深度学习方法

Avidit Acharya, Jens Hainmueller, Yiqing Xu

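AI summary: Proposes a structural approach that embeds a deep neural network in a random utility logit model to estimate preference parameters flexibly, uses double/debiased machine learning for valid inference, and uncovers preference heterogeneity hidden in conjoint experiments.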

详情
AI中文摘要

联合实验随机化多维轮廓,为恢复结构偏好参数(包括边际替代率、支付意愿以及偏好总体分布)提供了有力设计。然而,政治学中的主流方法侧重于非参数因果估计量,未充分利用这一潜力。我们提出一种结构方法,将深度神经网络嵌入随机效用Logit模型中,使偏好参数作为受访者特征的完全灵活函数而变化。神经网络解决了参数设定可能无法捕捉真实数据生成过程的担忧,而双重/去偏机器学习则提供了对平均偏好参数的有效推断。我们将该方法应用于三项著名的联合研究,发现了被简约形式平均值掩盖的丰富偏好异质性:近乎为零的性别效应与83%偏好女性候选人并存,对非民主行为的反对近乎普遍但强度差异显著,累进税偏好跨越每个党派子群体。

英文摘要

Conjoint experiments randomize multidimensional profiles, offering a powerful design for recovering structural preference parameters -- including marginal rates of substitution, willingness to pay, and the distribution of preferences across a population. Yet the dominant approach in political science has focused on nonparametric causal estimands that do not leverage this potential. We propose a structural approach that embeds a deep neural network within a random utility logit model, allowing preference parameters to vary as a fully flexible function of respondent characteristics. The neural network addresses the concern that a parametric specification may not capture the true data generating process, while double/debiased machine learning provides valid inference on average preference parameters. We apply our method to three prominent conjoint studies and find rich preference heterogeneity masked by reduced-form averages: a near-zero gender effect coexists with 83% preferring female candidates, opposition to undemocratic behavior is near-universal but varies sharply in intensity, and progressive tax preferences cut across every partisan subgroup.
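
For readers unfamiliar with the model class, a standard random utility logit with respondent-varying coefficients can be written as below. The notation is assumed for illustration and may differ from the paper's: $x_i$ denotes respondent characteristics, $z_{ij}$ the attribute vector of profile $j$, and $\beta(\cdot)$ the function that a neural network would parameterize.

```latex
% Notation assumed for illustration; not taken from the paper.
\[
U_{ij} = \beta(x_i)^\top z_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \ \text{i.i.d. type-I extreme value},
\]
\[
\Pr(y_i = j \mid x_i, z_i)
  = \frac{\exp\{\beta(x_i)^\top z_{ij}\}}
         {\sum_{k} \exp\{\beta(x_i)^\top z_{ik}\}},
\]
\[
\mathrm{MRS}_{a,b}(x_i) = \frac{\beta_a(x_i)}{\beta_b(x_i)},
\qquad
\bar{\beta} = \mathbb{E}\left[\beta(x_i)\right].
\]
```

Under this formulation, a marginal rate of substitution is a ratio of two coefficients for the same respondent, and the average parameter $\bar{\beta}$ is the kind of target for which double/debiased machine learning would supply inference.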

2510.07128 2026-05-26 stat.ME stat.ML

A General Framework for Joint Multi-State Models

联合多状态模型的一般框架

Félix Laplante, Christophe Ambroise

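AI summary: Proposes a general joint modeling framework that unifies longitudinal biomarkers with multi-state time-to-event processes on directed graphs, supports nonlinear submodels and inference via stochastic gradient descent, and enables dynamic prediction along complex trajectories.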

详情
Comments
34 pages, 12 figures
AI中文摘要

传统的联合建模方法通常描述纵向生物标志物与终端、复发或竞争风险设置中的离散事件发生之间的关系,从而对复杂的多状态轨迹提供有限的表示。我们提出了一个通用的多状态联合建模框架,将纵向生物标志物动态与定义在任意有向图上的多状态时间-事件过程统一起来。该框架还支持非线性纵向子模型和通过随机梯度下降的可扩展推断。该公式包含马尔可夫和半马尔可夫转移结构,使得循环和终端吸收能够自然表示。纵向和事件过程通过非线性混合效应模型中的共享潜在结构连接,扩展了经典的联合建模公式。我们推导了完整似然、模型选择标准,并开发了基于随机梯度下降的可扩展推断程序,以实现高维和大规模应用。此外,我们制定了一个动态预测框架,提供个体化的状态转移概率和沿着复杂事件轨迹的个性化风险评估。通过模拟和对PAQUID队列的应用,我们展示了准确的参数恢复和个体化预测。

英文摘要

Conventional joint modeling approaches generally characterize the relationship between longitudinal biomarkers and discrete event occurrences within terminal, recurring or competing risk settings, thereby offering a limited representation of complex, multi-state trajectories. We propose a general multi-state joint modeling framework that unifies longitudinal biomarker dynamics with multi-state time-to-event processes defined on arbitrary directed graphs. The proposed framework also accommodates nonlinear longitudinal submodels and scalable inference via stochastic gradient descent. This formulation encompasses both Markovian and semi-Markovian transition structures, allowing recurrent cycles and terminal absorptions to be naturally represented. The longitudinal and event processes are linked through shared latent structures within nonlinear mixed-effects models, extending classical joint modeling formulations. We derive the complete likelihood, model selection criteria, and develop scalable inference procedures based on stochastic gradient descent to enable high-dimensional and large-scale applications. In addition, we formulate a dynamic prediction framework that provides individualized state-transition probabilities and personalized risk assessments along complex event trajectories. Through simulation and application to the PAQUID cohort, we demonstrate accurate parameter recovery and individualized prediction.

2409.02416 2026-05-26 cs.LG stat.ML

Relative Translation Invariant Wasserstein Distance

相对平移不变Wasserstein距离

Binshuai Wang, Qiwei Di, Ming Yin, Mengdi Wang, Quanquan Gu, Peng Wei

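AI summary: Motivated by the Bures distance, introduces the relative translation invariant Wasserstein distance $RW_p$, proves that it is a metric, and designs a bi-level algorithm to compute $RW_p$ between discrete distributions; for $p=2$, proposes the $\mathrm{RW}_2$-LP and $\mathrm{RW}_2$-Sinkhorn algorithms to improve numerical stability. Experiments show reduced numerical error and applicability to retrieving similar thunderstorm patterns.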

详情
Comments
Accepted by Transactions on Machine Learning Research (TMLR). Final accepted version. The implementation is publicly available at https://github.com/DRKWang/rw_metric
AI中文摘要

受Bures距离启发,我们引入了一类新的距离族——\emph{相对平移不变Wasserstein距离},记为$RW_p$,作为经典Wasserstein距离$W_p$($p \in [1, +\infty)$)的推广。我们证明了$RW_p$定义了一个有效的度量,并表明这类度量比经典Wasserstein距离更具内在性。设计了一种双层算法来计算任意离散分布之间的一般$RW_p$距离。此外,当$p=2$时,我们证明在离散设定下最优耦合矩阵在分布平移下不变,并进一步提出了两种算法,即$\mathrm{RW}_2$-LP算法和$\mathrm{RW}_2$-Sinkhorn算法,以提高计算$W_2$距离和最优耦合矩阵解的数值稳定性。最后,我们进行了三个实验来验证我们的理论结果和算法。前两个实验报告了$\mathrm{RW}_2$-LP算法和$\mathrm{RW}_2$-Sinkhorn算法(无论是否归一化)相比标准算法能显著减少数值误差。第三个实验表明$RW_p$算法在计算上具有可扩展性,并适用于实际应用中相似雷暴模式的检索。

英文摘要

Motivated by the Bures distance, we introduce a new family of distances, \emph{relative translation invariant Wasserstein distances}, denoted by $RW_p$, as an extension of the classical Wasserstein distances $W_p$ for $p \in [1, +\infty)$. We establish that $RW_p$ defines a valid metric and demonstrate that this type of metric is more intrinsic than the classical Wasserstein distance. A bi-level algorithm is designed to compute the general $RW_p$ distance between arbitrary discrete distributions. Moreover, when $p = 2$, we show that the optimal coupling matrix is invariant under distributional translation in the discrete setting, and we further propose two algorithms, the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, to improve the numerical stability of computing $W_2$ distance and the optimal coupling matrix solutions. Finally, we conduct three experiments to validate our theoretical results and algorithms. The first two experiments report that the $\mathrm{RW}_2$-LP algorithm and the $\mathrm{RW}_2$-Sinkhorn algorithm, both with and without normalization, can significantly reduce the numerical errors compared to standard algorithms. The third experiment shows that $RW_p$ algorithms are computationally scalable and applicable to the retrieval of similar thunderstorm patterns in practical applications.
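
A one-dimensional sketch may help make the $p = 2$ case concrete. It assumes that $RW_2$ is the $W_2$ distance minimized over translations of one distribution, and that both inputs are equal-size samples with uniform weights. The function names are hypothetical, and this is not the paper's LP or Sinkhorn algorithm. In this setting the optimal shift aligns the two means, so the squared distance is the squared $W_2$ minus the squared mean difference.

```python
import math


def w2_squared_1d(xs, ys):
    """Squared W_2 between two equal-size empirical distributions on the line."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D with uniform weights, the optimal coupling pairs sorted samples.
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def rw2_1d(xs, ys):
    """Translation-minimized W_2 (assumed definition of RW_2 for this sketch)."""
    shift = sum(xs) / len(xs) - sum(ys) / len(ys)
    # Clamp at zero to guard against small negative values from rounding.
    return math.sqrt(max(w2_squared_1d(xs, ys) - shift ** 2, 0.0))
```

Shifting either sample by a constant leaves `rw2_1d` unchanged up to floating-point rounding, while `w2_squared_1d` grows with the shift. Subtracting two large, nearly equal numbers is itself a source of rounding error, which is the kind of numerical issue the abstract says the $\mathrm{RW}_2$ algorithms are designed to reduce.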