arXivDaily: daily arXiv digest, updated Monday through Friday
2605.21486 2026-05-21 cs.LG cond-mat.dis-nn cs.AI stat.ML

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

Dayal Singh Kalra, Maissam Barkeshli

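AI summary: This paper quantifies hyperparameter transfer using three metrics. It finds that the advantage of Maximal Update parameterization (μP) over standard parameterization (SP) when training with AdamW comes mainly from maximizing the embedding-layer learning rate, and that weight decay improves scaling-law fits but hurts extrapolation robustness in the fixed token-per-parameter setting.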

Comments
10+28 pages, 5+17 figures
Abstract

Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperparameters or by a judicious choice of parameterization, such as Maximal Update ($μ$P), that renders optimal hyperparameters approximately scale invariant. In this paper, we first develop a framework to quantify hyperparameter transfer through three metrics: (1) the quality of the scaling law fit, (2) the robustness to extrapolation errors, and (3) the asymptotic loss penalty due to choice of parameterization. Next, we investigate through a comprehensive series of ablations why $μ$P appears to offer high-quality learning rate transfer relative to standard parameterization (SP), as existing theory is inadequate. We find that the overwhelming benefit of $μ$P relative to SP when training with AdamW arises simply from maximizing the learning rate of the embedding layer. In SP, the embedding layer learning rate acts as a bottleneck that induces training instabilities; increasing it by a factor of width to match $μ$P dramatically smooths out training while improving hyperparameter transfer. We also find that weight decay improves the scaling law fits, while, in the fixed token-per-parameter setting, it hurts the robustness of the extrapolation.
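
The first route to hyperparameter transfer mentioned in the abstract, fitting a scaling law to the optimal hyperparameters, can be illustrated with a minimal sketch. It assumes the optimal learning rate follows a power law in model width and fits that law by least squares in log-log space. The function names and the sweep values are hypothetical and are not taken from the paper.

```python
import math

def fit_power_law(widths, best_lrs):
    """Least-squares fit of log(lr) = log(c) + b * log(width); returns (c, b)."""
    xs = [math.log(w) for w in widths]
    ys = [math.log(lr) for lr in best_lrs]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = math.exp(y_mean - b * x_mean)
    return c, b

def extrapolate_lr(c, b, width):
    """Predict the optimal learning rate at a width outside the sweep."""
    return c * width ** b

# Hypothetical sweep: the best learning rate halves each time the width doubles.
widths = [128, 256, 512, 1024]
best_lrs = [0.8 / w for w in widths]

c, b = fit_power_law(widths, best_lrs)
lr_4096 = extrapolate_lr(c, b, 4096)
```

With real sweeps the fit is noisy, which is why the abstract separates the quality of the fit from the robustness of the extrapolation.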

2605.21464 2026-05-21 stat.AP

Assessing the impact of tourist attractions through the integration of causal inference and demand-side economic analysis: A case study of the Sensoria experience museum in Holzminden, Germany

Thomas Wieland

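AI summary: Combining causal inference with demand-side economic analysis, this research note studies how the Sensoria experience museum, opened in September 2024 in Holzminden, Germany, affected local tourism demand and the related direct and indirect effects. In its first year of operation the museum accounts for 4,691 additional overnight stays and about 0.56 million EUR of additional gross turnover; long-term effects cannot yet be determined.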

Comments
v1.0.0
Abstract

This research note investigates the impact of the experience museum Sensoria, opened in September 2024 in Holzminden, Germany, on local tourism demand and related direct and indirect effects. To this end, the study employs a novel approach by combining causal inference and demand-side economic analysis. A difference-in-differences approach is employed to quantify the number of additional guest overnight stays in the treatment city; the results are converted into industry-specific expenditures, from which the direct and indirect effects of Sensoria are determined. A positive and significant impact which corresponds to 4,691 additional overnight stays can be detected in the first year of operation of the new tourist attraction, resulting in an additional gross turnover of approximately 0.56 million EUR across the hospitality and retail industries and other services. The direct effects and indirect effects amount to approximately 0.23 and 0.21 million EUR, respectively. However, long-term effects cannot (yet) be determined. Additionally, positive effects from small and large events in the cities studied can be demonstrated. This brief study demonstrates that combining the two approaches mentioned holds promise, yet requires a more in-depth analysis, for which suggestions are also discussed regarding how it could be conducted.
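
The difference-in-differences idea used in the abstract can be sketched in its simplest two-group, two-period form: the change in the treated city minus the change in the comparison cities. This is only an illustration under that simplifying assumption; the study's actual model specification is not given here, and all numbers below are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-by-two difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical monthly overnight stays before and after an attraction opens.
treated_pre = [1000, 1100, 1050]
treated_post = [1500, 1600, 1550]
control_pre = [2000, 2100, 2050]
control_post = [2100, 2200, 2150]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
```

Subtracting the control change removes the part of the increase that would have occurred without the new attraction, provided the two groups would otherwise have followed parallel trends.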

2605.21458 2026-05-21 cs.AI cs.LG stat.ME

Mind the Sim-to-Real Gap & Think Like a Scientist

Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

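AI summary: This paper studies when and how a planner should supplement a pre-trained simulator with real experiments to close the value gap between simulation and reality. It proposes Fisher-SEP, a simulation-aided experimental policy, and illustrates it with two case studies.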

Abstract

Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration--deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely passive learning. Third, we propose Fisher-SEP, a simulation-aided experimental policy (SEP) that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only specializations. Two case studies illustrate the regimes. In a vending-machine supply chain, front-loaded experimentation overtakes posterior updating once the horizon is long enough to amortize the pilot. In an HIV mobile-testing example with a corridor that separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled region.

2605.21437 2026-05-21 physics.geo-ph cs.LG stat.ML

Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment

Alim Igilik

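AI summary: This paper proposes a neural negative binomial model for weekly seismicity forecasting that estimates an overdispersion parameter per grid cell, replacing the standard Poisson assumption with a single global dispersion. It uses quantiles of the predicted distribution for tail risk assessment and improves extreme-event forecasts.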

Comments
28 pages, 9 figures. Source code available at https://github.com/Al1mkaYandere/seismic-probabilistic-modeling
Abstract

Standard approaches to forecasting the weekly number of earthquakes on a spatial grid rely on the Poisson distribution with a single global dispersion assumption. We show that this assumption is systematically violated in seismic data from Central Asia (2010-2024), where a likelihood-ratio test with boundary correction strongly rejects the Poisson hypothesis (p < 10^{-179}). The main contribution of this work is the EarthquakeNet architecture, which provides an endogenous per-cell estimate of the overdispersion parameter alpha via a neural network (spatial embeddings + MLP), without explicit spatial covariance specification. In contrast to existing negative binomial regression approaches in seismological forecasting, which typically assume a single global alpha, the proposed per-cell formulation allows the model to identify spatial heterogeneity in seismic clustering and to construct probabilistic risk-aware alerts via quantiles of the predicted distribution. A walk-forward evaluation (2018-2023) over four systems shows an 8.6 percent reduction in mean pinball deviation (MPD) relative to a negative binomial GLM baseline. The strongest improvements are observed in the tail regime (Y >= 5), where the continuous ranked probability score (CRPS) of the proposed model is 12.5 percent lower than that of the baseline, indicating improved calibration in extreme-event forecasting.
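
To make the role of the overdispersion parameter concrete, the sketch below evaluates a negative binomial distribution with mean `mu` and dispersion `alpha`, reads off a quantile that could serve as an alert threshold, and computes the pinball loss for one quantile forecast. It assumes the common NB2 parameterization, in which the variance is `mu + alpha * mu**2`; the abstract does not say which parameterization EarthquakeNet uses, and the function names and example values are hypothetical.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    p = r / (r + mu)
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log1p(-p))
    return math.exp(log_pmf)

def nb_quantile(q, mu, alpha, max_count=10_000):
    """Smallest count whose cumulative probability reaches q."""
    cdf = 0.0
    for y in range(max_count + 1):
        cdf += nb_pmf(y, mu, alpha)
        if cdf >= q:
            return y
    return max_count

def pinball_loss(y_true, y_quantile, q):
    """Quantile (pinball) loss for a single forecast."""
    diff = y_true - y_quantile
    return q * diff if diff >= 0 else (q - 1) * diff

# Two hypothetical grid cells with the same mean rate but different dispersion.
q_low_dispersion = nb_quantile(0.95, mu=2.0, alpha=0.1)
q_high_dispersion = nb_quantile(0.95, mu=2.0, alpha=2.0)
```

Two cells with identical expected counts can have different upper quantiles, which is the kind of spatial heterogeneity a single global `alpha` cannot represent.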

2605.21416 2026-05-21 math.ST stat.ME stat.TH

Data driven extreme value distribution estimation: Derivation of the Mean Integrated Squared Error, optimal bandwidth selection and stability conditions

Michael Sandbichler, Tobias Hell

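AI summary: This paper introduces the data driven extreme value distribution (DDEVD) estimator, derives its mean integrated squared error (MISE), uses it to compute the optimal bandwidth, and establishes stability conditions for the bandwidth optimization procedure.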

Comments
37 pages, 5 figures
Abstract

We introduce the data driven extreme value distribution (DDEVD) estimator, a kernel-based method for estimating extreme value distributions from data. We derive its mean integrated squared error (MISE) in detail, use it to compute the optimal bandwidth and establish stability conditions for the bandwidth optimization procedure.

2605.21408 2026-05-21 stat.ME

TCARD: Nearly Balanced Two-Level Designs with Treatment Cardinality Constraints with an Application to LLM Prompt Engineering

Kexin Xie, Ryan Lekivetz, Xinwei Deng

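AI summary: This paper studies nearly balanced two-level designs under a treatment cardinality constraint. It proposes a model-free objective, the Balanced Concurrence Deviation, that jointly penalizes replication imbalance and concurrence dispersion, and reports numerical experiments in which the method compares favorably with existing alternatives across problem sizes and constraint strengths.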

Abstract

Modern experimental designs often face the so-called treatment cardinality constraint, which is the constraint on the number of included factors in each treatment. Experiments with such constraints are commonly encountered in engineering simulation, AI system tuning, and large-scale system verification. This calls for the development of adequate designs to enable statistical efficiency for modeling and analysis within feasible constraints. In this work, we study two-level designs under this $k$-treatment cardinality constraint (TCARD), where the design matrix $\mathbf{X} \in \{0,1\}^{n \times p}$ has constant row sums equal to $k$. Although TCARDs are closely related to balanced incomplete block designs (BIBDs), exact BIBD structure is unavailable for many practical $(n,p,k)$ combinations. This leads to the notion of nearly balanced TCARDs, which we prove minimize the first two components of the generalized word-length pattern. We also show that good projection behavior in this setting is governed by two count-based regularities: balanced factor replications and uniform pairwise concurrences. Motivated by this characterization, we then propose the Balanced Concurrence Deviation ($Φ_{\mathrm{BCD}}$), a model-free objective that jointly penalizes replication imbalance and concurrence dispersion. We further show that this criterion is closely connected to classical optimality principles, including $(M,S)$-optimality, centered $\mathrm{UE}(s^2)$ criterion, and Bayesian $D$-optimality. To construct designs minimizing $Φ_{\mathrm{BCD}}$, we develop a coordinate-exchange (CE) algorithm with efficient incremental updates, together with a simulation-based procedure for calibrating the criterion weights to the intended downstream task. Numerical experiments confirm that the proposed method compares favorably with existing alternatives across a range of problem sizes and constraint strengths.
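
The two count-based regularities named in the abstract, factor replications and pairwise concurrences, are easy to compute from a 0/1 design matrix. The sketch below does so and combines their variances into a simple imbalance score. That score is a hypothetical stand-in chosen for illustration; it is not the paper's $Φ_{\mathrm{BCD}}$, whose exact definition is not given in the abstract.

```python
from itertools import combinations

def replications(design):
    """Number of runs in which each factor is included (column sums)."""
    p = len(design[0])
    return [sum(row[j] for row in design) for j in range(p)]

def concurrences(design):
    """Number of runs in which each pair of factors appears together."""
    p = len(design[0])
    return {(i, j): sum(row[i] * row[j] for row in design)
            for i, j in combinations(range(p), 2)}

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def imbalance_score(design, weight=1.0):
    """Illustrative score only: replication variance plus weighted concurrence variance."""
    return (variance(replications(design))
            + weight * variance(list(concurrences(design).values())))

# n = 6 runs, p = 4 factors, k = 2 factors per run: every pair appears once.
balanced = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
            [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]]
# Same row sums and replications, but only two of the six pairs ever co-occur.
clumped = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3

balanced_score = imbalance_score(balanced)
clumped_score = imbalance_score(clumped)
```

Both designs satisfy the cardinality constraint and have equal replications, so only the concurrence term separates them.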

2605.21402 2026-05-21 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

Memorisation, convergence and generalisation in generative models

Antoine Maillard, Sebastian Goldt

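AI summary: This paper analyzes the transition from memorisation to generalisation in linear generative models. Convergence emerges continuously once the number of samples is linear in the input dimension, and generalisation decomposes into at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors.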

Abstract

Generative neural networks learn how to produce highly realistic images from a large, but finite number of examples - or do they simply memorise their training set? To settle this question, Kadkhodaie, Guth, Simoncelli and Mallat (ICLR '24) trained diffusion models independently on disjoint subsets of a dataset and showed that they converge to nearly the same density when the number of training images is large enough. This result raises two basic questions: how much data do you need for convergence, and what does convergence capture about learning the data distribution? Here, we address these questions by providing an exact analytical characterisation of the transition from memorisation to generalisation in linear generative models. We find that these models memorise at small load, while convergence emerges continuously when the number of samples is linear in the input dimension. Strikingly, we find that convergence is insensitive to recovery of the principal latent factors of the data, which are recovered in a sharp transition. After extending our approach to data with power-law spectra, we find the same distinction between convergence and latent recovery in our experiments with convolutional denoisers and in the data of Kadkhodaie et al. We thus show that generalisation in generative models decomposes into at least two distinct objectives: matching the bulk of the data distribution and recovering the principal latent factors. These objectives correspond to two different distances between true and learnt data distribution, and only the first one is captured by convergence.

2605.21388 2026-05-21 cs.LG cs.AI cs.NA math.NA stat.ML

On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

Likun Lin, Zhongjian Wang, Jack Xin, Zhiwen Zhang

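AI summary: This paper develops a theoretical framework for the regularity of transport maps and the generalization of one-step Wasserstein-guided generative models for PDE-induced probability measures, and illustrates the theory with experiments that support the derived rates.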

Abstract

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.

2605.21387 2026-05-21 stat.ME

Clustering Craters on the Moon with Dysfunctional Families

Nathan Weed, Emily Castleton, Dave Osthus, Brian Weaver, Richard L. Warr

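AI summary: This paper proposes a clustering method that incorporates a dysfunctional family constraint into a Bayesian nonparametric clustering approach, the Chinese restaurant process, to combine crater lists identified by multiple experts and to provide an estimate of clustering uncertainty.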

Abstract

Summaries of craters on terrestrial bodies, such as the number and size distribution, are essential for understanding the history of the Solar System. Identifying craters, however, has not been automated and thus relies on expert crater-counters marking static images. Robbins et al. (2014) (hereafter R14) showed that, contrary to previously held assumptions, there exists large variability across expert crater-counters' identified crater lists. How best to combine identified crater lists across multiple experts for the purposes of learning about the Solar System is an open and consequential question. R14 combined identified crater lists via clustering through a modification of the popular DBSCAN clustering method. Their approach did not, however, make use of all the constraining information available nor did it provide an estimate of clustering uncertainty. To address the shortcomings of the DBSCAN method, we present a novel clustering approach that can combine multiple lists of identified objects of interest from the same image. The key innovation is incorporating a dysfunctional family constraint into the Bayesian nonparametric clustering approach, the Chinese restaurant process (CRP), which naturally takes into account information about the crater identifier. The dysfunctional family Chinese restaurant process (DFCRP) provides an estimate of clustering uncertainty. In this work, we provide guidance on hyperparameter specification, present a Gibbs sampler, and perform a simulation study to compare the performance of the DFCRP to the CRP. Finally, we apply the DFCRP to the crater identification problem of R14, comparing results, and also demonstrate the types of analyses that can be performed with posterior draws of cluster assignments.
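
For readers unfamiliar with the Chinese restaurant process that the DFCRP builds on, the sketch below draws one partition from the plain CRP prior: each new item joins an existing cluster with probability proportional to its size, or opens a new cluster with probability proportional to a concentration parameter. It covers only the unconstrained CRP; the dysfunctional family constraint and the paper's Gibbs sampler are not reproduced, and the function name and parameter values are hypothetical.

```python
import random

def sample_crp(n_customers, concentration, rng):
    """Draw cluster labels for n_customers from a Chinese restaurant process prior."""
    assignments = []
    table_sizes = []
    for _ in range(n_customers):
        # Existing tables are weighted by size; the last weight opens a new table.
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes

rng = random.Random(0)
assignments, table_sizes = sample_crp(n_customers=50, concentration=1.0, rng=rng)
```

Repeating such draws within a posterior sampler yields many candidate partitions, and the spread across them is what provides an estimate of clustering uncertainty.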

2605.21365 2026-05-21 math.ST stat.ML stat.TH

$L^2$ over Wasserstein: Statistical Analysis for Optimal Transport

Riccardo Passeggeri, Rohan M. Shenoy, Pengcheng Ye

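AI summary: This paper introduces the $L^2$ over Wasserstein space, shows that it inherits the formal Riemannian structure of the Wasserstein space, and uses random probability measures to give optimal transport a framework that accounts for statistical uncertainty, with applications to generative modelling and Bayesian nonparametrics.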

Comments
49 pages. Comments are welcome
Abstract

Optimal transport provides an inherently geometric and highly structured framework for studying spaces of probability measures, supplying a rich theoretical toolkit for contemporary statistics, machine learning, and generative modelling. In applications, however, the measures of interest are almost never known precisely, calling for a theory of optimal transport that accounts for statistical uncertainty. We construct such a framework, lifting the classical theory to the setting of random probability measures. We introduce the $L^2$ over Wasserstein space, establishing that it inherits the formal Riemannian structure of the Wasserstein space by characterising distances and geodesic geometry. The structure induces random flows with Wasserstein gradient flow sample paths, making it the natural extension of the Wasserstein space which allows for random gradient flow dynamics. We assemble statistical convergence results of the optimal transport machinery using the empirical measure within the $L^2$ over Wasserstein framework. Moreover, in the setting of Bayesian non-parametrics, we refine Schwartz's consistency theorem to the Wasserstein topology and deduce posterior convergence of the same machinery in the $L^2$ over Wasserstein space. We demonstrate that the growing theory of random token sampling for transformer models using self-attention flow paths can be embedded into our framework. The results provide a unified treatment of random optimal transport and its consequences for principled inference and generative modelling under the statistical uncertainty of random sampling.

2605.21360 2026-05-21 math.ST cs.CC stat.TH

Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

Jie Xie, Dongming Huang

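AI summary: This paper studies testing $H_0: ξ^\top β = t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. It constructs a computationally efficient mixed test that upper-bounds the adaptive separation distance and establishes an information-theoretic lower bound. In the ultra-sparse regime these bounds characterize the adaptive separation rate for arbitrary $ξ$ up to logarithmic factors; in the moderately sparse regime they match for several classes of loading vectors but may differ in general. A low-degree lower bound matches the upper bound up to logarithmic factors, suggesting that improving on the mixed test's rate, if statistically possible, may be computationally hard. For flat sparse loadings, a polynomial-time reduction from sparse CCA gives further evidence. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable by a computationally inefficient procedure while the low-degree bound and the sparse-CCA reduction still apply, which is evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.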

Abstract

We study the problem of testing $H_0: ξ^\topβ=t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. The loading vector $ξ$ is arbitrary, and the exact sparsity level $k$ is unknown but bounded by a known value $k_u$. Tests are required to control Type I error uniformly over the $k_u$-sparse null, while power is evaluated against $k$-sparse alternatives. We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of $ξ$. In the ultra-sparse regime $k_u\lesssim \sqrt n/\log p$, these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary $ξ$. In the moderately sparse regime $\sqrt n/\log p\ll k_u\lesssim n/\log p$, these bounds match for several classes of loading vectors but may differ in general. In this regime, we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. This provides evidence that improving on the rate of the mixed test, if statistically possible, may be computationally hard. For flat sparse loadings, we complement this evidence with a polynomial-time reduction from sparse CCA. Finally, we examine how information about the design covariance affects the adaptive separation rate in two settings. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable up to logarithmic factors by a computationally inefficient procedure, while the low-degree lower bound and sparse-CCA reduction continue to apply, providing evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.

2605.21341 2026-05-21 stat.ML cs.LG

Semiparametric Efficient Bilevel Gradient Estimation

Fares El Khoury, Houssam Zenati, Nathan Kallus, Michael Arbel, Aurélien Bibaut

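AI summary: This paper develops a semiparametric debiasing theory that removes first-order bias from bilevel gradient estimates. The resulting cross-fitted orthogonal hypergradient estimator is asymptotically normal and, under quadratic losses, reduces to a doubly robust score based on conditional mean nuisances.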

Abstract

Functional bilevel methods estimate a lower-level function and plug it into a hypergradient, but this plug-in gradient can retain first-order bias when the lower-level problem is learned nonparametrically. To remove this bias, we develop a semiparametric debiasing theory for population bilevel gradients based on the efficient influence function. This perspective leads to a cross-fitted orthogonal hypergradient estimator for which we establish asymptotic normality together with uniform control over the outer parameter. Under quadratic losses, the estimator reduces to a simple doubly robust score based on conditional mean nuisances. On synthetic bilevel benchmarks with known ground truth, the method tracks the oracle efficient-gradient benchmark and improves over plug-in functional hypergradients and regularized kernel bilevel baselines.

2605.21316 2026-05-21 stat.AP cs.DC

Bitcoin's Power Law: Weak Structure, Strong Forecasts

Carlos Baquero, Raquel Menezes

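AI summary: This paper tests the claim that Bitcoin's price follows a power law in time. A distributional power law is rejected on UTXO balances and daily absolute returns, the fitted time-domain exponent varies by nearly a factor of three across reasonable choices of time origin, and standard residual diagnostics and scale-invariance tests cannot distinguish a power law from a multi-component sigmoid stack. Bitcoin's price nonetheless stands apart in a cross-asset comparison. In forecasting, the in-sample winner (multi-sigmoid) is among the worst long-horizon forecasters, while the simple power law beats every standard baseline at 12-24 month horizons.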

Abstract

Bitcoin's price has been described as following a power law (PL) in time, $P \sim t^β$ with $\hatβ\approx 5.7$ over 2010-2026. We test this claim using the Clauset-Shalizi-Newman protocol applied to Bitcoin's tail-relevant distributional series, and develop three principled time-domain adaptations of the protocol. We find that (i) the distributional power law is rejected on UTXO balances and daily |returns|, with lognormal preferred decisively; (ii) the fitted time-domain exponent varies by nearly a factor of three across reasonable shifts of the time origin -- it is not specification-robust in the sense required for a shift-invariant structural reading; (iii) standard residual diagnostics and scale-invariance tests proposed in earlier work cannot distinguish a power law from a multi-component sigmoid stack fit to the same data; (iv) Bitcoin price stands apart in a cross-asset comparison spanning Bitcoin on-chain metrics and traditional asset classes: it is the only series in the nine-series in-sample test where no single-component growth curve improves on the power law, and the quarterly $K=3$ wave-stability bootstrap rejects the PL+AR(1) null on Bitcoin at $p = 0.015$ (strict 15% CV threshold) -- a clear cross-asset separation, although not a Bonferroni-robust rejection; and (v) walk-forward Diebold-Mariano evaluation against ten candidates -- including standard time-series baselines (RW with drift, auto-ARIMA, ETS, local-linear-trend) -- shows the in-sample winner (multi-sigmoid) is among the worst long-horizon forecasters, while the simple power law dominates 12-24 month horizons against every standard baseline at $p < 0.05$, precisely because it does not commit to specific wave shapes. The fit-prediction tradeoff is the practical counterpart of the descriptive findings.
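
Finding (ii), that the fitted exponent depends on where the time origin is placed, can be reproduced on synthetic data. The sketch below generates an exact power law $P = t^{5.7}$, estimates the exponent by log-log regression, and then re-estimates it after moving the origin 1,000 days earlier. The data are synthetic and the shift is arbitrary; the sketch shows the mechanism only and does not reproduce the paper's estimates.

```python
import math

def fit_exponent(times, prices):
    """Slope of the least-squares line through (log t, log P)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(p) for p in prices]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic series: days since an assumed origin, price exactly t ** 5.7.
days = list(range(500, 5000, 100))
prices = [t ** 5.7 for t in days]

beta_true_origin = fit_exponent(days, prices)

# Same prices, but time is measured from an origin 1,000 days earlier.
shifted_days = [t + 1000 for t in days]
beta_shifted_origin = fit_exponent(shifted_days, prices)
```

Because $\log(t + s)$ is not a linear function of $\log t$, the same price series yields a different slope once the origin moves, so the exponent cannot be read as an origin-free structural constant.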

2605.21307 2026-05-21 stat.ME

The Bayesian Gaussian Process Latent Variable Model for Spatio-Temporal Stream Networks

Marno Basson, Tobias M. Louw, Theresa R. Smith

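AI summary: This paper develops a variational inference framework for training a multi-output Gaussian process latent variable model tailored to tails-up spatio-temporal stream networks, maximising a variational lower bound on the log marginal likelihood by gradient-based optimisation. It introduces a new family of tails-up spatio-temporal stream network models built on the sparse Gaussian process inducing variable framework, the Bayesian Gaussian process latent variable model, and local variational methods. The models use stream distance instead of Euclidean distance and capture spatial and temporal dependencies with auto/cross-correlation and process convolution, yielding valid separable covariance functions. Simulation-based case studies indicate that the framework performs well against benchmarks on several performance metrics.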

Abstract

A variational inference-based framework for training a multi-output Gaussian process latent variable model, specifically tailored to the tails-up spatio-temporal stream network, is developed. Training, given a censored observational data set subject to missing values, proceeds by maximising a secondary variational lower bound on the model log marginal likelihood using gradient-based optimisation. Consequently, the theoretical development for a new family of tails-up spatio-temporal stream network models is introduced which rely on the sparse Gaussian process inducing variable framework, the Bayesian Gaussian process latent variable model, and local variational methods. These spatio-temporal models use stream distance instead of Euclidean distance and capture spatial and temporal dependencies using auto/cross-correlation and process convolution, respectively, which allows for the development of valid separable spatio-temporal stream network-based covariance functions. Results from the simulation-based case studies indicate that the proposed framework performs well when considering benchmark comparisons and several performance metrics.

2605.21304 2026-05-21 stat.ME

How does limma-trend work? An empirical partially Bayes perspective

Sagnik Nandy, Wanyi Ling, Nikolaos Ignatiadis

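AI summary: This paper studies limma-trend through the lens of empirical partially Bayes inference, showing how it improves power by shrinking variance estimates parametrically toward a fitted trend. The framework explains why MAnorm2, a variant for ChIP-seq, can sometimes fail to control the FDR, and it leads to a nonparametric generalization of limma-trend that asymptotically controls the FDR under dense signals.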

Abstract

In high-throughput biology, it is common to fit thousands of linear regressions -- one per gene, protein, or other unit -- with very few samples per unit. Limma-trend, one of the most widely used methods in this setting, improves power by shrinking variance estimates parametrically toward a fitted curve (the trend) relating variance to a unit-level summary (e.g., average intensity, peptide count), before computing p-values and applying the Benjamini-Hochberg procedure to control the false discovery rate (FDR). We study limma-trend through the lens of empirical partially Bayes inference, a paradigm in which a prior is posited and estimated for the nuisance parameters while parameters of interest remain fixed. From this perspective, limma-trend computes approximate partially Bayes p-values that condition on the residual sample variance and the unit-level summary. The same framework explains why MAnorm2, a popular variant for ChIP-seq, can sometimes fail to control FDR. We then derive a nonparametric generalization of limma-trend that estimates the residual variance prior using nonparametric maximum likelihood. Under dense signals, this procedure asymptotically controls the FDR -- even when the trend is misspecified or inconsistently estimated. To allow the full shape of the conditional variance distribution to depend on the unit-level summary, we develop a second procedure that learns it directly.
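
The final step of the pipeline described in the abstract, applying the Benjamini-Hochberg procedure to per-unit p-values, is a standard step-up rule and can be sketched in a few lines. This is a generic illustration of that rule, not limma's implementation; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr_level):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Step-up rule: find the largest rank whose p-value is below its threshold.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr_level * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Everything ranked at or below the cutoff is rejected.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Only the smallest p-value clears its threshold here.
few_rejections = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], fdr_level=0.05)

# The largest p-value clears the last threshold, so all four are rejected.
step_up_case = benjamini_hochberg([0.03, 0.02, 0.04, 0.045], fdr_level=0.05)
```

The second case shows why the rule is called step-up: smaller p-values that miss their own thresholds are still rejected once a larger one passes.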

2605.21292 2026-05-21 stat.ML cs.AI cs.LG math.DS

Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

双因子线性变换器模型的大步训练动态

Krishnakumar Balasubramanian

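AI summary: This paper studies the training dynamics of a two-factor linear transformer model at large learning rates. The analysis shows that large constant learning rates can change the training attractor of the transformer rather than merely accelerate convergence: beyond sharp stability thresholds, training may settle into cycles, bounded chaos, or divergence.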

详情
AI中文摘要

梯度流分析显示,简化的线性变换器可以学习上下文线性回归算法,但无法解释大学习率下梯度下降的有限步行为。受高学习率变换器不稳定性实证研究和二次回归的立方图相图启发,我们研究了一个可以简化为单提示线性变换器训练问题的恰好可约问题。归一化后,动态减少为一个双因子乘积映射,具有有效步长参数μ。在平衡切片上,该映射恢复了已知的标量立方过渡,从单调收敛到飞弹收敛,周期性和有界非收敛,以及发散。我们随后分析了完整的二维系统,显示对于0<μ<2,它有一个显式不变的切比雪夫椭圆,将前向不变区域分开;该椭圆承载着不平衡的混沌动态,但横向排斥,而平衡标量吸引子可以横向吸引。这些结果表明,大常数学习率可以改变学习变换器的训练吸引子,而不仅仅是加速收敛:在稳定性阈值之外,有限步训练可能进入循环、有界混沌或发散,而不是单一的上下文线性回归解。我们还讨论了这对基于小批量梯度下降训练方法的影响。

英文摘要

Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instabilities and by the cubic-map phase diagram for quadratic regression, we study an exactly reducible one-prompt linear-transformer training problem. After normalization, the dynamics reduce to a two-factor product map with an effective step-size parameter \(μ\). On the balanced slice, this map recovers the known scalar cubic transition from monotone convergence to catapult convergence, periodic and chaotic bounded nonconvergence, and divergence. We then analyze the full two-dimensional system and show that, for \(0<μ<2\), it has an explicit invariant Chebyshev ellipse separating forward-invariant regions; this ellipse carries off-balanced chaotic dynamics but is transversely repelling, while balanced scalar attractors can be transversely attracting. These results show that large constant learning rates can change the training attractor of the learned transformer rather than merely accelerating convergence: beyond sharp stability thresholds, finite-step training may settle into cycles, bounded chaos, or divergence instead of a single in-context linear-regression solution. We also discuss the consequences for mini-batch gradient descent based training methods.

2605.21283 2026-05-21 stat.ME stat.AP

A continuous-time Markov chain framework for population size estimation from multi-list data: accounting for absorbing lists and asymmetric interactions

一个用于从多列表数据中估计种群规模的连续时间马尔可夫链框架:考虑吸收列表和非对称交互

Ophélie Schaller, Andrew Titman, Rachel McCrea

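AI summary: This paper proposes a continuous-time Markov chain framework for estimating population size from multi-list data. The framework can model directional interactions and accommodate absorbing lists (such as death records) or more general data collection processes. A simulation study shows that an absorbing list must be accounted for, using either the Markov model or a log-linear model with forced absorbing interactions, to avoid biased estimates of the population size.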

详情
AI中文摘要

我们介绍了一个连续时间马尔可夫链框架,用于从多列表数据中估计种群规模,该框架允许建模方向性交互,并能够处理吸收列表,如死亡记录或更一般的数据收集过程。标准的连续时间马尔可夫链框架模型和多列表数据的对数线性模型在列表相互独立时是等价的,我们通过实证研究发现,在列表之间存在依赖性时,它们也给出相似的结果。通过模拟研究,我们强调了通过使用马尔可夫模型或对数线性模型中的强制吸收交互来考虑吸收列表的必要性,否则会得到有偏的种群规模估计。我们用一个流行病学数据集来说明我们的方法,该数据集涉及在英格兰西北部经历首次中风的个体,其中有一个列表是死亡记录。我们还通过考虑伦敦城的药物使用数据中的有序列表情况,进一步展示了本方法的应用。

英文摘要

We introduce a continuous-time Markov chain framework for estimating population size from multi-list data, which allows directional interactions to be modelled and can accommodate absorbing lists, such as death records, or more general data collection processes. The standard model of the continuous-time Markov chain framework and the log-linear model for multi-list data are equivalent when lists are independent and we show empirically that they give similar results in the presence of dependencies between lists. Through a simulation study, we highlight the need to account for an absorbing list by using the Markov model or the log-linear model with forced absorbing interactions, observing biased estimates of the population size otherwise. We motivate our approach with an epidemiological dataset concerning individuals suffering from a first ever stroke in North-West England, in which one of the lists is a death record. We illustrate a further use of our approach by considering a case of ordered lists on drug use data from the City of London.

2605.08352 2026-05-21 cs.LG math.PR stat.ML

Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

神经网络过参数化极限下牛顿方法的收敛性分析

Konstantin Riedl, Konstantinos Spiliopoulos, Justin Sirignano

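AI summary: This paper analyzes the convergence of the regularized Newton method for training neural networks in the overparameterized limit. By characterizing the Newton neural tangent kernel (NNTK), it proves that in the infinite-width limit the network converges exponentially fast to the target data, uniformly across the frequency spectrum, which addresses the spectral bias of gradient descent.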

详情
AI中文摘要

本文开发了一种正则化牛顿方法用于训练神经网络(NNs)在过参数化极限下的收敛性分析。当隐藏单元数量趋于无穷大时,NN训练动态在概率意义上收敛到一个确定性极限方程的解,该方程涉及一个“牛顿神经切线核”(NNTK)。给出了描述这种收敛的显式速率,并在无限宽度极限下证明NN以指数速度收敛到目标数据(即零损失的全局极小值)。我们证明这种收敛在频谱上是均匀的,解决了梯度下降中的频谱偏置问题。梯度下降的NNTK的特征值聚集在零,导致具有高频分量的目标数据收敛缓慢。相反,如果适当选择正则化参数,NNTK的特征值具有统一的下界,使得牛顿方法能够更快地收敛到具有高频分量的数据。数学上需要解决的问题包括牛顿方法隐式参数更新中可能的不定Hessian矩阵以及随着NN宽度增加,该线性方程组的维度趋于无穷大。这使得在过参数化极限下推导训练动态以及证明有限宽度动态收敛变得复杂。分析确定了一个正则化参数的标度公式,我们证明该公式可以随着隐藏单元数量的增加以合适速率趋于零。我们证明,对于足够大的隐藏单元数量,正则化Hessian在训练过程中保持正定,且NN参数的牛顿更新收敛到零,表明模型行为如同初始化周围的线性化。

英文摘要

A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the NNTK has uniformly lower bounded eigenvalues if the regularization parameter is selected appropriately, allowing Newton's method to converge more quickly for data with high-frequency components. Mathematical challenges that need to be addressed in our analysis include the implicit parameter update of the Newton method with a potentially indefinite Hessian matrix and the fact that the dimension of this linear system of equations tends to infinity as the NN width grows. This complicates deriving the training dynamics in the overparameterized limit as well as proving the convergence of the finite-width dynamics thereto. The analysis identifies a scaling formula for selecting the regularization parameter, which we show can vanish at a suitable rate as the number of hidden units becomes larger. We prove that, for sufficiently large numbers of hidden units, the regularized Hessian remains positive definite during training and the Newton updates for individual NN parameters converge to zero, showing that the model behaves as a linearization around the initialization.

2604.23944 2026-05-21 stat.ML cs.LG

Sliced-Regularized Optimal Transport

切片正则化最优传输

Khai Nguyen

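AI summary: This paper proposes a new regularized optimal transport (OT) formulation called sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothed sliced OT (SOT) plan. The author gives a formal definition of SROT, derives its dual formulation, and provides a post-Bayesian interpretation. A Sinkhorn-style algorithm is then developed for efficient computation, retaining the scalability advantages of EOT. By using a scalable SOT plan as a prior, SROT approximates the exact OT plan more accurately than EOT at the same level of regularization, and the resulting plan improves on the reference SOT plan itself. The paper also introduces the SROT divergence induced by SROT and analyzes its topological and computational properties. Experiments on synthetic datasets and color transfer tasks show that SROT approximates exact OT better than both EOT and SOT, and additional gradient-flow experiments further highlight the advantages of the SROT divergence.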

详情
Comments
22 pages, 8 figures, 1 table
AI中文摘要

我们提出了一种新的正则化最优传输(OT)公式,称为切片正则化最优传输(SROT)。与熵正则化最优传输(EOT)不同,SROT正则化方向指向平滑的切片最优传输(SOT)计划。据我们所知,SROT是首个利用SOT计划的版本作为参考来改进经典OT的方法。我们提供了SROT的正式定义,推导了其对偶形式,并提供了SROT的后贝叶斯解释。然后,我们开发了一种类似Sinkhorn的算法以实现高效的计算,保留与EOT相同的可扩展性优势。通过将可扩展的SOT计划作为先验,SROT在相同正则化水平下比EOT更准确地近似了精确的OT计划。此外,所得到的传输计划优于参考的SOT计划本身。我们进一步引入了由SROT引起的相应的OT分歧度,称为SROT分歧度,并分析了其拓扑和计算性质。最后,我们通过合成数据集和颜色传输任务的实验验证了我们的方法,证明SROT在近似精确OT方面优于EOT和SOT。额外的梯度流实验进一步突显了SROT分歧度的优势。

英文摘要

We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothened sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT is better than both EOT and SOT in approximating exact OT. Additional experiments on gradient flows further highlight the advantages of SROT divergence.
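To make the contrast between regularizing toward an independent coupling and toward another reference plan concrete, here is a generic Sinkhorn iteration for OT with a KL penalty toward an arbitrary reference coupling. It is a sketch under stated assumptions, not the SROT algorithm: the abstract does not say how the smoothed SOT plan is constructed, so the reference below is a placeholder (the independent coupling, which recovers plain EOT). The function name `sinkhorn_with_reference` and all parameter values are illustrative.

```python
import numpy as np

def sinkhorn_with_reference(a, b, cost, reference, eps=0.5, n_iters=500):
    """Approximately minimize <cost, P> + eps * KL(P || reference)
    over plans P with row sums a and column sums b."""
    kernel = reference * np.exp(-cost / eps)    # Gibbs kernel weighted by the reference
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)                    # rescale to match row marginals
        v = b / (kernel.T @ u)                  # rescale to match column marginals
    return u[:, None] * kernel * v[None, :]

rng = np.random.default_rng(0)
x = rng.uniform(size=(5, 2))                    # source points (toy data)
y = rng.uniform(size=(6, 2))                    # target points (toy data)
a = np.full(5, 1 / 5)
b = np.full(6, 1 / 6)
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
reference = np.outer(a, b)                      # placeholder reference: independent coupling
plan = sinkhorn_with_reference(a, b, cost, reference)
print(plan.sum(axis=1), plan.sum(axis=0))
```

Swapping `reference` for any other strictly positive coupling changes only the kernel, which is why a reference-plan regularizer can keep the same per-iteration cost as EOT.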

2604.19169 2026-05-21 stat.ME

A Finite Mixture Failure-rate based Heterogeneous Step-stress Accelerated Life Testing (h-SSALT) Model

基于有限混合失败率的异质步应力加速寿命测试(h-SSALT)模型

Pranoy Palit, Ayan Pal, Kiran Prajapat

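AI summary: This paper proposes a failure-rate based heterogeneous step-stress accelerated life testing (h-SSALT) model with Weibull-distributed, Type-II censored failure times, in which heterogeneity emerges at the second stress level through a finite mixture of m latent subgroups. Maximum likelihood estimation is carried out with the expectation-maximization algorithm, and a simulation study shows that ignoring population heterogeneity leads to systematic bias in lifetime predictions.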

详情
Comments
44 pages, 7 figures, 12 tables. Version 2: we have added interval estimation using Louis' missing information method with transformation-based confidence intervals, and an additional real data analysis example
AI中文摘要

传统步应力加速寿命测试模型假设测试单元来自同质群体。最近,Lu和Kateri(2025)提出了一种基于累积暴露的异质步应力加速寿命测试(SSALT)模型,以考虑同一生产批次中测试单元的非同质老化模式。本文引入了一种替代但灵活的基于失败率的异质简单步应力加速寿命测试(h-SSALT)模型,采用Weibull分布的II型截尾失效时间,允许通过m个潜在子组的有限混合在第二应力水平上产生异质性。开发了期望最大化算法用于模型参数的最大似然估计,利用来自未知群体成员资格和II型截尾的不完整数据结构。通过Louis(1982)的缺失信息身份进行区间估计,使用基于转换的置信区间尊重参数约束。广泛的模拟研究评估了所提出估计器的有限样本性能,并通过基于分位数的比较证明,忽略群体异质性会导致整个分位数范围内的寿命预测系统性偏差,最严重的后果出现在早期失效分位数上,这直接相关于保修期设计。通过特殊情形比较确认,所提出的Weibull失败率基于公式当形状参数等于1时退化为Lu和Kateri(2025)现有的模型,验证了所提出的框架作为适当推广的正确性。通过模拟和实际数据分析示例进一步展示了该模型的实际应用。

英文摘要

Traditional step-stress accelerated life testing models assume that test units originate from a homogeneous population. Recently, Lu and Kateri (2025) proposed a heterogeneous cumulative exposure based SSALT model to account for the inhomogeneous aging patterns among test units belonging to the same production batch. This paper introduces an alternative yet flexible failure-rate based heterogeneous simple SSALT (h-SSALT) model with Weibull-distributed Type-II censored failure times, allowing heterogeneity to emerge at the second stress level through a finite mixture of m latent subgroups, each characterized by its own failure behavior. The expectation-maximization algorithm is developed for maximum likelihood estimation of the model parameters, exploiting the incomplete data structure arising from both unknown group membership and Type-II censoring. Interval estimation is performed using the missing information identity of Louis (1982) with transformation-based confidence intervals respecting parameter constraints. An extensive simulation study evaluates the finite-sample performance of the proposed estimators and demonstrates, through a quantile-based comparison, that ignoring population heterogeneity leads to systematic bias in lifetime predictions across the entire quantile range, with the most severe consequences at early failure quantiles of direct relevance to warranty period design. A special case comparison confirms that the proposed Weibull failure-rate based formulation reduces to the existing model of Lu and Kateri (2025) when the shape parameter equals unity, validating the proposed framework as a proper generalization. The practical application of the model is further illustrated through simulated and real data analysis examples.

2603.06871 2026-05-21 stat.ME stat.AP stat.CO stat.OT

Adaptive Bi-Level Variable Selection of Conditional Main Effects for Generalized Linear Models

适应性双层条件主效应变量选择用于广义线性模型

Kexin Xie, Xinwei Deng

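AI summary: This paper proposes an adaptive cmenet method for selecting conditional main effects (CMEs) under the generalized linear model framework. A penalized likelihood with adaptive weights improves bi-level variable selection, an efficient estimation algorithm based on iteratively reweighted least squares is developed, and performance is evaluated through simulation studies and real-data studies in gene association analysis.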

详情
AI中文摘要

理解变量间的交互效应对于各种应用中的回归建模至关重要。传统的将交互效应量化为变量乘积的方法往往缺乏清晰的可解释性,尤其是在复杂系统中。条件主效应(CME)的概念提供了一个更直观和可解释的框架,通过量化一个变量在另一个变量水平下的效应来捕捉交互效应。最近提出的一种称为cmenet的方法进一步考虑了CME的双层选择,通过利用其自然分组结构(例如兄弟和堂兄弟组)通过惩罚来实现。然而,cmenet方法存在一些局限性,包括组内CME的惩罚耦合能力、组间惩罚的缺乏适应性以及仅限于具有连续响应的线性模型。为克服这些限制,我们提出了一种适应性cmenet方法,用于在广义线性模型(GLM)框架下进行CME选择。所提出的方法考虑了一种带有自适应权重的惩罚似然方法,以实现有效的双层变量选择,提高组间和组内选择的效果。还开发了通过迭代加权最小二乘法进行参数估计的高效算法。所提出方法的性能通过模拟研究和基因关联分析的实证研究进行了评估。

英文摘要

Understanding interaction effects among variables is important for regression modeling in various applications. The conventional approach of quantifying interactions as the product of variables often lacks clear interpretability, especially in complex systems. The concept of conditional main effects (CME) provides a more intuitive and interpretable framework for capturing interaction effects by quantifying the effect of one variable conditional on the level of another. A recent method called cmenet further considered the bi-level selection of CMEs by leveraging their natural grouping structure (e.g., sibling and cousin groups) through penalization. However, there are several limitations in the cmenet method, including the coupling ability of penalties for within-group CMEs, lack of adaptiveness for between-group penalties, and restriction to linear models with continuous responses. To overcome these limitations, we propose an adaptive cmenet method for CME selection under the generalized linear model (GLM) framework. The proposed method considers a penalized likelihood approach with adaptive weights to enable effective bi-level variable selection, improving both between-group and within-group selection. An efficient algorithm for parameter estimation is also developed by employing an iteratively reweighted least squares procedure. The performance of the proposed method is evaluated by both simulation studies and real-data studies in gene association analysis.

2602.13485 2026-05-21 cs.LG stat.ML

Federated Learning of Nonlinear Temporal Dynamics with Graph Attention-based Cross-Client Interpretability

基于图注意力的跨客户端可解释性非线性时序动态联邦学习

Ayse Tursucular, Ayush Mohanty, Nazal Mohamed, Nagi Gebraeel

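AI summary: This paper proposes a federated learning framework for learning cross-client temporal interdependencies in distributed nonlinear systems. Each client maps high-dimensional local observations to low-dimensional latent states with a nonlinear state space model, and a central server uses a Graph Attention Network to learn a graph-structured neural state transition model over the communicated latent states. Relating the Jacobian of the learned server-side transition model to the attention coefficients makes the cross-client temporal interdependencies interpretable.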

详情
Comments
Manuscript under review
AI中文摘要

现代工业系统的网络越来越多地由分布式传感器监控,其中每个系统由多个子系统组成,生成高维时间序列数据。这些子系统通常是相互依赖的,因此理解一个子系统中的时序模式如何与其他子系统相关联变得很重要。在去中心化设置中,原始测量值无法共享,客户端观测是异质的,这使得问题更加复杂。在实际部署中,每个子系统(客户端)运行一个固定的专有模型,无法修改或重新训练,限制了现有方法。非线性动态进一步使跨客户端时序依赖关系难以解释,因为它们嵌入在非线性状态转移函数中。本文提出了一种联邦框架,用于在这些约束下学习跨客户端的时序依赖关系。每个客户端使用非线性状态空间模型将高维本地观测映射到低维潜在状态。中央服务器利用图注意力网络在通信的潜在状态上学习图结构的神经状态转移模型。为了可解释性,我们将学习的服务器侧转移模型的雅可比矩阵与注意力系数相关联,从而首次提供了对去中心化非线性系统中跨客户端时序依赖关系的可解释性描述。我们建立了理论收敛保证,以达到集中化 oracle,并通过合成实验验证了该框架,展示了收敛性、可解释性、可扩展性和隐私。此外,现实世界实验显示其性能与去中心化基线相当。

英文摘要

Networks of modern industrial systems are increasingly monitored by distributed sensors, where each system comprises multiple subsystems generating high dimensional time series data. These subsystems are often interdependent, making it important to understand how temporal patterns at one subsystem relate to others. This is challenging in decentralized settings where raw measurements cannot be shared and client observations are heterogeneous. In practical deployments each subsystem (client) operates a fixed proprietary model that cannot be modified or retrained, limiting existing approaches. Nonlinear dynamics further make cross client temporal interdependencies difficult to interpret because they are embedded in nonlinear state transition functions. We present a federated framework for learning temporal interdependencies across clients under these constraints. Each client maps high dimensional local observations to low dimensional latent states using a nonlinear state space model. A central server learns a graph structured neural state transition model over the communicated latent states using a Graph Attention Network. For interpretability we relate the Jacobian of the learned server side transition model to attention coefficients, providing the first interpretable characterization of cross client temporal interdependencies in decentralized nonlinear systems. We establish theoretical convergence guarantees to a centralized oracle and validate the framework through synthetic experiments demonstrating convergence, interpretability, scalability and privacy. Additional real world experiments show performance comparable to decentralized baselines.

2601.15950 2026-05-21 math.PR math.ST stat.TH

Extreme Score Distributions in Countable-Outcome Round-Robin Tournaments of Equally Strong Players

可数结果轮换赛中同等强选手的极端得分分布

Yaakov Malinovsky

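AI summary: This paper studies the distribution of extreme scores in countable-outcome round-robin tournaments of equally strong players. For the maximum, second maximum, and lower-order extremes, it derives limiting distributions and rates of convergence as the number of players n tends to infinity.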

详情
AI中文摘要

我们考虑了一类通用的轮换赛模型,其中n名选手每名都与其他选手进行一次比赛。在每场比赛中,结果是从单位区间的一个可数子集中的值,且两名选手在比赛中的得分之和为一。每个选手的最终得分定义为其在所有比赛中获得的得分之和。我们研究了极端得分的分布,包括最大值、次大值和下限极值。由于即使对于小n值,精确分布也是计算不可行的,我们推导了当玩家数量n趋于无穷时的渐进行为,包括极限分布和收敛速率。

英文摘要

We consider a general class of round-robin tournament models of equally strong players. In these models, each of the $n$ players competes against every other player exactly once. For each match between two players, the outcome is a value from a countable subset of the unit interval, and the scores of the two players in a match sum to one. The final score of each player is defined as the sum of the scores obtained in matches against all other players. We study the distribution of extreme scores, including the maximum, second maximum, and lower-order extremes. Since the exact distribution is computationally intractable even for small values of $n$, we derive asymptotic results as the number of players $n$ tends to infinity, including limiting distributions and rates of convergence.
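The tournament model described above is easy to simulate, which gives a rough feel for the extreme scores even though the paper's results are asymptotic. The sketch below assumes one particular outcome set, {0, 1/2, 1} with symmetric probabilities (so players are equally strong). That choice, the function name `simulate_scores`, and the sample sizes are illustrative assumptions rather than anything specified in the abstract.

```python
import numpy as np

def simulate_scores(n_players, outcomes, probs, rng):
    """Simulate one round-robin tournament and return every player's final score."""
    scores = np.zeros(n_players)
    for i in range(n_players):
        for j in range(i + 1, n_players):
            s = rng.choice(outcomes, p=probs)   # score of player i against player j
            scores[i] += s
            scores[j] += 1.0 - s                # the two scores in a match sum to one
    return scores

rng = np.random.default_rng(0)
outcomes = np.array([0.0, 0.5, 1.0])            # loss, draw, win (assumed outcome set)
probs = np.array([0.3, 0.4, 0.3])               # symmetric, so players are equally strong

# Empirical distribution of the maximum score over repeated tournaments.
max_scores = [simulate_scores(20, outcomes, probs, rng).max() for _ in range(200)]
print(np.mean(max_scores), np.std(max_scores))
```

Because every match distributes exactly one point, the scores always sum to n(n-1)/2, so the individual scores are dependent; this is one reason the exact distribution of the maximum is hard to compute.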

2512.23943 2026-05-21 cs.CY cs.LG stat.ME

Statistical Guarantees in the Search for Less Discriminatory Algorithms

在寻找更少歧视性算法中统计保证

Chris Hays, Ben Laufer, Solon Barocas, Manish Raghavan

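AI summary: This paper studies statistical guarantees for firms in high-stakes domains that search for less discriminatory algorithms in order to reduce disparate impact on protected groups. It proposes an adaptive stopping algorithm that determines when to stop the search and lets a developer certify that further search is unlikely to yield meaningful improvements.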

详情
Comments
38 pages, 10 figures
AI中文摘要

美国反歧视法可以对企业未能采用减少歧视的替代方案(LDA)施加责任:一种决策政策,能够在实现相同商业目标的同时减少对受法律保护群体的歧视性影响。最近的学术研究认为,这一学说对高风险领域(如就业、贷款和住房)的算法决策有直接影响,可能迫使企业寻找“更少歧视性算法”(Black等,2024)。监管机构有时会鼓励主动寻找LDA,强化了企业努力寻找同样表现但影响更小的模型的期望。模型多样性使得此类搜索成为可能:通过不同的随机种子重新训练可以产生具有相似预测性能但实质性不同的歧视性影响的模型。然而企业无法无限重新训练,这提出了一个核心问题:何时搜索足够证明善意?我们正式将LDA搜索在多样性下作为最优停止问题,其中开发者试图产生证据表明进一步搜索不太可能带来有意义的改进。我们的主要贡献是一种自适应停止算法,它提供了一个高概率的上界,以确定通过继续重新训练所能达到的最佳歧视性影响改进,使开发者能够证明(例如,向法院)进一步搜索不太可能有所帮助。我们还展示了在模型空间上更强的分布假设可以产生更紧的界限,并在现实世界信用和住房数据集上验证了该方法。

英文摘要

U.S. discrimination law can impose liability on firms that fail to adopt a less discriminatory alternative (LDA): a decision policy that achieves the same business objectives while reducing disparate impact on legally protected groups. Recent scholarship argues that this doctrine has direct implications for algorithmic decision-making in high-stakes domains such as employment, lending, and housing, potentially obligating firms to search for "less discriminatory algorithms" (Black et al., 2024). Regulators have at times encouraged proactive LDA searches, reinforcing the expectation of a good-faith effort to identify equally performant models with lower disparate impact. Model multiplicity makes such searches plausible: retraining with different random seeds can yield models with comparable predictive performance but materially different disparate impacts. Yet firms cannot retrain indefinitely, raising a central question: when is the search sufficient to demonstrate good faith? We formalize LDA search under multiplicity as an optimal stopping problem in which a developer seeks to produce evidence that further search is unlikely to yield meaningful improvements. Our main contribution is an adaptive stopping algorithm that provides a high-probability upper bound on the best disparate-impact gains attainable through continued retraining, enabling developers to certify (e.g., to a court) that additional search is unlikely to help. We also show how stronger distributional assumptions over the model space can yield tighter bounds, and we validate the approach on real-world credit and housing datasets.

2508.04074 2026-05-21 stat.AP

Matrix Factorization-Based Solar Spectral Irradiance Missing Data Imputation with Uncertainty Quantification

基于矩阵分解的太阳光谱辐照度缺失数据填补与不确定性量化

Yuxuan Ke, Xianglei Huang, Odele Coddington, Yang Chen

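AI summary: This paper proposes a low-rank matrix factorization method for reconstructing solar spectral irradiance that incorporates autoregressive temporal regularization, periodic spline detrending, and cross-spectral covariance information. The method improves imputation accuracy and yields calibrated uncertainty intervals, producing a data product suitable for climate science studies.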

详情
AI中文摘要

太阳光谱辐照度(SSI)描述了到达地球大气顶部的太阳能量通量的光谱分布。每日SSI测量构成一个在光谱(行)和时间(列)上解析的太阳能量通量测量矩阵。最近的SSI测量自2018年3月起由NASA的总和光谱太阳辐照度传感器-1(TSIS-1)光谱辐照度监测器(SIM)完成,但数据存在大量缺失,原因包括随机因素、仪器停机、与太阳周期性磁活动相关的周期性趋势以及光谱间不同程度的相关性,某些接近于1。本文提出了一种低秩矩阵分解方法用于SSI重建,结合自回归时间正则化、周期样条去趋势和交叉光谱协方差信息。该方法作为两阶段过程实现,分别针对散射缺失和延长停机缺失进行处理,并使用高效的交替优化算法进行拟合。我们进一步通过基于符合预测的分布自由区间估计程序附带重建的SSI值。通过合成实验和真实数据分析,我们比较了该方法与高斯过程回归、线性时间序列平滑和现有矩阵补全方法在填补精度、区间覆盖、区间长度和计算效率方面的表现。结果表明,利用SSI的周期性、时间性和交叉光谱结构显著提高了重建性能,并生成校准的不确定性区间,产生适合下游气候科学研究的重建SSI数据产品。

英文摘要

The solar spectral irradiance (SSI) depicts the spectral distribution of solar energy flux reaching the top of the Earth's atmosphere. Daily SSI measurements constitute a matrix with spectrally (rows) and temporally (columns) resolved solar energy flux measurements. The most recent SSI measurements have been made by NASA's Total and Spectral Solar Irradiance Sensor-1 (TSIS-1) Spectral Irradiance Monitor (SIM) since March 2018. The data contain considerable missingness due to both random factors and instrument downtime, exhibit a periodic trend related to the Sun's cyclical magnetic activity, and show varying degrees of correlation among the spectra, some approaching unity. We propose a low-rank matrix factorization method for SSI reconstruction that incorporates autoregressive temporal regularization, periodic spline detrending, and cross-spectral covariance information. The method is implemented as a two-stage procedure designed to address scattered missingness and extended downtime missingness, respectively, and is fitted using efficient alternating optimization algorithms. We further accompany the reconstructed SSI values with a distribution-free interval estimation procedure based on conformal prediction. Through synthetic experiments and real-data analyses, we compare this method with Gaussian process regression, linear time series smoothing, and existing matrix-completion approaches in terms of imputation accuracy, interval coverage, interval length, and computational efficiency. The results show that exploiting the periodic, temporal, and cross-spectral structure of SSI substantially improves reconstruction performance and yields calibrated uncertainty intervals, producing a reconstructed SSI data product suitable for downstream climate science studies.
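The abstract does not spell out its conformal procedure, so the following is only a sketch of the simplest variant, split conformal prediction with absolute residuals, to show how a distribution-free interval can be attached to any point reconstruction. It assumes a held-out calibration set of observed entries with known reconstruction errors. The function name `split_conformal_halfwidth` and the toy numbers are hypothetical.

```python
import numpy as np

def split_conformal_halfwidth(calib_residuals, alpha=0.1):
    """Half-width q such that [prediction - q, prediction + q] has
    marginal coverage of at least 1 - alpha under exchangeability."""
    r = np.sort(np.abs(np.asarray(calib_residuals, dtype=float)))
    n = r.size
    k = int(np.ceil((n + 1) * (1 - alpha)))     # conformal rank with finite-sample correction
    if k > n:
        return np.inf                           # too few calibration points for this alpha
    return r[k - 1]

# Toy calibration residuals: observed value minus reconstructed value.
residuals = np.array([0.3, -0.1, 0.05, -0.4, 0.2, 0.15, -0.25, 0.1, -0.05])
q = split_conformal_halfwidth(residuals, alpha=0.2)
prediction = 1.7                                # a reconstructed entry (made-up value)
print(prediction - q, prediction + q)
```

The guarantee relies on the calibration and test residuals being exchangeable, which is a strong assumption for time-indexed data; methods for temporally structured data typically need a modified scheme.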

2503.18831 2026-05-21 math.ST stat.TH

An improved central limit theorem for the empirical sliced Wasserstein distance

经验切片瓦瑟斯坦距离的改进中心极限定理

David Rodríguez-Vítores, Eustasio del Barrio, Jean-Michel Loubes

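AI summary: Building on the Efron-Stein inequality and a non-trivial control of the optimal transport potentials, this paper derives a central limit theorem for the p-sliced Wasserstein distance, providing the first asymptotically valid inference framework for the sliced Wasserstein distance between possibly non-compact measures.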

详情
Comments
26 pages, 1 figure
AI中文摘要

瓦瑟斯坦距离在现代数据分析中被广泛应用,但在高维情况下面临显著的计算和统计挑战。切片瓦瑟斯坦距离通过利用一维投影缓解了这些挑战。基于Efron-Stein不等式-一种在相关问题中已被证明有效的技术-以及对最优运输势在不同方向上的非平凡控制,我们建立了p-切片瓦瑟斯坦距离(p>1)的中心极限定理,以经验成本的期望为中心。与一般瓦瑟斯坦距离不同,中心化可以被总体成本替代,从而实现有效的统计推断。这扩展和细化了现有的一维结果,为可能非紧致测度之间的切片瓦瑟斯坦距离提供了首个渐近有效的推断框架。最后,我们处理了推断中其他关键的实用方面,包括切片积分的蒙特卡洛近似和一致方差估计。

英文摘要

Wasserstein distances are widely used in modern data analysis but pose significant computational and statistical challenges in high dimensions. The sliced Wasserstein distance alleviates these challenges by leveraging one-dimensional projections. Building on the Efron-Stein inequality (a technique proven effective in related problems) and a non-trivial control of the optimal transport potentials across directions, we establish a central limit theorem for the p-sliced Wasserstein distance, for p>1, centered at the expected empirical cost. Unlike for the general Wasserstein distance, the centering can be replaced by the population cost, enabling valid statistical inference. This generalizes and refines existing one-dimensional results, providing the first asymptotically valid inference framework for the sliced Wasserstein distance between possibly non-compact measures. Finally, we address other practical aspects crucial for inference, including Monte Carlo approximation of the slicing integral and consistent variance estimation.
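The Monte Carlo approximation of the slicing integral mentioned above can be sketched as follows for two empirical samples of equal size. This is a generic estimator, not the paper's inference procedure: it averages one-dimensional p-Wasserstein costs over random directions and omits the variance estimation step. The function name `sliced_wasserstein` and the default number of projections are assumptions.

```python
import numpy as np

def sliced_wasserstein(x, y, p=2, n_projections=200, rng=None):
    """Monte Carlo estimate of the p-sliced Wasserstein distance between
    two samples x and y of shape (n, d) with the same number of rows."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[1]
    directions = rng.normal(size=(n_projections, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # uniform on the sphere
    x_proj = np.sort(x @ directions.T, axis=0)  # sorted 1D projections, one column per direction
    y_proj = np.sort(y @ directions.T, axis=0)
    # In 1D, optimal transport between equal-size samples matches order statistics.
    return np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))
y = x + np.array([3.0, 0.0, 0.0])               # same cloud shifted along the first axis
print(sliced_wasserstein(x, y, p=2, rng=rng))
```

Each projection costs only a sort, which is what makes the sliced distance cheap in high dimensions; the price is extra Monte Carlo error from the finite number of directions.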

2503.00565 2026-05-21 stat.ML cs.LG math.ST stat.ME stat.TH

Batched Single-Index Global Multi-Armed Bandits with Covariates

批量单索引全局多臂老虎机与协变量

Sakshi Arya, Hyebin Song

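AI summary: This paper proposes a new semi-parametric framework for batched bandits with covariates that uses a shared parameter across arms and a single-index regression model to capture relationships between arm rewards. It introduces the BIDS algorithm and derives regret bounds in two settings: one where a pilot direction is available and one where the direction is estimated from data. When a sufficiently accurate pilot direction is available and the number of arms is fixed, the method attains the minimax-optimal rate of nonparametric batched bandits with covariate dimension d = 1.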

详情
AI中文摘要

多臂老虎机(MAB)框架是一种广泛用于顺序决策制定的方法,其中决策者在每一轮中选择一个臂,以最大化长期奖励。在许多实际应用中,如个性化医学和推荐系统,决策时可用上下文信息,不同臂的奖励相关而非独立,且反馈以批量形式提供。我们提出了一种新的半参数框架,用于带有协变量的批量老虎机,该框架在臂之间共享参数。我们利用单索引回归(SIR)模型来捕捉臂奖励之间的关系,同时在可解释性和灵活性之间取得平衡。我们的算法,批量单索引动态分箱和 successive arm elimination(BIDS),采用批量 successive arm elimination 策略,并通过单索引方向引导的动态分箱机制。我们考虑了两种设置:一种是可用 pilot 方向,另一种是方向从数据估计,推导了两种情况的理论遗憾界。当 pilot 方向足够准确且臂的数量 K 固定时,我们的方法在非参数批量老虎机中实现了最小化最优率(d=1),规避了维度灾难。在模拟和现实数据集上的大量实验展示了我们的算法相比由 \cite{jiang2025batched} 引入的非参数批量老虎机方法的有效性。

英文摘要

The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, contextual information is available at the time of decision-making, rewards from different arms are related rather than independent, and feedback is provided in batches. We propose a novel semi-parametric framework for batched bandits with covariates that incorporates a shared parameter across arms. We leverage the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings: one where a pilot direction is available and another where the direction is estimated from data, deriving theoretical regret bounds for both cases. When a pilot direction is available with sufficient accuracy and the number of arms $K$ is fixed, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by \cite{jiang2025batched}.

2502.17773 2026-05-21 stat.ME cs.AI cs.LG

How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

大型语言模型值得模拟多少人意见?从不确定性量化角度出发

Chengpiao Huang, Yuhang Wu, Kaizheng Wang

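AI summary: Taking an uncertainty quantification perspective, this paper proposes a framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by human-LLM misalignment. The key design choice is the number of simulated responses: too many yield overly narrow sets with poor coverage, while too few yield overly wide, uninformative sets. The paper proposes a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size also reflects the effective human population size that the LLM can represent, giving a quantitative measure of its simulation fidelity. Experiments reveal heterogeneous simulation fidelity across LLMs and domains.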

详情
Comments
63 pages, 13 figures
AI中文摘要

大型语言模型(LLMs)越来越多地用于模拟调查响应,但合成数据可能与人类人口不一致,导致不可靠的推断。我们开发了一个通用框架,将LLM模拟的响应转换为人类响应总体参数的可靠置信集,量化由人类-LLM不一致引起的不确定性。关键设计选择是模拟响应的数量:过多会产生过于狭窄的置信集,覆盖性差;过少则会产生过于宽泛且信息不足的置信集,受随机噪声主导。我们提出了一种数据驱动的方法,自适应地选择模拟样本量以实现名义平均覆盖性,无论LLM的模拟保真度或置信集构建过程如何。所选样本量进一步被证明反映了LLM能代表的有效人类人口规模,提供其模拟保真度的定量度量。在真实调查数据集上的实验揭示了不同LLM和领域之间的异质性模拟保真度。

英文摘要

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous simulation fidelity across different LLMs and domains.

2502.17518 2026-05-21 cs.LG cs.AI q-fin.CP stat.ML

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

通过分类器模型进行集成强化学习:在交易策略中增强风险回报权衡

Zheli Xiong

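AI summary: This paper presents a comprehensive study of ensemble reinforcement learning (RL) models in financial trading strategies, using classifier models to enhance performance. It combines RL algorithms such as A2C, PPO, and SAC with traditional classifiers such as Support Vector Machines (SVM), Decision Trees, and Logistic Regression, and examines how different classifier groups can be integrated to improve risk-return trade-offs. The ensemble methods are compared with individual RL models on Cumulative Returns, Sharpe Ratio (SR), Calmar Ratio, and Maximum Drawdown (MDD). The ensembles consistently outperform the base models in risk-adjusted returns, with better drawdown management and overall stability. Their performance is sensitive to the choice of variance threshold τ, however, which highlights the importance of adjusting τ dynamically. The study has implications for financial trading, robotics, and other dynamic environments.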

详情
Comments
16 pages, 5 figures, 1 table
AI中文摘要

本文提出了一项全面研究,探讨在金融交易策略中使用集成强化学习(RL)模型的应用,利用分类器模型来提升性能。通过结合A2C、PPO和SAC等强化学习算法与传统分类器如支持向量机(SVM)、决策树和逻辑回归,我们研究了不同分类器组如何整合以改善风险回报权衡。研究评估了各种集成方法的有效性,将其与单个RL模型在关键金融指标(包括累计回报率、夏普比率(SR)、卡勒姆比率和最大回撤(MDD))上进行比较。我们的结果表明,集成方法在风险调整后的回报方面始终优于基础模型,提供了更好的回撤管理和整体稳定性。然而,我们发现集成性能对方差阈值τ的选择敏感,强调了动态调整τ以达到最佳性能的重要性。本研究强调了将强化学习与分类器结合在自适应决策中的价值,对金融交易、机器人和其他动态环境具有启示。

英文摘要

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our results demonstrate that ensemble methods consistently outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, we identify the sensitivity of ensemble performance to the choice of variance threshold τ, highlighting the importance of dynamic τ adjustment to achieve optimal performance. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
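The evaluation metrics named above have common textbook definitions, sketched below for a series of per-period simple returns. The abstract does not state which conventions the paper uses, so the annualization factor of 252 periods, the zero default risk-free rate, and the function names are all assumptions.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a fraction of the peak."""
    r = np.asarray(returns, dtype=float)
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))  # start from unit wealth
    peak = np.maximum.accumulate(wealth)                   # running maximum
    return np.max(1.0 - wealth / peak)

def calmar_ratio(returns, periods_per_year=252):
    """Annualized compound return divided by maximum drawdown.
    Undefined (division by zero) when the series has no drawdown."""
    r = np.asarray(returns, dtype=float)
    annual = np.prod(1.0 + r) ** (periods_per_year / r.size) - 1.0
    return annual / max_drawdown(r)

rng = np.random.default_rng(0)
daily = rng.normal(loc=0.0005, scale=0.01, size=252)       # synthetic daily returns
print(sharpe_ratio(daily), max_drawdown(daily), calmar_ratio(daily))
```

Sharpe penalizes all volatility, whereas Calmar and MDD respond only to losses from a peak, so a strategy can rank differently under the two families of metrics.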

2401.03834 2026-05-21 stat.ME

On the error control of invariant causal prediction

关于不变因果预测的误差控制

Jinzhou Li, Jelle J Goeman

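AI summary: This paper studies how to equip invariant causal prediction with more liberal error guarantees, focusing on false discovery rate control and simultaneous true discovery bounds, so that more causal information can be extracted from the data.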

详情
AI中文摘要

不变因果预测提供了一个有用的框架,用于使用来自多个环境的异质数据识别响应的因果预测器。原始不变因果预测方法的一个有价值特性是它以高概率保证没有虚假的因果发现。然而,这种保证在某些应用中可能过于保守,导致很少或没有因果发现。这引发了一个自然的问题:能否为不变因果预测配备更不保守的误差保证,从而从数据中提取更多的因果信息?在本文中,我们通过聚焦于两种广泛使用的更宽松保证:虚假发现率控制和同时真实发现界来回答这个问题。我们方法的关键步骤是将不变因果预测重新表述为多重检验问题。然后我们采用e-Closure原理来获得(同时)虚假发现率控制,同时采用针对此设置的新p-to-e校准器。我们还通过封闭检验推导出同时真实发现界,这些界提供了额外的因果信息,而无需额外假设,并保留了原始不变因果预测方法的所有发现。通过模拟和对美国青少年教育成就的现实数据应用,我们展示了这些更宽松的误差控制保证可以提高不变因果预测的实用性。

英文摘要

Invariant causal prediction provides a useful framework for identifying causal predictors of a response using heterogeneous data from multiple environments. One valuable property of the original invariant causal prediction method is that it guarantees no false causal discoveries with high probability. Such a guarantee, however, can be overly conservative in some applications, resulting in few or no causal discoveries. This raises a natural question: can invariant causal prediction be equipped with less conservative error guarantees and thereby extract more causal information from the data? In this paper, we address this question by focusing on two widely used and more liberal guarantees: false discovery rate control and simultaneous true discovery bounds. A key step in our approach is to reformulate invariant causal prediction as a multiple testing problem. We then adopt the e-Closure principle to obtain (simultaneous) false discovery rate control, together with new p-to-e calibrators tailored to this setting. We also derive simultaneous true discovery bounds via closed testing, which provide additional causal information without requiring extra assumptions and retain all discoveries from the original invariant causal prediction method. Through simulations and a real data application on educational attainment of teenagers in the United States, we show that these more liberal error control guarantees can improve the practical usefulness of invariant causal prediction.

2605.21253 2026-05-21 stat.ML cs.LG

Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

组合式基于模拟的推断中退火朗之万动力学的理论指南

Camille Touron, Gabriel V. Cardoso, Julyan Arbel, Pedro L. C. Rodrigues

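AI summary: This paper studies annealed Langevin dynamics for compositional score-based methods in simulation-based inference. It derives Wasserstein bounds that give theoretical guidance for hyperparameter selection and, in the Gaussian setting, proves that the two composite score formulations differ in the step sizes they admit and in the total number of Langevin steps they require.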

详情
AI中文摘要

基于模拟的推断(SBI)中的组成式得分方法通过聚合单独学习的后验得分,来近似给定n个独立观测时共享参数的后验分布。目前主要有两种方法(Geffner等人,2023;Linhart等人,2026)。由于所得到的复合得分不对应于真实多观测后验的正向扩散路径上任何分布的得分,通过反向SDE采样会导致不可消除的偏差。退火朗之万动力学提供了一种有原则的替代方法:它将复合得分视为一系列可处理的桥接密度的真实得分,并依次从这些密度中采样。当调节得当时,偏差可以得到控制。然而,其超参数,即步长、每个级别的步数和退火级别数,迄今为止都是凭经验选择的。我们推导了近似得分下退火朗之万动力学的Wasserstein界,并将其转化为这些超参数的显式决策规则,以保证规定的采样精度,同时突显每种复合得分形式的不同理论特点。在高斯情况下,我们获得了所有相关量的闭式表达式,并证明与Geffner等人(2023)的桥接密度相比,Linhart等人(2026)的桥接密度始终允许更大的步长,且所需的总朗之万步数更少。此外,我们还通过实验表明,在高斯情况下得到的调节方案可以推广到更复杂的问题,从而为使用组成式得分方法的实践者提供了一个清晰且有理论依据的起点。

英文摘要

Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
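The three hyperparameters named in the abstract (step sizes, steps per level, number of annealing levels) can be seen in a toy sampler. The following is a minimal sketch, assuming one-dimensional Gaussian bridging densities with exactly known scores; the level schedule, the step-size rule `0.1 * variance`, and all function names are illustrative assumptions, not the paper's decision rules or either composite score formulation.

```python
import math
import random


def gaussian_score(mean, var):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mean, var)."""
    return lambda x: -(x - mean) / var


def annealed_langevin(scores, step_sizes, steps_per_level, x0, rng):
    """Run unadjusted Langevin steps on each level in turn,
    warm-starting every level from the last state of the previous one."""
    x = x0
    for score, h in zip(scores, step_sizes):
        for _ in range(steps_per_level):
            x = x + h * score(x) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
# Hypothetical bridging densities, from a wide one to the target N(1, 1).
levels = [(0.0, 4.0), (0.5, 2.0), (1.0, 1.0)]
scores = [gaussian_score(m, v) for m, v in levels]
step_sizes = [0.1 * v for _, v in levels]  # assumed rule: step proportional to variance

samples = [
    annealed_langevin(scores, step_sizes, 50, rng.gauss(0.0, 2.0), rng)
    for _ in range(2000)
]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)
```

For a Gaussian of variance v and step size h, the unadjusted chain has stationary variance v / (1 - h / (2v)), so the last level here settles near variance 1.05 rather than 1. This discretization bias is the kind of quantity that a step-size rule has to trade off against the number of steps.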

2605.21217 2026-05-21 stat.ML cs.LG

Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

通过协作对齐的联邦LoRA微调大型语言模型

Shuaida He, Liwen Chen, Long Feng

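AI summary: This paper studies parameter-efficient fine-tuning with LoRA in a federated learning setting and proposes CLAIR, a framework that uses a structured low-rank plus block-sparse decomposition to recover the shared LoRA subspace and detect contaminated clients. It proves exact recovery in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions.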

详情
AI中文摘要

低秩适应(LoRA)已成为参数高效微调大型语言模型(LLMs)的强大工具。本文研究了联邦学习设置下的LoRA,使客户端能够在保持参数效率的同时进行协作微调。我们专注于一个高度异质的环境,其中客户端仅共享部分结构,且相当大的一部分客户端可能被污染。我们提出了Collaborative Low-rank Alignment and Identifiable Recovery(CLAIR),这是一个污染感知的框架,仅依赖于初步的本地估计器。其公式适用范围广泛,从线性回归到神经网络和LLM模块,只要本地适应可以表示为矩阵值更新即可。CLAIR通过结构化的低秩加块稀疏分解恢复共享LoRA子空间并检测被污染的客户端。我们证明了在无噪声情况下能够精确恢复共享LoRA子空间,在存在初步估计误差时实现稳定恢复,并在温和的分离条件下实现一致的协作集恢复。我们进一步量化了CLAIR精化带来的增益:它通过跨客户端平均减少子空间外的估计误差,同时在共享LoRA子空间内保留客户端特定的变异,因此只要这一oracle增益超过子空间估计和良性客户端异质性的成本,它就优于本地微调。在实验上,我们通过在文本复制任务上微调Transformer架构来展示CLAIR的优势。结果表明,与本地微调和非鲁棒联邦平均相比,CLAIR能够准确检测污染并改善良性客户端的性能。

英文摘要

Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, and thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.

2605.21197 2026-05-21 stat.ME

Laplace Approximations for Mixed-Effects and Gaussian Process Quantile Regression

混合效应和高斯过程分位数回归的拉普拉斯近似

Andrea Nava, Fabio Sigrist

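AI summary: This paper proposes Laplace approximations for mixed-effects and Gaussian process quantile regression. It overcomes the obstacle posed by the asymmetric Laplace likelihood by using the Fisher information and the curvature of the expected loss in place of the observed Hessian, which yields efficient and accurate inference.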

详情
AI中文摘要

拉普拉斯近似是潜在高斯模型中进行计算高效推断的标准工具,但在采用非对称拉普拉斯似然的分位数回归中会失效,因为观测海森矩阵几乎处处为零。本文证明,无需平滑似然函数即可克服这一障碍:相关的局部曲率并非由观测海森矩阵给出,而是在模型正确设定时由Fisher信息给出,在模型设定错误时由期望损失的总体曲率给出。基于此,本文开发了适用于混合效应和高斯过程分位数回归的拉普拉斯近似框架。我们提出了实用的曲率估计器,包括三角核曲率(TKC)估计器,用于近似后验分布和边缘似然,并建立了其渐近有效性。实证结果表明,所提方法具有良好的可扩展性和数值稳定性,并且在潜在高斯模型中,其精度与MCMC和变分方法相当或更优,而计算成本显著更低。更广泛地说,该框架阐明了如何通过期望损失的局部二次行为来论证非光滑广义后验的拉普拉斯近似的合理性。

英文摘要

Laplace approximations are a standard tool for computationally efficient inference in latent Gaussian models, but they fail for quantile regression with the asymmetric Laplace likelihood because the observed Hessian vanishes almost everywhere. We show that this obstacle can be overcome without smoothing the likelihood: the relevant local curvature is given not by the observed Hessian, but by the Fisher information when the model is correctly specified and by the population curvature of the expected loss under misspecification. On this basis, we develop a Laplace approximation framework for quantile regression with mixed-effects and Gaussian process models. We propose practical curvature estimators, including the triangular kernel curvature (TKC) estimator, that yield approximations for posterior distributions and marginal likelihoods, and we establish their asymptotic validity. Empirically, the proposed methods are scalable and numerically stable, and for latent Gaussian models, they achieve accuracy comparable to or better than MCMC and variational competitors at substantially lower computational costs. More broadly, the framework clarifies how Laplace approximations can be justified for non-smooth generalized posteriors through local quadratic behavior of the expected loss.
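The claim that the observed Hessian vanishes almost everywhere can be checked numerically on the check (pinball) loss, which is the negative log-density of the asymmetric Laplace distribution up to scale and an additive constant. The sketch below is illustrative only; the function names and the finite-difference step are assumptions, and it does not implement the paper's curvature estimators.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def second_difference(f, x, h=1e-3):
    """Central finite-difference estimate of the second derivative f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


tau = 0.9
loss = lambda r: check_loss(r, tau)

curv_right = second_difference(loss, 0.5)   # linear piece with slope tau
curv_left = second_difference(loss, -0.5)   # linear piece with slope tau - 1
curv_kink = second_difference(loss, 0.0)    # kink at zero: estimate scales like 1 / h
```

Away from zero the loss is piecewise linear, so the pointwise curvature is zero, and at the kink the finite-difference estimate diverges as `h` shrinks. Neither value gives a usable Gaussian approximation, which is why a curvature defined at the level of the expected loss is needed.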

2605.21167 2026-05-21 stat.ML cs.LG

A Rigorous, Tractable Measure of Model Complexity

一个严格且可计算的模型复杂度度量

Oskar Allerbo, Thomas B. Schön

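AI summary: This paper proposes a rigorous and easy-to-compute measure of model complexity based on the similarity of model gradients across inputs. The measure applies to parametric models and to kernel-based non-parametric models, generalizes model-specific measures such as polynomial degree and kernel length scale, and gives new insights into double descent for random Fourier features, random forests, neural networks, and gradient boosting.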

详情
AI中文摘要

对模型复杂度的准确评估对于解释、泛化和模型选择等主题至关重要。然而,大多数现有复杂度度量要么依赖于启发式假设,要么计算上不可行。在本文中,我们提出了一种数学上严谨且易于计算的模型复杂度度量方法,该方法基于模型在不同输入上的梯度相似性。因此,它适用于任何参数模型,也适用于基于核的非参数模型。我们证明了我们的复杂度度量可以推广到模型特定的复杂度度量,如多项式度数(多项式回归)、核长度尺度(Matérn核)、邻居数(k-近邻)、分割数(决策树)和树数(随机森林)。我们还利用我们的度量方法获得了关于随机傅里叶特征、随机森林、神经网络和梯度提升中的双下降现象的新见解。

英文摘要

An accurate assessment of a model's complexity is crucial for topics such as interpretation, generalization, and model selection. However, most existing complexity measures either rely on heuristic assumptions or are computationally prohibitive. In this paper, we present a mathematically rigorous yet easy-to-compute measure of model complexity that is based on the similarities between the model gradients across inputs. It is thus well-defined for any parametric model, but also for kernel-based non-parametric models. We prove that our measure of complexity generalizes model-specific complexity measures such as polynomial degree (for polynomial regression), kernel length scale (for Matérn kernels), number of neighbors (for k-nearest neighbors), number of splits (for decision trees), and number of trees (for random forests). We also use our measure to obtain new insights into the double descent phenomenon for random Fourier features, random forests, neural networks, and gradient boosting.

2605.21107 2026-05-21 cs.LG stat.ML

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

通过自收缩性获得约束在线凸优化的改进保证

Dhruv Sarkar, Abhishek Sinha

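AI summary: This paper proposes a projection-based algorithm that simultaneously achieves O(log T) regret and O(log T) CCV for strongly convex losses; for convex losses it improves the CCV to O(√T) while keeping the optimal O(√T) regret.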

详情
AI中文摘要

我们考虑具有对抗性选择约束的约束在线凸优化(COCO)。在每一轮中,学习者在观察到该轮的损失函数和约束函数之前选择动作。目标是相对于满足所有约束的最佳点实现较小的静态遗憾,同时控制累积约束违反(CCV)。对于强凸损失,最先进的算法实现 O(log T) 的遗憾和 O(√(T log T)) 的 CCV。对于凸损失,相应的最佳已知界是 O(√T) 的遗憾和 O(√T log T) 的 CCV。在本文中,我们提出了一种简单的基于投影的算法,对于强凸损失同时实现 O(log T) 的遗憾和 O(log T) 的 CCV,从而在 CCV 方面实现了指数级改进。对于凸损失,我们的算法将 CCV 改进到 O(√T),同时保持最优的 O(√T) 遗憾。我们改进的关键是最近关于自收缩曲线的一个几何结果,该结果可能具有独立的研究价值。

英文摘要

We consider Constrained Online Convex Optimization (COCO) with adversarially chosen constraints. At each round, the learner chooses an action before observing the loss and constraint function for that round. The goal is to achieve small static regret against the best point satisfying all constraints while also controlling cumulative constraint violation ($\mathsf{CCV}$). For strongly convex losses, state-of-the-art algorithms achieve $O(\log T)$ regret and $O(\sqrt{T \log T})$ $\mathsf{CCV}$. The corresponding best-known bounds for convex losses are $O(\sqrt{T})$ regret and $O(\sqrt{T} \log T)$ $\mathsf{CCV}$. In this paper, we give a simple projection-based algorithm that simultaneously achieves $O(\log T)$ regret and $O(\log T)$ $\mathsf{CCV}$ for strongly convex losses, yielding an exponential improvement in the $\mathsf{CCV}$. For convex losses, our algorithm improves the $\mathsf{CCV}$ to $O(\sqrt{T})$ while maintaining the optimal $O(\sqrt{T})$ regret. The key to our improvement is a recent geometric result for self-contracted curves, which may be of independent interest.

2605.21060 2026-05-21 cs.LG cs.AI stat.ML

Divide et Calibra: Multiclass Local Calibration via Vector Quantization

Divide et Calibra: 通过向量量化实现多类局部校准

Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana

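AI summary: This paper proposes a compositional approach in which vector quantization induces a structured partition of the representation space and an indexed parameterization of Dirichlet concentrations enables parameter sharing across regions. The learned heterogeneous calibration maps generalize to sparse regions and improve local calibration while maintaining global calibration and predictive performance.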

详情
AI中文摘要

在高风险场景中,准确且校准良好的机器学习(ML)模型是必需的,但有效的多类校准仍然具有挑战性:全局方法假设校准误差在潜在空间中是同质的,而局部方法通常依赖于潜在空间降维,导致信息丢失。为了解决这些问题,我们提出了一种多类校准的复合方法,其中区域特定的校准映射是从共享的码字依赖因素中构建的。我们通过向量量化(VQ)实现这一想法,它诱导了表示空间的结构划分,并利用Dirichlet浓度的参数化实现跨区域参数共享。我们的方法学习了能泛化到稀疏区域的异质校准映射。在基准数据集上的实验显示,在保持竞争性的全局校准和预测性能的同时,显著提高了局部校准性能。

英文摘要

Accurate and well-calibrated Machine Learning (ML) models are mandatory in high-stakes settings, yet effective multiclass calibration remains challenging: global approaches assume calibration errors are homogeneous across the latent space, while local methods often rely on latent-space dimensionality reduction, which leads to information loss. To address these issues, we propose a compositional approach to multiclass calibration, where region-specific calibration maps are constructed from shared codeword-dependent factors. We instantiate this idea via Vector Quantization (VQ), which induces a structured partition of the representation space, and an indexed parameterization of Dirichlet concentrations that enables parameter sharing across regions. Our approach learns heterogeneous calibration maps that generalize well even to sparse regions of the latent space. Experiments on benchmark datasets show significant improvements in local calibration while maintaining competitive global calibration and predictive performance.

2605.21043 2026-05-21 stat.OT

An Introduction to Copulas: a Complement

Copula导论:一个补充

Werner G. Müller

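AI summary: These notes complement the statistical inference text of Casella and Berger (2002) with two sections on copulas, written in a style as close as possible to that of the original book.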

详情
AI中文摘要

多年来,我一直使用Casella和Berger(2002)的教材为硕士生讲授高级统计推断课程。这本书对核心主题进行了全面的阐述,在避免测度论的同时保持了数学上的精确性,但并未涵盖日益重要的copula概念。本笔记旨在对该书进行补充,以尽可能接近原书的风格增加两节关于copula的内容。定义、定理、例子和练习的编号与Casella和Berger(2002)保持一致,但这些材料也可以作为copula理论的简短独立入门来阅读。

英文摘要

For many years I have taught an advanced statistical inference course for master's students using the text of Casella and Berger (2002). The book gives a comprehensive treatment of the core topics at a level that avoids measure theory while remaining mathematically precise, but it does not cover the increasingly important concept of copulas. The present notes are intended to complement the book by adding two sections on copulas in a style that is as close as possible to that of the original text. Numbering of definitions, theorems, examples, and exercises is consistent with Casella and Berger (2002), but the material may also be read as a brief, stand-alone introduction to copula theory.

2605.21041 2026-05-21 stat.ML cs.LG stat.ME

Conditioning Gaussian Processes on Almost Anything

对几乎任何事物进行高斯过程的条件化

Henry Moss, Lachlan Astfalck, Thomas Cowperthwaite, Colin Doumont, Sam Willis, Philipp Hennig, Christopher Nemeth, Andrew Zammit-Mangion

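AI summary: This paper establishes an equivalence between Gaussian processes and a class of linear diffusion models, giving a general-purpose way to condition GPs on any statement that admits point-wise likelihood evaluation, including non-linear physics and natural language.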

详情
AI中文摘要

高斯过程(GPs)提供了一种基于函数的原理性概率模型,但精确推断仅限于线性-高斯范式。我们建立了GPs与一类线性扩散模型之间的显式等价关系,将预测采样重新表述为一个具有闭式高斯动力学和一个依赖似然的引导项的ODE,该引导项允许简单的蒙特卡洛近似。在线性-高斯设置中,我们精确恢复了标准GP条件化;超越共轭性之外,相同的机制能够处理任何允许逐点似然评估的条件语句——包括非线性物理模型,以及首次通过大型语言模型实现自然语言。白化分离了不可约的非高斯动力学,最小化了Wasserstein-2运输成本并消除了数值刚性。结果是一种通用的GP推断方案,无需专门推导。这些结果提供了一种通用机制,将现实世界知识的全部丰富性作为条件信息纳入其中,为现实世界问题的概率建模开辟了新的前沿。

英文摘要

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.

2605.20999 2026-05-21 math.PR cs.LG math.OC stat.ML

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

重尾马尔可夫噪声下一般随机逼近的集中性

Shubhada Agrawal, Siva Theja Maguluri, Martin Zubeldia

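AI summary: This paper establishes maximal concentration bounds for the iterates of stochastic approximation algorithms whose noise has a finite-state Markovian component plus a martingale-difference component. Using a novel Lyapunov function and an auxiliary projected algorithm, it characterizes how the step size sequence and the properties of the random operator shape the error tail, and it extends the results to unbounded martingale-difference noise.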

详情
Comments
67 pages
AI中文摘要

我们建立了由具有通用步长的随机逼近算法生成的迭代项的最大集中性界,其中噪声包含有限状态马尔可夫分量和马丁格尔差分分量。当马丁格尔差分噪声有界时,我们证明误差尾部可以是亚高斯、亚魏伯或比任何帕累托分布更轻但比任何魏伯分布更重,这取决于步长序列和随机算子是否几乎必然收缩、几乎必然非扩张或以正概率扩张。我们的分析依赖于一个涉及解泊松方程的矩生成函数的新型Lyapunov函数,以及一个辅助投影算法。我们通过最坏情况例子补充上界,表明更精确的上界不可能实现。我们进一步研究了当平均算子是收缩的且步长为$1/k$时无界马丁格尔差分噪声的情况,在此设置下,如果随机算子几乎必然非扩张,则误差尾部至多是噪声尾部的三倍重;如果随机算子以正概率扩张,则误差尾部可能显著更重。这些结果通过一种新的黑盒截断论证获得,将无界噪声情况转化为有界噪声情况。

英文摘要

We establish maximal concentration bounds for the iterates generated by stochastic approximation algorithms with general step sizes, where the noise has a finite-state Markovian component plus a Martingale-difference component. When the Martingale-difference noise is bounded, we show that the tail of the error can be sub-Gaussian, sub-Weibull, or something lighter than any Pareto but heavier than any Weibull, depending on the step size sequence and on whether the random operator is almost surely contractive, almost surely non-expansive, or expansive with positive probability. Our analysis relies on a novel Lyapunov function involving the moment-generating function of the solution to a Poisson equation, together with an auxiliary projected algorithm. We complement the upper bounds with worst-case examples showing that qualitatively sharper bounds are impossible. We further study the case of unbounded Martingale-difference noise when the average operator is contractive, and the step sizes are of order $1/k$. In this setting, we show that if the random operator is almost surely non-expansive, then the error tail is at most three times heavier than the noise tail, whereas if the random operator is expansive with positive probability, then the error may have substantially heavier tails. These results are obtained through a novel black-box truncation argument that reduces the unbounded-noise setting to the bounded-noise case.

2605.20987 2026-05-21 stat.CO math.PR

Particle filtering methods for partially observed branching processes

部分观测分支过程的粒子滤波方法

Miguel González, Inés M. del Puerto, Manuel Serrano-Pastor

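AI summary: This paper studies estimation for partially observed branching processes. It presents computational tools based on sequential Monte Carlo methods for Bayesian inference and applies the Liu-West particle filter to estimate the parameters of an epidemic model.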

详情
AI中文摘要

本文聚焦于部分观测分支过程的估计问题。首先回顾了文献中从频率学派视角提出的估计器。本文的主要目的是介绍基于序列蒙特卡洛方法的计算工具,用于对这些过程进行贝叶斯推断。特别是,应用Liu-West粒子滤波器,对由部分观测分支过程拟合的流行病模型的相关参数进行贝叶斯估计。作为应用,重新审视并扩展了[8]中的例子。

英文摘要

This paper focuses on the estimation of partially observed branching processes. First, the estimators from a frequentist perspective proposed in the literature are reviewed. The main objective of this paper is to present computational tools based on sequential Monte Carlo methods to perform Bayesian inference for these processes. In particular, the Liu-West particle filter is applied to perform Bayesian estimation of the parameters of interest for an epidemic model fitted by a partially observed branching process. As an application, the example given in [8] is revisited and extended.
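To make the sequential Monte Carlo idea concrete, here is a minimal sketch of a plain bootstrap particle filter for a toy partially observed branching process. It is not the Liu-West filter from the abstract, which additionally carries parameter particles and jitters them with a shrinkage kernel. The model (Poisson offspring, binomial detection), the data in `observed_counts`, and all names and parameter values are hypothetical choices made for illustration, with the parameters treated as known.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    if lam <= 0.0:
        return 0
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ Binomial(n, p); zero when y is outside 0..n."""
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def bootstrap_filter(observations, offspring_mean, detect_prob, z0, n_particles, rng):
    """Return the filtered mean of the hidden population size at each time step."""
    particles = [z0] * n_particles
    means = []
    for y in observations:
        # Propagate: Z_t | Z_{t-1} ~ Poisson(offspring_mean * Z_{t-1}).
        particles = [sample_poisson(offspring_mean * z, rng) for z in particles]
        # Weight by the observation model Y_t | Z_t ~ Binomial(Z_t, detect_prob).
        weights = [binomial_pmf(y, z, detect_prob) for z in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # degenerate step: fall back to uniform weights
        total = sum(weights)
        means.append(sum(w * z for w, z in zip(weights, particles)) / total)
        # Multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means


rng = random.Random(1)
observed_counts = [6, 7, 6, 9, 8]  # hypothetical observed counts
filtered = bootstrap_filter(observed_counts, offspring_mean=1.1, detect_prob=0.5,
                            z0=10, n_particles=500, rng=rng)
```

Because only particles with at least as many individuals as were observed receive positive weight, each filtered mean sits at or above the corresponding observed count whenever the weights are not degenerate.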

2605.20943 2026-05-21 stat.ME

Missing data and cluster graphs: cluster-level missingness vs variable-level missingness

缺失数据与聚类图:聚类层面的缺失性与变量层面的缺失性

Willow Scott, Eugenio Valdano, Charles Assaad

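AI summary: This paper studies recoverability from missing data when only coarse structural information, in the form of cluster graphs, is available. It introduces two classes of cluster-based missingness graphs and examines their compatibility with underlying variable-level missingness models and the effect of the abstraction on recovering probabilistic and causal queries.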

详情
AI中文摘要

缺失数据在许多科学领域如公共卫生、环境科学和社会科学中普遍存在。通常使用完全指定的变量层面缺失性模型来研究缺失数据的恢复性,尽管在许多应用中,仅有的结构信息是粗略的,例如当变量被分组到聚类中时,由于知识或可解释性的限制。在本文中,我们研究了从这种抽象表示中恢复数据的可行性。我们引入了两种基于聚类的缺失性图:m-C-DMG,它保留了变量特定的缺失性指示器,以及cm-C-DMG,它在聚类层面聚合了缺失性机制。我们正式定义了这些抽象图与底层变量层面缺失性模型之间的兼容性概念,并研究了这种抽象如何影响概率和因果查询的恢复性。特别是,我们给出了恢复联合分布以及恢复宏观因果效应的图示条件。总体而言,我们的结果澄清了何时聚类层面的缺失性信息足以进行有效的推断,以及何时需要更精细的建模。

英文摘要

Missing data is pervasive in many scientific domains such as public health, environmental science, and the social sciences. Recoverability from missing data is typically studied using fully specified variable-level missingness models, even though in many applications only coarse structural information is available, for instance when variables are grouped into clusters because of limited knowledge or for interpretability. In this paper, we investigate recoverability from such abstract representations. We introduce two classes of cluster-based missingness graphs: the m-C-DMG, which retains variable-specific missingness indicators, and the cm-C-DMG, which aggregates missingness mechanisms at the cluster level. We formalize the notion of compatibility between these abstract graphs and underlying variable-level missingness models, and study how this abstraction affects the recoverability of probabilistic and causal queries. In particular, we give graphical conditions for recovering the joint distribution as well as graphical conditions for recovering a macro causal effect. Overall, our results clarify when cluster-level missingness information is sufficient for valid inference, and when finer-grained modeling is necessary.

2605.20866 2026-05-21 cs.LG cs.DC math.OC stat.ML

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

LOSCAR-SGD:局部SGD与通信-计算重叠及延迟校正的稀疏模型平均

Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

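AI summary: This paper studies Local SGD that combines communication compression, local training, and communication-computation overlap in a heterogeneous-compute setting. It proposes LOSCAR-SGD, which communicates only a sparse subset of model coordinates and keeps optimizing while communication is in flight, and gives what the authors describe as the first theory for this combination.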

详情
AI中文摘要

在分布式学习中,通信是主要的瓶颈,尤其是在大规模设置和联邦学习环境中链接缓慢时。减少此成本的三种标准方法是通信压缩、局部训练和通信-计算重叠。结合这些成分的方法在实践中被发现对大规模训练有效,但很少有理论支持同时结合这三种方法的方法。我们研究了一个异构计算环境,其中不同的工作者可能进行不同数量的局部步骤,并提出LOSCAR-SGD,一种局部SGD方法,仅通信模型坐标的稀疏子集,并在通信飞行期间继续优化。关键成分是延迟校正的合并规则,该规则在不丢弃重叠阶段所做进展的情况下整合延迟同步信息。我们为光滑非凸目标函数提供了收敛保证,并展示了稀疏性、重叠和工作者异质性如何影响收敛速度。据我们所知,这是首次针对这种成分组合的理论。实验进一步表明,通信-计算重叠减少了训练时间,并且延迟校正的合并优于朴素覆盖。

英文摘要

Communication is a major bottleneck in distributed learning, especially in large-scale settings and in federated learning environments with slow links. Three standard ways to reduce this cost are communication compression, local training, and communication-computation overlap. Methods that combine these ingredients are used in practice and have been found to be effective for large-scale training, but there is little theory for methods that combine all three. We study a heterogeneous-compute setting in which different workers may take different numbers of local steps, and we propose LOSCAR-SGD, a Local SGD method that communicates only a sparse subset of model coordinates and continues optimizing while communication is in flight. A key ingredient is a delay-corrected merge rule that incorporates delayed synchronized information without discarding the progress made during the overlap phase. We give convergence guarantees for smooth non-convex objectives and show how sparsity, overlap, and worker heterogeneity affect the rate. To the best of our knowledge, this is the first theory for this combination of ingredients. Experiments further show that communication-computation overlap reduces training time and that the delay-corrected merge outperforms naive overwriting.

2605.20817 2026-05-21 stat.ME

Topics in Nonparametric Bayesian Statistics

非参数贝叶斯统计学主题

Nils Lid Hjort

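AI summary: This chapter gives an overview of theoretical and applied research themes in nonparametric Bayesian statistics, partly complementing and extending earlier reviews; it aims to touch on research areas of interest rather than to be exhaustive.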

详情
Comments
23 pages, no figures. Published, in modified form, as Chapter 15 in the book `Highly Structured Stochastic Systems' (Oxford University Press, 2003, eds. P.J. Green, N.L. Hjort, S. Richardson)
AI中文摘要

贝叶斯统计学和非参数统计学的交集在大约1973年前几乎为空,但现在正以健康的速度增长。本章为《高度结构随机系统》一书(牛津大学出版社,2003年)提供概述,介绍了该领域内各种理论和应用研究主题,部分补充和扩展了Dey、Müller和Sinha(1998)以及Walker、Damien、Laud和Smith(1999)的最近综述。目的是不力求完整或详尽,而是通过例子探讨感兴趣的研究所涉及的领域。

英文摘要

The intersection set of Bayesian and nonparametric statistics was almost empty until about 1973, but now is growing at a healthy rate. This chapter, for the Highly Structured Stochastic Systems book (Oxford University Press, 2003), gives an overview of various theoretical and applied research themes inside this field, partly complementing and extending recent reviews of Dey, Müller and Sinha (1998) and Walker, Damien, Laud and Smith (1999). The intention is not to be complete or exhaustive, but rather to touch on research areas of interest, partly by example.

2605.20806 2026-05-21 stat.ME stat.AP

Evaluation of the number of clusters in a data set using $p$-values from Multiple Tests of Hypotheses

利用假设检验的p值评估数据集中的聚类数

Soumita Modak

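AI summary: This paper proposes a novel, nonparametric, interpoint distance-based measure for deciding whether a data set contains any groups and, if so, how many. The measure applies to data of arbitrary dimension and can be paired with any clustering algorithm that takes the number of clusters as input. It runs univariate, nonparametric multiple hypothesis tests on the interpoint distances, with as many dependent tests as the sample size, and combines the resulting p-values in a step-wise procedure over candidate numbers of clusters. The method avoids unnecessary computation compared with other accuracy measures in the literature, and a data study supports the efficiency and superiority of the proposed index.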

详情
Journal ref
Communications in Statistics - Theory and Methods (2024), 53, 8878-8889
AI中文摘要

本文提出了一种新的非参数、基于点间距离的度量方法,用于确定给定数据集中是否存在群体,以及如果存在,则总共有多少群体。它是一种适用于任意维度数据集的聚类准确性指数,可与任何具有指定聚类数作为先验的聚类算法相结合。我们执行单变量、非参数、多重假设检验,其中使用点间距离进行的依赖检验数量与样本量相同。它们具有p值用于组合以做出决策,该决策通过逐步过程对可能的聚类数进行判断。与文献中的其他准确性度量相比,该方法减少了不必要的计算。数据研究确立了所提出指标的效率和优越性。

英文摘要

This paper proposes a novel, nonparametric, interpoint distance-based measure to investigate whether any groups exist in a given data set and, if so, how many groups there are in total. It is a cluster accuracy index applicable to data sets of arbitrary dimension, in association with any clustering algorithm that takes the number of groups as an a priori input. We perform univariate, nonparametric, multiple statistical tests of hypotheses, where as many dependent tests as the sample size are carried out using the interpoint distances. The tests yield $p$-values that are combined to reach a decision, which is taken in a step-wise process over the possible numbers of clusters. The method reduces unnecessary computation compared with other accuracy measures from the literature. A data study establishes the proposed index's efficiency and superiority.

2605.20767 2026-05-21 cs.CL cs.LG stat.ME

The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

干预的幻觉:你的LLM模拟实验实际上是一个观察性研究

Victoria Lin, Taedong Yun, Maja Matarić, John Canny, Arthur Gretton, Alexander D'Amour

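AI summary: This paper examines large language models as simulators of human behavior and shows that interventions on LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift that distorts effect estimates. It proposes negative control outcomes to detect distribution shifts across intervention conditions and mitigates drift by adjusting the persona specification with additional confounders.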

详情
AI中文摘要

大型语言模型(LLMs)显示出作为人类行为模拟器的潜力,提供了一种可扩展的方式研究对干预的反应。然而,由于LLMs主要基于观察性数据进行训练,在与LLM模拟的合成用户进行实验时,干预可能会引起潜在用户属性的意外变化,导致用户漂移,其中隐含的模拟总体在不同处理条件下有所不同,这可能会扭曲效应估计。我们正式化了由于用户漂移可能产生的混淆或选择偏差,并展示了干预依赖性变化如何放大或减弱干预下用户响应的观测差异。为了诊断混淆,我们提出使用负对照结果——在干预下应保持不变的属性——来识别干预条件间的分布变化,提供用户漂移的证据。为了缓解漂移,我们研究了通过获取额外的混杂因素来调整角色描述,发现针对特定场景的相关混杂因素可以显著减少调查式和多轮代理评估中的偏倚。

英文摘要

Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise due to user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes--attributes that should remain invariant under intervention--to identify distribution shifts across intervention conditions, providing evidence of user drift. To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.

2605.20756 2026-05-21 cs.LG cs.AI math.OC stat.ML

Correcting Stochastic Update Bias in Preconditioned Language Model Optimizers

纠正预条件语言模型优化器中的随机更新偏差

Nikhil Nayak, Julia White, Urchade Zaratiana, Kelton Zhang, Henrijs Princis, Dhruv Atreja, Henry Fawcett, Matthew Thomas, George Hurn-Maloney, Ash Lewis

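AI summary: This paper studies finite-sample bias in the stochastic update rules of preconditioned optimizers. It proposes a single-batch bias-correction framework that uses cross-fitted preconditioning and variance-corrected inversion to reduce gradient-preconditioner coupling bias and inversion bias, improving optimizer performance.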

详情
Comments
32 pages, 3 figures, 13 tables
AI中文摘要

预条件优化器在语言模型训练中至关重要,但其随机更新规则通常被视为对总体预条件下降的直接近似。我们证明这种观点忽略了两种有限样本偏差。首先,梯度和预条件器通常由同一个小批量估计,从而引入梯度-预条件器耦合偏差。其次,即使预条件器的估计是无偏的,其逆或逆根通常也是有偏的,因为求逆运算是非线性的。我们提出了一种单批次偏差校正框架来处理这两种效应:交叉拟合预条件化从相互独立的微批次组中分别估计分子和预条件器,而方差校正求逆利用微批次间的变异来减去主导的delta方法偏差项。该框架适用于对角矩、对角曲率和矩阵预条件方法,并分别在AdamW、Sophia和Shampoo中实现。偏差校正使Qwen2.5-0.5B的留出预训练损失分别降低了0.15、0.07和0.11 nat;对混合质量预训练和下游指令微调的影响始终为中性到正面。总之,这些结果确立了偏差校正作为减少有限样本更新偏差、提升预条件优化器性能的实用机制。

英文摘要

Preconditioned optimizers are central to language model training, but their stochastic update rules are usually treated as direct approximations to population preconditioned descent. We show that this view misses two finite-sample biases. First, the gradient and preconditioner are typically estimated from the same minibatch, introducing gradient--preconditioner coupling bias. Second, even when the preconditioner estimate is unbiased, its inverse or inverse-root is generally biased because inversion is nonlinear. We propose a single-batch bias-correction framework that addresses both effects: cross-fitted preconditioning estimates the numerator and preconditioner from independent microbatch groups, while variance-corrected inversion uses microbatch variability to subtract the leading delta-method bias term. The framework applies to diagonal moment, diagonal curvature, and matrix preconditioning methods, instantiated in AdamW, Sophia, and Shampoo. Bias correction reduces held-out pretraining loss on Qwen2.5-0.5B by $0.15$, $0.07$, and $0.11$ nats, respectively; the effects on mixed-quality pretraining and downstream instruction tuning are consistently neutral-to-positive. Together, these results establish bias correction as a practical mechanism for reducing finite-sample update bias and improving the performance of preconditioned optimizers.
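The second bias source, nonlinearity of inversion, can be made concrete with a scalar delta-method expansion. The following is a textbook sketch under the assumption that $\hat v$ is an unbiased estimate of a positive scalar $v$ with small variance $\sigma^2$; the symbols $\hat v$, $v$, and $\sigma^2$ are introduced here for illustration, and the paper's actual correction for diagonal and matrix preconditioners may take a different form.

```latex
% Second-order Taylor expansion of f(\hat v) around v,
% using E[\hat v] = v and Var(\hat v) = \sigma^2:
\mathbb{E}\left[f(\hat v)\right] \approx f(v) + \tfrac{1}{2} f''(v)\,\sigma^2 .

% Inverse: f(x) = x^{-1}, so f''(x) = 2 x^{-3}:
\mathbb{E}\left[\hat v^{-1}\right] \approx v^{-1} + \frac{\sigma^2}{v^{3}} .

% Inverse root: f(x) = x^{-1/2}, so f''(x) = \tfrac{3}{4} x^{-5/2}:
\mathbb{E}\left[\hat v^{-1/2}\right] \approx v^{-1/2} + \frac{3\,\sigma^2}{8\, v^{5/2}} .
```

Both leading bias terms are positive, so the plug-in inverse and inverse root overestimate on average even when $\hat v$ itself is unbiased. Subtracting an estimate of the leading term, with $\sigma^2$ estimated from variability across microbatches, is the kind of correction the abstract describes.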

2605.20739 2026-05-21 math.ST eess.SP stat.TH

Revisiting the Misspecified Cramér-Rao Bound

重新审视设定错误的Cramér-Rao界

Malaak Khatib, Nadav Harel, Joseph Tabrikian, Tirza Routtenberg

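AI summary: This paper revisits the theory of parameter estimation under model misspecification and the foundations of the misspecified Cramér-Rao bound (MCRB). Using the concept of pointwise equivalent models, it gives a new derivation of the classical MCRB together with an explicit characterization of the estimator class for which the bound holds and an equality condition.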

详情
Comments
This work has been submitted to the IEEE for possible publication
AI中文摘要

在许多信号处理问题中都会出现模型设定错误下的估计问题,即由于误差或简化,假设的观测模型偏离了真实的数据生成机制。设定错误的Cramér-Rao界(MCRB)是这种情况下被广泛认可的均方误差(MSE)下界,最初用于描述设定错误最大似然(MML)估计量的渐近行为。尽管应用广泛,MCRB仍缺乏对其适用的估计器类别的严格刻画。本文重新审视模型设定错误下的参数估计理论,并重新考察MCRB的基础。我们首先展示了这些局限,并考察了仅依赖局部设定错误无偏性的朴素版MCRB。我们证明该界通常不紧,且可能无法达到。为了获得有意义的界,我们基于逐点等价模型的概念给出了新的推导。通过在这些模型上最大化朴素界,我们恢复了经典的MCRB,并为其提供了构造性的推导、相关估计器类别的显式刻画以及等式条件。这一表述在局部无偏性条件与可达到的界之间建立了正式联系,为MCRB的结构及其与实际估计器的相关性提供了新的见解。最后,我们定义了有效设定错误估计器的概念,并证明如果它存在,则由MML估计量实现。

英文摘要

Estimation under model misspecification arises in many signal processing problems, where the assumed observation model deviates from the true data-generating mechanism due to errors or simplifications. The misspecified Cramér-Rao bound (MCRB) is a widely recognized mean-squared-error (MSE) lower bound for this case, which was originally used to describe the asymptotic behavior of the misspecified maximum likelihood (MML) estimator. Despite its widespread use, the MCRB lacks a rigorous characterization of the class of estimators for which it is valid. In this paper, we revisit the theory of parameter estimation under model misspecification and re-examine the foundations of the MCRB. We first demonstrate these limitations and examine a naive version of the MCRB, which relies only on local misspecified unbiasedness. We show that this bound is generally not tight and may be unattainable. To obtain a meaningful bound, we develop a new derivation based on the concept of pointwise equivalent models. By maximizing the naive bound for these models, we recover the classical MCRB, now supported by a constructive derivation, an explicit characterization of the associated estimator class, and an equality condition. This formulation establishes a formal link between local unbiasedness conditions and achievable bounds, offering new insights into the MCRB structure and its relevance to practical estimators. Finally, we define the notion of an efficient misspecified estimator and show that if it exists, it is achieved by the MML estimator.

2605.20726 2026-05-21 stat.ME cs.LG stat.ML

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

在符合推断中对虚假发现比例的处处有效界

Ziang Song, Ying Jin, Emmanuel J. Candès

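AI summary: This paper establishes upper bounds on the false discovery proportion (FDP) in conformal multiple testing that hold simultaneously over all rejection thresholds. A high-probability envelope keeps the guarantees valid under arbitrary post hoc threshold selection, and the method is applied to outlier detection and conformal selection.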

详情
Comments
31 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference
AI中文摘要

符合推断在多重检验问题(如异常检测和候选选择)中的现代应用,通常涉及选择符合p值低于某一阈值的测试样本。此类方法的质量通常通过虚假发现比例(FDP)来衡量,其定义为错误选择所占的比例。现有方法通常使用Benjamini-Hochberg过程等方法来控制FDP的期望值。这种做法无法为实际实现的FDP提供高概率界,并且当拒绝阈值是在查看数据之后选择时,统计保证会失效。本文建立了对所有可能的拒绝阈值同时成立的有限样本、分布无关的FDP上界,从而允许任意的事后阈值选择。同时有效性是通过从原假设下符合p值的联合分布中采样,为其经验分布函数构造高概率包络来实现的。此外,我们的框架允许从业者调节包络的形状,从而在主要关注的拒绝区域中得到较紧的界。我们使用这种灵活的方法推导出异常检测和符合选择的同时FDP上界。通过合成和真实数据实验,我们展示了所得到的界既有效,又远不如现有方法的界保守。

英文摘要

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
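As a rough illustration of the objects this abstract works with, the sketch below computes conformal p-values from calibration scores and the realized FDP at a chosen threshold when the null labels are known, as in a simulation. It does not implement the paper's simultaneous envelope; the function names and the convention that larger scores are more atypical are assumptions made here.

```python
import numpy as np

def conformal_p_values(calib_scores, test_scores):
    """p_j = (1 + #{i : calib_i >= test_j}) / (n + 1).

    Assumes larger nonconformity scores mean "more atypical".
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # searchsorted(..., side="left") counts calibration scores strictly below
    # each test score, so n minus that count is the number >= the test score.
    n_greater_equal = n - np.searchsorted(calib, test, side="left")
    return (1.0 + n_greater_equal) / (n + 1.0)

def realized_fdp(p_values, is_null, threshold):
    """Fraction of selected test points (p <= threshold) that are true nulls."""
    selected = np.asarray(p_values) <= threshold
    n_selected = int(selected.sum())
    if n_selected == 0:
        return 0.0  # convention: no selections means no false discoveries
    return float((selected & np.asarray(is_null, dtype=bool)).sum() / n_selected)
```

In a simulation one could evaluate `realized_fdp` on a grid of thresholds and compare the resulting curve with any candidate simultaneous bound.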

2605.20710 2026-05-21 stat.ME

Assessing Estimate of CATE from Observational Data via an RCT Study

通过RCT研究评估从观察数据中估计的CATE

Bosen Cui, Yuhong Yang

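AI summary: This paper proposes CAFE, a framework that uses randomized-trial data to assess the goodness-of-fit of conditional average treatment effect (CATE) estimates learned from observational data, with the aim of establishing whether such estimates can be trusted in practice.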

Comments
34 pages, 5 figures
AI中文摘要

条件平均处理效应(CATEs)越来越多地从观察数据中估计并用于指导政策和个体化治疗决策。在实践中,在此类估计被信任之前,其预测适应性需要被评估,但仅靠观察数据本身提供有限的机会进行此类评估。我们提出了CATE评估通过适应性评估(CAFE),这是一种正式框架,用于直接评估从观察数据中学习的CATE估计的拟合质量,而不是完整的潜在结果模型,使用来自随机试验的证据。CAFE根据估计的倾向分数(或类似指标)将试验协变量空间划分为多个部分,并将观察到的条件处理效应与组水平的实验平均值进行比较。该框架可以容纳广泛类别的CATE学习器,包括参数模型和灵活的机器学习方法,如因果森林和提升方法。我们建立了在空虚假设和替代假设下的理论保证,并引入了最大型扩展以提高对局部不适应的敏感性。当同时可用随机试验数据和观察数据时,我们进一步开发了两阶段程序以检测未观察到的混杂因素的存在。广泛的数值研究展示了CAFE方法在评估观察数据导出的CATE估计时的实用性。

英文摘要

Conditional average treatment effects (CATEs) are increasingly estimated from observational data and used to guide policy and individualized treatment decisions. Before such estimates can be trusted in practice, their predictive fitness needs to be assessed, yet observational data alone offer limited opportunities for doing so. We propose CATE Assessment via Fitness Evaluation (CAFE), a formal framework for directly assessing the goodness-of-fit of a CATE estimate learned from observational data, rather than the full underlying outcome model, using evidence from a randomized trial. CAFE partitions the trial covariate space according to estimated propensity scores (or the like) and compares observationally derived conditional treatment effects with group-level experimental averages. The framework accommodates a broad class of CATE learners, including parametric models and flexible machine learning methods such as causal forest and boosting. We establish theoretical guarantees under both the null and alternative hypotheses, and introduce a maximum-type extension to improve sensitivity to localized lack of fit. When both randomized trial and observational data are available, we further develop a two-stage procedure to detect the existence of unobserved confounders. Extensive numerical studies show the utility of the CAFE approach when assessing observationally derived CATE estimates.

2605.20693 2026-05-21 cs.CL cs.AI stat.ML

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

通过共识和标签解缠获得可解释的判别文本表示

Tong Wang, Yiqing Xu, Leo Yang Yang

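AI summary: This paper proposes a criterion for interpretable discriminative text representations based on inter-annotator agreement and label disentanglement, together with a method (LFD) that implements it. Across ten text-classification tasks, LFD matches a strong text bottleneck baseline in predictive performance while producing clearer, less label-entangled features.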

AI中文摘要

可解释的文本表示应暴露出不仅具有预测性,而且对独立审计员来说有意义的坐标。现有的判别表示通常使用匿名嵌入方向,而概念瓶颈和LLM辅助方法将自然语言名称附加到特征上,但并未确保这些定义是可重复的或与目标标签不同。我们提出了一种可解释判别文本表示的操作标准:每个坐标应满足概念清晰度,通过独立标注员应用特征定义之间的机会调整一致性来衡量,并且标签解缠,即特征不应仅仅改述预测目标。我们通过LLM辅助特征发现(LFD)方法实现了这一标准,这是一种迭代方法,从对比性反向文本对中提出词汇和语义特征,通过跨LLM Cohen's $κ$ 筛选候选,并通过残差保留的预测增益选择特征。一种简化分析将$κ$筛选与每个特征的注释噪声界限联系起来,正式化一致性作为可靠性检查。在十个跨越七个语料库的文本分类任务中,LFD与强大的文本瓶颈基线具有相同的预测性能,同时产生明显更清晰且标签纠缠更少的特征。232名人类审计员的实验表明,LFD特征在人类-人类和人类-LLM一致性方面优于基线概念,且审计员一致认为它们更少标签泄漏。这些结果表明,经过一致性测试和标签解缠的坐标为可解释文本分类提供了一个实用的可审计标准。

英文摘要

Interpretable text representations should expose coordinates that are not only predictive, but also meaningful enough for independent auditors to apply. Existing discriminative representations often use anonymous embedding directions, while concept-bottleneck and LLM-assisted methods attach natural-language names to features without ensuring that those definitions are reproducible or distinct from the target label. We propose an operational criterion for interpretable discriminative text representations: each coordinate should satisfy conceptual clarity, measured by chance-adjusted agreement between independent annotators applying the feature definition, and label disentanglement, meaning the feature should not merely paraphrase the prediction target. We instantiate this criterion in LLM-assisted Feature Discovery (LFD), an iterative method that proposes lexical and semantic features from contrastive outcome-opposed text pairs, screens candidates using cross-LLM Cohen's $κ$, and selects features by residual held-out predictive gain. A stylized analysis connects the $κ$ screen to a per-feature annotation-noise bound, formalizing agreement as a reliability check. Across ten text-classification tasks spanning seven corpora, LFD matches the predictive performance of a strong text bottleneck baseline while producing substantially clearer and less label-entangled features. Human audits with 232 raters show that LFD features achieve higher human--human and human--LLM agreement than baseline concepts, and raters consistently judge them as less label-leaking. These results suggest that agreement-tested, label-disentangled coordinates provide a practical auditability standard for interpretable text classification.
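The chance-adjusted agreement used as a screen here is Cohen's $κ$. A minimal sketch of the standard two-annotator computation follows; the function name and the handling of the degenerate case are choices made for this illustration, not details taken from the paper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's marginals.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A candidate feature definition could then be kept only if its cross-annotator `cohens_kappa` exceeds a chosen cutoff; the cutoff value itself would be a design decision.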

2605.20692 2026-05-21 stat.ME q-bio.PE q-bio.QM stat.AP

Inferring infectiousness: a joint model of the within-host viral kinetics of SARS-CoV-2

推断传染性:SARS-CoV-2宿主内病毒动力学的联合模型

Christopher B. Boyer, Stephen M. Kissler, Seran Hakki, Jakob Jonnerby, Ajit Lalvani, Marc Lipsitch

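AI summary: This paper develops a Bayesian joint model of within-host SARS-CoV-2 viral kinetics from data on multiple indirect proxies of viral shedding, inferring infectiousness trajectories to support policy-relevant estimates of infectiousness.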

AI中文摘要

在传染病爆发期间,提供准确的政策问题答案需要详细的传染病性自然史模型。不幸的是,直接测量传染性通常不可用。相反,我们通常依赖间接代理,如通过PCR或抗原测试测量的病毒载量、通过病毒培养检测复制活性病毒或症状发作,这些都反映了病毒动力学或宿主反应的不同方面。然而,这些代理在收集的便利性、可扩展性和与病毒脱落及基础传染性相关联方面存在差异。在此,我们利用来自五个前瞻性、密集采样队列的数据,这些队列有纵向数据,涵盖多个病毒脱落代理,约2000例感染,开发了一个贝叶斯联合模型,用于SARS-CoV-2感染的宿主内病毒动力学。对联合分布的建模使我们能够推断仅提供PCR数据的个体的病毒脱落轨迹——传染性的最直接相关指标,并计算无法通过任何单一代理单独获得的衍生量。这些包括根据诊断后时间、变种、疫苗接种状态和感染史分层的群体层面传染性持续时间和概率;隔离解除的残余风险;以及根据新检测结果逐步更新的个性化实时传染性估计。

英文摘要

During an infectious disease outbreak, providing accurate answers to policy questions about transmission requires a detailed model of the natural history of infectiousness. Unfortunately, direct measures of infectiousness are generally unavailable. Instead, we often rely on indirect proxies, such as viral load measured by PCR or antigen tests, viral culture to detect replication-competent virus, or symptom onset, each of which reflects different aspects of viral dynamics or host response. However, these proxies vary in terms of the ease of collection, scalability, and their relationship to viral shedding and therefore underlying infectiousness. Here, we use data from five prospective, densely sampled cohorts with longitudinal data on multiple proxies of viral shedding for approximately 2,000 infections to develop a Bayesian joint model for the within-host viral kinetics of SARS-CoV-2 infection. Modeling the joint distribution allows us to infer the trajectory of infectious virus shedding -- the most direct correlate of infectiousness -- for individuals who contribute only PCR data, and to compute derived quantities that are inaccessible from any single proxy alone. These include the population-level probability and expected duration of ongoing infectiousness as a function of time since diagnosis, stratified by variant, vaccination status, and infection history; the residual risk of releasing an individual from isolation; and personalized, real-time estimates of infectiousness that are sequentially updated as new test results become available.

2605.20681 2026-05-21 stat.ME cs.LG

Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis

基于尺度校准的中位数-均值方法用于鲁棒分布式主成分分析

Kisung You

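AI summary: This paper studies a scale-calibrated median-of-means estimator for robust distributed PCA, built on the product geometry of Euclidean space and the Grassmann manifold. It derives a node-level PCA expansion, proves that the estimator is asymptotically equivalent to a scaled spatial median of node influence errors, proposes robust block-scale and inference-optimal calibration rules, and establishes high-probability median-of-means bounds.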

AI中文摘要

分布式主成分分析(PCA)产生节点级的均值向量和主子空间估计。稳健地聚合这些异质对象需要均值误差和子空间误差之间的相对尺度。我们研究了使用欧几里得空间和格拉斯曼流形的产品几何结构的尺度校准的中位数-均值估计器用于此问题。一个节点级PCA展开显示,均值组件具有通常的线性影响,而子空间组件是特征间隙加权的协方差扰动。我们证明了一个局部减少,显示所提出的产品流形中位数-均值估计器在渐近上等价于一个缩放后的节点影响误差的空间中位数。这导致了固定节点非高斯极限、增长节点高斯极限和有限块偏差的高斯极限,以及显式依赖于尺度的协方差公式。我们提出了鲁棒块尺度和推断最优校准规则,建立了高概率中位数-均值界限,刻画了因子wise坏节点影响,并证明了节点Bootstrap有效性。模拟和大规模单细胞RNA-seq数据表明,尺度校准适应于特征间隙驱动的子空间不确定性,并提供了鲁棒的分布式PCA总结。

英文摘要

Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error. We study a scale-calibrated median-of-means estimator for this problem using the product geometry of Euclidean space and the Grassmann manifold. A node-level PCA expansion shows that the mean component has the usual linear influence, whereas the subspace component is an eigengap-weighted covariance perturbation. We prove a local reduction showing that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors. This yields fixed-node non-Gaussian limits, growing-node Gaussian limits with finite-block bias, and an explicit scale-dependent covariance formula. We propose robust block-scale and inference-optimal calibration rules, establish high-probability median-of-means bounds, characterize factorwise bad-node influence, and prove node-bootstrap validity. Simulations and large-scale single-cell RNA-seq data show that scale calibration adapts to eigengap-driven subspace uncertainty and provides a robust distributed PCA summary.
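To make the median-of-means idea concrete, here is a minimal sketch for the Euclidean mean component only: samples are split into blocks, each block is averaged, and the block means are combined with a spatial (geometric) median computed by Weiszfeld iterations. The Grassmann subspace component and the scale calibration studied in the paper are not covered, and all names are assumptions for illustration.

```python
import numpy as np

def geometric_median(points, n_iter=500, tol=1e-10):
    """Weiszfeld iterations for the point minimizing the sum of Euclidean distances."""
    pts = np.asarray(points, dtype=float)
    estimate = pts.mean(axis=0)
    for _ in range(n_iter):
        # Clamp distances away from zero so the weights stay finite.
        dist = np.maximum(np.linalg.norm(pts - estimate, axis=1), 1e-12)
        weights = 1.0 / dist
        updated = (weights[:, None] * pts).sum(axis=0) / weights.sum()
        if np.linalg.norm(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate

def median_of_means(samples, n_blocks):
    """Split rows into blocks, average each block, return the spatial median of block means."""
    blocks = np.array_split(np.asarray(samples, dtype=float), n_blocks)
    block_means = np.stack([block.mean(axis=0) for block in blocks])
    return geometric_median(block_means)
```

With one contaminated block out of three, the spatial median stays at the two clean block means, whereas the plain average would be pulled toward the contamination.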

2605.20634 2026-05-21 stat.ME math.ST stat.TH

New Confidence Regions for Linear Regression Parameters with Stationary-Ergodic Dependent Errors

线性回归参数的新置信区域:具有平稳-遍历依赖误差

Mous-Abou Hamadou, Martial Longla, Mathias Nthiani Muia, Mahmud Hasan

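AI summary: This paper constructs joint confidence regions for linear regression coefficients when regressors and errors are jointly stationary and ergodic with unspecified serial dependence. The method applies random smoothing with an independent auxiliary sample, avoids direct long-run variance estimation and parametric dependence models, and shows near-nominal coverage and competitive region volumes in simulations.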

AI中文摘要

我们开发了在回归变量和误差 jointly stationary and ergodic 且未指定序列依赖的情况下,线性回归系数的联合置信区域。该方法应用随机平滑,使用独立的辅助样本和收缩带宽,对回归和二阶矩统计量向量进行处理。在平稳性、遍历性和有限二阶矩条件下,估计量渐近正态,从而产生Wald置信区域和同时置信区间,而无需直接估计长期方差或参数依赖模型。在实现中,我们引入了标度估计量,具有数据驱动的带宽选择和一种温和的截断,以提高有限样本稳定性。在ARMA、ARFIMA、基于copula的马尔可夫误差和分数高斯噪声(具有高斯和重尾边缘分布)的模拟中,显示出接近名义覆盖和与Newey-West HAC和MAC相比具有竞争力的区域体积。一个北京冬季PM2.5应用示例展示了该过程。关键词:随机平滑,联合推断,置信区域,依赖误差,长期记忆,回归推断

英文摘要

We develop joint confidence regions for linear regression coefficients when the regressors and errors are jointly stationary and ergodic with unspecified serial dependence. The method applies random smoothing, using an independent auxiliary sample and shrinking bandwidth, to a vector of regression and second-moment statistics. Under stationarity, ergodicity, and finite second moments, the estimator is asymptotically normal and yields Wald confidence regions and simultaneous confidence intervals without direct long-run variance estimation or a parametric dependence model. For implementation, we introduce a scaled estimator with data-driven bandwidth selection and a mild truncation that improves finite-sample stability. Simulations under ARMA, ARFIMA, copula-based Markov errors, and fractional Gaussian noise, with Gaussian and heavy-tailed margins, show near-nominal coverage and competitive region volumes relative to Newey-West HAC and MAC. A winter Beijing PM2.5 application illustrates the procedure. Keywords: Random smoothing, Joint inference, Confidence regions, Dependent errors, Long memory, Regression inference

2605.20633 2026-05-21 stat.ME stat.AP

Application of Propensity Score Models and Causal Estimators in Observational Studies under Model Misspecification

倾向分数模型和因果估计器在模型不规范下的观察性研究应用

Apu Chandra Das, Sakib Salam, Md Robiul Islam Talukder, Ashim Chandra Das, Antar Chandra Das, Rakhi Chowdhury

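AI summary: This paper evaluates propensity score (PS) models and causal estimators in observational studies under model misspecification. AIPW gives robust, stable estimates across most scenarios, IPW is highly sensitive to PS model misspecification, and RSM performs well only when the outcome model is correctly specified.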

Comments
24 pages, 4 figures
AI中文摘要

倾向分数(PS)方法被广泛应用于观察性研究中,以减少混杂因素并估计因果治疗效应。然而,PS基于的因果估计器的有效性严重依赖于正确的模型规范,模型不规范可能导致显著的偏倚和不稳定性。在本研究中,我们系统地评估了常用因果估计器在不同水平的PS和结果模型不规范下的性能,包括响应面建模(RSM)、逆概率加权(IPW)和增强逆概率加权(AIPW)。我们比较了逻辑回归与几种机器学习方法在PS估计中的表现,包括随机森林(RF)、支持向量机(SVM)和线性判别分析(LDA)。在多个由正确规范和不规范的PS和结果模型、不同样本量和不同协变量相关结构定义的场景下进行了广泛的模拟研究。通过偏倚、绝对偏倚、均方根误差、经验标准误差和置信区间宽度来评估估计器性能。结果表明,AIPW在大多数情况下由于其双重鲁棒性提供了稳健且稳定的估计,而IPW对PS不规范非常敏感,且由灵活的机器学习方法产生的不稳定PS估计会使其不稳。RSM仅在结果模型正确规范时表现良好。使用ACTG175临床试验和阿尔茨海默病神经影像化验计划(ADNI)数据集的现实应用进一步说明了估计器选择和PS建模策略的实际影响。总体而言,我们的发现强调了在双重鲁棒框架内整合灵活的机器学习方法以提高观察性研究中的因果效应估计的重要性。

英文摘要

Propensity score (PS) methods are widely used in observational studies to reduce confounding and estimate causal treatment effects. However, the validity of PS-based causal estimators depends heavily on correct model specification, and model misspecification may lead to substantial bias and instability. In this study, we systematically evaluate the performance of commonly used causal estimators, including response surface modeling (RSM), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW), under varying levels of PS and outcome model misspecification. We compare classical logistic regression with several machine learning approaches for PS estimation, including random forests (RF), support vector machines (SVM), and linear discriminant analysis (LDA). Extensive simulation studies were conducted under multiple scenarios defined by combinations of correctly specified and misspecified PS and outcome models, varying sample sizes, and different covariate correlation structures. Estimator performance was assessed using bias, absolute bias, root mean squared error, empirical standard error, and confidence interval width. Results demonstrate that AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification and unstable PS estimates produced by flexible machine learning methods. RSM performs well only when the outcome model is correctly specified. Real-world applications using the ACTG175 clinical trial and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further illustrate the practical implications of estimator choice and PS modeling strategy. Overall, our findings highlight the importance of integrating flexible machine learning approaches within doubly robust frameworks to improve causal effect estimation in observational studies.
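For readers who want the estimators in concrete form, the sketch below gives the textbook IPW and AIPW formulas for the average treatment effect, taking fitted propensity scores and outcome-model predictions as inputs. How those inputs are fitted (logistic regression, RF, SVM, LDA) is outside the sketch, and the function and argument names are assumptions.

```python
import numpy as np

def ipw_ate(y, t, ps):
    """Inverse probability weighting estimate of the average treatment effect."""
    y, t, ps = (np.asarray(v, dtype=float) for v in (y, t, ps))
    return float(np.mean(t * y / ps - (1.0 - t) * y / (1.0 - ps)))

def aipw_ate(y, t, ps, mu1, mu0):
    """Augmented IPW: outcome-model contrast plus inverse-weighted residual corrections.

    mu1, mu0 are predicted outcomes under treatment and control for every unit.
    """
    y, t, ps, mu1, mu0 = (np.asarray(v, dtype=float) for v in (y, t, ps, mu1, mu0))
    correction = t * (y - mu1) / ps - (1.0 - t) * (y - mu0) / (1.0 - ps)
    return float(np.mean(mu1 - mu0 + correction))
```

When the outcome predictions are exactly right, the residual terms vanish and AIPW returns the mean of `mu1 - mu0` regardless of the propensity scores, which is one half of the double robustness the abstract refers to.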

2605.20621 2026-05-21 stat.ME stat.AP stat.CO

Changepoint Detection in Categorical Time Series with Application to Daily Total Cloud Cover in Canada

在加拿大每日总云量中的类别时间序列中的突变点检测及其应用

Mo Li, QiQi Lu, XiaoLan Wang

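AI summary: This paper proposes a marginalized transition model for detecting a single changepoint in periodic, serially correlated categorical time series. Serial dependence is captured by a first-order Markov chain, a new estimation procedure computes maximum likelihood estimates efficiently, and a maximally selected likelihood ratio statistic tests for sudden changes.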

Comments
31 pages, 16 figures, 5 tables; includes supplementary material; R/Rcpp code available in the linked GitHub repository
AI中文摘要

突变点对于同质化类别时间序列和分析其趋势和变化至关重要。加拿大原始总云量每小时记录在十分位(或八分位),表现出固有的季节性和序列相关性。Lu和Wang(2012)引入了扩展的累积logit模型来检测云量条件的年度频率变化。虽然年度汇总可以减轻季节性和序列相关性,但会缩短时间序列并可能导致过度分散。本文提出了一种边际过渡模型,用于检测周期性和序列相关的类别时间序列中的单个突变点。该模型利用一阶马尔可夫链捕捉序列依赖性,并允许类别特定的突变点指定。为了提高计算效率,我们开发了一种新的参数估计程序以获得最大似然估计。然后提出了一种最大化选择的似然比检验统计量来测试类别时间序列中的突然变化,并通过在加拿大不列颠哥伦比亚省弗雷德里克顿圣约翰机场9点和3点记录的每日总云量观测数据来说明该方法。

英文摘要

Changepoints are essential for homogenizing categorical time series and analyzing their trends and variations. The original total cloud cover in Canada was recorded hourly in tenths (or eighths), exhibiting inherent seasonality and serial correlation. Lu and Wang (2012) introduced an extended cumulative logit model to detect shifts in the annual frequencies of cloud cover conditions. While annual aggregation mitigates seasonality and serial correlation, it shortens the time series and may lead to overdispersion. This article introduces a marginalized transition model to detect a single changepoint in periodic and serially correlated categorical time series. The model captures serial dependence using a first-order Markov chain and enables category-specific changepoint specification. To enhance computational efficiency, we develop a new parameter estimation procedure for obtaining maximum likelihood estimates. A maximally selected likelihood ratio test statistic is then proposed to test for sudden changes in categorical time series, and the method is illustrated using daily total cloud cover observations recorded at 9 a.m. and 3 p.m. at Fort St. John Airport, British Columbia, Canada.

2605.20619 2026-05-21 cs.LG math.OC stat.ML

SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

SURF: 通过调整标量化权重以均匀遍历帕累托前沿

Liuyuan Jiang, Chentong Huang, Lisha Chen

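AI summary: This paper proposes SURF, which selects scalarization weights so that the resulting solutions cover the Pareto front uniformly, addressing the non-uniform coverage produced by uniformly sampled weights in multi-objective optimization.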

AI中文摘要

标量化在多目标优化中因其简单性和可扩展性而被广泛应用。然而,在许多应用中,目标是生成代表多样化用户偏好的解决方案,理想情况下应实现帕累托前沿(PF)的均匀覆盖。然而,通常均匀采样标量化权重通常会导致PF的非均匀覆盖。我们通过标量化路径的几何分析解释了这种不匹配。随着标量化权重的变化,对应的解决方案通常以非均匀的速度遍历PF。这种速度诱导了一个弧长累积分布函数(CDF);通过反向此CDF映射,可以得到一个原则性的规则,用于选择产生均匀PF覆盖的权重。基于这一见解,我们提出了SURF(沿帕累托前沿均匀采样)。对于结构化问题,包括双目标老虎机,我们推导了此CDF映射和由此产生的PF感知的权重采样规则。对于一般问题,SURF在CDF重建和权重采样之间交替进行。理论上,我们证明在可证明的条件下,SURF收敛到一个不可避免的有限采样地板。经验上,在老虎机、多目标gymnasium和多目标LLM对齐实验中,SURF在效率上实现了比基线更均匀的PF覆盖。

英文摘要

Scalarization is widely used in multi-objective optimization owing to its simplicity and scalability. In many applications, the goal is to generate solutions that represent diverse user preferences, ideally with uniform coverage of the Pareto front (PF). However, uniformly sampling scalarization weights usually induces non-uniform coverage of the PF. We explain this mismatch through a geometric analysis of the scalarization path. As the scalarization weight varies, the corresponding solutions trace the PF with a generally non-uniform traversal speed. This speed induces an arc-length cumulative distribution function (CDF); inverting this CDF map yields a principled rule for selecting weights that produce uniform PF coverage. Building on this insight, we propose SURF (Sampling Uniformly along the PaReto Front). For structured problems, including bi-objective bandits, we derive closed-form expressions for this CDF map and the resulting PF-aware weight sampling rule. For general problems, SURF alternates between CDF reconstruction and weight sampling. Theoretically, we show that under provable conditions, SURF converges linearly to an unavoidable finite-sampling floor. Empirically, experiments on bandits, multi-objective-gymnasium, and multi-objective LLM alignment demonstrate that SURF efficiently achieves more uniform PF coverage than baselines.
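The inverse-CDF idea can be sketched numerically. Assuming one already has Pareto-front points evaluated on a grid of scalarization weights, the code below builds a piecewise-linear arc-length CDF over the weights and inverts it by interpolation. This is only an illustration of the principle; it is neither the closed-form map nor the alternating reconstruction procedure from the paper, and the names are hypothetical.

```python
import numpy as np

def uniform_front_weights(weights, front_points, n_samples):
    """Pick weights whose solutions are evenly spaced in arc length along the front.

    weights:      increasing 1-D grid of scalarization weights
    front_points: objective vectors of the solutions at those weights, one row each
    Assumes consecutive front points are distinct, so the CDF is strictly increasing.
    """
    w = np.asarray(weights, dtype=float)
    pts = np.asarray(front_points, dtype=float)
    # Length of each polyline segment between consecutive front points.
    segment_lengths = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    # Arc-length CDF as a function of the weight, normalized to [0, 1].
    cdf = np.concatenate([[0.0], np.cumsum(segment_lengths)]) / segment_lengths.sum()
    # Invert the CDF at equally spaced quantiles.
    targets = np.linspace(0.0, 1.0, n_samples)
    return np.interp(targets, cdf, w)
```

In the toy case where the front position moves quadratically in the weight, uniform weights bunch solutions near one end, while the inverted map shifts weights toward the region the front traverses quickly.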

2605.20604 2026-05-21 stat.ME

Conditional regularized halfspace depth for sparse functional data and its applications

基于稀疏函数数据的条件正则化半空间深度及其应用

Hyemin Yeon, Xiongtao Dai, Sara Lopez-Pintado

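AI summary: This paper introduces the conditional regularized halfspace depth (CRHD), a depth notion for sparse functional data that is evaluated directly at sparse observations and so avoids the reliance of existing methods on reconstructed curves. Its use is demonstrated with rank-based tests and an infant growth dataset.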

AI中文摘要

许多函数数据集是稀疏且不规则观测的。对这类数据进行排序具有挑战性,因为每个观测点仅提供有限的信息,而潜在轨迹仍然是无限维的。本文开发了一种新的深度概念,称为条件正则化半空间深度(CRHD)。CRHD定义为给定观测稀疏测量的潜在轨迹的条件半空间概率的下确界,从而允许在稀疏观测上直接进行深度评估,而不需要轨迹重构。我们研究了CRHD的几个基本理论性质,以澄清其作为深度度量的行为。所提出的深度甚至适用于极其稀疏观测的函数数据,克服了现有稀疏函数深度方法的关键限制,这些方法通常依赖于重构曲线。此外,CRHD为复杂函数数据诱导了有意义的排名。通过基于排名的检验展示了其数值性能,并通过婴儿生长数据集展示了其实际应用价值。

英文摘要

Many functional datasets are observed sparsely and irregularly. Ordering such data is challenging because only limited information is available from each observation, while the underlying trajectories remain infinite-dimensional. This paper develops a novel depth notion for sparse functional data, called the conditional regularized halfspace depth (CRHD). CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. We study several basic theoretical properties of CRHD that clarify its behavior as a depth measure. The proposed depth is applicable even to extremely sparsely observed functional data, overcoming key limitations of existing sparse functional depths that often rely on reconstructed curves. In addition, CRHD induces meaningful rankings for complex functional data. Its numerical performance is demonstrated through rank-based tests, and its practical utility is illustrated using an infant growth dataset.

2605.20572 2026-05-21 math.ST stat.ME stat.TH

Minimax unbiased estimation for finite populations with bounded outcomes

有限总体的最小最大无偏估计:具有有界结果

P. M. Aronow, Patrick Lopatto

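AI summary: This paper studies design-unbiased estimation of a finite-population total when each outcome satisfies known bounds. It derives a sharp lower bound on the worst-case squared error over the rectangular parameter space and shows that the bound is attained if and only if the unit inclusion indicators are pairwise independent, in which case the minimax estimator is the midpoint-differenced Horvitz-Thompson estimator.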

Comments
14 pages
AI中文摘要

我们研究了在每个结果满足已知边界$y_i\in[a_i,b_i]$的情况下,对有限总体总和$\sum_{i=1}^N y_i$进行设计无偏估计的问题。对于任何具有包含概率$\pi_i>0$的抽样设计,我们证明了在矩形参数空间下的最坏情况平方误差的紧下界。该下界当且仅当单元包含指示符成对独立时才能达到,此时最小最大估计器是中点差分Horvitz-Thompson估计器$\sum_{i=1}^N m_i+\sum_{i\in S}(y_i-m_i)/\pi_i$,其中$m_i=(a_i+b_i)/{2}$。然后我们在约束$\sum_i \pi_i\le n$下解决联合设计与估计问题。我们发现,最小最大策略以概率$\pi_i^\ast=\min(1,c (b_i-a_i))$独立抽取单元,其中$c>0$被选择使得$\sum_i \pi_i^\ast=n$,并使用中点差分估计器。这将Gabler (1990)的线性最小最大结果扩展到完整的设计无偏估计器类。我们还证明该估计器在无偏估计器中是可接受的,并且是仿射等变的。

英文摘要

We study design-unbiased estimation of the finite-population total $\sum_{i=1}^N y_i$ when each outcome satisfies known bounds $y_i\in[a_i,b_i]$. For any sampling design with inclusion probabilities $π_i>0$, we prove a sharp lower bound on the worst-case squared error over the rectangular parameter space. This bound is attained if and only if the unit inclusion indicators are pairwise independent, in which case the minimax estimator is the midpoint-differenced Horvitz-Thompson estimator $\sum_{i=1}^N m_i+\sum_{i\in S}(y_i-m_i)/π_i$, with $m_i=(a_i+b_i)/{2}$. We then solve the joint design-and-estimation problem under the constraint $\sum_i π_i\le n$. We find that a minimax strategy samples units independently with probabilities $π_i^\ast=\min(1,c (b_i-a_i))$ where $c>0$ is chosen so that $\sum_i π_i^\ast=n$, and uses the midpoint-differenced estimator. This extends the linear minimax result of Gabler (1990) to the full class of design-unbiased estimators. We also show that the estimator is admissible among unbiased estimators and affine equivariant.
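The two formulas in this abstract translate directly into code. The sketch below finds $c$ by bisection so that the probabilities $\min(1, c(b_i-a_i))$ sum to $n$, and evaluates the midpoint-differenced Horvitz-Thompson estimator for a given sample. The function names and the bisection approach are assumptions for illustration; the abstract only states the formulas.

```python
import numpy as np

def minimax_inclusion_probs(a, b, n):
    """pi_i = min(1, c * (b_i - a_i)), with c > 0 chosen so that sum(pi) = n."""
    width = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    if np.any(width <= 0) or not (0 < n <= width.size):
        raise ValueError("need b_i > a_i for all units and 0 < n <= N")
    # At c = 1 / min(width) every pi_i equals 1, so the sum is N >= n.
    lo, hi = 0.0, 1.0 / width.min()
    for _ in range(200):  # bisection on the non-decreasing map c -> sum(pi)
        c = 0.5 * (lo + hi)
        if np.minimum(1.0, c * width).sum() < n:
            lo = c
        else:
            hi = c
    return np.minimum(1.0, hi * width)

def midpoint_ht_total(y_sampled, sampled_idx, a, b, pi):
    """sum_i m_i + sum_{i in S} (y_i - m_i) / pi_i, with m_i = (a_i + b_i) / 2."""
    a, b, pi = (np.asarray(v, dtype=float) for v in (a, b, pi))
    idx = np.asarray(sampled_idx, dtype=int)
    midpoints = (a + b) / 2.0
    residuals = np.asarray(y_sampled, dtype=float) - midpoints[idx]
    return float(midpoints.sum() + np.sum(residuals / pi[idx]))
```

Units with wider bounds receive larger inclusion probabilities, which matches the intuition that more uncertain outcomes are worth observing more often.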

2605.20567 2026-05-21 stat.ME

Meta-analysis and network meta-analysis of time-to-event outcomes with non-proportional hazards: a Bayesian time-varying hazard ratio approach

时间至事件结局的元分析与网络元分析:基于贝叶斯时间变化危险比方法的非比例危险处理

Rhiannon K Owen, Keith R Abrams

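AI summary: This paper presents a Bayesian time-varying hazard ratio approach to meta-analysis and network meta-analysis of time-to-event outcomes with non-proportional hazards. It is illustrated with a meta-analysis of chemotherapy versus standard of care in advanced recurrent gastric cancer and a network meta-analysis of overall survival in advanced BRAF-mutated melanoma.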

Comments
23 pages, 13 figures, 3 tables & Presented as an Oral Contribution at International Society for Clinical Biostatistics (ISCB) Conference (ISCB-46), Basel, August 27, 2025
AI中文摘要

背景:在进行时间至事件(TTE)结局的元分析,尤其是在健康技术评估(HTA)背景下时,通常使用危险比(HR)尺度。然而,当某些研究显示非比例危险时会出现问题。尽管已有多种方法被推荐,但它们的使用受到复杂性和结果在HTA中应用的便利性限制。替代方法是假设在Cox比例危险模型中每个研究内存在治疗-时间交互作用,并对由此产生的治疗和交互系数进行双变量元分析,从而获得总体时间变化危险比(TVHR)。方法:该TVHR方法被应用于比较化疗与标准治疗在晚期复发胃癌的元分析,其中无进展生存期(PFS)是结局。该方法也应用于评估晚期BRAF突变黑色素瘤的网络元分析(NMA)中的总生存期(OS)。结果:在晚期胃癌的元分析中,有五项试验显示出PFS的非比例危险证据。使用TVHR模型得到的HR在0.5年时为0.83(CrI:0.75-0.91),在3.5年时为0.99(CrI:0.79-1.23)。在晚期BRAF突变黑色素瘤NMA中,三项研究显示出OS的非比例危险证据。使用TVHR模型,nivolumab加ipilimumab在第七个月后持续优于对照组,HR从一年时的0.37(CrI:0.26-0.51)提高到五年时的0.24(CrI:0.12-0.45)。结论:当比例危险假设不成立时,采用TVHR方法进行TTE结局的元分析或NMA,能够提供直观的解决方案,便于在HTA中使用。

英文摘要

Background: Often when undertaking meta-analyses of time-to-event (TTE) outcomes, especially in a Health Technology Assessment (HTA) context, a hazard ratio (HR) scale is used. However, issues arise when there is evidence of non-proportional hazards in some of the studies included. A number of methods have been advocated, but their use has been limited by their complexity or by the difficulty of using their results in HTA. An alternative approach is to assume a treatment-log(time) interaction within a Cox proportional hazards model for each study, and then to undertake a bivariate meta-analysis of the resulting treatment and interaction coefficients, so that an overall time-varying HR (TVHR) can be obtained. Methods: A TVHR approach was applied to a meta-analysis of chemotherapy compared with standard of care for advanced recurrent gastric cancer, in which progression-free survival (PFS) was an outcome. The approach was also applied to a network meta-analysis (NMA) evaluating overall survival (OS) in advanced BRAF-mutated melanoma. Results: Five trials in the advanced gastric cancer meta-analysis displayed evidence of non-proportional hazards for PFS. Using a TVHR model produced HRs ranging from 0.83 (CrI:0.75-0.91) at 0.5 years to 0.99 (CrI:0.79-1.23) at 3.5 years. Three studies showed evidence of non-proportional hazards in the advanced BRAF-mutated melanoma NMA for OS. Using a TVHR model, nivolumab plus ipilimumab demonstrated consistent superiority from month 7 onwards, with an HR improving from 0.37 (CrI:0.26-0.51) at one year to 0.24 (CrI:0.12-0.45) at five years. Conclusions: When the proportional hazards assumption appears not to hold, a TVHR approach to the meta-analysis or NMA of TTE outcomes produces an intuitive solution that can be readily used in HTA.
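Under a treatment-log(time) interaction, the log hazard ratio is linear in log(time), so the hazard ratio at time t is exp(beta_treatment + beta_interaction * log(t)). The sketch below evaluates that expression; the coefficient names and any values passed in are hypothetical and are not the pooled estimates from the paper.

```python
import math

def time_varying_hr(t, beta_treatment, beta_interaction):
    """Hazard ratio at time t > 0 under a treatment-by-log(time) interaction.

    log HR(t) = beta_treatment + beta_interaction * log(t),
    equivalently HR(t) = exp(beta_treatment) * t ** beta_interaction.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return math.exp(beta_treatment + beta_interaction * math.log(t))
```

At t = 1 (in whatever time unit is used) the interaction term vanishes and the hazard ratio is exp(beta_treatment); a positive interaction coefficient means the hazard ratio rises over time.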

2605.20559 2026-05-21 stat.ML cs.LG stat.AP stat.ME

Group-Aware Matrix Estimation and Latent Subspace Recovery

基于群体的矩阵估计与潜在子空间恢复

Hamza Golubovic, Matthew Shen, Genevera I. Allen, Tarek M. Zikry

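AI summary: This paper proposes GAME, a convex estimator for subgroup-wise low-rank matrix estimation in heterogeneous data. Overlapping nuclear-norm penalties recover subgroup-specific subspace structure while preserving local latent structure in a shared coordinate system. Across several datasets GAME is competitive with or better than global low-rank baselines, with the clearest benefits under structured missingness.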

Comments
12 pages, 6 main figures, 1 main algorithm
AI中文摘要

现代矩阵补全问题通常涉及异质数据,其行同时属于多个元类别,如推荐系统中的人口统计数据和年龄组,或神经电生理实验中的区域和记录会话标签。标准低秩估计器施加单一全局潜在几何结构,可以恢复平均结构,但可能平滑掉子群特定的变异,尤其是在观察分布不均的情况下。我们引入了Group-Aware Matrix Estimation (GAME),一种用于重叠子群级低秩矩阵估计的凸估计器。GAME通过重叠核范数惩罚正则化子群特定的子矩阵,允许相关组之间共享信息,同时在共享坐标系中保留局部潜在结构。我们为重建误差和子群特定子空间恢复提供了有限样本保证,展示了性能如何依赖于采样密度、子群秩和重叠结构。在合成、推荐、生态和神经科学数据集上的实验表明,GAME在结构缺失情况下最有益,其中子群意识正则化提高了重建准确性和潜在子空间保真度。在这些基准测试中,GAME在全局低秩、侧信息和现代填补基线中表现竞争力或最佳,当子群表现出不同低秩结构时,收益最大。

英文摘要

Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural electrophysiological experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive or best among global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.
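The abstract does not say how the overlapping nuclear-norm objective is optimized. As background only, the sketch below shows the standard building block for a single nuclear-norm penalty, singular value soft-thresholding, which is the proximal operator of tau times the nuclear norm. Treat it as an assumption about a plausible ingredient, not as the GAME algorithm.

```python
import numpy as np

def singular_value_threshold(matrix, tau):
    """Proximal operator of tau * nuclear norm: shrink every singular value by tau."""
    u, s, vt = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    shrunk = np.maximum(s - tau, 0.0)  # singular values below tau are set to zero
    # Scaling the columns of u by the shrunk values rebuilds u @ diag(shrunk) @ vt.
    return (u * shrunk) @ vt
```

Singular values below `tau` are removed entirely, which is how a nuclear-norm penalty lowers the rank of the estimate.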

2605.20552 2026-05-21 stat.ML cs.LG

Spectral bandits for smooth graph functions with applications in recommender systems

图上平滑函数的谱带it问题及其在推荐系统中的应用

Tomáš Kocák, Michal Valko, Rémi Munos, Branislav Kveton, Shipra Agrawal

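AI summary: This paper studies a bandit problem in which arm payoffs are smooth on a graph, motivated by learning user preferences in recommender systems. It introduces an effective dimension and proposes algorithms that scale linearly in it, yielding low-regret online learning.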

Comments
Published at AAAI 2014 - SDMBD
AI中文摘要

图上的平滑函数在流形和半监督学习中有广泛应用。本文研究了一个带it问题,其中臂的收益在图上是平滑的。该框架适用于涉及图的在线学习问题,如基于内容的推荐。在该问题中,每个推荐的项目是一个节点,其预期评分与其邻居相似。目标是推荐具有高预期评分的项目。我们旨在设计累积遗憾不随节点数量劣化的算法。特别是,我们引入了有效维度的概念,该概念在现实世界图中较小,并提出了两种算法,其规模与该维度线性相关。我们在现实世界的内容推荐问题上的实验表明,从仅几十个节点的评估中即可学习出对成千上万项目的良好用户偏好估计器。

英文摘要

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each recommended item is a node and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.
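A common way to formalize "smooth on a graph" is the Laplacian quadratic form, which sums squared differences of the function across edges. The sketch below builds the combinatorial Laplacian from an adjacency matrix and evaluates that form; it is assumed here as the usual notion of smoothness and is not code from the paper.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A for a symmetric weighted adjacency matrix."""
    a = np.asarray(adjacency, dtype=float)
    return np.diag(a.sum(axis=1)) - a

def smoothness(values, laplacian):
    """f^T L f, which equals the sum over edges of w_ij * (f_i - f_j)^2."""
    f = np.asarray(values, dtype=float)
    return float(f @ laplacian @ f)
```

Small values of `smoothness` mean neighboring nodes have similar payoffs; the eigenvectors of the Laplacian with small eigenvalues give a basis in which such functions have few significant coefficients.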

2605.20550 2026-05-21 math.ST stat.TH

Kernel Density Estimation under $C^{1,1}$ Regularity: AMISE, Weak Curvature, and Plug-in Bandwidths

核密度估计在$C^{1,1}$正则性下:AMISE、弱曲率和插件带宽

Alireza Kabgani, Elaheh Lotfian

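AI summary: This paper studies the AMISE theory of kernel density estimation under $C^{1,1}$ regularity. Interpreting $R(f'')$ as a weak-curvature functional, it recovers the AMISE formula, the optimal bandwidth, and Epanechnikov kernel optimality without assuming a continuous classical second derivative.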

AI中文摘要

经典的核密度估计通常通过点wise泰勒展开推导出AMISE和最优带宽,这需要两次连续可导。这一假设比必要强,排除了自然密度,这些密度来自阈值模型、制度变化和鲁棒混合模型,其中一阶导数可能是Lipschitz的,而曲率可能是尖点、不连续或仅弱定义的。我们证明在更弱的条件$f\in C^{1,1}(\mathbb{R})$下,经典AMISE理论仍然有效。点wise $C^2$泰勒展开被基于弱二阶导数的积分泰勒表示所替代,因此$R(f'')$被解释为弱曲率功能。在$f\in C^{1,1}(\mathbb{R})$和$f''\in L^2(\mathbb{R})$的条件下,我们恢复了经典的AMISE公式、$n^{-1/5}$最优带宽和Epanechnikov核最优性,而无需假设连续的经典二阶导数。我们还提出了一种广义曲率插件带宽选择器,证明其在比率一致曲率估计下的AMISE等价性,并建立了留一法U统计量曲率估计器的一致性。使用弱Hessian的多元扩展恢复了标量带宽率$n^{-4/(d+4)}$。

英文摘要

Classical kernel density estimation usually derives the AMISE and optimal bandwidth from a pointwise Taylor expansion, which requires twice continuous differentiability. This assumption is stronger than necessary and excludes natural densities arising from threshold models, regime changes, and robust mixture models, where the first derivative may be Lipschitz while the curvature is kinked, discontinuous, or only weakly defined. We show that the classical AMISE theory remains valid under the weaker condition $f\in C^{1,1}(\mathbb{R})$. The pointwise $C^2$ Taylor expansion is replaced by an integral Taylor representation based on the weak second derivative, so that $R(f'')$ is interpreted as a weak-curvature functional. Under $f\in C^{1,1}(\mathbb{R})$ and $f''\in L^2(\mathbb{R})$, we recover the classical AMISE formula, the $n^{-1/5}$ optimal bandwidth, and Epanechnikov kernel optimality without assuming a continuous classical second derivative. We also propose a generalized-curvature plug-in bandwidth selector, prove its first-order AMISE equivalence under ratio-consistent curvature estimation, and establish consistency of a leave-one-out U-statistic curvature estimator. A multivariate extension using weak Hessians recovers the scalar-bandwidth rate $n^{-4/(d+4)}$.
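The classical univariate AMISE-optimal bandwidth referred to here is $h = \left[ R(K) / (\mu_2(K)^2 R(f'') n) \right]^{1/5}$, where $R(g)=\int g^2$ and $\mu_2(K)$ is the second moment of the kernel. The sketch below evaluates it, with a Gaussian kernel and a normal reference density as a worked case. The normal density is twice continuously differentiable, so this example only illustrates the formula, not the weaker regularity setting of the paper; the function names are assumptions.

```python
import math

def amise_bandwidth(n, kernel_roughness, kernel_second_moment, curvature):
    """h = [R(K) / (mu_2(K)^2 * R(f'') * n)]^(1/5)."""
    return (kernel_roughness / (kernel_second_moment ** 2 * curvature * n)) ** 0.2

def normal_reference_bandwidth(n, sigma):
    """Gaussian kernel with a N(0, sigma^2) reference density.

    R(K) = 1 / (2 sqrt(pi)), mu_2(K) = 1, R(f'') = 3 / (8 sqrt(pi) sigma^5),
    which simplifies to h = (4/3)^(1/5) * sigma * n^(-1/5).
    """
    kernel_roughness = 1.0 / (2.0 * math.sqrt(math.pi))
    curvature = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
    return amise_bandwidth(n, kernel_roughness, 1.0, curvature)
```

A plug-in selector replaces `curvature` with an estimate of $R(f'')$ computed from the data; the abstract's contribution concerns when that quantity and the formula remain valid with only a weak second derivative.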

2605.20547 2026-05-21 cs.LG cs.AI stat.ML

Latent Process Generator Matching

潜在过程生成器匹配

Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell

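AI summary: This paper proposes latent process generator matching, a framework that treats the observed generative state as a deterministic image of a tractable Markov process, extending Generator Matching theory to time-dependent latent conditional processes.

Details
Comments
18 pages, 1 figure
AI-generated abstract (translated from Chinese)

Many recent flow-matching and diffusion-style generative models rely on auxiliary stochastic dynamics during training: a richer process is simulated to define conditional targets, but the auxiliary state is either hard to sample at generation time or not part of the desired output. Existing Generator Matching theory formalizes conditioning on static latent random variables, while several recent papers prove special cases of projection results for particular augmented-state constructions. We introduce latent process generator matching, a general framework that treats the observed generative state as a deterministic image $X_t=Φ(Y_t)$ of a tractable Markov process. We show that in this setting one can learn, on the image space, the generator of a stochastic process whose one-time marginal distributions match those of the projected process. This extends and subsumes the discrete latent process results in the literature, and extends Generator Matching from static latent variables to a rich family of time-dependent latent conditional processes.

English abstract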

Many recent flow-matching and diffusion-style generative models rely on auxiliary stochastic dynamics during training: a richer process is simulated to define conditional targets, but the auxiliary state is either intractable to sample at generation time or simply not part of the desired output. Existing Generator Matching theory formalises conditioning on static latent random variables, and several recent papers prove special cases of projection results for particular augmented-state constructions. We introduce latent process generator matching, a general framework that treats the observed generative state as a deterministic image $X_t=Φ(Y_t)$ of a tractable Markov process $Y_t$. We show that in this setting one may learn the generator of a stochastic process on the image space which has the same one-time marginal distributions as the projected process. This generalizes and subsumes the discrete latent process results from the literature, and extends Generator Matching from static latent variables to a rich family of time-dependent latent conditional processes.

2605.20545 2026-05-21 stat.ML cs.LG

Sample Complexity of Transfer Learning: An Optimal Transport Approach

Haoyang Cao, Xin Guo, Wenpin Tang, Guan Wang

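AI summary: This paper analyzes the sample efficiency of transfer learning from an optimal transport viewpoint. When the data dimension $d$ exceeds 3, the sample complexity of transfer learning is $O(m^{-(α+1)/d})$, compared with $O(m^{-p/d})$ for direct learning, where $α$ is the smoothness of the data distribution and $p$ is the smoothness of the optimal target model.

Details
AI-generated abstract (translated from Chinese)

Transfer learning is a key technique for many machine learning/AI models with complex structure, such as large language models and generative AI. Its essence is to use knowledge from already-solved source tasks to solve a new target task, especially when the training sample size $m$ for the latter is small. This paper rigorously analyzes the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint, we find that when the data dimension $d$ is greater than 3, the sample complexity of transfer learning is $O(m^{-(α+1)/d})$, where $α$ denotes the smoothness of the data distribution, whereas the sample complexity of direct learning is $O(m^{-p/d})$, where $p$ denotes the smoothness of the optimal target model. Our finding theoretically supports better sample efficiency for transfer learning when the target task is optimized over a family of not-so-smooth models (i.e., highly complex networks, possibly with non-smooth activation functions). Using image classification as an example, we demonstrate the sample efficiency of transfer learning in numerical experiments: in the data-hungry regime, transfer learning can significantly improve model performance.

English abstract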

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(α+1)/d})$, with $α$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data hungry regime, the model performance can be significantly improved by transfer learning.
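
The comparison between the two rates quoted above reduces to a comparison of exponents. The short derivation below only rearranges the stated bounds. It assumes $m>1$ and treats the hidden constants as comparable, neither of which the abstract states.

```latex
\[
m^{-(\alpha+1)/d} \le m^{-p/d}
\iff \frac{\alpha+1}{d} \ge \frac{p}{d}
\iff \alpha + 1 \ge p
\qquad (m > 1,\ d > 0).
\]
```

Read this way, the transfer-learning bound decays at least as fast exactly when the target model's smoothness $p$ does not exceed $α+1$, which matches the abstract's emphasis on not-so-smooth model families.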

2605.20541 2026-05-21 math.ST math.PR stat.TH

Finite-Sample Bounds for Expected Signature Estimation under Weak Dependence

Bryson Schenck

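AI summary: This paper studies finite-sample bounds for estimating the expected signature from a single long dependent trajectory under weak dependence. It proves a non-asymptotic mean-squared error bound for the block-averaging estimator and examines convergence across different Hurst exponents.

Details
Comments
51 pages, 1 figure
AI-generated abstract (translated from Chinese)

The expected signature uniquely determines the law of a random rough path under a moment-growth condition, yet finite-sample bounds for estimating it from a single long dependent trajectory have been lacking. We study a stationary stochastic process whose sample paths can be interpreted as geometric rough paths, partitioned into blocks of equally spaced observations, and prove a non-asymptotic mean-squared error bound for the block-averaging estimator. When paths have Hölder regularity at most 1/2, rough-path theory is needed to define the estimand, because Young and Riemann-Stieltjes integration cannot define the signature's iterated integrals. Under moment and stationarity assumptions and a covariance-decay condition on block signatures (strictly weaker than α-mixing and applicable to long-range-dependent drivers), the error splits into a discretization term and a fluctuation term, with rates determined by path regularity and dependence strength, respectively. A level-wise rough-factorial variance analysis keeps the finite-truncation constants explicit and yields an optimal allocation rule under a fixed observation budget. We verify the assumptions for fractional Ornstein-Uhlenbeck processes in three regimes: rough (Hurst $H<1/2$), semimartingale ($H=1/2$), and long-range ($H>1/2$). Monte Carlo experiments show empirical convergence rates faster than the theoretical upper bounds.

English abstract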

The expected signature uniquely determines the law of a random rough path under a moment-growth condition, yet finite-sample bounds for estimating it from a single long dependent trajectory have been lacking. We study a stationary stochastic process whose sample paths can be interpreted as geometric rough paths, partitioned into blocks of equally-spaced observations, and prove a non-asymptotic mean-squared error bound for the block-averaging estimator. Rough-path theory is required for the estimand to be well-defined when paths have Hölder regularity at most $1/2$, because Young and Riemann--Stieltjes integration cannot define the signature's iterated integrals. Under moment and stationarity assumptions together with a covariance-decay condition on block signatures -- strictly weaker than $α$-mixing and applicable to long-range-dependent drivers -- the error separates into a discretization term and a fluctuation term, with rates determined respectively by path regularity and dependence strength. A level-wise rough-factorial variance analysis keeps finite-truncation constants explicit and yields an optimal allocation rule under a fixed observation budget. We verify the assumptions for fractional Ornstein--Uhlenbeck processes in three regimes, namely rough (Hurst $H<1/2$), semimartingale ($H=1/2$), and long-range ($H>1/2$). Monte Carlo experiments show empirical convergence rates faster than the theoretical upper bounds.

2605.20154 2026-05-21 stat.ME stat.AP

Component over Composite: Mitigating Type I Error Inflation when Imputing "Days Alive and at Home"

Mia S. Tackney, Sarah Dawson, Letao Yuan, Dominique-Laurent Couturier, Sofia S. Villar

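AI summary: This paper studies how to mitigate type I error inflation when imputing the composite outcome "Days Alive and at Home". A simulation study compares methods for handling missing data and finds that multiple imputation at the component level controls type I error better than imputation at the composite level. The authors suggest that future work develop imputation methods suited to more complex DAH definitions.

Details
AI-generated abstract (translated from Chinese)

Background: Days Alive and at Home (DAH) over a pre-defined follow-up period is a novel post-intervention composite outcome that combines data from at least three components: (i) initial length of hospital stay, (ii) total length of readmissions or other post-discharge care, and (iii) mortality. Missing values pose unique challenges for trials that analyze the DAH outcome, because the three components may have different rates of missingness arising from different missing-data mechanisms. Current approaches define DAH as missing if any component is missing, and then carry out a complete-case analysis or multiple imputation (MI) of the composite. Methods: Through a simulation study motivated by the NOTACS trial, we compare several methods for handling missing data, including complete-case analysis, MI of the composite, and MI of the components, when the primary analysis is a Mann-Whitney-Wilcoxon test. Results: MI at the component level has good properties in terms of type I error control and power. We caution against MI at the composite level with predictive mean matching (PMM), which can lead to type I error inflation. Conclusions: Given the complex distributional characteristics of DAH, naive approaches that define missingness at the composite level and impute the composite directly with PMM can lead to type I error inflation. Imputing at the component level is recommended; future work should include imputation approaches suited to more complex definitions of DAH, as well as recommendations for sensitivity analyses to the Missing at Random assumption.

English abstract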

Background: Days Alive and at Home (DAH) over a pre-defined follow-up period is a novel post-intervention composite outcome that combines data from at least three components: (i) initial length of hospital stay, (ii) length of total readmissions or other post-discharge care and (iii) mortality. Missing values bring unique challenges to the analysis of trials with the DAH outcome as the three components may have different rates of missingness caused by distinct missing data mechanisms. Current approaches define DAH as missing if any of the components are missing, and proceed with complete cases or Multiple Imputation (MI) of the composite. Methods: Through a simulation study motivated by the NOTACS trial, we compare several methods of handling missing data, including complete case analysis, MI of the composite, and MI of the components when the primary analysis is a Mann-Whitney-Wilcoxon test. Results: MI on the component level has good properties in terms of type I error control and power. We caution against the use of MI on the composite level with Predictive Mean Matching, which can lead to type I error inflation. Conclusions: Given the complex distributional characteristics of DAH, naive approaches such as defining missingness on the composite level and directly imputing the composite with Predictive Mean Matching can lead to type I error inflation. Imputing on the component level is recommended; suggested future work includes imputation approaches that are compatible with more complex definitions of DAH, as well as recommendations for sensitivity analyses to the Missing at Random assumption.

2604.21212 2026-05-21 stat.AP

Legal Infrastructure Organizes Eviction: Evidence from Philadelphia

Marios Papamichalis, Regina Ruane

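AI summary: This paper studies how legal infrastructure organizes eviction filings in Philadelphia. It finds that eviction filing is characterized by a concentrated plaintiff-side bar, durable plaintiff-attorney dependence, repeated use of the same properties, and recurring tenant-name exposure, and that eviction is a layered upstream process.

Details
Comments
This is a preprint before submission
AI-generated abstract (translated from Chinese)

We analyze the filing-side legal infrastructure of eviction using 755,004 Philadelphia Municipal Court landlord-tenant records filed between 1969 and 2022, of which 747,125 are residential. Eviction in Philadelphia is organized by a concentrated plaintiff-side bar, durable plaintiff-attorney dependence, repeated use of the same properties, and recurring tenant-name exposure. Between 1983 and 2022, the ten most active plaintiff attorneys handled an average of 82.2% of represented plaintiff-side cases per year, whereas the ten most active plaintiffs accounted for only 14.8%. Large plaintiffs rely heavily on a single attorney: among plaintiffs filing at least 101 cases, 78.3% of filings are handled by the plaintiff's most-used attorney. Repetition is equally central to the docket. Among residential cases, 48.8% occur at addresses with a filing in the preceding year, and 23.6% at addresses with six or more prior filings; these repeat cases are usually filed by the same plaintiff and follow a pathway with more defaults and fewer agreements. We further examine a narrower mechanism: strict switches to specialist plaintiff-side counsel, defined as a plaintiff changing to one of the prior year's top-ten attorneys. Filing counts rise around the switch with non-flat pre-trends, indicating organizational reconfiguration rather than a clean exogenous shock. Within-plaintiff and within-plaintiff-property comparisons yield more stable estimates: judgment by agreement, fee share, waiver language, and corrected lockout-trigger language decline, while deadline language rises. We interpret eviction as a layered upstream process in which concentrated counsel, repeated places, and recurring tenants generate filings before any courtroom bargaining or adjudication takes place.

English abstract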

We analyze the filing-side legal infrastructure of eviction using 755,004 Philadelphia Municipal Court landlord-tenant records filed between 1969 and 2022, of which 747,125 are residential. Eviction in Philadelphia is organized upstream by a concentrated plaintiff-side bar, durable plaintiff-attorney dependence, repeated use of the same properties, and recurring tenant-name exposure. Between 1983 and 2022, the ten most active plaintiff attorneys handled 82.2% of represented plaintiff-side cases per year on average, compared with 14.8% for the ten most active plaintiffs. Large plaintiffs depend heavily on a single attorney: among plaintiffs filing at least 101 cases, 78.3% of each plaintiff's filings are handled by that plaintiff's most-used attorney, on average. Repetition is likewise central to the docket. Across the residential filing universe, 48.8% of cases occur at addresses with a prior filing in the preceding year, and 23.6% at addresses with six or more prior filings; these repeats are usually filed by the same plaintiff and follow a more default-heavy, less agreement-heavy pathway. We further examine a narrower mechanism: strict switches into specialist plaintiff-side counsel, defined as a plaintiff changing attorney to one in the prior-year top ten. Filing counts rise around the switch with non-flat pre-trends, indicating organizational reconfiguration rather than a clean exogenous shock. Within-plaintiff and within-plaintiff-property comparisons yield more stable estimates: judgment by agreement, fee share, waiver language, and corrected lockout-trigger language decline, while deadline language rises. We interpret eviction as a layered upstream process in which concentrated counsel, repeated places, and recurring tenants produce filings before any courtroom bargaining or adjudication occurs.

2604.20985 2026-05-21 cs.LG cs.AI cs.CR stat.ML

Differentially Private Model Merging

Qichuan Yin, Manzil Zaheer, Tian Li

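AI summary: This paper proposes two post-processing techniques, random selection and linear combination, that produce a final private model satisfying any target differential privacy requirement without additional training, and analyzes the privacy-utility tradeoffs of these methods for general problems and for private mean estimation.

Details
AI-generated abstract (translated from Chinese)

In machine learning, privacy requirements at inference or deployment time often evolve as policies, regulations, or user preferences change. In this work, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs, we aim to construct a set of models that satisfies any target differential privacy (DP) requirement without additional training. We propose two post-processing techniques, random selection and linear combination, to generate final private models satisfying any target privacy parameter. We provide privacy accounting for these approaches through the lens of Rényi DP and privacy loss distributions on general problems, together with a precise analysis of the privacy/utility tradeoffs in private mean estimation and a comparison of the two mechanisms. Empirically, we demonstrate the effectiveness of our approaches and validate our analyses on several models and on both synthetic and real-world datasets.

English abstract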

In machine learning, privacy requirements at inference or deployment time often evolve due to changing policies, regulations, or user preferences. In this work, we aim to construct a magnitude of models to satisfy any target differential privacy (DP) requirement without additional training, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post-processing techniques, namely random selection and linear combination, to generate final private models satisfying any target privacy parameter. We provide privacy accounting of these approaches from the lens of Rényi DP and privacy loss distributions on general problems, as well as on private mean estimation, where we precisely characterize the privacy/utility tradeoffs and compare the two mechanisms. Empirically, we demonstrate the effectiveness of our approaches and validate our analyses on several models and both synthetic and real-world datasets.

2603.26184 2026-05-21 stat.AP

Why decision curves go above or below treat-all and treat-none: a PPV- and calibration-based guide for clinical prediction models

Linard Hoessly

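AI summary: This paper links decision-curve performance to calibration through threshold-specific observed risk, and proposes PPV curves as a practical complement to decision curves.

Details
Comments
Comments welcome
AI-generated abstract (translated from Chinese)

Net benefit is widely used to evaluate the clinical utility of prediction models, yet its interpretation is often difficult in practice. In this paper, we develop two complementary interpretations that make net benefit easier for clinical audiences to understand. We show that comparisons with treat-none and treat-all can be expressed through threshold-specific observed risk in patients above and below the threshold, linking decision-curve performance to calibration in clinically relevant subgroups. We also show how net benefit relates to positive predictive value, giving a more intuitive explanation of when acting on model predictions is justified. We derive and illustrate these results and propose positive predictive value curves as a practical complement to decision curves.

English abstract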

Net benefit is widely used and reported to evaluate the clinical utility of prediction models, yet its interpretation often remains difficult in practice. In this didactical note, we develop two complementary interpretations that make net benefit easier to understand for clinical audiences. We show that comparisons with treat-none and treat-all can be expressed through threshold-specific observed risk in patients above and below the decision threshold, linking decision-curve performance to calibration in clinically relevant subgroups. We also show how net benefit relates to positive predictive value, offering a more intuitive explanation of when acting on model predictions is justified. We derive and illustrate these results and propose positive predictive value curves as a practical complement to decision curves.
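
The relationships described above can be checked numerically with the standard net-benefit formula, $\mathrm{NB} = \mathrm{TP}/n - (\mathrm{FP}/n)\,p_t/(1-p_t)$. The sketch below is a minimal illustration under that standard definition, not the paper's own derivation. The function names and the counts are hypothetical, chosen only for the example.

```python
def net_benefit(tp, fp, n, pt):
    """Net benefit of treating the patients flagged at threshold pt."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(events, n, pt):
    """Treat-all strategy: every event is a true positive, every non-event a false positive."""
    return net_benefit(events, n - events, n, pt)


def ppv(tp, fp):
    """Positive predictive value, i.e. observed risk among flagged patients."""
    return tp / (tp + fp)


# Hypothetical counts, for illustration only (not taken from the paper).
n, events = 1000, 200
tp, fp = 120, 180            # patients flagged at threshold pt
pt = 0.25
fn, tn = events - tp, (n - events) - fp

nb_model = net_benefit(tp, fp, n, pt)
nb_all = net_benefit_treat_all(events, n, pt)
nb_none = 0.0                # treat-none has zero net benefit by definition

# Model beats treat-none exactly when observed risk among flagged patients exceeds pt.
above_none = (nb_model > nb_none) == (ppv(tp, fp) > pt)
# Model beats treat-all exactly when observed risk among unflagged patients is below pt.
above_all = (nb_model > nb_all) == (fn / (fn + tn) < pt)
```

Both equivalences follow from rearranging the formula, which is one way to read the abstract's link between decision curves, PPV, and observed risk above and below the threshold.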

2602.10989 2026-05-21 math.ST cs.IT cs.LG math.IT math.PR stat.ML stat.TH

Variational Optimality of Föllmer Processes in Generative Diffusions

Yifan Chen, Eric Vanden-Eijnden

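AI summary: This paper constructs and analyzes generative diffusions using the stochastic interpolant framework, with the drift estimated as a conditional expectation. It proves that the variationally optimal choice is a Föllmer process, which minimizes relative entropy on path space, and it provides a simulation-free way to estimate the drift from data.

Details
AI-generated abstract (translated from Chinese)

We construct and analyze generative diffusions that use the stochastic interpolant framework to transport a point mass to a prescribed target distribution over a finite time horizon. The drift is expressed as a conditional expectation and can be estimated from independent samples without simulating stochastic processes. We show that the diffusion coefficient can be tuned a posteriori without changing the time-marginal distributions. Among all such tunings, minimizing the impact of estimation error on the path-space Kullback-Leibler divergence selects, in closed form, a Föllmer process: a diffusion whose path measure minimizes relative entropy with respect to a reference process determined by the interpolation schedules. This yields a new variational characterization of Föllmer processes that complements the classical Schrödinger bridge and stochastic control formulations, and it provides a conditional-expectation representation of the Föllmer drift that enables simulation-free estimation from data. We further show that, under the optimal diffusion coefficient, the path-space Kullback-Leibler divergence is independent of the interpolation schedule, making different schedules statistically equivalent in this variational sense. Numerical experiments illustrate the impact of the path-space variational optimality of Föllmer processes in probabilistic forecasting and data assimilation.

English abstract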

We construct and analyze generative diffusions that transport a point mass to a prescribed target distribution over a finite time horizon using the stochastic interpolant framework. The drift is expressed as a conditional expectation that can be estimated from independent samples without simulating stochastic processes. We show that the diffusion coefficient can be tuned a posteriori without changing the time-marginal distributions. Among all such tunings, we prove that minimizing the impact of estimation error on the path-space Kullback--Leibler divergence selects, in closed form, a Föllmer process -- a diffusion whose path measure minimizes relative entropy with respect to a reference process determined by the interpolation schedules alone. This yields a new variational characterization of Föllmer processes, complementing classical formulations via Schrödinger bridges and stochastic control, and provides a conditional-expectation representation of the Föllmer drift that enables simulation-free estimation from data. We further establish that, under this optimal diffusion coefficient, the path-space Kullback--Leibler divergence becomes independent of the interpolation schedule, rendering different schedules statistically equivalent in this variational sense. We provide numerical experiments to illustrate the impact of path-space variational optimality of Föllmer's processes in probabilistic forecasting and data assimilation applications.

2602.04907 2026-05-21 cs.LG cs.AI stat.ME

Causal Discovery from Heteroscedastic Stochastic Dynamical Systems under Imperfect Physical Models

Jianhong Chen, Naichen Shi, Xubo Yue

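AI summary: This paper proposes an integrative causal discovery framework that uses partial physical knowledge, encoded in stochastic differential equations, to improve causal graph recovery in dynamical systems, and analyzes robustness under imperfect physical models.

Details
Comments
101 pages
AI-generated abstract (translated from Chinese)

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models such as ordinary differential equations (ODEs) provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator, with a theoretically justified stabilization technique that improves the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze the robustness of our causal graph estimate to ODE misspecification and clarify how the stabilization technique balances numerical stability and statistical recoverability. Experiments on linear SDEs and nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show better graph recovery and robustness than data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.

English abstract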

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models, such as ordinary differential equations (ODEs), provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework for dynamical systems that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator with a theoretically justified stabilization technique to improve the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze robustness of our causal graph estimate to ODE misspecification and clarify how the introduced stabilization technique balances numerical stability and statistical recoverability. Experiments on linear SDEs and nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show improved graph recovery and robustness over data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.

2602.04092 2026-05-21 stat.AP econ.EM stat.ME

Time-to-Event Estimation with Unreliably Reported Events in Medicare Health Plan Payment

Oana M. Enache, Sherri Rose

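AI summary: This paper proposes time-to-event estimators for evaluating incident diagnostic coding and possible upcoding in Medicare, and introduces an open-source software package to support more reproducible methods development related to Medicare billing behavior.

Details
Comments
44 pages, 10 figures
AI-generated abstract (translated from Chinese)

OBJECTIVE: To propose time-to-event estimators that help evaluate incident diagnostic coding and possible upcoding in Medicare, and to introduce an open-source software package that supports more reproducible methods development relevant to Medicare billing behavior. STUDY SETTING AND DESIGN: Observational analysis of simulated upcoding based on coding by insurers or providers that may be incentivized by Medicare Advantage risk adjustment. DATA SOURCES AND ANALYTIC SAMPLE: Two years of separately simulated incident health condition coding data for a Medicare Advantage population and a Traditional Medicare population, with coding patterns aligned with known practices in each program. PRINCIPAL FINDINGS: We propose several novel time-to-event estimators of incident coding intensity and possible upcoding in Medicare Advantage, including estimators that account for unreliable reporting. We demonstrate estimator performance in simulated data that leverage the National Institutes of Health's All of Us study, and we develop an open-source R package that simulates realistic longitudinal labeled upcoding data, which were not previously available to researchers. In simulations, our novel estimators recovered differences in upcoding within and across monitoring periods. Undercoding had a limited effect on our novel estimators, whereas an existing estimator was more sensitive to it. CONCLUSIONS: Our proposed estimators can help researchers and policymakers track new coding behaviors (e.g., those that may be incentivized by risk adjustment formula updates) earlier and at scale while accounting for several real-world data considerations. In addition, the R package we provide can be used to improve the development, accessibility, and reproducible evaluation of coding intensity and upcoding methodology.

English abstract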

OBJECTIVE: To propose time-to-event estimators that help evaluate incident diagnostic coding and possible upcoding in Medicare as well as introduce an open-source software package that enables more reproducible methods development relevant to Medicare billing behavior. STUDY SETTING AND DESIGN: Observational analysis of simulated upcoding based on coding by insurers or providers that may be incentivized by Medicare Advantage risk adjustment. DATA SOURCES AND ANALYTIC SAMPLE: Two years of separately simulated incident health condition coding data for a Medicare Advantage population and a Traditional Medicare population where coding patterns are aligned with known practices in each program. PRINCIPAL FINDINGS: We propose several novel time-to-event estimators of incident coding intensity and possible upcoding in Medicare Advantage, including accounting for unreliable reporting. We demonstrate estimator performance in simulated data leveraging the National Institutes of Health's All of Us study and also develop an open source R package to simulate longitudinal realistic labeled upcoding data, which were not previously available for researchers. In simulations, our novel estimators recovered differences in upcoding within and across monitoring periods. Undercoding had a limited effect on our novel estimators while an existing estimator was more sensitive to undercoding. CONCLUSIONS: Our proposed estimators can help researchers and policymakers track new coding behaviors (e.g., as may be incentivized by risk adjustment formula updates) earlier and at scale while accounting for several real-world data considerations. Further, the R package we provide can be used to improve the development, accessibility, and reproducible evaluation of coding intensity and upcoding methodology.

2601.14991 2026-05-21 stat.ME stat.ML

Consistency of Honest Decision Trees and Random Forests

Martin Bladt, Rasmus Frigaard Lemvig

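AI summary: This paper studies several types of consistency for honest decision trees and random forests in the regression setting. Using elementary proofs that follow classical arguments for smoothing methods, it establishes weak consistency and almost sure convergence of honest trees and honest forest averages to the true regression function, along with uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling and a two-stage bootstrap sampling scheme, simplifying existing analyses and recovering several known results.

Details
AI-generated abstract (translated from Chinese)

We study various types of consistency of honest decision trees and random forests in the regression setting. In contrast to related literature, our proofs are elementary and follow the classical arguments used for smoothing methods. Under mild regularity conditions on the regression function and the data distribution, we establish weak consistency and almost sure convergence of honest trees and honest forest averages to the true regression function, and we also obtain uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling as well as a two-stage bootstrap sampling scheme. Our treatment synthesizes and simplifies existing analyses, in particular recovering several results as special cases. The simplicity of the arguments clarifies the close relationship between data-adaptive partitioning and kernel-type methods, offering an accessible route to understanding the asymptotic behavior of tree-based methods.

English abstract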

We study various types of consistency of honest decision trees and random forests in the regression setting. In contrast to related literature, our proofs are elementary and follow the classical arguments used for smoothing methods. Under mild regularity conditions on the regression function and data distribution, we establish weak and almost sure convergence of honest trees and honest forest averages to the true regression function, and moreover we obtain uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling and also a two-stage bootstrap sampling scheme. Our treatment synthesizes and simplifies existing analyses, in particular recovering several results as special cases. The elementary nature of the arguments clarifies the close relationship between data-adaptive partitioning and kernel-type methods, providing an accessible approach to understanding the asymptotic behavior of tree-based methods.
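
The abstract does not spell out what "honest" means. Under the usual definition (assumed here, not taken from the paper), the partition is chosen on one subsample and the leaf averages are computed on a separate, independent subsample. The sketch below shows that idea for a depth-1 tree on a scalar covariate. The function names and toy data are hypothetical.

```python
def sse(points):
    """Sum of squared errors of y around its mean (0.0 for an empty leaf)."""
    if not points:
        return 0.0
    mean = sum(y for _, y in points) / len(points)
    return sum((y - mean) ** 2 for _, y in points)


def fit_honest_stump(split_sample, estimate_sample):
    """Fit a depth-1 regression tree on scalar x with honest leaf estimates.

    Both arguments are lists of (x, y) pairs, assumed disjoint and non-empty;
    split_sample needs at least two distinct x values.
    """
    xs = sorted({x for x, _ in split_sample})
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]

    # Step 1: choose the threshold using split_sample only.
    threshold = min(
        candidates,
        key=lambda t: sse([p for p in split_sample if p[0] <= t])
        + sse([p for p in split_sample if p[0] > t]),
    )

    # Step 2: estimate leaf means using estimate_sample only.
    overall = sum(y for _, y in estimate_sample) / len(estimate_sample)

    def leaf_mean(points):
        # Fall back to the overall mean if a leaf receives no estimation points.
        return sum(y for _, y in points) / len(points) if points else overall

    left = [p for p in estimate_sample if p[0] <= threshold]
    right = [p for p in estimate_sample if p[0] > threshold]
    return threshold, leaf_mean(left), leaf_mean(right)


# Hypothetical toy data, for illustration only.
split_sample = [(0.1, 0.0), (0.2, 0.0), (0.8, 1.0), (0.9, 1.0)]
estimate_sample = [(0.3, 0.2), (0.4, 0.0), (0.6, 0.9), (0.7, 1.1)]
threshold, left_mean, right_mean = fit_honest_stump(split_sample, estimate_sample)
```

Because the leaf averages never see the responses used to place the split, each leaf estimate is a plain local average given the partition, which is the structure that makes classical smoothing arguments applicable.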

2601.07169 2026-05-21 math.PR cond-mat.stat-mech cs.DM math.CO math.ST stat.TH

Approximate FKG inequalities for phase-bound spin systems, with applications to central limit theorems for exponential random graphs

Satyaki Mukherjee, Vilas Winstein

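AI summary: This paper studies approximate FKG inequalities in phase-bound spin systems. It proves that, in the phase-coexistence regime, each phase internally satisfies an approximate FKG inequality, and uses this to complete the proof of central limit theorems within individual phases, answering a question posed by Bianchi et al.

Details
Comments
28 pages, 1 figure. Title, abstract, and introduction updated to clarify the focus of the article
AI-generated abstract (translated from Chinese)

The Fortuin-Kasteleyn-Ginibre (FKG) inequality is an important tool in monotone spin systems satisfying the FKG lattice condition: it provides positive correlations for all coordinate-wise increasing functions of the spins. The inequality plays an important role in the proofs of various central limit theorems (CLTs), including recent work on ferromagnetic exponential random graph models (ERGMs), in which a Hamiltonian tilt promotes the presence of small subgraphs such as triangles. However, the FKG lattice condition fails when a spin system is confined to a particular phase in the low-temperature regime of parameters. It is therefore unclear whether each phase internally has positive correlations for increasing functions, or whether the positive correlations in the overall model (a mixture of phases) arise mainly from the global choice of phase. In this article, we show that the individual phases in ERGMs do satisfy an approximate FKG inequality. We use this result to complete the proof of CLTs within individual phases, answering a question posed by Bianchi, Collet, and Magnanini. We present the FKG inequality for ERGMs as a corollary of a more general result that holds under certain inputs related to metastable mixing; we expect the general result to be widely applicable, and we devote a section to the details of its application to a class of generalized higher-order ferromagnetic Curie-Weiss models, where the required inputs are relatively transparent.

English abstract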

The Fortuin-Kasteleyn-Ginibre (FKG) inequality is an invaluable tool in monotone spin systems satisfying the FKG lattice condition, which provides positive correlations for all coordinate-wise increasing functions of spins. This inequality has numerous applications and plays an integral role in the proof of various central limit theorems (CLTs), including recent work on ferromagnetic exponential random graph models (ERGMs) wherein a Hamiltonian tilt promotes the presence of small subgraphs like triangles. However, the FKG lattice condition fails to hold when confining a spin system to a particular phase in the low-temperature regime of parameters. Thus it is not a priori clear if each phase internally has positive correlations for increasing functions, or if the positive correlations in the overall model (which is a mixture of phases) arise primarily from the global choice of phase. In this article, we show that the individual phases in ERGMs do indeed satisfy an approximate form of the FKG inequality internally. We use this to finish the proof of various CLTs within each individual phase in the phase-coexistence regime, answering a question posed by Bianchi, Collet, and Magnanini. We present the FKG inequality for ERGMs as a consequence of a more general result which holds under certain inputs related to metastable mixing; we expect this general result to be widely applicable, and we devote a section to spelling out the details of its application to a class of generalized higher-order ferromagnetic Curie-Weiss models where the necessary inputs are relatively transparent.

2511.23152 2026-05-21 cs.LG cond-mat.dis-nn math.OC math.RT stat.ML

A Differentiable Measure of Algebraic Complexity: Provably Exact Discovery of Group Structures

Dongsung Huh, Lior Horesh, Halyun Jeong

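AI summary: This paper proposes a differentiable measure of algebraic complexity. Working with the Cayley-table completion problem, it proves that the HyperCube operator-valued tensor factorization discovers group structures exactly, resolving the central open conjecture of Huh (2025).

Details
Comments
29 pages, 3 figures. All theoretical conjectures are formally proven as theorems and verified in Lean 4. v4: Minor typographical corrections
AI-generated abstract (translated from Chinese)

Discovering discrete algebraic rules from data is a fundamental challenge in machine learning. We formalize this problem through Cayley-table completion, an algebraic counterpart to classical matrix completion, in which the degree of associativity violation replaces linear rank as the intrinsic measure of complexity. We give a rigorous landscape analysis of HyperCube, an operator-valued tensor factorization, on the fully observed target table $δ$, proving that its global infimum $H_{\inf}(δ) := \inf_{Θ\in F_δ} H(Θ)$ implicitly defines an exact differentiable measure of this complexity. We show that HyperCube's native objective $H(Θ)$ decomposes into two components: geometric alignment (collinearity) and an inverse $\ell_2$ penalty. We establish that these continuous variational pressures induce core discrete properties: collinearity enforces associativity (Collinearity-Associativity Equivalence), while the inverse $\ell_2$ penalty reduces to an exact inverse rank penalty within the collinear manifold, driving the parameters toward full-rank unitarity. Consequently, we derive an absolute lower bound $H(Θ) \ge H_{\inf}(δ) \ge 3 \, |δ|$, where $|δ|$ is the target table size. We prove that this absolute floor is attained if and only if the target is isotopic to a group, and we characterize the global minimizer as the regular representation of the underlying group (up to unitary gauge), resolving the central open conjecture of Huh (2025). This work serves as an existence proof that certain discrete algebraic structures can be characterized exactly by differentiable measures, enabling gradient-based discovery without combinatorial search. All theoretical results are mechanically verified in Lean 4 and confirmed by small-scale experiments.

English abstract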

Discovering discrete algebraic rules from data is a fundamental challenge in machine learning. We formalize this problem through Cayley-table completion -- an algebraic counterpart to classical matrix completion -- where the degree of associativity violation replaces linear rank as the intrinsic measure of complexity. We provide a rigorous landscape analysis of HyperCube, an operator-valued tensor factorization, on the fully observed target table $δ$, proving that its global infimum $H_{\inf}(δ) := \inf_{Θ\in F_δ} H(Θ)$ implicitly defines an exact differentiable measure for this complexity. We show that HyperCube's native objective $H(Θ)$ decomposes into two components: geometric alignment (collinearity) and an inverse $\ell_2$ penalty. We establish that these continuous variational pressures induce core discrete properties: collinearity enforces associativity (Collinearity--Associativity Equivalence), and the inverse $\ell_2$ penalty reduces to an exact inverse rank penalty within the collinear manifold, driving the parameters toward full-rank unitarity. Consequently, we derive an absolute lower bound $H(Θ) \ge H_{\inf}(δ) \ge 3 \, |δ|$, where $|δ|$ is the target table size. We prove this absolute floor is attained if and only if the target is isotopic to a group, and characterize the global minimizer as the regular representation of the underlying group (up to unitary gauge), resolving the central open conjecture of Huh (2025). This work serves as an existence proof that certain discrete algebraic structures can be exactly characterized by differentiable measures, enabling gradient-based discovery without the need for combinatorial search. All theoretical results are mechanically verified in Lean 4 and confirmed via small-scale experiments.
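
To make "degree of associativity violation" concrete, the sketch below counts, for a fully observed Cayley table, the triples on which associativity fails. This is a naive discrete count written for illustration. It is not the HyperCube objective $H(Θ)$ or the measure $H_{\inf}(δ)$ from the abstract, and the function name and example tables are assumptions.

```python
from itertools import product


def associativity_violations(table):
    """Count triples (a, b, c) with (a*b)*c != a*(b*c) in a Cayley table.

    table[a][b] is the product a*b, with elements labelled 0..n-1.
    """
    n = len(table)
    return sum(
        1
        for a, b, c in product(range(n), repeat=3)
        if table[table[a][b]][c] != table[a][table[b][c]]
    )


# Cyclic group Z_3 (addition mod 3): associative, so the count is zero.
z3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]

# Subtraction mod 3: a Latin square, but not associative.
sub3 = [[(a - b) % 3 for b in range(3)] for a in range(3)]

violations_z3 = associativity_violations(z3)
violations_sub3 = associativity_violations(sub3)
```

The subtraction table differs from the addition table only by relabelling the second argument ($b \mapsto -b$), yet its count is nonzero. The raw count is therefore not invariant under such relabellings, whereas the abstract's criterion is stated up to isotopy, so the two should not be equated.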

2511.21836 2026-05-21 stat.ME

A simple and powerful test of vaccine waning

Gellért Perényi, Matias Janvin, Mats J. Stensrud

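AI summary: This paper proposes a new statistical test for whether a treatment effect stays constant over time at the individual level, giving a more powerful way to detect vaccine waning, and provides new results that bound the waning effect.

Details
AI-generated abstract (translated from Chinese)

Determining whether vaccine efficacy wanes is important for individual and public decision making. Yet quantifying waning is a subtle task. Classical approaches cannot be interpreted as measures of declining efficacy unless we impose unreasonable assumptions. Recently, formal causal estimands have been proposed to quantify vaccine waning. These estimands can be bounded under weaker assumptions, but the bounds are often too wide to support claims about the presence of waning. We propose a different approach: a formal test of whether a treatment effect is constant over time at the individual level. The test offers a considerable gain in statistical power over existing approaches and remains valid under interpretable assumptions in vaccine trials. We illustrate the gain in power through real and simulated examples, using three different approaches to compute the test statistic. Two of these approaches rely solely on summary data available from existing clinical trials. Beyond our test, we also give new results that bound the waning effect. We use these methods to reanalyze data from a randomized controlled trial of the BNT162b2 COVID-19 vaccine. Although prior analyses did not establish waning, our test rejects the null hypothesis of no waning.

English abstract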

Determining whether vaccine efficacy wanes is important for individual and public decision making. Yet, quantification of waning is a subtle task. The classical approaches cannot be interpreted as measures of declining efficacy unless we impose unreasonable assumptions. Recently, formal causal estimands designed to quantify vaccine waning have been proposed. These estimands can be bounded under weaker assumptions, but the bounds are often too wide to make claims about the presence of waning. We propose a different approach: a formal test to assess whether a treatment effect is constant over time at the individual level. This test provides a considerable power gain over existing approaches and is valid under interpretable assumptions in vaccine trials. We illustrate the increase in power through real and simulated examples, using three different approaches to compute the test statistics. Two of these approaches are based solely on summary data, accessible from existing clinical trials. Beyond our test, we also give new results that bound the waning effect. We use our methods to reanalyze data from a randomized controlled trial of the BNT162b2 COVID-19 vaccine. While prior analysis did not establish waning, our test rejects the null hypothesis of no waning.

2511.21223 2026-05-21 stat.ML cs.LG

Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference

Jasraj Singh, Shelvia Wongso, Jeremie Houssineau, Badr-Eddine Chérief-Abdellatif

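AI summary: This paper proposes a variational inference method based on possibility theory. By establishing a maxitive Donsker-Varadhan formulation, it removes the reliance of conventional variational inference on additivity, and it introduces the CBOpt family of optimizers, which achieves competitive performance on image classification tasks.

Details
Comments
37 pages, 3 figures, 13 tables
AI-generated abstract (translated from Chinese)

Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, which often makes analytical treatment impossible and requires heavy reliance on approximations. Possibility theory, an imprecise probability framework, lets us model epistemic uncertainty directly instead of relying on a subjective interpretation of probabilities. Although this framework offers robustness and interpretability under sparse or imprecise information, adapting VI to the possibilistic setting requires rethinking core concepts such as divergences, which presuppose additivity. In this work, we develop a principled formulation of possibilistic VI by establishing a maxitive analogue of the classical Donsker-Varadhan formulation. The resulting framework lets us derive a learning rule for possibilistic VI with exponential-family candidates and practical update rules for neural-network training, giving rise to a family of optimizers called CBOpt. Finally, we show that CBOpt achieves competitive performance on both in-domain and out-of-domain image classification tasks.

English abstract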

Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, often rendering analytical treatment impossible and necessitating heavy reliance on approximations. Possibility theory, an imprecise probability framework, allows us to directly model epistemic uncertainty instead of relying on a subjective interpretation of probabilities. While this framework provides robustness and interpretability under sparse or imprecise information, adapting VI to the possibilistic setting requires rethinking core concepts such as divergences, which presuppose additivity. In this work, we develop a principled formulation for performing possibilistic VI by establishing a maxitive analogue of the classical Donsker-Varadhan formulation. The resulting framework enables us to derive a learning rule for possibilistic VI with exponential-family candidates and practical update rules for neural-network training, giving rise to a family of optimizers termed CBOpt. Finally, we demonstrate that CBOpt achieves competitive performance on both in-domain and out-of-domain image classification tasks.

2511.20183 2026-05-21 stat.AP

Multi-fidelity Gaussian process regression for noisy outputs and non-nested experimental designs: a comparison between the recursive and non-recursive formulations

多保真度高斯过程回归用于噪声输出和非嵌套实验设计:递归与非递归公式的比较

Nils Baillie, Baptiste Kerleguer, Cyril Feau, Josselin Garnier

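AI summary: This paper studies recursive auto-regressive multi-fidelity Gaussian process regression when the high- and low-fidelity data are noisy and non-nested. It proposes a decoupled optimization strategy based on the expectation-maximization algorithm and compares it with the classical non-recursive approach, showing that the recursive strategy shortens training time while keeping predictive accuracy competitive.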

详情
AI中文摘要

本文研究了在高保真和低保真数据存在噪声和非嵌套情况下的递归自回归多保真度高斯过程回归方法。我们提出了一种基于期望最大化算法的解耦优化策略,利用递归模型的结构。特别是,我们推导了当缩放因子建模为参数线性预测器时的闭式更新公式。该方法与经典非递归方法的完全耦合似然最大化进行了比较。一系列基准实验,涵盖从简单到复杂的应用,突显了两种方法的性能。结果表明,所提出的递归策略在大规模低保真数据集可用时显著减少了训练时间,同时保持了竞争性的预测精度和不确定性估计。

英文摘要

This paper investigates a recursive formulation of auto-regressive multi-fidelity Gaussian process regression in the challenging setting of noisy and non-nested high- and low-fidelity data. We propose a decoupled optimization strategy based on the expectation-maximization algorithm, which exploits the structure of the recursive model. In particular, we derive closed-form update formulas when the scaling factor is modeled as a parametric linear predictor. This approach is compared with the fully coupled likelihood maximization of the classical non-recursive formulation introduced by Kennedy and O'Hagan. A series of benchmark experiments, covering applications of increasing complexity, highlights the performance of both approaches. The results demonstrate that the proposed recursive strategy significantly reduces training time, especially when large low-fidelity datasets are available, while maintaining competitive predictive accuracy and uncertainty estimation.

2511.07846 2026-05-21 cs.DS cs.CC math.ST stat.TH

Model-agnostic super-resolution in high dimensions

不依赖模型的高维超分辨率

Xi Chen, Anindya De, Yizhi Huang, Shivam Nadimpalli, Rocco A. Servedio, Tianqi Yang

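AI summary: This paper studies model-agnostic super-resolution in high dimensions. Analyzing the reconstruction of general non-negative signals (equivalently, distributions), it introduces a new notion of "heavy hitter" reconstruction and shows that the number of Fourier coefficients required depends on the reconstruction criterion: about exp(d) for Wasserstein reconstruction versus about exp(sqrt(d)) for heavy hitter reconstruction.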

详情
AI中文摘要

超分辨率问题大致是指在给定可能含噪的低阶傅里叶系数信息的情况下,重建未知信号的高精度。先前结果对信号施加了强模型假设,通常要求其是空间分离点源的线性组合。在本文中,我们分析了超分辨率问题的一个非常一般性的版本,考虑了完全一般的非负信号(等价于分布)在d维单位立方体[0,1)^d上的情况;我们不假设点源之间的空间分离,甚至不假设分布是有限个点源的线性组合。自然地提出问题:在这样的通用设置下,关于超分辨率能说些什么?- 作为预热,我们首先给出了在Wasserstein距离下重建分布的一组结果。我们建立了截断频率T和噪声幅度κ的上界和下界,使得精确重建成为可能:我们证明对于d维分布,估计约exp(d)个傅里叶系数是准确Wasserstein重建的必要且充分条件。- 作为主要结果,我们定义了新的'heavy hitter'重建概念,本质上意味着实现所有'足够密集'区域的高精度重建。我们给出了在该概念下准确重建所需的截断频率T和噪声幅度κ的上界和下界。我们的结果表明(与Wasserstein重建形成鲜明对比),仅需约exp(sqrt(d))个傅里叶系数的准确估计即可实现heavy hitter重建。

英文摘要

The problem of super-resolution, roughly speaking, is to reconstruct an unknown signal to high accuracy, given (potentially noisy) information about its low-degree Fourier coefficients. Prior results on super-resolution have imposed strong modeling assumptions on the signal, typically requiring that it is a linear combination of spatially separated point sources. In this work we analyze a very general version of the super-resolution problem by considering completely general non-negative signals (equivalently, distributions) over the $d$-dimensional torus $[0,1)^d$; we do not assume any spatial separation between point sources, or even that the distribution is a finite linear combination of point sources. The question naturally arises: what can be said about super-resolution in such a general setting?
- As a warm-up, we first give a set of results for reconstructing distributions under the Wasserstein distance. We establish essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction is possible: we show that for $d$-dimensional distributions, estimates of $\approx \exp(d)$ many Fourier coefficients are both necessary and sufficient for accurate Wasserstein reconstruction.
- As our main result, we define a new notion of "heavy hitter" reconstruction for distributions, which essentially amounts to achieving high-accuracy reconstruction of all "sufficiently dense" regions of the distribution. We give essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction is possible under this notion. Our results show that (in sharp contrast with Wasserstein reconstruction) accurate estimates of only $\approx \exp(\sqrt{d})$ many Fourier coefficients are both necessary and sufficient for heavy hitter reconstruction.

2511.01705 2026-05-21 stat.ME cs.SI stat.AP

Z-Dip: a standardized measure for data modality assessment

Z-Dip:数据模态评估的标准度量

Edoardo Di Martino, Matteo Cinelli, Roy Cerqueti

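AI summary: This paper proposes Z-Dip, a standardized measure of multimodality. It addresses the classical Dip Test's dependence on sample size and its need for significance calibration, giving a more interpretable and comparable assessment of modality.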

详情
AI中文摘要

检测经验分布中的多模态性是统计学和数据分析中的基本问题,其应用范围从聚类到复杂系统研究。然而,在实践中,以一致且可比的方式评估偏离单模态性仍具有挑战性。广泛使用的Hartigan和Hartigan的Dip测试展示了这些困难,因为其统计量的解释强烈依赖于样本大小,需要校准以确定显著性,并且对于大样本,表现出增加的敏感性,导致对任意小的偏离都会拒绝单模态性。我们引入Z-Dip,作为一种多模态性的标准化度量,以解决这些限制。通过将Dip统计量视为单模态性假设下的随机变量,并对其观测值进行标准化,所提出的方法产生可以直接比较不同大小数据集的分数。通过基于模拟的校准,我们推导出一个通用的决策阈值,能够接近重现经典Dip测试决策,而无需样本大小特定的调整。在模拟数据和超过88,000个经验意见分布上的广泛验证显示,与经典Dip测试几乎完美一致,同时提供更可解释和可比的模态度量。最后,我们提出基于降采样的修正,以减轻极大数据样本中的残余敏感性。开源软件和参考表被提供以促进实际应用。

英文摘要

Detecting multimodality in empirical distributions is a fundamental problem in statistics and data analysis, with applications ranging from clustering to the study of complex systems. In practice, however, assessing departures from unimodality in a consistent and comparable way remains challenging. Widely used methods such as Hartigan and Hartigan's Dip Test illustrate these difficulties: the interpretation of their statistics depends strongly on sample size and requires calibration to determine significance, and, for large samples, the tests become increasingly sensitive, rejecting unimodality for arbitrarily small deviations from the null. We introduce Z-Dip, a standardized measure of multimodality that addresses these limitations. By treating the Dip statistic as a random variable under the null hypothesis of unimodality and standardizing its observed value, the proposed approach yields scores that are directly comparable across datasets of different sizes. Using simulation-based calibration, we derive a universal decision threshold that closely reproduces classical Dip Test decisions without requiring sample-size-specific adjustments. Extensive validation on simulated data and on more than 88,000 empirical opinion distributions shows near-perfect agreement with the classical Dip Test while providing a more interpretable and comparable measure of modality. Finally, we propose a downsampling-based correction that mitigates residual sensitivity in extremely large samples. Open-source software and reference tables are provided to facilitate practical adoption.
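The idea of standardizing a test statistic against its simulated null distribution can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `statistic` is a hypothetical placeholder argument where a real Dip-statistic routine would go, the null is simulated from uniform samples of the same size, and `largest_gap` is only a crude stand-in that expects data scaled to [0, 1].

```python
import random
import statistics

def standardized_score(sample, statistic, n_sim=200, seed=0):
    """Z-score of statistic(sample) against a simulated null of the same size."""
    rng = random.Random(seed)
    n = len(sample)
    # Assumption: the unimodal null is simulated with uniform samples of size n.
    null_values = [
        statistic([rng.random() for _ in range(n)]) for _ in range(n_sim)
    ]
    mu = statistics.fmean(null_values)
    sigma = statistics.stdev(null_values)
    return (statistic(sample) - mu) / sigma

def largest_gap(xs):
    """Placeholder statistic (NOT the Dip statistic): largest gap between sorted points."""
    s = sorted(xs)
    return max(b - a for a, b in zip(s, s[1:]))

# Two well-separated clusters on [0, 1] produce a large positive score.
bimodal = [0.1 + 0.001 * i for i in range(25)] + [0.9 + 0.001 * i for i in range(25)]
score = standardized_score(bimodal, largest_gap)
```

Because the null is simulated at the same sample size as the data, the resulting score is on a common scale across datasets of different sizes, which is the property the abstract emphasizes.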

2509.08795 2026-05-21 stat.ME

On the inclusion of non-concurrent controls in platform trials with an interim analysis

关于在包含中期分析的平台试验中纳入非同时对照的探讨

Pavla Krotka, Martin Posch, Marta Bofill Roig

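AI summary: This paper studies how to include non-concurrent controls in platform trials with an interim analysis. It examines how non-concurrent controls affect treatment-effect estimation and proposes a new estimator that reduces bias and type I error rate inflation.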

详情
Journal ref
Statistics in Medicine (2026)
AI中文摘要

利用非同时对照可以增强平台试验的分析。由于纳入此类数据可能会在存在时间趋势时引入偏差,因此已提出了一些调整时间的方法。然而,迄今为止,这些方法在包含中期分析的平台试验中的行为尚未系统地得到研究。为了评估在使用非同时对照的试验中中期分析的影响,我们考虑了一个包含两个实验臂和一个共享对照的平台试验,其中第二个实验臂较晚进入。我们关注一种频率回归模型,该模型利用非同时对照来估计第二个臂的治疗效应,并使用阶梯函数调整时间以考虑时间变化。我们证明,如果在第一个臂中进行中期分析,而回归模型未进行调整,则可能会引入对第二个臂效应估计的偏差,并研究边际偏差和在第一个臂中期后继续的条件下偏差如何依赖于不同的试验设计参数。此外,我们提出了一种新的估计第二个臂治疗效应的估计量,旨在消除由第一个臂的中期分析和时间趋势引入的偏差,并在模拟研究中评估其性能。新提出的估计量被证明可以显著减少偏差和I类错误率膨胀,同时相比仅使用同时对照的分析具有更高的功效。

英文摘要

The analysis of platform trials can be enhanced by utilizing non-concurrent controls. Since including this data might also introduce bias in the treatment effect estimators if time trends are present, methods for incorporating non-concurrent controls adjusting for time have been proposed. However, so far their behavior has not been systematically investigated in platform trials that include interim analyses. To evaluate the impact of an interim analysis in trials utilizing non-concurrent controls, we consider a platform trial featuring two experimental arms and a shared control, with the second experimental arm entering later. We focus on a frequentist regression model that uses non-concurrent controls to estimate the treatment effect of the second arm and adjusts for time using a step function to account for temporal changes. We show that performing an interim analysis in Arm 1 may introduce bias in the point estimation of the effect in Arm 2, if the regression model is used without adjustment, and investigate how the marginal bias and bias conditional on the first arm continuing after the interim depend on different trial design parameters. Moreover, we propose a new estimator of the treatment effect in Arm 2, aiming to eliminate the bias introduced by both the interim analysis in Arm 1 and the time trends, and evaluate its performance in a simulation study. The newly proposed estimator is shown to substantially reduce the bias and type I error rate inflation while leading to power gains compared to an analysis using only concurrent controls.

2505.24275 2026-05-21 cs.LG math.OC stat.ML

GradPower: Powering Gradients for Faster Language Model Pre-Training

GradPower: 通过梯度加速更快的语言模型预训练

Jinbo Wang, Mingze Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu

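AI summary: This paper proposes GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. It applies an elementwise sign-power transformation to the gradient before passing it to a base optimizer, with no changes to the optimizer's internal logic or hyperparameters, and achieves lower terminal loss across architectures, parameter scales, datasets, and learning-rate schedules.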

详情
Comments
24 pages, accepted by ICML 2026
AI中文摘要

我们提出GradPower,一种轻量级的梯度变换技术,用于加速语言模型预训练。给定一个梯度向量$g=(g_i)_i$,GradPower首先应用元素级符号幂变换:$φ_p(g)=({\rm sign}(g_i)|g_i|^p)_{i}$,其中$p>0$为固定值,然后将变换后的梯度输入基础优化器。值得注意的是,GradPower只需单行代码更改,无需修改基础优化器的内部逻辑,包括超参数。当应用于Adam(称为AdamPower)时,GradPower在多种架构(LLaMA、Qwen2MoE)、参数规模(66M到2B)、数据集(C4、OpenWebText)和学习率调度方案(余弦、warmup-stable-decay)中均一致取得更低的终端损失。最显著的收益出现在训练现代混合专家模型时使用warmup-stable-decay调度方案。GradPower还无缝集成到其他最先进的优化器中,如Muon,从而进一步提升性能。最后,我们提供了理论分析,揭示了GradPower的内在机制,并突显了梯度噪声的影响。

英文摘要

We propose GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. Given a gradient vector $g=(g_i)_i$, GradPower first applies the elementwise sign-power transformation: $φ_p(g)=({\rm sign}(g_i)|g_i|^p)_{i}$ for a fixed $p>0$, and then feeds the transformed gradient into a base optimizer. Notably, GradPower requires only a single-line code change and no modifications to the base optimizer's internal logic, including the hyperparameters. When applied to Adam (termed AdamPower), GradPower consistently achieves lower terminal loss across diverse architectures (LLaMA, Qwen2MoE), parameter scales (66M to 2B), datasets (C4, OpenWebText), and learning-rate schedules (cosine, warmup-stable-decay). The most pronounced gains are observed when training modern mixture-of-experts models with warmup-stable-decay schedules. GradPower also integrates seamlessly with other state-of-the-art optimizers, such as Muon, yielding further improvements. Finally, we provide theoretical analyses that reveal the underlying mechanism of GradPower and highlight the influence of gradient noise.
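The sign-power transformation defined in the abstract is simple enough to write out directly. The sketch below is illustrative only: the abstract applies the transform in front of Adam, whereas here a plain gradient-descent step (`sgd_step`, a hypothetical helper) stands in as the base optimizer, and the value `p=0.5` is an arbitrary choice, not one taken from the paper.

```python
import math

def sign_power(grad, p):
    """Elementwise sign-power transform: sign(g_i) * |g_i|**p for a fixed p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    return [math.copysign(abs(g) ** p, g) for g in grad]

def sgd_step(params, grad, lr):
    """Stand-in base optimizer (assumption): one plain gradient-descent step."""
    return [w - lr * g for w, g in zip(params, grad)]

params = [1.0, -2.0]
grad = [4.0, -9.0]

# The transform is applied to the gradient before the optimizer sees it;
# the optimizer itself is left untouched.
new_params = sgd_step(params, sign_power(grad, p=0.5), lr=0.1)
```

The only change relative to an untransformed update is the call to `sign_power` on the gradient, which matches the abstract's description of a single-line change in front of the base optimizer.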

2504.05431 2026-05-21 stat.ME math.ST stat.TH

A Generalized Tangent Approximation based Variational Inference Framework for Strongly Super-Gaussian Likelihoods

一种基于切线近似的变分推断框架用于强超高斯似然模型

Somjit Roy, Pritam Dey, Debdeep Pati, Bani K. Mallick

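AI summary: This paper proposes a variational framework based on tangent transformation for probability models with strongly super-Gaussian likelihoods. It uses convex duality to construct tangent minorants of the log-likelihood, which induces conjugacy with Gaussian priors over model parameters in otherwise intractable settings. Under mild assumptions on the data-generating mechanism it establishes algorithmic convergence guarantees, and it derives near-minimax optimal bounds for the variational risk.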

详情
Comments
135 pages, 51 figures, 13 tables, Revision Submitted
AI中文摘要

变分推断作为一种替代马尔可夫链蒙特卡罗采样的方法,在使复杂贝叶斯模型的可扩展计算成为可能方面发挥了变革性的作用。然而,现有方法往往依赖于刚性的模型特定公式或随机黑箱优化程序。切线近似是一种原理性的结构化变分方法,利用了底层概率模型的几何特性。然而,其用途主要局限于逻辑回归及相关建模领域。在本文中,我们提出了一种基于切线变换的新型变分框架,用于广泛概率模型类,这些模型由强超高斯似然特征定义。我们的方法利用凸对偶性来构造对数似然的切线下界,从而在不可行的设置中诱导高斯先验与模型参数的共轭性。在数据生成机制的温和假设下,我们建立了算法收敛保证,这一贡献与通常可用的黑箱变分方法的有限理论保证形成对比。此外,我们推导出近最优的变分风险界。我们提出的方法在模拟和真实数据场景中的优越性能得到了展示,这些场景在可扩展性和一致捕捉复杂底层数据结构方面挑战了最先进的变分算法。

英文摘要

Variational inference, as an alternative to Markov chain Monte Carlo sampling, has played a transformative role in enabling scalable computation for complex Bayesian models. Nevertheless, existing approaches often depend on either rigid model-specific formulations or stochastic black-box optimization routines. Tangent approximation is a principled class of structured variational methods that exploits the geometry of the underlying probability model. However, its utility has largely been confined to logistic regression and related modeling regimes. In this article, we propose a novel variational framework based on tangent transformation for a broad class of probability models characterized by strongly super-Gaussian likelihoods. Our method leverages convex duality to construct tangent minorants of the log-likelihood, thereby inducing conjugacy with Gaussian priors over model parameters in an otherwise intractable setup. Under mild assumptions on the data-generating mechanism, we establish algorithmic convergence guarantees, a contribution that stands in contrast to the limited theoretical assurances typically available for black-box variational methods. Additionally, we derive near-minimax optimal bounds for the variational risk. Superior performance of our proposed methodology is illustrated on simulated and real-data scenarios that challenge state-of-the-art variational algorithms in terms of scalability and their ability to consistently capture complex underlying data structure.

2503.19066 2026-05-21 math.PR stat.ML

Accelerating Langevin Monte Carlo Sampling: A Large Deviations Analysis

加速Langevin抽样:大偏差分析

Nian Yao, Pervez Ali, Xihua Tao, Lingjiong Zhu

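AI summary: This paper uses large deviations theory to study the acceleration of variants of the overdamped Langevin dynamics, and illustrates their efficiency with experiments on synthetic and real data.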

详情
Comments
53 pages, 5 figures
AI中文摘要

Langevin算法是流行的马尔可夫链蒙特卡罗方法,常用于解决机器学习中高维大规模抽样问题。最经典的Langevin蒙特卡罗算法基于过阻尼Langevin动力学。有许多Langevin动力学的变种在实践中表现出优越的性能。在本文中,我们通过大偏差理论的视角提供了一种统一的方法来研究这些变种的加速问题。使用合成和实际数据的数值实验展示了这些变种的效率。

英文摘要

Langevin algorithms are popular Markov chain Monte Carlo methods that are often used to solve high-dimensional large-scale sampling problems in machine learning. The most classical Langevin Monte Carlo algorithm is based on the overdamped Langevin dynamics. There are many variants of Langevin dynamics that often show superior performance in practice. In this paper, we provide a unified approach to study the acceleration of the variants of the overdamped Langevin dynamics through the lens of large deviations theory. Numerical experiments using both synthetic and real data are provided to illustrate the efficiency of these variants.
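For orientation, the classical overdamped Langevin Monte Carlo update that the accelerated variants are measured against can be sketched as follows. This is only the textbook baseline (the unadjusted Langevin algorithm), not any of the variants analyzed in the paper; the target, step size, and chain length are assumptions chosen for illustration.

```python
import math
import random

def ula_samples(grad_U, x0, step, n_steps, seed=0):
    """Unadjusted Langevin algorithm in 1D for a target density proportional to exp(-U)."""
    rng = random.Random(seed)
    x, out = x0, []
    for _ in range(n_steps):
        # Euler-Maruyama discretization of dX = -grad U(X) dt + sqrt(2) dW.
        x = x - step * grad_U(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# Illustrative target: standard Gaussian, U(x) = x**2 / 2, so grad U(x) = x.
samples = ula_samples(lambda x: x, x0=0.0, step=0.1, n_steps=20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With a fixed step size the chain carries a small discretization bias, so the sample variance sits slightly above the target's variance of 1.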

2502.06178 2026-05-21 math.OC cs.LG stat.ML

Bayesian Optimization by Kernel Regression and Density-based Exploration

基于核回归和密度探索的贝叶斯优化

Tansheng Zhu, Hongyu Zhou, Ke Jin, Xusheng Xu, Qiufan Yuan, Lijie Ji

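AI summary: This study proposes BOKE, a Bayesian optimization algorithm that combines kernel regression with density-based exploration. It reduces the computational cost to quadratic, and its convergence and effectiveness are established theoretically and experimentally.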

详情
AI中文摘要

贝叶斯优化在优化昂贵评估的黑盒函数时非常有效,但因高斯过程的每次迭代三次计算复杂度而面临显著的计算挑战,导致总时间复杂度与迭代次数的四次方成正比。为了解决这一限制,我们提出了一种新的算法,即基于核回归和密度探索的贝叶斯优化(BOKE)。BOKE利用核回归进行高效的函数近似,核密度用于探索,并将它们整合到置信界标准中以指导优化过程,从而将计算成本降低到二次。我们的理论分析严格建立了在噪声评估下的BOKE全局收敛性。通过广泛的数值实验,在合成和现实优化任务中,我们证明了BOKE不仅在与高斯过程方法和其他基线方法相比具有竞争力,而且表现出优越的计算效率。这些结果突显了BOKE在资源受限环境中的有效性,为工程应用中的优化问题提供了一种实用的方法。

英文摘要

Bayesian optimization is highly effective for optimizing expensive-to-evaluate black-box functions, but it faces significant computational challenges due to the cubic per-iteration cost of Gaussian processes, which results in a total time complexity that is quartic with respect to the number of iterations. To address this limitation, we propose a novel algorithm, Bayesian optimization by kernel regression and density-based exploration (BOKE). BOKE uses kernel regression for efficient function approximation, kernel density for exploration, and integrates them into the confidence bound criteria to guide the optimization process, thus reducing computational costs to quadratic. Our theoretical analysis rigorously establishes the global convergence of BOKE under noisy evaluations. Through extensive numerical experiments on both synthetic and real-world optimization tasks, we demonstrate that BOKE not only performs competitively compared to Gaussian process-based methods and several other baseline methods but also exhibits superior computational efficiency. These results highlight BOKE's effectiveness in resource-constrained environments, providing a practical approach for optimization problems in engineering applications.

2410.23212 2026-05-21 stat.ML cs.LG math.ST stat.TH

Improved convergence rate of kNN graph Laplacians: differentiable self-tuned affinity

kNN图拉普拉斯算子的改进收敛速度:可微自调亲和力

Xiuyuan Cheng, Yixuan Tan, Nan Wu

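AI summary: This paper studies the convergence rate of kNN graph Laplacians built with a differentiable self-tuned affinity. Through a refined analysis of the kNN estimator, it proves that in the manifold data setting the kNN graph Laplacian converges to the limiting manifold operator at the rate O(N^{-2/(d+6)}), up to a log factor, and it validates the theory with numerical experiments.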

详情
AI中文摘要

在基于图的数据分析中,k最近邻(kNN)图因其对局部数据密度的适应性而被广泛应用。允许图中边的加权,核化图亲和力提供了一种更一般的kNN图,其中kNN距离用于自适应地设置核带宽。在本文中,我们考虑了一类一般的kNN图,其中图亲和力为W_{ij}=ε^{-d/2}k_0(||x_i -x_j||^2/εϕ(ρ(x_i),ρ(x_j))^2),其中ρ(x)是点x的(重新缩放的)kNN距离,ϕ是一个对称双变量函数,k_0是一个非负函数。在流形数据设定下,其中N个i.i.d.样本x_i从一个未知的d维流形上的密度p中抽取,我们证明了在k_0和ϕ具有C^3正则性并满足其他技术条件时,kNN图拉普拉斯算子以O(N^{-2/(d+6)})的速度收敛到极限流形算子(取决于p),并验证了理论结果。

英文摘要

In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN graph where the $k$NN distance is used to set the kernel bandwidth adaptively. In this work, we consider a general class of $k$NN graph where the graph affinity is $W_{ij} = ε^{-d/2} k_0 ( \| x_i - x_j \|^2 / εϕ( \hat ρ(x_i), \hat ρ(x_j) )^2 ) $, with $\hatρ(x)$ being the (rescaled) $k$NN distance at the point $x$, $ϕ$ a symmetric bi-variate function, and $k_0$ a non-negative function on $[0,\infty)$. Under the manifold data setting, where $N$ i.i.d. samples $x_i$ are drawn from a density $p$ on a $d$-dimensional unknown manifold embedded in a high dimensional Euclidean space, we prove the operator pointwise convergence of the $k$NN graph Laplacian to the limiting manifold operator (depending on $p$) at the rate of $O(N^{-2/(d+6)})$, up to a log factor, when $k_0$ and $ϕ$ have $C^3$ regularity and satisfy other technical conditions. This is obtained when $ε\sim N^{-2/(d+6)}$ and $k \sim N^{6/(d+6)}$, both at the optimal order to balance the theoretical bias and variance errors. Our improved convergence rate is based on a refined analysis of the $k$NN estimator, which can be of independent interest. We validate our theory by numerical experiments on simulated data.
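The affinity formula in the abstract can be made concrete with a small sketch. Several choices here are assumptions made for illustration and are not taken from the paper: the denominator is read as $ε\,ϕ(\hat ρ(x_i), \hat ρ(x_j))^2$, $k_0(r) = e^{-r}$, $ϕ(u, v) = \sqrt{uv}$, and $\hat ρ$ is the raw (unrescaled) distance to the $k$-th nearest neighbor.

```python
import math

def knn_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor (excluding itself)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[k - 1]

def self_tuned_affinity(points, k, eps, d):
    """W_ij = eps**(-d/2) * k0(|x_i - x_j|^2 / (eps * phi(rho_i, rho_j)^2))."""
    n = len(points)
    rho = [knn_distance(points, i, k) for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            phi = math.sqrt(rho[i] * rho[j])  # assumed symmetric bivariate function
            r = math.dist(points[i], points[j]) ** 2 / (eps * phi ** 2)
            W[i][j] = eps ** (-d / 2) * math.exp(-r)  # assumed k0(r) = exp(-r)
    return W

# Three points on a line (d = 1); the isolated point at 3.0 gets a wider bandwidth.
W = self_tuned_affinity([(0.0,), (1.0,), (3.0,)], k=1, eps=1.0, d=1)
```

The sketch assumes distinct points; duplicate points would give a zero kNN distance and a division by zero. Because `phi` is symmetric in its arguments, the resulting matrix is symmetric.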

2407.08976 2026-05-21 stat.ML cs.LG math.ST stat.TH

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

核两样本检验中计算与统计的权衡:随机傅里叶特征

Ikjun Choi, Ilmun Kim

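AI summary: This paper studies the trade-off between computational cost and statistical power for the MMD test approximated with random Fourier features. It proves that, with a careful choice of the number of random features, the test attains the same minimax separation rates as the MMD test in sub-quadratic time.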

详情
AI中文摘要

近年来,两样本检验方法得到了快速发展,其中最大均值差异(MMD)检验已成为处理复杂和高维数据的有效工具。尽管MMD检验在成功和广泛应用方面表现突出,但其二次时间复杂度限制了大规模分析的应用。为了解决这一问题,本文重新审视了使用随机傅里叶特征近似的MMD检验,并研究其计算-统计权衡。我们首先揭示,只有当随机特征数量趋于无穷时,近似MMD检验才能在点估计上保持一致性。随后,我们考虑检验的均匀功效,并在最小最大检验框架下研究时间-功效权衡。我们的结果表明,通过精心选择随机特征数量,可以在亚二次时间内达到与MMD检验相同的最小最大分离率。我们基于不同的分布假设(如Sobolev球内的密度)展示了这一点。理论发现通过模拟研究得到验证。

英文摘要

Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread adoption, the primary limitation of the MMD test has been its quadratic-time complexity, which poses challenges for large-scale analysis. While various approaches have been proposed to expedite the procedure, it has been unclear whether it is possible to attain the same power guarantee as the MMD test at sub-quadratic time cost. To fill this gap, we revisit the approximated MMD test using random Fourier features, and investigate its computational-statistical trade-off. We start by revealing that the approximated MMD test is pointwise consistent in power only when the number of random features approaches infinity. We then consider the uniform power of the test and study the time-power trade-off under the minimax testing framework. Our result shows that, by carefully choosing the number of random features, it is possible to attain the same minimax separation rates as the MMD test within sub-quadratic time. We demonstrate this point under different distributional assumptions such as densities in a Sobolev ball. Our theoretical findings are corroborated by simulation studies.
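A random-Fourier-feature approximation of the squared MMD can be sketched as below. This is a minimal illustration of the standard construction for a Gaussian kernel, not the authors' test procedure: the bandwidth, the number of features, and the absence of a calibrated rejection threshold (for example via permutations) are all simplifying assumptions.

```python
import math
import random

def rff_mmd2(X, Y, n_features=200, bandwidth=1.0, seed=0):
    """Approximate squared MMD (Gaussian kernel) with random Fourier features."""
    rng = random.Random(seed)
    d = len(X[0])
    # Frequencies follow the kernel's spectral density; phases are uniform on [0, 2*pi].
    ws = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(d)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def mean_embedding(Z):
        # Average feature vector of a sample: one pass over the data per feature.
        return [
            scale * sum(math.cos(sum(wi * zi for wi, zi in zip(w, z)) + b) for z in Z) / len(Z)
            for w, b in zip(ws, bs)
        ]

    mx, my = mean_embedding(X), mean_embedding(Y)
    return sum((a - c) ** 2 for a, c in zip(mx, my))

data_rng = random.Random(1)
X = [(data_rng.gauss(0.0, 1.0),) for _ in range(200)]
Y = [(x[0] + 3.0,) for x in X]  # same shape, shifted mean
stat = rff_mmd2(X, Y)
```

The cost is linear in the total sample size times the number of features, which is where the time saving over the quadratic-time MMD statistic comes from; the abstract's point is that the number of features then governs how much power is retained.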

2312.01386 2026-05-21 cs.LG stat.ML

On the Suboptimality of GP-UCB under Polynomial Effective Optimism

关于多项式有效乐观性下GP-UCB的次优性质

Wenjia Wang, Xiaowei Zhang

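AI summary: This paper studies the suboptimality of GP-UCB under polynomial effective optimism. It defines the effective optimism level as the product of the exploration coefficient and the regularization parameter in kernel ridge regression and, under a uniform confidence assumption, proves a new regret lower bound for GP-UCB with Matérn kernels. The bound shows that polynomial growth of the effective optimism level rules out the minimax-optimal regret rate, which identifies a concrete obstacle to proving minimax optimality for standard GP-UCB.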

详情
AI中文摘要

高斯过程上置信界(GP-UCB)被广泛用于昂贵黑盒函数的序列优化。尽管文献中已建立了许多关于其累积后悔的上界,但GP-UCB是否最小最大最优仍是一个开放问题。我们通过定义有效乐观性水平(核岭回归中的探索系数与正则化参数的乘积)来研究这一问题。在统一置信假设下,我们证明了GP-UCB在Matérn核下的新后悔下界。该下界表明,有效乐观性水平的多项式增长(至对数因子)排除了最小最大最优的后悔率。由于这一情形涵盖大多数现有分析,我们的结果指出了证明标准GP-UCB最小最大最优性的具体障碍。更广泛地说,它表明当前上界与最小最大下界之间的差距可能反映了算法本身的限制,而不仅仅是分析的限制。

英文摘要

Gaussian process upper confidence bound (GP-UCB) is widely used for sequential optimization of expensive black-box functions. Although many upper bounds on its cumulative regret have been established in the literature, whether GP-UCB is minimax optimal remains open. We study this question through the effective optimism level, defined as the product of the exploration coefficient and the regularization parameter in kernel ridge regression. Under a uniform confidence assumption, we prove a new regret lower bound for GP-UCB with Matérn kernels. The bound shows that polynomial growth of the effective optimism level, up to logarithmic factors, rules out the minimax-optimal regret rate. Since this is the regime covered by most existing analyses, our result identifies a concrete obstacle to proving minimax optimality for standard GP-UCB. More broadly, it suggests that the gap between current upper bounds and minimax lower bounds may reflect a real limitation of the algorithm, not only of the analysis.

2304.12906 2026-05-21 cs.LG stat.ML

The Score-Difference Flow for Implicit Generative Modeling

隐式生成建模的分数差流

Romann M. Weber

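AI summary: This paper presents the score-difference (SD) flow for implicit generative modeling, a flow that optimally reduces the KL divergence between two distributions. It shows that the formulation is formally equivalent to denoising diffusion models under certain conditions, and that the training of generative adversarial networks contains a hidden data-optimization sub-problem that induces the SD flow.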

详情
Journal ref
Transactions on Machine Learning Research (7/2023)
Comments
25 pages, 5 figures, 4 tables. Updated final version of a paper originally published in Transactions on Machine Learning Research (TMLR), including minor typographical corrections and post-publication commentary connecting the SD flow to drifting models
AI中文摘要

隐式生成建模(IGM)旨在生成与目标数据分布特征相符的合成样本。近期工作(如分数匹配网络、扩散模型)从推动合成源数据向目标分布的角度出发,通过动力学扰动或环境空间中的流来实现。在此方向上,我们提出任意目标与源分布之间的分数差(SD)作为一种流,该流能够最优地减少两者之间的KL散度。我们应用SD流到方便的代理分布上,这些分布只有在原始分布对齐时才对齐。我们证明在某些条件下,这种形式与去噪扩散模型具有形式等价性。我们还表明,生成对抗网络的训练包含一个隐含的数据优化子问题,当判别器最优时,该子问题在特定损失函数选择下诱导出SD流。因此,SD流为解决生成建模三重困境(高质量样本、模式覆盖和快速采样)的三种模型类别提供了理论联系,从而为统一方法奠定了基础。

英文摘要

Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
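A toy version of moving source samples along the difference between target and source scores can be sketched in one dimension. Everything specific here is an assumption made for illustration: both distributions are treated as Gaussians so that their scores are available in closed form, the source score is re-estimated from the particles' mean and variance at every step, and the step size and particle count are arbitrary. The paper's proxy distributions and its link to diffusion models and GANs are not reproduced.

```python
import random
import statistics

def gaussian_score(x, mean, var):
    """Score (gradient of the log density) of a 1D Gaussian at x."""
    return -(x - mean) / var

def sd_flow_step(particles, target_mean, target_var, step):
    """Move each particle along (target score - source score)."""
    m = statistics.fmean(particles)
    v = statistics.pvariance(particles)  # assumption: Gaussian fit to the source particles
    return [
        x + step * (gaussian_score(x, target_mean, target_var) - gaussian_score(x, m, v))
        for x in particles
    ]

rng = random.Random(0)
particles = [rng.gauss(0.0, 2.0) for _ in range(500)]  # source: roughly N(0, 4)
for _ in range(200):
    particles = sd_flow_step(particles, target_mean=3.0, target_var=1.0, step=0.1)

final_mean = statistics.fmean(particles)
final_var = statistics.pvariance(particles)
```

In this Gaussian toy case the particle mean and variance contract toward the target's mean of 3 and variance of 1, and the update vanishes once the two scores agree.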

1908.05972 2026-05-21 cs.LG stat.ML

AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

基于AI的独立施工安全结果的属性预测

Henrietta Baker, Matthew R. Hallowell, Antoine J.-P. Tixier

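AI summary: This paper improves on and validates an approach from earlier research in which safety outcomes are predicted from attributes with machine learning. NLP is used to extract attributes from incident reports, and models are trained to predict injury severity, injury type, body part impacted, and incident type. The outcomes come from independent human annotations, which removes potential artificial correlation between predictors and targets, and the attributes remain highly predictive. The study also uses a larger dataset, new models, model stacking, and more appropriate performance metrics, and it predicts injury severity well, which the original study did not.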

详情
Journal ref
Automation in Construction 118 (2020): 103146
Comments
Added author contributions and journal reference, updated corresponding author, fixed a few typos
AI中文摘要

本文显著改进并验证了先前研究中通过机器学习从属性中预测安全结果的方法。与原始研究类似,我们使用自然语言处理(NLP)从原始事件报告中提取基本属性,并训练机器学习模型进行预测。此处预测的安全结果包括伤害严重性、伤害类型、受影响身体部位和事件类型。与原始研究不同,安全结果不是通过NLP提取,而是由独立的人工标注提供,从而消除了预测变量和预测目标之间可能的人工相关性。结果表明,属性仍具有高度预测性,证实了原始方法的有效性。当前研究的其他改进包括使用(1)一个包含超过90,000份报告的更大数据集,(2)两种新模型,XGBoost和线性支持向量机(SVM),(3)模型堆叠,(4)更简单的实验设置和更合适的性能指标,以及(5)对各属性重要性评分的分析。最后,伤害严重性结果得到良好预测,这在原始研究中并未实现。这是重大进展。

英文摘要

This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports and train machine learning models to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. However, unlike in the original study, safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machines), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.

2605.20534 2026-05-21 cs.LG cs.AI stat.ML

Axiomatizing Neural Networks via Pursuit of Subspaces

通过子空间追求公理化神经网络

Mehmet Yamac, Mert Duman, Ugur Akpinar, Felix Rojas Casadiego, Serkan Kiranyaz, Marcel van Gerven, Moncef Gabbouj

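AI summary: This paper proposes the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework built on geometric postulates that explains neural network behavior and gives a unified perspective on representation, computation, and generalization in shallow and deep architectures.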

详情
Comments
43 pages, 25 figures. Code and additional materials will be released
AI中文摘要

尽管深度神经网络在许多领域取得了显著成功,但其底层机制仍不清晰,常被视为黑箱。这种经验表现与理论理解之间的差距类似于经典几何学的前公理阶段。在本文中,我们引入了子空间追求(PoS)假设,这是一个公理化的框架,通过一组几何公设来表述神经网络的行为。这些公理及其推导出的结论为浅层和深层架构中的表示、计算和泛化提供了统一的视角。我们展示了该框架能够为深度学习中的基本问题提供几何解释,包括表示结构、架构机制和泛化行为,从而向连贯的理论基础迈出了有原则的一步。

英文摘要

While deep neural networks have achieved remarkable success across a wide range of domains, their underlying mechanisms remain poorly understood, and they are often regarded as black boxes. This gap between empirical performance and theoretical understanding poses a challenge analogous to the pre-axiomatic stage of classical geometry. In this work, we introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates. These axioms, together with their derived consequences, provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures. We show that this framework yields geometric explanations for fundamental questions in deep learning, including representation structure, architectural mechanisms, and generalization behavior, offering a principled step toward a coherent theoretical foundation.

2605.20508 2026-05-21 stat.ME astro-ph.HE astro-ph.IM physics.data-an stat.AP

Compensator-Based Inference for Signal Detection Under Unknown Background

基于补偿器的信号检测推断:在未知背景下的应用

Aritra Banerjee, Sara Algeri

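AI summary: This paper proposes a signal-detection approach that estimates a single compensator parameter instead of the background distribution, which simplifies inference and enables proper uncertainty propagation.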

详情
AI中文摘要

在未知背景存在的情况下检测新信号的问题在科学发现中普遍存在,尤其是在物理科学中尤为突出。迄今为止大多数解决方案都集中在估计背景分布并利用该估计来推断信号。通过研究该问题的几何结构,本文证明估计背景分布对于推断信号强度并非绝对必要。相反,只需估计一个参数,称为补偿器,即可弥补对背景的不完全了解,显著简化了问题的复杂性,并使不确定性传播成为可能。所提出的补偿器被证明在所提出的设置以及基于似然的方法中都控制推断的保守性。

英文摘要

The problem of detecting new signals in the presence of an unknown background is ubiquitous in scientific discoveries and is especially prominent in the physical sciences. Most solutions proposed thus far to address the problem focus on estimating the background distribution and using that estimate to infer the signal. By studying the geometry of the problem, this article demonstrates that estimating the background distribution is somewhat unnecessary for inferring the signal intensity. Instead, it suffices to estimate a single parameter, referred to as the compensator, to account for the incomplete knowledge on the background, substantially simplifying the problem's complexity and enabling proper uncertainty propagation. Such a compensator is shown to govern the conservativeness of the inference, both in the proposed setup and in likelihood-based approaches.

2605.20502 2026-05-21 cs.LG cs.AI cs.CV stat.AP stat.ML

Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection

基于表示空间扩散模型的Tippett最小融合多编码器异常检测

Neelkamal Bhuyan

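AI summary: This paper proposes a multi-encoder fusion of per-encoder representation-space diffusion models for out-of-distribution (OOD) detection. It statistically identifies each encoder's sensitivity to specific shift types from ID data alone and introduces the EncMin2L gate, which combines per-encoder detectors without OOD labels, outperforms monolithic multi-encoder baselines at 2.3× lower parameter cost, and reaches AUROC of at least 0.94 on all four shift types.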

详情
Comments
14 pages
AI中文摘要

我们通过对各编码器的表示空间扩散模型(RDMs)进行多编码器融合,来解决覆盖完整分布偏移谱的分布外(OOD)检测问题,包括全局域变化、语义分歧、纹理差异和协变量损坏。我们仅从ID数据中统计地识别每个编码器对特定偏移类型的敏感性,并引入EncMin2L,一种与编码器无关的两级min(⋅)门控,能够在不使用OOD标签的情况下组合并校准各编码器基于扩散的似然检测器,性能优于单体式多编码器基线,且参数成本低2.3倍。两种ID数据诊断,即η²(类条件F检验)和Δμ(合成损坏下的对数似然偏移),用于量化编码器的专业化程度,而Tippett最小p值组合将各编码器得分聚合为一个校准稳定的OOD信号。EncMin2L在所有四种偏移类型上同时达到≥0.94的AUROC,在重叠的基准上优于最先进的表示空间扩散OOD检测器。

英文摘要

We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts -- global domain changes, semantic divergence, texture differences, and covariate corruptions -- through a multi-encoder fusion of per-encoder representation-space diffusion models (RDMs). We statistically identify each encoder's sensitivity to specific shift types from ID data alone and introduce EncMin2L -- an encoder-agnostic two-level $\min(\cdot)$-gate that combines and calibrates per-encoder diffusion-based likelihood detectors without OOD labels, outperforming monolithic multi-encoder baselines at $2.3\times$ lower parameter cost. Two ID-data diagnostics, $η^2$ (class-conditional F-test) and $Δμ$ (log-likelihood shift under synthetic corruptions), quantify encoder specialization, while a Tippett minimum $p$-value combination aggregates per-encoder scores into a single, calibration-stable OOD signal. EncMin2L achieves $\geq 0.94$ AUROC across all four shift types simultaneously, outperforming the state-of-the-art representation-space diffusion OOD detectors across overlapping benchmarks.
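The Tippett minimum $p$-value combination mentioned in the abstract is a classical rule and can be sketched on its own. The snippet below is not EncMin2L: it shows only the textbook combination under an independence assumption, and the way per-encoder scores are turned into $p$-values (`empirical_p_value`, a hypothetical helper using ID calibration scores, with higher scores taken to mean more anomalous) is an assumption for illustration.

```python
def empirical_p_value(score, calibration_scores):
    """Fraction of ID calibration scores at least as extreme as `score`."""
    # Assumption: larger score means more anomalous.
    exceed = sum(1 for s in calibration_scores if s >= score)
    return (1 + exceed) / (1 + len(calibration_scores))

def tippett_combine(p_values):
    """Tippett's rule: combined p-value 1 - (1 - min p)^k for k independent tests."""
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k

# One p-value per encoder; a single confident encoder drives the combined result.
per_encoder_p = [0.01, 0.90, 0.80]
combined = tippett_combine(per_encoder_p)
```

Taking the minimum lets whichever encoder is most sensitive to the shift at hand dominate, while the exponent corrects for having looked at several encoders.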

2605.20494 2026-05-21 cs.LG physics.ao-ph stat.AP

A 10,000-Year Global Stochastic Tropical Cyclone Catalog with Wind-Dependent Track Transitions (WHITS)

具有风依赖性路径转换的10,000年全球随机热带气旋目录(WHITS)

Jennifer Nakamura, Upmanu Lall

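AI summary: This paper presents WHITS, a non-parametric semi-Markov track generator, and uses it to build a 10,000-year global synthetic tropical cyclone catalog intended for catastrophe-risk applications.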

详情
AI中文摘要

可靠的热带气旋(TC)风险评估受到历史记录的简短和空间稀疏性的限制,特别是对于罕见的高强度登陆事件,这些事件主导了保险损失。我们提出了WHITS(风聚焦飓风交互路径模拟器),这是一种非参数半马尔可夫路径生成器,扩展了Nakamura等人(2015)的HITS框架,有三种改进:在历史路径段之间转换时,除了位置、年龄和前进向量外,还根据局部风速进行条件;在比较向量项上选择核时,进行了细化以抑制动态不一致的跳跃;并在每个转换中应用了短平滑窗口,以消除下游风暴潮用户报告的位置和风速不连续性。WHITS被拟合到每个六个盆地的完整可用最佳轨迹记录中,北大西洋延伸至1851年,在其他盆地延伸至可靠最佳轨迹数据的最早年份。所得到的10,000年全球合成目录重现了所有盆地的观测路径密度和每年飓风/台风风力打击概率。该目录旨在用于灾难风险应用,其中大量、低偏倚的物理合理路径比小而统计上修正的样本更有用。

英文摘要

Reliable assessment of tropical cyclone (TC) risk is limited by the brevity and spatial sparsity of the historical record, particularly for the rare, high-intensity landfalls that dominate insured loss. We present WHITS (Wind-focused Hurricane Interactive Track Simulator), a non-parametric semi-Markov track generator that extends the HITS framework of Nakamura et al. (2015) in three ways: transitions between historical track segments are conditioned on local wind speed in addition to position, age, and forward vector; the kernel selection on the comparative-vector term is sharpened to suppress dynamically inconsistent jumps; and a short smoothing window is applied across each transition to remove the position and wind discontinuities reported by downstream surge users. WHITS is fit to the full available best-track record in each of six basins in IBTrACS, extending in the North Atlantic to 1851 and in other basins to the earliest year of reliable best-track data. The resulting 10,000-yr global synthetic catalog reproduces observed track density and the annual hurricane/typhoon-force wind-hit probability across all basins. The catalog is intended for catastrophe-risk applications where a large, low-bias sample of physically plausible tracks is more useful than a small, statistically corrected one.

2605.20434 2026-05-21 stat.ML cs.DM cs.LG

Contradiction Graphs Determine VC Dimension

矛盾图确定VC维

Jesse Campbell, Daniel Ibaibarriaga, Lev Reyzin

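AI summary: This paper studies the contradiction graphs of binary concept classes and shows that the single graph $G_m(H)$ determines whether $\mathrm{VCdim}(H)\ge m$, so the full sequence of graphs determines the exact VC dimension and distinguishes finite from infinite VC dimension.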

详情
AI中文摘要

我们研究与二元概念类相关的矛盾图。对于一个概念类$H \subseteq \{0,1\}^X$,顺序-$m$矛盾图$G_m(H)$的顶点是长度为$m$的可由$H$实现的标记序列,当两个序列对某个公共域点赋予相反标签时,两个顶点相邻。我们的主要结果是单个图$G_m(H)$确定阈值谓词$\mathrm{VCdim}(H)\ge m$。因此,完整的序列$(G_m(H))_{m \ge 1}$确定精确的VC维,并且特别地,区分有限与无限VC维,回答了Alon等人(2024)提出的问题。

英文摘要

We study the contradiction graphs associated with binary concept classes. For a class $H \subseteq \{0,1\}^X$, the order-$m$ contradiction graph $G_m(H)$ has as vertices the $H$-realizable labeled sequences of length $m$, with two vertices adjacent when the two sequences assign opposite labels to some common domain point. Our main result is that the single graph $G_m(H)$ determines the threshold predicate $\mathrm{VCdim}(H)\ge m$. Consequently, the full sequence $(G_m(H))_{m \ge 1}$ determines the exact VC dimension and, in particular, detects finite versus infinite VC dimension, answering a question posed by Alon et al. (2024).
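
The graph construction in this abstract can be made concrete on a tiny finite class. The following is a minimal Python sketch, not the authors' code. It assumes that concepts are given as dictionaries from domain points to labels, that sequences may repeat domain points, and that a brute-force shattering check is affordable. The names `realizable_sequences`, `contradicts`, `contradiction_graph`, and `vc_dimension` are hypothetical.

```python
from itertools import combinations, product


def realizable_sequences(concepts, domain, m):
    """Return all length-m labeled sequences consistent with some concept."""
    seqs = set()
    for points in product(domain, repeat=m):
        for h in concepts:
            seqs.add(tuple((x, h[x]) for x in points))
    return sorted(seqs)


def contradicts(s, t):
    """True if s and t give opposite labels to some common domain point."""
    labels = dict(s)  # a realizable sequence labels each point consistently
    return any(x in labels and labels[x] != y for x, y in t)


def contradiction_graph(concepts, domain, m):
    """Vertices are realizable sequences; edges join contradicting pairs."""
    vertices = realizable_sequences(concepts, domain, m)
    edges = {
        (i, j)
        for i, j in combinations(range(len(vertices)), 2)
        if contradicts(vertices[i], vertices[j])
    }
    return vertices, edges


def vc_dimension(concepts, domain):
    """Brute-force VC dimension of a finite class on a finite domain."""
    best = 0
    for k in range(1, len(domain) + 1):
        shattered = any(
            len({tuple(h[x] for x in subset) for h in concepts}) == 2 ** k
            for subset in combinations(domain, k)
        )
        if not shattered:
            break  # no set of size k is shattered, so no larger set is
        best = k
    return best


# Toy class (an assumption for illustration): thresholds
# h_t(x) = 1 if x >= t on the domain {0, 1, 2}.
domain = [0, 1, 2]
thresholds = [{x: int(x >= t) for x in domain} for t in range(4)]
vertices, edges = contradiction_graph(thresholds, domain, m=1)
```

The sketch only builds $G_m(H)$ and computes the VC dimension separately; it does not implement the paper's argument that the graph alone determines the threshold predicate.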

2605.20429 2026-05-21 stat.AP

Design and Validation of a Grid-based Home Detection via Stay-Time (GHOST) Software for Mobile Location Data

基于停留时间的网格化家庭检测(GHOST)软件的设计与验证:用于移动定位数据

Alessandra Recalde, Mustafa Sameen, Xiaojian Zhang, Xilei Zhao

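AI summary: This study presents GHOST, a grid-based, stay-time home detection algorithm that infers proxy home locations by identifying the most frequently visited nighttime or weekend daytime grid cells under customizable spatial and temporal filters, and tests its robustness to noisy data on a large-scale dataset.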

详情
AI中文摘要

从移动设备生成的GPS数据中准确检测家庭位置是人类移动性研究的基础步骤,对交通规划、公共卫生和应急响应有重要影响。然而,现有的家庭检测算法在处理真实世界中的噪声数据时往往结果不可靠,并且由于缺乏地面真实基准而难以验证。为解决这些限制,本研究提出了GHOST算法的设计与验证,作为开源的Python包实现。该算法通过识别基于可定制空间和时间过滤器的最频繁访问的夜间或周末白天网格单元来推断代理家庭位置。为了验证其性能,我们使用包含超过155,000次行程的大型波士顿步行数据集,该数据集来自波士顿都会区的377名参与者,以测试其对噪声数据的鲁棒性。此外,我们还收集了来自美国不同地区的10名志愿者的地面真实数据,包括佛罗里达、密西西比和科罗拉多,以及他们自报的家庭坐标,以评估GHOST在多样化的移动模式和采样条件下的表现。我们比较了GHOST的准确性与五种已建立的家庭检测算法:All-time clustering方法、Stay-point方法、DBSCAN、K-MEANS++和SciKit-Mobility Home Detection在多种参数设置下的表现。结果表明,GHOST在准确性和鲁棒性方面均优于所有算法,最佳配置下的平均误差低至22.3米。我们的发现突显了该算法的高准确性和灵活性,其中网格大小是验证过程中最影响性的参数,展示了该算法在真实世界移动定位数据分析中的潜力。

英文摘要

Accurately detecting home locations from GPS data generated by mobile devices is a foundational step in human mobility research, with significant implications for transportation planning, public health, and emergency response. However, existing home detection algorithms often produce unreliable results for noisy real-world data and are barely validated due to a lack of ground-truth benchmarks. To tackle these limitations, this study presents the development and validation of a Grid-based home detection via Stay-Time (GHOST) algorithm, implemented as an open-source Python package. The algorithm infers proxy home locations by identifying the most frequently visited nighttime or weekend daytime grid cells based on customizable spatial and temporal filters. To validate its performance, we use the large-scale BostonWalks dataset, which includes over 155,000 trips from 377 participants in the Boston metropolitan area, to test robustness to noisy data. Additionally, we collected a ground-truth dataset for ten volunteers across different regions in the U.S., including Florida, Mississippi, and Colorado, along with their self-reported home coordinates, to evaluate GHOST across diverse mobility patterns and sampling conditions. We compared GHOST accuracy to that of 5 well-established home detection algorithms: All-time clustering method, Stay-point method, DBSCAN, K-MEANS++, and SciKit-Mobility Home Detection, across multiple parameter settings. Results show that GHOST outperforms all algorithms in accuracy and robustness, with average errors as low as 22.3 meters under optimal configurations. Our findings highlight the high accuracy and flexibility of our algorithm, with grid size being the most influential parameter during validation, demonstrating the potential of this algorithm for real-world mobile location data analysis.
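
The core idea in this abstract, picking the most visited grid cell within a nighttime or weekend daytime window, can be sketched in a few lines. This is an illustrative Python sketch and not the GHOST package or its API. The function name `infer_home_cell`, the degree-based grid, the 22:00 to 06:00 night window, and the use of ping counts as a stand-in for stay time are all assumptions.

```python
import math
from collections import Counter
from datetime import datetime


def infer_home_cell(pings, cell_deg=0.001, night_start=22, night_end=6):
    """Return the centre (lat, lon) of the most visited qualifying grid cell.

    pings: iterable of (timestamp, lat, lon) tuples.
    A ping qualifies if it falls at night or during a weekend day.
    """
    counts = Counter()
    for ts, lat, lon in pings:
        is_night = ts.hour >= night_start or ts.hour < night_end
        is_weekend_day = ts.weekday() >= 5 and not is_night
        if is_night or is_weekend_day:
            # Snap the coordinate to a square grid cell (in degrees).
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            counts[cell] += 1
    if not counts:
        return None
    (i, j), _ = counts.most_common(1)[0]
    return ((i + 0.5) * cell_deg, (j + 0.5) * cell_deg)


# Hypothetical pings: three night pings at one place, two weekday
# daytime pings elsewhere (filtered out), one stray night ping.
pings = [
    (datetime(2024, 1, 8, 23, 0), 42.3601, -71.0589),
    (datetime(2024, 1, 9, 2, 0), 42.3601, -71.0589),
    (datetime(2024, 1, 10, 23, 30), 42.3601, -71.0589),
    (datetime(2024, 1, 9, 13, 0), 42.3700, -71.1000),
    (datetime(2024, 1, 10, 14, 0), 42.3700, -71.1000),
    (datetime(2024, 1, 9, 23, 0), 42.4000, -71.2000),
]
home = infer_home_cell(pings)
```

The abstract reports grid size as the most influential parameter, which corresponds to `cell_deg` in this sketch; a real implementation would likely use a projected grid in meters rather than degrees.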

2605.20400 2026-05-21 stat.AP cs.LG stat.ML

Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management

理解基础设施管理中的劣化随机效应以进行因果发现

Takato Yasuno

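AI summary: This paper proposes a framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify the operational patterns that drive heterogeneous deterioration rates in pump equipment. It estimates random effects with GPU-accelerated NUTS, checks the linearity assumption, and finds that different operational regimes call for different management strategies.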

详情
Comments
20 pages, 7 figures, 4 tables
AI中文摘要

基础设施劣化给资产管理带来重大挑战,但现有方法依赖总体平均模型,忽略了设备特定的异质性。我们提出了一种新的框架,结合贝叶斯分层风险(hazard)模型与因果发现,以识别驱动泵设备异质劣化率的运行模式。我们的方法首先利用GPU加速的No-U-Turn Sampling (NUTS) 估计泵特定的随机效应 $u_i$,比CPU实现快3-5倍。然后,我们使用DirectLiNGAM发现22个工程化时间序列特征与劣化率之间的因果关系,并按正随机效应($u_i > 0$,劣化更快)与负随机效应($u_i \leq 0$,劣化更慢)分层。通过分析112台泵在650天内的92,861个观测值,我们发现了显著的异质性:负组的因果效应比正组大400倍,其中标准差(std)在低风险设备中对劣化率表现出较强的正因果效应($+1.515$)。我们通过与NonlinearLiNGAM的比较验证线性假设,并通过GPU加速展示实际可扩展性。我们的发现揭示了不同运行工况需要根本不同的管理方法,从而支持有针对性的维护策略,并推动预测性维护从总体平均决策走向异质性感知决策。

英文摘要

Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that drive heterogeneous deterioration rates in pump equipment. Our approach first estimates pump-specific random effects $u_i$ using GPU-accelerated No-U-Turn Sampling (NUTS), achieving 3--5$\times$ speedup over CPU implementations. We then employ DirectLiNGAM to discover causal relationships between 22 engineered time-series features and deterioration rates, stratified by positive ($u_i > 0$, faster deterioration) versus negative ($u_i \leq 0$, slower deterioration) random effects. Analyzing 112 pumps with 92,861 observations over 650 days, we uncover striking heterogeneity: the negative group exhibits causal effects 400$\times$ larger than the positive group, with standard deviation (std) showing a strong positive causal effect ($+1.515$) on deterioration rates in low-risk equipment. We validate linearity assumptions through NonlinearLiNGAM comparison and demonstrate practical scalability through GPU acceleration. Our findings enable targeted maintenance strategies by revealing that different operational regimes require fundamentally distinct management approaches, advancing predictive maintenance from population-averaged to heterogeneity-aware decision making.

2605.20399 2026-05-21 stat.ME stat.AP

A duration-augmented binary Markov chain for rainfall occurrence with long dry spells

面向含长干旱期降雨发生的持续时间增强二元马尔可夫链

Antoine Doizé, Denis Allard, Philippe Naveau, Olivier Wintenberger

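AI summary: This paper proposes a duration-augmented binary Markov chain for rainfall occurrence with long dry spells. A link with alternating renewal chains enables flexible parametric modelling of wet and dry spell duration distributions. The model is applied to around 200 stations in southern Europe and compared with standard Markov models on persistence and high-quantile extrapolation.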

详情
AI中文摘要

模拟逼真的湿期和旱期是天气生成器和气候影响研究中的核心任务。有限阶马尔可夫链虽是标准方法,但由于其固有的亚指数衰减特性,往往无法再现持续的干旱状况。我们引入持续时间增强的二元马尔可夫链来建模降雨发生。我们建立了它与交替更新链(alternating renewal chains)的联系,从而能够对湿期和旱期的持续时间分布进行灵活的参数化建模。我们使用扩展广义帕累托分布(extended Generalized Pareto Distributions)一般类中两种适应不同气候状态的设定来建模这些分布,从而在各种气候下获得灵活的尾部行为。我们针对每种设定采用相应的估计方法。我们的模型应用于南欧约200个站点,涵盖多样的地中海气候和大陆性气候。我们将此框架与标准马尔可夫模型在刻画持续性和高分位数外推方面进行比较。该方法具有通用性,可自然扩展到多状态设置或环境统计中的其他二元序列应用。

英文摘要

Simulating realistic wet and dry spells is central in weather generators and climate-impact studies. While finite-order Markov chains are standard, they often fail to reproduce persistent dry conditions due to their inherent subexponential decay. We model rainfall occurrence by introducing a duration-augmented binary Markov chain. We establish a link with alternating renewal chains, enabling flexible parametric modelling of wet and dry spell duration distribution. We model those using two regime-adapted specifications from the general class of extended Generalized Pareto Distributions, yielding flexible tail behaviour across various climates. We use estimation methods adapted to each specification. Our model is applied to around 200 stations in the South of Europe spanning diverse Mediterranean and continental climates. We compare this framework to standard Markov models in characterising persistence and high-quantile extrapolation. The approach is generic, extending naturally to multi-state settings or other binary sequence applications in environmental statistics.
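
The link to alternating renewal chains mentioned in this abstract can be illustrated by generating a wet/dry sequence directly from spell-duration draws. The following Python sketch is not the authors' model. The function names are hypothetical, and the heavy-tailed sampler is a simple rounded-up generalized Pareto draw standing in for the extended Generalized Pareto specifications, which the abstract does not spell out.

```python
import math
import random


def simulate_alternating_renewal(n_days, draw_dry, draw_wet, start_dry=True):
    """Build a 0/1 (dry/wet) sequence by concatenating spells of drawn lengths."""
    seq, dry = [], start_dry
    while len(seq) < n_days:
        length = draw_dry() if dry else draw_wet()
        seq.extend([0 if dry else 1] * length)
        dry = not dry
    return seq[:n_days]


def spell_lengths(seq):
    """Run-length encode the sequence as (state, length) pairs."""
    spells = []
    for state in seq:
        if spells and spells[-1][0] == state:
            spells[-1][1] += 1
        else:
            spells.append([state, 1])
    return [tuple(s) for s in spells]


def geometric(p, rng):
    """Spell length >= 1 with P(L = k) = p * (1 - p) ** (k - 1)."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return 1 + int(math.log(u) / math.log(1.0 - p))


def discrete_gpd(scale, shape, rng):
    """Heavy-tailed spell length: a rounded-up generalized Pareto draw (shape > 0)."""
    u = rng.random()  # lies in [0, 1)
    x = scale * ((1.0 - u) ** (-shape) - 1.0) / shape
    return max(1, math.ceil(x))


rng = random.Random(0)
# Geometric spell lengths reproduce a first-order two-state Markov chain.
markov_like = simulate_alternating_renewal(
    3650, lambda: geometric(0.3, rng), lambda: geometric(0.5, rng)
)
# Swapping in a heavy-tailed dry-spell law allows much longer dry spells.
heavy_tailed = simulate_alternating_renewal(
    3650, lambda: discrete_gpd(2.0, 0.4, rng), lambda: geometric(0.5, rng)
)
```

Comparing `spell_lengths(markov_like)` with `spell_lengths(heavy_tailed)` shows how the choice of duration distribution, rather than the transition matrix, controls persistence in this formulation. The parameter values are arbitrary.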

2605.20396 2026-05-21 cs.LG stat.ML

Score-Based Causal Discovery of Latent Variable Causal Models

基于得分的潜在变量因果模型因果发现

Ignavier Ng, Xinshuai Dong, Haoyue Dai, Biwei Huang, Peter Spirtes, Kun Zhang

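AI summary: This paper develops score-based methods for identifying causal structures that contain causally related latent variables, with identifiability guarantees, and validates the methods experimentally.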

详情
Comments
ICML 2024
AI中文摘要

识别潜在变量及其涉及的因果结构在各种科学领域中都是至关重要的。尽管许多现有工作属于约束性方法(例如条件独立性或秩不足测试),但它们可能面临经验挑战,如测试顺序依赖性、误差传播和选择合适显著性水平的问题。这些问题可以通过精心设计的基于得分的方法(如在没有潜在变量的情况下使用的贪心等价搜索(GES))来缓解。然而,设计包含潜在变量的基于得分的方法却极具挑战性。在本文中,我们开发了能够识别包含因果相关潜在变量的因果结构的基于得分的方法,并提供了可识别性保证。具体而言,我们证明了适当制定的评分函数可以实现结构学习的得分等价性和一致性。我们进一步对文献中考虑的多种结构假设下观测变量边缘分布的有效自由度进行了表征,并据此开发了精确和连续的基于得分的方法。这为几种现有约束性方法提供了统一的视角。实验结果验证了所提出方法的有效性。

英文摘要

Identifying latent variables and the causal structure involving them is essential across various scientific fields. While many existing works fall under the category of constraint-based methods (with e.g. conditional independence or rank deficiency tests), they may face empirical challenges such as testing-order dependency, error propagation, and choosing an appropriate significance level. These issues can potentially be mitigated by properly designed score-based methods, such as Greedy Equivalence Search (GES) (Chickering, 2002) in the specific setting without latent variables. Yet, formulating score-based methods with latent variables is highly challenging. In this work, we develop score-based methods that are capable of identifying causal structures containing causally-related latent variables with identifiability guarantees. Specifically, we show that a properly formulated scoring function can achieve score equivalence and consistency for structure learning of latent variable causal models. We further provide a characterization of the degrees of freedom for the marginal over the observed variables under multiple structural assumptions considered in the literature, and accordingly develop both exact and continuous score-based methods. This offers a unified view of several existing constraint-based methods with different structural assumptions. Experimental results validate the effectiveness of the proposed methods.

2605.20359 2026-05-21 econ.EM stat.ME

The Harmonic Synthetic Control Method

谐波合成控制法

Ziyi Liu, Yiqing Xu

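AI summary: This paper proposes Harmonic Synthetic Control (HSC), which replaces the binary choice between differenced and raw outcomes with a soft allocation mechanism. HSC jointly estimates donor weights and a treated-unit-specific smooth residual component, then extrapolates that component with a time-series forecaster. A tuning parameter chosen by rolling-origin cross-validation balances donor matching against forecasting. A spectral interpretation shows that HSC downweights low-frequency residual components in donor matching and assigns them to the forecasting branch. Monte Carlo experiments show that HSC adapts across regimes, performing well whether stochastic trends are predominantly common or idiosyncratic.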

详情
AI中文摘要

合成控制方法在结果序列包含单位特定随机趋势时会产生误导性的反事实预测,这是非平稳宏观经济数据的常见特征。现有解决方案,如预滤波或差分,可以减少虚假匹配,但可能丢弃共享的非平稳变化,这些变化有助于估计供体权重。我们提出谐波合成控制法(HSC),将这一二元选择替换为软分配机制。HSC联合估计供体权重和被处理单位特定的平滑残差成分,然后利用时间序列预测器将此成分外推到治疗后时期。一个通过滚动原点交叉验证选择的调节参数控制供体匹配与预测之间的分配。随着该参数的变化,HSC连续在差分结果上的合成控制和原始结果上的合成控制(带有截距或趋势)之间插值。我们提供频谱解释,说明HSC如何在供体匹配中降低低频残差成分,并将其分配给预测分支。预测误差分解将权重估计扭曲与残差预测误差分开。蒙特卡洛实验表明HSC能适应不同 regime,在随机趋势主要为共同或异质时表现良好,而固定在某一 regime 的估计器在另一 regime 时会失败。

英文摘要

Synthetic control methods can produce misleading counterfactual predictions when outcome series contain unit-specific stochastic trends, a common feature of nonstationary macroeconomic data. Existing remedies, such as pre-filtering or differencing, reduce spurious matching but may discard shared nonstationary variation that helps estimate donor weights. We propose Harmonic Synthetic Control (HSC), which replaces this binary choice with a soft allocation mechanism. HSC jointly estimates donor weights and a treated-unit-specific smooth residual component, then extrapolates this component into post-treatment periods using a time-series forecaster. A tuning parameter, selected by rolling-origin cross-validation, governs the division between donor matching and forecasting. As it varies, HSC continuously interpolates between synthetic control applied to differenced outcomes and synthetic control applied to raw outcomes with an intercept or trend. We provide a spectral interpretation showing how HSC downweights low-frequency residual components in donor matching and assigns them to the forecasting branch. A prediction-error decomposition separates weight-estimation distortion from residual-forecasting error. Monte Carlo exercises show that HSC adapts across regimes, performing well when stochastic trends are predominantly common or idiosyncratic, while estimators fixed to one regime can fail in the other.

2605.20345 2026-05-21 stat.ML cs.LG

Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

修正的积分拉普拉斯近似法用于潜在高斯模型的贝叶斯推断

Jinlin Lai, Charles C. Margossian, Daniel R. Sheldon

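AI summary: This paper proposes an importance sampling scheme to correct the error that the integrated Laplace approximation (ILA) introduces in latent Gaussian models (LGMs). As the number of importance samples grows, the approximate posterior converges to the correct posterior. The method is implemented in an automatic differentiation framework to support gradient-based inference on the hyperparameters, in particular Hamiltonian Monte Carlo.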

详情
AI中文摘要

潜在高斯模型(LGMs)是一类流行的贝叶斯分层模型,包括高斯过程、某些空间模型和混合效应模型。对LGMs进行高效贝叶斯推断通常需要对潜在变量进行边缘化。对于具有非高斯似然的LGMs,精确边缘化是不可能的,一种流行的方法是使用积分拉普拉斯近似(ILA)进行近似边缘化。使用ILA会产生一个近似后验,在某些情况下,它可能与正确后验有显著差异,从而影响下游应用。我们提出了一种重要性采样方案来纠正ILA引入的误差。通过增加重要性采样的样本数,ILA产生的后验将收敛到正确后验。这一想法通过伪边缘化、拟蒙特卡洛和随机化拟蒙特卡洛等技术实现。我们将在自动微分框架中实现我们的方法,以支持在超参数推断中的梯度基算法。对于后者,我们特别考虑使用哈密顿蒙特卡洛方法。我们展示了在各种应用模型中减少误差的好处。

英文摘要

Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference of LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible and a popular approach is to do approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior which, in some settings, can differ significantly from the correct posterior, which impacts downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For the latter, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.

2605.20325 2026-05-21 stat.ME stat.CO

Explainable Outlier Detection for Multivariate Functional Data

可解释的多元函数数据异常检测

Marcus Mayrhofer, Una Radojičić, Horst Lewitschnig, Peter Filzmoser

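AI summary: This paper addresses robust covariance estimation and interpretable outlier detection for multivariate functional data with separable covariance structure. It links stochastic processes with separable covariance to the matrix-variate distribution of their basis representations, estimates mean and covariance robustly, and uses Shapley values to decompose outlyingness.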

详情
AI中文摘要

本工作针对具有分离协方差结构的多元函数数据的鲁棒协方差估计和可解释性异常检测挑战,提出了一种方法,通过建立具有分离协方差结构的随机过程与其基表示的矩阵变量分布之间的联系,同时改进鲁棒性和可解释性。利用最近开发的矩阵变量最小协方差确定性估计器(MMCD)的变体,结合截断的多元函数Mahalanobis半距离,以鲁棒的方式估计多元函数数据的均值和协方差。对于可解释的异常检测,将基于Shapley值的多元异常解释推广到分解总体多元函数异常性为时间坐标特定的贡献。重要的是,将原本相对于组件数量呈指数级的计算复杂度降低到线性复杂度,同时保留Shapley值的关键属性。这种集成框架结合了鲁棒Mahalanobis距离、MMCD估计器和基于Shapley值的异常性分解,为具有分离协方差结构的多元函数数据提供了一种鲁棒且可解释的分析方法。通过理论分析和实际应用,包括模拟和现实世界示例,验证了该方法的有效性。

英文摘要

This work addresses the challenges of robust covariance estimation and interpretable outlier detection for multivariate functional data with separable covariance structure. We develop a method that simultaneously improves robustness and interpretability in this context by establishing a connection between stochastic processes with separable covariance structures and the corresponding matrix-variate distribution of their basis representations. Leveraging this connection, we employ the recently developed matrix-variate counterpart of the Minimum Covariance Determinant estimator (MMCD) in conjunction with a truncated multivariate functional Mahalanobis semi-distance to robustly estimate mean and covariance for multivariate functional data. For interpretable outlier detection, we generalize multivariate outlier explanations based on Shapley values to decompose overall multivariate functional outlyingness into time-coordinate-specific contributions. Importantly, we reduce the otherwise exponential computational complexity (relative to the number of components) to linear complexity, while retaining the key properties of the Shapley value. This integrated framework combines robust Mahalanobis distances, MMCD estimators, and Shapley value-based outlyingness decomposition to provide a robust and interpretable approach for analyzing multivariate functional data with separable covariance structures. The effectiveness of this approach is demonstrated through both theoretical analysis and practical applications, including simulations and real-world examples.

2605.20271 2026-05-21 stat.ML cs.LG

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity

多头注意力作为Nadaraya-Watson估计的集成:方差减少、去相关与最优头多样性

Ernest Fokoué

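AI summary: This paper treats multi-head attention as an ensemble of Nadaraya-Watson kernel regression estimators. It relates variance reduction to the decorrelation of head outputs, introduces a Head Diversity Index that measures inter-head decorrelation, and derives the optimal number of heads and per-head dimension.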

详情
Comments
14 pages
AI中文摘要

我们发展了将多头注意力(MHA)视为Nadaraya-Watson(NW)核回归估计器集成的严格统计理论。基于单头softmax注意力与NW估计器之间的代数恒等式,我们证明MHA是H个NW估计器的结构化集成,每个估计器在键空间的一个不同的学习投影子空间中运作。我们推导出MHA均方误差的显式偏差-方差-协方差分解,表明方差减少不仅取决于头数H,更从根本上取决于头输出的去相关性。去相关性由学习到的投影子空间之间的主夹角决定:正交投影带来最大的方差减少;对齐的投影则不带来任何方差减少。我们引入头多样性指数(HDI),这是一个可计算的谱度量,用于衡量头之间的去相关程度,并证明MHA均方误差随HDI单调递减。这为经验上观察到的注意力头专门化现象提供了第一个严格的理论解释。在固定总维度预算D = H * d_k下,我们求解最优头维度分配问题,根据数据分布和回归平滑度推导出使MSE最小的配对(H*, d_k*)。该解给出了一条新的架构缩放定律:最优每头维度随训练集大小对数增长,而最优头数随总预算D近似线性增长。我们的框架统一了三条先前的研究脉络:单头注意力的NW理论、集成学习的一般加权理论,以及生物集成与计算集成之间的去相关-方差减少同构。多头注意力是Transformer对一条普适原则的具体实现:相同的个体加上促进多样性的机制,产生涌现的最优性。

英文摘要

We develop a rigorous statistical theory of multi-head attention (MHA) as an ensemble of Nadaraya-Watson (NW) kernel regression estimators. Building on the algebraic identity between single-head softmax attention and the NW estimator, we prove that MHA is a structured ensemble of H NW estimators, each operating in a distinct learned projection subspace of the key space. We derive an explicit Bias-Variance-Covariance decomposition of the MHA mean squared error, showing that variance reduction depends not merely on the number of heads H but fundamentally on the decorrelation of head outputs. Decorrelation is governed by the principal angles between learned projection subspaces: orthogonal projections yield maximum variance reduction; aligned projections yield none. We introduce the Head Diversity Index (HDI), a computable spectral measure of inter-head decorrelation, and prove that MHA mean squared error is monotonically decreasing in HDI. This provides the first rigorous theoretical explanation for the empirically observed specialization of attention heads. Under a fixed total-dimension budget D = H * d_k, we solve the optimal head-dimension allocation problem, deriving the MSE-minimizing pair (H*, d_k*) from data distribution and regression smoothness. The solution yields a new architectural scaling law: the optimal per-head dimension grows logarithmically with training set size, while the optimal number of heads grows nearly linearly with the total budget D. Our framework unifies three strands of prior work: the NW theory of single-head attention, the general weighting theory for ensemble learning, and the decorrelation-variance-reduction isomorphism between biological and computational ensembles. Multi-head attention is the Transformer's instantiation of a universal principle: identical agents plus diversity-enforcing mechanisms yields emergent optimality.
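
The algebraic identity this abstract builds on can be checked numerically: single-head softmax attention equals a Nadaraya-Watson estimator with the exponential kernel $K(q, k) = \exp(q^\top k / \sqrt{d})$. The Python sketch below is illustrative only. The function names are hypothetical, and `ensemble_variance` uses the textbook formula for the variance of an average of equicorrelated estimators, not the paper's Bias-Variance-Covariance decomposition or its Head Diversity Index.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-head scaled dot-product attention for one query."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def nadaraya_watson(query, keys, values):
    """Kernel regression: sum_i K(q, k_i) v_i / sum_i K(q, k_i)."""
    scale = math.sqrt(len(query))
    kernel = [math.exp(dot(query, k) / scale) for k in keys]
    norm = sum(kernel)
    dim = len(values[0])
    return [sum(kw * v[j] for kw, v in zip(kernel, values)) / norm for j in range(dim)]


def ensemble_variance(sigma2, rho, n_heads):
    """Variance of the mean of n_heads estimators with variance sigma2
    and common pairwise correlation rho (textbook equicorrelation case)."""
    return sigma2 * (1.0 / n_heads + (1.0 - 1.0 / n_heads) * rho)


query = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [4.0]]
att_out = attention(query, keys, values)
nw_out = nadaraya_watson(query, keys, values)
```

In the equicorrelated case, uncorrelated heads (`rho = 0`) divide the variance by the number of heads, while perfectly correlated heads (`rho = 1`) give no reduction, which mirrors the abstract's statement about orthogonal versus aligned projections.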

2605.20270 2026-05-21 cs.LG cs.AI stat.ML

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Hamed Khosravi, Xiaoming Huo

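AI summary: This work proposes Conformal Selective Acting (CSA), a per-round wrapper that gives anytime-valid selective-risk control when deploying RLVR-trained LLMs. CSA fills an empty cell forced by deployment requirements by maintaining a Ville-type e-process per threshold on a Bonferroni grid, which yields pathwise validity, and it is evaluated on multiple benchmarks.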

详情
AI中文摘要

一个本地专家LLM通过在运营方本地数据上使用基于可验证奖励的强化学习(RLVR)进行微调,被部署在一个受监管的组织中,该部署的误差预算为 $α$。运营方需要在每一轮为该部署的数据流提供安全证书:不跨部署汇总,也不等待长期平均。现有封装器无法在自适应、在线更新的数据流上做到这一点:离线conformal风险方法需要可交换性;在线conformal方法只约束长期平均;非可交换扩展仅具有边际有效性;而最接近的anytime封装器A-RCPS控制的是边际风险而非选择性风险。借助(检验统计量、有效性保证、部署规则)框架,我们识别出一个由部署要求所迫使的空单元:每个阈值一个e-process、选择性风险、anytime-pathwise有效性、max-certified-threshold规则。Conformal Selective Acting (CSA) 填补了这一空单元:它是一个逐轮封装器,在Bonferroni网格上为每个阈值维护一个Ville型e-process,并相对于RLVR滤流(filtration)进行评估。在可预测更新和isotonic校准的单调风险条件下,我们证明了 (i) 一个anytime-pathwise选择性风险界 $R_T^{\mathrm{act}}\le α+O(N_T^{-1/2})$,(ii) 与 $Θ(\bar{η}^{-2}\log(1/δ))$ 匹配的速率最优认证,以及 (iii) 与时间跨度(horizon)无关的发布率差距。在八个专家基准($480$ 条流)、十六个对抗性分布偏移单元($160$ 条流)以及五个实时Expert-Iteration RLVR单元(在三个架构家族的四个基础模型上使用在线LoRA,共 $10{,}300$ 轮)上,CSA是十种对比方法中唯一在每个单元上都同时满足pathwise有效性和非拒绝部署的方法。我们不提出新的LLM、训练算法或策略类;CSA是部署端的补充,与模型正交,面向无法使用前沿API的运营方。

英文摘要

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $α$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: e-process per threshold, selective risk, anytime-pathwise validity, max-certified-threshold rule. Conformal Selective Acting (CSA) fills it as a per-round wrapper maintaining a Ville-type e-process per threshold on a Bonferroni grid, evaluated against the RLVR filtration. Under predictable updates and isotonic-calibrated monotone risk we prove (i) an anytime-pathwise selective-risk bound $R_T^{\mathrm{act}}\leα+O(N_T^{-1/2})$, (ii) rate-optimal certification matching $Θ(\barη^{-2}\log(1/δ))$, and (iii) a horizon-independent release-rate gap. Across eight specialist benchmarks ($480$ streams), sixteen adversarial distribution-shift cells ($160$ streams), and five live Expert-Iteration RLVR cells with online LoRA over four base models in three architecture families ($10{,}300$ rounds), CSA is the only method among ten compared that satisfies pathwise validity and non-refusing deployment on every cell. We do not propose a new LLM, training algorithm, or policy class; CSA is the deployment-side complement, orthogonal to the model, for operators who cannot use a frontier API.

2605.20269 2026-05-21 cs.LG cs.AI stat.ML

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

捕捉移动子空间:超越平稳性的低秩老虎机

Hamed Khosravi, Xiaoming Huo

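AI summary: This paper studies low-rank linear contextual bandits whose latent subspace drifts and proposes SPSC, an algorithm that tracks subspace changes while attaining a dynamic regret rate governed by the intrinsic rank.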

详情
AI中文摘要

许多老虎机应用(推荐、临床给药、广告定向)有两个事实,以往的工作只孤立处理:奖励生活在低维潜在子空间上,且该子空间漂移。静态低秩老虎机利用秩但受子空间变化影响;非静态线性老虎机适应漂移但以环境速率$\widetilde{O}(d\sqrt{T})$工作。我们研究了分段静态低秩线性上下文老虎机,具有标量反馈:$θ_t = B_k^\star w_t$,其中秩-$r$因子$B_k^\star\in\mathbb{R}^{d\times r}$在每个未知的$K$段内恒定,且可以在边界处改变。我们的结果在三个轴上都是紧致的。 (i) 识别边界。在单次标量奖励下,移动子空间可通过奖励的二次函数来恢复,当且仅当三个探针侧条件成立:已知噪声方差、有界状态-噪声耦合、以及全维探针支持。每个都是在无限制二次矩问题中的必要条件,且共同它们是充分的,表征了解决区域的边界。 (ii) 算法和动态遗憾。SPSC在学习的$r$维子空间内交替等距探针与窗口投影岭UCB利用;CUSUM样式的变体在线发现段边界。成本动态遗憾是$\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$,用内在秩代替环境$d\sqrt{T}$速率。 (iii) 实验。在十一基准上,从合成、UCI/MovieLens、半合成临床和ZOZOTOWN生产日志数据跨度,SPSC在$d-r\gtrsim T^{1/6}$时优于非静态和低秩基线,匹配分析交叉点。据我们所知,这是在该设置中首次工作来表征识别边界并达到内在秩动态遗憾率的工作。

英文摘要

Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $θ_t = B_k^\star w_t$ with rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ constant within each of $K$ unknown segments and able to shift at boundaries. Our results are tight along three axes. (i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region. (ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank. (iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover. To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.

2605.15955 2026-05-21 eess.SP stat.ML

Topological Kalman Filtering on Cell Complexes

细胞复形上的拓扑卡尔曼滤波

Chengen Liu, Rohan Money, Ting Gao, Mohammad Sabbaqi, Baltasar Beferull-Lozano, Elvin Isufi

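AI summary: This paper proposes a topology-aware state space framework for inferring latent dynamics from multivariate time series defined over topological cell complexes. It combines heat-like topological diffusion with a nonlinear observation mapping, and it recovers second-order cell structure when only lower-order topology is known.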

详情
AI中文摘要

从定义在拓扑细胞复形上的多变量时间序列中推断潜在动态对于捕捉现实世界系统中固有的复杂高阶相互作用至关重要,例如水、传感器和交通网络。然而,由于信号在更高阶拓扑结构之间耦合,高维性、非线性观测和未知结构增加了重建这些潜在状态的难度。为此,我们提出了一种基于细胞复形上的随机偏微分方程的拓扑感知状态空间框架。状态演化遵循类似于热的拓扑扩散,扰动沿边界算子传播。在部分可观测的情况下,我们利用细胞复形卷积将潜在状态与非线性映射结合,以建模观测。我们通过扩展卡尔曼滤波进行递归状态估计,同时通过在线期望最大化算法学习模型参数和不确定性。最后,对于仅已知低阶拓扑结构的情况,例如节点和边,如在关键基础设施网络中,我们引入了一种启发式的细胞识别算法,以显式推断第二阶细胞结构。在合成和真实数据集上的验证表明,我们的方法在部分可观测情况下能够产生可靠的估计,并成功恢复底层拓扑结构。

英文摘要

Inferring latent dynamics from multivariate time-series defined over topological cell complexes is crucial for capturing the complex, higher-order interactions inherent in real-world systems such as in water, sensor, and transportation networks. However, reconstructing these latent states is challenging because the signals are coupled across higher-order topologies, while high dimensionality, nonlinear observations, and unknown structures increase the difficulty. To address this, we propose a topology-aware state space framework derived from stochastic partial differential equations on cell complexes. State evolution follows heat-like topological diffusion, with perturbations propagating along boundary operators. Under partial observability, we model observations using a cell complex convolution of latent states coupled with a nonlinear mapping. We perform recursive state estimation via an Extended Kalman Filter, simultaneously learning model parameters and uncertainties through an online Expectation-Maximization algorithm. Finally, for scenarios where only lower-order topological structure is known, e.g., nodes and edges, as in critical infrastructure networks, we introduce a heuristic cell identification algorithm to explicitly infer the second-order cell structures. Validations on synthetic and real datasets from water, sensor and transportation networks demonstrate that our approach yields reliable estimates under partial observability and successfully recovers the underlying topological structures.

2603.07312 2026-05-21 stat.ME

Predictive Power Analysis of Multiple Test Procedures Under Arbitrary Dependence

在任意依赖情况下多重检验程序的预测能力分析

George Karabatsos

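AI summary: This paper proposes a Bayesian predictive power analysis method for computing the power and sample size of multiple testing procedures under arbitrary dependence, and illustrates the simulation-based method on $p$-values from a study of lead exposure.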

详情
AI中文摘要

许多统计问题可以通过应用一种多重检验程序(MTP)来解决,该程序在未知的任意相关p值下控制家族错误率(FWER)或虚假发现率(FDR),而无需显式建模这些相关性。这些包括控制FWER的Bonferroni(1936)MTP和Holm(1979)MTP;控制FDR的Benjamini和Yekutieli(2001)MTP;以及基于Dirichlet过程(DP)先验分布的DP-MTP(Karabatsos, 2025),该分布支持整个MTP空间,这些MTP控制FWER或FDR。对于此类MTP,本研究介绍了一种新的、相容的方法用于贝叶斯预测能力分析,以计算任何给定计划未来(例如,复制或中期)研究的统计功效和样本量确定。这种新的MTP预测能力分析方法基于一个联合先验分布,定义了一个不对称多元正态均值-方差混合分布的比例矩阵混合,分解为一个通用的效应量先验分布(例如,来自专家判断或先前研究的结果),以及一个均匀先验分布,用于表示给定多个假设检验的测试统计量的p值之间的任意依赖性。新的MTP功效分析方法还产生p值权重,可用于最小化和评估多重检验中的相对影响以及显著性追逐偏差(例如,出版偏倚、p-hacking等),而无需假设p值(效应量)是独立的。新的基于模拟的MTP预测能力分析方法通过分析通过著名铅暴露研究获得的p值,并由先前MTP文献重新分析,使用R包bnpMTP进行了说明。

英文摘要

Many statistical problems can be addressed by applying a multiple testing procedure (MTP) that controls either the Family-wise Error Rate (FWER) or False Discovery Rate (FDR) under unknown arbitrarily-interdependent $p$-values, without explicitly modeling these inter-correlations. They include the FWER-controlling Bonferroni (1936) MTP and Holm (1979) MTP; the FDR-controlling Benjamini and Yekutieli (2001) MTP; and the DP-MTP (Karabatsos, 2025), based on a Dirichlet process (DP) prior distribution supporting the entire space of MTPs that control either the FWER or FDR. For such an MTP, this study introduces a new and congenial method for Bayesian predictive power analysis, for power calculation and sample size determination for any given planned future (e.g., replication or interim) study. This novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized as a general prior distribution for effect sizes (e.g., obtained from expert judgment or results of prior studies), and a uniform prior distribution for correlation matrices representing arbitrary dependencies between $p$-values of test statistics of given multiple hypothesis tests under their alternative hypotheses. The new MTP power analysis method also results in $p$-value weights which can be used to minimize the relative impacts of and assess for significance-chasing biases (e.g., publication bias, $p$-hacking, etc.) in multiple testing, without needing to assume that $p$-values (effect sizes) are independent. The new simulation-based MTP predictive power analysis method is illustrated through the analysis of $p$-values obtained by a famous study of lead exposure and re-analyzed by the previous MTP literature, using R package bnpMTP.
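
The three classical procedures named in this abstract are short enough to write out. The Python sketch below implements the standard Bonferroni, Holm step-down, and Benjamini-Yekutieli step-up rules on a list of $p$-values. It does not implement the DP-MTP or the predictive power analysis, and it is unrelated to the `bnpMTP` R package; the function names are chosen here for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the FWER)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down: compare sorted p-values with alpha / (m - rank)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        if pvalues[i] > alpha / (m - rank):
            break  # stop at the first failure
        reject[i] = True
    return reject


def benjamini_yekutieli(pvalues, alpha=0.05):
    """BY step-up: largest k with p_(k) <= k * alpha / (m * c(m)),
    where c(m) = 1 + 1/2 + ... + 1/m (controls the FDR under dependence)."""
    m = len(pvalues)
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / (m * c_m):
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject


# Hypothetical p-values, in the original (unsorted) hypothesis order.
pvals = [0.03, 0.001, 0.2, 0.012, 0.04]
```

Each function returns rejection decisions in the input order. On `pvals`, Holm rejects one more hypothesis than Bonferroni, which reflects the usual ordering of the two procedures' power.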

2512.19373 2026-05-21 stat.ML cs.LG

Cluster-Based Generalized Additive Models Informed by Random Fourier Features

基于聚类的广义加性模型:受随机傅里叶特征启发

Xin Huang, Jia Li, Jun Yu

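AI summary: This paper proposes an interpretable regression framework for heterogeneous data that combines response-informed spectral representation learning with localized additive modeling. A random Fourier feature regression model yields a spectral feature map, which principal component analysis compresses into a low-dimensional latent embedding. A Gaussian mixture model then discovers soft regimes, and a cluster-specific generalized additive model captures nonlinear covariate effects within each regime. The final predictor is a soft mixture of these local additive models, which allows flexible modeling of nonlinear, heterogeneous structure while preserving interpretability.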

详情
Comments
33 pages, 13 figures, 7 tables
AI中文摘要

在开发数据驱动的建模方法时,需要在黑箱模型的强大预测性能与关键应用所需透明性之间取得平衡。本文介绍了一种可解释且计算上可行的回归框架,用于异质数据,通过结合响应引导的谱表示学习与局部加性建模。该方法首先拟合一个随机傅里叶特征回归模型,并构建一个谱特征图,从学习的振幅和自适应重新采样频率中获得,使表示反映数据中的预测变化。该表示随后通过主成分分析压缩以获得低维潜在嵌入,在其中高斯混合模型执行软区域发现。在每个区域中,聚类特定的广义加性模型通过可解释的样条基单变量平滑函数捕捉非线性协变量效应。最终预测器由这些局部加性模型的软混合组成,使能够灵活地建模非线性和异质结构,同时保持可解释性。在多个基准回归数据集上的数值实验表明,所提出的方法在一致地优于经典全局可解释基线的同时,仍与更灵活的黑箱模型竞争。总体而言,该框架提供了一种统一的异质回归方法,结合了预测适应性与可解释的局部协变量效应。

英文摘要

In developing data-driven modeling methodologies, there is an ongoing need to reconcile the strong predictive performance of opaque black-box models with the transparency required for critical applications. This work introduces an interpretable and computationally tractable regression framework for heterogeneous data by combining response-informed spectral representation learning with localized additive modeling. The method first fits a random Fourier feature regression model and constructs a spectral feature map from the learned amplitudes and adaptively resampled frequencies, so that the representation reflects predictive variation in the data. This representation is then compressed by principal component analysis to obtain a low-dimensional latent embedding, in which a Gaussian mixture model performs soft regime discovery. Within each regime, a cluster-specific generalized additive model captures nonlinear covariate effects through interpretable spline-based univariate smooth functions. The final predictor is formed as a soft mixture of these local additive models, enabling flexible modeling of a nonlinear, heterogeneous structure while preserving interpretability. Numerical experiments across several benchmark regression datasets show that the proposed method consistently improves upon classical globally interpretable baselines while remaining competitive with more flexible black-box models. Overall, the framework provides a unified approach to heterogeneous regression that combines predictive adaptivity with interpretable local covariate effects.

2512.02182 2026-05-21 stat.ME stat.AP

Two-phase validation sampling via principal components to improve efficiency in multi-model estimation from error-prone biomedical databases

通过主成分进行两阶段验证抽样,以提高基于易出错生物医学数据库的多模型估计效率

Sarah C. Lotspeich, Cole Manschot

AI总结 本文提出了一种基于主成分分析的两阶段验证抽样方法:对易出错暴露变量的第一主成分取值最极端的患者进行验证,从而在多个感兴趣的模型之间平衡并兼顾估计效率。

详情
Comments
22 pages, 5 figures, 2 tables, GitHub repositories with R package and simulation/analysis code
AI中文摘要

两阶段抽样提供了一种成本效益高的方式来验证生物医学数据库中易出错的协变量测量。第一阶段对全部研究对象收集廉价或容易获得的信息,第二阶段再对部分患者进行成本高昂的验证(如专家病历审查),以获得更准确的数据。在兼顾主要和次要分析时,相互竞争的模型和优先事项可能使“信息量最大的第二阶段抽样标准”难以明确定义。极端尾部抽样(ETS)选择某一特定量(如协变量或残差)取值最小和最大的患者;当只关注单一分析目标时,它针对对Fisher信息贡献最大的观测值,因而可以在两阶段研究中带来很高的统计效率。我们提出了一种直观、易用的方法,将ETS加以扩展,以在多个感兴趣的模型之间平衡并优先解释最大的变异量。我们利用主成分分析简洁地概括所有模型中易出错暴露变量的固有变异,然后抽取第一主成分取值最极端的患者进行验证。通过大量模拟以及对国家健康和营养调查(NHANES)的应用,所提出的策略在多个感兴趣的模型上同时提高了效率。其优势在多种现实场景中依然存在,包括测量误差相关或异质的情形。在设计验证研究时,只关注单个模型可能是短视的;更广泛地战略性分配资源可以同时兼顾多个分析目标。在抽样前进行降维,将使该策略能够很好地扩展到具有许多易出错暴露变量的大数据应用中。

英文摘要

Two-phase sampling offers a cost-effective way to validate error-prone covariate measurements in biomedical databases. Inexpensive or easy-to-obtain information is collected for the entire study in Phase I. Then, a subset of patients undergoes cost-intensive validation (e.g., expert chart review) to collect more accurate data in Phase II. When balancing primary and secondary analyses, competing models and priorities can result in poorly defined objectives for the most informative Phase II sampling criterion. Extreme tail sampling (ETS), wherein patients with the smallest and largest values of a particular quantity (like a covariate or residual) are selected, can offer great statistical efficiency in two-phase studies when focusing on a single analytic objective by targeting observations with the biggest contributions to the Fisher information. We propose an intuitive, easy-to-use approach that extends ETS to balance and prioritize explaining the largest amount of variability across multiple models of interest. Using principal components analysis, we succinctly summarize the inherent variability of all models' error-prone exposures. Then, we sample patients with the most extreme values of the first principal component for validation. Through extensive simulations and an application to the National Health and Nutrition Examination Survey (NHANES), the proposed strategy offered simultaneous efficiency gains across multiple models of interest. Its advantages persisted across various real-world scenarios, including correlated or heterogeneous measurement error. When designing a validation study, concentrating on a single model may be short-sighted. Strategically allocating resources more broadly balances multiple analytical goals simultaneously. Employing dimension reduction before sampling will allow this strategy to scale up well to big-data applications with many error-prone exposures.
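The sampling rule described above (score every patient on the first principal component of the error-prone exposures, then validate the two tails) can be sketched in plain Python. This is a minimal illustration, not the authors' R package: the function names, the standardization of each exposure, and the use of power iteration to find the leading component are all assumptions made here.

```python
import math
import random


def first_pc_scores(rows, iters=500, seed=0):
    """Scores on the first principal component of the standardized columns."""
    n, p = len(rows), len(rows[0])
    # Standardize each column (assumes no column is constant).
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # Sample correlation matrix of the exposures.
    corr = [[sum(zi[a] * zi[b] for zi in z) / (n - 1) for b in range(p)] for a in range(p)]
    # Power iteration from a random start converges to the leading eigenvector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(zi[j] * v[j] for j in range(p)) for zi in z]


def extreme_tail_sample(scores, n_validate):
    """Indices of the n_validate // 2 smallest and n_validate // 2 largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    half = n_validate // 2  # an odd n_validate is rounded down to an even count
    return sorted(order[:half] + order[len(order) - half:])
```

The sign of a principal component is arbitrary, but that does not matter here because both tails are sampled symmetrically.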

2504.01355 2026-05-21 stat.ME econ.EM

A Practical Guide to Estimating Conditional Marginal Effects: Modern Approaches

一种估计条件边际效应的实用指南:现代方法

Jiehan Liu, Ziyi Liu, Yiqing Xu

AI总结 本文提供了一种使用现代统计方法估计条件边际效应的实用指南,讨论了处理效应如何随调节变量变化,并改进了现有解决方案,如半参数核估计器,引入了稳健的估计策略,如AIPW-Lasso和DML,并通过模拟和实证例子评估了每种方法,提供了针对样本量和研究背景的实用建议。

详情
AI中文摘要

本元素提供了一份使用现代统计方法估计条件边际效应(即处理效应如何随调节变量变化)的实用指南。常用方法(如线性交互模型)常面临估计目标不明确、重叠有限和函数形式限制过强等问题。本指南首先明确定义估计目标并给出主要的识别结果,然后回顾并改进现有解决方案(如半参数核估计器),并引入稳健的估计策略,包括带有Lasso选择的增广逆倾向得分加权(AIPW-Lasso)和采用现代算法的双重机器学习(DML)。每种方法均通过模拟和实证例子进行评估,并针对样本量和研究背景给出实用建议。所有工具均在配套的R语言interflex包中实现。

英文摘要

This Element offers a practical guide to estimating conditional marginal effects (how treatment effects vary with a moderating variable) using modern statistical methods. Commonly used approaches, such as linear interaction models, often suffer from unclear estimands, limited overlap, and restrictive functional forms. This guide begins by clearly defining the estimand and presenting the main identification results. It then reviews and improves upon existing solutions, such as the semiparametric kernel estimator, and introduces robust estimation strategies, including augmented inverse propensity score weighting with Lasso selection (AIPW-Lasso) and double machine learning (DML) with modern algorithms. Each method is evaluated through simulations and empirical examples, with practical recommendations tailored to sample size and research context. All tools are implemented in the accompanying interflex package for R.
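To see why a linear interaction model imposes a restrictive functional form, consider the textbook specification below. The notation is assumed here for illustration and is not taken from the Element: $Y$ is the outcome, $D$ the treatment, $X$ the moderator, and $\alpha, \beta, \gamma, \delta$ are regression coefficients.

```latex
\begin{align*}
Y &= \alpha + \beta D + \gamma X + \delta\,(D \cdot X) + \epsilon,
  \qquad \mathbb{E}[\epsilon \mid D, X] = 0, \\
\frac{\partial\, \mathbb{E}[Y \mid D, X = x]}{\partial D} &= \beta + \delta x .
\end{align*}
```

Under this specification the conditional marginal effect is forced to be a straight line in $x$. Any nonlinear dependence of the effect on the moderator is missed, which is the kind of restriction the more flexible estimators listed in the abstract are meant to relax.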

2503.03347 2026-05-21 math.ST stat.TH

Drift estimation for rough processes under small noise asymptotic : trajectory fitting method

小噪声渐近下粗糙过程的漂移估计:轨迹拟合方法

Arnaud Gloter, Nakahiro Yoshida

AI总结 本文研究了在小噪声渐近下,通过轨迹拟合方法估计满足随机Volterra方程的过程中未知漂移参数的问题,提出了一种一致且渐近正态的轨迹拟合估计量,并给出了保证估计量在L^p意义下收敛的可识别性条件。

详情
AI中文摘要

我们考虑一个满足随机Volterra方程的过程$X^\varepsilon$,其漂移函数中包含未知参数$θ^\star$。Volterra核是奇异的,例如$K_0(u)=c u^{α-1/2} \mathbf{1}_{\{u>0\}}$,其中$α\in (0,1/2)$。假设扩散系数与$\varepsilon \to 0$成正比。从路径$(X^\varepsilon_s)_{s\in[0,T]}$的观测中,我们构建了一个轨迹拟合估计量,证明其一致性和渐近正态性。我们还指定了保证估计量在$L^p$意义下收敛的可识别性条件。

英文摘要

We consider a process $X^\varepsilon$ that solves a stochastic Volterra equation with an unknown parameter $θ^\star$ in the drift function. The Volterra kernel is singular and includes, as an example, $K_0(u)=c u^{α-1/2} \mathbf{1}_{\{u>0\}}$ with $α\in (0,1/2)$. It is assumed that the diffusion coefficient is proportional to $\varepsilon \to 0$. From an observation of the path $(X^\varepsilon_s)_{s\in[0,T]}$, we construct a Trajectory Fitting Estimator, which is shown to be consistent and asymptotically normal. We also specify identifiability conditions ensuring the $L^p$ convergence of the estimator.

2412.15076 2026-05-21 stat.AP

Digital N-of-1 Trials and their Application in Experimental Physiology

数字N-of-1试验及其在实验生理学中的应用

Stefan Konigorski, Mathias Ried-Larsen, Christopher H Schmid

AI总结 本文介绍了N-of-1试验这一研究设计,它可用于对个体受试者的干预效果进行有效的统计推断,并可跨受试者汇总以得到群体层面的推断;文中通过实验生理学中的实例说明了其设计要点、统计分析方法及可用的数字工具。

详情
Comments
Accepted in Experimental Physiology. https://doi.org/10.1113/EP092753
AI中文摘要

传统上,实验生理学研究通常在小规模的人类受试者群体、动物模型或细胞系中进行。要找到在小样本量下仍具有足够统计功效、能够对群体层面效应作出正确统计推断的最优研究设计,一直具有挑战性。此外,由传统群体层面推断得出的平均效应不一定适用于个体受试者。在这里,我们介绍N-of-1试验这一创新的研究设计:它可以用于对个体受试者的干预效果进行有效的统计推断,并且可以跨多个研究受试者进行汇总,从而比标准的群体随机试验更高效地给出群体层面的推断。N-of-1试验自20世纪80年代末以来已在医疗环境中使用,但尚未得到大规模采用,在实验生理学研究中的应用也很少。在本文中,我们介绍了N-of-1试验的关键组成部分和设计特征,描述了结果的统计分析和解释,并通过实验生理学中的实例介绍了一些有助于其使用的现有数字工具。

英文摘要

Traditionally, studies in experimental physiology have been conducted in small groups of human participants, animal models or cell lines. Identifying optimal study designs that achieve sufficient power for drawing proper statistical inferences to detect group-level effects with small sample sizes has been challenging. Moreover, average effects derived from traditional group-level inference do not necessarily apply to individual participants. Here, we introduce N-of-1 trials as an innovative study design that can be used to draw valid statistical inferences about the effects of interventions on individual participants and can be aggregated across multiple study participants to provide population-level inferences more efficiently than standard group-randomized trials. N-of-1 trials have been used in healthcare settings since the late 1980s, but without large-scale adoption and with few applications in experimental physiology research settings. In this manuscript, we introduce the key components and design features of N-of-1 trials, describe the statistical analysis and interpretation of the results, and, using examples from experimental physiology, describe some available digital tools that facilitate their use.
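The two levels of inference mentioned above, an effect per participant and an aggregate across participants, can be illustrated with a deliberately simple sketch. The data layout, the arm labels "A" and "B", and the function names are assumptions made here. A real analysis would typically also account for autocorrelation, carryover between periods, and differing precision across participants, for example with a hierarchical model.

```python
from statistics import mean


def individual_effect(periods):
    """Within-participant effect: mean outcome under arm 'B' minus mean under arm 'A'."""
    a = [y for arm, y in periods if arm == "A"]
    b = [y for arm, y in periods if arm == "B"]
    return mean(b) - mean(a)


def pooled_effect(trials):
    """Unweighted average of the individual effects across participants."""
    return mean(individual_effect(periods) for periods in trials.values())


# Hypothetical crossover data: each participant alternates between arms A and B.
trials = {
    "p1": [("A", 5.0), ("B", 7.0), ("A", 6.0), ("B", 8.0)],
    "p2": [("A", 4.0), ("B", 4.0), ("B", 6.0), ("A", 6.0)],
}
effects = {pid: individual_effect(periods) for pid, periods in trials.items()}
overall = pooled_effect(trials)
```

In this toy data the two participants respond differently, which illustrates the abstract's point that an average effect need not apply to each individual.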

2411.05758 2026-05-21 math.ST econ.EM stat.TH

Limit theorems of matching estimators with a fixed number of matches

具有固定匹配数的匹配估计量的极限定理

Songliang Chen, Fang Han

AI总结 本文重新审视Abadie和Imbens关于固定匹配数的最近邻匹配估计量(用于估计平均处理效应)的极限定理,首次建立了极限方差可显式计算的非归一化中心极限定理(CLT)。关键在于证明Abadie和Imbens的CLT中出现的归一化统计量收敛到其均值,并计算该均值极限的闭式表达式。前者填补了一项未发表工作(Abadie和Imbens,2002)论证中的空白,后者解决了Abadie和Imbens(2006)提出的问题。

详情
Comments
In this version, we close a gap in the original submission
AI中文摘要

本文重新审视Abadie和Imbens关于固定匹配数的最近邻匹配估计量(用于估计平均处理效应)的极限定理。我们首次建立了极限方差可显式计算的非归一化中心极限定理(CLT)。关键在于证明Abadie和Imbens的CLT中出现的归一化统计量收敛到其均值,并计算该均值极限的闭式表达式。前者填补了一项未发表工作(Abadie和Imbens,2002)论证中的空白,后者解决了Abadie和Imbens(2006)提出的问题。

英文摘要

This paper re-examines the limit theorems of Abadie and Imbens for nearest-neighbor matching estimators of average treatment effects with a fixed number of matches. We establish, for the first time, a non-normalized central limit theorem (CLT) with an explicitly calculated limiting variance. The key ingredients are to prove the convergence of the normalizing statistic appearing in the CLT of Abadie and Imbens to its mean, and to calculate the closed form of the limit of this mean. The former closes a gap in the argument of an unpublished work (Abadie and Imbens, 2002), while the latter resolves a question raised in Abadie and Imbens (2006).
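For readers unfamiliar with the estimator whose limit theory is studied here, a nearest-neighbor matching estimator of the average treatment effect with a fixed number of matches can be sketched as follows. This is a minimal illustration under assumptions made here: a single scalar covariate, matching with replacement, absolute distance, ties broken by index order, and at least one unit in each treatment arm. The function name is hypothetical.

```python
def matching_ate(x, d, y, m=1):
    """Nearest-neighbor matching estimate of the ATE using m matches per unit."""
    n = len(x)
    total = 0.0
    for i in range(n):
        # Candidate matches are all units in the opposite treatment arm.
        pool = [j for j in range(n) if d[j] != d[i]]
        nearest = sorted(pool, key=lambda j: abs(x[j] - x[i]))[:m]
        imputed = sum(y[j] for j in nearest) / len(nearest)
        # The observed outcome fills one potential outcome,
        # the average over the matches fills the other.
        y1, y0 = (y[i], imputed) if d[i] == 1 else (imputed, y[i])
        total += y1 - y0
    return total / n


# Toy data with a constant treatment effect of 2 and exact covariate matches.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [xi + 2.0 * di for xi, di in zip(x, d)]
estimate = matching_ate(x, d, y, m=1)
```

Keeping `m` fixed as the sample grows is the regime the abstract refers to. The sketch only computes the point estimate and does not implement the limiting variance derived in the paper.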