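arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.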

2605.21486 2026-05-21 cs.LG cond-mat.dis-nn cs.AI stat.ML

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate

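Dayal Singh Kalra, Maissam Barkeshli

AI summary: This paper studies how to quantify hyperparameter transfer, evaluating transfer quality with three metrics. It finds that the benefit of Maximal Update parameterization (μP) over standard parameterization comes mainly from maximizing the embedding-layer learning rate, and that weight decay improves scaling-law fits but reduces extrapolation robustness in the fixed token-per-parameter setting.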

Comments: 10+28 pages, 5+17 figures

Abstract

Hyperparameter transfer allows extrapolating optimal optimization hyperparameters from small to large scales, making it critical for training large language models (LLMs). This is done either by fitting a scaling law to the hyperparameters or by a judicious choice of parameterization, such as Maximal Update ($μ$P), that renders optimal hyperparameters approximately scale invariant. In this paper, we first develop a framework to quantify hyperparameter transfer through three metrics: (1) the quality of the scaling law fit, (2) the robustness to extrapolation errors, and (3) the asymptotic loss penalty due to choice of parameterization. Next, we investigate through a comprehensive series of ablations why $μ$P appears to offer high-quality learning rate transfer relative to standard parameterization (SP), as existing theory is inadequate. We find that the overwhelming benefit of $μ$P relative to SP when training with AdamW arises simply from maximizing the learning rate of the embedding layer. In SP, the embedding layer learning rate acts as a bottleneck that induces training instabilities; increasing it by a factor of width to match $μ$P dramatically smooths out training while improving hyperparameter transfer. We also find that weight decay improves the scaling law fits, while, in the fixed token-per-parameter setting, it hurts the robustness of the extrapolation.
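
The scaling-law route mentioned in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the optimal learning rate follows a power law in model width, lr*(n) = a * n^b, and fits a and b by least squares in log-log space before extrapolating to a larger width. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law(widths, best_lrs):
    """Fit best_lr ~ a * width**b by ordinary least squares on logarithms."""
    xs = [math.log(w) for w in widths]
    ys = [math.log(lr) for lr in best_lrs]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space (the exponent)
    a = math.exp(y_mean - b * x_mean)   # intercept mapped back to a prefactor
    return a, b


def extrapolate(a, b, width):
    """Predict the optimal learning rate at an unseen width."""
    return a * width ** b


# Toy data (hypothetical): the best learning rate is exactly 0.5 / width.
widths = [128, 256, 512, 1024]
best_lrs = [0.5 / w for w in widths]
a, b = fit_power_law(widths, best_lrs)
lr_4096 = extrapolate(a, b, 4096)
```

With real sweeps the fit is noisy, which is where metrics such as fit quality and robustness to extrapolation error become relevant.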


2605.21464 2026-05-21 stat.AP

Assessing the impact of tourist attractions through the integration of causal inference and demand-side economic analysis: A case study of the Sensoria experience museum in Holzminden, Germany

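Thomas Wieland

AI summary: Combining causal inference with demand-side economic analysis, this paper studies how the Sensoria experience museum, opened in September 2024 in Holzminden, Germany, affected local tourism demand and the related direct and indirect effects. It finds 4,691 additional overnight stays in the first year of operation and about 0.56 million EUR in additional gross turnover; long-term effects cannot yet be determined.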

Comments: v1.0.0

Abstract

This research note investigates the impact of the experience museum Sensoria, opened in September 2024 in Holzminden, Germany, on local tourism demand and related direct and indirect effects. To this end, the study employs a novel approach by combining causal inference and demand-side economic analysis. A difference-in-differences approach is employed to quantify the number of additional guest overnight stays in the treatment city; the results are converted into industry-specific expenditures, from which the direct and indirect effects of Sensoria are determined. A positive and significant impact which corresponds to 4,691 additional overnight stays can be detected in the first year of operation of the new tourist attraction, resulting in an additional gross turnover of approximately 0.56 million EUR across the hospitality and retail industries and other services. The direct effects and indirect effects amount to approximately 0.23 and 0.21 million EUR, respectively. However, long-term effects cannot (yet) be determined. Additionally, positive effects from small and large events in the cities studied can be demonstrated. This brief study demonstrates that combining the two approaches mentioned holds promise, yet requires a more in-depth analysis, for which suggestions are also discussed regarding how it could be conducted.
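
The difference-in-differences idea used above can be sketched in its simplest two-group, two-period form. This is a minimal illustration, not the paper's specification, which may use a regression with several cities and periods. The function name and the overnight-stay numbers below are hypothetical.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Canonical 2x2 difference-in-differences estimate of a treatment effect.

    Each argument is a list of outcomes (e.g. monthly overnight stays).
    The estimate is the change in the treated group minus the change in
    the control group, which removes a common time trend under the
    parallel-trends assumption.
    """
    def mean(values):
        return sum(values) / len(values)

    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical numbers: the treated city rises by 50, the control city by 10.
effect = did_estimate(
    treated_pre=[100, 110], treated_post=[150, 160],
    control_pre=[90, 100], control_post=[100, 110],
)
```

The estimate attributes only the excess change in the treated city to the new attraction, so any shock that hits both cities equally cancels out.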


2605.21458 2026-05-21 cs.AI cs.LG stat.ME

Mind the Sim-to-Real Gap & Think Like a Scientist

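Harsh Parikh, Gabriel Levin-Konigsberg, Dominique Perrault-Joncas, Alexander Volfovsky

AI summary: This paper studies when and how a planner should supplement a pre-trained simulator with real experiments to reduce the value gap, proposes the Fisher-SEP experimental policy, and illustrates it with two case studies.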

Abstract

Suppose a planner has a pre-trained simulator of a sequential decision problem and the option to run real experiments in the field. The simulator is cheap to query but inherits confounding and drift from its calibration data. Experimentation is unbiased but consumes one real unit per trial. We study when, and how, the planner should supplement the simulator with experiments. We give three results. First, an extended simulation lemma decomposes the simulator's value error into a calibration--deployment shift that randomization can identify and a parametric residual that no further interaction can reduce. Second, the value gap between the simulator-optimal policy and the optimum splits into a local component, on states the deployed policy already visits, and a reachability component, on states it does not. The reachability component stays bounded away from zero at any horizon under purely passive learning. Third, we propose Fisher-SEP, a simulation-aided experimental policy (SEP) that minimizes the posterior predictive variance of a target policy's value, with reward-only and transition-only specializations. Two case studies illustrate the regimes. In a vending-machine supply chain, front-loaded experimentation overtakes posterior updating once the horizon is long enough to amortize the pilot. In an HIV mobile-testing example with a corridor that separates a well-surveilled region from a poorly-surveilled one, only designed exploration reaches the poorly-surveilled region.


2605.21437 2026-05-21 physics.geo-ph cs.LG stat.ML

Neural Negative Binomial Regression for Weekly Seismicity Forecasting: Per-Cell Dispersion Estimation and Tail Risk Assessment

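Alim Igilik

AI summary: This paper proposes a neural-network-based method for seismicity forecasting. By estimating a dispersion parameter for each cell and assessing tail risk, it relaxes the traditional Poisson assumption and improves the accuracy of extreme-event forecasts.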

Comments: 28 pages, 9 figures. Source code available at https://github.com/Al1mkaYandere/seismic-probabilistic-modeling

$L^2$ over Wasserstein: Statistical Analysis and Optimal Transport

Riccardo Passeggeri, Rohan M. Shenoy, Pengcheng Ye

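AI summary: This paper introduces the $L^2$ over Wasserstein space, which inherits the formal Riemannian structure of the Wasserstein space. Working with random probability measures, it gives a theoretical basis for statistical uncertainty in optimal transport and shows applications to generative modelling and Bayesian nonparametrics.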

Comments: 49 pages. Comments are welcome

Abstract

Optimal transport provides an inherently geometric and highly structured framework for studying spaces of probability measures, supplying a rich theoretical toolkit for contemporary statistics, machine learning, and generative modelling. In applications, however, the measures of interest are almost never known precisely, calling for a theory of optimal transport that accounts for statistical uncertainty. We construct such a framework, lifting the classical theory to the setting of random probability measures. We introduce the $L^2$ over Wasserstein space, establishing that it inherits the formal Riemannian structure of the Wasserstein space by characterising distances and geodesic geometry. The structure induces random flows with Wasserstein gradient flow sample paths, making it the natural extension of the Wasserstein space which allows for random gradient flow dynamics. We assemble statistical convergence results of the optimal transport machinery using the empirical measure within the $L^2$ over Wasserstein framework. Moreover, in the setting of Bayesian non-parametrics, we refine Schwartz's consistency theorem to the Wasserstein topology and deduce posterior convergence of the same machinery in the $L^2$ over Wasserstein space. We demonstrate that the growing theory of random token sampling for transformer models using self-attention flow paths can be embedded into our framework. The results provide a unified treatment of random optimal transport and its consequences for principled inference and generative modelling under the statistical uncertainty of random sampling.


2605.21360 2026-05-21 math.ST cs.CC stat.TH

Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

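Jie Xie, Dongming Huang

AI summary: This paper studies testing H0: ξ^Tβ = t0 in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. It constructs a computationally efficient hybrid test, gives an upper bound on the adaptive separation distance, and establishes an information-theoretic lower bound. In the ultra-sparse regime, these bounds characterize the adaptive separation rate for arbitrary ξ; in the moderately sparse regime, they match for certain classes of loading vectors but may differ in general. The paper also proves a low-degree lower bound that matches the upper bound up to logarithmic factors, suggesting that improving on the rate of the hybrid test may be computationally hard. For flat sparse loadings, it gives further evidence through a polynomial-time reduction involving sparse CCA. Finally, it examines how information about the design covariance affects the adaptive separation rate: under a sparse sign-spiked covariance model, the information-theoretic lower bound is attained by a computationally inefficient algorithm, while the low-degree lower bound and the sparse CCA reduction still apply, giving evidence of a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate has the same form as in the ultra-sparse regime.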

Sliced-Regularized Optimal Transport

Khai Nguyen

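AI summary: This paper proposes a new regularized optimal transport (OT) formulation, sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), SROT regularizes the transport plan toward a smoothed sliced OT (SOT) plan. The paper gives a formal definition of SROT, derives its dual formulation, and provides a post-Bayesian interpretation. It then develops a Sinkhorn-style algorithm for efficient computation that retains the scalability advantages of EOT. With a scalable SOT plan as the prior, SROT approximates the exact OT plan more accurately than EOT at the same level of regularization, and the resulting plan improves on the reference SOT plan itself. The paper also introduces the SROT divergence and analyzes its topological and computational properties. Experiments on synthetic datasets, color transfer, and gradient flows support these claims.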

Comments: 22 pages, 8 figures, 1 table

Abstract

We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothened sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT is better than both EOT and SOT in approximating exact OT. Additional experiments on gradient flows further highlight the advantages of SROT divergence.
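
The contrast between regularizing toward an independent coupling and toward a reference plan can be sketched with a generic Sinkhorn iteration. The sketch below assumes the textbook KL-regularized problem, minimizing <C, P> + eps * KL(P || R) subject to the marginal constraints, whose solution has the form P = diag(u) (R * exp(-C / eps)) diag(v). This is not claimed to be the paper's algorithm. The function name is hypothetical, and `ref` is only a stand-in for a smoothed SOT plan, whose construction is not shown here.

```python
import math


def sinkhorn_with_reference(cost, a, b, ref, eps=0.1, iters=500):
    """Sinkhorn scaling for OT regularized by KL divergence to a reference plan.

    cost: n x m cost matrix; a, b: marginals; ref: n x m positive reference plan.
    With ref[i][j] = a[i] * b[j] this reduces to standard entropic OT.
    """
    n, m = len(a), len(b)
    # Gibbs kernel tilted by the reference plan.
    K = [[ref[i][j] * math.exp(-cost[i][j] / eps) for j in range(m)]
         for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Alternate scalings so that row sums match a, then column sums match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem (hypothetical numbers): two points on each side.
a = [0.5, 0.5]
b = [0.5, 0.5]
cost = [[0.0, 1.0], [1.0, 0.0]]
independent = [[a[i] * b[j] for j in range(2)] for i in range(2)]
plan = sinkhorn_with_reference(cost, a, b, independent, eps=0.1)
```

Swapping `independent` for a reference plan that already sits close to the exact OT plan pulls the regularized solution toward that plan, which is the intuition behind using a scalable prior.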


2604.19169 2026-05-21 stat.ME

A Finite Mixture Failure-rate based Heterogeneous Step-stress Accelerated Life Testing (h-SSALT) Model

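Pranoy Palit, Ayan Pal, Kiran Prajapat

AI summary: This paper proposes a finite-mixture, failure-rate based heterogeneous step-stress accelerated life testing model with Weibull-distributed Type-II censored failure times. Heterogeneity arises at the second stress level through a finite mixture of m latent subgroups. Maximum likelihood estimation uses the expectation-maximization algorithm, and a simulation study shows that ignoring heterogeneity leads to systematic bias in lifetime predictions.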

Comments: 44 pages, 7 figures, 12 tables. Version 2: we have added interval estimation using Louis' missing information method with transformation-based confidence intervals, and an additional real data analysis example

Abstract

Traditional step-stress accelerated life testing models assume that test units originate from a homogeneous population. Recently, Lu and Kateri (2025) proposed a heterogeneous cumulative exposure based SSALT model to account for the inhomogeneous aging patterns among test units belonging to the same production batch. This paper introduces an alternative yet flexible failure-rate based heterogeneous simple SSALT (h-SSALT) model with Weibull-distributed Type-II censored failure times, allowing heterogeneity to emerge at the second stress level through a finite mixture of m latent subgroups, each characterized by its own failure behavior. The expectation-maximization algorithm is developed for maximum likelihood estimation of the model parameters, exploiting the incomplete data structure arising from both unknown group membership and Type-II censoring. Interval estimation is performed using the missing information identity of Louis (1982) with transformation-based confidence intervals respecting parameter constraints. An extensive simulation study evaluates the finite-sample performance of the proposed estimators and demonstrates, through a quantile-based comparison, that ignoring population heterogeneity leads to systematic bias in lifetime predictions across the entire quantile range, with the most severe consequences at early failure quantiles of direct relevance to warranty period design. A special case comparison confirms that the proposed Weibull failure-rate based formulation reduces to the existing model of Lu and Kateri (2025) when the shape parameter equals unity, validating the proposed framework as a proper generalization. The practical application of the model is further illustrated through simulated and real data analysis examples.


2603.06871 2026-05-21 stat.ME stat.AP stat.CO stat.OT

Adaptive Bi-Level Variable Selection of Conditional Main Effects for Generalized Linear Models

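Kexin Xie, Xinwei Deng

AI summary: This paper proposes an adaptive cmenet method for selecting conditional main effects under the generalized linear model framework. A penalized likelihood with adaptive weights improves bi-level variable selection, an efficient estimation algorithm is developed, and performance is evaluated in simulation studies and a real-data study in gene association analysis.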

DOI: 10.1080/00401706.2026.2643213

Abstract

Understanding interaction effects among variables is important for regression modeling in various applications. The conventional approach of quantifying interactions as the product of variables often lacks clear interpretability, especially in complex systems. The concept of conditional main effects (CME) provides a more intuitive and interpretable framework for capturing interaction effects by quantifying the effect of one variable conditional on the level of another. A recent method called cmenet further considered the bi-level selection of CMEs by leveraging their natural grouping structure (e.g., sibling and cousin groups) through penalization. However, there are several limitations in the cmenet method, including the coupling ability of penalties for within-group CMEs, lack of adaptiveness for between-group penalties, and restriction to linear models with continuous responses. To overcome these limitations, we propose an adaptive cmenet method for CME selection under the generalized linear model (GLM) framework. The proposed method considers a penalized likelihood approach with adaptive weights to enable effective bi-level variable selection, improving both between-group and within-group selection. An efficient algorithm for parameter estimation is also developed by employing an iteratively reweighted least squares procedure. The performance of the proposed method is evaluated by both simulation studies and real-data studies in gene association analysis.


2602.13485 2026-05-21 cs.LG stat.ML

Federated Learning of Nonlinear Temporal Dynamics with Graph Attention-based Cross-Client Interpretability

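Ayse Tursucular, Ayush Mohanty, Nazal Mohamed, Nagi Gebraeel

AI summary: This paper proposes a federated learning framework for learning cross-client temporal interdependencies in distributed nonlinear systems. Each client maps high-dimensional local observations to low-dimensional latent states with a nonlinear state-space model, and a server learns a graph-structured neural state transition model over the communicated latent states using a Graph Attention Network. Relating the Jacobian of the learned server-side transition model to the attention coefficients gives an interpretable characterization of the cross-client interdependencies.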

Comments: Manuscript under review

Abstract

Networks of modern industrial systems are increasingly monitored by distributed sensors, where each system comprises multiple subsystems generating high dimensional time series data. These subsystems are often interdependent, making it important to understand how temporal patterns at one subsystem relate to others. This is challenging in decentralized settings where raw measurements cannot be shared and client observations are heterogeneous. In practical deployments each subsystem (client) operates a fixed proprietary model that cannot be modified or retrained, limiting existing approaches. Nonlinear dynamics further make cross client temporal interdependencies difficult to interpret because they are embedded in nonlinear state transition functions. We present a federated framework for learning temporal interdependencies across clients under these constraints. Each client maps high dimensional local observations to low dimensional latent states using a nonlinear state space model. A central server learns a graph structured neural state transition model over the communicated latent states using a Graph Attention Network. For interpretability we relate the Jacobian of the learned server side transition model to attention coefficients, providing the first interpretable characterization of cross client temporal interdependencies in decentralized nonlinear systems. We establish theoretical convergence guarantees to a centralized oracle and validate the framework through synthetic experiments demonstrating convergence, interpretability, scalability and privacy. Additional real world experiments show performance comparable to decentralized baselines.


2601.15950 2026-05-21 math.PR math.ST stat.TH

Extreme Score Distributions in Countable-Outcome Round-Robin Tournaments of Equally Strong Players

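Yaakov Malinovsky

AI summary: This paper studies extreme score distributions in countable-outcome round-robin tournaments of equally strong players. Analyzing extreme scores such as the maximum, the second maximum, and lower extremes, it derives their limiting distributions and rates of convergence as the number of players n tends to infinity.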

2512.23943 2026-05-21 cs.CY cs.LG stat.ME

Statistical Guarantees in the Search for Less Discriminatory Algorithms

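Chris Hays, Ben Laufer, Solon Barocas, Manish Raghavan

AI summary: This paper studies statistical guarantees for firms in high-stakes domains that search for less discriminatory algorithms to reduce disparate impact on protected groups. It proposes an adaptive stopping algorithm that determines when the search can stop with evidence that further search is unlikely to yield meaningful improvement.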

Comments: 38 pages, 10 figures

Abstract

U.S. discrimination law can impose liability on firms that fail to adopt a less discriminatory alternative (LDA): a decision policy that achieves the same business objectives while reducing disparate impact on legally protected groups. Recent scholarship argues that this doctrine has direct implications for algorithmic decision-making in high-stakes domains such as employment, lending, and housing, potentially obligating firms to search for "less discriminatory algorithms" (Black et al., 2024). Regulators have at times encouraged proactive LDA searches, reinforcing the expectation of a good-faith effort to identify equally performant models with lower disparate impact. Model multiplicity makes such searches plausible: retraining with different random seeds can yield models with comparable predictive performance but materially different disparate impacts. Yet firms cannot retrain indefinitely, raising a central question: when is the search sufficient to demonstrate good faith? We formalize LDA search under multiplicity as an optimal stopping problem in which a developer seeks to produce evidence that further search is unlikely to yield meaningful improvements. Our main contribution is an adaptive stopping algorithm that provides a high-probability upper bound on the best disparate-impact gains attainable through continued retraining, enabling developers to certify (e.g., to a court) that additional search is unlikely to help. We also show how stronger distributional assumptions over the model space can yield tighter bounds, and we validate the approach on real-world credit and housing datasets.


2508.04074 2026-05-21 stat.AP

Matrix Factorization-Based Solar Spectral Irradiance Missing Data Imputation with Uncertainty Quantification

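Yuxuan Ke, Xianglei Huang, Odele Coddington, Yang Chen

AI summary: This paper proposes a low-rank matrix factorization method for reconstructing solar spectral irradiance that combines autoregressive temporal regularization, periodic spline detrending, and cross-spectral covariance information to improve imputation accuracy and produce calibrated uncertainty intervals for climate science studies.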

Abstract

The solar spectral irradiance (SSI) depicts the spectral distribution of solar energy flux reaching the top of the Earth's atmosphere. Daily SSI measurements constitute a matrix with spectrally (rows) and temporally (columns) resolved solar energy flux measurements. The most recent SSI measurements have been made by NASA's Total and Spectral Solar Irradiance Sensor-1 (TSIS-1) Spectral Irradiance Monitor (SIM) since March 2018. This data has considerable missing data due to both random factors and instrument downtime, a periodic trend related to the Sun's cyclical magnetic activity, and varying degrees of correlation among the spectra, some approaching unity. We propose a low-rank matrix factorization method for SSI reconstruction that incorporates autoregressive temporal regularization, periodic spline detrending, and cross-spectral covariance information. The method is implemented as a two-stage procedure designed to address scattered missingness and extended downtime missingness, respectively, and is fitted using efficient alternating optimization algorithms. We further accompany the reconstructed SSI values with a distribution-free interval estimation procedure based on conformal prediction. Through synthetic experiments and real-data analyses, we compare this method with Gaussian process regression, linear time series smoothing, and existing matrix-completion approaches in terms of imputation accuracy, interval coverage, interval length, and computational efficiency. The results show that exploiting the periodic, temporal, and cross-spectral structure of SSI substantially improves reconstruction performance and yields calibrated uncertainty intervals, producing a reconstructed SSI data product suitable for downstream climate science studies.
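
The distribution-free interval step can be illustrated with generic split conformal prediction. This is a minimal sketch that assumes absolute residuals on a held-out calibration set as the conformity score; the paper's own procedure may differ. The function names and the toy residuals are hypothetical.

```python
import math


def conformal_quantile(abs_residuals, alpha):
    """Finite-sample conformal quantile of calibration residuals.

    Returns the k-th smallest residual with k = ceil((n + 1) * (1 - alpha)),
    or infinity when the calibration set is too small for the requested level.
    """
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(abs_residuals)[k - 1]


def conformal_interval(prediction, abs_residuals, alpha=0.1):
    """Symmetric prediction interval around a point prediction."""
    q = conformal_quantile(abs_residuals, alpha)
    return prediction - q, prediction + q


# Toy calibration residuals (hypothetical) and a point reconstruction of 10.0.
calibration = [3.0, 1.0, 5.0, 2.0, 4.0]
interval = conformal_interval(10.0, calibration, alpha=0.5)
```

Under exchangeability of calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha, without distributional assumptions on the reconstruction errors.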


2503.18831 2026-05-21 math.ST stat.TH

An improved central limit theorem for the empirical sliced Wasserstein distance

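David Rodríguez-Vítores, Eustasio del Barrio, Jean-Michel Loubes

AI summary: Building on the Efron-Stein inequality and a non-trivial control of the optimal transport potentials, this paper derives a central limit theorem for the p-sliced Wasserstein distance, giving the first asymptotically valid inference framework for the sliced Wasserstein distance between possibly non-compact measures.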

Comments: 26 pages, 1 figure

Abstract

Wasserstein distances are widely used in modern data analysis but pose significant computational and statistical challenges in high dimensions. The sliced Wasserstein distance alleviates these challenges by leveraging one-dimensional projections. Building on the Efron-Stein inequality (a technique proven effective in related problems) and a non-trivial control of the optimal transport potentials across directions, we establish a central limit theorem for the p-sliced Wasserstein distance, for p>1, centered at the expected empirical cost. Unlike for the general Wasserstein distance, the centering can be replaced by the population cost, enabling valid statistical inference. This generalizes and refines existing one-dimensional results, providing the first asymptotically valid inference framework for the sliced Wasserstein distance between possibly non-compact measures. Finally, we address other practical aspects crucial for inference, including Monte Carlo approximation of the slicing integral and consistent variance estimation.
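
The Monte Carlo approximation of the slicing integral can be sketched as follows. The sketch assumes two empirical samples of equal size, so that the one-dimensional Wasserstein distance along each direction reduces to matching sorted projections. The function names, the number of directions, and the seed are illustrative choices rather than the paper's settings.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere via a normalized Gaussian."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def sliced_wasserstein(xs, ys, p=2, n_dirs=200, seed=0):
    """Monte Carlo estimate of the p-sliced Wasserstein distance.

    xs, ys: equal-length lists of points (each point is a list of floats).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    n = len(xs)
    total = 0.0
    for _ in range(n_dirs):
        theta = random_direction(dim, rng)
        # Project both samples onto the direction and sort the projections.
        px = sorted(sum(t * c for t, c in zip(theta, x)) for x in xs)
        py = sorted(sum(t * c for t, c in zip(theta, y)) for y in ys)
        # 1D Wasserstein-p cost between equal-size samples: match order statistics.
        total += sum(abs(u - v) ** p for u, v in zip(px, py)) / n
    return (total / n_dirs) ** (1.0 / p)


# In one dimension every direction is +1 or -1, so the estimate equals the
# ordinary Wasserstein distance between the two samples.
shifted = sliced_wasserstein([[0.0], [1.0]], [[1.0], [2.0]])
same = sliced_wasserstein([[0.0, 1.0], [2.0, 3.0]], [[0.0, 1.0], [2.0, 3.0]])
```

Each direction costs only a projection and a sort, which is why slicing scales better than solving a full transport problem in high dimensions.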


2503.00565 2026-05-21 stat.ML cs.LG math.ST stat.ME stat.TH

Batched Single-Index Global Multi-Armed Bandits with Covariates

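Sakshi Arya, Hyebin Song

AI summary: This paper proposes a new semi-parametric framework for batched bandits with covariates, using a parameter shared across arms and a single-index regression model to capture relationships between arm rewards. It introduces the BIDS algorithm, derives theoretical regret bounds in two settings, and shows that the method attains the minimax-optimal rate of nonparametric batched bandits with covariate dimension 1.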

Abstract

The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, contextual information is available at the time of decision-making, rewards from different arms are related rather than independent, and feedback is provided in batches. We propose a novel semi-parametric framework for batched bandits with covariates that incorporates a shared parameter across arms. We leverage the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings: one where a pilot direction is available and another where the direction is estimated from data, deriving theoretical regret bounds for both cases. When a pilot direction is available with sufficient accuracy and the number of arms $K$ is fixed, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by \cite{jiang2025batched}.


2502.17773 2026-05-21 stat.ME cs.AI cs.LG

How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

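Chengpiao Huang, Yuhang Wu, Kaizheng Wang

AI summary: Taking an uncertainty quantification perspective, this paper proposes a framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by human-LLM misalignment. The key design choice is the number of simulated responses: too many give overly narrow sets with poor coverage, and too few give overly wide, uninformative sets. A data-driven approach adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size also reflects the effective human population size that the LLM can represent, giving a quantitative measure of its simulation fidelity. Experiments show heterogeneous simulation fidelity across LLMs and domains.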

Comments: 63 pages, 13 figures

Abstract

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous simulation fidelity across different LLMs and domains.


2502.17518 2026-05-21 cs.LG cs.AI q-fin.CP stat.ML

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

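Zheli Xiong

AI summary: This paper presents a comprehensive study of ensemble reinforcement learning models in financial trading strategies, using classifier models to improve performance. It combines reinforcement learning algorithms such as A2C, PPO, and SAC with traditional classifiers such as support vector machines (SVM), decision trees, and logistic regression, and examines how different classifier groups can be integrated to improve risk-return trade-offs. Ensemble methods are compared with individual reinforcement learning models on cumulative returns, Sharpe ratio (SR), Calmar ratio, and maximum drawdown (MDD). The ensembles consistently outperform the base models on risk-adjusted returns, with better drawdown management and overall stability. Ensemble performance is, however, sensitive to the choice of variance threshold τ, which underlines the importance of adjusting τ dynamically. The study has implications for financial trading, robotics, and other dynamic environments.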

Comments: 16 pages,5 figures, 1 table

AI中文摘要

本文提出了一项全面研究，探讨在金融交易策略中使用集成强化学习（RL）模型的应用，利用分类器模型来提升性能。通过结合A2C、PPO和SAC等强化学习算法与传统分类器如支持向量机（SVM）、决策树和逻辑回归，我们研究了不同分类器组如何整合以改善风险回报权衡。研究评估了各种集成方法的有效性，将其与单个RL模型在关键金融指标（包括累计回报率、夏普比率（SR）、卡勒姆比率和最大回撤（MDD））上进行比较。我们的结果表明，集成方法在风险调整后的回报方面始终优于基础模型，提供了更好的回撤管理和整体稳定性。然而，我们发现集成性能对方差阈值τ的选择敏感，强调了动态调整τ以达到最佳性能的重要性。本研究强调了将强化学习与分类器结合在自适应决策中的价值，对金融交易、机器人和其他动态环境具有启示。

英文摘要

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our results demonstrate that ensemble methods consistently outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, we identify the sensitivity of ensemble performance to the choice of variance threshold τ, highlighting the importance of dynamic τ adjustment to achieve optimal performance. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
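
The four metrics named in the abstract have standard textbook definitions, sketched below for a list of per-period simple returns. This is a minimal illustration only: the function names, the zero risk-free rate, and the 252-period annualization are assumptions made here, and the paper may compute these quantities differently.

```python
import math
from statistics import mean, stdev


def cumulative_return(returns):
    """Compound per-period simple returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def calmar_ratio(returns, periods_per_year=252):
    """Annualized compound return divided by the maximum drawdown."""
    years = len(returns) / periods_per_year
    annualized = (1.0 + cumulative_return(returns)) ** (1.0 / years) - 1.0
    drawdown = max_drawdown(returns)
    return annualized / drawdown if drawdown > 0 else float("inf")
```

For the returns `[0.10, -0.20, 0.05]` the equity curve peaks at 1.10 and falls to 0.88, so the maximum drawdown is 20% even though the cumulative loss is only 7.6%. That gap is why drawdown is reported separately from return.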

2401.03834 2026-05-21 stat.ME

On the error control of invariant causal prediction

关于不变因果预测的误差控制

Jinzhou Li, Jelle J Goeman

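AI summary: This paper asks whether invariant causal prediction can be equipped with more liberal error guarantees. It focuses on false discovery rate control and simultaneous true discovery bounds, so that more causal information can be extracted from the data.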

AI中文摘要

不变因果预测提供了一个有用的框架，用于使用来自多个环境的异质数据识别响应的因果预测器。原始不变因果预测方法的一个有价值特性是它以高概率保证没有虚假的因果发现。然而，这种保证在某些应用中可能过于保守，导致很少或没有因果发现。这引发了一个自然的问题：能否为不变因果预测配备更不保守的误差保证，从而从数据中提取更多的因果信息？在本文中，我们通过聚焦于两种广泛使用的更宽松保证：虚假发现率控制和同时真实发现界来回答这个问题。我们方法的关键步骤是将不变因果预测重新表述为多重检验问题。然后我们采用e-Closure原理来获得（同时）虚假发现率控制，同时采用针对此设置的新p-to-e校准器。我们还通过封闭检验推导出同时真实发现界，这些界提供了额外的因果信息，而无需额外假设，并保留了原始不变因果预测方法的所有发现。通过模拟和对美国青少年教育成就的现实数据应用，我们展示了这些更宽松的误差控制保证可以提高不变因果预测的实用性。

英文摘要

Invariant causal prediction provides a useful framework for identifying causal predictors of a response using heterogeneous data from multiple environments. One valuable property of the original invariant causal prediction method is that it guarantees no false causal discoveries with high probability. Such a guarantee, however, can be overly conservative in some applications, resulting in few or no causal discoveries. This raises a natural question: can invariant causal prediction be equipped with less conservative error guarantees and thereby extract more causal information from the data? In this paper, we address this question by focusing on two widely used and more liberal guarantees: false discovery rate control and simultaneous true discovery bounds. A key step in our approach is to reformulate invariant causal prediction as a multiple testing problem. We then adopt the e-Closure principle to obtain (simultaneous) false discovery rate control, together with new p-to-e calibrators tailored to this setting. We also derive simultaneous true discovery bounds via closed testing, which provide additional causal information without requiring extra assumptions and retain all discoveries from the original invariant causal prediction method. Through simulations and a real data application on educational attainment of teenagers in the United States, we show that these more liberal error control guarantees can improve the practical usefulness of invariant causal prediction.

2605.21253 2026-05-21 stat.ML cs.LG

Theoretical guidelines for annealed Langevin dynamics in compositional simulation-based inference

组合式基于模拟推断中退火朗之万动力学的理论指南

Camille Touron, Gabriel V. Cardoso, Julyan Arbel, Pedro L. C. Rodrigues

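AI summary: This paper studies annealed Langevin dynamics for compositional score-based approaches to simulation-based inference. It derives Wasserstein bounds that translate into explicit rules for choosing the hyperparameters, and proves in the Gaussian setting that the bridging densities of Linhart et al. (2026) admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023).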

AI中文摘要

基于模拟的推断（SBI）中的组合式得分方法通过聚合单独学习的后验得分，来近似给定 $n$ 个独立观测时共享参数的后验分布。目前主要有两种此类方法（Geffner等人，2023；Linhart等人，2026）。由于所得到的复合得分并不对应于真实多观测后验的正向扩散路径上任何分布的得分，通过反向SDE采样会导致不可消除的偏差。退火朗之万动力学提供了一种有原则的替代方法：它将复合得分视为一系列可处理的桥接密度的真实得分，并依次从这些密度中采样。经过适当调节后，它有可能带来可控的偏差。然而，其超参数，即步长、每个级别的步数和退火级别数，迄今为止都是凭经验选择的。我们推导了近似得分下退火朗之万动力学的Wasserstein界，并将其转化为这些超参数的显式决策规则，以保证规定的采样精度，同时突出每种复合得分形式的不同理论特点。在高斯情形下，我们获得了所有相关量的闭式表达式，并证明与Geffner等人（2023）的桥接密度相比，Linhart等人（2026）的桥接密度始终允许更大的步长，且所需的朗之万总步数更少。此外，我们通过实验表明，在高斯情形下得到的调节方案可以推广到更复杂的问题，从而为使用组合式得分方法的实践者提供了一个易于理解且有理论依据的起点。

英文摘要

Compositional score-based approaches to simulation-based inference (SBI) approximate the posterior over a shared parameter given $n$ independent observations by aggregating individually learned posterior scores: currently, there are two main propositions of such methods (Geffner et al. (2023), Linhart et al. (2026)). As the resulting composite score does not correspond to the score of any distribution along the forward diffusion path of the true multi-observation posterior, sampling from it via a reverse SDE leads to an irreducible bias. Annealed Langevin dynamics provides a principled alternative: it treats the composite score as the genuine score of a sequence of tractable bridging densities and samples from them in succession. When properly tuned, it could lead to a controllable bias. However, its hyperparameters, namely step sizes, the number of steps per level, and the number of annealing levels, have so far been chosen empirically. We derive Wasserstein bounds for annealed Langevin with approximate scores and translate them into explicit decision rules for these hyperparameters that guarantee a prescribed sampling accuracy, while highlighting different theoretical aspects of each composite score formulation. In the Gaussian setting, we obtain closed-form expressions for all relevant quantities and prove that the bridging densities of Linhart et al. (2026) consistently admit larger step sizes and require fewer total Langevin steps than those of Geffner et al. (2023). Furthermore, we show empirically that the tuning obtained in the Gaussian setting generalizes to more complex problems, thus providing a well-understood and theoretically grounded starting point for practitioners using compositional score-based approaches.
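
To see why the step size matters in the Gaussian setting, it may help to follow a one-dimensional toy case in closed form. For a target $N(m, \sigma^2)$ the score is $-(x - m)/\sigma^2$, so an unadjusted Langevin step keeps Gaussian iterates Gaussian and its mean and variance obey simple recursions. The sketch below tracks those moments through a sequence of Gaussian levels. The level values, the rule "step = fraction of the level variance", and all function names are assumptions made for illustration; they are not the bridging densities or decision rules of the paper.

```python
def ula_moment_step(mean, var, target_mean, target_var, step):
    """Moments after one step x <- x + step * score(x) + sqrt(2 * step) * noise."""
    shrink = 1.0 - step / target_var
    new_mean = shrink * mean + (step / target_var) * target_mean
    new_var = shrink ** 2 * var + 2.0 * step
    return new_mean, new_var


def annealed_langevin_moments(levels, steps_per_level, step_fraction,
                              init_mean=0.0, init_var=1.0):
    """Run the moment recursion through Gaussian levels given as (mean, variance)."""
    mean, var = init_mean, init_var
    for target_mean, target_var in levels:
        step = step_fraction * target_var  # hypothetical step-size rule
        for _ in range(steps_per_level):
            mean, var = ula_moment_step(mean, var, target_mean, target_var, step)
    return mean, var


def stationary_variance(target_var, step):
    """Fixed point of the variance recursion; it exceeds target_var for any step > 0."""
    return target_var / (1.0 - step / (2.0 * target_var))
```

In this toy case the recursion is stable only when the step is below twice the level variance, and the stationary variance is inflated by the factor $1/(1 - h/(2\sigma^2))$ for step $h$. Larger steps therefore converge in fewer iterations but leave a larger discretization bias, which is the kind of trade-off an explicit tuning rule has to resolve.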

2605.21217 2026-05-21 stat.ML cs.LG

Federated LoRA Fine-Tuning for LLMs via Collaborative Alignment

通过协作对齐的联邦LoRA微调大型语言模型

Shuaida He, Liwen Chen, Long Feng

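AI summary: This paper studies parameter-efficient fine-tuning with LoRA in a federated learning setting. It proposes CLAIR, a framework that recovers the shared LoRA subspace and detects contaminated clients through a structured low-rank plus block-sparse decomposition. The paper proves exact recovery in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions.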

AI中文摘要

低秩适应（LoRA）已成为参数高效微调大型语言模型（LLMs）的强大工具。本文研究了在联邦学习设置下的LoRA，使客户端能够在保持参数效率的同时进行协作微调。我们专注于一个高度异质的环境，在这种环境中客户端仅共享部分结构，且大量子集可能被污染。我们提出了Collaborative Low-rank Alignment and Identifiable Recovery（CLAIR），一个意识污染的框架，仅依赖于初步的本地估计器。其公式适用于从线性回归到神经网络和LLM模块的广泛领域，只要本地适应可以表示为矩阵值更新。CLAIR通过结构低秩加块稀疏分解恢复共享LoRA子空间并检测污染客户端。我们证明了在无噪声情况下能够精确恢复共享LoRA子空间，在初步估计误差下实现稳定恢复，并在温和的分离条件下实现一致的协作集恢复。我们进一步量化了CLAIR的改进效果：它通过跨客户端平均减少子空间外的估计误差，同时在共享LoRA子空间内保留客户端特定的变异，从而在该Oracle增益超过子空间估计和良性客户端异质性的成本时优于本地微调。经验上，我们通过在文本复制任务上微调Transformer架构来展示CLAIR的优势。结果表明，与本地微调和非鲁棒联邦平均相比，CLAIR在准确检测污染客户端和改善良性客户端性能方面表现出色。

英文摘要

Low-rank adaptation (LoRA) has emerged as a powerful tool for parameter-efficient fine-tuning of large language models (LLMs). This paper studies LoRA under a federated learning setting, enabling collaborative fine-tuning across clients while preserving parameter efficiency. We focus on a highly heterogeneous regime in which clients share only partial structure and a substantial subset may be contaminated. We propose Collaborative Low-rank Alignment and Identifiable Recovery (CLAIR), a contamination-aware framework that relies only on preliminary local estimators. Its formulation applies broadly, from linear regression to neural network and LLM modules, whenever local adaptation can be represented by matrix-valued updates. CLAIR recovers the shared LoRA subspace and detects contaminated clients via a structured low-rank plus block-sparse decomposition. We prove exact recovery of the shared LoRA subspace in the noiseless case, stable recovery under preliminary estimation error, and consistent collaborative-set recovery under mild separation conditions. We further quantify the gain from CLAIR refinement: it reduces off-subspace estimation error through cross-client averaging while preserving client-specific variation within the shared LoRA subspace, thus improves over local fine-tuning whenever this oracle gain outweighs the costs of subspace estimation and benign-client heterogeneity. Empirically, we demonstrate the benefits of CLAIR by fine-tuning a Transformer architecture on a text-copying task. The results show accurate contamination detection and improved benign-client performance compared with local fine-tuning and non-robust federated averaging.

2605.21197 2026-05-21 stat.ME

Laplace Approximations for Mixed-Effects and Gaussian Process Quantile Regression

混合效应和高斯过程分位数回归的拉普拉斯近似

Andrea Nava, Fabio Sigrist

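AI summary: This paper proposes Laplace approximations for mixed-effects and Gaussian process quantile regression. It overcomes the obstacle posed by the asymmetric Laplace likelihood by replacing the observed Hessian with the Fisher information or the population curvature of the expected loss, and reports accuracy comparable to or better than MCMC and variational competitors at substantially lower computational cost.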

AI中文摘要

拉普拉斯近似是潜在高斯模型中实现高效推断的标准工具，但在采用非对称拉普拉斯似然的分位数回归中会失效，因为观测海森矩阵几乎处处为零。本文证明，无需对似然函数进行平滑即可克服这一障碍：相关的局部曲率并非由观测海森矩阵给出，而是在模型正确设定时由Fisher信息给出，在模型误设时由期望损失的总体曲率给出。基于此，本文为混合效应和高斯过程分位数回归建立了拉普拉斯近似框架。我们提出了实用的曲率估计器，包括三角核曲率（TKC）估计器，用于近似后验分布和边缘似然，并建立了其渐近有效性。实证结果表明，所提方法具有良好的可扩展性和数值稳定性；对于潜在高斯模型，其精度与MCMC和变分方法相当或更优，而计算成本显著更低。更广泛地说，该框架阐明了如何通过期望损失的局部二次行为，为非光滑广义后验的拉普拉斯近似提供依据。

英文摘要

Laplace approximations are a standard tool for computationally efficient inference in latent Gaussian models, but they fail for quantile regression with the asymmetric Laplace likelihood because the observed Hessian vanishes almost everywhere. We show that this obstacle can be overcome without smoothing the likelihood: the relevant local curvature is given not by the observed Hessian, but by the Fisher information when the model is correctly specified and by the population curvature of the expected loss under misspecification. On this basis, we develop a Laplace approximation framework for quantile regression with mixed-effects and Gaussian process models. We propose practical curvature estimators, including the triangular kernel curvature (TKC) estimator, that yield approximations for posterior distributions and marginal likelihoods, and we establish their asymptotic validity. Empirically, the proposed methods are scalable and numerically stable, and for latent Gaussian models, they achieve accuracy comparable to or better than MCMC and variational competitors at substantially lower computational costs. More broadly, the framework clarifies how Laplace approximations can be justified for non-smooth generalized posteriors through local quadratic behavior of the expected loss.
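
The vanishing Hessian is easy to check numerically. Up to scale and an additive constant, the negative log-likelihood of the asymmetric Laplace distribution is the check (pinball) loss, which is piecewise linear in the residual. The sketch below is a minimal illustration with hypothetical helper names; it does not implement the curvature estimators proposed in the paper.

```python
def pinball_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0]) for quantile level tau."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def second_difference(f, u, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at u."""
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / h ** 2


def empirical_risk(q, ys, tau):
    """Average pinball loss of the constant prediction q on the sample ys."""
    return sum(pinball_loss(y - q, tau) for y in ys) / len(ys)
```

Away from the kink at zero the second difference is zero up to rounding error, so a Laplace approximation built on the observed Hessian has no curvature to work with. The averaged loss is still minimized at the sample quantile, which is why curvature has to come from a population-level quantity instead.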

2605.21167 2026-05-21 stat.ML cs.LG

A Rigorous, Tractable Measure of Model Complexity

一个严格且可计算的模型复杂度度量

Oskar Allerbo, Thomas B. Schön

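AI summary: This paper proposes a mathematically rigorous and easy-to-compute measure of model complexity based on the similarity of model gradients across inputs. The measure applies to parametric models and to kernel-based non-parametric models, generalizes model-specific complexity measures such as polynomial degree and kernel length scale, and gives new insights into double descent for random Fourier features, random forests, neural networks, and gradient boosting.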

AI中文摘要

对模型复杂度的准确评估对于解释、泛化和模型选择等主题至关重要。然而，大多数现有复杂度度量要么依赖于启发式假设，要么计算上不可行。在本文中，我们提出了一种数学上严谨且易于计算的模型复杂度度量方法，该方法基于模型在不同输入上的梯度相似性。因此，它适用于任何参数模型，也适用于基于核的非参数模型。我们证明了我们的复杂度度量可以推广到模型特定的复杂度度量，如多项式度数（多项式回归）、核长度尺度（Matérn核）、邻居数（k-近邻）、分割数（决策树）和树数（随机森林）。我们还利用我们的度量方法获得了关于随机傅里叶特征、随机森林、神经网络和梯度提升中的双下降现象的新见解。

英文摘要

An accurate assessment of a model's complexity is crucial for topics such as interpretation, generalization, and model selection. However, most existing complexity measures either rely on heuristic assumptions or are computationally prohibitive. In this paper, we present a mathematically rigorous yet easy-to-compute measure of model complexity that is based on the similarities between the model gradients across inputs. It is thus well-defined for any parametric model, but also for kernel-based non-parametric models. We prove that our measure of complexity generalizes model-specific complexity measures such as polynomial degree (for polynomial regression), kernel length scale (for Matérn kernels), number of neighbors (for k-nearest neighbors), number of splits (for decision trees), and number of trees (for random forests). We also use our measure to obtain new insights into the double descent phenomenon for random Fourier features, random forests, neural networks, and gradient boosting.

2605.21107 2026-05-21 cs.LG stat.ML

Improved Guarantees for Constrained Online Convex Optimization via Self-Contraction

通过自收缩性获得约束在线凸优化的改进保证

Dhruv Sarkar, Abhishek Sinha

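AI summary: This paper gives a simple projection-based algorithm that simultaneously achieves $O(\log T)$ regret and $O(\log T)$ cumulative constraint violation (CCV) for strongly convex losses. For convex losses it improves the CCV to $O(\sqrt{T})$ while maintaining the optimal $O(\sqrt{T})$ regret.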

AI中文摘要

我们考虑具有对抗性选择约束的约束在线凸优化（COCO）。在每一轮中，学习者在观察该轮的损失函数和约束函数之前选择动作。目标是相对于满足所有约束的最优点实现较小的静态遗憾，同时控制累积约束违反量（CCV）。对于强凸损失，现有最先进的算法实现 $O(\log T)$ 的遗憾和 $O(\sqrt{T \log T})$ 的 CCV；对于凸损失，已知的最佳界是 $O(\sqrt{T})$ 的遗憾和 $O(\sqrt{T} \log T)$ 的 CCV。在本文中，我们提出了一种简单的基于投影的算法，对于强凸损失同时实现 $O(\log T)$ 的遗憾和 $O(\log T)$ 的 CCV，从而在 CCV 方面实现了指数级改进。对于凸损失，我们的算法将 CCV 改进到 $O(\sqrt{T})$，同时保持最优的 $O(\sqrt{T})$ 遗憾。我们改进的关键是一个关于自收缩曲线的最新几何结果，该结果本身可能也具有独立的价值。

英文摘要

We consider Constrained Online Convex Optimization (COCO) with adversarially chosen constraints. At each round, the learner chooses an action before observing the loss and constraint function for that round. The goal is to achieve small static regret against the best point satisfying all constraints while also controlling cumulative constraint violation ($\mathsf{CCV}$). For strongly convex losses, state-of-the-art algorithms achieve $O(\log T)$ regret and $O(\sqrt{T \log T})$ $\mathsf{CCV}.$ The corresponding best-known bounds for convex losses is $O(\sqrt{T})$ regret and $O(\sqrt{T} \log T)$ $\mathsf{CCV}$. In this paper, we give a simple projection-based algorithm that simultaneously achieves $O(\log T)$ regret and $O(\log T)$ $\mathsf{CCV}$ for strongly-convex losses, yielding an exponential improvement in the $\mathsf{CCV}$. For the convex losses, our algorithm improves the $\mathsf{CCV}$ to $O(\sqrt{T})$ while maintaining the optimal $O(\sqrt{T})$ regret. The key to our improvement is a recent geometric result for self-contracted curves, which may be of independent interest.

2605.21060 2026-05-21 cs.LG cs.AI stat.ML

Divide et Calibra: Multiclass Local Calibration via Vector Quantization

Divide et Calibra: 通过向量量化实现多类局部校准

Cesare Barbera, Lorenzo Perini, Giovanni De Toni, Andrea Passerini, Andrea Pugnana

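AI summary: This paper proposes a compositional approach to multiclass calibration. Vector quantization induces a structured partition of the representation space, and an indexed parameterization of Dirichlet concentrations enables parameter sharing across regions. The learned heterogeneous calibration maps generalize to sparse regions, improving local calibration while maintaining competitive global calibration and predictive performance.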

AI中文摘要

在高风险场景中，准确且校准良好的机器学习（ML）模型是必需的，但有效的多类校准仍然具有挑战性：全局方法假设校准误差在潜在空间中是同质的，而局部方法通常依赖于潜在空间降维，导致信息丢失。为了解决这些问题，我们提出了一种多类校准的复合方法，其中区域特定的校准映射是从共享的码字依赖因素中构建的。我们通过向量量化（VQ）实现这一想法，它诱导了表示空间的结构划分，并利用Dirichlet浓度的参数化实现跨区域参数共享。我们的方法学习了能泛化到稀疏区域的异质校准映射。在基准数据集上的实验显示，在保持竞争性的全局校准和预测性能的同时，显著提高了局部校准性能。

英文摘要

Accurate and well-calibrated Machine Learning (ML) models are mandatory in high-stakes settings, yet effective multiclass calibration remains challenging: global approaches assume calibration errors are homogeneous across the latent space, while local methods often rely on latent-space dimensionality reduction, which leads to information loss. To address these issues, we propose a compositional approach to multiclass calibration, where region-specific calibration maps are constructed from shared codeword-dependent factors. We instantiate this idea via Vector Quantization (VQ), which induces a structured partition of the representation space, and an indexed parameterization of Dirichlet concentrations that enables parameter sharing across regions. Our approach learns heterogeneous calibration maps that generalize well even to sparse regions of the latent space. Experiments on benchmark datasets show significant improvements in local calibration while maintaining competitive global calibration and predictive performance.

2605.21043 2026-05-21 stat.OT

An Introduction to Copulas: a Complement

Copula导论：一种补充

Werner G. Müller

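AI summary: This paper complements a course on statistical inference with material on copulas, providing two chapters that introduce copula theory in a style closer to that of the original book.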

2605.21041 2026-05-21 stat.ML cs.LG stat.ME

Conditioning Gaussian Processes on Almost Anything

对几乎任何事物进行高斯过程的条件化

Henry Moss, Lachlan Astfalck, Thomas Cowperthwaite, Colin Doumont, Sam Willis, Philipp Hennig, Christopher Nemeth, Andrew Zammit-Mangion

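AI summary: This paper establishes an explicit equivalence between Gaussian processes and a class of linear diffusion models. The resulting scheme can condition on any statement that admits point-wise likelihood evaluation, including non-linear physics and natural language via large language models, and requires no bespoke derivations.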

AI中文摘要

高斯过程（GPs）提供了一种基于函数的原理性概率模型，但精确推断仅限于线性-高斯范式。我们建立了GPs与一类线性扩散模型之间的显式等价关系，将预测采样重新表述为一个具有闭式高斯动力学和一个依赖似然的引导项的ODE，该引导项允许简单的蒙特卡洛近似。在线性-高斯设置中，我们精确恢复了标准GP条件化；超越共轭性之外，相同的机制能够处理任何允许逐点似然评估的条件语句——包括非线性物理模型，以及首次通过大型语言模型实现自然语言。白化分离了不可约的非高斯动力学，最小化了Wasserstein-2运输成本并消除了数值刚性。结果是一种通用的GP推断方案，无需专门推导。这些结果提供了一种通用机制，将现实世界知识的全部丰富性作为条件信息纳入其中，为现实世界问题的概率建模开辟了新的前沿。

英文摘要

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.
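
For reference, the "standard GP conditioning" that the linear-Gaussian case recovers has a closed form. The sketch below writes it out for the smallest possible case, a zero-mean GP conditioned on a single noisy observation, so that no matrix inversion is needed. The RBF kernel and all function names are assumptions chosen for illustration, and the sketch does not implement the diffusion-based scheme of the paper.

```python
import math


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def condition_on_one_observation(x_obs, y_obs, x_new, noise_var, kernel=rbf_kernel):
    """Posterior mean and variance of f(x_new) given y_obs = f(x_obs) + Gaussian noise."""
    k_obs = kernel(x_obs, x_obs) + noise_var  # variance of the noisy observation
    k_cross = kernel(x_new, x_obs)            # covariance between f(x_new) and y_obs
    gain = k_cross / k_obs
    mean = gain * y_obs
    var = kernel(x_new, x_new) - gain * k_cross
    return mean, var
```

With unit prior variance and unit noise, observing `y = 2.0` at the query point gives posterior mean 1.0 and variance 0.5. Far from the observation the prior (mean 0, variance 1) is returned unchanged. Once the likelihood is no longer Gaussian these formulas stop applying, which is the gap the abstract describes.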

2605.20999 2026-05-21 math.PR cs.LG math.OC stat.ML

Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise

一般随机逼近的集中性在重尾马尔可夫噪声下

Shubhada Agrawal, Siva Theja Maguluri, Martin Zubeldia

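AI summary: This paper studies maximal concentration bounds for the iterates of stochastic approximation algorithms under noise with a finite-state Markovian component and a martingale-difference component. Using a new Lyapunov function and an auxiliary projected algorithm, it analyzes how the step-size sequence and the properties of the random operator affect the tail behavior of the error, and gives concentration results for the error tail under unbounded martingale-difference noise.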

Comments: 67 pages

AI中文摘要

非参数贝叶斯统计学主题

Nils Lid Hjort

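AI summary: This paper reviews a range of theoretical and applied research topics in nonparametric Bayesian statistics, complementing and extending recent review literature and discussing areas of research interest.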

2605.20806 2026-05-21 stat.ME stat.AP

Evaluation of the number of clusters in a data set using $p$-values from Multiple Tests of Hypotheses

利用假设检验的p值评估数据集中的聚类数

Soumita Modak

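AI summary: This paper proposes a novel nonparametric measure based on interpoint distances to determine whether a data set contains groups and, if so, how many. It applies to data of arbitrary dimension in combination with any clustering algorithm that takes the number of clusters as input. Univariate, nonparametric multiple hypothesis tests are carried out on the interpoint distances, with as many dependent tests as the sample size. Their $p$-values are combined to reach a decision in a step-wise procedure over candidate numbers of clusters. The method avoids unnecessary computation compared with other accuracy measures in the literature, and a data study supports the efficiency of the proposed index.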

DOI: 10.1080/03610926.2024.2309967
Journal ref: Communications in Statistics - Theory and Methods (2024), 53, 8878-8889

AI中文摘要

本文提出了一种新的非参数、基于点间距离的度量方法，用于确定给定数据集中是否存在群体，以及如果存在，则总共有多少群体。它是一种适用于任意维度数据集的聚类准确性指数，可与任何具有指定聚类数作为先验的聚类算法相结合。我们执行单变量、非参数、多重假设检验，其中使用点间距离进行的依赖检验数量与样本量相同。它们具有p值用于组合以做出决策，该决策通过逐步过程对可能的聚类数进行判断。与文献中的其他准确性度量相比，该方法减少了不必要的计算。数据研究确立了所提出指标的效率和优越性。

英文摘要

This paper proposes a novel, nonparametric, interpoint distance-based measure to investigate whether there exist any groups in a set of given data, and if so then, how many groups are prevailing in total. It is a cluster accuracy index useful for arbitrary-dimensional data set, in association with any clustering algorithm having the number of groups specified as a priori. We perform univariate, nonparametric, multiple statistical tests of hypotheses, where as many dependent tests as the sample size are carried out using the interpoint distances. They possess $p$-values to be combined to reach a decision, which is taken in a step-wise process for a possible number of clusters. It reduces the unnecessary computations compared with the other accuracy measures from the literature. Data study establishes the proposed index's efficiency and superiority.

2605.20767 2026-05-21 cs.CL cs.LG stat.ME

The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

干预的幻觉：你的LLM模拟实验实际上是一个观察性研究

Victoria Lin, Taedong Yun, Maja Matarić, John Canny, Arthur Gretton, Alexander D'Amour

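AI summary: This paper examines large language models as simulators of human behavior. It shows that interventions applied to LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift that can distort effect estimates. It proposes negative control outcomes to detect distribution shifts across intervention conditions, and mitigates drift by adjusting the persona specification with additional elicited confounders.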

AI中文摘要

大型语言模型（LLMs）显示出作为人类行为模拟器的潜力，提供了一种可扩展的方式研究对干预的反应。然而，由于LLMs主要基于观察性数据进行训练，在与LLM模拟的合成用户进行实验时，干预可能会引起潜在用户属性的意外变化，导致用户漂移，其中隐含的模拟总体在不同处理条件下有所不同，这可能会扭曲效应估计。我们正式化了由于用户漂移可能产生的混淆或选择偏差，并展示了干预依赖性变化如何放大或减弱干预下用户响应的观测差异。为了诊断混淆，我们提出使用负对照结果——在干预下应保持不变的属性——来识别干预条件间的分布变化，提供用户漂移的证据。为了缓解漂移，我们研究了通过获取额外的混杂因素来调整角色描述，发现针对特定场景的相关混杂因素可以显著减少调查式和多轮代理评估中的偏倚。

英文摘要

Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise due to user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes--attributes that should remain invariant under intervention--to identify distribution shifts across intervention conditions, providing evidence of user drift. To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.

2605.20756 2026-05-21 cs.LG cs.AI math.OC stat.ML

Correcting Stochastic Update Bias in Preconditioned Language Model Optimizers

纠正预条件语言模型优化器中的随机更新偏差

Nikhil Nayak, Julia White, Urchade Zaratiana, Kelton Zhang, Henrijs Princis, Dhruv Atreja, Henry Fawcett, Matthew Thomas, George Hurn-Maloney, Ash Lewis

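AI summary: This paper studies finite-sample bias in the stochastic update rules of preconditioned optimizers. It proposes a single-batch bias-correction framework in which cross-fitted preconditioning addresses gradient-preconditioner coupling bias and variance-corrected inversion reduces the bias introduced by nonlinear inversion, improving the performance of preconditioned optimizers.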

Comments: 32 pages, 3 figures, 13 tables

AI中文摘要

预条件优化器在语言模型训练中至关重要，但其随机更新规则通常被视为对群体预条件下降的直接近似。我们证明这种观点忽略了两个有限样本偏差。首先，梯度和预条件器通常从同一个mini-batch估计，引入梯度-预条件器耦合偏差。其次，即使预条件器估计是无偏的，其逆或逆根通常有偏，因为逆运算是非线性的。我们提出了一种单批次偏差校正框架，以解决这两种效应：交叉拟合预条件估计从独立的微批次组中估计分子和预条件器，而方差校正逆运算利用微批次变化来减去主导的delta-方法偏差项。该框架适用于对角矩、对角曲率和矩阵预条件方法，分别在AdamW、Sophia和Shampoo中实现。偏差校正将Qwen2.5-0.5B的保持预训练损失减少了0.15、0.07和0.11 nat，分别；对混合质量预训练和下游指令微调的影响始终是中性到积极的。这些结果确立了偏差校正作为减少有限样本更新偏差和提升预条件优化器性能的实用机制。

英文摘要

Preconditioned optimizers are central to language model training, but their stochastic update rules are usually treated as direct approximations to population preconditioned descent. We show that this view misses two finite-sample biases. First, the gradient and preconditioner are typically estimated from the same minibatch, introducing gradient--preconditioner coupling bias. Second, even when the preconditioner estimate is unbiased, its inverse or inverse-root is generally biased because inversion is nonlinear. We propose a single-batch bias-correction framework that addresses both effects: cross-fitted preconditioning estimates the numerator and preconditioner from independent microbatch groups, while variance-corrected inversion uses microbatch variability to subtract the leading delta-method bias term. The framework applies to diagonal moment, diagonal curvature, and matrix preconditioning methods, instantiated in AdamW, Sophia, and Shampoo. Bias correction reduces held-out pretraining loss on Qwen2.5-0.5B by $0.15$, $0.07$, and $0.11$ nats, respectively; the effects on mixed-quality pretraining and downstream instruction tuning are consistently neutral-to-positive. Together, these results establish bias correction as a practical mechanism for reducing finite-sample update bias and improving the performance of preconditioned optimizers.
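
The second bias can be seen in a scalar toy case. If $\hat{h}$ is an unbiased estimate of $h > 0$, a second-order Taylor expansion (the delta method) gives $\mathbb{E}[1/\hat{h}] \approx 1/h + \operatorname{Var}(\hat{h})/h^3$, so the plain inverse overshoots on average. The sketch below subtracts that leading term, using the spread of microbatch estimates to estimate the variance. It is a scalar illustration under assumptions made here (names, the two-point toy distribution), not the estimator used in the paper for AdamW, Sophia, or Shampoo.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def variance_of_mean(values):
    """Unbiased estimate of Var(mean(values)) from the spread across microbatches."""
    n = len(values)
    if n < 2:
        return 0.0
    center = mean(values)
    sample_var = sum((v - center) ** 2 for v in values) / (n - 1)
    return sample_var / n


def naive_inverse(values):
    """Plain inverse of the averaged estimate."""
    return 1.0 / mean(values)


def corrected_inverse(values):
    """Inverse with the leading delta-method bias term Var / mean**3 subtracted."""
    center = mean(values)
    return 1.0 / center - variance_of_mean(values) / center ** 3


def exact_expectation(estimator, support, n_microbatches):
    """Average an estimator over every equally likely draw of microbatch values."""
    draws = list(product(support, repeat=n_microbatches))
    return sum(estimator(list(draw)) for draw in draws) / len(draws)
```

With two microbatch values drawn uniformly from `{1.0, 3.0}` the true inverse is 0.5. Enumerating all four draws, the naive inverse averages 7/12 (about 0.583) and the corrected one averages 25/48 (about 0.521), so the correction removes most of the bias in this toy case but not all of it.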

2605.20739 2026-05-21 math.ST eess.SP stat.TH

Revisiting the Misspecified Cramér-Rao Bound

重新审视设定错误的Cramér-Rao界

Malaak Khatib, Nadav Harel, Joseph Tabrikian, Tirza Routtenberg

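AI summary: This paper revisits parameter estimation under model misspecification and the foundations of the misspecified Cramér-Rao bound (MCRB). Using the concept of pointwise equivalent models, it gives a new derivation of the classical MCRB, with an explicit characterization of the estimator class for which the bound holds and an equality condition.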

Comments: This work has been submitted to the IEEE for possible publication

AI中文摘要

模型误设下的估计问题出现在许多信号处理问题中：由于误差或简化，假设的观测模型偏离了真实的数据生成机制。误设Cramér-Rao界（MCRB）是这种情形下广受认可的均方误差（MSE）下界，最初用于描述误设最大似然（MML）估计量的渐近行为。尽管应用广泛，MCRB仍缺乏对其适用的估计器类别的严格刻画。本文重新审视模型误设下的参数估计理论，并重新检视MCRB的基础。我们首先展示这些局限，并考察一个仅依赖局部误设无偏性的朴素版MCRB。我们证明该界通常不紧，且可能无法达到。为了获得有意义的界，我们基于逐点等价模型的概念给出了新的推导。通过在这些模型上最大化朴素界，我们恢复了经典的MCRB，并为其提供了构造性的推导、相关估计器类别的显式刻画以及等式成立条件。这一表述在局部无偏性条件与可达界之间建立了正式联系，为理解MCRB的结构及其与实际估计器的关系提供了新的见解。最后，我们定义了有效误设估计器的概念，并证明如果它存在，则由MML估计量实现。

英文摘要

Estimation under model misspecification arises in many signal processing problems, where the assumed observation model deviates from the true data-generating mechanism due to errors or simplifications. The misspecified Cramér-Rao bound (MCRB) is a widely recognized mean-squared-error (MSE) lower bound for this case, which has originally been used to describe the asymptotic behavior of the misspecified maximum likelihood (MML) estimator. Despite its widespread use, the MCRB lacks a rigorous characterization of the class of estimators for which it is valid. In this paper, we revisit the theory of parameter estimation under model misspecification and re-examine the foundations of the MCRB. We first demonstrate these limitations and examine a naive version of the MCRB, which relies only on local misspecified unbiasedness. We show that this bound is generally not tight and may be unattainable. To obtain a meaningful bound, we develop a new derivation based on the concept of pointwise equivalent models. By maximizing the naive bound for these models, we recover the classical MCRB, now supported by a constructive derivation, an explicit characterization of the associated estimator class, and an equality condition. This formulation establishes a formal link between local unbiasedness conditions and achievable bounds, offering new insights into the MCRB structure and its relevance to practical estimators. Finally, we define the notion of an efficient misspecified estimator and show that if it exists, it is achieved by the MML estimator.

2605.20726 2026-05-21 stat.ME cs.LG stat.ML

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

在符合推断中对虚假发现比例的处处有效界

Ziang Song, Ying Jin, Emmanuel J. Candès

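AI summary: This paper establishes finite-sample, distribution-free upper bounds on the false discovery proportion (FDP) in conformal multiple testing that hold simultaneously over all rejection thresholds. A high-probability envelope for the empirical distribution function of null conformal p-values preserves the guarantee under arbitrary post hoc threshold selection, with applications to outlier detection and conformal selection.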

Comments: 31 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference

AI中文摘要

将符合推断应用于异常检测和候选选择等多重检验问题时，通常需要选出符合p值低于某个阈值的测试样本。此类方法的质量通常用虚假发现比例（FDP）来衡量，即错误选择所占的比例。现有方法通常使用Benjamini-Hochberg过程等方法来控制FDP的期望值。这种做法无法为实际实现的FDP提供高概率界，而且如果拒绝阈值是在查看数据之后选择的，统计保证就会失效。本文建立了有限样本、不依赖分布的FDP上界，这些上界对所有可能的拒绝阈值同时成立，从而允许任意的事后阈值选择。同时有效性是通过从原假设下符合p值的联合分布中采样，为其经验分布函数构造高概率包络来实现的。此外，我们的框架允许从业者调节包络的形状，从而在主要关注的拒绝区域中得到紧的界。我们利用这种灵活的方法推导出异常检测和符合选择的同时FDP上界。合成数据和真实数据实验表明，所得到的界既有效，又远不如现有方法的界保守。

英文摘要

Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
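
The baseline the abstract contrasts against can be sketched in a few lines: a conformal p-value computed from calibration scores, the Benjamini-Hochberg step-up rule, and the realized FDP of a selection. The score convention (a larger score means a more atypical sample) and the function names are assumptions made here. The simultaneous envelope bounds of the paper are not reproduced; the authors' repository linked in the comments is the reference for those.

```python
def conformal_p_value(test_score, calibration_scores):
    """Rank-based p-value: (1 + #{calibration scores >= test score}) / (n + 1)."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least) / (len(calibration_scores) + 1)


def benjamini_hochberg(p_values, alpha):
    """Indices rejected by the BH step-up rule at target level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= alpha * rank / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    return sorted(order[:cutoff])


def false_discovery_proportion(rejected, true_nulls):
    """Fraction of rejected indices that are true nulls (0.0 if nothing is rejected)."""
    if not rejected:
        return 0.0
    return sum(1 for i in rejected if i in true_nulls) / len(rejected)
```

BH controls the expectation of the FDP, so a single run can still realize a large value. In the small example `benjamini_hochberg([0.01, 0.04, 0.03, 0.5], 0.1)`, three hypotheses are rejected; if index 1 were a true null, the realized FDP would be 1/3 despite the 0.1 target.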

2605.20710 2026-05-21 stat.ME

Assessing Estimate of CATE from Observational Data via an RCT Study

通过RCT研究评估从观察数据中估计的CATE

Bosen Cui, Yuhong Yang

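AI summary: This paper proposes CAFE (CATE Assessment via Fitness Evaluation), a framework that uses evidence from a randomized trial to assess the goodness-of-fit of a conditional average treatment effect (CATE) estimate learned from observational data, so that such estimates can be vetted before they are trusted in practice.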

Comments: 34 pages, 5 figures

AI中文摘要

条件平均处理效应（CATE）越来越多地从观察数据中估计，并用于指导政策制定和个体化治疗决策。在实践中，这类估计在被信任之前需要先评估其预测适应性，但仅凭观察数据进行此类评估的机会有限。我们提出了CATE Assessment via Fitness Evaluation（CAFE），这是一个正式框架，利用来自随机试验的证据，直接评估从观察数据中学到的CATE估计的拟合优度，而不是评估完整的底层结果模型。CAFE根据估计的倾向得分（或类似指标）对试验的协变量空间进行划分，并将由观察数据得到的条件处理效应与组水平的实验平均值进行比较。该框架适用于广泛的CATE学习器，包括参数模型以及因果森林和提升方法等灵活的机器学习方法。我们在原假设和备择假设下都建立了理论保证，并引入了最大值型扩展，以提高对局部拟合不足的敏感性。当随机试验数据和观察数据同时可用时，我们进一步提出了一个两阶段程序，用于检测是否存在未观察到的混杂因素。大量数值研究表明了CAFE方法在评估由观察数据得到的CATE估计时的实用性。

英文摘要

Conditional average treatment effects (CATEs) are increasingly estimated from observational data and used to guide policy and individualized treatment decisions. Before such estimates can be trusted in practice, their predictive fitness needs to be assessed, yet observational data alone offer limited opportunities for doing so. We propose CATE Assessment via Fitness Evaluation (CAFE), a formal framework for directly assessing the goodness-of-fit of a CATE estimate learned from observational data, rather than the full underlying outcome model, using evidence from a randomized trial. CAFE partitions the trial covariate space according to estimated propensity scores (or the like) and compares observationally derived conditional treatment effects with group-level experimental averages. The framework accommodates a broad class of CATE learners, including parametric models and flexible machine learning methods such as causal forest and boosting. We establish theoretical guarantees under both the null and alternative hypotheses, and introduce a maximum-type extension to improve sensitivity to localized lack of fit. When both randomized trial and observational data are available, we further develop a two-stage procedure to detect the existence of unobserved confounders. Extensive numerical studies show the utility of the CAFE approach when assessing observational-derived CATE estimates.

2605.20693 2026-05-21 cs.CL cs.AI stat.ML

Interpretable Discriminative Text Representations via Agreement and Label Disentanglement

通过共识和标签解缠获得可解释的判别文本表示

Tong Wang, Yiqing Xu, Leo Yang Yang

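AI summary: This paper proposes an operational criterion for interpretable discriminative text representations that requires conceptual clarity, measured by chance-adjusted agreement between independent annotators, and label disentanglement. Its LLM-assisted Feature Discovery (LFD) method matches the predictive performance of a strong text bottleneck baseline across ten text-classification tasks while producing clearer and less label-entangled features.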

AI中文摘要

可解释的文本表示应暴露出不仅具有预测性，而且对独立审计员来说有意义的坐标。现有的判别表示通常使用匿名嵌入方向，而概念瓶颈和LLM辅助方法将自然语言名称附加到特征上，但并未确保这些定义是可重复的或与目标标签不同。我们提出了一种可解释判别文本表示的操作标准：每个坐标应满足概念清晰度，通过独立标注员应用特征定义之间的机会调整一致性来衡量，并且标签解缠，即特征不应仅仅改述预测目标。我们通过LLM辅助特征发现（LFD）方法实现了这一标准，这是一种迭代方法，从对比性反向文本对中提出词汇和语义特征，通过跨LLM Cohen's $κ$ 筛选候选，并通过残差保留的预测增益选择特征。一种简化分析将$κ$筛选与每个特征的注释噪声界限联系起来，正式化一致性作为可靠性检查。在十个跨越七个语料库的文本分类任务中，LFD与强大的文本瓶颈基线具有相同的预测性能，同时产生明显更清晰且标签纠缠更少的特征。232名人类审计员的实验表明，LFD特征在人类-人类和人类-LLM一致性方面优于基线概念，且审计员一致认为它们更少标签泄漏。这些结果表明，经过一致性测试和标签解缠的坐标为可解释文本分类提供了一个实用的可审计标准。

英文摘要

Interpretable text representations should expose coordinates that are not only predictive, but also meaningful enough for independent auditors to apply. Existing discriminative representations often use anonymous embedding directions, while concept-bottleneck and LLM-assisted methods attach natural-language names to features without ensuring that those definitions are reproducible or distinct from the target label. We propose an operational criterion for interpretable discriminative text representations: each coordinate should satisfy conceptual clarity, measured by chance-adjusted agreement between independent annotators applying the feature definition, and label disentanglement, meaning the feature should not merely paraphrase the prediction target. We instantiate this criterion in LLM-assisted Feature Discovery (LFD), an iterative method that proposes lexical and semantic features from contrastive outcome-opposed text pairs, screens candidates using cross-LLM Cohen's $κ$, and selects features by residual held-out predictive gain. A stylized analysis connects the $κ$ screen to a per-feature annotation-noise bound, formalizing agreement as a reliability check. Across ten text-classification tasks spanning seven corpora, LFD matches the predictive performance of a strong text bottleneck baseline while producing substantially clearer and less label-entangled features. Human audits with 232 raters show that LFD features achieve higher human--human and human--LLM agreement than baseline concepts, and raters consistently judge them as less label-leaking. These results suggest that agreement-tested, label-disentangled coordinates provide a practical auditability standard for interpretable text classification.
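
Chance-adjusted agreement in the abstract is measured with Cohen's $κ$, which compares the observed agreement between two annotators with the agreement expected if each labeled independently at their own base rates. The sketch below is the standard two-annotator formula. The function name and the convention for the degenerate single-label case are choices made here, and any screening threshold on $κ$ would be a further assumption not given in the abstract.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # convention chosen here: both used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For annotations `[1, 1, 0, 0]` and `[1, 0, 0, 0]` the raw agreement is 0.75, but half of that is expected by chance, so $κ$ is 0.5. A feature whose definition different annotators (human or LLM) apply inconsistently therefore scores low even when raw agreement looks acceptable.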

2605.20692 2026-05-21 stat.ME q-bio.PE q-bio.QM stat.AP

Inferring infectiousness: a joint model of the within-host viral kinetics of SARS-CoV-2

推断传染性：SARS-CoV-2宿主内病毒动力学的联合模型

Christopher B. Boyer, Stephen M. Kissler, Seran Hakki, Jakob Jonnerby, Ajit Lalvani, Marc Lipsitch

AI总结本文提出了一种联合模型，通过分析多个病毒脱落间接指标的数据，推断SARS-CoV-2宿主内病毒动力学的传染性轨迹，从而为政策制定提供更准确的传染性评估。

AI中文摘要

在传染病暴发期间，要准确回答有关传播的政策问题，需要一个关于传染性自然史的详细模型。遗憾的是，通常无法直接测量传染性。我们往往只能依赖间接替代指标，例如通过PCR或抗原检测测得的病毒载量、用于检测具有复制能力病毒的病毒培养，或症状出现时间，它们各自反映了病毒动力学或宿主反应的不同方面。然而，这些替代指标在采集难易程度、可扩展性以及与病毒排出（进而与潜在传染性）的关系上各不相同。在此，我们利用五个前瞻性、密集采样队列的数据（这些队列包含约2000例感染的多种病毒排出替代指标的纵向数据），建立了SARS-CoV-2感染宿主内病毒动力学的贝叶斯联合模型。对联合分布建模使我们能够为仅提供PCR数据的个体推断传染性病毒排出的轨迹（与传染性最直接相关的指标），并计算任何单一替代指标都无法单独得到的衍生量。这些量包括：按变异株、疫苗接种状态和感染史分层的、作为诊断后时间函数的群体层面持续具有传染性的概率和预期持续时间；解除个体隔离的残余风险；以及随新检测结果出现而序贯更新的个性化实时传染性估计。

英文摘要

During an infectious disease outbreak, providing accurate answers to policy questions about transmission requires a detailed model of the natural history of infectiousness. Unfortunately, direct measures of infectiousness are generally unavailable. Instead, we often rely on indirect proxies, such as viral load measured by PCR or antigen tests, viral culture to detect replication-competent virus, or symptom onset, each of which reflects different aspects of viral dynamics or host response. However, these proxies vary in terms of the ease of collection, scalability, and their relationship to viral shedding and therefore underlying infectiousness. Here, we use data from five prospective, densely sampled cohorts with longitudinal data on multiple proxies of viral shedding for approximately 2,000 infections to develop a Bayesian joint model for the within-host viral kinetics of SARS-CoV-2 infection. Modeling the joint distribution allows us to infer the trajectory of infectious virus shedding -- the most direct correlate of infectiousness -- for individuals who contribute only PCR data, and to compute derived quantities that are inaccessible from any single proxy alone. These include the population-level probability and expected duration of ongoing infectiousness as a function of time since diagnosis, stratified by variant, vaccination status, and infection history; the residual risk of releasing an individual from isolation; and personalized, real-time estimates of infectiousness that are sequentially updated as new test results become available.

2605.20681 2026-05-21 stat.ME cs.LG

Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis

基于尺度校准的中位数-均值方法用于鲁棒分布式主成分分析

Kisung You

AI总结本文研究用于鲁棒分布式主成分分析的尺度校准中位数-均值估计器：利用欧几里得空间与格拉斯曼流形的乘积几何，给出节点级PCA展开，证明所提出的乘积流形中位数-均值估计器渐近等价于缩放后的节点影响误差的空间中位数，并提出鲁棒块尺度校准规则和推断最优校准规则，建立了高概率的中位数-均值界。

AI中文摘要

分布式主成分分析（PCA）产生均值向量和主子空间的节点级估计。要稳健地聚合这些异质对象，需要确定均值误差与子空间误差之间的相对尺度。针对该问题，我们利用欧几里得空间与格拉斯曼流形的乘积几何，研究了一种尺度校准的中位数-均值估计器。节点级PCA展开表明，均值分量具有通常的线性影响，而子空间分量是由特征间隙加权的协方差扰动。我们证明了一个局部约化结果：所提出的乘积流形中位数-均值估计器渐近等价于缩放后的节点影响误差的空间中位数。由此得到固定节点数下的非高斯极限、节点数增长时带有限块偏差的高斯极限，以及一个显式的、依赖于尺度的协方差公式。我们提出了鲁棒块尺度校准规则和推断最优校准规则，建立了高概率的中位数-均值界，刻画了各因子上坏节点的影响，并证明了节点自助法的有效性。模拟和大规模单细胞RNA-seq数据表明，尺度校准能够适应由特征间隙驱动的子空间不确定性，并给出稳健的分布式PCA汇总。

英文摘要

Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error. We study a scale-calibrated median-of-means estimator for this problem using the product geometry of Euclidean space and the Grassmann manifold. A node-level PCA expansion shows that the mean component has the usual linear influence, whereas the subspace component is an eigengap-weighted covariance perturbation. We prove a local reduction showing that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors. This yields fixed-node non-Gaussian limits, growing-node Gaussian limits with finite-block bias, and an explicit scale-dependent covariance formula. We propose robust block-scale and inference-optimal calibration rules, establish high-probability median-of-means bounds, characterize factorwise bad-node influence, and prove node-bootstrap validity. Simulations and large-scale single-cell RNA-seq data show that scale calibration adapts to eigengap-driven subspace uncertainty and provides a robust distributed PCA summary.
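For readers unfamiliar with the median-of-means idea that this abstract builds on, here is a minimal scalar sketch: split the data into blocks, average each block, and take the median of the block means. It is only the textbook Euclidean version under assumed names (`median_of_means`, `n_blocks`); the paper's estimator works on a product of Euclidean space and a Grassmann manifold with a calibrated scale, which this sketch does not attempt.

```python
import numpy as np


def median_of_means(x, n_blocks, seed=0):
    """Scalar median-of-means: median of the means of random blocks."""
    x = np.asarray(x, dtype=float)
    if not 1 <= n_blocks <= x.size:
        raise ValueError("n_blocks must be between 1 and the sample size")
    rng = np.random.default_rng(seed)
    # Shuffle so that block membership does not depend on the input order.
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return float(np.median([block.mean() for block in blocks]))
```

With one gross outlier among 21 points and 7 blocks, only one block mean is contaminated, so the median of the block means is unaffected, whereas the plain mean is pulled far away.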

2605.20634 2026-05-21 stat.ME math.ST stat.TH

New Confidence Regions for Linear Regression Parameters with Stationary-Ergodic Dependent Errors

具有平稳遍历相依误差的线性回归参数的新置信区域

Mous-Abou Hamadou, Martial Longla, Mathias Nthiani Muia, Mahmud Hasan

AI总结本文针对回归变量与误差联合平稳且遍历、序列相依结构未指定的情形，提出了一种利用随机平滑和辅助样本构造回归系数联合置信区域的方法，无需直接估计长期方差或设定参数化的相依模型；模拟结果显示其覆盖率接近名义水平，区域体积具有竞争力。

AI中文摘要

我们针对回归变量与误差联合平稳且遍历、序列相依结构未指定的情形，构造了线性回归系数的联合置信区域。该方法利用独立的辅助样本和收缩的带宽，对由回归统计量和二阶矩统计量构成的向量进行随机平滑。在平稳性、遍历性和有限二阶矩条件下，估计量渐近正态，由此得到Wald置信区域和同时置信区间，而无需直接估计长期方差或设定参数化的相依模型。在实现上，我们引入了一种缩放估计量，采用数据驱动的带宽选择和温和的截断，以改善有限样本下的稳定性。在ARMA、ARFIMA、基于copula的马尔可夫误差以及分数高斯噪声（边缘分布为高斯或重尾）下的模拟显示，覆盖率接近名义水平，且区域体积相对于Newey-West HAC和MAC具有竞争力。一个北京冬季PM2.5的应用实例展示了该方法。关键词：随机平滑，联合推断，置信区域，相依误差，长记忆，回归推断

英文摘要

We develop joint confidence regions for linear regression coefficients when the regressors and errors are jointly stationary and ergodic with unspecified serial dependence. The method applies random smoothing, using an independent auxiliary sample and shrinking bandwidth, to a vector of regression and second-moment statistics. Under stationarity, ergodicity, and finite second moments, the estimator is asymptotically normal and yields Wald confidence regions and simultaneous confidence intervals without direct long-run variance estimation or a parametric dependence model. For implementation, we introduce a scaled estimator with data-driven bandwidth selection and a mild truncation that improves finite-sample stability. Simulations under ARMA, ARFIMA, copula-based Markov errors, and fractional Gaussian noise, with Gaussian and heavy-tailed margins, show near-nominal coverage and competitive region volumes relative to Newey-West HAC and MAC. A winter Beijing PM2.5 application illustrates the procedure. Keywords: Random smoothing, Joint inference, Confidence regions, Dependent errors, Long memory, Regression inference

2605.20633 2026-05-21 stat.ME stat.AP

Application of Propensity Score Models and Causal Estimators in Observational Studies under Model Misspecification

模型误设下倾向得分模型与因果估计量在观察性研究中的应用

Apu Chandra Das, Sakib Salam, Md Robiul Islam Talukder, Ashim Chandra Das, Antar Chandra Das, Rakhi Chowdhury

AI总结本文研究了模型误设情形下倾向得分模型和因果估计量在观察性研究中的表现，发现AIPW在大多数情形下给出稳健且稳定的估计，而IPW对PS模型误设非常敏感，RSM仅在结果模型设定正确时表现良好。

Comments: 24 pages, 4 figures

AI中文摘要

倾向得分（PS）方法被广泛用于观察性研究，以减少混杂并估计因果处理效应。然而，基于PS的因果估计量的有效性严重依赖于模型设定正确，模型误设可能导致显著的偏倚和不稳定性。在本研究中，我们系统地评估了常用因果估计量，包括响应面建模（RSM）、逆概率加权（IPW）和增强逆概率加权（AIPW），在不同程度的PS模型和结果模型误设下的表现。在PS估计方面，我们将经典逻辑回归与几种机器学习方法进行了比较，包括随机森林（RF）、支持向量机（SVM）和线性判别分析（LDA）。我们在多种情景下开展了广泛的模拟研究，这些情景由PS模型和结果模型设定正确与否的组合、不同的样本量以及不同的协变量相关结构共同定义。估计量的表现通过偏倚、绝对偏倚、均方根误差、经验标准误和置信区间宽度来评估。结果表明，AIPW凭借其双重稳健性，在大多数情景下都能给出稳健且稳定的估计；而IPW对PS误设以及灵活的机器学习方法产生的不稳定PS估计高度敏感。RSM仅在结果模型设定正确时表现良好。基于ACTG175临床试验和阿尔茨海默病神经影像学计划（ADNI）数据集的实际应用进一步说明了估计量选择和PS建模策略的实际影响。总体而言，我们的发现强调了在双重稳健框架内整合灵活的机器学习方法对于改进观察性研究中因果效应估计的重要性。

英文摘要

Propensity score (PS) methods are widely used in observational studies to reduce confounding and estimate causal treatment effects. However, the validity of PS-based causal estimators depends heavily on correct model specification, and model misspecification may lead to substantial bias and instability. In this study, we systematically evaluate the performance of commonly used causal estimators, including response surface modeling (RSM), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW), under varying levels of PS and outcome model misspecification. We compare classical logistic regression with several machine learning approaches for PS estimation, including random forests (RF), support vector machines (SVM), and linear discriminant analysis (LDA). Extensive simulation studies were conducted under multiple scenarios defined by combinations of correctly specified and misspecified PS and outcome models, varying sample sizes, and different covariate correlation structures. Estimator performance was assessed using bias, absolute bias, root mean squared error, empirical standard error, and confidence interval width. Results demonstrate that AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification and unstable PS estimates produced by flexible machine learning methods. RSM performs well only when the outcome model is correctly specified. Real-world applications using the ACTG175 clinical trial and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further illustrate the practical implications of estimator choice and PS modeling strategy. Overall, our findings highlight the importance of integrating flexible machine learning approaches within doubly robust frameworks to improve causal effect estimation in observational studies.
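The contrast between IPW and AIPW in this abstract is easier to see from the estimating formulas. The sketch below writes out the standard textbook forms of both average-treatment-effect estimators, taking already-fitted propensity scores and outcome predictions as inputs; the function and argument names are assumptions for illustration, and the fitting step (logistic regression, random forest, and so on) is deliberately left out.

```python
import numpy as np


def ipw_ate(y, t, e_hat):
    """Inverse probability weighting estimate of the average treatment effect."""
    y, t, e_hat = (np.asarray(v, dtype=float) for v in (y, t, e_hat))
    if np.any((e_hat <= 0) | (e_hat >= 1)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    return float(np.mean(t * y / e_hat - (1 - t) * y / (1 - e_hat)))


def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
    """Augmented IPW: outcome-model prediction plus a weighted residual correction.

    e_hat   : estimated P(T = 1 | X)
    mu1_hat : predicted outcome under treatment, E[Y | T = 1, X]
    mu0_hat : predicted outcome under control,   E[Y | T = 0, X]
    """
    y, t, e_hat, mu1_hat, mu0_hat = (
        np.asarray(v, dtype=float) for v in (y, t, e_hat, mu1_hat, mu0_hat)
    )
    if np.any((e_hat <= 0) | (e_hat >= 1)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    treated = mu1_hat + t * (y - mu1_hat) / e_hat
    control = mu0_hat + (1 - t) * (y - mu0_hat) / (1 - e_hat)
    return float(np.mean(treated - control))
```

When the outcome predictions are exactly right, the residual terms vanish and the AIPW value no longer depends on `e_hat`, which is one half of the double-robustness property mentioned in the abstract.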

2605.20621 2026-05-21 stat.ME stat.AP stat.CO

Changepoint Detection in Categorical Time Series with Application to Daily Total Cloud Cover in Canada

类别时间序列中的变点检测及其在加拿大每日总云量中的应用

Mo Li, QiQi Lu, XiaoLan Wang

AI总结本文提出了一种边际化转移模型，用于检测具有周期性和序列相关性的类别时间序列中的单个变点；模型用一阶马尔可夫链刻画序列依赖，并开发了新的参数估计程序来高效地获得最大似然估计，以检验类别时间序列中的突然变化。

Comments: 31 pages, 16 figures, 5 tables; includes supplementary material; R/Rcpp code available in the linked GitHub repository

AI中文摘要

变点对于类别时间序列的均一化以及分析其趋势和变化至关重要。加拿大原始的总云量数据按小时记录，以十分之一（或八分之一）为单位，具有固有的季节性和序列相关性。Lu和Wang（2012）引入了一个扩展的累积logit模型，用于检测各类云量状况年频率的变化。按年汇总虽然可以减轻季节性和序列相关性，但会缩短时间序列，并可能导致过度离散。本文提出了一种边际化转移模型，用于检测具有周期性和序列相关性的类别时间序列中的单个变点。该模型利用一阶马尔可夫链刻画序列依赖，并允许针对具体类别设定变点。为提高计算效率，我们开发了一种新的参数估计程序来获得最大似然估计。随后，我们提出了一个最大选择似然比检验统计量，用于检验类别时间序列中的突然变化，并以加拿大不列颠哥伦比亚省圣约翰堡（Fort St. John）机场上午9点和下午3点记录的每日总云量观测数据对该方法加以说明。

英文摘要

Changepoints are essential for homogenizing categorical time series and analyzing their trends and variations. The original total cloud cover in Canada was recorded hourly in tenths (or eighths), exhibiting inherent seasonality and serial correlation. Lu and Wang (2012) introduced an extended cumulative logit model to detect shifts in the annual frequencies of cloud cover conditions. While annual aggregation mitigates seasonality and serial correlation, it shortens the time series and may lead to overdispersion. This article introduces a marginalized transition model to detect a single changepoint in periodic and serially correlated categorical time series. The model captures serial dependence using a first-order Markov chain and enables category-specific changepoint specification. To enhance computational efficiency, we develop a new parameter estimation procedure for obtaining maximum likelihood estimates. A maximally selected likelihood ratio test statistic is then proposed to test for sudden changes in categorical time series, and the method is illustrated using daily total cloud cover observations recorded at 9 a.m. and 3 p.m. at Fort St. John Airport, British Columbia, Canada.

2605.20619 2026-05-21 cs.LG math.OC stat.ML

SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

SURF: 通过调整标量化权重以均匀遍历帕累托前沿

Liuyuan Jiang, Chentong Huang, Lisha Chen

AI总结本文提出SURF方法，通过调整标量化权重以实现帕累托前沿的均匀覆盖，解决了传统标量化方法在多目标优化中导致非均匀覆盖的问题。

AI中文摘要

标量化因其简单性和可扩展性而在多目标优化中被广泛使用。在许多应用中，目标是生成能够代表多样化用户偏好的解，理想情况下应均匀覆盖帕累托前沿（PF）。然而，对标量化权重均匀采样通常会导致对PF的非均匀覆盖。我们通过对标量化路径的几何分析解释了这种不匹配：随着标量化权重的变化，相应的解沿PF移动，而移动速度一般并不均匀。该速度诱导出一个弧长累积分布函数（CDF）；对这一CDF映射求逆，即可得到一条有原则的权重选择规则，使所选权重产生均匀的PF覆盖。基于这一见解，我们提出了SURF（Sampling Uniformly along the PaReto Front，沿帕累托前沿均匀采样）。对于包括双目标老虎机在内的结构化问题，我们推导出了该CDF映射以及相应的PF感知权重采样规则的闭式表达式。对于一般问题，SURF在CDF重建和权重采样之间交替进行。理论上，我们证明在可证明的条件下，SURF线性收敛到一个不可避免的有限采样误差下限。实验上，在老虎机、multi-objective-gymnasium和多目标LLM对齐上的实验表明，SURF能够高效地实现比基线更均匀的PF覆盖。

英文摘要

Scalarization is widely used in multi-objective optimization owing to its simplicity and scalability. In many applications, the goal is to generate solutions that represent diverse user preferences, ideally with uniform coverage of the Pareto front (PF). However, uniformly sampling scalarization weights usually induces non-uniform coverage of the PF. We explain this mismatch through a geometric analysis of the scalarization path. As the scalarization weight varies, the corresponding solutions trace the PF with a generally non-uniform traversal speed. This speed induces an arc-length cumulative distribution function (CDF); inverting this CDF map yields a principled rule for selecting weights that produce uniform PF coverage. Building on this insight, we propose SURF (Sampling Uniformly along the PaReto Front). For structured problems, including bi-objective bandits, we derive closed-form expressions for this CDF map and the resulting PF-aware weight sampling rule. For general problems, SURF alternates between CDF reconstruction and weight sampling. Theoretically, we show that under provable conditions, SURF converges linearly to an unavoidable finite-sampling floor. Empirically, experiments on bandits, multi-objective-gymnasium, and multi-objective LLM alignment demonstrate that SURF efficiently achieves more uniform PF coverage than baselines.
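The core idea of inverting an arc-length CDF can be illustrated numerically. The sketch below assumes a grid of scalarization weights and the Pareto-front point already computed for each one, builds a piecewise-linear arc-length CDF over that grid, and inverts it by interpolation to pick weights that land at evenly spaced arc lengths. The function name and the piecewise-linear reconstruction are assumptions for illustration; the paper's closed-form maps and its alternating reconstruction scheme are not reproduced here.

```python
import numpy as np


def uniform_front_weights(weights, front_points, n_samples):
    """Pick weights whose solutions are evenly spaced in arc length along the front.

    weights      : increasing 1-D grid of scalarization weights, shape (K,)
    front_points : objective vectors obtained for each grid weight, shape (K, m)
    """
    weights = np.asarray(weights, dtype=float)
    points = np.asarray(front_points, dtype=float)
    # Length of each straight segment between consecutive front points.
    segment_lengths = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc_length = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    if arc_length[-1] == 0.0:
        raise ValueError("front points are all identical")
    cdf = arc_length / arc_length[-1]          # empirical arc-length CDF over weights
    targets = np.linspace(0.0, 1.0, n_samples)  # evenly spaced arc-length fractions
    # Invert the CDF: find the weight at which each target fraction is reached.
    return np.interp(targets, cdf, weights)
```

The interpolation step assumes consecutive front points are distinct, so that the CDF is strictly increasing over the grid.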

2605.20604 2026-05-21 stat.ME

Conditional regularized halfspace depth for sparse functional data and its applications

稀疏函数型数据的条件正则化半空间深度及其应用

Hyemin Yeon, Xiongtao Dai, Sara Lopez-Pintado

AI总结本文提出了一种新的稀疏函数数据深度概念——条件正则化半空间深度（CRHD），用于评估稀疏观测数据的深度，克服了现有方法对重构曲线的依赖，并通过排名检验和婴儿生长数据集展示了其应用价值。

AI中文摘要

许多函数数据集是稀疏且不规则观测的。对这类数据进行排序具有挑战性，因为每个观测点仅提供有限的信息，而潜在轨迹仍然是无限维的。本文开发了一种新的深度概念，称为条件正则化半空间深度（CRHD）。CRHD定义为给定观测稀疏测量的潜在轨迹的条件半空间概率的下确界，从而允许在稀疏观测上直接进行深度评估，而不需要轨迹重构。我们研究了CRHD的几个基本理论性质，以澄清其作为深度度量的行为。所提出的深度甚至适用于极其稀疏观测的函数数据，克服了现有稀疏函数深度方法的关键限制，这些方法通常依赖于重构曲线。此外，CRHD为复杂函数数据诱导了有意义的排名。通过基于排名的检验展示了其数值性能，并通过婴儿生长数据集展示了其实际应用价值。

英文摘要

Many functional datasets are observed sparsely and irregularly. Ordering such data is challenging because only limited information is available from each observation, while the underlying trajectories remain infinite-dimensional. This paper develops a novel depth notion for sparse functional data, called the conditional regularized halfspace depth (CRHD). CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. We study several basic theoretical properties of CRHD that clarify its behavior as a depth measure. The proposed depth is applicable even to extremely sparsely observed functional data, overcoming key limitations of existing sparse functional depths that often rely on reconstructed curves. In addition, CRHD induces meaningful rankings for complex functional data. Its numerical performance is demonstrated through rank-based tests, and its practical utility is illustrated using an infant growth dataset.

2605.20572 2026-05-21 math.ST stat.ME stat.TH

Minimax unbiased estimation for finite populations with bounded outcomes

结果有界的有限总体的最小最大无偏估计

P. M. Aronow, Patrick Lopatto

AI总结本文研究了在每个结果满足已知边界的情况下，对有限总体总和进行设计无偏估计的问题，推导了在矩形参数空间下的最坏情况平方误差下界，并证明了当单元包含指示符成对独立时，最小最大估计器为中点差分Horvitz-Thompson估计器。

Comments: 14 pages

AI中文摘要

我们研究在每个结果满足已知界$y_i\in[a_i,b_i]$时，对有限总体总量$\sum_{i=1}^N y_i$的设计无偏估计。对于任何包含概率$\pi_i>0$的抽样设计，我们证明了矩形参数空间上最坏情况平方误差的一个紧下界。该下界当且仅当各单元的包含指示变量两两独立时才能达到，此时最小最大估计量是中点差分Horvitz-Thompson估计量$\sum_{i=1}^N m_i+\sum_{i\in S}(y_i-m_i)/\pi_i$，其中$m_i=(a_i+b_i)/{2}$。随后，我们在约束$\sum_i \pi_i\le n$下求解设计与估计的联合问题。我们发现，一种最小最大策略是以概率$\pi_i^\ast=\min(1,c (b_i-a_i))$独立地抽取各单元，其中$c>0$的取值使得$\sum_i \pi_i^\ast=n$，并使用中点差分估计量。这将Gabler (1990)的线性最小最大结果推广到全体设计无偏估计量。我们还证明了该估计量在无偏估计量中是可容许的，并且具有仿射等变性。

英文摘要

We study design-unbiased estimation of the finite-population total $\sum_{i=1}^N y_i$ when each outcome satisfies known bounds $y_i\in[a_i,b_i]$. For any sampling design with inclusion probabilities $π_i>0$, we prove a sharp lower bound on the worst-case squared error over the rectangular parameter space. This bound is attained if and only if the unit inclusion indicators are pairwise independent, in which case the minimax estimator is the midpoint-differenced Horvitz-Thompson estimator $\sum_{i=1}^N m_i+\sum_{i\in S}(y_i-m_i)/π_i$, with $m_i=(a_i+b_i)/{2}$. We then solve the joint design-and-estimation problem under the constraint $\sum_i π_i\le n$. We find that a minimax strategy samples units independently with probabilities $π_i^\ast=\min(1,c (b_i-a_i))$ where $c>0$ is chosen so that $\sum_i π_i^\ast=n$, and uses the midpoint-differenced estimator. This extends Gabler (1990)'s linear minimax result to the full class of design-unbiased estimators. We also show that the estimator is admissible among unbiased estimators and affine equivariant.
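The two formulas in this abstract translate directly into code. The sketch below computes the inclusion probabilities $π_i^\ast=\min(1,c(b_i-a_i))$ by bisecting on $c$ until they sum to $n$, and evaluates the midpoint-differenced Horvitz-Thompson estimator on a given sample. The function names, the bisection solver, and the dictionary representation of the sample are illustrative assumptions, not details from the paper.

```python
def minimax_inclusion_probs(lower, upper, n):
    """Probabilities pi_i = min(1, c * (b_i - a_i)) with c chosen so they sum to n."""
    widths = [b - a for a, b in zip(lower, upper)]
    if any(w <= 0 for w in widths):
        raise ValueError("every interval must have positive width")
    if not 0 < n <= len(widths):
        raise ValueError("expected sample size must be in (0, N]")

    def total(c):
        return sum(min(1.0, c * w) for w in widths)

    # total(c) is non-decreasing in c; at c = 1 / min(widths) every pi_i equals 1.
    lo, hi = 0.0, 1.0 / min(widths)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if total(mid) < n:
            lo = mid
        else:
            hi = mid
    return [min(1.0, hi * w) for w in widths]


def midpoint_ht_total(sample, lower, upper, probs):
    """Midpoint-differenced Horvitz-Thompson estimate of the population total.

    sample : dict mapping the index of each sampled unit to its observed y_i
    """
    mids = [(a + b) / 2.0 for a, b in zip(lower, upper)]
    correction = sum((y - mids[i]) / probs[i] for i, y in sample.items())
    return sum(mids) + correction
```

Units with wider bounds get proportionally larger inclusion probabilities, capped at one, so sampling effort goes where the outcome is least pinned down in advance.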

2605.20567 2026-05-21 stat.ME

Meta-analysis and network meta-analysis of time-to-event outcomes with non-proportional hazards: a Bayesian time-varying hazard ratio approach

时间至事件结局的元分析与网络元分析：基于贝叶斯时间变化危险比方法的非比例危险处理

Rhiannon K Owen, Keith R Abrams

AI总结本文提出了一种基于贝叶斯时变危险比的方法，用于在存在非比例危险时对时间至事件结局进行元分析和网络元分析；该方法被应用于晚期复发胃癌中化疗与标准治疗比较的元分析，以及评估晚期BRAF突变黑色素瘤总生存期的网络元分析，展示了其在比例危险假设不成立时的适用性。

Comments: 23 pages, 13 figures, 3 tables & Presented as an Oral Contribution at International Society for Clinical Biostatistics (ISCB) Conference (ISCB-46), Basel, August 27, 2025

AI中文摘要

背景：在对时间至事件（TTE）结局进行元分析时，尤其是在卫生技术评估（HTA）背景下，通常采用危险比（HR）尺度。然而，当纳入的部分研究存在非比例危险的证据时，就会出现问题。已有多种方法被提出，但它们的使用受到方法复杂性和/或其结果在HTA中使用的难易程度的限制。一种替代方法是在每项研究的Cox比例危险模型中假设存在治疗与log(时间)的交互作用，然后对得到的治疗系数和交互系数进行双变量元分析，从而得到总体的时变危险比（TVHR）。方法：将TVHR方法应用于一项比较化疗与标准治疗用于晚期复发胃癌的元分析，其中无进展生存期（PFS）是结局之一。该方法还被应用于一项评估晚期BRAF突变黑色素瘤总生存期（OS）的网络元分析（NMA）。结果：在晚期胃癌元分析中，有五项试验显示出PFS存在非比例危险的证据。使用TVHR模型得到的HR从0.5年时的0.83（CrI:0.75-0.91）变化到3.5年时的0.99（CrI:0.79-1.23）。在晚期BRAF突变黑色素瘤NMA中，有三项研究显示出OS存在非比例危险的证据。使用TVHR模型，nivolumab加ipilimumab自第7个月起持续表现出优效性，HR从一年时的0.37（CrI:0.26-0.51）改善至五年时的0.24（CrI:0.12-0.45）。结论：当比例危险假设看来不成立时，采用TVHR方法对TTE结局进行元分析或NMA，可以给出一种直观的解决方案，且便于在HTA中使用。

英文摘要

Background: Often when undertaking meta-analyses of time-to-event (TTE) outcomes, especially in a Health Technology Assessment context, a hazard ratio (HR) scale is used. However, issues arise when there is evidence of non-proportional hazards in some of the studies included. A number of methods have been advocated, but their use has been limited by either their complexity and/or the ease with which their results can be used in HTA. An alternative approach is to assume a treatment-log(time) interaction within a Cox proportional hazards model for each study, and to then undertake a bivariate meta-analysis of the resulting treatment and interaction coefficients, so that an overall time-varying HR (TVHR) can be obtained. Methods: A TVHR approach was applied to a meta-analysis of chemotherapy compared to Standard of Care for advanced recurrent gastric cancer, and in which Progression-Free Survival (PFS) was an outcome. The approach was also applied to a network meta-analysis (NMA) evaluating overall survival (OS) in advanced BRAF-mutated melanoma. Results: Five trials in the advanced gastric cancer meta-analysis displayed evidence of non-proportional hazards for PFS. Using a TVHR model produced HRs ranging from 0.83 (CrI:0.75-0.91) at 0.5 years to 0.99 (CrI:0.79-1.23) at 3.5 years. Three studies showed evidence of non-proportional hazards in the advanced BRAF-mutated melanoma NMA for OS. Using a TVHR model, nivolumab plus ipilimumab demonstrated consistent superiority from month 7 onwards, with a HR improving from 0.37 (CrI:0.26-0.51) at one year to 0.24 (CrI:0.12-0.45) at five years. Conclusions: A TVHR approach to the meta-analysis or NMA of TTE outcomes when the proportional hazards assumption appears not to hold, produces an intuitive solution which can be readily used in HTA.
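With a treatment-log(time) interaction in a Cox model, the log hazard ratio at time $t$ is the treatment coefficient plus the interaction coefficient times $\log t$. The sketch below evaluates that expression for a single pair of coefficients; the function name and argument names are assumptions for illustration, and the pooling of coefficients across studies (the bivariate meta-analysis step) is not shown.

```python
import math


def time_varying_hr(beta_treatment, beta_interaction, t):
    """Hazard ratio at time t under a treatment x log(time) interaction.

    HR(t) = exp(beta_treatment + beta_interaction * log(t)), for t > 0.
    """
    if t <= 0:
        raise ValueError("time must be positive")
    return math.exp(beta_treatment + beta_interaction * math.log(t))
```

At $t=1$ (in whatever time unit the model uses) the hazard ratio reduces to $\exp$ of the treatment coefficient, and a zero interaction coefficient recovers an ordinary constant hazard ratio.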

2605.20559 2026-05-21 stat.ML cs.LG stat.AP stat.ME

Group-Aware Matrix Estimation and Latent Subspace Recovery

群体感知的矩阵估计与潜在子空间恢复

Hamza Golubovic, Matthew Shen, Genevera I. Allen, Tarek M. Zikry

AI总结本文提出了GAME，一种用于异质数据中重叠子群低秩矩阵估计的凸估计器：通过重叠核范数惩罚对类别特定的子矩阵进行正则化，使相关群体之间可以共享信息，同时在共享坐标系中保留局部潜在结构，并在多个数据集上表明其在结构化缺失情形下最为有益。

Comments: 12 pages, 6 main figures, 1 main algorithm

AI中文摘要

现代矩阵补全问题通常涉及异质数据，其行同时属于多个元类别，例如推荐系统中的人口统计学群体和年龄组，或神经电生理实验中的区域和记录会话标签。标准的低秩估计器施加单一的全局潜在几何结构，它能够恢复平均结构，但可能抹平子群特有的变异，尤其是在观测在各群体间分布不均时。我们提出了Group-Aware Matrix Estimation (GAME)，一种用于重叠子群低秩矩阵估计的凸估计器。GAME通过重叠核范数惩罚对类别特定的子矩阵进行正则化，使相关群体之间可以借用信息，同时在共享坐标系中保留局部潜在结构。我们为重建误差和子群特定子空间恢复给出了有限样本保证，说明了性能如何依赖于采样密度、子群秩和重叠结构。在合成、推荐、生态学和神经科学数据集上的实验表明，GAME在结构化缺失情形下最为有益，此时子群感知的正则化同时提高了重建精度和潜在子空间的保真度。在这些基准上，与全局低秩、辅助信息和现代插补基线相比，GAME具有竞争力或表现最佳，且当各子群具有不同的低秩结构时增益最大。

英文摘要

Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural electrophysiological experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive or best among global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.
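An overlapping nuclear-norm penalty of the kind mentioned in this abstract can be written as a weighted sum of nuclear norms of row-submatrices, where a row may belong to several groups. The sketch below evaluates such a penalty for a given matrix; the function name, the row-index representation of groups, and the per-group weights are assumptions for illustration, since the abstract does not spell out the full GAME objective.

```python
import numpy as np


def overlapping_nuclear_penalty(X, groups, weights=None):
    """Sum over groups of weight * nuclear norm of the rows belonging to that group.

    X       : 2-D array, rows are units (for example users or neurons)
    groups  : list of row-index collections; a row may appear in several groups
    weights : optional per-group penalty weights (defaults to 1 for every group)
    """
    X = np.asarray(X, dtype=float)
    if weights is None:
        weights = [1.0] * len(groups)
    total = 0.0
    for rows, weight in zip(groups, weights):
        submatrix = X[list(rows), :]
        # Nuclear norm = sum of singular values of the group's submatrix.
        total += weight * np.linalg.norm(submatrix, ord="nuc")
    return float(total)
```

In a full estimator this term would be added to a data-fit loss over the observed entries and minimized with a convex solver.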

2605.20552 2026-05-21 stat.ML cs.LG

Spectral bandits for smooth graph functions with applications in recommender systems

图上光滑函数的谱老虎机及其在推荐系统中的应用

Tomáš Kocák, Michal Valko, Rémi Munos, Branislav Kveton, Shipra Agrawal

AI总结本文研究了臂的收益在图上光滑的老虎机问题，引入了有效维度的概念，并提出了两种随该维度线性缩放的算法，可用于在推荐系统中学习用户偏好。

Comments: Published at AAAI 2014 - SDMBD

AI中文摘要

图上的光滑函数在流形学习和半监督学习中有广泛应用。本文研究一个老虎机问题，其中各臂的收益在图上是光滑的。该框架适用于求解涉及图的在线学习问题，例如基于内容的推荐。在该问题中，每个被推荐的项目是一个节点，其期望评分与其邻居相近。目标是推荐期望评分高的项目。我们希望设计出累积遗憾不随节点数量急剧增长的算法。特别地，我们引入了有效维度的概念（它在现实世界的图中较小），并提出了两种随该维度线性缩放的算法。我们在现实世界内容推荐问题上的实验表明，仅凭几十个节点的评估，就能学到针对数千个项目的良好用户偏好估计器。

英文摘要

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each recommended item is a node and its expected rating is similar to that of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.

2605.20550 2026-05-21 math.ST stat.TH

Kernel Density Estimation under $C^{1,1}$ Regularity: AMISE, Weak Curvature, and Plug-in Bandwidths

$C^{1,1}$正则性下的核密度估计：AMISE、弱曲率与插入式带宽

Alireza Kabgani, Elaheh Lotfian

AI总结本文研究了在$C^{1,1}$正则性条件下核密度估计的AMISE理论，提出了弱曲率概念，并在不假设经典二阶导数连续的情况下，推导出AMISE公式、最优带宽和Epanechnikov核最优性。

AI中文摘要

经典的核密度估计通常借助逐点泰勒展开推导AMISE和最优带宽，这要求密度二阶连续可微。这一假设强于必要，并且排除了来自阈值模型、机制转换和稳健混合模型的一些自然的密度；在这些密度中，一阶导数可能是Lipschitz连续的，而曲率可能存在折点、不连续，或仅在弱意义下有定义。我们证明，在更弱的条件$f\in C^{1,1}(\mathbb{R})$下，经典的AMISE理论仍然成立。逐点的$C^2$泰勒展开被基于弱二阶导数的积分型泰勒表示所取代，因此$R(f'')$被解释为一个弱曲率泛函。在$f\in C^{1,1}(\mathbb{R})$和$f''\in L^2(\mathbb{R})$的条件下，我们恢复了经典的AMISE公式、$n^{-1/5}$阶的最优带宽以及Epanechnikov核的最优性，而无需假设存在连续的经典二阶导数。我们还提出了一种广义曲率插入式带宽选择器，证明了它在曲率估计比率相合时的一阶AMISE等价性，并建立了留一法U统计量曲率估计量的相合性。利用弱Hessian的多元推广恢复了标量带宽下的收敛速率$n^{-4/(d+4)}$。

英文摘要

Classical kernel density estimation usually derives the AMISE and optimal bandwidth from a pointwise Taylor expansion, which requires twice continuous differentiability. This assumption is stronger than necessary and excludes natural densities arising from threshold models, regime changes, and robust mixture models, where the first derivative may be Lipschitz while the curvature is kinked, discontinuous, or only weakly defined. We show that the classical AMISE theory remains valid under the weaker condition $f\in C^{1,1}(\mathbb{R})$. The pointwise $C^2$ Taylor expansion is replaced by an integral Taylor representation based on the weak second derivative, so that $R(f'')$ is interpreted as a weak-curvature functional. Under $f\in C^{1,1}(\mathbb{R})$ and $f''\in L^2(\mathbb{R})$, we recover the classical AMISE formula, the $n^{-1/5}$ optimal bandwidth, and Epanechnikov kernel optimality without assuming a continuous classical second derivative. We also propose a generalized-curvature plug-in bandwidth selector, prove its first-order AMISE equivalence under ratio-consistent curvature estimation, and establish consistency of a leave-one-out U-statistic curvature estimator. A multivariate extension using weak Hessians recovers the scalar-bandwidth rate $n^{-4/(d+4)}$.
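The classical AMISE-optimal bandwidth that this abstract refers to is $h^\ast = \left[ R(K) / (\mu_2(K)^2 \, R(f'') \, n) \right]^{1/5}$, where $R(g)=\int g^2$ and $\mu_2(K)=\int u^2 K(u)\,du$. The sketch below evaluates it with Epanechnikov constants as defaults and includes the known value of $R(f'')$ for a normal density as a reference case. The function names are assumptions; in a plug-in selector the curvature argument would be replaced by an estimate, which is not shown here.

```python
import math

# Constants for the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1].
EPANECHNIKOV_ROUGHNESS = 3.0 / 5.0       # R(K) = integral of K^2
EPANECHNIKOV_SECOND_MOMENT = 1.0 / 5.0   # mu_2(K) = integral of u^2 K(u)


def amise_optimal_bandwidth(n, curvature,
                            roughness=EPANECHNIKOV_ROUGHNESS,
                            second_moment=EPANECHNIKOV_SECOND_MOMENT):
    """h* = [R(K) / (mu_2(K)^2 * R(f'') * n)]^(1/5)."""
    if n <= 0 or curvature <= 0:
        raise ValueError("n and curvature must be positive")
    return (roughness / (second_moment ** 2 * curvature * n)) ** 0.2


def gaussian_curvature(sigma):
    """R(f'') for a normal density with standard deviation sigma."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
```

With a Gaussian kernel ($R(K)=1/(2\sqrt{\pi})$, $\mu_2(K)=1$) and a standard normal density, the formula reduces to the familiar $(4/(3n))^{1/5}$ rule.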

2605.20547 2026-05-21 cs.LG cs.AI stat.ML

Latent Process Generator Matching

潜在过程生成器匹配

Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell

AI总结本文提出了潜在过程生成器匹配这一通用框架，将观测到的生成状态视为一个易处理的马尔可夫过程的确定性像，从而把生成器匹配理论从静态潜变量推广到随时间变化的潜在条件过程。

Comments: 18 pages, 1 figure

AI中文摘要

许多近期的流匹配和扩散式生成模型在训练中依赖辅助的随机动力学：通过模拟一个更丰富的过程来定义条件目标，但辅助状态要么在生成时难以采样，要么根本不属于所需的输出。现有的Generator Matching（生成器匹配）理论形式化了以静态潜在随机变量为条件的情形，另有几篇近期论文针对特定的增广状态构造证明了投影结果的特例。我们提出了潜在过程生成器匹配，这是一个通用框架，它将观测到的生成状态视为一个易处理的马尔可夫过程$Y_t$的确定性像$X_t=Φ(Y_t)$。我们证明，在这一设定下，可以在像空间上学习一个随机过程的生成器，该过程与投影过程具有相同的单时刻边缘分布。这一结果推广并涵盖了文献中的离散潜在过程结果，并将生成器匹配从静态潜变量推广到一族丰富的、随时间变化的潜在条件过程。

英文摘要

Many recent flow-matching and diffusion-style generative models rely on auxiliary stochastic dynamics during training: a richer process is simulated to define conditional targets, but the auxiliary state is either intractable to sample at generation time or simply not part of the desired output. Existing Generator Matching theory formalises conditioning on static latent random variables, and several recent papers prove special cases of projection results for particular augmented-state constructions. We introduce latent process generator matching, a general framework that treats the observed generative state as a deterministic image $X_t=Φ(Y_t)$ of a tractable Markov process $Y_t$. We show that in this setting one may learn the generator of a stochastic process on the image space which has the same one-time marginal distributions as the projected process. This generalizes and subsumes the discrete latent process results from the literature, and extends Generator Matching from static latent variables to a rich family of time-dependent latent conditional processes.

2605.20545 2026-05-21 stat.ML cs.LG

Sample Complexity of Transfer Learning: An Optimal Transport Approach

迁移学习的样本复杂性：一种最优传输方法

Haoyang Cao, Xin Guo, Wenpin Tang, Guan Wang

AI总结本文通过最优传输视角分析迁移学习的样本效率，发现当数据维度d大于3时，迁移学习的样本复杂性为O(m^{-(α+1)/d})，优于直接学习的O(m^{-p/d})，其中α表示数据分布的光滑度，p表示最优目标模型的光滑度。

AI中文摘要

迁移学习是许多结构复杂的机器学习/AI模型（如大语言模型和生成式AI）的一项关键技术。迁移学习的本质是利用已解决的源任务中的知识来处理新的目标任务，尤其是在后者训练数据的样本量$m$较小时。本文严格分析了迁移学习在样本效率方面的潜在优势。具体而言，从最优传输的视角看待迁移学习，我们发现当数据维度$d$大于$3$时，迁移学习的样本复杂度为$O(m^{-(α+1)/d})$，其中$α$表示数据分布的光滑度；相比之下，直接学习的样本复杂度为$O(m^{-p/d})$，其中$p$表示最优目标模型的光滑度。当目标任务是在一族不太光滑的模型（即可能使用非光滑激活函数的高度复杂的网络）上进行优化时，我们的发现从理论上支持迁移学习具有更好的样本效率。以图像分类为例，我们通过数值实验展示了迁移学习的样本效率，即在数据饥渴（data-hungry）的情形下，迁移学习可以显著提升模型性能。

英文摘要

Transfer learning is an essential technique for many machine learning/AI models of complex structures such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(α+1)/d})$, with $α$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports a better sample efficiency for transfer learning, when the target task is optimizing over a family of not-so-smooth models (i.e., highly complex networks with the possible use of non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency for transfer learning, that is, in the data hungry regime, the model performance can be significantly improved by transfer learning.

2605.20541 2026-05-21 math.ST math.PR stat.TH

Finite-Sample Bounds for Expected Signature Estimation under Weak Dependence

弱相依条件下期望签名估计的有限样本界

Bryson Schenck

AI总结本文研究在弱相依条件下由单条长的相依轨迹估计期望签名的有限样本界，证明了块平均估计量的非渐近均方误差界，并在三种Hurst指数情形下对分数Ornstein-Uhlenbeck过程验证了所需假设。

Comments: 51 pages, 1 figure

AI中文摘要

在矩增长条件下，期望签名唯一确定随机粗糙路径的分布，但此前一直缺乏由单条长的相依轨迹对其进行估计的有限样本界。我们研究一个平稳随机过程，其样本路径可被解释为几何粗糙路径，并被划分为由等间距观测组成的块；我们证明了块平均估计量的非渐近均方误差界。当路径的Hölder正则性至多为$1/2$时，需要借助粗糙路径理论才能使被估计量有良好定义，因为Young积分和Riemann-Stieltjes积分无法定义签名中的迭代积分。在矩条件和平稳性假设以及块签名的协方差衰减条件（该条件严格弱于$α$-混合，且适用于长程相依的驱动过程）下，误差可分解为离散化项和波动项，其速率分别由路径正则性和相依强度决定。逐层的粗糙阶乘（rough-factorial）方差分析使有限截断常数保持显式，并给出了固定观测预算下的最优分配规则。我们在三种情形下对分数Ornstein-Uhlenbeck过程验证了这些假设，即粗糙情形（Hurst指数$H<1/2$）、半鞅情形（$H=1/2$）和长程情形（$H>1/2$）。蒙特卡罗实验显示，经验收敛速率快于理论上界。

英文摘要

The expected signature uniquely determines the law of a random rough path under a moment-growth condition, yet finite-sample bounds for estimating it from a single long dependent trajectory have been lacking. We study a stationary stochastic process whose sample paths can be interpreted as geometric rough paths, partitioned into blocks of equally-spaced observations, and prove a non-asymptotic mean-squared error bound for the block-averaging estimator. Rough-path theory is required for the estimand to be well-defined when paths have Hölder regularity at most $1/2$, because Young and Riemann--Stieltjes integration cannot define the signature's iterated integrals. Under moment and stationarity assumptions together with a covariance-decay condition on block signatures -- strictly weaker than $α$-mixing and applicable to long-range-dependent drivers -- the error separates into a discretization term and a fluctuation term, with rates determined respectively by path regularity and dependence strength. A level-wise rough-factorial variance analysis keeps finite-truncation constants explicit and yields an optimal allocation rule under a fixed observation budget. We verify the assumptions for fractional Ornstein--Uhlenbeck processes in three regimes, namely rough (Hurst $H<1/2$), semimartingale ($H=1/2$), and long-range ($H>1/2$). Monte Carlo experiments show empirical convergence rates faster than the theoretical upper bounds.
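A block-averaging estimator of the kind described here can be sketched for the first two signature levels. The code below treats the observations in each block as a piecewise-linear path, computes that block's level-1 and level-2 signature terms in closed form, and averages them over non-overlapping blocks. Truncating at level 2, interpolating linearly between observations, and the function names are all assumptions made for illustration; the paper's construction and its rough-path treatment may differ.

```python
import numpy as np


def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path : array of shape (T + 1, d) holding successive observations.
    """
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)                # increments, shape (T, d)
    level1 = inc.sum(axis=0)
    # Sum of increments strictly before each step.
    before = np.cumsum(inc, axis=0) - inc
    # Cross terms between earlier and later steps, plus the within-step term.
    level2 = before.T @ inc + 0.5 * inc.T @ inc
    return level1, level2


def block_average_signature(path, block_len):
    """Average the level-1 and level-2 signatures over non-overlapping blocks."""
    path = np.asarray(path, dtype=float)
    n_blocks = (len(path) - 1) // block_len
    if n_blocks < 1:
        raise ValueError("path is shorter than one block")
    sigs = [signature_level2(path[b * block_len:(b + 1) * block_len + 1])
            for b in range(n_blocks)]
    level1 = np.mean([s[0] for s in sigs], axis=0)
    level2 = np.mean([s[1] for s in sigs], axis=0)
    return level1, level2
```

Consecutive blocks share their boundary observation so that each block is a connected piece of the trajectory; any trailing observations that do not fill a whole block are ignored.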

2605.20154 2026-05-21 stat.ME stat.AP

Component over Composite: Mitigating Type I Error Inflation when Imputing "Days Alive and at Home"

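Component over Composite: Mitigating Type I Error Inflation when Imputing "Days Alive and at Home"

Mia S. Tackney, Sarah Dawson, Letao Yuan, Dominique-Laurent Couturier, Sofia S. Villar

AI summary: This paper studies how to mitigate type I error inflation when imputing the composite outcome "Days Alive and at Home". A simulation study compares methods for handling missing data and finds that multiple imputation at the component level controls type I error better than imputation at the composite level. Future work should develop imputation methods suited to more complex DAH definitions.

AI Chinese abstract (translated)

Background: Days Alive and at Home (DAH) over a pre-defined follow-up period is a novel post-intervention composite outcome that combines data from at least three components: (i) initial length of hospital stay, (ii) total length of readmissions or other post-discharge care, and (iii) mortality. Missing values pose unique challenges for trials that analyze the DAH outcome, because the three components may have different rates of missingness arising from different missing-data mechanisms. Current approaches define DAH as missing if any component is missing, and then proceed with a complete-case analysis or multiple imputation (MI) of the composite. Methods: Through a simulation study motivated by the NOTACS trial, we compare several methods of handling missing data, including complete-case analysis, MI of the composite, and MI of the components, when the primary analysis is a Mann-Whitney-Wilcoxon test. Results: MI at the component level has good properties in terms of type I error control and power. We caution against MI at the composite level with Predictive Mean Matching (PMM), which can lead to type I error inflation. Conclusions: Given the complex distributional characteristics of DAH, naive approaches that define missingness at the composite level and impute the composite directly with PMM can lead to type I error inflation. Imputation at the component level is recommended; future work should include imputation approaches compatible with more complex definitions of DAH, as well as recommendations for sensitivity analyses of the Missing at Random assumption.

English abstract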

Background: Days Alive and at Home (DAH) over a pre-defined follow-up period is a novel post-intervention composite outcome that combines data from at least three components: (i) initial length of hospital stay, (ii) length of total readmissions or other post-discharge care and (iii) mortality. Missing values bring unique challenges to the analysis of trials with the DAH outcome as the three components may have different rates of missingness caused by distinct missing data mechanisms. Current approaches define DAH as missing if any of the components are missing, and proceed with complete cases or Multiple Imputation (MI) of the composite. Methods: Through a simulation study motivated by the NOTACS trial, we compare several methods of handling missing data, including complete case analysis, MI of the composite, and MI of the components when the primary analysis is a Mann-Whitney-Wilcoxon test. Results: MI on the component level has good properties in terms of type I error control and power. We caution against the use of MI on the composite level with Predictive Mean Matching, which can lead to type I error inflation. Conclusions: Given the complex distributional characteristics of DAH, naive approaches such as defining missingness on the composite level and directly imputing the composite with Predictive Mean Matching can lead to type I error inflation. Imputing on the component level is recommended. Suggested future work includes imputation approaches that are compatible with more complex definitions of DAH, as well as recommendations for sensitivity analyses to the Missing at Random assumption.


2604.21212 2026-05-21 stat.AP

Legal Infrastructure Organizes Eviction: Evidence from Philadelphia

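Legal Infrastructure Organizes Eviction: Evidence from Philadelphia

Marios Papamichalis, Regina Ruane

AI summary: This paper studies how legal infrastructure organizes eviction filings in Philadelphia. It finds that filings are characterized by a concentrated plaintiff-side bar, durable plaintiff-attorney dependence, repeated use of the same properties, and recurring tenant-name exposure, and it shows that eviction is a layered upstream process.

Comments: This is a preprint before submission

AI Chinese abstract (translated)

We analyze the filing-side legal infrastructure of eviction using 755,004 Philadelphia Municipal Court landlord-tenant records filed between 1969 and 2022, of which 747,125 are residential. Eviction in Philadelphia is organized by a concentrated plaintiff-side bar, durable plaintiff-attorney dependence, repeated use of the same properties, and recurring tenant-name exposure. Between 1983 and 2022, the ten most active plaintiff attorneys handled on average 82.2% of represented plaintiff-side cases per year, whereas the ten most active plaintiffs accounted for only 14.8%. Large plaintiffs rely heavily on a single attorney: among plaintiffs filing at least 101 cases, 78.3% of filings are handled by the plaintiff's most-used attorney. Repetition is equally central to the docket. Among residential cases, 48.8% occur at addresses with a filing in the preceding year and 23.6% at addresses with six or more prior filings; these repeat cases are usually filed by the same plaintiff and follow a pathway with more defaults and fewer agreements. We further examine a narrower mechanism: strict switches to specialist plaintiff-side counsel, defined as a plaintiff changing to one of the prior year's top ten attorneys. Filing counts rise around the switch and pre-trends are not flat, indicating organizational reconfiguration rather than a purely exogenous shock. Within-plaintiff and within-plaintiff-property comparisons yield more stable estimates: judgment by agreement, fee share, waiver language, and corrected lockout-trigger language decline, while deadline language rises. We interpret eviction as a layered upstream process in which concentrated counsel, repeated places, and recurring tenants generate filings before any courtroom bargaining or adjudication takes place.

English abstract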

We analyze the filing-side legal infrastructure of eviction using 755,004 Philadelphia Municipal Court landlord-tenant records filed between 1969 and 2022, of which 747,125 are residential. Eviction in Philadelphia is organized upstream by a concentrated plaintiff-side bar, durable plaintiff-attorney dependence, repeated use of the same properties, and recurring tenant-name exposure. Between 1983 and 2022, the ten most active plaintiff attorneys handled 82.2% of represented plaintiff-side cases per year on average, compared with 14.8% for the ten most active plaintiffs. Large plaintiffs depend heavily on a single attorney: among plaintiffs filing at least 101 cases, 78.3% of each plaintiff's filings are handled by that plaintiff's most-used attorney, on average. Repetition is likewise central to the docket. Across the residential filing universe, 48.8% of cases occur at addresses with a prior filing in the preceding year, and 23.6% at addresses with six or more prior filings; these repeats are usually filed by the same plaintiff and follow a more default-heavy, less agreement-heavy pathway. We further examine a narrower mechanism: strict switches into specialist plaintiff-side counsel, defined as a plaintiff changing attorney to one in the prior-year top ten. Filing counts rise around the switch with non-flat pre-trends, indicating organizational reconfiguration rather than a clean exogenous shock. Within-plaintiff and within-plaintiff-property comparisons yield more stable estimates: judgment by agreement, fee share, waiver language, and corrected lockout-trigger language decline, while deadline language rises. We interpret eviction as a layered upstream process in which concentrated counsel, repeated places, and recurring tenants produce filings before any courtroom bargaining or adjudication occurs.
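
A concentration figure such as "the ten most active attorneys handled 82.2% of cases per year on average" can be computed as a top-k share within each year, averaged across years. The sketch below assumes a hypothetical record layout of `(year, attorney)` pairs; the helper names are illustrative and are not taken from the paper, whose exact data cleaning and definition of represented cases are not reproduced here.

```python
from collections import Counter


def top_k_share(labels, k):
    """Fraction of records that belong to the k most frequent labels."""
    counts = Counter(labels)
    top = sum(count for _, count in counts.most_common(k))
    return top / len(labels)


def mean_yearly_top_k_share(records, k):
    """Average over years of the within-year top-k share.

    `records` is an iterable of (year, label) pairs (a hypothetical layout).
    """
    by_year = {}
    for year, label in records:
        by_year.setdefault(year, []).append(label)
    shares = [top_k_share(labels, k) for labels in by_year.values()]
    return sum(shares) / len(shares)


# Toy data: attorney "a" files half of the cases in each of two years.
records = [(2000, "a"), (2000, "a"), (2000, "b"), (2000, "c"),
           (2001, "a"), (2001, "b")]
share = mean_yearly_top_k_share(records, 1)
```

Averaging yearly shares weights every year equally, which differs from pooling all years into a single share when filing volumes change over time.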


2604.20985 2026-05-21 cs.LG cs.AI cs.CR stat.ML

Differentially Private Model Merging

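Differentially Private Model Merging

Qichuan Yin, Manzil Zaheer, Tian Li

AI summary: This paper proposes two post-processing techniques, random selection and linear combination, that produce a final private model meeting any target differential privacy requirement without additional training, and it analyzes their privacy/utility tradeoffs on general problems and on private mean estimation.

AI Chinese abstract (translated)

In machine learning, privacy requirements at inference or deployment time often evolve as policies, regulations, or user preferences change. In this paper, we aim to construct a set of models that satisfy any target differential privacy (DP) requirement without additional training, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post-processing techniques, random selection and linear combination, to produce a final private model satisfying any target privacy parameter. We provide privacy accounting for these approaches through the lens of Rényi DP and privacy loss distributions on general problems, together with a precise analysis of the privacy/utility tradeoff in private mean estimation, and we compare the two mechanisms. Empirically, we demonstrate the effectiveness of our approaches and validate our analyses on several models and on both synthetic and real-world datasets.

English abstract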

In machine learning, privacy requirements at inference or deployment time often evolve due to changing policies, regulations, or user preferences. In this work, we aim to construct a multitude of models to satisfy any target differential privacy (DP) requirement without additional training, given a set of existing models trained on the same dataset with different privacy/utility tradeoffs. We propose two post-processing techniques, namely random selection and linear combination, to generate final private models satisfying any target privacy parameter. We provide privacy accounting of these approaches from the lens of Rényi DP and privacy loss distributions on general problems, as well as on private mean estimation, where we precisely characterize the privacy/utility tradeoffs and compare the two mechanisms. Empirically, we demonstrate the effectiveness of our approaches and validate our analyses on several models and both synthetic and real-world datasets.
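
The two post-processing operations named in the abstract are mechanically simple, and a minimal sketch may help fix ideas. It assumes each model is a flat list of parameters; the names `random_selection` and `linear_combination` are illustrative only. The sketch performs no privacy accounting: which selection probabilities or combination weights achieve a given target privacy parameter is the subject of the paper's analysis and is not reproduced here.

```python
import random


def random_selection(models, probs, rng):
    """Release one of the existing models, chosen with the given probabilities."""
    return rng.choices(models, weights=probs, k=1)[0]


def linear_combination(models, weights):
    """Release a weighted sum of the models' parameter vectors."""
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(dim)]


# Two toy "models" trained at different privacy levels (hypothetical values).
models = [[0.0, 2.0], [4.0, 6.0]]
picked = random_selection(models, [1.0, 0.0], random.Random(0))
merged = linear_combination(models, [0.5, 0.5])
```

Both operations only touch already-trained models, which is why no additional training is required.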


2603.26184 2026-05-21 stat.AP

Why decision curves go above or below treat-all and treat-none: a PPV- and calibration-based guide for clinical prediction models

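Why decision curves go above or below treat-all and treat-none: a PPV- and calibration-based guide for clinical prediction models

Linard Hoessly

AI summary: This paper links decision-curve performance to calibration through threshold-specific observed risk, and proposes PPV curves as a practical complement to decision curves.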
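
The link between decision curves and PPV can be checked with the standard decision-curve definitions. The sketch below uses the usual net-benefit formula, NB = TP/n - (FP/n) * pt/(1 - pt) at threshold probability pt; under that formula a model lies above the treat-none line (NB > 0) exactly when its PPV among flagged patients exceeds pt. The function names are illustrative and the notation may differ from the paper's.

```python
def net_benefit(tp, fp, n, pt):
    """Standard net benefit of a model at threshold probability pt."""
    return tp / n - (fp / n) * pt / (1.0 - pt)


def treat_all_net_benefit(prevalence, pt):
    """Net benefit of treating everyone: all events are TP, all non-events FP."""
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


def ppv(tp, fp):
    """Positive predictive value among patients flagged at the threshold."""
    return tp / (tp + fp)


# Toy counts (hypothetical): 100 patients, 40 flagged at threshold 0.2.
nb_model = net_benefit(tp=30, fp=10, n=100, pt=0.2)
nb_all = treat_all_net_benefit(prevalence=0.4, pt=0.2)
above_treat_none = nb_model > 0.0
ppv_exceeds_threshold = ppv(30, 10) > 0.2
```

In this toy case the model's net benefit is about 0.275 against about 0.25 for treat-all, and both the net-benefit check and the PPV check agree that the model beats treat-none.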

2602.10989 2026-05-21 math.ST cs.IT cs.LG math.IT math.PR stat.ML stat.TH

Variational Optimality of Föllmer Processes in Generative Diffusions

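Variational Optimality of Föllmer Processes in Generative Diffusions

Yifan Chen, Eric Vanden-Eijnden

AI summary: This paper constructs and analyzes generative diffusions within the stochastic interpolant framework. The drift is estimated through a conditional expectation, and the variationally optimal choice of diffusion coefficient is shown to yield a Föllmer process, which minimizes relative entropy in path space. The resulting representation enables simulation-free estimation from data.

AI Chinese abstract (translated)

We construct and analyze generative diffusions that use the stochastic interpolant framework to transport a point mass to a prescribed target distribution over a finite time horizon. The drift is expressed as a conditional expectation that can be estimated from independent samples without simulating stochastic processes. We show that the diffusion coefficient can be tuned a posteriori without changing the time-marginal distributions. Among all such tunings, minimizing the impact of estimation error on the path-space Kullback-Leibler divergence selects, in closed form, a Föllmer process: a diffusion whose path measure minimizes relative entropy with respect to a reference process determined by the interpolation schedules. This yields a new variational characterization of Föllmer processes that complements the classical Schrödinger-bridge and stochastic-control formulations, and it provides a conditional-expectation representation of the Föllmer drift that enables simulation-free estimation from data. We further show that, under the optimal diffusion coefficient, the path-space Kullback-Leibler divergence is independent of the interpolation schedule, so that different schedules are statistically equivalent in this variational sense. Numerical experiments illustrate the impact of the path-space variational optimality of Föllmer processes in probabilistic forecasting and data assimilation.

English abstract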

We construct and analyze generative diffusions that transport a point mass to a prescribed target distribution over a finite time horizon using the stochastic interpolant framework. The drift is expressed as a conditional expectation that can be estimated from independent samples without simulating stochastic processes. We show that the diffusion coefficient can be tuned a posteriori without changing the time-marginal distributions. Among all such tunings, we prove that minimizing the impact of estimation error on the path-space Kullback--Leibler divergence selects, in closed form, a Föllmer process -- a diffusion whose path measure minimizes relative entropy with respect to a reference process determined by the interpolation schedules alone. This yields a new variational characterization of Föllmer processes, complementing classical formulations via Schrödinger bridges and stochastic control, and provides a conditional-expectation representation of the Föllmer drift that enables simulation-free estimation from data. We further establish that, under this optimal diffusion coefficient, the path-space Kullback--Leibler divergence becomes independent of the interpolation schedule, rendering different schedules statistically equivalent in this variational sense. We provide numerical experiments to illustrate the impact of path-space variational optimality of Föllmer's processes in probabilistic forecasting and data assimilation applications.


2602.04907 2026-05-21 cs.LG cs.AI stat.ME

Causal Discovery from Heteroscedastic Stochastic Dynamical Systems under Imperfect Physical Models

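Causal Discovery from Heteroscedastic Stochastic Dynamical Systems under Imperfect Physical Models

Jianhong Chen, Naichen Shi, Xubo Yue

AI summary: This paper proposes an integrative causal discovery framework that uses partial physical knowledge, encoded in stochastic differential equations, to improve the recovery of causal graphs in dynamical systems, and it analyzes robustness under imperfect physical models.

Comments: 101 pages

AI Chinese abstract (translated)

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models such as ordinary differential equations (ODEs) provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes the known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator, together with a theoretically justified stabilization technique that improves the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze the robustness of our causal graph estimate to ODE misspecification and clarify how the stabilization technique balances numerical stability against statistical recoverability. Experiments on linear SDEs and on nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show better graph recovery and robustness than data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.

English abstract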

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models, such as ordinary differential equations (ODEs), provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework for dynamical systems that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator with a theoretically justified stabilization technique to improve the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze robustness of our causal graph estimate to ODE misspecification and clarify how the introduced stabilization technique balances numerical stability and statistical recoverability. Experiments on linear SDEs and nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show improved graph recovery and robustness over data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.


2602.04092 2026-05-21 stat.AP econ.EM stat.ME

Time-to-Event Estimation with Unreliably Reported Events in Medicare Health Plan Payment

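Time-to-Event Estimation with Unreliably Reported Events in Medicare Health Plan Payment

Oana M. Enache, Sherri Rose

AI summary: This paper proposes time-to-event estimators for evaluating incident diagnostic coding and possible upcoding in Medicare, and introduces an open-source software package to support more reproducible methods development related to Medicare billing behavior.

Comments: 44 pages, 10 figures

AI Chinese abstract (translated)

OBJECTIVE: To propose time-to-event estimators that help evaluate incident diagnostic coding and possible upcoding in Medicare, and to introduce an open-source software package that supports more reproducible methods development related to Medicare billing behavior. STUDY SETTING AND DESIGN: Observational analysis of simulated upcoding based on coding by insurers or providers, which may be incentivized by Medicare Advantage risk adjustment. DATA SOURCES AND ANALYTIC SAMPLE: Two years of separately simulated incident health condition coding data for a Medicare Advantage population and a Traditional Medicare population, with coding patterns aligned with known practices in each program. PRINCIPAL FINDINGS: We propose several novel time-to-event estimators of incident coding intensity and possible upcoding in Medicare Advantage, including estimators that account for unreliable reporting. We demonstrate estimator performance in simulated data that leverage the National Institutes of Health's All of Us study, and we develop an open-source R package that simulates realistic, labeled, longitudinal upcoding data, which were not previously available to researchers. In simulations, our novel estimators recovered differences in upcoding within and across monitoring periods. Undercoding had a limited effect on our novel estimators, whereas an existing estimator was more sensitive to it. CONCLUSIONS: Our proposed estimators can help researchers and policymakers track new coding behaviors (for example, those incentivized by updates to the risk adjustment formula) earlier and at scale while accounting for several real-world data considerations. In addition, the R package we provide can be used to improve the development, accessibility, and reproducible evaluation of coding intensity and upcoding methodology.

English abstract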

OBJECTIVE: To propose time-to-event estimators that help evaluate incident diagnostic coding and possible upcoding in Medicare as well as introduce an open-source software package that enables more reproducible methods development relevant to Medicare billing behavior. STUDY SETTING AND DESIGN: Observational analysis of simulated upcoding based on coding by insurers or providers that may be incentivized by Medicare Advantage risk adjustment. DATA SOURCES AND ANALYTIC SAMPLE: Two years of separately simulated incident health condition coding data for a Medicare Advantage population and a Traditional Medicare population where coding patterns are aligned with known practices in each program. PRINCIPAL FINDINGS: We propose several novel time-to-event estimators of incident coding intensity and possible upcoding in Medicare Advantage, including accounting for unreliable reporting. We demonstrate estimator performance in simulated data leveraging the National Institutes of Health's All of Us study and also develop an open source R package to simulate longitudinal realistic labeled upcoding data, which were not previously available for researchers. In simulations, our novel estimators recovered differences in upcoding within and across monitoring periods. Undercoding had a limited effect on our novel estimators while an existing estimator was more sensitive to undercoding. CONCLUSIONS: Our proposed estimators can help researchers and policymakers track new coding behaviors (e.g., as may be incentivized by risk adjustment formula updates) earlier and at scale while accounting for several real-world data considerations. Further, the R package we provide can be used to improve the development, accessibility, and reproducible evaluation of coding intensity and upcoding methodology.


2601.14991 2026-05-21 stat.ME stat.ML

Consistency of Honest Decision Trees and Random Forests

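Consistency of Honest Decision Trees and Random Forests

Martin Bladt, Rasmus Frigaard Lemvig

AI summary: This paper studies several types of consistency for honest decision trees and random forests in the regression setting. Using elementary proofs that follow the classical arguments for smoothing methods, it establishes weak consistency and almost sure convergence of honest trees and honest forest averages to the true regression function, and obtains uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling and a two-stage bootstrap sampling scheme, simplifying existing analyses and recovering several known results.

AI Chinese abstract (translated)

We study various types of consistency of honest decision trees and random forests in the regression setting. In contrast to the related literature, our proofs are elementary and follow the classical arguments used for smoothing methods. Under mild regularity conditions on the regression function and the data distribution, we establish weak consistency and almost sure convergence of honest trees and honest forest averages to the true regression function, and we also obtain uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling as well as a two-stage bootstrap sampling scheme. Our treatment synthesizes and simplifies existing analyses, in particular recovering several results as special cases. The elementary nature of the arguments clarifies the close relationship between data-adaptive partitioning and kernel-type methods, and offers an accessible route to understanding the asymptotic behavior of tree-based methods.

English abstract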

We study various types of consistency of honest decision trees and random forests in the regression setting. In contrast to related literature, our proofs are elementary and follow the classical arguments used for smoothing methods. Under mild regularity conditions on the regression function and data distribution, we establish weak and almost sure convergence of honest trees and honest forest averages to the true regression function, and moreover we obtain uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling and also a two-stage bootstrap sampling scheme. Our treatment synthesizes and simplifies existing analyses, in particular recovering several results as special cases. The elementary nature of the arguments clarifies the close relationship between data-adaptive partitioning and kernel-type methods, providing an accessible approach to understanding the asymptotic behavior of tree-based methods.
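
"Honesty" here refers to the common convention that the data used to choose a tree's splits are not reused to estimate the leaf values. The sketch below shows this for a one-split tree (a stump) on a single covariate. It is an assumption-laden illustration: the alternating sample split, the squared-error criterion, and the name `fit_honest_stump` are choices made for the example and are not taken from the paper.

```python
def _sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _mean(values):
    return sum(values) / len(values) if values else None


def fit_honest_stump(xs, ys):
    """Pick the split on one half of the data, estimate leaf means on the other."""
    data = list(zip(xs, ys))
    structure, estimation = data[0::2], data[1::2]
    cuts = sorted({x for x, _ in structure})
    if len(cuts) < 2:
        raise ValueError("need at least two distinct x values in the structure sample")
    best_cost, best_t = None, None
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring x values
        left = [y for x, y in structure if x <= t]
        right = [y for x, y in structure if x > t]
        cost = _sse(left) + _sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    # Honest step: leaf values come only from the held-out estimation sample.
    left = [y for x, y in estimation if x <= best_t]
    right = [y for x, y in estimation if x > best_t]
    return best_t, _mean(left), _mean(right)


threshold, left_mean, right_mean = fit_honest_stump(
    [0, 1, 2, 3, 4, 5, 6, 7], [0, 0, 0, 0, 10, 10, 10, 10]
)
```

Because the leaf averages are computed on data that played no role in choosing the partition, each leaf estimate behaves like a local average over a cell, which is the connection to kernel-type smoothing that the abstract mentions.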


2601.07169 2026-05-21 math.PR cond-mat.stat-mech cs.DM math.CO math.ST stat.TH

Approximate FKG inequalities for phase-bound spin systems, with applications to central limit theorems for exponential random graphs

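Approximate FKG inequalities for phase-bound spin systems, with applications to central limit theorems for exponential random graphs

Satyaki Mukherjee, Vilas Winstein

AI summary: This paper studies approximate FKG inequalities for phase-bound spin systems. It proves that, in the phase-coexistence regime, each phase internally satisfies an approximate FKG inequality, and it uses this result to complete the proofs of central limit theorems within individual phases, answering a question posed by Bianchi et al.

Comments: 28 pages, 1 figure. Title, abstract, and introduction updated to clarify the focus of the article

AI Chinese abstract (translated)

The Fortuin-Kasteleyn-Ginibre (FKG) inequality is an important tool for monotone spin systems satisfying the FKG lattice condition, where it provides positive correlations for all coordinate-wise increasing functions of the spins. The inequality plays an important role in the proofs of various central limit theorems (CLTs), including recent work on ferromagnetic exponential random graph models (ERGMs), in which a Hamiltonian tilt promotes the presence of small subgraphs such as triangles. However, the FKG lattice condition fails in the low-temperature parameter regime when the spin system is confined to a particular phase. It is therefore unclear whether each phase internally has positive correlations for increasing functions, or whether the positive correlations in the overall model (a mixture of phases) arise mainly from the global choice of phase. In this article, we show that the individual phases in ERGMs do satisfy an approximate FKG inequality. We use this result to complete the proofs of CLTs within each individual phase, answering a question posed by Bianchi, Collet, and Magnanini. We present the FKG inequality for ERGMs as a consequence of a more general result that holds under certain inputs related to metastable mixing; we expect this general result to be widely applicable, and we devote a section to the details of its application to a class of generalized higher-order ferromagnetic Curie-Weiss models, where the required inputs are relatively transparent.

English abstract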

The Fortuin-Kasteleyn-Ginibre (FKG) inequality is an invaluable tool in monotone spin systems satisfying the FKG lattice condition, which provides positive correlations for all coordinate-wise increasing functions of spins. This inequality has numerous applications and plays an integral role in the proof of various central limit theorems (CLTs), including recent work on ferromagnetic exponential random graph models (ERGMs) wherein a Hamiltonian tilt promotes the presence of small subgraphs like triangles. However, the FKG lattice condition fails to hold when confining a spin system to a particular phase in the low-temperature regime of parameters. Thus it is not a priori clear if each phase internally has positive correlations for increasing functions, or if the positive correlations in the overall model (which is a mixture of phases) arise primarily from the global choice of phase. In this article, we show that the individual phases in ERGMs do indeed satisfy an approximate form of the FKG inequality internally. We use this to finish the proof of various CLTs within each individual phase in the phase-coexistence regime, answering a question posed by Bianchi, Collet, and Magnanini. We present the FKG inequality for ERGMs as a consequence of a more general result which holds under certain inputs related to metastable mixing; we expect this general result to be widely applicable, and we devote a section to spelling out the details of its application to a class of generalized higher-order ferromagnetic Curie-Weiss models where the necessary inputs are relatively transparent.


2511.23152 2026-05-21 cs.LG cond-mat.dis-nn math.OC math.RT stat.ML

A Differentiable Measure of Algebraic Complexity: Provably Exact Discovery of Group Structures

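A Differentiable Measure of Algebraic Complexity: Provably Exact Discovery of Group Structures

Dongsung Huh, Lior Horesh, Halyun Jeong

AI summary: This paper proposes a differentiable measure of algebraic complexity. Working with the Cayley-table completion problem, it proves that HyperCube, an operator-valued tensor factorization, discovers group structures exactly, resolving the central open conjecture of Huh (2025).

Comments: 29 pages, 3 figures. All theoretical conjectures are formally proven as theorems and verified in Lean 4. v4: Minor typographical corrections

AI Chinese abstract (translated)

Discovering discrete algebraic rules from data is a fundamental challenge in machine learning. We formalize this problem through Cayley-table completion, an algebraic counterpart to classical matrix completion, in which the degree of associativity violation replaces linear rank as the intrinsic measure of complexity. We carry out a rigorous landscape analysis of HyperCube, an operator-valued tensor factorization, on the fully observed target table δ, and prove that its global infimum H_inf(δ) := inf_{Θ∈F_δ} H(Θ) implicitly defines an exact differentiable measure of this complexity. We show that HyperCube's native objective H(Θ) decomposes into two components: geometric alignment (collinearity) and an inverse ℓ_2 penalty. We establish that these continuous variational pressures induce core discrete properties: collinearity enforces associativity (Collinearity-Associativity Equivalence), and within the collinear manifold the inverse ℓ_2 penalty reduces to an exact inverse rank penalty, driving the parameters toward full-rank unitarity. Consequently, we derive an absolute lower bound H(Θ) ≥ H_inf(δ) ≥ 3 |δ|, where |δ| is the size of the target table. We prove that this absolute floor is attained if and only if the target is isotopic to a group, and we characterize the global minimizer as the regular representation of the underlying group (up to unitary gauge), resolving the central open conjecture of Huh (2025). This work serves as an existence proof that certain discrete algebraic structures can be exactly characterized by differentiable measures, enabling gradient-based discovery without combinatorial search. All theoretical results are mechanically verified in Lean 4 and confirmed by small-scale experiments.

English abstract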

Discovering discrete algebraic rules from data is a fundamental challenge in machine learning. We formalize this problem through Cayley-table completion -- an algebraic counterpart to classical matrix completion -- where the degree of associativity violation replaces linear rank as the intrinsic measure of complexity. We provide a rigorous landscape analysis of HyperCube, an operator-valued tensor factorization, on the fully observed target table $δ$, proving that its global infimum $H_{\inf}(δ) := \inf_{Θ\in F_δ} H(Θ)$ implicitly defines an exact differentiable measure for this complexity. We show that HyperCube's native objective $H(Θ)$ decomposes into two components: geometric alignment (collinearity) and an inverse $\ell_2$ penalty. We establish that these continuous variational pressures induce core discrete properties: collinearity enforces associativity (Collinearity--Associativity Equivalence), and the inverse $\ell_2$ penalty reduces to an exact inverse rank penalty within the collinear manifold, driving the parameters toward full-rank unitarity. Consequently, we derive an absolute lower bound $H(Θ) \ge H_{\inf}(δ) \ge 3 \, |δ|$, where $|δ|$ is the target table size. We prove this absolute floor is attained if and only if the target is isotopic to a group, and characterize the global minimizer as the regular representation of the underlying group (up to unitary gauge), resolving the central open conjecture of Huh (2025). This work serves as an existence proof that certain discrete algebraic structures can be exactly characterized by differentiable measures, enabling gradient-based discovery without the need for combinatorial search. All theoretical results are mechanically verified in Lean 4 and confirmed via small-scale experiments.
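
To make "associativity violation" concrete, the sketch below counts the triples on which a finite operation table fails to associate. This raw discrete count is offered only as an illustration of the notion; it is not the differentiable measure H_inf(δ) studied in the paper, and the function name is hypothetical.

```python
def associativity_violations(table):
    """Count triples (a, b, c) with (a*b)*c != a*(b*c), where a*b = table[a][b]."""
    n = len(table)
    return sum(
        1
        for a in range(n)
        for b in range(n)
        for c in range(n)
        if table[table[a][b]][c] != table[a][table[b][c]]
    )


# Addition modulo 3 is a group operation, so it has no violations.
add_mod3 = [[(a + b) % 3 for b in range(3)] for a in range(3)]
# Subtraction modulo 3 is not associative: (a-b)-c differs from a-(b-c) unless c == 0.
sub_mod3 = [[(a - b) % 3 for b in range(3)] for a in range(3)]

violations_add = associativity_violations(add_mod3)
violations_sub = associativity_violations(sub_mod3)
```

A count like this is piecewise constant in the table entries, which is one reason a differentiable surrogate is attractive for gradient-based discovery.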


2511.21836 2026-05-21 stat.ME

A simple and powerful test of vaccine waning

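A simple and powerful test of vaccine waning

Gellért Perényi, Matias Janvin, Mats J. Stensrud

AI summary: This paper proposes a new statistical test that assesses whether a treatment effect stays constant over time at the individual level, which allows waning of vaccine efficacy to be detected more effectively, and it provides new results that bound the waning effect.

AI Chinese abstract (translated)

Determining whether vaccine efficacy wanes is important for individual and public decision making. Quantifying waning, however, is a subtle task. Classical approaches cannot be interpreted as measures of declining efficacy unless we impose unreasonable assumptions. Recently, formal causal estimands have been proposed to quantify vaccine waning. These estimands can be bounded under weaker assumptions, but the bounds are often too wide to support claims about the presence of waning. We propose a different approach: a formal test that assesses whether a treatment effect is constant over time at the individual level. The test offers a considerable gain in power over existing approaches and is valid under assumptions that are interpretable in vaccine trials. We illustrate the gain in power with real and simulated examples, using three different approaches to compute the test statistic. Two of these approaches rely solely on summary data that are available from existing clinical trials. Beyond our test, we also give new results that bound the waning effect. We use these methods to reanalyze data from a randomized controlled trial of the BNT162b2 COVID-19 vaccine. Although previous analyses did not establish waning, our test rejects the null hypothesis of no waning.

English abstract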

Determining whether vaccine efficacy wanes is important for individual and public decision making. Yet, quantification of waning is a subtle task. The classical approaches cannot be interpreted as measures of declining efficacy unless we impose unreasonable assumptions. Recently, formal causal estimands designed to quantify vaccine waning have been proposed. These estimands can be bounded under weaker assumptions, but the bounds are often too wide to make claims about the presence of waning. We propose a different approach: a formal test to assess whether a treatment effect is constant over time at the individual level. This test provides a considerable power gain over existing approaches and is valid under interpretable assumptions in vaccine trials. We illustrate the increase in power through real and simulated examples, using three different approaches to compute the test statistics. Two of these approaches are based solely on summary data, accessible from existing clinical trials. Beyond our test, we also give new results that bound the waning effect. We use our methods to reanalyze data from a randomized controlled trial of the BNT162b2 COVID-19 vaccine. While prior analysis did not establish waning, our test rejects the null hypothesis of no waning.


2511.21223 2026-05-21 stat.ML cs.LG

Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference

Jasraj Singh, Shelvia Wongso, Jeremie Houssineau, Badr-Eddine Chérief-Abdellatif

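AI summary: This paper proposes a variational inference method grounded in possibility theory. By establishing a maxitive Donsker-Varadhan formulation, it removes the reliance of conventional variational inference on additivity assumptions, and it proposes the CBOpt optimizer to improve performance on image classification tasks.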

Comments: 37 pages, 3 figures, 13 tables

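On Incorporating Non-Concurrent Controls in Platform Trials with Interim Analyses

Pavla Krotka, Martin Posch, Marta Bofill Roig

AI summary: This paper studies how to incorporate non-concurrent controls in platform trials that include interim analyses. It examines how non-concurrent controls affect treatment effect estimates and proposes a new estimator that reduces bias and type I error inflation.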

DOI: 10.1002/sim.70585
Journal ref: Statistics in Medicine (2026)

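AI Chinese abstract (translated)

Using non-concurrent controls can enhance the analysis of platform trials. Because including such data may introduce bias when time trends are present, methods that adjust for time have been proposed. So far, however, the behavior of these methods has not been systematically studied in platform trials that include interim analyses. To evaluate the impact of an interim analysis in trials that use non-concurrent controls, we consider a platform trial with two experimental arms and a shared control, in which the second experimental arm enters later. We focus on a frequentist regression model that uses non-concurrent controls to estimate the treatment effect of the second arm and adjusts for time with a step function to account for temporal changes. We show that an interim analysis in the first arm may bias the estimate of the effect in the second arm if the regression model is used without adjustment, and we investigate how the marginal bias, and the bias conditional on the first arm continuing after the interim analysis, depend on different trial design parameters. In addition, we propose a new estimator of the treatment effect in the second arm that aims to eliminate the bias introduced by both the interim analysis in the first arm and the time trends, and we evaluate its performance in a simulation study. The new estimator is shown to substantially reduce bias and type I error rate inflation while achieving higher power than an analysis that uses only concurrent controls.

English abstract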

The analysis of platform trials can be enhanced by utilizing non-concurrent controls. Since including this data might also introduce bias in the treatment effect estimators if time trends are present, methods for incorporating non-concurrent controls adjusting for time have been proposed. However, so far their behavior has not been systematically investigated in platform trials that include interim analyses. To evaluate the impact of an interim analysis in trials utilizing non-concurrent controls, we consider a platform trial featuring two experimental arms and a shared control, with the second experimental arm entering later. We focus on a frequentist regression model that uses non-concurrent controls to estimate the treatment effect of the second arm and adjusts for time using a step function to account for temporal changes. We show that performing an interim analysis in Arm 1 may introduce bias in the point estimation of the effect in Arm 2, if the regression model is used without adjustment, and investigate how the marginal bias and bias conditional on the first arm continuing after the interim depend on different trial design parameters. Moreover, we propose a new estimator of the treatment effect in Arm 2, aiming to eliminate the bias introduced by both the interim analysis in Arm 1 and the time trends, and evaluate its performance in a simulation study. The newly proposed estimator is shown to substantially reduce the bias and type I error rate inflation while leading to power gains compared to an analysis using only concurrent controls.


2505.24275 2026-05-21 cs.LG math.OC stat.ML

GradPower: Powering Gradients for Faster Language Model Pre-Training

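GradPower: Powering Gradients for Faster Language Model Pre-Training

Jinbo Wang, Mingze Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu

AI summary: This paper proposes GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. An elementwise sign-power transformation is applied to the gradient before it is passed to a base optimizer, with no change to the optimizer's internal logic or hyperparameters, and this yields lower terminal loss across a range of architectures, parameter scales, datasets, and learning-rate schedules.

Comments: 24 pages, accepted by ICML 2026

AI Chinese abstract (translated)

We propose GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. Given a gradient vector $g=(g_i)_i$, GradPower first applies the elementwise sign-power transformation $φ_p(g)=({\rm sign}(g_i)|g_i|^p)_{i}$ for a fixed $p>0$, and then feeds the transformed gradient into a base optimizer. Notably, GradPower requires only a single-line code change and no modification of the base optimizer's internal logic, including its hyperparameters. When applied to Adam (termed AdamPower), GradPower consistently achieves lower terminal loss across diverse architectures (LLaMA, Qwen2MoE), parameter scales (66M to 2B), datasets (C4, OpenWebText), and learning-rate schedules (cosine, warmup-stable-decay). The most pronounced gains appear when training modern mixture-of-experts models with warmup-stable-decay schedules. GradPower also integrates seamlessly with other state-of-the-art optimizers, such as Muon, yielding further improvements. Finally, we provide theoretical analyses that reveal the underlying mechanism of GradPower and highlight the influence of gradient noise.

English abstract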

We propose GradPower, a lightweight gradient-transformation technique for accelerating language model pre-training. Given a gradient vector $g=(g_i)_i$, GradPower first applies the elementwise sign-power transformation: $φ_p(g)=({\rm sign}(g_i)|g_i|^p)_{i}$ for a fixed $p>0$, and then feeds the transformed gradient into a base optimizer. Notably, GradPower requires only a single-line code change and no modifications to the base optimizer's internal logic, including the hyperparameters. When applied to Adam (termed AdamPower), GradPower consistently achieves lower terminal loss across diverse architectures (LLaMA, Qwen2MoE), parameter scales (66M to 2B), datasets (C4, OpenWebText), and learning-rate schedules (cosine, warmup-stable-decay). The most pronounced gains are observed when training modern mixture-of-experts models with warmup-stable-decay schedules. GradPower also integrates seamlessly with other state-of-the-art optimizers, such as Muon, yielding further improvements. Finally, we provide theoretical analyses that reveal the underlying mechanism of GradPower and highlight the influence of gradient noise.
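
The sign-power transformation in the abstract is short enough to write out directly. The sketch below implements $φ_p(g)$ on a plain Python list and feeds the result into a bare gradient-descent step; the descent step is a stand-in chosen for brevity, not the Adam or Muon base optimizers used in the paper, and the function names and the values of `lr` and `p` are illustrative assumptions.

```python
import math


def sign_power(grad, p):
    """Elementwise sign-power transform: sign(g_i) * |g_i| ** p, for p > 0."""
    return [math.copysign(abs(g) ** p, g) for g in grad]


def sgd_step(params, grad, lr, p):
    """One plain gradient-descent step on the transformed gradient.

    Plain SGD is a simplification for illustration; any base optimizer
    could consume the transformed gradient in the same way.
    """
    transformed = sign_power(grad, p)
    return [w - lr * t for w, t in zip(params, transformed)]


transformed = sign_power([-4.0, 0.0, 9.0], 0.5)
updated = sgd_step([1.0, 1.0], [-4.0, 9.0], lr=0.1, p=0.5)
```

With p = 1 the transform is the identity, so the base optimizer's behaviour is recovered; with p < 1 large entries are compressed relative to small ones, and with p > 1 they are amplified.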


2504.05431 2026-05-21 stat.ME math.ST stat.TH

A Generalized Tangent Approximation based Variational Inference Framework for Strongly Super-Gaussian Likelihoods

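A Generalized Tangent Approximation based Variational Inference Framework for Strongly Super-Gaussian Likelihoods

Somjit Roy, Pritam Dey, Debdeep Pati, Bani K. Mallick

AI summary: This paper proposes a variational framework based on tangent transformations for probabilistic models with strongly super-Gaussian likelihoods. Convex duality is used to construct a tangent lower bound on the log-likelihood, which yields conjugacy with Gaussian priors on the model parameters in otherwise intractable settings. Convergence guarantees for the algorithm are established under mild assumptions on the data-generating mechanism, and near-optimal variational risk bounds are derived.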

Comments: 135 pages, 51 figures, 13 tables, Revision Submitted

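The Score-Difference Flow for Implicit Generative Modeling

Romann M. Weber

AI summary: This paper proposes the score-difference flow as a new approach to implicit generative modeling. The flow optimally reduces the KL divergence between two distributions, is shown to be equivalent to denoising diffusion models under certain conditions, and is connected to a data-optimization sub-problem hidden in the training of generative adversarial networks.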

Journal ref: Transactions on Machine Learning Research (7/2023)
Comments: 25 pages, 5 figures, 4 tables. Updated final version of a paper originally published in Transactions on Machine Learning Research (TMLR), including minor typographical corrections and post-publication commentary connecting the SD flow to drifting models

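AI Chinese abstract (translated)

Implicit generative modeling (IGM) aims to produce samples of synthetic data that match the characteristics of a target data distribution. Recent work (e.g., score-matching networks, diffusion models) approaches the problem by pushing synthetic source data toward the target distribution through dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the KL divergence between them. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We show that, under certain conditions, this formulation is formally equivalent to denoising diffusion models. We also show that the training of generative adversarial networks contains a hidden data-optimization sub-problem which, when the discriminator is optimal, induces the SD flow under certain choices of loss function. The SD flow therefore provides a theoretical link between three model classes that individually address the challenges of the generative modeling trilemma (high sample quality, mode coverage, and fast sampling), laying the groundwork for a unified approach.

English abstract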

Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
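
The score difference is easiest to see when both distributions are one-dimensional Gaussians with known parameters, because the score of N(mean, var) at x is -(x - mean)/var in closed form. The sketch below makes exactly that assumption. In practice the scores would have to be estimated and the source distribution re-estimated as the particles move, which this single illustrative step ignores; the function names are hypothetical.

```python
def gaussian_score(x, mean, var):
    """Score (gradient of the log density) of a 1-D Gaussian at x."""
    return -(x - mean) / var


def score_difference(x, target, source):
    """Target score minus source score; target and source are (mean, var) pairs."""
    return gaussian_score(x, *target) - gaussian_score(x, *source)


def sd_flow_step(particles, target, source, step):
    """Move each particle a small step along the score difference."""
    return [x + step * score_difference(x, target, source) for x in particles]


# Source N(0, 1) and target N(1, 1): the score difference equals 1 everywhere.
sd_at_zero = score_difference(0.0, (1.0, 1.0), (0.0, 1.0))
moved = sd_flow_step([0.0, 2.0], (1.0, 1.0), (0.0, 1.0), step=0.5)
```

With equal variances the score difference is the constant (target mean - source mean)/var, so every particle is shifted toward the target by the same amount, matching the intuition of transporting the source onto the target.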


1908.05972 2026-05-21 cs.LG stat.ML

AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

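AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier

AI summary: This paper improves and validates an earlier approach that predicts safety outcomes from attributes with machine learning. Attributes are extracted with NLP, and models are trained to predict injury severity, injury type, body part impacted, and incident type. Independent human annotations remove potential artificial correlation, and the attributes remain highly predictive. The study also introduces a larger dataset, new models, model stacking, and more appropriate evaluation metrics, and it succeeds in predicting injury severity.

Journal ref: Automation in Construction 118 (2020): 103146
Comments: Added author contributions and journal reference, updated corresponding author, fixed a few typos

AI Chinese abstract (translated)

This paper significantly improves on, and validates, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports and train machine learning models to make predictions. The safety outcomes predicted here are injury severity, injury type, body part impacted, and incident type. Unlike in the original study, the safety outcomes are not extracted via NLP but are provided by independent human annotations, which eliminates possible artificial correlation between predictors and predictands. The results show that the attributes remain highly predictive, confirming the validity of the original approach. Other improvements in the current study include the use of (1) a much larger dataset of more than 90,000 reports, (2) two new models, XGBoost and linear Support Vector Machines (SVM), (3) model stacking, (4) a simpler experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.

English abstract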

This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. However, unlike in the original study, safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machines), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.


2605.20534 2026-05-21 cs.LG cs.AI stat.ML

Axiomatizing Neural Networks via Pursuit of Subspaces

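Mehmet Yamac, Mert Duman, Ugur Akpinar, Felix Rojas Casadiego, Serkan Kiranyaz, Marcel van Gerven, Moncef Gabbouj

AI summary: This paper proposes the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework built on geometric postulates that describes neural network behavior and gives a unified view of representation, computation, and generalization in shallow and deep architectures.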

Comments: 43 pages, 25 figures. Code and additional materials will be released

Abstract

While deep neural networks have achieved remarkable success across a wide range of domains, their underlying mechanisms remain poorly understood, and they are often regarded as black boxes. This gap between empirical performance and theoretical understanding poses a challenge analogous to the pre-axiomatic stage of classical geometry. In this work, we introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates. These axioms, together with their derived consequences, provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures. We show that this framework yields geometric explanations for fundamental questions in deep learning, including representation structure, architectural mechanisms, and generalization behavior, offering a principled step toward a coherent theoretical foundation.


2605.20508 2026-05-21 stat.ME astro-ph.HE astro-ph.IM physics.data-an stat.AP

Compensator-Based Inference for Signal Detection Under Unknown Background

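Aritra Banerjee, Sara Algeri

AI summary: This paper proposes a signal detection method that estimates compensator parameters instead of the background distribution, which simplifies inference and propagates uncertainty more effectively.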

2605.20502 2026-05-21 cs.LG cs.AI cs.CV stat.AP stat.ML

Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection

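Neelkamal Bhuyan

AI summary: This paper fuses per-encoder representation-space diffusion models for out-of-distribution (OOD) detection. It statistically identifies each encoder's sensitivity to specific shift types and introduces the EncMin2L gate, which needs no OOD labels, comes at lower parameter cost, and reaches an AUROC of at least 0.94 on all four shift types.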

Comments: 14 pages

Abstract

We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts -- global domain changes, semantic divergence, texture differences, and covariate corruptions -- through a multi-encoder fusion of per-encoder representation-space diffusion models (RDMs). We statistically identify each encoder's sensitivity to specific shift types from ID data alone and introduce EncMin2L -- an encoder-agnostic two-level $\min(\cdot)$-gate that combines and calibrates per-encoder diffusion-based likelihood detectors without OOD labels, outperforming monolithic multi-encoder baselines at $2.3\times$ lower parameter cost. Two ID-data diagnostics: $η^2$ (class-conditional F-test) and $Δμ$ (log-likelihood shift under synthetic corruptions) -- quantify encoder specialization, while a Tippett minimum $p$-value combination aggregates per-encoder scores into a single, calibration-stable OOD signal. EncMin2L achieves $\geq 0.94$ AUROC across all four shift types simultaneously, outperforming the state-of-the-art representation-space diffusion OOD detectors across overlapping benchmarks.
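The Tippett minimum p-value combination mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes independent p-values, one per encoder; how the paper turns diffusion likelihoods into p-values is not stated here, so the encoder names, the numbers, and the 5% level below are hypothetical.

```python
def tippett_combine(p_values):
    """Tippett's minimum-p combination for independent p-values.

    Under the global null, the smallest of k independent uniform p-values
    has CDF 1 - (1 - t)^k, which gives the combined p-value below.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k


# Hypothetical per-encoder p-values for one test input (illustrative numbers only).
encoder_p_values = {"encoder_a": 0.40, "encoder_b": 0.01, "encoder_c": 0.75}
combined_p = tippett_combine(list(encoder_p_values.values()))
is_ood = combined_p < 0.05  # assumed significance level, not taken from the paper
print(f"combined p-value: {combined_p:.6f}, flagged as OOD: {is_ood}")
```

Because the rule depends only on the smallest p-value, a single encoder that is sensitive to a given shift type can flag the input even when the other encoders see nothing unusual.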


2605.20494 2026-05-21 cs.LG physics.ao-ph stat.AP

A 10,000-Year Global Stochastic Tropical Cyclone Catalog with Wind-Dependent Track Transitions (WHITS)

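Jennifer Nakamura, Upmanu Lall

AI summary: This paper presents WHITS, a non-parametric semi-Markov track generator that produces a 10,000-year global synthetic tropical cyclone catalog, intended to make insured-loss assessment more reliable.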

Abstract

Reliable assessment of tropical cyclone (TC) risk is limited by the brevity and spatial sparsity of the historical record, particularly for the rare, high-intensity landfalls that dominate insured loss. We present WHITS (Wind-focused Hurricane Interactive Track Simulator), a non-parametric semi-Markov track generator that extends the HITS framework of Nakamura et al. (2015) in three ways: transitions between historical track segments are conditioned on local wind speed in addition to position, age, and forward vector; the kernel selection on the comparative-vector term is sharpened to suppress dynamically inconsistent jumps; and a short smoothing window is applied across each transition to remove the position and wind discontinuities reported by downstream surge users. WHITS is fit to the full available best-track record in each of six basins in IBTrACS, extending in the North Atlantic to 1851 and in other basins to the earliest year of reliable best-track data. The resulting 10,000-yr global synthetic catalog reproduces observed track density and the annual hurricane/typhoon-force wind-hit probability across all basins. The catalog is intended for catastrophe-risk applications where a large, low-bias sample of physically plausible tracks is more useful than a small, statistically corrected one.


2605.20434 2026-05-21 stat.ML cs.DM cs.LG

Contradiction Graphs Determine VC Dimension

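Jesse Campbell, Daniel Ibaibarriaga, Lev Reyzin

AI summary: This paper studies the contradiction graphs of binary concept classes. By analyzing the structure of the contradiction graph, it identifies thresholds that determine the VC dimension, which allows computing the VC dimension exactly and distinguishing finite from infinite VC dimension.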

2605.20429 2026-05-21 stat.AP

Design and Validation of a Grid-based Home Detection via Stay-Time (GHOST) Software for Mobile Location Data

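Alessandra Recalde, Mustafa Sameen, Xiaojian Zhang, Xilei Zhao

AI summary: This study proposes GHOST, a grid-based home detection algorithm driven by stay time. It infers proxy home locations by identifying the most frequently visited nighttime or weekend-daytime grid cells under customizable spatial and temporal filters, and it validates robustness to noisy data on a large-scale dataset.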

Abstract

Accurately detecting home locations from GPS data generated by mobile devices is a foundational step in human mobility research, with significant implications for transportation planning, public health, and emergency response. However, existing home detection algorithms often produce unreliable results for noisy real-world data and are barely validated due to a lack of ground-truth benchmarks. To tackle these limitations, this study presents the development and validation of a Grid-based home detection via Stay-Time (GHOST) algorithm, implemented as an open-source Python package. The algorithm infers proxy home locations by identifying the most frequently visited nighttime or weekend daytime grid cells based on customizable spatial and temporal filters. To validate its performance, we use the large-scale BostonWalks dataset, which includes over 155,000 trips from 377 participants in the Boston metropolitan area, to test robustness to noisy data. Additionally, we collected a ground-truth dataset for ten volunteers across different regions in the U.S., including Florida, Mississippi, and Colorado, along with their self-reported home coordinates, to evaluate GHOST across diverse mobility patterns and sampling conditions. We compared GHOST accuracy to that of 5 well-established home detection algorithms: All-time clustering method, Stay-point method, DBSCAN, K-MEANS++, and SciKit-Mobility Home Detection, across multiple parameter settings. Results show that GHOST outperforms all algorithms in accuracy and robustness, with average errors as low as 22.3 meters under optimal configurations. Our findings highlight the high accuracy and flexibility of our algorithm, with grid size being the most influential parameter during validation, demonstrating the potential of this algorithm for real-world mobile location data analysis.
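The grid-and-filter idea described in the abstract can be sketched in a few lines of Python. This is not the GHOST package or its API: the function name, the parameters, and the defaults are hypothetical, and the sketch counts pings per cell as a crude stand-in for stay time.

```python
import math
from collections import Counter
from datetime import datetime


def infer_home_cell(pings, cell_size_deg=0.001, night_start=22, night_end=6):
    """Return the centre (lat, lon) of the most visited night/weekend grid cell.

    pings: iterable of (timestamp, lat, lon) tuples.
    Ping counts are used as a simplifying proxy for stay time.
    """
    counts = Counter()
    for ts, lat, lon in pings:
        is_night = ts.hour >= night_start or ts.hour < night_end
        is_weekend = ts.weekday() >= 5  # Saturday or Sunday
        if not (is_night or is_weekend):
            continue  # temporal filter: skip weekday daytime pings
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        counts[cell] += 1
    if not counts:
        return None
    (row, col), _ = counts.most_common(1)[0]
    return ((row + 0.5) * cell_size_deg, (col + 0.5) * cell_size_deg)


# Made-up pings: three weekday-night pings at one place, two weekday-daytime pings elsewhere.
pings = [
    (datetime(2026, 5, 18, 23, 0), 42.3601, -71.0589),
    (datetime(2026, 5, 19, 2, 0), 42.3601, -71.0589),
    (datetime(2026, 5, 19, 3, 0), 42.3601, -71.0589),
    (datetime(2026, 5, 19, 10, 0), 42.3736, -71.1097),
    (datetime(2026, 5, 19, 14, 0), 42.3736, -71.1097),
]
home = infer_home_cell(pings)
print(home)
```

The cell size is the knob that trades spatial precision against robustness to GPS noise, which fits the abstract's remark that grid size was the most influential parameter during validation.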


2605.20400 2026-05-21 stat.AP cs.LG stat.ML

Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management

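Takato Yasuno

AI summary: This paper proposes a framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify the operational patterns that drive heterogeneous deterioration rates in pump equipment. It estimates random effects with GPU-accelerated NUTS, validates the linearity assumption, and shows that different operational regimes call for different management strategies.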

Comments: 20 pages, 7 figures, 4 tables

Abstract

Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that drive heterogeneous deterioration rates in pump equipment. Our approach first estimates pump-specific random effects $u_i$ using GPU-accelerated No-U-Turn Sampling (NUTS), achieving 3--5$\times$ speedup over CPU implementations. We then employ DirectLiNGAM to discover causal relationships between 22 engineered time-series features and deterioration rates, stratified by positive ($u_i > 0$, faster deterioration) versus negative ($u_i \leq 0$, slower deterioration) random effects. Analyzing 112 pumps with 92,861 observations over 650 days, we uncover striking heterogeneity: the negative group exhibits causal effects 400$\times$ larger than the positive group, with standard deviation (std) showing a strong positive causal effect ($+1.515$) on deterioration rates in low-risk equipment. We validate linearity assumptions through NonlinearLiNGAM comparison and demonstrate practical scalability through GPU acceleration. Our findings enable targeted maintenance strategies by revealing that different operational regimes require fundamentally distinct management approaches, advancing predictive maintenance from population-averaged to heterogeneity-aware decision making.


2605.20399 2026-05-21 stat.ME stat.AP

A duration-augmented binary Markov chain for rainfall occurrence with long dry spells

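Antoine Doizé, Denis Allard, Philippe Naveau, Olivier Wintenberger

AI summary: This paper proposes a duration-augmented binary Markov chain for rainfall occurrence with long dry spells. A link with alternating renewal chains allows flexible parametric modelling of wet and dry spell durations. The model is applied to around 200 stations in southern Europe and compared with standard Markov models on persistence and high-quantile extrapolation.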

Abstract

Simulating realistic wet and dry spells is central in weather generators and climate-impact studies. While finite-order Markov chains are standard, they often fail to reproduce persistent dry conditions due to their inherent subexponential decay. We model rainfall occurrence by introducing a duration-augmented binary Markov chain. We establish a link with alternating renewal chains, enabling flexible parametric modelling of wet and dry spell duration distribution. We model those using two regime-adapted specifications from the general class of extended Generalized Pareto Distributions, yielding flexible tail behaviour across various climates. We use estimation methods adapted to each specification. Our model is applied to around 200 stations in the South of Europe spanning diverse Mediterranean and continental climates. We compare this framework to standard Markov models in characterising persistence and high-quantile extrapolation. The approach is generic, extending naturally to multi-state settings or other binary sequence applications in environmental statistics.
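An alternating renewal chain of the kind the abstract links to can be simulated by drawing wet and dry spell durations in turn. The Python sketch below is only an illustration: it uses a geometric law for wet spells and a discretized plain generalized Pareto law for dry spells as a heavy-tailed stand-in, not the paper's extended Generalized Pareto specifications, and all function names and parameter values are hypothetical.

```python
import math
import random
from itertools import groupby


def geometric_duration(rng, p_end):
    """Spell length in days (1, 2, ...) when the spell ends each day with probability p_end."""
    u = rng.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p_end)))


def gpd_duration(rng, xi, sigma):
    """Discretized generalized Pareto spell length (xi > 0 gives a heavy tail)."""
    u = rng.random()
    return max(1, math.ceil(sigma / xi * ((1.0 - u) ** (-xi) - 1.0)))


def simulate_occurrence(n_days, draw_wet, draw_dry, rng, start_wet=True):
    """Binary rainfall occurrence (1 = wet, 0 = dry) from alternating spell durations."""
    sequence = []
    wet = start_wet
    while len(sequence) < n_days:
        duration = draw_wet(rng) if wet else draw_dry(rng)
        sequence.extend([1 if wet else 0] * duration)
        wet = not wet
    return sequence[:n_days]


def spell_lengths(sequence, state):
    """Lengths of the consecutive runs of `state` in the sequence."""
    return [len(list(run)) for value, run in groupby(sequence) if value == state]


rng = random.Random(0)
sequence = simulate_occurrence(
    3650,
    draw_wet=lambda r: geometric_duration(r, p_end=0.5),
    draw_dry=lambda r: gpd_duration(r, xi=0.3, sigma=4.0),
    rng=rng,
)
print("longest dry spell (days):", max(spell_lengths(sequence, 0)))
```

With geometric durations in both states the simulator reduces to a first-order two-state Markov chain, so swapping in a heavier-tailed duration law is what lets long dry spells appear.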


2605.20396 2026-05-21 cs.LG stat.ML

Score-Based Causal Discovery of Latent Variable Causal Models

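Ignavier Ng, Xinshuai Dong, Haoyue Dai, Biwei Huang, Peter Spirtes, Kun Zhang

AI summary: This paper proposes score-based methods that identify causal structures containing causally related latent variables, provides identifiability guarantees, and validates the methods experimentally.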

Comments: ICML 2024

Abstract

Identifying latent variables and the causal structure involving them is essential across various scientific fields. While many existing works fall under the category of constraint-based methods (with e.g. conditional independence or rank deficiency tests), they may face empirical challenges such as testing-order dependency, error propagation, and choosing an appropriate significance level. These issues can potentially be mitigated by properly designed score-based methods, such as Greedy Equivalence Search (GES) (Chickering, 2002) in the specific setting without latent variables. Yet, formulating score-based methods with latent variables is highly challenging. In this work, we develop score-based methods that are capable of identifying causal structures containing causally-related latent variables with identifiability guarantees. Specifically, we show that a properly formulated scoring function can achieve score equivalence and consistency for structure learning of latent variable causal models. We further provide a characterization of the degrees of freedom for the marginal over the observed variables under multiple structural assumptions considered in the literature, and accordingly develop both exact and continuous score-based methods. This offers a unified view of several existing constraint-based methods with different structural assumptions. Experimental results validate the effectiveness of the proposed methods.


2605.20359 2026-05-21 econ.EM stat.ME

The Harmonic Synthetic Control Method

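Ziyi Liu, Yiqing Xu

AI summary: This paper proposes Harmonic Synthetic Control (HSC), which replaces the binary choice between filtering and not filtering the outcomes with a soft allocation mechanism. HSC jointly estimates donor weights and a smooth residual component specific to the treated unit, then extrapolates that component with a time-series forecaster. A tuning parameter chosen by rolling-origin cross-validation balances donor matching against forecasting. A spectral interpretation shows that HSC downweights low-frequency residual components in donor matching and assigns them to the forecasting branch. Monte Carlo experiments show that HSC adapts across regimes and performs well whether stochastic trends are predominantly common or idiosyncratic.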

Abstract

Synthetic control methods can produce misleading counterfactual predictions when outcome series contain unit-specific stochastic trends, a common feature of nonstationary macroeconomic data. Existing remedies, such as pre-filtering or differencing, reduce spurious matching but may discard shared nonstationary variation that helps estimate donor weights. We propose Harmonic Synthetic Control (HSC), which replaces this binary choice with a soft allocation mechanism. HSC jointly estimates donor weights and a treated-unit-specific smooth residual component, then extrapolates this component into post-treatment periods using a time-series forecaster. A tuning parameter, selected by rolling-origin cross-validation, governs the division between donor matching and forecasting. As it varies, HSC continuously interpolates between synthetic control applied to differenced outcomes and synthetic control applied to raw outcomes with an intercept or trend. We provide a spectral interpretation showing how HSC downweights low-frequency residual components in donor matching and assigns them to the forecasting branch. A prediction-error decomposition separates weight-estimation distortion from residual-forecasting error. Monte Carlo exercises show that HSC adapts across regimes, performing well when stochastic trends are predominantly common or idiosyncratic, while estimators fixed to one regime can fail in the other.


2605.20345 2026-05-21 stat.ML cs.LG

Corrected Integrated Laplace Approximation for Bayesian Inference in Latent Gaussian Models

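Jinlin Lai, Charles C. Margossian, Daniel R. Sheldon

AI summary: This paper proposes an importance sampling scheme that corrects the error introduced by the integrated Laplace approximation (ILA) in latent Gaussian models (LGMs). As the number of importance samples grows, the approximate posterior converges to the correct posterior. The method is implemented in an automatic differentiation framework to support gradient-based algorithms for hyperparameter inference, in particular Hamiltonian Monte Carlo.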

Abstract

Latent Gaussian models (LGMs) are a popular class of Bayesian hierarchical models that include Gaussian processes, as well as certain spatial models and mixed-effect models. Efficient Bayesian inference of LGMs often requires marginalizing out the latent variables. For LGMs with a non-Gaussian likelihood, exact marginalization is not possible and a popular approach is to do approximate marginalization with an integrated Laplace approximation (ILA). Using ILA produces an approximate posterior which, in some settings, can differ significantly from the correct posterior, which impacts downstream applications. We propose an importance sampling scheme to correct the error introduced by ILA. By increasing the number of samples in importance sampling, the posterior with ILA converges to the correct posterior. This idea is realized with various techniques, including pseudo-marginalization, quasi-Monte Carlo and randomized quasi-Monte Carlo. We implement our methods in an automatic differentiation framework to support gradient-based algorithms when doing inference on the hyperparameters. For the latter, we specifically consider the use of Hamiltonian Monte Carlo. We demonstrate the benefits of reduced error in various applied models.


2605.20325 2026-05-21 stat.ME stat.CO

Explainable Outlier Detection for Multivariate Functional Data

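Marcus Mayrhofer, Una Radojičić, Horst Lewitschnig, Peter Filzmoser

AI summary: This paper addresses robust covariance estimation and interpretable outlier detection for multivariate functional data with separable covariance structure. It links such stochastic processes to the matrix-variate distributions of their basis representations to estimate mean and covariance robustly, and it uses Shapley values to decompose outlyingness.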

Abstract

This work addresses the challenges of robust covariance estimation and interpretable outlier detection for multivariate functional data with separable covariance structure. We develop a method that simultaneously improves robustness and interpretability in this context by establishing a connection between stochastic processes with separable covariance structures and the corresponding matrix-variate distribution of their basis representations. Leveraging this connection, we employ the recently developed matrix-variate counterpart of the Minimum Covariance Determinant estimator (MMCD) in conjunction with a truncated multivariate functional Mahalanobis semi-distance to robustly estimate mean and covariance for multivariate functional data. For interpretable outlier detection, we generalize multivariate outlier explanations based on Shapley values to decompose overall multivariate functional outlyingness into time-coordinate-specific contributions. Importantly, we reduce the otherwise exponential computational complexity (relative to the number of components) to linear complexity, while retaining the key properties of the Shapley value. This integrated framework combines robust Mahalanobis distances, MMCD estimators, and Shapley value-based outlyingness decomposition to provide a robust and interpretable approach for analyzing multivariate functional data with separable covariance structures. The effectiveness of this approach is demonstrated through both theoretical analysis and practical applications, including simulations and real-world examples.
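To see why a Shapley decomposition of Mahalanobis-type outlyingness need not cost exponential time, consider the plain multivariate (non-functional) case. If a coalition's value is the squared Mahalanobis distance with all other coordinates set to their mean, the game is quadratic, and coordinate j receives (x_j - mu_j) times the j-th entry of Sigma^{-1}(x - mu). The Python sketch below checks this against brute-force enumeration; the deviation vector and precision matrix are made up, and the paper's functional, time-coordinate version is not reproduced here.

```python
from itertools import permutations


def md2_restricted(dev, precision, coalition):
    """Squared Mahalanobis distance when coordinates outside the coalition sit at their mean."""
    return sum(dev[j] * precision[j][k] * dev[k] for j in coalition for k in coalition)


def shapley_bruteforce(dev, precision):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    p = len(dev)
    phi = [0.0] * p
    orderings = list(permutations(range(p)))
    for ordering in orderings:
        coalition, previous = [], 0.0
        for j in ordering:
            coalition.append(j)
            current = md2_restricted(dev, precision, coalition)
            phi[j] += current - previous
            previous = current
    return [value / len(orderings) for value in phi]


def shapley_closed_form(dev, precision):
    """Closed form: phi_j = dev_j * (precision @ dev)_j, one matrix-vector product in total."""
    p = len(dev)
    return [dev[j] * sum(precision[j][k] * dev[k] for k in range(p)) for j in range(p)]


# Hypothetical deviation x - mu and a symmetric positive-definite precision matrix.
dev = [2.0, -1.0, 0.5]
precision = [[2.0, -0.5, 0.0],
             [-0.5, 1.0, 0.3],
             [0.0, 0.3, 1.5]]

brute = shapley_bruteforce(dev, precision)
closed = shapley_closed_form(dev, precision)
total = md2_restricted(dev, precision, range(len(dev)))
print("brute force:", brute)
print("closed form:", closed)
print("sum of contributions:", sum(closed), "squared distance:", total)
```

The contributions add up to the full squared distance, which is the efficiency property of the Shapley value, so each coordinate's share of the outlyingness can be read off directly.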


2605.20271 2026-05-21 stat.ML cs.LG

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity

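Ernest Fokoué

AI summary: This paper treats multi-head attention as an ensemble of Nadaraya-Watson kernel regression estimators. It analyzes the decorrelation of head outputs, relates variance reduction to head diversity, introduces a Head Diversity Index that measures inter-head decorrelation, and derives the optimal number of heads and per-head dimension.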

Comments: 14 pages

Abstract

We develop a rigorous statistical theory of multi-head attention (MHA) as an ensemble of Nadaraya-Watson (NW) kernel regression estimators. Building on the algebraic identity between single-head softmax attention and the NW estimator, we prove that MHA is a structured ensemble of H NW estimators, each operating in a distinct learned projection subspace of the key space. We derive an explicit Bias-Variance-Covariance decomposition of the MHA mean squared error, showing that variance reduction depends not merely on the number of heads H but fundamentally on the decorrelation of head outputs. Decorrelation is governed by the principal angles between learned projection subspaces: orthogonal projections yield maximum variance reduction; aligned projections yield none. We introduce the Head Diversity Index (HDI), a computable spectral measure of inter-head decorrelation, and prove that MHA mean squared error is monotonically decreasing in HDI. This provides the first rigorous theoretical explanation for the empirically observed specialization of attention heads. Under a fixed total-dimension budget D = H * d_k, we solve the optimal head-dimension allocation problem, deriving the MSE-minimizing pair (H*, d_k*) from data distribution and regression smoothness. The solution yields a new architectural scaling law: the optimal per-head dimension grows logarithmically with training set size, while the optimal number of heads grows nearly linearly with the total budget D. Our framework unifies three strands of prior work: the NW theory of single-head attention, the general weighting theory for ensemble learning, and the decorrelation-variance-reduction isomorphism between biological and computational ensembles. Multi-head attention is the Transformer's instantiation of a universal principle: identical agents plus diversity-enforcing mechanisms yields emergent optimality.
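The algebraic identity between single-head softmax attention and the Nadaraya-Watson estimator that the abstract builds on can be checked numerically. The Python sketch below uses scalar values and an exponential dot-product kernel; the query, keys, and values are made-up numbers, and the multi-head ensemble analysis of the paper is not reproduced.

```python
import math


def softmax_attention(query, keys, values):
    """Single-head scaled dot-product attention for one query and scalar values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    normaliser = sum(weights)
    return sum(w / normaliser * v for w, v in zip(weights, values))


def exp_dot_kernel(query, key):
    """Kernel K(q, k) = exp(q . k / sqrt(d))."""
    return math.exp(sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query)))


def nadaraya_watson(query, keys, values, kernel):
    """Kernel regression estimate: sum_i K(q, k_i) v_i / sum_i K(q, k_i)."""
    kernel_values = [kernel(query, key) for key in keys]
    return sum(k * v for k, v in zip(kernel_values, values)) / sum(kernel_values)


query = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [2.0, -1.0, 0.5]

attention_out = softmax_attention(query, keys, values)
nw_out = nadaraya_watson(query, keys, values, exp_dot_kernel)
print(attention_out, nw_out)
```

The two functions compute the same weighted average, since the softmax normaliser is exactly the kernel sum in the Nadaraya-Watson denominator.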


2605.20270 2026-05-21 cs.LG cs.AI stat.ML

Conformal Selective Acting: Anytime-Valid Risk Control for RLVR-Trained LLMs

Hamed Khosravi, Xiaoming Huo

AI summary: This study proposes Conformal Selective Acting (CSA), a per-round wrapper that provides anytime-valid risk control for deployed RLVR-trained LLMs. It fills an empty cell forced by deployment requirements by maintaining an e-process per threshold on a Bonferroni grid, which preserves pathwise validity, and it is evaluated on multiple benchmarks.

Abstract

A local specialist LLM, fine-tuned with reinforcement learning from verifiable rewards (RLVR) on operator-local data, is installed in a regulated organization with per-deployment error budget $\alpha$. The operator needs a safety certificate for this deployment's stream at every round: no pooling across deployments, no waiting for a long-run average. Existing wrappers cannot deliver this on adaptive, online-updated streams: offline conformal-risk methods require exchangeability; online-conformal methods bound only long-run averages; non-exchangeable extensions are marginally valid; and the closest anytime wrapper, A-RCPS, controls marginal rather than selective risk. Using a (test statistic, validity guarantee, deployment rule) framework, we identify one empty cell forced by deployment requirements: e-process per threshold, selective risk, anytime-pathwise validity, max-certified-threshold rule. Conformal Selective Acting (CSA) fills it as a per-round wrapper maintaining a Ville-type e-process per threshold on a Bonferroni grid, evaluated against the RLVR filtration. Under predictable updates and isotonic-calibrated monotone risk we prove (i) an anytime-pathwise selective-risk bound $R_T^{\mathrm{act}} \le \alpha + O(N_T^{-1/2})$, (ii) rate-optimal certification matching $\Theta(\bar{\eta}^{-2}\log(1/\delta))$, and (iii) a horizon-independent release-rate gap. Across eight specialist benchmarks ($480$ streams), sixteen adversarial distribution-shift cells ($160$ streams), and five live Expert-Iteration RLVR cells with online LoRA over four base models in three architecture families ($10{,}300$ rounds), CSA is the only method among ten compared that satisfies pathwise validity and non-refusing deployment on every cell. We do not propose a new LLM, training algorithm, or policy class; CSA is the deployment-side complement, orthogonal to the model, for operators who cannot use a frontier API.


2605.20269 2026-05-21 cs.LG cs.AI stat.ML

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

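Hamed Khosravi, Xiaoming Huo

AI summary: This paper studies low-rank linear contextual bandits whose latent subspace drifts and proposes SPSC, an algorithm that tracks the changing subspace and attains a dynamic regret rate governed by the intrinsic rank.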

Abstract

Many bandit deployments (recommendation, clinical dosing, ad targeting) share two facts prior work handles only in isolation: rewards live on a low-dimensional latent subspace, and that subspace drifts. Stationary low-rank bandits exploit rank but break under subspace change; non-stationary linear bandits adapt to drift but pay ambient rate $\widetilde{O}(d\sqrt{T})$. We study piecewise-stationary low-rank linear contextual bandits with scalar feedback: $θ_t = B_k^\star w_t$ with rank-$r$ factor $B_k^\star\in\mathbb{R}^{d\times r}$ constant within each of $K$ unknown segments and able to shift at boundaries. Our results are tight along three axes. (i) Identification boundary. With single-play scalar rewards, the moving subspace is recoverable through quadratic functionals of rewards iff three probe-side conditions hold: known noise variance, bounded state-noise coupling, and full-dimensional probe support. Each is necessary in the unrestricted-second-moment problem, and jointly they are sufficient, characterizing the boundary of the solvable region. (ii) Algorithm and dynamic regret. SPSC interleaves isotropic probes with windowed projected ridge-UCB exploitation inside the learned $r$-dimensional subspace; a CUSUM-style variant discovers segment boundaries online. The costed dynamic regret is $\widetilde{O}(r\sqrt{T})+\widetilde{O}(T^{2/3})+O(W\,V_{\mathrm{in}})$, replacing the ambient $d\sqrt{T}$ rate with the intrinsic rank. (iii) Empirics. On eleven benchmarks spanning synthetic, UCI/MovieLens, semi-synthetic clinical, and ZOZOTOWN production-log data, SPSC outperforms non-stationary and low-rank baselines whenever $d-r\gtrsim T^{1/6}$, matching the analytical crossover. To our knowledge, this is the first work to characterize the identification boundary and attain the intrinsic-rank dynamic-regret rate in this setting.


2605.15955 2026-05-21 eess.SP stat.ML

Topological Kalman Filtering on Cell Complexes

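Chengen Liu, Rohan Money, Ting Gao, Mohammad Sabbaqi, Baltasar Beferull-Lozano, Elvin Isufi

AI summary: This paper proposes a topology-aware state-space framework for inferring latent dynamics from multivariate time series defined over topological cell complexes. It combines topological diffusion with a nonlinear observation mapping and recovers higher-order cell structure.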

Abstract

Inferring latent dynamics from multivariate time-series defined over topological cell complexes is crucial for capturing the complex, higher-order interactions inherent in real-world systems such as in water, sensor, and transportation networks. However, reconstructing these latent states is challenging because the signals are coupled across higher-order topologies, while high dimensionality, nonlinear observations, and unknown structures increase the difficulty. To address this, we propose a topology-aware state space framework derived from stochastic partial differential equations on cell complexes. State evolution follows heat-like topological diffusion, with perturbations propagating along boundary operators. Under partial observability, we model observations using a cell complex convolution of latent states coupled with a nonlinear mapping. We perform recursive state estimation via an Extended Kalman Filter, simultaneously learning model parameters and uncertainties through an online Expectation-Maximization algorithm. Finally, for scenarios where only lower-order topological structure is known, e.g., nodes and edges, as in critical infrastructure networks, we introduce a heuristic cell identification algorithm to explicitly infer the second-order cell structures. Validations on synthetic and real datasets from water, sensor and transportation networks demonstrate that our approach yields reliable estimates under partial observability and successfully recovers the underlying topological structures.
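Heat-like diffusion along boundary operators can be illustrated on the smallest non-trivial complex, a single filled triangle. The Python sketch below builds the edge-level Hodge Laplacian from the two incidence matrices and applies one explicit Euler step to an edge signal. It covers only a deterministic diffusion step under assumed orientations and an assumed step size; the paper's stochastic model, observation map, and Extended Kalman Filter are not reproduced.

```python
def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def mat_mul(a, b):
    b_columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]


def mat_add(a, b):
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# One filled triangle: nodes 0, 1, 2; edges (0,1), (0,2), (1,2); one 2-cell (0,1,2).
B1 = [[-1, -1, 0],
      [1, 0, -1],
      [0, 1, 1]]       # node-to-edge incidence (rows: nodes, columns: edges)
B2 = [[1], [-1], [1]]  # edge-to-cell incidence (rows: edges, column: the 2-cell)

# Edge-level Hodge Laplacian: L1 = B1^T B1 + B2 B2^T.
hodge_laplacian = mat_add(mat_mul(transpose(B1), B1), mat_mul(B2, transpose(B2)))


def diffusion_step(edge_signal, laplacian, step_size):
    """One explicit Euler step of dx/dt = -L1 x on the edge signal."""
    flow = mat_vec(laplacian, edge_signal)
    return [x - step_size * f for x, f in zip(edge_signal, flow)]


x0 = [1.0, 2.0, -1.0]  # made-up signal on the three edges
x1 = diffusion_step(x0, hodge_laplacian, step_size=0.1)
print(hodge_laplacian, x1)
```

For this triangle the Laplacian equals three times the identity, so every edge value shrinks by the same factor; on larger complexes the lower term couples edges through shared nodes and the upper term couples them through shared 2-cells.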

2603.07312 2026-05-21 stat.ME

Predictive Power Analysis of Multiple Test Procedures Under Arbitrary Dependence

George Karabatsos

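AI summary: This paper proposes a new Bayesian predictive power analysis method for computing the statistical power and required sample size of multiple testing procedures under arbitrary dependence, and illustrates the simulation-based method on $p$-values from a lead exposure study.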

Abstract

Many statistical problems can be addressed by applying a multiple testing procedure (MTP) that controls either the Family-wise Error Rate (FWER) or False Discovery Rate (FDR) under unknown arbitrarily-interdependent $p$-values, without explicitly modeling these inter-correlations. They include the FWER-controlling Bonferroni (1936) MTP and Holm (1979) MTP; the FDR-controlling Benjamini and Yekutieli (2001) MTP; and the DP-MTP (Karabatsos, 2025), based on a Dirichlet process (DP) prior distribution supporting the entire space of MTPs that control either the FWER or FDR. For such an MTP, this study introduces a new and congenial method for Bayesian predictive power analysis, for power calculation and sample size determination for any given planned future (e.g., replication or interim) study. This novel MTP predictive power analysis method is based on a joint prior distribution defining a scale matrix mixture of asymmetric multivariate normal mean-variance mixture distributions, factorized as a general prior distribution for effect sizes (e.g., obtained from expert judgment or results of prior studies), and a uniform prior distribution for correlation matrices representing arbitrary dependencies between $p$-values of test statistics of given multiple hypothesis tests under their alternative hypotheses. The new MTP power analysis method also results in $p$-value weights which can be used to minimize the relative impacts of and assess for significance-chasing biases (e.g., publication bias, $p$-hacking, etc.) in multiple testing, without needing to assume that $p$-values (effect sizes) are independent. The new simulation-based MTP predictive power analysis method is illustrated through the analysis of $p$-values obtained by a famous study of lead exposure and re-analyzed by the previous MTP literature, using R package bnpMTP.
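
For readers unfamiliar with the classical procedures named in the abstract, the following sketch implements the Bonferroni, Holm, and Benjamini-Yekutieli rules in their standard textbook form. It does not cover the DP-MTP or the predictive power analysis itself, and it is not the bnpMTP package; the function names and the example $p$-values are made up for illustration.

```python
import numpy as np

def bonferroni(p, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls FWER under any dependence."""
    p = np.asarray(p, dtype=float)
    return p <= alpha / p.size

def holm(p, alpha=0.05):
    """Holm step-down procedure; controls FWER under any dependence."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value (k = 0..m-1) with alpha / (m - k).
    passed = p[order] <= alpha / (m - np.arange(m))
    # Stop at the first failure: only the leading run of passes is rejected.
    n_reject = m if passed.all() else int(np.argmin(passed))
    reject = np.zeros(m, dtype=bool)
    reject[order[:n_reject]] = True
    return reject

def benjamini_yekutieli(p, alpha=0.05):
    """BY step-up procedure; controls FDR under arbitrary dependence."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))  # harmonic correction factor
    thresholds = np.arange(1, m + 1) * alpha / (m * c_m)
    below = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        # Reject every hypothesis up to the largest k that meets its threshold.
        reject[order[: below[-1] + 1]] = True
    return reject

pvals = [0.6, 0.001, 0.04, 0.012, 0.03]  # hypothetical p-values
for name, rule in [("Bonferroni", bonferroni), ("Holm", holm),
                   ("BY", benjamini_yekutieli)]:
    print(name, rule(pvals))
```

None of the three rules needs a model of the correlations between $p$-values, which is the property the abstract relies on.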

2512.19373 2026-05-21 stat.ML cs.LG

Cluster-Based Generalized Additive Models Informed by Random Fourier Features

Xin Huang, Jia Li, Jun Yu

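AI summary: This paper proposes an interpretable regression framework for heterogeneous data that combines response-informed spectral representation learning with localized additive modeling. A random Fourier feature regression model yields a spectral feature map, which principal component analysis compresses into a low-dimensional latent embedding. A Gaussian mixture model then discovers soft regimes, a cluster-specific generalized additive model captures nonlinear covariate effects within each regime, and a soft mixture of these local additive models gives flexible modeling of nonlinear, heterogeneous structure while preserving interpretability.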

Comments: 33 pages, 13 figures, 7 tables

Abstract

In developing data-driven modeling methodologies, there is an ongoing need to reconcile the strong predictive performance of opaque black-box models with the transparency required for critical applications. This work introduces an interpretable and computationally tractable regression framework for heterogeneous data by combining response-informed spectral representation learning with localized additive modeling. The method first fits a random Fourier feature regression model and constructs a spectral feature map from the learned amplitudes and adaptively resampled frequencies, so that the representation reflects predictive variation in the data. This representation is then compressed by principal component analysis to obtain a low-dimensional latent embedding, in which a Gaussian mixture model performs soft regime discovery. Within each regime, a cluster-specific generalized additive model captures nonlinear covariate effects through interpretable spline-based univariate smooth functions. The final predictor is formed as a soft mixture of these local additive models, enabling flexible modeling of a nonlinear, heterogeneous structure while preserving interpretability. Numerical experiments across several benchmark regression datasets show that the proposed method consistently improves upon classical globally interpretable baselines while remaining competitive with more flexible black-box models. Overall, the framework provides a unified approach to heterogeneous regression that combines predictive adaptivity with interpretable local covariate effects.
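
The first stage of the pipeline, a random Fourier feature regression, can be sketched as follows. This uses the standard construction with fixed Gaussian frequencies and a ridge fit for the amplitudes; the adaptive frequency resampling, the PCA and Gaussian mixture steps, and the cluster-specific additive models described in the abstract are not reproduced. The unit bandwidth, the ridge penalty `lam`, the toy target, and the function names are assumptions.

```python
import numpy as np

def rff_features(X, W, b):
    """Map rows of X to random Fourier features sqrt(2/D) * cos(X W + b)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def fit_ridge(Z, y, lam=1e-3):
    """Ridge amplitudes: solve (Z^T Z + lam I) beta = Z^T y."""
    D = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)

rng = np.random.default_rng(0)
n, d, D = 200, 1, 200
X = rng.uniform(-3.0, 3.0, size=(n, d))
y = np.sin(X[:, 0])                        # toy noiseless target (assumed)

W = rng.normal(0.0, 1.0, size=(d, D))      # frequencies, unit bandwidth assumed
b = rng.uniform(0.0, 2.0 * np.pi, size=D)  # random phases
Z = rff_features(X, W, b)
beta = fit_ridge(Z, y)

rmse = np.sqrt(np.mean((Z @ beta - y) ** 2))
print("training RMSE:", rmse)
```

In the framework described above, the fitted amplitudes `beta` and the frequencies `W` are what a response-informed spectral feature map would be built from before compression and clustering.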

2512.02182 2026-05-21 stat.ME stat.AP

Two-phase validation sampling via principal components to improve efficiency in multi-model estimation from error-prone biomedical databases

Sarah C. Lotspeich, Cole Manschot

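AI summary: This paper proposes a two-phase validation sampling method based on principal components analysis that improves efficiency in multi-model estimation by balancing and prioritizing the variability explained across multiple models, making validation of error-prone biomedical databases more efficient.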

Comments: 22 pages, 5 figures, 2 tables, GitHub repositories with R package and simulation/analysis code

Abstract

Two-phase sampling offers a cost-effective way to validate error-prone covariate measurements in biomedical databases. Inexpensive or easy-to-obtain information is collected for the entire study in Phase I. Then, a subset of patients undergoes cost-intensive validation (e.g., expert chart review) to collect more accurate data in Phase II. When balancing primary and secondary analyses, competing models and priorities can result in poorly defined objectives for the most informative Phase II sampling criterion. Extreme tail sampling (ETS), wherein patients with the smallest and largest values of a particular quantity (like a covariate or residual) are selected, can offer great statistical efficiency in two-phase studies when focusing on a single analytic objective by targeting observations with the biggest contributions to the Fisher information. We propose an intuitive, easy-to-use approach that extends ETS to balance and prioritize explaining the largest amount of variability across multiple models of interest. Using principal components analysis, we succinctly summarize the inherent variability of all models' error-prone exposures. Then, we sample patients with the most extreme values of the first principal component for validation. Through extensive simulations and an application to the National Health and Nutrition Examination Survey (NHANES), the proposed strategy offered simultaneous efficiency gains across multiple models of interest. Its advantages persisted across various real-world scenarios, including correlated or heterogeneous measurement error. When designing a validation study, concentrating on a single model may be short-sighted. Strategically allocating resources more broadly balances multiple analytical goals simultaneously. Employing dimension reduction before sampling will allow this strategy to scale up well to big-data applications with many error-prone exposures.
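
The sampling rule described above (summarize the error-prone exposures with principal components, then validate the patients with the most extreme first-component values) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' R package: the abstract does not say whether the exposures are standardized or how the validation budget is split between the two tails, so the standardization and the even split below are assumptions, as are the function names and simulated data.

```python
import numpy as np

def first_pc_scores(X):
    """Scores of each row on the first principal component of standardized X."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Right singular vectors of the standardized matrix are the PC loadings.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]

def extreme_tail_sample(scores, n_validate):
    """Indices of the smallest and largest scores, split evenly (assumed)."""
    order = np.argsort(scores)
    k_low = n_validate // 2
    k_high = n_validate - k_low
    return np.concatenate([order[:k_low], order[len(order) - k_high:]])

# Simulated Phase I data: 500 patients, 3 correlated error-prone exposures.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
exposures = latent + 0.5 * rng.normal(size=(500, 3))

scores = first_pc_scores(exposures)
phase2_ids = extreme_tail_sample(scores, n_validate=50)
print(np.sort(phase2_ids))
```

The sign of a principal component is arbitrary, but because both tails are sampled, flipping the sign selects the same patients whenever `n_validate` is even.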

2504.01355 2026-05-21 stat.ME econ.EM

A Practical Guide to Estimating Conditional Marginal Effects: Modern Approaches

Jiehan Liu, Ziyi Liu, Yiqing Xu

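AI summary: This paper is a practical guide to estimating conditional marginal effects, that is, how treatment effects vary with a moderating variable, using modern statistical methods. It improves on existing solutions such as the semiparametric kernel estimator, introduces robust estimation strategies such as AIPW-Lasso and DML, evaluates each method through simulations and empirical examples, and offers practical recommendations tailored to sample size and research context.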

2503.03347 2026-05-21 math.ST stat.TH

Drift estimation for rough processes under small noise asymptotic : trajectory fitting method

Arnaud Gloter, Nakahiro Yoshida

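AI summary: This paper studies estimation of an unknown drift parameter of a process satisfying a stochastic Volterra equation under small-noise asymptotics, using the trajectory fitting method. It proposes a trajectory fitting estimator that is consistent and asymptotically normal, and gives an identifiability condition that guarantees convergence of the estimator in $L^p$.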

2412.15076 2026-05-21 stat.AP

Digital N-of-1 Trials and their Application in Experimental Physiology

Stefan Konigorski, Mathias Ried-Larsen, Christopher H Schmid

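AI summary: This paper introduces N-of-1 trials as a study design for drawing valid statistical inference about intervention effects on individual participants when sample sizes are small, and uses examples from experimental physiology to illustrate their design, analysis, and supporting digital tools.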

Comments: Accepted in Experimental Physiology. https://doi.org/10.1113/EP092753

Abstract

Traditionally, studies in experimental physiology have been conducted in small groups of human participants, animal models or cell lines. Identifying optimal study designs that achieve sufficient power for drawing proper statistical inferences to detect group-level effects with small sample sizes has been challenging. Moreover, average effects derived from traditional group-level inference do not necessarily apply to individual participants. Here, we introduce N-of-1 trials as an innovative study design that can be used to draw valid statistical inference about the effects of interventions on individual participants and can be aggregated across multiple study participants to provide population-level inferences more efficiently than standard group randomized trials. N-of-1 trials have been used in healthcare settings since the late 1980s, but without large-scale adoption and with few applications in experimental physiology research settings. In this manuscript, we introduce the key components and design features of N-of-1 trials, describe the statistical analysis and interpretation of the results, and present some available digital tools that facilitate their use, with examples from experimental physiology.

2411.05758 2026-05-21 math.ST econ.EM stat.TH

Limit theorems of matching estimators with a fixed number of matches

Songliang Chen, Fang Han

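AI summary: This paper revisits the limit theorems of Abadie and Imbens for nearest neighbor matching estimators of the average treatment effect with a fixed number of matches, and establishes the first non-normalized central limit theorem (CLT) with an explicitly computable limiting variance. The key steps are to prove that the normalizing statistic in the CLT converges to its mean and to compute that mean in closed form. The former fills a gap in unpublished work (Abadie and Imbens, 2002), and the latter resolves a question raised by Abadie and Imbens (2006).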