arXivDaily: arXiv daily academic digest, updated Monday through Friday

1. Statistical Theory and Methods (3 papers)

2503.02178 2026-06-12 stat.ML cs.LG updated version

Central Limit Theorems for Stochastic Gradient Descent Quantile Estimators

随机梯度下降分位数估计量的中心极限定理

Ziyang Wei, Jiaqi Li, Likai Chen, Wei Biao Wu

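AI summary: For quantile estimation via constant-learning-rate SGD, this paper uses Markov chain theory to show that the centered and standardized stationary distribution converges to a Gaussian as the learning rate tends to zero, giving the first CLT-type guarantee, and proposes a recursive algorithm for constructing confidence intervals.

AI Chinese abstract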

本文发展了通过恒定学习率的随机梯度下降(SGD)进行分位数估计的渐近理论。分位数损失函数既不光滑也不强凸。超越传统视角和技术,我们将分位数SGD迭代视为一个不可约、周期且正常返的马尔可夫链,该链循环收敛到其唯一的平稳分布,无论初始值如何任意固定。为了推导平稳分布的精确形式,我们通过利用平稳方程分析其特征函数的结构。我们还推导了其矩生成函数(MGF)和尾部概率的紧界。综合上述方法,我们证明了当学习率$\eta\rightarrow0$时,中心化和标准化的平稳分布收敛到高斯分布。这一发现为恒定学习率的分位数SGD估计量提供了首个中心极限定理(CLT)类型的理论保证。我们进一步提出了一种递归算法来构建具有统计保证的估计量的置信区间。数值研究展示了在线估计器和推断过程的有效有限样本性能。本研究所发展的理论工具对于研究一般形式化为马尔可夫链的SGD算法具有独立意义,特别是在非强凸和非光滑设置中。

English abstract

This paper develops asymptotic theory for quantile estimation via stochastic gradient descent (SGD) with a constant learning rate. The quantile loss function is neither smooth nor strongly convex. Beyond conventional perspectives and techniques, we view quantile SGD iteration as an irreducible, periodic, and positive recurrent Markov chain, which cyclically converges to its unique stationary distribution regardless of the arbitrarily fixed initialization. To derive the exact form of the stationary distribution, we analyze the structure of its characteristic function by exploiting the stationary equation. We also derive tight bounds for its moment generating function (MGF) and tail probabilities. Synthesizing the aforementioned approaches, we prove that the centered and standardized stationary distribution converges to a Gaussian distribution as the learning rate $\eta\rightarrow0$. This finding provides the first central limit theorem (CLT)-type theoretical guarantees for the quantile SGD estimator with constant learning rates. We further propose a recursive algorithm to construct confidence intervals of the estimators with statistical guarantees. Numerical studies demonstrate the effective finite-sample performance of the online estimator and inference procedure. The theoretical tools developed in this study are of independent interest for investigating general SGD algorithms formulated as Markov chains, particularly in non-strongly convex and non-smooth settings.
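
To make the recursion in this abstract concrete, here is a minimal Python sketch of constant-learning-rate SGD on the quantile (pinball) loss. It is an illustration only, not the authors' code: the function name `quantile_sgd`, the step size 0.01, the simulated Gaussian data, and the tail averaging at the end are all assumptions made for the example, and the paper's confidence-interval algorithm is not reproduced.

```python
import random


def quantile_sgd(stream, tau, eta, theta0=0.0):
    """Constant-learning-rate SGD on the quantile (pinball) loss.

    Returns the list of iterates theta_1, ..., theta_n.
    """
    theta = theta0
    iterates = []
    for x in stream:
        # A subgradient of the pinball loss in theta is 1{x <= theta} - tau.
        indicator = 1.0 if x <= theta else 0.0
        theta = theta - eta * (indicator - tau)
        iterates.append(theta)
    return iterates


# Hypothetical data: standard normal draws, whose median is 0.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

path = quantile_sgd(data, tau=0.5, eta=0.01)

# With a constant step the iterates keep fluctuating around the quantile,
# so average the second half of the path to get a point estimate.
tail = path[len(path) // 2:]
median_estimate = sum(tail) / len(tail)
```

Each update moves the iterate by either `eta * tau` or `eta * (1 - tau)`, so the iterates never settle; the abstract's result concerns the distribution of exactly these fluctuations as `eta` shrinks.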

2209.13686 2026-06-12 stat.ME updated version

False Discovery Rate Adjustments for Average Significance Level Controlling Tests

平均显著性水平控制检验的错误发现率调整

Timothy B. Armstrong

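AI summary: Shows that the Benjamini-Hochberg procedure still controls FDR asymptotically when tests control significance level only on average, and that certain adjustments for dependent p-values remain valid in finite samples, enabling FDR control in nonparametric and high-dimensional settings.

AI Chinese abstract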

多重检验调整,例如控制错误发现率(FDR)的Benjamini & Hochberg(1995)逐步上升程序,通常应用于在经典意义上控制显著性水平的检验族:对于每个单独检验,错误拒绝的概率不超过名义水平。在本文中,我们考虑仅满足较弱显著性水平控制概念的检验,其中错误拒绝的概率只需在假设上平均控制。我们发现,Benjamini & Hochberg(1995)逐步上升程序在具有许多弱相关p值和拒绝数量增加的渐近情况下仍然控制FDR,并且对相关p值的某些调整(例如Benjamini & Yekutieli(2001)程序)在有限样本中继续产生FDR控制。我们的结果为在非参数和高维设置中采用FDR控制程序打开了大门,其中弱化推断概念可能允许提高功效。

English abstract

Multiple testing adjustments, such as the Benjamini & Hochberg (1995) step-up procedure for controlling the false discovery rate (FDR), are typically applied to families of tests that control significance level in the classical sense: for each individual test, the probability of false rejection is no greater than the nominal level. In this paper, we consider tests that satisfy only a weaker notion of significance level control, in which the probability of false rejection need only be controlled on average over the hypotheses. We find that the Benjamini & Hochberg (1995) step-up procedure still controls FDR in the asymptotic regime with many weakly dependent p-values and an increasing number of rejections, and that certain adjustments for dependent p-values such as the Benjamini & Yekutieli (2001) procedure continue to yield FDR control in finite samples. Our results open the door to FDR controlling procedures in nonparametric and high dimensional settings where weakening the notion of inference may allow for power improvements.
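
For reference, the Benjamini and Hochberg (1995) step-up procedure named in this abstract can be sketched in a few lines of Python. This is the classical textbook procedure, not the paper's analysis under average significance level control; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q):
    """Return the sorted indices rejected by the BH step-up procedure at level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    return sorted(order[:k_max])


# Made-up p-values for illustration.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
```

Because the rule looks for the largest qualifying rank, a hypothesis can be rejected even when its own p-value exceeds its threshold, provided a later ordered p-value falls below its threshold.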

2504.16279 2026-06-12 math.ST cs.IT stat.AP updated version

Sharp Detection Threshold for Correlation among Multiple Unlabeled Gaussian Networks

多个未标记高斯网络之间相关性的尖锐检测阈值

Taha Ameen, Bruce Hajek

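AI summary: Studies the hypothesis test of whether m ≥ 2 complete weighted graphs with Gaussian edge weights are mutually correlated after unknown vertex relabelings, identifies the sharp information-theoretic detection threshold for fixed m, and shows that there is no detection-recovery gap.

AI Chinese abstract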

本文研究假设检验问题,判断$m \geq 2$个具有高斯边权的完全加权图在顶点未知重标号后是否互相关。在零模型下,所有边权独立服从标准高斯分布;而在植入模型下,图共享一个潜在顶点对齐,每对对应边权具有相关性$\rho$。对于固定$m$,我们确定了检测的尖锐信息论阈值。在阈值之上,广义似然比检验实现强检测;而在阈值之下,即使弱检测也不可能。该结果将Wu、Xu和Yu的双图检测阈值推广到任意固定数量的图,展示了一种侧信息机制,其中仅两个图不足但多个图可实现检测,并且与Vassaux和Massoulié的恢复阈值一起表明,该高斯多图模型不存在检测-恢复间隙。

English abstract

This paper studies the hypothesis testing problem of deciding whether $m \geq 2$ complete weighted graphs with Gaussian edge weights are mutually correlated after unknown relabelings of their vertices. Under the null model all edge weights are independent standard Gaussians, whereas under the planted model the graphs share a latent vertex alignment and each pair of corresponding edge weights has correlation $\rho$. For fixed $m$, we identify the sharp information-theoretic threshold for detection. Above the threshold, a generalized likelihood-ratio test achieves strong detection, whereas even weak detection is impossible below the threshold. The result extends the two-graph detection threshold of Wu, Xu, and Yu to any fixed number of graphs, exhibits a side-information regime in which two graphs alone are insufficient but multiple graphs enable detection, and, together with the recovery threshold of Vassaux and Massoulié, shows that this Gaussian multi-graph model has no detection-recovery gap.

2. Bayesian Statistics and Probabilistic Modeling (5 papers)

2605.00432 2026-06-12 cs.LG stat.ML updated version

Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

贝叶斯共形预测的最优时空解耦

Yu-Hsueh Fang, Chia-Yen Lee

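AI summary: Proposes State-Adaptive Bayesian Conformal Prediction (SA-BCP), which balances long-term temporal inertia against local spatial evidence through a gated convex combination to achieve fast adaptation and stable coverage under distribution shift, and gives a closed-form MSE-optimal threshold together with regret bounds for the online selection procedure.

AI Chinese abstract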

在线共形预测必须在快速适应分布漂移与稳定覆盖之间取得平衡:基于反馈的方法反应迅速但变得不稳定,而强折扣贝叶斯方法滞后并在紧密覆盖下膨胀区间。我们引入了状态自适应贝叶斯共形预测(SA-BCP),它将预测分位数形成为长期时间惯性与来自核密度估计的局部空间证据的门控凸组合,由单个可解释的证据阈值$K$控制。我们建立了三个结果:(i) 所得区间的渐近边际有效性;(ii) MSE最优阈值的闭式表达式$K^*_{\mathrm{MSE}}=\alpha(1-\alpha)/M^{\mathcal{T}}$,权衡了覆盖指标(伯努利)方差与时间结构偏差$M^{\mathcal{T}}$;(iii) 在线选择$K$的滚动起点过程——在平稳性下一致,对最佳固定$K$具有$O(\sqrt{T\log N})$遗憾,对于分段变体,在有界漂移下具有次线性动态遗憾界。在四个金融波动率和天气数据集、三个目标覆盖水平以及八个基线(包括最强的最近条件分位数方法SPCI和KOWCPI)上,SA-BCP在大多数设置中达到或超过名义覆盖,同时产生显著更窄的区间——在最紧密覆盖下,Winkler得分比折扣贝叶斯CP低约$3\times$——覆盖匹配审计确认这些效率提升并非欠覆盖的假象。我们披露了一个主要限制:一个专门针对波动率的共形GARCH竞争对手在其主波动率基序列上仍然更高效,尽管它不能跨领域迁移。

English abstract

Online conformal prediction must balance fast adaptation to distribution shift against stable coverage: feedback-driven methods react quickly but become volatile, while strongly discounted Bayesian methods lag and inflate intervals at tight coverage. We introduce State-Adaptive Bayesian Conformal Prediction (SA-BCP), which forms the predictive quantile as a gated convex combination of long-term temporal inertia and local spatial evidence from a kernel density estimate, controlled by a single interpretable evidence threshold $K$. We establish three results: (i) asymptotic marginal validity of the resulting intervals; (ii) a closed-form expression for the MSE-optimal threshold, $K^*_{\mathrm{MSE}}=\alpha(1-\alpha)/M^{\mathcal{T}}$, trading the coverage-indicator (Bernoulli) variance against the temporal structural bias $M^{\mathcal{T}}$; and (iii) a rolling-origin procedure for selecting $K$ online -- consistent under stationarity, with $O(\sqrt{T\log N})$ regret against the best fixed $K$ and, for a segmented variant, a sublinear dynamic-regret bound under bounded drift. Across four financial-volatility and weather datasets, three target coverage levels, and eight baselines (including the strongest recent conditional-quantile methods, SPCI and KOWCPI), SA-BCP attains at-or-above-nominal coverage in most settings while producing substantially sharper intervals -- up to roughly $3\times$ lower Winkler score than discounted Bayesian CP at the tightest coverage -- and a coverage-matched audit confirms these efficiency gains are not an artifact of under-coverage. We disclose one principal limitation: a volatility-specialized conformal-GARCH competitor remains more efficient on its home volatility-base series, though it does not transfer across domains.

2602.08913 2026-06-12 cs.LG stat.ML updated version

GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems

GEMSS: 一种用于在分类和回归问题中发现多个稀疏解的变分贝叶斯方法

Kateřina Henclová, Václav Šmídl

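AI summary: Proposes GEMSS, a variational inference algorithm that uses a structured spike-and-slab prior, a Gaussian-mixture approximate posterior, and a Jaccard-based penalty to discover multiple diverse sparse feature combinations simultaneously; it outperforms the compared methods across 128 experiments on artificial problems and is demonstrated on 3 real-world datasets.

AI Chinese abstract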

高维、欠定且高度相关的系统在数据科学实践中很常见,尤其是在分析物理测量时。在这种情况下,特征选择面临根本性挑战,因为多个不同的稀疏子集可能同样好地解释响应。识别这些子集不仅对预测建模至关重要,而且对生成关于潜在机制的领域特定见解也至关重要。然而,传统方法通常只隔离单个解,掩盖了全部合理的解释。本文介绍了GEMSS(高斯集成多稀疏解),一种变分算法,旨在同时发现多个多样化的稀疏特征组合。该方法采用结构化spike-and-slab先验实现稀疏性,使用高斯混合近似难以处理的多模态后验,并引入基于Jaccard的惩罚进一步控制解的多样性。通过随机梯度下降优化单个目标函数。该方法通过一个新的基准测试框架在128个综合实验上进行测试,该框架旨在生成具有相同预测属性的多个稀疏解的人工问题。这使我们能够测量真实特征的检索,而不仅仅是评估预测性能——这些特征更符合我们的实际需求。比较分析表明,GEMSS始终优于通过ALFESE框架适配的五种著名特征选择方法。最后,我们通过来自代谢组学和物理化学的3个具有挑战性的真实世界数据集展示了其实用性:GEMSS成功分离出多个不同但质量高的解。GEMSS作为PyPI包'gemss'提供。相应的存储库此http URL包含完整的代码库和免费的无代码应用程序GEMSS Explorer。

English abstract

High-dimensional, underdetermined and highly correlated systems are common in data science practice, especially when analyzing physical measurements. In such settings, feature selection poses a fundamental challenge because multiple distinct sparse subsets may explain the response equally well. Their identification is crucial not only for predictive modeling but also for generating domain-specific insights into the underlying mechanisms. Yet, conventional methods typically isolate a single solution, obscuring the full spectrum of plausible explanations. This work introduces GEMSS (Gaussian Ensemble for Multiple Sparse Solutions), a variational algorithm designed to simultaneously discover multiple, diverse sparse feature combinations. The method employs a structured spike-and-slab prior for sparsity, a mixture of Gaussians to approximate the intractable multimodal posterior, and a Jaccard-based penalty to further control solution diversity. A single objective function is optimized via stochastic gradient descent. The method is tested in 128 comprehensive experiments using a novel benchmarking framework designed to generate artificial problems with multiple sparse solutions of equal predictive properties. This allows us to measure the retrieval of ground-truth features, which better matches practical needs, rather than only evaluating predictive performance. A comparative analysis shows that GEMSS consistently outperforms five prominent feature selection methods adapted through the ALFESE framework. Finally, we demonstrate practical usability through 3 challenging real-world datasets from metabolomics and physical chemistry: GEMSS successfully isolates multiple distinct yet high-quality solutions. GEMSS is available as a PyPI package 'gemss'. The corresponding repository this http URL includes the full codebase and a free, no-code application GEMSS Explorer.

2512.25056 2026-06-12 stat.ME updated version

Sequential Bayesian parameter-state estimation in dynamical systems with noisy and incomplete observations via a variational framework

基于变分框架的含噪声不完全观测动态系统序贯贝叶斯参数-状态估计

Liliang Wang, Alex Gorodetsky

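AI summary: Proposes an online variational inference framework that factorizes the approximate joint posterior into a marginal over the parameters and a state distribution conditioned on the parameters, enabling joint parameter-state estimation for dynamical systems, with a theorem bounding the approximation error; numerical experiments demonstrate robustness and scalability on chaotic and high-dimensional systems.

Comments
31 pages, 8 figures
AI Chinese abstract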

在许多应用中,对动态模型的未知参数和状态进行在线联合估计并量化不确定性至关重要。例如,数字孪生动态更新其对模型参数和状态的知识以支持预测和决策。可靠性和计算速度对数字孪生至关重要。在线参数-状态估计确保了计算效率,而不确定性量化对于做出可靠的预测和决策至关重要。在参数-状态估计中,以数据为条件的状态和模型参数的联合分布(称为联合后验)提供了准确的不确定性量化。由于联合后验通常难以计算,本文提出一个在线变分推断框架,在每个时间步计算其近似。该近似被分解为模型参数的边缘分布和以参数为条件的状态分布。这种分解通过两阶段过程实现递归更新:首先,通过变分推断近似参数后验;其次,基于近似参数后验使用高斯滤波计算以参数为条件的状态分布。算法设计由一个定理支持,该定理建立了联合后验近似误差的上界。数值实验表明,所提出的方法(i)准确推断动态和观测模型的未观测状态和未知参数;(ii)在混沌Lorenz'96系统中,在噪声、部分观测和模型偏差下保持鲁棒性;(iii)有效扩展到由对流-扩散方程空间离散化产生的高维状态空间系统,在此设置下优于联合集成卡尔曼滤波器。

English abstract

Online joint estimation of a dynamical model's unknown parameters and states with uncertainty quantification is crucial in many applications. For example, digital twins dynamically update their knowledge of model parameters and states to support prediction and decision-making. Reliability and computational speed are vital for digital twins. Online parameter-state estimation ensures computational efficiency, while uncertainty quantification is essential for making reliable predictions and decisions. In parameter-state estimation, the joint distribution of the state and model parameters conditioned on the data, termed the joint posterior, provides accurate uncertainty quantification. Because the joint posterior is generally intractable to compute, this paper presents an online variational inference framework to compute its approximation at each time step. The approximation is factorized into a marginal distribution over the model parameters and a state distribution conditioned on the parameters. This factorization enables recursive updates through a two-stage procedure: first, the parameter posterior is approximated via variational inference; second, the state distribution conditioned on the parameters is computed using Gaussian filtering based on the approximate parameter posterior. The algorithmic design is supported by a theorem establishing upper bounds on the joint posterior approximation error. Numerical experiments demonstrate that the proposed method (i) accurately infers both unobserved states and unknown parameters of dynamical and observation models; (ii) remains robust under noisy, partial observations and model discrepancies in a chaotic Lorenz'96 system; and (iii) scales effectively to a high-dimensional state-space system arising from the spatial discretization of a convection-diffusion equation, outperforming the joint ensemble Kalman filter in this setting.

2408.17346 2026-06-12 stat.ME stat.CO updated version

On Nonparanormal Likelihoods

关于非参数正态似然

Torsten Hothorn

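AI summary: Proposes a one-step estimation framework for nonparanormal models, defining four novel log-likelihood functions for the joint estimation of all parameters, and demonstrates it on transformation discriminant analysis.

AI Chinese abstract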

非参数正态模型通过潜在高斯(即参数)copula 描述多元响应的联合分布,同时允许灵活的非参数边际。这些分布的某些方面(例如条件独立性)是参数化的。其他特征(如边际分布)可以是非参数或半参数化的。当多元正态性可疑但可解释性至关重要时,此类模型具有吸引力。大多数估计过程执行两步:首先估计非参数部分。然后处理 copula 参数,将边际估计视为已知。这对于某些应用是足够的。对于其他应用,例如当半参数边际包含感兴趣的参数或标准误差很重要时,所有参数的联合估计可能更有利。我们提出了非参数正态模型的合适参数化,可能包括半参数效应,并定义了四种新颖的非参数正态对数似然函数。通常,相应的单步优化问题被证明是非凸的。然而,在某些情况下,会出现双凸问题。讨论了几种凸近似。从底层计算角度来看,核心贡献是通过 Genz 过程计算的多元正态对数概率的得分函数。为了展示理论和计算框架的通用性,我们提出了一系列用于变换判别分析的非参数正态模型,其中一些生物标志物受到检测限问题的影响。在模拟研究中,针对半参数有效多分格相关分析(存在理论基准),展示了全最大似然估计相比两步方法可能带来的经验增益。

English abstract

Nonparanormal models describe the joint distribution of multivariate responses via latent Gaussian, and thus parametric, copulae while allowing flexible nonparametric marginals. Some aspects of such distributions, for example conditional independence, are formulated parametrically. Other features, such as marginal distributions, can be formulated non- or semiparametrically. Such models are attractive when multivariate normality is questionable but interpretability is paramount. Most estimation procedures perform two steps, first estimating the nonparametric part. The copula parameters come second, treating the marginal estimates as known. This is sufficient for some applications. For other applications, e.g. when a semiparametric margin features parameters of interest or when standard errors are important, a simultaneous estimation of all parameters might be more advantageous. We present suitable parameterisations of nonparanormal models, possibly including semiparametric effects, and define four novel nonparanormal log-likelihood functions. In general, the corresponding one-step optimisation problems are shown to be non-convex. In some cases, however, biconvex problems emerge. Several convex approximations are discussed. From a low-level computational point of view, the core contribution is the score function for multivariate normal log-probabilities computed via the Genz procedure. As a demonstration of the versatility of the theoretical and computational framework, we present a series of nonparanormal models for transformation discriminant analysis when some biomarkers are subject to limit-of-detection problems. Possible empirical gains of full maximum likelihood estimation compared to two-step approaches are illustrated in a simulation study targeting semiparametrically efficient polychoric correlation analysis, where a theoretical benchmark is available.

2411.07651 2026-06-12 stat.ME stat.ML updated version

Quasi-Bayes empirical Bayes: a sequential approach to the Poisson compound decision problem

拟贝叶斯经验贝叶斯:泊松复合决策问题的序贯方法

Stefano Favaro, Sandra Fortini

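AI summary: For the Poisson compound decision problem with streaming data, proposes a quasi-Bayesian sequential estimate based on Newton's algorithm with constant per-observation cost, and proves its consistency and asymptotic optimality.

Comments
49 pages
AI Chinese abstract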

泊松复合决策问题是统计学中一个长期存在的问题,经验贝叶斯方法常用于在静态或批量设置中估计泊松均值。我们在流式或在线框架中考虑该问题。基于牛顿算法的拟贝叶斯方法,我们开发了一个易于评估、计算高效且随着数据积累具有常数每观测成本的序贯估计。我们为所提出的估计建立了频率学派保证,包括一致性和渐近最优性,其中最优性理解为渐近消失的超额贝叶斯风险或遗憾。通过模拟研究和与基准程序的比较评估了实证性能。

English abstract

The Poisson compound decision problem is a long-standing problem in statistics, for which empirical Bayes methods are commonly used to estimate Poisson means in static or batch settings. We consider this problem in a streaming, or online, framework. Building on a quasi-Bayesian approach based on Newton's algorithm, we develop a sequential estimate that is easy to evaluate, computationally efficient, and has constant per-observation cost as the data accrue. We establish frequentist guarantees for the proposed estimate, including consistency and asymptotic optimality, with optimality understood as vanishing excess Bayes risk, or regret. Empirical performance is assessed through simulation studies and comparisons with benchmark procedures.
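
The abstract builds on Newton's recursive algorithm for estimating a mixing distribution. A minimal grid-based sketch of that generic recursion for a Poisson mixture follows. It is not the authors' estimator: the discretized grid of candidate means, the step sizes 1/(n+1), the made-up counts, and the names `poisson_pmf` and `newton_update` are all assumptions chosen for illustration.

```python
import math


def poisson_pmf(x, theta):
    """Poisson probability of count x at mean theta > 0, computed in log space."""
    return math.exp(-theta + x * math.log(theta) - math.lgamma(x + 1))


def newton_update(grid, weights, x, step):
    """One step of Newton's recursion on a fixed grid of candidate Poisson means.

    Returns the updated weights and the posterior mean of theta given x
    under the weights held before the update.
    """
    posterior = [w * poisson_pmf(x, t) for w, t in zip(weights, grid)]
    norm = sum(posterior)
    posterior = [p / norm for p in posterior]
    post_mean = sum(t * p for t, p in zip(grid, posterior))
    # Convex combination of the current estimate and the one-observation posterior.
    new_weights = [(1.0 - step) * w + step * p for w, p in zip(weights, posterior)]
    return new_weights, post_mean


grid = [0.1 * j for j in range(1, 101)]      # candidate means 0.1, ..., 10.0
weights = [1.0 / len(grid)] * len(grid)      # flat initial guess
counts = [3, 0, 5, 2, 4, 1, 7, 3]            # made-up streaming counts
estimates = []
for n, x in enumerate(counts, start=1):
    weights, post_mean = newton_update(grid, weights, x, step=1.0 / (n + 1))
    estimates.append(post_mean)
```

Each update touches every grid point once and never revisits earlier observations, so the work per observation depends on the grid size but not on how many counts have already arrived.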

3. Causal Inference and Experimental Design (5 papers)

2606.04009 2026-06-12 stat.ML cs.AI cs.LG updated version

Counterfactual Explanations for Deep Two-Sample Testing

深度双样本检验的反事实解释

Wei-Cheng Lai, Marco Simnacher, Christoph Lippert

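AI summary: For deep two-sample testing, proposes a counterfactual explanation framework based on a diffusion autoencoder and MMD optimization that generates sample-level edits revealing the features that drive rejection of the null hypothesis.

Comments
17 pages
AI Chinese abstract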

双样本检验是检测科学领域中分布差异的基本工具,但经典检验(包括基于核的检验)在高维结构化数据(如图像)上可能效果不佳。最近的深度双样本检验通过学习信息表示提高了这些场景下的灵敏度,但它们对哪些数据特征驱动拒绝原假设 $H_0$ 提供的洞察有限。为解决此问题,我们提出了一种用于深度双样本检验的反事实解释框架,该框架生成样本级编辑,将观测值从源组移向目标组,同时明确减少检验所测量的差异。我们的方法将扩散自编码器与预训练的深度双样本检验模型相结合,并在检验模型的表示空间中优化最大均值差异(MMD)目标,以生成合理的反事实。我们通过检验统计量和由此产生的双样本p值的变化来量化分布级效应。我们在合成2D形状数据集和两个MRI队列上评估了该方法。在这两种设置下,反事实变换相对于原始样本持续增加p值,表明编辑后的源集在检验下在统计上更接近目标分布。我们使用LPIPS测量最小性,以确保反事实保持接近原始样本。由此产生的编辑提供了与检测到的组差异相关的特征的可解释证据。在MRI上,局部变化与队列之间已知的解剖学差异一致。

English abstract

Two-sample testing is a fundamental tool for detecting distributional differences across scientific domains, but classical tests (including kernel-based tests) can be ineffective on high-dimensional structured data such as images. Recent deep two-sample tests improve sensitivity in these settings by learning informative representations, yet they provide limited insight into which data features drive rejection of the null hypothesis $H_0$. To address this issue, we propose a counterfactual explanation framework for deep two-sample testing that generates sample-level edits moving observations from a source group toward a target group while explicitly reducing the discrepancy measured by the test. Our method combines a diffusion autoencoder with a pretrained deep two-sample test model and optimizes a maximum mean discrepancy (MMD) objective in the test model's representation space to produce plausible counterfactuals. We quantify distribution-level effects through changes in the test statistic and the resulting two-sample p-values. We evaluate the method on synthetic 2D shape datasets and two MRI cohorts. Across both settings, the counterfactual transformations consistently increase p-values relative to the original samples, indicating that the edited source set becomes statistically closer to the target distribution under the test. We measure minimality using LPIPS to ensure the counterfactuals remain close to the original samples. The resulting edits provide interpretable evidence of the features associated with the detected group differences. On MRI, the localized changes are consistent with known anatomical differences between cohorts.
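
The maximum mean discrepancy (MMD) objective mentioned in this abstract can be illustrated with a small Python sketch. It assumes a Gaussian kernel with a fixed bandwidth applied directly to raw 2D points, whereas the paper evaluates the MMD in a pretrained test model's representation space; the function names and the simulated samples are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Simulated groups: the target is the source distribution shifted by 3 per axis.
rng = random.Random(0)
source = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(100)]
target = [(rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)) for _ in range(100)]

gap_before = mmd_squared(source, target)

# A crude stand-in for a counterfactual edit: move each source point toward the target.
edited = [(x + 3.0, y + 3.0) for x, y in source]
gap_after = mmd_squared(edited, target)
```

In this toy setup the edit is a known shift, so the discrepancy drops sharply; the framework in the abstract instead learns edits that reduce the discrepancy while staying close to the original samples.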

2605.18724 2026-06-12 stat.ME updated version

Sensitivity analysis for causal mediation: bridge score, sharp sensitivity bounds, and calibration

因果中介的敏感性分析:桥分数、精确敏感性界限和校准

Yuki Ohnishi, Fan Li

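AI summary: Proposes the bridge score as a mediator-stage balancing score, derives a sharp pointwise bound expressed through two interpretable latent confounding parameters, and introduces two calibration approaches to make the sensitivity analysis operational.

Comments
33 pages
AI Chinese abstract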

因果中介分析将总处理效应分解为通过假设中介变量起作用的部分和残余直接部分。自然直接和间接效应的识别通常依赖于顺序可忽略性的中介阶段,这无法通过经验验证,需要明确的敏感性分析。我们引入了桥分数,这是一种由两个处理特定的中介密度在共同中介值处形成的低维向量,并展示了它是顺序可忽略性中介阶段的平衡分数。在桥分数条件下,我们推导出一个精确的逐点包络,用两个可解释的潜在混淆参数来表达未识别的中介-结果混淆函数。为了使该界限适用于敏感性分析,我们进一步引入了两种校准方法。第一种是针对观测协变量的基准校准,包括一种基于排名的版本,其对基准的单调重新表达具有不变性;第二种是基于残余结果变异的残差预算校准。最后,我们展示如何通过标量泛函约简和贝叶斯g-计算算法将逐点界限用于推断,将所有不确定性源传播到中介效应估计的后验抽样中。

English abstract

Causal mediation analysis decomposes the total treatment effect into a portion operating through a hypothesized mediator and a residual direct portion. Identification of natural direct and indirect effects typically rests on the mediator stage of sequential ignorability, which cannot be empirically verified and requires explicit sensitivity analysis. We formulate the bridge score, a mediator-stage balancing score, as a low-dimensional vector formed from the two treatment-specific mediator densities at a common mediator value, and show that it balances baseline covariates for the mediator stage relevant to natural effect identification. Conditional on the bridge score, we derive a sharp pointwise variance envelope on the unidentified mediator-outcome confounding function in terms of latent outcome relevance and residual selection. To make the bound operational for sensitivity analysis, we further introduce a residual budget calibration approach based on local residual outcome variation and record a complementary range bound for support-based restrictions. Finally, we show how the pointwise bound can be operationalized for inference through a scalar functional reduction and a Bayesian g-computation algorithm that combines observed-data posterior uncertainty with user-specified sensitivity uncertainty, rather than treating the unidentified sensitivity corrections as learned from the likelihood.

2604.23534 2026-06-12 stat.ME stat.AP updated version

Multivariate incremental effects for continuous treatments: Studying the health effects of environmental mixtures

连续型处理变量的多元增量效应:研究环境混合物的健康影响

Zhuochao Huang, Kejin Dong, Tuo Lin, Joseph Antonelli

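AI summary: To address positivity violations with multivariate continuous exposures such as air pollution mixtures, proposes a causal inference framework based on exponential tilting, defines causal estimands that allow a fair comparison of different intervention directions, develops efficient one-step estimators and a Riemannian BFGS algorithm among other methodological results, and applies the framework to a nationwide environmental health dataset to identify the optimal intervention strategy for a PM2.5 chemical mixture.

AI Chinese abstract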

评估多元连续暴露(如空气污染混合物)的因果健康效应是一项关键的公共卫生挑战。主要障碍是正性假设经常被违反,这使得标准确定性干预的效果无法识别或严重依赖于不可靠的模型外推。在本文中,我们开发了一个新的因果推断框架来应对这一挑战。我们将指数倾斜扩展到多元暴露,并解决了如何公平比较不同干预方向的关键问题。这建立了一个系统框架,用于定义和评估各种政策相关的因果估计量,使研究人员能够解决不同的科学问题。我们开发了许多方法论进展,包括高效的一步估计策略、用于求解约束流形优化问题的黎曼BFGS算法、因果估计量的半参数效率界、估计量的极小极大速率,并建立了渐近正态性。我们通过将框架应用于全国环境健康数据集来展示其实用性,以确定减少与PM$_{2.5}$化学混合物相关的不良健康结果的最优策略。

English abstract

Evaluating the causal health effects of multivariate, continuous exposures, such as air pollution mixtures, is a critical public health challenge. A primary obstacle is the frequent violation of the positivity assumption, which renders the effects of standard deterministic interventions unidentified or heavily reliant on unreliable model extrapolation. In this paper, we develop a novel causal inference framework to address this challenge. We extend exponential tilting to multivariate exposures and address the critical question of how to compare different intervention directions fairly. This establishes a systematic framework for defining and evaluating various policy-relevant causal estimands, allowing researchers to address diverse scientific questions. We develop numerous methodological advancements, including efficient one-step estimation strategies, a Riemannian BFGS algorithm to solve a constrained manifold optimization problem, semiparametric efficiency bounds for causal estimands, minimax rates for estimators, and asymptotic normality results. We demonstrate our framework's utility by applying it to a nationwide environmental health dataset to identify the optimal strategy for reducing adverse health outcomes associated with a PM$_{2.5}$ chemical mixture.

2410.00903 2026-06-12 stat.AP cs.CL cs.LG updated version

Causal Inference with Generative Artificial Intelligence: Application to Texts as Treatments

基于生成式人工智能的因果推断:以文本作为处理变量

Kosuke Imai, Kentaro Nakamura

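AI summary: Proposes using generative AI such as large language models to generate treatments and using the model's internal representation for causal effect estimation, which removes the need to learn causal representations from data and improves the accuracy and efficiency of the estimates.

AI Chinese abstract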

在本文中,我们展示了如何利用生成式人工智能(GenAI)的力量,增强以文本等高维非结构化数据作为处理变量时的因果推断有效性。具体而言,我们提出使用深度生成模型(如大语言模型,LLMs)高效地生成处理变量,并利用其内部表示进行后续的因果效应估计。我们表明,了解这种真实内部表示有助于将感兴趣的处理特征(如特定情感和某些主题)与其他可能未知的混淆特征分离开来。与现有方法不同,所提出的GenAI驱动推断(GPI)方法无需从数据中学习因果表示,因此能产生更准确和高效的估计。我们正式建立了非参数识别平均处理效应所需的条件,提出了一种避免重叠假设违反的估计策略,并通过应用双重机器学习推导了所提出估计量的渐近性质。最后,利用工具变量方法,我们将所提出的GPI方法扩展到处理特征基于人类感知的场景。GPI也适用于文本复用,即使用LLM重新生成现有文本。我们进行了模拟和实证研究,使用开源LLM Llama 3生成的文本数据,展示了我们的估计器相对于最先进的因果表示学习算法的优势。

English abstract

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use deep generative models such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed GPI methodology to settings in which the treatment feature is based on human perception. The GPI is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.

2111.08157 2026-06-12 econ.EM math.ST stat.ME updated version

Fine Stratification of Survey Experiments

调查实验的精细分层

Max Cytrynbaum

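AI summary: Studies a two-stage model of experimentation in which fine stratification is implemented through matched k-tuples randomization, develops a fast matching algorithm, shows that the design reduces the variance of treatment effect estimation, and provides inference methods that fully exploit the design's efficiency gains.

AI Chinese abstract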

本文研究了一个两阶段实验模型,其中研究者首先从符合条件的池中抽样具有代表性的实验参与者,然后使用匹配的$k$元组随机化将每个抽样单元分配到处理组或对照组。为了实现这种设计,我们开发了一种快速的新算法,用于将单元匹配成$k$元组,适用于任意$k \ge 2$和任意维度的协变量。通过调查200篇近期实验工作论文,我们估计该算法新近实现了多变量精细分层,并为经济学中约44%的实验提供了可证明的匹配质量保证。我们表明,精细分层抽样和分配都非参数地降低了处理效应估计的方差,其中分层抽样的收益随着合格池的大小以及协变量预测处理效应异质性的程度而增加。我们开发了新的推断方法,充分利用两个设计阶段的效率提升,允许研究者报告更小的标准误,如果他们设计了代表性实验。对九个已发表实验的应用量化了效率提升。

English abstract

This paper studies a two-stage model of experimentation, where the researcher first samples representative experimental participants from an eligible pool, then assigns each sampled unit to treatment or control, using matched $k$-tuples randomization at both stages. To implement such designs, we develop a fast new algorithm for matching units into $k$-tuples for any $k \ge 2$ and any dimension of covariates. By surveying 200 recent experimental working papers, we estimate that our algorithm newly enables multivariate fine stratification with provable match quality guarantees for about 44% of experiments in economics. We show that finely stratified sampling and assignment both nonparametrically reduce the variance of treatment effect estimation, with the gains from stratified sampling increasing in the size of the eligible pool and how well covariates predict treatment effect heterogeneity. We develop new inference methods that fully exploit the efficiency gains from both design stages, allowing researchers to report smaller standard errors if they designed a representative experiment. An application to nine published experiments quantifies the efficiency gains.

4. Computational Statistics and MCMC (5 papers)

2602.03165 2026-06-12 stat.ME stat.ML updated version

Entropic Mirror Monte Carlo

熵镜像蒙特卡洛

Anas Cherradi (LPSM (UMR_8001), SU), Yazid Janati, Alain Durmus (CMAP), Sylvain Le Corff (LPSM (UMR_8001), SU), Yohan Petetin, Julien Stoehr (CEREMADE)

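AI summary: Proposes an adaptive importance sampling scheme that builds efficient proposal distributions by combining global sampling mechanisms with a delayed weighting procedure, enabling effective exploration of multimodal, high-dimensional targets, and proves geometric convergence under mild assumptions.

AI Chinese abstract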

重要性采样是一种蒙特卡洛方法,它利用来自提议分布的加权样本设计目标分布下期望的估计量。当目标分布复杂时,例如高维空间中的多模态分布,重要性采样的效率关键取决于提议分布的选择。在本文中,我们提出了一种新的自适应方案来构建高效的提议分布。我们的算法通过将全局采样机制与延迟加权过程相结合,促进了对目标分布的有效探索。所提出的加权机制通过在提议分布与目标适应不良的区域实现快速重采样,发挥了关键作用。我们的采样算法在温和假设下被证明是几何收敛的,并通过各种数值实验进行了说明。

English abstract

Importance sampling is a Monte Carlo method which designs estimators of expectations under a target distribution using weighted samples from a proposal distribution. When the target distribution is complex, such as multimodal distributions in high-dimensional spaces, the efficiency of importance sampling critically depends on the choice of the proposal distribution. In this paper, we propose a novel adaptive scheme for the construction of efficient proposal distributions. Our algorithm promotes efficient exploration of the target distribution by combining global sampling mechanisms with a delayed weighting procedure. The proposed weighting mechanism plays a key role by enabling rapid resampling in regions where the proposal distribution is poorly adapted to the target. Our sampling algorithm is shown to be geometrically convergent under mild assumptions and is illustrated through various numerical experiments.
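
As background for this abstract, the basic self-normalized importance sampling estimator can be sketched in Python as below. It shows only the fixed-proposal building block, not the adaptive scheme or the delayed weighting the paper proposes; the function name `self_normalized_is`, the Gaussian target and proposal, and the sample size are assumptions made for the example.

```python
import math
import random


def self_normalized_is(f, log_target, log_proposal, samples):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    Both log densities may be unnormalized, since constants cancel in the ratio.
    Also returns the effective sample size of the weights.
    """
    log_w = [log_target(x) - log_proposal(x) for x in samples]
    shift = max(log_w)  # subtract the maximum to keep the exponentials stable
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    estimate = sum(wi * f(x) for wi, x in zip(w, samples)) / total
    ess = total ** 2 / sum(wi ** 2 for wi in w)
    return estimate, ess


rng = random.Random(0)
proposal_std = 2.0
samples = [rng.gauss(0.0, proposal_std) for _ in range(50_000)]

# Estimate E[X^2] under a standard normal target, whose true value is 1.
estimate, ess = self_normalized_is(
    f=lambda x: x * x,
    log_target=lambda x: -0.5 * x * x,                      # N(0, 1) up to a constant
    log_proposal=lambda x: -0.5 * (x / proposal_std) ** 2,  # N(0, 4) up to a constant
    samples=samples,
)
```

The effective sample size gives a rough sense of how well the proposal matches the target: the closer it is to the number of draws, the less the weights degenerate, which is the quantity an adaptive proposal tries to improve.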

2601.22003 2026-06-12 stat.ML cs.LG stat.CO updated version

Efficient Stochastic Optimisation via Sequential Monte Carlo

通过序贯蒙特卡洛实现高效随机优化

James Cuin, Davide Carbone, Yanbo Tang, O. Deniz Akyildiz

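AI summary: For optimisation problems with intractable gradients, proposes replacing expensive inner sampling loops with sequential Monte Carlo (SMC) samplers to obtain efficient stochastic optimisation, and demonstrates effectiveness on reward-tuning of energy-based models.

Comments
Accepted to ICML 2026
AI Chinese abstract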

在机器学习和统计学中,从最大边际似然估计过程到生成模型的微调,经常出现优化具有难处理梯度函数的问题。针对这类问题的随机近似方法通常需要内部采样循环来获得(有偏的)随机梯度估计,这很快会变得计算昂贵。在这项工作中,我们开发了用于优化具有难处理梯度函数的序贯蒙特卡洛(SMC)采样器。我们的方法用高效的SMC近似替代昂贵的内部采样方法,这可以带来显著的计算收益。我们为我们的方法所定义的基本递归建立了收敛结果,这些递归由SMC采样器近似。我们在各种设置下对能量模型的奖励调优展示了我们方法的有效性。

English abstract

The problem of optimising functions with intractable gradients frequently arises in machine learning and statistics, ranging from maximum marginal likelihood estimation procedures to fine-tuning of generative models. Stochastic approximation methods for this class of problems typically require inner sampling loops to obtain (biased) stochastic gradient estimates, which rapidly becomes computationally expensive. In this work, we develop sequential Monte Carlo (SMC) samplers for optimisation of functions with intractable gradients. Our approach replaces expensive inner sampling methods with efficient SMC approximations, which can result in significant computational gains. We establish convergence results for the basic recursions defined by our methodology which SMC samplers approximate. We demonstrate the effectiveness of our approach on the reward-tuning of energy-based models within various settings.

2512.23566 2026-06-12 math.DS cond-mat.stat-mech cs.LG math.OC stat.ML updated version

From geometry to dynamics: Learning overdamped Langevin dynamics from sparse observations with geometric constraints

从几何到动力学:基于几何约束从稀疏观测学习过阻尼朗之万动力学

Dimitra Maoutsa

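AI summary: Proposes a stochastic control framework that uses the geometry of the system's invariant density for path augmentation to recover overdamped Langevin dynamics from data sampled sparsely in time, without assuming a parametric model.

Comments
10+54 pages, 14 figures; accepted at ICML 2026. An earlier account of this work previously appeared in arXiv:2301.08102 and arXiv:2304.00423; the main methodology remains the same, and this version includes additional numerical experiments and theory.
AI Chinese abstract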

当随机系统的轨迹在时间上稀疏采样时,我们如何学习其动力学背后的规律?现有方法要么需要时间分辨的高频观测,要么依赖于仅适用于保守系统的几何论证,限制了它们能恢复的动力学范围。在这里,我们提出一个新的框架,通过将推断重新表述为随机控制问题来调和这两种观点。我们的方法使用几何驱动的路径增强,以系统不变密度的几何结构为指导,重构可能的轨迹并推断底层动力学,而不假设特定的参数模型。应用于过阻尼朗之万系统,我们的方法即使在极度欠采样数据下也能准确恢复随机动力学,在合成基准测试中优于现有方法。这项工作证明了将几何归纳偏差纳入随机系统识别方法的有效性。

English abstract

How can we learn the laws underlying the dynamics of stochastic systems when their trajectories are sampled sparsely in time? Existing methods either require temporally resolved high-frequency observations, or rely on geometric arguments that apply only to conservative systems, limiting the range of dynamics they can recover. Here, we present a new framework that reconciles these two perspectives by reformulating inference as a stochastic control problem. Our method uses geometry-driven path augmentation, guided by the geometry of the system's invariant density, to reconstruct likely trajectories and infer the underlying dynamics without assuming specific parametric models. Applied to overdamped Langevin systems, our approach accurately recovers stochastic dynamics even from extremely undersampled data, outperforming existing methods in synthetic benchmarks. This work demonstrates the effectiveness of incorporating geometric inductive biases into stochastic system identification methods.

2402.01779 2026-06-12 eess.IV cs.CV cs.LG stat.ML 版本更新

Plug-and-Play image restoration with Stochastic deNOising REgularization

即插即用图像恢复:随机去噪正则化

Marien Renaud, Jean Prost, Arthur Leclaire, Nicolas Papadakis

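AI summary: Proposes the SNORE framework, which applies the denoiser only to images with an adequate noise level and combines explicit stochastic regularization with stochastic gradient descent to solve inverse problems; it is competitive with state-of-the-art methods on deblurring and inpainting tasks.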

详情
AI中文摘要

即插即用(PnP)算法是一类迭代算法,通过结合物理模型和深度神经网络进行正则化来解决图像逆问题。尽管它们能产生令人印象深刻的图像恢复结果,但这些算法依赖于在迭代过程中噪声逐渐减小的图像上非标准地使用去噪器,这与最近基于扩散模型(DM)的算法形成对比,后者仅在重新加噪的图像上应用去噪器。我们提出了一种新的PnP框架,称为随机去噪正则化(SNORE),该框架仅在具有适当噪声水平的图像上应用去噪器。它基于显式的随机正则化,从而产生一种随机梯度下降算法来解决不适定逆问题。提供了该算法及其退火扩展的收敛性分析。实验上,我们证明SNORE在去模糊和修复任务上与最先进方法相比具有竞争力,无论是在定量还是定性方面。

英文摘要

Plug-and-Play (PnP) algorithms are a class of iterative algorithms that address image inverse problems by combining a physical model and a deep neural network for regularization. Although they produce impressive image restoration results, these algorithms rely on a non-standard use of a denoiser on images that are less and less noisy along the iterations, which contrasts with recent algorithms based on Diffusion Models (DM), where the denoiser is applied only to re-noised images. We propose a new PnP framework, called Stochastic deNOising REgularization (SNORE), which applies the denoiser only to images with noise of the adequate level. It is based on an explicit stochastic regularization, which leads to a stochastic gradient descent algorithm to solve ill-posed inverse problems. A convergence analysis of this algorithm and its annealing extension is provided. Experimentally, we show that SNORE is competitive with respect to state-of-the-art methods on deblurring and inpainting tasks, both quantitatively and qualitatively.
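The idea of applying the denoiser only to re-noised iterates can be illustrated with a small sketch. The update below is one plausible reading of the abstract, not a transcription of the paper's algorithm: each step adds Gaussian noise of level `sigma` to the current iterate, turns the denoiser residual into a regularization gradient through Tweedie's identity, and takes a gradient step on the data fidelity plus the weighted regularizer. The toy Gaussian prior, the closed-form denoiser, and all names and parameter values are assumptions chosen so that the result can be checked by hand.

```python
import numpy as np

def snore_sketch(grad_fidelity, denoiser, x0, sigma, alpha, step, n_iter, rng):
    """Stochastic gradient iterations with a denoiser applied to re-noised iterates."""
    x = np.asarray(x0, dtype=float).copy()
    iterates = np.empty((n_iter, x.size))
    for k in range(n_iter):
        x_noisy = x + sigma * rng.standard_normal(x.size)   # re-noise the current iterate
        # Tweedie's identity: (x_noisy - D(x_noisy)) / sigma^2 is minus the score of the noisy prior.
        grad_reg = (x_noisy - denoiser(x_noisy, sigma)) / sigma**2
        x = x - step * (grad_fidelity(x) + alpha * grad_reg)
        iterates[k] = x
    return iterates

# Assumed toy problem: prior N(0, I), observation y = x + unit-variance noise.
rng = np.random.default_rng(0)
y = np.array([1.0, -2.0, 0.5, 3.0])
mmse_denoiser = lambda z, s: z / (1.0 + s**2)   # exact MMSE denoiser for a N(0, I) prior
iterates = snore_sketch(lambda x: x - y, mmse_denoiser, np.zeros(4),
                        sigma=0.5, alpha=1.0, step=0.01, n_iter=5000, rng=rng)
x_hat = iterates[-1000:].mean(axis=0)   # average late iterates to smooth the SGD noise
```

In this toy case the expected update has the fixed point `y / 1.8`, so `x_hat` should land close to that value. Averaging the late iterates is a convenience of the sketch, not something the abstract prescribes.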

2505.14343 2026-06-12 stat.CO stat.ME stat.ML 版本更新

Mixing times of data-augmentation Gibbs samplers for high-dimensional probit regression

高维probit回归的数据增强Gibbs采样器的混合时间

Filippo Ascolani, Giacomo Zanella

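AI summary: For data-augmentation Gibbs samplers in Bayesian probit regression, derives explicit non-asymptotic mixing-time bounds from recent results on Gibbs samplers for log-concave targets, and analyses their behaviour across different statistical regimes.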

详情
AI中文摘要

我们研究了贝叶斯probit回归中流行的数据增强采样器的收敛性质。利用最近关于对数凹目标分布的Gibbs采样器的结果,我们提供了相关混合时间(在Kullback-Leibler散度下)的简单且显式的非渐近界。这些界明确依赖于设计矩阵和先验精度,并且对响应向量一致成立。我们将结果专门化到不同的统计感兴趣区域,当数据点数$n$和参数$p$都很大时:特别地,我们识别了混合时间在$n,p\to\infty$时保持有界的情况,以及混合时间发散的情况。结果表明(在响应最坏情况下)是紧的,并为选择能导致快速混合的先验分布提供了指导。基于耦合技术的实证分析表明,这些界能有效预测实际观察到的行为。

英文摘要

We investigate the convergence properties of popular data-augmentation samplers for Bayesian probit regression. Leveraging recent results on Gibbs samplers for log-concave targets, we provide simple and explicit non-asymptotic bounds on the associated mixing times (in Kullback-Leibler divergence). The bounds depend explicitly on the design matrix and the prior precision, while they hold uniformly over the vector of responses. We specialize the results for different regimes of statistical interest, when both the number of data points $n$ and parameters $p$ are large: in particular we identify scenarios where the mixing times remain bounded as $n,p\to\infty$, and ones where they do not. The results are shown to be tight (in the worst case with respect to the responses) and provide guidance on choices of prior distributions that provably lead to fast mixing. An empirical analysis based on coupling techniques suggests that the bounds are effective in predicting practically observed behaviours.
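For readers unfamiliar with the sampler being analysed, here is a minimal sketch of a standard data-augmentation Gibbs sampler for probit regression (the Albert-Chib construction), assuming a zero-mean Gaussian prior with precision matrix `prior_precision`. Whether this matches the exact variants studied in the paper is an assumption. The helper names and the simulated data are hypothetical, and the inverse-CDF truncated-normal draw is only adequate when the linear predictor is of moderate size.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()

def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0, by inverse CDF."""
    p0 = _STD.cdf(-mean)                      # P(z <= 0)
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)       # keep inv_cdf inside (0, 1); fine for moderate |mean|
    return mean + _STD.inv_cdf(u)

def probit_da_gibbs(X, y, prior_precision, n_iter, rng):
    """Alternate z | beta, y (truncated normals) and beta | z (Gaussian)."""
    n, p = X.shape
    V = np.linalg.inv(X.T @ X + prior_precision)   # covariance of beta given z
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        means = X @ beta
        z = np.array([sample_truncated_normal(m, yi == 1, rng) for m, yi in zip(means, y)])
        beta = V @ (X.T @ z) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws

# Hypothetical simulated data with true coefficients (1, -1).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X @ np.array([1.0, -1.0]) + rng.standard_normal(200) > 0).astype(int)
draws = probit_da_gibbs(X, y, prior_precision=np.eye(2), n_iter=300, rng=rng)
posterior_mean = draws[100:].mean(axis=0)   # discard the first 100 draws as burn-in
```

The mixing-time bounds in the abstract describe how many such sweeps are needed as a function of the design matrix `X` and the prior precision.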

5. Statistical Foundations of Machine Learning (7 papers)

2606.01172 2026-06-12 cs.LG stat.ME stat.ML 版本更新

Revisiting Neural Processes via Fourier Transform and Volterra Series

通过傅里叶变换和Volterra级数重新审视神经过程

Peiman Mohseni, Nick Duffield, Raymond K. W. Wong

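AI summary: Using the Volterra expansion and set Fourier convolutions, proposes two new conditional neural process models that address the interpretability and computational-efficiency limitations of existing translation-equivariant neural processes.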

详情
AI中文摘要

从有限的、不规则采样的测量中建模未知的潜在函数是科学和工程中的一个反复出现的挑战。神经过程(NPs)是一类概率函数模型,是有前景的解决方案——尤其是当赋予领域特定的对称性(如平移等变性)时,这提高了样本效率和泛化能力。然而,现有的平移等变NPs面临两个局限性:(i)它们堆叠带有非线性的通用组件,模糊了诱导的函数类并限制了可解释性;(ii)卷积设计依赖于具有局部感受野的核,并需要密集的均匀输入网格,而基于注意力的方法避免了这些问题,但随观测数量呈二次方缩放。我们通过两个贡献解决了这两个问题。首先,利用Volterra展开,我们将连续平移等变算子表征为高阶卷积的和,实现了分析透明性,同时允许通过一阶卷积进行高效近似。其次,我们引入了集合傅里叶卷积(SFConvs),这是一种频域参数化方法,直接在不规则采样点上操作,实现近似全局感受野,并在观测数量上线性缩放。基于这些思想,我们提出了两种条件神经过程(CNPs):SFConvCNPs,它堆叠带有非线性的SFConv块,以及SFVConvCNPs,它整合了Volterra公式。在合成和真实世界数据集上的实验证明了我们的方法相对于最先进基线的有效性。

英文摘要

Modeling unknown latent functions from finite, irregularly sampled measurements is a recurring challenge across science and engineering. Neural processes (NPs), a family of probabilistic functional models, are promising solutions -- especially when endowed with domain-specific symmetries like translation equivariance, which improve sample efficiency and generalization. Yet existing translation-equivariant NPs face two limitations: (i) they stack generic components with non-linearities, obscuring the induced function class and limiting interpretability; and (ii) convolutional designs rely on kernels with local receptive fields and require dense uniform input grids, while attention-based methods avoid these issues but scale quadratically with the number of observations. We address both with two contributions. First, using the Volterra expansion, we characterize continuous translation-equivariant operators as sums of higher-order convolutions, yielding analytical transparency while admitting efficient approximation by first-order convolutions. Second, we introduce set Fourier convolutions (SFConvs), a frequency-domain parameterization that operates directly on irregularly sampled points, achieves approximately global receptive fields, and scales linearly in the number of observations. Building on these ideas, we propose two conditional NPs (CNPs): SFConvCNPs, which stack SFConv blocks with non-linearities, and SFVConvCNPs, which integrate the Volterra formulation. Experiments on synthetic and real-world datasets demonstrate our methods' efficacy against state-of-the-art baselines.

2605.28076 2026-06-12 stat.ML math.NA nlin.CD physics.data-an 版本更新

Diagnosing the conditional-mean barrier in scientific machine-learning surrogates

条件均值障碍:从确定性回归到条件分布学习

Junfeng Chen

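AI summary: Introduces the conditional-mean barrier, locates it with two diagnostics (residual-feature orthogonality and the coefficient of determination), and proves that adding latent randomness to a squared-loss predictor collapses it back to the conditional mean, so that crossing the barrier requires a loss that scores distributions.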

详情
AI中文摘要

计算科学与工程中的许多问题在粗粒化、部分观测或逆重建后变成一对多映射:一个已解析状态可能无法确定唯一的子网格强迫,一个结构描述符可能无法确定唯一的有效响应,一个低分辨率观测可能对应多个合理的高分辨率场。在这种情况下,确定性代理可能学习到一个定义明确的数学对象,但仍会遗漏应用相关的不确定性。本教程开发了一个以条件均值障碍为中心的自包含模块:平方损失预测器达到条件均值且剩余误差为不可约的偶然方差时的点。我们给出了两个定位该障碍的诊断方法:残差-特征正交性和决定系数(相对于其解释方差上限),并证明向平方损失预测器添加潜在随机性会使其坍缩回条件均值。因此,跨越障碍需要一种对分布而非点预测进行评分的损失函数。我们简要整理了常见的分布目标,包括负对数似然、矩和可观测匹配、变分目标、对抗散度和分数匹配,根据每个目标针对的条件律特征进行分类。重点在于障碍本身以及识别它的有限数据程序,而非对超越障碍的方法进行综述。基于CPU的双分支律和双尺度Lorenz-96闭合问题的演示展示了诊断如何区分确定性欠拟合与剩余分布变异性。

英文摘要

Many problems in computational science and engineering become one-to-many after coarse graining, partial observation, or inverse reconstruction: a resolved state may not determine a unique subgrid forcing, a structural descriptor may not determine a unique effective response, and a low-resolution observation may correspond to many plausible high-resolution fields. In such settings, deterministic surrogates may learn a well-defined mathematical object while still missing application-relevant uncertainty. This tutorial develops a self-contained module centered on the conditional-mean barrier: the point at which a squared-loss predictor has reached the conditional mean and the remaining error is irreducible aleatoric variance. We give two diagnostics for locating this barrier, residual-feature orthogonality and the coefficient of determination against its explained-variance ceiling, and prove that adding latent randomness to a squared-loss predictor collapses it back to the conditional mean. Crossing the barrier therefore requires a loss that scores distributions rather than point predictions. We briefly organize common distributional objectives, including negative log-likelihood, moment and observable matching, variational objectives, adversarial divergences, and score matching, by the feature of the conditional law each targets. The emphasis is the boundary itself and a finite-data procedure for recognizing it, rather than a survey of methods beyond it. CPU-based demonstrations on a two-branch law and a two-scale Lorenz-96 closure problem show how the diagnostics distinguish deterministic underfitting from residual distributional variability.
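The two diagnostics can be tried on a toy two-branch law. The sketch below is an illustration built for this digest, not the paper's demonstration: the response is `y = x + 0.5 * branch` with a hidden sign `branch`, so the conditional mean is `x` and the remaining variance of 0.25 is irreducible for any point predictor. The choice of test features and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)
branch = rng.choice([-1.0, 1.0], n)      # hidden branch label, never observed
y = x + 0.5 * branch                     # two-branch law with E[y | x] = x

# Squared-loss fit: ordinary least squares on the design [1, x].
A = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# Diagnostic 1: residuals should be (nearly) uncorrelated with functions of x.
features = np.column_stack([x, x**2, np.sin(3.0 * x)])
orth = np.array([abs(np.corrcoef(resid, f)[0, 1]) for f in features.T])

# Diagnostic 2: R^2 compared with its ceiling Var(E[y|x]) / Var(y).
r2 = 1.0 - resid.var() / y.var()
ceiling = (1.0 / 3.0) / (1.0 / 3.0 + 0.25)   # Var(x) = 1/3 for U(-1, 1), aleatoric variance 0.25
```

When `orth` is near zero and `r2` sits at `ceiling`, the predictor has reached the conditional mean; a larger network would not reduce the squared error further, which is the situation the abstract calls the barrier.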

2603.17527 2026-06-12 stat.ML cs.LG math.OC 版本更新

Mirror Descent on Riemannian Manifolds

黎曼流形上的镜像下降

Jiaxin Jiang, Lei Shi, Jiyuan Tan

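AI summary: Generalizes mirror descent to Riemannian manifolds, proposing Riemannian Mirror Descent (RMD) via reparameterization together with a stochastic variant, and establishes non-asymptotic convergence guarantees; on the Stiefel manifold, RMD reduces to Curvilinear Gradient Descent (CGD).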

详情
AI中文摘要

镜像下降(MD)是一种可扩展的一阶方法,广泛应用于大规模优化,包括图像处理、策略优化和神经网络训练。本文将MD推广到黎曼流形上的优化。具体地,我们通过重参数化开发了一个黎曼镜像下降(RMD)框架,并进一步提出了RMD的随机变体。我们还为RMD和随机RMD建立了非渐近收敛保证。作为在Stiefel流形上的应用,我们的RMD框架退化为[26]中提出的曲线梯度下降(CGD)方法。此外,当将随机RMD框架特化到Stiefel设置时,我们得到了CGD的随机扩展,这有效地解决了大规模流形优化问题。

英文摘要

Mirror Descent (MD) is a scalable first-order method widely used in large-scale optimization, with applications in image processing, policy optimization, and neural network training. This paper generalizes MD to optimization on Riemannian manifolds. In particular, we develop a Riemannian Mirror Descent (RMD) framework via reparameterization and further propose a stochastic variant of RMD. We also establish non-asymptotic convergence guarantees for both RMD and stochastic RMD. As an application to the Stiefel manifold, our RMD framework reduces to the Curvilinear Gradient Descent (CGD) method proposed in [26]. Moreover, when specializing the stochastic RMD framework to the Stiefel setting, we obtain a stochastic extension of CGD, which effectively addresses large-scale manifold optimization problems.
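As background for the Euclidean method being generalized, a minimal sketch of classical mirror descent on the probability simplex with the negative-entropy mirror map (the exponentiated-gradient update) is shown below. This is the textbook flat-space algorithm, not the Riemannian variant proposed in the paper; the linear cost vector `c` and the step size are hypothetical.

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step, n_iter):
    """Mirror descent on the simplex: gradient step in log-coordinates, then renormalize."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        logits = np.log(x) - step * grad(x)   # gradient step in the dual (mirror) space
        logits -= logits.max()                # stabilize the exponential
        x = np.exp(logits)
        x /= x.sum()                          # map back onto the simplex
    return x

c = np.array([0.3, 0.1, 0.7])   # hypothetical linear cost <c, x>
x_star = entropic_mirror_descent(lambda x: c, np.ones(3) / 3, step=0.5, n_iter=200)
```

For a linear cost the iterates concentrate on the coordinate with the smallest cost, here index 1, while staying on the simplex at every step without an explicit projection.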

2603.11242 2026-06-12 stat.ML cs.LG 版本更新

A Unified Latent Space Disentanglement VAE Framework with Robust Disentanglement Effectiveness Evaluation

统一潜在空间解缠的VAE框架及鲁棒的解缠效果评估

Xiaoan Lang, Md Mostafizer Rahman, Fang Liu

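AI summary: Proposes bfVAE, a unified framework integrating several disentangled VAE approaches, develops FVH-LT and DBSR-LS to evaluate disentanglement, and introduces the LSSI index to quantify latent structural separation without ground-truth generative factors.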

详情
AI中文摘要

评估和解释潜在表示(如变分自编码器VAE)对于多样数据类型仍然是一个重大挑战,尤其是当真实生成因子未知时。为此,我们将几种最先进的用于潜在空间解缠的VAE方法统一到一个框架——bfVAE中。为了评估解缠VAE模型的有效性并增强潜在空间可解释性,我们提出了通过潜在遍历的特征方差异质性(FVH-LT)和潜在空间中的脏块稀疏回归(DBSR-LS)。为了确保学习到的潜在空间的鲁棒可解释性,我们开发了一种贪婪对齐策略(GAS),该策略减轻了标签切换问题,并对齐不同运行中的潜在维度,为结果聚合奠定基础。我们还引入了一个方便的标量潜在空间分离指数(LSSI),该指数基于FVH-LT和DBSR-LS的GAS对齐输出,在不知道真实生成因子的情况下总结整体潜在结构分离。我们将bfVAE与五个VAE模型进行比较,并在七个表格和图像数据集上验证了FVH-LT、DBSR-LS和LSSI的有效性。在我们检查的实验设置下,bfVAE提供了一个更灵活的解缠框架,在解缠和重构之间实现了比基准VAE模型更有利的整体权衡;FVH-LT和DBSR-LS可靠地揭示了语义上有意义且与领域相关的潜在结构,并且通常产生一致的结果;LSSI对潜在结构分离做出了有效的定量总结。

英文摘要

Evaluating and interpreting latent representations, such as those learned by variational autoencoders (VAEs), remains a significant challenge for diverse data types, especially when ground-truth generative factors are unknown. To address this, we unify several state-of-the-art disentangled VAE approaches for latent space disentanglement into one framework -- bfVAE. To assess the effectiveness of a disentangled VAE model and enhance latent space interpretability, we propose Feature Variance Heterogeneity via Latent Traversal (FVH-LT) and Dirty Block Sparse Regression in Latent Space (DBSR-LS). To ensure robust interpretability of the learned latent space, we develop a greedy alignment strategy (GAS) that mitigates label switching and aligns latent dimensions across runs to set the foundation for result aggregation. We also introduce a convenient scalar latent space separation index (LSSI) based on the GAS-aligned outputs of FVH-LT and DBSR-LS to summarize the overall latent structural separation without knowledge of the ground-truth generative factors. We compare bfVAE to five VAE models and validate the effectiveness of FVH-LT, DBSR-LS, and LSSI on seven tabular and image datasets. Under our examined experimental settings, bfVAE provides a more flexible disentanglement framework and achieves a more favorable overall trade-off between disentanglement and reconstruction than the benchmark VAE models; FVH-LT and DBSR-LS reliably uncover semantically meaningful and domain-relevant latent structures and generally yield consistent results; and LSSI makes an effective quantitative summary of latent structural separation.

2304.13836 2026-06-12 cs.LG cs.AI cs.CV stat.ME 版本更新

On Pitfalls of $\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective

论 $\textit{RemOve-And-Retrain}$ 的陷阱:数据处理不等式视角

Junhwa Song, Keumgang Cha, Junghoon Seo

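AI summary: Exposes a flaw in the ROAR benchmark from an information-theoretic perspective: data-agnostic post-processing can improve ROAR scores, which can mislead judgments about how informative an attribution map is; the failure is traced to a bias toward blurry masks.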

详情
Comments
Accepted at the 2026 ICML Workshop on Mechanistic Interpretability
AI中文摘要

RemOve-And-Retrain (ROAR) 基准被广泛用于评估特征归因方法,但其有效性尚未从信息论角度得到充分探索。我们证明,对归因图进行模型和数据无关的后处理(通过数据处理不等式,这些变换\emph{不能}增加关于决策函数的信息)通常可以改善ROAR分数。这意味着ROAR排名的提升本身并不能证明归因图携带更多关于模型的信息。我们将这种失败模式归因于对空间模糊掩膜的偏好。在CIFAR-10、SVHN和CUB-200上的实验显示,模糊度与ROAR性能之间存在一致的关联,这种模式也出现在ROAD变体中。我们为更谨慎的基于移除的基准测试提供了指导方针,这对验证神经网络内部机制的机械理解具有重要意义。

英文摘要

The RemOve-And-Retrain (ROAR) benchmark is widely used to evaluate feature attribution methods, yet its validity remains underexplored from an information-theoretic perspective. We show that model- and data-agnostic post-processing of attribution maps (transformations that, by the data processing inequality, cannot add information about the decision function) can often improve ROAR scores. This means that an improved ROAR ranking is not, by itself, evidence that an attribution map carries more information about the model. We trace this failure mode to a bias toward spatially blurry masks. Experiments on CIFAR-10, SVHN, and CUB-200 show a consistent association between blurriness and ROAR performance, a pattern that also appears in the ROAD variant. We provide guidelines for more cautious removal-based benchmarking, with implications for validating mechanistic understanding of neural network internals.

2508.21531 2026-06-12 stat.ML cs.LG stat.CO 版本更新

Adaptive generative moment matching networks for improved learning of dependence structures

自适应生成矩匹配网络用于改进依赖结构学习

Marius Hofert, Gan Yao

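AI summary: Proposes adaptive bandwidth selection for the mixture kernel of the maximum mean discrepancy in generative moment matching networks, increasing the number of kernels during training and using early stopping; the approach outperforms GMMNs and parametric copula models in copula random number generation, high-dimensional convergence-rate studies, and dependence modelling of financial data.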

详情
AI中文摘要

引入了一种用于最大均值差异(MMD)中混合核的自适应带宽选择程序,以拟合生成矩匹配网络(GMMNs),并展示了copula随机数生成器的改进学习。基于训练损失的相对误差,在训练过程中增加核的数量;此外,验证损失的相对误差被用作早停标准。虽然训练时间保持相似,但自适应训练GMMNs(AGMMNs)显著提高了训练性能,这通过验证MMD轨迹、样本和验证MMD值得以展示。在三个应用中,AGMMNs相比GMMNs和参数copula模型也表现出优越性。首先,首次在高达100维的维度中研究了基于copula的准随机与伪随机样本的估计量收敛速度。其次,重复的验证MMD以及蒙特卡洛和准蒙特卡洛应用证明了AGMMNs在去GARCH化后的标普500指数50个成分所隐含的copula模型上的改进训练。最后,后一个数据集和富时100指数的50个成分被用于证明AGMMNs的改进训练确实转化为改进的模型预测。

英文摘要

An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and improved learning of copula random number generators is demonstrated. Based on the relative error of the training loss, the number of kernels is increased during training; additionally, the relative error of the validation loss is used as an early stopping criterion. While training time remains similar, adaptively training GMMNs (AGMMNs) significantly increases training performance, which is shown based on validation MMD trajectories, samples and validation MMD values. Superiority of AGMMNs over GMMNs and parametric copula models is also demonstrated in terms of three applications. First, convergence rates of estimators based on quasi-random versus pseudo-random samples from copulas are investigated in dimensions as large as 100 for the first time. Second, replicated validation MMDs, as well as Monte Carlo and quasi-Monte Carlo applications demonstrate the improved training of AGMMNs for a copula model implied by the 50 constituents of the S&P 500 index after deGARCHing. Last, both the latter dataset and 50 constituents of the FTSE 100 are used to demonstrate that the improved training of AGMMNs indeed translates to an improved model prediction.
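The loss at the centre of this abstract, the MMD with a mixture kernel, can be sketched in a few lines. The version below uses an equally weighted mixture of Gaussian kernels and the biased (V-statistic) estimator; the specific bandwidths, the equal weighting, and the function names are assumptions, and the adaptive rule for adding kernels during training is not implemented here.

```python
import numpy as np

def mixture_kernel(a, b, bandwidths):
    """Equally weighted mixture of Gaussian kernels evaluated on all pairs of rows."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return sum(np.exp(-sq_dist / (2.0 * h**2)) for h in bandwidths) / len(bandwidths)

def mmd2(x, y, bandwidths):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy."""
    return (mixture_kernel(x, x, bandwidths).mean()
            + mixture_kernel(y, y, bandwidths).mean()
            - 2.0 * mixture_kernel(x, y, bandwidths).mean())

rng = np.random.default_rng(0)
bandwidths = [0.1, 0.5, 1.0, 2.0]                    # hypothetical bandwidth grid
x = rng.standard_normal((300, 2))
y_same = rng.standard_normal((300, 2))               # same distribution as x
y_shift = rng.standard_normal((300, 2)) + 1.0        # shifted distribution
mmd_same, mmd_shift = mmd2(x, y_same, bandwidths), mmd2(x, y_shift, bandwidths)
```

A generator network trained as a GMMN would minimize `mmd2` between its samples and the training data; adding bandwidths to the list changes which scales of discrepancy the loss is sensitive to.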

2502.18959 2026-06-12 cs.LG stat.ML 版本更新

Fourier Multi-Component and Multi-Layer Neural Networks: Unlocking High-Frequency Potential

傅里叶多分量与多层神经网络:解锁高频潜力

Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou

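AI summary: Proposes Fourier Multi-Component and Multi-Layer Neural Networks (FMMNNs), which combine sine-type activations with a multi-component, multi-layer structure; they retain exponential expressive power under a low-rank architecture, have a more favorable optimization landscape than standard fully connected networks, and use a scaled random initialization that accelerates training, giving strong accuracy and convergence on high-frequency function approximation tasks.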

详情
Comments
Our code and implementation details are available at this https URL
AI中文摘要

神经网络的结构及其激活函数的选择对其性能至关重要。同样重要的是确保这两个元素良好匹配,因为它们的对齐是有效表示和学习的关键。在本文中,我们引入了傅里叶多分量与多层神经网络(FMMNN),该模型将正弦型激活函数与MMNN的多分量多层结构相结合。在FMMNN中,每个分量表示为固定随机正弦型基函数的可训练线性组合,而多层组合则生成更复杂且自适应的频率特征。我们证明,即使在低秩架构下,FMMNN仍能保持函数逼近的指数级表达能力。我们还分析了FMMNN的优化景观,发现其比标准全连接神经网络更有利,尤其是对于高频目标。此外,我们提出了一种针对FMMNN第一层权重的缩放随机初始化方法,当样本充足时,该方法能加速训练并提高最终性能。大量数值实验支持我们的理论见解,表明FMMNN在振荡函数逼近基准上实现了高精度和良好的收敛行为。

英文摘要

The architecture of a neural network and the choice of its activation function are both fundamental to its performance. Equally important is ensuring that these two elements are well matched, as their alignment is key to effective representation and learning. In this paper, we introduce the Fourier Multi-Component and Multi-Layer Neural Network (FMMNN), a model that combines sine-type activations with the multi-component and multi-layer structure of MMNNs. In an FMMNN, each component is represented as a trainable linear combination of fixed random sine-type basis functions, while multi-layer composition generates more complex and adaptive high-frequency features. We establish that FMMNNs retain exponential expressive power for function approximation even under a low-rank architectural structure. We also analyze the optimization landscape of FMMNNs and find it to be substantially more favorable than that of standard fully connected neural networks, especially for high-frequency targets. In addition, we propose a scaled random initialization method for the first-layer weights in FMMNNs, which accelerates training and improves final performance when sufficient samples are available. Extensive numerical experiments support our theoretical insights, showing that FMMNNs achieve strong accuracy and favorable convergence behavior on oscillatory function-approximation benchmarks.
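The description of a component as a trainable linear combination of fixed random sine-type basis functions can be illustrated with a heavily simplified sketch. It builds one such component in one dimension and fits only the outer coefficients by least squares. The width, the frequency scale, the target function, and the use of least squares instead of gradient-based training are all assumptions, and the multi-layer composition that defines an FMMNN is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 400)
target = np.sin(12.0 * np.pi * x)               # assumed oscillatory test function

width, scale = 200, 40.0                        # hypothetical width and frequency scale
W = scale * rng.standard_normal(width)          # fixed random first-layer weights
b = rng.uniform(0.0, 2.0 * np.pi, width)        # fixed random phases
features = np.sin(np.outer(x, W) + b)           # sine-type basis functions, shape (400, 200)

# Only the outer linear coefficients are fitted; W and b stay frozen.
coef, *_ = np.linalg.lstsq(features, target, rcond=None)
mse = np.mean((features @ coef - target) ** 2)
```

The parameter `scale` plays the role of a scaled random initialization in this toy setting: if it is far smaller than the target frequency, the frozen basis contains few functions oscillating fast enough to represent the target.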

6. Biostatistics and Medical Statistics (6 papers)

2602.17041 2026-06-12 stat.ME 版本更新

Reframing Population-Adjusted Indirect Comparisons as a Transportability Problem: An Estimand-Based Perspective and Implications for Health Technology Assessment

将人口调整间接比较重新定义为可迁移性问题:基于估计量的视角及其对卫生技术评估的影响

Conor Chandler, Jack Ishak

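AI summary: Formalizes transportability in population-adjusted indirect comparisons from an estimand perspective, distinguishes conditional from marginal treatment effects, and shows how effect modification, collapsibility, and the effect scale affect transport, offering guidance on the use of indirect evidence in health technology assessment.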

详情
Comments
26 pages (excluding supplement and references), 7 figures, 1 table
AI中文摘要

当随机对照试验招募不同患者群体且缺乏头对头比较时,人口调整间接比较(PAICs)被广泛用于综合证据。尽管PAICs调整了试验间观察到的人群差异,但仅调整并不能确保估计效应可迁移至卫生技术评估(HTA)中决策相关的人群。我们从基于估计量的角度审视并形式化PAICs中的可迁移性。我们区分条件与边际处理效应估计量,并展示可迁移性如何依赖于效应修饰、可压缩性以及效应修饰尺度与效应度量之间的一致性。通过示例说明,即使效应修饰因子在不同治疗间共享,对于常用的非可压缩性度量(包括风险比和比值比),边际效应通常依赖于人群。相反,在线性预测变量尺度上定义的可压缩性和条件效应表现出更有利的可迁移性属性。我们进一步证明,成对PAIC方法通常识别在比较人群中所定义的效应,将这些估计应用于其他人群需要额外的、通常是隐含的迁移步骤,这需要进一步的假设。这对HTA有直接影响,因为PAIC推导的效应通常应用于为不同目标人群定义的成本效果和决策模型中。我们的结果阐明了何时将PAIC推导的处理效应应用于期望目标人群是合理的,何时需要额外假设,以及何时应将结果解释为特定人群而非决策相关,从而支持在HTA及相关决策环境中更透明、更有原则地使用间接证据。

英文摘要

Population-adjusted indirect comparisons (PAICs) are widely used to synthesize evidence when randomized controlled trials enroll different patient populations and head-to-head comparisons are unavailable. Although PAICs adjust for observed population differences across trials, adjustment alone does not ensure transportability of estimated effects to decision-relevant populations for health technology assessment (HTA). We examine and formalize transportability in PAICs from an estimand-based perspective. We distinguish conditional and marginal treatment effect estimands and show how transportability depends on effect modification, collapsibility, and alignment between the scale of effect modification and the effect measure. Using illustrative examples, we demonstrate that even when effect modifiers are shared across treatments, marginal effects are generally population-dependent for commonly used non-collapsible measures, including hazard ratios and odds ratios. Conversely, collapsible and conditional effects defined on the linear predictor scale exhibit more favorable transportability properties. We further show that pairwise PAIC approaches typically identify effects defined in the comparator population and that applying these estimates to other populations entails an additional, often implicit, transport step requiring further assumptions. This has direct implications for HTA, where PAIC-derived effects are routinely applied within cost-effectiveness and decision models defined for different target populations. Our results clarify when applying PAIC-derived treatment effects to desired target populations is justified, when doing so requires additional assumptions, and when results should instead be interpreted as population-specific rather than decision-relevant, supporting more transparent and principled use of indirect evidence in HTA and related decision-making contexts.
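The population dependence of marginal effects for non-collapsible measures can be checked with simple arithmetic. In the sketch below, treatment has the same conditional odds ratio of 3 in two strata with different baseline risks, and there is no confounding. The strata, risks, and population mixes are invented for illustration.

```python
def odds_to_risk(odds):
    return odds / (1.0 + odds)

def marginal_odds_ratio(baseline_risks, weights, conditional_or):
    """Marginal odds ratio when every stratum shares the same conditional odds ratio."""
    p0 = sum(w * r for w, r in zip(weights, baseline_risks))
    p1 = sum(w * odds_to_risk(conditional_or * r / (1.0 - r))
             for w, r in zip(weights, baseline_risks))
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))

baseline_risks = [0.2, 0.8]   # hypothetical control-arm risks in two strata
or_balanced = marginal_odds_ratio(baseline_risks, [0.5, 0.5], conditional_or=3.0)
or_skewed = marginal_odds_ratio(baseline_risks, [0.9, 0.1], conditional_or=3.0)
```

Here `or_balanced` is about 2.08 and `or_skewed` is about 2.61, although the conditional odds ratio is 3 in every stratum of both populations. A marginal odds ratio estimated in one population therefore cannot be carried over to a population with a different covariate mix without further assumptions, which is the transport step the abstract highlights.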

2601.04192 2026-06-12 stat.ME 版本更新

Prediction Intervals for Future Event Counts at Interim Analyses of Time-to-Event Clinical Trials

时间-事件临床试验中期分析中未来事件计数的预测区间

Edoardo Ratti, Federico L. Perlino, Stefania Galimberti, Maria G. Valsecchi

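AI summary: For interim analyses of time-to-event clinical trials, proposes a patient-level framework based on a conditional parametric bootstrap to construct prediction intervals for future event counts, evaluated through simulations and a real trial.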

详情
Comments
36 pages, 19 figures
AI中文摘要

时间-事件终点是评估各疾病领域治疗效果的核心。在具有时间-事件终点的临床试验中,中期和最终分析可用的信息主要由观察到的事件数而非入组患者数决定。因此,中期监测需要评估在预定未来分析日期前预计将累积多少额外事件。量化这些计数的不确定性对于评估计划的信息水平是否可能达到、预测延迟或事件超限以及支持试验进行中的操作决策至关重要。这在儿科肿瘤学试验中尤其相关,因为事件累积通常具有不确定性。尽管预测终点成熟时间的方法已很成熟,但在固定日历时间对事件计数进行区间预测仍不完善。我们提出一个患者级框架,用于在时间-事件试验的中期分析中构建此类区间。以中期数据为条件,未来计数遵循具有患者特异性事件概率的泊松-二项分布;我们使用条件参数自助法估计该分布。在标准正则条件下,自助法是一致的,并产生渐近校准的预测区间。该框架适应了分阶段入组、患者级协变量、管理删失、随机失访以及入组日期与失访之间在条件于已实现中期数据之前的可能依赖关系。我们通过模拟研究其操作特征,并利用一项儿童急性淋巴细胞白血病的真实III期试验进行说明。

英文摘要

Time-to-event endpoints are central to evaluating treatment efficacy across disease areas. In clinical trials with time-to-event endpoints, the information available for interim and final analyses is largely determined by the number of observed events rather than by the number of enrolled patients. Interim monitoring therefore requires assessing how many additional events are expected to accrue by scheduled future analysis dates. Quantifying uncertainty around these counts is essential for assessing whether planned information levels are likely to be reached, anticipating delays or event overrunning, and supporting operational decisions while the trial is ongoing. This is especially relevant in pediatric oncology trials, where event accrual is often uncertain. Although methods for predicting time to endpoint maturation are well established, interval prediction for event counts at fixed calendar times remains less developed. We propose a patient-level framework for constructing such intervals at interim analyses of time-to-event trials. Conditionally on the interim data, the future count follows a Poisson--binomial law with patient-specific event probabilities; we estimate this law using a conditional parametric bootstrap. Under standard regularity conditions, the bootstrap is consistent and yields asymptotically calibrated prediction intervals. The framework accommodates staggered entry, patient-level covariates, administrative censoring, random loss to follow-up, and possible dependence between entry dates and loss to follow-up before conditioning on the realised interim data. We study its operating characteristics in simulation studies and illustrate it using a real-world phase III trial in childhood acute lymphoblastic leukaemia.
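Given patient-specific event probabilities, the Poisson-binomial law mentioned in the abstract can be computed exactly with a short dynamic program. The sketch below treats the probabilities as known, so it gives a plug-in interval only; the paper's conditional parametric bootstrap, which accounts for uncertainty in the estimated probabilities, is not reproduced here. The function names and the example probabilities are hypothetical.

```python
def poisson_binomial_pmf(probs):
    """PMF of the number of events among independent patients with event probabilities `probs`."""
    pmf = [1.0]                              # distribution of the count over zero patients
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)       # this patient has no event
            new[k + 1] += mass * p           # this patient has an event
        pmf = new
    return pmf

def prediction_interval(pmf, level=0.95):
    """Equal-tailed interval (lo, hi) with coverage of at least `level`."""
    tail = (1.0 - level) / 2.0
    lo, hi, cum = None, len(pmf) - 1, 0.0
    for k, mass in enumerate(pmf):
        cum += mass
        if lo is None and cum >= tail:
            lo = k
        if cum >= 1.0 - tail:
            hi = k
            break
    return lo, hi

pmf = poisson_binomial_pmf([0.1, 0.5, 0.9])   # three at-risk patients
interval = prediction_interval(pmf, level=0.95)
```

The interval bounds the number of additional events among patients still at risk; adding the events already observed at the interim analysis gives an interval for the total count.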

2201.13095 2026-06-12 stat.ME 版本更新

Joint Count Transformation Models with Covariate-dependent Correlations

具有协变量相关相关性的联合计数变换模型

Lukas Graz, Luisa Barbanti, Roland Brandl, Torsten Hothorn

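AI summary: Proposes joint count transformation models that combine distribution-free marginal count transformation models with a covariate-dependent Gaussian copula, estimated by joint maximum likelihood, to model the abundance of multiple species and their correlations; a bird case study captures seasonally varying patterns.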

详情
AI中文摘要

联合物种分布模型对于理解生态协变量如何塑造物种群落至关重要。然而,大多数现有方法受限于计数数据的刚性参数分布,且无法模拟种间关联如何随这些协变量变化。我们引入了联合计数变换模型,这是一个旨在克服这些限制的新框架。我们的方法将多个物种的无分布边际计数变换模型与协变量依赖的潜高斯Copula相结合,以建模种间相关性,该相关性可解释为观测计数尺度上的Spearman秩相关。所有模型参数通过联合最大似然估计高效估计,并在R包tram中实现。我们将此框架应用于模拟三种食鱼鸟类的联合丰度,以季节性作为主要协变量。我们的模型成功捕获了复杂的、物种特异性的季节性丰度模式,包括高零计数的时期和方差的季节性变化。此外,模型揭示了物种之间强烈的、随季节变化的相关性。这些发现与经验方法一致,并且与计算昂贵的参数化贝叶斯分层建模物种群落(HMSC)框架的结果相似。通过多达10个物种的模拟研究,证明了我们方法的一致性、准确性和可行性。

英文摘要

Joint Species Distribution Models are essential for understanding how ecological covariates shape species communities. However, most existing approaches are limited by rigid parametric distributions for count data and the inability to model how interspecific associations change with those covariates. We introduce joint count transformation models, a novel framework designed to overcome these limitations. Our approach combines distribution-free marginal count transformation models for multiple species with a covariate-dependent latent Gaussian copula to model interspecific correlations, interpretable as Spearman's rank correlation on the observed count scale. All model parameters are estimated efficiently via joint maximum likelihood estimation, implemented in the R package tram. We apply this framework to model the joint abundance of three fish-eating bird species, using seasonality as the primary covariate. Our model successfully captured the complex, species-specific seasonal abundance patterns, including periods of high zero-counts and seasonal shifts in variance. Furthermore, the model revealed strong, seasonally-varying correlations between the species. These findings are consistent with an empirical approach and similar to those from the computationally expensive parametric Bayesian Hierarchical Modelling of Species Communities (HMSC) framework. Consistency, accuracy and feasibility of our approach are demonstrated in a simulation study for up to 10 species.

2509.12473 2026-06-12 stat.ME 版本更新

Cox Regression on the Plane

平面上的Cox回归

Yael Travis-Lumer, Micha Mandel, Ido Didi Fabian, Rebecca A. Betensky, Malka Gorfine

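AI summary: Proposes two extensions of the Cox proportional hazards model to bivariate survival data based on a Lehmann-type representation, estimates the regression parameters with a pseudo-observation approach, and proves consistency and asymptotic normality of the estimators.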

详情
Comments
89 pages, including appendices, figures, and tables
AI中文摘要

Cox比例风险模型是单变量生存分析中最广泛使用的回归模型,但其对双变量生存数据的扩展仍然很少。我们基于生存函数的Lehmann型表示提出了两种新的扩展。第一种是简单Lehmann模型,是一种直接扩展,保留了简单的结构。第二种是广义Lehmann模型,通过引入三个不同的回归参数允许更大的灵活性,并将简单Lehmann模型作为特例。这些模型在生存概率方面具有直接解释,提供了一个透明、完全半参数的框架,用于评估协变量对边际生存概率及其依赖性的影响,而无需指定copula或脆弱分布。为了估计回归参数,我们基于双变量生存数据的伪观测方法,并通过两步程序将其扩展到广义模型。我们建立了所得估计量的一致性和渐近正态性。通过模拟研究和来自全球视网膜母细胞瘤研究的数据应用说明了所提出的方法。

英文摘要

The Cox proportional hazards model is the most widely used regression model in univariate survival analysis, yet extensions to bivariate survival data remain scarce. We propose two novel extensions based on a Lehmann-type representation of the survival function. The first, the simple Lehmann model, is a direct extension that retains a straightforward structure. The second, the generalized Lehmann model, allows greater flexibility by incorporating three distinct regression parameters and includes the simple Lehmann model as a special case. The models admit a direct interpretation in terms of survival probabilities, providing a transparent, fully semiparametric framework for assessing covariate effects on both marginal survival probabilities and their dependence, without requiring specification of a copula or frailty distribution. To estimate the regression parameters, we build on a pseudo-observation-based approach for bivariate survival data and extend it to the generalized model via a two-step procedure. We establish consistency and asymptotic normality of the resulting estimators. The proposed approach is illustrated through simulation studies and an application to data from the Global Retinoblastoma Outcome Study.

2508.20349 2026-06-12 stat.ME 版本更新

Covariate-adjusted win statistics in randomized clinical trials with ordinal outcomes

序数结局随机对照试验中协变量调整的胜率统计量

Zhiqiang Cao, Scott Zuo, Mary Ryan Baumann, Kendra Plourde, Patrick Heagerty, Guangyu Tong, Fan Li

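AI summary: For ordinal outcomes, proposes propensity score weighting and augmented weighting estimators of win estimands, enabling covariate adjustment for improved efficiency, and proves their model robustness.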

详情
AI中文摘要

序数结局在临床中常见,通常代表疾病进展的不同阶段或功能损伤的不同程度。本文通过内在成对结局比较(如胜率和胜率差)来表征序数结局的平均处理效应。认识到基线协变量调整对提高精度的价值,我们首先开发了倾向性评分加权估计量,包括逆概率加权(IPW)和重叠加权(OW),专门用于估计胜率估计量。此外,我们开发了增广加权估计量,利用额外的序数结局回归以可能提高仅加权的效率。利用U统计量理论,我们建立了所有估计量的渐近理论,并推导了闭式方差估计量以支持统计推断。我们还证明了所有协变量调整估计量即使在相关工作模型错误指定时也不会损害目标估计量的一致性;因此这些协变量调整估计量具有模型稳健性。通过模拟,我们展示了加权估计量相对于未调整估计量的效率提升,而增广加权估计量在除极端情况外进一步提高了效率。最后,我们通过ORCHID试验说明了所提出的方法,并在R包winPSW中实现了我们的协变量调整方法。

英文摘要

Ordinal outcomes are common in clinical settings where they often represent increasing levels of disease progression or different levels of functional impairment. In this article, we focus on representing the average treatment effect for ordinal outcomes via intrinsic pairwise outcome comparisons captured through win estimands, such as the win ratio and win difference. Recognizing the value of baseline covariate adjustment toward enhanced precision, we first develop propensity score weighting estimators, including both inverse probability weighting (IPW) and overlap weighting (OW), tailored to estimating win estimands. Furthermore, we develop augmented weighting estimators that leverage an additional ordinal outcome regression to potentially improve efficiency over weighting alone. Leveraging the theory of U-statistics, we establish the asymptotic theory for all estimators, and derive closed-form variance estimators to support statistical inference. We also prove that all of the covariate-adjusted estimators do not compromise consistency for the target estimand even when the associated working models are incorrectly specified; hence these covariate-adjusted estimators are model-robust. Through simulations we demonstrate the enhanced efficiency of the weighted estimators over the unadjusted estimator, with the augmented weighting estimators showing a further improvement in efficiency except for extreme cases. Finally, we illustrate our proposed methods with the ORCHID trial, and implement our covariate adjustment methods in an R package winPSW.
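The win estimands themselves are easy to compute in the unadjusted case, which helps fix ideas before any weighting is added. The sketch below compares every treated patient with every control patient, assuming that a higher ordinal score is the better outcome. It implements only the unadjusted estimators, not the IPW, overlap-weighted, or augmented estimators of the paper, and the data are invented.

```python
def win_statistics(treated, control):
    """Unadjusted win ratio and win difference from all treated-versus-control pairs."""
    wins = sum(t > c for t in treated for c in control)     # treated patient has the better outcome
    losses = sum(t < c for t in treated for c in control)   # control patient has the better outcome
    n_pairs = len(treated) * len(control)
    return wins / losses, (wins - losses) / n_pairs

# Hypothetical ordinal scores on a 1-3 scale, higher is better.
win_ratio, win_difference = win_statistics([3, 2, 3, 1], [1, 2, 2, 1])
```

For these data there are 10 wins, 2 losses, and 4 ties among the 16 pairs, giving a win ratio of 5.0 and a win difference of 0.5. Covariate adjustment, as described in the abstract, reweights or augments these pairwise comparisons to gain precision.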

2412.12967 2026-06-12 stat.ME 版本更新

Neural Posterior Estimation for Stochastic Epidemic Modeling

随机流行病建模的神经后验估计

Prayag Chatha, Fan Bu, Jeffrey Regier, Evan Snitkin, Jon Zelner

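AI summary: Proposes calibrating stochastic infectious disease models with neural posterior estimation (NPE), training a neural network on simulations to approximate the posterior; NPE is more sample-efficient than approximate Bayesian computation (ABC) and is applied to healthcare-associated infection data.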

详情
Comments
36 pages, 22 figures, preprint. To be published in the Annals of Applied Statistics
AI中文摘要

随机传染病模型捕捉了公共卫生结果的不确定性,并在流行病学实践中日益流行。然而,使用现有参数估计方法将这些模型校准到观测数据具有挑战性。随机流行病模型是非线性动力系统,具有潜在的大状态空间,导致似然密度在计算上难以处理。我们开发了一种使用神经后验估计(NPE)校准复杂流行病模型到高维数据的方法,这是一种用于基于模拟推断的新技术。在NPE中,在模拟数据上训练的神经条件密度估计器学习“反转”随机模拟器,返回后验分布的参数近似。我们引入了一个随机的、离散时间的易感-感染(SI)模型,具有异质性传播,用于医疗相关感染(HAIs)。HAIs是医疗系统的重大负担,它们表现出高比例的无症状携带,使得估计感染率变得困难。通过广泛的模拟实验,我们表明NPE能够以比近似贝叶斯计算(ABC)更高的样本效率产生准确的感染率后验估计。然后,我们使用NPE将我们的SI模型拟合到一家长期急性护理机构中耐碳青霉烯肺炎克雷伯菌的爆发,发现了患者间传播风险中基于位置的异质性证据。我们认为我们的方法可以有效地应用于广泛的机制传播模型和传染病流行病学问题。

英文摘要

Stochastic infectious disease models capture uncertainty in public health outcomes and have become increasingly popular in epidemiological practice. However, calibrating these models to observed data is challenging with existing methods for parameter estimation. Stochastic epidemic models are nonlinear dynamical systems with potentially large latent state spaces, resulting in computationally intractable likelihood densities. We develop an approach to calibrating complex epidemiological models to high-dimensional data using Neural Posterior Estimation (NPE), a novel technique for simulation-based inference. In NPE, a neural conditional density estimator trained on simulated data learns to "invert" a stochastic simulator, returning a parametric approximation to the posterior distribution. We introduce a stochastic, discrete-time Susceptible Infected (SI) model with heterogeneous transmission for healthcare-associated infections (HAIs). HAIs are a major burden on healthcare systems. They exhibit high rates of asymptomatic carriage, making it difficult to estimate infection rates. Through extensive simulation experiments, we show that NPE produces accurate posterior estimates of infection rates with greater sample efficiency compared to Approximate Bayesian Computation (ABC). We then use NPE to fit our SI model to an outbreak of carbapenem-resistant Klebsiella pneumoniae in a long-term acute care facility, finding evidence of location-based heterogeneity in patient-to-patient transmission risk. We argue that our methodology can be fruitfully applied to a wide range of mechanistic transmission models and problems in the epidemiology of infectious disease.

7. Economics, Finance, and Social Science Statistics (1 paper)

2502.07695 2026-06-12 stat.AP stat.ME updated version

A scalable Bayesian double machine learning framework, with application to racial disproportionality assessment

可扩展的贝叶斯双机器学习框架及其在种族不成比例评估中的应用

Yu Luo, Vanessa McNealis, Yijing Li

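AI summary: Proposes a semi-parametric partially linear structural regression method that combines Bayesian empirical likelihood with double machine learning to control for high-dimensional confounding and incorporate prior assumptions. Applied to London stop and search data, it finds that racial disproportionality may be influenced by borough racial composition.

AI Chinese abstract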

拦截搜查实践中的种族不成比例引发了对其社会和行为影响的重大关切。在伦敦,黑人被拦截搜查的可能性大约是白人的四倍。利用2019年1月至2023年12月伦敦拦截搜查事件的数据,本文旨在调查涉及黑人的表达性犯罪拦截量与其他种族相比的不成比例性。我们采用半参数部分线性结构回归方法,并引入一种结合双机器学习技术的贝叶斯经验似然程序,以控制高维混杂并适应强先验假设。此外,我们证明了所提程序在覆盖方面产生有效的后验。将该方法应用于拦截搜查数据集,我们发现针对黑人社区的种族不成比例可能在关注表达性犯罪时受到行政区种族构成的影响。

English abstract

Racial disproportionality in stop and search practices elicits substantial concerns about its societal and behavioral impacts. In London, Black individuals are about four times more likely to be stopped and searched than White individuals. Using data on stop and search events in London from January 2019 to December 2023, this paper aims to investigate disproportionality in the volume of stops for expressive crimes involving Black individuals compared to other ethnicities. We employ a semi-parametric partially linear structural regression method and introduce a Bayesian empirical likelihood procedure combined with double machine learning techniques to control for high-dimensional confounding and to accommodate strong prior assumptions. In addition, we show that the proposed procedure yields a valid posterior in terms of coverage. Applying this approach to the stop and search dataset, we find that racial disproportionality aimed at the Black community may be influenced by the borough racial composition when focusing on expressive crimes.

8. Data Privacy, Robustness, and Fairness (2 papers)

2601.21324 2026-06-12 stat.ML cs.LG updated version

Bulk-Calibrated Credal Ambiguity Sets: Fast, Tractable Decision Making under Out-of-Sample Contamination

批量校准的置信模糊集:样本外污染下的快速、可处理决策

Mengqi Chen, Thomas B. Berrett, Theodoros Damoulas, Michele Caprio

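AI summary: Proposes bulk-calibrated credal ambiguity sets that handle contamination inside the bulk separately from the tail contribution, yielding a closed-form, finite robust objective that reduces to linear or second-order cone programs for efficient robust optimisation.

Comments
Accepted for publication (spotlight) at ICML 2026
AI Chinese abstract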

分布鲁棒优化(DRO)在模糊集上最小化最坏情况期望损失,该模糊集可捕捉样本外环境中的分布偏移。虽然Huber(线性-空)污染是$\varepsilon$分数任意扰动的经典最小假设模型,但将其纳入模糊集可能导致最坏情况风险无穷大,且DRO目标变得无意义,除非施加强有界性或支撑假设。我们通过引入批量校准的置信模糊集来解决这些挑战:我们从数据中学习一个高概率质量的批量集,同时考虑批量内的污染,并分别约束剩余尾部贡献。这导致一个闭式、有限的$\mathrm{mean}+\sup$鲁棒目标,以及针对常见损失和批量几何结构的可处理线性或二阶锥规划。通过该框架,我们强调并利用上期望(不精确概率概念)与最坏情况风险之间的等价性,展示IP置信集如何转化为具有可解释容忍水平的DRO目标。在重尾库存控制、地理偏移房价回归和人口偏移文本分类上的实验显示了竞争性的鲁棒性-准确性权衡和高效的优化时间,使用了贝叶斯、频率学派或经验参考分布。

English abstract

Distributionally robust optimisation (DRO) minimises the worst-case expected loss over an ambiguity set that can capture distributional shifts in out-of-sample environments. While Huber (linear-vacuous) contamination is a classical minimal-assumption model for an $\varepsilon$-fraction of arbitrary perturbations, including it in an ambiguity set can make the worst-case risk infinite and the DRO objective vacuous unless one imposes strong boundedness or support assumptions. We address these challenges by introducing bulk-calibrated credal ambiguity sets: we learn a high-mass bulk set from data while considering contamination inside the bulk and bounding the remaining tail contribution separately. This leads to a closed-form, finite $\mathrm{mean}+\sup$ robust objective and tractable linear or second-order cone programs for common losses and bulk geometries. Through this framework, we highlight and exploit the equivalence between the imprecise probability (IP) notion of upper expectation and the worst-case risk, demonstrating how IP credal sets translate into DRO objectives with interpretable tolerance levels. Experiments on heavy-tailed inventory control, geographically shifted house-price regression, and demographically shifted text classification show competitive robustness-accuracy trade-offs and efficient optimisation times, using Bayesian, frequentist, or empirical reference distributions.
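
As a rough illustration of why a mean + sup objective arises, consider pure Huber contamination with a loss that is bounded on the bulk set: the worst case over all contaminating distributions places the $\varepsilon$ mass where the loss is largest, giving $(1-\varepsilon)\,\mathrm{mean} + \varepsilon\,\sup$. The sketch below is not the paper's estimator. The function name, the use of an empirical reference distribution, and the assumption that `loss_sup` is a known bound on the loss over the bulk are all illustrative, and the separate tail term described in the abstract is omitted.

```python
def contaminated_worst_case_risk(losses, eps, loss_sup):
    """Worst-case expected loss under eps-contamination (illustrative only).

    losses   : losses observed under the reference distribution (empirical mean is used)
    eps      : contamination fraction in [0, 1]
    loss_sup : assumed upper bound of the loss over the bulk set
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    if loss_sup < max(losses):
        raise ValueError("loss_sup must bound every observed loss")
    mean_loss = sum(losses) / len(losses)
    # The adversary moves an eps-fraction of mass to the point of largest loss.
    return (1.0 - eps) * mean_loss + eps * loss_sup


if __name__ == "__main__":
    print(contaminated_worst_case_risk([1.0, 2.0, 3.0], eps=0.1, loss_sup=10.0))
```

With `eps = 0` the expression reduces to the ordinary empirical risk, and without a finite `loss_sup` the second term is unbounded, which is the vacuity problem the abstract describes.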

2506.23033 2026-06-12 cs.LG stat.ML updated version

How Reliable are Fairness Audits with Unreliable Data?

不可靠数据下的公平性审计有多可靠?

Yash Vardhan Tomar

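AI summary: Studies how missing protected labels affect fairness mitigation audits and proposes a seed-calibrated stress test that separates missingness effects from seed-to-seed variation. Missingness settings that retain some protected labels usually do not change the outcome for mitigation methods, whereas the no-label endpoint behaves differently, and threshold optimization can turn single-axis fairness gains into intersectional harm.

AI Chinese abstract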

公平性审计是负责任机器学习部署的关键组成部分。然而,在不完全受保护标签访问下审计建议的可靠性仍然知之甚少。在这项工作中,我们关注公平性缓解审计中的受保护标签缺失。我们引入了一种种子校准压力测试,以将缺失效应与完全标签下已经存在的种子间波动分离开来。在ACS/Folktables任务中,我们发现正可用性缺失通常不会将选定的缓解方法移出完全标签的种子基线。无标签端点表现不同,暴露了ERM等效候选和确定性断点,而不是广泛的缺失效应。我们还发现,阈值优化可以将单轴公平性增益转化为高于零点的交叉危害,这是一种更尖锐的失败模式,在随机森林验证下似乎仍然可见。总体而言,我们的结果强调,在将受保护标签缺失视为审计脆弱性的证据之前,应报告种子零校准、候选集背景和交叉后果。

English abstract

Fairness audits are a key component of responsible machine-learning deployment. Yet, audit-recommendation reliability under incomplete protected-label access is still poorly understood. In this work, we focus on protected-label missingness in fairness mitigation audits. We introduce a seed-calibrated stress test to separate missingness effects from seed-to-seed movement already present under complete labels. Across ACS/Folktables tasks, missingness settings that retain some protected labels usually do not move selected mitigation methods beyond a complete-label seed-to-seed baseline. At $0\%$ protected-label access, candidates collapse to an empirical-risk-minimization baseline and deterministic tie-breaking rather than revealing a broad missingness effect. We also find that threshold optimization can turn fairness gains on a single protected axis into intersectional harm above a seed baseline, and this threshold-optimizer finding persists under random-forest validation. Overall, our results highlight that protected-label missingness should be reported with seed-null calibration, candidate-set context, and intersectional consequences before it is treated as evidence of audit fragility.

9. Datasets, Software, and Applications (6 papers)

2604.12497 2026-06-12 cs.LG stat.ML updated version

Allocating Human Oversight in AI-Enabled Analytics

AI赋能分析中的人类监督分配

Zikun Ye, Jiameng Lyu, Rui Tao

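AI summary: For settings where the reliability of AI predictions is heterogeneous and unknown, proposes an online learning policy based on upper confidence bounds that dynamically allocates a limited human-validation budget, with a terminal efficiency loss that vanishes as the budget grows.

AI Chinese abstract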

组织越来越多地部署AI作为面向客户的决策过程中的低成本预测层,包括需求感知、服务质量监控、产品测试和市场研究,但AI生成的信号在不同任务、产品和客户细分中的可靠性并不均匀。因此,企业仍然需要稀缺的人类验证(标签、审计、调查回复或后续测量)来将AI输出锚定到真实情况。由于人类真实情况本身存在噪声,在不同标注者之间甚至重复判断中都有所变化,企业必须为每个任务收集并平均多个人类标签,这使得人类验证成本高昂。我们研究如何在可靠性异质且在部署前未知的情况下,将有限的人类验证预算分配到多个AI辅助任务中。我们将其置于调优的预测驱动推断框架内。每个人类标签既提高了AI辅助估计的精度,也揭示了任务的修正难度,即在使用AI预测作为控制变量后剩余的方差。如果难度已知,最优分配将遵循Neyman平方根规则;由于未知,我们提出一种基于上置信界的策略,该策略在线学习难度并将验证导向AI最不可靠的任务。我们证明,随着预算增长,该策略相对于最优分配的终端效率损失趋于零。在合成实验和一个包含68个任务和超过2000名受访者的真实数字孪生调查中,当可靠性异质时,该策略缩小了与最优分配的大部分差距,优于均匀分配和epsilon-贪婪分配;在调查数据上,它还优于先探索后提交的试点设计,并将均匀分配的10-12%差距缩小到2-6%。AI的价值不仅取决于模型准确性,还取决于将人类监督定向到AI错误影响最大的操作策略。

English abstract

Organizations increasingly deploy AI as a low-cost prediction layer in customer-facing decision processes, including demand sensing, service-quality monitoring, product testing, and market research, but AI-generated signals are unevenly reliable across tasks, products, and customer segments. Firms therefore still need scarce human validation (labels, audits, survey responses, or follow-up measurements) to anchor AI outputs to ground truth. Because human ground truth is itself noisy, varying across labelers and even across repeated judgments, the firm must collect and average several human labels per task, which makes human validation costly. We study how to allocate a limited human-validation budget across many AI-assisted tasks when reliability is heterogeneous and unknown before deployment. We cast this within tuned prediction-powered inference. Each human label both sharpens the AI-assisted estimate and reveals the task's rectification difficulty, the variance that remains after the AI prediction is optimally used as a control variate. If difficulties were known, the optimal allocation would follow a Neyman square-root rule; because they are unknown, we propose a policy based on upper confidence bounds that learns them online and steers validation toward tasks where AI is least reliable. We prove that the policy's terminal efficiency loss relative to the oracle allocation vanishes as the budget grows. In synthetic experiments and a real digital-twin survey with 68 tasks and over 2000 respondents, it closes most of the gap to the oracle when reliability is heterogeneous, outperforming uniform and epsilon-greedy allocation; on the survey data it also outperforms explore-then-commit pilot designs and cuts uniform's 10--12% gap to 2--6%. The value of AI depends not only on model accuracy but also on the operational policy that targets human oversight where AI errors matter most.
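
The square-root rule mentioned in the abstract can be sketched under a simplifying assumption that is ours and not necessarily the paper's exact objective: task $k$ has known residual variance $\sigma_k^2$, receives $n_k$ labels, and contributes estimator variance $\sigma_k^2 / n_k$, and the goal is to minimise the sum of these variances subject to $\sum_k n_k = B$. A Lagrange-multiplier argument then gives $n_k \propto \sigma_k$, the square root of the variance. The function names below are hypothetical, and the allocation is left fractional rather than rounded to whole labels.

```python
import math


def neyman_allocation(variances, budget):
    """Oracle allocation n_k = budget * sigma_k / sum_j sigma_j (fractional)."""
    if not variances or any(v < 0 for v in variances):
        raise ValueError("variances must be a non-empty list of non-negative numbers")
    sigmas = [math.sqrt(v) for v in variances]
    total = sum(sigmas)
    if total == 0.0:
        # Every task is already noise-free; any split is optimal, so use uniform.
        return [budget / len(variances)] * len(variances)
    return [budget * s / total for s in sigmas]


def total_variance(variances, allocation):
    """Sum of per-task estimator variances sigma_k^2 / n_k, skipping noise-free tasks."""
    return sum(v / n for v, n in zip(variances, allocation) if v > 0)


if __name__ == "__main__":
    variances = [1.0, 4.0, 9.0]
    budget = 60
    oracle = neyman_allocation(variances, budget)
    uniform = [budget / len(variances)] * len(variances)
    print("oracle allocation:", oracle)
    print("oracle total variance:", total_variance(variances, oracle))
    print("uniform total variance:", total_variance(variances, uniform))
```

In the setting the abstract describes, the variances are unknown in advance, which is why the paper replaces this oracle rule with a policy that learns them online.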

2601.09693 2026-06-12 cs.LG stat.ML updated version

Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design

对比几何学习实现统一的结构与配体药物设计

Lisa Schneckenreiter, Sohvi Luukkonen, Lukas Friedrich, Daniel Kuhn, Günter Klambauer

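AI summary: Proposes ConGLUDe, a contrastive geometric model that unifies structure- and ligand-based training and supports virtual screening, target fishing, and ligand-conditioned pocket prediction, with strong results across several benchmarks.

Comments
Forty-Third International Conference on Machine Learning
AI Chinese abstract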

基于结构和基于配体的计算药物设计传统上依赖于不相关的数据源和建模假设,限制了它们在大规模上的联合使用。在这项工作中,我们引入了用于统一计算药物设计的对比几何学习(ConGLUDe),这是一个单一的对比几何模型,统一了基于结构和基于配体的训练。ConGLUDe将产生全蛋白质表示和预测结合位点的隐式嵌入的几何蛋白质编码器与快速配体编码器耦合,消除了对预定义口袋的需求。通过对比学习将配体与全局蛋白质表示和多个候选结合位点对齐,ConGLUDe除了支持虚拟筛选和靶标钓鱼外,还支持配体条件口袋预测,同时在蛋白质-配体复合物和大规模生物活性数据上联合训练。在多种基准测试中,ConGLUDe实现了具有竞争力的零样本虚拟筛选性能,在具有挑战性的靶标钓鱼任务上显著优于现有方法,并展示了最先进的配体条件口袋选择。这些结果突显了统一结构-配体训练的优势,并将ConGLUDe定位为迈向药物发现通用基础模型的一步。

English abstract

Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for predefined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves competitive zero-shot virtual screening performance, substantially outperforms existing methods on a challenging target fishing task, and demonstrates state-of-the-art ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.

2511.02430 2026-06-12 stat.CO cs.MS cs.SE stat.ML updated version

Efficient Solvers for SLOPE in R, Python, Julia, and C++

R、Python、Julia 和 C++ 中 SLOPE 的高效求解器

Johan Larsson, Malgorzata Bogdan, Krystyna Grzesiak, Mathurin Massias, Jonas Wallin

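AI summary: Presents a suite of packages that efficiently solve the Sorted L-One Penalized Estimation (SLOPE) problem in R, Python, Julia, and C++, built on a hybrid coordinate descent algorithm, supporting multiple loss functions and data structures, and outperforming existing implementations.

Comments
30 pages, 8 figures
AI Chinese abstract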

我们提供了一套在 R、Python、Julia 和 C++ 中高效求解 Sorted L-One Penalized Estimation (SLOPE) 问题的软件包。这些软件包采用了一种高效的混合坐标下降算法,能够拟合广义线性模型(GLM),并支持多种损失函数,包括高斯、二项、泊松和多项逻辑回归。我们的实现旨在快速、内存高效且灵活。这些软件包支持多种数据结构(稠密、稀疏和内存外矩阵),并设计用于高效拟合完整的 SLOPE 路径以及处理 SLOPE 模型的交叉验证,包括松弛 SLOPE。我们展示了如何使用这些软件包的示例,以及在真实和模拟数据上展示其性能的基准测试,结果表明我们的软件包在速度上优于现有的 SLOPE 实现。

English abstract

We present a suite of packages in R, Python, Julia, and C++ that efficiently solve the Sorted L-One Penalized Estimation (SLOPE) problem. The packages feature a highly efficient hybrid coordinate descent algorithm that fits generalized linear models (GLMs) and supports a variety of loss functions, including Gaussian, binomial, Poisson, and multinomial logistic regression. Our implementation is designed to be fast, memory-efficient, and flexible. The packages support a variety of data structures (dense, sparse, and out-of-memory matrices) and are designed to efficiently fit the full SLOPE path as well as handle cross-validation of SLOPE models, including the relaxed SLOPE. We present examples of how to use the packages and benchmarks that demonstrate the performance of the packages on both real and simulated data and show that our packages outperform existing implementations of SLOPE in terms of speed.
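
For readers unfamiliar with the penalty, the sorted L1 norm pairs the largest penalty weight with the largest coefficient magnitude: $J_\lambda(\beta) = \sum_{i=1}^{p} \lambda_i |\beta|_{(i)}$, with $\lambda_1 \ge \dots \ge \lambda_p \ge 0$ and $|\beta|_{(1)} \ge \dots \ge |\beta|_{(p)}$. The pure-Python sketch below only evaluates this penalty. It does not use or imitate the API of the packages described above, and `sorted_l1_penalty` is a hypothetical helper name.

```python
def sorted_l1_penalty(beta, lam):
    """Evaluate the sorted L1 norm: sum_i lam[i] * (i-th largest |beta|).

    beta : coefficients, in any order
    lam  : penalty weights, non-negative and non-increasing
    """
    if len(beta) != len(lam):
        raise ValueError("beta and lam must have the same length")
    if any(l < 0 for l in lam):
        raise ValueError("lam must be non-negative")
    if any(a < b for a, b in zip(lam, lam[1:])):
        raise ValueError("lam must be non-increasing")
    # Sort magnitudes in decreasing order so the largest meets the largest weight.
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, magnitudes))


if __name__ == "__main__":
    print(sorted_l1_penalty([1.0, -3.0, 2.0], [3.0, 2.0, 1.0]))
    # With constant weights the penalty equals the lasso penalty lam * ||beta||_1.
    print(sorted_l1_penalty([1.0, -3.0, 2.0], [0.5, 0.5, 0.5]))
```

When all weights are equal the penalty coincides with the lasso, which is one way to sanity-check an implementation.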

2508.14858 2026-06-12 stat.ME stat.ML updated version

Data Fusion for High-Resolution Estimation

数据融合用于高分辨率估计

Amy Guan, Roshni Sahoo, Joshua Salomon, Stefan Wager, Marissa Reitsma

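AI summary: Proposes a method that fuses unbiased low-resolution data with potentially biased high-resolution data by learning, in the sense of KL divergence, a distribution consistent with the administrative data, substantially reducing bias in high-resolution estimates.

AI Chinese abstract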

人口健康指标的高分辨率估计对于精准公共卫生至关重要。我们提出了一种高分辨率估计方法,融合了不同的数据源:无偏的低分辨率数据源(例如汇总的行政数据)和可能有偏的高分辨率数据源(例如个体层面的在线调查回复)。我们假设可能有偏的高分辨率数据源是在一个抽样偏差模型下从总体生成的,其中可观测变量可以任意影响响应概率,但具有相同可观测变量的单元之间响应概率的对数差异与其可观测变量和结果的充分统计量之间的差异呈线性关系。我们的数据融合方法学习一个分布,该分布在KL散度意义上最接近在线调查分布,并且与汇总的行政数据以及我们的抽样偏差模型一致。在一个包含三个指标的重复测量的测试平台上,该测试平台同时使用了(在线)家庭脉搏调查和同一时间段内两个地理分辨率下的真实数据源,与仅依赖单一数据源的基线方法相比,我们的方法显著减少了高分辨率估计中的偏差。

English abstract

High-resolution estimates of population health indicators are critical for precision public health. We propose a method for high-resolution estimation that fuses distinct data sources: an unbiased, low-resolution data source (e.g. aggregated administrative data) and a potentially biased, high-resolution data source (e.g. individual-level online survey responses). We assume that the potentially biased, high-resolution data source is generated from the population under a model of sampling bias where observables can have arbitrary impact on the probability of response but the difference in the log probabilities of response between units with the same observables is linear in the difference between sufficient statistics of their observables and outcomes. Our data fusion method learns a distribution that is closest (in the sense of KL divergence) to the online survey distribution and consistent with the aggregated administrative data and our model of sampling bias. This approach significantly reduces bias in high-resolution estimates compared to baselines that rely on a single data source alone on a testbed that includes repeated measurements of three indicators measured by both the (online) Household Pulse Survey and ground-truth data sources at two geographic resolutions over the same time period.

2407.18572 2026-06-12 stat.AP math.ST stat.OT updated version

Bernoulli amputation

伯努利缺失生成

Marius Hofert, James Jackson, Niels Hagenbuch

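AI summary: Proposes a stochastic amputation method based on Bernoulli margins and copulas. By specifying the distribution of missingness indicators rather than each pattern by hand, it flexibly generates a wide range of missingness patterns, including structured missingness.

AI Chinese abstract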

提出了一种新颖的随机缺失生成方法,即向完整数据集中引入缺失值的过程。该方法只需指定缺失指示变量的分布,而无需手动指定每个缺失模式,即可构建多种缺失模式。通过copula和伯努利边际以原则性方式建模缺失指示变量,从而能够纳入缺失模式中的依赖性。除了经典的缺失机制如完全随机缺失、随机缺失和非随机缺失外,该方法还能建模结构化缺失,如块缺失,以及通过混合模型建模单调缺失,这些是现实数据集中常见的缺失数据模式。数学上推导了联合缺失概率和缺失相关性等性质。通过数学示例和基于一个样本量足够小、可视觉识别每个缺失数据点的知名示例数据集的经验说明,展示了该方法在仅需指定缺失指示变量的分布假设下捕捉不同缺失模式的灵活性。最后,提供了一个应用于多元金融时间序列的示例。

English abstract

A novel, stochastic approach to amputation, the process of introducing missing values to a complete dataset, is presented. It allows one to construct a wide variety of missingness patterns by only having to specify distributions of missingness indicators as opposed to specifying each missingness pattern manually. Missingness indicators are modeled in a principled way via copulas and Bernoulli margins, thus allowing one to incorporate dependence in missingness patterns. Besides more classical missingness mechanisms such as missing completely at random, missing at random, and missing not at random, the approach is able to model structured missingness such as block missingness and, via mixtures, monotone missingness, which are patterns of missing data frequently found in real-life datasets. Properties such as joint missingness probabilities or missingness correlation are derived mathematically. The flexibility of the approach in capturing different missingness patterns, while requiring only distributional assumptions on missingness indicators, is demonstrated with mathematical examples and empirical illustrations using a well-known example dataset whose sample size is small enough that each missing data point can be identified visually. Finally, an example application to multivariate financial time series is provided.
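
The idea of Bernoulli margins tied together by a copula can be sketched for two columns as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: it uses a Gaussian copula with correlation `rho`, sets the missingness indicator of column $j$ to $1$ when the copula coordinate $U_j \le p_j$ (so each indicator is marginally Bernoulli($p_j$)), and marks missing entries with `None`. The function name and signature are hypothetical.

```python
import math
import random
from statistics import NormalDist


def bernoulli_amputate(rows, probs, rho, seed=0):
    """Introduce missing values into two-column rows via a Gaussian copula.

    rows  : list of [x1, x2] complete observations
    probs : (p1, p2) marginal missingness probabilities
    rho   : correlation of the underlying bivariate normal, in [-1, 1]
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF maps normals to uniforms
    amputated = []
    for row in rows:
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated standard normals with correlation rho.
        z = (g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2)
        u = [phi(v) for v in z]
        # Indicator M_j = 1{U_j <= p_j}; a missing entry is replaced by None.
        amputated.append(
            [None if u_j <= p_j else x for x, u_j, p_j in zip(row, u, probs)]
        )
    return amputated


if __name__ == "__main__":
    data = [[float(i), float(-i)] for i in range(10)]
    for row in bernoulli_amputate(data, (0.3, 0.3), rho=0.8, seed=42):
        print(row)
```

Setting `rho = 0` makes the two indicators independent, while `rho = 1` with equal probabilities makes both columns go missing together, a simple form of block missingness.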

2501.04823 2026-06-12 cs.RO math.OC stat.AP updated version

Learning Robot Safety from Sparse Human Feedback using Conformal Prediction

基于共形预测从稀疏人类反馈中学习机器人安全

Aaron O. Feldman, Joseph A. Vincent, Maximilian Adang, JunEn Low, Mac Schwager

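AI summary: From binary human feedback on policy trajectories, uses conformal prediction to identify a region of states that contains future policy errors, builds a warning system with a guaranteed miss rate, and uses it to improve the safety of a model predictive controller.

AI Chinese abstract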

确保机器人安全可能具有挑战性;用户定义的约束可能遗漏边缘情况,策略即使从安全数据训练也可能变得不安全,并且安全可能是主观的。因此,我们通过向标记不安全行为的人类展示策略轨迹来学习机器人安全。从这种二元反馈中,我们使用共形预测的统计方法识别一个状态区域(可能在学习的潜在空间中),保证包含用户指定比例的未来策略错误。我们的方法是样本高效的,因为它基于最近邻分类,避免了共形预测中常见的保留数据。通过提醒机器人是否到达可疑的不安全区域,我们获得了一个模拟人类安全偏好且具有保证漏检率的预警系统。通过视频标注,我们的系统可以检测四旋翼视觉运动策略何时无法通过指定门。我们提出了一种通过避免可疑不安全区域来改进策略的方法。通过它,我们提高了模型预测控制器的安全性,这在30次四旋翼飞行跨越6个导航任务的实验测试中得到了证明。提供了代码和视频。

English abstract

Ensuring robot safety can be challenging; user-defined constraints can miss edge cases, policies can become unsafe even when trained from safe data, and safety can be subjective. Thus, we learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in learned latent space, guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient, as it builds on nearest neighbor classification and avoids withholding data as is common with conformal prediction. By alerting if the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We present an approach for policy improvement by avoiding the suspected unsafe region. With it we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.

10. Other / General Statistics (1 paper)

2603.26116 2026-06-12 stat.ME stat.AP updated version

Reconciling Latent Variables and Networks: Exploring and extending the Psychometric-Toolbox

整合潜在变量与网络:探索和扩展心理测量工具箱

Kevin Kistermann, Vivato V. Andriamiarana, Augustin Kelava

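AI summary: Reviews and synthesizes the connections between network psychometrics and classical psychometric methods, and proposes extending the psychometric toolbox with statistical methods from other disciplines, in order to foster cross-field collaboration and make methodological development more systematic and goal-directed.

AI Chinese abstract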

自网络心理测量引入以来,已建立了与经典心理测量模型(如IRT、SEM、GLM)及其他领域方法的联系。本文回顾了这些发展,并通过探索性文献检索进一步扩展和以可视化形式呈现。这种视角为通过整合和学习其他领域开发的统计方法来扩展心理测量工具箱提供了机会,这些方法往往解决相似或相同的问题。强调这些方法论的共同点可能促进传统上独立的跨领域合作。此外,了解这些联系可能使方法论发展更加系统和目标明确,并可能使开发统计方法与通过软件工具进行实证研究之间实现有意义的分工。最后,这些方法论进展为实证研究提供了新机会,并可能有助于解决长期存在的心理测量构念及更广泛的心理现象概念问题。

English abstract

Since the introduction of network psychometrics, several connections to statistical models in "classical" psychometrics (i.e., IRT, SEM, GLM) as well as to approaches from other research fields have been established. In this paper, these developments have been reviewed and synthesized and, based on an exploratory literature search, further advanced and presented in an accessible visual format. This perspective opens up promising opportunities to extend the psychometric-toolbox by incorporating and learning from statistical methodologies developed in other research domains, which often address similar or even identical problems. Highlighting these methodological commonalities may also foster collaboration across research fields that have traditionally remained largely independent. Moreover, awareness of these connections may render methodological development more systematic and goal-directed and may enable a meaningful division of labor, for example between the development of statistical methodology and its practical implementation for empirical research through software tools. Finally, these methodological advances provide new opportunities for empirical research and may contribute to a reconciliation with longstanding conceptual issues concerning psychometric constructs and, more broadly, psychological phenomena.