arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.10934 2026-05-12 cs.LG cs.AI cs.CV cs.RO stat.ML

Variational Inference for Lévy Process-Driven SDEs via Neural Tilting

Yaman Kindap, Manfred Opper, Benjamin Dupuis, Umut Simsekli, Tolga Birdal

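AI summary This paper studies variational inference for stochastic differential equations (SDEs) driven by Lévy processes, with the aim of accurately capturing extreme events and heavy-tailed phenomena in areas such as finance and climate science. Existing approaches are either computationally expensive or rely on Gaussian assumptions that cannot represent jumps. The authors propose a neural exponential tilting framework that reweights the Lévy measure exponentially using neural networks, yielding a flexible variational family that preserves the jump structure while remaining computationally tractable. Experiments on synthetic and real-world data show that the method captures jump dynamics and gives reliable posterior inference in regimes where Gaussian variational approaches fail.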

Comments The associated project page, which contains the official implementation, can be found at https://circle-group.github.io/research/NeuralTilting/

Abstract

Modelling extreme events and heavy-tailed phenomena is central to building reliable predictive systems in domains such as finance, climate science, and safety-critical AI. While Lévy processes provide a natural mathematical framework for capturing jumps and heavy tails, Bayesian inference for Lévy-driven stochastic differential equations (SDEs) remains intractable with existing methods: Monte Carlo approaches are rigorous but lack scalability, whereas neural variational inference methods are efficient but rely on Gaussian assumptions that fail to capture discontinuities. We address this tension by introducing a neural exponential tilting framework for variational inference in Lévy-driven SDEs. Our approach constructs a flexible variational family by exponentially reweighting the Lévy measure using neural networks. This parametrization preserves the jump structure of the underlying process while remaining computationally tractable. To enable efficient inference, we develop a quadratic neural parametrization that yields closed-form normalization of the tilted measure, a conditional Gaussian representation for stable processes that facilitates simulation, and symmetry-aware Monte Carlo estimators for scalable optimization. Empirically, we demonstrate that the method accurately captures jump dynamics and yields reliable posterior inference in regimes where Gaussian-based variational approaches fail, on both synthetic and real-world datasets.
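
To make "closed-form normalization of the tilted measure" concrete, here is a minimal sketch under assumptions that are not taken from the paper: the Lévy measure is finite with Gaussian jump sizes, lam * N(x; 0, sigma^2), and the tilt is a fixed quadratic exp(a*x - b*x^2/2) rather than a neural network output. Completing the square gives the total mass lam / sqrt(1 + b*sigma^2) * exp(a^2 / (2*(1/sigma^2 + b))), which the code compares against numerical integration. All function and parameter names are hypothetical.

```python
import math

def tilted_mass_closed_form(lam, sigma, a, b):
    """Total mass of exp(a*x - b*x**2/2) * lam * N(x; 0, sigma**2) over the real line."""
    precision = 1.0 / sigma**2 + b  # precision of the tilted Gaussian; must be positive
    if precision <= 0:
        raise ValueError("tilt is not integrable: need b > -1/sigma**2")
    return lam / math.sqrt(sigma**2 * precision) * math.exp(a**2 / (2.0 * precision))

def tilted_mass_numeric(lam, sigma, a, b, half_width=12.0, n=40001):
    """Same quantity by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x = -half_width + i * h
        density = lam * math.exp(-x**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(a * x - 0.5 * b * x**2) * density
    return total * h
```

With a = b = 0 the tilt is trivial and the mass reduces to lam, which is a quick sanity check on the formula.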

2605.10915 2026-05-12 math.ST stat.TH

A Generative High Quantile Homogeneity Test Using Bahadur Representation for Heteroskedastic High Quantile Regression of Tail Dependent Time Series

Ting Zhang, Fangwei Wu, Jingying Gao

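AI summary This paper asks whether explanatory variables have homogeneous effects on different high quantiles of the response in heteroskedastic high quantile regression for tail dependent time series. The authors propose a new high quantile homogeneity test built on a Bahadur representation that remains valid under heteroskedasticity and reduces problems about nonlinear high quantile regression estimators to problems about linear forms with an explicit error bound. The representation provides a theoretical foundation for high quantile regression, and the test is illustrated on real data.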

Comments 31 pages, 1 figure

Abstract

We consider a high quantile homogeneity test to determine whether a certain set of explanatory variables has homogeneous effects on different high quantiles of the response variable in the tail. To accommodate situations under both the null and the alternative, the auxiliary process in this case may no longer be treated as stationary, and the problem requires a joint analysis of both homoskedastic and heteroskedastic high quantiles. For this, we develop a novel Bahadur representation result in the high quantile setting for a general class of tail dependent time series under potential heteroskedasticity, which can be of independent interest. In particular, the Bahadur representation provides a foundation for reducing problems regarding nonlinear high quantile regression estimators to those regarding suitably constructed linear forms with an explicit error bound and can be transformative and useful in many statistical problems. In the current article, we apply it to guide the development of a generative high quantile homogeneity test, which is then illustrated through applications to both synthetic and real data.

2605.10914 2026-05-12 stat.CO

gemlib.mcmc: composable kernels for Metropolis-within-Gibbs sampling schemes

Alin Morariu, Jess Bridgen, Chris Jewell

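AI summary This work addresses the difficulty of statistical inference for state-transition models in epidemiology and ecology by introducing gemlib.mcmc, an MCMC module that simplifies the implementation of Metropolis-within-Gibbs sampling schemes. Using writer monads from category theory, the framework makes parameter-estimation and data-augmentation kernels composable without manual state management, which improves extensibility and reuse. Built on JAX and TensorFlow Probability, gemlib.mcmc offers an efficient, easy-to-use interface in which complex inference algorithms can be expressed concisely and reused across applications, lowering the implementation barrier and bringing methodological research closer to applied work.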

Abstract

State-transition models are essential across epidemiology and ecology, but statistical inference remains challenging owing to high-dimensional latent state spaces, temporal dependence, and intractable likelihood functions. Bayesian inference via Markov Chain Monte Carlo (MCMC) enables joint estimation of model parameters and missing event times through data augmentation, but Metropolis-within-Gibbs (MWG) schemes that combine multiple specialised kernels are notoriously difficult to implement. Current probabilistic programming frameworks face a trade-off: automation sacrifices extensibility, whilst flexibility demands substantial implementation overhead. This divide has created a software landscape characterised by tightly coupled, model-specific implementations that resist reuse and extension. We introduce gemlib.mcmc, an MCMC module designed to bridge methodological and applied communities through principled, composable kernel abstractions. The framework employs writer monads from category theory to formalise kernel composition, enabling seamless integration of parameter-estimation and data-augmentation kernels without manual state management. Built on JAX and TensorFlow Probability for high-performance computation, gemlib.mcmc provides an ergonomic interface -- leveraging Python's right-shift operator for intuitive kernel chaining -- whilst maintaining statistical rigour and transparency. Developers can extend the library by implementing only two methods; composition and hardware acceleration are automated. We demonstrate the framework through parameter inference on partially observed epidemic models, showing how complex inference algorithms can be expressed concisely and reused across applications. By reducing implementation burden we provide access to sophisticated MCMC methods and enable applied researchers to employ state-of-the-art algorithms without reimplementation overhead.
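
The idea of chaining kernels with the right-shift operator while a writer monad carries the diagnostics can be illustrated with a toy. The sketch below is not gemlib.mcmc's API: the `Kernel` class, the `step` signature, and the two deterministic "updates" (stand-ins for real Metropolis-Hastings proposals) are all hypothetical. Each kernel returns a new state together with a log, and `>>` threads the state through both kernels and concatenates the logs.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = dict
Log = tuple  # tuples concatenate, which is the monoid a writer needs for its output

@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel wrapper: step(state) -> (new_state, log)."""
    step: Callable[[State], Tuple[State, Log]]

    def __rshift__(self, other: "Kernel") -> "Kernel":
        def composed(state):
            mid, log_a = self.step(state)
            out, log_b = other.step(mid)
            return out, log_a + log_b  # logs are appended, never hand-threaded by the caller
        return Kernel(composed)

def update_beta(state):
    # Deterministic stand-in for a parameter-estimation update.
    return {**state, "beta": state["beta"] * 0.5}, (("beta", "accepted"),)

def update_events(state):
    # Deterministic stand-in for a data-augmentation update.
    return {**state, "events": state["events"] + 1}, (("events", "accepted"),)

sweep = Kernel(update_beta) >> Kernel(update_events)
final_state, diagnostics = sweep.step({"beta": 1.0, "events": 0})
```

The design point is that the composed object is itself a `Kernel`, so longer sweeps are built by further applications of `>>` with no change to the individual updates.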

2605.10911 2026-05-12 math.PR cs.CC cs.DS math.CO math.ST stat.TH

The stochastic block model has the overlap graph property for modularity

Shankar Bhamidi, David Gamarnik, Remco van der Hofstad, Nelly Litvak, Pawel Pralat, Fiona Skerman, Yasmin Tousinejad

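AI summary This paper studies the theoretical limits of modularity-based clustering algorithms on the stochastic block model (SBM) and shows that modularity exhibits the overlap gap property (OGP) on the SBM. The property implies that local algorithms based on modularity have difficulty recovering the hidden community structure and that a related Markov chain mixes slowly. The work extends a result of Bickel and Chen by proving that, with high probability, any partition with near-optimal modularity is close to the hidden community partition, which gives a theoretical basis for understanding algorithmic bottlenecks on the SBM.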

Comments 28 pages, 2 figures

Abstract

The overlap gap property (OGP) is a statement about the geometry of near-optimal solutions. Exhibiting OGP implies failure of a class of local algorithms, and it has been observed to coincide with conjectured algorithmic limits in problems with a statistical-computational gap. We consider the Stochastic Block Model (SBM), where the graph has a planted partition with $k$ equal-size blocks which form the 'communities', and where, for parameters $p>q$, vertices within the same community connect with probability $p$, while vertices in different communities connect with probability $q$, independently across pairs of vertices. Modularity-based clustering algorithms have become ubiquitous in applications. This article studies theoretical limits of local algorithms based on the modularity score on the SBM. We establish that modularity exhibits OGP on the SBM. This rules out a class of local algorithms based on modularity for recovery in the SBM, and shows slow mixing time for a related Markov chain. Theoretically, this is one of the few instances where OGP has been established for a 'planted' model, as most such analyses to date consider the 'null' model. As part of our analysis, we extend a result by Bickel and Chen (2009), who established that with high probability, the modularity optimal partition of SBM is $o(n)$ local moves away from the planted partition, where $n$ is the graph size. We show that, with high probability, any partition with modularity score sufficiently near the optimal value is close to the planted partition.
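
The abstract does not state the modularity formula. The sketch below assumes the usual Newman-Girvan definition, in which each community contributes its fraction of internal edges minus the squared fraction of total degree it holds; the function name and the toy graph are illustrative only.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition of a simple undirected graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping vertex -> community label.
    """
    edges = list(edges)
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints inside the community
    degree_sum = defaultdict(int)  # total degree of the community's vertices
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Two triangles joined by a single bridge edge, with the triangles as the planted blocks.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
planted = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
```

On this graph the planted partition scores 2 * (3/7 - 1/4) = 5/14, while putting every vertex in one community scores 0.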

2605.10909 2026-05-12 cs.LG stat.ML

Revisiting Policy Gradients for Restricted Policy Classes: Escaping Myopic Local Optima with $k$-step Policy Gradients

Alex DeWeese, Guannan Qu

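AI summary This paper revisits standard policy gradient methods on restricted policy classes and finds that they easily get stuck at suboptimal critical points, mainly because the policy gradient is itself myopic: it improves the policy using only the one-step Q-function. The authors propose a generalized $k$-step policy gradient method that couples the randomness within a $k$-step time window and can escape myopic local optima in restricted policy classes. The analysis shows that the method comes exponentially close, in $k$, to the performance of the optimal deterministic policy, and that projected gradient descent and mirror descent achieve this guarantee in $O(1/T)$ iterations while assuming only smoothness and differentiability of the value function. The approach applies to previously hard settings such as state aggregation and partially observable cooperative multi-agent problems.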

Abstract

This work revisits standard policy gradient methods used on restricted policy classes, which are known to get stuck in suboptimal critical points. We identify an important cause for this phenomenon to be that the policy gradient is itself fundamentally myopic, i.e. it only improves the policy based on the one-step $Q$-function. In this work, we propose a generalized $k$-step policy gradient method that couples the randomness within a $k$-step time window and can escape the myopic local optima in MDPs with restricted policy classes. We show this new method is theoretically guaranteed to converge to a solution that is exponentially close in performance to the optimal deterministic policy with respect to $k$. Further, we show projected gradient descent and mirror descent with this $k$-step policy gradient can achieve this exponential guarantee in $O(\frac{1}{T})$ iterations, despite only assuming smoothness and differentiability of the value function. This will provide near optimal solutions to previously elusive applications like state aggregation and partially observable cooperative multi-agent settings. Moreover, our bounds avoid the ubiquitous distribution mismatch factors $\|d_\mu^{\pi^*} / d_\mu^{\pi}\|_\infty$ and $\|d_\mu^{\pi^*} / \mu\|_\infty$, enabling the $k$-step policy gradient method to escape suboptimal critical points that emerge from poor exploration in fully observable settings.

2605.10842 2026-05-12 econ.EM math.ST stat.TH

Higher-Order Neyman Orthogonality in Moment-Condition Models

Stéphane Bonhomme, Koen Jochmans, Whitney K. Newey, Martin Weidner

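AI summary This paper studies how to construct higher-order Neyman-orthogonal moment functions in parametric moment condition models. The goal is to reduce sensitivity to nuisance-parameter estimation error and so provide a unified, tractable route to higher-order debiasing across a wide range of econometric models. The number of additional nuisance parameters the construction requires is independent of the order of orthogonalization and can be reduced to a single scalar if desired.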

Abstract

We construct moment functions that are Neyman-orthogonal to a chosen order in parametric moment condition models. These moment functions reduce sensitivity to nuisance estimation error and, as such, offer a unified and tractable route to higher-order debiasing in a wide range of econometric models. The number of additional nuisance parameters required by our construction, beyond those already present in the original moment conditions, is independent of the order of orthogonalization and can be reduced to a single scalar if desired.
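
For orientation, orthogonality "to a chosen order" can be written out as follows. The notation is assumed rather than taken from the paper: $m(W;\theta,\eta)$ is the moment function, $\theta_0$ the parameter of interest, $\eta_0$ the true nuisance parameter (scalar here for readability), and $q$ the order.

```latex
\mathbb{E}\,[m(W;\theta_0,\eta_0)] = 0,
\qquad
\left.\frac{\partial^{k}}{\partial \eta^{k}}\,
\mathbb{E}\,[m(W;\theta_0,\eta)]\right|_{\eta=\eta_0} = 0
\quad \text{for } k = 1,\dots,q .
```

Under enough smoothness, a Taylor expansion in $\eta$ around $\eta_0$ then gives $\mathbb{E}[m(W;\theta_0,\eta)] = O(|\eta-\eta_0|^{q+1})$, which is the sense in which a larger $q$ reduces sensitivity to nuisance estimation error. The case $q=1$ is the familiar first-order Neyman orthogonality.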

2605.10805 2026-05-12 cs.AI cs.CL stat.ML

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

Wenbo Zhang, Lijinghua Zhang, Liner Xiang, Hengrui Cai

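AI summary This paper studies the trade-off between the benefits and costs of reasoning when LLMs act as judges. Reasoning substantially improves judgment accuracy on tasks that require structured verification, but on simpler tasks it brings limited or even negative gains at higher computational cost. The authors propose RACER, which uses distributionally robust optimization to decide dynamically, under a fixed budget, whether to invoke reasoning; it handles distribution shift and achieves a superior accuracy-cost trade-off in experiments.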

Comments Accepted at ICML 2026

Abstract

Reasoning-capable large language models (LLMs) have recently been adopted as automated judges, but their benefits and costs in LLM-as-a-Judge settings remain unclear. Through controlled comparisons between reasoning and non-reasoning judges, we show that explicit reasoning substantially improves judgment accuracy on tasks requiring structured verification (e.g., math and coding), while offering limited or even negative gains on simpler evaluations and incurring significantly higher computational cost. These findings suggest that reasoning should be used selectively rather than universally, with awareness of possible distribution shift. We propose Robust Adaptive Cost-Efficient Routing (RACER), which dynamically selects between reasoning and non-reasoning judges under a fixed budget by formulating routing as a constrained distributionally robust optimization problem. RACER explicitly accounts for distribution shift via a KL-divergence uncertainty set, admits an efficient primal--dual algorithm, and enjoys theoretical guarantees including uniqueness of the optimal policy and linear convergence. Extensive experiments show that RACER achieves superior accuracy--cost trade-offs under distribution shift.
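
A KL-divergence uncertainty set has a well-known one-dimensional dual: the worst-case mean of a loss over all distributions within KL radius rho of a reference P equals the infimum over lam > 0 of lam*rho + lam*log E_P[exp(loss/lam)]. The sketch below evaluates that dual on a grid for an empirical sample. It illustrates only the uncertainty set, not RACER's routing policy or its primal-dual algorithm, and the function names and grid range are assumptions.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))

def kl_worst_case_mean(losses, rho, n_grid=400):
    """Upper bound on sup_Q E_Q[loss] over {Q : KL(Q || P_n) <= rho}, P_n the empirical law.

    Evaluates the dual objective lam*rho + lam*log E[exp(loss/lam)] on a log-spaced grid
    of multipliers lam. Every grid point is a valid upper bound, so the minimum is too.
    """
    best = float("inf")
    for i in range(n_grid):
        lam = 10.0 ** (-2.0 + 5.0 * i / (n_grid - 1))  # lam ranges from 1e-2 to 1e3
        scaled = [x / lam for x in losses]
        best = min(best, lam * rho + lam * log_mean_exp(scaled))
    return best
```

As rho shrinks to zero the bound approaches the plain sample mean, and as rho grows it approaches the largest observed loss, which is the range a robust router would interpolate over.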

2605.10795 2026-05-12 stat.ML cond-mat.dis-nn cond-mat.stat-mech cs.LG

Factual recall in linear associative memories: sharp asymptotics and mechanistic insights

Alessio Giorlandino, Sebastian Goldt, Antoine Maillard

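AI summary This paper studies the fundamental limits of linear associative memories for storing and retrieving input-output associations, characterizing the sharp asymptotics of the storage capacity and the mechanism behind it. The authors introduce a decoupled model, give evidence that it is equivalent to the original model in storage capacity, weight spectra, and storage mechanism, and use tools from statistical physics to derive the relation between the maximal number of stored associations and the input dimension. The analysis also shows how the optimal solution improves on the classical Hebbian learning rule, offering new insight into memory mechanisms in neural networks.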

Abstract

Large language models demonstrate remarkable ability in factual recall, yet the fundamental limits of storing and retrieving input--output associations with neural networks remain unclear. We study these limits in a minimal setting: a linear associative memory that maps $p$ input embeddings in $\mathbb{R}^d$ to their corresponding $d$-dimensional targets via a single layer, requiring each mapped input to be well separated from all other targets. Unlike in supervised classification, this strict separation induces $p$ constraints per association and produces strong correlations between constraints that make a direct characterisation of the storage capacity difficult. Here, we provide a precise characterisation of this capacity in the following way. We first introduce a decoupled model in which each input has its own independent set of competing outputs, and provide numerical and analytical evidence that this decoupled model is equivalent to the original model in terms of storage capacity, spectra of the learnt weights, and storage mechanism. Using tools from statistical physics, we show that the decoupled model can store up to $p_c \log p_c / d^2 = 1 / 2$ associations, and generalise the computation of $p_c$ to linear two-layer architectures. Our analysis also gives mechanistic insight into how the optimal solution improves over a naïve Hebbian learning rule: rather than boosting input-output alignments with broad fluctuations, the optimal solution raises the correct scores just above the extreme-value threshold set by the competing outputs. These findings give a sharp statistical-physics characterisation of factual storage in linear networks and provide a baseline for understanding the memory capacity of more realistic neural architectures.
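
The separation requirement and the naive Hebbian baseline can be made concrete with a small sketch. It assumes the Hebbian rule is the sum of outer products of targets and inputs, and that "well separated" means the score of the correct target exceeds every competing score. The function names are hypothetical, and the paper's optimal solution is not implemented here.

```python
def hebbian_weights(inputs, targets):
    """W = sum over associations of outer(target, input): the naive Hebbian rule."""
    d_out, d_in = len(targets[0]), len(inputs[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for x, y in zip(inputs, targets):
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] += y[i] * x[j]
    return W

def recall_margins(W, inputs, targets):
    """For each association: score of the correct target minus the best competing score.

    Needs at least two associations so that a competitor exists.
    """
    margins = []
    for mu, x in enumerate(inputs):
        mapped = [sum(w * xj for w, xj in zip(row, x)) for row in W]     # W @ x
        scores = [sum(a * b for a, b in zip(y, mapped)) for y in targets]  # y_nu . (W x)
        best_other = max(s for nu, s in enumerate(scores) if nu != mu)
        margins.append(scores[mu] - best_other)
    return margins
```

An association counts as stored when its margin is positive. With orthonormal inputs the Hebbian map sends each input exactly to its own target, so failures only appear once inputs overlap, which is the regime the capacity analysis addresses.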

2605.10774 2026-05-12 math.ST stat.ML stat.TH

When Are Trade-Off Functions Testable from Finite Samples?

Kaining Shi, Qiaosen Wang, Cong Ma

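AI summary This paper studies finite-sample statistical testing of the trade-off function of two unknown probability distributions, the function that traces the optimal frontier of type I and type II errors in binary testing. The authors introduce an exact attainability framework and show that, within it, nontrivial finite-sample testing is possible exactly when the prescribed class of measurable sets has finite Vapnik-Chervonenkis dimension. They construct a test with nonasymptotic error guarantees, obtain simultaneous confidence bands for the whole trade-off curve by inverting the test, and analyze the sharpness and robustness of the procedure.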

Abstract

We study finite-sample inference for the trade-off function of two unknown probability distributions, the function that traces the optimal type I/type II error frontier in binary testing. Given samples from distributions $P$ and $Q$, we consider the problem of testing whether their trade-off function lies above a benchmark curve $f_0$ or falls below a weaker benchmark $f_1$. Without structural restrictions, this problem is impossible uniformly over nonparametric classes. We identify a sharp condition under which it becomes possible. The key structural assumption is that the Neyman--Pearson rejection regions for $(P,Q)$ are attainable, up to null sets, by a prescribed class $S$ of measurable sets. Within this exact attainability framework, finite Vapnik--Chervonenkis dimension of $S$ is both sufficient and necessary for nontrivial finite-sample testing. We construct a test with nonasymptotic error guarantees: type I error control is valid without assuming attainability, while power holds uniformly over attainable alternatives satisfying an explicit separation condition. By inverting the test, we also obtain simultaneous confidence bands for the whole trade-off curve. Finally, we study the sharpness and robustness of the procedure. In the monotone likelihood-ratio model, we derive local separation rates and prove matching lower bounds up to logarithmic factors. We also allow approximate, rather than exact, attainability; this extension yields finite-sample guarantees for univariate log-concave distributions by approximating their rejection regions with unions of intervals.
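
A standard textbook case, not taken from the paper, shows what a trade-off function looks like. For P = N(0,1) against Q = N(mu,1) with mu >= 0, the Neyman-Pearson test rejects when the observation exceeds a threshold, so at type I level alpha the smallest type II error is Phi(Phi^{-1}(1 - alpha) - mu). The rejection regions are half-lines, a class of VC dimension 1, so this pair falls inside the attainability setting described above. The function name is an assumption.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for its cdf and quantile function

def gaussian_tradeoff(alpha, mu):
    """Smallest type II error at type I level alpha for N(0,1) versus N(mu,1), mu >= 0."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    threshold = _STD.inv_cdf(1.0 - alpha)  # reject when the observation exceeds this
    return _STD.cdf(threshold - mu)        # probability under N(mu,1) of not rejecting
```

When mu = 0 the two distributions coincide and the curve is the trivial line 1 - alpha. Larger mu pulls the curve down, and a benchmark curve $f_0$ in the testing problem plays the role of such a reference frontier.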

2605.10716 2026-05-12 cs.LG stat.ML

What should post-training optimize? A test-time scaling law perspective

Muheng Li, Jian Qian, Wenlong Mou

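AI summary This paper studies the mismatch between the best-of-$N$ strategy commonly used when deploying large language models and the objectives used in post-training. The authors show that, when training resources are limited, the gradient of the best-of-$N$ objective can be approximated by extrapolating upper-tail statistics of the reward distribution, which leads to efficient post-training methods. The proposed Tail-Extrapolated Advantage (TEA) and its debiased variant Prefix-TEA improve best-of-$N$ performance across a range of language models and datasets.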

Abstract

Large language models are increasingly deployed with test-time strategies: sample $N$ responses, score them with a reward model or verifier, and return the best. This deployment rule exposes a mismatch in post-training: standard objectives optimize the mean reward of a single response, whereas best-of-$N$ performance is governed by the upper tail of the reward distribution. Recent test-time-aware objectives partly address this mismatch, but typically assume that training can use the same per-prompt rollout budget as deployment, which is impractical when post-training must cover many prompts while deployment can allocate much larger per-prompt test-time compute. We study this budget-mismatch regime, where only $m\ll N$ per-prompt rollouts are available during training but the target objective is best-of-$N$ deployment. Under structural assumptions on the reward tails, we show that the policy gradient of the best-of-$N$ objective can be approximated from a much smaller rollout group by extrapolating upper-tail statistics. This yields a family of Tail-Extrapolated estimators for best-of-$N$-oriented post-training: a simple direct estimator, Tail-Extrapolated Advantage (TEA), and a fixed-order debiased Prefix-TEA estimator based on moment cancellation. Experiments on instruction-following tasks show that TEA and Prefix-TEA improve best-of-$N$ performance across different language models, reward models and datasets under various training and test-time budget settings.
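
To see why a small rollout group is a problem, consider the naive plug-in estimate of best-of-$N$ reward from $m$ sampled rewards: the expected maximum of $N$ draws with replacement from the empirical distribution. This is a baseline sketch, not the paper's TEA or Prefix-TEA estimator, and the function name is hypothetical.

```python
def plug_in_best_of_n(rewards, n):
    """Expected maximum of n draws with replacement from the empirical reward distribution.

    The i-th smallest of m rewards is the maximum with probability (i/m)**n - ((i-1)/m)**n.
    """
    ordered = sorted(rewards)
    m = len(ordered)
    return sum(
        r * ((i / m) ** n - ((i - 1) / m) ** n)
        for i, r in enumerate(ordered, start=1)
    )
```

The estimate can never exceed the largest reward actually observed, so when $m \ll N$ it says little about the upper tail that governs deployment performance. That gap is what extrapolating tail statistics is meant to close.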

2605.10713 2026-05-12 stat.ML cs.IT cs.LG math.IT math.ST stat.TH

Price of Quality: Sufficient Conditions for Sparse Recovery using Mixed-Quality Data

Youssef Chaabouni, David Gamarnik

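AI summary This paper studies sparse recovery from mixed-quality data sources, where a small number of high-quality, low-noise measurements coexist with a large number of low-quality, high-noise measurements. The authors introduce the Price of Quality and give information-theoretic and algorithmic sample-size conditions that expose the substitution rate between high-quality and low-quality samples. In the agnostic setting, where the decoder does not know the quality of each sample, the value of a high-quality sample is bounded, whereas in the informed setting, where per-sample variances are known, it can grow arbitrarily large. The LASSO recovery threshold under mixed noise matches the homogeneous-noise case, showing strong robustness to data heterogeneity. The work gives the first conditions for sparse recovery with mixed-quality data and shows that the information-theoretic and algorithmic thresholds adapt differently to changes in data quality.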

Comments Published as a conference paper at ICLR 2026

Abstract

We study sparse recovery when observations come from mixed-quality sources: a small collection of high-quality measurements with small noise variance and a larger collection of lower-quality measurements with higher variance. For this heterogeneous-noise setting, we establish sample-size conditions for information-theoretic and algorithmic recovery. On the information-theoretic side, we show that it is sufficient for $(n_1, n_2)$ to satisfy a linear trade-off defining the Price of Quality: the number of low-quality samples needed to replace one high-quality sample. In the agnostic setting, where the decoder is completely agnostic to the quality of the data, it is uniformly bounded, and in particular one high-quality sample is never worth more than two low-quality samples for this sufficient condition to hold. In the informed setting, where the decoder is informed of per-sample variances, the price of quality can grow arbitrarily large. On the algorithmic side, we analyze the LASSO in the agnostic setting and show that the recovery threshold matches the homogeneous-noise case and only depends on the average noise level, revealing a striking robustness of computational recovery to data heterogeneity. Together, these results give the first conditions for sparse recovery with mixed-quality data and expose a fundamental difference between how the information-theoretic and algorithmic thresholds adapt to changes in data quality.

2605.10671 2026-05-12 cs.LG math.OC stat.ML

Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework

Phalguni Nanda, Zaiwei Chen

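AI summary This paper expresses the natural policy gradient algorithm from reinforcement learning as a form of doubly smoothed policy iteration (DSPI) within a Bellman-operator framework. In this framework each policy is obtained by applying a regularized greedy step to a weighted average of past Q-functions, which covers policy iteration, dual-averaged policy iteration, and related methods. The authors prove distribution-free global geometric convergence of DSPI without modifying the MDP or using trajectory-dependent stepsizes, and derive iteration complexity bounds for natural policy gradient and policy dual averaging. The framework also extends to discounted MDPs with linear function approximation and to stochastic shortest path problems.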

Abstract

In this work, we show that natural policy gradient, a core algorithm in reinforcement learning, admits an exact formulation as a smoothed and averaged form of policy iteration. Specifically, we introduce doubly smoothed policy iteration (DSPI), a Bellman-operator framework in which each policy is obtained by applying a regularized greedy step to a weighted average of past $Q$-functions. DSPI includes policy iteration, dual-averaged policy iteration, natural policy gradient, and more general policy dual averaging methods as special cases. Using only monotonicity and contraction of smoothed Bellman operators, we prove distribution-free global geometric convergence of DSPI. Consequently, standard natural policy gradient and policy dual averaging achieve an iteration complexity of $\mathcal{O}((1-\gamma)^{-1}\log((1-\gamma)^{-1}\epsilon^{-1}))$ for computing an $\epsilon$-optimal policy, without modifying the MDP, adding regularization beyond the mirror map inherent in the update, or using adaptive, trajectory-dependent stepsizes. For the unregularized greedy case, corresponding to dual-averaged policy iteration, we also prove finite termination. The same Bellman-operator framework further extends to discounted MDPs with linear function approximation and stochastic shortest path problems.
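
The link between natural policy gradient and "a regularized greedy step on a weighted average of past $Q$-functions" can be checked numerically in the simplest case. The sketch assumes the familiar multiplicative form of the tabular softmax update, pi_{t+1}(a) proportional to pi_t(a) * exp(eta * Q_t(a)), started from a uniform policy in a single state. The Q-values are supplied as fixed numbers rather than computed from an MDP, and all names are hypothetical. It is not the paper's DSPI algorithm.

```python
import math

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def npg_multiplicative(q_history, eta):
    """Iterate pi <- pi * exp(eta * Q_t), renormalised, from a uniform policy (one state)."""
    n_actions = len(q_history[0])
    pi = [1.0 / n_actions] * n_actions
    for q in q_history:
        unnorm = [p * math.exp(eta * qa) for p, qa in zip(pi, q)]
        total = sum(unnorm)
        pi = [u / total for u in unnorm]
    return pi

def softmax_of_summed_q(q_history, eta):
    """Softmax (entropy-regularised greedy) step applied to the running sum of past Q-values."""
    n_actions = len(q_history[0])
    summed = [sum(q[a] for q in q_history) for a in range(n_actions)]
    return softmax([eta * s for s in summed])
```

The two functions return the same policy for any sequence of Q-values, because the multiplicative updates telescope into a single exponential of the accumulated Q-values. The sum is a uniform average up to a rescaling of eta.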

2605.10668 2026-05-12 cs.LG math.OC math.ST stat.TH

A Spectral Framework for Closed-Form Relative Density Estimation

Francis Bach

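AI summary This paper proposes a closed-form spectral framework for relative log-density estimation in linearly parameterized probabilistic models, including unnormalized and conditional models. By representing the KL divergence as an integral of weighted chi-squared divergences, the method turns KL estimation into a family of least-squares problems and derives an explicit spectral formula based on first- and second-order feature moments, which yields closed-form estimators of divergences and log-density potentials. The framework covers a broad class of f-divergences and can be combined with kernel methods or neural-network feature learning. Convergence of the estimators is proved, and they are compared on synthetic data with optimization-based variational methods.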

Abstract

We propose a closed-form spectral framework for relative log-density estimation in linearly parameterized probabilistic models, including unnormalized and conditional models. This is achieved by representing the Kullback-Leibler (KL) divergence as an integral of weighted chi-squared divergences, converting KL estimation into a family of least-squares problems. We derive an explicit spectral formula based only on first- and second-order feature moments, yielding closed-form estimators of both divergences and log-density potentials for fixed features. The framework extends to a broad class of f-divergences and can be combined with kernelization or feature learning with neural networks. We prove convergence guarantees for the resulting estimators and empirically compare them on synthetic data with optimization-based variational formulations, including logistic and softmax regression for normalized conditional models.

2605.10659 2026-05-12 cs.CL cs.AI cs.SI stat.ML

When Can Digital Personas Reliably Approximate Human Survey Findings?

Mumin Jia, Yilin Chen, Divya Sharma, Jairo Diaz-Rodriguez

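AI summary This paper examines how far digital personas generated by large language models (LLMs) can reliably reproduce human survey responses. Personas are built from the LISS panel and compared with the same respondents' later answers, with performance assessed across tasks and levels of analysis. Digital personas do well in domains tied to stable attributes and values but remain limited for individual-level prediction and for recovering multivariate structure, and their performance depends more on the structure of human responses than on the choice of model.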

Abstract

Digital personas powered by Large Language Models (LLMs) are increasingly proposed as substitutes for human survey respondents, yet it remains unclear when they can reliably approximate human survey findings. We answer this question using the LISS panel, constructing personas from respondents' background variables and pre-2023 survey histories, then testing them against the same respondents' held-out post-cutoff answers. Across four persona architectures, three LLMs, and two prediction tasks, we assess performance at the question, respondent, distributional, equity, and clustering levels. Digital personas improve alignment with human response distributions, especially in domains tied to stable attributes and values, but remain limited for individual prediction and fail to recover multivariate respondent structure. Retrieval-augmented architectures provide the clearest gains, but performance depends more on human response structure than on model choice: personas perform best for low-variability questions and common respondent patterns, and worst for subjective, heterogeneous, or rare responses. Our results provide practical guidance on when digital personas could be appropriate for survey research and when human validation remains necessary.

2605.10651 2026-05-12 cs.LG cs.AI stat.ML

A Recursive Decomposition Framework for Causal Structure Learning in the Presence of Latent Variables

Zheng Li, Feng Xie, Shenglan Nie, Xichen Guo, Ruxin Wang, Hao Zhang

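AI summary This paper proposes DiCoLa, a recursive decomposition framework for causal structure learning in the presence of latent variables. The method recursively decomposes the global learning task into smaller subproblems and integrates their solutions through a principled reconstruction step to recover the global causal structure. The framework is proved sound and complete, and experiments on synthetic and real data show that it significantly improves the computational efficiency of several causal discovery algorithms.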

Abstract

Constraint-based causal discovery is widely used for learning causal structures, but heavy reliance on conditional independence (CI) testing makes it computationally expensive in high-dimensional settings. To mitigate this limitation, many divide-and-conquer frameworks have been proposed, but most assume causal sufficiency, i.e., no latent variables. In this paper, we show that divide-and-conquer strategies can be theoretically generalized beyond causal sufficiency to settings with latent variables. Specifically, we propose a recursive decomposition framework, termed DiCoLa, that enables divide-and-conquer causal discovery in the presence of latent variables. It recursively decomposes the global learning task into smaller subproblems and integrates their solutions through a principled reconstruction step to recover the global structure. We theoretically establish the soundness and completeness of the proposed framework. Extensive experiments on synthetic data demonstrate that our approach significantly improves computational efficiency across a range of causal discovery algorithms, while experiments on a real-world dataset further illustrate its practical effectiveness.

2605.10618 2026-05-12 stat.ME

Indirect Comparisons For Health Technology Assessment: A Practical Methodological Guide And Tips With Insights From The French Transparency Commission

Louise Baschet, Ana Jarne, Matthias Monnereau, Clémence Fradet, Axel Benoist

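AI summary This paper offers practical guidance on indirect treatment comparisons (ITCs) for health technology assessment when direct head-to-head evidence is unavailable. Drawing on the experience of the Transparency Committee of the French National Authority for Health (HAS), it discusses how to make ITCs reliable in practice, including rigorous assessment of the similarity and transitivity assumptions and appropriate design of the evidence network in network meta-analysis. It stresses that choosing methods suited to the decision context matters for the quality of ITCs and their value for decision-making.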

Comments 9 pages, 1 figure

Abstract

Context: Indirect treatment comparisons (ITC) are essential when direct head-to-head evidence is unavailable. Their reliability depends on rigorous methodological choices and careful assessment of underlying assumptions. Appropriate methodological choices can help address challenges such as cross-country variations in treatment practices, ethical constraints, and evolving treatment landscapes during trial conduct. This opinion and perspective paper provides practical guidance to strengthen the quality, robustness and accuracy of ITCs in the context of health technology assessment (HTA) in France. Methods: A panel of experts in ITCs and the French market access environment developed the present strategic guidance, informed by previous work reviewing HTA methodological guidelines and complemented by a systematic review of Transparency Committee opinions from the French National Authority for Health (HAS). Results: Key considerations include early anticipation of ITCs, justification of potential confounding factors, and rigorous assessment of similarity and transitivity in randomized trial-based comparisons. In network meta-analysis, the structure of the evidence network should be adapted to the specific decision context. Population-Adjusted Indirect Comparisons require careful reporting and interpretation of the effective sample size. When evidence relies on non-randomized clinical trials, comparisons between single-arm studies and external control arms may be appropriate under different scenarios, depending on the feasibility of conducting subsequent randomized studies. Conclusions: Robust and reliable ITCs require methods consistent with the validity of their assumptions and the strength of the available evidence. This practical guidance supports the development of rigorous ITCs to inform decision-making in complex medical contexts where direct comparisons are not feasible.
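
The abstract does not define the effective sample size it asks authors to report. A commonly used definition for weighted analyses such as matching-adjusted indirect comparison is Kish's formula, the squared sum of the weights divided by the sum of squared weights. The sketch below assumes that definition, and the function name is hypothetical.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    if squares == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / squares
```

Equal weights return the actual sample size, and the value falls as the weights become more uneven. A small effective sample size relative to the original cohort therefore signals that the adjusted comparison rests on few patients.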

2605.10590 2026-05-12 stat.ML cs.LG

Amortizing Causal Sensitivity Analysis via Prior Data-Fitted Networks

Emil Javurek, Dennis Frauen, Marie Brockschmidt, Jonas Schweisthal, Stefan Feuerriegel

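AI summary This paper proposes an amortized approach to causal sensitivity analysis that efficiently bounds causal effect estimates in the presence of unobserved confounding. Using prior-data fitted networks, it replaces per-instance computation with in-context learning, greatly reducing computation time. The method builds a general prior-data construction and generates training labels through a Lagrangian scalarization of the objective, which avoids model-specific analytical derivations and, under standard convexity and linearity conditions, recovers the full Pareto frontier of solutions. Experiments show substantial speed-ups across datasets and sensitivity levels.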

Abstract

Causal sensitivity analysis aims to provide bounds for causal effect estimates in the presence of unobserved confounding. However, existing methods for causal sensitivity analysis are per-instance procedures, meaning that changes to the dataset, causal query, sensitivity level, or treatment require new computation. Here, we instead present an in-context learning approach. Specifically, we propose an amortized approach to causal sensitivity analysis based on prior-data fitted networks. A key challenge is that the sensitivity bounds are not directly available when sampling training data. To address this, we develop a general prior-data construction that is applicable across the class of generalized treatment sensitivity models. Our construction involves a Lagrangian scalarization of the objective to generate training labels for the bounds through a tradeoff between causal effect minimization or maximization and sensitivity model violation, which avoids model-specific analytical derivations. We further show that, under standard convexity and linearity conditions, our objective recovers the full Pareto frontier of solutions. Empirically, we demonstrate our amortized approach across various datasets, causal queries, and sensitivity levels, where our approach achieves a test-time computation that is orders of magnitude faster than per-instance methods. To the best of our knowledge, ours is the first foundation model for in-context learning for causal sensitivity analysis.

2605.10566 2026-05-12 stat.ML cs.LG cs.NA math.NA

Affine Tracing: A New Paradigm for Probabilistic Linear Solvers

Disha Hegde, Marvin Pförtner, Jon Cockayne

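AI summary This paper proposes affine tracing, a new framework for probabilistic linear solvers that quantify uncertainty in the solution of linear systems. It shows that Bayesian probabilistic linear solvers are a special case of non-stationary affine probabilistic iterative methods (PIMs) and proves that any realistic affine PIM is calibrated. To remove the manual effort of implementing affine PIMs, the authors introduce affine tracing, which automatically constructs a probabilistic iterative solver from a standard implementation of an affine iterative method, and demonstrate it on Gaussian process approximation.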

Abstract

Probabilistic linear solvers (PLSs) return probability distributions that quantify uncertainty due to limited computation in the solution of linear systems. The literature has traditionally distinguished between Bayesian PLSs, which condition a prior on information obtained from projections of the linear system, and probabilistic iterative methods (PIMs), which lift classical iterative solvers to probability space. In this work we show this dichotomy to be false: Bayesian PLSs are a special case of non-stationary affine PIMs. In addition, we prove that any realistic affine PIM is calibrated. These results motivate a focus on (non-stationary) affine PIMs, but their practical adoption has been limited by the significant manual effort required to implement them. To address this, we introduce affine tracing, an algorithmic framework that automatically constructs a PIM from a standard implementation of an affine iterative method by passing symbolic tracers through the computation to build an affine computational graph. We show how this graph can be transformed to compute posterior covariances, and how equality saturation can be used to perform algebraic simplifications required for computation under specific prior choices. We demonstrate the framework by automatically generating a probabilistic multigrid solver and evaluate its performance in the context of Gaussian process approximation.
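
The tracing idea can be illustrated in one dimension. The sketch below is a toy, not the paper's implementation: the `AffineTracer` class, the scalar Richardson iteration, and the reading of "lifting" as pushing a Gaussian prior on the initial guess through the recovered affine map are all assumptions made for illustration. The solver is written once for ordinary numbers. Calling it with a tracer records the coefficient and constant of the affine map from the initial guess to the final iterate.

```python
class AffineTracer:
    """Scalar tracer representing the value coef * x0 + const for a symbolic input x0."""

    def __init__(self, coef, const):
        self.coef, self.const = coef, const

    def __add__(self, other):
        if isinstance(other, AffineTracer):
            return AffineTracer(self.coef + other.coef, self.const + other.const)
        return AffineTracer(self.coef, self.const + other)

    __radd__ = __add__

    def __neg__(self):
        return AffineTracer(-self.coef, -self.const)

    def __sub__(self, other):
        return self + (-other)

    def __rsub__(self, other):
        return (-self) + other

    def __mul__(self, scalar):
        if isinstance(scalar, AffineTracer):
            raise TypeError("tracer * tracer is not affine")
        return AffineTracer(self.coef * scalar, self.const * scalar)

    __rmul__ = __mul__

def richardson(x, a, rhs, omega, steps):
    """Plain Richardson iteration for the scalar system a * x = rhs."""
    for _ in range(steps):
        x = x + omega * (rhs - a * x)
    return x

def push_gaussian(tracer, prior_mean, prior_var):
    """Mean and variance of coef * X0 + const when X0 ~ N(prior_mean, prior_var)."""
    return tracer.coef * prior_mean + tracer.const, tracer.coef ** 2 * prior_var

# Trace three iterations: the result encodes x3 = 0.125 * x0 + 1.75.
traced = richardson(AffineTracer(1.0, 0.0), a=2.0, rhs=4.0, omega=0.25, steps=3)
```

Here each step maps x to 0.5 * x + 1, so three steps give coefficient 0.125 and constant 1.75. The coefficient shrinks toward zero as the iteration converges, which contracts the pushed-forward variance accordingly.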

2605.10553 2026-05-12 stat.ME

Estimation of the Risk Measure under a Nuisance Autoregression

Jana Jurečková, Jan Picek

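AI summary This paper studies how to estimate the quantile function of the unobservable error term in the presence of a nuisance autoregression, as a measure of the risk of a loss or of related economic indicators. The authors propose an estimator based on R-estimators and autoregression quantiles that uses only the observable series. The method offers a way to assess risk accurately when the autoregression coefficients are unknown and is of both theoretical and practical interest.

Comments 11 pages, 1 figure, 4 tables

Details
English abstract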

The goal of an experiment is to evaluate the profit, loss, or the amount of a physical entity over a period. The measurements $X_t$ can be influenced by the values measured in the past; hence we describe the situation with an autoregression model, whose autoregression coefficients are generally unknown. The variable of interest is the error term $Z_t$ of the model, which is the increment of $X_t$ with respect to the past but is itself unobservable. The problem is to estimate various quantile functions of $Z$, as risk measures of the loss or the related economic indicators. We construct an estimate of the quantile functions of $Z$ when inference is possible only through the observations $X$. The proposed estimates are based on the R-estimators of autoregression coefficients, combined with the autoregression quantiles.
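
The setting can be made concrete with a simplified sketch. It deliberately swaps in ordinary least squares for the R-estimators and a plain empirical quantile of the residuals for the autoregression quantiles used in the paper. The AR(1) order, the Gaussian errors, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate_ar1(phi, n, rng):
    """Simulate X_t = phi * X_{t-1} + Z_t with standard normal errors Z_t."""
    x = [0.0]
    for _ in range(n):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def fit_ar1_least_squares(x):
    """Least-squares estimate of phi (a stand-in for the paper's R-estimator)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def empirical_quantile(sample, p):
    """Order-statistic quantile: smallest value with at least a fraction p at or below it."""
    s = sorted(sample)
    idx = min(len(s) - 1, max(0, math.ceil(p * len(s)) - 1))
    return s[idx]


rng = random.Random(0)
x = simulate_ar1(phi=0.5, n=5000, rng=rng)
phi_hat = fit_ar1_least_squares(x)

# Residuals approximate the unobservable errors Z_t using only the observed X_t.
residuals = [x[t] - phi_hat * x[t - 1] for t in range(1, len(x))]
q95 = empirical_quantile(residuals, 0.95)
```

With standard normal errors the 0.95 quantile of Z is about 1.645, so `q95` should land near that value. The rank-based estimators in the abstract are designed to be less sensitive to heavy-tailed errors than least squares.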

2605.10498 2026-05-12 cs.CV cs.AI stat.ML

Simultaneous Long-tailed Recognition and Multi-modal Fusion for Highly Imbalanced Multi-modal Data

Heegeon Yoon, Heeyoung Kim

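AI summary This study proposes a new framework that handles long-tailed recognition and multi-modal fusion simultaneously for highly imbalanced multi-modal data. The method introduces a multi-expert architecture, uses modality-specific networks to estimate the informativeness of each modality, and dynamically adjusts the fusion process with confidence-guided weights so that multi-source data are integrated more effectively. Experiments show that it outperforms existing methods on several benchmark and real-world datasets, demonstrating robustness and generalization in long-tailed classification.

Details
English abstract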

Long-tailed distributions in class-imbalanced data present a fundamental challenge for deep learning models, which tend to be biased toward majority classes. While recent methods for long-tailed recognition have mitigated this issue, they are largely restricted to single-modal inputs and cannot fully exploit complementary information from diverse data sources. In this work, we introduce a new framework for long-tailed recognition that explicitly handles multi-modal inputs. Our approach extends multi-expert architectures to the multi-modal setting by fusing heterogeneous data into a unified representation while leveraging modality-specific networks to estimate the informativeness of each modality. These confidence-guided weights dynamically modulate the fusion process, ensuring that more informative modalities contribute more strongly to the final decision. To further enhance performance, we design specialized training and test procedures that accommodate diverse modality combinations, including images and tabular data. Extensive experiments on benchmark and real-world datasets demonstrate that the proposed approach not only effectively integrates multi-modal information but also outperforms existing methods in handling long-tailed, class-imbalanced scenarios, highlighting its robustness and generalization capability.

2605.10495 2026-05-12 stat.ME econ.TH

Robust Bayes Acts under Prior Perturbations: Contamination, Stability, and Selection Paths

Christoph Jansen, Georg Schollmeyer

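AI summary This paper proposes a quantitative framework for assessing the robustness of Bayes-optimal decisions in finite decision problems under model uncertainty. It introduces two complementary stability notions, the robustness radius and the contamination need, to characterize when a Bayes-optimal act is preserved or changes under perturbations of the prior, and computes both efficiently using linear programming and bisection. Building on these stability measures, it proposes an adjusted criterion that combines robustness with selection costs, constructs a family of decision rules indexed by a regularization parameter, and analyzes how the optimal act changes along that parameter, revealing structural transitions between robustness-driven and cost-driven decisions. The framework is applied to a portfolio choice problem under uncertainty about the economic regime, with an empirical analysis of the robustness and contamination profiles of six investment strategies based on historical ETF returns.

Details
English abstract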

This paper develops a quantitative framework to assess the robustness of Bayes-optimal decisions in finite decision problems under model uncertainty. We introduce two complementary stability notions for acts: the robustness radius, measuring the largest perturbation of a reference prior under which an act remains Bayes-optimal, and the contamination need, quantifying the minimal perturbation required for an act to become Bayes-optimal under some nearby prior. Both concepts are characterized via linear programming formulations and computed efficiently using bisection methods exploiting monotonicity properties. Building on these stability measures, we propose a cost-adjusted stability criterion that integrates robustness considerations with act-specific selection costs, yielding a parametric family of decision rules indexed by a regularization parameter. We analyze how optimal act selection evolves along this parameter and derive selection paths that reveal structural transitions between stability-driven and cost-driven regimes. The framework is applied to a portfolio choice problem under uncertainty between different economic regimes. Concretely, using data on historical ETF returns, we compute robustness and contamination profiles for six portfolio strategies and analyze their behavior under heterogeneous belief specifications. The results illustrate that robustness-based selection refines classical expected utility by accounting for prior misspecification.
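
A bisection search for a robustness radius can be sketched as follows, under an assumption not taken from the abstract: the perturbation set is the epsilon-contamination class {(1 - eps) * prior + eps * q} with q ranging over all distributions on the states. For that class the worst case over q is attained at a point mass, so the inner linear program reduces to a minimum over states. The function names and the utility table are hypothetical.

```python
def stays_optimal(utils, prior, act, eps):
    """Check whether `act` is Bayes-optimal for every prior (1 - eps) * prior + eps * q."""
    ua = utils[act]
    for b, ub in enumerate(utils):
        if b == act:
            continue
        diff = [x - y for x, y in zip(ua, ub)]
        expected = sum(p * d for p, d in zip(prior, diff))
        worst = min(diff)  # the worst contaminating q is a point mass on one state
        if (1 - eps) * expected + eps * worst < -1e-12:
            return False
    return True


def robustness_radius(utils, prior, act, tol=1e-9):
    """Largest eps in [0, 1] for which `act` stays optimal, found by bisection."""
    if not stays_optimal(utils, prior, act, 0.0):
        return 0.0
    if stays_optimal(utils, prior, act, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stays_optimal(utils, prior, act, mid):
            lo = mid
        else:
            hi = mid
    return lo


# Two acts (rows) and two states (columns); the reference prior favors state 0.
utils = [[1.0, 0.0], [0.0, 1.0]]
prior = [0.8, 0.2]
radius_act0 = robustness_radius(utils, prior, act=0)
radius_act1 = robustness_radius(utils, prior, act=1)
```

Bisection is valid here because the optimality condition is monotone in eps: the expected utility difference is never smaller than the worst-case difference, so the left-hand side of the check can only decrease as eps grows. In this example act 0 stays optimal while 0.6 * (1 - eps) - eps >= 0, that is, up to eps = 0.375.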

2605.10491 2026-05-12 math.PR math.ST stat.TH

Zero-couplings of infinite measures with cyclically monotone support and multivariate regular variation

Alexandre Reber, Anne Sabourin, Johan Segers, Cees de Valk

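AI summary This paper studies zero-couplings with cyclically monotone support between infinite measures, with particular attention to exponent measures in multivariate regular variation. The authors introduce the notion of a zero-coupling and prove that, under certain conditions, there is a unique cyclically monotone zero-coupling between two infinite measures, generalizing the classical Brenier-McCann theorem. They also relate such couplings to gradients of closed convex functions and apply the results to regularly varying probability measures, linking their tail behavior to the zero-coupling between the corresponding exponent measures.

Comments This paper supersedes arXiv:1811.12061 "Tails of optimal transport plans for regularly varying probability measures" by Cees de Valk and Johan Segers

Details
English abstract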

We study cyclically monotone transport plans between measures in $\mathrm{M}_0(\mathbb{R}^d)$, the class of Borel measures on $\mathbb{R}^d \setminus \{0\}$ that are finite on sets bounded away from the origin but may have infinite total mass. We avoid moment assumptions and allow the transport cost to be infinite. This framework naturally arises for exponent measures in multivariate regular variation and includes other examples such as Lévy measures. We introduce the notion of a zero-coupling and establish existence of cyclically monotone zero-couplings for arbitrary pairs of measures in $\mathrm{M}_0(\mathbb{R}^d)$. Under a Hausdorff-dimension condition on the first measure and when at least one of the two measures has infinite mass, we prove uniqueness of the cyclically monotone zero-coupling, yielding an analogue of the Brenier--McCann theorem in this infinite-measure setting. We further derive a representation of such couplings through gradients of closed convex functions and identify conditions under which the zero-coupling is proper in the sense that the second measure is equal to the restriction to the punctured space of the push-forward of the first measure by a cyclically monotone transport map. Finally, we apply these results to regularly varying probability measures. We show that a cyclically monotone coupling between two such distributions admits a tail limit that coincides with the unique proper cyclically monotone zero-coupling between the corresponding exponent measures.

2605.10385 2026-05-12 stat.ML cs.LG

Regret Analysis of Guided Diffusion for Black-Box Optimization over Structured Inputs

Masaki Adachi, Anita Yang, Yakun Wang, Song Liu

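AI summary This paper studies the regret behavior of guided diffusion models in black-box optimization over structured inputs. Because existing analyses do not apply to modern diffusion-based optimization pipelines, it proposes a certificate-based framework for analyzing expected simple regret. The analysis centers on the notion of mass lift, the increase in probability mass on near-optimal designs relative to the pretrained generator, and shows that exponential convergence and polynomial acceleration can arise from the same mechanism. The study also provides practical diagnostics for estimating search exponents from finite candidate pools and a construction of a fully certified sampler.

Comments 48 pages, 12 figures

Details
English abstract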

Guided-diffusion black-box optimization (BO) has shown strong empirical performance on structured design problems such as molecules and crystals, but its regret behavior remains poorly understood. Existing BO regret analyses typically rely on maximum information gain, non-pretrained surrogate models, or exact acquisition maximization, assumptions that break down in modern diffusion-BO pipelines, where pretrained diffusion models serve as powerful priors over valid structures and acquisition maximization is replaced by approximate sampling over astronomically large discrete spaces. We develop a first certificate-based expected simple-regret framework for guided-diffusion BO that avoids maximum-information-gain bounds, RKHS assumptions, and exact acquisition maximization. The central quantity in our analysis is mass lift: the increase in probability mass assigned to near-optimal designs relative to the pretrained generator. This view explains how exponential-looking finite-budget convergence and polynomial acceleration can both arise from the same mechanism. We also give practical diagnostics for estimating search exponents from finite candidate pools and a proposal-corrected resampling construction that provides a fully certified sampler instance.

2605.10383 2026-05-12 stat.ML cs.LG

Multifidelity Gaussian process regression for solving nonlinear partial differential equations

Fatima-Zahrae El-Boukkouri, Josselin Garnier, Olivier Roustant

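AI summary This paper proposes a multifidelity Gaussian process regression method based on cokriging for solving nonlinear partial differential equations. Using multifidelity simulation data, the method first fits a differentiable non-stationary kernel and then uses the estimated hyperparameters to construct a high-fidelity kernel and mean function, so that the PDE can be solved within a Gaussian process framework. Experiments on Burgers' equation validate the method and demonstrate the strong performance of this physics-informed approach.

Comments 31 pages, 20 figures

Details
English abstract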

Solving nonlinear partial differential equations (PDEs) using kernel methods offers a compelling alternative to traditional numerical solvers. However, the performance of these methods strongly depends on the choice of kernel. In this work, as the available information is inherently multifidelity, we propose a kernel learning approach based on cokriging, leveraging empirical information from multifidelity simulations. In the first step, we fit a differentiable non-stationary kernel to an empirical kernel obtained from low-fidelity simulations. In the second step, we derive a high-fidelity kernel with estimated hyperparameters, and construct a corresponding high-fidelity mean using the multifidelity framework. These components can then be used within a Gaussian process framework for solving PDEs. Finally, we demonstrate the performance of the proposed physics-informed method on Burgers' equation.

2605.10378 2026-05-12 stat.ML astro-ph.CO astro-ph.GA hep-ex hep-ph

Uncertainty in Physics and AI: Taxonomy, Quantification, and Validation

Manuel Haußmann, Ramon Winterhalder, Maria Ubiali

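AI summary This paper discusses the importance of uncertainty quantification when machine learning is used in physics, proposes a unified taxonomy of uncertainty, and clarifies the meaning of predictive and inference uncertainties in frequentist and Bayesian frameworks. It presents principled validation tools such as coverage, calibration, bias tests, and proper scoring rules, illustrated with simple regression and classification examples, providing a basis for reliable uncertainty assessment in machine learning applications in physics.

Details
English abstract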

Reliable uncertainty quantification is essential for the use of machine learning in physics, where scientific discoveries depend on validated probabilistic statements. We provide a structured overview of uncertainty quantification in ML for physics, introducing a unified taxonomy of uncertainty and clarifying the interpretation of predictive and inference uncertainties across frequentist and Bayesian frameworks. We discuss principled validation tools, including coverage, calibration, bias tests, and proper scoring rules, and illustrate them with simple regression and classification examples.

2605.10330 2026-05-12 stat.ML cs.LG stat.ME

Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration

Btissame El Mahtout, Florian Ziel

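AI summary This paper proposes a new adaptive Mixture-of-Experts (MoE) framework for time series forecasting that strengthens expert specialization by incorporating expert-specific loss information directly into training. The method combines the base forecasting loss with expert-specific losses so that expert-level prediction errors shape training together with the global forecasting loss, and it adds a partial online learning strategy that updates the gating mechanism and expert parameters incrementally, markedly reducing computational cost. Experiments show that the method outperforms traditional statistical methods and advanced neural network models on several economic, tourism, and energy datasets, with higher forecasting accuracy and computational efficiency.

Details
English abstract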

We propose a novel adaptive Mixture-of-Experts (MoE) framework for time series forecasting that enhances expert specialization by incorporating expert-specific loss information directly into the training process. Notably, the overall objective comprises the base forecasting loss and expert-specific losses, allowing expert-level prediction errors to jointly shape training alongside the global forecasting loss. This framework is further combined with a partial online learning strategy, enabling incremental updates of both the gating mechanism and expert parameters. This approach significantly reduces computational cost by eliminating the need for repeated full model retraining. By integrating expert-level loss awareness with efficient online optimization, the proposed method achieves improved learning efficiency while maintaining strong predictive performance. Empirical results across economic, tourism, and energy datasets with varying frequencies demonstrate that the proposed approach generally outperforms both statistical methods and state-of-the-art neural network models, such as Transformers and WaveNet, in forecasting accuracy and computational efficiency. Furthermore, ablation studies confirm the effectiveness of the expert-specific loss integration strategy, highlighting its contribution to enhancing predictive performance.

2605.10291 2026-05-12 econ.GN cs.AI cs.ET q-fin.EC stat.AP

Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top

Hyunso Kim, Hyo Kang, Jaeyong Song

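AI summary Recent advances in generative AI are changing who participates in entrepreneurship, but not the distribution of high-quality entrepreneurial outcomes. Using data on more than 160,000 product launches on Product Hunt, the study finds that after the release of ChatGPT-3.5 the share of solo entrepreneurs entering rose markedly, especially in categories that previously favored team-based ventures. This growth, however, mainly reflects low-commitment, experimental activity, while high-quality outcomes remain dominated by teams, suggesting that generative AI lowers the barrier to solo entrepreneurship but that teams retain an advantage at the top.

Details
English abstract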

Recent advances in generative artificial intelligence (AI) are reshaping who enters entrepreneurship, but not who reaches the top of the quality distribution. Using data on over 160,000 product launches on Product Hunt, we find that entrepreneurial entry increased sharply following the public release of ChatGPT-3.5, driven disproportionately by solo entrepreneurs. This shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures. However, much of this growth reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes. Team-based ventures are increasingly dominant in the top tiers of platform rankings. These findings suggest that generative AI lowers barriers to solo entrepreneurship while reinforcing team-based advantages.

2605.10290 2026-05-12 stat.ML cs.LG math.ST stat.TH

Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

Lucas Morisset, Alain Durmus, Adrien Hardy

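AI summary This paper studies the regularization effect of data augmentation on supervised regression methods in the regime where the number of covariates is proportional to the number of samples. It gives a precise characterization of the test error, measured in mean squared error, that depends only on population statistics of the true data and on first- and second-order statistics of the augmentation scheme. The results apply to any network architecture in which only the last readout layer is trained and the rest is frozen or randomly initialized, and the tightness of the theory is verified for Gaussian data.

Details
English abstract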

This paper analyzes the regularization effect that data augmentation induces on supervised regression methods in the proportional regime, where the number of covariates grows proportionally to the number of samples. We provide a tight characterization of the test error, measured in mean squared error, solely in terms of the population quantities of the true data and the first- and second-order statistics of the augmentation scheme. Our results are valid under misspecified feature maps, and for any network architecture where only the last readout layer is trained and the rest of the network is either frozen or randomly initialized. We specialize our results to the case of Gaussian data, and show that our asymptotic characterization is tight in this setting.

2605.10285 2026-05-12 stat.ML cs.LG

Scalable Gaussian process inference via neural feature maps

Anthony Stephenson

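AI summary This paper proposes a theoretically grounded Gaussian process framework based on neural feature maps for constructing expressive kernels. By interpreting the learned feature map as an optimal low-rank approximation to a Gram matrix in an implied reproducing kernel Hilbert space, it establishes consistency of the Gaussian process posterior. It also analyzes the spectral properties of the induced kernels and introduces product feature-map kernels to mitigate oversmoothing, yielding fast, scalable, and accurate Gaussian process inference for regression and classification, with strong performance on several benchmark datasets.

Comments 27 pages

Details
English abstract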

We present a theoretically grounded Gaussian process framework that leverages neural feature maps to construct expressive kernels. We show that the learned feature map can be interpreted as an optimal low-rank approximation to a Gram matrix derived from an implied RKHS, from which we establish consistency of the GP posterior. We further analyse the spectral properties of the induced kernels and introduce product feature-map kernels to address oversmoothing. This simple yet powerful approach enables fast, scalable, and accurate exact GP inference with minimal upfront work. The flexibility of kernel design supports seamless application to both regression and classification tasks across diverse data modalities, including tabular inputs and structured domains such as images. On benchmark datasets, this approach surpasses pre-existing methods in terms of accuracy and training and prediction efficiency.

2605.10277 2026-05-12 cs.LG math.AP stat.ML

Generalization Error Bounds for Picard-Type Operator Learning in Nonlinear Parabolic PDEs

Koichi Taniguchi, Sho Sonoda

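AI summary This paper studies learning the solution operators of nonlinear parabolic partial differential equations (PDEs) based on Duhamel-Picard iteration. It proposes an abstract state-transition model framework and derives implementation-agnostic generalization error bounds that separate the implementation error from the estimation error. The key contribution is to show that increasing the Picard depth reduces the truncation error without unbounded growth of the entropy-based estimation error; the theory is applied to a Picard-type Fourier neural operator for nonlinear heat equations on the torus.

Comments 39 pages

Details
English abstract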

Operator learning for partial differential equations (PDEs) aims to learn solution operators on infinite-dimensional function spaces from finite-resolution data. In this setting, it is important for the learned model to be discretization-invariant, or resolution-robust, and to reflect PDE-specific structure. It is therefore natural to ask how such structure should be encoded in the model architecture, hypothesis class, or learning procedure. In this paper, we study operator learning for solution operators of nonlinear parabolic PDEs based on Duhamel--Picard iteration. We formulate Picard iteration as an abstract state-transition model and present a theoretical framework for Picard-type operator learning. We derive implementation-agnostic generalization error bounds that separate the implementation error from the estimation error associated with the abstract state-transition model induced by Picard iteration. A key consequence is that increasing the Picard depth reduces the Picard truncation error without causing an unbounded growth of the entropy-based estimation error. We also extend the analysis to long-time prediction by rolling out the same learned local model over successive time blocks. Finally, we illustrate the theory for nonlinear heat equations on the torus using a Picard-type Fourier neural operator as a concrete implementation.
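
The effect of Picard depth on truncation error can be seen in a scalar analogue. It is chosen purely for illustration and is not the paper's setting: the ODE u' = -u + u^2 with u(0) = 0.5, whose Duhamel form is u(t) = e^{-t} u0 + \int_0^t e^{-(t-s)} u(s)^2 ds and whose exact solution is u(t) = 1 / (1 + e^t). The function names, the grid size, and the trapezoid quadrature are assumptions.

```python
import math


def picard_duhamel(u0, t_end, n_steps, depth):
    """Apply `depth` Picard iterations of the Duhamel map on a uniform time grid."""
    h = t_end / n_steps
    ts = [i * h for i in range(n_steps + 1)]
    linear = [math.exp(-t) * u0 for t in ts]  # linear flow e^{-t} u0
    u = linear[:]  # iterate 0 is the linear solution
    for _ in range(depth):
        g = [v * v for v in u]  # nonlinearity N(u) = u^2
        new_u = []
        for i, t in enumerate(ts):
            # Trapezoid rule for the integral of exp(-(t - s)) * g(s) over [0, t].
            integral = 0.0
            for j in range(i):
                left = math.exp(-(t - ts[j])) * g[j]
                right = math.exp(-(t - ts[j + 1])) * g[j + 1]
                integral += 0.5 * h * (left + right)
            new_u.append(linear[i] + integral)
        u = new_u
    return ts, u


def max_error(depth):
    """Largest deviation from the exact solution 1 / (1 + e^t) on [0, 1]."""
    ts, u = picard_duhamel(u0=0.5, t_end=1.0, n_steps=100, depth=depth)
    return max(abs(ui - 1.0 / (1.0 + math.exp(t))) for t, ui in zip(ts, u))


err_shallow = max_error(1)
err_deep = max_error(6)
```

In this toy problem the truncation error shrinks roughly factorially with depth, so `err_deep` is far smaller than `err_shallow` until the quadrature error of the grid takes over. The paper's contribution concerns how the statistical estimation error behaves as depth grows, which this sketch does not model.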

2605.10249 2026-05-12 stat.ME

Diffeomorphic registration distances for Bayesian calibration of infinite-dimensional computer models

Paul Lartaud, Gwenaël Salin

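AI summary This paper studies Bayesian calibration of infinite-dimensional computer models using diffeomorphic registration distances. The authors propose using distances from the large deformation diffeomorphic registration framework (LDDMM) to handle infinite-dimensional model outputs such as scalar fields or function graphs. The approach defines the distance between shapes through the minimal-energy deformation, is readily interpretable, and is compatible with Bayesian inference, so that a predictive posterior distribution can be defined on the infinite-dimensional space, providing reliable uncertainty quantification for parameter calibration.

Details
English abstract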

The simulation of physical phenomena with computer models relies on the estimation of physical and/or numerical parameters calibrated to fit experimental data. The approximations within the computer model and the errors in the measurements lead to uncertainties in the calibrated parameters. Bayesian calibration offers a well-studied framework to provide reliable uncertainty quantification on the calibrated parameters. When dealing with complex computer codes whose outputs are infinite-dimensional, Bayesian calibration may be extended by providing a relevant distance in the output space. In this paper, Bayesian calibration is performed using distances from the large deformation diffeomorphic metric matching (LDDMM) framework. LDDMM distances can provide a suitable metric for infinite-dimensional shapes such as scalar fields (i.e. images) or function graphs. This metric can be interpreted as the minimal energy deformation required to transform one shape into another. As such, it provides a readily interpretable metric for Bayesian calibration. In addition, the representation of the diffeomorphism group as an exponential transformation of an RKHS is compatible with Bayesian inference and makes it possible to define a predictive posterior distribution on the infinite-dimensional shape space.

2605.10206 2026-05-12 math.ST cs.LG stat.ML stat.TH

Extended Wasserstein-GAN Approach to Causal Distribution Learning: Density-Free Estimation and Minimax Optimality

Shu Tamano, Masaaki Imaizumi

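AI summary This paper studies distributional causal inference in causal distribution learning, with the aim of estimating post-intervention outcome distributions, including quantiles and tail risks. To address the weak theoretical guarantees and instability of existing generative adversarial network (GAN) methods, the authors propose GANICE, which introduces an extended Wasserstein distance and a cellwise critic to estimate conditional interventional distributions accurately, and prove its minimax optimality using Besov space theory. Experiments show that GANICE outperforms existing methods on several tasks.

Details
English abstract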

Distributional causal inference requires estimating not only average treatment effects but also interventional outcome distributions, including quantiles, tail risks, and policy-dependent uncertainty. Generative adversarial network (GAN)-based counterfactual methods are flexible tools for this task. However, these methods have several limitations. First, the objectives of certain techniques do not coincide with the statistical risk of the identifiable causal target, and therefore provide limited theoretical guarantees regarding estimable counterfactual distributions or optimality. Second, they tend to rely on unstable density-based methods, such as density ratio estimation. In this paper, we propose GANICE (GAN for Interventional Conditional Estimation) with several advantages: it (i) clarifies the conditional interventional distribution for each treatment--covariate state as the causal estimation target; (ii) estimates the conditional distribution such that its averaged Wasserstein risk is minimized; (iii) establishes minimax optimality. GANICE achieves these advantages through the introduction of the extended Wasserstein distance, the incorporation of a cellwise critic in its dual, and an optimality proof based on Besov space theory. Our experiments demonstrate that GANICE consistently outperforms existing methods.

2605.10164 2026-05-12 cs.LG stat.ML

Hyperparameter Transfer for Dense Associative Memories

Roi Holtzman, Dmitry Krotov, Boris Hanin

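AI summary This paper studies how to apply hyperparameter transfer to Dense Associative Memory (DenseAM) models, in which a neural network performs temporal dynamics on an energy landscape and weights are shared both within and across layers. Because DenseAMs use rapidly peaking activation functions that are rare in conventional feed-forward networks, existing hyperparameter transfer methods are hard to apply directly. The paper develops hyperparameter transfer methods for DenseAMs, derives explicit rules for transferring hyperparameters from small models to large ones, and confirms experimentally that the theory agrees with empirical results.

Details
English abstract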

Dense Associative Memory (DenseAM) is a promising family of AI architectures that is represented by a neural network performing temporal dynamics on an energy landscape. While hyperparameter transfer methods are well-studied for feed-forward networks, these methods have not been developed for settings in which weights are shared across layers and within the layer, which is common in DenseAMs. Additionally, DenseAMs utilize rapidly peaking activation functions that are rarely used in feed-forward architectures. The confluence of these aspects makes DenseAM a challenging framework for using existing methods for hyperparameter transfer. Our work initiates the development of hyperparameter transfer methods for this class of models. We derive explicit prescriptions for how the hyperparameters tuned on small models can be transferred to models trained at scale. We demonstrate excellent agreement between these theoretical findings and empirical results.

2605.10163 2026-05-12 stat.ML cs.AI cs.LG

Coarsening Linear Non-Gaussian Causal Models with Cycles

Francisco Madaleno, Francisco C Pereira, Alex Markham

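AI summary This paper studies how to learn a low-dimensional causal structure from high-dimensional data in linear non-Gaussian causal models with cycles. The authors propose a method that recovers a low-dimensional directed acyclic graph (DAG) without assuming that the high-dimensional structure is acyclic, and connect it to existing identifiability results. The method has low time complexity and explicit sample complexity bounds, offering a more widely applicable approach to abstracting high-dimensional causal models.

Details
English abstract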

Recent work on causal abstraction, in particular graphical approaches focusing on causal structure between clusters of variables, aims to summarize a high-dimensional causal structure in terms of a low-dimensional one. Existing methods for learning such summaries from data assume that both the high- and low-dimensional structures are acyclic, which is helpful for causal effect identification and reasoning but excludes many high-dimensional models and thus limits applicability. We show that in the linear non-Gaussian (LiNG) setting, the high-dimensional acyclicity assumption can be relaxed while still allowing recovery of a low-dimensional causal directed acyclic graph (DAG). We further connect identifiability of this low-dimensional DAG to existing results: LiNG models with cycles are observationally identifiable only up to an equivalence class whose members differ by reversals of directed cycles; our low-dimensional DAG, which is invariant across all members of a given equivalence class, thus forms a natural representative of the class. While existing approaches for learning this observational equivalence class over high-dimensional variables have exponential time complexity, our low-dimensional summary is learned in worst-case cubic time and comes with explicit bounds on the sample complexity. We provide open source code and experiments on synthetic data to corroborate our theoretical results.

2605.10137 2026-05-12 stat.ML cs.LG

PFN-TS: Thompson Sampling for Contextual Bandits via Prior-Data Fitted Networks

Yan Shuo Tan, Kenyon Ng, Ruizhe Deng, Sumetha Loganathan, Qiong Zhang, Bibhas Chakraborty

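AI summary This paper proposes PFN-TS, a Thompson sampling algorithm for contextual bandits based on prior-data fitted networks (PFNs). Using a subsampled predictive central limit theorem, the method converts the PFN posterior predictive distribution into samples of the mean reward function, improving sampling efficiency while retaining uncertainty estimates. Compared with earlier approaches, PFN-TS estimates the posterior variance from dataset prefixes on a geometric grid, which reduces computational cost, and reuses TabICL's cached representations for efficiency. Experiments show that PFN-TS performs strongly on several benchmarks, with high policy value and competitive results.

Details
English abstract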

Thompson sampling is a widely used strategy for contextual bandits: at each round, it samples a reward function from a Bayesian posterior and acts greedily under that sample. Prior-data fitted networks (PFNs), such as TabPFN v2+ and TabICL v2, are attractive candidates for this purpose because they approximate Bayesian posterior predictive distributions in a single forward pass. However, PFNs predict noisy future rewards, while Thompson sampling requires uncertainty over the latent mean reward function. We propose PFN-TS, a Thompson sampling algorithm that converts PFN posterior predictives into mean-reward samples using a subsampled predictive central limit theorem. The method estimates posterior variance from a geometric grid of $O(\log n)$ dataset prefixes rather than the full $O(n)$ predictive sequence used in previous predictive-sequence approaches, and reuses TabICL's cached representations across rounds. We prove consistency of the subsampled variance estimator and give a Bayesian regret bound that decomposes PFN-TS regret into exact posterior-sampling regret under the PFN prior plus approximation terms. Empirically, PFN-TS achieves the best average rank across nonlinear synthetic and OpenML classification-to-bandit benchmarks, remains competitive on linear and BART-generated rewards, and attains the highest estimated policy value in an offline mobile-health evaluation. Code is available at https://anonymous.4open.science/r/PFN_TS-36ED/.
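
The sample-then-act-greedily loop described in the first sentence can be sketched for the simplest case. This is textbook Thompson sampling for a non-contextual Bernoulli bandit with Beta(1, 1) priors, not PFN-TS: there are no contexts and no PFN, and the posterior is exact. The function name and arm means are made up for the example.

```python
import random


def thompson_bernoulli(true_means, rounds, rng):
    """Thompson sampling with independent Beta(1, 1) priors on each arm's mean reward."""
    k = len(true_means)
    wins = [0] * k
    losses = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Sample one plausible mean reward per arm from its Beta posterior.
        samples = [rng.betavariate(1 + wins[a], 1 + losses[a]) for a in range(k)]
        # Act greedily with respect to the sampled means.
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        pulls[arm] += 1
        if reward:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.8], rounds=2000, rng=random.Random(0))
```

The posterior here is over the latent mean reward of each arm, which is the quantity Thompson sampling needs. The abstract's point is that a PFN instead returns a predictive distribution over noisy rewards, so PFN-TS adds a conversion step before the greedy action.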

2605.03573 2026-05-12 stat.ML cs.LG

Stochastic Schrödinger Diffusion Models for Pure-State Ensemble Generation

Jian Xu, Wei Chen, Shigui Li, Chao Li, Jingyuan Zheng, Delu Zeng, John Paisley, Qibin Zhao

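AI summary How to generate new quantum states from a pure-state ensemble is an important problem in quantum machine learning. This paper proposes Stochastic Schrödinger Diffusion Models (SSDMs), a generative model based on Riemannian geometry that generates quantum pure states directly on complex projective space. By introducing the Fubini-Study metric and combining a stochastic Schrödinger equation with the Riemannian score, the model addresses the difficulty of extending conventional diffusion models to non-Euclidean spaces, and a local Euclidean approximation enables training without explicit transition densities. Experiments show that SSDMs accurately capture the statistics of the target pure-state ensemble and improve generalization on quantum machine learning tasks.

Details
English abstract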

In quantum machine learning (QML), classical data are often encoded as quantum pure states and processed directly as quantum representations, motivating representation-level generative modeling that samples new quantum states from an underlying pure-state ensemble rather than re-preparing them from perturbed classical inputs. However, extending score-based diffusion models with well-defined reverse-time samplers to quantum pure-state ensembles remains challenging, due to the non-Euclidean geometry of the complex projective space $\mathbb{CP}^{d-1}$ and the intractability of transition densities. We propose Stochastic Schrödinger Diffusion Models (SSDMs), an intrinsic score-based generative framework on $\mathbb{CP}^{d-1}$ endowed with the Fubini--Study (FS) metric. SSDMs formulate a forward Riemannian diffusion with a stochastic Schrödinger equation (SSE) realization, and derive reverse-time dynamics driven by the Riemannian score $\nabla_{\mathrm{FS}} \log p_t$. To enable training without analytic transition densities, we introduce a local-time objective based on a local Euclidean Ornstein--Uhlenbeck approximation in FS normal coordinates, yielding an analytic teacher score mapped back to the manifold. Experiments show that SSDMs faithfully capture target pure-state ensemble statistics, including observable moments, overlap-kernel MMD, and entanglement measures, and that SSDM-generated quantum representations improve downstream QML generalization via representation-level data augmentation.

2604.26055 2026-05-12 stat.ME stat.AP

Extending Evidence Accumulation Models to Bounded Continuous Self-report Data

Yufei Wu, Tamás Szűcs, Agnes Moors, Francis Tuerlinckx

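AI summary This paper extends evidence accumulation models (EAMs) to bounded continuous self-report data, addressing the limitation that traditional EAMs apply only to binary choices. It proposes two models for bounded continuous data, the Half-Circular Diffusion Model (HCDM) and the Beta Drift Diffusion Model (BDDM), and fits and compares them using amortized Bayesian inference. Experiments show that both models capture the joint distribution of responses and reaction times, and empirical data confirm that their parameters are interpretable and reliably recovered, providing a practical tool for modeling the cognitive dynamics of continuous responses.

Details
English abstract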

Evidence accumulation models (EAMs) provide a powerful framework for inferring latent cognitive processes from choice and reaction time data. While EAMs are traditionally limited to binary choices, recent developments have extended them to rotationally symmetric continuous responses via the circular diffusion model (Smith, 2016) and the spatially continuous diffusion model (Ratcliff, 2018). Yet, such extensions are limited in scope, as many psychological constructs are measured on bounded non-rotational scales. In this paper, we bridge this gap by presenting and comparing two adaptations designed for bounded continuous data: the Half-Circular Diffusion Model (HCDM) and the Beta Drift Diffusion Model (BDDM). Because both models have intractable likelihoods, we fit them using Amortized Bayesian Inference (ABI) and compare them using Amortized Bayesian Model Comparison (ABMC). We demonstrate the complete workflow on an empirical affect dataset (N = 215), including parameter recovery, simulation-based calibration, posterior predictive checks, and model comparison. Both models accurately capture the joint distribution of responses and reaction times and yield interpretable parameters that can be reliably recovered. The model comparison further reveals a simple diagnostic for choosing between them: the dispersion of the rating distribution, with HCDM preferred for moderate spread and BDDM for highly concentrated or highly dispersed ratings. This work extends the EAM framework to a new application context, bounded continuous self-report data, and offers researchers a user-friendly toolkit for modeling the cognitive dynamics of continuous responses. We release fully documented Python code with both GPU and CPU implementations, along with example datasets.

2604.19530 2026-05-12 cs.LG cs.CE stat.ML

Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

Akash Yadav, Taiwo A. Adebiyi, Ruda Zhang

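AI summary This paper studies how to give scientific foundation models well-calibrated predictive uncertainty. It proposes Stochastic Attention, a lightweight inference-time modification that generates predictive ensembles by injecting randomness into the attention weights, with no retraining. The randomness parameter is tuned with a calibration objective, giving an efficient post-hoc calibration. Experiments show better calibration and narrower prediction intervals on weather forecasting, time series, and regression tasks, at a computational cost far below that of existing methods.

Details
English abstract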

Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight, sample-average inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, and produces predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on scientific foundation models for weather and time-series forecasting, as well as several regression tasks. Across benchmarks against uncertainty-aware baselines, we find that Sample Average Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable calibration, with adaptation costs nearly three orders of magnitude lower than the next-best baseline.
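
The replacement of softmax weights by normalized multinomial samples can be sketched for a single query with scalar values. This is a guess at the mechanism from the abstract alone: treating the concentration parameter as the number of multinomial draws (`n_draws`) is an assumption, and the function names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_attention_weights(scores, n_draws, rng):
    """Draw n_draws key indices from the softmax distribution and normalize the counts."""
    probs = softmax(scores)
    draws = rng.choices(range(len(probs)), weights=probs, k=n_draws)
    counts = [0] * len(probs)
    for i in draws:
        counts[i] += 1
    return [c / n_draws for c in counts]


def attend(scores, values, n_draws, rng):
    """One stochastic attention output: randomized weights applied to the values."""
    weights = stochastic_attention_weights(scores, n_draws, rng)
    return sum(w * v for w, v in zip(weights, values))


rng = random.Random(0)
scores = [2.0, 1.0, 0.1]
values = [1.0, 0.0, -1.0]

# Predictive ensemble from repeated stochastic forward passes, with no retraining.
ensemble = [attend(scores, values, n_draws=16, rng=rng) for _ in range(200)]
ensemble_mean = sum(ensemble) / len(ensemble)
deterministic = sum(p * v for p, v in zip(softmax(scores), values))
```

Under this reading, a small `n_draws` gives noisy weights and a wide ensemble, while a large `n_draws` recovers ordinary softmax attention. That makes the parameter a single knob for post-hoc calibration, consistent with the univariate tuning problem the abstract describes.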

2603.15917 2026-05-12 cs.CE stat.ML

Data-efficient Bayesian-guided design selection from large candidate sets: Application to hyperelastic stochastic metamaterials

Hooman Danesh, Henning Wessels

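AI summary This paper studies how to efficiently select, from a large set of candidate designs, a structure that achieves a target macroscopic stress response, particularly when the geometry cannot be parameterized and high-fidelity evaluation is expensive. It proposes a Bayesian-guided framework that reduces dimensionality through statistical feature engineering and uses a multi-output Gaussian process surrogate with active learning to shortlist candidates with a minimal number of high-fidelity evaluations. In numerical experiments with 50,000 candidate structures, the method reaches the prescribed error threshold with only a few evaluations.

Details
Abstract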

From a pool of admissible designs, we aim to identify a structure that achieves a target macroscopic stress response. For each candidate, the response is obtained from a high-fidelity oracle, such as expensive computational homogenization or experiments. We consider cases in which (i) the geometry cannot be conveniently parameterized, rendering gradient-based optimization inapplicable, and (ii) brute-force evaluation of all candidates is infeasible due to costly oracle queries. To tackle this challenge, we propose a Bayesian-guided design selection framework. The dimensionality of design variants is reduced through statistical feature engineering, and the resulting low-dimensional descriptors are mapped to effective hyperelastic constitutive parameters using a multi-output Gaussian process surrogate. The surrogate is trained using uncertainty-driven active learning with only a limited number of high-fidelity oracle evaluations. The surrogate shortlists promising candidates, and since its accuracy is inherently limited, the final selection of the optimal design is performed through high-fidelity oracle evaluations within the shortlist. In numerical test cases, we consider a design set of 50,000 candidate structures. Active learning requires labeling less than half a percent of the entire candidate set. Bayesian-guided design selection reaches a prescribed error threshold with only a handful of oracle evaluations in most cases.

2603.02563 2026-05-12 math.ST math.PR stat.TH

Graph Disjointness with Applications to Reversible Markov Chains

Yang Xiang, Kevin McGoff, Andrew B. Nobel

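AI summary This paper studies the structural discordance of weighted undirected graphs and reversible Markov chains, using graph joinings to examine strong and weak disjointness of graphs. It establishes close connections between graph joinings, disjointness, and graph factors, and characterizes weak disjointness through the spectral overlap of Markov transition matrices. It also proves that graphs without self-loops are strongly disjoint if and only if they are weakly disjoint and exactly one of them is a tree, and that strong and weak disjointness are essentially determined by the vertex and edge sets, regardless of edge weights. These results offer a new perspective on the coupling structure of reversible Markov chains.

Details
Abstract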

The correspondence between weighted undirected graphs and reversible Markov chains via vertex random walks is simple and well known. Leveraging this correspondence and ideas from the theory of dynamical systems, we study the structural discordance of graphs and Markov chains by means of graph joinings. Informally, a joining of graphs $G$ and $H$ is a graph on the product of their vertex sets giving rise to a coupling of their random walks. Graphs $G$ and $H$ are strongly disjoint if their only joining is the tensor product, and they are weakly disjoint if the degree function of every joining is equal to the degree function of the tensor product. We establish close connections between graph joinings, disjointness, and graph factors. Our first principal result characterizes weak disjointness of graphs in terms of the spectral overlap of their Markov transition matrices. The second establishes that two graphs without self loops are strongly disjoint if and only if they are weakly disjoint and exactly one of the graphs is a tree. The third shows that the strong or weak disjointness of graphs is essentially determined by their vertex and edge sets, without regard to edge weights. Translating these results into the language of Markov chains yields new insights into the rigidity and structure of reversible couplings of reversible Markov chains.
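
To make the tensor product concrete, the sketch below checks numerically that the random walk on the tensor product of two weighted graphs is the independent coupling of the two individual walks. It assumes graphs are given as symmetric weight matrices with positive degrees; the function names are hypothetical, and the paper's formal definition of a joining is more general than this single example.

```python
import numpy as np

def random_walk_matrix(weights):
    """Transition matrix of the random walk on a weighted undirected graph."""
    weights = np.asarray(weights, dtype=float)
    degree = weights.sum(axis=1)
    return weights / degree[:, None]

def tensor_product(weights_g, weights_h):
    """Weights of the tensor product graph on the product vertex set."""
    return np.kron(weights_g, weights_h)

def marginal_on_g(weights_joint, n_g, n_h):
    """Sum joint weights over the H-coordinates to get a graph on V(G)."""
    return weights_joint.reshape(n_g, n_h, n_g, n_h).sum(axis=(1, 3))
```

With `W_joint = tensor_product(W_g, W_h)`, the matrix `random_walk_matrix(W_joint)` equals the Kronecker product of the two transition matrices, and the walk induced by `marginal_on_g(W_joint, n_g, n_h)` coincides with the walk on G.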

2603.00541 2026-05-12 cs.LG stat.ML

Spectral Condition for $μ$P under Width-Depth Scaling

Chenyu Zheng, Rongzhen Wang, Xinyu Zhang, Chongxuan Li

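AI summary As generative foundation models scale in both width and depth, stable feature learning and reliable hyperparameter transfer become challenging. This paper proposes a unified spectral framework for maximal update parameterization ($μ$P) under joint width-depth scaling, specifying how the norms of weights and their per-step updates should scale with width and depth and revealing a transition from single-transformation ($k=1$) to multi-transformation ($k\geq 2$) residual blocks. The framework applies to a broad class of optimizers, and experiments on GPT-2 style language models show stable feature learning and robust hyperparameter transfer, outperforming standard parameterization and $μ$P in the $k=1$ case.

Comments 76 pages, 13 figures, 40 tables

Details
Abstract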

Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable hyperparameter (HP) transfer across model sizes. While maximal update parameterization ($μ$P) has provided a principled solution to both problems for width scaling, existing extensions to the joint width-depth scaling regime remain fragmented, architecture- and optimizer-specific, and often rely on technically involved theories. In this work, we develop a simple and unified spectral framework for $μ$P under joint width-depth scaling. For deep residual networks whose residual blocks contain $k$ transformations, the framework specifies how the norms of weights and their per-step updates should scale with width and depth. It reveals a fundamental transition from $k=1$ to $k\geq 2$, unifying previously disparate $μ$P formulations and identifying the $k\geq 2$ case as more appropriate for practical architectures with multi-transformation branches such as Transformers. Building on this framework, we derive a general recipe for implementing $μ$P across a broad class of optimizers by mapping spectral constraints to concrete HP parameterizations, recovering existing results and extending them to additional optimizers. Finally, experiments on GPT-2 style language models show that the $μ$P formulation derived from the $k\geq 2$ case achieves stable feature learning and robust HP transfer under width-depth scaling, whereas standard parameterization and $μ$P in the $k=1$ case often fail to do so. These results support the practical effectiveness of the proposed spectral framework.

2602.18866 2026-05-12 cs.LG stat.ML

$(α,β)$-Stability for Boosting Vector-Valued Prediction

Jian Qian, Shu Ge

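AI summary This paper studies boosting for vector-valued prediction and introduces $(α,β)$-stability by geometric median to analyze how aggregation amplifies weak predictors into strong ones. The authors characterize this stability property under several natural divergences and build on it to propose a generic boosting framework, \geomedboost, based on exponential reweighting and geometric-median aggregation. Under a weak learner condition, the framework guarantees exponential decay of the empirical divergence error, which in turn yields a bound on the population error.

Details
Abstract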

Despite the widespread use of boosting in structured prediction, a general theoretical understanding of aggregation beyond scalar prediction remains incomplete. We study vector-valued prediction under a target divergence and identify a geometric stability property under which aggregation amplifies weak guarantees into strong ones. We formalize this property as $(α,β)$-stability by geometric median and show how it supports a boosting framework based on exponential reweighting and geometric-median aggregation. For vector-valued prediction, we characterize this stability property under several natural divergences: $\ell_1$ and $\ell_2$ distances for unconstrained vector-valued prediction, and TV, Hellinger, and KL for density estimation over finite probability vectors. Building on these results, we propose a generic boosting framework \geomedboost. Under a weak learner condition and $(α,β)$-stability, we obtain exponential decay of the empirical divergence error, which then yields population guarantees through a generalization bound.
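
For readers unfamiliar with geometric-median aggregation, here is a minimal sketch using the classical Weiszfeld iteration under the $\ell_2$ distance. It is an illustration only: the function name, the optional `weights` argument, and the stopping rule are assumptions, and the aggregation step in the paper's framework may be defined differently for other divergences.

```python
import numpy as np

def geometric_median(points, weights=None, n_iter=200, tol=1e-9):
    """Weighted geometric median via Weiszfeld's fixed-point iteration."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float)
    # Start from the weighted mean of the predictions.
    estimate = np.average(points, axis=0, weights=weights)
    for _ in range(n_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        coef = weights / dist
        updated = (coef[:, None] * points).sum(axis=0) / coef.sum()
        if np.linalg.norm(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate
```

Unlike the mean, the result stays close to the bulk of the predictions when a minority of them are far off, which is the kind of robustness that makes median-style aggregation attractive for combining weak vector-valued predictors.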

2602.17274 2026-05-12 eess.IV stat.ML

Gaussian Surrogates for Poisson Imaging: Some Theoretical and Empirical Results

Alexandra Spitzer, Lorenzo Baldassari, Valentin Derbanot, Ivan Dokmanić

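AI summary In imaging inverse problems with Poisson-distributed measurements, objectives are usually built from the Poisson likelihood, yet performance is often evaluated by mean squared error (MSE). This paper compares the MSE of Poisson and Gaussian surrogate objectives under Poisson noise and finds that the unregularized Poisson maximum-likelihood estimator can incur large MSE at low dose, while Poisson MAP mitigates this through regularization. It studies two Gaussian surrogate objectives and shows that both can achieve MSE comparable to Poisson MAP in the low-dose regime; numerical experiments indicate that these conclusions extend beyond the theoretical setting.

Details
Abstract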

In imaging inverse problems with Poisson-distributed measurements, it is common to use objectives derived from the Poisson likelihood. But performance is often evaluated by mean squared error (MSE), which raises a practical question: how much does a Poisson objective matter for MSE, even at low dose? We analyze the MSE of Poisson and Gaussian surrogate reconstruction objectives under Poisson noise. In a stylized diagonal model, we show that the unregularized Poisson maximum-likelihood estimator can incur large MSE at low dose, while Poisson MAP mitigates this instability through regularization. We then study two Gaussian surrogate objectives: a heteroscedastic quadratic objective motivated by the normal approximation of Poisson data, and a homoscedastic quadratic objective that yields a simple linear estimator. We show that both surrogates can achieve MSE comparable to Poisson MAP in the low-dose regime, despite departing from the Poisson likelihood. Numerical computed tomography experiments indicate that these conclusions extend beyond the stylized setting of our theoretical analysis.
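
As a point of reference, the three data-fit terms contrasted in this abstract can be written in one common form. The notation is assumed here rather than taken from the paper: $y_i \sim \mathrm{Poisson}((Ax)_i)$ with forward operator $A$ and unknown image $x$. The paper's exact weighting in the heteroscedastic case may differ (for example, using $y_i$ in the denominator).

```latex
\begin{align*}
\mathcal{L}_{\mathrm{Poisson}}(x) &= \sum_i \Big[ (Ax)_i - y_i \log (Ax)_i \Big], \\
\mathcal{L}_{\mathrm{het}}(x) &= \sum_i \frac{\big(y_i - (Ax)_i\big)^2}{2\,(Ax)_i}, \\
\mathcal{L}_{\mathrm{hom}}(x) &= \frac{1}{2}\,\lVert y - Ax \rVert_2^2 .
\end{align*}
```

The heteroscedastic term follows from approximating a Poisson variable with mean $(Ax)_i$ by a Gaussian whose variance equals that mean. With a quadratic regularizer added, the minimizer of the homoscedastic term is linear in $y$, which is consistent with the simple linear estimator mentioned in the abstract.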

2602.04189 2026-05-12 cs.LG stat.CO

Beyond Accuracy: Evaluating Posterior Fidelity of Diffusion Inverse Solvers

Xiaoyu Qiu, Taewon Yang, Zhanhao Liu, Guanyang Wang, Liyue Shen

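AI summary This paper studies the posterior fidelity of Diffusion Inverse Solvers (DIS) for scientific and engineering inverse problems, noting that existing benchmarks focus on reconstruction accuracy and overlook uncertainty quantification. The authors propose score-based Kernel Stein Discrepancy (score-KSD), a metric that requires no ground-truth posterior, to assess how consistent the samples generated by a diffusion sampler are with the target posterior. Experiments show that the metric exposes gaps between reconstruction accuracy and posterior consistency, enabling a more complete evaluation.

Details
Abstract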

Uncertainty evaluation is critical in scientific and engineering inverse problems. However, existing benchmarks on Diffusion Inverse Solvers (DIS) primarily focus on reconstruction accuracy but overlook uncertainty and distributional behavior. Since stochastic inverse solvers represent uncertainty through diffusion-based posterior samples, evaluating how well their generated samples capture the target posterior distribution becomes an important aspect of uncertainty quantification. To address this limitation and better understand the distributional behavior of diffusion samplers, we conduct a systematic study to investigate the posterior fidelity of a broad range of existing DIS methods in controlled simulation settings with a known analytical true posterior. Furthermore, to enable posterior-aware evaluation on real-world inverse problems where ground-truth posterior is unavailable, we propose score-based Kernel Stein Discrepancy (score-KSD), a theoretically-grounded and ground-truth-free metric that measures the consistency of the distribution of generated samples from a DIS method with the target posterior score field, induced by the forward model and learned diffusion prior. Through both simulation experiments and real-world inverse problem solving, we validate the effectiveness of the proposed score-KSD and demonstrate that it provides meaningful posterior fidelity diagnostics beyond reconstruction accuracy, revealing that higher reconstruction accuracy does not necessarily imply better posterior consistency.

2601.21739 2026-05-12 cs.LG cs.AI stat.ML

Why Adam Works Better with $β_1 = β_2$: The Missing Gradient Scale Invariance Principle

Alberto Fernández-Hernández, Cristian Pérez-Corral, Jose I. Mestre, Manuel F. Dolz, Enrique S. Quintana-Ortí

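AI summary This paper studies the long-unexplained observation that Adam performs better when its momentum parameters satisfy $β_1 = β_2$. The authors introduce and formalize a structural property called gradient scale invariance and prove that Adam is gradient scale invariant of first order if and only if $β_1 = β_2$. The finding explains Adam's strong performance in the balanced regime and offers a principle for designing more robust optimizers.

Comments 23 pages, 8 figures. Preprint

Details
Abstract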

Adam has been at the core of large-scale training for almost a decade, yet a simple empirical fact remains unaccounted for: both validation scores and the qualitative behaviour of the training runs improve when the momentum parameters satisfy $β_{1}=β_{2}$. Some recent studies have reported this pattern, but there is still no explanation for why this choice helps. We show that this choice is closely tied to a structural property that we refer to as \textit{gradient scale invariance}. We formalize this notion and prove that Adam becomes gradient scale invariant of first order if and only if $β_{1}=β_{2}$. This perspective places the balanced regime of Adam in direct alignment with the design principles underlying several recent optimizers that explicitly enforce scale-robust updates. The theory is supported by experiments across vision and language tasks, and across different architectural families, in which rescaling the gradient has a markedly smoother effect on the update when $β_{1}=β_{2}$. Overall, our results offer a coherent explanation for an open question in the behavior of Adam and provide a simple principle that helps guide the design of future optimizers.

2601.20756 2026-05-12 cs.LG stat.ML

Supervised Guidance Training for Infinite-Dimensional Diffusion Models

Elizabeth L. Baker, Alexander Denker, Jes Frellsen

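AI summary This paper studies supervised guidance training for diffusion models in infinite-dimensional function spaces, aimed at Bayesian inverse problems arising from partial differential equations. The authors propose conditioning via an infinite-dimensional Doob $h$-transform, decompose the conditional score into an unconditional score and a guidance term, and design a simulation-free score matching objective (Supervised Guidance Training) for efficient and stable posterior sampling. The work offers the first function-space method for fine-tuning trained diffusion models to sample accurately from a posterior.

Details
Abstract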

Score-based diffusion models have recently been extended to infinite-dimensional function spaces, with uses such as inverse problems arising from partial differential equations. In the Bayesian formulation of inverse problems, the aim is to sample from a posterior distribution over functions obtained by conditioning a prior on noisy observations. While diffusion models provide expressive priors in function space, the theory of conditioning them to sample from the posterior remains open. We address this, assuming that either the prior lies in the Cameron-Martin space, or is absolutely continuous with respect to a Gaussian measure. We prove that the models can be conditioned using an infinite-dimensional extension of Doob's $h$-transform, and that the conditional score decomposes into an unconditional score and a guidance term. As the guidance term is intractable, we propose a simulation-free score matching objective (called Supervised Guidance Training) enabling efficient and stable posterior sampling. We illustrate the theory with numerical examples on Bayesian inverse problems in function spaces. In summary, our work offers the first function-space method for fine-tuning trained diffusion models to accurately sample from a posterior.

2601.14013 2026-05-12 math.ST stat.TH

Robustness for free: asymptotic size and power of max-tests in high dimensions

Anders Bredahl Kock, David Preinerstorfer

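AI summary This paper studies testing whether the mean of a high-dimensional random vector equals zero under adversarial contamination and heavy tails. Because standard max-tests are highly sensitive to outliers, the authors propose a max-test based on quantile winsorization that controls asymptotic size while the dimension grows exponentially with the sample size and requires only $m>2$ moments. Compared with the standard max-test, the robust test has identical asymptotic power under the conditions that make the standard test valid, and bootstrap critical values can further improve power in some cases.

Details
Abstract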

Allowing for adversarial contamination and heavy tails, we study testing whether the mean of a high-dimensional random vector equals zero. Because standard max-tests based on sample averages are highly non-robust, we propose a max-test based on quantile-winsorized observations. The test controls asymptotic size under adversarial contamination and only requires $m>2$ moments, while allowing dimension to grow exponentially with sample size. We fully characterize its asymptotic power function. Comparing with the standard max-test, for which we also derive a power characterization as a benchmark, we show that robustness is obtained for free: under the stronger conditions that make the standard max-test valid, our robust test has identical asymptotic power. We also study the role of bootstrap critical values, showing that their use never decreases power, can strictly improve asymptotic power in extremely correlated designs, but often has no first-order asymptotic effect.

2512.16875 2026-05-12 cs.DS cs.LG math.ST stat.ML stat.TH

Learning Confidence Ellipsoids and Applications to Robust Subspace Recovery

Chao Gao, Liren Shan, Vaidehi Srinivas, Aravindan Vijayaraghavan

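AI summary This paper studies finding confidence ellipsoids for arbitrary distributions in high dimensions: given a confidence parameter α, find the smallest-volume ellipsoid containing at least 1−α probability mass. Because non-trivial approximation is hard in high dimensions, the authors give a polynomial-time algorithm whose volume approximation factor depends on the condition number β of the ellipsoid while still covering sufficient probability mass, together with a complementary computational hardness result. The method builds on the primal-dual structure of the minimum volume enclosing ellipsoid and the geometric Brascamp-Lieb inequality, and yields the first polynomial-time algorithm with worst-case approximation guarantees for robust subspace recovery.

Details
Abstract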

We study the problem of finding confidence ellipsoids for an arbitrary distribution in high dimensions. Given samples from a distribution $D$ and a confidence parameter $α$, the goal is to find the smallest volume ellipsoid $E$ which has probability mass $\mathbb{P}_{D}[E] \ge 1-α$. Ellipsoids are a highly expressive class of confidence sets as they can capture correlations in the distribution, and can approximate any convex set. In statistics, this is the classic minimum volume estimator introduced by Rousseeuw as a robust non-parametric estimator of location and scatter. However, in high dimensions, it becomes NP-hard to obtain any non-trivial approximation factor in volume when the condition number $β$ of the ellipsoid (ratio of the largest to the smallest axis length) goes to $\infty$. This motivates the focus of our paper: can we efficiently find confidence ellipsoids with volume approximation guarantees when compared to ellipsoids of bounded condition number $β$? Our main result is a polynomial time algorithm that finds an ellipsoid $E$ whose volume is within a $O(β)^{γd}$ multiplicative factor of the volume of the best $β$-conditioned ellipsoid while covering at least $1-O(α/γ)$ probability mass for any $γ\in (0,1)$. In particular, setting $γ= o(1)$, this gives a $O(β)^{o(d)}$ volume approximation, with a multiplicative loss in miscoverage. We complement this with a computational hardness result that shows that such a dependence on $β$ seems necessary, even with some slack in coverage. The algorithm and analysis use the rich primal-dual structure of the minimum volume enclosing ellipsoid and the geometric Brascamp-Lieb inequality. As a consequence, we obtain the first polynomial time algorithm with approximation guarantees on worst-case instances of the robust subspace recovery problem.

2511.01292 2026-05-12 stat.ML cs.LG

Optimal Attention Temperature Improves the Robustness of In-Context Learning under Distribution Shift in High Dimensions

Samet Demir, Zafer Dogan

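AI summary This study examines how tuning the attention temperature can improve the robustness of in-context learning (ICL) in pretrained Transformers under distribution shift. In a high-dimensional linear-regression framework, the authors analyze a Transformer with approximate softmax attention, derive a closed-form expression for the ICL generalization error under distribution shift, and show that an optimal attention temperature minimizes it. Experiments show that temperature adjustment not only matches the theory in simulation but also improves robustness to noisy in-context demonstrations in pretrained large language models.

Comments ICML 2026, 24 pages, 7 figures

Details
Abstract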

Pretrained Transformers can perform in-context learning (ICL) from a few demonstrations, but this ability can fail sharply when the test distribution differs from pretraining, a common deployment setting. We study attention temperature as a simple inference-time control for improving ICL robustness under such shifts. In a high-dimensional linear-regression framework, we analyze a Transformer with "approximate softmax" attention, which preserves softmax's normalization and temperature-dependent selectivity while remaining tractable. We derive a closed-form expression for the ICL generalization error under distribution shift, and show that it is minimized by an explicit optimal attention temperature. This characterization yields interpretable guidance by linking the best temperature to moments of the pre-softmax attention scores, and predicts when temperature adjustment can recover near Bayes-optimal performance. We validate the theory with extensive simulations, and further demonstrate gains on pretrained LLMs (GPT-2 and Llama2-7B) on question-answering benchmarks under distribution shift induced by noisy in-context demonstrations. Overall, attention temperature emerges as a principled, lightweight knob for improving the robustness of ICL in pretrained Transformers.
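
The temperature knob itself is easy to illustrate. The sketch below uses the standard softmax rather than the paper's "approximate softmax", and it assumes the temperature divides the pre-softmax scores; the function name `attention_weights` is hypothetical.

```python
import numpy as np

def attention_weights(scores, temperature=1.0):
    """Softmax attention weights with pre-softmax scores scaled by 1 / temperature."""
    scaled = np.asarray(scores, dtype=float) / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)
```

Lower temperatures concentrate the weights on the highest-scoring demonstrations, while higher temperatures flatten them toward a uniform average. The abstract's result concerns which point on this spectrum minimizes the generalization error under a given shift.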

2510.22202 2026-05-12 stat.ME stat.ML

Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity

Christoph Wiederkehr, Christian Heumann, Michael Schomaker

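AI summary This paper evaluates targeted maximum likelihood estimation (TMLE) for the average treatment effect in the presence of missing data and near-violations of the positivity assumption. Using model- and design-based simulations, it compares eight missing data methods combined with TMLE and finds that non-multiple-imputation methods, particularly complete cases with an outcome-missingness model, give lower bias and greater robustness to positivity violations, whereas multiple imputation with classification and regression trees (CART) performs better in root mean squared error and confidence interval coverage. The study highlights the trade-off between bias and coverage and recommends methods according to which goal takes priority.

Comments 35 Pages, 7 Figures

Details
Journal ref
Biometrical Journal, 68(3): e70134, 2026
Abstract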

We evaluate the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect in missing data scenarios under varying levels of positivity violations. We employ model- and design-based simulations, with the latter using undersmoothed highly adaptive lasso on the 'WASH Benefits Bangladesh' dataset to mimic real-world complexities. Five missingness-directed acyclic graphs are considered, capturing common missing data mechanisms in epidemiological research, particularly in one-point exposure studies. These mechanisms also include not-at-random missingness in the exposure, outcome, and confounders. We compare eight missing data methods in conjunction with TMLE as the analysis method, distinguishing between non-multiple imputation (non-MI) and multiple imputation (MI) approaches. The MI approaches use both parametric and machine-learning models. Results show that non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias than all other evaluated missing data methods and greater robustness against positivity violations. In comparison, MI with classification and regression trees (CART) achieves lower root mean squared error, while often maintaining nominal coverage rates. Our findings highlight the trade-offs between bias and coverage, and we recommend using complete cases with TMLE incorporating an outcome-missingness model for bias reduction and MI CART when accurate confidence intervals are the priority.

2510.13397 2026-05-12 cs.LG stat.ML

Assessing the robustness of heterogeneous treatment effects in survival analysis under informative censoring

Yuxin Wang, Dennis Frauen, Jonas Schweisthal, Maresa Schröder, Stefan Feuerriegel

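AI summary Dropout is common in clinical studies, and when it depends on survival time (informative censoring), treatment effect estimates are biased. This paper proposes an assumption-lean framework for assessing the robustness of conditional average treatment effect (CATE) estimates under informative censoring, using partial identification to derive bounds on the CATE and thereby identify patient subgroups where treatment remains effective despite informative censoring. The authors also propose a model-agnostic meta-learner, SurvB-learner, which can be combined with arbitrary machine-learning models, has favorable theoretical properties such as double robustness and quasi-oracle efficiency, and is validated on simulated and real-world data.

Details
Abstract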

Dropout is common in clinical studies, with up to half of patients leaving early due to side effects or other reasons. When dropout is informative (i.e., dependent on survival time), it introduces censoring bias, because of which treatment effect estimates are also biased. In this paper, we propose an assumption-lean framework to assess the robustness of conditional average treatment effect (CATE) estimates in survival analysis when facing censoring bias. Unlike existing works that rely on strong assumptions, such as non-informative censoring, to obtain point estimation, we use partial identification to derive informative bounds on the CATE. Thereby, our framework helps to identify patient subgroups where treatment is effective despite informative censoring. We further propose a novel model-agnostic meta-learner, called SurvB-learner, to estimate the bounds that can be used in combination with arbitrary machine-learning models, and that has favorable theoretical properties such as double-robustness and quasi-oracle efficiency. We finally demonstrate the effectiveness of our meta-learner across various experiments using both simulated and real-world data.

2510.10730 2026-05-12 cs.LG cs.AI stat.ML

Provable Anytime Ensemble Sampling Algorithms in Nonlinear Contextual Bandits

Jiazheng Sun, Weixin Wang, Pan Xu

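AI summary This paper presents a unified algorithmic framework for ensemble sampling in nonlinear contextual bandits and instantiates it for two common settings: Generalized Linear Ensemble Sampling (GLM-ES) for generalized linear bandits and Neural Ensemble Sampling (Neural-ES) for neural contextual bandits, each with a high-probability frequentist regret bound. Both methods maintain multiple estimators of the reward model parameters via maximum likelihood estimation on randomly perturbed data, and the analysis addresses challenges specific to nonlinear models. Anytime versions of the algorithms remove the fixed-horizon assumption, and experiments show strong empirical performance.

Comments 58 pages, 5 figures, 1 table

Details
Abstract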

We provide a unified algorithmic framework for ensemble sampling in nonlinear contextual bandits and develop corresponding regret bounds for the two most common nonlinear contextual bandit settings: Generalized Linear Ensemble Sampling (GLM-ES) for generalized linear bandits and Neural Ensemble Sampling (Neural-ES) for neural contextual bandits. Both methods maintain multiple estimators for the reward model parameters via maximum likelihood estimation on randomly perturbed data. We prove high-probability frequentist regret bounds of $\widetilde{O}(d^{3/2} \sqrt{T} + d^{4})$ for GLM-ES and $\widetilde{O}(\widetilde{d}^{3/2} \sqrt{T})$ for Neural-ES, where $d$ is the dimension of feature vectors, $\widetilde{d}$ is the effective dimension of a neural tangent kernel (NTK) matrix, and $T$ is the number of rounds. The regret bound of GLM-ES matches the state-of-the-art result for randomized exploration algorithms in the generalized linear bandit setting. In the theoretical analysis, we introduce techniques that address challenges specific to nonlinear models. Practically, we remove the fixed-time-horizon assumption by developing anytime versions of our algorithms, suitable when $T$ is unknown. Finally, we empirically evaluate GLM-ES, Neural-ES, and their anytime variants, demonstrating strong performance. Overall, our results establish ensemble sampling as a provable and practical randomized exploration approach for nonlinear contextual bandits.

2510.08117 2026-05-12 cs.IT math.IT stat.ML

Near-optimal Rank Adaptive Inference of High Dimensional Matrices

Frédéric Zheng, Yassir Jedra, Alexandre Proutiere

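AI summary This paper studies estimating a high-dimensional matrix from linear measurements, focusing on optimal algorithms that adapt the effective rank. Such algorithms estimate the singular values and corresponding singular vectors up to an effective rank determined adaptively from the data, and the paper analyzes the fundamental trade-offs in choosing that rank. The authors propose an algorithm combining a least-squares estimator with universal singular value thresholding, provide finite-sample error bounds, and show that its performance nearly matches the derived lower bounds; the findings are validated on multivariate regression and linear dynamical system identification.

Comments AISTATS 2026

Details
Abstract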

We address the problem of estimating a high-dimensional matrix from linear measurements, with a focus on designing optimal rank-adaptive algorithms. These algorithms infer the matrix by estimating its singular values and the corresponding singular vectors up to an effective rank, adaptively determined based on the data. We establish instance-specific lower bounds for the sample complexity of such algorithms, uncovering fundamental trade-offs in selecting the effective rank: balancing the precision of estimating a subset of singular values against the approximation cost incurred for the remaining ones. Our analysis identifies how the optimal effective rank depends on the matrix being estimated, the sample size, and the noise level. We propose an algorithm that combines a Least-Squares estimator with a universal singular value thresholding procedure. We provide finite-sample error bounds for this algorithm and demonstrate that its performance nearly matches the derived fundamental limits. Our results rely on an enhanced analysis of matrix denoising methods based on singular value thresholding. We validate our findings with applications to multivariate regression and linear dynamical system identification.
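
The two-stage idea (least squares, then singular value thresholding) can be sketched as follows. This is a minimal illustration under assumptions not stated in the abstract: a multivariate regression model `Y ≈ X @ M` stands in for the linear measurements, the function names are hypothetical, and `threshold` is a user-supplied value rather than the paper's universal threshold, which would depend on the noise level and sample size.

```python
import numpy as np

def least_squares_matrix(X, Y):
    """Least-squares estimate of M in the assumed model Y ≈ X @ M."""
    M_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M_hat

def singular_value_threshold(M_hat, threshold):
    """Keep only singular components above the threshold.

    Returns the truncated estimate and the resulting effective rank.
    """
    U, s, Vt = np.linalg.svd(M_hat, full_matrices=False)
    keep = s > threshold
    rank = int(keep.sum())
    return (U[:, keep] * s[keep]) @ Vt[keep], rank
```

The effective rank is whatever number of singular values survives the threshold, so it adapts to the data: components too weak to estimate reliably are dropped at the price of an approximation error, which is the trade-off the abstract describes.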

2509.20294 2026-05-12 cs.LG math.ST stat.TH

Alignment-Sensitive Minimax Rates for Spectral Algorithms with Learned Kernels

Dongming Huang, Zhifan Li, Yicheng Li, Qian Lin

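AI summary This paper studies the generalization of spectral algorithms when the kernel is learned from data. It introduces the effective span dimension (ESD), a complexity measure that depends jointly on the signal, spectrum, and noise level and is well-defined for arbitrary kernels and signals without eigen-decay conditions. The authors prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $σ^2 K$, and show that over-parameterized gradient flow can reduce the ESD and thus improve generalization. The framework extends to linear models and RKHS regression and is supported by numerical experiments, offering a new perspective on the link between adaptive feature learning and generalization.

Details
Abstract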

We study spectral algorithms in the setting where kernels are learned from data. We introduce the effective span dimension (ESD), an alignment-sensitive complexity measure that depends jointly on the signal, spectrum, and noise level $σ^2$. The ESD is well-defined for arbitrary kernels and signals without requiring eigen-decay conditions or source conditions. We prove that for sequence models whose ESD is at most $K$, the minimax excess risk scales as $σ^2 K$. Furthermore, we analyze over-parameterized gradient flow and prove that it can reduce the ESD. This finding establishes a connection between adaptive feature learning and provable improvements in generalization of spectral algorithms. We demonstrate the generality of the ESD framework by extending it to linear models and RKHS regression, and we support the theory with numerical experiments. This framework provides a novel perspective on generalization beyond traditional fixed-kernel theories.

2509.06172 2026-05-12 stat.AP cs.LG

Robust Analysis for Resilient AI System

Yu Wang, Ran Jin, Lulu Kang

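AI summary Operational hazards in Manufacturing Industrial Internet (MII) systems produce data outliers. This paper proposes DPD-Lasso, a robust regression method that combines Density Power Divergence with Lasso regularization to handle contaminated data from AI resilience experiments. An efficient iterative algorithm overcomes earlier computational bottlenecks, and an MII testbed for Aerosol Jet Printing confirms reliable, stable performance on both clean and outlier-contaminated data, making the method a useful tool for building and validating resilient industrial AI systems.

Comments 10 pages, 3 figures

Details
Journal ref
2025 IEEE International Conference on Data Mining Workshops (ICDMW), Washington, DC, USA, 2025, pp. 1631-1641
Abstract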

Operational hazards in Manufacturing Industrial Internet (MII) systems generate severe data outliers that cripple traditional statistical analysis. This paper proposes a novel robust regression method, DPD-Lasso, which integrates Density Power Divergence with Lasso regularization to analyze contaminated data from AI resilience experiments. We develop an efficient iterative algorithm to overcome previous computational bottlenecks. Applied to an MII testbed for Aerosol Jet Printing, DPD-Lasso provides reliable, stable performance on both clean and outlier-contaminated data, accurately quantifying hazard impacts. This work establishes robust regression as an essential tool for developing and validating resilient industrial AI systems.

2507.22867 2026-05-12 stat.ME

Hawkes Processes with Variable Length Memory: Existence, Inference and Application to Neuronal Activity

Sacha Quayle, Anna Bonnet, Maxime Sangnier

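AI summary This paper introduces a class of nonlinear Hawkes processes with variable length memory for modeling excitation and inhibition in neuronal activity. The model extends classical Hawkes processes by letting the probability of an event depend differently on the history before and after the last event of a subprocess, which allows a more flexible description of neurons whose memory resets upon firing. The authors prove existence of such processes and derive a workable likelihood maximization method that identifies both classical and variable-memory dynamics on synthetic data and on real neuronal activity data.

Details
Abstract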

Multivariate Hawkes processes are past-dependent point processes originally introduced to model excitation effects, later extended to a nonlinear framework to account for the opposite effect, known as inhibition. Motivated by applications in neuroscience, where the memory of a neuron may reset upon firing, we introduce a new class of nonlinear Hawkes processes with variable length memory. Our model generalises classical Hawkes processes, with or without inhibition, describing the situation where the probability of an event occurring within a given subprocess may depend differently on the history before and after its last event. In particular, if the subprocess does not depend on the history before its last event, it is said to have a variable length memory. Our main contributions are to prove existence of such processes, and to derive a workable likelihood maximisation method, capable of identifying both classical and variable memory dynamics. We demonstrate the effectiveness of our approach both on synthetic data, and on a neuronal activity dataset.

2507.15437 2026-05-12 stat.ME q-fin.ST stat.AP

Prediction of linear fractional stable motions using codifference, with application to non-Gaussian rough volatility

Matthieu Garcin, Karl Sawaya, Thomas Valade

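AI summary This paper studies how to forecast future increments of the linear fractional stable motion (LFSM) using the codifference, with an application to non-Gaussian rough volatility. Unlike covariance-based methods, the approach suits processes with α-stable increments, whose covariance is infinite, and forecasts through the conditional expectation or a semimetric projection. The method performs well on simulated and real volatility data and suggests that fractional processes may have a fourth regime of serial dependence.

Details
Abstract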

The linear fractional stable motion (LFSM) extends the fractional Brownian motion (fBm) by considering $α$-stable increments. We propose a method to forecast future increments of the LFSM from past discrete-time observations, using the conditional expectation when $α>1$ or a semimetric projection otherwise. It relies on the codifference, which describes the serial dependence of the process, instead of the covariance. Indeed, covariance is commonly used for predicting an fBm, but it is infinite when $α<2$. Some theoretical properties of the method and of its accuracy are studied, and both a simulation study and an application to real volatility data, with a comparison to the fBm and to the heterogeneous auto-regressive model, confirm the relevance of the approach. The LFSM-based method shows promising performance in forecasting time series of volatilities, properly decomposing, in the fractal dynamic of rough volatilities, the contribution of the kurtosis of the increments and the contribution of their serial dependence. Moreover, the analysis of hit ratios suggests that, besides independence, persistence, and antipersistence, a fourth regime of serial dependence exists for fractional processes, characterized by a selective memory controlled by a few large increments.
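
For context, the codifference is usually defined through characteristic functions, which exist even when second moments do not. The form below is the standard textbook definition for two random variables $X$ and $Y$; the paper may use a different normalization or sign convention.

```latex
\begin{align*}
\tau(X, Y) &= \log \mathbb{E}\, e^{i(X - Y)} - \log \mathbb{E}\, e^{iX} - \log \mathbb{E}\, e^{-iY}, \\
\tau(X, Y) &= \operatorname{Cov}(X, Y) \quad \text{when } (X, Y) \text{ is jointly Gaussian.}
\end{align*}
```

The second line follows from $\log \mathbb{E}\, e^{iZ} = i\,\mathbb{E}Z - \tfrac{1}{2}\operatorname{Var}(Z)$ for Gaussian $Z$: the mean terms cancel and the variances combine into $\tfrac{1}{2}\big(\operatorname{Var}X + \operatorname{Var}Y - \operatorname{Var}(X - Y)\big)$. In that sense the codifference extends the covariance to the stable case $α<2$.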

2507.07969 2026-05-12 cs.LG cs.AI cs.RO stat.ML

Reinforcement Learning with Action Chunking

Qiyang Li, Zhiyuan Zhou, Sergey Levine

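AI summary This paper proposes Q-chunking, a method for improving reinforcement learning on long-horizon, sparse-reward tasks. By introducing action chunking, it lets the agent explore online more effectively under the guidance of offline data, and it uses unbiased n-step backups to make temporal-difference learning more stable and efficient. Experiments show that Q-chunking achieves strong offline performance and online sample efficiency on several long-horizon, sparse-reward manipulation tasks.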

Comments The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025); 29 pages, 17 figures

Details
English abstract

We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
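The $n$-step backup mentioned in the abstract can be illustrated with a generic textbook $n$-step TD target. The sketch below is not the authors' implementation: the function name `n_step_td_target` and its arguments are hypothetical, and it assumes the rewards collected while executing one action chunk are already available together with a bootstrap value estimate for the state reached after the chunk.

```python
def n_step_td_target(rewards, bootstrap_value, gamma):
    """Generic n-step TD target: discounted chunk rewards plus a discounted bootstrap.

    rewards         -- rewards r_t, ..., r_{t+n-1} observed while executing one chunk
    bootstrap_value -- value estimate for the state reached after the chunk (assumed given)
    gamma           -- discount factor in [0, 1]
    """
    target = 0.0
    for i, r in enumerate(rewards):
        target += (gamma ** i) * r
    # Bootstrap from the state n steps ahead, discounted by gamma^n.
    return target + (gamma ** len(rewards)) * bootstrap_value


# Example: a chunk of length 3 with unit rewards.
print(n_step_td_target([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5))  # 2.75
```

One reading of why a chunked action space helps: the $n$ rewards in the target are produced by the very chunk whose value is being updated, so no off-policy correction is needed within the chunk.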

2506.20928 2026-05-12 stat.ML cs.LG

Active Learning for Manifold Gaussian Process Regression

Yuanxing Cheng, Lulu Kang, Yiwei Wang, Chun Liu

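AI summary This paper proposes an active learning framework for manifold Gaussian process regression that combines manifold learning with strategic data selection to improve prediction accuracy in high-dimensional spaces. The method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, guided by an active learning criterion that minimizes global prediction error. Experiments on synthetic data show that the framework outperforms random sequential learning and handles complex, discontinuous functions efficiently while remaining computationally tractable, offering practical value for scientific and engineering applications.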

Comments 13 pages, 6 figures

Details
Journal ref
2025 Winter Simulation Conference (WSC), Seattle, WA, USA, 2025, pp. 1-12
English abstract

This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, supervised by an active learning criterion that minimizes global prediction error. Experiments on synthetic data demonstrate superior performance over randomly sequential learning. The framework efficiently handles complex, discontinuous functions while preserving computational tractability, offering practical value for scientific and engineering applications. Future work will focus on scalability and uncertainty-aware manifold learning.

2505.06452 2026-05-12 math.ST stat.TH

Semiparametric semi-supervised learning for general targets under distribution shift and decaying overlap

Lorenzo Testa, Qi Xu, Jing Lei, Kathryn Roeder

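AI summary In modern scientific applications, covariate data are abundant while outcome labels are often scarce and may be subject to distribution shift. This paper proposes D2S3, a semiparametric semi-supervised learning framework for settings where labels are missing at random (MAR) and the overlap may vanish as the sample size increases. The framework supports estimation and inference for a range of smooth statistical targets, such as means, linear regression coefficients, quantiles, and causal effects, and remains valid under high-dimensional nuisance estimation and distribution shift. The theory preserves double robustness, asymptotic normality, and semiparametric efficiency, and provides corrected rates for the case where classical root-n convergence fails under vanishing overlap.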

Comments 21 pages, 4 figures

Details
English abstract

In modern scientific applications, large volumes of covariate data are readily available, while outcome labels are costly, sparse, and often subject to distribution shift. This asymmetry has spurred interest in semi-supervised (SS) learning, but most existing approaches rely on strong assumptions -- such as missing completely at random (MCAR) labeling or strict positivity -- that put substantial limitations on their practical usefulness. In this work, we introduce a general semiparametric framework for estimation, inference, and efficiency benchmarking in SS settings where labels are missing at random (MAR) and the overlap may vanish as sample size increases. Our framework, which we label D2S3, accommodates a wide range of smooth statistical targets -- including means, linear regression coefficients, quantiles, and causal effects -- and remains valid under high-dimensional nuisance estimation and distributional shift between labeled and unlabeled samples. We extend the theoretical guarantees of augmented inverse probability weighting estimators to preserve double robustness, asymptotic normality, and semiparametric efficiency under this challenging D2S3 regime. A key insight is that classical root-n convergence fails under vanishing overlap; we instead provide corrected asymptotic rates that capture the impact of the decay in overlap. We validate our theory through simulations and demonstrate practical utility in real-world applications on the internet of things and public health where labeled data are scarce.

2505.02562 2026-05-12 math.OC math.ST stat.TH

Marginal minimization and sup-norm expansions in perturbed optimization

Vladimir Spokoiny

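AI summary This paper studies how to solve the marginal minimization of an objective function in the presence of a nuisance variable, examines conditions for convergence of the plug-in and alternating optimization approaches, and analyses the connection between marginal optimization and sup-norm estimation. Under realistic assumptions, it gives accurate closed-form results and illustrates them with a numerical example for the BTL model.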

Comments arXiv admin note: substantial text overlap with arXiv:2503.15045

Details
English abstract

Let the objective function \( f \) depend on the target variable \( x \) along with a nuisance variable \( s \): \( f(v) = f(x,s) \). The goal is to identify the marginal solution \( x^{*} = \arg\min_{x} \min_{s} f(x,s) \). This paper discusses three related problems. The plugin approach widely used e.g. in inverse problems suggests to use a preliminary guess (pilot) \( \hat{s} \) and apply the solution of the partial optimization \( \hat{x} = \arg\min_{x} f(x,\hat{s}) \). The main question to address within this approach is the required quality of the pilot ensuring the prescribed accuracy of \( \hat{x} \). The popular \emph{alternating optimization} approach suggests the following procedure: given a starting guess \( x_{0} \), for \( t \geq 1 \), define \( s_{t} = \arg\min_{s} f(x_{t-1},s) \), and then \( x_{t} = \arg\min_{x} f(x,s_{t}) \). The main question here is the set of conditions ensuring a convergence of \( x_{t} \) to \( x^{*} \). Finally, the paper discusses an interesting connection between marginal optimization and sup-norm estimation. The basic idea is to consider one component of the variable \( v \) as a target and the rest as nuisance. In all cases, we provide accurate closed form results under realistic assumptions. The results are illustrated by one numerical example for the BTL model.
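The alternating optimization procedure described in the abstract can be tried on a toy problem. The sketch below assumes a hypothetical strictly convex quadratic, \( f(x,s) = (x-1)^2 + (s-2)^2 + xs/2 \), chosen only because both partial minimizers have closed forms; it is not taken from the paper.

```python
def alternating_minimization(x0, n_iter=20):
    """Alternate exact partial minimizations of the toy objective
    f(x, s) = (x - 1)**2 + (s - 2)**2 + 0.5 * x * s  (an illustrative assumption).
    """
    x = x0
    s = None
    for _ in range(n_iter):
        # s_t = argmin_s f(x_{t-1}, s): solve 2 (s - 2) + 0.5 x = 0
        s = 2.0 - x / 4.0
        # x_t = argmin_x f(x, s_t): solve 2 (x - 1) + 0.5 s = 0
        x = 1.0 - s / 4.0
    return x, s


x_hat, s_hat = alternating_minimization(x0=10.0)
print(x_hat, s_hat)  # close to the joint minimizer (8/15, 28/15)
```

For this toy objective each full sweep shrinks the error in \( x \) by a factor of 16, so the iterates converge to \( x^{*} = 8/15 \) from any starting guess; the paper's question is when such convergence holds in general.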

2504.20941 2026-05-12 cs.CR math.DG stat.OT

Conformal-DP: A Density-Aware Mechanism for Differential Privacy over Riemannian Manifolds via Conformal Transformation

Peilin He, Liou Tang, M. Amin Rahimian, James Joshi

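AI summary This paper proposes Conformal-DP, a differential privacy mechanism for data on Riemannian manifolds, to address the biased perturbations and poor privacy-utility trade-offs of existing methods under non-uniform data distributions. The method calibrates perturbations through conformal transformations so that the privacy perturbation depends on local data density, inducing a density-balanced geometry on the manifold. Theoretical analysis shows that the mechanism satisfies ε-differential privacy under mild regularity conditions, and experiments confirm a better privacy-utility balance than existing methods under heterogeneous data distributions.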

Comments Submitted, under review

Details
English abstract

Differential Privacy (DP) is being increasingly adopted for non-Euclidean data that lie on complex, high-dimensional manifolds. Existing DP mechanisms for manifold data consider geometric properties when calibrating privacy perturbations, but they largely fail to capture variations in data density within datasets, leading to biased perturbations and suboptimal privacy-utility trade-offs due to heterogeneous data distributions. In this paper, we propose a novel density-aware differential privacy mechanism on Riemannian manifolds, referred to as Conformal-DP, that leverages conformal transformations to calibrate perturbations based on local densities and to induce a density-balanced geometry. We prove that our mechanism satisfies $ε$-differential privacy on any complete Riemannian manifold under mild regularity assumptions. In addition, we derive a closed-form expected geodesic error bound that depends only on the underlying data density ratio and is independent of global curvature. Our empirical results on synthetic and real-world datasets demonstrate that the proposed Conformal-DP mechanism substantially improves the privacy-utility trade-off in heterogeneous data distribution settings, with worst-case performance comparable to state-of-the-art manifold DP mechanisms that assume uniformly distributed data.

2504.11848 2026-05-12 stat.ME math.ST stat.ML stat.TH

Proximal Inference for Indirect and Intervening Effects in Population Interventions

Yang Bai, Yifan Cui, Baoluo Sun

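AI summary This paper studies how to accurately estimate the population intervention indirect effect (PIIE) and the causal effect of an intervening variable in the presence of unmeasured confounding. The authors propose a unified identification and estimation approach within the proximal causal inference framework, using observed covariates as proxy variables to construct three distinct identification strategies. They derive the semiparametric efficiency bound for the target estimands and develop multiply robust, locally efficient estimators. Simulations support the method, which is then applied to the indirect effect of alcohol consumption on depression risk through depersonalization symptoms.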

Details
English abstract

Unmeasured confounding, unethical exposure, and ill-defined interventions pose significant challenges to evaluating policy-relevant mediation estimands in medicine and public health. In observational studies involving harmful exposures, the population intervention indirect effect (PIIE) is often more salient than the natural indirect effect, as the latter relies on hypothetical interventions that may be ethically or practically unfeasible. While the PIIE can be identified via the generalized front-door criterion under unmeasured exposure-outcome confounding, existing estimation methods typically assume the absence of unmeasured confounding for the mediator. Furthermore, when the exposure corresponds to ill-defined interventions, the standard PIIE criterion fails; however, the generalized front-door formula may still identify the causal effect of an intervening variable designed to capture the indirect effect. This paper develops a unified identification and estimation framework for the PIIE and the causal effect of an intervening variable in settings with pervasive unmeasured confounding affecting exposure-mediator, exposure-outcome, and mediator-outcome relationships. Specifically, we leverage observed covariates as proxy variables to construct three distinct identification strategies within a proximal causal inference framework. We characterize the semiparametric efficiency bound for the target estimands and develop multiply robust, locally efficient estimators that remain consistent under partial model misspecification. The finite-sample performance of our estimators is demonstrated through simulations. Finally, we apply our methodology to study the indirect effect of alcohol consumption on depression risk as mediated by depersonalization symptoms.

2502.16120 2026-05-12 math.OC stat.ML

A Fenchel-Young Loss Approach to Data-Driven Inverse Optimization

Zhehao Li, Yanchen Wu, Jian Chen, Xiaojie Mao

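AI summary This paper studies data-driven inverse optimization, which estimates the unknown parameters of an optimization model from observed solutions. The authors propose a new approach based on the Fenchel-Young (FY) loss, connecting inverse optimization to the FY loss from structured prediction; the approach admits efficient gradient-based optimization and clearly outperforms existing methods. Theoretical analysis and experiments indicate clear advantages in parameter estimation accuracy, decision error, and computational speed.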

Details
English abstract

Data-driven inverse optimization seeks to estimate unknown parameters in an optimization model from observations of optimization solutions. Many existing methods are ineffective in handling noisy and suboptimal solution observations and also suffer from computational challenges. In this paper, we build a connection between inverse optimization and the Fenchel-Young (FY) loss originally designed for structured prediction, proposing a FY loss approach to data-driven inverse optimization. This new approach is amenable to efficient gradient-based optimization, hence much more efficient than existing methods. We provide theoretical guarantees for the proposed method and use extensive simulation and real-data experiments to demonstrate its significant advantage in parameter estimation accuracy, decision error and computational speed.

2502.10760 2026-05-12 cs.CL cs.LG stat.ML

Why is prompting hard? Understanding prompts on binary sequence predictors

Li Kevin Wenliang, Anian Ruoss, Jordi Grau-Moya, Marcus Hutter, Tim Genewein

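AI summary This paper examines why designing effective prompts is difficult for binary sequence predictors, framing the search for an optimal prompt as conditioning a near-optimal sequence predictor. Through many controlled experiments, it finds that unintuitive optimal prompts are easier to understand given the pretraining distribution, and that optimal prompts for practical neural predictors are hard to identify reliably even with exhaustive search. It also notes that popular prompting methods, such as using demonstrations from the target task, can perform poorly, and that optimal prompts on frontier models show patterns similar to the binary examples and to previous findings.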

Details
Journal ref
Artificial Intelligence and Statistics 2026
English abstract

Frontier models can be prompted or conditioned to do many tasks, but finding good prompts is not always easy, nor is understanding some performant prompts. We view prompting as finding the best conditioning sequence on a near-optimal sequence predictor. On numerous well-controlled experiments, we show that unintuitive optimal conditioning sequences can be better understood given the pretraining distribution, which is not usually available. Even using exhaustive search, reliably identifying optimal prompts for practical neural predictors can be surprisingly difficult. Popular prompting methods, such as using demonstrations from the targeted task, can be surprisingly suboptimal. Using the same empirical framework, we analyze optimal prompts on frontier models, revealing patterns similar to the binary examples and previous findings. Taken together, this work takes an initial step towards understanding optimal prompts, from a statistical and empirical perspective that complements research on frontier models.

2412.09226 2026-05-12 stat.AP econ.EM

The Global Carbon Budget as a cointegrated system

Mikkel Bennedsen, Eric Hillebrand, Morten Ørregaard Nielsen

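AI summary This paper analyses the four annual time series of the Global Carbon Budget (atmospheric CO₂ concentrations, anthropogenic CO₂ emissions, and CO₂ uptake by land and by ocean) as a cointegrated system. The four series are found to be cointegrated with rank three, with anthropogenic emissions as the single stochastic trend driving the system's nonstationary dynamics. The paper then builds an error-correction model consistent with the physical relations, which likelihood ratio tests do not reject; the model supports in-sample and out-of-sample analysis and, under Shared Socioeconomic Pathways scenarios, yields projections consistent with climate science.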

Details
English abstract

The Global Carbon Budget, maintained by the Global Carbon Project, summarizes Earth's global carbon cycle through four annual time series beginning in 1959: atmospheric CO$_2$ concentrations, anthropogenic CO$_2$ emissions, and CO$_2$ uptake by land and by ocean. We analyze these four time series as a multivariate (cointegrated) system. Statistical tests show that the four time series are cointegrated with rank three and identify anthropogenic CO$_2$ emissions as the single stochastic trend driving the nonstationary dynamics of the system. The three cointegrated relations correspond to the physical relations that the sinks are linearly related to atmospheric concentrations and that the change in concentrations equals emissions minus the combined uptake by land and ocean. Furthermore, likelihood ratio tests show that a parametrically restricted error-correction model that embodies these physical relations cannot be rejected on the data. The model can be used for both in-sample and out-of-sample analysis. In an application of the latter, we demonstrate that projections based on this model, using Shared Socioeconomic Pathways scenarios, yield results consistent with established climate science.

2409.03410 2026-05-12 math.ST stat.TH

Error bounds of Median-of-means estimators with VC-dimension

Yuxuan Wang, Yiming Chen, Hanchao Wang, Lixin Zhang

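AI summary This paper studies upper error bounds for robust estimation of the mean vector with the median-of-means (MOM) method under heavy-tailed distributions and data contamination. By measuring statistical complexity with the VC dimension rather than the Rademacher complexity, the method needs only a finite second moment, a weaker assumption than many existing methods require. The paper also proposes a MOM-based halfspace depth estimator, gives error bounds for mean estimation in any norm, and extends MOM to covariance estimation.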

Details
English abstract

We obtain upper error bounds for robust estimators of the mean vector, using the median-of-means (MOM) method. The method is designed to handle data with heavy tails and contamination, with only a finite second moment, which is weaker than many others, relying on the VC dimension rather than the Rademacher complexity to measure statistical complexity. This allows us to implement MOM in covariance estimation, without imposing conditions such as $L$-sub-Gaussian or $L_{4}-L_{2}$ norm equivalence. In particular, we derive a new robust estimator, the MOM version of the halfspace depth, along with error bounds for mean estimation in any norm.
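As background, the basic scalar median-of-means idea can be sketched in a few lines: split the sample into blocks, average each block, and take the median of the block means. This is the textbook one-dimensional version, not the multivariate or halfspace-depth estimators studied in the paper; the function name `median_of_means` and its arguments are hypothetical.

```python
import numpy as np


def median_of_means(x, n_blocks, seed=None):
    """Scalar median-of-means estimate of the mean of a 1-D sample."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Shuffle so that block membership does not depend on the input order.
    shuffled = rng.permutation(x)
    # Split into n_blocks nearly equal blocks and average each one.
    blocks = np.array_split(shuffled, n_blocks)
    block_means = np.array([block.mean() for block in blocks])
    # A single corrupted block cannot move the median of the block means.
    return float(np.median(block_means))


# Example: 99 clean observations and one gross outlier.
sample = np.concatenate([np.ones(99), [1e6]])
print(sample.mean())                          # pulled far away from 1 by the outlier
print(median_of_means(sample, 11, seed=0))    # 1.0
```

The outlier lands in exactly one of the 11 blocks, so the other 10 block means equal 1 and the median is unaffected, whereas the plain sample mean is not.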

2408.15701 2026-05-12 stat.ME stat.CO

Robust discriminant analysis

Mia Hubert, Jakob Raymaekers, Peter J. Rousseeuw

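AI summary Discriminant analysis (DA) is a widely used statistical method for classification, valued for its conceptual simplicity, low computational cost, and solid performance. Standard DA estimates the center and scatter of each class with the arithmetic mean and sample covariance matrix, which makes it very sensitive to outliers and mislabeled data. This paper reviews techniques for robust DA, presents DA based on robust estimates of location and scatter, and provides graphical diagnostic tools for visualizing the results, which are more reliable when deviating cases are present.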

Comments Accepted for publication in WIREs Computational Statistics (Wiley Interdisciplinary Reviews)

Details
English abstract

Discriminant analysis (DA) is one of the most popular methods for classification due to its conceptual simplicity, low computational cost, and often solid performance. In its standard form, DA uses the arithmetic mean and sample covariance matrix to estimate the center and scatter of each class. We discuss and illustrate how this makes standard DA very sensitive to suspicious data points, such as outliers and mislabeled cases. We then present an overview of techniques for robust DA, which are more reliable in the presence of deviating cases. In particular, we review DA based on robust estimates of location and scatter, along with graphical diagnostic tools for visualizing the results of DA.

2405.17642 2026-05-12 cs.LG cs.AI stat.ME

Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels

Oleksii Furman, Patryk Wielopolski, Łukasz Lenkiewicz, Jerzy Stefanowski, Maciej Zięba

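AI summary As AI systems grow more complex, the need for explainability becomes more pressing. This paper proposes a unified gradient-based optimization method that generates local, global, and group-wise counterfactual explanations, filling the gap left by existing methods that do not integrate these levels of granularity. By combining instance grouping and counterfactual generation into a single efficient process and introducing plausibility criteria, it makes group-wise counterfactuals more realistic and useful; experiments show a good balance between validity, proximity, and plausibility.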

Details
English abstract

The growing complexity of AI systems has intensified the need for transparency through Explainable AI (XAI). Counterfactual explanations (CFs) offer actionable "what-if" scenarios on three levels: Local CFs providing instance-specific insights, Global CFs addressing broader trends, and Group-wise CFs (GWCFs) striking a balance and revealing patterns within cohesive groups. Despite the availability of methods for each granularity level, the field lacks a unified method that integrates these complementary approaches. We address this limitation by proposing a gradient-based optimization method for differentiable models that generates Local, Global, and Group-wise Counterfactual Explanations in a unified manner. We especially enhance GWCF generation by combining instance grouping and counterfactual generation into a single efficient process, replacing traditional two-step methods. Moreover, to ensure trustworthiness, we innovatively introduce the integration of plausibility criteria into the GWCF domain, making explanations both valid and realistic. Our results demonstrate the method's effectiveness in balancing validity, proximity, and plausibility while optimizing group granularity, with practical utility validated through practical use cases.

2102.09448 2026-05-12 stat.ME math.ST stat.TH

A Generative Approach to Joint Modeling of Quantitative and Qualitative Responses

Xiaoning Kang, Lulu Kang, Wei Chen, Xinwei Deng

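AI summary Many scientific fields need to handle quantitative and qualitative responses together, often with a large number of predictors. This paper proposes a generative approach that models the joint distribution of the quantitative and qualitative responses and the predictors. Under a penalized likelihood framework, the approach provides efficient parameter estimation, classifying the qualitative response and predicting the quantitative response accurately with efficient computation. Theoretical analysis shows asymptotic optimality of classification and prediction under some regularity conditions, and simulations and real case studies confirm its effectiveness.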

Details
Journal ref
Journal of Multivariate Analysis, Volume 190, 2022, 104952
English abstract

In many scientific areas, data with quantitative and qualitative (QQ) responses are commonly encountered with a large number of predictors. By exploring the association between QQ responses, existing approaches often consider a joint model of QQ responses given the predictor variables. However, the dependency among predictive variables also provides useful information for modeling QQ responses. In this work, we propose a generative approach to model the joint distribution of the QQ responses and predictors. The proposed generative model provides efficient parameter estimation under a penalized likelihood framework. It achieves accurate classification for qualitative response and accurate prediction for quantitative response with efficient computation. Because of the generative approach framework, the asymptotic optimality of classification and prediction of the proposed method can be established under some regularity conditions. The performance of the proposed method is examined through simulations and real case studies in material science and genetics.

2008.06525 2026-05-12 stat.ME

Bayesian Auxiliary Variable Model for Birth Records Data with Qualitative and Quantitative Responses

Xiaoning Kang, Shyam Ranganathan, Lulu Kang, Julia Gohlke, Xinwei Deng

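AI summary This paper proposes a Bayesian auxiliary variable model for jointly analysing data with qualitative and quantitative responses, aiming to capture the dependence between them more accurately. The method links the responses through a latent variable and estimates the posterior distributions of the parameters with an efficient MCMC algorithm. Applied to birth records data from the Virginia Department of Health, it analyses the mutual dependence between preterm birth and infant birth weight and demonstrates an advantage in predictive performance.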

Comments 27 pages, 3 figures, 3 tables

Details
Journal ref
2021, Journal of Statistical Computation and Simulation, 91(16), 3283-3303
English abstract

Many applications involve data with qualitative and quantitative responses. When there is an association between the two responses, a joint model will provide better results than modeling them separately. In this paper, we propose a Bayesian method to jointly model such data. The joint model links the qualitative and quantitative responses and can assess their dependency strength via a latent variable. The posterior distributions of parameters are obtained through an efficient MCMC sampling algorithm. The simulation shows that the proposed method can improve the prediction capacity for both responses. We apply the proposed joint model to the birth records data acquired by the Virginia Department of Health and study the mutual dependence between preterm birth of infants and their birth weights.

2008.06476 2026-05-12 stat.ME

Locally Optimal Design for A/B Testing in the Presence of Covariates and Network Connection

Qiong Zhang, Lulu Kang

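AI summary This paper studies how to design more effective A/B testing experiments in the presence of covariates and network connections. The authors include the treatment assignment, covariates, and network structure in a conditional autoregressive model, propose a design criterion based on variance minimization, and adopt the locally optimal design method to handle uncertainty in the network correlation parameter. Experiments show that accounting for network dependence in the design markedly improves A/B testing, and that the proposed method is robust to the choice of parameters.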

Comments 19 pages, 8 figures

Details
Journal ref
2022, Technometrics, 64(3), 358--369
English abstract

A/B test, a simple type of controlled experiment, refers to the statistical procedure of experimenting to compare two treatments applied to test subjects. For example, many IT companies frequently conduct A/B tests on their users who are connected and form social networks. Often, the users' responses could be related to the network connection. In this paper, we assume that the users, or the test subjects of the experiments, are connected on an undirected network, and the responses of two connected users are correlated. We include the treatment assignment, covariate features, and network connection in a conditional autoregressive model. Based on this model, we propose a design criterion that measures the variance of the estimated treatment effect and allocate the treatment settings to the test subjects by minimizing the criterion. Since the design criterion depends on an unknown network correlation parameter, we adopt the locally optimal design method and develop a hybrid optimization approach to obtain the optimal design. Through synthetic and real social network examples, we demonstrate the value of including network dependence in designing A/B experiments and validate that the proposed locally optimal design is robust to the choices of parameters.

2008.06475 2026-05-12 stat.ME

A Maximin $Φ_{p}$-Efficient Design for Multivariate GLM

Yiou Li, Lulu Kang, Xinwei Deng

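AI summary This paper studies how to construct efficient experimental designs for multivariate generalized linear models (GLMs) under uncertainty in model parameters and structure. It proposes a new Maximin $Φ_p$-Efficient (Mm-$Φ_p$) design, which aims to make the worst-case design efficiency under model uncertainty as high as possible. Based on the theoretical properties of the criterion, the authors develop an efficient algorithm with sound convergence properties and assess the design through numerical examples.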

Details
Journal ref
Statistica Sinica. 32(4), 2047--2069
English abstract

Experimental designs for a generalized linear model (GLM) often depend on the specification of the model, including the link function, the predictors, and unknown parameters, such as the regression coefficients. To deal with uncertainties of these model specifications, it is important to construct optimal designs with high efficiency under such uncertainties. Existing methods such as Bayesian experimental designs often use prior distributions of model specifications to incorporate model uncertainties into the design criterion. Alternatively, one can obtain the design by optimizing the worst-case design efficiency with respect to uncertainties of model specifications. In this work, we propose a new Maximin $Φ_p$-Efficient (or Mm-$Φ_p$ for short) design which aims at maximizing the minimum $Φ_p$-efficiency under model uncertainties. Based on the theoretical properties of the proposed criterion, we develop an efficient algorithm with sound convergence properties to construct the Mm-$Φ_p$ design. The performance of the proposed Mm-$Φ_p$ design is assessed through several numerical examples.

2008.05578 2026-05-12 stat.ME

Covariate Balancing Based on Kernel Density Estimates for Controlled Experiments

Yiou Li, Lulu Kang, Xiao Huang

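AI summary This paper studies how covariate balancing can improve the accuracy of causal effect estimates in controlled experiments. The authors propose a covariate balancing criterion based on kernel density estimates, used to partition the experimental units before the experiment so as to reduce covariate differences between treatment groups. The method achieves a more balanced partition by minimizing the differences between the kernel density estimates of the treatment groups' covariates; numerical experiments show that it improves the accuracy of the difference-in-mean estimator more than complete randomization and rerandomization.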

Comments 26 pages, 2 figures, 1 table

Details
Journal ref
2021, Statistical Theory and Related Fields, 5(2), 102--113
English abstract

Controlled experiments are widely used in many applications to investigate the causal relationship between input factors and experimental outcomes. A completely randomized design is usually used to randomly assign treatment levels to experimental units. When covariates of the experimental units are available, the experimental design should achieve covariate balancing among the treatment groups, such that the statistical inference of the treatment effects is not confounded with any possible effects of covariates. However, covariate imbalance often exists, because the experiment is carried out based on a single realization of the complete randomization. It is more likely to occur and worsen when the size of the experimental units is small or moderate. In this paper, we introduce a new covariate balancing criterion, which measures the differences between kernel density estimates of the covariates of treatment groups. To achieve covariate balance before the treatments are randomly assigned, we partition the experimental units by minimizing the criterion, then randomly assign the treatment levels to the partitioned groups. Through numerical examples, we show that the proposed partition approach can improve the accuracy of the difference-in-mean estimator and outperforms the complete randomization and rerandomization approaches.

2004.09887 2026-05-12 stat.CO

Is a Transformed Low Discrepancy Design Also Low Discrepancy?

Yiou Li, Lulu Kang, Fred J. Hickernell

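AI summary This paper studies whether a low discrepancy design remains low discrepancy after a variable transformation maps it to an arbitrary target distribution. The authors analyse how the choice of kernel functions affects the discrepancy and show that the transformed design remains low discrepancy when certain conditions hold, but these conditions may be hard to meet in practice, in which case the discrepancy can become large. Two remedies are proposed: ensuring that the original design has optimal one-dimensional projection, which suits dense designs, and using the transformed design as the input to a coordinate-exchange algorithm that optimizes the target discrepancy, which works for all designs.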

Details
Journal ref
2020, Book Chapter in Fan J., Pan J. (eds) Contemporary Design of Experiments, Multivariate Analysis and Data Mining--Festschrift in Honor of Professor Kai-Tai Fang. Springer, Cham
English abstract

Experimental designs intended to match arbitrary target distributions are typically constructed via a variable transformation of a uniform experimental design. The inverse distribution function is one such transformation. The discrepancy is a measure of how well the empirical distribution of any design matches its target distribution. This chapter addresses the question of whether a variable transformation of a low discrepancy uniform design yields a low discrepancy design for the desired target distribution. The answer depends on the two kernel functions used to define the respective discrepancies. If these kernels satisfy certain conditions, then the answer is yes. However, these conditions may be undesirable for practical reasons. In such a case, the transformation of a low discrepancy uniform design may yield a design with a large discrepancy. We illustrate how this may occur. We also suggest some remedies. One remedy is to ensure that the original uniform design has optimal one-dimensional projection, but this remedy works best if the design is dense, or in other words, the ratio of sample size divided by the dimension of the random variable is relatively large. Another remedy is to use the transformed design as the input to a coordinate-exchange algorithm that optimizes the desired discrepancy, and this works for both dense and sparse designs. The effectiveness of these two remedies is illustrated via simulation.
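The inverse distribution function transformation mentioned in the abstract can be illustrated in one dimension. The sketch below makes two assumptions that are not from the chapter: the uniform design is the simple midpoint design on $[0, 1]$, and the target distribution is the standard normal. The function name `transform_design` is hypothetical.

```python
from statistics import NormalDist


def transform_design(n):
    """Map an n-point midpoint design on [0, 1] to a standard normal target."""
    # Evenly spread uniform design points (i + 0.5) / n, strictly inside (0, 1).
    uniform_design = [(i + 0.5) / n for i in range(n)]
    # Apply the inverse distribution function of the assumed target distribution.
    target = NormalDist(mu=0.0, sigma=1.0)
    return [target.inv_cdf(u) for u in uniform_design]


points = transform_design(5)
print(points)  # five increasing points, symmetric about 0
```

In one dimension this mapping preserves the ordering and evenness of the design on the probability scale; the chapter's question is whether discrepancy, measured with a kernel on the target space, stays low after such a transformation in general.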

2004.06443 2026-05-12 stat.ML cs.LG

Particle-based Energetic Variational Inference

Yiwei Wang, Jiuhai Chen, Chun Liu, Lulu Kang

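AI summary This paper proposes energetic variational inference (EVI), a new variational inference framework based on an energy-dissipation law that unifies and derives many existing particle-based variational inference methods, such as Stein variational gradient descent (SVGD). Within this framework, the authors also propose a new particle-based EVI scheme that follows an "Approximation-then-Variation" strategy and significantly decreases the KL divergence in each iteration; numerical experiments show that it outperforms existing methods in fidelity to the target distribution.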

Comments 17 pages, 7 figures

Details
Journal ref
2021, Statistics and Computing, Vol 31, 34
English abstract

We introduce a new variational inference (VI) framework, called energetic variational inference (EVI). It minimizes the VI objective function based on a prescribed energy-dissipation law. Using the EVI framework, we can derive many existing Particle-based Variational Inference (ParVI) methods, including the popular Stein Variational Gradient Descent (SVGD) approach. More importantly, many new ParVI schemes can be created under this framework. For illustration, we propose a new particle-based EVI scheme, which performs the particle-based approximation of the density first and then uses the approximated density in the variational procedure, or "Approximation-then-Variation" for short. Thanks to this order of approximation and variation, the new scheme can maintain the variational structure at the particle level, and can significantly decrease the KL-divergence in each iteration. Numerical experiments show the proposed method outperforms some existing ParVI methods in terms of fidelity to the target distribution.

1910.03120 2026-05-12 stat.ME

Gaussian Process Assisted Active Learning of Physical Laws

Jiuhai Chen, Lulu Kang, Guang Lin

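AI summary In science and engineering, discovering governing differential equations from noisy experimental data is a key challenge. This paper proposes an active learning approach that combines D-optimality with the maximin space-filling criterion to estimate unknown differential equations accurately from less experimental data. The approach uses Gaussian process regression models to estimate the unknown solution and its derivatives, and learns the differential equations from experimental data with variable selection-based regression methods; several case studies show that it outperforms traditional design methods in model accuracy and data economy.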

Comments 27 pages, 5 figures, 10 tables

Details
Journal ref
2021, Technometrics, 63(3), 329--342
English abstract

In many areas of science and engineering, discovering the governing differential equations from the noisy experimental data is an essential challenge. It is also a critical step in understanding the physical phenomena and prediction of the future behaviors of the systems. However, in many cases, it is expensive or time-consuming to collect experimental data. This article provides an active learning approach to estimate the unknown differential equations accurately with reduced experimental data size. We propose an adaptive design criterion combining the D-optimality and the maximin space-filling criterion. In contrast to active learning for other regression models, the D-optimality here requires the unknown solution of the differential equations and derivatives of the solution. We estimate the Gaussian process (GP) regression models from the available experimental data and use them as the surrogates of these unknown solution functions. The derivatives of the estimated GP models are derived and used to substitute the derivatives of the solution. Variable selection-based regression methods are used to learn the differential equations from the experimental data. Through multiple case studies, we demonstrate the proposed approach outperforms the D-optimality and the maximin space-filling design alone in terms of model accuracy and data economy.

1902.00482 2026-05-12 stat.ME

D-optimal Design for Network A/B Testing

Victoria Pokhiko, Qiong Zhang, Lulu Kang, D'arcy P. Mays

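AI summary This paper studies how to optimize the experimental design of A/B tests in a network setting, proposing an approach based on the conditional autoregressive model to capture the influence of network structure on the treatment effect. A D-optimal design criterion is constructed and the optimal designs are obtained via mixed integer programming, which improves the statistical efficiency of network A/B testing. Numerical experiments on synthetic networks and real social networks confirm the method's effectiveness.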

Comments 24 pages, 5 figures, 2 tables

Details
Journal ref
2019, Journal of Statistical Theory and Practice, 13(4), 61
English abstract

A/B testing refers to the statistical procedure of conducting an experiment to compare two treatments, A and B, applied to different testing subjects. It is widely used by technology companies such as Facebook, LinkedIn, and Netflix, to compare different algorithms, web-designs, and other online products and services. The subjects participating in these online A/B testing experiments are users who are connected in different scales of social networks. Two connected subjects are similar in terms of their social behaviors, education and financial background, and other demographic aspects. Hence, it is only natural to assume that their reactions to the online products and services are related to their network adjacency. In this paper, we propose to use the conditional auto-regressive model to present the network structure and include the network effects in the estimation and inference of the treatment effect. A D-optimal design criterion is developed based on the proposed model. Mixed integer programming formulations are developed to obtain the D-optimal designs. The effectiveness of the proposed method is shown through numerical results with synthetic networks and real social networks.

2605.10069 2026-05-12 stat.AP

Estimating Consensus Epidemic Trajectories via a Constrained Power Fréchet Mean with Functional Registration

Yui Tomo, Shu Tamano, Daisuke Yoneoka

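AI summary This paper proposes a method for summarizing multiple solutions of SEIR-type epidemic models in a function space: computing a constrained power Fréchet mean with functional registration to obtain consensus epidemic trajectories with partial mechanistic interpretability. The method treats pairs of exposed and infectious compartments as objects in a Hilbert space and defines the consensus trajectory through an optimization problem with differential equation and population constraints, preserving a partially mechanistic interpretation of the infectious compartment. The study also develops an efficient block-optimization algorithm and validates it with simulated and literature-derived COVID-19 data, providing a general trajectory-summarization framework for model averaging and ensemble forecasting in infectious disease modeling.

Details
Abstract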

We propose a method for summarizing multiple solutions to SEIR-type compartmental models on a functional space by computing a constrained power Fréchet mean with functional registration to obtain consensus epidemic trajectories with partial mechanistic interpretability. In our method, we regard the pairs of exposed and infectious compartments as objects in a Hilbert space, and the consensus trajectory is defined as the solution to a constrained optimization problem. Differential equation constraints and population constraints are incorporated in the optimization to preserve a partially mechanistic interpretation regarding the infectious compartment. The full dynamics with additional susceptible and removed compartments can then be recovered from the estimated trajectories and parameters. We develop an efficient block-optimization algorithm based on functional data analysis and illustrate the method using simulated and literature-derived epidemiological parameters for COVID-19 in the early phase of the pandemic that began in 2020. The proposed approach provides a generalized trajectory-summarization framework that includes mean- and median-type estimators on a functional space and holds potential for model averaging and ensemble forecasting in infectious disease modeling.

2605.10042 2026-05-12 stat.ME math.PR stat.AP

A Statistical Framework for Learning Preferences from the Past

Tamojit Sadhukhan, Moulinath Banerjee, Krishanu Maulik, Parthanil Roy

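AI summary This paper proposes a statistical framework for learning users' latent preferences from their past repeated choices, to improve the accuracy of personalized recommendation and choice prediction. The method rests on a natural monotonicity assumption, namely that options chosen more frequently or more intensely in the past are more likely to be chosen again, and gives a non-parametric extension of an existing parametric model. The study also proposes maximum likelihood estimation under the monotonicity constraint and verifies the method's effectiveness through theoretical analysis and experiments.

Comments 31 pages, 2 figures

Details
Abstract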

In many real-world settings such as online recommendation or consumer choice modeling, individuals make repeated choices from a fixed set of options. Accurately estimating their underlying preferences is essential for generating personalized future recommendations. Probabilistic models for understanding user choice behavior from past decisions can serve as a valuable addition to existing recommender systems and choice prediction methods. To this end, in this article, we introduce a novel statistical framework for predicting user preferences based on their past choices, under a natural monotonicity assumption: options that were chosen more frequently or more intensely in the past are more likely to be chosen again in the future. Our approach builds on a parametric model proposed by Le Goff and Soulier (2017), originally used to describe how ants in an ant colony select a path among many pre-existing paths. We propose a non-parametric generalization of this model, drawing inspiration from the generalized elephant random walk introduced by Maulik et al. (2024). We develop a method of maximum likelihood estimation of the user preference probabilities under the above-mentioned monotonicity constraint. We also derive theoretical guarantees for our estimator and demonstrate the effectiveness of our method through both simulated experiments and real-world datasets.

2605.10019 2026-05-12 cs.LG cs.AI cs.CC stat.ML

The two clocks and the innovation window: When and how generative models learn rules

Binxu Wang, Emma Lucia Byrnes Finn, Bingbin Liu

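AI summary This paper studies a fundamental tension faced by generative models that learn rules from finite data: the training objective makes the model favor fitting the empirical distribution rather than the target distribution. By introducing two key time points, the rule-onset time $τ_{\mathrm{rule}}$ and the memorization time $τ_{\mathrm{mem}}$, the paper analyzes when generative models begin producing rule-valid samples and when they begin copying training data. It finds that these two time points depend on factors such as rule complexity, model capacity, and dataset size, and defines the "innovation window" as the period in which the model genuinely innovates, revealing commonalities and differences in how generative models learn rules across architectures.

Comments 48 pages, 28 figures. Earlier versions were presented as an oral presentation at the NeurIPS 2025 SPIGM workshop: https://openreview.net/forum?id=LjqX8OhPPi

Details
Abstract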

Generative models trained on finite data face a fundamental tension: their score-matching or next-token objective converges to the empirical training distribution rather than the population distribution we seek to learn. Using rule-valid synthetic tasks, we trace this tension across two training timescales: $τ_{\mathrm{rule}}$, the step at which generations first become rule-valid, and $τ_{\mathrm{mem}}$, the step at which models begin reproducing training samples. Focusing on parity and extending to other binary rules and combinatorial puzzles, we characterize how these two clocks, $τ_{\mathrm{rule}}$ and $τ_{\mathrm{mem}}$, depend on key aspects of the learning setup. Specifically, we show that $τ_{\mathrm{rule}}$ increases with rule complexity and decreases with model capacity, while $τ_{\mathrm{mem}}$ is approximately invariant to the rule and scales nearly linearly with dataset size $N$. We define the \emph{innovation window} as the interval $[τ_{\mathrm{rule}}, τ_{\mathrm{mem}}]$. This window widens with increasing $N$ and narrows with rule complexity, and may vanish entirely when $τ_{\mathrm{rule}} \geq τ_{\mathrm{mem}}$. The same two-clock structure arises in both diffusion (DiT) and autoregressive (GPT) models, with architecture-dependent offsets. Dissecting the learned score of DiT models reveals a corresponding evolution of the optimization landscapes, where rule-valid samples' basins expand substantially around $τ_{\mathrm{rule}}$, while training samples' basins begin to dominate around $τ_{\mathrm{mem}}$. Together, these results yield a unified and predictive account of when and how generative models exhibit genuine innovation.

2605.10015 2026-05-12 stat.ML cs.CR cs.LG

Differentially Private Sampling from Distributions via Wasserstein Projection

Shokichi Takakura, Seng Pei Liew, Satoshi Hasegawa

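AI summary This paper studies sampling from a distribution under differential privacy constraints. Unlike earlier utility measures based on density ratios, it proposes the Wasserstein distance as the utility measure, overcoming the shortcomings of traditional approaches in capturing the geometric structure of the support and in handling distributions with different supports. The authors propose the Wasserstein Projection Mechanism (WPM), a minimax optimal mechanism based on Wasserstein projection, and design efficient approximation algorithms with convergence guarantees, offering a new theoretical framework and practical method for differentially private sampling.

Details
Abstract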

In this paper, we study the problem of sampling from a distribution under the constraint of differential privacy (DP). Prior works measure the utility of DP sampling with density ratio-based measures such as KL divergence. However, such formulations suffer from two key limitations: 1) they fail to capture the geometric structure of the support, and 2) they are not applicable when the supports of the distributions differ. To deal with these issues, we develop a novel framework for DP sampling with Wasserstein distance as the utility measure. In this formulation, we propose Wasserstein Projection Mechanism (WPM), a minimax optimal mechanism based on Wasserstein projection. Furthermore, we develop efficient algorithms for computing the proposed mechanisms approximately and provide convergence guarantees.
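
The second limitation can be made concrete with a toy comparison. The sketch below, in plain Python, takes two discrete distributions on the real line with disjoint supports: KL divergence is infinite however close the supports are, while the 1-Wasserstein distance stays finite and grows with the gap between them. The function names and the toy distributions are illustrative assumptions and are not taken from the paper; the paper's mechanism itself is not reproduced here.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for dicts mapping support points to probabilities."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q has none
        total += px * math.log(px / qx)
    return total

def wasserstein_1d(p, q):
    """1-Wasserstein distance on the real line: integral of |F_p - F_q|."""
    points = sorted(set(p) | set(q))
    cdf_p = cdf_q = 0.0
    dist = 0.0
    for left, right in zip(points, points[1:]):
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        dist += abs(cdf_p - cdf_q) * (right - left)
    return dist

# Hypothetical toy distributions (not from the paper).
p = {0.0: 0.5, 1.0: 0.5}
q_near = {0.1: 0.5, 1.1: 0.5}  # same shape, shifted by 0.1
q_far = {5.0: 0.5, 6.0: 0.5}   # same shape, shifted by 5.0

print(kl_divergence(p, q_near), kl_divergence(p, q_far))
print(wasserstein_1d(p, q_near), wasserstein_1d(p, q_far))
```

KL cannot tell the near shift from the far one, whereas the Wasserstein distance tracks the size of the shift.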

2605.09953 2026-05-12 stat.ME

Generalized Boundary FDR Control under Arbitrary Dependence: An Approach on Closure Principle

Yifan Zhang, Wentao Zhang, Changliang Zou, Haojie Ren

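AI summary Addressing the reliability of boundary discoveries in multiple hypothesis testing, this paper proposes a new error measure, $k$-bFDR, which controls the error probability of the $k$ least significant discoveries. Building on the closure principle, the authors construct a unified framework called Domino that achieves $k$-bFDR control under arbitrary dependence and applies to both p-values and e-values. Theoretical analysis and numerical experiments show that Domino guarantees $k$-bFDR control and improves the reliability of boundary discoveries; real-data analyses also confirm its advantages in improving the quality and practical value of rejection sets.

Details
Abstract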

False discovery rate (FDR) is a cornerstone of modern multiple testing. However, it often fails to guarantee the reliability of "marginal" discoveries that lie at the boundary of the rejection set, which are often crucial in high-precision applications. While recent works (Soloff et al., 2024; Xiang et al., 2025) introduced the boundary false discovery rate (bFDR) to control the error probability at the marginal discovery, their method relies on restrictive assumptions such as independence or specific prior distributions. In this paper, we first propose $k$-bFDR, a novel generalization that controls the error probability of the $k$ least significant discoveries. We then provide a systematic investigation into the theoretical relationship between $k$-bFDR and existing error metrics. Furthermore, building upon the closure principle, we develop Domino, a unified framework that guarantees $k$-bFDR control under arbitrary dependence, applicable for both p-values and e-values. We prove the theoretical validity of the proposed Domino algorithm and demonstrate through extensive numerical experiments that it consistently achieves rigorous $k$-bFDR control while identifying trustworthy marginal discoveries. Analyses of real data reveal that $k$-bFDR control yields higher-quality rejection sets with greater practical significance.

2605.09880 2026-05-12 math.NA cs.NA stat.CO stat.ME

Parameter Estimation for Partially Observed Time-Changed SDEs

Ke Zhao, Ajay Jasra

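AI summary This paper studies parameter estimation for partially observed time-changed stochastic differential equations (SDEs), with observations given at discrete times. The authors propose new Markov chain Monte Carlo (MCMC) algorithms that, combined with an unbiased score-based stochastic approximation method, construct likelihood-type parameter estimators, and further use them for multilevel Bayesian parameter estimation. The method performs well in numerical experiments, and theoretical analysis shows a mean square error of $\mathcal{O}(ε^2)$ at a cost of $\mathcal{O}(ε^{-2}\log(ε)^2)$.

Details
Abstract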

In this paper we consider the parameter estimation problem associated with partially observed time-changed SDEs, with observations that are given at discrete times. In particular we consider both likelihood and Bayesian estimation. We develop new Markov chain Monte Carlo (MCMC) algorithms which allow an unbiased score-based stochastic approximation method to provide likelihood-type parameter estimators. We also use a variant of this MCMC algorithm to perform multilevel-based Bayesian parameter estimation. We prove that this latter method achieves a mean square error of $\mathcal{O}(ε^2)$ ($ε>0$) with a cost of $\mathcal{O}(ε^{-2}\log(ε)^2)$. Our methodologies are tested numerically on both simulated and real data.

2605.09857 2026-05-12 stat.ML cs.LG

Unified Approach for Weakly Supervised Multicalibration

Futoshi Futami, Takashi Ishida

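AI summary This paper studies multicalibration under weakly supervised learning, that is, how to make predicted scores agree with true label probabilities across subgroups and score-dependent tests when clean labels are unavailable. To this end, the authors propose a unified framework that combines contamination-matrix risk rewrites with witness-based calibration constraints, enabling estimation and post-hoc correction of multicalibration error in weakly supervised settings, and propose a generic weak-label multicalibration boost algorithm (WLMC). Experiments show that the method is effective across several weak-supervision scenarios and offer new empirical insight into uncertainty estimation.

Details
Abstract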

Multicalibration requires predicted scores to agree with label probabilities across rich families of subgroups and score-dependent tests, but existing methods require clean input-label pairs for evaluation and post-processing. This assumption fails in weakly supervised learning (WSL) regimes -- including positive-unlabeled, unlabeled-unlabeled, and positive-confidence learning -- where clean labels are costly or unavailable even though reliable uncertainty estimates may be crucial. We address this gap by developing estimators of multicalibration error and post-hoc correction methods for WSL settings in which clean input-label pairs are unavailable. We propose a unified framework for estimating and correcting multicalibration under weak supervision by combining contamination-matrix risk rewrites with witness-based calibration constraints, yielding corrected multicalibration moments with finite-sample guarantees. We further propose weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm under weak supervision. Finally, we conduct experiments across multiple weak-supervision settings to evaluate multicalibration behavior and offer empirical insight into uncertainty estimation under weak supervision.

2605.09849 2026-05-12 stat.ME

Proximal Causal Inference for Hidden Outcomes

Helen Guo, Ilya Shpitser, Elizabeth L. Ogburn

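AI summary This paper studies causal inference in the presence of hidden outcomes and proposes an approach based on proximal causal inference. The method reconstructs latent distributions using eigenvalue-eigenvector structure and, on that basis, constructs influence function based estimators of causal effects. It does not rely on unbiased proxy measurements or partial observation, achieves multiple robustness and high estimation efficiency, and is an important advance in this area.

Details
Abstract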

Methods that rely on proxies, without imposing strong parametric structure, are increasingly used to deal with unobserved variables in causal inference. One influential line of this work reconstructs latent distributions used to identify the target functional by exploiting eigenvalue-eigenvector structure. Within this framework, we first establish identification of the full data law in the presence of hidden outcomes, and then develop influence function based estimators for causal effects. To the best of our knowledge, this is the first work to develop influence function based estimators in this setting without relying on unbiased proxy measurements or partial observation, while achieving multiple robustness and desirable efficiency properties. We demonstrate the performance of our approach through simulation studies.

2605.09834 2026-05-12 stat.ML cs.LG

Supercharging Bayesian Inference with Reliable AI-Informed Priors

Jongwoo Choi, Sean O'Hagan

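AI summary This paper studies how to use the beliefs encoded in modern predictive systems as prior information for statistical inference, to improve inference in data-limited settings. To address the risk that predictive-model error propagates into the posterior, the authors propose a framework that rectifies the AI-induced law generating synthetic data, in order to construct more reliable AI priors. The method substantially reduces bias, improves credible-interval coverage, and is validated on a real skin disease classification task.

Details
Abstract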

Modern predictive systems encode beliefs that can act as useful prior information for statistical inference in data-limited settings. Using them for prior construction introduces a tradeoff: an informative prior built from a predictive model can sharpen inference from limited data, but also risks propagating error from the model into the posterior. We propose a framework for AI-informed prior elicitation that mitigates this tension by rectifying the AI-induced law that generates synthetic data before using it to inform a prior. The rectified law can be embedded into synthetic data-driven prior elicitation techniques, including as a base measure in a Dirichlet process (DP) prior on the data-generating process. We refer to the resulting prior and corresponding posterior as the rectified AI prior and rectified AI posterior. We establish Gaussian asymptotics for the rectified AI posterior under non-vanishing prior strength and derive a first-order expression for its centering bias. Our rectified AI priors substantially reduce bias compared to standard approaches, improve the coverage of credible intervals, and make AI-powered prior information more reliable. We additionally apply the rectified AI prior to a real skin disease classification task and show that it can meaningfully boost predictive performance.

2605.09757 2026-05-12 cs.LG stat.ML

On Uniform Error Bounds for Kernel Regression under Non-Gaussian Noise

Johannes Teutsch, Oleksii Molodchyk, Marion Leibold, Timm Faulwasser, Armin Lederer

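AI summary This paper studies non-conservative uncertainty quantification for kernel-regression function estimates under non-Gaussian noise and proposes new non-asymptotic probabilistic uniform error bounds. Unlike earlier bounds restricted to sub-Gaussian noise, these bounds cover a broader class of noise distributions, including sub-Gaussian, bounded, sub-exponential, and variance/moment-bounded noise, and apply to both correlated and uncorrelated noise. Comparisons with existing results in terms of uncertainty regions and safe control performance confirm the tightness of the proposed bounds.

Comments This paper has been accepted at the 43rd International Conference on Machine Learning (ICML) 2026

Details
Abstract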

Providing non-conservative uncertainty quantification for function estimates derived from noisy observations remains a fundamental challenge in statistical machine learning, particularly for applications in safety-critical domains. In this work, we propose novel non-asymptotic probabilistic uniform error bounds for kernel-based regression. Compared to related bounds in the literature that are restricted to (conditionally) independent sub-Gaussian noise, our bounds allow for a broad class of non-Gaussian distributions, such as sub-Gaussian, bounded, sub-exponential, and variance/moment-bounded noise. Moreover, our results apply to correlated and uncorrelated noise. We compare our proposed error bounds with existing results in terms of the induced uncertainty region and their performance in safe control, demonstrating the tightness of the proposed bounds.

2605.09755 2026-05-12 math.NA cs.DS cs.LG cs.NA stat.ML

Accelerating Power Method with Fast Sketching for Stronger Low-Rank Approximation

Shabarish Chenakkod, Michał Dereziński

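AI summary This paper studies how to accelerate the power method to obtain stronger low-rank approximation. To address the high computational cost of the traditional power method at large target rank, it proposes an acceleration framework based on fast random projection (sketching). The method shows efficient and stable numerical performance on tasks such as singular value decomposition, low-rank factorization, and Nyström approximation; its core novelty is the use of regularized spectral approximation theory, which provides a more flexible analytical tool for generalizing power method guarantees.

Details
Abstract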

The power method is one of the most fundamental tools for extracting top principal components from data through low-rank matrix approximation. Yet, when the target rank is large, the cost of matrix multiplication associated with this procedure becomes a major bottleneck. We develop an algorithmic and theoretical framework for accelerating the power method using fast sketching, which is a popular paradigm in randomized linear algebra. Our framework leads to simple and provably efficient methods for singular value decomposition, low-rank factorization, and Nyström approximation, which attain strong numerical performance on benchmark problems. The key novelty in our analysis is the use of regularized spectral approximation, a property of fast sketching methods which proves more flexible in generalizing power method guarantees than traditional arguments.
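
For readers who want the baseline in front of them, here is a minimal sketch of the classical power method for the top eigenpair of a small symmetric positive semidefinite matrix, written in plain Python. It covers only the textbook iteration; the sketching-based acceleration the abstract describes is not reproduced, and the matrix, the deterministic starting vector, and the iteration count are illustrative assumptions.

```python
import math

def matvec(a, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def power_method(a, num_iters=200):
    """Top eigenvalue and eigenvector of a symmetric PSD matrix."""
    n = len(a)
    # Deterministic start for reproducibility; random starts are more common.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(num_iters):
        w = matvec(a, v)                       # one multiplication per step
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]              # renormalize to avoid overflow
    eigenvalue = sum(x * y for x, y in zip(v, matvec(a, v)))  # Rayleigh quotient
    return eigenvalue, v

# Hypothetical 2x2 example with eigenvalues (7 +/- sqrt(5)) / 2.
a = [[4.0, 1.0], [1.0, 3.0]]
top_value, top_vector = power_method(a)
print(top_value, top_vector)
```

Each iteration costs one matrix-vector product here; with a block of k vectors in place of `v`, the same step becomes a matrix-matrix product, which is the cost the abstract identifies as the bottleneck at large target rank.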

2605.09741 2026-05-12 stat.ME

Adaptive discovery of effect modification in matched observational studies

Yu Gui, Dylan S Small, Zhimei Ren

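AI summary This paper studies the discovery of effect modification, that is, differences in treatment effects across subgroups, in matched observational studies. The authors propose a finite-sample valid subgroup discovery procedure that exactly controls the subgroup-level false discovery rate and accounts for unmeasured confounding. The method uses multiple matched controls to improve statistical power and outperforms existing methods in both simulation studies and a real application.

Details
Abstract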

Understanding effect modification -- how treatment effects vary across subpopulations -- is practically important in observational studies, as it helps identify which subgroups are likely to benefit from a given treatment. In this paper, we study the discovery of effect modification in matched observational studies, where each treated unit may be matched to multiple controls. We develop a finite-sample valid procedure for identifying and selecting covariate-interpretable subgroups, with exact control of the subgroup-level false discovery rate (FDR). Our method explicitly accounts for unmeasured confounding via sensitivity models, and leverages multiple matched controls to improve statistical power. We demonstrate the favorable performance of our method relative to baseline methods through extensive simulation studies and a real-world application to the economic returns to college education.

2605.09740 2026-05-12 econ.EM stat.ME stat.ML

LGB+: A Macroeconomic Forecasting Road Test

Philippe Goulet Coulombe

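AI summary This paper proposes a gradient boosting method called LGB+ aimed at improving forecasting performance for macroeconomic time series. At each step the method evaluates both a tree model and a linear model and updates with whichever performs better, capturing linear relationships in the data more efficiently while retaining the ability to model nonlinearities. LGB+ decomposes forecasts into linear and nonlinear parts, which helps in understanding variable importance and historical proximity weights, and it performs well when forecasting macroeconomic indicators with pronounced autoregressive features or mixed linear-nonlinear signals.

Details
Abstract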

Needless to say, linear dynamics are pervasive in economic time series, particularly autoregressive ones. While gradient boosting with trees excels at capturing nonlinearities, it is inefficient in small samples when much of the predictive content is linear, expending splits to approximate relationships better captured by simple linear terms. This paper proposes LGB+, a boosting procedure operating on a more inclusive set of basis functions. The idea comes in two flavors. LGB+ evaluates a tree and a linear candidate at each step against out-of-bag data; only the winner advances. The simpler variant, LGB^A+, alternates on a fixed schedule: a block of tree updates, then a greedy linear correction, repeat. Both designs avoid ex ante commitments to any particular functional form or predictor selection. Because the prediction is the sum of a linear and a tree component, forecasts decompose natively into linear and nonlinear contributions, and so do permutation-based variable importance and historical proximity weights. In a quarterly U.S. macroeconomic forecasting exercise, LGB+ delivers strong gains for targets with pronounced autoregressive dynamics or mixed linear-nonlinear signals. Variables dominating the linear channel are those operating through autoregressive persistence or near-accounting relationships to the target (e.g., initial claims for unemployment and building permits for housing starts).

2605.09718 2026-05-12 stat.ML cs.LG math.PR math.ST stat.TH

Learning stochastic multiscale models through normalizing flows

Anan Saha, Arnab Ganguly

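AI summary This paper studies how to learn effective dynamical models of multiscale stochastic systems from a single observed trajectory. The authors propose a trajectory-based framework that models the dynamics with coupled multiscale stochastic differential equations and reduces the model through stochastic averaging. Because the reduced model depends on the hard-to-compute invariant distribution of the fast variables, the authors parameterize that distribution with normalizing flows and learn the model parameters by end-to-end optimization, while using Bayesian variational inference for uncertainty quantification, thereby capturing epistemic uncertainty in multiscale systems.

Comments 17 pages, 4 figures

Details
Abstract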

Many systems in physics, engineering, and biology exhibit multiscale stochastic dynamics, where low-dimensional slow variables evolve under the influence of high-dimensional fast processes. In practice, observations are often limited to a single trajectory of the slow component, while the fast dynamics remain unobserved, making statistical learning challenging. Approaches based on partial differential equations (PDE), such as Fokker-Planck formulations, aim to characterize the evolution of probability densities, typically requiring dense space-time data or grid-based solvers. In contrast, we adopt a trajectory-based perspective and develop a data-driven framework for learning effective stochastic dynamics from a single observed path. We model the dynamics by coupled multiscale stochastic differential equations (SDEs) and first obtain a principled model reduction through stochastic averaging. Unlike generic model reduction techniques such as PCA, this respects the dynamical structure of the original system and explicitly incorporates the interaction between slow and fast scales. A central challenge, however, is that the reduced model depends on the invariant distribution of the fast process, which is a solution to an intractable and often unknown PDE. We introduce a novel learning framework that parameterizes the invariant distribution using normalizing flows, enabling expressive density modeling in the latent fast-variable space. The flow is trained end-to-end by optimizing a penalized likelihood objective induced by the reduced stochastic dynamics. Furthermore, we develop a Bayesian variational inference procedure for uncertainty quantification, employing a second normalizing flow to approximate the posterior distribution over model parameters. This yields a scalable approach to capturing epistemic uncertainty in multiscale systems.

2605.09717 2026-05-12 math.ST stat.TH

The general regularisation scheme applied to conditional density estimation

Gilles Germain

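AI summary This paper applies the general regularisation scheme to conditional density estimation, proposing a unified framework and deriving a new estimator with rigorous convergence-rate guarantees. The method uses Landweber regularisation, which is computationally more efficient, and experiments show that it matches or outperforms the Nadaraya-Watson estimator in a variety of scenarios, including time series models.

Comments 15 pages, 0 figures

Details
Abstract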

The general regularisation scheme, a versatile approach for nonparametric estimation, has been successfully applied to regression, density ratio, and score estimation. In this paper, we introduce a unified framework encompassing these settings and extend it to conditional density estimation, deriving a new estimator with rigorously established convergence rates. We implement the Landweber regularisation, which is computationally more tractable than Tikhonov regularisation in this context. Numerical experiments demonstrate that our estimator matches or outperforms the Nadaraya-Watson estimator in various scenarios, including time series models.
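
As background on why Landweber regularisation can be cheaper, the sketch below shows the textbook Landweber iteration for a finite-dimensional linear system $Ax = b$: it needs only repeated multiplications by $A$ and $A^\top$, whereas Tikhonov regularisation requires solving a regularised linear system. This is a generic illustration in plain Python, not the paper's conditional density estimator; the matrix, step size, and iteration counts are assumptions chosen for the example.

```python
def landweber(a, b, step, num_iters):
    """Landweber iteration x <- x + step * A^T (b - A x), starting from zero.

    Converges for 0 < step < 2 / sigma_max(A)^2; stopping early acts as
    regularisation, so num_iters plays the role of the tuning parameter.
    """
    rows, cols = len(a), len(a[0])
    x = [0.0] * cols
    for _ in range(num_iters):
        residual = [b[i] - sum(a[i][j] * x[j] for j in range(cols))
                    for i in range(rows)]
        for j in range(cols):
            x[j] += step * sum(a[i][j] * residual[i] for i in range(rows))
    return x

# Hypothetical system with exact solution (1, 3); sigma_max(A)^2 = 4.
a = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 3.0]
x_early = landweber(a, b, step=0.2, num_iters=3)    # heavily regularised
x_late = landweber(a, b, step=0.2, num_iters=500)   # close to the solution
print(x_early, x_late)
```

The component tied to the smaller singular value converges more slowly, so early stopping damps exactly the directions that are most sensitive to noise.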

2605.09712 2026-05-12 econ.EM q-fin.PM stat.ML

Quantifying the Risk-Return Tradeoff in Forecasting

Philippe Goulet Coulombe

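AI summary This paper studies the risk-return tradeoff in forecasting, proposing to treat forecast loss differentials relative to a benchmark as a return series and to evaluate them with risk-adjusted performance measures from finance. It introduces new measures such as the Edge Ratio, which captures a model's ability to deliver uniquely informative predictions, and applies the framework to U.S. macroeconomic forecasting, comparing econometric models, machine learning methods, and professional forecasters. It finds that although machine learning may beat professional forecasters on average accuracy, professional forecasters do better on a risk-adjusted basis, reflecting their value in risk control and contextual judgment.

Details
Abstract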

Average forecast accuracy is not the same as forecast reliability. I treat forecast loss differentials relative to a benchmark as a return series. I then evaluate these returns using risk-adjusted performance measures from finance, including the Sharpe ratio, Sortino ratio, Omega ratio, and drawdown-based metrics. I also introduce the Edge Ratio capturing a model's propensity to deliver uniquely informative predictions relative to the forecasting frontier. I apply this framework to U.S. macroeconomic forecasting, comparing econometric benchmarks, machine learning models, a foundation model (TabPFN), and the Survey of Professional Forecasters. While it is often feasible to beat professional forecasters in terms of average accuracy, it is much harder to beat them on a risk-adjusted basis. They rarely exhibit catastrophic failures and often achieve high Edge Ratios, plausibly reflecting the value of contextual judgment. Nonetheless, selected machine learning methods deliver attractive risk profiles for specific targets. The framework naturally extends to meta-analyses across targets, horizons, and samples, illustrated with a density forecast evaluation and the M4 competition.
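
The core idea can be illustrated with a small sketch: treat benchmark loss minus model loss as a per-period "return" and compute Sharpe-type and Sortino-type ratios on that series. The definitions below are the common textbook forms with a zero target and no annualisation; the paper's exact conventions may differ, the Edge Ratio is not reproduced, and the loss series are made-up numbers for illustration only.

```python
import math
import statistics

def loss_differentials(benchmark_losses, model_losses):
    """Per-period 'returns': positive when the model beats the benchmark."""
    return [b - m for b, m in zip(benchmark_losses, model_losses)]

def sharpe_ratio(returns):
    """Mean return divided by its sample standard deviation."""
    return statistics.mean(returns) / statistics.stdev(returns)

def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by downside deviation below the target."""
    downside = [min(r - target, 0.0) ** 2 for r in returns]
    downside_dev = math.sqrt(sum(downside) / len(returns))
    if downside_dev == 0.0:
        return math.inf  # the model never falls below the target
    return (statistics.mean(returns) - target) / downside_dev

# Hypothetical squared-error losses over five forecast periods.
benchmark = [1.0, 1.2, 0.9, 1.1, 1.0]
steady = [0.8, 1.1, 0.8, 0.9, 0.9]    # small but consistent gains
erratic = [0.2, 1.7, 0.1, 1.6, 0.7]   # larger average gain, big misses

steady_returns = loss_differentials(benchmark, steady)
erratic_returns = loss_differentials(benchmark, erratic)
print(sharpe_ratio(steady_returns), sharpe_ratio(erratic_returns))
print(sortino_ratio(steady_returns), sortino_ratio(erratic_returns))
```

In this toy example the erratic model wins on average accuracy, yet the steady model has the better risk-adjusted profile, which is the distinction the abstract draws between accuracy and reliability.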

2605.09702 2026-05-12 stat.ME cs.CL

Calibrate, Don't Curate: Label-Efficient Estimation from Noisy LLM Judges

Yanran Li

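AI summary This paper studies how to efficiently estimate the performance of large language models in multi-judge evaluation with noisy labels. Conventional practice tends to improve evaluation by selecting high-accuracy judges, but the author finds that when the target is calibrated probabilistic evaluation, keeping all judges performs better. The study shows that even judges with below-average accuracy can help calibration as long as their biases are learnable and their information is non-redundant, so when labeled calibration data are available, weak judges should not be discarded on accuracy alone.

Details
Abstract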

Multi-judge evaluation is increasingly used to assess LLMs and reward models, and the prevailing heuristic is to curate: keep the most accurate judges and discard weaker ones. We show that this heuristic can reverse when the target is not point accuracy, but calibrated probabilistic evaluation from a labeled calibration set. Holding the aggregation and calibration procedures fixed, we compare accuracy-ranked top-$k$ judge selection with using the full judge panel. Across four labeled pairwise-evaluation benchmarks spanning LLM-as-judge and reward-model settings, the calibrated full panel consistently outperforms accuracy-based selection. On RewardBench2, retaining all judges achieves negative log-likelihood (NLL) of $0.006$ versus $0.013$ under top-5 selection, halving the calibration error. This advantage persists after judge-family deduplication and against stronger same-pipeline subset search. We explain this reversal with oracle analyses showing that the optimal calibrated risk under proper scoring rules cannot increase when additional judge signals are made available, and that even below-chance judges can be useful when their biases are learnable and their signals are non-redundant. The resulting operating principle is simple: in multi-judge evaluation with labeled calibration data, do not discard weak judges by accuracy alone; keep them when they are parseable, non-redundant, and calibratable.

2605.09673 2026-05-12 stat.ME

On the Need for Spatial Random Effects in Bayesian Regression Models for Multilevel Areal Data

Shuqi Lin, Joshua L. Warren

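AI summary This paper studies whether spatial random effects are needed in Bayesian regression models for multilevel areal data. Within a hierarchical Bayesian framework for Gaussian responses and using the Leroux conditional autoregressive (CAR) prior, the authors derive a sample size threshold $m^*$ for judging how much spatial modeling affects inference on regression coefficients. They find that below the threshold spatial modeling materially affects inference, whereas above it a nonspatial model gives similar results. The threshold depends on the spatial correlation parameter, the ratio of between-area to within-area variance, and the alignment between the covariate and spatial patterns, providing practical guidance for study design.

Details
Abstract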

Although spatial models for areal data are widely used in multilevel settings, the conditions under which spatial and nonspatial random effects yield equivalent posterior inference for regression coefficients have never been formally characterized. We address this question within a hierarchical Bayesian framework for Gaussian outcomes, using the Leroux conditional autoregressive (CAR) prior distribution as a representative specification. We derive a closed-form sample size threshold, $m^*$, below which spatial modeling materially affects inference on regression coefficients and above which a simpler nonspatial model yields effectively equivalent results, and show that the absolute relative difference in posterior variances converges to zero at rate $O(m^{-1})$. The threshold depends on three interpretable quantities: the spatial correlation parameter, the ratio of between-area to within-area variance, and the alignment between the covariate and dominant spatial patterns in the data. Because each can often be estimated prior to model fitting, $m^*$ can serve as a practical study design tool. Simulation studies confirm that $m^*$ accurately identifies this threshold across a range of settings. However, when the covariate does not vary within a given location, spatial modeling remains necessary regardless of within-area sample size. These results offer formal guidance for practitioners deciding whether the added complexity of spatial modeling is warranted.

2605.09654 2026-05-12 stat.ML cs.LG stat.CO

Metropolis-Adjusted Diffusion Models

Kevin H. Lam, Tyler Farghly, Christopher Williams, Jun Yang, Yee Whye Teh, Arnaud Doucet

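AI summary This paper studies sampling bias in score-based diffusion models and proposes corrections based on Metropolis-Hastings (MH) or Barker accept-reject steps to reduce the bias from time discretisation and score function approximation. The authors introduce an exact correction method based on a two-coin Bernoulli factory algorithm and an efficient approximation based on Simpson's rule, markedly improving sampling quality. Experiments show better sample generation on both synthetic and image datasets, with notable gains in FID.

Details
Abstract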

Sampling from score-based diffusion models incurs bias due to both time discretisation and the approximation of the score function. A common strategy for reducing this bias is to apply corrector steps based on the unadjusted Langevin algorithm (ULA) at each noise level within a predictor-corrector framework. However, ULA is itself a biased sampler, as it discretises a continuous diffusion process. In this work, we consider adjusted Langevin correctors that employ Metropolis--Hastings (MH) or Barker's accept-reject steps to correct for this bias. Since the target density ratio typically required by MH-based algorithms is unavailable, we propose methods that instead utilise the score function to compute the correct acceptance probability. We introduce the first exact method for adjusting Langevin corrections in diffusion models, based on a two-coin Bernoulli factory algorithm. We also propose an efficient approximation based on Simpson's rule that achieves accuracy of order $5/2$ in the step size at near-zero marginal cost. We demonstrate that these procedures improve sample quality on both synthetic and image datasets, yielding consistent gains in Fréchet Inception Distance (FID) on the latter.
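
To show what "adjusting" a Langevin step means, the sketch below implements the textbook Metropolis-adjusted Langevin algorithm (MALA) for a one-dimensional standard normal target. It assumes the log density is available, which is precisely what the abstract says is missing in diffusion models, so this is the classical baseline and not the score-only methods the paper proposes. The target, step size, seed, and chain length are illustrative assumptions.

```python
import math
import random

def log_density(x):
    """Log density of the toy target (standard normal, up to a constant)."""
    return -0.5 * x * x

def score(x):
    """Gradient of log_density."""
    return -x

def log_proposal(x_to, x_from, step):
    """Log density (up to a constant) of the ULA proposal x_from -> x_to."""
    mean = x_from + step * score(x_from)
    return -((x_to - mean) ** 2) / (4.0 * step)

def mala_step(x, step, rng):
    # Unadjusted Langevin (ULA) proposal: drift along the score plus noise.
    proposal = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    # Metropolis-Hastings log acceptance ratio corrects the discretisation bias.
    log_accept = (log_density(proposal) + log_proposal(x, proposal, step)
                  - log_density(x) - log_proposal(proposal, x, step))
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return x

rng = random.Random(0)
x, samples = 0.0, []
for _ in range(20000):
    x = mala_step(x, step=0.5, rng=rng)
    samples.append(x)

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(sample_mean, sample_var)
```

Dropping the accept-reject test turns `mala_step` back into ULA, whose stationary distribution differs from the target by a step-size-dependent bias.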

2605.09562 2026-05-12 stat.ME

Laplace Variational Inference for Dirichlet Process Mixtures of Marked Poisson Point Processes

Minsung Choi, Seonghyun Jeong

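AI summary This paper studies clustering of marked Poisson point process data and proposes a Dirichlet process mixture of marked Poisson point processes that jointly infers latent cluster structure, the number of clusters, and continuous mark-specific intensity surfaces. For efficient posterior inference, the authors design a variational Bayes algorithm and handle the nonconjugate part with a constrained Laplace approximation, resolving the sign ambiguity and nodal-line issue of squared-link models. Experiments show good performance on synthetic data and in real-data analysis.

Details
Abstract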

Marked point process data arise when events occur in a space with event-level marks. We study clustering of replicated marked Poisson point processes and introduce Dirichlet process mixtures of marked Poisson point processes, a Bayesian nonparametric model that jointly infers latent cluster structure, the number of clusters, and continuous mark-specific intensity surfaces. We use a squared link intensity representation to obtain tractable continuous domain likelihood terms without gridding or thinning. For posterior inference, we develop an efficient variational Bayes algorithm with a constrained Laplace approximation for the nonconjugate basis-coefficient block. The resulting coefficient update is formulated as a constrained optimization problem, which avoids the sign ambiguity and nodal-line issue of squared-link models. We further establish theoretical guarantees for mode finding optimization. We demonstrate the performance of the proposed model and algorithm through synthetic experiments and real-data analysis.

2605.09552 2026-05-12 math.OC cs.LG stat.ML

Phases of Muon: When Muon Eclipses SignSGD

Elliot Paquette, Noah Marshall, Lucas Benigni, Guangyuan Wang, Atish Agarwala, Courtney Paquette

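AI summary This paper studies the behaviour of Muon and related spectral optimizers on a high-dimensional matrix least squares problem, revealing their relationship to stochastic methods such as SignSVD and SignSGD. By deriving deterministic dynamics, the analysis shows that at large batch size Muon amounts to a square-root preconditioning of the data covariance spectrum, whereas at small batch size it behaves like SGD and converges more slowly. The study also finds that SignSVD and SignSGD differ markedly in performance on anisotropic data, and identifies three distinct performance phases in a power-law covariance model.

Details
Abstract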

Recently, Muon and related spectral optimizers have demonstrated strong empirical performance as scalable stochastic methods, often outperforming Adam. Yet their behaviour remains poorly understood. We analyze stochastic spectral optimizers, including Muon, on a high-dimensional matrix-valued least squares problem. We derive explicit deterministic dynamics that provide a tractable framework for studying learning behaviour with a focus on (stochastic) SignSVD, which Muon approximates, and (stochastic) SignSGD, the latter serving as a proxy for Adam. Our analysis shows that for large batch size, SignSVD performs a square-root preconditioning with respect to the data covariance spectrum, while for small batch size smaller eigenmodes behave like SGD, slowing down convergence. We contrast with SignSGD which for generic covariance performs no preconditioning and has no transition, leading to different optimal learning rates and convergence characteristics. The two methods match up to a constant factor with isotropic data, but behave differently with anisotropic data. An analysis of a power law covariance model with data exponent $α$ and target exponent $β$ shows there are three phases in the $(α,β)$ plane: one where SignSGD is uniformly favored, one where SignSVD is uniformly favored, and a third where the two methods exhibit a trade-off in performance.

2605.09525 2026-05-12 stat.ME

Simultaneous false discovery rate control in location families

Zijun Gao, Wenjie Hu, Qingyuan Zhao

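AI Summary When testing multiple statistical hypotheses with data from location families, it is desirable to control the false discovery rate (FDR) not only at the null values but also at other parameter values deemed practically insignificant. This paper treats the FDR as a curve indexed by the location parameter and proposes a simple generalization of the Benjamini-Hochberg procedure that keeps this curve below any user-specified level. As a corollary of the main result, the standard Benjamini-Hochberg procedure, while controlling the FDR at the null, also provides simultaneous control of the entire FDR curve for free.

Comments 11 pages, 3 figures

Details
English Abstract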

When testing a number of statistical hypotheses using data from location families, it is often useful to control the false discovery rate (FDR) not just for hypotheses of the null values but also of other parameter values that are deemed practically insignificant. Here we consider FDR as a curve indexed by the location parameter and suggest a simple generalization of the Benjamini-Hochberg procedure that controls the FDR curve below any user-specified level. As a corollary of our main result, we show that the standard Benjamini-Hochberg procedure -- designed to control the FDR at the null -- also provides simultaneous control of the whole FDR curve for free. We further demonstrate the implications of our results and some practical considerations with a numerical example.
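
For reference, the classical Benjamini-Hochberg step-up rule that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard procedure only, not the paper's generalization to the FDR curve; the function name `benjamini_hochberg` and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices rejected by the standard BH step-up procedure at level q."""
    m = len(p_values)
    # Sort hypothesis indices by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is below q * rank / m.
        if p_values[idx] <= q * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Illustrative p-values (hypothetical data).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, q=0.05)
print(rejected)  # [0, 1]
```

Because the rule is step-up, a large p-value that falls below its own threshold also rescues all smaller p-values, which is why raising `q` can reject many hypotheses at once.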

2605.09509 2026-05-12 stat.ML cs.LG stat.ME

Empirical Bayes 1-bit matrix completion

Takeru Matsuda

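AI Summary This paper studies the prediction of unobserved entries in a binary matrix, known as 1-bit matrix completion, which has wide applications in areas such as recommendation systems. Motivated by the Efron-Morris estimator, the author proposes an empirical Bayes method that exploits the low-rank structure of binary matrices by shrinking singular values. The method outperforms existing approaches in its balance of predictive accuracy, uncertainty quantification, and computational efficiency.

Details
English Abstract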

The problem of predicting unobserved entries in a binary matrix, known as 1-bit matrix completion, has found diverse applications in fields such as recommendation systems. In this study, we develop an empirical Bayes method for 1-bit matrix completion motivated by the Efron--Morris estimator, a matrix generalization of the James--Stein estimator that shrinks singular values toward zero. The proposed method exploits the underlying low-rank structure of binary matrices, drawing parallels with multidimensional item response theory. Simulation studies and real-data applications demonstrate that the proposed method achieves a superior balance of predictive accuracy, calibration reliability (uncertainty quantification), and computational efficiency compared to existing methods.
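
The abstract describes the Efron-Morris estimator as a matrix generalization of the James-Stein estimator. As background, the vector case can be sketched as below: a positive-part James-Stein estimator that shrinks a Gaussian observation toward zero. This is not the paper's method; the name `james_stein` and the unit-variance default are illustrative assumptions.

```python
def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein shrinkage of x ~ N(theta, sigma2 * I) toward zero."""
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs dimension p >= 3")
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        return [0.0] * p
    # Shrinkage factor 1 - (p - 2) * sigma2 / ||x||^2, clipped at zero.
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return [factor * v for v in x]


# ||x||^2 = 25 and p = 5, so the factor is 1 - 3/25 = 0.88.
est = james_stein([3.0, 4.0, 0.0, 0.0, 0.0])
```

The matrix analogue applies the same idea to singular values rather than to a single vector norm.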

2605.09506 2026-05-12 stat.ME q-bio.QM stat.CO

Accelerating Bayesian Phylogenetic Inference via Delayed Acceptance Sequential Monte Carlo with Random Forest Surrogates

Wentao Yu, Shijia Wang

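AI Summary Bayesian phylogenetic analysis aims to estimate the posterior distribution over phylogenetic trees. This paper proposes a random-forest surrogate model that predicts how the tree moves used in standard MCMC methods (such as eSPR and stNNI) change the likelihood, and uses it to design a delayed acceptance MCMC kernel that significantly reduces the number of likelihood evaluations. The kernel is further integrated into a sequential Monte Carlo sampler framework, and experiments show substantial gains in computational efficiency while estimation accuracy is maintained.

Details
English Abstract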

In Bayesian phylogenetics, our goal is to estimate the posterior distribution over phylogenetic trees. Markov chain Monte Carlo methods are widely used to approximate the phylogenetic posterior distributions. For large-scale sequence data, repeated evaluation of the likelihood function incurs a high computational cost. In this article, we propose a machine-learning algorithm with over 35 topological and branch-length features to predict the changes in the likelihood function caused by tree moves (e.g., eSPR, stNNI) used in standard MCMC approaches. This algorithm is then used to design a delayed acceptance MCMC kernel, which uses the predicted surrogate function for preliminary rejection, to accelerate tree space searches. Furthermore, we integrate our proposed MCMC kernel into the sequential Monte Carlo sampler framework. We validate the proposed delayed-acceptance sequential Monte Carlo approach (DA-SMC) on simulated and real data sets. Our delayed acceptance kernel maintains robust estimation while significantly reducing the number of likelihood evaluations, yielding substantial computational time savings. We develop a Python package that is available at https://github.com/wentYu/DAphyloSMC.
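
The two-stage accept/reject logic of a delayed acceptance kernel can be sketched on a toy one-dimensional target. This is a generic illustration under stated assumptions, not the DA-SMC implementation: `log_target` stands in for an expensive likelihood, `log_surrogate` for a learned predictor, and both densities are hypothetical.

```python
import math
import random


def log_target(x):
    """Stand-in for an expensive exact log-density (unnormalized N(0, 1))."""
    return -0.5 * x * x


def log_surrogate(x):
    """Cheap, deliberately imperfect approximation (unnormalized N(0, 1.2))."""
    return -0.5 * x * x / 1.2


def delayed_acceptance_mh(n_steps, step=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    lp_x, ls_x = log_target(x), log_surrogate(x)
    samples, full_evals = [], 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        ls_y = log_surrogate(y)
        # Stage 1: screen the proposal using the surrogate only.
        if math.log(1.0 - rng.random()) < ls_y - ls_x:
            # Stage 2: evaluate the exact target and correct for surrogate error.
            lp_y = log_target(y)
            full_evals += 1
            if math.log(1.0 - rng.random()) < (lp_y - lp_x) - (ls_y - ls_x):
                x, lp_x, ls_x = y, lp_y, ls_y
        samples.append(x)
    return samples, full_evals


samples, full_evals = delayed_acceptance_mh(5000)
```

Proposals rejected in stage 1 never trigger `log_target`, so `full_evals` counts how many expensive evaluations were actually needed; the stage 2 ratio keeps the exact target as the stationary distribution.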

2605.09485 2026-05-12 cs.LG stat.ML

SEMASIA: A Large-Scale Dataset of Semantically Structured Latent Representations

Mario Edoardo Pandolfo, Enrico Grimaldi, Lorenzo Marinucci, Leonardo Di Nino, Simone Fiorellino, Sergio Barbarossa, Paolo Di Lorenzo

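AI Summary This paper introduces SEMASIA, a large-scale dataset of semantically structured latent representations extracted from approximately 1,700 pretrained vision models across eight standard image-classification benchmarks. The dataset is paired with structured metadata describing model architectures, training regimes, and pretraining sources, and is intended to address the incompatible geometry of latent spaces across models. By analyzing the conceptual organization of latent spaces, the performance of alignment mappings, and the influence of pretraining data and model characteristics on representations, the study demonstrates SEMASIA's value for tasks such as interpretability and transfer learning.

Details
English Abstract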

Latent representations learned by neural networks often exhibit semantic structure, where concept similarity is reflected by geometric proximity in embedding space. However, comparing such spaces across models remains difficult: changes in architecture, pretraining data, objective, or random seed can yield embeddings with similar content but incompatible geometry. This latent space alignment problem is central to interpretability, transfer and multimodal learning, federated systems, and semantic communication; however, progress remains limited by the lack of large-scale, model-diverse, and metadata-rich benchmarks. To address this gap, we introduce SEMASIA, a large-scale collection of latent representations extracted from approximately 1,700 pretrained vision models across eight standard image-classification benchmarks. SEMASIA pairs embeddings with structured metadata describing architectures, training regimes, pretraining sources, and model scale. We demonstrate three applications of the resource. First, we analyze the conceptual organization of individual latent spaces, showing consistent prototype-like clustering and hierarchical semantic neighborhoods across models and datasets. Second, we benchmark supervised alignment mappings between latent spaces using reconstruction error and downstream task performance. Third, we perform a large-scale regression analysis of how pretraining-data complexity, specialization, transfer learning, augmentation, and model scale relate to geometric and probing properties of embeddings. By coupling representational scale with standardized metadata, SEMASIA provides a reproducible foundation for studying latent geometry, evaluating alignment methods, and developing next-generation heterogeneous and interoperable AI systems.

2605.09471 2026-05-12 math.ST stat.TH

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

Abhinav Chakraborty, Subha Maity

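AI Summary This paper studies the statistical cost of adapting to unknown source-to-target biases in multi-source transfer learning. The authors introduce the intrinsic cost of adaptation, which measures the risk ratio between any bias-agnostic estimator and an ideal oracle estimator. They find that, in parametric estimation, multi-source transfer differs fundamentally from single-source transfer: adaptation is not always possible, and the adaptation cost increases with the number of sources. Moreover, when adaptation over the full bias space is impossible, certain structural assumptions can substantially reduce the cost; the paper proposes estimators for the different settings, with theoretical and empirical support.

Details
English Abstract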

Multi-source transfer learning can improve target-domain estimation by leveraging related source data, but its benefits depend on unknown source-to-target biases. This raises a fundamental question: can a bias-agnostic estimator perform as well as an oracle that knows the true bias configuration? To study this, we introduce the intrinsic cost of adaptation, defined as the smallest worst-case ratio between the risk of any bias-agnostic estimator and the oracle risk. An intrinsic cost of one means oracle performance is achievable without knowing the biases, whereas a larger cost quantifies the unavoidable price of adaptation. Focusing on parametric estimation, we show that multi-source transfer behaves fundamentally differently from the single-source setting: adaptation is not always possible, even with only two sources. For a fixed number of sources, we characterize the intrinsic cost of adaptation and identify a phase transition separating regimes where oracle performance is achievable from those where it is not. As the number of sources grows, we further show that the adaptation cost increases. When adaptation over the full bias configuration space is impossible, additional structure can substantially reduce the cost. We study settings with ordered biases, clustered source parameters, and sufficiently separated non-informative sources, and propose estimators tailored to each regime, with supporting theoretical and empirical results. Overall, our results delineate the statistical limits of multi-source transfer, clarifying when oracle performance is attainable, when structural assumptions help, and when adaptation is fundamentally impossible.

2605.09462 2026-05-12 stat.ME math.ST stat.ML stat.TH

Proximal Path-Specific Inference

Yang Bai, Sihan Wu, Baoluo Sun, Yifan Cui

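AI Summary This paper studies the estimation of path-specific effects in causal mediation analysis, with the aim of isolating the treatment effect along a specific mediator pathway in the presence of unmeasured confounding. The authors use observed covariates as proxy variables to construct proximal confounding bridge functions, develop four nonparametric identification strategies, and design a quadruply robust, locally efficient estimator, together with a proximal debiased machine learning approach for high-dimensional nuisance parameters. Theoretical analysis shows that the estimator remains root-n consistent and asymptotically normal even when the nuisance functions are estimated at slower rates, and an application confirms its effectiveness.

Details
English Abstract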

Causal mediation analysis has been extended to estimate path-specific effects with multiple intermediate variables, isolating treatment effects through a mediator of interest while excluding pathways through its ancestors. Such analyses address bias from recanting witnesses, i.e., treatment-induced mediator-outcome confounders. However, existing methods typically rely on stringent assumptions precluding general unmeasured confounding, which are often violated in practice. In this paper, we relax these restrictions by leveraging observed covariates as proxy variables to accommodate unmeasured confounding among the treatment, recanting witness, mediator, and outcome. Using proximal confounding bridge functions, we develop four nonparametric identification strategies for the path-specific effect. We further derive the efficient influence function and propose a quadruply robust, locally efficient estimator. To handle high-dimensional nuisance parameters, we propose a proximal debiased machine learning approach. We theoretically guarantee that our estimator achieves $\sqrt{n}$-consistency and asymptotic normality even when machine learning estimators for nuisance functions converge at slower rates. Our approaches are validated via semiparametric and nonparametric simulations and an application to the CDC WONDER Natality study, estimating the path-specific effect of prenatal care on preterm birth through preeclampsia, independent of maternal smoking during pregnancy.

2605.09456 2026-05-12 stat.ML cs.LG math.AP math.OC

Quantitative Local Convergence of Mean-Field Stein Variational Gradient Flow

Lénaïc Chizat, Maria Colombo, Roberto Colombo, Xavier Fernández-Real

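AI Summary This paper studies quantitative local convergence in strong norms of the mean-field Stein variational gradient flow (SVGD). For Riesz-type interaction kernels on the $d$-dimensional torus, and assuming that the initial density and the target are smooth and close in $L^2$-norm, the authors obtain explicit polynomial convergence rates and show that these rates are sharp in certain regimes. They also show that for kernels with a Coulomb singularity the global exponential convergence result of prior work is recovered. The analysis is inspired by work on Wasserstein gradient flows of kernel mean discrepancies.

Details
English Abstract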

Stein Variational Gradient Descent (SVGD) is a deterministic interacting-particle method for sampling from a target probability measure given access to its score function. In the mean-field and continuous-time limit, it is known that the flow converges weakly toward the target, but no quantitative rate is known for the last iterate. In this paper, we establish quantitative local convergence in strong norms for this dynamics, when the interaction kernel is of Riesz type on the $d$-dimensional torus. Specifically, assuming that the initial density and the target are smooth and close in $L^2$-norm, we obtain explicit polynomial convergence rates in $L^2$-norm that depend on the dimension and on the regularity parameters of the kernel, the initialization and the target. We further show that these rates are sharp in certain regimes, and support the theory with numerical experiments. In the edge case of kernels with a Coulomb singularity, we recover the global exponential convergence result established in prior work. Our analysis is inspired by recent results on Wasserstein gradient flows of kernel mean discrepancies.
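
The finite-particle SVGD update that the mean-field flow idealizes can be sketched in one dimension. This is a minimal illustration under assumptions not taken from the paper: it uses a Gaussian (RBF) kernel with fixed bandwidth `h` rather than the Riesz kernels on the torus studied in the abstract, and a hypothetical target N(2, 1).

```python
import math


def svgd_step(xs, score, step=0.1, h=1.0):
    """One SVGD update for 1-D particles with an RBF kernel of bandwidth h."""
    n = len(xs)
    new_xs = []
    for xi in xs:
        phi = 0.0
        for xj in xs:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * h * h))
            grad_k = -(xj - xi) / (h * h) * k  # derivative of k with respect to xj
            # Kernel-weighted score (attraction) plus kernel gradient (repulsion).
            phi += k * score(xj) + grad_k
        new_xs.append(xi + step * phi / n)
    return new_xs


def score(x):
    """Score (gradient of the log-density) of the hypothetical target N(2, 1)."""
    return -(x - 2.0)


# Start far from the target and iterate the deterministic update.
particles = [-3.0 + 0.2 * i for i in range(11)]
for _ in range(500):
    particles = svgd_step(particles, score)
mean = sum(particles) / len(particles)
```

Only the score of the target is needed, which matches the abstract's description of sampling "given access to its score function"; the repulsion term is what keeps the particles from collapsing onto the mode.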

2605.09454 2026-05-12 stat.ML cs.LG

Optimal Regret for Single Index Bandits

Devdan Dey, Sujoy Bhore, Avishek Ghosh

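AI Summary This paper studies the single-index bandit problem, in which rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. The model extends linear and generalized linear bandits to a nonparametric setting and applies when the reward function is not known in advance. The authors propose a two-phase algorithm, ZoomSIB-UCB, which estimates the projection direction with a normalized Stein estimator and then reduces the problem to a one-dimensional bandit solved with a UCB strategy, achieving the optimal regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ without additional assumptions. They also prove a matching lower bound of $\tilde{\Omega}(T^{2/3})$, giving a sharp characterization of the regret in single-index bandits.

Comments 27 pages, 9 figures

Details
English Abstract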

We study the $\textit{single-index bandit}$ problem, where rewards depend on an unknown one-dimensional projection of high-dimensional contexts through an unknown reward function. This model extends linear and generalized linear bandits to a nonparametric setting, and is particularly relevant when the reward function is not known in advance. While optimal regret guarantees are known for monotone reward functions, the general non-monotone case remains poorly understood, with the best known bound being $\tilde{\mathcal{O}}(T^{3/4})$ (under standard boundedness and Lipschitz assumptions on the reward function [Kang et al., 2025]). We close this gap by establishing the optimal regret for general single-index bandits. We propose a simple two-phase algorithm, namely, Zoomed Single Index Bandit with Upper Confidence Bound ($\texttt{ZoomSIB-UCB}$), that first estimates the projection direction via a normalized Stein estimator, then reduces the problem to a one-dimensional bandit using discretization, and finally applies UCB. This approach achieves a regret of $\tilde{\mathcal{O}}(T^{2/3})$, and improves significantly upon prior work without any additional assumptions. We also prove a matching minimax lower bound of $\tilde{\Omega}(T^{2/3})$, showing that the upper bound is essentially tight. Our upper and lower bounds together provide a sharp characterization of the regret in single-index bandits. Empirical results further demonstrate the effectiveness and robustness of our approach.
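
The second phase described above (discretize the one-dimensional projection, then run UCB over the bins) can be sketched as follows. This is a generic UCB1 illustration, not the ZoomSIB-UCB algorithm: it assumes the projection direction has already been estimated, and `reward_fn`, the bin count, and the noise level are hypothetical choices.

```python
import math
import random


def reward_fn(u):
    """Hypothetical non-monotone reward over the projected context u in [-1, 1]."""
    return 1.0 - 4.0 * (u - 0.3) ** 2


def ucb_discretized(n_rounds=5000, n_bins=10, noise=0.1, seed=0):
    rng = random.Random(seed)
    # Bin centres partition [-1, 1] into n_bins equal-width arms.
    centers = [-1.0 + (i + 0.5) * 2.0 / n_bins for i in range(n_bins)]
    counts = [0] * n_bins
    means = [0.0] * n_bins
    for t in range(1, n_rounds + 1):
        if t <= n_bins:
            arm = t - 1  # pull each arm once to initialise its estimate
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(n_bins),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        r = reward_fn(centers[arm]) + rng.gauss(0.0, noise)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running average
    return centers, counts


centers, counts = ucb_discretized()
best = centers[counts.index(max(counts))]
```

Because the reward is non-monotone in the projection, the most-pulled bin sits in the interior of the interval rather than at an endpoint, which is the case monotone-reward methods do not cover.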

2605.09439 2026-05-12 cs.LG stat.ML

Inverse Design for Conditional Distribution Matching

Ori Meidler, Shaul Tolkovsky, Or Zuk

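AI Summary This paper proposes a new inverse-design problem, Conditional Distribution Matching (CDM): given a joint distribution $\mathcal{P}(X, Y)$, find an input $x^*$ whose induced conditional distribution $\mathcal{P}(Y \mid X = x^*)$ matches a target distribution $\mathcal{G}(Y)$. To solve it, the authors propose the MLGD-F algorithm, which combines a pretrained diffusion model with a fast conditional sampler and requires no additional training. Experiments show that the method reliably recovers inputs that meet user-specified distributional targets across a range of tasks.

Details
English Abstract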

Generative models are powerful tools for sampling from a learned distribution $\mathcal{P}(Y \mid X)$, and inverse-design methods invert this map to find an input $x$ that produces a desired point output $y^*$. However, many design goals are naturally distributional rather than pointwise, incorporating the inherent uncertainty of $Y$ and targeting a specific form for it, a task not addressed by standard inverse design. To address this issue we introduce Conditional Distribution Matching (CDM), a new inverse-design problem class in generative modeling: given a joint distribution $\mathcal{P}(X, Y)$ and a target distribution $\mathcal{G}(Y)$, find an input $x^*$ whose induced conditional distribution $\mathcal{P}(Y \mid X = x^*)$ matches $\mathcal{G}$. We formally define two variants: Conditional Distribution Matching Sampling (CDMS) and Conditional Distribution Matching Optimization (CDMO). To solve these problems, we propose MLGD-F (Matching-Loss Guided Diffusion with a Fast inner sampler), a plug-and-play inference-time algorithm that combines a pretrained score-based diffusion model with a pretrained fast conditional sampler, requiring no additional training or fine-tuning. By leveraging single-step conditional sampling, MLGD-F enables tractable gradient computation, making the estimation of $\mathcal{P}(Y \mid X)$ both memory-efficient and computationally lightweight. We validate MLGD-F on synthetic benchmarks, structured image transformations, and generative editing optimization, demonstrating reliable recovery of inputs whose conditional distributions match diverse user-specified targets, including discrete mixtures and continuous low-rank supports.

2605.09408 2026-05-12 cs.LG cs.SI stat.ML

GravityGraphSAGE: Link Prediction in Directed Attributed Graphs

Riccardo Porcedda, Francesca Chiaromonte, Fabrizio Lillo, Andrea Vandin

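AI Summary This paper studies link prediction in directed attributed graphs, that is, predicting missing or future connections between nodes. To address the shortcomings of existing methods in handling directed graphs and node attributes, the authors propose GravityGraphSAGE (GG-SAGE), a modified GraphSAGE model with a gravity-inspired mechanism, which is the first application of GraphSAGE to directed link prediction. Experiments show that the model outperforms state-of-the-art graph deep learning link prediction methods on several benchmark datasets and real-world networks, demonstrating its effectiveness and scalability on complex graph structures.

Details
English Abstract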

Link prediction (inferring missing or future connections between nodes in a graph) is a fundamental problem in network science with widespread applications in, e.g., biological systems, recommender systems, finance and cybersecurity. The ability to accurately predict links has significant real-world applications, such as detecting fraudulent financial transactions or identifying drug-target interactions in biomedicine. Despite a rich literature, link prediction is still challenging, especially for graphs enriched with information on edges (direction) and nodes (attributes). In fact, research on link prediction, especially work based on Graph Deep Learning (GDL), has mostly focused on undirected graphs, without fully leveraging node attributes. Here, we fill this gap by proposing Gravity-GraphSAGE (GG-SAGE), a modified version of GraphSAGE, a GDL model for node embeddings, equipped with a gravity-inspired decoder. This implementation is the first example in the literature of a GraphSAGE backbone adopted for directed link prediction. Using the benchmark datasets Cora, Citeseer, PubMed and 16 real-world graphs from the online Netzschleuder repository, we show that our proposed model outperforms state-of-the-art GDL link prediction techniques. Using further experimental evidence, we relate the quality of the output of our model with various characteristics of the graph, suggesting that our framework scales well when applied to data of increasing complexity.

2605.09396 2026-05-12 cs.IT cs.LG math.IT math.ST stat.ML stat.TH

Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions

Dier Tang, Guangyue Han

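AI Summary This paper relaxes the strict symmetry conditions of prior work and proposes a universal feature selection framework that accommodates noisy observations and attribute structures with directional preferences. It introduces weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance, and builds the feature selection method on the singular value decomposition of the canonical dependence matrix computed from noisy data. The selected features are shown to achieve asymptotically near-optimal error exponents, with performance depending on the symmetry deviation and the noise levels; when these parameters are small, the result agrees with earlier work, showing that exact spherical symmetry is unnecessary. The findings demonstrate the framework's robustness to second-moment deviations and observation noise and broaden its applicability across inference tasks.

Comments 6 pages, 0 figures. This work has been submitted to the 2026 IEEE Information Theory Workshop (ITW) for possible publication

Details
English Abstract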

This paper relaxes the restrictive symmetry conditions adopted in [4], [5] and extends their universal feature selection framework to accommodate noisy observations as well as attribute structures that may exhibit directional preferences. We introduce the notion of weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance. Under this relaxed condition, we develop a universal feature selection framework based on the singular value decomposition of the canonical dependence matrix computed from noisy data. Our main result shows that the selected features achieve asymptotically optimal error exponents up to a residual term that depends on the symmetry deviation $\delta$ and the noise levels $\eta_1, \eta_2$. When $\delta, \eta_1, \eta_2$ are relatively small, our result recovers that of [5], thereby demonstrating that exact spherical symmetry is unnecessary. Overall, our findings highlight the robustness of the selection framework against second-moment deviations and observation noise, thereby broadening its applicability across diverse inference tasks and providing a theoretically grounded tool for universal feature selection in practical scenarios.

2605.09305 2026-05-12 stat.ME cs.HC cs.LG stat.CO stat.ML

Reinforcement Learning Measurement Model

Wenqian Xu, Feng Ji

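AI Summary This paper proposes the Reinforcement Learning Measurement Model (RLMM) for the sequential process data generated by interactive assessments, addressing the computational inefficiency of conventional item response models and existing Markov-decision-process-based measurement models on large tasks. Through a shared parametric action-value function, the model decouples person-level choice sensitivity from task-level value representation, which makes estimation more efficient; it also introduces a Boltzmann choice rule, a soft Bellman consistency penalty, and a block-coordinate MAP estimation procedure, and supports diagnostics for behaviorally critical decisions. Experiments show that the RLMM achieves higher estimation accuracy and lower runtime on complex tasks, and that it captures the relationship between individual decision-making ability and task performance.

Details
English Abstract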

Interactive assessments generate sequential process data that are not well handled by conventional item response models. Existing MDP-based measurement approaches, such as the Markov decision process measurement model (MDP-MM, LaMar, 2018), link action choices to state-action values, but their reliance on person-specific tabular value functions makes them difficult to scale beyond small, fully enumerated tasks. We propose the Reinforcement Learning Measurement Model (RLMM), a measurement framework that decouples person-level choice sensitivity from task-level value representation through a shared parametric action-value function, making estimation more computationally efficient for larger process-data settings. The model combines a Boltzmann choice rule with normalized advantages, a soft Bellman consistency penalty, and a block-coordinate MAP procedure for joint estimation, while also yielding step-level influence diagnostics for identifying behaviorally critical decisions. In peg-solitaire simulations, the RLMM achieved higher estimation accuracy and substantially lower runtime than the original MDP-MM, with advantages increasing as task complexity grew. In AQUALAB gameplay logs, the estimated person parameter was positively associated with cumulative reward, task completion, and behavioral efficiency. These results show that the RLMM extends decision-process-based psychometric models to larger and more behaviorally realistic environments while preserving an interpretable latent trait tied to decision making steps.
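
A Boltzmann choice rule with a person-level sensitivity parameter can be sketched as below. This is an illustration only: the abstract does not specify how advantages are normalized, so the scaling used here (advantage relative to the best action, divided by the value range) and the names `boltzmann_policy` and `theta` are assumptions.

```python
import math


def boltzmann_policy(q_values, theta):
    """Choice probabilities proportional to exp(theta * normalized advantage)."""
    q_max, q_min = max(q_values), min(q_values)
    spread = q_max - q_min
    # Hypothetical normalization: advantage relative to the best action, scaled to [-1, 0].
    if spread == 0.0:
        adv = [0.0 for _ in q_values]
    else:
        adv = [(q - q_max) / spread for q in q_values]
    weights = [math.exp(theta * a) for a in adv]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 2.0, 4.0]               # shared task-level action values for one state
uniform = boltzmann_policy(q, 0)  # theta = 0: choices ignore the values
sharp = boltzmann_policy(q, 5.0)  # larger theta: choices concentrate on the best action
```

Keeping the action values shared across persons and letting only `theta` vary is what separates person-level sensitivity from task-level value representation in this sketch.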

2605.09300 2026-05-12 stat.ME

Causal Stability Selection

Falco J. Bargagli-Stoffi, Omar Melikechi

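AI Summary This paper studies how to identify covariates that modify treatment effects, a central problem in causal inference. The authors propose causal stability selection, which combines cross-fitted estimation of conditional average treatment effects with integrated path stability selection, controlling the expected number of false positives in finite samples and improving the replicability of findings. The method accommodates arbitrary treatment effect estimators and base selectors, guarantees convergence of the estimates under standard assumptions, and establishes a direct connection between treatment effect estimation and effect modifier discovery.

Details
English Abstract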

Identifying covariates that modify treatment effects is a central problem in causal inference. Yet existing data-adaptive procedures do not provide finite-sample control over the expected number of false discoveries, risking spurious findings that fail to replicate. We introduce causal stability selection, an algorithm that combines cross-fitted estimation of conditional average treatment effects with integrated path stability selection. The method accommodates arbitrary treatment effect estimators and arbitrary base selectors, and produces a selection set with an explicit, non-asymptotic bound on the expected number of false positives. Under standard causal identifying assumptions and regularity conditions on the base selector, we prove that the estimated selection probabilities converge to their oracle counterparts at the rate of the underlying treatment effect estimator. This establishes a direct connection between treatment effect estimation and effect modifier discovery. We illustrate the method on a randomized trial in oncology and on observational data on maternal smoking and infant birthweight.

2605.09291 2026-05-12 cs.LG stat.AP

dFlowGRPO: Rate-Aware Policy Optimization for Discrete Flow Models

Zhengyan Wan, Yidong Ouyang, Panwen Hu, Qiang Sun

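AI Summary This paper proposes dFlowGRPO, a reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. By deriving the full trajectory probability of discrete flow models and formulating denoising as a Markov decision process, the method incorporates information from both the conditional transition rates and the posterior model during reinforcement learning. Experiments show that dFlowGRPO outperforms existing GRPO methods on text-to-image generation and shows strong capabilities on understanding tasks.

Details
English Abstract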

Discrete flow models (DFMs) are a class of flexible generative models for generating discrete data, and diffusion large language models (dLLMs) can be viewed as a special case with a specific choice of mixture path and a masked source distribution. While several recent works have explored applying reinforcement learning to dLLMs, its application to more general discrete flow models remains underexplored. In this work, we present discrete Flow-GRPO (dFlowGRPO), a unified reinforcement learning framework for discrete flow models that supports a broad family of probability paths and non-masked source distributions. We derive the full trajectory probability for DFMs and formulate denoising as a Markov decision process, enabling dFlowGRPO to incorporate information from both the associated conditional transition rates and the posterior model during reinforcement learning. We apply dFlowGRPO to FUDOKI, a recent multimodal discrete flow model, and evaluate it on both image generation and multimodal understanding tasks. Empirical results show that dFlowGRPO outperforms existing GRPO-type methods for dLLMs on text-to-image generation tasks and achieves performance competitive with continuous flow-based models trained using FlowGRPO, while also demonstrating strong capabilities on understanding tasks.

2605.09264 2026-05-12 stat.ME

Nested Sensitivity Envelopes for Transported Quantile Treatment Effects

Pengyun Wang

AI总结 本文研究在存在未测量混杂因素和目标人群不可推广性的情况下,如何估计目标人群的分位数处理效应。作者提出了一种嵌套敏感性包络方法,结合源人群的处理分配敏感性约束和源到目标潜在结果分布的条件似然比约束,推导出针对每个处理组和分位点的闭式分位数反事实累积分布函数包络。该方法在保持标准化的同时提升了传统似然比放松方法的精度,并发展了相应的半参数理论,实现了对分位数效应的精确置信区间估计。

详情
英文摘要

We study target-population quantile treatment effects when a source study may have unmeasured treatment confounding and may not transport to a target population after conditioning on observed covariates. The observed data consist of a source sample with treatment, outcome and covariates, and a target sample with covariates only. We impose two marginal sensitivity restrictions: an odds-ratio bound \(\Gamma\) for source treatment assignment and a conditional likelihood-ratio bound \(\Lambda\) for source-to-target potential-outcome distribution shift. For each treatment arm and threshold \(y\), we derive a closed-form sharp target counterfactual CDF envelope. The envelope nests a source marginal-sensitivity map inside a target outcome-shift map, preserving two normalizations and generally improving on a single product likelihood-ratio relaxation. We prove process-level sharpness, so the envelopes are attainable as entire CDFs and can be inverted to obtain sharp target quantile bounds and sharp interval-hull QTE bounds. We then develop semiparametric theory for these nonsmooth bound processes. On regular index sets, we give the canonical gradient, including the source propensity contribution required in observational studies, and construct cross-fitted Neyman-orthogonal one-step estimators with uniform Gaussian approximation. On full index sets with active-set ties or mass points, we use Hadamard directional differentiability and subsampling-valid inference, with a primitive finite-support route for the required weak convergence. Finally, we invert simultaneous monotone CDF bands to obtain honest confidence sets for quantile and QTE interval-hull processes, and formulate the two-dimensional \((\Gamma,\Lambda)\) breakdown frontier as level-set inference for interval-hull non-refutation.

2605.09257 2026-05-12 stat.ME

Regularity, Phase Transitions, and Uniform Inference for Proximal Counterfactual Quantile Processes

Pengyun Wang

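AI Summary This paper develops semiparametric inference theory for counterfactual distribution, quantile, and lower-tail risk processes under unmeasured confounding, using proximal negative-control proxies. Working with a continuum of inverse problems, the author formulates primal and dual bridge equations and identifies the exact regularity boundary for differentiability of the counterfactual cumulative distribution function. The paper also gives the canonical gradient, analyzes the phase-transition condition for root-n estimability, and proposes efficient CDF-process inference and quantile band constructions, providing new theoretical and computational tools for causal inference.

Details
English Abstract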

This paper develops semiparametric theory for counterfactual distribution, quantile, and lower-tail risk processes under unmeasured confounding using proximal negative-control proxies. Rather than treating each threshold as a separate proximal mean problem with outcome $\mathbf 1\{Y\le y\}$, we study the continuum of inverse problems indexed by $y$. For each treatment arm $a$, the counterfactual CDF $F_a(y)=P\{Y(a)\le y\}$ is represented by the primal bridge equation $T_a h_{a,y}=g_{a,y}$ and the linear functional $\ell(h)=E\{h(W,X)\}$. The dual bridge $q_a$ solves $T_a^*q_a=1$, equivalently $E[\mathbf 1(A=a)q_a(Z,X)-1\mid W,X]=0$. We show that this dual equation, together with the minimal residual-moment condition required for the influence function to lie in $L_2(P_0)$, is the exact regularity boundary in a threshold-saturated observed-data proximal bridge model: $F_a(y)$ is pathwise differentiable if and only if a regular square-integrable dual bridge exists. The canonical gradient is \[ h_{a,y}(W,X)-F_a(y)+\mathbf 1(A=a)q_a(Z,X)\{\mathbf 1(Y\le y)-h_{a,y}(W,X)\}. \] A singular-system characterization gives a Picard-type phase transition: root-$n$ regular estimation is possible exactly when $\sum_j\ell_{a,j}^2/s_{a,j}^2<\infty$ and the residual moment is finite. Outside this region, finite-dimensional efficiency bounds diverge under residual-noise nondegeneracy, and Gaussian inverse benchmarks yield slower minimax rates. We further establish efficient CDF-process inference, cross-fitted uniform doubly robust expansions, finite-rank weak-proxy rate conditions, density-free simultaneous quantile bands by inversion of CDF bands, and lower-tail CVaR inference via a shortfall representation. The estimators rely on closed-form linear algebra, convex Tikhonov regularization, and isotonic projection for shape enforcement.

2605.09256 2026-05-12 cs.LG cs.AI stat.ML

Improving Generalization by Permutation Routing Across Model Copies

Shuhei Kashiwamura, Timothee Leleu

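AI Summary This paper proposes using the $M$-cover transform to improve the generalization of machine learning models. The method replicates a model $M$ times and routes model parameters across the copies through permutations drawn from a structured mixing kernel $Q$, so that local learning information is passed between copies instead of relying on parameter averaging or an explicit attractive force. This structured message sharing improves generalization and applies to models ranging from perceptrons to multilayer perceptrons.

Details
English Abstract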

We introduce a use of the \(M\)-cover (or \(M\)-layer) transform for machine learning. The method replicates a model \(M\) times, but instead of coupling the copies through parameter averaging or an explicit attractive force, as in replicated SGD or Elastic SGD, it rewires the contexts in which local learning messages are computed. Each local loss is evaluated on a routed model whose parameters are drawn from different copies according to permutations sampled from a structured mixing kernel \(Q\). Training then uses the original local update rule, while the resulting learning messages are redistributed across the copies through these routed computational paths. Thus \(Q\) defines a topology for message transport and controls the long-loop structure of the lifted factor graph. We formulate this construction for perceptrons, committee machines, and multilayer perceptrons, showing that the same principle applies from discrete models to differentiable neural networks. The resulting framework provides a mechanism for improving generalization through structured message sharing rather than replica collapse or parameter-space coupling.

2605.07964 2026-05-12 stat.ML cs.LG

Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means

Valentin Kilian, Stefano Cortinovis, François Caron

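AI Summary This paper proposes a method for constructing confidence sequences from a Bayesian predictive model, providing time-uniform uncertainty quantification for the mean of bounded IID observations. At each step, the method selects the valid martingale update factor that maximizes predictive expected log-growth, so that prior information improves efficiency while validity is preserved. The authors prove that the method is asymptotically log-optimal when the predictive distribution is Wasserstein-consistent, and experiments show clear reductions in confidence-sequence width and sampling effort.

Comments Valentin and Stefano are joint first authors

Details
English Abstract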

Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on the choice of martingale updates, and many existing constructions do not exploit prior information about plausible data-generating distributions or mean values. We propose a Bayes-assisted framework that uses a Bayesian working predictive model to adaptively construct confidence sequences. For each candidate mean and time point, the predictive distribution selects, among valid one-step martingale factors, the update maximising predictive expected log-growth; validity is therefore preserved even when the prior or working model is misspecified. We prove that if the predictive distribution is Wasserstein-consistent, the resulting procedure is asymptotically log-optimal, matching the per-sample log-growth of an oracle procedure with access to the true distribution. We instantiate the framework using robust predictives based on Dirichlet-process mixtures and Bayesian exponentially tilted empirical likelihood. Experiments on synthetic data, sequential best-arm identification for LLM evaluation, and prediction-powered inference show that informative priors can substantially reduce confidence-sequence width and sampling effort while retaining anytime-valid coverage.
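
The basic test-martingale construction that this framework refines can be sketched with a fixed bet size. This is a minimal illustration, not the Bayes-assisted procedure: the constant bet `lam`, the grid of candidate means, and the function name are assumptions, whereas the paper chooses the update adaptively from a predictive model.

```python
def betting_confidence_sequence(xs, alpha=0.05, lam=0.5, grid_size=101):
    """Return the candidate means in [0, 1] that are not rejected after seeing xs."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    wealth_up = [1.0] * grid_size    # bets that the true mean exceeds m
    wealth_down = [1.0] * grid_size  # bets that the true mean is below m
    alive = [True] * grid_size
    for x in xs:
        for i, m in enumerate(grid):
            # With 0 <= lam <= 1 both factors are nonnegative for x, m in [0, 1]
            # and have conditional expectation 1 when the true mean equals m.
            wealth_up[i] *= 1.0 + lam * (x - m)
            wealth_down[i] *= 1.0 - lam * (x - m)
            # Ville's inequality: reject m once the averaged wealth reaches 1 / alpha.
            if 0.5 * (wealth_up[i] + wealth_down[i]) >= 1.0 / alpha:
                alive[i] = False
    return [m for m, keep in zip(grid, alive) if keep]


# Deterministic toy data: every observation equals 0.5.
kept = betting_confidence_sequence([0.5] * 200)
```

A candidate mean is dropped permanently once its wealth crosses the threshold at any time, which is what makes the resulting set valid at every sample size simultaneously; a better-chosen bet grows wealth faster for wrong candidates and so shrinks the set sooner.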

2605.05743 2026-05-12 stat.ML cs.AI cs.LG

Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring, TRFF Scoring, and FFCI Testing in Mixed Data

Joseph D. Ramsey

AI总结 该论文提出三种基于傅里叶特征的实用方法,用于解决非线性因果发现中的大规模计算问题。FFML 评分通过有限维特征表示近似高斯过程边缘似然,降低了计算复杂度并支持混合数据;TRFF 评分采用带惩罚的Student-t回归,具有更强的鲁棒性和更快的运行速度;FFCI 检验则是一种适用于混合数据的快速非参数条件独立性检验方法。这些方法在不同数据场景下表现出互补的优势,提升了因果发现的准确性和效率。

Comments 18 pages, 2 figures, 3 tables

Abstract

Gaussian process (GP) marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present three complementary random Fourier feature (RFF)-based methods forming a practical toolkit for score-based, constraint-based, and hybrid causal discovery. The Fourier Feature Marginal Likelihood (FFML) score approximates the exact GP marginal likelihood by replacing the $n \times n$ kernel Gram matrix with a finite-dimensional feature representation, reducing cost to $O(nm^2 + m^3)$ while retaining the probabilistic interpretation and automatic complexity penalty of the exact score. FFML extends to mixed (continuous and discrete) parent sets via a product-kernel construction, with a Kronecker path for small discrete parent sets and a Hadamard-product path otherwise. The Tetrad Random Fourier Feature (TRFF) score is a complementary BIC-style alternative using penalized Student-t regression with random Fourier features. TRFF offers robustness to heavy-tailed noise and faster runtime than FFML. Empirically, TRFF and FFML exhibit a complementary precision-recall profile: TRFF achieves higher precision while FFML achieves better recall and lower SHD overall. The Fourier Feature Conditional Independence (FFCI) test is a fast nonparametric CI test for mixed data, using ridge residualization in feature space and a Frobenius-norm cross-covariance statistic approximated as a weighted sum of chi-squared variables. Empirically, BOSS+FFML achieves the lowest SHD on nonlinear data, while BOSS+TRFF offers the highest precision. When run through PC-Max, FFCI and RCIT exhibit complementary precision-recall profiles: RCIT is more precise while FFCI achieves better recall and substantially lower SHD, at approximately twice the runtime.
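
The feature-space replacement of the Gram matrix mentioned in this abstract rests on the standard random Fourier feature approximation of a shift-invariant kernel. The sketch below illustrates that approximation for a Gaussian (RBF) kernel only; it is not the FFML, TRFF, or FFCI implementation, and the names `make_rff`, `rbf_kernel`, and the chosen lengthscale and inputs are assumptions made for illustration.

```python
import math
import random


def make_rff(dim, num_features, lengthscale, rng):
    """Draw frequencies and phases for a Gaussian (RBF) kernel feature map."""
    # Frequencies follow the kernel's spectral density: N(0, 1 / lengthscale^2).
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return features


def rbf_kernel(x, y, lengthscale):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


rng = random.Random(0)
features = make_rff(dim=2, num_features=4000, lengthscale=1.0, rng=rng)
x, y = [0.3, -0.2], [0.8, 0.5]
approx = sum(a * b for a, b in zip(features(x), features(y)))
exact = rbf_kernel(x, y, 1.0)
print(approx, exact)
```

Stacking the feature vectors of $n$ points into an $n \times m$ matrix $Z$ gives $K \approx Z Z^\top$, so downstream linear algebra can work with the $m \times m$ matrix $Z^\top Z$, which is consistent with the $O(nm^2 + m^3)$ cost quoted in the abstract.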

2605.04915 2026-05-12 quant-ph cs.IT math.IT math.ST stat.TH

Optimal Error Exponents for Composite Sequential Quantum Hypothesis Testing

Jacob Paul Simpson, Efstratios Palias, Sharu Theresa Jose

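AI summary This paper studies composite sequential quantum hypothesis testing, where the goal is to distinguish a null quantum state from a set of alternative quantum states. The authors propose a mixture-sequential quantum probability ratio test that adaptively selects measurements based on the current mixture estimate of the alternative set and stops when the mixture log-likelihood ratio first crosses a threshold. Under an expected sample size constraint, the method simultaneously achieves the optimal Type-I and (worst-case) Type-II error exponents, characterized by the minimal measured relative entropies between the null state and the alternative set, and a matching converse is established. The results also show that achieving vanishing error probabilities in composite sequential quantum hypothesis testing requires an expected sample complexity at least as large as that of sequential testing between two fixed states.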

Comments Under Review

Abstract

We study the composite sequential quantum hypothesis testing (SQHT) problem, where the objective is to distinguish a null quantum state from a set of alternative quantum states. We propose a mixture-sequential quantum probability ratio test that adaptively selects measurements based on the current mixture estimate of the alternative set, and stops upon the first threshold crossing of the mixture log-likelihood ratio. Under an expected sample size constraint, we show that our proposed strategy simultaneously achieves the Type-I and (worst-case) Type-II error exponents, characterized by the minimal measured relative entropies between the null state and the alternative set. We further establish a matching converse, thereby characterizing the optimal error exponent region. Finally, our results show that achieving vanishing error probabilities in composite SQHT requires an expected sample complexity at least as large as that of sequential testing between two fixed states.

2605.04589 2026-05-12 stat.ML cs.LG math.ST stat.TH

Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points

Haruka Ezoe, Ryohei Hisano

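AI summary This paper studies geometric representations of how dynamic networks evolve over time and proposes Multiscale Euclidean Network Trajectories (MENT), a framework based on second-moment geometry. An isotropic normalization reduces the general linear-transformation ambiguity in node embeddings to orthogonal transformations, preserving geometric structure and supporting analysis of temporal change at both the trajectory and node levels. The method supports mode-wise decomposition, attribution of changes, and change point detection, and shows good structure recovery and change point detection performance in experiments on synthetic and real dynamic networks.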

Abstract

A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity preserves edge probabilities, it can distort geometry and invalidate distance-based temporal comparisons at both the trajectory and node levels. We develop Multiscale Euclidean Network Trajectories (MENT), a framework for multiscale temporal trajectories based on second-moment geometry. By imposing an isotropic normalization on the anchor latent positions, we reduce the relevant ambiguity to orthogonal transformations and prevent distortion of the second-moment geometry. In this canonical representation, we define a trace variation distance and mode-wise variation distances along orthogonal directions, and use multidimensional scaling to obtain low-dimensional trajectories of time points at both global and mode-wise levels. The resulting trajectories support interpretation and inference. They admit mode-wise decompositions, support attribution of global and mode-wise temporal changes to nodes, and enable change point detection through 1D trajectories. We prove consistency of the proposed unfolded spectral embedding and of the induced temporal trajectories. Experiments on two synthetic and two real dynamic networks illustrate stable and interpretable recovery of temporal structure and show strong performance against existing change point detection baselines.

2605.02326 2026-05-12 stat.AP q-fin.PM

Large-Scale Asset Selection via Metric Dependence with Enriched High Frequency Information

Yangzhou Chen, Shuaida He, Xin Chen

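AI summary This paper studies how to use high-frequency data for large-scale asset selection to improve portfolio performance. The authors propose Metric Dependence Screening (MDS), which represents each asset-day observation as a point-curve object combining the daily return with an intraday risk-state curve, equipped with a weighted product metric that preserves both return information and intraday risk dynamics. MDS ranks assets by a Fréchet-variation-based dependence score to screen the investable universe, after which standard mean-variance or minimum-variance allocation is applied. The empirical study shows that, by preserving intraday risk dynamics, MDS improves out-of-sample portfolio performance.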

Abstract

Large-scale portfolio choice is highly sensitive to estimation error, making preliminary asset selection essential in empirical implementation. Existing selection rules typically rely on scalar returns or low-dimensional high-frequency summaries, and thus discard intraday risk dynamics that may be relevant for risk-adjusted allocation. We propose Metric Dependence Screening (MDS), an asset selection procedure that incorporates high-frequency information as object-valued data. Each asset-day observation is represented as a point-curve object combining the daily return with an intraday risk-state curve, equipped with a weighted product metric that preserves both reward information and within-day risk dynamics. MDS ranks assets by a Fréchet-variation-based dependence score, measuring how much a risk-adjusted target explains the metric dispersion of the asset representations. This yields a simple two-stage portfolio procedure: MDS first reduces the investable universe, and standard mean-variance or minimum-variance allocation is then applied. We develop a target-slicing estimator and establish concentration, sure selection, and rank consistency guarantees under $\alpha$-mixing time series dependence and ultrahigh dimensionality. Simulations show that MDS performs well across both Euclidean and non-Euclidean settings. Using high-frequency data for 2,938 Chinese A-share stocks from July 2023 to December 2025, we demonstrate that MDS improves out-of-sample portfolio performance over return-based and scalar-dependence-based benchmarks, highlighting the value of preserving intraday risk dynamics.

2604.27569 2026-05-12 stat.ME

Robust Nonparametric Testing Approaches for Spatial Regression

Kanghyun Wi, Hyoeun Kim, Tomáš Mrkvička, Jorge Mateu, Jaewoo Park

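AI summary To address reliable inference in spatial regression models, this paper proposes a robust nonparametric Monte Carlo testing framework based on random shifts, avoiding the strict assumptions that traditional parametric methods place on the spatial dependence structure, mean trend, and error distribution. The method builds statistics measuring the dependence between residuals and the covariate of interest to assess the covariate's significance; it applies across various models and requires neither parametric assumptions nor a closed-form distribution of the test statistic. The authors also prove that the test is asymptotically exact when the sample covariance is used as the test statistic, and numerical experiments confirm that it maintains the nominal significance level while achieving good power.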

Abstract

Reliable inference for spatial regression remains challenging because it requires the correct specification of the spatial dependence structure, the mean trend, and the error distribution. Existing parametric testing methods rely on restrictive assumptions that are difficult to verify in practice and can lead to inaccurate conclusions under misspecification. To address this, we develop a robust nonparametric Monte Carlo testing framework for spatial regression based on random shifts. We construct test statistics that measure the dependence between residuals, obtained after removing the effects of nuisance covariates, and the covariate of interest. This allows us to assess the significance of the covariate in the sense of partial correlation. The proposed framework enables robust inference across various models without requiring parametric assumptions or even a closed-form distribution of the test statistics. Furthermore, we establish the asymptotic exactness of the random shift test in the increasing-domain setting when the sample covariance is used as the test statistic. Through extensive numerical experiments, we demonstrate that our method maintains the nominal significance level while achieving competitive power, whereas parametric methods can exhibit inflated type I error rates, even when they are correctly specified.
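
The Monte Carlo logic of a random shift test can be illustrated on a one-dimensional sequence. The sketch below is a simplified illustration, not the authors' procedure: it assumes the residuals have already been computed, uses circular (torus) shifts along a single axis rather than shifts of a spatial domain, and takes the absolute sample covariance as the test statistic. The function names and the toy series are hypothetical.

```python
import math
import random


def covariance(u, v):
    """Sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / n


def random_shift_pvalue(residuals, covariate, num_shifts=199, seed=0):
    """Monte Carlo p-value from circular shifts of the covariate.

    Shifting keeps each series' own dependence structure intact while
    breaking the alignment between residuals and covariate.
    """
    rng = random.Random(seed)
    n = len(covariate)
    observed = abs(covariance(residuals, covariate))
    exceed = 0
    for _ in range(num_shifts):
        lag = rng.randrange(1, n)  # non-trivial circular shift
        shifted = covariate[lag:] + covariate[:lag]
        if abs(covariance(residuals, shifted)) >= observed:
            exceed += 1
    return (1 + exceed) / (num_shifts + 1)


series = [math.sin(0.37 * i) + 0.01 * i for i in range(60)]
print(random_shift_pvalue(series, series))
```

In the toy call the two inputs are identical, so no shifted copy can match the observed covariance and the p-value takes its smallest attainable value, 1 / (num_shifts + 1). Wrap-around at the boundary is one reason the one-dimensional torus shift is only an approximation to an exact test in practice.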

2604.23904 2026-05-12 stat.ME cs.AI stat.ML

Generative Synthetic Data for Causal Inference: Pitfalls, Remedies, and Opportunities

Yichen Xu

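AI summary This paper examines the validity of generative synthetic data for causal inference, showing that conventional generative models can perform well on prediction yet distort average treatment effect (ATE) estimates. It analyzes the structural tension in generative models between preserving the covariate distribution and accurately capturing treatment effects, and proposes a hybrid generative framework that separates covariate generation from modeling of the treatment and outcome mechanisms to improve the accuracy of causal inference. Experiments show that the approach improves causal fidelity over fully generative models across multiple settings.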

Abstract

Synthetic tabular data are often evaluated by distributional similarity, privacy distance, or train-on-synthetic-test-on-real predictive performance, but these criteria do not ensure validity for causal inference. We show that fully generative tabular synthesizers, including GAN- and LLM-based models, can preserve predictive utility while distorting average treatment effect (ATE) estimates. The failure is structural: ATE preservation requires both a realistic covariate law and an accurate treatment-effect contrast, whereas prediction loss penalizes treatment-effect error only through an overlap-weighted term. We formalize this mismatch through sensitivity and loss-decomposition results, and identify an analogous decomposition in block-level next-token prediction under log loss. Motivated by the tabular causal analysis, we propose a hybrid synthetic-data framework that generates covariates while modeling treatment and outcome mechanisms separately, allowing causal-purpose treatment assignment such as randomized synthetic assignment. We evaluate this framework in three settings: ATE preservation under fully generative versus hybrid synthesis, targeted augmentation for practical positivity problems, and synthetic simulation engines for comparing OR, IPW, AIPW, and TMLE before real-data analysis. Across synthetic and ACTG experiments, hybrid synthesis improves causal fidelity relative to fully generative baselines; LLM-based hybrid synthesis is often more faithful than CTGAN for ATE preservation and finite-sample estimator benchmarking.

2604.20172 2026-05-12 cs.LG math.ST stat.ML stat.TH

Cover meets Robbins while Betting on Bounded Data: $\ln n$ Regret and Almost Sure $\ln\ln n$ Regret

Shubhada Agrawal, Aaditya Ramdas

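AI summary This paper studies the design of strategies for betting on bounded data sequences that handle both stochastic and adversarial data. It proposes a mixture betting strategy combining ideas from Robbins and Cover that achieves $O(\ln \ln n)$ regret on almost all paths while retaining $O(\ln n)$ regret on the remaining measure-zero set of paths. The work appears to be the first to show that hedging between two strategies yields adaptivity to stochastic data together with protection against adversarial data.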

Comments Improved a regret bound. New regret bound for a classical mixture

Abstract

Consider betting against a sequence of data in $[0,1]$, where one is allowed to make any bet that is fair if the data have a conditional mean $m_0 \in (0,1)$. Cover's universal portfolio algorithm delivers a worst-case regret of $O(\ln n)$ compared to the best constant bet in hindsight, and this bound is unimprovable against adversarially generated data. In this work, we present a novel mixture betting strategy that combines insights from Robbins and Cover, and exhibits a different behavior: it eventually produces a regret of $O(\ln \ln n)$ on almost all paths (a measure-one set of paths if each conditional mean equals $m_0$ and intrinsic variance increases to $\infty$), but has an $O(\ln n)$ regret on the complement (a measure-zero set of paths). Our paper appears to be the first to point out the value in hedging two very different strategies to achieve a best-of-both-worlds adaptivity to stochastic data and protection against adversarial data. We contrast our results to those in Agrawal and Ramdas [2026] for a sub-Gaussian mixture on unbounded data: their worst-case regret has to be unbounded, but a similar hedging delivers both an optimal betting growth-rate and an almost sure $\ln\ln n$ regret on stochastic data. Finally, our strategy witnesses a sharp game-theoretic upper law of the iterated logarithm, analogous to Shafer and Vovk [2005].

2604.17676 2026-05-12 stat.ME econ.EM math.ST stat.TH

Subsample-Based Estimation under Dynamic Contamination

Yukai Yang, Rickard Sandberg

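AI summary This paper studies a structural failure of subsample-based estimation under data contamination in dynamic time series models. Even when contamination locations are known, removing the contaminated observations does not recover the uncontaminated objective function, because contamination propagates through the residual filter and distorts the estimation criterion, making subsample-based estimators inconsistent for the clean-data parameter. The authors propose a propagation-compatible transformation based on a patch removal operator that restores consistency under contamination without affecting estimation under the uncontaminated model; it applies to a broad class of residual-based estimators and does not require modelling the contamination process.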

Comments 42 pages, 2 figures, 6 tables, 1 algorithm. Code available at https://github.com/yukai-yang/Robust_Experiments

Abstract

This paper studies a structural failure of subsample-based estimation in dynamic time series models. Even under oracle knowledge of contamination locations, removing contaminated observations does not restore the uncontaminated objective. In such settings, contamination propagates through the residual filter and distorts the estimation criterion. As a result, subsample-based estimators are generically inconsistent for the clean-data parameter. We characterise this failure as a structural incompatibility between pointwise subsampling and residual propagation. More generally, the failure arises whenever contamination propagates through transformations that enter the estimation criterion, with dynamic time series models as a leading example. To address it, we propose a propagation-compatible transformation of index sets via a patch removal operator. Under general high-level conditions, this transformation leaves the estimator asymptotically unchanged under the uncontaminated model while restoring consistency under contamination. The results apply to a broad class of residual-based estimators and do not rely on modelling the contamination process.

2604.08789 2026-05-12 eess.SY cs.SY stat.AP

Quantifying the resilience benefits of undergrounding a circuit with utility data

Arslan Ahmad, Ian Dobson, Anne Kimber

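AI summary Using historical outage data, this paper quantifies the resilience benefits of converting an overhead circuit to an underground one. By comparing the circuit's historical performance with how it would have performed had it been undergrounded, the study analyzes the number of outages, customers affected, outage duration, and customer hours lost; results show that customer hours lost per year fall by 75% and 78% for the two selected circuits. The additional benefits of 10% faster outage restoration are also evaluated.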

Abstract

We leverage historical outage data to quantify the resilience benefits of undergrounding a circuit. The historical performance of the overhead circuit is compared to the performance if the circuit had been undergrounded in the past. The number of outages, customers affected, outage duration, and customer hours lost are used as metrics to quantify the benefits of undergrounding. Results show 75% and 78% reductions in customer hours lost per year for two selected circuits, as well as a significant reduction in the average number of outages and customers affected per year, highlighting the advantages of undergrounding. The benefits of investments that result in 10% faster outage restoration are also calculated by rerunning history with the faster restoration included.

2604.03928 2026-05-12 cs.LG cs.AI cs.CV stat.ML

Supervised Dimensionality Reduction Revisited: Why LDA on Frozen CNN Features Deserves a Second Look

Indar Kumar, Girish Karhana, Sai Krishna Jasti, Ankit Hemant Lade

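AI summary This paper revisits the effectiveness of supervised dimensionality reduction, in particular Linear Discriminant Analysis (LDA), applied to features from frozen pretrained convolutional neural networks. Comparing several dimensionality-reduction strategies across multiple vision tasks, the study finds that LDA improves classification accuracy and greatly reduces feature dimensionality on coarse-grained tasks but performs worse on fine-grained tasks. The experiments indicate that LDA works well when between-class structure is pronounced but can be counterproductive for tasks requiring subtle distinctions, giving practical guidance for using dimensionality reduction in frozen-feature classification pipelines.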

Comments 11 pages, 5 figures, 5 tables. Code available at https://github.com/IndarKarhana/lda-image-classification

Abstract

Frozen pretrained image representations are widely used for transfer learning: a backbone is kept fixed, feature vectors are extracted, and a lightweight classifier is trained on top. This pipeline usually feeds the full feature vector to the classifier, even when the target task has far fewer classes than the pretraining task. We revisit a classical alternative: supervised dimensionality reduction with Linear Discriminant Analysis (LDA) before linear probing. We evaluate ten dimensionality-reduction strategies on frozen features from six backbones -- ResNet-18, ResNet-50, MobileNetV3-Small, EfficientNet-B0, ViT-B/16, and DINOv2-ViT-S/14 -- across CIFAR-100, Tiny ImageNet, and CUB-200-2011. Under a fixed logistic-regression protocol, LDA improves accuracy over full features in 11 of 12 coarse-grained configurations, with gains up to 4.5 percentage points while reducing feature dimensionality by 48-87%. The same projection consistently hurts on fine-grained CUB-200, where full features win across all six backbones. This establishes a practical boundary condition: LDA is useful when class-level structure is coarse enough to be captured by mean-separating directions, but it can discard subtle cues needed for fine-grained recognition. We also compare LDA with PCA, PCA+LDA, regularized LDA, Local Fisher Discriminant Analysis, Neighbourhood Components Analysis, and three lightweight LDA extensions. The results show that plain LDA offers the best accuracy-cost tradeoff for most coarse-grained settings, while more complex supervised reduction methods rarely justify their additional cost. Overall, the study provides concrete guidance for when post-hoc supervised projection should, and should not, be inserted into frozen-feature image classification pipelines.

2604.03883 2026-05-12 cs.LG cs.AI cs.SY eess.SY stat.ML

Regime-Calibrated Fleet Repositioning with a Spatial Queue-Regret Decomposition

Indar Kumar, Akanksha Tiwari

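AI summary This paper studies how ride-hailing and autonomous mobility-on-demand operators reposition idle supply before future demand is fully observed, and proposes a predict-then-optimize approach calibrated on historical demand regimes. The core components are a trained similarity gate that reduces demand error, pickup spatial mismatch, and queue shortage risk, and a spatial queue-regret decomposition that uses a stable queueing surrogate to analyze how demand-field error affects wait times. Experiments across multiple New York City scenarios show that the method lowers mean wait time relative to hand-tuned similarity and a distributional baseline.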

Comments 13 pages, 4 figures, 8 tables. Code: https://github.com/IndarKarhana/regime-calibrated-dispatch

Abstract

Ride-hailing and autonomous mobility-on-demand operators reposition idle supply before future demand is fully observed. We study a retrieval-calibrated predict-then-optimize approach for this problem: historical demand regimes are matched to the current query block, combined into a calibrated demand prior, and passed to a fleet-balancing controller. The paper makes three contributions. First, we train a leakage-safe similarity gate whose objective penalizes demand error, pickup spatial mismatch, and queue shortage risk rather than retrieval rank alone. Second, we develop a spatial queue-regret decomposition for a stable queueing surrogate, linking demand-field error to wait through queueing sensitivity, allocator sensitivity, and Wasserstein pickup mismatch. Third, we evaluate learned retrieval and external-style rebalancing baselines in a common simulator. In the calibrated-demand gate experiment, across eight New York City scenarios and ten seeds, the spatial gate reduces mean wait to 82.3s, compared with 85.3s for hand-tuned similarity and 85.8s for a distributional-only baseline. In a separate replay-demand controller comparison, a scenario chance-MPC analog and a share-target transportation LP improve on Wen-style rebalancing (92.2s/92.2s vs. 100.1s), a reduced GPR chance-MPC comparator is intermediate at 94.4s, and an oracle MPC diagnostic is 91.3s.

2603.28254 2026-05-12 cs.LG stat.ML

MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

Da Chang, Qiankun Shi, Lvgang Zhang, Yu Li, Ruijie Zhang, Yao Lu, Yongxiang Liu, Ganzhao Yuan

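AI summary This paper proposes MuonEq, a lightweight pre-orthogonalization equilibration method that improves orthogonalized-update strategies for optimizing matrix parameters. It normalizes the rows or columns of the momentum matrix before orthogonalization, improving the geometry seen by the orthogonalization step and thereby training. Experiments show that MuonEq outperforms the original method (Muon) in pretraining large language models of several sizes, with faster convergence and lower validation perplexity.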

Abstract

Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions typically either rescale updates after orthogonalization or use heavier whitening-based preconditioners before it. We introduce MuonEq, a lightweight family of pre-orthogonalization equilibration schemes for Muon with three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). By rebalancing the momentum matrix before finite-step Newton-Schulz orthogonalization, MuonEq improves the geometry seen by orthogonalization. We show that finite-step orthogonalization is governed by the input spectrum, especially stable rank and condition number, and that row/column normalization acts as a zeroth-order surrogate for whitening. For hidden matrix weights, R is the default variant. Theoretically, MuonEq (R) retains the standard $\widetilde{\mathcal O}(T^{-1/4})$ Muon-type nonconvex stationarity guarantee with decoupled weight decay and a horizon-free diminishing learning-rate schedule, and extends it to finite-step NS5 up to an explicit inexactness constant. In LLaMA2 pretraining on C4, MuonEq (R) consistently outperforms Muon on 130M, 350M, and 1B models, with faster convergence and lower validation perplexity. The code is available in the MuonEq codebase at https://github.com/MaeChd/muon-eq.
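
The claim that finite-step orthogonalization depends on the input spectrum, and that row normalization helps, can be seen on a tiny example. The sketch below is an illustration under stated assumptions rather than the MuonEq implementation: it uses the classical cubic Newton-Schulz iteration instead of the quintic NS5 variant named in the abstract, pure-Python lists instead of tensors, and a hypothetical 2 x 3 momentum matrix with badly scaled rows.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def row_normalize(m, eps=1e-12):
    """Scale every row to (approximately) unit Euclidean norm."""
    out = []
    for row in m:
        norm = math.sqrt(sum(v * v for v in row)) + eps
        out.append([v / norm for v in row])
    return out


def newton_schulz(m, steps=5):
    """Cubic Newton-Schulz: X <- 1.5 X - 0.5 X X^T X after Frobenius scaling."""
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxtx)]
    return x


def gram(x):
    """X X^T, which equals the identity when the rows are orthonormal."""
    return matmul(x, transpose(x))


momentum = [[3.0, 0.0, 0.0], [0.0, 0.1, 0.0]]  # hypothetical, ill-conditioned
raw = gram(newton_schulz(momentum))
balanced = gram(newton_schulz(row_normalize(momentum)))
print(raw[1][1], balanced[1][1])
```

With five steps, the unnormalized input leaves the small singular direction far from unit scale, while the row-normalized input is orthogonalized almost exactly. The example is deliberately extreme; on real momentum matrices the effect of equilibration is a matter of degree.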

2603.02678 2026-05-12 cs.LG cs.ET cs.HC stat.ME stat.ML

Causal Discovery Should Embrace the Wisdom of the Crowd

Ryan Feng Lin, Yuantao Wei, Huiling Liao, Xiaoning Qian, Shuai Huang

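AI summary This paper argues for a new paradigm of causal learning based on the "wisdom of the crowd", in which distributed and potentially noisy causal knowledge from many contributors is integrated to construct a global causal structure. The paradigm draws on crowdsourcing platforms, expert knowledge elicitation and aggregation techniques, and large language model-augmented information acquisition, and the authors outline a framework for crowd-based causal learning spanning elicitation, modeling, aggregation, and optimization. The approach opens a new research direction for causal learning and brings both opportunities and challenges for interdisciplinary collaboration.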

Abstract

This paper argues for recognizing an emerging paradigm of causal learning by wisdom of the crowd. Recent developments in government, industry, and research point to the rise of decentralized and crowd-based approaches within causal modeling, where causal knowledge distributed across many contributors can be systematically elicited and integrated with causal learning workflows. In this paradigm, causal learning becomes a distributed decision-making problem: each participant contributes partial and potentially noisy knowledge, while collective contributions help construct a global causal structure. This direction is enabled by advances in crowdsourcing platforms, expert knowledge elicitation, aggregation techniques, and large language model (LLM)-augmented information acquisition. Its promise is increasingly visible in early research and emerging real-world practices. Building on this momentum, we outline a framework for crowd-based causal learning spanning elicitation, modeling, aggregation, and optimization. We further discuss the opportunities and challenges introduced by this paradigm and call for interdisciplinary collaboration across causal learning, collective intelligence, human-AI interaction, and decision science.

2602.05946 2026-05-12 cs.LG stat.ML

f-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment

Rajdeep Haldar, Lantao Mei, Guang Lin, Yue Xing, Qifan Song

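AI summary This paper studies how divergence-based reinforcement learning algorithms can achieve general alignment of large language models, including settings such as reinforcement learning with verifiable rewards (RLVR). The authors propose $f$-GRPO, for on-policy reward optimization, and $f$-HAL, a hybrid alignment loss combining on-policy optimization with preference supervision, show that both estimate $f$-divergences between reward-aligned and reward-unaligned distributions, and demonstrate advantages on math-reasoning tasks and safety alignment.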

Abstract

Recent work shows that preference alignment objectives can be interpreted as divergence estimators between aligned (preferred) and unaligned (less-preferred) distributions, yielding a principled recipe for designing alignment losses. However, this view has so far been limited to preference-based supervision. We extend it to general LLM alignment, including reinforcement learning with verifiable rewards (RLVR), where alignment feedback is given only as scalar rewards. We introduce $f$-Group Relative Policy Optimization ($f$-GRPO), a class of on-policy RL objectives, and $f$-Hybrid Alignment Loss ($f$-HAL), which combines on-policy reward optimization with off-policy preference supervision. We show that these objectives estimate $f$-divergences between reward-aligned and reward-unaligned distributions induced by above- and below-average reward responses, and prove expected reward improvement after alignment. Empirically, $f$-GRPO improves over GRPO on math-reasoning RLVR tasks, while hybrid $f$-HAL mitigates reward hacking in on-policy safety alignment when verifiable rewards are unavailable and learned reward models must be used.

2601.23252 2026-05-12 stat.CO cs.LG stat.ML

Nested Slice Sampling: Vectorized Nested Sampling for GPU-Accelerated Inference

David Yallup, Namu Kroupa, Will Handley

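AI summary This paper proposes Nested Slice Sampling (NSS), an algorithm that improves the scalability and computational efficiency of nested sampling on GPUs. By using Hit-and-Run Slice Sampling, it vectorizes the constrained updates, and it gives a simple, near-optimal rule for setting the slice width, improving high-dimensional performance and making parallel computation more predictable. Experiments on challenging model comparison, high-dimensional Bayesian inference, and Gaussian process hyperparameter marginalization show that NSS maintains accurate evidence estimates and high-quality posterior samples, and is more robust than existing methods on multimodal problems in particular.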

Comments 58 pages, 13 figures, Accepted to Transactions on Machine Learning Research

Abstract

Model comparison and calibrated uncertainty quantification often require integrating over parameters, but scalable inference can be challenging for complex, multimodal targets. Nested Sampling is a robust alternative to standard MCMC, yet its typically sequential structure and hard constraints make efficient accelerator implementations difficult. This paper introduces Nested Slice Sampling (NSS), a GPU-friendly, vectorized formulation of Nested Sampling that uses Hit-and-Run Slice Sampling for constrained updates. A tuning analysis yields a simple near-optimal rule for setting the slice width, improving high-dimensional behavior and making per-step compute more predictable for parallel execution. Experiments on challenging synthetic targets, high dimensional Bayesian inference, and Gaussian process hyperparameter marginalization show that NSS maintains accurate evidence estimates and high-quality posterior samples, and is particularly robust on difficult multimodal problems where current state-of-the-art methods such as tempered SMC baselines can struggle. An open-source implementation is released to facilitate adoption and reproducibility.

2601.22320 2026-05-12 cs.LG stat.ML

Matrix Factorization for Practical Continual Mean Estimation Under User-Level Differential Privacy

Nikita P. Kalinin, Ali Najar, Valentin Roth, Christoph H. Lampert

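AI summary This paper studies continual mean estimation under user-level differential privacy: maintaining accurate estimates of the running mean as data vectors arrive sequentially. Working in the approximate differential privacy framework and building on the matrix factorization mechanism, the authors propose a factorization designed specifically for mean estimation that, while preserving privacy, lowers the mean-squared error of the estimates, improving accuracy and efficiency in practice.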

Abstract

We study continual mean estimation, where data vectors arrive sequentially and the goal is to maintain accurate estimates of the running mean. We address this problem under user-level differential privacy, which protects each user's entire dataset even when they contribute multiple data points. Previous work on this problem has focused on pure differential privacy. While important, this approach limits applicability, as it leads to overly noisy estimates. In contrast, we analyze the problem under approximate differential privacy, adopting recent advances in the Matrix Factorization mechanism. We introduce a novel mean estimation specific factorization, which is both efficient and accurate, achieving asymptotically lower mean-squared error bounds in continual mean estimation under user-level differential privacy.

2512.04366 2026-05-12 stat.ME stat.AP

Sequential Randomization Tests Using e-values: Applications for trial monitoring

Fernando G Zampieri

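AI summary This paper presents a family of nonparametric sequential randomization tests based on e-values (e-RT) for sequential monitoring of randomized trials with binary, event-only, and continuous endpoints. The tests derive their validity from the randomization mechanism and monitor treatment effects continuously by constructing test martingales, without relying on parametric assumptions or asymptotic approximations. The study also demonstrates the methods' calibration and power and discusses the principled asymmetry in betting strategies across outcome types, offering a conservative, assumption-light complement to model-based sequential analyses.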

Abstract

Sequential monitoring of randomized trials traditionally relies on parametric assumptions or asymptotic approximations. We discuss a family of nonparametric sequential tests - collectively called e-RT - for binary, event-only, and continuous endpoints. All active variants derive validity from the randomization mechanism. Using a betting framework, each test constructs a test martingale by sequentially wagering on randomized assignments or observed event labels before using the current label in the wealth update. Under the null hypothesis of no treatment effect, the expected wealth cannot grow, guaranteeing anytime-valid Type I error control regardless of stopping rule. The default e-RT posture is effect-size agnostic: monitoring can begin without specifying a hypothesized treatment effect. Alternatively, fixed design-calibrated wagers, including growth-rate-optimal (GROW) wagers, may be used as optional efficiency tools when a clinically meaningful design alternative is credible. We present simulation studies demonstrating calibration and power, and discuss the principled asymmetry in betting strategies across outcome types. These methods provide a conservative, assumption-light complement to model-based sequential analyses.

2512.00175 2026-05-12 stat.ME cs.LG stat.ML

Comparing Two Proxy Methods for Causal Identification

Helen Guo, Elizabeth L. Ogburn, Ilya Shpitser

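AI summary This paper compares two proxy-variable methods for causal identification: bridge equation methods and array decomposition methods. The former recover causal targets by solving integral equations, while the latter recover latent factors through eigendecomposition tasks in order to identify counterfactual quantities. The study analyzes the model restrictions and assumptions of the two approaches and clarifies each one's scope of applicability, giving theoretical guidance for identifying causal effects.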

Comments 10 pages; 5 figures

Abstract

Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We contrast two major approaches in this framework: (1) bridge equation methods, which leverage solutions to integral equations to recover causal targets, and (2) array decomposition methods, which recover latent factors used to identify counterfactual quantities via eigendecomposition tasks. We compare the model restrictions underlying these two approaches and provide insight into implications of the underlying assumptions, clarifying the scope of applicability for each method.

2510.22351 2026-05-12 math.ST stat.ME stat.ML stat.TH

Design Stability in Adaptive Experiments: Implications for Treatment Effect Estimation

Saikat Sengupta, Koulik Khamaru, Suvrojit Ghosh, Tirthankar Dasgupta

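AI summary This paper studies estimation of the average treatment effect (ATE) under sequentially adaptive treatment assignment mechanisms, where, unlike in classical completely randomized designs, assignment probabilities may depend on prior assignments and observed outcomes. It analyzes two natural ATE estimators, the inverse propensity weighted (IPW) estimator and the augmented IPW (AIPW) estimator, and introduces design stability as the central concept underpinning their asymptotic normality. The paper establishes central limit theorems for both estimators with explicit asymptotic variances and proposes variance estimators that enable asymptotically valid confidence intervals; the theory is illustrated with Wei's adaptive coin design and Efron's biased coin design.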

Abstract

We study the problem of estimating the average treatment effect (ATE) under sequentially adaptive treatment assignment mechanisms. In contrast to classical completely randomized designs, we consider a setting in which the probability of assigning treatment to each experimental unit may depend on prior assignments and observed outcomes. Within the potential outcomes framework, we propose and analyze two natural estimators for the ATE: the inverse propensity weighted (IPW) estimator and an augmented IPW (AIPW) estimator. The cornerstone of our analysis is the concept of design stability, which requires that as the number of units grows, either the assignment probabilities converge, or sample averages of the inverse propensity scores and of the inverse complement propensity scores converge in probability to fixed, non-random limits. Our main results establish central limit theorems for both the IPW and AIPW estimators under design stability and provide explicit expressions for their asymptotic variances. We further propose estimators for these variances, enabling the construction of asymptotically valid confidence intervals. Finally, we illustrate our theoretical results in the context of Wei's adaptive coin design and Efron's biased coin design, highlighting the applicability of the proposed methods to sequential experimentation with adaptive randomization.
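
The two estimators named in this abstract have standard textbook forms, sketched below. This is a minimal illustration, not the authors' code: it assumes the assignment probabilities are known to the experimenter (as they would be in a designed adaptive experiment), it takes the outcome-model predictions `mu1` and `mu0` as given inputs, and it does not include the variance estimators or any stability check. The toy data are made up.

```python
def ipw_ate(treated, outcomes, propensities):
    """Inverse propensity weighted estimate of the average treatment effect."""
    n = len(outcomes)
    total = 0.0
    for a, y, p in zip(treated, outcomes, propensities):
        # Treated units are weighted by 1/p, control units by 1/(1 - p).
        total += a * y / p - (1 - a) * y / (1.0 - p)
    return total / n


def aipw_ate(treated, outcomes, propensities, mu1, mu0):
    """Augmented IPW: outcome-model contrast plus weighted residual corrections."""
    n = len(outcomes)
    total = 0.0
    for a, y, p, m1, m0 in zip(treated, outcomes, propensities, mu1, mu0):
        total += (m1 - m0
                  + a * (y - m1) / p
                  - (1 - a) * (y - m0) / (1.0 - p))
    return total / n


# Hypothetical toy data: four units, assignment probability 0.5 throughout.
treated = [1, 0, 1, 0]
outcomes = [3.0, 1.0, 5.0, 1.0]
propensities = [0.5, 0.5, 0.5, 0.5]
mu1 = [4.0] * 4  # assumed predictions of the outcome under treatment
mu0 = [1.0] * 4  # assumed predictions of the outcome under control

print(ipw_ate(treated, outcomes, propensities))
print(aipw_ate(treated, outcomes, propensities, mu1, mu0))
```

In an adaptive design each entry of `propensities` would be the assignment probability for that unit given the history up to that point, so the list generally varies across units; the constant 0.5 here is only for readability.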

2509.22196 2026-05-12 cs.LG stat.ML

Mechanistic Independence: A Principle for Identifiable Disentangled Representations

Stefan Matthes, Zhiwei Han, Hao Shen

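AI summary This paper proposes a unified framework for identifiable disentangled representations based on mechanistic independence, which characterizes latent factors by how they act on observed variables rather than by their distribution. The approach is invariant to changes in the latent density, even when these introduce statistical dependence, and the paper proposes several independence criteria, proving that latent subspaces are identifiable even under nonlinear, non-invertible mixing. It also establishes a hierarchy among these criteria and gives a graph-theoretic characterization of the latent subspaces, providing a new theoretical basis for the identifiability of disentangled representations.

Details
English abstract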

Disentangled representations seek to recover latent factors of variation underlying observed data, yet their identifiability is still not fully understood. We introduce a unified framework in which disentanglement is achieved through mechanistic independence, which characterizes latent factors by how they act on observed variables rather than by their latent distribution. This perspective is invariant to changes of the latent density, even when such changes induce statistical dependencies among factors. Within this framework, we propose several related independence criteria -- ranging from support-based and sparsity-based to higher-order conditions -- and show that each yields identifiability of latent subspaces, even under nonlinear, non-invertible mixing. We further establish a hierarchy among these criteria and provide a graph-theoretic characterization of latent subspaces as connected components. Together, these results clarify the conditions under which disentangled representations can be identified without relying on statistical assumptions.

2509.18484 2026-05-12 stat.ML cs.LG

Estimating Heterogeneous Causal Effect on Networks via Orthogonal Learning

Yuanchen Wu, Yubai Yuan

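AI summary This paper studies the estimation of heterogeneous causal effects on network data, where a treatment affects not only the treated node but may also spill over to its neighbors, and effects may differ across nodes and edges. The authors propose a two-stage orthogonal learning framework: the first stage uses graph neural networks to estimate nuisance components that depend on covariates and network structure, and the second stage estimates direct and spillover effects through an interpretable attention-based model, yielding edge-, node-, and population-level effect estimates. Orthogonalization and cross-fitting reduce sensitivity to first-stage estimation error, and a bootstrap procedure quantifies uncertainty. Experiments show advantages in heterogeneous effect estimation and in downstream interpretable analyses.

Details
English abstract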

Estimating causal effects on networks is challenging because treatments may affect both treated units and their neighbors, while network homophily induces dependence and confounding. These challenges are amplified when causal effects are heterogeneous across units and edges. We propose a two-stage orthogonal learning framework for estimating heterogeneous direct and spillover effects on networks. The first stage uses graph neural networks to estimate nuisance components that capture complex dependence on covariates and network structure. The second stage residualizes these nuisance components and estimates causal effects through an interpretable attention-based interference model, yielding edge-level spillover estimates as well as node- and population-level summaries. Neyman orthogonalization and cross-fitting reduce sensitivity to first-stage estimation error, so nuisance errors enter only at higher order. We further develop a bootstrap-based uncertainty quantification procedure for the estimated spillover matrix, enabling pointwise and simultaneous inference for heterogeneous edge- and node-level effects. Experiments show that our method improves heterogeneous effect estimation while supporting interpretable downstream analyses such as influential-neighbor detection and spillover-sign recovery.

2509.02510 2026-05-12 cs.CL cs.AI stat.ML

Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation

Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram

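AI summary This paper proposes a decoding method called Top-H that addresses the balance between creativity and coherence in open-ended text generation with large language models. The authors formulate an entropy-constrained minimum divergence problem, recast it as an entropy-constrained mass maximization problem, and design an efficient greedy algorithm for it. Experiments show that Top-H outperforms existing methods on creative writing tasks by up to 25.63% while maintaining coherence on question-answering tasks, which makes it practical to apply.

Details
English abstract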

Large language models (LLMs), despite their impressive performance across a wide range of tasks, often struggle to balance two competing objectives in open-ended text generation: fostering diversity and creativity while preserving logical coherence. Existing truncated sampling techniques, including temperature scaling, top-$p$ (nucleus) sampling, and min-$p$ sampling, aim to manage this trade-off. However, they exhibit limitations, particularly in the effective incorporation of the confidence of the model into the corresponding sampling strategy. For example, min-$p$ sampling relies on a single top token as a heuristic for confidence, eventually underutilizing the information of the probability distribution. Toward effective incorporation of the confidence of the model, in this paper, we present **top-H** decoding. We first establish the theoretical foundation of the interplay between creativity and coherence in truncated sampling by formulating an **entropy-constrained minimum divergence** problem. We then prove this minimization problem to be equivalent to an **entropy-constrained mass maximization** (ECMM) problem, which is NP-hard. Finally, we present top-H decoding, a computationally efficient greedy algorithm to solve the ECMM problem. Extensive empirical evaluations demonstrate that top-H outperforms the state-of-the-art (SoTA) alternative of min-$p$ sampling by up to **25.63%** on creative writing benchmarks, while maintaining robustness on question-answering datasets such as GPQA, GSM8K, and MT-Bench. Additionally, an *LLM-as-judge* evaluation confirms that top-H indeed produces coherent outputs even at higher temperatures, where creativity is especially critical. In summary, top-H advances SoTA in open-ended text generation and can be *easily integrated* into creative writing applications. The code is available at https://github.com/ErfanBaghaei/Top-H-Decoding.

2507.14453 2026-05-12 stat.ME

Generalized optimal parameter-transfer learning through Mallows-type model averaging

Fen Jiang, Wenhui Li, Xinyu Zhang

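AI summary In many economic applications multiple source datasets are available, but heterogeneity across them makes effective integration difficult. This paper proposes a parameter-transfer learning framework that shares only the parameter estimates of the source models and introduces a model averaging method based on a Mallows-type criterion for combining target and source models in the parametric setting. The criterion is unbiased for the target prediction risk up to a weight-independent term, and the weights are asymptotically optimal when the target model is misspecified, without requiring any source model to be correctly specified. The framework is also extended to semiparametric and panel data settings, and its effectiveness is demonstrated through simulation studies and a house price application.

Comments Substantially revised and expanded version of arXiv:2507.14453v1, extending the original SAR-model framework to a broader class of parametric models

Details
English abstract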

In many economic applications, multiple source datasets are available, but their effective combination is challenging due to heterogeneity across datasets. To address this problem, we study a parameter-transfer framework that shares only source-side estimates and propose a Mallows-type model averaging method for combining target and source models in the parametric setting. The weights are obtained from a Mallows-type criterion that is unbiased for the target prediction risk up to a weight-independent term, extending the classical Mallows criterion to the parameter-transfer framework. We establish that the proposed weights are asymptotically optimal when the target model is misspecified, and asymptotically allocate weights only to informative sources when the target model is correctly specified. These guarantees do not require any source model to be correctly specified. We also consider extensions of the framework to semiparametric and panel data settings. Simulation studies and a house price application further demonstrate the effectiveness of our approach.

2506.19230 2026-05-12 stat.ME stat.CO

gcor: A Python Implementation of Categorical Gini Correlation and Its Inference

Sameera Hewage

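AI summary This paper introduces gcor, a Python implementation for computing the categorical Gini correlation (CGC), which measures dependence between a numerical variable and a categorical variable. Compared with existing measures, CGC has appealing statistical properties, such as zero correlation implying independence, and performs better in feature screening for classification. The paper provides efficient implementations, including confidence interval construction and independence tests, with computation optimized through vectorization and parallelization.

Comments Added Computational Performance section and 4 figures

Details
English abstract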

Categorical Gini Correlation (CGC), introduced by Dang et al. (2020), is a novel dependence measure designed to quantify the association between a numerical variable and a categorical variable. It has appealing properties compared to existing dependence measures, such as being zero if and only if the two variables are independent. It has also shown superior performance over existing methods when applied to feature screening for classification. This article presents a Python implementation for computing CGC, constructing confidence intervals, and performing independence tests based on it. Efficient algorithms have been implemented for all procedures, and they have been optimized using vectorization and parallelization to enhance computational efficiency.

2506.05967 2026-05-12 cs.AI cs.LG stat.ML

Preference Learning for AI Alignment: a Causal Perspective

Katarzyna Kobalczyk, Mihaela van der Schaar

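AI summary This paper examines reward modelling from preference data from a causal perspective, with the aim of improving the alignment of large language models with human values. It identifies key challenges such as causal misidentification, preference heterogeneity, and confounding due to user-specific factors, and, drawing on the causal inference literature, spells out the assumptions needed for reliable generalisation. By analysing failure modes of naive reward models, the paper shows how causally inspired approaches can improve robustness, and it outlines directions for future research and practice.

Details
Journal ref
Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025
English abstract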

Reward modelling from preference data is a crucial step in aligning large language models (LLMs) with human values, requiring robust generalisation to novel prompt-response pairs. In this work, we propose to frame this problem in a causal paradigm, providing the rich toolbox of causality to identify the persistent challenges, such as causal misidentification, preference heterogeneity, and confounding due to user-specific factors. Inheriting from the literature of causal inference, we identify key assumptions necessary for reliable generalisation and contrast them with common data collection practices. We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness. Finally, we outline desiderata for future research and practices, advocating targeted interventions to address inherent limitations of observational data.

2505.22873 2026-05-12 econ.GN q-fin.EC stat.ML

Forecasting Residential Heating and Electricity Demand with Scalable, High-Resolution, Open-Source Models

Stephen J. Lee, Cailinn Drouin

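AI summary This paper proposes a framework for high-resolution forecasting of residential heating demand and non-heating electricity demand based on probabilistic deep learning models. Using multimodal building-level data such as footprint area, height, surroundings, and high-resolution weather information, the method produces fine-grained forecasts of residential electricity and heating demand. Relative to the existing standard model ResStock, building-level accuracy improves markedly, with RMSE lower by 18.8% for heating and 27.6% for electricity, giving policymakers and grid planners an open, scalable, high-accuracy forecasting tool that supports decarbonization of the U.S. building stock.

Comments 11 pages, 4 figures, 2 tables. Published version (Energy and AI 24 (2026) 100726). Supplementary material available at the publisher: https://doi.org/10.1016/j.egyai.2026.100726

Details
Journal ref
Energy and AI 24 (2026) 100726
English abstract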

We present a novel framework for high-resolution forecasting of residential heating demand and non-heating electricity demand using probabilistic deep learning models. Because our models are trained on electricity consumption from a predominantly gas-heated region, the learned electricity demand patterns primarily reflect non-heating end uses such as lighting, appliances, and cooling. We focus specifically on providing hourly building-level electricity and heating demand forecasts for the residential sector. Leveraging multimodal building-level information -- including data on building footprint areas, heights, nearby building density, nearby building size, land use patterns, and high-resolution weather data -- and probabilistic modeling, our methods provide granular insights into demand heterogeneity. Validation at the building level underscores a step-change improvement in performance relative to NREL's ResStock model, which has emerged as a research community standard for residential heating and electricity demand characterization. In building-level heating and electricity estimation backtests, our probabilistic models respectively achieve RMSE scores 18.8% and 27.6% lower than those based on ResStock, with probabilistic forecast quality measured via WIS improving by 59% for both applications. By offering an open-source, scalable, high-resolution platform for demand estimation and forecasting, this research advances the tools available for policymakers and grid planners, contributing to the broader effort to decarbonize the U.S. building stock and meet climate objectives.

2505.18269 2026-05-12 cs.LG math.OC math.PR stat.ML

Representative Action Selection for Large Action Space Bandit Families

Quan Zhou, Mark Kozdoba, Shie Mannor

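AI summary This paper studies how to select a representative subset of actions for a family of bandit problems that share an action space. In practice the action space may be large, but rewards of different actions are highly correlated across environments, so the full set need not be kept. The authors propose a simple and effective algorithm that repeatedly samples a bandit instance at random, solves it, and collects its optimal action, which can substantially reduce the action space. The method requires no prior knowledge of the correlation structure, comes with theoretical performance guarantees, and compares favorably with several baselines in experiments.

Details
English abstract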

We study the problem of selecting a subset from a large action space shared by a family of bandits. In many natural situations, while the nominal set of actions is large, actions are highly correlated: many yield similar rewards across environments, making it wasteful to maintain the full set. Our aim is to understand whether it is possible -- and how -- to select a smaller set of representative actions that performs nearly as well as the full action space. Our main contribution is a surprisingly simple algorithm: repeatedly sample a bandit instance at random, solve it, and collect the optimal action. This algorithm can significantly reduce the action space when such correlations are present, without the need to know a-priori the correlation structure. We provide theoretical guarantees on the performance of the algorithm and demonstrate its practical effectiveness through empirical comparisons with Combinatorial Bandit, Meta Learning Bandit and Zooming baselines.
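The sample-solve-collect loop described in the abstract can be illustrated with a toy family of bandits. This is a sketch under stated assumptions rather than the paper's algorithm as implemented: the instance distribution (a latent `theta` drawn from three values), the quadratic reward shape, and the oracle `solve` that reads off known mean rewards are all hypothetical choices made for illustration.

```python
import random


def sample_instance(rng, num_actions):
    """Draw one bandit instance: a list of mean rewards, one per action."""
    theta = rng.choice([0.1, 0.5, 0.9])  # hypothetical latent environment
    return [-((a / num_actions - theta) ** 2) for a in range(num_actions)]


def solve(mean_rewards):
    """Oracle solver: index of the best action (assumes the means are known)."""
    return max(range(len(mean_rewards)), key=mean_rewards.__getitem__)


def representative_actions(num_actions, num_samples, seed=0):
    """Repeatedly sample an instance, solve it, and collect its optimal action."""
    rng = random.Random(seed)
    chosen = set()
    for _ in range(num_samples):
        chosen.add(solve(sample_instance(rng, num_actions)))
    return sorted(chosen)


reps = representative_actions(num_actions=100, num_samples=50)
print(f"kept {len(reps)} of 100 actions: {reps}")
```

In this toy family the rewards are driven by a single latent variable, so at most three of the 100 actions are ever optimal and the collected set is correspondingly small. In a real setting the solver would be a bandit algorithm run on each sampled instance, not an oracle.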

2505.06835 2026-05-12 cs.LG stat.CO stat.ME stat.ML

Streaming Sliced Optimal Transport

Khai Nguyen

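AI summary This paper proposes Streaming Sliced Wasserstein (Stream-SW), a method for estimating the sliced Wasserstein distance from streaming data, with the goal of improving the computational efficiency and memory footprint of sliced optimal transport. The method builds on a streaming estimator of the one-dimensional Wasserstein distance combined with quantile approximation techniques, allowing sample streams to be processed efficiently. Experiments show that, compared with random subsampling, Stream-SW approximates the sliced Wasserstein distance more accurately at lower memory cost and performs well on point cloud classification, gradient flows, and streaming change point detection.

Comments Accepted to ICML 2026, 22 pages, 8 figures, 7 tables

Details
English abstract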

Sliced optimal transport (SOT), or sliced Wasserstein (SW) distance, is widely recognized for its statistical and computational scalability. In this work, we further enhance computational scalability by proposing the first method for estimating SW from sample streams, called streaming sliced Wasserstein (Stream-SW). To define Stream-SW, we first introduce a streaming estimator of the one-dimensional Wasserstein distance (1DW). Since the 1DW has a closed-form expression, given by the integral of the absolute difference between the quantile functions of the compared distributions, we leverage quantile approximation techniques for sample streams to define a streaming 1DW estimator. By applying the streaming 1DW to all projections, we obtain Stream-SW. The key advantage of Stream-SW is its low memory complexity while providing theoretical guarantees on the approximation error. We demonstrate that Stream-SW achieves a more accurate approximation of SW than random subsampling, with lower memory consumption, when comparing Gaussian distributions and mixtures of Gaussians from streaming samples. Additionally, we conduct experiments on point cloud classification, point cloud gradient flows, and streaming change point detection to further highlight the favorable performance of the proposed Stream-SW.
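The closed form mentioned in the abstract, the integral of the absolute difference between quantile functions, reduces for two equal-size samples to the mean absolute difference of the sorted values. The sketch below shows that batch (non-streaming) baseline and the averaging over random projections; it is not Stream-SW itself, which would replace the exact sort with a streaming quantile approximation. All function names are illustrative.

```python
import math
import random


def wasserstein_1d(xs, ys):
    """W1 between two equal-size empirical samples via their sorted values."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def random_direction(rng, dim):
    """Uniform direction on the unit sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(X, Y, num_projections=200, seed=0):
    """Monte Carlo estimate of the sliced W1 distance between point sets X and Y."""
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(num_projections):
        theta = random_direction(rng, dim)
        px = [sum(t * c for t, c in zip(theta, x)) for x in X]
        py = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d(px, py)
    return total / num_projections


data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(100)]
Y = [[x[0] + 3.0, x[1] + 4.0] for x in X]  # X shifted by the vector (3, 4)
print(f"sliced W1 estimate: {sliced_wasserstein(X, Y):.3f}")
```

For a pure shift, each projected distance equals the absolute inner product of the direction with the shift vector, so the estimate is bounded by the shift length (5 here). Storing and sorting every projected sample is the memory cost that a streaming estimator is meant to avoid.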

2504.01781 2026-05-12 math.ST stat.ML stat.TH

Proper scoring rules for estimation and forecast evaluation

Kartik Waghmare, Johanna Ziegel

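AI summary This paper reviews the mathematical foundations of proper scoring rules and their applications in statistics and machine learning. Proper scoring rules serve not only to evaluate probabilistic forecasts but also to estimate probability distributions; the article presents general characterization results and important families of scoring rules and discusses their role in estimation and forecast evaluation. It also comments on recent developments in their practical use.

Details
English abstract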

Proper scoring rules have been a subject of growing interest in recent years, not only as tools for evaluation of probabilistic forecasts but also as methods for estimating probability distributions. In this article, we review the mathematical foundations of proper scoring rules including general characterization results and important families of scoring rules. We discuss their role in statistics and machine learning for estimation and forecast evaluation. Furthermore, we comment on interesting developments of their usage in applications.
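As a small illustration of what "proper" means, the sketch below checks numerically that two textbook scoring rules for a binary event, the Brier score and the logarithmic score, have their expected value minimized when the forecast equals the true probability. The example is not taken from the article; the true probability 0.3 and the forecast grid are arbitrary choices, and scores are negatively oriented (lower is better).

```python
import math


def brier_score(q, y):
    """Brier (quadratic) score of forecast probability q for outcome y in {0, 1}."""
    return (q - y) ** 2


def log_score(q, y):
    """Logarithmic score: negative log of the probability given to the outcome."""
    return -math.log(q if y == 1 else 1.0 - q)


def expected_score(score, q, p):
    """Expected score of forecast q when the event occurs with probability p."""
    return p * score(q, 1) + (1.0 - p) * score(q, 0)


p_true = 0.3
grid = [i / 100 for i in range(1, 100)]  # candidate forecasts 0.01, ..., 0.99
best_brier = min(grid, key=lambda q: expected_score(brier_score, q, p_true))
best_log = min(grid, key=lambda q: expected_score(log_score, q, p_true))
print(best_brier, best_log)
```

Both minimizers land on the grid point 0.3, the true probability, which is the defining property of a proper scoring rule: a forecaster cannot improve the expected score by reporting anything other than their actual belief.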

2409.15645 2026-05-12 quant-ph stat.ML

Quantum Machine Learning in Drug Discovery: Applications in Academia and Pharmaceutical Industries

Anthony M. Smaldone, Yu Shee, Gregory W. Kyro, Chuzhi Xu, Nam P. Vu, Rishab Dutta, Marwa H. Farag, Alexey Galda, Sandeep Kumar, Elica Kyoseva, Victor S. Batista

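AI summary This paper reviews applications of quantum machine learning in drug discovery, focusing on the potential of quantum neural networks on gate-based quantum computers in academia and the pharmaceutical industry. It covers the theoretical foundations of quantum machine learning, including data encoding, variational quantum circuits, and hybrid quantum-classical approaches, and presents applications to tasks such as molecular property prediction and molecular generation. It also gives a balanced assessment of the challenges facing the field and of future directions.

Details
English abstract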

The nexus of quantum computing and machine learning - quantum machine learning - offers the potential for significant advancements in chemistry. This review specifically explores the potential of quantum neural networks on gate-based quantum computers within the context of drug discovery. We discuss the theoretical foundations of quantum machine learning, including data encoding, variational quantum circuits, and hybrid quantum-classical approaches. Applications to drug discovery are highlighted, including molecular property prediction and molecular generation. We provide a balanced perspective, emphasizing both the potential benefits and the challenges that must be addressed.

2407.16239 2026-05-12 cs.LG stat.ML

Identifiable Latent Bandits: Leveraging observational data for personalized decision-making

Ahmet Zahid Balcıoğlu, Newton Mwai, Emil Carlsson, Fredrik D. Johansson

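AI summary This paper studies how observational data can be used to build an identifiable latent bandit model that makes personalized decision-making more efficient. It proposes a framework based on nonlinear independent component analysis that learns, from historical decisions and outcomes, representations sufficient to capture the latent problem structure, so that optimal decisions are reached with a shorter exploration time. The method is validated in simulated and semi-synthetic environments, where it shows substantial improvement over conventional online and offline learning methods.

Comments 35 pages, 21 figures

Details
English abstract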

Sequential decision-making algorithms such as multi-armed bandits can find optimal personalized decisions, but are notoriously sample-hungry. In personalized medicine, for example, training a bandit from scratch for every patient is typically infeasible, as the number of trials required is much larger than the number of decision points for a single patient. To combat this, latent bandits offer rapid exploration and personalization beyond what context variables alone can offer, provided that a latent variable model of problem instances can be learned consistently. However, existing works give no guidance as to how such a model can be found. In this work, we propose an identifiable latent bandit framework that leads to optimal decision-making with a shorter exploration time than classical bandits by learning from historical records of decisions and outcomes. Our method is based on nonlinear independent component analysis that provably identifies representations from observational data sufficient to infer optimal actions in new bandit instances. We verify this strategy in simulated and semi-synthetic environments, showing substantial improvement over online and offline learning baselines when identifying conditions are satisfied.

2406.13111 2026-05-12 stat.ME

Nonparametric Motion Control in Functional Connectivity Studies in Children with Autism Spectrum Disorder

Jialu Ran, Sarah Shultz, Benjamin B. Risk, David Benkeser

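AI summary This study addresses artifacts caused by head motion in functional connectivity analyses of children with autism spectrum disorder (ASD). It proposes MoCo, a nonparametric motion control method that does not exclude high-motion participants but instead uses an ensemble of machine learning methods to flexibly model the effects of motion and other features on functional connectivity, giving a more accurate estimate of the difference in functional connectivity between autistic and non-ASD children. The estimator is efficient in large samples and multiply robust, reduces motion artifacts, uses the data more efficiently, and lowers the risk of selection bias.

Details
English abstract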

Autism Spectrum Disorder (ASD) is a neurodevelopmental condition associated with difficulties with social interactions, communication, and restricted or repetitive behaviors. To characterize ASD, investigators often use functional connectivity derived from resting-state functional magnetic resonance imaging of the brain. However, participants' head motion during the scanning session can induce motion artifacts. Many studies remove participants with excessive motion, and then estimate the effect of diagnosis on functional connectivity using linear regression. However, participant exclusions and linearity assumptions can cause biases. We propose an estimand that quantifies the difference in average functional connectivity in autistic and non-ASD children while standardizing motion relative to the low motion distribution in scans that pass motion quality control. We introduce a nonparametric estimator for motion control, called MoCo, that uses all participants and flexibly models the impacts of motion and other relevant features using an ensemble of machine learning methods. We establish large-sample efficiency and multiple robustness of our proposed estimator. The framework is applied to estimate the difference in functional connectivity between 132 autistic and 245 non-ASD children, of which 34 and 126 pass motion quality control, respectively. MoCo appears to dramatically reduce motion artifacts compared to a standard approach with no participant removal, while more efficiently utilizing participant data and accounting for possible selection biases compared to participant removal.

2307.09077 2026-05-12 q-fin.TR stat.ML

Estimation of an Order Book Dependent Hawkes Process for Large Datasets

Luca Mucciante, Alessio Sancetta

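AI summary This paper studies point process modelling of event arrivals in high-frequency trading and proposes a model that combines a Hawkes process with high-dimensional functions of order book covariates. To handle large datasets, it designs an efficient estimation algorithm and proves its convergence and consistency. The empirical part applies the method to data on four stocks traded on the New York Stock Exchange; the results suggest that capturing nonlinear features of order book information helps the model describe the self-exciting nature of high-frequency trading events.

Details
English abstract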

A point process for event arrivals in high-frequency trading is presented. The intensity is the product of a Hawkes process and high-dimensional functions of covariates derived from the order book. Conditions for stationarity of the process are stated. An algorithm is presented to estimate the model even in the presence of billions of data points, possibly mapping covariates into a high-dimensional space. Such large sample sizes can be common in high-frequency data applications using multiple liquid instruments. Convergence of the algorithm is shown, consistency results under weak conditions are established, and a test statistic to assess the out-of-sample performance of different model specifications is suggested. The methodology is applied to the study of four stocks that trade on the New York Stock Exchange (NYSE). The out-of-sample testing procedure suggests that capturing the nonlinearity of the order book information adds value to the self-exciting nature of high-frequency trading events.

2011.04135 2026-05-12 stat.ME stat.AP

Mixture of Finite Mixtures Model for Basket Trial

Junxian Geng, Tianjian Zhou, Ruitao Lin, Guanyu Hu

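AI summary As oncology drug development shifts from cytotoxic drugs to targeted and immuno-oncology therapies, basket trials enroll patients with different cancer subtypes that share the same molecular target. This paper proposes a two-step procedure that strikes a balance between stratified analysis and full pooling: first, a mixture of finite mixtures (MFM) model clusters cohorts with similar treatment effects; second, within each cluster, treatment effects are estimated with the shrinkage estimator from a Bayesian hierarchical model. The method is evaluated in simulation studies and applied to the Vemurafenib basket trial data, and it avoids the over-shrinkage that arises in conventional approaches when the exchangeability assumption fails.

Comments 18 pages, 1 figure

Details
English abstract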

With the recent paradigm shift from cytotoxic drugs to a new generation of targeted therapy and immuno-oncology therapy in oncology drug development, patients with various cancer (sub)types may be eligible to participate in a basket trial if they have the same molecular target. Bayesian hierarchical models (BHMs) are widely used in basket trial data analysis, where they adaptively borrow information among different cohorts (subtypes) rather than fully pooling the data or performing a stratified analysis of each cohort. These approaches, however, risk over-shrinkage when the exchangeability assumption is violated. We propose a two-step procedure to find the balance between pooled and stratified analysis. In the first step, we treat the task as a clustering problem, grouping cohorts into clusters that share a similar treatment effect. In the second step, we use the shrinkage estimator from a BHM to estimate treatment effects for cohorts within each cluster under the exchangeability assumption. For the clustering step, we adapt the mixture of finite mixtures (MFM) approach to obtain a consistent estimate of the number of clusters. We investigate the performance of the proposed method in simulation studies and apply it to the analysis of Vemurafenib basket trial data.

2005.04721 2026-05-12 stat.AP stat.ME

Decision Making in Drug Development via Inference on Power

Geoffrey S Johnson

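AI summary This paper examines decision making based on statistical power in drug development. It notes that the conventional practice replaces unknown parameters in the power function with results from external studies, whereas the Bayesian approach computes a probability of success by averaging over a prior or posterior distribution to reflect uncertainty. The author argues that both should be viewed as different point estimates of power, shows that Go/No-Go decisions based on these point estimates alone do not adequately quantify and control risk, and advocates inference on power to make decisions better grounded and to improve risk management.

Details
English abstract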

A typical power calculation is performed by replacing unknown population-level quantities in the power function with what is observed in external studies. Many authors and practitioners view this as an assumed value of power and offer the Bayesian quantity probability of success, or assurance, as an alternative. The claim is that, by averaging over a prior or posterior distribution, probability of success transcends power by capturing the uncertainty around the unknown true treatment effect and any other population-level parameters. We use p-value functions to frame both the probability of success calculation and the typical power calculation as merely producing two different point estimates of power. We demonstrate that Go/No-Go decisions based on either point estimate of power do not adequately quantify and control the risk involved, and instead we argue for Go/No-Go decisions that utilize inference on power for better risk management and decision making.

1706.00476 2026-05-12 math.OC cs.LG stat.ML

The Mixing method: low-rank coordinate descent for semidefinite programming with diagonal constraints

Po-Wei Wang, Wei-Cheng Chang, J. Zico Kolter

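AI summary This paper proposes the Mixing method, a low-rank coordinate descent method for structured semidefinite programs with diagonal constraints. The method is simple to implement, needs no tuning, and substantially improves optimization performance over existing methods. The authors prove that it is strictly decreasing, converges to a critical point, and that for sufficient rank all non-optimal critical points are unstable. Moreover, under random initialization it converges almost surely to the global optimum at a locally linear rate, making it the first low-rank semidefinite programming method shown to reach a global optimum on the spherical manifold without assumption. The algorithm is applied to relaxations of the maximum cut and maximum satisfiability problems, with marked improvements over existing methods along several dimensions.

Comments The proof has been updated to match the version presented in the 2021 thesis: https://ml.cmu.edu/research/phd-dissertation-pdfs/thesis_poweiw.pdf

Details
English abstract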

In this paper, we propose a low-rank coordinate descent approach to structured semidefinite programming with diagonal constraints. The approach, which we call the Mixing method, is extremely simple to implement, has no free parameters, and typically attains an order of magnitude or better improvement in optimization performance over the current state of the art. We show that the method is strictly decreasing, converges to a critical point, and further that for sufficient rank all non-optimal critical points are unstable. Moreover, we prove that with a step size, the Mixing method converges to the global optimum of the semidefinite program almost surely in a locally linear rate under random initialization. This is the first low-rank semidefinite programming method that has been shown to achieve a global optimum on the spherical manifold without assumption. We apply our algorithm to two related domains: solving the maximum cut semidefinite relaxation, and solving a maximum satisfiability relaxation (we also briefly consider additional applications such as learning word embeddings). In all settings, we demonstrate substantial improvement over the existing state of the art along various dimensions, and in total, this work expands the scope and scale of problems that can be solved using semidefinite programming methods.
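A minimal sketch of low-rank coordinate descent for a diagonally constrained SDP follows, under assumptions not spelled out in the abstract: the matrix variable is factored as X = V^T V with unit-norm vectors v_i (which enforces X_ii = 1), the cost matrix C is symmetric with zero diagonal, and each step replaces v_i by the negated, normalized sum of C_ij v_j over j. The step-size variant mentioned in the abstract is omitted, and the function names and the 5-cycle maximum cut example are illustrative.

```python
import math
import random


def objective(C, V):
    """f(V) = sum over i, j of C[i][j] * <v_i, v_j>."""
    n = len(C)
    return sum(C[i][j] * sum(a * b for a, b in zip(V[i], V[j]))
               for i in range(n) for j in range(n))


def mixing_sweep(C, V):
    """One in-place pass of coordinate descent over the unit vectors v_i."""
    n, k = len(V), len(V[0])
    for i in range(n):
        # g_i = sum_{j != i} C[i][j] * v_j; f depends on v_i through 2 <v_i, g_i>.
        g = [sum(C[i][j] * V[j][d] for j in range(n) if j != i) for d in range(k)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            V[i] = [-c / norm for c in g]  # minimizer of <v_i, g_i> on the sphere


def random_unit_vectors(n, k, seed=0):
    """Random initialization: n vectors drawn uniformly from the unit sphere in R^k."""
    rng = random.Random(seed)
    vectors = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(c * c for c in v))
        vectors.append([c / norm for c in v])
    return vectors


# Maximum cut relaxation on a 5-cycle: C is the adjacency matrix.
n, k = 5, 4
C = [[0.0] * n for _ in range(n)]
for i in range(n):
    j = (i + 1) % n
    C[i][j] = C[j][i] = 1.0

V = random_unit_vectors(n, k)
history = [objective(C, V)]
for _ in range(30):
    mixing_sweep(C, V)
    history.append(objective(C, V))
print(f"objective: {history[0]:.4f} -> {history[-1]:.4f}")
```

Each coordinate update solves its one-vector subproblem exactly, so the recorded objective values never increase, and there is no learning rate to tune. The rank k = 4 here is an arbitrary small choice for the example, not a recommendation from the paper.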

2605.09235 2026-05-12 cs.LG cs.AI stat.ML

On Variance Reduction in Learning Mean Flows

Juanwu Lu, Ziran Wang

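AI summary This paper studies variance reduction in learning mean flows (MeanFlow), attributing the non-decreasing loss and unbounded gradient variance of current training to a misuse of the conditional velocity field. The authors give a theoretical analysis that identifies the optimal coefficient and show that several existing fixes correspond to different realizations of the same optimum. Experiments show that the optimal coefficient markedly improves sample quality and reveal a mismatch between minimizing gradient variance and optimizing FID.

Comments 25 pages, 7 figures, 6 tables

Details
English abstract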

One-step generative modeling has emerged as a leading approach to amortize the inference cost of diffusion and flow-matching models. Among distillation-free methods, MeanFlow training is notoriously unstable, with non-decreasing loss and unbounded gradient variance. In this work, we establish a theory that attributes this pathology to a misuse of the conditional velocity field: it plays two distinct statistical roles in the loss, both as an unbiased regression target and as a Monte Carlo control variate inside a Jacobian-vector product, with the original loss assigning the wrong coefficient to the latter. We derive the optimal coefficient in closed form, and show that a family of fixes in concurrent works corresponds to different practical realizations of the same optimum. A controlled sweep of this coefficient on two-dimensional benchmarks and on a latent Diffusion Transformer recovers the predicted bias-variance ordering. The optimal coefficient yields up to a 54% improvement in sample quality on two-dimensional benchmarks and a monotone FID trend at every matched-step DiT checkpoint. Crucially, the same DiT measurement also reveals a quantitative FID-MSE landscape mismatch: although gradient variance is minimized at an interior coefficient value, the coefficient that minimizes FID prefers the direct use of conditional velocity.

2605.09214 2026-05-12 cs.LG cs.AI cs.IT math.IT math.ST stat.ML stat.TH

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Quanquan Gu

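AI summary This paper studies offline contextual bandits with forward-KL regularization under single-policy concentrability and gives the first $\tilde{O}(ε^{-1})$ upper bounds, a substantial improvement over the earlier $\tilde{O}(ε^{-2})$ slow rates. A new convex-analytical approach, combined with the pessimism principle, unifies the tabular and general function approximation settings and bypasses the conventional proofs based on the mean value theorem. The authors also give matching lower bounds, showing that the upper bounds are rate-optimal, and show that in the low-regularization regime forward-KL regularization recovers the slow rate of the unregularized problem.

Comments 31 pages, comments are welcome

Details
English abstract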

\emph{Kullback-Leibler} (KL) regularization is ubiquitous in reinforcement learning algorithms in the form of \emph{reverse} or \emph{forward} KL. Recent studies have demonstrated $ε^{-1}$-type fast rates for decision making under reverse KL regularization, in contrast to the standard $ε^{-2}$-type sample complexity. However, for forward-KL-regularized objectives, existing statistical analyses are either not applicable or result in $\tilde{O}(ε^{-2})$ slow rates. We take the first step towards addressing this problem via a streamlined analysis of forward-KL-regularized offline CBs. We give the first $\tilde{O}(ε^{-1})$ upper bounds in tabular and general function approximation settings, both under notions of \emph{single-policy concentrability}. In particular, our convex-analytical pipeline unifies these settings by exploiting the pessimism principle in a novel way and completely bypasses the proof routines in previous works based on the mean value theorem, which might be of independent interest. Moreover, we provide rate-optimal lower bounds, manifesting the tightness of our upper bounds in terms of statistical rates. Our lower bounds also demonstrate that the forward-KL-regularized sample complexity recovers the unregularized slow rate in the low-regularization regime, similarly to the reverse-KL regularization.

2605.09193 2026-05-12 stat.AP stat.ME

Quantifying Time-Varying Physical Activity Intervention Effects via Functional Regression

Nidhi Pai, Yu Lu, Kristin A. Linn, Erjia Cui

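AI summary This study aims to quantify how the effects of physical activity interventions vary over time. It proposes a functional regression approach that models the entire physical activity trajectory as a functional observation and so captures the dynamics of intervention effects more accurately. The approach shows methodological and practical advantages over the traditional method and is further extended to function-on-function regression to analyze associations between physical activity in different time periods. Applied to daily step counts from the STEP UP study, it reveals distinct time-varying effects of three intervention strategies and differences in their sustainability, illustrating the potential of functional data analysis in intervention studies with high-dimensional endpoints.

Details
English abstract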

Physical activity (PA) intervention studies often collect repeated intensity measurements over long observation periods. Quantifying the variation in intervention effects over the study period is critical to evaluating and improving intervention strategies, yet many analyses reduce PA data into scalar summary measures, resulting in limited insights. We propose a functional regression framework, which captures time-varying intervention effects by modeling the entire PA trajectory as a functional observation. From both methodological and practical perspectives, we demonstrate the advantages of function-on-scalar regression (FoSR) over the traditional two-step approach of applying functional principal components analysis (FPCA) followed by regressing scores on covariates. The FoSR is further extended to a function-on-function regression (FoFR) for studying the association of PA across time periods. Methods are applied to daily step counts from the Social incentives to Encourage Physical Activity and Understand Predictors (STEP UP) study, revealing distinct and highly interpretable time-varying effects of three intervention strategies on PA and differences in their sustainability. Our case study highlights the feasibility of functional data analysis techniques for uncovering novel insights in intervention studies with high-dimensional endpoints.

2605.09147 2026-05-12 cs.CL cs.AI stat.AP

From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages

Matthias Schöffel, Esteban Garces Arias

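AI summary This paper compares traditional part-of-speech taggers with large language models (LLMs) on POS tagging for Medieval Romance languages (Medieval Occitan, Catalan, and French). LLM-based approaches outperform traditional taggers under zero-shot, few-shot, monolingual fine-tuning, and cross-lingual transfer settings, with fine-tuning and multilingual training performing best. Cross-lingual transfer is especially effective for under-resourced varieties, while targeted bilingual training can outperform broader multilingual configurations for specific target languages, offering practical guidance for historical NLP.

Comments Accepted at NLP4DH @ ACL 2026

Details
English abstract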

Part-of-speech (POS) tagging for Medieval Romance languages remains challenging due to orthographic variation, morphological complexity, and limited annotated resources. This paper presents a systematic empirical evaluation of large language models (LLMs) for POS tagging across three medieval varieties: Medieval Occitan, Medieval Catalan, and Medieval French. We compare traditional rule-based and statistical taggers with modern open-source LLMs under zero-shot prompting, few-shot prompting, monolingual fine-tuning, and cross-lingual transfer learning settings. Experiments on historically grounded datasets show that LLM-based approaches consistently outperform traditional taggers, with fine-tuning and multilingual training yielding the largest improvements. In particular, cross-lingual transfer learning substantially benefits under-resourced varieties, while targeted bilingual training can outperform broader multilingual configurations for specific target languages. The results highlight the importance of linguistic proximity and dataset characteristics when designing transfer strategies for historical NLP. These findings provide empirical insights into the applicability of modern neural methods to medieval text processing and provide practical guidance for deploying LLM-based POS tagging pipelines in digital humanities research. All code, models, and processed datasets are released for reproducibility.

2605.09116 2026-05-12 stat.ME stat.AP stat.ML

Fit CATE Once: Model-Assisted Randomization Tests Without Sample Splitting

Fangnan Zheng, Yao Zhang

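AI summary This paper proposes model-assisted randomization tests that need no sample splitting, combining the strengths of randomization tests and flexible treatment-effect models for analyzing randomized panel experiments. The key idea is to estimate an unsigned conditional average treatment effect (CATE) from the covariance structure of residualized outcomes and to reserve the realized assignments for randomization inference, which preserves validity while raising power. In synthetic and semi-synthetic experiments the tests control Type I error and achieve higher power, and the estimates can be used to discover heterogeneous subgroups and test subgroup-specific treatment effects.

Comments 48 pages, 7 figures

Details
English abstract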

Randomization tests and flexible treatment-effect models offer complementary strengths for analyzing data from randomized panel experiments: the former provide valid inference under the known assignment mechanism, while the latter can capture complex patterns of effect heterogeneity. We develop model-assisted randomization tests that combine these strengths without sample splitting. The key idea is to estimate an unsigned version of the conditional average treatment effect (CATE) from the covariance structure of residualized outcomes, while leaving the realized assignments for randomization inference. The remaining sign can be chosen to best fit the observed outcomes. We establish identification and consistency for the proposed unsigned CATE estimators, as well as validity for the CATE-assisted randomization tests. Across synthetic and semi-synthetic experiments, the CATE-assisted randomization tests control Type I error and achieve higher power than covariate-adjusted and sample-split alternatives. Finally, we show that the assignment-free CATE estimates can be used to discover heterogeneous subgroups and test subgroup-specific treatment effects.
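
To make the notion of inference "under the known assignment mechanism" concrete, here is a minimal sketch of a plain randomization test. It assumes complete randomization and uses a difference-in-means statistic; it is not the paper's CATE-assisted statistic, and the data, function names, and number of draws are illustrative assumptions.

```python
import random

def diff_in_means(outcomes, assignment):
    # Mean outcome of treated units minus mean outcome of control units.
    treated = [y for y, z in zip(outcomes, assignment) if z == 1]
    control = [y for y, z in zip(outcomes, assignment) if z == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def randomization_p_value(outcomes, assignment, draws=2000, seed=0):
    # Assumption: complete randomization, so re-drawing an assignment
    # amounts to shuffling the observed treatment labels.
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, assignment))
    shuffled = list(assignment)
    extreme = 0
    for _ in range(draws):
        rng.shuffle(shuffled)
        if abs(diff_in_means(outcomes, shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed assignment itself.
    return (1 + extreme) / (1 + draws)

# Hypothetical data: 20 treated and 20 control units, true effect of 2.
rng = random.Random(1)
assignment = [1] * 20 + [0] * 20
outcomes = [rng.gauss(0.0, 1.0) + 2.0 * z for z in assignment]
p_value = randomization_p_value(outcomes, assignment)
```

The test statistic can be swapped for any function of the outcomes and assignments without changing the resampling loop, which is the property model-assisted tests build on.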

2605.09075 2026-05-12 stat.ML cs.LG

Optimality of Sub-network Laplace Approximations: New Results and Methods

Swarnali Raha, Kshitij Khare, Rohit K Patra

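AI summary This paper studies the optimality of sub-network Laplace approximations for uncertainty quantification in deep neural networks. Existing methods typically select the parameter subset heuristically, ignoring cross-parameter interactions and lacking theoretical guarantees. The authors prove that all sub-network Laplace methods systematically underestimate the predictive variance of the full Laplace posterior, and that this bias shrinks as the retained sub-matrix grows. Building on this, they propose two sub-network Laplace approximations, one gradient-based and one greedy, establish their theoretical optimality properties, and report experiments in which they outperform existing methods.

Comments 34 Pages, 8 Figures, 2 Tables

Details
English abstract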

Although the Laplace approximation offers a simple route to uncertainty quantification in deep neural networks, its reliance on inverting large Hessian matrices has motivated a range of computationally feasible low-dimensional or sparse approximations. A prominent class of such methods, sub-network Laplace approximations, constructs surrogates by restricting attention to a small subset of parameters. Existing approaches in this family typically rely on diagonal, layer-wise, or other architectural heuristics for subset selection, which ignore cross-parameter interactions and lack formal optimality guarantees. In this paper, we provide a rigorous theoretical analysis of the sub-network Laplace paradigm. We prove that all sub-network Laplace methods systematically underestimate the predictive variance of the full Laplace posterior, and that this bias decreases monotonically as the retained sub-matrix expands. Leveraging this insight, we propose two principled, analytically grounded sub-network Hessian approximations: \textit{Gradient-Laplace} selects parameters with the largest average squared gradients of the model output with respect to the parameters over a reference dataset, while \textit{Greedy-Laplace} iteratively refines this selection by accounting for off-diagonal interactions in the precision matrix. We establish theoretical guarantees characterizing their optimality properties and show that Gradient-Laplace provably outperforms existing heuristic approaches. Extensive numerical studies across diverse settings indicate that these methods perform strongly relative to existing benchmarks.

2605.09064 2026-05-12 stat.AP

Bayesian decision theory for wildlife management under uncertainty: from inference to action

Olivier Gimenez, Abby Keller, Cyril Milleret

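AI summary This study examines how Bayesian decision theory can turn ecological inference into wildlife management decisions under uncertainty. It presents a practical workflow built on standard Bayesian tools and illustrates it with two case studies: wolf management in France and muskrat control in the Netherlands. The results show that the approach makes trade-offs between objectives explicit and improves the transparency and practical relevance of decisions, providing a flexible framework for integrating ecological modelling with management decisions.

Comments 3 figures

Details
English abstract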

Ecologists are increasingly expected to inform management decisions under uncertainty, yet most analytical workflows stop at statistical inference. This disconnect limits the practical impact of ecological modelling, particularly in high-stakes contexts such as wildlife management, where decisions must balance ecological, economic and social objectives. Bayesian decision theory provides a coherent framework to bridge this gap. It propagates uncertainty from posterior distributions to quantify the consequences of alternative actions through utility functions. Despite its strong theoretical foundations, it remains underused in ecology. Here, we present a practical workflow for implementing Bayesian decision theory using standard Bayesian tools. We illustrate the approach with two case studies. First, wolf management in France, where the decision consists of selecting the number of wolves that can be removed under uncertainty about population dynamics. Second, invasive muskrat management in the Netherlands, where the decision involves allocating a fixed control effort across space. In both cases, expected utility is computed from posterior simulations, explicitly accounting for uncertainty and trade-offs. Results show that optimal decisions emerge as a compromise between competing objectives. In the wolf case, optimal harvest balances removal benefits and population risk. In the muskrat case, optimal effort increases with the importance of population reduction and is unevenly allocated across provinces. These examples show that Bayesian decision theory can be implemented as a direct extension of standard inference. By making trade-offs explicit, it enhances transparency, reproducibility, and relevance for management. More broadly, it provides a flexible basis for integrating ecological modelling with decision-making.
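
The step "expected utility is computed from posterior simulations" can be sketched as follows. This is a toy illustration, not the paper's wolf or muskrat model: the population size, viability threshold, utility weights, and the Gaussian stand-in for posterior draws are all hypothetical.

```python
import random

random.seed(0)
# Stand-in for posterior draws of the annual growth rate (hypothetical values;
# in practice these would come from a fitted Bayesian model).
growth_draws = [random.gauss(1.10, 0.10) for _ in range(4000)]

N_NOW = 1000               # hypothetical current population size
THRESHOLD = 900            # hypothetical viability threshold for next year
BENEFIT_PER_REMOVAL = 1.0  # hypothetical benefit of each removal
RISK_PENALTY = 500.0       # hypothetical cost of falling below the threshold

def utility(removals, growth):
    # Utility of one action under one posterior draw.
    next_pop = growth * (N_NOW - removals)
    penalty = RISK_PENALTY if next_pop < THRESHOLD else 0.0
    return BENEFIT_PER_REMOVAL * removals - penalty

def expected_utility(removals):
    # Monte Carlo average of the utility over the posterior draws.
    return sum(utility(removals, g) for g in growth_draws) / len(growth_draws)

actions = list(range(0, 301, 25))
eu = {a: expected_utility(a) for a in actions}
best_action = max(actions, key=lambda a: eu[a])
```

With these assumed numbers the best action is an interior compromise: removing nothing forgoes the benefit, while removing the maximum makes the population risk penalty dominate.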

2605.08995 2026-05-12 stat.ME

Semiparametric Elliptical Mixture Clustering for High-Dimensional Data

Long Feng, Dan Zhuang

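AI summary This paper proposes a semiparametric elliptical mixture clustering method for high-dimensional data, targeting settings where cluster distributions are heavy tailed and only approximately elliptical. The framework combines cluster-specific centers, an unknown common radial generator, and a sparse common precision-shape matrix with a data-driven rule for choosing the number of clusters. A generalized expectation-maximization (GEM) algorithm based on transformed-radius estimation, radial-score center updates, and a Tyler-POET-GLASSO update keeps computation feasible in high dimensions; simulations and a handwritten-digit application show competitive clustering performance and robustness.

Details
English abstract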

Clustering high-dimensional data is especially challenging when cluster distributions are heavy tailed and only approximately elliptical. Existing high-dimensional methods are largely built for Gaussian or other light-tailed models, whereas classical robust elliptical procedures are mostly low dimensional or rely on fully parametric radial families. We propose a semiparametric elliptical mixture clustering framework with cluster-specific centers, an unknown common radial generator, and a common sparse precision-shape matrix, together with a data-driven rule for selecting the number of clusters. A generalized expectation-maximization (GEM) algorithm is developed by combining transformed-radius estimation of the radial generator, radial-score center updates, and a Tyler-POET-GLASSO update for the common precision-shape matrix. The method avoids specifying a parametric radial family and remains computationally feasible in high dimensions. We establish high-dimensional consistency for the estimated model components and the excess misclustering error. Simulation studies and a handwritten-digit application demonstrate the competitive performance and robustness of the proposed method, particularly in heavy-tailed elliptical settings.

2605.08980 2026-05-12 cs.LG math.OC stat.ML

Muon Does Not Converge on Convex Lipschitz Functions

Tetiana Parshakova, Ahmed Khaled, Michael Crawshaw, Guillaume Garrigos, Robert M. Gower

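AI summary This paper studies the convergence of the Muon optimizer on convex Lipschitz functions. Although Muon and its variants perform well in deep learning, existing convergence analyses rely on smoothness assumptions, whereas the convex Lipschitz class has underpinned the development of many optimization methods. The authors show that Muon does not converge on convex Lipschitz functions regardless of the learning rate schedule. Error feedback restores convergence but degrades performance on image classification and language modeling tasks, suggesting that Muon's success comes from structure absent from the convex Lipschitz model, most plausibly related to smoothness.

Details
English abstract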

Muon and its variants have shown strong empirical performance in a variety of deep learning tasks. Existing convergence analyses of Muon rely on smoothness assumptions, though arguably the most successful function class for developing deep learning methods (such as AdaGrad, Shampoo, Schedule-Free and more) has been the class of convex and Lipschitz functions. In this paper we question whether the classical convex Lipschitz model is a useful one for understanding Muon. Our answer is no. We show that Muon does not converge on the class of convex and Lipschitz functions, regardless of the choice of learning rate schedule. We also show that error feedback restores convergence of Muon and all the non-Euclidean subgradient methods with momentum. However, this theoretical fix using error feedback degrades the performance of Muon in two representative settings for image classification (CIFAR-10) and language modeling (nanoGPT on FineWeb-Edu 10B). Our conclusion is that convex Lipschitz theory, despite having a prominent role in the design of practical methods for deep learning, is not the most suited one for Muon. This suggests that Muon's success must come from structure absent from this model, most plausibly related to smoothness.

2605.08963 2026-05-12 stat.ML cs.LG

Survey-aware Machine Learning: A Guideline for Valid Population Health Inference based on Scoping Review

YongKyung Oh, Henry W. Zheng, Jeffrey Feng, Alex A. T. Bui

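AI summary Machine learning models trained on complex health surveys such as NHANES often ignore survey design information. To support valid population health inference, this study proposes Survey-aware Machine Learning (SaML), a nine-step guideline. A scoping review of 16 methodological papers summarizes existing work on weighted model training, design-based cross-validation, and survey-adjusted performance evaluation, and identifies gaps in hyperparameter tuning and deployment. SaML gives step-by-step guidance for different analytical objectives, with the aim of valid inference and fairness assessments that reflect the population.

Details
English abstract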

Machine Learning (ML) models trained on complex health surveys such as the National Health and Nutrition Examination Survey (NHANES) often ignore primary sampling units, stratification variables, and sampling weights. This practice violates the independence assumptions of standard evaluation methods. As a result, estimates become biased, uncertainty is underestimated, and fairness assessments fail to reflect population-level disparities. We propose Survey-aware Machine Learning (SaML), a nine-step guideline that incorporates survey design metadata across the ML lifecycle. Through a scoping review of 16 methodological papers, we summarize existing work on weighted model training, design-based cross-validation, and survey-adjusted performance evaluation. We also identify gaps in hyperparameter tuning and deployment. We provide task-specific guidance that clarifies which steps are required for different analytical objectives. SaML provides a checklist for valid population inference from survey data.

2605.08928 2026-05-12 math.OC stat.ML

Learning Generative Dynamics with Soft Law Constraints: A McKean-Vlasov FBSDE Approach

Samer El Boustany, Samy Mekkaoui, Yadh Hafsi, Alexandre Alouadi, Huyên Pham

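AI summary This paper proposes a generative framework based on McKean-Vlasov forward-backward stochastic differential equations (FBSDEs) for learning stochastic dynamics from endpoint and intermediate distributional observations. Terminal and time-marginal laws are enforced through soft energy constraints, and generation is cast as a control problem driven by a mean-field objective, offering an alternative to hard interpolation or optimal transport maps. Experiments show that the method produces smooth stochastic paths matching the prescribed marginals and demonstrate its use for structured distribution learning on high-dimensional human motion data.

Details
English abstract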

We propose a generative framework for learning stochastic dynamics from endpoint and intermediate distributional observations. The method formulates generation as a McKean-Vlasov control problem in which terminal and time-marginal laws are enforced through soft energy constraints. The associated optimality system is a forward-backward stochastic differential equation (FBSDE) whose backward component receives a continuous drift induced by the marginal law penalties. This provides a principled alternative to hard interpolation or optimal transport maps between observed distributions: the model learns a stochastic path law whose dynamics remain globally coupled through the mean-field objective. We derive the reduced FBSDE system for quadratic control cost and constant diffusion, connecting terminal and marginal law flat derivatives to score-like training signals. The resulting neural solver is evaluated on low-dimensional distributional benchmarks, where it recovers smooth stochastic paths matching prescribed marginal laws. In a higher-dimensional ALAE latent space, endpoint supervision is used as a qualitative stress test for transporting non-smiling faces toward smiling ones in a pretrained representation. We then use articulated human motion as a structured high-dimensional case study on a curated AMASS low-to-high position dataset, using SMPL-H pose sequences and reduced pose representations. The experiments show that soft marginal law constraints can produce coherent stochastic trajectories whose intermediate distributions follow the observed evolution of human motion. The code is available at https://github.com/murex/deep-mkv-gen/tree/main.

2605.08873 2026-05-12 cs.LG stat.AP stat.ML

CoDistill-GRPO: A Co-Distillation Recipe for Efficient Group Relative Policy Optimization

Soo Min Kwon, Ziteng Sun, Ananda Theertha Suresh, Himanshu Jain, Sanjiv Kumar

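AI summary Group Relative Policy Optimization (GRPO) is an effective algorithm for improving the reasoning ability of language models, but sparse rewards on difficult tasks make it hard to improve small models. This paper proposes CoDistill-GRPO, a co-distillation method that jointly trains a large and a small model with carefully designed GRPO objectives so that the two learn from each other, improving the small model and lowering training cost. Experiments show that CoDistill-GRPO substantially outperforms standard GRPO on several mathematical benchmarks and also yields efficiency gains when training the large model.

Details
English abstract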

Group Relative Policy Optimization (GRPO) has emerged as a powerful algorithm for improving the reasoning capabilities of language models, but often fails to improve small models due to sparse rewards on difficult tasks. Existing works mitigate this issue by leveraging a larger model, either to provide hints for rollouts or to provide dense reward signals through knowledge distillation (KD). However, this assumes the existence of such an oracle, and training one can significantly increase total training time. In this work, we propose CoDistill-GRPO, a co-distillation algorithm that simultaneously trains a large and a small model by maximizing carefully designed GRPO objectives. The two models learn from each other: the small model uses an on-policy KD reward to learn from the large model's distribution, while the large model is updated using rollouts generated by the small model with importance reweighting, reducing the computational overhead of rollout generation. We show that CoDistill-GRPO substantially improves small model performance over standard GRPO on mathematical benchmarks across both Qwen and Llama models. Specifically, with Qwen2.5-Math-1.5B, we observe an accuracy increase of over 11.6 percentage points over the base model and an additional 6.0 percentage points over GRPO on the Minerva dataset. Interestingly, the larger model (Qwen2.5-Math-7B) trained with CoDistill-GRPO nearly matches standard GRPO performance despite training on small-model rollouts. This highlights CoDistill-GRPO as a cost-effective alternative to GRPO for larger models, yielding an approximate 18% speedup, which may be of independent interest.

2605.08871 2026-05-12 math.OC cs.DC cs.LG stat.ML

Rennala MVR: Improved Time Complexity for Parallel Stochastic Optimization via Momentum-Based Variance Reduction

Zhirayr Tovmasyan, Artavazd Maranjyan, Peter Richtárik

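AI summary This paper studies whether variance reduction can improve the time complexity of parallel stochastic optimization on heterogeneous compute clusters. The authors propose Rennala MVR, a momentum-based variance-reduced extension of Rennala SGD, and show under a mean-squared smoothness assumption that it improves time complexity in relevant parameter regimes. Experiments support the theory and show empirical gains over Rennala SGD.

Details
English abstract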

Large-scale machine learning models are trained on clusters of machines that exhibit heterogeneous performance due to hardware variability, network delays, and system-level instabilities. In such environments, time complexity rather than iteration complexity becomes the relevant performance metric for optimization algorithms. Recent work by Tyurin and Richtárik (2023) established the first time complexity analysis for parallel first-order stochastic optimization, proposing Rennala SGD as a time-optimal method for smooth nonconvex optimization. However, Rennala SGD is fundamentally a modification of SGD, and variance reduction techniques are known to improve the iteration complexity of SGD. In this work, we investigate whether variance reduction can also improve time complexity in heterogeneous systems. We show that, under a mean-squared smoothness assumption, variance reduction can improve time complexity in relevant parameter regimes. To this end, we propose Rennala MVR, a variance-reduced extension of Rennala SGD based on momentum-based variance reduction, and analyze its oracle and time complexity. We establish lower bounds for time complexity under these assumptions. On a stochastic quadratic benchmark, experiments with the exact method support the theory, while neural-network experiments with a practical inexact variant show similar empirical gains over Rennala SGD.

2605.08866 2026-05-12 stat.ML cs.LG math.OC

Tight Generalization Bounds for Noiseless Inverse Optimization

Pouria Fatemi, Hoomaan Maskan, Suvrit Sra, Peyman Mohajerin Esfahani

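AI summary This paper studies noiseless inverse optimization, which infers the parameters of a decision-maker's objective from observed context-action data. The authors give a high-probability $O(\frac{d}{T})$ generalization bound and strengthen it under additional conditions, bringing the guarantees in line with best-arm identification results in the bandit literature. They further show that the rate is tight over the consistent estimators considered, extend the result to instantaneous and cumulative regret, and validate the theory experimentally.

Comments 29 pages, 2 figures

Details
English abstract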

Inverse optimization (IO) seeks to infer the parameters of a decision-maker's objective from observed context--action data. We study noiseless IO, where demonstrations are generated by a ground-truth objective. We provide a high-probability ${O}(\frac{d}{T})$ generalization bound for the induced action set, where $d$ is the number of unknown parameters and $T$ is the size of the training dataset. We strengthen these guarantees under additional conditions that ensure uniqueness of the chosen action, bringing our IO guarantees in line with best-arm identification results in the bandit literature. We further show that the ${O}(\frac{d}{T})$ rate is tight over all consistent estimators considered here, and extend the result to both instantaneous and cumulative regret. Notably, the resulting regret lower bound matches the corresponding upper bounds in the adversarial setting, indicating that the stochastic IO setting is effectively adversarial for the class of estimators studied here. Finally, we propose a parameter-free algorithm with lower per-iteration complexity than generic solvers. Experiments validate the predicted rates and illustrate the tightness of our bounds.

2605.08864 2026-05-12 cs.LG math.ST stat.TH

Higher-Order Equilibrium Tracking for EM-Compressible Online Estimation

ZhiMing Li, Yue Song

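AI summary This paper studies online estimation in latent-variable models, recasting it as tracking a moving empirical equilibrium. The proposed framework decomposes the online estimate into the frozen batch equilibrium at the current running statistic plus a tracking lag, and proves that, under a condition on the tracking error, the online estimator inherits the batch central limit theorem and the sharp first-order risk constant. The paper also introduces EM-compressibility and related notions as structural conditions for online tracking, and instantiates the theory in latent linear Gaussian covariance estimation.

Comments 41 pages, 6 figures

Details
English abstract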

We study online estimation in latent-variable models by recasting the problem as tracking a moving empirical equilibrium. Standard online EM and stochastic approximation analyses primarily study convergence toward the population parameter and typically do not isolate the empirical batch optimum from the online tracking error at finite horizon. Our framework decomposes the online estimate into the frozen batch equilibrium at the current running statistic and a tracking lag that captures the algorithm's delay behind this moving target. We prove a batch-to-online transfer theorem: provided $\lVert e_T \rVert_{L^{2}} = o(T^{-1/2})$, the online estimator inherits the batch central limit theorem and the sharp first-order risk constant. Our key observation is that the empirical optimum evolves on a smooth equilibrium manifold indexed by the running statistic. An $m$-th order equilibrium-jet predictor combined with an order-$ν$ frozen corrector yields localized tracking rates $O(T^{-ν(m+1)})$. We formalize EM-compressibility and EM-jet$^R$-compressibility as the structural conditions that make the equilibrium response and the Newton corrector evaluable from a retained streaming statistic. The theory is instantiated in latent linear Gaussian covariance estimation, where the first-order scheme operates on a compressed $d \times d$ statistic with explicit finite-sample risk envelopes and a certified restart rule.

2605.08850 2026-05-12 math.OC cs.LG stat.ML

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

Peter Richtárik, Kaja Gruntkowska, Hanmin Li

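AI summary This paper proposes Local LMO, a new projection-free gradient-type method for constrained optimization. The core idea is to replace the global linear minimization oracle of the classical Frank-Wolfe method with a local one that minimizes over the intersection of the constraint set and a small ball around the current iterate. Local LMO transfers the convergence rates of projected gradient descent to several important regimes and, without requiring a bounded constraint set or a curvature assumption, obtains guarantees for convex and strongly convex functions as well as non-convex, stochastic, and non-smooth problems.

Comments 71 pages, 8 figures

Details
English abstract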

We design Local LMO - a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, and $t_k>0$ is a suitably chosen radius which can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.
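
The claim that Local LMO reduces to GD in the unconstrained setting can be checked numerically. The sketch below assumes a Euclidean ball, for which the minimizer of $\langle g, z\rangle$ over $\mathcal{B}(x, t)$ is $x - t\,g/\lVert g\rVert$; the objective $f(x)=\tfrac12\lVert x\rVert^2$, the radius rule $t_k = \gamma\lVert\nabla f(x_k)\rVert$, and all function names are illustrative assumptions, not taken from the paper.

```python
import math

def local_lmo_step(x, grad, radius):
    # Unconstrained Local LMO step with a Euclidean ball (assumption):
    # argmin over ||z - x|| <= radius of <grad, z> is x - radius * grad / ||grad||.
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)  # zero gradient: every point of the ball is a minimizer
    return [xi - radius * gi / norm for xi, gi in zip(x, grad)]

def grad_f(x):
    # Gradient of the illustrative objective f(x) = 0.5 * ||x||^2.
    return list(x)

def run_local_lmo(x0, step_size, iters):
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        # Radius t_k = step_size * ||grad||, so the step equals x - step_size * grad.
        t_k = step_size * math.sqrt(sum(gi * gi for gi in g))
        x = local_lmo_step(x, g, t_k)
    return x

def run_gd(x0, step_size, iters):
    # Plain gradient descent, for comparison.
    x = list(x0)
    for _ in range(iters):
        x = [xi - step_size * gi for xi, gi in zip(x, grad_f(x))]
    return x

x_lmo = run_local_lmo([3.0, -4.0], 0.5, 20)
x_gd = run_gd([3.0, -4.0], 0.5, 20)
```

With a fixed radius the same step becomes normalized gradient descent, which is why the radius can be read as an effective stepsize.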

2605.08811 2026-05-12 stat.ML cs.LG

Learning Theory of Transformers: Local-to-Global Approximation via Softmax Partition of Unity

Zhongjie Shi, Wenjing Liao

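AI summary This paper studies the learning theory of Transformer networks for regression on compact Euclidean domains and compact Riemannian manifolds. It proposes a constructive approximation framework based on a softmax partition of unity, in which the attention mechanism aggregates local approximations into a global one. A dense Transformer with only two encoder blocks and standard single-hidden-layer feed-forward networks achieves a uniform $\varepsilon$-approximation of $α$-Hölder continuous functions with $\mathcal{O}(\varepsilon^{-d/α})$ parameters. The resulting generalization error bound is near minimax-optimal, of order $\mathcal{O}\big(n^{-\frac{2α}{2α+d}} \log n\big)$, where $n$ is the number of training samples.

Details
English abstract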

This paper investigates the learning theory of Transformer networks for regression tasks on the compact Euclidean domain $[0,1]^d$ and $d$-dimensional compact Riemannian manifolds. We propose a novel constructive approximation framework for Transformers that builds local approximations of the target function and aggregates them into a global approximation via softmax partition of unity. This approach leverages the attention mechanism to achieve spatial localization through affine transformations of the input. The softmax activation plays a crucial role in aggregating local approximations to a global output. From an approximation perspective, we prove that a dense Transformer equipped with only two encoder blocks and standard single-hidden-layer point-wise feed-forward networks can achieve a uniform $\varepsilon$-approximation error for $α$-Hölder continuous functions with $α\in (0,1]$ using $\mathcal{O}(\varepsilon^{-d/α})$ total parameters. Building upon this approximation guarantee, we establish a near minimax-optimal generalization error bound of order $\mathcal{O}\big(n^{-\frac{2α}{2α+d}} \log n\big)$ for the empirical risk minimizer, where $n$ is the training data size. The Transformer architecture studied in this paper is dense, shallow and wide, and employs softmax activation and sinusoidal positional encodings, closely reflecting practical implementations.
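
A one-dimensional sketch may help show how softmax weights can act as a partition of unity. It is not the paper's Transformer construction: the grid of centers, the sharpness constant, the target function, and the use of point values as local approximations are all assumptions. The logits are affine in the input because $-\beta(x-c)^2$ and $\beta(2cx-c^2)$ differ only by a term common to all centers, which softmax ignores.

```python
import math

def softmax_weights(x, centers, sharpness):
    # Logits affine in x; equivalent to -sharpness * (x - c)^2 under softmax.
    logits = [sharpness * (2.0 * c * x - c * c) for c in centers]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def target(x):
    # Illustrative Lipschitz target on [0, 1].
    return math.sin(2.0 * math.pi * x)

NUM_CENTERS = 50
centers = [(j + 0.5) / NUM_CENTERS for j in range(NUM_CENTERS)]
SHARPNESS = 4.0 * NUM_CENTERS ** 2      # weights localize at the grid scale
local_values = [target(c) for c in centers]  # crude local approximations

def approximate(x):
    # Global approximation: softmax-weighted sum of the local approximations.
    w = softmax_weights(x, centers, SHARPNESS)
    return sum(wj * vj for wj, vj in zip(w, local_values))

grid = [i / 200 for i in range(201)]
max_error = max(abs(approximate(x) - target(x)) for x in grid)
weight_sum = sum(softmax_weights(0.37, centers, SHARPNESS))
```

Increasing the number of centers (and the sharpness with it) shrinks the error, which mirrors the local-to-global argument in spirit.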

2605.08793 2026-05-12 cs.MS cs.AI cs.LG stat.CO stat.ML

cuRegOT: A GPU-Accelerated Solver for Entropic-Regularized Optimal Transport

Yixuan Qiu

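AI summary Optimal transport (OT) has become a fundamental tool in modern machine learning, but its computational cost remains a significant bottleneck at scale. This paper presents cuRegOT, a high-performance GPU solver for entropic-regularized OT. It combines algorithmic and architectural optimizations, including amortized symbolic analysis, an asynchronous mechanism for generating Sinkhorn iterates, and a fused kernel design, and reports speedups over existing GPU-based solvers across several benchmark tasks.

Details
English abstract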

Optimal transport (OT) has emerged as a fundamental tool in modern machine learning, yet its computational cost remains a significant bottleneck for large-scale applications. While harnessing the massive parallelism of modern GPU hardware is critical for efficiency, the de facto standard Sinkhorn algorithm, despite its ease of parallelization, often suffers from slow convergence in challenging problems. More recently, the sparse-plus-low-rank quasi-Newton method offers a balance between convergence rate and per-iteration complexity; however, its efficiency on GPUs is severely hindered by the serial nature of sparse matrix symbolic analysis and irregular memory access patterns. To bridge this gap, we present cuRegOT, a high-performance GPU solver tailored for entropic-regularized OT. We introduce a suite of algorithmic and architectural optimizations, including an amortized symbolic analysis strategy to mitigate CPU bottlenecks, an asynchronous Sinkhorn iterates generation mechanism, and a fused kernel for bandwidth-efficient gradient evaluation. These strategies are backed by rigorous theoretical guarantees ensuring algorithmic convergence. Extensive numerical experiments demonstrate that cuRegOT achieves significant speedups over state-of-the-art GPU-based solvers across a variety of benchmark tasks.
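
For readers unfamiliar with the baseline, here is a minimal CPU sketch of the standard Sinkhorn iteration for entropic-regularized OT. It is not cuRegOT and makes no use of a GPU; the marginals, cost matrix, regularization strength, and iteration count are illustrative assumptions.

```python
import math

def sinkhorn(cost, a, b, reg=0.5, iters=500):
    # Gibbs kernel K_ij = exp(-cost_ij / reg).
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical problem: 3 source points, 2 target points, squared distance cost.
xs, ys = [0.0, 1.0, 2.0], [0.5, 1.5]
a, b = [0.5, 0.3, 0.2], [0.4, 0.6]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
plan = sinkhorn(cost, a, b)

row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(a))) for j in range(len(b))]
```

Each iteration is a pair of matrix-vector products, which is what makes the method easy to parallelize; with small regularization the kernel entries become tiny and many more iterations are needed.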

2605.08777 2026-05-12 stat.ML cs.LG math.PR

Measuring and Decomposing Mode Separation via the Canonical Diffusion

Shaul Tolkovsky, Ori Meidler, Or Zuk

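AI summary This paper studies how to measure mode separation, that is, how sharply a distribution fragments into barrier-separated clusters, a property that is hard to quantify in high dimensions. The authors use a reversible diffusion whose stationary distribution is the density and extract two readouts from its autocovariance matrix: SSA (Sum of Squared Autocorrelations), a barrier-sensitive measure of separation, and DA (Dominant Autocorrelation directions), which captures metastable structure. The method needs only samples and a score function, scales to high-dimensional data, and is evaluated on synthetic Gaussian mixtures, text-to-image generations, and molecular dynamics.

Details
English abstract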

Mode separation, namely how sharply a distribution fragments into barrier-separated clusters, is a fundamental geometric property of densities, difficult to quantify in high dimensions. It is structurally distinct from dispersion, yet existing tools fall short: differential entropy rises with spread regardless of fragmentation, PCA orders directions by variance regardless of barriers, and mutual information requires a mixture decomposition one usually does not have. We measure mode separation through a single stochastic process intrinsic to the density: a unique reversible diffusion with $f$ as its stationary distribution and constant scalar diffusion coefficient. We extract two readouts from its autocovariance matrix: SSA (Sum of Squared Autocorrelations), a scalar barrier-sensitive measure; and DA (Dominant Autocorrelation directions), linear projections ordered by metastability rather than variance. Under an isotropic-Gaussian null, we derive a closed-form spectrum for the empirical autocovariance that generalizes Marchenko--Pastur, with an analytic upper edge that selects the lag at which DA is read off. Both readouts use only samples and a score function, scaling to high dimensions through pretrained score-based generative models via Tweedie's identity. We apply our framework to three settings: (i) synthetic Gaussian mixtures, where SSA tracks mutual information; (ii) SDXL text-to-image generations, where SSA and DA capture structure that entropy and PCA miss; and (iii) molecular dynamics of alanine dipeptide, where DA recovers the known slow backbone dihedrals from static samples alone.

2605.08773 2026-05-12 stat.ME

Prediction-Powered Linear Regression: A Balance Between Interpretation and Prediction

Fuzhi Xu, Xingyu Yan, Xinyu Zhang

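AI summary In economic studies, using abundant unlabeled data to improve prediction is challenging, especially when outcomes are hard to observe. This paper proposes the Prediction-powered Unified Model Averaging (PUMA) framework, which combines linear regression with machine learning methods to balance interpretability and predictive power. Through model averaging it jointly handles uncertainty from model misspecification, power-tuning selection, and the choice of machine learning algorithm; the paper establishes asymptotic prediction optimality and estimation consistency and reports empirical advantages.

Details
English abstract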

Unlabeled data are increasingly prevalent in contemporary economic studies, yet their effective use for improving prediction remains challenging because the outcomes are often costly or even infeasible to observe. Machine learning methods can help label these data and achieve high predictive accuracy, but they often lack interpretability. In this paper, we propose a Prediction-powered Unified Model Averaging (PUMA) framework to combine linear regression and machine learning methods, achieving a balance between interpretation and prediction. Unlike existing works on prediction powered inference, our approach is the first to jointly address uncertainty arising from model misspecification, power-tuning selection, and the choice of machine learning algorithms by using model averaging. Theoretically, we establish the asymptotic prediction optimality of the proposed method both in-sample and out-of-sample under mild conditions, along with estimation consistency. Extensive simulations and a real-world application further demonstrate the empirical advantages of the proposed method.

2605.08753 2026-05-12 cs.CV stat.ML

Simultaneous Monitoring of Shape and Surface Color via 4D Point Clouds: A Registration-free Approach

Mariafrancesca Patalano, Giovanna Capizzi, Kamran Paynabar

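AI summary This paper proposes SMAC, a registration-free framework based on 4D point clouds for simultaneously monitoring shape and surface color. It uses spectral properties of the Laplace-Beltrami operator to capture the relationship between shape and color, and a combined monitoring scheme to detect shape deformations and color anomalies. A spatially-aware post-signal diagnostic procedure localizes the source of an anomaly. The method needs neither registration nor mesh reconstruction, and experiments show effective detection, particularly of subtle defects.

Comments 38 pages, 11 figures

Details
English abstract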

Advanced manufacturing technologies allow for the production of intricate parts featuring high shape complexity and spatially-varying material composition. Data fusion of point clouds with chromatic attributes provides 4D point clouds, a compact and informative representation that encodes both shape and material information. In this paper, we present a registration-free framework for Simultaneous Monitoring of shApe and Color (SMAC) via 4D point clouds. The proposed framework leverages Laplace-Beltrami operator spectral properties to capture and monitor geometric features and the relationship between shape and surface color. A combined monitoring scheme is proposed to effectively detect shape deformations and color anomalies, along with a spatially-aware post-signal diagnostic procedure to determine the source of change and localize color anomalies. Importantly, neither component relies on registration or mesh reconstruction, eliminating error-prone and computationally expensive preprocessing steps. A Monte Carlo simulation study and a case study on functionally graded materials demonstrate that SMAC achieves effective detection performance, particularly for subtle defects, while providing diagnostic capabilities to identify the source and location of anomalies.

2605.08705 2026-05-12 math.ST math.PR stat.ME stat.ML stat.TH

Minimax Optimal Estimation of Transport-Growth Pairs in Unbalanced Optimal Transport

Donlapark Ponnoprat, Noboru Isobe, Masaaki Imaizumi

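AI summary This paper studies minimax optimal estimation of transport-growth pairs in unbalanced optimal transport (UOT). Unlike classical optimal transport, UOT allows the source and target measures to have different total masses, and the authors argue that the natural population target is a transport-growth pair rather than a single map. They propose two estimators and prove that the estimation error attains the minimax optimal rate. The main technical contribution converts perturbations of the objective into transport and growth risks through a UOT gap condition, providing a statistical foundation for Monge-type estimation in unbalanced optimal transport.

Comments 70 pages

Details
English abstract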

Unbalanced optimal transport (UOT) extends classical optimal transport to measures with different total masses, but statistical guarantees for Monge-type estimation remain limited. We study unbalanced transport with quadratic cost and Kullback-Leibler marginal penalties and argue that the natural population target is not a map alone, but a transport-growth pair. Consequently, we develop two estimators for the transport-growth pairs under several setups: an optimal transport plan-based estimator for a general case, and a kernel-based estimator for a case with smooth densities. We also show that the estimation error achieves the minimax optimal rate by deriving a matching lower bound on the minimax risk. Our main technical contribution is a value-based stability reduction that converts perturbations of the UOT objective into transport and growth risks through a UOT gap condition. These results provide a statistical foundation for Monge-type estimation in unbalanced optimal transport.

2605.08681 2026-05-12 stat.ML cs.AI cs.LG cs.NA math.NA

Core-Halo Decomposition: Decentralizing Large-Scale Fixed-Point Problems

Haixiang, Yang Xu, Jiefu Zhang, Xudong Wu, Zihan Zhou, Jun He, Jiayu Chen

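AI summary This paper studies how to solve the large-scale fixed-point equation $x^\star = \bar{F}(x^\star)$ by decomposition. Standard strict decomposition assigns variables to separate agents, but this truncates dependencies and introduces structural bias. The authors therefore propose Core-Halo decomposition, which separates write operations from read operations: each agent updates its own core variables while reading overlapping halo variables, so the original fixed-point problem is implemented faithfully. Experiments show that the method achieves near-centralized performance while retaining the benefits of decentralization.

Details
English abstract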

We study how to solve the large-scale fixed-point equation $x^\star=\bar F(x^\star)$ by decomposition. Standard strict decomposition assigns each agent a disjoint block and evaluates updates using only owned coordinates. For most operators, however, a block update may depend on variables outside the block. Truncating these dependencies by strict decomposition changes the mean operator and creates structural bias that cannot be removed by more samples, smaller stepsizes, or additional consensus. We therefore propose Core-Halo decomposition, which separates write ownership from read-only evaluation context: each agent updates its own core and reads from an overlapping halo. By aligning the Core-Halo decomposition with the block-dependence structure of $\bar F$, the original fixed-point problem can be implemented faithfully in a decentralized multi-agent system. We further characterize the fundamental obstruction faced by strict decomposition through a Bellman closure condition and a blockwise bias lower bound, showing that local-only updates can alter the original fixed-point operator. Finally, we conduct extensive experiments across a range of application settings, and demonstrate that Core-Halo achieves near-centralized performance while retaining the parallelism benefits of decentralization.
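The contrast between strict and Core-Halo decomposition can be sketched on a toy problem. The block below assumes a linear contraction $\bar F(x) = Ax + b$ on a chain with nearest-neighbour coupling; the operator, the block layout, and the names `update_coordinate` and `iterate` are illustrative assumptions, not the authors' setup. In the strict variant an agent treats coordinates it does not own as zero; in the Core-Halo variant it reads them but still writes only its own core.

```python
def update_coordinate(i, n, read):
    """Coordinate i of F(x) = A x + b, with 0.3 on the off-diagonals of A and b = 1."""
    total = 1.0
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            total += 0.3 * read(j)
    return total


def iterate(n, blocks, use_halo, sweeps=200):
    """Synchronous fixed-point iteration where each agent writes only its own block."""
    x = [0.0] * n
    for _ in range(sweeps):
        new_x = list(x)
        for block in blocks:
            owned = set(block)

            def read(j, owned=owned, snapshot=x):
                # Core-Halo: out-of-block values are readable.
                # Strict: out-of-block dependencies are truncated to zero.
                return snapshot[j] if (use_halo or j in owned) else 0.0

            for i in block:
                new_x[i] = update_coordinate(i, n, read)
        x = new_x
    return x


blocks = [[0, 1, 2], [3, 4, 5]]
central = iterate(6, [list(range(6))], use_halo=False)  # one agent owns everything
halo = iterate(6, blocks, use_halo=True)
strict = iterate(6, blocks, use_halo=False)
print(max(abs(c - h) for c, h in zip(central, halo)))
print(max(abs(c - s) for c, s in zip(central, strict)))
```

In this toy case the Core-Halo iterates coincide with the centralized ones, while the strict iterates converge to a different fixed point no matter how many sweeps are run, which is the kind of structural bias the abstract describes.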

2605.08677 2026-05-12 math.ST stat.TH

Bridging Theory and Practice: Statistical Inference for Latent Space Models of Networks

Yuang Tian, Jiajin Sun, Yinqiu He

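AI summary This paper studies statistical inference for latent space models of network data, aiming to close the gap between theoretical analysis and practical algorithms. The authors propose a unified analytical framework that relaxes the spectral-multiplicity constraint in existing theory, and they develop new adaptive criteria and theoretical tools that remove the dependence on unknown true parameters. The work also explicitly connects the outputs of projected gradient descent and singular value thresholding to the maximum likelihood estimator, providing a theoretical foundation for practical and statistically rigorous inference in network analysis.

Details
English abstract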

Latent space models have been widely adopted in modeling network data. Developing statistical inference for estimated model parameters enables quantifying associated uncertainty and is pivotal for downstream tasks. Despite recent progress on statistical inference for maximum likelihood estimation, crucial gaps remain between asymptotic theoretical guarantees and practical use. Specifically, how are the oracle maximum likelihood estimators related to the solutions produced by algorithms in practice? Can rigorous guarantees be established for existing algorithms without unnecessary restrictions? To address these fundamental questions, we develop a unified analytical framework that bridges theory and practice of statistical inference for latent space models. First, for maximum likelihood estimation, we relax the spectral-multiplicity constraint in the existing asymptotic theory to broaden the applicability. Second, we overcome the dependence on unknown true parameters in prior algorithmic analyses by developing novel adaptive criteria and theoretical tools. For the widely used algorithms based on projected gradient descent and singular value thresholding, we explicitly connect their outputs to the maximum likelihood estimator without relying on unknown information. Our results provide a solid foundation for practically useful and statistically principled inference in network analysis.

2605.08672 2026-05-12 math.ST cs.NA math.NA stat.ML stat.TH

Posterior Concentration of Bayesian Physics-Informed Neural Networks for Elliptic PDEs

Yuxuan Zhao, Yulong Lu

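AI summary This paper studies the posterior contraction rate of Bayesian physics-informed neural networks (PINNs) for solving a class of elliptic partial differential equations (PDEs). For elliptic equations with non-homogeneous Dirichlet boundary conditions, learned from noisy observations inside the domain and on the boundary, and assuming the exact solution lies in a Hölder space, the authors construct a suitable prior and prove that the posterior concentrates around the true solution at a near-optimal rate. The prior is rate-adaptive: it attains an almost optimal rate without prior knowledge of the smoothness of the exact solution, giving statistical guarantees for PDE uncertainty quantification via Bayesian PINNs.

Details
English abstract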

We study the posterior contraction rate of Bayesian Physics-Informed Neural Networks (PINNs) for solving a general class of elliptic partial differential equations (PDEs). We focus on learning of the elliptic equation with a non-homogeneous Dirichlet boundary condition from independent and noisy measurements collected both inside the domain and on the boundary. Assuming that the PDE admits a strong solution in a Hölder space and using a suitably constructed prior on the neural network weights, we prove that the posterior distribution concentrates around the exact solution at a near-minimax rate. Furthermore, the chosen prior is rate-adaptive: the posterior contracts at an (almost) optimal rate without prior knowledge of the smoothness level of the exact solution. Our results provide statistical guarantees for uncertainty quantification of PDEs via Bayesian PINNs.

2605.08656 2026-05-12 stat.ME

Bias Correction for Semiparametric Regression Models

Yuming Zhang, Yanyuan Ma, Xuming He, Stéphane Guerrier

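AI summary This paper studies a broad class of semiparametric regression models in which the form of the conditional distribution of the response is known but involves a high-dimensional parameter $\boldsymbol{\beta}$, a smooth function $m(\cdot)$, and a dispersion parameter $\phi$. Existing work focuses mainly on semiparametric efficiency for $\boldsymbol{\beta}$ and overlooks the finite-sample bias of $\phi$ and $m(\cdot)$ and its effect on inference. The authors propose SABRE, a simulation-based bias correction framework that reduces the estimation bias of $\boldsymbol{\beta}$ and $\phi$ without inflating variance, and they validate it through simulations and a real-data analysis.

Details
English abstract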

We consider a broad class of semiparametric regression models in which the conditional distribution of the response takes the form $f\{Y \mid \mathbf{x}^{\rm T}\boldsymbol{\beta}+m(z), \phi\}$, which is known up to a parametric component $\boldsymbol{\beta}$ of diverging dimension $p$, a smooth function $m(\cdot)$, and a dispersion parameter $\phi$. Existing semiparametric literature on such models has primarily focused on semiparametric efficiency for $\boldsymbol{\beta}$, typically treating $\phi$ and $m(\cdot)$ as nuisances and largely ignoring their finite-sample bias. However, the finite-sample bias of standard estimators can be substantial (especially when $p$ is large relative to $n$ and/or dispersion is high) and can seriously undermine inference for $\boldsymbol{\beta}$. Moreover, $\phi$ is often of direct scientific interest and requires accurate estimation. To address this gap, we propose SABRE, a simulation-based bias correction framework for this broad semiparametric model class. We establish asymptotic properties of SABRE for the subclass of generalized partially linear models, where bias reduction for $\boldsymbol{\beta}$ and $\phi$ can be achieved without inflating variance, and we outline how the underlying principle may be adapted more generally. Comprehensive simulation studies and a real-data application on early-stage diabetes demonstrate the empirical effectiveness of SABRE in reducing bias and improving inference.

2605.08637 2026-05-12 stat.ME

Spherical Mixture Integration for Latent Embedding Alignment across Multi-Source Feature Spaces

Yuming Zhang, Congyuan Duan, Dong Xia, Doudou Zhou, Tianxi Cai

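AI summary This paper studies integrative analysis of electronic health record (EHR) data from different medical institutions to improve model robustness and generalizability. To address inconsistent coding and semantic fragmentation across institutions, the authors propose SMILE, which aligns latent embeddings across heterogeneous feature spaces through a spherical mixture model and thereby unifies clinical concepts semantically. The method uses sparse auxiliary relationship pairs as weak supervision on the latent space and establishes non-asymptotic error bounds. The theory shows that combining multiple sources with auxiliary knowledge yields statistical gains, and experiments confirm its effectiveness for alignment and synonym clustering.

Details
English abstract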

Multi-institutional electronic health record (Multi-EHR) data have emerged as a powerful resource for developing predictive models to support clinical decisions and for generating reliable real-world evidence. By aggregating information from diverse patient populations and institutions, they enhance the robustness and generalizability of models and findings. However, analyzing multi-EHR data remains challenging because disparate institutions rarely map all data elements to a common ontology, and raw EHR codes are often overly granular and institution-specific, fragmenting representations of the same clinical concept. Hence, integrative analysis must overcome two key hurdles: harmonizing codes with the same clinical meaning (synonymy), and aligning institutional feature spaces. To address these challenges, we propose SMILE, a Spherical Mixture Integration for Latent Embedding alignment across multi-source feature spaces, where embeddings from heterogeneous sources serve as privacy-preserving summaries of clinical concepts and sparse auxiliary relationship pairs provide weak supervision on the latent geometry. Synonymy is modeled via a mixture of von Mises-Fisher distributions, yielding unified representations that consolidate semantically equivalent raw codes. We develop a composite quasi-likelihood estimation procedure and establish non-asymptotic error bounds for latent representations and mixture mean directions, together with consistent recovery of synonym clusters. The theory quantifies statistical gains from integrating multiple sources and auxiliary knowledge graph information. Simulations and a multi-institutional EHR application demonstrate improved alignment and synonym clustering.

2605.08561 2026-05-12 stat.ML cs.LG

CONTRA: Conformal Prediction Region via Normalizing Flow Transformation

Zhenhan Fang, Aixin Tan, Jian Huang

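AI summary This paper proposes CONTRA, a new method for producing reliable prediction regions for multi-dimensional outputs. The method defines nonconformity scores in the latent space of a normalizing flow, which avoids the loose prediction regions that traditional methods produce in high-dimensional spaces. CONTRA yields sharper prediction regions and can also be combined with existing predictive models to make their predictions more reliable. It applies to a variety of datasets.

Comments 18 pages, 7 figures and 5 tables

Details
Journal ref
International Conference on Learning Representations 2025
English abstract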

Density estimation and reliable prediction regions for outputs are crucial in supervised and unsupervised learning. While conformal prediction effectively generates coverage-guaranteed regions, it struggles with multi-dimensional outputs due to reliance on one-dimensional nonconformity scores. To address this, we introduce CONTRA: CONformal prediction region via normalizing flow TRAnsformation. CONTRA utilizes the latent spaces of normalizing flows to define nonconformity scores based on distances from the center. This allows for the mapping of high-density regions in latent space to sharp prediction regions in the output space, surpassing traditional hyperrectangular or elliptical conformal regions. Further, for scenarios where other predictive models are favored over flow-based models, we extend CONTRA to enhance any such model with a reliable prediction region by training a simple normalizing flow on the residuals. We demonstrate that both CONTRA and its extension maintain guaranteed coverage probability and outperform existing methods in generating accurate prediction regions across various datasets. We conclude that CONTRA is an effective tool for (conditional) density estimation, addressing the under-explored challenge of delivering multi-dimensional prediction regions.
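A minimal sketch of the calibration step behind latent-distance nonconformity scores, under loud assumptions: `to_latent` is a fixed invertible affine map standing in for a trained normalizing flow, the score is the Euclidean norm of the latent point, and the threshold is the usual split-conformal quantile. None of these names come from the CONTRA paper or its code.

```python
import math
import random


def to_latent(y):
    """Hypothetical stand-in for a trained flow: inverse of y = mu + L z,
    with mu = (1, -2) and L = [[2, 0], [1, 0.5]]."""
    d0, d1 = y[0] - 1.0, y[1] + 2.0
    z0 = d0 / 2.0
    z1 = (d1 - z0) / 0.5
    return (z0, z1)


def score(y):
    """Nonconformity score: distance of the latent point from the origin."""
    z0, z1 = to_latent(y)
    return math.hypot(z0, z1)


def conformal_radius(calibration, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = sorted(score(y) for y in calibration)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else scores[k - 1]


def in_region(y, radius):
    """The prediction region is the preimage of a latent ball of the given radius."""
    return score(y) <= radius


def sample(rng):
    """Draw y = mu + L z with standard normal z, matching the stand-in flow."""
    z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
    return (1.0 + 2.0 * z0, -2.0 + z0 + 0.5 * z1)


rng = random.Random(0)
radius = conformal_radius([sample(rng) for _ in range(500)], alpha=0.1)
test_points = [sample(rng) for _ in range(2000)]
print(radius, sum(in_region(y, radius) for y in test_points) / len(test_points))
```

Because the region is a ball in latent space pulled back through the map, its shape in output space follows the map rather than being restricted to a box; with a nonlinear flow the same calibration step would give non-elliptical regions.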

2605.08552 2026-05-12 stat.ML cs.LG

Learnability and Competition in High-Dimensional Multi-Component ICA

Eser Ilke Genc, Samet Demir, Zafer Dogan

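AI summary This paper studies learnability and competition in high-dimensional multi-component independent component analysis (ICA). It develops an asymptotically exact mean-field theory that reveals the coupling between estimated directions and true components during online learning. In the high-dimensional limit, the overlap matrix between the estimates and the true components satisfies a closed system of ordinary differential equations, which exposes two initialization-driven regimes: a decoupled regime and a competition regime. The theory gives explicit learnability boundaries and competition conditions linking learning rate, data moments, and initialization, and experiments validate the predicted trajectories and phase behavior.

Comments 56 pages, 9 figures

Details
English abstract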

Independent Component Analysis (ICA) is a foundational tool for unsupervised representation learning, yet its high-dimensional theory remains largely limited to single-component recovery. We develop an asymptotically exact mean-field theory for multi-component online ICA, capturing the coupling induced by simultaneous learning and orthogonalization. In the high-dimensional limit, the joint empirical distribution of learned estimates and ground-truth components converges to a deterministic process, yielding a closed ODE system for the overlap matrix between learned directions and true components. This characterization reveals a genuinely multi-component, initialization-driven phase structure: a decoupled regime, where estimates align with distinct components and evolve nearly independently, and a competition regime, where overlapping initializations induce orthogonality-driven conflicts, slow reorientation, and delayed convergence. Our steady-state analysis gives explicit learnability boundaries and competition conditions linking step size, data moments, and initialization. These conditions show that larger higher-order moments and competition shrink the stable learning-rate window, increase convergence times, and predict a staircase phenomenon in which the number of recoverable components changes discretely with the learning rate. Experiments on synthetic data and hyperspectral remote sensing data validate the predicted trajectories and phase behavior.

2605.08551 2026-05-12 econ.EM math.ST stat.ME stat.TH

Nonparametric Empirical Bayes Confidence Intervals

Zhen Xie

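AI summary This paper proposes nonparametric empirical Bayes confidence intervals (NP-EBCIs) for inference on unobservable individual effects in a normal means model. The intervals are built from a point-identified, fully nonparametric prior; feasible intervals use posterior quantiles or nonparametric estimates of them, and both conditional and marginal coverage converge asymptotically to the nominal level. The flexibility of the nonparametric approach comes with the severe ill-posedness of nonparametric deconvolution, so the estimation rate is only logarithmic. Even so, simulations show that the method keeps good coverage under non-Gaussian priors and substantially shortens the intervals.

Details
English abstract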

Empirical Bayes methods can improve inference on unobservable individual effects by borrowing strength across units. This paper proposes nonparametric empirical Bayes confidence intervals (NP-EBCIs) for unobservable individual effects in a normal means model. The oracle intervals are constructed from posterior quantiles under a point-identified, fully nonparametric prior; feasible intervals replace these quantiles with nonparametric estimates. The NP-EBCIs are asymptotically exact in the sense that both their conditional and marginal coverage probabilities converge to the nominal level. The flexibility of this nonparametric construction has an unavoidable statistical cost. We demonstrate that posterior quantiles, unlike posterior means, inherit the severe ill-posedness of nonparametric deconvolution: the minimax optimal estimation rate is logarithmic. This logarithmic rate is minimax optimal for errors in the conditional coverage probability, and the resulting errors in the marginal coverage probability also vanish at the same logarithmic rate. Despite these slow asymptotic rates, simulations show that the NP-EBCIs remain close to nominal coverage when the prior is non-Gaussian, and deliver substantial length reductions relative to intervals that treat each unit in isolation.

2605.08546 2026-05-12 stat.ML cs.LG math.OC

Sliced Inner Product Gromov-Wasserstein Distances

Xiaoyun Gong, Gabriel Rioux, Ziv Goldfeld

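AI summary This paper studies the scalability of the Gromov-Wasserstein distance with inner product cost (IGW) for high-dimensional data. It resolves the lack of a closed-form solution in the one-dimensional case and proposes a sliced IGW distance with a natural rotational invariance property. The method is validated through theoretical analysis and numerical experiments, and is applied to heterogeneous clustering of text data and to comparing language model representations.

Comments 49 pages, 8 figures

Details
English abstract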

The Gromov-Wasserstein (GW) problem provides a framework for aligning heterogeneous datasets by matching their intrinsic geometry, but its statistical and computational scaling remains an issue for high-dimensional problems. Slicing techniques offer an appealing route to scalability, but, unlike Wasserstein distances, GW problems do not generally admit closed-form solutions in one dimension. We resolve this problem for the GW problem with inner product cost (IGW), propose a sliced IGW distance that enjoys a natural rotational invariance property, and comprehensively study its structural and computational properties. Numerical experiments validating our theory are presented, followed by applications to heterogeneous clustering of text data and language model representation comparison.

2605.08532 2026-05-12 stat.AP stat.ME

Accounting for variable detection functions in temporal abundance modeling via transfer learning

Kevin M. Collins, Erin M. Schliep, Tyler Wagner, Christopher K. Wikle

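AI summary This study addresses the modeling challenge posed by varying detection probabilities when relative abundance (such as the number of animals caught per unit of sampling effort) is used to monitor fish and wildlife populations. It proposes a transfer learning approach in which detection probability functions learned from capture-recapture (CR) data are applied to models of the more widely available relative abundance (CPUE) data, improving estimates of changes in population size. In a simulation study and a real case study, the method improves the ability to detect trends in population dynamics, offering a new analytical tool for ecological monitoring.

Details
English abstract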

Relative abundance, measured as the number of animals caught per unit of sampling effort (CPUE), is commonly used to monitor fish and wildlife populations, largely because sampling methods are cost-effective to implement. Modeling relative abundance, however, requires the assumption that the detection probability is constant across sampling events. This assumption is likely not valid, as the probability of detection often varies as a function of several factors, including the characteristics of individual animals and environmental conditions at the time of sampling. In contrast, methods to estimate absolute abundance, such as capture-recapture (CR), account for variable detection, but are often infeasible to implement across large spatiotemporal scales. Despite this, CR data are sometimes available for species of interest, albeit at smaller spatiotemporal extents. Leveraging information on detection probabilities from CR data to help inform estimates of widely available CPUE data could strengthen inferences about the status of fish and wildlife populations. We propose an approach to (i) learn the effect of environmental covariates on detection probabilities from CR data and (ii) transfer these detection functions to CPUE models for improved inference. A simulation study shows empirically that this approach improves estimates of abundance and the ability to detect temporal trends. We apply our transfer learning method using CR and CPUE data to recreationally important smallmouth bass (Micropterus dolomieu) fisheries in rivers in Pennsylvania, USA.

2605.08509 2026-05-12 stat.ME

An Object-Oriented Spatial Statistics Approach for Human Activity Space Estimation

Haoyang Wu, Yen-Chi Chen, Adrian Dobra

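AI summary This paper proposes an approach based on object-oriented spatial statistics for estimating human activity spaces from GPS data, accounting for both individual mobility and the built environment. The method characterizes daily activity patterns through the distribution of time across spatial polygons and the road network, and develops a time-weighted estimator to handle irregularly sampled GPS observations. The authors also derive an error bound and construct a map-augmented representation of activity patterns. Simulations and a real-data analysis show that the framework identifies stable activity anchors, interpretable travel corridors, and the stabilization behavior of the dwelling and movement components.

Comments 53 pages, 16v figures

Details
English abstract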

Human activity spaces are shaped by individual mobility and the built environment, motivating statistical methods that integrate GPS observations with GIS representations of places and routes. We propose a novel methodology to estimate activity spaces in built environments from GPS data within the Object Oriented Spatial Statistics framework. We characterize daily mobility through the distribution of time across spatial polygons and road segments, aiming to capture entity-specific time-use fractions and level-$\gamma$ activity spaces. We develop a time-weighted estimator to handle irregularly sampled GPS observations. We derive an error bound that quantifies the effects of measurement error, nearest-entity misclassification, temporal gaps, boundary crossings, and day-to-day variability. We also develop a map-augmented representation of daily activity patterns, a dwell-time-weighted distance for clustering daily trajectories, and polygon- and road-based stability summaries. Simulation studies and a real-data application demonstrate that the proposed framework recovers concentrated stationary anchors, interpretable travel corridors, and distinct stabilization behavior for dwelling and movement components, supporting the benefits of weighting under irregular sampling. KEYWORDS: GPS data, GIS, human mobility, space-time geography.

2605.08505 2026-05-12 cs.LG cs.AI math.PR math.ST stat.TH

Scaling Limits of Long-Context Transformers

Giuseppe Bruno, Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet

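AI summary This paper studies the attention mechanism of long-context Transformers with a fixed query and a random context. It analyzes how the inverse temperature $\beta_n$ affects attention behavior and shows that the critical scale at which selectivity emerges is determined by the local exponent of the distance distribution rather than by global features. It also characterizes the limiting distributions of the attention weights and output across the regimes of $\beta_n$ (subcritical, critical, and supercritical), and notes that in the subcritical case with an identity value matrix, the attention map approximately implements a backward heat equation.

Comments 40 pages, 4 figures

Details
English abstract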

We study the long-context limit of softmax self-attention with a fixed query and a random context of $n$ i.i.d. keys on the sphere, viewing the inverse temperature $\beta_n$ as the scaling parameter that decides whether attention degenerates into uniform averaging or collapses onto the single closest key. We show that the critical scale at which selectivity emerges is determined by the local exponent of the distance-to-query distribution near zero rather than by global features of the context, and scales like $\beta_n^\ast \asymp n^{2/(d-1)}$ for uniform keys on $\mathbb{S}^{d-1}$. Furthermore, we characterize the limiting laws of the ordered attention weights and of the attention output across all regimes of $\beta_n$: a subcritical regime in which the output reduces to a local average around $q$ with explicit deterministic bias and Gaussian fluctuations; a critical regime in which a finite collection of nearest keys retains macroscopic mass without single-key collapse; and a supercritical regime in which all mass concentrates on the closest key. Of notable interest is the subcritical case with identity value matrix where the attention map approximately implements a backward heat equation.
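The two extremes described in the abstract, uniform averaging and collapse onto the closest key, can be seen numerically in a small sketch. It assumes logits of the form $\beta \langle q, k \rangle$ with keys drawn uniformly on $\mathbb{S}^{2}$; the parametrization and the helper names are assumptions made for illustration and may differ from the paper's conventions.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform point on the unit sphere in R^d (normalised Gaussian vector)."""
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def attention_weights(query, keys, beta):
    """Softmax weights with logits beta * <query, key>, computed stably."""
    logits = [beta * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
n, d = 200, 3
query = random_unit_vector(d, rng)
keys = [random_unit_vector(d, rng) for _ in range(n)]
for beta in (0.01, float(n), 1e6):
    weights = attention_weights(query, keys, beta)
    print(beta, max(weights))
```

For very small $\beta$ the largest weight stays near $1/n$, and for very large $\beta$ almost all mass sits on the key with the largest inner product. The middle value $\beta = n$ matches the order $n^{2/(d-1)}$ quoted in the abstract for $d = 3$ and is included only as a point of comparison.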

2605.08485 2026-05-12 stat.ML cs.LG math.ST stat.ME stat.TH

Sinkhorn Treatment Effects: A Causal Optimal Transport Measure

Medha Agarwal, Alex Luedtke

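AI summary This paper proposes the Sinkhorn treatment effect, a causal optimal transport measure of the divergence between counterfactual distributions. Built on entropy-regularized optimal transport, it captures differences across entire distributions rather than only the average treatment effect. By writing the measure as a smooth transformation of counterfactual mean embeddings, the authors establish its pathwise differentiability and construct debiased estimators, which lead to asymptotically valid tests for distributional treatment effects. Experiments on simulated and image data show good practical performance.

Comments 55 pages, 6 figures

Details
English abstract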

We introduce the Sinkhorn treatment effect, an entropic optimal transport measure of divergence between counterfactual distributions. Unlike classical quantities such as the average treatment effect, this measure captures differences across entire distributions. We analyze this divergence as a statistical functional and show it can be written as a smooth transformation of counterfactual mean embeddings with an appropriate kernel. This characterization allows us to establish first-order pathwise differentiability in general, and second-order pathwise differentiability under the null hypothesis of equal counterfactual distributions. Leveraging this smoothness, we construct debiased estimators and use them to obtain asymptotically valid tests for distributional treatment effects with a fixed entropic regularization parameter. Because the power of the test depends on this unknown parameter, we further propose an aggregated test that combines evidence across a grid of regularization choices. Experiments on simulated and image data demonstrate the practical advantages of our estimator and testing procedure.
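For readers unfamiliar with entropic optimal transport, the following sketch computes a debiased Sinkhorn divergence between two small discrete distributions on the real line. It is a plain plug-in computation between two observed samples, not the paper's causal estimator: there is no adjustment for confounding and no debiasing of the statistical functional. The squared-distance cost, the log-domain iteration, and the function names are assumptions for illustration.

```python
import math


def entropic_ot(x, a, y, b, eps, iters=500):
    """Entropic OT value between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}
    (squared-distance cost, KL penalty relative to the product measure),
    evaluated from dual potentials obtained by log-domain Sinkhorn updates.
    All weights are assumed strictly positive."""
    cost = [[(xi - yj) ** 2 for yj in y] for xi in x]
    f = [0.0] * len(x)
    g = [0.0] * len(y)

    def softmin(pairs):
        # -eps * log(sum_k w_k * exp(-v_k / eps)), stabilised by the minimum v_k
        m = min(v for v, _ in pairs)
        s = sum(w * math.exp(-(v - m) / eps) for v, w in pairs)
        return m - eps * math.log(s)

    for _ in range(iters):
        f = [softmin([(cost[i][j] - g[j], b[j]) for j in range(len(y))])
             for i in range(len(x))]
        g = [softmin([(cost[i][j] - f[i], a[i]) for i in range(len(x))])
             for j in range(len(y))]
    return sum(ai * fi for ai, fi in zip(a, f)) + sum(bj * gj for bj, gj in zip(b, g))


def sinkhorn_divergence(x, a, y, b, eps):
    """Debiased divergence: OT_eps(a, b) - OT_eps(a, a) / 2 - OT_eps(b, b) / 2."""
    return (entropic_ot(x, a, y, b, eps)
            - 0.5 * entropic_ot(x, a, x, a, eps)
            - 0.5 * entropic_ot(y, b, y, b, eps))


control = ([0.0, 1.0], [0.5, 0.5])
treated = ([2.0, 3.0], [0.5, 0.5])
print(sinkhorn_divergence(*control, *treated, eps=0.5))
print(sinkhorn_divergence(*control, *control, eps=0.5))
```

The divergence is zero when the two distributions coincide and positive when one is a shifted copy of the other. The regularization parameter `eps` is held fixed here, which mirrors the fixed-regularization setting mentioned in the abstract.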

2605.08483 2026-05-12 math.NA cs.NA stat.CO

Randomized quasi-Monte Carlo for walk on spheres

Valerie N. P. Ho, Art B. Owen

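AI summary This paper studies the use of randomized quasi-Monte Carlo (RQMC) in walk on spheres algorithms for solving boundary value problems in real space with Dirichlet boundary conditions. For harmonic functions in two dimensions, the authors analyze the integration of periodic indicator functions over regions in the torus and give sufficient conditions under which the boundary has the required Minkowski content, so that existing theoretical results apply. In experiments, RQMC attains better variance convergence rates than plain Monte Carlo on several test cases, and no single RQMC method is uniformly best.

Details
English abstract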

We investigate the use of randomized quasi-Monte Carlo (RQMC) in walk on spheres algorithms to solve boundary value problems for functions with Dirichlet boundary conditions in $\mathbb{R}^d$. For harmonic functions with $d=2$, the integrands of interest are periodic indicator functions over regions $\Theta$ in the torus $\mathbb{T}^k$. We give conditions for $\partial\Theta$ to have $(k-1)$-dimensional Minkowski content which allows us to use results of He and Wang (2015). The RQMC estimates involve multiple values of $k$. We see sampling variances decreasing with the number $n$ of sample points at slightly better than Monte Carlo rates. The median variance rate in $4$ RQMC methods over $5$ worked examples, including some with $d=3$ and some with nonzero source functions, was slightly better than $O(n^{-1.1})$. The variance reduction factors ranged from $1.8$ to $10.7$ at $n=2^{17}$. None of the four RQMC methods dominated the others.
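As background, a basic walk on spheres estimator for the Laplace equation on the unit disk can be sketched as follows. This version uses plain pseudo-random sampling, not RQMC; the domain, the boundary data $g(x, y) = x$, the stopping tolerance, and the function names are assumptions chosen so that the exact solution $u(x, y) = x$ is known.

```python
import math
import random


def walk_on_spheres(x, y, boundary_value, rng, eps=1e-3):
    """One walk for the Laplace equation on the unit disk, started at (x, y)."""
    while True:
        r = 1.0 - math.hypot(x, y)  # distance from the current point to the circle
        if r < eps:
            norm = math.hypot(x, y)
            # Stop within eps of the boundary and read the data at the projected point.
            return boundary_value(x / norm, y / norm)
        # Jump to a uniform point on the largest circle centred at (x, y) inside the disk.
        theta = 2.0 * math.pi * rng.random()
        x += r * math.cos(theta)
        y += r * math.sin(theta)


def estimate(x, y, boundary_value, n_walks, seed=0):
    """Average of independent walks: a Monte Carlo estimate of u(x, y)."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(x, y, boundary_value, rng) for _ in range(n_walks))
    return total / n_walks


print(estimate(0.3, 0.2, lambda bx, by: bx, n_walks=5000))
```

Each step of a walk consumes one uniform variate, so a walk truncated at $k$ steps is a function of a point in $[0,1)^k$. An RQMC variant would replace the `rng.random()` calls with coordinates of randomized low-discrepancy points, which is one way to read the abstract's remark that the estimates involve multiple values of $k$.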

2605.08453 2026-05-12 cs.LG cs.AI stat.ML

Sink vs. diagonal patterns as mechanisms for attention switch and oversmoothing prevention

Peter Súkeník, Cristina López Amado, Christoph H. Lampert, Marco Mondelli

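AI summary This paper studies the role of sinks and diagonal patterns in attention switching and oversmoothing prevention. By analyzing geometric conditions, it reveals the embedding alignment required for sinks to be represented. It further clarifies how sinks prevent oversmoothing, proving that dense attention smooths more than sparse attention under certain conditions and verifying empirically that these conditions are often met in practice. The paper also establishes an equivalence between sinks and a hard attention switch, then relaxes the hard switch by allowing token self-communication and compares the representation costs of sinks and diagonal patterns, which explains why pretrained Transformers favor sinks. These results close the gap between what oversmoothing prevention requires and what sinks provide, and clarify why attention layers can behave like MLPs when token communication is not needed.

Details
English abstract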

This paper studies the role of sinks and diagonal patterns as attention switch and anti-oversmoothing mechanisms. We analyze geometric conditions under which sinks can be represented, showing a necessary alignment between the embedding of the sink and all other embeddings. Next, we refine the current understanding of the role of sinks in oversmoothing prevention: we specify the conditions under which dense attention provably smooths more than sparse attention, and empirically verify that such conditions are often satisfied in practice. We further prove an equivalence between sinks and hard attention switch, in which the output of the attention is identically 0. Finally, we relax the hard attention switch by allowing token self-communication: we provide a quantitative comparison of the costs of representing sinks vs. diagonal patterns, showing why sinks are favored in pretrained transformers. The introduction and analysis of diagonal patterns and the generalization of the attention switch close the gap between what oversmoothing prevention requires and what sinks provide, while also establishing when and why attention layers act like MLPs if token communication is not necessary.

2605.06375 2026-05-12 cs.LG cs.AI math.ST stat.TH

A Unified Pair-GRPO Family: From Implicit to Explicit Preference Constraints for Stable and General RL Alignment

Hao Yu

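AI summary This paper addresses large language model alignment via reinforcement learning from human preferences (RLHF) and proposes a unified Pair-GRPO family of methods to tackle unstable policy updates, ambiguous gradient directions, poor interpretability, and high gradient variance. It introduces two variants: Soft-Pair-GRPO, which keeps the GRPO structure while using binary preference rewards, and Hard-Pair-GRPO, which adds explicit probability constraints. The authors prove gradient stability and provide theoretical guarantees including monotonic policy improvement and deterministic gradient direction. Experiments show that the method outperforms existing state-of-the-art methods on several benchmarks, with clear gains in alignment quality and training stability.

Details
English abstract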

Large language model (LLM) alignment via reinforcement learning from human preferences (RLHF) suffers from unstable policy updates, ambiguous gradient directions, poor interpretability, and high gradient variance in mainstream pairwise preference learning paradigms. To systematically address these limitations, we establish a unified theoretical framework for preference-based RL optimization centered on the Pair-GRPO family, comprising two tightly coupled variants: Soft-Pair-GRPO and Hard-Pair-GRPO. Soft-Pair-GRPO is a minimal modification of Group Relative Policy Optimization (GRPO) that replaces group-normalized scalar rewards with binary pairwise preference rewards, retaining GRPO's clipped surrogate and KL-regularized structure. We prove a critical gradient equivalence theorem: under first-order Taylor expansion around the current policy, Soft-Pair-GRPO's gradient is a positive scalar multiple of standard GRPO's gradient, explaining its empirical stability despite discarding continuous reward magnitudes. Building on this foundation, we propose Hard-Pair-GRPO, an advanced variant introducing explicit local probability constraints and constrained KL-fitting optimization to further suppress gradient noise and global policy drift. We provide comprehensive theoretical guarantees for both variants, including monotonic policy improvement, deterministic gradient direction, gradient-variance reduction, and dynamic step-size convergence. Extensive experiments on standard LLM alignment benchmarks (HH-RLHF, UltraFeedback) and the MuJoCo continuous control task HalfCheetah-v4 demonstrate that our Pair-GRPO family consistently outperforms state-of-the-art baselines in alignment quality, human preference win rate, training stability, and generalization to general reinforcement learning. Ablation studies validate the critical contributions of each core component.

2605.06135 2026-05-12 stat.ME stat.AP

Linked-Tucker Factorized Individualized Regression for Paired Multivariate Categorical Outcomes

Arkaprava Roy, Jeremy T. Gaskins, Steven Levy, Somnath Datta

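AI summary This study proposes a joint individualized hurdle-ordinal regression model for paired zero-inflated ordinal outcomes such as dental caries and dental fluorosis, using data from the Iowa Fluoride Study. The model combines a hurdle component and a proportional-odds component, which describe disease presence and severity respectively, and introduces a linked Tucker tensor factorization to handle high-dimensional covariate effects efficiently while accounting for individual and spatial heterogeneity. The study reveals heterogeneous associations between early-life fluoride and dietary exposures and the two dental outcomes across tooth locations, ages, and subpopulations.

Details
English abstract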

We propose a joint individualized hurdle-ordinal regression model for paired zero-inflated ordinal outcomes with subject-specific, spatially varying, and time-varying covariate effects, motivated by the Iowa Fluoride Study (IFS). The two outcomes, dental caries and dental fluorosis, are measured repeatedly across ages at fine spatial resolution, yielding nested longitudinal data with substantial zero inflation, ordinality, and heterogeneity across individuals and locations. For each outcome, a hurdle component models disease presence, while a proportional-odds component models severity among positive observations. To parsimoniously represent the high-dimensional coefficient arrays, we introduce a linked Tucker tensor factorization. Shared subject-mode factors induce dependence between the caries and fluorosis coefficient tensors, while separate spatial factors accommodate the distinct measurement grids of tooth surfaces and tooth zones. A horseshoe prior on the core tensor elements encourages sparsity, and posterior computation is performed using the No-U-Turn Sampler in NumPyro. Population-level effect summaries are obtained by projecting individualized posterior linear predictors onto the design space, and Wasserstein barycenters aggregate these summaries across tooth locations and anatomical classes. Applied to the IFS, the model reveals spatially heterogeneous associations between early-life fluoride and dietary exposures and both outcomes. Fluoride exposure is associated with increased odds and severity of fluorosis, while soda intake consistently increases caries risk. These associations differ between presence and severity components and vary across tooth locations, ages, and subpopulations defined by prior caries status, highlighting the importance of the joint hurdle-ordinal framework for disentangling disease occurrence from disease progression in multilevel dental data.

2605.04274 2026-05-12 cs.LG cs.AI stat.ML

A Mean Curvature Approach to Boundary Detection: Geometric Insights for Unsupervised Learning

Alexandre L. M. Levada

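AI summary This paper proposes Mean Curvature Boundary Points (MCBP), a mean-curvature-based boundary detection method for unsupervised learning on high-dimensional data. The method estimates a discrete approximation of the shape operator from local k-nearest-neighbor patches and models the intrinsic curvature of the data manifold directly, so that pointwise mean curvature can be computed without explicit parametrization and used as a principled descriptor of boundary structure. The study relates high-curvature regions to cluster transitions, geometric irregularities, and low-density interfaces, introduces an adaptive percentile-based thresholding scheme for multiscale boundary extraction, and proposes a curvature-driven data decomposition that improves cluster separability and the robustness of downstream algorithms. Experiments show that MCBP significantly improves clustering performance on synthetic and real-world datasets, particularly in complex, high-dimensional settings.

Comments 30 pages, 6 tables, 8 figures

Details
English abstract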

Accurate boundary detection in high-dimensional data remains a central challenge in unsupervised learning, particularly in the presence of non-linear structures and heterogeneous densities. In this work, we introduce Mean Curvature Boundary Points (MCBP), a novel geometric framework grounded in Geometric Machine Learning that departs from traditional density-based approaches by explicitly modeling the intrinsic curvature of the data manifold. The method relies on a discrete approximation of the shape operator, estimated from local k-nearest neighbor patches, to compute pointwise mean curvature without requiring explicit manifold parametrization. The key insight of MCBP is to use mean curvature as a principled descriptor of boundary structure: high-curvature regions naturally correspond to transitions between clusters, geometric irregularities, and low-density interfaces. This yields a unified geometric interpretation of boundary, outlier, and transition points. We further introduce an adaptive percentile-based thresholding scheme that enables multiscale boundary extraction without relying on ad hoc density parameters. Beyond detection, we propose a curvature-driven data decomposition that separates samples into smooth (low-curvature) and boundary (high-curvature) subsets, effectively acting as a non-linear geometric filtering mechanism. This representation enhances cluster separability and improves the robustness of downstream unsupervised algorithms. Extensive experiments on synthetic and real-world datasets demonstrate that MCBP consistently improves clustering performance, particularly in complex and high-dimensional scenarios. These results position MCBP as a concrete contribution to Geometric Machine Learning, highlighting the potential of curvature-aware analysis as a unifying paradigm bridging differential geometry and data-driven modeling.

2605.04124 2026-05-12 stat.ME econ.EM

Design-Based Variance Estimation for Modern Heterogeneity-Robust Difference-in-Differences Estimators

Isaac Gerber

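AI summary This paper studies variance estimation for modern heterogeneity-robust difference-in-differences (DiD) estimators under complex survey designs. The author notes that although existing methods are usually derived under iid or fixed-design frameworks, in practice they are routinely applied to national surveys with stratified cluster designs, which leads to inaccurate standard errors. Through theoretical analysis and Monte Carlo simulation, the paper shows that under regularity conditions the stratified-cluster variance formula yields design-consistent standard errors, and that ignoring the survey design severely degrades confidence-interval coverage. The study also provides an open-source Python package supporting design-consistent variance estimation for a range of modern DiD estimators.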

Comments 38 pages, 1 figure, 8 tables. Companion software: diff-diff v3.3.2 (https://doi.org/10.5281/zenodo.19803705), public replication archive (https://github.com/igerber/design-based-did-replication; Zenodo DOI 10.5281/zenodo.20097360)

Details
English abstract

Modern heterogeneity-robust difference-in-differences estimators derive their asymptotic properties under iid, cluster, or fixed-design frameworks that abstract from complex survey sampling, yet practitioners routinely apply them to nationally representative surveys with stratified cluster designs. We show that, under standard regularity conditions, the influence functions of each smooth IF-based or regression-based modern DiD estimator satisfy Binder's (1983) smoothness conditions, so the standard stratified-cluster variance formula applied to their values produces design-consistent standard errors. A Monte Carlo study with 66,000 replications shows where the design effect comes from. HC1 standard errors that treat observations as iid produce coverage as low as 34% under a baseline survey design and below 11% under informative sampling. Combining the survey-weighted point estimate with PSU-level clustering - the practitioner's cluster=psu heuristic - recovers near-nominal coverage across all scenarios. Adding strata and finite-population corrections yields incremental precision but is not required for valid coverage. Survey-weighted doubly robust estimation produces well-calibrated inference when parallel trends hold only conditionally. An NHANES illustration of the ACA dependent coverage provision shows that point estimates and standard errors change substantively - enough to reverse significance conclusions - when the survey design is accounted for. We provide diff-diff (https://github.com/igerber/diff-diff), an open-source Python package implementing design-based variance for fifteen modern DiD estimators.
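As a rough illustration of what applying "the standard stratified-cluster variance formula" to influence-function values can look like, the sketch below implements the textbook with-replacement linearization estimator in plain Python. The function name, the argument layout, the scaling of the influence values, and the omission of finite-population corrections are all assumptions made for this example; it is not the API of the diff-diff package.

```python
from collections import defaultdict


def stratified_cluster_variance(psi, weights, strata, psus):
    """Linearization variance of sum_k w_k * psi_k under a stratified cluster design.

    psi     : influence-function values, one per observation (hypothetical input)
    weights : survey weights, one per observation
    strata  : stratum label of each observation
    psus    : primary sampling unit label of each observation (nested in strata)

    Assumes PSUs are sampled with replacement within strata (no finite-population
    correction) and that every stratum contains at least two PSUs.
    """
    # Weighted influence values are first summed to PSU totals within each stratum.
    totals = defaultdict(lambda: defaultdict(float))
    for p, w, h, c in zip(psi, weights, strata, psus):
        totals[h][c] += w * p

    variance = 0.0
    for h, by_psu in totals.items():
        z = list(by_psu.values())
        n_h = len(z)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than two PSUs")
        z_bar = sum(z) / n_h
        # Between-PSU variation within the stratum, scaled by n_h / (n_h - 1).
        variance += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return variance
```

Dropping the stratum labels (passing one constant label) reduces this to plain PSU-level clustering, which corresponds to the cluster=psu heuristic the abstract describes.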

2605.00247 2026-05-12 stat.CO cs.DC cs.MM econ.EM

$2B$ or Not $2B$: A Tale of Three Algorithms for Streaming: Covariance Estimation after Welford and Chan-Golub-LeVeque

Felix Reichel

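AI summary This paper places three algorithms for computing the unbiased sample covariance matrix in streaming and distributed settings, namely the Gram, Welford, and Chan-Golub-LeVeque (CGL) algorithms, on a common algebraic, numerical, and statistical foundation, and analyzes their mechanics, numerical stability, and suitable use cases. It proposes a conformal-prediction framework that provides distribution-free, finite-sample confidence intervals for streaming covariance estimation, and validates the relative strengths of each algorithm experimentally.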

Comments 20 pages, 10 figures, 3 tables

Details
English abstract

We place three algorithms for computing the unbiased sample covariance matrix in streaming and distributed settings on a common algebraic, numerical, and statistical foundation. The Gram algorithm, derived from the variance reformulation, maintains the running cross-product matrix $G_t = \sum_{i=1}^t x_i x_i^\top$ and the column-sum vector $s_t = \sum_{i=1}^t x_i$, yielding the unbiased covariance estimator $S_t = (t-1)^{-1}(G_t - t^{-1}s_t s_t^\top)$ in $O(p^2)$ time per update. The Welford algorithm propagates a running mean $m_t$ and outer-product corrections $M_t$, with updates $m_t = m_{t-1} + (x_t - m_{t-1})/t$ and $M_t = M_{t-1} + (x_t - m_{t-1})(x_t - m_t)^\top$, achieving the same asymptotic cost with improved numerical stability under large data shifts. The Chan-Golub-LeVeque algorithm supports block-parallel merging through the exact identity $M = M_A + M_B + \frac{n_A n_B}{n_A+n_B}(m_B - m_A)(m_B - m_A)^\top$, making it the natural choice for distributed and map-reduce architectures. All three algorithms produce the same estimator $S_t = M_t/(t-1)$ in exact arithmetic, although their finite-precision behavior differs markedly. Beyond runtime and numerical comparisons, we introduce a conformal prediction framework for streaming covariance estimation that yields finite-sample, distribution-free confidence sets $C_{t,jk}$ for each entry $S_{t,jk}$ of the covariance matrix at any step $t$ of the data stream. Experiments confirm that the Gram algorithm is fastest for batch computation, Welford is uniquely robust to catastrophic cancellation under large mean shifts, CGL is optimal for distributed settings, and conformal intervals achieve the nominal coverage level across all three algorithms.
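The Welford update and the Chan-Golub-LeVeque merge quoted in the abstract can be sketched in a few lines of dependency-free Python. The function names and the `(n, m, M)` tuple layout are illustrative choices, and the weighted-mean rule used to merge the two block means is the standard one rather than something stated in the abstract.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
State = Tuple[int, Vector, Matrix]  # (count, running mean m, outer-product sum M)


def outer(u: Vector, v: Vector) -> Matrix:
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def welford_update(t: int, m: Vector, M: Matrix, x: Vector) -> State:
    """Fold one observation x into the summary (t, m, M)."""
    t += 1
    d_old = [xi - mi for xi, mi in zip(x, m)]      # x_t - m_{t-1}
    m = [mi + di / t for mi, di in zip(m, d_old)]  # m_t = m_{t-1} + (x_t - m_{t-1}) / t
    d_new = [xi - mi for xi, mi in zip(x, m)]      # x_t - m_t
    corr = outer(d_old, d_new)
    M = [[a + c for a, c in zip(row_m, row_c)] for row_m, row_c in zip(M, corr)]
    return t, m, M


def cgl_merge(a: State, b: State) -> State:
    """Merge two block summaries: M = M_A + M_B + n_A n_B / (n_A + n_B) * d d^T."""
    n_a, m_a, M_a = a
    n_b, m_b, M_b = b
    n = n_a + n_b
    d = [y - x for x, y in zip(m_a, m_b)]          # m_B - m_A
    w = n_a * n_b / n
    m = [(n_a * x + n_b * y) / n for x, y in zip(m_a, m_b)]
    dd = outer(d, d)
    M = [[pa + pb + w * c for pa, pb, c in zip(ra, rb, rc)]
         for ra, rb, rc in zip(M_a, M_b, dd)]
    return n, m, M


def summarize(rows: List[Vector]) -> State:
    """Run Welford over a block of observations."""
    p = len(rows[0])
    state: State = (0, [0.0] * p, [[0.0] * p for _ in range(p)])
    for x in rows:
        state = welford_update(*state, x)
    return state


def covariance(state: State) -> Matrix:
    """Unbiased estimator S = M / (n - 1); requires n >= 2."""
    n, _, M = state
    return [[v / (n - 1) for v in row] for row in M]
```

Summarizing two halves of a data set separately and merging them with `cgl_merge` should give the same covariance as one pass of `summarize` over the whole set, up to floating-point rounding.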

2604.26326 2026-05-12 cs.LG cs.CL stat.ML

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

Bolian Li, Yifan Wang, Yi Ding, Anamika Lochab, Ananth Grama, Ruqi Zhang

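AI summary This paper studies the performance saturation that large language models (LLMs) encounter in reinforcement learning (RL) and proposes Entrocraft, a new method that addresses it by precisely controlling the entropy curve. The method is based on rejection sampling that biases the advantage distribution; it requires no regularization and works with any advantage estimator. Theoretical analysis shows that the approach explains the behavior of existing RL and entropy-preserving methods and reveals the advantage of linear annealing as an entropy schedule. Experiments show that Entrocraft effectively mitigates performance saturation and markedly improves generalization, output diversity, and long-horizon training.

Details
English abstract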

Reinforcement learning (RL) has enabled complex reasoning abilities in large language models (LLMs). However, most RL algorithms suffer from performance saturation, preventing continued gains as RL training scales. This problem can be characterized by the collapse of entropy, a key diagnostic for exploration in RL. Existing attempts focus on preventing entropy collapse through regularization or clipping. However, their resulting entropy curves often exhibit instability in the long term, which hinders performance gains. In this paper, we introduce Entrocraft, a simple rejection-sampling approach that realizes a user-customized entropy schedule by biasing the advantage distributions. Entrocraft requires no objective regularization and is advantage-estimator-agnostic. Theoretically, we relate per-step entropy change to the advantage distribution under minimal assumptions. This explains the behavior of existing RL and entropy-preserving methods. Entrocraft also enables a systematic study of entropy schedules, which reveals that linear annealing, which starts high and decays to a slightly lower target, performs best. Empirically, Entrocraft addresses performance saturation, significantly improving generalization, output diversity, and long-term training. It enables a 4B model to outperform an 8B baseline, sustains improvement for up to 4x longer before plateauing, and raises pass@K by 50% over the baseline.

2604.14345 2026-05-12 cs.LG cs.AI stat.ML

PAC-MCTS: Bias-Aware Pruning for Robust LLM-Guided Search and Planning

Tianhao Qian

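AI summary In autonomous reasoning and embodied planning, the candidate action space grows exponentially with search depth, consuming large amounts of compute. This paper proposes PAC-MCTS, a bias-aware pruning framework that models node expansion as a best-arm identification problem under bounded bias, derives a sample-complexity upper bound and an information-theoretic lower bound, and makes explicit the conditions for safe pruning. Experiments show that PAC-MCTS markedly improves search efficiency and robustness on Blocksworld and ALFWorld, reducing the number of API calls and increasing sample efficiency.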

Comments 18 pages, 4 figures

Details
English abstract

As search depth increases in autonomous reasoning and embodied planning, candidate action spaces expand exponentially, often exhausting computational budgets. While heuristic pruning is a critical countermeasure, existing approaches lack formal safety guarantees when guided by surrogate evaluators such as Large Language Models (LLMs), which exhibit systematic biases. We formulate node expansion as a localized Best-Arm Identification (BAI) problem under bounded bias $L$ and derive a sample complexity upper bound of $\mathcal{O}((\Delta-4L)^{-2})$, identifying $\Delta > 4L$ as the regime where safe elimination is feasible. We further establish an information-theoretic lower bound of $\Omega((\Delta-2L)^{-2})$ that characterizes the structural limits of biased exploration. Motivated by these results, we propose PAC-MCTS, a bias-aware pruning framework that dynamically adapts confidence bounds during search. Experiments on Blocksworld and ALFWorld demonstrate that PAC-MCTS consistently improves robustness and search efficiency over strong pruning baselines, achieving up to 78% fewer API evaluations and over $3\times$ higher sample efficiency under strict compute budgets. Ablation studies further validate the predicted degradation behavior as evaluator bias increases.

2604.12062 2026-05-12 stat.ME

Is There an AI Bubble? Robust Date-Stamping for Periods of Exuberance

Abir Sarkar, Martin T. Wells

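AI summary This paper examines whether surging valuations of AI-related firms have set off a new speculative bubble, and proposes a robust date-stamping method for identifying when bubbles originate and collapse while prices evolve under persistent, time-varying volatility. The study extends classical unit root tests to construct an SV-ADF test that accommodates persistent volatility, improving the accuracy and stability of bubble detection. The empirical analysis finds pronounced exuberance among AI-related stocks, including the "Magnificent Seven" and leading semiconductor firms, with Alphabet (Google) and TSMC showing especially strong bubble characteristics in the current cycle.

Details
English abstract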

The recent surge in valuations among AI related firms has renewed concerns that markets may be entering a new phase of speculative exuberance, especially in the technology and semiconductor sectors at the center of the AI investment wave. This paper develops a practical econometric framework for detecting, date-stamping, and drawing inference on the origination and collapse of bubble episodes when prices evolve under persistent, time-varying volatility. Standard bubble tests are typically derived under homoskedasticity or weak heteroskedasticity and may therefore yield misleading inference in more general settings. We extend right-tailed Dickey-Fuller unit root tests to autoregressive models with highly persistent mean and volatility dynamics, delivering a stochastic-volatility-robust ADF (SV-ADF) test that accommodates persistent variance without imposing strict parametric structure. Building on a moderate-deviation asymptotic theory, the SV-ADF yields nuisance-parameter-free procedures with distinct critical values for origination and collapse, producing more stable alarms and fewer transient false positives around volatility spikes. We establish consistency of the date-stamping estimator and show that it remains asymptotically tractable. Monte Carlo simulations document strong power and substantial gains over homoskedastic (PWY) procedures when volatility dynamics are pronounced. An empirical analysis of AI-exposed equities, including the "Magnificent Seven" and leading semiconductor firms, finds pervasive exuberance with substantial heterogeneity in timing, intensity, and duration. The evidence points to especially strong bubble dynamics for Alphabet and TSMC in the current cycle, while Tesla and Nvidia exhibited pronounced explosive episodes in earlier phases of the AI-driven market cycle.

2604.06689 2026-05-12 cs.LG stat.ML

Generative Cross-Entropy: A Strictly Proper Loss for Data-Efficient Classification

Qipeng Zhan, Zhuoping Zhou, Li Shen

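AI summary This paper proposes Generative Cross-Entropy (GenCE), a new classification loss intended to improve sample efficiency when data are scarce. It brings a generative learning principle into the standard cross-entropy loss without changing the network architecture or fitting an additional density model. GenCE is based on a Bayesian rewrite of the conditional likelihood and, under a mini-batch approximation, couples the training signal across samples sharing a class. The paper proves that it is a strictly proper scoring rule under certain conditions, and experiments show that it outperforms conventional losses across several datasets and settings, with better probability calibration and out-of-distribution detection.

Details
English abstract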

Cross-entropy (CE) is the default training loss for supervised classification, but its sample efficiency is limited when labels are scarce. Existing remedies primarily act on the data side, via augmentation, synthesis, or transfer from pretrained models; the training objective itself is rarely revisited. We revisit it here. Drawing on the classical observation that generative classifiers reach their asymptotic error with fewer samples than discriminative ones, we propose Generative Cross-Entropy (GenCE), a drop-in replacement for CE that introduces a generative learning principle into a standard discriminative network without altering the architecture or fitting a separate density model. GenCE follows from a Bayesian rewrite of the class-conditional likelihood and, in the mini-batch approximation, reduces to normalizing each sample's softmax score against the model's predictions on the batch, coupling the training signal across examples sharing a class. We extend the proper-scoring-rule framework to such non-local losses and prove that GenCE is strictly proper under a mild completeness condition: its population risk is uniquely minimized at the true posterior. Across three datasets, on two architectures and in both balanced small-data and class-imbalanced regimes, GenCE outperforms CE and other widely used losses, while also producing better-calibrated probabilities and stronger out-of-distribution detection.

2603.08308 2026-05-12 math.ST cs.IT math.IT math.PR stat.TH

Weighted Chernoff information and optimal loss exponent in context-sensitive hypothesis testing

Mark Kelbert, El'mira Yu. Kalimulina

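AI summary This paper studies i.i.d. binary hypothesis testing under product-form context weights and introduces the weighted Chernoff information as the exponential decay rate of the optimal weighted total loss. By embedding the weighted geometric mixtures into a likelihood-ratio exponential family and identifying the decay rate through its log-normaliser, it establishes the asymptotic form of the optimal weighted total loss. The study also derives concentration bounds for the tilted weighted log-likelihood, gives closed forms for Gaussian, Poisson, and exponential models, and extends the results to finitely many hypotheses.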

Comments 30 pages, 3 figures, 1 table

Details
Journal ref
(2026) Entropy, 28(5), 536
English abstract

We study binary hypothesis testing for i.i.d. observations under a multiplicative context weight. For the optimal weighted total loss, defined as the sum of weighted type-I and type-II losses, we prove the logarithmic asymptotic $$ L_n^* = \exp\{-n D_C^{\mathrm{w}}(\mathbb{P}, \mathbb{Q}) + o(n)\}, \quad n \to \infty, $$ where $D_C^{\mathrm{w}}$ is the weighted Chernoff information. The single-letter form of the exponent relies on a structural assumption that the weight factorises across observations, $\varphi(x_1^n) = \prod_{i=1}^n \varphi(x_i)$; this restriction is essential for the single-letter representation and should be distinguished from the weaker qualitative description "multiplicative context weight". The proof embeds the weighted geometric mixtures $\varphi p^{\alpha} q^{1-\alpha}$ into a likelihood-ratio exponential family and identifies the rate through its log-normaliser. We also derive concentration bounds for the tilted weighted log-likelihood, obtain closed forms for Gaussian, Poisson, and exponential models, and extend the exponent characterisation to finitely many hypotheses.

2602.16596 2026-05-12 cs.LG cs.CR math.ST stat.ML stat.TH

Sequential Membership Inference Attacks

Thomas Michel, Debabrota Basu, Emilie Kaufmann

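AI summary This paper studies Sequential Membership Inference (SeMI) attacks against modern, dynamically updated AI models, aiming to sharpen privacy audits by exploiting information in the sequence of model updates. The authors propose SeMI*, an optimal attack that identifies whether a target sample was included in the training data more effectively by controlling the insertion time and analyzing statistics across the model sequence. Experiments show that, compared with baselines that rely only on the final model, SeMI attacks achieve higher attack power and tighter privacy assessments across several datasets and models trained with (differentially private) stochastic gradient descent.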

Comments 32 pages, 14 figures

Details
English abstract

Modern AI models are not static. They go through multiple updates in their lifecycles. We propose to design Sequential Membership Inference (SeMI) attacks leading to tighter privacy audits by exploiting the sequence of models and injecting a target canary at a controlled insertion time. First, for empirical mean computation, we develop SeMI*, an optimal SeMI attack to identify the presence of a target inserted at a specific insertion step. We derive the power of SeMI* to show that accessing the model sequence yields more powerful MI attacks than scrutinising only the final model. SeMI* exhibits an isolation property -- its power depends on the statistics obtained right before and after insertion of the target. Leveraging this insight, we develop practical white-box (accessing model gradients) and black-box (accessing loss) SeMI attacks against models trained with (DP-)SGD. Across datasets and models trained with (DP-)SGD, our experiments show that SeMI attacks achieve higher powers than snapshot-independent baselines, and yield tighter privacy audits thanks to (a) control over the insertion time and (b) observations across the model sequence.

2602.09317 2026-05-12 cs.LG cs.AI stat.ML

SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints

Ya-Chi Chu, Alkiviades Boukas, Madeleine Udell

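AI summary SnareNet is a feasibility-controlled repair architecture for neural networks, designed to address model outputs that violate physical, operational, or safety constraints. Its core idea is a differentiable repair layer that iteratively adjusts the output in the constraint space until it satisfies user-specified constraints. The method uses an adaptive relaxation training strategy to keep end-to-end training stable, and on several benchmark tasks it achieves better objective quality and more reliable constraint satisfaction, with a notable advantage on non-convex constraints.

Details
English abstract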

Neural networks are increasingly used as fast surrogate models across various domains, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet, a feasibility-controlled architecture to learn mappings whose outputs must satisfy input-dependent constraints. SnareNet appends a differentiable repair layer that navigates in the constraint map's range space, steering iterates toward feasibility and producing a repaired output that satisfies constraints to a user-specified tolerance. We stabilize end-to-end training by adaptive relaxation, a new training paradigm that snares the neural network at initialization and shrinks it into the feasible set, enabling early exploration and strict feasibility later in training. On optimization learning and trajectory planning benchmarks, SnareNet consistently attains improved objective quality while satisfying constraints more reliably than prior work, and it is the first to enforce non-convex constraints at medium-to-high precision robustly across instances.

2602.07144 2026-05-12 cs.LG cs.AI stat.ML

BONSAI: Bayesian Optimization with Natural Simplicity and Interpretability

Samuel Daulton, David Eriksson, Maximilian Balandat, Eytan Bakshy

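AI summary BONSAI is a default-aware Bayesian optimization method that aims to deviate from default parameters as little as possible during optimization, improving the interpretability and practicality of its recommendations. The method prunes low-impact parameter changes while controlling the loss in acquisition value, and is compatible with acquisition functions such as expected improvement and upper confidence bound. Theoretical analysis shows that BONSAI preserves optimization performance while recovering the relevant coordinates at zero acquisition cost, improving on existing sparse Bayesian optimization methods, and several real-world applications confirm that it substantially reduces the number of non-default parameters.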

Comments 32 pages

Details
English abstract

Bayesian optimization (BO) is a popular technique for sample-efficient optimization of black-box functions. In many applications, the parameters being tuned come with a carefully engineered default configuration, and practitioners only want to deviate from this default when necessary. Standard BO, however, does not aim to minimize deviation from the default and, in practice, often pushes weakly relevant parameters to the boundary of the search space. This makes it difficult to distinguish between important and spurious changes and increases the burden of vetting recommendations when the optimization objective omits relevant operational considerations. We introduce BONSAI, a default-aware BO policy that prunes low-impact deviations from a default configuration while explicitly controlling the loss in acquisition value. BONSAI is compatible with a variety of acquisition functions, including expected improvement and upper confidence bound (GP-UCB). We theoretically bound the regret incurred by BONSAI, showing that, under certain conditions, it enjoys the same no-regret property as vanilla GP-UCB. Moreover, assuming known ARD lengthscales -- the same assumption underlying GP-UCB regret bounds -- BONSAI provably recovers the relevant-coordinate set at zero acquisition cost, yielding a method that matches the GP-UCB regret rate while recovering the minimal-$\ell_0$ solution -- a guarantee not provided by prior sparse-BO methods. Across many real-world applications, we empirically find that BONSAI substantially reduces the number of non-default parameters in recommended configurations while maintaining competitive optimization performance, with little effect on wall time -- averaging only $1.5\times$ the candidate-generation cost of standard BO, compared to $7$-$34\times$ on average for prior sparse-BO methods (IR, ER, and SEBO).

2602.00834 2026-05-12 cs.LG cs.AI stat.ML

A Minimum Variance Path Principle for Accurate and Stable Score-Based Density Ratio Estimation

Wei Chen, Jiacheng Li, Shigui Li, Zhiqi Lin, Junmei Yang, John Paisley, Delu Zeng

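AI summary To address the path dependence that score-based density ratio estimation exhibits in practice, this paper proposes the Minimum Variance Path (MVP) principle, deriving a closed-form expression for the path variance of the score function so that it can be optimized. The method uses a flexibly parameterized Kumaraswamy mixture model to learn low-variance paths automatically, without manual selection, which improves the accuracy and stability of estimation and sets new state-of-the-art results on several benchmarks.

Details
Journal ref
The Fourteenth International Conference on Learning Representations, 2026
English abstract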

Score-based methods are powerful across machine learning, but they face a paradox: theoretically path-independent, yet practically path-dependent. We resolve this by proving that practical training objectives differ from the ideal, ground-truth objective by a crucial, overlooked term: the path variance of the score function. We propose the MVP (**M**inimum **V**ariance **P**ath) Principle to minimize this path variance. Our key contribution is deriving a closed-form expression for the variance, making optimization tractable. By parameterizing the path with a flexible Kumaraswamy Mixture Model, our method learns data-adaptive, low-variance paths without heuristic manual selection. This principled optimization of the complete objective yields more accurate and stable estimators, establishing new state-of-the-art results on challenging benchmarks and providing a general framework for optimizing score-based interpolation. Our code can be found at https://github.com/Hoemr/OpenDRE.git.

2601.21410 2026-05-12 stat.ML cs.LG

Learning When to Trust LLM Priors: A Validated Framework for Semantic Prior Integration

Erica Zhang, Naomi Sagan, Danny Tse, Fangzhao Zhang, Mert Pilanci, Jose Blanchet

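AI summary This study explores how to use the semantic prior knowledge of large language models (LLMs) reliably in supervised learning. The authors propose Statsformer, a validated framework that learns when to trust LLM-generated semantic priors and injects them into different kinds of predictive models. Through cross-validation, Statsformer automatically adjusts how much each model relies on the prior information, improving predictive performance while suppressing unreliable prior signals, and thereby offers a reliability-oriented approach to LLM-assisted statistical learning.

Details
English abstract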

Large language models (LLMs) encode rich semantic knowledge that can be useful for supervised learning, but their outputs are unreliable as statistical priors: they may be noisy, misspecified, or hallucinated. Existing LLM-informed learning methods either trust such signals directly, leaving predictions vulnerable to unreliable LLM guidance, or restrict semantic integration to a single model class. We introduce Statsformer, a validated framework for learning when to trust LLM-derived semantic priors in supervised statistical learning. Statsformer maps LLM-derived feature scores into a family of learner-specific prior-injection mechanisms across a heterogeneous library of linear and nonlinear predictors. It then uses out-of-fold validation to adaptively calibrate the influence of each prior-informed learner, allowing useful semantic information to improve prediction while attenuating weak, misspecified, or adversarial priors. This yields a guardrailed statistical learning system with an oracle-style guarantee: up to statistical error, the final predictor performs no worse than the best convex combination of its in-library candidates, including prior-free learners. Across diverse prediction tasks, informative LLM priors improve performance, while unreliable priors are automatically downweighted. These results position Statsformer as a reliability-oriented approach to LLM-informed statistical learning: rather than trusting LLM knowledge directly, it validates semantic priors against data before allowing them to influence the final predictor.

2601.21061 2026-05-12 cs.LG stat.ML

Signal from Structure: Exploiting Submodular Upper Bounds in Generative Flow Networks

Alexandre Larouche, Audrey Durand

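AI summary This paper studies Generative Flow Networks (GFlowNets) when the reward function is submodular and proposes SUBo-GFN, a new training method based on submodular upper bounds. The method uses submodularity to derive upper bounds on the rewards of unobserved compositional objects and trains according to the principle of optimism in the face of uncertainty, substantially increasing the quality and quantity of generated samples. Experiments show that SUBo-GFN achieves strong distribution matching and candidate generation on synthetic and real-world submodular tasks.

Details
English abstract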

Generative Flow Networks (GFlowNets; GFNs) are a class of generative models that learn to sample compositional objects proportionally to their a priori unknown value, their reward. We focus on the case where the reward has a specified, actionable structure, namely that it is submodular. We show submodularity can be harnessed to retrieve upper bounds on the reward of compositional objects that have not yet been observed. We provide in-depth analyses of the probability of such bounds occurring, as well as how many unobserved compositional objects can be covered by a bound. Following the Optimism in the Face of Uncertainty principle, we then introduce SUBo-GFN, which uses the submodular upper bounds to train a GFN. We show that SUBo-GFN generates orders of magnitude more training data than classical GFNs for the same number of queries to the reward function. We demonstrate the effectiveness of SUBo-GFN in terms of distribution matching and high-quality candidate generation on synthetic and real-world submodular tasks.

2601.20251 2026-05-12 stat.ML cs.LG

Efficient Evaluation of LLM Performance with Statistical Guarantees

Skyler Wu, Yash Nair, Emmanuel J. Candès

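AI summary This paper studies how to evaluate many large language models efficiently and accurately under a limited query budget. It proposes Factorized Active Querying (FAQ), which combines a Bayesian factor model, an adaptive sampling policy, and finite-population active inference to reduce the number of evaluation samples needed while preserving statistical confidence. Experiments show that FAQ increases effective sample size by up to 5x over existing methods on several benchmarks, markedly improving evaluation efficiency.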

Comments 27 pages, 12 figures

Details
English abstract

Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized Active Querying (FAQ), which (a) leverages historical information through a Bayesian factor model; (b) adaptively selects questions using a hybrid variance-reduction/active-learning sampling policy; and (c) maintains validity through Proactive Active Inference -- a finite-population extension of active inference (Zrnic & Candès, 2024) that enables direct question selection while preserving coverage. With negligible overhead cost, FAQ delivers up to $5\times$ effective sample size gains over strong baselines on two benchmark suites, across varying historical-data missingness levels: this means that it matches the CI width of uniform sampling while using up to $5\times$ fewer queries. We release our source code and our curated datasets to support reproducible evaluation and future research.

2601.19553 2026-05-12 stat.ME stat.CO

A Fast, Closed-Form Bandwidth Selector for the Beta Kernel Density Estimator

Johan Hallberg Szabadváry

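AI summary This study proposes the "Beta Reference Rule", a fast, closed-form bandwidth selector for the Beta kernel density estimator, addressing the lack of a reliable bandwidth selector that has limited its use on unit-interval data. The rule is derived from the unweighted asymptotic mean integrated squared error (AMISE) and uses a method-of-moments approximation to reduce bandwidth selection from iterative optimization to constant complexity, greatly improving computational efficiency. Experiments show that it maintains estimation accuracy while running more than 35,000 times faster than conventional numerical optimization, and that it avoids the vanishing-boundary and shoulder artifacts common to Gaussian kernel methods.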

Comments v3: Added Appendix detailing Python, R, and Julia software implementations. Accepted for publication in the Journal of Computational and Graphical Statistics (JCGS)

Details
English abstract

The Beta kernel estimator offers a theoretically superior alternative to the Gaussian kernel for unit interval data, eliminating boundary bias without requiring reflection or transformation. However, its adoption remains limited by the lack of a reliable bandwidth selector; practitioners currently rely on iterative optimization methods that are computationally expensive and prone to instability. We derive the "Beta Reference Rule," a fast, closed-form bandwidth selector based on the unweighted Asymptotic Mean Integrated Squared Error (AMISE) of a beta reference distribution. To address boundary integrability issues, we introduce a principled heuristic for U-shaped and J-shaped distributions. By employing a method-of-moments approximation, we reduce the bandwidth selection complexity from iterative optimization to $\mathcal{O}(1)$. Extensive Monte Carlo simulations demonstrate that our rule matches the accuracy of numerical optimization while delivering a speedup of over 35,000 times. Real-world validation on socioeconomic data shows that it avoids the "vanishing boundary" and "shoulder" artifacts common to Gaussian-based methods. We provide a comprehensive, open-source Python package to facilitate the immediate adoption of the Beta kernel as a drop-in replacement for standard density estimation tools.
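For readers unfamiliar with the estimator itself, the following is a minimal sketch of a Beta kernel density estimate in the form usually attributed to Chen (1999), assuming that is the estimator the abstract refers to. The bandwidth is left as a user-supplied argument because the abstract does not state the closed-form rule; the function names are hypothetical and unrelated to the authors' package.

```python
import math


def beta_pdf(u, a, b):
    """Density of a Beta(a, b) distribution at a point u in the open interval (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(u) + (b - 1.0) * math.log1p(-u))


def beta_kde(x, data, bandwidth):
    """Beta kernel density estimate at x in [0, 1].

    Each observation is weighted by a Beta(x / h + 1, (1 - x) / h + 1) density
    evaluated at that observation, so the kernel shape changes with x and never
    places mass outside the unit interval.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if bandwidth <= 0.0:
        raise ValueError("bandwidth must be positive")
    if any(not 0.0 < u < 1.0 for u in data):
        raise ValueError("this sketch assumes observations strictly inside (0, 1)")
    a = x / bandwidth + 1.0
    b = (1.0 - x) / bandwidth + 1.0
    return sum(beta_pdf(u, a, b) for u in data) / len(data)
```

A call such as `beta_kde(0.3, sample, 0.05)` evaluates the estimate at one point; the bandwidth value 0.05 here is an arbitrary placeholder, not a recommendation.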

2601.11242 2026-05-12 stat.ME

Deriving Complete Constraints in Hidden Variable Models

Michael C. Sachs, Erin E. Gabriel, Robin J. Evans, Arvid Sjölander

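AI summary This paper studies the complete set of constraints that hidden variable graphical models imply on the observable distribution, which are often more complex than simple conditional independence relations. The authors propose a systematic method for deriving all observable constraints in models where the observed variables are categorical and the joint distribution is described by linear relations, which can improve the efficiency of statistical estimation. The method is applied in several new settings and handles both inequality and equality constraints.

Details
English abstract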

Hidden variable graphical models can sometimes imply constraints on the observable distribution that are more complex than simple conditional independence relations. These observable constraints can falsify assumptions of the model that would otherwise be untestable due to the unobserved variables and can be used to constrain estimation procedures to improve statistical efficiency. Knowing the complete set of observable constraints is thus ideal, but this can be difficult to determine in many settings. In models with categorical observed variables and a joint distribution that is completely characterized by linear relations to the unobservable response function variables, we develop a systematic method for deriving the complete set of observable constraints. We illustrate the method in several new settings, including ones that imply both inequality and equality constraints.

2601.10899 2026-05-12 stat.ME

On the use of cross-fitting in causal machine learning with correlated units

Salvador V. Balkus, Hasan Laith, Nima S. Hejazi

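AI summary In causal machine learning, researchers usually fit and evaluate models on separate partitions of the data, a technique called cross-fitting that removes bias introduced by black-box predictive algorithms. This paper shows that even when study units are correlated (as in spatial, clustered, or time-series data), cross-fitting does not need to be specially designed to reduce correlation between folds and still eliminates the key bias terms. Simulations with a variety of correlation structures confirm that cross-fitting performed as if units were independent achieves good bias and precision, in some cases better than methods designed to eliminate correlation between folds.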

Comments 14 pages, 8 figures

Details
English abstract

In causal machine learning, the fitting and evaluation of nuisance models are often performed on separate partitions, or folds, of the observed data. This technique, called cross-fitting, eliminates bias introduced by the use of black-box predictive algorithms. When study units may be correlated, such as in spatial, clustered, or time-series data, investigators often design bespoke forms of cross-fitting to minimize correlation between folds. We prove that, perhaps contrary to popular belief, this is typically unnecessary: performing cross-fitting as if study units were independent still eliminates key bias terms even when units may be correlated. In simulation experiments with various correlation structures, we show that causal machine learning estimators achieve the same or improved bias and precision under cross-fitting that ignores correlation compared to techniques striving to eliminate correlation between folds.
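The fold mechanics of cross-fitting "as if study units were independent" can be sketched as follows. Units are shuffled into folds without regard to clusters, space, or time, and each unit's nuisance prediction comes from a model fitted on the other folds. The helper names and the toy least-squares learner are assumptions for illustration, standing in for whatever black-box learner would be used in practice.

```python
import random


def fit_line(xs, ys):
    """Toy nuisance learner: ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cross_fit_predictions(xs, ys, fit, n_folds=5, seed=0):
    """Out-of-fold predictions with folds assigned uniformly at random.

    The fold assignment deliberately ignores any dependence between units.
    """
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    predictions = [None] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on every fold except the held-out one, then predict the held-out units.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            predictions[i] = model(xs[i])
    return predictions
```

The out-of-fold predictions would then be plugged into whichever estimating equation the causal estimator uses; that step is estimator-specific and is not shown here.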

2601.09371 2026-05-12 stat.ME

White noise testing for functional time series via functional quantile autocorrelation

Ángel López-Oriona, Ying Sun, Hanlin Shang

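AI summary This paper proposes a new class of nonlinear tests, built on the functional quantile autocorrelation framework, for detecting serial dependence in functional time series. Using quantile-based excursion sets, the method robustly captures temporal dependence in infinite-dimensional functional data and accommodates outliers and complex nonlinear dependence. The study proposes omnibus test statistics, analyzes their asymptotic properties under known and estimated quantile curves, and demonstrates the method through extensive simulations and an application to high-frequency financial data, where it shows greater power than existing tests.

Details
English abstract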

We introduce a novel class of nonlinear tests for serial dependence in functional time series, grounded in the functional quantile autocorrelation framework. Unlike traditional approaches based on the classical autocovariance kernel, the functional quantile autocorrelation framework leverages quantile-based excursion sets to robustly capture temporal dependence within infinite-dimensional functional data, accommodating potential outliers and complex nonlinear dependencies. We propose omnibus test statistics and study their asymptotic properties under both known and estimated quantile curves, establishing their asymptotic distribution and consistency under mild assumptions. In particular, no moment conditions are required for the validity of the tests. Extensive simulations and an application to high-frequency financial functional time series demonstrate the methodology's effectiveness, reliably detecting complex serial dependence with superior power relative to several existing tests. This work expands the toolkit for functional time series, providing a robust framework for inference in settings where traditional methods may fail.

2512.04475 2026-05-12 cs.LG cs.AI cs.NE stat.ML

GraphBench: Next-generation graph learning benchmarking

Timo Stoll, Chendi Qian, Ben Finkelshtein, Ali Parviz, Darius Weber, Fabrizio Frasca, Hadar Shavit, Antoine Siraudin, Arman Mielke, Marie Anastacio, Erik Müller, Maya Bechler-Speicher, Michael Bronstein, Mikhail Galkin, Holger Hoos, Mathias Niepert, Bryan Perozzi, Jan Tönshoff, Christopher Morris

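AI summary As graph machine learning advances in areas such as molecular property prediction and chip design, current benchmarking practice remains fragmented, relying on task-specific datasets and inconsistent evaluation protocols that limit reproducibility and overall progress. To address this, the paper introduces GraphBench, a comprehensive benchmark suite covering diverse real-world domains and task settings, with standardized evaluation protocols and a unified hyperparameter-tuning framework, intended to support thorough evaluation of graph learning models and future work.

Details
English abstract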

Machine learning on graphs has made substantial progress across domains such as molecular property prediction and chip design. Yet benchmarking practices remain fragmented, often relying on narrow, task-specific datasets and inconsistent evaluation protocols, hindering reproducibility and broader progress. With the recent popularity of graph foundation models, these weaknesses have become apparent, as existing benchmarks are insufficient for thorough evaluation. To address these challenges, we introduce GraphBench, a comprehensive benchmark suite spanning diverse real-world domains and task settings, including node-level, edge-level, graph-level, and generative tasks. GraphBench provides standardized evaluation protocols, including consistent dataset splits and metrics for assessing out-of-distribution generalization across selected tasks, as well as a unified hyperparameter-tuning framework. We further evaluate GraphBench with recent message-passing neural networks and graph transformer models, establishing principled baselines for future research. See www.graphbench.io for further details.

2511.17994 2026-05-12 cs.LG stat.ML

Learning Rate Scheduling with Matrix Factorization for Private Training

Nikita P. Kalinin, Joel Daniel Andersson

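AI summary This paper studies differentially private model training under learning rate scheduling and correlated noise. Correlated noise is introduced via matrix factorization to improve accuracy, and, for the non-constant learning rate schedules widely used in practice, the authors derive general upper and lower error bounds in both single- and multi-epoch settings. Building on this analysis, they propose a learning-rate-aware factorization that improves on prefix-sum factorizations under both MaxSE and MeanSE error metrics, with experiments on CIFAR-10 and IMDB confirming its effectiveness.

Comments Accepted at FORC 2026

Details
Abstract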

We study differentially private model training with stochastic gradient descent under learning rate scheduling and correlated noise. Although correlated noise, in particular via matrix factorizations, has been shown to improve accuracy, prior theoretical work focused primarily on the prefix-sum workload. That workload assumes a constant learning rate, whereas in practice learning rate schedules are widely used to accelerate training and improve convergence. We close this gap by deriving general upper and lower bounds for a broad class of learning rate schedules in both single- and multi-epoch settings. Building on these results, we propose a learning-rate-aware factorization that achieves improvements over prefix-sum factorizations under both MaxSE and MeanSE error metrics. Our theoretical analysis yields memory-efficient constructions suitable for practical deployment, and experiments on CIFAR-10 and IMDB datasets confirm that schedule-aware factorizations improve accuracy in private training.
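Illustrative sketch (not taken from the paper): the code below contrasts the prefix-sum workload with a learning-rate-weighted workload and scores a factorization A = BC. The weighted workload `A[t][s] = etas[s]` for `s <= t`, the error formulas (MaxSE as the largest row norm of B times the largest column norm of C, MeanSE with the root-mean-square row norm instead), and all function names are assumptions based on common conventions in the matrix-factorization literature; the paper's definitions may differ.

```python
import math

def prefix_sum_workload(n):
    """Lower-triangular all-ones matrix: row t sums gradients 0..t."""
    return [[1.0 if s <= t else 0.0 for s in range(n)] for t in range(n)]

def lr_workload(etas):
    """Assumed learning-rate-weighted workload: A[t][s] = etas[s] for s <= t."""
    n = len(etas)
    return [[etas[s] if s <= t else 0.0 for s in range(n)] for t in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def max_col_norm(C):
    """Largest Euclidean column norm of C (assumed sensitivity term)."""
    return max(math.sqrt(sum(row[j] ** 2 for row in C)) for j in range(len(C[0])))

def max_se(B, C):
    """Assumed MaxSE: largest row norm of B times largest column norm of C."""
    max_row = max(math.sqrt(sum(x * x for x in row)) for row in B)
    return max_row * max_col_norm(C)

def mean_se(B, C):
    """Assumed MeanSE: root-mean-square row norm of B times largest column norm of C."""
    frob_sq = sum(x * x for row in B for x in row)
    return math.sqrt(frob_sq / len(B)) * max_col_norm(C)

# A decaying schedule (hypothetical values) and the trivial factorization B = A, C = I.
etas = [0.4, 0.3, 0.2, 0.1]
A = lr_workload(etas)
trivial_max = max_se(A, identity(4))
trivial_mean = mean_se(A, identity(4))
```

Any other factorization of `A` can be scored with the same two functions, which is the comparison a schedule-aware construction would aim to win.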

2511.14091 2026-05-12 stat.ME stat.AP

State-Space Representation of INGARCH Models and Their Application in Insurance

Jae Youn Ahn, Hong Beng Lim, Mario V. Wüthrich

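AI summary This paper studies integer-valued generalized autoregressive conditional heteroskedastic (INGARCH) models in insurance applications and proposes the marginalized state-space model (M-SSM) to overcome the limitations of conventional INGARCH models in theoretical interpretation, inclusion of covariates, and handling of missing data. By embedding INGARCH models in the M-SSM framework, the paper shows that covariates and missing data are accommodated naturally, and further shows that, under suitable assumptions, an M-SSM admits an observation-driven state-space model (O-SSM) representation, which supports the analysis of weak stationarity. Poisson and Negative-Binomial INGARCH(1,1) examples illustrate the approach for prediction with insurance data.

Details
Abstract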

Integer-valued generalized autoregressive conditional heteroskedastic (INGARCH) models are a popular framework for modeling serial dependence in count time-series. While convenient for modeling, prediction, and estimation, INGARCH models lack a clear theoretical justification for the evolution step. This limitation not only makes interpretation difficult and complicates the inclusion of covariates, but can also make the handling of missing data computationally burdensome. Consequently, applying such models in an insurance context, where covariates and missing observations are common, can be challenging. In this paper, we first introduce the marginalized state-space model (M-SSM), defined solely through the marginal distribution of the observations, and show that INGARCH models arise as special cases of this framework. The M-SSM formulation facilitates the natural incorporation of covariates and missing data mechanisms, and this representation in turn provides a coherent way to incorporate these elements within the INGARCH model as well. We then demonstrate that an M-SSM can admit an observation-driven state-space model (O-SSM) representation when suitable assumptions are imposed on the evolution of its conditional mean. This lifting from an M-SSM to an O-SSM provides a natural setting for establishing weak stationarity, even in the presence of heterogeneity and missing observations. The proposed ideas are illustrated through the Poisson and the Negative-Binomial INGARCH(1,1) models, highlighting their applicability in predictive analysis for insurance data.
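Illustrative sketch (not the paper's M-SSM construction): a Poisson INGARCH(1,1) process in its textbook form, where the conditional mean follows `lam_t = omega + alpha * y_{t-1} + beta * lam_{t-1}` and the count is drawn as Poisson(`lam_t`). The parameter values and function names are hypothetical, and the sampler is Knuth's simple method, which is only suitable for small means.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate_ingarch(omega, alpha, beta, n, seed=0):
    """Simulate counts and conditional means; requires alpha + beta < 1."""
    rng = random.Random(seed)
    lam = omega / (1.0 - alpha - beta)  # start at the stationary mean
    ys, lams = [], []
    for _ in range(n):
        y = sample_poisson(lam, rng)
        ys.append(y)
        lams.append(lam)
        lam = omega + alpha * y + beta * lam  # observation-driven update
    return ys, lams

ys, lams = simulate_ingarch(omega=1.0, alpha=0.3, beta=0.4, n=200)
```

The update line depends only on past observations and past means, which is what makes the recursion convenient for prediction but awkward when an observation is missing.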

2511.01196 2026-05-12 stat.ML cs.AI cs.LG

An Interdisciplinary and Cross-Task Review on Missing Data Imputation

Jicong Fan

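AI summary This review systematically surveys missing data imputation as an interdisciplinary, cross-task research area, covering missingness mechanisms, imputation methods, and problem characteristics across application settings. It organizes imputation techniques ranging from classical statistical methods to modern deep learning models (such as autoencoders, generative adversarial networks, and graph neural networks), with particular attention to complex data types such as tensors, time series, and graph-structured data. It also examines how imputation is combined with downstream tasks such as classification, clustering, and anomaly detection, and identifies key challenges and directions for future research.

Details
Journal ref
Foundations and Trends in Signal Processing, Vol. 20, No. 3, pp. 185-317, 2026
Abstract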

Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring. Despite decades of research and numerous imputation methods, the literature remains fragmented across fields, creating a critical need for a comprehensive synthesis that connects statistical foundations with modern machine learning advances. This work systematically reviews core concepts (including missingness mechanisms, single versus multiple imputation, and different imputation goals) and examines problem characteristics across various domains. It provides a thorough categorization of imputation methods, spanning classical techniques (e.g., regression, the EM algorithm) to modern approaches like low-rank and high-rank matrix completion, deep learning models (autoencoders, GANs, diffusion models, graph neural networks), and large language models. Special attention is given to methods for complex data types, such as tensors, time series, streaming data, graph-structured data, categorical data, and multimodal data. Beyond methodology, we investigate the crucial integration of imputation with downstream tasks like classification, clustering, and anomaly detection, examining both sequential pipelines and joint optimization frameworks. The review also assesses theoretical guarantees, benchmarking resources, and evaluation metrics. Finally, we identify critical challenges and future directions, emphasizing model selection and hyperparameter optimization, the growing importance of privacy-preserving imputation via federated learning, and the pursuit of generalizable models that can adapt across domains and data types, thereby outlining a roadmap for future research.

2510.26470 2026-05-12 stat.ME econ.EM

Valid Inference when Testing Violations of Parallel Trends for Difference-in-Differences

Jonas M. Mikhaeil, Christopher Harshaw

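AI summary This paper studies testing of the parallel trends assumption in difference-in-differences (DID) designs, noting that existing preliminary tests lead to low power, bias, and undercoverage of confidence intervals in estimation and inference. The authors propose simple preliminary tests and corresponding confidence intervals; under mild separation conditions, the test is consistent and the confidence intervals have valid coverage conditional on passing the test. The results rely on a conditional extrapolation assumption that links pre-treatment violations of parallel trends to the unidentified post-treatment violation. The methods are illustrated on synthetic data and on data on the recentralization of public services in Vietnam and right-to-carry laws in Virginia.

Details
Abstract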

The difference-in-differences (DID) research design is a key identification strategy which allows researchers to estimate causal effects under the parallel trends assumption. While the parallel trends assumption is counterfactual and cannot be tested directly, researchers often examine pre-treatment periods to check whether the time trends are parallel before treatment is administered. A recent literature has shown that existing preliminary tests have adverse effects on conventional statistical methods for estimation and inference, including low power, bias, and undercoverage. In this paper, we describe simple preliminary tests and corresponding confidence intervals for the causal effect which overcome these issues. Under mild separation conditions, the preliminary test is shown to be consistent and the confidence intervals for the causal effect have valid coverage conditional on passing the test. Our results hold under what we refer to as the conditional extrapolation assumption, which posits a relationship between the unidentified post-treatment violation of parallel trends and the identified pre-treatment violations. We view the conditional extrapolation assumption as one formalization of the assumption which is implicitly held when conducting a preliminary test for parallel trends. To illustrate the performance of the proposed methods, we use synthetic data as well as data on recentralization of public services in Vietnam and right-to-carry laws in Virginia.
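Illustrative sketch (not the paper's test or confidence intervals): the basic two-group DID contrast, applied once to the treatment period and once to two pre-treatment periods as a placebo check. A non-zero placebo contrast is the kind of identified pre-treatment violation the abstract refers to. The group means and the function name are hypothetical.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical group means by period: -2 and -1 are pre-treatment, 0 is post-treatment.
treated = {-2: 10.0, -1: 11.0, 0: 15.0}
control = {-2: 8.0, -1: 9.2, 0: 10.0}

# Effect estimate under parallel trends between periods -1 and 0.
effect = did_estimate(treated[-1], treated[0], control[-1], control[0])

# Placebo contrast on pre-treatment periods only: zero if pre-trends were exactly parallel.
pre_violation = did_estimate(treated[-2], treated[-1], control[-2], control[-1])
```

Here the placebo contrast is small relative to the effect estimate, but deciding how to carry such a pre-treatment violation forward is exactly where an extrapolation assumption is needed.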

2510.09877 2026-05-12 cs.LG cs.AI stat.ML

Batch Bayesian Active Learning with Partial Batch Label Sampling

Kangping Hu, Stephen Mussmann

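AI summary This paper studies label sampling in batch Bayesian active learning. Existing methods either become computationally expensive or suffer performance drops at large batch sizes; using a formulation of Bayesian decision theory, the authors derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. Experiments show that the method outperforms alternatives for a fixed budget, in particular with Bayesian logistic regression on embeddings from large pre-trained models.

Details
Abstract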

Over the past couple of decades, many active learning acquisition functions have been proposed, leaving practitioners with an unclear choice of which to use. Bayesian-based active learning offers principled objectives with explainable intuition, including Expected Error Reduction (EER), Expected Predictive Information Gain (EPIG), and Bayesian Active Learning by Disagreement (BALD). A key challenge of such methods is the difficulty of scaling to large batch sizes, leading to either computational challenges (BatchBALD) or dramatic performance drops (top-$B$ selection). Here, using a particular formulation of Bayesian Decision Theory, we derive Partial Batch Label Sampling (ParBaLS) for the EPIG algorithm. We show experimentally for several datasets that ParBaLS EPIG gives superior performance for a fixed budget and Bayesian Logistic Regression on embeddings from large pre-trained models. Our code is available at https://github.com/ADDAPT-ML/ParBaLS.
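Illustrative sketch (this is not ParBaLS or EPIG): the BALD score and naive top-$B$ selection mentioned in the abstract, written for a tiny hypothetical pool. BALD is computed as the entropy of the averaged prediction minus the average entropy of the per-sample predictions; the pool values and function names are assumptions made for the example.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def bald_score(posterior_preds):
    """posterior_preds: one class-probability vector per posterior sample."""
    k = len(posterior_preds)
    n_classes = len(posterior_preds[0])
    mean_pred = [sum(p[c] for p in posterior_preds) / k for c in range(n_classes)]
    return entropy(mean_pred) - sum(entropy(p) for p in posterior_preds) / k

def top_b(scores, b):
    """Indices of the b highest-scoring pool points, scored independently."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:b]

# Hypothetical pool of three points, two posterior samples each.
pool = [
    [[0.9, 0.1], [0.1, 0.9]],      # samples disagree: high BALD
    [[0.5, 0.5], [0.5, 0.5]],      # uncertain, but samples agree: zero BALD
    [[0.99, 0.01], [0.98, 0.02]],  # confident and in agreement: near-zero BALD
]
scores = [bald_score(p) for p in pool]
chosen = top_b(scores, b=1)
```

Because `top_b` scores each point on its own, a larger `b` can pick several near-duplicate points, which is one common explanation for the performance drop the abstract attributes to top-$B$ selection.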

2510.08972 2026-05-12 stat.ME

Robust and Efficient Semiparametric Inference for the Stepped Wedge Design

Fan Xia, K. C. Gary Chan, Emily Voldal, Avi Kenny, Patrick J. Heagerty, James P. Hughes

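AI summary This paper addresses the challenges of estimating intervention effects in stepped wedge designs (SWDs) and proposes a unified semiparametric inference framework that handles time-varying intervention effects together with correlated observations within clusters, varying cluster sizes, and dependent treatment assignments. The resulting estimator remains consistent and asymptotically normal under a misspecified covariance structure, and a standard error estimator suited to small samples is built from the permutation structure of treatment assignment, improving the robustness and efficiency of inference. The paper also shows the framework's flexibility in handling effect modification and adjustment for imbalanced covariates, and validates it through simulations and a public health trial.

Details
Abstract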

Stepped wedge designs (SWDs) are increasingly used to evaluate longitudinal cluster-level interventions but pose substantial challenges for valid inference. Because crossover times are randomized, intervention effects are intrinsically confounded with secular time trends, while heterogeneity across clusters, complex correlation structures, baseline covariate imbalances, and small numbers of clusters further complicate inference. We propose a unified semiparametric framework for estimating possibly time-varying intervention effects in SWDs. Under a semiparametric model on treatment contrast, we develop a nonstandard semiparametric efficiency theory that accommodates correlated observations within clusters, varying cluster-period sizes, and weakly dependent treatment assignments. The resulting estimator is consistent and asymptotically normal even under misspecified covariance structure and control cluster-period means, and is efficient when both are correctly specified. To enable inference with few clusters, we exploit the permutation structure of treatment assignment to propose a standard error estimator that reflects finite-sample variability, with a leave-one-out correction to reduce plug-in bias. The framework also allows incorporation of effect modification and adjustment for imbalanced precision variables through design-based adjustment or double adjustment that additionally incorporates an outcome-based component. Simulations and application to a public health trial demonstrate the robustness and efficiency of the proposed method relative to standard approaches.

2509.22531 2026-05-12 stat.ML cs.LG

Debiased Front-Door Learners for Heterogeneous Effects

Yonghan Jung

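AI summary In observational studies where treatment and outcome share unobserved confounders but the mediator is unconfounded, causal effects can be identified through front-door (FD) adjustment. This paper studies estimation of heterogeneous treatment effects (HTE) under FD identification and proposes two debiased learners, FD-DR-Learner and FD-R-Learner. Under explicit sample-splitting, bounded-overlap, moment, and stage-learning assumptions, the two learners satisfy a product-error bound and a stage-error decomposition, respectively, yielding conditional quasi-oracle properties when the nuisance remainders are small. Experiments on synthetic data and a real-world case study based on the FARS dataset show robust and efficient estimation.

Comments 26 pages, 3 figures. Revised theory statements, notation, and proof presentation; conclusions unchanged. Code available at https://github.com/yonghanjung/FD-CATE

Details
Abstract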

In observational settings where treatment and outcome share unmeasured confounders but an observed mediator remains unconfounded, the front-door (FD) adjustment identifies causal effects through the mediator. We study the heterogeneous treatment effect (HTE) under FD identification and introduce two debiased learners: FD-DR-Learner and FD-R-Learner. Under explicit sample-splitting, bounded-overlap, moment, and stage-learning assumptions, we show that FD-DR satisfies a product-error bound and FD-R satisfies a stage-error decomposition; these results yield conditional quasi-oracle corollaries when the relevant nuisance remainders are no larger than the target or stage oracle terms. We provide error analyses establishing this debiasedness and demonstrate robust empirical performance in synthetic studies and a real-world case study of primary seat-belt laws using the Fatality Analysis Reporting System (FARS) dataset. Together, these results indicate that the proposed learners can deliver reliable and sample-efficient HTE estimates in FD scenarios when the stated assumptions are credible. The implementation is available at https://github.com/yonghanjung/FD-CATE.
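For reference, the standard (population-level) front-door adjustment formula for discrete variables is given below. The notation is an assumption made here rather than the paper's: $a$ is the treatment, $m$ the mediator, $y$ the outcome, and $a'$ a summation index over treatment values. The heterogeneous-effect learners in the paper target a covariate-conditional analogue that is not reproduced here.

```latex
P\bigl(y \mid \mathrm{do}(a)\bigr)
  \;=\; \sum_{m} P(m \mid a) \sum_{a'} P(y \mid m, a')\, P(a')
```

The outer sum carries the effect of treatment on the mediator, and the inner sum recovers the effect of the mediator on the outcome by adjusting for treatment, which blocks the unmeasured confounding path.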

2508.15016 2026-05-12 stat.ME

Untangling Sample and Population Level Estimands in Bayesian Causal Computation

Arman Oganisian

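AI summary This paper examines the distinction between sample-level and population-level causal estimands in Bayesian causal computation, highlighting key differences in identification, modeling, computation, and interpretation. Common sample-level estimands require cross-world Bayesian modeling and explicit MCMC sampling of counterfactuals, whereas population-level estimands typically require only a posterior distribution over parameters and post-hoc Monte Carlo approximations. Through several examples, the author shows that seemingly similar computational procedures can yield fundamentally different estimands and hence incorrect inferences, and summarizes common mistakes and factors to consider when choosing an estimand.

Details
Abstract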

Model-based Bayesian inference for sample and population-level causal estimands has been growing in popularity. This literature routinely emphasizes clear specification of the target estimand; however, blind implementation of standard computational procedures may implicitly target estimands that differ from the one specified at the outset. This sometimes leads to unwitting conflation of sample and population-level inference. In this paper, we elucidate the differences between sample and population-level inference with respect to identification, modeling, computation, and interpretation. For example, common sample-level estimands require cross-world Bayesian modeling, whereas many (but not all) population-level estimands do not. Similarly, the former requires explicit MCMC sampling of counterfactuals from their joint posterior, whereas the latter typically only requires a posterior distribution over parameters and, perhaps, post-hoc Monte Carlo approximations. We explore these issues across four examples, including with Bayesian nonparametric models, in which ostensibly similar Bayesian computational procedures yield posterior draws of fundamentally different estimands, leading to incorrect inferences. We end with a discussion of common mistakes and factors to consider when choosing an estimand.

2507.00795 2026-05-12 econ.EM stat.ME

Randomization Inference with Sample Attrition

Xinran Li, Peizan Sheng, Zeyang Yu

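AI summary This paper studies how to conduct valid randomization inference under sample attrition, where outcome data are missing for some units. The authors propose new, computationally efficient methods that remain valid under a broad class of informative missingness mechanisms, even when a unit's missingness depends on its unobserved potential outcomes. The methods construct worst-case p-values for testing sharp and bounded null hypotheses on treatment effects, obtain closed-form solutions by using distribution-free test statistics, and improve inferential power by incorporating both potential outcomes and potential missingness indicators.

Details
Abstract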

Randomization inference is a widely-used and appealing approach for analyzing treatment effects in randomized experiments, as it is finite-sample valid and does not require any distributional assumptions. However, naive application of randomization inference may suffer from severe size distortion in the presence of sample attrition, where outcome data are missing for some units. In this paper, we propose new, computationally efficient methods for randomization inference that remain valid under a broad class of potentially informative missingness mechanisms, allowing a unit's missingness to depend on its (unobserved) potential outcomes. Specifically, we construct valid p-values for testing both sharp and bounded null hypotheses on treatment effects via a worst-case consideration of the classical Fisher randomization test. Leveraging distribution-free test statistics, these worst-case p-values admit closed-form solutions. Importantly, by incorporating both potential outcomes and potential missingness indicators into the test statistic, our methods can exploit structural assumptions such as monotone missingness, which are commonly adopted in applications due to their plausibility and ability to substantially improve inferential power. Moreover, our approach connects to a range of partial identification bounds in the literature, which in some sense suggests the sharpness of our tests. We illustrate the proposed methods through both simulation studies and an empirical application. An R package implementing the proposed methods is publicly available.
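Illustrative sketch (the classical test only, with no attrition handling): a Monte Carlo Fisher randomization test of the sharp null of no effect for any unit, using the absolute difference in means as the test statistic. The paper's worst-case p-values under missing outcomes are not implemented here; the data and function names are hypothetical.

```python
import random

def diff_in_means(outcomes, treated):
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [y for y, z in zip(outcomes, treated) if z == 1]
    c = [y for y, z in zip(outcomes, treated) if z == 0]
    return sum(t) / len(t) - sum(c) / len(c)

def fisher_randomization_pvalue(outcomes, treated, n_draws=2000, seed=0):
    """Under the sharp null, outcomes are fixed and only the assignment is random."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    assignment = list(treated)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(assignment)  # re-draw a complete randomization with the same group sizes
        if abs(diff_in_means(outcomes, assignment)) >= observed - 1e-12:
            extreme += 1
    # Adding one to numerator and denominator keeps the Monte Carlo p-value valid.
    return (1 + extreme) / (1 + n_draws)

# Hypothetical fully observed experiment with five treated and five control units.
outcomes = [5.1, 6.3, 5.8, 7.0, 6.6, 3.9, 4.2, 4.8, 3.5, 4.4]
treated = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_value = fisher_randomization_pvalue(outcomes, treated)
```

With missing outcomes, the statistic cannot be evaluated for every re-drawn assignment, which is why a naive application like this one can be badly sized under attrition.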

2506.12542 2026-05-12 cs.LG cs.AI cs.CV stat.ML

PLD: A Choice-Theoretic List-Wise Knowledge Distillation

Ejafa Bassam, Dawei Zhu, Kaigui Bian

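AI summary This paper proposes PLD, a choice-theoretic knowledge distillation method that interprets teacher logits as class "worth" scores and builds a weighted list-wise ranking loss under the Plackett-Luce model. PLD directly optimizes the teacher's full ranking structure, placing the true label first and ordering the remaining classes by descending teacher confidence, which yields a convex and translation-invariant surrogate loss. Experiments show consistent gains across multiple datasets and teacher-student pairs with different architectures, and across a range of distillation objectives.

Details
Journal ref
Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 136090--136112 (2026)
Abstract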

Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based knowledge distillation, it has become the de facto approach to augment cross-entropy with a distillation term. Typically, this term is either a KL divergence that matches marginal probabilities or a correlation-based loss that captures intra- and inter-class relationships. In every case, it acts as an additional term to cross-entropy. This term has its own weight, which must be carefully tuned. In this paper, we adopt a choice-theoretic perspective and recast knowledge distillation under the Plackett-Luce model by interpreting teacher logits as "worth" scores. We introduce "Plackett-Luce Distillation (PLD)", a weighted list-wise ranking loss. In PLD, the teacher model transfers knowledge of its full ranking of classes, weighting each ranked choice by its own confidence. PLD directly optimizes a single "teacher-optimal" ranking. The true label is placed first, followed by the remaining classes in descending teacher confidence. This process yields a convex and translation-invariant surrogate that subsumes weighted cross-entropy. Empirically, across CIFAR-100, ImageNet-1K, and MS-COCO, PLD achieves consistent gains across diverse architectures and distillation objectives, including divergence-based, correlation-based, and feature-based methods, in both homogeneous and heterogeneous teacher-student pairs.
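Illustrative sketch (an assumption-laden reading of the abstract, not the official PLD implementation): a weighted Plackett-Luce negative log-likelihood of the "teacher-optimal" ranking under the student's logits. The ranking rule follows the abstract (true label first, then descending teacher logits). Using the teacher's softmax probability of each ranked class as its weight is an assumption; the paper's exact weighting may differ. All function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def teacher_optimal_ranking(teacher_logits, true_label):
    """True label first, remaining classes by descending teacher logit."""
    rest = [c for c in range(len(teacher_logits)) if c != true_label]
    rest.sort(key=lambda c: teacher_logits[c], reverse=True)
    return [true_label] + rest

def weighted_pl_loss(student_logits, teacher_logits, true_label):
    ranking = teacher_optimal_ranking(teacher_logits, true_label)
    weights = softmax(teacher_logits)  # assumed weighting scheme
    loss = 0.0
    for k, cls in enumerate(ranking):
        # Plackett-Luce step k: choose cls among the classes not yet ranked.
        remaining = [student_logits[c] for c in ranking[k:]]
        loss += weights[cls] * (log_sum_exp(remaining) - student_logits[cls])
    return loss

student = [0.3, 1.2, -0.5]
teacher = [2.0, 0.5, 1.0]
loss = weighted_pl_loss(student, teacher, true_label=1)
```

The first term of the sum is the ordinary cross-entropy of the true label scaled by its weight, and adding a constant to every student logit leaves each term unchanged, consistent with the translation invariance claimed in the abstract.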

2505.23113 2026-05-12 stat.ME math.ST stat.AP stat.TH

Valid F-screening in linear regression

Olivia McGough, Daniela Witten, Daniel Kessler

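AI summary This paper studies valid conditional inference in linear regression when results are reported only after the overall null hypothesis is rejected, a practice the authors call F-screening because the overall null is typically tested with an F-statistic. They develop inferential tools with good statistical properties conditional on rejection, including selective p-values, confidence intervals, and point estimates. These tools do not require the raw data and rely only on the standard outputs of a regression, making them suitable for retrospective analysis of published studies; they are examined in simulations and on real data.

Details
Abstract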

Suppose that a data analyst wishes to report the results of a least squares linear regression only if the overall null hypothesis, $H_0^{1:p}: \beta_1 = \beta_2 = \ldots = \beta_p = 0$, is rejected. This practice, which we refer to as F-screening (since the overall null hypothesis is typically tested using an $F$-statistic), is in fact common across a number of applied fields. Unfortunately, it poses a problem: standard guarantees for the inferential outputs of linear regression, such as Type 1 error control of hypothesis tests and nominal coverage of confidence intervals, hold unconditionally, but fail to hold conditional on rejection of the overall null hypothesis. In this paper, we develop an inferential toolbox for the coefficients in a least squares model that is valid conditional on rejection of the overall null hypothesis. We develop selective p-values that lead to tests that are consistent and control the selective Type 1 error, i.e., the Type 1 error conditional on having rejected the overall null hypothesis. Furthermore, they can be computed without access to the raw data, i.e., using only the standard outputs of a least squares linear regression, and therefore are suitable for use in a retrospective analysis of a published study. We also develop confidence intervals that attain nominal selective coverage, and point estimates that account for having rejected the overall null hypothesis. We derive an expression for the Fisher information about the coefficients resulting from the proposed approach, and compare this to the Fisher information that results from an alternative approach that relies on sample splitting. We investigate the proposed approach in simulation and via re-analysis of two datasets from the biomedical literature.
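Illustrative sketch (the screening step only, not the paper's selective inference): the overall $F$-statistic can be recovered from standard regression output, here $R^2$, the sample size $n$, and the number of predictors $p$, assuming the model includes an intercept. The function names are hypothetical, and the critical value is supplied by the caller (3.49 is approximately the 5% critical value of an $F$ distribution with 2 and 20 degrees of freedom).

```python
def overall_f_statistic(r_squared, n, p):
    """F statistic for H0: beta_1 = ... = beta_p = 0 in a model with an intercept."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 residual degrees of freedom")
    return (r_squared / p) / ((1.0 - r_squared) / (n - p - 1))

def passes_f_screen(r_squared, n, p, critical_value):
    """True if the overall null is rejected, i.e. the results would be reported."""
    return overall_f_statistic(r_squared, n, p) > critical_value

# Hypothetical published regression: R^2 = 0.5, n = 23 observations, p = 2 predictors.
f_stat = overall_f_statistic(r_squared=0.5, n=23, p=2)
reported = passes_f_screen(0.5, 23, 2, critical_value=3.49)
```

Coefficient-level inference for a study that passed this screen then has to condition on the event `reported == True`, which is the problem the abstract sets out to solve.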

2505.17204 2026-05-12 stat.ML cs.LG math.PR math.ST stat.CO stat.TH

Liouville PDE-based sliced-Wasserstein flow

Jayshawn Cooper, Pilhwa Lee

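AI summary This paper recasts the sliced Wasserstein flow (SWF) in a Liouville partial differential equation (PDE)-based form, giving a new nonparametric, implicit generative gradient flow method. The stochastic diffusive term of the Fokker-Planck-based formulation is reformulated as Liouville PDE transport without a diffusive term, with density estimation handled by normalizing flows of neural ODEs, improving convergence and stability. For generating Wasserstein barycenters, the method prescribes Kantorovich potentials and reduces variance; in fair regression it gives accuracy-fairness trade-offs comparable to standard SWF, with better fairness and scalability than the exact Wasserstein barycenter.

Comments 24 pages, 10 figures. arXiv admin note: substantial text overlap with arXiv:1806.08141 by other authors

Details
Abstract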

The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is transformed into a Liouville partial differential equation (PDE)-based formalism. First, the stochastic diffusive term from the Fokker-Planck equation-based Monte Carlo is reformulated as a Liouville PDE-based transport without the diffusive term, essentially reflecting the probability flow ODE. The involved density estimation is handled by normalizing flows of neural ODE without an explicitly defined score function. Next, the computation of the Wasserstein barycenter is approximated by the Liouville PDE-based SWF barycenter with the prescription of Kantorovich potentials for the induced gradient flow to generate its samples. With these two changes, the Liouville PDE-based SWF and SWF barycenters converge better in training and testing, with reduced variance. Applying the generative Liouville PDE-based SWF barycenter to fair regression yields competitive accuracy-fairness Pareto curves: results are comparable to the standard SWF, and fairness and scalability improve significantly relative to the exact Wasserstein barycenter.
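The step from a Fokker-Planck equation to a diffusion-free transport equation can be written out in general form. The notation below is an assumption made for illustration, not the paper's: $\rho_t$ is the density, $v_t$ the drift, and $\lambda \ge 0$ the diffusion strength. The rewriting uses the identity $\Delta \rho_t = \nabla \cdot (\rho_t \nabla \log \rho_t)$.

```latex
\partial_t \rho_t = -\nabla \cdot (\rho_t v_t) + \lambda \Delta \rho_t
\quad\Longleftrightarrow\quad
\partial_t \rho_t = -\nabla \cdot \bigl(\rho_t \,(v_t - \lambda \nabla \log \rho_t)\bigr),
\qquad
\frac{dx_t}{dt} = v_t(x_t) - \lambda \nabla \log \rho_t(x_t)
```

The right-hand form is a Liouville (continuity) equation whose characteristics follow the deterministic probability flow ODE, so particles can be transported without injected noise provided the density term is estimated, which the abstract says is done with normalizing flows.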

2505.16741 2026-05-12 cs.LG math.OC stat.ML

Meta-reinforcement learning with minimum attention

Shashank Gupta, Pilhwa Lee

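AI summary This paper applies the minimum attention principle to reinforcement learning by adding a minimum attention regularizer to the reward, aiming to improve learning efficiency and stability in high-dimensional nonlinear dynamics. Within a model-based meta-learning framework, model learning and meta-policy optimization are performed alternately. Experiments show that the method outperforms state-of-the-art algorithms in few-shot adaptation and in robustness to model and environment perturbations, and also improves energy efficiency.

Comments 30 pages, 22 figures

Details
Abstract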

Minimum attention, first proposed by Brockett, applies the least action principle to changes of control with respect to state and time. The involved regularization is highly relevant in emulating biological control, such as motor learning. We apply minimum attention in reinforcement learning (RL) as part of the rewards and investigate its connection to meta-learning and stabilization. Specifically, model-based meta-learning with minimum attention is explored in high-dimensional nonlinear dynamics. Ensemble-based model learning and gradient-based meta-policy learning are alternately performed. Empirically, minimum attention outperforms state-of-the-art model-free and model-based RL algorithms, showing fast adaptation in few shots and reduced variance under perturbations of the model and environment. Furthermore, minimum attention improves energy efficiency.

2504.14697 2026-05-12 cs.LG math.AP math.DS stat.ML

Quantitative Clustering in Mean-Field Transformer Models

Shi Chen, Zhengjiang Lin, Yury Polyanskiy, Philippe Rigollet

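AI summary This paper studies the long-time clustering behavior of tokens in mean-field transformer models and shows that, under suitable assumptions on the parameters, the model converges exponentially fast to a Dirac point mass. The quantitative analysis gives explicit convergence rates, providing a theoretical basis for understanding synchronization in transformer models.

Comments 50 pages, 4 figures; We have updated the introduction and added sketches of the proofs of the main theorems

Details
Abstract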

The evolution of tokens through deep transformer models can be modeled as an interacting particle system that has been shown to exhibit an asymptotic clustering behavior akin to the synchronization phenomenon in Kuramoto models. In this work, we investigate the long-time clustering of mean-field transformer models. More precisely, under suitable assumptions on the transformer model parameters, we establish that any suitably regular mean-field initialization synchronizes exponentially fast to a Dirac point mass, with explicit quantitative convergence rates.

2503.16027 2026-05-12 stat.CO stat.AP stat.ME

Deep Gaussian Process Emulation with Gradient Information and Sequential Design for Simulators with Sharp Variations

Yiming Yang, Deyu Ming, Serge Guillas

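AI summary This paper studies how deep Gaussian processes (DGPs) can efficiently emulate simulators with sharp variations and proposes a method for quantifying gradient uncertainty. Using local linearization and the chain rule, the authors derive closed-form expressions for the mean and covariance of the gradient of a two-layer DGP, enabling fast gradient evaluation with uncertainty estimates. Building on the gradient uncertainty, the paper proposes a sequential design method for sharp-variation regions, using an entropy-based acquisition rule to improve emulation accuracy in complex nonstationary settings.

Details
Abstract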

Deep Gaussian Processes (DGPs) compose GP layers to warp inputs, enabling improved emulation of computer models with nonstationary input-output behavior compared with ordinary GPs. In contrast to GPs, the predictive uncertainty for DGP gradients remains relatively underexplored. Quantifying DGP gradient uncertainty can support gradient-based tasks in complex, nonstationary settings where ordinary GPs may struggle. While GP gradient posteriors are analytically tractable, extending such constructions to DGPs is challenging due to their hierarchical composition. In this paper, we propose an efficient approximation to the gradient distribution of a two-layer DGP emulator. Using the chain rule with local linearization, we derive closed-form expressions for the gradient mean and covariance, enabling fast gradient evaluation with uncertainty quantification (UQ). Empirically, our approach delivers promising performance while uniquely providing UQ of gradients. We then use the gradient uncertainties to guide sequential design for models with sharp variations: we define sharp variation regions as those where the gradient norm exceeds a threshold. We subsequently introduce an entropy-based acquisition rule that selects new samples in locations where the classification of points as inside versus outside the sharp-variation region is most uncertain. Experiments on synthetic benchmarks and a real-world application show that the resulting sequential design more accurately emulates functions with sharp variations than existing design methods.
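Illustrative sketch (a one-dimensional simplification, not the paper's DGP emulator): given a Gaussian predictive distribution for the gradient at each candidate input, compute the probability that the gradient magnitude exceeds a threshold, and pick the candidate where that inside-versus-outside classification has the highest binary entropy. The candidate values, the scalar-gradient assumption, and all function names are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sharp(grad_mean, grad_sd, threshold):
    """P(|g| > threshold) for a scalar gradient g ~ N(grad_mean, grad_sd^2)."""
    inside = (normal_cdf((threshold - grad_mean) / grad_sd)
              - normal_cdf((-threshold - grad_mean) / grad_sd))
    return 1.0 - inside

def binary_entropy(p):
    """Entropy (nats) of a Bernoulli(p) classification; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def next_design_point(candidates, threshold):
    """candidates: (x, gradient mean, gradient sd); return the most ambiguous x."""
    best = max(candidates, key=lambda c: binary_entropy(prob_sharp(c[1], c[2], threshold)))
    return best[0]

candidates = [
    (0.1, 0.2, 0.1),  # clearly flat
    (0.5, 2.0, 0.5),  # gradient mean sits on the threshold: most ambiguous
    (0.9, 6.0, 0.3),  # clearly sharp
]
chosen_x = next_design_point(candidates, threshold=2.0)
```

The rule spends new simulator runs near the boundary of the sharp-variation region rather than deep inside it or far from it, since both of those cases are already classified with confidence.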

2503.00982 2026-05-12 stat.ME physics.soc-ph

Multivariable Behavioral Change Modeling of Epidemics in the Presence of Undetected Infections

Caitlin Ward, Rob Deardon, Alexandra M. Schmidt

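AI summary This paper proposes a new Bayesian epidemic modeling framework that aims to describe disease spread more accurately, accounting in particular for human behavioral change and undetected infections. By incorporating hospitalization and death data and letting multiple data sources dynamically influence population behavioral change, the model better captures the complexity of epidemic spread. Its properties are examined in simulation and its utility is illustrated on COVID-19 data from Montreal and Miami, offering a more accurate analytical tool for epidemic control.

Comments 21 pages, 7 figures

Details
Abstract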

Epidemic models are invaluable tools to understand and implement strategies to control the spread of infectious diseases, as well as to inform public health policies and resource allocation. However, current modeling approaches have limitations that reduce their practical utility, such as the exclusion of human behavioral change in response to the epidemic or ignoring the presence of undetected infectious individuals in the population. These limitations became particularly evident during the COVID-19 pandemic, underscoring the need for more accurate and informative models. To address these challenges, we develop a novel Bayesian epidemic modeling framework to better capture the complexities of disease spread by incorporating behavioral responses and undetected infections. In particular, our framework makes three contributions: 1) leveraging additional data on hospitalizations and deaths in modeling the disease dynamics, 2) accounting for data uncertainty arising from the large presence of asymptomatic and undetected infections, and 3) allowing the population behavioral change to be dynamically influenced by multiple data sources (cases and deaths). We thoroughly investigate the properties of the proposed model via simulation, and illustrate its utility on COVID-19 data from Montreal and Miami.

2502.06096 2026-05-12 stat.ML cs.AI cs.LG stat.ME

Post-detection inference for sequential changepoint localization

Aytijhya Saha, Aaditya Ramdas

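AI summary This paper studies a fundamental but largely unexplored problem in sequential changepoint analysis: statistical inference after a change has been detected. The authors propose a general nonparametric framework that constructs confidence sets for the unknown changepoint using only the data observed up to the stopping time at which an arbitrary sequential detection algorithm declares a change. The method makes no assumptions on the post-change distribution, the observation space, or the detection procedure, is non-asymptotically valid, applies to a wide range of practical settings, and comes with theoretical guarantees on the width of the confidence intervals.

Details
Abstract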

This paper addresses a fundamental but largely unexplored challenge in sequential changepoint analysis: conducting inference following a detected change. We develop a very general framework to construct confidence sets for the unknown changepoint using only the data observed up to a data-dependent stopping time at which an arbitrary sequential detection algorithm declares a change. Our framework is nonparametric, making no assumption on the composite post-change class, the observation space, or the sequential detection procedure used, and is non-asymptotically valid. We also extend it to handle composite pre-change classes under a suitable assumption, and also derive confidence sets for the change magnitude in parametric settings. We provide theoretical guarantees on the width of our confidence intervals. Extensive simulations demonstrate that the produced sets have reasonable size, and slightly conservative coverage. In summary, we present the first general method for sequential changepoint localization, which is theoretically sound and broadly applicable in practice.

2502.03414 2026-05-12 stat.ME

Efficient nonparametric estimation with difference-in-differences in the presence of network dependence and interference

Michael Jetsupphasuk, Didong Li, Michael G. Hudgens

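AI summary This paper studies efficient nonparametric difference-in-differences (DiD) estimation of causal effects in the presence of network dependence and interference. The authors extend DiD to allow heterogeneous treatment effects, interference between units, and latent variable dependence, and propose a doubly robust estimator that is consistent, asymptotically normal, and efficient under a conditional parallel trends assumption. The method is evaluated in simulations and applied to study the effect of adopting emission control technologies in coal power plants on cardiovascular disease mortality.

Details
Abstract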

Difference-in-differences (DiD) is a causal inference method for observational longitudinal data that assumes parallel expected potential outcome trajectories between treatment groups under the counterfactual scenario where all units receive a specific treatment. In this paper, DiD is extended to allow for: (i) non-identically distributed treatment effects and exposure probabilities; (ii) interference, where treatment of one unit can affect outcomes in neighboring units; and (iii) latent variable dependence, where outcomes, treatments, and covariates may exhibit between-unit correlation. The causal estimand of interest is the network-averaged expected exposure effect if units received a specific exposure level, where a unit's exposure is a function of its own treatment and its neighbors' treatments. Under a conditional parallel trends assumption and suitable network dependency and heterogeneity conditions, a doubly robust estimator allowing for data-adaptive nuisance function estimation is proposed and shown to be consistent, asymptotically normal, and efficient. The proposed methods are evaluated in simulations and applied to study the effects of adopting emission control technologies in coal power plants on county-level mortality due to cardiovascular disease.

2501.04721 2026-05-12 stat.AP cs.LG physics.med-ph

A Shape-Based Functional Index for Objective Assessment of Pediatric Motor Function

Shashwat Kumar, Arafat Rahman, Robert Gutierrez, Sarah Livermon, Allison N. McCrady, Silvia Blemker, Rebecca Scharf, Anuj Srivastava, Laura E. Barnes

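AI summary This study proposes a shape-based functional index for objectively assessing motor function in children with neuromuscular disorders. Using data from wearable sensors together with shape-based principal component analysis and partial least squares, it identifies movement patterns related to variation in motion speed and asymmetry and constructs a new motor function index that correlates strongly with muscle fat infiltration, a motor function score, and age-related degenerative changes. The method can be deployed in home settings and supports longitudinal tracking of treatment efficacy, providing a more objective assessment tool for pediatric neuromuscular disorders.

Comments 13 pages

Details
Journal ref
PLOS ONE, 2025
Abstract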

Clinical assessments for neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), continue to rely on subjective measures to monitor treatment response and disease progression. We introduce a novel method using wearable sensors to objectively assess motor function during daily activities in 19 patients with DMD, 9 with SMA, and 13 age-matched controls. Pediatric movement data is complex due to confounding factors such as limb length variations in growing children and variability in movement speed. Our approach uses Shape-based Principal Component Analysis to align movement trajectories and identify distinct kinematic patterns, including variations in motion speed and asymmetry. Both DMD and SMA cohorts have individuals with motor function on par with healthy controls. Notably, patients with SMA showed greater activation of the motion asymmetry pattern. We further combined projections on these principal components with partial least squares (PLS) to identify a covariation mode with a canonical correlation of r = 0.78 (95% CI: [0.34, 0.94]) with muscle fat infiltration, the Brooke score (a motor function score), and age-related degenerative changes, proposing a novel motor function index. This data-driven method can be deployed in home settings, enabling better longitudinal tracking of treatment efficacy for children with neuromuscular disorders.

2404.18779 2026-05-12 stat.ME math.ST stat.CO stat.TH

Semiparametric fiducial inference for Cox models

Yifan Cui, Jan Hannig, Paul Edlefsen

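AI summary This paper proposes a novel fiducial inference approach for semiparametric statistical models. In memory of the late Sir David Cox, it uses his Cox proportional hazards model as a running example to demonstrate the approach on survival data. The method performs particularly well in situations where the maximum likelihood estimator fails, offering a new tool for statistical inference in semiparametric models.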

Details
English abstract

R. A. Fisher introduced the fiducial distribution as a potential replacement for the Bayesian posterior distribution in the 1930s. During the past century, fiducial approaches have been explored in various parametric and nonparametric settings. However, to the best of our knowledge, no fiducial inference has been developed in the realm of semiparametric statistics. In this paper, we propose a novel fiducial approach for semiparametric models. In memory of Sir David Cox who passed away in 2022, we use the Cox proportional hazards model, which is the most popular model for the analysis of survival data, as a running example. Other models and extensions are also discussed. In our experiments, we find that our method performs particularly well in situations where the maximum likelihood estimator fails.

2403.04131 2026-05-12 stat.ME econ.EM

Extracting Mechanisms from Heterogeneous Effects: An Identification Strategy for Mediation Analysis

Jiawei Fu

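AI summary This paper proposes a new identification strategy for simultaneously identifying and estimating treatment and mediation effects, avoiding the reliance of conventional methods on multiple ignorability assumptions or sophisticated research designs. By combining explicit and implicit mediation analysis, the strategy exploits heterogeneous treatment effects and does not require addressing some unobserved confounders, improving the accuracy and precision of the estimates. Monte Carlo simulations and two empirical studies with distinct data structures illustrate its effectiveness and practicality.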

Details
English abstract

Understanding causal mechanisms is crucial for explaining and generalizing empirical phenomena. Causal mediation analysis offers statistical techniques to quantify the mediation effects. Although numerous methods have been developed for causal inference more broadly, the methodological toolkit for causal mediation analysis remains limited. Current methods often require multiple ignorability assumptions or sophisticated research designs. In this paper, we introduce an alternative identification strategy that enables the simultaneous identification and estimation of treatment and mediation effects. By combining explicit and implicit mediation analysis, this strategy leverages heterogeneous treatment effects and does not require addressing some unobserved confounders. Monte Carlo simulations demonstrate that the method is more accurate and precise across various scenarios. To illustrate the efficiency and efficacy of our method, we apply it to estimate the causal mediation effects in two studies with distinct data structures, focusing on common pool resource governance and voting information.

2401.02694 2026-05-12 stat.ME

Nonconvex High-Dimensional Time-Varying Coefficient Estimation for Noisy High-Frequency Observations with a Factor Structure

Minseok Shin, Donggyu Kim

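AI summary This paper studies the estimation of time-varying coefficients from noisy, high-dimensional, high-frequency observations with a factor structure. To handle noise and strong dependence, the authors first smooth the observed processes and apply principal component analysis to reduce the strong correlation among the high-dimensional covariates, then estimate local coefficients by nonconvex penalized regression, and obtain the final estimator through debiasing and thresholding. The proposed FATEN-LASSO estimator has good theoretical concentration properties and is suited to high-dimensional nonconvex optimization problems.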

Comments 104 pages, 8 figures

Details
English abstract

In this paper, we propose a novel high-dimensional time-varying coefficient estimator for noisy high-frequency observations with a factor structure. In high-frequency finance, we often observe that noises dominate the signal of underlying true processes and that covariates exhibit a factor structure due to their strong dependence. Thus, we cannot apply usual regression procedures to analyze high-frequency observations. To handle the noises, we first employ a smoothing method for the observed dependent and covariate processes. Then, to handle the strong dependence of the covariate processes, we apply Principal Component Analysis (PCA) and transform the highly correlated covariate structure into a weakly correlated structure. However, the variables from PCA still contain non-negligible noises. To manage these non-negligible noises and the high dimensionality, we propose a nonconvex penalized regression method for each local coefficient. This method produces consistent but biased local coefficient estimators. To estimate the integrated coefficients, we propose a debiasing scheme and obtain a debiased integrated coefficient estimator using debiased local coefficient estimators. Then, to further account for the sparsity structure of the coefficients, we apply a thresholding scheme to the debiased integrated coefficient estimator. We call this scheme the Factor Adjusted Thresholded dEbiased Nonconvex LASSO (FATEN-LASSO) estimator. Furthermore, this paper establishes the concentration properties of the FATEN-LASSO estimator and discusses a nonconvex optimization algorithm.

2305.00207 2026-05-12 stat.AP stat.ME

Mixed-Response State-Space Model for Analyzing Multi-Dimensional Digital Phenotypes

Tianchen Xu, Yuan Chen, Donglin Zeng, Yuanjia Wang

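AI summary Addressing the challenges of modeling multi-dimensional digital phenotype data, this study proposes a mixed-response state-space (MRSS) model for jointly analyzing multi-modal, multi-dimensional digital phenotypes from Parkinson's disease patients together with their measurement processes. The model uses a finite number of latent state time series to capture the dynamics of individual health status and personalized treatment effects, and can adjust for the bias induced by informative measurements. The study also presents computational methods for Gaussian and non-Gaussian phenotypes and validates the model in simulations and on real data.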

Comments 59 pages, 14 figures, 8 tables

Details
English abstract

Digital technologies (e.g., mobile phones) can be used to obtain objective, frequent, and real-world digital phenotypes from individuals. However, modeling these data poses substantial challenges since observational data are subject to confounding and various sources of variabilities. For example, signals on patients' underlying health status and treatment effects are mixed with variation due to the living environment and measurement noises. The digital phenotype data thus shows extensive variabilities between- and within-patient as well as across different health domains (e.g., motor, cognitive, and speaking). Motivated by a mobile health study of Parkinson's disease (PD), we develop a mixed-response state-space (MRSS) model to jointly capture multi-dimensional, multi-modal digital phenotypes and their measurement processes by a finite number of latent state time series. These latent states reflect the dynamic health status and personalized time-varying treatment effects and can be used to adjust for informative measurements. For computation, we use the Kalman filter for Gaussian phenotypes and importance sampling with Laplace approximation for non-Gaussian phenotypes. We conduct comprehensive simulation studies and demonstrate the advantage of MRSS in modeling a mobile health study that remotely collects real-time digital phenotypes from PD patients.

2208.00552 2026-05-12 econ.EM stat.ME

The Effect of Omitted Variables on the Sign of Regression Coefficients

Matthew A. Masten, Alexandre Poirier

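AI summary This paper studies how omitted variables affect the sign of regression coefficients, showing that in some cases it is easier for omitted variables to flip a coefficient's sign than to drive it to zero. Building on the robustness measure "Oster's delta", the authors propose a modified measure that more accurately reflects the impact of omitted variables on the estimates. The results are illustrated in four empirical applications and two meta-analyses, and a companion Stata module is provided.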

Comments Main paper 32 pages. Appendix 32 pages

Details
English abstract

We show that, depending on how the impact of omitted variables is measured, it can be substantially easier for omitted variables to flip coefficient signs than to drive them to zero. This behavior occurs with "Oster's delta" (Oster 2019), a widely reported robustness measure. Consequently, any time this measure is large -- suggesting that omitted variables may be unimportant -- a much smaller value reverses the sign of the parameter of interest. We propose a modified measure of robustness to address this concern. We illustrate our results in four empirical applications and two meta-analyses. We implement our methods in the companion Stata module regsensitivity.

2205.13469 2026-05-12 math.ST stat.ME stat.TH

Proximal Estimation and Inference

Alberto Quaini, Fabio Trojani

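AI summary This paper builds a unifying convex analysis framework for characterizing the statistical properties of a large class of penalized estimators under regular and irregular designs. The central idea is to interpret penalized estimators as proximal estimators, obtained by applying a proximal operator to an initial estimator, and to derive a closed-form expression for their asymptotic distribution, which depends only on the asymptotic distribution of the initial estimator, the limit penalty subgradient, and the inner product defining the proximal operator. The study also characterizes the Oracle property of proximal estimators and uses it to build new proximal estimators for linear regression that are root-n consistent and asymptotically normal, and that perform well in Monte Carlo experiments.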

Details
English abstract

We build a unifying convex analysis framework characterizing the statistical properties of a large class of penalized estimators, under both a regular and an irregular design. Our framework interprets penalized estimators as proximal estimators, defined by a proximal operator applied to a corresponding initial estimator. We characterize the asymptotic properties of proximal estimators, showing that their asymptotic distribution follows a closed-form formula depending only on (i) the asymptotic distribution of the initial estimator, (ii) the estimator's limit penalty subgradient and (iii) the inner product defining the associated proximal operator. In parallel, we characterize the Oracle features of proximal estimators from the properties of their penalty's subgradients. We exploit our approach to systematically cover linear regression settings with a regular or irregular design. For these settings, we build new $\sqrt{n}$-consistent, asymptotically normal Ridgeless-type proximal estimators, which feature the Oracle property and are shown to perform satisfactorily in practically relevant Monte Carlo settings.
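
To make the idea of a proximal estimator concrete, the following is a minimal sketch and not the authors' construction. It assumes the $\ell_1$ penalty and the Euclidean inner product, in which case applying the proximal operator to an initial estimate reduces to coordinate-wise soft-thresholding. The function names and the numbers in `beta_init` are illustrative assumptions.

```python
import numpy as np


def soft_threshold(beta_init, lam):
    """Proximal operator of lam * ||.||_1 under the Euclidean inner product."""
    beta_init = np.asarray(beta_init, dtype=float)
    return np.sign(beta_init) * np.maximum(np.abs(beta_init) - lam, 0.0)


def prox_objective(b, beta_init, lam):
    """Objective that the proximal operator minimizes over b."""
    b = np.asarray(b, dtype=float)
    return 0.5 * np.sum((b - beta_init) ** 2) + lam * np.sum(np.abs(b))


# Hypothetical initial estimate (for example, from least squares).
beta_init = np.array([2.0, -0.3, 0.05, -1.5])
beta_prox = soft_threshold(beta_init, lam=0.5)
print(beta_prox)
```

Small coordinates of the initial estimate are set exactly to zero and large ones are shrunk by `lam`, which is the mechanism behind sparsity in Lasso-type penalized estimators.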

2111.05243 2026-05-12 econ.EM stat.ME

Bounding Treatment Effects by Pooling Limited Information across Observations

Sokbae Lee, Martin Weidner

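AI summary This paper proposes a new method for bounding treatment effects under an unconfoundedness assumption, aimed at challenging settings such as conditioning variables with many distinct values or violations of the overlap condition. By pooling only limited information across observations, the method constructs bounds on treatment effects as sample averages, differing both from Manski bounds, which pool no information at all, and from standard inverse propensity score weighting. Monte Carlo experiments and empirical applications show that the bounds are robust and informative in practice.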

Details
English abstract

We provide novel bounds on average treatment effects (on the treated) that are valid under an unconfoundedness assumption. Our bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. Namely, the bounds are constructed as sample averages over functions of the observed outcomes such that the contribution of each outcome only depends on the treatment status of a limited number of observations. No information pooling across observations leads to so-called "Manski bounds", while unlimited information pooling leads to standard inverse propensity score weighting. We explore the intermediate range between these two extremes and provide corresponding inference methods. We show in Monte Carlo experiments and through two empirical applications that our bounds are indeed robust and informative in practice.
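
The two extremes named in the abstract can be illustrated with a small sketch. It is not the paper's intermediate estimator: it assumes a binary treatment, an outcome known to lie in `[y_lo, y_hi]`, no covariates, and (for the weighting estimator) a known propensity score. All function names are hypothetical.

```python
import numpy as np


def manski_bounds(y, d, y_lo, y_hi):
    """Worst-case bounds on the average treatment effect for a bounded outcome.

    Unobserved potential outcomes are replaced by y_lo or y_hi.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    p_treated = d.mean()
    treated_part = np.mean(y * d)        # estimates E[Y * D]
    control_part = np.mean(y * (1 - d))  # estimates E[Y * (1 - D)]
    lower = (treated_part + y_lo * (1 - p_treated)) - (control_part + y_hi * p_treated)
    upper = (treated_part + y_hi * (1 - p_treated)) - (control_part + y_lo * p_treated)
    return lower, upper


def ipw_ate(y, d, propensity):
    """Inverse propensity score weighting estimate of the average treatment effect."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    propensity = np.asarray(propensity, dtype=float)
    return float(np.mean(d * y / propensity - (1 - d) * y / (1 - propensity)))


# Toy data: two treated and two control units with a binary outcome.
y = [1, 0, 1, 1]
d = [1, 1, 0, 0]
lower, upper = manski_bounds(y, d, y_lo=0.0, y_hi=1.0)
point = ipw_ate(y, d, propensity=[0.5, 0.5, 0.5, 0.5])
print(lower, upper, point)
```

The worst-case interval always has width `y_hi - y_lo`, while the weighting estimator returns a single number that depends on overlap holding; the paper studies the range between these two behaviours.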

2110.12907 2026-05-12 stat.ML cs.LG math.PR math.ST stat.TH

Hamiltonian Monte Carlo with Asymmetrical Momentum Distributions

Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki

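AI summary This paper studies the convergence of the Hamiltonian Monte Carlo (HMC) algorithm when asymmetric momentum distributions are used. Existing guarantees rely on symmetric Gaussian momentum variables; using new dynamical and probabilistic arguments, the paper establishes convergence under weaker conditions and shows that plain HMC with asymmetric momentum breaks a self-adjointness requirement. The authors therefore propose a modified algorithm, AD-HMC, which converges geometrically in Wasserstein distance, and report numerical experiments suggesting improved performance over HMC with Gaussian auxiliaries.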

Details
English abstract

Existing rigorous convergence guarantees for the Hamiltonian Monte Carlo (HMC) algorithm use Gaussian auxiliary momentum variables, which are crucially symmetrically distributed. We present a novel convergence analysis for HMC utilizing new dynamical and probabilistic arguments. The convergence is rigorously established under significantly weaker conditions, which among others allow for general auxiliary distributions. In our framework, we show that plain HMC with asymmetrical momentum distributions breaks a key self-adjointness requirement. We propose a modified version of HMC, that we call the Alternating Direction HMC (AD-HMC), which overcomes this difficulty. Sufficient conditions are established under which AD-HMC exhibits geometric convergence in Wasserstein distance. The geometric convergence analysis is extended to when the Hamiltonian motion is approximated by the leapfrog symplectic integrator, where an additional Metropolis-Hastings rejection step is required. Numerical experiments suggest that AD-HMC can generalize a popular dynamic auxiliary scheme to show improved performance over HMC with Gaussian auxiliaries.

1902.00772 2026-05-12 stat.ME stat.ML

High-dimensional semi-supervised learning: in search for optimal inference of the mean

Yuqian Zhang, Jelena Bradic

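AI summary This paper studies optimal inference for the mean of an outcome with missing data in high-dimensional semi-supervised learning. The authors propose a new k-fold cross-fitted doubly robust estimator that achieves root-n inference for the outcome mean while requiring only consistent estimation of the outcome, possibly at a rate slower than root n. The method covers linear and nonlinear outcome models, is particularly suited to high-dimensional, nonparametric, or semiparametric models, and is also applied to estimating heterogeneous treatment effects.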

Details
Journal ref
Biometrika, 109(2), 387-403 (2022)
English abstract

A fundamental challenge in semi-supervised learning lies in the observed data's disproportional size when compared with the size of the data collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-n inference of the outcome's mean is possible while only requiring a consistent estimation of the outcome, possibly at a rate slower than root n. This is achieved by a novel k-fold, cross-fitted, double robust estimator. We discuss both linear and nonlinear outcomes. Such an estimator is particularly suited for models that naturally do not admit root-n consistency, such as high-dimensional, nonparametric or semiparametric models. We apply our methods to estimating heterogeneous treatment effects.
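
The cross-fitting idea can be sketched as follows. This is a simplified illustration rather than the paper's estimator: it assumes outcomes are missing completely at random (so no propensity weighting is needed) and uses ordinary least squares as a stand-in outcome model. The function names `design` and `cross_fitted_mean` are hypothetical.

```python
import numpy as np


def design(x):
    """Add an intercept column to a 2-D feature array."""
    return np.column_stack([np.ones(len(x)), x])


def cross_fitted_mean(x_lab, y_lab, x_unlab, k=5, seed=0):
    """Cross-fitted, bias-corrected estimate of the outcome mean.

    For each fold, the outcome model is fit on the other labeled folds,
    averaged over all covariates, and corrected by held-out residuals.
    """
    rng = np.random.default_rng(seed)
    n = len(y_lab)
    folds = np.array_split(rng.permutation(n), k)
    x_all = np.vstack([x_lab, x_unlab])
    estimates = []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        coef, *_ = np.linalg.lstsq(design(x_lab[train]), y_lab[train], rcond=None)
        plug_in = (design(x_all) @ coef).mean()
        correction = (y_lab[held_out] - design(x_lab[held_out]) @ coef).mean()
        estimates.append(plug_in + correction)
    return float(np.mean(estimates))


# Toy data: a small labeled sample and a larger unlabeled sample.
rng = np.random.default_rng(1)
x_lab = rng.standard_normal((40, 2))
x_unlab = rng.standard_normal((400, 2))
y_lab = 1.0 + 2.0 * x_lab[:, 0] - x_lab[:, 1] + 0.1 * rng.standard_normal(40)
print(cross_fitted_mean(x_lab, y_lab, x_unlab, k=4))
```

Fitting the model on one part of the labeled data and computing residuals on another keeps the correction term from being contaminated by overfitting, which is why slower-than-root-n outcome models can still be tolerated.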

1811.01198 2026-05-12 cs.LG math.OC stat.ML

Provable Exactness for Asymmetric Low-Rank SDP Learning

Enliang Hu

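AI summary This paper studies a unified regularized asymmetric low-rank factorization (aBMF) framework for semidefinite programs, aimed at structured optimization problems in machine learning. With a quadratic penalty term, the approach keeps the objective biconvex while ensuring that, for a sufficiently large penalty parameter, the asymmetric and symmetric formulations share the same critical points, which guarantees exactness. The study provides theoretical guarantees for the asymmetric relaxation and resolves the open question of whether an exact penalty exists.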

Details
English abstract

Low-rank factorization is a standard way to make structured optimization problems in machine learning more tractable by replacing matrix variables with compact factors. For positive semidefinite (PSD) variables, the symmetric Burer--Monteiro factorization (sBMF) writes $Z=XX^\top$ with a single low-rank factor $X$. A recent asymmetric alternative (aBMF) writes $Z=XY^\top$ and adds a quadratic penalty $(\gamma/2)\|X-Y\|_F^2$ to encourage symmetry. This split is attractive because it yields a biconvex objective with alternating convex subproblems, but its practical value depends strongly on how the penalty parameter $\gamma$ is chosen. We study a unified regularized aBMF framework and derive an explicit lower bound on $\gamma$ that guarantees exactness: under mild assumptions, any $\gamma$ above this threshold makes aBMF and sBMF share the same critical points. This gives a principled way to use the asymmetric formulation without altering the critical-point structure of the symmetric problem. In particular, it answers the open question of whether an exact penalty exists for asymmetric relaxation.
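
The alternating convex subproblems mentioned in the abstract can be sketched for one concrete loss. The quadratic loss $\tfrac12\|XY^\top - M\|_F^2$ is an assumption made for illustration (the paper treats a general framework), and the function names are hypothetical. For this loss each subproblem is a ridge-type least-squares problem with a closed-form solution.

```python
import numpy as np


def abmf_objective(x, y, m, gamma):
    """Asymmetric objective: loss on X Y^T plus the symmetry penalty."""
    loss = 0.5 * np.linalg.norm(x @ y.T - m) ** 2
    return loss + 0.5 * gamma * np.linalg.norm(x - y) ** 2


def sbmf_objective(x, m):
    """Symmetric objective with Z = X X^T."""
    return 0.5 * np.linalg.norm(x @ x.T - m) ** 2


def update_factor(other, m, gamma):
    """Exact minimizer over one factor with the other held fixed.

    Solves F (other^T other + gamma I) = m other + gamma other for F.
    """
    rank = other.shape[1]
    a = other.T @ other + gamma * np.eye(rank)
    b = m @ other + gamma * other
    return np.linalg.solve(a, b.T).T


def alternating_minimization(m, rank, gamma, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((m.shape[0], rank))
    y = rng.standard_normal((m.shape[0], rank))
    history = [abmf_objective(x, y, m, gamma)]
    for _ in range(iters):
        x = update_factor(y, m, gamma)    # convex subproblem in X
        y = update_factor(x, m.T, gamma)  # convex subproblem in Y
        history.append(abmf_objective(x, y, m, gamma))
    return x, y, history


# Toy PSD target of rank 2.
a = np.random.default_rng(0).standard_normal((6, 2))
m = a @ a.T
x, y, history = alternating_minimization(m, rank=2, gamma=1.0, iters=30)
print(history[0], history[-1], np.linalg.norm(x - y))
```

When `x` equals `y` the penalty vanishes and the two objectives coincide; the paper's question is how large `gamma` must be for the critical points of the two formulations to coincide as well, which this sketch does not attempt to verify.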

1808.09448 2026-05-12 stat.ME stat.AP

Estimating the distribution of marks of a homogeneous marked Poisson process

Dragi Anevski, Vladimir Pastukhov

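AI summary This paper studies how to estimate the distribution of events of different kinds in a homogeneous marked Poisson process. The authors give the maximum likelihood estimator of this distribution and establish its strong consistency and asymptotic normality. They also propose an order-restricted estimator and analyze its consistency and asymptotic distribution. The method is applied to neutron detection in a novel detector at the European Spallation Source in Lund, Sweden.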

Details
English abstract

In this paper we propose an estimator of the distribution of events of different kinds in a homogeneous Poisson process. We give an explicit solution for the maximum likelihood estimator of the distribution and derive its strong consistency and asymptotic normality. We also provide an order restricted estimator of the distribution and derive its consistency and asymptotic distribution. The inference problem gives rise to a Sylvester-Ramanujan system of equations. We discuss application of the estimator to the detection of neutrons in a novel detector developed at the European Spallation Source in Lund, Sweden.

2605.08432 2026-05-12 cs.CL cs.AI stat.ML

A Semantic-Sampling Framework for Evaluating Calibration in Open-Ended Question Answering

Zhanliang Wang, Jiancong Xiao, Ruochen Jin, Shu Yang, Bojian Hou, Li Shen

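AI summary This paper proposes Sem-ECE, a semantic-sampling framework for evaluating the calibration of large language models in open-ended question answering. The method samples answers from the model, groups them into semantic classes, and uses the class frequencies as confidence, addressing the shortcomings of existing calibration evaluation methods in open-ended settings. The study introduces two estimators, Sem₁-ECE and Sem₂-ECE, proves that both are asymptotically unbiased, and shows that Sem₂-ECE attains smaller calibration error on hard questions, so that the gap between the two serves as a diagnostic for question difficulty. Experiments show that Sem-ECE outperforms existing methods on several benchmarks.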

Comments Preprint

Details
English abstract

Calibration measures whether a model's predicted confidence aligns with its empirical accuracy, and is central to the reliable deployment of large language models (LLMs) in high-stakes domains such as medicine and law. While much recent work focuses on improving LLM calibration, the equally important question of how to evaluate it in realistic settings remains underdeveloped. Open-ended question answering (QA), the most common deployment setting for modern LLMs, is where existing evaluation methods fall short: logit-based metrics need restricted output formats and internal probabilities; verbalized confidence is self-reported and often overconfident; and sampling-based methods rely on task-specific extraction rules without a clear finite-sample target. We introduce Sem-ECE (Semantic-Sampling Expected Calibration Error), a calibration evaluation framework for open-ended QA that samples answers from the model, groups them into semantic classes, and uses the resulting frequencies as confidence. We study two estimators within this framework: Sem$_1$-ECE, the same-sample self-consistency score, and Sem$_2$-ECE, a held-out variant that separates answer selection from confidence evaluation. We prove both are asymptotically unbiased, and further show that they agree on easy questions but diverge on hard ones with Sem$_2$ achieving strictly smaller calibration error, so their gap also serves as a diagnostic for question difficulty. Experiments on three open-ended QA benchmarks across five leading commercial LLMs match our theoretical predictions and show that Sem-ECE outperforms verbalized confidence and existing sampling-based methods, while complementing logit-based evaluation when internal probabilities are unavailable.
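
The sampling-and-grouping step can be sketched as below. This is one reading of the abstract, not the authors' implementation: semantic grouping is replaced by a simple text normalizer (a real system would need a semantic-equivalence check), the held-out variant is interpreted as selecting the answer on one batch of samples and measuring its frequency on another, and the calibration error uses standard equal-width binning. All names are hypothetical.

```python
from collections import Counter


def normalize(answer):
    """Stand-in for semantic grouping: answers with equal keys share a class."""
    return answer.strip().lower()


def same_sample_confidence(samples):
    """Pick the most frequent class and use its frequency in the same samples."""
    counts = Counter(normalize(a) for a in samples)
    answer, count = counts.most_common(1)[0]
    return answer, count / len(samples)


def held_out_confidence(selection_samples, evaluation_samples):
    """Pick the answer on one batch, measure its frequency on a separate batch."""
    answer, _ = same_sample_confidence(selection_samples)
    hits = sum(normalize(a) == answer for a in evaluation_samples)
    return answer, hits / len(evaluation_samples)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE: weighted mean of |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, float(hit)))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(h for _, h in members) / len(members)
            ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


print(same_sample_confidence(["Paris", "paris ", "Lyon", "Paris"]))
print(held_out_confidence(["Paris", "Paris", "Lyon"], ["Lyon", "paris", "Lyon", "Lyon"]))
print(expected_calibration_error([0.75, 0.25], [1, 0]))
```

Separating selection from evaluation avoids scoring an answer on the same samples that made it the winner, which is the intuition behind the two estimators diverging on hard questions.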

2605.08429 2026-05-12 stat.ML cs.LG stat.ME

Active Multiple-Prediction-Powered Inference

Nicholas Brawand, Nima Leclerc, Anhthy Ngo, Matthew Peterson, Sriram Vishwanath, Laith Alhussein, Ben Wellner

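AI summary In post-deployment monitoring of healthcare AI, obtaining statistically valid inference from few labeled samples is an important problem. This paper proposes Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate subset of predictors and samples gold-standard labels in proportion to the chosen subset's residual uncertainty, reducing estimator variance under a limited budget. The method extends single-predictor prediction-powered inference and active statistical inference, moving from global per-predictor allocation to per-instance adaptive routing, and is supported by both theory and experiments.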

Details
English abstract

Post-deployment monitoring of healthcare AI requires statistically valid, label-efficient methods, but gold-standard labels from clinician chart review are expensive. Prediction-powered inference (PPI) and active statistical inference (ASI) reduce label cost by combining a small labeled sample with abundant model predictions, but both are restricted to a single predictor, a poor fit for modern clinical pipelines that have multiple predictors of differing cost and accuracy available at inference time. We propose Active Multiple-Prediction-Powered Inference (AM-PPI), which routes each instance to a cost-appropriate predictor subset, samples gold-standard labels in proportion to the chosen subset's residual uncertainty, and reweights predictions to minimize estimator variance, all under a single deployment-time budget. AM-PPI generalizes ASI to leverage multiple predictors and extends Multiple-PPI from global per-predictor allocation to per-instance adaptive routing. We derive closed-form Karush-Kuhn-Tucker (KKT) conditions for all three decisions and prove, via biconvexity and strong duality, that the resulting fixed point is a global optimum despite the joint problem being non-jointly-convex. We establish asymptotic normality with valid coverage, minimum-variance unbiasedness within the linear-prediction augmented inverse propensity weighted (AIPW) class, and a closed-form criterion identifying when multiple predictors help. On synthetic data and three healthcare monitoring tasks, AM-PPI produces 10 to 40 percent narrower confidence intervals (CIs) than single-predictor ASI in the budget regime where routing matters, and matches the better baseline elsewhere.
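
For orientation, the single-predictor baseline that AM-PPI builds on can be sketched as follows. This is the basic prediction-powered estimate of a mean with a normal-approximation interval, not the routing, sampling, or reweighting scheme proposed in the paper. The function name and toy numbers are assumptions.

```python
from math import sqrt
from statistics import NormalDist

import numpy as np


def ppi_mean(preds_unlabeled, preds_labeled, y_labeled, alpha=0.1):
    """Prediction-powered estimate of E[Y] with a (1 - alpha) confidence interval.

    Combines the average prediction on unlabeled data with a bias correction
    (the "rectifier") estimated on the small labeled sample.
    """
    preds_unlabeled = np.asarray(preds_unlabeled, dtype=float)
    rectifier = np.asarray(y_labeled, dtype=float) - np.asarray(preds_labeled, dtype=float)
    estimate = preds_unlabeled.mean() + rectifier.mean()
    variance = (
        preds_unlabeled.var(ddof=1) / len(preds_unlabeled)
        + rectifier.var(ddof=1) / len(rectifier)
    )
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(variance)
    return float(estimate), (float(estimate - half_width), float(estimate + half_width))


# Toy numbers: the predictor is biased downward by 0.5 on the labeled sample.
estimate, interval = ppi_mean(
    preds_unlabeled=[1.0, 2.0, 3.0, 4.0],
    preds_labeled=[1.5, 3.5],
    y_labeled=[2.0, 4.0],
)
print(estimate, interval)
```

The interval narrows when the predictor's residuals have low variance, which is why choosing among predictors of differing accuracy and cost, as the paper does, can change the achievable width for a given budget.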

2605.08423 2026-05-12 cs.LG cs.CL stat.ML

Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms

Omatharv Bharat Vaidya, Connor T. Jerzak, Nhat Ho, Chandrajit Bajaj

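AI summary This paper proposes a data-adaptive parameter-efficient fine-tuning method for large neural networks. It replaces conventional layer-local adapters with a shared, queryable memory of low-rank update atoms, so that the model can dynamically select update components according to the input and the network's depth-wise computation, giving more flexible updates while retaining the efficiency of low-rank adaptation. In addition, an instruction-regularization mechanism biases updates toward semantically relevant directions, improving training stability and final performance.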

Details
English abstract

We present a data-adaptive method for parameter-efficient fine-tuning of large neural networks. Standard low-rank adaptation methods improve efficiency by restricting each layer update to a fixed low-rank form, but this static parameterization can be too rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Our approach replaces a purely layer-local adapter with a shared queryable memory of low-rank update atoms. For each block of layers, the model forms a query from the current low-rank state and a running summary of previous blocks, uses this query to retrieve a content-dependent combination of shared update components via attention, and applies the resulting routed operator within the low-rank bottleneck. In this way, the method retains the efficiency and scalability of low-rank adaptation while allowing the effective update to vary across inputs and to share reusable structure across layers. The resulting architecture provides a principled middle ground between static LoRA-style updates and fully generated parameter updates: it remains compact and parameter-efficient while supporting dynamic, context-sensitive adaptation. Further, we incorporate instruction-regularization by augmenting routing logits with a language-induced prior over update atoms, thereby biasing the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. Experiments on noisy non-linear regression tasks and LLM fine-tuning suggest that this queryable update-memory formulation can improve final test performance and training stability compared to standard low-rank adaptation, while using a comparable number of trainable parameters.

2605.08422 2026-05-12 stat.ME econ.EM stat.CO

Rolling-Origin Conformal Prediction under Local Stationarity and Weak Dependence

Stanisław M. S. Halkiewicz

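AI summary This paper proposes and analyzes rolling-origin conformal prediction for time-series forecasting, designed to handle serial dependence, volatility clustering, and distributional drift. Under Hölder-β local stationarity and α-mixing, the author establishes a four-term decomposition of the coverage error and derives the optimal calibration window size with the corresponding error rate, proving that the rate is minimax-optimal over this model class. Empirical results on real data sets show that the method outperforms full-history calibration, in line with the theoretical analysis.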

Details
English abstract

We propose and analyse rolling-origin conformal prediction for time-series forecasting. The method calibrates the conformal quantile against the $m$ most recent pseudo-out-of-sample forecast errors, adapting to serial dependence, volatility clustering, and distributional drift that invalidate classical conformal guarantees. Under Hölder-$\beta$ local stationarity and $\alpha$-mixing, we establish a four-term coverage-error decomposition and derive the optimal calibration window $m^{\star} \asymp T^{2\beta/(2\beta+1)}$ with coverage-error rate $O(T^{-\beta/(2\beta+1)})$. A Le Cam two-point construction shows this rate is minimax-optimal over the Hölder-$\beta$ model class. The Bahadur representation is proved under both $\alpha$-mixing and the physical-dependence framework of Wu (2005). An oracle inequality formalises Winkler cross-validation as an adaptive window selector; the required uniform concentration condition is established in an appendix. Validation on six real series and 93 M4 competition series confirms the theory: rolling-origin calibration outperforms full-history calibration in 86\% of comparisons (median Winkler improvement 12.3\%), maintains coverage within $\pm2\%$ of the 90\% target at short and medium horizons, and the cross-frequency log-log regression slope $0.614$ ($95\%$ CI $[0.424, 0.805]$) is consistent with the theoretical $2/3$ after controlling for frequency fixed effects.
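
The calibration step described above can be sketched as follows. The use of absolute forecast errors as the score, the `ceil((m + 1)(1 - alpha))` order-statistic convention, and the constant `c` in the window rule are assumptions borrowed from standard split conformal practice, not details confirmed by the abstract. Function names are hypothetical.

```python
import math

import numpy as np


def rolling_conformal_radius(errors, m, alpha=0.1):
    """Interval half-width calibrated on the m most recent forecast errors."""
    recent = np.abs(np.asarray(errors, dtype=float)[-m:])
    k = math.ceil((len(recent) + 1) * (1 - alpha))
    if k > len(recent):
        return float("inf")  # too few calibration errors for this level
    return float(np.sort(recent)[k - 1])


def calibration_window(t, beta, c=1.0):
    """Window size proportional to T^(2*beta / (2*beta + 1))."""
    return max(1, int(round(c * t ** (2 * beta / (2 * beta + 1)))))


# Pseudo-out-of-sample forecast errors, oldest first.
errors = [10, -10, 1, -2, 3, -4, 5, -6, 7, -8, 9]
radius = rolling_conformal_radius(errors, m=9, alpha=0.5)
forecast = 100.0
print((forecast - radius, forecast + radius), calibration_window(1000, beta=1.0))
```

Restricting calibration to a recent window discards the two large early errors in this toy series, which is the adaptation to drift that full-history calibration lacks. With `beta = 1` the window exponent is 2/3, the value the empirical slope in the abstract is compared against.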

2605.08400 2026-05-12 math.ST cs.IT cs.LG math.IT stat.ML stat.TH

On Observation Time for Recovering Latent Hawkes Networks

Jonas Linkerhägner, Michele Bortolasi, Lorenzo Baldassari, Maarten V. de Hoop, Ivan Dokmanić

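AI summary This paper studies the minimum observation time needed to recover a latent interaction network from event-based observations, a problem relevant to finance, seismology, and neuroscience. For a class of stationary Hawkes processes with sparse, weak interactions, the authors prove that an observation time of order $\log d$, where $d$ is the number of interacting entities, is both sufficient and necessary. The upper bound uses a two-stage estimator together with concentration bounds from the Poisson cluster representation, and the lower bound combines Fano's inequality with Jacod's Girsanov formula.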

Details
English abstract

Dynamics of interacting systems in engineering, society, and nature often evolve over latent networks that govern which entities can interact. We study the problem of inferring these networks from event-based observations, which arise naturally in finance, seismology, and neuroscience. While there is substantial algorithmic work addressing this important problem, theoretical results are scarce. In this paper we ask the following fundamental question: what is the minimum time that one must observe the dynamics in order to exactly recover the underlying network, as a function of the number $d$ of interacting entities? For a class of stationary Hawkes processes with sparse, weak interactions, we prove that an observation time of order $\log d$ is sufficient and necessary. For the upper bound we construct a two-stage estimator that uses clipped and binned event data for screening, followed by a least-squares refinement, and apply concentration bounds derived from the Poisson cluster representation. For the lower bound we combine Fano's inequality with Jacod's Girsanov formula for point processes on a suitable subclass of networks.

2605.08395 2026-05-12 stat.ME stat.AP

Statistical Design of Pragmatic Trials Using Electronic Health Record Data when Outcome Assessments are Uncontrolled and Irregular

Jennifer F. Bobb, Sungtaek Son, Melissa L. Anderson, Noorie Hyun, Lynn L. DeBar, Katharine A. Bradley

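AI summary This paper studies how to design the statistical analysis of pragmatic trials that use electronic health record data when outcome assessments are irregular and uncontrolled. It presents a simulation-based approach for evaluating how different statistical models perform under sparse and intervention-dependent assessments, and applies it to the MI-CARE pragmatic trial. The results show that models adjusting flexibly for follow-up timing estimate treatment effects without bias, and that a linear mixed model with suitable adjustments has the greatest power among the unbiased methods, informing the choice of analytic approach for such trials.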

Comments 24 pages, 2 figures; includes supplementary material

Details
English abstract

Pragmatic trials increasingly define outcomes using real-world data such as electronic health records, where assessments are collected during routine care rather than at fixed timepoints. Consequently, these uncontrolled assessments may be irregular, sparse, and affected by the intervention (intervention-dependent assessments), which can lead to biased treatment effect estimates. We developed a simulation study to inform the statistical approach for trials with uncontrolled assessments, which we applied to the MI-CARE pragmatic trial. Using a pre-trial cohort mimicking eligibility and outcome measurement, we estimated assessment frequency and timing and combined these estimates with assumptions about how the intervention effects might impact assessment. We simulated sparse and intervention-dependent assessments and compared single-measure approaches with longitudinal models using all scores. Under intervention-dependent assessments, we found that naive methods such as using the best score or using a randomly selected score without adjusting for measurement timing produced substantial bias. Models that adjusted flexibly for the follow-up timing estimated time-point specific or time-averaged treatment effects without bias. Simulation results informed the selection of the statistical approach for the MI-CARE trial. Among unbiased methods, the most powerful was a linear mixed model with exponential correlation structure, adjustment for time since baseline, and a time-varying intervention effect to estimate the intervention effect at the end of the intervention window. Future studies can use pre-trial data to conduct a simulation study tailored to the trial's data features to inform the analytic approach. Trials with uncontrolled assessments should consider the potential for intervention-dependent assessments and select an appropriate method to avoid bias.

2605.08379 2026-05-12 stat.AP cs.LG

Transfer Learning for Dead Fuel Moisture Prediction Using Time-Warping Recurrent Neural Networks

Jonathon Hirschi, Jan Mandel, Adam Kochanski

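AI summary This paper proposes a time-warping transfer learning method that rescales the time scale of a Long Short-Term Memory (LSTM) network to transfer between fuel moisture classes. For fuel moisture prediction, a model is trained on the abundant 10h fuel data from weather-station sensors and then transferred to predict the moisture content of 1h, 100h, and 1000h fuels. The method is validated on data from a landmark field study in Oklahoma, targeting prediction when observations are sparse.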

Comments Preprint. Related to PhD thesis work that is also available for preprint at https://doi.org/10.48550/arXiv.2604.02474

Details
English abstract

This paper proposes a time-warping transfer learning method, a technique for temporally rescaling the learned dynamics of a recurrent neural network (RNN) with a Long Short-Term Memory (LSTM) layer to enable task transfer across fuel moisture classes. Fuel moisture content (FMC) is divided into idealized classes based on characteristic lag time. Large quantities of real-time data are available for 10h fuels from sensors on weather stations, but observations of other fuel classes are sparse in space and time. We use transfer learning to adapt an RNN pretrained on 10h FMC to predict FMC for 1h, 100h, and 1000h fuels. We validate this method using data from a landmark field study conducted in Oklahoma that was used to calibrate the state-of-the-art Nelson fuel moisture model.

2605.08377 2026-05-12 cs.LG stat.ML

Embedding Dimension Lower Bounds for Universality of Deep Sets and Janossy Pooling

Ali Syed, Aditya Nambiar, Jonathan W. Siegel

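AI summary This paper studies the universality of permutation-invariant deep network architectures on point clouds, focusing on lower bounds for the embedding dimension required by Deep Sets and Janossy pooling. Using a novel technique, the authors prove new lower bounds on the embedding dimension needed to guarantee universality: for Deep Sets, the result gives the correct minimal embedding dimension up to a constant factor for all $d > 1$, and for $k$-ary Janossy pooling, it is the first non-trivial lower bound when $k > 1$.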

Details
English abstract

In many practical applications it is important to build symmetries into neural network architectures. Consider the important case of permutation symmetry on point clouds consisting of $n$ points in $d$ dimensions. In this case the network learns a function on a set of $n$ points in $\mathbb{R}^d$, and a natural paradigm for constructing invariant networks is Janossy pooling, which generalizes the popular Deep Sets architecture. We study the universality of this approach, in particular the important question of how large the embedding dimension must be to guarantee universality of this architecture. Specifically, using a novel technique, we prove new lower bounds on the required size of this embedding dimension. For Deep Sets, this gives the correct minimal dimension up to a constant factor for all $d > 1$. For $k$-ary Janossy pooling, we prove the first non-trivial lower bound on the required embedding dimension when $k > 1$.

2605.08272 2026-05-12 stat.AP

Quantifying Exposure Information Uncertainty in Regional Risk Assessment

Chenhao Wu, Henry Burton

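AI summary This study aims to quantify the bias and uncertainty in regional risk assessment caused by incomplete exposure information. Combining analytical and simulation-based approaches, it proposes a methodology that decomposes total uncertainty into contributions from incomplete exposure information and from other sources, such as hazard and damage characterization, clarifying how missing information propagates through the risk assessment. The methodology is applied to bridge-specific and regional risk assessments, and a high-resolution bridge exposure inventory is built using a data augmentation framework.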

Details
English abstract

Exposure characterization in regional risk assessment aims to assign physical properties to the assets of interest so they can be associated with damage and loss functions. While this process has benefited from the growing availability of public infrastructure inventories, these datasets often lack the detailed attributes required for high-resolution risk assessment. Missing attributes are commonly inferred using predictive models or engineering-based rulesets. However, these imputations are inherently imperfect and can introduce bias and additional uncertainty in regional risk estimates. This study proposes a methodology to quantify the bias and uncertainty in regional risk assessment that arises from probabilistic exposure characterization. By integrating analytical and simulation-based approaches, the methodology decomposes the total uncertainty into contributions from incomplete exposure information as well as other sources, including hazard and damage characterization. This decomposition clarifies how bias and uncertainty associated with missing exposure information are generated and propagated through the risk assessment pipeline. The methodology is applied to both bridge-specific and regional risk assessments. A high-resolution bridge exposure inventory is developed using a data augmentation framework that combines publicly available information with machine learning and engineering-based imputation methods.

2605.08263 2026-05-12 stat.ML cs.IT cs.LG eess.SP math.IT stat.ME

Decentralized Conformal Novelty Detection via Quantized Model Exchange

Kyle Loh, Yu Xiang

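AI summary This paper studies decentralized novelty detection under heterogeneous composite null distributions with control of the global false discovery rate (FDR), subject to privacy and bandwidth constraints. It proposes a framework based on quantized model exchange, in which independent agents share low-precision representations of locally learned non-conformity score functions. The method preserves conditional exchangeability and provides rigorous finite-sample guarantees for FDR control; experiments confirm that it maintains statistical power while substantially reducing communication cost.

Details
English abstract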

This work studies decentralized novelty detection with global false discovery rate (FDR) control across heterogeneous composite null distributions, without sharing the raw data due to privacy and bandwidth considerations. We propose a framework based on the exchange of quantized surrogate models, allowing independent agents to share low-precision representations of locally learned non-conformity score functions. We prove that evaluating data against these quantized composite scores preserves conditional exchangeability, providing rigorous finite-sample guarantees for global FDR control. Empirical studies on synthetic datasets confirm our theoretical results, demonstrating that the proposed approach maintains competitive statistical power while drastically reducing the communication cost.
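
As background for the FDR-control claim, here is a minimal sketch of the standard centralized building blocks: conformal p-values computed against null calibration scores, followed by the Benjamini-Hochberg step-up rule. It is not the paper's decentralized, quantized procedure; the function names and the toy scores are assumptions for illustration, and larger scores are assumed to mean "more novel".

```python
def conformal_p_values(calibration_scores, test_scores):
    """Conformal p-value of each test score relative to null calibration scores."""
    n = len(calibration_scores)
    return [
        (1 + sum(c >= t for c in calibration_scores)) / (n + 1)
        for t in test_scores
    ]

def benjamini_hochberg(p_values, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            largest_passing_rank = rank      # keep the largest rank that passes
    return sorted(order[:largest_passing_rank])

# Toy data (hypothetical): 99 null calibration scores and 4 test scores.
calibration = [i / 100 for i in range(1, 100)]
test_points = [0.505, 5.0, 0.205, 7.0]
p_values = conformal_p_values(calibration, test_points)
novelties = benjamini_hochberg(p_values, alpha=0.1)
```

In this toy setup the two test scores far above every calibration score receive the smallest possible p-value, $1/(n+1)$, and are the only ones flagged.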

2605.08237 2026-05-12 cs.LG stat.ML

Distributional Spectral Diagnostics for Localizing Grokking Transitions

Ziyue Wang, Yufeng Ying, Takafumi Kanamori

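AI summary This study examines the transition from memorizing the training data to generalizing in the "grokking" phenomenon, and proposes a method based on distributional spectral analysis for localizing this transition. Task-dependent observables are mapped to Wasserstein/quantile coordinates and combined with Hankel dynamic mode decomposition to build diagnostic indicators such as the residual, spectral features, and effective rank. Experiments on modular-addition Transformer models show that the method distinguishes grokking from non-grokking runs and can raise early alarms under a fixed threshold, with strong detection performance.

Details
English abstract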

In grokking, a model first fits the training data while test accuracy remains low, and only later begins to generalize. We ask whether this transition can be localized from observed training trajectories before the test accuracy rises, and formulate grokking transition localization as a diagnostic problem with an explicit threshold/FPR/lead-time trade-off. Task-dependent observables are summarized as empirical distributions, mapped to Wasserstein/quantile coordinates, and analyzed by Hankel dynamic mode decomposition (DMD); the resulting reconstruction residual, together with spectrum and effective rank, forms the diagnostic output. On held-out modular-addition Transformer runs, the residual achieves AUROC \(\approx \) 0.93 for grokking-vs-non-grokking discrimination at the run level; under a fixed sustained-threshold operating rule, true-positive alarms can precede onset, with lead time reported jointly with false-alarm rate and uncertainty intervals. Perturbation experiments show that, in the tested \(wd=1\) pool, high-residual windows exhibit about \(3\times\) larger short-horizon perturbation deviation than low-residual windows. In a same-data norm-window control, perturbation sensitivity aligns with the residual ordering rather than total-parameter-norm ordering, suggesting that the residual is not merely a total-norm proxy at the window level in the studied \(wd=1\) dynamics. Norm signals remain strong run-level regime indicators, and log-probability performs best among the observables tested under the current protocol. We position the residual as a window-level monitoring and localization signal in the studied modular-arithmetic Transformer settings, not a universal early-warning predictor or an intervention rule.
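
The Hankel DMD reconstruction residual can be sketched as follows. This is a generic rank-truncated DMD fit on a delay embedding of a scalar series, not the authors' pipeline (which operates on Wasserstein/quantile coordinates of empirical distributions); the function name, the `delays` and `rank` parameters, and the toy sinusoid are assumptions for illustration.

```python
import numpy as np

def hankel_dmd_residual(series, delays, rank):
    """Relative one-step residual of a rank-truncated DMD fit on a Hankel embedding."""
    series = np.asarray(series, dtype=float)
    cols = len(series) - delays + 1
    H = np.stack([series[i:i + cols] for i in range(delays)])  # (delays, cols) Hankel matrix
    X, Y = H[:, :-1], H[:, 1:]                                  # consecutive snapshot pairs
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]                 # truncate to the leading modes
    A_reduced = U.T @ Y @ Vt.T @ np.diag(1.0 / s)               # reduced linear operator
    Y_hat = U @ A_reduced @ (U.T @ X)                           # one-step linear prediction
    return np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)

t = np.arange(200)
clean = np.sin(0.2 * t)                                         # exactly linear, rank-2 dynamics
noisy = clean + 0.5 * np.random.default_rng(0).normal(size=t.size)
clean_residual = hankel_dmd_residual(clean, delays=10, rank=2)
noisy_residual = hankel_dmd_residual(noisy, delays=10, rank=2)
```

A pure sinusoid obeys a rank-2 linear recurrence, so its residual is at the level of rounding error, while the noisy series leaves a clearly larger residual. A monitoring rule of the kind described above would threshold this quantity over sliding windows.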

2605.08230 2026-05-12 cs.LG stat.AP

Social Determinants of Health and Fentanyl Overdose Mortality Across US Counties: An XGBoost and SHAP Analysis Identifying Silent Risk Counties and Treatment Deserts

Kabi Raj Tiruwa, Abhisan Ghimire, Anuj Kumar Shah

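AI summary This study uses XGBoost and SHAP to analyze the relationship between social determinants of health and fentanyl overdose mortality across US counties, aiming to identify high-risk but overlooked "silent risk counties" and "treatment desert counties". It integrates several public health data sources and finds that disability rate, hypertension, smoking, and lack of vehicle access are key predictors of overdose mortality, and that mortality risk is significantly elevated in treatment desert counties. The results inform targeted interventions, emphasizing that expansion of treatment resources for substance use disorders should be prioritized and that high-risk areas need early intervention.

Comments 21 pages, 7 figures, 4 tables

Details
English abstract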

Background: Fentanyl overdose deaths are still increasing across the U.S. We do not fully understand which county-level social and structural conditions lead to higher overdose death rates. Social determinants of health, including disability, treatment access, and behavioral health issues, may help identify vulnerable counties before deaths become severe. No earlier study has used explainable machine learning with SHAP attribution on 2022 CDC WONDER data to study treatment access gaps and silent risk counties. Methods: We combined data from four government sources for 975 U.S. counties, including CDC WONDER (2022) overdose mortality data, CDC Social Vulnerability Index (SVI), CDC PLACES health behavior data, and Area Health Resources Files. An XGBoost model was used to predict overdose mortality risk using Standardized Mortality Ratio (SMR). Five-fold cross-validation was used to test model accuracy, and SHAP values were used to show which factors increase or decrease risk. Results: XGBoost outperformed all tested models (Spearman rho=0.67, R2=0.457, MAE=0.409, high-risk recall=71.1%). Top predictors were disability rate, hypertension, smoking, and lack of vehicle access. Treatment desert counties had 52.6% higher overdose mortality (SMR 1.786 vs 1.170; p<0.0001). K-means identified 143 silent risk counties. Overdose deaths were spatially clustered (Moran's I=0.505, p=0.001) with 75 hotspots and 136 coldspots. Suppressed counties were 58.2% of WONDER counties, mostly rural (72%) and treatment deserts (65%). Conclusions: County-level SDOH factors predict overdose deaths, especially disability, treatment access, and behavioral health burden. MOUD expansion should prioritize treatment desert counties, and silent risk counties need early intervention before mortality worsens.
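
The reported 52.6% figure can be checked from the two SMR values quoted in the abstract. The sketch below uses the usual definition of a standardized mortality ratio (observed deaths over expected deaths); how the paper computes expected deaths is not stated here, so the helper function is an assumption for illustration.

```python
def standardized_mortality_ratio(observed_deaths, expected_deaths):
    """SMR = observed deaths divided by the deaths expected under reference rates."""
    return observed_deaths / expected_deaths

smr_treatment_desert = 1.786   # value reported in the abstract
smr_other_counties = 1.170     # value reported in the abstract

# Relative excess of treatment desert counties over the comparison counties, in percent.
relative_excess_percent = (smr_treatment_desert / smr_other_counties - 1) * 100
```

The ratio $1.786 / 1.170 \approx 1.526$ is consistent with the stated 52.6% higher overdose mortality.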

2605.08155 2026-05-12 stat.AP math.PR physics.data-an physics.flu-dyn

Structural and Lagrangian properties of analogue ensembles to characterize multifractality of stochastic processes

Carlos Granero-Belinchon

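AI summary This paper proposes a framework for characterizing the scale invariance of stochastic processes in a reconstructed finite-dimensional phase space. The method is based on a Takens embedding reconstruction: it defines ensembles of analogue states, taking the nearest neighbors of a target state as its analogues, and analyzes the structural and dynamical properties of the phase space. The study finds that the distribution of analogue volumes and their dispersion over time reflect the scale invariance of the process, for a class of stationary, dissipative scale-invariant processes such as regularized fractional Brownian motion and regularized multifractal random walk.

Details
English abstract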

We present a framework for the scale-invariance characterization of stochastic processes in reconstructed finite-dimensional phase spaces. This framework analyses the structural and dynamical properties of the phase space and is based on a Takens embedding reconstruction followed by the definition of ensembles of analogue states. We define the analogues of a target state as its nearest neighbors. Then, we specify a collection of target states densely sampling the full phase space. For each target state, we search for the ensemble of its k-best analogues and we analyze its volume and dynamics. First, we study the probability distribution of the volumes and relate its mean and variance to the scale-invariance properties of the stochastic process. Second, we study the Lagrangian properties of the analogues by characterizing how they disperse in time. More particularly, we study the volume occupied by the analogues' successors as a function of time and of their initial volume. We link these dynamical properties to the scale-invariance properties of the process. We analyze two types of stationary and dissipative 1-dimensional scale-invariant processes: regularized fractional Brownian motion and regularized multifractal random walk. For both processes, the structure and dynamics of the phase space are determined by their scale-invariant properties.
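
The first two steps of the framework, a Takens delay embedding followed by a nearest-neighbor search for analogues, can be sketched as below. The function names, the `dim` and `lag` parameters, and the Euclidean distance are assumptions for illustration; the paper's definition of the ensemble volume and its dispersion analysis are not reproduced.

```python
import numpy as np

def takens_embedding(series, dim, lag):
    """Stack delayed copies of a 1-D series into (n_states, dim) phase-space points."""
    series = np.asarray(series, dtype=float)
    n_states = len(series) - (dim - 1) * lag
    return np.stack(
        [series[i * lag : i * lag + n_states] for i in range(dim)], axis=1
    )

def k_best_analogues(states, target_index, k):
    """Indices of the k nearest neighbours of a target state, excluding the target."""
    distances = np.linalg.norm(states - states[target_index], axis=1)
    distances[target_index] = np.inf       # a state is not its own analogue
    return np.argsort(distances)[:k]

series = np.arange(10.0)                   # toy series, for illustration only
states = takens_embedding(series, dim=3, lag=2)
analogues = k_best_analogues(states, target_index=0, k=2)
```

Following the successors of the returned indices forward in time (index `i` maps to `i + 1`, `i + 2`, and so on) would give the dispersion of the analogue ensemble that the abstract relates to scale invariance.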

2605.08111 2026-05-12 cs.LG cs.AI stat.ME

TTCD: Transformer Integrated Temporal Causal Discovery from Non-Stationary Time Series Data

Omar Faruque, Sahara Ali, Xue Zheng, Jianwu Wang

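AI summary This paper proposes TTCD, a new end-to-end framework for discovering contemporaneous and lagged causal relations from non-stationary time series data. TTCD builds on the Transformer architecture and introduces a non-stationary feature learning module and a custom causal structure learning module; through reconstruction-guided causal signal distillation it suppresses noise and spurious correlations, allowing the underlying causal graph to be inferred without strong statistical assumptions. Experiments show that TTCD outperforms existing methods on a range of synthetic and real-world datasets, demonstrating its effectiveness for causal discovery in complex real-world settings.

Comments 18 Pages

Details
English abstract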

The widespread availability of complex time series data in various domains such as environmental science, epidemiology, and economics demands robust causal discovery methods that can identify intricate contemporaneous and lagged relationships in non-stationary, nonlinear, and noisy settings. Existing constraint-based methods often rely heavily on conditional independence tests that degrade for limited data samples and complex distributions, while score-based methods impose strong statistical assumptions. Recent methods address special cases such as change point detection or distribution shifts, but struggle to provide a unified solution. We propose the Transformer Integrated Temporal Causal Discovery (TTCD) Framework, a novel end-to-end approach that learns contemporaneous and lagged causal relations from non-stationary time series. TTCD introduces a Non-Stationary Feature Learner integrating temporal and frequency-domain attention with dynamic non-stationarity profiling, and a custom Causal Structure Learner. A key innovation is reconstruction-guided causal signal distillation, to distill essential causal signals through the reconstruction process of the transformer decoder, which mitigates noise and spurious correlations while preserving meaningful dependencies. The Causal Structure Learner operates on distilled reconstructed signals to infer the underlying causal graph without restrictive assumptions on noise distributions or data generation processes. Experiments on synthetic, benchmark, and real world datasets show that TTCD consistently outperforms state-of-the-art baselines in both accuracy and consistency with domain knowledge, demonstrating the approach's effectiveness for causal discovery in challenging real world contexts.

2605.08102 2026-05-12 cs.LG stat.ML

Path-Based Gradient Boosting for Graph-Level Prediction

Claudio Meggio, Johan Pensar, Riccardo De Bin

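AI summary This paper proposes PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path features directly from the graph structure. It extends earlier work aimed at a chemistry application in three key ways: adaptation to binary classification, incorporation of multiple node and edge attributes, and automatic selection of anchor nodes. Experiments show that PathBoost performs well on several benchmark datasets, particularly on graphs with many nodes, indicating that path-based boosting is competitive on complex graph tasks.

Comments 20 Pages, 1 figure

Details
English abstract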

We propose PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path-based features directly from the input graph structure. Building on previous work, which was tailored to a specific chemistry application, PathBoost introduces three key extensions: (i) adaptation to binary classification through gradient boosting with a logistic loss, (ii) incorporation of multiple node and edge attributes into the path feature space via a prefix-based decomposition, and (iii) automatic anchor node selection based on categorical attribute diversity, eliminating the need for the user to specify the starting point of the considered path features. We compared PathBoost to graph neural networks and graph kernel approaches on several benchmark datasets, obtaining better results on half of them and comparable results on the rest. PathBoost performs better on graphs with larger average node counts. Overall, the results demonstrate that path-based boosting methods can be competitive with more complex black-box approaches.

2604.02474 2026-05-12 cs.LG stat.ML

Time-Warping Recurrent Neural Networks for Transfer Learning

Jonathon Hirschi

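AI summary This paper studies how time-warping can be used for transfer learning in recurrent neural networks (RNNs), to handle physical systems that evolve at different speeds under different environmental conditions. The proposed method is based on rescaling time; the paper proves that an LSTM can approximate a class of linear differential equation models to high accuracy and can be time-warped while maintaining that accuracy. The method is validated on the prediction of fuel moisture content, where experiments show that adjusting only a small number of parameters yields accurate predictions for data on different time scales, with results comparable to existing transfer learning methods.

Details
English abstract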

Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower in different environmental conditions. We use the term time-warping to mean rescaling time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems with any desired accuracy, and the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated in an applied problem on predicting fuel moisture content (FMC), an important concept in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, where there are large quantities of data available for training. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several known methods of transfer learning. The Time-Warping method produces predictions with an accuracy level comparable to the established methods, despite modifying only a small fraction of the parameters that the other methods modify.
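
To make the rescaling concrete, here is a short worked derivation under an assumed form of the time lag model. The symbols $m$ (state, for example moisture content), $E$ (forcing or equilibrium value), $T$ (characteristic time scale), and $c$ (warping factor) are notation chosen for this sketch and may differ from the thesis.

```latex
\begin{align*}
\frac{dm}{dt}(t) &= \frac{E(t) - m(t)}{T}, \\
\tilde m(t) := m(ct), \quad \tilde E(t) := E(ct)
\;\Rightarrow\;
\frac{d\tilde m}{dt}(t) &= c\,\frac{E(ct) - m(ct)}{T}
 = \frac{\tilde E(t) - \tilde m(t)}{T/c}.
\end{align*}
```

Under this assumption, rescaling time by $c$ turns a model with time scale $T$ into one with time scale $T/c$. Starting from $T = 10$ hours, $c = 10$ gives 1 hour, $c = 1/10$ gives 100 hours, and $c = 1/100$ gives 1000 hours, which matches the target time scales listed above.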

2603.10960 2026-05-12 cs.LG math.ST stat.TH

Ranking Reasoning LLMs under Test-Time Scaling

Mohsen Hariri, Michael Hinczewski, Jing Ma, Vipin Chaudhary

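AI summary This paper studies the problem of ranking reasoning large language models (LLMs) under test-time scaling and introduces Scorio, an open-source library that implements a range of statistical ranking methods, such as paired-comparison models and item response theory models. Experiments on several math benchmarks show that most methods produce rankings that agree closely with the Bayesian gold standard, and that some methods remain highly consistent even with a single trial. The study provides reliable ranking methods for different budgets.

Comments Code is available at https://github.com/mohsenhariri/scorio

Details
Journal ref
The 64th Annual Meeting of the Association for Computational Linguistics (ACL), 2026
English abstract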

Test-time scaling evaluates reasoning LLMs by sampling multiple outputs per prompt, but ranking models in this regime remains underexplored. We formalize dense benchmark ranking under test-time scaling and introduce Scorio, a library that implements statistical ranking methods such as paired-comparison models, item response theory (IRT) models, voting rules, and graph- and spectral-based methods. Across $20$ reasoning models on four Olympiad-style math benchmarks (AIME'24, AIME'25, HMMT'25, and BrUMO'25; up to $N=80$ trials), most full-trial rankings agree closely with the Bayesian gold standard $\mathrm{Bayes}_{\mathcal{U}}@80$ (mean Kendall's $τ_b = 0.93$--$0.95$), and $19$--$34$ methods recover exactly the same ordering. In the single-trial regime, the best methods reach $τ_b \approx 0.86$. Using greedy decoding as an empirical prior ($\mathrm{Bayes}_{\mathbf{R}_0}@N$) reduces variance at $N=1$ by $16$--$52\%$, but can bias rankings when greedy and stochastic sampling disagree. These results identify reliable ranking methods for both high- and low-budget test-time scaling. We release Scorio as an open-source library at https://github.com/mohsenhariri/scorio.
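
The agreement metric quoted above, Kendall's $\tau_b$, can be computed with a short pairwise sketch. This is a plain $O(n^2)$ implementation of the standard tie-corrected definition, written for illustration; it is not taken from Scorio, and the function name and toy score lists are assumptions.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b between two score lists, with the usual correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            ties_x += 1                    # pair tied in the first ranking
        if dy == 0:
            ties_y += 1                    # pair tied in the second ranking
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / sqrt((n_pairs - ties_x) * (n_pairs - ties_y))

reference_scores = [1, 2, 3, 4]            # hypothetical gold-standard scores for 4 models
candidate_scores = [1, 3, 2, 4]            # hypothetical scores with one swapped pair
agreement = kendall_tau_b(reference_scores, candidate_scores)
```

With one swapped adjacent pair among four models, five of the six pairs are concordant and one is discordant, so the agreement is $(5 - 1)/6$.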

2509.21484 2026-05-12 cs.LG stat.ML

High-probability zeroth-order online convex optimisation beyond Euclidean geometry

David Janz, El-Mahdi El-Mhamdi, Arya Akhavan

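AI summary This paper studies zeroth-order online convex optimisation beyond Euclidean geometry, considering $\ell_q$-Lipschitz losses and $\ell_p$-regularised FTRL, with randomised two-point finite-difference gradient estimators constructed from cone-measure sampling on $\ell_r$-spheres. The authors give unified high-probability regret bounds for all $p,q,r \in [1,\infty]$, achieving time-uniform control of the quadratic variation by bounding all moments of the gradient estimator in the dual FTRL norm. The algorithm is anytime and data-driven; its rates strengthen those of earlier work, and the analysis reveals a performance bottleneck for $q > 2$ that is tied to the estimators themselves.

Details
English abstract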

We study online convex optimisation with $\ell_q$-Lipschitz losses, $\ell_p$-regularised FTRL, and randomised two-point finite-difference gradient estimators based on cone-measure sampling from $\ell_r$-spheres. For random Lipschitz losses whose mean is convex, we prove unified high-probability regret bounds for all $p,q,r \in [1,\infty]$. The analysis is driven by all-moment bounds for the gradient estimator in the dual FTRL norm, yielding time-uniform control of the quadratic variation. The algorithm is anytime and data-driven; in the special cases previously studied, its rates recover the known in-expectation guarantees while strengthening them to time-uniform high probability. Together with constant-probability lower bounds, these results establish optimality for $q\in[1,2]$ under appropriate sampling geometry, and expose a gap for $q>2$ that appears intrinsic to the estimators themselves.
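
A randomised two-point finite-difference gradient estimator can be sketched in its simplest form, with directions drawn uniformly from the Euclidean unit sphere (the $r = 2$ case only). The paper's general $\ell_r$ cone-measure sampling and its FTRL update are not reproduced; the function name, the smoothing radius `h`, and the toy linear loss are assumptions for illustration.

```python
import numpy as np

def two_point_gradient(f, x, h, rng):
    """One randomised two-point finite-difference estimate of the gradient of f at x."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                 # uniform direction on the Euclidean unit sphere
    return d * (f(x + h * u) - f(x - h * u)) / (2 * h) * u

rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5])
linear_loss = lambda x: float(a @ x)       # toy loss whose true gradient is a
x0 = np.zeros(3)

# Averaging many independent estimates should approach the true gradient.
estimates = np.array(
    [two_point_gradient(linear_loss, x0, 0.1, rng) for _ in range(20000)]
)
mean_estimate = estimates.mean(axis=0)
```

For a linear loss the estimate equals $d\,(a \cdot u)\,u$, and since $\mathbb{E}[u u^\top] = I/d$ on the unit sphere, its expectation is exactly $a$. Individual estimates are noisy, which is why moment bounds on the estimator drive the kind of high-probability analysis described above.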

2508.21146 2026-05-12 cs.LG stat.ML

Privacy Auditing Synthetic Data Release through Local Likelihood Attacks

Joshua Ward, Chi-Hua Wang, Guang Cheng

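AI summary This paper studies privacy leakage in synthetic data release and proposes the Generative Likelihood Ratio Attack (Gen-LRA), a new No-Box membership inference attack based on local likelihood ratios. The attack requires no model access or knowledge; it detects leakage of training data by evaluating the influence of a test sample on a local likelihood ratio estimate over the synthetic data. Theoretical analysis shows that Gen-LRA separates members from non-members under local overfitting, and it outperforms existing methods across multiple datasets and model architectures, highlighting the privacy risk posed by overfitting in generative models.

Details
English abstract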

Auditing the privacy leakage of synthetic data is an important but unresolved problem. Existing privacy auditing frameworks for synthetic data rely on heuristics and unrealistic assumptions about model access, offering limited ability to describe or detect the privacy exposure of training data through synthetic data release. In this paper, we study designing membership inference attacks (MIAs) that specifically exploit the observation that tabular generative models tend to significantly overfit to certain regions of the training distribution. We propose \emph{Generative Likelihood Ratio Attack} (Gen-LRA), a novel, computationally efficient No-Box MIA that, with no assumption of model knowledge or access, formulates its attack by evaluating the influence a test observation has on a surrogate model's estimate of a local likelihood ratio over the synthetic data. We develop a theoretical framework for the attack: we show that the Gen-LRA score admits a closed-form characterization as a localized density-ratio statistic, and we prove that under a general model of local overfitting it produces a provable mean-score gap between members and non-members, yielding testable predictions for when the attack should succeed. We validate these predictions in a controlled simulation study and assess Gen-LRA against a comprehensive benchmark spanning diverse datasets, generative model architectures, and attack parameters. Across metrics, Gen-LRA consistently dominates competing MIAs, with especially strong gains at low false positive rates. These results underscore Gen-LRA's effectiveness as a privacy auditing tool for the release of synthetic data, and highlight the significant privacy risks posed by generative model overfitting in real-world applications.

2508.12776 2026-05-12 cs.LG cs.AI stat.ML

Randomized PCA Forest for Unsupervised Outlier Detection

Muhammad Rajabinasab, Farhad Pakdaman, Moncef Gabbouj, Peter Schneider-Kamp, Arthur Zimek

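AI summary This paper proposes an unsupervised outlier detection method based on Randomized Principal Component Analysis (RPCA), which derives outlier scores from the intrinsic properties of an RPCA Forest for efficient detection. The method outperforms classical and recent methods on several datasets, and it is robust and computationally efficient, making it suitable for unsupervised outlier detection.

Details
English abstract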

We propose a novel unsupervised outlier detection method based on Randomized Principal Component Analysis (PCA). Motivated by the performance of Randomized PCA (RPCA) Forest in approximate K-Nearest Neighbor (KNN) search, we use RPCA Forest for unsupervised outlier detection by deriving an outlier score from its intrinsic properties. Experimental results show that the proposed approach outperforms classical and state-of-the-art methods on several datasets while performing competitively on the rest. Extensive analysis of the proposed method demonstrates its robustness and computational efficiency, making it a good choice for unsupervised outlier detection.

2507.17921 2026-05-12 stat.ML cs.LG eess.IV math.ST stat.CO stat.ME stat.TH

Sliding Window Informative Canonical Correlation Analysis

Arvind Prasadan

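AI summary This paper proposes Sliding Window Informative Canonical Correlation Analysis (SWICCA), a new canonical correlation analysis method for the streaming data setting that discovers correlated features between two datasets in real time. The method combines a streaming principal component analysis algorithm with a sliding window of samples to estimate the CCA components online, and it handles and scales to high-dimensional data. Numerical simulations and a real-data example demonstrate its effectiveness, and a theoretical performance guarantee is provided.

Comments 11 pages (double column), submitted; revised with updated simulations

Details
English abstract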

Canonical correlation analysis (CCA) is a technique for finding correlated sets of features between two datasets. In this paper, we propose a novel extension of CCA to the online, streaming data setting: Sliding Window Informative Canonical Correlation Analysis (SWICCA). Our method uses a streaming principal component analysis (PCA) algorithm as a backend and uses these outputs combined with a small sliding window of samples to estimate the CCA components in real time. We motivate and describe our algorithm, provide numerical simulations to characterize its performance, and provide a theoretical performance guarantee. The SWICCA method is applicable and scalable to extremely high dimensions, and we provide a real-data example that demonstrates this capability.

2504.14127 2026-05-12 econ.EM stat.ME

Finite Population Identification and Design-Based Sensitivity Analysis

Brendan Kline, Matthew A. Masten

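AI summary This paper proposes a new approach to quantifying uncertainty in finite populations: sensitivity parameters are calibrated using design distributions, yielding uncertainty intervals that can be interpreted as identified sets, robust Bayesian credible sets, or uniform frequentist design-based confidence sets. The paper focuses on uncertainty about the average treatment effect; the approach handles heterogeneous treatment effects without relying on asymptotic theory, offers a new perspective on examining covariate balance, and gives a formal analysis of the role of randomization. Three empirical applications illustrate the approach.

Details
English abstract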

We develop a new approach for quantifying uncertainty in finite populations, by using design distributions to calibrate sensitivity parameters in finite population identified sets. This yields uncertainty intervals that can be interpreted as identified sets, robust Bayesian credible sets, or uniform frequentist design-based confidence sets. We focus on quantifying uncertainty about the average treatment effect, where our approach (1) yields design-based confidence intervals which allow for heterogeneous treatment effects without using asymptotics, (2) provides a new motivation for examining covariate balance, and (3) gives a new formal analysis of the role of randomization. We illustrate our approach in three empirical applications.

2502.06044 2026-05-12 stat.ML cs.LG

Differentially Private Hyperparameter Tuning using Local Bayesian Optimization

Getoar Sopa, Juraj Marusic, Marco Avella Medina, John P. Cunningham

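AI summary This paper studies differentially private hyperparameter tuning when the validation data contain sensitive user information. Because existing methods rely on near-random search or global Bayesian optimization and are therefore inefficient, it proposes DP-GIBO, a differentially private framework based on local Bayesian optimization that privately approximates gradients using a Gaussian process surrogate. Under suitable conditions the method is guaranteed to converge to a locally optimal hyperparameter configuration, and it outperforms non-private random search and global Bayesian optimization in moderate-to-high-dimensional hyperparameter spaces.

Comments 26 pages, 6 figures

Details
English abstract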

Hyperparameter tuning is a key component of machine learning procedures, but when validation data contain sensitive user information, search mechanisms can leak private information through the selected configuration. Existing differentially private hyperparameter tuning methods often rely on near-random search, while prior differentially private Bayesian optimization approaches are typically global and, therefore, scale poorly with the hyperparameter dimensionality. We study differentially private hyperparameter tuning using local Bayesian optimization, focusing on settings where the validation objective is available only through noisy black-box evaluations and gradients are unavailable or impractical to compute. We introduce DP-GIBO, a differentially private local Bayesian optimization framework that privately approximates gradients using a Gaussian Process surrogate. Under suitable conditions, we prove that DP-GIBO converges to a locally optimal hyperparameter configuration up to a privacy-dependent error, with dimensional dependence that is polynomial rather than exponential. Empirically, we show that DP-GIBO provides scalable private hyperparameter tuning across multiple tasks, substantially outperforming non-private random search and global Bayesian optimization baselines in moderate-to-high-dimensional hyperparameter spaces.

2410.01656 2026-05-12 math.ST cs.DS cs.LG stat.CO stat.ML stat.TH

Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians

Jane H. Lee, Anay Mehrotra, Manolis Zampetakis

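AI summary This paper studies how to efficiently estimate distributional parameters when samples come only from an unknown set $S \subseteq \mathbb{R}^d$. The authors give an algorithm with running time $d^{\mathrm{poly}(\ell/\varepsilon)}$ for exponential families satisfying certain structural conditions and for unknown truncation sets $S$ that are approximable by low-degree polynomials, extending existing results on parameter estimation for Gaussian distributions. In addition, when the truncation set is a halfspace or an axis-aligned rectangle, the authors design an algorithm with running time $\mathrm{poly}(d/\varepsilon)$, giving a more efficient solution for parameter estimation from truncated data.

Comments Appeared at the 65th IEEE Symposium on Foundations of Computer Science (FOCS), 2024; abstract shortened for arXiv

Details
English abstract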

We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set $S \subseteq \mathbb{R}^d$. Kontonis, Tzamos, and Zampetakis (FOCS'19) gave a $d^{\mathrm{poly}(1/\varepsilon)}$ time algorithm for finding $\varepsilon$-accurate parameters for the special case of Gaussian distributions with diagonal covariance matrix. Recently, Diakonikolas, Kane, Pittas, and Zarifis (COLT'24) showed that this exponential dependence on $1/\varepsilon$ is necessary even when $S$ belongs to some well-behaved classes. These works leave the following open problems which we address in this work: Can we estimate the parameters of any Gaussian or even extend beyond Gaussians? Can we design $\mathrm{poly}(d/\varepsilon)$ time algorithms when $S$ is a simple set such as a halfspace? We make progress on both of these questions by providing the following results: 1. Toward the first question, we give a $d^{\mathrm{poly}(\ell/\varepsilon)}$ time algorithm for any exponential family that satisfies some structural assumptions and any unknown set $S$ that is $\varepsilon$-approximable by degree-$\ell$ polynomials. This result has two important applications: 1a) The first algorithm for estimating arbitrary Gaussian distributions from samples truncated to an unknown $S$; and 1b) The first algorithm for linear regression with unknown truncation and Gaussian features. 2. To address the second question, we provide an algorithm with runtime $\mathrm{poly}(d/\varepsilon)$ that works for a set of exponential families (containing all Gaussians) when $S$ is a halfspace or an axis-aligned rectangle. Along the way, we develop tools that may be of independent interest, including a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples that is robust to certain covariate shifts.

2410.01244 2026-05-12 stat.ML cs.LG math.PR

Equivariant score-based generative models provably learn distributions with symmetries efficiently

Ziyu Chen, Markos A. Katsoulakis, Benjamin J. Zhang

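AI summary This paper studies how to efficiently learn data distributions with symmetries and gives the first theoretical analysis and guarantees for equivariant score-based generative models (SGMs). Using an improved generalization bound in Wasserstein-1 distance together with Hamilton-Jacobi-Bellman theory, it proves that the score of a symmetrized distribution can be learned with equivariant vector fields without data augmentation. It also shows that omitting equivariant structure from the model can lead to worse generalization bounds, underscoring the importance of equivariant priors for modeling symmetric data.

Details
English abstract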

Symmetry is ubiquitous in many real-world phenomena and tasks, such as physics, images, and molecular simulations. Empirical studies have demonstrated that incorporating symmetries into generative models can provide better generalization and sampling efficiency when the underlying data distribution has group symmetry. In this work, we provide the first theoretical analysis and guarantees of score-based generative models (SGMs) for learning distributions that are invariant with respect to some group symmetry and offer the first quantitative comparison between data augmentation and adding equivariant inductive bias. First, building on recent works on the Wasserstein-1 ($\mathbf{d}_1$) guarantees of SGMs and empirical estimations of probability divergences under group symmetry, we provide an improved $\mathbf{d}_1$ generalization bound when the data distribution is group-invariant. Second, we describe the inductive bias of equivariant SGMs using Hamilton-Jacobi-Bellman theory, and rigorously demonstrate that one can learn the score of a symmetrized distribution using equivariant vector fields without data augmentations through the analysis of the optimality and equivalence of score-matching objectives. This also provides practical guidance that one does not have to augment the dataset as long as the vector field or the neural network parametrization is equivariant. Moreover, we quantify the impact of not incorporating equivariant structure into the score parametrization, by showing that non-equivariant vector fields can yield worse generalization bounds. This can be viewed as a type of model-form error that describes the missing structure of non-equivariant vector fields. Numerical simulations corroborate our analysis and highlight that data augmentations cannot replace the role of equivariant vector fields.