arXivDaily: arXiv daily digest, updated Monday through Friday
2605.08072 2026-05-11 stat.ML cs.DS cs.LG math.ST stat.TH

A Note on Non-Negative $L_1$-Approximating Polynomials

Jane H. Lee, Anay Mehrotra, Manolis Zampetakis

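AI Summary: This paper studies the existence of non-negative $L_1$-approximating polynomials under Gaussian distributions: polynomials that approximate indicator functions to within the required $L_1$-norm error while also guaranteeing non-negative output. The authors prove that every class of sets with Gaussian surface area (GSA) at most $Γ$ admits non-negative polynomials of degree $\tilde{O}(Γ^2/\varepsilon^2)$ that approximate its indicator functions to error $\varepsilon$. The result retains $L_1$-approximation while providing a stronger pointwise guarantee, and the degree is within a constant factor of the best currently known Gaussian $L_1$-approximation degree without the non-negativity constraint.

Abstract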

$L_1$-Approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of non-negative $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but weaker than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently found uses in smoothed learning from positive-only examples. In this short note, we prove that every class of sets with Gaussian surface area (GSA) at most $Γ$ under the standard Gaussian admits degree-$k$ non-negative polynomials that $\varepsilon$-approximate its indicator functions in $L_1$-norm, for $k=\tilde{O}(Γ^2/\varepsilon^2)$. Equivalently, finite GSA implies $L_1$-approximation with the stronger pointwise guarantee that the approximating polynomial has range contained in $[0,\infty)$. Up to a constant factor, this matches the best currently known degree bound for Gaussian $L_1$-approximation without the non-negativity constraint.
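
To make the three notions compared in the abstract concrete, the following restates them as a sketch. The symbols $S$ (a set in the class), $p$, $p_{\mathrm{up}}$, $p_{\mathrm{down}}$, and $\gamma$ (the standard Gaussian measure on $\mathbb{R}^d$) are notation assumed here; the paper's own definitions may differ in detail.

```latex
% L1-approximation of the indicator of S by a degree-k polynomial p:
\mathbb{E}_{x \sim \gamma}\bigl[\,|p(x) - \mathbf{1}_S(x)|\,\bigr] \le \varepsilon .

% Non-negative L1-approximation: the same bound plus a pointwise constraint:
\mathbb{E}_{x \sim \gamma}\bigl[\,|p(x) - \mathbf{1}_S(x)|\,\bigr] \le \varepsilon
\quad\text{and}\quad p(x) \ge 0 \ \text{ for all } x \in \mathbb{R}^d .

% Sandwiching: two polynomials bracketing the indicator pointwise:
p_{\mathrm{down}}(x) \le \mathbf{1}_S(x) \le p_{\mathrm{up}}(x) \ \text{ for all } x,
\qquad
\mathbb{E}_{x \sim \gamma}\bigl[\,p_{\mathrm{up}}(x) - p_{\mathrm{down}}(x)\,\bigr] \le \varepsilon .
```

Under these definitions, an upper sandwiching polynomial is automatically a non-negative $L_1$-approximator, since $p_{\mathrm{up}} \ge \mathbf{1}_S \ge 0$ and $\mathbb{E}|p_{\mathrm{up}} - \mathbf{1}_S| \le \mathbb{E}[p_{\mathrm{up}} - p_{\mathrm{down}}] \le \varepsilon$. This is one way to see why the non-negativity requirement sits between the other two.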

2605.08071 2026-05-11 econ.EM cs.HC stat.ME

Vibe Econometrics and the Analysis Contract

Lydia Ashton

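AI Summary: This paper examines "vibe methodology" in economics, arguing that AI-assisted causal analysis ("vibe econometrics") improves efficiency but also spreads three failure modes: method-data mismatch, confidence laundering, and invisible forking. It proposes the Analysis Contract, a framework that adapts pre-analysis plans and the Causal Roadmap into a governance mechanism for AI-assisted causal inference, with the aim of making results more credible and auditable.

Comments 20 pages, 2 figures. Appendices A-C (fillable templates) provided as ancillary file. Companion materials: https://github.com/lydiaashton/vibe-econometrics-supp . Also posted on SSRN: https://doi.org/10.2139/ssrn.6699999

Abstract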

"Vibe coding" and "vibe analytics" have been framed as a democratization of technical capability. This paper argues that AI-assisted methodology more broadly, or what I call "vibe methodology," also democratizes the failure modes specific to each domain. When AI assists with methods whose validity depends on assumptions that cannot be verified from the output alone (a class I call "vibe inference"), the failure surface is structurally different: the output does not reliably signal invalidity, and when it does, recognizing the signal requires the expertise the workflow bypasses. I focus on "vibe econometrics," the subset of AI-assisted causal analysis where identification can be named faster than it can be audited. The claim of this paper is not that AI invents inferential failures that did not previously exist, but that it changes their incidence, observability, and persuasive force enough to create a practically distinct governance problem. This results in three failure modes: method-data mismatch, where AI bypasses expertise at execution; confidence laundering, where AI amplifies the credibility of formatted output; and invisible forking, which spans both. What is new is not the failure modes but AI's industrialization of their packaging. The barrier between naming a method and executing it has collapsed, and weak foundations, dressed as rigorous analysis, now reach audiences at a scale, speed, and polish that previously required expertise. I propose the Analysis Contract, a pre-commitment framework that adapts the logic of pre-analysis plans and the Causal Roadmap to the AI-assisted setting. The contract imposes three conditions before a causal claim is made: a method-data contract, a data audit, and a pre-commitment statement defining what would count as a disconfirming result. The framework generalizes across domains of vibe inference through domain-specific instantiation.

2605.08069 2026-05-11 stat.ME stat.ML

Empirical Bayes Rebiasing

Wanyi Ling, Sida Li, Junming Guan, Nikolaos Ignatiadis

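AI Summary: This paper studies how to analyze many noisy, biased estimates simultaneously when each comes with an even noisier estimate of its own bias. Because conventional debiasing inflates variance and lengthens confidence intervals, the authors propose an empirical Bayes rebiasing method that starts from the fully debiased estimates, learns the unknown bias distribution, and reintroduces a suitable amount of bias. The method yields substantial precision gains in prediction-powered inference tasks and in family-based genome-wide association studies.

Abstract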

We study methods for simultaneous analysis of many noisy and biased estimates, each paired with an even noisier estimate of its own bias. The analyst's goal is to construct short calibrated intervals for each parameter. The standard debiasing approach, which subtracts the bias estimate from each biased estimate, inflates variance and yields long intervals. In this paper, we propose an empirical Bayes rebiasing strategy that starts from the fully debiased estimates and learns from data how much bias to reintroduce by estimating the unknown bias distribution. We provide convergence rates for the coverage of our intervals when the bias distribution is estimated using nonparametric maximum likelihood. Furthermore, we demonstrate substantial precision gains in prediction-powered inference, including pairwise LLM win-rate evaluations, as well as for inference of direct genetic effects in family-based GWAS.
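
The variance trade-off described above can be illustrated with a small calculation. The sketch below is not the paper's procedure: it replaces the nonparametric maximum likelihood estimate of the bias distribution with an assumed, known Gaussian prior $b \sim N(0, \omega^2)$, and the names `sigma2`, `tau2`, and `omega2` are hypothetical. With $Y = \theta + b + \varepsilon$ and an independent bias estimate $B = b + \eta$, full debiasing has error variance $\sigma^2 + \tau^2$, while adding back a fraction of $B$ reduces it to $\sigma^2 + \omega^2\tau^2/(\omega^2 + \tau^2)$.

```python
import math


def debiased_variance(sigma2: float, tau2: float) -> float:
    """Error variance of Y - B: the noise of Y plus the noise of the bias estimate."""
    return sigma2 + tau2


def rebiased_variance(sigma2: float, tau2: float, omega2: float) -> float:
    """Error variance of Y - E[b | B] under the ASSUMED prior b ~ N(0, omega2).

    The second term is the posterior variance of b given B.
    """
    return sigma2 + omega2 * tau2 / (omega2 + tau2)


def rebias(y: float, b_hat: float, tau2: float, omega2: float) -> float:
    """Start from the debiased value y - b_hat and add back the fraction
    tau2 / (omega2 + tau2) of the bias estimate (Gaussian-prior shrinkage)."""
    debiased = y - b_hat
    return debiased + b_hat * tau2 / (omega2 + tau2)


def half_width(variance: float, z: float = 1.96) -> float:
    """Half-width of a normal-theory interval with the given error variance."""
    return z * math.sqrt(variance)


# Hypothetical numbers: the bias estimate is noisier than the estimate itself.
sigma2, tau2, omega2 = 1.0, 4.0, 0.25
wide = half_width(debiased_variance(sigma2, tau2))
narrow = half_width(rebiased_variance(sigma2, tau2, omega2))
```

In this Gaussian toy model the shorter interval keeps its nominal coverage only on average over the assumed prior; how the paper obtains coverage guarantees with an estimated bias distribution is not reproduced here.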

2605.08051 2026-05-11 astro-ph.SR stat.ML

Inferring Asteroseismic Parameters from Short Observations Using Deep Learning: Application to TESS and K2 Red Giants

Nipun Ghanghas, Siddharth Dhanpal, Shravan Hanasoge, Praneeth Netrapalli, Karthikeyan Shanmugam

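AI Summary: This paper uses deep learning to infer asteroseismic parameters of red giants, such as the large frequency separation (Δν) and the frequency at maximum power (ν_max), from short observations, and applies the approach to TESS and K2 data. The proposed machine learning method reliably infers Δν for about 23% of stars from single-sector (one-month) TESS observations, and reliably infers the gravity-mode period spacing (ΔΠ₁) for about 200 young red giants in K2 data. The method offers a practical route to large-scale asteroseismic data analysis.

Comments 43 pages, 22 figures, 5 tables. Under review at ApJ

Abstract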

Asteroseismology is the study of resonant oscillations of stars to infer their internal structure and dynamics. It is also a powerful tool for precisely determining stellar parameters such as mass, radius, surface gravity, and age. The ongoing TESS mission, with its nearly complete sky coverage, presents a unique opportunity to uniformly probe stellar populations across the Milky Way. TESS is estimated to have observed more than 300,000 oscillating red giants, most of which have one to two months of observations. Given the scale of this dataset, we need a fast, efficient, and robust way to analyse the data. In this work, our objective is to develop a machine learning (ML) based method to infer asteroseismic parameters from short-duration observations. Specifically, we focus on two global seismic parameters, the large frequency separation ($Δν$) and the frequency at maximum power ($ν_{\mathrm{max}}$), from one-month-long TESS observations of red giants. Meanwhile, for K2 data, our focus extends to inferring the period spacings of dipolar gravity modes ($ΔΠ_{1}$), in addition to $Δν$ and $ν_{\mathrm{max}}$. Our findings demonstrate that our machine learning algorithm can accurately infer $Δν$ and $ν_{\mathrm{max}}$ for approximately 50% of samples created by taking one-month Kepler and K2 observations. For TESS one-sector data, however, we recover reliable $Δν$ for only about 23% of the stars. Additionally, we get reliable $ΔΠ_{1}$ inferences for about 200 young red giants from K2. For these $ΔΠ_{1}$ inferences, we see a good match with the well-known $Δν$-$ΔΠ_{1}$ degenerate sequence observed in Kepler red giants.

2605.08046 2026-05-11 stat.ME

Semi-supervised Method for Risk Prediction with Doubly Censored EHR Data

Jie Zhou, Enhao Wang, Xuan Wang

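AI Summary: With the rapid growth of electronic health record (EHR) data, making clinical risk prediction more accurate and efficient has become an important problem. Because clinical events may occur outside the recording health system, outcomes are subject to double censoring (left and right); gold-standard event-time labels are also hard to obtain, and relying on the few labeled cases alone is inefficient. This paper proposes a semi-supervised learning method that combines a small set of gold-standard labels with abundant, easily obtained surrogate outcomes for risk prediction under double censoring. Theoretical analysis and simulations show substantial efficiency gains, and an analysis of risk factors for type 2 diabetes demonstrates its practical value.

Abstract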

The rapid expansion of large-scale electronic health record (EHR) data offers unique opportunities to improve the accuracy and efficiency of clinical risk estimation. Yet, because clinical events may occur outside the recording health system, clinical event outcomes are frequently subject to double censoring (both left and right). In addition, gold-standard event times can often only be ascertained through labor-intensive manual chart reviews, yielding labels for only a small subset of patients. Relying on this small labeled set alone is inefficient, whereas widely available surrogate outcomes such as the time to first diagnostic code or first disease mention are error-prone and can yield biased estimates if used directly. Semi-supervised learning (SSL) methods provide a principled way to integrate labeled and unlabeled data, and prior work has demonstrated their advantages in settings with binary or right-censored outcomes. However, existing approaches do not accommodate double censoring for risk prediction, which poses additional methodological challenges. To address this gap, we develop a novel SSL framework for risk prediction that combines a small set of gold-standard labels with large-scale surrogate information under double censoring. We establish the theoretical validity of the proposed estimator. Through extensive simulation studies, we show that our method substantially improves estimation efficiency relative to existing supervised estimators (based on the labeled data). Finally, we demonstrate its practical value by applying it to study risk factors for type 2 diabetes (T2D) using EHR data from a health system in the US.

2605.08034 2026-05-11 stat.ML cs.LG

Semiparametric Efficient Test for Interpretable Distributional Treatment Effects

Houssam Zenati, Arthur Gretton

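AI Summary: This work proposes DR-ME, a semiparametrically efficient test for interpretable distributional treatment effects. From observational data, it identifies where in the outcome distribution a treatment has an effect rather than only whether the distributions differ overall: by learning key outcome locations and combining them with orthogonal doubly robust kernel features, it detects changes in tails, modes, and other distributional features. Experiments show near-nominal type-I error and competitive power, and the learned locations localize distributional treatment effects in a semi-synthetic medical-imaging study.

Abstract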

Distributional treatment effects can be invisible to means: a treatment may preserve average outcomes while changing tails, modes, dispersion, or rare-event probabilities. Kernel tests can detect discrepancies between interventional outcome laws, but global tests do not reveal where the laws differ. We propose DR-ME, to our knowledge the first semiparametrically efficient finite-location test for interpretable distributional treatment effects. DR-ME evaluates an interventional kernel witness at learned outcome locations, returning causal-discrepancy coordinates rather than only a global rejection. From observational data, we derive orthogonal doubly robust kernel features whose centered oracle form is the canonical gradient of this finite witness. For fixed locations, we characterize the local testing limit: DR-ME is chi-square calibrated under the null, has noncentral chi-square local power, and uses the covariance whitening that optimizes local signal-to-noise for discrepancies visible through the selected coordinates. This efficient local-power geometry yields a principled location-learning criterion, with sample splitting preserving post-selection validity. Experiments show near-nominal type-I error, competitive power against global doubly robust kernel tests, and interpretable learned locations that localize distributional effects in a semi-synthetic medical-imaging study.

2605.08027 2026-05-11 stat.ME stat.AP

Randomization Tests for Distributions of Individual Treatment Effects via Combined Rank Statistics

David Kim, Yongchang Su, Jake Bowers, Xinran Li

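AI Summary: This paper studies inference on the distribution of individual treatment effects in randomized experiments, such as the proportion of units that benefit or the median effect. The authors propose tests that adaptively combine multiple rank statistics while maintaining finite-sample validity without prior knowledge of the best statistic, and design weighting schemes for stratified experiments that aggregate information across strata. In applications the combined test can be more powerful than a single test: in an evaluation of a teacher training program, it suggests that roughly half of the treated teachers benefited, whereas a single test may indicate only a small minority.

Abstract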

What proportion of treated units actually benefited from an experimental intervention? What is the median or the largest individual treatment effect? This paper develops methods for answering such questions about the distribution of individual causal effects in randomized experiments. Existing approaches require the analyst to select a rank-based test statistic before observing the data. A poor choice can substantially reduce power, while searching over multiple test statistics and adjusting for multiplicity using Bonferroni correction also incurs power loss. We propose inference procedures that adaptively combine multiple rank-based statistics while maintaining finite-sample validity. For stratified experiments, we further develop weighting schemes that effectively aggregate evidence across strata of heterogeneous sizes. The resulting combined test achieves power comparable to, or exceeding, that of the best individual test, without requiring prior knowledge of the optimal statistic. When applied to a randomized experiment evaluating a teacher training program, the combined test suggests that roughly half of treated teachers benefited, whereas a single rank-based test may indicate only a small minority. Thus, the choice of test determined whether the program appears broadly successful or narrowly effective.

2605.08018 2026-05-11 stat.ME

BAMIFun: Bayesian Multiple Imputation for Functional Data

Ziren Jiang, Lei Xuan, Eric F. Lock, Erjia Cui

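AI Summary: This paper proposes BAMIFun, a Bayesian multiple imputation method for functional data, to address the missing values that arise when trajectories are sparsely or irregularly observed. The method builds on a Bayesian low-rank model with penalized spline representations, which enforces smoothness and makes inference more reliable, and performs posterior computation with a Gibbs sampler. The framework is also extended to multiway functional data through a low-rank Functional Tensor Singular Value Decomposition (FTSVD) model, enabling multiple imputation in settings that existing methods do not support. Experiments indicate that BAMIFun outperforms existing methods in imputation accuracy and in the coverage of downstream inference.

Comments 2 Tables, 3 Figures

Abstract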

Missing data are pervasive in modern functional datasets, where trajectories are often sparsely or irregularly observed. Although Functional Principal Component Analysis (FPCA) is widely used to reconstruct incomplete curves, existing FPCA-based approaches typically employ single imputation, leading to overly optimistic inferences in downstream analyses. To address these challenges, we develop a novel Bayesian multiple imputation framework for functional data (BAMIFun). For single-level functional data, we impose a Bayesian low-rank model that incorporates penalized spline representations to enforce smoothness of eigenfunctions and derive an efficient Gibbs sampler algorithm for posterior computation. In addition, we demonstrate and validate how to properly account for the estimation uncertainties in downstream analysis. Furthermore, we extend the framework to multiway functional data using a low-rank Functional Tensor Singular Value Decomposition (FTSVD) model, enabling Bayesian multiple imputation in settings not supported by existing methods. Simulation studies show that, compared to existing methods, BAMIFun achieves accurate imputation while providing substantially improved coverage and more reliable downstream inference. Case studies using a physical activity dataset and an infant gut microbiome dataset further demonstrate the practical advantages of our proposed methods under severe missingness. Code for our algorithms is available at https://github.com/ZirenJiang/BAMIFun.

2605.08011 2026-05-11 cs.AI stat.CO

Abductive Reasoning with Probabilistic Commonsense

Joseph Cotnareanu, Chiara Roverato, Han Zhou, Didier Chetelat, Yingxue Zhang, Mark Coates

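AI Summary: This work aims to improve the reasoning ability of large language models, in particular on problems that require commonsense reasoning. It proposes a probabilistic framework that models how commonsense beliefs vary across individuals, and introduces PACS, an algorithm that combines an LLM with a formal solver and aggregates conclusions across multiple samples to determine whether most people would judge a statement true or false. Experiments show that PACS outperforms existing reasoning methods on multiple benchmarks.

Journal ref Proceedings of the International Conference on Machine Learning, 2026

Abstract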

Recent efforts to improve the reasoning abilities of Large Language Models (LLMs) have focused on integrating formal logic solvers within neurosymbolic frameworks. A key challenge is that formal solvers lack commonsense world knowledge, preventing them from making reasoning steps that humans find obvious. Prior methods address this by using LLMs to supply missing commonsense assumptions, but these approaches implicitly assume universal agreement on such commonsense facts. In reality, commonsense beliefs vary across individuals. We propose a probabilistic framework for abductive commonsense reasoning that explicitly models this variation, aiming to determine whether most people would judge a statement as true or false. We introduce Probabilistic Abductive CommonSense (PACS), a novel algorithm that uses an LLM and a formal solver to sample proofs as observations of individuals' distinct commonsense beliefs, and aggregates conclusions across these samples. Empirically, PACS outperforms chain-of-thought reasoning, prior neurosymbolic methods, and search-based approaches across multiple benchmarks.

2605.08006 2026-05-11 math.OC cs.LG stat.ML

Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems

Yiyang Shen, Yutian He, Weiran Wang, Qihang Lin

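AI Summary: This paper studies a class of bilevel optimization problems in which both the upper and lower levels have minimax structure, a setting that covers many emerging applications. Because existing methods do not handle minimax lower-level problems, the authors propose penalty-based first-order methods that do not require strong convexity of the lower-level problem. In the deterministic setting the method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity; a corresponding complexity analysis is given for the stochastic setting, and for convex constrained lower-level minimization the $\tilde{O}(ε^{-4})$ bound improves on the existing $\tilde{O}(ε^{-7})$ result.

Abstract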

We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(ε^{-4})$ complexity bound that improves upon the existing $\tilde{O}(ε^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $ε$-KKT point with $\tilde{O}(ε^{-9})$ oracle complexity.

2605.08002 2026-05-11 stat.ME math.ST stat.TH

Cellwise and Casewise Robust Multivariate Regression with Inference

Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

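AI Summary: This paper studies multivariate linear regression in the presence of casewise and cellwise outliers, missing data, and high dimensionality. It proposes the cellwise multivariate regression (cellMR) estimator, a robust method that combines a cellwise robust covariance estimator with ridge regularization and handles these forms of contamination simultaneously. The authors also propose cellBoot, a bootstrap-based inference procedure that provides asymptotically valid confidence intervals in the presence of outliers, and validate the methods through simulations and a real genomics application.

Abstract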

Multivariate linear regression is a fundamental statistical task, but classical estimators such as ordinary least squares are highly sensitive to outliers. These may occur as casewise outliers that affect entire observations, or as outlying cells, which are individual contaminated entries in the predictor and/or response matrix. Moreover, modern datasets frequently contain missing values and are high-dimensional. To address these challenges we propose the cellwise multivariate regression (cellMR) estimator, a robust regression method that simultaneously accommodates casewise and cellwise outliers, missing data, and high dimensionality. The approach builds on a cellwise robust covariance estimator and uses ridge regularization for numerical stability. We further introduce cellBoot, a novel bootstrap-based inference procedure tailored to the cellMR framework. Relying on indirect inference, cellBoot provides asymptotically valid confidence intervals that are robust to casewise and cellwise contamination. We derive influence functions of the regression estimator and prove the asymptotic validity of the cellBoot confidence intervals. Simulations and a real genomics application illustrate the strong finite-sample performance of the proposed methods.

2605.08001 2026-05-11 math.ST stat.ME stat.TH

Scale selection for geometric medians on product manifolds

Kisung You

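AI Summary: This paper studies scale selection for geometric medians on product manifolds. It shows that naive joint optimization over location and scale drives the scale to the boundary, so the problem collapses to a marginal median and the information in one factor is lost. The author proposes three alternatives, based on a sensitivity path, robust scale calibration, and a balance equation, with guarantees that include consistency, unit invariance, and bounded influence, and illustrates them with simulations in Euclidean and Bures-Wasserstein settings.

Abstract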

Geometric medians on product manifolds are sensitive to the relative scaling of factor metrics because the median objective couples the factors rather than separating them. We study this scale-selection problem and first prove that naive joint minimization over location and scale is degenerate: the scale is driven to the boundary and the problem collapses to a marginal median, effectively discarding one factor. Thus relative scale is not identifiable from the raw median loss alone. We develop three alternatives to mitigate this issue. The first treats scale as indexing a sensitivity path and establishes uniform consistency, a functional central limit theorem, and a derivative-based sensitivity measure. The second constructs a robust scale-calibrated median using marginal radial median scales, yielding unit invariance, consistency, a two-step central limit theorem, and bounded influence. The third introduces a bounded balance equation for direct scale estimation, with monotonicity, uniqueness, joint asymptotic normality, and bounded influence. Simulations illustrate boundary collapse, sensitivity, unit invariance, and balanced estimation in Euclidean and Bures-Wasserstein settings.
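
The degeneracy described above can be seen in a short calculation under an assumed parametrization. Suppose the product metric on $M_1 \times M_2$ is $d_\lambda^2 = d_1^2 + \lambda\, d_2^2$ with relative scale $\lambda \ge 0$; this specific form, and the symbols $x_i$, $y_i$, $m_1$, $m_2$, are illustrative assumptions, and the paper's parametrization may differ.

```latex
% Median loss for data (x_i, y_i) in M_1 x M_2 at location (m_1, m_2) and scale \lambda:
F(m_1, m_2, \lambda) = \sum_{i=1}^{n} \sqrt{\, d_1(x_i, m_1)^2 + \lambda\, d_2(y_i, m_2)^2 \,} .

% Each summand is nondecreasing in \lambda, so for every location
F(m_1, m_2, \lambda) \;\ge\; F(m_1, m_2, 0) = \sum_{i=1}^{n} d_1(x_i, m_1) .

% Joint minimization therefore pushes \lambda to the boundary \lambda = 0, where the
% objective is the marginal median loss on M_1 and no longer depends on m_2.
```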

2605.07993 2026-05-11 cs.LG stat.ME

Bayesian Sensitivity of Causal Inference Estimators under Evidence-Based Priors

Nikita Dhawan, Daniel Shen, Leonardo Cotta, Chris J. Maddison

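AI Summary: Causal inference, especially in observational studies, relies on untestable assumptions about the true data-generating process. This paper proposes a Bayesian sensitivity analysis that builds priors from real-world evidence to assess how sensitive causal estimators are to three common assumptions, avoiding the tendency of conventional worst-case analysis to be overly pessimistic or to conflict with prior knowledge. It introduces the Bayesian Sensitivity Value (BSV), which uses Monte Carlo approximation to compute the expected sensitivity of an estimate to assumption violations, and illustrates it in an observational study of the effect of diabetes treatments on weight loss.

Comments TMLR 2026

Abstract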

Causal inference, especially in observational studies, relies on untestable assumptions about the true data-generating process. Sensitivity analysis helps us determine how robust our conclusions are when we alter these underlying assumptions. Existing frameworks for sensitivity analysis are concerned with worst-case changes in assumptions. In this work, we argue that using such pessimistic criteria can often become uninformative or lead to conclusions contradicting our prior knowledge about the world. To demonstrate this claim, we generalize the recent s-value framework (Gupta & Rothenhäusler, 2023) to estimate the sensitivity of three different common assumptions in causal inference. Empirically, we find that, indeed, worst-case conclusions about sensitivity can rely on unrealistic changes in the data-generating process. To overcome this, we extend the s-value framework with a new sensitivity analysis criterion: Bayesian Sensitivity Value (BSV), which computes the expected sensitivity of an estimate to assumption violations under priors constructed from real-world evidence. We use Monte Carlo approximations to estimate this quantity and illustrate its applicability in an observational study on the effect of diabetes treatments on weight loss.

2605.07980 2026-05-11 cs.LG cond-mat.stat-mech math.ST stat.TH

Susceptibilities and Patterning: A Primer on Linear Response in Bayesian Learning

Chris Elliott, Daniel Murfet

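AI Summary: These notes introduce the theory of susceptibilities, developed for interpreting neural networks, as a tool for analyzing linear response in Bayesian learning. The susceptibility of an observable to a data perturbation is defined as the derivative of a posterior expectation, which by the fluctuation-dissipation theorem equals a posterior covariance. Different observables yield different objects: per-sample losses give the influence matrix, while component-localized observables give the structural susceptibility matrix, which pairs model components with data patterns and can be used to find data perturbations that produce a desired structural change. The notes start from the statistical-mechanical foundations and give a detailed account of susceptibilities, their estimators, and their connection to the geometry of the loss landscape.

Comments 34 pages, 3 figures, comments welcome!

Abstract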

These notes introduce the theory of susceptibilities as developed in [arXiv:2504.18274, arXiv:2601.12703] for interpreting neural networks. The susceptibility of an observable $ϕ$ to a data perturbation is defined as a derivative of a posterior expectation, which by the fluctuation-dissipation theorem equals a posterior covariance. Different choices of $ϕ$ yield different objects: per-sample losses give the influence matrix (the Bayesian influence function of [arXiv:2509.26544]), while component-localized observables give the structural susceptibility matrix that pairs model components with data patterns. The susceptibility matrix is (up to a factor of $nβ$) the Jacobian of the map from data distributions to structural coordinates; its pseudo-inverse provides a linearized solution to the patterning problem of [arXiv:2601.13548]: finding data perturbations that produce a desired structural change. We motivate the theory from its statistical-mechanical foundations, then give a detailed exposition of susceptibilities, their empirical estimators, and their connection to the geometry of the loss landscape.
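
The identity between the derivative of a posterior expectation and a posterior covariance follows from differentiating a tempered posterior. The sketch below uses assumed notation (prior $\pi$, empirical loss $L_n$, perturbation direction $\Delta L$, perturbation size $h$, and the sign convention) that may not match the cited papers.

```latex
% Perturbed tempered posterior and the expectation of an observable \phi:
p_h(w) = \frac{1}{Z_h}\, \pi(w)\, \exp\!\bigl(-n\beta\,[\,L_n(w) + h\,\Delta L(w)\,]\bigr),
\qquad
\langle \phi \rangle_h = \int \phi(w)\, p_h(w)\, dw .

% Differentiating at h = 0 (the derivative of Z_h supplies the product term):
\frac{\partial}{\partial h} \langle \phi \rangle_h \Big|_{h=0}
= -n\beta \bigl( \langle \phi\, \Delta L \rangle_0 - \langle \phi \rangle_0 \langle \Delta L \rangle_0 \bigr)
= -n\beta\, \operatorname{Cov}_{0}\bigl(\phi,\, \Delta L\bigr) .
```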

2605.07972 2026-05-11 cs.LG cs.AI stat.ML

It Just Takes Two: Scaling Amortized Inference to Large Sets

Antoine Wehenkel, Michael Kagan, Lukas Heinrich, Chris Pollard

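AI Summary: This paper addresses how to scale amortized inference to large sets of observations, proposing a simple, theoretically grounded strategy that decouples representation learning from posterior modeling. The method trains a mean-pool Deep Set on sets of at most two elements, and the resulting encoder generalizes to sets of arbitrary size, which substantially lowers training cost. Experiments show that the approach matches or outperforms standard baselines across several benchmarks, including high-dimensional conditional generation, at a fraction of the compute.

Abstract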

Neural posterior estimation has emerged as a powerful tool for amortized inference, with growing adoption across scientific and applied domains. In many of these applications, the conditioning variable is a set of observations whose elements depend not only on the target but also on unknown factors shared across the set. Optimal inference therefore requires treating the set jointly, which in turn requires training the estimator at the deployment set size, a regime where memory and compute quickly become prohibitive. We introduce a simple, theoretically grounded strategy that decouples representation learning from posterior modeling. Our method trains a mean-pool Deep Set on sets of size at most two, producing an encoder that generalizes to arbitrary set sizes. The inference head is then finetuned on pre-aggregated embeddings, making training cost essentially independent of the deployment set size N. Across scalar, image, multi-view 3D, molecular, and high-dimensional conditional generation benchmarks with N in the thousands, our approach matches or outperforms standard baselines at a fraction of the compute.
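
As a rough illustration of why mean pooling makes the downstream cost independent of the set size, the sketch below averages per-element embeddings into a fixed-length vector. The encoder `phi` is a hypothetical fixed feature map standing in for a trained network, and nothing here reproduces the paper's training procedure on sets of size at most two.

```python
import math
from typing import List, Sequence


def phi(x: float) -> List[float]:
    """Hypothetical per-element encoder (a trained network in practice)."""
    return [x, x * x, math.tanh(x)]


def mean_pool(observations: Sequence[float]) -> List[float]:
    """Average the per-element embeddings.

    The output length equals the embedding dimension and does not
    depend on how many observations are in the set.
    """
    if not observations:
        raise ValueError("need at least one observation")
    embeddings = [phi(x) for x in observations]
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[j] for e in embeddings) / n for j in range(dim)]


# A set of size two and a set of size one thousand map to vectors of equal length,
# so an inference head consuming these vectors never sees the set size directly.
small = mean_pool([0.5, -1.0])
large = mean_pool([0.01 * i for i in range(1000)])
```

Because the pooled vector is an average, it is also invariant to the order of the observations, which is the property that lets a set be summarized once and reused.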

2605.07970 2026-05-11 math.ST cs.LG stat.TH

Linear Response Estimators for Singular Statistical Models

Chris Elliott, Daniel Murfet

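AI Summary: This paper studies how observables of a class of statistical models respond to perturbations of the data, defining susceptibilities as a measure of this response. The authors define estimators for these susceptibilities and prove that they are consistent and asymptotically unbiased as the number of data points tends to infinity. The work provides a theoretical basis for understanding how sensitive such models are to changes in the data.

Comments 24 pages, comments welcome!

Abstract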

We define susceptibilities as a measure of the response of an observable quantity of a parameterized statistical model to a perturbation of the data for a general class of observables. We define estimators for these susceptibilities as statistics in a sequence of n data-points and prove that these estimators are consistent and asymptotically unbiased in the large n regime.

2605.07967 2026-05-11 math.ST stat.TH

Density Estimation Using the Sinc Kernel

Ingrid Kristine Glad, Nils Lid Hjort, Nikolai G. Ushakov

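AI Summary: This paper studies kernel density estimation based on the sinc (or Fourier integral) kernel $K(x)=(πx)^{-1}\sin x$. Through a detailed analysis of the estimator's asymptotic and finite-sample properties, it finds that, contrary to widespread opinion, the sinc estimator is superior to other estimators in several respects: it is more accurate at moderate sample sizes, has better asymptotics for non-smooth densities, and is more convenient for bandwidth selection.

Comments 20 pages, no figures. Preprint, Department of Mathematical Statistics, Norwegian University of Science and Technology, Trondheim, no. 2, 2007; arXiv'd for broader visibility and for direct use in a forthcoming paper

Abstract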

This paper deals with the kernel density estimator based on the so-called sinc (or Fourier integral) kernel $K(x)=(πx)^{-1}\sin x$. We study in detail both asymptotic and finite sample properties of this estimator. It is shown that, contrary to widespread opinion, the sinc estimator is superior to other estimators in many respects: it is more accurate for quite moderate values of the sample size, has better asymptotics in the non-smooth case (where the density to be estimated has only a first derivative), is more convenient for bandwidth selection, etc.
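
For readers who want to try the estimator, here is a minimal sketch of a kernel density estimate with the sinc kernel defined above, using the usual form $\hat f(x) = (nh)^{-1}\sum_i K((x - X_i)/h)$. The function names, the sample, and the bandwidth are hypothetical, and no bandwidth selection rule from the paper is implemented.

```python
import math
from typing import Sequence


def sinc_kernel(x: float) -> float:
    """K(x) = sin(x) / (pi * x), extended by its limit 1/pi at x = 0."""
    if abs(x) < 1e-12:
        return 1.0 / math.pi
    return math.sin(x) / (math.pi * x)


def sinc_kde(x: float, data: Sequence[float], h: float) -> float:
    """Kernel density estimate at x with bandwidth h > 0."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not data:
        raise ValueError("data must be non-empty")
    return sum(sinc_kernel((x - xi) / h) for xi in data) / (len(data) * h)


# Hypothetical sample and bandwidth, for illustration only.
sample = [-1.2, -0.4, 0.0, 0.3, 1.1]
density_at_zero = sinc_kde(0.0, sample, h=0.5)
```

Unlike a Gaussian kernel, the sinc kernel takes negative values, so the estimate can dip below zero at some points; truncating it at zero is a common remedy but is an addition to this sketch, not something stated in the abstract.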

2605.07939 2026-05-11 math.ST cs.NA math.NA stat.TH

Accelerating Langevin Monte Carlo via Efficient Stochastic Runge-Kutta Methods beyond Log-Concavity

Bin Yang, Xiaojie Wang

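AI Summary: This paper studies how efficient stochastic Runge-Kutta methods can accelerate Langevin Monte Carlo (LMC) for sampling from high-dimensional probability distributions. It proposes a higher-order, Hessian-free LMC algorithm based on a stochastic Runge-Kutta method of strong order 1.5 that needs only two gradient evaluations per iteration, fewer than the existing Runge-Kutta type LMC. Non-asymptotic error bounds in a non-log-concave setting show a convergence rate of the same order as that of prior work, which assumed log-concavity, and numerical experiments support the algorithm's effectiveness.

Abstract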

Sampling from a high-dimensional probability distribution is a fundamental algorithmic task arising in wide-ranging applications across multiple disciplines, including scientific computing, computational statistics and machine learning. Langevin Monte Carlo (LMC) algorithms are among the most widely used sampling methods in high-dimensional settings. This paper introduces a novel higher-order and Hessian-free LMC sampling algorithm based on an efficient stochastic Runge-Kutta method of strong order $1.5$ for the overdamped Langevin dynamics. In contrast to the existing Runge-Kutta type LMC (Li et al., 2019), which involves three gradient evaluations, the newly proposed algorithm is computationally cheaper and requires only two gradient evaluations per iteration. Under certain log-smooth conditions, non-asymptotic error bounds of the proposed algorithm are analyzed in $\mathcal{W}_2$-distance. In particular, a uniform-in-time convergence rate of order $O(d ^{\frac32} h^{\frac32})$ is derived in a non-log-concave setting, matching the convergence rate that the aforementioned work proved under the log-concavity condition. Numerical experiments are finally presented to demonstrate the effectiveness of the new sampling algorithm.
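
For context, the sketch below implements the standard first-order baseline, the Euler-Maruyama discretization of the overdamped Langevin dynamics $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t$, which uses one gradient evaluation per iteration. It is not the stochastic Runge-Kutta scheme proposed in the paper. The function names, the step size, and the standard Gaussian target with $U(x) = |x|^2/2$ are assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def ula_step(x: Vector, grad_u: Callable[[Vector], Vector],
             h: float, rng: random.Random) -> Vector:
    """One Euler-Maruyama step: x - h * grad U(x) + sqrt(2h) * standard normal noise."""
    g = grad_u(x)
    scale = math.sqrt(2.0 * h)
    return [xi - h * gi + scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]


def run_ula(grad_u: Callable[[Vector], Vector], x0: Vector,
            h: float, n_steps: int, seed: int = 0) -> List[Vector]:
    """Run the chain from x0 and return every iterate."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for _ in range(n_steps):
        x = ula_step(x, grad_u, h, rng)
        samples.append(x)
    return samples


# Assumed target: standard Gaussian in one dimension, so grad U(x) = x.
chain = run_ula(lambda x: list(x), [0.0], h=0.05, n_steps=20000, seed=0)
first_coordinate = [s[0] for s in chain]
```

With a fixed step size this baseline samples from a slightly biased distribution; higher-order schemes such as the one in the paper aim to shrink that discretization error for a given step size.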

2605.07908 2026-05-11 math.ST cs.AI cs.LG stat.TH

Statistical inference with belief functions: A survey

Fabio Cuzzolin

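AI Summary: This survey covers statistical inference with belief functions, a mathematical framework for characterising uncertainty, focusing on how to learn a belief measure from statistical data when data are too scarce to learn a probability distribution. It reviews the most significant contributions in the area.

Comments 9 pages, 0 figures

Abstract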

Belief functions are a powerful and popular framework for the mathematical characterisation of uncertainty, in particular in situations in which lack of data renders learning a probability distribution for the problem impractical. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data. In this survey we focus, in particular, on making inference from statistical data, and review the most significant contributions in the area.

2605.07907 2026-05-11 stat.ML cs.CV cs.LG

Consistency Regularised Gradient Flows for Inverse Problems

Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra, O. Deniz Akyildiz

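AI Summary: This paper proposes a consistency-regularised gradient-flow method for inverse problems. Within a unified Euclidean-Wasserstein-2 gradient-flow framework, it jointly performs posterior sampling and prompt optimization in the latent space, reducing computational cost and improving reconstruction quality. Combined with few-step latent text-to-image models, the method avoids backpropagation through autoencoders and substantially reduces the number of neural function evaluations; experiments on several canonical imaging inverse problems show state-of-the-art performance.

Abstract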

Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior sampling and prompt optimization in the latent space through a single flow that aligns the prior and posterior with the observed data. Combined with few-step latent text-to-image models, this formulation enables low-NFE inference without backpropagation through autoencoders. Experiments across several canonical imaging inverse problems show that our method achieves state-of-the-art performance with significantly reduced computational cost.

2605.07886 2026-05-11 stat.ML cs.LG

Characterizing and Correcting Effective Target Shift in Online Learning

Ziyan Li, Naoki Hiratani

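AI Summary: This paper studies the effective target shift that arises in online learning, especially under distributional shift. Viewing the problem through kernel regression, it relates online and offline learning and shows that online kernel regression is equivalent to offline regression with shifted target outputs. With target correction, online learning provably learns the same predictor as offline learning, and the authors give both closed-form and iterative corrections. Experiments show that online gradient descent with corrected targets outperforms training with the true targets in continual learning tasks, providing a framework for analyzing and improving online learning in non-stationary environments.

Comments 22 pages; 6 figures

Abstract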

Online learning from a stream of data is a defining feature of intelligence, yet modern machine learning systems often struggle in this setting, especially under distributional shift. To understand its basic properties, we study the relationship between online and offline learning in the context of kernel regression. We derive a closed-form expression for the function learned by online kernel regression, revealing that online kernel regression is equivalent to offline regression with shifted, inaccurate target outputs. Conversely, we show that by compensating for this effective shift in the teaching signal through target correction, online kernel-based learning can provably learn the same predictor as its offline counterpart. We derive both a closed-form expression for this target correction and an iterative form that can be applied sequentially. Applying this framework to image classification tasks on CIFAR-10 and CORe50, we show that online stochastic gradient descent with iteratively corrected targets outperforms learning with the true targets in continual learning settings. This work therefore provides a basic framework for analyzing and improving online learning in non-stationary environments.

2605.07878 2026-05-11 cs.LG stat.ML

Black-box model classification under the discriminative factorization

Hayden Helm, Merrick Ohata, Carey Priebe

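AI summary This paper studies how the choice of query set affects the ability to distinguish model-level properties in black-box model classification. The authors propose a discriminative factorization for assessing query-set quality and show that, under this framework, the probability of chance-level classification decays exponentially in the query budget. Experiments show that query sets selected using the estimated discriminative factors reproduce the performance ordering of oracle query sets, offering theoretical grounding and a practical tool for black-box model analysis.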

Details
English abstract

Access to modern generative systems is often restricted to querying an API (the ``black-box'' setting) and many properties of the system are unknown to the user at inference time. While recent work has shown that low-dimensional representations of models based on the relationship between their embedded responses to a set of queries are useful for inferring model-level properties, the quality of these representations is highly sensitive to the query set. We introduce the \emph{discriminative factorization} to distinguish between high- and low-quality query sets in the context of black-box model-level classification. Under this framework, the probability of chance-level classification decays exponentially in the query budget. On three auditing tasks, estimated factorization parameters predict the empirical performance decay rate. We conclude by showing that query sets selected using the estimated discriminative field reproduce the empirical ordering of oracle query sets.

2605.07852 2026-05-11 stat.ME

CHASM: Online Changepoint Detection in Temporal and Cross-Variable Dependence

Victor K. Khamesi, Edward A. K. Cohen, Niall M. Adams, Dean A. Bodenham

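AI summary This paper proposes CHASM, an online nonparametric method for detecting changes in cross-variable and temporal dependence in multivariate time series. The method monitors the truncated eigenvalue sequence of a recursively estimated dynamic mode decomposition operator, addressing the difficulty existing methods have in capturing global structure. The work also resolves the permutation invariance of eigendecompositions and designs an online monitoring scheme for complex-valued time series. The theoretical analysis covers the vector autoregressive model, and experiments show competitive or superior performance on synthetic and real-world data sets; the method requires no distributional assumptions beyond finite moments.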

Comments 11 pages, 5 figures, and supplementary (53 pages total)

Details
English abstract

Changepoint detection identifies times when the generative process of a time series changes, with applications in healthcare, cybersecurity, and finance. In multivariate settings, changes in cross-variable and temporal dependence are particularly challenging to detect, as they are often less pronounced than shifts in marginal statistics such as the mean or variance. Existing methods detect changes using reconstruction error, which provides only an indirect measure of dynamical change, or rely on scalar functionals that may be too coarse to capture global structure. We introduce CHASM, an online nonparametric method that monitors the truncated eigenvalue sequence of the recursively estimated dynamic mode decomposition operator. Designing such an approach raises two challenges: the permutation invariance of eigendecompositions, resolved via optimal linear assignment, and the lack of online changepoint methods for multivariate complex-valued time series, addressed through a novel augmented monitoring scheme. We study the theoretical properties of the dynamics estimator under the canonical vector autoregressive model, which directly motivates our algorithmic design. The proposed method achieves competitive or superior performance to modern competitors across synthetic and real-world data sets, including challenging settings in video and text data. It is unsupervised, depends on a small number of interpretable parameters, and requires no distributional assumptions beyond finite moments, making it readily deployable across scientific domains.
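The permutation issue mentioned in the abstract can be illustrated with a small sketch. Eigenvalues computed at consecutive time steps come back in arbitrary order, so they have to be matched before their trajectories can be monitored. The Python below aligns two short lists of complex eigenvalues by minimizing the total absolute difference. It uses brute-force search over permutations as a stand-in for a proper linear-assignment solver, and the function name, cost, and example values are illustrative assumptions rather than CHASM's implementation.

```python
from itertools import permutations


def match_eigenvalues(previous, current):
    """Reorder `current` so each entry lines up with its counterpart in `previous`.

    Brute force over all permutations: only sensible for a handful of
    eigenvalues. A linear-assignment solver would replace this at scale.
    """
    if len(previous) != len(current):
        raise ValueError("both eigenvalue lists must have the same length")
    best_order, best_cost = None, float("inf")
    for order in permutations(range(len(current))):
        # Total mismatch if previous[i] is paired with current[order[i]].
        cost = sum(abs(previous[i] - current[j]) for i, j in enumerate(order))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return [current[j] for j in best_order]


# Hypothetical spectra: the second list is the first one shuffled and perturbed.
previous = [0.9 + 0.1j, 0.9 - 0.1j, 0.5 + 0.0j]
current = [0.52 + 0.0j, 0.88 - 0.12j, 0.91 + 0.09j]
aligned = match_eigenvalues(previous, current)
```

After alignment, `aligned[i]` is the continuation of `previous[i]`, so each position can be tracked as one trajectory over time.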

2605.07834 2026-05-11 stat.ME stat.AP

GenAI Powered Dynamic Causal Inference with Unstructured Data

Kentaro Nakamura, Kosuke Imai

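AI summary This paper studies how to use generative artificial intelligence (GenAI) models for dynamic causal inference with unstructured data such as text, images, and video. The authors propose a statistical framework that extracts internal representations from a GenAI model and uses a neural network architecture that jointly learns deconfounders to estimate the causal effects of sequences of treatment features. The method yields valid asymptotic confidence intervals, and it is evaluated in simulation studies and in a randomized experiment on the Hong Kong protests.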

Details
English abstract

A growing number of scholars seek to estimate causal effects of unstructured data such as text, images, and video. However, existing methods typically treat each object as a single, static observation. We develop a statistical framework for dynamic causal inference with unstructured data by leveraging generative artificial intelligence (GenAI) models. Our approach enables researchers to estimate the causal effects of sequences of treatment features, including their positions within text and video. We first extract internal representations of unstructured objects from a GenAI model and then estimate a marginal structural model using a neural network architecture that jointly learns a deconfounder for each treatment feature in the sequence. Our semiparametric inference framework yields valid asymptotic confidence intervals. Simulation studies demonstrate that the proposed estimator recovers the target causal effects and that the confidence intervals achieve nominal coverage in finite samples. We further apply our method to a randomized experiment on the Hong Kong protests, showing that the effect of a treatment feature depends critically on its position within the text.

2605.07829 2026-05-11 stat.ME math.PR

Parametric ROC Analysis and Optimal Cutoff Selection under Scale Mixtures of Skew-Normal Distributions: A Decision-Theoretic Framework with Asymptotic Inference

Renato de Paula, Helena Mouriño, Tiago Dias Domingues

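AI summary This paper studies optimal threshold selection for continuous biomarkers in binary classification and proposes a parametric ROC analysis framework based on scale mixtures of skew-normal (SMSN) distributions. The method accounts for disease prevalence and asymmetric misclassification costs, defines the optimal threshold as the minimizer of a weighted misclassification risk, and proves its existence, uniqueness, and global optimality under a monotone likelihood ratio condition. In an application, the method substantially reduces estimated misclassification risk, most notably in asymmetric decision settings.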

Comments 42 pages, 3 figures

Details
English abstract

We study an optimal threshold functional arising in binary classification for continuous biomarkers. While the ROC curve summarizes discriminatory performance across all thresholds, practical threshold selection must also account for disease prevalence and asymmetric misclassification costs. The classical Youden index corresponds to a symmetric special case and may therefore be suboptimal in realistic decision settings. In addition, biomarker distributions in serological and immunological studies often display skewness and heavy tails, making Gaussian ROC models inadequate. We develop a parametric framework for ROC analysis and optimal cutoff selection under the family of scale mixtures of skew-normal (SMSN) distributions, including the skew-normal and skew-t models. The ROC curve and AUC are estimated by plug-in maximum likelihood from separate group fits. The optimal cutoff is defined as the minimiser of a weighted misclassification risk, which yields a likelihood ratio equation extending the Youden criterion. Under a monotone likelihood ratio condition, we establish existence, uniqueness, and global optimality of the cutoff. We further study its local regularity as an implicitly defined functional of the model parameter and derive consistency, asymptotic normality, and a closed-form plug-in variance estimator. A central term in this variance is the local slope of the estimating equation at the optimal threshold, which acts as a local identifiability diagnostic. Monte Carlo experiments across six scenarios show that the asymptotic approximation is accurate and that Wald confidence intervals attain near nominal coverage. An application to SARS-CoV-2 serological data illustrates that the proposed cutoff can differ substantially from the Youden threshold and may reduce estimated misclassification risk by up to 63% under asymmetric decision settings.
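To make the weighted-risk cutoff concrete, here is a minimal sketch under a deliberately simplified assumption: both groups are Gaussian with equal variance, which is exactly the case the abstract calls inadequate for skewed biomarkers, so it only illustrates the decision rule, not the SMSN model. The function names, the grid search, and all numeric values are hypothetical. With equal variances, the likelihood ratio equation has a closed-form solution that the numerical minimizer can be checked against.

```python
import math


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def weighted_risk(cutoff, prevalence, cost_fn, cost_fp, healthy, diseased):
    """Expected cost when 'diseased' is declared for biomarker values above `cutoff`."""
    false_negative_rate = normal_cdf(cutoff, *diseased)       # diseased below cutoff
    false_positive_rate = 1.0 - normal_cdf(cutoff, *healthy)  # healthy above cutoff
    return (prevalence * cost_fn * false_negative_rate
            + (1.0 - prevalence) * cost_fp * false_positive_rate)


def best_cutoff(prevalence, cost_fn, cost_fp, healthy, diseased, lo, hi, steps=20000):
    """Grid search for the risk-minimizing cutoff on [lo, hi]."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(grid, key=lambda c: weighted_risk(
        c, prevalence, cost_fn, cost_fp, healthy, diseased))


# (mean, sd) for each group: hypothetical values.
healthy, diseased = (0.0, 1.0), (2.0, 1.0)

# Symmetric weights reproduce the Youden-type cutoff (densities cross at 1.0).
symmetric = best_cutoff(0.5, 1.0, 1.0, healthy, diseased, -2.0, 4.0)

# Low prevalence with costly false negatives moves the cutoff.
asymmetric = best_cutoff(0.1, 5.0, 1.0, healthy, diseased, -2.0, 4.0)

# Closed form for this Gaussian case: f1(c) / f0(c) = (1 - p) * C_FP / (p * C_FN).
closed_form = 1.0 + math.log((0.9 * 1.0) / (0.1 * 5.0)) / 2.0
```

In this toy setting the asymmetric cutoff sits at about 1.29 instead of 1.0, which shows how prevalence and costs shift the threshold away from the symmetric special case.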

2605.07775 2026-05-11 cs.LG cs.AI stat.ML

POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles

Nicolas Menet, Andreas Krause, Abbas Rahimi

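AI summary POETS is an uncertainty-aware optimization framework for large language models based on policy ensembles, aimed at balancing exploration and exploitation in sequential decision-making and black-box optimization. It uses the fact that policies implicitly encode reward functions and trains a policy ensemble directly, avoiding the complex training of a conventional uncertainty-aware reward model. An efficient architecture with a shared pre-trained backbone and independent low-rank adaptation branches substantially reduces compute and memory costs. The theoretical analysis shows that POETS performs KL-regularized Thompson sampling with strong cumulative regret bounds, and experiments show leading sample efficiency and optimization performance on scientific discovery tasks such as protein search and quantum circuit design.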

Comments preprint

Details
English abstract

Balancing exploration and exploitation is a core challenge in sequential decision-making and black-box optimization. We introduce POETS ($\textbf{Po}$licy $\textbf{E}$nsembles for $\textbf{T}$hompson $\textbf{S}$ampling), a novel framework that bridges uncertainty quantification and policy optimization. Our approach is grounded in the insight that policies trained with Kullback-Leibler (KL) regularization implicitly encode an underlying reward function. Building on this, POETS bypasses the complex, nested process of training an uncertainty-aware reward model and separately fitting a policy to this model. Instead, we directly train a policy ensemble to capture epistemic uncertainty by matching implicitly encoded reward functions to online, bootstrapped data. To overcome the prohibitive compute and memory constraints of ensembling Large Language Models (LLMs), POETS utilizes an efficient architecture: the ensemble shares a pre-trained backbone while maintaining diversity through independent Low-Rank Adaptation (LoRA) branches. Theoretically, we prove that POETS implicitly conducts KL-regularized Thompson sampling and thus inherits strong cumulative regret bounds of ${\mathcal O}(\sqrt{T γ_T})$. Empirically, we demonstrate that POETS achieves state-of-the-art sample efficiency across diverse scientific discovery domains, including protein search and quantum circuit design. Furthermore, it improves the optimization trajectories of reinforcement learning, proving particularly robust in off-policy settings with experience replay or in small dataset regimes.
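The statement that KL-regularized policies implicitly encode a reward is consistent with a standard identity for KL-regularized objectives, sketched below. The symbols $\pi_{\mathrm{ref}}$ (reference policy), $\beta$ (regularization strength), and $Z$ (normalizing constant) are notation introduced here for illustration and do not come from the abstract.

```latex
\pi^\star
  = \arg\max_{\pi}\; \mathbb{E}_{y\sim\pi}\big[r(y)\big]
    - \beta\,\mathrm{KL}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big)
\quad\Longrightarrow\quad
\pi^\star(y) = \frac{1}{Z}\,\pi_{\mathrm{ref}}(y)\,\exp\!\big(r(y)/\beta\big),
\qquad
r(y) = \beta \log\frac{\pi^\star(y)}{\pi_{\mathrm{ref}}(y)} + \beta \log Z .
```

Read right to left, the last equation recovers the reward from the policy up to the additive constant $\beta \log Z$, which is the sense in which a trained policy can stand in for a reward model.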

2605.07746 2026-05-11 stat.ML cs.LG q-bio.QM

Flow Matching for Count Data

Ganchao Wei, John Pearson

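AI summary This paper studies generative modeling of high-dimensional count data, such as single-cell RNA sequencing and neural spike trains, and proposes count-FM, a flow-matching framework based on a continuous-time birth-death process. The method learns marginal transition rates in count space through simulation-free training, enabling efficient generation and transport between arbitrary count-distributed sources and targets. Experiments show that count-FM improves on existing methods in sample quality, modeling efficiency, and path interpretability, and that it applies to unconditional generation, transport, and conditional generation.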

Details
English abstract

High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mappings between distributions across successive batches or time points form critical components of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.

2605.07720 2026-05-11 stat.ML astro-ph.CO math.AT

TopoFisher: Learning Topological Summary Statistics by Maximizing Fisher Information

Matteo Biagetti, Mathieu Carrière, Francesco Conti, Enrico Maria Ferrari, Sven Heydenreich, Karthik Viswanathan

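AI summary TopoFisher is a differentiable persistent-homology method that learns stable, interpretable summaries of geometric and topological structure by maximizing Fisher information. It removes the need for hand-designed filtrations and compressors by optimizing trainable parameters while retaining a topological inductive bias, so that the summaries are tied to parameter uncertainty. Experiments on high-dimensional non-Gaussian cosmological problems such as weak gravitational lensing show that TopoFisher reaches higher Fisher information than existing summaries, approaches a neural baseline with far fewer parameters, and performs better in generalization and posterior estimation.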

Comments 10+21 pages, 3 figures

Details
English abstract

Persistence diagrams provide stable, interpretable summaries of geometric and topological structure and are useful for simulation-based inference when low-order statistics miss key information. Yet persistence-based pipelines require hand-chosen filtrations, vectorizations, and compressors, typically without an objective tied to parameter uncertainty. We introduce \textbf{TopoFisher}, a differentiable persistent-homology pipeline that learns topological summaries by maximizing local Gaussian Fisher information. Using simulations near a fiducial parameter, TopoFisher optimizes trainable filtrations, diagram vectorizations, and compressors without posterior samples or supervised regression targets, while retaining stable topological inductive bias. We also give sufficient regularity conditions for the log-determinant Fisher loss to be locally Lipschitz in trainable parameters. Controlled experiments on noisy spirals and Gaussian random fields, where total Fisher information is known, show that TopoFisher recovers much of the available information and outperforms fixed topological vectorizations. Our main results are on weak gravitational lensing, a high-dimensional non-Gaussian cosmological field-inference problem. Learned topological summaries reach higher Fisher information than state-of-the-art cosmological summaries and approach an unconstrained Information Maximising Neural Network baseline with up to $\sim80\times$ fewer parameters. The learned filtrations also generalize better: under simulator shift from lognormal to LPT-based maps they retain most Fisher information, while the neural baseline drops, and in neural posterior estimation they give tighter constraints than both the neural baseline and state-of-the-art cosmological summaries. These results support Fisher-based topological optimization as a robust, parameter-efficient front end for simulation-based inference.

2605.07665 2026-05-11 stat.ML cs.LG

Debiased Counterfactual Generation via Flow Matching from Observations

Hugh Dance, Johnny Xi, Peter Orbanz, Benjamin Bloem-Reddy

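AI summary This paper studies the estimation of counterfactual distributions under interventions and proposes a deconfounding flow-matching method that starts from observational data, exploiting the close link between observational and counterfactual distributions to improve counterfactual generation. The method is built on a flow-matching framework with a semiparametrically efficient estimator, extends to minimal-energy flows in high dimensions, and mitigates the bias and known failure modes of existing methods.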

Details
English abstract

Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close under weak confounding, and share any features of high-dimensional outcomes which are invariant to confounders. These properties motivate learning counterfactual distributions not from scratch, but via a deconfounding flow from the observational distribution. We formulate this problem via flow-matching and derive a semiparametrically efficient estimator based on a novel efficient influence function correction. We subsequently extend our estimator to target minimal-energy flows in high dimensions, which we show can be especially simple targets between observational and counterfactual distributions. In experiments, deconfounding flows outperform existing debiased counterfactual distribution estimators, while also mitigating known failure modes of flow-based methods.

2605.07654 2026-05-11 stat.ML cs.CL cs.LG

Reliable Chain-of-Thought via Prefix Consistency

Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama

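AI summary This study proposes prefix consistency, a new method for improving the reliability of large language models on reasoning tasks. Observing that chains of thought with correct answers are more likely to reproduce their answer when truncated and regenerated, the authors use this property as a reliability signal to weight candidate answers. Experiments on several math and science benchmarks show that the method reaches the accuracy of majority voting with substantially fewer tokens.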

Comments See our project page at https://naoto-iwase.github.io/prefix-consistency-page

Details
English abstract

Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, that weights each candidate answer by how often it reappears under regeneration. It requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches Standard MV plateau accuracy at up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.
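The reweighting idea can be sketched in a few lines. The abstract says only that each candidate answer is weighted by how often it reappears under regeneration, so the scoring rule below (each trace contributes its reproduction rate to its own answer) is an assumption made for illustration, as are the function name and the toy data.

```python
from collections import defaultdict


def prefix_consistency_vote(traces):
    """Pick the answer with the largest total reproduction rate.

    `traces` is a list of (answer, regenerated_answers) pairs, where
    `regenerated_answers` holds the answers obtained after truncating
    that trace and regenerating the remainder.
    """
    scores = defaultdict(float)
    for answer, regenerated in traces:
        if regenerated:
            # Fraction of regenerations that reproduce the original answer.
            weight = sum(1 for r in regenerated if r == answer) / len(regenerated)
        else:
            weight = 0.0
        scores[answer] += weight
    return max(scores, key=scores.get), dict(scores)


# Toy data: "17" wins a plain majority vote (2 traces vs 1) but is unstable.
traces = [
    ("42", ["42", "42", "42", "42"]),  # stable under regeneration
    ("17", ["17", "42", "9", "42"]),   # mostly drifts to other answers
    ("17", ["3", "17", "42", "42"]),
]
winner, scores = prefix_consistency_vote(traces)
```

In this toy case the weighted vote selects "42" even though "17" has more raw votes, which is the kind of correction the reliability signal is meant to provide.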

2605.07634 2026-05-11 math.OC cs.LG math.ST stat.TH

Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling

Manojlo Vukovic, Dusan Jakovetic

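AI summary This paper studies robust stochastic first-order optimization under heavy-tailed noise and proposes a new stochastic gradient descent algorithm, R-SGD-Mini, based on medoid gradient sampling. The method splits the data batch into several chunks, computes the gradient of each chunk, and updates the parameters along the medoid of the chunk gradients, which reduces the influence of noise. The theoretical analysis shows that, in the non-convex setting, the algorithm converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero and attains the rate $\mathcal{O}(T^{-1/2})$ when the time horizon is known in advance. Experiments also indicate favorable performance relative to standard methods.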

Details
English abstract

We consider a first order stochastic optimization framework where, at each iteration, $K$ independent identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variances. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In this paper, we take an alternative approach and propose a novel stochastic first order method dubbed Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling, R-SGD-Mini for short. The core idea of R-SGD-Mini is to split the $K$-sized data batch into $M$ distinct data chunks, form the stochastic gradient for each chunk, and update the solution estimate along the stochastic gradient direction of the chunk that is the medoid of the gradients of all data chunks. Under a general class of symmetric heavy-tailed gradient noises and a standard non-convex setting, we establish explicit bounds on the expected time-averaged squared gradient norm. More precisely, we show that the latter quantity converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero; we explicitly characterize this neighborhood in terms of the noise and algorithm parameters. Moreover, if the time horizon is known in advance, we establish the rate of $\mathcal{O}(T^{-\frac{1}{2}})$. Furthermore, when clipping is incorporated, we obtain convergence guarantees in the high-probability sense and recover the same rate. Experimental results indicate that R-SGD-Mini and its clipped variant consistently perform favorably compared to SGD, clipped SGD and Median-of-Means based methods.
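The chunk-and-medoid step can be sketched as follows. This is a minimal illustration, assuming the medoid is taken with respect to Euclidean distance and that the per-chunk gradients are already available as plain lists. The function names, step size, and gradient values are hypothetical, and the sketch omits the clipped variant.

```python
import math


def medoid_index(vectors):
    """Index of the vector with the smallest total Euclidean distance to the others."""
    def total_distance(i):
        return sum(math.dist(vectors[i], v) for v in vectors)
    return min(range(len(vectors)), key=total_distance)


def medoid_sgd_step(x, chunk_gradients, step_size):
    """One update along the medoid of the per-chunk stochastic gradients."""
    g = chunk_gradients[medoid_index(chunk_gradients)]
    return [xi - step_size * gi for xi, gi in zip(x, g)]


# Four hypothetical chunk gradients; the last one is a heavy-tailed outlier.
chunk_gradients = [[1.0, 2.0], [2.0, 2.0], [0.0, 2.0], [1.0, 30.0]]
chosen = medoid_index(chunk_gradients)
new_x = medoid_sgd_step([0.0, 0.0], chunk_gradients, step_size=0.1)
```

Unlike an average, the medoid is always one of the observed chunk gradients, so a single extreme chunk cannot drag the update direction arbitrarily far.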

2605.07625 2026-05-11 math.ST stat.ML stat.TH

Statistical Convergence of Spherical First Hitting Diffusion Models

Simon Bienewald, Lukas Trottner

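AI summary This paper studies the statistical convergence, in total variation, of first hitting diffusion models (FHDMs) for spherically supported Sobolev-smooth data distributions. FHDMs are denoising diffusion models with a random generation time, designed to generate data on a known manifold. The authors prove that FHDMs achieve the minimax optimal convergence rate up to logarithmic factors, which is the first statistical optimality result for denoising diffusion models with random generation time.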

Details
English abstract

Denoising diffusion models have evolved into a state-of-the-art method for tasks in various fields, such as denoising and generation of images, text generation, or generation of synthetic data for training of other machine learning models. First hitting diffusion models (FHDM) are a particular class of denoising diffusion models with \textit{random} adaptive generation time tailored to generate data on a known manifold. Building on the conditioning framework of Doob's $h$-transform, these models leverage the given information on the target data manifold to demonstrate strong performance across tasks while offering distinct features such as time-homogeneous dynamics of the generating process and a reduced average simulation time. Even though the theoretical investigation of standard forward-backward diffusion models has attracted much attention in the recent past, the statistical convergence properties of FHDMs are not yet understood. In this work, we show that, up to logarithmic factors, FHDMs achieve the minimax optimal convergence rate in total variation for spherically supported Sobolev smooth data distributions. In particular, this is the first statistical optimality result for denoising diffusion modelling with random generation time.

2605.07620 2026-05-11 stat.ME

Operationalizing Allocation Probability Tests: Practical Guidance on Optimized Implementation for Power and Robustness

Stina Zetterstrom, David S. Robertson, Thomas Jaki, Sofía S. Villar

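AI summary This paper addresses the optimized practical implementation of allocation probability (AP) tests for response-adaptive clinical trials. By optimizing how the allocation probabilities enter the test statistic, the study increases the power of the test and extends the method to survival endpoints (exponential distributions). It also proposes a rigorous strategy for selecting the null hypothesis to calibrate the type I error rate. Simulations show that the optimized AP test substantially outperforms traditional frequentist tests without sacrificing the design's patient outcome goals.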

Details
English abstract

Recently, a new testing approach for response-adaptive clinical trials was proposed based on the allocation probabilities (AP) rather than the outcome data. While original work on the AP test focused on binary and normal endpoints and demonstrated that significant efficiency gains are possible, many critical questions remain open regarding its practical implementation and upper limits. In this work, rather than simply proposing novel statistics, we seek to understand the maximum gain that can be obtained with the AP test by optimizing how these probabilities are used to define the test statistic. We expand the method's practical utility by applying it to survival endpoints (exponential distributions) and introducing a rigorous strategy for selecting the null hypothesis to properly calibrate type I error. Our simulation studies reveal that by optimizing the functional form of the AP test, investigators can achieve a substantial increase in power, approaching the theoretical maximum, without sacrificing the patient outcome goals of the design. Furthermore, we explicitly compare the method to a standard Bayesian decision rule, finding that the optimized AP test significantly outperforms traditional frequentist tests while maintaining strict error control. This work provides a missing practical framework for implementing robust and optimized AP tests in complex response-adaptive settings.

2605.07588 2026-05-11 cs.LG cs.AI stat.ML

Revisiting Transformer Layer Parameterization Through Causal Energy Minimization

Jin Xu, Camille Couturier, Victor Rühle, Saravan Rajmohan, James Hensman

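AI summary This paper proposes Causal Energy Minimization (CEM), a framework for revisiting the parameterization of Transformer layers. By treating Transformer layers as optimization steps on conditional energy functions, CEM explains the parameterization of modules such as multi-head attention and gated MLPs from an energy perspective and identifies a design space that includes weight sharing, low-rank interactions, and recursive updates. Experiments show that CEM-derived layers train stably despite their constrained parameterizations and can match conventional Transformer baselines, offering a new lens for understanding and improving Transformer architectures.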

Details
English abstract

Transformer blocks typically combine multi-head attention (MHA) for token mixing with gated MLPs for token-wise feature transformation, yet many choices in their parameterization remain largely empirical. We introduce Causal Energy Minimization (CEM), a framework that recasts Transformer layers as optimization steps on conditional energy functions while explicitly accounting for layer parameterization. Extending prior energy-based interpretations of attention, CEM shows that weight-tied MHA can be derived as a gradient update on an interaction energy, and that a gated MLP with shared up/down projections can be viewed through an element-wise energy. This perspective identifies a design space for Transformer layers that includes within-layer weight sharing, diagonal-plus-low-rank interactions, lightweight preconditioners, and recursive updates. We evaluate CEM-derived layers in language-modeling experiments at the moderate hundred-million-parameter scale. Despite their constrained parameterizations, these layers train stably and can match corresponding Transformer baselines. Overall, our results suggest that CEM provides a useful lens for understanding Transformer layer parameterization, connecting Transformer architectures to energy-based models and motivating further exploration of energy-guided layer designs.

2605.07572 2026-05-11 cs.AI stat.ML

Open-Ended Task Discovery via Bayesian Optimization

Masaki Adachi, Yuta Suzuki, Juliusz Ziomek

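AI summary This paper proposes Generate-Select-Refine (GSR), an open-ended task discovery framework that alternates between task generation and task optimization to handle uncertainty about the task itself in scientific workflows. Starting from a user-provided seed task, the method generates and optimizes new tasks step by step and eventually concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task Bayesian optimization. Experiments show that GSR outperforms existing LLM-based optimizers on new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing.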

Comments 60 pages, 11 figures

Details
English abstract

When applying Bayesian optimization (BO) to scientific workflows, a major yet often overlooked source of uncertainty is the task itself -- namely, what to optimize and how to evaluate it -- which can evolve as evidence accumulates. We introduce Generate-Select-Refine (GSR), an open-ended BO framework that alternates between task generation and task optimization. Starting from a user-provided seed task, GSR generates new tasks in a coarse-to-fine manner while a task-acquisition function schedules optimization. Asymptotically, it concentrates evaluations on the best task, incurring only logarithmic regret overhead relative to single-task BO. We apply GSR to new product development, chemical synthesis scaling, algorithm analysis, and patent repurposing, where it outperforms existing LLM-based optimizers.

2605.07565 2026-05-11 cs.LG cs.AI stat.ML

Ensemble Distributionally Robust Bayesian Optimisation

Tigran Ramazyan, Denis Derkach

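AI summary This paper studies zeroth-order optimisation under context distributional uncertainty and proposes an ensemble-based distributionally robust Bayesian optimisation algorithm. Using an ensemble as the surrogate model makes the method more robust to complex and noisy data while it handles continuous contexts and remains computationally tractable. The theoretical analysis gives sublinear regret bounds that improve on current state-of-the-art results, and experiments show empirical behaviour consistent with the theoretical guarantees.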

Details
English abstract

We study zeroth-order optimisation under context distributional uncertainty, a setting commonly tackled using Bayesian optimisation (BO). A prevailing strategy to make BO more robust to the complex and noisy nature of data is to employ an ensemble as the surrogate model, thereby mitigating the weaknesses of any single model. In this study, we propose a novel algorithm for Ensemble Distributionally Robust Bayesian Optimisation that remains computationally tractable while managing continuous context. We obtain theoretical sublinear regret bounds, improving current state-of-the-art results. We show that our method's empirical behaviour aligns with its theoretical guarantees.

2605.07554 2026-05-11 cs.LG cs.AI q-bio.BM stat.ML

ProteinJEPA: Latent prediction complements protein language models

Dan Ofer, Dafna Shahaf, Michal Linial

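AI summary This paper examines whether adding latent-space prediction (JEPA) improves protein language models, comparing it with standard masked language modeling (MLM) under the same wall-clock training budget. The best recipe predicts latent targets only at masked positions while retaining the MLM cross-entropy loss (masked-position MLM+JEPA). On pretrained encoders it wins more downstream tasks than it loses against MLM-only continuation, whereas results for training from scratch are mixed and JEPA-only collapses in nearly every experiment. Gains appear on multiple downstream tasks, including protein stability prediction, enzyme classification, and fold retrieval.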

Details
English abstract

Protein language models are trained primarily with masked language modeling (MLM), which predicts amino-acid identities at masked positions. We ask whether latent-space prediction can complement these token-level objectives under matched wall-clock budget. Across pretrained and random-init protein sequence encoders at 35--150M parameters, we find that the best protein-JEPA design is not all-position latent prediction but a variant: predicting latent targets only at masked positions, and retaining the MLM cross-entropy. We call this recipe masked-position MLM+JEPA. On a 16-task downstream suite (15 frozen linear probes plus SCOPe-40 zero-shot fold retrieval), under matched wall-clock budgets, this recipe wins more tasks than it loses against MLM-only continuation: 10 wins / 3 losses / 3 ties (hereafter W/L/T) on pretrained ESM2-35M and 11/2/3 on ESM2-150M, while results in pretraining from scratch are mixed (6/8/2). Gains are seen for multiple models on 11 of 16 tasks, including stability, β-lactamase fitness, variant effect, intrinsic disorder, remote homology, enzyme classification, and SCOPe-40 fold retrieval. Tasks with more losses than wins are Fluorescence (TAPE) and Peptide-HLA Binding. All-position MLM+JEPA matches MLM-only overall but does not reproduce the masked-position gains. JEPA-only (no MLM) collapses in nearly every experiment. We conclude that JEPA, when combined with MLM, is competitive and can outperform pure MLM in pretraining and continued training, even under matched wall-clock budgets.

2605.06474 2026-05-11 cs.LG cs.AI stat.ML

Q-MMR: Off-Policy Evaluation via Recursive Reweighting and Moment Matching

Xiang Li, Nan Jiang

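AI summary This paper proposes Q-MMR, a new theoretical framework for off-policy evaluation in finite-horizon Markov decision processes. Through recursive reweighting and moment matching, the method learns a set of scalar weights so that the reweighted rewards approximate the expected return under the target policy, and it establishes a data-dependent finite-sample guarantee that does not depend on the statistical complexity of the function class. The study also sheds light on the nature of coverage in offline reinforcement learning and draws connections to existing methods such as importance sampling and linear FQE.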

Details
English abstract

We present a novel theoretical framework, Q-MMR, for off-policy evaluation in finite-horizon MDPs. Q-MMR learns a set of scalar weights, one for each data point, such that the reweighted rewards approximate the expected return under the target policy. The weights are learned inductively in a top-down manner via a moment matching objective against a value-function discriminator class. Notably, and perhaps surprisingly, a data-dependent finite-sample guarantee for general function approximation can be established under only the realizability of $Q^π$, with a dimension-free bound -- that is, the error does not depend on the statistical complexity of the function class. We also establish connections to several existing methods, such as importance sampling and linear FQE. Further theoretical analyses shed new light on the nature of coverage, a concept of fundamental importance to offline RL.

2605.05099 2026-05-11 stat.AP cs.MS

Randompack: Cross-Platform Reproducible Random Number Generation and Distribution Sampling

Kristján Jónasson

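AI summary This paper presents Randompack, a C library for random number generation that supports several modern random number generators and sampling from a range of continuous distributions, with cross-platform reproducibility: the same seeds give consistent results across programming languages, hardware, and compilers. The library separates the random number engines from the distribution layer so that they can be combined freely, is faster overall than competing libraries in the reported benchmarks, and provides comprehensive support for parallel simulation along with interfaces to several languages.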

Comments 19 pages

Details
English abstract

A C library for random number generation, Randompack, is presented. The library implements several modern random number generators (engines), including xoshiro256, PCG64, Philox, ranlux++, and sfc64; 14 continuous distributions including uniform, normal, exponential, gamma, beta, and multivariate normal; raw bit streams, bounded integers, permutations, and sampling without replacement. The engine and the distribution layers are separated so any engine can be used with any distribution. Benchmarks show that Randompack is faster overall than competing libraries, with speedup factors ranging from about 1 to 15 depending on engine, distribution, interface, and platform. A distinguishing feature is reproducibility: with the same seeds Randompack gives compatible results across programming languages, computers, CPU architectures, and compilers. The library includes comprehensive support for parallel simulation. It is accompanied by a comprehensive test suite, benchmarking programs, and example programs. Interfaces to Fortran, Python, Julia, and R have been implemented; their benchmark results are included, although their design and implementation are otherwise outside the scope of the article. Unlike other available C libraries with comparable scope, Randompack is permissively licensed under the MIT license, and it is open source and publicly available through GitHub and conda-forge.

2605.01288 2026-05-11 cs.LG cond-mat.dis-nn stat.ML

A Theory of Saddle Escape in Deep Nonlinear Networks

Divit Rawal, Michael R. DeWeese

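AI summary This paper studies the long plateaus and sharp feature-acquisition transitions that appear when training deep nonlinear networks from small initialization. By deriving an identity for the imbalance of Frobenius norms of the weight matrices that holds for any smooth activation and any differentiable loss, the authors classify activation functions into four universality classes. On the symmetric submanifold they reduce the matrix flow to a scalar ODE, obtaining a critical-depth escape time law governed by the number of bottleneck layers. The theory agrees closely with numerical simulations and highlights the key role of bottleneck structure in escape time.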

Details
English abstract

In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universality classes. On the permutation-symmetric submanifold, the identity combines with an approximate balance law to reduce the full matrix flow to a scalar ODE, giving a critical-depth escape time law $τ_\star = Θ(\varepsilon^{-(r-2)})$ governed by the number $r$ of layers at the bottleneck scale rather than the total depth $L$. We find that this same $r-2$ exponent is recovered under He-normal initialization with $r$ bottleneck layers rescaled by $\varepsilon$, where the symmetry manifold is preserved by the flow but not attracting. We find close agreement between our theory and numerical simulations.

2604.25826 2026-05-11 econ.GN q-fin.EC stat.AP

General-Purpose Technology and Speculative Bubble Detection

Haiqiang Chen, Li Chen, Difang Huang, Yuexin Li, Zhengjun Zhang

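AI summary This paper studies how general-purpose technology adoption affects the detection of asset-price bubbles and shows that the leading bubble test suffers severe size distortion once technology shocks are taken into account. Embedding a hump-shaped technology shock in the Campbell-Shiller present-value model, the authors prove that the fundamental price becomes locally explosive during adoption, which contaminates the test's limit distribution. They propose decomposing prices into fundamental and speculative components; empirically, the decomposition eliminates evidence of speculation in the 2020-2025 AI rally while confirming a speculative peak from December 1999 to March 2000 in the dot-com episode.

Details
English abstract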

We show that the leading bubble test suffers severe size distortion when fundamentals incorporate general-purpose technology adoption. Embedding a hump-shaped technology shock in the Campbell-Shiller present-value model, we prove that the fundamental price becomes locally explosive during adoption, contaminating the test's limit distribution with a non-centrality parameter proportional to the shock's peak. We propose a fundamental-versus-speculative decomposition that projects prices onto observable technology proxies and applies the test to the residual. Empirically, the decomposition eliminates evidence of speculation in the 2020-2025 AI rally while confirming a speculative peak confined to December 1999-March 2000 in the dot-com episode.

2604.04891 2026-05-11 math.OC cs.AI stat.ML

Muon Dynamics as a Spectral Wasserstein Flow

Gabriel Peyré

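AI summary This paper studies the continuous-time dynamics of gradient normalization in deep learning and introduces Wasserstein distances built on spectral norms to describe the evolution of probability measures on parameter space. By introducing Spectral Wasserstein distances indexed by matrix norms, it interprets normalized training as a gradient flow and establishes links to a Benamou-Brenier formulation. Contributions include a static Kantorovich formulation, a robust-cost representation, Gaussian reductions, and numerical validation on several models, offering a new geometric perspective on normalized training.

Details
English abstract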

Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanishing-momentum version of this idea in the mean-field regime, where wide models are represented by probability measures on parameter space. Starting from normalized matrix flows, we introduce Spectral Wasserstein distances indexed by norms $γ$ on positive semidefinite matrices: the trace norm gives classical $W_2$, the operator norm gives the Muon geometry, and Schatten norms interpolate between them. We develop the static Kantorovich formulation, a max-min robust-cost representation, and Gaussian reductions extending the Bures formula, and, for monotone norms, prove equivalence with a Benamou--Brenier formulation. This yields a gradient-flow interpretation of the mean-field normalized training dynamics. We illustrate these findings by numerical experiments on MMD flows, Gaussian reductions, two-layer ReLU models, and shallow attention.
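
For orientation, the classical Bures formula that the Gaussian reductions are said to extend can be computed directly. The sketch below covers only the standard $W_2$ case (the trace-norm end of the family) between two Gaussians; it does not implement the Spectral Wasserstein distances of the paper, and the function names are illustrative.

```python
import numpy as np


def psd_sqrt(a: np.ndarray) -> np.ndarray:
    """Symmetric PSD square root via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def bures_w2_squared(m1, s1, m2, s2) -> float:
    """Squared W_2 distance between N(m1, s1) and N(m2, s2).

    W_2^2 = |m1 - m2|^2 + tr(s1 + s2 - 2 (s1^{1/2} s2 s1^{1/2})^{1/2})
    """
    r = psd_sqrt(s1)
    cross = psd_sqrt(r @ s2 @ r)
    return float(np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross))


if __name__ == "__main__":
    zero = np.zeros(2)
    print(bures_w2_squared(zero, np.eye(2), zero, 4.0 * np.eye(2)))
```

With equal covariances the covariance term vanishes and the distance reduces to the squared Euclidean distance between the means.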

2603.09742 2026-05-11 cs.LG math.DS stat.ML

Upper Generalization Bounds for Neural Oscillators

Zifeng Huang, Konstantin M. Zuev, Yong Xia, Michael Beer

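AI summary This paper studies the generalization ability of neural oscillators, which originate from second-order ordinary differential equations, when learning dynamic mappings of complex nonlinear structural systems. Using the Rademacher complexity framework, it derives upper generalization bounds for approximating causal and uniformly continuous operators between continuous temporal function spaces and for approximating uniformly asymptotically incrementally stable second-order dynamical systems, and extends them to the squared Wasserstein-1 distance between quantities computed from the target operator and from the learned neural oscillator. The analysis shows that estimation errors grow polynomially with MLP size and time length, avoiding the curse of parametric complexity, and that constraining the Lipschitz constants of the MLPs through loss regularization can improve generalization. Numerical experiments confirm the predicted power laws of the errors and the benefit of constraining the MLPs' matrix and vector norms under limited training data.

Comments This manuscript contains 33 pages with 6 figures

Details
English abstract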

Neural oscillators that originate from second-order ordinary differential equations (ODEs) have shown competitive performance in learning mappings between dynamic loads and responses of complex nonlinear structural systems. Despite this empirical success, theoretically quantifying the generalization capacities of their neural network architectures remains undeveloped. In this study, the neural oscillator consisting of a second-order ODE followed by a multilayer perceptron (MLP) is considered. Its upper probably approximately correct (PAC) generalization bound for approximating causal and uniformly continuous operators between continuous temporal function spaces and that for approximating the uniformly asymptotically incrementally stable second-order dynamical systems are derived by leveraging the Rademacher complexity framework. These bounds are further extended to the squared Wasserstein-1 distances between the probability measures of quantities of interest calculated from target causal operators and the corresponding learned neural oscillators. The theoretical results show that the estimation errors grow polynomially with respect to both MLP sizes and the time length, thereby avoiding the curse of parametric complexity. Furthermore, the derived error bounds demonstrate that constraining the Lipschitz constants of the MLPs via loss function regularization can improve the generalization ability of the neural oscillator. Numerical studies considering a Bouc-Wen nonlinear system under stochastic seismic excitation validate the theoretically predicted power laws of the estimation errors with respect to the sample size and time length, and confirm the effectiveness of constraining MLPs' matrix and vector norms in enhancing the performance of the neural oscillator under limited training data.

2603.00041 2026-05-11 cs.LG cs.AI econ.EM stat.ME

Econometric vs. Causal Structure-Learning for Time-Series Policy Decisions: Evidence from the UK COVID-19 Policies

Bruno Petrungaro, Anthony C. Constantinou

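AI summary This paper compares econometric methods and causal structure-learning methods for discovering causal relationships in time-series policy decisions, using UK COVID-19 policy as an empirical case. Four econometric methods and eleven causal ML algorithms are compared on graphical structure, model dimensionality, and ability to recover causal effects. Econometric methods provide clear rules for temporal structure, whereas causal ML algorithms explore a larger space of graph structures and thereby find more identifiable causal relationships. The study offers empirical grounds for causal ML to draw lessons from econometrics and provides code for translating econometric results into a Bayesian network tool.

Details
English abstract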

Causal machine learning (ML) recovers graphical structures that inform us about potential cause-and-effect relationships. Most progress has focused on cross-sectional data with no explicit time order, whereas recovering causal structures from time series data remains the subject of ongoing research in causal ML. In addition to traditional causal ML, this study assesses econometric methods that some argue can recover causal structures from time series data. The use of these methods can be explained by the significant attention the field of econometrics has given to causality, and specifically to time series, over the years. This presents the possibility of comparing the causal discovery performance between econometric and traditional causal ML algorithms. We seek to understand if there are lessons to be incorporated into causal ML from econometrics, and provide code to translate the results of these econometric methods to the most widely used Bayesian Network R library, bnlearn. We investigate the benefits and challenges that these algorithms present in supporting policy decision-making, using the real-world case of COVID-19 in the UK as an example. Four econometric methods are evaluated in terms of graphical structure, model dimensionality, and their ability to recover causal effects, and these results are compared with those of eleven causal ML algorithms. Amongst our main results, we see that econometric methods provide clear rules for temporal structures, whereas causal-ML algorithms offer broader discovery by exploring a larger space of graph structures that tends to lead to denser graphs that capture more identifiable causal relationships.

2602.07390 2026-05-11 stat.ME

Balancing Covariates in Survey Experiments

Pengfei Tian, Jiyang Ren, Yingying Ma

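AI summary How to balance covariates in survey experiments so as to estimate treatment effects more accurately is an important question. This paper proposes a stratified rejective sampling and rerandomization design to improve covariate balance and develops the corresponding design-based asymptotic theory, proving that the average treatment effect estimator is consistent and has a more concentrated asymptotic distribution. It also proposes a covariate adjustment method that further improves estimation efficiency; numerical studies confirm the validity and improved efficiency of the approach.

Details
English abstract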

The survey experiment is widely used in economics and social sciences to evaluate the effects of treatments or programs. In a standard population-based survey experiment, the experimenter randomly draws experimental units from a target population of interest and then randomly assigns the sampled units to treatment or control conditions to explore the treatment effect of an intervention. Simple random sampling and treatment assignment can balance covariates on average. However, covariate imbalance often exists in finite samples. To address the imbalance issue, we study a stratified approach to balance covariates in a survey experiment. A stratified rejective sampling and rerandomization design is further proposed to enhance the covariate balance. We develop a design-based asymptotic theory for the widely used stratified difference-in-means estimator of the average treatment effect under the proposed design. In particular, we show that it is consistent and asymptotically a convolution of a normal distribution and two truncated normal distributions. This limiting distribution is more concentrated at the true average treatment effect than that under the existing experimental designs. Moreover, we propose a covariate adjustment method in the analysis stage, which can further improve the estimation efficiency. Numerical studies demonstrate the validity and improved efficiency of the proposed method.
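
The rerandomization idea can be sketched in its simplest, unstratified form: redraw the treatment assignment until a Mahalanobis measure of covariate imbalance falls below a threshold. This is a minimal illustration of that generic technique, not the stratified rejective sampling and rerandomization design of the paper; the function names, the threshold value, and the simulated covariates are assumptions.

```python
import numpy as np


def mahalanobis_imbalance(x: np.ndarray, treat: np.ndarray) -> float:
    """M = (n1 * n0 / n) * d' S^{-1} d, where d is the difference in covariate means."""
    n = len(treat)
    n1 = int(treat.sum())
    n0 = n - n1
    d = x[treat == 1].mean(axis=0) - x[treat == 0].mean(axis=0)
    s = np.atleast_2d(np.cov(x, rowvar=False))
    return float((n1 * n0 / n) * d @ np.linalg.solve(s, d))


def rerandomize(x, n_treated, threshold, rng, max_draws=10_000):
    """Redraw complete randomizations until the imbalance is acceptable."""
    n = x.shape[0]
    for _ in range(max_draws):
        treat = np.zeros(n, dtype=int)
        treat[rng.choice(n, size=n_treated, replace=False)] = 1
        if mahalanobis_imbalance(x, treat) <= threshold:
            return treat
    raise RuntimeError("no acceptable assignment found; loosen the threshold")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    covariates = rng.standard_normal((100, 3))  # simulated covariates, for illustration
    assignment = rerandomize(covariates, n_treated=50, threshold=1.0, rng=rng)
    print(mahalanobis_imbalance(covariates, assignment))
```

A smaller threshold gives better balance at the cost of more rejected draws; the same accept/reject step could in principle be applied within each stratum.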

2602.04774 2026-05-11 cond-mat.dis-nn cs.LG stat.ML

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

Blake Bordelon, Francesco Mori

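AI summary This paper studies the theory of optimal learning rate schedules in deep learning, analysing a random feature model trained with stochastic gradient descent (SGD) using optimal control theory. It finds that optimal schedules fall into an easy phase and a hard phase, each with a different decay strategy, and shows how jointly optimizing learning rate and batch size affects training efficiency. Experiments indicate that the theory carries over to image classification and language modeling, providing theoretical guidance and a practical reference for learning rate scheduling.

Details
English abstract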

Setting the learning rate (LR) for a deep learning model is a critical part of successful training. Choosing LRs is often done empirically with trial and error. In this work, we explore a solvable model of optimal LR schedules for a power-law random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $η_T^\star(t)$ where $t$ is the current iterate and $T$ is the training horizon. This schedule is computed both as a numerical optimization problem and also analytically using optimal control theory. Our analysis reveals two regimes which we term the easy phase and hard phase. In the easy phase the optimal schedule is a polynomial decay $η_T^\star(t) \simeq T^{-ξ} (1-t/T)^δ$ where $ξ$ and $δ$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant initial LR and annealing performed over a vanishing fraction of training steps. We investigate joint optimization of LR and batch size and find batch ramps can improve the wall-clock time in the easy phase. Beyond SGD, we derive optimal schedules for the momentum parameter $β(t)$ and show that it improves the loss-scaling exponent in the hard phase. We compare our optimal schedule to various benchmarks including (1) optimal constant learning rates $η_T(t) \sim T^{-ξ}$ and (2) optimal power laws $η_T(t) \sim T^{-ξ} t^{-χ}$, finding that our schedule achieves better rates than either of these. Our theory suggests that LR transfer across training horizon depends on the structure of the model and task. For ResNet image classification on CIFAR-5M, the learning curves exhibit hard-phase behavior where optimal base LRs are constant under sufficient annealing. GPT-2 style transformers trained in language modeling exhibit easy-phase behavior where optimal LRs shift even under annealing.
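
The two schedule shapes named in the abstract can be written down as plain functions. The easy-phase form follows the stated expression $T^{-ξ}(1-t/T)^δ$ up to a constant; the hard-phase function is only a caricature, since the linear shape of the final anneal and the `decay_frac` parameter are assumptions, and the values of `xi` and `delta` are placeholders rather than the exponents derived in the paper.

```python
def easy_phase_lr(t: float, T: float, xi: float, delta: float, base: float = 1.0) -> float:
    """Polynomial decay: base * T**(-xi) * (1 - t/T)**delta."""
    return base * T ** (-xi) * (1.0 - t / T) ** delta


def wsd_like_lr(t: float, T: float, peak: float, decay_frac: float = 0.1) -> float:
    """Constant LR followed by a linear anneal over the last decay_frac of training.

    The linear anneal is an illustrative assumption, not the paper's optimal shape.
    """
    decay_start = (1.0 - decay_frac) * T
    if t < decay_start:
        return peak
    return peak * max(0.0, (T - t) / (T - decay_start))


if __name__ == "__main__":
    T = 100
    for t in (0, 50, 95, 100):
        print(t, easy_phase_lr(t, T, xi=0.5, delta=1.0), wsd_like_lr(t, T, peak=0.3))
```

In the easy-phase form the whole schedule shrinks as the horizon `T` grows, whereas in the constant-then-anneal form the base rate does not depend on `T`, which mirrors the LR-transfer contrast the abstract draws between the two regimes.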

2601.21951 2026-05-11 stat.ML cs.LG stat.CO

Diffusion Path Samplers via Sequential Monte Carlo

James Matthew Young, Paula Cordero-Encinar, Sebastian Reich, Andrew Duncan, O. Deniz Akyildiz

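AI summary This paper proposes diffusion-path samplers for target distributions known only up to a normalising constant. It builds a diffusion path from a simple base distribution to the target and uses sequential Monte Carlo to efficiently estimate the scores and densities of the time-varying distributions. To reduce the variance of the score estimates, the authors design practical control variate schedules and apply the framework to several diffusion paths; theoretical analysis and experiments both support the method's effectiveness.

Details
English abstract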

We develop diffusion-based samplers for target distributions known up to a normalising constant. To this end, we rely on the well-known diffusion path that smoothly interpolates between a simple base distribution and the target, popularised by diffusion models. We tackle the score estimation problem by developing an efficient sequential Monte Carlo sampler that evolves auxiliary variables from conditional distributions along the path, providing principled score and density estimates for time-varying distributions. To control the variance of score estimates, we further propose practical control variate schedules that incur minimal overhead. We adapt this general framework to paths induced by the Ornstein-Uhlenbeck (OU) time-reversal process, stochastic interpolants, and diffusion annealed Langevin dynamics, outlining their trade-offs. Finally, we provide theoretical guarantees and empirically demonstrate the effectiveness of our method on several synthetic and real-world datasets.

2601.17621 2026-05-11 stat.ME physics.data-an

Non-parametric finite-sample credible intervals with one-dimensional priors: a middle ground between Bayesian and frequentist intervals

Tim Ritmeester

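AI summary This paper proposes a method for constructing statistical intervals that finds a natural middle ground between Bayesian and frequentist intervals. It requires only a one-dimensional prior on the parameter of interest rather than a high-dimensional prior over the full distribution, yet allows a corresponding degree of belief to be assigned to the interval after it has been observed, while retaining many practical and philosophical advantages of Bayesian methods. The author implements and analyses the method on two concrete problems as a proof of principle, illustrating its feasibility and potential advantages.

Details
English abstract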

We present a method of constructing statistical intervals that obtain a natural middle ground between Bayesian and frequentist statistical intervals, previously unexplored in the literature: To a p% Bayesian credible interval we should assign a p% belief after observing both the dataset and the interval, to p% frequentist intervals we can generally only assign a p% belief before observing either the data or the interval, while to the intervals proposed here we can assign a p% belief after observing the interval, but not necessarily after inspecting the full dataset ourselves. Even in fully non-parametric problems this only requires a prior over the parameter(s) of interest, not a high-dimensional prior over the full distribution, while maintaining many of the practical and philosophical advantages of Bayesian methods. We believe these methods may therefore provide significant advances in statistical methodology to a number of fields. This work is meant as a proof of principle: We concretely implement such intervals for two different problems and study the properties of resulting intervals. We discuss promising directions where the proposed type of interval may provide significant advantages.

2601.07247 2026-05-11 stat.ML cs.LG math.ST stat.ME stat.TH

Multi-environment Invariance Learning with Missing Data

Yiran Jia, Jelena Bradic

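AI summary This paper studies multi-environment invariance learning with missing data, with the aim of improving causal interpretability and robust prediction. The authors propose an estimator based on the invariance objective and establish non-asymptotic guarantees on variable selection and $\ell_2$ error convergence rates, analysing how the proportion of missing data and the quality of the imputation models affect performance. Experiments show that the method still reduces prediction error even with a biased imputation model, provided the bias is within a reasonable range.

Comments Added co-author

Details
English abstract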

Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging the inherent heterogeneity across environments to develop methods that provide causal explanations while enhancing robust prediction. However, in many practical scenarios, obtaining complete outcome data from each environment is challenging due to the high cost or complexity of data collection. This limitation in available data hinders the development of models that fully leverage environmental heterogeneity, making it crucial to address missing outcomes to improve both causal insights and robust prediction. In this work, we derive an estimator from the invariance objective under missing outcomes. We establish non-asymptotic guarantees on the variable selection property and $\ell_2$ error convergence rates, which are influenced by the proportion of missing data and the quality of imputation models across environments. We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset to predict the count of bike rentals. The results show that despite relying on a biased imputation model, the estimator is efficient and achieves lower prediction error, provided the bias is within a reasonable range.

2512.21411 2026-05-11 math.ST stat.ML stat.TH

Singular Fluctuation as Specific Heat in Bayesian Learning

Sean Plummer

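AI summary This paper studies singular fluctuation, a quantity from singular learning theory in Bayesian learning, and gives it a thermodynamic interpretation. The author states that singular fluctuation equals the curvature of the Bayesian free energy with respect to inverse temperature, i.e., the variance of the log-likelihood observable, and can therefore be viewed as a statistical analogue of specific heat. This interpretation is used to clarify the role of singular fluctuation in the relation between training and generalization error and to explain why WAIC succeeds in singular models. Gaussian mixture models and reduced-rank regression are used to illustrate its behaviour as a thermodynamic response coefficient.

Comments Withdrawn by the author. The main thermodynamic identity in this version incorrectly identifies Watanabe's functional variance with the scalar variance of the total log likelihood. A corrected version will distinguish global heat capacity from the pointwise predictive response trace

Details
English abstract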

Singular learning theory characterizes Bayesian models with non-identifiable parameterizations through two central quantities: the real log canonical threshold (RLCT), which governs marginal likelihood asymptotics, and the singular fluctuation, which determines second-order generalization behavior and the complexity term in WAIC. While the geometric meaning of the RLCT is well understood, the interpretation of singular fluctuation has remained comparatively opaque. We show that singular fluctuation admits a precise thermodynamic interpretation. Under a tempered (Gibbs) posterior, it is exactly the curvature of the Bayesian free energy with respect to inverse temperature; equivalently, the variance of the log-likelihood observable. In this sense, singular fluctuation is the statistical analogue of specific heat. This identity clarifies why singular fluctuation controls the equation of state relating training and generalization error and explains the success of WAIC in singular models: WAIC estimates a fluctuation coefficient rather than a parameter dimension. Across Gaussian mixture models and reduced-rank regression, we demonstrate that singular fluctuation behaves as a thermodynamic response coefficient. As temperature decreases, posterior reorganization suppresses fluctuation directions that affect predictive performance, and model-specific geometric observables track the decay of singular fluctuation. Rather than introducing new asymptotic expansions, this work unifies existing variance identities, equation-of-state results, and WAIC complexity corrections under a single free-energy curvature framework.

2510.18843 2026-05-11 stat.ME math.ST stat.ML stat.TH

Inference on Variable Importance for Treatment Effect Heterogeneity: Shapley Values and Beyond

Pawel Morzywolek, Peter B. Gilbert, Alex Luedtke

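AI summary This paper proposes an inferential framework for assessing variable importance in treatment effect heterogeneity, particularly suited to high-risk domains such as medicine, where decision makers hesitate to rely on black-box algorithms. The approach uses local variable importance measures while performing global inference, testing whether a variable is important for any individual; it builds on semiparametric theory for function-valued parameters and remains valid when statistical machine learning methods are used to estimate treatment effect heterogeneity. The method is illustrated on infectious disease prevention strategies.

Comments 41 pages, 8 figures, v1 was called "Inference on Local Variable Importance Measures for Heterogeneous Treatment Effects"

Details
English abstract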

We provide an inferential framework to assess variable importance for heterogeneous treatment effects. This assessment is especially useful in high-risk domains such as medicine, where decision makers hesitate to rely on black-box treatment recommendation algorithms. The variable importance measures we consider are local in that they may differ across individuals, while the inference is global in that it tests whether a given variable is important for any individual. Our approach builds on recent developments in semiparametric theory for function-valued parameters, and is valid even when statistical machine learning algorithms are employed to quantify treatment effect heterogeneity. We demonstrate the applicability of our method to infectious disease prevention strategies.

2510.04606 2026-05-11 cs.LG stat.ML

Closed-Form Last Layer Optimization

Alexandre Galashov, Nathaël Da Costa, Liyuan Xu, Philipp Hennig, Arthur Gretton

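AI summary This paper studies closed-form optimization of a neural network's last-layer weights under a squared loss. The authors treat the last layer as a function of the backbone parameters and optimize only the backbone, which is equivalent to alternating gradient descent steps on the backbone with closed-form updates of the last layer. The method is adapted to the stochastic gradient descent setting, its convergence is proved in the neural tangent kernel regime, and experiments on several regression tasks demonstrate its effectiveness compared with standard SGD and Adam.

Details
English abstract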

Neural networks are typically optimized with variants of stochastic gradient descent. Under a squared loss, however, the optimal solution for the linear last-layer weights is known in closed form. We propose to leverage this during optimization, treating the last layer as a function of the backbone parameters, and optimizing solely for these parameters. We show this is equivalent to alternating between gradient descent steps on the backbone and closed-form updates on the last layer. We adapt the method for the setting of stochastic gradient descent, by trading off the loss on the current batch against the accumulated information from previous batches. We provide theoretical analyses showing convergence of the method to an optimal solution in the neural tangent kernel regime, as well as quantifying the gains compared to standard SGD in a one-step analysis. Finally, we demonstrate the effectiveness of our approach compared with SGD and Adam on a squared loss in several regression tasks, including neural operators and causal inference.
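
The closed-form step itself is ordinary (ridge-regularized) least squares on the backbone features. The sketch below shows only that full-batch step with a toy one-layer tanh backbone; the small `ridge` term is an assumption added for numerical stability, the function names are illustrative, and the paper's stochastic variant that weighs the current batch against earlier batches is not reproduced.

```python
import numpy as np


def backbone(x: np.ndarray, w_hidden: np.ndarray) -> np.ndarray:
    """Toy backbone: a single tanh layer producing the last-layer features."""
    return np.tanh(x @ w_hidden)


def closed_form_last_layer(features, targets, ridge=1e-6):
    """argmin_W ||features @ W - targets||^2 + ridge * ||W||^2."""
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 5))
    w_hidden = rng.standard_normal((5, 8))
    feats = backbone(x, w_hidden)
    y = feats @ rng.standard_normal((8, 2))
    w_last = closed_form_last_layer(feats, y)
    print(np.mean((feats @ w_last - y) ** 2))
```

In an alternating scheme, a gradient step on `w_hidden` would be followed by recomputing `w_last` with the function above, so the last layer is always optimal for the current features.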

2508.12258 2026-05-11 math.ST math.OC stat.TH

Identifying Network Hubs with the Partial Correlation Graphical LASSO

Małgorzata Bogdan, Adam Chojecki, Ivan Hejný, Bartosz Kołodziejek, Jonas Wallin

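AI summary This paper studies the statistical and computational properties of the Partial Correlation Graphical LASSO (PCGLASSO) for high-dimensional undirected graphical models. By penalizing partial correlations rather than entries of the precision matrix directly, the method addresses the lack of scale invariance of the standard Graphical LASSO (GLASSO). The authors introduce a scale-invariant irrepresentability condition for PCGLASSO, prove that it is sufficient for consistent model selection, and show that it is weaker than the corresponding GLASSO condition, which helps explain PCGLASSO's better behaviour in settings such as hub-structured networks. They also develop two efficient algorithms and analyse the underlying nonconvex optimization problem, deriving conditions for global uniqueness and consistency of the minimizers.

Comments 59 pages

Details
English abstract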

Graphical LASSO (GLASSO) is a widely used method for estimating sparse precision matrices and learning undirected graphical models in high-dimensional settings. Because GLASSO penalizes entries of the precision matrix directly, however, it is not scale-invariant. Partial Correlation Graphical LASSO (PCGLASSO), introduced by Carter et al. (2024), addresses this limitation by penalizing partial correlations, which directly characterize conditional dependence. In this paper, we study both statistical and computational properties of the PCGLASSO estimator. Our main contribution is the introduction of a scale-invariant irrepresentability condition for PCGLASSO and the proof that this condition is sufficient for consistent model selection. We further show that this condition is weaker than the corresponding irrepresentability condition for GLASSO, helping to explain the improved empirical behavior of PCGLASSO in settings such as hub-structured graphs. In addition, we develop two efficient algorithms for computing the estimator and analyze the nonconvex optimization problem underlying PCGLASSO, deriving conditions for global uniqueness and showing consistency of all minimizers.
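
The scale-invariance point can be checked numerically. Partial correlations are obtained from a precision matrix $Θ$ as $ρ_{ij} = -Θ_{ij}/\sqrt{Θ_{ii}Θ_{jj}}$; rescaling the variables changes the off-diagonal entries of $Θ$ (and hence an $\ell_1$ penalty on them) but leaves the partial correlations unchanged. The sketch below shows only this identity, not the PCGLASSO estimator or its algorithms, and the function names and example matrix are illustrative.

```python
import numpy as np


def partial_correlations(precision: np.ndarray) -> np.ndarray:
    """rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj), with ones on the diagonal."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


def offdiag_l1(precision: np.ndarray) -> float:
    """Sum of absolute off-diagonal entries of the precision matrix."""
    return float(np.abs(precision).sum() - np.abs(np.diag(precision)).sum())


if __name__ == "__main__":
    theta = np.array([[2.0, -0.5, 0.0],
                      [-0.5, 2.0, -0.8],
                      [0.0, -0.8, 2.0]])
    # Rescaling variable i by c_i maps Theta to D^{-1} Theta D^{-1}, D = diag(c).
    d_inv = np.diag([1.0, 0.1, 10.0])
    theta_scaled = d_inv @ theta @ d_inv
    print(offdiag_l1(theta), offdiag_l1(theta_scaled))
    print(np.allclose(partial_correlations(theta), partial_correlations(theta_scaled)))
```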

2507.01064 2026-05-11 physics.data-an cond-mat.stat-mech cs.IT hep-th math.IT stat.ME

Functional Renormalization for Signal Detection: Dimensional Analysis and Dimensional Phase Transition for Nearly Continuous Spectra Effective Field Theory

Riccardo Finotello, Vincent Lahoche, Dine Ousmane Samary

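AI summary This paper addresses a key problem in high-dimensional signal detection: how to identify a signal whose nearly continuous distribution merges with the noise bulk. The authors use the functional renormalization group (FRG) framework, treat the empirical spectrum as an effective field theory, and define a scale-dependent "canonical dimension" that acts as a sensitive order parameter for the spectral geometry, revealing a "dimensional phase transition" at signal-to-noise ratios below the standard BBP threshold. The method detects subtle deformations of the spectral density, agrees with recent theoretical results on the "extensive spike model", and is validated on realistic datasets.

Comments 36 pages; update figures

Details
Journal ref
J. Stat. Mech. (2026) 043403
English abstract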

Signal detection in high dimensions is a critical challenge in data science. While standard methods based on random matrix theory provide sharp detection thresholds for finite-rank perturbations, such as the known Baik-Ben Arous-Péché (BBP) transition, they are often insufficient for realistic data exhibiting nearly continuous (extensive-rank) signal distributions that merge with the noise bulk. In this regime, typically associated with real-world scenarios such as images for computer vision tasks, the signal does not manifest as a clear outlier but as a deformation of the spectral density's geometry. We use the functional renormalisation group (FRG) framework to probe these subtle spectral deformations. Treating the empirical spectrum as an effective field theory, we define a scale-dependent "canonical dimension" that acts as a sensitive order parameter for the spectral geometry. We show that this dimension undergoes a sharp crossover, interpreted as a "dimensional phase transition", at signal-to-noise ratios significantly lower than the standard BBP threshold. This dimensional instability is shown to correlate with a spontaneous symmetry breaking in the effective potential and a deviation of eigenvector statistics from the universal Porter-Thomas distribution, confirming the consistency of the method. Such behaviour aligns with recent theoretical results on the "extensive spike model", where signal information persists inside the noise bulk before any spectral gap opens. We validate our approach on realistic datasets, demonstrating that the FRG flow consistently detects the onset of this bulk deformation. Finally, we explore a formalisation of this methodology for analysing nearly continuous spectra, proposing a heuristic criterion for signal detection and a method to estimate the number of independent noise components based on the stability of these canonical dimensions.

2506.22925 2026-05-11 stat.ME math.ST stat.TH

Confidence sequences with informative, bounded-influence priors

Stefano Cortinovis, Valentin Kilian, François Caron

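AI summary This paper studies how to construct confidence sequences for Gaussian observations with known variance using informative priors with bounded influence. By combining the method of mixtures with a global informative prior and the extended Ville's inequality, the authors obtain confidence sequences that are sharper than their non-informative counterparts when the prior is well specified and remain bounded when it is misspecified, striking a balance between sharpness and robustness.

Details
English abstract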

Confidence sequences are collections of confidence regions that simultaneously cover the true parameter for every sample size at a prescribed confidence level. Tightening these sequences is of practical interest and can be achieved by incorporating prior information through the method of mixture martingales. However, confidence sequences built from informative priors are vulnerable to misspecification and may become vacuous when the prior is poorly chosen. We study this trade-off for Gaussian observations with known variance. By combining the method of mixtures with a global informative prior whose tails are polynomial or exponential and the extended Ville's inequality, we construct confidence sequences that are sharper than their non-informative counterparts whenever the prior is well specified, yet remain bounded under arbitrary misspecification. The theory is illustrated with several classical priors.
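
As background for the method of mixtures, the classical normal-mixture confidence sequence for a Gaussian mean with known variance has a closed-form radius. Mixing the likelihood-ratio martingale over a $N(0, ρ^2)$ distribution on the standardized drift and applying the ordinary Ville's inequality gives $|\bar X_t - μ| \le σ\sqrt{2(1+ρ^2 t)\log(\sqrt{1+ρ^2 t}/α)}/(ρ t)$ simultaneously for all $t$. This sketch is that textbook construction only; it does not use a global informative prior or the extended Ville's inequality from the paper, and `rho` and the function names are illustrative.

```python
import math


def mixture_cs_radius(t: int, sigma: float = 1.0, rho: float = 1.0, alpha: float = 0.05) -> float:
    """Half-width at sample size t of the normal-mixture confidence sequence."""
    v = 1.0 + rho ** 2 * t
    return sigma * math.sqrt(2.0 * v * math.log(math.sqrt(v) / alpha)) / (rho * t)


def confidence_sequence(xs, sigma=1.0, rho=1.0, alpha=0.05):
    """Running intervals (lower, upper) for the mean, valid simultaneously over t."""
    intervals, total = [], 0.0
    for t, x in enumerate(xs, start=1):
        total += x
        mean = total / t
        radius = mixture_cs_radius(t, sigma, rho, alpha)
        intervals.append((mean - radius, mean + radius))
    return intervals


if __name__ == "__main__":
    for t in (1, 10, 100, 10_000):
        print(t, mixture_cs_radius(t))
```

The radius shrinks with `t` but more slowly than a fixed-sample interval would, which is the price of coverage that holds at every sample size.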

2506.19554 2026-05-11 stat.ME stat.CO

Modeling the uncertainty on the covariance matrix for probabilistic forecast reconciliation

Chiara Carrara, Dario Azzimonti, Giorgio Corani, Lorenzo Zambon

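AI summary In minimum trace (MinT) forecast reconciliation, the covariance matrix of the base forecast errors plays a key role, but it is usually estimated and then treated as known, which can lead to underestimating the variance of the predictive distribution. This paper proposes a Bayesian reconciliation model that accounts for the uncertainty in estimating the covariance matrix; with an Inverse-Wishart prior and Gaussian residuals, the reconciled predictive distribution is a multivariate t-distribution in closed form rather than a multivariate Gaussian. Experiments on three tourism-related datasets show that the method consistently improves prediction intervals.

Details
English abstract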

In minimum trace (MinT) forecast reconciliation, the covariance matrix of the base forecast errors plays a crucial role. Typically, this matrix is estimated and then treated as known. This can lead to underestimation of the variance of the predictive distribution. To address the problem, we propose a Bayesian reconciliation model that accounts for the uncertainty in the estimation of the covariance matrix. By adopting an Inverse-Wishart prior and assuming Gaussian residuals, the reconciled predictive distribution follows a multivariate t-distribution, obtained in closed form, rather than a multivariate Gaussian distribution. We evaluate our method on three tourism-related datasets, including a new publicly available dataset. Empirical results show that our approach consistently improves prediction intervals compared to MinT reconciliation.
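
For context, the standard MinT point reconciliation that treats the covariance as known is $\tilde y = S (S^\top W^{-1} S)^{-1} S^\top W^{-1} \hat y$, with $S$ the summing matrix and $W$ the covariance of the base forecast errors. The sketch below implements only this plug-in formula on a toy two-series hierarchy; the Bayesian model with an Inverse-Wishart prior and the resulting multivariate t predictive distribution are not reproduced, and the function name and numbers are illustrative.

```python
import numpy as np


def mint_reconcile(base: np.ndarray, s: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Return S (S' W^-1 S)^-1 S' W^-1 base for a symmetric positive-definite W."""
    w_inv_s = np.linalg.solve(w, s)                 # W^{-1} S
    g = np.linalg.solve(s.T @ w_inv_s, w_inv_s.T)   # (S' W^-1 S)^-1 S' W^-1
    return s @ (g @ base)


if __name__ == "__main__":
    # Hierarchy: total = a + b, series ordered as [total, a, b].
    s = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
    base = np.array([10.0, 4.0, 5.0])  # incoherent base forecasts: 4 + 5 != 10
    w = np.eye(3)                      # identity W, which reduces MinT to OLS reconciliation
    print(mint_reconcile(base, s, w))
```

With an identity `w` the output is the least-squares projection onto coherent forecasts; a different `w` changes how the incoherence is shared among the series.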

2506.05668 2026-05-11 cs.LG stat.ML

RNE: plug-and-play diffusion inference-time control and energy-based training

Jiajun He, José Miguel Hernández-Lobato, Yuanqi Du, Francisco Vargas

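AI summary This paper proposes RNE, a plug-and-play method for diffusion models that enables inference-time control of the generation process and supports energy-based training. Built on the density ratio between path distributions, RNE establishes a fundamental connection between marginal densities and transition kernels, unifying diffusion density estimation, inference-time control, and energy-based training. Experiments show strong results on inference-time control tasks; RNE also provides a simple and efficient regularisation for energy-based diffusion models and applies to both continuous and discrete diffusion models.

Comments Accepted at ICLR 2026

Details
English abstract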

Diffusion models generate data by removing noise gradually, which corresponds to the time-reversal of a noising process. However, access to only the denoising kernels is often insufficient. In many applications, we need the knowledge of the marginal densities along the generation trajectory, which enables tasks such as inference-time control. To address this gap, in this paper, we introduce the Radon-Nikodym Estimator (RNE). Based on the concept of the \textit{density ratio} between path distributions, it reveals a fundamental connection between marginal densities and transition kernels, providing a flexible plug-and-play framework that unifies (1) diffusion density estimation, (2) inference-time control, and (3) energy-based diffusion training under a single perspective. Experiments demonstrate that RNE delivers strong results in inference-time control applications, such as annealing and model composition, with promising inference-time scaling performance, and achieves a simple yet efficient regularisation for training energy-based diffusion models. Additionally, our proposed RNE is modality-agnostic and applicable not only to continuous diffusion models but also to their discrete diffusion counterparts.

2505.11325 2026-05-11 stat.ME cs.AI cs.LG stat.CO stat.ML

Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors

Thomas Nagler, David Rügamer

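AI summary This paper studies how to provide uncertainty quantification for prior-data fitted networks (PFNs), which perform well on tabular prediction tasks but lack uncertainty estimates for their predictions. The authors propose a sampling procedure based on martingale posteriors that efficiently constructs Bayesian posteriors for estimates such as predictive means and quantiles without tuning, and prove its convergence. Experiments on several simulated and real-world datasets show good efficiency and calibration.

Details
English abstract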

Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular datasets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled, efficient, and tuning-free sampling procedure to construct Bayesian posteriors for such estimates based on martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the efficiency and calibration of our method in inference applications.

2505.07383 2026-05-11 math.ST stat.TH

Bias robustness of depth estimators in multivariate settings

Jorge G. Adrover, Marcelo Ruiz

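AI summary This paper studies the bias robustness of depth estimators in multivariate statistical models, focusing on the maximum bias curve, contamination sensitivity, and breakdown point of the deepest scatter matrices under contaminated data. It analyses recently introduced error bounds that give a unified framework for the statistical convergence rate and robustness of Tukey's median, depth-based scatter matrices, and multivariate regression estimators, and observes that slight variations in these inequalities make the maximum bias behaviour of the estimators visible. It also shows that the halfspace depths considered can all be derived from a unifying concept called residual smallness depth, and compares the finite-sample bias of several robust estimators in a numerical study.

Details
English abstract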

The concept of statistical depth extends the notions of the median and quantiles to other statistical models. These procedures aim to formalize the idea of identifying deeply embedded fits to a model that are less influenced by contamination. In the multivariate case, Tukey's median was a groundbreaking concept for multivariate location estimation, and its counterpart for scatter matrices has recently attracted considerable interest. The breakdown point and the maximum asymptotic bias are key concepts used to summarize an estimator's behavior under contamination. We explicitly obtain the maximum bias curve, contamination sensitivity and breakdown point of the deepest scatter matrices. In the multivariate and regression setting we analyse recently introduced error bounds that provide a unified framework for studying both the statistical convergence rate and robustness of Tukey's median, depth-based scatter matrices and multivariate regression estimators. We observe that slight variations in these inequalities allow us to visualize the maximum bias behavior of the deepest estimators. We also point out that all the halfspace depths under consideration can be obtained from a unifying concept called residual smallness depth. A numerical study is performed to compare the finite sample bias performance of several robust estimators in the multivariate setting.

2503.12285 2026-05-11 cs.LG cs.AI cs.GT cs.SY eess.SY stat.ML

A Resilience Framework for Bi-Criteria Combinatorial Optimization with Bandit Feedback

Vaneet Aggarwal, Shweta Jain, Subham Pokhriyal, Christopher John Quinn

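AI summary This paper studies bi-criteria combinatorial optimization under noisy function evaluations and proposes a resilience framework for such problems. The framework introduces the notion of $(α,β,δ,\texttt{N})$-resilience to describe how approximation guarantees jointly degrade under bounded noise, and develops a general black-box method that converts any resilient offline algorithm into an online algorithm for bi-criteria combinatorial multi-armed bandits. Without structural assumptions such as linearity or submodularity, the method achieves sublinear bounds on regret and cumulative constraint violation, and the authors demonstrate the framework's applicability to classical greedy algorithms in submodular optimization.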

Details
Journal ref
Transactions on Machine Learning Research, May 2026
English abstract

We study bi-criteria combinatorial optimization under noisy function evaluations. While resilience and black-box offline-to-online reductions have been studied in single-objective settings, extending these ideas to bi-criteria problems introduces new challenges due to the coupled degradation of approximation guarantees for objectives and constraints. We introduce a notion of $(α,β,δ,\texttt{N})$-resilience for bi-criteria approximation algorithms, capturing how joint approximation guarantees degrade under bounded (possibly worst-case) oracle noise, and develop a general black-box framework that converts any resilient offline algorithm into an online algorithm for bi-criteria combinatorial multi-armed bandits with bandit feedback. The resulting online guarantees achieve sublinear regret and cumulative constraint violation of order $\tilde{O}(δ^{2/3}\texttt{N}^{1/3}T^{2/3})$ without requiring structural assumptions such as linearity, submodularity, or semi-bandit feedback on the noisy functions. We demonstrate the applicability of the framework by establishing resilience for several classical greedy algorithms in submodular optimization.

2502.19275 2026-05-11 stat.ME

Deep Computerized Adaptive Testing

Jiguang Li, Robert Gibbons, Veronika Rockova

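AI summary Computerized adaptive testing (CAT) plays an important role in educational assessment and behavioral health diagnostics, but traditional approaches rely on item response theory (IRT) models with a single latent factor and struggle with the multi-factor structure of real-world data. This paper proposes a new CAT system that incorporates multivariate latent traits, building on Bayesian sparse multivariate IRT, and speeds up item selection by sampling directly from the latent factor posterior distributions. It also introduces a double deep Q-learning algorithm to learn the item selection policy; experiments show that the approach accelerates existing methods and highlights the potential of reinforcement learning in CAT.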

Details
Journal ref
Psychometrika, 2026
English abstract

Computerized adaptive tests (CATs) play a crucial role in educational assessment and diagnostic screening in behavioral health. Unlike traditional linear tests that administer a fixed set of pre-assembled items, CATs adaptively tailor the test to an examinee's latent trait level by selecting a smaller subset of items based on their previous responses. Existing CAT frameworks predominantly rely on item response theory (IRT) models with a single latent variable, a choice driven by both conceptual simplicity and computational feasibility. However, many real-world item response datasets exhibit complex, multi-factor structures, limiting the applicability of CATs in broader settings. In this work, we develop a novel CAT system that incorporates multivariate latent traits, building on recent advances in Bayesian sparse multivariate IRT. Our approach leverages direct sampling from the latent factor posterior distributions, significantly accelerating existing information-theoretic item selection criteria by eliminating the need for computationally intensive Markov Chain Monte Carlo (MCMC) simulations. Recognizing the potential sub-optimality of existing item selection rules, which are often based on myopic one-step-lookahead optimization of some information-theoretic criterion, we propose a double deep Q-learning algorithm to learn an optimal item selection policy. Through simulation and real-data studies, we demonstrate that our approach not only accelerates existing item selection methods but also highlights the potential of reinforcement learning in CATs.

2405.15670 2026-05-11 stat.ME

Post-selection inference for quantifying uncertainty in changes in variance

Rachel Carrington, Paul Fearnhead

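AI summary This paper studies how to accurately quantify uncertainty in detected changes in variance. Testing changepoints with the same data that was used to detect them biases the test; drawing on ideas from post-selection inference, the paper presents two approaches for constructing post-selection p-values for changes in variance, so that the p-values are uniformly distributed when there is no change. The approaches apply to a range of change-detection methods and hypotheses.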

Comments 25 pages, 12 figures, plus 7 pages supplementary material

Details
Journal ref
Statistics and Computing 36 (2026) 128
English abstract

Quantifying uncertainty in detected changepoints is an important problem. However, it is challenging, as the naive approach would use the data twice: first to detect the changes, and then to test them. This will bias the test, and can lead to anti-conservative p-values. One approach to avoid this is to use ideas from post-selection inference, which conditions on the information in the data used to choose which changes to test. As a result, this produces valid p-values; that is, p-values that have a uniform distribution if there is no change. Currently, such methods have been developed for detecting changes in mean only. This paper presents two approaches for constructing post-selection p-values for detecting changes in variance. These vary depending on the method used to detect the changes, but are general in terms of being applicable for a range of change-detection methods and a range of hypotheses that we may wish to test.

2403.05566 2026-05-11 stat.AP

Bringing Age Back In: Accounting for Population Age Distribution in Forecasting Migration

Nathan G. Welch, Hana Ševčíková, Adrian E. Raftery

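AI summary This study examines how population age structure affects international net migration rates and notes that existing models of country-level net migration ignore this factor. It proposes an age-standardized estimation approach built on the migration age structure index (MASI), decomposes and rescales past net migration rates for the 200 most populous countries from 1990 through 2020, and uses a Bayesian hierarchical model to produce joint probabilistic forecasts of net migration rates from 2020 through 2100. Accounting for population age structure yields narrower prediction intervals for most countries and less drastic forecast population declines for rapidly aging countries.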

Comments 29 pages, 8 figures, 3 tables

Details
Journal ref
Demography 2026
English abstract

The link between age and migration propensity is long established, but existing models of country-level net migration ignore the effect of population age distribution on past and projected migration rates. We propose a method to estimate and forecast international net migration rates for the 200 most populous countries, taking account of changes in population age structure. We use age-standardized estimates of country-level net migration rates and in-migration rates over quinquennial periods from 1990 through 2020 to decompose past net migration rates into in-migration rates and out-migration rates. We then recalculate historic migration rates on a scale that removes the influence of the population age distribution. This is done by scaling past and projected migration rates in terms of a reference population and period. We show that this can be done very simply, using a quantity we call the migration age structure index (MASI). We use a Bayesian hierarchical model to generate joint probabilistic forecasts of total and age- and sex-specific net migration rates over five-year periods for all countries from 2020 through 2100. We find that accounting for population age structure in historic and forecast net migration rates leads to narrower prediction intervals by the end of the century for most countries. Also, applying a Rogers & Castro-like migration age schedule to migration outflows reduces uncertainty in population pyramid forecasts. Finally, accounting for population age structure leads to less out-migration among countries with rapidly aging populations that are forecast to contract most rapidly by the end of the century. This leads to less drastic population declines than are forecast without accounting for population age structure.

2311.08433 2026-05-11 q-bio.QM cs.LG stat.AP

Clinical Characteristics and Laboratory Biomarkers in ICU-admitted Septic Patients with and without Bacteremia

Sangwon Baek, Seung Jun Lee

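AI summary This study evaluates the predictive value of clinical characteristics and laboratory biomarkers for bacteremia among septic patients admitted to the intensive care unit. In a retrospective analysis of 218 patients, C-reactive protein (CRP) and procalcitonin (PCT) each showed good discrimination for bacteremia, while a multivariable logistic regression model combining PCT, bilirubin, neutrophil-lymphocyte ratio (NLR), platelets, lactic acid, erythrocyte sedimentation rate (ESR), and Glasgow Coma Scale (GCS) score improved predictive accuracy, reaching an AUC of 0.907. The study also found a significant association between bacteremia and mortality.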

Comments This research is not complete

Details
English abstract

Few studies have investigated the diagnostic utility of biomarkers for predicting bacteremia among septic patients admitted to intensive care units (ICU). Therefore, this study evaluated the predictive power of laboratory biomarkers so that the best-performing markers could be used to optimize a predictive model for bacteremia. This retrospective cross-sectional study was conducted at the ICU department of Gyeongsang National University Changwon Hospital in 2019. Adult patients meeting the Sepsis-3 criteria (increase in sequential organ failure score greater than or equal to 2) with at least two sets of blood cultures were selected. The collected data were initially analyzed independently to identify the significant predictors, which were then used to build the multivariable logistic regression (MLR) model. A total of 218 patients with 48 cases of true bacteremia were analyzed in this research. Both CRP and PCT showed a substantial area under the curve (AUC) value for discriminating bacteremia among septic patients (0.757 and 0.845, respectively). To further enhance the predictive accuracy, we combined PCT, bilirubin, neutrophil-lymphocyte ratio (NLR), platelets, lactic acid, erythrocyte sedimentation rate (ESR), and Glasgow Coma Scale (GCS) score to build the predictive model with an AUC of 0.907 (95% CI, 0.843 to 0.956). In addition, a high association between bacteremia and mortality rate was discovered through the survival analysis (0.004). While PCT is certainly a useful index for distinguishing patients with and without bacteremia by itself, our MLR model indicates that the accuracy of bacteremia prediction substantially improves by the combined use of PCT, bilirubin, NLR, platelets, lactic acid, ESR, and GCS score.
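
As a rough illustration of what an AUC such as 0.757 or 0.845 measures, the sketch below computes the AUC as the probability that a randomly chosen positive case has a higher marker value than a randomly chosen negative case, counting ties as one half. The function name `auc_rank` and the marker values are hypothetical; they are not data from the study, and this is not necessarily how the authors computed their figures.

```python
from typing import Sequence


def auc_rank(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Equals the probability that a random positive scores higher than a
    random negative, with ties counted as 1/2.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical marker values (illustrative only, not taken from the paper).
bacteremia = [8.1, 12.4, 3.3, 20.0]
no_bacteremia = [0.4, 2.9, 3.3, 1.1, 5.0]
auc = auc_rank(bacteremia, no_bacteremia)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a marker that does no better than chance, and 1.0 to perfect separation of the two groups.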

2305.01429 2026-05-11 cs.LG stat.ML

Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression

David Guijo-Rubio, Matthew Middlehurst, Guilherme Arcencio, Diego Furtado Silva, Anthony Bagnall

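AI summary This paper studies time series extrinsic regression (TSER), in which a set of training time series is used to predict a continuous response variable that is not directly related to the regressor series. The authors expand the TSER archive from 19 to 63 problems and compare a range of regressors, finding that a regression adaptation of a standard classifier, rotation forest, performs strongly. They introduce two new TSER algorithms, FreshPRINCE and DrCIF, both of which predict from summary features extracted from the time series and are the only models that significantly outperform the standard rotation forest regressor.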

Comments 19 pages, 21 figures, 6 tables. Appendix included

Details
Journal ref
Data Mining and Knowledge Discovery, Volume 38, pages 2141-2185, (2024)
English abstract

Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, the two proposed models (DrCIF and FreshPRINCE) are the only ones that significantly outperform the standard rotation forest regressor.

2301.05636 2026-05-11 stat.ME

Improving Power by Conditioning on Less in Post-selection Inference for Changepoints

Rachel Carrington, Paul Fearnhead

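AI summary This study aims to improve the power of post-selection inference for changepoints. By conditioning on less information, it obtains a more powerful selective p-value that is intractable but can be approximated by Monte Carlo. Experiments show a noticeable gain in power even with small Monte Carlo sample sizes; on human GC content data, the number of significant changepoints detected rises from 17 to 27.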

Comments 32 pages, 14 figures

Details
Journal ref
Statistics and Computing 35 (2025) 8
English abstract

Post-selection inference has recently been proposed as a way of quantifying uncertainty about detected changepoints. The idea is to run a changepoint detection algorithm, and then re-use the same data to perform a test for a change near each of the detected changes. By defining the p-value for the test appropriately, so that it is conditional on the information used to choose the test, this approach will produce valid p-values. We show how to improve the power of these procedures by conditioning on less information. This gives rise to an ideal selective p-value that is intractable but can be approximated by Monte Carlo. We show that for any Monte Carlo sample size, this procedure produces valid p-values, and empirically that a noticeable increase in power is possible with only very modest Monte Carlo sample sizes. Our procedure is easy to implement given existing post-selection inference methods, as we just need to generate perturbations of the data set and re-apply the post-selection method to each of these. On genomic data consisting of human GC content, our procedure increases the number of significant changepoints that are detected from e.g. 17 to 27, when compared to existing methods.

2605.07448 2026-05-11 stat.ME stat.CO stat.ML

Robust Tensor Regression with Nonconvexity: Algorithmic and Statistical Theory

Zihao Song, Jicai Liu, Heng Lian, Weihua Zhao

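AI summary This paper studies robust regression for high-dimensional tensor data in the presence of heavy-tailed noise and outliers, proposing a method based on a nonconvex relaxation of the tensor tubal rank. Within a general optimization framework that allows nonconvexity in both the loss and the penalty, the authors develop an implementable estimation algorithm and prove its global convergence under mild conditions. They also establish general statistical theory for stationary points that covers linear models, generalized linear models, and some nonconvex losses, and support the method with simulations and application studies.

Details
English abstract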

Tensor regression is an important tool for tensor data analysis, but existing works have not considered the impact of outliers, making them potentially sensitive to such data points. This paper proposes a low tubal rank robust regression method for analyzing high-dimensional tensor data with heavy-tailed random noise. The proposed method is based on a nonconvex relaxation of the tensor tubal rank within a general optimization framework, which allows for nonconvexity in both the loss and penalty functions. We develop an implementable estimation algorithm and establish its global convergence under some mild assumptions. Furthermore, we provide general statistical theories regarding stationary points, including the rates of convergence and bounds on the prediction error. These theoretical results cover many important models, such as linear models, generalized linear models, and Huber regression, and even encompass some nonconvex losses like correntropy and minimum distance criterion-induced losses. Supportive numerical evidence is provided through simulations and application studies.

2605.07434 2026-05-11 stat.OT

Adaptive Subspace Signal Detection and Performance Analysis in Nonzero-Mean Clutter

Weijian Liu, Zhenyu Xu, Jun Liu, Hui Chen, Yongxiang Liu

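AI summary This paper studies the detection of subspace signals in nonzero-mean clutter and proposes adaptive detectors based on the generalized likelihood ratio test (GLRT), Rao test, Wald test, and related strategies. It derives expressions for each detector's probability of detection and probability of false alarm, revealing losses in degrees of freedom and signal-to-clutter ratio under nonzero-mean clutter. Simulations and measured data verify the detectors' effectiveness and their practical value in real radar systems.

Details
English abstract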

To solve the problem of detecting subspace signals in nonzero-mean clutter, we propose adaptive detectors based on the strategies of the generalized likelihood ratio test (GLRT), Rao test, Wald test, gradient test, and Durbin test. The results show that the detectors based on GLRT, Rao and Wald are structurally consistent with the subspace detectors in zero-mean clutter. The analytic expressions for the probability of detection (PD) and probability of false alarm (PFA) of each detector are derived, and two major performance differences in the nonzero-mean clutter scenario are revealed. One is the loss of degrees of freedom (DOF), which are reduced by 1 compared with the zero-mean clutter scenario. The second is the loss of signal-to-clutter ratio (SCR). Simulation and measured data verify the effectiveness of the proposed detectors and demonstrate their practical value in real-world radar systems.

2605.07421 2026-05-11 stat.AP

There to care; not to kill: medical settings, statistics and wrongful convictions

Richard D. Gill

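AI summary This paper discusses wrongful convictions of nurses in medical settings and analyses the weak evidence typical of such cases: no direct evidence, CCTV footage, or confession, with the accusation resting mainly on statistical association. It notes that police investigations are often driven by hospital consultants, that prosecutors may interpret a nurse's mundane behaviour or private writings as evidence of guilt, and that motive remains speculation. The paper highlights the central role of statistical evidence in these cases.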

Comments Invited contribution to a volume on miscarriages of justice, in preparation

Details
English abstract

This paper discusses wrongful convictions in a medical setting, focusing on nurses. Common features are lack of strong direct evidence: the nurse was never seen doing anything wrong. There is no DNA evidence of tampering of apparatus or medications by the nurse. There is no CCTV footage showing suspicious actions. Analysis of medical records at the time led coroners to issue certificates of natural deaths, and most events were not, at the time, thought suspicious by hospital staff. There is no confession and the nurse consistently asserts they are completely innocent. There is no evidence of earlier psychopathic behaviour. Instead, private writings (e.g., in a diary) are interpreted by the prosecution as a confession; mundane behaviour is given a sinister interpretation. Motive remains speculation. The main evidence is statistical: a spike in deaths or collapses and a statistical association with a particular nurse. There is forensic evidence which suggests one or two patients might have been harmed by administration of medication much used in the hospital, and even legitimately used earlier in the care of the alleged victims. Police investigations are driven by the hospital consultants who were clinically responsible for the patients allegedly killed or harmed by the nurse.

2605.07409 2026-05-11 cs.CL cs.LG stat.AP

The Proxy Presumption: From Semantic Embeddings to Valid Social Measures

Baishi Li, Ta Yu, Kelvin J. L. Koa, Ke-Wei Huang

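AI summary This paper examines a core validity problem in applying natural language processing to computational social science, the "Proxy Presumption": using geometric properties of semantic embeddings (such as cosine distance) as direct measures of social concepts (such as novelty or creativity) can introduce bias. The authors propose the Construct Validity Protocol (CVP), which draws on causal representation learning and psychometrics to build a rigorous pipeline from conceptualization to quantitative verification, and introduce Counterfactual Neutralization to reduce confounding in embedding space. The resulting standardized Validity Suite gives the community a toolkit for turning heuristic proxies into scientifically defensible instruments.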

Comments ACL 2026

Details
English abstract

Natural Language Processing is rapidly evolving into a primary instrument for Computational Social Science, with researchers increasingly using embeddings to measure latent constructs such as novelty, creativity, and bias. However, this transition faces a fundamental validity challenge: the "Proxy Presumption," or the reliance on geometric properties (e.g., cosine distance) as direct measures of social concepts. We argue that without explicit validation, unsupervised representations remain entangled mixtures of the target construct ($C$) and confounding attributes ($Z$) like topic, style, and authorship. To bridge the gap between semantic embeddings and valid social measures, we introduce the Construct Validity Protocol (CVP). Drawing on causal representation learning and psychometrics, the CVP offers a rigorous pipeline from conceptualization to quantitative verification. We further propose Counterfactual Neutralization, a novel method using LLMs to reduce confounding in embedding space. By providing a standardized Validity Suite -- including tests for discriminant, incremental, and predictive validity -- this work offers the community a toolkit to transform heuristic proxies into robust, scientifically defensible instruments.

2605.07404 2026-05-11 math.ST econ.EM stat.TH

Self-normalized tests for multistep conditional predictive ability

Qitong Chen, Shuwen Lai

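AI summary This paper proposes self-normalized tests for comparing multistep conditional predictive ability. By normalizing the sample mean of the transformed loss differential with functionals of its cumulative sum (CUSUM) process, the tests avoid direct estimation of the long-run covariance matrix and thus the ad hoc bandwidth, kernel, and lag-truncation choices required by traditional methods. The authors establish asymptotic theory for the test statistics, derive their limiting distributions under the null, and prove consistency. Simulations show that the tests mitigate the finite-sample size distortions of traditional heteroskedasticity and autocorrelation consistent (HAC) methods while retaining strong power against conditional predictability alternatives.

Details
English abstract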

This paper proposes self-normalized tests for multistep conditional predictive ability in forecast comparison. By normalizing the sample mean of the transformed loss differential using functionals of its cumulative sum (CUSUM) process, specifically an adjusted-range normalizer for scalars and a matrix normalizer for vectors, our approach avoids direct estimation of the long-run covariance matrix. Consequently, it eliminates the need for the ad hoc bandwidth, kernel, and lag-truncation choices required by traditional methods. We establish the asymptotic theory for these statistics, deriving pivotal null limiting distributions and proving test consistency. Monte Carlo simulations show that the proposed tests effectively mitigate the finite-sample size distortions associated with traditional heteroskedasticity and autocorrelation consistent (HAC) methods, while retaining strong empirical power against conditional predictability alternatives.

2605.07383 2026-05-11 cs.CR stat.AP

Combating Organized Platform Abuse: Amplifying Weak Risk Signals with Structural Information

Meng He, Jia Long Loh

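AI summary Targeting organized abuse of online platforms, this paper proposes the Fraudster's Trilemma, a theory grounded in the economic constraints of fraud, derives a structural invariant of organized fraud (centralized cash-out), and uses a simple statistical method to amplify low-precision weak signals into high-precision decisions. The method needs no labels, has very few parameters, is interpretable, and has an "open-hand" property: attackers cannot evade it even when fully informed. Backtests on two real incidents, one of promotion abuse and one of credit card fraud, support the approach, with precision above 91% and recall above 99% in the promotion abuse case.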

Comments 11 pages, 6 figures, 8 tables

Details
English abstract

Large-scale online service platforms face severe challenges from organized platform abuse: multiple forms such as credit card fraud and promotion abuse continually emerge, characterized by large numbers of involved accounts, rapid outbreaks, and constantly shifting tactics. Existing mainstream approaches, whether heuristic rules limited in precision, supervised learning with insufficient generalization, or graph models that are engineering-heavy and dependent on seed users, have failed to address such threats effectively. This paper returns to first principles and, starting from the economic constraints of fraudulent behavior, proposes the Fraudster's Trilemma: organized attackers cannot simultaneously achieve scale, low cost, and dispersed cash-out. Building on this theory, we derive a robust structural invariant in organized fraud, namely centralized cash-out, and use a simple statistical method to turn low-precision individual weak signals into high-precision strong decisions. The method requires no labels, is nearly parameter-free, white-box interpretable, has linear complexity O(|E|), avoids cold-start issues, and its detection logic possesses the "open-hand" property: attackers cannot evade it even when fully informed. We validate the approach on two real fraud incidents in backtests. In the promotion abuse case, a single near-zero-cost weak signal (global Precision of only 16%) after structural amplification achieves Precision above 91% and Recall exceeding 99% (z=10.0); at a higher threshold (z=40.0), Precision reaches 93.7%. In the credit card fraud case, an infrastructure-layer weak signal (device spoofing) successfully detects payment-layer attacks without any business-logic linkage, revealing the framework's natural MO-agnostic property: it relies more on the structural invariant than on signal semantics.
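
The abstract does not spell out the statistical method, so the following is only a minimal sketch of one way structural amplification around a shared cash-out destination could work, under stated assumptions. It treats each account-to-destination link as an edge, compares the number of weak-signal accounts at each destination with the number expected from the global base rate, and reports a binomial z-score. The function `amplify_weak_signal`, the edge schema, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def amplify_weak_signal(
    edges: Iterable[Tuple[str, str]],
    flagged: Set[str],
    z_threshold: float,
) -> Dict[str, float]:
    """Score destinations by over-representation of weak-signal accounts.

    edges   : unique (account, destination) pairs (hypothetical schema)
    flagged : accounts carrying the weak risk signal
    Returns {destination: z} for destinations with z >= z_threshold.
    """
    edge_list = list(edges)
    accounts = {account for account, _ in edge_list}
    if not accounts:
        return {}
    # Global base rate of the weak signal across all accounts.
    p0 = len(accounts & flagged) / len(accounts)
    if p0 <= 0.0 or p0 >= 1.0:
        return {}  # z-score undefined when the signal never or always fires

    linked = defaultdict(int)  # accounts linked to each destination
    hits = defaultdict(int)    # flagged accounts linked to each destination
    for account, destination in edge_list:  # single pass over the edges
        linked[destination] += 1
        if account in flagged:
            hits[destination] += 1

    scores = {}
    for destination, n in linked.items():
        expected = n * p0
        std = math.sqrt(n * p0 * (1.0 - p0))
        z = (hits[destination] - expected) / std
        if z >= z_threshold:
            scores[destination] = z
    return scores


# Synthetic example: 100 ordinary accounts with their own destinations,
# plus a 30-account ring that cashes out to one shared destination.
edges = [(f"user{i}", f"dest{i}") for i in range(100)]
edges += [(f"ring{i}", "hub") for i in range(30)]
flagged = {f"user{i}" for i in range(0, 100, 10)} | {f"ring{i}" for i in range(20)}
result = amplify_weak_signal(edges, flagged, z_threshold=3.0)
print(sorted(result))
```

In this toy setup a flagged account on its own destination stays well below the threshold, while the shared destination stands out because many flagged accounts converge on it. That is the intuition behind relying on the structural invariant more than on the meaning of the signal itself.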

2605.07362 2026-05-11 stat.ME

Sufficient Dimension Reduction via Inverse Conditional Mean or Variance Independence

Jicai Liu, Yu Zhang, Jinhong Li

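AI summary This paper proposes a unified framework for sufficient dimension reduction (SDR) that generalizes several existing SDR methods and reveals a new connection between inverse conditional moment independence and dimension reduction. The framework rests on two forms of inverse independence between the response vector and the predictors, inverse conditional mean independence (ICMI) and inverse conditional variance independence (ICVI); for each form it constructs two classes of matrices that recover the central subspace, giving four estimators. The theory establishes convergence rates in high-dimensional settings, the methods are robust to outliers in the response, and simulations and real-data analyses demonstrate their effectiveness.

Details
English abstract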

This paper presents a unified framework for sufficient dimension reduction (SDR) that generalizes several existing SDR techniques and offers new insights into the connection between inverse conditional moment independence and dimension reduction. The framework is built on two forms of inverse independence between the response vector and predictors: inverse conditional mean independence (ICMI) and inverse conditional variance independence (ICVI). For each form, we develop two general classes of matrices capable of recovering the central subspace, based on projection and kernel techniques respectively. This yields four distinct estimators: projection- and kernel-based variants under both ICMI and ICVI frameworks. Under standard regularity conditions, we establish the theoretical properties of these estimators and derive their convergence rates in high-dimensional settings. The proposed methods exhibit robustness to outliers in the response variable while maintaining computational competitiveness. Simulation studies and real-data analyses demonstrate the practical effectiveness of the proposed methods.

2605.07312 2026-05-11 stat.ME

Incorporating Missing Data Considerations into Sample Size Calculations for Developing Clinical Prediction Models

Glen P. Martin, Sian Bladon, Rebecca Whittle, Molly Wells, Gary S. Collins, Richard D. Riley

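AI summary Clinical prediction models must be developed on sufficiently large datasets to reduce overfitting and ensure robust predictive performance. Existing sample size calculations assume complete predictor data for all included participants, but missing values are common in practice and may degrade model performance and increase the required sample size. Using a simulation study and applied examples, this study examines how missing predictor data affect calibration and overfitting, and shows how missing data assumptions and handling strategies can be embedded in a posterior-sampling sample size framework, offering a practical way to determine minimum sample sizes under missing data.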

Comments 35 pages, 5 figures (8 supplementary figures), 1 table (1 supplementary table)

Details
English abstract

Clinical prediction models must be developed using sufficiently large datasets to minimise overfitting and ensure robust predictive performance. Existing sample size calculations assume complete predictor data for all included participants, yet missing values are common and may increase required sample sizes. This study aimed to quantify how missing predictor data and different imputation methods affect overfitting and model degradation, within datasets that adhere to current sample size criteria. We also aimed to explore how a general sample size framework based on anticipated posterior (sampling) distributions can be adapted to incorporate missing data assumptions and handling strategies. Using a simulation study, we found that in development data meeting current minimum sample size requirements, missing data reduced predictive performance, with expected calibration slopes frequently falling below the targeted value of 0.9. Increasing the required sample size to account for missing data reduced overfitting concerns, but the necessary inflation factor was context specific. In some scenarios, up to twice the minimum sample size was needed to achieve performance comparable to models developed using fully observed data. Expected value of perfect information calculations allowed quantification of the expected loss due to finite samples and missingness. Through two applied examples, we illustrate how embedding missing data assumptions and handling within the posterior sampling approach provides a principled way to determine required minimum sample sizes under missing data. Overall, missing predictor data increases minimum sample size requirements to develop stable and well-calibrated models. Our adaptations to recent posterior (sampling) sample size calculations offer a practical approach for incorporating missing data directly into sample size calculations.

2605.07309 2026-05-11 eess.SY cs.SY stat.AP

Variational PMB filter via coordinate descent Kullback-Leibler divergence minimisation

Ángel F. García-Fernández, Yuxuan Xia

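AI summary This paper presents a new derivation of the variational Poisson multi-Bernoulli (V-PMB) filter for multi-target estimation. By introducing an augmented space that contains the target states with their track indices together with the global hypothesis variable, it interprets the V-PMB projection as coordinate descent Kullback-Leibler divergence minimisation on that space, fitting the best possible PMB density to the PMBM posterior. The paper also shows that the projection preserves the probability hypothesis density of the posterior and, through comparison with other PMB filter variants, illustrates the advantages of the V-PMB filter when targets come into close proximity and then separate.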

Comments Accepted in Proceedings of the 29th International Conference on Information Fusion, 2026. Matlab code available at https://github.com/Agarciafernandez/MTT

Details
English abstract

This paper presents a new derivation of the variational Poisson multi-Bernoulli (V-PMB) filter for multi-target estimation proposed in [#Williams15]. The proposed derivation is based on considering an augmented space that includes the set of target states with their track indices and the global hypothesis variable. Then, we show that the V-PMB projection performs a coordinate descent Kullback-Leibler divergence (KLD) minimisation on this augmented space to fit the best possible PMB density to the Poisson multi-Bernoulli mixture (PMBM) posterior. We also show that this V-PMB projection keeps the probability hypothesis density of the posterior. The paper also includes a comparison with the PMBM filter and other PMB filter variants, including a track-oriented Murty-based implementation, a track-oriented loopy belief propagation implementation and a global nearest neighbour implementation, showing the benefits of the V-PMB filter compared to the other PMB filters when targets get in close proximity and then separate.

2605.07300 2026-05-11 stat.ME stat.AP

A Beta-GAM Hidden Markov Model for Proportion Time Series

Andrea Nigri, Han Lin Shang, Marco Bonetti

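AI summary This paper proposes a hidden Markov model for proportion time series taking values in the unit interval. Observations follow a Beta distribution whose mean is linked to covariates through a generalized additive model (GAM), with a state-specific precision parameter, allowing flexible modeling of nonlinear covariate effects and state-dependent variability. Estimation uses a penalized expectation-maximization algorithm, the number of latent states and the smoothing penalty are selected with information criteria, and uncertainty is quantified by parametric bootstrap. Simulations and an application to Russian age-specific mortality data show that the model captures regime-switching dynamics and identifies latent structural changes.

Details
English abstract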

We propose a hidden Markov model for univariate proportion time series taking values in (0,1), where regime switching captures latent structural changes and the emission distribution belongs to the Beta family. In each latent state, the Beta mean is linked to covariates through a generalized additive model (GAM) with spline-based smooth functions, while the Beta precision is state-specific, enabling flexible modeling of both nonlinear covariate effects and regime-dependent variability. Estimation is carried out via a penalized expectation--maximization algorithm, combining smoothing with numerical maximization of the penalized emission likelihood. To select the number of latent states and the smoothing penalty, we implement a grid search guided by standard information criteria (Akaike Information Criterion/Bayesian Information Criterion/Integrated Completed Likelihood) with a diagnostic filter that removes degenerate solutions characterized by explosive precision estimates. Uncertainty is quantified through a parametric bootstrap procedure for transition probabilities and state-dependent parameters. Simulation results demonstrate accurate recovery of transition dynamics, state precisions, and latent-state decoding. A motivating application to Russian age-specific mortality data (1960--2014, ages 0--40) illustrates how the proposed model summarizes smooth age patterns in female-to-total mortality ratios while identifying two persistent latent regimes that admit a substantive demographic interpretation in light of the country's well-documented mortality shocks that occurred over the second half of the twentieth century.
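
For readers unfamiliar with the mean-precision form of the Beta distribution, the state-dependent emission model described above can be sketched as follows. The logit link $g$, the intercepts $\beta_{0,k}$, and the smooth terms $f_{j,k}$ are notational assumptions made for illustration; the abstract states only that the Beta mean follows a GAM and that the precision $\phi_k$ is state-specific.

```latex
\begin{align*}
Y_t \mid S_t = k &\sim \mathrm{Beta}\bigl(\mu_{t,k}\,\phi_k,\ (1-\mu_{t,k})\,\phi_k\bigr), \\
g(\mu_{t,k}) &= \beta_{0,k} + \sum_{j} f_{j,k}(x_{t,j}), \qquad g(\mu) = \log\frac{\mu}{1-\mu}, \\
\mathbb{E}[Y_t \mid S_t = k] &= \mu_{t,k}, \qquad
\operatorname{Var}(Y_t \mid S_t = k) = \frac{\mu_{t,k}\,(1-\mu_{t,k})}{1+\phi_k}.
\end{align*}
```

Under this parametrization a larger $\phi_k$ means less dispersion around the mean, which is presumably why very large ("explosive") precision estimates signal a degenerate fit.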

2605.07297 2026-05-11 stat.ML cs.LG

Spectrum-Adaptive Generalization Bounds for Trained Deep Transformers

Mana Sakai, Masaaki Imaizumi

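AI summary This paper studies why trained Transformers generalize well and proposes spectrum-adaptive post hoc generalization bounds. Under layerwise spectral norm control, the bounds are expressed through Schatten quantities of the query-key, value, and feedforward weight matrices; these can be chosen adaptively from the singular-value profiles after training, trading off spectral complexity against dimension- and depth-dependent factors. Experiments suggest that, compared with norm-based bounds, the proposed complexity proxies grow more slowly with depth and hidden dimension.

Details
English abstract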

Understanding why trained Transformers generalize well is a fundamental problem in modern machine learning theory, and complexity-based generalization bounds provide a principled way to study this question. While existing norm-based bounds for Transformers remove the explicit polynomial dependence on the hidden dimension, they typically impose fixed norm constraints specified a priori and can exhibit unfavorable exponential dependence on depth. In this paper, we derive spectrum-adaptive post hoc generalization bounds for multi-layer Transformers. Under layerwise spectral norm control, the bounds are expressed in terms of layerwise Schatten quantities of the query-key, value, and feedforward weight matrices. Since the Schatten indices need not be fixed a priori and can instead be selected after training, separately for each matrix type and layer, the bounds adaptively trade off spectral complexity against the dimension- and depth-dependent factors according to the learned singular-value profiles. Empirical comparisons of BERT-adapted proxies for the leading complexity factors suggest that the proxies induced by our bounds grow more slowly with depth and hidden dimension than the corresponding norm-based proxies. Overall, our results provide a complexity-based perspective on how the spectral structure of trained Transformers is reflected in generalization analyses.

2605.07225 2026-05-11 stat.AP

Spatiotemporal dynamics of wind-speed volatility

Ariane Nidelle Meli Chrisko, Philipp Otto

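AI summary This paper studies the spatiotemporal dynamics of wind-speed volatility using daily wind-speed observations at 10 m and 100 m from 141 stations in Northern Italy over 2016 to 2021. It adopts a parsimonious spatiotemporal volatility model with GARCH-type dynamics, in which the conditional variance depends on past local shocks and spatial information from neighbouring stations, combined with distance-based and directionally informed weight matrices. The results show that properly modelling spatial dependence in the mean is essential for well-behaved residuals and reliable inference, that volatility persistence increases with height, and that a multivariate extension reveals cross-height dependence.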

Comments Submitted to Environmetrics. 6 figures, 11 tables

English abstract

Wind-speed processes exhibit substantial temporal variability and spatial dependence, yet volatility dynamics across monitoring networks remain relatively unexplored. This study investigates the spatiotemporal behaviour of wind-speed volatility using daily observations from 141 stations in Northern Italy over 2016--2021, with measurements at 10 m and 100 m enabling the analysis of spatial and vertical dependence. We adopt a parsimonious spatiotemporal volatility framework based on GARCH-type dynamics, in which conditional variance depends on past local shocks and spatially aggregated information from neighbouring stations. The approach combines a spatial mean specification with structured volatility models using distance-based and directionally informed weight matrices. Results show that properly modelling spatial dependence in the mean is essential for well-behaved residuals and reliable inference. Forecast performance is strongly driven by the mean specification: flexible structures perform better when residual spatial dependence remains, while parsimonious distance-based models yield robust out-of-sample forecasts once spatial interactions are captured. Persistence increases with height, and a multivariate extension reveals cross-height dependence.

2605.07218 2026-05-11 cs.LG stat.ML

Improved Model-based Reinforcement Learning with Smooth Kernels

Kun Long, Yuqiang Li, Xianyi Wu

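AI summary This paper studies model-based reinforcement learning in continuous state-action spaces and proposes an improved kernel-smoothing approach that exploits the smoothness of the MDP through non-parametric kernel smoothing estimates. By incorporating a Bernstein-style exploration bonus, the method attains a regret bound in the finite-horizon setting that improves on existing methods in its dependence on the horizon. The analysis also yields a new Bernstein-type concentration inequality for martingales that may be of independent interest.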

Comments 38 pages, 5 figures

English abstract

For continuous state-action space scenarios, classical reinforcement learning (RL) theory predominantly focuses on low-rank Markov decision processes (MDPs), which provide sample-efficient guarantees at the expense of restrictive structural assumptions. Kernel smoothing model-based approaches offer a promising alternative paradigm that instead leverages the smoothness of the MDP and employs non-parametric kernel smoothing estimates of transition dynamics. This paper proposes a new kernel-smoothing model-based approach for online reinforcement learning in finite-horizon settings under Lipschitz continuity assumptions on the MDP. By incorporating a Bernstein-style exploration bonus into the kernel smoothing framework, our method achieves a regret bound which improves upon the state-of-the-art regret bound in its dependence on the horizon. The theoretical advancement relies on a delicate analysis of the synergy between Bernstein-style bonuses and kernel smoothing, where a new tight Bernstein-type concentration inequality for martingales may be of independent interest.

2605.07171 2026-05-11 cs.LG cs.SY eess.SY stat.ML

Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

Ishank Juneja, Carlee Joe-Wong, Osman Yağan

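AI summary This paper studies multi-armed bandits with cost subsidy, where the goal is to minimize total cost subject to a minimum-reward constraint. For the setting in which the reward constraint is specified relative to the unknown best reward, the authors propose the Cost-Ordered Feasibility (COF) algorithm, which combines samples from all arms to assess the feasibility of cheap arms, and they prove upper bounds on its cumulative cost and quality regret. Experiments indicate that COF improves on existing methods both in its theoretical bounds and in empirical performance.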

English abstract

The classic multi-armed bandit (MAB) problem tackles the challenge of accruing maximum reward while making decisions under uncertainty. However, in applications, often the goal is to minimize cost subject to a constraint on the minimum permissible reward, an objective captured by multi-armed bandits with cost-subsidy (MAB-CS). Of interest to this paper is the setting where the quality (reward) constraint is specified relative to the unknown best reward and the cost of each arm is known. We characterize the expected sub-optimal samples required by any policy by proving instance-dependent lower bounds that offer new insight into the problem and are a strict generalization of prior bounds. Then, we propose an algorithm called Cost-Ordered Feasibility (COF) that leverages our insight and intelligently combines samples from all arms to gauge the feasibility of a cheap arm. Thereafter, we analyze COF to establish instance-dependent upper bounds on its expected cumulative cost and quality regret, i.e., relative to the cheapest feasible arm. Finally, we empirically validate the merits of COF, comparing it to baselines from the literature through extensive simulation experiments on the MovieLens and Goodreads datasets as well as representative synthetic instances. Not only does our paper develop qualitatively better theoretical regret upper bounds, but COF also convincingly demonstrates improved empirical performance.

2605.07120 2026-05-11 cs.LG stat.ML

When Symbol Names Should Not Matter: A Logistic Theory of Fresh-Symbol Classification

Wenjie Guan, Jelena Bradic

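AI summary This paper asks whether, in fixed-label classification, a model can reason from abstract templates rather than concrete symbol names. The authors analyze regularized kernel logistic classification, characterize the perturbation caused by accidental symbol overlaps in the training data, and encode these overlaps with a colored collision graph. They prove high-probability margin-transfer guarantees for fresh-symbol classification and show that vocabulary size and collision geometry affect performance differently, giving a new theoretical view of symbol abstraction and generalization.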

English abstract

Template tasks have emerged as a clean testbed for asking whether transformers reason with abstract symbols rather than concrete token names. We study the fixed-label classification version of this problem, where train and test examples share latent templates but may use disjoint vocabularies. Unlike next-token prediction, the model need not emit unseen symbols; it must learn a decision rule invariant to symbol renaming. We analyze regularized kernel logistic classification in the transformer-kernel regime. Our main result decomposes the learned predictor into an ideal template-level classifier and a finite-sample perturbation caused by accidental token overlaps in the training data. We encode these overlaps by a colored collision graph and prove high-probability margin-transfer guarantees for fresh-symbol classification. This perspective extends template-based analyses to logistic classification and refines scalar diversity conditions: vocabulary size controls the average rate of collisions, but collision geometry controls whether the ideal classification margin is preserved. More broadly, the same perturbation framework applies to abstraction-augmented inputs, yielding a general margin-versus-collision criterion for identifying when prompting strategies improve fresh-symbol generalization. Synthetic template experiments illustrate the predicted roles of regularization, sample size, and transformer-kernel structure.

2605.07119 2026-05-11 stat.ML cs.LG

Classification Fields: Arbitrarily Fine Recursive Hierarchical Clustering From Few Examples

Yicen Li, Ruiyang Hong, Anastasis Kratsios, Haitz Sáez de Ocáriz Borde, Paul D. McNicholas

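AI summary This paper introduces classification fields, infinite-depth hierarchical cluster structures, for learning from few examples a rule that recursively generates fine-grained hierarchical clusterings. A local parent-to-child refinement rule generates cluster centres, Voronoi cells, and a metric directed acyclic graph (DAG) encoding the hierarchy. The paper proves exponential truncation convergence of the learned model in the completed cell metric and validates experimentally that it can generate hierarchies while preserving geometry and path metrics.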

English abstract

Classical clustering methods usually return either a finite partition of the observed data or a finite dendrogram over it. This finite-sample view is inadequate when the hierarchy of interest is a recursive geometric object with fine-scale refinements that continue beyond the levels directly observed. We introduce classification fields: infinite-depth hierarchical cluster structures on $\mathbb{R}^d$ generated by a local parent-to-child refinement rule. A classification field generator maps each parent centre to an ordered, bounded, and separated tuple of child residuals. Together with a root and a scale factor, this rule recursively generates cluster centres, Voronoi cells, and a metric DAG encoding the hierarchy. Given only a finite prefix of such a hierarchy, we learn a classification field predictor that approximates the generator and can be rolled out to unseen depths. We prove exponential truncation convergence in the completed cell metric and ReLU realizability with width $O(\varepsilon^{-γ})$ and depth $\widetilde O(\varepsilon^{-3γ/2})$, where $γ=\log K/(-\log s)$, up to finite-window aspect-ratio factors. The approximation holds at the level of the induced compact metric structures, measured in the completed cell-metric Hausdorff distance. Experimental validation on matched CFG-generated hierarchies, IFS fractals, and image-induced recursive clustering hierarchies shows that learned predictors preserve ordered child slots, unordered geometry, and hierarchy-level path metrics under recursive rollout. These results support the claim that finite hierarchical observations can reveal local refinement rules capable of generating substantially deeper classification fields.

2605.07115 2026-05-11 cs.LG stat.ML

Conformal-Style Quantile Analyses for Stochastic Bandits

Chengyu Du, Mengfan Xu

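AI summary This paper studies stochastic multi-armed bandits in which arms are judged by strong upper-tail performance rather than the usual mean-reward criterion. The authors propose a conformal-style framework for upper-quantile analysis and design the ACP-UCB1 algorithm, which combines an adaptive conformal estimate with a UCB-type optimism bonus. The method achieves logarithmic upper-quantile regret, and the theoretical analysis and experiments indicate improvement over the classical UCB1 algorithm.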

English abstract

Stochastic bandit algorithms are usually analyzed under a mean-reward criterion, yet many problems favor arms with strong upper-tail performance, which we study herein. For a fixed miscoverage level \(α\), the natural upper-tail target of arm \(j\) is the upper endpoint \(F_j^{-1}(1-α/2)\) of a central prediction interval. This target can rank arms differently from their means, creating a central mismatch with the classical bandit objective. To this end, we propose ACP-UCB1, a conformal-style policy that combines an adaptive conformal estimate of the upper endpoint with a UCB-type optimism bonus. The technical challenge is that the conformity scores used by ACP-UCB1 are recomputed from evolving empirical quantile estimates and evaluated at an adaptive level. We control this endpoint through reward-quantile concentration, a perturbation argument for recomputed score quantiles, and deterministic localization of the adaptive level. ACP-UCB1 achieves logarithmic upper-quantile regret with per-arm contribution \(O(\nicefrac{\log n}{Δ_j^{\mathrm{ACP}}})\). We also provide metric-specific regret decompositions comparing ACP-UCB1 with UCB1 and use numerical experiments to validate performance and improvement.
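
The mismatch between the mean criterion and the upper-tail target can be seen in a small deterministic sketch. The two reward samples below are hypothetical, and the code only compares empirical means with empirical upper quantiles at level 1 - α/2; it is not the ACP-UCB1 policy.

```python
import math

def empirical_quantile(samples, q):
    """Smallest sample value whose empirical CDF is at least q."""
    ordered = sorted(samples)
    k = min(len(ordered), max(1, math.ceil(q * len(ordered))))
    return ordered[k - 1]

def mean(samples):
    return sum(samples) / len(samples)

alpha = 0.1  # miscoverage level; the target is the (1 - alpha/2) quantile

# Hypothetical reward samples for two arms.
arm_steady = [0.6] * 100               # always pays 0.6
arm_bursty = [0.0] * 80 + [1.0] * 20   # pays 1.0 one time in five

for name, rewards in (("steady", arm_steady), ("bursty", arm_bursty)):
    print(name, mean(rewards), empirical_quantile(rewards, 1 - alpha / 2))
```

The steady arm has the higher mean, while the bursty arm has the higher upper quantile, so the two criteria rank the arms in opposite orders.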

2605.07104 2026-05-11 cs.LG math.OC stat.ML

Almost Sure Convergence Rates of Stochastic Approximation and Reinforcement Learning via a Poisson-Moreau Drift

Xinyu Liu, Zixuan Xie, Shangtong Zhang

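AI summary This paper studies almost sure convergence rates of stochastic approximation and reinforcement learning algorithms under Markovian noise. For a class of algorithms whose expected updates are contractive (such as Q-learning and linear temporal difference learning), the authors propose a Lyapunov drift construction with a Poisson-equation-based correction, obtaining near-optimal convergence rates for power-law and harmonic learning rates. The approach provides a new analytical tool for understanding the convergence of reinforcement learning algorithms under non-i.i.d. noise.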

English abstract

Establishing almost sure convergence rates for stochastic approximation and reinforcement learning under Markovian noise is a fundamental theoretical challenge. We make progress towards this challenge for a class of stochastic approximation algorithms whose expected updates are contractive, a setting that arises in many reinforcement learning algorithms such as $Q$-learning and linear temporal difference learning. Specifically, for a power-law learning rate $O(n^{-η})$ with $η\in (1/2, 1)$, we obtain an almost sure convergence rate arbitrarily close to $o(n^{1 - 2η})$. For a harmonic learning rate $O(n^{-1})$, we obtain an almost sure convergence rate arbitrarily close to $o(n^{-1})$, which we argue is a strong result because it is close to the optimal rate $O(n^{-1}\log\log n)$ given by the law of the iterated logarithm (for a special case of i.i.d. noise). Key to our analysis is a novel Lyapunov drift construction that applies a Poisson-equation based correction for Markovian noise to the well-established Moreau-envelope smoothing for the contractive mapping.

2605.07101 2026-05-11 cs.MA stat.ML

Decentralized Diffusion Policy Learning for Enhanced Exploration in Cooperative Multi-agent Reinforcement Learning

Yuyang Zhang, Haldun Balim, Na Li

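AI summary This paper studies exploration in cooperative multi-agent reinforcement learning and shows that existing decentralized policy gradient methods based on Gaussian policies explore poorly as the number of agents grows. It proposes decentralized diffusion policy learning (DDPL), built on denoising diffusion probabilistic models, which produces multi-modal action distributions to improve exploration and is trained online efficiently via importance sampling score matching. Experiments show that DDPL performs well on several continuous-action multi-agent benchmarks.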

English abstract

Cooperative multi-agent reinforcement learning (MARL) involves complex agent interactions and requires effective exploration strategies. A prominent class of MARL algorithms, decentralized softmax policy gradient (DecSPG), addresses this through energy-based policy updates. In practice, however, such energy-based policies are intractable to maintain and are commonly projected onto the Gaussian policy class. In this work, we show that the limited expressiveness of Gaussian policies severely hinders exploration in DecSPG, and this limitation worsens as the number of agents grows. To address this issue, we propose decentralized diffusion policy learning (DDPL), which parameterizes each agent's policy with a denoising diffusion probabilistic model, an expressive generative model that captures multi-modal action distributions for enhanced exploration. DDPL enables efficient online training of diffusion policies via importance sampling score matching (ISSM), a novel training method with theoretical guarantee. We evaluate DDPL on representative continuous-action MARL benchmarks, including multi-agent particle environment, multi-agent MuJoCo, IsaacLab, and JAX-reimplemented StarCraft multi-agent challenge, and observe consistently improved performance.

2605.07100 2026-05-11 stat.ML cs.LG

TRACE: Transport Alignment Conformal Prediction via Diffusion and Flow Matching Models

Zhenhan Fang, Aixin Tan, Jian Huang

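AI summary TRACE is a conformal prediction framework based on diffusion and flow matching models that aims to build valid and informative prediction regions for multi-dimensional outputs. It defines nonconformity scores through transport alignment, avoiding explicit likelihood evaluation and invertible transformations, and measures how well a candidate output fits the generative dynamics using only denoising or velocity-matching errors along stochastic transport trajectories. Experiments show that TRACE attains marginal coverage while adapting to multimodal and non-convex conditional distributions.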

Comments 22 pages, 5 figures and 5 tables

English abstract

Constructing valid and informative conformal prediction regions for multi-dimensional outputs remains a fundamental challenge. While conformal prediction provides finite-sample, distribution-free coverage guarantees, its practical performance critically depends on the choice of nonconformity score. Existing approaches often rely on restrictive geometric assumptions or require explicit likelihood evaluation and invertible transformations, limiting their applicability in complex generative settings. In this work, we introduce TRACE (TRansport Alignment Conformal Estimation), a conformal prediction framework that defines nonconformity through transport alignment in diffusion and flow matching models. Rather than evaluating likelihoods, we measure how well a candidate output aligns with the learned generative dynamics by averaging denoising or velocity-matching errors along stochastic transport trajectories. The resulting transport-based scores are scalar-valued and can be calibrated using split conformal prediction, yielding valid marginal coverage under exchangeability. We further analyze the statistical properties of the proposed scores and their sensitivity to computational budget. Experiments on synthetic and real datasets demonstrate valid coverage and show that the resulting regions adapt naturally to multimodal and non-convex conditional distributions.
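
The calibration step mentioned in the abstract, split conformal prediction on scalar scores, can be sketched as follows. The calibration scores here are hypothetical placeholders; in TRACE they would be the transport-based scores, whose computation is not shown.

```python
import math

def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score.

    Returns infinity when the calibration set is too small for the
    requested level, in which case the region is the whole output space.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]

def in_region(candidate_score, threshold):
    """A candidate output belongs to the region if its score is small enough."""
    return candidate_score <= threshold

# Hypothetical scalar nonconformity scores on a held-out calibration set.
scores = [float(i) for i in range(1, 101)]
threshold = conformal_threshold(scores, alpha=0.1)
print(threshold, in_region(50.0, threshold), in_region(95.0, threshold))
```

Under exchangeability of calibration and test points, a region built this way covers the true output with probability at least 1 - alpha, whatever score function is used; the score only affects how informative the region is.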

2605.07097 2026-05-11 stat.ML cs.LG cs.NE math.LO math.ST stat.TH

Every Feedforward Neural Network Definable in an o-Minimal Structure Has Finite Sample Complexity

Anastasis Kratsios, Gregory Cousins, Haitz Sáez de Ocáriz Borde, Bum Jun Kim, Simone Brugiapaglia

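AI summary This paper proves that a broad class of feedforward neural networks has finite sample complexity in the PAC model: any feedforward network with a fixed number of layers, each definable in an o-minimal structure, has finite sample complexity even with unbounded parameters. The result applies to modern non-recurrent architectures such as standard fixed-size multilayer perceptrons, convolutional neural networks, graph neural networks, and Transformers with fixed sequence length, covering the operations and layers commonly used in them. The paper argues that distribution-free learnability of modern non-recurrent networks is not an exceptional property of particular activations or architecture-specific VC-dimension arguments but a consequence of tame feedforward computation.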

English abstract

We show that, in a precise sense, a broad class of feedforward neural networks learn (have finite sample complexity) in the PAC model: every fixed finite feedforward architecture whose layers are definable in an o-minimal structure has finite sample complexity in the agnostic PAC setting, even with unbounded parameters. This covers standard fixed-size MLPs, CNNs, GNNs, and transformers with fixed sequence length, together with the operations and layers typically used in such architectures, including linear projections, residual connections, attention mechanisms, pooling layers, normalization layers, and admissible positional encodings. Hence, distribution-free learnability for modern non-recurrent architectures is not an exceptional property of particular activations or architecture-specific VC arguments, but a consequence of tame feedforward computation. Our results reposition finite-sample PAC learnability as a baseline rather than a differentiator: they shift the focus of architectural comparison toward inductive biases, symmetries and geometric priors, scalability, and optimization behaviour.

2605.07087 2026-05-11 stat.ME

A Finite-Horizon Mixture Cure Model with Application to Online Flea Market Data

Yuji Komiyama, Yasumasa Matsuda, Masakazu Ishihara

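AI summary This paper proposes a finite-horizon mixture cure model for analyzing whether events occur within a specific time period, addressing the identifiability and interpretability problems of conventional models defined over an infinite horizon. By targeting decision objectives within a finite horizon, the approach reduces reliance on infinite-tail assumptions, and its advantages are shown in simulation studies and an empirical application. Applied to transaction data from Mercari, a Japanese online flea market platform, the model yields interpretations that better reflect seasonal variation in user behavior.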

English abstract

This study proposes a mixture cure model that latently divides a population based on event occurrence within a finite time horizon. Conventional models rely on event occurrence over an infinite horizon, introducing untestable assumptions that often lead to issues with identifiability and interpretability. By shifting the estimand to a specific period of interest, the proposed approach reduces reliance on these infinite-tail assumptions and aligns interpretations more closely with finite-horizon decision-making objectives. Through simulation studies, we first evaluate the statistical properties of the proposed estimator, including estimation bias and variance. We further show that relying on conventional infinite-horizon models for finite-horizon decision-making can lead to erroneous judgments. Finally, we apply the model to transaction data from Mercari, a Japanese online flea market platform. The empirical results reveal that the proposed model identifies different significant variables compared to the conventional model, offering interpretations that better reflect seasonal variation in user behavior.

2605.07072 2026-05-11 cs.LG cs.CR stat.ML

Less Random, More Private: What is the Optimal Subsampling Scheme for DP-SGD?

Andy Dong, Ayfer Özgür

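AI summary This paper studies the optimal subsampling scheme for differentially private stochastic gradient descent (DP-SGD), observing that although conventional Poisson subsampling simplifies privacy analysis, the participation variance it introduces weakens privacy amplification. The authors propose Balanced Iteration Subsampling (BIS), a structured scheme in which each sample participates in a fixed number of iterations. BIS achieves stronger privacy amplification than Poisson subsampling and is optimal in the two extremes where the noise tends to zero and to infinity. Experiments show that in low-noise regimes BIS reduces the required noise multiplier, improving the utility of private training.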

Comments 17 pages, 1 table. Submitted to NeurIPS 2026

English abstract

Poisson subsampling is the default sampling scheme in differentially private machine learning, largely because its unstructured randomness yields tractable privacy amplification analyses. Yet this same randomness introduces substantial participation variance: each sample appears in very different numbers of training iterations. In this work, we show that this variance is not merely a practical artifact to be tolerated, but a fundamental source of suboptimal privacy amplification. We prove that Balanced Iteration Subsampling (BIS), a structured scheme in which each sample participates in exactly a fixed number of iterations, achieves stronger privacy amplification than Poisson subsampling and is optimal at both extremes of the noise spectrum ($σ\to 0$ and $σ\to \infty$). Our analysis reveals that the privacy-noise tradeoff is governed not by maximizing randomness, but by eliminating participation variance while preserving uniform marginal participation across iterations. To translate this asymptotic theory into finite-noise guarantees, we introduce a practical near-exact Monte Carlo accountant for BIS, which removes the analytical slack of existing RDP and composition-based PLD analyses. Evaluations across more than 60 practical DP-SGD configurations show that BIS consistently outperforms Poisson subsampling in the low-noise regimes most relevant for high-utility private training, reducing the required noise multiplier by up to $9.6\%$. These results overturn the common intuition that more sampling randomness necessarily yields stronger privacy amplification: in DP-SGD, structured participation can be both more practical and more private. Our implementation is available at https://github.com/dong-xin-ao-andy/bis-mc-accountant.
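
The participation variance discussed in the abstract can be illustrated with a small simulation. This is a sketch under stated assumptions: the balanced scheme below lets each sample pick exactly `k` iterations uniformly at random, which is one plausible reading of the abstract and not necessarily the authors' BIS construction; all function names and parameter values are hypothetical, and no privacy accounting is performed.

```python
import random

def poisson_participation(num_samples, num_iters, rate, rng):
    """Each sample joins each iteration independently with probability `rate`."""
    return [sum(rng.random() < rate for _ in range(num_iters))
            for _ in range(num_samples)]

def balanced_schedules(num_samples, num_iters, k, rng):
    """Assumed balanced scheme: each sample joins exactly k distinct iterations."""
    return [sorted(rng.sample(range(num_iters), k)) for _ in range(num_samples)]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

rng = random.Random(0)
num_samples, num_iters, k = 200, 100, 10

poisson_counts = poisson_participation(num_samples, num_iters, k / num_iters, rng)
balanced_counts = [len(s) for s in balanced_schedules(num_samples, num_iters, k, rng)]

print("Poisson  participation variance:", variance(poisson_counts))
print("Balanced participation variance:", variance(balanced_counts))
```

Both schemes give each sample the same expected number of participations (`k`), but only the balanced one removes the sample-to-sample spread in participation counts.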

2605.07065 2026-05-11 stat.ML cs.AI cs.LG econ.EM

Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks

Gandharv Patil, Keyi Tang, Raquel Aoki, Leo Guelman

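AI summary This work addresses the identification of individual treatment effects and proposes Causal EpiNets, a method based on Epistemic Neural Networks for estimating bounds on individual-level causal effects more accurately in finite samples. By designing a neural architecture that satisfies structural constraints by construction and combining it with precision-corrected intersection-bound inference, the method remedies the failures of conventional plug-in estimators with respect to structural probability constraints and extremum bias. Experiments show that it maintains nominal coverage and constraint validity in high-dimensional settings, outperforming existing estimators.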

English abstract

Individual treatment effects are not point-identified from data. The Probability of Necessity and Sufficiency (PNS) circumvents this limitation by characterizing individual-level causality through intersection bounds derived from combined experimental and observational data. In finite samples, however, standard plug-in estimators systematically fail: they violate structural probability constraints and suffer from extremum bias induced by max-min operators, yielding spuriously narrow intervals. We propose a neural framework for finite-sample PNS estimation that resolves both pathologies. We introduce an anchored neural architecture that guarantees structural constraint satisfaction by construction. To correct extremum bias, we employ precision-corrected intersection-bound inference, leveraging Epistemic Neural Networks for scalable, high-dimensional uncertainty quantification. Empirical evaluations confirm that this approach maintains nominal coverage and exact constraint validity in high-dimensional regimes where standard estimators systematically undercover.

2605.07056 2026-05-11 cs.CY cs.HC cs.SI stat.AP

The University AI Didn't Replace -- Rethinking Universities in the AI Era

Karol P. Binkowski, Andrew Hopkins

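AI summary This paper discusses the changes and challenges facing university education in the AI era, noting that although generative AI is reshaping higher education, most universities remain at an early stage of adoption and lack systematic strategic integration. It proposes a four-level framework of AI adoption and illustrates the dynamics through a case study of AI-enabled curriculum initiatives. Its central argument is that universities need to move from isolated innovation to strategic integration, redesign learning around AI-supported reasoning, and align policies and recognition systems to support educational transformation.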

Comments 8 pages, 1 figure. Position paper on Generative AI and the transition from isolated educational innovation to institutionally supported adoption in higher education

English abstract

Generative artificial intelligence (AI) is reshaping higher education, yet many universities remain in early stages of adoption where AI innovation occurs informally and without institutional recognition. This paper presents a framework describing four levels of AI adoption in universities and illustrates these dynamics through a case study of AI-enabled curriculum initiatives in several units. We contend that the key institutional challenge is moving from isolated innovation to strategic integration, where universities redesign learning around AI-supported reasoning and align policies, workload models, and recognition systems to support educational transformation.

2605.07046 2026-05-11 stat.ML cs.AI cs.LG

An Interpretable and Scalable Framework for Evaluating Large Language Models

Xinhao Qu, Qiang Heng, Hao Zeng, Xiaoqian Liu

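AI summary This paper proposes an interpretable and scalable framework for evaluating large language models (LLMs), addressing the way standard benchmarking overlooks the stochasticity of model outputs and the heterogeneity of items. Based on the majorization-minimization principle, the method recasts evaluation as a sequence of constrained matrix factorization subproblems, enabling stable and efficient parameter estimation with theoretical guarantees of identifiability and convergence. Experiments on synthetic and real datasets show better scalability and interpretability, with large speedups over existing methods at comparable or higher estimation accuracy.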

English abstract

Evaluation of large language models (LLMs) is increasingly critical, yet standard benchmarking methods rely on average accuracy, overlooking both the inherent stochasticity of LLM outputs and the heterogeneity of benchmark items. Item Response Theory (IRT) offers a principled framework for modeling latent model abilities and item characteristics, but conventional methods are computationally expensive and numerically unstable, limiting large-scale implementations. To address these challenges, we propose an interpretable and scalable framework for LLM evaluation based on the majorization-minimization principle. Our approach reformulates the problem as a sequence of constrained matrix factorization subproblems, enabling stable and efficient parameter estimation with theoretical guarantees for identifiability and convergence. Experiments on synthetic and real-world datasets, including MATH-500 and six Open LLM Leaderboard benchmarks, demonstrate that our method achieves superior scalability and interpretability. It delivers orders-of-magnitude speedups over competing methods while maintaining comparable or even higher estimation accuracy. Our results align with established scaling laws and offer insights into item difficulty and discrimination, informing more principled benchmark design.

2605.07029 2026-05-11 stat.ML cs.AI cs.LG stat.ME

BGM-IV: an AI-powered Bayesian generative modeling approach for instrumental variable analysis

Guyue Luo, Qiao Liu

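AI summary This paper proposes BGM-IV, a Bayesian generative modeling approach for estimating causal effects in nonlinear instrumental-variable regression. By constructing a causally structured latent space, it recasts nonlinear IV regression as posterior inference, handling high-dimensional covariates and complex nonlinear relations more effectively. BGM-IV addresses endogeneity through an instrument-induced pseudo-likelihood and performs well on several benchmark datasets, especially in high-dimensional covariate settings.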

English abstract

Instrumental-variable (IV) regression enables causal estimation under endogeneity, but modern IV problems often involve nonlinear structural effects and high-dimensional covariates. Existing nonlinear IV methods directly learn the causal relation in observed feature space or rely on learned representations within two-stage or moment-based procedures, which can struggle when the causal information is embedded in a high-dimensional representation. We propose BGM-IV, a latent Bayesian generative modeling approach that reframes nonlinear IV regression as posterior inference in a causally structured latent space. BGM-IV infers latent components that separately capture shared confounding structure, outcome-specific variation, treatment-specific variation, and covariate-only nuisance information. To account for endogeneity, BGM-IV replaces the confounded outcome likelihood with an IV-integrated pseudo-likelihood that averages over instrument-induced treatment values within the latent model. Across various benchmark datasets, BGM-IV remains competitive in the classical low-dimensional regime and performs best in high-dimensional covariate regimes. Together, these results show that structured latent generative modeling provides a principled and effective strategy for nonlinear IV estimation with rich covariates. The code of BGM-IV is available at https://github.com/liuq-lab/BGM-IV.

2605.07002 2026-05-11 cs.AI math.ST stat.ML stat.TH

Adaptive auditing of AI systems with anytime-valid guarantees

Siyu Zhou, Patrick Vossler, Venkatesh Sivaraman, Yifan Mai, Jean Feng

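AI summary This paper studies how to audit generative AI systems adaptively under a limited annotation budget while keeping statistical inference rigorous. The authors propose a hypothesis testing framework with two 'dueling' nulls, one from the model's perspective and one from the auditor's, and use Safe Anytime-Valid Inference (SAVI) to cast the audit as 'testing by betting', so that both opposing hypotheses are tested simultaneously. They show that when the auditor is sufficiently powerful, passing a stringent audit certifies the AI system as globally robust, and experiments confirm the method's type-I error control and statistical power.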

English abstract

A major bottleneck in characterizing the failure modes of generative AI systems is the cost and time of annotation and evaluation. Consequently, adaptive testing paradigms have gained popularity, where one opportunistically decides which cases and how many to annotate based on past results. While this framework is highly practical, its extreme flexibility makes it difficult to draw statistically rigorous conclusions, as it violates classical assumptions: the number of observations is typically limited (often 10 to 50 cases) and decisions regarding sampling and stopping are made in the midst of data collection rather than based on a pre-specified rule. To characterize what statistical inferences can be drawn from highly adaptive audits, we introduce a hypothesis testing framework from two 'dueling' perspectives: (i) the model's null that asserts there is no failure mode with performance below a target threshold versus (ii) the auditor's null that asserts they have a sampling strategy that will uncover a failure mode. Leveraging Safe Anytime-Valid Inference (SAVI), we formalize the auditor as conducting 'testing by betting', which translates into simultaneous e-processes for testing the dueling null hypotheses. Furthermore, if the auditor is sufficiently powerful, we prove that these two hypotheses are asymptotically inverses of each other, in that passage of a stringent audit does in fact certify the AI system as being globally robust. Empirically, we demonstrate that our proposed testing procedures maintain anytime-valid type-I error control, outperform pre-specified testing methods, and can reach statistically rigorous conclusions sometimes with as few as 20 observations.
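
For readers unfamiliar with 'testing by betting', here is a generic textbook-style sketch of a single betting test, not the paper's dueling procedure. It assumes binary outcomes (1 = failure), a null stating that each outcome's conditional failure probability is at most `p0`, and a fixed bet size; the function name and all parameter values are hypothetical.

```python
def betting_test(failures, p0, bet, alpha):
    """Sequential test of the null 'failure probability <= p0' by betting.

    Wealth starts at 1 and is multiplied by 1 + bet * (x - p0) after each
    outcome x in {0, 1}. With 0 <= bet <= 1/p0 the wealth stays non-negative
    and is a supermartingale under the null, so by Ville's inequality the
    chance it ever reaches 1/alpha is at most alpha, whenever one stops.
    """
    if not 0.0 <= bet <= 1.0 / p0:
        raise ValueError("bet must lie in [0, 1/p0]")
    wealth = 1.0
    for t, x in enumerate(failures, start=1):
        wealth *= 1.0 + bet * (x - p0)
        if wealth >= 1.0 / alpha:
            return t, wealth  # reject the null after t observations
    return None, wealth       # never rejected on this sequence

# Hypothetical audit outcomes: every annotated case is a failure.
stop, wealth = betting_test([1, 1, 1, 1, 1], p0=0.1, bet=2.0, alpha=0.05)
print(stop, wealth)
```

Because the guarantee holds at every stopping time, the auditor may decide after each observation whether to continue, which is what makes this style of test compatible with adaptive sampling and stopping.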

2605.06993 2026-05-11 cs.AI stat.ML

Optimal Experiments for Partial Causal Effect Identification

Tobias Maringgele, Jalal Etesami

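AI summary This work studies how, when causal effects are only partially identifiable from observational data, to choose cost-constrained experiments that maximally tighten the bounds on a causal effect. The authors formulate the max-potency problem and prove that it is NP-hard. By introducing pruning criteria based on the causal graph, they substantially shrink the search space of candidate experiments, validate the approach on several benchmark networks, and demonstrate it on real data.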

English abstract

Causal queries are often only partially identifiable from observational data, and experiments that could tighten the resulting bounds are typically costly. We study the problem of selecting, prior to observing experimental outcomes, a cost-constrained subset of experiments that maximally tightens bounds on a target query. We formalize this as the max-potency problem, where epistemic potency measures the worst-case reduction in bound width guaranteed by an experiment, and show that this problem is NP-hard via a reduction from 0-1 knapsack. Building on the polynomial-programming framework of Duarte et al. (2023), we give a general procedure for evaluating epistemic potency in discrete settings. To control the super-exponential search space, we introduce two graphical pruning criteria that depend only on the causal graph and the query: a novel path-interception rule that exploits district structure to certify zero potency in linear time, and an identifiability check based on the ID algorithm. On Erdos-Renyi random graphs and 11 bnlearn benchmark networks, the two criteria together prune 50-88% of candidate experiments on average without solving a single polynomial program. For the general subset search, we show that ID-pruned experiments are combinatorially inert, yielding a super-exponential reduction in the number of subsets evaluated. We close with an end-to-end demonstration on observational NHANES data, selecting optimal experiments for estimating the effect of physical activity on diabetes.

2605.06992 2026-05-11 cs.LG stat.ML

Why Does Agentic Safety Fail to Generalize Across Tasks?

Yonatan Slutzky, Yotam Alexander, Tomer Slor, Yoav Nagel, Nadav Cohen

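AI summary As AI agents are increasingly deployed in multi-task settings, executing unseen tasks safely becomes a key concern. Through theory and experiments, this paper shows that agentic safety is hard to generalize across tasks not only because of limitations of training methods but because of complexity inherent to safety itself. Analyzing linear-quadratic control with $H_{\infty}$-robustness, the authors prove that safety requirements increase the Lipschitz constant of the map from task to controller, confirm the conclusion in quadcopter navigation and CRM tasks, and suggest that current approaches to improving agentic safety may be fundamentally insufficient.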

English abstract

AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and experiments indicating that failures of agentic safety to generalize across tasks are not merely due to limitations of training methods, but reflect an inherent property of safety itself: the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone. Theoretically, we analyze linear-quadratic control with $H_{\infty}$-robustness, and prove that the mapping from task specification to an optimal controller has higher Lipschitz constant with safety requirements than without, yielding a Lipschitz bound of independent interest. Empirically, we demonstrate our conclusions in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent. Our findings suggest that current efforts to enhance agentic safety may be insufficient, and point to a need for fundamentally different approaches.

2605.06989 2026-05-11 stat.AP cs.AI cs.LG stat.ME

Drawing Lines in Psychological Space: What K-means Clustering Reveals in Simulated and Real Psychometric Data

Pedro Henrique Ramos Pinto, Maria Jullyanna Ferreira Marques, Luiz Carlos Serramo Lopez

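AI summary This study examines the use of K-means clustering on psychometric data. Although the classical method is widely used to identify psychological subgroups and typologies, it does not test whether such groups actually exist. Using controlled simulated datasets and the international SMARVUS psychometric dataset, the study finds that K-means produces stable, visually coherent clusters even in continuous Gaussian latent spaces with no true subgroup structure, so a clean partition of psychological space is not by itself evidence of latent categories.

Comments Methodological study on K-means clustering in psychometric data using simulated and empirical datasets

Details
English abstract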

K-means clustering is widely used in psychological and psychometric research to identify profiles, subgroups, and potential typologies, yet its classical formulation does not test whether such groups exist as latent psychological categories. Instead, K-means partitions multidimensional space into regions around centroids, favoring compact, approximately spherical clusters defined by geometric distance. In this paper, we examine this limitation through a sequence of controlled simulated datasets. We then extend the analysis to the SMARVUS dataset, a large international psychometric dataset comprising survey responses from university students across 35 countries, to evaluate whether similar geometric partitioning patterns emerge in empirical psychological data. By contrasting simulated and empirical data, this paper argues that K-means can produce stable and visually coherent clustering solutions even in continuous Gaussian latent spaces without true subgroup structure.
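
The point about partitioning without true subgroups can be illustrated with a minimal sketch. It assumes a plain Lloyd's-algorithm implementation written for this note (the `kmeans` helper and the simulated data are hypothetical, not the authors' code) and runs it on a single isotropic Gaussian, where no subgroup structure exists by construction.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm (illustrative helper); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids at k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# One continuous Gaussian cloud: there are no latent categories to find.
rng = np.random.default_rng(42)
data = rng.standard_normal((2000, 2))

labels, centroids = kmeans(data, k=2)
sizes = np.bincount(labels, minlength=2)

# Within-cluster sum of squares relative to the total sum of squares.
within = sum(((data[labels == j] - centroids[j]) ** 2).sum() for j in range(2))
total = ((data - data.mean(axis=0)) ** 2).sum()

print("cluster sizes:", sizes)
print("within / total sum of squares:", within / total)
```

The algorithm returns two non-empty regions and a reduced within-cluster sum of squares even though the data contain a single population, which is why a tidy solution alone says nothing about whether latent groups exist.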

2605.06987 2026-05-11 cs.LG cs.GT econ.TH stat.ML

Response Time Enhances Alignment with Heterogeneous Preferences

Federico Echenique, Alireza Fallah, Baihe Huang, Michael I. Jordan

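AI summary This paper studies how to improve the alignment of large language models with human preferences when labelers have heterogeneous preferences. Standard methods build a reward model from pooled binary choice data but ignore preference differences across labelers, so the true population-average preference cannot be learned accurately. The authors propose using response time as a supplementary signal and, by modeling decisions with a Drift-Diffusion Model (DDM), design a new estimator of heterogeneous preferences that corrects the bias of choice-only methods and outperforms baselines on synthetic and real-world datasets. The method requires no user identifiers, which makes it practical.

Details
English abstract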

Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple, secondary signal -- the user's response time -- can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of heterogeneous preferences that successfully corrects the distortions of standard choice-only labels. We prove that our estimator asymptotically converges to the true average preference even in extreme cases where each anonymous labeler contributes only a single choice. Empirically, across both synthetic and real-world datasets, our method consistently outperforms standard baselines that otherwise fail and plateau at a bias floor. Because response times are essentially free to record and require zero user tracking or identification, our results bring promises and open up new opportunities for future data-collection pipelines to improve the social benefit without requiring user-level identifiers or repeated elicitations.

2605.06979 2026-05-11 cs.LG cs.AI stat.ML

PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction

Jonathn Chang, Arya Datla, Ziv Goldfeld

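AI summary This paper proposes PLOT, a method that uses optimal transport to progressively localize causal variables in neural causal abstraction. It fits an optimal transport coupling between abstract variables and candidate neural sites, obtaining a global soft correspondence that is then calibrated into intervention handles, which localizes causal variables efficiently. Experiments show that PLOT substantially reduces computation while remaining competitive on accuracy, providing an effective localization tool for causal abstraction research at scale.

Details
English abstract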

Causal abstraction offers a principled framework for mechanistic interpretability, aligning a high-level causal model with the low-level computation realized by a neural network through counterfactual intervention analysis. Existing methods such as distributed alignment search (DAS) learn expressive subspace interventions, but the relevant neural site is unknown a priori, so finding a handle requires a computationally burdensome search over candidate sites. We introduce PLOT (Progressive Localization via Optimal Transport), a transport-based framework that localizes causal variables from the output effect geometry of abstract and neural interventions. PLOT fits an optimal transport coupling between abstract variables and candidate neural sites, yielding a global soft correspondence that can be calibrated into intervention handles. In simple settings, a single coupling over individual neurons suffices. In larger models, PLOT is applied progressively, moving from coarse sites such as tokens, timesteps, or layers to finer supports such as coordinate groups or PCA spans, and optionally guiding DAS based on the localized signal. Across experiments of increasing complexity, transport-only PLOT handles are exceedingly fast and competitive on accuracy, while PLOT-guided DAS reaches DAS-level accuracy at a fraction of full DAS runtime, providing an efficient localization engine for causal abstraction research at scale.

2605.06977 2026-05-11 cs.LG cs.AI cs.IT math.IT stat.ML

$f$-Divergence Regularized RLHF: Two Tales of Sampling and Unified Analyses

Di Wu, Chengshuai Shi, Jing Yang, Cong Shen

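AI summary This paper studies general $f$-divergence regularization in reinforcement learning from human feedback (RLHF) and develops a unified theoretical framework that fills a gap in the existing literature. The authors design two algorithms based on distinct sampling principles: one extends the optimism principle, and the other exploits the sensitivity of the optimal policy to reward perturbations. The analysis shows that both achieve $O(\log T)$ regret and an $O(1/T)$ sub-optimality gap, giving what the authors describe as the first performance guarantees for online RLHF under general $f$-divergence regularization.

Comments ICML 2026

Details
English abstract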

Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone technique for post-training large language models. While most existing approaches rely on the reverse KL-regularization, recent empirical studies have begun exploring alternative divergences (e.g., forward KL, chi-squared) as regularizers in RLHF. However, a unified theoretical understanding of general $f$-divergence regularization remains under-explored. To fill this gap, this work develops a comprehensive theoretical framework for online RLHF with a general $f$-divergence regularized objective. Rather than treating each possible divergence function individually, we adopt a holistic perspective across the entire function class and propose two algorithms based on distinct sampling principles. The first extends the classical optimism principle with a carefully designed exploration bonus, while the second introduces a new method that exploits the sensitivity of the optimal policy to reward perturbations under $f$-divergence regularization. Theoretical analysis shows that $O(\log T)$ regret and $O(1/T)$ sub-optimality gap are achievable, establishing provable efficiency of both algorithms and, to the best of our knowledge, the first performance bounds for online RLHF under general $f$-divergence regularization.

2605.06976 2026-05-11 stat.ML cs.LG stat.CO

A Differentiable Bayesian Relaxation for Latent Partial-Order Inference

Dongqing Li, Geoff K. Nicholls, Shiyi Sun, You Luo

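AI summary Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This paper proposes a differentiable Bayesian relaxation for inferring latent partial orders from such traces. By replacing discontinuous precedence relations and binary frontier feasibility conditions with smooth surrogates, the method yields a continuous posterior that supports gradient-based MCMC and variational inference, and experiments show good inference accuracy and computational efficiency.

Details
English abstract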

Many ranking and agent trace datasets are recorded as linear orders even though their latent structure is only partially ordered. This is especially common in agent and workflow traces, where observed order may reflect arbitrary linearization rather than true prerequisites. We introduce a differentiable relaxation for latent partial-order inference from such traces. Starting from a hard frontier-constrained model of noisy linear extensions, we replace discontinuous product-order precedence and binary frontier feasibility with smooth surrogates, yielding a continuous posterior that preserves closure-level partial-order semantics and supports gradient-based MCMC and variational inference. We prove soft transitivity, sharp-limit frontier recovery, and convergence to the hard likelihood. Experiments on synthetic data, records of social dominance relations, and cloud-agent traces show close posterior fidelity to hard MCMC on small instances and improved runtime--accuracy trade-offs on larger problems.

2605.06959 2026-05-11 stat.ML cs.LG math.ST stat.TH

Locally Near Optimal Piecewise Linear Regression in High Dimensions via Difference of Max-Affine Functions

Haitham Kanj, Kiryung Lee

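AI summary This paper proposes a parametric approach to piecewise linear regression based on the Adaptive Block Gradient Descent (ABGD) algorithm, whose core idea is to represent piecewise linear functions as a difference of max-affine (DoMA) functions. A non-asymptotic local convergence analysis establishes linear convergence of ABGD under sub-Gaussian covariate and noise distributions, along with the sample complexity required under noise and exact recovery in the noiseless case. Experiments corroborate the theory and show competitive performance on real-world datasets.

Details
English abstract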

This paper presents a parametric solution to piecewise linear regression through the Adaptive Block Gradient Descent (ABGD) algorithm. The heart of the method is the parametrization of piecewise linear functions as the difference of max-affine (DoMA) functions. A non-asymptotic local convergence analysis for ABGD is provided under sub-Gaussian covariate and noise distributions. To initialize ABGD, we adapt a prior algorithm originally developed for the simpler setting of max-affine functions. When suitably initialized, ABGD converges linearly to an $ε$-accurate estimate given $\tilde{\mathcal{O}}(d\max(σ_z/ε,1)^2)$ observations where $σ_z^2$ denotes the noise variance. This implies exact recovery given $\tilde{\mathcal{O}}(d)$ samples in the noiseless case. Also, such a rate is shown to be minimax optimal up to logarithmic factors. Synthetic numerical results corroborate the theoretical guarantees for ABGD. We also observe competitive performance compared to the state-of-the-art methods on real-world datasets.

2605.06939 2026-05-11 cs.LG stat.ME stat.ML

Bias and Uncertainty in LLM-as-a-Judge Estimation

James Fiedler

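AI summary This paper studies bias and uncertainty in model evaluation with a large language model as judge (LLM-as-a-Judge). The author notes that estimating performance directly from judge outputs introduces systematic bias, and that the reliability of existing corrections depends on judge quality and on calibration stability across models. Through analytical results, simulations, and a real-data case study, the paper shows that shared calibration can cause severe bias in model comparisons, including estimates that point in the wrong direction, and proposes diagnostics based on judge quality ($J$) and cross-model calibration instability ($ΔJ$) to guide more reliable LLM-as-a-Judge evaluation.

Details
English abstract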

LLM-as-a-Judge evaluation has become a standard tool for assessing base model performance. However, characterizing performance via the naive estimator, i.e., raw judge outputs, is systematically biased. Recent work has proposed estimators to correct this bias, but their reliability depends critically on judge quality and, for model comparisons, on calibration stability. Sharing calibration across compared models is practically attractive but can introduce severe bias, including cases where the comparison estimate points in the wrong direction with high apparent confidence. We study these failure modes through analytical results, simulations over judge quality ($J$) and cross-model calibration instability ($ΔJ$), and a real-data MMLU-Pro case study with sign reversal. We propose $J$ and $ΔJ$ as diagnostics for when corrected estimates, especially shared-calibration comparisons, are likely unreliable, and provide reporting guidance for LaaJ evaluation.

2605.06883 2026-05-11 stat.ML cs.LG

Kernel Selection is Model Selection: A Unified Complexity-Penalized Approach for MMD Two-Sample Tests

Yijin Ni, Xiaoming Huo

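AI summary This paper studies how to raise the statistical power of Maximum Mean Discrepancy (MMD) two-sample tests by selecting the kernel in a data-driven way. The authors propose a unified complexity-penalized criterion (CP-MMD) that treats kernel selection as a model selection problem; its penalty accounts for the complexity of the optimization, which allows direct, grid-free kernel optimization over continuous parameter spaces. The method improves test power while preserving Type-I error control and applies to kernel classes including bandwidth parameters, polynomial features, and deep networks.

Details
English abstract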

The Maximum Mean Discrepancy (MMD) is a cornerstone statistic for nonparametric two-sample testing, but its test power is dictated entirely by the chosen kernel. Because any fixed kernel inherently fails to distinguish certain distributions, the kernel must be dynamically optimized. However, data-driven optimization violates the foundational i.i.d. assumption, forcing a strict trade-off in existing frameworks. Ratio criteria ignore this dependence, inducing overfitting and variance collapse on rich kernel classes. Conversely, aggregation methods bypass the dependence using finite grids, but this strategy cannot scale to continuous search spaces like deep kernels. To break this dichotomy, we establish data-driven kernel selection as a model selection problem. We propose Complexity-Penalized MMD (CP-MMD), a criterion derived by applying the two-sample uniform concentration inequality of preceding works to the post-optimization MMD problem. The resulting penalty bounds the empirical MMD by the complexity of the kernel search space, mathematically absorbing the cost of optimization, so that CP-MMD enables direct, grid-free maximization over continuous parametric classes, including scalar bandwidths, polynomial feature bandwidths, and deep network parameters. By formally accounting for optimization complexity, we prove that CP-MMD maximizes true test power while ensuring unconditional Type-I validity. Consequently, CP-MMD enables grid-free kernel selection across linear, polynomial-feature, and deep regimes, matching or exceeding state-of-the-art test power.
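
To make the dependence of the statistic on the kernel concrete, here is a minimal sketch of the standard unbiased squared-MMD estimate with a Gaussian kernel, evaluated at a few bandwidths. It is not CP-MMD and includes no complexity penalty; the helper names, sample sizes, and bandwidth values are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Drop diagonal terms so the within-sample averages are U-statistics.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
x = rng.standard_normal((300, 2))
y_same = rng.standard_normal((300, 2))          # same distribution as x
y_shift = rng.standard_normal((300, 2)) + 1.0   # mean-shifted distribution

# The same pair of samples gives very different statistics under different bandwidths.
for bw in (0.1, 1.0, 10.0):
    print(bw, mmd2_unbiased(x, y_same, bw), mmd2_unbiased(x, y_shift, bw))
```

Picking the bandwidth that maximizes the statistic on the same data used for testing is exactly the data-dependent step whose cost the abstract says must be accounted for.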

2605.06862 2026-05-11 stat.ME stat.ML

Nonparametric estimation of time-varying network connections by multi-stage smoothing

Jeonghwan Lee, Tianxi Li, Adam J. Rothman

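AI summary This paper studies how to estimate edge probabilities in a time-varying network observed at multiple time points and proposes a multi-stage smoothing estimator. The method first applies temporal local smoothing to each edge, then performs node-domain smoothing using a data-driven neighborhood construction, with an optional additional temporal smoothing step to improve accuracy over the whole time domain. Simulation studies under different generative models show the benefits of the approach, and an application to a real time-varying network shows that it captures both smooth temporal evolution and structural patterns in the connectivity.

Details
English abstract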

We consider the problem of estimating the underlying edge probabilities of a time-varying network observed at multiple time points. The probability structure is represented by a time-varying graphon that satisfies temporal Hölder smoothness and piecewise Lipschitz conditions in the latent variables. We propose a multi-stage smoothing estimator that first applies temporal local smoothing to each edge and then performs node-domain smoothing using a data-driven neighborhood construction adapted from the method. An additional temporal smoothing step is introduced as an optional refinement when uniform accuracy over the entire time domain is required. Simulation studies demonstrate the benefits of combining temporal and node-domain smoothing under different generative models. We also apply the method to a real time-varying network dataset and show that it captures both smooth temporal evolution and structural patterns in the connectivity.

2605.06845 2026-05-11 math.ST stat.TH

Convergence Rates for Latent Mixing Measures in Infinite Homoscedastic Location-Scale Mixture Models

Nicola Bariletto, Dung Le, Alessandro Rinaldo, Nhat Ho

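AI summary This paper studies posterior contraction rates for latent mixing measures in homoscedastic location-scale mixture models with infinitely many components. Because both location and scale parameters are unknown, ensuring convergence of the mixing measure is harder than convergence at the level of densities. The authors derive new lower bounds that relate the $L^1$ distance between mixture densities to Wasserstein distances between mixing measures and operator-norm discrepancies between scale matrices, obtain a family of general inequalities, and specialize them to mixtures with multivariate Gaussian, Cauchy, and Laplace kernels, yielding what they describe as first-of-their-kind contraction rates for Dirichlet process mixtures with a shared unknown scale parameter.

Details
English abstract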

We study posterior contraction rates for mixing measures in homoscedastic location-scale mixture models with infinitely many components. While posterior convergence at the level of densities is well understood, ensuring convergence of the latent mixing measure is more challenging and has remained an open problem in settings where both location and scale parameters are unknown. We address this by deriving novel lower-bounds that connect the $L^1$ distance between mixture densities to discrepancies, based on the Wasserstein distances and the operator norm, between the underlying mixing measures and scale matrices. Our approach combines the dual formulation of the $W_1$ distance with functional-analytic approximation techniques. This leads to general inequalities, whose strength is determined (i) by the smoothness of the mixture kernel via the rate of decay of its characteristic function, and (ii) by a key lower-bound on the $L^1$ metric involving the operator norm discrepancy between scale parameters. Moreover, a novel PDE inversion condition yields a sharper inequality for important ordinary-smooth cases. We specialize these bounds to popular mixtures based on multivariate Gaussian, Cauchy, and Laplace kernels. As a consequence, we obtain first-of-their-kind contraction rates in the context of Dirichlet process mixtures with an unknown scale parameter shared across components. As a byproduct of our inequalities, we can distinguish the convergence behavior of the location mixing measure from that of the scale parameter across a range of kernel choices, leading to nuanced insights into their respective rates.

2605.06843 2026-05-11 stat.AP stat.ME

Nonlinear Amplification of Finite-Sample Uncertainty in Capability-Based Decisions

Fei Jiang, Lei Yang

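AI summary This paper studies how finite-sample uncertainty propagates through nonlinear transformations in statistical decision systems, focusing on the capability indices used to assess manufacturing processes. It finds that although capability estimators vary approximately linearly with process dispersion, risk metrics such as defect probability depend on tail curvature, so small estimation errors are disproportionately amplified and make decisions uncertain. This mechanism explains why assessments that look stable in capability-index space can vary widely in defect-risk space; it motivates reliability-aware decision formulations and is validated through simulations and industrial data analyses.

Comments 10 pages, 2 figures and 2 tables

Details
English abstract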

This paper studies the propagation of finite-sample uncertainty under nonlinear transformations commonly used in statistical decision systems. In particular, we consider process capability indices, which are widely used in manufacturing practice but are estimated from finite samples, rendering the resulting approval decisions inherently uncertain. We show that such uncertainty cannot be fully explained by estimator variability alone, but is substantially influenced by a nonlinear amplification mechanism through which capability uncertainty is transformed into defect-risk metrics. While capability estimators vary approximately linearly with process dispersion, defect probabilities depend on tail curvature, causing small estimation errors to be disproportionately amplified in measures such as defect probability and parts-per-million (PPM) rates. Consequently, capability assessments that appear stable in index space may exhibit substantial variability in defect-risk space, particularly near decision thresholds. This insight provides a unified explanation of finite-sample decision instability, motivates reliability-aware decision formulations, and links sample-size requirements directly to decision reliability. Monte Carlo simulations and industrial data analyses validate the proposed mechanism and demonstrate its practical implications, including the impact of distributional assumptions on defect-risk estimation.
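
The amplification mechanism can be sketched numerically. The example below assumes a centred normal process with two-sided specification limits, so that the index $C_p$ maps to a defect probability of $2\Phi(-3C_p)$; this textbook mapping, the `defect_ppm` helper, and the nominal value 4/3 are assumptions for illustration and are not taken from the paper.

```python
import math

def defect_ppm(cp):
    """Two-sided defect rate in PPM for a centred normal process with index cp."""
    # Spec limits sit at +/- 3*cp standard deviations; Phi(-z) = 0.5 * erfc(z / sqrt(2)).
    tail = 0.5 * math.erfc(3.0 * cp / math.sqrt(2.0))
    return 2.0 * tail * 1e6

nominal = 4.0 / 3.0  # a commonly used approval threshold (assumed here)

# Perturb the index by a few percent and watch the implied defect rate.
for rel_error in (-0.10, -0.05, 0.0, 0.05, 0.10):
    cp = nominal * (1.0 + rel_error)
    print(f"Cp = {cp:.3f}  PPM = {defect_ppm(cp):.1f}")
```

Under these assumptions a 10% underestimate of the index near the threshold changes the implied PPM by a multiplicative factor rather than by 10%, which is the index-space versus risk-space contrast the abstract describes.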

2605.06826 2026-05-11 stat.ML cs.IT math.IT math.SP

How Does Attention Help? Insights from Random Matrices on Signal Recovery from Sequence Models

Mohamed El Amine Seddik

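AI summary This paper studies the spectral properties of sample covariance matrices built from sequence models, where token embeddings come from a fixed two-class Gaussian mixture and are pooled with fixed attention weights. In the high-dimensional limit, the author derives exact characterizations of the eigenvalue distribution, the outlier eigenvalues, and the alignment of eigenvectors with the hidden signal, revealing two phase transitions in signal recovery governed by the attention weights and the positional correlation matrix. The paper also shows that the attention weights maximizing the signal-to-noise ratio are given by the top eigenvector of the positional correlation matrix, and that causal self-attention improves signal recovery when early tokens carry more signal.

Details
English abstract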

We study the spectral properties of sample covariance matrices constructed from pooled sequence representations, where token embeddings are drawn from a fixed two-class Gaussian mixture table and pooled via (fixed) attention weights. Working in the high-dimensional regime $d,V,N\to\infty$ with $d/V\toδ$ and $d/N\toγ$, we derive exact characterizations of the limiting eigenvalue distribution, outlier eigenvalues, and eigenvector alignment with the hidden signal. The bulk spectrum follows a non-Marchenko--Pastur law given by the free multiplicative convolution $κ(MP_δ\boxtimes MP_γ)$, reflecting the finite vocabulary structure. Signal recovery undergoes two successive BBP-type phase transitions characterized by the scalars: $δ,γ,α=w^{\top} R w$ and $κ=\|w\|^2$, where $w$ denotes the attention pooling weights and $R$ the positional correlation matrix. An aftermath of our analysis demonstrates that the optimal attention weights maximizing the signal-to-noise ratio $α/κ$ are given by the (normalized) top eigenvector of $R$, and we show (as a particular case of our analysis) that parameter-free causal self-attention with $τ/d$ score scaling yields deterministic harmonic weights that improve signal recovery over mean pooling whenever early tokens carry more signal. Extensive simulations confirm sharp agreement between theory and finite-dimensional experiments.
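
The statement about optimal pooling weights is an instance of the Rayleigh-quotient fact that $w^{\top} R w / \|w\|^2$ is maximized by the top eigenvector of a symmetric $R$. A minimal numerical sketch follows; the matrix `R` is a hypothetical positional correlation matrix invented for illustration (early positions are given more signal), not one from the paper.

```python
import numpy as np

def snr_ratio(w, R):
    """alpha / kappa = (w^T R w) / ||w||^2 for pooling weights w."""
    return float(w @ R @ w) / float(w @ w)

seq_len = 8
positions = np.arange(seq_len)

# Hypothetical R: per-position signal strength decays with position, and
# correlations decay with distance. The elementwise product of two positive
# semidefinite matrices is again positive semidefinite.
strength = 0.8 ** positions
R = np.outer(strength, strength) * 0.5 ** np.abs(positions[:, None] - positions[None, :])

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(R)
w_top = eigvecs[:, -1]
if w_top.sum() < 0:
    w_top = -w_top  # the sign is arbitrary; choose the non-negative orientation

w_mean = np.full(seq_len, 1.0 / seq_len)  # mean pooling

print("top-eigenvector ratio:", snr_ratio(w_top, R))
print("mean-pooling ratio:   ", snr_ratio(w_mean, R))
print("largest eigenvalue:   ", eigvals[-1])
```

The ratio achieved by the top eigenvector equals the largest eigenvalue of `R`, and no other weight vector, including mean pooling, can exceed it.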

2605.06821 2026-05-11 cs.LG cs.AI math.OC stat.ML

A Rod Flow Model for Adam at the Edge of Stability

Eric Regis, Sinho Chewi

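AI summary This paper studies the behavior of the Adam optimizer at the edge of stability and extends the continuous-time model called rod flow to it. The approach models consecutive iterates in the joint phase space of parameters and first moment as an extended one-dimensional object, a "rod," and treats the second moment as a smooth auxiliary variable. The model covers Adam as well as several other momentum methods, and on representative machine learning architectures it tracks the discrete iterates through the edge-of-stability regime more accurately than the corresponding stable flow.

Details
English abstract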

Cohen et al. (arXiv:2207.14484) observed that adaptive gradient methods such as Adam operate at the edge of stability. While there has been significant work on continuous-time modeling of gradient descent at the edge of stability, extending these models to momentum methods remains underdeveloped. In the gradient descent setting, Regis et al. (arXiv:2602.01480) introduced rod flow, which models consecutive iterates as an extended one-dimensional object -- a "rod." Here we extend rod flow to Adam by working in the joint phase space of parameters and first moment $(w, m)$ and treating the second moment $ν$ as a smooth auxiliary variable. We also develop rod flows for heavy ball momentum, Nesterov momentum, and scalar and per-component versions of RMSProp, Adam, and NAdam. For all eight optimizers, we empirically evaluate rod flow on representative machine learning architectures, where it tracks the discrete iterates through the edge-of-stability regime significantly more accurately than the corresponding stable flow.

2605.06818 2026-05-11 stat.ME q-fin.ST

Modeling Dynamic Correlation Matrices with Shrinkage Priors

Daniel Andrew Coulson, David S. Matteson, Martin T. Wells

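AI summary This paper studies the estimation of time-varying correlation matrices and proposes a Bayesian approach based on a low-rank factor representation, using a dynamic shrinkage prior for locally adaptive regularization of the correlation structure and a multivariate factor stochastic volatility model for the observation errors. Beyond capturing changes in correlation more accurately, the paper establishes what the authors describe as a first-of-its-kind posterior contraction result for dynamically regularized Bayesian models. It also uses the information-theoretic concept of total correlation to obtain a scalar measure of cross-sectional dependence, which is applied to periods of financial market stress to assess the changing benefits of portfolio diversification.

Comments 88 pages, 4 figures, 5 tables

Details
English abstract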

Estimating time-varying correlation matrices is challenging because existing methods may adapt slowly to structural changes, impose insufficient regularization, or produce diffuse posterior uncertainty. In moderate dimensions, an additional difficulty is summarizing the estimated evolving dependence structure for downstream decision-making tasks. We propose a Bayesian approach based on a low-rank factor representation, with latent states evolving under a dynamic shrinkage prior and observation errors following a multivariate factor stochastic volatility model. This specification allows locally adaptive regularization of the estimated correlation structure over time and informative uncertainty quantification. We establish, to our knowledge, a first-of-its-kind posterior contraction result for dynamically regularized Bayesian models, showing contraction around the true model parameters at an explicit rate under averaged Hellinger distance. To summarize the estimated correlation matrices, we build on the information-theoretic concept of total correlation to obtain a scalar measure of cross-sectional dependence. Simulation studies show improved accuracy and responsiveness relative to competing methods in a range of challenging scenarios. We then apply our method to monitoring the correlation evolution of equity portfolios during periods of financial market stress, providing an ex post framework for assessing the changing benefits of diversification in backtesting analyses.

2605.06749 2026-05-11 stat.ME cs.AI

A Statistical Framework for Algorithmic Collective Action with Multiple Collectives

Claudio Battiloro, Pietro Greiner, Dario Rancati, Bret Nestor, Oumaima Amezgar, Francesca Dominici

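AI summary As learning systems play a growing role in everyday decisions, Algorithmic Collective Action (ACA), in which users coordinate changes to shared data to steer model behavior, offers a complement to regulatory policy and corporate model design. Existing work mostly considers a single collective, whereas in practice collective action is fragmented across multiple collectives that share overarching objectives but differ in size, strategy, and actionable goals. This paper proposes what the authors call the first comprehensive statistical framework for ACA with multiple collectives, studies how multiple collectives jointly influence a classifier, and gives quantitative statistical bounds in terms of collective sizes and goal alignment that each collective can compute with only partial knowledge of the others. Simulations inspired by climate-adaptation interventions in smart cities illustrate the framework.

Comments 27 pages, 16 figures

Details
English abstract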

As learning systems increasingly shape everyday decisions, Algorithmic Collective Action (ACA), i.e., users coordinating changes to shared data to steer model behavior, offers a complement to regulator-side policy and corporate model design. Real-world collective actions have traditionally been decentralized and fragmented into multiple collectives, despite sharing overarching objectives, with each collective differing in size, strategy, and actionable goals. However, most of the ACA literature focuses on single collective settings. To address this, we propose the first comprehensive statistical framework for ACA with multiple collectives acting on the same system. In particular, we focus on collective action in classification, studying how multiple collectives can influence a classifier's behavior. We provide quantitative statistical bounds on the success of the collectives, considering the role and the interplay of the collectives' sizes and the alignment of their goals. We make such bounds computable by each collective with only partial knowledge of other collectives' sizes and strategies. Finally, we numerically illustrate our framework on simulations inspired by interventions for climate adaptation in smart cities, demonstrating the usefulness of our bounds.

2605.06742 2026-05-11 stat.ME stat.AP

Bayesian Modeling and Prediction of Generalized Contact Matrices

Shozen Dan, David A. van Dyk, Zhi Ling, Swapnil Mishra, Oliver Ratmann

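AI summary This study proposes a Bayesian modeling framework for inferring generalized contact matrices that stratify contacts beyond the age dimension, giving a finer description of contact patterns between population groups. The approach combines tensor structures with smoothing constraints, so that it satisfies the basic structural assumptions of contact matrices while making high-dimensional matrix estimation computationally feasible and statistically stable. The study also identifies a link between multi-dimensional matrix stratification and the theory of contingency tables, which helps address missing contact features in real data, and illustrates the method on two real-world datasets.

Details
English abstract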

Social contact matrices are essential tools in infectious disease epidemiology as they quantify close-range human contact patterns which directly drive the transmission of airborne infectious diseases. In this work we propose a Bayesian modeling framework for inferring generalized contact matrices which stratify contact matrices beyond contemporary age dimensions. The model is designed to satisfy fundamental structural assumptions of contacts while leveraging tensor structures and smoothing constraints to make high-dimensional matrix estimation computationally feasible and statistically stable. We discover a link between multi-dimensional matrix stratification subject to structural constraints with the theory of contingency tables. This enables us to approach a challenging missing-data problem commonly encountered in real-world analysis where feature information on the contacts is unobserved. We benchmark the framework against existing methods through simulation studies and illustrate the framework's practical utility through two real-world datasets: BICS (United States) and COVIMOD (Germany). Our models are implemented in an open-source Python package to facilitate adoption in the wider scientific community.

2605.06710 2026-05-11 cs.IT cs.LG math.IT math.ST stat.TH

Information-theoretic Limits of Learning and Estimation

Abbas El Gamal, Maxim Raginsky

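AI summary This chapter introduces the fundamental limits that information theory places on learning and estimation, that is, the performance bounds that hold for any learning or estimation algorithm regardless of computational power. Starting from tools such as concentration inequalities, metric entropy, and Rademacher complexity, it derives upper bounds on generalization error and analyzes the learning-theoretic framework through mutual information and relative entropy. It then uses Fano's inequality to establish lower bounds on minimax estimation risk, providing key analytical tools for understanding the theoretical limits of learning and estimation.

Details
English abstract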

Information theory plays a central role in establishing fundamental limits on what any learning or estimation algorithm can -- and cannot -- achieve, regardless of computational power. In this chapter, we provide an introduction to these connections. End-of-chapter exercises make the material suitable for both classroom use and self-study. We begin by introducing concentration inequalities along with the notions of covering and packing in metric spaces, and the associated concept of metric entropy. These tools are essential for our analysis. We then introduce the learning-theoretic framework and derive upper bounds on generalization error in terms of metric entropy, Rademacher complexity, and the VC dimension, as well as mutual information and relative entropy. Finally, we discuss the minimax estimation framework and establish lower bounds on minimax risk using Fano's inequality, yielding bounds in terms of relative entropy and covering and packing numbers. This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of Cover and Thomas's Elements of Information Theory, posted with permission from Wiley. It would follow the chapter posted at arXiv:2605.02989. The table of contents of the new edition can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu.

2605.06688 2026-05-11 q-fin.CP math.PR math.ST stat.TH

American Options Pricing under Heston Model via Curriculum Learning in Coupled PINNs

Rohan, Siddanth Shetty, Amit N. Kumar

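AI summary This paper studies the pricing of American options under the Heston model. Because of the early exercise feature, the problem requires determining an unknown, time-varying exercise boundary at the same time as the price, and it has no closed-form solution. The paper proposes an approach based on coupled physics-informed neural networks (PINNs) that predicts both the option price and the free boundary, using curriculum learning and adaptive resampling to stabilize training and improve accuracy. The method offers an efficient and robust deep learning approach to pricing American options under stochastic volatility.

Comments 25 pages, 22 figures

Details
English abstract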

In American options, the early exercise feature allows the option to be exercised at any time prior to expiration. However, this flexibility introduces a challenge: the pricing model must value the option while simultaneously determining an unknown, time-varying exercise boundary. The Heston model is one of the most popular ways to model real market behavior because it allows volatility to change over time. However, unlike European options, there is no closed-form solution for American options under the Heston model, so we have to use numerical methods. In this paper, we propose a novel approach to solving the stochastic Heston partial differential equation for American options, using coupled physics-informed neural networks (PINNs) to predict both the option price and the free boundary, while employing curriculum learning and adaptive resampling to stabilize model training. Our work builds on recent deep learning methods but introduces a more effective training strategy to address the limitations of these approaches. The numerical results demonstrate the effectiveness of the proposed learning framework, providing a robust and efficient alternative to pricing American options, enabling rapid inference and accurate estimation under stochastic volatility.

2605.06686 2026-05-11 cs.LG econ.EM stat.AP stat.ML

Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices

Kirk Bansak, Elisabeth Paulson, Dominik Rothenhäusler, Jeremy Ferwerda, Jens Hainmueller, Michael Hotard

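AI summary This paper examines how robust counterfactual impact estimates for refugee matching in the United States are to the choice of off-policy evaluation method. Applying inverse probability weighting (IPW) and several variants of augmented inverse probability weighting (AIPW), together with alternative modeling architectures and assignment procedures, the study finds that the impact estimates remain consistent in magnitude across all scenarios and are statistically significant in most cases. The results are also consistent with the original findings of Bansak et al. (2018).

Comments 13 pages, 2 figures, 10 tables

Details
English abstract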

Previous research has investigated the potential of refugee matching for boosting refugee outcomes, first considered by Bansak et al. (2018). This paper demonstrates the stability of counterfactual impact evaluation results in the context of refugee matching in the United States using a range of off-policy evaluation methods. In order to estimate counterfactual impact and test the robustness of our results, we employ several evaluation methods, including inverse probability weighting (IPW) and multiple variants of augmented inverse probability weighting (AIPW). We also consider various modifications, including alternative modeling architectures and different assignment procedures. The impact estimates remain consistent in magnitude in all scenarios as well as statistically significant in most cases. Furthermore, the estimates are also consistent with the results originally presented in Bansak et al. (2018).
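
For readers unfamiliar with the two estimator families named above, the following is a minimal sketch of IPW and AIPW on synthetic data. It assumes a binary assignment with known logging propensities and a target policy that assigns every unit to arm 1; the data-generating process, variable names, and linear outcome model are all hypothetical and far simpler than a multi-location refugee-matching setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Synthetic logged data (assumed): covariate x, known propensity, assignment a, outcome y.
x = rng.standard_normal(n)
propensity = 1.0 / (1.0 + np.exp(-x))            # P(a = 1 | x) under the logging policy
a = (rng.random(n) < propensity).astype(float)
y = 1.0 + 2.0 * a + x + rng.standard_normal(n)   # so E[Y(1)] = 3 by construction

# IPW: reweight outcomes of units that actually received arm 1.
ipw = np.mean(a * y / propensity)

# Outcome model for arm 1: least-squares fit of y on (1, x) among units with a == 1.
treated = a == 1.0
design = np.column_stack([np.ones(int(treated.sum())), x[treated]])
coef, *_ = np.linalg.lstsq(design, y[treated], rcond=None)
mu1 = coef[0] + coef[1] * x                      # predicted outcome under arm 1 for everyone

# AIPW: model prediction plus an inverse-probability-weighted residual correction.
aipw = np.mean(mu1 + a * (y - mu1) / propensity)

print("IPW estimate of E[Y(1)]: ", ipw)
print("AIPW estimate of E[Y(1)]:", aipw)
```

Both estimators target the same quantity; AIPW typically has lower variance when the outcome model is reasonable, which is one reason to compare several variants as a robustness check.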

2605.06685 2026-05-11 cs.SD eess.AS stat.AP

An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

Fred Jalbert-Desforges

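AI summary This paper presents an analysis pipeline that produces composer-level information-theoretic profiles directly from audio. A certified transcription layer (F1 of 0.9791 on the MAESTRO dataset) is used to extract distributions over harmonic scale degrees, which are analyzed with Shannon entropy, asymmetric KL divergence, and Zipfian modeling. The study yields an interpretable ordering of composers by harmonic predictability, recovers known stylistic lineages, and separates contemporary neoclassical artists from historical composers by the Zipfian fit of their harmonic transition distributions.

Comments 25 pages, 4 figures, 25 references

Details
English abstract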

We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them through Shannon entropy, asymmetric Kullback-Leibler divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33-3.86 bits) that reveals the marginal-level similarity of tonal vocabularies; (ii) recover known stylistic lineages (Haydn-Beethoven, Liszt-Rachmaninoff, Schubert-Schumann) through the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, Jóhannsson) from historical composers on the quality of Zipfian fit to the transition distribution, with mean $R^2 = 0.78$ for neoclassical versus 0.46 for historical (N $\geq$ 10 pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
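
The entropy and divergence measures named in the abstract can be sketched in a few lines. The counts below are hypothetical scale-degree tallies invented for illustration, and the helpers assume add-one (Laplace) smoothing; the paper's actual alphabet, smoothing constant, and bootstrap procedure are not reproduced here.

```python
import math

def smoothed_distribution(counts, alpha=1.0):
    """Laplace-smoothed probabilities from raw counts."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def shannon_entropy_bits(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence_bits(p, q):
    """D_KL(p || q) in bits; q must be positive wherever p is."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical counts for two composers (illustrative values only).
counts_a = [40, 5, 10, 20, 30, 8, 2]
counts_b = [25, 15, 15, 15, 20, 15, 10]

p_a = smoothed_distribution(counts_a)
p_b = smoothed_distribution(counts_b)

print("H(a) bits:", shannon_entropy_bits(p_a))
print("H(b) bits:", shannon_entropy_bits(p_b))
# KL divergence is asymmetric, so both directions are reported.
print("KL(a || b):", kl_divergence_bits(p_a, p_b))
print("KL(b || a):", kl_divergence_bits(p_b, p_a))
```

Smoothing keeps every probability positive, so the divergence stays finite in both directions even when one composer never uses a given symbol.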

2605.06678 2026-05-11 cs.LG q-fin.RM stat.AP

A Wasserstein GAN-based climate scenario generator for risk management and insurance: the case of soil subsidence

Antoine Heranval, Olivier Lopez, Didier Ngatcha, Daniel Nkameni

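AI summary This paper proposes SwiGAN, a Wasserstein GAN-based climate scenario generation framework that produces spatio-temporal trajectories of future climate indices to support risk management and insurance strategy. The method focuses on the Soil Wetness Index (SWI), a key indicator used in France to assess drought severity, and simulates its possible evolution up to 2050, helping to understand drought dynamics under climate change. The model supports the design of adaptive risk strategies and can be extended to other climate-related risks and actuarial applications.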

Details
English abstract

According to the United Nations Office for Disaster Risk Reduction (2025), the average annual cost of natural catastrophes increased from 70--80 billion USD between 1970 and 2000 to 180--200 billion USD between 2001 and 2020. Reports from organizations such as the IFOA and the WWF highlight the need for the insurance sector to adapt to this rapidly evolving context by developing medium- to long-term strategies that go beyond the one-year horizon of prudential regulations such as Solvency II. This paper introduces an artificial intelligence framework based on Conditional Generative Adversarial Networks (Conditional GANs) to generate future spatio-temporal trajectories of climatic indices. The approach focuses on the Soil Wetness Index (SWI), a key indicator used in France to assess drought severity. Drought accounts for approximately 30% of the indemnities paid under the French natural catastrophe insurance scheme. The proposed model, SwiGAN, simulates plausible drought propagation patterns up to 2050 for a region of France particularly exposed to this hazard. By generating realistic sequences of SWI maps, SwiGAN provides insights into drought dynamics under climate change scenarios and supports the design of adaptive risk management and insurance strategies. The methodology is also generalizable to other climate-related perils and actuarial applications such as economic scenario generation.

2604.18972 2026-05-11 stat.ML cs.LG math.OC

Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation

Yaowei Zheng, Richong Zhang, Shenxi Wu, Shirui Bian, Haosong Zhang, Li Zeng, Xingjian Ma, Yichi Zhang

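AI summary This paper studies finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The traditional Bellman approach is only first-order accurate; the paper estimates the time-dependent generator from multi-step transitions and uses moment-matching coefficients to cancel lower-order truncation errors, giving a higher-order regression estimator. The theory provides an error decomposition and the conditions under which it applies, and experiments show that the method outperforms the Bellman baseline on several benchmarks, supporting the effectiveness and stability of high-order generator regression for continuous-time policy evaluation.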

Comments The authors are withdrawing this paper due to an unresolved dispute concerning authorship and the attribution of intellectual contributions

Details
English abstract

We study finite-horizon continuous-time policy evaluation from discrete closed-loop trajectories under time-inhomogeneous dynamics. The target value surface solves a backward parabolic equation, but the Bellman baseline obtained from one-step recursion is only first-order in the grid width. We estimate the time-dependent generator from multi-step transitions using moment-matching coefficients that cancel lower-order truncation terms, and combine the resulting surrogate with backward regression. The main theory gives an end-to-end decomposition into generator misspecification, projection error, pooling bias, finite-sample error, and start-up error, together with a decision-frequency regime map explaining when higher-order gains should be visible. Across calibration studies, four-scale benchmarks, feature and start-up ablations, and gain-mismatch stress tests, the second-order estimator consistently improves on the Bellman baseline and remains stable in the regime where the theory predicts visible gains. These results position high-order generator regression as an interpretable continuous-time policy-evaluation method with a clear operating region.

2604.15439 2026-05-11 stat.ML cs.LG math.PR

One-Shot Generative Flows: Existence and Obstructions

Panos Tsimpos, Daniel Sharp, Youssef Marzouk

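AI summary This paper studies dynamic measure transport in generative modeling, focusing on transport maps that take a source distribution $P_0$ to a target distribution $P_1$ by integrating a velocity field. The central question is when this process yields a "straight-line flow", that is, a flow with zero pointwise acceleration that any first-order method integrates exactly. The paper characterizes straight-line flows through partial differential equations and proves a sharp dichotomy under endpoint independence: explicit straight-line flows can be constructed for arbitrary Gaussian endpoints, whereas no straight-line flow exists for targets with sufficiently well-separated modes. These results identify the conditions under which such generative flows can and cannot exist.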

Details
English abstract

We study dynamic measure transport for generative modeling, focusing on transport maps that connect a source measure $P_0$ to a target measure $P_1$ by integrating a velocity field of the form $v_t(x) = \mathbb{E}[\dot X_t \mid X_t = x]$, where $X_\bullet = (X_t)_t$ is a stochastic process satisfying $(X_0,X_1)\sim{P_0}\otimes{P_1}$ and $\dot X_t$ is its time derivative. We investigate when $X_\bullet$ induces a \emph{straight-line flow}: a flow whose pointwise acceleration vanishes and is therefore exactly integrable by any first-order method. First, we develop multiple characterizations of straight-line flows in terms of PDEs involving the conditional statistics of the process. Then, we prove that straight-line flows under endpoint independence exhibit a sharp dichotomy. On the one hand, we construct explicit, computable straight-line processes for arbitrary Gaussian endpoints. On the other hand, we show that straight-line processes do not exist for targets with sufficiently well-separated modes. We demonstrate this obstruction through a sequence of increasingly general impossibility theorems that uncover a fundamental relationship between the sample-path behavior of a process with independent endpoints and the space-time geometry of this process' flow map. Taken together, these results provide a structural theory of when straight-line generative flows can, and cannot, exist.

2604.05241 2026-05-11 math.ST stat.TH

Information Geometry and Asymptotic Theory for SMML Estimators

Enes Makalic, Daniel F. Schmidt

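AI summary This paper studies the information geometry and asymptotic theory of strict minimum message length (SMML) estimators. The authors decompose the SMML objective into assertion entropy and conditional cross-entropy, exposing how it balances model selection against data encoding, and prove that under high-resolution conditions the optimal SMML partition is obtained by pulling back, through the maximum likelihood estimator, a weighted Fisher-Rao Voronoi tessellation of parameter space. For regular exponential families, SMML codepoints satisfy a moment-matching condition and can be interpreted as KL/Bregman centroids, giving coding theory a new information-geometric interpretation.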

Details
English abstract

Strict minimum message length (SMML) is an information-theoretic coding principle that represents a continuous statistical model by a finite set of assertions and a partition of the sample space. We show that the SMML objective decomposes into assertion entropy and conditional cross-entropy, balancing the cost of identifying an assertion against the cost of encoding data under the assigned model. For any fixed partition, the optimal codepoint for each cell is the model distribution that minimises Kullback-Leibler divergence from the data distribution restricted to that cell. Using the local Fisher-Rao geometry of regular parametric models, we show that, under high-resolution regularity conditions, optimal SMML partitions are asymptotically the pullback, through the maximum likelihood estimator, of weighted Fisher-Rao Voronoi tessellations in parameter space, with assertion probabilities appearing as additive weights. For regular exponential families, SMML codepoints satisfy a moment-matching condition and admit an interpretation as KL/Bregman centroids, while exact SMML cells are pullbacks of convex polyhedra in sufficient-statistic space. Together, these results show that SMML induces a natural information-geometric quantisation linking entropy-based coding, KL projection, and divergence-based Voronoi geometry.

2603.25806 2026-05-11 stat.ME math.ST stat.CO stat.TH

Context Tree Prior Distributions based on Node Weighting with exact Bayes Factors

Thiago Paulichen, Victor Freguglia

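AI summary This work proposes context-tree prior distributions based on node weighting for building variable-length Markov chain (VLMC) models. By specifying a weight function directly on nodes, the method folds structural assumptions into the prior in an intuitive way and overcomes the limited structural control of traditional branching-process approaches. It also introduces exact Bayes factor computation to support model comparison and hypothesis testing, and demonstrates the flexibility and effectiveness of the approach in simulation studies.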

Comments 31 pages, 9 figures

Details
English abstract

Variable-length Markov chains (VLMCs) are a flexible class of higher-order Markov models that admit a natural representation as context trees. Existing Bayesian methods for specifying prior distributions on tree structures rely on branching processes, but these suffer from a fundamental limitation. The connection between branching probabilities at individual nodes and the structural properties of the induced tree distribution is not straightforward, making it difficult to construct priors encoding specific structural beliefs. We address this limitation by introducing a novel representation of prior distributions on tree space based on context-tree functions. By directly specifying weights for individual contexts through a function on nodes, our approach provides an intuitive mechanism for incorporating structural hypotheses into the prior. This class of distributions maintains computational tractability, allowing marginal likelihoods and posterior mode trees to be computed exactly via generalizations of the Context Tree Weighting (CTW) and Context Tree Maximizing (CTM) algorithms. Exact Bayes factor computation enables rigorous model comparison and hypothesis testing. We demonstrate the flexibility and effectiveness of our approach through simulation studies comparing different prior specifications, and develop practical algorithms for selecting the maximal depth and performing model selection based on Bayes factors.

2602.10512 2026-05-11 cs.LG cs.LO stat.ML

Exponential Sample Complexity Separation between Flat and Hierarchical Agentic Theorem Provers

Sho Sonoda, Shunta Akiyama, Yuya Uezato

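AI summary This paper studies an exponential gap in sample complexity between flat and hierarchical agentic theorem provers. Modeling proof search as a deterministic finite-horizon Markov decision process and performing offline imitation learning from verified proof traces generated by a teacher prover, the authors analyze how the two learning schemes differ in sample efficiency. The results show that a hierarchical learner, by reusing proof structure, can succeed with exponentially fewer samples, which explains why reusable proof structure matters for verifier-based theorem proving.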

Details
English abstract

Agentic theorem provers often introduce intermediate lemmas, proof sketches, or subgoal decompositions before returning to tactic-level search. This can look like an expensive detour: if proving lemmas is itself hard, why should a learned prover spend effort there? We give a statistical learning answer. Instead of worst-case proof complexity over all formulas, we study the biased data distribution produced by a teacher prover: initial theorem states together with successful verified proof traces. We model proof search as a deterministic finite-horizon MDP and analyze offline imitation learning from those traces. The success bounds depend on the average length of teacher proofs, how predictable the teacher's next action is, and how accurately the student learns that local prediction problem. A flat student learns from fully inlined traces, so repeated subproofs appear many times in its training and test-time certificate. A hierarchical student instead predicts a reusable proof DAG and solves each shared block once. When flattening duplicates the same hard local argument exponentially many times, the sufficient-sample certificate produced by our bounds can be exponentially smaller for the hierarchical learner. This gives a concrete statistical mechanism by which reusable proof structure helps verifier-based theorem proving.

2602.09457 2026-05-11 stat.ML cs.DS cs.LG

From Average Sensitivity to Small-Loss Regret Bounds under Random-Order Model

Shinsaku Sakaue, Yuichi Yoshida

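AI summary This paper studies online learning in the random-order model, where the set of loss functions is chosen by an adversary but presented in random order. Extending an existing batch-to-online transformation, the authors give a new analysis framework that converts an offline algorithm's approximation guarantee, average sensitivity, and stability into small-loss regret bounds in the online setting. The approach applies to problems including online clustering and low-rank approximation, yields concrete results for submodular function minimization and ℓ₁ regression, and shows that sparsification techniques can achieve small-loss regret bounds without structural assumptions on the loss functions.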

Details
English abstract

We study online learning in the random-order model, where the multiset of loss functions is chosen adversarially but revealed in a uniformly random order. By extending the batch-to-online transformation of Dong and Yoshida (2023), we show that if an offline algorithm enjoys a $(1+\varepsilon)$-approximation guarantee, an average sensitivity bound controlled by a function $φ(\varepsilon)$, and stability with respect to $\varepsilon$, then we can obtain a small-loss regret bound typically of order $\tilde O(φ^{\star}(\mathrm{OPT}_T))$, where $φ^{\star}$ is the concave conjugate of $φ$, $\mathrm{OPT}_T$ is the offline optimum over $T$ rounds, and $\tilde O$ hides polylogarithmic factors in $T$. Our result refines their original $(1+\varepsilon)$-approximate regret guarantee and applies to a broad class of problems, including online $k$-means clustering and online low-rank approximation. We further apply our approach to online submodular function minimization using $(1\pm\varepsilon)$-cut sparsifiers of submodular hypergraphs, obtaining a small-loss regret bound of $\tilde O(n^3 + n^{3/4}\mathrm{OPT}_T^{3/4})$, where $n$ is the ground-set size; we also demonstrate its applicability to online $\ell_1$ regression. Our work sheds light on the power of sparsification and related algorithmic techniques in achieving small-loss regret bounds in the random-order model, without requiring structural assumptions on loss functions, such as linearity or smoothness.

2602.01642 2026-05-11 cs.LG cs.AI math.OC stat.CO stat.ML

The Effect of Mini-Batch Noise on the Implicit Bias of Adam

Matias D. Cattaneo, Boris Shigida

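AI summary This paper studies how mini-batch noise affects the implicit bias of the Adam optimizer, in particular its tendency to drive the model toward sharper or flatter regions of the loss landscape and hence its effect on generalization. With large batches, increasing β₂ strengthens the anti-regularization effect of the memory term and hurts generalization; with small batches, the effect of β₂ on regularization is reversed, and a similar monotonicity shift occurs in β₁, in the opposite direction. The theory also relates the batch size at which the shift occurs to the critical batch size, and experiments support these conclusions.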

Details
English abstract

With limited high-quality data and growing compute, multi-epoch training is regaining importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token prediction, has two momentum hyperparameters $(β_1, β_2)$ controlling memory and one very important hyperparameter, batch size, controlling (in particular) the amount of mini-batch noise. We introduce a theoretical framework to understand how mini-batch noise influences the implicit bias of memory in Adam (depending on $β_1$, $β_2$) towards sharper or flatter regions of the loss landscape, which is commonly observed to correlate with the generalization gap in multi-epoch training. We find that in the case of large batch sizes, higher $β_2$ increases the magnitude of anti-regularization by memory (hurting generalization), but as the batch size becomes smaller, the dependence of (anti-)regularization on $β_2$ is reversed. A similar monotonicity shift (in the opposite direction) happens in $β_1$. In particular, the common "default" pair $(β_1, β_2) = (0.9, 0.999)$ is a good choice if batches are small; for larger batches, in many settings moving $β_1$ closer to $β_2$ is much better in terms of validation accuracy in multi-epoch training. Moreover, our theoretical derivations connect the scale of the batch size at which the shift happens to the scale of the critical batch size. We illustrate this effect in experiments with small-scale data in the about-to-overfit regime.
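For readers who want to see where the two memory hyperparameters enter, the standard Adam update for a single scalar parameter can be sketched as follows. This is the textbook update with bias correction, not the paper's theoretical framework; the quadratic objective, the learning rate, and the step count are hypothetical choices, and the gradients here are noiseless, so the sketch does not model mini-batch noise.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment memory (beta1)
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment memory (beta2)
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for m
    v_hat = v / (1.0 - beta2 ** t)                # bias correction for v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Hypothetical toy problem: minimise f(x) = (x - 3)^2 with exact gradients.
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2.0 * (x - 3.0)
    x, m, v = adam_step(x, grad, m, v, t, lr=0.01)
print("final x:", x)
```

Replacing the exact gradient with a noisy estimate whose variance depends on the batch size would be the natural starting point for experimenting with the interaction the abstract describes.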

2602.00716 2026-05-11 stat.ML cond-mat.dis-nn cs.LG

Emergence of Distortions in High-Dimensional Guided Diffusion Models

Enrico Ventura, Beatrice Achilli, Luca Ambrogioni, Carlo Lucibello

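AI summary This paper studies how classifier-free guidance (CFG) distorts generated samples in high-dimensional guided diffusion models. Using tools from statistical physics, the authors analyze the mismatch between the CFG sampling distribution and the true conditional distribution and, in analytically tractable settings, show how the distortion depends on the data dimension and the number of classes. For high-dimensional Gaussian mixtures, distortions arise when the number of classes grows exponentially with the data dimension, and vanish in the sub-exponential regime because of a dynamical phase transition. The authors also propose a new guidance schedule that improves class separability and sample diversity.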

Comments 41 pages, 21 figures

Details
English abstract

Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often reduces sample diversity. Using tools from statistical physics, we analyze the emergence of generative distortions induced by CFG, namely the mismatch between the CFG sampling distribution and the true conditional distribution. We study this phenomenon in analytically tractable settings with exact score functions, characterizing its dependence on data dimensionality and the number of classes. For high-dimensional Gaussian mixtures, we use dynamic mean-field theory to show that distortions arise when the number of classes scales exponentially with the data dimension, whereas they vanish in the sub-exponential regime due to a dynamical phase transition. We further prove that, in the infinite-class limit, distortions remain unavoidable regardless of dimensionality because of the increasing density of classes. Finally, we show that standard CFG schedules cannot prevent variance shrinkage, and we propose a theoretically grounded guidance schedule incorporating a negative-guidance window that improves both class separability and sample diversity in real-world latent diffusion models.
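A one-dimensional toy calculation gives some intuition for why guidance can shrink variance. It assumes the common convention in which the guided score is `s_uncond + w * (s_cond - s_uncond)`, and it further assumes, purely for illustration, that both the conditional and the unconditional densities are Gaussian at a fixed noise level. Under those assumptions the guided score is the score of the tilted density proportional to `p(x|c)^w * p(x)^(1-w)`, which is again Gaussian. This is not the paper's mean-field analysis, and the distribution a sampler actually reaches when it follows guided scores across noise levels generally differs from this single-level tilt.

```python
def cfg_tilted_gaussian(mu_c, var_c, mu_u, var_u, w):
    """Mean and variance of the density proportional to
    N(x; mu_c, var_c)^w * N(x; mu_u, var_u)^(1 - w).

    The guided score -w*(x - mu_c)/var_c - (1 - w)*(x - mu_u)/var_u is linear
    in x, so the tilted density is Gaussian whenever its precision is positive.
    """
    precision = w / var_c + (1.0 - w) / var_u
    if precision <= 0:
        raise ValueError("tilted density is not normalisable for this w")
    mean = (w * mu_c / var_c + (1.0 - w) * mu_u / var_u) / precision
    return mean, 1.0 / precision

# Hypothetical numbers: a class-conditional N(1, 1) inside a broader N(0, 4).
for w in (1.0, 2.0, 3.0):
    mean, var = cfg_tilted_gaussian(1.0, 1.0, 0.0, 4.0, w)
    print(f"w={w}: mean={mean:.4f}, variance={var:.4f}")
```

With `w = 1` the conditional is recovered unchanged; for larger `w` the variance in this toy drops below the conditional variance and the mean moves away from the unconditional mean.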

2602.00474 2026-05-11 stat.ML cs.LG cs.NA math.NA

Persistent-Transient Policy Evaluation for Markov Chains via Minimal Peripheral Quotients

Yang Xu, Vaneet Aggarwal

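AI summary This paper studies fixed-policy evaluation for finite Markov chains, particularly chains that may be reducible and periodic. Traditional gain and bias decompositions cannot cleanly separate persistent behavior from transient effects. By identifying the real peripheral invariant subspace of the transition matrix, the paper proposes a minimal peripheral quotient decomposition that removes the non-decaying modes so that the remaining dynamics are strictly stable. The method decomposes the reward uniquely into a persistent part and a transient part, reconstructs finite-horizon returns, and provides a stable estimator under a generative model.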

Details
English abstract

We study fixed-policy evaluation for finite Markov chains that may be reducible and periodic. Classical evaluation methods with gain and bias decomposition are not always diagnostic: the gain records only invariant Cesàro averages, while persistent phase-dependent behavior is absorbed into the bias together with genuinely transient effects. We identify the real peripheral invariant subspace $\mathcal{K}(P)$ of the transition matrix $P$ as the source of this ambiguity. Quotienting by $\mathcal{K}(P)$ is the minimal exact quotient that removes all non-decaying modes and makes the remaining dynamics strictly stable. After choosing a gauge projection $Π$ with kernel $\mathcal{K}(P)$, the reward admits a unique decomposition $r = g_Π^\star + (I-P)v_Π^\star$, where $g_Π^\star$ is a persistent regime profile and $v_Π^\star$ is a gauge-fixed transient component. An exact comparison with classical normalized gain and bias shows that the new pair reallocates the same information so that all persistent modes are represented in $g_Π^\star$ and $v_Π^\star$ is transient. This decomposition reconstructs finite-horizon returns, recovers statewise average reward, admits a transient-cost interpretation, and yields a stable estimator under a generative model.
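The ambiguity described here can be seen on the smallest periodic chain. The sketch below uses a hypothetical deterministic 2-cycle with reward 1 in state 0 and 0 in state 1: the Cesàro average of the expected reward tends to the same gain, 0.5, from either start state, while the expected reward itself keeps oscillating with a phase that depends on the start. It only illustrates the classical issue and does not implement the paper's quotient construction.

```python
def expected_rewards(P, r, start, horizon):
    """E[r(X_t) | X_0 = start] for t = 0..horizon-1, by propagating the state distribution."""
    n = len(P)
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    out = []
    for _ in range(horizon):
        out.append(sum(dist[i] * r[i] for i in range(n)))
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return out

def cesaro_averages(values):
    """Running averages (1/(t+1)) * sum_{s<=t} values[s]."""
    return [sum(values[: t + 1]) / (t + 1) for t in range(len(values))]

# Hypothetical example: deterministic 2-cycle (period 2), reward only in state 0.
P = [[0.0, 1.0], [1.0, 0.0]]
r = [1.0, 0.0]

from_0 = expected_rewards(P, r, start=0, horizon=8)
from_1 = expected_rewards(P, r, start=1, horizon=8)
print("expected rewards from state 0:", from_0)
print("expected rewards from state 1:", from_1)
print("Cesaro averages from state 0:", cesaro_averages(from_0))
```

The gain alone cannot distinguish the two start states, even though their reward sequences never agree at any single time step.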

2512.23927 2026-05-11 stat.ML cs.LG

Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration

Lars van der Laan, Nathan Kallus

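AI summary This paper studies what stabilizes soft fitted Q-iteration (soft FQI) without Bellman completeness and proposes a stability analysis based on local stationary norm alignment. Analyzing the soft Bellman operator near the soft-optimal fixed point, the authors find that it contracts in the stationary state-action norm, and on that basis design stationary-reweighted soft FQI, which achieves finite-sample local linear convergence. The study also shows that ordinary soft FQI is locally stable under on-policy stationary sampling, and explains temperature annealing as a continuation strategy for reaching the contraction region.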

Details
English abstract

Fitted $Q$-iteration (FQI) and soft FQI are widely used value-based methods for offline reinforcement learning, but their standard stability guarantees often depend on Bellman completeness, a strong closure condition that can fail under function approximation. We analyze soft FQI without Bellman completeness and identify the stability mechanism that replaces it: local stationary norm alignment. Near the soft-optimal fixed point, the soft Bellman operator has the same first-order behavior as the policy-evaluation operator for the soft-optimal policy. This operator contracts in the policy's stationary state-action norm, whereas standard fitted regression projects Bellman targets in the behavior norm. This mismatch explains instability under distribution shift. We use this insight to develop stationary-reweighted soft FQI, which reweights each regression step toward the stationary distribution of the current softmax policy. Under approximate realizability and controlled weighting error, we prove finite-sample local linear convergence to the projected fixed point, separating statistical error from geometrically damped weight-estimation error. Our results also show that ordinary soft FQI is locally stable under on-policy stationary sampling, even without Bellman completeness, and explain temperature annealing as a continuation strategy for reaching a contraction region.

2512.23805 2026-05-11 stat.ML cs.LG

Fitted $Q$ Evaluation Without Bellman Completeness via Stationary Weighting

Lars van der Laan, Nathan Kallus

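AI summary This paper studies a fitted Q-evaluation (FQE) method that does not rely on Bellman completeness. It weights the regression step by the target policy's stationary state-action distribution, replacing the behavior-distribution-norm projection used by standard FQE. The method keeps FQE's modular supervised-learning form while aligning the fitted projection with an operator that is contractive in the $L^2$ norm induced by the target policy. This gives finite-sample linear convergence to the stationary projected Bellman fixed point and separates iteration, statistical, approximation, and weight-estimation errors. Experiments show that the method can stabilize FQE and reduce value-estimation error.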

Details
English abstract

Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximation. We study an alternative route: changing the norm used in the regression step. The policy-evaluation Bellman operator is contractive in the $L^2$ norm induced by the target policy's stationary state-action distribution, whereas standard off-policy FQE projects Bellman targets in the behavior-distribution norm. We propose stationary-weighted FQE, which reweights each Bellman regression by the stationary target-to-behavior density ratio. The method preserves FQE's modular supervised-learning form while aligning the fitted projection with that contractive norm. We prove finite-sample linear convergence to the stationary projected Bellman fixed point under misspecification, without requiring Bellman completeness. The bound separates finite-iteration, statistical, approximation, and weight-estimation errors, and shows that ratio-estimation error is attenuated when the inherent Bellman error is small. Controlled experiments show that stationary weighting can stabilize FQE and reduce value error when behavior-norm regression overemphasizes regions rarely visited by the target policy.
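The idea of reweighting each Bellman regression can be sketched with a single scalar feature, where every weighted least-squares step has a closed form. This is a minimal illustration under strong assumptions: the transitions, the feature values, and the weights are hypothetical, the weights are simply supplied rather than estimated as a stationary density ratio, and the function name `weighted_fqe` is not taken from the paper.

```python
def weighted_fqe(transitions, weights, gamma, n_iters=200):
    """Linear FQE with one scalar feature and per-sample regression weights.

    transitions: list of (phi, reward, phi_next), where phi_next is the feature
    of the next state paired with the target policy's action.
    Each iteration solves
        min_theta  sum_i w_i * (phi_i * theta - (r_i + gamma * phi_next_i * theta_k))^2
    in closed form.
    """
    theta = 0.0
    denom = sum(w * phi * phi for w, (phi, _, _) in zip(weights, transitions))
    for _ in range(n_iters):
        num = sum(
            w * phi * (rew + gamma * phi_next * theta)
            for w, (phi, rew, phi_next) in zip(weights, transitions)
        )
        theta = num / denom
    return theta

# Hypothetical misspecified example: one feature cannot fit both samples,
# so the fitted value depends on which samples the regression emphasises.
data = [(1.0, 1.0, 1.0), (2.0, 0.0, 2.0)]
print("uniform weights:      ", weighted_fqe(data, [1.0, 1.0], gamma=0.5))
print("weight on sample 0 only:", weighted_fqe(data, [1.0, 0.0], gamma=0.5))
```

Because the model is misspecified, the two weightings converge to different fixed points, which is the sense in which the choice of regression norm matters.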

2512.23694 2026-05-11 stat.ML cs.LG econ.EM

Bellman Calibration for $V$-Learning in Offline Reinforcement Learning

Lars van der Laan, Nathan Kallus

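AI summary Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods involve bootstrapping, function approximation, and distribution shift, while standard guarantees usually require Bellman completeness or realizability. This paper proposes Bellman calibration, a weaker reliability criterion requiring that states with similar predicted values have average Bellman targets consistent with those predictions, and builds on it Iterated Bellman Calibration, a post-hoc procedure that recalibrates a value predictor by fitting a one-dimensional map of its original prediction. Without Bellman completeness or value-function realizability, the method guarantees finite-sample control of calibration error at one-dimensional nonparametric rates, and it decomposes value error into statistical estimation, finite-iteration, and approximation errors, clarifying when calibration improves prediction.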

Details
English abstract

Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman completeness or realizability. We introduce Bellman calibration, a weak reliability criterion requiring that states assigned similar predicted values have average Bellman targets that agree with those predictions. This criterion yields a scalar calibration error for diagnosing systematic numerical miscalibration, which we estimate from off-policy data using doubly robust Bellman target estimates. We then propose Iterated Bellman Calibration, a model-agnostic post-hoc procedure that recalibrates any learned value predictor by fitting a one-dimensional map of its original prediction, with histogram and isotonic variants. We prove finite-sample guarantees showing that Bellman calibration error is controlled at one-dimensional nonparametric rates without Bellman completeness or value-function realizability. Our value-error bounds separate statistical estimation, finite-iteration, and approximation errors, clarifying when calibration improves value prediction and when its gains are limited by the information in the original predictor or insufficient coverage.

2512.12116 2026-05-11 cs.LG stat.ML

Neural CDEs as Correctors for Learned Time Series Models

Muhammad Bilal Shahid, Zhanhong Jiang, Prajwal Koirala, Soumik Sarkar, Cody Fleming

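AI summary This paper proposes a predictor-corrector framework for improving the multi-step forecasts of time-series models. The predictor generates multi-step forecasts, and the corrector, a neural controlled differential equation, corrects the forecast errors; it handles irregularly sampled time series and works with both continuous- and discrete-time predictors. The study introduces two regularization strategies that improve the corrector's extrapolation and training efficiency, and provides theoretical stability and convergence guarantees. Experiments show that the method improves forecasting accuracy across a range of predictors, indicating that it is predictor-agnostic.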

Details
English abstract

Learned time-series models, whether continuous or discrete, are widely used for forecasting the states of dynamical systems but suffer from error accumulation in multi-step forecasts. To address this issue, we propose a Predictor-Corrector framework in which the Predictor is a learned time-series model that generates multi-step forecasts and the Corrector is a neural controlled differential equation that corrects the forecast errors. The Corrector works with irregularly sampled time series and is compatible with both continuous- and discrete-time Predictors. We further introduce two regularization strategies that improve the Corrector's extrapolation performance and accelerate its training. We also provide theoretical guarantees on the stability and convergence of the proposed framework. Experiments on synthetic, physics-based, and real-world datasets show that the proposed framework consistently improves forecasting performance across diverse Predictors, including neural ordinary differential equations, ContiFormer, and DLinear, demonstrating its predictor-agnostic nature.

2512.01279 2026-05-11 stat.ME

A Dynamical Model for Spatio-Temporal Processes Motivated by Second-Order Partial Differential Equations

Yutong Zhang, Xiao Liu

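AI summary This paper proposes a dynamical model for spatio-temporal processes based on second-order stochastic partial differential equations (SPDEs). It builds an infinite-dimensional linear state-space representation and uses Galerkin's method to obtain a finite-dimensional approximation, which makes computation and parameter estimation feasible. The model's space-time covariance is derived and the approximation error is quantified, and numerical experiments in several practical settings demonstrate its applicability.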

Details
English abstract

An important class of spatio-temporal models is constructed by leveraging the hierarchical structure of dynamical (or state-space) models. This paper proposes a new statistical dynamical model for spatio-temporal processes motivated by second-order stochastic partial differential equations (SPDE). In particular, an infinite-dimensional linear state-space representation is obtained where the state transition is governed by a proposed SDE. Then, using Galerkin's method, a finite-dimensional approximation to the infinite-dimensional SDE is obtained, yielding a dynamical model with finite states that facilitates computation and parameter estimation. The space-time covariance of the approximated dynamical model is obtained, and the error between the approximate and exact covariance matrices is quantified. Comprehensive numerical investigations, including the 2D wave equation, seismic wave propagation, advection-diffusion equations and wildfire aerosol propagation processes, are performed to demonstrate the application of the proposed model. Code is available.

2511.23216 2026-05-11 stat.ME

Comparing Variable Selection and Model Averaging Methods for Logistic Regression

Nikola Sekulovski, František Bartoš, Don van den Bergh, Giuseppe Arena, Henrik R. Godmann, Vipasha Goyal, Julius M. Pfadt, Maarten Marsman, Adrian E. Raftery

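AI summary This paper studies the relative performance of variable selection and model averaging methods for handling model uncertainty in logistic regression. A preregistered simulation study compares 28 commonly used methods under different data conditions. When the data show no separation, Bayesian model averaging (BMA) with g-priors, especially g = max(n, p²), performs best; when separation occurs, penalized likelihood methods such as the LASSO are more stable, and BMA with the local empirical Bayes prior is competitive in both cases. The study gives applied researchers practical guidance for handling model uncertainty in logistic regression.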

Details
English abstract

Model uncertainty is a central challenge in statistical models for binary outcomes such as logistic regression, arising when it is unclear which predictors should be included in the model. Many methods have been proposed to address this issue for logistic regression, but their relative performance under realistic conditions remains poorly understood. We therefore conducted a preregistered, simulation-based comparison of 28 established methods for variable selection and inference under model uncertainty, using 11 empirical datasets spanning a range of sample sizes and numbers of predictors, in cases both with and without separation. We found that Bayesian model averaging (BMA) methods based on g-priors, particularly g = max(n, p^2), show the strongest overall performance when separation is absent. When separation occurs, penalized likelihood approaches, especially the LASSO, provide the most stable results, while BMA with the local empirical Bayes (EB-local) prior is competitive in both situations. These findings offer practical guidance for applied researchers on how to effectively address model uncertainty in logistic regression in modern empirical and machine learning research.

2511.09024 2026-05-11 stat.ME

Instrumental variables system identification with $L^p$ consistency

Simon Kuang, Xinfan Lin

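AI summary This study proposes a system identification method that synthesizes instrumental variables from the data to remove the bias of least-squares estimates of dynamical systems from noisy data. It establishes finite-sample $L^p$ consistency in both discrete- and continuous-time models and recovers a nonparametric $\sqrt{n}$ convergence rate. Experiments on a forced Lorenz system show that the method substantially reduces parameter bias and root-mean-square error, and it applies to modern sparsity-promoting dynamics learning models.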

Comments To appear at Learning for Decision and Control 2026

Details
English abstract

Instrumental variables (IV) estimation eliminates the bias that afflicts least-squares identification of dynamical systems from noisy data, yet it traditionally relies on external instruments that are seldom available for nonlinear time series data. We propose an IV estimator that synthesizes instruments from the data. We establish finite-sample $L^{p}$ consistency for all $p \ge 1$ in both discrete- and continuous-time models, recovering a nonparametric $\sqrt{n}$-convergence rate. On a forced Lorenz system our estimator reduces parameter bias by 200x (continuous-time) and 500x (discrete-time) relative to least squares and reduces RMSE by up to a factor of ten. Because the method only assumes that the model is linear in the unknown parameters, it is broadly applicable to modern sparsity-promoting dynamics learning models.
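The bias that motivates instrumental variables shows up already in a scalar errors-in-variables regression. The sketch below is the classical textbook setting with a hypothetical external instrument (a second, independently noisy measurement of the same latent signal); it does not implement the paper's data-synthesized instruments, and the sample size, noise levels, and true slope are illustrative choices.

```python
import random

def ols_slope(x, y):
    """Least-squares slope through the origin: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def iv_slope(z, x, y):
    """Instrumental-variables slope through the origin: sum(z*y) / sum(z*x)."""
    return sum(a * b for a, b in zip(z, y)) / sum(a * b for a, b in zip(z, x))

rng = random.Random(0)
n, theta = 20000, 2.0                                   # hypothetical sample size and true slope
latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [s + rng.gauss(0.0, 1.0) for s in latent]           # regressor observed with noise
z = [s + rng.gauss(0.0, 1.0) for s in latent]           # instrument: independent noisy copy
y = [theta * s + rng.gauss(0.0, 0.1) for s in latent]   # response depends on the latent signal

print("least squares:", ols_slope(x, y))
print("instrumental variables:", iv_slope(z, x, y))
```

With unit-variance signal and unit-variance measurement noise, least squares is attenuated toward `theta / 2`, while the instrument's noise is uncorrelated with the regressor's noise, so the IV ratio concentrates near `theta`.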

2510.18242 2026-05-11 math.ST stat.ME stat.ML stat.TH

Fast and Efficient Parallel Sampling Using Higher Order Langevin Dynamics

Jaideep Mahajan, Kaihong Zhang, Feng Liang, Jingbo Liu

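AI summary This paper studies fast parallel sampling from high-dimensional strongly log-concave distributions. Langevin-based samplers converge rapidly in continuous time, but their discretizations typically require polynomially many steps, which limits parallel efficiency. The paper combines higher-order Langevin dynamics with blockwise Lagrange interpolation, which substantially reduces the number of processors needed for parallel sampling while keeping polylogarithmic time complexity. The results cover models including Bayesian logistic regression and two-layer neural networks, and improve the space complexity of existing parallel sampling methods.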

Details
English abstract

We study parallel sampling from high-dimensional strongly log-concave distributions. Langevin-based samplers converge rapidly in continuous time, but their discretizations are typically sequential and often require polynomially many steps in the dimension $d$, the inverse target accuracy $\varepsilon^{-1}$, or both. Picard-based parallel sampling methods reduce this sequential depth to polylogarithmic scale by solving for many time-discretization points in parallel; however, existing guarantees often require a polynomial number of processors, leading to substantial memory and gradient-evaluation costs in high dimensions. We show that higher-order Langevin structure can reduce this parallel resource burden while preserving polylogarithmic sequential depth. Our method combines arbitrary-order Langevin dynamics with blockwise Lagrange polynomial interpolation. This sharper discretization reduces the number of parallel points required to achieve a target accuracy. Our results cover both higher-order smooth potentials and ridge-separable potentials, including models such as Bayesian logistic regression and two-layer neural networks, and improve upon the space complexity of the current literature on parallel log-concave sampling.
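As a point of reference for the sequential baseline mentioned in this abstract, the unadjusted Langevin algorithm (the Euler-Maruyama discretization of first-order overdamped Langevin dynamics) can be sketched in one dimension. This is not the paper's higher-order or parallel method; the standard Gaussian target, the step size, the chain length, and the burn-in are hypothetical choices for illustration.

```python
import math
import random

def ula_chain(grad_potential, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k - step * grad U(x_k) + sqrt(2 * step) * xi_k,  xi_k ~ N(0, 1).
    Each iterate depends on the previous one, so the loop is inherently sequential.
    """
    x, samples = x0, []
    for _ in range(n_steps):
        x = x - step * grad_potential(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Hypothetical target: standard Gaussian, U(x) = x^2 / 2, so grad U(x) = x.
rng = random.Random(0)
samples = ula_chain(lambda x: x, x0=0.0, step=0.1, n_steps=50000, rng=rng)

kept = samples[1000:]                       # drop a short burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print("sample mean:", mean, "sample variance:", var)
```

For this target the chain's stationary variance is `1 / (1 - step / 2)` rather than exactly 1, a small discretization bias that shrinks with the step size at the cost of more sequential steps.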

2509.24789 2026-05-11 cs.LG stat.ML

Fidel-TS: A High-Fidelity Multimodal Benchmark for Time Series Forecasting

Zhijian Xu, Wanxu Cai, Xilin Dai, Zhaorong Deng, Qiang Xu

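AI summary This paper proposes Fidel-TS, a high-fidelity multimodal benchmark dataset for time series forecasting, designed to address the problems of existing datasets in scale, frequency, data contamination, and information leakage. The benchmark follows the core principles of data sourcing integrity, leak-free design, and structural clarity, exposes the limitations of earlier benchmarks, and offers new insights for evaluating a range of unimodal and multimodal forecasting models and large language models.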

Comments new version

Details
English abstract

The evaluation of time series forecasting models is hindered by a lack of high-quality benchmarks, leading to overestimated assessments of progress. Existing datasets suffer from issues ranging from small scale, low frequency, and pre-training data contamination in unimodal designs to the temporal and description leakage prevalent in early multimodal designs. To address this, we formalize the core principles of high-fidelity benchmarking, focusing on data sourcing integrity, leak-free design, and structural clarity. We introduce Fidel-TS, a new large-scale benchmark built from these principles. Our experiments reveal the limitations of prior benchmarks and the potential discrepancies in model evaluation, providing new insights into multiple existing unimodal and multimodal forecasting models and LLMs across various evaluation tasks.

2509.21172 2026-05-11 cs.LG econ.EM math.OC stat.ML

Inverse Reinforcement Learning with Just Classification and a Few Regressions

Lars van der Laan, Nathan Kallus, Aurelien Bibaut

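AI summary This paper studies reward recovery in inverse reinforcement learning under the maximum-entropy model and proposes GenPQR, a general method that needs only classification and a few regressions and does not depend on a specific neural-network architecture or on anchor-action restrictions. GenPQR modularly estimates the behavior policy, computes the soft Q-function, and recovers the normalized reward. The theory gives finite-sample guarantees under function approximation, and experiments indicate that it matches or improves on DeepPQR in reward recovery while being more flexible and modular.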

Details
English abstract

Inverse reinforcement learning (IRL) aims to infer rewards from observed behavior, but rewards are not identified from the policy alone: many reward--value pairs can rationalize the same actions. Meaningful reward recovery therefore requires a normalization, yet existing normalized IRL methods often rely on anchor-action restrictions or specialized neural architectures. We study reward recovery in the maximum-entropy, or Gumbel-shock, model under a broad class of statewise affine normalizations, with anchor-action constraints as a special case. This yields Generalized Policy-to-$Q$-to-Reward (GenPQR), a modular procedure that estimates the behavior policy, evaluates its soft $Q$-function through the Bellman equation, and recovers the normalized reward. Both stages can be implemented with off-the-shelf classification and regression methods. We prove modular finite-sample guarantees under general function approximation, with separate policy-estimation and $Q$-estimation errors. As a concrete instantiation, we study GenPQR with fitted $Q$-evaluation, reducing IRL to policy estimation followed by regression. Experiments show that GenPQR matches or improves reward recovery relative to DeepPQR while remaining simpler and more modular. Compared with DeepPQR, our theory goes beyond anchor actions, accommodates large and continuous action spaces, makes coverage requirements explicit, and is not tied to a specific neural-network architecture or training procedure.

2509.03738 2026-05-11 cs.LG cs.AI eess.SP stat.ML

Mechanistic Interpretability with Sparse Autoencoder Neural Operators

Bahareh Tolooshams, Ailsa Shen, Anima Anandkumar

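AI summary This paper proposes sparse autoencoder neural operators (SAE-NOs), a new class of sparse autoencoders that operate in function spaces rather than fixed-dimensional Euclidean spaces, with the goal of improving mechanistic interpretability. Building on the functional representation hypothesis, SAE-NOs parameterize concepts as functions, capturing not only a concept's presence but also how and where it is expressed across the input domain. SAE-FNO, the instantiation based on Fourier neural operators, performs well on data with spatial or frequency structure: it learns localized patterns, uses concepts efficiently, and remains stable and generalizes across resolutions and domain sizes.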

Comments Tolooshams and Shen contributed equally. Preprint. An earlier version was presented as an Oral and Extended Abstract at the Workshop on Unifying Representations in Neural Models (UniReps 2025) at NeurIPS

Details
English abstract

We introduce sparse autoencoder neural operators (SAE-NOs), a new class of sparse autoencoders that operate in function spaces rather than fixed-dimensional Euclidean representations. We formalize the functional representation hypothesis, where data are explained through sparse compositions of structured functions. Unlike standard SAEs that represent concepts with scalar activations, SAE-NOs parameterize concepts as functions, enabling representations that capture not only a concept's presence, but also how and where it is expressed across the input domain. We achieve this through joint sparsity: concept sparsity selects active concepts, while domain sparsity governs where they are expressed. We instantiate this framework using Fourier neural operators (SAE-FNOs), parameterizing concepts as integral operators in the Fourier domain. This functional and spectral parameterization is particularly advantageous when data exhibit spatial structure across scales or when concepts are frequency-structured. We characterize SAE-FNO on vision data and demonstrate that it learns localized patterns, uses concepts more efficiently, and exhibits stable concept characteristics across sparsity levels. We further show that SAE-FNO adapts to changes in domain size and generalizes across discretizations, operating at resolutions beyond those seen during training, where standard SAEs fail. We also introduce lifting into SAEs and show theoretically and empirically that it acts as a preconditioner that accelerates optimization. Overall, our results show that moving from vector-valued to functional parameterizations, with concept and domain sparsity, extends SAEs from representing concept presence to modeling structured concept expression, highlighting the importance of parameterization.

2509.03512 2026-05-11 stat.ME

Bayesian Multivariate Sparse Functional Principal Components Analysis

Joseph Sartini, Scott Zeger, Ciprian Crainiceanu

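AI summary This paper proposes MSFAST, a fully Bayesian inferential framework for multivariate, sparsely observed functional data, which aims to model the principal components more accurately and quantify their uncertainty. The method extends the FAST approach by standardizing variables, using a better-suited orthogonal basis, and improving computational stability, among other strategies, which improves performance on sparse data. Simulations show MSFAST's advantage in low signal-to-noise settings, and an application to a child growth study demonstrates its usefulness in practice.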

Comments 23 pages, 6 figures for main text. Appendix contains supplemental material

Details
English abstract

Functional Principal Components Analysis (FPCA) provides a parsimonious, semi-parametric model for multivariate, sparsely-observed functional data. Frequentist FPCA approaches estimate principal components (PCs) from the data, then condition on these estimates in subsequent analyses. As an alternative, we propose a fully-Bayesian inferential framework for multivariate, sparse functional data (MSFAST) which explicitly models the PCs and incorporates their uncertainty. MSFAST builds upon the FAST approach to FPCA for univariate, densely-observed functional data. Like FAST, MSFAST represents PCs using orthonormal splines and samples the orthonormal spline coefficients using parameter expansion. MSFAST extends FAST to multivariate, sparsely-observed data by (1) standardizing each functional covariate to mitigate poor posterior conditioning due to disparate scales; (2) using a better-suited orthogonal spline basis; (3) updating parameterizations for computational stability; (4) introducing routines that leverage multiple cores and threads to accelerate compute; (5) using a Procrustes-based posterior PC alignment procedure; and (6) providing efficient prediction routines. We evaluate MSFAST alongside existing implementations using simulations. MSFAST produces uniquely valid inferences and accurate estimates, particularly in smaller signal-to-noise regimes. MSFAST is motivated by and applied to a study of child growth, with an accompanying vignette illustrating the implementation step-by-step.

2509.02826 2026-05-11 cs.LG cs.AI stat.AP stat.CO

Ensemble Learning for Healthcare: A Comparative Analysis of Hybrid Voting and Ensemble Stacking in Obesity Risk Prediction

Towhidul Islam, Md Sumon Ali

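AI summary This study compares hybrid majority voting and ensemble stacking for obesity risk prediction, assessing their accuracy and efficiency. Experiments on two datasets show that ensemble stacking has stronger predictive capability for complex data distributions, while hybrid majority voting is a robust alternative. The study also examines the complementary strengths of different machine learning algorithms within ensemble methods, offering guidance for model selection in healthcare.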

Comments There are some errors found

Details
English abstract

Obesity is a critical global health issue driven by dietary, physiological, and environmental factors, and is strongly associated with chronic diseases such as diabetes, cardiovascular disorders, and cancer. Machine learning has emerged as a promising approach for early obesity risk prediction, yet a comparative evaluation of ensemble techniques -- particularly hybrid majority voting and ensemble stacking -- remains limited. This study aims to compare hybrid majority voting and ensemble stacking methods for obesity risk prediction, identifying which approach delivers higher accuracy and efficiency. The analysis seeks to highlight the complementary strengths of these ensemble techniques in guiding better predictive model selection for healthcare applications. Two datasets were utilized to evaluate three ensemble models: Majority Hard Voting, Weighted Hard Voting, and Stacking (with a Multi-Layer Perceptron as meta-classifier). A pool of nine Machine Learning (ML) algorithms, evaluated across a total of 50 hyperparameter configurations, was analyzed to identify the top three models to serve as base learners for the ensemble methods. Preprocessing involved dataset balancing and outlier detection, and model performance was evaluated using Accuracy and F1-Score. On Dataset-1, weighted hard voting and stacking achieved nearly identical performance (Accuracy: 0.920304, F1: 0.920070), outperforming majority hard voting. On Dataset-2, stacking demonstrated superior results (Accuracy: 0.989837, F1: 0.989825) compared to majority hard voting (Accuracy: 0.981707, F1: 0.981675) and weighted hard voting, which showed the lowest performance. The findings confirm that ensemble stacking provides stronger predictive capability, particularly for complex data distributions, while hybrid majority voting remains a robust alternative.
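
To make the difference between majority hard voting and weighted hard voting concrete, here is a minimal sketch of the combination rule only. The function name, the class labels, and the weights are hypothetical; the abstract does not state how the weights were chosen in the study, and stacking is not shown.

```python
from collections import defaultdict

def weighted_hard_vote(labels, weights=None):
    """Combine base-learner class predictions for one sample.

    With weights=None every learner counts equally (majority hard voting);
    otherwise each learner's vote is scaled by its weight.
    Ties go to the label that appeared first.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    score = defaultdict(float)
    for label, w in zip(labels, weights):
        score[label] += w
    return max(score, key=score.get)

# Three hypothetical base learners disagree on one sample.
predictions = ["obese", "normal", "normal"]
print(weighted_hard_vote(predictions))                    # equal weights
print(weighted_hard_vote(predictions, [0.9, 0.3, 0.3]))   # first learner dominates
```

With equal weights the two learners that agree win; with the assumed weights the single heavily weighted learner overrides them, which is one way the two voting schemes can rank differently on the same data.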

2508.02965 2026-05-11 stat.ME

Two Tunable Gini-Type Measures with U-Statistic Estimation: Theory, Simulation, and an Empirical Application to GDP per Capita in the Americas

Roberto Vila, Helton Saulo

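AI summary This paper proposes two tunable Gini-type inequality measures, $G_p$ and $H_q$, which converge to the classical Gini coefficient as $p, q \to \infty$. The tuning parameters $p>1$ and $q>0$ give flexible control over the influence of disparities between observations. For each index the authors derive closed-form $U$-statistic estimators and prove strong consistency and asymptotic normality under mild moment conditions. Monte Carlo simulations and an empirical analysis of GDP per capita in the Americas show how these parameters affect the inequality measure.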

Comments 17 pages, 9 figures

Details
English abstract

We introduce two families of inequality measures, $G_p$ and $H_q$, that converge to the classical Gini coefficient as $p,q\to\infty$. The tuning parameters $p>1$ and $q>0$ regulate the influence of disparities between observations. For each index we derive closed-form $U$-statistic plug-in estimators and establish strong consistency and asymptotic normality under mild moment conditions. A Monte Carlo study assesses finite-sample behavior across $(n,p,q)$, and an empirical illustration with GDP per capita in the Americas shows how the tuning parameters influence the measure of inequality.
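
The abstract does not give the definitions of $G_p$ and $H_q$, so the sketch below covers only their stated limit, the classical Gini coefficient $G = \mathbb{E}|X_1 - X_2| / (2\,\mathbb{E}[X])$, estimated by plugging in the pairwise $U$-statistic for the mean absolute difference. The function name and sample values are assumptions for illustration, not the paper's estimators.

```python
from itertools import combinations

def gini_u_statistic(xs):
    """Plug-in estimate of the classical Gini coefficient.

    Numerator: U-statistic for E|X1 - X2| (average over all unordered pairs).
    Denominator: twice the sample mean. Assumes positive mean and n >= 2.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    pairs = n * (n - 1) // 2
    mean_abs_diff = sum(abs(a - b) for a, b in combinations(xs, 2)) / pairs
    mean = sum(xs) / n
    return mean_abs_diff / (2.0 * mean)

# Hypothetical incomes: perfect equality versus a small spread.
print(gini_u_statistic([1.0, 1.0, 1.0]))
print(gini_u_statistic([1.0, 2.0, 3.0]))
```

The pairwise sum costs $O(n^2)$; for large samples a sort-based formula for the same quantity would be the usual choice.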

2507.18147 2026-05-11 stat.ML

Learning graphons from data: Random walks, transfer operators, and spectral clustering

Stefan Klus, Jason J. Bramburger

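AI summary This paper studies how to learn graphons from data, linking the stochastic process underlying a signal to a random walk process on a graphon. Introducing transfer operators such as the Koopman and Perron-Frobenius operators, the authors show how to estimate these operators from signal data and use their eigenvalues and eigenfunctions for clustering, extending conventional spectral clustering from graphs to graphons. They also show how to reconstruct transition probability densities and, in the reversible case, the graphon itself from signal data alone, and apply the methods to a variety of synthetic and real-world signals.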

Details
English abstract

Many signals evolve in time as a stochastic process, randomly switching between states over discretely sampled time points. Here we make an explicit link between the underlying stochastic process of a signal that can take on a bounded continuum of values and a random walk process on a graphon. Graphons are infinite-dimensional objects that represent the limit of convergent sequences of graphs whose size tends to infinity. We introduce transfer operators, such as the Koopman and Perron--Frobenius operators, associated with random walk processes on graphons and then illustrate how these operators can be estimated from signal data and how their eigenvalues and eigenfunctions can be used for detecting clusters, thereby extending conventional spectral clustering methods from graphs to graphons. Furthermore, we show that it is also possible to reconstruct transition probability densities and, if the random walk process is reversible, the graphon itself using only the signal. The resulting data-driven methods are applied to a variety of synthetic and real-world signals, including daily average temperatures and stock index values.

2410.21213 2026-05-11 stat.ME

Spatial causal inference in the presence of preferential sampling to study the impacts of marine protected areas

Dongjae Son, Brian J. Reich, Erin M. Schliep, Shu Yang, David A. Gill

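AI summary This paper studies how to assess the effect of marine protected areas (MPAs) on biodiversity in the presence of preferential sampling. The authors propose a spatial causal inference method that simultaneously accounts for unmeasured spatial confounders in both the sampling process and the treatment allocation, yielding more accurate causal effect estimates. Simulation studies and a real-data analysis support the method's ability to identify causal effects and indicate that preferential sampling has a notable impact on the estimated effect.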

Details
English abstract

Marine Protected Areas (MPAs) have been established globally to conserve marine resources. Given their maintenance costs and impact on commercial fishing, it is critical to evaluate their effectiveness to support future conservation. In this paper, we use data collected from the Australian coast to estimate the effect of MPAs on biodiversity. Environmental studies such as these are often observational, and processes of interest exhibit spatial dependence, which presents challenges in estimating the causal effects. Spatial data can also be subject to preferential sampling, where the sampling locations are related to the policy and the response variable, further complicating inference and prediction. To address these challenges, we propose a spatial causal inference method that simultaneously accounts for unmeasured spatial confounders in both the sampling process and the treatment allocation. We prove the identifiability of key parameters in the model and the consistency of the posterior distributions of those parameters. We show via simulation studies that the causal effect of interest can be reliably estimated under the proposed model. The proposed method is applied to assess the effect of MPAs on fish biomass. We find evidence of preferential sampling and that properly accounting for this source of bias impacts the estimate of the causal effect.

2405.10742 2026-05-11 stat.ME stat.AP

Efficient Sampling in Disease Surveillance through Subpopulations: Sampling Canaries in the Coal Mine

Ivo V. Stoepker

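AI summary This paper studies how selectively sampling subpopulations can improve outbreak detection efficiency in disease surveillance. The author shows that, compared with uniform sampling, prioritizing subpopulations with higher baseline disease risk improves detection, and proves that the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their baseline risks. The paper also analyzes the non-monotonic behavior of the power curve of the binomial test as a function of sample size, and illustrates the theory with a case study of COVID-19 cases in the Netherlands.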

Comments Contains slightly more detailed exposition than journal version

Details
Journal ref
Statistics and Probability Letters 2025, Vol. 222, 110384
English abstract

We consider outbreak detection settings of endemic diseases where the population under study consists of various subpopulations available for stratified surveillance. These subpopulations can for example be based on age cohorts, but may also correspond to other subgroups of the population under study such as international travellers. Rather than sampling uniformly across the population, one may elevate the effectiveness of the detection methodology by optimally choosing a sampling subpopulation. We show (under some assumptions) the relative sampling efficiency between two subpopulations is inversely proportional to the ratio of their respective baseline disease risks. This implies one can increase sampling efficiency by sampling from the subpopulation with higher baseline disease risk. Our results require careful treatment of the power curves of exact binomial tests as a function of their sample size, which are non-monotonic due to the underlying discreteness. A case study of COVID-19 cases in the Netherlands illustrates our theoretical findings.
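
The non-monotonic power curve mentioned in the abstract can be seen in a few lines. This is a generic sketch of a one-sided exact binomial test, not the paper's setup: the baseline risk `p0 = 0.1`, outbreak risk `p1 = 0.2`, and level `alpha = 0.05` are assumed values chosen only to make the effect visible.

```python
from math import comb

def upper_tail(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def critical_value(n, p0, alpha):
    """Smallest c with P(X >= c | p0) <= alpha; c = n + 1 means 'never reject'."""
    for c in range(n + 2):
        if upper_tail(n, p0, c) <= alpha:
            return c
    return n + 1

def power(n, p0, p1, alpha=0.05):
    """Probability of rejecting H0: p = p0 when the true risk is p1 > p0."""
    return upper_tail(n, p1, critical_value(n, p0, alpha))

# Assumed risks for illustration only.
p0, p1 = 0.1, 0.2
for n in range(1, 11):
    print(n, critical_value(n, p0, 0.05), round(power(n, p0, p1), 4))
```

Because the critical value can only move in integer steps, adding one observation sometimes forces a stricter threshold and the power drops; with these assumed values that happens when going from n = 3 to n = 4.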

2308.02480 2026-05-11 math.ST stat.TH

Statistical Inference for Linear Functions of Eigenvectors with Small Eigengaps

Joshua Agterberg

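AI summary This paper studies statistical inference for linear functions of eigenvectors when eigengaps are small. The author establishes approximate Gaussianity of debiased linear forms and, on that basis, constructs approximately valid confidence intervals whose widths are minimax optimal up to constant factors. The method requires no sample splitting, can be computed directly from data, and applies to the matrix denoising and spiked principal component analysis models.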

Details
English abstract

Spectral methods have myriad applications in high-dimensional statistics and data science, and while previous works have primarily focused on $\ell_2$ or $\ell_{2,\infty}$ eigenvector and singular vector perturbation theory, in many settings these analyses fall short of providing the fine-grained guarantees required for various inferential tasks. In this paper we study statistical inference for linear functions of eigenvectors and principal components with a particular emphasis on the setting where gaps between eigenvalues may be extremely small relative to the corresponding spiked eigenvalue, a regime which has been oft-neglected in the literature. First, we prove the approximate Gaussianity for debiased linear forms in the matrix denoising model and the spiked principal component analysis model, both under Gaussian noise. Based on this limiting behavior, we propose estimators for the appropriate bias and variance quantities resulting in approximately valid confidence intervals. We then investigate the optimality of these confidence intervals and show that their widths are minimax optimal up to constant factors. Of note, our proposed confidence intervals can be computed directly from data without the need for any sample-splitting.

2303.04754 2026-05-11 stat.ME stat.CO

Estimation of Long-Range Dependent Models with Missing Data: to Impute or not to Impute?

Guilherme Pumi, Gladys Choque Ulloa, Taiane Schaedler Prass

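AI summary This paper studies how to estimate the long-range dependence parameter $d$ in long-memory ARFIMA$(p,d,q)$ time series models when data are missing. It compares two main approaches: imputing the missing data and then estimating, or estimating directly with methods tailored to missing data. Through extensive Monte Carlo simulations, the authors systematically compare 35 setups across different missing-data percentages and levels of dependence, providing guidance for practice.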

Details
English abstract

Among the most important models for long-range dependent time series is the class of ARFIMA$(p,d,q)$ (Autoregressive Fractionally Integrated Moving Average) models. Estimating the long-range dependence parameter $d$ in ARFIMA models is a well-studied problem, but the literature regarding the estimation of $d$ in the presence of missing data is very sparse. There are two basic approaches to dealing with the problem: missing data can be imputed using some plausible method, and then the estimation can proceed as if no data were missing, or we can use a specially tailored methodology to estimate $d$ in the presence of missing data. In this work, we review some of the methods available for both approaches and compare them through a Monte Carlo simulation study. We present a comparison among 35 different setups to estimate $d$, under tens of different scenarios, considering percentages of missing data ranging from as few as 10\% up to 70\% and several levels of dependence.
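
For readers unfamiliar with the role of $d$, the "fractionally integrated" part of ARFIMA is the filter $(1-B)^d = \sum_{k\ge 0} \pi_k B^k$, whose coefficients follow the recursion $\pi_0 = 1$, $\pi_k = \pi_{k-1}(k-1-d)/k$. The sketch below only computes these weights; it is not one of the 35 estimation setups compared in the paper, and the function name is an assumption.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of the fractional differencing filter (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

# d = 1 reduces to ordinary first differencing: weights 1, -1, 0, 0, ...
print(frac_diff_weights(1.0, 4))
# For 0 < d < 0.5 the weights decay slowly, which is the source of long memory.
print(frac_diff_weights(0.3, 6))
```

The slow, hyperbolic decay of the weights for fractional $d$ means each observation depends on a long stretch of the past, which is one intuition for why gaps in the series complicate estimation.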