arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.13840 2026-05-14 stat.ML cs.DS cs.LG math.ST stat.CO stat.TH

What is Learnable in Valiant's Theory of the Learnable?

Steve Hanneke, Anay Mehrotra, Grigoris Velegkas, Manolis Zampetakis

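AI summary: This paper revisits the learnability model that Valiant introduced in 1984 and asks which concept classes are learnable in it. For finite domains, including the Boolean hypercube, a class is learnable if and only if every realizable positive sample can be certified by a poly-size adaptive query-compression scheme. The result shows that learnability in Valiant's model lies strictly between PAC learnability and learnability in the query-free variant. The paper also gives the first algorithm for learning $d$-dimensional halfspaces in this model, showing that membership queries change the set of learnable classes.

Comments
Abstract shortened for arXiv
Abstract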

Valiant's 1984 paper is widely credited with introducing the PAC learning model, but it, in fact, introduced a different model: unlike PAC learning, the learner receives only positives, may issue membership queries, and must output a hypothesis with no false positives. Prior work characterized variants, including the case without queries. We revisit Valiant's original model and ask: *Which classes are learnable in it?* For every finite domain, including Valiant's Boolean-hypercube setting, we show that a class is learnable if and only if every realizable positive sample can be certified by a poly-size adaptive query-compression scheme. This is a new variant of sample compression where the learner certifies samples via a short interaction with the membership oracle. Our characterization shows that learnability in Valiant's model is strictly sandwiched between learnability in the PAC model and the variant of Valiant's model without membership queries. This is one of the rare cases where introducing membership queries changes the set of learnable classes, and not just the sample or computational complexity. Next, we study the natural extension of the model to arbitrary domains. While we do not obtain an exact characterization, our techniques readily generalize and show that the same strict sandwiching persists. Finally, we show that $d$-dimensional halfspaces, which are not learnable without queries, are learnable with queries: we give a $\mathrm{poly}(d) \tilde{O}(1/ε)$ sample and $\mathrm{poly}(d) \mathrm{polylog}(1/ε)$ query algorithm, and prove that at least $Ω(d)$ samples or queries are necessary. To our knowledge, this is the first algorithm for halfspaces in Valiant's model. Together, these results uncover a surprisingly rich theory behind Valiant's original notion of learnability and introduce ideas that may be of independent interest in learning theory.

2605.13742 2026-05-14 stat.ME stat.AP

Macroscopic Activity-Based Modeling of Urban Active Mobility

Romain Azaïs, Adrien Marion, Florian Patout

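AI summary: This paper proposes a macroscopic, activity-based model of urban active mobility built on nonintrusive sensor data. The model introduces attendance functions to describe spatio-temporal travel patterns between activities and treats the disaggregation of aggregated counts as a statistical inference problem: counts are modeled as Poisson variables, and the unknown subpopulation sizes are inferred by maximum likelihood. The approach is grounded in a microscopic stochastic model, is scalable and privacy-preserving, and provides a tool for analyzing urban soft mobility dynamics.

Comments
29 pages
Abstract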

This paper develops a macroscopic, activity-based model of urban active mobility using nonintrusive sensor data. It introduces attendance functions to describe spatio-temporal travel patterns between activities and formulates the disaggregation of aggregated counts as a statistical inference problem. Counts are modeled as Poisson variables, and unknown subpopulation sizes are estimated via maximum likelihood, with theoretical guarantees and an efficient EM algorithm for computation. Grounded in a microscopic stochastic model, the framework offers a scalable and privacy-preserving approach to analyzing urban soft mobility dynamics.
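
As a rough illustration of estimating subpopulation sizes from Poisson counts with EM, the sketch below uses the textbook EM update for a Poisson model with a linear mean. It is not taken from the paper: the model `counts[t] ~ Poisson(sum_k sizes[k] * attendance[k][t])`, the function name `em_subpopulation_sizes`, and the toy numbers are all assumptions made for this example, and the authors' algorithm may differ.

```python
def em_subpopulation_sizes(counts, attendance, n_iter=500):
    """EM for the assumed model counts[t] ~ Poisson(sum_k sizes[k] * attendance[k][t])."""
    n_groups = len(attendance)
    n_times = len(counts)
    sizes = [1.0] * n_groups  # any strictly positive starting point
    for _ in range(n_iter):
        # Expected count at each time under the current size estimates.
        mean = [
            sum(sizes[k] * attendance[k][t] for k in range(n_groups))
            for t in range(n_times)
        ]
        # E-step: split each observed count among groups in proportion to
        # their contribution to the mean. M-step: divide by total attendance.
        sizes = [
            sizes[k]
            * sum(counts[t] * attendance[k][t] / mean[t] for t in range(n_times))
            / sum(attendance[k])
            for k in range(n_groups)
        ]
    return sizes


# Hypothetical attendance functions for two subpopulations over four time slots.
attendance = [
    [0.8, 0.5, 0.1, 0.0],
    [0.0, 0.2, 0.6, 0.9],
]
# Noise-free counts generated from sizes (100, 50), used only as a sanity check.
counts = [80.0, 60.0, 40.0, 45.0]
estimated_sizes = em_subpopulation_sizes(counts, attendance)
```

With noise-free counts the likelihood is maximized at the generating sizes, so the estimate should approach (100, 50); with real Poisson noise it would only be close.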

2605.13717 2026-05-14 cs.LG stat.ML

Tight Sample Complexity Bounds for Entropic Best Policy Identification

Amer Essakine, Claire Vernade

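AI summary: This paper studies best-policy identification in finite-horizon risk-sensitive reinforcement learning under the entropic risk measure. To close the exponential gap between the existing upper and lower bounds on sample complexity, the authors propose a forward-model-based algorithm with KL-based exploration bonuses. By exploiting the smoothness of the exponential utility, they sharpen the concentration analysis, remove the extra exponential factor, and obtain a sample complexity that matches the lower bound.

Abstract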

We study best-policy identification for finite-horizon risk-sensitive reinforcement learning under the entropic risk measure. Recent work established a constant gap in the exponential horizon dependence between lower and upper bounds on the number of samples required to identify an approximately optimal policy. Precisely, known lower bounds scale in $Ω(e^{|β| H})$ where $H$ is the horizon of the MDP, while the state-of-the-art upper bound achieves at best $O(e^{2|β| H})$ (arXiv:2506.00286v2) using a generative model. We show that this extra exponential factor can be traced to overly loose concentration control for exponential utilities. To close this open gap, we revisit the analysis of this problem through a forward-model based algorithm building on KL-based exploration bonuses that we adapt to the entropic criterion. The improvement we get is due to two main novel technical innovations. We leverage the smoothness properties of the exponential utility to derive sharper concentration bounds, and we propose a new stopping rule that exploits further this tightness to obtain a sample complexity that matches the lower bound.

2605.13710 2026-05-14 math.ST stat.TH

Pattern-based tests for two-dimensional copulas

L. Baringhaus, R. Grübel

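AI summary: This paper studies tests for two-dimensional copulas based on pattern frequencies. It establishes a functional central limit theorem for pattern frequencies in two-dimensional random samples and uses it to build nonparametric goodness-of-fit tests, two-sample tests, and tests of symmetry. It also considers applications in the parametric setting and supplements the theory with a simulation study.

Journal ref
Bernoulli 31 (2025) 3034-3059
Abstract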

In statistics permutations typically arise in the context of rank plots for two-dimensional data. Such plots can also be interpreted as discrete copulas. In discrete mathematics, typically in the context of the description of large (non-random) objects, two-dimensional copulas appear as limits of permutations and are then known as permutons if the topology refers to the convergence of pattern frequencies. We obtain a functional central limit theorem for such pattern frequencies in the context of two-dimensional random samples. The result serves as the basis for nonparametric goodness-of-fit tests, for two-sample tests, and for tests of symmetry. This includes a suitable variant of the bootstrap for obtaining critical values. Pattern-based procedures are also of interest in a parametric context. We consider two examples, the Farlie-Gumbel-Morgenstern class and a family of delay copulas. We discuss implementation aspects of the resulting procedures and we provide a simulation study that supplements the theoretical results in the nonparametric case.

2605.13689 2026-05-14 stat.ME

Moving beyond spatial and random cross-validation in environmental modelling: a call for prediction-domain adaptive evaluation

Jan Linnenbrink, Jakub Nowosad, Hanna Meyer

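AI summary: As spatial predictive models spread through ecology, how to evaluate the quality of the resulting maps has become a key question. An independent probability sample is regarded as the ideal basis for estimating map accuracy but is often unavailable in practice, so cross-validation is the common substitute. The paper advocates a new category of cross-validation methods, prediction-domain adaptive evaluation, which adapts to the prediction situation to give more reliable estimates of map accuracy, and it supports the argument with a simulation study.

Abstract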

With the growing application of spatial predictive modeling in ecology, the question of how to appropriately evaluate the resulting maps has gained increasing attention. While there is consensus that map accuracy is ideally estimated using an independent probability sample of the prediction area, there is still no agreement on the most appropriate way to conduct an evaluation for the common case when such a sample is not available. Cross-validation, which involves multiple train-test splits, is commonly applied not only to estimate final model accuracy but also to guide model tuning and selection. Many different spatial and non-spatial approaches to cross-validation have been proposed, and approaches in both groups have faced substantial criticism. It has been shown that random cross-validation methods are suitable when the training points are randomly distributed in the prediction area, while spatial cross-validation is better suited towards extrapolation situations. In practice, however, there is a continuum and most cases are between those two extremes. To address this gap, we advocate for a new category of cross-validation methods to account for this: prediction-domain adaptive evaluation. Methods in this category flexibly adapt to the prediction situation, yielding most reliable estimates of map accuracy across different scenarios. To ground this perspective empirically, we reproduce a simulation study that was used in earlier research and systematically compare different evaluation methods and discuss their purpose.

2605.13687 2026-05-14 cs.LG cs.AI stat.ML

A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning

Jason Gaitonde, Frederic Koehler, Elchanan Mossel, Joonhyung Shin, Allan Sly

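AI summary: This paper introduces a family of synthetic languages with hierarchical structure, generated by a broadcast process on trees, for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. The analysis replaces transformers of context length $k$ with an exact $k$-gram ansatz, which is then validated empirically. For the languages studied, generation with insufficient context deviates from the true language distribution, whereas a reasoning model needs only logarithmic working memory to sample exactly from the true language, an exponential improvement.

Abstract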

We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. At the heart of our analytic approach is an \emph{exact $k$-gram ansatz} in place of transformers with context length $k$, a substitution we then validate empirically. Using this ansatz we derive explicit asymptotic predictions for distributional statistics of the sequences produced by a trained model, instantiated in two settings. For the \emph{Ising broadcast process} (a soft-constrained language), we prove that the variance of the generated sum scales log-linearly in the context depth and its kurtosis converges to that of a Gaussian -- both deviating from the true language for any sublinear context. For the \emph{coloring broadcast process} (a hard-constrained language) in the freezing regime, bounded-context autoregression produces sequences that, with high probability, are inconsistent with \emph{any} valid coloring of the underlying tree. Together these results imply an $Ω(n)$ lower bound on the context length required to faithfully sample length-$n$ sequences. In contrast, we prove that an autoregressive \emph{reasoning} model with only $Θ(\log n)$ working memory can sample exactly from the true language -- an exponential improvement. We confirm both the lower-bound predictions and the reasoning-based upper bound empirically with transformers trained on the synthetic language; the trained models track our asymptotic predictions quantitatively across a wide range of context sizes.

2605.13681 2026-05-14 cs.LG stat.ML

Sampling from Flow Language Models via Marginal-Conditioned Bridges

Iskander Azangulov, Leo Zhang

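AI summary: This paper studies how to sample from Flow Language Models (FLMs) and proposes a sampler based on marginal-conditioned bridges. Unlike standard samplers, at each reverse step it draws a clean one-hot endpoint from the FLM's posterior marginals and then generates the continuous state through an analytic Ornstein-Uhlenbeck bridge, which better preserves the structure of the language model. The method needs no additional training, supports decoding controls such as temperature scaling and nucleus truncation, and in experiments achieves a better quality-diversity tradeoff.

Abstract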

Flow Language Models (FLMs) are a recently introduced class of language models which adapt continuous flow matching for one-hot encoded token sequences. Their denoisers have a special structure absent from generic continuous diffusion models: each block of the denoising mean is a posterior marginal distribution over the clean token at that position. Standard DDPM-style samplers collapse these marginals to a single conditional-mean endpoint and bridge toward this simplex-valued point, which is generally not a valid one-hot sequence. We argue that the natural sampler for an FLM is instead posterior-predictive. At each reverse step, we sample a clean one-hot endpoint from the factorized posterior defined by the FLM token marginals, and then sample the next continuous state from the analytic Ornstein--Uhlenbeck bridge conditioned on that endpoint. The method is training-free, uses the same model evaluations as standard sampling, and gives a principled interface for token-level decoding controls such as temperature scaling and nucleus truncation. We show that, under exact posterior marginals, the endpoint approximation error is exactly the conditional multi-information among token positions. The induced one-step bridge kernel preserves all token-wise posterior-predictive marginals and loses only the residual cross-position dependence. Finally, we prove a Girsanov path-space comparison showing that the marginal-conditioned bridge has a no-larger denoising-error term than the frozen conditional-mean bridge, with strict improvement whenever intermediate coordinate-wise bridge observations reveal additional information about the clean token. Experiments with FLMs show that the sampler improves the quality--diversity tradeoff. Code is available at: github.com/imbirik/mcb.
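
The endpoint-sampling step with token-level decoding controls can be sketched in isolation. The code below is an illustration under stated assumptions, not the authors' implementation: the helper names (`sharpen`, `nucleus`, `sample_endpoint`) and the toy marginals are hypothetical, the marginals are taken as given rather than produced by a model, and the Ornstein-Uhlenbeck bridge step that would follow is omitted.

```python
import random


def sharpen(marginal, temperature):
    """Temperature-scale one categorical distribution: p_i ** (1 / T), renormalized."""
    powered = [p ** (1.0 / temperature) for p in marginal]
    total = sum(powered)
    return [p / total for p in powered]


def nucleus(marginal, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(marginal)), key=lambda i: -marginal[i])
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += marginal[i]
        if mass >= top_p:
            break
    truncated = [p if i in kept else 0.0 for i, p in enumerate(marginal)]
    total = sum(truncated)
    return [p / total for p in truncated]


def sample_endpoint(marginals, temperature=1.0, top_p=1.0, rng=None):
    """Draw one token per position independently and return one-hot rows."""
    rng = rng or random.Random(0)
    endpoint = []
    for marginal in marginals:
        probs = nucleus(sharpen(marginal, temperature), top_p)
        token = rng.choices(range(len(probs)), weights=probs)[0]
        endpoint.append([1 if j == token else 0 for j in range(len(probs))])
    return endpoint


# Hypothetical posterior marginals for a two-token sequence over a 3-word vocabulary.
toy_marginals = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
greedy_like = sample_endpoint(toy_marginals, temperature=0.5, top_p=0.5)
```

Because each position is sampled independently from its own marginal, the result is always a valid one-hot sequence; with `top_p=0.5` only the most likely token survives truncation in this toy case, so the draw is deterministic.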

2605.13660 2026-05-14 stat.AP

Improving ecological inference and uncertainty quantification from camera trap data through the fusion of AI confidences and manual annotations

Adira Cohen, Erin M. Schliep, Roland Kays, Mohammad Alyetama, Matthew Snider

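AI summary: This study aims to improve ecological inference and uncertainty quantification from camera trap data by fusing AI prediction confidences with manual annotations. The authors propose a new Bayesian hierarchical data-fusion model that combines the strengths of human annotations and AI predictions, improving inference and predictive power, as demonstrated in a simulation study. Applied to the body condition of white-tailed deer in North Carolina, the method relates deer health to habitat and breeding status and yields ecological conclusions that a traditional approach does not.

Comments
38 pages, 8 figures
Abstract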

Camera traps have become a core tool in ecological research, enabling large-scale, noninvasive monitoring of wildlife populations and behavior. By automatically recording animals as they pass within view, these devices generate massive image datasets with minimal field effort. Yet this data richness introduces a new bottleneck when translating the images into usable information due to the time and effort required for human annotation. Recently, artificial intelligence (AI) has been integrated into the workflow to improve this efficiency. However, the data procured from AI approaches are of a different nature, necessitating new statistical methods in order to obtain inference, make predictions, and quantify uncertainty. We propose a new Bayesian hierarchical data-fusion model which combines the strengths of human annotations and AI predictions. The benefits of our approach are an ability to provide uncertainty quantification as well as improved inference and prediction power, which we demonstrate using a simulation study. We apply our model to an AI analysis of the body condition of white-tailed deer (Odocoileus virginianus) from camera trap images from North Carolina to study the relationship between health and their environment. We find that bucks in rut have higher body condition than other deer and that green, open habitats are correlated with high body condition. Our new model derived novel ecological inference compared to a traditional approach using the same data.

2605.13650 2026-05-14 math.ST stat.TH

Weighted and Truncated Tail Index Estimation under Random Censoring: A Unified Full-Range Framework

Abdelhakim Necir, Nour Elhouda Guesmia, Djamel Meraghni

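AI summary: This paper studies estimation of the extreme value index under right censoring. It introduces a weighted and truncated Nelson-Aalen tail empirical process and builds a class of integral estimators indexed by a tuning parameter larger than one, restoring a tractable asymptotic structure across the whole range from weak to strong censoring. Under standard regular variation conditions it establishes a uniform Gaussian approximation with no restriction on the censoring level. Simulation studies and real data applications indicate improved stability and accuracy under moderate and strong censoring.

Abstract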

Estimation of the extreme value index under right censoring is a fundamental problem in extreme value theory, with important applications in finance, insurance, and reliability. Classical integral estimators for Pareto-type tails typically require that the asymptotic proportion of uncensored observations in the tail is larger than one half, corresponding to the weak censoring regime. This restriction excludes many practically relevant situations involving strong censoring, where the proportion of uncensored observations is smaller than or equal to one half, and reflects the absence of a uniformly valid Gaussian approximation for the associated tail empirical process. To overcome this limitation, we introduce a weighted and truncated Nelson--Aalen tail empirical process and construct a class of integral estimators indexed by a tuning parameter larger than one. This approach restores a tractable asymptotic structure over the entire censoring range, from very weak to very strong censoring. Under standard regular variation conditions, we establish a uniform Gaussian approximation and derive consistency and asymptotic normality without imposing restrictions on the censoring level. A key ingredient of the analysis is a linearization of the estimator as a functional of the underlying process. Simulation studies and real data applications demonstrate improved stability and accuracy, particularly under moderate and strong censoring. In particular, the analysis of insurance loss data, representing weak censoring, and Australian AIDS survival data, representing strong censoring, illustrates the practical relevance of the proposed methodology across contrasting censoring regimes.

2605.13642 2026-05-14 stat.ML cs.LG stat.CO

Conformal Anomaly Detection in Python: Moving Beyond Heuristic Thresholds with 'nonconform'

Oliver Hennhöfer, Maximilian Kirsch, Christine Preisach

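AI summary: This paper presents 'nonconform', a Python package for conformal anomaly detection within machine-learning workflows, addressing the reliance of conventional methods on heuristic thresholds. Under the statistical assumption of exchangeability, the package converts anomaly scores into statistically valid p-values, supports several conformalization strategies, and works with a range of anomaly detectors. Combining code examples with theory, the paper shows how to apply conformal anomaly detection in practice and reports empirical results on its statistical validity.

Comments
20 pages, 4 figures
Abstract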

Most anomaly detection systems output scores rather than calibrated decisions, leaving practitioners to choose thresholds heuristically and without clear statistical interpretation. Conformal anomaly detection addresses this limitation by converting anomaly scores into calibrated p-values that are valid under the statistical assumption of data exchangeability, with a growing literature extending this idea beyond that setting. We present 'nonconform', a Python package for applying conformal anomaly detection within existing machine-learning workflows, and use it as the basis for an implementation-grounded introduction to the field. The package integrates with 'scikit-learn', 'pyod', and custom anomaly detectors, and provides a unified interface for calibration, p-value generation, and false discovery rate control. It supports several conformalization strategies, ranging from simple split-conformal calibration to more data-efficient and shift-aware extensions. Through a progression from foundational concepts to advanced conformalization strategies, complemented by code examples, the paper connects the statistical ideas behind conformal anomaly detection to their practical use in 'nonconform'. Empirical results demonstrate that the implemented methods enable statistically principled anomaly detection. Together, the package and exposition aim to make core conformal anomaly detection workflows more accessible and reproducible in experimental and production-oriented settings.
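
The core idea of split-conformal calibration followed by false discovery rate control can be sketched in plain Python. This sketch deliberately does not use the 'nonconform' API, whose function names are not given here; `conformal_p_values` and `benjamini_hochberg` are hypothetical helper names, the scores are made-up numbers, and larger scores are assumed to mean "more anomalous".

```python
def conformal_p_values(calib_scores, test_scores):
    """Split-conformal p-values from held-out scores of normal data."""
    n = len(calib_scores)
    return [
        (1 + sum(1 for c in calib_scores if c >= s)) / (n + 1)
        for s in test_scores
    ]


def benjamini_hochberg(p_values, alpha):
    """Return the indices flagged as anomalies at target FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_rejected = 0
    for rank, i in enumerate(order, start=1):
        # Remember the largest rank whose p-value clears its threshold.
        if p_values[i] <= alpha * rank / m:
            n_rejected = rank
    return sorted(order[:n_rejected])


# 19 calibration scores from normal data (0.1, 0.2, ..., 1.9) and 4 test scores.
calib = [0.1 * j for j in range(1, 20)]
test_scores = [0.55, 2.5, 1.05, 3.0]

p_values = conformal_p_values(calib, test_scores)
flagged = benjamini_hochberg(p_values, alpha=0.2)
```

Under exchangeability of calibration and normal test points, each p-value is valid without any assumption on the detector that produced the scores, which is what replaces the heuristic threshold. Here the two test scores above every calibration score receive the smallest attainable p-value, 1/20, and are the ones flagged.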

2605.13639 2026-05-14 cs.LG math.OC stat.ML

Achieving $ε^{-2}$ Sample Complexity for Single-Loop Actor-Critic under Minimal Assumptions

Ishaq Hamza, Zaiwei Chen

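AI summary: This paper studies the sample complexity of off-policy actor-critic methods in reinforcement learning under a single-loop implementation. Assuming only the existence of a policy that induces an irreducible Markov chain, it proves the first $\tilde{\mathcal{O}}(ε^{-2})$ sample complexity guarantee for finding an $ε$-optimal policy in a single-loop, single-timescale framework. In contrast to earlier work that needs nested loops or strong algorithm-dependent assumptions, the analysis builds a coupled Lyapunov drift framework to handle the challenges of single-loop updates and off-policy learning: it establishes a geometric convergence rate for the actor and an $\tilde{\mathcal{O}}(1/T)$ rate for the critic, and combines the two through a cross-domination property.

Abstract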

In this paper, we establish last-iterate convergence rates for off-policy actor--critic methods in reinforcement learning. In particular, under a single-loop, single-timescale implementation and a broad class of policy updates, including approximate policy iteration and natural policy gradient methods, we prove the first $\tilde{\mathcal{O}}(ε^{-2})$ sample complexity guarantee for finding an $ε$-optimal policy under minimal assumptions, namely, the existence of a policy that induces an irreducible Markov chain. This stands in stark contrast to the existing literature, where an $\tilde{\mathcal{O}}(ε^{-2})$ sample complexity is achieved only through nested-loop updates and/or under strong, algorithm-dependent assumptions on the policies, such as uniform mixing and uniform exploration. Technically, to address the challenges posed by the coupled update equations arising from the single-loop implementation, as well as the potentially unbounded iterates induced by off-policy learning, our analysis is based on a coupled Lyapunov drift framework. Specifically, we establish a geometric convergence rate for the actor and an $\tilde{\mathcal{O}}(1/T)$ convergence rate for the critic, and combine the two Lyapunov drift inequalities through a cross-domination property. We believe this analytical framework is of independent interest and may be applicable to other coupled iterative algorithms with unbounded

2605.13612 2026-05-14 cs.LG cond-mat.dis-nn stat.ML

Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

Yatin Dandi, Matteo Vilucchio, Luca Arnaboldi, Hugo Tabanelli, Florent Krzakala

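AI summary: This paper proposes Neural Low-Degree Filtering (Neural LoFi), a theoretical framework for explaining how deep neural networks extract useful representations from data through hierarchical feature learning. It reduces gradient-based training to an explicit iterative spectral procedure in which each layer builds features by selecting the directions with maximal low-degree correlation to the label. The theory gives a mathematical account of how features evolve in deep learning, and experiments on fully connected and convolutional networks show improvements over lazy random-feature baselines and recovery of structured filters.

Comments
62 pages, many figures, companion codes in https://github.com/IdePHICS/Neural-LoFi-Theory
Abstract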

Understanding how deep neural networks learn useful internal representations from data remains a central open problem in the theory of deep learning. We introduce Neural Low-Degree Filtering (Neural LoFi), a stylized limit of gradient-based training in which hierarchical feature learning becomes an explicit iterative spectral procedure. In this limit, the dynamics at each layer decouple: given the current representation, the next layer selects directions with maximal accessible low-degree correlation to the label. This yields a tractable surrogate mechanism for deep learning, together with a natural kernel-space interpretation. Neural LoFi provides a mathematically explicit framework for studying multi-layer feature learning beyond the lazy regime. It predicts how representations are selected layer by layer, explains how emergence of concepts arises with given sample complexity, and gives a concrete mechanism by which depth progressively constructs new features from old ones through low-degree compositionality. We complement the theory with mechanistic experiments on fully connected and convolutional architectures, showing that Neural LoFi improves over lazy random-feature baselines, recovers meaningful structured filters, and predicts representations aligned with early gradient-descent feature discovery with real datasets.

2605.13607 2026-05-14 stat.CO cs.CE cs.MS

Ergodicity Library: A Python Toolkit for Stochastic-Process Simulation, Time-Average Diagnostics, and Agent-Based Experiments

Ihor Kendiukhov

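AI summary: Ergodicity Library is an open-source Python toolkit for stochastic-process simulation, time-average analysis, and agent-based experiments, with particular emphasis on non-ergodicity, heavy-tailed processes, and decision making under uncertainty. It integrates three layers (process definitions and simulators, analysis and fitting tools, and agent-based experimentation), which streamlines the path from model specification to diagnostics. The article describes the software architecture, the supported process families, the analysis workflow, and the scope of the implementation, and provides several reproducible examples.

Abstract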

ergodicity is an open-source Python library for computational work on stochastic dynamics, with particular emphasis on non-ergodicity, time-average behavior, heavy-tailed processes, and decision making under uncertainty. The package brings together three layers that are often split across ad hoc scripts: process definitions and simulators, analysis and fitting tools, and agent-based experimentation. This article documents the implemented software rather than presenting new stochastic theory. We describe the package architecture, the supported process families, the analysis workflow, and the practical boundaries of the current implementation. We also provide fully reproducible examples covering heavy-tailed ensemble spread, multiplicative Levy growth diagnostics, adaptive memory mean reversion, preasymptotic fluctuation analysis, and partial stochastic differential equation simulation. The package is positioned as an integration layer on top of the scientific Python stack, reducing the amount of glue code required to move from process specification to diagnostics and comparative experiments.

2605.13589 2026-05-14 stat.ML cs.LG

Causal Learning with the Invariance Principle

Francesco Montagna, Francesco Locatello

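AI summary: This paper studies causal discovery, the problem of inferring the direction of causality between variables. Using structural causal models (SCMs), the authors show that, assuming the causal relations are acyclic and invariant across environments, only two auxiliary environments suffice to infer the causal graph for arbitrary nonlinear mechanisms. The result guarantees identifiability of the causal graph and, further, correct counterfactual inference; the theory is supported by experiments on synthetic data.

Abstract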

Causal discovery, the problem of inferring the direction of causality, is generally ill-posed. We use the language of structural causal models (SCM) to show that assuming that the causal relations are acyclic and invariant across multiple environments (e.g., the way minimum wage affects employment rate is stable across different geographical regions), \textit{only} two auxiliary environments are sufficient to infer the causal graph for arbitrary nonlinear mechanisms. Moreover, we demonstrate that this implies identifiability of the SCM functional mechanisms: as a corollary, we show that \textit{two} auxiliary environments are sufficient to guarantee correct counterfactual inference. We empirically support our theoretical results on synthetic data.

2605.13550 2026-05-14 stat.ME

Causal Discovery via Statistical Power (CDSP)

Shreya Prakash, Fan Xia, Elena A. Erosheva

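AI summary: This paper proposes CDSP, a causal discovery method that links causal direction estimation to statistical power, giving a statistical inference framework with uncertainty quantification. Working with bivariate observational data, it introduces an effect-size asymmetry assumption for deciding whether the data contain enough information to favor one causal direction. Simulations show that CDSP is robust to mild and moderate model misspecification, and on real benchmark data it reduces the false discovery rate by approximately 18% relative to a commonly used existing method.

Abstract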

Causal discovery methods aim to infer causal direction from observational data. Functional causal discovery approaches use structural asymmetries to identify causal directionality but rely on strong modeling assumptions and provide limited tools for uncertainty quantification. We introduce Causal Discovery via Statistical Power (CDSP), a statistical inference framework that connects causal direction estimation with statistical power and enables uncertainty quantification. Considering the foundational setting of bivariate observational data, we show how quantities analogous to statistical power and effect size can be used in causal discovery to determine when data contain sufficient information to favor one direction over the other. We introduce the effect-size asymmetry assumption that characterizes when the probability of correctly detecting the causal direction (i.e., the power of causal discovery) exceeds that of incorrectly favoring the reverse direction. We show that the effect-size asymmetry assumption can be used for causal direction estimation with uncertainty quantification. Simulations show that CDSP direction estimation is robust to mild and moderate model misspecifications. Real data analyses on 100 cause-effect benchmark pairs further demonstrate that CDSP reduces false discovery rates by approximately 18% relative to a commonly used existing method.

2605.13504 2026-05-14 stat.ME math.AP math.DS math.PR q-bio.QM

Structural identifiability of partially-observed stochastic processes: from single-particle trajectories to total particle density data

Arianna Ceccarelli, Alexander P. Browning, Ruth E. Baker

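AI summary: This paper studies structural identifiability of partially observed stochastic processes, asking whether parameters can be uniquely determined from single-particle trajectory data and from total particle density data. The authors develop a methodology for spatio-temporal stochastic processes: trajectory data are analyzed with the individual-based model description, while density data are analyzed with a partial differential equation model and a differential algebra approach. They also introduce a characteristic-equation-based method for analyzing the initial condition, show that initial conditions strongly affect identifiability, and illustrate on an example how the method finds identifiable parameter combinations.

Comments
Main: 26 pages, 4 figures Supplementary Information: 20 pages, 5 figures
Abstract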

The increasing availability of experimental data has intensified interest in calibrating stochastic models, raising fundamental questions about parameter identifiability. Structural identifiability determines whether parameters can be uniquely recovered from idealised, noise-free data, a prerequisite to allow for parameter estimation. However, existing methods to assess structural identifiability are not generally applicable to stochastic processes. We develop a methodology to analyse structural identifiability for a class of spatio-temporal stochastic processes. We investigate how identifiability depends on the type of available data, distinguishing between single-particle trajectories and total particle density measurements. For trajectory data, we use the individual-based model description that explicitly represents single-particle dynamics. For population-level data, we derive a partial differential equation model representation, that describes the evolution of total particle density, and apply a differential algebra approach, common to ordinary differential equations analysis. We further introduce a novel method to study the initial condition, based on characteristic equations to construct a Taylor expansion of the density evolution, enabling identification of additional identifiable parameter combinations. We apply our methodology to a model, and show it is identifiable with trajectory data but only locally identifiable with density data, and demonstrate the critical role of initial conditions in the identifiability analysis.

2605.13484 2026-05-14 cs.LG cs.AI stat.ME

Discovery of Hidden Miscalibration Regimes

Katarzyna Kobalczyk, Mihaela van der Schaar

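AI summary: This paper studies how model calibration varies across inputs and notes that conventional evaluation, which assesses calibration from confidence alone, can hide localised calibration failures. The authors propose a method for discovering hidden miscalibration regimes without predefined data slices: it learns a calibration-aware representation of the input space and estimates local miscalibration by kernel smoothing. Experiments show that the method reveals input-dependent calibration heterogeneity in large language models and reduces calibration error in systematically miscalibrated regions.

Abstract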

Calibration is commonly evaluated by comparing model confidence with its empirical correctness, implicitly treating reliability as a function of the confidence score alone. However, this view can hide substantial structure: models may be systematically overconfident on some kinds of inputs and underconfident on others, causing global reliability diagnostics to obscure localised calibration failures. To address this, we formulate the problem of discovering hidden miscalibration regimes without assuming access to predefined data slices. We define the corresponding miscalibration field and propose a diagnostic framework for estimating it. Our approach learns a calibration-aware representation of the input space and estimates signed local miscalibration by kernel smoothing in the learned geometry. Across four real-world LLM benchmarks and twelve LLMs, we find that input-dependent calibration heterogeneity is prevalent. We further show that the discovered fields are actionable: they support local confidence correction and reduce calibration error in systematically miscalibrated regions where confidence-based methods such as isotonic regression and temperature scaling are less effective.

2605.13448 2026-05-14 stat.ML cs.LG math.PR

On the Limits of Latent Reuse in Diffusion Models

Yifeng Yu, Lu Yu

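AI summary: This paper studies how reliable latent-space reuse is for diffusion models under distribution shift. For source and target data that are both approximately low-dimensional but may lie near different subspaces, reusing the source latent space induces a target-domain score error governed by two factors: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. The authors then study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions, giving theoretical guidance on when latent reuse is reliable.

Abstract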

Diffusion models are often trained in low-dimensional latent spaces, which are then reused for related but shifted datasets. In this work, we study when such latent reuse remains reliable under distribution shift. We consider a source-target setting in which both datasets are approximately low-dimensional but may lie near different subspaces. We show that freezing and reusing a source latent space induces a target-domain score error governed by two quantities: the principal-angle misalignment between the source and target subspaces, and the target ambient noise amplified by the diffusion time scale. Motivated by these limits, we further study mixed source-target training and characterize how the required shared latent dimension depends on the relative geometry of the two distributions. Our results provide theoretical guidance on when latent reuse is reliable and when learning a shared representation may be necessary.

2605.13446 2026-05-14 stat.AP

Scenario generation of intraday electricity price paths for optimal trading in continuous markets

Andrzej Puć, Joanna Janczura

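AI summary: This paper studies how to generate price-path scenarios in continuous intraday electricity markets to support optimal trading decisions. Building on a corrected Support Vector Regression model, the authors propose a comprehensive forecasting framework that extends point predictions to probabilistic trajectory forecasts through scenario generation from forecast errors of fundamental variables and a new Support Vector Sorting procedure. Empirical results show improvements over benchmark methods in both statistical and economic terms, particularly in risk control and trading returns.

Abstract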

Continuous intraday electricity markets play an increasingly important role in short-term trading and balancing, yet decision-making under rapidly evolving price dynamics remains challenging. This paper proposes a comprehensive framework for ensemble forecasting of intraday electricity price trajectories and their translation into adaptive trading decisions. Building on a corrected Support Vector Regression model, the approach extends point predictions to probabilistic trajectory forecasts by introducing scenario generation based on forecast errors of fundamental variables and proposing a novel Support Vector Sorting procedure for the efficient selection of representative scenarios. The framework is evaluated using transaction level data from the German intraday continuous market. Empirical results show improvements over benchmark methods in both statistical and economic terms. Fundamental scenarios enhance median trajectory accuracy but produce more concentrated predictive distributions, while historical simulation with scenario selection better captures tail risk. From an economic perspective, ensemble-based forecasts outperform naive benchmarks across most of the trading strategies. Dynamic updating through scenario reweighting further improves profitability with limited impact on downside risk. Overall, the results demonstrate that combining kernel-based learning with scenario driven uncertainty and adaptive updating provides a flexible and effective approach for forecasting and trading in continuous electricity markets.

2605.13439 2026-05-14 stat.ME

Median Radial Function: A Robust, Covariance-Free Framework and Applications

Elsayed Elamir

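AI summary: This paper proposes a median-radius framework for assessing centrality in multivariate data. It defines a scale-invariant measure of radial dispersion and uses it to build a depth function that is robust to outliers and independent of covariance structure. The depth function requires no moment assumptions, adapts naturally to skewed, multimodal, and heavy-tailed distributions, and is suited to high-dimensional data analysis. The paper also analyzes the subgradient and convexity properties of the function, which reveal directional asymmetry in the data and provide a new radial approach to detecting skewness and structural asymmetry.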

Details
Comments
20 pages, 7 figures
English abstract

A median-radius framework for assessing centrality in multivariate data using median distances is proposed. Based on this framework, a scale-invariant measure of radial dispersion is defined and used to establish a depth function that is robust to outliers and independent of covariance structure. The depth function does not depend on moment assumptions and naturally adapts to skewness, multimodality, and heavy-tailed distributions, which makes it effective for high-dimensional data structures. We demonstrate fundamental characteristics of the underlying functionals, such as their subgradients and convexity. The subgradients provide additional insight and encode the imbalance in directional contributions of the data. This suggests a new approach to detecting skewness and structural asymmetry through a purely radial construction. Empirical studies demonstrate that the method agrees with classical approaches under symmetry while providing a more flexible and informative characterization in complex settings.

2605.13434 2026-05-14 cs.LG cs.DC math.OC stat.ML

Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity

Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

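AI summary: This paper studies asynchronous stochastic gradient descent (ASGD) for distributed learning under data and system heterogeneity. Because vanilla ASGD ignores differences in worker computation speed, its updates are biased toward a frequency-weighted average of the local objectives rather than the global objective. The paper proposes Rescaled ASGD, which scales each worker's stepsize in proportion to its computation time so that every worker contributes the same aggregate learning rate over a cycle, restoring optimization of the correct global objective. Theoretical analysis shows that, in the non-convex setting, the method converges to stationary points of the global objective with a time complexity matching the known lower bound, and experiments confirm that it is effective and competitive with state-of-the-art baselines.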

Details
English abstract

Asynchronous stochastic gradient descent (ASGD) is a standard way to exploit heterogeneous compute resources in distributed learning: instead of forcing fast workers to wait for slow ones, the server updates the model whenever a gradient arrives. Vanilla ASGD applies each arriving gradient with the same weight. When local data distributions are heterogeneous, this becomes problematic: faster workers contribute more updates, and we show theoretically that the method is biased toward a frequency-weighted average of the local objectives rather than the desired global objective. Existing remedies typically move away from the simple ASGD template by introducing gathering phases, buffering, or extra memory. We show that this is unnecessary. Keeping the standard ASGD mechanism, we recover the correct objective by rescaling worker-specific stepsizes in proportion to their computation times, so that each worker contributes the same aggregate learning rate over a cycle. In the non-convex setting, under smoothness and bounded heterogeneity assumptions, we prove that the resulting method, Rescaled ASGD, converges to stationary points of the correct global objective in the fixed-computation model. Its time complexity matches the known lower bound in the leading term, while the effects of staleness and data heterogeneity appear only in lower-order terms. Experiments confirm that the method converges to the correct objective and is competitive with state-of-the-art baselines.
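The bias described in this abstract, and the effect of rescaling, can be seen in a toy simulation. The sketch below is only an illustration under strong assumptions that are not taken from the paper: two workers, one-dimensional quadratic local objectives, exact gradients, and fixed computation times. The function name `run_asgd` and all parameter values are hypothetical.

```python
import heapq

def run_asgd(centers, taus, stepsizes, horizon=2000.0, x0=0.0):
    """Event-driven ASGD on f_i(x) = 0.5 * (x - centers[i])**2 with fixed compute times."""
    x = x0
    # Each event is (arrival_time, worker, gradient at the model the worker last read).
    events = [(taus[i], i, x0 - centers[i]) for i in range(len(centers))]
    heapq.heapify(events)
    while True:
        t, i, grad = heapq.heappop(events)
        if t > horizon:
            return x
        x -= stepsizes[i] * grad  # the server applies the (possibly stale) gradient on arrival
        heapq.heappush(events, (t + taus[i], i, x - centers[i]))  # worker restarts from the new x

centers = [0.0, 10.0]  # local minimizers; the global (uniform-average) minimizer is 5.0
taus = [1.0, 4.0]      # worker 0 is four times faster than worker 1
gamma = 0.01

# Vanilla ASGD: same stepsize for every arriving gradient.
vanilla = run_asgd(centers, taus, [gamma, gamma])
# Rescaled variant (toy version): stepsize proportional to each worker's computation time.
rescaled = run_asgd(centers, taus, [gamma * tau for tau in taus])
print(vanilla, rescaled)
```

In this toy setting the vanilla run settles near 2.0, the minimizer of the frequency-weighted objective with weights proportional to 1/tau, while the rescaled run settles near 5.0, the minimizer of the uniform average. Both values oscillate slightly because the stepsize is constant.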

2605.13421 2026-05-14 stat.ME

Combining pre-trained models via localized model averaging

Ziwen Gao, Baihua He, Yuhong Yang

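AI summary: This paper studies how to combine multiple pre-trained models (PTMs) effectively to improve predictive performance across tasks. The authors propose a covariate-based localized model averaging method that models the weights as functions of the input features, allowing it to adaptively capture the relative advantages of different PTMs in different contexts. The method learns flexible local weights under a general loss framework, establishes asymptotic optimality for both in-sample and out-of-sample risks, and proves consistency of the estimated weights; numerical experiments further support its effectiveness.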

Details
English abstract

Many pre-trained models (PTMs) are available in modern applications. Because different PTMs are often trained on different datasets, their performances can vary substantially for different new tasks, and the ranking of the candidates may depend heavily on the input. Motivated by this, we propose a localized model averaging method with weights modeled as functions of the covariates, making it substantially more versatile than existing model averaging methods. This formulation allows the model averaging procedure to adaptively capture the varying relative advantages of different PTMs across heterogeneous contexts. Specifically, we learn flexible local weights under a general loss framework that accommodates a broad class of prediction tasks. We further establish the asymptotic optimality of the proposed method for both in-sample and out-of-sample risks, as well as the consistency of the estimated weights. Extensive numerical experiments further demonstrate the effectiveness of the proposed method.
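To make the idea of covariate-dependent weights concrete, here is a minimal sketch. It assumes a softmax-of-linear-scores parametrization, which is a choice made for illustration and is not stated in the abstract; the names `local_weights`, `localized_average`, and `theta` are hypothetical, and the two "pre-trained models" are stand-in lambdas.

```python
import math

def local_weights(x, theta):
    """Softmax weights over K models; theta[k] is a hypothetical coefficient vector for model k."""
    scores = [sum(t_j * x_j for t_j, x_j in zip(theta_k, x)) for theta_k in theta]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def localized_average(x, models, theta):
    """Combine model predictions with weights that depend on the covariate vector x."""
    w = local_weights(x, theta)
    return sum(w_k * f(x) for w_k, f in zip(w, models))

# Stand-in "pre-trained models" and weight parameters (illustrative only).
models = [lambda x: 2.0 * x[0], lambda x: -1.0 * x[0]]
theta = [[3.0], [-3.0]]  # model 0 is favoured for x > 0, model 1 for x < 0

print(localized_average([2.0], models, theta))
print(localized_average([-2.0], models, theta))
```

With all entries of `theta` set to zero the weights become constant and the procedure reduces to a plain equal-weight average, which is the sense in which input-dependent weights generalize global model averaging.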

2605.13401 2026-05-14 cs.LG cs.RO stat.ML

Trajectory-Level Data Augmentation for Offline Reinforcement Learning

Tobias Schmähling, Matthias Burkhardt, Tobias Windisch

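AI summary: This paper proposes a trajectory-level data augmentation method for offline reinforcement learning, aimed at training policies from a small number of suboptimal trajectories in tasks such as active positioning. The method exploits task structure and the geometric relationship between rewards, value functions, and logging policies, using trajectory-level augmentation to raise data quality and thereby improve offline reinforcement learning performance. The study provides theoretical justification and validates the method across tasks of varying dimensionality and under partial observability.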

Details
Comments
26 pages, 25 figures, Accepted at ICML 2026
English abstract

We propose a data augmentation method for offline reinforcement learning, motivated by active positioning problems. In particular, our approach enables the training of off-policy models from a limited number of suboptimal trajectories. We introduce a trajectory-based augmentation technique that exploits task structure and the geometric relationship between rewards, value functions, and mathematical properties of logging policies. During data collection, our augmentation supports suboptimal logging policies, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.

2605.13388 2026-05-14 stat.ME stat.AP

Toward a practical handbook for choosing among causal inference methods in non-randomized studies with binary outcomes: A simulation study for applied researchers

Adrián Aurensanz-Crespo, Cristóbal M Rodríguez-Leal, Rosario Susi, Jorge Castillo-Mateo, Jesús Asín, José M Ramírez, Teresa Pérez

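AI summary: This study aims to give applied researchers a practical guide for choosing a suitable causal inference method for estimating treatment effects on binary outcomes in non-randomized studies. Using a large-scale simulation experiment, it compares four common methods (propensity score matching, inverse probability weighting, G-computation, and targeted maximum likelihood estimation) and assesses the guide's practical utility on real data. The work offers a reference for estimating causal effects from real-world data.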

Details
Comments
21 pages, 4 figures. Code available at https://github.com/aaurensanz/code-causal-inference-comparison
English abstract

Applied researchers in biomedicine and related fields are often interested in estimating the causal effect of a treatment or intervention. Although randomized clinical trials are considered the gold standard for establishing causal effects, they are not always feasible, and real-world data may represent the only available source of evidence. In such settings, causal effects must be estimated using statistical methods applied to observational data. Over the last few decades, modern causal inference methods based on the potential outcomes framework have emerged as useful tools in this field. However, many such techniques exist, and their performance depends on factors such as sample size, the proportion of treated patients, the proportion of patients experiencing the outcome, the magnitude of the treatment effect, the target estimand, and potential violations of the fundamental assumptions of causal inference. Given the wide range of available methods, selecting an appropriate approach can be challenging for applied researchers. This study uses a large-scale simulation experiment to address this issue and provide researchers with a guide in the form of a handbook for a binary treatment and a binary outcome. In particular, we test four popular statistical techniques: propensity score matching (full matching), inverse probability weighting, G-computation, and targeted maximum likelihood estimation. The proposed handbook is applied to two real-world datasets to assess its practical utility: one comprising vulnerable patients with mild COVID-19 (n=534 patients and more than 50% treated), and another of patients undergoing colorectal surgery (n=3635 patients and about 20% treated).
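Two of the four techniques named in this abstract can be sketched in a few lines for the simplest possible case. The example below assumes a single discrete confounder, fully nonparametric (per-stratum) estimates, and a made-up eight-row dataset; it targets the average treatment effect as a risk difference and is not the simulation design used in the study. Function names are hypothetical.

```python
from collections import defaultdict

def stratum_counts(data):
    """data: list of (x, a, y) with discrete confounder x, binary treatment a, binary outcome y."""
    n = defaultdict(lambda: [0, 0])  # n[x][a] = number of units in stratum x with treatment a
    s = defaultdict(lambda: [0, 0])  # s[x][a] = number of outcome events among those units
    for x, a, y in data:
        n[x][a] += 1
        s[x][a] += y
    return n, s

def g_computation(data):
    """Standardize the stratum-specific risk differences over the distribution of x."""
    n, s = stratum_counts(data)
    total = len(data)
    return sum((n[x][0] + n[x][1]) / total * (s[x][1] / n[x][1] - s[x][0] / n[x][0]) for x in n)

def ipw(data):
    """Inverse probability weighting with the propensity score estimated per stratum."""
    n, _ = stratum_counts(data)
    est = 0.0
    for x, a, y in data:
        e = n[x][1] / (n[x][0] + n[x][1])  # estimated P(A = 1 | X = x); must lie strictly in (0, 1)
        est += a * y / e - (1 - a) * y / (1 - e)
    return est / len(data)

# Made-up toy data: (confounder, treatment, outcome).
data = [(0, 1, 1), (0, 1, 1), (0, 0, 0), (0, 0, 1),
        (1, 1, 1), (1, 0, 0), (1, 0, 0), (1, 0, 1)]
print(g_computation(data), ipw(data))
```

With saturated per-stratum models the two estimators coincide algebraically, which is why both print the same value here. In realistic settings with parametric outcome or propensity models they can differ, which is the kind of situation a method comparison has to address.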

2605.13386 2026-05-14 cs.LG stat.ML

Support-Conditioned Flow Matching Is Kernel Smoothing

Daniel Matsui Smola

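AI summary: This paper studies how cross-attention-based generative models generate when conditioned on a finite support set. It shows that the velocity field is in essence a Nadaraya-Watson kernel smoother whose bandwidth shrinks as flow time advances, moving from global averaging early on to nearest-neighbor behavior late in generation. The work connects cross-attention to classical kernel methods and identifies three failure regimes; experiments confirm the theoretical predictions and show that IP-Adapter's cross-attention implements approximate kernel smoothing.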

Details
Comments
Submitted to NeurIPS 2026. 18 pages, 10 figures, 1 table. Code at https://github.com/BaroqueObama/kernel-flow-matching-code
English abstract

Generative models are often conditioned on a small set of examples via cross-attention. Under the Gaussian optimal-transport path, we show that the exact velocity field induced by a finite support set is a Nadaraya--Watson kernel smoother whose bandwidth decreases with flow time, from broad averaging at early steps to nearest-neighbor at late steps. A single Gaussian-kernel attention head exactly computes this field, connecting cross-attention conditioning to classical kernel theory. The theory predicts three failure regimes: nearest-neighbor collapse of the kernel at high dimension, mismatch between the isotropic kernel and the data geometry, and insufficient support for nonparametric estimation. Experiments on Gaussian mixtures, spherical shells, and DINOv2 ImageNet features confirm that learned conditioning improves in precisely these regimes, and that IP-Adapter's cross-attention implements approximate NW smoothing in practice.
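The kernel-smoother form of the velocity field can be written out directly. The sketch below assumes the common convention $x_t = (1-t)\,x_0 + t\,x_1$ with $x_0 \sim \mathcal{N}(0, I)$ and $x_1$ drawn uniformly from the support set; the paper's exact parametrization may differ. Under that assumption the weights are a Gaussian kernel in $x/t$ with bandwidth $(1-t)/t$. The function names are hypothetical.

```python
import math

def nw_weights(x, t, support):
    """Posterior weights over support points under x_t = (1 - t) * x_0 + t * x_1, x_0 ~ N(0, I)."""
    h = (1.0 - t) / t  # kernel bandwidth; it shrinks to 0 as t approaches 1
    logits = [-sum((xi / t - yi) ** 2 for xi, yi in zip(x, y)) / (2.0 * h * h) for y in support]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def velocity(x, t, support):
    """Nadaraya-Watson estimate of the endpoint, then the straight-line velocity toward it."""
    w = nw_weights(x, t, support)
    x1_hat = [sum(w_j * y[d] for w_j, y in zip(w, support)) for d in range(len(x))]
    return [(x1_d - x_d) / (1.0 - t) for x1_d, x_d in zip(x1_hat, x)]

support = [[-1.0], [1.0]]
early = nw_weights([0.3], 0.01, support)  # wide bandwidth: close to uniform averaging
late = nw_weights([0.3], 0.95, support)   # narrow bandwidth: close to nearest neighbor
print(early, late, velocity([0.3], 0.95, support))
```

The function is defined for 0 < t < 1 only. Comparing `early` and `late` shows the transition the abstract describes, from broad averaging at small t to a near one-hot weight on the nearest support point at large t.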

2605.10303 2026-05-14 math.ST stat.TH

Measuring Tail Dependence in Linear Processes: Theory and Empirics

Debanjana Datta, Diganta Mukherjee

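AI summary: This paper studies how to measure tail dependence in linear processes, with the aim of capturing the heavy-tailed distributions and extreme co-movements in financial time series that standard Gaussian frameworks cannot describe. The authors propose an approach to joint extremes based on a dependence measure that covers both non-identical and identical regularly varying distributions, and use high-frequency cryptocurrency data to examine the effect of the persistence property. A detailed simulation study supports the approach, providing theoretical and empirical backing for extreme-risk analysis.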

Details
Comments
17 pages
English abstract

The quantitative analysis of financial time series often reveals two distinct features that standard Gaussian frameworks fail to capture: heavy-tailed marginal distributions and extreme co-movements. While extreme value theory characterizes marginal behavior, copulas provide a functional bridge for describing the dependence structure independently of the marginals. We propose a different way of looking at joint extremes, based on a dependence measure. The proposed approach covers both non-identical and identical regularly varying distributions. Informed by the analysis of several high-frequency cryptocurrency datasets, we thoroughly study the effect of the persistence property under these setups. A detailed simulation study confirms our intuition and findings.

2605.04999 2026-05-14 stat.ME

A Tutorial for Evaluating Cure Model Appropriateness

Geethanjalee Mudunkotuwa, Durbadal Ghosh, Subodh Selukar

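AI summary: In survival analysis, traditional models assume that every individual will eventually experience the event of interest, but with advances in treatment a growing number of clinical settings involve potentially curative therapies, so some individuals may never experience the event. Statisticians have developed cure models to address this, yet their use in biomedicine remains limited. This paper provides a systematic procedure that combines clinical judgment, visual inspection of Kaplan-Meier curves, and quantitative evaluation to decide whether a cure model is appropriate. Worked examples based on a clinical trial in acute myeloid leukemia and on other hematopoietic cell transplantation datasets give researchers practical guidance and can help improve the reliability of survival analysis and the quality of clinical decisions.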

Details
Comments
24 pages, 2 figures, to be submitted to Statistics in Medicine. The first two authors contributed equally.
English abstract

In survival analysis, traditional models assume all individuals will eventually experience the event of interest. However, advances in therapeutics have led to multiple clinical contexts with potentially curative therapies, and in these contexts, certain individuals may never experience the event. Statisticians have developed cure models as a methodology to address this challenge. Nonetheless, despite significant statistical advances in cure models, we have seen more limited uptake in biomedical applications, and we hypothesize that this is caused by limited guidance in the appropriate application of cure models. Cure models require specific identifiability conditions for valid parameter estimation, and previous reports have demonstrated significant issues with the inappropriate application of cure models. Existing tutorials for cure models focus on model implementation and either assume or provide only limited guidance on whether cure modeling is appropriate for the given dataset. This tutorial addresses this gap by describing a systematic procedure that integrates clinical judgment, visual inspection of Kaplan-Meier curves, and quantitative evaluation. We provide a worked example using data from a randomized clinical trial in acute myeloid leukemia, and we also summarize findings from a series of other datasets of hematopoietic cell transplantation to suggest broad practical guidance for choosing to apply cure models. By systematically evaluating cure model appropriateness before fitting these models, researchers can achieve more reliable survival analysis and improved clinical decision-making.

2605.04912 2026-05-14 math.PR math.ST stat.TH

Can the $L^1$-$L^\infty$ duality be restored for non-dominated families of probability measures?

Irene Klein, Georg Köstenberger

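AI summary: This paper studies how to restore the duality between $L^1$ and $L^\infty$ for non-dominated families of probability measures. It notes that the classical duality fails under model uncertainty and proposes restoring it by extending the probability space. The authors prove that, on the extended model, the space of $\mathcal{P}$-quasi-surely bounded functions is isometrically isomorphic to the dual of the space of finite signed measures that are absolutely continuous with respect to at least one element of $\mathcal{P}$, and show that the approach covers many non-dominated models, such as infinite product measures, Gaussian processes, and the Black-Scholes model with uncertain volatility. The approach also unifies existing frameworks and extends Kraft's classical result on hypothesis tests to the non-dominated case.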

Details
Comments
43 pages, fixed minor inconsistencies and typos, incl. references
English abstract

The duality $L^{\infty}\simeq (L^{1})'$ frequently breaks down in the presence of model uncertainty, where a single reference measure $P$ is replaced by a non-dominated family of probability measures $\mathcal{P}$. The unavailability of classical measure-theoretic and functional-analytic tools in this regime poses a significant obstacle to developing robust probabilistic frameworks. We show that this duality can be restored for a broad class of robust statistical models by extending the underlying probability space. Specifically, on the extended model, the space $\mathbb{L}^{\infty}(\mathcal{P})$ of $\mathcal{P}$-quasi-surely bounded functions is isometrically isomorphic to the dual of the space of finite signed measures absolutely continuous with respect to at least one element of $\mathcal{P}$. The proposed extension is canonical: it is the smallest $\mathcal{P}$-complete extension of the original $σ$-algebra for which $\mathbb{L}^{\infty}(\mathcal{P})$ is the dual of any normed space. Our assumptions encompass several prominent non-dominated settings, including infinite product measures, Gaussian processes, the Black-Scholes model with uncertain constant volatility and drift, robust binomial models, and, more generally, infinite sequences from any parametric model with almost surely estimable parameters. Furthermore, we unify the existing frameworks of Cohen (2012) and Liebrich et al. (2022), demonstrating that our construction is equivalent to the capacity-based approach under mild assumptions satisfied by the aforementioned examples. Finally, we apply our theory to extend Kraft's (1955) characterization of strictly unbiased hypothesis tests to non-dominated cases.

2604.26070 2026-05-14 cs.LG math.OC math.ST q-bio.QM stat.TH

Observable Neural ODEs for Identifiable Causal Forecasting in Continuous Time

Jennifer Wendland, Nicolas Freitag, Maik Kschischo

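AI summary: This paper studies identifiability in continuous-time causal inference and, for dynamic decision settings with hidden confounders, proposes Observable Neural ODEs (ObsNODEs). By linking the control-theoretic notion of observability to causal identifiability, it derives a continuous-time adjustment formula and designs Neural ODE models whose latent states can be reconstructed from observations, enabling outcome prediction under different intervention paths. Experiments on synthetic cancer data, semi-synthetic data based on MIMIC-IV, and real-world sepsis data show strong performance.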

Details
Comments
20 pages, 5 figures
English abstract

Causal inference in continuous-time sequential decision problems is challenged by hidden confounders. We show that, in latent state-space models with time-varying interventions, observability of the latent dynamics from observed data is necessary for identifying dynamic treatment effects, linking control-theoretic observability to causal identifiability, even when hidden confounders affect both treatments and outcomes. We derive a continuous-time adjustment formula expressing potential outcome distributions under treatment trajectories via the measurement model, latent dynamics, and the filtering distribution over latent states given observed histories. We propose Observable Neural ODEs (ObsNODEs), Neural ODE models in observable normal form for causal forecasting. ObsNODEs learn continuous-time dynamics with states reconstructible from observations, enabling outcome prediction under alternative treatment paths. Experiments on synthetic cancer data, semi-synthetic data based on MIMIC-IV, and real-world sepsis data show strong performance over recent sequence models.

2603.02928 2026-05-14 stat.ME stat.CO

LOO-PIT predictive model checking

Herman Tesso, Aki Vehtari

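AI summary: This paper studies Bayesian model assessment based on the leave-one-out probability integral transform (LOO-PIT) for predictive model checking. Because LOO-PIT values are dependent in finite samples, conventional tests that assume independence can perform poorly. The paper proposes three testing procedures applicable to continuous and discrete data, together with an automated graphical method for visualizing local departures. Experiments show that the proposed tests have higher power on several datasets.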

Details
Comments
30 pages
English abstract

We consider predictive checking for Bayesian model assessment using leave-one-out probability integral transform (LOO-PIT). LOO-PIT values are conditional cumulative predictive probabilities given LOO predictive distributions and corresponding left-out observations. For a well-calibrated model, LOO-PIT values should be nearly uniformly distributed, but in the finite-sample case they are not independent, because the LOO predictive distributions are determined by nearly the same data (all but one observation). We prove that this dependency is non-negligible in the finite case and depends on model complexity. We propose three testing procedures that can be used for continuous and discrete dependent uniform values. We also propose an automated graphical method for visualizing local departures from the null. Extensive numerical experiments on simulated and real datasets demonstrate that the proposed tests achieve competitive performance overall and have much higher power than standard uniformity tests based on the independence assumption, which inevitably lead to a lower-than-expected rejection rate.
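The definition of LOO-PIT and the source of the dependence can be illustrated with a conjugate toy model where everything is available in closed form. The sketch assumes observations $y_i \sim \mathcal{N}(\mu, \sigma^2)$ with known $\sigma$ and a flat prior on $\mu$, so the LOO predictive distribution for $y_i$ is normal with mean $\bar{y}_{-i}$ and variance $\sigma^2 (1 + 1/(n-1))$. This model and the function name are assumptions for illustration, not the paper's setup.

```python
import math

def loo_pit_normal(y, sigma=1.0):
    """LOO-PIT values for y_i ~ N(mu, sigma^2) with known sigma and a flat prior on mu."""
    n = len(y)
    total = sum(y)
    scale = sigma * math.sqrt(1.0 + 1.0 / (n - 1))  # LOO predictive standard deviation
    # Standardized residual of y_i against the mean of the other n - 1 observations.
    z = [(yi - (total - yi) / (n - 1)) / scale for yi in y]
    # PIT value: predictive CDF evaluated at the left-out observation.
    pit = [0.5 * (1.0 + math.erf(zi / math.sqrt(2.0))) for zi in z]
    return pit, z

y = [0.3, -1.2, 0.8, 2.1, -0.4, 0.1]  # made-up data
pit, z = loo_pit_normal(y)
print(pit)
print(sum(z))  # zero up to rounding: the residuals are linearly constrained
```

In this model each residual equals $\tfrac{n}{n-1}(y_i - \bar{y})$ divided by a common scale, so the residuals always sum to zero. That exact linear constraint is one simple way to see that the LOO-PIT values cannot be independent in finite samples.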

2602.22847 2026-05-14 cs.LG cs.AI stat.ML

Decentralized Ranking Aggregation via Gossip: Convergence and Robustness

Kerrian Le Caillec, Anna Van Elst, Igor Colin, Stephan Clémençon

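AI summary: This paper studies how to reach a reliable and robust ranking consensus in decentralized networks. It proposes an approach based on random gossip communication in which nodes compute a global ranking consensus through local interactions only, with no central coordination. The approach provides convergence guarantees, strengthens robustness to corrupted nodes, and reduces communication costs, offering a new solution for distributed preference analysis.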

Details
Comments
33 pages, 5 figures
English abstract

Ranking aggregation plays a central role in preference analysis, and numerous algorithms for computing median rankings, often originating in social choice theory, have been documented in the literature. These algorithms offer theoretical guarantees in a centralized setting, i.e., when all the ranking data to be aggregated can be brought together in a single computing unit. For many technologies (e.g., peer-to-peer networks, IoT, multi-agent systems), preference data is initially distributed across a communicating network, and computing consensus rankings in this decentralized setting with guarantees of convergence and resilience to potential contamination remains a major methodological challenge. Indeed, in recent years, the literature on decentralized computation has mainly focused on computing or optimizing statistics such as arithmetic means using gossip algorithms. The purpose of this article is to study how to achieve reliable and resilient consensus on collective rankings in a decentralized setting, which raises new questions, in particular robustness to corrupted nodes and scalability through reduced communication costs. The approach proposed and analyzed here relies on the robustness guarantees offered by random gossip communication, which allows autonomous agents to compute a global ranking consensus using local interactions only, without coordination or a central authority.
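As background for the gossip setting, the sketch below shows the classical building block the abstract mentions: randomized pairwise gossip averaging, applied here to rank vectors so that every node converges to the mean rank of each item (a Borda-style consensus). This is a standard technique chosen for illustration, not the algorithm proposed in the paper, and it includes no robustness mechanism. The complete communication graph, function names, and data are assumptions.

```python
import random

def gossip_borda(rank_vectors, rounds=500, seed=0):
    """rank_vectors[i][a] is the position node i gives item a. Returns each node's final scores."""
    rng = random.Random(seed)
    scores = [list(map(float, r)) for r in rank_vectors]
    n = len(scores)
    for _ in range(rounds):
        i, j = rng.sample(range(n), 2)  # a random pair of nodes wakes up (complete graph assumed)
        avg = [(a + b) / 2.0 for a, b in zip(scores[i], scores[j])]
        scores[i], scores[j] = avg, list(avg)  # both nodes keep the pairwise average
    return scores

def induced_ranking(score):
    """Order items by increasing mean position."""
    return sorted(range(len(score)), key=lambda a: score[a])

# Five nodes, three items; each row lists the position assigned to items 0, 1, 2.
nodes = [[0, 1, 2], [0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]]
final = gossip_borda(nodes)
print([induced_ranking(s) for s in final])
```

Pairwise averaging preserves the network-wide sum, so every node's vector approaches the global mean ranks (0.2, 1.0, 1.8 for this data) using local exchanges only. A plain mean is sensitive to corrupted inputs, which is one reason robustness is a separate question in this setting.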

2602.11131 2026-05-14 physics.soc-ph math.ST stat.TH

Formalization of the generalized Pareto principle and structural typicality of the 20/80-rule

Antti Hippeläinen

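AI summary: This paper formalizes the generalized Pareto principle, stated as "fraction $p$ of inputs yields fraction $1-p$ of outputs", analyzes it in terms of non-negative gain densities, and obtains a unique characterization. It derives closed-form expressions for $p$ for truncated power-law, exponential, and normal distribution families and predicts that, for sample sizes $N \in [10^2, 10^5]$, the exponential and normal families concentrate $p$ near $[0.15, 0.26]$ and $[0.20, 0.29]$, close to the classical 20/80 rule and below the previously conjectured saturation value. The study highlights how widespread Pareto-type imbalances are and what that implies for their use as prescriptive targets.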

Details
Comments
v1: 33 pages, 10 figures; v2: some additions, added references, minor revisions to formatting
English abstract

We formalize a generalized form of the Pareto principle ("fraction $p$ of inputs yields fraction $1-p$ of outputs") as a property of non-negative gain densities $\ell \in L^1([0,1])$, working with the decreasing rearrangement to obtain a unique characterization. For probability distributions, the resulting $p$ coincides with $1 - k_F$, where $k_F$ is the Kolkata index of the corresponding Lorenz curve. Within this framework we analyze both constructed gain densities and commonly encountered distribution families. We derive closed-form expressions for $p$ for truncated power-law, exponential, and normal distribution families. Combining these with estimates of the truncation parameter as a function of sample size $N$, we predict that datasets of size $N \in [10^2, 10^5]$ from exponential and normal families concentrate $p$ near $[0.15, 0.26]$ and $[0.20, 0.29]$, values close to the canonical 0.2/0.8-rule and strictly below the saturation $k \approx 0.865$ conjectured earlier by Ghosh and Chakrabarti. We discuss the implications of the structural ubiquity of Pareto-type imbalances for their use as prescriptive targets.
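For a finite sample, the quantity $p$ can be approximated by sorting the values in decreasing order and finding the point where the top fraction $p$ first accounts for a share of at least $1-p$. The sketch below is a discrete stand-in for the density-based definition in the abstract, with a hypothetical function name; it assumes non-negative values with a positive total.

```python
def generalized_pareto_p(values):
    """Smallest fraction k/n such that the k largest values hold at least 1 - k/n of the total."""
    ordered = sorted(values, reverse=True)  # discrete analogue of the decreasing rearrangement
    n, total = len(ordered), sum(ordered)
    cumulative = 0
    for k, v in enumerate(ordered, start=1):
        cumulative += v
        # Share of the top k is at least 1 - k/n; cross-multiplied to avoid division.
        if cumulative * n >= (n - k) * total:
            return k / n
    return 1.0

print(generalized_pareto_p([80, 5, 5, 5, 5]))  # one item in five holds 80% of the total
print(generalized_pareto_p([1, 1, 1, 1]))      # perfectly equal inputs
```

The first call returns 0.2, the textbook 20/80 case, and the second returns 0.5, the value for perfectly equal contributions. On finite data the result is restricted to multiples of 1/n, so it only approximates the continuous $p = 1 - k_F$.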

2602.06021 2026-05-14 stat.ML cs.LG cs.NA math.NA math.PR

Diffusion Model's Generalization Can Be Characterized by Inductive Biases toward a Data-Dependent Ridge Manifold

Ye He, Yitong Qiu, Molei Tao

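AI summary: This paper studies where a diffusion model's samples land when it does not memorize the training data, and proposes a data-dependent geometric view of generalization. The authors introduce a time-dependent family of log-density ridge manifolds to characterize the reverse diffusion process and find that generated samples follow a reach-align-slide mechanism. They further connect this geometry to training dynamics, quantitatively separating architectural bias from optimization error, and validate the predictions in experiments on synthetic data and MNIST.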

Details
English abstract

We study a data-dependent notion of diffusion-model generalization: when a model does not memorize the training set, where do its generated samples go relative to the geometry induced by the data? To answer this, we introduce a time-dependent family of log-density ridge manifolds constructed from the smoothed empirical distribution, and use it to characterize reverse-time inference. Our main result shows that generated samples evolve by a reach-align-slide mechanism: they first enter a neighborhood of the ridge, then their distance to the ridge is controlled by the normal component of training error, and finally their motion along the ridge is controlled by the tangential component. We further connect this geometric picture to training dynamics through directional decompositions of the learned error, and make this link explicit for random feature models, where architectural bias and optimization error can be separated quantitatively. Experiments on synthetic multimodal data and MNIST latent diffusion support the predicted geometric behavior in both low and high dimensions.

2602.02791 2026-05-14 stat.ML cs.LG math.ST stat.TH

Plug-In Classification of Drift Functions in Diffusion Processes Using Neural Networks

Yuzhen Zhao, Jiarong Fan, Yating Liu

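AI summary: This paper studies supervised multiclass classification for diffusion processes, where each class is characterized by a distinct drift function and the data are trajectories observed at discrete times. The authors propose a neural-network-based plug-in classifier that classifies by estimating the class-specific drift functions and, under standard regularity assumptions, establish convergence rates for the excess misclassification risk that make explicit the effects of drift estimation, time discretization, and dimension. The analysis shows that learning the drift by exploiting the diffusion structure yields sharper guarantees than direct trajectory-based neural classification, and numerical experiments support the method's effectiveness across dimensions.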

Details
English abstract

We study supervised multiclass classification for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete times. We first derive a multidimensional Bayes rule and then construct a plug-in classifier by estimating the class-specific drifts with neural networks. Under standard regularity assumptions, we establish convergence rates for the excess misclassification risk, making explicit the contributions of drift estimation, time discretization, and dimension. Our analysis also highlights the benefit of exploiting the diffusion structure: the drift is learned from all observed increments, leading to sharper guarantees than direct trajectory-based neural classifiers in the considered setting. Numerical experiments support the theory: the proposed method achieves better classification performance than Denis et al. (2024) in dimension one, remains effective in higher dimensions when the drift functions admit a compositional structure, and outperforms end-to-end neural classifiers trained directly on trajectories, as in Bos & Schmidt-Hieber (2022).
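The plug-in idea can be sketched for the simplest case: a one-dimensional diffusion $dX_t = b_k(X_t)\,dt + dW_t$ with unit diffusion coefficient, classified with an Euler-discretized Girsanov log-likelihood. These simplifications, the function names, and the use of fixed drift functions in place of neural-network estimates are all assumptions made for illustration, not the paper's construction.

```python
import math

def log_likelihood(path, dt, drift):
    """Euler-discretized Girsanov log-likelihood ratio for dX = drift(X) dt + dW (unit diffusion)."""
    return sum(drift(x0) * (x1 - x0) - 0.5 * drift(x0) ** 2 * dt
               for x0, x1 in zip(path, path[1:]))

def plug_in_classify(path, dt, drifts, priors):
    """Return the class index maximizing log prior plus discretized log-likelihood."""
    scores = [math.log(p) + log_likelihood(path, dt, b) for b, p in zip(drifts, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Stand-ins for estimated class-specific drifts (in practice these would be fitted models).
drifts = [lambda x: -x - 2.0, lambda x: -x + 2.0]
priors = [0.5, 0.5]

def noise_free_path(drift, dt=0.1, steps=50, x0=0.0):
    """Deterministic Euler path that follows a given drift exactly (no Brownian increments)."""
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + drift(path[-1]) * dt)
    return path

print(plug_in_classify(noise_free_path(drifts[1]), 0.1, drifts, priors))
```

Every increment of the trajectory contributes a term to the score, which reflects the abstract's point that the drift is learned from, and here evaluated on, all observed increments rather than on the trajectory as a single object.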

2512.20280 2026-05-14 stat.ME

The post-hoc test for local dependence

Bogdan Ćmiel, Bartłomiej Gibas

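AI summary: This paper proposes a method that tests global and local statistical independence together, to identify dependence in data and its strength more fully. Building on copula theory, it introduces a test based on the quantile dependence function and the notion of critical surfaces, which allow detailed analysis and significance assessment of local dependence while preserving the overall significance level. The method improves the visualization and interpretability of dependence structure.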

Details
English abstract

The concept of independence plays a crucial role in probability theory and has been the subject of extensive research in recent years. Numerous approaches have been proposed to test for independence; however, most of them address the problem only at a global level. From a practical perspective, it is important not only to determine whether the data are dependent but also to identify where this dependence occurs and how strong it is. The graphical presentation of results is another essential aspect that should not be neglected, as it considerably enhances interpretability. The main objective of this work is to propose a solution that considers these aspects simultaneously. Relying on copula-based results, we introduce a novel method for testing global and local statistical independence using the quantile dependence function. Rather than assessing whether the value of the test statistic exceeds a single critical threshold and subsequently deciding whether to reject the independence hypothesis, we introduce so-called critical surfaces that guarantee a locally equal probability of exceeding them under independence. This approach enables a detailed examination of local discrepancies and an assessment of their statistical significance while preserving the overall significance level of the test.

2510.00417 2026-05-14 math.OC cs.LG stat.ML

Progressively Sampled Equality-Constrained Optimization

Frank E. Curtis, Lingjun Guo, Daniel P. Robinson

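AI summary: This paper proposes an algorithm for continuous nonlinear-equality-constrained optimization problems in which the objective and constraint functions are defined by expectations or averages over a large number of terms. By progressively increasing the sample size, the algorithm solves a sequence of related problems and thereby attains a better worst-case sample complexity than solving a single problem on the full sample set. Numerical experiments indicate that the approach can be effective in practice.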

Details
English abstract

An algorithm is proposed, analyzed, and tested for solving continuous nonlinear-equality-constrained optimization problems where the objective and constraint functions are defined by expectations or averages over large, finite numbers of terms. The main idea of the algorithm is to solve a sequence of related problems, each involving finite samples of objective- and constraint-function terms, over which the sample sets grow progressively. Under assumptions about the problem functions and their first- and second-order derivatives that are reasonable in real-world settings of interest, it is shown that, with sufficiently large initial sample sizes, solving a sequence of problems defined through progressive sampling yields a better worst-case sample complexity bound compared to solving a single problem with the full sets of samples. The results of numerical experiments with a set of test problems demonstrate that the proposed approach can be effective in practice.

2509.23800 2026-05-14 stat.ML cs.LG

Sample-Efficient Optimisation over the Outputs of Generative Models

Samuel Willis, Paul Duckworth, Jack Simons, Aleksandra Kalisz, Krisztina Sinkovics, Noam Ghenassia, Shikha Surana, Henry T. Oldroyd, Alexandru I. Stere, Dragos D Margineantu, Carl Henrik Ek, Henry Moss, Erik Bodin

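AI summary: This paper proposes O3, a method for sample-efficient black-box optimization over the outputs of generative models, aimed at continuous-variable diffusion and flow-matching models. The method is built on surrogate latent spaces, low-dimensional Euclidean embeddings extracted from the generative model without additional training, which have controllable dimensionality and support direct use of standard optimization algorithms. On image and protein design tasks, surrogate-space optimization finds substantially higher-scoring samples than standard sampling or optimization in the original latent space. The method is model- and optimizer-agnostic, adds negligible cost, and requires no retraining or fine-tuning of the generative model.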

Details
English abstract

Modern generative AI models, such as diffusion and flow matching models, can sample from rich data distributions. However, many applications, especially in science and engineering, require more than drawing samples from the model distribution: they require searching within this distribution for samples that optimise task-specific criteria. In this work, we propose O3 (Optimisation Over the Outputs of Generative Models), a method for sample-efficient black-box optimisation over continuous-variable diffusion and flow-matching models. O3 is built around surrogate latent spaces: low-dimensional Euclidean embeddings that can be extracted from a generative model without additional training. The resulting representations have controllable dimensionality and support the direct application of standard optimisation algorithms. We show, on image and protein design tasks, that surrogate-space optimisation finds substantially higher-scoring samples than standard sampling or optimisation in the original latent space. Our method is model- and optimiser-agnostic, incurs negligible additional cost over standard generation, and requires no retraining or fine-tuning of the generative model.

2509.19929 2026-05-14 stat.ML cs.LG physics.comp-ph physics.data-an

Geometric Autoencoder Priors for Bayesian Inversion: Learn First Observe Later

Arnaud Vadeboncoeur, Gregory Duthé, Mark Girolami, Eleni Chatzi

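AI summary: This paper proposes Geometric Autoencoders for Bayesian Inversion (GABI), a framework for the highly ill-posed problem of recovering full-field information about a physical system from a few noisy observations. GABI learns generative models of the physical responses of systems with varying geometries and turns them into informative geometry-conditioned priors, improving the accuracy and robustness of uncertainty quantification (UQ) in inversion. The method does not require knowledge of governing equations or boundary conditions, uses Approximate Bayesian Computation (ABC) sampling for an efficient implementation, and is tested on several problems with complex geometries.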

Details
English abstract

Uncertainty Quantification (UQ) is paramount for inference in engineering. A common inference task is to recover full-field information of physical systems from a small number of noisy observations, a usually highly ill-posed problem. Sharing information from multiple distinct yet related physical systems can alleviate this ill-posedness. Critically, engineering systems often have complicated variable geometries prohibiting the use of standard multi-system Bayesian UQ. In this work, we introduce Geometric Autoencoders for Bayesian Inversion (GABI), a framework for learning geometry-aware generative models of physical responses that serve as highly informative geometry-conditioned priors for Bayesian inversion. Following a "learn first, observe later" paradigm, GABI distills information from large datasets of systems with varying geometries, without requiring knowledge of governing PDEs, boundary conditions, or observation processes, into a rich latent prior. At inference time, this prior is seamlessly combined with the likelihood of a specific observation process, yielding a geometry-adapted posterior distribution. Our proposed framework is architecture-agnostic. A creative use of Approximate Bayesian Computation (ABC) sampling yields an efficient implementation that utilizes modern GPU hardware. We test our method on: steady-state heat over rectangular domains; Reynolds-Averaged Navier-Stokes (RANS) flow around airfoils; Helmholtz resonance and source localization on 3D car bodies; RANS airflow over terrain. We find: the predictive accuracy to be comparable to deterministic supervised learning approaches in the restricted setting where supervised learning is applicable; UQ to be well calibrated and robust on challenging problems with complex geometries.

2509.12889 2026-05-14 math.ST stat.ML stat.TH

Gaussian Mixture Model with unknown diagonal covariances via continuous sparse regularization

Romane Giard, Yohann de Castro, Clément Marteau

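AI summary: This paper studies statistical estimation of Gaussian mixture models (GMMs) with unknown diagonal covariance matrices and uses the Beurling-LASSO (BLASSO) to estimate the number of mixture components and their parameters simultaneously. It extends the BLASSO framework to multivariate GMMs with component-specific diagonal covariances, which is more flexible than earlier models requiring known, identical covariances. The paper establishes non-asymptotic recovery guarantees, including nearly parametric convergence rates for component means, diagonal covariances, and weights, and identifies a separation condition on mixture components that enables the construction of non-degenerate dual certificates underpinning the BLASSO's statistical guarantees.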

Details
English abstract

This paper addresses the statistical estimation of Gaussian Mixture Models (GMMs) with unknown diagonal covariances from independent and identically distributed samples. We employ the Beurling-LASSO (BLASSO), a convex optimization framework that promotes sparsity in the space of measures, to simultaneously estimate the number of components and their parameters. Our main contribution extends the BLASSO methodology to multivariate GMMs with component-specific unknown diagonal covariance matrices. This setting is significantly more flexible than previous approaches, which required known and identical covariances. We establish non-asymptotic recovery guarantees with nearly parametric convergence rates for component means, diagonal covariances, and weights, as well as for density prediction. A key theoretical contribution is the identification of an explicit separation condition on mixture components that enables the construction of non-degenerate dual certificates, which are essential tools for establishing statistical guarantees for the BLASSO. Our analysis leverages the Fisher-Rao geometry of the statistical model and introduces a novel semi-distance adapted to our framework, providing new insights into the interplay between component separation, parameter space geometry, and achievable statistical recovery.

2507.09983 2026-05-14 stat.AP stat.ME

Gradient boosted multi-population mortality modelling with high-frequency data

Ziting Miao, Han Li, Yuyu Chen

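AI summary: This paper studies how to use high-frequency mortality data for more accurate mortality modelling and forecasting. To address the shortcomings of traditional models in handling seasonal fluctuations and short-term variation, it proposes a new method that combines gradient boosting with multi-population stochastic mortality models. The core innovation is to embed the Li and Lee model as the weak learner in the gradient boosting framework, replacing conventional decision trees, which improves model fit and forecast accuracy. Empirical studies on weekly mortality data from 30 countries show better forecasting performance, and the method effectively addresses the problem of sub-population selection in multi-population modelling.

Details
English abstract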

High-frequency mortality data have attracted growing attention, but their use has largely been confined to specific applications rather than general modelling and forecasting. Such data pose new challenges to traditional mortality models due to pronounced seasonal patterns and short-term fluctuations. To address these challenges and produce more accurate forecasts with the high-frequency mortality data, this paper introduces a novel integration of gradient boosting techniques into traditional stochastic mortality models under a multi-population setting. Our key innovation lies in using the Li and Lee model as the weak learner within the gradient boosting framework, replacing conventional decision trees. Empirical studies are conducted using weekly mortality data from 30 countries (Human Mortality Database, 2015-2019). Empirical evidence highlights that the proposed methodology not only enhances model fit by accurately capturing underlying mortality trends and seasonal patterns, but also achieves superior forecast accuracy, compared to the benchmark models. We also investigate a key challenge in multi-population mortality modelling: how to select appropriate sub-populations with sufficiently similar mortality experiences. A comprehensive clustering exercise is conducted based on mortality improvement rates and seasonal strength. The empirical results demonstrate that our proposed model maintains strong forecast accuracy across different clustering configurations, thereby reducing the need for extensive data preprocessing.

2502.11583 2026-05-14 stat.ML cs.LG

Distributional Autoencoders Know the Score

Andrej Leban

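AI summary: This paper studies the Distributional Principal Autoencoder (DPA), which aims to unify distributionally correct reconstruction with interpretable encodings. Through theoretical analysis, the author establishes an exact relation between the optimal level-set geometry and the score of the data distribution, explaining the mechanism by which DPA disentangles factors of variation in the data and allowing the score function to be recovered directly from samples. When the data follow a Boltzmann distribution, this relation can be used to approximate the minimum free-energy path in a single fit. The paper also proves that when the data lie on a manifold that the encoder can approximate, latent variables beyond the manifold dimension are conditionally independent of the data distribution, thereby revealing the intrinsic dimension of the data. These results show that a single model can learn both the data distribution and its intrinsic dimension with guarantees, unifying two longstanding goals of unsupervised learning.

Details
Journal ref
Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 2025
Comments
NeurIPS 2025 - camera-ready version
English abstract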

The Distributional Principal Autoencoder (DPA) combines distributionally correct reconstruction with principal-component-like interpretability of the encodings. In this work, we provide exact theoretical guarantees on both fronts. First, we derive a closed-form relation linking each optimal level-set geometry to the data-distribution score. This result explains DPA's empirical ability to disentangle factors of variation of the data and allows the score to be recovered directly from samples. When the data follows the Boltzmann distribution, we demonstrate that this relation yields an approximation of the minimum free-energy path for the Mueller-Brown potential in a single fit. Second, we prove that if the data lies on a manifold that can be approximated by the encoder, latent components beyond the manifold dimension are conditionally independent of the data distribution, carrying no additional information, and thus reveal the intrinsic dimension. Together, these results show that a single model can learn the data distribution and its intrinsic dimension with exact guarantees simultaneously, unifying two longstanding goals of unsupervised learning.

2501.07738 2026-05-14 math.PR math-ph math.MP math.ST q-bio.PE stat.TH

Mixing time for an epidemic model on graphs with external sources of infection

Wasiur R. KhudaBukhsh, Yangrui Xiang

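AI summary: This paper studies the mixing time of a susceptible-infected-susceptible (SIS) epidemic model on graphs with external sources of infection. Under suitable assumptions on the parameters, the authors prove that the mixing time is of order $Θ(n\log n)$ in the number of vertices $n$. They further analyze the model on random graph families (such as Erdős–Rényi graphs, random regular multigraphs, and Galton–Watson trees) and prove that the mixing time remains of order $Θ(n\log n)$ with high probability.

Details
Comments
improved results, minor typos fixed, 19 pages, no figures
English abstract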

We study the mixing time of a Susceptible–Infected–Susceptible (SIS) model on graphs with external sources of infection, which we refer to as the noisy SIS model. Under suitable assumptions on the parameters of the dynamics, we show that the mixing time is of the order $Θ(n\log n)$ with respect to the number of vertices $n$. We further investigate the model on random graph families, including Erdős–Rényi graphs, random regular multigraphs, and Galton–Watson trees. By identifying high-probability structural properties of these graphs and conditioning on typical realizations, we prove that the mixing time remains of order $Θ(n\log n)$ with high probability.

2409.02708 2026-05-14 cs.LG stat.ME

Few-shot Multi-Task Learning of Linear Invariant Features with Meta Subspace Pursuit

Chaozhi Zhang, Lin Liu, Xiaoqun Zhang

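AI summary: This paper studies how to extract linear invariant features through multi-task learning when data are scarce, and proposes a new algorithm called Meta Subspace Pursuit (Meta-SP) for learning a low-rank invariant subspace shared across tasks. The method comes with both algorithmic and statistical guarantees, and extensive experiments show that it outperforms several competing methods, including ANIL.

Details
Journal ref
CSIAM Transactions on Applied Mathematics (2026)
English abstract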

Data scarcity poses a serious threat to modern machine learning and artificial intelligence, as their practical success typically relies on the availability of big datasets. One effective strategy to mitigate the issue of insufficient data is to first harness information from other data sources possessing certain similarities in the study design stage, and then employ the multi-task or meta learning framework in the analysis stage. In this paper, we focus on multi-task (or multi-source) linear models whose coefficients across tasks share an invariant low-rank component, a popular structural assumption considered in the recent multi-task or meta learning literature. Under this assumption, we propose a new algorithm, called Meta Subspace Pursuit (abbreviated as Meta-SP), that provably learns this invariant subspace shared by different tasks. Under this stylized setup for multi-task or meta learning, we establish both the algorithmic and statistical guarantees of the proposed method. Extensive numerical experiments are conducted, comparing Meta-SP against several competing methods, including popular, off-the-shelf model-agnostic meta learning algorithms such as ANIL. These experiments demonstrate that Meta-SP achieves superior performance over the competing methods in various aspects.

2405.07860 2026-05-14 econ.EM math.ST stat.ML stat.TH

Order-Explicit Linearization of High-Dimensional $U$-Statistics

David M. Ritzwoller, Vasilis Syrgkanis

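AI summary: This paper studies the deviation between a high-dimensional $U$-statistic and its Hájek projection and gives an order-explicit large deviation bound. By developing new order-explicit moment inequalities for higher-order Hoeffding components, the authors show that for an order-$b$ $U$-statistic with a $d$-dimensional kernel satisfying certain conditions, the maximum deviation is $O_p(ϕb n^{-1}\log^2(dn))$, and that this rate cannot be improved up to a polynomial factor in the logarithmic term. The results are then used to establish the consistency of resampling-based simultaneous confidence intervals for nonparametric regression estimators, covering several forms of random forest regression, including Generalized Random Forests.

Details
English abstract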

We give an order-explicit large deviation bound for the difference between a high-dimensional $U$-statistic and its Hájek projection. In particular, we show that any $U$-statistic of order $b$ on $n$ observations, with a $d$-dimensional kernel whose coordinates have $ψ_1$-Orlicz norm at most $ϕ$, has a maximum deviation from its Hájek projection of order $O_p(ϕb n^{-1}\log^2(dn))$. The proof relies on the development of novel order-explicit moment inequalities for higher-order Hoeffding components. We show that this rate is unimprovable, up to the polynomial factor on the logarithmic term. As corollaries, we obtain new Bernstein-type concentration and Gaussian approximation results for high-dimensional $U$-statistics. We apply these results to establish the consistency of a set of resampling-based simultaneous confidence intervals built around a class of nonparametric regression estimators constructed with subsampled kernels. This class encompasses several forms of random forest regression, including Generalized Random Forests.
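
For readers less familiar with the objects in this abstract, the following is a minimal one-dimensional sketch of an order-$b$ $U$-statistic: the average of a symmetric kernel over all size-$b$ subsets of the sample. The kernel, the sample, and the function names are illustrative assumptions, not taken from the paper. The Hájek projection itself needs conditional expectations under the data distribution and is not computed here.

```python
from itertools import combinations
from statistics import variance


def u_statistic(data, kernel, order):
    """Average a symmetric kernel over all size-`order` subsets of the data."""
    subsets = list(combinations(data, order))
    return sum(kernel(*s) for s in subsets) / len(subsets)


def half_squared_diff(x, y):
    # Illustrative order-2 kernel; its U-statistic is the unbiased sample variance.
    return 0.5 * (x - y) ** 2


sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical data
u = u_statistic(sample, half_squared_diff, order=2)
print(u, variance(sample))
```

With this particular kernel the two printed values agree, which gives a quick sanity check on the subset-averaging logic before moving to higher orders or vector-valued kernels.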

2402.15415 2026-05-14 cs.LG math.DS stat.ML

Understanding Catastrophic Forgetting In LoRA via Mean-Field Attention Dynamics

Hugo Koubbi, Louis Hernandez, Matthieu Boussard

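AI summary: This paper studies the catastrophic forgetting that arises during fine-tuning with LoRA (Low-Rank Adaptation). It builds an analytically tractable mean-field self-attention toy model in which tokens are treated as an interacting particle system and LoRA as a low-rank perturbation. Using partial differential equations and dynamical systems theory, it reveals a phase transition between forgetting and non-forgetting behavior, analyzes how perturbation size and model depth affect forgetting, and validates the theoretical predictions with experiments.

Details
Comments
New version accepted at ICML 2026, with new results and without previous results
English abstract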

Low-Rank Adaptation (LoRA) is the dominant parameter-efficient fine-tuning method due to its favorable compute-performance trade-off, yet it suffers from catastrophic forgetting. We study forgetting through a tractable _mean-field self-attention_ toy model, where tokens evolve as an interacting particle system and LoRA acts as a low-rank perturbation. Using tools from partial differential equations and dynamical systems, we characterize regimes suggesting a phase transition between forgetting and non-forgetting behavior. We show that one phase transition appears with respect to the norm of the perturbation, and the other with respect to the depth of the Transformers. We further bound the time-to-deviation in terms of the perturbation size and spectral quantities, and corroborate the predicted trends with experiments and exploratory analyses on real models under LoRA fine-tuning.

2311.02299 2026-05-14 econ.EM stat.ME

The Fragility of Sparsity

Michal Kolesár, Ulrich K. Müller, Sebastian T. Roelsgaard

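AI summary: Using three empirical applications, this paper shows that linear regression estimates based on the sparsity assumption are fragile in two ways. First, different choices of the regressor matrix (such as the baseline category for categorical variables) that do not affect ordinary least squares (OLS) estimates can move sparsity-based estimates by two standard errors or more. Second, the authors propose two tests of the sparsity assumption that compare sparsity-based estimates with OLS; the tests reject the sparsity assumption in all three applications. Unless the number of regressors is close to or exceeds the sample size, OLS provides more robust inference while remaining highly efficient.

Details
Comments
48 pages, including appendices
English abstract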

We show, using three empirical applications, that linear regression estimates predicated on the assumption of sparsity are fragile in two ways. First, we document that different choices of the regressor matrix which do not impact ordinary least squares (OLS) estimates, such as the choice of baseline category with categorical controls, can move sparsity-based estimates by two standard errors or more. Second, we develop two tests of the sparsity assumption by comparing sparsity-based estimators with OLS. The tests tend to reject the sparsity assumption in all three applications. Unless the number of regressors is comparable to or exceeds the sample size, OLS yields more robust inference at little efficiency cost.

1804.01050 2026-05-14 stat.ML cs.CV cs.LG

Training VAEs Under Structured Residuals

Gara Dorta, Sara Vicente, Lourdes Agapito, Neill D. F. Campbell, Ivor Simpson

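AI summary: This paper studies how to better model structured correlations in image reconstruction residuals in variational autoencoders (VAEs). Traditional VAEs assume that per-pixel uncertainty is independent, but actual reconstruction residuals are often clearly structured. The authors propose a new method that incorporates a structured Gaussian likelihood prediction network into the VAE to model the residual correlations, improving the VAE's uncertainty modeling and generation quality on color images while keeping model complexity low.

Details
Comments
Simplified training methodology, added more results
English abstract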

Variational auto-encoders (VAEs) are a popular and powerful deep generative model. Previous works on VAEs have assumed a factorized likelihood model, whereby the output uncertainty of each pixel is assumed to be independent. This approximation is clearly limited, as demonstrated by observing a residual image from a VAE reconstruction, which often possesses a high level of structure. This paper demonstrates a novel scheme to incorporate a structured Gaussian likelihood prediction network within the VAE that allows the residual correlations to be modeled. Our novel architecture, with minimal increase in complexity, incorporates the covariance matrix prediction within the VAE. We also propose a new mechanism for allowing structured uncertainty on color images. Furthermore, we provide a scheme for effectively training this model, and include some suggestions for improving performance in terms of efficiency or modeling longer range correlations.

1802.07079 2026-05-14 stat.ML

Structured Uncertainty Prediction Networks

Gara Dorta, Sara Vicente, Lourdes Agapito, Neill D. F. Campbell, Ivor Simpson

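AI summary: This paper is the first to propose a network that predicts a structured uncertainty distribution for synthesized images. Unlike earlier methods, which mostly predict diagonal covariance matrices, the model learns to predict a full Gaussian covariance matrix for each reconstruction, which supports efficient sampling and likelihood evaluation. Experiments show that the model accurately reconstructs the ground-truth correlated residual distributions on synthetic datasets and generates realistic face images with high-frequency detail, and they illustrate the use of the predicted covariances in structure-preserving image denoising.

Details
Comments
CVPR 2018 (final version)
English abstract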

This paper is the first work to propose a network to predict a structured uncertainty distribution for a synthesized image. Previous approaches have been mostly limited to predicting diagonal covariance matrices. Our novel model learns to predict a full Gaussian covariance matrix for each reconstruction, which permits efficient sampling and likelihood evaluation. We demonstrate that our model can accurately reconstruct ground truth correlated residual distributions for synthetic datasets and generate plausible high-frequency samples for real face images. We also illustrate the use of these predicted covariances for structure-preserving image denoising.

2605.13326 2026-05-14 stat.ME

A Note on the Folding Test of Unimodality: limitation and improved alternative

Colombe Becquart, Aurore Archimbaud, Anne M. Ruiz, Zaineb Smida

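AI summary: This paper identifies a key limitation of the Folding Test of Unimodality (FTU): in certain univariate mixture settings, the method systematically misclassifies multimodal distributions as unimodal. The authors give a full analysis of these failures for Dirac mixtures and Gaussian mixtures and propose a double-folding method that captures complementary information, improving the unimodality test, resolving the FTU's shortcomings, and increasing multimodality detection power in simulations.

Details
English abstract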

This note addresses a key limitation of the Folding Test of Unimodality (FTU). In specific univariate mixture settings, the folding-based criterion can systematically fail, misclassifying clearly multimodal distributions as unimodal. We fully characterize these failures for Dirac mixtures and extend the analysis to Gaussian mixtures. We then introduce a double-folding procedure that captures complementary information, leading to a new test, the Double Folding Test of Unimodality. It resolves the FTU failures and improves multimodality detection power in simulations.

2605.13287 2026-05-14 cs.LG cs.AI math.OC stat.ML

Delightful Exploration

Ian Osband

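AI summary: This paper proposes an exploration strategy called Delight-gated exploration (DE) for settings with a large action space and a limited exploration budget. The method decides whether to explore by measuring the product of potential gain and surprisal (the "delight"), making more efficient use of limited exploration resources. Across several tasks, DE shows weaker regret growth than Thompson Sampling and $\varepsilon$-greedy, and its hyperparameters transfer well across tasks without retuning.

Details
English abstract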

Most exploration algorithms search broadly until uncertainty is resolved. When the action space is too large to resolve within budget, practitioners default to $\varepsilon$-greedy, which bounds disruption but spends its override blindly. We introduce *Delight-gated exploration* (DE), a host-override rule that spends exploratory actions only when their prospective delight (expected improvement times surprisal) exceeds a gate price. This practical heuristic recovers a classical result: Pandora's reservation-value rule for costly search, with surprisal setting the effective inspection cost. Resolved arms exit the gate, fresh arms shut off above a prior-determined threshold, and selected linear-bandit overrides consume finite information budget. Across Bernoulli bandits, linear bandits, and tabular MDPs, the same hyperparameters transfer without retuning, and DE shows much weaker regret growth than Thompson Sampling and $\varepsilon$-greedy in the tested unresolved regimes. Delight improves acting for the same reason it improves learning: it prices scarce resources by the product of upside and surprisal.

2605.13284 2026-05-14 stat.ML cs.LG math.ST stat.TH

Learning Perturbations to Extrapolate Your LLM

Zetai Cen, Chenfei Gu, Jin Zhu, Ting Li, Yunxiao Chen, Chengchun Shi

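AI summary: This study aims to improve the generalization of large language models to unseen domains and proposes perturbing token prefixes through a learnable transformation of a continuous latent vector. The method overcomes the limitations of traditional discrete, fixed perturbations; the authors derive unbiased estimating equations, optimize them with stochastic gradient descent, and establish statistical properties in over-parameterized regimes. Experiments show that the method significantly outperforms existing state-of-the-art methods on both synthetic and real-world datasets.

Details
Comments
35 pages
English abstract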

Recent advancements in large language models demonstrate that injecting perturbations can substantially enhance extrapolation performance. However, current approaches often rely on discrete perturbations with fixed designs, which limits their flexibility. In this work, we propose a framework where token prefixes are perturbed by a learnable transformation of a continuous latent vector within an embedding space. To overcome the challenge of an intractable marginal likelihood, we derive unbiased estimating equations for model parameters and optimize them via stochastic gradient descent. We establish the statistical properties of the resulting estimator in over-parameterized regimes. Empirical evaluations on both synthetic and real-world datasets demonstrate that our proposal yields significant gains in out-of-domain settings over a range of state-of-the-art baseline methods.

2605.13283 2026-05-14 cs.LG math.ST stat.TH

Byzantine-Robust Distributed Sparse Learning Revisited

Yuxuan Wang, Lixin Zhang, Kangqiang Li

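AI summary: This paper revisits Byzantine-robust distributed estimation for high-dimensional sparse linear models. The authors propose a framework that combines local robust $\ell_1$-regularized estimation with robust aggregation at the server, applicable to pseudo-Huber regression, quantile regression, and sparse support vector machines. The method provides non-asymptotic guarantees under mild conditions and attains near-optimal statistical convergence rates while remaining communication-efficient. Simulations confirm its robustness in estimation, support recovery, and classification accuracy under various Byzantine attacks.

Details
English abstract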

We revisit Byzantine-robust distributed estimation for high-dimensional sparse linear models. By combining local $\ell_1$-regularized robust estimation with robust aggregation at the server, the framework applies to pseudo-Huber regression, quantile regression, and sparse SVM. We show that the resulting estimators yield non-asymptotic guarantees and attain near-optimal statistical rates under mild conditions, while remaining communication-efficient. Simulations confirm strong robustness in estimation, support recovery, and classification accuracy under various Byzantine attacks.
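
The abstract does not say which robust aggregation rule the server uses. As a rough illustration of the idea only, the sketch below swaps in the coordinate-wise median, one common robust aggregator; this choice, the worker vectors, and the function name are all assumptions and may differ from the paper's procedure.

```python
from statistics import median


def coordinate_median(estimates):
    """Aggregate worker vectors by taking the median of each coordinate."""
    return [median(column) for column in zip(*estimates)]


# Hypothetical local estimates from three honest workers.
honest = [[1.0, 0.0, -2.0], [1.1, 0.0, -1.9], [0.9, 0.1, -2.1]]
# One corrupted worker sending arbitrary values.
byzantine = [[1e6, -1e6, 1e6]]

aggregated = coordinate_median(honest + byzantine)
print(aggregated)
```

A plain average of the same four vectors would be dominated by the corrupted worker, whereas each coordinate of the median stays within the range of the honest estimates as long as honest workers form a majority.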

2605.13260 2026-05-14 cs.LG math.AP math.FA stat.ML

Unified generalization analysis for physics informed neural networks

Yuka Hashimoto, Tomoharu Iwata

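AI summary: This paper gives a unified theoretical analysis of the generalization of physics-informed neural networks (PINNs) and their variational counterparts (VPINNs). Using Taylor expansion to turn nonlinear differential operators into linear operators on a high-dimensional space and combining this with Koopman-based analysis, it establishes generalization bounds for neural networks that involve differentiation. The approach removes the earlier reliance on stability conditions or linear ellipticity and shows that the nonlinearity of the differential operator has a significant effect on generalization, offering a new theoretical perspective on the training and generalization of physics-informed neural networks.

Details
English abstract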

Physics-Informed Neural Networks (PINNs) and their variational counterparts (VPINNs) are neural networks that incorporate physical laws, making them useful for scientific problems. Existing generalization analyses for PINNs and VPINNs remain limited, often requiring restrictive assumptions such as stability conditions or linear ellipticity. In this paper, we derive generalization bounds for neural networks that involve differentiation with respect to input variables, covering PINNs and VPINNs under a unified framework. We apply Taylor expansion to represent nonlinear differential operators as linear operators on a high-dimensional space, enabling the use of Koopman-based analysis and showing that high-rank networks can generalize well even in settings involving differential operators. We also show that the nonlinearity of the differential operator exponentially enlarges the bound, highlighting its significant impact on generalization.

2605.13252 2026-05-14 stat.ML cs.LG math.ST stat.TH

The Sample Complexity of Multiple Change Point Identification under Bandit Feedback

Maximilian Graf, Victor Thuot

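AI summary: This paper studies multiple change point localization under bandit feedback, where the goal is to identify a prescribed number of change points of a function with as few samples as possible while meeting a given precision and confidence level. The authors propose an adaptive algorithm that first detects intervals likely to contain change points and then pins down their locations, and they give upper and lower bounds on its sample complexity. The study finds that the sample complexity is jointly governed by the magnitudes and the relative positions of the change points, rather than by the jump magnitudes alone.

Details
English abstract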

We study multiple change point localization under bandit feedback. An unknown piecewise-constant function on a compact interval can be queried sequentially at adaptively chosen inputs, and each query returns a noisy evaluation of the function. The goal is to identify a prescribed number of discontinuities, known as change points, within a target precision $η$ and confidence level $1-δ$, while using as few samples as possible. We propose an adaptive algorithm that first detects intervals likely to contain change points and then refines their locations to precision $η$. We establish non-asymptotic upper bounds on its sample budget, together with corresponding lower bounds. Prior work shows that jump magnitudes alone determine the asymptotic sample complexity as $δ\to 0$. We reveal that this picture is incomplete beyond this regime. We demonstrate, both empirically and theoretically, that for general $δ$ and $η$, the complexity is jointly governed by the jumps and the relative positions of the change points.

2605.13188 2026-05-14 stat.ML cs.CL cs.LG stat.ME

LLMs as Implicit Imputers: Uncertainty Should Scale with Missing Information

Stef van Buuren

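AI summary: This paper studies the answer uncertainty of large language models (LLMs) under incomplete context. It proposes viewing LLMs as implicit imputers of missing values and borrows a criterion from multiple imputation theory: uncertainty should rise as the amount of missing information increases. In experiments on the SQuAD dataset, the author finds that sampling-based response entropy reflects the degree of missing context more accurately, whereas confidence fails to track this change. The study also proposes a black-box diagnostic that estimates the proportion of model uncertainty resolved at different context levels, providing a new way to evaluate LLMs under incomplete information.

Details
Comments
9 pages, 3 figures, 2 tables, NeurIPS 2026 position paper
English abstract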

Large language models (LLMs) are increasingly deployed in settings where the available context is incomplete or degraded. We argue that an LLM generating answers under incomplete context can be viewed as an implicit imputer, and evaluated against a criterion from the multiple imputation (MI) literature: uncertainty should scale with the amount of missing information. We assess this criterion on SQuAD, using a controlled framework in which context availability is varied across five levels. We evaluate two answer-level uncertainty measures that can be estimated from repeated sampling: sampling-based confidence (empirical mode frequency) and response entropy. Confidence fails to reflect increasing missingness: it remains high even as accuracy collapses. Entropy, by contrast, increases with context removal, consistent with the MI analogy, and explains substantially more variance in accuracy than confidence across all evidence levels (quadratic $R^2$ gap up to 0.057). We further introduce a black-box diagnostic $ρ_R(α)$ that estimates the proportion of baseline uncertainty resolved by context level $α$, requiring only repeated sampling with and without context. These results suggest that entropy is a more responsive black-box uncertainty measure than confidence under incomplete context.
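
The two answer-level measures named in the abstract can be sketched from repeated samples as follows. This is a minimal illustration, assuming answers are compared by exact string match and entropy is measured in nats; the sampled answers and function names are hypothetical, and the paper's answer normalization and its diagnostic $ρ_R(α)$ are not reproduced here.

```python
import math
from collections import Counter


def mode_confidence(answers):
    """Empirical mode frequency: share of samples equal to the most common answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def response_entropy(answers):
    """Shannon entropy (in nats) of the empirical answer distribution."""
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in Counter(answers).values())


# Hypothetical repeated samples for one question at two context levels.
full_context = ["Paris"] * 10
no_context = ["Paris"] * 6 + ["Lyon", "Nice", "Lille", "Rome"]

for name, answers in [("full", full_context), ("none", no_context)]:
    print(name, mode_confidence(answers), response_entropy(answers))
```

In this toy case the mode frequency only drops from 1.0 to 0.6 while the entropy moves from zero to well above one nat, which hints at why entropy can respond more sharply to scattered answers than the mode frequency does.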

2605.13187 2026-05-14 stat.ME stat.AP

Testing the Structural Properties of Marked Point Processes Using Local Inhomogeneous Mark-Weighted K-Functions

Nicoletta D'Angelo, Giada Adelfio, Matthias Eckardt

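AI summary: This paper proposes a method based on chi-squared-type test statistics for testing different hypotheses about the local structure of an observed marked point pattern. Using a local inhomogeneous extension of the mark-weighted $K$-function, the method assesses interactions between marks and locations and thereby reveals local contributions to deviations from independence or homogeneity. It retains good detection power in settings with subtle mark structures or small samples, and applications to real environmental data on forests and earthquakes demonstrate its effectiveness in detecting spatially dependent mark structures.

Details
Comments
submitted for publication
English abstract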

This work proposes $χ^2$-type test statistics to assess different hypotheses on the local structure of an observed marked point pattern. The test statistics are based on the local inhomogeneous extension of the mark-weighted $K$-function to investigate local behaviour of the marked point pattern. The summary statistic captures interactions between marks and locations by assessing local contributions to global deviations from independence or homogeneity. The methodology proves to be effective in identifying both global and localised departures from the null hypotheses, even in scenarios with subtle mark structures or small sample sizes. Real-world environmental applications to forestry and earthquake data demonstrate the utility of the proposed framework for detecting spatially dependent marked structures in the patterns.

2605.13174 2026-05-14 stat.ML cs.LG stat.CO

Coupling-Informed Transport Maps for Bayesian Filtering in Nonlinear Dynamical Systems

Dengfei Zeng, Lijian Jiang, Shuyu Sun, Dunhui Xiao

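AI summary: This paper proposes a likelihood-free transport filtering method, based on the couplings between state and observation variables, for Bayesian filtering in nonlinear dynamical systems. By exploiting the block-triangular structure of the transport map, the analysis step of filtering is recast as minimizing the maximum mean discrepancy (MMD) between the true joint distribution and its transport-based approximation. To avoid the non-convexity of the MMD optimization, the authors introduce a training-free transport filter in which the transport map is computed analytically via gradient flows, which approximates non-Gaussian filtering posteriors well and avoids particle collapse. The method is extended to high-dimensional problems through domain localization and outperforms conventional filtering methods in numerical experiments.

Details
Comments
29 pages, 14 figures
English abstract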

A likelihood-free transport filtering method is proposed based on the couplings between state and observation variables. By exploiting a block-triangular structure in the transport map, the analysis step of filtering is reformulated as the minimization of the maximum mean discrepancy (MMD) between the true joint measure and its transport-based approximation. To circumvent the non-convexity in the MMD optimization, we introduce a training-free transport filter method via gradient flows, which leads to an analytic computation for the transport map that implies the steepest descent direction of the MMD. The proposed approach accurately approximates non-Gaussian filtering posteriors and avoids particle collapse. We provide a convergence analysis for the expectation of the MMD between the approximated posterior and the true posterior. Finally, we extend the method to high-dimensional problems through domain localization. Numerical examples demonstrate the superior performance of our approach over conventional filtering methods in nonlinear, non-Gaussian scenarios.
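
Since the maximum mean discrepancy is the objective at the center of this abstract, a small sketch of how it is estimated from two samples may help. The version below is the standard biased (V-statistic) estimate of squared MMD for one-dimensional samples; the Gaussian kernel, its bandwidth, and the example samples are illustrative assumptions, and nothing here reproduces the paper's transport map or gradient flow.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars; the bandwidth is an assumed value."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of squared MMD between two 1-D samples."""
    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Hypothetical particle sets: identical samples versus a shifted copy.
same = mmd_squared([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = mmd_squared([0.0, 1.0, 2.0], [5.0, 6.0, 7.0])
print(same, shifted)
```

The estimate is zero for identical samples and grows as the two particle sets move apart, which is the property a minimization over transport maps relies on.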

2605.13160 2026-05-14 stat.ML cs.LG

Kernel-based guarantees for nonlinear parametric models in Bayesian optimization

Rafael Oliveira

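AI summary: This paper studies theoretical guarantees for nonlinear parametric models in Bayesian optimization. Because analyses of such models under adaptive data collection lack theoretical support, it proposes a kernel-based framework. The method uses kernels over the parameter space to induce a reproducing kernel Hilbert space structure on the model class, yielding confidence bounds for nonlinear models trained with broad classes of regularized convex losses. These bounds in turn support convergence guarantees for nonlinear acquisition and surrogate models, providing a unified route to the theoretical analysis of Bayesian optimization and related adaptive optimization problems.

Details
English abstract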

Modern Bayesian optimization and adaptive sampling methods increasingly rely on nonlinear parametric models, yet theoretical guarantees for such models under adaptive data collection remain limited. Existing analyses largely focus on Gaussian processes, kernel machines, linear models, or linearized neural approximations, leaving a gap between theory and the nonlinear models used in practice. We develop a kernel-based framework for analyzing regularized nonlinear parametric models trained on adaptively collected data. Our approach uses kernels over the parameter space to induce reproducing kernel Hilbert space structures over the corresponding model class, yielding confidence bounds for models trained with broad classes of regularized convex losses. We show how these bounds can support convergence guarantees for nonlinear acquisition and surrogate models, including randomized regularized policies that select points by maximizing a trained random model. These results provide a unified route to analyzing nonlinear parametric models in Bayesian optimization and related adaptive optimization settings.

2605.13150 2026-05-14 stat.ML cs.LG

Generative Modeling of Approximately Periodic Time Series by a Posterior-Weighted Gaussian Process

Elias Reich, Saverio Messineo, Stefan Huber

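AI summary: This paper studies time series generation for discrete automated processes with approximately periodic behavior in industrial and cyber-physical systems. To address the shortcomings of traditional Gaussian process models on such data, the authors propose a generative model based on a posterior-weighted Gaussian process that introduces a new kernel to decouple periodic structure from inter-repetition variability. The method generates approximately periodic time series that vary smoothly while keeping the structure consistent across repetitions, offering a new approach for modeling and generation tasks in this area.

Details
English abstract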

Discrete automated processes in industrial and cyber-physical systems often exhibit a repetitive structure in which successive repetitions follow a common trajectory while differing in duration, amplitude, and fine-scale dynamics. Such *approximately periodic* behavior poses a challenge for Gaussian process (GP) modeling: strictly periodic models suppress inter-repetition variability, while non-periodic models fail to capture the strong structural regularities required for generation. In this work, we propose a stochastic generative model for approximately periodic time series. The model is based on a GP whose posterior is modulated by a novel kernel. Our approach decouples intra-repetition structure from inter-repetition variability through a two-stage construction, which yields a generative distribution with an identical mean function across repetitions while allowing smooth variation between repetitions. The modeling choices are supported by an implementation in which realistic synthetic trajectories are generated from toy datasets.

2605.13146 2026-05-14 stat.ML cs.CV cs.LG

On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods

David Iagaru, Nina M. Gottschling, Anders C. Hansen, Josselin Garnier

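AI summary: This paper studies "hallucinations" in inverse problems, that is, plausible-looking but incorrect details produced by AI models. The authors propose a theoretical framework showing that such hallucinations stem not only from the model itself but can also arise from the ill-posed nature of the inverse problem, and they derive necessary and sufficient conditions for hallucinations along with computable bounds that depend only on the forward model. Building on this theory, the paper proposes two algorithms, one for estimating the minimum hallucination magnitude and one for assessing the faithfulness of reconstructed details. Experiments show that the approach applies to a range of imaging tasks and generative models, providing a principled basis for quantifying and evaluating AI hallucinations.

Details
Comments
31 pages, 11 figures; code available at https://github.com/davidiagraid/hallucinations_invpb
English abstract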

Artificial intelligence (AI) has transformed imaging inverse problems, from medical diagnostics to Earth observation. Yet deep neural networks can produce hallucinations, realistic-looking but incorrect details, undermining their reliability, especially when ground truth data is unavailable. We develop a theoretical framework showing that such hallucinations are not merely artifacts of particular models, but can arise from the ill-posed nature of the inverse problem itself. We derive necessary and sufficient conditions for hallucinations, together with computable bounds on their magnitude that depend only on the forward model. Building on this theory, we introduce algorithms to: (1) estimate the minimum hallucination magnitude achievable by any reconstruction model for a given input; (2) assess the faithfulness of reconstructed details by a given reconstruction model. Experiments across three imaging tasks demonstrate that our approach applies broadly, including to modern generative models, and provides a principled way to quantify and evaluate AI hallucinations.

2605.13128 2026-05-14 stat.ML cs.LG stat.CO

Amortized Neural Clustering of Time Series based on Statistical Features

Ángel López-Oriona, Ying Sun

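AI summary: This paper proposes a feature-based time series clustering method that does not rely on conventional clustering algorithms (such as K-means, K-medoids, or hierarchical clustering) and instead learns the optimal clustering rule through amortized neural inference. Using statistical features such as autocorrelations and quantile autocorrelations, the method learns an affinity structure from data without requiring cluster shapes or the number of clusters to be specified in advance, and it can determine the number of clusters automatically. Experiments show that the framework matches or exceeds traditional methods in a range of scenarios and demonstrates practical value in the analysis of financial time series.

Details
English abstract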

This paper introduces an algorithm-agnostic approach to feature-based time series clustering via amortized neural inference. By training neural networks to approximate the optimal partitioning rule from simulated data, the proposed framework reduces reliance on conventional clustering methods, such as $K$-means, $K$-medoids, or hierarchical clustering, and their associated objective functions and heuristics. Leveraging statistical features, such as autocorrelations and quantile autocorrelations, the approach learns a data-driven affinity structure from which clustering partitions can be recovered, without requiring explicit prior specification of cluster shapes or structures. In addition, one version of the method can automatically determine the number of clusters, avoiding ad-hoc selection procedures. Comprehensive empirical studies show that the proposed framework achieves competitive or superior clustering accuracy relative to traditional methods, even in challenging scenarios where competing techniques are provided with the true number of clusters. An application to financial time series of stock returns illustrates its practical utility. By reducing the need for algorithm selection and calibration, the proposed framework opens new possibilities for automated, adaptive, and data-driven clustering of temporal data across scientific and industrial domains.
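
To make the "statistical features" concrete, here is a minimal sketch of the autocorrelation part of such a feature vector. It uses the standard sample autocorrelation; the function names, the choice of lags, and the toy series are assumptions for illustration, and neither the quantile autocorrelations nor the neural network that consumes the features is shown.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return numer / denom


def acf_features(series, max_lag):
    """Feature vector of sample autocorrelations at lags 1..max_lag."""
    return [autocorrelation(series, lag) for lag in range(1, max_lag + 1)]


# Hypothetical toy series that flips sign at every step.
alternating = [1.0, -1.0] * 10
features = acf_features(alternating, max_lag=2)
print(features)
```

Each series is reduced to a fixed-length vector regardless of its length, which is what lets series of different lengths be compared by a single downstream model.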

2605.13127 2026-05-14 stat.ML cs.LG math.PR

State-of-art minibatches via novel DPP kernels: discretization, wavelets, and rough objectives

Hoang-Son Tran, Pranav Gupta, Rémi Bardenet, Subhroshekhar Ghosh

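AI summary: This paper studies how to use new determinantal point process (DPP) kernels to generate more efficient minibatches and coresets and thereby improve the efficiency of machine learning on large-scale datasets. The authors propose wavelet-based DPPs on Euclidean space with better accuracy guarantees than existing methods, and they develop a general method for converting continuous DPPs into discrete kernels, which enables efficient sampling while preserving the variance decay. The approach extends the applicability of DPPs to tasks with irregular objective functions and provides theoretical guarantees that adapt to the regularity of the task.

Details
Comments
52 pages
English abstract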

Determinantal point processes (DPPs) have emerged as a kernelized alternative to vanilla independent sampling for generating efficient minibatches, coresets and other parsimonious representations of large-scale datasets. While theoretical foundations and promising empirical performance have been demonstrated, there are two challenges for current proposals for DPP-based coresets or minibatches. The first is the need for families of DPPs with certain key variance reduction properties, usually constructed in a continuous setting, of which there are few known examples. The second is the need for an ad-hoc construction of a discrete DPP defined on a given dataset, that inherits such variance reduction. In this work, we contribute to the programme of establishing DPPs as a subsampling toolbox for ML by advancing on these two fronts. First, we propose new DPPs on the Euclidean space based on wavelets, with provably better accuracy guarantees than the best known rates. Second, we introduce a general method to convert such continuous DPPs, which are more amenable to proving analytical statements, into discrete kernels, which are pertinent for subsampling tasks such as minibatch and coreset constructions. This conversion mechanism simultaneously preserves the desired variance decay and reveals a low-rank decomposition of the discrete kernel, which makes sampling the corresponding DPP computationally inexpensive. En route, we enlarge the class of ML tasks amenable to improvements via DPP-based minibatches and coresets to include objective functions with arbitrarily low regularity, and rate guarantees that explicitly adapt to this regularity.

2605.13092 2026-05-14 stat.ML cs.LG stat.ME

Adaptive Kernel Density Estimation with Pre-training

Ruitong Zhang, Ke Deng

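AI summary: This paper studies density estimation in high-dimensional spaces, where traditional kernel density estimation is inefficient because suitable location-adaptive kernels are hard to specify. The authors bring in the idea of pre-training: a pre-trained neural network recommends an appropriate adaptive kernel for each sample point, giving efficient density estimation in high dimensions. Experiments show that the method markedly improves estimation accuracy when the target distribution is close to the pre-training distribution; even when the two differ substantially, fine-tuning restores the benefit, which illustrates the method's flexibility and effectiveness.

Details
Comments
8 pages main text, 14 pages total including references and appendix, 3 figures
English abstract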

Density estimation in high-dimensional settings is an important and challenging statistical problem. Traditional methods based on kernel smoothing are inefficient in high dimensions due to the difficulties in specifying appropriate location-adaptive kernels. In this work, we introduce pre-training, a key idea behind many cutting-edge AI technologies, to the context of non-parametric density estimation. By establishing a pre-trained neural network that can recommend an appropriate location-adaptive kernel for each sample point, efficient density estimation with adaptive kernels is achieved in high dimensions. A wide range of numerical experiments show that this strategy is highly effective for improving density-estimation accuracy when the target distribution is close to the distribution family for pre-training. When the target distribution is substantially different from the pre-training distribution family, the benefit from the proposed pre-training strategy may be diluted, but can be reactivated by an additional fine-tuning procedure.

2605.13004 2026-05-14 math.PR math.ST stat.TH

Orientation in Poisson Cluster Processes via Imaginary Bispectra

Conor Kresin, Yifu Tang, Boris Baeumer, Ting Wang

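AI summary: This paper studies what information remains detectable in one-sided Poisson cluster processes after cluster orientation is erased. By constructing reversible cluster null models that preserve the intensity and the Bartlett spectrum, the authors show that second-order structure alone need not identify temporal direction. For stationary Poisson branching clusters, they derive the Fourier-Stieltjes transform of the reduced third cumulant and prove that, in the $L^1$ third-cumulant regime, a nonzero imaginary factorial bispectrum certifies orientation. They also give orientation-erased null models and finite-window third-order orientation contrasts.

Details
English abstract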

We study what remains detectable about one-sided Poisson cluster processes after cluster orientation is erased. We construct matched reversible cluster nulls preserving intensity and the full Bartlett spectrum, showing that second-order structure alone need not identify temporal direction. For stationary Poisson branching clusters, we derive the Fourier--Stieltjes transform of the reduced third cumulant and show that, in the $L^1$ third-cumulant regime, a nonzero imaginary factorial bispectrum certifies orientation. We also give explicit orientation-erased nulls, reversible spectral matches for monotone Hawkes kernels, and finite-window third-order orientation contrasts.

2605.12977 2026-05-14 stat.AP q-fin.MF q-fin.RM q-fin.ST stat.ML

Enhancing a Risk Model by Adding Transient Statistical Factors

Alexandros E. Tzikas, Emmanuel J. Candès, Trevor Hastie, Stephen P. Boyd, Mykel J. Kochenderfer, Ronald N. Kahn

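AI summary: This paper studies how to enhance an existing risk model by introducing transient statistical factors, so as to estimate the covariance of asset returns more accurately. The authors propose a systematic method based on maximum likelihood estimation that refines the existing factor model and adds new statistical factors, relying only on the observed return series and two hyperparameters. The method handles missing returns, which makes it suitable for typical equity datasets, and is demonstrated on the Barra short-term US risk model, where it captures structure in the returns that the original model misses.

Details
English abstract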

Estimating the covariance of asset returns, i.e., the risk model, is a key component of financial portfolio construction and evaluation. Most risk modeling approaches produce a factor model that decomposes the asset variability into two components: the first attributed to a small number of factors that are common among the assets and the second attributed to the idiosyncratic behavior of each asset. Third-party providers typically provide risk models to investors, and while these models are typically of high quality, they may fail to capture important information, e.g., changing market regimes and transient factors. To overcome these limitations, we propose a systematic method based on maximum likelihood estimation to enhance an existing factor model by both refining the given model and adding new statistical factors. Our approach relies only on the observed sequence of realized returns and on the choice of two hyperparameters: the number of additional factors and the half-life parameter that determines the weights assigned to returns in the log-likelihood objective. Importantly, our methodology applies to the situation where asset returns may be missing, making it suitable for typical equity datasets. We demonstrate our approach on the Barra short-term US risk model, a high-quality risk model used in practice, for a universe of US high-capitalization equities. We show that the proposed extension captures structure in the returns that is missed by the original model.

2605.12947 2026-05-14 stat.ML cs.AI cs.LG stat.ME

When Should an AI Workflow Release? Always-Valid Inference for Black-Box Generate-Verify Systems

Young Hyun Cho, Will Wei Sun

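AI summary: As LLM-based AI workflows increasingly adopt iterative generate-evaluate-revise loops, deciding when to stop iterating and output a result becomes a key question. This paper proposes an always-valid release wrapper for existing generator-evaluator systems: it builds a reference pool of high-scoring failures and accumulates evidence with an e-process, giving statistical guarantees under optional stopping. The method controls the probability of releasing on infeasible tasks while still releasing on feasible ones; theoretical analysis and experiments both support its effectiveness.

Details
English abstract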

LLM-enabled AI workflows increasingly produce outputs through iterative generate-evaluate-revise loops. Each iteration can improve the candidate, but it also creates a release decision: when to stop and output the current result? This raises a statistical challenge because deployment-time evaluator scores are adaptively generated and repeatedly monitored, yet the likelihood models or exchangeability assumptions typically used for calibration are unavailable. We propose an always-valid release wrapper for existing generator-evaluator pipelines. The wrapper builds a hard-negative reference pool of high-scoring failures, calibrates deployment-time evaluator scores against this pool, and accumulates the resulting evidence with an e-process. This separates two roles: the reference pool turns black-box scores into conservative evidence, while the e-process provides validity under optional stopping. In theory, we show that a conservative reference pool yields finite-sample control of the probability of releasing on infeasible tasks, that is, tasks for which the given workflow is not capable of producing a reliable solution. We also characterize conditions under which the same conservative rule still achieves nontrivial release on feasible tasks. In an MBPP+ coding-agent case study, the wrapper reduces premature incorrect release relative to baseline stopping rules while still releasing on tasks for which the workflow repeatedly accumulates moderate supporting evidence.

2605.12908 2026-05-14 stat.ML cs.LG

The Mechanism of Weak-to-Strong Generalization: Feature Elicitation from Latent Knowledge

Ryoya Awano, Taiji Suzuki

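AI summary: This paper studies the mechanism of weak-to-strong (W2S) generalization, in which a strong model is fine-tuned on the outputs of a weak model so that it learns a new task while keeping its existing capabilities. In a reward-model learning setting with two-layer neural networks, the authors prove that the strong model efficiently learns the task feature and retains its pre-trained general capabilities without catastrophic forgetting. The work gives theoretical support for W2S generalization and shows that it holds in the feature-learning regime.

Details
Comments
48 pages, 1 figure
English abstract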

Weak-to-strong (W2S) generalization, in which a strong model is fine-tuned on outputs of a weaker, task-specialized model, has been proposed as an approach to aligning superhuman AI systems. Existing theoretical analyses either fix the student's representations or operate in restricted settings. Whether multi-step SGD can succeed in feature learning while preserving diverse pre-trained capabilities remains open. We study W2S in the setting of reward-model learning with two-layer neural networks. The strong model has pre-trained representations organized into low-dimensional subspaces $V_k$, and is fine-tuned under the supervision of a weak model specialized on task $κ$. We prove that the strong model efficiently learns task $κ$, eliciting its pre-trained knowledge while retaining general capabilities. This establishes W2S generalization in the feature-learning regime, in the sense that the strong model acquires the target feature direction through W2S training, rather than having it given a priori. Moreover, W2S preserves pre-trained off-target features, whereas standard supervised fine-tuning causes catastrophic forgetting when off-target feature directions are correlated with the target's. Numerical experiments on synthetic data confirm our theoretical results.

2605.12901 2026-05-14 stat.ME stat.AP stat.CO

A Bayesian Adaptive Latent Mixture Model for Zero-Inflated Weighted Brain Connectome Analysis

Hsin-Hsiung Huang, Yuh-Haur Chen, Teng Zhang

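AI summary: Weighted brain connectivity networks commonly contain many zeros alongside heterogeneous non-zero edge strengths. To address this, the study proposes a Bayesian adaptive latent mixture model for zero-inflated weighted brain connectome data. The model represents each subject's network as a simplex mixture of shared low-rank latent score matrices, combined with a hurdle likelihood that separates edge existence from edge strength, and so captures the overlapping nature of connectivity patterns more accurately. The theoretical analysis establishes posterior consistency and predictive consistency, and simulations and real data show gains over topology-only baselines.

Details
English abstract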

Replicated weighted networks often exhibit many structural zeros alongside heterogeneous non-zero edge strengths. In structural connectomics, this zero-inflation coincides with subjects expressing overlapping, rather than discrete, connectivity patterns. To address these features, we propose a Bayesian adaptive latent mixture model for zero-inflated weighted networks. Our approach represents each subject network as a simplex mixture of shared low-rank latent score matrices, integrated with a hurdle likelihood that separates edge existence from conditional edge strength. A sparsity-coupling parameter enables absent edges to be either independent of, or informative about, the latent connectivity. For computation, we employ transformed Hamiltonian Monte Carlo on unconstrained coordinates, selecting the number of templates via predictive fit, held-out link prediction, and template stability. Theoretically, we establish posterior consistency, local asymptotic normality, a Bernstein--von Mises approximation, and predictive consistency for an identifiable quotient-space estimand under a fixed-template scenario. Simulations demonstrate performance gains over topology-only baselines in settings with mixed memberships or structure-informed sparsity. Applied to Human Connectome Project data, the model recovers stable latent score patterns and heterogeneous subject-level mixtures, with behavioural analyses serving strictly as exploratory annotations rather than confirmatory biomarker claims.

2605.12899 2026-05-14 stat.ML cs.LG

Robust Sequential Experimental Design for A/B Testing

Qianglin Wen, Xiangkun Wu, Chengchun Shi, Ting Li, Niansheng Tang, Yingying Zhang, Hongtu Zhu

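AI summary: This paper studies robust sequential experimental design for A/B testing under model misspecification and proposes a unified framework that covers both contextual bandit and dynamic settings. Theoretically, the design bounds the worst-case mean squared error of the estimated treatment effect; empirically, the method is validated on synthetic data and on real-world data from a technology company.

Details
English abstract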

Experimental design has emerged as a powerful approach for improving the sample efficiency of A/B testing, yet existing designs rely critically on correctly specified models. We study robust sequential experimental design under model misspecification and develop a unified framework that covers both contextual bandit and dynamic settings. Theoretically, we prove that our design bounds the worst-case mean squared error of the estimated treatment effect. Empirically, we demonstrate the effectiveness of the proposed approach using synthetic and real-world datasets from a leading technology company.

2605.12890 2026-05-14 stat.AP cs.LG

Steer-to-Detect: Probing Hidden Representations for Detection of LLM-Generated Texts

Luxu Liang, Xiang Li

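AI summary: With the rapid development of large language models (LLMs), distinguishing machine-generated text from human-written text is becoming increasingly difficult. To address this, the paper proposes Steer-to-Detect (S2D), a two-stage detection framework: a steering vector is injected into the hidden states of a frozen observer model to improve class separability, and detection is then performed by hypothesis testing on the steered representations. The method comes with rigorous error guarantees and performs well across a range of settings, including out-of-distribution scenarios and adversarial perturbations.

Details
English abstract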

The rapid advancement of large language models (LLMs) has made machine-generated text increasingly difficult to distinguish from human-written text. While recent studies explore leveraging internal representations of language models to uncover deeper detection signals, these raw features often exhibit substantial overlap between classes, limiting their discriminative power. To address this challenge, we propose Steer-to-Detect (\texttt{S2D}), a two-stage framework for detecting LLM-generated text. In the first stage, \texttt{S2D} learns a steering vector that is injected into the hidden states of a frozen observer LLM, producing representations with improved class separability. In the second stage, detection is performed via a hypothesis testing procedure based on the steered representations. We establish finite-sample, high-probability guarantees for Type I and Type II errors, providing a theoretical characterization of the procedure. Empirically, \texttt{S2D} achieves strong and consistent performance across a range of settings, including out-of-distribution scenarios and adversarial perturbations.

2605.12881 2026-05-14 stat.ME

Change-point detection in variance-covariance matrix

Ying Lin, Benjamin Poignard

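AI summary: This paper studies the joint estimation of change-point locations and the sparsity pattern of the covariance matrix when the variance-covariance matrix changes in a piecewise-constant manner. The proposed estimator applies Group Fused LASSO and LASSO penalties to the squared Frobenius norm, with adaptive weights to improve estimation accuracy. The authors establish conditions under which the estimators are consistent and develop an efficient ADMM-based optimization algorithm; experiments show good performance on synthetic and real data.

Details
English abstract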

We consider the joint estimation of change point locations and the sparsity pattern of the variance covariance matrix, which is assumed to evolve in a piecewise constant manner. By applying Group Fused LASSO and LASSO penalties to the squared Frobenius norm, we estimate both the covariance structure and the change points. Adaptive weights are incorporated into the penalty terms to enhance change point detection and covariance estimation accuracy. We establish the conditions under which the estimated change points and the sparse estimators within each segment are consistent. To solve the resulting optimization problem efficiently, we develop an alternating direction method of multipliers (ADMM) whose updates reduce to computationally tractable subproblems. The performance of the proposed method is illustrated through synthetic and real data experiments, including comparisons with several competing procedures.

2605.12847 2026-05-14 stat.ME

Never Too LATE: A Fully Stochastic Update to the Potential Outcome Framework

Hanti Lin

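AI summary: This paper proposes a fully stochastic update to the classic potential outcome framework, aimed at the identification of the local average treatment effect (LATE) in a stochastic causal setting. The classic LATE assumes that each individual's potential outcomes are determinate; the author instead introduces stochastic potential outcomes, in which an individual's outcome under each treatment has a probability distribution, and on that basis defines a Degree-of-compliance-weighted Average Treatment Effect (DATE). Under assumptions analogous to those for the LATE but adapted to the stochastic setting, the DATE is shown to equal the usual instrumental variable estimand, giving a more flexible and more realistic theoretical framework for causal inference.

Details
English abstract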

In the classic potential outcome framework, the local average treatment effect (LATE) and its identification via an instrumental variable are stated in a deterministic setting at the individual level: each individual has settled potential outcomes such as ``cured if treated''. Several authors have proposed working instead with \emph{stochastic} potential outcomes -- counterfactual probabilities of the form ``the chance of being cured if treated'' -- but the integration of stochastic potential outcomes with the LATE machinery raises an issue. It is a metaphysical issue: in a stochastic setting, the standard joint-probability definitions of compliers and the LATE assume what I will call the \emph{unique-parallel-universe view}, which asserts that, in any genuinely possible state of the world, every counterfactual condition settles a unique determinate outcome even when the underlying causal disposition is irreducibly chancy. The statistician Dawid (2000) doubts the plausibility of this view; the philosopher Lewis (1973) develops a reductio argument against it. I propose a fully stochastic update to the Rubin causal model that drops the assumption of the unique-parallel-universe view: stochastic potential outcomes are introduced as Bernoulli parameters in their own (small) probability spaces, and are connected to observables via the factorization rule of a causal Bayes net. Within this framework, I define a Degree-of-compliance-weighted Average Treatment Effect (DATE) and prove that, under assumptions analogous to those used for the LATE but rewritten for the fully stochastic setting, the DATE equals the usual IV estimand. The classic LATE identification result emerges as a deterministic special case. Existing IV practice can therefore be reinterpreted: it has been estimating the DATE all along, in a general stochastic setting, without assuming the unique-parallel-universe view.
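
For reference, the "usual IV estimand" mentioned above is, for a binary instrument, the Wald ratio. The symbols $Z$ (instrument), $D$ (treatment taken), and $Y$ (outcome) are assumed here for illustration and are not taken from the abstract:

```latex
\[
\beta_{\mathrm{IV}}
  \;=\;
  \frac{\mathbb{E}[Y \mid Z = 1] - \mathbb{E}[Y \mid Z = 0]}
       {\mathbb{E}[D \mid Z = 1] - \mathbb{E}[D \mid Z = 0]}
\]
```

In the classic deterministic setting this ratio identifies the LATE under the standard assumptions; the abstract's claim is that the same ratio equals the DATE in the fully stochastic setting.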

2605.12844 2026-05-14 math.NA cs.NA stat.CO

Walk on spheres and Array-RQMC

Valerie N. P. Ho, Art B. Owen

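AI summary: This paper applies Array-RQMC sampling to the walk on spheres (WOS) algorithm for solving Dirichlet boundary value problems. Experiments show that the method substantially reduces Monte Carlo variance on several problems, with variance reduction factors from 57 to 2290. Compared with the previously studied RQMC-WOS algorithm, it achieves greater variance reduction on the same problems, and the authors introduce a column-wise mean dimension based on Sobol' indices to explain the improvement.

Details
English abstract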

We use Array-RQMC sampling in a walk on spheres (WOS) algorithm for Dirichlet boundary value problems. On a collection of problems, we find that Array-RQMC-WOS reduces the Monte Carlo variance by factors ranging from $57$-fold to $2290$-fold at $n=2^{17}$ trajectories. The variance is known to be $o(1/n)$ but attains empirical rates between $n^{-1.4}$ and $n^{-1.8}$ in our examples. A simpler RQMC-WOS algorithm studied in Ho and Owen (2026) has more theoretical support but only reduced variance by 1.8 to 10.7-fold on the same set of examples. In order to explain this improvement, we introduce a column-wise mean dimension of the RQMC error based on Sobol' indices. It matches the usual mean dimension for Monte Carlo and the mean dimension of a dual lattice error for randomized lattices. We find for a gasket example from Crane et al.\ (2025) that the mean dimension of Array-RQMC-WOS errors is much higher than an analogous Array-MC-WOS algorithm has.
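
For readers unfamiliar with walk on spheres, a minimal sketch of the plain Monte Carlo version may help. It is not the Array-RQMC variant studied in the paper: it uses independent pseudo-random walks, and the domain (the unit disk), the boundary data g(x, y) = x, the tolerance `eps`, and the function names are all illustrative assumptions.

```python
import math
import random


def walk_on_spheres(x, y, g, eps=1e-3, rng=random):
    """One WOS walk in the unit disk; returns g at the approximate exit point."""
    while True:
        r = 1.0 - math.hypot(x, y)  # distance from (x, y) to the unit circle
        if r < eps:
            # Close enough: project onto the boundary and read off g there.
            norm = math.hypot(x, y)
            return g(x / norm, y / norm)
        # Jump to a uniformly random point on the largest circle inside the disk.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(theta)
        y += r * math.sin(theta)


def wos_estimate(x, y, g, n_walks=20000, seed=0):
    """Average g over independent walks to estimate the harmonic function at (x, y)."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(x, y, g, rng=rng) for _ in range(n_walks))
    return total / n_walks


if __name__ == "__main__":
    # u(x, y) = x is harmonic, so the estimate at (0.3, 0.2) should be near 0.3.
    print(wos_estimate(0.3, 0.2, lambda bx, by: bx))
```

The variance of this estimator decays like $1/n$ in the number of walks; the abstract's point is that replacing the independent walks with Array-RQMC sampling gives a faster empirical rate.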

2605.12840 2026-05-14 stat.AP cs.LG

Decision Support for Marketplace Policies under Incomplete Evidence: From Replay to Launch Readiness

Prashant Shekhar, Caroline Howard

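AI summary: This paper studies decision support for pricing and allocation policies in real-time bidding (RTB) marketplaces when the evidence is incomplete. The authors propose a support-aware decision-support system (DSS) that integrates replay, off-policy evaluation, conservative lower-bound ranking, multi-sided guardrails and other methods into a claim-preserving evaluation pipeline whose output is a classification of whether a policy is ready to launch, rather than a single performance estimate. Experiments show that the system identifies a floor-price policy with potential for lift and indicates that, when key causal evidence is missing, online validation should be chosen over direct deployment, which avoids decision overclaim.

Details
English abstract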

Marketplace platforms routinely evaluate pricing and allocation policies using logged observational data, yet strong offline performance does not imply that a policy is safe to deploy. In real-time bidding (RTB) marketplaces, reserve-price and floor-policy changes affect not only revenue but also fill, advertiser value, budget pacing, and competition across auctions, creating feedback and interference. The central problem is therefore not to estimate whether a policy improves an offline metric, but to determine whether the available evidence justifies direct launch or only further validation. In this regard, we propose a support-aware decision-support system (DSS) that distinguishes promising from actionable evidence. The framework integrates replay, support-aware off-policy evaluation (OPE), conservative lower-bound ranking, multi-sided guardrails, out-of-time validation, sensitivity analysis, and interference-aware validation design into a claim-preserving pipeline that outputs a launch-readiness classification rather than a single performance estimate. Applying the framework to iPinYou-style RTB logs, we identify a margin-gated floor policy as the leading candidate, with a 47.7% replay yield lift, a 45.8% conservative lower-tail lift, and stable out-of-time performance. However, the framework does not recommend direct launch. A decision-rule ablation shows that simplified pipelines select the same policy but incorrectly recommend deployment, leaving key causal assumptions unresolved. In contrast, the proposed DSS selects the same policy but changes the action to online validation, reflecting missing evidence on propensities, bidder response, and interference. Overall, the contribution is a reproducible DSS protocol that prevents decision overclaim under partial identification and converts offline evaluation into an auditable, action-oriented recommendation.

2605.12832 2026-05-14 stat.AP cs.LG stat.ML

Digital Twins as Synthetic Controls in Single-Arm Trials

Daniele Bertolini, Franklin Fuller, Aaron M. Smith, Jonathan R. Walsh, Run Zhuang

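AI summary: This paper discusses using digital twins as synthetic controls in single-arm trials to evaluate drug efficacy and safety. It argues that outcome-model-based synthetic controls can overcome limitations of traditional data-based approaches and give more robust treatment-effect estimates. The focus is on digital twins, personalized predictions of disease progression generated by machine learning models, together with the statistical methods, sample size calculations, and compatibility with recent FDA guidance that matter in practice. Finally, the authors reanalyze trial data in amyotrophic lateral sclerosis and Huntington's disease to demonstrate the proposed methods.

Details
English abstract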

Single-arm trials are an important study design for evaluating drug efficacy and safety without enrolling patients into a control arm. Although they do not provide the gold-standard evidence of randomized controlled trials, they are increasingly used in clinical development as they offer an efficient, ethical, and practical alternative. A wide variety of approaches can be used to construct control comparators and estimate treatment effects, from fixed comparators informed by clinical knowledge to data-based and model-based patient-level comparators, also known as synthetic controls. Powerful and flexible machine learning models can allow outcome-model-based synthetic controls to overcome key limitations of direct data-based approaches, yield more robust estimates of treatment effects, and provide a principled way to incorporate corrections or encode additional assumptions when external data are not directly comparable. In this work, we argue that outcome-model-based synthetic control arms are an important tool for single-arm trials. We focus on digital twins, personalized predictions of disease progression generated from machine learning models trained on historical datasets, which naturally leverage these flexible approaches. We review doubly robust estimators, present power and sample size formulas, and discuss trade-offs in selecting historical data for training and analysis. We also outline practical considerations for deploying digital twins within the framework of recent FDA draft guidance on the use of artificial intelligence in drug development. Finally, we reanalyze data from trials in amyotrophic lateral sclerosis and Huntington's disease to demonstrate the proposed methods.

2605.12807 2026-05-14 stat.CO cs.IT math.IT

Multi-Marginal Couplings for Metropolis-Hastings

Buu Phan, Gergely Flamich, Ashish Khisti, Shahab Asoodeh

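AI summary: This paper studies convergence diagnosis for Metropolis-Hastings algorithms based on multi-marginal coupling. It proposes a new method for coupling multiple chains, introducing a natural objective and establishing lower and upper bounds through connections to list-level distribution coupling and distributed pairwise-matching problems. The method improves the runtime complexity of classical Poisson Monte Carlo by avoiding a dimension-dependent bottleneck, with significant gains in high-dimensional settings. Experiments show that it outperforms existing methods in coupling efficiency and shortens meeting times.

Details
English abstract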

Convergence diagnosis for Markov chain Monte Carlo is a matter of fundamental importance in computational statistics: it determines the resources allocated to a particular sampling problem and influences the practitioner's view of the quality of estimates obtained from a Markov chain. Motivated by this, we contribute to the emerging class of coupling-based convergence diagnostic algorithms. Concretely, we study coupling multiple Metropolis-Hastings chains using multi-marginal coupling. We introduce a natural objective for this setting and establish lower and upper bounds by drawing connections to list-level distribution coupling and distributed pairwise-matching problems. This analysis ultimately leads to a shared-randomness Poisson Monte Carlo construction for coupling multiple Markov chains. In this process, we avoid a key dimension-dependent bottleneck in the runtime complexity of classical Poisson Monte Carlo by developing an adaptive rule for updating the point process, yielding significant gains in high-dimensional settings. Experiments on grand couplings of Markov chains show that our methods improve coalescence rates across dimensions, reducing meeting times by up to 50% compared with existing baselines.

2605.12797 2026-05-14 stat.ME stat.AP

Evaluating the impact of outcome delay on the efficiency of sample size re-estimation

Aritra Mukherjee, Michael J Grayling, James J M S Wason

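AI summary: This paper studies how outcome delay affects the efficiency of internal-pilot sample size re-estimation (SSR) designs. It analyzes the distribution of the final sample size under different delay lengths for continuous and binary outcome data, examines how delay affects the precision of the sample size estimate, and evaluates this using the root mean squared error (RMSE) together with two further metrics, the delay impact and cost. As delay increases, the average sample size and power rise, but the size of the effect depends on the specific trial setting; delay matters most when the re-estimated sample size is smaller than originally planned.

Details
English abstract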

Sample size reestimation can be a powerful tool to ensure that a clinical trial meets its prespecified power requirements when uncertainty regarding a design parameter exists at the planning stage. However, long-term primary endpoints can be harmful to the efficiency of this trial design. If recruitment is continued while treatment outcomes are awaited, a long delay can lead to a large number of pipeline participants being recruited who do not contribute to the interim analysis. This may lead to a larger number of recruited participants than are actually deemed required, resulting in an overpowered trial with high cost. This paper studies the exact impact of such outcome delay on the efficiency of internal pilot type SSR designs. The distribution of the final sample size post SSR is obtained under various delay lengths for both continuous and binary outcome data, and how delay impacts the precision of the final sample size estimate is then discussed. Specifically, the impact of delay on this precision is assessed through RMSE, as well as two more novel metrics, termed the delay impact and cost. The results indicate that as delay length increases, the delay impact increases, inflating average sample size and power. However, the severity of the effect of delayed outcomes depends highly on the exact trial setting. Trials where the reestimated sample size is smaller than originally planned suffer the most from delayed outcomes, often leading to an overpowered trial. However, the impact of delay is substantially less if the originally planned sample size remains smaller than the reestimated sample size.

2605.12780 2026-05-14 stat.ME cs.LG stat.ML

When to Trust Confidence Thresholding: Calibration Diagnostics for Pseudo-Labelled Regression

Marcell T. Kurbucz

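AI summary: This paper studies how the choice of confidence threshold affects estimates when calibrated classifier outputs are used as pseudo-labels in regression analysis. The author proposes a calibration-based diagnostic, derives a closed-form expression for the attenuation bias induced by confidence thresholding, and shows that this bias can be predicted from the residual score variance $V^{*}$ on the unlabelled set. The paper also gives a sensitivity bound under bounded calibration drift and proposes a decision rule based on $V^{*}$ and $κ$ that helps practitioners judge whether confidence thresholding is safe for pseudo-labelling.

Details
Comments
24 pages, 6 figures, 6 tables
English abstract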

Calibrated probability outputs of trained classifiers are increasingly used as inputs to downstream regression estimands such as effects, prevalences, or disparities for a latent group observed only on a small labelled subset. A standard practice is to threshold the calibrated score at a confidence cutoff and treat the hard label as the truth. Building on a recent identification result for the underlying moment equation, we develop a calibration-aware diagnostic apparatus for pseudo-labelling pipelines. We derive a closed-form expression for the attenuation bias that confidence thresholding induces in the downstream regression coefficient, and show that the bias can be predicted, before any inference is run, from the residual score variance $V^{*}=\mathbb{E}[\operatorname{Var}(p\mid X)]$ on the unlabelled set after partialling out the downstream controls $X$. We further obtain a sharp sensitivity bound under bounded calibration drift, and identify the boundary $V^{*}=0$, which holds iff $p$ is a deterministic function of $X$; this motivates a structural separation between classifier features $W$ and downstream controls $X\subsetneq W$. Five controlled simulations and a UCI Adult illustration trace the predictions. The contribution is operational: a $(V^{*}, κ)$ decision rule that practitioners can compute from any classifier output to decide whether confidence thresholding is safe.
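
To make the quantity $V^{*}=\mathbb{E}[\operatorname{Var}(p\mid X)]$ concrete, here is a minimal plug-in sketch for the special case where the controls $X$ take finitely many values. The function name and the grouping approach are assumptions made for illustration; the paper's own estimator, which partials out $X$, may differ.

```python
from collections import defaultdict


def residual_score_variance(p, x):
    """Plug-in estimate of V* = E[Var(p | X)] when X takes finitely many values."""
    groups = defaultdict(list)
    for score, key in zip(p, x):
        groups[key].append(score)
    n = len(p)
    v_star = 0.0
    for scores in groups.values():
        mean = sum(scores) / len(scores)
        var = sum((s - mean) ** 2 for s in scores) / len(scores)  # within-group variance
        v_star += (len(scores) / n) * var  # weight by the empirical P(X = key)
    return v_star


if __name__ == "__main__":
    x = ["a", "a", "b", "b"]
    print(residual_score_variance([0.2, 0.2, 0.9, 0.9], x))  # p is a function of X
    print(residual_score_variance([0.1, 0.3, 0.8, 1.0], x))  # p varies within X
```

The first call illustrates the boundary case in the abstract: when $p$ is a deterministic function of $X$, every within-group variance is zero and so is the estimate.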

2605.12760 2026-05-14 stat.ME stat.AP

How long should a block be?

Léo R. Belzile, Anthony C. Davison

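AI summary: This paper studies the choice of block length in the block maximum method of extreme value analysis and notes that the block length strongly affects how well the method works. The authors assess the effect of excessively long blocks through asymptotic relative efficiency and propose likelihood-based approaches and graphical diagnostics for judging whether a block length is suitable, allowing for possible rounding and left-censoring of the data. The ideas are checked by simulation and illustrated on wind speed, river flow and rainfall data.

Details
Comments
18 pages, plus supplementary material
English abstract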

The block maximum method, which is widely used in extreme value analysis, uses a generalized extreme value distribution to approximate that of the maximum of m observations. The quality of this approximation depends on the value of m and may be poor if m is too small. Surprisingly little attention has been paid to the choice of the block length, although a good choice is crucial to the success of the method. In this paper we assess the effect of taking excessively long blocks in terms of asymptotic relative efficiency, and propose likelihood-based approaches and graphical diagnostics to determine whether a proposed block length is suitable, allowing for potential rounding and left-censoring of observations. We investigate our ideas using simulation and illustrate them using wind speed, river flow and rainfall data.
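
The first step of the block maximum method, before any generalized extreme value fit, can be sketched as follows. The function name is hypothetical, and dropping an incomplete final block is one possible convention, not something the abstract specifies.

```python
def block_maxima(series, m):
    """Split series into consecutive blocks of length m and return each block's maximum."""
    if m <= 0:
        raise ValueError("block length m must be positive")
    n_blocks = len(series) // m  # an incomplete final block is dropped (a choice)
    return [max(series[i * m:(i + 1) * m]) for i in range(n_blocks)]


if __name__ == "__main__":
    daily = [3.1, 4.7, 2.2, 5.9, 1.0, 6.3, 4.4, 0.8]
    for m in (2, 4):
        print(m, block_maxima(daily, m))
```

The trade-off the paper addresses is visible even here: a larger `m` leaves fewer maxima to fit, while a smaller `m` makes the extreme value approximation less trustworthy.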

2605.12756 2026-05-14 math.OC cs.AI stat.ML

Uncovering Symmetry Transfer in Large Language Models via Layer-Peeled Optimization

Zhehang Du, Hangfeng He, Weijie Su

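AI summary: This paper studies whether pretraining large language models by minimizing the cross-entropy loss induces geometric structure in the model weights and context embeddings. By analyzing a constrained layer-peeled optimization model, the authors prove that symmetries in the target next-token distributions are transferred to the model's optimal solutions in a group-theoretic sense. For example, when the target tokens have a cyclic-shift symmetry, the optimal logit matrix is circulant, and the Gram matrices of the output projections and context embeddings are circulant as well; for target distributions invariant under the symmetric group, the optimal output projection matrix forms a simplex equiangular tight frame, and the optimal logit matrix and context embeddings inherit the permutation symmetries present in the input data. Experiments show that open-source LLMs naturally exhibit symmetries consistent with the theoretical predictions, even though no explicit regularization promoting such structure is used in training.

Details
English abstract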

Large language models (LLMs) are pretrained by minimizing the cross-entropy loss for next-token prediction. In this paper, we study whether this optimization strategy can induce geometric structure in the learned model weights and context embeddings. We approach this problem by analyzing a constrained layer-peeled optimization program, which serves as a mathematically tractable surrogate for LLMs by treating the output projection matrix and last-layer context embeddings as optimization variables. Our analysis of this nonconvex optimization program demonstrates that symmetries in the target next-token distributions are transferred to the global minimizers of the layer-peeled model in a precise group-theoretic sense. Specifically, we prove that when the target tokens exhibit a cyclic-shift symmetry (such as the seven days of the week or the twelve months of the year), the optimal logit matrix is exactly circulant, and the Gram matrices of both the output projections and the context embeddings form circulant geometries as well. Next, for exchangeable target distributions invariant under the symmetric group and, more generally, under two-transitive group actions, we show that the global optimal output projection matrix forms a simplex equiangular tight frame, while the optimal logit matrix and context embeddings inherit the permutation symmetries present in the input data. A key technical step is to reduce the constrained nonconvex factorized problem to an explicit logit-level convex characterization for cyclic symmetry and to a symmetry-based lower bound for permutation symmetry, together with a sharp characterization of the optimal factorization. Finally, we empirically demonstrate that open-source LLMs naturally exhibit symmetries consistent with our theoretical predictions, despite being trained without any explicit regularization promoting such geometric structure.

2605.12733 2026-05-14 cs.LG cs.AI stat.ML

From Generalist to Specialist Representation

Yujia Zheng, Fan Feng, Yuke Li, Shaoan Xie, Kevin Murphy, Kun Zhang

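AI summary: This paper studies learning task-relevant specialist representations from a generalist model, centering on proofs that the task structure and the task-relevant latent representations are identifiable in a nonparametric setting. Without interventions, parametric forms, or structural constraints, the authors prove that the task structure is identifiable in a fully unsupervised manner even when sequences lack strict temporal dependence or contain disconnections, and that within each time step the task-relevant part can be separated from the irrelevant part by a simple sparsity regularization. These results lay a theoretical foundation for provably moving from generalist to specialist models.

Details
Comments
ICML 2026
English abstract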

Given a generalist model, learning a task-relevant specialist representation is fundamental for downstream applications. Identifiability, the asymptotic guarantee of recovering the ground-truth representation, is critical because it sets the ultimate limit of any model, even with infinite data and computation. We study this problem in a completely nonparametric setting, without relying on interventions, parametric forms, or structural constraints. We first prove that the structure between time steps and tasks is identifiable in a fully unsupervised manner, even when sequences lack strict temporal dependence and may exhibit disconnections, and task assignments can follow arbitrarily complex and interleaving structures. We then prove that, within each time step, the task-relevant latent representation can be disentangled from the irrelevant part under a simple sparsity regularization, without any additional information or parametric constraints. Together, these results establish a hierarchical foundation: task structure is identifiable across time steps, and task-relevant latent representations are identifiable within each step. To our knowledge, each result provides a first general nonparametric identifiability guarantee, and together they mark a step toward provably moving from generalist to specialist models.

2605.12720 2026-05-14 math.ST math.PR stat.ML stat.TH

Optimal sequential tests yield log-optimal e-processes

Ashwin Ram, Aaditya Ramdas

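AI summary This paper studies the relationship between sequential tests and log-optimal e-processes, proving that asymptotically log-optimal e-processes can be constructed by aggregating asymptotically optimal sequential tests, which completes the existing theory. The key tool is a new class of e-processes formed as weighted aggregates of indicators of stopping times, which grow at the optimal rate under the alternative. The work also clarifies subtle differences between the various definitions of asymptotic optimality, giving sequential statistical inference a firmer theoretical footing.

Details
Comments
Preprint
English abstract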

It has been recently shown that e-processes are sufficient for sequential testing in the following sense: every level-$α$ sequential test can be obtained by thresholding an e-process at $1/α$. However, in the above result, neither does the test have to be asymptotically optimal (in terms of stopping times) nor does the e-process have to be asymptotically log-optimal. It has separately been shown that asymptotically log-optimal e-processes yield asymptotically optimal sequential tests. In this paper, we prove the converse, arguably completing the story: it is possible to aggregate asymptotically optimal sequential tests into asymptotically log-optimal e-processes. This is accomplished by using a new class of WAIT e-processes: those that are Weighted Aggregates of Indicators of stopping Times that begin at zero, are nondecreasing and increase to infinity under the alternative at the optimal rate. Importantly, the paper discusses several nuances in the varied definitions of asymptotic (log-)optimality.
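
The aggregation idea can be sketched in a few lines of Python. This is only one natural reading of the WAIT acronym, not the paper's construction: the function names, the choice of weights, and the input format (one rejection time per test on a single data stream) are assumptions made here for illustration. If test $k$ has level $α_k$, its rejection indicator divided by $α_k$ has expectation at most 1 under the null at any stopping time, so a convex combination of such terms starts at zero, is nondecreasing, and can be thresholded at $1/α$.

```python
def wait_e_process(stop_times, alphas, weights, horizon):
    """Weighted aggregate of stopping-time indicators, for t = 1..horizon.

    stop_times[k] is the rejection time of the k-th level-alphas[k] test on
    one observed data stream (None if that test never rejects).
    All names here are hypothetical; the paper's weights are not given
    in the abstract.
    """
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be nonnegative and sum to 1")
    values = []
    for t in range(1, horizon + 1):
        # Each test that has already rejected contributes w_k / alpha_k.
        e_t = sum(
            w / a
            for tau, a, w in zip(stop_times, alphas, weights)
            if tau is not None and tau <= t
        )
        values.append(e_t)
    return values


def first_crossing(e_values, alpha):
    """First time t (1-indexed) with e_t >= 1/alpha, or None if never."""
    for t, e_t in enumerate(e_values, start=1):
        if e_t >= 1.0 / alpha:
            return t
    return None


values = wait_e_process(
    stop_times=[3, None, 5],
    alphas=[0.1, 0.05, 0.01],
    weights=[0.5, 0.25, 0.25],
    horizon=6,
)
print(values, first_crossing(values, alpha=0.05))
```

Stricter tests (smaller $α_k$) contribute larger jumps when they reject, which is what lets the aggregate keep growing under the alternative.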

2605.12697 2026-05-14 stat.ML cs.LG math.PR

A Unified Framework for Critical Scaling of Inverse Temperature in Self-Attention

Tomohiro Hayase, Ryo Karakida

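AI summary This paper proposes a unified framework for determining the critical scaling law of the inverse-temperature parameter in self-attention, with the aim of stabilizing long-context processing. By analyzing the gap-counting function $N_n$ of each attention row, the authors define an upper-tail accumulation scale and prove that it determines the critical inverse temperature for softmax concentration. The framework unifies previously conflicting scaling laws and yields a direct diagnostic for attention-score distributions, from theoretical models to practical Transformers.

Details
English abstract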

Length-dependent logit rescaling is widely used to stabilize long-context self-attention, but existing analyses and methods suggest conflicting inverse-temperature laws for the context length $n$, ranging from $(\log n)^{1/2}$ to $\log n$ and $(\log n)^2$. We provide a general theory showing that the desirable scale is determined by the gap-counting function $N_n$ of each attention row. Counting how many competitors lie within each gap from the maximum, we define an upper-tail accumulation scale and prove that it gives the critical inverse-temperature scale for softmax concentration: below this scale, the top competitors remain unseparated, whereas above it, the attention entropy collapses. This framework unifies prior scaling laws as different $N_n$ and yields a direct diagnostic for attention-score families, from idealized theoretical models to more practical transformers.
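
To make the two quantities in this abstract concrete, here is a minimal Python sketch for a single attention row. The function names are hypothetical, and the convention that the maximum itself is not counted as a competitor is an assumption; the paper's exact definition of $N_n$ and of the accumulation scale is not reproduced here.

```python
import math


def gap_count(scores, gap):
    """Number of competitors whose score is within `gap` of the row maximum.

    Assumption: the maximum itself is excluded from the count.
    """
    top = max(scores)
    within = sum(1 for s in scores if top - s <= gap)
    return within - 1  # drop one copy of the maximum


def attention_entropy(scores, beta):
    """Shannon entropy of softmax(beta * scores) for one attention row."""
    top = max(scores)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp(beta * (s - top)) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


row = [0.0, -1.0, -1.0, -3.0]
for beta in (0.0, 1.0, 10.0):
    print(beta, gap_count(row, 1.0), attention_entropy(row, beta))
```

Raising `beta` sharpens the softmax and lowers the entropy; how fast it must grow with the context length to separate the top competitors is what the gap counts are meant to capture.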

2605.12679 2026-05-14 stat.ME

Measures of predictive accuracy, miscalibration and discrimination

Łukasz Delong, Mario Wüthrich

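AI summary This paper studies the evaluation of real-valued point predictors using mean-consistent loss functions based on Bregman divergences, within a decision-theoretic framework. The authors derive a new form of Murphy's decomposition that splits the expected loss into miscalibration and discrimination components, and relate these to Lorenz-curve-based accuracy measures. They introduce an improved ABC² measure to mitigate weaknesses of the traditional ABC measure in identifying mean-calibration, and show that these measures rely on predictor-dependent weights and therefore do not align with mean-consistent scoring functions. The results indicate that using ABC, ABC², or the Gini score for model selection can lead to dishonest evaluation, which supports evaluating models with mean-consistent loss functions and with the miscalibration and discrimination measures from Murphy's decomposition.

Details
English abstract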

We study the evaluation of real-valued point predictors under the decision-theoretic framework of mean-consistent loss functions given by the Bregman divergences. We first derive a new version of Murphy's decomposition of the expected loss which does not directly include the response itself but only its predictors. We then relate the miscalibration and discrimination components of Murphy's decomposition to Lorenz-curve-based accuracy measures that are widely used in practice. Besides the usual area between the concentration and Lorenz curves, ABC, we introduce a mean-squared version ABC$^2$ that mitigates some of the weaknesses of the original ABC in identifying mean-calibration. More importantly, both ABC and ABC$^2$ are shown to rely on predictor-dependent weights, so they fail to align with the class of mean-consistent scoring functions. In the same spirit, we derive a similar result for the widely used Gini score. These results indicate that ABC, ABC$^2$ and Gini scores may lead to dishonest evaluation of point predictions when used for model selection; this supports using mean-consistent loss functions, together with the miscalibration and discrimination measures from Murphy's decomposition of the expected loss, for model evaluation. Finally, we study forecast dominance when Lorenz curves intersect. We show that Lorenz and Murphy's curves have the same number of crossings and, in the one-crossing case, we establish weaker dominance criteria for subclasses of Bregman divergences through third-degree stochastic dominance.

2605.12668 2026-05-14 stat.ML cs.LG

Online Conformal Prediction: Enforcing monotonicity via Online Optimization

Eduardo Ochoa Rivera, Ambuj Tewari

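AI summary This paper studies online conformal prediction, aiming to produce valid, nested prediction sets at multiple confidence levels simultaneously so as to serve users with heterogeneous risk tolerances. The authors propose two new online conformal prediction methods that enforce nestedness of the prediction sets through an online-optimization perspective while controlling quantile estimation error. Experiments show that, compared with existing methods, the approach achieves stable coverage across confidence levels, strictly nested sets, and higher statistical efficiency.

Details
English abstract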

Conformal prediction provides a principled framework for uncertainty quantification with finite-sample coverage guarantees. While recent work has extended conformal prediction to online and sequential settings, existing methods typically focus on a single coverage level and do not ensure consistency across multiple confidence levels. In many real-world applications, such as weather forecasting, macroeconomic prediction, and risk management, different users operate under heterogeneous risk tolerances and require calibrated uncertainty estimates across a range of coverage levels. In such settings, it is desirable to produce prediction sets corresponding to different coverage levels that are nested and valid simultaneously. In this paper, we propose two novel online conformal prediction methods that output \emph{nested prediction sets} across a range of coverage levels, enabling simultaneous uncertainty quantification across the entire risk spectrum. Beyond interpretability, jointly estimating multiple coverage levels is known to improve statistical efficiency in classical quantile regression by enforcing non-crossing constraints and sharing information across quantiles. Our approaches leverage an online optimization perspective with small regret that translates to quantile estimation error control while enforcing nestedness of prediction sets. Empirical results on synthetic and real-world datasets, including applications in forecasting tasks with heterogeneous risk requirements, demonstrate that our method achieves stable coverage across all levels, strictly nested prediction sets, and improved efficiency compared to existing online conformal baselines.
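
The abstract does not spell out the two proposed methods, so the following Python sketch is only a generic stand-in for the underlying idea: track one score threshold per coverage level with a standard online quantile update, then restore monotonicity so that the sets $\{y : \text{score}(y) \le q_k\}$ are nested. The function name, the learning rate, and the use of sorting as the monotonicity step are all assumptions made here, not the authors' algorithm.

```python
def update_thresholds(thresholds, score, levels, lr=0.1):
    """One online step for several coverage levels at once.

    thresholds[k] is the current score cutoff for coverage levels[k];
    `levels` must be sorted in increasing order. Hypothetical sketch:
    a per-level quantile update followed by sorting to keep cutoffs
    nondecreasing, which keeps the prediction sets nested.
    """
    updated = []
    for q, level in zip(thresholds, levels):
        miss = 1.0 if score > q else 0.0  # 1 if the set failed to cover
        # Raise the cutoff after a miss, lower it slightly after a cover.
        updated.append(q + lr * (miss - (1.0 - level)))
    return sorted(updated)


levels = [0.5, 0.8, 0.9]
thresholds = [0.0, 0.0, 0.0]
for score in [1.0, 0.2, 0.7, 0.4, 0.9]:
    thresholds = update_thresholds(thresholds, score, levels)
print(thresholds)
```

Sorting is the simplest way to guarantee nondecreasing cutoffs; a projection such as isotonic regression would be another option, and the paper's regret guarantees should not be assumed to carry over to this simplified version.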

2605.12653 2026-05-14 cs.LG cs.AI stat.ML

Plan Before You Trade: Inference-Time Optimization for RL Trading Agents

Eun Go, Rohan Deb, Arindam Banerjee

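AI summary This paper proposes FPILOT, an inference-time optimization framework for improving reinforcement learning in portfolio management. Inspired by model predictive control, the method uses price forecasts to optimize the trading policy dynamically at inference time rather than relying on a fixed policy from training. Without retraining the policy, FPILOT uses a price forecasting model to generate a multi-step price trajectory and optimizes the asset allocation at each step accordingly, consistently improving trading performance on several risk-adjusted metrics.

Details
English abstract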

Reinforcement learning agents for portfolio management are typically trained and deployed as static policies, with no mechanism for using price forecasts at inference time. We propose $\text{FPILOT}$ (**Fin**ancial **P**lugin **I**nference-time **L**earning for **O**ptimal **T**rading), a plugin inference-time optimization framework inspired by Model Predictive Control (MPC). Our key structural insight is that future prices mostly do not depend on one agent's portfolio allocation, so a suitable predictive model can produce a multi-step price trajectory without iterative action-conditioned rollouts as in typical reinforcement learning. At each decision step, we use the forecaster's predicted price trajectory to construct an allocation-based imagined return objective, and optimize the policy at inference-time before executing one step of the trade. Our framework is compatible with any pre-trained agent and adapts the policy to the forecaster's predictions without any retraining. Evaluated across five policy learning algorithms on the TradeMaster DJ30 benchmark, $\text{FPILOT}$ produces consistent improvements in total return and return-based risk-adjusted metrics (Sharpe, Sortino, Calmar), with stochastic policies benefiting more than deterministic ones. Further, using synthetic forecasts at calibrated quality levels, we show that gains consistently improve with forecaster quality, suggesting that our performance will improve based on advances in financial forecasting.

2605.12648 2026-05-14 cs.LG stat.ML

Population Risk Bounds for Kolmogorov-Arnold Networks Trained by DP-SGD with Correlated Noise

Puyu Wang, Jan Schuchardt, Nikita Kalinin, Junyu Zhou, Sophie Fellenz, Christoph Lampert, Marius Kloft

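AI summary This paper establishes the first population risk bounds for Kolmogorov-Arnold Networks (KANs) trained by stochastic gradient descent (SGD) with gradient clipping, covering both non-private SGD and differentially private SGD (DP-SGD) with Gaussian perturbations whose noise interpolates between independent and temporally correlated. The study adopts mini-batch SGD, which is closer to practical training, and incorporates temporally correlated noise mechanisms to improve the privacy-utility tradeoff. By introducing an auxiliary unprojected dynamics, a shifted iterate, and a high-probability bootstrap analysis, the authors overcome the difficulties of analyzing correlated-noise DP training in non-convex optimization and obtain population risk bounds for KANs, giving the first optimization and generalization analysis of correlated-noise mechanisms in non-convex learning.

Details
English abstract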

We establish the first population risk bounds for Kolmogorov-Arnold Networks (KANs) trained by mini-batch SGD with gradient clipping, covering non-private SGD as well as differentially private SGD (DP-SGD) with Gaussian perturbations that interpolate between independent and temporally correlated noise. This setting is substantially closer to practice than prior KAN theory along two axes: training is by mini-batch SGD, the standard recipe for modern networks, rather than full-batch gradient descent (GD); and correlated-noise mechanisms have empirically shown a more favorable privacy-utility tradeoff than independent-noise mechanisms. Our results cover the corresponding full-batch GD and independent-noise DP-GD results for KANs by Wang et al. (2026), while yielding sharper fixed-second-layer specializations. The technical core is a new analysis route for correlated-noise DP training in the non-convex regime. Temporal dependence breaks the conditional-centering structure underlying standard one-step SGD arguments, and the projection step obstructs the exact cancellation structure of correlated perturbations. We address these difficulties through an auxiliary unprojected dynamics, a shifted iterate that absorbs the current noise perturbation, and a high-probability bootstrap certifying projection inactivity. Combining this optimization analysis with a stability-based generalization argument yields the stated population risk bounds. To the best of our knowledge, this is the first optimization and population risk analysis of a correlated-noise mechanism for DP training beyond convex learning, in particular for neural networks.

2605.11168 2026-05-14 stat.ME stat.CO stat.ML

Variational predictive resampling

Laura Battaglia, Stefano Cortinovis, Chris Holmes, David T. Frazier, Jack Jewson

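AI summary This paper proposes variational predictive resampling (VPR), a scalable posterior sampling method that addresses the shortcomings of variational inference (VI) in capturing posterior dependence. The method combines VI's predictive strength with a predictive-resampling framework: it repeatedly imputes future observations and updates the variational approximation, and in this way better approximates the Bayesian posterior. Experiments show that VPR remains computationally efficient while substantially improving posterior uncertainty quantification and recovering posterior dependence missed by the mean-field approximation.

Details
English abstract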

Bayesian inference provides principled uncertainty quantification, but accurate posterior sampling with MCMC can be computationally prohibitive for modern applications. Variational inference (VI) offers a scalable alternative and often yields accurate predictive distributions, but cheap variational families such as mean-field (MF) can produce over-concentrated approximations that miss posterior dependence. We propose variational predictive resampling (VPR), a scalable posterior sampling method that exploits VI's predictive strength within a predictive-resampling framework to better approximate the Bayesian posterior. Given a prior-likelihood pair, VPR repeatedly imputes future observations from the current variational predictive, updates the variational approximation after each imputation, and records the parameter value implied by the completed sample. We establish conditions under which the law of the parameter returned by VPR is well defined and show that its finite-horizon approximation converges to this limit. In a tractable Gaussian location model, we show that VPR with MF variational predictives converges to the exact Bayesian posterior, whereas the optimal MF-VI approximation retains a non-vanishing asymptotic gap. Experiments on linear regression, logistic regression, and hierarchical linear mixed-effects models demonstrate that VPR substantially improves posterior uncertainty quantification and recovers posterior dependence missed by MF-VI, while remaining computationally competitive with, and often more efficient than, MCMC.

2605.11108 2026-05-14 math.PR math.ST stat.TH

Empirical Convergence of Even-Order Gromov-Wasserstein Functionals

Vasyl Paliy

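AI summary This paper studies the sample complexity of powered even-order Gromov-Wasserstein functionals between compactly supported probability measures on $\mathbb{R}^{d_x}$ and $\mathbb{R}^{d_y}$. The author proves that, for any fixed pair of positive integers $(r,k)$, the two-sample empirical estimation error converges at the rate $n^{-2/\max\{\min\{d_x,d_y\},4\}}$, with a logarithmic factor in the critical case $\min\{d_x,d_y\}=4$. This extends the known quadratic Euclidean upper bound to the full family of powered even-order Gromov-Wasserstein functionals; the main tools are a polynomial decomposition of the even-order GW functional, a generalized duality formula for the coupling-dependent term, and entropy estimates for semiconcave dual potentials.

Details
Comments
27 pages. Comments welcome
English abstract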

We study the sample complexity of empirical plug-in estimation for the powered even-order Gromov-Wasserstein functional between compactly supported probability measures on $\mathbb{R}^{d_x}$ and $\mathbb{R}^{d_y}$. For every fixed pair of integers $r,k\geq 1$, we prove that the two-sample empirical error is bounded at the rate $n^{-2/\max\{\min\{d_x,d_y\},4\}}$, up to a logarithmic factor in the critical case $\min\{d_x,d_y\}=4$. This extends the known quadratic Euclidean upper rate to the full powered even-order family. The proof uses a polynomial decomposition of the even-order GW functional, a generalized duality formula reducing the coupling-dependent term to a compact family of ordinary optimal transport problems, and entropy estimates for semiconcave dual potentials.
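
Unpacking the exponent case by case may make the stated rate easier to read. The display below only rewrites $n^{-2/\max\{\min\{d_x,d_y\},4\}}$ using the shorthand $d := \min\{d_x,d_y\}$, which is introduced here and is not notation from the abstract; the power of the logarithm in the critical case is not given in the abstract and is left unspecified.

```latex
\[
n^{-2/\max\{d,4\}} =
\begin{cases}
n^{-1/2}, & d \le 4 \quad (\text{with an extra logarithmic factor when } d = 4),\\[2pt]
n^{-2/d}, & d > 4,
\end{cases}
\qquad d := \min\{d_x, d_y\}.
\]
```

For instance, $d_x = 3$ and $d_y = 10$ give the parametric-type rate $n^{-1/2}$, while $d_x = d_y = 8$ give $n^{-1/4}$; only the smaller of the two dimensions matters.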

2605.09968 2026-05-14 cs.LG math.OC stat.ML

Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning

Debashis Guha

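AI summary This paper proposes Consolidation-Expansion Operator Mechanics (OpMech), a unified framework describing how adaptive learning systems alternate between consolidating existing knowledge and expanding into new knowledge. The central concept is the order-gap, which measures the degree to which the consolidation and expansion operators fail to commute at a given knowledge state and can serve as a real-time control signal for the learning process. The framework applies to domains including reinforcement learning, continual learning, and recursive language models, and provides an order-gap-based stopping rule with theoretical guarantees and practical effectiveness.

Details
Comments
38 pages; Corrected author affiliation on title page in v2; no scientific changes
English abstract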

Every adaptive learning system must alternate between two operations: consolidating what it already knows and expanding into new evidence. We propose \emph{Consolidation-Expansion Operator Mechanics} (OpMech), a framework that makes this structure precise. The central object is the \emph{order-gap} $\Ogap(θ; e)$, the degree to which a consolidation operator~$Q$ and an expansion operator~$P_e$ fail to commute at a given knowledge state. Because the order-gap is computable from the system's own trajectory, it serves as a real-time control signal: large values indicate that the system is still sensitive to the ordering of consolidation and expansion; once the order-gap falls and stays small, further processing is unlikely to change the outcome. Three results give the signal precise meaning: the order-gap decays along convergent trajectories; a persistently large order-gap implies the system is far from its settled state; and an order-gap-based stopping rule terminates with provable guarantees in both noiseless and bounded-noise settings. The framework applies across five domains: bandits, reinforcement learning, stochastic optimization, continual learning, and recursive language models. We give conditions under which the order-gap reliably tracks convergence in three representative cases. We develop the recursive language model application in detail, showing how OpMech replaces heuristic stopping rules and fixed recursion budgets with principled, evidence-driven alternatives.

2604.18242 2026-05-14 math.ST cs.LG stat.ML stat.TH

Horospherical Depth and Busemann Median on Hadamard Manifolds

Yangdi Jiang, Xiaotian Chang, Cyrus Mostajeran

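AI summary This paper introduces the horospherical depth, an intrinsic statistical depth on Hadamard manifolds, and defines the Busemann median as its set of maximizers. The construction uses the fact that the linear functionals in Tukey's half-space depth are limits of renormalized distance functions; on a Hadamard manifold these limits are Busemann functions, whose sublevel sets are horoballs and serve as intrinsic replacements for half-spaces. The depth is parametrized by the visual boundary, is isometry-equivariant, requires neither tangent-space linearization nor a chosen base point, and applies to arbitrary Hadamard manifolds; under negative curvature it is strictly quasi-concave with a unique median, and it is robust to contamination and sample perturbations.

Details
Comments
52 pages, 10 figures
English abstract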

We introduce the horospherical depth, an intrinsic notion of statistical depth on Hadamard manifolds, and define the Busemann median as the set of its maximizers. The construction exploits the fact that the linear functionals appearing in Tukey's half-space depth are themselves limits of renormalized distance functions; on a Hadamard manifold the same limiting procedure produces Busemann functions, whose sublevel sets are horoballs, the intrinsic replacements for halfspaces. The resulting depth is parametrized by the visual boundary, is isometry-equivariant, and requires neither tangent-space linearization nor a chosen base point. For arbitrary Hadamard manifolds, we prove that the depth regions are nested and geodesically convex, that a centerpoint of depth at least $1/(d+1)$ exists, and hence that the Busemann median exists for every Borel probability measure. Under strictly negative sectional curvature and mild regularity assumptions, the depth is strictly quasi-concave and the median is unique. We also establish robustness: the depth is stable under total-variation perturbations, and under contamination escaping to infinity the limiting median depends on the escape direction but not on how far the contaminating mass has moved along the geodesic ray, in contrast with the Fréchet mean. Finally, we establish uniform consistency of the sample depth and convergence of sample depth regions and sample Busemann medians; on symmetric spaces of noncompact type, the argument proceeds through a VC analysis of upper horospherical halfspaces, while on general Hadamard manifolds it follows from a compactness argument under a mild non-atomicity assumption.

2603.20521 2026-05-14 cs.LG cs.AI math.OC stat.ML

Delightful Distributed Policy Gradient

Ian Osband

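AI summary Distributed reinforcement learning, when trained on data from stale, buggy, or mismatched actors, is vulnerable to actions with high surprisal (negative log-probability), which degrades learning. The Delightful Policy Gradient (DG) proposed in this paper gates each update by the product of advantage and surprisal, suppressing high-surprisal failures while preserving high-surprisal successes and thereby improving learning efficiency. Experiments show that DG has a substantial sample-efficiency advantage over conventional methods across several challenging settings, and that the advantage grows with task complexity.

Details
English abstract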

Distributed reinforcement learning trains on data from stale, buggy, or mismatched actors, producing actions with high surprisal (negative log-probability) under the learner's policy. The core difficulty is not surprising data per se, but \emph{negative learning from surprising data}. High-surprisal failures can dominate finite-batch updates through large perpendicular components, while high-surprisal successes reveal opportunities the current policy would otherwise miss. The \textit{Delightful Policy Gradient} (DG) separates these cases by gating each update with delight, the product of advantage and surprisal, suppressing rare failures and preserving rare successes without behavior probabilities. In a tabular analysis, DG suppresses the perpendicular second moment of high-surprisal failures by a policy-overlap factor that vanishes as the learner improves. The advantage sign is essential for surprisal-based filtering: any learner-probability-only gate that suppresses rare failures also suppresses rare successes. On MNIST with simulated staleness, DG without off-policy correction outperforms importance-weighted PG with exact behavior probabilities. On a transformer sequence task with staleness, actor bugs, reward corruption, and rare discovery, DG often achieves nearly order-of-magnitude lower error. When all four frictions act simultaneously, its sample-efficiency advantage is order-of-magnitude and grows with task complexity.

2603.14479 2026-05-14 stat.AP stat.ME

Risk-Calibrated Process Capability Approval with Finite Samples

Fei Jiang, Lei Yang

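AI summary This paper studies how to perform risk-calibrated process capability approval based on the process capability index $C_{pk}$ under finite samples. Conventional practice typically uses deterministic threshold rules, which ignore the risk introduced by estimation uncertainty. The paper proposes a decision framework that accounts for estimation error and asymmetric operational loss: by introducing a calibration constant $k$, the approval rule becomes $\widehat{C}_{pk} \ge C_0 + k\,SE(\widehat{C}_{pk})$, which improves approval stability in near-threshold decisions and reduces expected operational loss.

Details
Comments
17 pages, 4 figures and 6 tables
English abstract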

Process capability indices such as $C_{pk}$ are widely used in manufacturing to support supplier qualification, pilot-build release, and production approval. In practice, approval decisions are often based on deterministic threshold rules of the form $\widehat{C}_{pk} \ge C_0$. Because $\widehat{C}_{pk}$ is estimated from finite samples, however, such decisions are inherently stochastic, especially when the true capability lies near the approval threshold. This paper develops a risk-calibrated decision framework for process capability approval that explicitly accounts for estimation uncertainty and asymmetric operational loss. Capability approval is formulated as a binary statistical decision problem, leading to a rule of the form $\widehat{C}_{pk} \ge C_0 + k\,SE(\widehat{C}_{pk})$, where the calibration constant $k$ is determined either by a tolerable failure probability or by a false-accept/false-reject cost ratio. The resulting formulation unifies several commonly used procedures, including deterministic thresholding, lower confidence bound rules, and probability-based approval rules, and naturally extends them to cost-sensitive decision rules derived from asymmetric operational loss. Simulation experiments and an industrial case study show that risk calibration primarily affects near-threshold decisions, improves approval stability, and can substantially reduce expected operational loss when false acceptance is more costly than false rejection.
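
The decision rule $\widehat{C}_{pk} \ge C_0 + k\,SE(\widehat{C}_{pk})$ can be illustrated with a short Python sketch. The abstract does not say which standard-error formula the paper uses, so the Bissell-style normal approximation below is an assumption, as are the function names and the sample data; how $k$ is calibrated from a failure probability or a cost ratio is not shown.

```python
import math
import statistics


def cpk_hat(data, lsl, usl):
    """Point estimate of C_pk from a sample and two specification limits."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)
    return min(usl - mean, mean - lsl) / (3.0 * sd)


def cpk_standard_error(cpk, n):
    """Approximate SE of the C_pk estimate.

    Assumption: Bissell-style approximation for normal data; the paper's
    own SE formula is not stated in the abstract.
    """
    return math.sqrt(1.0 / (9.0 * n) + cpk ** 2 / (2.0 * (n - 1)))


def approve(data, lsl, usl, c0, k):
    """Approve when the estimate clears the threshold by k standard errors."""
    cpk = cpk_hat(data, lsl, usl)
    se = cpk_standard_error(cpk, len(data))
    return cpk >= c0 + k * se


sample = [9.8, 10.0, 10.2, 10.1, 9.9]  # made-up measurements
print(cpk_hat(sample, lsl=9.0, usl=11.0))
print(approve(sample, 9.0, 11.0, c0=1.33, k=0.0))   # plain threshold rule
print(approve(sample, 9.0, 11.0, c0=1.33, k=1.645)) # stricter, risk-adjusted
```

Setting `k = 0` recovers deterministic thresholding, and a positive `k` behaves like a lower-confidence-bound rule; with only five observations the standard error is large, which is why the same sample can pass the first rule and fail the second.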

2602.13155 2026-05-14 cs.LG cs.DS cs.NE stat.ML

Learning to Approximate Uniform Facility Location via Graph Neural Networks

Chendi Qian, Christopher Morris, Stefanie Jegelka, Christian Sohler

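AI summary This paper studies how graph neural networks (GNNs) can efficiently approximate solutions to the Uniform Facility Location (UniFL) problem. The authors propose a fully differentiable GNN that incorporates ideas from classical approximation algorithms and needs neither solver supervision nor discrete relaxations, retaining a provable approximation guarantee while improving performance. In experiments the method outperforms standard approximation algorithms and narrows the gap to integer linear programming.

Details
Comments
ICML 2026
English abstract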

Neural networks, particularly message-passing neural networks (MPNNs), are increasingly used as heuristics for hard combinatorial optimization problems. Yet many learning-based methods rely on supervision, reinforcement learning, or gradient estimators, causing high computational cost, unstable training, or limited guarantees. Classical approximation algorithms provide worst-case guarantees but are non-differentiable and cannot adapt to structure in natural input distributions. We study this tradeoff through Uniform Facility Location (UniFL), a problem with applications in clustering, summarization, logistics, and supply chains. We propose a fully differentiable MPNN that incorporates approximation-algorithmic principles without solver supervision or discrete relaxations. The model has provable approximation guarantees and empirically improves on standard approximation algorithms, narrowing the gap to integer linear programming.

2602.06713 2026-05-14 stat.ML cs.LG

Distribution Shift in Missing Data Imputation: A Risk-Based Perspective and Importance-Weighted Correction under MAR

Luke Shannon, Song Liu, Katarzyna Reluga

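AI summary Taking a risk-minimization perspective, this paper rigorously formulates missing data imputation as a mean-squared error risk minimization problem. It shows that when the probability of missingness depends on the data, existing methods fail to account for the distribution shift between the training data and the full data distribution, and therefore do not minimize mean-squared error on the full distribution. The authors propose an importance-weighted correction algorithm that handles this shift explicitly; experiments show improvements over uncorrected baselines in both RMSE and Wasserstein distance.

Details
Comments
9 pages, 12 figures
English abstract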

Missing data imputation, where a model is trained on observed data to estimate unobserved values, is a fundamental problem in machine learning. In this paper, we rigorously formulate imputation model learning as a mean-squared error risk minimisation problem. We show that when the probability of missingness depends on the data, many state-of-the-art methods fail to account for the resulting distribution shift between the observed data used for training and the full data distribution used for evaluation. Consequently, these approaches do not minimise mean-squared error on the full data distribution. Instead, we propose a novel imputation algorithm designed to learn an imputation model from the observed data while explicitly accounting for this distribution shift. Simulation studies show consistent improvements over otherwise identical uncorrected baselines, with average reductions of 3% in RMSE and 7% in Wasserstein distance.

2602.06104 2026-05-14 cs.LG stat.ML

Pragmatic Curiosity: A Unified Framework for Hybrid Learning and Optimization via Active Inference

Yingke Li, Anjali Parashar, Enlu Zhou, Chuchu Fan

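AI summary This paper proposes Pragmatic Curiosity (PraC), a unified framework for hybrid settings that couple learning and optimization, using active inference to make decisions efficiently. The method selects candidate queries by trading information gain about a task-relevant latent symbol against the expected regret over outcomes, reducing uncertainty while improving task performance. The authors demonstrate PraC in several complex scenarios, including decision-oriented monitoring with fixed symbols, targeted active search with local symbols, and composite Bayesian optimization under unknown preferences, where it lowers decision risk, improves coverage of critical outcome regions, and jointly learns predictive and preference structures.

Details
English abstract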

Many engineering and scientific workflows rely on expensive black-box evaluations, requiring sequential decisions that must both improve task performance and reduce uncertainty. Bayesian optimization (BO) and Bayesian experimental design (BED) provide powerful but largely separate treatments of goal-directed optimization and information-seeking experimentation, leaving limited guidance for hybrid settings in which learning and optimization are intrinsically coupled. We propose Pragmatic Curiosity (PraC), a unified framework for hybrid learning and optimization via active inference. PraC evaluates candidate queries by trading information gain about a task-relevant latent symbol against an expected regret-based potential over outcomes. This formulation exposes three operational design choices: which latent quantity should be clarified, how task value is encoded as regret, and how strongly information gain should be exchanged against pragmatic value. We instantiate PraC across three regimes of increasing complexity: decision-oriented plume monitoring with fixed global symbols and known downstream losses, targeted active search with induced local symbols and evolving coverage goals, and composite Bayesian optimization with hierarchical regret learning under unknown preferences. Across these regimes, PraC reduces downstream decision risk, improves coverage of critical outcome regions, and jointly learns predictive and preference structures without relying on task-specific staging rules.

2602.03730 2026-05-14 stat.ML cs.LG

Efficient Generative Prediction for EHR Foundation Models: The SCOPE and REACH Estimators

Luke Solo, Matthew B. A. McDermott, William F. Parker, Bashar Ramadan, Michael C. Burkhart, Brett K. Beaulieu-Jones

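AI summary This paper proposes two efficient estimators, SCOPE and REACH, for improving clinical outcome prediction with generative models of electronic health records (EHR). Both exploit the next-token probability distributions that standard sampling leaves underused, addressing the limitations of conventional Monte Carlo sampling in sparsity, computational cost, and variance. Experiments show that they preserve calibration while substantially reducing the number of generated tokens, especially for rare but important clinical outcomes, greatly lowering inference cost.

Details
Comments
10 pages, 4 figures, 1 table
English abstract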

Generative foundation models trained on tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction via Monte Carlo sampling of simulated future trajectories. However, this approach suffers from three coupled limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational cost, and high sampling variance. We propose two new estimators that leverage next-token probability distributions underutilized by standard Monte Carlo: the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH). We prove both are unbiased, that REACH guarantees variance reduction over Monte Carlo for any model and outcome, and that REACH is a Rao-Blackwellization of any naive importance sampling scheme that preserves the non-outcome token distribution. Empirically, across $11$ clinically important outcomes in MIMIC-IV and the UChicago health system, SCOPE and REACH match $100$-sample Monte Carlo accuracy with median token reductions of $2.5\times$ to $3.4\times$ and reductions exceeding $80\times$ for the rarest outcomes, with calibration preserved throughout. Because SCOPE reuses a single sampled pool across an arbitrary number of outcomes at no marginal generation cost while REACH provides a per-task variance guarantee, the two estimators are complementary in deployment and together meaningfully reduce the inference budget required for generative EHR foundation models, particularly for rare, high-impact outcomes in healthcare.

2602.01099 2026-05-14 stat.AP cs.NA math.NA

Simultaneous Estimation of Seabed and Its Roughness With Longitudinal Waves

Babak Maboudi Afkham, Ana Carpio

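AI summary This paper proposes an infinite-dimensional Bayesian framework that uses longitudinal wave scattering to estimate the seabed and its roughness simultaneously. Based on an assumption of statistical isotropy of the seabed, the method characterizes roughness through fractional differentiability and provides a robust numerical algorithm for seabed parameter estimation and uncertainty quantification. Extensive numerical experiments validate the method, which offers a new and feasible route to large-scale seabed exploration.

Details
English abstract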

This paper introduces an infinite-dimensional Bayesian framework for acoustic seabed tomography, leveraging wave scattering to simultaneously estimate the seabed and its roughness. Tomography is considered an ill-posed problem where multiple seabed configurations can result in similar measurement patterns. We propose a novel approach focusing on the statistical isotropy of the seabed. Utilizing fractional differentiability to identify seabed roughness, the paper presents a robust numerical algorithm to estimate the seabed and quantify uncertainties. Extensive numerical experiments validate the effectiveness of this method, offering a promising avenue for large-scale seabed exploration.

2601.22816 2026-05-14 cs.LG stat.ML

Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features

Markus Mueller, Kathrin Gruber, Dennis Fok

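AI summary This paper proposes a cascaded flow matching method for generating tabular data with mixed discrete and continuous features, addressing the difficulty existing models have with mixed-type features. The method first generates a low-resolution version of the tabular data and then, through a new guided conditional probability path and data-dependent coupling, generates more accurate mixed-type features in a high-resolution model. Experiments show strong performance in sample realism and in capturing distributional detail, with the detection score improving by 51.9%.

Details
Comments
Published at ICML 2026
English abstract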

Advances in generative modeling have recently been adapted to tabular data containing discrete and continuous features. However, generating mixed-type features that combine discrete states with an otherwise continuous distribution in a single feature remains challenging. We advance the state-of-the-art in diffusion models for tabular data with a cascaded approach. We first generate a low-resolution version of a tabular data row, that is, the collection of the purely categorical features and a coarse categorical representation of numerical features. Next, this information is leveraged in the high-resolution flow matching model via a novel guided conditional probability path and data-dependent coupling. The low-resolution representation of numerical features explicitly accounts for discrete outcomes, such as missing or inflated values, and therewith enables a more faithful generation of mixed-type features. We formally prove that this cascade tightens the transport cost bound. The results indicate that our model generates significantly more realistic samples and captures distributional details more accurately, for example, the detection score improves by 51.9\%. Code is available at https://github.com/muellermarkus/tabcascade.

2601.22409 2026-05-14 cs.LG cs.AI stat.ML

Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks

Puyu Wang, Junyu Zhou, Philipp Liznerski, Marius Kloft

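AI summary This paper studies the optimization dynamics, generalization, and differential privacy guarantees of gradient descent on Kolmogorov-Arnold Networks (KANs). Through theoretical analysis, the authors derive general bounds on training dynamics, generalization error, and utility under differential privacy, and show that for logistic loss a network of polylogarithmic width already achieves optimization and generalization rates governed by the number of iterations and the sample size. In the differentially private setting, the study further shows that the required noise depends on the input dimension and the privacy parameters, and that under privacy polylogarithmic width is not only sufficient but also necessary, revealing a qualitative gap between private and non-private training.

Details
Journal ref
ICML 2026
Comments
42 pages, 3 figures
English abstract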

Kolmogorov--Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(ε,δ)$-DP and obtain a utility bound of order $\sqrt{d}/(nε)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.

2601.06147 2026-05-14 cs.LG cs.CL stat.ML

LLM Flow Processes for Text-Conditioned Regression

Felix Biggs, Samuel Willis

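AI summary: This paper studies how to use pre-trained large language models (LLMs) effectively for text-conditioned regression. Because LLM predictions suffer from error cascades even on short sequences, are computationally intensive, and are hard to parallelise, the authors combine the LLM's marginal predictive densities with a lightweight diffusion-based neural process, which improves calibration and local consistency. They also introduce a gradient-free, non-Monte Carlo method for sampling from the product of a score model and an expert density, which is of independent interest.

Details
English abstract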

Recent work has demonstrated surprisingly good performance of pre-trained LLMs on regression tasks (for example, time-series prediction), with the ability to incorporate expert prior knowledge and the information contained in textual metadata. However, we observe major error cascades even in short sequences of fewer than roughly 100 points; these models are also computationally intensive and difficult to parallelise. Marginal LLM predictions do not suffer from this issue and are trivially parallelised, but can predict over-broad densities. To address this, we propose combining these densities with a lightweight (diffusion-based) neural process. We show that this combination leads to better-calibrated predictions overall, outputs locally consistent trajectories, and leads to text-conditioned function space selection in the meta-learner. As part of this work, we propose a gradient-free (and non-Monte Carlo) method for sampling from a product-of-experts of a score model and an 'expert' (here the LLM predictive densities). We believe this general method is of independent interest, as it is applicable whenever an expert can be convolved with a Gaussian in closed form.
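The closed-form condition in the last sentence is met, for example, by a Gaussian expert. As a minimal worked case, assume (purely for illustration; the symbols $m$, $s$, and $\sigma_t$ are not from the abstract) an expert density $p(x) = \mathcal{N}(x; m, s^2)$ and Gaussian noise of variance $\sigma_t^2$. The convolution is then again Gaussian, with the variances added:

```latex
\bigl(p * \mathcal{N}(0, \sigma_t^2)\bigr)(y)
  = \int \mathcal{N}(x;\, m,\, s^2)\, \mathcal{N}(y - x;\, 0,\, \sigma_t^2)\, \mathrm{d}x
  = \mathcal{N}\bigl(y;\, m,\, s^2 + \sigma_t^2\bigr)
```

By linearity of convolution, the same holds component by component for a Gaussian mixture expert. Whether the LLM predictive densities in the paper take this form is not stated in the abstract.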

2512.17485 2026-05-14 math.PR stat.CO

Koenigs functions in the subcritical and critical Markov branching processes with Poisson probability reproduction of particles

Penka Mayster, Assen Tchorbadjieff

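AI summary: This paper studies Koenigs functions in subcritical and critical Markov branching processes with Poisson reproduction of particles. By integrating the Kolmogorov equation and using graphical representations of Koenigs functions, the authors obtain the limit conditional law in the subcritical case and the invariant measure in the critical case. The explicit solutions involve exponential Bell polynomials and the modified exponential-integral function, providing new tools for the theoretical analysis of branching processes.

Details
Journal ref
Integral Transforms and Special Functions, 1-24, (2026)
English abstract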

Special functions have always played a central role in physics and in mathematics, arising as solutions of nonlinear differential equations, as well as in the theory of branching processes, which extensively uses probability generating functions. The theory of iteration of real functions leads to limit theorems for the discrete-time and real-time Markov branching processes. The Poisson reproduction of particles in real time is analysed through the integration of the Kolmogorov equation. These results are further extended by employing graphical representations of Koenigs functions under subcritical and critical branching mechanisms. The limit conditional law in the subcritical case and the invariant measure for the critical case are discussed as well. The obtained explicit solutions contain the exponential Bell polynomials and the modified exponential-integral function $\mathrm{Ein}(z)$.

2512.10857 2026-05-14 cs.LG cs.AI stat.ML

Generative Modeling from Black-box Corruptions via Self-Consistent Stochastic Interpolants

Chirag Modi, Jiequn Han, Eric Vanden-Eijnden, Joan Bruna

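AI summary: This paper studies how to build generative models from data corrupted through a black-box channel. The authors propose a self-consistent method based on stochastic interpolants (SCSI) that iteratively updates a transport map between corrupted and clean data, relying only on the corrupted dataset and black-box access to the corruption channel, and thereby inverts the channel at the level of distributions. The method offers computational efficiency, flexibility, and theoretical guarantees, and performs well on natural image processing and scientific reconstruction tasks.

Details
Comments
Accepted at ICLR 2026
English abstract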

Transport-based methods have emerged as a leading paradigm for building generative models from large, clean datasets. However, in many scientific and engineering domains, clean data are often unavailable: instead, we only observe measurements corrupted through a noisy, ill-conditioned channel. A generative model for the original data thus requires solving an inverse problem at the level of distributions. In this work, we introduce a novel approach to this task based on Stochastic Interpolants: we iteratively update a transport map between corrupted and clean data samples using only access to the corrupted dataset as well as black-box access to the corruption channel. Under appropriate conditions, this iterative procedure converges towards a self-consistent transport map that effectively inverts the corruption channel, thus enabling a generative model for the clean data. We refer to the resulting method as the self-consistent stochastic interpolant (SCSI). It (i) is computationally efficient compared to variational alternatives, (ii) is highly flexible, handling arbitrary nonlinear forward models with only black-box access, and (iii) enjoys theoretical guarantees. We demonstrate superior performance on inverse problems in natural image processing and scientific reconstruction, and establish convergence guarantees of the scheme under appropriate assumptions. Our source code is publicly available at https://github.com/modichirag/SCSI

2511.14056 2026-05-14 cs.LG cs.AI cs.IT math.DG math.IT stat.ML

Radial Compensation: Fixing Radius Distortion in Chart-Based Generative Models on Riemannian Manifolds

Marios Papamichalis, Regina Ruane

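AI summary: This paper studies the base distribution in chart-based generative models on Riemannian manifolds. Standard methods sample in the Euclidean tangent space and then map to the manifold, which distorts geodesic distance: the same tangent-space scale can correspond to different geodesic radii under different charts, curvatures, and dimensions. The authors propose Radial Compensation (RC), which designs the base distribution so that the model realizes a user-specified law for the geodesic radius, improving training stability and giving cleaner curvature estimates. They also introduce balanced exponential charts, which improve numerical conditioning and decouple the statistical meaning of the model from its numerics.

Details
English abstract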

We study the base distribution in chart-based generative models on Riemannian manifolds. Standard methods sample in Euclidean tangent space and then map the sample to the manifold with a chart. This is convenient, but it changes the meaning of distance: the same tangent-space scale can correspond to different geodesic radii, i.e. shortest-path distances from a reference point on the manifold, under different charts, curvatures, and dimensions. Within isotropic, scalar-Jacobian azimuthal charts, we show that no base distribution can simultaneously preserve geodesic-radial likelihoods, chart-invariant radial Fisher information, and tangent-space isotropy unless it has a specific form, which we call Radial Compensation (RC). RC chooses the tangent-space base so that the model realizes a user-specified one-dimensional law for the geodesic radius, and leaves the chart available as a numerical preconditioner. This gives more stable training and cleaner curvature estimates, because curvature no longer has to compensate for distortions introduced by the chart. We also introduce balanced exponential charts, which improve conditioning without changing the realized manifold density under RC. This decouples the statistical meaning of the model, the law of the geodesic radius, from its numerical conditioning, which is governed by the chart Jacobian: chart choice becomes a numerical preconditioner rather than a hidden modeling decision. Across manifold variational autoencoders and continuous normalizing flows, RC matches the intended radius behavior, improves numerical stability, and makes learned curvature easier to interpret.

2510.18114 2026-05-14 cs.LG cs.AI stat.ML

Latent-Augmented Discrete Diffusion Models

Dario Shariatian, Alain Durmus, Umut Simsekli, Stefano Peluchetti

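AI summary: Discrete diffusion models show strong potential for language generation, but practical implementations typically use factored reverse transitions that ignore cross-token dependencies, which degrades few-step performance. This paper proposes Latent-Augmented Discrete Diffusion (LADD), which introduces a learnable auxiliary latent channel and performs diffusion over the joint (token, latent) space, capturing joint structure while preserving tractable parameterizations. In experiments, LADD improves unconditional generation metrics over state-of-the-art masked discrete diffusion baselines, particularly at low sampling budgets.

Details
English abstract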

Discrete diffusion models have emerged as a powerful class of models and a promising route to fast language generation, but practical implementations typically rely on factored reverse transitions ignoring cross-token dependencies and degrading few-step performance. We propose Latent-Augmented Discrete Diffusion (LADD), which introduces a learnable auxiliary latent channel and performs diffusion over the joint (token, latent) space. The latent variables provide an intermediate representation expressing joint structure while preserving tractable parameterizations. We instantiate LADD with continuous latents (Co-LADD) and discrete latents (Di-LADD), and study two inference schedules: a joint diffusion that denoises data and latents together, and a sequential diffusion that first resolves latents and then samples tokens conditionally. We derive ELBO-style objectives and analyze design choices that balance latent expressivity with diffusion compatibility. In experiments, LADD models yield improvements on unconditional generation metrics as compared to state-of-the-art masked discrete diffusion baselines, and are effective at lower sampling budgets, where unmasking many tokens per step is desirable.

2510.16986 2026-05-14 stat.ML cs.LG stat.OT

When to Transfer: Adaptive Source Selection for Positive Transfer in Linear Models

Hamza Cherkaoui, Hélène Halconruy, Yohan Petetin

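AI summary: In many practical settings, labeled data for the target task are scarce or costly to obtain, which limits supervised learning. This paper studies, in a multi-source setting, how to selectively transfer information from related source tasks through sample sharing. It proposes an accept/reject rule, based on a data-dependent estimate of the transfer gain, that decides which sources to draw from and how many samples to incorporate, and shows that the rule enforces positive transfer with high probability. Experiments on synthetic and real data show gains over classical and recent strong baselines while avoiding negative transfer.

Details
English abstract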

In many business settings, task-specific labeled data are scarce or costly to obtain, limiting supervised learning on a target task. A classical response is transfer learning (TL). Many TL works study how to transfer information from related sources. We study, for linear regression and classification, when to transfer via sample sharing: in a multi-source setting, we greedily decide from which sources and how many samples to incorporate into the target dataset. Our method uses an accept/reject rule based on a data-dependent estimate of the transfer gain, i.e., the marginal decrease in target predictive error, computed conditionally on the observed target samples. We analyze our approach and show how the derived statistical test enforces positive transfer with high probability. Under additional standard conditions, we also study the transfer gain itself and characterize when transfer is beneficial. Experiments on synthetic and real data show consistent gains over classical and recent strong baselines while avoiding negative transfer.

2510.16253 2026-05-14 cs.LG cs.AI q-bio.BM q-bio.QM stat.ML

Protein Folding with Neural Ordinary Differential Equations

Arielle Sanford, Shuo Sun, Christian B. Mendl

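AI summary: This paper proposes a continuous-depth Evoformer based on Neural Ordinary Differential Equations (Neural ODEs) for protein structure prediction. The method replaces the 48 discrete blocks of the original Evoformer with a Neural ODE parameterization that preserves the core attention-based operations while substantially reducing computational cost. Experiments show that the model produces structurally plausible predictions with far fewer resources and captures certain secondary structure elements, highlighting the promise of continuous-depth models for biomolecular modeling.

Details
Journal ref
Mach. Learn.: Sci. Technol. 7, 035008 (2026)
English abstract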

Recent advances in protein structure prediction, such as AlphaFold, have demonstrated the power of deep neural architectures like the Evoformer for capturing complex spatial and evolutionary constraints on protein conformation. However, the depth of the Evoformer, comprising 48 stacked blocks, introduces high computational costs and rigid layerwise discretization. Inspired by Neural Ordinary Differential Equations (Neural ODEs), we propose a continuous-depth formulation of the Evoformer, replacing its 48 discrete blocks with a Neural ODE parameterization that preserves its core attention-based operations. This continuous-time Evoformer achieves constant memory cost (in depth) via the adjoint method, while allowing a principled trade-off between runtime and accuracy through adaptive ODE solvers. Benchmarking on protein structure prediction tasks, we find that the Neural ODE-based Evoformer produces structurally plausible predictions and reliably captures certain secondary structure elements, such as alpha-helices, though it does not fully replicate the accuracy of the original architecture. However, our model achieves this performance using dramatically fewer resources, just 17.5 hours of training on a single GPU, highlighting the promise of continuous-depth models as a lightweight and interpretable alternative for biomolecular modeling. This work opens new directions for efficient and adaptive protein structure prediction frameworks.
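The link between stacked blocks and a continuous-depth model can be illustrated with a toy example. A residual update of the form x ← x + h·f(x, t) is one explicit Euler step of the ODE dx/dt = f(x, t), so the number of steps plays the role of depth. The sketch below is not the Evoformer: `vector_field` is a hypothetical stand-in for a learned block, and a fixed-step Euler solver is used for simplicity where the abstract mentions adaptive solvers.

```python
import math


def vector_field(x, t):
    """Toy dynamics dx/dt = -x, standing in for a learned block (hypothetical)."""
    return -x


def euler_integrate(x0, t0, t1, steps):
    """Fixed-step explicit Euler; each step has the shape of one residual block."""
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x = x + h * vector_field(x, t)  # residual update x <- x + h * f(x, t)
        t += h
    return x


exact = math.exp(-1.0)                        # closed-form solution at t = 1 for x0 = 1
coarse = euler_integrate(1.0, 0.0, 1.0, 4)    # few function evaluations, larger error
fine = euler_integrate(1.0, 0.0, 1.0, 48)     # more function evaluations, smaller error
```

Changing `steps` trades runtime for accuracy without changing the parameters of `vector_field`, which is the kind of trade-off the abstract attributes to the solver.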

2509.24728 2026-05-14 cs.LG stat.ML

Beyond Softmax: A Natural Parameterization for Categorical Random Variables

Alessandro Manenti, Cesare Alippi

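AI summary: This paper proposes the catnat function as an alternative to the softmax for parameterizing categorical random variables. From an information-geometric perspective, the authors expose limitations of the softmax and construct catnat from a sequence of hierarchical binary splits, which yields a diagonal Fisher Information Matrix and benefits gradient descent. Experiments on graph structure learning, variational autoencoders, and reinforcement learning show improved learning efficiency and test performance; catnat is simple to implement and compatible with existing training techniques.

Details
English abstract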

Latent categorical variables are frequently found in deep learning architectures. They can model actions in discrete reinforcement-learning environments, represent categories in latent-variable models, or express relations in graph neural networks. Despite their widespread use, their discrete nature poses significant challenges to gradient-descent learning algorithms. While a substantial body of work has offered improved gradient estimation techniques, we take a complementary approach. Specifically, we: 1) revisit the ubiquitous $\textit{softmax}$ function and demonstrate its limitations from an information-geometric perspective; 2) replace the $\textit{softmax}$ with the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits; we prove that this choice offers significant advantages to gradient descent due to the resulting diagonal Fisher Information Matrix. A rich set of experiments - including graph structure learning, variational autoencoders, and reinforcement learning - empirically show that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance. $\textit{Catnat}$ is simple to implement and seamlessly integrates into existing codebases. Moreover, it remains compatible with standard training stabilization techniques and, as such, offers a better alternative to the $\textit{softmax}$ function.
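To make "a sequence of hierarchical binary splits" concrete, the sketch below maps K-1 logits to K category probabilities through a complete binary tree of sigmoid decisions. This is only one plausible reading: the abstract does not give the exact catnat construction, so the balanced tree, the heap-style node layout, and the name `tree_probs` are all assumptions, and the sketch makes no claim about the Fisher information of this particular layout.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tree_probs(logits):
    """Map K-1 internal-node logits of a complete binary tree to K leaf probabilities.

    Nodes are stored in heap order: node i has children 2i+1 (left) and 2i+2 (right).
    """
    k = len(logits) + 1
    if k < 2 or k & (k - 1):
        raise ValueError("this sketch needs K to be a power of two")
    probs = []
    for leaf in range(k):
        node = leaf + k - 1            # heap index of this leaf
        p = 1.0
        while node > 0:                # walk up to the root, multiplying split probabilities
            parent = (node - 1) // 2
            go_left = sigmoid(logits[parent])
            p *= go_left if node == 2 * parent + 1 else 1.0 - go_left
            node = parent
        probs.append(p)
    return probs


probs = tree_probs([0.0, 1.0, -1.0])   # K = 4 categories from 3 logits
uniform = tree_probs([0.0, 0.0, 0.0])  # all-zero logits give the uniform distribution
```

Each category probability is a product of binary decisions along one root-to-leaf path, so the outputs sum to one by construction, with no normalizing denominator as in the softmax.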

2507.22095 2026-05-14 stat.ML cs.LG math.PR

Posterior Bayesian Neural Networks with Dependent Weights

Nicola Apollonio, Giovanni Franzina, Giovanni Luca Torrisi

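AI summary: This paper studies fully connected feedforward deep neural networks with dependent and possibly heavy-tailed weights, which aim to overcome limitations of the standard Gaussian prior. Taking the viewpoint of the posterior distribution under a Gaussian likelihood, it analyzes the posterior of the output as the network width tends to infinity and identifies this posterior when the random covariance matrix is positive definite under the prior. It also gives mild sufficient conditions for the invertibility of the covariance matrix, and sufficient conditions on model parameters (the activation function and the associated Lévy measures) under which the sequential limits do not depend on the order, extending earlier results.

Details
Comments
2 figures
English abstract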

We consider fully connected and feedforward deep neural networks with dependent and possibly heavy-tailed weights, as introduced in [26], to address limitations of the standard Gaussian prior. It has been proved in [26] that, as the number of nodes in the hidden layers grows large, according to a sequential and ordered limit, the law of the output converges weakly to a Gaussian mixture. In this paper, we study the neural network through the lens of the posterior distribution with a Gaussian likelihood. If the random covariance matrix of the infinite-width limit is positive definite under the prior, we identify the posterior distribution of the output in the wide-width limit according to a sequential regime. Remarkably, we provide mild sufficient conditions to ensure the aforementioned invertibility of the random covariance matrix under the prior, thereby extending the results in [8]. Among our results, we present sufficient conditions on some model parameters (the activation function and the associated Lévy measures) which ensure that the sequential limits are independent of the order. We illustrate our findings with examples and numerical simulations.

2507.17172 2026-05-14 stat.ME stat.AP

Local graph estimation with pathwise false discovery control

Omar Melikechi, David B. Dunson, Noureddine Melikechi, Jeffrey W. Miller

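AI summary: This paper proposes local graph estimation, which focuses on variables of scientific interest within a complex network and infers the structure around them. It introduces pathwise feature selection (PFS), which estimates local subgraphs by iteratively applying feature selection and propagating uncertainty along network paths, with rigorous finite-sample false discovery control. Across several real applications, PFS recovers interpretable networks consistent with domain knowledge, uncovering established mechanisms and suggesting new hypotheses.

Details
English abstract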

Many datasets include a small set of variables, such as biomarkers or clinical outcomes, whose relationships to the broader system are of primary scientific interest. Estimating the full network of inter-variable relationships in such settings often obscures local structures around these targets, limiting interpretability. To address this fundamental problem, we introduce local graph estimation, a statistical framework for inferring substructures around target variables. We show that traditional graph estimation methods often fail to recover local structure, and present pathwise feature selection (PFS) as an effective alternative. PFS estimates local subgraphs by iteratively applying feature selection and propagating uncertainty along network paths, providing rigorous finite-sample false discovery control even in settings with mixed variable types and nonlinear dependencies. In four distinct applications spanning environmental and public health, multiomics, brain connectomics, and single-nucleus RNA sequencing, PFS recovers interpretable networks consistent with domain knowledge, highlighting its ability to uncover established mechanisms and generate novel hypotheses.

2507.10797 2026-05-14 cs.LG math.OC stat.ML

Multi-Armed Sampling Problem and the End of Exploration

Mohammad Pedramfar, Siamak Ravanbakhsh

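AI summary: This paper introduces the multi-armed sampling framework as the sampling counterpart of the multi-armed bandit optimization problem, in order to rigorously analyze the exploration-exploitation trade-off in sampling. It systematically defines notions of regret for this framework, establishes lower bounds, and proposes a simple algorithm with near-optimal regret; the theory suggests that, unlike optimization, sampling barely requires exploration. Using a temperature parameter, the paper also defines a continuous family of problems connecting multi-armed sampling and multi-armed bandits, providing a theoretical foundation for work on sampling such as neural samplers and entropy-regularized reinforcement learning.

Details
Comments
29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026
English abstract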

This paper introduces the framework of multi-armed sampling, which serves as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off in the context of sampling. We systematically define plausible notions of regret for this framework and establish corresponding lower bounds. We then propose a simple algorithm that achieves near-optimal regret bounds. Our theoretical results suggest that, in contrast to optimization, sampling barely requires any exploration. To further connect our findings with those of multi-armed bandits, we define a continuous family of problems and associated regret measures that smoothly interpolate and unify multi-armed sampling and multi-armed bandit problems using a temperature parameter. We believe that the multi-armed sampling framework and our findings in this setting can play a foundational role in the study of sampling, including recent neural samplers, much like the role of multi-armed bandits in reinforcement learning. In particular, our work sheds light on the role of exploration (or lack thereof) and the convergence properties of algorithms for entropy-regularized reinforcement learning, fine-tuning of pretrained models and reinforcement learning with human feedback (RLHF).

2506.03120 2026-05-14 stat.AP cs.LG

Validating remotely sensed biomass estimates with forest inventory data in the western US

Xiuyu Cao, Joseph O. Sexton, Panshi Wang, Dimitrios Gounaridis, Neil H. Carter, Kai Zhu

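AI summary: This study validates the aboveground biomass density (AGBD) data offered by the commercial remote sensing company terraPulse, using US Forest Service Forest Inventory and Analysis (FIA) data as an independent reference. Validation is carried out over 64,000-hectare hexagons and at the county scale in the US states of Nevada, Utah, and Washington; at the county scale, terraPulse and FIA agree closely, with $R^2 = 0.90$ and a correlation coefficient of 0.95. The study also explains where terraPulse deviates from FIA in non-forest areas and high-biomass forests, and presents a scalable validation framework based on independent FIA data, together with a benchmark validation of a new commercial dataset for global biomass monitoring.

Details
Journal ref
Science of Remote Sensing, Volume 13, June 2026, 100441
Comments
32 pages, 5 figures
English abstract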

Monitoring aboveground biomass (AGB) and its density (AGBD) at high resolution is essential for carbon accounting and ecosystem management. While NASA's spaceborne Global Ecosystem Dynamics Investigation (GEDI) LiDAR mission provides globally distributed reference measurements for AGBD estimation, the majority of commercial remote sensing products based on GEDI remain without rigorous or independent validation. Here, we present an independent regional validation of an AGBD dataset offered by terraPulse, Inc., based on independent reference data from the US Forest Service Forest Inventory and Analysis (FIA) program. Aggregated to 64,000-hectare hexagons and US counties across the US states of Utah, Nevada, and Washington, we found very strong agreement between terraPulse and FIA estimates. At the hexagon scale, we report $R^2 = 0.88$, RMSE = 26.68 Mg/ha, and a correlation coefficient ($r$) of 0.94. At the county scale, agreement improves to $R^2 = 0.90$, RMSE = 32.62 Mg/ha, slope = 1.07, and $r = 0.95$. Spatial and statistical analyses indicated that terraPulse AGBD values tended to exceed FIA estimates in non-forest areas, likely due to FIA's limited sampling of non-forest vegetation. The terraPulse AGBD estimates also exhibited lower values in high-biomass forests, likely due to saturation effects in its optical remote-sensing covariates. This study advances operational carbon monitoring by delivering a scalable framework for comprehensive AGBD validation using independent FIA data, as well as a benchmark validation of a new commercial dataset for global biomass monitoring.
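The agreement statistics quoted above can be computed from paired estimates as in the sketch below. The reported values are consistent with $R^2$ being the squared Pearson correlation ($0.94^2 \approx 0.88$, $0.95^2 \approx 0.90$), so that definition is assumed here; the abstract does not state it, nor which variable is regressed on which for the slope. The function name and the numbers are hypothetical and are not data from the study.

```python
import math


def agreement_metrics(reference, predicted):
    """Return (r, r_squared, rmse, slope) for paired estimates."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least two paired values")
    mean_x = sum(reference) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, predicted))
    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation coefficient
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(reference, predicted)) / n)
    slope = sxy / sxx                       # OLS slope of predicted regressed on reference (assumed direction)
    return r, r * r, rmse, slope


# Hypothetical AGBD values in Mg/ha, for illustration only.
fia = [10.0, 50.0, 120.0, 200.0]
product = [14.0, 55.0, 118.0, 190.0]
r, r2, rmse, slope = agreement_metrics(fia, product)
```

A slope above 1 together with a high correlation would mean the product rises faster than the reference on average, which is why slope and RMSE are worth reporting alongside $r$.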

2505.17469 2026-05-14 cs.LG cs.AI cs.IT math.IT math.OC math.ST stat.TH

Efficient compression of neural networks and datasets

Lukas Silvester Barth, Paulo von Petersenn

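AI summary: This paper addresses efficient compression of neural networks and datasets. Combining algorithmic information theory with neural network pruning, it proposes an approach to improving generalization based on the minimum description length (MDL) principle. Using parameter sparsity as a computable approximation of model description length and refining sparse optimization algorithms, the authors achieve substantial model compression on image and text datasets with minimal accuracy loss. Experiments also confirm that compressed models are more sample-efficient and generalize better, as Solomonoff induction predicts.

Details
Comments
10 pages plus appendix, 9 Figures, 6 Tables
English abstract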

Compression and generalization are fundamentally related through Solomonoff induction and the minimum description length principle (MDL), which predict that simpler models generalize better when data arises from low-complexity distributions. In this article, we combine insights from algorithmic information theory and techniques from neural network pruning to improve model generalization by identifying the most effective data compression method. Since exact MDL optimization is intractable, we cast it as $\ell_0$ regularized learning and explain why parameter sparsity provides an effective computable approximation of model description length. To identify the best practical approach, we systematically compare and refine complementary sparse optimization methods. In particular, we improve probabilistic pruning through a procedure that does not require Monte Carlo sampling and refine smooth $\ell_0$ approximations with a binary search routine that reduces hyperparameter complexity. Across convolutional networks and transformers evaluated on image and text datasets, our refined methods improve upon their predecessors, achieve substantial model compression with minimal accuracy loss, and yield short data description lengths. Finally, we use these methods in a controlled teacher-student setting to empirically verify the prediction of Solomonoff induction that compressed models learn more sample-efficiently and generalize better.
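One common way to write an $\ell_0$ regularized objective of the kind mentioned above is sketched below. The notation ($\theta$ for parameters, $D$ for data, $\lambda$ for the trade-off weight) is assumed for illustration and is not taken from the paper, whose exact formulation the abstract does not give.

```latex
\min_{\theta}\;
  \underbrace{-\log p(D \mid \theta)}_{\text{description length of the data given the model}}
  \;+\; \lambda\,
  \underbrace{\lVert \theta \rVert_0}_{\text{number of nonzero parameters}},
\qquad
\lVert \theta \rVert_0 = \bigl|\{\, i : \theta_i \neq 0 \,\}\bigr|
```

Read as a two-part code, the first term charges for encoding the data with the model and the second for encoding the model itself, with the count of nonzero parameters serving as the computable proxy for model description length.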

2505.14587 2026-05-14 stat.ML cs.LG

High-Dimensional Analysis of Bootstrap Ensemble Classifiers

Malik Tiomoko, Hamza Cherkaoui, Mohamed El Amine Seddik, Cosme Louart, Ekkehard Schnoor, Balazs Kegl

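AI summary: This paper gives a theoretical analysis of bootstrap methods applied to ensembles of Least Squares Support Vector Machine (LSSVM) classifiers, focusing on the regime of large sample size and feature dimension. Using tools from random matrix theory, it studies the performance of a classifier that aggregates the decision functions of multiple weak classifiers and examines how bootstrap methods behave in high dimensions. Based on this analysis, it proposes strategies for choosing the number of subsets and the regularization parameter to improve the LSSVM ensemble, and experiments on synthetic and real datasets validate the theory.

Details
English abstract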

Bootstrap methods have long been the cornerstone of ensemble learning in machine learning. This paper presents a theoretical analysis of bootstrap techniques applied to the Least Squares Support Vector Machine (LSSVM) ensemble in the context of large and growing sample sizes and feature dimensionalities. Using tools from Random Matrix Theory, we investigate the performance of this classifier that aggregates decision functions from multiple weak classifiers, each trained on different subsets of the data. We provide insights into the use of bootstrap methods in high-dimensional settings, enhancing our understanding of their impact. Based on these findings, we propose strategies to select the number of subsets and the regularization parameter that maximize the performance of the LSSVM. Empirical experiments on synthetic and real-world datasets validate our theoretical results.

2505.04613 2026-05-14 stat.ML cs.LG math.ST stat.TH

Kernel Embeddings and the Separation of Measure Phenomenon

Leonardo V. Santoro, Kartik G. Waghmare, Victor M. Panaretos

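AI summary: This paper studies the ability of kernel embeddings to distinguish continuous probability distributions and proves that kernel covariance embeddings achieve information-theoretically perfect separation. It shows that testing the equality of two non-atomic probability measures on a locally compact uncountable Polish space is equivalent to testing the singularity of two centered Gaussian measures on a reproducing kernel Hilbert space. This phenomenon points to a core mechanism behind the effectiveness of kernel methods in high-dimensional or complex domains and offers a theoretical basis for designing efficient inference tools.

Details
English abstract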

We prove that kernel covariance embeddings lead to information-theoretically perfect separation of distinct continuous probability distributions. In statistical terms, we establish that testing for the *equality* of two non-atomic (Borel) probability measures on a locally compact uncountable Polish space is *equivalent* to testing for the *singularity* between two centered Gaussian measures on a reproducing kernel Hilbert space. The corresponding Gaussians are defined via the notion of kernel covariance embedding of a probability measure, and the Hilbert space is that generated by the embedding kernel. Distinguishing singular Gaussians is structurally simpler from an information-theoretic perspective than non-parametric two-sample testing, particularly in complex or high-dimensional domains. This is because singular Gaussians are supported on essentially separate and affine subspaces. Our proof leverages the classical Feldman-Hájek dichotomy, and shows that even a small perturbation of a continuous distribution will be maximally magnified through its Gaussian embedding. This "separation of measure phenomenon" appears to be a blessing of infinite dimensionality, by means of embedding, with the potential to inform the design of efficient inference tools in considerable generality. The elicitation of this phenomenon also appears to crystallize, in a precise and simple mathematical statement, a core mechanism underpinning the empirical effectiveness of kernel methods.

2504.03158 2026-05-14 stat.ML cs.LG

Accelerating Particle-based Energetic Variational Inference

Xuelian Bao, Lulu Kang, Chun Liu, Yiwei Wang

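AI summary: This paper proposes a particle-based variational inference method that accelerates the existing Energetic Variational Inference with Implicit scheme (EVI-Im). Drawing on energy quadratization and operator splitting, the method efficiently drives particles toward the target distribution while retaining a stability mechanism. Unlike EVI-Im, it avoids repeatedly evaluating inter-particle interaction terms within each time step, which significantly reduces computational cost, and the framework extends to other gradient-based sampling techniques. Experiments show advantages in efficiency and robustness in certain regimes, with performance competitive with existing particle-based variational inference methods.

Details
Comments
22 pages, 6 figures, 2 tables
English abstract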

In this work, we propose a new particle-based variational inference (ParVI) method for accelerating the Energetic Variational Inference with Implicit scheme (EVI-Im) introduced in Ref. \cite{wang2021particle}. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, the proposed method efficiently drives particles towards the target distribution, while retaining a meaningful stability mechanism. Unlike EVI-Im, which employs the implicit Euler method to solve variational-preserving particle dynamics obtained from a "discretization-then-variation" approach for minimizing the Kullback--Leibler divergence, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms within each time step, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that the proposed method achieves competitive performance compared with existing ParVI approaches, while offering advantages in efficiency and robustness in certain regimes.

2502.02270 2026-05-14 cs.LG math.OC stat.ML

Exact Sequence Interpolation with Transformers

Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua

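AI summary: This paper studies the ability of transformers to interpolate finite input sequences and proves that they can exactly interpolate datasets of input sequences of arbitrary finite length in $\mathbb{R}^d$ with their corresponding output sequences. By alternating feed-forward and self-attention layers and exploiting the clustering effect of self-attention, the authors construct a transformer whose complexity is independent of the input sequence length. The construction uses low-rank parameter matrices, as in practical implementations, is first established for hardmax self-attention and then extended to softmax, and comes with convergence guarantees under regularized training, offering a new perspective on the theoretical performance of transformers.

Details
Comments
36 pages, 9 figures. Funded by the European Union (Horizon Europe MSCA project ModConFlex, grant number 101073558)
English abstract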

We prove that transformers can exactly interpolate datasets of finite input sequences in $\mathbb{R}^d$, $d\geq 2$, with corresponding output sequences of smaller or equal length. Specifically, given $N$ sequences of arbitrary but finite lengths in $\mathbb{R}^d$ and output sequences of lengths $m^1, \dots, m^N \in \mathbb{N}$, we construct a transformer with $\mathcal{O}(\sum_{j=1}^N m^j)$ blocks and $\smash{\mathcal{O}(d \sum_{j=1}^N m^j)}$ parameters that exactly interpolates the dataset. Our construction provides complexity estimates that are independent of the input sequence length, by alternating feed-forward and self-attention layers and by capitalizing on the clustering effect inherent to the latter. Our novel constructive method also uses low-rank parameter matrices in the self-attention mechanism, a common feature of practical transformer implementations. These results are first established in the hardmax self-attention setting, where the geometric structure permits an explicit and quantitative analysis, and are then extended to the softmax setting. Finally, we demonstrate the applicability of our exact interpolation construction to learning problems, in particular by providing convergence guarantees to a global minimizer under regularized training strategies. Our analysis contributes to the theoretical understanding of transformer models, offering an explanation for their excellent performance in exact sequence-to-sequence interpolation tasks.
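For readers unfamiliar with the hardmax setting, the sketch below contrasts softmax attention weights with a hardmax that puts uniform weight on the highest-scoring positions, and shows that a sharply scaled softmax approaches it. This is a standard relationship, offered only as orientation: the paper's precise hardmax definition and the way its results are carried over to softmax are not given in the abstract, and the scaling parameter `beta` is an assumption of this sketch.

```python
import math


def softmax(scores, beta=1.0):
    """Softmax weights with inverse temperature beta (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hardmax(scores):
    """Uniform weights on the positions that attain the maximum score, zero elsewhere."""
    m = max(scores)
    winners = [1.0 if s == m else 0.0 for s in scores]
    total = sum(winners)
    return [w / total for w in winners]


scores = [1.0, 3.0, 2.0]                 # hypothetical attention scores for one query
soft = softmax(scores, beta=1.0)         # spreads weight over all positions
sharp = softmax(scores, beta=50.0)       # nearly all weight on the top score
hard = hardmax(scores)                   # exactly [0.0, 1.0, 0.0]
```

With hardmax, each token attends only to its top-scoring tokens, which gives the piecewise, geometric structure that makes explicit constructions easier to analyze.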

2501.01541 2026-05-14 stat.ME

Denoising Diffused Embeddings: a Generative Approach for Hypergraphs

Shihao Wu, Junyi Yang, Gongjun Xu, Ji Zhu

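AI summary: This paper studies how to generate new hyperlinks from high-dimensional hypergraph data, a task with important applications in areas such as electronic health record analysis and biological research. To address the discreteness of hyperlinks, the limited interpretability of existing models, and the structural complexity of hypergraphs, the authors propose Denoising Diffused Embeddings (DDE), which links discrete hyperlinks to a continuous latent embedding space through a conditional hyperlink likelihood model and reconstructs that space with a score-based diffusion model. Theoretical analysis shows that, when the true latent embeddings are accessible, DDE reduces high-dimensional hyperlink generation to low-dimensional embedding generation, and experiments confirm its advantages in computational efficiency and generative performance.

Details
English abstract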

Hypergraph data, which capture multi-way interactions among entities, are increasingly prevalent in the big data era. Generating new hyperlinks from an observed, usually high-dimensional hypergraph is an important yet challenging task with diverse applications in areas such as electronic health record analysis and biological research. This task is fraught with several challenges. The discrete nature of hyperlinks renders many existing generative models inapplicable. Additionally, powerful machine learning-based generative models often operate as black boxes, providing limited interpretability. Key structural characteristics of hypergraphs, including node degree heterogeneity and hyperlink sparsity, further complicate the modeling process and must be carefully addressed. To tackle these challenges, we propose Denoising Diffused Embeddings (DDE), a general and efficient generative modeling architecture for hypergraphs. DDE exploits low-rank structure in high-dimensional hypergraphs via a conditional hyperlink likelihood model that links discrete hyperlinks to a continuous latent embedding space and leverages a score-based diffusion model to reconstruct that space. Theoretically, we show that when true latent embeddings are accessible, DDE exactly reduces the task of generating new high-dimensional hyperlinks to generating new low-dimensional embeddings. Moreover, we analyze the implications of using estimated embeddings in DDE, revealing how hypergraph characteristics such as dimensionality, node degree heterogeneity, and hyperlink sparsity impact its generative performance. Simulation studies demonstrate the superiority of DDE over existing methods, in terms of both computational efficiency and generative performance. Furthermore, an application to a symptom co-occurrence hypergraph derived from electronic medical records uncovers interesting findings and highlights the advantages of DDE.

2411.04229 2026-05-14 stat.ME

Detecting State Changes in Functional Neuronal Connectivity using Factorial Switching Linear Dynamical Systems

Yiwei Gong, Susanna B. Mierau, Sinead A. Williamson

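AI summary: This paper studies how to identify time-evolving functional connectivity from recordings of neuronal activity over time, and proposes a switching linear dynamical system based on the factorial hidden Markov model that allows multiple latent subnetworks to be active jointly or independently. The model better reflects the fact that a local change in a neuronal network need not alter the whole connectivity pattern. Paired with a scalable variational inference algorithm, it infers latent states and model parameters effectively and yields insights about the maturation of neuronal activity in in vitro neuronal cultures.

Details
English abstract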

A key question in brain sciences is how to identify time-evolving functional connectivity, such as that obtained from recordings of neuronal activity over time. We wish to explain the observed phenomena in terms of latent states which, in the case of neuronal activity, might correspond to subnetworks of neurons within a brain or organoid. Many existing approaches assume that only one latent state can be active at a time, in contrast to our domain knowledge. We propose a switching dynamical system based on the factorial hidden Markov model. Unlike existing approaches, our model acknowledges that neuronal activity can be caused by multiple subnetworks, which may be activated either jointly or independently. A change in one part of the network does not mean that the entire connectivity pattern will change. We pair our model with a scalable variational inference algorithm, using a concrete relaxation of the underlying factorial hidden Markov model, to effectively infer the latent states and model parameters. We show that our algorithm can recover ground-truth structure and yield insights about the maturation of neuronal activity in microelectrode array recordings from in vitro neuronal cultures.

2410.16477 2026-05-14 stat.ME stat.ML

Finite-Sample and Distribution-Free Fair Classification: Optimal Trade-off Between Excess Risk and Fairness, and the Cost of Group-Blindness

Xiaotian Hou, Linjun Zhang

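AI summary: This paper studies the optimal trade-off between excess risk and fairness for fair classification in the finite-sample, distribution-free setting. It proposes a unified framework, applicable to both group-aware and group-blind scenarios, that provides fairness guarantees with controlled excess risk. The method is a post-processing procedure that can be applied to arbitrary black-box models, which makes it practical. Theoretical analysis shows that the algorithm is near-optimal in the minimax sense, and extensive experiments support its effectiveness.

Details
Abstract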

Algorithmic fairness has become a central concern in modern machine learning and AI applications. However, two pressing challenges remain: (1) The fairness guarantees of existing methods often rely on specific data distributional assumptions and large sample sizes, which can lead to fairness violations in practice. (2) Due to legal and societal considerations, using sensitive group attributes during decision-making (referred to as the group-blind setting) may not always be feasible. In this work, we quantify the impact of enforcing algorithmic fairness and group-blindness/awareness in binary classification under group fairness constraints. Specifically, we propose a unified framework for fair classification that provides distribution-free and finite-sample fairness guarantees with controlled excess risk. This framework is applicable to various group fairness notions in both group-aware and group-blind scenarios. Our approach is based on a post-processing procedure that can be applied to arbitrary black-box models, making it directly compatible with modern machine learning pipelines. Furthermore, we establish a minimax lower bound showing the minimax rate-optimality of our proposed algorithm up to logarithmic factors. Through extensive synthetic and real data studies, we further demonstrate the competitive or superior performance of our algorithm compared to existing methods, and provide empirical support for our theoretical findings.

2407.11518 2026-05-14 stat.ML cs.LG stat.OT

Ensemble Transport Filter via Optimized Maximum Mean Discrepancy

Dengfei Zeng, Lijian Jiang

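AI summary: This paper proposes an ensemble transport filter based on an optimized Maximum Mean Discrepancy (MMD). It improves the analysis step of the particle filter by constructing a transport map that moves prior particles directly to posterior particles. The transport map is optimized with an MMD loss that matches the expectation information of the approximated and reference posteriors, and a variance penalty term is added for robustness, which improves posterior estimation in high-dimensional data assimilation problems. Numerical experiments indicate that the method outperforms the ensemble Kalman filter.

Details
Comments
27 pages, 14 figures
Abstract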

In this paper, we present a new ensemble-based filter method that reconstructs the analysis step of the particle filter through a transport map, which directly transports prior particles to posterior particles. The transport map is constructed by solving an optimization problem defined by the Maximum Mean Discrepancy (MMD) loss function, which matches the expectation information of the approximated posterior and the reference posterior. The proposed method inherits the accurate estimation of the posterior distribution from particle filtering while extending to high-dimensional assimilation problems. To improve the robustness of MMD, a variance penalty term is used to guide the optimization. It prioritizes minimizing the discrepancy between the expectations of highly informative statistics for the reference posteriors. The penalty term significantly enhances the robustness of the proposed method and leads to a better approximation of the posterior. A few numerical examples are presented to illustrate the advantage of the proposed method over the ensemble Kalman filter.
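
To make the loss in this abstract concrete, here is a minimal sketch of an empirical squared MMD between two particle sets. It shows only the discrepancy itself: the transport map, its optimization, and the variance penalty are not reproduced. The Gaussian kernel, the `bandwidth` parameter, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points of equal dimension (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy particle sets: the second is a shifted copy of the first.
prior = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.0)]
shifted = [(x + 3.0, y + 3.0) for x, y in prior]

print(mmd_squared(prior, prior))    # identical samples give zero discrepancy
print(mmd_squared(prior, shifted))  # a shift gives a strictly positive value
```

In a transport-filter setting one would minimize such a quantity over the parameters of the map; the double loop costs O(n^2) kernel evaluations, which is acceptable for small ensembles.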

2407.01602 2026-05-14 cs.CL cs.LG math.DS stat.ML

Clustering in pure-attention hardmax transformers and its role in sentiment analysis

Albert Alcalde, Giovanni Fantuzzi, Enrique Zuazua

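AI summary: This paper studies the behavior of pure-attention transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity, showing that the inputs converge to a clustered equilibrium determined by special points called "leaders". Viewing transformers as discrete-time dynamical systems in Euclidean space, and using a geometric interpretation based on hyperplane separation, the authors build an interpretable transformer model for sentiment analysis that captures context by clustering meaningless words around meaningful leader words. The work provides a theoretical basis for understanding the mathematical properties of transformers and outlines the challenges that remain between theoretical analysis and practical implementation.

Details
Journal ref
SIAM Journal on Mathematics of Data Science 7(3):1367-1393, 2025
Comments
23 pages, 11 figures, 1 table. Funded by the European Union (Horizon Europe MSCA project ModConFlex, grant number 101073558). Accompanying code available at: https://github.com/DCN-FAU-AvH/clustering-hardmax-transformers
Abstract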

Transformers are extremely successful machine learning models whose mathematical properties remain poorly understood. Here, we rigorously characterize the behavior of transformers with hardmax self-attention and normalization sublayers as the number of layers tends to infinity. By viewing such transformers as discrete-time dynamical systems describing the evolution of points in a Euclidean space, and thanks to a geometric interpretation of the self-attention mechanism based on hyperplane separation, we show that the transformer inputs asymptotically converge to a clustered equilibrium determined by special points called *leaders*. We then leverage this theoretical understanding to solve sentiment analysis problems from language processing using a fully interpretable transformer model, which effectively captures 'context' by clustering meaningless words around leader words carrying the most meaning. Finally, we outline remaining challenges to bridge the gap between the mathematical analysis of transformers and their real-life implementation.

2406.13619 2026-05-14 stat.ML cs.LG

Generative Modeling by Minimizing the Wasserstein-2 Loss

Yu-Jui Huang, Zachariah Malik

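AI summary: This paper proposes a generative model that minimizes the second-order Wasserstein loss (the $W_2$ loss). It builds a distribution-dependent ordinary differential equation (ODE) from the Kantorovich potential associated with the true data distribution and a current estimate of it. The time-marginal laws of the ODE are shown to form a gradient flow for the $W_2$ loss that converges exponentially to the true data distribution. An Euler scheme for the ODE, combined with persistent training, yields an algorithm that outperforms Wasserstein generative adversarial networks in both low- and high-dimensional experiments.

Details
Abstract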

This paper develops a generative model by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE), whose dynamics involve the Kantorovich potential associated with the true data distribution and a current estimate of it. A main result shows that the time-marginal laws of the ODE form a gradient flow for the $W_2$ loss, which converges exponentially to the true data distribution. An Euler scheme for the ODE is proposed and shown to recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which naturally fits our gradient-flow approach. In both low- and high-dimensional experiments, our algorithm outperforms Wasserstein generative adversarial networks by increasing the level of persistent training appropriately.

2404.17772 2026-05-14 stat.ME stat.CO

PWEXP: An R Package Using Piecewise Exponential Model for Study Design and Event/Timeline Prediction

Tianchen Xu, Rachael Wen, Wen Zhang

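AI summary: This paper introduces PWEXP, an R package that uses the piecewise exponential (PWE) model for clinical trial design and event/timeline prediction. By partitioning the hazard function into intervals with constant hazard, the model balances flexibility with computational convenience and can predict the number of events and the analysis time more accurately, which makes study designs more reliable. The package selects change-points using criteria such as AIC, BIC, and cross-validation, and provides robust survival curve fitting and visualization, which helps improve sample size calculation and study timeline planning.

Details
Comments
37 pages, 15 figures
Abstract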

Parametric assumptions such as an exponential distribution are commonly used in clinical trial design and analysis. However, violations of distributional assumptions can introduce biases in sample size and power calculations. The piecewise exponential (PWE) hazard model partitions the hazard function into segments, each with a constant hazard, and is easy to interpret and compute. Due to its piecewise property, PWE can fit a wide range of survival curves and accurately predict the future number of events and analysis time in event-driven clinical trials, thus enabling more flexible and reliable study designs. Compared with other existing approaches, the PWE model provides a superior balance of flexibility and robustness in model fitting and prediction. The proposed PWEXP package is designed for estimating and predicting PWE hazard models for right-censored data. By utilizing well-established criteria such as AIC, BIC, and cross-validation log-likelihood, the PWEXP package chooses the optimal number of change-points and determines their optimal positions. With its good fit, the PWEXP package provides accurate and robust hazard estimation, which can be used for reliable power calculation at study design and timeline prediction during study conduct. The package also offers visualization functions to facilitate the interpretation of survival curve fitting results.
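
As a small illustration of the piecewise exponential idea described above, the sketch below evaluates the survival function implied by a piecewise-constant hazard. It is written in Python for illustration only and is not the PWEXP package's R interface. The function names and the argument layout (`breakpoints`, `hazards`) are hypothetical, and change-point selection and censoring are not covered.

```python
import math


def pwe_cumulative_hazard(t, breakpoints, hazards):
    """Cumulative hazard at time t for a piecewise-constant hazard.

    breakpoints: increasing change-points [c_1, ..., c_k]
    hazards:     k + 1 rates; hazards[i] applies on [c_i, c_{i+1}) with c_0 = 0
    """
    if len(hazards) != len(breakpoints) + 1:
        raise ValueError("need exactly one more hazard rate than change-points")
    total = 0.0
    start = 0.0
    for end, rate in zip(list(breakpoints) + [math.inf], hazards):
        if t <= start:
            break
        # Add the hazard accumulated on the part of [start, end) that lies before t.
        total += rate * (min(t, end) - start)
        start = end
    return total


def pwe_survival(t, breakpoints, hazards):
    """Survival probability S(t) = exp(-H(t)) under the piecewise exponential model."""
    return math.exp(-pwe_cumulative_hazard(t, breakpoints, hazards))


# Hazard 0.1 before time 2 and 0.3 afterwards: H(3) = 0.1 * 2 + 0.3 * 1 = 0.5.
print(pwe_survival(3.0, [2.0], [0.1, 0.3]))
```

With no change-points the function reduces to the ordinary exponential survival curve, which is the single-segment special case of the model.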

2312.04110 2026-05-14 stat.ML cs.LG physics.soc-ph

Small Area Estimation of Case Growths for Timely COVID-19 Outbreak Detection

Zhaowei She, Zilong Wang, Jagpreet Chhatwal, Turgay Ayer

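AI summary: This paper proposes a transfer-learning random forest framework (TLRF) for accurately estimating COVID-19 case growth rates in areas with small sample sizes, enabling timely outbreak detection. The method converts growth rate estimation into a regression task and uses the adaptive weighting mechanism of random forests to transfer information across space and time, balancing estimation accuracy against speed. Experiments show that TLRF outperforms existing methods in prediction, and in a Colorado case study it improved timely outbreak detection by up to 224%.

Details
Comments
Equal contributions by co-first authors Zhaowei She, Zilong Wang (in alphabetical order)
Abstract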

The COVID-19 pandemic has exerted a profound impact on the global economy and continues to exact a significant toll on human lives. The COVID-19 case growth rate stands as a key epidemiological parameter to estimate and monitor for effective detection and containment of the resurgence of outbreaks. A fundamental challenge in growth rate estimation and hence outbreak detection is balancing the accuracy-speed tradeoff, where accuracy typically degrades with shorter fitting windows. In this paper, we provide a transfer learning framework, which we call Transfer Learning Random Forest (TLRF), for an effective implementation of the random forests algorithm that balances this accuracy-speed tradeoff. Specifically, we develop an identification strategy that converts the growth rate estimation problem into a regression task, which enables effective transfer learning across space and time through random forests' adaptive weighting mechanism. As such, through adaptively choosing fitting window sizes based on relevant day-level and county-level features affecting the disease spread, TLRF can accurately estimate case growth rates for counties with small sample sizes. Out-of-sample prediction analysis shows that TLRF outperforms established growth rate estimation methods. Furthermore, we conducted a case study based on outbreak case data from the state of Colorado and showed that TLRF could improve timely detection of outbreaks by up to 224% when compared to the decisions made by Colorado's Department of Health and Environment (CDPHE). To demonstrate practical implementation, we developed a publicly available outbreak detection tool that operated from September 2020 through March 2023, receiving substantial attention from policymakers across all 50 states.

2208.12930 2026-05-14 stat.CO math.ST stat.TH

Joint distribution properties of Fully Conditional Specification under the normal linear model with normal inverse-gamma priors

Mingyang Cai, Stef van Buuren, Gerko Vink

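AI summary: This paper studies the joint distribution properties of fully conditional specification (FCS) under the normal linear model with normal inverse-gamma priors. Through theoretical analysis and simulation, it shows that FCS converges under this prior and that prior specification under the joint model is equivalent to prior specification under a set of conditional models. The work extends the convergence analysis of FCS to informative priors and gives multiple imputation of missing data a firmer theoretical footing.

Details
Journal ref
Scientific Reports, 2023, 13:644
Abstract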

Fully conditional specification (FCS) is a convenient and flexible multiple imputation approach. It specifies a sequence of simple regression models instead of a potentially complex joint density for missing variables. However, FCS may not converge to a stationary distribution. Many authors have studied the convergence properties of FCS when priors of conditional models are non-informative. We extend this analysis to the case of informative priors. This paper evaluates the convergence properties of the normal linear model with a normal inverse-gamma prior. The theoretical and simulation results prove the convergence of FCS and show the equivalence of prior specification under the joint model and a set of conditional models when the analysis model is a linear regression with normal inverse-gamma priors.

2208.12929 2026-05-14 stat.CO

Graphical and numerical diagnostic tools to assess multiple imputation models by posterior predictive checking

Mingyang Cai, Stef van Buuren, Gerko Vink

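AI summary: This paper proposes a diagnostic method based on posterior predictive checking for assessing whether multiple imputation models are adequate. It compares the observed data with replicated data generated from the posterior predictive distribution to judge whether the imputation model is congenial with the substantive analysis model. Simulations and an application examine the method for parametric and semi-parametric imputation approaches, continuous and discrete incomplete variables, and univariate and multivariate missingness patterns.

Details
Journal ref
Heliyon, Volume 9, Issue 6, 2023, e17077
Abstract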

Missing data are often dealt with by multiple imputation. A crucial part of the multiple imputation process is selecting sensible models to generate plausible values for incomplete data. A method based on posterior predictive checking is proposed to diagnose imputation models. To assess the congeniality of imputation models, the proposed diagnostic method compares the observed data with their replicates generated under the corresponding posterior predictive distributions. If the imputation model is congenial with the substantive model, the observed data are expected to be located in the centre of the corresponding posterior predictive distributions. Simulation and application are designed to investigate the proposed diagnostic method for parametric and semi-parametric imputation approaches, continuous and discrete incomplete variables, and univariate and multivariate missingness patterns. The results show the validity of the proposed diagnostic method.

1906.00573 2026-05-14 q-fin.ST q-fin.PM stat.AP

Conditional inference on the asset with maximum Sharpe ratio

Steven E. Pav

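AI summary: This paper studies conditional inference on the signal-noise ratio of the asset with the maximum sample Sharpe ratio among a set of possibly correlated assets. The author applies the procedure of Lee et al. and derives a multivariate analogue of the approximate standard error of the Sharpe ratio for use in this conditional estimation procedure. Several alternatives are also compared, including the Bonferroni correction, the chi-bar square test, and Follman's test. The results indicate that the conditional inference procedure maintains the nominal type I error rate and does not appear to be affected by non-normality of returns.

Details
Comments
code and latex source available from github repo, github.com/shabbychef/maxsharpe
Abstract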

We apply the procedure of Lee et al. to the problem of performing inference on the signal-noise ratio of the asset which displays maximum sample Sharpe ratio over a set of possibly correlated assets. We find a multivariate analogue of the commonly used approximate standard error of the Sharpe ratio to use in this conditional estimation procedure. We also consider several alternative procedures, including the simple Bonferroni correction for multiple hypothesis testing, which we fix for the case of positive common correlation among assets, the chi-bar square test against one-sided alternatives, Follman's test, and Hansen's asymptotic adjustments. Testing indicates the conditional inference procedure achieves nominal type I rate, and does not appear to suffer from non-normality of returns. The conditional estimation test has low power under the alternative where there is little spread in the signal-noise ratios of the assets, and high power under the alternative where a single asset has high signal-noise ratio. Unlike the alternative procedures, it appears to enjoy rejection probabilities monotonic in the signal-noise ratio of the selected asset, and actually maintains near-nominal rejection rates under the conditional null.
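
For orientation, the "commonly used approximate standard error of the Sharpe ratio" is usually taken to be sqrt((1 + SR^2 / 2) / n) for n i.i.d., approximately normal returns. The sketch below implements that univariate formula only, on the assumption that this is the formula the abstract refers to. The multivariate analogue and the conditional inference procedure from the paper are not reproduced, and the function names are illustrative.

```python
import math
import statistics


def sharpe_ratio(returns):
    """Sample Sharpe ratio: mean return over sample standard deviation (zero risk-free rate assumed)."""
    return statistics.mean(returns) / statistics.stdev(returns)


def sharpe_standard_error(sr, n):
    """Textbook approximation: se(SR) ~ sqrt((1 + SR^2 / 2) / n) for n i.i.d. normal returns."""
    return math.sqrt((1.0 + 0.5 * sr ** 2) / n)


# Toy per-period returns; the Sharpe ratio and n must refer to the same period length.
returns = [0.01, 0.03, 0.02]
sr = sharpe_ratio(returns)
print(sr, sharpe_standard_error(sr, len(returns)))
```

The approximation treats the assets one at a time, which is why selecting the best of many assets calls for the kind of conditional correction the abstract describes.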

2605.12577 2026-05-14 stat.AP

Circula-based multivariate distributions on the flat torus, with applications in structural biology

Guillaume Carrière, Alix Lhéritier, Frédéric Cazals

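AI summary: This paper studies how to model dependencies between random variables on the $d$-dimensional flat torus $\mathbb{T}^d$ independently of their marginals, with applications in structural biology. The authors propose a circula model based on a low-rank covariance structure, giving the first closed-form normalized distribution on the flat torus with covariance structure, and then build joint distribution models of backbone and side-chain torsion angles for neighboring amino acids in proteins. Experiments show that the models outperform existing methods in likelihood and sparsity, and the authors expect them to help structural biology move from discrete structural studies to thermodynamics and kinetics.

Details
Abstract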

Modeling dependencies between random variables independently of their marginals is fundamental in applications ranging from finance to (structural) biology. In this work, we address this problem using circulae to model data living on the $d$-dimensional flat torus $\mathbb{T}^d$, making two contributions. First, using a low-rank covariance structure to define circulae based on a latent variable model, we design the first closed-form normalized distribution on the flat torus $\mathbb{T}^d$ with covariance structure. Second, building on this framework, we propose the first models for joint distributions of torsion angles (backbone and side chains) for neighboring amino acids in proteins. In practice, we fit mixtures on flat tori from $\mathbb{T}^{2}$ to $\mathbb{T}^{14}$, and show they are state of the art in terms of likelihood and sparsity. We anticipate that these models will prove fundamental to move from discrete structural studies, as in AlphaFold2, to thermodynamics and kinetics, which are the ultimate goals in theoretical biophysics.

2605.12568 2026-05-14 math.ST math.PR stat.ML stat.TH

Non-asymptotic quantisation of spherically symmetric distributions

Luc Pronzato, Anatoly Zhigljavsky

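AI summary: This paper studies non-asymptotic quantisation of spherically symmetric distributions. Because the asymptotic behaviour of optimal quantisers in high dimensions is only visible at extremely large sample sizes, the authors consider random quantisers that perform very well at moderate sample sizes. Analysing quantisation points uniformly distributed on a sphere, they express the expected distortion in a form that can be computed to arbitrary precision and show how to determine the optimal sphere radius efficiently. Using extreme-value theory, they also derive approximations for the radius when the sample size scales with the dimension.

Details
Comments
24 pages, 14 figures
Abstract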

Zador's celebrated theorem is a cornerstone of optimal quantisation, establishing both the weak limit of the empirical distribution of an $n$-point optimal quantiser in $\mathbb{R}^d$ and the decay rate of the associated $L_s$-mean quantisation error. However, for large dimensions $d$, observing this asymptotic behaviour demands an astronomically large sample size $n$, which grows super-exponentially with $d$. Through a detailed analysis of the quantisation problem for spherically symmetric distributions, we demonstrate that for moderate $n$ random quantisers uniformly distributed on a sphere of suitable radius $r$ achieve exceptional performance. The expected distortion, expressed as a triple integral, can be computed with arbitrary precision, and the optimal radius $r$ can be efficiently determined numerically. Leveraging results from extreme-value theory, we derive approximations for $r$, particularly in scenarios where $n$ scales with $d$. Depending on the growth rate of $n$, $r$ may either converge to zero or approach a limiting value that is independent of $s$.
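
The random quantiser described above can be explored numerically with a short Monte Carlo sketch. It draws $n$ points uniformly on a sphere of radius $r$ and estimates the distortion as the average of the $s$-th power of the distance to the nearest point. This replaces the paper's triple-integral expression with plain simulation. The standard normal target, the values of `d`, `n`, and the radius, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def uniform_on_sphere(d, radius, rng):
    """Draw one point uniformly on the sphere of the given radius in R^d by normalising a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [radius * v / norm for v in g]


def empirical_distortion(samples, quantiser, s=2.0):
    """Average of (distance to the nearest quantiser point) ** s over the samples."""
    total = 0.0
    for x in samples:
        nearest = min(math.dist(x, q) for q in quantiser)
        total += nearest ** s
    return total / len(samples)


rng = random.Random(0)
d, n = 5, 20
# Standard normal samples serve as an example of a spherically symmetric distribution.
samples = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(500)]
quantiser = [uniform_on_sphere(d, 1.5, rng) for _ in range(n)]

print(empirical_distortion(samples, quantiser, s=2.0))
```

Scanning the radius over a grid and keeping the minimiser gives a crude numerical counterpart to the optimal $r$ discussed in the abstract, at the cost of Monte Carlo noise.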

2605.12532 2026-05-14 q-fin.TR cs.AI stat.ME

AgenticAITA: A Proof-Of-Concept About Deliberative Multi-Agent Reasoning for Autonomous Trading Systems

Ivan Letteri

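AI summary: Conventional algorithmic trading systems rely on deterministic heuristics or offline-trained statistical models and struggle to adapt to rapidly changing markets. This paper proposes AGENTICAITA, a multi-agent autonomous trading framework in which several large language model agents reason, negotiate, and act together, making trading decisions without offline training or human intervention. The framework introduces four architectural contributions: an adaptive Z-score trigger engine, a sequential deliberative pipeline, an inference gating protocol, and a correlation-break diversification score. A five-day dry run under live market conditions demonstrates the operational feasibility of the approach.

Details
Abstract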

Conventional algorithmic trading systems are grounded in deterministic heuristics or offline-trained statistical models that cannot adapt to the semantic complexity of rapidly shifting market regimes. This paper introduces AGENTICAITA, an agentic AI framework that replaces the traditional signal-then-execute paradigm with a fully autonomous deliberative loop in which multiple specialized Large Language Model agents reason, negotiate, and act in concert - without any offline training or human intervention. The framework proposes four architectural contributions: (i) an Adaptive Z-Score Trigger Engine that acts as a cognitive resource allocator, gating LLM inference exclusively on statistically anomalous market conditions; (ii) a Sequential Deliberative Pipeline - the core agentic contribution - in which an Analyst agent, a Risk Manager agent, and an Executor agent form a structured reasoning chain governed by typed JSON contracts and a deterministic hard-gate safety layer; (iii) an Inference Gating Protocol, a mutex-based cognitive resource scheduler that serializes concurrent agent activations and ensures fully reproducible audit trails; and (iv) a Correlation-Break Diversification composite score that operationalizes portfolio-level idiosyncratic signal prioritization within individual agent reasoning. Validated over a five-day autonomous dry-run session under live market conditions, the framework demonstrates operational correctness of the deliberative pipeline, achieving 157 zero-intervention invocations across 76 assets with an 11.5% agentic friction rate that confirms non-trivial inter-agent negotiation. This preliminary proof-of-concept establishes the feasibility of training-free, deterministic safety-constrained multi-agent orchestration in financial decision loops, with statistically robust performance evaluation and execution cost modeling deferred to extended live deployment.
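
The abstract does not specify how the Adaptive Z-Score Trigger Engine is implemented. As a rough illustration of the general idea of gating an expensive call on statistically anomalous observations, the sketch below uses a plain rolling z-score. The class name `ZScoreTrigger`, the window length, and the threshold are hypothetical choices and are not taken from the paper.

```python
import statistics
from collections import deque


class ZScoreTrigger:
    """Fire when a new value deviates from the rolling mean by more than `threshold` standard deviations."""

    def __init__(self, window=20, threshold=3.0):
        self.history = deque(maxlen=window)  # most recent observations only
        self.threshold = threshold

    def update(self, value):
        """Return True if `value` is anomalous relative to the history, then record it."""
        fire = False
        if len(self.history) >= 2:
            mean = statistics.fmean(self.history)
            std = statistics.stdev(self.history)
            if std > 0:
                fire = abs(value - mean) / std > self.threshold
        self.history.append(value)
        return fire


trigger = ZScoreTrigger(window=20, threshold=3.0)
observations = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0]
fired = [trigger.update(x) for x in observations]
print(fired)  # only an observation flagged True would be passed on to the costly step
```

Comparing each value against the history before recording it keeps an outlier from inflating its own baseline, which is one simple design choice among several possible ones.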

2605.12514 2026-05-14 cs.SI cs.CV cs.CY cs.DL stat.AP

Structural Diversity Drives Disruptive Scientific Innovation

Yichun Peng, Saike He, Peijie Zhang, Kang Zhao, Yi Yang, Ning Zhang, Qingpeng Zhang, Daniel Dajun Zeng, Hao Peng

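AI summary: Scientific innovation increasingly depends on collaboration, yet the organizational structures that foster breakthrough ideas remain unclear. This paper introduces Structural Diversity (SD), a measure of the extent to which a team bridges multiple distinct knowledge communities within its prior collaboration network, and shows that it is a powerful and robust predictor of disruptive innovation, outperforming traditional indicators such as team freshness and edge density. SD also interacts positively with team size, mitigating the "curse of scale", and improves innovation through a disciplinary integration mechanism, offering a new theoretical framework and practical guidance for organizing scientific collaboration.

Details
Abstract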

Scientific innovation increasingly depends on collaboration, yet the organizational structure that fosters breakthrough ideas remains poorly understood. Existing metrics - such as team size or compositional diversity - capture readily observable characteristics but not the deeper architecture of collaboration. We introduce Structural Diversity (SD): the extent to which a team bridges multiple distinct knowledge communities within its prior collaboration network. Using a century-scale dataset of 260 million scientific publications (1900-2025) and combining causal inference with a quasi-natural experiment based on a U.S. National Science Foundation policy change in 2012, we show that SD is a powerful and robust predictor of disruptive innovation, outperforming traditional team novelty indicators such as team freshness and edge density. Moreover, SD positively interacts with team size and is able to mitigate the well-known "curse of scale" by transforming scale from a liability into a resource for creative synthesis. We find that one mechanism underlying this effect is Disciplinary Integration (DI): teams with higher SD can more effectively combine heterogeneous knowledge into novel configurations. Our findings position SD as both a new theoretical construct and an actionable design principle for organizing scientific collaboration. By linking the architecture of team assembly to the dynamics of creative discovery, our work offers a structural explanation for how collective intelligence can be systematically engineered to foster disruptive innovation.

2509.20206 2026-05-14 stat.ME

Non-overlap Average Treatment Effect Bounds

Herbert P. Susmann, Alec McClean, Iván Díaz

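AI summary: This paper studies identification of the average treatment effect (ATE) when the overlap condition fails. Standard methods require every subject to have a non-zero probability of either treatment status, and when this fails the usual remedy is to target a subpopulation instead. The authors propose non-overlap bounds on the ATE that do not require overlap. The width of the bounds is proportional to the size of the non-overlap subpopulation, so they remain informative in common scenarios. They also propose an estimator based on semiparametric efficiency theory that gives asymptotically valid and consistent interval estimates, and they illustrate the method with simulations and real data.

Details
Comments
58 pages, 8 figures
Abstract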

The average treatment effect (ATE), the mean difference in potential outcomes under treatment and control, is a canonical causal effect. Overlap, which says that all subjects have non-zero probability of either treatment status, is necessary to identify and estimate the ATE. When overlap fails, the standard solution is to change the estimand, and target a trimmed effect in a subpopulation satisfying overlap. When the outcome is bounded, we demonstrate that this compromise is unnecessary. We derive non-overlap bounds: partial identification bounds on the ATE that do not require overlap. The bounds have width proportional to the size of the non-overlap subpopulation, making them informative in common scenarios when overlap violations are limited. Since the bounds are non-smooth functionals, we derive smooth approximations amenable to semiparametric efficiency theory and propose a Targeted Minimum Loss-Based estimator that is $\sqrt{n}$-consistent and asymptotically normal under nonparametric conditions. A multiplier bootstrap procedure yields uniformly valid confidence sets across all non-overlap subpopulation sizes and smoothing parameters, allowing researchers to report the tightest valid interval. Formally, we compare non-overlap confidence intervals to confidence intervals based on point estimation across multiple overlap regimes. We illustrate the method via simulation studies and real-world data applications.