Normal approximations in nonparametric empirical Bayes
Jiafeng Chen, Nabarun Deb, Nikolaos Ignatiadis
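AI summary: Through theoretical analysis, the paper shows that in nonparametric empirical Bayes the denoising regret of the nonparametric maximum likelihood estimator and related sieve methods is controlled by the rate under exact normality plus a term measuring the quality of the central limit theorem approximation. This approximation need only hold marginally and on average, and no high-dimensional normal approximation is required.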
Empirical Bayes analyses routinely model noisy measurements of latent parameters as normal, justifying this by an informal appeal to the central limit theorem (CLT). This paper puts this heuristic appeal on firmer analytical grounds. We show that the denoising regret of the nonparametric maximum likelihood estimator (NPMLE) and related sieve methods is controlled by the rate attained under exact normality, plus a term reflecting the quality of the CLT approximation. The CLT need only hold marginally for each coordinate, and moreover only on average, without needing high-dimensional normal approximations. We identify two asymptotic regimes in which the normal approximation is adequate and the empirical Bayesian prior remains informative, and we show that our guarantees are robust to dependence and to variance estimation.
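To make the estimator concrete, here is a minimal sketch in the unit-variance Gaussian sequence model, assuming $z_i \sim N(θ_i, 1)$. It approximates the NPMLE by restricting the prior to a fixed grid of support points and running EM on the mixing weights. This is a common computational shortcut and not necessarily the sieve used in the paper. Each observation is then denoised by its posterior mean under the fitted prior. The function names, the grid, and the iteration count are illustrative assumptions.

```python
import math

def npmle_grid(z, grid, iters=500):
    """Approximate the Gaussian-location NPMLE by EM over mixing weights on a fixed grid.

    Assumes z_i ~ N(theta_i, 1) with theta_i drawn from an unknown prior G that is
    restricted (as an illustrative approximation) to the support points in `grid`.
    """
    m = len(grid)
    w = [1.0 / m] * m
    # Likelihood matrix lik[i][j] = phi(z_i - grid_j), computed once.
    lik = [[math.exp(-0.5 * (zi - t) ** 2) / math.sqrt(2 * math.pi) for t in grid] for zi in z]
    for _ in range(iters):
        new_w = [0.0] * m
        for row in lik:
            denom = sum(wj * lj for wj, lj in zip(w, row))
            for j in range(m):
                new_w[j] += w[j] * row[j] / denom  # posterior responsibility of grid point j
        w = [v / len(z) for v in new_w]
    return w

def posterior_mean(zi, grid, w):
    """Empirical Bayes denoiser: posterior mean of theta given z_i under the fitted prior."""
    post = [wj * math.exp(-0.5 * (zi - t) ** 2) for wj, t in zip(w, grid)]
    return sum(p * t for p, t in zip(post, grid)) / sum(post)
```

Calling `w = npmle_grid(z, grid)` and then `[posterior_mean(zi, grid, w) for zi in z]` gives the empirical Bayes estimates. The regret analysed in the paper measures how far such estimates fall short of the oracle Bayes rule.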
Functional multi-target detection via bispectrum inversion
Anna Little, Daniel Sanz-Alonso, Mikhail Sweeney, Ruiyi Yang
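AI summary: For multi-target detection with unknown translations, the paper proposes uninitialized recovery algorithms based on autocorrelation analysis. The algorithms estimate the bispectrum from a debiased third-order empirical autocorrelation and recover the signal via frequency marching or a Kotlarski deconvolution formula, and the paper proves non-asymptotic recovery guarantees.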
This paper develops a functional theory for multi-target detection, where a compactly supported signal is recovered from a single noisy observation containing many unknown translations of the signal. Our formulation allows continuous, off-grid translations and correlated stationary Gaussian process noise, extending beyond the discrete, grid-aligned, white-noise models common in prior work. We analyze two uninitialized recovery algorithms based on autocorrelation analysis; in particular, both algorithms first estimate the signal's bispectrum via a debiased third-order empirical autocorrelation. The signal is then recovered from the estimated bispectrum using either a functional frequency marching scheme or a Kotlarski-type deconvolution formula. For both algorithms, we prove non-asymptotic recovery guarantees for compactly supported signals without bandlimiting assumptions. The resulting error bounds depend on the smoothness of the signal and the accuracy of bispectrum estimation, with the latter governed by the noise characteristics and the number of signal occurrences. Numerical experiments validate our theory and demonstrate accurate recovery in low-SNR regimes.
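The bispectrum is a natural target when translations are unknown, because it does not change when the signal is shifted. The sketch below illustrates this property. It computes the bispectrum $B(k_1, k_2) = X(k_1) X(k_2) \overline{X(k_1 + k_2)}$ of a discrete, periodic signal and checks that a cyclic shift leaves it unchanged. This is a simplified discrete analogue that assumes periodic boundary conditions. It is not the paper's debiased third-order autocorrelation estimator, which handles continuous, off-grid translations and correlated noise.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform X_k = sum_t x_t exp(-2 pi i k t / N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]

def bispectrum(x):
    """Discrete bispectrum B[k1][k2] = X_k1 * X_k2 * conj(X_{(k1 + k2) mod N})."""
    X = dft(x)
    n = len(X)
    return [[X[k1] * X[k2] * X[(k1 + k2) % n].conjugate() for k2 in range(n)] for k1 in range(n)]

def cyclic_shift(x, s):
    """Circularly shift x to the right by s samples: y_t = x_{(t - s) mod N}."""
    n = len(x)
    s %= n
    return x[n - s:] + x[:n - s]
```

A shift by $s$ multiplies $X(k)$ by $e^{-2πiks/N}$. In $B(k_1, k_2)$ these phase factors cancel, which is why the shifted and unshifted bispectra agree to rounding error.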
Addressing errors in multiple variables using generalized raking and cumulative probability models
Eric S. Kawaguchi, Chun Li, Frank E. Harrell, Pamela A. Shaw, Thomas Lumley, Bryan E. Shepherd
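AI summary: To address errors in routinely collected data such as electronic health records, this study combines generalized raking with cumulative probability models. Generalized raking calibrates the validation-subsample weights, which reduces bias and improves estimation efficiency.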
Routinely collected data, such as electronic health record (EHR) data, are frequently used for biomedical research, but these data are prone to errors, which can bias study findings. Validating data in subsamples of records can reduce bias, and the efficiency of estimates can be improved by incorporating in analyses both the error-prone data available on the entire cohort and the validated data available on the subsample. One approach to incorporate both data sources is with generalized raking, which calibrates validation sampling weights using error-prone data from the entire cohort. Motivated by an EHR study of maternal weight gain during pregnancy with a validation subsample, we develop and illustrate generalized raking techniques for cumulative probability models (CPMs). CPMs are robust, rank-based and semiparametric models for continuous, ordinal, or mixed type outcome data. We develop efficient generalized raking estimators for CPMs, evaluate their performance relative to competing methods, and demonstrate the utility and strengths of generalized raking with CPMs in a study that examines factors associated with weight gain during pregnancy.
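The calibration step at the heart of raking can be sketched as follows, assuming the exponential (raking) distance. The design weights $d_i$ of the validated records are rescaled to $w_i = d_i \exp(a_i^{\top}λ)$. The vector $λ$ is chosen so that the weighted totals of the auxiliary variables $a_i$ match totals computed on the full cohort. In the generalized raking literature, a common choice of auxiliary variable is the influence function from a model fitted to the error-prone data. Here the auxiliary values are hypothetical numbers, and the helper names are illustrative.

```python
import math

def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rake(d, A, totals, iters=50, tol=1e-12):
    """Raking calibration: find w_i = d_i * exp(a_i . lam) with sum_i w_i a_i = totals (Newton's method)."""
    p = len(totals)
    lam = [0.0] * p

    def weights(coef):
        return [di * math.exp(sum(c * a for c, a in zip(coef, ai))) for di, ai in zip(d, A)]

    for _ in range(iters):
        w = weights(lam)
        resid = [totals[j] - sum(wi * ai[j] for wi, ai in zip(w, A)) for j in range(p)]
        if max(abs(r) for r in resid) < tol:
            break
        # Jacobian of sum_i w_i a_i with respect to lam is sum_i w_i a_i a_i^T.
        J = [[sum(wi * ai[j] * ai[k] for wi, ai in zip(w, A)) for k in range(p)] for j in range(p)]
        step = _solve(J, resid)
        lam = [l + s for l, s in zip(lam, step)]
    return weights(lam)
```

As an example, take four validated records with design weight 2 each. The first column of `A` is an intercept and the second is an auxiliary indicator, and the cohort totals are (8, 5). For `A = [[1, 0], [1, 1], [1, 1], [1, 0]]`, `rake` returns the weights 1.5, 2.5, 2.5, 1.5. The auxiliary columns must not be collinear, or the Newton system becomes singular.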
The dynamic-probabilistic consistency gap in chaotic surrogate modeling
Andre Herz, Matthijs Pals, Daniel Durstewitz, Georgia Koppe
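AI summary: To address the mismatch between dynamical and probabilistic objectives in surrogate modeling of chaotic systems, the paper proposes KAFFEE, a framework based on a differentiable extended Kalman filter. KAFFEE narrows the gap by scoring a likelihood on local predictive residuals and by propagating covariance through Jacobians.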
Dynamical systems reconstruction (DSR) aims to learn surrogate models that capture the dynamics underlying time-series data. Reliably deploying these surrogates requires uncertainty estimates consistent with the learned dynamics. We expose a dynamic-probabilistic consistency (DPC) gap: the pursuit of finite-horizon probabilistic objectives can degrade dynamics or decouple predictive uncertainty from the local tangent dynamics it ought to reflect. We isolate three mechanisms behind this gap: core collapse, noise masking, and blind uncertainty. Specifically, we show that open-loop Gaussian rollout objectives can penalize Jacobian-generated covariance growth in chaotic systems, encouraging optimization shortcuts that weaken physical expansion or decouple uncertainty from it. To mitigate this gap, we propose KAFFEE (Kalman-Aware Framework For Ergodic Emulation), a differentiable extended Kalman filter-based training framework that evaluates likelihood on local predictive residuals (innovations) while transporting covariance through learned local Jacobians. On stochastic hyperchaotic Lorenz-96, KAFFEE reduces the identified failure modes, improves reconstruction of dynamical invariants relative to open-loop objectives, and maintains competitive predictive scores. We further show that the DPC gap appears when probabilistically adapting a DSR foundation model across 13 chaotic systems, where KAFFEE enables in-context Bayesian filtering while largely preserving zero-shot dynamics.
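KAFFEE builds on an extended Kalman filter, and the scalar sketch below shows that ingredient in isolation. The filter scores each observation through its innovation, the one-step predictive residual. It also propagates the state variance through the model's Jacobian. The logistic-map dynamics, the noise variances, and the identity observation model are assumptions made for illustration. The paper's framework is differentiable and multivariate, and it trains learned surrogate models. None of that is attempted here.

```python
import math

def ekf_innovation_loglik(ys, f, df, m0, P0, q, r):
    """Sum of Gaussian innovation log-likelihoods from a scalar extended Kalman filter.

    Assumed model: x_t = f(x_{t-1}) + N(0, q), y_t = x_t + N(0, r), x_0 ~ N(m0, P0).
    """
    m, P, ll = m0, P0, 0.0
    for y in ys:
        # Predict: push the mean through f and the variance through the local Jacobian.
        F = df(m)
        m_pred = f(m)
        P_pred = F * P * F + q
        # Innovation (predictive residual) and its predictive variance.
        e = y - m_pred
        S = P_pred + r
        ll += -0.5 * (math.log(2 * math.pi * S) + e * e / S)
        # Kalman update of the filtered mean and variance.
        K = P_pred / S
        m = m_pred + K * e
        P = (1 - K) * P_pred
    return ll

def logistic(x, a=3.9):
    """Chaotic logistic map, used here only as a stand-in for learned dynamics."""
    return a * x * (1 - x)

def logistic_jac(x, a=3.9):
    """Derivative of the logistic map."""
    return a * (1 - 2 * x)
```

In the paper's terms, an open-loop rollout objective would instead compare predictions many steps ahead without the update step. The filter above re-centres the prediction on every observation, so the likelihood is evaluated only on local residuals.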
Closed-form linear moments of the two-dimensional angular central Gaussian distribution
Siméon Vareilles
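AI summary: The paper derives closed-form expressions for the linear moments E[θ] and E[θ²] of the angle θ of the two-dimensional angular central Gaussian distribution on its natural domain. The mean is an arctangent of the parameters, and the second moment is the real part of a dilogarithm.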
The polar-angle marginal of a centred bivariate Gaussian distribution, obtained after integrating out the radial coordinate, gives the two-dimensional angular central Gaussian (ACG) distribution of Tyler. While its trigonometric and vector-valued moments have been studied in detail, to our knowledge there are no explicit closed-form expressions for the linear moments $\mathbf{E}[θ]$ and $\mathbf{E}[θ^{2}]$ on the natural domain $θ\in\left]-π/2,π/2\right[$. Here, "linear" refers to the ordinary moments $\int θ^{k}f(θ)\,dθ$ of the angle regarded as a real-valued variable, in contrast to the circular (trigonometric) moments $\mathbf{E}[e^{ikθ}]$ customary in directional statistics. We provide such expressions: the mean is a simple arctangent of the parameters, while the second moment is given by the real part of a dilogarithm. The derivation, based on a contour integration around the branch cut of $\arctan z$, is elementary. These quantities naturally arise in physics, where $θ$ is interpreted as a real-valued phase rather than a circular variable.
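Closed-form moments like these are easy to check numerically. The sketch below assumes $θ = \arctan(x_2/x_1)$ for $(x_1, x_2) \sim N(0, Σ)$. On $\left]-π/2,π/2\right[$ this angle has density $f(θ) = |Σ|^{-1/2} / (π\, u(θ)^{\top}Σ^{-1}u(θ))$ with $u(θ) = (\cos θ, \sin θ)$. The code integrates this density with the midpoint rule. It does not reproduce the paper's arctangent and dilogarithm formulas. It only provides reference values against which any closed form can be compared. Parameterizing the code by the entries s11, s12, s22 of $Σ$ is an assumption for illustration.

```python
import math

def acg_density(theta, s11, s12, s22):
    """Density of theta = arctan(x2/x1) on (-pi/2, pi/2) for (x1, x2) ~ N(0, [[s11, s12], [s12, s22]])."""
    det = s11 * s22 - s12 * s12
    c, s = math.cos(theta), math.sin(theta)
    # u' Sigma^{-1} u, using Sigma^{-1} = [[s22, -s12], [-s12, s11]] / det.
    quad = (s22 * c * c - 2 * s12 * c * s + s11 * s * s) / det
    return 1.0 / (math.pi * math.sqrt(det) * quad)

def linear_moments(s11, s12, s22, n=20000):
    """Midpoint-rule approximations of (integral of f, E[theta], E[theta^2])."""
    h = math.pi / n
    total = m1 = m2 = 0.0
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h
        fh = acg_density(t, s11, s12, s22) * h
        total += fh
        m1 += t * fh
        m2 += t * t * fh
    return total, m1, m2
```

For $Σ = I$ the angle is uniform, so the second moment should equal $π^2/12$, which the midpoint rule reproduces to high accuracy. A positive correlation $s_{12} > 0$ shifts the mean angle above zero.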
Bayesian nonparametric clustering to support medical decision-making: a variational inference approach
Inga Huld Ármann, Ioanna Papatsouma, Marina Evangelou
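AI summary: The paper proposes a Bayesian nonparametric clustering method for disease subtyping based on a Dirichlet process mixture model. The method is implemented with a coordinate ascent variational inference algorithm. In synthetic experiments it clusters accurately at a much lower computational cost than MCMC.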
Medical decision-making increasingly requires rapid and reliable assignment of patients to disease subtypes, as many diseases are no longer treated as single entities. For example, cancer patients may be stratified into aggressive and non-aggressive subtypes, with different treatment strategies for each group. We propose a Bayesian nonparametric approach based on a Dirichlet process mixture model for clustering individuals into disease subtypes. We implement a coordinate ascent variational inference algorithm, yielding an effective and computationally efficient alternative to Markov chain Monte Carlo (MCMC) to support medical decision-making. In synthetic experiments, we demonstrate that the proposed approach accurately assigns observations to their ground-truth clusters, achieving strong performance on evaluation metrics such as homogeneity and completeness. Additionally, we show that the proposed approach substantially reduces computational cost compared to MCMC without sacrificing accuracy, a loss that would increase the risk of misdiagnosis.
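The sketch below shows a compact version of the standard mean-field coordinate ascent algorithm for a truncated stick-breaking Dirichlet process mixture. It handles one-dimensional data with unit-variance Gaussian components and a $N(0, τ^2)$ prior on the component means. Several settings are assumptions for illustration: the truncation level K, the concentration α, τ², the quantile-based initialisation, and the fixed iteration count. The paper's model for disease subtyping may differ in its likelihood, priors, and stopping rule.

```python
import math

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series

def dp_mixture_cavi(x, K=5, alpha=1.0, tau2=100.0, iters=100):
    """Mean-field CAVI for a stick-breaking DP mixture truncated at K >= 2 components.

    Assumed model: v_k ~ Beta(1, alpha), mu_k ~ N(0, tau2), z_i ~ Cat(pi(v)),
    x_i | z_i = k ~ N(mu_k, 1). Returns hard assignments argmax_k phi_ik.
    """
    n = len(x)
    xs = sorted(x)
    m = [xs[round(j * (n - 1) / (K - 1))] for j in range(K)]  # initial means at data quantiles
    s2 = [0.0] * K                                             # variances of q(mu_k)
    elogpi = [-math.log(K)] * K                                # E[log pi_k], uniform at start
    for _ in range(iters):
        # q(z_i): phi_ik proportional to exp(E[log pi_k] + E[log N(x_i | mu_k, 1)]), up to constants.
        phi = []
        for xi in x:
            logits = [elogpi[k] + m[k] * xi - 0.5 * (m[k] ** 2 + s2[k]) for k in range(K)]
            top = max(logits)
            e = [math.exp(v - top) for v in logits]
            total = sum(e)
            phi.append([v / total for v in e])
        nk = [sum(row[k] for row in phi) for k in range(K)]
        # q(mu_k) = N(m_k, s2_k): conjugate Gaussian update with unit observation variance.
        for k in range(K):
            prec = 1.0 / tau2 + nk[k]
            s2[k] = 1.0 / prec
            m[k] = sum(row[k] * xi for row, xi in zip(phi, x)) / prec
        # q(v_k) = Beta(1 + n_k, alpha + sum_{j>k} n_j); the last stick is fixed at v_K = 1.
        cum = 0.0
        for k in range(K - 1):
            g1, g2 = 1.0 + nk[k], alpha + sum(nk[k + 1:])
            elogpi[k] = cum + digamma(g1) - digamma(g1 + g2)
            cum += digamma(g2) - digamma(g1 + g2)
        elogpi[K - 1] = cum
    return [max(range(K), key=lambda k: row[k]) for row in phi]
```

On two well-separated groups of points, the hard assignments never mix the groups. A group may still be split across more than one component, which is why metrics such as homogeneity and completeness are reported separately.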
When are multimodal predictions biologically supported? A diagnostic evaluation framework
Dylan Steiner, Gustavo Arango-Argoty, Gerald Sun, Etai Jacob
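AI summary: The paper proposes DECAT, a framework that classifies multimodal representations into four diagnostic scenarios using five null-referenced metrics and a rule-based decision procedure. The goal is to detect whether a model has learned shared biology, single-modality biology, or spurious correlations.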
Multimodal models in oncology can produce accurate predictions, but accurate prediction does not reveal whether the model has learned biology that is shared across modalities, biology confined to one modality, or spurious correlations that reflect confounders rather than genuine biology. We introduce DECAT, a model-agnostic post-hoc evaluation framework that classifies multimodal representations into four diagnostic scenarios for a given task and modality, using five null-referenced metrics and a rule-based decision procedure. The framework operates on learned representations, requires no knowledge of which specific confounder is present, and returns an indeterminate result when the evidence is insufficient. We validate DECAT on synthetic data across four multimodal model classes (over 2,500 trained representations) and on real data from 8,979 TCGA patients, evaluating both multimodal embeddings and five pretrained pathology foundation models. Entangled models (e.g., CLIP) achieve near-perfect shared-biology detection, but on real foundation model embeddings they falsely claim shared biology in the majority of cases where it is absent. This false-claim rate increases with confound strength, so larger cohorts and stronger representations produce more confident but still incorrect diagnoses. Applied to both multimodal TCGA embeddings and five pathology foundation models without paired RNA, DECAT detects confounding that is invisible to AUROC, without requiring confounder labels, as confirmed by post-hoc stratification.
Assign and Add: a mechanistic study of compositional arithmetic
Brady Exoo, Alberto Bietti, John Sous
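AI summary: Using a task that combines variable assignment and modular addition, the paper studies the mechanism of compositional generalization in transformers. It finds that the model uses the same modular addition module for direct and indirect inputs, and it identifies three-phase learning dynamics.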
Large language models are able to compose skills in order to perform complex tasks, many of which might not have been seen during training. The details of how exactly this composition occurs remain elusive. In this paper, we study a mechanism for compositional generalization in transformers by considering a simple controlled setting involving variable assignment and modular addition. By partitioning our training data into disjoint sets, we observe that small transformers are able to generalize to previously unseen combinations of variables and numbers. Our mechanistic analysis shows that the same "modular addition" MLP module is used whether the inputs are given directly or indirectly through a separate variable assignment mechanism. We also analyze the training dynamics from an empirical lens, which reveals three phases of learning: first, modular addition is learned, then the structure required for variable assignment, and finally a refinement phase where the model generalizes to some hard sequences not seen in training. Finally, we provide a theoretical framework to explain how compositionality emerges from training dynamics. These results suggest that compositional generalization can be a natural consequence of the compositionality of internal mechanisms in transformers.
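To make the controlled setting concrete, here is a hypothetical generator for "assign and add" sequences. It also partitions the examples so that certain combinations of variables and numbers appear only at test time. Several details are invented for illustration: the string format, the modulus p = 7, the variable names, and the held-out rule. The abstract does not give the paper's actual tokenization or split.

```python
import random

def make_example(rng, p=7, names="abcde"):
    """One hypothetical 'assign then add' sequence: two assignments followed by a query."""
    u, v = rng.sample(names, 2)
    a, b = rng.randrange(p), rng.randrange(p)
    prompt = f"{u}={a},{v}={b},{u}+{v}="
    return prompt, (a + b) % p, (u, v, a, b)

def split_dataset(n, held_out, seed=0, p=7):
    """Route each example to test if its (var, var, num, num) key is held out, else to train."""
    rng = random.Random(seed)
    train, test = [], []
    for _ in range(n):
        prompt, answer, key = make_example(rng, p)
        (test if held_out(key) else train).append((prompt, answer, key))
    return train, test
```

For instance, `split_dataset(500, lambda k: k[0] == "a" and k[2] < 3)` withholds every sequence in which the first variable is `a` and is assigned 0, 1, or 2. A model that answers these test prompts correctly must combine an assignment pattern with sums that it saw only in other contexts.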
The nonparametric Kiefer-Weiss problem
Michael Fauss, H. Vincent Poor, Abdelhak M. Zoubir
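AI summary: The paper formulates and solves a nonparametric Kiefer-Weiss problem: minimize a weighted sum of error probabilities subject to a constraint on the maximum expected sample size. It reduces the problem to an optimal stopping problem and derives an optimal stopping policy based on a two-dimensional test statistic.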
A nonparametric variant of the Kiefer-Weiss problem is proposed and solved. The objective is to minimize a weighted sum of the error probabilities of a binary sequential test subject to a constraint on its maximum expected sample size. This maximum is taken over all possible probability distributions on the given sequence space. First, it is shown that the nonparametric Kiefer-Weiss problem can be reduced to an optimal stopping problem. Then, the optimal stopping policy is derived under the assumption that at most k uses of randomization are permitted during any run of the test. The solution to the original problem is then obtained by letting k go to infinity. The optimal cost function is shown to be the solution of a nonlinear Bellman equation. The corresponding optimal stopping policy is shown to be based on a two-dimensional test statistic, with one component tracking the likelihood ratio and the other one tracking the expected remaining sample size. Critically, the stopping policy uses randomization to increase the remaining expected sample size for some runs, while stopping early for others. The optimal randomization rule is shown to be determined by a function that maps the likelihood ratio to an integer-valued sample size. Two approximations of this function are proposed that can be evaluated easily in practice. The results are illustrated with two numerical examples of nonparametric Kiefer-Weiss tests, one for a shift in the success probability of a Bernoulli distribution, and one for a shift in the mean of a normal distribution.
Modeling covariate transitions for efficient estimation of longitudinal treatment effects in randomized experiments
Naoki Chihara, Tatsushi Oka, Yasuko Matsubara, Yasushi Sakurai, Shota Yasui
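AI summary: The paper proposes a regression-adjustment framework that estimates longitudinal treatment effects in randomized experiments by modeling covariate transitions. It establishes asymptotic normality and semiparametric efficiency.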
We present a regression-adjustment framework for estimating longitudinal treatment effects in randomized experiments under static regimes. Regression-adjustment methods use pre-treatment covariates to reduce variance in randomized experiments, but they usually focus only on average effects, which reveal little about when the effects appear and how long they persist. To address this issue, we model intermediate outcomes and post-treatment covariates as they evolve over time, and we represent these dynamic trajectories using transition kernels. Furthermore, we establish the asymptotic normality and the semiparametric efficiency bound for our estimator, enabling more powerful statistical inference. Simulation studies and an empirical analysis of A/B test data from a streaming platform in Japan show the practical advantages of our method.
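The framework generalises single-period regression adjustment, sketched below for context. The sketch fits a separate least-squares line of the outcome on one pre-treatment covariate in each arm. It then averages the difference between the two fitted lines over all units. This is the familiar fully interacted adjustment for a single covariate, shown only as a baseline. It does not model the transition kernels or post-treatment covariates used in the paper, and the function names are illustrative.

```python
def ols_1d(x, y):
    """Least-squares intercept and slope of y on a single covariate x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def diff_in_means(y, treat):
    """Unadjusted difference in mean outcomes between treated (1) and control (0) units."""
    y1 = [yi for yi, t in zip(y, treat) if t]
    y0 = [yi for yi, t in zip(y, treat) if not t]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def regression_adjusted_ate(x, y, treat):
    """Fit one line per arm, then average the predicted difference over the full sample."""
    a1, b1 = ols_1d([xi for xi, t in zip(x, treat) if t], [yi for yi, t in zip(y, treat) if t])
    a0, b0 = ols_1d([xi for xi, t in zip(x, treat) if not t], [yi for yi, t in zip(y, treat) if not t])
    xbar = sum(x) / len(x)
    return (a1 + b1 * xbar) - (a0 + b0 * xbar)
```

Consider covariate values x = 0, 1, ..., 7, treatment alternating 1, 0, 1, 0, ..., and outcome y = 2 + 3·treat + 1.5·x. Because of chance imbalance in x, the raw difference in means is 1.5. The adjusted estimate recovers the true effect of 3.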
Synthetic data generation from incomplete survey data under informative sampling
Ayat Almomani, Won Chang, Youngdeok Hwang, Young Min Kim, Hang J. Kim
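AI summary: The paper proposes a Bayesian framework for data synthesis and imputation under informative sampling. An adaptive weighting scheme addresses variance underestimation and yields consistent estimates with accurate uncertainty quantification.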
We propose a Bayesian framework for data synthesis and imputation in complex survey settings with informative sampling. To address variance underestimation in existing Bayesian approaches and to accommodate the missing data encountered in survey data, we introduce an adaptive weighting scheme for parameter estimation. We show that the proposed weighting yields consistent estimators with an asymptotically valid Godambe information matrix. The framework is flexible, accommodating a broad class of Bayesian models and facilitating practical implementation. Simulation studies demonstrate that the proposed method provides accurate uncertainty quantification for both model parameters and synthetic population inference.
Improved guarantees for Langevin Monte Carlo via average smoothness
Arnak S. Dalalyan, Avetik Karagulyan
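AI summary: For Langevin Monte Carlo in the strongly log-concave setting, the paper improves non-asymptotic error bounds in Wasserstein distance by using an average coordinate-wise smoothness constant instead of the global smoothness constant. It extends the results to variable step sizes, Laplacian-Lipschitz potentials, and finite-sum problems.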
We establish improved nonasymptotic bounds for Langevin Monte Carlo in the strongly log-concave setting, when the error is measured by the Wasserstein distance. The main result shows that the discretization error is governed by an average coordinate-wise smoothness constant, rather than by the usual global smoothness constant. The proof is short and probabilistic, and relies on a refined use of the synchronous coupling. We further show that the same ideas lead to improved bounds for variable step sizes, for potentials whose Laplacian is Lipschitz-continuous, and for finite-sum problems sampled by stochastic-gradient Langevin dynamics with fixed point control variates. In the Laplacian-smooth case, the usual Hessian-Lipschitz contribution is replaced by a weaker trace-type third-order smoothness quantity. In the finite-sum setting, the resulting SGLD bound improves the dependence on the root mean square smoothness of the component functions. Applications to generalized linear models with Gaussian design show that these refinements can yield substantial, dimension-dependent improvements over previously known bounds, especially for correlated covariates.
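Here is a minimal sketch of the unadjusted Langevin algorithm (Langevin Monte Carlo), $x_{k+1} = x_k - h\nabla f(x_k) + \sqrt{2h}\,ξ_k$. The target is a diagonal Gaussian with potential $f(x) = \sum_j λ_j (x_j - μ_j)^2/2$. In this toy case the stationary variance of coordinate $j$ is $1/(λ_j(1 - hλ_j/2))$ rather than $1/λ_j$. The discretisation bias is therefore governed coordinate by coordinate by $hλ_j$. This is only a heuristic illustration, not the paper's averaged smoothness constant or its coupling proof. The target parameters and the step size are assumptions.

```python
import math
import random

def ula(grad_f, x0, step, n_steps, seed=0):
    """Unadjusted Langevin algorithm: x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, I)."""
    rng = random.Random(seed)
    x = list(x0)
    path = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad_f(x)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        path.append(x)
    return path

def stationary_var(lam, step):
    """Stationary variance of ULA for the 1-D potential lam * (x - mu)^2 / 2 (needs 0 < step * lam < 2)."""
    return 1.0 / (lam * (1.0 - step * lam / 2.0))
```

With $h = 0.1$ and $λ = (1, 4)$, the stiffer coordinate has a variance inflation factor of $1/(1 - 0.2) = 1.25$, compared with about $1.05$ for the other. The global smoothness constant $\max_j λ_j$ reflects only the worse of the two.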
A dynamic latent space model for healthcare mobility networks: the case of the Italian National Health Service
Cecilia Manente, Marco Alfò, Silvia D'Angelo
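AI summary: The paper proposes a Bayesian dynamic latent space model for the network of patient flows for hip replacement surgery among Italian Local Health Authorities. The model reveals how the healthcare system's structure evolves, the impact of COVID-19, and regional asymmetries.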
Healthcare mobility -- patients seeking treatment outside their territory of residence -- represents a major source of inequality and financial imbalance in decentralised health systems. In Italy, persistent north-south asymmetries in patient flows among Local Health Authorities (ASLs) have reinforced existing disparities within the National Health Service; yet the structural organisation and temporal dynamics of these flows remain poorly understood at the sub-regional level. We propose a Bayesian dynamic latent space model for directed weighted networks with a hurdle negative binomial likelihood, and apply it to administrative discharge records on mobility for hip replacement procedures among 109 Italian ASLs over 2018-2024. The model jointly addresses excess zeros, overdispersion and network dependence, while capturing directional heterogeneity through multiplicative sender and receiver effects and controlling for differences in territorial size via an appropriate exposure term. Applied to Italian mobility data, the model reveals the evolving geometry of the healthcare system, quantifies the disruption induced by the COVID-19 pandemic, and uncovers structural asymmetries in outward propensity and ASL attractiveness. The framework provides a flexible tool for the statistical analysis of dynamic healthcare mobility networks with direct relevance to the monitoring and evaluation of territorial healthcare provision.
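The hurdle negative binomial likelihood named above can be sketched as follows. A zero count occurs with probability $1 - π$. Positive counts follow a negative binomial with mean $μ$ and size $φ$, truncated at zero. In the model, $μ$ for an edge would typically depend on the exposure term and the latent-space and sender/receiver effects. That linear predictor is not specified here, so `mu` is simply passed in, and the parameter names are illustrative.

```python
import math

def nb_logpmf(y, mu, size):
    """Negative binomial log-pmf parameterised by mean mu and size (dispersion) parameter."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu)) + y * math.log(mu / (size + mu)))

def hurdle_nb_pmf(y, pi_pos, mu, size):
    """Hurdle NB: P(0) = 1 - pi_pos; for y >= 1, pi_pos times the zero-truncated NB pmf."""
    if y == 0:
        return 1.0 - pi_pos
    p0 = math.exp(nb_logpmf(0, mu, size))
    return pi_pos * math.exp(nb_logpmf(y, mu, size)) / (1.0 - p0)
```

The two parts separate the questions "is there any flow on this edge?" and "how large is it, given that there is some?". This separation lets excess zeros and overdispersion be handled by different parameters. With $π = 0.7$, $μ = 3$ and $φ = 2$, the hurdle mean is $0.7 \cdot 3 / (1 - 0.16) = 2.5$.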
Log-ratio propagation on the simplex: a theory of cellwise contamination for compositional data
Matthias Templ
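AI summary: The paper develops a theory of cellwise contamination on the simplex. Using multiplicative perturbation and a propagation theorem, it shows that contaminating a single component induces a rank-one shift of the log-ratio vector. It also shows why Euclidean cellwise methods fail on the simplex and how the mismatch between D raw parts and D-1 log-ratio coordinates lowers the breakdown value.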
Compositional data must be analysed through log-ratios: scale invariance, the defining axiom of the field, leaves no alternative. The centred log-ratio divides by the geometric mean of every part, so a single contaminated component shifts every centred-log-ratio coordinate at once, displacing the log-ratio vector by a fixed amount that no choice of coordinates can reduce. We develop a theory of cellwise contamination on the simplex around this observation. A scale-invariant contamination model built from multiplicative perturbation combines with a propagation theorem showing that corruption of a single raw part induces a rank-one shift of the log-ratio vector, with direction determined by the contrast matrix. The resulting perturbation pattern is not equivalent to any independent cellwise contamination model in log-ratio coordinates -- so standard Euclidean cellwise methods applied to log-ratios are ill-posed under the simplex contamination mechanism. For estimators whose Euclidean cellwise breakdown is witnessed by a column-concentrated configuration -- a class including MCD, $S$-, $τ$-, and coordinate-wise $M$-estimators of location and scatter -- the cellwise breakdown value on the simplex is reduced by the factor $(D-1)/D$ relative to its Euclidean counterpart, a reduction that is tight and arises purely from the normalisation mismatch between $nD$ raw cells and $n(D-1)$ ilr cells. The cellwise influence function for the variation matrix carries a diagnostic fingerprint: contamination of a single part inflates exactly one row and column, identifying the responsible component. These results form the theoretical foundation for cellwise-robust methods on the simplex; a companion paper develops a cellwise-robust PCA estimator that exploits the propagation geometry and demonstrates it on simulated and geochemical data.
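The propagation effect described above is easy to verify directly. Multiplying raw part $j$ by a factor $c$ shifts the clr vector by $\log c\,(e_j - \mathbf{1}/D)$, so every clr coordinate moves. The shift has Euclidean length $|\log c|\sqrt{(D-1)/D}$. An orthonormal (ilr) change of coordinates preserves this length, because the shift lies in the sum-zero subspace. The composition and the contamination factor below are hypothetical.

```python
import math

def clr(x):
    """Centred log-ratio transform: log of each part minus the mean log (log geometric mean)."""
    logs = [math.log(v) for v in x]
    g = sum(logs) / len(logs)
    return [l - g for l in logs]

def perturb_part(x, j, c):
    """Multiplicative contamination of a single raw part j by the factor c."""
    return [v * c if i == j else v for i, v in enumerate(x)]
```

For $D = 4$ and $c = 10$, the contaminated coordinate moves by $\tfrac{3}{4}\log 10$ and each of the other three moves by $-\tfrac{1}{4}\log 10$. A Euclidean cellwise method applied in clr coordinates would therefore see four contaminated cells instead of one. Rescaling the whole composition leaves the clr vector unchanged, which is the scale invariance the argument rests on.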
BEND: an R package for Bayesian estimation of nonlinear longitudinal data
Corissa T. Rohloff, Rik Lamm, Yadira Peralta, Nidhi Kohli, Eric F. Lock
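AI summary: The paper introduces the R package BEND, which uses Bayesian inference to estimate nonlinear longitudinal models, in particular piecewise models. The package provides several extensions for complex longitudinal data.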
Longitudinal data are useful for capturing and analyzing patterns of change over time. Often, these patterns follow a nonlinear form. One useful and commonly applied nonlinear function is the piecewise function, which assumes growth occurs in distinct phases, each with its own functional form. Past literature has established that Bayesian inference is preferred over likelihood-based methods for estimating piecewise models. Building on this, we developed the R package BEND (Bayesian Estimation of Nonlinear Data), available on CRAN. The purpose of BEND is to provide user-friendly software for estimating nonlinear longitudinal models using a Bayesian inference approach. Given the flexibility and practicality of piecewise models, BEND includes several extensions of them to accommodate various types of complex longitudinal datasets and applications. Bayes_PREM() can empirically identify the number and location of random changepoints in a piecewise random effects model. This function can also model multiple latent classes with different longitudinal growth patterns and incorporate covariates to predict the outcome and latent class membership. Bayes_BPREM() can jointly model the longitudinal piecewise trajectories of two interrelated outcomes. Lastly, Bayes_CREM() can estimate the impact of group membership on longitudinal growth. This paper provides an overview of the functions included in BEND and empirical examples of how to apply these models in practice.
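As a reference point for the piecewise models that BEND estimates, consider a continuous piecewise-linear mean trajectory with known, sorted changepoints. It can be written as an intercept plus a first slope plus slope changes that switch on after each changepoint. This is a plain-Python illustration with hypothetical parameter names. It is not BEND's R interface, and BEND's exact parameterization, random effects, and random changepoints are not reproduced here.

```python
def piecewise_mean(t, intercept, slopes, changepoints):
    """Continuous piecewise-linear trajectory.

    slopes[0] applies before changepoints[0], slopes[k] between changepoints[k-1]
    and changepoints[k], and slopes[-1] after the last changepoint (changepoints sorted).
    """
    if len(slopes) != len(changepoints) + 1:
        raise ValueError("need exactly one more slope than changepoints")
    value = intercept + slopes[0] * t
    for k, c in enumerate(changepoints):
        # Each changepoint adds the change in slope, active only for t > c.
        value += (slopes[k + 1] - slopes[k]) * max(0.0, t - c)
    return value
```

For example, an intercept of 10, slopes (2, -1), and one changepoint at t = 3 give 10 at t = 0, 16 at t = 3, and 14 at t = 5. Estimating the number and location of such changepoints per individual is the harder problem that Bayes_PREM() addresses.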
Non-asymptotic convergence of stochastic iterative algorithms: a Lyapunov framework
Zaiwei Chen, Siva Theja Maguluri
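AI summary: The paper surveys Lyapunov-based methods for the finite-time analysis of stochastic iterative algorithms (stochastic approximation), with generalized Moreau envelopes serving as universal Lyapunov functions. The framework yields mean-square convergence guarantees for stochastic gradient descent, linear SA, and reinforcement learning algorithms such as Q-learning. The survey also discusses extensions to Markovian noise, seminorm-contractive operators, and related settings.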
We survey Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms, also known as stochastic approximation (SA) algorithms, for solving fixed-point equations $\bar{F}(x)=x$, where the operator $\bar{F}(\cdot)$ can only be accessed through a noisy oracle. We first focus on the standard setting in which $\bar{F}(\cdot)$ is contractive with respect to some norm and the noise is i.i.d., and explain how generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm. We then show how this framework yields mean-square convergence guarantees and applies to stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning. Finally, we discuss extensions to Markovian noise, seminorm-contractive operators, dissipative operators, and high-probability bounds, and conclude with open problems. The goal is to present a unified and self-contained roadmap for the finite-time analysis of SA and its applications, especially in reinforcement learning.
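Here is a minimal instance of the stochastic approximation recursion $x_{k+1} = x_k + α_k(\bar{F}(x_k) + w_k - x_k)$. The oracle returns $\bar{F}(x_k)$ corrupted by i.i.d. Gaussian noise $w_k$. The choices $\bar{F}(x) = 0.5x + 1$ (a contraction with fixed point 2) and step size $α_k = 2/(k+2)$ are assumptions for illustration. With these choices the noise-free error after $n$ steps is exactly $e_0/(n+1)$. The sketch shows the algorithm being analysed, not the Lyapunov argument itself.

```python
import random

def stochastic_approximation(F_bar, x0, n_iter, noise_sd=1.0, seed=0):
    """Run x_{k+1} = x_k + alpha_k * (F_bar(x_k) + w_k - x_k) with alpha_k = 2 / (k + 2)."""
    rng = random.Random(seed)
    x = x0
    for k in range(n_iter):
        alpha = 2.0 / (k + 2)
        oracle = F_bar(x) + noise_sd * rng.gauss(0.0, 1.0)  # noisy evaluation of F_bar
        x = x + alpha * (oracle - x)
    return x

def F_bar(x):
    """A 0.5-contraction with fixed point x* = 2 (illustrative)."""
    return 0.5 * x + 1.0
```

Writing $e_k = x_k - 2$, the recursion becomes $(k+2)e_{k+1} = (k+1)e_k + 2w_k$. It follows that $e_n = (e_0 + 2\sum_k w_k)/(n+1)$, so the mean-square error decays like $4/n$ in this example.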
Posterior and likelihood sensitivity in Bayesian distributionally robust optimization
Jun-ya Gotoh, Andrew E. B. Lim, Michael Jong Kim
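AI summary: The paper introduces worst-case posterior and likelihood sensitivities to quantify how robust Bayesian models are to perturbations of the posterior and the likelihood. It shows that distributionally robust optimization achieves a near-Pareto-optimal tradeoff between performance and robustness.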
We introduce the notion of worst-case posterior and worst-case likelihood sensitivity. These measure, respectively, the sensitivity of the expected cost to worst-case perturbations of the posterior distribution and worst-case perturbations of the likelihood of a Bayesian model. Each defines a quantitative measure of robustness. A decision maker concerned about the sensitivity of the out-of-sample expected cost to deviations from her assumptions will want a decision for which both sensitivities are small. We derive posterior and likelihood sensitivities for uncertainty sets defined in terms of deviation measures. Posterior sensitivity vanishes when the posterior variance shrinks to zero, which occurs when parameter uncertainty is eliminated through learning. Parameter learning does not eliminate likelihood sensitivity. A distributionally robust formulation of a Bayesian optimization problem makes a near-Pareto-optimal tradeoff between performance (expected cost) and robustness (posterior and likelihood sensitivity).
Consensus-level substitution rates differ from virion-level substitution rates
David J Pascall
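AI summary: The paper distinguishes the virion-level substitution rate (VLSR) from consensus-level substitution rates (CLSRs). It argues that the two have different biological meanings and are not interchangeable, and it recommends that the consensus-generation rule be routinely reported.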
Estimating viral substitution rates is central to evolutionary epidemiology, and recent interest in within-host evolution has sharpened the question of what such rates measure. I distinguish two classes of evolutionary rate estimand that are rarely separated in phylogenetic analysis: the virion-level substitution rate (VLSR), a molecular quantity counting mutational events along lineages, and consensus-level substitution rates (CLSRs), population-summary quantities counting changes in the consensus sequences. CLSRs are indexed by the consensus-generation rule. The VLSR and CLSRs are both biologically meaningful, but not interchangeable. Because the consensus-generation rule defines a given CLSR, it should be a routine reporting requirement. This reflection should help analysts make more informed methodological choices when working with sets of virus sequences.
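The following small illustration shows why the consensus-generation rule matters. The same aligned reads yield different consensus sequences under different frequency thresholds. They therefore also yield different counts of consensus-level changes. The function below is a hypothetical majority-threshold rule over aligned, equal-length reads without gaps or quality scores. Real pipelines use more elaborate rules.

```python
from collections import Counter

def consensus(reads, threshold=0.5, ambiguous="N"):
    """Column-wise consensus: emit the most common base if its frequency exceeds threshold, else `ambiguous`."""
    out = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) > threshold else ambiguous)
    return "".join(out)
```

With reads ACGT, ACGA, ACTA, ACGA, a threshold of 0.5 gives ACGA, while a threshold of 0.8 gives ACNN. Two analysts working from identical data would therefore report different CLSRs unless the rule is stated.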
The impact of mobility trajectory sparsity on epidemic modeling outcomes
Federico Delussu, Francisco Barreras, Yuan Liao, Duncan J. Watts, Laura Alessandretti
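AI summary: By comparing near-complete and sparse GPS trajectory data, the paper quantifies the bias that trajectory sparsity introduces into key epidemic model metrics. It proposes a correction based on inverse probability weighting.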
GPS mobility data are increasingly used in epidemic modeling, allowing the construction of co-location networks or population flows. These trajectories typically exhibit high temporal sparsity because data collection is opportunistic and tied to phone use. Despite growing awareness of this limitation, the analysis and treatment of biases derived from it have been largely overlooked in existing epidemic modeling studies, raising concerns about the robustness of downstream inferences. We introduce a principled framework to quantify the impact of trajectory sparsity on key epidemic modeling outcomes across different levels of missingness. Our approach leverages a highly complete dataset that contains both near-complete and sparse GPS trajectories. Near-complete trajectories provide baseline epidemic outcomes, while sparse trajectories provide realistic missingness patterns that we impose on the baseline to measure bias. In this way, we show how missing records can result in substantial underestimation of key measures of epidemic intensity, explained not only by the amount of missing data, but by more complex features of data missingness that should be taken into account when designing correction methods. Finally, we propose and evaluate a correction based on inverse probability weighting of network edges before epidemic model calibration, which is shown to reduce bias and parameter misspecification. We also demonstrate this correction on a separate anonymized sample from a commercial GPS mobility dataset and report on its effect. Together, our findings provide a first rigorous quantification of trajectory-sparsity bias in epidemic modeling, offering initial guidance on the treatment of this issue.
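Here is a minimal sketch of inverse probability weighting for co-location edges. It makes a simplifying assumption (ours, not necessarily the paper's): an encounter between users $i$ and $j$ is recorded only if both users' pings are observed, which happens independently with probabilities $p_i$ and $p_j$. Each observed edge count is then divided by $p_i p_j$, a Horvitz-Thompson style correction. The abstract does not describe how the paper estimates these probabilities, and the function names are illustrative.

```python
import random

def ipw_edge_weights(edge_counts, p_obs):
    """Horvitz-Thompson style reweighting of observed co-location counts by 1 / (p_i * p_j)."""
    return {(i, j): c / (p_obs[i] * p_obs[j]) for (i, j), c in edge_counts.items()}

def simulate_observed(true_count, p_i, p_j, rng):
    """Number of true encounters that survive independent observation of both endpoints."""
    return sum(1 for _ in range(true_count) if rng.random() < p_i and rng.random() < p_j)
```

If 10 true encounters are observed with probability $0.5 \times 0.4 = 0.2$, about 2 survive on average, and the weight $1/0.2$ restores an expected count of 10. The unweighted network would understate contact intensity by a factor of five on this edge.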
Subjective time deformation in intertemporal choice: a functional data analysis approach
Fabrizio Maturo, Salvador Cruz Rambaud, Vincenzo Li Calzi, Andrea Mazzitelli, Annamaria Porreca
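AI summary: This paper proposes a functional data analysis framework that reconstructs subjective-time trajectories from discrete intertemporal equivalence judgments. It reveals heterogeneity in intertemporal choice through derivative summaries, functional principal component analysis, and clustering. Two principal components explain 97.44% of the variability, and three stable temporal-deformation profiles are identified.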
Intertemporal choice data are usually summarized through scalar discount-rate parameters or fitted by predetermined parametric discount functions, although relevant information may lie in the shape of the whole discounting trajectory. This paper proposes a Functional Data Analysis framework for reconstructing and analyzing implicit subjective-time trajectories from discrete intertemporal equivalence judgments. Monetary equivalence responses from a multilingual questionnaire are transformed into individual discount curves, regularized by monotone smoothing, and used to recover normalized implicit subjective-time trajectories. The trajectories are examined through derivative summaries, Functional Principal Component Analysis, and clustering on standardized component scores. The empirical application, based on 107 participants, shows that heterogeneity in intertemporal choice is not fully captured by scalar discount-rate variation. The first two functional principal components explain 97.44% of the variability, indicating a low-dimensional structure. Functional clustering identifies three stable profiles of temporal deformation, supported by bootstrap stability analysis and sensitivity checks on components, algorithms, distances, smoothing specifications, and outlier treatment. Parametric benchmarks based on exponential, Weber-Fechner, and Stevens specifications provide accurate fits for many individuals, but do not fully recover the functional clustering structure. The comparison with explicit subjective-time perception measures reveals only partial alignment between implicit trajectories reconstructed from choices and directly reported temporal perception. Functional Data Analysis provides an applied statistical framework for representing intertemporal choice heterogeneity as variation in functional shape, complementing scalar discount-rate and parametric subjective-time models.
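The pipeline from discount curve to normalized trajectory can be sketched in a few lines. This rests on two assumptions that are not taken from the paper:

- Isotonic regression (pool-adjacent-violators) stands in for the unspecified monotone smoothing step.
- Discounting is exponential in subjective time, $D(t) = e^{-k\tau(t)}$, so the normalized trajectory is $\tau(t)/\tau(T) = \ln D(t) / \ln D(T)$.

Both function names are hypothetical.

```python
import math


def pava_nonincreasing(y):
    """Least-squares non-increasing fit via pool-adjacent-violators (equal weights)."""
    blocks = []  # each block holds [sum of values, count]
    for val in y:
        blocks.append([val, 1])
        # merge while the previous block mean is below the current one (a violation)
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def implicit_subjective_time(discounts):
    """Normalized subjective time tau(t) / tau(T), assuming D(t) = exp(-k * tau(t))."""
    d = pava_nonincreasing(discounts)
    if not all(0 < x <= 1 for x in d) or d[-1] >= 1:
        raise ValueError("discount factors must lie in (0, 1] and end below 1")
    logs = [-math.log(x) for x in d]  # proportional to tau(t)
    return [v / logs[-1] for v in logs]  # rescale so tau(T) = 1
```

The resulting curves all run from 0 to 1, so FPCA and clustering compare their shapes rather than their overall discount level.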
Why linear recurrent memory works in partially observable reinforcement learning
Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed, Michael Muehlebach
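AI summary: By constructing two linear filters, this paper theoretically justifies the effectiveness of linear recurrent neural networks as memory units in partially observable reinforcement learning. The analysis is extended to action-controlled hidden Markov models.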
The family of linear recurrent neural networks has shown strong performance as recurrent memory units in partially observable reinforcement learning. We provide a theoretical justification for their empirical effectiveness by constructing and studying two linear filters: (i) the first exactly reproduces the pre-softmax logits of the belief vector in a hidden Markov model (HMM) under a deterministic transition matrix, thereby serving as a sufficient statistic for optimal policy learning; (ii) the second achieves vanishing state-decoding error under a nearly deterministic transition matrix, thus reducing state ambiguity to near zero. The results extend to action-controlled HMMs, where the corresponding linear filters become time-varying with action-dependent dynamics. We illustrate our main results through numerical experiments and further show that the constructed linear filter serves as a strong feature extractor in a small reinforcement learning game.
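The intuition behind filter (i) can be checked numerically. This is our own illustration, not the authors' construction, and it assumes the deterministic transition is a permutation (`nxt` is a bijection). In that case the unnormalized log-belief obeys a linear recurrence: a permutation matrix applied to the logits plus an observation-dependent bias. Applying softmax to the result recovers the exact Bayesian filter. All function names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def bayes_filter(prior, nxt, emis, obs):
    """Exact HMM forward filter; state s moves deterministically to nxt[s]."""
    b = list(prior)
    for o in obs:
        pred = [0.0] * len(b)
        for s, p in enumerate(b):
            pred[nxt[s]] += p  # deterministic transition
        post = [pred[s] * emis[s][o] for s in range(len(b))]
        total = sum(post)
        b = [p / total for p in post]  # normalize
    return b


def linear_logit_filter(prior, nxt, emis, obs):
    """Linear recurrence on log-beliefs: z <- P z + log emis[:, o] (P a permutation)."""
    z = [math.log(p) for p in prior]
    for o in obs:
        moved = [0.0] * len(z)
        for s, v in enumerate(z):
            moved[nxt[s]] = v  # linear map: permute coordinates
        z = [moved[s] + math.log(emis[s][o]) for s in range(len(z))]  # input-driven bias
    return z
```

No normalization appears inside the recurrence. The softmax at readout absorbs the normalizing constant, which is why pre-softmax logits can be carried by a linear state.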
Fraud-type decomposition and an observation-mechanism taxonomy: class-specific detection limits in payment networks
Gaurav Dhama
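AI summary: This paper introduces an observation-mechanism taxonomy that divides fraud into five classes. It proves that estimating fraud rates separately by class and then aggregating outperforms pooled estimation, and derives the theoretical constraints on detection for each class.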
Fraud detection in payment networks relies on labels generated through heterogeneous and imperfect observation processes, yet existing approaches treat fraud as a homogeneous binary variable. We show that this assumption is structurally incorrect and leads to provable inefficiency. We introduce an observation-mechanism taxonomy that partitions fraud into five classes, each defined by a distinct censorship and labeling pipeline. We prove that estimating fraud rates separately by class and aggregating strictly dominates pooled estimation, with the efficiency gap characterized as a Jensen penalty arising from heterogeneous observation rates. For each class, we derive the binding theoretical constraint on detection, including endogenous label corruption, structural non-observability, and feature non-informativeness. These results establish that fraud detection is fundamentally a collection of distinct estimation problems, each governed by its own observation structure and detection limit.
Entropic projection alignment: estimating, explaining, and improving model performance under distribution shift
Salim I. Amoukou, Emanuele Albini, Tom Bewley, Saumitra Mishra, Manuela Veloso
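AI summary: Proposes Entropic Projection Alignment (EPA), which aligns the source distribution with the target distribution by matching selected moments while minimizing the KL divergence. This gives a unified treatment of performance estimation, explanation, and improvement under distribution shift.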
We propose a unified framework for addressing three key challenges of distribution shift: (1) estimating a model's performance on an unlabeled target domain, (2) explaining the shift by identifying the features responsible, and (3) improving the target domain performance. Our method, Entropic Projection Alignment (EPA), aligns the source distribution to the target by matching carefully selected moments while simultaneously minimising the KL divergence from the source. This formulation yields a unique closed-form solution for importance weights, achieving robustness through implicit variance control. Drawing on domain adaptation theory, we establish that moment matching is sufficient for reliable estimation and adaptation, avoiding the need for full density ratio recovery. Extensive experiments, together with strong theoretical guarantees, demonstrate that EPA consistently outperforms state-of-the-art baselines while offering substantial computational efficiency.
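The "unique closed-form solution for importance weights" has the familiar exponential-tilting form. The sketch below is a simplified illustration, not EPA itself. It uses a single scalar feature, uniform source weights, and Newton's method on the convex dual. The sketch minimizes $\mathrm{KL}(w \,\|\, \text{uniform})$ subject to $\sum_i w_i x_i = \mu_{\text{target}}$, whose solution is $w_i \propto e^{\lambda x_i}$. EPA's choice of moments and its solver are not reproduced, and the function names are hypothetical.

```python
import math


def _tilted_weights(xs, lam):
    m = max(lam * x for x in xs)  # stabilize the exponentials
    e = [math.exp(lam * x - m) for x in xs]
    total = sum(e)
    return [v / total for v in e]


def entropic_weights(xs, target_mean, iters=100, tol=1e-12):
    """Weights minimizing KL(w || uniform) subject to sum_i w_i x_i = target_mean.

    The solution has w_i proportional to exp(lam * x_i); lam solves the convex dual
    log-sum-exp(lam * x) - lam * target_mean, here by plain Newton steps
    (a damped step would be more robust for targets near the range boundary).
    """
    if not min(xs) < target_mean < max(xs):
        raise ValueError("target mean must lie strictly inside the range of xs")
    lam = 0.0
    for _ in range(iters):
        w = _tilted_weights(xs, lam)
        mean = sum(wi * x for wi, x in zip(w, xs))
        grad = mean - target_mean  # dual gradient
        if abs(grad) < tol:
            break
        var = sum(wi * (x - mean) ** 2 for wi, x in zip(w, xs))  # dual Hessian
        lam -= grad / var
    return _tilted_weights(xs, lam)
```

Because the weights are a smooth exponential family in $\lambda$, they cannot place unbounded mass on a few points unless the moment constraint forces it. This is one way to read the "implicit variance control" mentioned in the abstract.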
Correcting split selection in online decision trees via anytime-valid inference
Salim I. Amoukou, Saumitra Mishra, Manuela Veloso
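AI summary: Addresses the lack of valid statistical guarantees for split selection in online decision trees with a method based on anytime-valid inference. The method achieves anytime-valid control of false splits under arbitrary data streams and finite commitment time under a predictive advantage. Under stationary i.i.d. data, it guarantees monotonically decreasing risk with strict improvement at every split.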
Bagging-based ensembles, most notably Adaptive Random Forests, are among the strongest performers for learning from data streams. A common denominator across these methods is their reliance on Hoeffding Trees as base learners, which grow decision trees incrementally by testing whether a candidate split is significantly better than its alternatives using concentration inequalities. Despite their empirical success, existing variants lack valid statistical guarantees. Current analyses rely on fixed-sample concentration bounds, while split decisions are made using data-dependent stopping rules, which invalidates their guarantees and can drive the probability of incorrect splits to one. We introduce a principled alternative based on anytime-valid inference. Our method provides: (i) anytime-valid control of false splits under arbitrary data streams, including non-stationary settings; (ii) finite commitment time under a predictive advantage; and (iii) under stationary i.i.d. data, monotonically decreasing risk that strictly improves at every split. Empirically, we evaluate both standalone trees and their use within Adaptive Random Forests on non-stationary streams. Our method improves performance while producing substantially smaller trees.
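The gap between a fixed-sample bound and an anytime-valid one can be made concrete with the simplest time-uniform construction. This is a one-sided Hoeffding bound with a union bound over every sample size $n$, spending $\delta_n = \delta / (n(n+1))$ so that $\sum_n \delta_n = \delta$. It is a crude stand-in, not the authors' method.

The sketch assumes that the tested quantity is a running mean of i.i.d. per-sample terms with range `value_range`. This simplifies the actual information-gain comparison and does not cover non-stationary streams. The function names are hypothetical.

```python
import math


def fixed_sample_radius(n, delta, value_range=1.0):
    """Classical Hoeffding-tree bound: valid only for a single pre-chosen n."""
    return value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def anytime_radius(n, delta, value_range=1.0):
    """One-sided Hoeffding bound made time-uniform by a union bound over n.

    Spending delta_n = delta / (n * (n + 1)) at each n sums to delta, so the
    bound holds simultaneously for all n >= 1 with probability >= 1 - delta.
    """
    return value_range * math.sqrt(math.log(n * (n + 1) / delta) / (2.0 * n))


def should_split(mean_gain_gap, n, delta, value_range=1.0):
    """Split when the running mean of a bounded gain gap clears the anytime bound."""
    return mean_gain_gap > anytime_radius(n, delta, value_range)
```

The anytime radius is always wider than the fixed-sample radius, but only by a logarithmic factor. That extra width is the price of being allowed to check after every new example.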
Strong log-concavity in probit regression
Martin Chak, Giacomo Zanella
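AI summary: This paper proves that strong log-concavity emerges in probit regression likelihoods without ridge penalization (i.e., Gaussian priors). It gives a characterization for fixed designs and a condition-number analysis for Gaussian designs.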
We show that strong log-concavity emerges in probit regression likelihoods without ridge penalization (i.e. Gaussian priors), unlike for the logistic case. Specifically, we provide: (a) a characterization of strong log-concavity for fixed designs, similar to that for the existence of the maximum likelihood estimator (MLE) and (b) an analysis for Gaussian design, dependent on the proportionality $d/n = r\in [0, 1)$ between the sample size $n$ and the number of covariates $d$. In the latter case we show that, with high probability, provided $r$ is small enough, the resulting condition number is finite and, in the asymptotic regime $n, d\rightarrow \infty$, independent of $r$.
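One heuristic for why probit and logistic differ (our illustration, not the authors' argument) is to compare the per-observation curvature of the negative log-likelihood as a function of the margin $t = y\,x^\top\beta$. For probit, $\frac{d^2}{dt^2}[-\log\Phi(t)] = \lambda(t)(\lambda(t)+t)$ with $\lambda = \phi/\Phi$ the inverse Mills ratio. This curvature tends to 1 for badly misclassified points ($t \to -\infty$). The logistic curvature $\sigma(t)(1-\sigma(t))$ vanishes in both tails. The function names are hypothetical.

```python
import math


def probit_curvature(t):
    """Second derivative of -log Phi(t): lam(t) * (lam(t) + t), lam = phi / Phi."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * math.erfc(-t / math.sqrt(2.0))  # normal CDF, accurate in the left tail
    lam = phi / Phi  # inverse Mills ratio
    return lam * (lam + t)


def logistic_curvature(t):
    """Second derivative of -log sigmoid(t): sigmoid(t) * (1 - sigmoid(t))."""
    s = 1.0 / (1.0 + math.exp(-t))
    return s * (1.0 - s)
```

At $t = 0$ the probit curvature equals $2/\pi$. At $t = -8$ it is close to 1, while the logistic value is below $10^{-3}$.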
On public and private binary classification with metric space-valued predictors
László Györfi, Martin Kroll, Harro Walk
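AI summary: For binary classification with metric space-valued predictors, derives a convergence rate for the Proto-NN classifier. Under local differential privacy constraints, it establishes universal consistency and convergence rates for a privatised Proto-NN classifier.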
We consider the problem of binary classification in a framework where the predictor $X$ takes values in an arbitrary separable metric space $\mathcal X$ and the label $Y$ takes values in $\{ \pm 1 \}$. In the first part of this work, we assume that one has direct access to an i.i.d. sample $(X_1,Y_1),\ldots,(X_n,Y_n)$ from the unknown distribution of the pair $(X,Y)$. We derive a convergence rate for the Proto-NN classifier, which was recently introduced as a classifier for metric space-valued predictors. In the second part of the paper, we reconsider the same problem under an additional privacy constraint. More precisely, we work in the framework of local differential privacy, where one assumes that the data $(X_1,Y_1),\ldots,(X_n,Y_n)$ cannot be observed directly and only a privatised surrogate, obtained through a suitable mechanism satisfying the privacy constraint, is available. The statistician should select an optimal privacy mechanism from the class of all mechanisms that guarantee local differential privacy. Our method of choice is to add Laplace-distributed noise to a set of training data, and we show that the Proto-NN classifier using the privatised data only is universally consistent. Finally, a rate of convergence for the privatised Proto-NN classifier is derived.
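The abstract does not specify which quantities are noised or how Proto-NN aggregates them. The sketch below therefore shows only the generic ingredient. Each individual releases a label $Y \in \{\pm 1\}$ through the Laplace mechanism, and a vote is then taken over one "cell" of samples.

The cell is assumed here to be the set of samples assigned to one prototype. Since the label has sensitivity 2, noise scale $b = 2/\varepsilon$ gives $\varepsilon$-local differential privacy. The function names and the cell-voting step are hypothetical.

```python
import random


def privatise_label(y, epsilon, rng):
    """epsilon-LDP release of a label y in {-1, +1} via the Laplace mechanism.

    The label's sensitivity is 2, so the noise scale is b = 2 / epsilon.
    A Laplace(0, b) draw is the difference of two independent Exp(rate 1/b) draws.
    """
    b = 2.0 / epsilon
    return y + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def private_cell_vote(labels, epsilon, rng):
    """Estimate the mean label in one cell from privatised reports; vote by its sign."""
    reports = [privatise_label(y, epsilon, rng) for y in labels]
    mean = sum(reports) / len(reports)  # unbiased: Laplace noise has mean zero
    return mean, (1 if mean >= 0 else -1)
```

The noise inflates the variance of each report by $2b^2 = 8/\varepsilon^2$. This is why consistency under privacy requires the number of samples per cell to grow.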
Convergence of two-timescale Markovian stochastic approximation with applications to reinforcement learning
Vagul Mahadevan, Claire Chen, Shuze Daniel Liu, Shangtong Zhang
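AI summary: Studies the stability and convergence of two-timescale stochastic approximation under Markovian noise. By controlling the fast-timescale parameter with the running maximum of the slow-timescale parameter, it establishes the first almost sure convergence of TDC with eligibility traces under off-policy linear function approximation.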
This work studies the convergence of two-timescale stochastic approximations (SA), a class of iterative algorithms that update two sets of parameters in fast and slow timescales respectively. Notable examples of two-timescale SA in reinforcement learning (RL) include temporal difference learning with gradient correction (TDC) and actor-critic methods. Previously, the stability (i.e., boundedness) and convergence of two-timescale SA were only established under i.i.d. noise. This work instead establishes the stability and convergence of two-timescale SA under Markovian noise, a setup that is more realistic in RL. Notably, we do not need to use any projection operator and the noise does not need to live in a compact space. Our key technical novelty is to control the fast timescale parameter with the running max of the slow timescale parameter, instead of with the current slow timescale parameter, as most prior works do. As a key application, we establish the first almost sure convergence of TDC with eligibility traces under off-policy learning with linear function approximation.
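A toy two-timescale scheme helps fix the vocabulary. It is a hypothetical linear example, not TDC and not the paper's proof technique. The fast iterate `y` uses the larger step size `beta` and tracks the current slow iterate `x`. The slow iterate `x` uses the smaller step size `alpha` and sees `y` as if it had already converged. Both updates are driven by a two-state Markov chain noise, so consecutive noise terms are correlated rather than i.i.d.

```python
import random


def two_timescale_sa(n_steps=100_000, flip_prob=0.1, seed=0):
    """Toy two-timescale SA driven by Markovian noise (illustrative only).

    Fast iterate y tracks x; slow iterate x solves 1 - y = 0 given y ~ x,
    so both should approach 1. The noise s in {-1, +1} is a two-state Markov
    chain that flips sign with probability flip_prob (stationary mean 0).
    """
    rng = random.Random(seed)
    x, y, s = 0.0, 0.0, 1.0
    for n in range(1, n_steps + 1):
        alpha = 1.0 / (n + 10)  # slow step size
        beta = 1.0 / (n + 10) ** 0.6  # fast step size; alpha / beta -> 0
        if rng.random() < flip_prob:
            s = -s  # Markovian (correlated) noise
        y += beta * (x - y + s)  # fast timescale
        x += alpha * (1.0 - y + s)  # slow timescale
    return x, y
```

No projection is applied to either iterate. The noise here is bounded only because the toy chain has two states, whereas the paper's setting allows noise outside a compact space.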
Designing memory: probabilistic sequence layers
Matthew Dowling, Hyungju Jeon, Cristina Savin, Il Memming Park
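AI summary: Proposes the design-model framework, which derives efficient recurrent sequence maps through exact Bayesian filtering. In the linear-Gaussian instantiation, the Bayesian Layer propagates a mean and a covariance to track uncertainty. The framework unifies several sub-quadratic recurrences and improves robustness and long-context retrieval.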
We introduce the design-model framework: a way to derive efficient recurrent sequence maps from explicit assumptions about memory. A design model writes evidence into memory by exact Bayesian filtering; a query-dependent readout produces a predictive distribution whose mean is the layer output. In our linear-Gaussian instantiation, the Bayesian Layer propagates both a mean and a covariance: the covariance tracks uncertainty over stored associations, steering writes toward uncertain directions, attenuating gains as evidence accumulates, and preserving confident memories. The same framework unifies several sub-quadratic recurrences. Linear attention, GLA, and Mamba-2/SSD are exact filters under one design model, whereas DeltaNet and related Delta-rule models arise as covariance-reset reductions under another. Restoring the covariance yields closed-form predictions for retrieval dynamics, verified empirically, and improves robustness beyond the training regime across controlled collision studies, learned associative recall, and the Zoology MQAR benchmark; distilling Bayesian Layers into a pretrained 340M Gated DeltaNet improves RULER long-context retrieval at matched compute.
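The contrast between a covariance-tracking write and a delta-rule write can be sketched with a generic Kalman (Bayesian linear regression) update. The memory here is a vector `m` storing scalar values under the model $v = m \cdot k + \text{noise}$. This is a simplified illustration, not the authors' Bayesian Layer, which uses matrix memories and query-dependent readouts.

Resetting the covariance to the identity before each write turns the same update into a normalized delta rule, which mirrors the "covariance-reset reduction" described above. All names are hypothetical.

```python
def kalman_write(m, P, k, v, noise_var):
    """One exact Bayesian (Kalman) update of a linear memory v = m . k + noise.

    m: posterior mean (list of floats); P: posterior covariance (symmetric list
    of lists); (k, v): key and scalar value to store.
    """
    d = len(m)
    Pk = [sum(P[i][j] * k[j] for j in range(d)) for i in range(d)]
    s = sum(k[i] * Pk[i] for i in range(d)) + noise_var  # predictive variance of v
    err = v - sum(m[i] * k[i] for i in range(d))  # prediction error
    m_new = [m[i] + Pk[i] * err / s for i in range(d)]  # gain steers writes along P k
    P_new = [[P[i][j] - Pk[i] * Pk[j] / s for j in range(d)] for i in range(d)]
    return m_new, P_new


def delta_rule_write(m, k, v, noise_var):
    """The same update with the covariance reset to the identity before each write."""
    identity = [[1.0 if i == j else 0.0 for j in range(len(m))] for i in range(len(m))]
    return kalman_write(m, identity, k, v, noise_var)[0]


def read(m, k):
    return sum(mi * ki for mi, ki in zip(m, k))
```

With a tracked covariance, directions that have already been written with confidence receive small gains. After a reset, every write is treated as if nothing had been learned yet.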
Approximation and learning of anisotropic and mixed smooth functions by deep ReLU neural networks
Yunfei Yang, Jun Fan
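AI summary: Studies the approximation rates of deep ReLU neural networks for anisotropic and mixed smooth function classes, and proves that near-optimal approximation rates are attainable under conditions on the mean smoothness.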
This paper studies how efficiently deep ReLU neural networks can approximate and learn smooth functions. When the error is measured in the $L^p([0,1]^d)$ norm and the approximator is a network with width $W$ and depth $L$, recent works have proven the super-approximation rate $\mathcal{O}((WL)^{-2s/d})$ for the Besov space $\mathcal{B}^s_{q,r}([0,1]^d)$ under the Sobolev embedding condition $s/d>1/q-1/p$. To overcome the curse of dimensionality in this rate, we extend this result to anisotropic and mixed smooth function classes. We establish the approximation rate $\mathcal{O}((WL)^{-2\tilde{s}})$ for the anisotropic Besov space $\mathcal{B}^{\boldsymbol{s}}_{q,r}([0,1]^d)$ with anisotropic smoothness $\boldsymbol{s}=(s_1,\dots,s_d)$ under the embedding condition $\tilde{s} > 1/q-1/p$, where $\tilde{s} = (\sum_{i=1}^d s_i^{-1})^{-1}$ is the mean smoothness. For the mixed smooth Besov space $\mathcal{MB}^s_{q,r}([0,1]^d)$ with mixed smoothness $s>1/q-1/p$, we show that the approximation rate $\mathcal{O}((WL)^{-2s})$ holds up to logarithmic factors. Using these results, we also derive approximation bounds for compositions of anisotropic Besov functions. As an application, we show that deep ReLU neural networks can achieve minimax optimal rates up to logarithmic factors for a wide range of smooth function classes.
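A quick numerical check shows how the mean smoothness $\tilde{s}$ relates to the isotropic rate. When all $s_i = s$, $\tilde{s} = s/d$, so $\mathcal{O}((WL)^{-2\tilde{s}})$ reduces to the isotropic $\mathcal{O}((WL)^{-2s/d})$. When only one direction is rough, $\tilde{s}$ is close to that direction's smoothness rather than being divided by $d$. The helper names are hypothetical.

```python
def mean_smoothness(s):
    """Mean smoothness s_tilde = (sum_i 1 / s_i) ** -1 for anisotropic smoothness s."""
    if any(si <= 0 for si in s):
        raise ValueError("smoothness indices must be positive")
    return 1.0 / sum(1.0 / si for si in s)


def anisotropic_rate_exponent(s, p, q):
    """Exponent 2 * s_tilde in O((W L) ** (-2 * s_tilde)), if s_tilde > 1/q - 1/p."""
    st = mean_smoothness(s)
    if not st > 1.0 / q - 1.0 / p:
        raise ValueError("embedding condition s_tilde > 1/q - 1/p fails")
    return 2.0 * st


print(mean_smoothness([2.0, 2.0, 2.0]))  # isotropic: s / d = 2 / 3
print(mean_smoothness([1.0, 100.0, 100.0]))  # about 0.98, versus 1 / 3 for isotropic s = 1, d = 3
```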
Debiased inference for stochastic treatment interventions with survival outcomes
Torben Martinussen, Mark Bech Knudsen, Helene Rytgaard
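AI summary: For time-dependent treatments with survival outcomes, defines a stochastic intervention through the illness-death model. It resolves the lack of pathwise differentiability by smoothing the intervention and proposes a debiased one-step estimator for robust inference.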
Estimating the causal effect of a time-dependent treatment on time to death is challenging. In this paper, we formulate the problem using the illness-death model and focus on a stochastic intervention that modifies the hazard governing the transition from no treatment to treatment initiation. Such an intervention can only be implemented at the level of the observed data, whereas the causally valid intervention is defined at the level of the true data-generating process. We provide conditions under which the practically feasible intervention corresponds to the desired causal intervention in the specific setting. We first consider an intervention in which treatment is initiated at a fixed time point, which may subsequently be varied across the relevant time span. However, the resulting estimand is not pathwise differentiable, preventing the development of assumption-lean inference. To address this, we instead consider a smoothed intervention that assigns treatment within a time window around the target time point, thereby yielding a parameter amenable to semiparametric analysis. We derive the corresponding efficient influence function and propose a debiased one-step estimator with desirable robustness properties. We investigate its finite-sample performance in a simulation study and apply the method to the classical Stanford Heart Transplant data, as well as to data on treatment delay among couples with unexplained subfertility seeking intrauterine insemination.
Model-agnostic signal discovery with machine learning: bridging theory and practice
Oz Amram, Marco Letizia, Mikael Kuusela
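AI summary: Reviews AI-based model-agnostic search strategies, which aim to enhance the discovery potential of experiments through broad exploration rather than specific hypotheses, and discusses methods for their validation and interpretation.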
Searches for new phenomena in complex scientific data are predominantly model-dependent, optimized for specific hypotheses, and therefore limited in their coverage of the space of possible signals. Recently, new AI-based model-agnostic search strategies, many of which have been pioneered in high-energy physics, have been proposed which provide a complementary paradigm, prioritizing broad exploration over tailored analyses. These techniques offer an opportunity to enhance the overall discovery potential of modern experiments, especially in regimes where theoretical guidance is scarce. In this document, we review the conceptual framework behind the main classes of AI-based model-agnostic strategies. We discuss the potential pitfalls of these methods, and strategies for their validation and interpretation. We aim for this document to serve as a useful reference both for practitioners and for researchers interested in learning more about these model-agnostic search strategies.
On couplings for the kinetic Langevin diffusion
Nawaf Bou-Rabee, Sonja Cox, Roy Schieven
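AI summary: Studies the relationship between couplings and total variation bounds for the kinetic Langevin diffusion and its splitting discretizations. It derives an exact contraction formula for the sticky coupling and improves existing results through an explicit non-Markovian coupling.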
For the kinetic Langevin diffusion and its splitting discretizations, the hypoelliptic noise structure makes the relationship between couplings and total variation (TV) bounds more subtle than in the elliptic case. We establish that, for the kinetic Langevin equation with quadratic potential, no Markovian coupling (continuous or discrete) captures the asymptotic decay rate of the TV distance between two solutions with different initial values; the canonical iterated one-shot (or sticky) coupling, for which we derive an exact contraction formula, saturates this lower bound. On the constructive side, we show that the recent sharp TV bounds obtained by Chak and Monmarché admit a natural interpretation through an explicit non-Markovian coupling, built from an optimal coalescence trajectory characterized by a classical minimum-energy control problem. For the OBABO splitting scheme, this approach additionally eliminates the Hessian-Lipschitz, step-size, and final-time assumptions in the work of Chak and Monmarché.
Predicting threshold exceedances of atmospheric variables at a specific site
Roberta Baggio, Jean-François Muzy
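AI summary: Compares direct probabilistic methods with full-distribution probabilistic methods for predicting threshold exceedances of atmospheric variables (such as temperature and wind speed) at a given site. It finds that the full-distribution approach performs better for extreme events and attributes this advantage to accurately capturing the overall features of the conditional distribution.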
This study compares two methodological approaches for predicting, at a given site, threshold exceedances of atmospheric variables such as temperature and wind speed: (i) direct probabilistic methods, which treat exceedance as a binary classification problem, and (ii) full distribution probabilistic methods, which model the complete conditional probability law of the target variable. Using theoretical analysis and numerical simulations on a toy model, alongside real-world data from the MeteoNet dataset (2016–2018) for southeastern France, we demonstrate that the full distribution approach consistently outperforms the direct method for rare, extreme events. This advantage arises because the full distribution approach effectively learns the parameters of the conditional distribution from moderate and mild intensity events, thereby achieving better calibration and discrimination in the tails. We find that the specific parametric shape of the chosen distribution plays a secondary role compared to accurately capturing predictable shifts in its bulk properties (i.e., mean and variance). This empirical indistinguishability is also informative about the physical mechanisms driving atmospheric extremes, suggesting that extreme exceedances are primarily driven by significant conditional displacements of the entire distribution rather than by unpredictable, fat-tailed anomalies within a static climatology. Our results are validated for both strong surface wind speeds and intense hourly rainfall, with performance evaluated using proper scoring rules (Brier score, logarithmic score) and deterministic skill scores (Peirce Skill Score, CSI, HSS). These findings highlight the critical importance of modeling the full probability distribution for rare-event forecasting and provide practical guidance for improving extreme weather prediction in operational meteorology.
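In the full-distribution route, the exceedance probability is read off a predicted conditional law rather than learned directly as a binary label. The sketch below assumes a Gaussian conditional law with a location `mu` and scale `sigma`, which a model would predict from covariates. The abstract reports that the exact parametric shape matters less than these bulk properties. The Brier score is the proper scoring rule named above. The function names and the Gaussian choice are assumptions for illustration.

```python
import math


def exceedance_prob(threshold, mu, sigma):
    """P(Y > threshold) for Y ~ Normal(mu, sigma**2): the full-distribution route."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

Because `mu` and `sigma` are fitted on all events, including moderate ones, a shift in the predicted location moves the whole tail. A direct classifier would instead have to learn each rare threshold from few positive examples.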
Free energy estimation on arbitrary state spaces
Jiajun He, Zijing Ou, Francisco Vargas, Yingzhen Li, José Miguel Hernández-Lobato, Carles Domingo-Enrich, Yuanqi Du
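AI summary: Proposes a framework based on generalized neural transport learning that extends free energy estimation to arbitrary state spaces, and reveals a group-theoretic structure linking time reversal and Doob's $h$-transforms.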
Free energy estimation is a fundamental yet challenging problem in fields spanning physics and statistics. Classical approaches rely on thermodynamic transformations, ranging from direct estimation and quasistatic integration to finite-time averaging. Recent work [He and Du et al., 2025] learns neural transports that significantly improve efficiency in the finite-time regime. In this paper, we generalize this framework to arbitrary state spaces. Building on this view, we develop a generalized neural transport learning approach for efficient estimation. Experiments validate the effectiveness and efficiency of the proposed method beyond continuous settings, extending to discrete and multimodal spaces as well as autoregressive settings. Beyond free energy estimation, we establish algebraic identities and reveal a group-theoretic structure linking infinitesimal time reversal and generalized Doob's $h$-transforms, showing that their compositions form a generalized dihedral group.
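For reference, the "direct estimation" baseline mentioned above can be written on a finite state space using Zwanzig's free energy perturbation identity at unit temperature:

$$\Delta F = -\log \mathbb{E}_{x \sim \pi_0}\left[e^{-(E_1(x) - E_0(x))}\right].$$

This is the classical estimator, not the paper's neural transport method. The energies and function names below are hypothetical.

```python
import math
import random


def exact_delta_f(states, e0, e1):
    """Exact Delta F = -log(Z1 / Z0) at unit temperature on a finite state space."""
    z0 = sum(math.exp(-e0(x)) for x in states)
    z1 = sum(math.exp(-e1(x)) for x in states)
    return -math.log(z1 / z0)


def fep_delta_f(states, e0, e1, n_samples, rng):
    """Direct (Zwanzig) estimate: Delta F = -log E_{x ~ pi0}[exp(-(e1(x) - e0(x)))]."""
    weights = [math.exp(-e0(x)) for x in states]
    xs = rng.choices(states, weights=weights, k=n_samples)  # exact samples from pi0
    avg = sum(math.exp(-(e1(x) - e0(x))) for x in xs) / n_samples
    return -math.log(avg)
```

The estimator degrades when $\pi_0$ and $\pi_1$ overlap poorly. This motivates intermediate transformations, and in the cited line of work, learned transports.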
Routing on the Stiefel manifold: when does adaptive subspace selection help cross-domain EEG decoding?
Isabella Costa Maia, Pedro L. C. Rodrigues, Salem Said, Marco Congedo
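AI summary: To address domain shift in covariance matrices for cross-domain EEG decoding, proposes dynamic Stiefel routing, which performs adaptive subspace selection through a pool of expert projection filters on the Stiefel manifold and a cross-attention mechanism. Three structural properties prevent collapse to ensemble averaging, yielding consistent gains on three datasets.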
Cross-domain EEG decoding remains challenging despite advances in Riemannian deep learning: covariance matrices from different subjects occupy systematically distinct regions of the SPD manifold, yet existing domain adaptation methods either require target-domain calibration data or learn subject-specific components that cannot generalise across domains. We propose dynamic Stiefel routing: a pool of $K$ expert projection filters on the Stiefel manifold, each specialised for a different region of the SPD manifold, with each input covariance routed to the most appropriate filter via cross-attention, adapting the subspace projection per sample. A central finding is that this approach, implemented naively, provably collapses to ensemble averaging: when routing weights are uniform, the adaptive filter reduces exactly to an equal-contribution combination of experts, indistinguishable from a single fixed filter. Three structural properties break this degeneracy: a symmetric anchor $W_{\mathrm{base}} \in \mathrm{St}(n,k)$ that removes proximity bias among experts; a frozen domain-discriminative query encoder that decouples routing from task optimisation; and a decoupled key alignment loss that trains expert keys toward stable domain attractors. Together they produce the first genuinely committed and domain-structured routing on SPD manifolds, with consistent gains across three datasets: balanced accuracy improves from $0.773\to 0.823$, $0.757\to 0.809$, and $0.801\to 0.839$, with the alignment strategy determined automatically by a single data-driven rule and no dataset-specific hyperparameter search.
A functional central limit theorem for $U$-statistics of $\beta$-mixing data
Davide Giraudo
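AI summary: For strictly stationary $\beta$-mixing sequences of random variables, studies the convergence of partial sum processes in the space of continuous functions and in Hölder spaces, under conditions close to optimal.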
We investigate the convergence of partial sum processes based on a strictly stationary $\beta$-mixing sequence of random variables. Convergence in the space of continuous functions as well as in Hölder spaces is considered. The conditions are close to optimal.
Frontier hedging: learning new tasks from few samples
Tobias Wegel, Federico Di Gennaro, Geelon So, Fanny Yang
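AI summary: For new tasks with few samples, exploits a weak-monotonicity assumption and hedges on the model frontier within transfer learning and model selection aggregation, achieving provable statistical gains.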
When a learner faces a new task with few samples, it must leverage any available side information. In practice, this often comes in the form of model evaluations on related tasks in public benchmarks. A key question then is how to model task relatedness such that it is both realistic and the benchmark evaluations lead to provable gains. Empirically, we observe that weak monotonicity is often approximately satisfied: if a model dominates another on many benchmarks, it also tends to outperform on the new task. We explore the statistical complexity of learning under (approximate) weak monotonicity, leveraging it within two learning paradigms: transfer learning and model selection aggregation. We show that not only can we prune the model class based on monotonicity, but we can also further adapt to the geometry of the available trade-offs by hedging on the frontier.
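The pruning step can be illustrated with a strict version of monotonicity. A model that another model matches or beats on every benchmark, and strictly beats on at least one, is removed. Only the Pareto frontier survives.

This is a simplification. The paper's weak monotonicity concerns dominance on many benchmarks and holds only approximately, and the hedging over the frontier is not implemented here. The function name and scores are hypothetical.

```python
def pareto_frontier(scores):
    """Return the names of models not dominated across benchmarks (higher is better).

    Under an exact monotonicity assumption, a dominated model is expected to be
    no better on the new task, so it can be pruned before any new-task samples are used.
    """

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


print(pareto_frontier({"a": (0.9, 0.8), "b": (0.7, 0.7), "c": (0.6, 0.95)}))  # ['a', 'c']
```

The surviving models embody different trade-offs across benchmarks. Choosing among them with the few new-task samples is where the frontier geometry comes in.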
Weighted Poincaré inequalities for multivariate Liouville distributions, with applications to global sensitivity analysis
David Heredia
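AI summary: Establishes weighted Poincaré inequalities for multivariate Liouville distributions and extends them via a transport argument to continuous elliptically contoured distributions. The results are applied to global sensitivity analysis, with their practical use illustrated in a flood model case study.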
In this work we establish weighted Poincaré inequalities for multivariate Liouville distributions, which generalize the Dirichlet distribution. We also consider continuous elliptically contoured distributions, whose density level sets are unions of hyperellipsoids. Our approach is based on a transport argument that allows weighted Poincaré inequalities to be transferred between probability measures. We apply our results to global sensitivity analysis and illustrate their practical use in a flood model case study, where the dependence structure of the input variables is encoded by classical copulas.
Batched stochastic linear bandits with 1-bit communication constraints
Ivan Lau, Daniel McMorrow, Kevin Jamieson, Jonathan Scarlett
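AI summary: Studies regret minimization for stochastic linear bandits under batches of size $B$ with only 1 bit of feedback per batch. It proposes two phased-elimination algorithms based on $G$-optimal designs and 1-bit mean estimation that nearly achieve the optimal regret of unconstrained linear bandits.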
We study stochastic linear bandits under a natural combination of batching and communication constraints: the time horizon is partitioned into batches of equal size $B$, and during each batch the learner sends $B$ requested arm pulls to an agent, who then observes the corresponding $B$ rewards and responds with a single bit of feedback to the learner. For each batch, the learner specifies the 1-bit quantization rule the agent uses, which may depend on all previously received bits but not on any past rewards directly. This setting addresses a significant yet unexplored "middle ground" between previous models having per-round quantization only or total bit budgets only. We establish a minimax lower bound showing that $\Omega(B\min\{d,\log\lvert \mathcal{A} \rvert\})$ regret is unavoidable due to the 1-bit communication bottleneck, even in the absence of noise. Combined with standard statistical limits, this yields a general lower bound of $\widetilde{\Omega}(B\min\{d,\log\lvert \mathcal{A} \rvert\} + \sqrt{dT \min\{d,\log\lvert \mathcal{A} \rvert\}})$. We develop two phased-elimination algorithms based on $G$-optimal designs and 1-bit mean estimation. The first achieves $\widetilde{O}(dB + d\sqrt{T})$ regret, matching the lower bound up to logarithmic factors when $\lvert \mathcal{A} \rvert = \exp(\Omega(d))$, and the second incorporates a safe-arm identification and warm-start procedure to obtain $\widetilde{O}(B\log\lvert \mathcal{A} \rvert + d^{3/2}\sqrt{B} + \sqrt{dT\log\lvert \mathcal{A} \rvert})$ regret, which is near-optimal in broad scaling regimes of $(\lvert \mathcal{A} \rvert, B, d, T)$. Together, our results demonstrate that a single bit of feedback per batch suffices to nearly match the minimax regret of unconstrained linear bandits in broad scaling regimes, even for batch sizes as large as $\Theta(\sqrt{T})$.
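One standard way to estimate a mean from one bit per batch is random dithering. We assume it here for illustration; the abstract does not say this is the paper's estimator.

- The learner draws a threshold $U \sim \mathrm{Unif}(-R, R)$. This is consistent with the learner choosing the quantization rule without seeing past rewards.
- The agent reports whether the batch-average reward exceeds $U$.
- If batch averages lie in $[-R, R]$, then $\Pr(\text{bit}=1) = (\mu + R)/(2R)$, so $2R \cdot \overline{\text{bit}} - R$ is unbiased for $\mu$.

The function names are hypothetical.

```python
import random


def one_bit_report(batch_rewards, threshold):
    """Agent side: one bit saying whether the batch-average reward exceeds the threshold."""
    return 1 if sum(batch_rewards) / len(batch_rewards) > threshold else 0


def estimate_mean_from_bits(bits, bound):
    """Learner side: with thresholds drawn Uniform(-bound, bound) and batch averages
    in [-bound, bound], P(bit = 1) = (mean + bound) / (2 * bound); invert that map."""
    return 2.0 * bound * sum(bits) / len(bits) - bound
```

Each bit carries at most one bit of information about the batch, however large $B$ is. This is the bottleneck behind the $\Omega(B\min\{d,\log\lvert \mathcal{A} \rvert\})$ term in the lower bound.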
Additive matrix integer-valued autoregressive model
Kaiyan Cui, Yikai Hu, Tianyun Guo
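AI summary: For high-dimensional integer-valued matrix time series, proposes the additive matrix integer-valued autoregressive (Add-MINAR) model, which decomposes the response into row, column, and lagged effects to improve parameter interpretability and structural flexibility, and develops two estimators: projection estimation and iterative conditional least squares estimation.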
In the contemporary data-driven and technology-integrated era, various integer-valued matrix time series, such as cross-regional crime statistics, multi-category sales records, and network traffic matrices, exhibit high dimensionality, complex structures, and strong intertwined row-column dependencies. Although the existing matrix integer-valued autoregressive (MINAR) model provides a framework that directly handles matrix data and captures bidirectional row-column dependencies, it suffers from limited interpretability and inflexible structural representation: its parameters often lack clear empirical meaning, and the model cannot separate the effects arising from rows, columns, and lagged dynamics. To overcome these drawbacks, this paper proposes the additive matrix integer-valued autoregressive (Add-MINAR) model. By introducing an additive structure that explicitly decomposes the matrix response into row effects, column effects, and lagged effects, the proposed model preserves the matrix-valued nature of the data while significantly enhancing parameter interpretability and structural flexibility. Two estimation methods, projection estimation and iterative conditional least squares estimation, are developed for parameter identification and inference, and their asymptotic properties, including consistency and asymptotic normality, are rigorously established. Simulation results show that the iterative conditional least squares estimator outperforms the projection estimator in most scenarios. An empirical analysis of Chicago crime data further demonstrates that the Add-MINAR model achieves superior in-sample fit and out-of-sample forecasting performance compared with benchmark models such as MINAR, making it particularly suitable for practical applications with explicit row-column interaction features.
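The abstract does not give the Add-MINAR equations, so the sketch below stays with the scalar INAR(1) building block that integer-valued autoregressions are commonly built from: binomial thinning plus Poisson innovations, $X_t = \alpha \circ X_{t-1} + \varepsilon_t$. It shows conditional least squares in its simplest form, using $E[X_t \mid X_{t-1}] = \alpha X_{t-1} + \lambda$. The matrix structure, the additive row/column/lag decomposition, and the paper's iterative scheme are not reproduced, and the parameter values are arbitrary.

```python
import math
import random


def binomial_thinning(alpha, count, rng):
    """alpha ∘ count: each of `count` individuals survives independently w.p. alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def poisson(lam, rng):
    """Knuth's multiplicative Poisson sampler (adequate for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod >= threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_inar1(alpha, lam, n, rng, x0=0):
    """X_t = alpha ∘ X_{t-1} + Poisson(lam) innovation."""
    xs = [x0]
    for _ in range(n):
        xs.append(binomial_thinning(alpha, xs[-1], rng) + poisson(lam, rng))
    return xs


def cls_inar1(xs):
    """Conditional least squares: (alpha, lam) are the OLS coefficients of X_t on X_{t-1}."""
    prev, curr = xs[:-1], xs[1:]
    n = len(prev)
    mp, mc = sum(prev) / n, sum(curr) / n
    cov = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    var = sum((p - mp) ** 2 for p in prev)
    alpha_hat = cov / var
    return alpha_hat, mc - alpha_hat * mp


rng = random.Random(42)
series = simulate_inar1(alpha=0.5, lam=2.0, n=5000, rng=rng)
alpha_hat, lam_hat = cls_inar1(series)
print(f"alpha_hat={alpha_hat:.3f}, lam_hat={lam_hat:.3f}")
```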
Coordination without communication: beyond optimisation and geometric Brownian motion
G J Milburn, A K Ringsmuth
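AI summary: Proposes a physical framework based on information-constrained feedback in which, in a partially observed stochastic dynamical system, population coordination arises from macro-to-micro feedback without direct communication or strategy optimisation.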
We introduce a physically grounded framework for coordination in a population, based on information-constrained feedback in a partially observed stochastic dynamical system. Population size evolves as a continuous-time birth-death Markov process whose transition rates respond to a shared stochastic measurement signal correlated with the underlying population state. Individuals neither communicate directly nor optimise strategies; instead, coordination emerges from macro-to-micro feedback mediated by imperfect common information. We show that geometric Brownian motion arises as a limiting case of the conditional dynamics when measurement strength and population statistics satisfy suitable conditions. More generally, varying the signal-to-noise properties of the measurement channel produces a wider class of stochastic growth processes, including diffusive and jump-like regimes, even though ensemble-average growth remains exponential. In an appropriate limit, the framework recovers the stochastic multiplicative growth model of Peters and Adamou, providing a physical interpretation of coordination as inference and feedback under partial observability.
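The distinction between ensemble-average growth and the growth of a typical trajectory can be checked numerically for plain geometric Brownian motion. The sketch does not simulate the birth-death measurement model; it only samples the exact GBM solution $X_t = x_0 \exp((\mu - \sigma^2/2)t + \sigma W_t)$ with illustrative parameter values, and compares the growth rate of the ensemble mean (approximately $\mu$) with the average log-growth of individual paths (approximately $\mu - \sigma^2/2$).

```python
import math
import random


def gbm_terminal_values(mu, sigma, t, n_paths, rng, x0=1.0):
    """Exact GBM samples at time t: x0 * exp((mu - sigma^2 / 2) t + sigma * W_t)."""
    drift = (mu - 0.5 * sigma**2) * t
    scale = sigma * math.sqrt(t)
    return [x0 * math.exp(drift + scale * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


rng = random.Random(7)
mu, sigma, t = 0.05, 0.4, 1.0
xs = gbm_terminal_values(mu, sigma, t, 200_000, rng)

# Growth rate of the ensemble average: log(E[X_t]) / t, approximately mu.
ensemble_growth = math.log(sum(xs) / len(xs)) / t
# Average log-growth of a single path: E[log X_t] / t = mu - sigma^2 / 2.
typical_growth = sum(math.log(x) for x in xs) / len(xs) / t
print(f"ensemble: {ensemble_growth:.4f}  typical: {typical_growth:.4f}  "
      f"mu - sigma^2/2 = {mu - 0.5 * sigma**2:.4f}")
```

With these values the ensemble average grows while the typical path shrinks, which is why the two averages must be kept apart in multiplicative-growth models.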
Inspectable neural Markov models for non-stationary time series
Jan Rovirosa, Jesse Schmolze
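AI summary: Proposes a hybrid method in which a neural network parameterizes the manifold of stochastic matrices to estimate non-stationary Markov chains from sparse data. Using financial markets as a testbed, it finds that state variables based on realized volatility yield a more consistent structure than return-based ones, reducing the Chapman-Kolmogorov discrepancy by 5.6% and improving held-out likelihood on 9 of 10 assets.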
Modeling non-stationary stochastic systems requires balancing the representational capacity of deep learning with the structural transparency of classical probabilistic models. Markov transition matrices provide such a framework, but traditional frequency-based estimation collapses at high resolutions due to data sparsity. We propose a hybrid approach that parameterizes the manifold of stochastic matrices through a neural network, enabling estimation of time-inhomogeneous Markov chains in sparse-data regimes, and use financial markets as a testbed to investigate the Markov state variable as a critical inductive bias. We show that conditioning on realized volatility produces a more internally consistent Markovian structure than return-based states, achieving a $5.6\%$ reduction in Chapman-Kolmogorov discrepancy and superior held-out likelihood in 9 of 10 assets. Unlike black-box sequence models, our approach generates explicit matrices amenable to direct geometric analysis, surfacing structural findings such as the universal homogenization of transition probabilities under high-volatility regimes.
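A row-wise softmax is one simple way to make a network's output land on the manifold of row-stochastic matrices, and the Chapman-Kolmogorov property gives a direct consistency check. The sketch assumes a hypothetical one-feature "network" (logits $a_{ij}\sin t + b_{ij}$) and defines the discrepancy as the Frobenius distance between a two-step transition matrix and the product of the two one-step matrices. The paper's architecture and its exact discrepancy metric are not given in the abstract.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(params, t):
    """Hypothetical tiny 'network': logits_ij(t) = a_ij * sin(t) + b_ij, then a
    row-wise softmax, so every output is a valid row-stochastic matrix."""
    a, b = params
    k = len(b)
    return [softmax([a[i][j] * math.sin(t) + b[i][j] for j in range(k)]) for i in range(k)]


def matmul(A, B):
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def ck_discrepancy(two_step, P1, P2):
    """Frobenius distance between a two-step matrix and the Chapman-Kolmogorov product P1 P2."""
    C = matmul(P1, P2)
    return math.sqrt(sum((two_step[i][j] - C[i][j]) ** 2
                         for i in range(len(C)) for j in range(len(C[0]))))


rng = random.Random(0)
k = 3
params = ([[rng.gauss(0, 1) for _ in range(k)] for _ in range(k)],
          [[rng.gauss(0, 1) for _ in range(k)] for _ in range(k)])
P1, P2 = transition_matrix(params, 0.0), transition_matrix(params, 1.0)
uniform = [[1.0 / k] * k for _ in range(k)]
print(ck_discrepancy(matmul(P1, P2), P1, P2), ck_discrepancy(uniform, P1, P2))
```

In practice the two-step matrix would come from data (for example, transition counts at a coarser time step), so a nonzero discrepancy measures how far the fitted one-step matrices are from Markov consistency.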
Local linear convergence of gradient methods for overparameterized Gaussian mixture models
Jingxing Wang, Vasileios Charisopoulos, Maryam Fazel
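AI summary: For overparameterized Gaussian mixture models, proposes a method that alternates short gradient steps with long Polyak steps, achieving a locally linear convergence rate and overcoming the slow convergence caused by overparameterization.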
We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by statistical learning procedures possesses a manifold of slow growth on which the well-known Polyak stepsize reduces the loss geometrically, and design a gradient-based method that converges to minimizers at a locally linear rate. Additionally, we show that our method converges to nearly optimal solutions -- up to a natural misspecification threshold -- for mixtures with arbitrary weights. At a high level, the method alternates between several "short" gradient descent steps that approach the manifold and "long" Polyak steps that contract the distance to minimizers. Our results suggest that slow convergence is not an intrinsic challenge of overparameterization, but can be overcome by exploiting the favorable structure of the loss landscape.
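A one-dimensional toy shows why the Polyak step helps on slowly growing losses. For $f(x) = x^4$ with known optimal value $f^\star = 0$, fixed-step gradient descent slows down as the gradient $4x^3$ vanishes, whereas the Polyak step $x - \frac{f(x) - f^\star}{\|\nabla f(x)\|^2}\nabla f(x)$ reduces to $x - x/4 = 3x/4$, a linear contraction. The alternating loop only mirrors the short/long pattern described above; it is not the paper's algorithm for Gaussian mixtures, and the step size and round counts are arbitrary.

```python
def f(x):
    return x**4


def grad(x):
    return 4 * x**3


def gradient_descent(x, lr, steps):
    """Fixed-step ("short") gradient steps."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def polyak_step(x, f_star=0.0):
    """One ("long") Polyak step; it requires knowing the optimal value f_star."""
    g = grad(x)
    if g == 0:
        return x
    return x - (f(x) - f_star) / (g * g) * g


def alternate(x, rounds, n_short, lr):
    """Alternate a few short gradient steps with one Polyak step."""
    for _ in range(rounds):
        x = gradient_descent(x, lr, n_short)
        x = polyak_step(x)
    return x


gd_only = gradient_descent(1.0, lr=0.05, steps=40)     # 40 gradient evaluations
mixed = alternate(1.0, rounds=10, n_short=3, lr=0.05)  # also 40 gradient evaluations
pure_polyak = 1.0
for _ in range(20):
    pure_polyak = polyak_step(pure_polyak)
print(gd_only, mixed, pure_polyak)
```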
Bayesian inference with shaped deep non-linear MLPs
Boris Hanin, Tianze Jiang
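AI summary: Using the neural covariance SDE, analyzes Bayesian inference in deep non-linear MLPs when the number of training samples, the input dimension, the hidden-layer width, and the number of layers are all large; finds that a first-order criterion in $LP/N$ determines when depth increases the model evidence, and derives that the Bayesian predictive posterior is equivalent to a data-dependent kernel method.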
A central aim of deep learning theory is to characterize how neural networks make predictions in the regime of simultaneously large model and training set size. Since the limits of a diverging number of model parameters and a diverging dataset size do not commute, it is not clear a priori which limits exist. In this work, we shed new light on these questions by studying Bayesian inference in deep non-linear MLPs in the regime where the number of training samples ($P$), the input dimension ($N_0$), the hidden layer width ($N$), and the number of hidden layers ($L$) can all be large. We build on the Neural Covariance SDE (Li et al., 2022) to analyze predictive posteriors in the regime where $LP/N \in \Theta(1)$, with $LP/N$ playing the role of an effective network depth. Our framework covers both smooth and ReLU activation functions and applies at arbitrary temperature. To first order in $LP/N$, we find a simple criterion for which data-generating processes benefit from depth, in the sense that larger $LP/N$ increases the Bayesian model evidence. We also give a novel derivation of a prior result from the physics literature: at least to first order in $LP/N$, the Bayesian predictive posterior is remarkably simple and is equivalent to that of a data-dependent kernel method.
Fine-tuning improves information conveyance in language models
Yuwei Cheng, Weiyi Tian, Haifeng Xu
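AI summary: Proposes Canopy Entropy, a measure that quantifies the effective size of the generation space from a tree-structured perspective, and finds that fine-tuned models strengthen the positive correlation between length and entropy rate even as total entropy decreases, thereby converting uncertainty into semantic diversity more efficiently.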
Fine-tuning is often believed to reduce uncertainty and diversity in large language models, but existing analyses overlook output length, a key confounder, and therefore fail to capture how uncertainty is distributed across an entire generation rollout. To address this, we propose Canopy Entropy ($\mathrm{CE}^\star$), a measure that views language generation from a tree perspective, where the "canopy" represents the space of all possible rollouts, so that $\mathrm{CE}^\star$ naturally quantifies the effective size of the generation space. $\mathrm{CE}^\star$ jointly captures uncertainty in both the output length $N$ and the generated sequence $Y_{1:N}$; indeed, we show that it equals the total Shannon entropy $H(N, Y_{1:N}\mid X)$, where $X$ denotes the prompt. This formulation yields interpretable metrics, including a length-entropy correlation term $ρ(N, r_N)$, where $r_N$ is the entropy rate, which quantifies information conveyance efficiency by indicating whether longer outputs are more or less informative per token. Empirically, across tasks and model families, we find that fine-tuned models consistently exhibit a stronger positive correlation $ρ(N, r_N)$, even when total entropy decreases. Furthermore, after controlling for model family, task, prompt, and output-length effects, we find that fine-tuning nearly triples the correlation strength between entropy rate and semantic diversity, suggesting that aligned models convert token uncertainty into semantic diversity more efficiently. Overall, these results demonstrate that fine-tuning does not simply reduce uncertainty but fundamentally reorganizes it into more informative and semantically meaningful generations. Our code is available at https://github.com/WeiyiTian/canopy-entropy.
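Because a complete rollout determines its own length $N$, the joint entropy $H(N, Y_{1:N} \mid X)$ for a fixed prompt equals the Shannon entropy of the distribution over complete rollouts, and the chain rule splits it as $H(N) + H(Y_{1:N} \mid N)$. The sketch checks this on a hand-made toy distribution whose probabilities are invented for illustration. The per-length rate $r(n) = H(Y_{1:N} \mid N = n)/n$ and its probability-weighted correlation with $n$ are one hypothetical reading of $r_N$ and $ρ(N, r_N)$; the paper's exact definitions may differ.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def canopy_decomposition(rollouts):
    """rollouts: dict mapping a complete token tuple to its probability (fixed prompt)."""
    total = entropy(rollouts.values())                       # H(N, Y_{1:N})
    by_len = defaultdict(list)
    for seq, p in rollouts.items():
        by_len[len(seq)].append(p)
    p_len = {n: sum(ps) for n, ps in by_len.items()}
    h_len = entropy(p_len.values())                          # H(N)
    h_cond = {n: entropy([p / p_len[n] for p in ps]) for n, ps in by_len.items()}
    h_y_given_n = sum(p_len[n] * h_cond[n] for n in p_len)   # H(Y_{1:N} | N)
    rate = {n: h_cond[n] / n for n in p_len}                 # hypothetical per-length rate
    return total, h_len, h_y_given_n, p_len, rate


def weighted_corr(weights, x, y):
    """Pearson correlation of x and y under probability weights."""
    keys = list(weights)
    mx = sum(weights[k] * x[k] for k in keys)
    my = sum(weights[k] * y[k] for k in keys)
    cov = sum(weights[k] * (x[k] - mx) * (y[k] - my) for k in keys)
    vx = sum(weights[k] * (x[k] - mx) ** 2 for k in keys)
    vy = sum(weights[k] * (y[k] - my) ** 2 for k in keys)
    return cov / math.sqrt(vx * vy)


toy = {("a",): 0.2, ("b",): 0.1, ("a", "a"): 0.15, ("a", "b"): 0.15,
       ("b", "a"): 0.1, ("b", "b"): 0.1, ("a", "b", "c"): 0.2}
total, h_len, h_y_given_n, p_len, rate = canopy_decomposition(toy)
rho = weighted_corr(p_len, {n: n for n in p_len}, rate)
print(f"H(N,Y)={total:.4f}  H(N)+H(Y|N)={h_len + h_y_given_n:.4f}  rho={rho:.3f}")
```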
Unlearning in diffusion models: a unified framework based on KL divergence and likelihood constraints
Shervin Khalafi, Alejandro Ribeiro, Dongsheng Ding
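AI summary: Proposes a constrained optimization framework that achieves concept and data unlearning in diffusion models by minimizing deviation from the pretrained model subject to separation constraints from the unlearning distribution, and derives optimal solutions and primal-dual algorithms under KL-divergence and likelihood constraints.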
Unlearning in diffusion models aims to remove undesirable data or concepts while preserving the utility of pretrained models -- two fundamentally conflicting objectives. We propose a principled constrained optimization framework that formulates unlearning as minimizing the deviation from a pretrained model, subject to explicit separation constraints from the unlearning distributions. Specifically, we formulate three constrained optimization problems based on reverse and forward KL divergences, and likelihood constraints. The first two generalize existing approaches for concept and data unlearning, while the third offers a novel and natural formulation for unlearning. Despite the nonconvexity of the KL constraints, we establish strong duality for all three problems, enabling us to explicitly characterize their optimal solutions as unlearning targets and develop primal-dual algorithms for each formulation. Experimental results demonstrate that our KL-constrained approach achieves superior retention-unlearning tradeoffs compared to weight-based baselines for concept and data unlearning, and that our likelihood-based approach matches unlearning effectiveness while better preserving retained concepts compared to baselines.
Bayesian classification with a Probit-link split-and-merge Gaussian process prior for EEG-based brain-computer interfaces
Yunong Wu, Jane E. Huggins, Jian Kang, Tianwen Ma
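AI summary: Proposes a Bayesian generative model with a Probit-link split-and-merge Gaussian process prior for binary classification of EEG responses, performing spatial-temporal feature selection and reducing computational complexity while maintaining prediction accuracy.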
Brain-Computer Interface (BCI) speller systems based on Event-Related Potentials (ERPs) enable users to select characters by detecting brain responses to visual stimuli, recorded through electroencephalogram (EEG). One challenge is to accurately identify target-related responses, such as the P300 component. However, existing methods tend to ignore feature selection, perform feature selection without interpretability, or require large computational effort or data manipulation. To address these limitations, we propose a novel Bayesian generative modeling framework for the binary classification of EEG responses to stimuli. Our approach employs a Probit-link Split-and-merge Gaussian Process (P-SMGP) prior for spatial-temporal feature selection, effectively capturing the distinctions between target and non-target ERP responses. In both simulation studies and real EEG data analysis, our approach reduces computational complexity and provides statistical interpretations of transformed ERP functions while maintaining comparable prediction accuracy. These findings underscore the value of interpretable, stimulus-level modeling for advancing predictive and personalized BCI systems.
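A probit link is convenient in Bayesian models because of the Albert-Chib data augmentation: introduce latent $z_i \sim N(x_i^\top\beta, 1)$ with $y_i = \mathbb{1}\{z_i > 0\}$, and both full conditionals become standard (truncated normal for $z$, Gaussian for $\beta$). The sketch applies this to a plain two-coefficient probit regression with a $N(0, \tau^2 I)$ prior on simulated data. It does not implement the split-and-merge Gaussian process prior or any EEG-specific structure; it only illustrates the probit-link sampling step such models typically rely on.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12


def sample_latent(mean, y, rng):
    """z ~ N(mean, 1) truncated to z > 0 if y == 1, else z <= 0 (inverse-CDF method)."""
    cut = STD_NORMAL.cdf(-mean)                      # P(z <= 0)
    lo, hi = (cut, 1.0) if y == 1 else (0.0, cut)
    u = min(max(lo + (hi - lo) * rng.random(), EPS), 1.0 - EPS)
    return mean + STD_NORMAL.inv_cdf(u)


def gibbs_probit(xs, ys, n_iter, rng, tau2=10.0, burn_in=100):
    """Albert-Chib Gibbs sampler for y ~ Bernoulli(Phi(b0 + b1 * x)), prior N(0, tau2 I)."""
    # V = (X^T X + I / tau2)^{-1} does not change across iterations; invert the 2x2 matrix.
    n, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
    a, b, d = n + 1 / tau2, sx, sxx + 1 / tau2
    det = a * d - b * b
    v00, v01, v11 = d / det, -b / det, a / det
    # Cholesky factor L of V (V = L L^T) for drawing beta.
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)
    beta, draws = [0.0, 0.0], []
    for it in range(n_iter):
        z = [sample_latent(beta[0] + beta[1] * x, y, rng) for x, y in zip(xs, ys)]
        # beta | z ~ N(V X^T z, V)
        r0, r1 = sum(z), sum(zi * x for zi, x in zip(z, xs))
        m0, m1 = v00 * r0 + v01 * r1, v01 * r0 + v11 * r1
        e0, e1 = rng.gauss(0, 1), rng.gauss(0, 1)
        beta = [m0 + l00 * e0, m1 + l10 * e0 + l11 * e1]
        if it >= burn_in:
            draws.append(beta)
    return draws


rng = random.Random(3)
true_beta = (-0.3, 1.2)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [1 if true_beta[0] + true_beta[1] * x + rng.gauss(0, 1) > 0 else 0 for x in xs]
draws = gibbs_probit(xs, ys, n_iter=600, rng=rng)
post_mean = [sum(dr[j] for dr in draws) / len(draws) for j in range(2)]
print(f"posterior mean: intercept={post_mean[0]:.2f}, slope={post_mean[1]:.2f}")
```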
Is the last layer enough for uncertainty quantification?
Joseph Wilson, Chris van der Heide, Liam Hodgkinson, Fred Roosta
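AI summary: Through theoretical analysis and empirical evaluation, compares full-network and last-layer linearization for epistemic uncertainty quantification in deep neural networks, finding that the last-layer approximation retains comparable UQ performance while substantially improving computational efficiency.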
Epistemic uncertainty quantification (UQ) for deep neural networks (DNNs) is a requirement for safe adoption of AI in mission-critical settings. Several leading methods for UQ linearize DNNs to form Bayesian Generalized Linear Models (GLMs), where epistemic uncertainty is modeled via the predictive posterior distribution. Linearizing around the parameters of the final connected layer of a DNN is a commonly used approximation for reducing the computational burden of such GLMs, though it is often believed to come at the cost of degraded performance. In this work, we compare GLMs arising from full-network and last-layer linearization using both theoretical and empirical approaches. We first employ tools from random matrix theory to conduct a theoretical comparison; this analysis reveals no meaningful improvement in the UQ capabilities of full linearization. Coupled with a large-scale empirical evaluation across a range of modern machine learning tasks, we arrive at the following conclusion: a last-layer approximation yields comparable UQ performance while offering substantially improved computational efficiency.
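With a Gaussian likelihood, linearizing only in the last-layer weights leaves a model that is exactly linear in those weights, so the resulting GLM is Bayesian linear regression on the penultimate-layer features $\phi(x)$. The sketch assumes a hypothetical frozen two-unit tanh feature map, a $N(0, \tau^2 I)$ prior, and known noise variance, and computes the epistemic predictive variance $\phi(x)^\top \Sigma \phi(x)$. In this conjugate setting the posterior covariance $\Sigma$ does not depend on the targets, so none are needed.

```python
import math


def features(x):
    """Hypothetical frozen penultimate layer: two tanh units with fixed weights."""
    return (math.tanh(x), math.tanh(2.0 * x - 1.0))


def posterior_cov(xs, noise_var, prior_var):
    """Posterior covariance (Phi^T Phi / noise_var + I / prior_var)^{-1} of the
    2-d last-layer weight vector under a Gaussian likelihood."""
    a = b = d = 0.0
    for x in xs:
        p, q = features(x)
        a, b, d = a + p * p, b + p * q, d + q * q
    a, b, d = a / noise_var + 1.0 / prior_var, b / noise_var, d / noise_var + 1.0 / prior_var
    det = a * d - b * b
    return ((d / det, -b / det), (-b / det, a / det))


def epistemic_var(x, cov):
    """phi(x)^T Sigma phi(x): posterior variance of the predicted mean at x."""
    p, q = features(x)
    return p * p * cov[0][0] + 2 * p * q * cov[0][1] + q * q * cov[1][1]


train_x = [-0.5 + 0.05 * i for i in range(21)]
cov = posterior_cov(train_x, noise_var=0.01, prior_var=1.0)
print(f"inside data: {epistemic_var(0.0, cov):.2e}, far away: {epistemic_var(3.0, cov):.2e}")
```

The total predictive variance adds the noise variance. The last-layer route only needs an inverse whose size is the feature width (here 2x2), whereas full-network linearization involves every parameter, which is where the computational savings come from.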
Forecasting the Kalimati Vegetable Price Index: a momentum-corrected online stacking ensemble approach
Sahaj Raj Malla
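AI summary: Addressing the high volatility of agricultural prices in emerging economies, proposes a momentum-corrected online stacking ensemble built on an inverse-volatility weighted composite index and 64 causal features, achieving RMSE = 1.771, MAPE = 0.68%, and R² = 0.845 at the 90-day horizon.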
Forecasting agricultural commodity prices in emerging economies is difficult due to high volatility, frequent supply disruptions, and strong cultural influences on demand. This study introduces the Kalimati Vegetable Price Index (KVPI), a new inverse-volatility weighted composite index that aggregates 135 daily wholesale commodities from Kathmandu over ten years (2013-2023). By creating a stable macro-level signal, the KVPI reduces the noise inherent in modelling individual crops. A rich set of 64 causally valid features was developed, including festival lead-lag effects, rolling statistics, and calendar variables. Fourteen forecasting models spanning statistical, tree-based, deep learning, hybrid, and transformer architectures were rigorously evaluated across short (7-day), medium (14- and 30-day), and long-term (90-day) horizons. Tree-based ensembles proved notably robust, while classical statistical models and complex transformers struggled with the noisy dataset. The proposed Momentum-Corrected Online Stacking Ensemble achieved the strongest performance, yielding a Root Mean Square Error (RMSE) of 1.771, an exceptionally low Mean Absolute Percentage Error (MAPE) of 0.68%, and explaining 84.5% of the variance (R-squared = 0.845) at the 90-day horizon. This open-source pipeline provides policymakers and supply chain actors in Nepal and similar markets with a practical, reliable tool for anticipating price movements and strengthening food security.
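The abstract does not specify the volatility estimator, the estimation window, or how prices are normalized, so the sketch assumes weights proportional to the inverse sample standard deviation of daily log returns over the whole sample, with each price series normalized to its first day. The prices are made up. The RMSE, MAPE, and R² metrics quoted above are included for reference.

```python
import math


def log_returns(prices):
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def sample_std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def inverse_volatility_index(series, base=100.0):
    """series maps commodity -> price list (same dates). Weights are proportional
    to 1 / std of daily log returns (an assumption, full-sample for brevity)."""
    inv_vol = {name: 1.0 / sample_std(log_returns(p)) for name, p in series.items()}
    total = sum(inv_vol.values())
    weights = {name: v / total for name, v in inv_vol.items()}
    n_days = len(next(iter(series.values())))
    index = [base * sum(w * series[name][t] / series[name][0] for name, w in weights.items())
             for t in range(n_days)]
    return weights, index


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mape(y, yhat):
    return 100.0 * sum(abs(a - b) / abs(a) for a, b in zip(y, yhat)) / len(y)


def r_squared(y, yhat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


prices = {  # made-up toy prices, not Kalimati data
    "tomato": [40.0, 44.0, 38.0, 46.0, 41.0],
    "potato": [30.0, 30.5, 30.2, 30.8, 30.4],
}
weights, kvpi_like = inverse_volatility_index(prices)
print(weights, [round(v, 2) for v in kvpi_like])
```

For a causally valid pipeline, the volatilities would have to be computed on trailing windows so that the index at date $t$ uses no later prices.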
Moment-based inference for latent Dirichlet covariate regression
Ziyu Jiang
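AI summary: To address the inferential difficulties of using topic models for dimension reduction before regression, proposes a method based on corrected spectral moments that identifies the regression coefficient $β$ directly without estimating document-level topic shares, and estimates the unknown total concentration $α_0$ through a commutativity condition, enabling valid inference.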
Topic models are often used as dimension-reduction tools before regression, with estimated document-level topic shares treated as observed covariates. This plug-in workflow creates two inferential difficulties: valid inference requires a regular first-stage-to-second-stage expansion that propagates topic-estimation uncertainty, and, at fixed document length, a document's topic mixture cannot be consistently recovered from its own words even when the population topic matrix is known. Corrected spectral moment methods for latent Dirichlet allocation (LDA) offer a starting point: when the total Dirichlet concentration is known, low-order word moments can be corrected to yield operators diagonal in the latent topic basis. We extend this to downstream regression. Under a finite LDA model with response residuals orthogonal to the low-order token moments used for identification, response-weighted word moments admit the same correction, and the resulting supervised operator identifies the regression coefficient $β$ directly, without estimating document-level topic shares. The main obstacle is that the correction depends on the unknown total concentration $α_0$. We show that, for $k\ge3$ topics and under a generic finite-probe condition, $α_0$ is identified by commutativity: at the true value a family of corrected word-moment operators commute, whereas away from it they generically do not. This yields a feasible estimator and lets uncertainty in $\hatα_0$ propagate into inference for $β$. The estimator is asymptotically linear as the number of documents grows with fixed document length, with sandwich standard errors from document-level moment contributions. Simulations show near-nominal coverage where plug-in topic-share regressions can undercover, and an application to top economics journals illustrates contrast inference for latent topic effects.
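The identification-by-commutativity idea can be illustrated with a deliberately simple matrix family. Below, $A(\alpha)$ and $B(\alpha)$ are hypothetical 2x2 operators that share an eigenbasis (and therefore commute) only at a "true" value $\alpha^\star = 1.2$, mimicking the property stated above; they are not the corrected LDA word-moment operators. Minimizing the Frobenius norm of the commutator $[A, B] = AB - BA$ over a grid recovers $\alpha^\star$; the grid search stands in for whatever estimating equation the paper actually uses.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def commutator_norm(A, B):
    """Frobenius norm of AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return math.sqrt(sum((AB[i][j] - BA[i][j]) ** 2 for i in range(2) for j in range(2)))


ALPHA_TRUE = 1.2
M1, M2 = [[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, -1.0]]  # commuting (both diagonal)
C1, C2 = [[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]   # perturbation directions


def operators(alpha):
    """Hypothetical operator family that commutes exactly only at ALPHA_TRUE."""
    d = alpha - ALPHA_TRUE
    A = [[M1[i][j] + d * C1[i][j] for j in range(2)] for i in range(2)]
    B = [[M2[i][j] + d * C2[i][j] for j in range(2)] for i in range(2)]
    return A, B


grid = [0.5 + 0.05 * i for i in range(31)]
alpha_hat = min(grid, key=lambda a: commutator_norm(*operators(a)))
print(f"alpha_hat = {alpha_hat:.2f}")
```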
SAGE: a novel gating mechanism for efficient memory evolution in agentic large language models
Sijia Wang, Dhanajit Brahma, Ricardo Henao
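AI summary: Proposes SAGE, a gating mechanism that frames memory-write control as a novelty-detection problem using von Mises-Fisher density estimation and an adaptive threshold, achieving the best token-F1 on LoCoMo at lower cost.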
Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control. We frame memory evolution as a novelty-detection problem and propose SAGE, a Spherical Adaptive Gate for memory Evolution that scores candidate facts with a von Mises-Fisher-based density estimator over memory embeddings and routes them with an adaptive threshold that tracks memory-store geometry. SAGE resolves clearly novel facts as ADD, clearly redundant facts as NOOP, and sends only uncertain cases to an LLM merge step, reducing expensive write-time reasoning. On LoCoMo, SAGE achieves the best average token-F1 against Mem0 on all seven open-weight backbone comparisons, while on GPT-4o-mini it reduces add-phase API cost by 3.4$\times$ and add-phase latency by 2.5$\times$ with only a small average judge-score gap. As a drop-in binary gate for A-Mem, SAGE skips roughly 16-18% of LLM calls across five models with minimal quality change on open-weight backbones. These results suggest that novelty-aware write control is a practical lever for improving both memory quality and system efficiency in long-term agentic memory.
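A minimal three-way novelty gate in the spirit of this description can be written with an unnormalized von Mises-Fisher kernel density over unit-norm memory embeddings. The concentration $\kappa = 10$, the 3-d toy embeddings, and the threshold rule (minimum and maximum leave-one-out density of the current store) are hypothetical choices for illustration, not SAGE's estimator or its adaptive threshold.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def vmf_score(x, memories, kappa):
    """Mean vMF kernel exp(kappa * (mu . x - 1)) over stored unit vectors.
    The vMF normalizing constant is common to all terms and is dropped."""
    return sum(math.exp(kappa * (dot(m, x) - 1.0)) for m in memories) / len(memories)


def adaptive_thresholds(memories, kappa):
    """Hypothetical rule: thresholds are the min / max leave-one-out score in the store."""
    loo = [vmf_score(m, memories[:i] + memories[i + 1:], kappa) for i, m in enumerate(memories)]
    return min(loo), max(loo)


def route(fact_embedding, memories, kappa=10.0):
    x = normalize(fact_embedding)
    low, high = adaptive_thresholds(memories, kappa)
    s = vmf_score(x, memories, kappa)
    if s > high:
        return "NOOP"   # clearly redundant
    if s < low:
        return "ADD"    # clearly novel
    return "MERGE"      # uncertain: defer to the LLM merge step


store = [normalize(v) for v in ([1, 0, 0], [0, 1, 0], [1, 0.2, 0])]
print(route([1, 0, 0], store), route([0, 0, 1], store), route([1, 1, 0], store))
# prints: NOOP ADD MERGE
```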
Fast approximate Bayesian multidimensional scaling with consistency guarantees
Ami Sheth, Aaron Smith, Andrew J. Holbrook
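AI summary: Proposes the Barnes-Hut BMDS algorithm, which uses a tree-based likelihood approximation and Gibbs sampling to reduce computational complexity from $O(n^2)$ to $O(n \log n)$, and proves consistency of its stationary distribution.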
Bayesian multidimensional scaling (BMDS) embeds $n$ objects in a low-dimensional space to approximately preserve an observed dissimilarity matrix. Compared to classical MDS, BMDS is more robust to model misspecification and supports posterior uncertainty quantification and joint estimation within hierarchical models. However, standard BMDS inference is computationally prohibitive, requiring $O(n^2)$ operations per MCMC iteration to evaluate the likelihood. We propose Barnes--Hut BMDS (BH-BMDS), which uses a tree-based approximation to the likelihood and a Gibbs sampler that leverages this structure, remaining compatible with hierarchical extensions. BH-BMDS reduces computational complexity to $O(n \log n)$ while preserving the geometric fidelity of the embedding. We further establish consistency for the stationary measure of BH-BMDS, proving that it concentrates around the true latent configuration even as the total error of the surrogate likelihood diverges. Notably, this consistency holds in the infinite-dimensional limit. We evaluate the approximation on datasets with diverse structure, including air traffic networks, arXiv abstracts, MNIST images and neural activity recordings from mouse models of tau pathology. Across all settings, BH-BMDS closely matches BMDS while achieving substantial computational gains, with approximately 10-fold speedups at $n=1{,}000$ and 70-fold speedups at $n=10{,}000$. These gains increase with $n$, demonstrating strong empirical scalability.
Spectral subsampling MCMC for Lévy-driven continuous-time ARMA models with expensive likelihood contributions
Thomas Goodwin, Matias Quiroz, Robert Kohn, Feng Li
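AI summary: For spectral inference of continuous-time processes, where the Whittle likelihood is expensive because aliasing requires summing shifted frequency components, proposes spectral subsampling MCMC, which estimates the log-likelihood through data subsampling and efficient control variates to reduce computational cost.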
Subsampling-based Markov chain Monte Carlo (MCMC) algorithms aim to accelerate Bayesian inference by evaluating the likelihood using only a subset of the data at each iteration. However, in many standard tall-data applications, individual likelihood contributions are inexpensive to evaluate and the resulting reductions in actual computing time are often substantially smaller than the nominal reduction in data size due to computational overhead. We study a different computational regime arising in frequency-domain inference for continuous-time processes observed at equally spaced discrete time points. This gives rise to aliasing, whereby each contribution to the Whittle likelihood requires summation over shifted frequency components, unlike standard discrete-time spectral settings where spectral evaluations do not require such summation. We demonstrate that this structure makes spectral subsampling MCMC, a subsampling-based MCMC approach that estimates the log-likelihood using data subsampling and efficient control variates, particularly effective for reducing computational cost. We illustrate the approach for Bayesian frequency-domain inference in discretely observed continuous-time autoregressive moving average models driven by finite second-moment Lévy processes.
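The aliasing sum is easy to see for the simplest CARMA model, a CAR(1) (Ornstein-Uhlenbeck) process $dX = -aX\,dt + \sigma\,dW$ with continuous-time spectral density $f_c(\lambda) = \sigma^2 / (2\pi(a^2 + \lambda^2))$. Sampling at spacing $\Delta$ folds the spectrum onto $[-\pi, \pi]$, giving $f_\Delta(\omega) = \Delta^{-1}\sum_k f_c((\omega + 2\pi k)/\Delta)$, so each Whittle term needs a (truncated) sum over $k$. For CAR(1) the sampled sequence is AR(1) with $\phi = e^{-a\Delta}$, which gives a closed form to check the sum against. The sketch uses a Gaussian driver rather than a general Lévy process, a naive $O(n^2)$ periodogram, and plain uniform subsampling of frequencies without the control variates the method relies on; all parameter values are illustrative.

```python
import cmath
import math
import random


def car1_density(lam, a, sigma):
    """Spectral density of the OU / CAR(1) process dX = -a X dt + sigma dW."""
    return sigma**2 / (2 * math.pi * (a * a + lam * lam))


def aliased_density(omega, a, sigma, delta, K):
    """Density of X(n * delta) on [-pi, pi]: (1 / delta) * sum_k f_c((omega + 2 pi k) / delta),
    truncated to |k| <= K. Each frequency costs 2K + 1 evaluations."""
    return sum(car1_density((omega + 2 * math.pi * k) / delta, a, sigma)
               for k in range(-K, K + 1)) / delta


def ar1_density(omega, a, sigma, delta):
    """Closed form for this special case: sampled OU is AR(1) with phi = exp(-a delta)."""
    phi = math.exp(-a * delta)
    gamma0 = sigma**2 / (2 * a)
    return gamma0 * (1 - phi**2) / (2 * math.pi * (1 - 2 * phi * math.cos(omega) + phi**2))


def periodogram(x):
    n = len(x)
    freqs = [2 * math.pi * j / n for j in range(1, (n - 1) // 2 + 1)]
    return freqs, [abs(sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(x))) ** 2
                   / (2 * math.pi * n) for w in freqs]


def whittle_loglik(freqs, pgram, a, sigma, delta, K, idx=None):
    """Whittle log-likelihood; with idx, a simple subsampled estimate scaled by n / m."""
    idx = range(len(freqs)) if idx is None else idx
    terms = []
    for j in idx:
        f = aliased_density(freqs[j], a, sigma, delta, K)
        terms.append(-(math.log(f) + pgram[j] / f))
    return sum(terms) * len(freqs) / len(terms)


rng = random.Random(5)
a, sigma, delta, n = 1.0, 1.0, 1.0, 200
phi, gamma0 = math.exp(-a * delta), sigma**2 / (2 * a)
x = [rng.gauss(0, math.sqrt(gamma0))]          # start in the stationary distribution
for _ in range(n - 1):
    x.append(phi * x[-1] + rng.gauss(0, math.sqrt(gamma0 * (1 - phi**2))))
freqs, pgram = periodogram(x)
full = whittle_loglik(freqs, pgram, a, sigma, delta, K=200)
sub = whittle_loglik(freqs, pgram, a, sigma, delta, K=200, idx=rng.sample(range(len(freqs)), 20))
print(f"full Whittle: {full:.2f}, subsampled estimate: {sub:.2f}")
```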
Consistent Bayesian local spatial feature selection with application to spatial multimodal omics
Kun Huang, Xiyu Peng, Huiyan Sang, Ligang Lu
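AI summary: For high-dimensional regression in spatial multimodal omics, proposes a Bayesian local spatial feature selection framework that combines a random domain partition prior with local feature selection priors, establishes consistency of both the domain partition and feature selection, and develops an efficient reversible jump MCMC algorithm.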
Motivated by a high-dimensional regression problem in spatial multimodal omics (SMO), we propose a Bayesian framework for local spatial feature selection, where a random domain partition prior is introduced to divide the spatial domain into several contiguous clusters with flexible shapes and an unknown number of clusters, conditional on which a local feature selection prior is imposed within each cluster. The notion of "feature" is general and may include both covariates and functional bases, allowing the framework to perform both local variable selection and local basis selection, the latter being essential for adaptively approximating spatially varying functions with localized characteristics. We derive coupled hyperparameter conditions linking domain partition and local feature selection priors, under which the consistency theory and posterior contraction rates of both the domain partition and feature selection are established. We develop an efficient informed reversible jump Markov chain Monte Carlo algorithm to address the computational challenges encountered in joint posterior sampling of domain partitions and selected features. Simulation studies demonstrate the effectiveness of the proposed model and algorithm, highlighting its advantages over existing methods. The application of our model to an SMO dataset reveals biologically meaningful spatial patterns within breast cancer tissue.
Free energy universality in tensor estimation via generic chaining
Wenxuan Zou, Galen Reeves
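AI summary: Using generic chaining, establishes conditions under which the free energy of inference problems with tensor-structured data can be approximated by that of a Gaussian comparison model, and applies this to binary hypergraph models to prove free energy universality under the minimal assumption of diverging average degree.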
We study high-dimensional inference problems with tensor-structured data and establish conditions under which their free energy can be approximated by that of a Gaussian comparison model. Our framework applies to models with independent observations and mismatch between the data-generating distribution and the statistical model. The results extend prior work beyond matrix settings and accommodate scaling regimes where the model parameters depend on the dimension. A key technical contribution is the use of generic chaining to control remainder terms arising from likelihood expansions over tensor-structured parameter spaces. As an application, we establish free energy universality for binary hypergraph models under the minimal assumption of diverging average degree, showing that their asymptotic behavior coincides with that of a Gaussian tensor model, even under model mismatch.
Active time-point selection for learning measure-valued trajectories
Nicolas Huynh, Mihaela van der Schaar
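AI summary: For settings with costly, destructive data acquisition, proposes an active learning framework based on linearized optimal transport that models probability paths with Gaussian processes and iteratively selects measurement times to minimize uncertainty.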
Inferring continuous probability paths from sparse snapshots is a fundamental challenge in domains like single-cell biology, where high-fidelity data acquisition is often destructive and constrained by prohibitive sequencing costs. This motivates active learning strategies that select optimal measurement times. However, designing active learning policies for this setting remains an open problem: the target objects reside in the infinite-dimensional Wasserstein space, where standard Euclidean metrics are ill-defined, and current interpolation methods lack epistemic uncertainty quantification. We introduce a framework that extends active experimentation to the space of measures. By leveraging Linearized Optimal Transport (LOT), we map distributional snapshots into a tangent space amenable to Gaussian Process modeling, allowing us to construct a tractable probabilistic surrogate for the underlying probability path. This yields an acquisition policy that iteratively selects measurement times to minimize uncertainty. Empirical results demonstrate that our strategy outperforms uncertainty-agnostic baselines on both synthetic and real-world datasets.
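In one dimension LOT has a closed form: with a $\mathrm{Uniform}(0, 1)$ reference, the optimal transport map to a distribution is its quantile function, so each snapshot can be embedded as a vector of quantiles, and linear interpolation between such vectors traces the Wasserstein geodesic. The sketch assumes this 1-D setting, fits independent Gaussian processes with a shared RBF kernel (length scale 0.3, chosen arbitrarily) to the quantile coordinates, and uses maximum posterior variance as the acquisition rule; the paper's kernel, embedding, and acquisition function may differ.

```python
import math
import random


def quantile_embedding(samples, levels):
    """1-D LOT embedding: the empirical quantile function at fixed levels."""
    xs = sorted(samples)
    n = len(xs)
    return [xs[min(n - 1, int(q * n))] for q in levels]


def rbf(s, t, ell):
    return math.exp(-0.5 * ((s - t) / ell) ** 2)


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def gp_posterior(times, embeddings, t, ell, jitter=1e-6):
    """Independent GPs (shared RBF kernel, unit prior variance) on each embedding
    coordinate, centered at the mean observed embedding."""
    K = [[rbf(s, u, ell) + (jitter if i == j else 0.0) for j, u in enumerate(times)]
         for i, s in enumerate(times)]
    k = [rbf(s, t, ell) for s in times]
    w = solve(K, k)                                   # K^{-1} k
    var = 1.0 - sum(wi * ki for wi, ki in zip(w, k))  # identical for every coordinate
    dim = len(embeddings[0])
    center = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    mean = [center[d] + sum(wi * (e[d] - center[d]) for wi, e in zip(w, embeddings))
            for d in range(dim)]
    return mean, var


def next_measurement_time(times, embeddings, candidates, ell):
    """Acquisition: the candidate time with the largest posterior variance."""
    return max(candidates, key=lambda t: gp_posterior(times, embeddings, t, ell)[1])


rng = random.Random(0)
levels = [0.1, 0.25, 0.5, 0.75, 0.9]
times = [0.0, 1.0]
snapshots = [[rng.gauss(2.0 * t, 1.0) for _ in range(500)] for t in times]
embeddings = [quantile_embedding(s, levels) for s in snapshots]
t_next = next_measurement_time(times, embeddings, [0.0, 0.25, 0.5, 0.75, 1.0], ell=0.3)
print(t_next)
```

Because the GP posterior variance does not depend on the observed values, this acquisition simply spreads measurements out in time; targeting periods of fast change would need a richer kernel or a value-dependent criterion. The GP mean is also not guaranteed to be a non-decreasing quantile vector, so sorting the predicted quantiles is a simple repair.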
A Bayesian framework for uncertainty-aware estimation of main pulmonary artery velocity profiles from phase-contrast MRI
Amirreza Kachabi, Naomi C. Chesler
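AI summary: Proposes a Bayesian framework that combines two-dimensional phase-contrast MRI with mechanistic velocity-profile formulations to generate uncertainty-aware, subject-specific pulmonary artery velocity representations.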
Computational cardiovascular flow models are highly sensitive to prescribed inlet velocity profiles. While imaging-derived velocity fields provide physiologically realistic information, they can add preprocessing complexity, imaging noise, and computational burden. Simplified analytical formulations are computationally efficient but may not fully capture subject-specific flow characteristics. In this study, we present an uncertainty-aware framework that combines two-dimensional phase-contrast magnetic resonance imaging (2D PC-MRI) with mechanistic velocity-profile formulations to generate subject-specific pulmonary artery velocity representations. Imaging-derived radial velocity distributions were constructed from main pulmonary artery (MPA) PC-MRI data in canine and swine subjects using elliptical radial binning and normalization. Power-law and Womersley velocity-profile formulations were fitted within a Bayesian inference framework while accounting for uncertainty associated with imaging measurements and model representation. The two formulations were compared using regional and global weighted root mean square error (wRMSE) metrics. Both models demonstrated close agreement with the imaging-derived velocity profiles across subjects. Although the Womersley formulation provided greater flexibility near the vessel wall, it did not result in statistically significant improvements in fitting performance compared with the simpler power-law model. The proposed framework provides low-dimensional, physiologically interpretable, and uncertainty-aware velocity-profile representations that may serve as computationally efficient alternatives for subject-specific cardiovascular flow modeling.
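The sketch below shows what a power-law inlet profile looks like. It assumes the common form $u(r) = u_{max}(1 - (r/R)^n)$ on a circular cross-section of radius $R$. The abstract does not give the paper's exact parameterization, and the paper works with an elliptical cross-section binned by normalized radius.

Under this assumed form, the area-weighted mean velocity is $u_{max}\, n/(n+2)$. Setting $n = 2$ recovers the Poiseuille ratio of one half. The midpoint-rule check below reproduces this relation. Function names are hypothetical, and the Womersley profile is not covered.

```python
import math

def power_law_profile(r, radius, u_max, n):
    """Assumed power-law profile u(r) = u_max * (1 - (r / R) ** n), for 0 <= r <= R."""
    return u_max * (1.0 - (r / radius) ** n)

def area_mean_velocity(radius, u_max, n, steps=2000):
    """Area-weighted mean over a circular cross-section (midpoint rule in r)."""
    dr = radius / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        # Each annulus of width dr contributes u(r) * 2 * pi * r * dr.
        total += power_law_profile(r, radius, u_max, n) * 2.0 * math.pi * r * dr
    return total / (math.pi * radius ** 2)

def closed_form_mean(u_max, n):
    """Mean velocity implied by the assumed profile: u_max * n / (n + 2)."""
    return u_max * n / (n + 2)

for n in (2, 4, 9):
    print(n, round(area_mean_velocity(1.0, 1.0, n), 6), closed_form_mean(1.0, n))
```

Larger exponents flatten the profile, pushing the mean toward $u_{max}$. This is one simple knob for matching a measured flow rate.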
Learning Rewards from Best-of-$N$ Preference Data: Targets, Trade-offs, and Design Principles
Rattana Pukdee, Maria-Florina Balcan, Pradeep Ravikumar
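AI summary: Analyzes the targets of Bradley-Terry reward learning from pairwise preference data constructed by Best-of-$N$ sampling, shows how $N$ and the base distribution affect reward estimation, and proposes design principles based on the trade-off between sample efficiency and connectivity.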
Best-of-$N$ sampling is widely used to construct pairwise preference data: $N$ candidates are drawn from a base distribution, and the best is paired with a rejected response. Despite its widespread use, what Bradley-Terry (BT) reward learning extracts from such data, and how to choose $N$ and the base distribution, remain unclear. We specialize a recent analysis of preference data via its induced conditional distribution to Best-of-$N$. For independent-reference variants, we derive closed-form reward targets as explicit functions of $N$ and the base distribution, and show that they preserve the latent reward ranking. For the practical Best-vs-Random and Best-vs-Worst variants, chosen and rejected responses are coupled through the same candidate set, so exact BT representability generally fails; nevertheless, bounded-class minimizers approach the reference targets as $N$ grows. Although margin and connectivity are known to govern sample efficiency in pairwise preference learning, Best-of-$N$ couples them through $N$ in opposing directions: larger $N$ widens pairwise margins but reduces connectivity. This trade-off yields two design principles: use larger $N$ when preference labels are the bottleneck, smaller $N$ when generation is the bottleneck; and shape the base distribution to place mass between the responses whose comparison matters most at test time. Experiments on synthetic and real preference data support the predicted dependence on sample size and base-distribution shape.
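The small simulation below illustrates the margin side of the trade-off. It assumes independent standard normal latent rewards for the $N$ candidates, which is our assumption rather than the paper's setup.

It builds Best-vs-Random and Best-vs-Worst pairs and reports the average latent-reward gap between the chosen and rejected responses. It does not model the opposing loss of connectivity. Function names are hypothetical.

```python
import random

def best_vs_random(rewards, rng):
    """Chosen = highest-reward candidate; rejected = a uniformly random other candidate."""
    best = max(range(len(rewards)), key=rewards.__getitem__)
    rejected = rng.choice([i for i in range(len(rewards)) if i != best])
    return best, rejected

def best_vs_worst(rewards, rng):
    """Chosen = highest-reward candidate; rejected = lowest-reward candidate."""
    best = max(range(len(rewards)), key=rewards.__getitem__)
    worst = min(range(len(rewards)), key=rewards.__getitem__)
    return best, worst

def mean_margin(n, pair_fn, trials=2000, seed=0):
    """Average latent-reward gap between chosen and rejected responses,
    assuming N(0, 1) latent rewards for the n >= 2 candidates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        rewards = [rng.gauss(0.0, 1.0) for _ in range(n)]
        chosen, rejected = pair_fn(rewards, rng)
        total += rewards[chosen] - rewards[rejected]
    return total / trials

for n in (2, 4, 8, 16):
    print(n, round(mean_margin(n, best_vs_random), 3), round(mean_margin(n, best_vs_worst), 3))
```

Both margins grow with $N$, and Best-vs-Worst grows faster. This is the "wider margin" half of the trade-off described above.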
Rectified Linear Unit Regression
Tatsushi Oka
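AI summary: Proposes rectified linear unit (ReLU) regression, which projects the ReLU-transformed outcome onto covariates to directly estimate integrated functionals of the conditional outcome distribution, and establishes its asymptotic distribution and inference procedures, broadening the set of distributional parameters available to empirical research.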
This paper develops a regression framework for the direct estimation of integrated functionals of conditional outcome distributions. The proposed method, termed rectified linear unit (ReLU) regression, projects the ReLU-transformed outcome onto covariates and admits a closed-form estimator. Its population regression function coincides with the integrated conditional distribution function of the outcome, and its convex conjugate, obtained via the Legendre-Fenchel transformation, recovers the integrated conditional quantile function. Both the regression and its conjugate require only mild distributional assumptions and accommodate non-continuous outcomes. We establish the uniform asymptotic distribution of the estimator and develop inference for the conjugate functional via the delta method for Hadamard directionally differentiable maps. Building on these results, we establish identification and inference for average quantile treatment effects over arbitrary subintervals of probability levels. This broadens the set of distributional parameters available to empirical work.
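The check below illustrates the identity behind the method. It assumes the transform is $(c - Y)_+$ for a threshold $c$, which is our reading of "integrated conditional distribution function."

For any sample, the mean of $(c - y_i)_+$ equals the integral of the empirical CDF over $(-\infty, c]$. Regressing the transformed outcome on covariates by least squares then gives a closed-form estimate. A one-covariate OLS is shown. The conjugate (integrated quantile) step and the inference theory are not reproduced, and the names are hypothetical.

```python
def relu(x):
    return max(x, 0.0)

def relu_transform_mean(ys, c):
    """Sample mean of the ReLU-transformed outcome (c - y)_+ (assumed transform)."""
    return sum(relu(c - y) for y in ys) / len(ys)

def integrated_ecdf(ys, c):
    """Integral of the empirical CDF over (-inf, c], summed piece by piece."""
    pts = sorted(y for y in ys if y < c) + [c]
    n = len(ys)
    total = 0.0
    for k in range(len(pts) - 1):
        # On [pts[k], pts[k+1]) the empirical CDF equals (k + 1) / n
        # (the interval is empty when there are ties).
        total += (k + 1) / n * (pts[k + 1] - pts[k])
    return total

def ols_1d(xs, zs):
    """Closed-form least squares of z on (1, x): returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    mz = sum(zs) / len(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxz / sxx
    return mz - slope * mx, slope

# ReLU regression at threshold c: regress (c - y)_+ on one covariate.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.2, 0.9, 1.7, 3.1]
c = 1.5
print(ols_1d(xs, [relu(c - y) for y in ys]))
```
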
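Orthogonalized Kernel Regression for Spatial and Spatio-Temporal Residual Risk: An Application to School Shootings in the Contiguous United States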
Tilman M. Davies, Michael R. Desjardins, Alexander Hohl, Guangzhen Wu
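AI summary: Proposes an orthogonalized kernel regression framework that embeds a residual risk surface in a semiparametric binary model to separate background heterogeneity from excess risk. Applied to 959 school shootings in the contiguous United States from 2000 to 2024, it finds higher risk for larger schools and for middle and high schools, and residual risk that remains concentrated in a central-eastern corridor of the United States after covariate adjustment.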
Distinguishing background heterogeneity from excess risk is a central challenge in case-control event data when both covariates and residual spatial or spatio-temporal structure matter. We develop a covariate-adjusted kernel regression framework that embeds an orthogonalized residual risk surface within a semiparametric binary model, and extend the approach from purely spatial to explicit spatio-temporal analysis. We apply the method to 959 gun violence incidents at public schools in the contiguous United States from 2000 to 2024, using incidents from the K-12 School Shooting Database linked to official school records for the corresponding year. The fitted models identify stable school-level associations, including markedly higher risk for larger schools and for middle and high schools, while also revealing substantial residual structure beyond the background distribution of schools. After adjustment for covariates, excess risk is found to remain concentrated in a persistent central-eastern corridor of the United States, with the strongest evidence appearing in recent years. More broadly, the analysis shows how residual risk surfaces can sharpen inference by separating background heterogeneity from anomalous structure in case-control event processes evolving over space and time.
Estimating Dynamic Co-Expression Networks with Multivariate Mixed-Effects Models
Samuel Ozminkowski, Lifang Hou, David R Jacobs, Hongmei Jiang
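AI summary: Proposes estimating dynamic co-expression networks with mixed-effects models and penalized algorithms, which outperform existing methods in mean squared error and mean absolute error, and applies the approach to the CARDIA study to analyze how protein co-expression networks evolve over time.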
High-throughput sequencing technologies have enabled the collection of large-scale longitudinal -omics data, providing new opportunities for studying co-expression networks among molecular nodes such as genes and proteins. However, the high dimensionality and temporal dependence inherent in such data require specialized statistical methods. We propose a novel approach to infer dynamic co-expression networks among features over time (DCENt), in which each node (feature) is modeled with a mixed-effects model and dependencies among nodes are captured through correlated random effects. We develop two penalized algorithms that build on state-of-the-art thresholding covariance estimators to estimate the random-effects covariance structure. Simulation studies show improved performance over existing approaches in terms of both mean squared error and mean absolute error. We further apply the methods to data from the CARDIA study to investigate how protein co-expression networks evolve over time and how protein trajectory patterns are associated with one another.
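For readers unfamiliar with thresholding covariance estimators, the sketch below shows the simplest member of the family: hard thresholding. It keeps the diagonal and zeroes every off-diagonal entry whose magnitude falls below a threshold $\tau$. The nonzero off-diagonal pattern can then be read as network edges.

This is not the paper's method. Its two penalized algorithms estimate a random-effects covariance inside mixed-effects models. Here the input is a plain sample covariance, and $\tau$ and the function names are hypothetical. Note that a hard-thresholded covariance estimate need not be positive definite.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of an n x p data matrix given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def hard_threshold(cov, tau):
    """Keep the diagonal; zero every off-diagonal entry with |value| < tau."""
    p = len(cov)
    return [[cov[i][j] if i == j or abs(cov[i][j]) >= tau else 0.0 for j in range(p)]
            for i in range(p)]

cov = [[1.0, 0.05, 0.6],
       [0.05, 2.0, -0.02],
       [0.6, -0.02, 1.5]]
print(hard_threshold(cov, 0.1))  # only the (0, 2) dependence survives
```
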
True Self-Avoiding Walks for Accelerated Markov Chain Monte Carlo Integration
Qinghua Ding, Venkat Anantharam
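AI summary: Proposes true self-avoiding walks (TSAW) to improve Markov chain Monte Carlo (MCMC) integral estimation. By penalizing transition probabilities according to overuse, the empirical integral error reaches order $O(\sqrt{\log t}/t)$ almost surely, substantially better than the $t^{-1/2}$ error of standard random walks.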
We study true self-avoiding walk (TSAW) as a mechanism for improving empirical integral estimation via Markov chain Monte Carlo (MCMC). We consider finite-state adaptive sampling dynamics associated with an irreducible Markov kernel $P$ on a finite set, with stationary distribution $π$, in which the transition probabilities are penalized according to empirical overuse. Our main result is that the empirical occupation counts $L_t(i)$ and transition counts $N_t(i,j)$ of the resulting TSAW-based walk satisfy \[ L_t(i)-tπ_i = O(\sqrt{\log t}) \quad\text{and}\quad N_t(i,j)-tπ_iP_{ij}=O(\sqrt{\log t}) \qquad\text{almost surely} \] for every state $i$ and every edge $(i,j)$ with $P_{ij}>0$. Consequently, for every bounded function $f:V\to\mathbb R$, the error of our integral estimator converges as \[ \left|\frac1t\sum_{s=0}^{t-1} f(X_s)-\sum_{i\in V}π_i f(i)\right| = O\left(\frac{\sqrt{\log t}}{t}\right) \qquad\text{almost surely}. \] These results show that, in contrast with the usual $t^{-1/2}$ error scaling for empirical averages under standard random-walk-based methods, the TSAW-based estimator yields empirical integral errors of order $O(\sqrt{\log t}/t)$ almost surely, thereby achieving a substantially sharper dependence on the sample size $t$.
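The minimal simulation below shows a walk that is penalized by empirical overuse. From state $i$, the walker moves to $j$ with weight $P_{ij}\exp(-\beta(N_t(i,j) - L_t(i)P_{ij}))$. This penalty is a hypothetical choice made for illustration; the abstract does not state the paper's rule.

Setting $\beta = 0$ recovers the ordinary random walk with kernel $P$. That gives a convenient baseline for comparing the discrepancies $L_t(i) - t\pi_i$. The integral estimate is computed from occupation counts, as in the abstract's error bound. Function names are hypothetical.

```python
import math
import random

def tsaw_walk(P, steps, beta=1.0, start=0, seed=0):
    """Hypothetical TSAW-style walk on states 0..k-1. From state i, step to j with
    probability proportional to P[i][j] * exp(-beta * excess), where
    excess = N[i][j] - L[i] * P[i][j] measures how much edge (i, j) is overused."""
    rng = random.Random(seed)
    k = len(P)
    L = [0] * k                       # occupation counts L_t(i) of X_0, ..., X_{t-1}
    N = [[0] * k for _ in range(k)]   # transition counts N_t(i, j)
    x = start
    for _ in range(steps):
        weights = [P[x][j] * math.exp(-beta * (N[x][j] - L[x] * P[x][j])) for j in range(k)]
        y = rng.choices(range(k), weights=weights)[0]
        L[x] += 1
        N[x][y] += 1
        x = y
    return L, N

def integral_estimate(L, f):
    """(1/t) * sum_s f(X_s), computed from occupation counts."""
    t = sum(L)
    return sum(L[i] * f(i) for i in range(len(L))) / t

# Doubly stochastic kernel on 3 states, so the stationary distribution is uniform.
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
L, N = tsaw_walk(P, steps=3000)
print(L, integral_estimate(L, lambda i: i))  # target: sum_i pi_i * i = 1
print(tsaw_walk(P, steps=3000, beta=0.0)[0])  # plain random walk, for comparison
```
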
Restricted Mean Time Lost Analysis of Survival and Competing Risks Data with the mets Package in R
Thomas Harder Scheike, Klaus Kähler Holst
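AI summary: Describes nonparametric and regression estimators in the mets R package for restricted mean survival time (RMST) and restricted mean time lost (RMTL), including cause-specific RMTL, and provides influence functions that enable standard-error computation and more complex statistical inference.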
This paper introduces software implemented in the mets R package for calculating non-parametric and regression estimates of Restricted Mean Survival Time (RMST) and Restricted Mean Time Lost (RMTL), including RMTL due to specific causes. A unique feature is the ability to compute the non-parametric estimates of RMST and RMTL, as well as their standard errors, for all time horizons simultaneously. Regression modeling in mets is based on Inverse Probability of Censoring Weighting (IPCW) methods. The package implements different versions of IPCW adjusted estimating equations. A critical technical contribution is the provision of influence functions for all models, which enables the computation of standard errors and allows the estimates to be used as building blocks for more complex statistics, such as the while-alive estimate in recurrent events settings. To expand capabilities in causal inference, the mets package also implements methods for standardization estimates (G-computation) and the estimation of Average Treatment Effects (ATE) for both RMST and RMTL in the competing risks setting. Importantly, the computations scale linearly with the number of observations, making the software efficient for use with large datasets.
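The sketch below gives intuition for the non-parametric estimand only. Without competing risks, RMST up to a horizon $\tau$ is the area under the Kaplan-Meier curve on $[0, \tau]$, and RMTL is $\tau$ minus that area.

This Python sketch is ours, not the mets implementation, which is written in R. It omits standard errors and IPCW regression. It also omits cause-specific RMTL, which needs a cumulative incidence estimator instead of Kaplan-Meier. Function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at distinct event times; events[i] is 1 for an observed
    event and 0 for censoring (censored subjects at a tied time count as at risk)."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau] (positive times assumed)."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (tau - last_t)

def rmtl(times, events, tau):
    """Restricted mean time lost in the single-cause setting: tau - RMST."""
    return tau - rmst(times, events, tau)

print(rmst([1, 2, 2, 3, 5], [1, 0, 1, 1, 0], tau=4.0))
```

Without censoring, the estimate reduces to the sample mean of $\min(T_i, \tau)$, which makes a handy sanity check.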
Benchmarking Likelihood-Free Inference Methods Based on Neural and Optimal Transport Approaches
Samira Aka, Marie Kratz, Philippe Naveau
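AI summary: Uses simulations to evaluate four likelihood-free inference methods (MLE, NBE, EOT, and AW-NBE) under structural features such as heavy tails and discreteness, and stresses the importance of carefully choosing evaluation tools for extreme and discrete data.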
Simulation-based inference (SBI) has become an increasingly important framework for parameter estimation in models for which simulation is feasible, including cases where likelihood evaluation is unavailable or costly. While recent work has introduced benchmark frameworks to compare likelihood-free methods, these studies often do not account for structural features such as heavy tails or discreteness. In this article, we investigate how the performance of likelihood-free inference methods depends on these structural properties. We consider four approaches, MLE, NBE, EOT, and AW-NBE, and evaluate them using simulations. This study highlights the importance of carefully selecting evaluation tools in the presence of extreme and discrete data.
Improved Distribution Estimation under $\ell_\infty$
Doron Cohen, Aryeh Kontorovich, Yonatan Livshitz
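AI summary: Improves the estimation of discrete probability distributions under the $\ell_\infty$ norm, giving minimax bounds in expectation and high-probability tail bounds, and resolves open problems posed by Kontorovich and Painsky (JMLR, 2025), including a fully empirical version of the tightest risk bound and the form of the worst-case extremal distribution; encouraging empirical results are also reported.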
We present improved bounds for estimating discrete probability distributions under the $\ell_\infty$ norm. These include minimax bounds in expectation and high-probability tail bounds. We resolve some of the open questions posed in Kontorovich and Painsky (JMLR, 2025), including a fully empirical version of the tightest risk bound they presented and identifying the form of the worst-case extremal distribution. Encouraging empirical results are reported as well.
Physics-Informed Goal-Conditioned Reinforcement Learning under Hybrid Contact Dynamics
Vittorio Giammarino, Anastasios Manganaris, Ahmed H. Qureshi
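AI summary: Addresses the performance degradation of existing physics-informed goal-conditioned reinforcement learning methods caused by hybrid dynamics in contact-rich tasks, and proposes contact-aware and hierarchical formulations that apply physics-informed inductive biases selectively, as a step toward contact-rich manipulation.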
Learning to reach arbitrary goals from sparse feedback requires agents to infer a rich notion of reachability across state-goal pairs. Goal-conditioned reinforcement learning (GCRL) tackles this challenge by learning policies that generalize across goals, but this generalization becomes increasingly difficult as the underlying dynamics become high-dimensional, hybrid, or contact-dependent. To address this issue, physics-informed GCRL (Pi-GCRL) introduces optimal-control-inspired inductive biases into goal-conditioned value learning. While Pi-GCRL methods have proven effective in navigation and object-free goal-reaching domains, their reliability remains unclear in contact-rich tasks, where contact interactions induce hybrid dynamics, mode-dependent controllability, and nonsmooth value landscapes. In this work, we show that these structural properties can cause existing Pi-GCRL methods to degrade when applied naively to contact-rich manipulation. Motivated by this analysis, we introduce contact-aware and hierarchical formulations that apply physics-informed inductive biases selectively across the manipulation problem. Our results provide a principled step toward extending Pi-GCRL to contact-rich manipulation.
On the Bayesian Analysis of an Unidentifiable Binomial Model
Éric Marchand
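AI summary: For an unidentifiable binomial model, uses truncated Beta distributions, finite mixtures of Beta distributions, and harmonic numbers to derive large-sample approximations of the posterior expectations and the normalization constant.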
We analyze the posterior distribution and posterior expectation of $(p_1, p_2)$, where $Y|p_1,p_2 \sim \hbox{Binomial}(n, p_1 p_2)$ and $(p_1, p_2)$ is uniformly distributed on the unit square $[0,1]^2$. We exhibit interesting expressions in terms of a truncated Beta distribution, a finite mixture of Beta distributions, and harmonic numbers, and derive a simple large-sample-size ($n$) approximation for the posterior expectations $\mathbb{E}(p_i|y)$ as well as for the normalization constant in the posterior joint density.
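One closed form can be derived directly. This is our own derivation, offered as a cross-check rather than as the paper's expression. The code evaluates it exactly with fractions and compares it against a brute-force grid integral over the unit square. Function names are hypothetical.

1. Under the uniform prior, $q = p_1 p_2$ has density $-\log q$ on $(0,1)$.
2. Given $q$, the conditional density of $p_1$ is proportional to $1/p_1$ on $(q,1)$, so $\mathbb{E}(p_1|q) = (1-q)/(-\log q)$.
3. Using $\int_0^1 q^y(1-q)^{n-y}(-\log q)\,dq = B(y+1, n-y+1)(H_{n+1} - H_y)$, with harmonic numbers $H_m$, gives $\mathbb{E}(p_1|y) = (n-y+1)/((n+2)(H_{n+1} - H_y))$.

By symmetry, the same expression holds for $p_2$.

```python
from fractions import Fraction

def harmonic(m):
    """Exact harmonic number H_m = 1 + 1/2 + ... + 1/m (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))

def posterior_mean_p1(n, y):
    """E(p1 | Y = y) = (n - y + 1) / ((n + 2) * (H_{n+1} - H_y)), derived as above."""
    return Fraction(n - y + 1, n + 2) / (harmonic(n + 1) - harmonic(y))

def posterior_mean_p1_grid(n, y, m=300):
    """Brute-force check: midpoint rule on an m x m grid over the unit square.
    The binomial coefficient cancels in the ratio, so it is omitted."""
    num = den = 0.0
    for i in range(m):
        p1 = (i + 0.5) / m
        for j in range(m):
            q = p1 * (j + 0.5) / m
            w = q ** y * (1.0 - q) ** (n - y)   # likelihood times uniform prior
            num += p1 * w
            den += w
    return num / den

print(posterior_mean_p1(1, 1), posterior_mean_p1(1, 0))  # 2/3 and 4/9
print(float(posterior_mean_p1(10, 3)), posterior_mean_p1_grid(10, 3))
```

The $n = 1$ cases can be confirmed by hand. For example, $y = 1$ gives a posterior proportional to $p_1 p_2$, so $p_1 \sim$ Beta(2, 1), with mean 2/3.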
Shrinkage-Constrained Functional Calibration of Complex Computer Models
Liam Myhill, Enrique Martinez, Sez Russcher
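AI summary: Proposes IBFU, a new Bayesian model calibration method that decomposes each calibration parameter into a fixed best estimate plus a parameter correction modeled by an independent Gaussian process over the input space, and imposes strong shrinkage priors and orthogonality constraints to address the weak regularization and confounding of the Kennedy-O'Hagan framework with sparse, noisy data.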
We propose a new Bayesian model calibration formalism, which we term integrated bias with full uncertainty (IBFU), as an alternative to the Kennedy-O'Hagan (KOH) framework. In KOH, calibration parameters are modeled as fixed but unknown, with relatively weak prior constraints, and their posteriors are inferred jointly with an additive discrepancy Gaussian Process (GP). This formulation often provides limited regularization and leads to confounding pathologies when applied to inexact models with sparse, noisy measurements. By contrast, we represent each calibration parameter as the sum of a fixed best-estimate value and a parameter correction, represented by an independent GP over the input space and equipped with strong shrinkage priors. Any residual discrepancy that cannot be addressed via parameter correction is captured by an additive discrepancy GP operating on the simulator, similar to KOH. We then impose orthogonality constraints to mitigate confounding between the simulator and the modeled additive discrepancy, as well as collinearity among model parameters. Imposing strong complexity shrinkage via conservative hyperpriors forces the mean parameter correction to remain flat across the domain, resulting in predictions that essentially coincide with those of the KOH formulation. However, when complexity shrinkage is relaxed and the data provide evidence that the effective calibration parameter varies across the domain, the mean parameter correction is allowed to become a function of the domain in a controlled, structured manner. In this sense, our approach is more universal, because it effectively nests KOH as a special case while extending it to input-dependent calibration; it is also more tightly constrained, because it anchors the true values around the best estimates and the shrinkage prior actively regularizes the calibration parameters.
M-Estimation with E-Statistics
Hongjian Wang, Aaditya Ramdas
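AI summary: Proposes the ME-estimator, which performs point estimation by minimizing an e-statistic, establishes its consistency and convergence rate, and analyzes its asymptotic normality for bounded mean estimation.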
We present a theory of point estimation with e-statistics (e-values and e-processes) by introducing the "ME-estimator": the parameter that minimizes the corresponding e-statistic, that is, the evidence against it. Our approach is based on the intuitive idea of e-statistics as a measure of evidence and betting pay-off, and naturally generalizes the classical method of maximum likelihood estimation. First, we establish consistency and an almost sure convergence rate for ME-estimators, which we relate to high-probability bounds on the size of the confidence set obtained by thresholding the e-statistics; this approach sets ME-estimators apart from traditional M-estimators. Second, we conduct a classical M-estimator-style analysis of the consistency and asymptotic normality of ME-estimators in the bounded mean estimation setting, and discuss the notion of efficiency (or lack thereof) under various choices of betting strategy. Our work brings e-statistics, a fundamental tool for inference and uncertainty quantification, to the space of estimation.
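The sketch below applies the idea to bounded mean estimation with data in $[0, 1]$. The e-statistic against "mean $= m$" is taken as the average wealth of two bettors: one stakes a fixed $+\lambda$ and the other $-\lambda$ on each observation. The ME-estimate is the grid point where this evidence is smallest.

This fixed-$\lambda$, two-sided mixture is only one betting strategy, chosen for illustration. The paper discusses various strategies, and the names here are hypothetical. Thresholding the same statistic at $1/\alpha$ gives a confidence set. By Markov's inequality it is valid at a fixed sample size for independent observations with mean $m$.

```python
def mixture_e_value(xs, m, lam=0.5):
    """Two-sided betting e-statistic against H0: E[X] = m, for X in [0, 1]:
    the average wealth of a bettor staking +lam and one staking -lam.
    Requires |lam| < 1 so that wealth stays positive."""
    up = down = 1.0
    for x in xs:
        up *= 1.0 + lam * (x - m)
        down *= 1.0 - lam * (x - m)
    return 0.5 * (up + down)

def me_estimate(xs, grid_size=1000, lam=0.5):
    """ME-estimator on a grid: the candidate mean with the least evidence against it."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return min(grid, key=lambda m: mixture_e_value(xs, m, lam))

def confidence_set_hull(xs, alpha=0.05, grid_size=1000, lam=0.5):
    """Smallest and largest grid means m with e(m) < 1 / alpha."""
    kept = [i / grid_size for i in range(grid_size + 1)
            if mixture_e_value(xs, i / grid_size, lam) < 1.0 / alpha]
    return min(kept), max(kept)

data = [0.2, 0.4, 0.6, 0.8, 0.35, 0.55]
print(me_estimate(data), sum(data) / len(data), confidence_set_hull(data))
```
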
Multidimensional Item Response Theory under General Latent Distributions
Chengyu Cui, Taoyi Chen, Chun Wang, Gongjun Xu
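AI summary: To relax the Gaussian latent-distribution assumption in multidimensional item response theory, proposes a flow-based framework that captures non-Gaussian latent distributions and jointly learns item parameters, the latent distribution, and a posterior approximation, improving parameter recovery.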
Multidimensional item response theory (MIRT) provides an important psychometric framework for modeling how multiple latent traits jointly influence observed item responses. In most existing estimation procedures, the latent trait distribution is assumed to be Gaussian. Although computationally convenient, this assumption can be restrictive in many applications where the latent distribution exhibits skewness, heavy tails, or multimodality. More importantly, misspecifying the latent distribution may bias the estimation of item parameters and latent traits. To address this limitation, we propose a data-driven flow-based framework for MIRT models that can capture a broad class of non-Gaussian latent distributions. The proposed approach represents the latent distribution as an invertible transformation of a simple base distribution. For efficient estimation, we further introduce a conditional flow as a function of both the observed response and the noise to approximate the posterior distribution. Under this framework, the item parameters, latent distribution, and posterior approximation can be learned jointly. Comprehensive simulation studies show that the proposed method improves item-parameter and latent-trait recovery when the true latent distribution is non-normal. An application to a personality dataset further illustrates the practical utility of the proposed framework for modeling complex latent trait distributions in large-scale data.
Calibrated Preference Learning: The Case of Label Ranking
Santo M. A. R. Thies, Viktor Bengs, Timo Kaufmann, Sebastian J. Vollmer, Eyke Hüllermeier
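AI summary: Formally defines calibration for probabilistic label ranking and builds a hierarchy of calibration notions, using proofs and experiments to establish how the notions relate and to expose calibration deficiencies of existing models.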
Calibration, the alignment of predicted probabilities with true outcome frequencies, is essential for reliable decision-making. While extensively studied for classification and regression, calibration has not been formally addressed for probabilistic label ranking, where the goal is to predict a distribution over orderings of a label set. Naively treating rankings as classes ignores their structure and fails to capture important modalities such as pairwise and top-k predictions. We formalize calibration for label ranking and develop a hierarchy of notions covering full rankings, sub-rankings, and top-k rankings. We prove that full-rank calibration implies the others but not conversely, and sub-ranking and top-k calibration are incomparable. Empirically, we find popular label ranking models are often poorly calibrated, with substantial differences between sub-ranking and top-k metrics. Applying our framework to RLHF reward models, we find that calibration correlates strongly but not perfectly with benchmark accuracy, suggesting it captures a meaningful quality dimension beyond top-1 accuracy. These findings motivate future work on understanding the downstream effects of miscalibration and developing methods to correct it.
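The sketch below shows how one predicted distribution over full rankings induces pairwise (sub-ranking) and top-$k$ marginals. This is why calibration can be checked at several levels of the hierarchy.

The binned error is a standard expected calibration error applied to binary pairwise predictions. It is an illustrative stand-in for the paper's metrics. Function names are hypothetical.

```python
from itertools import permutations

def pairwise_prob(dist, a, b):
    """P(label a is ranked above label b) under a distribution over full rankings."""
    return sum(p for ranking, p in dist.items() if ranking.index(a) < ranking.index(b))

def top_k_prob(dist, prefix):
    """P(the first len(prefix) positions equal prefix), a top-k marginal."""
    k = len(prefix)
    return sum(p for ranking, p in dist.items() if ranking[:k] == tuple(prefix))

def binned_calibration_error(probs, outcomes, bins=10):
    """Expected calibration error of binary predictions, e.g. 'a above b'."""
    groups = {}
    for p, o in zip(probs, outcomes):
        groups.setdefault(min(int(p * bins), bins - 1), []).append((p, o))
    err = 0.0
    for members in groups.values():
        mean_p = sum(p for p, _ in members) / len(members)
        mean_o = sum(o for _, o in members) / len(members)
        err += len(members) / len(probs) * abs(mean_p - mean_o)
    return err

labels = ("a", "b", "c")
uniform = {r: 1 / 6 for r in permutations(labels)}
print(pairwise_prob(uniform, "a", "b"), top_k_prob(uniform, ("a",)))
```
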
Polynomial Histograms for Memory-Efficient Representation of Long-Tailed System Distributions
Murray Stokely, Tim Hesterberg, Arif Merchant, Nate Coehlo
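AI summary: Proposes polynomial histograms, which annotate each bin with moment information and represent long-tailed distributions more effectively than traditional histograms at the same storage cost; applied to file system metrics in a large-scale production system.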
Distributed systems must frequently keep track of many different types of performance metrics across many different computers. For example, the latency distribution of certain operations may be computed for a large combination of computers, users, and operations. These empirical distributions need to be collected at minimal expense on the individual software components, efficiently aggregated across multiple dimensions, and stored in a compact representation for a variety of downstream data analysis applications. We describe an information loss metric for binned data that allows us to optimize the cost of information loss across different histogram representations. We explore the use of polynomial histograms, in which each bin of a histogram is annotated with moments of the underlying distribution in that bin. We compare these polynomial histograms with traditional histograms that spend the same storage cost on additional bins instead of per-bin annotations. We describe an application of these techniques to file system metrics for a large production system, and analytically characterize when polynomial histograms offer more information at lower cost.
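The minimal sketch below shows a moment-annotated histogram. It assumes each bin stores its count, sum, and sum of squares; the paper may store different or more moments.

The property it demonstrates is that histograms with shared bin edges merge by element-wise addition. That is what makes cheap aggregation across machines, users, and operations possible. The per-bin mean and variance can then be recovered from the stored moments. Class and method names are hypothetical.

```python
import bisect

class PolyHistogram:
    """Fixed bin edges; each bin keeps count, sum, and sum of squares."""

    def __init__(self, edges):
        self.edges = list(edges)            # bin i covers [edges[i], edges[i + 1])
        k = len(self.edges) - 1
        self.count = [0] * k                # zeroth moment
        self.s1 = [0] * k                   # first moment (sum of values)
        self.s2 = [0] * k                   # second moment (sum of squares)

    def add(self, x):
        i = bisect.bisect_right(self.edges, x) - 1
        i = min(max(i, 0), len(self.count) - 1)   # clamp out-of-range values to end bins
        self.count[i] += 1
        self.s1[i] += x
        self.s2[i] += x * x

    def merge(self, other):
        """Aggregate two histograms with identical edges by adding their moments."""
        assert self.edges == other.edges
        out = PolyHistogram(self.edges)
        for i in range(len(self.count)):
            out.count[i] = self.count[i] + other.count[i]
            out.s1[i] = self.s1[i] + other.s1[i]
            out.s2[i] = self.s2[i] + other.s2[i]
        return out

    def bin_mean_var(self, i):
        """Within-bin mean and (population) variance, or None for an empty bin."""
        c = self.count[i]
        if c == 0:
            return None
        mean = self.s1[i] / c
        return mean, self.s2[i] / c - mean * mean

h = PolyHistogram([0, 10, 100, 1000])   # e.g. latency buckets in milliseconds
for latency in [3, 7, 50, 500]:
    h.add(latency)
print(h.count, h.bin_mean_var(0))
```
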
Fundamental Limits of Fraud Detection in Card Payment Networks
Gaurav Dhama
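AI summary: Formalizes payment authorization as a sequential decision problem with delayed, censored, corrupted, and counterfactually missing feedback, and derives a minimax regret lower bound showing that ecosystem information quality, rather than model complexity, is the fundamental bottleneck in fraud detection.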
Card payment fraud detection is usually framed as a supervised classification problem. Although this approach has generated practical progress, improvement has remained incremental despite major advances in model architecture. We argue that this is not mainly a failure of function approximation or optimization, but a consequence of structural information impairments inherent to the payment ecosystem. We formalize card authorization as a sequential decision problem with delayed, censored, corrupted, and counterfactually missing feedback. We derive a minimax regret lower bound showing that these impairments enter multiplicatively in the denominator of the achievable learning rate. The bound implies that improving issuer reporting quality or reducing censorship can yield larger reductions in the regret floor than increasing model complexity. We also show that heterogeneity across issuers worsens learnability beyond what average impairment rates suggest. The paper contributes a theory of why fraud detection in payment networks is fundamentally harder than in standard online learning settings, identifies ecosystem information quality as the key bottleneck, and provides a theoretical basis for prioritizing investments in reporting infrastructure, dispute process quality, and selective exploration. The paper is theory-first and does not rely on proprietary transaction data.
Exploring Heterogeneity in Meta-Analysis by Learning Study Similarity with Large Language Models and Triplet Loss
Kanella Panagiotopoulou, Harald Binder, Theodoros Evrenoglou
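AI summary: Proposes a framework that combines large language models with deep metric learning, training an embedding model with triplet loss to infer study-level similarity and cluster studies, so that homogeneous subgroups can be identified before meta-analysis to reduce heterogeneity and improve estimation precision.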
Meta-analyses of observational studies often show substantial between-study heterogeneity, limiting the interpretability of pooled estimates. Meta-regression can be used to explore heterogeneity, but it is often underpowered to handle multiple effect modifiers. We propose a novel framework that integrates large language models (LLMs) with deep metric learning to infer study-level similarity prior to meta-analysis. Study-level clinical and methodological characteristics were processed by an LLM to generate study triplets (anchor, similar, dissimilar). These triplets were constructed by treating each study as an anchor and comparing it with pairs of other studies to identify, in each instance, the study most similar to the anchor. The triplets were then used to train an embedding model with triplet loss, a deep metric learning approach that learns an embedding space in which clinically and methodologically similar studies are clustered together. We apply our framework to a meta-analysis dataset of 58 observational studies comparing cognitive outcomes between preterm- and term-born children. Subsequently, we fit meta-analysis models within the identified study clusters and compare the results with those of the overall analysis. The results suggested three clusters, two of which retained considerable between-study heterogeneity. The remaining cluster comprised the most homogeneous group of studies and exhibited a more extreme pooled effect estimate together with a narrower prediction interval compared with the overall analysis. This work presents a novel approach for exploring heterogeneity in meta-analysis by incorporating study characteristics prior to model fitting. By transforming study information into a similarity space, the framework identifies coherent subgroups and supports more precise inference in heterogeneous real-world evidence.
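The sketch below covers the two mechanical pieces described above:

- **Triplet construction.** Each study serves as the anchor in turn, and for every pair of other studies a judge picks the one more similar to the anchor.
- **Triplet loss.** The standard hinge form is $\max(0, d(a,p) - d(a,n) + \text{margin})$.

The LLM judge is replaced by a toy function that compares a single numeric feature. That substitution, the Euclidean distance, the margin, and all names are our assumptions.

```python
import math
from itertools import combinations

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on embedding vectors: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def build_triplets(studies, judge):
    """Treat each study as the anchor and, for every pair of other studies, ask
    judge(anchor, s1, s2) which one is more similar to the anchor."""
    triplets = []
    for anchor in studies:
        others = [s for s in studies if s != anchor]
        for s1, s2 in combinations(others, 2):
            similar = judge(anchor, s1, s2)
            dissimilar = s2 if similar == s1 else s1
            triplets.append((anchor, similar, dissimilar))
    return triplets

# Toy stand-in for the LLM judge: compare one numeric study feature.
feature = {"A": 0.0, "B": 1.0, "C": 5.0}

def toy_judge(anchor, s1, s2):
    d1 = abs(feature[anchor] - feature[s1])
    d2 = abs(feature[anchor] - feature[s2])
    return s1 if d1 <= d2 else s2

print(build_triplets(list(feature), toy_judge))
```

With $S$ studies, this scheme yields $S\binom{S-1}{2}$ triplets, which grows cubically. For 58 studies that is 58 × 1,596 = 92,568 comparisons, so an implementation may need to subsample pairs.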
Online Learning to Defer with Changing Experts
Dang Hoang Duy, Yannis Montreuil, Maxime Meyer, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
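AI summary: Proposes the first online learning-to-defer algorithm for dynamic expert pools and streaming data, using $\mathcal{H}$-consistency bounds and online convex optimization to obtain regret guarantees.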
Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of $O((n+n_e)T^{2/3})$ in general and $O((n+n_e)\sqrt{T})$ under a low-noise condition, where $T$ is the time horizon, $n$ is the number of labels, and $n_e$ is the number of distinct experts observed across rounds. The analysis builds on novel $\mathcal{H}$-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our approach effectively extends standard Learning-to-Defer to settings with varying expert availability and reliability.
Multi-Expert Learning-to-Defer Beyond Augmented-Action Surrogates
Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
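AI summary: For multi-expert learning-to-defer, the paper proposes a decoupled surrogate loss that optimizes independent sigmoid heads separately from a softmax classifier head. This resolves optimization pathologies of existing methods and yields the first calibration-constant bound that does not grow with the number of experts.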
A learning-to-defer (L2D) system decides, for each input, whether to predict on its own or to hand it to one of several available experts. The well-established recipe trains classifier and router jointly by treating the $K$ classes and $J$ experts as competing actions in one shared $(K{+}J)$-action geometry. Subsequent work has proposed a series of incremental fixes within this geometry; we show that each still suffers, to varying severity, from an optimization-level pathology (target distortion, gradient amplification, winner-take-all starvation, set-mass collapse, or class-expert coupling) even under statistical consistency. We step outside the augmented-action family entirely and propose a decoupled surrogate: a softmax classifier head and an independent sigmoid head per expert, mirroring the two natural objects of the problem. We show that per-sample updates are then coordinatewise and the class-expert Hessian block is identically zero, and prove an excess-risk bound with calibration constant $\max\{2\sqrt{2},\sqrt{2J/\lambda}\}$, which is, to our knowledge, the first multi-expert L2D guarantee whose constant does not grow with the expert pool when the per-expert weight is held fixed. On controlled synthetic studies and on CIFAR-10, CIFAR-10H, and Covertype, it is the only method in our comparison that remains stable as the expert pool grows, preserves rare specialists, and improves over a standalone classifier on every real-data benchmark.
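Here is a minimal sketch of what a decoupled surrogate of this shape could look like. It combines softmax cross-entropy on the $K$ class logits with a binary cross-entropy for each expert head, weighted by a hypothetical `lam`. The per-expert targets (`expert_correct`, set to 1 when expert $j$ would be correct) and the weighting are assumptions for illustration, not the paper's exact loss. The sketch only shows why each head's gradient depends on that head's logits alone.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decoupled_loss_and_grads(class_logits, expert_logits, y, expert_correct, lam=1.0):
    """Hypothetical decoupled surrogate: softmax cross-entropy on the class head
    plus lam-weighted binary cross-entropy on each of J independent expert heads.

    class_logits: (n, K); expert_logits: (n, J); y: (n,) integer labels;
    expert_correct: (n, J) in {0, 1} (assumed target: expert j is correct on sample i).
    """
    n, K = class_logits.shape
    p = softmax(class_logits)
    ce = -np.log(p[np.arange(n), y])
    q = sigmoid(expert_logits)
    eps = 1e-12  # guards log(0); ignored in the analytic gradients below
    bce = -(expert_correct * np.log(q + eps) + (1 - expert_correct) * np.log(1 - q + eps))
    loss = float(np.mean(ce + lam * bce.sum(axis=1)))
    # Analytic gradients of the eps-free losses: each uses only its own head's logits.
    grad_class = (p - np.eye(K)[y]) / n
    grad_expert = lam * (q - expert_correct) / n
    return loss, grad_class, grad_expert
```

Because the loss is a sum of a term in the class logits and a term in the expert logits, the mixed class-expert second derivatives vanish. This is the separability the abstract refers to.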
Learning-to-Defer with Expert-Conditioned Advice
Yannis Montreuil, Leïna Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
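AI summary: Studies learning-to-defer when experts can be given additional information (advice) at decision time. The paper proposes an augmented surrogate loss over the composite expert-advice action space and proves its consistency guarantees and its recovery of the optimal policy.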
Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
Evaluating the Extrapolation Ability of Peaks Over Thresholds with Martingale Tests
Joseph de Vilmarest, Olivier Wintenberger
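AI summary: For estimating the probability of extreme precipitation events, the paper proposes a univariate peaks-over-threshold model based on extreme value theory. Its novelty is the use of martingale tests to evaluate extrapolation ability and to select the high-quantile level agnostically.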
We present the winning strategy for the EVA2025 Data Challenge, which aimed to estimate the probability of extreme precipitation events. These events occurred at most once in the dataset, making the challenge fundamentally one of extrapolating extreme values. Given the scarcity of extreme events, we argue that a simple, robust modeling approach is essential. We adopt univariate models instead of multivariate ones and model Peaks Over Thresholds using Extreme Value Theory. Specifically, we fit an exponential distribution to model exceedances of the target variable above a high quantile (after seasonal adjustment). The novelty of our approach lies in using martingale testing to evaluate the extrapolation power of the procedure and to agnostically select the level of the high quantile. While this method has several limitations, we believe that framing extrapolation as a game opens the door to other agnostic approaches in Extreme Value Analysis.
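The peaks-over-threshold step can be sketched as follows. The sketch assumes the data are already seasonally adjusted and that the quantile level `q` is fixed by hand. It does not reproduce the martingale test the authors use to choose that level. For exponential exceedances, the maximum-likelihood scale is simply the mean exceedance.

```python
import numpy as np

def fit_exponential_pot(x, q=0.95):
    """Fit an exponential model to exceedances above the empirical q-quantile."""
    x = np.asarray(x, dtype=float)
    u = np.quantile(x, q)          # threshold
    exceed = x[x > u] - u          # exceedances over the threshold
    sigma = exceed.mean()          # maximum-likelihood estimate of the exponential scale
    p_u = exceed.size / x.size     # empirical P(X > u)
    return u, sigma, p_u

def tail_probability(z, u, sigma, p_u):
    """Extrapolated P(X > z) for z >= u under the fitted exponential tail."""
    return p_u * np.exp(-(z - u) / sigma)
```

Extrapolation happens in `tail_probability`: the empirical exceedance rate at the threshold is carried further into the tail by the fitted exponential decay.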
Adversarial Robustness in One-Stage Learning-to-Defer
Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
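AI summary: For one-stage learning-to-defer (L2D), where the predictor and allocator are trained jointly, the paper proposes the first adversarial robustness framework. It formalizes attacks, designs cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees (including $\mathcal{H}$-, $(\mathcal{R}, \mathcal{F})$-, and Bayes consistency). Experiments on benchmark datasets show improved robustness to untargeted and targeted attacks while clean performance is preserved.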
Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including $\mathcal{H}$, $(\mathcal{R }, \mathcal{F})$, and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.
Efficient Benchmarking Is Just Feature Selection and Multiple Regression
Sam Bowyer, Acyr Locatelli, Kris Cao
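AI summary: Reframes efficient benchmarking as multiple regression with feature selection. The approach uses kernel ridge regression for prediction and the mRMR feature-selection algorithm, improving prediction accuracy and ranking correlation while lowering computational cost.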
Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression with feature selection, we find that existing efficient benchmarking methods can be greatly improved by simply using kernel ridge regression at the prediction stage. Additionally, using an information-theoretic feature-selection algorithm called minimum redundancy maximum relevance (mRMR), we can further improve upon these methods by selecting question subsets that will be maximally useful for prediction. Except in very data-poor settings, these approaches consistently achieve smaller prediction errors (in both MAE and RMSE), and greater ranking correlation between predicted and true scores (in both Spearman $ρ$ and Kendall $τ$) across a range of benchmarks using both binary and continuous metrics. Furthermore, mRMR subsampling is much faster than competitor methods (which often involve fitting probabilistic models or running clustering algorithms), and is more likely to select the same questions under different random seeds or training data splits. Tutorial code can be found at https://github.com/sambowyer/mrmr_eval .
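Here is a rough sketch of the two ingredients. It assumes rows are previously evaluated models and columns are benchmark questions. One substitution is worth flagging clearly: this greedy mRMR uses absolute Pearson correlation as a cheap stand-in for the mutual-information criteria of information-theoretic mRMR. The RBF kernel and its `gamma`/`alpha` values are also arbitrary choices. This is not the code from the linked repository.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy mRMR (difference form) with |Pearson correlation| standing in for
    mutual information. X: (n_models, n_questions) scores; y: (n_models,) full scores."""
    n = X.shape[0]
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # z-score each question
    yc = (y - y.mean()) / (y.std() + 1e-12)
    relevance = np.abs(Xc.T @ yc) / n                      # |corr(question, full score)|
    redundancy = np.abs(Xc.T @ Xc) / n                     # |corr(question, question)|
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        score = relevance - redundancy[:, selected].mean(axis=1)
        score[selected] = -np.inf                          # never reselect a question
        selected.append(int(np.argmax(score)))
    return selected

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit_predict(X_train, y_train, X_test, alpha=1e-2, gamma=1.0):
    """Kernel ridge regression: solve (K + alpha I) c = y, predict k(x, X_train) @ c."""
    K = rbf_kernel(X_train, X_train, gamma)
    coef = np.linalg.solve(K + alpha * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train, gamma) @ coef
```

A new model would then be evaluated only on the selected questions, and its full-benchmark score predicted from those answers by the kernel ridge regressor.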
$^{13}$C Metabolic Flux Analysis with Trans-Dimensional Bayesian Model Averaging: Evidence-Driven Flux Inference under Structural Model Uncertainty
Johann F. Jadebeck, Anton Stratmann, Martin Beyß, Katharina Nöh
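AI summary: To address structural model uncertainty in $^{13}$C metabolic flux analysis, the paper proposes Bayesian model set averaging, which combines reversible jump Markov chain Monte Carlo with diffusive nested sampling. The method delivers robust flux estimates and recovers model structure over model spaces of large metabolic networks.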
Accurate quantification of intracellular metabolic fluxes is central to systems biology and biotechnology. Flux estimation relies on biochemical network models, with $^{13}$C metabolic flux analysis (MFA) being the state-of-the-art approach. However, isotope labeling data are often insufficient to uniquely support a single network formulation. In such cases, flux estimates become model-dependent, highlighting the need for methods that explicitly account for structural uncertainty. Bayesian model averaging (BMA) provides a principled framework for this purpose, but its application to $^{13}$C-MFA has so far been restricted to uncertainty in reaction bidirectionality within fixed network topologies. We introduce a scalable Bayesian inference framework for $^{13}$C-MFA, Bayesian model set averaging, that applies BMA to encompass uncertainty in reactions and pathways. Our approach combines reversible jump Markov chain Monte Carlo for trans-dimensional exploration of model spaces with diffusive nested sampling for robust estimation of model evidences, enabling averaging over large families of metabolic network models. Using illustrative and application-scale synthetic case studies, we demonstrate that the method yields robust flux estimates, reveals when multiple network configurations are statistically indistinguishable, and recovers data-supported model structures. Importantly, rather than committing to a single model, the framework manages structural uncertainty: under limited data, competing models are retained, whereas increasing data informativeness improves model and flux recovery. The approach scales to billions of model variants, providing a practical foundation for uncertainty- and misspecification-aware quantitative flux inference in $^{13}$C-MFA.
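Once per-model evidences are available (in the paper, from diffusive nested sampling), averaging a flux across models reduces to weighting by posterior model probabilities. This sketch assumes each model supplies a mean and a variance for a single scalar flux, which is a simplification made for illustration.

```python
import numpy as np

def bma_combine(log_evidence, means, variances, log_prior=None):
    """Bayesian model averaging of one scalar flux across candidate models."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    if log_prior is None:
        log_prior = np.zeros_like(log_evidence)  # uniform prior over models
    logw = log_evidence + log_prior
    w = np.exp(logw - logw.max())                # subtract max to avoid overflow
    w /= w.sum()                                 # posterior model probabilities
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = float(np.sum(w * means))
    # Law of total variance: within-model variance plus between-model spread.
    var = float(np.sum(w * (variances + (means - mean) ** 2)))
    return w, mean, var
```

When evidences are comparable, the between-model term inflates the reported uncertainty, so no single network is committed to. When one model dominates, the average collapses onto it.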
Revisiting Granger Causality through Causal Bayesian Networks and Reichenbach's Principle
S. A. Adedayo
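AI summary: Reinterprets Granger causality through Reichenbach's principle and causal Bayesian networks, and proposes the causalized Granger causality (c-GC) algorithm. This gives Granger causality a robust causal interpretation and achieves satisfactory results on synthetic data.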
Characterising cause-effect relationships in complex systems is fundamental to understanding their underlying mechanisms. Granger causality (GC) remains a widely used computational tool for identifying causal relationships in time series data. However, like other causal discovery methods, GC has limitations and has been criticised for lacking a rigorous causal foundation. In this work, we address this criticism by reinterpreting GC through the lens of Reichenbach's principles and causal Bayesian networks. This reinterpretation is implemented as an algorithm we call causalized Granger causality (c-GC). We demonstrate, both theoretically and graphically, that this reformulation endows GC with a robust causal interpretation under specific assumptions. c-GC yields satisfactory results on synthetic data, offering a more principled framework for causal discovery in observational datasets.
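For reference, this is the classical linear Granger test that c-GC reinterprets, not c-GC itself. It compares a regression of `y` on its own lags with one that also includes lags of `x`, using an F statistic. The lag order `p=2` and the ordinary least-squares setup are illustrative assumptions.

```python
import numpy as np

def _rss(design, target):
    """Residual sum of squares of a least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f_stat(x, y, p=2):
    """F statistic for 'x Granger-causes y' with lag order p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(y)
    target = y[p:]
    # Column k-1 holds the series lagged by k steps, aligned with target = y[p:].
    lags_y = np.column_stack([y[p - k:T - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    ones = np.ones((T - p, 1))
    restricted = np.hstack([ones, lags_y])            # y's own past only
    unrestricted = np.hstack([ones, lags_y, lags_x])  # add x's past
    rss_r, rss_u = _rss(restricted, target), _rss(unrestricted, target)
    df_num = p
    df_den = (T - p) - unrestricted.shape[1]
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den)
```

A large statistic says only that `x`'s past improves linear prediction of `y`. The criticism the paper addresses is that predictive improvement alone does not license a causal reading without further assumptions.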
When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming
Francesco Corielli
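AI summary: The paper distinguishes the full conditional language process, the marginal text process, and the model-induced distribution. On that basis it argues that next-token prediction is useful only under strong assumptions (stationarity, representativeness, ergodicity) and when the observed prefix is sufficient for the latent context. It also interprets RAG and tool use as conditional sufficiency mechanisms.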
Language models trained on observed sequences are often described as learning the conditional distribution of the next token given previous tokens. This description is only conditionally correct. A model trained on realized token trajectories does not observe full conditional laws; it receives sampled continuations. Moreover, real language generation is conditioned not only on previous words but also on non-textual circumstances: facts, events, intentions, goals, beliefs, social context, and task-specific constraints. This paper distinguishes three objects that are often conflated: the full conditional language process conditioned on latent circumstances, the marginal text-only process obtained by integrating those circumstances out, and the model-induced distribution learned from finite observed corpora. The paper argues that interpreting model training as estimating the marginal text-only law requires strong assumptions of stationarity, representativeness, and ergodicity, assumptions that are standard in statistical estimation but problematic when applied to heterogeneous language corpora. Even if these assumptions hold, the marginal text-only law is useful only when the observed prefix is an approximately sufficient statistic for the latent circumstances relevant to continuation. In information-theoretic terms, usefulness requires that the residual conditional mutual information between the next token and the omitted circumstances, given the observed text, be small. The paper then extends this argument to heterogeneous training corpora. Finally, the paper interprets Retrieval Augmented Generation (RAG) and tool use as conditional sufficiency devices.
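The sufficiency condition can be made concrete with a small discrete example. Treat `X` as the observed prefix, `C` as the omitted circumstance, and `Y` as the next token, then compute $I(Y; C \mid X)$ from a joint probability table. The toy distributions in the checks are hypothetical.

```python
import numpy as np

def conditional_mutual_information(p):
    """I(Y; C | X) in nats for a joint pmf stored as p[x, c, y] (entries sum to 1)."""
    p = np.asarray(p, dtype=float)
    p_x = p.sum(axis=(1, 2), keepdims=True)   # p(x)
    p_xc = p.sum(axis=2, keepdims=True)       # p(x, c)
    p_xy = p.sum(axis=1, keepdims=True)       # p(x, y)
    mask = p > 0                              # zero-probability cells contribute nothing
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (p * p_x) / (p_xc * p_xy)     # p(y, c | x) / (p(c | x) p(y | x))
    return float(np.sum(p[mask] * np.log(ratio[mask])))
```

In the first check table, the prefix carries no information about `C`, and `C` fully determines `Y`, so the residual information is $\log 2$. In the second, the prefix reveals `C`, and the conditional mutual information is zero.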
Learning-to-Defer in Non-Stationary Time Series via Switching State-Space Models
Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
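AI summary: Proposes the L2D-SLDS framework, which handles non-stationary streaming data with a factorized switching linear-Gaussian state-space model. A shared factor continually updates beliefs about unqueried experts, and a learner-aware query score balances immediate cost against information gain, enabling online learning-to-defer.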
Learning-to-defer (L2D) routes each decision to a system's own predictor or to an external expert. Streaming time-series settings break the offline-L2D assumptions: the data are non-stationary, expert availability shifts over time, and the internal predictor is trained online. We propose L2D-SLDS, a one-stage online L2D framework based on a factorized switching linear-Gaussian state-space model over all potential residuals: a discrete regime, a shared global factor, and per-expert idiosyncratic states. The always-observed internal residual continuously updates beliefs about every unqueried expert through the shared factor, and a learner-aware query score balances immediate cost against latent-state information gain and one-step learner improvement. We prove an oracle inequality against a time-varying learn-and-defer comparator, decomposing regret into a query-bonus budget, an SLDS predictive-cost-error term $\mathcal{E}_{\mathrm{SLDS}}$, and the internal learner's interval dynamic regret. On synthetic, Melbourne, Jena, and 24-expert Delhi benchmarks, L2D-SLDS is competitive with or improves on contextual- and non-stationary-bandit baselines while deferring on ${<}2\%$ of real-data rounds.
Multi-Task Linear Regression without Eigenvalue Lower Bounds: Adaptivity, Robustness, and Safety
Seok-Jin Kim
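AI summary: For multi-task linear regression with contaminated tasks, the paper proposes an estimator based on matrix-weighted norm regularization and introduces a relative balancedness condition. Under weak spectral assumptions, the estimator attains prediction error bounds comparable to those of existing methods and comes with a safety guarantee.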
We study the multi-task linear regression problem in the presence of contaminated tasks. We address the setting where the unknown parameters of a majority of tasks are close in the $\ell_2$-norm, while a fraction of tasks are arbitrary outliers. Existing theoretical frameworks for this problem rely heavily on the assumption that the empirical second moment of each task has a minimum eigenvalue bounded away from zero (order $\Omega(1)$). Crucially, this assumption fails in many high-dimensional scenarios, rendering prior guarantees vacuous. To overcome this limitation, we propose an estimator based on matrix-weighted norm regularization. We also introduce a relative balancedness condition, quantified by a balancedness constant, that compares each task's second moment with the average inlier geometry and relaxes the need for taskwise second-moment lower bounds. In favorable regimes with moderate balancedness, our prediction MSE bounds match the rate of Duan and Wang (2023) under substantially weaker spectral assumptions; the resulting task-overall MSE is minimax optimal up to logarithmic factors. Furthermore, we demonstrate that our estimator enjoys a safety guarantee: when the relevant balancedness constant is large or infinite, or when tasks are unrelated, the method performs no worse than independent task learning.
Increasing-Domain Asymptotics for Gaussian and Besov-Laplace Priors in Nonparametric Bayesian Intensity Estimation with Covariates
Patric Dolmeta, Matteo Giordano
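AI summary: Studies nonparametric Bayesian estimation of the intensity function of covariate-driven point processes with Gaussian and Besov-Laplace priors. Under increasing-domain asymptotics and covariate ergodicity, the paper proves minimax-optimal posterior contraction rates.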
We study the problem of estimating the intensity function of a covariate-driven point process based on observations of the points and covariates over a large window. We consider the nonparametric Bayesian approach, and show that a wide class of Gaussian priors, combined with flexible link functions, achieves minimax-optimal posterior contraction rates in the increasing domain asymptotics and under the assumption that the covariates be ergodic. We also employ Besov-Laplace priors, which are popular in imaging and inverse problems due to their edge-preserving and sparsity-promoting properties. We prove that these yield optimal estimation of spatially inhomogeneous intensities belonging to Besov spaces with low integrability index. These results are based on a general concentration theorem that extends recent findings from the literature. To corroborate the theory, we provide extensive numerical simulations, implementing the considered procedures via suitable posterior sampling schemes. Further, we present two real data analyses motivated by applications in forestry and the environmental sciences.
Bayesian Rain Field Reconstruction Using Commercial Microwave Links and Diffusion Model Priors
Badr Moufad, Albina Ilina, Hai Victor Habi, Salem Lahlou, Yazid Janati, Hagit Messer, Eric Moulines
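AI summary: Frames rain field reconstruction as a Bayesian inverse problem with diffusion models as high-fidelity spatial priors. Training-free posterior sampling methods (such as Plug-and-Play, Sequential Monte Carlo, and Replica Exchange) then outperform conventional methods.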
Commercial Microwave Links (CMLs) offer dense spatial coverage for rainfall sensing but produce path-integrated measurements that make accurate ground-level reconstruction challenging. Existing methods typically oversimplify CMLs as point sensors and neglect the line integral relating rainfall to signal attenuation, resulting in degraded performance under heterogeneous precipitation. In this work, we view rain field reconstruction as a Bayesian inverse problem with Diffusion Models (DMs) as high-fidelity spatial priors. We show that diffusion models better preserve key rainfall statistics compared to censored Gaussian processes. Framing rainfall estimation as a Bayesian inverse problem with a DM prior enables training-free posterior sampling using a broad family of methods, including Plug-and-Play, Sequential Monte Carlo, and Replica Exchange methods. Experiments on synthetic and real-world datasets demonstrate consistent improvements over established CML-based reconstruction baselines.
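To see why treating a link as a point sensor loses information, here is a sketch of a path-integrated forward model. It assumes the common power-law relation $k = aR^b$ between rain rate $R$ (mm/h) and specific attenuation $k$ (dB/km). The coefficients `a=0.1` and `b=1.0` are placeholders, since real values depend on link frequency and polarization. The field lookup is nearest-cell on a grid whose coordinates are in cell units.

```python
import numpy as np

def path_attenuation(rain, start, end, a=0.1, b=1.0, cell_km=1.0, n_samples=200):
    """Approximate total attenuation (dB) of a straight link over a gridded rain
    field (mm/h) with a midpoint Riemann sum of a * R(s)**b along the path."""
    start = np.asarray(start, dtype=float)   # (row, col) in grid-cell units
    end = np.asarray(end, dtype=float)
    length_km = np.linalg.norm(end - start) * cell_km
    t = (np.arange(n_samples) + 0.5) / n_samples               # midpoints in (0, 1)
    pts = start[None, :] + t[:, None] * (end - start)[None, :]
    rows = np.clip(np.floor(pts[:, 0]).astype(int), 0, rain.shape[0] - 1)
    cols = np.clip(np.floor(pts[:, 1]).astype(int), 0, rain.shape[1] - 1)
    k = a * rain[rows, cols] ** b                              # specific attenuation, dB/km
    return float(k.mean() * length_km)                         # integral of k along the path
```

With rain on only half of a 10 km path, the link reports 5 dB, the same as uniform rain at half the rate. A single reading at the link midpoint (a dry cell in that example) would miss the rain entirely. A forward operator of this kind is what the Bayesian inverse problem has to invert.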
A Kernel Score Perspective on Forecast Disagreement and the Linear Pool
Fabian Krüger
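AI summary: Extends results on linear pooling from squared error loss to all kernel scores. This reveals the important influence of forecast disagreement (the average pairwise divergence of the component distributions) on the linear pool's performance, and yields a new condition under which equal weights are optimal for a given kernel scoring rule.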
This paper generalizes several results on linear pooling from squared error loss to all kernel scores. The latter are a rich family of scoring rules that covers point and distribution forecasts for univariate and multivariate, discrete and continuous settings. Its members include the Continuous Ranked Probability Score for univariate distribution forecasting and the Energy Score for multivariate distribution forecasting. Our results indicate that forecast disagreement (measured as the average pairwise divergence of all component distributions) has important implications for the linear pool's performance. The results are useful for understanding and designing linear pools in general combination settings. In particular, they motivate using the linear pool (as opposed to other combination formulas) and yield a novel condition under which equal combination weights are optimal under a given kernel scoring rule.
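For the CRPS, the simplest kernel score, the role of disagreement can be checked numerically. The score of a linear pool equals the weighted average of the component scores minus half the weighted pairwise divergences. This sketch assumes ensemble (sample-based) forecasts and uses the kernel form $\mathrm{CRPS}(F,y) = E|X-y| - \tfrac12 E|X-X'|$. Defining the divergence as half the energy distance is our illustrative notation, not necessarily the paper's.

```python
import numpy as np

def mean_abs_diff(a, b):
    """E|X - X'| for independent X ~ empirical(a), X' ~ empirical(b)."""
    return float(np.mean(np.abs(a[:, None] - b[None, :])))

def crps_ensemble(x, y):
    """Kernel (energy) form of the CRPS for an ensemble x and outcome y."""
    return float(np.mean(np.abs(x - y)) - 0.5 * mean_abs_diff(x, x))

def crps_linear_pool(ensembles, weights, y):
    """CRPS of the mixture sum_i w_i F_i, computed directly from the mixture."""
    first = sum(w * np.mean(np.abs(x - y)) for w, x in zip(weights, ensembles))
    second = sum(wi * wj * mean_abs_diff(xi, xj)
                 for wi, xi in zip(weights, ensembles)
                 for wj, xj in zip(weights, ensembles))
    return float(first - 0.5 * second)

def divergence(xi, xj):
    """Kernel-score divergence between two ensembles (half the energy distance)."""
    return mean_abs_diff(xi, xj) - 0.5 * mean_abs_diff(xi, xi) - 0.5 * mean_abs_diff(xj, xj)
```

Each divergence is nonnegative, so the pool never scores worse (the CRPS is negatively oriented) than the weighted average of its components. For fixed component scores, the pool gains more the more the components disagree.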
Inference for Online Newton Methods with Nesterov-Accelerated Sketching
Haoxuan Wang, Xinchen Du, Sen Na
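AI summary: Inference for online Newton methods is computationally expensive. To address this, the paper combines Hessian averaging with a Nesterov-accelerated sketch-and-project solver, obtaining the robustness of second-order methods at the $O(d^2)$ complexity of first-order methods. It establishes global convergence, asymptotic normality, and an online covariance estimator.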
Reliable decision-making with streaming data requires principled uncertainty quantification of online methods. While first-order methods enable efficient iterate updates, their inference procedures still require updating proper (covariance) matrices, incurring $O(d^2)$ time and memory complexity, and are sensitive to ill-conditioning and noise heterogeneity of the problem. This costly inference task offers an opportunity for more robust second-order methods, which are, however, bottlenecked by solving Newton systems with $O(d^3)$ complexity. In this paper, we address this gap by studying an online Newton method with Hessian averaging, where the Newton direction at each step is approximately computed using a sketch-and-project solver with Nesterov's acceleration, matching the $O(d^2)$ complexity of first-order methods. For the proposed method, we quantify its uncertainty arising from both random data and randomized computation. Under standard smoothness and moment conditions, we establish global almost-sure convergence, prove asymptotic normality of the last iterate with a limiting covariance characterized by a Lyapunov equation, and develop a fully online covariance estimator with non-asymptotic convergence guarantees. We also connect the resulting uncertainty quantification to that of exact and sketched Newton methods without Nesterov's acceleration. Extensive experiments on regression models demonstrate the superiority of the proposed method for online inference.
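The sketch-and-project idea can be illustrated with its simplest member, randomized Kaczmarz. It projects onto one randomly chosen equation at a time, at $O(d)$ cost per step. This is a non-accelerated stand-in, not the paper's Nesterov-accelerated solver. In the checks, a random symmetric positive definite matrix plays the role of an averaged Hessian `H`, and solving `H d = g` gives the negated Newton direction.

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=5000, seed=0):
    """Approximately solve A x = b with randomized Kaczmarz, a basic
    (non-accelerated) sketch-and-project method using single-row sketches."""
    rng = np.random.default_rng(seed)
    row_norms2 = np.sum(A ** 2, axis=1)
    probs = row_norms2 / row_norms2.sum()   # sample rows proportional to squared norm
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        i = rng.choice(A.shape[0], p=probs)
        # Project x onto the hyperplane {z : A[i] @ z = b[i]}.
        x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
    return x
```

Each step only touches one row, which is what lets sketch-and-project solvers avoid the $O(d^3)$ cost of a direct solve. Acceleration and larger sketches change the convergence rate but not this basic structure.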
The Interplay between Network Transitivity and Community Structure
Mingao Yuan, Irin Rahman, Chengay S Wangchuk, Minglian Lin
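AI summary: Theoretically studies the relationship between network transitivity and community structure by deriving the limits of the global and average clustering coefficients in the geometric block model (GBM). The limits exhibit phase-transition behavior and a non-monotonic relationship with community strength.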
Recent empirical observations suggest that network transitivity is highly correlated with community structure in many real-world networks. In this paper, we theoretically investigate this relationship by deriving the limits of the global and average clustering coefficients for the geometric block model (GBM). Both limits exhibit a phase transition; specifically, the functional forms of the limit functions differ between the weak and strong community structure strength regimes. For a GBM with balanced communities, the limits of the global and average clustering coefficients are identical, whereas these limits differ for unbalanced communities. In general, the clustering coefficients do not exhibit a monotonic relationship with community structure strength. Particularly, for a balanced GBM where the within-community edge probability is a constant multiple of the between-community edge probability, the limit decreases from $3/4$ to $3/5$ and subsequently increases toward an asymptotic upper bound of $3/4$ as the multiple grows from one. A similar pattern is observed for the global clustering coefficient in unbalanced settings, where both limits exhibit an explicit dependence on community size.
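For concreteness, both clustering coefficients can be computed from an adjacency matrix. The global coefficient is $3 \times$ the number of triangles divided by the number of connected triples. The average coefficient averages each node's local ratio. Assigning 0 to nodes of degree below two is an assumed convention; it is common, but others exist.

```python
import numpy as np

def clustering_coefficients(adj):
    """Global (transitivity) and average local clustering of a simple undirected graph."""
    A = (np.asarray(adj) != 0).astype(float)
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1)
    closed = np.diag(A @ A @ A)          # 2 * (number of triangles through each node)
    triples = deg * (deg - 1)            # 2 * (number of connected triples centered at each node)
    global_cc = closed.sum() / triples.sum() if triples.sum() > 0 else 0.0
    with np.errstate(divide="ignore", invalid="ignore"):
        local = np.where(triples > 0, closed / triples, 0.0)
    return float(global_cc), float(local.mean())
```

A triangle with one pendant node gives a global coefficient of 0.6 and an average coefficient of 7/12. Even this small example shows that the two quantities differ in general, as in the unbalanced GBM.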
Quickest Change Detection in Multiple Streams: Fundamentals and Recent Advances
Topi Halme, Visa Koivunen
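AI summary: Surveys recent advances in quickest change detection for high-dimensional multi-sensor systems, focusing on the challenges posed by structural constraints and limited sensing resources. It covers methods that exploit sparsity to handle high dimensionality, sampling constraints, and multi-stream detection. It also discusses machine learning for settings with unknown probability models or large-scale systems.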
This paper provides an overview of recent developments in quickest change detection (QCD) for high-dimensional multi-sensor systems, with an emphasis on settings involving structural constraints and limited sensing resources. Classical QCD methodologies, while well understood in low-dimensional and fully observed regimes, face significant challenges when extended to modern applications characterized by large-scale data, constrained sampling or communication, and heterogeneous signal structures. We review key approaches for handling high dimensionality, including methods that exploit sparsity and other forms of signal heterogeneity. Additionally, we discuss sampling constraints, where observations must be selected or acquired sequentially under resource limitations. Multi-stream applications can require making multiple detections, for example when detecting changes separately in different streams. The underlying assumptions on probability models, the types of changes taking place, commonly used decision-making criteria, performance indices, and error types are described. We also briefly discuss the application of machine learning in cases where the underlying probability models are not known or there is a need to select which sensors should monitor the phenomena because of the large scale of the system.
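As a baseline for the multi-stream setting, this sketch runs one CUSUM per stream for a Gaussian mean shift from $N(0,1)$ to $N(\mu,1)$. It raises an alarm when the largest statistic reaches a threshold `h`. Known pre- and post-change distributions, independent streams, and the max-combination rule are all simplifying assumptions. The surveyed methods for sparse changes and sampling constraints are more elaborate.

```python
import numpy as np

def max_cusum_alarm(X, mu, h):
    """Run one Gaussian mean-shift CUSUM per stream (N(0,1) -> N(mu,1)) and return
    the first time index at which any stream's statistic reaches h, else None.
    X has shape (T, K): T time steps, K streams."""
    X = np.asarray(X, dtype=float)
    W = np.zeros(X.shape[1])
    for t, x_t in enumerate(X):
        llr = mu * x_t - 0.5 * mu ** 2   # per-stream log-likelihood ratio
        W = np.maximum(0.0, W + llr)     # CUSUM recursion, reset at zero
        if W.max() >= h:
            return t
    return None
```

The threshold `h` trades detection delay against false-alarm rate. With many streams and a max rule, `h` must grow to keep false alarms under control, which is one motivation for the sparsity-aware statistics discussed in the survey.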
An Exact Characterization of Local Minima in the Loss Landscape of High-Dimensional Two-Layer ReLU Networks
Jie Huang, Bruno Loureiro, Stefano Sarao Mannelli
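AI summary: Uses summary statistics to characterize exactly the local minima in the loss landscape of high-dimensional two-layer ReLU networks. The paper establishes a connection with one-pass SGD and shows how overparameterization affects the stability and reachability of minima.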
We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical organisation of minima into discrete families and shows how overparameterisation changes their stability and reachability under gradient-based dynamics. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
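The summary-statistic reduction can be seen directly for Gaussian inputs. The expectation $E[\mathrm{ReLU}(a^\top x)\,\mathrm{ReLU}(b^\top x)]$ has a closed form (the degree-1 arc-cosine kernel) that depends only on $\|a\|$, $\|b\|$, and $a^\top b$. The population loss therefore depends on the weights only through their overlaps. This sketch assumes the squared loss with a factor $\tfrac12$ and no output weights, matching the network form above.

```python
import numpy as np

def relu_kernel(a, b):
    """E[ReLU(a.x) ReLU(b.x)] for x ~ N(0, I); depends only on |a|, |b| and a.b."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    cos = np.clip(a @ b / (na * nb), -1.0, 1.0)
    theta = np.arccos(cos)
    return na * nb / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos)

def population_loss(W, V):
    """0.5 * E[(sum_k ReLU(w_k.x) - sum_j ReLU(v_j.x))^2], student rows W, teacher rows V."""
    ss = sum(relu_kernel(w, w2) for w in W for w2 in W)   # student-student overlaps
    st = sum(relu_kernel(w, v) for w in W for v in V)     # student-teacher overlaps
    tt = sum(relu_kernel(v, v2) for v in V for v2 in V)   # teacher-teacher overlaps
    return 0.5 * (ss - 2 * st + tt)
```

The checks compare the closed form against a Monte Carlo estimate and confirm that the loss is invariant under a common rotation of student and teacher weights. Both follow from the loss being a function of the overlaps alone.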
Inversion-Free Natural Gradient Descent on Riemannian Manifolds
Dario Draca, Takuo Matsubara, Minh-Ngoc Tran
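AI summary: For statistical models whose parameters lie on general Riemannian manifolds, the paper proposes an intrinsic, inversion-free natural gradient method. It avoids matrix inversion by maintaining a moving approximation of the inverse Fisher information matrix on the manifold and updating it with low-rank corrections. The paper proves almost-sure convergence rates for the iterates.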
The natural gradient method is a central tool for statistical optimisation, but its broader application is hindered by the assumption of a Euclidean parameter space, the repeated estimation of the Fisher information matrix (FIM), and the computational cost of its subsequent inversion. This paper proposes an intrinsic, inversion-free natural gradient method for statistical models whose parameters lie on general Riemannian manifolds. Formulating statistical optimisation in this non-Euclidean setting allows for the natural enforcement of parameter constraints, the elimination of non-identifiable parameters, and the exploitation of geodesic convexity. Our algorithm is based on a moving approximation of the inverse FIM, which is maintained directly on the manifold. This approximation is efficiently updated with new score vectors using low-rank matrix identities. We prove almost-sure convergence rates of $O(\log s / s^{\alpha})$ for the sequence of iterates, and a similar rate for the approximate FIM. A limited-memory variant with sub-quadratic storage complexity is further proposed for large-scale applications. We demonstrate the efficacy of our method on variational Bayes within the Bures-Wasserstein manifold, normalising flows on the Stiefel manifold, and reduced-rank logistic regression.
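In the Euclidean special case, the low-rank idea can be sketched with the Sherman-Morrison formula. Suppose the FIM estimate is a moving average $F_t = (1-\gamma)F_{t-1} + \gamma s s^\top$ of score outer products. Its inverse can then be updated in $O(d^2)$ without inverting any matrix. The exponential-averaging scheme and the step `gamma` are assumptions; the paper carries out its update on the manifold, and it may take a different form.

```python
import numpy as np

def update_inverse_fim(F_inv, score, gamma):
    """Given F_inv = F^{-1} (symmetric), return the inverse of
    (1 - gamma) * F + gamma * s s^T via Sherman-Morrison, in O(d^2)."""
    c = gamma / (1.0 - gamma)
    u = F_inv @ score                 # F^{-1} s
    denom = 1.0 + c * (score @ u)
    # (F + c s s^T)^{-1} = F^{-1} - c u u^T / denom, then rescale by 1 / (1 - gamma).
    return (F_inv - c * np.outer(u, u) / denom) / (1.0 - gamma)
```

A natural-gradient step would then multiply the gradient by `F_inv` directly instead of solving a linear system at every iteration.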
Can covariates explain these group differences? The choice of reference group can reverse the conclusions of the Oaxaca-Blinder decomposition
Manuel Quintero, Advik Shreekumar, William T. Stephenson, Tamara Broderick
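AI summary: Shows theoretically and empirically that, in the Oaxaca-Blinder decomposition, the choice of reference group can lead to substantively different conclusions, and that the problem is more common with complex regression models. The authors recommend that researchers report the decomposition in both directions.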
Scientists often want to explain why an outcome is different in two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in medical care (outcomes given covariates). The Oaxaca-Blinder decomposition (OBD) is a standard tool to tease apart these factors. It is well known that the OBD requires choosing one of the groups as a reference, and the numerical answer can vary with the reference. To the best of our knowledge, there has been no systematic investigation into whether the choice of OBD reference can yield different substantive conclusions and how common this issue is. In the present paper, we give existence proofs in real and simulated data that the OBD references can in fact yield substantively different conclusions. Our empirical exercises find that this sensitivity is more common when the OBD is extended to more complex regression models, including a pretrained transformer. Our theoretical and empirical results together establish that these conclusion reversals are not entirely driven by model misspecification, small data, or adversarial parameter choices. Our results suggest that practitioners should always report both directions of the OBD; that modern machine learning and large datasets do not automatically resolve the conclusion reversal problem; and that further work on this problem is needed.
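A small worked example makes the reference dependence concrete. Use the linear twofold OBD, with the intercept included in the covariate vector. The mean gap $\bar y_A - \bar y_B = \bar x_A^\top\beta_A - \bar x_B^\top\beta_B$ can be split in two ways:
- with B as the reference: explained $= (\bar x_A - \bar x_B)^\top\beta_B$;
- with A as the reference: explained $= (\bar x_A - \bar x_B)^\top\beta_A$.

The group means and coefficients below are made-up numbers chosen so that the "explained" part flips sign. They do not come from the paper.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def oaxaca_blinder(xbar_a, xbar_b, beta_a, beta_b, reference):
    """Twofold OBD of ybar_a - ybar_b for linear models (intercept included in x).

    Returns (explained, unexplained); the two parts always sum to the mean gap.
    """
    dx = [a - b for a, b in zip(xbar_a, xbar_b)]
    db = [a - b for a, b in zip(beta_a, beta_b)]
    if reference == "b":    # group B's coefficients price the covariate gap
        return dot(dx, beta_b), dot(xbar_a, db)
    if reference == "a":    # group A's coefficients price the covariate gap
        return dot(dx, beta_a), dot(xbar_b, db)
    raise ValueError("reference must be 'a' or 'b'")

# Hypothetical inputs: x = (1, covariate)
xa, xb = [1.0, 2.0], [1.0, 1.0]
ba, bb = [0.0, 1.0], [1.0, -1.0]
print(oaxaca_blinder(xa, xb, ba, bb, "a"))  # (1.0, 1.0)
print(oaxaca_blinder(xa, xb, ba, bb, "b"))  # (-1.0, 3.0)
```

Both splits add up to the same gap of 2. However, covariates "explain" +1 of it under one reference and -1 under the other.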
A perturbation approach to unconstrained linear bandits
Andrew Jacobsen, Dorian Baudry, Shinji Ito, Nicolò Cesa-Bianchi
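AI summary: Proposes a perturbation-based framework that reduces the unconstrained linear bandit problem to a standard online linear optimization problem, and achieves optimal high-probability guarantees for static and dynamic regret.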
We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approach effectively reduces Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem. Our framework improves on prior work in several ways. First, we derive expected-regret guarantees when our perturbation scheme is combined with comparator-adaptive OLO algorithms, leading to new insights about the impact of different adversarial models on the resulting comparator-adaptive rates. We also extend our analysis to dynamic regret, obtaining the first guarantees with optimal $\sqrt{P_T}$ path-length dependencies without prior knowledge of $P_T$. We then develop the first high-probability guarantees for both static and dynamic regret in uBLO. Finally, we discuss lower bounds on the static regret, and prove the folklore $Ω(\sqrt{dT})$ rate for adversarial linear bandits on the Euclidean ball, which is of independent interest.
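To see why perturbation turns bandit feedback into a full loss vector for an OLO learner, consider a generic one-point estimator. The learner plays $x + s r e_i$, with coordinate $i$ uniform, sign $s = \pm 1$ uniform, and radius $r$, and observes only the scalar $\langle \ell, x + s r e_i\rangle$. Then $\hat\ell = (d\, s/r)\,\langle \ell, x + s r e_i\rangle\, e_i$ is unbiased for $\ell$. The term $s\langle\ell, x\rangle$ averages to zero, and $s^2 r \ell_i / r = \ell_i$. This coordinate-wise scheme is a textbook illustration chosen by us. It is not the Abernethy et al. (2008) perturbation, which uses self-concordant-barrier geometry, and it is not the paper's scheme.

```python
import random

def estimate(x, loss, radius, i, s):
    """Play x + s*radius*e_i, observe only <loss, point>, return (point, loss estimate)."""
    d = len(x)
    point = list(x)
    point[i] += s * radius
    feedback = sum(l * p for l, p in zip(loss, point))  # the only bandit feedback
    est = [0.0] * d
    # Unbiased when i ~ Uniform{0..d-1} and s ~ Uniform{-1, +1}
    est[i] = d * s * feedback / radius
    return point, est

def play_and_estimate(x, loss, radius, rng):
    """Sample the random coordinate and sign, then build the estimate."""
    return estimate(x, loss, radius, rng.randrange(len(x)), rng.choice((-1.0, 1.0)))
```

The estimate can then be passed to any OLO algorithm as if it were the true loss vector. Its variance grows as the radius shrinks, which is the usual cost of this approach.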
Design-based anytime-valid inference for randomized experiments with delayed outcomes and staggered entry
Michael Lindon, Nathan Kallus
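AI summary: Addresses delayed outcomes and staggered entry in online experiments from a design-based perspective. It constructs martingales under arm-specific event-time filtrations and combines them with a union bound to build confidence sequences for the treatment effect, enabling continuous monitoring.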
Delayed outcomes are ubiquitous in online experimentation: treatment can affect whether an outcome occurs, when it occurs, and its realized value. To accommodate staggered entry while remaining robust to environmental nonstationarity and unit-level heterogeneity, we adopt a design-based perspective and target the sample cumulative reward in each arm as a function of calendar time. Our confidence sequences allow practitioners to continuously monitor the counterfactual incremental reward, such as revenue, that would have been realized by calendar time $t$ had all entered units been assigned to treatment rather than control. The main technical challenge is the choice of design-based filtration, complicated by the presence of asynchronous potential outcome times. We show that the IPW treatment-effect estimation error is not a martingale with respect to any filtration, while each arm-specific IPW estimation error is a martingale with respect to a carefully chosen arm-specific event-time filtration. We therefore construct a confidence sequence for the treatment effect by combining two arm-level confidence sequences with a union bound, and further demonstrate that this can outperform the traditional design-based variance upper bound. Finally, we characterize the class of augmentations for which the per-arm AIPW estimation error remains a martingale.
Variational routing: a scalable Bayesian framework for calibrated Mixture-of-Experts Transformers
Albus Yizhuo Li, Matthew Wicker
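AI summary: Proposes Variational Mixture-of-Experts Routing (VMoER), which achieves calibrated uncertainty in large models by confining Bayesian inference to the expert-selection stage. On fine-tuned foundation models it markedly improves routing stability, lowers calibration error, and raises out-of-distribution detection AUROC, with minimal extra computation.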
Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter counts in the trillions through carefully engineered sparsity, including Mixture-of-Experts (MoE) layers. In this work, we demonstrate calibrated uncertainty at scale by introducing Variational Mixture-of-Experts Routing (VMoER), a structured Bayesian approach for modelling uncertainty in MoE layers. VMoER confines Bayesian inference to the expert-selection stage, which is typically performed by a deterministic routing network. We instantiate VMoER using two inference strategies: amortised variational inference over routing logits, and inference of a temperature parameter for stochastic expert selection. Across the fine-tuned foundation models tested, VMoER improves routing stability under noise by 38%, reduces calibration error by 94%, and increases out-of-distribution AUROC by 12%, while incurring less than 1% additional FLOPs. These results suggest VMoER offers a scalable path toward robust and uncertainty-aware foundation models.
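The abstract mentions a temperature parameter for stochastic expert selection. One common way to make top-k routing stochastic is to scale the router logits by a temperature and draw experts with Gumbel-top-k, which samples k experts without replacement from the tempered softmax. This is a hedged sketch of that generic mechanism only. The fixed `temperature` argument is a stand-in for the temperature that VMoER would infer, and the paper's variational family is not reproduced.

```python
import math
import random

def routing_probs(logits, temperature):
    """Tempered softmax over expert logits (max-subtracted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel(rng):
    """Standard Gumbel sample via -log(-log(U))."""
    u = max(rng.random(), 1e-300)  # avoid log(0)
    return -math.log(-math.log(u))

def sample_top_k(logits, temperature, k, rng):
    """Gumbel-top-k: k experts drawn without replacement from routing_probs(logits, temperature)."""
    keys = [z / temperature + gumbel(rng) for z in logits]
    return sorted(range(len(logits)), key=lambda i: keys[i], reverse=True)[:k]
```

A higher temperature flattens the routing distribution, so expert choices become more exploratory. As the temperature goes to zero, this recovers deterministic top-k routing.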
Minibatch optimal transport and perplexity bound estimation in discrete flow matching
Etrit Haxholli, Yeti Z. Gurbuz, Ogul Can, Eli Waxman
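AI summary: Discrete flow matching suffers from excessive state transitions and difficult probability estimation. To address this, the paper proposes a dynamic optimization objective based on minibatch optimal transport that reduces the number of transitions, and gives two upper bounds on perplexity to support training and evaluation.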
Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. However, unlike continuous flow matching, the rectification strategy cannot be applied due to the stochasticity of discrete paths, necessitating alternative methods to minimize state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, where transport cost depends solely on inter-state dissimilarity and can be optimized via minibatch strategies. We show that such methods can reduce the number of transitions needed to reach the same generative perplexity by up to a factor of 32 (from 1024 to 32) without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an instantaneous change-of-variables analogue, preventing the exact probability estimation available to continuous flows. We therefore propose two upper bounds on perplexity, enabling principled training, evaluation, and model comparison. Finally, we introduce Multimask Flows, which outperform masked flows in generative perplexity without compromising diversity, particularly when utilizing minibatch optimal transport.
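Minibatch OT can be illustrated with a toy setting. Take equal-size source and target minibatches with uniform weights. The Kantorovich problem then has an optimal permutation coupling, so each source sequence is paired with the target that minimizes total dissimilarity. Here the dissimilarity is Hamming distance between token sequences, which is our assumed cost. The sketch brute-forces the permutation and is only feasible for tiny batches. A practical implementation would use an assignment solver such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment`).

```python
from itertools import permutations

def hamming(a, b):
    """Number of positions where two equal-length token sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def minibatch_ot_pairing(sources, targets):
    """Exact OT coupling for a small uniform minibatch: the cheapest permutation."""
    n = len(sources)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(hamming(sources[i], targets[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return [(sources[i], targets[best_perm[i]]) for i in range(n)], best_cost

pairs, cost = minibatch_ot_pairing(["abc", "xyz"], ["xyz", "abd"])
print(pairs, cost)  # [('abc', 'abd'), ('xyz', 'xyz')] 1
```

Pairing by position would cost 6 token changes, whereas the OT coupling needs only 1. This reduction in changes is the intuition behind fewer state transitions along the learned paths.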
A causal framework for estimating heterogeneous effects of on-demand tutoring
Kirk Vanacore, Danielle R Thomas, Digory Smith, Bibi Groot, Justin Reich, Rene Kizilcec
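AI summary: Proposes a scalable causal inference framework that combines deep knowledge tracing with causal forests to estimate the immediate effect of on-demand human tutoring in adaptive learning systems. Tutoring raises next-problem correctness by about 4 percentage points, and the effects are substantially heterogeneous.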
This paper introduces a scalable causal inference framework for estimating the immediate, session-level effects of on-demand human tutoring embedded within adaptive learning systems. Because students seek assistance at moments of difficulty, conventional evaluation is confounded by self-selection and time-varying knowledge states. We address these challenges by integrating principled analytic sample construction with Deep Knowledge Tracing (DKT) to estimate latent mastery, followed by doubly robust estimation using Causal Forests. Applying this framework to over 5,000 middle-school mathematics tutoring sessions, we find that requesting human tutoring increases next-problem correctness by approximately 4 percentage points and accuracy on the subsequent skill encountered by approximately 3 percentage points, suggesting that the effects of tutoring transfer proximally across knowledge components. This effect is robust to various model specifications and to potential unmeasured confounders. Notably, these effects exhibit significant heterogeneity across sessions and students, with session-level effect estimates ranging from -20.25 to +19.91 percentage points. Our follow-up analyses suggest that typical behavioral indicators, such as student talk time, do not consistently correlate with high-impact sessions. Furthermore, treatment effects are larger for students with lower prior mastery and slightly smaller for low-SES students. This framework offers a rigorous, practical template for the evaluation and continuous improvement of on-demand human tutoring, with direct applications to emerging AI tutoring systems.
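Doubly robust estimation is usually built on augmented inverse-propensity-weighted (AIPW) pseudo-outcomes:

$\mu_1(x) - \mu_0(x) + W\,(Y - \mu_1(x))/e(x) - (1-W)\,(Y - \mu_0(x))/(1 - e(x))$

Their mean estimates the average effect, and they remain consistent if either the outcome models or the propensity model is correct. The sketch computes these generic scores from given nuisance values. It does not reproduce the paper's pipeline (DKT mastery features, analytic sample construction, causal forest), and all numbers are hypothetical.

```python
def aipw_scores(y, w, e, mu0, mu1):
    """Per-unit AIPW pseudo-outcomes.

    y: outcomes, w: 0/1 treatment, e: propensity P(W=1 | x),
    mu0 / mu1: outcome-model predictions under control / treatment.
    """
    scores = []
    for yi, wi, ei, m0, m1 in zip(y, w, e, mu0, mu1):
        scores.append(m1 - m0 + wi * (yi - m1) / ei - (1 - wi) * (yi - m0) / (1 - ei))
    return scores

s = aipw_scores([1, 0, 1, 1], [1, 0, 1, 0], [0.5] * 4, [0.5] * 4, [0.7] * 4)
ate_hat = sum(s) / len(s)  # approximately 0.5 for these made-up inputs
```

A heterogeneity analysis would then relate these pseudo-outcomes, or a forest fit on them, to covariates such as prior mastery.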
Asymptotically optimal sequential testing for Markovian data
Alhad Sethi, Kavali Sofia Sagar, Shubhada Agrawal, Debabrota Basu, P. N. Karthik
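AI summary: For data generated by an ergodic finite-state Markov chain, proposes an asymptotically optimal sequential hypothesis test whose expected stopping time asymptotically matches an instance-dependent lower bound. Applications include detecting model misspecification in Markov chain Monte Carlo and testing structural properties of Markov decision processes.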
We study one-sided and $α$-correct sequential hypothesis testing for data generated by an ergodic, finite-state Markov chain. The null hypothesis is that the unknown transition matrix belongs to a prescribed set $P$ of stochastic matrices, and the alternative corresponds to a disjoint set $Q$. We establish a non-asymptotic instance-dependent lower bound on the expected stopping time of any valid sequential test under the alternative, which is asymptotically tight. Our novel analysis improves on existing lower bounds, which are either asymptotic or provably sub-optimal in this setting. Our lower bound incorporates both the stationary distribution and the transition structure induced by the unknown Markov chain. We further propose an optimal test whose expected stopping time matches this lower bound asymptotically as $α\to 0$. We illustrate the usefulness of our framework through applications to sequential detection of model misspecification in Markov Chain Monte Carlo and to testing structural properties, such as the linearity of transition dynamics, in Markov decision processes. Our findings yield a sharp and general characterization of optimal sequential testing procedures under Markovian dependence.
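The basic mechanism of a valid sequential test on Markov data can be shown in a deliberately simplified case: a simple null transition matrix against a simple alternative. This is not the composite sets $P$ and $Q$ or the optimal test proposed in the paper. The test accumulates log-ratios of transition probabilities along the observed path and rejects once the sum exceeds $\log(1/\alpha)$. Under the null the likelihood ratio is a nonnegative martingale with mean one, so Ville's inequality bounds the probability of ever rejecting by $\alpha$.

```python
import math

def markov_llr_stopping_time(path, p, q, alpha):
    """Sequential likelihood-ratio test of transition matrix p (null) vs q (alternative).

    Returns the number of observed transitions at which the log-likelihood ratio
    first reaches log(1/alpha), or None if it never does on this path.
    """
    threshold = math.log(1.0 / alpha)
    llr = 0.0
    for t in range(1, len(path)):
        i, j = path[t - 1], path[t]
        llr += math.log(q[i][j] / p[i][j])
        if llr >= threshold:
            return t
    return None

P = [[0.5, 0.5], [0.5, 0.5]]   # null: memoryless coin flips
Q = [[0.9, 0.1], [0.1, 0.9]]   # alternative: sticky chain
print(markov_llr_stopping_time([0, 0, 0, 0], P, Q, 0.2))  # 3
```

The expected stopping time of such a test under the alternative scales like $\log(1/\alpha)$ divided by an information rate. That rate depends on both the stationary distribution and the transition structure, matching the two ingredients named in the abstract.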
Quantifying error propagation and model collapse in diffusion models
Nail B. Khelifa, Richard E. Turner, Ramji Venkataramanan
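AI summary: Theoretically analyzes the error-propagation mechanism by which recursive training causes model collapse in score-based diffusion models. It gives upper and lower bounds on the accumulated divergence between the generated and target distributions and characterizes different drift regimes.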
Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. Notably, to the best of our knowledge, this is the first lower bound on the divergence between the learned and target distributions, even for standard diffusion models. Our results allow us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. In a certain regime, the accumulated divergence after several retraining rounds can be expressed as a discounted sum of score estimation errors made at each generation. We also provide empirical results on synthetic data and images to illustrate the theory.
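A "discounted sum of score estimation errors" can arise from a purely hypothetical linear recursion, which we introduce for illustration:

$D_n = \rho D_{n-1} + \varepsilon_n$

Here $\rho \in [0, 1)$ is an assumed contraction factor and $\varepsilon_n$ is the error added at generation $n$. Unrolling gives $D_n = \rho^n D_0 + \sum_{k=1}^n \rho^{n-k}\varepsilon_k$. This toy model is not the paper's bound. It only shows how older errors are discounted geometrically, and the code checks that iterating the recursion matches the closed form.

```python
def unrolled_divergence(errors, rho, d0=0.0):
    """Iterate the hypothetical recursion D_n = rho * D_{n-1} + eps_n."""
    d = d0
    for eps in errors:
        d = rho * d + eps
    return d

def discounted_sum(errors, rho, d0=0.0):
    """Closed form rho^n * D_0 + sum_k rho^(n-k) * eps_k of the same recursion."""
    n = len(errors)
    return rho ** n * d0 + sum(rho ** (n - k) * eps for k, eps in enumerate(errors, start=1))
```

With errors `[0.1, 0.2, 0.3]`, `rho = 0.5`, and `d0 = 1.0`, both functions give 0.55. The most recent error enters with weight 1, while the initial divergence has been shrunk to 0.125.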
The information geometry of softmax: probing and steering
Kiho Park, Todd Nief, Yo Joong Choe, Victor Veitch
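AI summary: Studies, from an information-geometric viewpoint, how AI systems encode semantic structure into the geometry of their representation spaces. It proposes "dual steering", a method that uses linear probes to robustly steer representations to exhibit a particular concept.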
This paper concerns the question of how AI systems encode semantic structure into the geometric structure of their representation spaces. The motivating observation is that the natural geometry of these representation spaces should reflect the way models use representations to produce behavior. We focus on the important special case of representations that define softmax distributions. In this case, we argue that the natural geometry is information geometry. Our focus is on the role of information geometry in semantic encoding and in the linear representation hypothesis. As an illustrative application, we develop "dual steering", a method for robustly steering representations to exhibit a particular concept using linear probes. We prove that dual steering optimally modifies the target concept while minimizing changes to off-target concepts. Empirically, we find that dual steering enhances the controllability and stability of concept manipulation.
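A standard concrete instance of information geometry for softmax is the Fisher metric of a categorical distribution $p = \mathrm{softmax}(z)$ with respect to the logits $z$. Since $\nabla_z \log p_y = e_y - p$, the metric is $F = \mathrm{diag}(p) - p p^\top$. Its null direction, the all-ones vector, reflects that shifting all logits leaves the distribution unchanged. The sketch computes this textbook object only. Which space the paper places the metric on, and how dual steering uses it, is not reproduced here.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def softmax_fisher(z):
    """Fisher information of Categorical(softmax(z)) w.r.t. the logits: diag(p) - p p^T."""
    p = softmax(z)
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)] for i in range(n)]

print(softmax_fisher([0.0, 0.0]))  # [[0.25, -0.25], [-0.25, 0.25]]
```

Distances measured with $F$ weight logit changes by how much they actually change the output distribution, unlike the Euclidean metric on logits.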
Is memorization helpful or harmful? Prior information sets the threshold
Chen Cheng, Rina Foygel Barber
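AI summary: In an overparameterized linear model within a Bayesian framework, studies how the prior distribution determines the relationship between training error and generalization error, and gives conditions under which memorization is necessary or overfitting is harmful.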
We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We find determining factors inherent to the prior distribution $π$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $π$.
Training generalizable policies with PAC-Bayesian reinforcement learning
Abdelkrim Zitouni, Mehdi Hennequin, Juba Agoun, Ryan Horache, Nadia Kabachi, Omar Rivasplata
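AI summary: Proposes a new PAC-Bayesian generalization bound that explicitly accounts for Markov dependence in the data through the chain's mixing time. Building on it, the PB-SAC algorithm optimizes the bound to guide exploration, and on continuous control tasks it provides meaningful confidence certificates while maintaining competitive performance.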
We derive a novel PAC-Bayesian generalization bound for reinforcement learning that explicitly accounts for Markov dependencies in the data, through the chain's mixing time. This contributes to overcoming challenges in obtaining generalization guarantees for reinforcement learning, where the sequential nature of data breaks the independence assumptions underlying classical bounds. The new bound provides non-vacuous certificates for modern off-policy algorithms such as Soft Actor-Critic. We demonstrate the practical utility of the bound through PB-SAC, a novel algorithm that optimizes the bound during training to guide exploration. Experiments across several continuous control tasks show that the proposed approach provides meaningful confidence certificates while maintaining competitive performance.
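For reference, here is the classical i.i.d. PAC-Bayesian bound that the new bound generalizes, in one common McAllester-Maurer form for losses in $[0, 1]$. With probability at least $1-\delta$, for all posteriors $Q$:

$R(Q) \le \hat R(Q) + \sqrt{(\mathrm{KL}(Q\|P) + \ln(2\sqrt n/\delta)) / (2n)}$

The sketch evaluates this bound with diagonal-Gaussian prior and posterior. It does not include the paper's mixing-time correction for Markov data, which is precisely what the i.i.d. bound lacks.

```python
import math

def kl_diag_gauss(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) between diagonal Gaussians given means and variances."""
    return 0.5 * sum(vq / vp + (mq - mp) ** 2 / vp - 1.0 + math.log(vp / vq)
                     for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p))

def pac_bayes_bound(emp_risk, kl, n, delta):
    """Classical (i.i.d.) McAllester-Maurer-style bound for [0, 1]-valued losses."""
    return emp_risk + math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))

print(pac_bayes_bound(0.1, 5.0, 10_000, 0.05))  # about 0.126
```

A certificate is "non-vacuous" when this kind of bound stays meaningfully below the trivial value 1.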
Parameter-free dynamic regret: time-varying movement costs, delayed feedback, and memory
Hao Qiu, Andrew Jacobsen, Emmanuel Esposito, Mengxiao Zhang
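AI summary: Proposes a new algorithm that achieves the first comparator-adaptive dynamic regret bound for online convex optimization with time-varying movement costs, and applies it to delayed feedback and time-varying memory.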
In this paper, we study dynamic regret in unconstrained online convex optimization (OCO) with movement costs. Specifically, we generalize the standard setting by allowing the movement cost coefficients $λ_t$ to vary arbitrarily over time. Our main contribution is a novel algorithm that establishes the first comparator-adaptive dynamic regret bound for this setting, guaranteeing $\widetilde{\mathcal{O}}(\sqrt{(M^2+MP_T)(T+\sum_t λ_t)})$ regret, where $P_T$ is the path length of the comparator sequence over $T$ rounds and $M$ is the maximal comparator norm. Our result recovers the optimal adaptive rates for both static and dynamic regret in OCO as the special case where $λ_t=0$ for all rounds. To demonstrate the versatility of our results, we consider two applications: OCO with delayed feedback and OCO with time-varying memory. We show that both problems can be translated into time-varying movement costs, establishing a novel reduction specifically for the delayed feedback setting that is of independent interest. A crucial observation is that the first-order dependence on movement costs in our regret bound plays a key role in enabling optimal comparator-adaptive dynamic regret guarantees in both settings.
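The quantities in the bound can be made concrete in one dimension with linear losses $g_t x_t$:
- the learner pays loss plus movement cost, $\sum_t g_t x_t + \lambda_t |x_t - x_{t-1}|$;
- the comparator sequence $u_t$ has path length $P_T = \sum_t |u_t - u_{t-1}|$.

Whether the comparator is also charged movement cost is a convention we do not know from the abstract. Here we assume it is not, so this sketch is illustrative only.

```python
def learner_cost(xs, grads, lambdas, x0=0.0):
    """Linear losses sum_t g_t * x_t plus movement costs sum_t lambda_t * |x_t - x_{t-1}|."""
    cost, prev = 0.0, x0
    for x, g, lam in zip(xs, grads, lambdas):
        cost += g * x + lam * abs(x - prev)
        prev = x
    return cost

def comparator_loss(us, grads):
    """Comparator's linear losses (assumed here not to pay movement cost)."""
    return sum(g * u for g, u in zip(grads, us))

def path_length(us):
    """P_T = sum_t |u_t - u_{t-1}|."""
    return sum(abs(b - a) for a, b in zip(us, us[1:]))

def dynamic_regret(xs, us, grads, lambdas, x0=0.0):
    return learner_cost(xs, grads, lambdas, x0) - comparator_loss(us, grads)
```

Setting every `lambdas[t] = 0` recovers plain OCO dynamic regret, which matches the special case mentioned in the abstract.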
Forecasting with Hyper-Trees
Alexander März, Kashif Rasul
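AI summary: Proposes the Hyper-Tree framework, which uses gradient boosted trees to learn the parameters of a target time series model (such as ARIMA or exponential smoothing), combining decision trees with classical forecasting models. A hybrid architecture addresses the scaling limitations of estimating high-dimensional parameter sets.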
We introduce Hyper-Trees as a novel framework for modeling time series data using gradient boosted trees. Unlike conventional tree-based approaches that forecast time series directly, Hyper-Trees learn the parameters of a target time series model, such as ARIMA or Exponential Smoothing, as functions of features. These parameters are then used by the target model to generate the final forecasts. Our framework combines the effectiveness of decision trees on tabular data with classical forecasting models, thereby inducing a time series inductive bias into tree-based models. To resolve the scaling limitations of boosted trees when estimating a high-dimensional set of target model parameters, we combine decision trees and neural networks within a unified framework. In this hybrid approach, the trees generate informative representations from the input features, which a shallow network then uses as input to learn the parameters of a time series model. With our research, we explore the effectiveness of Hyper-Trees across a range of forecasting tasks and extend tree-based modeling beyond its conventional use in time series analysis.
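The division of labour described above is: features are mapped to model parameters, and the target model then produces the forecast. Simple exponential smoothing (SES) with a feature-dependent smoothing parameter $\alpha$ shows this. In the sketch, `alpha_from_features` is a hypothetical logistic stand-in for the boosted trees, which in Hyper-Trees would be trained on the forecasting loss. Only the SES recursion is standard.

```python
import math

def alpha_from_features(features, weights, bias):
    """Hypothetical stand-in for the tree ensemble: maps features to alpha in (0, 1)."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def ses_forecast(y, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast (the final level)."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1.0 - alpha) * level
    return level

alpha = alpha_from_features([0.0], [1.0], 0.0)    # 0.5
print(ses_forecast([10.0, 12.0, 11.0], alpha))    # 11.0
```

Because the forecast is produced by the classical model, the trees only need to learn a low-dimensional parameter surface. This is where the time series inductive bias comes from.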
Understanding grokking: provable grokking in ridge regression
Mingyue Xu, Gal Vardi, Itay Safran
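AI summary: Studies grokking in a classical ridge regression setting. It proves that learning an overparameterized linear regression model with gradient descent and weight decay passes through three stages: overfitting, delayed generalization, and eventually arbitrarily small generalization error. It gives the first rigorous quantitative bounds on the generalization delay (the grokking time) and shows experimentally that the bounds also apply to nonlinear neural networks.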
We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the training data early during training; (ii) poor generalization persists long after overfitting has manifested; and (iii) the generalization error eventually becomes arbitrarily small. Moreover, we show, both theoretically and empirically, that grokking can be amplified or eliminated in a principled manner through proper hyperparameter tuning. To the best of our knowledge, these are the first rigorous quantitative bounds on the generalization delay (which we refer to as the "grokking time") in terms of training hyperparameters. Lastly, going beyond the linear setting, we empirically demonstrate that our quantitative bounds also capture the behavior of grokking on non-linear neural networks. Our results suggest that grokking is not an inherent failure mode of deep learning, but rather a consequence of specific training conditions, and thus does not require fundamental changes to the model architecture or learning algorithm to avoid.
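A two-parameter toy model, our own construction rather than the paper's setting or bound, shows where the delay comes from. Take one training point $x = (1, 0)$, $y = 1$, teacher $w^* = (1, 0)$, and an initialization with a large component in the direction the data cannot see.

Under gradient descent with weight decay, the fitted coordinate contracts quickly, so training error vanishes early. The null-direction coordinate feels only weight decay and shrinks like $(1 - \eta\lambda)^t$. The test error $\|w - w^*\|^2$ therefore stays large until roughly $\log(w_{2,0}^2/\varepsilon) / (2|\log(1-\eta\lambda)|)$ steps.

```python
import math

def simulate(eta, lam, steps, w0=(0.0, 5.0), y=1.0):
    """GD with weight decay on 0.5*(w1 - y)^2 + 0.5*lam*||w||^2; returns (train, test) per step."""
    w1, w2 = w0
    history = []
    for _ in range(steps):
        w1 -= eta * ((w1 - y) + lam * w1)   # data direction: fast contraction
        w2 *= 1.0 - eta * lam               # null direction: only weight decay acts
        train = (w1 - y) ** 2
        test = (w1 - 1.0) ** 2 + w2 ** 2    # ||w - w*||^2 with w* = (1, 0)
        history.append((train, test))
    return history

def predicted_delay(eta, lam, w2_init, eps):
    """Steps until the null-direction contribution w2^2 falls to eps or below."""
    return math.ceil(math.log(eps / w2_init ** 2) / (2.0 * math.log(1.0 - eta * lam)))

h = simulate(0.5, 0.01, 800)
# After 50 steps: train error ~1e-4, test error still ~15; test error drops near step 781.
```

In this toy model, increasing $\eta\lambda$ shortens the delay and a small initial null component removes it. That echoes, in miniature, the abstract's point that grokking can be amplified or eliminated through hyperparameters.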
A traveler's guide to Poisson gradient estimation
Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L. Yates, Chengrui Li, Hadi Vafaii
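AI summary: Systematically compares exponential arrival time simulation and Gumbel-SoftMax relaxation, proposes a modified EAT method with reduced bias, and validates its superior performance on Poisson latent variable models.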
Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: *Exponential Arrival Time* (EAT) simulation and *Gumbel-SoftMax* (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. These results extend to over-dispersed Negative Binomial latents, where modified EAT again performs best. However, only GSM generalizes to arbitrary non-Poisson distributions, including the under-dispersed regime. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
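The basic (unmodified) EAT idea works as follows. A Poisson($\lambda$) count is the number of arrivals of a rate-$\lambda$ Poisson process in $[0, 1]$. The arrival times can be reparameterized as $t_k = \sum_{j \le k} E_j/\lambda$ with standard exponential $E_j$, so they are differentiable in $\lambda$. Replacing the hard indicator $1\{t_k \le 1\}$ with $\sigma((1 - t_k)/\tau)$ gives a relaxed count. This sketch shows only that baseline. The paper's modification that makes the first moment exactly unbiased is not reproduced, and the temperature `tau` is a free assumption.

```python
import math
import random

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def arrival_times(rate, horizon, rng):
    """Reparameterized arrival times t_k = sum_j E_j / rate, up to the first one past horizon."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(1.0) / rate
        times.append(t)
        if t > horizon:
            return times

def hard_count(times, horizon=1.0):
    """Exact Poisson count: arrivals inside [0, horizon]."""
    return sum(1 for t in times if t <= horizon)

def soft_count(times, tau, horizon=1.0):
    """Relaxed count: each arrival contributes sigmoid((horizon - t) / tau)."""
    return sum(sigmoid((horizon - t) / tau) for t in times)
```

Small `tau` keeps the relaxed count close to the true integer count but yields high-variance gradients. Large `tau` smooths gradients at the cost of bias, which is the trade-off the modified estimator targets.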
An odd estimator for Shapley values
Fabian Fumagalli, Landon Butler, Justin Singh Kang, Kannan Ramchandran, R. Teal Witter
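AI summary: Proves that Shapley values depend only on the odd component of the set function. Building on this, it proposes the OddSHAP estimator, which performs polynomial regression on the odd subspace for efficient approximation and achieves state-of-the-art accuracy at larger sampling budgets.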
The Shapley value is a ubiquitous framework for attribution in machine learning, encompassing feature importance, data valuation, and causal inference. However, its exact computation is generally intractable, necessitating efficient approximation methods. While the most effective and popular estimators leverage the paired sampling heuristic to reduce estimation error, the theoretical mechanism driving this improvement has remained opaque. In this work, we provide an elegant and fundamental justification for paired sampling: we prove that the Shapley value depends exclusively on the odd component of the set function, and that paired sampling orthogonalizes the regression objective to filter out the irrelevant even component. Leveraging this insight, we propose OddSHAP, a novel consistent estimator that performs polynomial regression solely on the odd subspace. By utilizing the Fourier basis to isolate this subspace and employing a proxy model to identify high-impact interactions, OddSHAP overcomes the combinatorial explosion of higher-order approximations. Through an extensive benchmark, we find that OddSHAP achieves state-of-the-art estimation accuracy at larger sampling budgets.
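The claim that only the odd component matters can be checked by brute force on a small game. We take the odd component to be $v_{\text{odd}}(S) = (v(S) - v(N\setminus S))/2$. This is our reading, and it matches the odd-degree part of the $\pm 1$ Fourier expansion, since complementing $S$ flips every coordinate. The check works because the Shapley value of the dual game $v^*(S) = v(N) - v(N\setminus S)$ equals that of $v$, and constant games contribute nothing. The code is an exact-enumeration check, not OddSHAP.

```python
from itertools import combinations
from math import factorial

def shapley(v, n):
    """Exact Shapley values of set function v (taking a frozenset) on players 0..n-1."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = frozenset(s)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi

def odd_component(v, n):
    """v_odd(S) = (v(S) - v(complement of S)) / 2."""
    full = frozenset(range(n))
    return lambda s: 0.5 * (v(s) - v(full - s))

def game(s):
    """A made-up 3-player game with a pairwise interaction."""
    return len(s) ** 2 + (3.0 if {0, 1} <= s else 0.0) + (1.0 if 2 in s else 0.0)

phi = shapley(game, 3)
phi_odd = shapley(odd_component(game, 3), 3)  # equal to phi up to rounding
```

Paired sampling evaluates $v$ on both $S$ and $N \setminus S$, which is exactly the pair needed to form $v_{\text{odd}}(S)$. This gives an intuition for why it helps.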
SCORE: a unified framework for overshoot refunds in online FDR control
Qi Kuang, Bowen Gang, Yin Xia
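AI summary: Proposes the SCORE framework, which uses an inequality to reclaim evidence exceeding the rejection threshold. This increases the statistical power of e-value-based online multiple hypothesis testing while rigorously preserving FDR control.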
We propose a unified framework to enhance the power of online multiple hypothesis testing procedures based on $e$-values. While $e$-value-based methods offer robust online False Discovery Rate (FDR) control under minimal assumptions, they often suffer from power loss by discarding evidence that exceeds the rejection threshold. We address this inefficiency via the Sequential Control with Overshoot Refund for E-values (SCORE) framework, which leverages the inequality $\mathbb{I}(y \ge 1) \le y - (y-1)_+$, valid for all $y\ge 0$, to reclaim this otherwise wasted evidence. This simple yet powerful insight yields a unified principle for improving a broad class of online testing algorithms. Building on this framework, we develop SCORE-enhanced versions of several state-of-the-art procedures, including SCORE-LOND, SCORE-LORD, and SCORE-SAFFRON, all of which strictly dominate their original counterparts while preserving valid finite-sample FDR control. Furthermore, under mild assumptions, SCORE permits retroactive updates of alpha-wealth by using the latest decision twice: first to determine its reward or loss, and then to refresh past wealth. Such a mechanism enables more aggressive testing strategies while maintaining valid FDR control, thereby further improving statistical power. The effectiveness of the proposed methods is validated through extensive simulation and real-data experiments.
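The key inequality $\mathbb{I}(y \ge 1) \le y - (y-1)_+$ for $y \ge 0$ holds case by case. Below 1 the left side is 0 and the right side is $y \ge 0$; at or above 1 both sides equal 1. In e-value testing one typically rejects when $e_t \ge 1/\alpha_t$, that is, when $y = \alpha_t e_t \ge 1$. The plain bound $\mathbb{I}(y\ge 1) \le y$ then charges the full $y$, while the refined bound refunds the overshoot $(y-1)_+$. This reading of $y$ is our assumption, and the sketch does not implement SCORE's wealth updates.

```python
def overshoot_refund(y):
    """For y >= 0, return (indicator, refined bound, refund) with 1{y >= 1} <= y - (y - 1)_+."""
    if y < 0:
        raise ValueError("y must be nonnegative")
    refund = max(y - 1.0, 0.0)
    return (1.0 if y >= 1.0 else 0.0), y - refund, refund

# e.g. alpha_t = 0.05 and e_t = 60 give y = 3: reject, and 2 units of overshoot are refunded
print(overshoot_refund(0.05 * 60))
```

Without the refund, a very large e-value would consume far more than the one unit needed for a rejection.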
Physics-informed, boundary-constrained Gaussian process regression for flow field reconstruction
Adrian Padilla-Segarra, Pascal Noble, Olivier Roustant, Éric Savin
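AI summary: Proposes a general method that uses boundary constraints and physics-informed kernels to reconstruct incompressible flow fields without any boundary observations.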
Gaussian process regression techniques have been used in fluid mechanics for the reconstruction of flow fields from a dimension-reduction perspective. A main ingredient in this setting is the construction of adapted covariance functions, or kernels, to obtain such estimates. In this paper, we present a general method for constraining a prescribed Gaussian process on an arbitrary compact set. The kernel of the pre-defined process must be at least continuous and may include other information about the studied phenomenon. This general boundary-constraining framework can be implemented with high flexibility for a broad range of engineering applications. From this, we derive physics-informed kernels for simulating two-dimensional velocity fields of an incompressible (divergence-free) flow around aerodynamic profiles. These kernels allow one to define Gaussian process priors that satisfy the incompressibility condition and the prescribed boundary conditions along the profile in a continuous manner. We describe an adapted numerical method for the boundary-constraining procedure, parameterized by a measure on the compact set. The relevance and performance of the methodology are illustrated by numerical simulations of flows around a cylinder and a NACA 0412 airfoil profile, for which no observations at the boundary are needed.
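A discretized version of boundary constraining is a GP conditioned to vanish at finitely many boundary points $B$. Its kernel is:

$k_c(x, x') = k(x, x') - k(x, B)\,K_{BB}^{-1}\,k(B, x')$

The paper instead parameterizes the constraint by a measure on the compact set and builds divergence-free kernels. Neither is reproduced here. The sketch is one-dimensional with an RBF kernel, and the length scale and jitter are assumptions.

```python
import math

def rbf(x, y, ell=0.3):
    """Squared-exponential kernel (assumed length scale ell)."""
    return math.exp(-0.5 * ((x - y) / ell) ** 2)

def solve(a, b):
    """Solve a small dense system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def constrained_kernel(k, boundary, jitter=1e-10):
    """Kernel of the GP with covariance k conditioned to equal zero at the boundary points."""
    kbb = [[k(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(boundary)]
           for i, a in enumerate(boundary)]
    def kc(x, y):
        kxb = [k(x, b) for b in boundary]
        kby = [k(b, y) for b in boundary]
        return k(x, y) - sum(u * v for u, v in zip(kxb, solve(kbb, kby)))
    return kc

kc = constrained_kernel(rbf, [0.0, 1.0])  # prior variance now ~0 at x = 0 and x = 1
```

Samples from the constrained prior satisfy the boundary condition by construction. Consequently no boundary observations are needed, which mirrors the cylinder and airfoil experiments described above.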
Efficient concentration inequalities via Gaussian approximation
Morgane Austern, Lester Mackey
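AI summary: For bounded random variables, proposes computable concentration inequalities with asymptotically optimal width, finite-sample validity, and sub-Gaussian decay. They are obtained by tightly bounding non-uniform Kolmogorov and Wasserstein distances via zero-bias couplings and Stein's method of exchangeable pairs, and are used to construct efficient confidence intervals and empirical Berry-Esseen bounds.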
Concentration inequalities for the sample mean, like those due to Bernstein, Hoeffding, and Bentkus, are valid for any sample size but overly conservative, yielding confidence intervals that are unnecessarily wide. The central limit theorem (CLT) provides asymptotic confidence intervals with optimal width, but these are not guaranteed to be valid at any finite sample size. To resolve this tension, we develop new computable concentration inequalities for bounded variables with asymptotically optimal size, finite-sample validity, and sub-Gaussian decay. These bounds enable the construction of efficient confidence intervals with correct coverage for any sample size and efficient empirical Berry-Esseen bounds that require no prior knowledge of the population variance. We derive our inequalities by tightly bounding non-uniform Kolmogorov and Wasserstein distances to a Gaussian using zero-bias couplings and Stein's method of exchangeable pairs, and we demonstrate practical improvements over the Bernstein, Hoeffding, Bentkus, Berry-Esseen, Feller-Cramér, Romano-Wolf, empirical Bernstein, empirical Bentkus, and coin-betting inequalities.
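The tension described above can be quantified for variables in $[0, 1]$ by comparing two half-widths:
- Hoeffding (finite-sample valid): $\sqrt{\ln(2/\delta)/(2n)}$, which ignores the variance;
- CLT (asymptotic only): $z_{1-\delta/2}\,\sigma/\sqrt n$.

The sketch uses Python's `statistics.NormalDist` for the Gaussian quantile and does not implement the paper's new inequalities.

```python
import math
from statistics import NormalDist

def hoeffding_halfwidth(n, delta):
    """Finite-sample valid half-width for the mean of [0, 1]-bounded variables."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def clt_halfwidth(n, delta, sigma):
    """Asymptotic CLT half-width; coverage is not guaranteed at finite n."""
    return NormalDist().inv_cdf(1.0 - delta / 2.0) * sigma / math.sqrt(n)

print(hoeffding_halfwidth(100, 0.05))   # about 0.136
print(clt_halfwidth(100, 0.05, 0.1))    # about 0.0196
```

With $n = 100$, $\delta = 0.05$, and $\sigma = 0.1$, the valid Hoeffding interval is roughly seven times wider than the CLT interval. Bounds with asymptotically optimal size aim to close exactly this gap without giving up finite-sample validity.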
Conditional coverage diagnostics for conformal prediction
Sacha Braun, David Holzmüller, Michael I. Jordan, Francis Bach
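AI summary: Proposes casting conditional coverage estimation as a classification problem and using an excess-risk measure (ERT) to diagnose deviations from conditional coverage in conformal prediction. Experiments show that modern classifiers give higher statistical power than traditional metrics.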
Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal coverage, no method can guarantee sets with correct conditional coverage, leaving practitioners without a clear way to interpret local deviations. To overcome sample-inefficiency and overfitting issues of existing metrics, we cast conditional coverage estimation as a classification problem. Conditional coverage is violated if and only if some classifier can achieve lower risk than the target coverage. Through the choice of a (proper) loss function, the resulting risk difference gives a conservative estimate of natural miscoverage measures such as L1 and L2 distance, and can even separate the effects of over- and under-coverage, and non-constant target coverages. We call the resulting family of metrics excess risk of the target coverage (ERT). We show experimentally that the use of modern classifiers provides much higher statistical power than simple classifiers underlying established metrics like CovGap. Additionally, we use our metric to benchmark different conformal prediction methods. Finally, we release an open-source package for ERT as well as previous conditional coverage metrics. Together, these contributions provide a new lens for understanding, diagnosing, and improving the conditional reliability of predictive systems.
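Under squared (Brier) loss, the population excess risk of the constant predictor 1 - alpha over the Bayes-optimal classifier equals the mean squared gap between the conditional coverage P(covered | x) and 1 - alpha. The sketch below estimates that quantity with a histogram classifier fitted on one half of the data and evaluated on the other. The binning classifier, the one-dimensional feature, and the function names are illustrative assumptions, not the API of the authors' package.

```python
import random


def l2_ert(train, test, target, n_bins=10):
    """Estimate the L2 excess risk of the target coverage (ERT) on `test`.

    train, test: lists of (x, z) with x in [0, 1) and z = 1 if the prediction
    set covered the label. A histogram classifier fitted on `train` estimates
    P(z = 1 | x); the ERT is the squared-loss risk of the constant `target`
    minus the squared-loss risk of that classifier.
    """
    hits, counts = [0] * n_bins, [0] * n_bins
    for x, z in train:
        b = min(int(x * n_bins), n_bins - 1)
        hits[b] += z
        counts[b] += 1
    # Empty bins fall back to the target coverage.
    p_hat = [hits[b] / counts[b] if counts[b] else target for b in range(n_bins)]

    excess = 0.0
    for x, z in test:
        p = p_hat[min(int(x * n_bins), n_bins - 1)]
        excess += (z - target) ** 2 - (z - p) ** 2
    return excess / len(test)


def simulate(coverage_fn, n, rng):
    """Draw (x, z) pairs whose conditional coverage is coverage_fn(x)."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 1 if rng.random() < coverage_fn(x) else 0))
    return data


def uneven(x):
    return 0.95 if x < 0.5 else 0.75  # marginal coverage 0.85


def flat(x):
    return 0.85


rng = random.Random(0)
print(l2_ert(simulate(uneven, 10000, rng), simulate(uneven, 10000, rng), 0.85))
print(l2_ert(simulate(flat, 10000, rng), simulate(flat, 10000, rng), 0.85))
```

Both simulated procedures have marginal coverage 0.85, but only the first violates conditional coverage. Only its ERT is clearly positive; its population value is (0.95 - 0.85)^2 = 0.01.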
A distribution testing approach to distribution clustering
Gunjan Kumar, Yash Pote, Jonathan Scarlett
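AI summary: Studies partitioning k distributions into two groups, where the distributions within each group are identical and the two groups' distributions are at least ε apart in total variation distance. Gives upper and lower bounds on the sample complexity via distribution testing methods.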
We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters are $\varepsilon$-far in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when the distribution of one of the clusters is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size $n$, number of distributions $k$, size $r$ of one of the clusters, and distance $\varepsilon$. In particular, we achieve tightness with respect to $(n,k,r,\varepsilon)$ (up to an $O(\log k)$ factor) for all regimes.
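To make the setting concrete, here is a naive plug-in rule for case (1), where one cluster's distribution p is known. Each of the k distributions is labelled by thresholding its empirical total variation distance to p at eps/2. This rule is an illustrative assumption, not the paper's algorithm. Such a plug-in rule is generally far from sample-optimal in the domain size n.

```python
import random


def empirical_tv(samples, p):
    """Total variation distance between the empirical law of `samples` and p."""
    counts = [0] * len(p)
    for s in samples:
        counts[s] += 1
    m = len(samples)
    return 0.5 * sum(abs(c / m - pi) for c, pi in zip(counts, p))


def cluster_with_known(sample_sets, p_known, eps):
    """Label 0 if a distribution looks equal to p_known, 1 if it looks eps-far."""
    return [0 if empirical_tv(s, p_known) < eps / 2 else 1 for s in sample_sets]


rng = random.Random(0)
p = [0.25, 0.25, 0.25, 0.25]  # known cluster distribution
q = [0.4, 0.4, 0.1, 0.1]      # other cluster, TV(p, q) = 0.3
hidden = [0, 1, 1, 0, 1, 0, 0, 1]
sample_sets = [rng.choices(range(4), weights=q if h else p, k=2000) for h in hidden]
print(cluster_with_known(sample_sets, p, eps=0.3))
```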
Outcome-aware spectral feature learning for instrumental variable regression
Dimitri Meunier, Jakub Wornbard, Vladimir R. Kostic, Antoine Moulin, Alek Fröhlich, Karim Lounici, Massimiliano Pontil, Arthur Gretton
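AI summary: For nonparametric instrumental variable regression with hidden confounders, proposes learning outcome-aware spectral features by minimizing a contrastive loss built on an augmented operator. This mitigates the poor representation of the causal function caused by spectral misalignment.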
We address the problem of causal effect estimation in the presence of hidden confounders using nonparametric instrumental variable (IV) regression. An established approach is to use estimators based on learned spectral features, that is, features spanning the top singular subspaces of the operator linking treatments to instruments. While powerful, such features are agnostic to the outcome variable. Consequently, the method can fail when the true causal function is poorly represented by these dominant singular functions. To mitigate this, we introduce Augmented Spectral Feature Learning, a framework that makes the feature learning process outcome-aware. Our method learns features by minimizing a novel contrastive loss derived from an augmented operator that incorporates information from the outcome. By learning these task-specific features, our approach remains effective even under spectral misalignment. We provide a theoretical analysis of this framework and validate our approach on challenging benchmarks.
Deterministic inference across tensor parallel sizes that eliminates training-inference mismatch
Ziyang Zhang, Xinheng Ding, Jiayi Yuan, Rixin Liu, Huizi Mao, Jiarong Xing, Zirui Liu
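AI summary: Addresses inference nondeterminism caused by floating-point non-associativity under different tensor parallel (TP) sizes. Proposes Tree-Based Invariant Kernels (TBIK), which give bit-wise identical results across TP sizes and eliminate the precision mismatch between the inference and training engines in RL training.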
Deterministic inference is increasingly critical for large language model (LLM) applications such as LLM-as-a-judge evaluation, multi-agent systems, and Reinforcement Learning (RL). However, existing LLM serving frameworks exhibit non-deterministic behavior: identical inputs can yield different outputs when system configurations (e.g., tensor parallel (TP) size, batch size) vary, even under greedy decoding. This arises from the non-associativity of floating-point arithmetic and inconsistent reduction orders across GPUs. While prior work has addressed batch-size-related nondeterminism through batch-invariant kernels, determinism across different TP sizes remains an open problem, particularly in RL settings, where the training engine typically uses Fully Sharded Data Parallel (i.e., TP = 1) while the rollout engine relies on multi-GPU TP to maximize the inference throughput, creating a natural mismatch between the two. This precision mismatch problem may lead to suboptimal performance or even collapse for RL training. We identify and analyze the root causes of TP-induced inconsistency and propose Tree-Based Invariant Kernels (TBIK), a set of TP-invariant matrix multiplication and reduction primitives that guarantee bit-wise identical results regardless of TP size. Our key insight is to align intra- and inter-GPU reduction orders through a unified hierarchical binary tree structure. We implement these kernels in Triton and integrate them into vLLM and FSDP. Experiments confirm zero probability divergence and bit-wise reproducibility for deterministic inference across different TP sizes. Also, we achieve bit-wise identical results between vLLM and FSDP in RL training pipelines with different parallel strategies. Code is available at https://github.com/nanomaoli/llm_reproducibility.
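The non-associativity issue and the tree-based remedy can be illustrated with scalar sums, using Python floats in place of GPU accumulators. In the sketch below, splitting a reduction across tp shards and summing each shard sequentially can change the result. A fixed pairwise tree instead gives bit-identical sums for every tp, because its subtrees coincide with the shards whenever the length and tp are powers of two. This is a toy model under those assumptions, not the Triton kernels of TBIK.

```python
import random


def sequential_sum(xs):
    """Left-to-right float accumulation, as a single accumulator would do."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc


def tree_sum(xs):
    """Pairwise binary-tree reduction; len(xs) must be a power of two."""
    level = list(xs)
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def naive_tp_sum(xs, tp):
    """Each of tp shards accumulates sequentially; partials are then summed."""
    block = len(xs) // tp
    partials = [sequential_sum(xs[i * block:(i + 1) * block]) for i in range(tp)]
    return sequential_sum(partials)


def tree_tp_sum(xs, tp):
    """Shards reduce aligned subtrees; the top of the same tree combines them."""
    block = len(xs) // tp
    partials = [tree_sum(xs[i * block:(i + 1) * block]) for i in range(tp)]
    return tree_sum(partials)


xs = [1e16, 1.0, -1e16, 1.0]
print([naive_tp_sum(xs, tp) for tp in (1, 2)])    # [1.0, 0.0]
print([tree_tp_sum(xs, tp) for tp in (1, 2, 4)])  # [0.0, 0.0, 0.0]

rng = random.Random(0)
ys = [rng.uniform(-1, 1) * 10 ** rng.randint(-8, 8) for _ in range(1024)]
print({tree_tp_sum(ys, tp) for tp in (1, 2, 4, 8, 16)})  # a single value
```

Under naive accumulation, the vector [1e16, 1.0, -1e16, 1.0] sums to 1.0 with one shard and to 0.0 with two, because 1e16 + 1.0 rounds back to 1e16 in double precision. The tree gives 0.0 for every tp. The point is reproducibility, not a more accurate sum.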
Expert merging in sparse mixture of experts via Nash bargaining
Dung V. Nguyen, Anh T. Nguyen, Minh H. Nguyen, Luc Q. Nguyen, Shiqi Jiang, Ethan Fetaya, Linh Duy Tran, Gal Chechik, Tan M. Nguyen
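AI summary: Addresses the lack of a principled weighting mechanism for expert merging in sparse mixture-of-experts models. Proposes NAMEx, a framework based on Nash bargaining that enables more balanced and efficient collaboration among experts and outperforms existing methods on multiple tasks.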
Existing expert merging strategies for Sparse Mixture of Experts (SMoE) typically rely on input-dependent or input-independent averaging of expert parameters, but often lack a principled weighting mechanism. In this work, we reinterpret expert merging through the lens of game theory, revealing cooperative and competitive dynamics among experts. Based on this perspective, we introduce Nash Merging of Experts (NAMEx), a novel framework that incorporates Nash Bargaining into the merging process, enabling more balanced and efficient collaboration among experts. Additionally, we incorporate complex momentum into NAMEx to accelerate expert propagation with theoretical guarantees for convergence. Extensive experiments across language modelling, text classification, image classification, and zero-shot robustness under data corruption show that NAMEx consistently outperforms competing methods while integrating seamlessly with popular MoE architectures. Finally, we demonstrate NAMEx's scalability by applying it to large-scale systems, including Qwen1.5-MoE (14B) and DeepSeek-MoE (16B), where it proves effective in both zero-shot and fine-tuning settings. The code is publicly available at: https://github.com/anh147/NAMEx.
Reinforced sequential Monte Carlo for amortised sampling
Sanghyeok Choi, Sarthak Mittal, Víctor Elvira, Jinkyoo Park, Esmeralda S. Whitammer
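AI summary: Proposes a sampling framework that combines amortised and particle-based methods. It trains sequential Monte Carlo samplers via maximum-entropy reinforcement learning and uses off-policy learning to explore the target distribution more efficiently. Improved approximation accuracy and training stability are demonstrated on synthetic multi-modal targets and on the Boltzmann distribution of alanine dipeptide conformations.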
This paper proposes a synergy of amortised and particle-based methods for sampling from distributions defined by unnormalised density functions. We state a connection between sequential Monte Carlo (SMC) and neural sequential samplers trained by maximum-entropy reinforcement learning (MaxEnt RL), wherein learnt sampling policies and value functions define proposal kernels and twist functions. Exploiting this connection, we introduce an off-policy RL training procedure for the sampler that uses samples from SMC -- using the learnt sampler as a proposal -- as a behaviour policy that better explores the target distribution. We describe techniques for stable joint training of proposals and twist functions and an adaptive weight tempering scheme to reduce training signal variance. Furthermore, building upon past attempts to use experience replay to guide the training of neural samplers, we derive a way to combine historical samples with annealed importance sampling weights within a replay buffer. On synthetic multi-modal targets (in both continuous and discrete spaces) and the Boltzmann distribution of alanine dipeptide conformations, we demonstrate improvements in approximating the true distribution as well as training stability compared to both amortised and Monte Carlo methods.
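The abstract mentions combining samples with annealed importance sampling (AIS) weights. For reference, a plain AIS estimator of a normalising constant along the geometric path pi_0^(1 - beta) gamma^beta looks roughly as follows. The sketch assumes:

- a standard-normal starting distribution;
- a bimodal toy target;
- a linear beta schedule;
- a single random-walk Metropolis move per temperature.

None of the paper's learnt proposals, twist functions, or RL training is included.

```python
import math
import random

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)


def log_prior(x):
    """Normalised N(0, 1) log-density: the tractable starting distribution."""
    return -0.5 * x * x - LOG_SQRT_2PI


def log_target(x):
    """Unnormalised bimodal target: two Gaussian bumps at -2 and 2, sd 0.5."""
    a = -0.5 * ((x + 2.0) / 0.5) ** 2
    b = -0.5 * ((x - 2.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ais_log_z(n_particles=1000, n_steps=200, step=0.5, seed=0):
    """Annealed importance sampling estimate of log Z along a geometric path."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # exact pi_0 draws
    lp = [log_prior(x) for x in xs]
    lt = [log_target(x) for x in xs]
    log_w = [0.0] * n_particles
    for k in range(1, n_steps + 1):
        beta, d_beta = k / n_steps, 1.0 / n_steps
        for i in range(n_particles):
            # Incremental weight for moving from beta - d_beta to beta.
            log_w[i] += d_beta * (lt[i] - lp[i])
            # One random-walk Metropolis move leaving pi_0^(1-beta) gamma^beta invariant.
            y = xs[i] + step * rng.gauss(0.0, 1.0)
            lp_y, lt_y = log_prior(y), log_target(y)
            log_ratio = (1 - beta) * (lp_y - lp[i]) + beta * (lt_y - lt[i])
            if math.log(1.0 - rng.random()) < log_ratio:
                xs[i], lp[i], lt[i] = y, lp_y, lt_y
    m = max(log_w)
    return m + math.log(sum(math.exp(w - m) for w in log_w) / n_particles)


print(ais_log_z(), 0.5 * math.log(2 * math.pi))  # estimate vs exact log Z
```

The toy target is a sum of two Gaussian bumps with standard deviation 0.5, so its normalising constant is sqrt(2 pi). The estimate can therefore be checked directly.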
Learning general causal structures with hidden dynamic processes for climate analysis
Minghao Fu, Biwei Huang, Zijian Li, Yujia Zheng, Ignavier Ng, Guangyi Chen, Yingyao Hu, Kun Zhang
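AI summary: Proposes CaDRe, a unified framework that jointly discovers causal relations among observed variables and the hidden dynamic process. The framework is identifiable in the nonparametric setting, and its effectiveness and interpretability are validated on climate data.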
Understanding climate dynamics requires going beyond correlations in observational data to uncover the underlying causal process. Latent drivers such as atmospheric processes play a central role in temporal dynamics, while direct causal influences also exist among geographically proximate observed variables. Traditional Causal Representation Learning (CRL) typically focuses on latent factors but overlooks such observable-to-observable causal relations, which limits its applicability to climate analysis. In this paper, we introduce a unified framework that jointly uncovers (i) causal relations among observed variables and (ii) latent driving forces together with their interactions. We establish conditions under which both the hidden dynamic process and the causal structure among observed variables are simultaneously identifiable from time-series data, and our guarantees continue to hold in the nonparametric setting through contextual information that recovers latent variables and causal relations. Building on these insights, we propose CaDRe (Causal Discovery and Representation learning), a time-series generative model with structural constraints that integrates CRL and causal discovery. Experiments on synthetic datasets validate our theoretical results. On real-world climate datasets, CaDRe delivers competitive forecasting accuracy and recovers visualized causal graphs aligned with domain expertise, thereby offering interpretable insights into climate systems. Code is available at https://github.com/MinghaoFu/CaDRe.
A technical note on sequential test-time adaptation via martingale-driven Fisher prompting
Behraj Khan, Tahir Qasim Syed
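AI summary: Proposes the M-FISHER framework, which detects distribution shift with an exponential martingale and achieves stable adaptation through Fisher-preconditioned updates. It provides time-uniform false-alarm control and a bound on the expected detection delay.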
We present a theoretical framework for M-FISHER, a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control, ensuring statistical validity at any stopping time. Under sustained shifts, we further bound the expected detection delay as $\mathcal{O}(\log(1/δ)/Γ)$, where $Γ$ reflects the post-shift information gain, thereby linking detection efficiency to distributional divergence. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold, yielding locally optimal updates that minimize KL divergence while preserving stability and parameterization invariance. Together, these results establish M-FISHER as a principled approach for robust, anytime-valid detection and geometrically stable adaptation in sequential decision-making under covariate shift.
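The detection part can be sketched with a simple exponential supermartingale built from conformal p-values. This stands in for the paper's construction from non-conformity scores, which the abstract does not spell out. The increments x_t = 1/2 - p_t lie in [-1/2, 1/2]. When the p-values are independent and (super-)uniform under the null, Hoeffding's lemma makes M_t = exp(lam * sum_s x_s - t lam^2 / 8) a nonnegative supermartingale. Raising an alarm when M_t >= 1/delta therefore has false-alarm probability at most delta, by Ville's inequality. The value of lam and this particular increment are assumptions.

```python
import math


def detect_shift(p_values, lam=0.5, delta=0.01):
    """Return the first index t at which M_t >= 1/delta, or None.

    Increments x_t = 1/2 - p_t lie in [-1/2, 1/2] and have mean <= 0 for
    (super-)uniform p-values, so by Hoeffding's lemma
    M_t = exp(lam * sum(x) - t * lam^2 / 8) is a nonnegative supermartingale;
    Ville's inequality gives P(sup_t M_t >= 1/delta) <= delta.
    """
    log_m = 0.0
    threshold = math.log(1.0 / delta)
    for t, p in enumerate(p_values):
        log_m += lam * (0.5 - p) - lam ** 2 / 8.0
        if log_m >= threshold:
            return t
    return None


stream = [0.5] * 50 + [0.01] * 50  # conforming, then persistently non-conforming
print(detect_shift(stream))        # 78
print(detect_shift([0.5] * 1000))  # None
```

Once the p-values drop to 0.01, log M_t grows by about 0.21 per step. The delay is therefore of order log(1/delta) divided by that per-step gain, which matches the shape of the O(log(1/δ)/Γ) bound. Here the alarm fires on the 29th observation after the shift, partly because the 50 pre-shift steps pushed log M_t below zero.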
Estimating the effects of longitudinal modified treatment policies (LMTPs) on rates of change in health outcomes
Anja Shahu, Weijie Xia, Ying Wei, Daniel Malinsky
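AI summary: Extends the longitudinal modified treatment policy (LMTP) methodology, using nonparametric efficient influence function (EIF)-based estimators, to estimate the effects of complex, exposure-dependent interventions on the rate of change of an outcome over time. It also builds a framework for simultaneous confidence intervals and hypothesis testing.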
Longitudinal data often contains outcomes measured at multiple visits, and scientific interest may lie in quantifying the effect of an intervention on an outcome's rate of change. For example, one may wish to study the progression (or trajectory) of a disease over time under different hypothetical interventions. We extend the longitudinal modified treatment policy (LMTP) methodology to estimate effects of complex, exposure-dependent interventions on rates of change in an outcome over time. We exploit the theoretical properties of a nonparametric efficient influence function (EIF)-based estimator to introduce a novel inference framework that can be used to construct simultaneous confidence intervals for a variety of causal effects of interest and to formally test relevant global and local hypotheses about rates of change. We demonstrate the utility of our framework in investigating whether a longitudinal shift intervention affects an outcome's counterfactual trajectory, as compared with no intervention. We present results from a simulation study to illustrate the performance of our inference framework in a longitudinal setting with time-varying confounding and a continuous exposure. We also apply our inference framework to the Columbia Brain Health DataBank (CBDB) to examine the effect of shifting blood pressure on the progression of dementia.
Dynamic local Fréchet curve regression on manifolds
M. D. Ruiz-Medina, A. Torres-Signes
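AI summary: Derives a least-squares local linear Fréchet curve predictor for response and regressor in a separable Hilbert space. It also proposes an intrinsic local linear Fréchet curve predictor on the manifold, based on weighted Fréchet means, and proves its asymptotic optimality.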
Under mild conditions, this paper derives a least-squares local linear Fréchet curve predictor for response and regressor evaluated in a separable Hilbert space. We obtain conditions under which this local linear Fréchet functional predictor can be implemented in the ambient $L^{2}$-space of vector functions taking values in the time-varying tangent space of a compact Riemannian manifold. Second, an intrinsic local linear Fréchet curve predictor evaluated in such a manifold is proposed, based on a weighted Fréchet mean approach, and its asymptotic optimality is proved. The simulation study and real-data application analyze the finite-sample performance of the empirical versions of both predictors, compared with a geodesic Nadaraya-Watson-type curve predictor. In the real-data application, the functional prediction of the time-varying spherical coordinates of the Earth's magnetic field is addressed, based on observations of the geocentric latitude and longitude of NASA's MAGSAT spacecraft.
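In the Euclidean special case, a local linear Fréchet estimate is a weighted mean with local linear smoother weights s_i = K_h(X_i - x0)(mu2 - mu1 (X_i - x0)) / sigma0^2. On a Riemannian manifold, the squared Euclidean distance is replaced by the squared geodesic distance, and the weighted Fréchet mean generally has no closed form. The sketch below covers only the Euclidean case, with a scalar regressor and a Gaussian kernel; both are assumptions. It is not the paper's Hilbert-space or intrinsic predictor.

```python
import math


def local_linear_weights(xs, x0, h):
    """Local linear smoother weights s_i(x0, h) with a Gaussian kernel.

    mu_j = mean_i K_h(X_i - x0) (X_i - x0)^j, sigma0^2 = mu0 mu2 - mu1^2,
    s_i = K_h(X_i - x0) (mu2 - mu1 (X_i - x0)) / sigma0^2; the s_i average to one.
    """
    n = len(xs)
    d = [x - x0 for x in xs]
    k = [math.exp(-0.5 * (di / h) ** 2) / h for di in d]
    mu0 = sum(k) / n
    mu1 = sum(ki * di for ki, di in zip(k, d)) / n
    mu2 = sum(ki * di * di for ki, di in zip(k, d)) / n
    sigma2 = mu0 * mu2 - mu1 ** 2
    return [ki * (mu2 - mu1 * di) / sigma2 for ki, di in zip(k, d)]


def local_frechet_euclidean(xs, ys, x0, h):
    """argmin_y sum_i s_i |y - Y_i|^2 in R^p, i.e. the s-weighted mean of the Y_i."""
    s = local_linear_weights(xs, x0, h)
    total = sum(s)
    dim = len(ys[0])
    return [sum(si * y[j] for si, y in zip(s, ys)) / total for j in range(dim)]


xs = [i / 20 for i in range(21)]
ys = [[3.0 + 2.0 * x, -x] for x in xs]  # a linear curve in R^2
print(local_frechet_euclidean(xs, ys, x0=0.37, h=0.2))  # approx [3.74, -0.37]
```

The weights satisfy sum_i s_i (X_i - x0) = 0, so the estimate reproduces linear trends exactly. This is the usual motivation for local linear weights over Nadaraya-Watson weights.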
Identification by non-Gaussianity in structural threshold and smooth transition vector autoregressive models
Savi Virolainen
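AI summary: Proves that structural smooth transition vector autoregressive models are statistically identified when the shocks are mutually independent and at most one is Gaussian. Also proposes an estimation method and a blended identification strategy.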
We show that structural smooth transition vector autoregressive models are statistically identified if the shocks are mutually independent and at most one of them is Gaussian. This extends a known identification result for linear structural vector autoregressions to a time-varying impact matrix. We also propose an estimation method, show how a blended identification strategy can be adopted to address weak identification, and establish a sufficient condition for ergodic stationarity. The introduced methods are implemented in the accompanying R package sstvars. Our empirical application finds that a positive climate policy uncertainty shock reduces production and raises inflation under both low and high economic policy uncertainty, but its effects, particularly on inflation, are stronger during the latter.
Sequential least-squares estimators with fast randomized sketching for linear statistical models
Guan-Yu Chen, Dong-Yue Xie, Xi Yang
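AI summary: Proposes a sequential least-squares estimation framework that combines sketch-and-solve and iterative sketching. It iteratively solves subproblems with gradually increasing sketch sizes to obtain high-precision parameter estimates efficiently.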
We propose a novel randomized framework for the estimation problem of large-scale linear statistical models, namely Sequential Least-Squares Estimators with Fast Randomized Sketching (SLSE-FRS), which integrates Sketch-and-Solve and Iterative-Sketching methods for the first time. By iteratively constructing and solving sketched least-squares (LS) subproblems with increasing sketch sizes to achieve higher precision, SLSE-FRS gradually refines the estimator of the true parameter vector, ultimately producing a high-precision estimator. We analyze the convergence properties of SLSE-FRS and provide an efficient implementation. Numerical experiments show that SLSE-FRS outperforms the state-of-the-art methods, namely the Preconditioned Conjugate Gradient (PCG) method and the Iterative Double Sketching (IDS) method.
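A generic iterative-sketching loop with growing sketch sizes can be written for a two-parameter least-squares problem. It is in the spirit of SLSE-FRS but not identical to it. Each step uses the exact gradient A^T(b - Ax) together with a Hessian estimated from m_k uniformly sampled rows, and m_k doubles at every step. Uniform row sampling, the doubling schedule, and the starting size m0 = 50 are assumptions. The abstract does not specify the paper's sketching operators or schedule.

```python
import random


def solve2(m, v):
    """Solve a 2x2 linear system m @ x = v."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def gram(rows, scale=1.0):
    """scale * A_S^T A_S for a list of 2-dimensional rows."""
    g = [[0.0, 0.0], [0.0, 0.0]]
    for a in rows:
        for i in range(2):
            for j in range(2):
                g[i][j] += scale * a[i] * a[j]
    return g


def sketched_ls(a, b, m0=50, n_iter=12, seed=0):
    """Iterative sketching: exact gradient, row-sampled Hessian, doubling sizes."""
    rng = random.Random(seed)
    n = len(a)
    x = [0.0, 0.0]
    for k in range(n_iter):
        m = min(n, m0 * 2 ** k)
        idx = rng.sample(range(n), m)
        h = gram([a[i] for i in idx], scale=n / m)  # unbiased estimate of A^T A
        r = [b[i] - (a[i][0] * x[0] + a[i][1] * x[1]) for i in range(n)]
        g = [sum(a[i][j] * r[i] for i in range(n)) for j in range(2)]
        step = solve2(h, g)
        x = [x[0] + step[0], x[1] + step[1]]
    return x


rng = random.Random(1)
a, b = [], []
for _ in range(2000):
    t = rng.gauss(0.0, 1.0)
    a.append([1.0, t])
    b.append(1.0 + 2.0 * t + rng.gauss(0.0, 0.5))
x_sketch = sketched_ls(a, b)
x_exact = solve2(gram(a), [sum(r[j] * bi for r, bi in zip(a, b)) for j in range(2)])
print(x_sketch, x_exact)
```

Once m_k reaches n, the sketched Hessian is exact and a single step lands on the least-squares solution. In this toy, the growing schedule trades cheap early steps for an exact finish.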
Stochastic gradients under nuisances
Facheng Yu, Ronak Mehta, Alex Luedtke, Zaid Harchaoui
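AI summary: Studies the non-asymptotic convergence of stochastic gradient algorithms for learning problems whose objectives depend on unknown nuisance parameters. Shows that classical algorithms can still converge under conditions such as Neyman orthogonality, and proposes a variant with approximately orthogonalized updates that achieves similar rates in the non-orthogonal case.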
Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and upset the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we show that an algorithm variant with approximately orthogonalized updates (with an approximately orthogonalized gradient oracle) may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference are discussed.
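As an illustration of SGD with a Neyman-orthogonal score, consider the partially linear model Y = theta D + f(X) + noise. This is an assumed example of the double machine learning kind the abstract mentions. The partialling-out score is psi = (Y - l(X) - theta (D - g(X))) (D - g(X)), with g(X) = E[D | X] and l(X) = E[Y | X]. Because this score is insensitive to first-order errors in the nuisances, plugging deliberately shifted nuisance estimates into SGD leaves only a second-order bias. The step size 1/(t + 10), the Polyak averaging, and the simulated data are assumptions, not the paper's algorithm.

```python
import random


def orthogonal_sgd(data, g_hat, l_hat, theta0=0.0, c=10.0):
    """One SGD pass on theta using the partialling-out (orthogonal) score.

    psi = (Y - l(X) - theta * (D - g(X))) * (D - g(X)); theta moves along psi
    with step 1 / (t + c). Returns the Polyak average of the iterates over the
    second half of the pass.
    """
    theta = theta0
    avg, count = 0.0, 0
    n = len(data)
    for t, (x, d, y) in enumerate(data):
        v = d - g_hat(x)
        psi = (y - l_hat(x) - theta * v) * v
        theta += psi / (t + c)
        if t >= n // 2:
            avg += theta
            count += 1
    return avg / count


def g_hat(x):
    return x + 0.1  # E[D | X] = X, deliberately shifted by 0.1


def l_hat(x):
    return 2.0 * x + x ** 2 + 0.1  # E[Y | X] = 2X + X^2, shifted by 0.1


rng = random.Random(0)
data = []
for _ in range(20000):
    x = rng.random()
    d = x + rng.gauss(0.0, 1.0)
    y = 2.0 * d + x ** 2 + rng.gauss(0.0, 0.5)  # true theta = 2
    data.append((x, d, y))
print(orthogonal_sgd(data, g_hat, l_hat))
```

With both nuisances shifted by 0.1, the population solution is (2 + 0.01)/1.01, about 1.99. The error is therefore of order 0.1^2 rather than 0.1.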
A unified theory for testing relevant hypotheses in functional time series
Leheng Cai, Qirui Hu
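AI summary: Proposes a unified framework that constructs nuisance-parameter-free tests via B-spline estimation and self-normalization. The tests cover one-sample, two-sample, and change point problems under arbitrary sampling schemes. The detection rate is proved to remain $n^{-1/2}$ across the sparse-to-dense phase transition.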
In this paper, we develop a unified framework for testing relevant hypotheses in functional time series. The proposed approach accommodates one-sample, two-sample, and change point problems for contaminated observations under arbitrary sampling schemes. Combining B-spline estimation with self-normalization, we construct nuisance-parameter-free tests that bypass auxiliary estimation of long-run covariance functions and measurement-error variance functions. We establish asymptotic validity by exploiting a sequential Gaussian approximation for dependent random vectors of moderately high dimension, which leads to a pivotal limiting distribution. We also provide sufficient conditions for the non-degeneracy of the self-normalizer and establish consistent decision rules. A key theoretical finding is that the proposed tests detect $n^{-1/2}$-local alternatives under arbitrary sampling frequencies. This uncovers a sparse-to-dense phase transition distinct from those typically observed in functional data analysis: while the sampling frequency affects the asymptotic variance, the detection rate remains $n^{-1/2}$, even in sparsely sampled regimes. We further study multiple change point alternatives and extend the theory to settings where consistent change point estimates are available. We also discuss the choice of self-normalizers, including the recently developed range-adjusted self-normalizer. Extensive simulations support the theoretical results, and applications to the AU.SHF implied volatility and traffic volume datasets demonstrate the practical utility of the proposed methods.
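Self-normalization replaces a long-run variance estimate with a functional of recursive estimates. For a scalar series and H0: mean = mu0, a standard version divides n (mean - mu0)^2 by V_n = n^{-2} sum_k (S_k - k mean)^2, where the S_k are partial sums. The sketch below implements only this scalar statistic, an assumption chosen for brevity. The paper's B-spline estimation, functional data, and relevant (interval) hypotheses are not included.

```python
def self_normalized_stat(xs, mu0=0.0):
    """Self-normalized statistic n * (mean - mu0)^2 / V_n for a scalar series.

    V_n = n^{-2} * sum_k (S_k - k * mean)^2 uses recursive partial sums, so no
    long-run variance or bandwidth has to be estimated. The limit is pivotal
    but non-standard, so critical values come from its simulated distribution.
    """
    n = len(xs)
    mean = sum(xs) / n
    partial, v = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        partial += x
        v += (partial - k * mean) ** 2
    v /= n ** 2
    return n * (mean - mu0) ** 2 / v


xs = [((i * 37) % 11 - 5) / 3 + 0.2 for i in range(200)]
print(self_normalized_stat(xs), self_normalized_stat([3 * x for x in xs]))
```

The numerator and the denominator both scale with the square of the data, so the statistic is scale-free. This is why no long-run covariance needs to be estimated.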
Benchmarking uncertainty and its disentanglement in multi-label chest X-ray classification
Simon Baur, Wojciech Samek, Jackie Ma
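AI summary: Using the MIMIC-CXR-JPG dataset, benchmarks 13 uncertainty quantification methods for multi-label chest X-ray classification across convolutional and Transformer architectures, and extends three methods to the multi-label setting. The results reveal the strengths and weaknesses of different methods and architectures in estimating uncertainty and in disentangling epistemic from aleatoric uncertainty.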
Reliable uncertainty quantification is crucial for trustworthy decision-making and the deployment of AI models in medical imaging. While prior work has explored the ability of neural networks to quantify predictive, epistemic, and aleatoric uncertainties using an information-theoretical approach in synthetic or well-defined data settings like natural image classification, its applicability to real-world medical diagnosis tasks remains underexplored. In this study, we provide an extensive uncertainty quantification benchmark for multi-label chest X-ray classification using the MIMIC-CXR-JPG dataset. We evaluate 13 uncertainty quantification methods for convolutional (ResNet) and transformer-based (Vision Transformer) architectures across a wide range of tasks. Additionally, we extend Evidential Deep Learning, HetClass NNs, and Deep Deterministic Uncertainty to the multi-label setting. Our analysis provides insights into uncertainty estimation effectiveness and the ability to disentangle epistemic and aleatoric uncertainties, revealing method- and architecture-specific strengths and limitations.
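The information-theoretic decomposition referred to above splits total predictive entropy into an aleatoric part (expected entropy) and an epistemic part (mutual information). For multi-label outputs, one simple option, assumed here, is to apply the decomposition per label, using Bernoulli entropies over an ensemble of sigmoid probabilities. The 13 methods benchmarked in the paper may compute these quantities differently.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Bernoulli entropy in nats, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def decompose(member_probs):
    """Per-label (total, aleatoric, epistemic) uncertainty in nats.

    member_probs[m][l] is ensemble member m's sigmoid probability for label l.
    total = H(mean_m p), aleatoric = mean_m H(p), epistemic = total - aleatoric.
    """
    n_members = len(member_probs)
    n_labels = len(member_probs[0])
    out = []
    for l in range(n_labels):
        ps = [member_probs[m][l] for m in range(n_members)]
        total = binary_entropy(sum(ps) / n_members)
        aleatoric = sum(binary_entropy(p) for p in ps) / n_members
        out.append((total, aleatoric, total - aleatoric))
    return out


agree = decompose([[0.5, 0.9], [0.5, 0.9], [0.5, 0.9]])
disagree = decompose([[0.99], [0.01]])
print(agree)
print(disagree)
```

Members that agree yield zero epistemic uncertainty, even at p = 0.5. Confident but contradictory members (0.99 versus 0.01) put most of the ln 2 total into the epistemic part.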
Conformal C2ST: turning weak classifiers into strong two-sample tests
Vansh Bansal, Tianyu Chen, James G. Scott
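AI summary: Analyzes conformal-prediction-based variants of C2ST that let any classifier, even a weak one, produce exact finite-sample p-values, with Type-I error control and power that degrades gracefully. Applies them to validating neural posterior estimation.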
The two-sample testing problem, a fundamental task in statistics and machine learning, seeks to determine whether two sets of samples, drawn from underlying distributions $p$ and $q$, are in fact identically distributed (i.e. whether $p=q$). A popular and intuitive approach is the classifier two-sample test (C2ST), where a classifier is trained to distinguish between samples from $p$ and $q$. Yet despite the simplicity of the C2ST, its reliability hinges on access to a near-Bayes-optimal classifier, a requirement that is rarely met and difficult to verify. This raises a major open question: can a weak classifier still be useful for two-sample testing? We show that the answer is a definitive yes. Building on the work of Hu and Lei (2024), we analyze two conformal variants of the C2ST that convert the scores from any trained classifier -- even if weak, biased, or overfit -- into exact, finite-sample p-values. We establish two key theoretical properties of the conformal C2ST: (i) finite-sample Type-I error control, and (ii) non-trivial power that degrades gently in tandem with the error of the trained classifier. The upshot is that even poorly performing classifiers can yield powerful and reliable two-sample tests. This general framework finds a powerful application in Bayesian inference, particularly for validating Neural Posterior Estimation (NPE) models, where the task of comparing a learned posterior approximation $q(θ\mid y)$ to the true posterior $p(θ\mid y)$ can be framed as a two-sample test. Empirically, the Conformal C2ST outperforms classical discriminative tests across a wide range of benchmarks for this task. Our results establish the conformal C2ST as a practical, theoretically grounded diagnostic tool.
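Held-out classifier scores can give finite-sample validity because of exchangeability. If the classifier is trained on separate data, then under H0: p = q the pooled held-out scores are exchangeable, whatever the classifier's quality. The sketch below turns that observation into a Monte Carlo permutation test on the difference of mean scores. It is a generic construction in the spirit of the conformal C2ST, not the specific conformal variants of Hu and Lei (2024) analyzed in the paper. The identity score in the example stands in for a trained classifier.

```python
import random


def score_permutation_pvalue(scores_p, scores_q, n_perm=999, seed=0):
    """Valid permutation p-value for H0: p = q from held-out classifier scores.

    Under H0 the pooled held-out scores are exchangeable, so the Monte Carlo
    p-value (1 + #{T_perm >= T_obs}) / (1 + n_perm) is valid at any sample size.
    """
    rng = random.Random(seed)
    pooled = list(scores_p) + list(scores_q)
    n_q = len(scores_q)

    def stat(values):
        # Mean score of the last n_q entries (the "q" group) minus the rest.
        q_part, p_part = values[-n_q:], values[:-n_q]
        return sum(q_part) / len(q_part) - sum(p_part) / len(p_part)

    t_obs = stat(pooled)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


rng = random.Random(3)
scores_p = [rng.gauss(0.0, 1.0) for _ in range(200)]  # score s(x) = x on held-out p draws
scores_q = [rng.gauss(0.5, 1.0) for _ in range(200)]  # held-out q draws, shifted mean
print(score_permutation_pvalue(scores_p, scores_q))
```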
Anytime-valid tests of sparse anomalies
Muriel F. Pérez-Ortiz, Rui M. Castro, Ivo V. Stoepker
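AI summary: For the online detection of sparse anomalies among a large number of data streams, designs and analyzes anytime-valid (AV) tests. Proposes a computationally efficient, parameter-adaptive AV test that achieves the same threshold behavior as the oracle AV test.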
We consider the problem of testing sequentially for the presence of sparse anomalies among a large number of data streams. To this end, we design and analyze Anytime-Valid (AV) tests, which retain type-I error control at arbitrary stopping times. Existing results address exclusively the nonsequential case, which exhibits a subtle phase transition between two regimes where tests are either powerless or powerful. In our sequential setting, we argue, two challenges arise: (1) the standard analysis of AV tests cannot be executed in the relevant sample-size regime; and (2) standard constructions of parameter-adaptive AV tests are either analytically intractable or computationally infeasible. This work addresses these challenges. Borrowing insights from the nonsequential literature, we propose a framework to analyze AV tests and their shortest possible sample sizes. Under this framework, we show that, in the Gaussian location setting, the oracle AV test has a delicate threshold behavior that is related to -- but not implied by -- the phase transition observed in optimal nonsequential tests. Our main results include a computationally efficient, parameter-adaptive AV test; we show that it achieves the same threshold behavior as the oracle AV test. Numerical simulations illustrate these theoretical findings.
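For the Gaussian location setting, a simple anytime-valid test for sparse anomalies takes a product over streams of the factors (1 - pi) + pi exp(mu S_{i,t} - t mu^2/2). Each factor is a sparse mixture of a stream's likelihood ratio with the null. Each factor has mean one under the null, so the product over independent streams is a nonnegative martingale. Stopping when the product reaches 1/alpha therefore controls the type-I error at any stopping time, by Ville's inequality. The known mu and pi make this an oracle-style sketch, which is an assumption. It is not the paper's computationally efficient, parameter-adaptive test.

```python
import math
import random


def sparse_mixture_test(streams, mu=1.0, pi=0.05, alpha=0.05):
    """Return the first time t at which the mixture martingale reaches 1/alpha.

    streams[i][t] is observation t of stream i, N(0, 1) under the null. Each
    stream contributes (1 - pi) + pi * exp(mu * S_i - t * mu^2 / 2), which has
    mean one under the null; the product over independent streams is a
    nonnegative martingale, so Ville's inequality bounds the type-I error.
    """
    k = len(streams)
    horizon = len(streams[0])
    sums = [0.0] * k
    threshold = math.log(1.0 / alpha)
    for t in range(1, horizon + 1):
        log_m = 0.0
        for i in range(k):
            sums[i] += streams[i][t - 1]
            log_lr = mu * sums[i] - t * mu ** 2 / 2
            # log((1 - pi) + pi * exp(log_lr)), computed stably.
            a, b = math.log(1 - pi), math.log(pi) + log_lr
            m = max(a, b)
            log_m += m + math.log(math.exp(a - m) + math.exp(b - m))
        if log_m >= threshold:
            return t
    return None


rng = random.Random(7)
streams = [[rng.gauss(1.0 if i < 5 else 0.0, 1.0) for _ in range(200)]
           for i in range(100)]  # 5 anomalous streams out of 100
print(sparse_mixture_test(streams))
```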
Position: quantum kernel machines should move beyond scalar-valued kernels to realize their potential
Hachem Kadri, Joachim Tomasi, Yuka Hashimoto, Sandrine Anthoine
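AI summary: Argues that quantum kernel machines should move toward more expressive frameworks such as operator-valued kernels, which exploit entanglement and non-commutative structure to tackle complex structured prediction problems. Supports the argument with a preliminary proof of concept.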
Quantum kernel functions, built using quantum-mechanical principles, have emerged as a centerpiece of quantum machine learning. The initial enthusiasm for quantum kernel machines has been tempered by recent studies suggesting that quantum kernels could not offer significant computational or statistical advantages when learning from classical data. However, most of the research in this area has been devoted to scalar-valued kernels in standard classification or regression settings for which classical kernel methods are efficient and effective, leaving very little room for improvement with quantum kernels. In this position paper, we argue that progress in this field requires moving beyond scalar-valued kernels toward more expressive kernel frameworks. Scalar-valued kernels lack the degrees of freedom necessary to fully exploit intrinsically quantum resources such as entanglement and are not rich enough to deal with complex learning tasks where classical learning methods struggle. Building on recent advances in operator-valued kernel learning and $C^*$-algebraic kernel representations, we propose a roadmap for designing quantum kernels capable of leveraging entanglement and non-commutative structures to tackle complex structured prediction problems. To support this viewpoint, we present an initial proof-of-concept illustrating how quantum operator-valued kernel formulations can reveal structural dependencies that remain difficult to access for scalar-valued kernel methods. This shift in focus could open a pathway toward a new generation of quantum kernel machines and a more faithful exploration of their potential advantages.
A Minimax Approach to Learning Interpretable Factorized Stochastic Policies from Conjoint Data, with Uncertainty Quantification
Connor T. Jerzak, Priyanshi Chandra, Rishi Hazra
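AI summary: For exponentially large factorial action spaces, proposes a method for offline policy optimization from randomized preference data, using conjoint experiments to estimate interpretable stochastic policies with asymptotically valid uncertainty quantification.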
We study offline policy optimization over exponentially large factorial action spaces from randomized preference data, showing how conjoint experiments can estimate interpretable stochastic policies with asymptotically valid uncertainty under regularity conditions. Conjoint analyses typically report Average Marginal Component Effects (AMCEs) by averaging over opponent attributes and thus ignore strategic interdependence. We instead learn stochastic interventions -- product-of-Categorical policies over factor levels -- that (i) optimize expected outcomes in an average-case setting and (ii) extend to a two-player minimax (adversarial) setting that realistically captures simultaneous strategic candidate selection. Methodologically, we derive a closed-form optimizer for a tractable two-way interaction regime with L2 variance regularization, and provide a general gradient-based procedure for richer model classes. Uncertainty from the outcome model propagates asymptotically to both the optimal policy and its value via a Delta method approximation. We further model institutional details (e.g., primaries) inside the minimax objective and introduce a data-driven measure of strategic divergence between parties. On synthetic data, we empirically characterize finite-sample error and coverage as dimensionality and $n$ vary. On a U.S. presidential conjoint, adversarially learned policies produce restricted-equilibrium vote shares that align with historical election ranges in our data, in stark contrast to non-adversarial (averaging) optimizers.
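The closed-form idea can be illustrated in a simpler setting than the paper's two-way interaction regime. Assume a main-effects-only outcome model with coefficients beta_k for the levels of factor k. Assume also an L2 penalty lam * ||pi_k - p_k||^2 that pulls each categorical policy toward a reference design distribution p_k; this form of the penalty is our assumption. The objective then separates across factors. Completing the square shows that each factor's optimal policy is the Euclidean projection of p_k + beta_k / (2 lam) onto the probability simplex. This is a hedged sketch only and is not the authors' optimizer. Function names are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {pi : pi >= 0, sum(pi) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    cssv = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - cssv / idx > 0)[0][-1]   # index 0 always qualifies
    theta = cssv[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def optimal_factor_policy(beta, p, lam):
    """argmax over the simplex of pi @ beta - lam * ||pi - p||^2.
    The objective equals -lam * ||pi - (p + beta / (2 lam))||^2 + const."""
    return project_to_simplex(p + beta / (2.0 * lam))

def optimal_product_policy(betas, ps, lam):
    # Main-effects-only model (assumption): the objective separates across factors,
    # so each factor's categorical distribution is optimized independently.
    return [optimal_factor_policy(b, p, lam) for b, p in zip(betas, ps)]

betas = [np.array([1.0, 0.0, 0.0]), np.array([0.2, -0.2])]  # hypothetical level effects
ps = [np.full(3, 1.0 / 3.0), np.full(2, 0.5)]                # reference design distributions
policy = optimal_product_policy(betas, ps, lam=1.0)
```

Large lam keeps the policy close to the randomization design. Small lam concentrates the policy on the highest-effect level, which is the degenerate, non-stochastic limit.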
Slowly Scaling Per-Record Differential Privacy
Brian Finley, Anthony M Caruso, Justin C Doty, Ashwin Machanavajjhala, Mikaela R Meyer, David Pujol, William Sexton, Zachary Terner
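AI summary: For data with many outlying values (such as income data), proposes mechanisms whose privacy guarantees degrade only slowly (logarithmically) with a record's influence, enabling accurate, unbiased release of statistics while providing meaningful protection for highly influential records.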
We develop formal privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly as the protected record's influence on the released statistics grows. Formal privacy mechanisms generally add randomness, or "noise," to published statistics. If a noisy statistic's distribution changes little with the addition or deletion of a single record in the underlying dataset, an attacker looking at this statistic will find it plausible that any particular record was present or absent, preserving the records' privacy. More influential records -- those whose addition or deletion would change the statistics' distribution more -- typically suffer greater privacy loss. The per-record differential privacy framework quantifies these record-specific privacy guarantees, but existing mechanisms let these guarantees degrade rapidly (linearly or quadratically) with influence. While this may be acceptable in cases with some moderately influential records, it results in unacceptably high privacy losses when records' influence varies widely, as is common in economic data. We develop mechanisms with privacy guarantees that instead degrade as slowly as logarithmically with influence. These mechanisms allow for the accurate, unbiased release of statistics, while providing meaningful protection for highly influential records. As an example, we consider the private release of sums of unbounded establishment data such as payroll, where our mechanisms extend meaningful privacy protection even to very large establishments. We evaluate these mechanisms empirically and demonstrate their utility.
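The "linear or quadratic" degradation of existing mechanisms can be made concrete for the sum of unbounded values. When a sum is released with Laplace noise of scale b, record i shifts the sum by |x_i|, so its pure-DP loss is |x_i| / b. With Gaussian noise of standard deviation sigma, its zero-concentrated DP loss is x_i^2 / (2 sigma^2). The sketch below computes only these two standard baselines for hypothetical payroll values. It does not reproduce the authors' slowly scaling mechanisms.

```python
import numpy as np

def laplace_per_record_epsilon(values, scale):
    """Per-record pure-DP loss when releasing sum(values) + Laplace(scale).
    Adding or removing record i shifts the sum by |x_i|, so eps_i = |x_i| / scale (linear)."""
    return np.abs(np.asarray(values, dtype=float)) / scale

def gaussian_per_record_rho(values, sigma):
    """Per-record zCDP loss when releasing sum(values) + N(0, sigma^2).
    A mean shift of |x_i| gives rho_i = x_i^2 / (2 sigma^2) (quadratic)."""
    return np.asarray(values, dtype=float) ** 2 / (2.0 * sigma ** 2)

payroll = np.array([1e3, 1e4, 1e5, 1e6])        # hypothetical establishment payrolls
eps = laplace_per_record_epsilon(payroll, scale=1e4)
rho = gaussian_per_record_rho(payroll, sigma=1e4)
```

A factor of 1000 in influence multiplies the Laplace loss by 1000 and the Gaussian zCDP loss by a million. This is why very large establishments receive essentially no protection under these baselines.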
Nearest-Neighbor Logistic Lasso Regression for Gradient-Based Dimension Reduction
Touqeer Ahmad, François Portier, Gilles Stupfler
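AI summary: Proposes a localized logistic regression method that combines nearest neighbors with an L1 penalty to estimate the gradient of the conditional probability, and uses outer products of these gradients to estimate the central subspace, enabling dimension reduction for high-dimensional covariates.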
This paper investigates a new approach to estimate the gradient of the conditional probability given the covariates in the binary classification framework. The proposed approach consists of fitting a localized nearest-neighbor logistic model with $\ell_1$-penalty in order to cope with possibly high-dimensional covariates. Our theoretical analysis shows that the pointwise convergence rate of the gradient estimator is optimal under very mild assumptions. Moreover, using an outer product of such gradient estimates at several points in the covariate space, we provide a new method for estimating the central subspace, a well-known object that allows dimension reduction to be carried out within the covariate space. Our implementation uses cross-validation on the misclassification rate to estimate the dimension of this subspace. We find that the proposed approach outperforms existing competitors in synthetic and real data applications.
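A minimal NumPy sketch of the two steps follows. The first step fits an L1-penalized logistic model, by proximal gradient (ISTA), on the k nearest neighbours of a point x0, with covariates centred at x0. The chain rule then gives the gradient of P(Y=1 | X=x) at x0 as p0(1 - p0) b. The second step averages outer products of these gradients over a set of points and takes leading eigenvectors as the central-subspace estimate. The values of k and the penalty level are assumptions here. The fixed target dimension stands in for the paper's cross-validation step. This is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_lasso(Z, y, lam, n_iter=500):
    """ISTA for mean logistic loss + lam * ||b||_1, with an unpenalized intercept a."""
    n, d = Z.shape
    a, b = 0.0, np.zeros(d)
    Z1 = np.column_stack([np.ones(n), Z])
    # 1 / L, where L bounds the Hessian of the mean logistic loss (weights <= 1/4).
    step = 1.0 / (0.25 * np.linalg.eigvalsh(Z1.T @ Z1 / n).max())
    for _ in range(n_iter):
        r = sigmoid(a + Z @ b) - y                                   # residuals at current fit
        a -= step * r.mean()
        b = b - step * (Z.T @ r) / n
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)     # soft-thresholding
    return a, b

def local_gradient(X, y, x0, k, lam):
    """Gradient of P(Y=1 | X=x) at x0 from a k-nearest-neighbour logistic Lasso fit."""
    idx = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]
    a, b = logistic_lasso(X[idx] - x0, y[idx], lam)
    p0 = sigmoid(a)
    return p0 * (1.0 - p0) * b      # d/dx sigmoid(a + b.(x - x0)) evaluated at x = x0

def central_subspace(X, y, n_points, k, lam, dim, rng):
    """Outer-product-of-gradients estimate of a dim-dimensional central subspace."""
    pts = X[rng.choice(len(X), size=n_points, replace=False)]
    G = np.array([local_gradient(X, y, x0, k, lam) for x0 in pts])
    M = G.T @ G / n_points                    # average outer product of gradients
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :dim]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (rng.random(2000) < sigmoid(3.0 * X[:, 0])).astype(float)   # depends on X[:, 0] only
B_hat = central_subspace(X, y, n_points=40, k=200, lam=0.01, dim=1, rng=rng)
```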
Asymptotics of Two-Stage Quantum Estimation and Quantum-Enhanced Transmittance Sensing
Zihao Gong, Boulat A. Bash
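AI summary: For estimating a single unknown parameter embedded in a quantum state, proposes a two-stage method under relaxed conditions. The first stage uses a sub-optimal measurement to obtain a preliminary estimate, and the second stage uses that estimate to construct a measurement that achieves the quantum Cramér-Rao bound. The results are applied to the asymptotic analysis of quantum-enhanced transmittance sensing.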
We consider estimation of a single unknown parameter embedded in a quantum state. The quantum Cramér-Rao bound (QCRB) is the ultimate limit on the mean squared error of any unbiased estimator. While it can be achieved asymptotically for a large number of quantum state copies, the required measurement often depends on the true value of the parameter of interest. Prior work addresses this paradox using a two-stage approach: in the first stage, a preliminary estimate is obtained by applying, on a vanishing fraction of quantum state copies, a sub-optimal measurement that does not depend on the parameter of interest. In the second stage, the preliminary estimate is used to construct the QCRB-achieving measurement that is applied to the remaining quantum state copies. This is akin to two-step estimators for classical problems with nuisance parameters. Unfortunately, the original analysis imposes conditions that severely restrict the class of classical estimators applied to the quantum measurement outcomes, hindering applications of this method. We relax these conditions to substantially broaden the class of usable estimators for single-parameter problems at the cost of slightly weakening the asymptotic properties of the two-stage method. We also account for nuisance parameters. We apply our results to obtain the asymptotics of quantum-enhanced transmittance sensing.
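The two-stage idea can be seen in a classical simulation of a toy qubit family, rho(theta) = (I + r(cos theta X + sin theta Y))/2, with r < 1 assumed known. The quantum Fisher information of this family is r^2 per copy. A projective measurement along in-plane direction phi has Fisher information r^2 sin^2(theta - phi) / (1 - r^2 cos^2(theta - phi)), which reaches r^2 only when phi = theta +/- pi/2, so the optimal measurement depends on theta. In the sketch, stage 1 measures the fixed observables X and Y on about sqrt(n) copies. Stage 2 measures along the estimated optimal direction on the rest. The state family, the sqrt(n) split and the arcsine estimator are all illustrative assumptions, not the paper's transmittance-sensing model.

```python
import numpy as np

def two_stage_estimate(theta, r, n, trials, rng):
    """Simulate two-stage estimation of theta for rho(theta) = (I + r(cos t X + sin t Y))/2."""
    n1 = int(np.sqrt(n))                     # stage-1 copies: vanishing fraction n1/n -> 0
    n2 = n - n1
    hx, hy = n1 // 2, n1 - n1 // 2
    # Outcome +1 along direction phi occurs with probability (1 + r cos(theta - phi)) / 2.
    kx = rng.binomial(hx, (1 + r * np.cos(theta)) / 2, size=trials)   # phi = 0 (X)
    ky = rng.binomial(hy, (1 + r * np.sin(theta)) / 2, size=trials)   # phi = pi/2 (Y)
    theta1 = np.arctan2(2 * ky / hy - 1, 2 * kx / hx - 1)             # preliminary estimate
    # Stage 2 measures along phi = theta1 + pi/2, so P(+1) = (1 + r sin(theta - theta1)) / 2.
    k2 = rng.binomial(n2, (1 + r * np.sin(theta - theta1)) / 2)
    s = np.clip((2 * k2 / n2 - 1) / r, -1.0, 1.0)
    return theta1 + np.arcsin(s)

def qcrb(r, n):
    # QFI of this family is r^2 per copy, so Var >= 1 / (n r^2) for unbiased estimators.
    return 1.0 / (n * r ** 2)

rng = np.random.default_rng(1)
theta, r, n = 0.7, 0.8, 20000
est = two_stage_estimate(theta, r, n, trials=4000, rng=rng)
ratio = np.mean((est - theta) ** 2) / qcrb(r, n)   # close to 1 when the QCRB is attained
```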
Design and Inference for Multi-Arm Clinical Trials with Informational Borrowing: The Interacting Urns Design
Giacomo Aletti, Alessandro Baldi Antognini, Irene Crimaldi, Rosamarie Frieri, Andrea Ghiglietti
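AI summary: Proposes a new design methodology for stratified comparative experiments based on a system of interacting urns. The design borrows information across strata to strengthen the early exchange of efficacy information, adaptively skews allocations toward the most effective treatment, and makes the influence of strata with different efficacy vanish as information grows. Theoretical properties and asymptotic inference are also provided.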
This paper deals with a new design methodology for stratified comparative experiments based on a system of interacting urns. The key idea is to model the interaction between urns to borrow information across strata and to use it in the design phase in order to i) enhance the information exchange at the beginning of the study, when only a few subjects have been enrolled and the stratum-specific information on treatments' efficacy could be scarce, ii) let the information sharing evolve adaptively via an update mechanism based on the observed outcomes, skewing the allocations at each step towards the stratum-specific most promising treatment, and iii) make the contribution of strata with different treatment efficacy vanish as the stratum information grows. In particular, we introduce the Interacting Urns Design, a new Covariate-Adjusted Response-Adaptive procedure that randomizes the treatment allocations according to the evolution of the urn system. The theoretical properties of this proposal are described and the corresponding asymptotic inference is provided. Moreover, by a functional central limit theorem, we obtain the asymptotic joint distribution of the Wald-type sequential test statistics, which allows the suggested design to be monitored sequentially in clinical practice.
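A toy simulation helps make points i) to iii) concrete. It is a hypothetical sketch, not the authors' Interacting Urns Design. Each stratum keeps a two-color randomized Pólya urn, and a successful response adds one ball of the assigned treatment's color. The probability of assigning treatment A in stratum s mixes the stratum's own urn composition with the pooled composition across strata. The mixing weight w_s = c / (c + n_s) is an assumed choice: it equals 1 before anyone in stratum s is enrolled and decays to 0 as n_s grows.

```python
import numpy as np

def allocation_prob(balls, s, n_enrolled, c):
    """P(assign treatment A) in stratum s; balls has shape (S, 2) with counts for (A, B).
    Mixes the stratum's own urn with the pooled urn using the hypothetical
    weight w = c / (c + n_s), which starts at 1 and vanishes as n_s grows."""
    own = balls[s, 0] / balls[s].sum()
    pooled = balls[:, 0].sum() / balls.sum()
    w = c / (c + n_enrolled[s])
    return (1.0 - w) * own + w * pooled

def simulate_interacting_urns(success_probs, strata, c, rng):
    """success_probs: (S, 2) success probabilities of A and B per stratum.
    strata: stratum labels of the subjects in order of arrival."""
    S = success_probs.shape[0]
    balls = np.ones((S, 2))                   # one ball of each color per urn
    n_enrolled = np.zeros(S, dtype=int)
    assignments = np.zeros((S, 2), dtype=int)
    for s in strata:
        pi_a = allocation_prob(balls, s, n_enrolled, c)
        t = 0 if rng.random() < pi_a else 1   # 0 = treatment A, 1 = treatment B
        if rng.random() < success_probs[s, t]:
            balls[s, t] += 1.0                # reinforce the treatment that succeeded
        n_enrolled[s] += 1
        assignments[s, t] += 1
    return balls, assignments

rng = np.random.default_rng(0)
probs = np.array([[0.8, 0.4], [0.7, 0.5], [0.3, 0.6]])   # hypothetical efficacies
strata = rng.integers(0, 3, size=3000)
balls, assignments = simulate_interacting_urns(probs, strata, c=20.0, rng=rng)
```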
The Pre-Commitment Best-Choice Problem: Exact Finite-n Formulas
Marcos Costa Santos Carreira
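AI summary: Studies the non-adaptive pre-commitment class for the best-choice problem, derives exact finite-n probability formulas for general non-increasing threshold vectors and for a single repeated threshold, and proves that the class is strictly suboptimal.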
In the full-information best-choice problem of Gilbert and Mosteller (1966), n i.i.d. uniform draws are observed in sequence and the player, knowing n, stops at one with the goal of catching the overall maximum. The optimal rule is adaptive -- accept a draw only if it is a running maximum and exceeds a round-dependent threshold -- and its win probability converges to 0.580164... We study the restricted, non-adaptive pre-commitment class, in which the thresholds are fixed in advance and the player stops at the first draw above its threshold, with no running-maximum check. Classifying each round as a win, a false positive, a false negative, or a continuation, we derive exact finite-n formulas for all four probabilities -- for a general non-increasing threshold vector, and in closed form through the digamma function for a single repeated threshold -- together with the optimal thresholds and a constant-free approximation, all validated by simulation. We show the class is strictly suboptimal for every n >= 2; a heuristic scaling-limit analysis places its asymptotic win probability near 0.562 (the single-threshold case gives 0.51735), below the optimal 0.580164.
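For the single repeated threshold t, a short conditioning argument gives an exact finite-n win probability. Let k be the number of draws above t. If k >= 1, the overall maximum is among those k draws. They are i.i.d. uniform on (t, 1), so the first of them is the maximum with probability 1/k. This gives P(win) = sum_{k=1}^{n} C(n, k) (1 - t)^k t^(n - k) / k. The sketch below evaluates this sum, checks it by simulation and grid-searches the threshold. It is written as a plain sum rather than the paper's digamma closed form, and the function names are ours.

```python
import math
import numpy as np

def win_prob_single_threshold(n, t):
    """Exact P(win) for 'stop at the first draw above t' with n i.i.d. U(0, 1) draws."""
    return sum(math.comb(n, k) * (1 - t) ** k * t ** (n - k) / k for k in range(1, n + 1))

def win_prob_monte_carlo(n, t, trials, rng):
    draws = rng.random((trials, n))
    above = draws > t
    stopped = above.any(axis=1)               # never stopping counts as a loss
    first = above.argmax(axis=1)              # index of the first draw above t
    chosen = draws[np.arange(trials), first]
    return np.mean(stopped & (chosen == draws.max(axis=1)))

def best_single_threshold(n, grid_size=3001):
    grid = np.linspace(0.0, 1.0, grid_size)
    probs = [win_prob_single_threshold(n, t) for t in grid]
    i = int(np.argmax(probs))
    return grid[i], probs[i]

rng = np.random.default_rng(0)
exact = win_prob_single_threshold(5, 0.6)
simulated = win_prob_monte_carlo(5, 0.6, trials=200_000, rng=rng)
t_star, p_star = best_single_threshold(2)   # n = 2: P = 0.5 + t - 1.5 t^2, maximized at t = 1/3
```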
On the Regularization of Wasserstein GANs
Henning Petzka, Asja Fischer, Denis Lukovnikov
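AI summary: Studies regularization of the Lipschitz constraint in Wasserstein GANs and argues, through theoretical analysis and experiments on toy data sets, that a weaker regularization term is preferable to penalizing the deviation of the critic's gradient norm from one.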
Since their invention, generative adversarial networks (GANs) have become a popular approach for learning to model a distribution of real (unlabeled) data. Convergence problems during training are overcome by Wasserstein GANs, which minimize the distance between the model and the empirical distribution in terms of a different metric, but thereby introduce a Lipschitz constraint into the optimization problem. A simple way to enforce the Lipschitz constraint on the class of functions that can be modeled by the neural network is weight clipping. It was proposed that training can be improved by instead augmenting the loss with a regularization term that penalizes the deviation of the gradient of the critic (as a function of the network's input) from one. We present theoretical arguments why using a weaker regularization term enforcing the Lipschitz constraint is preferable. These arguments are supported by experimental results on toy data sets.
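The difference between the two regularizers shows up once gradient norms are in hand. The two-sided penalty, (||grad f|| - 1)^2, pushes every gradient norm toward exactly one. A one-sided penalty, max(0, ||grad f|| - 1)^2, acts only where the 1-Lipschitz constraint is violated, and we assume this is the kind of "weaker" term meant here. The sketch evaluates both on random interpolates between real and fake samples. It uses a toy critic f(x) = sum_j sin(w_j x_j) whose input gradient is analytic, so no autodiff library is needed. The critic and the data are hypothetical.

```python
import numpy as np

def critic_grad(x, w):
    """Input gradient of the toy critic f(x) = sum_j sin(w_j * x_j): df/dx_j = w_j cos(w_j x_j)."""
    return w * np.cos(x * w)

def penalties(real, fake, w, rng):
    """Two-sided and one-sided gradient penalties on random interpolates of real and fake samples."""
    eps = rng.random((real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake
    norms = np.linalg.norm(critic_grad(x_hat, w), axis=1)
    two_sided = np.mean((norms - 1.0) ** 2)                  # penalizes any deviation from 1
    one_sided = np.mean(np.maximum(norms - 1.0, 0.0) ** 2)   # penalizes only norms above 1
    return two_sided, one_sided

rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, size=(256, 2))
fake = rng.normal(loc=0.0, size=(256, 2))
gp_small, lp_small = penalties(real, fake, np.array([0.3, 0.3]), rng)   # all norms < 1
gp_large, lp_large = penalties(real, fake, np.array([2.0, 2.0]), rng)   # many norms > 1
```

A critic whose gradient norms are all below one is already 1-Lipschitz. The one-sided term leaves it unpenalized, while the two-sided term still pushes its gradients toward norm one.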