arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.15168 2026-05-15 cs.CL cs.AI cs.LG stat.ML

Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment

Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim, Jeremy C. Weiss

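AI summary: This study addresses the complementary temporal information in clinical text and structured electronic health records (EHR), proposing a retrieval-augmented multimodal alignment framework for reconstructing more precise clinical timelines. The method extracts key events from the text to build a temporal scaffold, then calibrates it with temporal information from the structured data to improve timestamp accuracy. Experiments show consistent gains in temporal concordance across multiple models while preserving event match rates, demonstrating the advantage of multimodal alignment for reconstructing clinical trajectories.

Details
Comments
Sayantan Kumar, Shahriar Noroozizadeh, Juyong Kim (authors contributed equally)
Abstract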

Reconstructing precise clinical timelines is essential for modeling patient trajectories and forecasting risk in complex, heterogeneous conditions like sepsis. While unstructured clinical narratives offer semantically rich and contextually complete descriptions of a patient's course, they often lack temporal precision and contain ambiguous event timing. Conversely, structured electronic health record (EHR) data provides precise temporal anchors but misses a substantial portion of clinically meaningful events. We introduce a retrieval-augmented multimodal alignment framework that bridges this gap to improve the temporal precision of absolute clinical timelines extracted from text. Our approach formulates timeline reconstruction as a graph-based multistep process: it first extracts central anchor events from narratives to build an initial temporal scaffold, places non-central events relative to this backbone, and then calibrates the timeline using retrieved structured EHR rows as external temporal evidence. Evaluated using instruction-tuned large language models on the i2m4 benchmark spanning MIMIC-III and MIMIC-IV, our multimodal pipeline consistently improves absolute timestamp accuracy (AULTC) and improves temporal concordance across nearly all evaluated models over unimodal text-only reconstruction, without compromising event match rates. Furthermore, our empirical gap analysis reveals that 34.8% of text-derived events are entirely absent from tabular records, demonstrating that aligning these modalities can produce a more temporally faithful and clinically informative reconstruction of patient trajectories than either source alone.

2605.15154 2026-05-15 stat.ML cs.LG

RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution

Lanxin Xiang, Liang Shi, Youhui Ye, Boyu Jiang, Dawei Zhou, Feng Guo

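AI summary: This paper proposes RoSHAP, a distributional framework and robust metric for more stable feature attribution. Building on SHAP values, the method models the distribution of feature attribution scores through bootstrap resampling and kernel density estimation, and proves that under mild regularity conditions the aggregated score is asymptotically Gaussian, which lowers the computational cost of distribution estimation. RoSHAP improves the stability of feature rankings, outperforms conventional single-run attribution measures in simulations and real-data experiments, and achieves predictive performance comparable to full-feature models with fewer features.

Details
Abstract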

Feature attribution analysis is critical for interpreting machine learning models and supporting reliable data-driven decisions. However, feature attribution measures often exhibit stochastic variation: different train-test splits, random seeds, or model-fitting procedures can produce substantially different attribution values and feature rankings. This paper proposes a framework that accounts for the stochastic nature of feature attribution, together with a robust attribution metric, RoSHAP, for stable feature ranking based on SHAP. The proposed framework models the distribution of feature attribution scores and estimates it through bootstrap resampling and kernel density estimation. We show that, under mild regularity conditions, the aggregated feature attribution score is asymptotically Gaussian, which greatly reduces the computational cost of distribution estimation. RoSHAP summarizes the distribution of SHAP values into a robust feature-ranking criterion that simultaneously rewards features that are active, strong, and stable. In simulations and real-data experiments, the proposed framework and RoSHAP outperform standard single-run attribution measures in identifying signal features. In addition, models built using RoSHAP-selected features achieve predictive performance comparable to full-feature models while using substantially fewer predictors. The proposed RoSHAP approach improves the stability and interpretability of machine learning models, enabling reliable and consistent insights for analysis.
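
The bootstrap step described in the abstract can be illustrated with a minimal sketch. It is not the RoSHAP metric: the attribution score here is a stand-in (absolute Pearson correlation rather than SHAP), and the names `abs_correlation`, `bootstrap_attributions`, and `make_toy_data` are hypothetical. It only shows how resampling turns a single attribution number into a distribution with a mean and a spread.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Stand-in attribution score: absolute Pearson correlation (NOT SHAP)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def bootstrap_attributions(features, target, n_boot=200, seed=0):
    """Return {feature name: (mean, std)} of the score over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(target)
    scores = {name: [] for name in features}
    for _ in range(n_boot):
        # Resample row indices with replacement, then rescore every feature.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [target[i] for i in idx]
        for name, column in features.items():
            xs = [column[i] for i in idx]
            scores[name].append(abs_correlation(xs, ys))
    return {
        name: (statistics.fmean(vals), statistics.stdev(vals))
        for name, vals in scores.items()
    }


def make_toy_data(n=300, seed=1):
    """Hypothetical data: one feature drives the target, the other is pure noise."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    target = [3.0 * s + rng.gauss(0, 1) for s in signal]
    return {"signal": signal, "noise": noise}, target
```

Calling `bootstrap_attributions(*make_toy_data())` gives a per-feature mean and standard deviation; a stability-aware ranking would use both rather than a single-run score.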

2605.15142 2026-05-15 stat.ME

Creating treatment and component hierarchies in component network meta-analysis

Augustine Wigle, Audrey Béliveau, Adriani Nikolakopoulou, Lifeng Lin

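AI summary: This paper studies how to construct treatment and component hierarchies in component network meta-analysis (CNMA), which estimates relative effects for multi-component treatments (such as combinations of antidepressants) and individual components (such as single antidepressants). Because the set of uniquely estimable relative effects is more complex in CNMA, hierarchy methods from standard network meta-analysis (NMA) cannot be applied directly. The paper proposes a step-by-step workflow for both frequentist and Bayesian CNMA that explicitly identifies the uniquely estimable relative effects, and illustrates it with two real-world examples.

Details
Abstract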

Component network meta-analysis (CNMA) is a statistical methodology that enables estimation of relative effects for multi-component treatments, such as combinations of antidepressants, and individual components, such as single antidepressants, by synthesizing data from multiple studies. A commonly desired output of a systematic review and meta-analysis is a hierarchy of the treatments in terms of a certain performance metric. Methods have been established for standard network meta-analysis (NMA), but have not yet been extended to CNMA. In particular, CNMA presents unique challenges because the set of relative effects that can be uniquely estimated is more complex to determine compared to standard NMA, and a hierarchy involving relative effects that are not uniquely estimable is misleading. We present a step-by-step workflow for answering treatment hierarchy questions in both frequentist and Bayesian CNMA, including explicitly identifying the uniquely estimable relative effects. We illustrate the workflow by posing multiple treatment hierarchy questions in two distinct networks, one concerning primary care of depression and one disconnected network investigating treatment for chronic lymphocytic leukemia.

2605.15115 2026-05-15 econ.EM stat.ME

A Practical Guide to Instrumental Variables Methods with Heterogeneous Treatment Effects

Tymon Słoczyński, Liyang Sun, S. Derya Uysal

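AI summary: This paper is a practical guide to instrumental variables (IV) methods, focusing on how to apply them correctly when treatment effects are heterogeneous. The authors analyze how different covariate specifications affect the resulting weighted averages of local average treatment effects (LATEs), show that parametric misspecification can undermine the reliability of causal inference, and therefore recommend flexible specifications as robustness checks. The paper also reviews formal tests of the LATE assumptions, introduces methods that are robust to violations of monotonicity, and offers guidance on software implementations.

Details
Abstract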

Instrumental variables (IV) methods are central to applied microeconomics. While classical approaches assume linear models with constant effects, recent literature has shifted toward the local average treatment effect (LATE) framework to accommodate heterogeneous treatment effects. This paper provides a practical guide to aligning empirical practice with recent theory. We first examine how different specifications with covariates lead to distinct weighted averages of covariate-specific LATEs. We then discuss how parametric misspecification can undermine the causal interpretation of these estimands and suggest flexible specifications as essential robustness checks. Finally, we review formal tests for LATE assumptions and methods robust to monotonicity violations. We provide a guide to software implementations to help researchers apply the methods in practice.
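
As a point of reference for the LATE framework, here is a minimal sketch of the simplest case: a binary instrument with no covariates, where the Wald ratio (reduced form divided by first stage) identifies the average effect for compliers under the standard LATE assumptions. The function names and the toy population are hypothetical, and the covariate specifications discussed in the abstract are not covered.

```python
def wald_late(outcome, treatment, instrument):
    """Wald ratio for a binary instrument: reduced form divided by first stage."""
    def mean_where(values, z_value):
        selected = [v for v, z in zip(values, instrument) if z == z_value]
        return sum(selected) / len(selected)

    reduced_form = mean_where(outcome, 1) - mean_where(outcome, 0)
    first_stage = mean_where(treatment, 1) - mean_where(treatment, 0)
    if first_stage == 0:
        raise ValueError("instrument does not shift treatment")
    return reduced_form / first_stage


def toy_population():
    """Hypothetical noise-free population with heterogeneous effects by type."""
    # (type, count per instrument arm, treatment effect); values are illustrative.
    groups = [("complier", 4, 2.0), ("never_taker", 2, 9.0), ("always_taker", 2, 5.0)]
    outcome, treatment, instrument = [], [], []
    for kind, count, effect in groups:
        for z in (0, 1):
            if kind == "always_taker":
                d = 1
            elif kind == "never_taker":
                d = 0
            else:
                d = z  # compliers take treatment exactly when assigned
            for _ in range(count):
                instrument.append(z)
                treatment.append(d)
                outcome.append(1.0 + effect * d)
    return outcome, treatment, instrument
```

In this toy population the Wald ratio equals the complier effect (2.0) even though always-takers and never-takers have different effects, which is the sense in which the estimand is local.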

2605.15085 2026-05-15 stat.ML cs.LG stat.AP stat.ME

From Data to Action: Accelerating Refinery Optimization with AI

Dániel Pfeifer, Ábrahám Papp, Tibor Bernáth, Tamás Zoltán Varga, Márk Czifra, Botond Szilágyi, Edith Alice Kovács

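AI summary: This paper studies how artificial intelligence can accelerate refinery optimization. To address the difficulty of interpreting and applying linear programming (LP) results in practice, it proposes combining LP with machine learning to improve decision support. The core methods are an adapted anomaly detection tool and a strategy for handling high-dimensional data, which identified business opportunities and data supply errors in refinery scheduling and planning and offer new insight into how far the optimization results can be trusted.

Details
Comments
34 pages, 17 figures
Abstract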

Refinery optimization today relies on vast amounts of data, which modern Linear Programming (LP) software can handle, but interpreting and applying the results remains challenging. Large petrochemical companies use massive models, with hundreds of thousands of input matrix elements. The LP solution is mathematically correct, but the model makes simplifications, and data supply errors may occur. Therefore, further insight is needed to trust the results. The LP solver has no memory, so additional understanding can be gained by analyzing historical data and comparing it to the current plan. To this end, machine learning approaches are suggested to support decision making based on the LP solution. Among these, Anomaly Detection tools are proposed for use in tandem with the LP output. A transformed version of the popular ECOD methodology is applied. To handle high-dimensional data, a new method is proposed that chooses the most informative pairs. These pairs are then used with two 2D Anomaly Detection algorithms, revealing several business opportunities and data supply errors in the MOL refinery scheduling and planning architecture.

2605.15082 2026-05-15 stat.ML cs.LG math.ST stat.TH

Average Gradient Outer Product in kernel regression provably recovers the central subspace for multi-index models

Libin Zhu, Damek Davis, Dmitriy Drusvyatskiy, Maryam Fazel

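AI summary: This paper studies how a learned predictor can discover low-dimensional structure in data with fewer samples than accurate prediction requires. Specifically, it considers recovering a multi-index polynomial model $f^*(x)=h(Ux)$ from finitely many data pairs, where the input affects the output only through its projection onto an unknown $r$-dimensional central subspace. The authors analyze a simple method: fit kernel ridge regression (KRR) and compute the Average Gradient Outer Product (AGOP) of the fitted predictor. They prove that its top $r$ eigenvectors recover the subspace even when the prediction error remains large. They also show that when a low-degree part of the target function contains all prediction-relevant directions, subspace recovery needs far fewer samples than accurate prediction, revealing a separation between prediction and representation.

Details
Comments
95 pages, 12 figures
Abstract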

We study a prototypical situation in which a learned predictor can discover useful low-dimensional structure in data while using fewer samples than are needed for accurate prediction. Specifically, we consider the problem of recovering a multi-index polynomial $f^*(x)=h(Ux)$, with $U\in\mathbb{R}^{r\times d}$ and $r\ll d$, from finitely many data/label pairs. Importantly, the target function depends on the input $x$ only through its projection onto an unknown $r$-dimensional central subspace. The algorithm we analyze is appealingly simple: fit kernel ridge regression (KRR) to the data and compute the Average Gradient Outer Product (AGOP) from the fitted predictor. Our main results show that under reasonable assumptions the top $r$-dimensional eigenspace of AGOP provably recovers the central subspace, even in regimes where the prediction error remains large. Specifically, if the target function $f^*$ has degree $p^*$, it is known that $n\asymp d^{p^*}$ samples are necessary for KRR to achieve accurate prediction. In contrast, we show that if a low degree $p$ component of $f^*$ already carries all relevant directions for prediction, subspace recovery occurs in the much lower sample regime $n\asymp d^{p+\delta}$ for any $\delta\in(0,1)$. Our results thus demonstrate a separation between prediction and representation, and provide an explanation for why iterative kernel methods such as Recursive Feature Machines (RFM) can be sample-efficient in practice.
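
For orientation, the AGOP of a differentiable predictor can be written out as follows. This is the usual definition, evaluated on the sample points $x_1,\dots,x_n$ (an assumption, since the abstract does not state the averaging measure), followed by the chain-rule identity that explains why its eigenspace is tied to the central subspace in the idealized case where the predictor is the target itself.

```latex
% AGOP of a fitted predictor \hat f on samples x_1, ..., x_n
\mathrm{AGOP}(\hat f) \;=\; \frac{1}{n}\sum_{i=1}^{n}
  \nabla \hat f(x_i)\,\nabla \hat f(x_i)^{\top} \;\in\; \mathbb{R}^{d\times d}.

% For the target f^*(x) = h(Ux), the chain rule gives
\nabla f^*(x) \;=\; U^{\top}\,\nabla h(Ux),
\qquad\text{so}\qquad
\mathrm{AGOP}(f^*) \;=\; U^{\top}\Big(\frac{1}{n}\sum_{i=1}^{n}
  \nabla h(Ux_i)\,\nabla h(Ux_i)^{\top}\Big)\,U .
```

The range of $\mathrm{AGOP}(f^*)$ therefore lies in the row space of $U$, and coincides with it whenever the inner $r\times r$ matrix has full rank. The abstract's result concerns the harder question of when the AGOP of the fitted KRR predictor inherits this property.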

2605.14976 2026-05-15 stat.ME econ.EM q-fin.ST

Multi-regime Markov-switching models with time-varying transition probabilities: An application to U.S. Treasury yields

Samuel Modée, Yushu Li, Sjur Westgaard, Stein Andreas Bethuelsen

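AI summary: This paper studies multi-regime Markov-switching models with time-varying transition probabilities and applies them to U.S. Treasury yields. The authors extend the two-regime common-variance setting of the Generalized Autoregressive Score (GAS) model to the general multi-regime case with regime-specific means and variances, and develop an open-source R package for data simulation and parameter estimation. The results show that regime means, variances, and transition probabilities are estimated reliably, whereas the coefficients driving the transition probabilities are harder to identify and the GAS score coefficient is non-identifiable because of a ridge in the joint likelihood. Empirically, an exogenous specification driven by the yield level fits better than the constant and lagged-change models, while the GAS specification performs poorly because of convergence problems.

Details
Comments
15 pages, 1 figure
Abstract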

This paper studies Markov-switching (MS) models with time-varying transition probabilities (TVTP) under various specifications of the transition probability matrix. In particular, we extend the two-regime common-variance setting of the Generalized Autoregressive Score (GAS) model of Bazzi et al. (2017) to the general $K$-regime case with regime-specific means and variances. Our study includes comprehensive Monte Carlo simulations, and we develop an open-source R package, multiregimeTVTP, for data simulation and parameter estimation. We find that the regime means, variances, and transition probabilities are reliably recovered, whereas the TVTP driving coefficients are harder to identify. We also find that the GAS score coefficient appears to be statistically non-identifiable, due to a ridge in the joint likelihood surface over $(\sigma^2, A)$. In addition, we find that one-step point forecasts are remarkably robust to TVTP misspecification, but filtered regime probabilities are not, so correct specification matters most for characterizing regime dynamics rather than short-horizon forecasting. An empirical application to U.S. Treasury zero-coupon yield changes at four maturities (1961-2024) shows that an exogenous specification driven by the lagged yield level dominates the constant and lagged-change models in fit, while the GAS specification fails to converge, with $\hat{A}$ collapsing to zero, reflecting the same identifiability issue observed in simulation.

2512.16768 2026-05-15 stat.ML cs.LG math.PR

On The Hidden Biases of Flow Matching Samplers

Soon Hoe Lim

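AI summary: This paper studies the hidden biases of flow matching samplers in the finite-sample setting. By replacing population expectations with sample averages and the target distribution with a finite-sample surrogate, the author obtains a hierarchy of empirical flow matching models. For affine conditional flows, the paper derives the exact empirical minimizer and identifies a smoothed plug-in regime in which the terminal distribution is exactly a kernel-mixture estimator. It identifies several sources of bias in empirical flow matching: replacing the target distribution changes the statistical target, the empirical minimizer need not be a gradient field, and the marginal path does not uniquely determine the particle dynamics.

Details
Comments
41 pages
Abstract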

Flow matching (FM) constructs continuous-time ODE samplers by prescribing probability paths between a base distribution and a target distribution. In this note, we study FM through the lens of finite-sample plug-in estimation. In addition to replacing population expectations by sample averages, one may replace the target distribution itself by a finite-sample surrogate, ranging from the empirical measure to a smoothed estimator. This viewpoint yields a natural hierarchy of empirical FM models. For affine conditional flows, we derive the exact empirical minimizer and identify a smoothed plug-in regime in which the terminal law is exactly a kernel-mixture estimator. This plug-in perspective clarifies several coupled finite-sample biases of empirical FM. First, replacing the target law by a finite-sample surrogate changes the statistical target. Second, the empirical minimizer is generally not a gradient field, even when each conditional flow is. Third, a fixed empirical marginal path does not determine a unique particle dynamics: one may add extra vector fields whose probability flux has zero divergence without changing the marginal path. For Gaussian affine conditional paths, we give explicit families of such flux-null corrections. Finally, the source distribution provides a primary mechanism controlling upper tails of kinetic energy. In particular, Gaussian bases yield exponential upper-tail bounds for instantaneous and integrated kinetic energies, whereas polynomially tailed bases yield corresponding polynomial upper-tail bounds.
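
To make the plug-in viewpoint concrete, the sketch below evaluates the marginal velocity field in one dimension when the target law is replaced by the empirical measure of a few data points and the expectation over a standard Gaussian base is taken exactly. It assumes the linear path $x_t = (1-t)x_0 + t x_1$, for which the conditional law of $x_t$ given $x_1$ is $N(t x_1, (1-t)^2)$ and the conditional velocity is $(x_1 - x)/(1-t)$. This is the familiar closed form for that setting, not necessarily the exact formula derived in the paper, and `empirical_velocity` is a hypothetical name.

```python
import math


def empirical_velocity(x, t, data):
    """Marginal velocity at (x, t) for the path x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, 1) and x1 uniform over `data`. Requires 0 <= t < 1."""
    sigma = 1.0 - t
    # Log-density of N(x; t * x1, sigma^2) up to a constant shared by all points.
    log_w = [-((x - t * x1) ** 2) / (2.0 * sigma ** 2) for x1 in data]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    # Posterior-weighted average of the conditional velocities (x1 - x) / (1 - t).
    return sum(w * (x1 - x) / sigma for w, x1 in zip(weights, data)) / total
```

With a single data point the field reduces to the conditional velocity, and as $t \to 1$ the weights concentrate on the nearest data point, which is one way to see why the empirical target changes what the sampler can generate.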

2502.14407 2026-05-15 math.ST cs.CC cs.DS math.PR stat.TH

Sharp Phase Transitions in Estimation with Low-Degree Polynomials

Youngtak Sohn, Alexander S. Wein

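AI summary: This paper studies the computational limits of low-degree polynomial algorithms for estimating hidden structure in high-dimensional problems, where statistical and computational feasibility can differ markedly. The authors introduce new techniques for proving lower bounds against low-degree polynomial algorithms in several models (planted submatrix, planted dense subgraph, the spiked Wigner model, and the stochastic block model), obtaining sharp phase transitions for estimation tasks. The work resolves several open problems and provides rigorous support for related conjectures.

Details
Comments
65 pages
Abstract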

High-dimensional planted problems, such as finding a hidden dense subgraph within a random graph, often exhibit a gap between statistical and computational feasibility. While recovering the hidden structure may be statistically possible, it is conjectured to be computationally intractable in certain parameter regimes. A powerful approach to understanding this hardness involves proving lower bounds on the efficacy of low-degree polynomial algorithms. We introduce new techniques for establishing such lower bounds, leading to novel results across diverse settings: planted submatrix, planted dense subgraph, the spiked Wigner model, and the stochastic block model. Notably, our results address the estimation task, whereas most prior work is limited to hypothesis testing, and capture sharp phase transitions such as the "BBP" transition in the spiked Wigner model (named for Baik, Ben Arous, and Péché) and the Kesten-Stigum threshold in the stochastic block model. Existing work on estimation either falls short of achieving these sharp thresholds or is limited to polynomials of very low (constant or logarithmic) degree. In contrast, our results rule out estimation with polynomials of degree $n^{\delta}$ where $n$ is the dimension and $\delta > 0$ is a constant, and in some cases we pin down the optimal constant $\delta$. Our work resolves open problems posed by Hopkins & Steurer (2017) and Schramm & Wein (2022), and provides rigorous support within the low-degree framework for conjectures by Abbe & Sandon (2018) and Lelarge & Miolane (2019).

2412.14291 2026-05-15 math.OC cs.LG stat.ML

Projected gradient methods for nonconvex and stochastic smooth optimization: new complexities and auto-conditioned stepsizes

Guanghui Lan, Tianjiao Li, Yangyang Xu

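AI summary: This paper presents a new class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex objective over a convex compact set. It introduces an "auto-conditioned" projected gradient (AC-PG) method that matches the best-known iteration complexity without requiring the Lipschitz constant of the gradient as input or any line search. The paper also extends PG methods to stochastic optimization, proposing stochastic projected gradient (SPG) and variance-reduced stochastic gradient (VR-SPG) methods with new complexity bounds under different oracle settings, and designs auto-conditioned stepsize policies for these methods with convergence guarantees.

Details
Abstract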

We present a novel class of projected gradient (PG) methods for minimizing a smooth but not necessarily convex function over a convex compact set. We first provide a novel analysis of the constant-stepsize PG method, achieving the best-known iteration complexity for finding an approximate stationary point of the problem. We then develop an "auto-conditioned" projected gradient (AC-PG) variant that achieves the same iteration complexity without requiring the input of the Lipschitz constant of the gradient or any line search procedure. The key idea is to estimate the Lipschitz constant using first-order information gathered from the previous iterations, and to show that the error caused by underestimating the Lipschitz constant can be properly controlled. We then generalize the PG methods to the stochastic setting, by proposing a stochastic projected gradient (SPG) method and a variance-reduced stochastic gradient (VR-SPG) method, achieving new complexity bounds in different oracle settings. We also present auto-conditioned stepsize policies for both stochastic PG methods and establish comparable convergence guarantees.
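
The idea of estimating the Lipschitz constant from previous iterates can be sketched as follows. This is not the AC-PG stepsize rule from the paper: it uses a plain secant estimate $\|g_{k+1}-g_k\| / \|x_{k+1}-x_k\|$ with a running maximum, on a box constraint, purely to illustrate a projected gradient loop that needs neither a known constant nor a line search. All names and the starting value `lipschitz0` are assumptions.

```python
def project_box(x, lower, upper):
    """Euclidean projection onto the box [lower, upper]^d."""
    return [min(max(xi, lower), upper) for xi in x]


def projected_gradient(grad, x0, lower, upper, lipschitz0=1.0, iters=200):
    """Projected gradient with a secant-based running estimate of the Lipschitz constant."""
    x = project_box(x0, lower, upper)
    g = grad(x)
    lipschitz = lipschitz0
    for _ in range(iters):
        trial = [xi - gi / lipschitz for xi, gi in zip(x, g)]
        x_new = project_box(trial, lower, upper)
        g_new = grad(x_new)
        step_sq = sum((a - b) ** 2 for a, b in zip(x_new, x))
        if step_sq == 0.0:
            break  # the projected step did not move: x is stationary
        grad_diff_sq = sum((a - b) ** 2 for a, b in zip(g_new, g))
        # Local curvature observed on the last step; never lower the estimate.
        lipschitz = max(lipschitz, (grad_diff_sq / step_sq) ** 0.5)
        x, g = x_new, g_new
    return x


def toy_gradient(x):
    """Gradient of the toy objective (x0 - 2)^2 + 10 * (x1 + 0.5)^2."""
    return [2.0 * (x[0] - 2.0), 20.0 * (x[1] + 0.5)]
```

On the toy objective over $[-1, 1]^2$, `projected_gradient(toy_gradient, [0.0, 0.0], -1.0, 1.0)` starts with a badly underestimated constant, corrects it within two iterations, and settles at the constrained minimizer $(1, -0.5)$. The toy objective is convex for ease of checking; the paper's setting allows nonconvex objectives.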

2406.06980 2026-05-15 stat.ME

Sensitivity Analysis for the Test-Negative Design

Soumyabrata Kundu, Peng Ding, Jingshu Wang, Xinran Li

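AI summary: This paper studies the test-negative design for evaluating vaccine effectiveness and develops sensitivity analyses for unmeasured confounding that the design may leave uncontrolled. The authors propose two approaches for assessing the effect of vaccination, measured by the causal odds ratio, among individuals with good health-care-seeking behavior, and examine the design's limits in controlling unmeasured confounding. Combining the approaches yields narrower bounds on the causal odds ratio, and the methods are applied to observational data on COVID-19 vaccine effectiveness.

Details
Abstract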

The test-negative design has become popular for evaluating the effectiveness of post-licensure vaccines using observational data. In addition to its logistical convenience in data collection, the design is also believed to control for the differential health-care-seeking behavior between vaccinated and unvaccinated individuals, an important yet often unmeasured confounder between vaccination and infection. Hence, the design has been employed routinely to monitor seasonal flu vaccines and more recently to measure COVID-19 vaccine effectiveness. Despite its popularity, the design has been questioned, in particular regarding its ability to fully control for unmeasured confounding. In this paper, we explore deviations from a perfect test-negative design, and propose various sensitivity analysis methods for estimating the effect of vaccination measured by the causal odds ratio on the subpopulation of individuals with good health-care-seeking behavior. We start with point identification of the causal odds ratio under a test-negative design, comparing different forms of identification assumptions and their corresponding estimands. We then propose two approaches for conducting sensitivity analysis, addressing the influence of unmeasured confounding in two different ways. Specifically, one approach investigates partial control for unmeasured confounding in the test-negative design, while the other examines the impact of unmeasured confounding on both vaccination and infection. Furthermore, we combine these approaches to provide narrower bounds on the true causal odds ratio, and further sharpen the bounds by restricting the treatment effect heterogeneity. Finally, we apply the proposed methods to evaluate the effectiveness of COVID-19 vaccines using observational data from test-negative designs.
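
For readers new to the design, the crude estimate it produces is an odds ratio from a 2x2 table of vaccination status by test result, with vaccine effectiveness reported as one minus that odds ratio. The sketch below shows only this unadjusted calculation; it does not implement the paper's sensitivity analysis, and the function name and the example counts are hypothetical.

```python
def tnd_vaccine_effectiveness(vacc_pos, unvacc_pos, vacc_neg, unvacc_neg):
    """Crude test-negative estimate: returns (odds ratio, 1 - odds ratio).

    The odds ratio compares the odds of vaccination among test-positive
    individuals with the odds of vaccination among test-negative individuals.
    """
    if min(vacc_pos, unvacc_pos, vacc_neg, unvacc_neg) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (vacc_pos / unvacc_pos) / (vacc_neg / unvacc_neg)
    return odds_ratio, 1.0 - odds_ratio
```

For illustrative counts of 20 vaccinated and 80 unvaccinated test-positives against 50 and 50 test-negatives, the odds ratio is 0.25 and the crude effectiveness is 0.75. Whether that number has a causal reading depends on the identification assumptions the abstract discusses.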

2605.14967 2026-05-15 cs.LG stat.ML

InfoSFT: Learn More and Forget Less with Information-Aware Token Weighting

Mahdi Sabbaghi, George Pappas, Adel Javanmard, Hamed Hassani

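AI summary: This paper proposes InfoSFT, a supervised fine-tuning method that improves learning in large language models by focusing on informative, medium-confidence tokens, without overfitting to low-likelihood samples or degrading existing capabilities. The method requires only a one-line change to the standard loss, improves generalization on math, code, and chain-of-thought tasks, and better preserves the model's pre-existing performance.

Details
Abstract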

Supervised fine-tuning (SFT) provides the standard approach for teaching LLMs new behaviors from offline expert demonstrations. However, standard SFT uniformly fits all samples, including those with low likelihood under the base model, which can disproportionately drive training updates toward overfitting specific samples rather than learning the target behavior. Moreover, adapting to these unlikely samples induces substantial policy shifts that degrade prior capabilities. Existing methods mitigate this by filtering, regenerating, or down-weighting low-likelihood data. In doing so, they often suppress precisely the novel behaviors the base model has yet to learn. We propose InfoSFT, a principled weighting scheme for the SFT objective that concentrates learning signals on maximally informative, medium-confidence tokens: those neither overly familiar to the base model nor so unlikely that they cause instability. Requiring only a one-line modification to the standard token-wise loss, InfoSFT demonstrably improves generalization over vanilla SFT and likelihood-weighted baselines across math, code, and chain-of-thought tasks with diverse model families, while better preserving pre-existing capabilities.

2605.14952 2026-05-15 stat.ME stat.AP

Generalizing conditional average treatment effects from nested randomized trials to all trial-eligible individuals

Lan Wen, Issa J. Dahabreh, Yu-Han Chiu

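AI summary: This paper studies how to generalize conditional average treatment effects (CATEs) from nested randomized trials to all trial-eligible individuals. The authors combine semiparametric theory with flexible estimation: nuisance functions are estimated with data-adaptive methods, pseudo-outcomes are constructed from conditional influence functions, and the CATE function is estimated by local linear (kernel) regression. Sample splitting and cross-fitting reduce overfitting bias and support asymptotically valid inference, and finite-sample performance is assessed through simulations and an application to the Coronary Artery Surgery Study (CASS).

Details
Abstract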

Randomized controlled trials often enroll participants whose characteristics differ from those of a target population, which can limit the generalizability of the estimated treatment effects when effect modifiers differ across populations. While existing generalizability methods primarily focus on estimating the average treatment effect (ATE) in the target population, such summaries may obscure important heterogeneity that is relevant for clinical and policy decision-making. In this work, we illustrate an approach for estimating the conditional average treatment effect (CATE) in a target population of trial-eligible individuals as a function of prespecified effect modifiers within a nested trial setting. Our approach combines semiparametric theory with flexible estimation: we first estimate nuisance functions using data-adaptive methods and construct pseudo-outcomes from conditional influence functions, then estimate the CATE function via local linear (kernel) regression. Sample splitting and cross-fitting are used to reduce overfitting bias and ensure asymptotically valid inference. Finite-sample performance is assessed via simulations and illustrated in the Coronary Artery Surgery Study (CASS).

2605.14936 2026-05-15 stat.ME

Relaxation of Projected Prior with Continuous Gap Shrinkage

Leo L Duan, Sunghyun Cho, Mingzhang Yin

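AI summary: This paper proposes a continuous relaxation of projected priors to address the high cost that conventional projected priors can impose on posterior computation. The key idea is to quantify the duality gap between the primal projection loss and the dual objective and to place a probabilistic prior that shrinks this gap toward zero, giving an approximate projection without an iterative optimization subroutine. The resulting prior has a tractable form and is cheap to compute, shows competitive posterior contraction and broad applicability, and is demonstrated on a marketing data analysis.

Details
Abstract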

Projected priors were originally introduced to accommodate parameter constraints, but have recently regained popularity due to their ability to assign probability mass to low-dimensional parameter sets, such as the spaces of sparse vectors, directed acyclic graphs, or transport plans. When employed as a transformation of random variables, projection is especially useful, since its contraction property not only preserves probability concentration, but also often preserves differentiability for gradient-based posterior computation. On the other hand, unless the projection can be obtained by some non-iterative algorithm, posterior computation can be expensive because it requires nesting an iterative optimization routine within each Markov chain Monte Carlo iteration. In this article, inspired by the success of continuous shrinkage models as replacements for discrete spike-and-slab priors, we propose a continuous relaxation of projected priors. The key idea is to quantify the duality gap between the primal projection loss and the dual objective, and impose a probabilistic prior that shrinks this gap toward zero. The resulting gap-shrinkage prior has a tractable form, does not require running an optimization subroutine inside each posterior update, and puts probability mass near the exact projection. We demonstrate useful properties of gap-shrinkage priors, including connections to global-local shrinkage priors, broad applicability to generalized projection functions, and competitive performance in posterior contraction. We apply the gap-shrinkage model to a marketing data analysis aimed at identifying important predictor effects on multivariate grocery-shopping decisions.

2605.14917 2026-05-15 cs.LG cs.CE cs.IT math.IT stat.ML

A Mutual Information Lower Bound for Multimodal Regression Active Learning

Leonardo Ferreira Guilhoto, Akshat Kaushal, Paris Perdikaris

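AI summary: This paper addresses active learning for multimodal regression and proposes a new acquisition function, MI-LB, that captures model uncertainty more accurately. It introduces a Two-Index framework that separates epistemic from aleatoric uncertainty and derives an information-theoretic lower bound on mutual information as the acquisition objective. In experiments on benchmarks featuring multimodal systems, the method matches or beats the baselines evaluated.

Details
Abstract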

Active learning for continuous regression has lacked an acquisition function that targets epistemic uncertainty when the predictive distribution is multimodal: variance misses modal disagreement, and information-theoretic targets like BALD are designed for discrete outputs. We introduce a Two-Index framework that makes this separation explicit: one stochastic index selects among competing model hypotheses (epistemic source), while a second governs within-hypothesis randomness (aleatoric source). An entropy decomposition within the framework identifies the mutual information between the output and the epistemic index as a principled acquisition objective, and we prove this quantity vanishes as the model is trained on growing datasets, confirming that it captures exactly the uncertainty data can resolve. Because this mutual information is intractable for continuous outputs, we derive the Mutual Information Lower Bound (MI-LB) acquisition function, a closed-form approximation for Mixture Density Network ensembles. On benchmarks featuring multimodal systems, MI-LB matches or beats every baseline evaluated and is the only method to do so consistently: geometric and Fisher-based baselines compete only when the input space already encodes the multimodality, and collapse otherwise.

2605.14840 2026-05-15 cs.LG math.OC stat.ML

In-Context Learning for Data-Driven Censored Inventory Control

Sohom Mukherjee, Anh-Duy Pham, Richard Pibernik, Yunbei Xu

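AI summary: This paper studies data-driven inventory control with decision-dependent censoring, where the order quantity determines whether demand is fully observed. It proposes in-context generative posterior sampling (ICGPS), which combines offline meta-training of a generative model with online autoregressive generation. In theory, the Bayesian regret of ICGPS exceeds that of a Thompson sampling (TS) benchmark with the ideal completion kernel by at most a penalty term that scales with the square root of the time horizon. In practice, the method is instantiated as ChronosFlow, is robust to prior mismatch and distribution shift, and outperforms conventional baselines in experiments on simulated and real-world datasets.

Details
Abstract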

We study inventory control with decision-dependent censoring, focusing on the censored or repeated newsvendor (R-NV), where each order quantity determines whether demand is fully observed or censored by sales. Existing approaches based on parametric Thompson sampling (TS) can be brittle under prior mismatch, while offline imputation methods need not transfer to online learning. Motivated by the predictive view of decision making, we combine these ideas by taking oracle actions on learned completions of latent demand. We propose in-context generative posterior sampling (ICGPS), which uses modern generative models that are meta-trained offline and deployed online by in-context autoregressive generation. Theoretically, we show that the Bayesian regret of ICGPS with a learned completion kernel is bounded by the Bayesian regret of a TS benchmark with the ideal completion kernel plus a deployment penalty scaling as $\sqrt{T}$ times the square root of the completion mismatch. This yields a plug-in template for operational problems with known TS regret bounds. For R-NV, we derive sublinear Bayesian regret by reducing censored feedback to bandit convex optimization feedback. We also show that, under reasonable coverage and stability assumptions, the online completion mismatch is controlled by the offline censored predictive mismatch, so offline predictive quality transfers to online performance. Practically, we instantiate ICGPS with ChronosFlow, which combines a frozen time-series transformer backbone with a trainable conditional normalizing-flow head for fast censoring-consistent sampling. In benchmark experiments, ChronosFlow-ICGPS matches correctly specified TS, outperforms myopic and UCB-style baselines, and is robust to prior mismatch and distribution shift. ChronosFlow-ICGPS also performs well for the real-world SuperStore dataset, especially under heavy censoring.
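
Two ingredients of the repeated newsvendor can be sketched briefly: what the decision maker actually observes under censoring, and the oracle action once a set of completed (uncensored) demand samples is available. The sketch assumes the textbook newsvendor with per-unit underage and overage costs, whose optimal order is the critical-ratio quantile of demand. The generative completion step of ICGPS is not modeled, and all names here are hypothetical.

```python
import math


def censored_observation(demand, order):
    """Return (sales, censored). Sales equal min(demand, order); the demand is
    treated as censored whenever it reaches the order quantity (a stockout)."""
    sales = min(demand, order)
    return sales, demand >= order


def newsvendor_order(demand_samples, underage_cost, overage_cost):
    """Oracle action on sampled demand: the empirical critical-ratio quantile,
    i.e. the smallest sample q whose empirical CDF is at least b / (b + h)."""
    ratio = underage_cost / (underage_cost + overage_cost)
    ordered = sorted(demand_samples)
    index = max(0, math.ceil(ratio * len(ordered)) - 1)
    return ordered[index]
```

A low order makes censoring more likely and so hides the upper tail of demand, which is why the order quantity in one period affects what can be learned for the next.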

2605.14828 2026-05-15 stat.ML cs.LG stat.ME

K-Models: a Flexible and Interpretable Method for Ordinal Clustering with Application to Antigen-Antibody Interaction Profiles

Giulia Patanè, Alessandra Menafoglio, Alexander Krauth, Peter Fechner, Luca Dede', Bianca Maria Colosimo, Federica Nicolussi

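AI summary: This study proposes K-Models, a new clustering method for functional data with ordinal relationships among clusters, designed to improve interpretability while maintaining clustering performance. By imposing ordinal constraints, the method estimates key elements of the random process generating the observed functional data and thereby identifies the data's underlying structure more accurately. Simulations and a real-world application (reflectometric sensor data on antigen-antibody interactions) support the method's effectiveness for data with a latent ordinal structure.

Details
Abstract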

Existing clustering methods for functional data often prioritize partitioning accuracy over interpretability, making it challenging to extract meaningful insights when the data-generating process follows a specific underlying structure and an ordinal relationship among clusters is suspected. This work introduces K-Models, a novel framework that integrates ordinal constraints and estimates key underlying elements of the random process generating the observed functional profiles, improving both interpretability and structure identification. The proposed method is evaluated through simulations and real-world applications. In particular, it is tested on Region of Interest (ROI) curves, which represent reaction profiles from a reflectometric sensor monitoring biomolecular interactions, such as antigen-antibody binding. These curves represent changes in reflected light intensity over time at multiple measurement spots with immobilized antigens during analyte exposure, capturing the binding dynamics of the system. The goal is to identify intrinsic signal patterns solely from the observed dynamics, making this dataset an ideal benchmark for assessing the added interpretability of the proposed approach. By incorporating structural assumptions into the clustering process, K-Models enhances interpretability while maintaining performance comparable to state-of-the-art techniques, providing a valuable tool for analyzing functional data with an underlying ordinal structure.

2605.14796 2026-05-15 stat.ME

A Class of Higher-Order INAR Random Fields for Poisson Counts and Beyond

Christian H. Weiß, Angelika Silbernagel

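AI summary: This paper proposes a new class of combined integer-valued autoregressive (CINAR) random-field models for count data, addressing the difficulty existing models have in characterizing the stationary marginal distribution and computing conditional probabilities. The models retain the classical autoregressive dependence structure while allowing the marginal distribution to be chosen from the wide class of discrete self-decomposable distributions, including the Poisson and negative binomial. The paper derives the key stochastic properties of CINAR models, discusses special cases and extensions, and demonstrates practical relevance with an agricultural data application.

Details
Abstract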

Existing integer-valued autoregressive (INAR) models for count random fields suffer from difficulties in characterizing the stationary marginal distribution and in computing conditional probabilities (as required for likelihood inference). To overcome these drawbacks, the novel class of combined INAR (CINAR) models is proposed, which both exhibits the classical autoregressive dependence structure and allows the marginal distribution to be specified within the wide class of discrete self-decomposable distributions. In particular, CINAR random fields can be equipped with a Poisson or negative-binomial marginal distribution. The CINAR's key stochastic properties are derived (including a simple expression for conditional probabilities), and special cases as well as possible extensions are discussed. Approaches for parameter estimation are developed and investigated, and the practical relevance of the novel CINAR family is demonstrated by an agricultural data application.
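
As background, the classical first-order INAR model for a time series (not the CINAR random-field construction of the paper) can be simulated in a few lines. It uses binomial thinning, $X_t = \alpha \circ X_{t-1} + \varepsilon_t$, with Poisson($\lambda$) innovations, in which case the stationary marginal is Poisson with mean $\lambda / (1 - \alpha)$. The function names and parameter values are illustrative assumptions.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson sample by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def binomial_thinning(rng, count, alpha):
    """Each of `count` units survives independently with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate n steps of X_t = thinning(X_{t-1}) + Poisson(lam) innovation."""
    rng = random.Random(seed)
    # Start from the stationary Poisson law with mean lam / (1 - alpha).
    x = poisson_draw(rng, lam / (1.0 - alpha))
    path = []
    for _ in range(n):
        x = binomial_thinning(rng, x, alpha) + poisson_draw(rng, lam)
        path.append(x)
    return path
```

With `alpha = 0.5` and `lam = 2.0` the sample mean of a long path should be close to 4, the stationary mean. Extending this one-directional recursion to a spatial lattice is where the difficulties named in the abstract arise.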

2605.14762 2026-05-15 stat.ME math.ST stat.TH

Differentially private inference framework of Riemannian manifold data

Yangdi Jiang, Xiaotian Chang, Qirui Hu

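AI summary: This paper proposes a systematic differentially private inference framework for non-Euclidean data. It designs two differentially private mechanisms for the Fréchet mean and variance of Riemannian manifold-valued data, with privacy budgets calibrated analytically to the geometry of the manifold. It further establishes consistency and central limit theorems for the proposed estimators, supporting statistical inference under privacy protection, and provides comprehensive implementation guidelines and feasible procedures. Experiments show that the method performs well on medical image and sociological datasets.

Details
English abstract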

We propose a novel and systematic differentially private (DP) inference framework for non-Euclidean data. First, we design two types of DP mechanisms for the Fréchet mean and variance with i.i.d. Riemannian manifold-valued data, tailored to different geometric structures and accompanied by analytic privacy budgets calibrated to the geometry of the underlying manifold. Second, we establish the consistency and central limit theorems (CLTs) of the proposed DP estimators, enabling a suite of statistical inference procedures under privacy protection. Furthermore, we provide comprehensive implementation guidelines and feasible procedures, including consistent DP estimators of the asymptotic variance in the CLTs. Extensive numerical experiments support the proposed methodologies. Finally, we demonstrate the effectiveness of our approach on real-world medical image and sociological datasets lying on two representative manifolds.

2605.14663 2026-05-15 math.OC math.PR stat.ML

Optimal Asymptotic Rates for (Stochastic) Gradient Descent under the Local PL-Condition: A Geometric Approach

Sebastian Kassing, Thomas Kruse

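AI summary: This paper studies the local convergence behavior of gradient descent and stochastic gradient descent on $C^2$ functions satisfying the Polyak-Lojasiewicz (PL) condition, under a multiplicative gradient noise model motivated by overparameterized neural networks. Using a geometric interpretation of the PL condition, the authors prove a simple yet surprising result: even in the non-convex setting, the asymptotic convergence rate of (S)GD matches the rate for strongly convex quadratics. The result shows that SGD in non-convex optimization attains an optimal asymptotic rate similar to that of convex problems.

Details
English abstract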

Stochastic gradient descent (SGD) has been studied extensively over the past decades due to its simplicity and broad applicability in machine learning. In this work, we analyze the local behavior of gradient descent and stochastic gradient descent for minimizing $C^2$-functions that satisfy the Polyak-Lojasiewicz (PL) inequality and under a multiplicative gradient noise model motivated by overparameterized neural networks. Using a geometric interpretation of the PL-condition, we prove a simple yet surprising fact: in this possibly non-convex setting, the asymptotic convergence rate of (S)GD matches the rate obtained for strongly convex quadratics.
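
The statement can be illustrated numerically on a toy problem. The sketch below runs plain gradient descent on $f(x) = x^2 + 3\sin^2(x)$, a standard example of a non-convex function satisfying a PL inequality. It then compares the late-stage contraction of $f$ with the rate gradient descent attains on the quadratic $4x^2$ defined by the Hessian at the minimizer. The objective, step size, and starting point are assumptions chosen for illustration and are not taken from the paper. No gradient noise is simulated.

```python
import math


def f(x):
    """Non-convex objective with global minimum f(0) = 0."""
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    """Derivative of f: 2x + 6 sin(x) cos(x) = 2x + 3 sin(2x)."""
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def gradient_descent(x0, step, iters):
    """Return the full list of iterates of plain gradient descent."""
    xs = [x0]
    for _ in range(iters):
        xs.append(xs[-1] - step * grad_f(xs[-1]))
    return xs


step = 0.1  # below 1/L, where L = max |f''| = 8
xs = gradient_descent(x0=2.5, step=step, iters=40)
values = [f(x) for x in xs]

# Per-step contraction of f late in the run.
ratio = values[-1] / values[-2]
# Contraction of gradient descent on q(x) = 4 x^2, since f''(0) = 8.
quadratic_ratio = (1.0 - step * 8.0) ** 2
print(ratio, quadratic_ratio)
```

The run starts in a region where $f'' < 0$, so the early iterates do not behave like a quadratic. Only the asymptotic contraction factor is compared, which is the regime the abstract addresses.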

2605.14647 2026-05-15 stat.ME stat.AP

Multiscale Topological Inference for Marked Point Processes via Euler Characteristic Envelopes

Matthias Eckardt, Mehdi Moradi

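AI summary: This paper proposes a multiscale topological inference framework based on Euler characteristic envelopes for analyzing complex spatial structure and attribute dependence in marked point processes. By introducing a mark-weighted distance and non-parametric global envelope tests, the method captures higher-order topological structure and non-linear interactions between marks and space, and it enables formal testing of the random labeling hypothesis. It also decomposes the local topological signal via Z-scores to identify structural hubs and topological barriers. The method is sensitive and robust, and it offers a comprehensive and interpretable tool for analyzing structural dependence in marked spatial data.

Details
English abstract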

The statistical analysis of marked point processes requires disentangling complex spatial arrangements from attribute-dependent interactions. While classical summary statistics are effective for second-order dependencies, they frequently fail to capture higher-order topological structures and non-linear interactions between marks and space. In this work, we propose a novel multiscale topological inference framework for marked point processes by integrating mark-weighted filtrations with Euler Characteristic envelopes. We redefine the underlying metric space using an exponential mark-weighted distance, which modulates connectivity based on attribute similarity, effectively accelerating the merger of connected components among homophilic neighbors. To ensure rigorous statistical inference, we apply non-parametric global envelope tests to the resulting Euler Characteristic Curves, allowing for formal hypothesis testing against the null model of random labeling. Furthermore, we introduce a local decomposition of the topological signal via Z-scores at the critical filtration scale to identify and localize structural hubs and topological barriers. Systematic simulations across various scenarios demonstrate the framework's high specificity and sensitivity to attribute-space dependencies while remaining robust against purely geometric effects. This methodology provides a comprehensive and interpretable toolkit for identifying, quantifying, and localizing complex structural dependencies in marked spatial data, bridging the gap between topological data analysis and classical point process statistics.

2605.14632 2026-05-15 cs.LG stat.AP

DRL-STAF: A Deep Reinforcement Learning Framework for State-Aware Forecasting of Complex Multivariate Hidden Markov Processes

Manrui Jiang, Jingru Huang, Yong Chen, Chen Zhang

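AI summary: This study proposes DRL-STAF, a deep reinforcement learning framework for state-aware forecasting of complex multivariate hidden Markov processes. The method models nonlinear observations with deep neural networks and estimates discrete hidden states with reinforcement learning. This overcomes the limitations of traditional hidden Markov models with respect to nonlinear emissions and scalability, and it reduces reliance on predefined state-transition structures. Experiments show that DRL-STAF outperforms existing methods in both forecasting performance and hidden-state estimation.

Details
English abstract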

Forecasting multivariate hidden Markov processes is challenging due to nonlinear and nonstationary observations, latent state transitions, and cross-sequence dependencies. While deep learning methods achieve strong predictive accuracy, they typically lack explicit state modeling, whereas Hidden Markov Models (HMMs) provide interpretable latent states but struggle with complex nonlinear emissions and scalability. To address these limitations, we propose DRL-STAF, a Deep Reinforcement Learning based STate-Aware Forecasting framework that jointly predicts next-step observations and estimates the corresponding hidden states for complex multivariate hidden Markov processes. Specifically, DRL-STAF models complex nonlinear emissions using deep neural networks and estimates discrete hidden states using reinforcement learning, reducing the reliance on predefined transition structures and enabling flexible adaptation to diverse temporal dynamics. In particular, DRL-STAF mitigates the state-space explosion encountered by typical multivariate HMM-based methods. Extensive experiments demonstrate that DRL-STAF outperforms HMM variants, standalone deep learning models, and existing DL-HMM hybrids in most cases, while also providing reliable hidden-state estimates.

2605.14599 2026-05-15 cs.LG cs.AI stat.ML

Fast Rates for Inverse Reinforcement Learning

Andreas Schlaginhaufen, Maryam Kamgarpour

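AI summary: This paper studies entropy-regularized min-max inverse reinforcement learning (Min-Max-IRL) in finite-horizon Markov decision processes and establishes new structural and statistical properties for linear reward classes. The authors show that maximum likelihood estimation and Min-Max-IRL are equivalent at the population level, and also at the empirical level under deterministic dynamics. Exploiting pseudo-self-concordance of the Min-Max-IRL loss, they show that the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at rate $\mathcal{O}(n^{-1})$. The results hold under model misspecification and require no exploration assumptions. The authors also extend reward-identifiability results to general Borel spaces and derive new properties of the derivatives of the soft-optimal value function with respect to reward parameters.

Details
English abstract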

We establish novel structural and statistical results for entropy-regularized min-max inverse reinforcement learning (Min-Max-IRL) with linear reward classes in finite-horizon MDPs with Borel state and action spaces. On the structural side, we show that maximum likelihood estimation (MLE) and Min-Max-IRL are equivalent at the population level, and at the empirical level under deterministic dynamics. On the statistical side, exploiting pseudo-self-concordance of the Min-Max-IRL loss, we prove that both the trajectory-level KL divergence and the squared parameter error in the Hessian norm decay at the fast rate $\mathcal{O}(n^{-1})$, where $n$ is the number of expert trajectories. Our guarantees apply under misspecification and require no exploration assumptions. We further extend reward-identifiability results to general Borel spaces and derive novel results on the derivatives of the soft-optimal value function with respect to reward parameters.

2605.14575 2026-05-15 econ.GN q-fin.EC stat.ME

The Asset Price Channel of Monetary Policy: Evidence from Regional Stock-Market Developments in the Successor States of Former Yugoslavia

Stefan Tanevski

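AI summary: This study empirically examines whether a sectoral asset price channel of monetary policy exists in the region of the six republics of former Yugoslavia. Constructing regional sectoral stock indices and applying a panel vector autoregressive model and Pooled Mean Group estimation, it finds a clear asset price channel in the finance and telecom sectors, possibly because multinational corporate networks foster sub-market regionalization. By contrast, the manufacturing and electricity sectors show no such effect, suggesting that local stock markets remain fragmented and that more efficient regional market integration or closer cooperation among exchanges is needed.

Details
English abstract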

The aim of this study is to empirically investigate the existence of a sectoral asset price channel of monetary policy in the region of the six republics of former Yugoslavia. The study constructs sectoral indices for the entire region, building on the idea that one regional stock exchange may provide more efficiency for the listed companies in the region, while the relevance of monetary policy for it may be sector-specific. We employ a panel vector autoregressive model to observe impulse responses of sectoral indices to innovations in monetary policy, and then disentangle the long-run from the short-run relationships per index through Pooled Mean Group estimation. Overall, we document the presence of the asset price channel in the finance and telecom sectors, likely driven by the established multinational corporate networks fostering sub-market regionalization. Yet this is not the case for the manufacturing and electricity sectors, which may imply that local stock markets are still too fragmented and that there is certainly room for a more efficient regional stock market, either in the true sense of the word or, more realistically, through enhanced regional cooperation of the stock exchanges.

2605.14567 2026-05-15 stat.ML cs.LG math.PR math.ST stat.TH

Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model

Arie Wortsman-Zurich, Hugo Tabanelli, Yatin Dandi, Florent Krzakala, Bruno Loureiro

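AI summary: This paper proposes a simple mechanism explaining how feature learning in multi-layer networks gives rise to scaling laws. It studies a high-dimensional hierarchical target function that is globally of high degree but can be represented by a combination of latent compositional features whose weights decay as a power law. A layer-wise spectral algorithm recovers these latent features sequentially: strong features are detectable at small sample sizes, while weaker features require more data. The theoretical analysis shows an explicit power-law decay of the prediction error, and numerical experiments confirm the sequential recovery of features and the performance gap relative to non-hierarchical methods.

Details
English abstract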

We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination of latent compositional features whose weights decrease as a power law. We show that a layer-wise spectral algorithm adapted to this compositional structure achieves improved scaling relative to shallow, non-adaptive methods, and recovers the latent directions sequentially: strong features become detectable at small sample sizes, while weaker features require more data. We prove sharp feature-wise recovery thresholds and show that aggregating these transitions yields an explicit power-law decay of the prediction error. Technically, the analysis relies on random matrix methods and a resolvent-based perturbation argument, which gives matching upper and lower bounds for individual eigenvector recovery beyond what standard gap-based perturbation bounds provide. Numerical experiments confirm the predicted sequential recovery, finite-size smoothing of the thresholds, and separation from non-hierarchical kernel baselines. Together, these results show how smooth scaling laws can emerge from a cascade of sharp feature-learning transitions.

2605.14524 2026-05-15 stat.ML cs.LG

Large Dimensional Kernel Ridge Regression: Extending to Product Kernels

Yang Zhou, Yicheng Li, Yuqian Cheng, Qian Lin

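AI summary: This paper studies the generalization error of large dimensional kernel ridge regression (KRR) under a broader class of kernels, extending earlier results that were limited to inner product kernels on the sphere. The authors introduce a new family of large dimensional kernels and derive the corresponding convergence rates of the generalization error. They find that minimax optimality, the saturation effect, a periodic plateau in the convergence rate, and multiple-descent behavior with respect to sample size all persist in this more general kernel setting, which broadens the understanding of large dimensional KRR.

Details
English abstract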

Recent studies have reported $\textit{saturation effects}$ and $\textit{multiple descent behavior}$ in large dimensional kernel ridge regression (KRR). However, these findings are predominantly derived under restrictive settings, such as inner product kernels on the sphere or strong eigenfunction assumptions like hypercontractivity. Whether such behaviors hold for other kernels remains an open question. In this paper, we establish a broad, new family of large dimensional kernels and derive the corresponding convergence rates of the generalization error. As a result, we recover key phenomena previously associated with inner product kernels on the sphere, including: $i)$ the $\textit{minimax optimality}$ when the source condition $s\le 1$; $ii)$ the $\textit{saturation effect}$ when $s>1$; $iii)$ a $\textit{periodic plateau phenomenon}$ in the convergence rate and a $\textit{multiple-descent behavior}$ with respect to the sample size $n$.

2605.14491 2026-05-15 stat.ME math.ST stat.TH

Adaptive Long-Run Variance Thresholding for Sparse Covariance Estimation in High-Dimensional Time Series

Wenhao Zhang, Zhaoxing Gao

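AI summary: This paper studies sparse covariance matrix estimation for high-dimensional time series. Because thresholding methods designed for independent data can fail for time series, it proposes an adaptive thresholding method that incorporates long-run variance to account for temporal dependence. The estimator is consistent under the spectral norm, attains the optimal convergence rate over a class of sparse covariance matrices, and accurately recovers the positions of the nonzero entries. Simulations and real-data applications show that the method compares favorably with existing approaches in estimation accuracy and support recovery.

Details
English abstract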

Estimating a sparse covariance matrix is a fundamental problem in high-dimensional statistics. However, thresholding methods developed for independent data are generally not directly applicable to high-dimensional time series, where temporal dependence alters the stochastic behavior of sample covariance estimators. This paper studies sparse covariance matrix estimation for high-dimensional time series under weak dependence. We propose a thresholding procedure that incorporates long-run variance into the construction of entry-specific thresholds, thereby adapting to temporal dependence. Under suitable regularity conditions, we show that the proposed estimator is consistent under the spectral norm and attains the optimal convergence rate over a class of sparse covariance matrices. We further establish support recovery consistency for identifying the nonzero entries of the covariance matrix. In addition, we show that universal and adaptive thresholding methods developed for independent data may fail to recover the support consistently in the presence of autocorrelation. Simulation studies demonstrate that the proposed method compares favorably with existing thresholding estimators in terms of both estimation accuracy and support recovery. Applications to gene expression data and stock return data further illustrate its practical usefulness.

2605.14463 2026-05-15 stat.ME

KAP-CPD: Kernel Aggregation for Change-Point Detection in Dynamic Networks

Mingxuan Sun, Hao Chen

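AI summary: This paper proposes KAP-CPD, a kernel-aggregation method for change-point detection in dynamic networks, aimed at the difficulty of choosing a suitable kernel when the change pattern is unknown. By aggregating information from multiple kernels, the method adapts to diverse change patterns, and it does not assume a specific network distribution, which makes it broadly applicable. To improve computational efficiency, the authors also propose a fast analytic testing procedure, KAPf-CPD, which substantially reduces computation time on long network sequences. The methods are evaluated through simulations and real-world data.

Details
English abstract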

Change-point detection in dynamic networks has received much attention due to its broad applications in social networks and biological systems. Kernel-based methods have shown strong potential for this problem. However, their performance can depend sensitively on the choice of kernel, and selecting an appropriate kernel is challenging when the underlying change pattern is unknown. Motivated by this challenge, we propose KAP-CPD, a new kernel-based testing framework for change-point detection in dynamic networks. KAP-CPD aggregates information from multiple kernels, allowing it to adapt to diverse change patterns. The proposed method does not assume a specific underlying network distribution and achieves strong empirical power across a wide range of network change scenarios. To improve scalability, we further develop a fast analytic testing procedure, KAPf-CPD, that substantially reduces computation time for long network sequences compared with permutation-based alternatives and current state-of-the-art methods. We evaluate our proposed framework through extensive simulations and real-world data on email communication networks and brain functional connectivity networks.

2605.14453 2026-05-15 stat.ME

Estimating Precision Matrices for High-Dimensional Interval-Valued Data

Zhongfeng Qin, Hao Xu, Wenhao Cui, Wan Tian

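AI summary: This paper studies precision matrix estimation for high-dimensional interval-valued data, in which each observation is an interval rather than a single value and traditional methods fall short. The authors propose a new estimation framework that assumes the upper and lower interval bounds share the same conditional dependency structure and formulates an interval graphical lasso objective. The method is computationally efficient, the sparsity and consistency of the estimator are proved, and experiments show that it outperforms existing methods in estimation precision and interpretability.

Details
English abstract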

In the field of statistical learning and data analysis, estimating precision matrices (i.e., the inverse of covariance matrices) is a critical task, particularly for understanding dependency structures among variables. However, traditional methods often fall short when dealing with high-dimensional interval-valued data, where each observation is represented as an interval rather than a single point. This paper proposes a novel framework for estimating precision matrices in such contexts, addressing the unique challenges posed by the interval nature of the data. Specifically, we assume that the upper and lower bounds of the intervals share the same conditional dependency structure, and then formulate the interval graphical lasso optimization objective to estimate the precision matrix. At the optimization level, we provide an efficient computational approach, while at the theoretical level, we prove the sparsity and consistency of the estimator. Experimental results on simulated studies and real data applications demonstrate the superiority of the proposed method in terms of estimation precision and interpretability.

2605.14444 2026-05-15 stat.ME

Inlier Recovery for Robust Registration via Gram-Matrix Overlap

Ruizi Wu, Yuehaw Khoo, Wanjie Wang

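AI summary: This paper studies inlier recovery in robust point-set registration under noise and outliers by comparing two datasets through the Hadamard product of their Gram matrices. The approach turns inlier identification into a structured recovery problem, avoiding direct optimization over the rotation group, and yields two algorithms: leading-eigenvector matching and row-sum matching. The methods achieve exact recovery even when the inlier fraction is low or vanishing, showing good robustness and practicality.

Details
English abstract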

Robust point-set registration in the presence of noise and outliers is challenging because the matched points (inliers) must be identified before reliable alignment can be performed. Existing robust registration methods typically optimize over the transformation space and are often designed for regimes with a nonvanishing fraction of inliers. In this paper, we study the inlier recovery problem arising in robust registration by comparing two datasets through the Hadamard product of their Gram matrices. This formulation converts the inlier identification into a structured recovery problem and avoids direct optimization over the rotation group. Based on this idea, we develop two methods: an eigenvector matching method based on the leading eigenvector of the Gram-matrix overlap, and a row-sum matching method based on aggregated entrywise comparison. We show that the eigenvector method achieves weak recovery when the dimension and sample size are of the same order, while the row-sum method achieves exact recovery under a broader range of dimensional scalings. In particular, when the dimension is comparable to the sample size, exact recovery is possible even when the inlier fraction vanishes, with the number of inliers as small as order $\sqrt{n}$, up to logarithmic factors. We also discuss a parallel implementation for large-scale settings. Numerical experiments on brain imaging data and image examples demonstrate that the proposed methods effectively identify matched structure under substantial corruption.

2605.14343 2026-05-15 cs.LG math.ST stat.ML stat.TH

Nearest-Neighbor Radii under Dependent Sampling

Yuanyuan Gao, Yilong Hou, Zhexiao Lin

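AI summary: This paper studies nearest-neighbor radii under dependent sampling, going beyond the usual independent-sampling assumption. For strongly mixing observations, it establishes almost sure convergence under polynomial mixing and sharp non-asymptotic moment bounds under geometric mixing. The bounds depend on the local intrinsic dimension rather than the ambient dimension, so they apply to high-dimensional data concentrated near lower-dimensional manifolds. Experiments support the theory, showing that nearest-neighbor geometry remains informative under dependent sampling.

Details
Comments
33 pages
English abstract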

Nearest-neighbor methods are fundamental to classical and modern machine learning, yet their geometric properties are typically analyzed under independent sampling. In this paper, we study the nearest-neighbor radii under dependent sampling. We consider strong mixing dependent observations and ask whether dependence changes the scale of nearest-neighbor neighborhoods. We establish distribution-free almost sure convergence under polynomial mixing and sharp non-asymptotic moment bounds under geometric mixing. The moment bounds depend on the local intrinsic dimension rather than the ambient dimension, making the results applicable to high-dimensional data concentrated near lower-dimensional manifolds. Synthetic experiments and real-world time-series benchmarks support the theory, showing that nearest-neighbor geometry remains informative under dependent sampling.
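
A minimal sketch of the quantity under study, assuming a one-dimensional stationary Gaussian AR(1) sequence as the dependent sample (such sequences are geometrically strong mixing). The function names, the autoregressive coefficient, and the query point are illustrative choices, not the paper's experimental setup.

```python
import math
import random


def ar1_sample(n, phi, seed=0):
    """Stationary Gaussian AR(1): X_t = phi * X_{t-1} + noise, with unit-variance noise."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi * phi))  # draw from the stationary law
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def knn_radius(sample, query, k):
    """Distance from `query` to its k-th nearest neighbour in `sample`."""
    return sorted(abs(v - query) for v in sample)[k - 1]


series = ar1_sample(n=4000, phi=0.8)
radii = {n: knn_radius(series[:n], query=0.0, k=5) for n in (250, 1000, 4000)}
for n, r in radii.items():
    print(n, r)
```

Because the three samples are nested prefixes of one series, the radius can only shrink as `n` grows. How fast it shrinks under dependence, compared with an independent sample, is the question the abstract addresses.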

2605.14301 2026-05-15 cs.LG stat.ML

Language-Induced Priors for Domain Adaptation

Qiyuan Chen, Jiayu Zhou, Raed Al Kontar

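AI summary: In domain adaptation, when target data is scarce, traditional statistical methods struggle to distinguish relevant from irrelevant source domains, which leads to negative transfer. This paper uses expert textual descriptions of the target domain to build a Language-Induced Prior (LIP) and integrates it into an Expectation-Maximization algorithm to identify relevant sources. The approach is compatible with a range of parametric models, guides source selection when target signals are weak, and refines its choices as data accumulate. Theory shows that under a correct prior the estimator roughly matches oracle cold-start performance while remaining asymptotically consistent. Experiments validate the framework on estimation, prediction, and decision tasks.

Details
English abstract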

Domain adaptation faces a fundamental paradox in the cold-start regime. When target data is scarce, statistical methods fail to distinguish relevant source domains from irrelevant ones, which often leads to negative transfer. In this paper, we address this challenge by leveraging expert textual descriptions of the target domain, a resource that is often available but overlooked. We propose a probabilistic framework that translates these semantic descriptions into a choice model, namely a Language-Induced Prior (LIP), that learns the preferences from a pretrained Large Language Model (LLM). The LIP is then integrated into an Expectation-Maximization algorithm to identify source relevance. Methodologically, this framework is compatible with any parametric model where a likelihood is available. It allows the LIP to guide the selection of sources when target signals are weak, while gradually refining these choices as samples accumulate. Theoretically, we prove that the estimator roughly matches an oracle cold-start MSE under a correct prior, while remaining asymptotically consistent regardless of the quality of the LIP. Empirically, we validated the framework on a descriptive (Gaussian estimation), a predictive (C-MAPSS dataset), and a prescriptive task (MuJoCo hopper).

2605.14297 2026-05-15 cs.LG cs.AI math.OC stat.ML

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

Matias Alvo, Daniel Russo, Yash Kanoria

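AI summary: This paper studies reinforcement learning in hybrid discrete-continuous action spaces, a setting common in robotics, control, and operations problems. To address the poor gradient quality of standard policy gradient methods in high-dimensional settings, the authors propose Hybrid Policy Optimization (HPO), which combines pathwise and score-function gradients into an unbiased mixed gradient estimator that copes with discrete actions and non-smooth dynamics. Experiments show that HPO substantially outperforms PPO on inventory control and switched linear-quadratic regulator tasks, with the advantage growing as the continuous action dimension increases.

Details
English abstract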

We study reinforcement learning in hybrid discrete-continuous action spaces, such as settings where the discrete component selects a regime (or index) and the continuous component optimizes within it -- a structure common in robotics, control, and operations problems. Standard model-free policy gradient methods rely on score-function (SF) estimators and suffer from severe credit-assignment issues in high-dimensional settings, leading to poor gradient quality. On the other hand, differentiable simulation largely sidesteps these issues by backpropagating through a simulator, but the presence of discrete actions or non-smooth dynamics yields biased or uninformative gradients. To address this, we propose Hybrid Policy Optimization (HPO), which backpropagates through the simulator wherever smoothness permits, using a mixed gradient estimator that combines pathwise and SF gradients while maintaining unbiasedness. We also show how problems with action discontinuities can be reformulated in hybrid form, further broadening its applicability. Empirically, HPO substantially outperforms PPO on inventory control and switched linear-quadratic regulator problems, with performance gaps increasing as the continuous action dimension grows. Finally, we characterize the structure of the mixed gradient, showing that its cross term -- which captures how continuous actions influence future discrete decisions -- becomes negligible near a discrete best response, thereby enabling approximate decentralized updates of the continuous and discrete components and reducing variance near optimality. All resources are available at github.com/MatiasAlvo/hybrid-rl.
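
The idea of combining the two estimators can be sketched on a one-step toy problem. In the sketch below, the discrete action uses a score-function gradient and the continuous action uses a pathwise (reparameterized) gradient. This is not the HPO algorithm itself: there is no simulator and no multi-step dynamics, so the cross term discussed in the abstract does not appear. The reward, the targets, and all parameter values are hypothetical.

```python
import math
import random


def mixed_gradient(theta, mu, sigma, n_samples, seed=0):
    """Monte Carlo gradient of J = E[r(d, a)] for a one-step hybrid policy.

    d ~ Bernoulli(sigmoid(theta)) is the discrete action (score-function gradient).
    a = mu + sigma * eps is the continuous action (pathwise gradient).
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + math.exp(-theta))
    targets = (-1.0, 2.0)  # hypothetical regime-specific targets c_0 and c_1
    g_theta = 0.0
    g_mu = 0.0
    for _ in range(n_samples):
        d = 1 if rng.random() < p else 0
        a = mu + sigma * rng.gauss(0.0, 1.0)
        reward = -(a - targets[d]) ** 2
        # Score function: d/dtheta log pi(d) equals (d - p) for a sigmoid policy.
        g_theta += reward * (d - p)
        # Pathwise: dr/da * da/dmu, with da/dmu = 1.
        g_mu += -2.0 * (a - targets[d])
    return g_theta / n_samples, g_mu / n_samples


g_theta, g_mu = mixed_gradient(theta=0.0, mu=0.0, sigma=0.5, n_samples=50000)
# Analytic gradients for this toy problem are -0.75 (theta) and 1.0 (mu).
print(g_theta, g_mu)
```

Both components are unbiased here, so the Monte Carlo averages should approach the analytic values as `n_samples` grows.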

2605.14280 2026-05-15 cs.LG stat.ML

TILT: Target-induced loss tilting under covariate shift

Kakei Yamamoto, Martin J. Wainwright

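AI summary: This paper proposes TILT, an unsupervised domain adaptation method for covariate shift. It introduces a novel objective that decomposes the source predictor into two components, fits their sum on labeled source data while penalizing the auxiliary component on unlabeled target data, and uses the resulting main component as the target predictor. Theory shows that the method implicitly induces relative importance weighting at the population level and has good stability and generalization properties. Experiments show that TILT outperforms source-only training, exact importance weighting, and relative density-ratio baselines across several tasks.

Details
Comments
32 pages, 17 figures. Submitted to NeurIPS 2026
English abstract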

We introduce and analyze Target-Induced Loss Tilting (TILT) for unsupervised domain adaptation under covariate shift. It is based on a novel objective function that decomposes the source predictor as $f+b$ and fits $f+b$ on labeled source data while simultaneously penalizing the auxiliary component $b$ on unlabeled target inputs. The resulting fit $f$ is deployed as the final target predictor. We show that this target-side penalty implicitly induces relative importance weighting at the population level, but in terms of an estimand $b^*_f$ that is self-localized to the current error and remains uniformly bounded for any source-target pair (even those with disjoint supports). We prove a general finite-sample oracle inequality on the excess risk, and use it to give an end-to-end guarantee for training with sparse ReLU networks. Experiments on controlled regression problems and shifted CIFAR-100 distillation show that TILT improves target-domain performance over source-only training, exact importance weighting, and relative density-ratio baselines, with a stable dependence on the regularization parameter.

2605.14276 2026-05-15 stat.ML cs.LG

Training-Free Generative Sampling via Moment-Matched Score Smoothing

Zhenyu Yao, Daniel Paulin

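AI summary: This paper proposes MM-SOLD, a training-free generative sampling method based on moment-matched score smoothing. It estimates the moments of the target distribution directly from the training data and preserves them throughout sampling. Built on overdamped Langevin dynamics, it generates high-quality samples without training a neural network. Experiments on 2D distributions and image generation show strong performance, computational efficiency, and robustness.

Details
Comments
35 pages
English abstract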

Diffusion models generate samples by denoising along the score of a perturbed target distribution. In practice, one trains a neural diffusion model, which is computationally expensive. Recent work suggests that score matching implicitly smooths the empirical score, and that this smoothing bias promotes generalization by capturing low-dimensional data geometry. We propose moment-matched score-smoothed overdamped Langevin dynamics (MM-SOLD), a training-free interacting particle sampler that enforces the target moments throughout the sampling trajectory. We prove that, in the large-particle limit, the empirical particle density converges to a deterministic limit whose one-particle stationary marginal is a Gibbs--Boltzmann density obtained by exponentially tilting a naive score-smoothed diffusion target. The mean and covariance of this distribution agree with the empirical moments of the training data. Experiments on 2D distributions and latent-space image generation show that MM-SOLD enables fast, robust, training-free sampling on CPUs, with sample fidelity and diversity competitive with neural diffusion baselines.

2605.14275 2026-05-15 math.ST stat.TH

Double/debiased machine learning of quantile treatment effects on long-term outcomes in clinical trials

Ziyang Liu, Niwen Zhou, Peng Wu, Xu Guo

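AI summary: In clinical trials, long-term outcomes are often unavailable while short-term surrogates are commonly observed. This paper studies how to combine randomized trial data with external observational data to estimate quantile treatment effects on long-term outcomes, and it proposes a doubly robust estimator that supports valid inference under treatment randomization and a transportability assumption. The method accommodates flexible machine learning techniques, performs well in finite samples, and can reveal heterogeneous long-term treatment effects across quantiles.

Details
English abstract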

Long-term outcomes are often unavailable in randomized clinical trials, although short-term surrogate outcomes are commonly observed. External observational data may contain the long-term outcome, but causal comparisons based on such data alone are vulnerable to confounding. Existing surrogate-based data integration methods for long-term outcomes have focused primarily on average treatment effects. We study estimation of quantile treatment effects for long-term outcomes in the trial population by combining randomized trial data with external observational data. Under treatment randomization, positivity, and a surrogate-based transportability assumption, we establish identification and develop a doubly robust estimator for inference. The estimator accommodates flexible machine learning methods for nuisance estimation, remains consistent if either the score-related or outcome regression-related nuisance functions are consistently estimated, and is asymptotically normal under regularity conditions. Simulation and real-data results demonstrate that the proposed method performs well in finite samples and can reveal heterogeneous long-term treatment effects across quantiles.

2605.14222 2026-05-15 stat.ME

Robust and Data-Adaptive Integration of Nonconcurrent Data in Platform Trials via Gaussian Processes

Yuhan Qian, Yu Du, Jingning Zhang, Yanyao Yi, Patrick J. Heagerty, Ting Ye

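AI summary: This paper studies how to integrate nonconcurrent data in platform trials robustly and data-adaptively in order to improve trial efficiency. The authors propose a Gaussian process framework that exploits the temporal smoothness of platform trials to incorporate nonconcurrent data with uncertainty quantification. The method has a transparent frequentist interpretation, comes with theoretical guarantees on reducing the posterior variance of the treatment effect and on controlling bias, and is illustrated with an example and an R package.

Details
English abstract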

A platform trial is an innovative clinical trial design that enables simultaneous and continuous evaluation of multiple treatments within a single master protocol. Existing robust methods restrict analyses to concurrently randomized participants due to concerns that including nonconcurrent data may introduce bias from temporal trends. However, this exclusion represents a missed opportunity to improve efficiency. We propose a Gaussian process framework for incorporating nonconcurrent data that exploits temporal smoothness, a key feature of platform trials. The framework includes single-task and multi-task formulations and provides data-adaptive integration of nonconcurrent data with uncertainty quantification. The connection to kernel ridge regression yields a transparent frequentist interpretation of how nonconcurrent data are integrated. We establish two theoretical guarantees: incorporating nonconcurrent controls reduces the posterior variance of the treatment effect, and the resulting bias is controlled by a non-increasing bound. We extend the framework to discrete outcomes and to covariate adjustment, illustrate it on a hypothetical platform trial constructed from SURMOUNT-1, and provide an implementation in the R package RobinCID.

2605.14200 2026-05-15 cs.LG stat.ML

How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

Leena Chennuru Vankadara, Moritz Haas, Luke Hayward, Sebastian Bordt, Alessandro Breccia

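AI summary: This paper studies how to parameterize Mixture-of-Experts (MoE) architectures at scale, analyzing how hyperparameters should scale with network width, number of experts, sparsity, and related quantities. Using an analysis framework based on Dynamical Mean Field Theory (DMFT), the authors derive the parameterization satisfying the maximal-update (μ) desiderata (μP) but find that it falls short when scaling up. They therefore propose the Maximally Scale-Stable Parameterization (MSSP), which achieves learning-rate transfer and monotonic improvement across scaling regimes and provides complete theoretical guidance for scaling MoE architectures.

Details
English abstract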

Recent frontier large language models predominantly rely on Mixture-of-Experts (MoE) architectures. Despite empirical progress, there is still no principled understanding of how hyperparameters should scale with network width $N$, expert width $N_e$, number of experts $M$, sparsity $K$, and depth $L$ to ensure both stability and optimal performance at scale. We take a principled step toward resolving this gap by analyzing three different scaling regimes: (I) co-scaling $N\asymp N_e$, (II) co-scaling $N\asymp M\asymp K$, and (III) full proportional scaling of $N, N_e, M$, and $K$. For each regime, we develop a novel Dynamical Mean Field Theory (DMFT) description of the limiting training dynamics of MoEs that provides a formal foundation for our analysis. Within this framework, we derive the unique parameterization for SGD and Adam satisfying all maximal-update ($μ$) desiderata. We then show that the resulting $μ$P prescription does not reliably induce monotonic improvement with scale or robust learning-rate transfer. We trace these pathologies to scale-dependent observables in the aggregation dynamics, which motivates a refined set of desiderata that we term maximal scale stability. Guided by this principle, we derive a Maximally Scale-Stable Parameterization (MSSP) for both SGD and Adam in all three scaling regimes, and characterize the corresponding limiting dynamics - qualitatively distinct from the $μ$P limit - through a separate DMFT analysis. Experiments verify that MSSP robustly recovers learning rate transfer and monotonic improvement with scale across regimes. Combined with existing depth-scaling theory, these results provide a complete scaling prescription for MoE architectures as a function of width, depth, expert width, and number of experts.

2605.14193 2026-05-15 math.ST stat.TH

Equilibrium and Pricing in Consumer Networks with Nonlinear Utilities: An Online Shape-Constrained Learning Approach

Daniele Bracale, George Michailidis

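AI summary: This paper studies equilibrium and optimal monopoly pricing in consumer networks with nonlinear utilities, where a consumer's utility depends not only on an individualized price but also on the consumption choices of peers in their social network. The authors propose a unified theoretical framework covering a broad class of nonlinear utility forms and establish existence and uniqueness of the consumer equilibrium under general conditions. To handle unknown utility functions, they introduce a tuning-parameter-free, shape-constrained learning approach with no-regret convergence guarantees, giving monopoly pricing both theoretical support and a practical tool.

Details
Abstract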

We study optimal monopoly pricing over consumer networks governed by general nonlinear utilities. In our framework, a consumer's utility is jointly determined by an individualized price and the consumption choices of their peers, propagated through a directed and signed social graph. This formulation encapsulates a broad class of utility functions; it strictly generalizes the traditional linear-quadratic framework to include logit-type discrete choice, isoelastic, and Stone-Geary utilities under a single theoretical umbrella. We first establish the existence and uniqueness of the consumer-side equilibrium under general contraction and variational conditions, explicitly accommodating asymmetric and signed network externalities. Leveraging this equilibrium characterization, we analyze targeted price discrimination within community-structured and influencer-driven markets. To this end, we introduce a generalized measure of network influence that extends classical Katz-Bonacich centrality beyond the Euclidean domain. Finally, addressing the challenge of unknown consumer utility functions, we develop a shape-constrained, tuning-parameter-free learning approach utilizing isotonic regression, for which we establish strict no-regret convergence guarantees. Supported by extensive simulations, our results seamlessly integrate equilibrium analysis and nonparametric learning into a cohesive monopoly pricing framework.

2605.13154 2026-05-15 quant-ph math.ST stat.TH

Three ways to find comfort with the Bell proof and the results of the Bell experiments

Richard D Gill, Inge S. Helland, Bart Jongejan

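AI summary: This paper examines the philosophical and physical questions raised by Bell's theorem and the Bell experiments, asking how a coherent worldview can be rebuilt once counterfactual definiteness is rejected while conspiracy between settings and hidden state is also ruled out. The three authors each state a different position: Gill accepts irreducible, non-local quantum randomness and regards the choice between locality and realism as a false dichotomy; Helland reconstructs the Hilbert-space formalism from a theory of accessible variables and concludes that every observer must be limited in a specific sense; Jongejan proposes a geometric hidden-variable construction in which the degree of violation of the CHSH inequality depends on the number of spatial dimensions. The paper reviews the classical half of Bell's theorem, the loophole-free experiments, and the recent literature, and closes with a comparative discussion.

Details
Abstract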

Bell's theorem states that no description of a Bell experiment can be simultaneously local, realistic in the sense of counterfactual definiteness, and free of conspiracy between settings and hidden state. The recent generation of experiments has confirmed the predicted violation of the CHSH inequality, so one of the assumptions must be abandoned. Which one, and how one reconstructs a coherent worldview after doing so, is a question on which many authors disagree. This paper is written by three such authors. All three reject both counterfactual definiteness and conspiratorial violation of statistical independence of setting choices and state. After a joint exposition of the classical half of Bell's theorem in the language of Pearl-style causal graphs, a joint summary of the loophole-free experiments, and a joint survey of the recent literature, each author states where they have presently arrived. Gill accepts irreducible and non-local quantum randomness and finds the choice between locality and realism a false dichotomy. In his later works, Bell derives counterfactual definiteness from classical local causality, and that is what has to go. The metaphysical concepts "realism", "locality", "causality" need to be reconsidered. Helland reconstructs the Hilbert-space formalism from a theory of accessible variables, and from this theory he concludes that every observer must be limited in a specific sense. Jongejan proposes a geometric hidden-variable construction in which the degree of violation of the CHSH inequality depends on the number of dimensions of space, Tsirelson's bound corresponding to three dimensions. The authors conclude with a discussion.

2605.10289 2026-05-15 cs.LG stat.ML

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift

Bochao Li, Yao Fu, Wei Chen, Fang Kong

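AI summary: This paper studies offline-to-online learning under distribution shift, where offline data is used to improve online decision-making. Because posterior samples in standard Thompson sampling (TS) are not guaranteed to be optimistic, indices built from online and hybrid data are hard to compare; to address this, the authors propose sample-mean anchored Thompson sampling (Anchor-TS), whose median-based anchoring rule corrects the bias induced by distribution shift. Theoretical analysis shows that the method safely leverages offline data to accelerate online learning, and experiments show consistent improvements over baselines.

Details
Abstract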

Offline-to-online learning aims to improve online decision-making by leveraging offline logged data. A central challenge in this setting is the distribution shift between offline and online environments. While some existing works attempt to leverage shifted offline data, they largely rely on UCB-type algorithms. Thompson sampling (TS) represents another canonical class of bandit algorithms, well known for its strong empirical performance and naturally suited to offline-to-online learning through its Bayesian formulation. However, unlike UCB indices, posterior samples in TS are not guaranteed to be optimistic with respect to the true arm means. This makes indices constructed from purely online and hybrid data difficult to compare and complicates their use. To address this issue, we propose sample-mean anchored TS (Anchor-TS), which introduces a novel median-based anchoring rule that defines the arm index as the median of an online posterior sample, a hybrid posterior sample, and the online sample mean. The median anchoring systematically corrects bias induced by distribution shift by mitigating over-estimation for suboptimal arms and under-estimation for optimal arms, while exploiting offline information to obtain more accurate estimates when the shift is small. We establish theoretical guarantees showing that the proposed algorithm safely leverages offline data to accelerate online learning, and quantifying how the degree of distribution shift and the size of offline data affect the resulting regret reduction. Extensive experiments demonstrate consistent improvements of our algorithm over baselines.
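
The median-based anchoring rule described in the abstract can be illustrated with a minimal Python sketch. Only the "median of three quantities" rule comes from the abstract; the Gaussian posterior with unit noise variance, the fallback used when an arm has no online pulls, and the function names `anchor_ts_index`, `gaussian_posterior_sample`, and `choose_arm` are assumptions made for illustration and may differ from the paper's algorithm.

```python
import random
import statistics


def anchor_ts_index(online_sample, hybrid_sample, online_mean):
    """Arm index: median of the three quantities named in the abstract."""
    return statistics.median([online_sample, hybrid_sample, online_mean])


def gaussian_posterior_sample(rng, rewards):
    """Draw from N(sample mean, 1/n). The Gaussian form is an assumption."""
    if not rewards:
        return rng.gauss(0.0, 1.0)  # hypothetical standard-normal prior
    n = len(rewards)
    return rng.gauss(sum(rewards) / n, (1.0 / n) ** 0.5)


def choose_arm(rng, online, offline):
    """online / offline: one list of observed rewards per arm."""
    indices = []
    for on, off in zip(online, offline):
        online_sample = gaussian_posterior_sample(rng, on)
        hybrid_sample = gaussian_posterior_sample(rng, on + off)
        # Fallback for arms with no online pulls (illustrative assumption).
        online_mean = sum(on) / len(on) if on else online_sample
        indices.append(anchor_ts_index(online_sample, hybrid_sample, online_mean))
    # Play the arm with the largest anchored index.
    return max(range(len(indices)), key=indices.__getitem__)
```

Taking the median means a hybrid sample pulled far from the online evidence by shifted offline data cannot become the index on its own: it is used only when it lies between the other two values.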

2605.07060 2026-05-15 physics.geo-ph cs.LG physics.comp-ph stat.ML

Functional-prior-based approaches to Bayesian PDE-constrained inversion using physics-informed neural networks

Ryoichiro Agata, Tomohisa Okazaki

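AI summary: This paper proposes functional-prior-based approaches to Bayesian PDE-constrained inversion (fpBPINN), which bring physically meaningful function-space priors into Bayesian inversion with physics-informed neural networks (PINNs). Two complementary approaches are introduced: one learns a neural-network weight prior consistent with a prescribed functional prior, and the other performs variational inference directly in function space. Experiments on seismic traveltime tomography and Darcy-flow permeability inversion show that both approaches accurately estimate posterior distributions, highlighting the value of physically interpretable functional priors.

Details
Abstract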

Physics-informed neural networks (PINNs) provide a mesh-free framework for solving PDE-constrained inverse problems, but their extension to Bayesian inversion still faces a fundamental difficulty: prior distributions are typically defined in the weight space of neural networks, whereas physically meaningful prior assumptions are more naturally expressed in function space. In this study, we introduce a unified framework, termed functional-prior-based approaches to Bayesian PDE-constrained inversion using physics-informed neural networks (fpBPINN), to incorporate functional priors into Bayesian PINN-based inversion. We consider two complementary approaches. The first is a functional-prior-informed Bayesian PINN (FPI-BPINN), in which a neural network weight prior is learned to be consistent with a prescribed functional prior, and Bayesian inference is subsequently performed in weight space. The second is function-space particle-based variational inference for PINNs (fParVI-PINN), which performs Bayesian estimation using ParVI directly in function space. We also show that random Fourier features (RFF) play an important role in representing Gaussian functional priors with neural networks and in improving posterior approximation. We applied the proposed approaches to one-dimensional seismic traveltime tomography and two-dimensional Darcy-flow permeability inversion. These numerical experiments showed that both approaches accurately estimated posterior distributions, highlighting the significance of introducing physically interpretable functional priors into Bayesian PINN-based inverse problems. We also identified the contrasting advantages of FPI-BPINN and fParVI-PINN, namely flexibility and accuracy, respectively.
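
The role of random Fourier features in representing Gaussian functional priors can be made concrete with a small sketch. It shows the standard RFF construction for a one-dimensional squared-exponential kernel, not the paper's network architecture; the helper names `make_rff` and `rbf_kernel`, the feature count, and the length scale are illustrative assumptions.

```python
import math
import random


def make_rff(num_features, length_scale, rng):
    """Sample frequencies and phases for a 1-D squared-exponential kernel."""
    # The kernel's spectral density is Gaussian with std 1 / length_scale.
    omegas = [rng.gauss(0.0, 1.0 / length_scale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]

    return features


def rbf_kernel(x, y, length_scale):
    """Exact kernel value exp(-(x - y)^2 / (2 * length_scale^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))


rng = random.Random(0)
phi = make_rff(num_features=5000, length_scale=0.5, rng=rng)
x, y = 0.3, 0.7
approx = sum(a * b for a, b in zip(phi(x), phi(y)))  # inner product of features
exact = rbf_kernel(x, y, 0.5)
```

With standard-normal weights on these features, a linear combination of them is an approximate draw from the corresponding Gaussian process, which is one way a function-space prior can be carried by a finite set of weights.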

2605.03823 2026-05-15 cs.LG cs.IT math.IT math.ST stat.TH

Realizable Bayes-Consistency for General Metric Losses

Dan Tsir Cohen, Steve Hanneke, Aryeh Kontorovich

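AI summary: This paper studies strong universal Bayes-consistency in the realizable setting for learning with general metric losses, extending known results for binary classification and regression. The authors give necessary and sufficient conditions on the hypothesis class for the existence of a distribution-free learning rule whose risk converges almost surely to the best-in-class risk (zero). The main contribution is a sharp characterization in terms of a combinatorial obstruction, the infinite non-decreasing $(γ_k)$-Littlestone tree, which extends the classical Littlestone tree structure to the metric loss setting.

Details
Comments
14 pages. To appear in Proceedings of the 43rd International Conference on Machine Learning (ICML 2026); v2: fixed abstract metadata rendering
Abstract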

We study strong universal Bayes-consistency in the realizable setting for learning with general metric losses, extending classical characterizations beyond $0$-$1$ classification (Bousquet et al., 2020; Hanneke et al., 2021) and real-valued regression (Attias et al., 2024). Given an instance space $(X,ρ)$, a label space $(Y,\ell)$ with possibly unbounded loss, and a hypothesis class $H \subseteq Y^{X}$, we resolve the realizable case of an open problem presented in Tsir Cohen and Kontorovich (2022). Specifically, we find the necessary and sufficient conditions on the hypothesis class $H$ under which there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk (which is zero) for every realizable data-generating distribution. Our main contribution is this sharp characterization in terms of a combinatorial obstruction: Similarly to Attias et al. (2024), we introduce the notion of an infinite non-decreasing $(γ_k)$-Littlestone tree, where $γ_k \to \infty$. This extends the Littlestone tree structure used in Bousquet et al. (2020) to the metric loss setting.

2604.21809 2026-05-15 cs.LG cs.AI q-bio.QM stat.ML

Quotient-Space Diffusion Models

Yixian Xu, Yusong Wang, Shengjie Luo, Kaiyuan Gao, Tianyu He, Di He, Chang Liu

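AI summary: This paper proposes quotient-space diffusion models, a generative framework designed to handle and exploit symmetry in a system. By running the generation process on the quotient space, which removes symmetry redundancy, the model can learn more flexibly while still producing the correct symmetric target distribution. The framework is instantiated for molecular structure generation, where it improves on equivariant diffusion models and outperforms alignment-based methods.

Details
Comments
ICLR 2026 Oral Presentation; 43 pages, 5 figures, 6 tables; ICLR 2026 Camera Ready version
Abstract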

Diffusion-based generative models have reformed generative AI, and also enabled new capabilities in the science domain, e.g., fast generation of 3D structures of molecules. In such tasks, there is often a symmetry in the system, identifying elements that can be converted by certain transformations as equivalent. Equivariant diffusion models guarantee a symmetric distribution, but miss the opportunity to make learning easier, while alignment-based simplification attempts fail to preserve the target distribution. In this work, we develop quotient-space diffusion models, a principled generative framework to fully handle and leverage symmetry. By viewing the intrinsic generation process on the quotient space, the exact construction that removes symmetry redundancy, the framework simplifies learning by allowing model output to have an arbitrary intra-equivalence-class movement, while generating the correct symmetric target distribution with guarantee. We instantiate the framework for molecular structure generation which follows $\mathrm{SE}(3)$ (rigid-body movement) symmetry. It improves the performance over equivariant diffusion models and outperforms alignment-based methods universally for small molecules and proteins, representing a new framework that surpasses previous symmetry treatments in generative models.

2604.17548 2026-05-15 cs.LG math.AT stat.ML

Contraction and Hourglass Persistence for Learning on Graphs, Simplices, and Cells

Mattie Ji, Indradyumna Roy, Vikas Garg

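AI summary: This paper studies topological methods for learning on graphs, simplicial complexes, and cellular networks, introducing Contraction Homology (CH) and Hourglass Persistence to improve on how persistent homology is conventionally used in graph neural networks. By interleaving inclusions and contractions, Hourglass Persistence boosts expressivity, learnability, and stability, and the authors design efficient algorithms that yield consistent empirical improvements over many persistent-homology methods on standard real-world graph datasets.

Details
Comments
31 pages, 6 figures, 4 algorithms, 2 tables. Accepted at ICLR 2026
Abstract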

Persistent homology (PH) encodes global information, such as cycles, and is thus increasingly integrated into graph neural networks (GNNs). PH methods in GNNs typically traverse an increasing sequence of subgraphs. In this work, we first expose limitations of this inclusion procedure. To remedy these shortcomings, we analyze contractions as a principled topological operation, in particular, for graph representation learning. We study the persistence of contraction sequences, which we call Contraction Homology (CH). We establish that forward PH and CH differ in expressivity. We then introduce Hourglass Persistence, a class of topological descriptors that interleave a sequence of inclusions and contractions to boost expressivity, learnability, and stability. We also study related families parametrized by two paradigms. We also discuss how our framework extends to simplicial and cellular networks. We further design efficient algorithms that are pluggable into end-to-end differentiable GNN pipelines, enabling consistent empirical improvements over many PH methods across standard real-world graph datasets. Code is available at https://github.com/Aalto-QuML/Hourglass.

2603.21996 2026-05-15 cs.SE stat.CO

StreamSampling.jl: Efficient Sampling from Data Streams in Julia

Adriano Meligrana

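AI summary: StreamSampling.jl is a Julia library for efficient sampling from data streams in a single pass, including when the total number of items is unknown. By keeping a small memory footprint and avoiding loading the full stream into memory, it offers a more efficient and flexible alternative to traditional sampling procedures. Empirical benchmarks demonstrate its performance and memory advantages.

Details
Journal ref
The Proceedings of the JuliaCon Conferences, 8(83), 202 (2026)
Comments
Accepted to the Proceedings of the JuliaCon Conferences
Abstract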

StreamSampling.jl is a Julia library designed to provide general and efficient methods for sampling from data streams in a single pass, even when the total number of items is unknown. In this paper, we describe the capabilities of the library and its advantages over traditional sampling procedures, such as maintaining a small, constant memory footprint and avoiding the need to fully materialize the stream in memory. Furthermore, we provide empirical benchmarks comparing online sampling methods against standard approaches, demonstrating performance and memory improvements.
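
As an illustration of single-pass sampling with constant memory, here is a minimal Python sketch of classic reservoir sampling (Algorithm R). It is not the library's API and may not match the algorithms StreamSampling.jl actually implements; the function name `reservoir_sample` and the example stream are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Uniform sample of k items without replacement in one pass (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def numbers():
    """A generator standing in for a stream whose length is not known up front."""
    yield from range(100_000)


sample = reservoir_sample(numbers(), k=5, rng=random.Random(42))
```

Only the `k` reservoir slots are ever held in memory, so the footprint does not grow with the length of the stream.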

2603.00772 2026-05-15 stat.ML cs.LG

Generalizing Score-based generative models for Heavy-tailed Distributions

Tiziano Fassina, Gabriel Cardoso, Sylvan Le Corff, Thomas Romary

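AI summary: This paper studies how to extend score-based generative models (SGMs) to heavy-tailed data. Addressing the unclear generative fidelity and weak theoretical foundations of existing methods, the authors make two theoretical contributions: they show that early stopping combined with a suitable initialization extends the diffusion framework to any target distribution, and they derive new theoretical guarantees for generation with normalizing flows. Building on these results, they propose a unified framework in which a normalizing flow captures the tail behavior and an SGM then refines fine-grained structural details.

Details
Abstract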

Score-based generative models (SGMs) have achieved remarkable empirical success, motivating their application to a broad range of data distributions. However, extending them to heavy-tailed targets remains a largely open problem. Although dedicated models for heavy-tailed distributions have been proposed, their generative fidelity remains unclear and they lack solid theoretical foundations, leaving important questions open in this regime. In this paper, we address this gap through two theoretical contributions. First, we show that combining early stopping with a suitable initialization is sufficient to extend the diffusion framework to any target distribution; in particular, we establish the well-posedness of the backward process and prove convergence of the approximated diffusion in KL divergence. Second, we derive novel theoretical guarantees for generation with normalizing flows, obtaining convergence results that hold under mild conditions on the flow family and without any assumption on the tail behavior of the target distribution. Building on these results, we propose a unified generative framework for heavy-tailed distributions: a normalizing flow is first trained to capture the tail behavior and is then used as an initialization prior for an SGM, which refines the samples by recovering fine-grained structural details. This design leverages the complementary strengths of the two model classes within a theoretically principled pipeline, overcoming the limitations of existing approaches.

2602.09969 2026-05-15 cs.LG econ.EM stat.ML

Causal Multi-Task Demand Learning

Varun Gupta, Vijay Kamble

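AI summary: This paper studies a multi-task demand-learning problem motivated by retail pricing, where the goal is to estimate heterogeneous linear price-response functions across decision contexts. Because each context has rich covariates but limited price variation, the authors propose a new meta-learning framework that transfers information across tasks while handling the estimation bias caused by endogeneity. Assuming each task contains at least two distinct locally exogenous price points, the method preserves causal identification, and its effectiveness is validated on real and synthetic data.

Details
Abstract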

We study a canonical multi-task demand-learning problem motivated by retail pricing, where a firm seeks to estimate heterogeneous linear price-response functions across multiple decision contexts. Each context is described by rich covariates but exhibits limited price variation, motivating transfer learning across tasks. A central challenge in leveraging cross-task transfer is endogeneity: prices may be arbitrarily correlated with unobserved task-level demand determinants across tasks. We propose a new meta-learning framework that identifies the conditional mean of task-specific causal demand parameters given a subset of task-specific observables despite such confounding, assuming that each task contains at least two distinct locally exogenous price points. This subset is carefully designed to include all of the prices to address cross-task confounding, while masking two demand outcomes that provide randomized supervision to address identifiability issues arising from the inclusion of all prices. We show that this information design is maximally uniformly valid, in that any refinement of the conditioning set that reveals withheld-outcome information is not guaranteed to identify the conditional mean causal target. We validate our method on real and synthetic data, demonstrating improved recovery of demand responses relative to standard transfer-learning baselines.

2512.24588 2026-05-15 stat.ME

Multiple Testing of One-Sided Hypotheses with Conservative $p$-values

Kwangok Seo, Johan Lim, Hyungwon Choi, Jaesik Jeong

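AI summary: This paper studies large-scale one-sided multiple testing in which test statistics are normally distributed with unit variance and the goal is to identify signals with positive mean effects. Conventional methods compute $p$-values assuming all null means are exactly zero, but because the null hypothesis is composite, some null means may be negative, which makes the $p$-values conservative and reduces power. The authors estimate the marginal null distribution of the test statistics within an empirical Bayes framework and construct refined $p$-values from it, improving power without modifying existing multiple testing procedures. Simulations and a real-data application show substantial power gains when conventional $p$-values are conservative and comparable performance to existing methods when they are exact.

Details
Abstract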

We study a large-scale one-sided multiple testing problem in which test statistics follow normal distributions with unit variance, and the goal is to identify signals with positive mean effects. A conventional approach is to compute $p$-values under the assumption that all null means are exactly zero and then apply standard multiple testing procedures such as the Benjamini-Hochberg (BH) or Storey-BH method. However, because the null hypothesis is composite, some null means may be strictly negative. In this case, the resulting $p$-values are conservative, leading to a substantial loss of power. Existing methods address this issue by modifying the multiple testing procedure itself, for example through conditioning strategies or discarding rules. In contrast, we focus on correcting the $p$-values so that they are exact under the null. Specifically, we estimate the marginal null distribution of the test statistics within an empirical Bayes framework and construct refined $p$-values based on this estimated distribution. These refined $p$-values can then be directly used in standard multiple testing procedures without modification. Extensive simulation studies show that the proposed method substantially improves power when conventional $p$-values are conservative, while achieving comparable performance to existing methods when conventional $p$-values are exact. An application to phosphorylation data further demonstrates the practical effectiveness of our approach.
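
The conventional pipeline that the abstract takes as its starting point, one-sided $p$-values computed under a zero null mean followed by the Benjamini-Hochberg step-up rule, can be sketched as follows. This illustrates only that baseline, not the paper's empirical Bayes refinement; the function names and the example z-scores are hypothetical.

```python
from statistics import NormalDist


def one_sided_p_value(z):
    """P(Z >= z) for Z ~ N(0, 1): the conventional p-value assuming null mean 0."""
    return 1.0 - NormalDist().cdf(z)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value is below its BH threshold.
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical z-scores: two strong positive signals, the rest near or below zero.
z_scores = [3.5, 2.9, 0.0, -1.0, -2.0, 0.4]
p_values = [one_sided_p_value(z) for z in z_scores]
rejected = benjamini_hochberg(p_values, alpha=0.05)
```

In this example the statistics at $z=-1$ and $z=-2$ receive conventional $p$-values of about 0.84 and 0.98. When many nulls have negative means, such values pile up near 1 instead of being uniform, which is the conservativeness the abstract describes.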

2512.03637 2026-05-15 cs.SD cs.LG stat.ML

AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers

Kohei Yamamoto, Kosuke Okusa

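AI summary: This work proposes AaSP, a self-supervised pre-training framework for audio spectrogram transformers that targets the aliasing introduced by temporal downsampling in conventional approaches. AaSP combines an aliasing-aware patch representation, teacher-student masked modeling, a cross-attention predictor, and multi-mask contrastive regularization to learn representations that integrate features from alias-prone bands and remain stable across masked views. Experiments show strong results on several audio recognition tasks, with state-of-the-art results among the compared self-supervised baselines on some benchmarks.

Details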
Comments
Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing (TASLP). Copyright IEEE
Abstract

Transformer-based audio self-supervised learning (SSL) models commonly use spectrograms, vision-style Transformers, and masked modeling objectives. However, convolutional patchification with temporal downsampling lowers the effective Nyquist frequency and introduces aliasing, while naïve low-pass filtering may remove task-relevant high-frequency cues. We present AaSP, an aliasing-aware self-supervised pre-training framework for audio spectrogram transformers. AaSP combines an aliasing-aware patch representation, teacher-student masked modeling, a cross-attention predictor, and multi-mask contrastive regularization to learn representations that integrate features from alias-prone modulation bands while remaining stable across masked views. Its patch-embedding module, Aliasing-aware Patch Embedding (AaPE), augments standard patch tokens with features from alias-prone modulation bands using a band-limited complex sinusoidal kernel with a two-sided exponential window. The kernel's frequency and decay parameters are estimated from the input, enabling adaptive subband analysis whose outputs are fused with standard patch tokens. We pre-train on AudioSet and evaluate the learned representations by fine-tuning and linear evaluation on acoustic/environmental, speech, and music recognition benchmarks. Under fine-tuning, the full AaSP framework achieves state-of-the-art results on AS-20K, ESC-50, and NSynth among compared self-supervised baselines, while remaining competitive elsewhere. Linear evaluation shows a similar trend, including gains on US8K and NSynth. Overall, AaSP learns representations that are more stable under aliasing-sensitive temporal perturbations and competitive for downstream transfer.

2511.18739 2026-05-15 cs.AI cs.LG stat.ML

A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection

Kaixiang Yang, Jiarong Liu, Yupeng Song, Shuanghua Yang, Yujue Zhou

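AI summary: Time series anomaly detection is widely used in IoT and cyber-physical systems, but evaluating it is difficult because application objectives vary and metrics rest on different assumptions. This paper proposes a problem-oriented taxonomy that reinterprets existing metrics by the evaluation problem each one addresses, grouping them into six dimensions: accuracy, timeliness, tolerance to labeling imprecision, penalties reflecting human-audit cost, robustness to random scores, and cross-dataset comparability. Experiments across detection scenarios quantify each metric's ability to distinguish genuine detections from random noise; most event-level metrics separate them well, while some widely used metrics are sensitive to random-score inflation, which underlines that metrics should be chosen to fit the task.

Details
Abstract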

Time series anomaly detection is widely used in IoT and cyber-physical systems, yet its evaluation remains challenging due to diverse application objectives and heterogeneous metric assumptions. This study introduces a problem-oriented framework that reinterprets existing metrics based on the specific evaluation challenges they are designed to address, rather than their mathematical forms or output structures. We categorize over twenty commonly used metrics into six dimensions: 1) basic accuracy-driven evaluation; 2) timeliness-aware reward mechanisms; 3) tolerance to labeling imprecision; 4) penalties reflecting human-audit cost; 5) robustness against random or inflated scores; and 6) parameter-free comparability for cross-dataset benchmarking. Comprehensive experiments are conducted to examine metric behavior under genuine, random, and oracle detection scenarios. By comparing their resulting score distributions, we quantify each metric's discriminative ability -- its capability to distinguish meaningful detections from random noise. The results show that while most event-level metrics exhibit strong separability, several widely used metrics (e.g., NAB, Point-Adjust) demonstrate limited resistance to random-score inflation. These findings reveal that metric suitability must be inherently task-dependent and aligned with the operational objectives of IoT applications. The proposed framework offers a unified analytical perspective for understanding existing metrics and provides practical guidance for selecting or developing more context-aware, robust, and fair evaluation methodologies for time series anomaly detection.

2511.05159 2026-05-15 stat.ML cs.LG

A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights

Shubhayan Pan, Kushal Bose, Debolina Paul, Saptarshi Chakraborty, Swagatam Das

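AI summary: This paper proposes a new framework for convex clustering in kernel spaces, aimed at data with linearly non-separable or non-convex structure. The method maps the data into a Reproducing Kernel Hilbert Space (RKHS) and performs convex clustering in the transformed space, which improves handling of complex data distributions and yields an embedding in a finite-dimensional space. The authors provide theoretical guarantees, including algorithmic convergence and finite-sample bounds, and report experiments on synthetic and real-world datasets showing superior performance.

Details
Abstract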

Convex clustering is a well-regarded clustering method that resembles the centroid-based approach of Lloyd's $k$-means but does not require a predefined cluster count. It starts with each data point as its own centroid and iteratively merges them. Despite its advantages, this method can fail when dealing with data exhibiting linearly non-separable or non-convex structures. To mitigate these limitations, we propose a kernelized extension of the convex clustering method. This approach projects the data points into a Reproducing Kernel Hilbert Space (RKHS) using a feature map, enabling convex clustering in this transformed space. This kernelization not only allows for better handling of complex data distributions but also produces an embedding in a finite-dimensional vector space. We provide a comprehensive theoretical underpinning for our kernelized approach, proving algorithmic convergence and establishing finite sample bounds for our estimates. The effectiveness of our method is demonstrated through extensive experiments on both synthetic and real-world datasets, showing superior performance compared to state-of-the-art clustering techniques. This work marks a significant advancement in the field, offering an effective solution for clustering in non-linear and non-convex data scenarios.
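
For orientation, the standard convex clustering objective and one natural kernelized analogue can be written out as below. This is a sketch under assumptions: the weights $w_{ij}$, the penalty parameter $\lambda$, the choice of norm, and the exact kernelized form are illustrative and may differ from the formulation in the paper.

```latex
% Standard convex clustering: one centroid u_i per data point x_i.
\min_{u_1,\dots,u_n \in \mathbb{R}^d}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \lambda \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert_2

% Kernelized analogue (assumed form): replace x_i by its image under a
% feature map \phi into an RKHS \mathcal{H} and measure distances in \mathcal{H}.
\min_{u_1,\dots,u_n \in \mathcal{H}}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert \phi(x_i) - u_i \rVert_{\mathcal{H}}^2
  \;+\; \lambda \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert_{\mathcal{H}}
```

At $\lambda = 0$ every point is its own centroid. As $\lambda$ grows, the fusion penalty drives centroids to coincide, which corresponds to the merging behavior described in the abstract.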

2510.25240 2026-05-15 stat.ML cs.LG

Generative Bayesian Optimization: Generative Models as Acquisition Functions

Rafael Oliveira, Daniel M. Steinberg, Edwin V. Bonilla

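AI summary: This paper proposes a general strategy for using generative models in batch Bayesian optimization (BO), turning them into candidate-solution samplers that enable large-batch optimization, optimization over non-continuous design spaces, and high-dimensional and combinatorial design. Inspired by the success of direct preference optimization (DPO), the approach trains a generative model on simple utility values computed from observations so that its density is proportional to the expected utility (BO's acquisition function value), avoiding the surrogate models used in earlier methods. Theoretical analysis shows that the sequence of distributions formed during BO asymptotically approximates an optimal target under certain conditions, and the method is evaluated on high-dimensional, large-batch optimization problems.

Details
Journal ref
The Fourteenth International Conference on Learning Representations (ICLR 2026)
Comments
Published at ICLR 2026. Compared with the proceedings version on OpenReview, this version includes a minor revision to Section 3
Abstract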

We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch scaling as generative sampling, optimization of non-continuous design spaces, and high-dimensional and combinatorial design. Inspired by the success of direct preference optimization (DPO), we show that one can train a generative model with noisy, simple utility values directly computed from observations to then form proposal distributions whose densities are proportional to the expected utility, i.e., BO's acquisition function values. Furthermore, this approach is generalizable beyond preference-based feedback to general types of reward signals and loss functions. This perspective avoids the construction of surrogate (regression or classification) models, common in previous methods that have used generative models for black-box optimization. Theoretically, we show that the generative models within the BO process follow a sequence of distributions which asymptotically approximate an optimal target under certain conditions. We also evaluate the performance through experiments on challenging optimization problems involving large batches in high dimensions.

2510.15141 2026-05-15 stat.ML cs.LG stat.AP

Manifold Dimension Estimation via Local Graph Structure

Zelong Bi, Pierre Lafaye de Micheaux

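AI summary: This paper proposes a manifold dimension estimation framework based on local graph structure, which captures the manifold's local structure through regression on local principal component analysis coordinates. It introduces two representative estimators, quadratic embedding (QE) and total least squares (TLS); experiments show that they are competitive on synthetic and real-world data and often outperform state-of-the-art approaches.

Details
Abstract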

Most existing manifold dimension estimators rely on the assumption that the underlying manifold is locally flat within the neighborhoods under consideration. More recently, curvature-adjusted principal component analysis (CA-PCA) has emerged as a powerful alternative by explicitly accounting for the manifold's curvature. Motivated by these ideas, we propose a manifold dimension estimation framework that captures the local graph structure of the manifold through regression on local PCA coordinates. Within this framework, we introduce two representative estimators: quadratic embedding (QE) and total least squares (TLS). Experiments on both synthetic and real-world datasets demonstrate that these methods perform competitively with, and often outperform, state-of-the-art approaches.

2510.13583 2026-05-15 stat.ML cs.LG

On the Identifiability of Causal Graphs with the Invariance Principle

Francesco Montagna

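AI summary: This paper studies the identifiability of causal graphs, a problem that is generally ill-posed from i.i.d. observational data. It shows that, given the distribution induced by a structural causal model plus data from a small number of environments (in the best case only two) that differ sufficiently in their noise statistics, the causal graph is uniquely identifiable. This is the first guarantee of full causal graph recovery with a constant number of environments and arbitrary nonlinear mechanisms; the only constraint is Gaussian noise, and ways to relax it are discussed. The paper also expands on the duality between independent component analysis (ICA) and causal discovery, showing that causal discovery can achieve what multi-environment nonlinear ICA achieves with much less auxiliary information.

Details
Comments
Published as ICLR 2026 conference paper
Abstract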

Causal discovery from i.i.d. observational data is known to be generally ill-posed. We demonstrate that if we have access to the distribution induced by a structural causal model, and additional data from (in the best case) only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable. Notably, this is the first result in the literature that guarantees the entire causal graph recovery with a constant number of environments and arbitrary nonlinear mechanisms. Our only constraint is the Gaussianity of the noise terms; however, we propose potential ways to relax this requirement. Of interest on its own, we expand on the well-known duality between independent component analysis (ICA) and causal discovery; recent advancements have shown that nonlinear ICA can be solved from multiple environments, at least as many as the number of sources: we show that the same can be achieved for causal discovery while having access to much less auxiliary information.

2510.11177 2026-05-15 stat.AP stat.ME

Policy Robustness & Uncertainty in Model-based Decision Support for the Energy Transition

Ian J. Burton, Femke J. M. M. Nijsse, James M. Salter

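AI summary: This paper studies policy robustness and uncertainty in model-based decision support for the energy transition, presenting a general uncertainty-analysis methodology that identifies the key uncertainties in a modelling framework and overcomes earlier limits of computational cost and uncertainty representation. Applied to the electricity system transition globally and in India, the analysis finds that average rates of renewables cannibalisation, construction times, and grid connection lead times dominate uncertainty in transition outcomes, while policy design can mitigate these uncertainties. Policy packages that include partial phase-out instruments are more robust to key uncertainties, although longer lead times still hinder policy goals.

Details
Abstract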

Climate policy modelling is a key tool for assessing mitigation strategies in complex systems, where uncertainty is inherent and unavoidable. We present a general methodology for extensive uncertainty analysis in this field. While other studies have performed uncertainty analyses, few apply methods from the field of Uncertainty Quantification, which are commonly used in other modelling disciplines. We show how emulators can identify key uncertainties in modelling frameworks and demonstrate a novel policy analysis previously restricted by computational cost and limited representation of uncertainty. We apply this methodology to FTT:Power to explore uncertainties in the electricity system transition both globally and in India to assess the robustness of mitigation strategies to a wide range of policy and techno-economic scenarios. This approach results in much larger uncertainties in transition outcomes than commonly represented, but policy design can be shaped to mitigate this. Globally, our results indicate transition uncertainty is dominated by average rates of renewables cannibalisation, construction times and grid connection lead times, outweighing regional price policies, including policy reversals in the US. Solar PV appears most resilient due to low costs, though still sensitive to infrastructure constraints and cannibalisation. Onshore wind is more exposed to a range of uncertainties. In India, we find evidence that policy packages including partial phase-out instruments have greater robustness to key uncertainties, although longer lead times still hinder policy goals. Our results suggest that enabling policy and regulating fossil fuels are critical for robust power sector transitions.

2508.07876 2026-05-15 stat.ML cs.LG math.DS math.ST stat.TH

Stochastic dynamics learning with state-space systems

Juan-Pablo Ortega, Florian Rossmannek

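AI summary: This paper studies state-space systems for learning stochastic dynamics, aiming to strengthen the theoretical foundations of reservoir computing (RC). Treating fading memory and the echo state property (ESP) in a unified way across deterministic and stochastic settings, the authors show that fading memory and solution stability hold generically even without the ESP, which gives theoretical support for the broad use of RC models. In the stochastic case, the paper introduces a new perspective based on attractor dynamics on the space of probability distributions, extending earlier work on non-autonomous dynamical systems and offering new insight into causality, stability, and memory in RC models.

Details
Journal ref
Mathematical Models and Methods in Applied Sciences, 2026
English abstract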

This work advances the theoretical foundations of reservoir computing (RC) by providing a unified treatment of fading memory and the echo state property (ESP) in both deterministic and stochastic settings. We investigate state-space systems, a central model class in time series learning, and establish that fading memory and solution stability hold generically -- even in the absence of the ESP -- offering a robust explanation for the empirical success of RC models without strict contractivity conditions. In the stochastic case, we critically assess stochastic echo states, proposing a novel distributional perspective rooted in attractor dynamics on the space of probability distributions, which leads to a rich and coherent theory. Our results extend and generalize previous work on non-autonomous dynamical systems, offering new insights into causality, stability, and memory in RC models. This lays the groundwork for reliable generative modeling of temporal data in both deterministic and stochastic regimes.

2507.11922 2026-05-15 math.ST stat.ME stat.ML stat.TH

Enhancing Signal Proportion Estimation Through Leveraging Arbitrary Covariance Structures

Jingtian Bai, Xinge Jessie Jeng

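AI summary: This paper studies how to estimate more accurately the proportion of true signals among a large number of variables when the variables have complex dependence. Traditional methods usually assume independence or specific sparsity conditions, which limits their applicability in practice. The paper proposes a new signal proportion estimator that uses arbitrary covariance structure among the variables, improving estimation across sparsity levels and dependence structures. Theoretical analysis and simulations support its advantages in estimation accuracy and weak-signal detection.

Details
Comments
Revised technical details in Section 4
English abstract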

Accurately estimating the proportion of true signals among a large number of variables is crucial for enhancing the precision and reliability of scientific research. Traditional signal proportion estimators often assume independence among variables and specific signal sparsity conditions, limiting their applicability in real-world scenarios where such assumptions may not hold. This paper introduces a novel signal proportion estimator that leverages arbitrary covariance dependence information among variables, thereby improving performance across a wide range of sparsity levels and dependence structures. Building on previous work that provides lower confidence bounds for signal proportions, we extend this approach by incorporating the principal factor approximation procedure to account for variable dependence. Our theoretical insights offer a deeper understanding of how signal sparsity, signal intensity, and covariance dependence interact. By comparing the conditions for estimation consistency before and after dependence adjustment, we highlight the advantages of integrating dependence information across different contexts. This theoretical foundation not only validates the effectiveness of the new estimator but also guides its practical application, ensuring reliable use in diverse scenarios. Through extensive simulations, we demonstrate that our method outperforms state-of-the-art estimators in both estimation accuracy and the detection of weaker signals that might otherwise go undetected.

2506.20425 2026-05-15 stat.ML cs.LG stat.CO stat.ME

Scalable Subset Selection in Linear Mixed Models

Ryan Thompson, Matt P. Wand, Joanna J. J. Wang

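AI summary: This paper studies efficient, scalable subset selection in linear mixed models, which contain both fixed and random effects. To address the poor computational scaling of existing methods with many predictors, the authors propose a new $\ell_0$-regularized subset selection method that combines a coordinate descent algorithm with a convergence guarantee and a local search algorithm for the nonconvex optimization surface. Statistically, the method comes with a finite-sample bound on the KL divergence, and it performs well in experiments on synthetic and real data.

Details
English abstract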

Linear mixed models (LMMs), which incorporate fixed and random effects, are key tools for analyzing heterogeneous data, such as in personalized medicine. Nowadays, this type of data is increasingly wide, sometimes containing thousands of candidate predictors, necessitating sparsity for prediction and interpretation. However, existing sparse learning methods for LMMs do not scale well beyond tens or hundreds of predictors, leaving a large gap compared with sparse methods for linear models, which ignore random effects. This paper closes the gap with a new $\ell_0$ regularized method for LMM subset selection that can run on datasets containing thousands of predictors in seconds to minutes. On the computational front, we develop a coordinate descent algorithm as our main workhorse and provide a guarantee of its convergence. We also develop a local search algorithm to help traverse the nonconvex optimization surface. Both algorithms readily extend to subset selection in generalized LMMs via a penalized quasi-likelihood approximation. On the statistical front, we provide a finite-sample bound on the Kullback-Leibler divergence of the new method. We then demonstrate its excellent performance in experiments involving synthetic and real datasets.

2506.12296 2026-05-15 stat.ME stat.AP

Finite-sample bias-variance tradeoff with variables related to trial participation inserted into causal forest models for ensuring generalizability

Rikuta Hamaya, Etsuji Suzuki, Konan Hara

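AI summary: This study examines the finite-sample bias-variance tradeoff that arises when variables related to trial participation are included in causal forest models, with the aim of improving the generalizability of conditional average treatment effect (CATE) estimates from randomized controlled trials (RCTs). At realistic sample sizes, variance inflation from high-dimensional covariates often outweighs the reduction in bias and so lowers precision; by contrast, inverse probability weighting (IPW) based methods perform more consistently across scenarios. The results inform the handling of selection bias in RCTs and suggest addressing selection bias separately in practice.

Details
Comments
4 figures
English abstract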

Estimating conditional average treatment effects (CATE) from randomized controlled trials (RCTs) and generalizing them to broader populations is essential for personalizing treatment rules but is complicated by selection bias due to trial participation and potentially high-dimensional covariates. We evaluated the finite-sample bias-variance tradeoff of causal forest-based CATE estimation strategies for addressing this selection bias. Identification theory suggests unbiased CATE estimation is possible when covariates related to trial participation are included in CATE-estimating models. However, simulation studies demonstrated that, under realistic RCT sample sizes, variance inflation from high-dimensional covariates often outweighed modest bias reduction. In our data-generating process, which defines individual treatment effects (ITE) in the source population and in selected trial samples, including more than 3 covariates related to participation in the causal forest substantially degraded precision unless sample sizes were large. In contrast, inverse probability weighting (IPW) based methods consistently improved performance across scenarios. Application to an RCT of omega-3 fatty acids and coronary heart disease illustrated how IPW shifts CATE estimates toward source population effects and refines heterogeneity assessments. Our findings highlight that including trial-selection variables in CATE-estimating models may inflate estimator variance and reduce ITE prediction performance in applications using medical RCTs. Addressing selection bias separately (e.g. through IPW) would be a reasonable strategy.

2505.09552 2026-05-15 stat.ME cs.LG stat.ML

Scalable Krylov Subspace Methods for Generalized Mixed-Effects Models with Crossed Random Effects

Pascal Kündig, Fabio Sigrist

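AI summary: This paper addresses computational bottlenecks in generalized mixed-effects models with crossed random effects, proposing Krylov subspace-based methods that improve computational efficiency for high-dimensional data. Through theoretical analysis and experiments, it establishes convergence and accuracy results for the preconditioned stochastic Lanczos quadrature and conjugate gradient methods, and develops scalable methods for computing predictive variances. In experiments, the new methods are substantially faster and numerically more stable than traditional Cholesky-based computations.

Details
English abstract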

Mixed-effects models are widely used to model data with hierarchical grouping structures and high-cardinality categorical predictor variables. However, for high-dimensional crossed random effects, current standard computations relying on Cholesky decompositions can become prohibitively slow. In this work, we present Krylov subspace-based methods that address existing computational bottlenecks, and we analyze them both theoretically and empirically. In particular, we derive new results on the convergence and accuracy of the preconditioned stochastic Lanczos quadrature and conjugate gradient methods for mixed-effects models, and we develop scalable methods for calculating predictive variances. In experiments with simulated and real-world data, the proposed methods yield speedups by factors of up to about 10,000 and are numerically more stable than Cholesky-based computations.

2505.05670 2026-05-15 econ.EM math.ST stat.AP stat.ME stat.TH

Estimation and Inference in Boundary Discontinuity Designs: Location-Based Methods

Matias D. Cattaneo, Rocio Titiunik, Ruiqi Rae Yu

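AI summary: This paper studies estimation and inference for causal effects in boundary discontinuity designs, where a continuous assignment boundary splits units into treatment and control groups, and analyzes local polynomial treatment effect estimators based on the location score. It develops pointwise and uniform estimation and inference methods for the Boundary Average Treatment Effect Curve (BATEC) and for two aggregated parameters (WBATE and LBATE), covering both sharp and fuzzy (imperfect compliance) designs, and illustrates the methods with an empirical application and companion software.

Details
English abstract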

Boundary discontinuity designs are used to learn about causal treatment effects along a continuous assignment boundary that splits units into control and treatment groups according to a bivariate location score. We analyze location-based local polynomial treatment effect estimators that directly employ the bivariate score of each unit. We develop pointwise and uniform estimation and inference methods for the Boundary Average Treatment Effect Curve (BATEC), as well as for two aggregated causal parameters: the Weighted Boundary Average Treatment Effect (WBATE) and the Largest Boundary Average Treatment Effect (LBATE). Our results cover both sharp and fuzzy (imperfect compliance) designs. We illustrate the methods with an empirical application, and provide companion general-purpose software. The supplemental appendix includes additional substantive theoretical results, methodological details, and simulation evidence.

2501.18756 2026-05-15 stat.ML cs.LG math.OC

A Unified Framework for Entropy Search and Expected Improvement in Bayesian Optimization

Nuojin Cheng, Leonard Papenmeier, Stephen Becker, Luigi Nardi

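AI summary: This paper proposes a unified theoretical framework, Variational Entropy Search, that reveals a deep connection between Expected Improvement (EI) and information-theoretic acquisition functions, challenging the conventional view that they are fundamentally different. By interpreting EI as a variational approximation of Max-value Entropy Search (MES), the authors derive a new acquisition function, VES-Gamma, which performs well on low- and high-dimensional synthetic and real-world benchmarks and in many cases outperforms EI and MES.

Details
Journal ref
Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:10106-10120, 2025
English abstract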

Bayesian optimization is a widely used method for optimizing expensive black-box functions, with Expected Improvement (EI) being one of the most commonly used acquisition functions. In contrast, information-theoretic acquisition functions aim to reduce uncertainty about the function's optimum and are often considered fundamentally distinct from EI. In this work, we challenge this prevailing perspective by introducing a unified theoretical framework, Variational Entropy Search, which reveals that EI and information-theoretic acquisition functions are more closely related than previously recognized. We demonstrate that EI can be interpreted as a variational inference approximation of a popular information-theoretic acquisition function, Max-value Entropy Search (MES). Building on this insight, we propose VES-Gamma, a novel acquisition function that balances the strengths of EI and MES. Extensive empirical evaluations across both low- and high-dimensional synthetic and real-world benchmarks demonstrate that VES-Gamma is competitive with state-of-the-art acquisition functions and in many cases outperforms EI and MES.
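
For reference, the standard closed-form Expected Improvement under a Gaussian posterior can be sketched as below. This is the textbook EI formula for maximization, not the VES-Gamma acquisition function proposed in the paper; the function name `expected_improvement` and its argument names are illustrative assumptions.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for maximization under a Gaussian posterior N(mu, sigma^2).

    EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma,
    where `best` is the best objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best) * cdf + sigma * pdf
```

EI is always nonnegative and grows with both the posterior mean and the posterior standard deviation, which is the usual exploitation-exploration reading of the formula.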

2410.24003 2026-05-15 stat.ME

On testing for independence between generalized error models of several time series

Kilani Ghoudi, Bouchra R. Nasri, Bruno N. Remillard

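AI summary: This paper studies tests of independence between generalized error models of several time series. It introduces generalized innovations for arbitrary distributions (including mixtures of continuous and discrete distributions) and constructs families of empirical processes from lagged generalized errors. Moebius transformations are used to obtain tractable covariances for the empirical processes, and test statistics based on Cramer-von Mises statistics and dependence measures are proposed, along with graphical methods to visualize dependence. Numerical experiments assess the power of the tests, applications to financial and crime data illustrate the methods, and the methods are implemented in the R package IndGenErrors.

Details
English abstract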

We define generalized innovations associated with generalized error models having arbitrary distributions, that is, distributions that can be mixtures of continuous and discrete distributions. These models include stochastic volatility models and regime-switching models. We also propose statistics for testing independence between the generalized errors of these models, extending previous results of Duchesne, Ghoudi and Remillard (2012) obtained for stochastic volatility models. We define families of empirical processes constructed from lagged generalized errors, and we show that their joint asymptotic distributions are Gaussian and independent of the estimated parameters of the individual time series. Moebius transformations of the empirical processes are used to obtain tractable covariances. Several test statistics are then proposed, based on Cramer-von Mises statistics and dependence measures, as well as graphical methods to visualize the dependence. In addition, numerical experiments are performed to assess the power of the proposed tests. Finally, to show the usefulness of our methodologies, examples of applications for financial data and crime data are given to cover both discrete and continuous cases. All developed methodologies are implemented in the CRAN package IndGenErrors.

2410.09504 2026-05-15 stat.ME stat.CO

Bayesian Transfer Learning for Artificially Intelligent Geospatial Systems: A Predictive Stacking Approach

Luca Presicce, Sudipto Banerjee

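AI summary: This paper proposes a transfer learning framework based on Bayesian predictive stacking for building artificially intelligent geospatial systems that deliver rapid, automated analysis of massive spatial data. The approach splits a massive data set into smaller data sets that stream into the analytical framework, propagating learning and assimilating inference for the whole data set without human intervention. Extensive simulation experiments and an application to vegetation index data show inference comparable to traditional statistical approaches with lower hardware demands.

Details
English abstract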

Building artificially intelligent geospatial systems requires rapid delivery of spatial data analysis on massive scales with minimal human intervention. Depending upon the intended use, data analysis can also involve model assessment and uncertainty quantification. This article devises transfer learning frameworks for deployment in artificially intelligent systems, where a massive data set is split into smaller data sets that stream into the analytical framework to propagate learning and assimilate inference for the entire data set. Specifically, we introduce Bayesian predictive stacking for multivariate spatial data and demonstrate rapid and automated analysis of massive data sets. Furthermore, inference is delivered without human intervention and without excessively demanding hardware settings. We illustrate the effectiveness of our approach through extensive simulation experiments and by producing inference from a massive data set on vegetation index that is indistinguishable from that of traditional (and more expensive) statistical approaches.

2406.15865 2026-05-15 stat.CO math.OC

Approximate Bayesian Computation sequential Monte Carlo via random forests

Khanh N. Dinh, Cécile Liu, Zijin Xiang, Zhihan Liu, Simon Tavaré

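AI summary: This paper studies how to use random forests more effectively for posterior inference of parameters in Approximate Bayesian Computation (ABC). The authors propose two adaptations: one uses distributional random forests to infer the joint posterior distribution of the parameters directly, and the other is a sequential Monte Carlo approach that iteratively updates the prior distribution to focus on the most likely regions of parameter space. The new methods accurately infer posterior distributions for a wide range of deterministic and stochastic models in different scientific areas.

Details
English abstract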

Approximate Bayesian Computation (ABC) is a popular inference method when likelihoods are hard to come by. Practical bottlenecks of ABC applications include selecting statistics that summarize the data without losing too much information or introducing uncertainty, and choosing distance functions and tolerance thresholds that balance accuracy and computational efficiency. Recent studies have shown that ABC methods using random forest (RF) methodology perform well while circumventing many of ABC's drawbacks. However, RF construction is computationally expensive for large numbers of trees and model simulations, and there can be high uncertainty in the posterior if the prior distribution is uninformative. Here we further adapt random forests to the ABC setting in two ways. The first exploits distributional random forests to provide a direct method for inferring the joint posterior distribution of parameters of interest, while the second describes a sequential Monte Carlo approach which updates the prior distribution iteratively to focus on the most likely regions in the parameter space. We show that the new methods can accurately infer posterior distributions for a wide range of deterministic and stochastic models in different scientific areas.

2404.13649 2026-05-15 stat.ML cs.LG stat.ME

Distributional Principal Autoencoders

Xinwei Shen, Nicolai Meinshausen

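AI summary: This paper proposes the Distributional Principal Autoencoder (DPA), a dimension reduction method that aims to preserve the distribution of the original data in its reconstructions. By learning the conditional distribution of the data given its low-dimensional latent variables, it makes the reconstructed data identically distributed as the original data. Experiments on climate data, single-cell data, and image data show that DPA preserves the original distribution and meaningful structure of the data.

Details
English abstract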

Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data. However, we argue that it is possible to have reconstructed data identically distributed as the original data, irrespective of the retained dimension or the specific mapping. This can be achieved by learning a distributional model that matches the conditional distribution of data given its low-dimensional latent variables. Motivated by this, we propose Distributional Principal Autoencoder (DPA) that consists of an encoder that maps high-dimensional data to low-dimensional latent variables and a decoder that maps the latent variables back to the data space. For reducing the dimension, the DPA encoder aims to minimise the unexplained variability of the data with an adaptive choice of the latent dimension. For reconstructing data, the DPA decoder aims to match the conditional distribution of all data that are mapped to a certain latent value, thus ensuring that the reconstructed data retains the original data distribution. Our numerical results on climate data, single-cell data, and image benchmarks demonstrate the practical feasibility and success of the approach in reconstructing the original distribution of the data. DPA embeddings are shown to preserve meaningful structures of data such as the seasonal cycle for precipitations and cell types for gene expression.

2305.06280 2026-05-15 math.ST math.AG stat.TH

Maximum likelihood thresholds of generic linear concentration models

Daniel Irving Bernstein, Steven J. Gortler, Louis Theran

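AI summary: This paper studies the maximum likelihood threshold of generic linear concentration models, that is, the minimum number of data points needed to fit the model by maximum likelihood estimation. The authors determine this threshold and show that it matches a naive dimension count, which is nontrivial to prove because the maximum likelihood threshold is a semi-algebraic concept. They also describe geometrically how a linear concentration model can fail to exhibit this generic behavior.

Details
English abstract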

The maximum likelihood threshold of a statistical model is the minimum number of datapoints required to fit the model via maximum likelihood estimation. In this paper we determine the maximum likelihood thresholds of generic linear concentration models. This turns out to be the number that one might expect from a naive dimension count, which is nontrivial to prove given that the maximum likelihood threshold is a semi-algebraic concept. We also describe geometrically how a linear concentration model can fail to exhibit this generic behavior.

2305.00578 2026-05-15 stat.ME

High-Dimensional Clustering via Nearest-Neighbor Asymmetry

Hao Chen, Xiancheng Lin

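AI summary: High-dimensional clustering usually relies on geometric or local-similarity structure, but the dominant separation between groups is not always location-based. This paper proposes NAC, a clustering method based on nearest-neighbor asymmetry, which constructs a directed $k$-nearest-neighbor graph and evaluates candidate partitions with two standardized statistics, adapting to different separation regimes without specifying a mixture model or a low-dimensional representation. It performs well under location, scale, and combined differences, is especially effective when nearest-neighbor asymmetry is present, and is illustrated on gene-expression data.

Details
English abstract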

High-dimensional clustering often relies on geometric or local-similarity structure, but the dominant separation between groups may not always be location-based. Differences in dispersion can create asymmetric local-neighborhood patterns: points from a more dispersed component may be closer to points in a more concentrated component than to points from their own component. We turn this high-dimensional phenomenon into a clustering principle. The proposed method, NAC (Nearest-neighbor Asymmetry Clustering), constructs a directed $k$-nearest-neighbor graph and evaluates candidate partitions using two permutation-standardized statistics: a weighted within-edge statistic that captures overall within-cluster enrichment and a contrast statistic that captures asymmetric separation. The resulting objective combines these two standardized signals, allowing the method to adapt to different separation regimes without specifying a mixture model or a low-dimensional representation. We provide a population-level analysis showing how the two statistics target complementary nearest-neighbor patterns. Simulation studies across mean, scale, and combined location-scale differences show that NAC is competitive under location separation and especially effective when nearest-neighbor asymmetry is present; gene-expression applications further illustrate its usefulness in small-sample, high-dimensional clustering.

2304.11468 2026-05-15 cs.LG stat.ML

Increasing the Scope as You Learn: Adaptive Bayesian Optimization in Nested Subspaces

Leonard Papenmeier, Luigi Nardi, Matthias Poloczek

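AI summary: This paper proposes BAxUS, an adaptive Bayesian optimization method that uses nested random subspaces to adapt the search space during optimization, addressing the performance degradation seen in high-dimensional black-box optimization. The method comes with theoretical guarantees and achieves better results than state-of-the-art methods across a broad set of applications.

Details
Journal ref
Advances in Neural Information Processing Systems 35 (NeurIPS 2022), pp. 11586-11601
Comments
28 pages, 8 figures. Accepted to NeurIPS 2022. This is the revised version and includes the appendix
English abstract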

Recent advances have extended the scope of Bayesian optimization (BO) to expensive-to-evaluate black-box functions with dozens of dimensions, aspiring to unlock impactful applications, for example, in the life sciences, neural architecture search, and robotics. However, a closer examination reveals that the state-of-the-art methods for high-dimensional Bayesian optimization (HDBO) suffer from degrading performance as the number of dimensions increases or even risk failure if certain unverifiable assumptions are not met. This paper proposes BAxUS that leverages a novel family of nested random subspaces to adapt the space it optimizes over to the problem. This ensures high performance while removing the risk of failure, which we assert via theoretical guarantees. A comprehensive evaluation demonstrates that BAxUS achieves better results than the state-of-the-art methods for a broad set of applications.

2202.05568 2026-05-15 stat.ML cs.IT cs.LG math.IT math.PR math.ST stat.TH

Change of measure through the Legendre transform

Antoine Picard-Weibel, Benjamin Guedj

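AI summary: This paper studies change-of-measure inequalities obtained through the Legendre transform, for deriving PAC-Bayes generalisation bounds. Combining the Legendre transform with the Fenchel-Young inequality, the authors build change-of-measure inequalities based on $f$-divergences, going beyond the conditions required by the classical Donsker-Varadhan theorem. The approach gives learning theory a more flexible tool and yields PAC-Bayes guarantees under a wider range of assumptions.

Details
Comments
27 pages
English abstract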

PAC-Bayes generalisation bounds are derived via change-of-measure inequalities that transfer concentration properties from a reference measure to all posterior measures. The specific choice of change of measure determines the assumptions required on the empirical risk; in particular, the classical Donsker-Varadhan theorem leads to bounds relying on bounded exponential moments. We study change-of-measure inequalities based on $f$-divergences, obtained by combining the Legendre transform of $f$ with the Fenchel-Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.
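
The Fenchel-Young step mentioned in the abstract can be written out as follows. This is a sketch of the standard argument, assuming that $Q \ll P$, that $f$ is convex with Legendre transform $f^*(y) = \sup_x \{xy - f(x)\}$, and that all expectations are finite; the symbols $P$, $Q$ and $\phi$ are notation chosen here, and the paper's exact statements may differ.

```latex
% Fenchel--Young: for all x, y,  x y \le f(x) + f^*(y).
% Apply it pointwise with x = dQ/dP and y = \phi, then integrate against P:
\begin{align*}
\mathbb{E}_Q[\phi]
  = \mathbb{E}_P\!\left[\frac{dQ}{dP}\,\phi\right]
  &\le \mathbb{E}_P\!\left[f\!\left(\frac{dQ}{dP}\right)\right]
     + \mathbb{E}_P\!\left[f^*(\phi)\right] \\
  &= D_f(Q \,\|\, P) + \mathbb{E}_P\!\left[f^*(\phi)\right].
\end{align*}
% For comparison, the Donsker--Varadhan inequality reads
% \mathbb{E}_Q[\phi] \le \mathrm{KL}(Q \,\|\, P) + \log \mathbb{E}_P\!\left[e^{\phi}\right].
```

The choice of $f$ determines which moment of $\phi$ under $P$ must be controlled, through the term $\mathbb{E}_P[f^*(\phi)]$.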

1902.06002 2026-05-15 cs.IT cs.DM math.IT math.PR math.ST stat.TH

Group Testing: An Information Theory Perspective

Matthew Aldridge, Oliver Johnson, Jonathan Scarlett

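AI summary: This monograph surveys recent developments in the group testing problem from an information-theoretic perspective. Group testing aims to identify a small number of defective items by testing pools of items, with applications in medicine, biology, telecommunications, and other fields. The survey covers efficient algorithms, achievability and converse bounds for decoding, and the notion of the "rate" of group testing, which measures the amount of information learned per test. It also covers noisy settings and several variations of the problem.

Details
Journal ref
Foundations and Trends in Communications and Information Theory: Vol. 23: No. 1-2, pp 1-221, 2026
Comments
Second edition. Published in Foundations and Trends in Communications and Information Theory. The first edition can be found in arXiv v3
English abstract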

The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the rate of group testing, indicating the amount of information learned per test. For the noiseless setting, we present a series of results leading to optimal rates, which in turn imply optimality and suboptimality results of various algorithms depending on the sparsity regime. We also survey analogous developments in noisy settings. In addition, we survey results concerning a number of variations on the standard group testing problem, including approximate recovery criteria, adaptive algorithms with a limited number of stages, sublinear-time algorithms, and settings with additional prior information, among others.
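
The noiseless test model described above can be sketched in a few lines. The example below uses a random Bernoulli pooling design and the simple COMP decoder, which clears every item that appears in a negative test; it is one illustrative decoder rather than the optimal one, and the problem sizes, the inclusion probability `p`, and all function names are assumptions made for this sketch.

```python
import random


def run_tests(pools, defectives):
    """Noiseless outcomes: a test is positive iff its pool contains a defective."""
    return [any(item in defectives for item in pool) for pool in pools]


def comp_decode(n_items, pools, outcomes):
    """COMP decoding: every item that appears in a negative test is cleared;
    all remaining items are declared (possibly) defective."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    return set(range(n_items)) - cleared


def bernoulli_design(n_items, n_tests, p, rng):
    """Each item joins each pool independently with probability p."""
    return [[i for i in range(n_items) if rng.random() < p] for _ in range(n_tests)]


# Hypothetical problem sizes, chosen only for illustration.
rng = random.Random(0)
n_items, n_tests = 200, 60
defectives = set(rng.sample(range(n_items), 4))
pools = bernoulli_design(n_items, n_tests, p=0.2, rng=rng)
outcomes = run_tests(pools, defectives)
estimate = comp_decode(n_items, pools, outcomes)
```

In the noiseless setting a defective item can never appear in a negative test, so `estimate` always contains every true defective; any errors are false positives, and these become rarer as the number of tests grows.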

1805.06144 2026-05-15 math.ST stat.TH

On Difference Between Two Types of $γ$-divergence for Regression

Takayuki Kawashima, Hironori Fujisawa

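AI summary: This paper studies the difference between two types of $γ$-divergence for regression under heterogeneous contamination. One has strong robustness, while the other does not in general, though it retains robustness for particular parametric models or under homogeneous contamination. The results clarify which divergence suits which contamination scenario and provide a theoretical basis for choosing between them.

Details
English abstract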

The $γ$-divergence is well known for its strong robustness against heavy contamination. By virtue of this property, many applications of the $γ$-divergence have been proposed. There are two types of $γ$-divergence for the regression problem, which differ in their treatment of the base measure. In this paper, we compare them and point out a distinct difference between these two divergences under heterogeneous contamination, where the outlier ratio depends on the explanatory variable. One divergence has strong robustness under heterogeneous contamination. The other does not in general, but does when the parametric model of the response variable belongs to a location-scale family in which the scale does not depend on the explanatory variables, or under homogeneous contamination, where the outlier ratio does not depend on the explanatory variable. Hung et al. (2017) discussed strong robustness in a logistic regression model under the additional assumption that the tuning parameter $γ$ is sufficiently large. The results obtained in this paper hold for any parametric model without such an additional assumption.

1802.03127 2026-05-15 stat.ML stat.ME

Robust and Sparse Regression in GLM by Stochastic Optimization

Takayuki Kawashima, Hironori Fujisawa

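AI summary: This paper studies robust and sparse regression in the generalized linear model (GLM) via stochastic optimization. Because the sparse GLM used for high-dimensional data is not robust to outliers, the authors propose a robust and sparse GLM based on the $γ$-divergence and estimate its parameters with randomized stochastic projected gradient descent, which makes very large problems tractable. Numerical experiments and real data analysis show that the method outperforms comparative methods on several specific models.

Details
Comments
28 pages
English abstract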

The generalized linear model (GLM) plays a key role in regression analyses. For high-dimensional data, the sparse GLM has been used, but it is not robust against outliers. Recently, robust methods have been proposed for specific examples of the sparse GLM. Among them, we focus on robust and sparse linear regression based on the $γ$-divergence. The estimator based on the $γ$-divergence has strong robustness under heavy contamination. In this paper, we extend robust and sparse linear regression based on the $γ$-divergence to the robust and sparse GLM based on the $γ$-divergence, using a stochastic optimization approach to obtain the estimate. We adopt randomized stochastic projected gradient descent as the stochastic optimization approach and extend the established convergence property to the classical first-order necessary condition. By virtue of the stochastic optimization approach, we can efficiently estimate parameters for very large problems. In particular, we present linear regression, logistic regression and Poisson regression with $L_1$ regularization in detail as specific examples of the robust and sparse GLM. In numerical experiments and real data analysis, the proposed method outperformed comparative methods.

2605.14168 2026-05-15 cs.LG cs.DS stat.ML

Finite Sample Bounds for Learning with Score Matching

Devin Smedira, Abhijith Jayakumar, Sidhant Misra, Marc Vuffray, Andrey Y. Lokhov

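AI summary: This paper studies learning continuous exponential family distributions with score matching in the finite-sample regime. The authors give a non-asymptotic sample complexity analysis showing polynomial dependence on the model dimension, which they describe as the first result of its kind. The work fills a gap in the theoretical analysis of score matching for high-dimensional statistical learning.

Details
Comments
22 pages
English abstract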

Learning of continuous exponential family distributions with unbounded support remains an important area of research for both theory and applications in high-dimensional statistics. In recent years, score matching has become a widely used method for learning exponential families with continuous variables due to its computational ease when compared against maximum likelihood estimation. However, theoretical understanding of the statistical properties of score matching is still lacking. In this work, we provide a non-asymptotic sample complexity analysis for learning the structure of exponential families of polynomials with score matching. The derived sample bounds show a polynomial dependence on the model dimension. These bounds are the first of their kind, as all prior work has shown only asymptotic bounds on the sample complexity.

2605.14142 2026-05-15 stat.ML cs.LG stat.CO

To discretize continually: Mean shift interacting particle systems for Bayesian inference

Ayoub Belhadji, Daniel Sharp, Youssef M. Marzouk

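AI summary: This paper proposes an interacting particle system based on minimizing maximum mean discrepancy (MMD) for approximating integrals against a probability distribution known only through its unnormalized density. The methods extend the classical mean shift algorithm and algorithms for optimal quantization of empirical distributions to continuous distributions, are invariant to the unknown normalizing constant, and admit both gradient-free and gradient-informed implementations. Experiments on multi-modal mixtures, Bayesian hierarchical models, PDE-constrained inverse problems, and other sampling tasks show fast convergence, capture of multi-modality, and scaling to high dimensions.

Details
English abstract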

Integration against a probability distribution given its unnormalized density is a central task in Bayesian inference and other fields. We introduce new methods for approximating such expectations with a small set of weighted samples -- i.e., a quadrature rule -- constructed via an interacting particle system that minimizes maximum mean discrepancy (MMD) to the target distribution. These methods extend the classical mean shift algorithm, as well as recent algorithms for optimal quantization of empirical distributions, to the case of continuous distributions. Crucially, our approach creates dynamics for MMD minimization that are invariant to the unknown normalizing constant; they also admit both gradient-free and gradient-informed implementations. The resulting mean shift interacting particle systems converge quickly, capture anisotropy and multi-modality, avoid mode collapse, and scale to high dimensions. We demonstrate their performance on a wide range of benchmark sampling problems, including multi-modal mixtures, Bayesian hierarchical models, PDE-constrained inverse problems, and beyond.
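
As background, the classical mean shift algorithm that these methods extend can be sketched as below. This is the textbook sample-based update with a Gaussian kernel, not the interacting particle system of the paper; the bandwidth, the synthetic two-cluster data, and the function names are assumptions made for illustration.

```python
import numpy as np


def mean_shift_step(x, samples, bandwidth):
    """One classical mean shift update with a Gaussian kernel:
    move x to the kernel-weighted average of the samples."""
    sq_dist = np.sum((samples - x) ** 2, axis=1)
    weights = np.exp(-0.5 * sq_dist / bandwidth**2)
    return weights @ samples / weights.sum()


def mean_shift(x0, samples, bandwidth, n_steps=100):
    """Iterate the update from x0; the iterate climbs toward a nearby density mode."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = mean_shift_step(x, samples, bandwidth)
    return x


# Synthetic data: two tight clusters centred at (0, 0) and (5, 5).
rng = np.random.default_rng(0)
samples = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(5.0, 0.1, size=(50, 2)),
])
mode = mean_shift([4.0, 4.0], samples, bandwidth=0.5)
```

Started at (4, 4), the iterate is pulled to the cluster around (5, 5), since the kernel weights of the far cluster are negligible at this bandwidth.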

2605.14098 2026-05-15 stat.ML cs.CL cs.LG

Pause and Reflect: Conformal Aggregation for Chain-of-Thought Reasoning

Yu Gu, Zijun Yu, Vahid Partovi Nia, Masoud Asgharian

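AI summary: This study addresses aggregation uncertainty across multiple reasoning paths in chain-of-thought (CoT) reasoning and proposes a conformal aggregation method that controls how often the system answers confidently but incorrectly. Instead of majority voting, it uses weighted score aggregation and calibrates an abstention rule with conformal risk control, giving finite-sample guarantees on the confident-error rate. Experiments on several benchmarks show high selective accuracy without retraining the model.

Details
Comments
9 pages, 4 figures, submitted
English abstract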

Chain-of-thought (CoT) reasoning with self-consistency improves performance by aggregating multiple sampled reasoning paths. In this setting, correctness is no longer tied to a single reasoning trace but to the aggregation rule over a pool of candidate paths, making aggregation uncertainty the central challenge. This issue is critical where confidently incorrect answers are far more costly than abstentions. We introduce a conformal procedure for CoT reasoning that directly addresses aggregation uncertainty. Our approach replaces majority voting with weighted score aggregation over reasoning paths and calibrates an abstention rule using conformal risk control. This approach leads to finite-sample guarantees on the confident-error rate--the probability that the system answers and is wrong. We further identify score separability as the key condition under which abstention provably improves selective accuracy, and derive closed-form expressions that predict accuracy gains from calibration data alone. The method is fully inference-time, and requires no retraining. Across four benchmarks, four open-source models, and three score classes, realized confident-error rates are consistent with the prescribed targets up to calibration-split and test-set variability. Our method achieves $90.1\%$ selective accuracy on GSM8K by abstaining on less than $5\%$ of problems, compared with $82\%$ accuracy under majority-voting baseline.
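
The two ingredients named above, weighted score aggregation and an abstention threshold calibrated by conformal risk control, can be sketched generically as follows. This is a minimal illustration of the general conformal risk control recipe for a loss bounded by 1, not the paper's exact procedure or score classes; the path weights, the use of the winning weight share as confidence, and both function names are assumptions.

```python
import math
from collections import defaultdict


def aggregate(paths):
    """Weighted vote over (answer, weight) pairs with positive weights.

    Returns the top answer and its share of the total weight, used as confidence.
    """
    totals = defaultdict(float)
    for answer, weight in paths:
        totals[answer] += weight
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())


def calibrate_threshold(scores, correct, alpha):
    """Conformal risk control for the loss 1{answered and wrong}.

    The system answers when score > lam. The loss is nonincreasing in lam and
    bounded by 1, so we return the smallest candidate lam with
    (number of confident errors + 1) / (n + 1) <= alpha.
    """
    n = len(scores)
    for lam in [-math.inf] + sorted(scores):
        errors = sum(1 for s, ok in zip(scores, correct) if s > lam and not ok)
        if (errors + 1) / (n + 1) <= alpha:
            return lam
    return math.inf  # too little calibration data for this alpha: always abstain
```

At inference time the system would call `aggregate` on the sampled paths and answer only when the returned confidence exceeds the calibrated threshold, abstaining otherwise.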

2605.14059 2026-05-15 cond-mat.dis-nn stat.ML

Finite-size scaling of hetero-associative retrieval in continuous-signal-driven Ising spin systems

Andrea Ladiana

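AI summary: This work studies the finite-size scaling of hetero-associative memory in Ising spin systems driven by continuous signals. A multilayer Ising framework encodes continuous signals as discrete spins and combines them with pseudo-inverse memory couplings, enabling associative recall of high-dimensional biological signals. The study describes the symmetry-breaking mechanism induced by thermal fluctuations and establishes a dynamical duality that clarifies the distinct roles of parallel and sequential updates in signal propagation. Experiments show strong cross-modal recall on sleep polysomnography data.

Details
Abstract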

Real-world physical signals are continuous and high-dimensional, yet the statistical-mechanics machinery of associative memory operates on discrete Ising spins. We bridge this divide through a multilayer Ising framework that couples a geometry-preserving continuous-to-Ising encoder (PCA whitening composed with SimHash random-hyperplane projection) to Kanter-Sompolinsky pseudo-inverse memory couplings, embedded directly into the local-field equations of a tri-layer hetero-associative system. The pseudo-inverse correction renders the equal-weight mixture state thermodynamically unstable, so that thermal fluctuations break the cross-modal symmetry and select a single global winner. We further establish a dynamical duality: parallel (Little) updates are structurally required to ignite the cross-modal signal avalanche from a single cued layer, whereas sequential (Glauber) sweeps resolve symmetric superpositions. The operational storage capacity obeys the Amit-Gutfreund-Sompolinsky finite-size correction $α_c(N)=α_c(\infty)-c\,N^{-1/2}$, extrapolating to an asymptotic operational limit $α_c(\infty)\approx 0.50$ under macroscopic-basin retrieval. Applied to multi-channel sleep polysomnography (PhysioNet Sleep-EDF), the architecture reconstructs the macroscopic sleep state on parietal EEG and EOG axes from a single noisy frontal-EEG cue, demonstrating cross-modal recall in the presence of quenched biological disorder.
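
The finite-size correction $α_c(N)=α_c(\infty)-c\,N^{-1/2}$ is linear in $N^{-1/2}$, so the asymptotic capacity can be read off as the intercept of an ordinary least-squares line. The sketch below shows that extrapolation on synthetic values generated from the formula itself; the system sizes and the constant `c = 1.2` are made up for illustration and are not results from the paper.

```python
def fit_finite_size(sizes, capacities):
    """Least-squares fit of alpha_c(N) = alpha_inf - c * N**-0.5.

    Returns (alpha_inf, c), where alpha_inf is the intercept at N -> infinity.
    """
    xs = [n ** -0.5 for n in sizes]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(capacities) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, capacities))
    slope = sxy / sxx
    alpha_inf = y_mean - slope * x_mean  # value of the line at N**-0.5 = 0
    return alpha_inf, -slope             # c is minus the fitted slope

# Synthetic, illustrative measurements only (hypothetical c = 1.2).
sizes = [100, 400, 1600, 6400]
capacities = [0.50 - 1.2 * n ** -0.5 for n in sizes]
alpha_inf, c = fit_finite_size(sizes, capacities)
```

With noisy measured capacities, the same fit would give an estimate of the asymptotic limit together with the strength of the finite-size correction.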

2605.14056 2026-05-15 stat.ME stat.AP

An MCMC-Based Method for Dynamic Causal Modeling of Effective Connectivity in Functional MRI

Kaitlyn R. Fales, Hyebin Song, Nicole A. Lazar

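AI summary: This work proposes Canonical DCM (CDCM), a Markov chain Monte Carlo (MCMC)-based dynamic causal modeling method for analyzing effective connectivity in functional magnetic resonance imaging (fMRI). Compared with standard DCM, CDCM uses a simpler observation model and the No-U-Turn Sampler, which improves computational efficiency and parameter identifiability. Results on simulated and real data show that CDCM gives more reliable uncertainty estimates and consistent estimates of parameters related to experimental inputs, in both small- and large-scale neuroimaging settings.

Details
Abstract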

Effective connectivity analysis in functional magnetic resonance imaging (fMRI) studies directional interactions among brain regions and experimental stimuli. Dynamic causal modeling (DCM) is a widely used method to estimate effective connectivity, based on a state-space representation consisting of a latent neural signal model and an observation model transforming the neural signal into the observed blood-oxygen-level-dependent (BOLD) response. A standard DCM combines ordinary differential equation (ODE) dynamics for the latent signal with a complex neural-hemodynamic system for the observation model, and typically uses variational Bayes for parameter estimation. While physically well-motivated, this approach can lead to practical challenges such as inexact solutions and underestimated uncertainty. We introduce Canonical DCM (CDCM), a Markov chain Monte Carlo (MCMC)-based method that adopts a simpler observation model and the No-U-Turn Sampler for posterior sampling. The simpler observation model admits a piecewise analytic solution to the neural ODE, increasing computational efficiency and enabling explicit derivation of sufficient conditions for parameter identifiability. The results indicate that CDCM provides reliable uncertainty quantification and consistent estimation of parameters related to experimental inputs for simulated and real data. We use publicly available data from the Wellcome Centre for Human Neuroimaging and the Human Connectome Project (HCP) to benchmark CDCM against standard DCM methods and examine replicability of estimated connectivity patterns in small- and large-scale neuroimaging settings.

2605.14041 2026-05-15 stat.ME cs.LG

Wahkon: A Statistically Principled Deep RKHS Superposition Network

Yongkai Chen, Wenxuan Zhong, Ping Ma

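AI summary: This paper proposes Wahkon, a deep reproducing kernel Hilbert space (RKHS) superposition network that aims to combine the predictive power of deep learning with the statistical guarantees of RKHS methods. Building on Kolmogorov's superposition principle and RKHS regularization in the tradition of Wahba's smoothing splines, it establishes a finite-dimensional deep representer theorem that gives a trainable model structure with layerwise complexity control. Theory shows that the estimator coincides with the maximum a posteriori estimate under a hierarchical Gaussian-process prior and attains minimax-optimal convergence rates, and it clarifies how depth and width trade off with regularity. Experiments show that it outperforms conventional deep models on several benchmarks and in a single-cell data analysis.

Details
Abstract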

Deep learning excels at prediction but often lacks finite-sample guarantees and calibrated uncertainty; RKHS (Reproducing Kernel Hilbert Space)-based methods provide those guarantees but struggle to adapt in high dimensions. We propose Wahkon, a deep RKHS superposition network that unifies Kolmogorov's superposition principle with RKHS regularization in the smoothing-spline tradition of Wahba. This yields a finite-dimensional deep representer theorem that makes training tractable and provides explicit layerwise complexity control. We show the penalized estimator is exactly the MAP (maximum a posteriori) estimate under a hierarchical Gaussian-process prior, extending the spline/GP duality to deep compositions. Using metric-entropy arguments, we establish minimax-optimal convergence rates under mild smoothness and clarify how depth and width trade off with regularity. Empirically, Wahkon outperforms multilayer perceptrons, Neural Tangent Kernels, and Kolmogorov-Arnold Networks across simulation benchmarks and a single-cell CITE-seq study. By unifying Kolmogorov's superposition principle with RKHS regularization, Wahkon delivers accuracy, interpretability, and statistical rigor in a single framework.

2605.14019 2026-05-15 econ.EM cs.LG math.ST stat.CO stat.TH

Regret Equals Covariance: A Closed-Form Characterization for Stochastic Optimization

Irene Aldridge

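AI summary: This paper studies how to measure regret in stochastic optimization and gives an exact covariance decomposition that expresses expected regret as the covariance between the uncertain parameters and the optimal decision plus an estimable residual term. For linear programs and unconstrained quadratic programs the residual is zero, so regret can be computed directly from the covariance, avoiding the high computational cost of conventional Sample Average Approximation. In practice the covariance can be estimated efficiently from historical data, and the approach is validated through theoretical analysis and experiments.

Details
Comments
33 pages
Abstract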

Regret is the cost of uncertainty in algorithmic decision-making. Quantifying regret typically requires computationally expensive simulation via Sample Average Approximation (SAA), with complexity $\mathcal{O}(Bn^{2}d^{3})$ in the number of scenarios $B$, variables $n$, and constraints $d$. This paper proves that expected regret in any stochastic optimization problem admits the exact decomposition $\mathrm{Regret}(c) = \mathrm{Cov}(c,\,π^{*}(c)) + R(c)$, where $c$ is the vector of uncertain parameters, $π^{*}(c)$ is the optimal decision, and $R(c)$ is a residual whose magnitude we bound explicitly under Lipschitz, smooth, and strongly convex conditions. For linear programs and unconstrained quadratic programs, including the classical Markowitz portfolio problem, we prove $R(c)=0$ exactly, so that $\mathrm{Regret}(c) = \mathrm{Cov}(c,π^{*}(c))$ holds without approximation. When historical cost-decision pairs $\{(c_i, π^*(c_i))\}$ are available, the covariance can be estimated in $\mathcal{O}(nd^{2})$ time, which is orders of magnitude faster than SAA. The estimation is performed by a single pass through the data. We derive concentration bounds, a central limit theorem, and an asymptotically unbiased residual estimator, and we validate all results on synthetic LP, QP, and integer programming instances and on a rolling-window portfolio experiment using ten years of CRSP equity data.
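
A single-pass covariance estimate over historical cost-decision pairs can be sketched with a Welford-style running update. The abstract writes $\mathrm{Cov}(c,π^{*}(c))$ as a scalar without spelling out the convention; the sketch assumes it means the sum of coordinate-wise covariances $\sum_j \mathrm{Cov}(c_j, π^{*}_j)$, and both that reading and the function name are assumptions rather than the paper's estimator.

```python
def streaming_cov_trace(pairs):
    """One pass over (c, pi) pairs of equal-length vectors.

    Returns the unbiased estimate of sum_j Cov(c_j, pi_j) using running
    means and a running co-moment (Welford-style update).
    """
    count = 0
    mean_c = mean_p = None
    co_moment = 0.0
    for c, p in pairs:
        count += 1
        if mean_c is None:
            mean_c = [0.0] * len(c)
            mean_p = [0.0] * len(p)
        for j in range(len(c)):
            delta_c = c[j] - mean_c[j]            # deviation from the old mean of c_j
            mean_c[j] += delta_c / count
            mean_p[j] += (p[j] - mean_p[j]) / count
            co_moment += delta_c * (p[j] - mean_p[j])  # uses the updated mean of pi_j
    if count < 2:
        raise ValueError("need at least two pairs to estimate a covariance")
    return co_moment / (count - 1)
```

The data are never stored, so memory use is independent of the number of pairs, which is the property the single-pass remark in the abstract points to.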

2605.14011 2026-05-15 stat.ME

Robust inference in inflated beta regression

Francisco Felipe Queiroz, Silvia Lopes de Paula Ferrari

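AI summary: This paper studies how to make inflated beta regression more robust when modeling continuous proportions that include boundary values. Because maximum likelihood estimation is sensitive to outliers, it proposes robust estimators that stabilize inference while keeping the model simple and interpretable. It also introduces an algorithm that selects the tuning constants according to the robustness the data require, develops robust Wald-type tests, and validates the methods through simulation studies and a real data application.

Details
Abstract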

The inflated beta regression model is widely used for modeling continuous proportions with values at the boundaries. Maximum likelihood estimation for these models is well known for its sensitivity to outliers, which can severely distort inference and lead to misleading conclusions. We propose robust estimators that mitigate the lack of robustness in maximum likelihood-based inference while preserving the simplicity and interpretability of the inflated beta framework. Additionally, an algorithm is introduced to select tuning constants based on the data's robustness requirements. The proposed estimators' asymptotic and robustness properties are studied, and robust Wald-type tests are developed. Simulation studies and a real data application highlight the advantages and practical effectiveness of the proposed robust estimators.

2605.14008 2026-05-15 stat.ME math.ST stat.TH

Predictive Inference via Kernel Density Estimates

Torey Hilbert

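AI summary: This paper studies predictive inference based on kernel density estimates and examines the convergence of two kernel prediction rules. The author proves that the predictive distributions of both the classic kernel density estimator and a recursive version converge weakly almost surely, which offers a new Bayesian interpretation of kernel density estimation. The classic rule converges to a compactly supported measure, whereas the recursive rule converges to a non-compactly supported measure, revealing a fundamental difference in their asymptotic behavior.

Details
Comments
13 pages
Abstract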

Kernel density estimation is a widely used nonparametric approach to estimate an unknown distribution. Recent work in Bayesian predictive inference has considered stochastic processes formed by specifying the predictive distribution for the next data point given all observed data such that the resulting predictive distributions converge weakly almost surely. We study two kernel-based prediction rules: the classic kernel density estimator, and a recursive version previously introduced for online problems. We show that both processes converge weakly almost surely, which opens the door for new Bayesian interpretations of kernel density estimation. Surprisingly, the process based on the classic kernel density estimates converges to a compactly supported measure, while the recursive version converges to a non-compactly supported measure.
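
The two prediction rules can be made concrete by sampling the next point from the current kernel estimate and appending it to the data. The sketch below assumes a Gaussian kernel, a hypothetical bandwidth schedule $h_n = n^{-1/5}$, and the common recursive form $f_n = (1 - 1/n) f_{n-1} + (1/n) K_{h_n}(\cdot - X_n)$, under which each point keeps the bandwidth it had on arrival; the abstract does not state these details, so they are assumptions.

```python
import random

def bandwidth(n):
    """Hypothetical bandwidth schedule h_n = n**(-1/5)."""
    return n ** -0.2

def sample_classic(data, rng):
    """Draw from the classic KDE: every center uses the current bandwidth."""
    center = rng.choice(data)
    return center + bandwidth(len(data)) * rng.gauss(0.0, 1.0)

def sample_recursive(data, rng):
    """Draw from the recursive KDE: point i keeps its arrival bandwidth h_i."""
    i = rng.randrange(len(data))
    return data[i] + bandwidth(i + 1) * rng.gauss(0.0, 1.0)

def predictive_resample(seed_data, steps, rule, rng):
    """Grow the sequence by repeatedly sampling the next point from `rule`."""
    data = list(seed_data)
    for _ in range(steps):
        data.append(rule(data, rng))
    return data

rng = random.Random(0)
classic_path = predictive_resample([0.0, 1.0], 200, sample_classic, rng)
recursive_path = predictive_resample([0.0, 1.0], 200, sample_recursive, rng)
```

Running both rules for many steps and comparing the spread of the generated points is one informal way to look at the contrast in limiting supports that the abstract reports.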

2605.14000 2026-05-15 stat.AP

Recent advances in statistical methodology applied to the Hjort liver index time series (1859-2012) and associated influential factors

Gudmund H. Hermansen, Nils Lid Hjort, Olav S. Kjesbu

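AI summary: This paper reviews several recent statistical methods and their use in biology and the fisheries sciences, focusing on focused model selection, dynamic goodness-of-fit testing, break-point detection, prediction uncertainty, and the combination of information across sources, and applies them to the Hjort liver quality index time series for 1859-2012. The series originates in Hjort's classic 1914 study and has since been extended into one of the longest time series in marine science. The study analyzes in detail how the series relates to and interacts with associated factors such as Kola winter temperatures, cod length distribution, mortality, and a food-availability index.

Details
Journal ref
Canadian Journal of Fisheries and Aquatic Sciences, 2016, vol. 73, pages 279-295
Comments
16 pages, 19 figures. This is the authors' manuscript, 2016, published in modified form in Canadian Journal of Fisheries and Aquatic Sciences 2016, vol. 73, pages 279-295, part of the special issue based on the Johan Hjort Symposium on Recruitment Dynamics and Stock Variability, Bergen, Norway, October 2014
Abstract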

Certain recent advances in statistical methodology have promising potential for fruitful use in general biology and the fisheries sciences. This paper reviews and discusses some of the relevant themes, including accurate modelling via focused model selection techniques, dynamic goodness-of-fit testing of processes evolving over time, finding break points for phenomena experiencing changes, prediction uncertainty, and optimal combination of information across diverse sources via confidence distributions. The methods are illustrated for the Hjort liver quality index time series. Its roots lie in the classic Hjort ("Fluctuations in the Great Fisheries of Northern Europe, Viewed in the Light of Biological Research", 1914), where liver quality of the Atlantic cod (Gadus morhua) for 1880-1912 is reported on and studied, along with related factors, making it one of the first teleost time series ever published. Diligent work by Kjesbu et al. ("Making use of Johan Hjort's 'unknown' legacy: reconstruction of a 150-year coastal time-series on northeast Arctic cod (Gadus morhua) liver data reveals long-term trends in energy allocation patterns", 2014), involving both archival and calibration efforts, has extended the series both backwards and forwards in time, to 1859-2012, yielding one of the longest time series in marine science. Our study offers a detailed examination of this series and how it relates to and interacts with associated factors, including Kola winter temperatures, length distribution parameters, cod mortality, and a certain index related to availability of food.

2605.13979 2026-05-15 quant-ph cs.LG stat.ML

Winning Lottery Tickets in Neural Networks via a Quantum-Inspired Classical Algorithm

Natsuto Isogai, Hayata Yamasaki, Sho Sonoda, Mio Murao

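AI summary: This paper proposes a new classical algorithm, inspired by a quantum algorithm, for efficiently selecting sparse subnetworks from large shallow neural networks. It samples from an optimized probability distribution and avoids the exponential time complexity of the earlier classical approach, achieving polynomial time complexity. Experiments show that the algorithm improves on the conventional approach in sampling efficiency and empirical risk, demonstrating that this quantum-inspired sparse subnetwork selection can be carried out efficiently on classical computers without quantum hardware.

Details
Comments
28 pages, 3 figures
Abstract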

Quantum machine learning (QML) aims to accelerate machine learning tasks by exploiting quantum computation. Previous work studied a QML algorithm for selecting sparse subnetworks from large shallow neural networks. Instead of directly solving an optimization problem over a large-scale network, this algorithm constructs a sparse subnetwork by sampling hidden nodes from an optimized probability distribution defined using the ridgelet transform. The quantum algorithm performs this sampling in time $O(D)$ in the data dimension $D$, whereas a naive classical implementation relies on handling exponentially many candidate nodes and hence takes $\exp[O(D)]$ time. In this work, we construct and analyze a quantum-inspired fully classical algorithm for the same sampling task. We show that our algorithm runs in time $O(\operatorname{poly}(D))$, thereby removing the exponential dependence on $D$ from the previous classical approach. Numerical simulations show that the proposed sampler achieves empirical risk comparable to exact sampling from the optimized distribution and substantially lower than sampling from the non-optimized uniform distribution, while also exhibiting exponentially improved runtime scaling compared with the conventional classical implementation. These successful dequantization results show that sparse subnetwork selection via optimized sampling can be achieved classically with polynomial data-dimension scaling on conventional computers without quantum hardware, providing an alternative to the existing quantum algorithm.

2605.13933 2026-05-15 cs.LG cs.AI stat.ML

Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling

Gaurav Rudravaram, Lianrui Zuo, Karthik Ramadass, Elyssa McMaster, Jongyeon Yoon, Aravind R. Krishnan, Adam M. Saunders, Chenyu Gao, Nancy R. Newlin, Praitayini Kanakaraj, Lori L. Beason Held, Murat Bilgel, Laura A. Barquero, Micah DArchangel, Tin Q. Nguyen, Laurie B. Cutting, Derek Archer, Timothy J. Hohman, Daniel C. Moyer, Bennett A. Landman

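AI summary: This work addresses the variability that differences in scanners, sites, and protocols introduce into structural connectomes derived from diffusion MRI (dMRI). It proposes an unsupervised framework that requires no manual tuning: an annealing mechanism built into the architecture lets the model adaptively balance discrete and continuous latent variables during training, so that acquisition-related variation is separated more effectively from biological variation. Experiments show stronger site identification across several datasets, demonstrating that the method captures dMRI acquisition variability effectively.

Details
Abstract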

Acquisition differences across sites, scanners, and protocols in dMRI introduce variability that complicates structural connectome analysis. This motivates deep learning models that can represent high-dimensional connectomes in a low-dimensional space while explicitly separating acquisition-related effects from biological variation. Conventional dimensionality reduction methods model all variance as continuous, so acquisition effects often get absorbed into a continuous latent space. Recent hybrid latent-space models combine discrete and continuous components to address this, but typically require manual capacity tuning to ensure the discrete component captures the intended variability. We introduce an unsupervised framework that removes this manual tuning by architecturally annealing encoder outputs before decoding, allowing the model to adaptively balance discrete and continuous latent variables during training. To evaluate it, we curated a dataset of N=7,416 structural connectomes derived from dMRI, spanning ages 2 to 102 and 13 studies with 25 unique acquisition-parameter combinations. Of these, 5,900 are cognitively unimpaired, 877 have mild cognitive impairment (MCI), and 639 have Alzheimer's disease (AD). We compare against a standard VAE, PCA with k-means clustering, and hybrid models that anneal only through the loss function. Our architectural annealing produces stronger site learning (ARI=0.53, p<0.05) than these baselines. Results show that a hybrid continuous-discrete latent space, with architectural rather than loss-based annealing, provides a useful unsupervised mechanism for capturing acquisition variability in dMRI: by jointly modeling smooth and categorical structure, the Joint-VAE recovers clusters aligned with scanner and protocol differences.

2605.13928 2026-05-15 stat.CO

CudaMon: An R Package to Monitor NVIDIA GPUs, Showcased by Monitoring a GPU-accelerated Single-cell Analysis Workflow in R

Mohammad Amin Zadenoori, Riccardo Ceccaroni, Gabriele Sales, Davide Risso

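AI summary: CudaMon is an R package for monitoring NVIDIA GPUs. Through the NVML interface it reports GPU utilization, memory, temperature, and power draw in real time, and it supports data export and visualization. Using a single-cell RNA sequencing analysis pipeline as an example, the study shows how CudaMon monitors GPU-accelerated computation, revealing high utilization in compute-intensive steps and performance bottlenecks in data management phases. CudaMon helps optimize GPU resource use, debug performance problems, and improve the reproducibility of R workflows.

Details
Abstract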

NVIDIA GPUs have recently started to be used in computational biology, yet R users lack integrated GPU monitoring tools, forcing reliance on external utilities like nvidia-smi. We introduce CudaMon, an R package providing real-time monitoring of GPU utilization, memory, temperature, and power draw via NVML, along with data export and visualization utilities. Monitoring a GPU-accelerated single-cell RNA-seq pipeline (1M brain cells, RAPIDS workflow) shows that compute-intensive steps (PCA, UMAP, t-SNE) exceed 90% GPU utilization, while data management phases reveal bottlenecks. CudaMon facilitates resource optimization, performance debugging, and reproducibility for GPU-accelerated R workflows.

2605.13926 2026-05-15 stat.AP

Optimising football transfer strategy under budget constraints: A weighted multi-criteria approach

Tathagata Basu, Soudeep Deb, Rishideep Roy

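AI summary: This paper studies how to optimize football transfer strategy under budget constraints and proposes a quantitative framework based on weighted multi-criteria optimization that accounts for player performance, transfer price, and league context. Linear mixed-effects models predict player ratings and transfer prices, which are then fed into a constrained optimization model to determine the optimal transfer plan. The plan is also embedded in an independent private-value auction model to analyze market behavior when several clubs compete for the same player, illustrating how the approach captures transfer-market dynamics.

Details
Abstract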

The football transfer market is a complex, dynamic environment in which clubs compete to acquire players who strengthen their squads. While several frameworks estimate a player's worth, a comprehensive approach that captures both squad optimisation and transfer market dynamics remains limited. In this paper, we propose a quantitative framework for optimising football transfer strategy under budget constraints, integrated with a competitive bidding paradigm. Using data from professional football leagues, we construct player performance and transfer price models using linear mixed-effects frameworks that incorporate player characteristics, recent performance, team context, and league effects. The predicted ratings and estimated transfer prices are then integrated into a weighted multi-criteria constrained optimisation framework that determines a club's transfer activities at the end of the season. Finally, these optimal transfer decisions are embedded within an independent private-value auction model with a random reserve price to analyse market behaviour when multiple teams compete for the same player. We illustrate our approach using the 2018-19 season of the English Premier League to demonstrate its ability to capture transfer-market dynamics.

2605.13922 2026-05-15 cs.CR cs.LG stat.CO

XAI and Statistical Analysis for Reliable Intrusion Detection in the UAVIDS-2025 Dataset: From Tree to Hybrid and Tabular DNN Ensembles

Iakovos-Christos Zarkadis, Christos Douligeris

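AI summary: This paper studies how explainable artificial intelligence (XAI) and statistical analysis can make machine learning models for UAV intrusion detection more reliable on the UAVIDS-2025 dataset. After comparing a range of tree models, deep neural networks, hybrid stacking models, and ensemble neural networks, the authors identify XGBoost as the best-performing model and use SHAP feature-importance analysis to reveal the key features of each attack type and the reasons for misclassification. Density estimation and multiple-comparison statistical tests then uncover how Wormhole and Blackhole attacks are distributed in the dataset and the root cause of their misclassification, offering guidance for building explainable and reliable intrusion detection models.

Details
Abstract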

In recent years, mechanistic interpretability, a specific area under the umbrella of explainable artificial intelligence (XAI), has been introduced to explain the decisions made by complex machine learning (ML) models in critical systems such as UAV intrusion detection systems (UAVIDS). In this paper, we apply best practices for data pre-processing and examine a wide range of tree ensembles, deep neural networks, hybrid stacking models, and the latest ensemble neural networks for detecting intrusions in UAVs, using stratified 10-fold cross-validation. With our top-performing model, XGBoost, we apply Shapley Additive Explanations (SHAP) to analyze global and local feature importances, to understand which features each attack targets in order to mimic normal traffic, and to locate where the misclassifications occur. A distribution analysis follows, in which we visually compare violin plots and kernel density estimate (KDE) curves. Using the Westfall-Young permutation test for multiple comparisons, bandwidth optimization of the KDEs, and the Jensen-Shannon distance chosen for the test, we discover the true causes of the false predictions observed for Wormhole and Blackhole attacks in UAVIDS-2025. The findings provide robust, reliable, and explainable models for UAV intrusion detection, along with statistical insights that capture and clarify the masked nature of these attacks, in particular the challenge of density support intersection between them in this dataset.
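
The Jensen-Shannon distance used to compare class-conditional feature distributions can be sketched for discrete (for example, histogram) distributions. This is the textbook definition with base-2 logarithms, so values lie in [0, 1]; the function name and the example histograms are illustrative only, and the KDE bandwidth optimization and Westfall-Young test from the abstract are not reproduced here.

```python
import math

def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions.

    p and q are equal-length sequences of probabilities that each sum to 1.
    The distance is the square root of the Jensen-Shannon divergence.
    """
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Kullback-Leibler divergence in bits; terms with u_i = 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    return math.sqrt(0.5 * kl(p, mix) + 0.5 * kl(q, mix))

# Hypothetical histograms of one feature under two attack classes.
overlapping = js_distance([0.2, 0.5, 0.3], [0.25, 0.45, 0.3])
disjoint = js_distance([1.0, 0.0], [0.0, 1.0])
```

A distance near 0 means the two classes occupy almost the same region of a feature, which is the situation in which a classifier would be expected to confuse them.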

2605.13916 2026-05-15 stat.ML cs.AI cs.LG

A Regret Perspective on Online Multiple Testing

Qingyang Hao, Kongchang Zhou, Fang Kong, Hongxin Wei

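AI summary: This paper studies online multiple testing (OMT) from a regret perspective, aiming to evaluate in one framework the highly asymmetric costs of false positives and false negatives. The authors introduce a weighted regret metric, show that deterministic procedures with strict FDR control incur a linear regret penalty during signal-sparse cold starts, and propose Decoupled-OMT (DOMT), which adds a non-negative random perturbation and substantially reduces regret without adding false negatives. Experiments show that it mitigates threshold depletion in non-stationary environments.

Details
Abstract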

Online Multiple Testing (OMT), a fundamental pillar of sequential statistical inference, traditionally evaluates the False Discovery Rate (FDR) and statistical power in isolation, obscuring the highly asymmetric costs of false positives and false negatives in modern automated pipelines. To unify this evaluation, we introduce Weighted Regret. Under this metric, we prove the Duality of Regret Conservation: purely deterministic procedures ensuring strict FDR control inevitably incur an $Ω(T)$ linear regret penalty, as threshold depletion during signal-sparse cold starts forces massive false negatives. Tailored for exogenous testing streams, we propose Decoupled-OMT (DOMT) as a baseline-agnostic meta-wrapper. By incorporating a history-decoupled, strictly non-negative random perturbation, DOMT rescues purely deterministic baselines from severe threshold depletion. Crucially, it preserves exact asymptotic safety in stationary environments and rigorously bounds finite-sample error inflation during cold starts. Guaranteeing zero additional false negatives, it yields an order-optimal $Ω(\sqrt{T})$ regret reduction in bursty environments, with a derived "Cold-Start Tax" characterizing the exact phase transition of algorithmic superiority. Experiments validate that DOMT consistently curtails empirical weighted regret, achieving an order-optimal sublinear mitigation of threshold depletion to navigate the non-stationary Pareto frontier.

2605.13915 2026-05-15 stat.ML cs.AI cs.LG

Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference

Lingchao Zheng, Yuwei Fan, Jun Li, Chengqiu Hu, Qichen Liao, Junyi Fan, Rui Shi, Fangzheng Miao

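AI summary: Quantization is a key technique for efficient large language model inference, but the dequantization step has become a performance bottleneck on modern AI accelerators. This paper proposes the Multi-Scale Dequant (MSD) framework, which decomposes high-precision activations into multiple low-precision components that are multiplied directly with the quantized weights, bypassing the conventional dequantization path and improving computational efficiency. Experiments show that MSD preserves accuracy while reducing compute latency and memory-bandwidth demand; it applies to several weight formats and comes with tight error bounds.

Details
Abstract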

Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step, which converts low-bit weights back to high precision for matrix multiplication, has become a critical bottleneck on modern AI accelerators. On architectures with decoupled compute units (e.g., Ascend NPUs), dequantization operations can consume more cycles than the matrix multiplication itself, leaving the high-throughput tensor cores underutilized. This paper presents Multi-Scale Dequant (MSD), a quantization framework that removes weight/KV dequantization from the GEMM critical path. Instead of lifting low-bit weights to BF16 precision, MSD decomposes high-precision BF16 activations into multiple low-precision components, each of which can be multiplied directly with quantized weights via native hardware-accelerated GEMM. This approach shifts the computational paradigm from precision conversion to multi-scale approximation, avoiding INT8-to-BF16 weight conversion before GEMM. We instantiate MSD for two weight formats and derive tight error bounds for each. For INT8 weights (W4A16), two-pass INT8 decomposition achieves near 16 effective bits. For MXFP4 weights (W4A16), two-pass MXFP4 decomposition yields near 6.6 effective bits with error bound 1/64 per block, surpassing single-pass MXFP8 (5.24 bits) while maintaining the same effective GEMM compute time. We further derive closed-form latency and HBM traffic models showing that MSD avoids the Vector-Cube pipeline stall caused by dequantization and reduces KV cache HBM traffic by up to 2.5 times in attention. Numerical simulations on matrix multiplication and Flash Attention kernels confirm that MSD does not degrade accuracy compared to dequantization baselines, and in many settings achieves lower L2 error.
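
The idea of decomposing an activation vector into low-precision components that multiply the quantized weights directly can be illustrated with a two-pass INT8 residual decomposition in plain Python. The per-vector scale choice (maximum magnitude divided by 127), the symmetric clipping range, and all function names are assumptions made for this sketch; the abstract does not give the paper's exact scaling or rounding rules.

```python
def quantize_int8(values, scale):
    """Round values / scale to integers clipped to the symmetric INT8 range."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def two_pass_decompose(acts):
    """Approximate acts as s1 * q1 + s2 * q2 with INT8 vectors q1 and q2."""
    s1 = max(abs(a) for a in acts) / 127 or 1.0      # fall back to 1.0 if all zero
    q1 = quantize_int8(acts, s1)
    residual = [a - s1 * q for a, q in zip(acts, q1)]
    s2 = max(abs(r) for r in residual) / 127 or 1.0  # second, finer scale
    q2 = quantize_int8(residual, s2)
    return (s1, q1), (s2, q2)

def int_dot(q, w):
    """Integer-only multiply-accumulate, standing in for a native INT8 GEMM."""
    return sum(a * b for a, b in zip(q, w))

def msd_dot(acts, w_int8, w_scale):
    """Dot product with INT8 weights that never converts the weights to float."""
    (s1, q1), (s2, q2) = two_pass_decompose(acts)
    return w_scale * (s1 * int_dot(q1, w_int8) + s2 * int_dot(q2, w_int8))

# Hypothetical activations and INT8 weights with a single weight scale.
acts = [0.31, -1.7, 0.052, 0.9]
w_int8 = [12, -7, 100, 3]
approx = msd_dot(acts, w_int8, 0.01)
```

The weights take part only in integer products; the floating-point scales are applied once per pass to the accumulated result, which is what keeps weight dequantization off the multiply path in this toy setting.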

2605.13913 2026-05-15 stat.ML cs.LG

A Survey on Data-Dependent Worst-Case Generalization Bounds

Hubert Leroux, Jean Marcus, Julien Roger

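AI summary: This survey reviews progress on data-dependent worst-case generalization bounds, which aim to explain why deep neural networks generalize well despite heavy overparameterization. The core approaches are extending PAC-Bayesian theory to data-dependent hypothesis sets, refining the complexity term using geometric and topological properties of the optimization trajectory, and replacing the information-theoretic terms with stability assumptions. The survey unifies these results under a single template inequality and compares the resulting bounds.

Details
Comments
15 pages, 4 figures, 3 tables. The LaTeX source uses the JMLR preprint style (jmlr2e.sty) and BibTeX (refs.bib). Central references in arXiv form include arXiv:2404.17442, arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723, and arXiv:2507.06775
Abstract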

Deep neural networks generalize well despite being heavily overparameterized, in apparent contradiction with classical learning theory based on uniform convergence over fixed hypothesis spaces. Uniform bounds over the entire parameter space are vacuous in this regime, and recent work has shown that non-vacuous guarantees can be recovered by restricting attention to the part of parameter space that the algorithm actually visits. This survey paper organizes this line of work around three steps: extending PAC-Bayesian theory to random, data-dependent hypothesis sets (arXiv:2404.17442); refining the complexity term with geometric and topological descriptors of the optimization trajectory, including fractal dimensions, alpha-weighted lifetime sums, and positive magnitude (arXiv:2006.09313, arXiv:2302.02766, arXiv:2407.08723); and replacing the resulting information-theoretic terms by stability assumptions (arXiv:2507.06775). We unify these contributions around a single template inequality and a head-to-head comparison of the resulting bounds.

2605.13910 2026-05-15 stat.ML cs.CV cs.LG

Covariance-aware sampling for Diffusion Models

Andrea Schioppa, Tim Salimans

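AI summary: This paper proposes a covariance-aware sampler that aims to improve the quality of pixel-space generation by diffusion models when few sampling steps are used. The method explicitly models the covariance of the reverse process, combining Tweedie's formula with a Fourier-space decomposition, and thereby improves on samplers that rely only on the predicted mean. Experiments show that, at the same number of function evaluations, it produces better samples for pixel-based diffusion models than state-of-the-art second-order samplers and the recent aDDIM sampler.

Details
Abstract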

We present a covariance-aware sampler that improves the quality of pixel-space Diffusion Model (DM) sampling in the few-step regime. We hypothesize that in the few-step regime samplers fail because they rely solely on the predicted mean of the reverse distribution, while our solution explicitly models the reverse-process covariance. Our method combines Tweedie's formula to estimate the covariance with an efficient, structured Fourier-space decomposition of the covariance matrix. Implemented as an extension of DDIM, our method requires only a minimal overhead: one extra Jacobian-Vector Product (JVP) per step. We demonstrate that for pixel-based DMs, our method consistently produces superior samples compared to state-of-the-art second-order samplers (Heun, DPM-Solver++) and the recent aDDIM sampler, at an identical number of function evaluations (NFE).
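
For orientation, the first- and second-order Tweedie identities that link the denoiser to the posterior mean and covariance can be written out as follows. The forward parameterization $x_t = \alpha_t x_0 + \sigma_t \varepsilon$ and the notation are assumptions chosen for this note; how the paper structures or approximates the covariance in Fourier space is not reproduced here.

```latex
% Assumed forward noising: x_t = \alpha_t x_0 + \sigma_t \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, I)
\begin{align*}
\hat{x}_0(x_t) := \mathbb{E}[x_0 \mid x_t]
  &= \frac{1}{\alpha_t}\bigl(x_t + \sigma_t^2 \, \nabla_{x_t} \log p_t(x_t)\bigr), \\
\mathrm{Cov}[x_0 \mid x_t]
  &= \frac{\sigma_t^2}{\alpha_t^2}\bigl(I + \sigma_t^2 \, \nabla_{x_t}^2 \log p_t(x_t)\bigr)
   = \frac{\sigma_t^2}{\alpha_t} \, \frac{\partial \hat{x}_0(x_t)}{\partial x_t}.
\end{align*}
```

The last equality follows by differentiating the first line with respect to $x_t$. It shows that applying the posterior covariance to a vector $v$ amounts to one Jacobian-vector product of the denoiser, scaled by $\sigma_t^2/\alpha_t$, which is consistent with the one-extra-JVP-per-step cost mentioned in the abstract.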

2605.13907 2026-05-15 stat.ML cs.AI cs.LG

AIS: Adaptive Importance Sampling for Quantized RL

Jiajun Zhou, Wei Shao, Lingchao Zheng, Yuwei Fan, Ngai Wong

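AI summary: In reinforcement learning for large language models, the mismatch between low-precision rollouts (e.g., FP8) and high-precision training (e.g., BF16) biases the policy gradient and harms training stability. To address this, the paper proposes Adaptive Importance Sampling (AIS), which uses real-time diagnostics to adjust the strength of the gradient correction dynamically, retaining the exploration benefit of low-precision rollouts while suppressing the instability they introduce. Experiments show that AIS keeps the FP8 speedup while matching the BF16 baseline on several mathematical reasoning and planning tasks.

Details
Abstract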

Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce memory pressure. This introduces a rollout-training mismatch that biases the policy gradient and can cause training to collapse outright on reasoning benchmarks. We show that the mismatch is non-stationary and acts as a double-edged sword: early in training it provides a stochastic exploration bonus, exposing the gradient to trajectories the trainer would otherwise under-sample, but the same perturbation transitions into a destabilizing source of bias as the policy concentrates. To solve this, we propose Adaptive Importance Sampling (AIS), a correction framework that adjusts the strength of its intervention on a per-batch basis. AIS combines three real-time diagnostics, namely weight reliability, divergence severity, and variance amplification, into a single mixing coefficient that interpolates between the uncorrected and fully importance-weighted gradients, suppressing the destabilizing component of the mismatch while preserving its exploratory benefit. We integrate AIS into GRPO and evaluate it on the diffusion-based LLaDA-8B-Instruct and the autoregressive Qwen3-8B and Qwen3.5-9B across mathematical reasoning and planning benchmarks. AIS matches the BF16 baseline on most tasks while retaining the 1.5 to 2.76x rollout speedup of FP8.

2602.21376 2026-05-15 math.OC stat.ME

Fenchel-Young Estimators of Perturbed Utility Models

Xi Lin, Yafeng Yin, Tianming Liu

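AI summary: This paper studies parameter estimation in the Perturbed Utility Model (PUM) framework, which unifies discrete choice models such as Multinomial Logit (MNL) and Sparsemax. To address the non-convexity and instability of conventional maximum likelihood estimation in sparse regimes, the authors propose a unified estimation approach based on the Fenchel-Young loss, which exploits the convex conjugate structure of the choice probabilities to guarantee global convexity and thus gives a more stable and reliable estimator. They further develop Parametric Basis Estimation (PBE), which jointly estimates the utility parameters and a tree-structured perturbation function within a pre-specified basis family; experiments show that it outperforms the standard MNL model in predictive performance.

Details
Comments
46 pages, 5 figures. Distributionally robust extensions previously included in earlier versions are no longer part of this manuscript and will be presented separately
Abstract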

The Perturbed Utility Model (PUM) framework provides a generalization of discrete choice analysis, unifying models like Multinomial Logit (MNL) and Sparsemax through convex optimization. However, standard Maximum Likelihood Estimation (MLE) encounters theoretical and computational limitations when applied to this broader class, particularly regarding non-convexity and instability in sparse regimes. To address these issues, this paper introduces a unified estimation framework for PUMs based on the Fenchel-Young loss. By leveraging the intrinsic convex conjugate structure of the choice probabilities, we demonstrate that the Fenchel-Young estimator guarantees global convexity, providing a stable alternative to MLE that accommodates both dense and sparse choice kernels. Furthermore, we establish the framework's asymptotic consistency and normality under standard regularity conditions. Leveraging the tractability of the Fenchel-Young estimator, we further develop a Parametric Basis Estimation (PBE) procedure that estimates utility parameters jointly with a tree-structured perturbation function within a pre-specified basis family. PBE employs a bi-level optimization architecture that parameterizes the unknown perturbation as a learnable convex combination of basis functions. For any fixed perturbation structure, the inner Fenchel-Young estimation problem is globally convex in the utility parameters, yielding a well-defined solution mapping that can be differentiated under regularity conditions. Empirical validation on the Swissmetro dataset demonstrates that the proposed framework improves predictive performance, as measured by the Brier score and Brier Skill Score, compared to the standard MNL baseline.
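
The Fenchel-Young loss has the generic form $L_Ω(θ; y) = Ω^{*}(θ) + Ω(y) - \langle θ, y \rangle$, which is convex in the utilities $θ$ for any convex perturbation $Ω$. The sketch below instantiates it for the two kernels named in the abstract: negative Shannon entropy (giving the MNL log-loss) and half the squared Euclidean norm on the simplex (giving the sparsemax loss). These are the standard textbook instances; function names are illustrative, and nothing here reproduces the paper's PBE procedure.

```python
import math

def softmax(theta):
    """MNL choice probabilities (dense kernel)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def sparsemax(theta):
    """Euclidean projection of theta onto the probability simplex (sparse kernel)."""
    cumsum, tau = 0.0, 0.0
    for k, zk in enumerate(sorted(theta, reverse=True), start=1):
        cumsum += zk
        if 1 + k * zk > cumsum:       # coordinate k is still in the support
            tau = (cumsum - 1) / k
    return [max(t - tau, 0.0) for t in theta]

def fy_loss_logit(theta, y):
    """Omega = negative Shannon entropy, so Omega* = logsumexp."""
    m = max(theta)
    conj = m + math.log(sum(math.exp(t - m) for t in theta))
    omega_y = sum(yi * math.log(yi) for yi in y if yi > 0)
    return conj + omega_y - sum(t * yi for t, yi in zip(theta, y))

def fy_loss_sparsemax(theta, y):
    """Omega(p) = 0.5 * ||p||^2 on the simplex, so Omega* is attained at sparsemax."""
    p = sparsemax(theta)
    conj = sum(t * pi for t, pi in zip(theta, p)) - 0.5 * sum(pi * pi for pi in p)
    omega_y = 0.5 * sum(yi * yi for yi in y)
    return conj + omega_y - sum(t * yi for t, yi in zip(theta, y))
```

In both cases the gradient with respect to `theta` is the predicted probability vector minus `y`, and the sparsemax loss reaches exactly zero once the observed choice is predicted with probability one, which the logit loss never does for finite utilities.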

2511.08559 2026-05-15 stat.ME

Reluctant Transfer Learning in Penalized Regressions for Individualized Treatment Rules under Effect Heterogeneity

Eun Jeong Oh, Min Qian

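AI summary: This paper studies how transfer learning can update individualized treatment rule (ITR) models to accommodate shifts in treatment-covariate relationships in a new dataset under treatment effect heterogeneity. It proposes a Reluctant Transfer Learning (RTL) framework that selectively transfers key components of the source model (e.g., regression coefficients), enabling efficient model adaptation without access to individual-level source data. The method adjusts the model only when doing so improves performance on the target dataset, which controls model complexity and improves generalizability. It supports multi-armed treatment settings and provides a regret bound on the difference in value between the optimal ITR and the estimated ITR; experiments show that it outperforms existing methods.

Details
English abstract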

Estimating individualized treatment rules (ITRs) is fundamental to precision medicine, where the goal is to tailor treatment decisions to individual patient characteristics. While numerous methods have been developed for ITR estimation, there is limited research on model updating that accounts for shifted treatment-covariate relationships in the ITR setting. In practice, models trained on source data must be updated for new (target) datasets that exhibit shifts in treatment effects. To address this challenge, we propose a Reluctant Transfer Learning (RTL) framework that enables efficient model adaptation by selectively transferring essential model components (e.g., regression coefficients) from source to target data, without requiring access to individual-level source data. Leveraging the principle of reluctant modeling, the RTL approach incorporates model adjustments only when they improve performance on the target dataset, thereby controlling complexity and enhancing generalizability. Our method supports multi-armed treatment settings, performs variable selection for interpretability, and provides a regret bound for the difference in value of the optimal ITR and that of the estimated ITR. Through simulation studies and an application to a real data example from the Best Apnea Interventions for Research (BestAIR) trial, we demonstrate that RTL outperforms existing alternatives. The proposed framework offers an efficient, practically feasible approach to adaptive treatment decision-making under evolving treatment effect conditions.

2502.00270 2026-05-15 cs.LG cs.AI stat.ML

DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks

Zhiliang Chen, Gregory Kang Ruey Lau, Chuan-Sheng Foo, Bryan Kian Hsiang Low

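AI summary: This paper studies how to optimize training data mixtures for large language models when the downstream evaluation task is unseen. Because the task data are often unavailable in practice, conventional data selection methods do not apply. The authors propose DUET, a feedback-based optimization method that combines influence functions with Bayesian optimization to perform global-to-local data mixture optimization without prior knowledge of the task data. Experiments show that DUET outperforms existing methods across a variety of language tasks, demonstrating its effectiveness in the unseen-task setting.

Details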
Comments
Accepted to ICLR 2026 main conference
English abstract

The performance of an LLM depends heavily on the relevance of its training data to the downstream evaluation task. However, in practice, the data involved in an unseen evaluation task is often unknown (e.g., conversations between an LLM and a user are end-to-end encrypted). Hence, it is unclear what data are relevant for fine-tuning the LLM to maximize its performance on the specific unseen evaluation task. Instead, one can only deploy the LLM on the unseen task to gather multiple rounds of feedback on how well the model performs (e.g., user ratings). This novel setting offers a refreshing perspective towards optimizing training data mixtures via feedback from an unseen evaluation task, which prior data mixing and selection works do not consider. Our paper presents DUET, a novel global-to-local algorithm that interleaves influence function as a data selection method with Bayesian optimization to optimize data mixture via feedback from a specific unseen evaluation task. By analyzing DUET's cumulative regret, we theoretically show that DUET converges to the optimal training data mixture for an unseen task even without any data knowledge of the task. Finally, our experiments across a variety of language tasks demonstrate that DUET outperforms existing data selection and mixing methods in the unseen-task setting.

2412.03992 2026-05-15 stat.ML cs.LG math.ST stat.TH

How well behaved is finite dimensional Diffusion Maps?

Wenyu Bo, Marina Meilă

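AI summary: This paper studies the geometric properties and error bounds of finite-dimensional Diffusion Maps applied to embedded submanifolds. Under a set of assumptions on the submanifolds, the authors derive geometric properties that are preserved after the Diffusion Maps transformation, including almost uniform density, finite polynomial approximation, and reach. Building on these properties, they rigorously bound the embedding error of Diffusion Maps and quantify the deviation between the estimated and true tangent spaces, providing a solid theoretical foundation for understanding the performance and reliability of Diffusion Maps in practical applications.

Details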
Comments
33 pages, 4 figures
English abstract

Under a set of assumptions on a family of submanifolds $\subset {\mathbb R}^D$, we derive a series of geometric properties that remain valid after finite-dimensional and almost isometric Diffusion Maps (DM), including almost uniform density, finite polynomial approximation and reach. Leveraging these properties, we establish a rigorous bound of $O\left(\left(\frac{\log n}{n}\right)^{\frac{1}{8d+16}}\right)$ on the embedding error introduced by the DM algorithm. Furthermore, we quantify the error between the estimated tangent spaces and the true tangent spaces over the submanifolds after the DM embedding, $\sup_{P\in \mathcal{P}}\mathbb{E}_{P^{\otimes \tilde{n}}} \max_{1\leq j \leq \tilde{n}} \angle \left(T_{Y_{\varphi(M),j}}\varphi(M),\hat{T}_j\right) \leq C \left(\frac{\log n}{n}\right)^{\frac{k-1}{(8d+16)k}}$, providing a precise characterization of the geometric accuracy of the embeddings. These results offer a solid theoretical foundation for understanding the performance and reliability of DM in practical applications.
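For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of a textbook finite-dimensional Diffusion Maps embedding: Gaussian kernel, density normalization, row-stochastic normalization, then the leading non-trivial eigenvectors. It is not taken from the paper; the function name `diffusion_map` and the parameters `eps`, `m`, `alpha`, and `t` are choices made here, and the paper's exact construction and assumptions may differ.

```python
import numpy as np

def diffusion_map(X, eps, m, alpha=1.0, t=1):
    """Embed the rows of X into R^m with a standard Diffusion Maps recipe.

    eps   : Gaussian kernel bandwidth (assumed, must be tuned)
    m     : embedding dimension
    alpha : density normalization exponent (1.0 removes sampling density)
    t     : diffusion time
    """
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D2 / eps)

    # Density normalization: K_ij / (q_i q_j)^alpha.
    q = K.sum(axis=1)
    K = K / np.outer(q, q) ** alpha

    # P = D^{-1} K is row-stochastic; S = D^{-1/2} K D^{-1/2} is symmetric
    # and has the same eigenvalues, so a symmetric eigensolver can be used.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P; the first one is constant and is dropped.
    psi = vecs / np.sqrt(d)[:, None]
    embedding = (vals[1:m + 1] ** t) * psi[:, 1:m + 1]
    return embedding, vals

# Points sampled from a circle in R^2, embedded into R^2.
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
X = np.column_stack([np.cos(angles), np.sin(angles)])
Y, eigenvalues = diffusion_map(X, eps=0.5, m=2)
print(Y.shape, eigenvalues[:3])
```

The error rates quoted in the abstract concern how well such an embedding, computed from $n$ samples, approximates its population counterpart; the sketch only shows the finite-sample computation.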

2409.19129 2026-05-15 math.ST stat.TH

Consistency of Graphical Model-based Clustering: Robust Clustering using Bayesian Spanning Forest

Yu Zheng, Leo L. Duan, Arkaprava Roy

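AI summary: This paper studies the robustness of graphical model-based clustering when the data-generating process deviates from the assumed model. The authors consider the Bayesian spanning forest approach, which estimates clusters from the integrated posterior of the node partition, and prove, without requiring complete support separation, that when the data come from an unknown collection of component distributions satisfying an asymptotic separation condition, the posterior concentrates on the true partition, yielding consistent clustering. The result holds whether the number of clusters is fixed or grows with the sample size, and an upper bound on the misclassification rate is given, indicating that graphical models are an effective alternative to mixture models for clustering.

Details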
Comments
37 pages, 3 figures, 4 tables
English abstract

Mixture model-based frameworks are very popular for statistical inference in clustering. While convenient for producing probabilistic estimates of cluster assignments and uncertainty, they are prone to misspecification, which can lead to inconsistent clustering results. Graphical model-based clustering adopts a different strategy, specifying the likelihood by treating data as dependently generated from a disjoint union of component graphs. Recent work on Bayesian spanning forests addresses graph uncertainty by using the integrated posterior of the node partition, marginalized over the latent edge distribution, to produce probabilistic clustering estimates. Despite strong empirical performance, theoretical guarantees such as consistency remain unclear, particularly when the true data-generating process deviates from the assumed graphical model. This article establishes a positive asymptotic result: when data are generated from an unknown collection of component distributions and a mild asymptotic separation condition holds with probability tending to one (without requiring complete support separation), the posterior concentrates on the true partition, thereby yielding consistent clustering estimates, including the number of clusters. Our results hold whether the number of clusters is fixed or increases with sample size. Additionally, we derive an upper bound on the expected misclassification rate. These results highlight graphical models as a robust alternative to mixture models in clustering.

2209.11315 2026-05-15 stat.ME

Robust beta regression through the logit transformation

Yuri S. Maluf, Silvia L. P. Ferrari, Francisco F. Queiroz

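AI summary: This paper studies how to make beta regression models more robust in the presence of outliers. The authors propose robust estimators based on the logit transformation that avoid the restrictive parameter-space constraints of existing methods, broadening applicability. The estimators have good asymptotic properties; the paper also introduces robust Wald-type tests and validates the approach through simulations and a real-data application.

Details
English abstract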

Beta regression models are employed to model continuous response variables in the unit interval, like rates, percentages, or proportions. Their applications arise in several areas, such as medicine, environmental research, finance, and natural sciences. Maximum likelihood estimation is widely used to make inferences about the parameters. Nonetheless, it is well known that maximum likelihood-based inference lacks robustness in the presence of outliers, which can lead to severe bias and misleading conclusions. Recently, robust estimators for beta regression models were presented in the literature. However, these estimators require non-trivial restrictions in the parameter space, which limit their application. This paper develops new robust estimators that overcome this drawback. Their asymptotic and robustness properties are studied, and robust Wald-type tests are introduced. Simulation results evidence the merits of the new robust estimators. Inference and diagnostics using the new estimators are illustrated in an application to health insurance coverage data.

2202.01697 2026-05-15 stat.ME

Power logit regression for modeling bounded data

Francisco Felipe Queiroz, Silvia Lopes Paula Ferrari

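AI summary: This paper proposes a new class of regression models for bounded continuous data, the power logit regression models, which assume that the response variable follows a three-parameter family of distributions with a median, a dispersion parameter, and a skewness parameter. The paper provides a full set of tools for likelihood inference and diagnostic analysis and introduces the new R package PLreg. Applications to real and simulated data illustrate the merits of the proposed models, the statistical tools, and the computational package.

Details
English abstract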

The main purpose of this paper is to introduce a new class of regression models for bounded continuous data, commonly encountered in applied research. The models, named the power logit regression models, assume that the response variable follows a distribution in a wide, flexible class of distributions with three parameters, namely the median, a dispersion parameter and a skewness parameter. The paper offers a comprehensive set of tools for likelihood inference and diagnostic analysis, and introduces the new R package PLreg. Applications with real and simulated data show the merits of the proposed models, the statistical tools, and the computational package.