arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.11118 2026-02-12 stat.ME stat.ML

A Doubly Robust Machine Learning Approach for Disentangling Treatment Effect Heterogeneity with Functional Outcomes

Filippo Salmaso, Lorenzo Testa, Francesca Chiaromonte

Comments 20 pages, 4 figures

Causal inference is paramount for understanding the effects of interventions, yet extracting personalized insights from increasingly complex data remains a significant challenge for modern machine learning. This is the case, in particular, when considering functional outcomes observed over a continuous domain (e.g., time, or space). Estimation of heterogeneous treatment effects, known as CATE, has emerged as a crucial tool for personalized decision-making, but existing meta-learning frameworks are largely limited to scalar outcomes, failing to provide satisfying results in scientific applications that leverage the rich, continuous information encoded in functional data. Here, we introduce FOCaL (Functional Outcome Causal Learning), a novel, doubly robust meta-learner specifically engineered to estimate a functional heterogeneous treatment effect (F-CATE). FOCaL integrates advanced functional regression techniques for both outcome modeling and functional pseudo-outcome reconstruction, thereby enabling the direct and robust estimation of F-CATE. We provide a rigorous theoretical derivation of FOCaL, demonstrate its performance and robustness compared to existing non-robust functional methods through comprehensive simulation studies, and illustrate its practical utility on diverse real-world functional datasets. FOCaL advances the capabilities of machine intelligence to infer nuanced, individualized causal effects from complex data, paving the way for more precise and trustworthy AI systems in personalized medicine, adaptive policy design, and fundamental scientific discovery.
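As a rough illustration of the doubly robust idea mentioned in the abstract, the standard augmented inverse-probability-weighted (AIPW) pseudo-outcome can be applied pointwise to an outcome curve observed on a grid. This is a minimal sketch under that assumption, not FOCaL's actual procedure; the function name `dr_pseudo_outcome` and its arguments are hypothetical, and the nuisance estimates (propensity `e`, outcome curves `mu0` and `mu1`) are taken as given.

```python
def dr_pseudo_outcome(y, a, e, mu0, mu1):
    """Pointwise AIPW pseudo-outcome curve for a single unit.

    y   : observed outcome curve, one value per grid point
    a   : treatment indicator (0 or 1)
    e   : estimated propensity score P(A = 1 | X), strictly inside (0, 1)
    mu0 : predicted outcome curve under control
    mu1 : predicted outcome curve under treatment
    """
    if not 0.0 < e < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return [
        (m1 - m0) + a * (yt - m1) / e - (1 - a) * (yt - m0) / (1 - e)
        for yt, m0, m1 in zip(y, mu0, mu1)
    ]


# Hypothetical numbers: a treated unit observed at two grid points.
curve = dr_pseudo_outcome(y=[2.0, 3.0], a=1, e=0.5, mu0=[0.0, 1.0], mu1=[1.0, 2.0])
```

Regressing such pseudo-outcome curves on covariates with a functional regression method would then give an estimate of the effect curve; the correction terms are what make the construction robust to misspecifying either the outcome model or the propensity model.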

2602.11108 2026-02-12 stat.CO cs.NA math.NA

Large Scale High-Dimensional Reduced-Rank Linear Discriminant Analysis

Jocelyn T. Chi

Reduced-rank linear discriminant analysis (RRLDA) is a foundational method of dimension reduction for classification that has been useful in a wide range of applications. The goal is to identify an optimal subspace onto which to project the observations, one that simultaneously maximizes between-group variation and minimizes within-group differences. The solution is straightforward when the number of observations is greater than the number of features, but computational difficulties arise both in the high-dimensional setting, where there are more features than observations, and when the data are very large. Many works have proposed solutions for the high-dimensional setting, but these frequently involve additional assumptions or tuning parameters. We propose a fast and simple iterative algorithm for both classical and high-dimensional RRLDA on large data that is free from these additional requirements and that comes with guarantees. We also explain how RRLDA-RK provides implicit regularization towards the least norm solution without explicitly incorporating penalties. We demonstrate our algorithm on real data and highlight some results.

2602.11107 2026-02-12 stat.ME cs.LG stat.ML

Renet: Principled and Efficient Relaxation for the Elastic Net via Dynamic Objective Selection

Albert Dorador

We introduce Renet, a principled generalization of the Relaxed Lasso to the Elastic Net family of estimators. While, on the one hand, $\ell_1$-regularization is a standard tool for variable selection in high-dimensional regimes and, on the other hand, the $\ell_2$ penalty provides stability and solution uniqueness through strict convexity, the standard Elastic Net nevertheless suffers from shrinkage bias that frequently yields suboptimal prediction accuracy. We propose to address this limitation through a framework called relaxation. Existing relaxation implementations rely on naive linear interpolations of penalized and unpenalized solutions, which ignore the non-linear geometry that characterizes the entire regularization path and risk violating the Karush-Kuhn-Tucker conditions. Renet addresses these limitations by enforcing sign consistency through an adaptive relaxation procedure that dynamically dispatches between convex blending and efficient sub-path refitting. Furthermore, we identify and formalize a unique synergy between relaxation and the "One-Standard-Error" rule: relaxation serves as a robust debiasing mechanism, allowing practitioners to leverage the parsimony of the 1-SE rule without the traditional loss in predictive fidelity. Our theoretical framework incorporates automated stability safeguards for ultra-high dimensional regimes and is supported by a comprehensive benchmarking suite across 20 synthetic and real-world datasets, demonstrating that Renet consistently outperforms the standard Elastic Net and provides a more robust alternative to the Adaptive Elastic Net in high-dimensional, low signal-to-noise ratio and high-multicollinearity regimes. By leveraging an adaptive solver backend, Renet delivers these statistical gains while offering a computational profile that remains competitive with state-of-the-art coordinate descent implementations.
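The "naive linear interpolation" that the abstract contrasts itself with can be sketched as a convex blend of a penalized coefficient vector and an unpenalized refit on the selected support, followed by a sign check. This is only an illustrative sketch: the function names, the convention that `gamma = 1` returns the penalized fit, and the fallback hinted at in the comments are assumptions, not Renet's actual algorithm.

```python
def blend(beta_pen, beta_refit, gamma):
    """Convex blend of penalized and refit coefficients.

    gamma = 1 returns the penalized solution, gamma = 0 the unpenalized refit
    (an assumed convention for this sketch).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [gamma * bp + (1.0 - gamma) * br for bp, br in zip(beta_pen, beta_refit)]


def sign_consistent(beta_pen, beta_blend):
    """True if no coefficient flips sign relative to the penalized fit."""
    return all(bp * bb >= 0.0 for bp, bb in zip(beta_pen, beta_blend))


# Hypothetical coefficients: the blend keeps every sign, so it would be accepted.
blended = blend([1.0, 0.0, -2.0], [2.0, 0.0, -3.0], gamma=0.5)
ok = sign_consistent([1.0, 0.0, -2.0], blended)

# Here the refit flips the sign; a procedure that enforces sign consistency
# would have to reject this blend and fall back on something else.
flipped = blend([0.5], [-1.5], gamma=0.5)
bad = sign_consistent([0.5], flipped)
```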

2602.11090 2026-02-12 cs.LG cs.AI cs.CE stat.CO

Direct Learning of Calibration-Aware Uncertainty for Neural PDE Surrogates

Carlos Stein Brito

Comments 13 pages, 11 figures

Neural PDE surrogates are often deployed in data-limited or partially observed regimes where downstream decisions depend on calibrated uncertainty in addition to low prediction error. Existing approaches obtain uncertainty through ensemble replication, fixed stochastic noise such as dropout, or post hoc calibration. Cross-regularized uncertainty learns uncertainty parameters during training using gradients routed through a held-out regularization split. The predictor is optimized on the training split for fit, while low-dimensional uncertainty controls are optimized on the regularization split to reduce train-test mismatch, yielding regime-adaptive uncertainty without per-regime noise tuning. The framework can learn continuous noise levels at the output head, within hidden features, or within operator-specific components such as spectral modes. We instantiate the approach in Fourier Neural Operators and evaluate on APEBench sweeps over observed fraction and training-set size. Across these sweeps, the learned predictive distributions are better calibrated on held-out splits and the resulting uncertainty fields concentrate in high-error regions in one-step spatial diagnostics.

2602.11059 2026-02-12 stat.ML cs.LG stat.AP

A Gibbs posterior sampler for inverse problem based on prior diffusion model

Jean-François Giovannelli

This paper addresses the issue of inversion in cases where (1) the observation system is modeled by a linear transformation and additive noise, (2) the problem is ill-posed and regularization is introduced in a Bayesian framework by a prior density, and (3) the latter is modeled by a diffusion process adjusted on an available large set of examples. In this context, it is known that the issue of posterior sampling is a thorny one. This paper introduces a Gibbs algorithm. It appears that this avenue has not been explored, and we show that this approach is particularly effective and remarkably simple. In addition, it offers a guarantee of convergence in a clearly identified situation. The results are clearly confirmed by numerical simulations.

2602.11018 2026-02-12 cs.LG cs.AI stat.ML

OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories

Returaj Burnwal, Nirav Pravinbhai Bhatt, Balaraman Ravindran

Comments 21 pages, Accepted at AAMAS 2026

This work addresses the problem of offline safe imitation learning (IL), where the goal is to learn safe and reward-maximizing policies from demonstrations that do not have per-timestep safety cost or reward information. In many real-world domains, online learning in the environment can be risky, and specifying accurate safety costs can be difficult. However, it is often feasible to collect trajectories that reflect undesirable or unsafe behavior, implicitly conveying what the agent should avoid. We refer to these as non-preferred trajectories. We propose a novel offline safe IL algorithm, OSIL, that infers safety from non-preferred demonstrations. We formulate safe policy learning as a Constrained Markov Decision Process (CMDP). Instead of relying on explicit safety cost and reward annotations, OSIL reformulates the CMDP problem by deriving a lower bound on the reward-maximizing objective and learning a cost model that estimates the likelihood of non-preferred behavior. Our approach allows agents to learn safe and reward-maximizing behavior entirely from offline demonstrations. We empirically demonstrate that our approach can learn safer policies that satisfy cost constraints without degrading the reward performance, thus outperforming several baselines.

2602.10971 2026-02-12 cs.LG stat.ML

A Jointly Efficient and Optimal Algorithm for Heteroskedastic Generalized Linear Bandits with Adversarial Corruptions

Sanghwa Kim, Junghyun Lee, Se-Young Yun

Comments 49 pages, 1 table

We consider the problem of heteroskedastic generalized linear bandits (GLBs) with adversarial corruptions, which subsumes various stochastic contextual bandit settings, including heteroskedastic linear bandits and logistic/Poisson bandits. We propose HCW-GLB-OMD, which consists of two components: an online mirror descent (OMD)-based estimator and Hessian-based confidence weights to achieve corruption robustness. This is computationally efficient in that it only requires ${O}(1)$ space and time complexity per iteration. Under the self-concordance assumption on the link function, we show a regret bound of $\tilde{O}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d^2 g_{\max} \kappa + d \kappa C \right)$, where $\dot{\mu}_{t,\star}$ is the slope of $\mu$ around the optimal arm at time $t$, $g(\tau_t)$'s are potentially exogenously time-varying dispersions (e.g., $g(\tau_t) = \sigma_t^2$ for heteroskedastic linear bandits, $g(\tau_t) = 1$ for Bernoulli and Poisson), $g_{\max} = \max_{t \in [T]} g(\tau_t)$ is the maximum dispersion, and $C \geq 0$ is the total corruption budget of the adversary. We complement this with a lower bound of $\tilde{\Omega}\left( d \sqrt{\sum_t g(\tau_t) \dot{\mu}_{t,\star}} + d C \right)$, unifying previous problem-specific lower bounds. Thus, our algorithm achieves, up to a $\kappa$-factor in the corruption term, instance-wise minimax optimality simultaneously across various instances of heteroskedastic GLBs with adversarial corruptions.

2602.10969 2026-02-12 stat.ME stat.ML

Weighting-Based Identification and Estimation in Graphical Models of Missing Data

Anna Guo, Razieh Nabi

We propose a constructive algorithm for identifying complete data distributions in graphical models of missing data. The complete data distribution is unrestricted, while the missingness mechanism is assumed to factorize according to a conditional directed acyclic graph. Our approach follows an interventionist perspective in which missingness indicators are treated as variables that can be intervened on. A central challenge in this setting is that sequences of interventions on missingness indicators may induce and propagate selection bias, so that identification can fail even when a propensity score is invariant to available interventions. To address this challenge, we introduce a tree-based identification algorithm that explicitly tracks the creation and propagation of selection bias and determines whether it can be avoided through admissible intervention strategies. The resulting tree provides both a diagnostic and a constructive characterization of identifiability under a given missingness mechanism. Building on these results, we develop recursive inverse probability weighting procedures that mirror the intervention logic of the identification algorithm, yielding valid estimating equations for both the missingness mechanism and functionals of the complete data distribution. Simulation studies and a real-data application illustrate the practical performance of the proposed methods. An accompanying R package, flexMissing, implements all proposed procedures.

2602.10960 2026-02-12 q-fin.ST cs.CE econ.EM q-fin.RM stat.CO

Integrating granular data into a multilayer network: an interbank model of the euro area for systemic risk assessment

Ilias Aarab, Thomas Gottron, Andrea Colombo, Jörg Reddig, Annalauro Ianiro

Journal ref Adv Data Anal Classif (2026)

Micro-structural models of contagion and systemic risk emphasize that shock propagation is inherently multi-channel, spanning counterparty exposures, short-term funding and roll-over risk, securities cross-holdings, and common-asset (fire-sale) spillovers. Empirical implementations, however, often rely on stylized or simulated networks, or focus on a single exposure dimension, reflecting the practical difficulty of reconciling heterogeneous granular collections into a coherent representation with consistent identifiers and consolidation rules. We close part of this gap by constructing an empirically grounded multilayer network for euro area significant banking groups that integrates several supervisory and statistical datasets into layer-consistent exposure matrices defined on a common node set. Each layer corresponds to a distinct transmission channel (long- and short-term credit, securities cross-holdings, short-term secured funding, and overlapping external portfolios), and nodes are enriched with balance-sheet information to support model calibration. We document pronounced cross-layer heterogeneity in connectivity and centrality, and show that an aggregated (flattened) representation can mask economically relevant structure and misidentify the institutions that are systemically important in specific markets. We then illustrate how the resulting network disciplines standard systemic-risk analytics by implementing a centrality-based propagation measure and a micro-structural agent-based framework on real exposures. The approach provides a data-grounded basis for layer-aware systemic-risk assessment and stress testing across multiple dimensions of the banking network.

2602.10924 2026-02-12 stat.ME

Non-centred Bayesian inference for discrete-valued state-transition models: the Rippler algorithm

James Neill, Lloyd A. C. Chapman, Chris Jewell

Comments 18 pages, 7 figures (plus supplementary material with an additional 9 pages, 8 figures)

Stochastic state-transition models of infectious disease transmission can be used to deduce relevant drivers of transmission when fitted to data using statistically principled methods. Fitting such models to individual-level data requires inference on individuals' unobserved disease statuses over time, which form a high-dimensional and highly correlated state space. We introduce a novel Bayesian (data-augmentation Markov chain Monte Carlo) algorithm for jointly estimating the model parameters and unobserved disease statuses, which we call the Rippler algorithm. This is a non-centred method that can be applied to any individual-based state-transition model. We compare the Rippler algorithm to the state-of-the-art inference methods for individual-based stochastic epidemic models and find that it performs better than these methods as the number of disease states in the model increases.

2602.10867 2026-02-12 stat.ML cs.LG

Deep Learning of Compositional Targets with Hierarchical Spectral Methods

Hugo Tabanelli, Yatin Dandi, Luca Pesce, Florent Krzakala

Why depth yields a genuine computational advantage over shallow methods remains a central open question in learning theory. We study this question in a controlled high-dimensional Gaussian setting, focusing on compositional target functions. We analyze their learnability using an explicit three-layer fitting model trained via layer-wise spectral estimators. Although the target is globally a high-degree polynomial, its compositional structure allows learning to proceed in stages: an intermediate representation reveals structure that is inaccessible at the input level. This reduces learning to simpler spectral estimation problems, well studied in the context of multi-index models, whereas any shallow estimator must resolve all components simultaneously. Our analysis relies on Gaussian universality, leading to sharp separations in sample complexity between two- and three-layer learning strategies.

2602.03609 2026-02-12 stat.AP

Scalable non-separable spatio-temporal Gaussian process models for large-scale short-term weather prediction

Tim Gyger, Reinhard Furrer, Fabio Sigrist

Monitoring daily weather fields is critical for climate science, agriculture, and environmental planning, yet fully probabilistic spatio-temporal models become computationally prohibitive at continental scale. We present a case study on short-term forecasting of daily maximum temperature and precipitation across the conterminous United States using novel scalable spatio-temporal Gaussian process methodology. Building on three approximation families - inducing-point methods (FITC), Vecchia approximations, and a hybrid Vecchia-inducing-point full-scale approach (VIF) - we introduce three extensions that address key bottlenecks in large space-time settings: (i) a scalable correlation-based neighbor selection strategy for Vecchia approximations with point-referenced data, enabling accurate conditioning under complex dependence structures, (ii) a space-time kMeans++ inducing-point selection algorithm, and (iii) GPU-accelerated implementations of computationally expensive operations, including matrix operations and neighbor searches. Using both synthetic experiments and a large NOAA station dataset containing more than one million space-time observations, we analyze the models with respect to predictive performance, parameter estimation, and computational efficiency. Our results demonstrate that scalable Gaussian process models can yield accurate continental-scale forecasts while remaining computationally feasible, offering practical tools for weather applications.

2512.00181 2026-02-12 cs.LG cs.AI stat.ML

Orion-Bix: Bi-Axial Attention for Tabular In-Context Learning

Mohamed Bouadi, Pratinav Seth, Aditya Tanna, Vinay Kumar Sankarapu

Tabular data drive most real-world machine learning applications, yet building general-purpose models for them remains difficult. Mixed numeric and categorical fields, weak feature structure, and limited labeled data make scaling and generalization challenging. To this end, we introduce Orion-Bix, a tabular foundation model that combines biaxial attention with meta-learned in-context reasoning for few-shot tabular learning. Its encoder alternates standard, grouped, hierarchical, and relational attention, fusing their outputs through multi-CLS summarization to capture both local and global dependencies efficiently. A label-aware ICL head adapts on the fly and scales to large label spaces via hierarchical decision routing. Meta-trained on synthetically generated, structurally diverse tables with causal priors, Orion-Bix learns transferable inductive biases across heterogeneous data. Delivered as a scikit-learn compatible foundation model, it outperforms gradient-boosting baselines and remains competitive with state-of-the-art tabular foundation models on public benchmarks, showing that biaxial attention with episodic meta-training enables robust, few-shot-ready tabular learning. The model is publicly available at https://github.com/Lexsi-Labs/Orion-BiX.

2511.18141 2026-02-12 stat.ML cs.LG

Conformal Prediction for Compositional Data

Lucas P. Amaral, Luben M. C. Cabezas, Thiago R. Ramos, Gustavo H. G. A. Pereira

Comments 32 pages, 11 figures

Dirichlet regression models are suitable for compositional data, in which the response variable represents proportions that sum to one. However, there are still no well-established methods for constructing valid prediction sets in this context, especially considering the geometry of the compositional space. In this work, we investigate conformal prediction-based strategies for constructing valid predictive regions in Dirichlet regression models. We evaluate three distinct approaches: a method based on quantile residuals, an approximate construction of highest density regions (HDR), and an adaptation of the approximate HDR using grid-based discretization over the simplex. The performance of the methods was analyzed through simulation studies under different scenarios, varying the model complexity, response dimensionality, and covariate structure. The results indicated that the HDR approximation approach exhibits good robustness in terms of coverage, while the grid discretization proved effective in reducing overcoverage and the area of the prediction region compared to the original method. The quantile method provided larger prediction regions compared to the grid method, while maintaining adequate coverage. The methodologies were also applied to two real datasets: one concerning sleep stages and another on biomass allocation in plants. In both cases, the proposed methods demonstrated practical feasibility and produced coherent interpretations within the compositional space. Finally, we discuss possible extensions of this work.
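The coverage guarantee behind conformal prediction rests on one calibration step that is easy to state in code. The sketch below shows the generic split-conformal threshold for an arbitrary non-conformity score; it does not reproduce the paper's specific scores (quantile residuals or HDR-based constructions), and the function name `conformal_threshold` is hypothetical.

```python
import math


def conformal_threshold(scores, alpha):
    """Split-conformal threshold from held-out calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small for the requested level.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


# Hypothetical calibration scores; larger means "less conforming".
threshold = conformal_threshold([3.0, 1.0, 2.0], alpha=0.5)
```

A prediction region for a new covariate value is then the set of candidate responses whose score does not exceed the threshold; for compositional responses, those candidates would be points on the simplex.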

2510.06025 2026-02-12 cs.LG stat.ML

Out-of-Distribution Detection from Small Training Sets using Bayesian Neural Network Classifiers

Kevin Raina, Tanya Schmah

Comments British Machine Vision Conference (BMVC) 2025; 18 pages, 6 figures, 3 tables

Journal ref https://bmvc2025.bmva.org/proceedings/1187/

Out-of-Distribution (OOD) detection is critical to AI reliability and safety, yet in many practical settings, only a limited amount of training data is available. Bayesian Neural Networks (BNNs) are a promising class of model on which to base OOD detection, because they explicitly represent epistemic (i.e. model) uncertainty. In the small training data regime, BNNs are especially valuable because they can incorporate prior model information. We introduce a new family of Bayesian posthoc OOD scores based on expected logit vectors, and compare 5 Bayesian and 4 deterministic posthoc OOD scores. Experiments on MNIST and CIFAR-10 In-Distributions, with 5000 training samples or fewer, show that the Bayesian methods outperform corresponding deterministic methods.

2509.25170 2026-02-12 cs.LG cs.AI stat.ML

GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models

Peter Holderrieth, Uriel Singer, Tommi Jaakkola, Ricky T. Q. Chen, Yaron Lipman, Brian Karrer

The performance of flow matching and diffusion models can be greatly improved at inference time using reward alignment algorithms, yet efficiency remains a major limitation. While several algorithms have been proposed, we demonstrate that a common bottleneck is the sampling method these algorithms rely on: many algorithms require sampling Markov transitions via SDE sampling, which is significantly less efficient and often less performant than ODE sampling. To remove this bottleneck, we introduce GLASS Flows, a new sampling paradigm that simulates a "flow matching model within a flow matching model" to sample Markov transitions. As we show in this work, this "inner" flow matching model can be retrieved from a pre-trained model without any re-training, combining the efficiency of ODEs with the stochastic evolution of SDEs. On large-scale text-to-image models, we show that GLASS Flows eliminate the trade-off between stochastic evolution and efficiency. Combined with Feynman-Kac Steering, GLASS Flows improve state-of-the-art performance in text-to-image generation, making it a simple, drop-in solution for inference-time scaling of flow and diffusion models.

2508.07465 2026-02-12 cs.LG q-bio.GN stat.ML

MOTGNN: Interpretable Graph Neural Networks for Multi-Omics Disease Classification

Tiantian Yang, Zhiqian Chen

Comments 11 pages, 6 figures, 7 tables

Integrating multi-omics data, such as DNA methylation, mRNA expression, and microRNA (miRNA) expression, offers a comprehensive view of the biological mechanisms underlying disease. However, the high dimensionality of multi-omics data, the heterogeneity across modalities, and the lack of reliable biological interaction networks make meaningful integration challenging. In addition, many existing models rely on handcrafted similarity graphs, are vulnerable to class imbalance, and often lack built-in interpretability, limiting their usefulness in biomedical applications. We propose Multi-Omics integration with Tree-generated Graph Neural Network (MOTGNN), a novel and interpretable framework for binary disease classification. MOTGNN employs eXtreme Gradient Boosting (XGBoost) for omics-specific supervised graph construction, followed by modality-specific Graph Neural Networks (GNNs) for hierarchical representation learning, and a deep feedforward network for cross-omics integration. Across three real-world disease datasets, MOTGNN outperforms state-of-the-art baselines by 5-10% in accuracy, ROC-AUC, and F1-score, and remains robust to severe class imbalance. The model maintains computational efficiency through the use of sparse graphs and provides built-in interpretability, revealing both top-ranked biomarkers and the relative contributions of each omics modality. These results highlight the potential of MOTGNN to improve both predictive accuracy and interpretability in multi-omics disease modeling.

2506.23870 2026-02-12 stat.ME math.ST stat.TH

Upgrading survival models with CARE

William G. Underwood, Henry W. J. Reeve, Oliver Y. Feng, Samuel A. Lambert, Bhramar Mukherjee, Richard J. Samworth

Comments 80 pages, 12 figures

Clinical risk prediction models are regularly updated as new data, often with additional covariates, become available. We propose CARE (Convex Aggregation of relative Risk Estimators) as a general approach for combining existing "external" estimators with a new data set in a time-to-event survival analysis setting. Our method initially employs the new data to fit a flexible family of reproducing kernel estimators via penalised partial likelihood maximisation. The final relative risk estimator is then constructed as a convex combination of the kernel and external estimators, with the convex combination coefficients and regularisation parameters selected using cross-validation. We establish high-probability bounds for the $L_2$-error of our proposed aggregated estimator, showing that it achieves a rate of convergence that is at least as good as both the optimal kernel estimator and the best external model. Empirical results from simulation studies align with the theoretical results, and we illustrate the improvements our methods provide for cardiovascular disease risk modelling. Our methodology is implemented in the Python package care-survival.

2506.16202 2026-02-12 cs.CY cs.HC stat.AP

AI labeling reduces the perceived accuracy of online content but has limited broader effects

Chuyao Wang, Patrick Sturgis, Daniel de Kadt

Comments 31 pages, 5 figures, 10 tables

Explicit labeling of online content produced by artificial intelligence (AI) is a widely discussed policy for ensuring transparency and promoting public confidence. Yet little is known about the scope of AI labeling effects on public assessments of labeled content. We contribute new evidence on this question from a survey experiment using a high-quality nationally representative probability sample (n = 3,861). First, we demonstrate that explicit AI labeling of a news article about a proposed public policy reduces its perceived accuracy. Second, we test whether there are spillover effects in terms of policy interest, policy support, and general concerns about online misinformation. We find that AI labeling reduces interest in the policy, but neither influences support for the policy nor triggers general concerns about online misinformation. We further find that increasing the salience of AI use reduces the negative impact of AI labeling on perceived accuracy, while one-sided versus two-sided framing of the policy has no moderating effect. Overall, our findings suggest that the effects of algorithm aversion induced by AI labeling of online content are limited in scope and that transparency policies may benefit from contextualizing AI use to mitigate unintended public skepticism.

2504.08513 2026-02-12 math.PR math.ST stat.TH

Measure Theory of Conditionally Independent Random Function Evaluation

Felix Benning

In sequential design strategies, common in geostatistics and Bayesian optimization, the selection of a new observation point $X_{n+1}$ of a random function $\mathbf f$ is informed by past data, captured by the filtration $\mathcal F_n=\sigma(\mathbf f(X_0),\dots,\mathbf f(X_n))$. The random nature of $X_{n+1}$ introduces measure-theoretic subtleties in deriving the conditional distribution $\mathbb P(\mathbf f(X_{n+1})\in A \mid \mathcal F_n)$. Practitioners often resort to a heuristic: treating $X_0,\dots, X_{n+1}$ as fixed parameters within the conditional probability calculation. This paper investigates the mathematical validity of this widespread practice. We construct a counterexample to prove that this approach is, in general, incorrect. We also establish our central positive result: for continuous Gaussian random functions and their canonical conditional distribution, the heuristic is sound. This provides a rigorous justification for a foundational technique in Bayesian optimization and spatial statistics. We further extend our analysis to include settings with noisy evaluations and to cases where $X_{n+1}$ is not adapted to $\mathcal F_n$ but is conditionally independent of $\mathbf f$ given the filtration.

2408.12175 2026-02-12 cs.LG stat.ML

Measuring Orthogonality as the Blind-Spot of Uncertainty Disentanglement

Ivo Pascal de Jong, Andreea Ioana Sburlea, Matthia Sabatelli, Matias Valdenegro-Toro

Comments 25 pages, 17 figures, 6 tables

Aleatoric (data) and epistemic (knowledge) uncertainty are textbook components of Uncertainty Quantification. Jointly estimating these components has been shown to be problematic and non-trivial. As a result, there are multiple ways to disentangle these uncertainties, but current methods to evaluate them are insufficient. We propose that aleatoric and epistemic uncertainty estimates should be orthogonally disentangled - meaning that each uncertainty is not affected by the other - a necessary condition that is often not met. We prove that orthogonality and consistency are necessary and sufficient criteria for disentanglement, and construct Uncertainty Disentanglement Error (UDE) as a metric to measure these criteria, with further empirical evaluation showing that finetuned models give different orthogonality results than models trained from scratch and that UDE can be optimized for through dropout rate. We demonstrate that a Deep Ensemble trained from scratch on ImageNet-1k with Information Theoretic disentangling achieves consistent and orthogonal estimates of epistemic uncertainty, but estimates of aleatoric uncertainty still fail on orthogonality.

2407.12149 2026-02-12 math.NA cs.NA math.ST stat.TH

Multigrid Monte Carlo Revisited: Theory and Bayesian Inference

Yoshihito Kazashi, Eike H. Müller, Robert Scheichl

Comments 64 pages, 4 figures, 2 tables; to appear in "Foundations of Computational Mathematics"

Gaussian random fields play an important role in many areas of science and engineering. In practice, they are often simulated by sampling from a high-dimensional multivariate normal distribution, which arises from the discretisation of a suitable precision operator. Existing methods such as Cholesky factorization and Gibbs sampling become prohibitively expensive on fine meshes due to their high computational cost. In this work, we revisit the Multigrid Monte Carlo (MGMC) algorithm developed by Goodman & Sokal (Physical Review D 40.6, 1989) in the quantum physics context. While the authors of that paper conclude that MGMC does not overcome critical slowing down in simulations of field theories near phase transitions, we demonstrate here that it has the potential to significantly accelerate sampling in spatial statistics. The class of Gaussian random fields we consider includes those with Matérn covariance, but is more general in that it also allows for non-stationary covariance functions. To show that MGMC can overcome the limitation of existing methods, we establish a grid-size-independent convergence theory based on the link between linear solvers and samplers for multivariate normal distributions, drawing on standard multigrid convergence arguments. We then apply this theory to linear Bayesian inverse problems. This application is achieved by extending the standard multigrid theory to operators with a low-rank perturbation. Moreover, we develop a novel bespoke random smoother which takes care of the low-rank updates that arise in constructing posterior moments. In particular, we prove that Multigrid Monte Carlo is algorithmically optimal in the limit of the grid-size going to zero. Numerical results support our theory, demonstrating that Multigrid Monte Carlo can be significantly more efficient than alternative methods when applied in a Bayesian setting.
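For context on the baseline the abstract mentions, here is the textbook coordinate-wise Gibbs sampler for a zero-mean multivariate normal specified by its precision matrix. It is not MGMC, only a sketch of the single-level sampler; all function names are hypothetical and the 2x2 precision matrix is made up for the example. Each coordinate is redrawn from its Gaussian full conditional, which has mean -(1/Q_ii) * sum_{j != i} Q_ij x_j and variance 1/Q_ii.

```python
import random


def conditional_mean_var(Q, x, i):
    """Mean and variance of x[i] given the other coordinates, for x ~ N(0, Q^{-1})."""
    off_diag = sum(Q[i][j] * x[j] for j in range(len(x)) if j != i)
    return -off_diag / Q[i][i], 1.0 / Q[i][i]


def gibbs_sweep(Q, x, rng):
    """Redraw every coordinate of x in place from its full conditional."""
    for i in range(len(x)):
        mean, var = conditional_mean_var(Q, x, i)
        x[i] = rng.gauss(mean, var ** 0.5)
    return x


def first_coordinate_variance(Q, n_sweeps, seed=0):
    """Empirical variance of x[0] along a Gibbs chain started at zero."""
    rng = random.Random(seed)
    x = [0.0] * len(Q)
    draws = []
    for _ in range(n_sweeps):
        gibbs_sweep(Q, x, rng)
        draws.append(x[0])
    mean = sum(draws) / len(draws)
    return sum((d - mean) ** 2 for d in draws) / len(draws)


# Made-up precision matrix; its inverse has 2/3 on the diagonal.
Q = [[2.0, -1.0], [-1.0, 2.0]]
estimate = first_coordinate_variance(Q, n_sweeps=20000)
```

On a discretised precision operator each sweep touches every unknown once. The grid-size-independent convergence theory that the abstract establishes for MGMC suggests that the mixing of this single-level sampler is what degrades on fine meshes, in the same way that simple relaxation methods slow down as linear solvers.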

2407.07559 2026-02-12 math.ST stat.TH

Granulometric Smoothing on Manifolds

Diego Bolón, Rosa M. Crujeiras, Alberto Rodríguez-Casal

Comments 65 pages (a main paper of 28 pages and several appendices)

Abstract

Given a random sample from a density function supported on a manifold $M$, a new method for estimating the highest density regions of the underlying population is introduced. The new proposal is based on the empirical version of the opening operator from mathematical morphology combined with a preliminary estimator of the density function. This results in an estimator that is easy to compute, since it simply consists of a list of centers and a radius $r$ that are adequately selected from the data. The new estimator is shown to be consistent and its convergence rates in terms of the Hausdorff distance are provided. All consistency results are established uniformly on the level of the set and for any Riemannian manifold $M$ satisfying mild assumptions. The applicability of the procedure is shown by some illustrative examples.

2405.16828 2026-02-12 cs.LG math.ST stat.ML stat.TH

Kernel-based Optimally Weighted Conformal Time-Series Prediction

Jonghyeok Lee, Chen Xu, Yao Xie

Journal ref In Proceedings of the Thirteenth International Conference on Learning Representations (ICLR), 2025

Abstract

In this work, we present a novel conformal prediction method for time-series, which we call Kernel-based Optimally Weighted Conformal Prediction Intervals (KOWCPI). Specifically, KOWCPI adapts the classic Reweighted Nadaraya-Watson (RNW) estimator for quantile regression on dependent data and learns optimal data-adaptive weights. Theoretically, we tackle the challenge of establishing a conditional coverage guarantee for non-exchangeable data under strong mixing conditions on the non-conformity scores. We demonstrate the superior performance of KOWCPI on real and synthetic time-series data against state-of-the-art methods, where KOWCPI achieves narrower confidence intervals without losing coverage.

2402.03991 2026-02-12 cs.LG cs.NA math.NA stat.ML

Provable Emergence of Deep Neural Collapse and Low-Rank Bias in $L^2$-Regularized Nonlinear Networks

Emanuele Zangrando, Piero Deidda, Simone Brugiapaglia, Nicola Guglielmi, Francesco Tudisco

Abstract

We present a unified theoretical framework connecting the first property of Deep Neural Collapse (DNC1) to the emergence of implicit low-rank bias in nonlinear networks trained with $L^2$ weight decay regularization. Our main contributions are threefold. First, we derive a quantitative relation between the Total Cluster Variation (TCV) of intermediate embeddings and the numerical rank of stationary weight matrices. In particular, we establish that, at any critical point, the distance from a weight matrix to the set of rank-$K$ matrices is bounded by a constant times the TCV of earlier-layer features, scaled inversely with the weight-decay parameter. Second, we prove global optimality of DNC1 in a constrained representation-cost setting for both feedforward and residual architectures, showing that zero TCV across intermediate layers minimizes the representation cost under natural architectural constraints. Third, we establish a benign landscape property: for almost every interpolating initialization there exists a continuous, loss-decreasing path from the initialization to a globally optimal, DNC1-satisfying configuration. Our theoretical claims are validated empirically; numerical experiments confirm the predicted relations among TCV, singular-value structure, and weight decay. These results indicate that neural collapse and low-rank bias are intimately linked phenomena arising from the optimization geometry induced by weight decay.

2304.04724 2026-02-12 stat.CO cs.CC stat.ML

When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?

Yuansi Chen, Khashayar Gatmiry, Minhui Jiang

Comments 46 pages, fixed typos and minor issues

Abstract

We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry. We bound the gradient complexity to reach $ε$ error in total variation distance from a warm start by $\tilde O(d^{1/4}\text{polylog}(1/ε))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass the previous analysis on Metropolis-adjusted Langevin algorithm (MALA) that has $\tilde{O}(d^{1/2}\text{polylog}(1/ε))$ dimension dependency [WSC22], we reveal a key feature in our proof that the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Notably, our analysis does not require log-concavity or independence of the marginals, and only relies on an isoperimetric inequality. To illustrate the relevance of the Lipschitz Hessian in Frobenius norm assumption, several examples that fall into our framework are discussed.
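As a reading aid, one transition of Metropolized HMC with the leapfrog integrator can be sketched as below. This is a generic textbook version under assumptions of our own (identity mass matrix, potential `u` and gradient `grad_u` supplied by the caller, hypothetical function names), not the authors' implementation or analysis. With `n_steps = 1` the proposal coincides with that of MALA, which is the comparison the abstract draws.

```python
import math
import random

def leapfrog(x, v, grad_u, step, n_steps):
    """Leapfrog integration of Hamiltonian dynamics for H(x, v) = U(x) + |v|^2 / 2."""
    x = list(x)
    v = list(v)
    g = grad_u(x)
    for _ in range(n_steps):
        v = [vi - 0.5 * step * gi for vi, gi in zip(v, g)]  # half step in velocity
        x = [xi + step * vi for xi, vi in zip(x, v)]        # full step in position
        g = grad_u(x)
        v = [vi - 0.5 * step * gi for vi, gi in zip(v, g)]  # half step in velocity
    return x, v

def hmc_step(x, u, grad_u, step, n_steps, rng):
    """One Metropolized HMC transition; returns (new_state, accepted)."""
    v = [rng.gauss(0.0, 1.0) for _ in x]                    # refresh velocity
    x_new, v_new = leapfrog(x, v, grad_u, step, n_steps)
    h_old = u(x) + 0.5 * sum(vi * vi for vi in v)
    h_new = u(x_new) + 0.5 * sum(vi * vi for vi in v_new)
    # Metropolis correction: accept with probability min(1, exp(h_old - h_new)).
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new, True
    return x, False

# Hypothetical target: standard Gaussian, U(x) = |x|^2 / 2.
def u_gauss(x):
    return 0.5 * sum(xi * xi for xi in x)

def grad_u_gauss(x):
    return list(x)

state, accepted = hmc_step([1.0, -0.5], u_gauss, grad_u_gauss, 0.1, 5, random.Random(1))
```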

2203.00554 2026-02-12 stat.ML cs.LG

Neural Score Matching for High-Dimensional Causal Inference

Oscar Clivio, Fabian Falck, Brieuc Lehmann, George Deligiannidis, Chris Holmes

Comments Fixed erroneous Propositions 5-6-7 and Appendix B from the previous version

Abstract

Traditional methods for matching in causal inference are impractical for high-dimensional datasets. They suffer from the curse of dimensionality: exact matching and coarsened exact matching find exponentially fewer matches as the input dimension grows, and propensity score matching may match highly unrelated units together. To overcome this problem, we develop theoretical results which motivate the use of neural networks to obtain non-trivial, multivariate balancing scores of a chosen level of coarseness, in contrast to the classical, scalar propensity score. We leverage these balancing scores to perform matching for high-dimensional causal inference and call this procedure neural score matching. We show that our method is competitive against other matching approaches on semi-synthetic high-dimensional datasets, both in terms of treatment effect estimation and reducing imbalance.

2602.10784 2026-02-12 stat.AP

Integrating Unsupervised and Supervised Learning for the Prediction of Defensive Schemes in American football

Rouven Michels, Robert Bajons, Jan-Ole Fischer

Abstract

Anticipating defensive coverage schemes is a crucial yet challenging task for offenses in American football. Because defenders' assignments are intentionally disguised before the snap, they remain difficult to recognize in real time. To address this challenge, we develop a statistical framework that integrates supervised and unsupervised learning using player tracking data. Our goal is to forecast the defensive coverage scheme -- man or zone -- through elastic net logistic regression and gradient-boosted decision trees with incrementally derived features. We first use features from the pre-motion situation, then incorporate players' trajectories during motion in a naive way, and finally include features derived from a hidden Markov model (HMM). Based on player movements, the non-homogeneous HMM infers latent defensive assignments between offensive and defensive players during motion and transforms decoded state sequences into informative features for the supervised models. These HMM-based features enhance predictive performance and are significantly associated with coverage outcomes. Moreover, estimated random effects offer interpretable insights into how different defenses and positions adjust their coverage responsibilities.

2602.10774 2026-02-12 math.ST stat.TH

Nonparametric two sample test of spectral densities

Ilaria Nadin, Tatyana Krivobokova, Farida Enikeeva

Abstract

A novel nonparametric test for the equality of the covariance matrices of two Gaussian stationary processes, possibly of different lengths, is proposed. The test translates to testing the equality of two spectral densities and is shown to be minimax rate-optimal. Test performance is validated in a simulation study, and the practical utility is demonstrated in the analysis of real electroencephalography data. The test is implemented in the R-package sdf.test.

2602.10754 2026-02-12 cs.LG cs.AI cs.SY eess.SY stat.ML

Exploring the impact of adaptive rewiring in Graph Neural Networks

Charlotte Cambier van Nooten, Christos Aronis, Yuliya Shapovalova, Lucia Cavallaro

Comments This work has been submitted to the IEEE for possible publication

Abstract

This paper explores sparsification methods as a form of regularization in Graph Neural Networks (GNNs) to address high memory usage and computational costs in large-scale graph applications. Using techniques from Network Science and Machine Learning, including Erdős-Rényi for model sparsification, we enhance the efficiency of GNNs for real-world applications. We demonstrate our approach on N-1 contingency assessment in electrical grids, a critical task for ensuring grid reliability. We apply our methods to three datasets of varying sizes, exploring Graph Convolutional Networks (GCN) and Graph Isomorphism Networks (GIN) with different degrees of sparsification and rewiring. Comparison across sparsification levels shows the potential of combining insights from both research fields to improve GNN performance and scalability. Our experiments highlight the importance of tuning sparsity parameters: while sparsity can improve generalization, excessive sparsity may hinder learning of complex patterns. Our adaptive rewiring approach, particularly when combined with early stopping, proves promising by allowing the model to adapt its connectivity structure during training. This research contributes to understanding how sparsity can be effectively leveraged in GNNs for critical applications like power grid reliability analysis.

2602.10730 2026-02-12 stat.ME math.ST stat.TH

A closed form solution for Bayesian analysis of a simple linear mixed model

Hilde Vinje, Lars Erik Gangsei

Abstract

Linear mixed-effects models are a central analytical tool for modeling hierarchical and longitudinal data, as they allow simultaneous representation of fixed and random sources of variation. In practice, inference for such models is most often based on likelihood-based approximations, which are computationally efficient but rely on numerical integration and may be unreliable, for example in small-sample settings. In this study, the somewhat obscure four-parameter generalized beta density is shown to be usable as a conjugate prior distribution for a simple linear mixed model. This leads to a closed-form Bayesian solution for a balanced mixed-model design, representing a methodological development beyond standard approximate or simulation-based Bayesian approaches. Although the derivation is restricted to a balanced setting, the proposed framework suggests a pathway toward analytically tractable Bayesian inference for more complex mixed-model structures. The method is evaluated through comparison with a standard frequentist solution based on likelihood estimation for linear mixed-effects models. Results indicate that the Bayesian approach performs just as well as the frequentist alternative, while yielding slightly reduced mean squared error. The study further discusses the use of empirical Bayes strategies for hyperparameter specification and outlines potential directions for extending the approach beyond the balanced case.

2602.10714 2026-02-12 stat.CO stat.ML

A Non-asymptotic Analysis for Learning and Applying a Preconditioner in MCMC

Max Hird, Florian Maire, Jeffrey Negrea

Abstract

Preconditioning is a common method applied to modify Markov chain Monte Carlo algorithms with the goal of making them more efficient. In practice it is often extremely effective, even when the preconditioner is learned from the chain. We analyse and compare the finite-time computational costs of schemes which learn a preconditioner based on the target covariance or the expected Hessian of the target potential with that of a corresponding scheme that does not use preconditioning. We apply our results to the Unadjusted Langevin Algorithm (ULA) for an appropriately regular target, establishing non-asymptotic guarantees for preconditioned ULA which learns its preconditioner. Our results are also applied to the unadjusted underdamped Langevin algorithm in the supplementary material. To do so, we establish non-asymptotic guarantees on the time taken to collect $N$ approximately independent samples from the target for schemes that learn their preconditioners under the assumption that the underlying Markov chain satisfies a contraction condition in the Wasserstein-2 distance. This approximate independence condition, that we formalize, allows us to bridge the non-asymptotic bounds of modern MCMC theory and classical heuristics of effective sample size and mixing time, and is needed to amortise the costs of learning a preconditioner across the many samples it will be used to produce.
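To make the setting concrete, a preconditioned ULA update can be sketched as x_{k+1} = x_k - h P grad U(x_k) + sqrt(2h) P^{1/2} xi_k. The code below is a minimal illustration assuming a diagonal preconditioner P estimated from the per-coordinate variance of pilot draws. The function names and the diagonal restriction are assumptions of this sketch, not the schemes analysed in the paper.

```python
import math
import random

def diagonal_preconditioner(samples):
    """Per-coordinate sample variance of pilot draws, used as a diagonal P."""
    n = len(samples)
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [sum((s[j] - means[j]) ** 2 for s in samples) / (n - 1) for j in range(d)]

def preconditioned_ula_step(x, grad_u, step, precond_diag, rng):
    """One unadjusted Langevin step with diagonal preconditioner P:
    x - h * P * grad U(x) + sqrt(2 h P) * xi, with xi ~ N(0, I)."""
    g = grad_u(x)
    return [
        xi - step * p * gi + math.sqrt(2.0 * step * p) * rng.gauss(0.0, 1.0)
        for xi, p, gi in zip(x, precond_diag, g)
    ]

# Hypothetical anisotropic Gaussian target with variances (1, 100).
def grad_u_aniso(x):
    return [x[0] / 1.0, x[1] / 100.0]

rng = random.Random(0)
pilot = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 10.0)] for _ in range(200)]
p_diag = diagonal_preconditioner(pilot)
x = [0.0, 0.0]
for _ in range(100):
    x = preconditioned_ula_step(x, grad_u_aniso, 0.1, p_diag, rng)
```

Here the pilot draws are sampled directly for brevity. In the setting of the abstract they would come from the chain itself, which is why the cost of learning P has to be accounted for.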

2602.10697 2026-02-12 math.OC stat.ML

Fast and Large-Scale Unbalanced Optimal Transport via its Semi-Dual and Adaptive Gradient Methods

Ferdinand Genans

Abstract

Unbalanced Optimal Transport (UOT) has emerged as a robust relaxation of standard Optimal Transport, particularly effective for handling outliers and mass variations. However, scalable algorithms for UOT, specifically those based on Stochastic Gradient Descent (SGD), remain largely underexplored. In this work, we address this gap by analyzing the semi-dual formulation of Entropic UOT and demonstrating its suitability for adaptive gradient methods. While the semi-dual is a standard tool for large-scale balanced OT, its geometry in the unbalanced setting appears ill-conditioned under standard analysis. Specifically, worst-case bounds on the marginal penalties using $χ^2$ divergence suggest a condition number scaling with $n/\varepsilon$, implying poor scalability. In contrast, we show that the local condition number actually scales as $\mathcal{O}(1/\varepsilon)$, effectively removing the ill-conditioned dependence on $n$. Exploiting this property, we prove that SGD methods adapt to this local curvature, achieving a convergence rate of $\mathcal{O}(n/\varepsilon T)$ in the stochastic and online regimes, making it suitable for large-scale and semi-discrete applications. Finally, for the full batch discrete setting, we derive a nearly tight upper bound on local smoothness depending solely on the gradient. Using it to adapt step sizes, we propose a modified Adaptive Nesterov Accelerated Gradient (ANAG) method on the semi-dual functional and prove that it achieves a local complexity of $\mathcal{O}(n^2\sqrt{1/\varepsilon}\ln(1/δ))$.

2602.10691 2026-02-12 stat.ML cs.LG

Convergence Rates for Distribution Matching with Sliced Optimal Transport

Gauthier Thurin, Claire Boyer, Kimia Nadjahi

Abstract

We study the slice-matching scheme, an efficient iterative method for distribution matching based on sliced optimal transport. We investigate convergence to the target distribution and derive quantitative non-asymptotic rates. To this end, we establish Łojasiewicz-type inequalities for the Sliced-Wasserstein objective. A key challenge is to control along the trajectory the constants in these inequalities. We show that this becomes tractable for Gaussian distributions. Specifically, eigenvalues are controlled when matching along random orthonormal bases at each iteration. We complement our theory with numerical experiments and illustrate the predicted dependence on dimension and step-size, as well as the stabilizing effect of orthonormal-basis sampling.

2602.10673 2026-02-12 stat.ME

Inferring the presence and abundance of rare waterbirds species from scarce data

Barbara Bricout, Laura Dami, Pierre Defos du Rau, Sophie Donnet, Thomas Galewski, Stephane Robin

Comments 31 pages, 9 figures

Abstract

Abundance data are used in ecology for species monitoring and conservation. These count data often display several specific characteristics like numerous missing data, high variance, and a high proportion of zeros, particularly when monitoring rare species. We present a model that aims to impute missing data and estimate the effect of covariates on species presence and abundance. It is based on the log-normal Poisson model, which offers more flexibility in the variance of counts than a Poisson model. A latent variable is added for the overrepresentation of zeros in the data. The imputation of missing data is made possible by assuming that the latent variance matrix has low rank and by the inclusion of covariates. We demonstrate the identifiability in the presence of missing data. Since maximum likelihood inference is intractable, we use a variational expectation-maximization algorithm to infer the parameters. We provide an estimate of the asymptotic variance of the estimators and derive prediction intervals for the imputations, an estimate of the temporal trend, and a procedure for detecting a potential change in this trend. We evaluate our imputations and associated prediction intervals using an artificially degraded monitoring data set. We conclude with an illustration on a waterbird monitoring data set.

2602.10640 2026-02-12 stat.ML cs.LG

Beyond Kemeny Medians: Consensus Ranking Distributions Definition, Properties and Statistical Learning

Stephan Clémençon, Ekhine Irurozki

Abstract

In this article we develop a new method for summarizing a ranking distribution, i.e. a probability distribution on the symmetric group $\mathfrak{S}_n$, beyond the classical theory of consensus and Kemeny medians. Based on the notion of local ranking median, we introduce the concept of consensus ranking distribution (CRD), a sparse mixture model of Dirac masses on $\mathfrak{S}_n$, in order to approximate a ranking distribution with small distortion from a mass transportation perspective. We prove that by choosing the popular Kendall $τ$ distance as the cost function, the optimal distortion can be expressed as a function of pairwise probabilities, paving the way for the development of efficient learning methods that do not suffer from the lack of vector space structure on $\mathfrak{S}_n$. In particular, we propose a top-down tree-structured statistical algorithm that allows for the progressive refinement of a CRD based on ranking data, from the Dirac mass at a Kemeny median at the root of the tree to the empirical ranking data distribution itself at the end of the tree's exhaustive growth. In addition to the theoretical arguments developed, the relevance of the algorithm is empirically supported by various numerical experiments.
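The role of pairwise probabilities can be illustrated with the classical identity behind Kemeny medians: the expected Kendall $τ$ distance from a random ranking to a fixed ranking depends on the distribution only through the probabilities that item i is ranked before item j. The sketch below checks this on a toy sample. It is not the paper's distortion formula or its tree algorithm, and the function names and the convention that `sigma[i]` is the rank of item `i` are assumptions.

```python
from itertools import combinations

def kendall_tau(sigma, pi):
    """Number of item pairs that the two rankings order differently.
    sigma[i] and pi[i] are the ranks assigned to item i."""
    return sum(
        1
        for i, j in combinations(range(len(sigma)), 2)
        if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
    )

def pairwise_probabilities(rankings):
    """p[i][j] = empirical probability that item i is ranked before item j."""
    n_items = len(rankings[0])
    p = [[0.0] * n_items for _ in range(n_items)]
    for sigma in rankings:
        for i in range(n_items):
            for j in range(n_items):
                if i != j and sigma[i] < sigma[j]:
                    p[i][j] += 1.0 / len(rankings)
    return p

def expected_distance_from_pairs(p, pi):
    """Expected Kendall tau distance to pi, computed from pairwise probabilities only."""
    total = 0.0
    for i, j in combinations(range(len(pi)), 2):
        # A disagreement on (i, j) occurs when the sample orders the pair opposite to pi.
        total += p[j][i] if pi[i] < pi[j] else p[i][j]
    return total

rankings = [[0, 1, 2], [1, 0, 2], [0, 2, 1]]
reference = [0, 1, 2]
direct = sum(kendall_tau(s, reference) for s in rankings) / len(rankings)
via_pairs = expected_distance_from_pairs(pairwise_probabilities(rankings), reference)
```

Both `direct` and `via_pairs` compute the same average distance, the second without ever touching a full permutation.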

2602.10613 2026-02-12 stat.ML cs.LG

Highly Adaptive Principal Component Regression

Mingxun Wang, Alejandro Schuler, Mark van der Laan, Carlos García Meixide

Abstract

The Highly Adaptive Lasso (HAL) is a nonparametric regression method that achieves almost dimension-free convergence rates under minimal smoothness assumptions, but its implementation can be computationally prohibitive in high dimensions due to the large basis matrix it requires. The Highly Adaptive Ridge (HAR) has been proposed as a scalable alternative. Building on both procedures, we introduce the Principal Component based Highly Adaptive Lasso (PCHAL) and Principal Component based Highly Adaptive Ridge (PCHAR). These estimators constitute an outcome-blind dimension reduction that offers substantial gains in computational efficiency and matches the empirical performance of HAL and HAR. We also uncover a striking spectral link between the leading principal components of the HAL/HAR Gram operator and a discrete sinusoidal basis, revealing an explicit Fourier-type structure underlying the PC truncation.

2602.10611 2026-02-12 cs.LG physics.comp-ph stat.ML

On the Role of Consistency Between Physics and Data in Physics-Informed Neural Networks

Nicolás Becerra-Zuniga, Lucas Lacasa, Eusebio Valero, Gonzalo Rubio

Comments 24 pages, 7 Figures, 3 Tables

Abstract

Physics-informed neural networks (PINNs) have gained significant attention as a surrogate modeling strategy for partial differential equations (PDEs), particularly in regimes where labeled data are scarce and physical constraints can be leveraged to regularize the learning process. In practice, however, PINNs are frequently trained using experimental or numerical data that are not fully consistent with the governing equations due to measurement noise, discretization errors, or modeling assumptions. The implications of such data-to-PDE inconsistencies on the accuracy and convergence of PINNs remain insufficiently understood. In this work, we systematically analyze how data inconsistency fundamentally limits the attainable accuracy of PINNs. We introduce the concept of a consistency barrier, defined as an intrinsic lower bound on the error that arises from mismatches between the fidelity of the data and the exact enforcement of the PDE residual. To isolate and quantify this effect, we consider the 1D viscous Burgers equation with a manufactured analytical solution, which enables full control over data fidelity and residual errors. PINNs are trained using datasets of progressively increasing numerical accuracy, as well as perfectly consistent analytical data. Results show that while the inclusion of the PDE residual allows PINNs to partially mitigate low-fidelity data and recover the dominant physical structure, the training process ultimately saturates at an error level dictated by the data inconsistency. When high-fidelity numerical data are employed, PINN solutions become indistinguishable from those trained on analytical data, indicating that the consistency barrier is effectively removed. These findings clarify the interplay between data quality and physics enforcement in PINNs providing practical guidance for the construction and interpretation of physics-informed surrogate models.

2602.10608 2026-02-12 stat.ML cs.LG

Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood

Jiangrong Ouyang, Mingming Gong, Howard Bondell

Comments Accepted for publication in JMLR

Abstract

Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite sample regimes. The proposed inference method is robust to small sample sizes and is able to provide accurate uncertainty measurements for policy value evaluation. In addition, it allows for flexible inferences on policy comparison with full uncertainty quantification. We demonstrate the effectiveness of the proposed inference method using Monte Carlo simulations and its application to an adolescent body mass index data set.

2602.10588 2026-02-12 cs.LG stat.ML

TRACE: Theoretical Risk Attribution under Covariate-shift Effects

Hosein Anjidani, S. Yahya S. R. Tehrani, Mohammad Mahdi Mojahedian, Mohammad Hossein Yassaee

Abstract

When a source-trained model $Q$ is replaced by a model $\tilde{Q}$ trained on shifted data, its performance on the source domain can change unpredictably. To address this, we study the two-model risk change, $ΔR := R_P(\tilde{Q}) - R_P(Q)$, under covariate shift. We introduce TRACE (Theoretical Risk Attribution under Covariate-shift Effects), a framework that decomposes $|ΔR|$ into an interpretable upper bound. This decomposition disentangles the risk change into four actionable factors: two generalization gaps, a model change penalty, and a covariate shift penalty, transforming the bound into a powerful diagnostic tool for understanding why performance has changed. To make TRACE a fully computable diagnostic, we instantiate each term. The covariate shift penalty is estimated via a model sensitivity factor (from high-quantile input gradients) and a data-shift measure; we use feature-space Optimal Transport (OT) by default and provide a robust alternative using Maximum Mean Discrepancy (MMD). The model change penalty is controlled by the average output distance between the two models on the target sample. Generalization gaps are estimated on held-out data. We validate our framework in an idealized linear regression setting, showing the TRACE bound correctly captures the scaling of the true risk difference with the magnitude of the shift. Across synthetic and vision benchmarks, TRACE diagnostics are valid and maintain a strong monotonic relationship with the true performance degradation. Crucially, we derive a deployment gate score that correlates strongly with $|ΔR|$ and achieves high AUROC/AUPRC for gating decisions, enabling safe, label-efficient model replacement.

2602.10587 2026-02-12 stat.ML cs.LG

Deep Bootstrap

Jinyuan Chang, Yuling Jiao, Lican Kang, Junjie Shi

Abstract

In this work, we propose a novel deep bootstrap framework for nonparametric regression based on conditional diffusion models. Specifically, we construct a conditional diffusion model to learn the distribution of the response variable given the covariates. This model is then used to generate bootstrap samples by pairing the original covariates with newly synthesized responses. We reformulate nonparametric regression as conditional sample mean estimation, which is implemented directly via the learned conditional diffusion model. Unlike traditional bootstrap methods that decouple the estimation of the conditional distribution, sampling, and nonparametric regression, our approach integrates these components into a unified generative framework. With the expressive capacity of diffusion models, our method facilitates both efficient sampling from high-dimensional or multimodal distributions and accurate nonparametric estimation. We establish rigorous theoretical guarantees for the proposed method. In particular, we derive optimal end-to-end convergence rates in the Wasserstein distance between the learned and target conditional distributions. Building on this foundation, we further establish the convergence guarantees of the resulting bootstrap procedure. Numerical studies demonstrate the effectiveness and scalability of our approach for complex regression tasks.

2602.10566 2026-02-12 math.ST math.CO math.RA math.SP stat.TH

Finite-sample confidence regions for spectral clustering and graph centrality

Chandrasekhar Gokavarapu, Sekhar Babu Gosala, Vamis Pasalapudi, Tarakarama Kapakayala

Abstract

Let a graph be observed through a finite random sampling mechanism. Spectral methods are routinely applied to such graphs, yet their outputs are treated as deterministic objects. This paper develops finite-sample inference for spectral graph procedures. The primary result constructs explicit confidence regions for latent eigenspaces of graph operators under an explicit sampling model. These regions propagate to confidence regions for spectral clustering assignments and for smooth graph centrality functionals. All bounds are nonasymptotic and depend explicitly on the sample size, noise level, and spectral gap. The analysis isolates a failure of common practice: asymptotic perturbation arguments are often invoked without a finite-sample spectral gap, leading to invalid uncertainty claims. Under verifiable gap and concentration conditions, the present framework yields coverage guarantees and certified stability regions. Several corollaries address fairness-constrained post-processing and topological summaries derived from spectral embeddings.

2602.10545 2026-02-12 cs.LG cs.AI stat.ML

$μ$pscaling small models: Principled warm starts and hyperparameter transfer

Yuxin Ma, Nan Chen, Mateo Díaz, Soufiane Hayou, Dmitriy Kunisky, Soledad Villar

Comments 61 pages, 6 figures

Abstract

Modern large-scale neural networks are often trained and released in multiple sizes to accommodate diverse inference budgets. To improve efficiency, recent work has explored model upscaling: initializing larger models from trained smaller ones in order to transfer knowledge and accelerate convergence. However, this method can be sensitive to hyperparameters that need to be tuned at the target upscaled model size, which is prohibitively costly to do directly. It remains unclear whether the most common workaround -- tuning on smaller models and extrapolating via hyperparameter scaling laws -- is still sound when using upscaling. We address this with principled approaches to upscaling with respect to model widths and efficiently tuning hyperparameters in this setting. First, motivated by $μ$P and any-dimensional architectures, we introduce a general upscaling method applicable to a broad range of architectures and optimizers, backed by theory guaranteeing that models are equivalent to their widened versions and allowing for rigorous analysis of infinite-width limits. Second, we extend the theory of $μ$Transfer to a hyperparameter transfer technique for models upscaled using our method and empirically demonstrate that this method is effective on realistic datasets and architectures.

2602.10532 2026-02-12 stat.ML cs.LG math.ST stat.TH

Statistical Inference and Learning for Shapley Additive Explanations (SHAP)

Justin Whitehouse, Ayush Sawarni, Vasilis Syrgkanis

Comments 48 pages, 1 figure

Abstract

The SHAP (short for Shapley additive explanation) framework has become an essential tool for attributing importance to variables in predictive tasks. In model-agnostic settings, SHAP uses the concept of Shapley values from cooperative game theory to fairly allocate credit to the features in a vector $X$ based on their contribution to an outcome $Y$. While the explanations offered by SHAP are local by nature, learners often need global measures of feature importance in order to improve model explainability and perform feature selection. The most common approach for converting these local explanations into global ones is to compute either the mean absolute SHAP or mean squared SHAP. However, despite their ubiquity, there do not exist approaches for performing statistical inference on these quantities. In this paper, we take a semi-parametric approach for calibrating confidence in estimates of the $p$th powers of Shapley additive explanations. We show that, by treating the SHAP curve as a nuisance function that must be estimated from data, one can reliably construct asymptotically normal estimates of the $p$th powers of SHAP. When $p \geq 2$, we show that a de-biased estimator that combines U-statistics with Neyman orthogonal scores for functionals of nested regressions is asymptotically normal. When $1 \leq p < 2$ (and hence the target parameter is not twice differentiable), we construct de-biased U-statistics for a smoothed alternative. In particular, we show how to carefully tune the temperature parameter of the smoothing function in order to obtain inference for the true, unsmoothed $p$th power. We complement these results by presenting a Neyman orthogonal loss that can be used to learn the SHAP curve via empirical risk minimization and discussing excess risk guarantees for commonly used function classes.
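The global quantities discussed in the abstract (mean absolute SHAP for $p = 1$, mean squared SHAP for $p = 2$) are plug-in averages of local Shapley values. The sketch below computes exact Shapley values for a small cooperative game by averaging marginal contributions over all feature orderings, then aggregates local values into the mean $p$th power. It illustrates only the target quantities, with no de-biasing or inference. The function names and the toy value function are assumptions.

```python
from itertools import permutations

def shapley_values(value_fn, n_features):
    """Exact Shapley values: average marginal contribution over all orderings.
    value_fn maps a frozenset of feature indices to a real payoff."""
    phi = [0.0] * n_features
    orders = list(permutations(range(n_features)))
    for order in orders:
        coalition = set()
        previous = value_fn(frozenset(coalition))
        for j in order:
            coalition.add(j)
            current = value_fn(frozenset(coalition))
            phi[j] += current - previous
            previous = current
    return [total / len(orders) for total in phi]

def mean_pth_power(shap_rows, p):
    """Global importance per feature: sample mean of |phi_j(x)|^p over rows."""
    n = len(shap_rows)
    d = len(shap_rows[0])
    return [sum(abs(row[j]) ** p for row in shap_rows) / n for j in range(d)]

# Hypothetical game: payoff 1 only when features 0 and 1 are both present.
def toy_value(coalition):
    return 1.0 if {0, 1} <= coalition else 0.0

local = shapley_values(toy_value, 3)
importance = mean_pth_power([[1.0, -2.0], [-3.0, 0.0]], 1)
```

Exact enumeration costs factorial time in the number of features, so it is only practical for very small games.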

2602.10530 2026-02-12 stat.ML cs.LG math.ST stat.TH

Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise

Xiucai Ding, Chao Shen, Hau-Tieng Wu

Comments 4 figures

Abstract

Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees, particularly in the presence of heterogeneous and high-dimensional noise. We propose Generalized Robust Adaptive-Bandwidth Multiview Diffusion Maps (GRAB-MDM), a new kernel-based diffusion geometry framework for integrating multiple noisy data sources. The key innovation of GRAB-MDM is a view-dependent bandwidth selection strategy that adapts to the geometry and noise level of each view, enabling a stable and principled construction of multiview diffusion operators. Under a common-manifold model, we establish asymptotic convergence results and show that the adaptive bandwidths lead to provably robust recovery of the shared intrinsic structure, even when noise levels and sensor dimensions differ across views. Numerical experiments demonstrate that GRAB-MDM significantly improves robustness and embedding quality compared with fixed-bandwidth and equal-bandwidth baselines, and usually outperforms existing algorithms. The proposed framework offers a practical and theoretically grounded solution for multiview sensor fusion in high-dimensional noisy environments.

2602.10484 2026-02-12 stat.ME

CoVaR under Asymptotic Independence

Zhaowen Wang, Yutao Liu, Deyuan Li

详情
英文摘要

Conditional value-at-risk (CoVaR) is one of the most important measures of systemic risk. It is defined as a high quantile conditional on a related variable being extreme, and it is widely used in quantitative risk management. In this work, we develop a semi-parametric methodology to estimate CoVaR for asymptotically independent pairs within the framework of bivariate extreme value theory. We use parametric modelling of the bivariate extremal structure to address data sparsity in the joint tail regions and prove consistency and asymptotic normality of the proposed estimator. The robust performance of the estimator is illustrated via simulation studies. Its application to US stock returns data produces insightful dynamic CoVaR forecasts.

2602.10464 2026-02-12 math.ST stat.ML stat.TH

Do More Predictions Improve Statistical Inference? Filtered Prediction-Powered Inference

Shirong Xu, Will Wei Sun

详情
英文摘要

Recent advances in artificial intelligence have enabled the generation of large-scale, low-cost predictions with increasingly high fidelity. As a result, the primary challenge in statistical inference has shifted from data scarcity to data reliability. Prediction-powered inference methods seek to exploit such predictions to improve efficiency when labeled data are limited. However, existing approaches implicitly adopt a use-all philosophy, under which incorporating more predictions is presumed to improve inference. When prediction quality is heterogeneous, this assumption can fail, and indiscriminate use of unlabeled data may dilute informative signals and degrade inferential accuracy. In this paper, we propose Filtered Prediction-Powered Inference (FPPI), a framework that selectively incorporates predictions by identifying a data-adaptive filtered region in which predictions are informative for inference. We show that this region can be consistently estimated under a margin condition, achieving fast rates of convergence. By restricting the prediction-powered correction to the estimated filtered region, FPPI adaptively mitigates the impact of biased or noisy predictions. We establish that FPPI attains strictly improved asymptotic efficiency compared with existing prediction-powered inference methods. Numerical studies and a real-data application to large language model evaluation demonstrate that FPPI substantially reduces reliance on expensive labels by selectively leveraging reliable predictions, yielding accurate inference even in the presence of heterogeneous prediction quality.

2602.10348 2026-02-12 stat.ME

Optimizing precision in stepped-wedge designs via machine learning and quadratic inference functions

Liangbo Lyu, Bingkai Wang

详情
英文摘要

Stepped-wedge designs are increasingly used in randomized experiments to accommodate logistical and ethical constraints by staggering treatment roll-out over time. Despite their popularity, existing analytical methods largely rely on parametric models with linear covariate adjustment and prespecified correlation structures, which may limit achievable precision in practice. We propose a new class of estimators for the causal average treatment effect in stepped-wedge designs that optimizes precision through flexible, machine-learning-based covariate adjustment to capture complex outcome-covariate relationships, together with quadratic inference functions to adaptively learn the correlation structure. We establish consistency and asymptotic normality under mild conditions requiring only $L_2$ convergence of nuisance estimators, even under model misspecification, and characterize when the estimator attains the minimal asymptotic variance. Moreover, we prove that the proposed estimator never reduces efficiency relative to an independence working correlation. The proposed method further accommodates treatment-effect heterogeneity across both exposure duration and calendar time. Finally, we demonstrate our methods through simulation studies and reanalyses of two empirical studies that differ substantially in research area and key design parameters.

2602.10332 2026-02-12 stat.ME stat.ML

Generalized Prediction-Powered Inference, with Application to Binary Classifier Evaluation

Runjia Zou, Daniela Witten, Brian Williamson

详情
英文摘要

In the partially-observed outcome setting, a recent set of proposals known as "prediction-powered inference" (PPI) involve (i) applying a pre-trained machine learning model to predict the response, and then (ii) using these predictions to obtain an estimator of the parameter of interest with asymptotic variance no greater than that which would be obtained using only the labeled observations. While existing PPI proposals consider estimators arising from M-estimation, in this paper we generalize PPI to any regular asymptotically linear estimator. Furthermore, by situating PPI within the context of an existing rich literature on missing data and semi-parametric efficiency theory, we show that while PPI does not achieve the semi-parametric efficiency lower bound outside of very restrictive and unrealistic scenarios, it can be viewed as a computationally-simple alternative to proposals in that literature. We exploit connections to that literature to propose modified PPI estimators that can handle three distinct forms of covariate distribution shift. Finally, we illustrate these developments by constructing PPI estimators of true positive rate, false positive rate, and area under the curve via numerical studies.
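
Steps (i) and (ii) can be sketched for the simplest case, estimating a population mean. The code below uses the basic prediction-powered form: the average prediction on unlabeled data plus the average labeled residual as a bias correction. The function name `ppi_mean` and its arguments are assumptions made for illustration. It is not the generalized estimator developed in the paper and it omits variance estimation.

```python
def ppi_mean(y_labeled, preds_labeled, preds_unlabeled):
    """Basic prediction-powered estimate of a mean (illustrative sketch).

    y_labeled:       observed outcomes on the labeled sample
    preds_labeled:   model predictions on the same labeled sample
    preds_unlabeled: model predictions on the unlabeled sample
    """
    if len(y_labeled) != len(preds_labeled):
        raise ValueError("labeled outcomes and predictions must align")
    if not y_labeled or not preds_unlabeled:
        raise ValueError("both samples must be non-empty")
    # Average prediction on the large unlabeled sample.
    pred_mean = sum(preds_unlabeled) / len(preds_unlabeled)
    # Rectifier: average labeled residual, which corrects prediction bias.
    rectifier = sum(y - f for y, f in zip(y_labeled, preds_labeled)) / len(y_labeled)
    return pred_mean + rectifier


if __name__ == "__main__":
    print(ppi_mean([1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [2.0, 4.0]))
```

When the predictions are systematically too high, as in the usage example, the rectifier is negative and pulls the estimate back toward the labeled data.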

2602.10303 2026-02-12 cs.LG q-bio.QM stat.ML

ICODEN: Ordinary Differential Equation Neural Networks for Interval-Censored Data

Haoling Wang, Lang Zeng, Tao Sun, Youngjoo Cho, Ying Ding

详情
英文摘要

Predicting time-to-event outcomes when event times are interval censored is challenging because the exact event time is unobserved. Many existing survival analysis approaches for interval-censored data rely on strong model assumptions or cannot handle high-dimensional predictors. We develop ICODEN, an ordinary differential equation-based neural network for interval-censored data that models the hazard function through deep neural networks and obtains the cumulative hazard by solving an ordinary differential equation. ICODEN does not require the proportional hazards assumption or a prespecified parametric form for the hazard function, thereby permitting flexible survival modeling. Across simulation settings with proportional or non-proportional hazards and both linear and nonlinear covariate effects, ICODEN consistently achieves satisfactory predictive accuracy and remains stable as the number of predictors increases. Applications to data from multiple phases of the Alzheimer's Disease Neuroimaging Initiative (ADNI) and to two Age-Related Eye Disease Studies (AREDS and AREDS2) for age-related macular degeneration (AMD) demonstrate ICODEN's robust prediction performance. In both applications, predicting time-to-AD or time-to-late AMD, ICODEN effectively uses hundreds to more than 1,000 SNPs and supports data-driven subgroup identification with differential progression risk profiles. These results establish ICODEN as a practical assumption-lean tool for prediction with interval-censored survival data in high-dimensional biomedical settings.

2602.10274 2026-02-12 math.ST stat.TH

Asymptotic equivalence for nonparametric additive regression

Moritz Jirak, Alexander Meister, Angelika Rohde

详情
英文摘要

We prove asymptotic equivalence of nonparametric additive regression and an appropriate Gaussian white noise experiment in which a multidimensional shifted Wiener process is observed, whose dimension equals the number of additive components. The shift depends on the additive components of the regression function and solely the one- and two-dimensional marginal distributions of the covariates via an explicitly specified bounded but non-compact linear operator $Γ$. The number of additive components $d$ is allowed to increase moderately with respect to the sample size. In the special case of pairwise independent components of the covariates, the white noise model decomposes into $d$ independent univariate processes. Moreover, we study approximation in some semiparametric setting where $Γ$ splits into a multiplication operator and an asymptotically negligible Hilbert-Schmidt operator.

2602.10261 2026-02-12 cs.LG stat.AP stat.ML

Kernel-Based Learning of Chest X-ray Images for Predicting ICU Escalation among COVID-19 Patients

Qiyuan Shi, Jian Kang, Yi Li

详情
英文摘要

Kernel methods have been extensively utilized in machine learning for classification and prediction tasks due to their ability to capture complex non-linear data patterns. However, single kernel approaches are inherently limited, as they rely on a single type of kernel function (e.g., Gaussian kernel), which may be insufficient to fully represent the heterogeneity or multifaceted nature of real-world data. Multiple kernel learning (MKL) addresses these limitations by constructing composite kernels from simpler ones and integrating information from heterogeneous sources. Despite these advances, traditional MKL methods are primarily designed for continuous outcomes. We extend MKL to accommodate the outcome variable belonging to the exponential family, representing a broader variety of data types, and refer to our proposed method as generalized linear models with integrated multiple additive regression with kernels (GLIMARK). Empirically, we demonstrate that GLIMARK can effectively recover or approximate the true data-generating mechanism. We have applied it to a COVID-19 chest X-ray dataset, predicting binary outcomes of ICU escalation and extracting clinically meaningful features, underscoring the practical utility of this approach in real-world scenarios.

2602.10256 2026-02-12 math.ST stat.TH

Bernstein-von Mises theorem for log-concave posteriors

Victor-Emmanuel Brunel

详情
英文摘要

We prove new, general versions of the Bernstein-von Mises theorem for both well-specified and misspecified models when the log-likelihood is concave in the parameter and the prior distribution is log-concave. Unlike classical versions of the Bernstein-von Mises theorem, our versions do not require technical smoothness assumptions, and they rely solely on convex analysis.

2602.10241 2026-02-12 stat.ME cs.CY

Geographically Weighted Canonical Correlation Analysis: Local Spatial Associations Between Two Sets of Variables

Zhenzhi Jiao, Angela Yao, Ran Tao, Jean-Claude Thill

详情
英文摘要

This article critically assesses the utility of the classical statistical technique of Canonical Correlation Analysis (CCA) for studying spatial associations and proposes a new approach to enhance it. Unlike bivariate correlation analysis, which focuses on the relationship between two individual variables, CCA investigates associations between two sets of variables by identifying pairs of linear combinations that are maximally correlated. CCA has strong potential for uncovering complex multivariate relationships that vary across geographic space. We propose Geographically Weighted Canonical Correlation Analysis (GWCCA) as a new technique for exploring local spatial associations between two sets of variables. GWCCA localizes standard CCA by weighting each observation according to its spatial distance from a target location, thereby estimating location-specific canonical correlations. The effectiveness of GWCCA in recovering spatial structure and capturing spatial effects is evaluated using synthetic data. A case study of US county-level health outcomes and social determinants of health further demonstrates the empirical capabilities of the proposed method. The results indicate that GWCCA has broad potential applications in spatial data-intensive fields such as urban planning, environmental science, public health, and transportation, where understanding local multivariate spatial associations is critical.

2602.10182 2026-02-12 cs.LG stat.ML

Signature-Kernel Based Evaluation Metrics for Robust Probabilistic and Tail-Event Forecasting

Benjamin R. Redhead, Thomas L. Lee, Peng Gu, Víctor Elvira, Amos Storkey

Comments Main paper: 8 pages, 3 figures. Including appendix and references: 19 pages, 7 figures

详情
英文摘要

Probabilistic forecasting is increasingly critical across high-stakes domains, from finance and epidemiology to climate science. However, current evaluation frameworks lack a consensus metric and suffer from two critical flaws: they often assume independence across time steps or variables, and they demonstrably lack sensitivity to tail events, the very occurrences that are most pivotal in real-world decision-making. To address these limitations, we propose two kernel-based metrics: the signature maximum mean discrepancy (Sig-MMD) and our novel censored Sig-MMD (CSig-MMD). By leveraging the signature kernel, these metrics capture complex inter-variate and inter-temporal dependencies and remain robust to missing data. Furthermore, CSig-MMD introduces a censoring scheme that prioritizes a forecaster's capability to predict tail events while strictly maintaining properness, a vital property for a good scoring rule. These metrics enable a more reliable evaluation of direct multi-step forecasting, facilitating the development of more robust probabilistic algorithms.

2602.10176 2026-02-12 stat.ML cs.LG

Dissecting Performative Prediction: A Comprehensive Survey

Thomas Kehrenberg, Javier Sanguino, Jose A. Lozano, Novi Quadrianto

详情
英文摘要

The field of performative prediction had its beginnings in 2020 with the seminal paper "Performative Prediction" by Perdomo et al., which established a novel machine learning setup where the deployment of a predictive model causes a distribution shift in the environment, which in turn causes a mismatch between the distribution expected by the predictive model and the real distribution. This shift is defined by a so-called distribution map. In the half-decade since, a literature has emerged which has, among other things, introduced new solution concepts to the original setup, extended the setup, offered new theoretical analyses, and examined the intersection of performative prediction and other established fields. In this survey, we first lay out the performative prediction setting and explain the different optimization targets: performative stability and performative optimality. We introduce a new way of classifying different performative prediction settings, based on how much information is available about the distribution map. We survey existing implementations of distribution maps and existing methods to address the problem of performative prediction, while examining different ways to categorize them. Finally, we point out known and previously unknown connections that can be drawn to other fields, in the hopes of stimulating future research.

2602.10144 2026-02-12 stat.ML cs.AI cs.LG

When LLMs get significantly worse: A statistical approach to detect model degradations

Jonas Kübler, Kailash Budhathoki, Matthäus Kleindessner, Xiong Zhou, Junming Yin, Ashish Khetan, George Karypis

Comments https://openreview.net/forum?id=cM3gsqEI4K

Journal ref ICLR 2026

详情
英文摘要

Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods and others without accuracy guarantees like quantization. In all of these cases it is crucial to ensure that the model quality has not degraded. However, even at temperature zero, model generations are not necessarily robust even to theoretically lossless model optimizations due to numerical errors. We thus require statistical tools to decide whether a finite-sample accuracy deviation is evidence of a model's degradation or whether it can be attributed to (harmless) noise in the evaluation. We propose a statistically sound hypothesis testing framework based on McNemar's test that efficiently detects model degradations while guaranteeing a controlled rate of false positives. The crucial insight is that we have to compare the model scores on each sample rather than scores aggregated at the task level. Furthermore, we propose three approaches to aggregate accuracy estimates across multiple benchmarks into a single decision. We provide an implementation on top of the widely adopted open-source LM Evaluation Harness and provide a case study illustrating that the method correctly flags degraded models, while not flagging model optimizations that are provably lossless. We find that with our tests even empirical accuracy degradations of 0.3% can be confidently attributed to actual degradations rather than noise.
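
To make the per-sample comparison concrete, here is a minimal sketch of an exact two-sided McNemar test on paired correctness flags. Only the discordant pairs enter the test: samples that one model answers correctly and the other does not. The function name `mcnemar_exact` and the boolean-list inputs are assumptions made for illustration. The paper's own procedure, including any one-sided variant and the aggregation across benchmarks, may differ.

```python
from math import comb


def mcnemar_exact(base_correct, new_correct):
    """Exact two-sided McNemar test on paired per-sample correctness.

    base_correct, new_correct: equal-length lists of booleans, one entry
    per evaluation sample (hypothetical input layout).
    Returns (b, c, p_value), where b counts samples only the baseline got
    right and c counts samples only the new model got right.
    """
    if len(base_correct) != len(new_correct):
        raise ValueError("both models must be scored on the same samples")
    b = sum(1 for x, y in zip(base_correct, new_correct) if x and not y)
    c = sum(1 for x, y in zip(base_correct, new_correct) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence either way
    # Under H0 each discordant pair favors either model with probability 1/2,
    # so b follows Binomial(n, 1/2). Double the smaller tail, capped at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    base = [True] * 10 + [True] * 90
    new = [False] * 10 + [True] * 90
    print(mcnemar_exact(base, new))
```

In the usage example the two models differ on only 10 of 100 samples, but all 10 discordant pairs favor the baseline, which is what drives the small p-value.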

2602.09208 2026-02-12 stat.ME

Some Bayesian Perspectives on Clinical Trials

Alexandra Sokolova, Vadim Sokolov, Nick Polson

详情
英文摘要

We examine three landmark clinical trials -- ECMO, CALGB 49907, and I-SPY 2 -- through a unified Bayesian framework connecting prior specification, sequential adaptation, and decision-theoretic optimisation. For ECMO, the posterior probability of treatment superiority is robust across the range of priors examined. For CALGB, predictive probability monitoring stopped enrolment at 633 instead of 1800 patients. For I-SPY 2, adaptive enrichment graduated nine of 23 arms to Phase III. These case studies motivate a methodological contribution: exact backward induction for two-arm binary trials, where Beta-Binomial conjugacy yields closed-form transitions on the integer lattice of success counts with no quadrature. A Pólya-Gamma augmentation bridges this to covariate-adjusted logistic regression. Simulation reveals a fundamental tension: the optimal Bayesian design reduces expected sample sizes to 14--26 per arm (versus 42--100 for alternatives) but with substantially lower power. A calibrated variant embedding the declaration threshold in the terminal utility improves power while maintaining sample-size savings; varying the per-stage cost traces a power frontier for selecting the preferred operating point, with suitability highest in patient-sparing contexts such as rare diseases and paediatrics. The Pólya-Gamma Laplace approximation is validated against exact calculations (mean absolute error below 0.01). We discuss implications for the 2026 FDA draft guidance on Bayesian methodology.
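
The closed-form transitions mentioned in this abstract can be illustrated for a single arm. With a Beta(a, b) prior on the response rate and s successes and f failures observed so far, the posterior is Beta(a + s, b + f), and the predictive probability that the next patient responds is (a + s) / (a + b + s + f). The sketch below assumes a uniform Beta(1, 1) prior by default. The names `predictive_success` and `transitions` are hypothetical, and this is not the paper's two-arm backward induction.

```python
from fractions import Fraction


def predictive_success(successes, failures, a=1, b=1):
    """Posterior predictive probability that the next outcome is a success,
    under a Beta(a, b) prior (integer a, b assumed for exact arithmetic)."""
    return Fraction(a + successes, a + b + successes + failures)


def transitions(state, a=1, b=1):
    """One-step transition probabilities on the lattice of (successes, failures).

    Returns a dict mapping each successor state to its probability.
    """
    s, f = state
    p = predictive_success(s, f, a, b)
    return {(s + 1, f): p, (s, f + 1): 1 - p}


if __name__ == "__main__":
    print(predictive_success(3, 1))
    print(transitions((3, 1)))
```

Because every transition probability is a ratio of integers, a backward-induction table over this lattice can be filled in exactly, without numerical integration.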

2602.08215 2026-02-12 cs.LG stat.ME

Distribution-Free Robust Predict-Then-Optimize in Function Spaces

Yash Patel, Ambuj Tewari

详情
英文摘要

The need to rapidly solve PDEs in engineering design workflows has spurred the rise of neural surrogate models. In particular, neural operator models provide a discretization-invariant surrogate by retaining the infinite-dimensional, functional form of their arguments. Despite improved throughput, such methods lack guarantees on accuracy, unlike classical numerical PDE solvers. Optimizing engineering designs under these potentially miscalibrated surrogates thus runs the risk of producing designs that perform poorly upon deployment. In a similar vein, there is growing interest in automated decision-making under black-box predictors in the finite-dimensional setting, where a similar risk of suboptimality exists under poorly calibrated models. For this reason, methods have emerged that produce adversarially robust decisions under uncertainty estimates of the upstream model. One such framework leverages conformal prediction, a distribution-free post-hoc uncertainty quantification method, to provide these estimates due to its natural pairing with black-box predictors. We herein extend this line of conformally robust decision-making to infinite-dimensional function spaces. We first extend the typical conformal prediction guarantees over finite-dimensional spaces to infinite-dimensional Sobolev spaces. We then demonstrate how such uncertainty can be leveraged to robustly formulate engineering design tasks and characterize the suboptimality of the resulting robust optimal designs. We then empirically demonstrate the generality of our functional conformal coverage method across a diverse collection of PDEs, including the Poisson and heat equations, and showcase the significant improvement of such robust design in a quantum state discrimination task.
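
For readers unfamiliar with conformal prediction, the standard finite-dimensional split conformal recipe is sketched below. Nonconformity scores, for example absolute residuals, are computed on a held-out calibration set, and the ceil((n + 1)(1 - alpha))-th smallest score becomes the radius of the prediction region. The function name `conformal_quantile` is hypothetical. This is the scalar textbook version, not the Sobolev-space extension described in the abstract.

```python
import math


def conformal_quantile(scores, alpha):
    """Split conformal radius from calibration nonconformity scores.

    scores: nonconformity scores on a held-out calibration set
    alpha:  target miscoverage level, strictly between 0 and 1
    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when the calibration set is too small to certify the requested level.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


if __name__ == "__main__":
    calibration_scores = [0.3, 0.1, 0.7, 0.2, 0.5, 0.4, 0.6]
    radius = conformal_quantile(calibration_scores, alpha=0.25)
    # A prediction interval for a new input is then prediction +/- radius.
    print(radius)
```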

2602.07402 2026-02-12 quant-ph physics.hist-ph stat.AP

The ABL Rule and the Perils of Post-Selection

Jacob A. Barandes

Comments 28 pages, no figures

详情
英文摘要

In 1964, Aharonov, Bergmann, and Lebowitz introduced their well-known ABL rule with the intention of providing a time-symmetric formalism for computing novel kinds of conditional probabilities in quantum theory. Later papers attached additional significance to the ABL rule, including assertions that it supported violations of the uncertainty principle. The present work challenges these claims, as well as subsequent attempts to salvage the original interpretation of the ABL rule. Taking a broader view, this paper identifies a subtle category error at the heart of the ABL rule that consists of confusing observables that belong to a single system with emergent observables that arise only for physical ensembles. Along the way, this paper points out other problems and fallacious reasoning in the research literature surrounding the ABL rule, including the misuse of post-selection, a reliance on pattern matching to classical formulas, and a posture of measurementism that takes experimental data as providing answers to interpretational questions.

2602.03514 2026-02-12 cs.LG math.OC stat.ML

A Function-Space Stability Boundary for Generalization in Interpolating Learning Systems

Ronald Katende

详情
英文摘要

Modern learning systems often interpolate training data while still generalizing well, yet it remains unclear when algorithmic stability explains this behavior. We model training as a function-space trajectory and measure sensitivity to single-sample perturbations along this trajectory. We propose a contractive propagation condition and a stability certificate obtained by unrolling the resulting recursion. A small certificate implies stability-based generalization, while we also prove that there exist interpolating regimes with small risk where such contractive sensitivity cannot hold, showing that stability is not a universal explanation. Experiments confirm that certificate growth predicts generalization differences across optimizers, step sizes, and dataset perturbations. The framework therefore identifies regimes where stability explains generalization and where alternative mechanisms must account for success.

2601.21014 2026-02-12 stat.ML cs.LG stat.AP

Efficient Causal Structure Learning via Modular Subgraph Integration

Haixiang Sun, Pengchao Tian, Zihan Zhou, Jielei Zhang, Peiyi Li, Andrew L. Liu

详情
英文摘要

Learning causal structures from observational data remains a fundamental yet computationally intensive task, particularly in high-dimensional settings where existing methods face challenges such as the super-exponential growth of the search space and increasing computational demands. To address this, we introduce VISTA (Voting-based Integration of Subgraph Topologies for Acyclicity), a modular framework that decomposes the global causal structure learning problem into local subgraphs based on Markov Blankets. The global integration is achieved through a weighted voting mechanism that penalizes low-support edges via exponential decay, filters unreliable ones with an adaptive threshold, and ensures acyclicity using a Feedback Arc Set (FAS) algorithm. The framework is model-agnostic, imposing no assumptions on the inductive biases of base learners, is compatible with arbitrary data settings without requiring specific structural forms, and fully supports parallelization. We also theoretically establish finite-sample error bounds for VISTA, and prove its asymptotic consistency under mild conditions. Extensive experiments on both synthetic and real datasets consistently demonstrate the effectiveness of VISTA, yielding notable improvements in both accuracy and efficiency over a wide range of base learners.

2601.18626 2026-02-12 cs.LG cs.AI stat.ML

Rank-1 Approximation of Inverse Fisher for Natural Policy Gradients in Deep Reinforcement Learning

Yingxiao Huo, Satya Prakash Dash, Radu Stoican, Samuel Kaski, Mingfei Sun

详情
英文摘要

Natural gradients have long been studied in deep reinforcement learning due to their fast convergence properties and covariant weight updates. However, computing natural gradients requires inversion of the Fisher Information Matrix (FIM) at each iteration, which is computationally prohibitive. In this paper, we present an efficient and scalable natural policy optimization technique that leverages a rank-1 approximation to the full inverse-FIM. We theoretically show that under certain conditions, a rank-1 approximation to the inverse-FIM converges faster than policy gradients and, under some conditions, enjoys the same sample complexity as stochastic policy gradient methods. We benchmark our method on a diverse set of environments and show that it achieves superior performance to standard actor-critic and trust-region baselines.

2601.16865 2026-02-12 econ.EM math.ST stat.AP stat.TH

Distributional Instruments: Identification and Estimation with Quantile Least Squares

Rowan Cherodian, Guy Tchuente

详情
英文摘要

We study instrumental-variable designs where policy reforms strongly shift the distribution of an endogenous variable but only weakly move its mean. We formalize this by introducing distributional relevance: instruments may be purely distributional. Within a triangular model, distributional relevance suffices for nonparametric identification of average structural effects via a control function. We then propose Quantile Least Squares (Q-LS), which aggregates conditional quantiles of X given Z into an optimal mean-square predictor and uses this projection as an instrument in a linear IV estimator. We establish consistency, asymptotic normality, and the validity of standard 2SLS variance formulas, and we discuss regularization across quantiles. Monte Carlo designs show that Q-LS delivers well-centered estimates and near-correct size when mean-based 2SLS suffers from weak instruments. In Health and Retirement Study data, Q-LS exploits Medicare Part D-induced distributional shifts in out-of-pocket risk to sharpen estimates of its effects on depression.

2512.25025 2026-02-12 stat.ME econ.EM math.ST stat.TH

Modewise Additive Factor Model for Matrix Time Series

Elynn Chen, Yuefeng Han, Jiayu Li, Ke Xu

详情
英文摘要

We introduce a Modewise Additive Factor Model (MAFM) for matrix-valued time series that captures row-specific and column-specific latent effects through an additive structure, offering greater flexibility than multiplicative frameworks such as Tucker and CP factor models. In MAFM, each observation decomposes into a row-factor component, a column-factor component, and noise, allowing distinct sources of variation along different modes to be modeled separately. We develop a computationally efficient two-stage estimation procedure: Modewise Inner-product Eigendecomposition (MINE) for initialization, followed by Complement-Projected Alternating Subspace Estimation (COMPAS) for iterative refinement. The key methodological innovation is that orthogonal complement projections completely eliminate cross-modal interference when estimating each loading space. We establish convergence rates for the estimated factor loading matrices under proper conditions. We further derive asymptotic distributions for the loading matrix estimators and develop consistent covariance estimators, yielding a data-driven inference framework that enables confidence interval construction and hypothesis testing. As a technical contribution of independent interest, we establish matrix Bernstein inequalities for quadratic forms of dependent matrix time series. Numerical experiments on synthetic and real data demonstrate the advantages of the proposed method over existing approaches.

2512.19338 2026-02-12 math.ST stat.AP stat.ME stat.TH

A hybrid-Hill estimator enabled by heavy-tailed block maxima

Claudia Neves, Chang Xu

Comments 32 pages, 5 figures

详情
英文摘要

When analysing extreme values, two alternative statistical approaches have historically been held in contention: the block maxima method (or annual maxima method, spurred by hydrological applications) and the peaks-over-threshold. Clamoured amongst statisticians as wasteful of potentially informative data, the block maxima method gradually fell into disfavour whilst peaks-over-threshold-based methodologies climbed to the centre stage of extreme value statistics. This paper devises a hybrid method which reconciles these two hitherto disconnected approaches. Appealing in its simplicity, our main result introduces a new universality class of extreme value distributions that discards the customary requirement of a sufficiently large block size for the plausible block maxima-fit to an extreme value distribution. Natural extensions to dependent and/or non-stationary settings are mapped out. We advocate that inference should be drawn solely on larger block maxima, from which practice the mainstream peaks-over-threshold methodology coalesces: the asymptotic properties of the hybrid-Hill estimator herald more than its efficiency, but rather that a fully-fledged unified semi-parametric stream of statistics for extreme values is viable. A reduced-bias off-shoot of the hybrid-Hill estimator provably outclasses the incumbent maximum likelihood estimation that relies on a numerical fit to the entire sample of block maxima.

2511.21516 2026-02-12 math.ST stat.TH

Causal Inference: A Tale of Three Frameworks

Linbo Wang, Thomas Richardson, James Robins

详情
英文摘要

Causal inference is a central goal across many scientific disciplines. Over the past several decades, three major frameworks have emerged to formalize causal questions and guide their analysis: the potential outcomes framework, structural equation models, and directed acyclic graphs. Although these frameworks differ in language, assumptions, and philosophical orientation, they often lead to compatible or complementary insights. This paper provides a comparative introduction to the three frameworks, clarifying their connections, highlighting their distinct strengths and limitations, and illustrating how they can be used together in practice. The discussion is aimed at researchers and graduate students with some background in statistics or causal inference who are seeking a conceptual foundation for applying causal methods across a range of substantive domains.

2511.01103 2026-02-12 math.ST stat.TH

Nonparametric Least Squares Estimators for Interval Censoring

Piet Groeneboom

Comments 26 pages, 8 figures

详情
英文摘要

The limit distribution of the nonparametric maximum likelihood estimator for interval censored data with more than one observation time per unobservable observation is still unknown in general. For the so-called separated case, where one has observation times which are at a distance larger than a fixed positive epsilon, the limit distribution was derived in [5]. For the non-separated case there is a conjectured limit distribution, given in [10], Section 5.2 of Part 2. Whether this conjecture holds is still unknown, but the present paper shows that for sample sizes 1000 and 10,000 this limit behavior is still not clearly seen. We prove consistency of a related nonparametric isotonic least squares estimator and sketch the proof of its limit distribution. We also provide simulation results to show how the nonparametric MLE and least squares estimator behave in comparison. Moreover, we discuss a simpler least squares estimator that can be computed in one step, but is inferior to the other least squares estimator, since it does not use all information. For the simplest model of interval censoring, the current status model, the nonparametric maximum likelihood and least squares estimators are the same. This equivalence breaks down if there are more observation times per unobservable observation. The computations for the simulation of the more complicated interval censoring model were performed by using the iterative convex minorant algorithm. They are provided in the GitHub repository [7].

2510.08554 2026-02-12 cs.LG stat.ML

Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization

Kevin Rojas, Jiahe Lin, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Molei Tao, Wei Deng

English abstract

Diffusion language models (DLMs) enable parallel, order-agnostic generation with iterative refinement, offering a flexible alternative to autoregressive large language models (LLMs). However, adapting reinforcement learning (RL) fine-tuning to DLMs remains an open challenge because of the intractable likelihood. Pioneering work such as diffu-GRPO estimated token-level likelihoods via one-step unmasking. While computationally efficient, this approach is severely biased. A more principled foundation lies in sequence-level likelihoods, where the evidence lower bound (ELBO) serves as a surrogate. Yet, despite this clean mathematical connection, ELBO-based methods have seen limited adoption due to the prohibitive cost of likelihood evaluation. In this work, we revisit ELBO estimation and disentangle its sources of variance. This decomposition motivates reducing variance through fast, deterministic integral approximations along a few pivotal dimensions. Building on this insight, we introduce Group Diffusion Policy Optimization (GDPO), a new RL algorithm tailored for DLMs. GDPO leverages simple yet effective Semi-deterministic Monte Carlo schemes to mitigate the variance explosion of ELBO estimators under vanilla double Monte Carlo sampling, yielding a provably lower-variance estimator under tight evaluation budgets. Empirically, GDPO achieves consistent gains over pretrained checkpoints and outperforms diffu-GRPO, one of the state-of-the-art baselines, on the majority of math, reasoning, and coding benchmarks.

2510.00309 2026-02-12 cs.LG stat.ML

Lipschitz Bandits with Stochastic Delayed Feedback

Zhongxuan Liu, Yue Kang, Thomas C. M. Lee

Comments The Fourteenth International Conference on Learning Representations (ICLR 2026)

English abstract

The Lipschitz bandit problem extends stochastic bandits to a continuous action set defined over a metric space, where the expected reward function satisfies a Lipschitz condition. In this work, we introduce a new problem of Lipschitz bandit in the presence of stochastic delayed feedback, where the rewards are not observed immediately but after a random delay. We consider both bounded and unbounded stochastic delays, and design algorithms that attain sublinear regret guarantees in each setting. For bounded delays, we propose a delay-aware zooming algorithm that retains the optimal performance of the delay-free setting up to an additional term that scales with the maximal delay $\tau_{\max}$. For unbounded delays, we propose a novel phased learning strategy that accumulates reliable feedback over carefully scheduled intervals, and establish a regret lower bound showing that our method is nearly optimal up to logarithmic factors. Finally, we present experimental results to demonstrate the efficiency of our algorithms under various delay scenarios.

2506.11214 2026-02-12 math.OC cs.AI cs.CC cs.LG stat.ML

Complexity of normalized stochastic first-order methods with momentum under heavy-tailed noise

Chuan He, Zhaosong Lu, Defeng Sun, Zhanwang Deng

English abstract

In this paper, we propose practical normalized stochastic first-order methods with Polyak momentum, multi-extrapolated momentum, and recursive momentum for solving unconstrained optimization problems. These methods employ dynamically updated algorithmic parameters and do not require explicit knowledge of problem-dependent quantities such as the Lipschitz constant or noise bound. We establish first-order oracle complexity results for finding approximate stochastic stationary points under heavy-tailed noise and weakly average smoothness conditions -- both of which are weaker than the commonly used bounded variance and mean-squared smoothness assumptions. Our complexity bounds either improve upon or match the best-known results in the literature. Numerical experiments are presented to demonstrate the practical effectiveness of the proposed methods.
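
To make the idea of a normalized first-order step with Polyak-style momentum concrete, here is a generic sketch. It is not the paper's method: the abstract describes dynamically updated parameters whose schedules are not given here, so the step size `eta0 / sqrt(t + 1)`, the momentum weight `beta`, and the function name `normalized_momentum_sgd` are all assumptions chosen for illustration. The point it shows is that normalizing the momentum vector bounds the length of every step, so a single heavy-tailed gradient sample cannot produce an arbitrarily large move.

```python
import numpy as np


def normalized_momentum_sgd(grad_fn, x0, n_steps=2000, eta0=0.5, beta=0.9):
    """Generic normalized SGD with an exponential-average momentum buffer.

    grad_fn(x) returns a (possibly noisy) gradient estimate at x.
    The schedule eta0 / sqrt(t + 1) is an illustrative assumption.
    """
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for t in range(n_steps):
        g = grad_fn(x)
        m = beta * m + (1.0 - beta) * g           # momentum buffer
        step = eta0 / np.sqrt(t + 1.0)            # decaying step size
        x = x - step * m / (np.linalg.norm(m) + 1e-12)  # step length <= step
    return x


# Hypothetical demo: quadratic objective 0.5 * ||x||^2 with Student-t noise
# (1.5 degrees of freedom, so the noise has infinite variance).
rng = np.random.default_rng(0)


def noisy_grad(x):
    return x + rng.standard_t(df=1.5, size=x.shape)


x_noisy = normalized_momentum_sgd(noisy_grad, [5.0, -3.0])
```

Because only the direction of the momentum buffer is used, the update never needs a Lipschitz constant or a noise bound to stay bounded, which is the property the abstract emphasizes.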

2506.10569 2026-02-12 stat.AP

A composition of simplified physics-based model with neural operator for trajectory-level seismic response predictions of structural systems

Jungho Kim, Sang-ri Yi, Ziqi Wang

Journal ref Structural Safety, Vol(119), 102668, 2026

English abstract

Accurate prediction of nonlinear structural responses is essential for earthquake risk assessment and management. While high-fidelity nonlinear time history analysis provides the most comprehensive and accurate representation of the responses, it becomes computationally prohibitive for complex structural system models and repeated simulations under varying ground motions. To address this challenge, we propose a composite learning framework that integrates simplified physics-based models with a Fourier neural operator to enable efficient and accurate trajectory-level seismic response prediction. In the proposed architecture, a simplified physics-based model, obtained from techniques such as linearization, modal reduction, or solver relaxation, serves as a preprocessing operator to generate structural response trajectories that capture coarse dynamic characteristics. A neural operator is then trained to correct the discrepancy between these initial approximations and the true nonlinear responses, allowing the composite model to capture hysteretic and path-dependent behaviors. Additionally, a linear regression-based postprocessing scheme is introduced to further refine predictions and quantify associated uncertainty with negligible additional computational effort. The proposed approach is validated on three representative structural systems subjected to synthetic or recorded ground motions. Results show that the proposed approach consistently improves prediction accuracy over baseline models, particularly in data-scarce regimes. These findings demonstrate the potential of physics-guided operator learning for reliable and data-efficient modeling of nonlinear structural seismic responses.

2505.23599 2026-02-12 cs.LG math.RT math.ST stat.ML stat.TH

On Transferring Transferability: Towards a Theory for Size Generalization

Eitan Levin, Yuxin Ma, Mateo Díaz, Soledad Villar

Comments 75 pages, 10 figures, closest to version to be published in NeurIPS

English abstract

Many modern learning tasks require models that can take inputs of varying sizes. Consequently, dimension-independent architectures have been proposed for domains where the inputs are graphs, sets, and point clouds. Recent work on graph neural networks has explored whether a model trained on low-dimensional data can transfer its performance to higher-dimensional inputs. We extend this body of work by introducing a general framework for transferability across dimensions. We show that transferability corresponds precisely to continuity in a limit space formed by identifying small problem instances with equivalent large ones. This identification is driven by the data and the learning task. We instantiate our framework on existing architectures, and implement the necessary changes to ensure their transferability. Finally, we provide design principles for designing new transferable models. Numerical experiments support our findings.

2505.16204 2026-02-12 cs.LG math.ST stat.ML stat.TH

Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks

Ichiro Hashimoto

Comments 41 pages, Accepted at International Conference on Learning Representations 2026 (ICLR 2026)

English abstract

In this paper, we provide sufficient conditions for benign overfitting of fixed-width leaky ReLU two-layer neural network classifiers trained on mixture data via gradient descent. Our results are derived by establishing directional convergence of the network parameters and a classification error bound for the convergent direction. Our classification error bound also leads to the discovery of a new phase transition. Previously, directional convergence in (leaky) ReLU neural networks was established only for gradient flow. Due to the lack of directional convergence, previous results on benign overfitting were limited to those trained on nearly orthogonal data. All of our results hold on mixture data, which is a broader data setting than the nearly orthogonal data setting in prior work. We demonstrate our findings by showing that benign overfitting occurs with high probability in a much wider range of scenarios than previously known. Our results also allow us to characterize cases when benign overfitting provably fails even if directional convergence occurs. Our work thus provides a more complete picture of benign overfitting in leaky ReLU two-layer neural networks.

2504.08263 2026-02-12 stat.ME stat.OT

A roadmap for systematic identification and analysis of multiple biases in causal inference

Rushani Wijesuriya, Rachael A. Hughes, John B. Carlin, Rachel L. Peters, Jennifer J. Koplin, Margarita Moreno-Betancur

Comments 12 Pages, 4 Figures

English abstract

Observational studies examining causal effects rely on unverifiable assumptions, the violation of which can induce multiple biases. Quantitative bias analysis (QBA) methods examine the sensitivity of findings to such violations, generally, by producing estimates under alternative assumptions, incorporating external information. Although substantial guidance exists for implementing QBA, there is limited guidance on how to systematically determine the assumptions underlying a primary causal analysis and the potential violations that should guide bias analysis. Consequently, many assumptions remain implicit, leading to selective and therefore misleading QBA. To address this gap, we propose a roadmap for systematically identifying and analysing multiple biases. Briefly, this consists of (1) articulating the assumptions underlying the primary analysis through specification and emulation of the ideal trial that defines the causal estimand and depicting these assumptions using a causal diagram; (2) extending the diagram to depict alternative assumptions under which biases may arise; (3) obtaining a single estimate that simultaneously corrects for all potential biases. We illustrate the roadmap using an investigation of the effect of breastfeeding on risk of childhood asthma, and through simulations illustrate the need for analysing multiple biases jointly rather than one at a time.

2504.05661 2026-02-12 math.ST stat.TH

Online Bernstein-von Mises theorem

Jeyong Lee, Junhyeok Choi, Minwoo Chae

Comments 124 pages, 1 figure (Accepted to the Journal of Machine Learning Research)

English abstract

Online learning is an inferential paradigm in which parameters are updated incrementally from sequentially available data, in contrast to batch learning, where the entire dataset is processed at once. In this paper, we assume that mini-batches from the full dataset become available sequentially. The Bayesian framework, which updates beliefs about unknown parameters after observing each mini-batch, is naturally suited for online learning. At each step, we update the posterior distribution using the current prior and new observations, with the updated posterior serving as the prior for the next step. However, this recursive Bayesian updating is rarely computationally tractable unless the model and prior are conjugate. When the model is regular, the updated posterior can be approximated by a normal distribution, as justified by the Bernstein-von Mises theorem. We adopt a variational approximation at each step and investigate the frequentist properties of the final posterior obtained through this sequential procedure. Under mild assumptions, we show that the accumulated approximation error becomes negligible once the mini-batch size exceeds a threshold depending on the parameter dimension. As a result, the sequentially updated posterior is asymptotically indistinguishable from the full posterior.
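
The recursive update described above, where each posterior becomes the prior for the next mini-batch, is exact when the model and prior are conjugate. The sketch below illustrates this in the simplest such case, a normal mean with known noise variance, and compares the sequential posterior with the one computed from the full dataset at once. This conjugate toy model, the prior values, and the function names are assumptions for illustration only; it is not the variational scheme studied in the paper.

```python
import numpy as np


def update_normal_mean(prior_mean, prior_var, batch, noise_var):
    """Conjugate update for the mean of N(theta, noise_var) observations."""
    n = len(batch)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + np.sum(batch) / noise_var)
    return post_mean, post_var


def sequential_posterior(batches, prior_mean=0.0, prior_var=10.0, noise_var=1.0):
    """Process mini-batches one at a time; each posterior is the next prior."""
    mean, var = prior_mean, prior_var
    for batch in batches:
        mean, var = update_normal_mean(mean, var, batch, noise_var)
    return mean, var


# Hypothetical data: 600 draws split into 6 mini-batches.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=600)
batches = np.array_split(data, 6)

seq_mean, seq_var = sequential_posterior(batches)
full_mean, full_var = update_normal_mean(0.0, 10.0, data, 1.0)
```

In this conjugate case the sequential and full posteriors agree up to floating-point error. Outside conjugate models each step needs an approximation, and the abstract's result concerns when the accumulated approximation error is negligible.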

2503.02437 2026-02-12 stat.ML cs.LG

Decentralized Reinforcement Learning for Multi-Agent Multi-Resource Allocation via Dynamic Cluster Agreements

Antonio Marino, Esteban Restrepo, Claudio Pacchierotti, Paolo Robuffo Giordano

Journal ref IEEE Robotics and Automation Letters, 2025, 10 (8), pp.8123-8130

English abstract

This paper addresses the challenge of allocating heterogeneous resources among multiple agents in a decentralized manner. Our proposed method, Liquid-Graph-Time Clustering-IPPO (LGTC-IPPO), builds upon Independent Proximal Policy Optimization (IPPO) by integrating dynamic cluster consensus, a mechanism that allows agents to form and adapt local sub-teams based on resource demands. This decentralized coordination strategy reduces reliance on global information and enhances scalability. We evaluate LGTC-IPPO against standard multi-agent reinforcement learning baselines and a centralized expert solution across a range of team sizes and resource distributions. Experimental results demonstrate that LGTC-IPPO achieves more stable rewards, better coordination, and robust performance even as the number of agents or resource types increases. Additionally, we illustrate how dynamic clustering enables agents to reallocate resources efficiently even in scenarios with discharging resources.

2503.01882 2026-02-12 cs.LG physics.geo-ph stat.AP stat.ML

Constructing balanced datasets for predicting failure modes in structural systems under seismic hazards

Jungho Kim, Taeyong Kim

Journal ref Engineering Structures, Vol(346), 121637, 2026

English abstract

Accurate prediction of structural failure modes under seismic excitations is essential for seismic risk and resilience assessment. Traditional simulation-based approaches often result in imbalanced datasets dominated by non-failure or frequently observed failure scenarios, limiting the effectiveness in machine learning-based prediction. To address this challenge, this study proposes a framework for constructing balanced datasets that include distinct failure modes. The framework consists of three key steps. First, critical ground motion features (GMFs) are identified to effectively represent ground motion time histories. Second, an adaptive algorithm is employed to estimate the probability densities of various failure domains in the space of critical GMFs and structural parameters. Third, samples generated from these probability densities are transformed into ground motion time histories by using a scaling factor optimization process. A balanced dataset is constructed by performing nonlinear response history analyses on structural systems with parameters matching the generated samples, subjected to corresponding transformed ground motion time histories. Deep neural network models are trained on balanced and imbalanced datasets to highlight the importance of dataset balancing. To further evaluate the framework's applicability, numerical investigations are conducted using two different structural models subjected to recorded and synthetic ground motions. The results demonstrate the framework's robustness and effectiveness in addressing dataset imbalance and improving machine learning performance in seismic failure mode prediction.

2502.14121 2026-02-12 stat.ML cs.AI cs.LG

Multi-Objective Bayesian Optimization for Networked Black-Box Systems: A Path to Greener Profits and Smarter Designs

Akshay Kudva, Wei-Ting Tang, Joel A. Paulson

English abstract

Designing modern industrial systems requires balancing several competing objectives, such as profitability, resilience, and sustainability, while accounting for complex interactions between technological, economic, and environmental factors. Multi-objective optimization (MOO) methods are commonly used to navigate these tradeoffs, but selecting the appropriate algorithm to tackle these problems is often unclear, particularly when system representations vary from fully equation-based (white-box) to entirely data-driven (black-box) models. While grey-box MOO methods attempt to bridge this gap, they typically impose rigid assumptions on system structure, requiring models to conform to the underlying structural assumptions of the solver rather than the solver adapting to the natural representation of the system of interest. In this chapter, we introduce a unifying approach to grey-box MOO by leveraging network representations, which provide a general and flexible framework for modeling interconnected systems as a series of function nodes that share various inputs and outputs. Specifically, we propose MOBONS, a novel Bayesian optimization-inspired algorithm that can efficiently optimize general function networks, including those with cyclic dependencies, enabling the modeling of feedback loops, recycle streams, and multi-scale simulations - features that existing methods fail to capture. Furthermore, MOBONS incorporates constraints, supports parallel evaluations, and preserves the sample efficiency of Bayesian optimization while leveraging network structure for improved scalability. We demonstrate the effectiveness of MOBONS through two case studies, including one related to sustainable process design. By enabling efficient MOO under general graph representations, MOBONS has the potential to significantly enhance the design of more profitable, resilient, and sustainable engineering systems.

2502.03366 2026-02-12 cs.LG stat.ML

Rethinking Approximate Gaussian Inference in Classification

Bálint Mucsányi, Nathaël Da Costa, Philipp Hennig

Comments 46 pages

English abstract

In classification tasks, softmax functions are ubiquitously used as output activations to produce predictive probabilities. Such outputs only capture aleatoric uncertainty. To capture epistemic uncertainty, approximate Gaussian inference methods have been proposed. We develop a common formalism to describe such methods, which we view as outputting Gaussian distributions over the logit space. Predictives are then obtained as the expectations of the Gaussian distributions pushed forward through the softmax. However, such softmax Gaussian integrals cannot be solved analytically, and Monte Carlo (MC) approximations can be costly and noisy. We propose to replace the softmax activation by element-wise normCDF or sigmoid, which allows for the accurate sampling-free approximation of predictives. This also enables the approximation of the Gaussian pushforwards by Dirichlet distributions with moment matching. This approach entirely eliminates the runtime and memory overhead associated with MC sampling. We evaluate it combined with several approximate Gaussian inference methods (Laplace, HET, SNGP) on large- and small-scale datasets (ImageNet, CIFAR-100, CIFAR-10), demonstrating improved uncertainty quantification capabilities compared to softmax MC sampling. Our code is available at https://github.com/bmucsanyi/probit.
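
One reason an element-wise normCDF activation permits sampling-free predictives is a standard identity: if a logit is Gaussian, z ~ N(mu, var), then E[Phi(z)] = Phi(mu / sqrt(1 + var)), where Phi is the standard normal CDF. The sketch below checks this identity against a Monte Carlo estimate. It only illustrates the per-logit expectation; how the paper turns these values into class probabilities is not shown here, and the function names and example numbers are assumptions, not the API of the linked repository.

```python
import math

import numpy as np


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_expectation(mu, var):
    """Closed-form E[Phi(z)] for independent z_k ~ N(mu_k, var_k)."""
    return np.array([norm_cdf(m / math.sqrt(1.0 + v)) for m, v in zip(mu, var)])


def probit_expectation_mc(mu, var, n_samples=50_000, seed=0):
    """Monte Carlo estimate of the same expectation, for comparison."""
    rng = np.random.default_rng(seed)
    z = rng.normal(mu, np.sqrt(var), size=(n_samples, len(mu)))
    phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return phi.mean(axis=0)


# Hypothetical logit means and variances for three classes.
mu = [1.0, -0.5, 0.2]
var = [0.5, 2.0, 1.0]

closed_form = probit_expectation(mu, var)
monte_carlo = probit_expectation_mc(mu, var)
```

The closed form costs one CDF evaluation per logit, whereas the Monte Carlo estimate needs many samples and still carries sampling noise.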

2502.03174 2026-02-12 math.ST stat.ML stat.TH

Robust Label Shift Quantification

Alexandre Lecestre

Comments Revisions were made, including a change of title. This version also contains new results in the calibration section

English abstract

In this paper, we investigate the label shift quantification problem. We propose robust estimators of the label distribution which turn out to coincide with the Maximum Likelihood Estimator. We analyze the theoretical aspects and derive deviation bounds for the proposed method, providing optimal guarantees in the well-specified case, along with notable robustness properties against outliers and contamination. Our results provide theoretical validation for empirical observations on the robustness of Maximum Likelihood Label Shift.

2501.01783 2026-02-12 math.ST stat.ML stat.TH

Nonparametric estimation of a factorizable density using diffusion models

Hyeok Kyu Kwon, Dongha Kim, Ilsang Ohn, Minwoo Chae

Comments Accepted for publication in the Journal of Machine Learning Research (JMLR)

English abstract

In recent years, diffusion models, and more generally score-based deep generative models, have achieved remarkable success in various applications, including image and audio generation. In this paper, we view diffusion models as an implicit approach to nonparametric density estimation and study them within a statistical framework to analyze their surprising performance. A key challenge in high-dimensional statistical inference is leveraging low-dimensional structures inherent in the data to mitigate the curse of dimensionality. We assume that the underlying density exhibits a low-dimensional structure by factorizing into low-dimensional components, a property common in examples such as Bayesian networks and Markov random fields. Under suitable assumptions, we demonstrate that an implicit density estimator constructed from diffusion models adapts to the factorization structure and achieves the minimax optimal rate with respect to the total variation distance. In constructing the estimator, we design a sparse weight-sharing neural network architecture, where sparsity and weight-sharing are key features of practical architectures such as convolutional neural networks and recurrent neural networks.

2412.20481 2026-02-12 math.OC stat.CO

EM algorithms for optimization problems with polynomial objectives

Kensuke Asai, Jun-ya Gotoh

English abstract

The EM (Expectation-Maximization) algorithm is regarded as an MM (Majorization-Minimization) algorithm for maximum likelihood estimation of statistical models. Expanding this view, this paper demonstrates that by choosing an appropriate probability distribution, even a nonstatistical optimization problem can be cast as a negative log-likelihood-like minimization problem, which can be approached by an EM (or MM) algorithm. When a polynomial objective is optimized over a simple polyhedral feasible set and an exponential family distribution is employed, the EM algorithm can be reduced to a natural gradient descent of the employed distribution with a constant step size. This is demonstrated through three examples. In this paper, we demonstrate the global convergence of specific cases with some exponential family distributions in a general form. In instances when the feasible set is not sufficiently simple, the use of MM algorithms can nevertheless be adequately described. When the objective is to minimize a convex quadratic function and the constraints are polyhedral, global convergence can also be established based on the existing results for an entropy-like proximal point algorithm.

2412.17070 2026-02-12 math.PR math.OC stat.ML

Decoupled Functional Central Limit Theorems for Two-Time-Scale Stochastic Approximation

Yuze Han, Xiang Li, Jiadong Liang, Zhihua Zhang

English abstract

In two-time-scale stochastic approximation (SA), two iterates are updated at different rates, governed by distinct step sizes, with each update influencing the other. Previous studies have demonstrated that the convergence rates of the error terms for these updates depend solely on their respective step sizes, a property known as decoupled convergence. However, a functional version of this decoupled convergence has not been explored. Our work fills this gap by establishing decoupled functional central limit theorems for two-time-scale SA, offering a more precise characterization of its asymptotic behavior. Our results show that, on each time scale, the limiting dynamics has the same form as in standard SA, and the coupling between the two iterates enters the limit only through the associated coefficients. To achieve these results, we leverage the martingale problem approach and establish tightness as a crucial intermediate step. Furthermore, to address the interdependence between different time scales, we introduce an innovative auxiliary sequence to eliminate the primary influence of the fast-time-scale update on the slow-time-scale update.

2412.00228 2026-02-12 stat.ME

A Doubly Robust Framework for Addressing Outcome-Dependent Selection Bias in Multi-Cohort EHR Studies

Ritoban Kundu, Xu Shi, Michael Kleinsasser, Lars G. Fritsche, Maxwell Salvatore, Bhramar Mukherjee

English abstract

Selection bias can hinder accurate estimation of association parameters in binary disease risk models using non-probability samples like electronic health records (EHRs). The issue is compounded when participants are recruited from multiple clinics/centers with varying selection mechanisms that may depend on the disease/outcome of interest. Traditional inverse-probability-weighted (IPW) methods, based on constructed parametric selection models, often struggle with misspecifications when selection mechanisms vary across cohorts. This paper introduces a new Joint Augmented Inverse Probability Weighted (JAIPW) method, which integrates individual-level data from multiple cohorts collected under potentially outcome-dependent selection mechanisms, with data from an external probability sample. JAIPW offers double robustness by incorporating a flexible auxiliary score model to address potential misspecifications in the selection models. We outline the asymptotic properties of the JAIPW estimator, and our simulations reveal that JAIPW achieves up to six times lower relative bias and five times lower root mean square error (RMSE) compared to the best performing joint IPW methods under scenarios with misspecified selection models. Applying JAIPW to the Michigan Genomics Initiative (MGI), a multi-clinic EHR-linked biobank, combined with external national probability samples, resulted in cancer-sex association estimates closely aligned with national benchmark estimates. We also analyzed the association between cancer and polygenic risk scores (PRS) in MGI to illustrate a situation where the exposure variable is not measured in the external probability sample.

2410.06125 2026-02-12 stat.ME stat.AP

Simultaneous Graphical Dynamic Modeling

Mike West, Luke Vrotsos

Comments 34 pages and 13 figures

English abstract

We review theory and methodology of the class of simultaneous graphical dynamic linear models (SGDLMs) that provide flexibility, parsimony and scalability of multivariate time series analysis. Discussion includes core theoretical aspects and summaries of existing Bayesian methodology for forward filtering and forecasting with SGDLMs. The review is complemented by new theory linking dynamic graphical and factor models, and extensions of the Bayesian methodology. This addresses graphical structure uncertainty via model marginal likelihood evaluation, and analysis with missing data relevant to counterfactual analysis. The latter advances the ability to scale causal analysis to higher-dimensional time series. Aspects of the theory and methodology are exemplified in a global macroeconomic time series study with time-varying cross-series relationships and primary interests in potential causal effects. The example highlights the utility of SGDLMs with insights generated by the theoretical structure of these models, and benefits of fully Bayesian assessment of post-intervention outcomes in causal time series studies as in prediction more generally.

2406.01552 2026-02-12 stat.ML cs.AI cs.LG

Tensor learning with orthogonal, Lorentz, and symplectic symmetries

Wilson G. Gregory, Josué Tonelli-Cueto, Nicholas F. Marshall, Andrew S. Lee, Soledad Villar

Comments 40 pages, 1 figure. To appear at ICLR 2026

English abstract

Tensors are a fundamental data structure for many scientific contexts, such as time series analysis, materials science, and physics, among many others. Improving our ability to produce and handle tensors is essential to efficiently address problems in these domains. In this paper, we show how to exploit the underlying symmetries of functions that map tensors to tensors. More concretely, we develop universally expressive equivariant machine learning architectures on tensors that exploit that, in many cases, these tensor functions are equivariant with respect to the diagonal action of the orthogonal, Lorentz, and/or symplectic groups. We showcase our results on three problems coming from material science, theoretical computer science, and time series analysis. For time series, we combine our method with the increasingly popular path signatures approach, which is also invariant with respect to reparameterizations. Our numerical experiments show that our equivariant models perform better than corresponding non-equivariant baselines.

2404.07593 2026-02-12 stat.ML cs.LG stat.ME

Diffusion posterior sampling for simulation-based inference in tall data settings

Julia Linhart, Gabriel Victorino Cardoso, Alexandre Gramfort, Sylvain Le Corff, Pedro L. C. Rodrigues

Comments 49 pages, 24 figures, 3 tables, 2 algorithms, 12 appendices, TMLR acceptance

English abstract

Identifying the parameters of a non-linear model that best explain observed data is a core task across scientific fields. When such models rely on complex simulators, evaluating the likelihood is typically intractable, making traditional inference methods such as MCMC inapplicable. Simulation-based inference (SBI) addresses this by training deep generative models to approximate the posterior distribution over parameters using simulated data. In this work, we consider the tall data setting, where multiple independent observations provide additional information, allowing sharper posteriors and improved parameter identifiability. Building on the flourishing score-based diffusion literature, F-NPSE (Geffner et al., 2023) estimates the tall data posterior by composing individual scores from a neural network trained only for a single context observation. This enables more flexible and simulation-efficient inference than alternative approaches for tall datasets in SBI. However, it relies on costly Langevin dynamics during sampling. We propose a new algorithm that eliminates the need for Langevin steps by explicitly approximating the diffusion process of the tall data posterior. Our method retains the advantages of compositional score-based inference while being significantly faster and more stable than F-NPSE. We demonstrate its improved performance on toy problems and standard SBI benchmarks, and showcase its scalability by applying it to a complex real-world model from computational neuroscience.

2402.04582 2026-02-12 stat.AP stat.ML

Dimensionality reduction can be used as a surrogate model for high-dimensional forward uncertainty quantification

Jungho Kim, Sang-ri Yi, Ziqi Wang

Journal ref Reliability Engineering & System Safety, Vol(265), 111474, 2026

English abstract

We introduce a method to construct a stochastic surrogate model from the results of dimensionality reduction in forward uncertainty quantification. The hypothesis is that the high-dimensional input augmented by the output of a computational model admits a low-dimensional representation. This assumption can be met by numerous uncertainty quantification applications with physics-based computational models. The proposed approach differs from a sequential application of dimensionality reduction followed by surrogate modeling, as we "extract" a surrogate model from the results of dimensionality reduction in the input-output space. This feature becomes desirable when the input space is genuinely high-dimensional. The proposed method also diverges from the Probabilistic Learning on Manifold, as a reconstruction mapping from the feature space to the input-output space is circumvented. The final product of the proposed method is a stochastic simulator that propagates a deterministic input into a stochastic output, preserving the convenience of a sequential "dimensionality reduction + Gaussian process regression" approach while overcoming some of its limitations. The proposed method is demonstrated through two uncertainty quantification problems characterized by high-dimensional input uncertainties.

2305.19640 2026-02-12 stat.ML cs.LG

Fine-grained Analysis of Non-parametric Estimation for Pairwise Learning

Junyu Zhou, Shuo Huang, Han Feng, Puyu Wang, Ding-Xuan Zhou

English abstract

In this paper, we are concerned with the generalization performance of non-parametric estimation for pairwise learning. Most of the existing work requires the hypothesis space to be convex or a VC-class, and the loss to be convex. However, these restrictive assumptions limit the applicability of the results in studying many popular methods, especially kernel methods and neural networks. We significantly relax these restrictive assumptions and establish a sharp oracle inequality of the empirical minimizer with a general hypothesis space for the Lipschitz continuous pairwise losses. As an example, we apply our general results to study pairwise least squares regression and derive an excess population risk bound that matches the minimax lower bound for the pointwise least squares regression. The key novelty lies in constructing a structured deep ReLU neural network to approximate the true predictor, and in designing a targeted hypothesis space composed of networks with this structure and controllable complexity. Experiments validate the effectiveness of the proposed method. This example demonstrates that the obtained general results indeed help us to explore the generalization performance on a variety of problems that cannot be handled by existing approaches.

2110.01950 2026-02-12 stat.ML cs.LG

Classification of high-dimensional data with spiked covariance matrix structure

Yin-Jen Chen, Minh Tang

Comments 40 pages, 2 figures

Journal ref Transactions on Machine Learning Research (01/2026)

English abstract

We study the classification problem for high-dimensional data with $n$ observations on $p$ features where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the vector $\zeta$, given by the difference between the {\em whitened} mean vectors, is sparse. We analyze an adaptive classifier (adaptive with respect to the sparsity $s$) that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space, i.e., the classifier whitens the data, then screens the features by keeping only those corresponding to the $s$ largest coordinates of $\zeta$ and finally applies Fisher linear discriminant on the selected features. Leveraging recent results on entrywise matrix perturbation bounds for covariance matrices, we show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$. Notably, our theory also guarantees Bayes optimality for the corresponding quadratic discriminant analysis (QDA). Experimental results on real and synthetic data further indicate that the proposed approach is competitive with state-of-the-art methods while operating on a substantially lower-dimensional representation.