arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.28739 2026-03-31 cs.LG stat.ML

Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks

Meitong Liu, Christopher Jung, Rui Li, Xue Feng, Han Zhao

Abstract

In transfer learning, the learner leverages auxiliary data to improve generalization on a main task. However, the precise theoretical understanding of when and how auxiliary data help remains incomplete. We provide new insights on this issue in two canonical linear settings: ordinary least squares regression and under-parameterized linear neural networks. For linear regression, we derive exact closed-form expressions for the expected generalization error with bias-variance decomposition, yielding necessary and sufficient conditions for auxiliary tasks to improve generalization on the main task. We also derive globally optimal task weights as outputs of solvable optimization programs, with consistency guarantees for empirical estimates. For linear neural networks with shared representations of width $q \leq K$, where $K$ is the number of auxiliary tasks, we derive a non-asymptotic expectation bound on the generalization error, yielding the first non-vacuous sufficient condition for beneficial auxiliary learning in this setting, as well as principled directions for task weight curation. We achieve this by proving a new column-wise low-rank perturbation bound for random matrices, which improves upon existing bounds by preserving fine-grained column structures. Our results are verified on synthetic data simulated with controlled parameters.

2603.28656 2026-03-31 stat.ME stat.AP

Statistical Models for the Inference of Within-person Relations: A Random Intercept Cross-Lagged Panel Model and Its Interpretation

Satoshi Usami

Journal ref The Japanese Journal of Developmental Psychology, 33, 267-286 (2022)
Abstract

The cross-lagged panel model (CLPM) has been widely used, particularly in psychology, to infer longitudinal relations among variables. At the same time, controlling for between-person heterogeneity and capturing within-person relations as processes of within-person change are regarded as key components to causal inference based on longitudinal data. Since Hamaker, Kuiper, and Grasman (2015) criticized the CLPM for its limitations in inferring within-person relations, the random intercept cross-lagged panel model (RI-CLPM), which incorporates stable trait factors representing stable individual differences, has rapidly spread, especially in psychology. At the same time, although many statistical models are available for inferring within-person relations, the distinctions among them have not been clearly delineated, and discussions over the interpretation and selection of statistical models remain active. In this paper, I position the RI-CLPM as one useful method for inferring within-person relations, explain its practical issues, and organize its mathematical and conceptual relationships with other statistical models, as well as potential problems that may arise in their application. In particular, I point out that a distinctive feature of the stable trait factors in the RI-CLPM, in representing between-person heterogeneity, is the assumption that they are uncorrelated with within-person variability, and that this point serves as an important link to the mathematical relationship with the dynamic panel model, another promising alternative.

2511.22442 2026-03-31 cs.PF cs.AI cs.CV cs.LG stat.ML

What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$

Sébastien Piérard, Adrien Deliège, Marc Van Droogenbroeck

Comments CVPR 2026

Abstract

Ranking methods or models based on their performance is of prime importance but is tricky because performance is fundamentally multidimensional. In the case of classification, precision and recall are scores with probabilistic interpretations that are both important to consider and complementary. The rankings induced by these two scores are often in partial contradiction. In practice, therefore, it is extremely useful to establish a compromise between the two views to obtain a single, global ranking. Over the last fifty years or so, it has been proposed to take a weighted harmonic mean, known as the F-score, F-measure, or $F_β$. Generally speaking, by averaging basic scores, we obtain a score that is intermediate in terms of values. However, there is no guarantee that these scores lead to meaningful rankings and no guarantee that the rankings are good tradeoffs between these base scores. Given the ubiquity of $F_β$ scores in the literature, some clarification is in order. Concretely: (1) We establish that $F_β$-induced rankings are meaningful and define a shortest path between precision- and recall-induced rankings. (2) We frame the problem of finding a tradeoff between two scores as an optimization problem expressed with Kendall rank correlations. We show that $F_1$ and its skew-insensitive version are far from being optimal in that regard. (3) We provide theoretical tools and a closed-form expression to find the optimal value for $β$ for any distribution or set of performances, and we illustrate their use on six case studies. Code is available at https://github.com/pierard/cvpr-2026-optimal-tradeoff-precision-recall.
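
As a rough illustration of the quantities discussed in this abstract (not the authors' closed-form result or their released code), the sketch below computes $F_β$ as the weighted harmonic mean of precision and recall and measures, with a plain Kendall tau-a, how closely the $F_β$-induced ranking agrees with the precision- and recall-induced rankings. The four (precision, recall) pairs and all function names are hypothetical.

```python
from itertools import combinations


def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall (the F-beta score)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


def kendall_tau(xs, ys):
    """Kendall tau-a between two score lists (no correction for ties)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


# Hypothetical (precision, recall) pairs for four classifiers.
models = [(0.9, 0.4), (0.7, 0.6), (0.5, 0.8), (0.6, 0.5)]
precisions = [p for p, _ in models]
recalls = [r for _, r in models]

for beta in (0.25, 1.0, 4.0):
    scores = [f_beta(p, r, beta) for p, r in models]
    print(beta, kendall_tau(scores, precisions), kendall_tau(scores, recalls))
```

At β = 0 the score reduces to precision, and as β grows it approaches recall, so scanning β moves the ranking between the two extremes. The paper's contribution is a closed-form way to choose β, which this brute-force scan does not reproduce.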

2510.01349 2026-03-31 cs.LG stat.ML

To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking

Hannah Lawrence, Elyssa Hofgard, Vasco Portilheiro, Yuxuan Chen, Tess Smidt, Robin Walters

Comments Published as a conference paper at ICLR 2026. A short version of this paper appeared at the ICLR AI4Mat workshop in April 2025

Abstract

Symmetry-aware methods for machine learning, such as data augmentation and equivariant architectures, encourage correct model behavior on all transformations (e.g. rotations or permutations) of the original dataset. These methods can improve generalization and sample efficiency, under the assumption that the transformed datapoints are highly probable, or "important", under the test distribution. In this work, we develop a method for critically evaluating this assumption. In particular, we propose a metric to quantify the amount of symmetry breaking in a dataset, via a two-sample classifier test that distinguishes between the original dataset and its randomly augmented equivalent. We validate our metric on synthetic datasets, and then use it to uncover surprisingly high degrees of symmetry-breaking in several benchmark point cloud datasets, constituting a severe form of dataset bias. We show theoretically that distributional symmetry-breaking can prevent invariant methods from performing optimally even when the underlying labels are truly invariant, for invariant ridge regression in the infinite feature limit. Empirically, the implication for symmetry-aware methods is dataset-dependent: equivariant methods still impart benefits on some symmetry-biased datasets, but not others, particularly when the symmetry bias is predictive of the labels. Overall, these findings suggest that understanding equivariance -- both when it works, and why -- may require rethinking symmetry biases in the data.

2505.12578 2026-03-31 stat.ML cs.LG

Stacked conformal prediction

Paulo C. Marques F

Comments 12 pages, 2 figures

Journal ref Proceedings of Machine Learning Research, 2025. v. 266. p. 305-316
Abstract

We consider a method for conformalizing a stacked ensemble of predictive models, showing that the potentially simple form of the meta-learner at the top of the stack enables a procedure with manageable computational cost that achieves approximate marginal validity without requiring the use of a separate calibration sample. Empirical results indicate that the method compares favorably to a standard inductive alternative.
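
For context, the "standard inductive alternative" mentioned above is commonly understood as split (inductive) conformal prediction, which does require a separate calibration sample. The following is a minimal sketch of that baseline only, assuming absolute-residual conformity scores and a point predictor fitted elsewhere. It is not the stacked procedure proposed in the paper, and the function names and residual values are hypothetical.

```python
import math


def split_conformal_radius(calibration_residuals, alpha):
    """Half-width of a split conformal interval at miscoverage level alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual
    from a held-out calibration set.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this level: interval is unbounded.
        return math.inf
    return scores[k - 1]


def prediction_interval(point_prediction, radius):
    """Symmetric interval around a point prediction."""
    return (point_prediction - radius, point_prediction + radius)


# Hypothetical calibration residuals (observed minus predicted).
residuals = [0.3, -1.2, 0.8, -0.1, 2.0, -0.6, 0.4, 1.1, -0.9]
radius = split_conformal_radius(residuals, alpha=0.2)
print(prediction_interval(10.0, radius))
```

The cost of this baseline is the data set aside for calibration, which is the requirement the abstract says the stacked method avoids.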

2208.04980 2026-03-31 cs.SI cs.LG stat.AP

An NLP-Assisted Bayesian Time Series Analysis for Prevalence of Twitter Cyberbullying During the COVID-19 Pandemic

Christopher Perez, Sayar Karmakar

Comments 22 pages, 15 figures

Abstract

COVID-19 has brought about many changes in social dynamics. Stay-at-home orders and disruptions in school teaching can influence bullying behavior in person and online, both of which lead to negative outcomes for victims. To study cyberbullying specifically, 1 million tweets containing keywords associated with abuse were collected from the beginning of 2019 to the end of 2021 with the Twitter API search endpoint. A natural language processing model pre-trained on a Twitter corpus generated probabilities for the tweets being offensive and hateful. To overcome limitations of sampling, data was also collected using the count endpoint. The fraction of tweets from a given daily sample marked as abusive is multiplied by the number reported by the count endpoint. Once these adjusted counts are assembled, a Bayesian autoregressive Poisson model allows one to study the mean trend and lag functions of the data and how they vary over time. The results reveal strong weekly and yearly seasonality in hateful speech but with slight differences across years that may be attributed to COVID-19.
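
The count adjustment described in this abstract is simple arithmetic, sketched below. How a tweet comes to be "marked as abusive" (for example, by thresholding the model's probabilities) is not specified in the abstract, so the function takes ready-made boolean flags. The function name and the numbers are hypothetical.

```python
def adjusted_abusive_count(sample_flags, total_count):
    """Scale the abusive fraction of a daily sample up to the full daily count.

    sample_flags: booleans, True where a sampled tweet was marked abusive.
    total_count:  number of matching tweets reported for the same day.
    """
    if not sample_flags:
        raise ValueError("the daily sample is empty")
    fraction = sum(sample_flags) / len(sample_flags)
    return fraction * total_count


# Hypothetical day: 4 sampled tweets, 2 marked abusive, 1000 tweets reported.
print(adjusted_abusive_count([True, False, False, True], 1000))
```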

2603.28615 2026-03-31 stat.ME

Toxicity Monitoring Rule for a Two-Cohort Phase II Clinical Trial with Bivariate Beta Prior

Yu Wang, Aniko Szabo

Abstract

Toxicity monitoring is essential in Phase II clinical trials to ensure participant safety. While monitoring rules are well-established for single-arm trials, two-cohort trials present unique challenges because toxicities are expected to be similar between cohorts but may still differ. Current approaches either monitor the two cohorts independently, which ignores their similarity, or pool them together as a single arm, which neglects heterogeneity between cohorts. We propose a Bayesian method based on a bivariate beta prior that provides a compromise between these two approaches. The marginal posterior distribution is derived as a mixture of beta distributions, enabling exact calculations of the proposed method's operating characteristics. Examples demonstrate that joint monitoring offers a balanced approach between the independent and pooled methods. Keywords: Toxicity; Two-cohort; Phase II clinical trial; Monitoring rules; Bivariate Beta; Exact Operating characteristics

2603.28564 2026-03-31 math.ST stat.TH

LAD estimation of locally stable SDE

Oleksii M. Kulyk, Hiroki Masuda

Comments 50 pages

Abstract

We prove the asymptotic mixed normality of the least absolute deviation (LAD) estimator for a locally $α$-stable stochastic differential equation (SDE) observed at high frequency, where $α\in(0,2)$. We investigate both ergodic and non-ergodic cases, where the terminal sampling time diverges or is fixed, respectively, under different sets of assumptions. The objective function for the LAD estimator is expressed in a fully explicit form without necessitating numerical integration, offering a significant computational advantage over the existing non-Gaussian stable quasi-likelihood approach.

2603.28556 2026-03-31 stat.ME

Flexible and Scalable Bayesian Modelling of Spatio-Temporal Hawkes Processes

Wenqing Liu, Xenia Miscouridou, Déborah Sulem

Abstract

Existing spatio-temporal Hawkes process models typically rely on either parametric or semiparametric assumptions, limiting the model's ability to capture complex endogenous and exogenous event dynamics. We propose a fully Bayesian nonparametric framework for spatio-temporal Hawkes processes using additive Gaussian processes for the prior distributions on the background rate and the triggering kernel. This additive structure enhances interpretability by decoupling temporal and spatial effects while maintaining high modelling flexibility across the entire spatio-temporal domain. To address scalability, we develop a sparse variational inference scheme based on the Gaussian variational family. Synthetic experiments demonstrate that the proposed method accurately recovers background and triggering structures, achieving superior performance compared to existing alternatives. When applied to real-world datasets, it achieves higher held-out log-likelihoods and reveals interpretable spatio-temporal structures of the self-excitation mechanism. Overall, the framework provides a flexible, scalable, interpretable, and uncertainty-aware approach for modelling complex excitation patterns in spatio-temporal event data.

2603.28466 2026-03-31 cs.CV stat.ML

Post-hoc Self-explanation of CNNs

Ahcène Boubekki, Line H. Clemmensen

Abstract

Although standard Convolutional Neural Networks (CNNs) can be mathematically reinterpreted as Self-Explainable Models (SEMs), their built-in prototypes do not on their own accurately represent the data. Replacing the final linear layer with a $k$-means-based classifier addresses this limitation without compromising performance. This work introduces a common formalization of $k$-means-based post-hoc explanations for the classifier, the encoder's final output (B4), and combinations of intermediate feature activations. The latter approach leverages the spatial consistency of convolutional receptive fields to generate concept-based explanation maps, which are supported by gradient-free feature attribution maps. Empirical evaluation with a ResNet34 shows that using shallower, less compressed feature activations, such as those from the last three blocks (B234), results in a trade-off between semantic fidelity and a slight reduction in predictive performance.

2603.28462 2026-03-31 stat.ME

Identifying the desert decision rule to assess and achieve fairness

Ping Zhang, Naiwen Ying, Wang Miao

Abstract

We study fairness in decision-making when the data may encode systematic bias. Existing approaches typically impose fairness constraints while predicting the observed decision, which may itself be unfair. We propose a novel framework for characterising and addressing fairness issues by introducing the notion of desert decision, a latent variable representing the decision an individual rightfully deserves based on their actions, efforts, or abilities. This formulation shifts the prediction target from the potentially biased observed decision to the desert decision. We advocate achieving fair decision-making by predicting the desert decision and assessing unfairness by the discrepancy between desert and observed decisions. We establish nonparametric identification results under causally interpretable assumptions on the fairness of the desert decision and the unfairness mechanism of the observed decision. For estimation, we develop a sieve maximum likelihood estimator for the desert decision rule and an influence-function-based estimator for the degree of unfairness. Sensitivity analysis procedures are further proposed to assess robustness to violations of identifying assumptions. Our framework connects fairness with measurement error models, aligning predictive accuracy with fairness relative to an appropriate target, and providing a structural approach to modelling the unfairness mechanism.

2603.28455 2026-03-31 cs.LG cs.AI cs.CV cs.DC stat.ML

FeDMRA: Federated Incremental Learning with Dynamic Memory Replay Allocation

Tiantian Wang, Xiang Xiang, Simon S. Du

Abstract

In federated healthcare systems, Federated Class-Incremental Learning (FCIL) has emerged as a key paradigm, enabling continuous adaptive model learning among distributed clients while safeguarding data privacy. However, in practical applications, data across agent nodes within the distributed framework often exhibits non-independent and identically distributed (non-IID) characteristics, rendering traditional continual learning methods inapplicable. To address these challenges, this paper covers more comprehensive incremental task scenarios and proposes a dynamic memory allocation strategy for exemplar storage based on the data replay mechanism. This strategy fully taps into the inherent potential of data heterogeneity, while taking into account the performance fairness of all participating clients, thereby establishing a balanced and adaptive solution to mitigate catastrophic forgetting. Unlike the fixed allocation of client exemplar memory, the proposed scheme emphasizes the rational allocation of limited storage resources among clients to improve model performance. Furthermore, extensive experiments are conducted on three medical image datasets, and the results demonstrate significant performance improvements compared to existing baseline models.

2603.28423 2026-03-31 stat.ME stat.CO stat.ML

Profile Graphical Models

Alejandra Avalos-Pacheco, Monia Lupparelli, Francesco C. Stingo

Abstract

We introduce a novel class of graphical models, termed profile graphical models, that represent, within a single graph, how an external factor influences the dependence structure of a multivariate set of variables. This class is quite general and includes multiple graphs and chain graphs as special cases. Profile graphical models capture the conditional distributions of a multivariate random vector given different levels of a risk factor, and learn how the conditional independence structure among variables may vary across these risk profiles; we formally define this family of models and establish their corresponding Markov properties. We derive key structural and probabilistic properties that underpin a more powerful inferential framework than existing approaches, underscoring that our contribution extends beyond a novel graphical representation. Furthermore, we show that the resulting profile undirected graphical models are independence-compatible with two-block LWF chain graph models. We then develop a Bayesian approach for Gaussian undirected profile graphical models based on continuous spike-and-slab priors to learn shared sparsity structures across different levels of the risk factor. We also design a fast EM algorithm for efficient inference. Inferential properties are explored through simulation studies, including the comparison with competing methods. The practical utility of this class of models is demonstrated through the analysis of protein network data from various subtypes of acute myeloid leukemia. Our results show a more parsimonious network and greater patient heterogeneity than those of competing methods, highlighting the enhanced ability of our approach to capture subject-specific differences.

2603.28410 2026-03-31 cs.LG stat.ML

Mixture-Model Preference Learning for Many-Objective Bayesian Optimization

Manisha Dubey, Sebastiaan De Peuter, Wanrong Wang, Samuel Kaski

Comments 18 pages, 9 figures

Abstract

Preference-based many-objective optimization faces two obstacles: an expanding space of trade-offs and heterogeneous, context-dependent human value structures. To address these, we propose a Bayesian framework that learns a small set of latent preference archetypes rather than assuming a single fixed utility function, modelling them as components of a Dirichlet-process mixture with uncertainty over both archetypes and their weights. To query efficiently, we design hybrid queries that target information about (i) mode identity and (ii) within-mode trade-offs. Under mild assumptions, we provide a simple regret guarantee for the resulting mixture-aware Bayesian optimization procedure. Empirically, our method outperforms standard baselines on synthetic and real-world many-objective benchmarks, and mixture-aware diagnostics reveal structure that regret alone fails to capture.

2603.28359 2026-03-31 math.ST stat.ME stat.ML stat.TH

The Conjugate Domain Dichotomy: Exact Risk of M-Estimators under Infinite-Variance Noise in High Dimensions

Charalampos Agiropoulos

Comments 17 pages, 4 figures. Simulation code available upon request

Abstract

This paper studies high-dimensional M-estimation in the proportional asymptotic regime ($p/n \to \gamma > 0$) when the noise distribution has infinite variance. For noise with regularly-varying tails of index $\alpha \in (1,2)$, we establish that the asymptotic behavior of a regularized M-estimator is governed by a single geometric property of the loss function: the boundedness of the domain of its Fenchel conjugate. When this conjugate domain is bounded -- as is the case for the Huber, absolute-value, and quantile loss functions -- the dual variable in the min-max formulation of the estimator is confined, the effective noise reduces to the finite first absolute moment of the noise distribution, and the estimator achieves bounded risk without recourse to external information. When the conjugate domain is unbounded -- as for the squared loss -- the dual variable scales with the noise, the effective noise involves the diverging second moment, and bounded risk can be achieved only through transfer regularization toward an external prior. For the squared-loss class specifically, we derive the exact asymptotic risk via the Convex Gaussian Minimax Theorem under a noise-adapted regularization scaling. The resulting risk converges to a universal floor that is independent of the regularizer, yielding a loss-risk trichotomy: squared-loss estimators without transfer diverge; Huber-loss estimators achieve bounded but non-vanishing risk; transfer-regularized estimators attain the floor.
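
To make the bounded versus unbounded conjugate domain distinction concrete, the standard Fenchel conjugates $\ell^*(v) = \sup_u \{uv - \ell(u)\}$ of three of the losses named above are worked out below. These are textbook computations rather than results taken from the paper; the scalings $\tfrac12 u^2$ and the Huber threshold $\delta > 0$ are assumed normalizations.

```latex
\begin{align*}
\ell(u) = |u|
  &\;\Longrightarrow\;
  \ell^*(v) =
  \begin{cases} 0, & |v| \le 1, \\ +\infty, & |v| > 1, \end{cases}
  & \operatorname{dom} \ell^* &= [-1, 1] \quad \text{(bounded)}, \\[4pt]
\ell_\delta(u) =
  \begin{cases} \tfrac12 u^2, & |u| \le \delta, \\ \delta |u| - \tfrac12 \delta^2, & |u| > \delta, \end{cases}
  &\;\Longrightarrow\;
  \ell_\delta^*(v) =
  \begin{cases} \tfrac12 v^2, & |v| \le \delta, \\ +\infty, & |v| > \delta, \end{cases}
  & \operatorname{dom} \ell_\delta^* &= [-\delta, \delta] \quad \text{(bounded)}, \\[4pt]
\ell(u) = \tfrac12 u^2
  &\;\Longrightarrow\;
  \ell^*(v) = \tfrac12 v^2,
  & \operatorname{dom} \ell^* &= \mathbb{R} \quad \text{(unbounded)}.
\end{align*}
```

In the first two cases the loss grows at most linearly, so $uv - \ell(u)$ is unbounded in $u$ once $|v|$ exceeds the slope, which is what confines the dual variable.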

2603.28346 2026-03-31 cs.LG stat.ML

Machine Learning-Assisted High-Dimensional Matrix Estimation

Wan Tian, Hui Yang, Zhouhui Lian, Lingyue Zhang, Yijie Peng

Abstract

Efficient estimation of high-dimensional matrices, including covariance and precision matrices, is a cornerstone of modern multivariate statistics. Most existing studies have focused primarily on the theoretical properties of the estimators (e.g., consistency and sparsity), while largely overlooking the computational challenges inherent in high-dimensional settings. Motivated by recent advances in learning-based optimization methods, which integrate data-driven structures with classical optimization algorithms, we explore high-dimensional matrix estimation assisted by machine learning. Specifically, for the optimization problem of high-dimensional matrix estimation, we first present a solution procedure based on the Linearized Alternating Direction Method of Multipliers (LADMM). We then introduce learnable parameters and model the proximal operators in the iterative scheme with neural networks, thereby improving estimation accuracy and accelerating convergence. Theoretically, we first prove the convergence of LADMM, and then establish the convergence, convergence rate, and monotonicity of its reparameterized counterpart; importantly, we show that the reparameterized LADMM enjoys a faster convergence rate. Notably, the proposed reparameterization theory and methodology are applicable to the estimation of both high-dimensional covariance and precision matrices. We validate the effectiveness of our method by comparing it with several classical optimization algorithms across different structures and dimensions of high-dimensional matrices.

2603.28344 2026-03-31 stat.ME

Interpretable models for forecasting high-dimensional functional time series

Han Lin Shang, Cristian F. Jiménez-Varón

Abstract

We study the modeling and forecasting of high-dimensional functional time series, which can be temporally dependent and cross-sectionally correlated. We implement a functional analysis of variance (FANOVA) to decompose high-dimensional functional time series, such as subnational age- and sex-specific mortality observed over years, into two distinct components: a deterministic mean structure and a residual process varying over time. Unlike purely statistical dimensionality-reduction techniques, the FANOVA decomposition provides a direct and interpretable framework by partitioning the series into effects attributable to data-specific factors, such as regional and sex-level variations, and a grand functional mean. From the residual process, we implement a functional factor model to capture the remaining stochastic trends. By combining the forecasts of the residual component with the estimated deterministic structure, we obtain the forecasted curves for high-dimensional functional time series. Illustrated by the age-specific Japanese subnational mortality rates from 1975 to 2023, we evaluate and compare the accuracy of the point and interval forecasts across various forecast horizons. The results demonstrate that leveraging these interpretable components not only clarifies the underlying drivers of the data but also improves forecast accuracy, providing more transparent insights for evidence-based policy decisions.

2603.28324 2026-03-31 stat.ML cs.LG cs.NA math.NA

LDDMM stochastic interpolants: an application to domain uncertainty quantification in hemodynamics

Sarah Katz, Francesco Romor, Jia-Jie Zhu, Alfonso Caiazzo

Abstract

We introduce a novel conditional stochastic interpolant framework for generative modeling of three-dimensional shapes. The method builds on a recent LDDMM-based registration approach to learn the conditional drift between geometries. By leveraging the resulting pull-back and push-forward operators, we extend this formulation beyond standard Cartesian grids to complex shapes and random variables defined on distinct domains. We present an application in the context of cardiovascular simulations, where aortic shapes are generated from an initial cohort of patients. The conditioning variable is a latent geometric representation defined by a set of centerline points and the radii of the corresponding inscribed spheres. This methodology facilitates both data augmentation for three-dimensional biomedical shapes, and the generation of random perturbations of controlled magnitude for a given shape. These capabilities are essential for quantifying the impact of domain uncertainties arising from medical image segmentation on the estimation of relevant biomarkers.

2603.28320 2026-03-31 stat.ME

Design-Based Inference for the AUC with Complex Survey Data

Amaia Iparragirre, Thomas Lumley, Irantzu Barrio

Abstract

Complex survey data are usually collected following complex sampling designs. Accounting for the sampling design is essential to obtain unbiased estimates and valid inferences when analyzing complex survey data. The area under the receiver operating characteristic curve (AUC) is routinely used to assess the discriminative ability of predictive models for binary outcomes. However, valid inference for the AUC under complex sampling designs remains challenging. Although bootstrap techniques are widely applied under simple random sampling for variance estimation in this framework, traditional implementations do not account for complex designs. In this work, we propose a design-based framework for AUC inference. In particular, replicate weights methods are used to construct confidence intervals and hypothesis tests. The performance of replicate weights methods and the traditional non-design-based bootstrap for this purpose has been analyzed through an extensive simulation study. Design-based methods achieve coverage probabilities close to nominal levels and appropriate rejection rates under the null hypothesis. In contrast, the traditional non-design-based bootstrap method tends to underestimate the variance, leading to undercoverage and inflated rejection rates. Differences between methods decrease as the number of selected clusters per stratum increases. An application to data from the National Health and Nutrition Examination Survey (NHANES) illustrates the practical relevance of the proposed framework. The methods have been incorporated into the svyROC R package.
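
One common way to bring sampling weights into the AUC point estimate is to weight each case-control pair by the product of the two units' weights. The sketch below assumes that pairwise estimator and shows only the point estimate. It is not the svyROC package's API, and it omits the replicate-weights variance estimation that the paper is about. The function name and data are hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """Pairwise AUC estimate where each (case, control) pair counts w_i * w_j.

    scores:  predicted scores or probabilities.
    labels:  1 for cases (events), 0 for controls.
    weights: sampling weights, one per unit.
    """
    numerator = 0.0
    denominator = 0.0
    for s_i, y_i, w_i in zip(scores, labels, weights):
        if y_i != 1:
            continue
        for s_j, y_j, w_j in zip(scores, labels, weights):
            if y_j != 0:
                continue
            pair_weight = w_i * w_j
            denominator += pair_weight
            if s_i > s_j:
                numerator += pair_weight          # concordant pair
            elif s_i == s_j:
                numerator += 0.5 * pair_weight    # tied scores count half
    if denominator == 0:
        raise ValueError("need at least one case and one control with positive weight")
    return numerator / denominator


# Hypothetical sample: the low-scoring case represents three population units.
print(weighted_auc([0.9, 0.2, 0.5], [1, 1, 0], [1, 3, 1]))
print(weighted_auc([0.9, 0.2, 0.5], [1, 1, 0], [1, 1, 1]))
```

With equal weights the estimate reduces to the usual unweighted pairwise AUC, so comparing the two calls shows how much the design weights shift the estimate.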

2603.28274 2026-03-31 stat.OT cs.HC cs.PL

Statistics 101, 201, and 202: Three Shiny Apps for Teaching Probability Distributions, Inferential Statistics, and Simple Linear Regression

Antoine Soetewey

Comments 6 pages, 0 figure

Abstract

Statistics 101, 201, and 202 are three open-source interactive web applications built with R and Shiny to support the teaching of introductory statistics and probability. The apps help students carry out common statistical computations -- computing probabilities from standard probability distributions, constructing confidence intervals, conducting hypothesis tests, and fitting simple linear regression models -- without requiring prior knowledge of R or any other programming language. Each app provides numerical results, plots rendered with ggplot2, and inline mathematical derivations typeset with MathJax, so that computation and statistical reasoning appear side by side in a single interface. The suite is organised around a broad pedagogical progression: Statistics 101 introduces probability distributions and their properties; Statistics 201 addresses confidence intervals and hypothesis tests; and Statistics 202 covers the simple linear model. All three apps are freely accessible online and their source code is released under a CC-BY-4.0 license.

2603.28273 2026-03-31 stat.AP

A statistical perspective on higher-order interactions modeling

Catherine Matias

Abstract

Modeling higher-order interactions (HOI) has emerged as a crucial challenge in complex systems analysis, as many phenomena cannot be fully captured by pairwise relationships alone. Hypergraphs, which generalize graphs by allowing interactions among more than two entities, provide a powerful framework for representing such intricate dependencies. Adopting a statistical and probabilistic perspective on hypergraph modeling, we propose a guided tour through this emerging research area. We begin by illustrating the ubiquity of HOI in real-world systems, where interactions often involve groups of entities rather than isolated pairs. We then introduce the foundational concepts and notations of hypergraphs, discussing their descriptive statistics, graph-based representations, and the challenges associated with their complexity. We further explore a variety of statistical models for hypergraphs and address the critical task of node clustering. We conclude by outlining some open challenges in the field.

2603.28177 2026-03-31 math.ST stat.TH

Posterior contraction under misspecification and heteroscedasticity in non-linear inverse problems

Fanny Seizilles, Maximilian Siebel

Comments 57 pages

Abstract

In many practical and numerical inverse problems, the exact data log-likelihood is not fully accessible, motivating the use of surrogate models. We study heteroscedastic nonparametric nonlinear regression problems with Gaussian errors and establish contraction results for posterior distributions arising from a surrogate log-likelihood constructed from proxy error variances, an approximate forward map, and an appropriate Gaussian process prior. Under general assumptions on the approximation quality, we show that the resulting surrogate posterior is statistically reliable and contracts about the true parameter at rates comparable to those of the exact posterior. The analysis leverages consistency properties of the (penalised) MLE to effectively handle heteroscedastic noise and to control the impact of likelihood approximation errors. We apply the framework to PDE-constrained inverse problems for a reaction-diffusion equation and the two-dimensional Navier-Stokes equation. In the latter case, we consider misspecified viscosity and forcing terms as well as Oseen-type linearization models, highlighting the relevance of our results for numerical analysis applications.

2603.28112 2026-03-31 math.ST stat.ME stat.TH

Parametric generalized spectrum for heavy-tailed time series

Yuichi Goto, Gaspard Bernard

Comments 52 pages, 12 figures

Abstract

Recently, several spectra have emerged, designed to encapsulate the distributional characteristics of non-Gaussian stationary processes. This article introduces parametric families of generalized spectra based on the characteristic function, alongside inference procedures enabling $\sqrt{n}$-consistent estimation of the unknown parameters in a broad class of parametric models. These spectra capture non-linear dependencies without requiring that the underlying stochastic processes satisfy any moment assumptions. Crucially, this approach facilitates frequency domain analysis for heavy-tailed time series, including possibly non-causal Cauchy autoregressive models and discrete-stable integer-valued autoregressive models. To the best of our knowledge, the latter models have not been studied theoretically in the literature. By estimating parameters across both causal and non-causal parameter spaces, our method automatically identifies the causal or non-causal structure of Cauchy autoregressive models. Furthermore, our estimator does not depend on smoothing parameters since it is based on the integrated periodogram associated with the generalized spectrum. As applications, we develop goodness-of-fit tests, moving average unit-root tests, and tests for non-invertibility. We study the finite-sample performance of the proposed estimators and tests via Monte Carlo simulations, and apply the methodology to estimation and forecasting of a measles count dataset.

2603.27984 2026-03-31 stat.ME

Empirical Bayes Predictive Density Estimation under Covariate Shift in Large Imbalanced Linear Mixed Models

Abir Sarkar, Gourab Mukherjee, Keisuke Yano

Abstract

We study empirical Bayes (EB) predictive density estimation in linear mixed models (LMMs) with a large number of units, which induce a high-dimensional random effects space. Focusing on Kullback-Leibler (KL) risk minimization, we develop a calibration framework to optimally tune predictive densities derived from a broad class of flexible priors. Our proposed method addresses two key challenges in predictive inference: (a) severe data scarcity leading to highly imbalanced designs, in which replicates are available for only a small subset of units; and (b) distributional shifts in future covariates. To estimate predictive KL risk in LMMs, we use a data-fission approach that leverages exchangeability in the covariate distribution. We establish convergence rates for our proposed risk estimators and show how their efficiency deteriorates as data scarcity increases. Our results imply the decision-theoretic optimality of the proposed EB predictive density estimator. The theoretical development relies on a novel probabilistic analysis of the interaction between data fission, sample reuse, and the predictive heat-equation representation of George et al. (2006), which expresses predictive KL risk through expected log-marginals. Extensive simulation studies demonstrate strong predictive performance and robustness of the proposed approach across diverse regimes with varying degrees of data scarcity and covariate shift.

2603.27916 2026-03-31 stat.ME

OPTICS: Order-Preserved Test-Inverse Confidence Set for Number of Change-Points

Ao Sun, Jingyuan Liu

Comments 78 pages, 5 figures

Abstract

Determining the number of change-points is a fundamental first step in change-point detection problems, as it lays the groundwork for subsequent change-point position estimation. While the existing literature offers various methods for consistently estimating the number of change-points, these methods typically yield a single point estimate without any assurance that it recovers the true number of changes in a specific dataset. Moreover, achieving consistency often hinges on very stringent conditions that can be challenging to verify in practice. To address these issues, we introduce a unified test-inverse procedure to construct a confidence set for the number of change-points. The proposed confidence set provides a set of possible values within which the true number of change-points is guaranteed to lie with a specified level of confidence. We further prove that the confidence set is sufficiently narrow to be powerful and informative by deriving the order of its cardinality. Remarkably, this confidence set can be established under more relaxed conditions than those required by most point estimation techniques. We also advocate multiple-splitting procedures to enhance stability and extend the proposed method to heavy-tailed and dependent settings. As a byproduct, we may also leverage this constructed confidence set to assess the effectiveness of point-estimation algorithms. Through extensive simulation studies, we demonstrate the superior performance of our confidence set approach. Additionally, we apply this method to analyze a bladder tumor microarray dataset. Supplementary Material, including proofs of all theoretical results, computer code, the R package, and extended simulation studies, is available online.

2603.27903 2026-03-31 stat.ML cs.LG math-ph math.AT math.MP

Persistence diagrams of random matrices via Morse theory: universality and a new spectral diagnostic

Matthew Loftus

Comments 7 pages, 5 figures, 4 tables

Abstract

We prove that the persistence diagram of the sublevel set filtration of the quadratic form $f(x) = x^T M x$ restricted to the unit sphere $S^{n-1}$ is analytically determined by the eigenvalues of the symmetric matrix $M$. By Morse theory, the diagram has exactly $n-1$ finite bars, with the $k$-th bar living in homological dimension $k-1$ and having length equal to the $k$-th eigenvalue spacing $s_k = \lambda_{k+1} - \lambda_k$. This identification transfers random matrix theory (RMT) universality to persistence diagram universality: for matrices drawn from the Gaussian Orthogonal Ensemble (GOE), we derive the closed-form persistence entropy $\mathrm{PE} = \log(8n/\pi) - 1$, and verify numerically that the coefficient of variation of persistence statistics decays as $n^{-0.6}$. Different random matrix ensembles (GOE, GUE, Wishart) produce distinct universal persistence diagrams, providing topological fingerprints of RMT universality classes. As a practical consequence, we show that persistence entropy outperforms the standard level spacing ratio $\langle r \rangle$ for discriminating GOE from GUE matrices (AUC 0.978 vs. 0.952 at $n = 100$, non-overlapping bootstrap 95% CIs), and detects global spectral perturbations in the Rosenzweig-Porter model to which $\langle r \rangle$ is blind. These results establish persistence entropy as a new spectral diagnostic that captures complementary information to existing RMT tools.
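
The identification of bar lengths with eigenvalue spacings can be illustrated numerically. The sketch below is not the author's code: it assumes the standard definition of persistence entropy (the Shannon entropy of the bar lengths normalized to sum to one), and the helper names `sample_goe` and `persistence_entropy_from_matrix` are hypothetical. The matrix is GOE only up to an overall scale, which does not affect the entropy.

```python
import numpy as np

def sample_goe(n, rng):
    """Draw an n x n symmetric Gaussian matrix (GOE up to an overall scale)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2.0

def persistence_entropy_from_matrix(m):
    """Treat eigenvalue spacings as bar lengths and return (spacings, entropy)."""
    eigvals = np.linalg.eigvalsh(m)      # eigenvalues in ascending order
    spacings = np.diff(eigvals)          # s_k = lambda_{k+1} - lambda_k, n-1 values
    p = spacings / spacings.sum()        # normalized bar lengths
    p = p[p > 0]                         # guard against log(0) for repeated eigenvalues
    entropy = -np.sum(p * np.log(p))     # Shannon entropy (assumed definition)
    return spacings, entropy

rng = np.random.default_rng(0)
n = 100
spacings, pe = persistence_entropy_from_matrix(sample_goe(n, rng))
print(f"{len(spacings)} finite bars, persistence entropy = {pe:.3f}")
```

Because the entropy depends only on normalized spacings, rescaling the matrix leaves it unchanged, so the choice of GOE normalization is immaterial here.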

2603.27873 2026-03-31 stat.ME

A Robust Moment System Based on Absolute Deviations and Quantile Slicing

Elsayed Elamir

Comments 26, 3 figures

Abstract

This study develops two robust, quantile-sliced moment systems, based on the mean and median absolute deviation (MAD and MedAD moments). They are intended as foundational tools for parametric modeling, statistical inference, and describing distributional location, scale, skewness, and tail behavior in settings where classical moments and L-moments fail. MAD moments use block-wise absolute deviations around the median and exist whenever the mean is finite, while MedAD moments replace expectations with medians, ensuring existence for all distributions, including heavy-tailed cases with undefined mean or variance. The systems exhibit strong consistency, slice-based robustness, and bounded influence. The results indicate that MAD and L-moment ratios are efficient for light to moderate tails, whereas MedAD ratios remain uniquely stable when higher moments do not exist. Applications to Cauchy parameter estimation highlight the practical value of MedAD estimators as simple, fully robust alternatives to likelihood-based approaches. Together, these systems offer a unified, median-anchored framework for reliable distributional inference under heavy tails and contamination.

2603.27871 2026-03-31 stat.ML cs.LG

Statistical Guarantees for Distributionally Robust Optimization with Optimal Transport and OT-Regularized Divergences

Jeremiah Birrell, Xiaoxi Shen

Comments 24 pages

Abstract

We study finite-sample statistical performance guarantees for distributionally robust optimization (DRO) with optimal transport (OT) and OT-regularized divergence model neighborhoods. Specifically, we derive concentration inequalities for supervised learning via DRO-based adversarial training, as commonly employed to enhance the adversarial robustness of machine learning models. Our results apply to a wide range of OT cost functions, beyond the $p$-Wasserstein case studied by previous authors. In particular, our results are the first to: 1) cover soft-constraint norm-ball OT cost functions; soft-constraint costs have been shown empirically to enhance robustness when used in adversarial training, 2) apply to the combination of adversarial sample generation and adversarial reweighting that is induced by using OT-regularized $f$-divergence model neighborhoods; the added reweighting mechanism has also been shown empirically to further improve performance. In addition, even in the $p$-Wasserstein case, our bounds exhibit better behavior as a function of the DRO neighborhood size than previous results when applied to the adversarial setting.

2603.27869 2026-03-31 stat.ME

Dependable Exploitation of High-Dimensional Unlabeled Data in an Assumption-Lean Framework

Chao Ying, Siyi Deng, Yang Ning, Jiwei Zhao, Heping Zhang

Abstract

Semi-supervised learning has attracted significant attention due to the proliferation of applications featuring limited labeled data but abundant unlabeled data. In this paper, we examine the statistical inference problem in an assumption-lean framework which involves a high-dimensional regression parameter, defined by minimizing the least squares, within the context of semi-supervised learning. We investigate when and how unlabeled data can enhance the estimation efficiency of a regression parameter functional. First, we demonstrate that a straightforward debiased estimator can only be more efficient than its supervised counterpart if the unknown conditional mean function can be consistently estimated at an appropriate rate. Otherwise, incorporating unlabeled data can actually be counterproductive. To address this vulnerability, we propose a novel estimator guaranteed to be at least as efficient as the supervised baseline, even when the conditional mean function is misspecified. This ensures the dependable use of unlabeled data for statistical inference. Finally, we extend our approach to the general M-estimation framework, and demonstrate the effectiveness of our methodology through comprehensive simulation studies and a real data application.

2603.27864 2026-03-31 stat.ME stat.CO stat.ML

Vertical Consensus Inference for High-Dimensional Random Partition

Khai Nguyen, Yang Ni, Peter Mueller

Comments 10 pages, 1 figure

Abstract

We review recently proposed Bayesian approaches for clustering high-dimensional data. After identifying the main limitations of available approaches, we introduce an alternative framework based on vertical consensus inference (VCI) to mitigate the curse of dimensionality in high-dimensional Bayesian clustering. VCI builds on the idea of consensus Monte Carlo by dividing the data into multiple shards (smaller subsets of variables), performing posterior inference on each shard, and then combining the shard-level posteriors to obtain a consensus posterior. The key distinction is that VCI splits the data vertically, producing vertical shards that retain the same number of observations but have lower dimensionality. We use an entropic regularized Wasserstein barycenter to define a consensus posterior. The shard-specific barycenter weights are constructed to favor shards that provide meaningful partitions, distinct from a trivial single cluster or all singleton clusters, favoring balanced cluster sizes and precise shard-specific posterior random partitions. We show that VCI can be interpreted as a variational approximation to the posterior under a hierarchical model with a generalized Bayes prior. For relatively low-dimensional problems, experiments suggest that VCI closely approximates inference based on clustering the entire multivariate data. For high-dimensional data and in the presence of many noninformative dimensions, VCI introduces a new framework for model-based and principled inference on random partitions. Although our focus here is on random partitions, VCI can be applied to any dimension-independent parameters and serves as a bridge to emerging areas in statistics such as consensus Monte Carlo, optimal transport, variational inference, and generalized Bayes.

2603.27814 2026-03-31 cs.LG stat.ML

RG-TTA: Regime-Guided Meta-Control for Test-Time Adaptation in Streaming Time Series

Indar Kumar, Akanksha Tiwari, Sai Krishna Jasti, Ankit Hemant Lade

Comments 18 pages, 8 figures

Abstract

Test-time adaptation (TTA) enables neural forecasters to adapt to distribution shifts in streaming time series, but existing methods apply the same adaptation intensity regardless of the nature of the shift. We propose Regime-Guided Test-Time Adaptation (RG-TTA), a meta-controller that continuously modulates adaptation intensity based on distributional similarity to previously-seen regimes. Using an ensemble of Kolmogorov-Smirnov, Wasserstein-1, feature-distance, and variance-ratio metrics, RG-TTA computes a similarity score for each incoming batch and uses it to (i) smoothly scale the learning rate -- more aggressive for novel distributions, conservative for familiar ones -- and (ii) control gradient effort via loss-driven early stopping rather than fixed budgets, allowing the system to allocate exactly the effort each batch requires. As a supplementary mechanism, RG-TTA gates checkpoint reuse from a regime memory, loading stored specialist models only when they demonstrably outperform the current model (loss improvement >= 30%). RG-TTA is model-agnostic and strategy-composable: it wraps any forecaster exposing train/predict/save/load interfaces and enhances any gradient-based TTA method. We demonstrate three compositions -- RG-TTA, RG-EWC, and RG-DynaTTA -- and evaluate 6 update policies (3 baselines + 3 regime-guided variants) across 4 compact architectures (GRU, iTransformer, PatchTST, DLinear), 14 datasets (6 real-world multivariate benchmarks + 8 synthetic regime scenarios), and 4 forecast horizons (96, 192, 336, 720) under a streaming evaluation protocol with 3 random seeds (672 experiments total). Regime-guided policies achieve the lowest MSE in 156 of 224 seed-averaged experiments (69.6%), with RG-EWC winning 30.4% and RG-TTA winning 29.0%. Overall, RG-TTA reduces MSE by 5.7% vs TTA while running 5.5% faster; RG-EWC reduces MSE by 14.1% vs standalone EWC.

2603.27792 2026-03-31 cs.LG cs.AI stat.ML

What-If Explanations Over Time: Counterfactuals for Time Series Classification

Udo Schlegel, Thomas Seidl

Comments 24 pages, 1 figure, 3 tables, accepted at the XAI 2026

Abstract

Counterfactual explanations have emerged as a powerful approach in explainable AI, providing what-if scenarios that reveal how minimal changes to an input time series can alter the model's prediction. This work presents a survey of recent algorithms for counterfactual explanations for time series classification. We review state-of-the-art methods, spanning instance-based nearest-neighbor techniques, pattern-driven algorithms, gradient-based optimization, and generative models. For each, we discuss the underlying methodology, the models and classifiers they target, and the datasets on which they are evaluated. We highlight unique challenges in generating counterfactuals for temporal data, such as maintaining temporal coherence, plausibility, and actionable interpretability, which distinguish the temporal domain from tabular or image domains. We analyze the strengths and limitations of existing approaches and compare their effectiveness along key dimensions (validity, proximity, sparsity, plausibility, etc.). In addition, we provide an open-source library, Counterfactual Explanations for Time Series (CFTS), as a reference framework that includes many algorithms and evaluation metrics. We discuss this library's contributions in standardizing evaluation and enabling practical adoption of explainable time series techniques. Finally, based on the literature and identified gaps, we propose future research directions, including improved user-centered design, integration of domain knowledge, and counterfactuals for time series forecasting.

2603.27788 2026-03-31 stat.ME

Omitted-Variable Sensitivity Analysis for Generalizing Randomized Trials

Amir Asiaee, Samhita Pal, Jared D. Huling

Abstract

Randomized controlled trials (RCTs) yield internally valid causal effect estimates, but generalizing these results to target populations with different characteristics requires an untestable selection ignorability assumption: conditional on observed covariates, trial participation must be independent of potential outcomes. This assumption fails when unobserved effect modifiers are distributed differently between trial and target populations. We develop a sensitivity analysis framework for trial generalization grounded in omitted variable bias (OVB). Our key theoretical contribution is an exact decomposition showing that external-validity bias equals moderation strength $\times$ moderator imbalance: (i) how strongly an unobserved variable shifts the treatment effect, times (ii) how differently that variable is distributed across populations after covariate adjustment. We introduce scale-free sensitivity parameters based on partial $R^2$ values, enabling closed-form bounds and benchmarking against observed covariates -- practitioners can assess whether conclusions would change if an unobserved moderator were "as strong as" a particular observed variable. Simulations demonstrate that our bounds achieve nominal coverage and remain conservative under model misspecification, while comparisons with alternative sensitivity frameworks highlight the interpretive advantages of the OVB decomposition.
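
The product form of the bias can be seen in a deliberately simplified case. The display below is an illustration, not the paper's decomposition: it assumes no observed covariates, a single unobserved moderator $U$, a treatment effect that is linear in $U$, and an indicator $S$ equal to 1 in the trial and 0 in the target population. The symbols $\tau_0$, $\gamma$, and $S$ are introduced here for illustration only.

```latex
\begin{align*}
\tau(u) &= \tau_0 + \gamma u, \\
\text{bias} &= \mathbb{E}[\tau(U) \mid S = 1] - \mathbb{E}[\tau(U) \mid S = 0] \\
            &= \underbrace{\gamma}_{\text{moderation strength}}
               \times
               \underbrace{\bigl(\mathbb{E}[U \mid S = 1] - \mathbb{E}[U \mid S = 0]\bigr)}_{\text{moderator imbalance}}.
\end{align*}
```

In this toy case the bias vanishes if either factor is zero: the moderator does not shift the effect, or it is equally distributed in both populations.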

2603.25529 2026-03-31 econ.EM stat.ME

Sensitivity Analysis for Instrumental Variables Under Joint Relaxations of Monotonicity and Independence

Pedro Picchetti

Abstract

In this paper I develop a breakdown frontier approach to assess the sensitivity of Local Average Treatment Effects (LATE) estimates to violations of monotonicity and independence of the instrument. I parametrize violations of independence using the concept of $c$-dependence from Masten & Poirier (2018) and allow for the share of defiers to be greater than zero but smaller than the share of compliers. I derive identified sets for the LATE and the Average Treatment Effect (ATE) in which the bounds are functions of these two sensitivity parameters. Using these bounds, I derive the breakdown frontier for the LATE, which is the weakest set of assumptions such that a conclusion regarding the LATE holds. I derive consistent sample analogue estimators for the breakdown frontiers and provide a valid bootstrap procedure for inference. Monte Carlo simulations show the desirable finite-sample properties of the estimators and an empirical application shows that the conclusions regarding the effect of family size on unemployment from Angrist & Evans (1998) are highly sensitive to violations of independence and monotonicity.

2603.20940 2026-03-31 stat.ME

Fast and Scalable Cellwise-Robust Ensembles for High-Dimensional Data

Anthony Christidis, Jeyshinee Pyneeandee, Gabriela Cohen-Freue

Abstract

The analysis of high-dimensional data, common in fields such as genomics, is complicated by the presence of cellwise contamination, where individual cells rather than entire rows are corrupted. This contamination poses a significant challenge to standard variable selection techniques. While recent ensemble methods have introduced deterministic frameworks that partition the predictor space to manage high collinearity, these architectures were not designed to handle cellwise contamination, leaving a critical methodological gap. To bridge this gap, we propose the Fast and Scalable Cellwise-Robust Ensemble (FSCRE) algorithm, a multi-stage framework integrating three key statistical stages. First, the algorithm establishes a robust foundation by deriving a cleaned data matrix and a reliable, cellwise-robust covariance structure. Variable selection then proceeds via a competitive ensemble: a robust, correlation-based formulation of the Least-Angle Regression (LARS) algorithm proposes candidates for multiple sub-models, and a cross-validation criterion arbitrates their final assignment. Despite its architectural complexity, the proposed method enjoys fundamental theoretical guarantees, including invariance properties and local selection stability. Through extensive simulations and a bioinformatics application, we demonstrate FSCRE's superior performance in variable selection precision, recall, and predictive accuracy across various contamination scenarios. This work provides a unified framework connecting cellwise-robust estimation with high-performance ensemble learning, with an implementation available on CRAN.

2603.14830 2026-03-31 cs.LG stat.ML

Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks

Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi

Abstract

Dataset distillation, a training-aware data compression technique, has recently attracted increasing attention as an effective tool for mitigating costs of optimization and data storage. However, progress remains largely empirical. Mechanisms underlying the extraction of task-relevant information from the training process and the efficient encoding of such information into synthetic data points remain elusive. In this paper, we theoretically analyze practical algorithms of dataset distillation applied to the gradient-based training of two-layer neural networks with width $L$. By focusing on a non-linear task structure called the multi-index model, we prove that the low-dimensional structure of the problem is efficiently encoded into the resulting distilled data. This dataset reproduces a model with high generalization ability for a required memory complexity of $\tilde{\Theta}(r^2 d + L)$, where $d$ and $r$ are the input and intrinsic dimensions of the task. To the best of our knowledge, this is one of the first theoretical works that include a specific task structure, leverage its intrinsic dimensionality to quantify the compression rate and study dataset distillation implemented solely via gradient-based algorithms.

2603.13742 2026-03-31 cs.LG stat.ML

Few Batches or Little Memory, But Not Both: Simultaneous Space and Adaptivity Constraints in Stochastic Bandits

Ruiyuan Huang, Zicheng Lyu, Xiaoyi Zhu, Zengfeng Huang

Abstract

We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $\Omega(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$, even under adaptive grids. In particular, logarithmic memory rules out $O(K^{1-\varepsilon})$ batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $\Omega(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory allows only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm that, for any bit budget $W$ with $\Omega(\log T) \le W \le O(K\log T)$, uses at most $W$ bits of memory and $\widetilde{O}(K/W)$ batches while achieving regret $\widetilde{O}(\sqrt{KT})$, nearly matching our lower bound up to polylogarithmic factors.
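
The counting step behind the lower bound can be written out in one line. This is a reader's paraphrase of the information-bottleneck argument described in the abstract, with constants and the precise information measure left unspecified; it is not the paper's proof.

```latex
\begin{align*}
\underbrace{\Omega(K)}_{\text{bits required}} \;\le\; \underbrace{O(BW)}_{\text{bits available}}
  \quad &\Longrightarrow \quad B = \Omega\!\left(\frac{K}{W}\right), \\
W = O(\log T)
  \quad &\Longrightarrow \quad B = \Omega\!\left(\frac{K}{\log T}\right).
\end{align*}
```

Under this reading, the second line exceeds $K^{1-\varepsilon}$ once $K^{\varepsilon}$ dominates $\log T$, which appears to be the regime in which logarithmic memory rules out $O(K^{1-\varepsilon})$ batches.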

2603.10382 2026-03-31 stat.ME econ.EM stat.AP stat.CO

Gimbal Regression: Orientation-Adaptive Local Linear Regression under Spatial Heterogeneity

Yuichiro Otani

Comments Version 2 corrects variable labeling in the Meuse example (from "elevation" to "lead"). No changes to results or conclusions

Abstract

Local regression is widely used to explore spatial heterogeneity, but anisotropic or effectively low-dimensional neighborhoods can produce ill-conditioned local solves, causing coefficient variation driven by numerical artifacts rather than substantive structure. Such instability is often hidden when estimation relies on implicit tuning or optimization without exposing local diagnostics. This paper proposes Gimbal Regression (GR), a deterministic, geometry-aware local regression framework for stable and auditable estimation. GR constructs directional weights from neighborhood geometry using explicit orientation objects and deterministic safeguards, and computes local coefficients by a closed-form solve. Theoretical results are stated conditional on the realized neighborhood configuration, under which the estimator is a deterministic linear operator with finite-perturbation stability bounds. Simulations and empirical examples demonstrate predictable computation, transparent diagnostics, and improved numerical stability relative to common local regression baselines.

2602.18482 2026-03-31 physics.comp-ph cond-mat.stat-mech cs.LG stat.ML

Boltzmann Generators for Condensed Matter via Riemannian Flow Matching

Emil Hoffmann, Maximilian Schebek, Leon Klein, Frank Noé, Jutta Rogal

Comments Published as a workshop paper at AI4MAT, ICLR 2026

Abstract

Sampling equilibrium distributions is fundamental to statistical mechanics. While flow matching has emerged as a scalable state-of-the-art paradigm for generative modeling, its potential for equilibrium sampling in condensed-phase systems remains largely unexplored. We address this by incorporating the periodicity inherent to these systems into continuous normalizing flows using Riemannian flow matching. The high computational cost of exact density estimation intrinsic to continuous normalizing flows is mitigated by using Hutchinson's trace estimator, together with a crucial bias-correction step based on cumulant expansion that renders the stochastic estimates suitable for rigorous thermodynamic reweighting. Our approach is validated on monatomic ice, demonstrating the ability to train on systems of unprecedented size and obtain highly accurate free energy estimates without the need for traditional multistage estimators.

2601.18857 2026-03-31 stat.ML cs.LG

Statistical Inference for Explainable Boosting Machines

Haimo Fang, Kevin Tan, Jonathan Pipping-Gamon, Giles Hooker

Comments Accepted to AISTATS 2026 (poster)

Abstract

Explainable boosting machines (EBMs) are popular "glass-box" models that learn a set of univariate functions using boosting trees. These achieve explainability through visualizations of each feature's effect. However, unlike linear model coefficients, uncertainty quantification for the learned univariate functions requires computationally intensive bootstrapping, making it hard to know which features truly matter. We provide an alternative using recent advances in statistical inference for gradient boosting, deriving methods for statistical inference as well as end-to-end theoretical guarantees. Using a moving average instead of a sum of trees (Boulevard regularization) allows the boosting process to converge to a feature-wise kernel ridge regression. This produces asymptotically normal predictions that achieve the minimax-optimal MSE for fitting Lipschitz GAMs with $p$ features of $O(p n^{-2/3})$, successfully avoiding the curse of dimensionality. We then construct prediction intervals for the response and confidence intervals for each learned univariate function with a runtime independent of the number of datapoints, enabling further explainability within EBMs. Code is available at https://github.com/hetankevin/ebm-inference.

2510.15058 2026-03-31 stat.ML cs.LG math.ST stat.TH

The Minimax Lower Bound of Kernel Stein Discrepancy Estimation

Jose Cribeiro-Ramallo, Agnideep Aich, Florian Kalinke, Ashit Baran Aich, Zoltán Szabó

Comments Accepted for publication at AISTATS 2026

Abstract

Kernel Stein discrepancies (KSDs) have emerged as a powerful tool for quantifying goodness-of-fit over the last decade, featuring numerous successful applications. To the best of our knowledge, all existing KSD estimators with known rate achieve $\sqrt n$-convergence. In this work, we present two complementary results (with different proof strategies), establishing that the minimax lower bound of KSD estimation is $n^{-1/2}$ and settling the optimality of these estimators. Our first result focuses on KSD estimation on $\mathbb R^d$ with the Langevin-Stein operator; our explicit constant for the Gaussian kernel indicates that the difficulty of KSD estimation may increase exponentially with the dimensionality $d$. Our second result settles the minimax lower bound for KSD estimation on general domains.

2510.10324 2026-03-31 stat.ML cs.LG

On some practical challenges of conformal prediction

Liang Hong, Noura Raydan Nasreddine

Abstract

Conformal prediction is a model-free machine learning method for constructing prediction regions at a guaranteed coverage probability level. However, a data scientist often faces three challenges in practice: (i) the determination of a conformal prediction region is only approximate, jeopardizing the finite-sample validity of prediction, (ii) the computation required could be prohibitively expensive, and (iii) the shape of a conformal prediction region is hard to control. This article offers new insights into the relationship among the monotonicity of the non-conformity measure, the monotonicity of the plausibility function, and the exact determination of a conformal prediction region. Based on these new insights, we propose a quadratic-polynomial non-conformity measure that allows a data scientist to circumvent the three challenges simultaneously within the full conformal prediction framework.

2508.08517 2026-03-31 stat.ML cs.CE cs.LG

Projection-based multifidelity linear regression for data-scarce applications

Vignesh Sella, Julie Pham, Karen Willcox, Anirban Chaudhuri

Comments 23 pages, 7 figures, submitted to Machine Learning for Computational Science and Engineering special issue Accelerating Numerical Methods With Scientific Machine Learning

Journal ref
Mach. Learn. Comput. Sci. Eng. 1, 47 (2025)
Abstract

Surrogate modeling for systems with high-dimensional quantities of interest remains challenging, particularly when training data are costly to acquire. This work develops multifidelity methods for multiple-input multiple-output linear regression targeting data-limited applications with high-dimensional outputs. Multifidelity methods integrate many inexpensive low-fidelity model evaluations with limited, costly high-fidelity evaluations. We introduce two projection-based multifidelity linear regression approaches that leverage principal component basis vectors for dimensionality reduction and combine multifidelity data through: (i) a direct data augmentation using low-fidelity data, and (ii) a data augmentation incorporating explicit linear corrections between low-fidelity and high-fidelity data. The data augmentation approaches combine high-fidelity and low-fidelity data into a unified training set and train the linear regression model through weighted least squares with fidelity-specific weights. Various weighting schemes and their impact on regression accuracy are explored. The proposed multifidelity linear regression methods are demonstrated on approximating the surface pressure field of a hypersonic vehicle in flight. In a low-data regime of no more than ten high-fidelity samples, multifidelity linear regression achieves approximately 3% - 12% improvement in median accuracy compared to single-fidelity methods with comparable computational cost.
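
The weighted least squares step on a combined training set can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the principal component projection is omitted, the data are synthetic, and the weight values `w_hf` and `w_lf` and the helper `weighted_least_squares` are hypothetical choices made for the example.

```python
import numpy as np

def weighted_least_squares(x, y, weights):
    """Solve min_B sum_i w_i * ||y_i - x_i B||^2 by scaling rows with sqrt(w_i)."""
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(sw * x, sw * y, rcond=None)
    return coef

rng = np.random.default_rng(0)
true_coef = rng.standard_normal((3, 2))      # toy sizes: 3 inputs, 2 outputs

x_hf = rng.standard_normal((5, 3))           # few high-fidelity samples
y_hf = x_hf @ true_coef
x_lf = rng.standard_normal((200, 3))         # many low-fidelity samples
y_lf = x_lf @ true_coef + 0.3 * rng.standard_normal((200, 2))  # noisier stand-in

w_hf, w_lf = 1.0, 0.1                        # hypothetical fidelity-specific weights
x_all = np.vstack([x_hf, x_lf])              # unified training set
y_all = np.vstack([y_hf, y_lf])
weights = np.concatenate([np.full(len(x_hf), w_hf), np.full(len(x_lf), w_lf)])

coef = weighted_least_squares(x_all, y_all, weights)
print("max abs coefficient error:", np.abs(coef - true_coef).max())
```

Scaling each row by the square root of its weight turns the weighted problem into an ordinary least squares problem, so a standard solver can be reused unchanged.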

2508.05719 2026-03-31 stat.AP

Modeling Spatio-Temporal Dynamics of Obesity in Italian Regions Via Bayesian Beta Regression

Luciano Rota, Raffaele Argiento, Michela Cameletti

Abstract

In this paper we investigate the spatio-temporal dynamics of obesity rates across Italian regions from 2010 to 2022, aiming to identify spatial and temporal trends and assess potential heterogeneities. We implement a Bayesian hierarchical Beta regression model to analyze regional obesity rates, integrating spatial and temporal random effects, alongside gender and various exogenous predictors. The model leverages the Stochastic Search Variable Selection technique to identify significant predictors supported by the data. The analysis reveals both regional heterogeneity and dependence in obesity rates over the study period, emphasizing the importance of considering gender and spatial correlation in explaining its dynamics over time. In fact, the inclusion of structured spatial and temporal random effects captures the complexities of regional variations over time. These random effects, along with gender, emerge as the primary determinants of obesity prevalence across Italian regions, while the role of exogenous covariates is found to be minimal at the regional level. While socioeconomic and lifestyle factors remain fundamental at a micro-level, the findings demonstrate that the integration of spatial and temporal structures is critical for capturing macro-level obesity variations.

2505.14480 2026-03-31 stat.ME

Exploration, Confirmation, and Replication in the Same Observational Study: A Two Team Cross-Screening Approach to Studying the Effect of Unwanted Pregnancy on Mothers' Later Life Outcomes

Samrat Roy, Marina Bogomolov, Ruth Heller, Amy M. Claridge, Tishra Beeson, Dylan S. Small

Details
Abstract

The long term consequences of unwanted pregnancies carried to term on mothers have not been much explored. We use data from the Wisconsin Longitudinal Study (WLS) and propose a novel approach, namely two team cross-screening, to study the possible effects of unwanted pregnancies carried to term on various aspects of mothers' later-life mental health, physical health, economic well-being and life satisfaction. Our method, unlike existing approaches to observational studies, enables the investigators to perform exploratory data analysis, confirmatory data analysis and replication in the same study. This is a valuable property when there is only a single data set available with unique strengths to perform exploratory, confirmatory and replication analysis. In two team cross-screening, the investigators split themselves into two teams and the data is split as well according to a meaningful covariate. Each team then performs exploratory data analysis on its part of the data to design an analysis plan for the other part of the data. The complete freedom of the teams in designing the analysis has the potential to generate new unanticipated hypotheses in addition to a prefixed set of hypotheses. Moreover, only the hypotheses that looked promising in the data each team explored are forwarded for analysis (thus alleviating the multiple testing problem). These advantages are demonstrated in our study of the effects of unwanted pregnancies on mothers' later life outcomes.

2505.01166 2026-03-31 stat.AP

Low-rank bilinear autoregressive models for three-way criminal activity tensors

Gregor Zens, Carlos Díaz, Daniele Durante, Eleonora Patacchini

Details
Abstract

Criminal activity data are typically available via a three-way tensor encoding the reported frequencies of different crime categories across time and space. The challenges that arise in the design of interpretable, yet realistic, model-based representations of the complex dependencies within and across these three dimensions have led to an increasing adoption of black-box predictive strategies. While this perspective has proved successful in producing accurate forecasts guiding targeted interventions, the lack of interpretable model-based characterizations of the dependence structures underlying criminal activity tensors prevents inference on the cascading effects of these interventions across the different dimensions. We address this gap through the design of a low-rank bilinear autoregressive model which achieves comparable predictive performance to black-box strategies, while allowing interpretable inference on the dependence structures of reported criminal activities across crime categories, time and space. This representation incorporates the time dimension via an autoregressive construction that accounts for spatial effects and dependencies among crime categories through a separable low-rank bilinear formulation. When applied to Chicago police reports, the proposed model showcases remarkable predictive performance and also reveals interpretable dependence structures unveiling fundamental crime dynamics. These results facilitate the design of more refined intervention policies informed by the cascading effects of the policy itself.

2501.13879 2026-03-31 stat.ME math.ST stat.TH

Finite mixture representations of zero-and-$N$-inflated distributions for count-compositional data

André F. B. Menezes, Andrew C. Parnell, Keefe Murphy

Details
Journal ref
Journal of Multivariate Analysis, 210:105492 (2025)
Abstract

We provide novel probabilistic portrayals of two multivariate models designed to handle zero-inflation in count-compositional data. We develop a new unifying framework that represents both as finite mixture distributions. One of these distributions, based on Dirichlet-multinomial components, has been studied before, but has not yet been properly characterised as a sampling distribution of the counts. The other, based on multinomial components, is a new contribution. Using our finite mixture representations enables us to derive key statistical properties, including moments, marginal distributions, and special cases for both distributions. We develop enhanced Bayesian inference schemes with efficient Gibbs sampling updates, wherever possible, for parameters and auxiliary variables, demonstrating improvements over existing methods in the literature. We conduct simulation studies to evaluate the efficiency of the Bayesian inference procedures and present applications to a human gut microbiome dataset to illustrate the practical utility of the proposed distributions.

2411.15624 2026-03-31 stat.ML cs.LG stat.ME

Trans-Glasso: A Transfer Learning Approach to Precision Matrix Estimation

Boxin Zhao, Cong Ma, Mladen Kolar

Comments 58 pages, 13 figures. Accepted by the Journal of the American Statistical Association (JASA)

Details
Abstract

Precision matrix estimation is essential in various fields; yet it is challenging when samples for the target study are limited. Transfer learning can enhance estimation accuracy by leveraging data from related source studies. We propose Trans-Glasso, a two-step transfer learning method for precision matrix estimation. First, we obtain initial estimators using a multi-task learning objective that captures shared and unique features across studies. Then, we refine these estimators through differential network estimation to adjust for structural differences between the target and source precision matrices. Under the assumption that most entries of the target precision matrix are shared with source matrices, we derive non-asymptotic error bounds and show that Trans-Glasso achieves minimax optimality under certain conditions. Extensive simulations demonstrate Trans-Glasso's superior performance compared to baseline methods, particularly in small-sample settings. We further validate Trans-Glasso in applications to gene networks across brain tissues and protein networks for various cancer subtypes, showcasing its effectiveness in biological contexts. Additionally, we derive the minimax optimal rate for differential network estimation, representing the first such guarantee in this area. The Python implementation of Trans-Glasso, along with code to reproduce all experiments in this paper, is publicly available at https://github.com/boxinz17/transglasso-experiments.

2411.11580 2026-03-31 stat.ME stat.CO

Metric Oja Depth, New Statistical Tool for Estimating the Most Central Objects

Vida Zamanifarizhandi, Joni Virta

Comments 25 pages + 12 pages as supplementary materials

Details
Abstract

The Oja depth (simplicial volume depth) is one of the classical statistical techniques for measuring the central tendency of data in multivariate space. Despite the widespread emergence of object data like images, texts, matrices or graphs, a well-developed and suitable version of Oja depth for object data is lacking. To address this shortcoming, a novel measure of statistical depth, the metric Oja depth applicable to any object data, is proposed. Two competing strategies are used for optimizing metric depth functions, i.e., finding the deepest objects with respect to them. The performance of the metric Oja depth is compared with three other depth functions (half-space, lens, and spatial) in diverse data scenarios. Keywords: Object Data, Metric Oja depth, Statistical depth, Optimization, Metric statistics

2410.14843 2026-03-31 stat.ML cs.LG stat.ME

Predictive variational inference: Learn the predictively optimal posterior distribution

Jinlin Lai, Antonio Linero, Yuling Yao

Details
Abstract

Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification. We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density such that the resulting posterior predictive distribution is as close to the true data generating process as possible, while this closeness is measured by multiple scoring rules. By optimizing the objective, the predictive variational inference is generally not the same as, or even attempting to approximate, the Bayesian posterior, even asymptotically. Rather, we interpret it as implicit hierarchical expansion. Further, the learned posterior uncertainty detects heterogeneity of parameters among the population, enabling automatic model diagnosis. This framework applies to both likelihood-exact and likelihood-free models. We demonstrate its application in real data examples.

2410.10226 2026-03-31 math.ST math.PR stat.TH

Kinetic interacting particle system: parameter estimation from complete and partial discrete observations

Chiara Amorino, Vytautė Pilipauskaitė

Details
Abstract

In this paper, we study the estimation of drift and diffusion coefficients in a two-dimensional system of N interacting particles modeled by a degenerate stochastic differential equation. We consider both complete and partial observation cases over a fixed time horizon [0, T] and propose novel contrast functions for parameter estimation. In the partial observation scenario, we tackle the challenge posed by unobserved velocities by introducing a surrogate process based on the increments of the observed positions. This requires a modified contrast function to account for the correlation between successive increments. Our analysis demonstrates that, despite the loss of Markovianity due to the velocity approximation in the partial observation case, the estimators converge to a Gaussian distribution (with a correction factor in the partial observation case). The proofs are based on Itô-like bounds and an adaptation of the Euler scheme. Additionally, we provide insights into Hörmander's condition, which helps establish hypoellipticity in our model within the framework of stochastic calculus of variations.

2406.12212 2026-03-31 stat.AP stat.ME

Identifying Genetic Variants for Obesity: A Knowledge Integration Quantile Regression (KIQR) Approach for Ultra-High-Dimensional Data

Jiantong Wang, Heng Lian, Yan Yu, Tianhai Zu, Heping Zhang

Details
Abstract

Obesity is widely recognized as a serious and pervasive health concern. We study obesity through body mass index (BMI), which is known to be highly heritable, and identify important genetic risk factors for BMI from hundreds of thousands of single nucleotide polymorphisms (SNPs) in the Framingham Study data. Several challenges arise when using traditional genome-wide association studies (GWAS): (1) they suffer from low power due to a combination of a limited number of participants and the stringent genome-wide significance threshold; (2) existing prior knowledge from large meta-analyses may provide valuable guidance but is often underutilized; (3) the one-at-a-time univariate marginal regression framework ignores the joint and conditional nature of genetic effects; (4) GWAS focus solely on mean outcomes, whereas obesity inherently concerns abnormally high BMI levels. To address these challenges, we conduct the analysis by proposing and applying a novel Knowledge Integration Quantile Regression (KIQR) approach via simultaneous variable selection and estimation, focusing on the conditional high quantiles of BMI, which are most relevant to obesity risk, while integrating prior information from large-scale studies such as the GIANT consortium and UK Biobank. Notably, we identified promising novel associations: rs3798696 in TFAP2A, rs7070523 in ITIH5, and rs178260 in AIFM3, which have not previously been reported in the GWAS literature. These findings provide new insights into the genetic architecture of obesity and demonstrate that quantile-based modeling with integrated prior knowledge can potentially uncover novel genes missed by traditional GWAS approaches. An R implementation and simulation scripts are available at: https://github.com/KIQR-submission/KIQR

2309.02087 2026-03-31 stat.ME

Identifying Causal Effects Using Instrumental Variables from the Auxiliary Dataset

Kang Shuai, Shanshan Luo, Wei Li, Yangbo He

Comments 39 pages

Details
Abstract

Instrumental variable approaches have gained popularity for estimating causal effects in the presence of unmeasured confounders. However, the availability of instrumental variables in the primary dataset is often challenged due to stringent and untestable assumptions. This paper presents a novel method to identify and estimate causal effects by utilizing instrumental variables from the auxiliary dataset, incorporating a structural equation model, even in scenarios with nonlinear treatment effects. Our approach involves using two datasets: one called the primary dataset with joint observations of treatment and outcome, and another auxiliary dataset providing information about the instrument and treatment. Our strategy differs from most existing methods by not depending on the simultaneous measurements of instrument and outcome. The central idea for identifying causal effects is to establish a valid substitute through the auxiliary dataset, addressing unmeasured confounders. This is achieved by developing a control function and projecting it onto the function space spanned by the treatment variable. We then propose a three-step estimator for estimating causal effects and derive its asymptotic results. We illustrate the proposed estimator through simulation studies, and the results demonstrate favorable performance. We also conduct a real data analysis to evaluate the causal effect between vitamin D status and body mass index.

2307.07753 2026-03-31 cs.LG cs.AI stat.ML

Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks

Dominik Schnaus, Jongseok Lee, Daniel Cremers, Rudolph Triebel

Comments Accepted to ICML 2023

Details
Abstract

In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization.

2304.14895 2026-03-31 stat.ME math.ST stat.TH

Identifiability of causal effects with non-Gaussianity and auxiliary covariates

Kang Shuai, Shanshan Luo, Yue Zhang, Feng Xie, Yangbo He

Comments 40 pages

Details
Abstract

Assessing causal effects in the presence of unmeasured confounding is challenging. Although auxiliary variables, such as instrumental variables, are commonly used to identify causal effects, they are often unavailable in practice due to stringent and untestable conditions. To address this issue, previous research has utilized linear structural equation models to show that the causal effect is identifiable when noise variables of the treatment and outcome are both non-Gaussian. In this paper, we investigate the problem of identifying the causal effect using the auxiliary covariate and non-Gaussianity from the treatment. Our key idea is to characterize the impact of unmeasured confounders using an observed covariate, assuming they are all Gaussian. We demonstrate that the causal effect can be identified using a measured covariate, and then extend the identification results to the multi-treatment setting. We further develop a simple estimation procedure for estimating causal effects and derive a $\sqrt{n}$-consistent estimator. Finally, we evaluate the performance of our estimator through simulation studies and apply our method to investigate the effect of trade on income.

2303.13474 2026-03-31 math.ST stat.TH

PAC-Bayes Bounds for High-Dimensional Multi-Index Models with Unknown Active Dimension

Maximilian F. Steffen

Details
Journal ref
Japanese Journal of Statistics and Data Science, 2025
Abstract

The multi-index model with sparse dimension reduction matrix is a popular approach to circumvent the curse of dimensionality in a high-dimensional regression setting. Building on the single-index analysis by Alquier, P. & Biau, G. (Journal of Machine Learning Research 14 (2013) 243-280), we develop a PAC-Bayesian estimation method for a possibly misspecified multi-index model with unknown active dimension and an orthogonal dimension reduction matrix. Our main result is a non-asymptotic oracle inequality, which shows that the estimation method adapts to the active dimension of the model, the sparsity of the dimension reduction matrix and the regularity of the link function. Under a Sobolev regularity assumption on the link function the estimator achieves the minimax rate of convergence (up to a logarithmic factor) and no additional price is paid for the unknown active dimension.

2206.05829 2026-03-31 math.ST cs.DM stat.ML stat.TH

Learning general conditional independence structures via the neighbourhood lattice

Arash A. Amini, Bryon Aragam, Qing Zhou

Comments 38 pages, 3 figures

Details
Abstract

We study the problem of learning multivariate dependencies in nonparametric and high-dimensional settings. This includes but is not limited to graphical models. Our approach effectively combines several features that are missing from previous work on this problem: We show how the entire dependence structure can be learned nonparametrically while simultaneously evading the curse of dimensionality and relaxing common assumptions such as faithfulness. To this end, we introduce and study the neighbourhood lattice decomposition of a distribution, which is a compact, non-graphical representation of conditional independence (CI) that is valid in the absence of a faithful graphical representation. We show that the neighbourhood lattice decomposition exists in any graphical model and can be computed efficiently, nonparametrically, and consistently in high-dimensions without paying the usual curse of dimensionality. This gives a way to learn all of the independence relations implied by any graphical model, without requiring a priori knowledge of the graph or even the graph type. As a special case, our results provide a general solution to the problem of nonparametric estimation of high-dimensional CI structures over any graphical model.

2201.07093 2026-03-31 stat.ME stat.AP

Fragility Measures For Typical Cases

Robin Alexander, Benjamin R. Baer, Stephen E. Fremes, Mary Charlson, Mario Gaudino, Martin T. Wells

Comments 30 pages, 3 figures

Details
Abstract

The fragility index is a clinically motivated metric designed to supplement the $p$ value during hypothesis testing. The measure relies on two pillars: selecting cases to have their outcome modified and modifying the outcomes. The measure is interesting but the case selection suffers from a drawback which can hamper its interpretation. This work presents the drawback and a method, the stochastic generalized fragility indices, designed to remedy it. Two examples concerning electoral outcomes and the causal effect of smoking cessation illustrate the method.

2106.00839 2026-03-31 cs.LG q-fin.RM stat.ML

Algorithmic Insurance

Dimitris Bertsimas, Agni Orfanoudaki

Details
Abstract

When AI systems make errors in high-stakes domains like medical diagnosis or autonomous vehicles, a single algorithmic flaw across varying operational contexts can generate highly heterogeneous losses that challenge traditional insurance assumptions. Algorithmic insurance constitutes a novel form of financial coverage for AI-induced damages, representing an emerging market that addresses algorithm-driven liability. However, insurers currently struggle to price these risks, while AI developers lack rigorous frameworks connecting system design with financial liability exposure. We analyze the connection between operational choices of binary classification performance and tail risk exposure. Using conditional value-at-risk (CVaR) to capture extreme losses, we prove that established approaches like maximizing accuracy can significantly increase worst-case losses compared to tail risk optimization, with penalties growing quadratically as thresholds deviate from optimal. We then propose a liability insurance contract structure that mandates risk-aware classification thresholds and characterize the conditions under which it creates value for AI providers. Our analysis extends to degrading model performance and human oversight scenarios. We validate our findings through a mammography case study, demonstrating that CVaR-optimal thresholds reduce tail risk up to 13-fold compared to accuracy maximization. This risk reduction enables insurance contracts to create 14-16% gains for well-calibrated firms, while poorly calibrated firms benefit up to 65% through risk transfer, mandatory recalibration, and regulatory capital relief. Unlike traditional insurance that merely transfers risk, algorithmic insurance can function as both a financial instrument and an operational governance mechanism, simultaneously enabling efficient risk transfer while improving AI safety.
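For readers unfamiliar with the tail-risk measure named above, the sketch below shows one common empirical estimator of CVaR: the average of the worst (1 - alpha) fraction of observed losses. It is a generic illustration, not the paper's contract model or threshold optimization; the function name `empirical_cvar` and the toy loss sample are assumptions.

```python
import math


def empirical_cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of losses (at least one loss).

    A generic sample estimator (assumed for illustration), where larger
    values mean larger losses.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)
    # Round before taking the ceiling so that floating-point noise such as
    # 5.000000000000004 does not add a spurious extra tail observation.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Toy losses 1, 2, ..., 100: the worst 5% are 96..100.
sample = list(range(1, 101))
print(empirical_cvar(sample, alpha=0.95))
```

Comparing this quantity across candidate classification thresholds is one way to see how a threshold chosen for accuracy can differ from one chosen for tail risk.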

2603.27766 2026-03-31 cs.LG stat.ML

AutoStan: Autonomous Bayesian Model Improvement via Predictive Feedback

Oliver Dürr

Details
Abstract

We present AutoStan, a framework in which a command-line interface (CLI) coding agent autonomously builds and iteratively improves Bayesian models written in Stan. The agent operates in a loop, writing a Stan model file, executing MCMC sampling, then deciding whether to keep or revert each change based on two complementary feedback signals: the negative log predictive density (NLPD) on held-out data and the sampler's own diagnostics (divergences, R-hat, effective sample size). We evaluate AutoStan on five datasets with diverse modeling structures. On a synthetic regression dataset with outliers, the agent progresses from naive linear regression to a model with Student-t robustness, nonlinear heteroscedastic structure, and an explicit contamination mixture, matching or outperforming TabPFN, a state-of-the-art black-box method, while remaining fully interpretable. Across four additional experiments, the same mechanism discovers hierarchical partial pooling, varying-slope models with correlated random effects, and a Poisson attack/defense model for soccer. No search algorithm, critic module, or domain-specific instructions are needed. This is, to our knowledge, the first demonstration that a CLI coding agent can autonomously write and iteratively improve Stan code for diverse Bayesian modeling problems.

2603.27743 2026-03-31 stat.ME cs.LG

Empirical Likelihood for Nonsmooth Functionals

Hongseok Namkoong

Details
Abstract

Empirical likelihood is an attractive inferential framework that respects natural parameter boundaries, but existing approaches typically require smoothness of the functional and miscalibrate substantially when these assumptions are violated. For the optimal-value functional central to policy evaluation, smoothness holds only when the optimum is unique -- a condition that fails exactly when rigorous inference is most needed where more complex policies have modest gains. In this work, we develop a bootstrap empirical likelihood method for partially nonsmooth functionals. Our analytic workhorse is a geometric reduction of the profile likelihood to the distance between the score mean and a level set whose shape (a tangent cone given by nonsmoothness patterns) determines the asymptotic distribution. Unlike the classical proof technology based on Taylor expansions on the dual optima, our geometric approach leverages properties of a deterministic convex program and can directly apply to nonsmooth functionals. Since the ordinary bootstrap is not valid in the presence of nonsmoothness, we derive a corrected multiplier bootstrap approach that adapts to the unknown level-set geometry.

2603.27721 2026-03-31 stat.OT

Statistical Compatibility, Refutational Information, and Acceptability

Alessandro Rovetta

Details
Abstract

This paper develops an interpretive framework for divergence P-values and S-values within a descriptive frequentist perspective. Statistical analysis is framed as operating within idealized worlds defined by a set of assumptions and a target hypothesis, where probabilities describe the behavior of data under the model but do not assign truth values to hypotheses. Within this view, P-values are interpreted as graded indices of compatibility between the observed result and the predictions generated by the assumed model; accordingly, small P-values should not be read as indicating logical impossibility or strict inconsistency of the model itself. Building on this distinction, the paper argues that practical inference requires moving beyond the internal logic of the model toward judgments of overall acceptability, which depend not only on data-model compatibility but also on multiple contextual considerations such as subject-matter knowledge, plausibility of assumptions, data quality, usefulness, and loss - all interpreted through the competence, intentions, perceptions, and moral values of the specific analyst. S-values are therefore interpreted not as evidence against the epistemic status of the model, but as a specific form of refutational information that contributes to the broader body of information used by the analyst to judge whether a model remains acceptable for an intended practical purpose. The paper also examines the linguistic and conceptual risks associated with the language of incompatibility, distinguishes probability from rarity, and clarifies different notions of surprise - including a possible definition of Shannon-type surprise, to be distinguished from Bayesian belief revision. Overall, the article proposes a more cautious and explicit interpretation of frequentist measures, centered on model-based description, analyst responsibility, and decision acceptability.
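As background for the S-values discussed above, the standard definition is the binary surprisal of a P-value, S = -log2(p), measured in bits. The short sketch below computes it; the function name `s_value` is hypothetical, and the code illustrates only the usual transform, not the interpretive framework the paper proposes.

```python
import math


def s_value(p: float) -> float:
    """Shannon surprisal, in bits, of a P-value p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log2(p)


# Smaller P-values correspond to more bits of refutational information.
for p in (0.5, 0.05, 0.005):
    print(f"p = {p}: S = {s_value(p):.2f} bits")
```

One common reading is that S bits is about as surprising as seeing S heads in a row from a coin assumed to be fair, so p = 0.05 corresponds to a little over four heads in a row.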

2603.27718 2026-03-31 stat.ME math.ST stat.TH

Induced replication and the assessment of models

Heather Battey, Nancy Reid

Details
Abstract

We study the assessment of semiparametric and other highly-parametrised models from the perspective of foundational principles of parametric statistical inference. In doing so, we highlight the possibility of avoiding the usual semiparametric considerations, which typically require estimation of nuisance components through kernel smoothing or basis expansion, with the associated difficulties of tuning-parameter choice that blur the distinction between estimation and model assessment. A key aspect is the availability of preliminary manoeuvres that induce an internal replication of known form under the postulated model. This can be cast as a generalised version of the Fisherian sufficiency/co-sufficiency separation, replacing out-of-sample prediction error as a criterion for semiparametric model assessment by a type of within-sample prediction error. Framed in this light are new methodological contributions in multiple example settings, including model assessment for the proportional hazards model, for a time-dependent Poisson process with semiparametric intensity function, and for matched-pair and two-group examples. Also subsumed within the framework is a post-reduction inference approach to the construction of confidence sets of sparse regression models. Numerical work confirms recovery of nominal error rates under the postulated model and high sensitivity to departures in the direction of semiparametric alternatives. We conclude by emphasising open challenges and unifying perspectives.

2603.27679 2026-03-31 math.ST stat.TH

The asymptotic effect of tuning parameters

Ingrid Dæhlen, Nils Lid Hjort, Ingrid Hobæk Haff

Comments 34 pages, 2 figures

Details
Abstract

Tuning parameters are parameters involved in an estimating procedure for the purpose of reducing the risk of some other estimator. Examples include the degree of penalization in penalized regression and likelihood problems, as well as the balance parameter in hybrid methods. Typically tuning parameters are set to the minimizers of some estimator of the risk, a step which introduces additional randomness and makes standard methodology inapplicable. We derive precise asymptotic theory for this situation. Our framework allows for smooth, but otherwise arbitrary, loss functions and for the risk to be estimated by cross-validation procedures. Results include consistency of the optimal estimator towards a well-defined quantity and asymptotic normality after proper scaling and centring. We give explicit forms and estimators for the limiting variance matrix and results sharply characterizing the distance from the training error to the cross-validated estimator of the risk.

2603.27672 2026-03-31 stat.ML cs.LG

Energy Score-Guided Neural Gaussian Mixture Model for Predictive Uncertainty Quantification

Yang Yang, Chunlin Ji, Haoyang Li, Ke Deng

Comments 39 pages, 5 figures

Details
Abstract

Quantifying predictive uncertainty is essential for real-world machine learning applications, especially in scenarios requiring reliable and interpretable predictions. Many common parametric approaches rely on neural networks to estimate distribution parameters by optimizing the negative log likelihood. However, these methods often encounter challenges like training instability and mode collapse, leading to poor estimates of the mean and variance of the target output distribution. In this work, we propose the Neural Energy Gaussian Mixture Model (NE-GMM), a novel framework that integrates a Gaussian Mixture Model (GMM) with the Energy Score (ES) to enhance predictive uncertainty quantification. NE-GMM leverages the flexibility of GMM to capture complex multimodal distributions and the robustness of ES to ensure well-calibrated predictions in diverse scenarios. We theoretically prove that the hybrid loss function satisfies the properties of a strictly proper scoring rule, ensuring alignment with the true data distribution, and establish generalization error bounds, demonstrating that the model's empirical performance closely aligns with its expected performance on unseen data. Extensive experiments on both synthetic and real-world datasets demonstrate the superiority of NE-GMM in terms of both predictive accuracy and uncertainty quantification.

2603.27631 2026-03-31 cs.LG stat.ML

On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry

Mohammad Tinati, Stephen Tu

Details
Abstract

Self-supervised pre-training, where large corpora of unlabeled data are used to learn representations for downstream fine-tuning, has become a cornerstone of modern machine learning. While a growing body of theoretical work has begun to analyze this paradigm, existing bounds leave open the question of how sharp the current rates are, and whether they accurately capture the complex interaction between pre-training and fine-tuning. In this paper, we address this gap by developing an asymptotic theory of pre-training via two-stage M-estimation. A key challenge is that the pre-training estimator is often identifiable only up to a group symmetry, a feature common in representation learning that requires careful treatment. We address this issue using tools from Riemannian geometry to study the intrinsic parameters of the pre-training representation, which we link with the downstream predictor through a notion of orbit-invariance, precisely characterizing the limiting distribution of the downstream test risk. We apply our main result to several case studies, including spectral pre-training, factor models, and Gaussian mixture models, and obtain substantial improvements in problem-specific factors over prior art when applicable.

2603.27572 2026-03-31 math.ST math.PR stat.AP stat.TH

On the role of symmetry for staircase mechanisms in local differential privacy efficiency across different privacy regimes

Chiara Amorino, Arnaud Gloter

Details
English abstract

We investigate the structural foundations of statistical efficiency under $α$-local differential privacy, with a focus on maximizing Fisher information. Building on the role of continuous staircase mechanisms, we identify a fundamental symmetry regarding the extremal values $1$ and $e^α$. We demonstrate that when the optimal measure satisfies this symmetry, the Fisher information admits a closed-form expression. More generally, we derive a decomposition of the Fisher information into symmetric and asymmetric components, scaling as $α^{2}$ and $α^{3}$, respectively, for $α\to 0$. This reveals that although asymmetry is negligible in the high-privacy regime, it is no longer negligible as privacy constraints are relaxed. Motivated by this, we introduce a class of fully asymmetric privacy mechanisms constructed via pushforward mappings, proving that, unlike their symmetric counterparts, they recover the full Fisher information of the non-private model as $α\to \infty$. We bridge the gap between theory and practice by providing a tractable implementation of these mechanisms, governed by a tuning parameter $c$. This parameter allows for a smooth interpolation between the symmetric regime and the fully asymmetric regime. Furthermore, we demonstrate the versatility of this framework by showing that it encompasses the binomial mechanism as a limiting case.

2603.27546 2026-03-31 stat.ME math.ST stat.TH

Fast localization of anomalous patches in spatial data under dependence

Soham Bonnerjee, Sayar Karmakar, George Michailidis

Details
English abstract

We propose a scalable, provably accurate method for localizing an unknown number of multiple axis-aligned anomalous patches in spatial data under a general class of spatial dependence. Motivated by the practical need to detect localized changes rather than completely segment large spatial grids, we first introduce both a naive and a significantly faster intelligent-sampling-based estimator for a single patch. We then extend this methodology to the highly challenging multiple-patch setting and propose a two-stage Spatial Patch Localization of Anomalies under DEpendence procedure (SPLADE). Under mild conditions on signal strength, separation from the boundary, inter-patch separation, and a uniform Gaussian approximation, we establish simultaneous consistency for the estimated number of patches and for each individual patch boundary. Extensive numerical results based on synthetic data scenarios demonstrate that the proposed method exhibits significant computational and accuracy gains over competing approaches, as well as robustness to moderate and severe spatial dependence. Finally, we demonstrate the real-world utility of the proposed method by applying it to frame-to-frame video surveillance data, where it accurately detects small, closely separated subjects, a task where existing methods are significantly slower and highly prone to spurious detections due to not accounting for spatial dependence. A second application on 3D fibrous media is deferred to the Appendix.

2603.27535 2026-03-31 stat.ME math.ST stat.TH

Extension of coupling via the Projection of Optimal Transport

Jakwang Kim, Young-Heon Kim, Chan Park

Comments 32 pages, 5 tables

Details
English abstract

In many statistical settings, two types of data are available: coupled data, which preserve the joint structure among variables but are limited in size due to cost or privacy constraints, and marginal data, which are available at larger scales but lack joint structure. Since standard methods require coupled data, marginal information is often discarded. We propose a fully nonparametric procedure that integrates decoupled marginal data with a limited amount of coupled data to improve the downstream analysis. The approach can be understood as an extension of coupling via projection in optimal transport. Specifically, the estimator is a solution of the optimal transport projection over the space of probability measures, which provides a natural geometric interpretation. We establish its stability and derive its sample complexity using recent advances in statistical optimal transport. In addition, we present an explicit formula for it based on the "shadow," a notion introduced by Eckstein and Nutz. Furthermore, the estimator can be approximated in almost linear time and in parallel by the entropic shadow, which demonstrates the theoretical and practical strengths of our methods. Lastly, we present experiments with real and synthetic data to justify the performance of our method.

2603.27502 2026-03-31 stat.AP

Cristiano Ronaldo or Lionel Messi, who is more consistent in scoring goals? The evidence from CFM exploratory analysis

Samsul Anwar, Siti Munawarah, Radhiah Radhiah

Comments 9 pages, 4 tables, 3 figures

Details
English abstract

The rivalry between the two football superstars Cristiano Ronaldo and Lionel Messi has always been a subject of extensive discussion. This study aimed to compare the two players' consistency in scoring goals in six ways: right-footed kicks, left-footed kicks, penalty kicks, direct free kicks, long-range kicks, and headers. The data analyzed were the times (in minutes) each player took to score a goal in every match they played. The data were obtained from the football website Transfermarkt.com. Competing Failure Modes (CFM) analysis was used to measure the reliability of the two players in scoring goals in each of these ways. The results of the CFM exploratory analysis showed that Ronaldo and Messi had the same level of consistency in scoring goals over more than 17 years of their professional football careers. Both have been among the most talented players of the modern football era, with individual and team achievements far above those of other footballers around the world.

2603.27487 2026-03-31 stat.ME stat.CO

Robust regularized covariance matrix estimation: well-posedness and convergent algorithm

Mengxi Yi, David Tyler

Details
English abstract

In this paper, we study properties of penalized and structured M-estimators of multivariate scatter, based on geodesically convex but not necessarily smooth penalty functions. Existence and uniqueness conditions for these penalized and structured estimators are given. However, we show that the standard fixed-point algorithm which is usually applied to an M-estimation problem does not necessarily converge for penalized M-estimation problems. Hence, we develop a new but simple re-weighting algorithm and prove that it has monotone convergence for a broad class of penalized and structured M-estimators of multivariate scatter.

2603.27486 2026-03-31 cs.CV stat.AP

Estimating the Impact of COVID-19 on Travel Demand in Houston Area Using Deep Learning and Satellite Imagery

Alekhya Pachika, Lu Gao, Lingguang Song, Pan Lu, Xingju Wang

Details
Journal ref
International Conference on Transportation and Development 2023 (pp. 437-444)
English abstract

Considering recent advances in remote sensing satellite systems and computer vision algorithms, many satellite sensing platforms and sensors have been used to monitor the condition and usage of transportation infrastructure systems. The level of detail that can be detected increases significantly as the ground sample distance (GSD) decreases; the GSD is around 15-30 cm for high-resolution satellite images. In this study, we analyzed data acquired from high-resolution satellite imagery to provide insights, predictive signals, and trends for travel demand estimation. More specifically, we estimate the impact of COVID-19 in the metropolitan area of Houston using satellite imagery from Google Earth Engine datasets. We developed a car-counting model using Detectron2 and Faster R-CNN to monitor the presence of cars at different locations (i.e., university, shopping mall, community plaza, restaurant, supermarket) before and during the COVID-19 pandemic. The results show that the number of cars detected at these selected locations fell by 30% on average in 2020 compared with 2019. The results also show that satellite imagery provides rich information for travel demand and economic activity estimation. Together with advanced computer vision and deep learning algorithms, it can generate reliable and accurate information for transportation agency decision makers.

2603.27463 2026-03-31 stat.ME

Multivariate Gaussian process emulation for multifidelity computer models with high-dimensional spatial outputs

Cyrus S. McCrimmon, Pulong Ma

Details
English abstract

Risk assessment of hurricane-driven storm surge relies on deterministic computer models that produce outputs over a large spatial domain. The surge models can often be run at a range of fidelity levels, with greater precision yielding more accurate simulations. Improved accuracy comes with a significant increase in computational expense, necessitating the development of an emulator which leverages information from the more plentiful low-fidelity outputs to provide fast and accurate predictions of high-fidelity simulations. To properly assess the risk of storm surge over a geographic region at aggregated spatial resolution, an emulator must account for spatial dependence between outputs yet remain computationally feasible for high-dimensional simulations. To address this challenge, we exploit the autoregressive cokriging framework to develop two cross-covariance structures to account for spatial dependence. One approach uses a separable covariance structure with a sparse Cholesky prior for the inverse of the cross-covariance matrix; the other involves a low-rank approximation via basis representations. We demonstrate their predictive performance in the storm surge application and a testbed example.

2603.27458 2026-03-31 stat.ME

Extreme Value Inference for CoVaR and Systemic Risk

Xiaoting Li, Harry Joe

Details
English abstract

We develop an extreme value framework for CoVaR centered on $v(q \mid p ; C)$, the copula-adjusted probability level, or equivalently, the CoVaR on the uniform (0,1) scale. We characterize the possible tail regimes of $v(q \mid p ; C)$ through the limit behavior of the copula conditional distribution and show that these regimes are determined by the joint tail expansions of the copula. This leads to tractable conditions for identifying the tail regime and deriving the asymptotic behavior of $v(q \mid p ; C)$. Building on this characterization, we propose a minimum-distance estimation approach for CoVaR that accommodates multiple tail regimes. The methodology links CoVaR and $Δ$CoVaR to the underlying joint tail behavior, thereby providing a clear interpretation of these measures in systemic risk analysis. An empirical analysis across U.S. sectors demonstrates the practical value of the approach for assessing systemic risk contributions and exposures with important implications for macroprudential surveillance and risk management.

2603.27457 2026-03-31 math.ST stat.ME stat.ML stat.TH

Optimal Demixing of Nonparametric Densities

Jianqing Fan, Zheng Tracy Ke, Zhaoyang Shi

Details
English abstract

Motivated by applications in statistics and machine learning, we consider the problem of unmixing convex combinations of nonparametric densities. Suppose we observe $n$ groups of samples, where the $i$th group consists of $N_i$ independent samples from a $d$-variate density $f_i(x)=\sum_{k=1}^K π_i(k)g_k(x)$. Here, each $g_k(x)$ is a nonparametric density, and each $π_i$ is a $K$-dimensional mixed membership vector. We aim to estimate $g_1(x), \ldots,g_K(x)$. This problem generalizes topic modeling from discrete to continuous variables and finds applications in LLMs with word embeddings. In this paper, we propose an estimator for the above problem, which modifies the classical kernel density estimator by assigning group-specific weights that are computed by topic modeling on histogram vectors and de-biased by U-statistics. For any $β>0$, assuming that each $g_k(x)$ is in the Nikol'ski class with smoothness parameter $β$, we show that the sum of integrated squared errors of the constructed estimators has a convergence rate that depends on $n$, $K$, $d$, and the per-group sample size $N$. We also provide a matching lower bound, which suggests that our estimator is rate-optimal.

2603.27415 2026-03-31 cs.AI stat.CO

Greedy Is a Strong Default: Agents as Iterative Optimizers

Yitao Li

Details
English abstract

Classical optimization algorithms (hill climbing, simulated annealing, population-based methods) generate candidate solutions via random perturbations. We replace the random proposal generator with an LLM agent that reasons about evaluation diagnostics to propose informed candidates, and ask: does the classical optimization machinery still help when the proposer is no longer random? We evaluate on four tasks spanning discrete, mixed, and continuous search spaces (all replicated across 3 independent runs): rule-based classification on Breast Cancer (test accuracy 86.0% to 96.5%), mixed hyperparameter optimization for MobileNetV3-Small on STL-10 (84.5% to 85.8%, zero catastrophic failures vs. 60% for random search), LoRA fine-tuning of Qwen2.5-0.5B on SST-2 (89.5% to 92.7%, matching Optuna TPE with 2x efficiency), and XGBoost on Adult Census (AUC 0.9297 to 0.9317, tying CMA-ES with 3x fewer evaluations). Empirically, on these tasks, a cross-task ablation shows that simulated annealing, parallel investigators, and even a second LLM (OpenAI Codex) provide no benefit over greedy hill climbing while requiring 2-3x more evaluations. In our setting, the LLM's learned prior appears strong enough that acceptance-rule sophistication has limited impact: round 1 alone delivers the majority of improvement, and variants converge to similar configurations across strategies. The practical implication is surprising simplicity: greedy hill climbing with early stopping is a strong default. Beyond accuracy, the framework produces human-interpretable artifacts: the discovered cancer classification rules independently recapitulate established cytopathology principles.
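The greedy hill climbing with early stopping that this abstract recommends as a default can be sketched in a few lines. This is a generic illustration under stated assumptions, not the paper's framework: `greedy_hill_climb`, `propose`, `evaluate`, and the `patience` parameter are hypothetical names, and the proposer here is a random perturbation standing in for the LLM agent.

```python
import random

def greedy_hill_climb(initial, propose, evaluate, max_rounds=50, patience=5):
    """Keep a candidate only if it strictly improves the score; stop early
    after `patience` consecutive rounds without improvement."""
    best, best_score = initial, evaluate(initial)
    stale = 0
    for _ in range(max_rounds):
        candidate = propose(best, best_score)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, best_score

# Toy objective with its maximum at x = 3. The proposer is a stand-in:
# it perturbs the incumbent at random instead of reasoning about diagnostics.
rng = random.Random(0)
evaluate = lambda x: -(x - 3.0) ** 2
propose = lambda x, score: x + rng.uniform(-1.0, 1.0)

best_x, best_val = greedy_hill_climb(0.0, propose, evaluate)
```

The acceptance rule is the only difference from simulated annealing, which would also accept some worse candidates with a temperature-dependent probability.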

2603.27414 2026-03-31 math.ST cs.AI stat.TH

Multiple-Prediction-Powered Inference

Charlie Cowen-Breen, Alekh Agarwal, Stephen Bates, William W. Cohen, Jacob Eisenstein, Amir Globerson, Adam Fisch

Comments ICLR 2026, 45 pages, 17 figures

Details
English abstract

Statistical estimation often involves tradeoffs between expensive, high-quality measurements and a variety of lower-quality proxies. We introduce Multiple-Prediction-Powered Inference (MultiPPI): a general framework for constructing statistically efficient estimates by optimally allocating resources across these diverse data sources. This work provides theoretical guarantees about the minimax optimality, finite-sample performance, and asymptotic normality of the MultiPPI estimator. Through experiments across three diverse large language model (LLM) evaluation scenarios, we show that MultiPPI consistently achieves lower estimation error than existing baselines. This advantage stems from its budget-adaptive allocation strategy, which strategically combines subsets of models by learning their complex cost and correlation structures.
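For orientation, the single-proxy prediction-powered mean estimator that this line of work builds on can be sketched as follows. This is the classical one-predictor form, not the MultiPPI estimator or its budget allocation; the function name `ppi_mean` and the simulated data are assumptions made for the example.

```python
import random
from statistics import fmean

def ppi_mean(labeled_y, labeled_pred, unlabeled_pred):
    """Prediction-powered mean estimate: the proxy mean on unlabeled data
    plus a bias correction ("rectifier") estimated on the labeled data."""
    rectifier = fmean([y - f for y, f in zip(labeled_y, labeled_pred)])
    return fmean(unlabeled_pred) + rectifier

rng = random.Random(0)

def draw(n):
    """Simulate true outcomes with mean 1 and a proxy biased upward by 0.5."""
    ys = [rng.gauss(1.0, 1.0) for _ in range(n)]
    preds = [y + 0.5 + rng.gauss(0.0, 0.2) for y in ys]
    return ys, preds

labeled_y, labeled_pred = draw(200)    # small, expensive labeled set
_, unlabeled_pred = draw(5000)         # large, cheap proxy-only set
estimate = ppi_mean(labeled_y, labeled_pred, unlabeled_pred)
naive = fmean(unlabeled_pred)          # proxy mean alone inherits the bias
```

The rectifier removes the proxy's systematic bias, while the large unlabeled sample keeps the variance of the first term small.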

2603.27395 2026-03-31 math.DS math.AT stat.ML

Topological Detection of Hopf Bifurcations via Persistent Homology: A Functional Criterion from Time Series

Jhonathan Barrios, Yásser Echávez, Carlos F. Álvarez

Comments 19 pages, 10 figures, submitted

Details
English abstract

We propose a topological framework for the detection of Hopf bifurcations directly from time series, based on persistent homology applied to phase space reconstructions via Takens embedding within the framework of Topological Data Analysis. The central idea is that changes in the dynamical regime are reflected in the emergence or disappearance of dominant one-dimensional homological features in the reconstructed attractor. To quantify this behavior, we introduce a simple and interpretable scalar topological functional defined as the maximum persistence of homology classes in dimension one. This functional is used to construct a computable criterion for identifying critical parameters in families of dynamical systems without requiring knowledge of the underlying equations. The proposed approach is validated on representative systems of increasing complexity, showing consistent detection of the bifurcation point. The results support the interpretation of dynamical transitions as topological phase transitions and demonstrate the potential of topological data analysis as a model-free tool for the quantitative analysis of nonlinear time series.
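The phase space reconstruction step mentioned in this abstract can be illustrated with a minimal delay (Takens) embedding. The sketch covers the embedding only; computing persistent homology on the resulting point cloud would need a TDA library and is not shown. The function name, the embedding dimension, and the delay are assumptions chosen for the example.

```python
import math

def takens_embedding(series, dim, delay):
    """Map a scalar series to delay vectors
    (x[t], x[t + delay], ..., x[t + (dim - 1) * delay])."""
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this dim/delay")
    return [tuple(series[t + k * delay] for k in range(dim))
            for t in range(n_vectors)]

# A sine wave sampled 100 times per period. With a quarter-period delay the
# embedded points are (sin, cos) pairs, so they lie on the unit circle.
series = [math.sin(2 * math.pi * t / 100) for t in range(400)]
cloud = takens_embedding(series, dim=2, delay=25)
radii = [math.hypot(x, y) for x, y in cloud]
```

A periodic signal embeds as a closed loop, which is the kind of one-dimensional homological feature whose persistence the abstract uses as a detection statistic.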

2603.27370 2026-03-31 math.OC math.PR math.ST q-fin.RM stat.ML stat.TH

The Risk Quadrangle in Optimization: An Overview with Recent Results and Extensions

Bogdan Grechuk, Anton Malandii, Terry Rockafellar, Stan Uryasev

Details
English abstract

This paper revisits and extends the 2013 development by Rockafellar and Uryasev of the Risk Quadrangle (RQ) as a unified scheme for integrating risk management, optimization, and statistical estimation. The RQ features four stochastics-oriented functionals (risk, deviation, regret, and error), along with an associated statistic, and articulates their revealing and in some ways surprising interrelationships and dualizations. Additions to the RQ framework that have come to light since 2013 are reviewed in a synthesis focused on both theoretical advancements and practical applications. New quadrangles (superquantile, superquantile norm, expectile, biased mean, quantile symmetric average union, and $φ$-divergence-based quadrangles) offer novel approaches to risk-sensitive decision-making across various fields such as machine learning, statistics, finance, and PDE-constrained optimization. The theoretical contribution comes in axioms for "subregularity" relaxing "regularity" of the quadrangle functionals, which is too restrictive for some applications. The main RQ theorems and connections are revisited and rigorously extended to this more ample framework. Examples are provided in portfolio optimization, regression, and classification, demonstrating the advantages and the role played by duality, especially in ties to robust optimization and generalized stochastic divergences.

2603.27350 2026-03-31 stat.OT

Network Evolution and National Interests: Global Scientific Reorganization and the Rise of Scientific Nationalism

Caroline Wagner, Xiaojing Cai

Comments Nine figures plus an appendix

Details
English abstract

The global network of scientific cooperation has undergone major restructuring over the past two decades, with important implications for geopolitics and science policy. China's integration into this network has redistributed positions of influence in ways that challenge zero-sum views of national competition and security. Drawing on structural holes theory and the Bianconi-Barabasi fitness model, we argue that China's entry accelerated an ongoing process of network maturation. As China's scientific capacity expanded, it formed direct collaborations that reduced reliance on U.S. intermediation. Network analysis shows a large decline in U.S. betweenness centrality, while weighted measures remain stable, indicating a loss of brokerage advantages but continued strong bilateral ties. Granger causality tests suggest that China's early participation predicted later structural changes across fields. Results are consistent across six major domains.

2603.27320 2026-03-31 stat.ME stat.ML

Retrospective Counterfactual Prediction by Conditioning on the Factual Outcome: A Cross-World Approach

Juraj Bodik

Details
English abstract

Retrospective causal questions ask what would have happened to an observed individual had they received a different treatment. We study the problem of estimating $μ(x,y)=\mathbb{E}[Y(1)\mid X=x,Y(0)=y]$, the expected counterfactual outcome for an individual with covariates $x$ and observed outcome $y$, and constructing valid prediction intervals under the Neyman-Rubin superpopulation model. This quantity is generally not identified without additional assumptions. To link the observed and unobserved potential outcomes, we work with a cross-world correlation $ρ(x)=\operatorname{cor}(Y(1),Y(0)\mid X=x)$; plausible bounds on $ρ(x)$ enable a principled approach to this otherwise unidentified problem. We introduce retrospective counterfactual estimators $\hat{μ}_ρ(x,y)$ and prediction intervals $C_ρ(x,y)$ that asymptotically satisfy $P[Y(1)\in C_ρ(x,y)\mid X=x, Y(0)=y]\ge 1-α$ under standard causal assumptions. Many common baselines implicitly correspond to endpoint choices $ρ=0$ or $ρ=1$ (ignoring the factual outcome or treating the counterfactual as a shifted factual outcome). Interpolating between these cases through cross-world dependence yields substantial gains in both theory and practice.
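To see why the endpoint choices of $ρ$ behave as described, a worked special case may help. It assumes, purely for illustration and not as a claim about the paper's model, that $(Y(0),Y(1))$ is bivariate Gaussian given $X=x$, with conditional means $μ_0(x)$, $μ_1(x)$ and standard deviations $σ_0(x)$, $σ_1(x)$; these symbols are introduced here and do not appear in the abstract. The standard Gaussian conditioning formula then gives

$$
\mathbb{E}[Y(1)\mid X=x,\,Y(0)=y] \;=\; \mu_1(x) + \rho(x)\,\frac{\sigma_1(x)}{\sigma_0(x)}\,\bigl(y-\mu_0(x)\bigr).
$$

With $ρ(x)=0$ this reduces to $μ_1(x)$, which ignores the factual outcome $y$. With $ρ(x)=1$ and $σ_1(x)=σ_0(x)$ it reduces to $y + μ_1(x) - μ_0(x)$, the factual outcome shifted by the conditional average treatment effect.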

2603.27293 2026-03-31 stat.ME

Bayesian factorization via $L_{1/2}$ shrinkage

Shicheng Liu, Qingping Zhou, Yanan Fan, Xiongwen Ke

Details
English abstract

Factor models are widely used for dimension reduction. Bayesian approaches to these models often place a prior on the factor loadings that allows for infinitely many factors, with loadings increasingly shrunk toward zero as the column index increases. However, existing increasing shrinkage priors often possess complex hierarchical structures that complicate posterior inference. To address this issue, we propose using an $L_{1/2}$ shrinkage prior. We demonstrate that by carefully setting the parameters in the hyperprior of its global shrinkage parameters, the increasing shrinkage property is preserved. Our prior specification is simple, facilitating the construction of an efficient Gibbs sampler for exact posterior inference. For faster computation, we also propose a variational approximation algorithm. Through numerical studies, we compare our approaches with currently popular Bayesian methods for factor models, demonstrating their merits in terms of accuracy and computational efficiency.

2603.27276 2026-03-31 stat.AP cs.MS stat.CO

PyINLA: Fast Bayesian Inference for Latent Gaussian Models in Python

Esmail Abdul Fattah, Elias Krainski, Havard Rue

Comments 41 pages, 9 figures

Details
English abstract

Bayesian inference often relies on Markov chain Monte Carlo (MCMC) methods, which are particularly needed for non-Gaussian data families. For complex hierarchical models, MCMC can be computationally demanding in workflows that require repeated model fitting, or when fitting high-dimensional models on limited hardware. Integrated Nested Laplace Approximations (INLA) is a deterministic alternative for models with non-Gaussian data that belong to the class of latent Gaussian models (LGMs), yielding accurate approximations to posterior marginals in many applied settings. The INLA method was implemented in C as a standalone program, inla, that is widely used in R through the INLA package. This paper introduces PyINLA, a dedicated Python package that provides a Pythonic interface directly to the inla program. PyINLA thus enables specifying LGMs, running INLA-based inference, and accessing posterior summaries directly from Python while leveraging the established INLA implementation. We describe the package design and illustrate its use on representative models, including generalized linear mixed models, time series forecasting, disease mapping, and geostatistical prediction, demonstrating how deterministic Bayesian inference can be performed in Python using INLA in a way that integrates naturally with common scientific computing workflows.

2603.27270 2026-03-31 cs.AI stat.ML

Quantification of Credal Uncertainty: A Distance-Based Approach

Xabier Gonzalez-Garcia, Siu Lun Chau, Julian Rodemann, Michele Caprio, Krikamol Muandet, Humberto Bustince, Sébastien Destercke, Eyke Hüllermeier, Yusuf Sale

Details
English abstract

Credal sets, i.e., closed convex sets of probability measures, provide a natural framework to represent aleatoric and epistemic uncertainty in machine learning. Yet how to quantify these two types of uncertainty for a given credal set, particularly in multiclass classification, remains underexplored. In this paper, we propose a distance-based approach to quantify total, aleatoric, and epistemic uncertainty for credal sets. Concretely, we introduce a family of such measures within the framework of Integral Probability Metrics (IPMs). The resulting quantities admit clear semantic interpretations, satisfy natural theoretical desiderata, and remain computationally tractable for common choices of IPMs. We instantiate the framework with the total variation distance and obtain simple, efficient uncertainty measures for multiclass classification. In the binary case, this choice recovers established uncertainty measures, for which a principled multiclass generalization has so far been missing. Empirical results confirm practical usefulness, with favorable performance at low computational cost.

2603.27265 2026-03-31 stat.ME math.ST stat.TH

Robust Estimation in Step-Stress Experiments under Weibull Lifetime Distributions

María Jaenada, Juan Millán, Leandro Pardo

Comments 24 pages (without Appendix), 7 figures, 7 tables

Details
English abstract

Many modern products are highly reliable, often exhibiting long lifetimes. As a result, collecting sufficient failure data for robust statistical inference under normal operating conditions can be prohibitively time-consuming. Accelerated life tests (ALTs) offer a practical solution by inducing earlier failures, thereby reducing the required testing time. In step-stress experiments, a stress factor that accelerates product degradation is identified and systematically increased at predetermined time points, while remaining constant between them. Failure data collected under these elevated stress levels are analyzed, and the results are then extrapolated to normal operating conditions. Traditional estimation methods for such data, such as the maximum likelihood estimator (MLE), are highly efficient under ideal conditions but can be severely affected by outlying or contaminated observations. To address this, we propose the use of Minimum Density Power Divergence Estimators (MDPDEs) as a robust alternative, offering a balanced trade-off between efficiency and resistance to contamination. The MDPDE framework is extended to mixed distributions and its theoretical properties, including the asymptotic distribution of the model parameters, are derived assuming Weibull lifetimes. The effectiveness of the proposed approach is illustrated through extensive simulation studies, and its practical applicability is further demonstrated using real-world data.

2603.27189 2026-03-31 stat.ME cs.LG stat.ML

Conformal Prediction Assessment: A Framework for Conditional Coverage Evaluation and Selection

Zheng Zhou, Xiangfei Zhang, Chongguang Tao, Yuhong Yang

Details
English abstract

Conformal prediction provides rigorous distribution-free finite-sample guarantees for marginal coverage under the assumption of exchangeability, but may exhibit systematic undercoverage or overcoverage for specific subpopulations. Assessing conditional validity is challenging, as standard stratification methods suffer from the curse of dimensionality. We propose Conformal Prediction Assessment (CPA), a framework that reframes the evaluation of conditional coverage as a supervised learning task by training a reliability estimator that predicts instance-level coverage probabilities. Building on this estimator, we introduce the Conditional Validity Index (CVI), which decomposes reliability into safety (undercoverage risk) and efficiency (overcoverage cost). We establish convergence rates for the reliability estimator and prove the consistency of CVI-based model selection. Extensive experiments on synthetic and real-world datasets demonstrate that CPA effectively diagnoses local failure modes and that CC-Select, our CVI-based model selection algorithm, consistently identifies predictors with superior conditional coverage performance.
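The marginal coverage guarantee that this abstract starts from can be illustrated with a minimal split conformal sketch. It shows only the standard split conformal interval with absolute-residual scores, not the CPA reliability estimator, the CVI, or CC-Select; the function name, the stand-in model `predict`, and the simulated data are hypothetical.

```python
import math
import random

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected quantile of the calibration scores:
    the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

rng = random.Random(0)
predict = lambda x: 2.0 * x  # stand-in for an already fitted model

def draw(n):
    """Simulate (x, y) pairs with y = 2x + Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.3)) for x in xs]

calibration, test = draw(500), draw(2000)
# The interval for a new x is [predict(x) - q, predict(x) + q].
q = conformal_quantile([abs(y - predict(x)) for x, y in calibration], alpha=0.1)
coverage = sum(abs(y - predict(x)) <= q for x, y in test) / len(test)
```

The empirical coverage computed this way is an average over all test points, so it can sit near the nominal 90% even when particular subpopulations are systematically under- or over-covered, which is the gap that conditional coverage assessment targets.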

2603.27171 2026-03-31 math.ST math.DG stat.TH

Estimation of Riemannian Quantities from Noisy Data via Density Derivatives

Junhao Chen, Ruowei Li, Zhigang Yao

Comments 48 pages, 8 figures

详情
英文摘要

We study the recovery of geometric structure from data generated by convolving the uniform measure on a smooth compact submanifold $M\subset\mathbb{R}^D$ with ambient Gaussian noise. Our main result is that several fundamental Riemannian quantities of $M$, including tangent spaces, the intrinsic dimension, and the second fundamental form, are identifiable from derivatives of the noisy density. We first derive uniform small-noise expansions of the data density and its derivatives in a tubular neighborhood of $M$. These expansions show that, at the population level, tangent spaces can be recovered from the density Hessian with $O(σ^2)$ error, while the intrinsic dimension can be estimated consistently. We further construct estimators for the second fundamental form from density derivatives, obtaining $O(d(y,M)+σ)$ and $O(d(y,M)+σ^2)$ errors for hypersurfaces and submanifolds with arbitrary codimension. At the sample level, we estimate the density and its derivatives by kernel methods in the ambient space and plug them into the population constructions, yielding uniform nonparametric rates in the ambient dimension. Finally, we show that these density-based constructions admit a geometric interpretation through density-induced ambient metrics, linking the geometry of $M$ to ambient geodesic structure.

2603.27137 2026-03-31 math.NA cs.NA stat.ML

A Mean Field Games Perspective on Evolutionary Clustering

Alessio Basti, Fabio Camilli, Adriano Festa

Details
English abstract

We propose a control-theoretic framework for evolutionary clustering based on Mean Field Games (MFG). Moving beyond static or heuristic approaches, we formulate the problem as a population dynamics game governed by a coupled Hamilton-Jacobi-Bellman and Fokker-Planck system. Driven by a variational cost functional rather than predefined statistical shapes, this continuous-time formulation provides a flexible basis for non-parametric cluster evolution. To validate the framework, we analyze the setting of time-dependent Gaussian mixtures, showing that the MFG dynamics recover the trajectories of the classical Expectation-Maximization (EM) algorithm while ensuring mass conservation. Furthermore, we introduce time-averaged log-likelihood functionals to regularize temporal fluctuations. Numerical experiments illustrate the stability of our approach and suggest a path toward more general non-parametric clustering applications where traditional EM methods may face limitations.

2603.27135 2026-03-31 cs.LG stat.ML

Spectral-Aware Text-to-Time Series Generation with Billion-Scale Multimodal Meteorological Data

Shijie Zhang

Comments Accepted by IJCNN 2026 (WCCI)

Details
English abstract

Text-to-time-series generation is particularly important in meteorology, where natural language offers intuitive control over complex, multi-scale atmospheric dynamics. Existing approaches are constrained by the lack of large-scale, physically grounded multimodal datasets and by architectures that overlook the spectral-temporal structure of weather signals. We address these challenges with a unified framework for text-guided meteorological time-series generation. First, we introduce MeteoCap-3B, a billion-scale weather dataset paired with expert-level captions constructed via a Multi-agent Collaborative Captioning (MACC) pipeline, yielding information-dense and physically consistent annotations. Building on this dataset, we propose MTransformer, a diffusion-based model that enables precise semantic control by mapping textual descriptions into multi-band spectral priors through a Spectral Prompt Generator, which guides generation via frequency-aware attention. Extensive experiments on real-world benchmarks demonstrate state-of-the-art generation quality, accurate cross-modal alignment, strong semantic controllability, and substantial gains in downstream forecasting under data-sparse and zero-shot settings. Additional results on general time-series benchmarks indicate that the proposed framework generalizes beyond meteorology.

2603.27114 2026-03-31 cs.LG stat.ME

Maximin Learning of Individualized Treatment Effect on Multi-Domain Outcomes

Yuying Lu, Wenbo Fei, Yuanjia Wang, Molei Liu

Abstract

Precision mental health requires treatment decisions that account for heterogeneous symptoms reflecting multiple clinical domains. However, existing methods for estimating individualized treatment effects (ITE) rely on a single summary outcome or a specific set of observed symptoms or measures, which are sensitive to symptom selection and limit generalizability to unmeasured yet clinically relevant domains. We propose DRIFT, a new maximin framework for estimating robust ITEs from high-dimensional item-level data by leveraging latent factor representations and adversarial learning. DRIFT learns latent constructs via generalized factor analysis, then constructs an anchored on-target uncertainty set that extrapolates beyond the observed measures to approximate the broader hyper-population of potential outcomes. By optimizing worst-case performance over this uncertainty set, DRIFT yields ITEs that are robust to underrepresented or unmeasured domains. We further show that DRIFT is invariant to admissible reparameterizations of the latent factors and admits a closed-form maximin solution, with theoretical guarantees for identification and convergence. In analyses of a randomized controlled trial for major depressive disorder (EMBARC), DRIFT demonstrates superior performance and improved generalizability to external multi-domain outcomes, including side effects and self-reported symptoms not used during training.

2603.27113 2026-03-31 cs.LG cond-mat.mtrl-sci stat.ML

Hierarchy-Guided Topology Latent Flow for Molecular Graph Generation

Urvi Awasthi, Alexander Arjun Lobo, Leonid Zhukov

Comments 22 pages, 2 figures, 6 tables. Accepted to ICLR 2026 AI4Mat Workshop

Abstract

Generating chemically valid 3D molecules is hindered by discrete bond topology: small local bond errors can cause global failures (valence violations, disconnections, implausible rings), especially for drug-like molecules with long-range constraints. Many unconditional 3D generators emphasize coordinates and then infer bonds or rely on post-processing, leaving topology feasibility weakly controlled. We propose Hierarchy-Guided Latent Topology Flow (HLTF), a planner-executor model that generates bond graphs with 3D coordinates, using a latent multi-scale plan for global context and a constraint-aware sampler to suppress topology-driven failures. On QM9, HLTF achieves 98.8% atom stability and 92.9% valid-and-unique, improving PoseBusters validity to 94.0% (+0.9 over the strongest reported baseline). On GEOM-DRUGS, HLTF attains 85.5%/85.0% validity/valid-unique-novel without post-processing and 92.2%/91.2% after standardized relaxation, within 0.9 points of the best post-processed baseline. Explicit topology generation also reduces "false-valid" samples that pass RDKit sanitization but fail stricter checks.

2603.27085 2026-03-31 stat.ME

Model-free Feature Screening via Revised Chatterjee's Rank Correlation for Ultra-high Dimensional Censored Data

Shuya Chen, Heng Peng, Min Zhou

Abstract

In large-scale biomedical research, it's common to gather ultra-high dimensional data that includes right-censored survival times. Feature screening has emerged as a crucial statistical technique for handling such data. In this paper, we introduce a straightforward and robust feature screening approach, leveraging the modified Chatterjee's rank correlation, suitable for a broad range of survival models. With reasonably mild regularity assumptions, we establish the properties of sure screening and ranking consistency. The computation involved in our proposed method is quite direct and simple. Through simulation studies and real gene expression data analysis, we demonstrate the superior efficacy of our proposed approach.
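
As a rough illustration of rank-correlation screening, the sketch below computes the standard (uncensored, tie-free) Chatterjee coefficient and ranks features by it. It is not the revised statistic for right-censored data described in the abstract, whose definition is not given here; the function names `chatterjee_xi` and `screen_features` are hypothetical.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Standard Chatterjee rank correlation xi_n, assuming no ties in x or y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    order = np.argsort(x, kind="stable")            # sort the pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1    # ranks of y in that order
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

def screen_features(X, y, top_k):
    """Score every column of X against y and return the top_k column indices."""
    scores = np.array([chatterjee_xi(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-scores)[:top_k], scores
```

Because the coefficient depends only on ranks, the screening step needs no model for how the response depends on a feature, which is what makes it model-free.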

2603.27074 2026-03-31 stat.AP cs.IT cs.LG math.IT stat.ML

Forecastability as an Information-Theoretic Limit on Prediction

Peter Maurice Catt

Abstract

Forecasting is usually framed as a problem of model choice. This paper starts earlier, asking how much predictive information is available at each horizon. Under logarithmic loss, the answer is exact: the mutual information between the future observation and the declared information set equals the maximum achievable reduction in expected loss. This paper develops the consequences of that identity. Forecastability, defined as this mutual information evaluated across horizons, forms a profile whose shape reflects the dependence structure of the process and need not be monotone. Three structural properties are derived: compression of the information set can only reduce forecastability; the gap between the profile under a finite lag window and the full history gives an exact truncation error budget; and for processes with periodic dependence, the profile inherits the periodicity. Predictive loss decomposes into an irreducible component fixed by the information structure and an approximation component attributable to the method; their ratio defines the exploitation ratio, a normalised diagnostic for method adequacy. The exact equality is specific to log loss, but when forecastability is near zero, classical inequalities imply that no method under any loss can materially improve on the unconditional baseline. The framework provides a theoretical foundation for assessing, prior to any modelling, whether the declared information set contains sufficient predictive information at the horizon of interest.
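
The identity stated above can be checked numerically on a small discrete example. The sketch below assumes a joint probability table with rows indexing the information set and columns indexing the future observation, with every row having positive mass; the function names are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """I(X; Y) in nats for a joint probability table (rows: X, columns: Y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

def expected_log_loss_reduction(joint):
    """Best unconditional log loss H(Y) minus best conditional log loss H(Y|X)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    h_y = -(py[py > 0] * np.log(py[py > 0])).sum()
    cond = joint / px[:, None]          # p(y | x), assumes every px > 0
    mask = joint > 0
    h_y_given_x = -(joint[mask] * np.log(cond[mask])).sum()
    return float(h_y - h_y_given_x)

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint), expected_log_loss_reduction(joint))
```

The two printed numbers agree up to floating-point error, and both fall to zero when the table is a product of its marginals, which is the "no forecastability" case.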

2603.27072 2026-03-31 stat.ML cs.LG

On the Loss Landscape Geometry of Regularized Deep Matrix Factorization: Uniqueness and Sharpness

Anil Kamber, Rahul Parhi

Comments 32 pages, 3 figures

Abstract

Weight decay is ubiquitous in training deep neural network architectures. Its empirical success is often attributed to capacity control; nonetheless, our theoretical understanding of its effect on the loss landscape and the set of minimizers remains limited. In this paper, we show that $\ell^2$-regularized deep matrix factorization/deep linear network training problems with squared-error loss admit a unique end-to-end minimizer for all target matrices subject to factorization, except for a set of Lebesgue measure zero formed by the depth and the regularization parameter. This observation reveals fundamental properties of the loss landscape of regularized deep matrix factorization problems: the Hessian spectrum is constant across all minimizers of the regularized deep scalar factorization problem with squared-error loss. Moreover, we show that, in regularized deep matrix factorization problems with squared-error loss, if the target matrix does not belong to the Lebesgue measure-zero set, then the Frobenius norm of each layer is constant across all minimizers. This, in turn, yields a global lower bound on the trace of the Hessian evaluated at any minimizer of the regularized deep matrix factorization problem. Furthermore, we establish a critical threshold for the regularization parameter above which the unique end-to-end minimizer collapses to zero.

2603.27062 2026-03-31 cs.LG stat.ML

Conformalized Signal Temporal Logic Inference under Covariate Shift

Yixuan Wang, Danyang Li, Matthew Cleaveland, Roberto Tron, Mingyu Cai

Abstract

Signal Temporal Logic (STL) inference learns interpretable logical rules for temporal behaviors in dynamical systems. To ensure the correctness of learned STL formulas, recent approaches have incorporated conformal prediction as a statistical tool for uncertainty quantification. However, most existing methods rely on the assumption that calibration and testing data are identically distributed and exchangeable, an assumption that is frequently violated in real-world settings. This paper proposes a conformalized STL inference framework that explicitly addresses covariate shift between training and deployment trajectory datasets. From a technical standpoint, the approach first employs a template-free, differentiable STL inference method to learn an initial model, and subsequently refines it using a limited deployment side dataset to promote distribution alignment. To provide validity guarantees under distribution shift, the framework estimates the likelihood ratio between training and deployment distributions and integrates it into an STL-robustness-based weighted conformal prediction scheme. Experimental results on trajectory datasets demonstrate that the proposed framework preserves the interpretability of STL formulas while significantly improving symbolic learning reliability at deployment time.
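
A generic weighted conformal quantile, of the kind used under covariate shift, might look like the sketch below. It is not the paper's STL-robustness-based scheme: the nonconformity scores and the likelihood-ratio weights are assumed to be given, and the function name `weighted_conformal_quantile` is hypothetical.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted calibration scores.

    scores      : nonconformity scores on the calibration set
    weights     : estimated likelihood ratios at the calibration points
    test_weight : estimated likelihood ratio at the test point, whose
                  probability mass is placed at +infinity
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    total = weights.sum() + test_weight
    cum = np.cumsum(weights[order]) / total
    # Small tolerance guards against floating-point round-off in the cumsum.
    idx = np.searchsorted(cum, 1.0 - alpha - 1e-12, side="left")
    if idx >= len(scores):
        return np.inf           # not enough calibration mass: trivial threshold
    return scores[order][idx]
```

With all weights equal, this reduces to the usual split-conformal quantile; a large `test_weight` relative to the calibration weights returns an infinite threshold, signalling that the test point lies where calibration data are scarce.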

2603.27039 2026-03-31 stat.ME

Measuring Human Behavior Through Controlled Perturbations: A Framework for Behavioral System Identification

Pietro Cipresso

Abstract

The measurement of human behavior remains a central challenge across the behavioral sciences. Traditional approaches typically rely on passive observation of responses collected under static or weakly controlled conditions, limiting the identifiability of the underlying generative processes. As a result, different behavioral mechanisms may produce indistinguishable observations, constraining both inference and theoretical development. In this paper, we propose a methodological framework for behavioral measurement based on controlled perturbations. From this perspective, behavior is conceptualized as the observable output of a dynamical system, and measurement is reframed as a problem of system identification. Experimental environments act as measurement instruments that apply structured inputs (perturbations) and record behavioral trajectories as outputs over time. We outline the core components of this framework, including the design of perturbations, the role of temporal resolution, and the integration of multimodal data streams. We further discuss how advances in immersive technologies, programmable environments, and computational modeling enable the implementation of closed-loop experimental systems, where perturbation, observation, and model updating are tightly coupled. The proposed approach provides a principled basis for moving from descriptive and predictive models toward the identification of generative behavioral mechanisms. By integrating psychometrics, experimental design, and dynamical modeling, this framework contributes to the development of a more rigorous and reproducible methodology for the measurement of human behavior.

2603.27038 2026-03-31 stat.ME

A note on conditional densities, Bayes' rule, and recent criticisms of Bayesian inference

Alex Yan, Cathal Mills, Augustin Marignier, Younjung Kim, Ben Lambert

Abstract

When performing Bayesian inference, we frequently need to work with conditional probability densities. For example, the posterior function is the conditional density of the parameters given the data. Some might worry that conditional densities are ill-defined, considering that for a continuous random variable $Y$, the event $\{Y=y\}$ has probability zero, meaning the formula $\mathbb{P}(A|B)=\mathbb{P}(A\cap B)/\mathbb{P}(B)$ is inapplicable. In reality, when we work with conditional densities, we never condition directly on the zero-probability event $\{Y=y\}$; rather, we first condition on the random variable $Y$, and then we may plug in an observed value $y$. The first purpose of our article is to provide an exposition on conditional densities that elaborates on this point. While we have aimed to make this explanation accessible, we follow it with a roadmap of the measure theory needed to make it rigorous. A recent preprint (arXiv:2411.13570) has expressed the concern that probability densities are ill-defined and that as a result Bayes' theorem cannot be used, and they provide examples that allegedly demonstrate inconsistencies in the Bayesian framework. The second purpose of our article is to investigate their claims. We contend that the examples given in their work do not demonstrate any inconsistencies; we find that there are mathematical errors and that they deviate significantly from the Bayesian framework.

2603.27019 2026-03-31 stat.ML cs.LG math.PR stat.ME

Parameter Estimation in Stochastic Differential Equations via Wiener Chaos Expansion and Stochastic Gradient Descent

Francisco Delgado-Vences, José Julián Pavón-Español, Arelly Ornelas

Comments 25 pages, 3 figures. This manuscript has been submitted to Applied Mathematical Modelling for publication

Abstract

This study addresses the inverse problem of parameter estimation for Stochastic Differential Equations (SDEs) by minimizing a regularized discrepancy functional via Stochastic Gradient Descent (SGD). To achieve computational efficiency, we leverage the Wiener Chaos Expansion (WCE), a spectral decomposition technique that projects the stochastic solution onto an orthogonal basis of Hermite polynomials. This transformation effectively maps the stochastic dynamics into a hierarchical system of deterministic functions, termed the propagator. By reducing the stochastic inference task to a deterministic optimization problem, our framework circumvents the heavy computational burden and sampling requirements of traditional simulation-based methods like MCMC or MLE. The robustness and scalability of the proposed approach are demonstrated through numerical experiments on various non-linear SDEs, including models for individual biological growth. Results show that the WCE-SGD framework provides accurate parameter recovery even from discrete, noisy observations, offering a significant paradigm shift in the efficient modeling of complex stochastic systems.

2603.27010 2026-03-31 stat.ME

Bayesian analysis of the causal reference-based model for missing data in clinical trials, accommodating partially observed post-intercurrent event data

Brendah Nansereko, Marcel Wolbers, James R. Carpenter, Jonathan W. Bartlett

Abstract

When treatment policy estimands are of interest, clinical trials often attempt to collect patient data after intercurrent events (ICEs), although such data are often limited. Retrieved dropout imputation methods, which use pre-ICE and available post-ICE data to impute missing post-ICE outcomes, are commonly applied but often yield treatment effect estimates with large standard errors (SEs) and may encounter convergence issues when post-ICE data are sparse. Reference-based imputation methods are also used, but they rely on strong assumptions about post-ICE outcomes, which can lead to biased estimates if these assumptions are incorrect. To address these limitations, we previously proposed the reference-based Bayesian causal model (BCM), which incorporates a prior on the maintained effect parameter to reflect uncertainty in reference-based assumptions for missing post-ICE data. Our earlier work assumed no post-ICE data were observed. Here, we extend the BCM to incorporate available post-ICE outcomes, providing an approach that mitigates limitations of both retrieved-dropout and standard reference-based methods. We propose both a fully Bayesian model and an imputation-based approach. A simulation study was conducted to evaluate the frequentist properties of the proposed methods in settings with partially observed post-ICE data and to compare performance with existing approaches. Retrieved-dropout methods produced higher estimated SEs than the BCM, particularly when post-ICE data were sparse. Under the BCM, treatment effect SEs increased as post-ICE data became more limited for both modelling approaches. Importantly, this increase can be controlled through the prior variance of the maintained effect parameter, with more informative priors stabilising estimation when post-ICE data are scarce.

2603.26993 2026-03-31 cs.MA cs.LG math.OC stat.ML

On the Reliability Limits of LLM-Based Multi-Agent Planning

Ruicheng Ao, Siyang Gao, David Simchi-Levi

Comments Technical note

Abstract

This technical note studies the reliability limits of LLM-based multi-agent planning as a delegated decision problem. We model the LLM-based multi-agent architecture as a finite acyclic decision network in which multiple stages process shared model-context information, communicate through language interfaces with limited capacity, and may invoke human review. We show that, without new exogenous signals, any delegated network is decision-theoretically dominated by a centralized Bayes decision maker with access to the same information. In the common-evidence regime, this implies that optimizing over multi-agent directed acyclic graphs under a finite communication budget can be recast as choosing a budget-constrained stochastic experiment on the shared signal. We also characterize the loss induced by communication and information compression. Under proper scoring rules, the gap between the centralized Bayes value and the value after communication admits an expected posterior divergence representation, which reduces to conditional mutual information under logarithmic loss and to expected squared posterior error under the Brier score. These results characterize the fundamental reliability limits of delegated LLM planning. Experiments with LLMs on a controlled problem set further demonstrate these characterizations.

2603.26982 2026-03-31 stat.ML cs.AI cs.LG

Online Statistical Inference of Constant Sample-averaged Q-Learning

Saunak Kumar Panda, Tong Li, Ruiqi Liu, Yisha Xiang

Comments 7 pages, 2 figures, 2 tables, Reinforcement Learning Safety Workshop (RLSW), Reinforcement Learning Conference (RLC) 2024

Abstract

Reinforcement learning algorithms have been widely used for decision-making tasks in various domains. However, the performance of these algorithms can be impacted by high variance and instability, particularly in environments with noise or sparse rewards. In this paper, we propose a framework to perform statistical online inference for a sample-averaged Q-learning approach. We adapt the functional central limit theorem (FCLT) for the modified algorithm under some general conditions and then construct confidence intervals for the Q-values via random scaling. We conduct experiments to perform inference on both the modified approach and its traditional counterpart, Q-learning, using random scaling, and report their coverage rates and confidence interval widths on two problems: a grid world problem as a simple toy example and a dynamic resource-matching problem as a real-world example for comparison between the two solution approaches.

2603.26981 2026-03-31 stat.ME math.ST stat.AP stat.TH

Boosting multi-view association testing via devariation

Ruyi Pan, Yinqiu He, Jun Young Park

Abstract

Understanding the interplay between high-dimensional data from different views is essential in biomedical research, particularly in fields such as genomics, neuroimaging and biobank-scale studies involving high-dimensional features. Existing statistical tests for the association between two random vectors often do not fully capture dependencies between views due to limitations in modeling within-view dependencies, particularly in high-dimensional data without clear dependency patterns, which can lead to a potential loss of statistical power. In this work, we propose a novel approach termed devariation, a simple yet effective preprocessing method that addresses these limitations by adopting a penalized low-rank factor model to flexibly capture within-view dependencies. Theoretical analysis of asymptotic power shows that devariation increases statistical power, especially when within-view correlations impact signal-to-noise ratios, while maintaining robustness in scenarios without strong internal correlations. Simulation studies demonstrate devariation's superior performance over existing methods in various scenarios. We further validate devariation in multimodal neuroimaging data from the UK Biobank study, examining the associations between imaging-derived phenotypes (IDPs) from functional, structural, and diffusion magnetic resonance imaging (MRI).

2603.26971 2026-03-31 stat.AP cs.LG

Graph Attention Network-Based Detection of Autism Spectrum Disorder

Abigail Kelly, Ramchandra Rimal, Arpan Sainju

Abstract

Autism Spectrum Disorder (ASD) is a neurodevelopmental condition characterized by atypical brain connectivity. One of the crucial steps in addressing ASD is its early detection. This study introduces a novel computational framework that employs an Attention-Based Graph Convolutional Network, referred to as the GATGraphClassifier, for detecting ASD. We utilize Functional Magnetic Resonance Imaging (fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) repository to construct functional connectivity matrices using Pearson correlation, which captures interactions between various brain regions. These matrices are then transformed into graph representations, where the nodes and edges represent the brain regions and functional connections, respectively. The GATGraphClassifier employs attention mechanisms to identify critical connectivity patterns, thereby enhancing the model's interpretability and diagnostic accuracy. Our proposed framework demonstrates superior performance across all standard classification metrics compared to existing state-of-the-art methods. Notably, we achieved an average accuracy of 88.79% on the test data over 30 independent runs, surpassing the benchmark model's performance by 12.27%. In addition, we identified the crucial brain regions associated with ASD, consistent with the previous studies, and a few novel regions. This study not only contributes to the advancement of ASD detection but also shows the potential for broader adaptability of GATGraphClassifier in analyzing complex relational data in various fields, where understanding intricate connectivity and interaction patterns is essential.
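
The step from regional time series to a graph can be sketched as follows. The abstract does not say how edges are selected, so the absolute-correlation cutoff `threshold`, the function name `connectivity_graph`, and the `(2, n_edges)` edge layout are assumptions made for illustration.

```python
import numpy as np

def connectivity_graph(timeseries, threshold=0.5):
    """Build a functional-connectivity graph from regional time series.

    timeseries : array of shape (n_timepoints, n_regions)
    threshold  : hypothetical cutoff on |Pearson r| for keeping an edge
    """
    corr = np.corrcoef(timeseries, rowvar=False)   # (n_regions, n_regions)
    np.fill_diagonal(corr, 0.0)                    # drop self-connections
    src, dst = np.nonzero(np.abs(corr) >= threshold)
    edge_index = np.stack([src, dst])              # one column per directed edge
    edge_weight = corr[src, dst]
    return corr, edge_index, edge_weight
```

Each undirected connection appears as two directed edges, a layout that graph-learning libraries commonly accept; the rows of `corr` could serve as node features for a graph attention model.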

2603.26963 2026-03-31 cs.CR cs.LG eess.SP stat.ML

On the Optimal Number of Grids for Differentially Private Non-Interactive $K$-Means Clustering

Gokularam Muthukrishnan, Anshoo Tandon

Abstract

Differentially private $K$-means clustering enables releasing cluster centers derived from a dataset while protecting the privacy of the individuals. Non-interactive clustering techniques based on privatized histograms are attractive because the released data synopsis can be reused for other downstream tasks without additional privacy loss. The choice of the number of grids for discretizing the data points is crucial, as it directly controls the quantization bias and the amount of noise injected to preserve privacy. The widely adopted strategy selects a grid size that is independent of the number of clusters and also relies on empirical tuning. In this work, we revisit this choice and propose a refined grid-size selection rule derived by minimizing an upper bound on the expected deviation in the $K$-means objective function, leading to a more principled discretization strategy for non-interactive private clustering. Compared to prior work, our grid resolution differs both in its dependence on the number of clusters and in the scaling with dataset size and privacy budget. Extensive numerical results elucidate that the proposed strategy results in accurate clustering compared to the state-of-the-art techniques, even under tight privacy budgets.

2603.26955 2026-03-31 stat.ME

Adaptive procedures for boundary FDR control

Sarah Mostow, Daniel Xiang

Abstract

A cornerstone of the multiple testing literature is the Benjamini-Hochberg (BH) procedure, which guarantees control of the FDR when $p$-values are independent or positively dependent. While BH controls the average quality of rejections, it does not provide guarantees for individual discoveries, particularly those near the rejection threshold, which are more likely to be false than the average rejection. For independent $p$-values with Uniform$(0,1)$ null distribution, the Support Line procedure (SL; arXiv:2207.07299) provably controls the error probability for the rejection at the edge of the discovery set (i.e. the one with largest $p$-value) at level $q m_0/m$, where $m_0$ is the number of true null hypotheses and $q$ is a tuning parameter. In this work, we study adaptive versions of the SL procedure that operate in two steps: the first step estimates $m_0$ from non-significant statistics, and the second step runs the SL procedure at an adjusted level $q m / \hat{m}_0$. The adaptive procedures are shown to control the false discovery probability for the "boundary" rejection under an independence assumption. Simulation studies suggest that some but not all of the two-stage procedures maintain error control under positive dependence, and that substantial power is gained relative to the original SL procedure. We illustrate differences between the procedures on meta-data from the recent literature in behavioral psychology on growth mindset and nudge interventions.
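
The two-step idea (estimate $m_0$, then rerun at level $q m / \hat{m}_0$) can be illustrated with the BH step-up rule. This is only an analogy: the Support Line procedure itself is not implemented, since its rule is not stated in the abstract, and the Storey-type estimator with $\lambda = 0.5$ is one common choice rather than necessarily the one studied in the paper.

```python
import numpy as np

def benjamini_hochberg(pvals, q):
    """BH step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()     # largest index passing its threshold
        reject[order[:k + 1]] = True
    return reject

def storey_m0(pvals, lam=0.5):
    """Storey-type estimate of the number of true nulls, capped at m."""
    p = np.asarray(pvals, dtype=float)
    return min(len(p), (1 + np.sum(p > lam)) / (1 - lam))

def adaptive_bh(pvals, q, lam=0.5):
    """Two-step variant: estimate m0, then run BH at level q * m / m0_hat."""
    m = len(pvals)
    return benjamini_hochberg(pvals, q * m / storey_m0(pvals, lam))
```

When few $p$-values exceed $\lambda$, the estimate $\hat{m}_0$ is small and the adjusted level is larger than $q$, which is where the power gain of adaptive procedures comes from.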

2603.26954 2026-03-31 cs.LG math.ST stat.TH

High dimensional theory of two-phase optimizers

Atish Agarwala

Abstract

The trend towards larger training setups has brought a renewed interest in partially asynchronous two-phase optimizers which optimize locally and then synchronize across workers. Additionally, recent work suggests that the one-worker version of one of these algorithms, DiLoCo, shows promising results as a (synchronous) optimizer. Motivated by these studies, we present an analysis of LA-DiLoCo, a simple member of the DiLoCo family, on a high-dimensional linear regression problem. We show that the one-worker variant, LA, provides a different tradeoff between signal and noise than SGD, which is beneficial in many scenarios. We also show that the multi-worker version generates more noise than the single worker version, but that this additional noise generation can be ameliorated by appropriate choice of hyperparameters. We conclude with an analysis of SLA -- LA with momentum -- and show that stacking two momentum operators gives an opportunity for acceleration via a non-linear transformation of the "effective" Hessian spectrum, which is maximized for Nesterov momentum. Altogether our results show that two-phase optimizers represent a fruitful new paradigm for understanding and improving training algorithms.

2603.26940 2026-03-31 stat.ML cs.LG math.PR

Static and Dynamic Approaches to Computing Barycenters of Probability Measures on Graphs

David Gentile, James M. Murphy

Comments 31 pages, 17 figures, 1 table

Abstract

The optimal transportation problem defines a geometry of probability measures which leads to a definition for weighted averages (barycenters) of measures, finding application in the machine learning and computer vision communities as a signal processing tool. Here, we implement a barycentric coding model for measures which are supported on a graph, a context in which the classical optimal transport geometry becomes degenerate, by leveraging a Riemannian structure on the simplex induced by a dynamic formulation of the optimal transport problem. We approximate the exponential mapping associated to the Riemannian structure, as well as its inverse, by utilizing past approaches which compute action minimizing curves in order to numerically approximate transport distances for measures supported on discrete spaces. Intrinsic gradient descent is then used to synthesize barycenters, wherein gradients of a variance functional are computed by approximating geodesic curves between the current iterate and the reference measures; iterates are then pushed forward via a discretization of the continuity equation. Analysis of measures with respect to a given dictionary of references is performed by solving a quadratic program formed by computing geodesics between target and reference measures. We compare our novel approach to one based on entropic regularization of the static formulation of the optimal transport problem, where the graph structure is encoded via graph distance functions. We present numerical experiments validating our approach and conclude that intrinsic gradient descent on the probability simplex provides a coherent framework for the synthesis and analysis of measures supported on graphs.

2603.24765 2026-03-31 cs.IR stat.ML

Enhancing Online Support Group Formation Using Topic Modeling Techniques

Pronob Kumar Barman, Tera L. Reynolds, James Foulds

Abstract

Online health communities (OHCs) are vital for fostering peer support and improving health outcomes. Support groups within these platforms can provide more personalized and cohesive peer support, yet traditional support group formation methods face challenges related to scalability, static categorization, and insufficient personalization. To overcome these limitations, we propose two novel machine learning models for automated support group formation: the Group specific Dirichlet Multinomial Regression (gDMR) and the Group specific Structured Topic Model (gSTM). These models integrate user generated textual content, demographic profiles, and interaction data represented through node embeddings derived from user networks to systematically automate personalized, semantically coherent support group formation. We evaluate the models on a large scale dataset from MedHelp, comprising over 2 million user posts. Both models substantially outperform baseline methods including LDA, DMR, and STM in predictive accuracy (held out log likelihood), semantic coherence (UMass metric), and internal group consistency. The gDMR model yields group covariates that facilitate practical implementation by leveraging relational patterns from network structures and demographic data. In contrast, gSTM emphasizes sparsity constraints to generate more distinct and thematically specific groups. Qualitative analysis further validates the alignment between model generated groups and manually coded themes, showing the practical relevance of the models in informing groups that address diverse health concerns such as chronic illness management, diagnostic uncertainty, and mental health. By reducing reliance on manual curation, these frameworks provide scalable solutions that enhance peer interactions within OHCs, with implications for patient engagement, community resilience, and health outcomes.

2603.23767 2026-03-31 stat.ME stat.AP

Age-Specific Logistic Regression with Complex Event Time Data

Haoxuan Zhou, X. Joan Hu, Yi Xiong, Yan Yuan

Abstract

In an attempt to advance the current practice for assessing and predicting the primary ovarian insufficiency (POI) risk in female childhood cancer survivors, we propose two estimating function based approaches for age-specific logistic regression. Both approaches adapt the inverse probability of censoring weighting (IPCW) strategy and yield consistent estimators with asymptotic normality. The first approach modifies the IPCW weights used by Im et al. (2023) to account for double censoring. The second approach extends the outcome weighted IPCW approach to use the information of the subjects censored before the analysis time. We consider variance estimation for the estimators and explore by simulation the two approaches implemented in the situations where the conditional right-censoring time distribution required in the IPCW weights is unknown and approximated using survival random forest approaches, stratified empirical distribution functions, or the estimator under the Cox proportional hazards model. The numerical studies indicate that the second approach is more efficient when right-censoring is relatively heavy, whereas the first approach is preferable when the right-censoring is light. We also observe that the performance of the two approaches heavily relies on the estimation of the censoring distribution in our simulation settings. The POI data from a childhood cancer survivor study are employed throughout the paper for motivation and illustration. Our data analysis provides new insight into understanding the POI risk among cancer survivors.

2603.22339 2026-03-31 cs.LG cs.CL stat.ML

Problems with Chinchilla Approach 2: Systematic Biases in IsoFLOP Parabola Fits

Eric Czech, Zhiwei Xu, Yael Elmatad, Yixin Wang, William Held

Abstract

Chinchilla Approach 2 is among the most widely used methods for fitting neural scaling laws. Its parabolic approximation introduces systematic biases in compute-optimal allocation estimates, even on noise-free synthetic data. Applied to published Llama 3 IsoFLOP data at open frontier compute scales, these biases imply a parameter underallocation corresponding to 6.5% of the $3.8\times10^{25}$ FLOP training budget and \$1.4M (90% CI: \$412K-\$2.9M) in unnecessary compute at 50% H100 MFU. Simulated multimodal model misallocations show even greater opportunity costs due to higher loss surface asymmetry. Three sources of this error are examined: IsoFLOP sampling grid width (Taylor approximation accuracy), uncentered IsoFLOP sampling, and loss surface asymmetry ($α\neq β$). Chinchilla Approach 3 largely eliminates these biases but is often regarded as less data-efficient, numerically unstable, prone to local minima, and harder to implement. Each concern is shown to be unfounded or addressable, especially when the partially linear structure of the objective is exploited via Variable Projection, enabling unbiased inference on all five loss surface parameters through a two-dimensional optimization that is well-conditioned, analytically differentiable, and amenable to dense, or even exhaustive, grid search. It may serve as a more convenient replacement for Approach 2 or a more scalable alternative for adaptations of Approach 3 to richer scaling law formulations. See https://github.com/Open-Athena/vpnls for details and https://openathena.ai/scaling-law-analysis for other results from this study.
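
The asymmetry-driven bias described in this abstract can be reproduced on noise-free synthetic data with a few lines of code. The sketch below is not the authors' code: it assumes the common loss form $L(N, D) = E + A N^{-α} + B D^{-β}$ with the compute approximation $C = 6ND$, and the constants, the compute budget, and the sampling grid are illustrative choices. It fits a parabola to loss versus $\log N$ along one IsoFLOP curve that is centered on the true optimum, then reports how far the parabola vertex lies from that optimum.

```python
from math import exp, log

def loss(log_n, log_c, E, A, B, alpha, beta):
    """Assumed loss surface L = E + A/N^alpha + B/D^beta with C = 6*N*D."""
    log_d = log_c - log(6.0) - log_n
    return E + A * exp(-alpha * log_n) + B * exp(-beta * log_d)

def true_log_n_opt(log_c, A, B, alpha, beta):
    """Closed-form minimizer of the assumed loss along one IsoFLOP curve."""
    return (log(alpha * A / (beta * B)) + beta * (log_c - log(6.0))) / (alpha + beta)

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def parabola_vertex(us, ys):
    """Least-squares fit y ~ c0 + c1*u + c2*u^2, returning the vertex -c1/(2*c2)."""
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] for i in range(3)]  # normal equations

    def with_col(j):
        return [[t[i] if k == j else m[i][k] for k in range(3)] for i in range(3)]

    d = det3(m)
    c1, c2 = det3(with_col(1)) / d, det3(with_col(2)) / d  # Cramer's rule
    return -c1 / (2.0 * c2)

def vertex_bias(alpha, beta, half_width=log(8.0), points=9,
                log_c=log(1e21), E=1.69, A=406.4, B=410.7):
    """Estimated minus true log N_opt for a grid centered on the true optimum."""
    n_opt = true_log_n_opt(log_c, A, B, alpha, beta)
    us = [half_width * (2.0 * i / (points - 1) - 1.0) for i in range(points)]
    ys = [loss(n_opt + u, log_c, E, A, B, alpha, beta) for u in us]
    return parabola_vertex(us, ys)

bias_symmetric = vertex_bias(alpha=0.30, beta=0.30)
bias_asymmetric = vertex_bias(alpha=0.34, beta=0.28)
print(f"alpha == beta: bias in log N = {bias_symmetric:.2e}")
print(f"alpha != beta: bias in log N = {bias_asymmetric:.2e}")
```

Under these assumptions the loss along the IsoFLOP curve is an even function of the offset from the optimum when $α = β$, so the fitted vertex is unbiased up to rounding. With $α \neq β$ the odd part of the curve shifts the vertex, and the shift grows with the grid width.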

2603.14575 2026-03-31 cs.LG cs.CL stat.ML

CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad

Yongqiang Chen, Chenxi Liu, Zhenhao Chen, Tongliang Liu, Bo Han, Kun Zhang

Comments Preprint of ongoing work; Yongqiang and Chenxi contributed equally

Abstract

Evolve-based agents such as AlphaEvolve are among the notable successes in using Large Language Models (LLMs) to build AI Scientists. These agents tackle open-ended scientific problems by iteratively improving and evolving programs, leveraging the prior knowledge and reasoning capabilities of LLMs. Despite this success, existing evolve-based agents lack targeted guidance for evolution and effective mechanisms for organizing and utilizing knowledge acquired from past evolutionary experience. Consequently, they suffer from decreasing evolution efficiency and exhibit oscillatory behavior when approaching known performance boundaries. To bridge this gap, we develop CausalEvolve, equipped with a causal scratchpad that leverages LLMs to identify and reason about guiding factors for evolution. CausalEvolve first identifies outcome-level factors that offer complementary inspirations in improving the target objective. During evolution, CausalEvolve also inspects surprise patterns and uses abductive reasoning to hypothesize new factors, which in turn offer novel directions. Through comprehensive experiments, we show that CausalEvolve effectively improves the evolutionary efficiency and discovers better solutions in 4 challenging open-ended scientific tasks.

2602.19203 2026-03-31 stat.ME

A Calibration Framework for Inference with Partially Observed Data

Mst Moushumi Pervin, Hengfang Wang, Jae Kwang Kim

Abstract

Missing data is a universal problem in statistics. We develop a unified framework for estimating parameters defined by general estimating equations under a missing-at-random (MAR) mechanism, based on generalized entropy calibration weighting. We construct weights by minimizing a convex entropy subject to (i) balancing constraints on a data-adaptive calibration function, estimated using flexible machine-learning predictors with cross-fitting, and (ii) a debiasing constraint involving the fitted propensity score (PS) model. The resulting estimator is doubly robust, remaining consistent if either the outcome regression (OR) or the PS model is correctly specified, and attains the semiparametric efficiency bound when both models are correctly specified. Our formulation encompasses classical inverse probability weighting (IPW) and augmented IPW (AIPW) as special cases and accommodates a broad class of entropy functions. We illustrate the versatility of the approach in three important settings: semi-supervised learning with unlabeled outcomes, regression analysis with missing covariates, and causal effect estimation in observational studies. Extensive simulation studies and real-data applications demonstrate that the proposed estimators achieve greater efficiency and numerical stability than existing methods. In particular, the proposed estimator outperforms the classical AIPW estimator under OR model misspecification.
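
For readers unfamiliar with the two classical special cases named in this abstract, the sketch below shows textbook IPW and AIPW estimators of a population mean under MAR. It is not the paper's entropy calibration method. The data-generating process, the known response probabilities, and the deliberately wrong constant outcome model are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean

def ipw_mean(y, r, pi):
    """Inverse probability weighting: average of R*Y/pi over all units."""
    return fmean(ri * yi / pii for yi, ri, pii in zip(y, r, pi))

def aipw_mean(y, r, pi, m):
    """Augmented IPW: outcome-model prediction plus a weighted residual correction."""
    return fmean(mi + ri * (yi - mi) / pii for yi, ri, pii, mi in zip(y, r, pi, m))

rng = random.Random(1)
n = 20000
x = [rng.random() for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # true E[Y] = 2
pi = [0.2 + 0.6 * xi for xi in x]                        # true response probability
r = [1 if rng.random() < p else 0 for p in pi]           # R = 1 means Y is observed

naive = fmean(yi for yi, ri in zip(y, r) if ri)          # complete-case mean
ipw = ipw_mean(y, r, pi)
aipw_wrong_or = aipw_mean(y, r, pi, [0.5] * n)           # misspecified outcome model
aipw_zero = aipw_mean(y, r, pi, [0.0] * n)               # m = 0 reduces AIPW to IPW

print(f"complete-case={naive:.3f}  IPW={ipw:.3f}  AIPW(wrong OR)={aipw_wrong_or:.3f}")
```

Because the response probabilities are correct here, AIPW stays close to the true mean even with a wrong outcome model, which is one half of the double robustness property. The complete-case mean does not, since units with larger `x` are observed more often.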

2602.00310 2026-03-31 astro-ph.IM stat.AP

Simulating Roman+Gaia Combined Astrometry, Parallaxes, and Proper Motions

Kevin A. McKinnon, Roeland P. van der Marel

Comments 27 pages, 14 figures, 4 tables

Abstract

The next generation of high-precision astrometry is rapidly approaching thanks to ongoing and upcoming missions like Euclid, LSST, and RST. We present a new tool (available at https://github.com/KevinMcK95/gaia_roman_astrometry) to simulate the astrometric precision that will be achieved when combining Gaia data with Roman images. The statistics that underpin this method generalize to combinations of astrometric datasets from any telescope. We construct realistic Roman position uncertainties as a function of filter, magnitude, and exposure time, which are combined with Gaia precisions and user-defined Roman observing strategies to predict the expected uncertainty in position, parallax, and proper motion (PM). We also simulate the core Roman surveys to assess their end-of-mission astrometric capabilities, finding that the High Latitude and Galactic Bulge Time Domain Surveys will deliver Gaia-DR3-quality PMs down to G=26.5 mag and G=29.0 mag, respectively. Due to its modest number of repeat observations, we find that the astrometry of the High Latitude Wide Area Survey (HLWAS) is very sensitive to particular choices in observing strategies. We compare possible HLWAS strategies to highlight the impact of parallax effects and conclude that a multi-year Roman-only baseline is required for useful PM uncertainties (<100 mas/yr). This simulation tool is actively being used for ongoing Roman proposal writing to ensure astrometric requirements for science goals will be met. Subsequent work will expand this tool to include simulated observations from other telescopes to plan for a future where all surveys and datasets are harnessed together.

2512.21326 2026-03-31 cs.LG cs.AI cs.CL stat.ML

Measuring all the noises of LLM Evals

Sida Wang

Abstract

Separating signal from noise is central to experiments. Applying well-established statistical methods effectively to LLM evals requires consideration of their unique noise characteristics. We clearly define and measure three types of noise: prediction noise from generating different answers on a given question, data noise from sampling questions, and their combined total noise following the law of total variance. To emphasize relative comparisons and gain statistical power, we propose the all-pairs paired method, which applies the paired analysis to all pairs of LLMs and measures all the noise components based on millions of question-level predictions across many evals and settings, revealing clear patterns. First, each eval exhibits a characteristic and highly predictable total noise level across all model pairs. Second, paired prediction noise typically exceeds paired data noise, which means reducing prediction noise by averaging can significantly increase statistical power. By measuring all the noises together, we can assess eval results in context, lowering the barrier to using the best analysis to make sound empirical decisions.
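
The decomposition mentioned in this abstract follows the law of total variance, $\mathrm{Var}(S) = E_q[\mathrm{Var}(S \mid q)] + \mathrm{Var}_q(E[S \mid q])$, where $S$ is a score and $q$ a question. The sketch below is a minimal single-model illustration of that identity and not the paper's paired estimators. The simulated eval (200 questions, 16 sampled answers each, Bernoulli correctness) and the function name are assumptions.

```python
import random
from statistics import fmean, pvariance

def noise_components(scores):
    """scores[q][k] is the score of sampled answer k on question q (same k for all q)."""
    per_question_means = [fmean(row) for row in scores]
    prediction_noise = fmean(pvariance(row) for row in scores)  # E_q[Var(S | q)]
    data_noise = pvariance(per_question_means)                  # Var_q(E[S | q])
    return prediction_noise, data_noise, prediction_noise + data_noise

rng = random.Random(0)
# Hypothetical eval: each question has its own success probability.
success_probs = [rng.random() for _ in range(200)]
scores = [[1.0 if rng.random() < p else 0.0 for _ in range(16)] for p in success_probs]

pred, data, total = noise_components(scores)
pooled = pvariance([s for row in scores for s in row])  # variance over all answers
print(f"prediction={pred:.4f}  data={data:.4f}  total={total:.4f}  pooled={pooled:.4f}")
```

With population variances and the same number of samples per question, the two components add up exactly to the pooled variance of all scores, which is a convenient sanity check on any implementation.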

2512.11777 2026-03-31 stat.ME

A Doubled Adjacency Spectral Embedding Approach to Graph Clustering

Sinyoung Park, Matthew Nunes, Sandipan Roy

Abstract

Spectral clustering is a popular tool in network data analysis, with applications in a variety of scientific areas. However, many studies have shown that classical spectral clustering does not perform well on certain network structures, particularly core-periphery networks. To improve clustering performance in core-periphery structures, Adjacency Spectral Embedding (ASE) has been introduced, which performs clustering via a network's adjacency matrix instead of the graph Laplacian. Despite its advantages in this setting, the optimal performance of ASE is limited to dense networks, whilst network data observed in practice is often sparse in nature. To address this limitation, we propose a new approach which we term Doubled Adjacency Spectral Embedding (DASE), motivated by the observation that the squared adjacency matrix will leverage the fewer connections in sparse structures more efficiently in clustering applications. Theoretical results establish that the resulting clustering algorithm enjoys good consistency properties when determining sparse community structure. The performance and general applicability of the proposed method are evaluated using extensive simulations on both directed and undirected networks. Our results highlight the improved clustering performance on both sparse and dense networks in the presence of core-periphery structures. We illustrate our proposed technique on real-world employment and transportation datasets.

2512.03336 2026-03-31 cs.LG cs.AI stat.ML

Single-Round Scalable Analytic Federated Learning

Alan T. L. Bacellar, Mustafa Munir, Felipe M. G. França, Priscila M. V. Lima, Radu Marculescu, Lizy K. John

Comments To appear in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Abstract

Federated Learning (FL) is plagued by two key challenges: high communication overhead and performance collapse on heterogeneous (non-IID) data. Analytic FL (AFL) provides a single-round, data distribution invariant solution, but is limited to linear models. Subsequent non-linear approaches, like DeepAFL, regain accuracy but sacrifice the single-round benefit. In this work, we break this trade-off. We propose SAFLe, a framework that achieves scalable non-linear expressivity by introducing a structured head of bucketed features and sparse, grouped embeddings. We prove this non-linear architecture is mathematically equivalent to a high-dimensional linear regression. This key equivalence allows SAFLe to be solved with AFL's single-shot, invariant aggregation law. Empirically, SAFLe establishes a new state-of-the-art for analytic FL, significantly outperforming both linear AFL and multi-round DeepAFL in accuracy across all benchmarks, demonstrating a highly efficient and scalable solution for federated vision.

2510.26809 2026-03-31 stat.AP physics.ao-ph

A generalisation of the signal-to-noise ratio using proper scoring rules

Jochen Bröcker, Eviatar Bach

Comments 19 pages, 2 figures, 3 tables

Abstract

A generalised concept of the signal-to-noise ratio (or equivalently the ratio of predictable components, or RPC) is provided, based on proper scoring rules. This definition is the natural generalisation of the classical RPC, yet it allows one to define and analyse the signal-to-noise properties of any type of forecast that is amenable to scoring, thus drastically widening the applicability of these concepts. The methodology is illustrated through numerical examples of ensemble forecasts, scored using the continuous ranked probability score (CRPS), and of probability forecasts of a binary event, scored using the logarithmic score. Numerical examples are carried out using both synthetic data with prescribed signal-to-noise ratios as well as seasonal ensemble hindcasts of the North Atlantic Oscillation (NAO) index. The latter have previously been interpreted as having a signal-to-noise "paradox", or anomalous signal-to-noise ratio, using the RPC statistic. For the synthetic data, the RPC statistic as well as the scoring rule-based ones agree regarding which data sets exhibit anomalous signal-to-noise ratios, but exhibit different variance, indicating different statistical properties. For the NAO data, on the other hand, the different statistics are more equivocal on whether the signal-to-noise ratio is anomalous.

2510.19372 2026-03-31 stat.ML cs.LG

On the Hardness of Reinforcement Learning with Transition Look-Ahead

Corentin Pla, Hugo Richard, Marc Abeille, Nadav Merlis, Vianney Perchet

Abstract

We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.

2510.16974 2026-03-31 cs.LG stat.ML

Differentially Private Linear Regression and Synthetic Data Generation with Statistical Guarantees

Shurong Lin, Aleksandra Slavković, Deekshith Reddy Bhoomireddy

Abstract

In the social sciences, small- to medium-scale datasets are common, and linear regression is canonical. In privacy-aware settings, much work has focused on differentially private (DP) linear regression, but mostly on point estimation with limited attention to uncertainty quantification. Meanwhile, synthetic data generation (SDG) is increasingly important for reproducibility studies, yet current DP linear regression methods do not readily support it. Mainstream DP-SDG approaches either are tailored to discrete or discretized data, making them less suitable for analyses involving continuous variables, or rely on deep learning models that require large datasets, limiting their use for the smaller-scale data typical in social science. We propose a method for linear regression with valid inference under Gaussian DP. It includes a bias-corrected estimator with asymptotic confidence intervals (CIs) and a general SDG procedure such that the corresponding regression on the synthetic data matches our DP linear regression procedure. Our approach is effective in small- to moderate-dimensional settings. Experiments show that our method (1) improves accuracy over existing methods for DP linear regression, (2) provides valid CIs, and (3) produces more reliable synthetic data for downstream statistical and machine learning tasks than current DP synthesizers.

2510.05825 2026-03-31 cs.LG cs.AI cs.CL stat.ML

Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling

Giorgio Giannone, Guangxuan Xu, Nikhil Shivakumar Nayak, Rohan Mahesh Awhad, Shivchander Sudalairaj, Kai Xu, Akash Srivastava

Comments preprint

Abstract

Inference-Time Scaling (ITS) improves language models by allocating more computation at generation time. Particle Filtering (PF) has emerged as a strong ITS method for complex mathematical reasoning tasks, but it is vulnerable when guided by process reward models, which often assign overconfident scores early in the reasoning process. This causes PF to suffer from premature exploitation: it myopically commits to locally promising trajectories, prunes potentially correct hypotheses, and converges to suboptimal solutions. This failure mode, known as particle impoverishment, is especially severe under constrained computational budgets. To address this, we analyze the problem and identify two root causes: a lack of diversity in the particle set due to overconfident resampling and a consequent inability to assess the potential of a reasoning path. We introduce Entropic Particle Filtering (ePF), an algorithm that integrates two new techniques to solve these issues. The first technique, Entropic Annealing (EA), directly mitigates particle impoverishment by monitoring search diversity via entropy; when diversity drops, it intervenes by dynamically annealing the resampling distribution to preserve exploration. The second, an enhancement called Look-ahead Modulation (LaM), adds a predictive guide to evaluate a state's potential based on its successors. On several challenging math benchmarks, ePF significantly outperforms strong baselines and achieves up to a 50% relative improvement in task reward. Together, these methods improve PF's resilience by balancing the exploration of diverse solution spaces with the exploitation of high-reward regions, ultimately leading to higher-quality solutions.

2510.05809 2026-03-31 q-fin.RM math.ST q-fin.ST stat.TH

Coherent estimation of risk measures

Martin Aichele, Igor Cialenco, Damian Jelito, Marcin Pitera

Comments JEL classification: C13, C58, G32

Abstract

We develop a statistical framework for risk estimation, inspired by the axiomatic theory of risk measures. Coherent risk estimators -- functionals of P&L samples inheriting the economic properties of risk measures -- are defined and characterized through robust representations linked to $L$-estimators. The framework provides a canonical methodology for constructing estimators with sound financial and statistical properties, unifying risk measure theory, principles for capital adequacy, and practical statistical challenges in market risk. Numerical illustrations based on simulated and market data demonstrate that coherence of a risk measure does not necessarily carry over to its estimators and show that alternative admissible weight structures within the CRE representation can lead to substantially different capital adequacy outcomes.

2508.01321 2026-03-31 stat.ML cs.LG

Flow IV: Counterfactual Inference In Nonseparable Outcome Models Using Instrumental Variables

Marc Braun, Jose M. Peña, Adel Daoud

Abstract

To reach human-level intelligence, learning algorithms need to incorporate causal reasoning. But identifying causality, and particularly counterfactual reasoning, remains elusive. In this paper, we make progress on counterfactual inference in nonseparable outcome models by utilizing instrumental variables (IVs). IVs are a classic tool for mitigating bias from unobserved confounders when estimating causal effects. While IV methods for effect estimation have been extended to nonseparable outcome models under different assumptions, existing IV approaches to counterfactual prediction typically assume one-dimensional outcomes and additive noise. In this paper, we show that under standard IV assumptions, along with the assumption that the outcome function is invertible and has a triangular structure, the treatment-outcome relationship becomes identifiable from observed data. We furthermore propose a method to learn the outcome function utilizing normalizing flows. This outcome function estimator can then be used to perform counterfactual inference. We refer to the method as Flow IV.

2507.18591 2026-03-31 stat.ME math.ST stat.AP stat.TH

Omnibus goodness-of-fit tests for univariate continuous distributions based on trigonometric moments

Alain Desgagné, Frédéric Ouimet

Comments 68 pages, 7 figures, 13 tables

Journal ref
Statistica Neerlandica (2026), 80 (2), e70025, 1-32
Abstract

We propose a new omnibus goodness-of-fit test based on trigonometric moments of probability-integral-transformed data. The test builds on the framework of the LK test introduced by Langholz and Kronmal [J. Amer. Statist. Assoc. 86 (1991), 1077-1084], but fully exploits the covariance structure of the associated trigonometric statistics. As a result, our test statistic converges under the null hypothesis to a $χ_2^2$ distribution, even in the presence of nuisance parameters, yielding a well-calibrated rejection region. We derive the exact asymptotic covariance matrix required for normalization and propose a unified approach to computing the LK normalizing scalar. The applicability of both the proposed test and the LK test is substantially expanded by providing implementation details for 11 families of continuous distributions, covering most commonly used parametric models. Simulation studies demonstrate accurate empirical size, close to the nominal level, and strong power properties, yielding fully plug-and-play procedures. Further insight is provided by an analysis under local alternatives. The methodology is illustrated using surface temperature forecast errors from a numerical weather prediction model.

2507.12581 2026-03-31 stat.ME math.ST stat.ML stat.TH

Cross-World Assumption and Refining Prediction Intervals for Individual Treatment Effects

Juraj Bodik, Yaxuan Huang, Bin Yu

Comments Code: https://github.com/jurobodik/ITE_prediction_cross_world.git

Abstract

While average treatment effects (ATE) and conditional average treatment effects (CATE) provide valuable population- and subgroup-level summaries, they fail to capture uncertainty at the individual level. For high-stakes decision-making, individual treatment effect (ITE) estimates must be accompanied by valid prediction intervals that reflect heterogeneity and unit-specific uncertainty. However, the fundamental unidentifiability of ITEs limits the ability to derive precise and reliable individual-level uncertainty estimates. To address this challenge, we investigate the role of a cross-world correlation parameter, $ρ(x) = \mathrm{cor}(Y(1), Y(0) \mid X = x)$, which describes the dependence between potential outcomes, given covariates, in the Neyman-Rubin super-population model with i.i.d. units. Although $ρ$ is fundamentally unidentifiable, we argue that in most real-world applications, it is possible to impose reasonable and interpretable bounds informed by domain-expert knowledge. Given $ρ$, we design prediction intervals for ITE, achieving more stable and accurate coverage with substantially shorter widths, often less than 1/3 of those from competing methods. The resulting intervals satisfy coverage guarantees $P\big(Y(1) - Y(0) \in C_{ITE}(X)\big) \geq 1 - α$ and are asymptotically optimal under Gaussian assumptions. We provide strong theoretical and empirical arguments that cross-world assumptions can make individual uncertainty quantification both practically informative and statistically valid.
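
To see why knowledge of $ρ$ shortens ITE intervals, consider the simplest Gaussian case. If $Y(1)$ and $Y(0)$ are jointly Gaussian given $X = x$ with conditional standard deviations $σ_1$ and $σ_0$, then $Y(1) - Y(0)$ has variance $σ_1^2 + σ_0^2 - 2ρσ_1σ_0$. The sketch below computes the resulting interval. It is an illustration under that joint-Gaussian assumption with known conditional means and standard deviations, and it is not the paper's procedure. The function name and the numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_ite_interval(mu1, mu0, sd1, sd0, rho, alpha=0.1):
    """(1 - alpha) interval for Y(1) - Y(0) under joint Gaussianity with correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    var_diff = sd1 ** 2 + sd0 ** 2 - 2.0 * rho * sd1 * sd0
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sqrt(max(var_diff, 0.0))
    center = mu1 - mu0
    return center - half_width, center + half_width

for rho in (-1.0, 0.0, 0.5, 0.9):
    lo, hi = gaussian_ite_interval(2.0, 1.0, 1.0, 1.0, rho)
    print(f"rho={rho:+.1f}  interval=({lo:.3f}, {hi:.3f})  width={hi - lo:.3f}")
```

An interval that must hold for every $ρ$ in $[-1, 1]$ has to use the worst case $ρ = -1$, so a credible lower bound on $ρ$ translates directly into a narrower interval.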

2506.21353 2026-03-31 stat.ME stat.AP

Bayesian Modeling for Aggregated Relational Data: A Unified Perspective

Owen G. Ward, Anna L. Smith, Tian Zheng

Abstract

Aggregated relational data (ARD) is widely collected to study social networks, in fields such as sociology, public health and economics. Many of the successes of ARD inference have been driven by increasingly complex Bayesian models, which provide principled and flexible ways of reflecting dependence patterns and biases encountered in real data. In this work we provide researchers with a unified collection of Bayesian implementations of existing models for ARD, within the state-of-the-art Bayesian sampling language Stan. Our implementations incorporate within-iteration rescaling procedures by default, improving algorithm run time and convergence diagnostics. Estimating ARD parameters requires carefully balancing model complexity against computational cost and data requirements, yet this trade-off has received relatively limited systematic attention in the literature. Moreover, general model comparison tools applicable across a wide range of ARD models remain underdeveloped, and existing approaches often require substantial expertise in Bayesian computation and software. Using synthetic data, we demonstrate how well competing models recover true personal network sizes and subpopulation sizes and how existing posterior predictive checks compare across a range of Bayesian ARD models. We provide code to leverage Stan's modeling framework for exact $K$-fold cross-validation, and explain why approximate leave-one-out estimates often fail for many ARD models. This work highlights important connections and future directions in Bayesian modeling of ARD, providing practical guidance for selecting and evaluating Bayesian ARD models.

2506.20437 2026-03-31 stat.ME

Fast Penalized Generalized Estimating Equations for Large Longitudinal Functional Datasets

Gabriel Loewinger, Alex W. Levis, Erjia Cui, Francisco Pereira

Comments Manuscript - 22 pages; Appendix - 39 pages

Abstract

Longitudinal binary or count functional data are common in neuroscience, but are often too large to analyze with existing functional regression methods. We propose a one-step penalized generalized estimating equations method that supports generalized functional outcomes (e.g., count, binary, proportion, continuous-valued) and is fast even when datasets have a large number of clusters and large cluster sizes. The method applies to functional and scalar covariates and the one-step estimation framework enables efficient smoothing parameter selection and joint confidence interval construction. Importantly, this semi-parametric approach yields coefficient confidence intervals that are provably valid asymptotically even under working correlation misspecification. By developing a general theory for adaptive one-step M-estimation, we prove that the coefficient estimates are asymptotically normal and as efficient as the fully-iterated estimator; we verify these theoretical properties in simulations. We illustrate the benefits of our approach for analyzing large-scale neural recordings by applying it to a recent calcium imaging dataset published in Nature. We show that our method reveals important timing effects obscured in non-functional analyses. In doing so, we also demonstrate scaling to common neuroscience dataset sizes: the one-step estimator fits to a dataset with 150,000 (binary) functional outcomes, each observed at 120 functional domain points, in only 6.5 minutes on a laptop without parallelization. We release our methods in the R package 'fastfGEE', which supports a wide range of link functions and working covariances.

2505.17288 2026-03-31 stat.ML cs.LG

Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation

Seamus Somerstep, Vinod Raman, Unique Subedi, Yuekai Sun

Comments AISTATS 2026 Camera Ready

Abstract

Using the bit string generation problem as a case study, we theoretically compare two standard methods for adapting large language models to new tasks. The first, referred to as supervised fine-tuning, involves training a new next token predictor on good generations. The second method, Best-of-N (BoN), trains a reward model to select good responses from a collection generated by an unaltered base model. If the learning setting is realizable, we find that supervised fine-tuning outperforms BoN through a better dependence on the response length in its rate of convergence. If realizability fails, then depending on the failure mode, BoN can enjoy either a better rate of convergence in n or a rate of convergence with better dependence on the response length.

2505.13213 2026-03-31 stat.ML cs.LG

Diffusion Models with Double Guidance: Generate with aggregated datasets

Yanfeng Yang, Kenji Fukumizu

Abstract

Creating large-scale datasets for training high-performance generative models is often prohibitively expensive, especially when associated attributes or annotations must be provided. As a result, merging existing datasets has become a common strategy. However, the sets of attributes across datasets are often inconsistent, and their naive concatenation typically leads to block-wise missing conditions. This presents a significant challenge for conditional generative modeling when the multiple attributes are used jointly as conditions, thereby limiting the model's controllability and applicability. To address this issue, we propose a novel generative approach, Diffusion Model with Double Guidance, which enables precise conditional generation even when no training samples contain all conditions simultaneously. Our method maintains rigorous control over multiple conditions without requiring joint annotations. We demonstrate its effectiveness in molecular and image generation tasks, where it outperforms existing baselines both in alignment with target conditional distributions and in controllability under missing condition settings.

2505.12412 2026-03-31 stat.ML cs.LG

Training Latent Diffusion Models with Interacting Particle Algorithms

Tim Y. J. Wang, Juan Kuntz, O. Deniz Akyildiz

Comments Camera Ready version for AISTATS 2026

Abstract

We introduce a novel particle-based algorithm for end-to-end training of latent diffusion models. We reformulate the training task as minimizing a free energy functional and obtain a gradient flow that does so. By approximating the latter with a system of interacting particles, we obtain the algorithm, which we underpin theoretically by providing error guarantees. The novel algorithm compares favorably in experiments with previous particle-based methods and variational inference analogues.

2502.18674 2026-03-31 stat.ME

bayesNMF: Fast Bayesian Poisson NMF with Automatically Learned Rank Applied to Mutational Signatures

Jenna M. Landy, Nishanth Basava, Giovanni Parmigiani

Comments 16 pages, 4 figures (+ references and supplement). For open-source R software package, see https://github.com/jennalandy/bayesNMF. For all code used in the simulation studies and data application, see https://github.com/jennalandy/bayesNMF_PAPER

Abstract

Bayesian Poisson Non-Negative Matrix Factorization (NMF) is widely used to model count data, including in cancer mutational signature analysis. However, standard Gibbs samplers rely on computationally expensive Poisson augmentation, and current software implementations learn the latent rank either through slow and potentially subjective heuristic rank selection or with automatic approaches that do not report posterior uncertainty. In this paper, we introduce bayesNMF, an MH-within-Gibbs sampler to address both of these limitations. First, we define high-overlap proposals for Metropolis-Hastings sampling to remove the need for Poisson augmentation. Second, we define a BIC-based sparsity prior to learn rank automatically within the Bayesian formulation while allowing for posterior uncertainty quantification. We provide an open-source R software package with all of the models and plotting capabilities demonstrated in this paper on GitHub at jennalandy/bayesNMF. Although our applications focus on cancer mutational signatures, our software and results can be extended to any use of Bayesian Poisson NMF.

2501.00277 2026-03-31 stat.ML cs.AI cs.HC cs.LG

Efficient Human-in-the-Loop Active Learning: A Novel Framework for Data Labeling in AI Systems

Yiran Huang, Jian-Feng Yang, Haoda Fu

Abstract

Modern AI algorithms require labeled data. In the real world, the majority of data are unlabeled, and labeling them is costly. This is particularly true in areas requiring special skills, such as the reading of radiology images by physicians. To use experts' time for data labeling most efficiently, one promising approach is human-in-the-loop active learning. In this work, we propose a novel active learning framework with significant potential for application in modern AI systems. Unlike traditional active learning methods, which focus only on determining which data point should be labeled, our framework also introduces an innovative perspective on incorporating different query schemes. We propose a model to integrate the information from different types of queries. Based on this model, our active learning framework can automatically determine how the next question is queried. We further incorporate a data-driven exploration and exploitation framework into our active learning method. This method can be embedded in numerous active learning algorithms. Through simulations on five real-world datasets, including a highly complex real image task, our proposed active learning framework exhibits higher accuracy and lower loss compared to other methods.

2412.16320 2026-03-31 stat.ME stat.AP

Combining BART and Principal Stratification to estimate the effect of intermediate variables on primary outcomes with application to estimating the effect of family planning on employment in Nigeria and Senegal

Lucas Godoy Garraza, Ilene Speizer, Leontine Alkema

Comments arXiv admin note: text overlap with arXiv:2408.03777

Abstract

There is interest in learning about the causal effects of modern contraceptive use on empowerment outcomes. Data on this question often come from family planning (FP) programs that increase access to FP and facilitate contraceptive use among some women, rather than directly assigning use. Women whose contraceptive behavior changes because of these programs ("compliers") may differ from target populations in ways that alter the consequences of contraceptive use for empowerment outcomes. We propose a two-step approach. First, we use principal stratification and Bayesian Additive Regression Trees (BART) to estimate the effect of modern contraceptive use among compliers in the study population, treating the FP program as an instrument rather than as the treatment of interest. Second, we generalize these complier-specific effects to a broader population by averaging conditional effects over the covariate distribution in the target population, with uncertainty in that distribution quantified via a Bayesian bootstrap applied to external complex survey data. We examine performance in simulation designs previously used to evaluate IV estimators. We then apply the approach to employment among urban women in Nigeria and Senegal, finding strong and heterogeneous effects of contraceptive use. Sensitivity analyses suggest robustness to violations of assumptions for internal and external validity.

2408.03777 2026-03-31 stat.ME stat.AP

Combining BART and Principal Stratification to estimate the effect of intermediate on primary outcomes with application to estimating the effect of family planning on employment in sub-Saharan Africa

Lucas Godoy Garraza, Ilene Speizer, Leontine Alkema

Comments We are withdrawing this paper as it has been merged with another manuscript into a single, consolidated work. The combined paper is already available at arXiv:2412.16320. As this submission corresponds to only one component of that work, it is no longer being pursued as a standalone paper.

Abstract

There is interest in learning about the causal effect of family planning (FP) on empowerment related outcomes. Experimental data related to this question are available from trials in which FP programs increase access to FP. While program assignment is unconfounded, FP uptake and subsequent empowerment may share common causes. We use principal stratification to estimate the causal effect of an intermediate FP outcome on a primary outcome of interest, among women affected by a FP program. Within strata defined by the potential reaction to the program, FP uptake is unconfounded. To minimize the need for parametric assumptions, we propose to use Bayesian Additive Regression Trees (BART) for modeling stratum membership and outcomes of interest. We refer to the combined approach as Prince BART. We evaluate Prince BART through a simulation study and use it to assess the causal effect of modern contraceptive use on employment in six cities in Nigeria, based on quasi-experimental data from a FP program trial during the first half of the 2010s. We show that findings differ between Prince BART and alternative modeling approaches based on parametric assumptions.

2407.13261 2026-03-31 stat.ME

Enhanced inference for distributions and quantiles of individual treatment effects in various experiments

Zhe Chen, Xinran Li

Journal ref
Journal of the American Statistical Association, 2026
Abstract

Understanding treatment effect heterogeneity has become increasingly important in many fields. In this paper we study distributions and quantiles of individual treatment effects to provide a more comprehensive and robust understanding of treatment effects beyond the usual averages, even though they are more challenging to infer because they are not identifiable from observed data. Recent randomization-based approaches offer finite-sample valid inference for treatment effect distributions and quantiles in both completely randomized and stratified randomized experiments, but can be overly conservative by assuming the worst-case scenario where units with large effects are all assigned to the treated (or control) group. We introduce two improved methods to enhance the power of these existing approaches. The first method reinterprets existing approaches as inferring treatment effects among only treated or control units, and then combines the inference for treated and control units to infer treatment effects for all units. The second method explicitly controls for the actual number of treated units with large effects. Both simulations and applications demonstrate the substantial gain from the improved methods. These methods are further extended to sampling-based experiments as well as quasi-experiments from matching, in which the ideas for both improved methods play critical and complementary roles.

2403.05704 2026-03-31 econ.EM cs.SI stat.AP stat.ME

Non-robustness of diffusion estimates on networks with measurement error

Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei

Abstract

Network diffusion models are used to study disease transmission, information spread, technology adoption, and other socio-economic processes. We show that estimates of these diffusions are highly non-robust to mismeasurement. First, even when the network is measured perfectly, small and local mismeasurement in the initial seed generates a large shift in the locations of the expected diffusion. Second, if instead the initial seed is known, even a vanishingly small share of missed links causes diffusion forecasts to be significant under-estimates. Forecast failure depends critically on the geometry of measurement error: we provide sufficient conditions for catastrophic failure when missing links bridge distant network regions (acting as shortcuts), and sufficient conditions for robustness when missing links are a uniformly, randomly thinned subset of the full network (preserving network structure). Such failures exist even when the basic reproductive number is consistently estimable. We explore difficulties implementing possible solutions and conduct simulations on synthetic and real networks.

2401.15703 2026-03-31 stat.ME stat.AP

A Bayesian multivariate extreme value mixture model

Chenglei Hu, Ben Swallow, Daniela Castro-Camilo

Comments 35 pages, 9 figures

Abstract

Impact assessment of natural hazards requires the consideration of both extreme and non-extreme events. Extensive research has been conducted on the joint modeling of bulk and tail in univariate settings; however, the corresponding body of research in the context of multivariate analysis is comparatively scant. This study extends the univariate joint modeling of bulk and tail to the multivariate framework. Specifically, it pertains to cases where multivariate observations exceed a high threshold in at least one component. We propose a multivariate mixture model that assumes a parametric model to capture the bulk of the distribution, which is in the max-domain of attraction (MDA) of a multivariate extreme value distribution (mGEVD). The tail is described by the multivariate generalized Pareto distribution, which is asymptotically justified to model multivariate threshold exceedances. We show that if all components exceed the threshold, our mixture model is in the MDA of an mGEVD. Bayesian inference based on multivariate random-walk Metropolis-Hastings and the automated factor slice sampler allows us to incorporate uncertainty from the threshold selection easily. Due to computational limitations, simulations and data applications are provided for dimension $d=2$, but a discussion is provided with views toward scalability based on pairwise likelihood.

2311.16793 2026-03-31 stat.ME

Mediation analysis with unmeasured confounding between parallel mediators and outcome

Kang Shuai, Lan Liu, Yangbo He, Wei Li

Comments 40 pages

Abstract

Mediation analysis extending beyond single mediators has gained significant attention in recent years. However, related methods often assume the absence of unmeasured mediator-outcome confounding. To address this, we develop a mediation analysis framework that accounts for such confounding within a linear structural equation model with parallel mediators. Specifically, we introduce a pseudo proxy variable to capture unmeasured confounding, allowing us to identify causal parameters. Leveraging this proxy, we propose a partially penalized method to identify mediators that significantly affect the outcome. The resultant estimates are consistent, and the estimates of nonzero parameters are asymptotically normal. Motivated by these results, we further introduce a procedure that can consistently select active mediation pathways with large probability. Simulation studies demonstrate the superior performance of the proposed approach. Finally, we apply our approach to genomic data, identifying gene expressions that potentially mediate the impact of a genetic variant on mouse obesity.

2311.10153 2026-03-31 math.ST math.PR stat.TH

Optimal recovery by maximum and integrated conditional likelihood in the general Stochastic Block Model

Andressa Cerqueira, Florencia Leonardi

Abstract

In this paper, we obtain new results on the weak and strong consistency of the maximum and integrated conditional likelihood estimators for the community detection problem in the Stochastic Block Model with $k$ communities and unknown parameters. In particular, we show that maximum conditional likelihood achieves the optimal known threshold for exact recovery in the logarithmic degree regime. For the integrated conditional likelihood, we obtain a sub-optimal constant, but still obtain strong consistency in the logarithmic degree regime. Both methods are shown to be weakly consistent in the divergent degree regime. These results fill in the gap in the theory of community detection with maximum likelihood and integrated conditional likelihood, solving open problems in the literature.

2303.03521 2026-03-31 stat.ME stat.CO

Bayesian Variable Selection for Function-on-Scalar Regression Models: a comparative analysis

Pedro Henrique T. O. Sousa, Camila P. E. de Souza, Ronaldo Dias

Abstract

In this work, we developed a new Bayesian method for variable selection in function-on-scalar regression (FOSR). Our method uses a hierarchical Bayesian structure and latent variables to enable an adaptive covariate selection process for FOSR. Extensive simulation studies show the proposed method's main properties, such as its accuracy in estimating the coefficients and high capacity to select variables correctly. Furthermore, we conducted a substantial comparative analysis with the main competing methods, the BGLSS (Bayesian Group Lasso with Spike and Slab prior) method, the group LASSO (Least Absolute Shrinkage and Selection Operator), the group MCP (Minimax Concave Penalty), and the group SCAD (Smoothly Clipped Absolute Deviation). Our results demonstrate that the proposed methodology is superior in correctly selecting covariates compared with the existing competing methods while maintaining a satisfactory level of goodness of fit. In contrast, the competing methods could not balance selection accuracy with goodness of fit. We also considered a COVID-19 dataset and some socioeconomic data from Brazil as an application and obtained satisfactory results. In short, the proposed Bayesian variable selection model is highly competitive, showing significant predictive and selective quality.

2212.13641 2026-03-31 stat.ME

A Nonparametric Framework for Universal Difference-in-Differences

Chan Park, Eric Tchetgen Tchetgen

Abstract

Difference-in-differences (DiD) is a popular approach to evaluate treatment effects in settings where both pre- and post-treatment measurements of the outcome are available. Despite its popularity, existing methods face important limitations. Specifically, they either: (i) only apply to continuous outcomes and the average treatment effect on the treated; (ii) are sensitive to the transformation of the outcome; (iii) rely on a no unmeasured confounding assumption given pre-treatment covariates and outcome; (iv) lack semiparametric efficiency theory. In this paper, we introduce a novel framework for causal identification and inference in DiD settings that overcomes limitations (i)-(iv), making it the only existing framework that simultaneously satisfies these properties. Key to our framework is an odds ratio equi-confounding assumption, which states that the generalized odds ratio function relating treatment and treatment-free potential outcome is stable across time periods, a form of distributional parallel trends assumption. Under this assumption, we establish nonparametric identification of virtually any standard treatment effect on the treated, including quantile treatment effects on the treated. We also develop corresponding consistent, asymptotically linear, and semiparametric efficient estimators that leverage modern statistical learning theory. We illustrate our framework through simulation studies and two real-world applications using Zika virus outbreak data and traffic safety data.

2211.01512 2026-03-31 cs.LG math.ST stat.TH

Convergence of the Inexact Langevin Algorithm in KL Divergence with Application to Score-based Generative Models

Kaylee Yingxi Yang, Andre Wibisono

Comments Improved SGM convergence dependency on the LSI constant, and a minor correction to the MGF error assumption

Abstract

Motivated by the increasingly popular Score-based Generative Modeling (SGM), we study the Inexact Langevin Dynamics (ILD) and Inexact Langevin Algorithm (ILA) where a score function estimate is used in place of the exact score. We establish {\em stable} biased convergence guarantees in terms of the Kullback-Leibler (KL) divergence. To achieve these guarantees, we impose two key assumptions: 1) the target distribution satisfies the log-Sobolev inequality, and 2) the error of the score estimator exhibits a sub-Gaussian tail, referred to as the Moment Generating Function (MGF) error assumption. Under the stronger $L^\infty$ score error assumption, we obtain a stable convergence bound in Rényi divergence. We also generalize the proof technique to SGM, and derive a stable convergence bound in KL divergence. In addition, we explore the question of how to obtain a provably accurate score estimator. We demonstrate that a simple estimator based on kernel density estimation fulfills the MGF error assumption for sub-Gaussian target distributions, at the population level.

2603.26935 2026-03-31 stat.AP

The Load Management Paradox: Correcting the Healthy-Worker Survivor Effect in NBA Injury Modeling

Yue Yu, Guanyu Hu

Comments 40 pages, 23 figures

Abstract

In professional sports analytics, evaluating the relationship between accumulated workload and injury risk is a central objective. However, naive survival models applied to NBA game-log data consistently yield a paradox: players who recently logged heavy minutes appear less likely to sustain an injury. We demonstrate that this counterintuitive result is an artifact of the healthy-worker survivor effect, wherein conditioning on game participation induces severe collider bias driven by unobserved latent fitness. To address this structural confounding, we develop a Marginal Structural Piecewise Exponential Model (MS-PEM) that unifies inverse probability of treatment weighting (IPTW) with flexible piecewise-exponential additive models and weighted cumulative exposure (WCE). A simulation study confirms that this selection mechanism is mathematically sufficient to entirely reverse the sign of the true association between workload and injury. Applying the MS-PEM to 78,594 player-game observations across three NBA seasons (encompassing 771 players and 2,439 injury events), we find that adjusting for observed selection reliably shifts the hazard back toward the underlying physiological relationship. While the exact magnitude of the correction is sensitive to outcome-model regularization (attenuating the paradoxical weight function by 1% to 2% under conservative cross-validation and up to 63% to 78% under lighter penalization), the positive direction of the causal correction is highly robust across multiple propensity specifications and doubly robust checks. Ultimately, these results provide a methodological template for bias-aware sports injury modeling, while cautioning that models relying strictly on observational game logs will systematically underestimate the true risk of heavy workloads without richer physiological data for full causal identification.

2603.26923 2026-03-31 stat.ML cs.LG math.DS

Koopman Operator Identification of Model Parameter Trajectories for Temporal Domain Generalization (KOMET)

Randy C. Hoover, Jacob James, Paul May, Kyle Caudle

Abstract

Parametric models deployed in non-stationary environments degrade as the underlying data distribution evolves over time (a phenomenon known as temporal domain drift). In the current work, we present KOMET (Koopman Operator identification of Model parameter Evolution under Temporal drift), a model-agnostic, data-driven framework that treats the sequence of trained parameter vectors as the trajectory of a nonlinear dynamical system and identifies its governing linear operator via Extended Dynamic Mode Decomposition (EDMD). A warm-start sequential training protocol enforces parameter-trajectory smoothness, and a Fourier-augmented observable dictionary exploits the periodic structure inherent in many real-world distribution drifts. Once identified, KOMET's Koopman operator predicts future parameter trajectories autonomously, without access to future labeled data, enabling zero-retraining adaptation at deployment. Evaluated on six datasets spanning rotating, oscillating, and expanding distribution geometries, KOMET achieves mean autonomous-rollout accuracies between 0.981 and 1.000 over 100 held-out time steps. Spectral and coupling analyses further reveal interpretable dynamical structure consistent with the geometry of the drifting decision boundary.

2603.26914 2026-03-31 stat.ME stat.AP

A Bayesian Functional Concurrent Zero-Inflated Dirichlet-Multinomial Regression Model with Application to Infant Microbiome

Brody Erlandson, Ander Wilson, Matthew D. Koslovsky

Comments Contact author for Supplemental Material

Abstract

The infant microbiome undergoes rapid changes in composition over time and is associated with long-term risks of conditions such as immune strength, allergy, asthma, and other health outcomes. Modeling the associations between exposures or treatments and microbial composition over time is essential for understanding the factors that drive these changes. Estimating these temporal dynamics has several challenges including: repeated measures, overdispersion, compositionality, high-dimensional parameter spaces, and zero-inflation. Many longitudinal regression models used in human microbiome research assume constant effects over time that cannot capture time-varying or functional effects of exposures, ignore the compositional structure of the data by modeling each taxon separately, and are not equipped to handle potential zero-inflation. Dirichlet-multinomial (DM) regression models inherently accommodate overdispersion and the compositional structure of the data and have been extended to account for excess zeros. However, existing DM-based regression models are unable to additionally handle repeated measures designs. To fill this gap, we propose a functional concurrent zero-inflated Dirichlet-multinomial (FunC-ZIDM) regression model which is designed to model time-varying relations between observed covariates and microbial taxa while accounting for zero-inflation, compositionality, and repeated measures. Through simulation, we demonstrate that the model can accurately estimate the underlying functional relations and scale to large compositional spaces. We apply our model to investigate time-varying associations between infant microbiome composition and observed covariates during the 11-week postnatal period. We found that $α$-diversity (i.e., diversity of the microbiome within an individual) is positively associated with a higher gestational age and percentage of breast milk in the diet.

2603.26913 2026-03-31 stat.AP

Synthesizing the Counterfactual: A CTGAN-Augmented Causal Evaluation of Palliative Care on Spousal Depression

Pietro Grassi, Roberto Molinari, Chiara Seghieri, Daniele Vignoli

Abstract

Spousal bereavement severely deteriorates mental health. While palliative care benefits dying patients, its "stress-buffering" effect on survivors' depression remains empirically elusive due to acute small-$N$ constraints in longitudinal dyadic data. This study evaluates the causal impact of palliative care on bereaved spouses while introducing Synthetic Data Generation (SDG) to resolve sample attrition in quasi-experimental designs. Using SHARE panel data, we augment the sparse treated cohort via a Conditional Tabular GAN, anchoring synthetic trajectories to empirical baseline constraints to preserve causal pathways. A Matched Difference-in-Differences estimator applied to the high-fidelity augmented dataset evaluates the treatment effect. Results reveal a non-linear psychological response. Palliative care initially exacerbates acute depressive symptoms at the time of loss ($β_0 = 0.218,\ p < 0.05$), reflecting the intense emotional confrontation of the intervention. However, a sustained stress-buffering effect emerges in subsequent periods ($β_2 = -0.763,\ p < 0.01$), indicating an accelerated long-term recovery compared to standard care. Estimates are highly robust to unobserved confounding (Oster's $δ> 1$). Substantively, we advocate for reconceptualizing end-of-life care as a dyadic public health intervention. Methodologically, we establish SDG as a robust analytical tool capable of powering fragile quasi-experiments in longitudinal social surveys.

2603.26901 2026-03-31 stat.AP math.OC math.PR math.ST q-fin.RM stat.TH

Biased Mean Quadrangle and Applications

Anton Malandii, Stan Uryasev

Abstract

This paper introduces \emph{biased mean regression}, estimating the \emph{biased mean}, i.e., $\mathbb{E}[Y] + x$, where $x \in \mathbb{R}$. The approach addresses a fundamental statistical problem that covers numerous applications. For instance, it can be used to estimate factors driving portfolio loss exceeding the expected loss by a specified amount (e.g., $x = \$10$ billion) or to estimate factors impacting a specific excess release of radiation in the environment, where nuclear safety regulations specify different severity levels. The estimation is performed by minimizing the so-called \emph{superexpectation error}. We establish two equivalence results that connect the method to popular paradigms: (i) biased mean regression is equivalent to quantile regression for an appropriate parameterization and is equivalent to ordinary least squares when $x=0$; (ii) in portfolio optimization, minimizing \emph{superexpectation risk}, associated with the superexpectation error, is equivalent to CVaR optimization. The approach is computationally attractive, as minimizing the superexpectation error reduces to linear programming (LP), thereby offering algorithmic and modeling advantages. It is also a good alternative to ordinary least squares (OLS) regression. The approach is based on the \emph{Risk Quadrangle} (RQ) framework, which links four stochastic functionals -- error, regret, risk, and deviation -- through a statistic. For the biased mean quadrangle, the statistic is the biased mean. We study properties of the new quadrangle, such as \emph{subregularity}, and establish its relationship to the quantile quadrangle. Numerical experiments confirm the theoretical statements and illustrate the practical implications.

2603.26888 2026-03-31 stat.AP

Statistical modeling of breast cancer radiomic features and hazard using image registration-aided longitudinal CT data

Subrata Mukherjee, Qian Cao, Thibaud Coroller, Ravi K. Samala, Nicholas Petrick, Berkman Sahiner

Comments 18 pages, 8 figures, 2 tables. arXiv admin note: substantial text overlap with arXiv:2501.06814

Abstract

Patients with metastatic breast cancer (mBC) undergo repeated computed tomography (CT) imaging during treatment to monitor disease progression. Accurate longitudinal tracking of individual lesions across scans from multiple radiologists is essential for reliable radiomic analysis and clinical decision-making. We conducted a retrospective study using serial chest CT scans from the Phase III MONALEESA-3 and MONALEESA-7 trials and developed statistical models for multi-source data integration and survival analysis. First, we introduced a Registration-based Automated Matching and Correspondence (RAMAC) algorithm to establish lesion correspondence across annotations from different radiologists and imaging time points using the Hungarian algorithm. Second, using the RAMAC-processed dataset, we developed interpretable radiomic survival models for progression-free survival prediction by combining baseline radiomic features, post-treatment changes at Weeks 8, 16, and 24, and demographic variables. To address the high dimensionality of longitudinal radiomic data, feature reduction was performed using an L1-penalized additive Cox proportional hazards model and best subset selection followed by Cox modeling. Model performance was evaluated using the concordance index (C-index). Incorporating additional imaging time points improved predictive performance, increasing the mean C-index from 0.58 at baseline to 0.64. Joint modeling further showed significant associations between longitudinal radiomic features and survival outcomes over time.
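As a rough illustration of the evaluation metric named in this abstract, the following is a minimal sketch of a Harrell-style concordance index in plain Python. It is not the authors' implementation: the function name `concordance_index`, the handling of tied event times (skipped here), and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C-index (illustrative sketch, hypothetical helper).

    times  : observed follow-up times
    events : 1 if the event (e.g. progression) was observed, 0 if censored
    risks  : predicted risk scores; higher should mean earlier event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk for the earlier event: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, purely hypothetical: four subjects, the third is censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
print(concordance_index(toy_times, toy_events, toy_risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported change from 0.58 to 0.64 should be read.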

2603.26862 2026-03-31 stat.ME math.ST stat.TH

The exact amount of t-ness that the normal model can tolerate

Nils Lid Hjort

Comments 21 pages, 2 figures; Statistical Research Report, Department of Mathematics, University of Oslo, from July 1993, but arXiv'd April 2026. The article is published in Journal of the American Statistical Association, 1994, vol. 89, pages 665-675, in a slightly abridged form, at this url: https://www.tandfonline.com/doi/abs/10.1080/01621459.1994.10476791

Journal ref
Journal of the American Statistical Association, 1994, vol. 89, pages 665-675
Abstract

Suppose that the normal model is used for data $Y_1,\ldots,Y_n$, but that the true distribution is a t-distribution with location and scale parameters $ξ$ and $σ$ and $m$ degrees of freedom. The normal model corresponds to $m=\infty$. Using a local asymptotic framework where $m$ is allowed to increase with $n$, two classes of estimands are identified. One small class, which in particular contains the functions of $ξ$ alone, is only affected by t-ness to the second order, and maximum likelihood estimation in the two- or three-parameter models becomes equivalent. For all other estimands it is shown that if $m\ge1.458\sqrt{n}$, then maximum likelihood estimation using the incorrect normal model is still more precise than using the correct three-parameter model. This is furthermore shown to be true in regression models with t-distributed residuals. We also propose and analyse compromise estimators that in various ways interpolate between the normal and the nonnormal models. A separate section extends the t-ness results to general normal scale mixtures, in which case the tolerance radius around the normal error distribution takes the form of an upper bound $0.3429/\sqrt{n}$ for the variance of the scale mixture distribution. Proving our results requires somewhat nonstandard 'corner asymptotics' since behaviour of estimators must be studied when the crucial parameter $γ=1/m$ is close to zero, which is not an inner point of the parameter space, and the maximum likelihood estimator of $m$ is equal to $\infty$ with positive probability.
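To make the two tolerance rules quoted in this abstract concrete, here is a small arithmetic sketch in Python. It only evaluates the stated expressions $1.458\sqrt{n}$ and $0.3429/\sqrt{n}$ for a few sample sizes; the function names and the chosen values of $n$ are assumptions for illustration and do not come from the paper.

```python
import math


def min_df_for_normal_model(n: int) -> float:
    """Degrees-of-freedom threshold 1.458 * sqrt(n) from the abstract.

    Per the abstract, if the true t-distribution has m >= this value,
    ML under the (incorrect) normal model is still the more precise choice.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1.458 * math.sqrt(n)


def max_scale_mixture_variance(n: int) -> float:
    """Upper bound 0.3429 / sqrt(n) on the scale-mixture variance (from the abstract)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 0.3429 / math.sqrt(n)


# Hypothetical sample sizes, chosen only to show how the thresholds scale.
for n in (25, 100, 400):
    print(n, round(min_df_for_normal_model(n), 2), round(max_scale_mixture_variance(n), 4))
```

Because the threshold grows like $\sqrt{n}$, quadrupling the sample size doubles the degrees of freedom needed for the normal model to remain preferable.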

2603.26858 2026-03-31 cs.LG math.SP q-bio.GN stat.ML

A Hierarchical Sheaf Spectral Embedding Framework for Single-Cell RNA-seq Analysis

Xiang Xiang Wang, Guo-Wei Wei

Abstract

Single-cell RNA-seq data analysis typically requires representations that capture heterogeneous local structure across multiple scales while remaining stable and interpretable. In this work, we propose a hierarchical sheaf spectral embedding (HSSE) framework that constructs informative cell-level features based on persistent sheaf Laplacian analysis. Starting from scale-dependent low-dimensional embeddings, we define cell-centered local neighborhoods at multiple resolutions. For each local neighborhood, we construct a data-driven cellular sheaf that encodes local relationships among cells. We then compute persistent sheaf Laplacians over sampled filtration intervals and extract spectral statistics that summarize the evolution of local relational structure across scales. These spectral descriptors are aggregated into a unified feature vector for each cell and can be directly used in downstream learning tasks without additional model training. We evaluate HSSE on twelve benchmark single-cell RNA-seq datasets covering diverse biological systems and data scales. Under a consistent classification protocol, HSSE achieves competitive or improved performance compared with existing multiscale and classical embedding-based methods across multiple evaluation metrics. The results demonstrate that sheaf spectral representations provide a robust and interpretable approach for single-cell RNA-seq data representation learning.

2603.26850 2026-03-31 math.ST stat.TH

Estimation of projection operators with Gaussian noise

Luca Castelli

Abstract

This paper focuses on random projection operators when the subspace of projection is estimated. We derive non-asymptotic upper bounds on the error between the projection onto the estimated subspace and the projection onto the underlying subspace. The provided upper bounds depend on the noise and on intrinsic properties of the estimated subspace. Several scenarios are considered according to the distribution of the estimator of the matrix spanning the subspace. The aforementioned bounds are attained under a structural assumption on the Gram matrix associated with the subspace. Regularized estimators are introduced to circumvent this assumption. An example is given in the partial least square (PLS) framework where the estimated subspace is spanned by the PLS weights.

2603.26820 2026-03-31 eess.IV cs.CV stat.AP stat.CO

Toward Actionable Digital Twins for Radiation-Based Imaging and Therapy: Mathematical Formulation, Modular Workflow, and an OpenKBP-Based Dose-Surrogate Prototype

Hsin-Hsiung Huang, Bulent Soykan

Abstract

Digital twins for radiation-based imaging and therapy are most useful when they assimilate patient data, quantify predictive uncertainty, and support clinically constrained decisions. This paper presents a modular framework for actionable digital twins in radiation-based imaging and therapy and instantiates its reproducible open-data component using the OpenKBP benchmark. The framework couples PatientData, Model, Solver, Calibration, and Decision modules and formalizes latent-state updating, uncertainty propagation, and chance-constrained action selection. As an initial implementation, we build a GPU-ready PyTorch/MONAI reimplementation of the OpenKBP starter pipeline: an 11-channel, 19.2M-parameter 3D U-Net trained with a masked loss over the feasible region and equipped with Monte Carlo dropout for voxel-wise epistemic uncertainty. To emulate the update loop on a static benchmark, we introduce decoder-only proxy recalibration and illustrate uncertainty-aware virtual-therapy evaluation using DVH-based and biological utilities. A complete three-fraction loop including recalibration, Monte Carlo inference, and spatial optimization executes in 10.3 s. On the 100-patient test set, the model achieved mean dose and DVH scores of 2.65 and 1.82 Gy, respectively, with 0.58 s mean inference time per patient. The OpenKBP case study thus serves as a reproducible test bed for dose prediction, uncertainty propagation, and proxy closed-loop adaptation, while future institutional studies will address longitudinal calibration with delivered-dose logs and repeat imaging.

2603.26813 2026-03-31 physics.ins-det cs.LG stat.ML

Calorimeter Shower Superresolution with Conditional Normalizing Flows: Implementation and Statistical Evaluation

Andrea Cosso

Comments Master's thesis. arXiv admin note: text overlap with arXiv:2409.16336 by other authors

Details
English abstract

In High Energy Physics, detailed calorimeter simulations and reconstructions are essential for accurate energy measurements and particle identification, but their high granularity makes them computationally expensive. Developing data-driven techniques capable of recovering fine-grained information from coarser readouts, a task known as calorimeter superresolution, offers a promising way to reduce both computational and hardware costs while preserving detector performance. This thesis investigates whether a generative model originally designed for fast simulation can be effectively applied to calorimeter superresolution. Specifically, the model proposed in arXiv:2308.11700 is re-implemented independently and trained on the CaloChallenge 2022 dataset based on the Geant4 Par04 calorimeter geometry. Finally, the model's performance is assessed through a rigorous statistical evaluation framework, following the methodology introduced in arXiv:2409.16336, to quantitatively test its ability to reproduce the reference distributions.

2603.26803 2026-03-31 cs.LG stat.ML

A Comparative Investigation of Thermodynamic Structure-Informed Neural Networks

Guojie Li, Liu Hong

Comments 30 pages, 9 figures, 2 tables

Details
English abstract

Physics-informed neural networks (PINNs) offer a unified framework for solving both forward and inverse problems of differential equations, yet their performance and physical consistency strongly depend on how governing laws are incorporated. In this work, we present a systematic comparison of different thermodynamic structure-informed neural networks by incorporating various thermodynamic formulations, including Newtonian, Lagrangian, and Hamiltonian mechanics for conservative systems, as well as the Onsager variational principle and extended irreversible thermodynamics for dissipative systems. Through comprehensive numerical experiments on representative ordinary and partial differential equations, we quantitatively evaluate the impact of these formulations on accuracy, physical consistency, noise robustness, and interpretability. The results show that Newtonian-residual-based PINNs can reconstruct system states but fail to reliably recover key physical and thermodynamic quantities, whereas structure-preserving formulations significantly enhance parameter identification, thermodynamic consistency, and robustness. These findings provide practical guidance for the principled design of thermodynamically consistent models and lay the groundwork for integrating more general nonequilibrium thermodynamic structures into physics-informed machine learning.

2603.26796 2026-03-31 cs.LG cs.AI stat.ML

Robust Batch-Level Query Routing for Large Language Models under Cost and Capacity Constraints

Jelena Markovic-Voronov, Kayhan Behdin, Yuanda Xu, Zhengze Zhou, Zhipeng Wang, Rahul Mazumder

Details
English abstract

We study the problem of routing queries to large language models (LLMs) under cost, GPU resource, and concurrency constraints. Prior per-query routing methods often fail to control batch-level cost, especially under non-uniform or adversarial batching. To address this, we propose a batch-level, resource-aware routing framework that jointly optimizes model assignment for each batch while respecting cost and model capacity limits. We further introduce a robust variant that accounts for uncertainty in predicted LLM performance, along with an offline instance allocation procedure that balances quality and throughput across multiple models. Experiments on two multi-task LLM benchmarks show that robustness improves accuracy by 1-14% over non-robust counterparts (depending on the performance estimator), batch-level routing outperforms per-query methods by up to 24% under adversarial batching, and optimized instance allocation yields additional gains of up to 3% compared to a non-optimized allocation, all while strictly satisfying the cost and GPU resource constraints.

2603.26713 2026-03-31 cs.LG eess.SP stat.ML

Boundary-aware Prototype-driven Adversarial Alignment for Cross-Corpus EEG Emotion Recognition

Guangli Li, Canbiao Wu, Na Tian, Li Zhang, Zhen Liang

Details
English abstract

Electroencephalography (EEG)-based emotion recognition suffers from severe performance degradation when models are transferred across heterogeneous datasets due to physiological variability, experimental paradigm differences, and device inconsistencies. Existing domain adversarial methods primarily enforce global marginal alignment and often overlook class-conditional mismatch and decision boundary distortion, limiting cross-corpus generalization. In this work, we propose a unified Prototype-driven Adversarial Alignment (PAA) framework for cross-corpus EEG emotion recognition. The framework is progressively instantiated in three configurations: PAA-L, which performs prototype-guided local class-conditional alignment; PAA-C, which further incorporates contrastive semantic regularization to enhance intra-class compactness and inter-class separability; and PAA-M, the full boundary-aware configuration that integrates dual relation-aware classifiers within a three-stage adversarial optimization scheme to explicitly refine controversial samples near decision boundaries. By combining prototype-guided subdomain alignment, contrastive discriminative enhancement, and boundary-aware aggregation within a coherent adversarial architecture, the proposed framework reformulates emotion recognition as a relation-driven representation learning problem, reducing sensitivity to label noise and improving cross-domain stability. Extensive experiments on SEED, SEED-IV, and SEED-V demonstrate state-of-the-art performance under four cross-corpus evaluation protocols, with average improvements of 6.72%, 5.59%, 6.69%, and 4.83%, respectively. Furthermore, the proposed framework generalizes effectively to clinical depression identification scenarios, validating its robustness in real-world heterogeneous settings. The source code is available at https://github.com/WuCB-BCI/PAA

2603.20507 2026-03-31 cs.LG stat.ML

Distributed Gradient Clustering: Convergence and the Effect of Initialization

Aleksandar Armacki, Himkant Sharma, Dragana Bajović, Dušan Jakovetić, Mrityunjoy Chakraborty, Soummya Kar

Comments 9 pages, 3 figures

Details
English abstract

We study the effects of center initialization on the performance of a family of distributed gradient-based clustering algorithms introduced in [1], which work over connected networks of users. In the considered scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a global clustering of the joint data. We perform extensive numerical experiments evaluating the effects of center initialization on the performance of our family of methods, and demonstrate that our methods are more resilient to the effects of initialization than centralized gradient clustering [2]. Next, inspired by the $K$-means++ initialization [3], we propose a novel distributed center initialization scheme, which is shown to improve the performance of our methods compared to the baseline random initialization.
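
For orientation, the standard centralized $K$-means++ seeding rule that the abstract cites as inspiration can be sketched as follows. This is not the distributed scheme proposed in the paper, whose details are not given here. The function names `squared_distance` and `kmeans_pp_init`, the `seed` parameter, and the toy data are all illustrative assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Centralized K-means++ seeding (illustrative sketch, not the paper's method).

    The first center is drawn uniformly at random. Each later center is drawn
    with probability proportional to the squared distance from a point to its
    nearest already-chosen center.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Hypothetical toy data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
centers = kmeans_pp_init(points, k=3)
```

Because already-chosen points get zero sampling weight, the returned centers are distinct whenever the input points are distinct.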

2512.00296 2026-03-31 stat.ME

Difference-in-differences with stochastic policy shifts of a continuous treatment

Michael Jetsupphasuk, Chenwei Fang, Didong Li, Michael G. Hudgens

Details
English abstract

Treatment effects of stochastic policy shifts quantify differences in outcomes across counterfactual scenarios with varying treatment distributions. Stochastic policy shifts may be of interest in settings where it is unrealistic or infeasible to deterministically manipulate treatments. In this paper, methods are developed to draw inference about stochastic policy effects under difference-in-differences (DiD) designs with a continuous treatment. The proposed causal estimand is the expected effect of modifying the continuous dose distribution among the treated, i.e., those that received a non-zero dose. Several possible stochastic policies are discussed and a general framework for identification and estimation is proposed. One stochastic policy applicable to many settings is the exponential tilt, which increments the conditional density function of the continuous dose. For the exponential tilt policy, a double/debiased machine learning estimator is proposed that allows for data-adaptive, nonparametric nuisance function estimation. Under mild convergence rate conditions, the estimator is shown to be root-$n$ consistent and asymptotically normal with variance attaining the nonparametric efficiency bound. The proposed method is used to study the effect of hydraulic fracturing activity on employment and income.
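
As a rough illustration of exponential tilting in general, a density $p(a)$ can be reweighted to $q(a) \propto e^{\delta a} p(a)$. The sketch below applies this to an empirical sample of doses. It ignores covariates and is not the double/debiased machine learning estimator described in the abstract. The function names, the tilt parameter `delta`, and the dose values are illustrative assumptions.

```python
import math


def exponential_tilt_weights(doses, delta):
    """Normalized weights w_i proportional to exp(delta * a_i).

    Assumption: an unconditional, empirical version of the tilt; the paper
    tilts a conditional density, which this sketch does not model.
    """
    # Subtract the largest exponent for numerical stability; it cancels out
    # when the weights are normalized.
    m = max(delta * a for a in doses)
    raw = [math.exp(delta * a - m) for a in doses]
    total = sum(raw)
    return [r / total for r in raw]


def tilted_mean(values, doses, delta):
    """Mean of `values` under the exponentially tilted dose distribution."""
    weights = exponential_tilt_weights(doses, delta)
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical doses among treated units (all non-zero).
doses = [0.5, 1.0, 1.5, 2.0]
observed_mean = sum(doses) / len(doses)
shifted_mean = tilted_mean(doses, doses, delta=0.8)
```

With `delta = 0` the weights are uniform and the observed distribution is recovered; a positive `delta` shifts mass toward larger doses.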

2305.10583 2026-03-31 math.CA math.DG math.ST stat.TH

Flagfolds: an approach to multi-dimensional varifolds

Blanche Buet, Xavier Pennec

Details
English abstract

By interpreting the product of Principal Component Analysis, that is, the covariance matrix, as a sequence of nested subspaces naturally coming with weights according to the level of approximation they provide, we are able to embed all $d$-dimensional Grassmannians into a stratified space of covariance matrices. We observe that Grassmannians constitute the lowest-dimensional skeleton of the stratification, while it is possible to define a Riemannian metric on the highest-dimensional and dense stratum, such a metric being compatible with the global stratification. With such a Riemannian metric at hand, it is possible to look for geodesics between two linear subspaces of different dimensions that do not go through higher-dimensional linear subspaces, as Euclidean geodesics would. Building upon the proposed embedding of Grassmannians into the stratified space of covariance matrices, we generalize the concept of varifolds to what we call flagfolds in order to model multi-dimensional shapes.