arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2604.28163 2026-05-01 eess.SP cs.LG stat.CO stat.ML

Sequential Inference for Gaussian Processes: A Signal Processing Perspective

Daniel Waxman, Fernando Llorente, Petar M. Djurić

Comments 53 pages, 7 figures. Accepted to IEEE Signal Processing Magazine

Abstract

The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in its nearly 100-year history. ML models support the development of SP systems that represent complex, nonlinear relationships with high predictive accuracy. Adapting these models often requires sequential inference, which differs both theoretically and methodologically from the usual paradigm of ML, where data are often assumed independent and identically distributed. Gaussian processes (GPs) are a flexible yet principled framework for modeling random functions, and they have become increasingly relevant to SP as statistical and ML methods assume a more prominent role. We provide a self-contained, tutorial-style overview of GPs, with a particular focus on recent methodological advances in sequential, incremental, or streaming inference. We introduce these techniques from a signal-processing perspective while bridging them to recent advances in ML. Many of the developments we survey have direct applications to state-space modeling, sequential regression and forecasting, anomaly detection in time series, sequential Bayesian optimization, adaptive and active sensing, and sequential detection and decision-making. By organizing these advances from a signal-processing perspective, we intend to equip practitioners with practical tools and a coherent roadmap for deploying sequential GP models in real-world systems.

2604.28104 2026-05-01 stat.ME math.ST stat.TH

Kernel-based independence and mean independence tests for weakly dependent data

Daniel Diz-Castro, Manuel Febrero-Bande, Wenceslao González-Manteiga

Abstract

We provide a unified framework for independence and mean independence tests based on the Hilbert-Schmidt independence criterion, extending some previous results in the literature to hold in general topological spaces. We also present a complete theoretical analysis of the asymptotic behavior of the test statistic when the observed sample corresponds to a partial sample path of some stationary and ergodic stochastic process under near epoch dependence assumptions. In particular, we explore the consistency and limit distribution of the test statistic under both fixed and local hypotheses. The finite sample performance of the test(s) is illustrated with a succinct simulation study involving functional data.
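
As a rough illustration of the criterion these tests build on, the sketch below computes the standard biased empirical HSIC estimate, (1/n^2) tr(KHLH), for scalar samples. The Gaussian kernel, the fixed bandwidth, and all function names are assumptions made for this example; it is not the authors' implementation and does not reproduce their treatment of weakly dependent or functional data.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_biased(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC: (1 / n^2) * trace(K H L H)."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) = trace((H K H) L) = sum_ij (H K H)_ij * L_ij, since L is symmetric.
    return sum(
        k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n)
    ) / n ** 2
```

A value near zero is what one would expect when the second sample carries no information about the first (for instance, a constant sequence), while strongly related samples give a clearly positive value. Turning the statistic into a test still requires a null distribution, which is the subject of the paper.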

2604.28047 2026-05-01 stat.ME

Data-Adaptive and Model-Robust Covariate Adjustment for Time-to-Event Outcomes in Stratified Randomized Trials

Raphael C. Kim, Brian Gilbert, Ramin Zabih, Michele Santacatterina, Ivan Diaz

Abstract

Time-to-event outcomes are commonly used as primary endpoints in randomized clinical trials. Despite this, relatively little work incorporates baseline covariate information while also accounting for stratified randomization, a common form of randomization. Moreover, leveraging efficiency gains using these approaches typically requires pre-specifying a subset of covariates that are most predictive of the outcome -- a challenging task in practice, as most trials collect dozens of potentially prognostic baseline variables. In this work, we build on existing literature to propose a data-adaptive and model-robust covariate adjustment method for time-to-event outcomes. Our approach, based on targeted minimum loss-based estimation, allows for data-adaptive covariate selection and model-robust efficient inference on functionals of the survival curve while accounting for stratification. Through extensive simulations and analysis, we showcase the simplicity and improved precision of our method when the covariate set is not known a priori.

2604.28027 2026-05-01 stat.ME

Response to: "A note on conditional densities, Bayes' rule, and recent criticisms of Bayesian inference" by Yan et al., 2026

Klaus Mosegaard, Andrew Curtis

Comments 10 pages, 0 figures

Abstract

In a recent preprint (Mosegaard and Curtis, 2024, arXiv:2411.13570v2) we analyzed the consequences of ignoring the well-known inconsistency of classical conditional probability densities. We explained how this inconsistency, together with acausality in hierarchical methods, invalidates a variety of commonly applied Bayesian methods when applied to problems in the physical world. Yan et al. (2026, arXiv:2603.27038v1) published a note in which they claim, contrary to our preprint, that there are no inconsistencies if one uses the method of conditional expectations to derive probabilities. Furthermore, they believe that there are mathematical errors in our exposition and in our use of the Bayesian framework. This note is a response to the claims made by Yan et al. Yan et al. do not discriminate between physical and statistical consistency. Their note addresses statistical consistency of a solution under a change of variables; this is already known to be resolved by using the theory of conditional expectations. By contrast, our preprint concerns the physical consistency of any solution under a change of mathematics used to derive that solution. It demonstrates that widely used methods to compute Bayesian posterior solutions are physically inconsistent under a change of variables. Their note does not, therefore, address the tenet of our preprint. We show herein that the theory of conditional expectations does not resolve physical inconsistency, and that Yan et al. make mathematical errors. We conclude that their claims are unfounded, and in some cases we show that their critique is meaningless. The conclusions of our preprint therefore stand.

2604.27907 2026-05-01 stat.ME

Multivariate mixed models with model-free random effects

Angela Andreella, Livio Finos

Abstract

Linear mixed models are widely used to analyze non-independent data, but inference for fixed effects can be unreliable under misspecification of the random-effects distribution, inaccurate Fisher information estimation, or convergence failures, leading to a lack of control over false positives. These difficulties are amplified in multivariate settings, where within-cluster and between-response dependence must be modeled jointly. We propose a testing procedure for fixed effects in multivariate linear mixed models that avoids Fisher information estimation and does not require correct specification of the random-effects distribution by combining score statistics with clusterwise sign-flipping transformations. Our method accommodates both forms of dependence and yields asymptotically valid inference under weak distributional assumptions on the data-generating process.

2604.27892 2026-05-01 stat.ML cs.LG stat.AP

Prediction-powered Inference by Mixture of Experts

Yanwu Gu, Linglong Kong, Dong Xia

Abstract

The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-supervised inference, in which labeled data are limited and expensive to obtain, whereas unlabeled data are abundant and widely available. Given a collection of predictors, we treat them as a mixture of experts (MOE) and introduce an MOE-powered semi-supervised inference framework built upon prediction-powered inference (PPI). Motivated by the variance reduction principle underlying PPI, the proposed framework seeks the mixture of experts that achieves the smallest possible variance. Compared with standard PPI, the MOE-powered inference framework adapts to the unknown performance of individual predictors, benefits from their collective predictive power, and enjoys a best-expert guarantee. The framework is flexible and applies to mean estimation, linear regression, quantile estimation, and general M-estimation. We develop non-asymptotic theory for the MOE-powered inference framework and establish upper bounds on the coverage error of the resulting confidence intervals. Numerical experiments demonstrate the practical effectiveness of MOE-powered inference and corroborate our theoretical findings.
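
To make the setting concrete, the following sketch shows the usual prediction-powered estimate of a mean when the predictor is a fixed weighted combination of experts: the average prediction on unlabeled data plus the average residual on labeled data. The function names and the choice to take the weights as given are assumptions for illustration; the paper's contribution, choosing the mixture that minimizes variance, is not implemented here.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def mixture_predict(experts, weights, x):
    """Weighted combination of the experts' predictions at input x."""
    return sum(w * f(x) for w, f in zip(weights, experts))


def ppi_mean(labeled_x, labeled_y, unlabeled_x, experts, weights):
    """Prediction-powered estimate of E[Y] using a fixed mixture of experts.

    estimate = mean of predictions on the unlabeled data
             + mean of (label - prediction) on the labeled data (the rectifier).
    """
    pred_unlabeled = [mixture_predict(experts, weights, x) for x in unlabeled_x]
    residuals = [
        y - mixture_predict(experts, weights, x)
        for x, y in zip(labeled_x, labeled_y)
    ]
    return mean(pred_unlabeled) + mean(residuals)
```

The rectifier term keeps the estimate unbiased whatever the quality of the experts, provided labeled and unlabeled inputs come from the same distribution; better experts shrink the residuals and hence the variance, which is the quantity a mixture would be tuned to reduce.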

2604.27887 2026-05-01 stat.ME

Meta-Analysis Without Normality: Estimating the True Effect Distribution with Penalized Gaussian Mixtures

Daihe Sui, Elizabeth Tipton

Comments 38 pages, 17 figures

Abstract

Standard random-effects meta-analysis relies heavily on the assumption that the underlying true effects are normally distributed. In the social sciences, where evidence synthesis increasingly involves large, highly heterogeneous datasets, this assumption is often restrictive and unjustified. Misspecification of the random-effects distribution prevents the detection of asymmetry or multimodality, potentially leading to erroneous conclusions regarding the prevalence of adverse effects or the existence of specific subgroups. This paper introduces a Penalized Gaussian Mixture (PGM) framework designed to recover the entire probability density function of true effects without enforcing a rigid parametric shape. The method adapts to different non-normal scenarios, including skewed and multimodal distributions, while reducing to the normal case when supported by the data. A simulation study demonstrates that in large, highly heterogeneous meta-analyses, PGM yields substantially more accurate estimates of tail probabilities and the density function than standard methods when normality is violated, without substantially compromising efficiency under normality. An empirical application to environmental education data illustrates the practical utility of the method. The proposed framework provides researchers with a robust tool to move beyond simple summary statistics and characterize the complex nature of the true effect distribution in the real world.

2604.27883 2026-05-01 math.ST cs.IT cs.LG math.IT stat.ML stat.TH

Decoupled Descent: Exact Test Error Tracking Via Approximate Message Passing

Max Lovig

Comments 43 Pages, 7 Figures

Abstract

In modern parametric model training, full-batch gradient descent (and its variants) suffers due to progressively stronger biasing towards the exact realization of training data; this drives the systematic "generalization gap", where the train error becomes an unreliable proxy for test error. Existing approaches either argue this gap is benign through complex analysis or sacrifice data to a validation set. In contrast, we introduce decoupled descent (DD), a novel theory-based training algorithm that satisfies a train-test identity -- enforcing the train error to asymptotically track the test error for stylized Gaussian mixture models. Within this specific regime, leveraging approximate message passing theory, DD iteratively cancels the biases due to data reuse, rigorously demonstrating the feasibility of zero-cost validation and $100\%$ data utilization. Moreover, DD is governed by a low-dimensional state evolution recursion, rendering the dynamics of the algorithm transparent and tractable. We validate DD on XOR classification, yielding superior performance compared to GD; additionally, we implement noisy MNIST and non-linear probing of CIFAR-10, demonstrating that even when our stylized assumptions are relaxed, DD narrows the generalization gap compared to GD.

2604.27831 2026-05-01 stat.AP

Optimal allocation of trials to sub-regions in crop variety testing with multiple years and correlated genotype effects

Maryna Prus, Lenka Filová, Hans-Peter Piepho, Waqas Ahmed Malik

Abstract

Plant breeding and variety trials are usually conducted in multiple environments sampled from a defined target population of environments in order to characterize the performance of breeding lines or varieties. When the population is large and heterogeneous, it may be sub-divided into sub-regions or zones according to administrative and agro-ecological criteria. Analysis then focuses on prediction of performance in the individual sub-regions. Modelling the genotype effect in each sub-region as random, information can be borrowed across sub-regions using best linear unbiased prediction based on a suitable variance-covariance matrix for the genotype-zone effects. Here, we consider the important case where kinship or pedigree information is available for the genotypes under test. This information can be integrated into the variance-covariance matrix for genotype-zone effects. The objective we pursue here is to determine the optimal allocation of a fixed budget of trials to sub-regions. This design problem is solved using a combination of theory and explicit equations on the one hand and numerical optimization on the other. Our proposed novel approach allows obtaining the optimal allocation when the number of genotypes is in the hundreds, a common setting in large plant breeding programs as well as in variety testing for economically important crops.

2604.27813 2026-05-01 math.ST stat.TH

A High Dimensional Wild Bootstrap Max-Test for Detecting the Presence of Significant Predictors

Jonathan B. Hill

Abstract

We construct a block bootstrap max-test for detecting the presence of significant predictors in a high dimensional setting, allowing for weakly dependent and heterogeneous (possibly non-stationary) data. The number of covariates to be screened may be large, $p \gg n$, and growing at an exponential rate, provided $\ln(p) = o(n^{a})$ for some $a > 0$ that depends on memory decay and the growth of higher moments. We study the problem of correlation screening in a high dimensional marginal regression setting, assuming so-called physical dependence in a time series setting. We entirely sidestep covariance matrix estimation and adaptive re-sampling by working with a max-statistic over the many computed parameters. Thus we do not need endogenous selection of the most relevant predictor index yielding non-uniform asymptotics, nor do we need a post-estimation Bonferroni correction. The non-standard limit distribution arising from the maximum of an increasing number of estimators is easily approximated by a multiplier (wild) block bootstrap. The max-test controls size well and performs well against various deviations from the null, including very slight deviations with a weak or sparse signal. A numerical experiment is performed and an empirical example with the VIX volatility index is provided.

2604.27791 2026-05-01 stat.ME

Reversible Jump MCMC With No Regrets: Bayesian Variable Selection Using Mixtures of Mutually Singular Distributions

Don van den Bergh, Merlise A. Clyde, Adrian E. Raftery, Maarten Marsman

Abstract

Bayesian variable selection requires sampling from a posterior distribution that combines discrete model indicators with continuously varying parameters, a challenge often addressed through reversible jump Markov chain Monte Carlo (RJMCMC). Despite its generality, RJMCMC is widely regarded as difficult to design and implement correctly. We present mixtures of mutually singular (MoMS) distributions as a transparent alternative in which competing models are represented within a single fixed-dimensional parameter space partitioned into mutually singular subspaces. We show that this formulation reproduces the exact spike-and-slab interpretation of Bayesian variable selection and that, under appropriate constructions, MoMS and RJMCMC share the same Metropolis--Hastings acceptance probability. On a benchmark dataset with ten predictors, both methods recover posterior inclusion probabilities that match full enumeration, while MoMS achieves comparable or superior effective sample size per second relative to a carefully engineered RJMCMC scheme. We further illustrate the approach in a mixed-effects logistic regression for a sleep-and-memory experiment and in factor-loading selection for a multidimensional generalized partial credit model. Together, these results show that Bayesian variable selection can be carried out within standard fixed-dimensional Markov chain Monte Carlo methodology -- without regret.

2604.27742 2026-05-01 cs.LG stat.ML

Linear-Core Surrogates: Smooth Loss Functions with Linear Rates for Classification and Structured Prediction

Mehryar Mohri, Yutao Zhong

Abstract

The choice of loss function in classification involves a fundamental trade-off: smooth losses (like Cross-Entropy) enable fast optimization rates but yield slow square-root consistency bounds, while piecewise-linear losses (like Hinge) offer fast linear consistency rates but suffer from non-differentiability. We propose Linear-Core (LC) Surrogates, a new family of convex loss functions that resolve this tension by stitching a linear core to a smooth tail. We prove that these surrogates are differentiable everywhere while retaining strict linear $H$-consistency bounds, effectively combining the optimization benefits of smoothness with the statistical efficiency of margin-based losses. In the structured prediction setting, we show that this smoothness unlocks a massive computational and energy advantage: it allows for an unbiased stochastic gradient estimator that bypasses the quadratic complexity $O(|\mathscr{Y}|^2)$ of exact inference (e.g., Viterbi). Empirically, our method achieves a 23$\times$ speedup over Structured SVMs on large-vocabulary sequence tagging tasks and demonstrates superior robustness to instance-dependent label noise, outperforming Cross-Entropy by 2.6% on corrupted CIFAR-10.

2604.20052 2026-05-01 stat.CO

Annealed Langevin Monte Carlo for Flow ODE Sampling

Hanwen Huang

Comments 25 pages, 3 figures

Abstract

We propose Annealed Langevin Monte Carlo for Flow ODE Sampling (ALMC-ODE), a method for generating samples from unnormalized target distributions, with a particular emphasis on multimodal densities that are challenging for standard Markov chain Monte Carlo methods. ALMC-ODE is based on a probability-flow ordinary differential equation (ODE) derived from stochastic interpolants, which continuously transports a standard Gaussian reference distribution at $t = 0$ to the target distribution $ρ$ at $t = 1$. The key innovation lies in an annealed Langevin Markov chain that evolves through a sequence of intermediate distributions bridging the reference and the target. The resulting importance-weighted particles, reweighted via a Jarzynski-based scheme, yield a low-variance estimator of the velocity field governing the ODE. On the theoretical side, we establish a Jarzynski-type reweighting identity for general time-inhomogeneous transition kernels, characterize the optimal backward kernel that minimizes the variance of the importance weights, and prove an $\mathcal{O}(1/n)$ mean squared error bound for the resulting velocity-field estimator. Numerical experiments on challenging benchmarks, including Gaussian mixture models and a 64-dimensional Allen--Cahn field system, demonstrate that ALMC-ODE significantly outperforms both direct Monte Carlo ODE approaches and Hamiltonian Monte Carlo when applied to highly multimodal target distributions.

2604.01911 2026-05-01 stat.ME

On the uncertainty from the first-stage estimation of prognostic covariate adjustment in randomized controlled trials

Nodoka Seya, Masataka Taguri

Abstract

Prognostic covariate adjustment (PROCOVA) is a two-sample two-stage estimation method used in randomized controlled trials. In the first stage, a prognostic score, defined as the conditional expectation of an outcome given covariates under the control treatment, is estimated using historical data. In the second stage, analysis of covariance with the estimated prognostic score and treatment assignment as explanatory variables is performed, and the average treatment effect is estimated. Although the prognostic score is estimated in this procedure, the variance estimator, which treats the prognostic score as known, has been used. Furthermore, the difference in the asymptotic variance between cases where the prognostic score is known versus where it is estimated has not been previously clarified. In this study, we derived these two asymptotic variances and showed that they are equal. We also constructed two variance estimators: one that treats the prognostic score as known, and another that accounts for its estimation, and compared their performance through simulation studies and data applications. For PROCOVA, since both variance estimators are asymptotically valid, it is generally recommended to use a variance estimator that treats the prognostic score as known, as it is simpler to derive and implement. However, when the historical data set is small, a variance estimator that explicitly accounts for prognostic score estimation is recommended if conservative inference is preferred.

2602.20549 2026-05-01 cs.LG cs.CV stat.ME

Sample-efficient evidence estimation of score based priors for model selection

Frederic Wang, Katherine L. Bouman

Comments ICLR 2026

Abstract

The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements $y$ to avoid severe bias. In Bayesian inverse problems, this could be achieved by evaluating the model evidence $p(y \mid M)$ under different models $M$ that specify the prior and then selecting the one with the highest value. Diffusion models are the state-of-the-art approach to solving inverse problems with a data-driven prior; however, directly computing the model evidence with respect to a diffusion prior is intractable. Furthermore, most existing model evidence estimators require either many pointwise evaluations of the unnormalized prior density or an accurate clean prior score. We propose DiME, an estimator of the model evidence of a diffusion prior by integrating over the time-marginals of posterior sampling methods. Our method leverages the large amount of intermediate samples naturally obtained during the reverse diffusion sampling process to obtain an accurate estimation of the model evidence using only a handful of posterior samples (e.g., 20). We also demonstrate how to implement our estimator in tandem with recent diffusion posterior sampling methods. Empirically, our estimator matches the model evidence when it can be computed analytically, and it is able to both select the correct diffusion model prior and diagnose prior misfit under different highly ill-conditioned, non-linear inverse problems, including a real-world black hole imaging problem.

2602.19483 2026-05-01 cs.LG cs.AI stat.ML

Making Conformal Predictors Robust in Healthcare Settings: a Case Study on EEG Classification

Arjun Chatterjee, Sayeed Sajjad Razin, John Wu, Siddhartha Laghuvarapu, Jathurshan Pradeepkumar, Jimeng Sun

Comments Accepted to the International Conference on Artificial Intelligence in Medicine 2026

Abstract

Quantifying uncertainty in clinical predictions is critical for high-stakes diagnosis tasks. Conformal prediction offers a principled approach by providing prediction sets with theoretical coverage guarantees. However, in practice, patient distribution shifts violate the i.i.d. assumptions underlying standard conformal methods, leading to poor coverage in healthcare settings. In this work, we evaluate several conformal prediction approaches on EEG seizure classification, a task with known distribution shift challenges and label uncertainty. We demonstrate that personalized calibration strategies can improve coverage by over 20 percentage points while maintaining comparable prediction set sizes. Our implementation is available via PyHealth, an open-source healthcare AI framework: https://github.com/sunlabuiuc/PyHealth.
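
For readers unfamiliar with the baseline being stress-tested, here is a minimal sketch of standard split conformal prediction for classification, using the nonconformity score 1 - p(true label). The function names, the label strings, and the score choice are assumptions for illustration; this does not use the PyHealth API and does not reproduce the paper's personalized calibration strategies.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, return infinity so that every label is included.
    """
    n = len(calibration_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_set(class_probs, threshold):
    """Labels whose nonconformity score 1 - p(label) is within the threshold."""
    return {label for label, p in class_probs.items() if 1.0 - p <= threshold}
```

The coverage guarantee behind this construction relies on calibration and test scores being exchangeable, which is exactly the assumption that patient distribution shift can break.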

2601.22993 2026-05-01 cs.LG stat.ML

Constrained Policy Optimization with Cantelli-Bounded Value-at-Risk

Rohan Tangri, Jan-Peter Calliess

Abstract

We introduce the Value-at-Risk Constrained Policy Optimization algorithm (VaR-CPO), a sample efficient and conservative method designed to optimize Value-at-Risk (VaR) constrained reinforcement learning (RL) problems. Empirically, we demonstrate that VaR-CPO is capable of safe exploration, achieving zero constraint violations during training in feasible environments, a critical property that baseline methods fail to uphold. To overcome the inherent non-differentiability of the VaR constraint, we employ Cantelli's inequality to obtain a tractable approximation based on the first two moments of the cost return. Additionally, by extending the trust-region framework of the Constrained Policy Optimization (CPO) method, we provide worst-case bounds for both policy improvement and constraint violation during the training process.
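
The moment-based approximation mentioned above can be sketched as follows. Cantelli's inequality bounds the upper tail of the cost return using only its mean and standard deviation, which gives a conservative upper bound on the Value-at-Risk. The function and parameter names are hypothetical, and how the algorithm estimates the two moments or embeds the bound in the trust-region update is not shown.

```python
import math


def cantelli_var_upper_bound(mean_cost, std_cost, epsilon):
    """Upper bound on the (1 - epsilon)-level Value-at-Risk of the cost.

    Cantelli's inequality gives P(C - mean >= t) <= std^2 / (std^2 + t^2)
    for t > 0; setting the right-hand side to epsilon and solving for t
    yields t = std * sqrt((1 - epsilon) / epsilon).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return mean_cost + std_cost * math.sqrt((1.0 - epsilon) / epsilon)


def satisfies_surrogate_constraint(mean_cost, std_cost, epsilon, budget):
    """True if the moment-based bound certifies P(C >= budget) <= epsilon."""
    return cantelli_var_upper_bound(mean_cost, std_cost, epsilon) <= budget
```

Because the bound holds for any distribution with the given mean and variance, it is conservative: a policy that satisfies the surrogate constraint satisfies the original one, but not necessarily the other way around.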

2512.20914 2026-05-01 math.ST stat.AP stat.ML stat.TH

Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case

Ian Bounos, Pablo Groisman, Mariela Sued, Esteban Tabak

Abstract

A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $Σ_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.

2510.12911 2026-05-01 econ.EM q-fin.RM stat.ME

Spot Regressions with Candlesticks

Yasin Simsek

Abstract

Betas from spot regressions are central to asset pricing and risk management, as measures of systematic risk. This paper develops a new estimation and inference framework for spot regressions by leveraging high-frequency candlesticks, extending conventional (open-to-close) returns with intra-period high/low prices. Specifically, I construct candlestick-based estimators of regression parameters, including spot beta, by minimizing a quadratic risk under a fixed-k asymptotic framework. I then develop a feasible hypothesis testing procedure for spot betas with correct asymptotic size. Simulation results show that the proposed estimator reduces estimation risk relative to return-based estimators, especially in small samples, and the test achieves notably higher power. I apply the framework to assess the market neutrality of Bitcoin using 1-minute data on IBIT and SPY, finding deviations from neutrality, particularly in high-volatility periods.

2509.20194 2026-05-01 stat.ME econ.EM

Identification and Semiparametric Estimation of Conditional Means from Aggregate Data

Cory McCartan, Shiro Kuriwaki

Comments 20 pages, plus references and appendices

Abstract

We introduce a new method for estimating the mean of an outcome variable within groups when researchers only observe the average of the outcome and group indicators across a set of aggregation units, such as geographical areas. Existing methods for this problem, also known as ecological inference, implicitly make strong assumptions about the aggregation process. We first formalize weaker conditions for identification which hold conditionally on covariates. To efficiently control for many covariates, we propose a debiased machine learning estimator that is based on nuisance functions restricted to a partially linear form. Our estimator admits a semiparametric sensitivity analysis which allows researchers to evaluate the impact of violations of the key identifying assumption. We also propose a nonparametric test for the identifying assumption itself. Finally, we derive asymptotically valid confidence intervals for local, unit-level estimates under additional assumptions. Simulations and validation on real-world data where ground truth is available demonstrate the advantages of our approach over existing methods. Open-source software is available which implements the proposed methods.

2508.05462 2026-05-01 stat.CO math.PR

Piecewise Deterministic Sampling for Constrained Distributions

Joël Tatang Demano, Paul Dobson, Konstantinos Zygalakis

Comments 44 pages, 9 figures

Abstract

In this paper, we propose a novel class of Piecewise Deterministic Markov Processes (PDMPs) that are designed to sample from probability distributions $π$ supported on a convex set $\mathcal{M}$. This class of PDMPs adapts the concept of a mirror map from convex optimisation to address sampling problems. The corresponding algorithms provide unbiased samples that respect the constraints and, moreover, allow for exact subsampling. We demonstrate the advantages of these algorithms on a range of constrained sampling problems, where the proposed algorithms outperform state-of-the-art stochastic differential equation-based methods.

2506.17463 2026-05-01 math.ST stat.ME stat.TH

Testing Separability of High-Dimensional Covariance Matrices

Bongjung Sung, Peter D. Hoff

Comments 85 pages, 32 pages in the main text, new theoretical results, including the convergence of the Kronecker MLE under the partial-isotropy core, with more sophisticated results on the asymptotic distributions and consistency, are added

Abstract

Due to their parsimony, separable covariance models have been popular in modeling matrix-variate data. However, the inference from such a model may be misleading if the population covariance matrix $Σ$ is actually non-separable, motivating the use of statistical tests of separability. The existing separability tests suffer mainly from two issues: (1) test statistics that are not well-defined in high-dimensional settings, and (2) low power for small sample sizes and null distributions that depend on unknown parameters, preventing exact error rate control. To address these issues, we propose novel invariant tests using the core covariance matrix, a complementary notion to a separable covariance matrix. We show that testing separability of $Σ$ is equivalent to testing sphericity of its core component. With this insight, we construct test statistics that are well-defined in high-dimensional settings and have distributions that are invariant under the null hypothesis of separability, allowing for exact simulation of null distributions. We establish the asymptotic properties of some test statistics by proving the asymptotic spectral equivalence between the sample covariance matrix and its core in a $p/n \rightarrow γ \in (0,\infty)$ regime. The large power of our proposed tests relative to existing procedures is demonstrated numerically.

2505.13230 2026-05-01 cs.LG cond-mat.dis-nn stat.ML

Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks

Francesco D'Amico, Dario Bocchi, Matteo Negri

Comments Final accepted version at ICLR26 main conference; 27 pages, 21 Figures, 5 tables

Abstract

Scaling laws in deep learning -- empirical power-law relationships linking model performance to resource growth -- have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training. In this work, we describe a richer picture by analyzing the entire training dynamics: we identify two novel dynamical scaling laws that govern how performance evolves as a function of different norm-based complexity measures. Combined, our new laws recover the well-known scaling for test error at convergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10 and CIFAR-100. Furthermore, we provide analytical support using a single-layer perceptron trained with logistic loss, where we derive the new dynamical scaling laws, and we explain them through the implicit bias induced by gradient-based training.

2505.12487 2026-05-01 stat.CO stat.ME stat.ML

Stereographic Multiple-Try Metropolis

Zhihao Wang, Jun Yang

Comments 53 pages, 12 figures

Abstract

Multiple-proposal MCMC algorithms have recently gained attention for their potential to improve performance, especially through parallel implementation on modern hardware. We introduce Stereographic Multiple-Try Metropolis (SMTM), a novel family of gradient-free algorithms designed for sampling high-dimensional distributions. By integrating multiple-try Metropolis (MTM) with the stereographic MCMC framework, SMTM overcomes the traditional limitations of MTM, particularly its pathological convergence behavior often observed in high dimensions. For both light-tailed and heavy-tailed targets, SMTM not only outperforms classical MTM and the existing stereographic random-walk Metropolis but also demonstrates strong robustness to tuning. These advantages are supported by high-dimensional scaling analysis and validated through extensive simulation studies.

2503.24324 2026-05-01 stat.AP econ.GN physics.soc-ph q-fin.EC q-fin.RM

Mitigating Financial Risk from Climate-Induced Agricultural Price Volatility

Sourish Das, Sudeep Shukla, Abbinav Sankar Kailasam, Anish Rai, Sejal Garg, Anirban Chakraborti

Comments 15 pages, 11 figures

Abstract

Agricultural price volatility, driven by market dynamics and meteorological factors such as temperature and precipitation, poses challenges for sustainable finance, planning, and policy. This study analyzes the impact of climate on crop price volatility for soybean in Madhya Pradesh (India) and Illinois (US), rice in Assam (India), wheat in North Dakota (US), cotton in Gujarat (India), and corn in Iowa (US). Using CMIP6 climate projections from the Copernicus Climate Change Service, we examine historical climate patterns and evaluate two future scenarios: SSP2-4.5 (moderate) and SSP5-8.5 (severe). We estimate conditional price volatility using the Exponential Generalized Autoregressive Conditional Heteroskedasticity (EGARCH) model, and forecast this volatility with a Seasonal Autoregressive Integrated Moving Average with Exogenous Regressors (SARIMAX) model that incorporates meteorological variables. Finally, we apply the Black-Scholes framework to evaluate the cost of put-option-based insurance, which provides protection to farmers against adverse price drops linked to climate change. Our results highlight the role of meteorological data in improving agricultural risk modelling, enabling better design of insurance mechanisms, price stabilization tools, and sustainable policy interventions under climate uncertainty.
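To make the final step concrete, the sketch below prices a European put with the standard Black-Scholes formula. It is only an illustration: the function name `black_scholes_put` and every numerical input are assumptions rather than values from the study, and in the workflow the abstract describes the volatility input would presumably come from the EGARCH/SARIMAX forecasts instead of a fixed constant.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def black_scholes_put(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European put option (no dividends)."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    d1 = (log(spot / strike) + (rate + 0.5 * sigma**2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    # Put value = discounted strike * N(-d2) - spot * N(-d1)
    return strike * exp(-rate * maturity) * std_normal.cdf(-d2) - spot * std_normal.cdf(-d1)


# Hypothetical inputs: price 100, strike 100, 5% rate, 20% volatility, 1 year.
premium = black_scholes_put(100.0, 100.0, 0.05, 0.20, 1.0)
print(f"put premium: {premium:.4f}")
```

With these placeholder inputs the premium is about 5.57 per unit of the underlying; raising `sigma` raises the premium, which is the channel through which a forecast of higher volatility would make the insurance more expensive.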

2503.04956 2026-05-01 stat.ML cs.LG

Foreclassing: A new machine learning perspective on human decision making with temporal data

Daniel Andrew Coulson, Martin T. Wells

Comments 20 pages, 1 figure, 15 tables

Abstract

Time series forecasts are widely used to inform decisions. Human decision-makers interpret these forecasts, incorporate prior experience and uncertainty about future outcomes, and then make a decision. In this paper, we propose a new machine learning problem, called Foreclassing, that addresses settings in which the aim is to automate human involvement in such decision-making processes. Our aim is to develop a unified end-to-end model that takes a time series as input, produces a forecast, accounts for its predictive uncertainty, and makes a downstream classification decision, enabling models to support or automate such temporal decision-making tasks. Related problems arise across a range of applications, yet the literature lacks both a unified methodology and a formal problem statement. By formalizing the task, we aim to stimulate research on such models and encourage cross-domain collaboration. To solve the Foreclassing problem, we propose a deep Bayesian neural network, ForeClassNet. As part of this framework, we introduce a new type of neural network layer, Boltzmann convolutions, which enable probabilistic learning of kernel sizes in convolutional layers. We evaluate the Foreclassing framework against standard time series classification methods and demonstrate the efficacy of ForeClassNet on real-world Foreclassing datasets from the weather, energy, and finance domains, achieving superior performance relative to state-of-the-art time series classifiers.

2502.19234 2026-05-01 physics.ao-ph physics.data-an stat.AP

Arctic teleconnection on climate and ozone pollution in the polar jet stream path of eastern US

K Shuvo Bakar, Sourish Das, Sudeep Shukla, Anirban Chakraborti

Comments 19 pages, 6 figures

Abstract

Arctic sea-ice loss is a defining feature of climate change and offers insight into its impact on mid-latitude air quality. Here, we investigate how variability in Arctic sea-ice extent (ASI) affects ground-level ozone ($O_3$) across eastern US states through physically and chemically mediated atmospheric pathways. Using observations and causal-inference methods grounded in atmospheric dynamics, we show that ASI drives wintertime ozone variability primarily via indirect meteorological mechanisms, including changes in humidity, temperature, and atmospheric circulation along the polar and subtropical jet streams. Inland regions exhibit the strongest sensitivity, while coastal areas are modulated by marine boundary-layer processes. Seasonal contrasts reveal that Arctic-driven dynamics suppress ozone in winter but can enhance accumulation under certain summer conditions. These findings highlight the importance of Arctic-midlatitude teleconnections in shaping regional air quality and underscore the need to integrate large-scale climate processes into ozone management and climate adaptation strategies.

2402.14532 2026-05-01 cs.LG stat.ML

A Framework for Variational Inference of Lightweight Bayesian Neural Networks with Heteroscedastic Uncertainties

David J. Schodt, Ryan Brown, Michael Merritt, Samuel Park, Delsin Menolascino, Mark A. Peot

Comments Fix equation typos

Abstract

Obtaining heteroscedastic predictive uncertainties from a Bayesian Neural Network (BNN) is vital to many applications. Often, heteroscedastic aleatoric uncertainties are learned as outputs of the BNN in addition to the predictive means; however, doing so may necessitate adding more learnable parameters to the network. In this work, we demonstrate that both the heteroscedastic aleatoric and epistemic variances can be embedded into the variances of learned BNN parameters, improving predictive performance for lightweight networks. By complementing this approach with a moment propagation approach to inference, we introduce a relatively simple framework for sampling-free variational inference suitable for lightweight BNNs.

2310.18500 2026-05-01 stat.ME

Designing Randomized Experiments to Predict Unit-Specific Treatment Effects

Elizabeth Tipton, Michalis Mamakos

Comments 46 pages, 3 figures

Journal ref Statistics and Public Policy, 12(1), 2505485 (2025)
Abstract

Typically, a randomized experiment is designed to test a hypothesis about the average treatment effect and sometimes hypotheses about treatment effect variation. The results of such a study may then be used to inform policy and practice for units not in the study. In this paper, we argue that given this use, randomized experiments should instead be designed to predict unit-specific treatment effects in a well-defined population. We then consider how different sampling processes and models affect the bias, variance, and mean squared prediction error of these predictions. The results indicate, for example, that problems of generalizability (differences between samples and populations) can greatly affect bias both in predictive models and in measures of error in these models. We also examine when the average treatment effect estimate outperforms unit-specific treatment effect predictive models and implications of this for planning studies.

2302.03286 2026-05-01 math.NA cs.NA stat.ML

Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

Comments 39 pages, 17 Figures

Abstract

In this article we propose a new deep learning approach to approximate operators related to parametric partial differential equations (PDEs). In particular, we introduce a new strategy to design specific artificial neural network (ANN) architectures in conjunction with specific ANN initialization schemes which are tailor-made for the particular approximation problem under consideration. In the proposed approach we combine efficient classical numerical approximation techniques with deep operator learning methodologies. Specifically, we introduce customized adaptations of existing ANN architectures together with specialized initializations for these ANN architectures so that, at initialization, the ANNs closely mimic a chosen efficient classical numerical algorithm for the considered approximation problem. The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature, and in that sense we refer to the introduced ANNs in conjunction with their tailor-made initialization schemes as Algorithmically Designed Artificial Neural Networks (ADANNs). We numerically test the proposed ADANN methodology in the case of several parametric PDEs. In the tested numerical examples the ADANN methodology significantly outperforms existing classical approximation algorithms as well as existing deep operator learning methodologies from the literature.

2208.07086 2026-05-01 stat.ME math.ST stat.TH

Flexible Bayesian Multiple Comparison Adjustment Using Dirichlet Process and Beta-Binomial Model Priors

Don van den Bergh, Fabian Dablander

Comments 31 pages, 12 figures, and 2 tables

Abstract

Researchers frequently wish to assess the equality or inequality of groups, but this poses the challenge of adequately adjusting for multiple comparisons. Statistically, all possible configurations of equality and inequality constraints can be uniquely represented as partitions of groups, where any number of groups are equal if they are in the same subset of the partition. In a Bayesian framework, one can adjust for multiple comparisons by constructing a suitable prior distribution over all possible partitions. Inspired by work on variable selection in regression, we propose a class of flexible beta-binomial priors for multiple comparison adjustment. We compare this prior setup to the Dirichlet process prior suggested by Gopalan and Berry (1998) and multiple comparison adjustment methods that do not specify a prior over partitions directly. Our approach not only allows researchers to assess pairwise equality constraints but simultaneously all possible equalities among all groups. Since the space of possible partitions grows rapidly -- for ten groups, there are already 115,975 possible partitions -- we use a stochastic search algorithm to efficiently explore the space. Our method is implemented in the Julia package EqualitySampler, and we illustrate it on examples related to the comparison of means, standard deviations, and proportions.
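The partition count quoted in the abstract is the Bell number B(10), the number of ways to partition a set of ten elements. As a quick check, the following minimal Python sketch computes Bell numbers with the Bell triangle; the helper name `bell_number` is illustrative and is not part of the EqualitySampler package.

```python
def bell_number(n: int) -> int:
    """Return the n-th Bell number: the number of partitions of an n-element set."""
    # Bell triangle: each new row starts with the last entry of the previous row,
    # and every further entry adds the entry above-left to the entry on its left.
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    # The first entry of row n is B(n).
    return row[0]


for groups in (3, 5, 10):
    print(groups, bell_number(groups))
```

For ten groups this gives 115,975, matching the figure above, and the rapid growth (5 partitions for three groups, 52 for five) shows why exhaustive enumeration is replaced by stochastic search.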

2112.13247 2026-05-01 math.ST stat.TH

Decision-making with possibilistic inferential models

Ryan Martin, Shih-Ni Prim, Jonathan Williams

Abstract

Inferential models (IMs) are data-dependent, imprecise-probabilistic structures designed to quantify uncertainty about unknowns. As the name suggests, the focus has been on uncertainty quantification for inference and on its reliability properties in that context. Focusing on a likelihood-based possibilistic IM formulation, the present paper develops a corresponding framework for decision making, and investigates the decision-theoretic implications of the IM's reliability guarantees. Here we show that the possibilistic IM's assessment of an action's quality, defined by a simple Choquet integral, tends not to be too optimistic compared to that of an oracle. This ensures that the IM tends not to favor actions that the oracle doesn't also favor, hence the IM is also reliable for decision making. We also establish a complementary, large-sample efficiency result that says the IM's reliability isn't achieved by being grossly conservative. In the special case of equivariant statistical models, further connections can be made between the IM's and Bayesian's recommended actions, from which certain optimality conclusions can be drawn.

2604.27733 2026-05-01 cs.LG stat.ML

Mind the Gap: Structure-Aware Consistency in Preference Learning

Mehryar Mohri, Yutao Zhong

Abstract

Preference learning has become the foundation of aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization guarantees. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. We derive rigorous $H$-consistency bounds that depend on enforcing a separation margin $γ$. Crucially, we extend this to Structure-Aware $H$-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and model limitations via the Margin-Capacity Profile, proving that heavy-tailed surrogates (such as the Polynomial Hinge family) offer superior consistency guarantees for capacity-bounded models compared to the standard logistic loss used in DPO.

2604.27732 2026-05-01 stat.AP q-fin.RM stat.OT

A Note on the Generalized Cape Cod Reserving Method

Ronald Richman, Mario V. Wüthrich

Abstract

Claims reserving is one of the most important actuarial tasks in non-life insurance modeling. There are several popular methods to perform claims reserving, such as the chain-ladder (CL), the Bornhuetter--Ferguson (BF) and the generalized Cape Cod (GCC) methods. These methods were originally introduced as deterministic algorithms and were only later lifted to stochastic models that allow the analysis of claims prediction uncertainty. This holds true for the CL and the BF methods, but not for the GCC method. The purpose of this article is to close this gap and derive an analytical formula for the mean squared error of prediction (MSEP) of the GCC method.

2604.27723 2026-05-01 cs.LG stat.ML

Optimized Deferral for Imbalanced Settings

Corinna Cortes, Anqi Mao, Mehryar Mohri, Yutao Zhong

Abstract

Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.

2604.27696 2026-05-01 stat.CO stat.AP stat.ML

FoReco and FoRecoML: A Unified Toolbox for Forecast Reconciliation in R

Daniele Girolimetto, Jeroen Rombouts, Ines Wilms, Yangzhuoran Fin Yang

Abstract

Forecast reconciliation has become key to improving the accuracy and coherence of forecasts for linearly constrained multiple time series, such as hierarchical and grouped series. Yet, comprehensive software that jointly covers cross-sectional, temporal, and cross-temporal reconciliation has so far been lacking. The R packages FoReco and FoRecoML address this gap by offering a comprehensive and unified framework. The packages respectively implement classical and regression-based linear reconciliation approaches, and non-linear approaches based on machine learning for cross-sectional, temporal and cross-temporal frameworks. Designed for accessibility and flexibility, these packages provide sensible default options that allow new users to apply reconciliation methods with minimal effort, while still giving expert users full control to explore state-of-the-art extensions through customized settings. With this dual focus, FoReco and FoRecoML are versatile tools for practitioners and researchers working on forecast reconciliation.

2604.27665 2026-05-01 math.ST math.PR stat.TH

A note on estimation of quarticity based on spot volatility

Yi Guo

Abstract

In this paper, we aim to estimate the quarticity of continuous Itô semimartingales. Instead of using classical estimators, we introduce a more intuitive one and establish a central limit theorem (CLT) for it, with a convergence rate of $1/\sqrt{Δ_n}$ in the sense of stable convergence. Moreover, we compare the asymptotic variance of this estimator with that of other existing estimators.

2604.27603 2026-05-01 stat.CO

Martingale Posteriors for Discretely Observed Diffusions

Jingning Yao, Ajay Jasra, Sheng Jiang

Abstract

In this paper we consider parameter estimation for discretely observed diffusion processes. In particular, we focus on data that are observed at low frequency and methodology that can estimate parameters with uncertainty quantification. Most statistical work in this domain develops advanced Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior of the parameters, a task which is often complicated by the fact that one seldom has access to the transition density of the diffusion process; one has to combine sophisticated MCMC methods which are robust to the required time discretization of the diffusion, which can yield expensive algorithms. We focus on developing the martingale posterior method for the context of interest, when one can only numerically approximate the transition density of the diffusion. Using certain types of diffusion bridges, we introduce a new martingale posterior method for parameter estimation for discretely observed diffusion processes. We prove that this algorithm approximates, in some sense, the martingale posterior which has no time-discretization bias up to $\mathcal{O}(Δ)$, where $Δ$ is the time discretization step. Our approach is illustrated on several examples, showing orders-of-magnitude speed-ups over state-of-the-art MCMC algorithms.

2604.27409 2026-05-01 stat.ME stat.AP

Robust inference methods of diagnostic test accuracy meta-analysis for influential outlying studies via density power divergence

Kotaro Sasaki, Hisashi Noma, Theodoros Evrenoglou

Comments 20 pages with 4 figures

Abstract

In diagnostic test accuracy meta-analysis (DTA-MA), standard inference methods using bivariate random-effects models for jointly synthesizing sensitivity and specificity can be sensitive to outlying studies and may yield misleading conclusions. In this article, we propose frequentist outlier-robust statistical inference methods for DTA-MA based on density power divergence. The proposed methods automatically downweight influential outlying studies by modifying the estimating function using the robust divergence with a tuning parameter. To achieve robust yet statistically efficient inference in the presence of outlying studies, the proposed methods incorporate practical strategies for selecting the tuning parameter, including a data-adaptive criterion based on the Hyvärinen score. We also quantify the contributions of individual studies to the robust pooled estimates, facilitating interpretation of how outlying studies affect the results. We illustrate the effectiveness of the proposed methods through an application to a DTA-MA of the Mini-Mental State Examination. Simulation studies showed that the proposed methods reduced bias and root mean squared error relative to existing methods and improved coverage probability in the presence of outliers. The proposed methods enable a sensitivity analysis to assess whether the main results obtained using standard methods are driven by outlying studies.

2604.27394 2026-05-01 stat.ML cs.LG

Bayesian X-Learner: Calibrated Posterior Inference for Heterogeneous Treatment Effects under Heavy-Tailed Outcomes

Eichi Uehara

Comments 47 pages, 7 figures, 25 tables. Code: https://github.com/EichiUehara/bayesian-X-Leaner. Prepared for submission to TMLR

Abstract

Conditional Average Treatment Effect (CATE) estimation in practice demands three properties simultaneously: heterogeneous effects $τ(x)$, calibrated uncertainty over them, and robustness to the heavy tails that contaminate real outcome data. Meta-learners (Künzel et al., 2019) give (i); causal forests and BART give (i)-(ii) with Gaussian-tail assumptions; no widely used tool gives all three. We present Bayesian X-Learner, an X-Learner built on cross-fitted doubly robust pseudo-outcomes (Kennedy, 2020) with a full MCMC posterior over $τ(x)$ via a Welsch redescending pseudo-likelihood. On Hill's IHDP benchmark the default configuration attains mean $\sqrt{\varepsilon_{\mathrm{PEHE}}} = 0.56$ on 5 replications (lowest mean; differences from S-/T-/X-learners, full-config Causal BART, and a causal forest baseline are not significant at $α=0.05$, and rank ordering is unstable at 10 replications -- IHDP comparisons are competitive rather than dominant). On contaminated "whale" DGPs with up to 20-25% tail density, a one-flag extension (contamination_severity) that selects a Huber-$δ$ nuisance loss per Huber's minimax-$δ$ relation recovers RMSE $\approx 0.13$ with tight credible intervals (single-cross-fit 30-seed coverage 83% [Wilson 66%, 93%] at 20% density; modular-Bayes pooling with Bayesian-bootstrap nuisance draws restores nominal 95% coverage).
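The bracketed Wilson interval can be reproduced in a few lines of Python. The sketch below assumes that the reported 83% corresponds to 25 covered seeds out of 30, which is an inference from the rounded figure rather than something the abstract states; `wilson_interval` is an illustrative helper name and is not taken from the paper's code.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2.0 * trials)) / denom
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / trials + z**2 / (4.0 * trials**2)) / denom
    return center - half_width, center + half_width


# Assumed counts: 25 of 30 seeds covered the true effect (about 83%).
low, high = wilson_interval(25, 30)
print(f"{low:.2f}, {high:.2f}")
```

Under that assumption the interval is roughly 0.66 to 0.93, in line with the quoted [66%, 93%]. Its width is a reminder that 30 seeds cannot distinguish 83% coverage from the nominal 95% very sharply.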

2604.27338 2026-05-01 stat.AP

Estimating Population Viral Load Contextual Exposure Using GPS-Derived Activity Spaces in Rural South Africa

Zhaoxing Wu, Haoyang Wu, Thulile Mathenjwa, Elphas Okango, Khai Hoan Tram, Margot Otto, Maxime Inghels, Paul Mee, Diego Cuadros, Hae-Young Kim, Till Bärnighausen, Frank Tanser, Adrian Dobra

Comments 22 pages, 5 figures

Abstract

This article introduces novel methodologies for estimating contextual exposure to HIV population viral load using GPS data. We propose a comprehensive analytical framework comprising (i) local (grid-cell level) estimation of HIV population viral load, (ii) derivation of individual activity spaces from GPS trajectories, and (iii) quantification of contextual exposure to HIV within these activity spaces. We integrate HIV surveillance and sociodemographic survey data with GPS-based mobility data collected in rural KwaZulu-Natal, South Africa, to characterize mobility patterns among young adults aged 20-30 years. Using derived measures of mobility and contextual exposure, we assess whether participants' sex and age systematically influence the magnitude, configuration, and heterogeneity of their mobility patterns. Furthermore, we describe analytical approaches to examine how contextual exposure to HIV evolves as activity spaces extend beyond static residential locations, outlining procedures to identify GPS-tracked participants at elevated risk of HIV acquisition.
Keywords: Population viral load exposure; GPS-based mobility analysis; Activity space

2604.27305 2026-05-01 stat.ME

Inference on Generalized Latent Variable Models with High-Dimensional Responses and Covariates

Jing Ouyang, Chengyu Cui, Yunxiao Chen, Kean Ming Tan, Gongjun Xu

Abstract

Regression models with both high-dimensional responses and covariates have attracted growing attention. Standard multivariate regression models become inadequate when the response variables depend not only on observed covariates but also on latent variables that capture key unobserved characteristics. To draw statistical inferences on covariate effects while accounting for latent variables, we consider a high-dimensional generalized latent variable model that accommodates mixed-type responses and allows for flexible dependence between covariates and latent variables, which is more suitable for many real-world applications than existing methods that rely either on a linear regression form or on restrictive assumptions about the dependence between covariates and latent variables. We develop an alternating algorithm that iteratively updates the regression parameters and the latent variables, transforming an intractable nonconvex problem into a sequence of tractable convex subproblems. Theoretically, we provide algorithmic guarantees by establishing statistical consistency of the resulting estimator and deriving an error bound for it. Further, building on this estimator, we construct a debiased estimator for the covariate effect and establish its asymptotic normality. The effectiveness of the proposed method is demonstrated through an application to evaluating the fairness of the Programme for International Student Assessment (PISA).

2604.27282 2026-05-01 cs.CY cs.LG stat.AP

The Likelihood Ratio Wall: Structural Limits on Accurate Risk Assessment for Rare Violence

Marco Pollanen

Comments 16 pages, 2 figures, 8 tables. Accepted to the 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT '26)

Abstract

Pretrial risk assessment tools are used on over one million U.S. defendants each year, yet their use for predicting rare violent re-offense faces a basic statistical barrier. We derive a universal precision bound -- the Likelihood Ratio Wall -- showing that when violent re-arrest rates are low (2-5%), achieving even a 50% hit rate among people labeled "high risk" (positive predictive value, or PPV) would require tools far more discriminative than current instruments appear to be. For rare outcomes, a tool can have respectable-looking performance metrics and still be wrong most of the time it flags someone as "high risk for violence." We show that post-hoc score recalibration cannot solve this problem because it does not improve the tool's underlying ability to separate true positives from false positives. We further prove a Surveillance Ceiling: when over-policing inflates recorded "risk factors" among those who would not re-offend, the maximum achievable precision is structurally lower for over-policed groups, even at equal offense rates. We translate these results into the Number Needed to Detain (how many people must be detained to prevent one violent offense), and propose that risk reports should communicate this uncertainty explicitly. Our findings suggest that for rare violent outcomes, debates about fairness metrics alone are incomplete: under current data regimes, the available features may not support high-confidence individualized detention decisions.
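The arithmetic behind the base-rate problem can be sketched with Bayes' rule in odds form: posterior odds equal prior odds times the positive likelihood ratio (sensitivity divided by the false-positive rate). The Python below uses only that textbook identity. It is not the paper's bound, and the function names and the example likelihood ratio of 3 are assumptions chosen for illustration.

```python
def required_likelihood_ratio(base_rate, target_ppv):
    """Positive likelihood ratio needed to reach target_ppv at a given base rate."""
    prior_odds = base_rate / (1.0 - base_rate)
    target_odds = target_ppv / (1.0 - target_ppv)
    return target_odds / prior_odds


def ppv_from_likelihood_ratio(base_rate, likelihood_ratio):
    """PPV implied by a positive likelihood ratio at a given base rate."""
    posterior_odds = likelihood_ratio * base_rate / (1.0 - base_rate)
    return posterior_odds / (1.0 + posterior_odds)


for base_rate in (0.02, 0.05):
    needed = required_likelihood_ratio(base_rate, 0.5)
    # Hypothetical tool with likelihood ratio 3, for comparison only.
    ppv_at_3 = ppv_from_likelihood_ratio(base_rate, 3.0)
    print(f"base rate {base_rate:.0%}: LR needed {needed:.0f}, PPV at LR=3 is {ppv_at_3:.1%}")
```

At a 2% base rate a 50% PPV needs a likelihood ratio of about 49, and at 5% about 19. A hypothetical flag with likelihood ratio 3 would be correct only about 6% and 14% of the time, respectively.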

2604.27280 2026-05-01 cs.LG stat.ME

Predicting Covariate-Driven Spatial Deformation for Nonstationary Gaussian Processes

Minghao Gu, Weizhi Lin, Qiang Huang

Abstract

Nonstationary Gaussian processes (GPs) are essential for modeling complex, locally heterogeneous spatial data. A common modeling approach is the spatial deformation method that warps the domain to recover isotropy. However, this static method does not account for changes in spatial correlation induced by covariates, limiting its ability to predict nonstationary GPs under new covariate conditions. To enable predictive modeling of the deformation method, we propose to model the spatial deformation as a function of covariates. The spaces of diffeomorphic deformations and Euclidean covariate vectors are connected by characterizing deformations as generated by velocity fields living in a Lie algebra. To overcome the estimation instability caused by high-order interactions between multiple covariates in a general Lie algebra, we prove that those interactions can be truncated with a moderate physical assumption. Based on the theoretical results, a concise functional form of deformations driven by multiple covariates can be established, and an efficient estimation-inference algorithm is developed for out-of-sample nonstationary GP prediction with limited covariate-deformation sample pairs. The effectiveness and generalizability of the method are demonstrated on a simulation study and two case studies, in the fields of manufacturing and geostatistics, respectively.

2604.27243 2026-05-01 stat.AP

Estimating Decision Uncertainty from Preference Uncertainty: Application to Ground Vehicle Design

Chia-Ruei Liu, Yongjia Song, Qiong Zhang, Cameron Turner

Abstract

Engineering design problems are often modeled as multi-objective optimization tasks in which a scalarized utility function selects an optimal design from the Pareto set. In practice, preferences are imperfectly known, so uncertainty in the preference model leads to uncertainty in the resulting optimal design. This paper proposes a probabilistic framework that treats preference parameters as random variables and examines how preference uncertainty propagates to decision uncertainty. A random preference vector induces a probability distribution over optimal designs, allowing us to identify which regions of the Pareto front are most likely to be selected and to assess recommendation stability under preference variability. To explain the sources of this variability, we apply variance-based global sensitivity analysis to the induced optimal solutions, using Sobol' indices and Shapley values to quantify the contributions of individual design variables and their dependencies. We further summarize the overall dispersion of the optimal-design distribution using the Fréchet variance, which provides a scalar measure of decision stability under a given preference model. Two vehicle design case studies demonstrate how problem structure can lead to discrete versus continuous decision distributions and show how the proposed quantities support preference-aware design analysis.

2604.27242 2026-05-01 math.PR math.ST stat.TH

Statistical Inference for Homogenization Limits Driven by Wiener or Hermite Processes

Pablo Ramses Alonso-Martin

Comments 43 pages. Comments are welcome

Abstract

We study the effective estimation of the diffusivity and Hurst parameter for the homogenized limit of a class of slow/fast systems. Depending on the system parameters, this limit solves a stochastic differential equation driven by either a Wiener process or a Hermite process. In the class of models we consider, the fast variable is a fractional Ornstein--Uhlenbeck process. We show that estimators constructed from the homogenized limit remain consistent when applied to appropriately subsampled data generated by the original slow/fast system. A key tool in our analysis is the consistency of renormalized quadratic variations for a family of additive functionals of the fast process. Using Wiener chaos expansions, we obtain an $L^2$-orthogonal decomposition of these renormalized quadratic variations. This allows us to show that, under appropriate subsampling conditions, the consistency properties of the estimators are preserved even when the data is generated by the slow/fast system rather than the homogenized limit. We also show that, under stricter subsampling conditions, a non-central limit theorem is preserved in the case where the fluctuations of the estimator around the true value are non-Gaussian. As a direct consequence of convergence in $L^2$, we obtain consistency of an estimator for the limiting self-similarity that does not require knowledge of the limiting diffusivity. Finally, we show that our results apply to a class of one-dimensional fluctuation models.

2604.27198 2026-05-01 stat.AP stat.ME

Bayesian Nonparametric Causal Inference for Quantile Residual Life: An Application to Alzheimer's Disease

Woojung Bae, Taekwon Hong, Sang Kyu Lee, Dongrak Choi, Jong-Hyeon Jeong

Abstract

In Alzheimer's disease research, for individuals who remain dementia-free through a given follow-up time, an important clinical question is how much longer they are likely to remain dementia-free. Quantiles of this remaining time provide clinically interpretable prognostic milestones and can help characterize prognostic heterogeneity across baseline groups. We address this question in the Alzheimer's Disease Neuroimaging Initiative (ADNI), focusing on baseline amyloid status as the exposure. Estimation is challenging because amyloid status is observed rather than randomized, requiring adjustment for confounding, and because time to dementia onset is heterogeneous and heavily right-censored. We estimate causal contrasts in quantile residual life using a Bayesian nonparametric enriched Dirichlet process mixture model for the joint distribution of event times, exposure, and baseline covariates, with inference via Bayesian g-computation. The approach accommodates ignorable missing baseline covariates through data augmentation, supports inference across clinically relevant landmark times, and allows sensitivity analysis for residual unmeasured confounding. Simulation studies show good performance under complex heterogeneity and heavy censoring. In ADNI, elevated baseline amyloid was associated with shorter quantiles of remaining dementia-free time than non-elevated baseline amyloid among individuals who remained dementia-free through relevant landmark times, overall and within baseline diagnostic subgroups.

2604.27196 2026-05-01 math.ST stat.TH

Technical Note on Relating Scores of Tilted Distributions

Curtis McDonald

Abstract

Recent results have shown that for a linear tilt to a reference measure, the scores that would be produced under convolution with a normal variable can be expressed in terms of convolutions of the original density. Here, we extend that result to include constant negative diagonal tilts as well. The relationship follows from relating the denoisers of the two densities, which define the scores via Tweedie's formula. A linear tilt results in a location shift to the score operator, while a quadratic tilt results in both a location shift and a time shift. Thus, the scores of the tilted density can be understood as the scores of the original convolution process at a different location and noise level. These results are of interest to the score-based diffusion community and may lead to better score estimators that take advantage of these tilted-score relationships.
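
For the linear-tilt case, the location shift can be checked directly by completing the square. The sketch below assumes isotropic noise $N(0,\sigma^2 I)$, a reference density $p$, a tilt vector $a$, and the shorthand $p_\sigma = p * N(0,\sigma^2 I)$; this notation is an assumption made for illustration and may differ from the note's own.

```latex
\begin{align*}
p_a(x) &\propto e^{a^\top x}\, p(x), \\
p_{a,\sigma}(y) &\propto \int e^{a^\top x}\, p(x)\,
    \exp\!\Big(-\tfrac{\|y-x\|^2}{2\sigma^2}\Big)\, dx \\
&= e^{a^\top y + \sigma^2\|a\|^2/2} \int p(x)\,
    \exp\!\Big(-\tfrac{\|y+\sigma^2 a-x\|^2}{2\sigma^2}\Big)\, dx \\
&\propto e^{a^\top y}\, p_\sigma(y+\sigma^2 a), \\
\nabla_y \log p_{a,\sigma}(y) &= a + (\nabla \log p_\sigma)(y+\sigma^2 a).
\end{align*}
```

Under these assumptions, the tilted score is the original score evaluated at the shifted location $y+\sigma^2 a$, plus the constant $a$. The quadratic-tilt case, which the abstract says also shifts the noise level, is not worked out here.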

2604.27191 2026-05-01 stat.ME cs.LG stat.ML

Linear Models, Variable Selection, Artificial Intelligence

Riyadh Alrawkan, Edward Boone, Ryad Ghanam, Anton Westveld

Abstract

Variable selection in linear regression models has been a problem since hypothesis testing began. Deciding which variables to include in or exclude from a model is not an easy task. Techniques such as Forward, Backward, and Stepwise Regression sequentially add or delete variables from a model. Penalized likelihood methods such as AIC and BIC seek to choose variables that contribute significantly to the likelihood. Penalized sum-of-squares methods such as LASSO and Elastic Net penalize small coefficients so that only variables with large coefficients remain in the model. This work introduces an Artificial Intelligence approach to model selection in which an ANN is trained to determine the significance of the variables based on OLS estimates. A simulation study shows the accuracy across various sample sizes and variances. A further simulation study compares the performance of the approach against Forward, Backward, AIC, BIC, and LASSO. The approach is illustrated using a dataset from the World Health Organization regarding Life Expectancy. A GitHub link is provided to the pretrained ANN (which can handle up to 100 predictor variables), the original WHO dataset, and the subset used in this work.

2604.27025 2026-05-01 stat.ML cs.LG

SCOPE-FE: Structured Control of Operator and Pairwise Exploration for Feature Engineering

Minhee Park, Seongyeon Son, Yonghyun Lee, Eunchan Kim

Abstract

Automatic feature engineering is an effective approach for improving predictive performance in tabular learning. However, expand-and-reduce methods, such as OpenFE, become increasingly computationally expensive as the input dimensionality grows. This limitation arises primarily from the combinatorial explosion of candidate features generated through operator-feature combinations. To address this issue, we propose SCOPE-FE, a structured search space control framework that improves efficiency by reducing the candidate space prior to feature generation. SCOPE-FE jointly regulates two major sources of combinatorial growth: the operator space and feature-pair space. First, OperatorProbing estimates the dataset-specific utility of candidate operators and eliminates low-contribution operators in advance. Second, FeatureClustering employs spectral embedding and fuzzy c-means clustering to group structurally related features, thereby restricting candidate generation to relevant within-cluster combinations. In addition, we introduce ReliabilityScoring, which incorporates variance across subsamples to stabilize pruning decisions. Experiments on ten benchmark datasets demonstrate that SCOPE-FE substantially reduces feature engineering time while maintaining competitive predictive performance relative to existing baselines. The efficiency gains are particularly pronounced for high-dimensional datasets. These results indicate that structured control of the search space is an effective strategy for scalable automatic feature engineering. The code will be made publicly available upon acceptance.

2604.27017 2026-05-01 eess.IV cs.LG stat.ML

Validating the Clinical Utility of CineECG 3D Reconstructions through Cross-Modal Feature Attribution

Karol Dobiczek, Maciej Mozolewski, Szymon Bobek, Michał Szafarczyk, Peter van Dam, Grzegorz J. Nalepa

Comments Accepted to the CompHealth workshop at the 26th International Conference on Computational Science

Abstract

Deep learning models for 12-lead electrocardiogram (ECG) analysis achieve high diagnostic performance but lack the intuitive interpretability required for clinical integration. Standard feature attribution methods are limited by the inherent difficulty in mapping abstract waveform fluctuations to physical anatomical pathologies. To resolve this, we propose a cross-modal method that projects feature attributions from high-performance 12-lead ECG models onto the CineECG 3D anatomical space. Our study reveals that while models trained directly on CineECG signals suffer from reduced accuracy and incoherent attributions, the proposed mapping mechanism effectively recovers clinically relevant feature rankings. Validated against a ground-truth dataset of 20 cases annotated by domain experts, the mapped explanations yield a Dice score of 0.56, significantly outperforming the 0.47 baseline of standard 12-lead attributions. These findings indicate that cross-modal averaging mapping effectively filters attribution instability and improves the localization of pathological features, combining the diagnostic expressiveness of standard ECG with the intuitive clarity of anatomical visualization.

2604.26992 2026-05-01 math.ST stat.ME stat.ML stat.TH

Adaptive Robust Confidence Intervals in Efron's Gaussian Two-Groups Model

Qiaosen Wang, Shuwen Chai, Chao Gao

Abstract

Robust uncertainty quantification is increasingly important in modern data analysis and is often formalized under Huber's model, which allows an $\varepsilon$-fraction of arbitrary corruptions. In many experimental sciences, however, the measurement protocol is well controlled, and contamination is more plausibly introduced upstream. Motivated by this noise-oblivious nature of adversaries, we study confidence intervals for the null location parameter $\theta$ in Efron's Gaussian two-groups model, where an unknown fraction $\varepsilon$ of observations have arbitrarily shifted means, but all samples share the same law of additive Gaussian measurement noise with variance $\sigma^2$. We characterize the minimax-optimal length among confidence intervals with a prescribed coverage level uniformly over the unknown contamination proportion and all noise-oblivious adversaries. Although prior work has shown that the minimax point estimation rate of $\theta$ does not deteriorate when $\varepsilon$ becomes unknown, our results reveal that, with a given $\sigma^2$, the minimax-optimal length of confidence intervals that are adaptive to unknown $\varepsilon$ is of order $\sigma(n^{-1/4}+\varepsilon^{1/2}/\max\{1, \log(en \varepsilon^2)\}^{1/2})$, which is polynomially worse than the optimal length when $\varepsilon$ is known. When the variance $\sigma^2$ is also unknown, we show a further degradation: no adaptive robust confidence interval can be shorter than $\Omega(\sigma n^{-1/8})$. Algorithmically, we introduce a Fourier-based certification procedure built on Carathéodory's positive-semidefiniteness constraints. By scanning candidate points and accepting those whose residual characteristic function is certifiably consistent with a Gaussian location mixture, our algorithm attains the minimax lower bound in the known-variance setting and is computable in polynomial time.

2604.26983 2026-05-01 cs.IR cs.LG stat.ML

Value-Aware Product Recommendation by Customer Segmentation using a suitable High-Dimensional Similarity Measure

María Florencia Acosta, Rodrigo García Arancibia, Pamela Llop, Mariel Lovatto, Lucas Mansilla

Abstract

This paper presents a novel value-aware approach to product recommendation that simultaneously addresses the high dimensionality and sparsity of user-item data while explicitly incorporating the contribution of each product and user to overall sales revenue. The proposed framework encodes revenue contributions in the user-item matrix and computes customer similarity directly on this basis using suitable distance measures. This enables the segmentation of users according to the revenue-based similarity of their purchase baskets and supports recommendations aligned with profitability objectives. We compare conventional similarity metrics with a novel alternative tailored to high-dimensional contexts and propose three recommendation strategies based on revenue share, product popularity, and expected profit generation. The effectiveness of the proposed method is validated through simulation experiments and a real-world application using the UCI Online Retail dataset.

2604.26973 2026-05-01 cs.NE cs.LG stat.CO

MAEO: Multiobjective Animorphic Ensemble Optimization for Scalable Large-scale Engineering Applications

Omer F. Erdem, Dean Price, Paul Seurin, Majdi I. Radaideh

Comments 33 pages, 9 figures, 5 tables, under peer review

Abstract

Multiobjective optimization remains challenging for many scientific and engineering problems due to the need to balance convergence, diversity, and computational efficiency across high-dimensional objective landscapes. This work presents the Multiobjective Animorphic Ensemble Optimization (MAEO) framework, a parallelizable ensemble strategy that unifies state-of-the-art evolutionary algorithms within an island-based architecture, overcoming the limitations of relying on a single optimizer, as implied by the No Free Lunch theorem. MAEO uses a parameter-free hypervolume indicator for island performance assessment and a strict Pareto-rank-based individual scoring formulation that incorporates crowding distance and nadir-point proximity to ensure consistent selection pressure within each front. The framework is initiated using four algorithms (NSGA-III, CTAEA, AGEMOEA2, SPEA2) and evaluated through extensive benchmarking on 12 DTLZ/ZDT functions under 36 dimensionality settings using Wilcoxon signed-rank tests with both hypervolume and inverse generational distance metrics. Results show that MAEO achieves balanced convergence-diversity performance, outperforming or matching some of the leading multiobjective optimization algorithms across different benchmark problems. To demonstrate practical applicability, MAEO is applied to the equilibrium-cycle optimization of a small modular nuclear reactor. Eight discrete design variables and three objectives (levelized cost of electricity, peak soluble boron concentration, fuel cycle length) are optimized under two safety constraints. The algorithm carried out roughly 40,000 evaluations using computer simulations. MAEO identifies core designs that lower both the levelized cost of electricity and the peak boron concentration, while preserving fuel cycle length and meeting all safety constraints.

2604.24587 2026-05-01 stat.AP

Bayesian inference for hidden Markov models under genuine multimodality with application to ecological time series

Marco A. Gallegos-Herrada, Vianey Leos-Barajas, Jeffrey S. Rosenthal

Comments 37 pages, 11 figures, to be submitted to Bayesian Analysis, corrected author affiliations

Abstract

Bayesian inference in hidden Markov models (HMMs) can be challenging due to the presence of multimodality in the likelihood function, and consequently in the joint posterior distribution, even after correcting for label switching. The parallel tempering (PT) algorithm, a state-space augmentation method, is a widely used approach for dealing with multimodal distributions. Nevertheless, standard implementation of the PT algorithm may not always be sufficient to effectively explore the high-dimensional, complex multimodal posterior distributions that arise in HMMs. In this work, we demonstrate common pitfalls when implementing the PT algorithm for HMMs, approaches to remedy them, and introduce new non-informative prior distributions that facilitate effective posterior distribution exploration. We analyse time series of blue whale dive data with two 3-state HMMs in a Bayesian framework, one of which includes a categorical covariate in the transition probability matrix to account for the effect of sound stimuli on the whale's behavior. We demonstrate how effective implementation of the modified PT algorithm for Bayesian inference leads to effective exploration of the resultant multimodal posterior distribution and how that affects inference for the underlying movement patterns of the blue whales.

2604.22200 2026-05-01 astro-ph.GA stat.AP

Formalizing Galaxy Population Evolution: Drift and Mergers as Transport Processes on Manifolds

Tsutomu T. Takeuchi

Comments 31 pages, 3 figures, to be submitted

Abstract

Galaxy evolution is commonly described through the time evolution of observational statistics such as luminosity functions and stellar mass functions. However, these quantities are projections of an underlying multivariate galaxy state space rather than fundamental dynamical variables. We develop a unified framework in which galaxy evolution is formulated as the time evolution of a probability measure on the galaxy manifold. Representing galaxy states by latent variables $\theta\in\mathcal{M}$ and the population by a density $\rho(\theta,t)$, the evolution is governed by a general equation containing continuous transport and nonlocal jump processes. By reinterpreting manifold learning as the pushforward of measures, we distinguish observational, representation, and physical measures, and emphasize that manifold coordinates themselves need not carry direct physical meaning. In this picture, luminosity functions and stellar mass functions arise as projected observables of a single underlying dynamics, and generally do not form closed equations in observational space. The framework contains existing models as limiting cases: reduction to a single mass variable yields continuity-equation models, while additive post-merger states recover the Smoluchowski coagulation equation. We further show that luminosity-function evolution is naturally described within the Schechter family, whose apparent stability is interpreted as an effective consequence of projection. Since observables are projections of measures, inference of galaxy evolution becomes a statistical inverse problem of recovering manifold dynamics from data. This framework shifts the focus from fitting observed statistics directly to inferring the underlying state-space dynamics, thereby bridging manifold learning and physical theory.

2604.11119 2026-05-01 stat.ML cs.LG

DDO-RM: Distribution-Level Policy Improvement after Reward Learning

Tiantian Zhang, Jierui Zuo, Michael Chen, Wenping Wang

Comments 8 pages, 4 figures

Abstract

Recent theory suggests that reward-model-first methods can be more sample-efficient than direct policy fitting when the reward function is statistically simpler than the induced policy. We propose DDO-RM, a finite-candidate decision-optimization method that converts reward scores into an explicit target distribution. Unlike PPO-based RLHF or DPO, DDO-RM performs a KL-regularized mirror-descent update to project the policy toward a reward-improved distribution over a candidate set. Preliminary experiments on Pythia-410M show that DDO-RM outperforms DPO in pair accuracy (0.52 to 0.56) and mean margin (0.13 to 0.53). Our framework provides a principled connection between reward learning and mirror-descent policy improvement.
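
The abstract does not spell out how reward scores become a target distribution. One standard construction, shown here purely as an assumption and not as the paper's actual update, is the closed-form maximizer of expected reward minus `beta` times the KL divergence to a reference policy over a finite candidate set, which is proportional to `ref_prob * exp(reward / beta)`. The function name and inputs are hypothetical.

```python
import math

def reward_tilted_target(ref_probs, rewards, beta):
    """Return q(y) proportional to ref(y) * exp(r(y) / beta) over a finite candidate set.

    Assumption: this exponential tilt is the textbook KL-regularized solution;
    it is not confirmed to be the exact DDO-RM target.
    """
    # Work in log space and subtract the max for numerical stability.
    logits = [math.log(p) + r / beta for p, r in zip(ref_probs, rewards)]
    top = max(logits)
    unnormalized = [math.exp(v - top) for v in logits]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical example: two candidates, uniform reference policy.
target = reward_tilted_target([0.5, 0.5], [1.0, 0.0], beta=1.0)
```

A smaller `beta` concentrates the target on the highest-reward candidate, while a larger `beta` keeps it close to the reference policy.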

2604.08632 2026-05-01 cs.CR cs.NI stat.AP

Why Network Segmentation Projects Fail

Rohit Dube

Abstract

Network segmentation is a foundational enterprise security control. Despite its recognized benefits, segmentation initiatives frequently fail in practice, and the field lacks a systematic empirical explanation for why these projects do not achieve their intended outcomes. This paper presents an empirical study of failed segmentation projects based on a survey of 400 U.S.-based network security practitioners. The survey was grounded in a two-part failure framework that separately measures general IT project failure factors and segmentation-specific technical and operational barriers. Clustering analysis of the responses reveals four distinct failure archetypes. Surprisingly, practitioners across all four archetypes propose general IT project management fixes over segmentation-specific fixes in the same ratio.

2603.13566 2026-05-01 stat.ML cs.LG

EmDT: Embedding Diffusion Transformer for Tabular Data Generation in Fraud Detection

En-Ya Kuo, Sebastien Motsch

Comments Updated the first page to include the IEEE submission notice required for previously posted electronic preprint versions

Abstract

Imbalanced datasets pose a difficulty in fraud detection, as classifiers are often biased toward the majority class and perform poorly on rare fraudulent transactions. Synthetic data generation is therefore commonly used to mitigate this problem. In this work, we propose the Clustered Embedding Diffusion-Transformer (EmDT), a diffusion model designed to generate fraudulent samples. Our key innovation is to leverage UMAP clustering to identify distinct fraudulent patterns, and train a Transformer denoising network with sinusoidal positional embeddings to capture feature relationships throughout the diffusion process. Once the synthetic data has been generated, we employ a standard decision-tree-based classifier (e.g., XGBoost) for classification, as this type of model remains better suited to tabular datasets. Experiments on a credit card fraud detection dataset demonstrate that EmDT significantly improves downstream classification performance compared to existing oversampling and generative methods, while maintaining comparable privacy protection and preserving feature correlations present in the original data.

2603.10252 2026-05-01 stat.ML cs.LG physics.data-an stat.ME

Bayesian Hierarchical Models and the Maximum Entropy Principle

Brendon J. Brewer

Comments 6 pages, 2 figures. To appear in the proceedings of the 44th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2025), held in Auckland, New Zealand

Abstract

Bayesian hierarchical models are frequently used in practical data analysis contexts. One interpretation of these models is that they provide an indirect way of assigning a prior for unknown parameters, through the introduction of hyperparameters. The resulting marginal prior for the parameters (integrating over the hyperparameters) is usually dependent, so that learning one parameter provides some information about the others. In this contribution, I will demonstrate that, when the prior given the hyperparameters is a canonical distribution (a maximum entropy distribution with moment constraints), the dependent marginal prior also has a maximum entropy property, with a different constraint. This constraint is on the marginal distribution of some function of the unknown quantities. The results shed light on what information is actually being assumed when we assign a hierarchical model.

2602.10125 2026-05-01 cs.SI cs.NI stat.AP

How segmented is my network?

Rohit Dube

Comments 5 Tables, 5 Figures

Abstract

Network segmentation is a popular security practice for limiting lateral movement, yet practitioners lack a metric to measure how segmented a network actually is. We define segmentedness as the fraction of potential node-pair communications disallowed by policy -- equivalently, the complement of graph edge density -- and show it to be the first statistically principled scalar metric for this purpose. Then, we derive a normalized estimator for segmentedness and evaluate its uncertainty using confidence intervals. For a 95% confidence interval with a margin-of-error of $\pm 0.1$, we show that a minimum of $M=97$ sampled node pairs is sufficient. This result is independent of the total number of nodes in the network, provided that node pairs are sampled uniformly at random. We evaluate the estimator through Monte Carlo simulations on Erdős--Rényi, stochastic block models, and real-world enterprise network datasets, demonstrating accurate estimation. Finally, we discuss applications of the estimator, such as baseline tracking, zero trust assessment, and merger integration.
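
The reported $M=97$ is consistent with the textbook worst-case sample size for a proportion, $z^2 p(1-p)/e^2$ with $p=0.5$, $z=1.96$, and $e=0.1$. Whether the paper derives it this way is an assumption. The sketch below computes that bound and estimates segmentedness by sampling node pairs uniformly at random. The function names and the `is_allowed` policy callback are hypothetical.

```python
import math
import random

def min_sample_size(margin, z=1.96, p=0.5):
    """Worst-case (p = 0.5) sample size for a normal-approximation CI on a proportion."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def estimate_segmentedness(nodes, is_allowed, m, rng):
    """Fraction of m uniformly sampled node pairs whose communication is disallowed.

    is_allowed(u, v) is a hypothetical policy lookup returning True when
    the pair may communicate.
    """
    blocked = 0
    for _ in range(m):
        u, v = rng.sample(nodes, 2)  # two distinct nodes, uniform over pairs
        if not is_allowed(u, v):
            blocked += 1
    return blocked / m

m = min_sample_size(0.1)  # 1.96^2 * 0.25 / 0.01 = 96.04, rounded up to 97

# Hypothetical policy: nodes communicate only within the same group of 50.
nodes = list(range(100))
estimate = estimate_segmentedness(nodes, lambda u, v: u // 50 == v // 50, m, random.Random(0))
```

The sample size depends only on the margin and confidence level, not on `len(nodes)`, which matches the abstract's statement that the result is independent of network size.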

2602.07915 2026-05-01 cs.LG cs.AI stat.ME stat.ML

CausalCompass: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios

Huiyang Yi, Xiaojian Shen, Yonggang Wu, Duxin Chen, He Wang, Wenwu Yu

Comments Major revision from the previous version

Abstract

Causal discovery from time series is a fundamental task in machine learning. However, its widespread adoption is hindered by a reliance on untestable causal assumptions and by the lack of robustness-oriented evaluation in existing benchmarks. To address these challenges, we propose CausalCompass, a flexible and extensible benchmark framework designed to assess the robustness of time-series causal discovery (TSCD) methods under violations of modeling assumptions. To demonstrate the practical utility of CausalCompass, we conduct extensive benchmarking of representative TSCD algorithms across eight assumption-violation scenarios. Our experimental results indicate that no single method consistently attains optimal performance across all settings. Nevertheless, the methods exhibiting superior overall performance across diverse scenarios are almost invariably deep learning-based approaches. We further provide hyperparameter sensitivity analyses to deepen the understanding of these findings. We additionally conduct ablation experiments to explain the strong performance of deep learning-based methods under assumption violations. We also find, somewhat surprisingly, that NTS-NOTEARS relies heavily on standardized preprocessing in practice, performing poorly in the vanilla setting but exhibiting strong performance after standardization. Finally, our work aims to provide a comprehensive and systematic evaluation of TSCD methods under assumption violations, thereby facilitating their broader adoption in real-world applications. The user-friendly implementation, documentation and datasets are available at https://anonymous.4open.science/r/CausalCompass-anonymous-5B4F/.

2601.05052 2026-05-01 cs.LG stat.ML

DeepWeightFlow: Re-Basined Flow Matching for Generating Neural Network Weights

Saumya Gupta, Scott Biggs, Moritz Laber, Zohair Shafi, Robin Walters, Ayan Paul

Comments 25 pages, 20 tables, 2 figures

Journal ref
The Fourteenth International Conference on Learning Representations (ICLR 2026): https://openreview.net/forum?id=fOwsr1VTi8
Abstract

Building efficient and effective generative models for neural network weights has been a research focus of significant interest that faces challenges posed by the high-dimensional weight spaces of modern neural networks and their symmetries. Several prior generative models are limited to generating partial neural network weights, particularly for larger models, such as ResNet and ViT. Those that do generate complete weights struggle with generation speed or require finetuning of the generated models. In this work, we present DeepWeightFlow, a Flow Matching model that operates directly in weight space to generate diverse and high-accuracy neural network weights for a variety of architectures, neural network sizes, and data modalities. The neural networks generated by DeepWeightFlow do not require fine-tuning to perform well and can scale to large networks. We apply Git Re-Basin and TransFusion for neural network canonicalization in the context of generative weight models to account for the impact of neural network permutation symmetries and to improve generation efficiency for larger model sizes. The generated networks excel at transfer learning, and ensembles of hundreds of neural networks can be generated in minutes, far exceeding the efficiency of diffusion-based methods. DeepWeightFlow models pave the way for more efficient and scalable generation of diverse sets of neural networks.

2511.02258 2026-05-01 stat.ML cs.LG math.PR math.ST stat.TH

Limit Theorems for Stochastic Gradient Descent in High-Dimensional Single-Layer Networks

Parsa Rangriz

Abstract

This paper studies the high-dimensional scaling limits of online stochastic gradient descent (SGD). Building on the recent work of Ben Arous, Gheissari, and Jagannath on the effective dynamics of SGD, we study the critical scaling regime of the step size for single-layer networks. Below this critical regime, the effective dynamics are governed by deterministic (ballistic) limits, whereas at the critical scale, a new correction term emerges that changes the phase diagram. In this regime, near fixed points, the corresponding diffusive (SDE) limits of the effective dynamics reduce to an Ornstein-Uhlenbeck process under certain conditions. These results highlight how the information exponent controls sample complexity and illustrate the limitations of deterministic scaling limits in capturing stochastic fluctuations in high-dimensional learning dynamics.

2510.19110 2026-05-01 stat.ML cs.LG stat.AP

Signature Kernel Scoring Rule: A Spatio-Temporal Diagnostic for Probabilistic Weather Forecasting

Archer Dodson, Ritabrata Dutta

Abstract

Modern weather forecasting has increasingly transitioned from numerical weather prediction (NWP) to data-driven machine learning forecasting techniques. While these new models produce probabilistic forecasts to quantify uncertainty, their training and evaluation may remain hindered by conventional scoring rules, primarily MSE, which are designed for single time point predictions and ignore the highly correlated data structures present in weather behaviour. This work introduces the signature kernel scoring rule to the domain of weather forecasting, which reframes weather variables as continuous paths to encode temporal and spatial dependencies through iterated integrals. Validated as strictly proper through the use of path augmentations to guarantee uniqueness, the signature kernel provides a theoretically robust metric for forecast verification and model training. Empirical evaluations through weather scorecards on WeatherBench 2 models demonstrate the signature kernel scoring rule's high discriminative power and unique capacity to capture path-dependent interactions. Following previous demonstration of successful adversarial-free probabilistic training, we train sliding window generative neural networks using a predictive-sequential scoring rule on ERA5 reanalysis weather data. Using a lightweight model, we demonstrate that signature kernel based training outperforms climatology for forecast paths of up to fifteen timesteps.

2509.16115 2026-05-01 econ.EM stat.AP

A Korean Macroeconomic Database for Data-Rich Policy Analysis and U.S.--Korea Dependence

Changryong Baek, Seunghyun Moon, Seunghyeon Lee

Abstract

We introduce KRED (Korea Research Economic Database), a FRED-MD-compatible monthly macroeconomic database for Korea designed for data-rich policy analysis and cross-country comparison. KRED contains 125 monthly series from ECOS, KOSIS, and administrative labor-market sources, with coverage back to 1960. Using a balanced panel of 104 series over 2009:06--2025:12, principal-components analysis extracts four factors that explain about 30% of total variation. These factors correspond to financial conditions, real activity, housing and real-estate credit, and labor-market and price pressures, and their diffusion indices summarize major Korean macroeconomic episodes. We then use KRED in two empirical applications. First, factor-augmented VARs show that U.S. monetary tightening transmits strongly to Korea and that factor augmentation yields a more coherent inflation response than a low-dimensional VAR. Second, a grouped U.S.--Korea tensor autoregression shows that cross-country dependence is concentrated in financially oriented blocks, with stronger transmission from the U.S. financial block to Korea than in the reverse direction, while spillovers in real activity and housing are much weaker. KRED thus provides a transparent public database for Korean macroeconomic research and a useful building block for comparative work on macro-financial dependence in Asia.

2505.24259 2026-05-01 stat.ME

Partially-shared Imaging Regression on Integrating Heterogeneous Brain-Cognition Associations across Alzheimer's Diagnoses

Yang Sui, Qi Xu, Ting Li, Yang Bai, Annie Qu

Abstract

Alzheimer's Disease Neuroimaging Initiative (ADNI) diagnostic groups present strong heterogeneous associations among demographic, imaging, and cognitive data. We propose a novel PArtially-shared Imaging Regression (PAIR) model to represent imaging coefficients as weighted combinations of smooth spatial components. A Total Variation penalty is applied to enforce spatial smoothness, and a Selective Integration penalty is introduced to adaptively learn partial-sharing structures across groups. Theoretically, we establish minimax-optimal error bounds that dynamically adapt to varying sharing paradigms. Numerically, PAIR achieves predictive accuracy comparable to advanced deep learning models while providing superior interpretability. Applied to ADNI data, PAIR reveals substantial heterogeneity in brain-cognition pathways between cognitively normal (CN) and cognitively impaired (CI) groups, with hippocampal imaging contributing minimally in the CN group but substantially in the CI group, particularly in the CA1, CA3, and presubiculum subfields.

2504.19342 2026-05-01 stat.ML cs.LG stat.ME

Contextual Online Uncertainty-Aware Preference Learning for Human Feedback

Nan Lu, Ethan Lee, Ethan X. Fang, Junwei Lu

Abstract

Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence to align large models with human preferences. In this paper, we propose a novel statistical framework to simultaneously conduct online decision-making and statistical inference on the optimal model using human preference data based on dynamic contextual information. Our approach introduces an efficient decision strategy that achieves both the optimal regret bound and the asymptotic distribution of the estimators. A key challenge in RLHF is handling the dependent online human preference outcomes with dynamic contexts. To address this, on the methodological side, we propose a two-stage algorithm that starts with $\varepsilon$-greedy exploration and is followed by exploitation; on the theoretical side, we tailor anti-concentration inequalities and matrix martingale concentration techniques to derive the uniform estimation rate and asymptotic normality of the estimators using dependent samples from both stages. Extensive simulation results demonstrate that our method outperforms state-of-the-art strategies. We apply the proposed framework to analyze the human preference data for ranking large language models on the Massive Multitask Language Understanding dataset, yielding insightful results on the performance of different large language models for medical anatomy knowledge.

2503.03065 2026-05-01 stat.ME

Meta-analysis of median survival times with inverse-variance weighting

Sean McGrath, Cheng-Han Yang, Jonathan Kimmelman, Omer Ozturk, Russell Steele, Andrea Benedetti

Journal ref
Stat. Med. 45 (2026) e70533
Abstract

We consider the problem of meta-analyzing outcome measures based on median survival times. Primary studies with time-to-event outcomes often report estimates of median survival times and confidence intervals based on the Kaplan-Meier estimator. However, outcome measures based on median survival are rarely meta-analyzed, as standard inverse-variance weighted methods require within-study standard errors that are typically not reported. In this article, we consider an inverse-variance weighted approach to meta-analyze median survival times that estimates the within-study standard errors from the reported confidence intervals. We show that this method consistently estimates the standard error of median survival when applied to confidence intervals constructed by the Brookmeyer-Crowley method. We conduct a series of simulation studies evaluating the performance of this approach at the study level (i.e., for estimating the standard error of median survival) and the meta-analytic level (i.e., for estimating the pooled median, difference of medians, and ratio of medians) for commonly used confidence intervals for median survival, including the Brookmeyer-Crowley method and nonparametric bootstrap. We find that this approach often performs comparably to a benchmark approach that uses the true within-study standard errors for meta-analyzing median-based outcome measures when within-study sample sizes are moderately large (e.g., above 50). However, when the effective sample sizes are small, the method can yield biased estimates of within-study standard errors. We illustrate an application of this approach in a meta-analysis evaluating survival benefits of being assigned to experimental arms versus comparator arms in randomized trials for non-small cell lung cancer therapies.
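
As a rough illustration of the idea in this abstract, the sketch below recovers a standard error from a reported confidence interval and then pools the study medians with inverse-variance weights. It is not the authors' implementation: the Wald-type conversion `(upper - lower) / (2 * z)`, the fixed-effect pooling, the function names `se_from_ci` and `pool_inverse_variance`, and the three study records are all assumptions made for the example.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Approximate a standard error from a reported confidence interval.

    Assumption: the interval is treated as a symmetric normal-based
    interval, so its width is 2 * z * SE.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def pool_inverse_variance(estimates, ses):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, (1 / total) ** 0.5


# Hypothetical studies: (median survival in months, CI lower, CI upper).
studies = [(10.0, 8.0, 12.0), (12.0, 9.0, 15.0), (9.0, 8.0, 10.0)]
medians = [m for m, _, _ in studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies]
pooled, pooled_se = pool_inverse_variance(medians, ses)
print(f"pooled median: {pooled:.2f} (SE {pooled_se:.2f})")
```

Studies with narrower intervals receive larger weights, which is why the abstract's caveat about biased standard errors at small effective sample sizes matters for the pooled result.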

2502.14698 2026-05-01 cs.LG cs.AI stat.AP stat.ML

General Uncertainty Estimation with Delta Variances

Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

Abstract

Decision makers may suffer from uncertainty induced by limited data. This may be mitigated by accounting for epistemic uncertainty, which is, however, challenging to estimate efficiently for large neural networks. To this end, we investigate Delta Variances, a family of algorithms for epistemic uncertainty quantification that is computationally efficient and convenient to implement. It can be applied to neural networks and more general functions composed of neural networks. As an example, we consider a weather simulator with a neural-network-based step function inside -- here Delta Variances empirically obtain competitive results at the cost of a single gradient computation. The approach is convenient as it requires no changes to the neural network architecture or training procedure. We discuss multiple ways to derive Delta Variances theoretically, noting that special cases recover popular techniques, and present a unified perspective on multiple related methods. Finally, we observe that this general perspective gives rise to a natural extension and empirically show its benefit.

2502.07189 2026-05-01 cs.LG stat.ML

Exploring Vision Neural Network Pruning via Screening Methodology

Mingyuan Wang, Yangzi Guo, Sida Liu, Yuhang Liu

Abstract

The remarkable performance of modern deep neural networks (DNNs) is largely driven by their massive scale, often comprising tens to hundreds of millions, or even billions, of parameters. However, such a scale incurs substantial storage and computational costs, hindering deployment on platforms such as edge devices that require energy-efficient and real-time processing. In this paper, we propose a network pruning framework that reduces both storage and computation requirements by an order of magnitude while preserving model accuracy. Our approach eliminates non-essential parameters through a statistical analysis of component significance across classification categories. Specifically, we employ an F-statistic-based screening technique combined with a weighted evaluation scheme to quantify the contributions of connections and channels, enabling both unstructured and structured pruning within a unified framework. Extensive experiments on real-world vision datasets, covering both fully connected neural networks (FNNs) and convolutional neural networks (CNNs), demonstrate that the proposed framework produces compact and efficient models that are highly competitive with state-of-the-art approaches.

2412.11136 2026-05-01 stat.ME stat.ML

Minimax Regret Estimation for Generalizing Heterogeneous Treatment Effects with Multisite Data

Yi Zhang, Melody Huang, Kosuke Imai

Abstract

To test scientific theories and develop individualized treatment rules, researchers often wish to learn heterogeneous treatment effects that can be consistently found across diverse populations and contexts. We consider the problem of generalizing heterogeneous treatment effects (HTE) based on data from multiple sites. A key challenge is that a target population may differ from the source sites in unknown and unobservable ways. This means that the estimates from site-specific models lack external validity, and a simple pooled analysis risks bias. We develop a robust CATE (conditional average treatment effect) estimation methodology with multisite data from heterogeneous populations. We propose a minimax-regret framework that learns a generalizable CATE model by minimizing the worst-case regret over a class of target populations whose CATE can be represented as convex combinations of site-specific CATEs. Using robust optimization, the proposed methodology accounts for distribution shifts in both individual covariates and treatment effect heterogeneity across sites. We show that the resulting CATE model has an interpretable closed-form solution, expressed as a weighted average of site-specific CATE models. Thus, researchers can utilize a flexible CATE estimation method within each site and aggregate site-specific estimates to produce the final model. Through simulations and a real-world application, we show that the proposed methodology improves the robustness and generalizability of existing approaches.

2412.05135 2026-05-01 stat.ML cs.LG stat.CO

The Polynomial Stein Discrepancy for Assessing Moment Convergence

Narayan Srinivasan, Matthew Sutton, Christopher Drovandi, Leah F South

Comments 17 Pages, 14 Figs

Abstract

We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality, like the effective sample size, are not appropriate for scalable Bayesian sampling algorithms, such as stochastic gradient Langevin dynamics, that are asymptotically biased. Instead, the gold standard is to use the kernel Stein discrepancy (KSD), which is itself not scalable given its quadratic cost in the number of samples. The KSD and its faster extensions also typically suffer from the curse of dimensionality and can require extensive tuning. To address these limitations, we develop the polynomial Stein discrepancy (PSD) and an associated goodness-of-fit test. While the new test is not fully convergence-determining, we prove that it detects differences in the first r moments for Gaussian targets. We empirically show that the test has higher power than its competitors in several examples, and at a lower computational cost. Finally, we demonstrate that the PSD can help practitioners select hyper-parameters of Bayesian sampling algorithms more efficiently than competitors.

2408.02679 2026-05-01 cs.LG cs.GR cs.HC stat.ME

Visual Analysis of Multi-outcome Causal Graphs

Mengjie Fan, Jinlu Yu, Daniel Weiskopf, Nan Cao, Huai-Yu Wang, Liang Zhou

Journal ref
IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 1, pp. 656-666, 2025
Abstract

We introduce a visual analysis method for multiple causal graphs with different outcome variables, namely, multi-outcome causal graphs. Multi-outcome causal graphs are important in healthcare for understanding multimorbidity and comorbidity. To support the visual analysis, we collaborated with medical experts to devise two comparative visualization techniques at different stages of the analysis process. First, a progressive visualization method is proposed for comparing multiple state-of-the-art causal discovery algorithms. The method can handle mixed-type datasets comprising both continuous and categorical variables and assist in the creation of a fine-tuned causal graph of a single outcome. Second, a comparative graph layout technique and specialized visual encodings are devised for the quick comparison of multiple causal graphs. In our visual analysis approach, analysts start by building individual causal graphs for each outcome variable, and then, multi-outcome causal graphs are generated and visualized with our comparative technique for analyzing differences and commonalities of these causal graphs. Evaluation includes quantitative measurements on benchmark datasets, a case study with a medical expert, and expert user studies with real-world health research data.

2407.16212 2026-05-01 stat.ME cs.NA math.NA stat.CO

Optimal experimental design: Formulations and computations

Xun Huan, Jayanth Jagalur, Youssef Marzouk

Comments Appears in Acta Numerica 2024. Some corrections and clarifications in this version

Journal ref
Acta Numerica, Volume 33, July 2024, pp. 715-840
Abstract

Questions of 'how best to acquire data' are essential to modeling and prediction in the natural and social sciences, engineering applications, and beyond. Optimal experimental design (OED) formalizes these questions and creates computational methods to answer them. This article presents a systematic survey of modern OED, from its foundations in classical design theory to current research involving OED for complex models. We begin by reviewing criteria used to formulate an OED problem and thus to encode the goal of performing an experiment. We emphasize the flexibility of the Bayesian and decision-theoretic approach, which encompasses information-based criteria that are well-suited to nonlinear and non-Gaussian statistical models. We then discuss methods for estimating or bounding the values of these design criteria; this endeavor can be quite challenging due to strong nonlinearities, high parameter dimension, large per-sample costs, or settings where the model is implicit. A complementary set of computational issues involves optimization methods used to find a design; we discuss such methods in the discrete (combinatorial) setting of observation selection and in settings where an exact design can be continuously parameterized. Finally, we present emerging methods for sequential OED that build non-myopic design policies, rather than explicit designs; these methods naturally adapt to the outcomes of past experiments in proposing new experiments, while seeking coordination among all experiments to be performed. Throughout, we highlight important open questions and challenges.

2407.08668 2026-05-01 stat.ML cs.LG

Modeling Spatial Extremal Dependence of Precipitation Using Distributional Neural Networks

Christopher Bülte, Lisa Leimenstoll, Melanie Schienle

Abstract

In this work, we propose a simulation-based estimation approach using generative neural networks to determine dependencies of precipitation maxima and their underlying uncertainty in time and space. Within the common framework of max-stable processes for extremes under temporal and spatial dependence, our methodology allows estimating the process parameters and their respective uncertainty, and also delivers an explicit nonparametric estimate of the spatial dependence through the pairwise extremal coefficient function. We illustrate the effectiveness and robustness of our approach in a thorough finite sample study where we obtain good performance in complex settings for which closed-form likelihood estimation becomes intractable. We use the technique for studying monthly rainfall maxima in Western Germany for the period 2021-2023, which is of particular interest since it contains an extreme precipitation event and subsequent flooding in July 2021 that had a massive, deadly impact. Beyond the considered setting, the presented methodology and its main generative ideas also have great potential for other applications.

2405.15952 2026-05-01 stat.CO math.ST stat.TH

Theoretical guarantees for lifted samplers

Philippe Gagnon, Florian Maire

Journal ref
Stochastic Processes and their Applications, 199, 1-26 (2026)
Abstract

Lifted samplers form a class of Markov chain Monte Carlo methods which has drawn a lot of attention in recent years due to superior performance in challenging Bayesian applications. A canonical example of lifted samplers is the one that is derived from a random walk Metropolis algorithm for a totally-ordered state space such as the integers or the real numbers. The lifted sampler is derived by splitting the proposal distribution into two parts: one in the increasing direction, and the other in the decreasing direction. It keeps following a direction until a rejection occurs, upon which it flips the direction. In terms of asymptotic variances, it outperforms the random walk Metropolis algorithm, regardless of the target distribution, at no additional computational cost. Other studies show, however, that beyond this simple case, lifted samplers do not always outperform their Metropolis counterparts. In this paper, we leverage the celebrated work of Tierney (1998) to provide an analysis in a general framework encompassing a broad class of lifted samplers. Our finding is that, essentially, the asymptotic variances cannot increase by a factor of more than 2, regardless of the target distribution, the way the directions are induced, and the type of algorithm from which the lifted sampler is derived (be it a Metropolis--Hastings algorithm, a reversible jump algorithm, etc.). This result indicates that, while there is potentially a lot to gain from lifting a sampler, there is not much to lose.
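
The canonical example described in this abstract can be sketched in a few lines for a target on the integers. This is a minimal illustration under stated assumptions, not the paper's general framework: the proposal is taken to be a unit step in the current direction, and the function name `lifted_sampler`, the discretized Gaussian-shaped target `log_target`, and the run length are all chosen for the example.

```python
import math
import random


def lifted_sampler(log_target, x0, n_steps, rng):
    """Lifted random walk on the integers.

    The state is (x, direction). Each step proposes x + direction and
    accepts with probability min(1, pi(y) / pi(x)); on rejection the
    position is kept and the direction is flipped.
    """
    x, direction = x0, 1
    samples = []
    for _ in range(n_steps):
        y = x + direction
        log_ratio = log_target(y) - log_target(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = y  # accepted: keep moving in the same direction
        else:
            direction = -direction  # rejected: stay put and reverse
        samples.append(x)
    return samples


def log_target(k):
    # Hypothetical unnormalized target: Gaussian-shaped weights on the integers.
    return -0.5 * (k / 3.0) ** 2


rng = random.Random(0)
samples = lifted_sampler(log_target, x0=0, n_steps=20000, rng=rng)
print(sum(samples) / len(samples))
```

Because the direction persists between rejections, the chain sweeps across the support instead of diffusing, which is the intuition behind the variance improvement over random walk Metropolis in this simple case.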

2207.11890 2026-05-01 econ.EM stat.ME

Misclassification in Difference-in-differences Models

Augustine Denteh, Désiré Kédagni

Abstract

The difference-in-differences (DID) design is one of the most popular methods used in empirical economics research. However, there is almost no work examining what the DID method identifies in the presence of a misclassified treatment variable. This paper studies the identification of treatment effects in DID designs when the treatment is misclassified. Misclassification arises in various ways, including when the timing of a policy intervention is ambiguous or when researchers need to infer treatment from auxiliary data. We show that the DID estimand is biased and recovers a weighted average of the average treatment effects on the treated (ATT) in two subpopulations -- the correctly classified and misclassified groups. In some cases, the DID estimand may yield the wrong sign and is otherwise attenuated. We provide bounds on the ATT when the researcher has access to information on the extent of misclassification in the data. We demonstrate our theoretical results using simulations and provide two empirical applications to guide researchers in performing sensitivity analysis using our proposed methods.