arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.18569 2026-04-21 stat.ML cs.LG

Revisiting Active Sequential Prediction-Powered Mean Estimation

Maria-Eleni Sfyraki, Jun-Kun Wang

Comments Published as a conference paper at ICLR 2026

Abstract

In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. Furthermore, if the label is not queried, the prediction from a machine learning model is used instead. Prior work proposed an elegant scheme that determines the query probability by combining an uncertainty-based suggestion with a constant probability that encodes a soft constraint on the query probability. We explored different values of the mixing parameter and observed an intriguing empirical pattern: the smallest confidence width tends to occur when the weight on the constant probability is close to one, thereby reducing the influence of the uncertainty-based component. Motivated by this observation, we develop a non-asymptotic analysis of the estimator and establish a data-dependent bound on its confidence interval. Our analysis further suggests that when a no-regret learning approach is used to determine the query probability and control this bound, the query probability converges to the maximum value permitted by the constraint when it is chosen obliviously to the current covariates. We also conduct simulations that corroborate these theoretical findings.
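
As a rough illustration of the setting described above, the following sketch queries each label with some probability and applies an inverse-probability-weighted correction to the model prediction. The mixing form `(1 - tau) * uncertainty + tau * p_const`, the function names, and the parameters are assumptions made for illustration; they are not taken from the paper.

```python
import random


def query_probability(uncertainty, p_const, tau):
    """Mix an uncertainty-based suggestion with a constant probability.

    tau is the weight on the constant probability (assumed mixing form).
    """
    return (1.0 - tau) * uncertainty + tau * p_const


def active_mean_estimate(samples, predict, uncertainty_fn, p_const, tau, rng):
    """Inverse-probability-weighted, prediction-powered mean estimate.

    samples: list of (x, y) pairs; y is only used when the label is queried.
    """
    total = 0.0
    for x, y in samples:
        pi = query_probability(uncertainty_fn(x), p_const, tau)
        queried = rng.random() < pi
        f = predict(x)
        # Model prediction plus a correction that is nonzero only when queried.
        total += f + ((y - f) / pi if queried else 0.0)
    return total / len(samples)
```

With `tau = 1` and `p_const = 1` every label is queried and the estimate reduces to the plain sample mean of the labels, which gives a quick sanity check.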

2604.18547 2026-04-21 stat.ML cs.CL cs.LG

FUSE: Ensembling Verifiers with Zero Labeled Data

Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès

Abstract

Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervised Score Ensembling (FUSE), a method for improving verification quality by ensembling verifiers without access to ground truth correctness labels. The key idea behind FUSE is to control conditional dependencies between verifiers in a manner that improves the unsupervised performance of a class of spectral algorithms from the ensembling literature. Despite requiring zero ground truth labels, FUSE typically matches or improves upon semi-supervised alternatives in test-time scaling experiments with diverse sets of generator models, verifiers, and benchmarks. In particular, we validate our method on both conventional academic benchmarks such as GPQA Diamond and on frontier, unsaturated benchmarks such as Humanity's Last Exam and IMO Shortlist questions.

2604.18523 2026-04-21 cond-mat.dis-nn cs.IT math.IT math.ST stat.TH

BBP transition and the leading eigenvector of the spiked Wigner model with inhomogeneous noise

Leonardo S. Ferreira, Fernando L. Metz

Comments 21 pages, 7 figures

Abstract

The spiked Wigner ensemble is a prototypical model for high-dimensional inference. We study the spectral properties of an inhomogeneous rank-one spiked Wigner model in which the variance of each entry of the noise matrix is itself a random variable. In the high-dimensional limit, we derive exact equations for the spectral edges, the outlier eigenvalue, and the distribution of the components of the outlier eigenvector. These equations determine the BBP transition line that separates the gapped phase, where the signal is detectable, from the gapless phase. In the gapped regime, the distribution of the outlier eigenvector provides a natural estimator of the spike. We solve the equations for a noise matrix whose variances are generated from a truncated power-law distribution. In this case, the BBP transition line is non-monotonic, showing that an inhomogeneous noise can enhance signal detectability.

2604.18441 2026-04-21 math.ST cs.LG stat.ML stat.TH

Conformal Robust Set Estimation

Alejandro Cholaquidis, Emilien Joly, Leonardo Moreno

Abstract

Conformal prediction provides finite-sample, distribution-free coverage under exchangeability, but standard constructions may lack robustness in the presence of outliers or heavy tails. We propose a robust conformal method based on a non-conformity score defined as the half-mass radius around a point, equivalently the distance to its $(\lfloor n/2\rfloor+1)$-nearest neighbour. We show that the resulting conformal regions are marginally valid for any sample size and converge in probability to a robust population central set defined through a distance-to-a-measure functional. Under mild regularity conditions, we establish exponential concentration and tail bounds that quantify the deviation between the empirical conformal region and its population counterpart. These results provide a probabilistic justification for using robust geometric scores in conformal prediction, even for heavy-tailed or multi-modal distributions.
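
The non-conformity score mentioned above can be sketched in a few lines. This is only an illustration of the score itself, assuming Euclidean distance and assuming that the evaluated point is counted as a sample member when it belongs to the sample; the function name is hypothetical, and the conformal calibration step is not shown.

```python
import math


def half_mass_radius(point, sample):
    """Distance from `point` to its (floor(n/2) + 1)-th nearest sample point.

    point: tuple of coordinates; sample: list of such tuples.
    """
    n = len(sample)
    k = n // 2 + 1
    dists = sorted(math.dist(point, s) for s in sample)
    return dists[k - 1]
```

A point far from the bulk of the data gets a large radius even when a few outliers sit nearby, which is the robustness the score is meant to provide.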

2604.18045 2026-04-21 stat.ME

An ensemble-based approach for multi-fidelity emulation and adaptive sampling

Hossein Mohammadi

Abstract

High-resolution simulation models are essential for representing complex physical systems, yet their substantial computational cost severely limits the number of feasible high-fidelity (HF) evaluations. This problem is often addressed through multi-fidelity frameworks, which employ hierarchies of simulators with varying levels of fidelity and evaluation cost. A key difficulty in this setting is integrating information from such heterogeneous sources to accurately approximate HF simulators. This paper proposes a novel multi-fidelity emulation methodology based on ensemble learning. The base learners of the ensemble are hierarchical kriging emulators that systematically incorporate information from lower-fidelity models into HF predictions. Aggregation of these base learners via Bayesian model averaging yields the multi-fidelity emulator with principled uncertainty quantification. The between-model variance component of this uncertainty is then employed as the acquisition criterion in an adaptive design strategy to enrich the training set with informative samples. The predictive performance of the approach is assessed on a collection of well-established benchmark problems. Results show that our multi-fidelity emulator outperforms single-model alternatives in terms of accuracy and robustness. Furthermore, the adaptive design strategy effectively identifies informative samples and improves emulator performance under limited computational budgets.
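
The aggregation and acquisition steps described above can be illustrated with the standard mixture formulas for Bayesian model averaging. This is a minimal sketch under the assumption that each base learner returns a predictive mean and variance and that the model weights are already available; `bma_predict`, `next_design_point`, and `predict_all` are hypothetical names, not the paper's implementation.

```python
def bma_predict(weights, means, variances):
    """Combine base-learner predictions at one input point.

    Returns (mean, within-model variance, between-model variance).
    """
    mean = sum(w * m for w, m in zip(weights, means))
    within = sum(w * v for w, v in zip(weights, variances))
    between = sum(w * (m - mean) ** 2 for w, m in zip(weights, means))
    return mean, within, between


def next_design_point(candidates, predict_all, weights):
    """Pick the candidate with the largest between-model variance.

    predict_all(x) is assumed to return (means, variances) over base learners.
    """
    def disagreement(x):
        means, variances = predict_all(x)
        return bma_predict(weights, means, variances)[2]

    return max(candidates, key=disagreement)
```

The total predictive variance is the sum of the within-model and between-model terms; only the latter is used here as the acquisition criterion, mirroring the abstract.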

2511.02757 2026-04-21 cs.LG math.OC stat.ML

ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models

Lejs Deen Behric, Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil

Abstract

Zeroth-order or derivative-free optimization (MeZO) is an attractive strategy for finetuning large language models (LLMs) because it eliminates the memory overhead of backpropagation. However, it converges slowly due to the inherent curse of dimensionality when searching for descent directions in the high-dimensional parameter space of billion-scale LLMs. We propose ConMeZO, a novel zeroth-order optimizer that accelerates convergence by adaptive directional sampling. Instead of drawing the direction uniformly at random, ConMeZO restricts the sampling to a cone centered around a momentum estimate. This concentrates the search in directions where the true gradient is more likely to lie and thus reduces the effect of high dimensions. We prove that ConMeZO achieves the same worst-case convergence rate as MeZO. Empirically, when finetuning LLMs on natural language tasks, ConMeZO is up to 2X faster than MeZO while retaining the low-memory footprint of zeroth-order methods.
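
To make the idea of cone-restricted sampling concrete, here is a small sketch that draws a unit direction at a fixed angle `theta` from a momentum vector and forms a two-point zeroth-order gradient estimate along it. Using a fixed angle (the cone surface) is a simplifying assumption, and the function names, `theta`, and `eps` are illustrative; this is not the authors' optimizer.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def sample_in_cone(momentum, theta, rng):
    """Unit direction at angle theta from the momentum (dimension >= 2)."""
    m = normalize(momentum)
    g = [rng.gauss(0.0, 1.0) for _ in m]
    dot = sum(a * b for a, b in zip(g, m))
    # Random unit vector orthogonal to the momentum direction.
    u = normalize([a - dot * b for a, b in zip(g, m)])
    return [math.cos(theta) * a + math.sin(theta) * b for a, b in zip(m, u)]


def zo_gradient(loss, params, direction, eps=1e-3):
    """Two-point finite-difference gradient estimate along `direction`."""
    plus = loss([p + eps * d for p, d in zip(params, direction)])
    minus = loss([p - eps * d for p, d in zip(params, direction)])
    scale = (plus - minus) / (2.0 * eps)
    return [scale * d for d in direction]
```

Setting `theta` close to `pi / 2` makes the direction nearly orthogonal to the momentum, while small `theta` concentrates the search around it.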

2510.16750 2026-04-21 math.ST cs.IT math.IT stat.ML stat.TH

On Robust Hypothesis Testing with respect to the Hellinger Distance

Eeshan Modak, Sivaraman Balakrishnan, Ananda Theertha Suresh

Comments 15 pages, 2 figures. Updated authors list. Some changes in notations and exposition. Shorter version to appear in the proceedings of ISIT 2026

Abstract

We study a variant of the simple hypothesis testing problem where observed samples do not necessarily come from either of the specified distributions, but rather from a close variant of them. In this setting, we require a test that is robust to misspecification and identifies which distribution is closer in Hellinger distance. If the underlying distribution is nearly equidistant from both hypotheses, the problem becomes intractable. Our main result is a lower bound on the slack factor, which quantifies how much closer the underlying distribution must be to one hypothesis relative to the other for any test to remain robust. We also demonstrate the implications of this result for testing with respect to symmetric chi-squared distance. Finally, we study an alternative way to specify robustness, where each hypothesis is a Hellinger ball around a fixed distribution. We provide and analyze a test for this composite hypothesis testing problem.
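
For readers unfamiliar with the distance involved, the sketch below computes the Hellinger distance between two discrete distributions and returns the index of the closer hypothesis. It uses the convention H^2(p, q) = 1 - sum_i sqrt(p_i q_i) (other texts include a factor of 1/2 in a different place) and assumes the underlying distribution is known exactly, so it is a plug-in illustration rather than the sample-based test analyzed in the paper.

```python
import math


def hellinger(p, q):
    """Hellinger distance between discrete distributions on the same support."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding


def closer_hypothesis(dist, p0, p1):
    """Return 0 if `dist` is at least as close to p0 as to p1, else 1."""
    return 0 if hellinger(dist, p0) <= hellinger(dist, p1) else 1
```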

2604.18505 2026-04-21 cs.IT math.IT stat.ML

Bayesian experimental design: grouped geometric pooled posterior via ensemble Kalman methods

Huchen Yang, Xinghao Dong, Jinlong Wu

Abstract

Bayesian experimental design (BED) for complex physical systems is often limited by the nested inference required to estimate the expected information gain (EIG) or its gradients. Each outer sample induces a different posterior, creating a large and heterogeneous set of inference targets. Existing methods have to sacrifice either accuracy or efficiency: they either perform per-outer-sample posterior inference, which yields higher fidelity but at prohibitive computational cost, or amortize the inner inference across all outer samples for computational reuse, at the risk of degraded accuracy under posterior heterogeneity. To improve accuracy and maintain cost at the amortized level, we propose a grouped geometric pooled posterior framework that partitions outer samples into groups and constructs a pooled proposal for each group. While such grouping strategy would normally require generating separate proposal samples for different groups, our tailored ensemble Kalman inversion (EKI) formulation generates these samples without extra forward-model evaluation cost. We also introduce a conservative diagnostic to assess importance-sampling quality to guide grouping. This grouping strategy improves within-group proposal-target alignment, yielding more accurate and stable estimators while keeping the cost comparable to amortized approaches. We evaluate the performance of our method on both Gaussian-linear and high-dimensional network-based model discrepancy calibration problems.

2604.18497 2026-04-21 stat.ME

Missingness-Adaptive Factor Identification in High-Dimensional Data

Ping Zeng, Yicheng Zeng, Lixing Zhu

Comments 46 pages, 4 figures, 14 tables

Abstract

Determining the number of factors in high-dimensional factor models remains a fundamental challenge, particularly when data are incomplete. This paper introduces the concept of identifiable factors, those that can be reliably recovered despite missing observations, and proposes the Missingness-Adaptive Thresholding Estimator (MATE). To our knowledge, MATE is the first missingness-adaptive framework for factor number determination that accommodates both homogeneous and heterogeneous missingness without imposing restrictive assumptions on factor strength. Notably, it operates without data imputation, circumventing the computational burden associated with most existing approaches. We establish a rigorous theoretical foundation for MATE, proving its consistency under a range of structural conditions. Extensive simulations and real-world applications demonstrate that MATE consistently outperforms state-of-the-art methods, exhibiting superior robustness in settings with high missingness rates and weak factor signals.

2604.18450 2026-04-21 stat.ML cs.LG math.ST stat.TH

Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario

Florentin Coeurdoux, Grégoire Ferré, Jean-Philippe Bouchaud

Abstract

Empirical studies of trained models often report a transient regime in which signal is detectable in a finite gradient descent time window before overfitting dominates. We provide an analytically tractable random-matrix model that reproduces this phenomenon for gradient flow in a linear teacher-student setting. In this framework, learning occurs when an isolated eigenvalue separates from a noisy bulk, before eventually disappearing in the overfitting regime. The key ingredient is anisotropy in the input covariance, which induces fast and slow directions in the learning dynamics. In a two-block covariance model, we derive the full time-dependent bulk spectrum of the symmetrized weight matrix through a $2\times 2$ Dyson equation, and we obtain an explicit outlier condition for a rank-one teacher via a rank-two determinant formula. This yields a transient Baik-Ben Arous-Péché (BBP) transition: depending on signal strength and covariance anisotropy, the teacher spike may never emerge, emerge and persist, or emerge only during an intermediate time interval before being reabsorbed into the bulk. We map the corresponding phase diagrams and validate the theory against finite-size simulations. Our results provide a minimal solvable mechanism for early stopping as a transient spectral effect driven by anisotropy and noise.

2604.18430 2026-04-21 stat.ME

Shrinkage through multiple identifiability

Carlos García Meixide, David Ríos Insua

Abstract

We propose an empirical Bayes framework for aggregating estimators obtained from several identification functionals associated to the same causal parameter. The central object is a posterior mean that pools a collection of asymptotically linear estimators of a scalar causal target. We establish consistency in two non-nested regimes: exact identifiability, in which every functional identifies the same causal effect; and a second regime, in which individual functionals are biased but the identification biases are mean-zero across functionals, and the number of functionals grows with sample size. The dependence induced by evaluating all estimators on the same sample is handled through a working independence device that preserves consistency of the point estimator. Inference is organized around a latent heterogeneity hyperparameter: when it vanishes, the functionals share a common target and we report frequentist confidence intervals for that target via a sandwich variance or subsampling; when it is strictly positive, each functional targets a genuine draw from a mixing distribution and we construct asymptotically valid Bayesian prediction intervals for the latent target of a new functional. The two inferential outputs rest on distinct assumption sets and are, therefore, complementary rather than exclusive. We illustrate the framework in the context of augmenting randomized controlled trials with observational evidence.

2604.18420 2026-04-21 stat.ML cs.LG

Spectral bandits for smooth graph functions

Michal Valko, Rémi Munos, Branislav Kveton, Tomáš Kocák

Comments Published in International Conference on Machine Learning (ICML 2014)

Abstract

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning problems that involve graphs, such as content-based recommendation. In this problem, each item we can recommend is a node and its expected rating is similar to those of its neighbors. The goal is to recommend items that have high expected ratings. We aim for algorithms whose cumulative regret with respect to the optimal policy does not scale poorly with the number of nodes. In particular, we introduce the notion of an effective dimension, which is small in real-world graphs, and propose two algorithms for solving our problem that scale linearly and sublinearly in this dimension. Our experiments on a real-world content recommendation problem show that a good estimator of user preferences for thousands of items can be learned from just tens of node evaluations.

2604.18402 2026-04-21 stat.ML math.DS

Adaptive Kernel Selection for Kernelized Diffusion Maps

Othmane Aboussaad, Adam Miraoui, Boumediene Hamzi, Houman Owhadi

Abstract

Selecting an appropriate kernel is a central challenge in kernel-based spectral methods. In Kernelized Diffusion Maps (KDM), the kernel determines the accuracy of the RKHS estimator of a diffusion-type operator and hence the quality and stability of the recovered eigenfunctions. We introduce two complementary approaches to adaptive kernel selection for KDM. First, we develop a variational outer loop that learns continuous kernel parameters, including bandwidths and mixture weights, by differentiating through the Cholesky-reduced KDM eigenproblem with an objective combining eigenvalue maximization, subspace orthonormality, and RKHS regularization. Second, we propose an unsupervised cross-validation pipeline that selects kernel families and bandwidths using an eigenvalue-sum criterion together with random Fourier features for scalability. Both methods share a common theoretical foundation: we prove Lipschitz dependence of KDM operators on kernel weights, continuity of spectral projectors under a gap condition, a residual-control theorem certifying proximity to the target eigenspace, and exponential consistency of the cross-validation selector over a finite kernel dictionary.

2604.18388 2026-04-21 stat.ME

Order Dependence in Regression by Composition: Discussion on "Regression by Composition" by Farewell, Daniel, Stensrud, and Huitfeldt

Mei Dong, Linbo Wang, Lin Liu, Oliver Dukes

Abstract

We discuss the regression-by-composition framework of Farewell, Daniel, Stensrud and Huitfeldt, highlighting a key consequence of its sequential construction: order dependence. Reordering the flows may change the implied conditional distribution, the interpretation of model parameters, and the associated estimation problem, with consequences for model specification, interpretation, and inference.

2604.18363 2026-04-21 stat.ME

Effect Sizes in Marketing Research: Why Cohen's Local f^2 Belongs in the Toolkit

Wolfgang Messner

Abstract

In an editorial in the Journal of Marketing, Steenkamp et al. (2026) make a valuable and timely intervention by urging marketing scholars to move beyond dichotomous significance testing and to report effect sizes that speak to substantive significance. Their editorial is especially strong in its insistence on exact p-values, richer statistical reporting, and closer alignment between rigor and relevance. Yet, their framework omits the local form of Cohen's f^2, that is f(B)^2 as an effect-size measure for the contribution of an individual predictor or predictor block B within a multivariable model. That omission matters because much of marketing research relies on regression-type models in which the central theoretical question is not merely whether a model fits globally, but whether a focal construct adds meaningful explanatory power beyond competing predictors and controls. This commentary argues that the R-squared foundation of local Cohen's f(B)^2 is a strength, especially in large-sample settings. Moreover, f-squared-type local effect sizes can be extended beyond ordinary least squares to multilevel models and, more tentatively, to neural networks and other machine-learning models.
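
The local effect size discussed above is commonly computed from two R-squared values: one for the full model including predictor block B, and one for the reduced model without it. The sketch below uses that standard formula; the function name and the input validation are illustrative choices.

```python
def local_f2(r2_full, r2_reduced):
    """Cohen's local f^2 for a predictor block B.

    r2_full: R^2 of the model including B.
    r2_reduced: R^2 of the model without B.
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    # Variance uniquely explained by B, relative to unexplained variance.
    return (r2_full - r2_reduced) / (1.0 - r2_full)
```

For example, if adding B raises R-squared from 0.25 to 0.30, then f(B)^2 = 0.05 / 0.70, roughly 0.07.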

2604.18341 2026-04-21 stat.ME

Statistical inference with win statistics in cluster-randomized trials with composite outcomes

Xi Fang, Guangyu Tong, Yuan Huang, F. Perry Wilson, Patrick J. Heagerty, Fan Li

Abstract

Win statistics have become increasingly popular for analyzing hierarchical composite endpoints in clinical trials, because they summarize treatment benefit through pairwise comparisons that respect the clinical importance order among outcome components. The win ratio, win odds, net benefit, and desirability of outcome ranking (DOOR) are all based on the same underlying pairwise comparison methodology and can complement one another to show the strength of the treatment effect. Despite recent progress on win statistics, statistical inference for win statistics in cluster randomized trials (CRTs) remains underdeveloped. In this paper, we provide a comprehensive survey of testing procedures for the win ratio, win odds, net benefit, and DOOR in parallel-arm CRTs with hierarchical composite outcomes. Then, based on each win statistic, we compare different testing procedures, including Wald tests based on cluster rank sum statistics and bivariate clustered U-statistics, tests that use a cluster jackknife variance, a score permutation test, a permutation-based procedure with analytical variance estimation, and a likelihood ratio test derived from clustered jackknife estimates. Through simulation studies that consider varying scenarios such as different cluster sizes, intracluster correlations, and censoring-induced ties, we characterize the finite-sample type I error and power of each procedure across a range of practical settings with small and large numbers of clusters. We illustrate our methods by reanalyzing the Strategies to Reduce Injuries and Develop Confidence in Elders (STRIDE) pragmatic CRT, and implement all win statistics methods in the WinsCRT R package.
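
The four win statistics named above share one pairwise-comparison step, which the following sketch makes explicit. It computes point estimates only and ignores clustering, censoring, and all of the inferential procedures the paper compares; it assumes each outcome is a tuple ordered by clinical priority with larger values being better. The function names are hypothetical and unrelated to the WinsCRT package.

```python
def compare(a, b):
    """Hierarchical comparison: 1 if a wins, -1 if b wins, 0 if tied."""
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0  # tied on every component


def win_statistics(treated, control):
    """Point estimates from all treated-versus-control pairs."""
    wins = losses = ties = 0
    for t in treated:
        for c in control:
            result = compare(t, c)
            if result > 0:
                wins += 1
            elif result < 0:
                losses += 1
            else:
                ties += 1
    total = wins + losses + ties
    return {
        "win_ratio": wins / losses,  # undefined when there are no losses
        "win_odds": (wins + 0.5 * ties) / (losses + 0.5 * ties),
        "net_benefit": (wins - losses) / total,
        "door": (wins + 0.5 * ties) / total,
    }
```

Ties are ignored by the win ratio but split evenly between arms by the win odds and the DOOR probability, which is why the measures can diverge when ties are frequent.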

2604.18323 2026-04-21 stat.ME

Which Small-Sample Correction Should Be Used When Analyzing Stepped-Wedge Designs with Time-Varying Treatment Effects?

Yongdong Ouyang, Monica Taljaard, James P. Hughes, Fan Li

Comments first draft

Abstract

Stepped-wedge cluster randomized trials (SW-CRTs) evaluate interventions rolled out across clusters over time. Standard analyses typically use immediate-treatment (IT) models, which assume effects begin at crossover and remain constant thereafter. When effects vary with exposure duration, IT models may misrepresent target effects. Exposure-time indicator (ETI) models address this by allowing treatment effects to differ by time since exposure and by targeting the time-averaged treatment effect (TATE) and long-term effect (LTE). Like IT models, ETI models require specification of a random-effects structure, which is often misspecified, and the performance of robust variance estimators (RVEs) in this setting is not well understood. We review RVEs for ETI models and evaluate them in simulation studies with continuous and binary outcomes under correctly specified (binary only) and misspecified random-effects structures. We compare the classic sandwich, Kauermann-Carroll (KC), Mancl-DeRouen (MD), and Morel-Bokossa-Neerchal (MBN) estimators for inference on the TATE and LTE. Our simulations show that under misspecified random-effects structures, model-based standard errors (SE) produced undercoverage, whereas RVEs improved performance. For continuous outcomes, MD with a t-distribution and degrees of freedom equal to the number of clusters minus two gave the most consistent coverage probabilities. For binary outcomes, MBN was the only consistently reliable option. MD, however, could be unstable in one-cluster-per-sequence designs because of data sparsity. Across scenarios, both model-based SE and RVE for LTE were unstable, indicating that greater caution is needed when targeting LTE under ETI models.

2604.18319 2026-04-21 stat.ML cs.LG stat.ME

Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference

Jonas Arruda, Sophie Chervet, Paula Staudt, Andreas Wieser, Michael Hoelscher, Isabelle Sermet-Gaudelus, Nadine Binder, Lulla Opatowski, Jan Hasenauer

Abstract

Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.

2604.18314 2026-04-21 stat.ME

Embarrassingly Causal: Causal Use of Associational Data in Magic The Gathering Drafts

Mark Louie F. Ramos, Ph.D.

Abstract

Observational data are often used to answer causal questions, yet the legitimacy of doing so is often argued to hinge on strong, domain supported assumptions about underlying causal structure with limited guidance on how much domain knowledge support should exist to justify including a causal edge of interest in a directed acyclic graph. We introduce the criterion of embarrassingly causal scenarios, where the existence of an exposure outcome relationship is so uncontroversial that the assumptions needed to include the corresponding causal edge in a DAG can be reasonably made. Using the case of Magic The Gathering booster draft decisions and gameplay outcomes, we show how purely observational data from 17Lands are widely and effectively used to guide draft choices despite substantial confounding, selection effects, and post treatment conditioning. We argue that the embarrassingly causal quality is a sufficient condition for justifying the construction of causal estimands and the collection of observational data to estimate them. Correspondingly, we provide guidance on evaluating observational causal inference assumptions for authors, reviewers, and readers.

2604.18310 2026-04-21 stat.ML cs.LG

Symmetry Guarantees Statistic Recovery in Variational Inference

Daniel Marks, Dario Paccagnan, Mark van der Wilk

Comments 19 pages, 2 figures

Abstract

Variational inference (VI) is a central tool in modern machine learning, used to approximate an intractable target density by optimising over a tractable family of distributions. As the variational family cannot typically represent the target exactly, guarantees on the quality of the resulting approximation are crucial for understanding which of its properties VI can faithfully capture. Recent work has identified instances in which symmetries of the target and the variational family enable the recovery of certain statistics, even under model misspecification. However, these guarantees are inherently problem-specific and offer little insight into the fundamental mechanism by which symmetry forces statistic recovery. In this paper, we overcome this limitation by developing a general theory of symmetry-induced statistic recovery in variational inference. First, we characterise when variational minimisers inherit the symmetries of the target and establish conditions under which these pin down identifiable statistics. Second, we unify existing results by showing that previously known statistic recovery guarantees in location-scale families arise as special cases of our theory. Third, we apply our framework to distributions on the sphere to obtain novel guarantees for directional statistics in von Mises-Fisher families. Together, these results provide a modular blueprint for deriving new recovery guarantees for VI in a broad range of symmetry settings.

2604.18291 2026-04-21 stat.AP

Data (in)equities in data science: Dissecting systemic and systematic biases in pulse oximetry

Lillian Rountree, Harsh Parikh, Bhramar Mukherjee

Abstract

Data equity is an emerging framework for responsible data science. However, its core concepts, including fairness, representativeness, and information bias, remain largely abstract and general, lacking the mathematical specificity needed for practical implementation. In this paper, we demonstrate how statisticians can operationalize data equity by translating its tenets into precise, testable formulations tailored to a given problem. Using the well-documented case of differential measurement error across racial groups in pulse oximetry, we first adopt an oracle approach, tracing how a single upstream violation of information bias compounds through the analytic pipeline into treatment disparities, fairness violations, and adverse health outcomes. We then demonstrate the inverse: starting from an observed outcome disparity, the data equity framework provides a principled structure for systematically identifying its statistical sources. Our exposition reveals that data equity, prediction equity, and decision equity are distinct requirements with distinct evaluation and policy needs, a nuance that highlights both the unique role of statisticians in the era of artificial intelligence and the necessity of interdisciplinary collaboration.

2604.18253 2026-04-21 math.ST math.PR stat.CO stat.TH

Gamma-Based Expansion for the First-Passage Time Distribution of Stochastic Logistic Models with Harvesting

Simone Catanzaro, Elvira Di Nardo

Abstract

The first-passage time (FPT) problem is considered for a stochastic logistic growth model with constant harvesting and multiplicative environmental noise. Explicit expressions for the moments and cumulants of both upcrossing and downcrossing FPTs in the presence of constant thresholds are obtained through a power-series expansion of the Laplace transform. Then a closed-form representation of the FPT density is recovered via an orthogonal Laguerre-Gamma expansion. This representation is used to numerically evaluate FPT densities, with the truncation order controlling the trade-off between accuracy and stability. Numerical experiments based on Monte Carlo simulations confirm the high accuracy of the method in regimes of moderate dispersion and highlight its limitations when higher-order moments grow rapidly. Application to fisheries management models shows that the method remains effective even for large-scale populations. Finally, the approximated density is used satisfactorily to estimate some parameters of the model.

2604.18251 2026-04-21 cs.CV cs.AI cs.LG stat.AP

Style-Based Neural Architectures for Real-Time Weather Classification

Hamed Ouattara, Pascal Houssam Salmane, Pierre Duthon, Frédéric Bernardin, Omar Ait Aider

Comments 9 pages, 21 figures

Journal ref International Conference on Image Analysis and Recognition (ICIAR 2025)
Abstract

In this paper, we present three neural network architectures designed for real-time classification of weather conditions (sunny, rain, snow, fog) from images. These models, inspired by recent advances in style transfer, aim to capture the stylistic elements present in images. One model, called "Multi-PatchGAN", is based on PatchGANs used in well-known architectures such as Pix2Pix and CycleGAN, but here adapted with multiple patch sizes for detection tasks. The second model, "Truncated ResNet50", is a simplified version of ResNet50 retaining only its first nine layers. This truncation, determined by an evolutionary algorithm, facilitates the extraction of high-frequency features essential for capturing subtle stylistic details. Finally, we propose "Truncated ResNet50 with Gram Matrix and Attention", which computes Gram matrices for each layer during training and automatically weights them via an attention mechanism, thus optimizing the extraction of the most relevant stylistic expressions for classification. These last two models outperform the state of the art and demonstrate remarkable generalization capability on several public databases. Although developed for weather detection, these architectures are also suitable for other appearance-based classification tasks, such as animal species recognition, texture classification, disease detection in medical imaging, or industrial defect identification.

2604.18229 2026-04-21 stat.ME

Inference for Functional Data under Markov Constraints

Ulysse Naepels, Victor M. Panaretos

Details
English abstract

Smoothness has long been the dominant form of parsimony in functional data analysis, to the point of occasionally being conflated with the very notion of functional data. However, many core inferential tasks depend on the inverse covariance, where sparsity--rather than smoothness--emerges as the more natural structural constraint. In this paper, we explore Markovianity as an alternative to smoothness. Focusing on the Gaussian case as a central motivating setting, we exploit the fact that Markovianity induces a shape constraint on the covariance kernel. Building on this observation, we introduce a Markov transform of the empirical covariance together with a corresponding estimator that enforces the Markov structure. The estimator is adaptive and requires no regularity of the underlying covariance beyond continuity. In simulation experiments, it is seen to improve prediction performance even under model misspecification. Unlike smoothness-based assumptions, Markovianity is falsifiable. To assess its validity, we further propose a novel and computationally efficient test for the Markov property based on a new characterization of continuous graphical structure.

2604.18152 2026-04-21 stat.ML cs.LG

mlr3torch: A Deep Learning Framework in R based on mlr3 and torch

Sebastian Fischer, Lukas Burk, Carson Zhang, Bernd Bischl, Martin Binder

Details
English abstract

Deep learning (DL) has become a cornerstone of modern machine learning (ML) praxis. We introduce the R package mlr3torch, which is an extensible DL framework for the mlr3 ecosystem. It is built upon the torch package, and simplifies the definition, training, and evaluation of neural networks for both tabular data and generic tensors (e.g., images) for classification and regression. The package implements predefined architectures, and torch models can easily be converted to mlr3 learners. It also allows users to define neural networks as graphs. This representation is based on the graph language defined in mlr3pipelines and allows users to define the entire modeling workflow, including preprocessing, data augmentation, and network architecture, in a single graph. Through its integration into the mlr3 ecosystem, the package allows for convenient resampling, benchmarking, preprocessing, and more. We explain the package's design and features and show how to customize and extend it to new problems. Furthermore, we demonstrate the package's capabilities using three use cases, namely hyperparameter tuning, fine-tuning, and defining architectures for multimodal data. Finally, we present some runtime benchmarks.

2604.18089 2026-04-21 cs.LG stat.ML

Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles

Emanuel Sommer, Rickmer Schulte, Sarah Deubner, Julius Kobialka, David Rügamer

Comments Accepted for presentation at the OPTIMAL Workshop at AISTATS 2026, Tangier, Morocco

Details
English abstract

Bayesian Deep Ensembles (BDEs) represent a powerful approach for uncertainty quantification in deep learning, combining the robustness of Deep Ensembles (DEs) with flexible multi-chain MCMC. While DEs are affordable in most deep learning settings, (long) sampling of Bayesian neural networks can be prohibitively costly. Yet, adding sampling after optimizing the DEs has been shown to yield significant improvements. This leaves a critical practical question: How long should the sequential sampling process continue to yield significant improvements over the initial optimized DE baseline? To tackle this question, we propose a stopping rule based on E-values. We formulate the ensemble construction as a sequential anytime-valid hypothesis test, providing a principled way to decide whether or not to reject the null hypothesis that MCMC offers no improvement over a strong baseline, to early stop the sampling. Empirically, we study this approach for diverse settings. Our results demonstrate the efficacy of our approach and reveal that only a fraction of the full-chain budget is often required.
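To make the idea of an anytime-valid stopping rule concrete, here is a minimal sketch of a generic betting-style e-process. It is not the construction used in the paper: the function name `first_rejection_time`, the fixed bet size `lam`, and the assumption that each per-round improvement lies in [-1, 1] with non-positive conditional mean under the null are all illustrative choices.

```python
def first_rejection_time(improvements, lam=0.5, alpha=0.05):
    """Return the first round at which the e-process reaches 1/alpha, else None.

    Hypothetical setup: improvements[t] in [-1, 1] measures how much the
    sampled ensemble beat the baseline at round t. Under the null ("no
    improvement"), each value has conditional mean <= 0, so the running
    product of (1 + lam * d) is a non-negative supermartingale, and Ville's
    inequality bounds the chance of ever reaching 1/alpha by alpha.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1) so every factor stays positive")
    e_value = 1.0
    for t, d in enumerate(improvements, start=1):
        if not -1.0 <= d <= 1.0:
            raise ValueError("improvements must lie in [-1, 1]")
        e_value *= 1.0 + lam * d  # bet a fraction lam on "improvement"
        if e_value >= 1.0 / alpha:
            return t  # evidence threshold reached; the null is rejected here
    return None  # the null was never rejected within the budget


if __name__ == "__main__":
    # A steady improvement of 0.5 per round accumulates evidence geometrically.
    print(first_rejection_time([0.5] * 20))
    # No improvement at all never triggers a rejection.
    print(first_rejection_time([0.0] * 20))
```

Because the threshold check is valid at every round, the monitoring can stop at any time without inflating the type I error, which is the property that makes e-values attractive for sequential budgets.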

2604.18088 2026-04-21 cs.CV cs.AI stat.AP

Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation

Sascha Emanuel Zell, Toni Schneidereit, Armin Fügenschuh, Michael Breuß

Comments Submitted to "Applied Intelligence"

Details
English abstract

Drowning is an omnipresent risk associated with any activity on or in the water, and rescuing a drowning person is particularly challenging because of the time pressure, making a short response time important. Further complicating water rescue are unsupervised and extensive swimming areas, precise localization of the target, and the transport of rescue personnel. Technical innovations can provide a remedy: We propose an Unmanned Aircraft System (UAS), also known as a drone-in-a-box system, consisting of a fleet of Unmanned Aerial Vehicles (UAVs) allocated to purpose-built hangars near swimming areas. In an emergency, the UAS can be deployed in addition to Standard Rescue Operation (SRO) equipment to locate the distressed person early by performing a fully automated Search and Rescue (S&R) operation and dropping a flotation device. In this paper, we address automatically locating distressed swimmers using the image-based object detection architecture You Only Look Once (YOLO). We present a dataset created for this application and outline the training process. We evaluate the performance of YOLO versions 3, 5, and 8 and architecture sizes (nano, extra-large) using Mean Average Precision (mAP) metrics mAP@.5 and mAP@.5:.95. Furthermore, we present two Discrete-Event Simulation (DES) approaches to simulate response times of SRO and UAS-based water rescue. This enables estimation of time savings relative to SRO when selecting the UAS configuration (type, number, and location of UAVs and hangars). Computational experiments for a test area in the Lusatian Lake District, Germany, show that UAS assistance shortens response time. Even a small UAS with two hangars, each containing one UAV, reduces response time by a factor of five compared to SRO.

2604.18057 2026-04-21 stat.ME stat.AP stat.CO

Efficient Bayesian inference for non-linear association structures in joint models: A hierarchical approach via INLA

Denis Rustand, Håvard Rue, Lisa Le Gall, Karen Leffondre

Details
English abstract

Joint models for longitudinal and time-to-event data are increasingly used in health research to characterize the association between biomarker trajectories and the risk of clinical events. However, these models usually assume a linear relationship between the longitudinal marker and the log-hazard of the event. This assumption is rarely verified and often fails to capture complex biological mechanisms, such as U-shaped risk profiles or plateau effects. In this paper, we propose a fast and stable hierarchical framework for non-linear association structures in joint models using Integrated Nested Laplace Approximations (INLA), implemented in the INLAjoint R package. Our approach builds upon a unified framework where the scaling effect of the marker is decomposed into a parametric baseline (constant and linear components) and a data-driven smooth deviation modeled via an orthogonal basis derived from a second-order random walk. This natural hierarchy allows researchers to adapt model flexibility directly and verify the linearity assumption using standard information criteria. Through simulation studies, we demonstrate that the proposed method accurately recovers complex non-linear trajectories. We illustrate the practical utility of our framework by analyzing the joint association of the current value and current slope of body mass index (BMI) with all-cause mortality in the Health and Retirement Study. This analysis reveals a U-shaped mortality risk for the BMI value, and a non-linear effect for the rate of weight change, where a declining weight trajectory is associated with higher mortality risk.

2604.18042 2026-04-21 stat.ME math.ST stat.TH

A Bayesian framework with adaptive elastic nets for the inference of Gaussian graphical models

Roland B. Sogan, Tabea Rebafka, Fanny Villers

Details
English abstract

Estimating conditional independence graphs from high-dimensional Gaussian data is challenging because methods must detect relevant edges while rigorously controlling statistical errors. We propose a Bayesian framework based on a prior that accounts for degree heterogeneity, edge sparsity, and graph topology. The resulting posterior distribution is incorporated into a multiple testing procedure for graph inference with false discovery rate control. Computation is carried out through a combination of adaptive elastic nets and a variational expectation--maximization algorithm. In simulations, the method achieves reliable false discovery rate control while maintaining strong power, especially in heterogeneous networks such as graphs with hubs, and remains competitive under structural misspecification. Applications to breast cancer gene expression data and financial return networks show that the method yields sparse and interpretable conditional dependence graphs while retaining the most stable interactions detected by competing approaches.

2604.18022 2026-04-21 q-bio.BM cond-mat.stat-mech cs.LG stat.ML

Boltzmann Machine Learning with a Parallel, Persistent Markov chain Monte Carlo method for Estimating Evolutionary Fields and Couplings from a Protein Multiple Sequence Alignment

Sanzo Miyazawa

Comments A manuscript of 11 pages including 3 figures and 3 tables, and a supplementary material of 9 pages including 8 figures. The program and multiple sequence alignments employed here are available from https://gitlab.com/sanzo.miyazawa/BM/ and https://github.com/Sanzo-Miyazawa/BM/

Details
English abstract

The inverse Potts problem, which estimates evolutionary single-site fields and pairwise couplings in homologous protein sequences from the single-site and pairwise amino acid frequencies observed in their multiple sequence alignment, remains one of the useful methods in studies of protein structure and evolution. Since the reproducibility of fields and couplings is the most important consideration, the Boltzmann machine method is employed here, although it is computationally intensive. To reduce the computational time required for the Boltzmann machine, a parallel, persistent Markov chain Monte Carlo method is employed to estimate the single-site and pairwise marginal distributions in each learning step. Stochastic gradient descent methods are also used to reduce the computational time of learning. Another problem is how to adjust the values of the hyperparameters; there are two regularization parameters, one for evolutionary fields and one for couplings. The precision of contact residue pair prediction is often used to adjust the hyperparameters. However, it is not sensitive to these regularization parameters. Here, they are adjusted so that the fields and couplings satisfy a specific condition that is appropriate for protein conformations. This method has been applied to eight protein families.

2604.17984 2026-04-21 cs.LG stat.ML

Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization

Junyoung Yang, Kyungmin Kim, Sangdon Park

Details
English abstract

Uncertainty quantification is crucial in safety-critical systems, where decisions must be made under uncertainty. In particular, we consider the problem of online uncertainty quantification, where data points arrive sequentially. Online conformal prediction is a principled online uncertainty quantification method that dynamically constructs a prediction set at each time step. While existing methods for online conformal prediction provide long-run coverage guarantees without any distributional assumptions, they typically assume a full feedback setting in which the true label is always observed. In this paper, we propose a novel learning method for online conformal prediction with partial feedback from an adaptive adversary, a more challenging setup in which the true label is revealed only when it lies inside the constructed prediction set. Specifically, we formulate online conformal prediction as an adversarial bandit problem by treating each candidate prediction set as an arm. Building on an existing algorithm for adversarial bandits, our method achieves a long-run coverage guarantee by explicitly establishing its connection to the regret of the learner. Finally, we empirically demonstrate the effectiveness of our method in both independent and identically distributed (i.i.d.) and non-i.i.d. settings, showing that it successfully controls the miscoverage rate while maintaining a reasonable size of the prediction set.

2604.17956 2026-04-21 cs.LG stat.ME

Federated Rule Ensemble Method in Medical Data

Ke Wan, Kensuke Tanioka, Toshio Shimokawa

Details
English abstract

Machine learning has become integral to medical research and is increasingly applied in clinical settings to support diagnosis and decision-making; however, its effectiveness depends on access to large, diverse datasets, which are limited within single institutions. Although integrating data across institutions can address this limitation, privacy regulations and data ownership constraints hinder these efforts. Federated learning enables collaborative model training without sharing raw data; however, most methods rely on complex architectures that lack interpretability, limiting clinical applicability. Therefore, we propose a federated RuleFit framework to construct a unified and interpretable global model for distributed environments. It integrates three components: preprocessing based on differentially private histograms to estimate shared cutoff values, enabling consistent rule definitions and reducing heterogeneity across clients; local rule generation using gradient boosting decision trees with shared cutoffs; and coefficient estimation via $\ell_1$-regularized optimization using a Federated Dual Averaging algorithm for sparse and consistent variable selection. In simulation studies, the proposed method achieved performance comparable to that of centralized RuleFit while outperforming existing federated approaches. Real-world analysis demonstrated its ability to provide interpretable insights with competitive predictive accuracy. The proposed framework thus offers a practical and effective solution for interpretable and reliable modeling in federated learning environments.

2604.17952 2026-04-21 econ.EM cs.SI stat.AP

Causal inference for social network formation

Maximilian Kasy, Elizabeth Linos, Sanaz Mobasseri

Details
English abstract

This paper develops a framework for identification, estimation, and inference on the causal mechanisms driving endogenous social network formation. Identification is challenging because of unobserved confounders and reverse causality; inference is complicated by questions of equilibrium and sampling. We leverage repeated observations of a network over time and random variation in initial ties to address challenges to causal identification. Our design-based approach sidesteps questions of sampling and asymptotics by treating both the set of nodes (individuals) and potential outcomes as non-random. We apply our approach to data from a large professional services firm, where new hires are randomly assigned to project teams within offices. We estimate the causal effect on tie formation of indirect ties, network degree, and local network density. Indirect ties have a strong and significant positive effect on tie formation, while the effects of degree and density are smaller and less robust.

2601.08184 2026-04-21 math.PR cs.LG stat.ML

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

Yixuan Zhang, Qiaomin Xie

Comments ACM SIGMETRICS 2026. 73 pages

Details
English abstract

Non-asymptotic central limit theorem (CLT) rates play a central role in modern machine learning and operations research. In this paper, we study CLT rates for multivariate dependent data in Wasserstein-$p$ ($W_p$) distance, for general $p\ge 1$. We focus on two fundamental dependence structures that commonly arise in practice: locally dependent sequences and geometrically ergodic Markov chains. In both settings, we establish the first optimal $\mathcal O(n^{-1/2})$ rate in $W_1$, as well as the first $W_p$ ($p\ge 2$) CLT rates under mild moment assumptions, substantially improving the best previously known bounds in these dependent-data regimes. As an application of our optimal $W_1$ rate for locally dependent sequences, we further obtain the first optimal $W_1$-CLT rate for multivariate $U$-statistics. On the technical side, we derive a tractable auxiliary bound for $W_1$ Gaussian approximation errors that is well suited for studying dependent data. For Markov chains, we further prove that the regeneration time of the split chain associated with a geometrically ergodic chain has a geometric tail without assuming strong aperiodicity or other restrictive conditions. These tools may be of independent interest; they enable our optimal $W_1$ rates and underpin our $W_p$ ($p\ge 2$) results.

2512.23405 2026-04-21 cs.LG stat.ML

On the Sample Complexity of Learning for Blind Inverse Problems

Nathan Buskulic, Luca Calatroni, Lorenzo Rosasco, Silvia Villa

Details
English abstract

Blind inverse problems arise in many experimental settings where both the signal of interest and the forward operator are (partially) unknown. In this context, methods developed for the non-blind case cannot be adapted in a straightforward manner due to identifiability issues and symmetric solutions inherent to the blind setting. Recently, data-driven approaches have been proposed to address such problems, demonstrating strong empirical performance and adaptability. However, these methods often lack interpretability and are not supported by theoretical guarantees, limiting their reliability in domains such as applied imaging, where a blind approach often relates to a calibration of the acquisition device. In this work, we shed light on learning in blind inverse problems within the framework of Linear Minimum Mean Square Estimators (LMMSEs). We provide a theoretical analysis, deriving closed-form expressions for optimal estimators and extending classical recovery results to the blind setting. In particular, we establish equivalences with tailored Tikhonov-regularized formulations, where the regularization structure depends explicitly on the distributions of the unknown signal, of the noise, and of the random forward operator. Under a source condition assumption, we also show how the reconstruction error converges as the noise and the randomness of the operator diminish. Furthermore, we derive finite-sample error bounds that characterize the performance of the learned estimators as a function of the noise level, problem conditioning, and number of available samples. These bounds explicitly quantify the impact of operator randomness and show how the associated convergence rates depend on it. Finally, we validate our theoretical findings through illustrative numerical experiments that confirm the predicted convergence behavior.

2512.02744 2026-04-21 stat.ME econ.EM stat.AP

Implicit score-driven filters for time-varying parameter models

Rutger-Jan Lange, Bram van Os, Dick van Dijk

Comments 73 pages

Details
English abstract

We propose an observation-driven modeling framework that allows model parameters to vary over time through an implicit score-driven (ISD) update. The ISD update maximizes the logarithmic observation density with respect to the parameter vector while penalizing the weighted L2 norm relative to a one-step-ahead predicted parameter. This yields an implicit stochastic-gradient update. We show that the popular class of explicit score-driven (ESD) models arises when the observation log density is linearly approximated around the prediction. By preserving the full density, the ISD update extends the favorable local properties of the ESD update to a global setting. For log-concave observation densities, whether correctly specified or not, the ISD filter is stable for all learning rates, and its updates are contractive in mean squared error toward the (pseudo-)true parameter at every time step. We demonstrate the usefulness of ISD filters in simulations and empirical applications in finance and macroeconomics.
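The contrast between the implicit and explicit updates described above can be worked through in a toy case. The sketch below assumes a scalar Gaussian location model with known variance and a scalar penalty weight; the function names `isd_update` and `esd_update`, the Gaussian choice, and the parameter `penalty` (the inverse of the learning rate) are illustrative assumptions, not the paper's notation.

```python
def isd_update(y, pred, sigma2, penalty):
    """Implicit update for y ~ N(theta, sigma2).

    Solves argmax_theta  log N(y | theta, sigma2) - 0.5 * penalty * (theta - pred)**2.
    The first-order condition (y - theta) / sigma2 = penalty * (theta - pred)
    has the closed-form solution below: a weighted average of y and pred.
    """
    return (y / sigma2 + penalty * pred) / (1.0 / sigma2 + penalty)


def esd_update(y, pred, sigma2, penalty):
    """Explicit update: same objective, but with the log density linearized at pred.

    The score at pred is (y - pred) / sigma2, so maximizing
    score * (theta - pred) - 0.5 * penalty * (theta - pred)**2
    gives a plain gradient step with learning rate 1 / penalty.
    """
    return pred + (y - pred) / (sigma2 * penalty)


if __name__ == "__main__":
    y, pred, sigma2 = 1.0, 0.0, 1.0
    for penalty in (4.0, 1.0, 0.25):  # learning rates 0.25, 1, 4
        print(penalty, isd_update(y, pred, sigma2, penalty),
              esd_update(y, pred, sigma2, penalty))
```

In this toy model the implicit update always lands between the prediction and the observation, whatever the learning rate, whereas the explicit step overshoots the observation once the learning rate exceeds the variance. This is only a one-dimensional illustration of the stability property the abstract states for log-concave densities.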

2511.16172 2026-04-21 econ.EM stat.ME

Confidence Sets for the Emergence, Collapse, and Recovery Dates of a Bubble

Eiji Kurozumi, Anton Skrobotov

Details
English abstract

We propose constructing confidence sets for the emergence, collapse, and recovery dates of a bubble separately by inverting tests for the location of the break date. We examine both likelihood ratio-type tests and the Elliott-Muller-type (2007) tests for detecting break locations. The limiting distributions of these tests are derived under the null hypothesis, and their asymptotic consistency under the alternative is established. Finite-sample properties are evaluated through Monte Carlo simulations. The results indicate that combining different types of tests effectively controls the empirical coverage rate while maintaining a reasonably small length of the confidence set.

2511.09249 2026-04-21 econ.EM math.ST stat.TH

Robust Cauchy-Based Methods for Predictive Regressions

Rustam Ibragimov, Jihyun Kim, Anton Skrobotov

Details
English abstract

This paper develops robust inference methods for predictive regressions that address key challenges posed by endogenously persistent or heavy-tailed regressors, as well as persistent volatility in errors. Building on the Cauchy estimation framework, we propose two novel tests: one based on $t$-statistic group inference and the other employing a hybrid approach that combines Cauchy and OLS estimation. These methods effectively mitigate size distortions that commonly arise in standard inference procedures under endogeneity, near nonstationarity, heavy tails, and persistent volatility. The proposed tests are simple to implement and applicable to both continuous- and discrete-time models. Extensive simulation experiments demonstrate favorable finite-sample performance across a range of realistic settings. An empirical application examines the predictability of excess stock returns using the dividend-price and earnings-price ratios as predictors. The results suggest that the dividend-price ratio possesses predictive power, whereas the earnings-price ratio does not significantly forecast returns.

2509.18964 2026-04-21 cs.LG math.OC stat.ML

Central Limit Theorems for Asynchronous Averaged Q-Learning

Xingtu Liu

Details
English abstract

This paper establishes central limit theorems for Polyak-Ruppert averaged Q-learning under asynchronous updates. We prove a non-asymptotic central limit theorem, where the convergence rate in Wasserstein distance explicitly reflects the dependence on the number of iterations, state-action space size, the discount factor, and the quality of exploration. In addition, we derive a functional central limit theorem, showing that the partial-sum process converges weakly to a Brownian motion.

2507.20993 2026-04-21 cs.LG cs.AI stat.ML

Annotation-Assisted Learning of Treatment Policies From Multimodal Electronic Health Records

Henri Arno, Thomas Demeester

Comments Preprint. Under review

Details
English abstract

We study how to learn treatment policies from multimodal electronic health records (EHRs) that consist of tabular data and clinical text. These policies can help physicians make better treatment decisions and allocate healthcare resources more efficiently. Causal policy learning methods prioritize patients with the largest expected treatment benefit. Yet, existing estimators are designed for tabular covariates under causal assumptions that may be hard to justify in the multimodal setting. A pragmatic alternative is to apply causal estimators directly to multimodal representations, but this can produce biased treatment effect estimates when the representations do not preserve the relevant confounding information. As a result, predictive models of baseline risk are commonly used in practice to guide treatment decisions, although they are not designed to identify which patients benefit most from treatment. We propose AACE (Annotation-Assisted Coarsened Effects), an annotation-assisted approach to causal policy learning for multimodal EHRs. The method uses expert-provided annotations during training to support confounding adjustment, and then predicts treatment benefit from only multimodal representations at inference. We show that the proposed method achieves strong empirical performance across synthetic, semi-synthetic, and real-world EHR datasets, outperforming risk-based and representation-based causal baselines, and offering practical insights for applying causal machine learning in clinical practice.

2506.12771 2026-04-21 stat.ME

Machine-Learning-Powered Specification Testing in Linear Instrumental Variable Models

Cyrill Scheidegger, Malte Londschien, Peter Bühlmann

Details
English abstract

The linear instrumental variable (IV) model is widely used in observational studies, yet its validity hinges on strong assumptions. Classical specification tests such as the Sargan-Hansen J test are limited to overidentified settings and are therefore not applicable in the common just-identified case, where the number of instruments is equal to the number of endogenous variables. We propose a novel test for the well-specification of the linear IV model under the assumption that the structural error is mean independent of the instruments. This assumption enables specification testing even in the just-identified setting. Our approach uses the idea of residual prediction: if the two-stage least squares residuals can be predicted from the instruments better than chance, this indicates misspecification. The resulting test employs sample splitting and a user-chosen machine learning method, and we show asymptotic type I error control and consistency against a broad class of alternatives. We further show how the proposed testing principle can be adapted to settings with weak or many instruments via an Anderson-Rubin-type inversion, thereby substantially extending the applicability. The tests accommodate heteroskedasticity- and cluster-robust inference and are implemented in the R package RPIV and the ivmodels software package for Python.

2505.21722 2026-04-21 cs.LG cs.AI stat.ML

Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape

Ioannis Bantzis, James B. Simon, Arthur Jacot

Comments Accepted at ICLR 2026. Camera-ready version

Details
English abstract

When a deep ReLU network is initialized with small weights, gradient descent (GD) is at first dominated by the saddle at the origin in parameter space. We study the so-called escape directions along which GD leaves the origin, which play a similar role as the eigenvectors of the Hessian for strict saddles. We show that the optimal escape direction features a low-rank bias in its deeper layers: the first singular value of the $\ell$-th layer weight matrix is at least $\ell^{\frac{1}{4}}$ larger than any other singular value. We also prove a number of related results about these escape directions. We suggest that deep ReLU networks exhibit saddle-to-saddle dynamics, with GD visiting a sequence of saddles with increasing bottleneck rank (Jacot, 2023).

2505.20168 2026-04-21 stat.ME

Causal Meta-Analysis: Rethinking the Foundations of Evidence-Based Medicine

Clément Berenfeld, Ahmed Boughdiri, Bénédicte Colnet, Wouter A. C. van Amsterdam, Aurélien Bellet, Rémi Khellaf, Erwan Scornet, Julie Josse

Comments 26 pages, 4 figures, 2 tables. v2: Adding Sec 4.3 and correcting variance equations in Sec 3. v3: Refactoring of the manuscript and additional results (Thm 1 and 2). v4: additional appendix to Sec 3

Details
English abstract

Meta-analysis, by synthesizing effect estimates from multiple studies conducted in diverse settings, stands at the top of the evidence hierarchy in clinical research. Yet, conventional approaches based on fixed- or random-effects models lack a causal framework, which may limit their interpretability and utility for public policy. Incorporating causal inference reframes meta-analysis as the estimation of well-defined causal effects on clearly specified populations, enabling a principled approach to handling study heterogeneity. We show that classical meta-analysis estimators have a clear causal interpretation when effects are measured as risk differences. However, this breaks down for nonlinear measures like the risk ratio and odds ratio. To address this, we introduce novel causal aggregation formulas that remain compatible with standard meta-analysis practices and do not require access to individual-level data. To evaluate real-world impact, we apply both classical and causal meta-analysis methods to 500 published meta-analyses. While the conclusions often align, notable discrepancies emerge, revealing cases where conventional methods may suggest a treatment is beneficial when, under a causal lens, it is in fact harmful.

2505.15342 2026-04-21 stat.ML cs.LG math.ST stat.TH

Policy Testing in Markov Decision Processes

Kaito Ariu, Po-An Wang, Alexandre Proutiere, Kenshi Abe

Details
English abstract

We study the policy testing problem in discounted Markov decision processes (MDPs) in the fixed-confidence setting under a generative model with static sampling. The goal is to decide whether the value of a given policy exceeds a specified threshold while minimizing the number of samples. We first derive an instance-dependent lower bound that any reasonable algorithm must satisfy, characterized as the solution to an optimization problem with non-convex constraints. Guided by this formulation, we propose a new algorithm. While this design paradigm is common in pure exploration problems such as best-arm identification, the non-convex constraints that arise in MDPs introduce substantial difficulties. To address them, we reformulate the lower-bound problem by swapping the roles of the objective and the constraints, yielding an alternative problem with a non-convex objective but convex constraints. This reformulation admits an interpretation as a policy optimization task in a newly constructed reversed MDP. We further show that the global KL constraint can be decomposed exactly into a family of product-box subproblems, which are solved by projected policy gradient and combined through an outer budget search. Beyond policy testing, our reformulation and reversed MDP view suggest extensions to other pure exploration tasks in MDPs, including policy evaluation and best policy identification.

2503.21421 2026-04-21 math.OC math.PR math.ST stat.TH

Robust Mean Estimation for Optimization: The Impact of Heavy Tails

Bart P. G. van Parys, Bert Zwart

Details
English abstract

We consider the problem of constructing a least conservative estimator of the expected value $\mu$ of a non-negative heavy-tailed random variable. We require that the probability of overestimating the expected value $\mu$ is kept appropriately small; a natural requirement if its subsequent use in a decision process is anticipated. In this setting, we show it is optimal to estimate $\mu$ by solving a distributionally robust optimization (DRO) problem using the Kullback-Leibler (KL) divergence. We further show that the statistical properties of KL-DRO compare favorably with other estimators based on truncation, variance regularization, or Wasserstein DRO.

2503.09299 2026-04-21 math.ST cs.GT stat.TH

Low-Rank Graphon Estimation: Theory and Applications to Graphon Games

Olga Klopp, Fedor Noskov

Details
English abstract

We study low-rank estimation of an unknown sparse graphon from sampled network data under operator-norm loss, motivated by targeted interventions in graphon games. Starting from the observed adjacency matrix, we construct low-rank surrogates by singular value thresholding and, for smooth graphons, by block averaging followed by thresholding. We obtain non-asymptotic bounds on both the operator-norm error and the rank of the resulting estimator for stochastic block model, Hölder, and analytic graphons, and we complement these results with minimax lower bounds showing that the rates are essentially sharp for these classes. Our analysis highlights that low rank is valuable here primarily for computation: while it does not improve the minimax operator-norm rate, it yields operator-norm accurate surrogates with substantially smaller rank. We then apply these estimators to linear-quadratic graphon games and derive non-asymptotic stability bounds showing that the welfare loss incurred by using an estimated graphon is controlled by the operator-norm perturbation. This yields near-optimal guarantees for targeted interventions computed from the estimated graphon, together with substantial computational savings. For zero baseline heterogeneity and under a spectral-gap condition, we also establish matching lower bounds for intervention regret. Numerical experiments illustrate the trade-off between statistical accuracy, retained rank, and runtime.
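The singular value thresholding step mentioned in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the function name `singular_value_threshold`, the two-block stochastic block model used as input, and the threshold rule `2 * sqrt(n * p_max)` are hypothetical choices, not the tuning analyzed in the paper.

```python
import numpy as np


def singular_value_threshold(adj, threshold):
    """Keep only the singular components of adj whose singular value exceeds threshold.

    Returns the low-rank surrogate and its rank.
    """
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    keep = s > threshold
    low_rank = (u[:, keep] * s[keep]) @ vt[keep, :]
    return low_rank, int(keep.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    labels = np.repeat([0, 1], n // 2)
    # Hypothetical two-block connection probabilities.
    block_probs = np.array([[0.30, 0.05], [0.05, 0.30]])
    edge_probs = block_probs[labels][:, labels]

    # Sample a symmetric adjacency matrix without self-loops.
    upper = np.triu(rng.random((n, n)) < edge_probs, k=1)
    adj = (upper | upper.T).astype(float)

    # Illustrative threshold on the order of the noise level (an assumption).
    threshold = 2.0 * np.sqrt(n * block_probs.max())
    surrogate, rank = singular_value_threshold(adj, threshold)

    # Operator-norm distance between the surrogate and the true probability matrix.
    error = np.linalg.norm(surrogate - edge_probs, ord=2)
    print("retained rank:", rank, "operator-norm error:", error)
```

The retained rank is what drives the computational savings discussed in the abstract: downstream computations can work with the few retained singular triplets instead of the full n-by-n matrix.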

2502.13570 2026-04-21 stat.ML cs.LG math.ST stat.ME stat.TH

A Scalable Nystrom-Based Kernel Two-Sample Test with Permutations

Antoine Chatalic, Marco Letizia, Nicolas Schreuder, Lorenzo Rosasco

Details
English abstract

Two-sample hypothesis testing, that is, determining whether two sets of data are drawn from the same distribution, is a fundamental problem in statistics and machine learning with broad scientific applications. In the context of nonparametric testing, maximum mean discrepancy (MMD) has gained popularity as a test statistic due to its flexibility and strong theoretical foundations. However, its use in large-scale scenarios is plagued by high computational costs. In this work, we use a Nyström approximation of the MMD to design a computationally efficient and practical testing algorithm while preserving statistical guarantees. Our main result is a finite-sample bound on the power of the proposed test for distributions that are sufficiently separated with respect to the MMD. The derived separation rate matches the known minimax optimal rate in this setting. We support our findings with a series of numerical experiments, emphasizing applicability to realistic scientific data.
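
As a rough illustration of the kind of test this abstract builds on, the sketch below runs a plain quadratic-time MMD permutation test on one-dimensional data. It is not the Nyström-accelerated procedure of the paper: the Gaussian kernel, the fixed bandwidth, the biased (V-statistic) estimator, and every function name here are assumptions chosen for brevity.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(K, idx_x, idx_y):
    """Biased estimate of squared MMD from a precomputed kernel matrix K."""
    def mean_block(rows, cols):
        return sum(K[i][j] for i in rows for j in cols) / (len(rows) * len(cols))

    return (mean_block(idx_x, idx_x)
            + mean_block(idx_y, idx_y)
            - 2.0 * mean_block(idx_x, idx_y))


def mmd_permutation_test(x, y, bandwidth=1.0, n_permutations=200, seed=0):
    """Return (observed statistic, permutation p-value) for H0: same distribution."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n = len(x)
    # The kernel matrix is computed once; permutations only reshuffle indices.
    K = [[gaussian_kernel(a, b, bandwidth) for b in pooled] for a in pooled]
    indices = list(range(len(pooled)))
    observed = mmd2_biased(K, indices[:n], indices[n:])
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(indices)
        if mmd2_biased(K, indices[:n], indices[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (1 + exceed) / (1 + n_permutations)
    return observed, p_value


# Hypothetical usage: two Gaussian samples whose means differ by 2.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
y_shifted = [data_rng.gauss(2.0, 1.0) for _ in range(40)]
stat, p = mmd_permutation_test(x, y_shifted)
```

Each permutation here costs time quadratic in the pooled sample size, which is the computational burden the abstract refers to.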

2502.08531 2026-04-21 cs.LG stat.ML

On Different Notions of Redundancy in Conditional-Independence-Based Discovery of Graphical Models

Philipp M. Faller, Dominik Janzing

Comments AISTATS 2026. Previous versions contained incorrect claims about partial correlations and the necessity of the condition in proposition 2

Details
English abstract

Conditional-independence-based discovery uses statistical tests to identify a graphical model that represents the independence structure of variables in a dataset. These tests, however, can be unreliable, and algorithms are sensitive to errors and violated assumptions. Often, there are tests that were not used in the construction of the graph. In this work, we show that these redundant tests have the potential to detect or sometimes correct errors in the learned model. But we further show that not all tests contain this additional information and that such redundant tests have to be applied with care. Precisely, we argue that the conditional (in)dependence statements that hold for every probability distribution are unlikely to detect and correct errors, in contrast to those that follow only from graphical assumptions.

2501.16315 2026-04-21 math.CA math.ST stat.TH

A varifold-type estimation for data sampled on a rectifiable set

Charly Boricaud, Blanche Buet

Details
English abstract

We investigate the inference of varifold structures in a statistical framework: assuming that we have access to i.i.d. samples in $\mathbb{R}^n$ obtained from an underlying $d$-dimensional shape $S$ endowed with a possibly non-uniform density $θ$, we propose and analyse an estimator of the varifold structure associated to $S$. The shape $S$ is assumed to be piecewise $C^{1,a}$ in a sense that allows for a singular set whose small enlargements are of small $d$-dimensional measure. The estimators are kernel-based, both for inferring the density and for inferring the tangent spaces, and the convergence result holds for the bounded Lipschitz distance between varifolds, in expectation and in a noiseless model. The mean convergence rate involves the dimension $d$ of $S$, its regularity through $a \in (0, 1]$, and the regularity of the density $θ$.

2409.15965 2026-04-21 math.ST math.OC stat.TH

A Christoffel-like function for high-dimensional support inference in graphical models

Jean-Bernard Lasserre, Lucas Slot

Comments Implemented referee comments. Added a missing assumption to the statement of Corollary 4.11

Details
English abstract

Christoffel polynomials are classical tools from approximation theory. They can be used to estimate the (compact) support of a measure $μ$ on $\mathbb{R}^d$ based on its low-degree moments. Recently, they have been applied to problems in data science, including outlier detection and support inference. A major downside of Christoffel polynomials in such applications is the fact that, in order to compute their coefficients, one must invert a moment matrix whose size grows rapidly with the dimension $d$. In this paper, we propose a modification of the Christoffel polynomial which is significantly cheaper to compute, but retains many of its desirable properties. In particular, it (1) exhibits a so-called support dichotomy and (2) is a rational function whose numerator and denominator factor into 'lower-dimensional' Christoffel polynomials whose coefficients can be computed by inverting potentially much smaller moment matrices. Our approach relies on sparsity of the underlying measure $μ$, described by a graphical model. The complexity of our modification depends on the treewidth of this model.

2407.03725 2026-04-21 econ.EM stat.ME

Is Inference Conditional on Not Rejecting a Pre-test Less Reliable than Unconditional Inference?

Clément de Chaisemartin, Xavier D'Haultfœuille

Comments 42 pages. Many changes compared to v2. In particular, we have added conditions for exact inference and results under local alternatives

Details
English abstract

Assume that an estimator is asymptotically normal for a target parameter under some conditions. Suppose also that one can test these conditions, and one conducts inference for the target only if the pre-test is not rejected. Does such pre-testing undermine inference? We show that if the tested conditions and mild regularity restrictions hold, conditional inference is still valid, albeit typically conservative. Validity holds regardless of the asymptotic dependence between the estimator and the pre-test. If the tested conditions do not hold, we exhibit conditions under which confidence intervals have larger conditional than unconditional coverage.

2404.05486 2026-04-21 math.ST cs.IT math.IT stat.TH

Quickest Change Detection for Multiple Data Streams Using the James-Stein Estimator

Topi Halme, Venugopal V. Veeravalli, Visa Koivunen

Details
Journal ref
IEEE Transactions on Information Theory (Volume: 71, Issue: 10, October 2025)
English abstract

The problem of quickest change detection is studied in the context of detecting an arbitrary unknown mean-shift in multiple independent Gaussian data streams. The James-Stein estimator is used in constructing detection schemes that exhibit strong detection performance both asymptotically and non-asymptotically. Our results indicate that utilizing the James-Stein estimator in the recently developed window-limited CuSum test constitutes a uniform improvement over its typical maximum likelihood variant. That is, the proposed James-Stein version achieves a smaller detection delay simultaneously for all possible post-change parameter values and every false alarm rate constraint, as long as the number of parallel data streams is greater than three. Additionally, an alternative detection procedure that utilizes the James-Stein estimator is shown to have asymptotic detection delay properties that compare favorably to existing tests. The second-order asymptotic detection delay term is reduced in a predefined low-dimensional subspace of the parameter space, while second-order asymptotic minimaxity is preserved. The results are verified in simulations, where the proposed schemes are shown to achieve smaller detection delays compared to existing alternatives, especially when the number of data streams is large.
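
For readers unfamiliar with the estimator named in this abstract, the following is a minimal sketch of the classical James-Stein shrinkage rule for a single Gaussian observation vector with known noise variance. It is not the detection scheme of the paper: how the estimator is embedded in the window-limited CuSum test is not described here, and the function name and arguments are hypothetical.

```python
def james_stein(observations, noise_variance=1.0):
    """Classical James-Stein estimate of a Gaussian mean vector.

    Shrinks the observation vector towards the origin by the factor
    1 - (d - 2) * noise_variance / ||observations||^2.
    """
    d = len(observations)
    if d < 3:
        # The classical dominance result over the raw observation needs d >= 3.
        raise ValueError("James-Stein shrinkage requires at least 3 coordinates")
    squared_norm = sum(v * v for v in observations)
    if squared_norm == 0.0:
        return [0.0] * d
    shrinkage = 1.0 - (d - 2) * noise_variance / squared_norm
    return [shrinkage * v for v in observations]


# Hypothetical usage: one observation per data stream.
estimate = james_stein([1.0, 2.0, 2.0], noise_variance=1.0)
```

In this example the squared norm is 9, so every coordinate is multiplied by 1 - 1/9 = 8/9. A positive-part variant that clips the shrinkage factor at zero is also common.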

2604.17762 2026-04-21 stat.OT

A Parameter-Centric View on Regression

Jingxin Yan, Lin Liu, Oliver Dukes, Qizhai Li, Linbo Wang

Details
English abstract

Discussion on "Regression by Composition" by Farewell, Daniel, Stensrud, and Huitfeldt.

2604.17760 2026-04-21 stat.OT

Toward Variation-Independent Regression by Composition

Ruixuan Zhao, Oliver Dukes, Linbo Wang, Lin Liu

Details
English abstract

Discussion on "Regression by Composition" by Farewell, Daniel, Stensrud, and Huitfeldt.

2604.17711 2026-04-21 math.ST math.OC math.PR stat.TH

Quantitative Stability of the Shadow for Wasserstein Projections and Sample Complexity

Jakwang Kim

Comments 13 pages

Details
English abstract

In this paper, we study the stability of the shadow, a projection of a measure onto the set of couplings with respect to the Wasserstein distance. The shadow was introduced by \citet{Eckstein_Nutz_2022} to analyze the stability of the Sinkhorn algorithm, and was recently revisited by \citet{kim2026extensioncouplingprojectionoptimal} for statistical applications. Under mild conditions, we establish the bi-Hölder continuity of the shadow. As a consequence, we also derive the sample complexity of the shadow by combining smoothing techniques with recent results on the rate of convergence of empirical measures in Wasserstein distance. The key idea of the proof is twofold: first, a contraction property of the $L^p$ projection, recently used independently by \citet{kim2025stabilitywassersteinprojectionsconvex} and \citet{alfonsi2025wassersteinprojectionsconvexorder} to study the stability of projections onto the convex order cone in Wasserstein space; and second, the Hölder continuity of optimal transport maps established by \citet{Quantitative_stability_duke2023}, together with its recent extension by \citet{mischler2025quantitativestabilityoptimaltransport}.

2604.17705 2026-04-21 math.ST stat.TH

Asymptotic behavior of the variance of the BLUE for the mean of stationary processes

Mamikon S. Ginovyan

Details
English abstract

In this paper, we survey results on the asymptotic behavior of the variance of the best linear unbiased estimator (BLUE) for the mean of stationary processes. This behavior is influenced by the regularity and memory structures of the observed models. The results show that the asymptotic behavior of the variance of the BLUE is determined solely by the behavior of the spectrum near the origin. For nondeterministic models, the variance of the BLUE exhibits hyperbolic behavior, similar to the power function, while for purely deterministic models, the variance decreases at an exponential rate. Specifically, a necessary condition for the variance of the BLUE to approach zero exponentially is that the spectral density of the model vanishes on a set of positive Lebesgue measure in any neighborhood of zero. We also present results on the asymptotic efficiency of various unbiased linear estimators in comparison to the BLUE.

2604.17694 2026-04-21 stat.ME cs.LG stat.ML

Improving reproducibility by controlling random seed stability in machine learning based estimation via bagging

Nicholas Williams, Alejandro Schuler

Details
English abstract

Predictions from machine learning algorithms can vary across random seeds, inducing instability in downstream debiased machine learning estimators. We formalize random seed stability via a concentration condition and prove that subbagging guarantees stability for any bounded-outcome regression algorithm. We introduce a new cross-fitting procedure, adaptive cross-bagging, which simultaneously eliminates seed dependence from both nuisance estimation and sample splitting in debiased machine learning. Numerical experiments confirm that the method achieves the targeted level of stability whereas alternatives do not. Our method incurs a small computational penalty relative to standard practice whereas alternative methods incur large penalties.

2604.17670 2026-04-21 cs.LG stat.ML

Prior-Fitted Functional Flow: In-Context Generative Models for Pharmacokinetics

César Ojeda, Niklas Hartung, Wilhelm Huisinga, Tim Jahn, Purity Kamene Kavwele, Marian Klose, Piyush Kumar, Ramsés J. Sánchez, Darius A. Faroughy

Comments 9 pages, 2 tables and 4 figures

Details
English abstract

We introduce Prior-Fitted Functional Flows, a generative foundation model for pharmacokinetics that enables zero-shot population synthesis and individual forecasting without manual parameter tuning. We learn functional vector fields, explicitly conditioned on the sparse, irregular data of an entire study population. This enables the generation of coherent virtual cohorts as well as forecasting of partially observed patient trajectories with calibrated uncertainty. We construct a new open-access literature corpus to inform our priors, and demonstrate state-of-the-art predictive accuracy on extensive real-world datasets.

2604.17593 2026-04-21 q-fin.PM stat.ME

Post-Screening Portfolio Selection

Yoshimasa Uematsu, Shinya Tanaka

Details
English abstract

We propose post-screening portfolio selection (PS$^2$), a two-step framework for high-dimensional mean-variance investing. First, assets are screened by Lasso-type regression of a constant on excess returns without an intercept. Second, portfolio weights are estimated on the selected set using standard low-dimensional methods. Because strong factors can destroy sparsity in real data, we further introduce PS$^2$ with factors (FPS$^2$), which defactors returns before screening and allows factor investing in the final step. We establish theoretical guarantees, and simulations and an empirical application show competitive performance, especially when sparse screening is appropriate or strong factors are explicitly accommodated.

2604.17568 2026-04-21 cs.LG math.ST stat.ML stat.TH

Diverse Dictionary Learning

Yujia Zheng, Zijian Li, Shunxing Fan, Andrew Gordon Wilson, Kun Zhang

Comments ICLR 2026

Details
English abstract

Given only observational data $X = g(Z)$, where both the latent variables $Z$ and the generating process $g$ are unknown, recovering $Z$ is ill-posed without additional assumptions. Existing methods often assume linearity or rely on auxiliary supervision and functional constraints. However, such assumptions are rarely verifiable in practice, and most theoretical guarantees break down under even mild violations, leaving uncertainty about how to reliably understand the hidden world. To make identifiability actionable in real-world scenarios, we take a complementary view: in general settings where full identifiability is unattainable, what can still be recovered with guarantees, and what biases could be universally adopted? We introduce the problem of diverse dictionary learning to formalize this view. Specifically, we show that intersections, complements, and symmetric differences of latent variables linked to arbitrary observations, along with the latent-to-observed dependency structure, are still identifiable up to appropriate indeterminacies even without strong assumptions. These set-theoretic results can be composed using set algebra to construct structured and essential views of the hidden world, such as genus-differentia definitions. When sufficient structural diversity is present, they further imply full identifiability of all latent variables. Notably, all identifiability benefits follow from a simple inductive bias during estimation that can be readily integrated into most models. We validate the theory and demonstrate the benefits of the bias on both synthetic and real-world data.

2604.17526 2026-04-21 math.PR math.ST stat.TH

Convergence of Langevin AIS for multimodal distributions

Akshat Agarwal, Gautam Iyer, Aidan Jameson, Seungjae Son, Wyatt Wimmer

Details
English abstract

We study convergence rates of the annealed importance sampling algorithm (Neal '01) combined with Langevin Monte Carlo when the target is a multimodal Gibbs measure. The main result shows that for a fixed error threshold, the time complexity is quadratic in the inverse temperature. We identify a simple and useful quantity that controls the sampling error for AIS in a general setting, and then bound this quantity in our setting using spectral estimates. We also study an autonormalized version and obtain bounds for the time complexity in terms of the inverse temperature.

2604.17490 2026-04-21 math.ST q-fin.RM stat.TH

Joint Exclusivity

Nawaf Mohammed

Details
English abstract

We introduce joint exclusivity (JE), a form of extremal negative dependence that extends the classical notion of mutual exclusivity. The JE structure is analytically tractable and is defined by the exclusion of the interior of the non-negative orthant. We establish a sharp necessary and sufficient condition for the existence of a JE random vector with prescribed marginals, namely $\sum_{i\in N} \overline{F}_i(0) \leq n - 1$. We propose a canonical construction that distributes probability mass on lower-dimensional faces of the support, while allowing flexible copula specifications within each face. The framework is further extended to a generalized class (G-JE) via marginal distortion functions. Finally, we identify a correspondence between the support structures of JE and joint mixability, revealing a structural link between the two concepts.

2604.17410 2026-04-21 math.ST cs.DS stat.ML stat.TH

Algorithmic Contiguity from Low-Degree Heuristic II: Predicting Detection-Recovery Gaps

Zhangsong Li

Comments 74 pages. This is the second part of arXiv:2502.09832. Also merged the results in arXiv:2601.20522

Details
English abstract

The low-degree polynomial framework has emerged as a powerful tool for providing evidence of statistical-computational gaps in high-dimensional inference. For detection problems, the standard approach bounds the low-degree advantage through an explicit orthonormal basis. However, this method does not extend naturally to estimation tasks, and thus fails to capture the detection-recovery gap phenomenon that arises in many high-dimensional problems. Although several important advances have been made to overcome this limitation [SW22, SW25, CGGV25+], the existing approaches often rely on delicate, model-specific combinatorial arguments. In this work, we develop a general approach for obtaining conditional computational lower bounds for recovery problems from mild bounds on low-degree testing advantage. Our method combines the notion of algorithmic contiguity in [Li25] with a cross-validation reduction in [DHSS25] that converts successful recovery into a hypothesis test with lopsided success probabilities. In contrast to prior unconditional lower bounds, our argument is conceptually simple, flexible, and largely model-independent. We apply this framework to several canonical inference problems, including planted submatrix, planted dense subgraph, stochastic block model, multi-frequency angular synchronization, orthogonal group synchronization, and multi-layer stochastic block model. In the first three settings, our method recovers existing low-degree lower bounds for recovery in [SW22, SW25] via a substantially simpler argument. In the latter three, it gives new evidence for conjectured computational thresholds including the persistence of detection-recovery gaps. Together, these results suggest that mild control of low-degree advantage is often sufficient to explain computational barriers for recovery in high-dimensional statistical models.

2604.17395 2026-04-21 stat.ME math.AT stat.AP

A Null Model for Mapper Subtype Claims

Chad M. Topaz

Details
English abstract

The Mapper algorithm from topological data analysis constructs a graph summarizing the shape of a high-dimensional dataset, and groups of data points identified within this graph are widely interpreted as evidence of distinct subtypes. However, the covariance structure of the data alone can make such groups appear differentiated, even when no subtypes are present. Existing validation approaches do not account for this effect and thus cannot distinguish covariance artifacts from genuine subtypes. We propose a Gaussian null model that generates reference data matching the sample covariance matrix. We pair it with a test statistic that measures mean-level differentiation between communities. In an idealized setting, we prove that covariance geometry alone causes Mapper communities to differ in their average feature profiles, and we show that a simpler label-permutation baseline cannot detect this effect. Simulations confirm well-controlled Type I error under Gaussian data. We apply the framework to four published Mapper analyses spanning breast cancer gene expression, Congressional voting, NBA player performance, and lower-grade glioma genomics. In every case, once outlier singleton communities are accounted for, the observed differentiation does not exceed what the null produces at the α = 0.05 level. This result does not rule out subtypes in these datasets, but it does indicate that the observed structure is consistent with what covariance geometry alone can produce. Stronger evidence would be needed to support a subtype claim.

2604.17381 2026-04-21 stat.ML cs.LG

StrEBM: A Structured Latent Energy-Based Model for Blind Source Separation

Yuan-Hao Wei

Details
English abstract

This paper proposes StrEBM, a structured latent energy-based model for source-wise structured representation learning. The framework is motivated by a broader goal of promoting identifiable and decoupled latent organization by assigning different latent dimensions their own learnable structural biases, rather than constraining the entire latent representation with a single shared energy. In this sense, blind source separation is adopted here as a concrete and verifiable testbed, through which the evolution of latent dimensions toward distinct underlying components can be directly examined. In the proposed framework, latent trajectories are optimized directly together with an observation-generation map and source-wise structural parameters. Each latent dimension is associated with its own energy-based formulation, allowing different latent components to gradually evolve toward distinct source-like roles during training. In the present study, this source-wise energy design is instantiated using Gaussian-process-inspired energies with learnable length-scales, but the framework itself is not restricted to Gaussian processes and is intended as a more general structured latent EBM formulation. Experiments on synthetic multichannel signals under linear and nonlinear mixing settings show that the proposed model can recover source components effectively, providing an initial empirical validation of the framework. At the same time, the study reveals important optimization characteristics, including slow late-stage convergence and reduced stability under nonlinear observation mappings. These findings not only clarify the practical behavior of the current GP-based instantiation, but also establish a basis for future investigation of richer source-wise energy families and more robust nonlinear optimization strategies.

2604.17267 2026-04-21 cs.AI stat.AP

Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys

Zikun Ye, Hema Yoganarasimhan

Details
English abstract

Large Language Models can generate synthetic survey responses at low cost, but their accuracy varies unpredictably across questions. We study the design problem of allocating a fixed budget of human respondents across estimation tasks when cheap LLM predictions are available for every task. Our framework combines three components. First, building on Prediction-Powered Inference, we characterize a question-specific rectification difficulty that governs how quickly the estimator's variance decreases with human sample size. Second, we derive a closed-form optimal allocation rule that directs more human labels to tasks where the LLM is least reliable. Third, since rectification difficulty depends on unobserved human responses for new surveys, we propose a meta-learning approach, trained on historical data, that predicts it for entirely new tasks without pilot data. The framework extends to general M-estimation, covering regression coefficients and multinomial logit partworths for conjoint analysis. We validate the framework on two datasets spanning different domains, question types, and LLMs, showing that our approach captures 61-79% of the theoretically attainable efficiency gains, achieving 11.4% and 10.5% MSE reductions without requiring any pilot human data for the target survey.

2604.17254 2026-04-21 stat.ME stat.AP

Detecting Breast Carcinoma Metastasis on Whole-Slide Images by Partially Subsampled Multiple Instance Learning

Baichen Yu, Xuetong Li, Jing Zhou, Hansheng Wang

Details
English abstract

Breast cancer is the most prevalent cancer in women worldwide. Histopathology image analysis serves as the gold standard for cancer diagnosis. In this regard, whole-slide imaging (WSI), a revolutionary technology in digital pathology, allows for ultrahigh-resolution tissue analysis. Despite its promise, WSI analysis faces significant computational challenges due to its massive data size and tissue heterogeneity. To address this issue, we present a Gaussian mixture based multiple instance learning (MIL) framework for WSI analysis with partially subsampled instances. Our approach models a WSI as a bag of instances (i.e., randomly cropped sub-images), leveraging a bag-based maximum likelihood estimator (BMLE) to predict metastases. Furthermore, we introduce a subsampling-based maximum likelihood estimator (SMLE) to refine predictions by selectively labeling a subset of instances. Extensive evaluations of the breast carcinoma metastasis prediction demonstrate that BMLE surpasses state-of-the-art methods, while the SMLE further improves the prediction accuracy at both bag and instance levels. We find that our method is fairly robust against various plausible model mis-specifications. Theoretical analyses and simulation studies validate the performance and robustness of our methods.

2604.17250 2026-04-21 stat.AP

Improving post-operative discharge destination prediction of geriatric patients with generative data augmentation

Pegah Golchian, Pauline Maier, Thomas Kocar, Marvin N. Wright

Details
English abstract

Data scarcity challenges the development and implementation of innovative healthcare solutions. In geriatrics, fall-related injuries are a major cause of hospitalization, functional decline, and mortality in older adults. Optimizing post-operative discharge planning can mitigate these outcomes, but limited data hinders predictive model development. Here, we explored generative machine learning approaches to augment data from the SURGE-Ahead project (Supporting SURgery with Geriatric Co-Management and AI), an initiative addressing geriatric perioperative care. Data from the German geriatric trauma register (AltersTraumaZentrum; ATZ) were incorporated using two strategies: (i) combining SURGE-Ahead and ATZ register data with imputation (ComImp) and (ii) generating synthetic data from SURGE-Ahead alone or combined SURGE-Ahead and the ATZ register datasets with Adversarial random forests (ARF). Predictive models, including multinomial logistic regression, random forest, and a prior-fitted transformer (TabPFN), were trained and evaluated using standard performance metrics: accuracy, area under the receiver operating characteristic curve (ROC AUC), Brier score, and the logistic loss. Random forest and TabPFN performed well (accuracy around 0.84 and AUC around 0.94) and were largely unaffected by augmentation. Logistic regression benefited from augmented data, with predictive performance improving from 0.70 to 0.81 for accuracy and 0.85 to 0.92 for AUC. These results highlight generative data augmentation as a viable approach to enhance simpler predictive models in geriatric care and emphasize the importance of method selection when addressing data scarcity in heterogeneous clinical populations.

2604.17239 2026-04-21 math.ST econ.EM stat.TH

Bootstrap consistency for general double/debiased machine learning estimators

Ziming Lin, Fang Han

Comments 30 pages

Details
English abstract

Double/debiased machine learning (DML) provides a general framework for inference with high-dimensional or otherwise complex nuisance parameters by combining Neyman-orthogonal scores with cross-fitting, thereby circumventing classical Donsker-type conditions in many modern machine-learning settings. Despite its strong empirical performance, bootstrap inference for DML estimators has received little theoretical justification. This is particularly noteworthy since bootstrap methods are suggested and used for inference on DML estimators, even though bootstrap procedures can fail for estimators that are root-$n$ consistent and asymptotically normal. This paper fills this gap by establishing bootstrap validity for DML estimators under general exchangeably weighted resampling schemes, with Efron's bootstrap as a special case. Under exactly the same conditions required for the validity of DML itself, we prove that the bootstrap law converges conditionally weakly to the sampling law of the original estimator.

2604.17236 2026-04-21 math.ST stat.TH

Learning Mixtures of Nonparametric and Convolutional Measures on Effectively Low-dimensional Affine Spaces

Sunrit Chakraborty, XuanLong Nguyen

Details
English abstract

In this paper, we develop a finite mixture of convolutional distributions, a statistical model to analyze continuous data distributed approximately on a mixture of low-dimensional affine subspaces. The observations are assumed independent and identically distributed from the mixture of distributions, where each component arises from a convolution of a distribution supported on a low-dimensional subspace with a suitable noise kernel. We discuss theoretical properties of this class of models, including identifiability under very general conditions; in particular, we show that the minimal representation for such mixtures is uniquely identifiable in a semi-parametric setting. We further study the posterior contraction rates for the parameters for a parametrized class of such models where the supports of the component mixing measures are assumed to be convex polytopes under a suitable well-specified Bayesian regime. This still requires developing novel inverse bounds for problems involving a nested mixture structure, where the mixture kernel is itself another continuous mixture. Our approach for both the identifiability theory and posterior contraction rates is to exploit the geometric structure of the underlying support of the latent measures. Apart from applications in end-member analysis, spectral unmixing and topic models, this study provides a grounded framework for subspace clustering with the goal of exploring conditions for learning multiple latent low-dimensional structures. We illustrate our findings through a careful simulation study, which also includes developing new algorithms for this class of models.

2604.17224 2026-04-21 cs.LG stat.ML

LASER: Low-Rank Activation SVD for Efficient Recursion

Ege Çakar, Ketan Ali Raghu, Lia Zheng

Comments Accepted to the Latent and Implicit Thinking Workshop at ICLR 2026

Details
English abstract

Recursive architectures such as Tiny Recursive Models (TRMs) perform implicit reasoning through iterative latent computation, yet the geometric structure of these reasoning trajectories remains poorly understood. We investigate the activation manifold of TRMs during recursive unrolling and find that activations occupy an effectively linear, low-dimensional subspace whose principal directions can be tracked dynamically with cheap power iterations. This suggests that weight-sharing concentrates iterative computation along a small number of dominant eigendirections, and we find that this concentration varies sharply across computational sites. We exploit this structure through LASER (Low-Rank Activation SVD for Efficient Recursion), a dynamic compression framework that maintains an evolving low-rank basis via matrix-free subspace tracking with a fidelity-triggered reset mechanism, achieving ${\sim}60\%$ activation memory savings with no statistically significant accuracy degradation. Our analysis raises questions about how recursive architectures allocate representational capacity during implicit reasoning, and whether this concentration can be exploited to improve the efficiency and stability of latent computation.

2604.17219 2026-04-21 stat.ML cs.LG

PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory

Chenyang Wang, Yun Yang

Details
English abstract

We derive explicit non-asymptotic PAC-Bayes generalization bounds for Gibbs posteriors, that is, data-dependent distributions over model parameters obtained by exponentially tilting a prior with the empirical risk. Unlike classical worst-case complexity bounds based on uniform laws of large numbers, which require explicit control of the model space in terms of metric entropy (integrals), our analysis yields posterior-averaged risk bounds that can be applied to overparameterized models and adapt to the data structure and the intrinsic model complexity. The bound involves a marginal-type integral over the parameter space, which we analyze using tools from singular learning theory to obtain explicit and practically meaningful characterizations of the posterior risk. Applications to low-rank matrix completion and ReLU neural network regression and classification show that the resulting bounds are analytically tractable and substantially tighter than classical complexity-based bounds. Our results highlight the potential of PAC-Bayes analysis for precise finite-sample generalization guarantees in modern overparameterized and singular models.

2604.17213 2026-04-21 math.OC cs.SY eess.SY stat.ML

Symplectic Inductive Bias for Data-Driven Target Reachability in Hamiltonian Systems

Zhuo Ouyang, Jixian Liu, Enrique Mallada

Details
English abstract

Inductive bias refers to restrictions on the hypothesis class that enable a learning method to generalize effectively from limited data. A canonical example in control is linearity, which underpins low sample-complexity guarantees for stabilization and optimal control. For general nonlinear dynamics, by contrast, guarantees often rely on smoothness assumptions (e.g., Lipschitz continuity) which, when combined with covering arguments, can lead to data requirements that grow exponentially with the ambient dimension. In this paper we argue that data-efficient nonlinear control demands exploiting inductive bias embedded in nature itself, namely, structure imposed by physical laws. Focusing on Hamiltonian systems, we leverage symplectic geometry and intrinsic recurrence on energy level sets to solve target reachability problems. Our approach combines the recurrence property with a recently proposed class of policies, called chain policies, which composes locally certified trajectory segments extracted from demonstrations to achieve target reachability. We provide sufficient conditions for reachability under this construction and show that the resulting data requirements depend on explicit geometric and recurrence properties of the Hamiltonian rather than the state dimension.

2604.17194 2026-04-21 stat.ML cs.LG

Forecast Sports Outcomes under Efficient Market Hypothesis: Theoretical and Experimental Analysis of Odds-Only and Generalised Linear Models

Kaito Goto, Naoya Takeishi, Takehisa Yairi

Details
English abstract

Converting betting odds into accurate outcome probabilities is a fundamental challenge when using betting odds as a benchmark for sports forecasting and market efficiency analysis. In this study, we propose two methods to overcome the limitations of existing conversion methods. Firstly, we propose an odds-only method to convert betting odds to probabilities without using historical data for model fitting. Existing odds-only methods, such as Multiplicative, Shin, and Power, do not adjust for biases or relationships we found in our betting odds dataset, which consists of 90014 football matches across five different bookmakers. To overcome these limitations, our proposed Odds-Only-Equal-Profitability-Confidence (OO-EPC) method aligns with the bookmakers' pricing objectives of having equal confidence in profitability for each outcome. We provide empirical evidence from our betting odds dataset that, for the majority of bookmakers, our proposed OO-EPC method outperforms the existing odds-only methods. Beyond controlled experiments, we applied the OO-EPC method under real-world uncertainty by using it for six iterations of an annual basketball outcome forecasting competition. Secondly, we propose a generalised linear model that utilises historical data for model fitting and then converts betting odds to probabilities. Existing generalised linear models attempt to capture relationships that the Efficient Market Hypothesis already captures. To overcome this shortcoming, our proposed Favourite-Longshot-Bias-Adjusted Generalised Linear Model (FL-GLM) fits just one parameter to capture the favourite-longshot bias, providing a more interpretable alternative. We provide empirical evidence from historical football matches that, for all bookmakers, our proposed FL-GLM outperforms the existing multinomial and logistic generalised linear models.
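For readers unfamiliar with odds-only conversion, the following is a minimal sketch of the Multiplicative baseline named in the abstract, assuming decimal odds: each inverse odd is divided by the sum of inverse odds so that the results sum to one. This is the standard baseline, not the OO-EPC method, whose details the abstract does not give. The function name and the example odds are made up for illustration.

```python
def multiplicative_probabilities(decimal_odds):
    """Convert decimal odds to probabilities by normalising inverse odds."""
    inverse = [1.0 / o for o in decimal_odds]
    # The sum of inverse odds (the "overround") exceeds 1 when the
    # bookmaker builds a margin into the prices.
    overround = sum(inverse)
    return [q / overround for q in inverse]


# Hypothetical home/draw/away decimal odds, invented for this example.
odds = [2.0, 3.5, 4.0]
probs = multiplicative_probabilities(odds)
print(probs)
```

The normalisation removes the bookmaker margin proportionally from every outcome, which is exactly why it cannot correct outcome-specific effects such as the favourite-longshot bias.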

2604.17160 2026-04-21 math.ST stat.TH

Bayesian analysis for a generalised Dirichlet process prior

Nils Lid Hjort

Comments 24 pages, no figures. Statistical Research Report, Department of Mathematics, University of Oslo, Nov 2000. Annals of Statistics was interested and invited a revision, which somehow was not followed up. The tech report has gathered quite a few citations, and is brought to arXiv April 2026 for better visibility

Details
English abstract

A family of random probabilities is defined and studied. This family contains the Dirichlet process as a special case, corresponding to an inner point in the appropriate parameter space. The extension makes it possible to have random means with larger or smaller skewnesses as compared to skewnesses under the Dirichlet prior, and also in other ways amounts to additional modelling flexibility. The usefulness of such random probabilities for nonparametric Bayesian statistics is discussed. The posterior distribution is complicated, but inference can nevertheless be carried out via simulation, and some exact formulae are derived for the case of random means. The class of nonparametric priors provides an instructive example where the speed with which the posterior forgets its prior with increasing data sample size depends on special aspects of the prior, which is a different situation from that of parametric inference.

2604.17154 2026-04-21 stat.ME

Model Selection and Parameter Inference through Constraints via Sequences of Surrogate Smoothing Functions

Mateen R Shaikh

Comments submitted for peer review on April 18 2026

Details
English abstract

Models with fewer parameters are often easier to interpret and more robust. Parsimony can be achieved through optimizing objectives like the AIC or BIC, which are functions of the number of free parameters in the model. Optimizing this discrete objective is a challenge, often relying on discrete optimization. We construct smooth functions whose optima reach the same optima as these objectives but permit continuous rather than discrete optimization, relieving some selection burden. Proofs of convergence are provided and a novel method of clustering through explicit overparameterization shows promising results.

2604.15632 2026-04-21 math.AG stat.ML

Algebraic Invariants of Lightning Self-Attention

Yulia Alexandr, Hao Duan, Guido Montúfar

Details
English abstract

We study the polynomial coefficients of lightning self-attention as coordinates of an algebraic variety. We identify linear and nonlinear families of algebraic invariants, including Chow-type, low-rank, Veronese-type, and Sylvester resultant-based constraints.

2604.14621 2026-04-21 stat.ML cs.LG

Differentially Private Conformal Prediction

Jiamei Wu, Ce Zhang, Zhipeng Cai, Jingsen Kong, Bei Jiang, Linglong Kong, Lingchen Kong

Details
English abstract

Conformal prediction (CP) has attracted broad attention as a simple and flexible framework for uncertainty quantification through prediction sets. In this work, we study how to deploy CP under differential privacy (DP) in a statistically efficient manner. We first introduce differential CP, a non-splitting conformal procedure that avoids the efficiency loss caused by data splitting and serves as a bridge between oracle CP and private conformal inference. By exploiting the stability properties of DP mechanisms, differential CP establishes a direct connection to oracle CP and inherits corresponding validity behavior. Building on this idea, we develop Differentially Private Conformal Prediction (DPCP), a fully private procedure that combines DP model training with a private quantile mechanism for calibration. We establish the end-to-end privacy guarantee of DPCP and investigate its coverage properties under additional regularity conditions. We further study the efficiency of both differential CP and DPCP under empirical risk minimization and general regression models, showing that DPCP can produce tighter prediction sets than existing private split conformal approaches under the same privacy budget. Numerical experiments on synthetic and real datasets demonstrate the practical effectiveness of the proposed methods.

2604.09663 2026-04-21 econ.EM q-fin.GN stat.ME

JFR-rg: A New Macroeconomic Framework for High-Debt, Low-Growth Economies under Financial Repression

Hirofumi Wakimoto

Comments JEL Classification: E44, E52, E62, F31, H63. v2: bibliographic corrections, consistency fixes, and clarifications of scope conditions, falsification language, and selected interpretations; results unchanged

Details
English abstract

Standard macroeconomic frameworks have correctly identified Japan's government debt - now exceeding 240% of GDP - as carrying substantial fiscal risk. Yet FRED data from 2013 to 2026 present an empirical record inviting a complementary perspective: debt ratios have stabilized, nominal GDP has exceeded 670 trillion yen (SAAR), and unemployment has remained near 2.6-2.7%. This paper formalizes these channels through the Japanese Financial Repression r-g (JFR-rg) model. Building on Blanchard (2019), the framework incorporates a financial repression bias (epsilon_t = pi_t - r^n_t, directly observable from FRED) and a non-linear exchange-rate channel. Three theoretical contributions extend the literature: (i) the Debt Sustainability Corridor, a characterization of stability in (epsilon_t, g^n*_t) space; (ii) the Normalization Ratchet, a path-dependence theorem showing that temporary policy errors generate persistently higher debt trajectories; and (iii) the Captive Financial System Parameter (phi_t), which endogenizes the institutional precondition for JFR-rg stability. Appendices H-L provide supporting empirical evidence (VAR, ARDL, Local Projections) showing the framework's claims are empirically disciplined and falsifiable. The core debt-dynamics propositions are anchored in the consolidated government budget identity (Layer L1), while selected propositions additionally rely on minimal structural assumptions; identification concerns apply only to the empirical Layer L2. Counterfactual simulations illustrate a Normalization Trap: aggressive rate hikes can produce counterproductive debt dynamics. For high-debt, low-growth economies sharing Japan's institutional characteristics, strategically deploying the resulting Repression Dividend into productivity-enhancing investment may represent a regime-contingent equilibrium possibility, conditional on the captive system condition being maintained.

2603.00883 2026-04-21 cs.LG cs.AI cs.CY stat.AP

Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact

Michael Hardy, Yunsung Kim

Details
English abstract

LLMs increasingly excel on AI benchmarks, but doing so does not guarantee validity for downstream tasks. This study contrasts LLM alignment on benchmarks, downstream tasks, and, importantly, the intended impact of those tasks. We evaluate the performance of leading LLMs (i.e., generative pre-trained base models) on difficult-to-verify tasks of the teaching and learning of schoolchildren. Across all LLMs, inter-model behaviors on disparate tasks correlate more strongly with each other than they do with expert human behaviors on target tasks. These biases shared across LLMs are poorly aligned with downstream measures of teaching quality and often negatively aligned with the intended impact of student learning outcomes. Further, we find that multi-model ensembles, both unanimous model voting and expert-weighting by benchmark performance, further exacerbate misalignment with learning. We find that the selection of LLM and/or prompting strategy reliably accounts for only $15\%$ of all measured misalignment error and that variation in misalignment error is shared across LLMs, suggesting that common pretraining accounts for much of the misalignment in these tasks. We demonstrate methods for robustly measuring alignment of complex tasks and provide unique insights into practical applications of LLMs in high-noise contexts.

2601.15690 2026-04-21 cs.AI stat.AP

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models

Jiaxin Zhang, Wendi Cui, Zhuohang Li, Lifu Huang, Bradley Malin, Caiming Xiong, Chien-Sheng Wu

Comments This paper has been accepted by ACL 2026

Details
English abstract

While Large Language Models (LLMs) show remarkable capabilities, their unreliability remains a critical barrier to deployment in high-stakes domains. This survey charts a functional evolution in addressing this challenge: the evolution of uncertainty from a passive diagnostic metric to an active control signal guiding real-time model behavior. We demonstrate how uncertainty is leveraged as an active control signal across three frontiers: in advanced reasoning to optimize computation and trigger self-correction; in autonomous agents to govern metacognitive decisions about tool use and information seeking; and in reinforcement learning to mitigate reward hacking and enable self-improvement via intrinsic rewards. By grounding these advancements in emerging theoretical frameworks like Bayesian methods and Conformal Prediction, we provide a unified perspective on this transformative trend. This survey provides a comprehensive overview, critical analysis, and practical design patterns, arguing that mastering the new trend of uncertainty is essential for building the next generation of scalable, reliable, and trustworthy AI.

2601.09173 2026-04-21 cs.LG cs.CL q-bio.QM stat.ML

Geometric Stability: The Missing Axis of Representations

Prashant C. Raju

Details
English abstract

Representational similarity analysis and related methods have become standard tools for comparing the internal geometries of neural networks and biological systems. These methods measure what is represented, the alignment between two representational spaces, but not whether that structure is robust. We introduce geometric stability, a distinct dimension of representational quality that quantifies how reliably a representation's pairwise distance structure holds under perturbation. Our metric, Shesha, measures self-consistency through split-half correlation of representational dissimilarity matrices constructed from complementary feature subsets. A key formal property distinguishes stability from similarity: Shesha is not invariant to orthogonal transformations of the feature space, unlike CKA and Procrustes, enabling it to detect compression-induced damage to manifold structure that similarity metrics cannot see. Spectral analysis reveals the mechanism: similarity metrics collapse after removing the top principal component, while stability retains sensitivity across the eigenspectrum. Across 2463 encoder configurations in seven domains -- language, vision, audio, video, protein sequences, molecular profiles, and neural population recordings -- stability and similarity are empirically uncorrelated ($ρ=-0.01$). A regime analysis shows this independence arises from opposing effects: geometry-preserving transformations make the metrics redundant, while compression makes them anti-correlated, canceling in aggregate. Applied to 94 pretrained models across 6 datasets, stability exposes a "geometric tax": DINOv2, the top-performing model for transfer learning, ranks last in geometric stability on 5/6 datasets. Contrastive alignment and hierarchical architecture predict stability, providing actionable guidance for model selection in deployment contexts where representational reliability matters.
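The split-half idea in the abstract can be sketched as follows: split the feature columns into two complementary subsets, build a representational dissimilarity matrix (RDM) from each, and correlate the two. This is an illustrative sketch only, not the authors' Shesha implementation. The Euclidean distance, the Pearson correlation, the single random split, and all function names are assumptions made here.

```python
import math
import random


def rdm(rows, cols):
    """Upper-triangle pairwise Euclidean distances over the given columns."""
    out = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            sq = sum((rows[i][c] - rows[j][c]) ** 2 for c in cols)
            out.append(math.sqrt(sq))
    return out


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def split_half_stability(rows, seed=0):
    """Correlate RDMs built from two complementary random feature halves."""
    rng = random.Random(seed)
    cols = list(range(len(rows[0])))
    rng.shuffle(cols)
    half = len(cols) // 2
    return pearson(rdm(rows, cols[:half]), rdm(rows, cols[half:]))


# Toy representation: every feature is a rescaled copy of one latent value,
# so both halves encode the same distance structure.
latents = [0.0, 1.0, 2.5, 4.0, 7.0]
weights = [1.0, 0.5, 2.0, 1.5, 0.8, 1.2, 0.3, 2.2]
rows = [[z * w for w in weights] for z in latents]
score = split_half_stability(rows)
print(score)
```

In this toy case the two half-RDMs are proportional, so the score is 1 up to rounding. A representation whose distance structure depends on which features are kept would score lower.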

2512.06324 2026-04-21 math.ST stat.TH

Subsampling Confidence Bound for Persistent Diagram via Time-delay Embedding

Donghyun Park, Junhyun An, Taehyoung Kim, Jisu Kim

Comments 23 pages, 2 figures, Minor corrections in formatting

Details
English abstract

Time-delay embedding is a fundamental technique in Topological Data Analysis (TDA) for reconstructing the phase space dynamics of time-series data. Persistent homology effectively identifies global topological features, such as loops associated with periodicity. Nevertheless, a statistically rigorous way to quantify uncertainty in the resulting topological features has remained underdeveloped -- a gap that we aim to address. First, we analyze the topological characterization of time-delay embeddings under both periodic and non-periodic conditions. Precisely, the embedded trajectory is homotopy equivalent to a circle ($S^1$) for periodic signals and is contractible for non-periodic ones. We also prove that the reach of the sliding window embedding is lower-bounded, ensuring stable persistence features. Next, we propose a subsampling-based method to construct confidence bounds for persistence diagrams derived from time-delay embeddings. Specifically, we derive confidence bounds with asymptotic guarantees, under the assumption that the support satisfies standard manifold regularity. Integrating the results, we propose a statistical testing framework to determine the periodicity of the underlying sampling function. This framework provides a principled statistical test for periodicity with asymptotically controlled type I and type II error rates. Simulation studies demonstrate that our method achieves detection performance comparable to the Generalized Lomb-Scargle Periodogram on periodic data while exhibiting superior robustness in distinguishing non-periodic signals with time-varying frequencies, such as chirp signals. Finally, the method successfully captures the periodicity when applied to the BIDMC dataset.
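As a small illustration of the time-delay (sliding window) embedding discussed above, the sketch below maps a scalar series to points of the form (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]). The function name, the sine signal, and the parameter values are assumptions chosen for this example. With a period of 20 samples and a delay of 5 (a quarter period), the embedded sine traces the unit circle, consistent with the circle topology the abstract states for periodic signals.

```python
import math


def delay_embed(series, dim, tau):
    """Return the delay vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n_points = len(series) - (dim - 1) * tau
    return [
        tuple(series[i + j * tau] for j in range(dim))
        for i in range(n_points)
    ]


# Periodic toy signal with a period of 20 samples (made-up example).
signal = [math.sin(2 * math.pi * t / 20) for t in range(100)]

# A delay of a quarter period turns (sin, shifted sin) into (sin, cos).
cloud = delay_embed(signal, dim=2, tau=5)
print(len(cloud), cloud[0])
```

Persistent homology and the subsampling confidence bounds would then be computed on a point cloud such as `cloud`. That step is not sketched here.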

2511.12069 2026-04-21 cs.SE stat.ME

A Code Smell Refactoring Approach using GNNs

HanYu Zhang, Tomoji Kishi

Details
English abstract

Code smells, which indicate latent design or implementation flaws that may degrade software maintainability and evolution, are a major challenge in software refactoring. Over the past decades, a variety of refactoring approaches have been proposed, which can be broadly classified into metrics-based, rule-based, and machine learning-based approaches. In recent years, deep learning-based approaches have also attracted widespread attention. However, existing techniques exhibit various limitations. Metrics- and rule-based approaches rely heavily on manually defined heuristics and thresholds, whereas deep learning-based approaches are often constrained by dataset availability and model design. In this study, we propose a graph-based deep learning approach for code smell refactoring. Specifically, we design two types of input graphs (class-level and method-level) and employ both graph classification and node classification tasks to address the refactoring of three representative code smells: long method, large class, and feature envy. In our experiment, we propose a semi-automated dataset generation approach that can generate a large-scale dataset with minimal manual effort. We implement the proposed approach with three classical GNN (graph neural network) architectures: GCN, GraphSAGE, and GAT, and evaluate its performance against both traditional and state-of-the-art deep learning approaches. The results demonstrate that the proposed approach achieves superior refactoring performance.

2511.09215 2026-04-21 stat.ME

Principled analysis of crossover designs: causal effects, efficient estimation, and robust inference

Zhichao Jiang, Peng Ding

Details
English abstract

Crossover designs randomly assign each unit to receive a sequence of treatments. By comparing outcomes within the same unit, these designs can effectively eliminate between-unit variation and facilitate the identification of both instantaneous effects of current treatments and carryover effects from past treatments. They are widely used in traditional biomedical studies and are increasingly adopted in modern digital platforms. However, standard analyses of crossover designs often rely on strong parametric models, making inference vulnerable to model misspecification. This paper adopts a design-based framework to analyze general crossover designs. We make two main contributions. First, we use potential outcomes to formally define the causal estimands and assumptions on the data-generating process. For any given type of crossover design and assumptions on potential outcomes, we outline a procedure for identification and estimation, emphasizing the central role of the treatment assignment mechanism in design-based inference. Second, we unify the analysis of crossover designs using least squares, with restrictions on the coefficients and weights on the units. Based on the theory, we recommend the specification of the regression function, weighting scheme, and coefficient restrictions to assess identifiability, construct efficient estimators, and estimate variances in a unified fashion. Crucially, the least squares procedure is simple to implement, and yields not only consistent and efficient point estimates but also valid variance estimates even when the working regression model is misspecified.

2511.07261 2026-04-21 math.NA cs.NA stat.CO stat.ML

High-dimensional Bayesian filtering through deep density approximation

Kasper Bågmark, Filip Rydin

Comments 30 pages, 13 figures

Details
English abstract

In this work, we systematically benchmark two recently developed deep density methods for nonlinear filtering. We model the filtering density of a discretely observed stochastic differential equation through the associated Fokker--Planck equation, coupled with Bayesian updates at discrete observation times. The two filters, the deep splitting filter and the deep backward stochastic differential equation filter, are both based on Feynman--Kac formulas, Euler--Maruyama discretizations and neural networks. The two methods are extended to logarithmic formulations providing sound, robust, and positivity-preserving density approximations in increasing state dimension. We benchmark the methods on numerous examples, comparing them with the classical bootstrap particle filter and an ensemble Kalman filter. In the low-dimensional examples the particle filters work well, but when we scale up to a partially observed $100$-dimensional Lorenz-96 model, the particle-based methods fail and the logarithmic deep backward stochastic differential equation filter prevails. In terms of computational efficiency, the deep density methods reduce inference time by roughly two to five orders of magnitude relative to the particle-based filters.

2510.14142 2026-04-21 stat.ME

Complier General Causal Effect in Randomized Controlled Trials with One-Sided Noncompliance

Yin Tang, Yanyuan Ma, Jiwei Zhao

Details
English abstract

A randomized controlled trial (RCT) is widely regarded as the gold standard for assessing the causal effect of a treatment or intervention, assuming perfect implementation. In practice, however, randomization can be compromised for various reasons, such as one-sided noncompliance. In this paper, we first systematically study the likelihood-based identifiability in an RCT with one-sided noncompliance. This foundational analysis naturally gives rise to the complier general causal effect (CGCE) as the primary estimand. We further develop two estimators for the CGCE: a simple estimator that requires no nonparametric procedures, and an efficient estimator that achieves the semiparametric efficiency bound. Our theoretical analysis shows that achieving semiparametric efficiency requires only the nuisance estimators to converge in $L_2$-norm, with no restriction on their convergence rates. This rate-free property opens the door to employing many more modern machine learning methods while still guaranteeing efficiency. Comprehensive simulation studies and a real data application are conducted to illustrate the proposed methods and to compare them with existing approaches.

2510.05573 2026-04-21 stat.ML cs.IT cs.LG math.IT

On the Theory of Continual Learning with Gradient Descent for Neural Networks

Hossein Taheri, Avishek Ghosh, Arya Mazumdar

Details
English abstract

Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting earlier ones, is a central goal of artificial intelligence. To better understand its underlying mechanisms, we study the limitations of continual learning in a tractable yet representative setting. Specifically, we analyze one-hidden-layer quadratic neural networks trained by gradient descent on a sequence of XOR-cluster datasets with Gaussian noise, where different tasks correspond to clusters with orthogonal means. Our analysis is based on a tight characterization of gradient descent dynamics for the training loss, which yields explicit bounds on the rate of train-time forgetting as functions of the number of iterations, sample size, number of tasks, and hidden-layer width. We then leverage an algorithmic stability framework to bound the generalization gap, leading to corresponding guarantees on test-time forgetting. Together, our results provide the first closed-form guarantees for forgetting in continual learning with neural networks and show how key problem parameters jointly govern forgetting dynamics. Numerical experiments corroborate our theoretical results.

2509.19088 2026-04-21 cs.CY cs.AI cs.HC stat.AP

Digital Twins as Funhouse Mirrors: Five Key Distortions

Tianyi Peng, George Gui, Melanie Brucks, Daniel J. Merlau, Grace Jiarui Fan, Malek Ben Sliman, Eric J. Johnson, Abdullah Althenayyan, Silvia Bellezza, Dante Donati, Hortense Fong, Elizabeth Friedman, Ariana Guevara, Mohamed Hussein, Kinshuk Jerath, Bruce Kogut, Akshit Kumar, Kristen Lane, Hannah Li, Vicki Morwitz, Oded Netzer, Patryk Perkowski, Olivier Toubia

Details
English abstract

Scientists and practitioners are increasingly moving to deploy digital twins--LLM-based models of real individuals--across social science and policy research. We conduct 19 pre-registered studies spanning 164 diverse outcomes (e.g., attitudes toward hiring algorithms, intentions to share misinformation), comparing human responses to those of their corresponding digital twins, which are trained on each individual's prior responses to over 500 questions. We establish an empirical benchmark for digital twin performance: their predictions are only modestly more accurate than those of a homogeneous base LLM and exhibit weak correlation with human responses (average $r = 0.20$). To inform future development, we identify five systematic distortions in digital twin behavior: (i) insufficient individuation, (ii) stereotyping, (iii) representation bias, (iv) ideological bias, and (v) hyper-rationality. Finally, we release our full dataset and code as a standardized testbed for evaluating and improving digital twin methodologies. Together, our findings caution against premature deployment while laying the groundwork for a transparent, replicable, and iterative science of responsible digital twin development.

2508.17235 2026-04-21 stat.AP

On the relationship between the Wasserstein distance and differences in life expectancy at birth

Markus Sauerberg

Comments 19 pages, 18 figures

Details
English abstract

The Wasserstein distance is a metric for assessing distributional differences. The measure originates in optimal transport theory and can be interpreted as the minimal cost of transforming one distribution into another. In this paper, the Wasserstein distance is applied to life table age-at-death distributions. The main finding is that, under certain conditions, the Wasserstein distance between two age-at-death distributions equals the corresponding gap in life expectancy at birth ($e_0$). More specifically, the paper shows mathematically and empirically that this equivalence holds whenever the survivorship functions do not cross. For example, this applies when comparing mortality between women and men from 1990 to 2020 using data from the Human Mortality Database. In such cases, the gap in $e_0$ reflects not only a difference in mean ages at death but can also be interpreted directly as a measure of distributional difference.
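The equivalence stated above can be checked on a toy example. The sketch below assumes age-at-death distributions given as probability masses on integer ages 0, 1, 2, ..., and uses the 1-Wasserstein distance, which on an integer grid equals the sum of absolute differences between the two cumulative distribution functions. All numbers are made up and do not come from the Human Mortality Database. The function names are illustrative.

```python
from itertools import accumulate


def wasserstein_1(p, q):
    """W1 distance between two pmfs on the integer ages 0..len(p)-1."""
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def mean_age(p):
    """Mean age at death, a stand-in for life expectancy at birth."""
    return sum(age * mass for age, mass in enumerate(p))


# Toy distributions over ages 0..4. The survivorship curves of pop_a and
# pop_b do not cross, because the CDF of pop_a is never below that of pop_b.
pop_a = [0.10, 0.20, 0.30, 0.30, 0.10]
pop_b = [0.05, 0.10, 0.25, 0.35, 0.25]
gap_ab = mean_age(pop_b) - mean_age(pop_a)
dist_ab = wasserstein_1(pop_a, pop_b)

# pop_c has survivorship that crosses that of pop_a.
pop_c = [0.30, 0.00, 0.00, 0.00, 0.70]
gap_ac = mean_age(pop_c) - mean_age(pop_a)
dist_ac = wasserstein_1(pop_a, pop_c)

print(gap_ab, dist_ab)  # equal up to rounding: no crossing
print(gap_ac, dist_ac)  # distance exceeds the gap: curves cross
```

When the curves do not cross, every term in the sum has the same sign, so the absolute values can be dropped and the sum collapses to the difference in means. With a crossing, terms of opposite sign cancel in the mean gap but not in the distance.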

2508.10630 2026-04-21 math.NA cs.NA stat.CO stat.ML

Nonlinear filtering based on density approximation and deep BSDE prediction

Kasper Bågmark, Adam Andersson, Stig Larsson

Comments 18 pages, 6 figures

Details
English abstract

A novel approximate Bayesian filter based on backward stochastic differential equations is introduced. It uses a nonlinear Feynman--Kac representation of the filtering problem and the approximation of an unnormalized filtering density using the well-known deep BSDE method and neural networks. The method is trained offline, which means that it can be applied online with new observations. A hybrid a priori-a posteriori error bound is proved under a parabolic Hörmander condition. The theoretical convergence rate is confirmed in two numerical examples.

2508.01942 2026-04-21 math.OC math.ST stat.TH

Central Limit Theorems for Sample Average Approximations in Stochastic Optimal Control

Johannes Milz, Alexander Shapiro

Details
English abstract

We establish central limit theorems for the Sample Average Approximation (SAA) method in discrete-time, finite-horizon stochastic optimal control. Our analysis is based on an abstract limit theorem for stochastic backward recursions, which yields a recursive characterization of the limiting laws. Applied to the dynamic programming principle, this framework gives Gaussian limits for SAA value functions under unique optimal policies. The asymptotic variance at each stage decomposes into a current-stage variance and a propagated future variance, demonstrating how statistical uncertainty accumulates backward through time. We also apply the framework to the linear quadratic regulator, derive explicit limiting laws and variance formulas, and provide numerical illustrations of the resulting variance decomposition. Finally, we discuss the form of the limit laws under nonunique optimal policies.

2508.00040 2026-04-21 cs.LG math.PR stat.AP stat.ML

Regime-Aware Conditional Neural Processes with Multi-Criteria Decision Support for Operational Electricity Price Forecasting

Abhinav Das, Stephan Schlüter

Details
Journal ref
Energy Economics 157 (2026) 109233
English abstract

This work integrates Bayesian regime detection with conditional neural processes for 24-hour electricity price prediction in the German market. Our methodology integrates regime detection using a disentangled sticky hierarchical Dirichlet process hidden Markov model (DS-HDP-HMM) applied to daily electricity prices. Each identified regime is subsequently modeled by an independent conditional neural process (CNP), trained to learn localized mappings from input contexts to 24-dimensional hourly price trajectories, with final predictions computed as regime-weighted mixtures of these CNP outputs. We rigorously evaluate R-NP against deep neural networks (DNN) and Lasso estimated auto-regressive (LEAR) models by integrating their forecasts into diverse battery storage optimization frameworks, including price arbitrage, risk management, grid services, and cost minimization. This operational utility assessment revealed complex performance trade-offs: LEAR often yielded superior absolute profits or lower costs, while DNN showed exceptional optimality in specific cost-minimization contexts. Recognizing that raw prediction accuracy doesn't always translate to optimal operational outcomes, we employed TOPSIS as a comprehensive multi-criteria evaluation layer. Our TOPSIS analysis identified LEAR as the top-ranked model for 2021, but crucially, our proposed R-NP model emerged as the most balanced and preferred solution for 2021, 2022 and 2023.

2506.12176 2026-04-21 cs.LG stat.ML

"Faithful to What?" On the Limits of Fidelity-Based Explanations

Jackson Eshbaugh

Comments 6 pages, 3 figures, 3 tables. Accepted at the Workshop on Scientific Methods for Understanding Deep Learning (Sci4DL) at ICLR 2026. Code available at https://github.com/jacksoneshbaugh/lambda-linearity-score/tree/main

Details
English abstract

In explainable AI, surrogate models are commonly evaluated by their fidelity to a neural network's predictions. Fidelity, however, measures alignment to a learned model rather than alignment to the data-generating signal underlying the task. This work introduces the linearity score $λ(f)$, a diagnostic that quantifies the extent to which a regression network's input--output behavior is linearly decodable. $λ(f)$ is defined as an $R^2$ measure of surrogate fit to the network. Across synthetic and real-world regression datasets, we find that surrogates can achieve high fidelity to a neural network while failing to recover the predictive gains that distinguish the network from simpler models. In several cases, high-fidelity surrogates underperform even linear baselines trained directly on the data. These results demonstrate that explaining a model's behavior is not equivalent to explaining the task-relevant structure of the data, highlighting a limitation of fidelity-based explanations when used to reason about predictive performance.
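To make the idea of an $R^2$ measure of surrogate fit concrete, here is a minimal sketch that fits a least-squares line to a model's own outputs (not to the data labels) and reports the resulting $R^2$. It assumes a one-dimensional input and a linear surrogate. The paper's exact definition of $λ(f)$ may differ. `toy_network` is a hypothetical stand-in for a trained regression network.

```python
def linearity_score(xs, net_outputs):
    """R^2 of the least-squares line fitted to the network's outputs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(net_outputs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, net_outputs))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum(
        (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, net_outputs)
    )
    ss_tot = sum((y - my) ** 2 for y in net_outputs)
    return 1.0 - ss_res / ss_tot


def toy_network(x):
    """Hypothetical model: mostly linear with a small quadratic term."""
    return 2.0 * x + 0.5 * x * x


xs = [i / 10 for i in range(-20, 21)]
score = linearity_score(xs, [toy_network(x) for x in xs])
print(score)
```

A score near 1 says only that the surrogate mimics the network well. As the abstract stresses, it says nothing by itself about how well either model captures the data-generating signal.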

2505.12548 2026-04-21 stat.ME stat.ML

Modeling Nonstationary Extremal Dependence via Deep Spatial Deformations

Xuanjie Shao, Jordan Richards, Raphael Huser

Details
English abstract

Modeling nonstationarity that often prevails in extremal dependence of spatial data can be challenging, and typically requires bespoke or complex spatial models that are difficult to estimate. Inference for stationary and isotropic models is considerably easier, but the assumptions that underpin these models are rarely met by data observed over large or topographically complex domains. A possible approach for accommodating nonstationarity in a spatial model is to warp the spatial domain to a latent space where stationarity and isotropy can be reasonably assumed. Although this approach is very flexible, estimating the warping function can be computationally expensive, and the transformation is not always guaranteed to be bijective, which may lead to physically unrealistic transformations when the domain folds onto itself. We overcome these challenges by developing deep compositional spatial models to capture nonstationarity in extremal dependence. Specifically, we focus on modeling high threshold exceedances of process functionals by leveraging efficient inference methods for limiting r-Pareto processes. A detailed high-dimensional simulation study demonstrates the superior performance of our model in estimating the warped space. We illustrate our method by modeling UK precipitation extremes and show that we can efficiently estimate the extremal dependence structure of data observed at thousands of locations.

2502.19499 2026-04-21 cs.LG math.OC stat.ML

On the Interpolation Effect of Score Smoothing in Diffusion Models

Zhengdao Chen

Comments 34 pages, 14 figures. Code available at: https://github.com/google-research/diffusion-score-smoothing

Details
Journal ref
14th International Conference on Learning Representations (ICLR 2026)
English abstract

Diffusion models have achieved remarkable progress in various domains with an intriguing ability to produce new data that do not exist in the training set. In this work, we study the hypothesis that such creativity arises from the neural network backbone learning a smoothed version of the empirical score function, which guides the denoising dynamics to generate data points that interpolate the training data. Focusing mainly on settings where the training set lies uniformly in a one-dimensional subspace, we elucidate the interplay between score smoothing and the denoising dynamics with analytical solutions and numerical experiments, demonstrating how smoothing the score function can cause the denoised data samples to interpolate the training set along the subspace. Moreover, we present theoretical and empirical evidence that learning score functions with neural networks - either with or without explicit regularization - can naturally achieve a similar effect, including when the data belong to simple nonlinear manifolds.

2502.05075 2026-04-21 cs.LG cs.NA math.NA stat.ML

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

Yijun Dong, Yicheng Li, Yunai Li, Jason D. Lee, Qi Lei

Comments ICML 2025

Details
English abstract

Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong (large) student model is trained on pseudo-labels generated by a weak teacher. Surprisingly, W2S FT often outperforms the weak teacher. We seek to understand this phenomenon through the observation that FT often occurs in intrinsically low-dimensional spaces. Leveraging the low intrinsic dimensionality of FT, we analyze W2S in the ridgeless regression setting from a variance reduction perspective. For a strong student-weak teacher pair with sufficiently expressive low-dimensional feature subspaces $\mathcal{V}_s, \mathcal{V}_w$, we provide an exact characterization of the variance that dominates the generalization error of W2S. This unveils a virtue of discrepancy between the strong and weak models in W2S: the variance of the weak teacher is inherited by the strong student in $\mathcal{V}_s \cap \mathcal{V}_w$, while reduced by a factor of $\mathrm{dim}(\mathcal{V}_s)/N$ in the subspace of discrepancy $\mathcal{V}_w \setminus \mathcal{V}_s$ with $N$ pseudo-labels for W2S. Our analysis further casts light on the sample complexities and the scaling of performance gap recovery in W2S. The analysis is supported by experiments on synthetic regression problems, as well as real vision and NLP tasks.

2501.02703 2026-04-21 stat.ME

Full-conformal novelty detection

Junu Lee, Ilia Popov, Zhimei Ren

Details
English abstract

This paper presents a powerful methodology for flexible full-data nonparametric novelty detection that offers distribution-free false discovery rate (FDR) control guarantees. Building on the full conformal inference framework and the concept of e-values, we introduce full conformal e-values to quantify evidence for novelty relative to a given reference dataset. These e-values are then utilized by carefully crafted multiple testing procedures to identify a set of novel units out-of-sample with provable finite-sample FDR control. We showcase several instantiations of e-values, including those which employ a data-driven model selection strategy to amplify power. Furthermore, our framework is extended to address distribution shift, accommodating scenarios where novelty detection must be performed on data drawn from a shifted distribution relative to the reference dataset. In all settings, our method can perform powerfully -- outperforming existing novelty detection methods -- even with limited amounts of reference data; this is illustrated by empirical evaluations on synthetic data and an application to a malicious LLM prompts dataset.

2412.06628 2026-04-21 stat.ME math.ST stat.TH

Partial identification of principal causal effects under violations of principal ignorability

Minxuan Wu, Joseph Antonelli

Comments Corrected Figure 2: the top-row plots were inadvertently duplicated in the previous version. No changes to the related text, results, or conclusions. A few minor edits

Details
English abstract

Principal stratification is a general framework for studying causal mechanisms involving post-treatment variables. When estimating principal causal effects, the principal ignorability assumption is commonly invoked, which we study in detail in this manuscript. Our first key contribution is studying a commonly used strategy of using parametric models to jointly model the outcome and principal strata without requiring the principal ignorability assumption. We show that even if the joint distribution of principal strata is known, this strategy necessarily leads to only partial identification of causal effects, even under very simple and correctly specified outcome models. While principal ignorability leads to point identification in this setting, we discuss alternative, weaker assumptions and show how they can lead to informative partial identification regions. An additional contribution is that we provide theoretical support to strategies used in the literature for identifying association parameters that govern the joint distribution of principal strata. We prove that this is possible, but only if the principal ignorability assumption is violated. Additionally, due to partial identifiability of causal effects even when these association parameters are known, we show that these association parameters are only identifiable under strong parametric constraints. Lastly, we extend these results to more flexible semiparametric and nonparametric Bayesian models.

2410.10282 2026-04-21 stat.CO stat.ME

Exact MCMC for Intractable Proposals

Dwija Kakkad, Dootika Vats

Details
English abstract

Accept-reject based Markov chain Monte Carlo (MCMC) methods are the workhorse algorithm for Bayesian inference. These algorithms, like Metropolis-Hastings, require choosing a proposal distribution which is typically informed by the desired target distribution. Surprisingly, proposal distributions with unknown normalizing constants are not uncommon, even though for such a choice of a proposal, the Metropolis-Hastings acceptance ratio cannot be evaluated exactly. Across the literature, authors resort to approximation methods that yield inexact MCMC or develop specialized algorithms to combat this problem. We show how Bernoulli factory MCMC algorithms, originally proposed for doubly intractable target distributions, can quite naturally be adapted to yield an exact MCMC sampling method. We present three diverse and relevant examples demonstrating the usefulness of the Bernoulli factory approach to this problem.

2409.14585 2026-04-21 math.NA cs.NA math.PR stat.CO stat.ML

A convergent scheme for the Bayesian filtering problem based on the Fokker--Planck equation and deep splitting

Kasper Bågmark, Adam Andersson, Stig Larsson, Filip Rydin

Comments 22 pages, 3 figures

Details
English abstract

A numerical scheme for approximating the nonlinear filtering density is introduced and its convergence rate is established, theoretically under a parabolic Hörmander condition, and empirically in numerical examples. In a prediction step, between the noisy and partial measurements at discrete times, the scheme approximates the Fokker--Planck equation with a deep splitting scheme, followed by an exact update through Bayes' formula. This results in a classical prediction-update filtering algorithm that operates online for new observation sequences post-training. The algorithm employs a sampling-based Feynman--Kac approach, designed to mitigate the curse of dimensionality. As a corollary we obtain the convergence rate for the approximation of the Fokker--Planck equation alone, disconnected from the filtering problem. The convergence analysis is complemented by a nonlinear $10$-dimensional numerical example demonstrating the robustness of the method.

2409.10030 2026-04-21 stat.ME econ.EM stat.ML

LASSO Inference for High Dimensional Predictive Regressions

Zhan Gao, Ji Hyung Lee, Ziwei Mei, Zhentao Shi

Details
English abstract

LASSO inflicts shrinkage bias on estimated coefficients, which undermines asymptotic normality and invalidates standard inferential procedures based on the t-statistic. Given cross-sectional data, the desparsified LASSO has emerged as a well-known remedy for correcting the shrinkage bias. In the context of high dimensional predictive regression, the desparsified LASSO faces an additional challenge: the Stambaugh bias arising from nonstationary regressors modeled as local unit roots. To restore standard inference, we propose a novel estimator called IVX-desparsified LASSO (XDlasso). XDlasso simultaneously eliminates both shrinkage bias and Stambaugh bias and does not require prior knowledge about the identities of nonstationary and stationary regressors. We establish the asymptotic properties of XDlasso for hypothesis testing, and our theoretical findings are supported by Monte Carlo simulations. Applying our method to real-world applications from the FRED-MD database, we investigate two important empirical questions: (i) the predictability of U.S. stock returns based on the earnings-price ratio, and (ii) the predictability of U.S. inflation using the unemployment rate.

2409.06172 2026-04-21 stat.ME

Nonparametric Inference for Balance in Signed Networks

Xuyang Chen, Yinjie Wang, Weijing Tang

Details
English abstract

In many real-world networks, relationships often go beyond simple dyadic presence or absence; they can be positive, like friendship, alliance, and mutualism, or negative, characterized by enmity, disputes, and competition. To understand the formation mechanism of such signed networks, the social balance theory sheds light on the dynamics of positive and negative connections. In particular, it characterizes the proverbs, "a friend of my friend is my friend" and "an enemy of my enemy is my friend". In this work, we propose a nonparametric inference approach for assessing empirical evidence for the balance theory in real-world signed networks. We first characterize the generating process of signed networks with node exchangeability and propose a nonparametric sparse signed graphon model. Under this model, we construct confidence intervals for the population parameters associated with balance theory and establish their theoretical validity. Our inference procedure is as computationally efficient as a simple normal approximation but offers higher-order accuracy. By applying our method, we find strong real-world evidence for balance theory in signed networks across various domains, extending its applicability beyond social psychology.

2405.20936 2026-04-21 stat.ME

Bayesian Deep Generative Models for Multiplex Networks with Multiscale Overlapping Clusters

Yuren Zhou, Yuqi Gu, David B. Dunson

Details
English abstract

Our interest is in multiplex network data with multiple network samples observed across the same set of nodes. Examples originate from a variety of fields, including brain connectivity, international trade networks, and social networks, among others. Our goal is to infer a hierarchical structure of the nodes at a population level, while performing multi-resolution clustering of the individual replicates. To accomplish this, we propose a Bayesian hierarchical model, provide theoretical support in terms of identifiability and posterior consistency, and design efficient methods for posterior computation. We provide novel technical tools for proving model identifiability, which are of independent interest. Our proposed methodology is demonstrated through numerical simulation and an application to brain connectome data.

2307.16421 2026-04-21 math.PR math.AP stat.ML

Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm

Nabarun Deb, Young-Heon Kim, Soumik Pal, Geoffrey Schiebinger

Comments 49 pages, 2 figures, Accepted in the Annals of Probability

Details
English abstract

We prove that the sequence of marginals obtained from the iterations of the Sinkhorn algorithm or the iterative proportional fitting procedure (IPFP) on joint densities converges to an absolutely continuous curve on the $2$-Wasserstein space, as the regularization parameter $\varepsilon$ goes to zero and the number of iterations is scaled as $1/\varepsilon$ (and under other technical assumptions). This limit, which we call the Sinkhorn flow, is an example of a Wasserstein mirror gradient flow, a concept we introduce here inspired by the well-known Euclidean mirror gradient flows. In the case of Sinkhorn, the gradient is that of the relative entropy functional with respect to one of the marginals and the mirror is half of the squared Wasserstein distance functional from the other marginal. Interestingly, the norm of the velocity field of this flow can be interpreted as the metric derivative with respect to the linearized optimal transport (LOT) distance. An equivalent description of this flow is provided by the parabolic Monge-Ampère PDE whose connection to the Sinkhorn algorithm was noticed by Berman (2020). We derive conditions for exponential convergence for this limiting flow. We also construct a McKean-Vlasov diffusion whose marginal distributions follow the Sinkhorn flow.
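As background, the Sinkhorn (IPFP) iteration whose marginals the abstract studies can be sketched on a small discrete problem. This is only the textbook alternating-rescaling scheme on finite histograms, with an invented cost matrix, marginals, and a fixed $\varepsilon$; the paper itself concerns joint densities and the limit $\varepsilon \to 0$ with the iteration count scaled as $1/\varepsilon$, which this toy does not reproduce.

```python
import math

def sinkhorn(a, b, cost, eps, n_iter):
    """Alternately rescale rows and columns of K = exp(-cost/eps) to match a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Row step: make the row sums of diag(u) K diag(v) equal to a.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column step: make the column sums equal to b.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy problem: points 0, 1, 2 on a line with squared-distance cost.
pts = [0.0, 1.0, 2.0]
cost = [[(x - y) ** 2 for y in pts] for x in pts]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
plan = sinkhorn(a, b, cost, eps=2.0, n_iter=200)
row_sums = [sum(r) for r in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
```

After each column step the column marginal equals `b` exactly, while the row marginal only approaches `a` as the iteration proceeds; that intermediate sequence of marginals is the object whose scaling limit the paper analyzes.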

2307.01908 2026-04-21 stat.ME

Efficient Estimation of Average Treatment Effect on the Treated under Endogenous Treatment Assignment

Trinetri Ghosh, Jiawei Shan, Menggang Yu, Jiwei Zhao

Details
English abstract

In this paper, we consider estimation of average treatment effect on the treated (ATT), an interpretable and relevant causal estimand to policy makers when treatment assignment is endogenous. By considering shadow variables that are unrelated to the treatment assignment but related to the outcomes of interest, we establish identification of the ATT. Then we focus on efficient estimation of the ATT by characterizing the geometric structure of the likelihood, deriving the semiparametric efficiency bound for ATT estimation and proposing an estimator that can achieve this bound. We rigorously establish the theoretical results of the proposed estimator. The finite sample performance of the proposed estimator is studied through comprehensive simulation studies as well as an application to our motivating study.

2604.17144 2026-04-21 stat.ME

Statistical Validation of Computer Models: Global and Subdomain Hypothesis Testing

Chaoan Li, Xianyang Zhang, Rui Tuo

Details
English abstract

Computer simulations play an important role in scientific discovery and engineering innovation. Reliable computer models enable virtual experimentation that reduces the need for costly and time-consuming physical testing. However, the credibility of such models hinges on rigorous statistical validation against real-world data. This paper develops a formal frequentist framework for both global and subdomain validation of computer models. We propose the Fourier Maximum Modulus Test (FMMT), which leverages kernel ridge regression (KRR) to estimate the discrepancy between the computer model and the physical process, followed by a frequency-domain test based on weighted generalized Fourier coefficients. The theoretical analysis establishes the asymptotic normality of these coefficients, allowing for closed-form p-values. Simulation studies and a shear-layer experiment demonstrate that FMMT achieves high power, accurate Type I error control, and strong sensitivity to localized discrepancies.

2604.17136 2026-04-21 math.NT math.PR math.ST stat.TH

On the normality of the concatenated Fibonacci constant

José Ricardo G. Mendonça

Comments AMSart style, 18+ε pages, 8 tables, no figures, 27 references

Details
English abstract

We study the concatenated Fibonacci constant $\mathcal{F} := 0.F_{1}F_{2}F_{3}\cdots = 0.11235813\cdots$, obtained by concatenating the Fibonacci numbers in the fractional part, and ask whether it is normal. We show that several classical sufficient conditions for normality by concatenation do not apply to the Fibonacci sequence because of its exponential growth, while a criterion of Pollack and Vandehey implies that the normality of $\mathcal{F}$ in base $10$ would follow if almost all Fibonacci numbers were $(\varepsilon,k)$-normal in base $10$. The Benford bias of leading digits and the Pisano periodicity of trailing digits are shown to contribute asymptotically negligible fractions of the total digits, isolating the distribution of the deep digits of large Fibonacci numbers as the remaining obstruction. Large-scale numerical experiments on the first $500{,}000$ Fibonacci numbers in bases $10$ and $2$ indicate that global single-digit counts and $k$-block statistics for $k = 2, 3, 4$ are compatible with iid-like fluctuations at the scales tested, and that a positional decomposition concentrates the visible structured deviation at the boundaries between consecutive Fibonacci numbers, while pooled interior blocks remain close to uniform. Our computations suggest that any obstruction to normality lies in the asymptotic behavior of the deep digits of $F_{n}$.
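The kind of digit-count experiment described above can be sketched as follows. The function name and the choice of 2000 terms are illustrative assumptions (the paper reports experiments on the first 500,000 Fibonacci numbers, and its actual code is not shown here).

```python
from collections import Counter

def concatenated_fibonacci_digits(n_terms, base=10):
    """Digits of F_1 F_2 ... F_n written in `base` and concatenated."""
    digits = []
    a, b = 1, 1  # F_1, F_2
    for _ in range(n_terms):
        # Convert a to the given base, most significant digit first.
        x, chunk = a, []
        while x > 0:
            x, r = divmod(x, base)
            chunk.append(r)
        digits.extend(reversed(chunk))
        a, b = b, a + b
    return digits

digits = concatenated_fibonacci_digits(2000)
prefix = "".join(map(str, digits[:8]))  # leading digits of the constant
counts = Counter(digits)                # global single-digit counts
freqs = {d: counts[d] / len(digits) for d in range(10)}
```

Comparing `freqs` with the uniform value 1/10 gives only a rough single-digit check; the abstract's $k$-block statistics and positional decomposition would need additional bookkeeping that this sketch omits.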

2604.17130 2026-04-21 stat.ME cs.LG stat.ML

A proposal for PU classification under Non-SCAR using clustering and logistic model

Konrad Furmanczyk, Kacper Paczutkowski

Comments 12 pages, 2 figures, MDAI 25

Details
Journal ref
USB Proceedings of the 22nd International Conference on Modeling Decisions for Artificial Intelligence: MDAI 2025, Valencia, Spain 15 - 18 September, 2025 ISBN 978-91-531-0240-3
English abstract

The present study aims to investigate a cluster cleaning algorithm that is both computationally simple and capable of solving the PU classification when the SCAR condition is unsatisfied. A secondary objective of this study is to determine the robustness of the LassoJoint method to perturbations of the SCAR condition. In the first step of our algorithm, we obtain cleaning labels from 2-means clustering. Subsequently, we perform logistic regression on the cleaned data, assigning positive labels from the cleaning algorithm with additional true positive observations. The remaining observations are assigned the negative label. The proposed algorithm is evaluated by comparing 11 real data sets from machine learning repositories and a synthetic set. The findings obtained from this study demonstrate the efficacy of the clustering algorithm in scenarios where the SCAR condition is violated and further underscore the moderate robustness of the LassoJoint algorithm in this context.

2604.17067 2026-04-21 math.OC cs.LG math.ST stat.TH

Trajectory-Restricted Optimization Conditions and Geometry-Aware Linear Convergence

Faris Chaudhry, Anthea Monod, Keisuke Yano

Comments 37 pages, 2 figures

Details
English abstract

Linear convergence of first-order methods is typically characterized by global optimization conditions whose constants reflect worst-case geometry of the ambient space. In high-dimensional or structured problems, these global constants can be arbitrarily conservative and fail to capture the geometry actually encountered by optimization trajectories. In this paper, we develop a trajectory-restricted framework for linear convergence based on localized geometric regularity. We introduce restricted variants of the Polyak--Łojasiewicz inequality, error bound, and quadratic growth conditions that are required to hold only on subsets of the domain. We show that classical convergence guarantees extend under these localized conditions, and in key cases, we develop new arguments that yield explicit relationships between the corresponding constants. The resulting rates are governed by geometric quantities associated with the regions traversed by the algorithm. For polyhedral composite problems, we prove that convergence is controlled by restricted Hoffman constants corresponding to the active polyhedral faces visited along the trajectory. Once the iterates enter a well-conditioned face, the effective condition number improves accordingly. Our work provides a geometric quantification for fast local convergence after active-set or manifold identification and more broadly suggests that linear convergence is fundamentally governed by the geometry of the subsets explored by the algorithm, rather than by worst-case global conditioning.

2604.16975 2026-04-21 math.NA cs.LG cs.NA stat.ML

Convergence theory for Hermite approximations under adaptive coordinate transformations

Yahya Saleh

Details
English abstract

Recent work has shown that parameterizing and optimizing coordinate transformations using normalizing flows, i.e., invertible neural networks, can significantly accelerate the convergence of spectral approximations. We present the first error estimates for approximating functions using Hermite expansions composed with adaptive coordinate transformations. Our analysis establishes an equivalence principle: approximating a function $f$ in the span of the transformed basis is equivalent to approximating the pullback of $f$ in the span of Hermite functions. This allows us to leverage the classical approximation theory of Hermite expansions to derive error estimates in transformed coordinates in terms of the regularity of the pullback. We present an example demonstrating how a nonlinear coordinate transformation can enhance the convergence of Hermite expansions. Focusing on smooth functions decaying along the real axis, we construct a monotone transport map that aligns the decay of the target function with the Hermite basis. This guarantees spectral convergence rates for the corresponding Hermite expansion. Our analysis provides theoretical insight into the convergence behavior of adaptive Hermite approximations based on normalizing flows, as recently explored in the computational quantum physics literature.

2604.16949 2026-04-21 cs.LG eess.SP stat.ME

L1 Regularization Paths in Linear Models by Parametric Gaussian Message Passing

Yun-Peng Li, Hans-Andrea Loeliger

Details
English abstract

The paper considers the computation of L1 regularization paths in a state space setting, which includes L1 regularized Kalman smoothing, linear SVM, LASSO, and more. The paper proposes two new algorithms, which are duals of each other; the first algorithm applies to L1 regularization of independent variables while the second applies to L1 regularization of dependent variables. The heart of the proposed algorithms is parametric Gaussian message passing (i.e., Kalman-type forward-backward recursions) in the pertinent factor graphs. The proposed methods are broadly applicable, they (usually) require only matrix multiplications, and their complexity can be competitive with prior methods in some cases.

2604.16932 2026-04-21 stat.ML cs.LG

Neighbor Embedding for High-Dimensional Sparse Poisson Data

Noga Mudrik, Adam S. Charles

Details
English abstract

Across many scientific fields, measurements often represent the number of times an event occurs. For example, a document can be represented by word occurrence counts, neural activity by spike counts per time window, or online communication by daily email counts. These measurements yield high-dimensional count data that often approximate a Poisson distribution, frequently with low rates that produce substantial sparsity and complicate downstream analysis. A useful approach is to embed the data into a low-dimensional space that preserves meaningful structure, commonly termed dimensionality reduction. Yet existing dimensionality reduction methods, including both linear (e.g., PCA) and nonlinear approaches (e.g., t-SNE), often assume continuous Euclidean geometry, thereby misaligning with the discrete, sparse nature of low-rate count data. Here, we propose p-SNE (Poisson Stochastic Neighbor Embedding), a nonlinear neighbor embedding method designed around the Poisson structure of count data, using KL divergence between Poisson distributions to measure pairwise dissimilarity and Hellinger distance to optimize the embedding. We test p-SNE on synthetic Poisson data and demonstrate its ability to recover meaningful structure in real-world count datasets, including weekday patterns in email communication, research area clusters in OpenReview papers, and temporal drift and stimulus gradients in neural spike recordings.
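Both dissimilarities named in the abstract have closed forms for Poisson distributions, sketched below. How p-SNE estimates rates from counts, handles zeros, and combines these quantities in its objective is not stated in the abstract, so the rate vectors and the coordinate-wise sum here are illustrative assumptions (strictly positive rates, independent coordinates).

```python
import math

def poisson_kl(lam1, lam2):
    """KL( Poisson(lam1) || Poisson(lam2) ) for rates lam1, lam2 > 0."""
    return lam1 * math.log(lam1 / lam2) - lam1 + lam2

def poisson_hellinger(lam1, lam2):
    """Hellinger distance between Poisson(lam1) and Poisson(lam2)."""
    return math.sqrt(1.0 - math.exp(-0.5 * (math.sqrt(lam1) - math.sqrt(lam2)) ** 2))

def vector_kl(rates_a, rates_b):
    """KL between products of independent Poissons: the per-coordinate KLs add."""
    return sum(poisson_kl(a, b) for a, b in zip(rates_a, rates_b))

# Hypothetical low-rate vectors for two observations.
x = [0.2, 1.5, 0.1]
y = [0.3, 0.5, 0.1]
d_xy, d_yx = vector_kl(x, y), vector_kl(y, x)  # KL is not symmetric
```

Unlike squared Euclidean distance, the KL term scales differences by the rate itself, so a gap of 0.1 between two low-rate coordinates is weighted differently from the same gap between two high-rate coordinates.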

2604.16900 2026-04-21 stat.AP

Analyzing Process Data from Computer-Based Assessments: A Tutorial on Preprocessing, Feature Extraction, and Model-Based Inference

Daeun Hwangbo, Junyeong Park, Minjeong Jeon, Ick Hoon Jin

Details
English abstract

Computer-based assessments routinely generate detailed interaction logs -- commonly referred to as process data -- that record every action a respondent performs during task completion, yet systematic preprocessing guidance, integrated analytical workflows, and cross-method consistency checks remain scarce in the literature. This paper provides a unified, end-to-end analytical framework for analyzing process data from large-scale assessments -- covering the full pipeline from raw log preprocessing to model-based inference -- using the Programme for the International Assessment of Adult Competencies (PIAAC) Problem Solving in Technology-Rich Environments (PS-TRE) domain as an illustrative example. We first present a systematic preprocessing pipeline -- including timestamp correction, duplicate removal, action block consolidation, and LLM-assisted standardization -- that transforms raw event-level logs into analysis-ready action sequences. We then review and demonstrate two complementary families of analytical methods. The first consists of feature-based methods and their downstream applications, including descriptive process indicators, n-gram analysis with TF--IDF weighting, multidimensional scaling, and process data-informed differential item functioning (DIF) analysis. The second consists of model-based approaches, namely hidden Markov models and the subtask identification procedure. Empirical illustrations using the United States sample show that n-gram-based behavioral clusters carry differential diagnostic information primarily among incorrect respondents, that multidimensional scaling-derived features comprehensively reconstruct observed behavioral variables, and that process-informed DIF analyses can identify and mitigate construct-irrelevant sources of group differences. Reproducible R code implementations are provided for all major techniques.
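The n-gram step with TF--IDF weighting can be sketched on toy action sequences. The action labels are invented, and the weighting uses one common variant (relative term frequency times $\log(N/\mathrm{df})$); the tutorial's own R implementation and its exact weighting scheme are not reproduced here.

```python
import math
from collections import Counter

def ngrams(seq, n):
    """All contiguous length-n windows of an action sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def tfidf(sequences, n=2):
    """Per-respondent TF-IDF weights over action n-grams.

    Uses tf = relative frequency within a sequence and idf = log(N / df);
    other TF-IDF variants exist.
    """
    grams = [Counter(ngrams(s, n)) for s in sequences]
    n_docs = len(sequences)
    df = Counter(g for c in grams for g in c)  # number of sequences containing g
    out = []
    for c in grams:
        total = sum(c.values())
        out.append({g: (k / total) * math.log(n_docs / df[g]) for g, k in c.items()})
    return out

# Hypothetical action sequences; the action labels are invented for illustration.
logs = [
    ["START", "OPEN_MAIL", "SORT", "NEXT"],
    ["START", "OPEN_MAIL", "NEXT"],
    ["START", "SEARCH", "OPEN_MAIL", "NEXT"],
]
weights = tfidf(logs, n=2)
```

N-grams that appear in every sequence receive weight zero under this variant, so the resulting features emphasize behaviors that distinguish respondents from one another.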

2604.16894 2026-04-21 cs.LG stat.ME stat.ML

Covariance-Based Structural Equation Modeling in Small-Sample Settings with $p>n$

Hiroki Hasegawa, Aoba Tamura, Yukihiko Okada

Comments 31 pages, 7 figures and 7 tables

Details
English abstract

Factor-based Structural Equation Modeling (SEM) relies on likelihood-based estimation assuming a nonsingular sample covariance matrix, which breaks down in small-sample settings with $p>n$. To address this, we propose a novel estimation principle that reformulates the covariance structure into self-covariance and cross-covariance components. The resulting framework defines a likelihood-based feasible set combined with a relative error constraint, enabling stable estimation of sign and direction in small-sample settings where $p>n$. Experiments on synthetic and real-world data show improved stability, particularly in recovering the sign and direction of structural parameters. These results extend covariance-based SEM to small-sample settings and provide practically useful directional information for decision-making.

2604.16865 2026-04-21 stat.ML cs.LG math.PR

Extraction of informative statistical features in the problem of forecasting time series generated by Itô-type processes

Victor Korolev, Mikhail Ivanov, Tatiana Kukanova, Artyom Rukavitsa, Alexander Vakshin, Peter Solomonov, Alexander Zeifman

Details
English abstract

In this paper, we consider the problem of extracting the most informative features from time series that are regarded as observed values of stochastic processes satisfying Itô stochastic differential equations with unknown random drift and diffusion coefficients. We do not draw on any additional information and use only the information contained in the time series itself. Therefore, as additional features, we use the parameters of statistically adjusted mixture-type models of the observed regularities in the behavior of the time series. Several algorithms for constructing these parameters are discussed. These algorithms are based on statistical reconstruction of the coefficients, which, in turn, is based on statistical separation of normal mixtures. We obtain two types of parameters by the techniques of uniform and non-uniform statistical reconstruction of the coefficients of the underlying Itô process. The reconstructed coefficients obtained by uniform techniques do not depend on the current value of the process, while the non-uniform techniques reconstruct the coefficients taking into account their dependence on the value of the process. In effect, the non-uniform techniques used in this paper represent a stochastic analog of the Taylor expansion for the time series. The efficiency of the obtained additional features is compared by using them in autoregressive algorithms for time series prediction. To obtain a clean conclusion that is not affected by unwanted factors, for example those related to a particular choice of architecture for neural network prediction methods, we used only simple autoregressive algorithms. We show that the use of additional statistical features improves the prediction.

2604.16809 2026-04-21 stat.ML cs.LG math.OC

A Mechanism Study of Delayed Loss Spikes in Batch-Normalized Linear Models

Peifeng Gao, Wenyi Fang, Yang Zheng, Difan Zou

Details
English abstract

Delayed loss spikes have been reported in neural-network training, but existing theory mainly explains earlier non-monotone behavior caused by overly large fixed learning rates. We study one stylized hypothesis: normalization can postpone instability by gradually increasing the effective learning rate during otherwise stable descent. To test this hypothesis at theorem level, we analyze batch-normalized linear models. Our flagship result concerns whitened square-loss linear regression, where we derive explicit no-rising-edge and delayed-onset conditions, bound the waiting time to directional onset, and show that the rising edge self-stabilizes within finitely many iterations. Combined with a square-loss decomposition, this yields a concrete delayed-spike mechanism in the whitened regime. For logistic regression, under highly restrictive active-margin assumptions, we prove only a supporting finite-horizon directional precursor in a knife-edge regime, with an optional appendix-only loss lower bound under an extra non-degeneracy condition. The paper should therefore be read as a stylized mechanism study rather than a general explanation of neural-network loss spikes. Within that scope, the results isolate one concrete delayed-instability pathway induced by batch normalization.

2604.16714 2026-04-21 cs.LG stat.CO stat.ML

How to Approximate Inference with Subtractive Mixture Models

Lena Zellinger, Nicola Branchini, Lennert De Smet, Víctor Elvira, Nikolay Malkin, Antonio Vergari

Comments Accepted version at AISTATS 2026

Abstract

Classical mixture models (MMs) are widely used tractable proposals for approximate inference settings such as variational inference (VI) and importance sampling (IS). Recently, mixture models with negative coefficients, called subtractive mixture models (SMMs), have been proposed as a potentially more expressive alternative. However, how to effectively use SMMs for VI and IS is still an open question as they do not provide latent variable semantics and therefore cannot use sampling schemes for classical MMs. In this work, we study how to circumvent this issue by designing several expectation estimators for IS and learning schemes for VI with SMMs, and we empirically evaluate them for distribution approximation. Finally, we discuss the additional challenges in estimation stability and learning efficiency that they carry and propose ways to overcome them. Code is available at: https://github.com/april-tools/delta-vi.

2604.16671 2026-04-21 stat.ME

Multi-Experiment Analysis

Reza Hosseini

Abstract

Online controlled experiments face growing challenges from overlapping tests on shared traffic, where interactions between concurrent experiments obscure insights into feature combinations and produce effect estimates that do not correspond to any actionable launch scenario. While traffic splitting, layering, and sequential (non-concurrent) execution mitigate some of these issues, they require coordination overhead and can reduce experimentation velocity. We propose Multi-Experiment Analysis (MEA), a methodology for consistent joint estimation in the presence of arbitrary partial or full overlaps and multiple variants. MEA produces three types of estimates: (1) corrected individual treatment effects that account for the presence of overlapping experiments, (2) combined effects of launching any desired combination of variants across experiments, and (3) conditional effects of an experiment's variant given that specific variants of other experiments are launched or deramped, all without requiring factorial pre-design or traffic restrictions. We validate the approach through comprehensive simulations confirming consistency and correct coverage. We report on production deployment at scale, illustrate the methodology through real-world use cases, and share practical lessons learned, including system design, adoption patterns, and insights from production use.

2604.16661 2026-04-21 math.ST stat.ME stat.TH

Horseshoe Predictive Inference

Percy S. Zhai, Veronika Ročková

Abstract

Predictive inference in the sparse Gaussian sequence model has received considerably less attention than its non-sparse, finite-sample counterpart. Existing work has largely been confined to discrete mixture priors. In this paper, we study predictive inference under a widely used continuous mixture prior, the Horseshoe. We provide new theoretical results establishing exact asymptotic minimax optimality of the predictive Bayes estimator when the sparsity level is known. Furthermore, through a Gaussian-mixture representation of the posterior predictive density (which we term Horseshoe spectroscopy), the phase-transition in the local shrinkage scale is inherited by the predictive mechanism, producing behavior similar to that of previous thresholding/switching estimators. When sparsity is unknown, we adopt a fully Bayesian approach using a hierarchical Horseshoe prior and show that it performs adaptive, as opposed to manual, switching. Under a theta-min condition, the resulting predictive risk admits an upper bound over a restricted parameter class that is sharper than the minimax rate over the full class. We demonstrate the practical value of predictive Horseshoe shrinkage on data such as images and time series that can be naturally modeled as sparse Gaussian sequences. We illustrate this approach on facial recognition across varying facial expressions and study region-wise atypical brain lateralization in autism spectrum disorder.

2604.16610 2026-04-21 stat.ML cs.LG

Fairness Constraints in High-Dimensional Generalized Linear Models

Yixiao Lin, James Booth

Abstract

Machine learning models often inherit biases from historical data, raising critical concerns about fairness and accountability. Conventional fairness interventions typically require access to sensitive attributes like gender or race, but privacy and legal restrictions frequently limit their use. To address this challenge, we propose a framework that infers sensitive attributes from auxiliary features and integrates fairness constraints into model training. Our approach mitigates bias while preserving predictive accuracy, offering a practical solution for fairness-aware learning. Empirical evaluations validate its effectiveness, contributing to the advancement of more equitable algorithmic decision-making.

2604.16537 2026-04-21 stat.ME cs.AI stat.AP

Robustifying and Selecting Cohort-Appropriate Prognostic Models under Distributional Shifts

Dimitris Bertsimas, Carol Gao, Angelos G. Koulouras, Georgios Antonios Margonis

Abstract

External validation is widely regarded as the gold standard for prognostic model evaluation. In this study, we challenge the assumption that successful external calibration guarantees model generalizability and propose two complementary strategies to improve transportability of prognostic models across cohorts. Using six real-world surgical cohorts from tertiary academic centers, we tested whether successful external calibration depends largely on similarity in covariates and outcomes between training and validation cohorts, quantified using Kullback-Leibler (KL) divergence, with calibration assessed by the Integrated Calibration Index (ICI). From the model-developer's perspective, we trained the "best-on-average" prognostic model by tuning toward a meta-analysis-derived covariate and outcome distribution as an approximation of the broader target population. From the end-user perspective, we proposed a simple measure for cohort outcome similarity to identify, among published models, the one most suitable for a given target cohort in terms of both calibration and clinical utility. External calibration worsened as distributional mismatch increased. Higher KL divergence was associated with higher ICI in both surgery-alone (Spearman $ρ=0.614$, $p=0.004$) and surgery + adjuvant chemotherapy cohorts (Spearman $ρ=0.738$, $p<0.001$). Meta-analysis-informed weighting improved calibration in most settings without materially affecting discrimination, with the clearest benefit when evaluated on the aggregated external population ($p=0.037$). Models developed in more similar cohorts achieved lower ICI in surgery-alone (Spearman $ρ=0.803$, $p<0.001$) and surgery + adjuvant chemotherapy cohorts (Spearman $ρ=0.737$, $p<0.001$), and provided greater clinical utility on DCA.

2604.16464 2026-04-21 stat.AP cs.LG

Horizon-Aware Forecasting of Passenger Assistance Demand for Rail Station Workforce Planning

Michael Sheehan, Irina Timoshenko

Comments 26 pages, 6 figures, 3 tables

Abstract

Passenger assistance services are essential for accessible rail travel, yet demand varies substantially across stations and over time, creating challenges for workforce planning and staff rostering. This paper presents a data-driven decision support framework for forecasting station-level passenger assistance demand and translating forecasts into workforce plans. The forecasting component applies a horizon-aware Prophet modelling approach using multi-source operational data, while the planning component maps demand forecasts to staffing requirements under service and operational constraints through an interpretable red-amber-green risk framework. The approach has been implemented within a production-grade system to support routine planning and staffing decisions across LNER-managed stations. Results demonstrate improved forecast accuracy relative to year-on-year baseline methods, with absolute error reduced by up to 76.9%, and show that forecast-informed staffing is associated with an approximate 50% reduction in failed passenger assistance deliveries attributable to staff availability. These findings highlight the value of integrating interpretable forecasting with operational work.

2604.16453 2026-04-21 cs.LG cs.AI stat.ML

Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo

Jelena Markovic-Voronov, Wenhui Zhu, Bo Long, Zhipeng Wang, Suyash Gupta, Kayhan Behdin, Bee-Chung Chen, Deepak Agarwal

Abstract

We introduce a principled probabilistic framework for reward-guided decoding in large language models, addressing the limitations of standard decoding methods that optimize token-level likelihood rather than sequence-level quality. Our method defines a reward-augmented target distribution over complete sequences by combining model transition probabilities with prefix-dependent reward potentials. Importantly, the approach is training-free: it leaves model weights unchanged and instead modifies the inference distribution via reward potentials, with all gains arising purely from inference-time sampling. To sample from this distribution, we develop Sequential Monte Carlo algorithms, including a computationally efficient prefix-only variant and a lookahead variant whose intermediate targets match the exact marginals of the full sequence distribution. The framework also integrates resample-move updates with Metropolis-Hastings rejuvenation and supports block-wise generation, subsuming common decoding strategies such as temperature sampling and power-tempered objectives. Empirical results across three 7B models show significant gains. On code generation (HumanEval), our method improves base performance by up to 54.9% and surpasses the strongest sampling baselines by 9.1%-15.3%. On mathematical reasoning (MATH500), it achieves gains of up to 8.8%. Notably, it reaches 87.8% on HumanEval and 78.4% on MATH500 with Qwen2.5-7B, consistently outperforming the reinforcement learning method GRPO.

2604.16435 2026-04-21 eess.SP cs.IT math.IT math.ST stat.TH

Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals

Mengchu Xu, Jian Wang, Yonina C. Eldar

Comments 15 pages, 4 figures; submitted to IEEE TSP

Abstract

Sparse Canonical Correlation Analysis (SCCA) is a fundamental statistical tool for identifying linear relationships in high-dimensional, multi-view data. While minimax theory establishes an optimal sample complexity scaling additively with the sparsity levels of the canonical vectors, computationally efficient algorithms typically suffer from a suboptimal multiplicative dependence. This computational-statistical gap is intrinsically tied to worst-case "flat" signal assumptions. In practice, however, multi-view signals frequently exhibit structured energy concentration, such as a power-law decay. To exploit this structural concentration and bypass the worst-case bottleneck, we propose Bilateral Spectral Energy Pursuit (Bi-SEP). Operating directly on the cross-covariance matrix, Bi-SEP is a stagewise adaptive algorithm that utilizes a proxy refinement step to dynamically track and capture cross-view signal energy. Theoretically, we establish a profile-adaptive sample complexity bound governed by the coupled energy profiles of the two views. Notably, under power-law decay models, we reveal a synergistic phase transition: the optimal linear sample complexity is attainable provided that the aggregate decay rate of the two views is sufficiently large. This result demonstrates that a highly concentrated signal in one view allows the model to accommodate a completely flat signal in its partner. Numerical experiments validate our theoretical findings, illustrating the advantages of Bi-SEP in structured, non-flat signal regimes.

2604.16428 2026-04-21 cs.LG cs.AI stat.ML

Non-Stationarity in the Embedding Space of Time Series Foundation Models

Jinmyeong Choi, Brad Shook, Artur Dubrawski

Comments 17 pages, 7 figures

Abstract

Time series foundation models (TSFMs) are widely used as generic feature extractors, yet the notion of non-stationarity in their embedding spaces remains poorly understood. Recent work often conflates non-stationarity with distribution shift, blurring distinctions fundamental to classical time-series analysis and long-standing methodologies such as statistical process control (SPC). In SPC, non-stationarity signals a process leaving a stable regime - via shifts in mean, variance, or emerging trends - and detecting such departures is central to quality monitoring and change-point analysis. Motivated by this diagnostic tradition, we study how different forms of distributional non-stationarity - mean shifts, variance changes, and linear trends - become linearly accessible in TSFM embedding spaces under controlled conditions. We further examine temporal non-stationarity arising from persistence, which reflects violations of weak stationarity due to long-memory or near-unit-root behavior rather than explicit distributional shifts. By sweeping shift strength and probing multiple TSFMs, we find that embedding-space detectability of non-stationarity degrades smoothly and that different models exhibit distinct, model-specific failure modes.

2604.14949 2026-04-21 stat.ML cs.LG

Unsupervised feature selection using Bayesian Tucker decomposition

Y-h. Taguchi, Yoh-ichi Mototake

Comments 24 pages, 10 figures, to appear in Neural Computation

Abstract

In this paper, we propose Bayesian Tucker decomposition (BTuD), in which the residual is assumed to follow a Gaussian distribution, analogous to linear regression. Although we propose an algorithm to perform BTuD, the conventional higher-order orthogonal iteration can generate a Tucker decomposition consistent with the present implementation. Using the proposed BTuD, we perform unsupervised feature selection, which is successfully applied to various synthetic datasets, globally coupled maps with randomized coupling strength, and gene expression profiles. We therefore conclude that the proposed unsupervised feature selection method is promising. In addition, BTuD-based unsupervised FE is expected to coincide with the TD-based unsupervised FE that was previously proposed and successfully applied to a wide range of problems.

2604.03337 2026-04-21 cs.CV stat.AP

Significance and Stability Analysis of Gene-Environment Interaction using RGxEStat

Meng'en Qin, Zhe Li, Xiaohui Yang

Abstract

Genotype-by-Environment (GxE) interactions influence the performance of genotypes across diverse environments, reducing the predictability of phenotypes in target environments. In-depth analysis of GxE interactions facilitates the identification of how genetic advantages or defects are expressed or suppressed under specific environmental conditions, thereby enabling genetic selection and enhancing breeding practices. This paper introduces two key models for GxE interaction research. Specifically, it includes significance analysis based on the mixed effect model to determine whether genes or GxE interactions significantly affect phenotypic traits; stability analysis, which further investigates the interactive relationships between genes and environments, as well as the relative superiority or inferiority of genotypes across environments. Additionally, this paper presents RGxEStat, a lightweight interactive tool, which is developed by the authors and integrates the construction, solution, and visualization of the aforementioned models. Designed to eliminate the need for breeders and agronomists to learn complex SAS or R programming, RGxEStat provides a user-friendly interface for streamlined breeding data analysis, significantly accelerating research cycles. Codes and datasets are available at https://github.com/mason-ching/RGxEStat.

2604.01502 2026-04-21 stat.ML cs.LG

Conformal Risk Control under Non-Monotone Losses: Theory and Finite-Sample Guarantees

Tareq Aldirawi, Yun Li, Wenge Guo

Comments 39 pages, 6 figures, 3 tables

Abstract

Conformal risk control (CRC) provides distribution-free guarantees for controlling the expected loss at a user-specified level. Existing theory typically assumes that the loss decreases monotonically with a tuning parameter that governs the size of the prediction set. However, this assumption is often violated in practice, where losses may behave non-monotonically due to competing objectives such as coverage and efficiency. In this paper, we study CRC under non-monotone loss functions when the tuning parameter is selected from a finite grid, a setting commonly arising in thresholding and discretized decision rules. Revisiting a known counterexample, we show that the validity of CRC without monotonicity depends critically on the relationship between the calibration sample size and the grid resolution. In particular, reliable risk control can still be achieved when the calibration sample is sufficiently large relative to the grid size. We establish a finite-sample guarantee for bounded losses over a grid of size $m$, showing that the excess risk above the target level $α$ scales on the order of $\sqrt{\log(m)/n}$, where $n$ is the calibration sample size. A matching lower bound demonstrates that this rate is minimax optimal. We also derive refined guarantees under additional structural conditions, including Lipschitz continuity and monotonicity, and extend the analysis to settings with distribution shift via importance weighting. Numerical experiments on synthetic multilabel classification and real object detection data illustrate the practical implications of non-monotonicity. Methods that explicitly account for finite-sample uncertainty achieve more stable risk control than approaches based on monotonicity transformations, while maintaining competitive prediction set sizes.
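
To make the stated rate concrete, the following minimal Python sketch evaluates only the order-of-magnitude term $\sqrt{\log(m)/n}$ from the abstract. The function name `excess_risk_scale` is hypothetical, and the constants and exact form of the paper's bound are not given in the abstract, so the numbers indicate scaling behaviour only.

```python
import math


def excess_risk_scale(grid_size: int, n_calibration: int) -> float:
    """Return sqrt(log(m) / n), the scaling term quoted in the abstract.

    Assumption: constants are omitted, so this is not the paper's actual bound.
    """
    if grid_size < 2 or n_calibration < 1:
        raise ValueError("need grid_size >= 2 and n_calibration >= 1")
    return math.sqrt(math.log(grid_size) / n_calibration)


# Quadrupling the calibration sample halves the term; a finer grid raises it slowly.
for n in (100, 400, 1600):
    print(n, round(excess_risk_scale(50, n), 4))
```

Under this scaling, the grid size enters only logarithmically, which is consistent with the abstract's remark that risk control remains reliable when the calibration sample is large relative to the grid.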

2603.24647 2026-04-21 cs.LG stat.ML

Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch

Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, Arber Zela

Abstract

The autoresearch repository enables an LLM agent to optimize hyperparameters by editing training code directly. We use it as a testbed to compare classical HPO algorithms against LLM-based methods on tuning the hyperparameters of a small language model under a fixed compute budget. When defining a fixed search space over autoresearch, classical methods such as CMA-ES and TPE consistently outperform LLM-based agents, where avoiding out-of-memory failures matters more than search diversity. Allowing the LLM to directly edit source code narrows the gap to the classical methods but does not close it, even with frontier models available at the time of writing such as Claude Opus 4.6 and Gemini 3.1 Pro Preview. We observe that LLMs struggle to track optimization state across trials. In contrast, classical methods lack the domain knowledge of LLMs. To combine the strengths of both, we introduce Centaur, a hybrid that shares CMA-ES's interpretable internal state, including mean vector, step-size, and covariance matrix, with an LLM. Centaur achieves the best result in our experiments, and a 0.8B LLM already suffices to outperform all classical and pure LLM methods. Unconstrained code editing requires larger models to be competitive with classical methods. We further analyze search diversity, model scaling from 0.8B to frontier models, and ablate the fraction of LLM-proposed trials in Centaur. All in all, our results suggest that LLMs are most effective as a complement to classical optimizers, not as a replacement. Code is available at https://github.com/ferreirafabio/autoresearch-automl & interactive demo at https://ferreirafabio.github.io/autoresearch-automl.

2512.13400 2026-04-21 econ.EM stat.ML

Profit-Aligned CATE Estimation: Reconciling Policy Learning and Inference

Artem Timoshenko, Caio Waisman

Abstract

We propose a framework that aligns Conditional Average Treatment Effect (CATE) estimation with profit maximization. Our method recognizes that, for customers with extreme treatment effects, additional estimation accuracy is unlikely to change the recommended actions. In contrast, accuracy is critical near the decision boundary, where treatment effects are close to treatment costs. Our approach optimizes a novel objective function that concentrates learning capacity along this boundary. The proposed objective is Fisher consistent with respect to the original profit function and yields a consistent estimator for CATEs. Theoretically, our framework unifies standard plug-in optimization and direct policy optimization as limiting cases of the same optimization problem. We further show that entropy-regularized policy optimization is a special case of our framework. This result has a direct practical implication: firms can recover consistent CATE estimates from existing profit-maximization pipelines. We use synthetic data to demonstrate how the proposed framework allows firms to explicitly navigate the trade-off between global prediction accuracy and profit maximization.

2511.22003 2026-04-21 stat.ML cs.LG stat.ME

A Sensitivity Approach to Causal Inference Under Limited Overlap

Yuanzhe Ma, Yian Huang, Hongseok Namkoong

Abstract

Limited overlap between treated and control groups is a key challenge in observational analysis. Standard approaches like trimming importance weights can reduce variance but introduce a fundamental bias. We propose a sensitivity framework for contextualizing findings under limited overlap, where we assess how irregular the outcome function has to be in order for the main finding to be invalidated. Our approach is based on worst-case confidence bounds on the bias introduced by standard trimming practices, under explicit assumptions necessary to extrapolate counterfactual estimates from regions of overlap to those without. Empirically, we demonstrate how our sensitivity framework protects against spurious findings by quantifying uncertainty in regions with limited overlap.

2511.09872 2026-04-21 math.NA cs.NA stat.ML

Randomized batch-sampling Kaczmarz methods for solving linear systems

Dong-Yue Xie, Xi Yang

Abstract

To conduct a more in-depth investigation of randomized solvers for linear systems, we adopt a unified randomized batch-sampling Kaczmarz framework with per-iteration costs as low as those of cyclic block methods, and develop a general analysis technique to establish its convergence guarantee. With concentration inequalities, we derive new expected linear convergence rate bounds. The analysis applies to any randomized non-extended block Kaczmarz method with arbitrary static stochastic sampling. In addition, the new rate bounds are scale-invariant, which eliminates the dependence on the magnitude of the data matrix. In most experiments, the new bounds are significantly tighter than existing ones and better reflect the empirical convergence behavior of block methods. Within this new framework, the batch-sampling distribution, as a learnable parameter, provides the possibility for block methods to achieve efficient performance in specific application scenarios, which deserves further investigation.
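
As a rough illustration of what a batch-sampling Kaczmarz iteration can look like, the Python sketch below averages the single-row Kaczmarz projections over a uniformly sampled batch of rows. This is one well-known block variant chosen for illustration, not necessarily the authors' framework; the function name `averaged_batch_kaczmarz`, the uniform sampling, and the small test system are all assumptions.

```python
import random


def averaged_batch_kaczmarz(A, b, batch_size, iters, seed=0):
    """Averaged block Kaczmarz for a consistent system A x = b (illustrative sketch).

    Each step samples `batch_size` distinct rows uniformly at random and moves x
    by the average of the corresponding single-row Kaczmarz projections.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(A), len(A[0])
    row_sq_norms = [sum(v * v for v in row) for row in A]
    x = [0.0] * n_cols
    for _ in range(iters):
        batch = rng.sample(range(n_rows), batch_size)
        step = [0.0] * n_cols
        for i in batch:
            residual = b[i] - sum(a * xj for a, xj in zip(A[i], x))
            scale = residual / row_sq_norms[i]  # normalising by the row norm makes the step scale-free
            for j in range(n_cols):
                step[j] += scale * A[i][j]
        x = [xj + sj / batch_size for xj, sj in zip(x, step)]
    return x


# Hypothetical consistent system whose solution is x = (1, 2).
A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]
x_hat = averaged_batch_kaczmarz(A, b, batch_size=2, iters=500)
print(x_hat)
```

Because every row update is divided by that row's squared norm, rescaling a row of the system leaves the iterates unchanged, which gives some intuition for why scale-invariant rate bounds are natural for this family of methods.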

2511.03951 2026-04-21 math.ST stat.TH

A unified approach to the Behrens-Fisher problem

Nagananda K G, Jong Sung Kim

Comments 22 pages, 2 figures

Abstract

A unified framework is presented to study the two-sample Behrens-Fisher problem (testing equality of means when two normal populations have unequal, unknown variances), and a compact expression is derived for the null distribution of the classical test statistic. Our new approach involves a Mellin-Barnes factorization that decouples the square root of a weighted sum of independent chi-square variates, thereby collapsing a challenging two-dimensional integral to a tractable single-contour integral. Closing the contour yields a residue series that terminates whenever either sample's degrees of freedom is odd. A complementary Euler-Beta reduction identifies the density as a Gauss hypergeometric function with explicit parameters, yielding a numerically stable form that recovers Student's $t$ under equal variances. Ramanujan's master theorem supplies exact inverse-power tail coefficients, which bound Lugannani-Rice saddle-point approximation errors and support reliable tail analyses. The proposed framework reveals why hypergeometric structure appears, why certain finite-sum cases arise, and how one can pass from the bulk of the distribution to its tails without altering the analytic framework. Finally, it lets us tabulate exact two-sided critical values over a broad grid of sample sizes and variance ratios that reveal the parameter surface on which the well-known Welch approximation switches from conservative to liberal, quantifying its maximum size distortion.
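
For orientation, the classical two-sample statistic and the Welch-Satterthwaite degrees of freedom mentioned in the abstract can be computed as in the Python sketch below. It shows the standard textbook approximation only, not the exact null distribution derived in the paper; the function name `welch_statistic` and the sample data are hypothetical.

```python
import math
from statistics import mean, variance


def welch_statistic(x, y):
    """Return (t, df): the unequal-variance t statistic and Welch-Satterthwaite df."""
    n1, n2 = len(x), len(y)
    se1 = variance(x) / n1  # sample variance (n - 1 denominator) divided by sample size
    se2 = variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples with equal sizes and equal sample variances.
t, df = welch_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(t, df)
```

With equal sample sizes and equal sample variances, the approximate degrees of freedom reduce to $2(n-1)$, matching the pooled Student's $t$ case that the abstract says its exact form recovers under equal variances.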

2511.03535 2026-04-21 math.ST stat.TH

Asymptotics of the maximum likelihood estimator of the location parameter of Pearson Type VII distribution

Kazuki Okamura

Comments 32 pages, Simulation results added, Exposition modified, to appear in Sankhya A

Abstract

We study the maximum likelihood estimator of the location parameter of the Pearson Type VII distribution with known scale. We rigorously establish precise asymptotic properties such as strong consistency, asymptotic normality, Bahadur efficiency and asymptotic variance of the maximum likelihood estimator. Our focus is the heavy-tailed case, including the Cauchy distribution. The main difficulty lies in the fact that the likelihood equation may have multiple roots; nevertheless, the maximum likelihood estimator performs well for large samples.

2510.22341 2026-04-21 stat.AP q-fin.TR

Understanding Carbon Trade Dynamics: A European Union Emissions Trading System Perspective

Avirup Chakraborty

Abstract

The European Union Emissions Trading System (EU ETS), the world's first and largest cap-and-trade carbon market, is a cornerstone of EU climate policy. This study provides a comprehensive empirical analysis of the EU carbon market's efficiency, price dynamics, and structural network from 2010 to 2020. First, we identify significant price clustering and short-term return predictability using an AR-GARCH model, achieving around 60 percent directional accuracy and an 80 percent hit rate within forecasted confidence intervals. These observed patterns motivate a deeper exploration of market structure. Second, leveraging this insight, a weighted network analysis of inter-country transactions uncovers a concentrated market where a few registries dominate high-value flows and exert disproportionate influence. Finally, building upon the network findings, country-specific log-log regressions of price on traded quantity reveal heterogeneous and sometimes counter-intuitive elasticities; in several cases, positive elasticities exceed unity, indicating that trading volumes rise with prices, a deviation from conventional demand behavior that highlights potential inefficiencies driven by speculation, strategic behavior, or policy distortions. Collectively, these results point to persistent inefficiencies within the EU ETS, including partial predictability, asymmetric market power, and anomalous price-volume relationships, implying that while the system has driven decarbonization, its trading and pricing mechanisms remain imperfect.

2510.12916 2026-04-21 stat.ML cs.LG

Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space

Giosue Migliorini, Padhraic Smyth

Abstract

Systems of interacting continuous-time Markov chains are a powerful model class, but inference is typically intractable in high dimensional settings. Auxiliary information, such as noisy observations, is typically only available at discrete times, and incorporating it via a Doob's $h$-transform gives rise to an intractable posterior process that requires approximation. We introduce Latent Interacting Particle Systems, a model class parameterizing the generator of each Markov chain in the system. Our inference method involves estimating look-ahead functions (twist potentials) that anticipate future information, for which we introduce an efficient parameterization. We incorporate this approximation in a twisted Sequential Monte Carlo sampling scheme. We demonstrate the effectiveness of our approach on a challenging posterior inference task for a latent SIRS model on a graph, and on a neural model for wildfire spread dynamics trained on real data.

2509.09773 2026-04-21 stat.ME math.ST stat.TH

Optimal Inference of the Mean Outcome under Optimal Treatment Regime

Shuoxun Xu, Xinzhou Guo

Comments 17 pages, 5 figures

详情
英文摘要

When an optimal treatment regime (OTR) is considered, we need to evaluate the OTR in a valid and efficient way. The classical inference applied to the mean outcome under OTR, assuming the OTR is the same as the estimated OTR, might be biased when the regularity assumption that the OTR is unique is violated. Although several methods have been proposed to allow nonregularity in such inference, their optimality is unclear due to challenges in deriving semiparametric efficiency bounds under potential nonregularity. In this paper, we address the bias issue via adaptive smoothing over the estimated OTR and develop a valid inference procedure on the mean outcome under OTR regardless of whether regularity is satisfied. We establish the optimality of the proposed method by deriving a lower bound of the asymptotic variance for the robust asymptotically linear unbiased estimator of the mean outcome under OTR and showing that our proposed estimator achieves the variance lower bound. The considered estimator class is general and the derived variance lower bound paves a novel way to establish efficiency optimality theories for OTR in a more general scenario allowing nonregularity. The merit of the proposed method is demonstrated by re-analyzing the ACTG 175 trial.

2508.17412 2026-04-21 cs.LG cs.AI stat.ML

A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization

Dongseok Kim, Gisung Oh

Comments Substantially revised and reorganized version with a new title, updated framing, and new experiments; the core idea of the work remains unchanged

Abstract

Conventional regularization is designed to control variance, but in small-data regression it can also aggravate underfitting when predictive signal is concentrated in weak directions of a restricted representation. We study a negative-capable ridge family that permits a feasible negative region whenever the estimator remains well posed, and show that negative regularization acts there as controlled anti-shrinkage by increasing effective complexity most strongly along weak eigendirections. Building on this mechanism, we formalize weak-spectrum underfitting, derive a sign-switch result under conservative baseline shrinkage, and study criterion-based automatic selection over the full negative-capable family. Synthetic and semi-synthetic experiments support the theory by verifying feasibility, spectral complexity increase, sign-switch behavior, and effective recovery of negative adjustments in the predicted regimes.
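
The anti-shrinkage mechanism can be illustrated with standard ridge algebra: along an eigendirection of $X^\top X$ with eigenvalue $s_i$, the ridge estimate equals the least-squares component multiplied by $s_i/(s_i+\lambda)$, which exceeds one when $-s_{\min} < \lambda < 0$. The Python sketch below assumes plain ridge regression rather than the paper's full negative-capable family, and the eigenvalues and function names are hypothetical.

```python
def ridge_shrinkage_factors(eigenvalues, lam):
    """Per-eigendirection factors s / (s + lam) of ridge relative to least squares.

    Assumption: plain ridge (X^T X + lam * I)^{-1} X^T y, which stays well posed
    only when lam is larger than minus the smallest eigenvalue of X^T X.
    """
    if lam <= -min(eigenvalues):
        raise ValueError("lam must exceed -min(eigenvalues) for a well-posed estimator")
    return [s / (s + lam) for s in eigenvalues]


def effective_dof(eigenvalues, lam):
    """Effective degrees of freedom: the sum of the shrinkage factors."""
    return sum(ridge_shrinkage_factors(eigenvalues, lam))


eigs = [10.0, 1.0, 0.5]  # hypothetical spectrum with one strong and two weak directions
for lam in (0.25, 0.0, -0.25):
    factors = [round(f, 3) for f in ridge_shrinkage_factors(eigs, lam)]
    print(lam, factors, round(effective_dof(eigs, lam), 3))
```

In this toy spectrum a negative penalty barely changes the strong direction but doubles the weakest one, in line with the abstract's claim that effective complexity increases most along weak eigendirections.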

2506.10060 2026-04-21 cs.LG cs.AI stat.ML

Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems

Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, Zhaoyan Liu, Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell

Comments ICLR 2026

Abstract

Although large language models (LLMs) are becoming increasingly capable of solving challenging real-world tasks, accurately quantifying their uncertainty remains a critical open problem, one that limits their applicability in high-stakes domains. This challenge is further compounded by the closed-source, black-box nature of many state-of-the-art LLMs. Moreover, LLM-based systems can be highly sensitive to the prompts that bind them together, which often require significant manual tuning (i.e., prompt engineering). In this work, we address these challenges by viewing LLM-based systems through a Bayesian lens. We interpret prompts as textual parameters in a statistical model, allowing us to use a small training dataset to perform Bayesian inference over these prompts. This novel perspective enables principled uncertainty quantification over both the model's textual parameters and its downstream predictions, while also incorporating prior beliefs about these parameters expressed in free-form text. To perform Bayesian inference, a difficult problem even for well-studied data modalities, we introduce Metropolis-Hastings through LLM Proposals (MHLP), a novel Markov chain Monte Carlo (MCMC) algorithm that combines prompt optimization techniques with standard MCMC methods. MHLP is a turnkey modification to existing LLM pipelines, including those that rely exclusively on closed-source models. Empirically, we demonstrate that our method yields improvements in both predictive accuracy and uncertainty quantification (UQ) on a range of LLM benchmarks and UQ tasks. More broadly, our work demonstrates a viable path for incorporating methods from the rich Bayesian literature into the era of LLMs, paving the way for more reliable and calibrated LLM-based systems.

2505.13660 2026-04-21 math.OC cs.LG stat.ML

Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis

Kaheon Kim, Bohan Zhou, Changbo Zhu, Xiaohui Chen

Details
Abstract

This paper introduces a new constraint-free concave dual formulation for the Wasserstein barycenter. Tailoring the vanilla dual gradient ascent algorithm to the Sobolev geometry, we derive a scalable Sobolev gradient ascent (SGA) algorithm to compute the barycenter for input distributions discretized over a regular grid. Despite the algorithmic simplicity, we provide a global convergence analysis that achieves the same rate as the classical subgradient descent methods for minimizing nonsmooth convex functions in the Euclidean space. A central feature of our SGA algorithm is that the computationally expensive $c$-concavity projection operator enforced on the Kantorovich dual potentials is unnecessary to guarantee convergence, leading to significant algorithmic and theoretical simplifications over all existing primal and dual methods for computing the exact barycenter. Our numerical experiments demonstrate the superior empirical performance of SGA over the existing optimal transport barycenter solvers.

2505.07033 2026-04-21 stat.ML cs.LG stat.ME

Introducing the O-Value: A Universal Standardization for Confusion-Matrix-Based Classification Performance Metrics

Ningsheng Zhao, Trang Bui, Jia Yuan Yu, Krzysztof Dzieciolowski

Details
Abstract

Many classification performance metrics exist, each suited to a specific application. However, these metrics often differ in scale and can exhibit varying sensitivity to class imbalance rates in the test set. As a result, it is difficult to use the nominal values of these metrics to evaluate, compare and monitor classification performances, especially when imbalance rates vary. To address this problem, we introduce the outperformance standardization (OPS) function, a universal standardization method for confusion-matrix-based classification performance (CMBCP) metrics. It maps any given metric to a common scale of $[0,1]$, while providing a clear and consistent interpretation. Specifically, the resulting OPS value (o-value) represents the percentile rank of the observed classification performance within a reference distribution of possible performances. This unified framework enables meaningful comparison and monitoring of classification performance across test sets with differing imbalance rates. We illustrate how o-values can be applied to a variety of commonly used classification performance metrics and demonstrate the utility and robustness of our method through experiments on real-world datasets spanning multiple classification applications.
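The percentile-rank reading of the o-value can be illustrated with a short sketch. The abstract does not say how the reference distribution is constructed, so the reference sample below is a hypothetical list of F1 scores with made-up values, and `percentile_rank` is an illustrative helper rather than the paper's OPS function.

```python
from bisect import bisect_right


def percentile_rank(observed, reference):
    """Return the fraction of reference values that are <= observed, in [0, 1]."""
    if not reference:
        raise ValueError("reference must be non-empty")
    ordered = sorted(reference)
    return bisect_right(ordered, observed) / len(ordered)


# Hypothetical reference sample of F1 scores (illustrative numbers only).
reference_f1 = [0.10, 0.20, 0.30, 0.40, 0.50]
rank = percentile_rank(0.30, reference_f1)
```

Because the result is a rank rather than a raw metric value, two classifiers evaluated on test sets with different imbalance rates can be compared on the same $[0,1]$ scale, provided each is ranked against a reference distribution appropriate to its own test set.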

2501.02817 2026-04-21 math.AT math.ST stat.ML stat.TH

A Stable Measure of Similarity for Time Series using Persistent Homology

Bala Krishnamoorthy, Elizabeth P. Thompson

Comments Modified our similarity measure and included associated results on real climate data

Details
Abstract

Persistent homology, the study of holes that appear in data as one thickens balls centered around its points over time, has theoretically guaranteed stability. That is, small perturbations of the data produce only small changes in the lifetimes of these holes. This stability has been used to construct a measure of periodicity for a single univariate time series, denoted score(f1). One popular measure of similarity between two time series is percent determinism (%DET), which measures the correlation between two time-series embeddings. We introduce a novel persistent-homology-based measure of time-series similarity, which we call the bi-conditional periodicity score, score(f1,f2). We prove the stability of our measure under small time series and frequency perturbations, as well as the existence of a minimum embedding dimension for the convergence of our score. The latter result implies that larger embedding dimensions may be necessary to reach desired levels of convergence. Since pairwise distances between points in these larger dimensions may start to concentrate, we also prove the stability of our measure under dimension reduction, which guarantees that, as long as the first K principal components capture a majority of the variance under orthogonal projection, the score changes only slightly. We next introduce an algorithm for computing the bi-conditional periodicity score and deduce its computational complexity as O(N log N + PK^2 + P^6) for N the number of time series points, P the number of embedding points, and K the number of principal components. We experimentally verify the greater stability of our measure in comparison with %DET on both synthetic time series and real climate data. In addition, score(f1,f2) requires only one parameter for its computation while %DET requires four.

2410.15368 2026-04-21 math.OC cs.LG stat.ML

Tighter Performance Theory of FedExProx

Wojciech Anyszka, Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik

Comments 43 pages, 4 figures

Details
Abstract

We revisit FedExProx - a recently proposed distributed optimization method designed to enhance convergence properties of parallel proximal algorithms via extrapolation. In the process, we uncover a surprising flaw: its known theoretical guarantees on quadratic optimization tasks are no better than those offered by the vanilla Gradient Descent (GD) method. Motivated by this observation, we develop a novel analysis framework, establishing a tighter linear convergence rate for non-strongly convex quadratic problems. By incorporating both computation and communication costs, we demonstrate that FedExProx can indeed provably outperform GD, in stark contrast to the original analysis. Furthermore, we consider partial participation scenarios and analyze two adaptive extrapolation strategies - based on gradient diversity and Polyak stepsizes - again significantly outperforming previous results. Moving beyond quadratics, we extend the applicability of our analysis to general functions satisfying the Polyak-Lojasiewicz condition, outperforming the previous strongly convex analysis while operating under weaker assumptions. Backed by empirical results, our findings point to a new and stronger potential of FedExProx, paving the way for further exploration of the benefits of extrapolation in federated learning.

2410.09296 2026-04-21 cs.CR cs.DS stat.AP stat.ML

The 2020 US Decennial Census is more private than you (might) think

Buxin Su, Weijie J. Su, Chendi Wang

Details
Abstract

The U.S. Decennial Census serves as the foundation for many high-profile policy decision-making processes, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could stronger privacy guarantees be obtained for the 2020 U.S. Census compared to their published guarantees, or equivalently, had the privacy budgets been fully utilized? In this paper, we address this question affirmatively by demonstrating that the 2020 U.S. Census provides significantly stronger privacy protections than its nominal guarantees suggest at each of the eight geographical levels, from the national level down to the block level. This finding is enabled by our precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across these geographical levels. Our analysis reveals that the Census Bureau introduced unnecessarily high levels of noise to meet the specified privacy guarantees for the 2020 Census. Consequently, we show that noise variances could be reduced by $15.08\%$ to $24.82\%$ while maintaining nearly the same level of privacy protection for each geographical level, thereby improving the accuracy of privatized census statistics. We empirically demonstrate that reducing noise injection into census statistics mitigates distortion caused by privacy constraints in downstream applications of private census data, illustrated through a study examining the relationship between earnings and education.

2401.15604 2026-04-21 cs.LG stat.ML

Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization

Yinbin Han, Meisam Razaviyayn, Renyuan Xu

Comments 58 pages

Details
Abstract

Diffusion models have become a leading paradigm in generative AI, with score estimation via denoising score matching as a central component. While recent theory provides strong statistical guarantees, it typically relies on algorithm-agnostic assumptions and treats empirical risk minimization as if it were solved exactly. In practice, however, score functions are parameterized by highly nonconvex neural networks and trained by gradient descent (GD), and it remains unclear whether such practical procedures admit rigorous guarantees. We take a first step toward this question by developing a mathematical framework for score estimation with GD-trained neural networks. Our analysis addresses both optimization and generalization. We introduce a parametric formulation that reduces denoising score matching to a regression problem with noisy labels. This setting poses several challenges, including unbounded inputs, vector-valued outputs, and an additional time variable, which prevent a direct application of existing techniques. We show that, with a suitable design, the dynamics of GD-trained networks can be approximated by a sequence of localized kernel regression problems. We also show that prolonged training on noisy labels leads to overfitting, and derive an early-stopping rule adapted to unbounded domains. As a consequence, we establish the first minimax-optimal generalization bounds for GD-trained neural networks in diffusion models. Experiments on the Credit Default dataset further show that our theory-guided training framework achieves performance comparable to heavily tuned heuristic methods for generating high-fidelity financial tabular data.

2112.07572 2026-04-21 math.PR math.ST stat.TH

The high-dimensional asymptotics of first order methods with random data

Michael Celentano, Chen Cheng, Andrea Montanari

Comments 78 pages; v3: introduction, motivations and examples expanded

Details
Abstract

We study a class of deterministic flows in ${\mathbb R}^{d\times k}$, parametrized by a random matrix ${\boldsymbol X}\in {\mathbb R}^{n\times d}$ with i.i.d. centered subgaussian entries. We characterize the asymptotic behavior of these flows over bounded time horizons, in the high-dimensional limit in which $n,d\to\infty$ with $k$ fixed and converging aspect ratios $n/d\to\delta$. The asymptotic characterization we prove is in terms of a system of nonlinear stochastic processes in $k$ dimensions, whose parameters are determined by a fixed point condition. This type of characterization is known in physics as dynamical mean field theory. Rigorous results of this type have been obtained in the past for a few spin glass models. Our proof is based on time discretization and a reduction to certain iterative schemes known as approximate message passing (AMP) algorithms, as opposed to earlier work that was based on large deviations theory and stochastic processes theory. The new approach provides a unified view of a general class of algorithms and implies that the high-dimensional behavior of the flow is universal with respect to the distribution of the entries of ${\boldsymbol X}$. As specific applications, we obtain high-dimensional characterizations of gradient flow in some classical models from statistics and machine learning, under a random design assumption.

2006.12024 2026-04-21 stat.ML cs.LG

Bayesian Neural Networks: An Introduction and Survey

Ethan Goan, Clinton Fookes

Comments 44 pages, 8 figures, Fix typos in Eqn 30, 48, and alpha divergence

Details
Journal ref Case Studies in Applied Bayesian Data Science: CIRM Jean-Morlet Chair, Fall 2018, 1, (2020) 45-87
Abstract

Neural Networks (NNs) have provided state-of-the-art results for many challenging machine learning tasks such as detection, regression and classification across the domains of computer vision, speech recognition and natural language processing. Despite their success, they are often implemented in a frequentist scheme, meaning they are unable to reason about uncertainty in their predictions. This article introduces Bayesian Neural Networks (BNNs) and the seminal research regarding their implementation. Different approximate inference methods are compared, and used to highlight where future research can improve on current methods.