arXivDaily: arXiv daily academic digest, updated Monday to Friday
2603.22276 2026-03-24 cs.LG stat.ML

Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels

Alexandra Zelenin, Alexandra Zhuravlyova

Comments 30 pages, 15 figures, 15 tables, including appendices. Code and data at https://github.com/sockeye44/dorafactors

Abstract

Weight-Decomposed Low-Rank Adaptation (DoRA) extends LoRA by decoupling weight magnitude from direction, but its forward pass requires the row-wise norm of W + sBA, a computation that every major framework we surveyed implements by materializing the dense [d_out, d_in] product BA. At d_in = 8192 and rank r = 384, a single module's norm requires about 512 MB of transient working memory in bf16, making high-rank DoRA costly and often infeasible on common single-GPU setups once hundreds of adapted modules and checkpointing are involved. We present two systems contributions. A factored norm decomposes the squared norm into base, cross, and Gram terms computable through O(d_out r + r^2) intermediates, eliminating the dense product. Fused Triton kernels collapse the four-kernel DoRA composition into a single pass, reducing memory traffic by about 4x and using a numerically stable form that avoids catastrophic cancellation in the near-unity rescaling regime where magnitude scales concentrate in practice. Across six 8-32B vision-language models (VLMs) on three NVIDIA GPUs (RTX 6000 PRO, H200, B200) at r = 384 in bf16, the fused implementation is 1.5-2.0x faster than Hugging Face PEFT's DoRA implementation for inference and 1.5-1.9x faster for gradient computation (optimizer step excluded), with up to 7 GB lower peak VRAM. Microbenchmarks on six GPUs spanning four architecture generations (L40S, A100, RTX 6000 PRO, H200, B200, B300) confirm 1.5-2.7x compose-kernel speedup. Final-logit cosine similarity exceeds 0.9999 across all model/GPU pairs, and multi-seed training curves match within 7.1 x 10^-4 mean per-step loss delta over 2000 steps.
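
The factored norm follows from expanding each row: $\|W_i + s(BA)_i\|^2 = \|W_i\|^2 + 2s\langle W_i, (BA)_i\rangle + s^2\|(BA)_i\|^2$. The cross term equals the row-wise dot product of $B$ with $WA^\top$ (size d_out × r), and the Gram term equals $B_i (AA^\top) B_i^\top$ with $AA^\top$ of size r × r, so the dense d_out × d_in product never appears. The minimal pure-Python sketch below checks this against the dense computation. It assumes W is d_out × d_in, B is d_out × r, A is r × d_in and s is a scalar scale; all names are ours, not the authors' code, and it does not model the fused Triton kernels, chunking, or the numerically stable rescaling described above.

```python
import math
import random


def matmul(X, Y):
    """Plain-Python product of X (m x k) and Y (k x n)."""
    Yt = list(zip(*Y))  # columns of Y
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def dense_row_norms(W, B, A, s):
    """Reference: materializes the dense d_out x d_in product B @ A."""
    BA = matmul(B, A)
    return [math.sqrt(sum((w + s * ba) ** 2 for w, ba in zip(w_row, ba_row)))
            for w_row, ba_row in zip(W, BA)]


def factored_row_norms(W, B, A, s):
    """Row norms of W + s * B @ A from base, cross, and Gram terms.

    The largest intermediates are W @ A^T (d_out x r) and A @ A^T (r x r).
    """
    WAt = matmul(W, transpose(A))  # d_out x r, feeds the cross term
    G = matmul(A, transpose(A))    # r x r Gram matrix
    r = len(G)
    norms = []
    for w_row, b_row, wa_row in zip(W, B, WAt):
        base = sum(w * w for w in w_row)
        cross = sum(b * wa for b, wa in zip(b_row, wa_row))
        gram = sum(b_row[p] * G[p][q] * b_row[q] for p in range(r) for q in range(r))
        norms.append(math.sqrt(base + 2 * s * cross + s * s * gram))
    return norms


if __name__ == "__main__":
    rng = random.Random(0)
    d_out, d_in, rank, s = 5, 7, 3, 0.5
    W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
    B = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(d_out)]
    A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(rank)]
    print(factored_row_norms(W, B, A, s))
    print(dense_row_norms(W, B, A, s))
```

The Gram term costs O(r^2) per row instead of O(d_in), which is what makes the working memory independent of d_in.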

2603.22248 2026-03-24 cs.LG cs.AI cs.IT math.IT stat.ML

Confidence-Based Decoding is Provably Efficient for Diffusion Language Models

Changxiao Cai, Gen Li

Abstract

Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models for language modeling, allowing flexible generation order and parallel generation of multiple tokens. However, this flexibility introduces a challenge absent in AR models: the decoding strategy, which determines the order and number of tokens generated at each iteration, critically affects sampling efficiency. Among decoding strategies explored in practice, confidence-based methods, which adaptively select which and how many tokens to unmask based on prediction confidence, have shown strong empirical performance. Despite this success, our theoretical understanding of confidence-based decoding remains limited. In this work, we develop the first theoretical analysis framework for confidence-based decoding in DLMs. We focus on an entropy-sum-based strategy that continues unmasking tokens within each iteration until the cumulative entropy exceeds a threshold, and show that it achieves $\varepsilon$-accurate sampling in KL divergence with an expected number of iterations $\widetilde O(H(X_0)/\varepsilon)$, where $H(X_0)$ denotes the entropy of the target data distribution. Notably, this strategy yields substantial sampling acceleration when the data distribution has low entropy relative to the sequence length, while automatically adapting to the intrinsic complexity of data without requiring prior knowledge or hyperparameter tuning. Overall, our results provide a theoretical foundation for confidence-based decoding and may inform the design of more efficient decoding strategies for DLMs.
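
The entropy-sum rule can be sketched as a single selection step. This is an illustration under assumptions, not the authors' algorithm: masked positions are visited in order of increasing predictive entropy (most confident first), the token whose entropy pushes the running sum past the threshold is still unmasked, and the hypothetical helper `select_tokens_to_unmask` receives per-position probability vectors as a dict.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_tokens_to_unmask(masked_dists, threshold):
    """Pick which masked positions to unmask in one decoding iteration.

    masked_dists maps a masked position to its predicted token distribution.
    Positions are visited lowest-entropy first and unmasked until the
    cumulative entropy exceeds `threshold`. At least one position is always
    unmasked, so every iteration makes progress.
    """
    ranked = sorted((entropy(p), pos) for pos, p in masked_dists.items())
    chosen, total = [], 0.0
    for h, pos in ranked:
        chosen.append(pos)
        total += h
        if total > threshold:
            break
    return chosen


if __name__ == "__main__":
    dists = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.9, 0.1]}
    print(select_tokens_to_unmask(dists, 0.2))  # -> [0, 2]
```

Low-entropy (near-deterministic) tokens are cheap under this budget, so many of them are unmasked together, which is the intuition behind the speedup for low-entropy data.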

2603.22219 2026-03-24 cs.LG stat.ML

Noise Titration: Exact Distributional Benchmarking for Probabilistic Time Series Forecasting

Qilin Wang

Abstract

Modern time series forecasting is evaluated almost entirely through passive observation of single historical trajectories, rendering claims about a model's robustness to non-stationarity fundamentally unfalsifiable. We propose a paradigm shift toward interventionist, exact-statistical benchmarking. By systematically titrating calibrated Gaussian observation noise into known chaotic and stochastic dynamical systems, we transform forecasting from a black-box sequence matching game into an exact distributional inference task. Because the underlying data-generating process and noise variance are mathematically explicit, evaluation can rely on exact negative log-likelihoods and calibrated distributional tests rather than heuristic approximations. To fully leverage this framework, we extend the Fern architecture into a probabilistic generative model that natively parameterizes the Symmetric Positive Definite (SPD) cone, outputting calibrated joint covariance structures without the computational bottleneck of generic Jacobian modeling. Under this rigorous evaluation, we find that state-of-the-art zero-shot foundation models behave consistently with the context-parroting mechanism, failing systematically under non-stationary regime shifts and elevated noise. In contrast, Fern explicitly captures the invariant measure and multivariate geometry of the underlying dynamics, maintaining structural fidelity and statistically sharp calibration precisely where massive sequence-matching models collapse.
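
Because the noise variance is known, the exact Gaussian negative log-likelihood is available, and an oracle that knows the clean state attains an expected NLL equal to the Gaussian entropy $\frac{1}{2}\log(2\pi e\sigma^2)$, a floor against which forecasters can be scored. The sketch below titrates noise into the chaotic logistic map; the choice of system, noise level, and function names are our assumptions, not the paper's benchmark suite.

```python
import math
import random


def logistic_map(x0, steps, r=4.0):
    """Noise-free trajectory of the chaotic logistic map x -> r * x * (1 - x)."""
    xs = [x0]
    for _ in range(steps - 1):
        nxt = r * xs[-1] * (1.0 - xs[-1])
        xs.append(min(1.0, max(0.0, nxt)))  # guard against rounding drift outside [0, 1]
    return xs


def titrate(xs, sigma, seed=0):
    """Add i.i.d. Gaussian observation noise with known standard deviation sigma."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in xs]


def gaussian_nll(y, mean, sigma):
    """Exact negative log-likelihood of y under N(mean, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mean) ** 2 / (2 * sigma ** 2)


if __name__ == "__main__":
    sigma = 0.05
    clean = logistic_map(0.2, 20000)
    noisy = titrate(clean, sigma)
    # The oracle predicts the clean state, so its NLL reflects only the injected noise.
    oracle = sum(gaussian_nll(y, x, sigma) for y, x in zip(noisy, clean)) / len(clean)
    print(oracle, 0.5 * math.log(2 * math.pi * math.e * sigma ** 2))
```

Sweeping sigma upward and rescoring a model against this floor is one way to read "titration" concretely.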

2603.22192 2026-03-24 math.ST cs.CC cs.DS stat.TH

Stable Algorithms Lower Bounds for Estimation

Xifan Yu, Ilias Zadik

Comments 82 pages, 2 figures

Abstract

In this work, we show that for all statistical estimation problems, a natural MMSE instability (discontinuity) condition implies the failure of stable algorithms, serving as a version of the overlap gap property (OGP) for estimation tasks. Using this criterion, we establish separations between stable and polynomial-time algorithms for the following MMSE-unstable tasks: (i) Planted Shortest Path, where Dijkstra's algorithm succeeds; (ii) random Parity Codes, where Gaussian elimination succeeds; and (iii) Gaussian Subset Sum, where lattice-based methods succeed. For all three, we further show that all low-degree polynomials are stable, yielding separations against low-degree methods and a new method to bound the low-degree MMSE. In particular, our technique highlights that MMSE instability is a common feature of Planted Shortest Path, noiseless Parity Codes, and Gaussian Subset Sum. Finally, our work puts on rigorous algorithmic footing the long-standing physics belief that first-order phase transitions (which in this setting translate to MMSE instability) impose fundamental limits on classes of efficient algorithms.
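
In the noiseless parity-code setting, observations are parities $y = Ax \bmod 2$ of a hidden binary vector $x$, and Gauss-Jordan elimination over GF(2) recovers $x$ in polynomial time whenever $A$ has full column rank. The sketch below shows that efficient baseline; the matrix in the usage example and the helper name `solve_gf2` are illustrative assumptions, not taken from the paper.

```python
def solve_gf2(A, y):
    """Solve A x = y over GF(2) by Gauss-Jordan elimination.

    A: list of 0/1 rows (m x n); y: list of 0/1 values (length m).
    Returns one solution (free variables set to 0), or None if inconsistent.
    """
    m, n = len(A), len(A[0])
    M = [row[:] + [b] for row, b in zip(A, y)]  # augmented matrix
    pivots, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if M[i][c]), None)
        if pivot is None:
            continue  # no pivot in this column: x_c is a free variable
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(m):
            if i != r and M[i][c]:
                M[i] = [a ^ b for a, b in zip(M[i], M[r])]  # XOR = addition mod 2
        pivots.append(c)
        r += 1
        if r == m:
            break
    # A zero row with right-hand side 1 means the system has no solution.
    if any(not any(row[:n]) and row[n] for row in M):
        return None
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = M[i][n]
    return x


if __name__ == "__main__":
    A = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
    print(solve_gf2(A, [1, 1, 0, 0]))  # -> [1, 0, 1]
```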

2603.22188 2026-03-24 stat.AP cs.CY math.PR

Generalized Sequential Monte Carlo Sampling for Redistricting Simulation

Philip O'Sullivan, Kosuke Imai, Cory McCartan

Abstract

Simulation methods have become important tools for quantifying partisan and racial bias in redistricting plans. We generalize the Sequential Monte Carlo (SMC) algorithm of McCartan and Imai (2023), one of the commonly used approaches. First, our generalized SMC (gSMC) algorithm can split off regions of arbitrary size, rather than a single district as in the original SMC framework, enabling the sampling of multi-member districts. Second, the gSMC algorithm can operate over various sampling spaces, providing additional computational flexibility. Third, we derive optimal-variance incremental weights and show how to compute them efficiently for each sampling space. Finally, we incorporate Markov chain Monte Carlo (MCMC) steps, creating a hybrid gSMC-MCMC algorithm that can be used for large-scale redistricting applications. We demonstrate the effectiveness of the proposed methodology through analyses of the Irish Parliament, which uses multi-member districts, and the Pennsylvania House of Representatives, which has more than 200 single-member districts.

2603.22160 2026-03-24 stat.AP cs.LG

Data Curation for Machine Learning Interatomic Potentials by Determinantal Point Processes

Joanna Zou, Youssef Marzouk

Comments Original publication at https://openreview.net/forum?id=PKGP7tg65A

Journal ref ICLR AI4MAT Workshop (2025)

Abstract

The development of machine learning interatomic potentials faces a critical computational bottleneck with the generation and labeling of useful training datasets. We present a novel application of determinantal point processes (DPPs) to the task of selecting informative subsets of atomic configurations to label with reference energies and forces from costly quantum mechanical methods. Through experiments with hafnium oxide data, we show that DPPs are competitive with existing approaches to constructing compact but diverse training sets by utilizing kernels of molecular descriptors, leading to improved accuracy and robustness in machine learning representations of molecular systems. Our work identifies promising directions to employ DPPs for unsupervised training data curation with heterogeneous or multimodal data, or in online active learning schemes for iterative data augmentation during molecular dynamics simulation.
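
A common way to turn a DPP into a fixed-size training set is greedy MAP selection: repeatedly add the configuration that most increases $\det(L_S)$, the determinant of the kernel submatrix on the selected set $S$. The sketch assumes an RBF kernel on descriptor vectors and uses toy 2-D points in place of real atomic descriptors; greedy MAP is a swapped-in approximation and may differ from the selection scheme used in the paper.

```python
import math


def rbf(a, b, gamma):
    """Similarity kernel between two descriptor vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def log_det_psd(M):
    """log det of a symmetric matrix via Cholesky; -inf if not positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def greedy_dpp_select(descriptors, k, gamma=1.0):
    """Greedily pick k items that maximize the log det of the kernel submatrix."""
    K = [[rbf(a, b, gamma) for b in descriptors] for a in descriptors]
    selected = []
    for _ in range(k):
        best, best_val = None, float("-inf")
        for i in range(len(descriptors)):
            if i in selected:
                continue
            S = selected + [i]
            val = log_det_psd([[K[a][b] for b in S] for a in S])
            if val > best_val:
                best, best_val = i, val
        if best is None:  # every remaining candidate makes the submatrix singular
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [10.0, 0.0]]
    print(greedy_dpp_select(pts, 3, gamma=0.1))  # -> [0, 3, 2]
```

The near-duplicate point 1 is never chosen, because adding it nearly collapses the determinant; this is the diversity effect the abstract relies on.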

2603.22071 2026-03-24 stat.ME math.ST stat.TH

Detecting change regions on spheres

Di Su, Yining Chen, Tengyao Wang

Abstract

While change point detection in time series data has been extensively studied, little attention has been given to its generalisation to data observed on spheres or other manifolds, where changes may occur within spatially complex regions with irregular boundaries, posing significant challenges. We propose a new class of estimators, namely, Change Region Identification and SeParation (CRISP), to locate changes in the mean function of a signal-plus-noise model defined on $d$-dimensional spheres. The CRISP estimator applies to scenarios with a single change region, and is extended to multiple change regions via a newly developed generic scheme. The convergence rate of the CRISP estimator is shown to depend on the VC dimension of the hypothesis class that characterises the change regions in general. We also carefully study the case where change regions have the geometry of spherical caps. Simulations confirm the promising finite-sample performance of this approach. The CRISP estimator's practical applicability is further demonstrated through two real data sets on global temperature and the ozone hole.
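
When a change region is a spherical cap (all points within geodesic angle θ of a center c, i.e. ⟨x, c⟩ ≥ cos θ), a naive baseline is to scan a grid of caps and keep the one with the largest standardized difference between the mean signal inside and outside. The sketch below is that brute-force baseline, not the CRISP estimator; the Fibonacci point set, the candidate grid, and the statistic are our assumptions.

```python
import math


def fibonacci_sphere(n):
    """Roughly uniform points on the unit sphere S^2."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts


def in_cap(x, center, angle):
    """True if x lies within geodesic angle `angle` of the unit vector `center`."""
    return sum(a * b for a, b in zip(x, center)) >= math.cos(angle)


def scan_caps(points, values, centers, angles):
    """Return the (center, angle) whose cap best separates the mean signal.

    Score: sqrt(n_in * n_out / n) * |mean_in - mean_out|.
    """
    n = len(points)
    best, best_stat = None, -1.0
    for c in centers:
        for a in angles:
            inside = [v for x, v in zip(points, values) if in_cap(x, c, a)]
            outside = [v for x, v in zip(points, values) if not in_cap(x, c, a)]
            if not inside or not outside:
                continue
            diff = abs(sum(inside) / len(inside) - sum(outside) / len(outside))
            stat = math.sqrt(len(inside) * len(outside) / n) * diff
            if stat > best_stat:
                best, best_stat = (c, a), stat
    return best


if __name__ == "__main__":
    pts = fibonacci_sphere(200)
    vals = [1.0 if p[2] > math.cos(0.5) else 0.0 for p in pts]  # cap around the north pole
    grid = [(0, 0, 1), (1, 0, 0), (0, 1, 0)]
    print(scan_caps(pts, vals, grid, [0.3, 0.5, 0.8]))  # -> ((0, 0, 1), 0.5)
```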

2603.22050 2026-03-24 stat.ML cs.LG

MAGPI: Multifidelity-Augmented Gaussian Process Inputs for Surrogate Modeling from Scarce Data

Atticus Rex, Elizabeth Qian, David Peterson

Abstract

Supervised machine learning describes the practice of fitting a parameterized model to labeled input-output data. Supervised machine learning methods have demonstrated promise in learning efficient surrogate models that can (partially) replace expensive high-fidelity models, making many-query analyses, such as optimization, uncertainty quantification, and inference, tractable. However, when training data must be obtained through the evaluation of an expensive model or experiment, the amount of training data that can be obtained is often limited, which can make learned surrogate models unreliable. In many engineering and scientific settings, cheaper low-fidelity models may be available, for example arising from simplified physics modeling or coarse grids. These models may be used to generate additional low-fidelity training data. The goal of multifidelity machine learning is to use both high- and low-fidelity training data to learn a surrogate model which is cheaper to evaluate than the high-fidelity model, but more accurate than any available low-fidelity model. This work proposes a new multifidelity training approach for Gaussian process regression (GPR) which uses low-fidelity data to define additional features that augment the input space of the learned model. The approach unites desirable properties from two separate classes of existing multifidelity GPR approaches, cokriging and autoregressive estimators. Numerical experiments on several test problems demonstrate both increased predictive accuracy and reduced computational cost relative to the state of the art.

2603.22030 2026-03-24 cs.LG stat.ML

On the Interplay of Priors and Overparametrization in Bayesian Neural Network Posteriors

Julius Kobialka, Emanuel Sommer, Chris Kolb, Juntae Kwon, Daniel Dold, David Rügamer

Comments Accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026

Abstract

Bayesian neural network (BNN) posteriors are often considered impractical for inference, as symmetries fragment them, non-identifiabilities inflate dimensionality, and weight-space priors are seen as meaningless. In this work, we study how overparametrization and priors together reshape BNN posteriors and derive implications allowing us to better understand their interplay. We show that redundancy introduces three key phenomena that fundamentally reshape the posterior geometry: balancedness, weight reallocation on equal-probability manifolds, and prior conformity. We validate our findings through extensive experiments with posterior sampling budgets that far exceed those of earlier works, and demonstrate how overparametrization induces structured, prior-aligned weight posterior distributions.

2603.22024 2026-03-24 stat.ME math.ST stat.TH

Cost-Aware Optimized Front-Door Experimental Design

Leopold Mareis, Mathias Drton

Comments This article will be published in the proceedings of CLeaR 2026

Abstract

Causal effect estimation often follows cost-constrained, sequential data collection. This work considers multivariate linear front-door models with arbitrary unobserved confounding on treatment and response. We optimize the experimental design by balancing the statistical efficiency and measurement costs through partial data. The full-data efficient influence function for the causal effect is derived, together with the geometry of all observed-data influence functions. This characterization yields a closed-form optimal sampling policy and an estimator to minimize the asymptotic variance of regular asymptotically linear (RAL) estimators within a class of augmented full-data influence functions. The resulting design also covers back-door estimation. In simulations and applications to biological, medical, and industrial datasets, the optimized designs achieve substantial efficiency gains ($5.3\%$ to $31.9\%$) over naive full-sampling strategies.
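
In a univariate linear front-door model, the effect of $X$ on $Y$ is the product of the slope of $M$ on $X$ and the coefficient of $M$ in the regression of $Y$ on $(M, X)$, even though an unobserved $U$ confounds $X$ and $Y$. The simulation below, with coefficients and helper names of our choosing, contrasts that full-data estimator with the biased naive regression of $Y$ on $X$. It illustrates only the estimand; it does not implement the paper's cost-aware sampling policy or its efficient influence function.

```python
import random


def ols_slope(x, y):
    """Slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(x1, x2, y):
    """Coefficients (b1, b2) of y on x1 and x2 (with intercept) via 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1, c2, cy = [a - m1 for a in x1], [a - m2 for a in x2], [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate(n, a=2.0, b=1.5, c=2.0, seed=0):
    """U confounds X and Y; M carries all of X's effect on Y (true effect a * b)."""
    rng = random.Random(seed)
    X, M, Y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        x = u + rng.gauss(0, 1)
        m = a * x + rng.gauss(0, 1)
        X.append(x)
        M.append(m)
        Y.append(b * m + c * u + rng.gauss(0, 1))
    return X, M, Y


def front_door_effect(X, M, Y):
    """(slope of M on X) * (coefficient of M in Y ~ M + X)."""
    beta, _ = ols_two(M, X, Y)
    return ols_slope(X, M) * beta


if __name__ == "__main__":
    X, M, Y = simulate(20000)
    print(front_door_effect(X, M, Y), ols_slope(X, Y))  # close to 3.0 vs. close to 4.0
```

With these coefficients the naive slope converges to $ab + c/2 = 4$ rather than $ab = 3$, because half of $\mathrm{Var}(X)$ comes from $U$.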

2603.22006 2026-03-24 astro-ph.CO astro-ph.IM cs.LG stat.ME

A plug-and-play approach with fast uncertainty quantification for weak lensing mass mapping

Hubert Leterme, Andreas Tersenov, Jalal Fadili, Jean-Luc Starck

Abstract

Upcoming stage-IV surveys such as Euclid and Rubin will deliver vast amounts of high-precision data, opening new opportunities to constrain cosmological models with unprecedented accuracy. A key step in this process is the reconstruction of the dark matter distribution from noisy weak lensing shear measurements. Current deep learning-based mass mapping methods achieve high reconstruction accuracy, but either require retraining a model for each new observed sky region (limiting practicality) or rely on slow MCMC sampling. Efficient exploitation of future survey data therefore calls for a new method that is accurate, flexible, and fast at inference. In addition, uncertainty quantification with coverage guarantees is essential for reliable cosmological parameter estimation. We introduce PnPMass, a plug-and-play approach for weak lensing mass mapping. The algorithm produces point estimates by alternating between a gradient descent step with a carefully chosen data fidelity term, and a denoising step implemented with a single deep learning model trained on simulated data corrupted by Gaussian white noise. We also propose a fast, sampling-free uncertainty quantification scheme based on moment networks, with calibrated error bars obtained through conformal prediction to ensure coverage guarantees. Finally, we benchmark PnPMass against both model-driven and data-driven mass mapping techniques. PnPMass achieves performance close to that of state-of-the-art deep-learning methods while offering fast inference (converging in just a few iterations) and requiring only a single training phase, independently of the noise covariance of the observations. It therefore combines flexibility, efficiency, and reconstruction accuracy, while delivering tighter error bars than existing approaches, making it well suited for upcoming weak lensing surveys.
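
Conformal calibration of moment-network error bars can be sketched with normalized split conformal prediction: score each calibration pixel by $|y - \mu| / \sigma$, take the $\lceil (n+1)(1-\alpha) \rceil$-th smallest score $q$, and report $\mu \pm q\sigma$. This is a standard recipe under an exchangeability assumption; treating the moment network's outputs as a per-pixel mean and standard deviation, and the helper names, are our assumptions rather than details of PnPMass.

```python
import math


def conformal_scale(mu_cal, sigma_cal, y_cal, alpha=0.1):
    """Split-conformal multiplier for normalized residuals |y - mu| / sigma.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score, so
    intervals mu +/- q * sigma have marginal coverage >= 1 - alpha when
    calibration and test pixels are exchangeable.
    """
    scores = sorted(abs(y - m) / s for m, s, y in zip(mu_cal, sigma_cal, y_cal))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:  # too few calibration points for this alpha
        return math.inf
    return scores[rank - 1]


def interval(mu, sigma, q):
    """Calibrated error bar around one point estimate."""
    return mu - q * sigma, mu + q * sigma
```

Because only a single quantile is computed, calibration adds essentially no cost at inference time, which is consistent with a sampling-free scheme.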

2603.21992 2026-03-24 stat.ME

Pair-based estimators of infection and removal rates for stochastic epidemic models

Seth D. Temple, Jonathan Terhorst

Abstract

Stochastic epidemic models can estimate infection and removal rates, and derived quantities such as the basic reproductive number ($R_0$), when both infection and removal times are observed. In practice, however, removal times are often available while infection times are not, and existing methods that rely only on removal times can become unstable or biased. We study inference for stochastic SIR/SEIR models in a partial-observation setting. We develop imputation-based estimators that use a small calibration sample of fully observed infectious periods, derive closed-form expressions for the pairwise exposure terms they require, and use a studentized parametric bootstrap for bias correction and uncertainty quantification. In simulations, removal-time-only methods performed poorly in moderate to large $R_0$ scenarios, while observing even tens of complete infectious periods substantially improved the estimation of the infection rate. A reanalysis of the 1861 Hagelloch measles outbreak under simulated missingness recovered stable qualitative differences in transmission between school classes. Based on our results, we advocate for the targeted collection of a modest number of complete infectious periods as a means of improving surveillance in the early stages of an epidemic.
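
For reference, when both infection and removal times are observed for a Markovian SIR model with frequency-dependent transmission, the complete-data MLEs are $\hat\beta = (n_I - 1)\big/\big(\tfrac{1}{N}\sum_{i,j} \tau_{ij}\big)$ and $\hat\gamma = n_I / \sum_i (R_i - I_i)$, where $\tau_{ij}$ is the time during which $i$ is infectious and $j$ is susceptible. The sketch computes these pairwise exposure terms directly. It assumes a single index case infected at time 0 and an epidemic observed until every infective is removed; it is the full-observation baseline, not the paper's imputation-based or SEIR estimators.

```python
def sir_complete_data_mle(infection, removal, N):
    """Complete-data MLEs (beta_hat, gamma_hat) for a Markovian SIR epidemic.

    infection, removal: dicts mapping each ever-infected individual to its
    infection and removal time (the index case has infection time 0).
    Individuals absent from `infection` are never infected.
    """
    ids = list(infection)
    never = N - len(ids)
    exposure = 0.0  # sum over pairs of time i is infectious while j is susceptible
    for i in ids:
        for j in ids:
            if i != j:
                exposure += max(0.0, min(removal[i], infection[j]) - infection[i])
        # Never-infected individuals stay susceptible for all of i's infectious period.
        exposure += never * (removal[i] - infection[i])
    n_infections = len(ids) - 1  # the index case was not infected during observation
    beta_hat = n_infections / (exposure / N)
    gamma_hat = len(ids) / sum(removal[i] - infection[i] for i in ids)
    return beta_hat, gamma_hat


if __name__ == "__main__":
    beta, gamma = sir_complete_data_mle({0: 0.0, 1: 1.0}, {0: 2.0, 1: 4.0}, N=3)
    print(beta, gamma, beta / gamma)  # -> 0.5 0.4 1.25
```

When infection times are missing, the exposure sum above is exactly what becomes unavailable, which is why pairwise exposure terms must be imputed or computed in closed form.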

2603.21967 2026-03-24 stat.ME

Unified implementation and comparison of Bayesian shrinkage methods for treatment effect estimation in subgroups

Marcel Wolbers, Miriam Pedrera Gómez, Alex Ocampo, Isaac Gravestock

Comments 26 pages (23 main, 3 supplementary), 5 figures (4 main, 1 supplementary), 8 tables (4 main, 4 supplementary)

Abstract

Evaluating treatment effect heterogeneity across patient subgroups is a fundamental aspect of clinical trial analysis. Yet, these analyses have inherent limitations due to small sample sizes and the substantial number of subgroups investigated. Statisticians in regulatory agencies and pharmaceutical companies have begun considering shrinkage methods grounded in Bayesian statistical theory. These methods incorporate priors on treatment effect heterogeneity, which operationally shrink raw subgroup treatment effect estimates towards the overall treatment effect. Various shrinkage estimators and priors have been proposed, yet it remains unclear which methods perform best. This work provides a unified presentation, software implementation (in the R package bonsaiforest2), and simulation comparison of one-way and global shrinkage methods for continuous, binary, count, and time-to-event endpoints. One-way models fit a separate shrinkage model for each subgrouping variable, whereas global models fit a model including all subgroup indicators at once. Both can derive standardized subgroup-specific treatment effects. Across all simulation scenarios, shrinkage methods outperformed the standard subgroup estimator without shrinkage in terms of mean squared error. They were also more efficient in identifying a non-efficacious subgroup. Global shrinkage models tended to have smaller mean squared error and less dependence on hyperprior parameters than one-way models, but also exhibited slightly larger bias and worse frequentist coverage of associated credible intervals. For both models, hyperprior choices anchored in trial assumptions about the anticipated size of the overall treatment effect performed well. We conclude that some degree of shrinkage is preferable to none and advocate for the routine inclusion of shrunken estimates in clinical forest plots to facilitate more robust decision-making.
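
The basic shrinkage mechanism can be sketched with a normal-normal model: with between-subgroup standard deviation τ, each raw estimate is pulled toward the precision-weighted overall effect by a factor $se_k^2/(se_k^2 + \tau^2)$, so subgroups with larger standard errors move more. Fixing τ is a simplifying assumption; the one-way and global models compared in the paper place hyperpriors on such parameters, and this is not the bonsaiforest2 API.

```python
def shrink_subgroups(estimates, std_errors, tau):
    """Shrink subgroup treatment-effect estimates toward a pooled mean.

    Model: theta_k ~ N(mu, tau^2), estimate_k | theta_k ~ N(theta_k, se_k^2).
    mu is the precision-weighted mean implied by that model.
    Returns (mu, list of posterior means of theta_k given mu).
    """
    weights = [1.0 / (tau ** 2 + se ** 2) for se in std_errors]
    mu = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    shrunk = []
    for est, se in zip(estimates, std_errors):
        b = se ** 2 / (se ** 2 + tau ** 2)  # shrinkage factor in [0, 1]
        shrunk.append(b * mu + (1.0 - b) * est)
    return mu, shrunk


if __name__ == "__main__":
    print(shrink_subgroups([1.0, 3.0], [1.0, 1.0], tau=1.0))  # -> (2.0, [1.5, 2.5])
```

Setting τ = 0 collapses every subgroup onto the overall effect, while a very large τ leaves the raw estimates unchanged; the hyperprior on τ governs where between these extremes a given model lands.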

2603.21952 2026-03-24 stat.ME stat.CO

Parsimonious Subset Selection for Generalized Linear Models with Biomedical Applications

Anant Mathur, Benoit Liquet, Samuel Muller, Sarat Moka

Abstract

High-dimensional biomedical studies require models that are simultaneously accurate, sparse, and interpretable, yet exact best subset selection for generalized linear models is computationally intractable. We develop a scalable method that combines a continuous Boolean relaxation of the subset problem with a Frank-Wolfe algorithm driven by envelope gradients. The resulting method, which we refer to as COMBSS-GLM, is simple to implement, requires one penalized generalized linear model fit per iteration, and produces sparse models along a model-size path. Theoretically, we identify a curvature-based parameter regime in which the relaxed objective is concave in the selection weights, implying that global minimizers occur at binary corners. Empirically, in logistic and multinomial simulations across low- and high-dimensional correlated settings, the proposed method consistently improves variable-selection quality relative to established penalized likelihood competitors while maintaining strong predictive performance. In biomedical applications, it recovers established loci in a binary-outcome rice genome-wide association study and achieves perfect multiclass test accuracy on the Khan SRBCT cancer dataset using a small subset of genes. Open-source implementations are available in R at https://github.com/benoit-liquet/COMBSS-GLM-R and in Python at https://github.com/saratmoka/COMBSS-GLM-Python.

2603.21918 2026-03-24 stat.ML cs.LG

Structural Concentration in Weighted Networks: A Class of Topology-Aware Indices

L. Riso, M. G. Zoia

Abstract

This paper develops a unified framework for measuring concentration in weighted systems embedded in networks of interactions. While traditional indices such as the Herfindahl-Hirschman Index capture dispersion in weights, they neglect the topology of relationships among the elements receiving those weights. To address this limitation, we introduce a family of topology-aware concentration indices that jointly account for weight distributions and network structure. At the core of the framework lies a baseline Network Concentration Index (NCI), defined as a normalized quadratic form that measures the fraction of potential weighted interconnection realized along observed network links. Building on this foundation, we construct a flexible class of extensions that modify either the interaction structure or the normalization benchmark, including weighted, density-adjusted, null-model, degree-constrained, transformed-data, and multi-layer variants. This family of indices preserves key properties such as normalization, invariance, and interpretability, while allowing concentration to be evaluated across different dimensions of dependence, including intensity, higher-order interactions, and extreme events. Theoretical results characterize the indices and establish their relationship with classical concentration and network measures. Empirical and simulation evidence demonstrates that systems with identical weight distributions may exhibit markedly different levels of structural concentration depending on network topology, highlighting the additional information captured by the proposed framework. The approach is broadly applicable to economic, financial, and complex systems in which weighted elements interact through networks.
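
The abstract does not spell out the NCI formula, so the sketch below uses one natural reading, labeled hypothetical: the quadratic form $\sum_{i \neq j} a_{ij} w_i w_j$ over observed links, normalized by its value $\sum_{i \neq j} w_i w_j$ on a complete graph. It illustrates the key point: two star networks with identical weights, hence identical HHI, can have very different structural concentration.

```python
def hhi(weights):
    """Herfindahl-Hirschman index of shares (weights normalized to sum to 1)."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)


def nci(weights, adjacency):
    """Hypothetical network concentration index (one reading of the abstract).

    Share of potential weighted interconnection, sum_{i != j} w_i w_j, that is
    realized on observed links, sum_{i != j} a_ij w_i w_j. For a 0/1 adjacency
    matrix the value lies in [0, 1].
    """
    n = len(weights)
    realized = sum(adjacency[i][j] * weights[i] * weights[j]
                   for i in range(n) for j in range(n) if i != j)
    potential = sum(weights[i] * weights[j]
                    for i in range(n) for j in range(n) if i != j)
    return realized / potential


if __name__ == "__main__":
    w = [0.4, 0.3, 0.2, 0.1]
    star_heavy = [[1 if i != j and 0 in (i, j) else 0 for j in range(4)] for i in range(4)]
    star_light = [[1 if i != j and 3 in (i, j) else 0 for j in range(4)] for i in range(4)]
    print(hhi(w), nci(w, star_heavy), nci(w, star_light))
```

Centering the star on the heaviest element links most of the weight mass, while centering it on the lightest element leaves most potential interconnection unrealized, even though HHI is the same in both cases.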

2603.21914 2026-03-24 math.ST stat.TH

On the identifiability of Dirichlet mixture models

Hien Duy Nguyen, Mayetri Gupta

Abstract

We study identifiability of finite mixtures of Dirichlet distributions on the interior of the simplex. We first prove a shift identity showing that every Dirichlet density can be written as a mixture of $J$ shifted Dirichlet densities, where $J-1$ is the dimension of the simplex support, which yields non-identifiability on the full parameter space. We then show that identifiability is recovered on a fixed-total parameter slice and on restricted box-type regions. On the full parameter space, we prove that any nontrivial linear relation among Dirichlet kernels must involve at least $J$ coefficients sharing a common sign, and deduce that mixtures with fewer than $J$ atoms are identifiable. We further report direct non-identifiability implications for unrestricted finite mixtures of generalized Dirichlet, Dirichlet-multinomial, fixed-topic-matrix latent Dirichlet allocation, Beta-Liouville, and inverted Beta-Liouville models.
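
A shift identity of this kind follows from $\mathrm{Dir}(x; \alpha + e_j) = (\alpha_0/\alpha_j)\, x_j\, \mathrm{Dir}(x; \alpha)$ with $\alpha_0 = \sum_k \alpha_k$: weighting by $\alpha_j/\alpha_0$ and summing over $j$ uses $\sum_j x_j = 1$ to give $\mathrm{Dir}(x;\alpha) = \sum_{j=1}^{J} (\alpha_j/\alpha_0)\, \mathrm{Dir}(x; \alpha + e_j)$. The sketch checks this numerically; it is one identity of this type, and the paper's exact statement may differ.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) at an interior point x of the simplex."""
    a0 = sum(alpha)
    return (math.lgamma(a0) - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x)))


def shifted_mixture_pdf(x, alpha):
    """Mixture sum_j (alpha_j / alpha_0) * Dirichlet(x; alpha + e_j)."""
    a0 = sum(alpha)
    total = 0.0
    for j, aj in enumerate(alpha):
        shifted = [a + (1.0 if k == j else 0.0) for k, a in enumerate(alpha)]
        total += (aj / a0) * math.exp(dirichlet_logpdf(x, shifted))
    return total


if __name__ == "__main__":
    x, alpha = [0.2, 0.3, 0.5], [1.5, 2.0, 0.7]
    print(shifted_mixture_pdf(x, alpha), math.exp(dirichlet_logpdf(x, alpha)))
```

Since a single Dirichlet equals a J-component mixture, mixture weights and atoms cannot be recovered from the density alone on the full parameter space.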

2603.21844 2026-03-24 cs.LG cs.AI stat.ME stat.ML

On the Number of Conditional Independence Tests in Constraint-based Causal Discovery

Marc Franquesa Monés, Jiaqi Zhang, Caroline Uhler

Abstract

Learning causal relations from observational data is a fundamental problem with wide-ranging applications across many fields. Constraint-based methods infer the underlying causal structure by performing conditional independence tests. However, existing algorithms such as the prominent PC algorithm need to perform a large number of independence tests, which in the worst case is exponential in the maximum degree of the causal graph. Despite extensive research, it remains unclear whether there exist algorithms with better complexity without additional assumptions. Here, we establish an algorithm that achieves a better complexity of $p^{\mathcal{O}(s)}$ tests, where $p$ is the number of nodes in the graph and $s$ denotes the maximum undirected clique size of the underlying essential graph. Complementing this result, we prove that any constraint-based algorithm must perform at least $2^{\Omega(s)}$ conditional independence tests, establishing that our proposed algorithm achieves exponent-optimality up to a logarithmic factor in terms of the number of conditional independence tests needed. Finally, we validate our theoretical findings through simulations, semi-synthetic gene-expression data, and real-world data, demonstrating the efficiency of our algorithm relative to existing methods in the number of conditional independence tests needed.

2603.21752 2026-03-24 stat.AP cs.LG

Identifiability and amortized inference limitations in Kuramoto models

Emma Hannula, Jana de Wiljes, Matthew T. Moores, Heikki Haario, Lassi Roininen

Abstract

Bayesian inference is a powerful tool for parameter estimation and uncertainty quantification in dynamical systems. However, for nonlinear oscillator networks such as Kuramoto models, widely used to study synchronization phenomena in physics, biology, and engineering, inference is often computationally prohibitive due to high-dimensional state spaces and intractable likelihood functions. We present an amortized Bayesian inference approach that learns a neural approximation of the posterior from simulated phase dynamics, enabling fast, scalable inference without repeated sampling or optimization. Applied to synthetic Kuramoto networks, the method shows promising results in approximating posterior distributions and capturing uncertainty, with computational savings compared to traditional Bayesian techniques. These findings suggest that amortized inference is a practical and flexible framework for uncertainty-aware analysis of oscillator networks.
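
Amortized inference of this kind is trained on simulated trajectories. A minimal forward simulator, assuming the classical all-to-all Kuramoto model $\dot\theta_i = \omega_i + \frac{K}{N}\sum_j \sin(\theta_j - \theta_i)$ integrated with forward Euler, is sketched below together with the order parameter $r$ that measures synchrony. Network topology, noise, and the observation model used in the paper are not represented, and the function names are ours.

```python
import cmath
import math


def simulate_kuramoto(theta0, omega, K, dt=0.01, steps=1000):
    """Forward-Euler integration of the all-to-all Kuramoto model."""
    theta = list(theta0)
    n = len(theta)
    for _ in range(steps):
        # The j == i term contributes sin(0) = 0, so it is harmless to include.
        coupling = [sum(math.sin(tj - ti) for tj in theta) for ti in theta]
        theta = [ti + dt * (wi + K / n * ci) for ti, wi, ci in zip(theta, omega, coupling)]
    return theta


def order_parameter(theta):
    """Kuramoto order parameter r = |mean_j exp(i * theta_j)|, in [0, 1]."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))


if __name__ == "__main__":
    theta0 = [-1.0, -0.5, 0.0, 0.5, 1.0]
    print(order_parameter(theta0), order_parameter(simulate_kuramoto(theta0, [0.0] * 5, K=5.0)))
```

Pairs of sampled parameters (K, ω) and simulated phases from a simulator like this are the training data an amortized posterior network would consume.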

2603.21748 2026-03-24 stat.ME

Fixed Rank co-Kriging: a model for multivariate spatial prediction

Gaia Caringi, Piercesare Secchi

Comments 36 pages, 25 figures

Abstract

This work develops a multivariate extension of the Fixed Rank Kriging (FRK) framework for spatial prediction in settings where multiple spatial processes may provide complementary information. The goal is to preserve the computational efficiency, the ability to operate without assuming stationarity over the domain, and the spatial support flexibility of FRK, while incorporating cross-process dependence. To this end, we employ a multiresolution coregionalization structure for the latent spatial effects, in which spatial basis functions are combined with Gaussian Markov Random Field coefficients. An estimation procedure based on the expectation-maximization algorithm is developed, designed to exploit the multiresolution latent structure. Through simulation studies, we examine when the proposed joint modeling is beneficial. We consider cases in which one process is observed more sparsely or is entirely unobserved in a subregion and find that the multivariate formulation is able to borrow information from the more densely observed process, producing coherent and accurate predictions even where direct observations are limited or absent. Finally, the model is applied to the analysis of PM10 concentrations in Northern Italy, illustrating its applicability in a real environmental context.

2603.21699 2026-03-24 econ.EM stat.ML

A Job I Like or a Job I Can Get: Designing Job Recommender Systems Using Field Experiments

Guillaume Bied, Philippe Caillou, Bruno Crépon, Christophe Gaillac, Elia Pérennes, Michèle Sebag

Comments The main paper, which stops at page 49, is followed by the online appendix (31 pages)

Abstract

Recommendation systems (RSs) are increasingly used to guide job seekers on online platforms, yet the algorithms currently deployed are typically optimized for predictive objectives such as clicks, applications, or hires, rather than job seekers' welfare. We develop a job-search model with an application stage in which the value of a vacancy depends on two dimensions: the utility it delivers to the worker and the probability that an application succeeds. The model implies that welfare-optimal RSs rank vacancies by an expected-surplus index combining both, and shows why rankings based solely on utility, hiring probabilities, or observed application behavior are generically suboptimal, an instance of the inversion problem between behavior and welfare. We test these predictions and quantify their practical importance through two randomized field experiments conducted with the French public employment service. The first experiment, comparing existing algorithms and their combinations, provides behavioral evidence that both dimensions shape application decisions. Guided by the model and these results, the second experiment extends the comparison to an RS designed to approximate the welfare-optimal ranking. The experiments generate exogenous variation in the vacancies shown to job seekers, allowing us to estimate the model, validate its behavioral predictions, and construct a welfare metric. Algorithms informed by the model-implied optimal ranking substantially outperform existing approaches and perform close to the welfare-optimal benchmark. Our results show that embedding predictive tools within a simple job-search framework and combining it with experimental evidence yields recommendation rules with substantial welfare gains in practice.
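
The welfare-optimal index combines utility and hiring probability. Its exact form is not given in the abstract, so the sketch assumes the simplest version, hiring probability times utility gain over the outside option (normalized to 0); the vacancy data and helper names are hypothetical. It shows how ranking on either dimension alone can order vacancies differently from the combined index.

```python
def rank_by(vacancies, key):
    """Return vacancy names sorted by a score, best first."""
    return [v["name"] for v in sorted(vacancies, key=key, reverse=True)]


def expected_surplus(v):
    """Hypothetical index: success probability times utility gain over the outside option."""
    return v["p_hire"] * v["utility"]


# Hypothetical vacancies for one job seeker.
vacancies = [
    {"name": "dream job", "utility": 10.0, "p_hire": 0.05},
    {"name": "safe job", "utility": 2.0, "p_hire": 0.90},
    {"name": "good match", "utility": 6.0, "p_hire": 0.50},
]

if __name__ == "__main__":
    print(rank_by(vacancies, key=lambda v: v["utility"]))  # utility only
    print(rank_by(vacancies, key=lambda v: v["p_hire"]))   # hiring probability only
    print(rank_by(vacancies, key=expected_surplus))        # combined index
```

Neither single-dimension ranking puts the "good match" first, which is the kind of gap the model attributes to utility-only or probability-only recommenders.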

2603.21683 2026-03-24 math.OC math.PR stat.ML

Learning operators on labelled conditional distributions with applications to mean field control of non exchangeable systems

Samy Mekkaoui, Huyên Pham, Xavier Warin

Abstract

We study the approximation of operators acting on probability measures on a product space with prescribed marginal. Let $I$ be a label space endowed with a reference measure $\lambda$, and define $\mathcal{M}_\lambda$ as the set of probability measures on $I\times \mathbb{R}^d$ with first marginal $\lambda$. By disintegration, elements of $\mathcal{M}_\lambda$ correspond to families of labeled conditional distributions. Operators defined on this constrained measure space arise naturally in mean-field control problems with heterogeneous, non-exchangeable agents. Our main theoretical result establishes a universal approximation theorem for continuous operators on $\mathcal{M}_\lambda$. The proof combines cylindrical approximations of probability measures with DeepONet-type branch-trunk neural architecture, yielding finite-dimensional representations of such operators. We further introduce a sampling strategy for generating training measures in $\mathcal{M}_\lambda$, enabling practical learning of such conditional mean-field operators. We apply the method to the numerical resolution of mean-field control problems with heterogeneous interactions, thereby extending previous neural approaches developed for homogeneous (exchangeable) systems. Numerical experiments illustrate the accuracy and computational effectiveness of the proposed framework.

2603.21678 2026-03-24 stat.ML cs.LG

CoNBONet: Conformalized Neuroscience-inspired Bayesian Operator Network for Reliability Analysis

Shailesh Garg, Souvik Chakraborty

Abstract

Time-dependent reliability analysis of nonlinear dynamical systems under stochastic excitations is a critical yet computationally demanding task. Conventional approaches, such as Monte Carlo simulation, require repeated evaluations of computationally expensive numerical solvers, leading to significant computational bottlenecks. To address this challenge, we propose \textit{CoNBONet}, a neuroscience-inspired surrogate model that enables fast, energy-efficient, and uncertainty-aware reliability analysis, providing a scalable alternative to techniques such as Monte Carlo simulation. CoNBONet, short for \textbf{Co}nformalized \textbf{N}euroscience-inspired \textbf{B}ayesian \textbf{O}perator \textbf{Net}work, leverages the expressive power of deep operator networks while integrating neuroscience-inspired neuron models to achieve fast, low-power inference. Unlike traditional surrogates such as Gaussian processes, polynomial chaos expansions, or support vector regression, which may face scalability challenges for high-dimensional, time-dependent reliability problems, CoNBONet offers \textit{fast and energy-efficient inference} enabled by a neuroscience-inspired network architecture, \textit{calibrated uncertainty quantification with theoretical guarantees} via split conformal prediction, and \textit{strong generalization capability} through an operator-learning paradigm that maps input functions to system response trajectories. Validation on various nonlinear dynamical systems demonstrates that CoNBONet preserves predictive fidelity and achieves reliable coverage of failure probabilities, making it a powerful tool for robust and scalable reliability analysis in engineering design.

2603.21590 2026-03-24 math.ST cs.LG stat.TH

Feature Incremental Clustering with Generalization Bounds

Jing Zhang, Chenping Hou

Abstract

In many learning systems, such as activity recognition systems, new data collection methods continue to emerge in dynamic environments, so the attributes of instances accumulate incrementally and data are stored in gradually expanding feature spaces. How to design theoretically guaranteed algorithms that effectively cluster this special type of data stream, commonly referred to as activity recognition, remains unexplored. Compared with traditional scenarios, this feature incremental scenario raises at least two fundamental questions. (i) How can preliminary and effective algorithms be designed for the feature incremental clustering problem? (ii) How can the generalization bounds of the proposed algorithms be analyzed, and under what conditions do these algorithms provide a strong generalization guarantee? To address these problems, taking the most common clustering algorithm, $k$-means, as an example, we propose four types of Feature Incremental Clustering (FIC) algorithms corresponding to different situations of data access: Feature Tailoring (FT), Data Reconstruction (DR), Data Adaptation (DA), and Model Reuse (MR), abbreviated as FIC-FT, FIC-DR, FIC-DA, and FIC-MR. We then provide a detailed analysis of the generalization error bounds for these four algorithms and highlight the critical factors influencing these bounds, such as the amount of training data, the complexity of the hypothesis space, the quality of pre-trained models, and the discrepancy of the reconstruction feature distribution. Numerical experiments show the effectiveness of the proposed algorithms, particularly in their application to activity recognition clustering tasks.

2603.21554 2026-03-24 math.OC math.ST stat.TH

Sinkhorn algorithms for entropic vector quantile regression

Kengo Kato, Boyu Wang

Comments 32 pages

Abstract

Vector quantile regression (VQR) is an optimal transport (OT)-based framework that extends linear quantile regression to vector-valued response variables and can be formulated as an OT problem with a mean-independence constraint. In this paper, we study two Sinkhorn-type algorithms for VQR with entropic regularization, building on our previous work on its duality theory. The first is a direct adaptation of the classical Sinkhorn iteration based on solving the full Schrödinger-type system characterizing the dual potentials, which requires solving an implicit functional equation at each iteration. The second algorithm, which is new in the literature, replaces the implicit update with a projected gradient step, resulting in a modified scheme that is computationally more practical. For both algorithms, and for general compactly supported marginals, we establish linear convergence in both the dual objective value and the iterates. A key innovation in our analysis is the derivation of explicit quantitative bounds on the dual potentials and Sinkhorn iterates.
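For orientation, here is a sketch of the classical Sinkhorn iteration for entropic OT between two discrete marginals. It is the plain, unconstrained version only. The VQR algorithms above add a mean-independence constraint, and the second scheme also adds a projected gradient step. This sketch implements neither. The values of `eps` and the iteration count are arbitrary.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.5, n_iter=1000):
    """Classical Sinkhorn scaling for entropic OT between discrete marginals a and b."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)           # rescale rows to match a
        v = b / (K.T @ u)         # rescale columns to match b
    return u[:, None] * K * v[None, :]   # transport plan diag(u) K diag(v)

x = np.linspace(0, 1, 4)
y = np.linspace(0, 1, 5)
C = (x[:, None] - y[None, :]) ** 2    # squared-distance cost
a = np.full(4, 1 / 4)
b = np.full(5, 1 / 5)
P = sinkhorn(a, b, C)
```

After each column update the plan's column sums match `b` exactly, and the row sums converge to `a` at a linear rate.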

2603.21549 2026-03-24 stat.ME stat.CO

Bayesian inference for ordinary differential equations models with heteroscedastic measurement error

Selva Salimi, David J. Warne, Christopher Drovandi

Comments 28 pages

Abstract

Ordinary differential equation (ODE) models are widely used to describe systems in many areas of science. To ensure these models provide accurate and interpretable representations of real-world dynamics, it is often necessary to infer parameters from data, which involves specifying the form of the ODE system as well as a statistical model describing the observational process. A popular and convenient choice for the error model is a Gaussian distribution with constant variance. However, this choice may be unrealistic in many systems, since the variance of the observational error may vary over time or depend on the system state (heteroscedasticity), reflecting changes in measurement conditions, environmental fluctuations, or intrinsic system variability. Misspecification of the error model can lead to substantial inaccuracies in the posterior estimates of the ODE model parameters and in predictions. More elaborate parametric error models could be specified, but they increase computational cost because additional parameters must be estimated within the MCMC procedure, and they may still be misspecified. In this work we propose a two-step semi-parametric framework for Bayesian estimation of ODE model parameters when the error process is heteroscedastic. The first step applies a heteroscedastic Gaussian process to estimate the time-dependent error, and the second step performs Bayesian inference for the ODE model parameters, using the time-dependent error estimated in step one in the likelihood function. Through a simulation study and two real-world applications, we demonstrate that the proposed approach yields more reliable posterior inference and predictive uncertainty than standard homoscedastic models. Although our focus is on heteroscedasticity, the framework could be applied to more complex error processes.
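The second step can be sketched as follows: plug a fixed, time-varying noise standard deviation into a Gaussian likelihood for the ODE output. This sketch makes several assumptions. The σ(t) curve is simply hard-coded, standing in for the heteroscedastic GP fit of step one. The toy ODE dx/dt = -kx uses its closed-form solution instead of a numerical solver. A grid maximization replaces the MCMC sampling the paper uses.

```python
import numpy as np

def ode_solution(t, k, x0=1.0):
    """Closed-form solution of the toy ODE dx/dt = -k x (stand-in for a numerical solver)."""
    return x0 * np.exp(-k * t)

def gaussian_loglik(y, mean, sigma):
    """Sum of independent Gaussian log-densities with per-observation standard deviations."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2)

t = np.linspace(0, 5, 20)
sigma_t = 0.02 + 0.1 * np.exp(-t)   # hypothetical plug-in noise sd from step one
y = ode_solution(t, k=0.8)          # noise-free data keeps the example deterministic
grid = np.linspace(0.2, 1.4, 13)
ll = [gaussian_loglik(y, ode_solution(t, k), sigma_t) for k in grid]
k_hat = grid[int(np.argmax(ll))]
```

In a full analysis the same `gaussian_loglik`, with `sigma_t` held fixed, would sit inside an MCMC sampler over the ODE parameters.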

2603.19422 2026-03-24 stat.ML cs.LG math.ST stat.TH

Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs

Nathan Weill, Kaizheng Wang

Comments 55 pages, 4 figures. Python solvers and experiment scripts are available at: https://github.com/nathanweill/KRGLM

Abstract

We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
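The split-and-impute selection loop described above can be sketched for the squared-loss (kernel ridge) case only. The RBF kernel, its bandwidth, the imputation model's ridge parameter `imp_lam`, and the function names are assumptions made for illustration. The logistic and Poisson cases and the authors' actual solvers are not reproduced.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam, gamma=1.0):
    """Kernel ridge regression (squared loss); returns a prediction function."""
    K = rbf(X, X, gamma)
    coef = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return lambda Z: rbf(Z, X, gamma) @ coef

def select_with_pseudo_labels(Xs, ys, Xt, lams, imp_lam=1e-3, seed=0):
    """Fit candidates on one half of the source data and an imputation model on the other,
    then pick the candidate closest to the imputed target pseudo-labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Xs))
    A, B = idx[: len(Xs) // 2], idx[len(Xs) // 2:]
    candidates = [krr_fit(Xs[A], ys[A], lam) for lam in lams]  # batch 1: candidate family
    imputer = krr_fit(Xs[B], ys[B], imp_lam)                    # batch 2: imputation model
    pseudo = imputer(Xt)                                        # pseudo-labels on the target
    errors = [np.mean((f(Xt) - pseudo) ** 2) for f in candidates]
    best = int(np.argmin(errors))
    return best, candidates[best], errors

rng = np.random.default_rng(1)
Xs = rng.normal(0.0, 1.0, size=(200, 1))            # source covariates
ys = np.sin(2 * Xs[:, 0]) + 0.1 * rng.normal(size=200)
Xt = rng.normal(1.5, 0.5, size=(150, 1))            # shifted, unlabelled target covariates
best, model, errors = select_with_pseudo_labels(Xs, ys, Xt, lams=[1e-4, 1e-2, 1.0])
```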

2603.17866 2026-03-24 stat.AP stat.ME

Bayesian multilevel step-and-turn models for evaluating player movement in American football

Quang Nguyen, Ronald Yurko

Abstract

In sports analytics, player tracking data have driven significant advancements in the task of player evaluation. We present a novel generative framework for evaluating the observed frame-by-frame player positioning against a distribution of hypothetical alternatives. We illustrate our approach by modeling the within-play movement of an individual ball carrier in the National Football League (NFL). Specifically, we develop Bayesian multilevel models for frame-level player movement based on two components: step length (distance between successive locations) and turn angle (change in direction between successive steps). Using the step-and-turn models, we perform posterior predictive simulation to generate hypothetical ball carrier steps at each frame during a play. This enables comparison of the observed player movement with a distribution of simulated alternatives using common valuation measures in American football. We apply our framework to tracking data from the first nine weeks of the 2022 NFL season and derive novel player performance metrics based on hypothetical evaluation.
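The two modelled quantities can be computed from frame-by-frame positions as shown below. This is a minimal sketch that assumes positions arrive as an (n, 2) array in consistent units. The Bayesian multilevel distributions placed on these quantities in the paper are not shown.

```python
import numpy as np

def steps_and_turns(xy):
    """Step lengths and turn angles (radians, wrapped to [-pi, pi)) from an (n, 2) array of positions."""
    xy = np.asarray(xy, dtype=float)
    d = np.diff(xy, axis=0)                                   # displacement between successive frames
    step = np.hypot(d[:, 0], d[:, 1])                         # step length
    heading = np.arctan2(d[:, 1], d[:, 0])                    # direction of each step
    turn = (np.diff(heading) + np.pi) % (2 * np.pi) - np.pi  # change in direction, wrapped
    return step, turn

step, turn = steps_and_turns([[0, 0], [1, 0], [1, 1], [2, 1]])
# step == [1, 1, 1]; turn == [pi/2, -pi/2] (a left turn, then a right turn)
```

Wrapping the angle difference matters because a raw difference of headings can jump by 2π when the direction crosses the ±π boundary.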

2603.15884 2026-03-24 stat.AP stat.ME

A Utility Score Framework for Dose Optimization Studies with Binary Efficacy-Safety Endpoints: Sample Size Determination and Bias Characterization

Xuemin Gu, Cong Xu, Lei Xu, Ying Yu

Abstract

The FDA's Project Optimus initiative emphasizes patient-centered dose selection in oncology that balances efficacy and safety. We develop a framework for randomized dose optimization studies that uses clinically interpretable utility scores to integrate binary efficacy and safety endpoints and select the optimal dose for a follow-on confirmatory trial. The framework provides: (i) a systematic method for eliciting utility scores that reflect clinical priorities; (ii) closed-form sample size formulas to achieve prespecified Probabilities of Correct Selection (PCS) under clinically relevant scenarios; and (iii) analytical expressions characterizing the propagation of selection-induced bias to confirmatory trials, including time-to-event endpoints correlated with the selection endpoint. Extensive simulations (10^6 replications per scenario) confirm that the sample size methods achieve target PCS and that the bias and Type I error formulas closely match empirical estimates. An R package DoseOptDesign and an interactive Shiny application are publicly available.
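The sketch below illustrates how utility scores for the four (efficacy, toxicity) outcomes turn into an expected utility per dose, and how a probability of correct selection can be estimated by simulation. It rests on several assumptions. The utility values are hypothetical. Efficacy and toxicity are treated as independent. Estimated utilities plug in marginal response rates. PCS is obtained by Monte Carlo rather than by the paper's closed-form sample size formulas.

```python
import numpy as np

# Hypothetical utility scores for (efficacy, toxicity) outcomes
UTILITY = {(1, 0): 100.0, (1, 1): 50.0, (0, 0): 30.0, (0, 1): 0.0}

def expected_utility(p_eff, p_tox):
    """Expected utility of a dose, assuming efficacy and toxicity are independent."""
    return sum(
        u * (p_eff if e else 1 - p_eff) * (p_tox if t else 1 - p_tox)
        for (e, t), u in UTILITY.items()
    )

def pcs(p_eff, p_tox, n_per_arm, n_sim=5000, seed=0):
    """Monte Carlo probability that the dose with the highest estimated utility
    is the dose with the highest true expected utility."""
    rng = np.random.default_rng(seed)
    true_best = int(np.argmax([expected_utility(e, t) for e, t in zip(p_eff, p_tox)]))
    hits = 0
    for _ in range(n_sim):
        est = [expected_utility(rng.binomial(n_per_arm, e) / n_per_arm,
                                rng.binomial(n_per_arm, t) / n_per_arm)
               for e, t in zip(p_eff, p_tox)]
        hits += int(np.argmax(est)) == true_best
    return hits / n_sim

# Two hypothetical doses: true expected utilities 47.4 and 53.0
estimated_pcs = pcs([0.3, 0.5], [0.1, 0.3], n_per_arm=50, n_sim=500)
```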

2603.14757 2026-03-24 stat.OT

The Rise of Null Hypothesis Significance Testing (NHST): Institutional Massification and the Emergence of a Procedural Epistemology

Carol Ting

Comments 29 pages, 6 figures. v2: Added missing citation (Ting & Greenland, 2024), corrected formatting issues, and minor typographical edits

Abstract

It has long been a puzzle why, despite sustained reform efforts, many applied scientific fields remain dominated by Null Hypothesis Significance Testing (NHST), a framework that dichotomizes study results and privileges "statistically significant" findings. This paper examines that puzzle by situating the development and rise of NHST within its historical and institutional context. Taking Actor-Network Theory as a point of entry, the analysis identifies the conditions under which particular inferential technologies stabilize and endure. The analysis shows that, although NHST does not resolve the technical problem of statistical inference, it came to dominate as a social technology that addressed the most pressing institutional challenge of the postwar period: the mass expansion of scientific networks. Under conditions of rapid institutional growth, NHST's technical slippages--purging research context and replacing epistemic judgment with mechanical procedures--became functional features rather than flaws. These features enabled procedural self-sufficiency across settings marked by heterogeneous goals and uneven expertise, thereby sealing NHST's position as the obligatory passage point in many postwar scientific fields.

2602.10273 2026-03-24 stat.ML cs.LG

Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning

Seyedarmin Azizi, Erfan Baghaei Potraghloo, Minoo Ahmadi, Souvik Kundu, Massoud Pedram

Abstract

Many recent reasoning gains in large language models can be explained as distribution sharpening: biasing generation toward high-likelihood trajectories already supported by the pretrained model, rather than modifying its weights. A natural formalization is the sequence-level power distribution $π_α(y\mid x)\propto p_θ(y\mid x)^α$ ($α>1$), which concentrates mass on whole sequences instead of adjusting token-level temperature. Prior work shows that Metropolis--Hastings (MH) sampling from this distribution recovers strong reasoning performance, but at order-of-magnitude inference slowdowns. We introduce Power-SMC, a training-free Sequential Monte Carlo scheme that targets the same objective while remaining close to standard decoding latency. Power-SMC advances a small particle set in parallel, corrects importance weights token-by-token, and resamples when necessary, all within a single GPU-friendly batched decode. We prove that temperature $τ=1/α$ is the unique prefix-only proposal minimizing incremental weight variance, interpret residual instability via prefix-conditioned Rényi entropies, and introduce an exponent-bridging schedule that improves particle stability without altering the target. On MATH500, Power-SMC matches or exceeds MH power sampling while reducing latency from $16$--$28\times$ to $1.4$--$3.3\times$ over baseline decoding. The code is available at https://github.com/ArminAzizi98/Power-SMC.
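A useful identity behind the temperature result is the following. If the proposal is q ∝ p^α, i.e. temperature τ = 1/α, then the incremental weight p(y_t)^α / q(y_t) equals Z_t = Σ_v p(v | prefix)^α, whatever token was sampled. The weight therefore depends only on the prefix. The toy SMC below uses this identity. It rests on several assumptions. A small Markov-chain "language model" stands in for a real LLM. Multinomial resampling is triggered when ESS < N/2. No exponent-bridging schedule is included. This is not the authors' implementation.

```python
import numpy as np

def power_proposal(p, alpha):
    """Proposal q ∝ p**alpha (temperature 1/alpha) and the prefix-only incremental weight Z."""
    pa = p ** alpha
    Z = pa.sum()
    return pa / Z, Z

def ess(log_w):
    """Effective sample size of normalized importance weights."""
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / (w ** 2).sum()

def power_smc(trans, start, alpha, n_particles, length, seed=0):
    """Toy SMC targeting p(y)**alpha for a Markov-chain model with transition matrix `trans`."""
    rng = np.random.default_rng(seed)
    seqs = np.full((n_particles, length + 1), start)
    log_w = np.zeros(n_particles)
    for t in range(length):
        for i in range(n_particles):
            q, Z = power_proposal(trans[seqs[i, t]], alpha)
            seqs[i, t + 1] = rng.choice(len(q), p=q)
            log_w[i] += np.log(Z)  # p(y_t)^alpha / q(y_t) == Z for every sampled token
        if ess(log_w) < n_particles / 2:  # multinomial resampling, then reset to equal weights
            w = np.exp(log_w - log_w.max())
            idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
            seqs, log_w = seqs[idx], np.zeros(n_particles)
    return seqs, log_w

trans = np.array([[0.7, 0.2, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6]])
seqs, log_w = power_smc(trans, start=0, alpha=4.0, n_particles=16, length=10)
```

Because Z_t differs across prefixes, the particle weights still diverge, which is why resampling is needed.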

2602.08998 2026-03-24 math.AT cs.LG math.OA stat.ML

Universal Coefficients and Mayer-Vietoris Sequence for Groupoid Homology

Luciano Melodia

Comments Master's thesis, Code available at https://codeberg.org/Jiren/MSc

Abstract

We study homology of ample groupoids via the compactly supported Moore complex of the nerve. Let $A$ be a topological abelian group. For $n\ge 0$ set $C_n(\mathcal G;A) := C_c(\mathcal G_n,A)$ and define $\partial_n^A=\sum_{i=0}^n(-1)^i(d_i)_*$. This defines $H_n(\mathcal G;A)$. The theory is functorial for continuous étale homomorphisms. It is compatible with standard reductions, including restriction to saturated clopen subsets. In the ample setting it is invariant under Kakutani equivalence. We reprove Matui-type long exact sequences and identify the comparison maps at the chain level. For discrete $A$ we prove a natural universal coefficient short exact sequence $$0\to H_n(\mathcal G)\otimes_{\mathbb Z}A\xrightarrow{\ ι_n^{\mathcal G}\ }H_n(\mathcal G;A)\xrightarrow{\ κ_n^{\mathcal G}\ }\operatorname{Tor}_1^{\mathbb Z}\bigl(H_{n-1}(\mathcal G),A\bigr)\to 0.$$ The key input is the chain-level isomorphism $C_c(\mathcal G_n,\mathbb Z)\otimes_{\mathbb Z}A\cong C_c(\mathcal G_n,A)$, which reduces the groupoid statement to the classical algebraic UCT for the free complex $C_c(\mathcal G_\bullet,\mathbb Z)$. We also isolate the obstruction for non-discrete coefficients. For a locally compact totally disconnected Hausdorff space $X$ with a basis of compact open sets, the image of $Φ_X:C_c(X,\mathbb Z)\otimes_{\mathbb Z}A\to C_c(X,A)$ is exactly the set of compactly supported functions with finite image. Thus $Φ_X$ is surjective if and only if every $f\in C_c(X,A)$ has finite image, and for suitable $X$ one can produce compactly supported continuous maps $X\to A$ with infinite image. Finally, for a clopen saturated cover $\mathcal G_0=U_1\cup U_2$ we construct a short exact sequence of Moore complexes and derive a Mayer-Vietoris long exact sequence for $H_\bullet(\mathcal G;A)$ for explicit computations.

2602.07098 2026-03-24 stat.CO cs.LG stat.ML

BayesFlow 2: Multi-Backend Amortized Bayesian Inference in Python

Lars Kühmichel, Jerry M. Huang, Valentin Pratz, Jonas Arruda, Hans Olischläger, Daniel Habermann, Simon Kucharsky, Lasse Elsemüller, Aayush Mishra, Niels Bracher, Svenja Jedhoff, Marvin Schmitt, Paul-Christian Bürkner, Stefan T. Radev

Abstract

Modern Bayesian inference involves a mixture of computational methods for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows. An overarching motif of many Bayesian methods is that they are relatively slow, which often becomes prohibitive when fitting complex models to large data sets. Amortized Bayesian inference (ABI) offers a path to solving these computational challenges. ABI trains neural networks on model simulations, rewarding users with rapid inference of any model-implied quantity, such as point estimates, likelihoods, or full posterior distributions. In this work, we present the Python library BayesFlow, Version 2.0, for general-purpose ABI. Along with direct posterior, likelihood, and ratio estimation, the software includes support for multiple popular deep learning backends, a rich collection of generative networks for sampling and density estimation, complete customization and high-level interfaces, as well as new capabilities for hyperparameter optimization, design optimization, and hierarchical modeling. Using a case study on dynamical system parameter estimation, combined with comparisons to similar software, we show that our streamlined, user-friendly workflow has strong potential to support broad adoption.

2601.09888 2026-03-24 econ.EM math.ST stat.TH

Learning about Treatment Effects with Prior Studies: A Bayesian Model Averaging Approach

Frederico Finan, Demian Pouzo

Abstract

We establish concentration rates for estimation of treatment effects in experiments that incorporate prior sources of information -- such as past pilots, related studies, or expert assessments -- whose external validity is uncertain. Each source is modeled as a Gaussian prior with its own mean and precision, and sources are combined using Bayesian model averaging (BMA), allowing data from the new experiment to update posterior weights. To capture empirically relevant settings in which prior studies may be as informative as the current experiment, we introduce a nonstandard asymptotic framework in which prior precisions grow with the experiment's sample size. In this regime, posterior weights are governed by an external-validity index that depends jointly on a source's bias and information content: biased sources are exponentially downweighted, while unbiased sources dominate. When at least one source is unbiased, our procedure concentrates on the unbiased set and achieves faster convergence than relying on new data alone. When all sources are biased, including a deliberately conservative (diffuse) prior guarantees robustness and recovers the standard convergence rate.
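For a single Gaussian experiment estimate, the posterior weights and the averaged effect can be computed in closed form. The sketch below assumes a normal likelihood for the estimate ȳ with known standard error and equal prior model probabilities by default. It shows only the finite-sample conjugate calculation, not the paper's asymptotic regime with growing prior precisions. A deliberately diffuse source can be included simply as an entry with small precision.

```python
import numpy as np

def bma_posterior(ybar, se, prior_means, prior_precisions, prior_probs=None):
    """Bayesian model averaging over Gaussian priors for a treatment effect estimated as ybar ± se."""
    m = np.asarray(prior_means, float)
    tau = np.asarray(prior_precisions, float)
    k = len(m)
    pi = np.full(k, 1 / k) if prior_probs is None else np.asarray(prior_probs, float)
    # Marginal likelihood of ybar under source k: N(m_k, se^2 + 1/tau_k)
    var = se**2 + 1 / tau
    log_ml = -0.5 * np.log(2 * np.pi * var) - 0.5 * (ybar - m) ** 2 / var
    log_post = np.log(pi) + log_ml
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    # Conjugate posterior mean of the effect under each source
    prec = tau + 1 / se**2
    post_means = (tau * m + ybar / se**2) / prec
    return w, float(w @ post_means)

# One source centred at the data, one biased source; equal precisions
w, effect = bma_posterior(1.0, 0.1, prior_means=[1.0, 3.0], prior_precisions=[100.0, 100.0])
```

In this example the biased source receives a weight of roughly e^-100. This mirrors the exponential downweighting described in the abstract.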

2511.17167 2026-03-24 math.ST cs.CR stat.ME stat.TH

Differentially private testing for relevant dependencies in high dimensions

Patrick Bastian, Holger Dette, Martin Dunsche

Comments 39 pages, 9 figures

Abstract

We investigate the problem of detecting dependencies between the components of a high-dimensional vector. Our approach advances the existing literature in two important respects. First, we consider the problem under privacy constraints. Second, instead of testing whether the coordinates are pairwise independent, we are interested in determining whether certain pairwise associations between the components (such as all pairwise Kendall's $τ$ coefficients) do not exceed a given threshold in absolute value. Considering hypotheses of this form is motivated by the observation that in the high-dimensional regime, it is rare and perhaps impossible to have a null hypothesis that can be modeled exactly by assuming that all pairwise associations are precisely equal to zero. Formulating the null hypothesis as a composite hypothesis makes constructing tests non-standard even in the non-private setting. Additionally, under privacy constraints, state-of-the-art procedures rely on permutation approaches that are rendered invalid under a composite null. We propose a novel bootstrap-based methodology that is especially powerful in sparse settings, develop theoretical guarantees under mild assumptions, and show that the proposed method enjoys good finite-sample properties even in the high-privacy regime. Additionally, we present applications to medical data that showcase the applicability of our methodology.
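The statistic at the heart of such a test is max over pairs of |τ_jk|. The sketch below computes it and releases it with the Laplace mechanism. This is a naive baseline under assumptions, not the paper's bootstrap procedure. It uses replace-one neighbouring datasets and Kendall's tau-a. Replacing one row changes at most n - 1 pairs, each by at most 2, so each τ moves by at most 4/n. The maximum of absolute values inherits that bound.

```python
import numpy as np

def kendall_tau_a(x, y):
    """Kendall's tau-a over all pairs (O(n^2)); tied pairs contribute zero."""
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (sx * sy).sum() / (n * (n - 1))   # each unordered pair is counted twice

def private_max_abs_tau(X, epsilon, seed=0):
    """epsilon-DP release of max_{j<k} |tau_jk| via the Laplace mechanism (naive baseline)."""
    n, p = X.shape
    stat = max(abs(kendall_tau_a(X[:, j], X[:, k]))
               for j in range(p) for k in range(j + 1, p))
    sensitivity = 4.0 / n   # replace-one sensitivity of each tau, inherited by the max
    rng = np.random.default_rng(seed)
    return stat + rng.laplace(scale=sensitivity / epsilon)

X = np.random.default_rng(0).normal(size=(200, 5))
released = private_max_abs_tau(X, epsilon=1.0)
```

Comparing `released` with the threshold does not by itself give a valid test under a composite null. Calibrating that comparison is what the paper's bootstrap procedure addresses.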

2511.01137 2026-03-24 cs.LG math.AG math.DS stat.ML

Regularization Implies balancedness in the deep linear network

Kathryn Lindsey, Govind Menon

Comments 18 pages, 3 figures. Fixed minor errors in revision, added more context and created Discussion section

Abstract

We use geometric invariant theory (GIT) to study the deep linear network (DLN). The Kempf-Ness theorem is used to establish that the $L^2$ regularizer is minimized on the balanced manifold. We introduce related balancing flows using the Riemannian geometry of fibers. The balancing flow defined by the $L^2$ regularizer is shown to converge to the balanced manifold at a uniform exponential rate. The balancing flow defined by the squared moment map is computed explicitly and shown to converge globally. This framework allows us to decompose the training dynamics into two distinct gradient flows: a regularizing flow on fibers and a learning flow on the balanced manifold. It also provides a common mathematical framework for balancedness in deep learning and linear systems theory. We use this framework to interpret balancedness in terms of fast-slow systems, model reduction and Bayesian principles.
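For a two-layer network x -> W2 W1 x, balancedness means W2ᵀW2 = W1W1ᵀ. The sketch below builds a balanced factorization from an SVD and checks a special case of the statement above: moving along the fiber W1 -> G W1, W2 -> W2 G⁻¹ leaves the product unchanged but does not lower the L² regularizer. It is a numerical illustration only. It is not the GIT argument or the balancing flows, and the matrix sizes and the choice of G are arbitrary.

```python
import numpy as np

def imbalance(W1, W2):
    """Balancedness defect W2^T W2 - W1 W1^T for the two-layer linear network x -> W2 W1 x."""
    return W2.T @ W2 - W1 @ W1.T

def l2_reg(W1, W2):
    """Squared Frobenius-norm (L2) regularizer."""
    return np.sum(W1**2) + np.sum(W2**2)

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 4))                      # end-to-end map to be factored
U, s, Vt = np.linalg.svd(M, full_matrices=False)
W1 = np.sqrt(s)[:, None] * Vt                    # W1 = S^{1/2} V^T
W2 = U * np.sqrt(s)[None, :]                     # W2 = U S^{1/2}, so W2 @ W1 == M

G = np.diag([2.0, 1.0, 0.5])                     # an arbitrary move along the fiber
W1_moved, W2_moved = G @ W1, W2 @ np.linalg.inv(G)
```

The balanced factorization attains 2·Σσᵢ, twice the nuclear norm of M. This is the smallest value of the regularizer over all factorizations of M.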

2509.19988 2026-03-24 stat.ML cs.LG q-bio.QM

BioBO: Biology-informed Bayesian Optimization for Perturbation Design

Yanke Li, Tianyu Cui, Tommaso Mansi, Mangal Prakash, Rui Liao

Comments ICLR 2026

Abstract

Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible due to the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates Bayesian optimization with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework, which biases the search toward promising genes while maintaining the ability to explore uncertain regions. Through experiments on established public benchmarks and datasets, we demonstrate that BioBO improves labeling efficiency by 25-40%, and consistently outperforms conventional BO by identifying top-performing perturbations more effectively. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.

2508.14936 2026-03-24 q-bio.QM cs.AI cs.LG stat.AP stat.ML

Can synthetic data reproduce real-world findings in epidemiology? A replication study using adversarial random forests

Jan Kapar, Kathrin Günther, Lori Ann Vallis, Klaus Berger, Nadine Binder, Hermann Brenner, Stefanie Castell, Beate Fischer, Volker Harth, Bernd Holleczek, Timm Intemann, Till Ittermann, André Karch, Thomas Keil, Lilian Krist, Berit Lange, Michael F. Leitzmann, Katharina Nimptsch, Nadia Obi, Iris Pigeot, Tobias Pischon, Tamara Schikowski, Börge Schmidt, Carsten Oliver Schmidt, Anja M. Sedlmair, Justine Tanoey, Harm Wienbergen, Andreas Wienke, Claudia Wigmann, Marvin N. Wright

Abstract

Synthetic data holds substantial potential to address practical challenges in epidemiology due to restricted data access and privacy concerns. However, many current methods suffer from limited quality, high computational demands, and complexity for non-experts. Furthermore, common evaluation strategies for synthetic data often fail to directly reflect statistical utility and measure privacy risks sufficiently. Against this background, a critical underexplored question is whether synthetic data can reliably reproduce key findings from epidemiological research while preserving privacy. We propose adversarial random forests (ARF) as an efficient and convenient method for synthesizing tabular epidemiological data. To evaluate its performance, we replicated statistical analyses from six epidemiological publications covering blood pressure, anthropometry, myocardial infarction, accelerometry, loneliness, and diabetes, from the German National Cohort (NAKO Gesundheitsstudie), the Bremen STEMI Registry U45 Study, and the Guelph Family Health Study. We further assessed how dataset dimensionality and variable complexity affect the quality of synthetic data, and contextualized ARF's performance by comparison with commonly used tabular data synthesizers in terms of utility, privacy, generalisation, and runtime. Across all replicated studies, results on ARF-generated synthetic data consistently aligned with original findings. Even for datasets with relatively low sample size-to-dimensionality ratios, replication outcomes closely matched the original results across descriptive and inferential analyses. Reduced dimensionality and variable complexity further enhanced synthesis quality. ARF demonstrated favourable performance regarding utility, privacy preservation, and generalisation relative to other synthesizers and superior computational efficiency.

2506.14082 2026-03-24 physics.geo-ph stat.AP

Smooth surface reconstruction of earthquake faults from distributed moment-potency-tensor solutions

Dye SK Sato, Yuji Yagi, Ryo Okuwaki, Yukitoshi Fukahata

Comments 46 pages, 13 figures

Abstract

Earthquake faults as observed by seismic motions primarily manifest as displacement discontinuities within elastic continua. The displacement discontinuity and the surface normal vector (n-vector) of such an idealized earthquake source are measured by the tensor of potency, which is seismic moment normalized by stiffness. This study formulates an inverse problem to reconstruct a smooth 3D fault surface from an areal density field of the potency tensor. Here, the surface is represented by an elevation field, while nodal planes of the potency density represent the surface normal (n-vector) field, reducing the problem to an n-vector-to-elevation transform. Although this transform is a one-to-one mapping in 2D, it becomes overdetermined in 3D because the n-vector has two degrees of freedom while the scalar elevation has only one, admitting no solution in general. This overdeterminacy originates from modeling the potency density, the inelastic strain with six degrees of freedom, as a displacement discontinuity of five degrees of freedom. Whereas this overdeterminacy appears as the violation of the determinant-free constraint in point potency sources, it raises a conflict with the global consistency of the n-vector field in areal potency densities. Recognizing this capacity of the potency density to describe inelastic strain incompatible with displacement discontinuity, we introduce an a priori constraint to define the fault as the smooth surface that best approximates inelastic strain as displacement discontinuity. We derive an analytical solution for this formulation and demonstrate its ability to reproduce 3D surfaces from noisy synthetic n-vectors. We integrate this formula into potency density tensor inversion and apply it to the 2013 Balochistan earthquake. The estimated 3D geometry shows better agreement with observed fault traces than previous quasi-2D methods, validating our proposal.
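In 2D the n-vector-to-elevation transform is one-to-one. For a profile z(x) with unit normal (n_x, n_z), the slope is dz/dx = -n_x / n_z, and integrating the slope recovers the elevation up to a constant. The sketch covers only this 2D case and assumes non-vertical normals (n_z ≠ 0). The overdetermined 3D problem needs the paper's constrained solution, which is not reproduced here.

```python
import numpy as np

def elevation_from_normals_2d(x, nx, nz, z0=0.0):
    """Integrate a 1-D elevation profile z(x) from unit normals (nx, nz) using dz/dx = -nx/nz."""
    slope = -np.asarray(nx) / np.asarray(nz)
    dz = 0.5 * (slope[1:] + slope[:-1]) * np.diff(x)   # trapezoidal rule
    return z0 + np.concatenate([[0.0], np.cumsum(dz)])

# A plane z = 0.5 x has normal proportional to (-0.5, 1)
x = np.linspace(0, 10, 11)
n = np.array([-0.5, 1.0]) / np.hypot(0.5, 1.0)
z = elevation_from_normals_2d(x, np.full_like(x, n[0]), np.full_like(x, n[1]))
```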

2505.16919 2026-03-24 stat.ME

Hilbert space methods for approximating multi-output latent variable Gaussian processes

Soham Mukherjee, Manfred Claassen, Paul-Christian Bürkner

Comments 44 pages, 34 figures

Abstract

Gaussian processes are a powerful class of non-linear models, but have limited applicability for larger datasets due to their high computational complexity. In such cases, approximate methods are required, for example, the recently developed class of Hilbert space Gaussian processes. They have been shown to significantly reduce computation time while retaining most of the favorable properties of exact Gaussian processes. However, Hilbert space approximations have so far only been developed for uni-dimensional outputs and manifest (known) inputs. Thus, we generalize Hilbert space methods to multi-output and latent input settings. Through extensive simulations, we show that the developed approximate Gaussian processes are indeed not only faster, but also provide similar or even better uncertainty calibration and accuracy of latent variable estimates compared to exact Gaussian processes. While not necessarily faster than alternative Gaussian process approximations, our new models provide better calibration and estimation accuracy, thus striking an excellent balance between trustworthiness and speed. We additionally illustrate our methods on a real-world case study from single cell biology.
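The sketch below shows the standard single-output, one-dimensional Hilbert-space approximation that these methods build on. The squared-exponential kernel is expanded in Laplacian eigenfunctions on [-L, L] with Dirichlet boundary conditions, weighted by the kernel's spectral density. The multi-output and latent-input generalizations from the paper are not shown. The boundary L, the number of basis functions m, and the length-scale are illustrative choices.

```python
import numpy as np

def se_spectral_density(omega, sigma=1.0, ell=1.0):
    """Spectral density of the 1-D squared-exponential kernel sigma^2 exp(-r^2 / (2 ell^2))."""
    return sigma**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * omega) ** 2)

def hsgp_kernel(x, xp, m=64, L=5.0, sigma=1.0, ell=1.0):
    """Rank-m Hilbert-space approximation of the SE kernel on [-L, L]."""
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2 * L)                                     # sqrt of Laplacian eigenvalues
    phi = lambda z: np.sin(np.outer(z + L, sqrt_lam)) / np.sqrt(L)    # eigenfunctions, (len(z), m)
    S = se_spectral_density(sqrt_lam, sigma, ell)
    return (phi(x) * S) @ phi(xp).T

x = np.linspace(-1, 1, 21)
approx = hsgp_kernel(x, x, ell=0.5)
exact = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.5) ** 2)
max_err = np.abs(approx - exact).max()
```

The speed-up comes from the fact that the basis functions do not depend on kernel hyperparameters. Only the diagonal weights S change, so a GP with n points reduces to a linear model with m features.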

2504.10881 2026-03-24 stat.ME stat.AP stat.CO

A Nonparametric Bayesian Local-Global Model for Enhanced Adverse Event Signal Detection in Spontaneous Reporting System Data

Xin-Wei Huang, Saptarshi Chakraborty

Abstract

Spontaneous reporting system databases are key resources for post-marketing surveillance, providing real-world evidence (RWE) on the adverse events (AEs) of regulated drugs or other medical products. Various statistical methods have been proposed for AE signal detection in these databases, flagging drug-specific AEs with disproportionately high observed counts compared to expected counts under independence. However, signal detection remains challenging for rare AEs or newer drugs, which receive small observed and expected counts and thus suffer from reduced statistical power. Principled information sharing on signal strengths across drugs/AEs is crucial in such cases to enhance signal detection. However, existing methods typically ignore complex between-drug associations on AE signal strengths, limiting their ability to detect signals. We propose novel local-global mixture Dirichlet process (DP) prior-based nonparametric Bayesian models to capture these associations, enabling principled information sharing between drugs while balancing flexibility and shrinkage for each drug, thereby enhancing statistical power. We develop efficient Markov chain Monte Carlo algorithms for implementation and employ a false discovery rate (FDR)-controlled, false negative rate (FNR)-optimized hypothesis testing framework for AE signal detection. Extensive simulations demonstrate our methods' superior sensitivity -- often surpassing existing approaches by a twofold or greater margin -- while strictly controlling the FDR. An application to FDA FAERS data on statin drugs further highlights our methods' effectiveness in real-world AE signal detection. Software implementing our methods is provided as supplementary material.
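The classical disproportionality quantity mentioned above compares observed report counts with counts expected under drug-AE independence. The sketch computes the expected counts and the observed-to-expected ratio from a drug-by-AE contingency table. It is only the baseline quantity and does not include the paper's Dirichlet-process information sharing. The counts are hypothetical.

```python
import numpy as np

def observed_expected(counts):
    """Expected counts under independence, E_ij = n_i. * n_.j / n.., and the O/E ratio."""
    N = np.asarray(counts, float)
    E = N.sum(axis=1, keepdims=True) * N.sum(axis=0, keepdims=True) / N.sum()
    return E, N / E

# Rows: drugs, columns: adverse events (hypothetical counts)
E, ratio = observed_expected([[10, 0], [0, 10]])
```

Rare AEs and new drugs produce small `E`, so ratios computed from them are noisy. This is the power problem that motivates sharing information across drugs.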

2504.09396 2026-03-24 cs.LG cs.AI stat.ML

Adaptive Insurance Reserving with CVaR-Constrained Reinforcement Learning under Macroeconomic Regimes

Stella C. Dong

Abstract

We develop a reinforcement learning (RL) framework for insurance loss reserving that formulates reserve setting as a finite-horizon sequential decision problem under claim development uncertainty, macroeconomic stress, and solvency governance. The reserving process is modeled as a Markov Decision Process (MDP) in which reserve adjustments influence future reserve adequacy, capital efficiency, and solvency outcomes. A Proximal Policy Optimization (PPO) agent is trained using a risk-sensitive reward that penalizes reserve shortfall, capital inefficiency, and breaches of a volatility-adjusted solvency floor, with tail risk explicitly controlled through Conditional Value-at-Risk (CVaR). To reflect regulatory stress-testing practice, the agent is trained under a regime-aware curriculum and evaluated using both regime-stratified simulations and fixed-shock stress scenarios. Empirical results for Workers Compensation and Other Liability illustrate how the proposed RL-CVaR policy improves tail-risk control and reduces solvency violations relative to classical actuarial reserving methods, while maintaining comparable capital efficiency. We further discuss calibration and governance considerations required to align model parameters with firm-specific risk appetite and supervisory expectations under Solvency II and Own Risk and Solvency Assessment (ORSA) frameworks.
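The tail-risk term can be estimated from simulated losses as the mean of the worst (1 - α) fraction. This is a minimal empirical CVaR sketch. It assumes losses are oriented so that larger is worse and rounds the tail size to the nearest whole sample. How the paper combines CVaR with its reward terms is not reproduced here.

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of losses, using at least one sample."""
    x = np.sort(np.asarray(losses, float))
    m = max(1, int(round((1 - alpha) * len(x))))
    return x[-m:].mean()

cvar_95 = empirical_cvar(np.arange(1, 101), alpha=0.95)   # mean of 96..100 = 98.0
```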

2503.04071 2026-03-24 stat.ML cs.LG

Tightening optimality gap with confidence through conformal prediction

Miao Li, Michael Klamkin, Russell Bent, Pascal Van Hentenryck

Abstract

Decision makers routinely use constrained optimization technology to plan and operate complex systems like global supply chains or power grids. In this context, practitioners must assess how close a computed solution is to optimality in order to make operational decisions, such as whether the current solution is sufficient or whether additional computation is warranted. A common practice is to evaluate solution quality using dual bounds returned by optimization solvers. While these dual bounds come with certified guarantees, they are often too loose to be practically informative. To this end, this paper introduces a novel conformal prediction framework for tightening loose primal and dual bounds. The proposed method addresses the heteroskedasticity commonly observed in these bounds via selective inference, and further exploits their inherent certified validity to produce tighter, more informative prediction intervals. Finally, numerical experiments on large-scale industrial problems suggest that the proposed approach can provide the same coverage level more efficiently than baseline methods.
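
For context, here is a plain split-conformal interval around a point prediction. It assumes absolute residuals from a held-out calibration set, and it optionally clips the interval to certified bounds. This is a generic baseline under those assumptions, not the paper's selective-inference method. The names `conformal_quantile` and `split_conformal_interval` are hypothetical:

```python
import numpy as np

def conformal_quantile(residuals, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest |residual|."""
    r = np.sort(np.abs(np.asarray(residuals, dtype=float)))
    n = r.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:                      # too few calibration points for this alpha
        return np.inf
    return r[k - 1]

def split_conformal_interval(pred, residuals, alpha=0.1, lower=-np.inf, upper=np.inf):
    """Interval pred +/- q, intersected with certified bounds [lower, upper].

    Assumption (illustrative): `lower` is a valid dual bound and `upper` a
    primal objective value, so the true optimum lies in [lower, upper].
    """
    q = conformal_quantile(residuals, alpha)
    return max(pred - q, lower), min(pred + q, upper)

# Hypothetical numbers: calibration residuals |true optimum - predicted optimum|.
calib = np.arange(1, 11)
print(split_conformal_interval(100.0, calib, alpha=0.1, lower=95.0, upper=120.0))
```

Clipping cannot hurt coverage here. If the true optimal value is known to lie between the certified bounds, any conformal interval that covers it still covers it after the intersection.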

2502.04907 2026-03-24 stat.ML cs.LG

Scalable Learning from Probability Measures with Mean Measure Quantization

Erell Gachon, Elsa Cazelles, Jérémie Bigot

Abstract

We consider statistical learning problems in which data are observed as a set of probability measures. Optimal transport (OT) is a popular tool to compare and manipulate such objects, but its computational cost becomes prohibitive when the measures have large support. We study a quantization-based approach in which all input measures are approximated by $K$-point discrete measures sharing a common support. We establish consistency of the resulting quantized measures. We further derive convergence guarantees for several OT-based downstream tasks computed from the quantized measures. Numerical experiments on synthetic and real datasets demonstrate that the proposed approach achieves performance comparable to individual quantization while substantially reducing runtime.
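
The sketch below illustrates shared-support quantization. It assumes the common support comes from running k-means (Lloyd iterations) on the pooled samples. This approximates the mean measure when every input measure has the same number of samples. Each measure then keeps only its histogram over the $K$ centers. The helper names are hypothetical, and the paper's exact quantizer may differ:

```python
import numpy as np

def lloyd_kmeans(points, K, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns K centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=K, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):        # keep the old center if a cluster empties
                centers[k] = points[labels == k].mean(axis=0)
    return centers

def quantize_measures(samples_list, K, seed=0):
    """Approximate each empirical measure by weights on K shared support points.

    Assumption: the shared support is a k-means quantization of the pooled
    samples, which matches the mean measure when all sample sizes are equal.
    """
    pooled = np.vstack(samples_list).astype(float)
    centers = lloyd_kmeans(pooled, K, seed=seed)
    weights = []
    for X in samples_list:
        d2 = ((np.asarray(X, float)[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        weights.append(np.bincount(d2.argmin(axis=1), minlength=K) / len(X))
    return centers, np.vstack(weights)    # (K, dim) support, (n_measures, K) weights
```

Once every measure lives on the same $K$ points, OT between any two of them needs only a $K \times K$ cost matrix, regardless of how many samples each measure originally had.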

2501.16933 2026-03-24 stat.ME math.ST stat.AP stat.TH

Rethinking the Win Ratio: A Causal Framework for Hierarchical Outcome Analysis

Mathieu Even, Julie Josse

Abstract

Quantifying causal effects in the presence of complex and multivariate outcomes remains a key challenge in treatment evaluation. For hierarchical multivariate outcomes, the FDA recommends the Win Ratio and Generalized Pairwise Comparisons approaches (Pocock et al., 2011; Buyse, 2010). However, commonly used estimators can yield treatment recommendations that target a population-level estimand (the probability that a randomly sampled patient under treatment fares better than another randomly sampled patient under control), which can contradict conclusions drawn from an ideal estimand (the probability that an individual would fare better with treatment than without), especially in heterogeneous populations. This discrepancy arises from the non-identifiability of the latter estimand and underscores both the influence of the chosen causal measure on the resulting conclusions and the necessity of articulating the underlying causal framework with clarity. We propose a novel, individual-level yet identifiable causal effect measure that more closely approximates the ideal individual-level estimand. We show that computing the Win Ratio or Net Benefit via nearest-neighbor pairing between treated and control patients, which can be seen as an extreme form of stratification, yields an estimator of our new causal measure in both randomized controlled trials and observational settings. We then develop a distributional regression framework, alongside semiparametric efficient estimators. Our methods are simple to implement and readily applicable in practice. We evaluate the proposed approach through simulations and apply it to the CRASH-3 trial (CRASH-3 trial collaborators, 2019), a major study assessing the effects of tranexamic acid in patients with traumatic brain injury.
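
The Win Ratio and Net Benefit come from pairwise hierarchical comparisons. A treated patient and a control patient are compared on the highest-priority outcome, and the next outcome is used only if they tie. The sketch below makes several assumptions. Each patient's outcomes form a tuple ordered by priority, with larger values better. The nearest-neighbor variant uses a one-dimensional covariate and matches with replacement. The function names are hypothetical, and this is not the authors' estimator or its efficient version:

```python
def compare(a, b):
    """Hierarchical comparison of two patients' outcome tuples (priority order,
    larger is better): +1 if a wins, -1 if b wins, 0 if tied on every level."""
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0                                  # tie on all outcomes

def win_ratio_all_pairs(treated, control):
    """Classic estimator: compare every treated patient with every control patient."""
    wins = losses = 0
    for t in treated:
        for c in control:
            s = compare(t, c)
            wins += int(s == 1)
            losses += int(s == -1)
    n_pairs = len(treated) * len(control)
    return wins / losses, (wins - losses) / n_pairs   # (Win Ratio, Net Benefit)

def win_ratio_nn(treated, control, x_treated, x_control):
    """Nearest-neighbor variant: pair each treated patient with the control whose
    (1-D, hypothetical) covariate is closest, with replacement."""
    wins = losses = 0
    for t, xt in zip(treated, x_treated):
        j = min(range(len(control)), key=lambda k: abs(x_control[k] - xt))
        s = compare(t, control[j])
        wins += int(s == 1)
        losses += int(s == -1)
    return wins / losses, (wins - losses) / len(treated)

# Outcomes: (survived, functional score); toy values.
treated = [(1, 5), (1, 3)]
control = [(0, 9), (1, 4)]
print(win_ratio_all_pairs(treated, control))
print(win_ratio_nn(treated, control, x_treated=[0.0, 10.0], x_control=[1.0, 9.0]))
```

Both functions divide by the number of losses, so they assume at least one loss occurs. In the toy data the two estimators disagree (3.0 versus 1.0) because pairing on the covariate changes which comparisons are made.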

2501.06404 2026-03-24 econ.EM cs.AI cs.LG stat.ML

A Hybrid Framework for Reinsurance Optimization: Integrating Generative Models and Reinforcement Learning

Stella C. Dong

Abstract

Reinsurance optimization is a cornerstone of solvency and capital management, yet traditional approaches often rely on restrictive distributional assumptions and static program designs. We propose a hybrid framework that combines Variational Autoencoders (VAEs) to learn joint distributions of multi-line and multi-year claims data with Proximal Policy Optimization (PPO) reinforcement learning to adapt treaty parameters dynamically. The framework explicitly targets expected surplus under capital and ruin-probability constraints, bridging statistical modeling with sequential decision-making. Using simulated and stress-test scenarios, including pandemic-type and catastrophe-type shocks, we show that the hybrid method produces more resilient outcomes than classical proportional and stop-loss benchmarks, delivering higher surpluses and lower tail risk. Our findings highlight the usefulness of generative models for capturing cross-line dependencies and demonstrate the feasibility of RL-based dynamic structuring in practical reinsurance settings. Contributions include (i) clarifying optimization goals in reinsurance RL, (ii) defending generative modeling relative to parametric fits, and (iii) benchmarking against established methods. This work illustrates how hybrid AI techniques can address modern challenges of portfolio diversification, catastrophe risk, and adaptive capital allocation.

2408.05819 2026-03-24 stat.ML cs.LG

Fast convergence of a Federated Expectation-Maximization Algorithm

Zhixu Tao, Rajita Chandak, Sanjeev Kulkarni

Abstract

Data heterogeneity has been a long-standing bottleneck in studying the convergence rates of Federated Learning algorithms. In order to better understand the issue of data heterogeneity, we study the convergence rate of the Expectation-Maximization (EM) algorithm for the Federated Mixture of $K$ Linear Regressions model (FMLR). We completely characterize the convergence rate of the EM algorithm under all regimes of the number of clients and the number of data points per client, with partial limits in the number of clients. We show that with a signal-to-noise ratio (SNR) of order at least $\sqrt{K}$, the well-initialized EM algorithm converges to the ground truth under all regimes. We perform experiments on synthetic data to illustrate our results. In line with our theoretical findings, the simulations show that rather than being a bottleneck, data heterogeneity can accelerate the convergence of iterative federated algorithms.
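
The federated structure of EM can be illustrated with sufficient statistics. Each client runs the E-step on its own data and sends back weighted Gram matrices. The server then solves the weighted least-squares M-step. The sketch below assumes a known noise level `sigma`, mixing weights shared across clients and a good initialization. The FMLR model analyzed in the paper may assign components to clients differently, and the function names are hypothetical:

```python
import numpy as np

def local_stats(X, y, betas, pis, sigma):
    """Client-side E-step; returns sufficient statistics only, never raw data."""
    resid = y[:, None] - X @ betas.T                 # (n, K) residual under each component
    log_w = np.log(pis) - resid ** 2 / (2 * sigma ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)        # numerical stabilization
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)                # responsibilities
    XtWX = np.einsum("nk,ni,nj->kij", w, X, X)       # (K, d, d) weighted Gram matrices
    XtWy = np.einsum("nk,ni,n->ki", w, X, y)         # (K, d)
    return XtWX, XtWy, w.sum(axis=0), len(y)

def federated_em(clients, init_betas, sigma, n_iter=30):
    """Server loop: aggregate client statistics, then solve each weighted least-squares M-step.

    Assumptions: known noise level sigma and mixing weights shared by all clients.
    """
    betas = np.array(init_betas, dtype=float)
    K = len(betas)
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        stats = [local_stats(X, y, betas, pis, sigma) for X, y in clients]
        XtWX = sum(s[0] for s in stats)
        XtWy = sum(s[1] for s in stats)
        counts = sum(s[2] for s in stats)
        n_total = sum(s[3] for s in stats)
        betas = np.stack([np.linalg.solve(XtWX[k], XtWy[k]) for k in range(K)])
        pis = counts / n_total
    return betas, pis

# Synthetic check: 3 clients, K = 2 components in d = 2 dimensions.
rng = np.random.default_rng(1)
true_betas = np.array([[2.0, 1.0], [-2.0, -1.0]])
clients = []
for _ in range(3):
    X = rng.standard_normal((200, 2))
    z = rng.integers(0, 2, size=200)                 # latent component of each point
    y = np.einsum("ni,ni->n", X, true_betas[z]) + 0.1 * rng.standard_normal(200)
    clients.append((X, y))
est_betas, est_pis = federated_em(clients, [[1.5, 0.5], [-1.5, -0.5]], sigma=0.1)
print(np.round(est_betas, 2), np.round(est_pis, 2))
```

Summing the client statistics gives exactly the same M-step as pooling all data centrally. Only the communication pattern changes.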

2407.11455 2026-03-24 math.ST stat.TH

ERM-Lasso classification algorithm for Multivariate Hawkes Processes paths

Charlotte Dion-Blanc, Christophe Denis, Laure Sansonnet, Romain Edmond Lacoste

Abstract

We are interested in the problem of classifying Multivariate Hawkes Processes (MHP) paths coming from several classes. MHP form a versatile family of point processes that models interactions between connected individuals within a network. In this paper, the classes are discriminated by the exogenous intensity vector and the adjacency matrix, which encodes the strength of the interactions. The observed learning data consist of labeled repeated and independent paths on a fixed time interval. Moreover, we consider the high-dimensional setting, meaning the dimension of the network may be large with respect to the number of observations. We consequently require a sparsity assumption on the adjacency matrix. In this context, we propose a novel methodology with an initial interaction recovery step, by class, followed by a refitting step based on a suitable classification criterion. To recover the support of the adjacency matrix, a Lasso-type estimator is proposed, for which we establish rates of convergence. Then, leveraging the estimated support, we build a classification procedure based on the minimization of an $L_2$-risk. Notably, rates of convergence of our classification procedure are provided. An in-depth testing phase using synthetic data supports both theoretical results.
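
For reference, the intensity of a multivariate Hawkes process combines the exogenous rates with excitation from past events, weighted by the adjacency matrix. The sketch below assumes an exponential kernel $\beta e^{-\beta s}$. It also assumes the convention that `A[i, j]` is the influence of node $j$ on node $i$. The abstract does not specify the kernel, and `mhp_intensity` is a hypothetical name:

```python
import numpy as np

def mhp_intensity(t, events, mu, A, beta):
    """Intensity vector lambda(t) of a multivariate Hawkes process.

    lambda_i(t) = mu_i + sum_j A[i, j] * sum_{s in events[j], s < t} beta * exp(-beta * (t - s))
    Assumption: exponential kernel; A[i, j] is the influence of node j on node i.
    """
    lam = np.array(mu, dtype=float)            # exogenous intensities
    A = np.asarray(A, dtype=float)             # adjacency (interaction) matrix
    for j, times in enumerate(events):
        times = np.asarray(times, dtype=float)
        past = times[times < t]                # only events strictly before t excite
        excitation = beta * np.exp(-beta * (t - past)).sum()
        lam += A[:, j] * excitation
    return lam

# Two nodes; node 0 fired at time 0 and excites node 1 only.
print(mhp_intensity(1.0, [[0.0], []], mu=[0.5, 0.2], A=[[0.0, 0.3], [0.4, 0.0]], beta=1.0))
```

A zero entry `A[i, j]` means node $j$ never excites node $i$. Sparsity of the adjacency matrix therefore means that most such pairs contribute nothing to the intensity.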

2402.08412 2026-03-24 stat.ML cs.LG math.DS math.ST stat.TH

Interacting Particle Systems on Networks: joint inference of the network and the interaction kernel

Quanjun Lang, Xiong Wang, Fei Lu, Mauro Maggioni

Comments 53 pages, 17 figures

Abstract

Modeling multi-agent systems on networks is a fundamental challenge in a wide variety of disciplines. Given data consisting of multiple trajectories, we jointly infer the (weighted) network and the interaction kernel, which determine, respectively, which agents are interacting and the rules of such interactions. Our estimator is based on a non-convex optimization problem, and we investigate two approaches to solve it: one based on an alternating least squares (ALS) algorithm, and another based on a new algorithm named operator regression with alternating least squares (ORALS). Both algorithms are scalable to large ensembles of data trajectories. We establish coercivity conditions guaranteeing identifiability and well-posedness. The ALS algorithm appears statistically efficient and robust even in the small data regime, but lacks performance and convergence guarantees. The ORALS estimator is consistent and asymptotically normal under a coercivity condition. We conduct several numerical experiments ranging from Kuramoto particle systems on networks to opinion dynamics in leader-follower models.

2402.01491 2026-03-24 stat.ME math.PR math.ST stat.AP stat.TH

Moving Aggregate Modified Autoregressive Copula-Based Time Series Models (MAGMAR-Copulas)

Sven Pappert

Abstract

Copula-based time series models can model univariate and stationary time series in a flexible way by decomposing the joint distribution of consecutive observations into a copula and the stationary distribution. Implicitly, this approach assumes a finite Markov order. In reality, a time series may not follow the Markov property. We modify the copula-based time series models by introducing a moving aggregate (MAG) part into the model updating equation. The functional form of the MAG-part is given as the conditional quantile function corresponding to a copula. The resulting MAG-modified Autoregressive Copula-Based Time Series model (MAGMAR-Copula) is discussed in detail and distributional properties are derived in a D-vine framework. We show that the stationary distribution implied by the model is not standard-uniform. Hence, we propose an adjustment transformation that recovers the desired standard-uniformity. The model nests the classical ARMA model and can be interpreted as a non-linear generalization of the ARMA model. The modeling performance is evaluated in an application to US inflation. Our model is competitive with benchmark models in terms of information criteria.

2312.02246 2026-03-24 cs.CV cs.AI cs.LG stat.ML

Conditional Variational Diffusion Models

Gabriel della Maggiora, Luis Alberto Croquevielle, Nikita Deshpande, Harry Horsley, Thomas Heinis, Artur Yakimovich

Comments Denoising Diffusion Probabilistic Models, Inverse Problems, Generative Models, Super Resolution, Phase Quantification, Variational Methods

Journal ref
In The Twelfth International Conference on Learning Representations. 2023
Abstract

Inverse problems aim to determine parameters from observations, a crucial task in engineering and science. Lately, generative models, especially diffusion models, have gained popularity in this area for their ability to produce realistic solutions and their good mathematical properties. Despite their success, an important drawback of diffusion models is their sensitivity to the choice of variance schedule, which controls the dynamics of the diffusion process. Fine-tuning this schedule for specific applications is crucial but time-consuming and does not guarantee an optimal result. We propose a novel approach for learning the schedule as part of the training process. Our method supports probabilistic conditioning on data, provides high-quality solutions, and is flexible, adapting to different applications with minimal overhead. This approach is tested on two unrelated inverse problems, super-resolution microscopy and quantitative phase imaging, yielding comparable or superior results to previous methods and fine-tuned diffusion models. We conclude that fine-tuning the schedule by experimentation should be avoided because it can be learned during training in a stable way that yields better results.

2310.09335 2026-03-24 stat.ML cs.LG math.ST stat.TH

The surrogate Gibbs-posterior of a corrected stochastic MALA: Towards uncertainty quantification for neural networks

Sebastian Bieringer, Gregor Kasieczka, Maximilian F. Steffen, Mathias Trabs

Comments The first version of this manuscript was entitled "Statistical guarantees for stochastic Metropolis-Hastings". Some preliminary results were initially presented in the first version of arXiv:2204.12392, but have been moved to this manuscript, where they have been further developed.

Journal ref
Journal of Machine Learning Research, 27 (1), 1-50, 2026
Abstract

MALA is a popular gradient-based Markov chain Monte Carlo method to access the Gibbs-posterior distribution. Stochastic MALA (sMALA) scales to large data sets, but changes the target distribution from the Gibbs-posterior to a surrogate posterior which only exploits a reduced sample size. We introduce a corrected stochastic MALA (csMALA) with a simple correction term for which the distance between the resulting surrogate posterior and the original Gibbs-posterior decreases in the full sample size while retaining scalability. In a nonparametric regression model, we prove a PAC-Bayes oracle inequality for the surrogate posterior. Uncertainties can be quantified by sampling from the surrogate posterior. Focusing on Bayesian neural networks, we analyze the diameter and coverage of credible balls for shallow neural networks and we show optimal contraction rates for deep neural networks. Our credibility result is independent of the correction and can also be applied to the standard Gibbs-posterior. A simulation study in a high-dimensional parameter space demonstrates that an estimator drawn from csMALA based on its surrogate Gibbs-posterior indeed exhibits these advantages in practice.
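
The sketch below shows the plain, full-data MALA step that sMALA and csMALA build on, and it assumes a differentiable log-density. It does not implement the stochastic mini-batch gradients or the csMALA correction term, and the function name `mala` is hypothetical:

```python
import numpy as np

def mala(log_density, grad_log_density, x0, step, n_samples, seed=0):
    """Metropolis-adjusted Langevin algorithm targeting exp(log_density), full-data version."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))

    def log_q(to, frm):
        # Log density (up to a constant) of the Langevin proposal N(frm + step * grad, 2 * step * I)
        mean = frm + step * grad_log_density(frm)
        return -np.sum((to - mean) ** 2) / (4 * step)

    accepted = 0
    for i in range(n_samples):
        # Langevin proposal: gradient drift plus Gaussian noise
        prop = x + step * grad_log_density(x) + np.sqrt(2 * step) * rng.standard_normal(x.size)
        # Metropolis-Hastings correction for the asymmetric proposal
        log_alpha = (log_density(prop) + log_q(x, prop)) - (log_density(x) + log_q(prop, x))
        if np.log(rng.uniform()) < log_alpha:
            x = prop
            accepted += 1
        samples[i] = x
    return samples, accepted / n_samples

# Target: standard normal in one dimension.
draws, acc_rate = mala(lambda x: -0.5 * np.sum(x ** 2), lambda x: -x, np.zeros(1), step=0.5, n_samples=5000)
print(acc_rate, draws.mean(), draws.var())
```

The accept/reject step makes the chain target the exact density. sMALA replaces the gradient and the log-densities with mini-batch versions, which changes the stationary distribution to the surrogate posterior.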

2304.12505 2026-03-24 math.ST stat.ML stat.TH

Generalized Bayesian Additive Regression Trees: Theory and Software

Enakshi Saha

Comments 39 pages

Abstract

Bayesian Additive Regression Trees (BART) are a powerful ensemble learning technique for modeling nonlinear regression functions. Although initially BART was proposed for predicting only continuous and binary response variables, over the years multiple extensions have emerged that are suitable for estimating a wider class of response variables (e.g. categorical and count data) in a multitude of application areas. In this paper we describe a generalized framework for Bayesian trees and their additive ensembles where the response variable comes from an exponential family distribution and hence encompasses many prominent variants of BART. We derive sufficient conditions on the response distribution, under which the posterior concentrates at a minimax rate, up to a logarithmic factor. In this regard our results provide theoretical justification for the empirical success of BART and its variants. To support practitioners, we develop a Python package, also accessible in R via reticulate, that implements GBART for a range of exponential family response variables including Poisson, Inverse Gaussian, and Gamma distributions, alongside the standard continuous regression and binary classification settings. The package provides a user-friendly interface, enabling straightforward implementation of BART models across a broad class of response distributions.

2207.05901 2026-03-24 stat.AP stat.CO

Virtual sensing of subsoil strain response in monopile-based offshore wind turbines via Gaussian process latent force models

Joanna Zou, Eliz-Mari Lourens, Alice Cicirello

Comments submitted to Mechanical Systems and Signal Processing

Journal ref
Mechanical Systems and Signal Processing (2023), vol. 200, 110488
Abstract

Virtual sensing techniques have gained traction in applications to the structural health monitoring of monopile-based offshore wind turbines, as the strain response below the mudline, which is a primary indicator of fatigue damage accumulation, is impractical to measure directly with physical instrumentation. The Gaussian process latent force model (GPLFM) is a generalized Bayesian virtual sensing technique which combines a physics-driven model of the structure with a data-driven model of latent variables of the system to extrapolate unmeasured strain states. In the GPLFM, modeling of unknown sources of excitation as a Gaussian process (GP) serves to facilitate strain estimation by providing a complete stochastic characterization of the covariance relationship between input forces and states, using properties of the GP covariance kernel as well as correlation information supplied by the mechanical model. It is shown that posterior inference of the latent inputs and states is performed by Gaussian process regression of measured accelerations, computed efficiently using Kalman filtering and Rauch-Tung-Striebel smoothing in an augmented state-space model. While the GPLFM has been previously demonstrated in numerical studies to improve upon other virtual sensing techniques in terms of accuracy, robustness, and numerical stability, this work provides one of the first cases of in-situ validation of the GPLFM. The predicted strain response by the GPLFM is compared to subsoil strain data collected from an operating offshore wind turbine in the Westermeerwind Park in the Netherlands.

2206.02088 2026-03-24 stat.ML cs.LG stat.ME

LOCO Feature Importance Inference without Data Splitting via Minipatch Ensembles

Luqin Gan, Lili Zheng, Genevera I. Allen

Abstract

Feature importance inference is critical for the interpretability and reliability of machine learning models. There has been increasing interest in developing model-agnostic approaches to interpret any predictive model, often in the form of feature occlusion or leave-one-covariate-out (LOCO) inference. Existing methods typically make limiting distributional assumptions, modeling assumptions, and require data splitting. In this work, we develop a novel, mostly model-agnostic, and distribution-free inference framework for feature importance in regression or classification tasks that does not require data splitting. Our approach leverages a form of random observation and feature subsampling called minipatch ensembles; it utilizes the trained ensembles for inference and requires no model-refitting or held-out test data after training. We show that our approach enjoys both computational and statistical efficiency as well as circumvents interpretational challenges with data splitting. Further, despite using the same data for training and inference, we show the asymptotic validity of our confidence intervals under mild assumptions. Additionally, we propose theory-supported solutions to critical practical issues including vanishing variance for null features and inference after data-driven tuning for hyperparameters. We demonstrate the advantages of our approach over existing methods on a series of synthetic and real data examples.

2111.11566 2026-03-24 stat.ME stat.AP

Combining chains of Bayesian models with Markov melding

Andrew A. Manderson, Robert J. B. Goudie

Comments 37 pages, 14 figures. Revisions to text

Journal ref
Bayesian Anal. (2023) 18(3):807-840
Abstract

A challenge for practitioners of Bayesian inference is specifying a model that incorporates multiple relevant, heterogeneous data sets. It may be easier to instead specify distinct submodels for each source of data, then join the submodels together. We consider chains of submodels, where submodels directly relate to their neighbours via common quantities which may be parameters or deterministic functions thereof. We propose chained Markov melding, an extension of Markov melding, a generic method to combine chains of submodels into a joint model. One challenge we address is appropriately capturing the prior dependence between common quantities within a submodel, whilst also reconciling differences in priors for the same common quantity between two adjacent submodels. Estimating the posterior of the resulting overall joint model is also challenging, so we describe a sampler that uses the chain structure to incorporate information contained in the submodels in multiple stages, possibly in parallel. We demonstrate our methodology using two examples. The first example considers an ecological integrated population model, where multiple data sets are required to accurately estimate population immigration and reproduction rates. We also consider a joint longitudinal and time-to-event model with uncertain, submodel-derived event times. Chained Markov melding is a conceptually appealing approach to integrating submodels in these settings.

2110.11442 2026-03-24 math.OC cs.LG stat.ML

Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Sharan Vaswani, Benjamin Dubois-Taine, Reza Babanezhad

Abstract

We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $\kappa$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an $\tilde{O}\left(\exp\left(\frac{-T}{\kappa}\right) + \frac{\sigma^2}{T}\right)$ rate, without knowing $\sigma^2$. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down in proportion to the estimation error. Next, we prove that SGD with Nesterov acceleration and exponential step-sizes (referred to as ASGD) can achieve the near-optimal $\tilde{O}\left(\exp\left(\frac{-T}{\sqrt{\kappa}}\right) + \frac{\sigma^2}{T}\right)$ rate, without knowledge of $\sigma^2$. When used with offline estimates of the smoothness and strong-convexity, ASGD still converges to the solution, albeit at a slower rate. We empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.
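
The sketch below shows SGD with exponentially decreasing step-sizes. It assumes the schedule $\gamma_t = \gamma_0\,\alpha^t$ with $\alpha = (\beta/T)^{1/T}$, so the step shrinks from $\gamma_0$ to roughly $\gamma_0\beta/T$ over $T$ iterations. These constants are an assumption and may not match the paper's. The line-search and accelerated variants are not shown, and the function name is hypothetical:

```python
import numpy as np

def sgd_exponential_steps(grad_sample, w0, T, gamma0, beta=1.0, seed=0):
    """SGD with gamma_t = gamma0 * alpha**t and alpha = (beta / T) ** (1 / T).

    Assumption: gamma0 is about 1/L (inverse smoothness); the noise level
    sigma^2 is never used, which is the point of the schedule.
    """
    rng = np.random.default_rng(seed)
    alpha = (beta / T) ** (1.0 / T)
    w = float(w0)
    for t in range(T):
        w -= gamma0 * alpha ** t * grad_sample(w, rng)
    return w

def noisy_grad(w, rng):
    # f(w) = E[(w - z)^2 / 2] with z ~ N(3, 1): L = mu = 1, minimizer w* = 3.
    return w - (3.0 + rng.standard_normal())

print(sgd_exponential_steps(noisy_grad, w0=0.0, T=5000, gamma0=1.0))
```

Early large steps shrink the initial error quickly. The late small steps average out the gradient noise without the noise level ever being supplied.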

2004.02881 2026-03-24 stat.ML cs.CG cs.LG cs.NE

Estimate of the Neural Network Dimension using Algebraic Topology and Lie Theory

Luciano Melodia, Richard Lenz

Comments Code available at https://codeberg.org/Jiren/NTOPL

Abstract

In this paper, we present an approach to determine the smallest possible number of neurons in a layer of a neural network in such a way that the topology of the input space can be learned sufficiently well. We introduce a general procedure based on persistent homology to investigate topological invariants of the manifold on which we suspect the data set lies. We specify the required dimensions precisely, assuming that there is a smooth manifold on or near which the data are located. Furthermore, we require that this space is connected and has a commutative group structure in the mathematical sense. These assumptions allow us to derive a decomposition of the underlying space whose topology is well known. We use the representatives of the $k$-dimensional homology groups from the persistence landscape to determine an integer dimension for this decomposition. This number is the dimension of the embedding that is capable of capturing the topology of the data manifold. We derive the theory and validate it experimentally on toy data sets.

1911.02922 2026-03-24 cs.CG cs.LG math.AT stat.ML

Persistent Homology as Stopping-Criterion for Voronoi Interpolation

Luciano Melodia, Richard Lenz

Comments Code available at https://codeberg.org/Jiren/SIML

Abstract

In this study, Voronoi interpolation is used to interpolate a set of points drawn from a topological space with higher homology groups on its filtration. The technique is based on Voronoi tessellation, which induces a natural dual map to the Delaunay triangulation. This fact is exploited by computing the persistent homology on the triangulation after each iteration to capture the changing topology of the data. The boundary points are identified as critical. The bottleneck and Wasserstein distances serve as measures of quality between the original point set and the interpolation. If the norm of the two distances exceeds a heuristically determined threshold, the algorithm terminates. We give the theoretical basis for this approach and justify its validity with numerical experiments.

1807.04021 2026-03-24 math.ST eess.SP stat.TH

On Bayesian estimation and proximity operators

Rémi Gribonval, Mila Nikolova

Comments Compared to the published version, this document (March 2026) includes typo corrections in Proposition 5, indicated in blue

Journal ref
Applied and Computational Harmonic Analysis, 2021, 50, pp. 49-72
Abstract

There are two major routes to address the ubiquitous family of inverse problems appearing in signal and image processing, such as denoising or deblurring. A first route relies on Bayesian modeling, where prior probabilities are used to embody models of both the distribution of the unknown variables and their statistical dependence with respect to the observed data. The estimation process typically relies on the minimization of an expected loss (e.g. minimum mean squared error, or MMSE). The second route has received much attention in the context of sparse regularization and compressive sensing: it consists of designing (often convex) optimization problems involving the sum of a data fidelity term and a penalty term promoting certain types of unknowns (e.g., sparsity, promoted through an $\ell_1$ norm). Well-known relations between these two approaches have led to some widespread misconceptions. In particular, while the so-called Maximum A Posteriori (MAP) estimate with a Gaussian noise model does lead to an optimization problem with a quadratic data-fidelity term, we disprove through explicit examples the common belief that the converse would be true. It has already been shown [7, 9] that for denoising in the presence of additive Gaussian noise, for any prior probability on the unknowns, MMSE estimation can be expressed as a penalized least squares problem, with the apparent characteristics of a MAP estimation problem with Gaussian noise and a (generally) different prior on the unknowns. In other words, the variational approach is rich enough to build all possible MMSE estimators associated with additive Gaussian noise via a well-chosen penalty. We generalize these results beyond Gaussian denoising and characterize noise models for which the same phenomenon occurs.
In particular, we prove that with (a variant of) Poisson noise and any prior probability on the unknowns, MMSE estimation can again be expressed as the solution of a penalized least squares optimization problem. For additive scalar denoising the phenomenon holds if and only if the noise distribution is log-concave. In particular, Laplacian denoising can (perhaps surprisingly) be expressed as the solution of a penalized least squares problem. In the multivariate case, the same phenomenon occurs when the noise model belongs to a particular subset of the exponential family. For multivariate additive denoising, the phenomenon holds if and only if the noise is white and Gaussian.
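
The Gaussian case that these results generalize can be written out explicitly. Take $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$. The MAP estimate is then a penalized least-squares problem with penalty $-\sigma^2 \log p_X$. The MMSE estimate is given by Tweedie's formula in terms of the marginal density $p_Y$ of the observations. The penalty $\varphi$ in the last line is the one whose existence is asserted above. Its explicit form is not given here:

```latex
\begin{aligned}
\hat{x}_{\mathrm{MAP}}(y)  &= \arg\min_{x}\; \tfrac{1}{2}\|y - x\|_2^2 - \sigma^2 \log p_X(x), \\
\hat{x}_{\mathrm{MMSE}}(y) &= \mathbb{E}[X \mid Y = y] = y + \sigma^2 \nabla \log p_Y(y), \\
\hat{x}_{\mathrm{MMSE}}(y) &= \arg\min_{x}\; \tfrac{1}{2}\|y - x\|_2^2 + \varphi(x) \quad \text{for some penalty } \varphi .
\end{aligned}
```

In general $\varphi \neq -\sigma^2 \log p_X$. This is why an MMSE estimator can look like a MAP estimator under a different prior.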

2603.21424 2026-03-24 stat.ME math.ST stat.TH

Tiny but uniform improvements of adaptive BH procedures via compound e-values

Nikolaos Ignatiadis, Ruodu Wang, Aaditya Ramdas

Abstract

After the seminal Benjamini-Hochberg (BH) procedure for controlling the false discovery rate (FDR) was proposed, dozens of papers have attempted to improve its power by adapting to the unknown proportion of nulls. We observe that most null proportion estimates are simply compound e-values in disguise, and thus most adaptive FDR procedures can be interpreted as instances of the e-weighted BH (ep-BH) procedure of Ignatiadis, Wang, and Ramdas [2024], i.e., the BH procedure weighted by compound e-values. This lens helps us show that most existing procedures are inadmissible, and we provide uniform improvements to them. While the improvements are small in practice, they still come for free (without additional assumptions), and help unify the literature. We also use our "leave-one-out ep-BH method" to design a new method with finite-sample FDR control for the simultaneous t-test setting.
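
The sketch below shows the BH procedure and BH applied to e-weighted p-values $p_i/e_i$, and it assumes nonnegative e-values. Reading ep-BH this way is an assumption based on the description above. The function names are hypothetical. If every $e_i$ equals $1/\hat{\pi}_0$, the weighted version reduces to BH run at level $\alpha/\hat{\pi}_0$, which is the form of many adaptive procedures:

```python
import numpy as np

def bh(p_values, alpha=0.1):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m     # alpha * i / m for i = 1..m
    passed = np.nonzero(p[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed.max() + 1]] = True     # reject all up to the largest passing rank
    return reject

def e_weighted_bh(p_values, e_values, alpha=0.1):
    """BH applied to p_i / e_i (assumed reading of e-weighted BH).

    Assumption: e-values are nonnegative; e_i = 0 means hypothesis i is never rejected.
    """
    p = np.asarray(p_values, dtype=float)
    e = np.asarray(e_values, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        weighted = np.where(e > 0, p / e, np.inf)
    return bh(weighted, alpha)

p = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.061, 0.074, 0.205, 0.212, 0.216])
print(bh(p, 0.05).sum())                                   # plain BH
print(e_weighted_bh(p, np.full(p.size, 2.0), 0.05).sum())  # e_i = 1/pi0_hat with pi0_hat = 0.5
```

In this example, halving the null-proportion estimate raises the number of rejections from 2 to 5. This reflects how adaptive procedures gain power, and it is also why their validity depends on the estimate behaving like a compound e-value.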

2603.21407 2026-03-24 econ.TH stat.AP

The Geometry of Heterogeneous Extremes: Optimal Transport and Entropic Design

I. Sebastian Buhai

Abstract

Extreme economic outcomes are not shaped by tails alone. They are also shaped by unequal access to opportunities. This paper develops a theory of heterogeneous extremes by taking the distribution of opportunity access as the object of study. In a mixed Poisson search setting, normalized maxima admit a Laplace mixture representation that yields order comparisons and a clean benchmark against the homogeneous economy. The main contribution is geometric: a canonical coupling turns differences in heterogeneity into optimal transport bounds for the whole induced law of extremes, the full schedule of top quantiles, and structured counterfactual paths between economies. The paper also derives a second-order expansion that separates classical extreme value approximation error from heterogeneity effects. As a complementary normative exercise, it studies an entropy-regularized design problem for reallocating opportunities under a mean constraint. A stylized labor market network application interprets heterogeneity as unequal access to job opportunities and shows how the framework can be used for tail counterfactuals and robustness analysis of top wage distributions.

2603.21393 2026-03-24 cs.LG stat.ML

A Generalised Exponentiated Gradient Approach to Enhance Fairness in Binary and Multi-class Classification Tasks

Maryam Boubekraoui, Giordano d'Aloisio, Antinisca Di Marco

Abstract

The widespread use of AI and ML models in sensitive areas raises significant concerns about fairness. While the research community has introduced various methods for bias mitigation in binary classification tasks, the issue remains under-explored in multi-class classification settings. To address this limitation, in this paper, we first formulate the problem of fair learning in multi-class classification as a multi-objective problem between effectiveness (i.e., prediction correctness) and multiple linear fairness constraints. Next, we propose a Generalised Exponentiated Gradient (GEG) algorithm to solve this task. GEG is an in-processing algorithm that enhances fairness in binary and multi-class classification settings under multiple fairness definitions. We conduct an extensive empirical evaluation of GEG against six baselines across seven multi-class and three binary datasets, using four widely adopted effectiveness metrics and three fairness definitions. GEG outperforms existing baselines, improving fairness by up to 92% while decreasing accuracy by up to 14%.

2603.21375 2026-03-24 cs.LG stat.ML

Constrained Online Convex Optimization with Memory and Predictions

Mohammed Abdullah, George Iosifidis, Salah Eddine Elayoubi, Tijani Chahed

Comments accepted to AAAI 2026

Journal ref
Proceedings of the AAAI Conference on Artificial Intelligence, 40(24):19524-19532, 2026
Abstract

We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied unconstrained online optimization with memory framework and captures practical problems such as the control of constrained dynamical systems and scheduling with reconfiguration budgets. For this problem, we propose the first algorithms that achieve sublinear regret and sublinear cumulative constraint violation under time-varying constraints, both with and without predictions of future loss and constraint functions. Without predictions, we introduce an adaptive penalty approach that guarantees sublinear regret and constraint violation. When short-horizon and potentially unreliable predictions are available, we reinterpret the problem as online learning with delayed feedback and design an optimistic algorithm whose performance improves as prediction accuracy improves, while remaining robust when predictions are inaccurate. Our results bridge the gap between classical constrained online convex optimization and memory-dependent settings, and provide a versatile learning toolbox with diverse applications.

2603.21370 2026-03-24 stat.ME cs.SY eess.SY

Adaptive and robust experimental design for linear dynamical models using Kalman filter

Arno Strouwen, Bart M. Nicolaï, Peter Goos

Details
Journal ref
Statistical Papers, 64, 1209-1231, 2023
English abstract

Current experimental design techniques for dynamical systems often only incorporate measurement noise, while dynamical systems also involve process noise. To construct experimental designs we need to quantify their information content. The Fisher information matrix is a popular tool to do so. Calculating the Fisher information matrix for linear dynamical systems with both process and measurement noise involves estimating the uncertain dynamical states using a Kalman filter. The Fisher information matrix, however, depends on the true but unknown model parameters. In this paper we combine two methods to solve this issue and develop a robust experimental design methodology. First, Bayesian experimental design averages the Fisher information matrix over a prior distribution of possible model parameter values. Second, adaptive experimental design allows for this information to be updated as measurements are being gathered. This updated information is then used to adapt the remainder of the design.

2603.21361 2026-03-24 stat.ME stat.CO

A Note on the Output of a Coordinate-Exchange Algorithm for Optimal Experimental Design

Arno Strouwen, Peter Goos

Details
Journal ref
Chemometrics and Intelligent Laboratory Systems, 192, 103819, 2019
English abstract

The coordinate-exchange algorithm is commonly used to construct optimal experimental designs. Every execution of the coordinate-exchange algorithm produces a new, seemingly random, order of the selected design points. In this short communication, we study the order of the design points produced by the algorithm and conclude that certain orders appear much more often than others. As a result, an explicit randomization step of the design points is required before conducting an experiment using a design produced by a coordinate-exchange algorithm.
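The explicit randomization step recommended above amounts to shuffling the design points before running the experiment. A minimal sketch follows, assuming the design is a list of factor-level tuples. `randomized_run_order` is a hypothetical helper, and Python's `random.shuffle` (a Fisher-Yates shuffle) serves as the uniform permutation.

```python
import random

def randomized_run_order(design, seed=None):
    """Return the design points in a random run order; the input list is not modified."""
    runs = list(design)
    random.Random(seed).shuffle(runs)   # Fisher-Yates shuffle, uniform up to the PRNG
    return runs

# A hypothetical 2^2 factorial design plus a center point, in the order an optimizer emitted it.
design = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0)]
run_order = randomized_run_order(design, seed=42)
print(run_order)
```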

2603.21342 2026-03-24 stat.ML cs.AI cs.CL cs.LG

Generalized Discrete Diffusion from Snapshots

Oussama Zekri, Théo Uscidda, Nicolas Boullé, Anna Korba

Comments 37 pages, 6 figures, 13 tables

Details
English abstract

We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches, while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization and enables fast arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents, instead of the entire noising path, that allows efficient training of standard generative modeling architectures with clear probabilistic interpretation. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in terms of training efficiency and generation quality, and beats autoregressive models for the first time at this scale. We provide the code along with a blog post on the project page: https://oussamazekri.fr/gdds.
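The uniformization idea behind the forward process can be illustrated on a small continuous-time Markov chain. This sketch assumes a rate matrix `q` with nonnegative off-diagonal entries, rows summing to zero, and at least one nonzero exit rate. It is the textbook construction, not the GDDS code. Jumps occur at Poisson times with rate λ at least the largest exit rate, and each jump follows P = I + Q/λ. This lets x_t be sampled, or its marginal computed, without a matrix exponential.

```python
import numpy as np

def uniformized_marginal(q, t, p0, tol=1e-12, max_terms=10_000):
    """Forward marginal p0 @ expm(q * t) via uniformization:
    expm(q t) = sum_k Poisson(k; lam t) P^k with P = I + q / lam."""
    q = np.asarray(q, float)
    lam = np.max(-np.diag(q))              # uniformization rate >= every exit rate
    jump = np.eye(len(q)) + q / lam        # stochastic jump matrix P
    weight = np.exp(-lam * t)              # Poisson(lam t) probability of zero jumps
    vec = np.asarray(p0, float)
    out = weight * vec
    mass = weight
    for k in range(1, max_terms):
        if 1.0 - mass <= tol:              # remaining Poisson mass bounds the L1 error
            break
        vec = vec @ jump                   # distribution after k jumps
        weight *= lam * t / k
        out += weight * vec
        mass += weight
    return out

def sample_forward(q, t, x0, rng):
    """Draw x_t given x_0: sample the number of jumps, then step the jump chain."""
    q = np.asarray(q, float)
    lam = np.max(-np.diag(q))
    jump = np.eye(len(q)) + q / lam
    x = x0
    for _ in range(rng.poisson(lam * t)):
        x = rng.choice(len(q), p=jump[x])
    return x

# Uniform noising on S states with rate alpha: Q = alpha * (J / S - I).
S, alpha, t = 5, 2.0, 0.7
Q = alpha * (np.ones((S, S)) / S - np.eye(S))
p0 = np.eye(S)[0]
print(uniformized_marginal(Q, t, p0))
print(sample_forward(Q, t, 0, np.random.default_rng(0)))
```

For this uniform Q the exact marginal is exp(-αt)·p0 + (1 - exp(-αt))/S, which gives a convenient check.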

2603.21247 2026-03-24 stat.ML cs.LG math.DG physics.data-an

Accelerate Vector Diffusion Maps by Landmarks

Sing-Yuan Yeh, Yi-An Wu, Hau-Tieng Wu, Mao-Pei Tsui

Details
English abstract

We propose a landmark-constrained algorithm, LA-VDM (Landmark Accelerated Vector Diffusion Maps), to accelerate the Vector Diffusion Maps (VDM) framework built upon the Graph Connection Laplacian (GCL), which captures pairwise connection relationships within complex datasets. LA-VDM introduces a novel two-stage normalization that effectively addresses nonuniform sampling densities in both the data and the landmark sets. Under a manifold model with the frame bundle structure, we show that we can accurately recover the parallel transport with landmark-constrained diffusion from a point cloud, and hence asymptotically LA-VDM converges to the connection Laplacian. The performance and accuracy of LA-VDM are demonstrated through experiments on simulated datasets and an application to nonlocal image denoising.

2603.21235 2026-03-24 stat.ML cs.AI cs.CV

Domain Elastic Transform: Bayesian Function Registration for High-Dimensional Scientific Data

Osamu Hirose, Emanuele Rodola

Details
English abstract

Nonrigid registration is conventionally divided into point set registration, which aligns sparse geometries, and image registration, which aligns continuous intensity fields on regular grids. However, this dichotomy creates a critical bottleneck for emerging scientific data, such as spatial transcriptomics, where high-dimensional vector-valued functions, e.g., gene expression, are defined on irregular, sparse manifolds. Consequently, researchers currently face a forced choice: either sacrifice single-cell resolution via voxelization to utilize image-based tools, or ignore the critical functional signal to utilize geometric tools. To resolve this dilemma, we propose Domain Elastic Transform (DET), a grid-free probabilistic framework that unifies geometric and functional alignment. By treating data as functions on irregular domains, DET registers high-dimensional signals directly without binning. We formulate the problem within a rigorous Bayesian framework, modeling domain deformation as an elastic motion guided by a joint spatial-functional likelihood. The method is fully unsupervised and scalable, utilizing feature-sensitive downsampling to handle massive atlases. We demonstrate that DET achieves 92% topological preservation on MERFISH data where state-of-the-art optimal transport methods struggle (<5%), and successfully registers whole-embryo Stereo-seq atlases across developmental stages, a task involving massive scale and complex nonrigid growth. The implementation of DET is available at https://github.com/ohirose/bcpd (since March 2025).

2603.21216 2026-03-24 stat.AP

VA-Calibration: Correcting for Algorithmic Misclassification in Estimating Cause Distributions

Sandipan Pramanik, Emily B. Wilson, Henry D. Kalter, Agbessi Amouzou, Robert E. Black, Li Liu, Jamie Perin, Abhirup Datta

Comments 27 pages, 5 figures

Details
English abstract

Accurate estimation of cause-specific mortality fractions (CSMFs), the percentage of deaths attributable to each cause in a population, is essential for global health monitoring. A challenge arises because computer-coded verbal autopsy (CCVA) algorithms, commonly used to estimate CSMFs, frequently misclassify the cause of death (COD). This misclassification is further complicated by structured patterns and substantial variation across countries. To address this, we introduce the R package 'vacalibration'. It implements a modular Bayesian framework to correct for the misclassification, thereby yielding more accurate CSMF estimates from verbal autopsy (VA) questionnaire data. The package utilizes uncertainty-quantified CCVA misclassification matrix estimates derived from data collected in the CHAMPS project and available in the 'CCVA-Misclassification-Matrices' GitHub repository. Currently, these matrices cover three CCVA algorithms (EAVA, InSilicoVA, and InterVA) and two age groups (neonates aged 0-27 days, and children aged 1-59 months) across countries (specific estimates for Bangladesh, Ethiopia, Kenya, Mali, Mozambique, Sierra Leone, and South Africa, and a combined estimate for all other countries), enabling global calibration. The 'vacalibration' package also supports ensemble calibration when multiple algorithms are available. Implemented using 'RStan', the package offers rapid computation, uncertainty quantification, and seamless compatibility with openVA, a leading COD analysis software ecosystem. We demonstrate the package's flexibility with two real-world applications in COMSA-Mozambique and CA CODE. The package and its foundational methodology apply more broadly and can calibrate any discrete classifier or ensemble of classifiers.
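To make the misclassification model concrete, let M[i, j] be the probability that the algorithm assigns cause j when the true cause is i. The expected algorithm-reported CSMF vector is then q = p M. The sketch below inverts this relation by least squares and projects the result back onto the simplex. It is a deliberately crude plug-in under that assumption, not the Bayesian calibration implemented in 'vacalibration', and the matrix is hypothetical.

```python
import numpy as np

def naive_calibrate(observed_csmf, misclass):
    """Invert q = p @ M with M[i, j] = P(algorithm assigns cause j | true cause i),
    then project onto the simplex by clipping negatives and renormalizing."""
    m = np.asarray(misclass, float)
    q = np.asarray(observed_csmf, float)
    p, *_ = np.linalg.lstsq(m.T, q, rcond=None)   # least-squares solve of M^T p = q
    p = np.clip(p, 0.0, None)
    return p / p.sum()

# Hypothetical 3-cause misclassification matrix; each row sums to one.
M = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.70, 0.20],
              [0.05, 0.15, 0.80]])
true_csmf = np.array([0.5, 0.3, 0.2])
observed = true_csmf @ M      # CSMF the uncalibrated algorithm reports on average
print(observed, naive_calibrate(observed, M))
```

With sampling noise in `observed`, the point estimate can land outside the simplex, which is one motivation for a Bayesian treatment with uncertainty quantification.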

2603.21191 2026-03-24 cs.LG math.OC stat.ML

On the Role of Batch Size in Stochastic Conditional Gradient Methods

Rustem Islamov, Roman Machacek, Aurelien Lucchi, Antonio Silveti-Falls, Eduard Gorbunov, Volkan Cevher

Details
English abstract

We study the role of batch size in stochastic conditional gradient methods under a $μ$-Kurdyka-Łojasiewicz ($μ$-KL) condition. Focusing on momentum-based stochastic conditional gradient algorithms (e.g., Scion), we derive a new analysis that explicitly captures the interaction between stepsize, batch size, and stochastic noise. Our study reveals a regime-dependent behavior: increasing the batch size initially improves optimization accuracy but, beyond a critical threshold, the benefits saturate and can eventually degrade performance under a fixed token budget. Notably, the theory predicts the magnitude of the optimal stepsize and aligns well with empirical practices observed in large-scale training. Leveraging these insights, we derive principled guidelines for selecting the batch size and stepsize, and propose an adaptive strategy that increases batch size and sequence length during training while preserving convergence guarantees. Experiments on NanoGPT are consistent with the theoretical predictions and illustrate the emergence of the predicted scaling regimes. Overall, our results provide a theoretical framework for understanding batch size scaling in stochastic conditional gradient methods and offer guidance for designing efficient training schedules in large-scale optimization.
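The following toy illustrates where batch size enters a momentum-based stochastic conditional gradient step. It assumes an unconstrained Scion-style update x ← x + γ·lmo(d), an ℓ∞-ball linear minimization oracle, and a noisy quadratic objective. All names and constants are hypothetical. The sketch only shows that gradient noise scales like 1/√batch and is smoothed by the momentum buffer; it does not reproduce the paper's analysis or rates.

```python
import numpy as np

def lmo_linf(d, radius):
    """Linear minimization oracle over the l_inf ball: argmin_{||s||_inf <= radius} <d, s>."""
    return -radius * np.sign(d)

def run(batch_size, steps=2000, lr=0.01, alpha=0.1, radius=1.0, noise=1.0, seed=0):
    """Minimize 0.5 * ||x - target||^2 from mini-batch gradients; return the final loss."""
    rng = np.random.default_rng(seed)
    target = np.linspace(-0.5, 0.5, 10)
    x = np.zeros_like(target)
    d = np.zeros_like(target)
    for _ in range(steps):
        # Averaging batch_size noisy samples shrinks the gradient noise by sqrt(batch_size).
        g = (x - target) + noise * rng.standard_normal(target.shape) / np.sqrt(batch_size)
        d = (1 - alpha) * d + alpha * g        # momentum buffer
        x = x + lr * lmo_linf(d, radius)       # unconstrained conditional-gradient step
    return 0.5 * np.sum((x - target) ** 2)

for b in (1, 16, 256):
    print(b, run(b))
```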

2603.21163 2026-03-24 stat.AP stat.ME

Simultaneous Estimation of Ballpark Effects and Team Defense Using Total Bases Residuals

Jhe-Jia Wu, Tian-Li Yan, Ting-Li Chen

Details
English abstract

Estimating ballpark effects and team defense in baseball is challenging because batted-ball outcomes are influenced by multiple factors, including contact quality, ballpark environment, defensive performance, and random variation. In this study, we propose a simple and interpretable framework based on Total Bases Residuals (TBR). Using Statcast data from 2015 to 2024, we construct expected total bases conditional on exit velocity and launch angle, and define residuals relative to this baseline. These residuals allow us to separate the effects of ballpark environment and team defense and to estimate them simultaneously within a unified regression framework. Our results show that, when our estimates differ from official MLB metrics, the differences can be explained by consistent patterns in home and away performance for both teams and their opponents, providing empirical support for our approach. Similar patterns are also observed in comparisons with existing defensive metrics. The results also suggest changes in league-wide outcomes and are broadly consistent with developments in the game, including the increased use of data-driven positioning, the restriction on defensive shifts, and possible changes in the physical properties of the baseball. We further introduce a standardized index that facilitates comparison across teams, ballparks, and seasons by expressing effects in units of standard deviation.
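Here is a minimal sketch of the residual idea on synthetic data, with two stated simplifications. First, expected total bases are taken as the bin mean over a hypothetical 5 mph by 5 degree exit-velocity/launch-angle grid; the paper's conditioning method may differ. Second, ballpark and fielding-team effects are fitted jointly by ordinary least squares on one-hot indicators. Effects are centered within each group because only differences are identified.

```python
import numpy as np

def tb_residuals(exit_velo, launch_angle, total_bases, ev_width=5.0, la_width=5.0):
    """Observed total bases minus the mean total bases of batted balls in the same
    (exit velocity, launch angle) bin, a simple stand-in for expected total bases."""
    ev_bin = np.floor(np.asarray(exit_velo, float) / ev_width).astype(int)
    la_bin = np.floor(np.asarray(launch_angle, float) / la_width).astype(int)
    keys = list(zip(ev_bin.tolist(), la_bin.tolist()))
    tb = np.asarray(total_bases, float)
    sums, counts = {}, {}
    for key, value in zip(keys, tb):
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    expected = np.array([sums[k] / counts[k] for k in keys])
    return tb - expected

def park_and_defense_effects(resid, park, defense):
    """Least-squares fit of resid ~ ballpark dummies + fielding-team dummies, centered per group."""
    parks, teams = sorted(set(park)), sorted(set(defense))
    p_idx = {p: i for i, p in enumerate(parks)}
    t_idx = {t: len(parks) + i for i, t in enumerate(teams)}
    X = np.zeros((len(resid), len(parks) + len(teams)))
    for row, (p, t) in enumerate(zip(park, defense)):
        X[row, p_idx[p]] = 1.0
        X[row, t_idx[t]] = 1.0
    coef, *_ = np.linalg.lstsq(X, np.asarray(resid, float), rcond=None)  # min-norm solution
    park_eff = coef[:len(parks)] - coef[:len(parks)].mean()
    team_eff = coef[len(parks):] - coef[len(parks):].mean()
    return dict(zip(parks, park_eff)), dict(zip(teams, team_eff))

# Synthetic batted balls with known (hypothetical) park and defense effects.
rng = np.random.default_rng(1)
n = 20_000
ev, la = rng.uniform(60, 110, n), rng.uniform(-30, 50, n)
park, defense = rng.choice(["A", "B", "C"], n), rng.choice(["X", "Y", "Z"], n)
true_park = {"A": 0.0, "B": 0.3, "C": -0.2}
true_def = {"X": 0.0, "Y": -0.25, "Z": 0.1}
tb = (0.5 + np.array([true_park[p] for p in park]) + np.array([true_def[d] for d in defense])
      + rng.normal(0.0, 0.5, n))
resid = tb_residuals(ev, la, tb)
park_eff, def_eff = park_and_defense_effects(resid, park, defense)
print(park_eff, def_eff)
```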

2603.21144 2026-03-24 stat.ML cs.LG

Time-adaptive functional Gaussian Process regression

MD Ruiz-Medina, AE Madrid, A Torres-Signes, JM Angulo

Details
English abstract

This paper proposes a new formulation of functional Gaussian Process regression in manifolds, based on an Empirical Bayes approach, in the spatiotemporal random field context. We apply the machinery of tight Gaussian measures in separable Hilbert spaces, exploiting the invariance property of covariance kernels under the group of isometries of the manifold. The identification of these measures with infinite-product Gaussian measures is then obtained via the eigenfunctions of the Laplace-Beltrami operator on the manifold. The involved time-varying angular spectra constitute the key tool for dimension reduction in the implementation of this regression approach, adopting a suitable truncation scheme depending on the functional sample size. The simulation study and synthetic data application undertaken illustrate the finite sample and asymptotic properties of the proposed functional regression predictor.

2603.21091 2026-03-24 stat.ML cs.LG math.PR

Stochastic approximation in non-markovian environments revisited

Vivek Shripad Borkar

Details
English abstract

Building on recent work of the author on stochastic approximation in non-Markovian environments, we consider the situation where the driving random process is non-ergodic in addition to being non-Markovian. Using this, we propose an analytic framework for understanding transformer-based learning, specifically the 'attention' mechanism, and continual learning, both of which in principle depend on the entire past.

2603.21067 2026-03-24 stat.ME

A Bayesian Framework for Quantifying Association Between Functional and Structural Data in Neuroimaging

Sakul Mahat, Sharmistha Guha, Jessica Bernard

Details
English abstract

Structural and functional neuroimaging modalities provide complementary windows into brain organization: structural imaging characterizes neural tissue anatomy and microstructure, while functional imaging captures dynamic patterns of neural activity and connectivity. Together, they offer a more complete picture than either alone. Recent multimodal neuroimaging work has focused on joint modeling of structural and functional data, often assuming a strong association between them to improve prediction and interpretability. However, relatively little attention has been given to developing statistically principled frameworks for formally testing hypotheses about these associations. Existing approaches typically rely on simple correlation-based measures or heuristic integration strategies, which may fail to capture the complex dependencies inherent in neuroimaging data, particularly when functional data are represented as brain networks and structural data as region-specific anatomical measures. We address this gap by developing an explicit Bayesian hypothesis testing framework for quantifying associations between structural and functional neuroimaging data. Our approach constructs functional brain networks from fMRI data, then integrates them with structural measurements through a hierarchical Bayesian model. The Bayesian formulation naturally accommodates two types of datasets with different structures, incorporates prior knowledge, and yields full posterior uncertainty quantification. Through extensive empirical studies, we demonstrate that the proposed method achieves excellent performance in detecting associations under a wide range of settings, including varying signal-to-noise ratios, different numbers of brain regions, and diverse sets of structural imaging measures.

2603.21062 2026-03-24 stat.ML cs.LG math.ST stat.TH

Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate

Yingzhen Yang, Ping Li

Details
English abstract

We study the problem of learning a low-degree spherical polynomial of degree $k_0 = Θ(1) \ge 1$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural network with augmented features. Our main result is a significantly improved sample complexity for learning such low-degree polynomials. We show that, for any regression risk $\varepsilon \in (0, Θ(d^{-k_0})]$, an over-parameterized two-layer neural network trained by a novel Gradient Descent with Projection (GDP) requires a sample complexity of $n \asymp Θ( \log(4/δ) \cdot d^{k_0}/\varepsilon)$ with probability $1-δ$ for $δ\in (0,1)$, in contrast with the representative sample complexity $Θ(d^{k_0} \max\{\varepsilon^{-2},\log d\})$. Moreover, such sample complexity is nearly unimprovable since the trained network renders a nearly optimal rate of the nonparametric regression risk of the order $\log({4}/δ) \cdot Θ(d^{k_0}/{n})$ with probability at least $1-δ$. On the other hand, the minimax optimal rate for the regression risk with a kernel of rank $Θ(d^{k_0})$ is $Θ(d^{k_0}/{n})$, so the rate of the nonparametric regression risk of the network trained by GDP is nearly minimax optimal. In the case that the ground truth degree $k_0$ is unknown, we present a novel and provable adaptive degree selection algorithm which identifies the true degree and achieves the same nearly optimal regression rate. To the best of our knowledge, this is the first time that a nearly optimal risk bound is obtained by training an over-parameterized neural network with a popular activation function (ReLU) and an algorithmic guarantee for learning low-degree spherical polynomials. Due to the feature learning capability of GDP, our results go beyond the regular Neural Tangent Kernel (NTK) limit.

2603.21042 2026-03-24 stat.ME cs.LG

Statistical Learning for Latent Embedding Alignment with Application to Brain Encoding and Decoding

Shuoxun Xu, Zhanhao Yan, Lexin Li

Comments 35 pages, 3 figures

Details
English abstract

Brain encoding and decoding aims to understand the relationship between external stimuli and brain activities, and is a fundamental problem in neuroscience. In this article, we study latent embedding alignment for brain encoding and decoding, with a focus on improving sample efficiency under limited fMRI-stimulus paired data and substantial subject heterogeneity. We propose a lightweight alignment framework equipped with two statistical learning components: inverse semi-supervised learning that leverages abundant unpaired stimulus embeddings through inverse mapping and residual debiasing, and meta transfer learning that borrows strength from pretrained models across subjects via sparse aggregation and residual correction. Both methods operate exclusively at the alignment stage while keeping encoders and decoders frozen, allowing for efficient computation, modular deployment, and rigorous theoretical analysis. We establish finite-sample generalization bounds and safety guarantees, and demonstrate competitive empirical performance on the large-scale fMRI-image reconstruction benchmark data.

2603.21032 2026-03-24 stat.AP stat.ME

Integrative Predictor-Dependent Learning of Network Data and Spatially Correlated Nodal Attributes for Multimodal Brain Imaging in Aging

Jose Rodriguez-Acosta, Sharmistha Guha, Jessica Bernard, Thamires Magalhaes, Kaitlin McOwen

Comments 38 pages

Details
English abstract

This article introduces a predictor-dependent joint modeling framework for network data obtained from multiple subjects over a shared set of nodes with spatial coordinates and spatially correlated nodal attributes. The framework is highly flexible, allowing concurrent inference on nodes significantly associated with a predictor, spatial associations of nodal attributes, and the regression relationship between a predictor and an edge connecting a pair of nodes or a specific nodal attribute. Empirical results indicate a superior performance of the proposed approach due to accounting for network structure and spatial correlation in the data simultaneously. The methodology analyzes multimodal brain imaging data collected first-hand in the coauthor's Lifespan Cognitive and Motor Neuroimaging Laboratory, with a focus on integrating structural and functional information. It examines brain connectivity, represented as a connectome network across regions of interest (ROIs) derived from functional magnetic resonance imaging (fMRI), while also incorporating ROI-specific attributes obtained from structural MRI data, for each subject. Subject-specific aging-related features and spatial locations of ROIs are incorporated in the analysis. This framework facilitates robust inference on the associations between predictors and brain connectivity patterns, the spatial relationships among ROI-specific attributes, and the regression relationships involving edges or ROI-specific attributes with aging-related predictors. By integrating these diverse data sources, the approach provides a deeper understanding of the complex interplay between brain structure, function, aging-related changes, and external predictors. As a model-based Bayesian approach, it provides uncertainty quantification for all inferences, offering robust and reliable results, particularly in scenarios with limited sample size.

2603.21027 2026-03-24 cs.IT math.IT math.ST stat.TH

Dual Representation of Minimum Divergence Under Integral Constraints

Shubhanshu Shekhar, Shubhada Agrawal

Comments 45 pages [Preliminary version; feedback welcome]

Details
English abstract

Minimum divergence problems under integral constraints appear throughout statistics and probability, including sequential inference, bandit theory, and distributionally robust optimization. In many such settings, dual representations are the key step that convert information-theoretic lower bounds into computationally tractable (and often near-optimal) algorithms. In this paper, we present a general two-stage recipe for deriving dual representations of constrained minimum divergence (in the second argument) for distributions supported on $[0,1]^K$. The first stage derives a dual representation for finitely-supported distributions using classical finite-dimensional convex duality techniques, while the second establishes an abstract interchange argument that lifts this discretized dual to arbitrary distributions. We begin with the simplest case of mean-constrained minimum relative entropy, commonly called $\mathrm{KL}_{\inf}$, and generalize an existing argument from multi-armed bandits literature for $K=1$ to arbitrary dimensions. Our main contribution is to significantly expand the scope of this approach to a broad class of $f$-divergences (beyond relative entropy) and to general integral constraint functionals (beyond the mean constraint). Finally, we illustrate the statistical implications of our results by constructing optimal procedures for sequential testing, estimation, and change detection with observations in $[0,1]^K$.
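The K = 1 mean-constrained case, which the abstract takes as its starting point, has a well-known one-dimensional dual for distributions on [0, 1]. That dual is KL_inf(F, μ) = max over 0 ≤ λ ≤ 1/(1 − μ) of E_F[log(1 − λ(X − μ))]. The sketch below evaluates it for a finitely supported F by ternary search, relying on the objective being concave in λ. It assumes 0 < μ < 1 and the constraint E_G[X] ≥ μ. For a Bernoulli(p) with p < μ, the value reduces to the binary relative entropy kl(p, μ).

```python
import math

def kl_inf_dual(support, probs, mu, iters=200):
    """KL_inf(F, mu) = inf{KL(F, G): G on [0, 1], E_G[X] >= mu} for finitely supported F,
    via the dual max_{0 <= lam <= 1/(1-mu)} E_F[log(1 - lam (X - mu))], with 0 < mu < 1."""
    def objective(lam):
        return sum(p * math.log(1.0 - lam * (x - mu)) for x, p in zip(support, probs))

    lo, hi = 0.0, (1.0 - 1e-12) / (1.0 - mu)   # stay strictly inside the domain of the log
    for _ in range(iters):                     # ternary search on a concave objective
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

p, mu = 0.3, 0.6
print(kl_inf_dual([0.0, 1.0], [1 - p, p], mu))
print(p * math.log(p / mu) + (1 - p) * math.log((1 - p) / (1 - mu)))   # binary kl(p, mu)
```

When E_F[X] ≥ μ, the derivative at λ = 0 is nonpositive, so the maximum sits at λ = 0 and KL_inf is zero, as it should be.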

2603.21004 2026-03-24 econ.EM math.ST stat.TH

Power Bounds and Efficiency Loss for Asymptotically Optimal Tests in IV Regression

Marcelo J. Moreira, Geert Ridder, Mahrad Sharifvaghefi

Details
English abstract

We characterize the maximal attainable power-size gap in overidentified instrumental variables models with heteroskedastic or autocorrelated (HAC) errors. Using total variation distance and Kraft's theorem, we define the decision-theoretic frontier of the testing problem. We show that Lagrange multiplier and conditional quasi-likelihood ratio tests can have power arbitrarily close to size even when the null and alternative are well separated, because they do not fully exploit the reduced-form likelihood. In contrast, the conditional likelihood ratio (CLR) test uses the full reduced-form likelihood. We prove that the power-size gap of CLR converges to one if and only if the testing problem becomes trivial in total variation distance, so that CLR attains the decision-theoretic frontier whenever any test can. An empirical illustration based on Yogo (2004) shows that these failures arise in empirically relevant configurations.
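The role of total variation is easiest to see in the simplest case. Take a simple null P0 against a simple alternative P1 on a finite sample space. The test that rejects exactly where p1 > p0 maximizes power minus size, and that maximum equals TV(P0, P1). This sketch covers only the simple-versus-simple case. The composite IV testing problem in the paper relies on Kraft's theorem and is not reproduced here.

```python
import numpy as np

def total_variation(p0, p1):
    """TV distance between two distributions on a finite sample space."""
    return 0.5 * np.abs(np.asarray(p0, float) - np.asarray(p1, float)).sum()

def best_power_minus_size(p0, p1):
    """Power minus size of the test rejecting exactly on {p1 > p0}."""
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    reject = p1 > p0
    return p1[reject].sum() - p0[reject].sum()

rng = np.random.default_rng(0)
p0 = rng.dirichlet(np.ones(6))   # hypothetical null distribution
p1 = rng.dirichlet(np.ones(6))   # hypothetical alternative
print(best_power_minus_size(p0, p1), total_variation(p0, p1))
```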

2603.20974 2026-03-24 math.ST stat.TH

Support of Continuous Smeary Measures on Spheres

Susovan Pal

Details
English abstract

We investigate the support of smeary, directionally smeary, and finite sample smeary probability measures $μ$ with density $ρ$ on spheres $\mathbb{S}^m$. First, in the rotationally symmetric case, we show that a distribution is not smeary, or equivalently, not directionally smeary, whenever its support lies in a geodesic ball of radius $R_m>π/2$ centered at the Fréchet mean, where $R_m=π/2+O(1/m)$. In the general case, we show that neither directional nor full smeariness holds whenever the support is contained in a closed ball of radius $π/2$; however, past the support radius $π/2$, full smeariness may break down, but directional smeariness breaks down only past the support radius $R_m$. Second, we prove sharpness of this threshold. For every $\varepsilon>0$, we show there exists $m_0(\varepsilon)$ such that for all $m\ge m_0(\varepsilon)$ there exists a rotationally symmetric continuous smeary probability measure on $\mathbb{S}^m$ whose support lies in a ball of radius $π/2+\varepsilon$ around the Fréchet mean. Third, in every dimension we construct directionally smeary continuous distributions supported in a ball of radius $π/2+\varepsilon$ whose Fréchet function has Hessian of rank one. Finally, we study finite sample smeariness. We show that any continuous non-smeary distribution supported in a geodesic ball of radius $π/2$ is necessarily Type I finite sample smeary, i.e., its variance modulation $m_n$ satisfies $\lim_{n\to\infty} m_n>1$. In the rotationally symmetric case, we further prove a curse-of-dimensionality phenomenon: the variance modulation increases with the dimension and can become arbitrarily large depending on the support.

2603.20962 2026-03-24 stat.AP stat.ME stat.ML

Integrative Learning of Dynamically Evolving Multiplex Graphs and Nodal Attributes Using Neural Network Gaussian Processes with an Application to Dynamic Terrorism Graphs

Jose Rodriguez-Acosta, Sharmistha Guha, Lekha Patel, Kurtis Shuler

Comments 59 pages

详情
英文摘要

Exploring the dynamic co-evolution of multiplex graphs and nodal attributes is a compelling question in criminal and terrorism networks. This article is motivated by the study of dynamically evolving interactions among prominent terrorist organizations, considering various organizational attributes like size, ideology, leadership, and operational capacity. Statistically principled integration of multiplex graphs with nodal attributes is significantly challenging due to the need to leverage shared information within and across layers, account for uncertainty in predicting unobserved links, and capture temporal evolution of node attributes. These difficulties increase when layers are partially observed, as in terrorism networks where connections are deliberately hidden to obscure key relationships. To address these challenges, we present a principled methodological framework to integrate the multiplex graph layers and nodal attributes. The approach employs time-varying stochastic latent factor models, leveraging shared latent factors to capture graph structure and its co-evolution with node attributes. Latent factors are modeled using Gaussian processes with an infinitely wide deep neural network-based covariance function, termed neural network Gaussian processes (NN-GP). The NN-GP framework on latent factors exploits the predictive power of Bayesian deep neural network architecture while propagating uncertainty for reliability. Simulation studies highlight superior performance of the proposed approach in achieving inferential objectives. The approach, termed the dynamic joint learner, enables predictive inference (with uncertainty) of diverse unobserved dynamic relationships among prominent terrorist organizations and their organization-specific attributes, as well as clustering behavior in terms of friend-and-foe relationships, which could be informative in counter-terrorism research.
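As a point of reference for the NN-GP covariance, below is the standard recursion for an infinitely wide, fully connected ReLU network (the degree-one arc-cosine kernel). The depth, weight variance `sigma_w2`, bias variance `sigma_b2`, and inputs are assumptions. The paper's architecture, and how its kernel is applied to time-varying latent factors, may differ.

```python
import numpy as np

def relu_nngp_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    """NNGP covariance of an infinitely wide ReLU network evaluated at the rows of X."""
    X = np.asarray(X, float)
    k = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]   # covariance of first-layer pre-activations
    for _ in range(depth):
        diag = np.sqrt(np.diag(k))
        norms = np.outer(diag, diag)
        cos_t = np.clip(k / norms, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # E[relu(u) relu(v)] for jointly Gaussian (u, v) with covariance k (arc-cosine formula).
        k = sigma_b2 + sigma_w2 / (2 * np.pi) * norms * (np.sin(theta) + (np.pi - theta) * cos_t)
    return k

# Hypothetical inputs, e.g. embeddings of three time points.
X = np.array([[1.0, 0.0], [0.6, 0.8], [-1.0, 0.5]])
K = relu_nngp_kernel(X)
print(np.round(K, 4))
```

On the diagonal θ = 0, so each layer maps a variance v to sigma_b2 + sigma_w2 * v / 2, the familiar ReLU variance recursion.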

2603.20945 2026-03-24 stat.ME

Functional Estimation of Manifold-Valued Diffusion Processes

Jacob McErlean, Hau-Tieng Wu

Details
English abstract

Nonstationary high-dimensional time series are increasingly encountered in biomedical research as measurement technologies advance. Owing to the homeostatic nature of physiological systems, such datasets are often located on, or can be well approximated by, a low-dimensional manifold. Modeling such datasets by manifold-valued Itô diffusion processes has been shown to provide valuable insights and to guide the design of algorithms for clinical applications. In this paper, we propose Nadaraya-Watson type nonparametric estimators for the drift vector field and diffusion matrix of the process from one trajectory. Assuming a time-homogeneous stochastic differential equation on a smooth complete manifold without boundary, we show that as the sampling interval and kernel bandwidth vanish with increasing trajectory length, recurrence of the process yields asymptotic consistency and normality of the drift and diffusion estimators, as well as the associated occupation density. Analysis of the diffusion estimator further produces a tangent space estimator for dependent data, which has its own interest and is essential for drift estimation. Numerical experiments across a range of manifold configurations support the theoretical results.
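Below is a Euclidean, one-dimensional sketch of Nadaraya-Watson drift and diffusion estimators from a single trajectory. It assumes a Gaussian kernel and an Ornstein-Uhlenbeck test process simulated by Euler-Maruyama. Kernel-weighted increments divided by Δ estimate the drift, and kernel-weighted squared increments divided by Δ estimate σ². The manifold version in the paper additionally requires tangent-space estimation, which is not shown.

```python
import numpy as np

def nw_drift_diffusion(path, dt, x, bandwidth):
    """Kernel estimates of drift b(x) and squared diffusion sigma^2(x) from one 1-D path."""
    states = path[:-1]
    incr = np.diff(path)
    w = np.exp(-0.5 * ((states - x) / bandwidth) ** 2)   # Gaussian kernel weights
    drift = np.sum(w * incr) / (dt * np.sum(w))
    diff2 = np.sum(w * incr ** 2) / (dt * np.sum(w))
    return drift, diff2

def simulate_ou(n, dt, rate, sigma, rng):
    """Euler-Maruyama path of dX = -rate * X dt + sigma dW started at 0."""
    path = np.empty(n + 1)
    path[0] = 0.0
    noise = rng.standard_normal(n) * sigma * np.sqrt(dt)
    for i in range(n):
        path[i + 1] = path[i] - rate * path[i] * dt + noise[i]
    return path

rng = np.random.default_rng(0)
dt = 0.01
path = simulate_ou(200_000, dt, rate=1.0, sigma=0.5, rng=rng)
drift_hat, _ = nw_drift_diffusion(path, dt, x=0.3, bandwidth=0.1)
_, diff2_hat = nw_drift_diffusion(path, dt, x=0.0, bandwidth=0.1)
print(drift_hat, diff2_hat)   # true values: -0.3 and 0.25
```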

2603.20939 2026-03-24 cs.CL cs.AI cs.HC cs.IR stat.ML

User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

Yuren Hao, Shuhaib Mehri, ChengXiang Zhai, Dilek Hakkani-Tür

Comments 21 pages including appendices

Details
English abstract

Large language models are increasingly used as personal assistants, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. We propose Vector-Adapted Retrieval Scoring (VARS), a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over structured preference memory. The vectors are updated online from weak scalar rewards from users' feedback, enabling personalization without per-user fine-tuning. We evaluate on MultiSessionCollab, an online multi-session collaboration benchmark with rich user preference profiles, across math and code tasks. Under frozen backbones, the main benefit of user-aware retrieval is improved interaction efficiency rather than large gains in raw task accuracy: our full VARS agent achieves the strongest overall performance, matches a strong Reflection baseline in task success, and reduces timeout rate and user effort. The learned long-term vectors also align with cross-user preference overlap, while short-term vectors capture session-specific adaptation, supporting the interpretability of the dual-vector design. Code, model, and data are available at https://github.com/YurenHao0426/VARS.

2603.20936 2026-03-24 econ.EM stat.ML

Two Approaches to Direct Estimation of Riesz Representers

David Bruns-Smith

Comments A short technical and historical note

Details
English abstract

The Riesz representer is a central object in semiparametric statistics and debiased/doubly-robust estimation. Two literatures in econometrics have highlighted the role for directly estimating Riesz representers: the automatic debiased machine learning literature (as in Chernozhukov et al., 2022b), and an independent literature on sieve methods for conditional moment models (as in Chen et al., 2014). These two literatures solve distinct optimization problems that in the population both have the Riesz representer as their solution. We show that with unregularized or ridge-regularized linear, sieve, or RKHS models, the two resulting estimators are numerically equivalent. However, for other regularization schemes such as the Lasso, or more general machine learning function classes including neural networks, the estimators are not necessarily equivalent. In the latter case, the Chen et al. (2014) formulation yields a novel constrained optimization problem for directly estimating Riesz representers with machine learning. Drawing on results from Birrell et al. (2022), we conjecture that this approach may offer statistical advantages at the cost of greater computational complexity.
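The automatic debiased ML formulation can be illustrated with a linear sieve. Minimizing the empirical loss mean(α²) − 2·mean(m(W; α)) over α = b(W)ᵀρ gives ρ = G⁻¹M. In the sketch below (hypothetical names, no regularization), a binary-treatment ATE example with basis (d, 1 − d) recovers the familiar inverse-propensity weights d/p − (1 − d)/(1 − p). It does not implement the Chen et al. (2014) formulation, which the abstract states is numerically equivalent in this unregularized linear case.

```python
import numpy as np

def riesz_linear_sieve(basis, m_basis):
    """Coefficients rho of alpha(w) = basis(w) @ rho minimizing mean(alpha^2) - 2 mean(m(W; alpha)):
    rho = G^{-1} M, with G the sieve Gram matrix and M the averaged m(W; b_k)."""
    G = basis.T @ basis / basis.shape[0]
    M = m_basis.mean(axis=0)
    return np.linalg.solve(G, M)

# ATE with a binary treatment d and no covariates: basis b(d) = (d, 1 - d) and
# m(W; alpha) = alpha(1) - alpha(0), so m(W; b) = (1, -1) for every observation.
rng = np.random.default_rng(0)
d = rng.binomial(1, 0.3, size=1_000).astype(float)
basis = np.column_stack([d, 1.0 - d])
m_basis = np.tile([1.0, -1.0], (len(d), 1))
rho = riesz_linear_sieve(basis, m_basis)
alpha = basis @ rho
p_hat = d.mean()
print(rho, 1 / p_hat, -1 / (1 - p_hat))
```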

2603.20929 2026-03-24 stat.ML cs.LG math.ST stat.CO stat.TH

Stability of Sequential and Parallel Coordinate Ascent Variational Inference

Debdeep Pati

Comments 20 pages, 3 figures

We highlight a striking difference in behavior between two widely used variants of coordinate ascent variational inference: the sequential and parallel algorithms. While such differences were known in the numerical analysis literature in simpler settings, they remain largely unexplored in the optimization-focused literature on variational inference in more complex models. Focusing on the moderately high-dimensional linear regression problem, we show that the sequential algorithm, although typically slower, enjoys convergence guarantees under more relaxed conditions than the parallel variant, which is often employed to facilitate block-wise updates and improve computational efficiency.
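To make the sequential/parallel contrast concrete, consider a simplified case assumed here for a closed form: Gaussian linear regression with unit noise variance and a Gaussian prior (the paper's priors may differ). Mean-field CAVI then updates each mean as m_j = (x_j'y - sum over k != j of x_j'x_k m_k) / (x_j'x_j + lam), so a sequential sweep is a Gauss-Seidel step and a parallel sweep is a Jacobi step on (X'X + lam I) m = X'y. This toy sketch shows the parallel sweep diverging on strongly correlated predictors while the sequential sweep converges.

```python
import numpy as np

def cavi_means(X, y, lam=1.0, sweeps=500, parallel=False):
    """Mean-field CAVI means for y = X beta + N(0, 1) noise, beta_j ~ N(0, 1/lam)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ y
    m = np.zeros(X.shape[1])
    for _ in range(sweeps):
        if parallel:
            # Parallel (Jacobi-style): every coordinate uses the previous sweep's means.
            m = (b - (A @ m - np.diag(A) * m)) / np.diag(A)
        else:
            # Sequential (Gauss-Seidel-style): coordinate j sees the freshest means.
            for j in range(len(m)):
                m[j] = (b[j] - A[j] @ m + A[j, j] * m[j]) / A[j, j]
    return m

# Three strongly correlated predictors.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.column_stack([z + 0.3 * rng.normal(size=200) for _ in range(3)])
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=200)
exact = np.linalg.solve(X.T @ X + np.eye(3), X.T @ y)   # common fixed point
m_seq = cavi_means(X, y, parallel=False)
m_par = cavi_means(X, y, sweeps=60, parallel=True)
```

Both schemes share the same fixed point; only the parallel one can fail to reach it, because the Jacobi iteration matrix has spectral radius above one when the predictors are this correlated.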

2603.20927 2026-03-24 stat.ML cs.LG

Active Inference for Physical AI Agents -- An Engineering Perspective

Bert de Vries

Physical AI agents, such as robots and other embodied systems operating under tight and fluctuating resource constraints, remain far less capable than biological agents in open-ended real-world environments. This paper argues that Active Inference (AIF), grounded in the Free Energy Principle (FEP), offers a principled foundation for closing that gap. We develop this argument from first principles, following a chain from probability theory through Bayesian machine learning and variational inference to active inference and reactive message passing. From the FEP perspective, systems that maintain their structural and functional integrity over time can, under suitable assumptions, be described as minimizing variational free energy (VFE), and AIF operationalizes this by unifying perception, learning, planning, and control within a single computational objective. We show that VFE minimization is naturally realized by reactive message passing on factor graphs, where inference emerges from local, parallel computations. This realization is well matched to the constraints of physical operation, including hard deadlines, asynchronous data, fluctuating power budgets, and changing environments. Because reactive message passing is event-driven, interruptible, and locally adaptable, performance degrades gracefully under reduced resources while model structure can adjust online. We further show that, under suitable coupling and coarse-graining conditions, coupled AIF agents can be described as higher-level AIF agents, yielding a homogeneous architecture based on the same message-passing primitive across scales. Our contribution is not empirical benchmarking, but a clear theoretical and architectural case for the engineering community.

2603.20908 2026-03-24 cs.LG stat.ML

Bayesian Scattering: A Principled Baseline for Uncertainty on Image Data

Bernardo Fichera, Zarko Ivkovic, Kjell Jorner, Philipp Hennig, Viacheslav Borovitskiy

Uncertainty quantification for image data is dominated by complex deep learning methods, yet the field lacks an interpretable, mathematically grounded baseline. We propose Bayesian scattering to fill this gap, serving as a first-step baseline akin to the role of Bayesian linear regression for tabular data. Our method couples the wavelet scattering transform (a deep, non-learned feature extractor) with a simple probabilistic head. Because scattering features are derived from geometric principles rather than learned, they avoid overfitting the training distribution. This helps provide sensible uncertainty estimates even under significant distribution shifts. We validate this on diverse tasks, including medical imaging under institution shift, wealth mapping under country-to-country shift, and Bayesian optimization of molecular properties. Our results suggest that Bayesian scattering is a solid baseline against which to compare more complex uncertainty quantification methods.
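The abstract does not specify the probabilistic head beyond calling it simple, so the sketch below assumes the Bayesian linear regression analogy it draws. It fits a Gaussian posterior over linear weights on fixed, precomputed features and returns a predictive mean and variance. Random vectors stand in for scattering coefficients, and the prior and noise settings (alpha, noise_var) are illustrative assumptions, not the authors' choices.

```python
import numpy as np

def bayes_linear_head(Phi, y, alpha=1.0, noise_var=0.1):
    """Bayesian linear regression on fixed features: w ~ N(0, I/alpha), y ~ N(Phi w, noise_var)."""
    precision = alpha * np.eye(Phi.shape[1]) + Phi.T @ Phi / noise_var
    cov = np.linalg.inv(precision)                  # posterior covariance of w
    mean = cov @ Phi.T @ y / noise_var              # posterior mean of w

    def predict(Phi_new):
        mu = Phi_new @ mean
        var = noise_var + np.einsum("ij,jk,ik->i", Phi_new, cov, Phi_new)
        return mu, var                              # predictive mean and variance
    return predict

# Random stand-ins for precomputed (non-learned) features.
rng = np.random.default_rng(0)
Phi_train = rng.normal(size=(200, 5))
y_train = Phi_train @ rng.normal(size=5) + 0.3 * rng.normal(size=200)
predict = bayes_linear_head(Phi_train, y_train)
_, var_in = predict(rng.normal(size=(1, 5)))        # in-distribution features
_, var_out = predict(np.full((1, 5), 10.0))         # features far from the training data
```

Because the features are fixed, the only learned quantities are the posterior mean and covariance, and the predictive variance grows for inputs whose features lie far from the training data.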

2603.20891 2026-03-24 stat.ML cs.LG eess.SP math.DS

Auto-differentiable data assimilation: Co-learning of states, dynamics, and filtering algorithms

Melissa Adrian, Daniel Sanz-Alonso, Rebecca Willett

Data assimilation algorithms estimate the state of a dynamical system from partial observations, where the successful performance of these algorithms hinges on costly parameter tuning and on employing an accurate model for the dynamics. This paper introduces a framework for jointly learning the state, dynamics, and parameters of filtering algorithms in data assimilation through a process we refer to as auto-differentiable filtering. The framework leverages a theoretically motivated loss function that enables learning from partial, noisy observations via gradient-based optimization using auto-differentiation. We further demonstrate how several well-known data assimilation methods can be learned or tuned within this framework. To underscore the versatility of auto-differentiable filtering, we perform experiments on dynamical systems spanning multiple scientific domains, such as the Clohessy-Wiltshire equations from aerospace engineering, the Lorenz-96 system from atmospheric science, and the generalized Lotka-Volterra equations from systems biology. Finally, we provide guidelines for practitioners to customize our framework according to their observation model, accuracy requirements, and computational budget.

2603.20853 2026-03-24 stat.ME stat.AP

Correcting for Missing Data When Evaluating Surrogate Markers in a Clinical Trial

Sarah C. Lotspeich, P. D. Anh. Nguyen, Layla Parast

Comments 19 pages, 4 tables, 3 figures, R package and GitHub repository with simulation code

Evaluating treatment effects is critical in clinical trials but sometimes involves lengthy, invasive, or costly follow-up procedures. In these cases, surrogate markers, which provide intermediate measures of the long-term treatment effect, allow clinicians to obtain results faster and more efficiently than would have otherwise been possible. Prior to adoption, it is vital that the utility of surrogate markers (i.e., their ability to capture the treatment effect on the primary outcome) is statistically validated. Many frameworks for evaluating surrogate markers have been proposed, but they do not account for missing data. Instead, they rely on complete cases (the subset of patients without missing data), which can be inefficient and biased. To improve on this, we propose methods to accommodate missing data in nonparametric and parametric surrogate evaluation via inverse probability weighting (IPW) and semiparametric maximum likelihood estimation (SMLE). Through simulation studies, we demonstrate that the proposed methods remain unbiased under a broader range of missing data mechanisms than complete case analysis and can help retain the statistical precision of the full trial. We illustrate their practical utility through an application to a diabetes clinical trial. Moreover, our missing data corrections have complementary strengths with respect to computational ease, robustness, and statistical efficiency. All methods are implemented in the MissSurrogate R package.

2603.20844 2026-03-24 stat.ME

A scalable Bayesian functional factor model for high-dimensional longitudinal molecular data

Salima Jaoua, Daniel Temko, Hélène Ruffieux

Large-scale longitudinal molecular profiling is now firmly established in biomedical research, prompted by the need to uncover coordinated biomarker trajectories reflecting the dynamics of underlying biological mechanisms and characterise patient heterogeneity in disease progression. While a range of statistical tools exist for either longitudinal modelling or high-dimensional analysis, there is no unified framework tailored to address these questions jointly. Motivated by a longitudinal COVID-19 study conducted in Cambridge hospitals, we propose a Bayesian functional factor model to address this gap. The framework combines latent factor modelling with functional principal component analysis to represent shared temporal programmes across subsets of variables while capturing individual variation through low-dimensional functional scores. We specify sparsity-inducing priors that yield interpretable factor structure and allow the effective number of factors to be inferred via overspecification. An annealed variational algorithm ensures efficient joint posterior inference at scale. The approach achieves accurate recovery of temporal structure in simulations with up to 20 000 variables. Application to the COVID-19 data reveals clinically meaningful heterogeneity in recovery dynamics through interpretable subject-level scores capturing coordinated inflammatory and immune-response pathway activity. The methodology is implemented in the R package bayesSYNC.

2603.20819 2026-03-24 cs.LG cs.SY eess.SY stat.ML

Achieving $\widetilde{O}(1/ε)$ Sample Complexity for Bilinear Systems Identification under Bounded Noises

Hongyu Yi, Chenbei Lu, Jing Yu

This paper studies finite-sample set-membership identification for discrete-time bilinear systems under bounded symmetric log-concave disturbances. Compared with existing finite-sample results for linear systems and related analyses under stronger noise assumptions, we consider the more challenging bilinear setting with trajectory-dependent regressors and allow marginally stable dynamics with polynomial mean-square state growth. Under these conditions, we prove that the diameter of the feasible parameter set shrinks with sample complexity $\widetilde{O}(1/ε)$. Simulation supports the theory and illustrates the advantage of the proposed estimator for uncertainty quantification.
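Set-membership identification keeps every parameter value consistent with the data and the noise bound. The sketch below is a deliberately reduced one-parameter illustration under assumptions of our own: a hypothetical scalar bilinear system x[t+1] = theta * x[t] * u[t] + u[t] + w[t] with uniform noise on [-eta, eta] (bounded, symmetric, and log-concave). Each step then constrains theta to an interval, and the feasible set is their intersection. The paper's multivariable setting yields a polytope instead and is not reproduced here.

```python
import numpy as np

def simulate(theta, T, eta, rng):
    """Toy scalar bilinear system x[t+1] = theta * x[t] * u[t] + u[t] + w[t] (hypothetical)."""
    u = rng.uniform(-1.0, 1.0, size=T)
    w = rng.uniform(-eta, eta, size=T)      # bounded, symmetric, log-concave noise
    x = np.zeros(T + 1)
    for t in range(T):
        x[t + 1] = theta * x[t] * u[t] + u[t] + w[t]
    return x, u

def feasible_interval(x, u, eta):
    """Intersect the constraints |x[t+1] - theta * x[t] * u[t] - u[t]| <= eta over t."""
    lo, hi = -np.inf, np.inf
    for t in range(len(u)):
        phi = x[t] * u[t]                   # trajectory-dependent (bilinear) regressor
        if abs(phi) < 1e-12:
            continue                        # this step carries no information on theta
        r = x[t + 1] - u[t]
        a, b = (r - eta) / phi, (r + eta) / phi
        lo, hi = max(lo, min(a, b)), min(hi, max(a, b))
    return lo, hi

rng = np.random.default_rng(1)
x, u = simulate(0.5, 2000, 0.1, rng)
lo_short, hi_short = feasible_interval(x[:51], u[:50], 0.1)   # first 50 steps only
lo, hi = feasible_interval(x, u, 0.1)                          # all 2000 steps
```

The true parameter always stays inside the interval. Its width (the feasible-set diameter) can only shrink as steps are added, which is the quantity whose sample complexity the paper bounds.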

2603.20783 2026-03-24 stat.ME math.ST stat.TH

Ordinal Patterns Based Testing of Spatial Independence in Irregular Spatial Structures

Giorgio Micali, David Garnés-Galindo, Mariano Matilla-García, Manuel Ruiz-Marín

We propose a nonparametric test of spatial independence for data observed on irregular, non-lattice point clouds $\mathcal{V}_{n}\subset\mathbb{R}^{2}$. For each location $v\in\mathcal{V}_{n}$, we encode the local spatial configuration through the ordinal pattern of the $m$ nearest-neighbour observations, obtaining a symbolic representation that is invariant under strictly monotone transformations and robust to outliers. Under the null hypothesis of spatial independence, the local ordinal patterns are i.i.d. and uniformly distributed over the symmetric group $\mathcal{S}_{m}$, regardless of the unknown marginal distribution $F$. We exploit this characterisation to construct a test statistic $L_{n}$ based on the additive log-ratio (ALR) transformation of the empirical ordinal-pattern frequencies. Invoking a central limit theorem for graph-dependent processes under a graph-based $α$-mixing condition, we establish that $L_{n}$ converges in distribution to a $χ^{2}_{m!-1}$ random variable, yielding an asymptotically pivotal procedure with no nuisance parameters. An extensive Monte Carlo study confirms that the $χ^{2}_{m!-1}$ approximation is already accurate at moderate sample sizes, that the test controls size at the nominal level, and that power increases monotonically with the strength of spatial dependence. Notably, the test detects dependence in both linear and nonlinearly transformed spatial autoregressive models, illustrating the robustness that is characteristic of ordinal-pattern methods. Our framework extends the spatial ordinal-pattern testing paradigm from regular lattices to general spatial supports, opening the door to ordinal-pattern inference in the many applied settings where observations are irregularly located.
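The symbolization step can be sketched directly. Assumptions made here: the m neighbours exclude the site itself and are ordered by distance, and the pattern is the argsort of their values. The statistic below is a plain Pearson goodness-of-fit against the uniform law on S_m, used only as a stand-in. It is not the paper's ALR statistic L_n, and its calibration is not the one the paper derives.

```python
import numpy as np
from itertools import permutations

def ordinal_pattern_counts(coords, values, m):
    """Count ordinal patterns of each site's m nearest neighbours, ordered by distance."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                  # assumption: the site itself is excluded
    counts = {p: 0 for p in permutations(range(m))}
    for i in range(len(values)):
        nbrs = np.argsort(dist[i])[:m]              # m nearest neighbours, closest first
        pattern = tuple(int(k) for k in np.argsort(values[nbrs], kind="stable"))
        counts[pattern] += 1
    return counts

def pearson_uniformity(counts):
    """Pearson statistic against the uniform law on S_m (stand-in, not the ALR statistic)."""
    obs = np.array(list(counts.values()), dtype=float)
    expected = obs.sum() / len(obs)
    return float(((obs - expected) ** 2 / expected).sum())

rng = np.random.default_rng(0)
coords = rng.uniform(size=(300, 2))                 # irregular point cloud
values = rng.normal(size=300)                       # spatially independent values
counts = ordinal_pattern_counts(coords, values, m=3)
stat = pearson_uniformity(counts)
```

Because patterns depend only on ranks, applying any strictly increasing transform to values leaves counts unchanged, which is the invariance the abstract relies on.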

2603.20780 2026-03-24 stat.ME

Bregman projection for calibration estimation

Jae Kwang Kim, Yonghyun Kwon, Yumou Qiu

Calibration weighting is a fundamental technique in survey sampling and data integration for incorporating auxiliary information and improving efficiency of estimators. Classical calibration methods are typically formulated through distance functions applied to weight ratios relative to design weights. In this paper we develop a unified framework for calibration estimation based on Bregman divergence defined directly on the weight vector. We show that calibration estimators obtained from Bregman divergence admit a dual representation that depends only on the dimension of the auxiliary variables and can be interpreted as a Bregman projection onto the calibration constraint set. This geometric structure leads to a general asymptotic representation showing that calibration estimators are equivalent to debiased regression estimators whose regression coefficient depends on the choice of the Bregman generator. The result provides a unifying perspective on classical calibration methods such as quadratic calibration and exponential tilting, and reveals how the choice of divergence influences efficiency. Under Poisson sampling we further characterize the generator that minimizes the asymptotic variance of the calibration estimator and obtain an optimal contrast entropy divergence. The framework also extends naturally to settings where inclusion probabilities are unknown and must be estimated, yielding cross-fitted estimators that remain root-n consistent under mild conditions. Finally, we develop a regularized calibration estimator suitable for high-dimensional auxiliary variables. Simulation studies and a real data application illustrate the practical advantages of the proposed approach.
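Exponential tilting, one of the classical cases the abstract says the Bregman framework unifies, gives weights of the form w_i = d_i exp(x_i' lam), with lam chosen so that the weighted auxiliary totals match known population totals. The sketch below solves for lam by Newton's method. The simple-random-sample setup and the function name are illustrative assumptions, and the paper's optimal contrast-entropy generator is not implemented.

```python
import numpy as np

def exponential_tilting_weights(d, X, totals, iters=50):
    """Calibration weights w_i = d_i * exp(x_i' lam) solving sum_i w_i x_i = totals."""
    lam = np.zeros(X.shape[1])
    for _ in range(iters):
        w = d * np.exp(X @ lam)
        resid = X.T @ w - totals                    # violation of the calibration constraints
        jac = (X * w[:, None]).T @ X                # Jacobian of X'w with respect to lam
        lam -= np.linalg.solve(jac, resid)          # Newton step
    return d * np.exp(X @ lam)

# Simple random sample of n from a population of N with known auxiliary totals.
rng = np.random.default_rng(0)
N, n = 10_000, 500
z_pop = rng.normal(1.0, 1.0, size=N)
idx = rng.choice(N, size=n, replace=False)
X = np.column_stack([np.ones(n), z_pop[idx]])
d = np.full(n, N / n)                               # design weights
totals = np.array([N, z_pop.sum()])
w = exponential_tilting_weights(d, X, totals)
```

Unlike quadratic (GREG-type) calibration, the exponential form keeps every weight positive. The abstract's asymptotic result says both behave like debiased regression estimators that differ only in their implied regression coefficient.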

2603.20761 2026-03-24 math.ST quant-ph stat.TH

Asymptotic statistical theory of irreducible quantum Markov chains

Federico Girotti, Jukka Kiukas, Mădălin Guţă

Comments 92 pages, 6 figures, comments and suggestions are more than welcome

In this paper we investigate the asymptotic statistical theory of irreducible quantum Markov chains, focusing on identifiability properties and asymptotic convergence of associated quantum statistical models. We show that the space of identifiable parameters for the stationary output is a stratified space called an orbifold, which is obtained as the quotient of the manifold of irreducible dynamics by a compact group of state preserving symmetries. We analyse the orbifold's geometric properties, the connection between periodicity and strata, and provide orbifold charts as the starting point for the local asymptotic theory. The quantum Fisher information rate of the system and output state is expressed in terms of a canonical inner product on the identifiable tangent space. We then show that the joint system and output model satisfies quantum local asymptotic normality while the stationary output model converges to a product between a quantum Gaussian shift model and a mixture of quantum Gaussian shift models, reflecting the underlying periodicity. These strong convergence results provide the basis for constructing asymptotically optimal estimators of dynamical parameters. We provide an in-depth analysis of the model with smallest dimensions, consisting of two-dimensional system and environment units.

2603.20727 2026-03-24 stat.ME stat.AP

Compositional regression using principal nested spheres

Mymuna Monem, Ian L. Dryden, Florence George, Natalia Soares Quinete

Comments 19 pages, 8 figures, 1 table

Regression with compositional responses is challenging due to the nonlinear geometry of the simplex and the limitations of Euclidean methods. We propose a regression framework for manifold-valued data based on mappings to statistically tractable intermediate spaces. For compositional data, responses are embedded in the positive orthant of the sphere and analysed using Principal Nested Spheres (PNS), yielding a cylindrical intermediate space with a circular leading score and Euclidean higher-order scores. Regression is performed in this intermediate space and fitted values are mapped back to the simplex. A simulation study demonstrates good performance of PNS-based regression. An application to environmental chemical exposure data illustrates the interpretability and practical utility of the method.

2603.20716 2026-03-24 stat.ME

Testing for cross-quantilogram change

Chia-Min Chang, Yu-Hsiang Cheng, Tzee-Ming Huang

Comments 13 pages

For two time series $\{(Y_t, Z_t^Y)\}_{t}$ and $\{(X_t, Z_t^X)\}_{t}$, the directional dependence of $\{X_t\}_{t}$ on $\{Y_t\}_{t}$, after removing the impact of $Z_t^X$ on $X_t$ and the impact of $Z_t^Y$ on $Y_t$, can be measured by cross-quantilograms. When the two time series are observed over two periods, it is of interest to test whether the cross-quantilograms are the same in both periods. We propose a test for this purpose, in which the cross-quantilograms are estimated using the estimators proposed by Han (2016). The $p$-value of the proposed test is obtained by a bootstrap approach.
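For reference, the basic sample cross-quantilogram correlates the quantile-hit processes psi_tau(u) = 1[u < 0] - tau of two series at lag k. The sketch below is a simplified assumption: it uses unconditional sample quantiles, whereas the abstract removes covariate effects through the estimators of Han (2016), and the two-period bootstrap test is not shown.

```python
import numpy as np

def cross_quantilogram(y, x, tau_y, tau_x, k):
    """Sample cross-quantilogram between y_t and x_{t-k} using unconditional quantiles."""
    psi_y = (y < np.quantile(y, tau_y)).astype(float) - tau_y   # psi_tau(u) = 1[u < 0] - tau
    psi_x = (x < np.quantile(x, tau_x)).astype(float) - tau_x
    a, b = psi_y[k:], psi_x[: len(x) - k]                       # pair y_t with x_{t-k}
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.empty_like(x)
y[0] = rng.normal()
y[1:] = 0.8 * x[:-1] + 0.6 * rng.normal(size=4999)            # y_t driven by x_{t-1}
rho_lag1 = cross_quantilogram(y, x, 0.1, 0.1, k=1)
rho_lag0 = cross_quantilogram(y, x, 0.1, 0.1, k=0)
```

A change test in this spirit would compute the statistic separately on each period and compare the difference against a bootstrap distribution.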

2603.20696 2026-03-24 stat.ML cs.LG

High-dimensional online learning via asynchronous decomposition: Non-divergent results, dynamic regularization, and beyond

Shixiang Liu, Zhifan Li, Hanming Yang, Jianxin Yin

Comments 41 pages, 1 figure

Existing high-dimensional online learning methods often face the challenge that their error bounds, or per-batch sample sizes, diverge as the number of data batches increases. To address this issue, we propose an asynchronous decomposition framework that leverages summary statistics to construct a surrogate score function for current-batch learning. This framework is implemented via a dynamic-regularized iterative hard thresholding algorithm, providing a computationally and memory-efficient solution for sparse online optimization. We provide a unified theoretical analysis that accounts for both the streaming computational error and statistical accuracy, establishing that our estimator maintains non-divergent error bounds and $\ell_0$ sparsity across all batches. Furthermore, the proposed estimator adaptively achieves additional gains as batches accumulate, attaining the oracle accuracy as if the entire historical dataset were accessible and the true support were known. These theoretical properties are further illustrated through an example of the generalized linear model.
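The building block named in the abstract is iterative hard thresholding. Below is a minimal batch version for sparse least squares with step size 1/L, where L is the gradient's Lipschitz constant. It omits the asynchronous decomposition, the summary-statistic surrogate score, and the dynamic regularization, so it should be read as background rather than the proposed online algorithm.

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

def iht(X, y, s, iters=300):
    """Batch iterative hard thresholding for least squares with an l0 constraint."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta = hard_threshold(beta + step * X.T @ (y - X @ beta), s)
    return beta

rng = np.random.default_rng(0)
n, p, s = 400, 500, 5
X = rng.normal(size=(n, p)) / np.sqrt(n)
beta_true = np.zeros(p)
support = rng.choice(p, size=s, replace=False)
beta_true[support] = [3.0, -3.0, 3.0, -3.0, 3.0]
y = X @ beta_true + 0.01 * rng.normal(size=n)
beta_hat = iht(X, y, s)
```

Every iterate is exactly s-sparse, which mirrors the l0-sparsity guarantee the abstract states for each batch.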

2603.20671 2026-03-24 cs.LG stat.ML

Breaking the $O(\sqrt{T})$ Cumulative Constraint Violation Barrier while Achieving $O(\sqrt{T})$ Static Regret in Constrained Online Convex Optimization

Haricharan Balasundaram, Karthick Krishna Mahendran, Rahul Vaze

The problem of constrained online convex optimization is considered, where at each round, once a learner commits to an action $x_t \in \mathcal{X} \subset \mathbb{R}^d$, a convex loss function $f_t$ and a convex constraint function $g_t$ that defines the constraint $g_t(x)\le 0$ are revealed. The objective is to simultaneously minimize the static regret and cumulative constraint violation (CCV) compared to the benchmark that knows the loss functions and constraint functions $f_t$ and $g_t$ for all $t$ ahead of time and chooses a static optimal action that is feasible with respect to all $g_t(x)\le 0$. In recent prior work (Sinha and Vaze [2024]), algorithms with simultaneous regret of $O(\sqrt{T})$ and CCV of $O(\sqrt{T})$ have been proposed, with CCV of $O(1)$ in specific cases such as $d=1$ (Vaze and Sinha [2025]). It is widely believed that CCV is $Ω(\sqrt{T})$ for all algorithms that ensure $O(\sqrt{T})$ regret on worst-case inputs for any $d\ge 2$. In this paper, we refute this and show that the algorithm of Vaze and Sinha [2025] simultaneously achieves regret of $O(\sqrt{T})$ and CCV of $O(T^{1/3})$ when $d=2$.

2603.20665 2026-03-24 stat.ME math.ST stat.TH

Continuity of the Solution of a Non-Parametric Bayesian Statistical Calibration Procedure

Akshay Prasadan, Donald Estep, Derek Bingham

Comments 25 pages

Recent work has developed a non-parametric Bayesian approach to the calibration of a computer model, which abstractly amounts to the inversion of a pushforward of stochastic input parameters by a smooth map. The framework has been used in several complex scientific applications, motivating our investigation of the continuity of the solution operator with respect to the distribution on the input parameters. We demonstrate that the solution operator for this approach is uniformly continuous in the total variation metric and weakly continuous for a broad class of distributions.

2603.20656 2026-03-24 stat.ML cs.AI cs.LG math.OC math.ST stat.TH

Sinkhorn Based Associative Memory Retrieval Using Spherical Hellinger Kantorovich Dynamics

Aratrika Mustafi, Soumya Mukherjee

We propose a dense associative memory for empirical measures (weighted point clouds). Stored patterns and queries are finitely supported probability measures, and retrieval is defined by minimizing a Hopfield-style log-sum-exp energy built from the debiased Sinkhorn divergence. We derive retrieval dynamics as a spherical Hellinger Kantorovich (SHK) gradient flow, which updates both support locations and weights. Discretizing the flow yields a deterministic algorithm that uses Sinkhorn potentials to compute barycentric transport steps and a multiplicative simplex reweighting. Under local separation and PL-type conditions we prove basin invariance, geometric convergence to a local minimizer, and a bound showing the minimizer remains close to the corresponding stored pattern. Under a random pattern model, we further show that these Sinkhorn basins are disjoint with high probability, implying exponential capacity in the ambient dimension. Experiments on synthetic Gaussian point-cloud memories demonstrate robust recovery from perturbed queries versus a Euclidean Hopfield-type baseline.
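The energy is built from the debiased Sinkhorn divergence S(a, b) = OT_eps(a, b) - OT_eps(a, a)/2 - OT_eps(b, b)/2. A minimal way to evaluate it for weighted point clouds is log-domain Sinkhorn on the dual potentials (f, g). The sketch assumes a squared Euclidean cost and a fixed iteration count. It evaluates only the divergence, not the SHK retrieval flow or the log-sum-exp memory energy.

```python
import numpy as np

def _lse(M, axis):
    """Numerically stable log-sum-exp along one axis."""
    mx = M.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(M - mx).sum(axis=axis, keepdims=True))).squeeze(axis)

def entropic_ot(a, x, b, y, eps=0.5, iters=500):
    """Entropic OT cost between weighted point clouds via log-domain Sinkhorn."""
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(iters):
        f = -eps * _lse(np.log(b)[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _lse(np.log(a)[:, None] + (f[:, None] - C) / eps, axis=0)
    return float(a @ f + b @ g)                           # dual value at the potentials

def sinkhorn_divergence(a, x, b, y, eps=0.5):
    """Debiased Sinkhorn divergence; removes the entropic bias of OT_eps."""
    return (entropic_ot(a, x, b, y, eps)
            - 0.5 * entropic_ot(a, x, a, x, eps)
            - 0.5 * entropic_ot(b, y, b, y, eps))

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
a = np.full(30, 1 / 30)                                   # uniform weights
S_same = sinkhorn_divergence(a, x, a, x)
S_shift = sinkhorn_divergence(a, x, a, x + np.array([2.0, 0.0]))
```

Debiasing makes S vanish on identical clouds. For a pure translation by t under this cost, S equals |t|^2 at convergence, so S_shift should approach 4.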

2603.20631 2026-03-24 stat.ML cs.LG

LassoFlexNet: Flexible Neural Architecture for Tabular Data

Kry Yik Chau Lui, Cheng Chi, Kishore Basu, Yanshuai Cao

Comments 49 pages

Despite their dominance in vision and language, deep neural networks often underperform relative to tree-based models on tabular data. To bridge this gap, we incorporate five key inductive biases into deep learning: robustness to irrelevant features, axis alignment, localized irregularities, feature heterogeneity, and training stability. We propose LassoFlexNet, an architecture that evaluates the linear and nonlinear marginal contribution of each input via Per-Feature Embeddings, and sparsely selects relevant variables using a Tied Group Lasso mechanism. Because these components introduce optimization challenges that destabilize standard proximal methods, we develop a Sequential Hierarchical Proximal Adaptive Gradient optimizer with exponential moving averages (EMA) to ensure stable convergence. Across 52 datasets from three benchmarks, LassoFlexNet matches or outperforms leading tree-based models, achieving up to a 10% relative gain, while maintaining Lasso-like interpretability. We substantiate these empirical results with ablation studies and theoretical proofs confirming the architecture's enhanced expressivity and its structural breaking of undesired rotational invariance.
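Group-lasso selection is usually implemented through the proximal operator of lam * sum_g ||W_g||_2, that is, block soft-thresholding: a group is zeroed when its norm is at most lam and is otherwise shrunk toward zero. The sketch below shows only this generic operator. It assumes, for illustration, that one group collects all weights tied to a single input feature. The paper's tied variant and its hierarchical optimizer are not reproduced.

```python
import numpy as np

def group_soft_threshold(W, groups, lam):
    """Proximal operator of lam * sum_g ||W[g]||_2 (block soft-thresholding)."""
    out = W.copy()
    for g in groups:
        norm = np.linalg.norm(W[g])
        out[g] = 0.0 if norm <= lam else (1 - lam / norm) * W[g]
    return out

# Two hypothetical feature groups: one strong, one weak.
W = np.array([3.0, 4.0, 0.1, 0.1])
groups = [[0, 1], [2, 3]]
W_prox = group_soft_threshold(W, groups, lam=1.0)
```

The strong group (norm 5) is scaled by 1 - 1/5 = 0.8, and the weak group is removed entirely, which is how an input feature drops out of the model.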

2603.20624 2026-03-24 math.ST eess.SP stat.TH

Cross-Correlation Periodograms with Decaying Noise Floor for Power Spectral Density Estimation

Mark Magsino

We present a statistical analysis of a variant of the periodogram method that forms power spectral density estimates by cross-correlating the discrete Fourier transforms of adjacent time windows. The proposed estimator is closely related to cross-power spectral methods and to a technique introduced by Nelson, which has been observed empirically to improve detection of sinusoidal components in noise. We show that, under a white Gaussian noise model, the expected contribution of noise to the proposed estimator is zero and that the estimator is unbiased under certain window alignment conditions. This contrasts with classical estimators where averaging reduces variance but not expected noise. Moreover, we derive closed-form expressions for the variance and prove an upper bound on the expected magnitude of the estimator that decreases as the number of windows increases. This establishes that the proposed method achieves a noise floor that decays with averaging, unlike standard nonparametric spectral estimators. We further analyze the effect of taking the absolute value to enforce nonnegativity, providing bounds on the resulting bias, and show that this bias also decreases with the number of windows. Theoretical results are validated through numerical simulations. We demonstrate the potential sensitivity to phase misalignment and methods of realignment. We also provide empirical evidence that the estimator is robust to other types of noise.
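The estimator can be sketched by averaging DFT(window k) times conj(DFT(window k+1)) over adjacent windows and taking the magnitude, alongside a classical averaged periodogram for comparison. Assumptions here: non-overlapping rectangular windows, a 1/n_win normalization, and a sinusoid placed exactly on a DFT bin so that its phase is aligned across windows. The paper's realignment methods for phase mismatch are not shown.

```python
import numpy as np

def cross_periodogram(x, n_win):
    """|mean over k of DFT_k * conj(DFT_{k+1})| for adjacent, non-overlapping windows."""
    K = len(x) // n_win
    F = np.fft.rfft(x[: K * n_win].reshape(K, n_win), axis=1)
    return np.abs((F[:-1] * np.conj(F[1:])).mean(axis=0)) / n_win

def averaged_periodogram(x, n_win):
    """Classical averaged periodogram over the same windows, for comparison."""
    K = len(x) // n_win
    F = np.fft.rfft(x[: K * n_win].reshape(K, n_win), axis=1)
    return (np.abs(F) ** 2).mean(axis=0) / n_win

rng = np.random.default_rng(0)
n_win, K = 256, 64
t = np.arange(n_win * K)
noise = rng.normal(size=t.size)
signal = 0.5 * np.sin(2 * np.pi * 20 * t / n_win)        # on-bin sinusoid at bin 20
P_cross = cross_periodogram(signal + noise, n_win)
P_avg = averaged_periodogram(signal + noise, n_win)
floor_cross = np.median(cross_periodogram(noise, n_win))
floor_avg = np.median(averaged_periodogram(noise, n_win))
```

Noise in different windows is independent, so the cross terms average toward zero as K grows, while the averaged periodogram's noise floor stays near the noise variance.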

2603.20602 2026-03-24 stat.ML cs.AI cs.LG

Interpretable Operator Learning for Inverse Problems via Adaptive Spectral Filtering: Convergence and Discretization Invariance

Hang-Cheng Dong, Pengcheng Cheng, Shuhuan Li

Comments 16 pages, 3 figures

Solving ill-posed inverse problems necessitates effective regularization strategies to stabilize the inversion process against measurement noise. While classical methods like Tikhonov regularization require heuristic parameter tuning, and standard deep learning approaches often lack interpretability and generalization across resolutions, we propose SC-Net (Spectral Correction Network), a novel operator learning framework. SC-Net operates in the spectral domain of the forward operator, learning a pointwise adaptive filter function that reweights spectral coefficients based on the signal-to-noise ratio. We provide a theoretical analysis showing that SC-Net approximates the continuous inverse operator, guaranteeing discretization invariance. Numerical experiments on 1D integral equations demonstrate that SC-Net: (1) achieves the theoretical minimax optimal convergence rate ($O(δ^{0.5})$ for $s=p=1.5$), matching theoretical lower bounds; (2) learns interpretable sharp-cutoff filters that outperform Oracle Tikhonov regularization; and (3) exhibits zero-shot super-resolution, maintaining stable reconstruction errors ($\approx 0.23$) when trained on coarse grids ($N=256$) and tested on significantly finer grids (up to $N=2048$). The proposed method bridges the gap between rigorous regularization theory and data-driven operator learning.
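SC-Net learns a filter over the forward operator's spectrum. The classical template it generalizes reconstructs x = V diag(phi(s) / s) U' y from the SVD A = U diag(s) V'. The sketch below compares an unfiltered inverse with two fixed filters, Tikhonov and a sharp cutoff, on a hypothetical 1D Gaussian-blur integral operator. It does not include the learned, SNR-dependent filter itself, and the kernel width, noise level, and filter parameters are illustrative assumptions.

```python
import numpy as np

def spectral_filter_solve(A, y, filt):
    """Reconstruct x from y = A x + noise as V diag(filt(s) / s) U' y."""
    U, s, Vt = np.linalg.svd(A)
    gain = np.divide(filt(s), s, out=np.zeros_like(s), where=s > 0)
    return Vt.T @ (gain * (U.T @ y))                  # reweighted spectral coefficients

def tikhonov(alpha):
    return lambda s: s**2 / (s**2 + alpha)

def sharp_cutoff(tau):
    return lambda s: (s > tau).astype(float)

# Discretized Gaussian-blur integral operator on [0, 1].
n = 100
t = np.linspace(0.0, 1.0, n)
A = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2 * 0.05**2)) / (n * 0.05 * np.sqrt(2 * np.pi))
x_true = np.sin(2 * np.pi * t) + 0.5 * np.sin(6 * np.pi * t)
rng = np.random.default_rng(0)
y = A @ x_true + 1e-3 * rng.normal(size=n)
err_naive = np.linalg.norm(spectral_filter_solve(A, y, lambda s: np.ones_like(s)) - x_true)
err_tik = np.linalg.norm(spectral_filter_solve(A, y, tikhonov(1e-4)) - x_true)
err_cut = np.linalg.norm(spectral_filter_solve(A, y, sharp_cutoff(1e-2)) - x_true)
```

The unfiltered inverse amplifies noise by 1/s in the smallest singular directions. Both filters suppress those directions, and the learned approach replaces the fixed phi with a data-driven one.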

2603.20601 2026-03-24 cs.DB stat.ME

Global Dataset of Solar Power Plants: Multidimensional Integration and Analysis

Anibal Mantilla-Guerra, Christian Mejia-Escobar, Jorge Azorin-Lopez, Jose Garcia-Rodriguez, Byron Fernando Tarco, Karen Santamaria

Comments 21 pages

The use of clean energy is a global trend, with solar photovoltaic plants serving as a cornerstone of this energy transition. To support this rapid growth, optimize energy utilization, and enable a wide range of applications and services, access to more sophisticated and detailed solar data is essential. However, existing datasets lack integration, contain significant gaps, and have limited geographic coverage. In contrast, this study proposes a reliable, standardized, and multidimensional dataset with global scope. Through a reproducible methodology and automated processes, we collected, generated, and combined 27 geographic, topographic, logistical, climate, and power attributes that are critical for the study of photovoltaic plants worldwide. Based on descriptive statistical analysis of the 58,978 records in the compiled dataset, the raw data have been transformed into valuable information for the energy sector. This demonstrates the utility of the dataset as a source of knowledge discovery; it is publicly available to the academic and professional communities.

2603.20585 2026-03-24 cs.LG stat.ML

RECLAIM: Cyclic Causal Discovery Amid Measurement Noise

Muralikrishnna G. Sethuraman, Faramarz Fekri

Uncovering causal relationships is a fundamental problem across science and engineering. However, most existing causal discovery methods assume acyclicity and direct access to the system variables, assumptions that fail to hold in many real-world settings. For instance, in genomics, cyclic regulatory networks are common, and measurements are often corrupted by instrumental noise. To address these challenges, we propose RECLAIM, a causal discovery framework that natively handles both cycles and measurement noise. RECLAIM learns the causal graph structure by maximizing the likelihood of the observed measurements via expectation-maximization (EM), using residual normalizing flows for tractable likelihood computation. We consider two measurement models: (i) Gaussian additive noise, and (ii) a linear measurement system with additive Gaussian noise. We provide theoretical consistency guarantees for both settings. Experiments on synthetic data and real-world protein signaling datasets demonstrate the efficacy of the proposed method.

2603.20582 2026-03-24 q-fin.MF stat.ML

Generative Diffusion Model for Risk-Neutral Derivative Pricing

Nilay Tiwari

Comments 15 pages, 2 figures. Introduces a risk-neutral correction for diffusion models via a score function shift, with applications to derivative pricing

Denoising diffusion probabilistic models (DDPMs) have emerged as powerful generative models for complex distributions, yet their use in arbitrage-free derivative pricing remains largely unexplored. Financial asset prices are naturally modeled by stochastic differential equations (SDEs), whose forward and reverse density evolution closely parallels the forward noising and reverse denoising structure of diffusion models. In this paper, we develop a framework for using DDPMs to generate risk-neutral asset price dynamics for derivative valuation. Starting from log-return dynamics under the physical measure, we analyze the associated forward diffusion and derive the reverse-time SDE. We show that the change of measure from the physical to the risk-neutral measure induces an additive shift in the score function, which translates into a closed-form risk-neutral epsilon shift in the DDPM reverse dynamics. This correction enforces the risk-neutral drift while preserving the learned variance and higher-order structure, yielding an explicit bridge between diffusion-based generative modeling and classical risk-neutral SDE-based pricing. We show that the resulting discounted price paths satisfy the martingale condition under the risk-neutral measure. Empirically, the method reproduces the risk-neutral terminal distribution and accurately prices both European and path-dependent derivatives, including arithmetic Asian options, under a GBM benchmark. These results demonstrate that diffusion-based generative models provide a flexible and principled approach to simulation-based derivative pricing.
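The score-shift idea can be checked in closed form in the simplest case. The derivation below is our own illustration, not the paper's general result. It assumes one-step GBM log-returns, Gaussian with drift mu under P and r under Q, variance sigma^2 Delta, and the standard DDPM marginal x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) epsilon with the usual epsilon-score relation. Under these assumptions, the measure change adds a constant to the score and therefore a constant, x-independent shift to the predicted noise.

```latex
\begin{aligned}
x_0 &\sim \mathcal{N}(m, s^2), \quad s^2 = \sigma^2\Delta, \quad
m_{\mathbb{P}} = (\mu - \tfrac{1}{2}\sigma^2)\Delta, \quad
m_{\mathbb{Q}} = (r - \tfrac{1}{2}\sigma^2)\Delta \\
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon
\;\Rightarrow\; x_t \sim \mathcal{N}\big(\sqrt{\bar\alpha_t}\,m,\; v_t\big), \quad
v_t = \bar\alpha_t s^2 + 1 - \bar\alpha_t \\
\nabla_x \log p_t^{\mathbb{Q}}(x) - \nabla_x \log p_t^{\mathbb{P}}(x)
&= \frac{\sqrt{\bar\alpha_t}\,(m_{\mathbb{Q}} - m_{\mathbb{P}})}{v_t}
 = \frac{\sqrt{\bar\alpha_t}\,(r - \mu)\Delta}{v_t} \\
\varepsilon_t^{\mathbb{Q}}(x) = -\sqrt{1-\bar\alpha_t}\,\nabla_x \log p_t^{\mathbb{Q}}(x)
&= \varepsilon_t^{\mathbb{P}}(x) + \frac{\sqrt{\bar\alpha_t(1-\bar\alpha_t)}\,(\mu - r)\Delta}{\bar\alpha_t \sigma^2 \Delta + 1 - \bar\alpha_t}
\end{aligned}
```

The variance v_t is unchanged by the measure change, consistent with the abstract's statement that the correction moves only the drift while preserving the learned variance.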

2603.18404 2026-03-24 stat.ML cs.LG stat.ME

Multi-Domain Empirical Bayes for Linearly-Mixed Causal Representations

Bohan Wu, Julius von Kügelgen, David M. Blei

Causal representation learning (CRL) aims to learn low-dimensional causal latent variables from high-dimensional observations. While identifiability has been extensively studied for CRL, estimation has been less explored. In this paper, we explore the use of empirical Bayes (EB) to estimate causal representations. In particular, we consider the problem of learning from data from multiple domains, where differences between domains are modeled by interventions in a shared underlying causal model. Multi-domain CRL naturally poses a simultaneous inference problem that EB is designed to tackle. Here, we propose an EB $f$-modeling algorithm that improves the quality of learned causal variables by exploiting invariant structure within and across domains. Specifically, we consider a linear measurement model and interventional priors arising from a shared acyclic SCM. When the graph and intervention targets are known, we develop an EM-style algorithm based on causally structured score matching. We further discuss EB $g$-modeling in the context of existing CRL approaches. In experiments on synthetic data, our proposed method achieves more accurate estimation than other methods for CRL.
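EB f-modeling estimates the marginal density f of the observations and then denoises through Tweedie's formula, E[theta | x] = x + sigma^2 d/dx log f(x). The sketch below shows only this classical scalar version with a Gaussian kernel density estimate and a hand-picked bandwidth h, both our assumptions. It is not the paper's causally structured score-matching algorithm.

```python
import numpy as np

def tweedie_posterior_mean(x_obs, x_eval, sigma, h):
    """f-modeling: E[theta | x] = x + sigma^2 * d/dx log f_hat(x), f_hat a Gaussian KDE."""
    diff = x_obs[None, :] - x_eval[:, None]                 # (x_i - x) for each eval point
    logk = -0.5 * (diff / h) ** 2
    w = np.exp(logk - logk.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                       # normalized kernel weights
    score = (w * diff).sum(axis=1) / h**2                   # d/dx log f_hat(x)
    return x_eval + sigma**2 * score

# Normal-normal model: theta ~ N(0, 4), x | theta ~ N(theta, 1).
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 2.0, size=5000)
x_obs = theta + rng.normal(size=5000)
x_eval = np.array([-1.0, 0.0, 1.0])
post = tweedie_posterior_mean(x_obs, x_eval, sigma=1.0, h=0.8)
oracle = 0.8 * x_eval                                       # tau^2 / (tau^2 + sigma^2)
```

The estimate never uses the prior on theta directly, only the observed marginal. This is what lets f-modeling share strength across units, or, in the multi-domain setting, across domains.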

2603.15232 2026-03-24 cs.LG math.ST stat.ML stat.TH

Decomposing Probabilistic Scores: Reliability, Information Loss and Uncertainty

Arthur Charpentier, Agathe Fernandes Machado

Abstract

Calibration is a conditional property that depends on the information retained by a predictor. We develop decomposition identities for arbitrary proper losses that make this dependence explicit. At any information level $\mathcal A$, the expected loss of an $\mathcal A$-measurable predictor splits into a proper-regret (reliability) term and a conditional entropy (residual uncertainty) term. For nested levels $\mathcal A\subseteq\mathcal B$, a chain decomposition quantifies the information gain from $\mathcal A$ to $\mathcal B$. Applied to classification with features $\boldsymbol{X}$ and score $S=s(\boldsymbol{X})$, this yields a three-term identity: miscalibration, a grouping term measuring information loss from $\boldsymbol{X}$ to $S$, and irreducible uncertainty at the feature level. We leverage the framework to analyze post-hoc recalibration, aggregation of calibrated models, and stagewise/boosting constructions, with explicit forms for Brier and log-loss.
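
For the Brier loss, the three-term identity can be checked numerically when the features are discrete and the score is a function of them. Conditional expectations are then empirical group means, and the identity is exact:

mean Brier = E[(S - E[Y|S])^2] + E[(E[Y|X] - E[Y|S])^2] + E[(Y - E[Y|X])^2]

The three terms are miscalibration, grouping, and irreducible uncertainty. The data and function names below are illustrative assumptions.

```python
from collections import defaultdict

def group_means(keys, ys):
    """Empirical conditional mean of y for each distinct key value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, y in zip(keys, ys):
        sums[k] += y
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

def brier_decomposition(xs, scores, ys):
    """Return (mean Brier, miscalibration, grouping, irreducible) for discrete xs, scores = s(xs)."""
    n = len(ys)
    m_s = group_means(scores, ys)  # E[Y | S]
    m_x = group_means(xs, ys)      # E[Y | X]
    brier = sum((s - y) ** 2 for s, y in zip(scores, ys)) / n
    miscal = sum((s - m_s[s]) ** 2 for s in scores) / n
    grouping = sum((m_x[x] - m_s[s]) ** 2 for x, s in zip(xs, scores)) / n
    irreducible = sum((y - m_x[x]) ** 2 for x, y in zip(xs, ys)) / n
    return brier, miscal, grouping, irreducible

xs = [0, 0, 1, 1, 2, 2, 3, 3]
scores = [0.3 if x < 2 else 0.8 for x in xs]  # coarse score that discards information about X
ys = [0, 1, 1, 1, 0, 1, 1, 1]
b, m, g, i = brier_decomposition(xs, scores, ys)
print(b, m + g + i)  # both approximately 0.29
```

Here the grouping term is positive: the score merges feature values with different conditional means, so it loses information that no recalibration of S can recover.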

2603.15182 2026-03-24 stat.ME cs.LG

Sequential Transport for Causal Mediation Analysis

Agathe Fernandes Machado, Iryna Voitsitska, Arthur Charpentier, Ewen Gallic

Abstract

We propose sequential transport (ST), a distributional framework for mediation analysis that combines optimal transport (OT) with a mediator directed acyclic graph (DAG). Instead of relying on cross-world counterfactual assumptions, ST constructs unit-level mediator counterfactuals by minimally transporting each mediator, either marginally or conditionally, toward its distribution under an alternative treatment while preserving the causal dependencies encoded by the DAG. For numerical mediators, ST uses monotone (conditional) OT maps based on conditional CDF/quantile estimators; for categorical mediators, it extends naturally via simplex-based transport. We establish consistency of the estimated transport maps and of the induced unit-level decompositions into mutatis mutandis direct and indirect effects under standard regularity and support conditions. When the treatment is randomized or ignorable (possibly conditional on covariates), these decompositions admit a causal interpretation; otherwise, they provide a principled distributional attribution of differences between groups aligned with the mediator structure. Gaussian examples show that ST recovers classical mediation formulas, while additional simulations confirm good performance in nonlinear and mixed-type settings. An application to the COMPAS dataset illustrates how ST yields deterministic, DAG-consistent counterfactual mediators and a fine-grained mediator-level attribution of disparities.
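
The basic building block for a numerical mediator is the one-dimensional monotone OT map T(x) = F_1^{-1}(F_0(x)). This map sends a unit at a given quantile of the mediator distribution under one treatment to the same quantile under the alternative treatment. The sketch below is marginal only and uses plain empirical CDFs and quantiles. In ST itself, the abstract says the maps are conditional on the DAG structure and built from conditional CDF/quantile estimators; that part is not reproduced. The sample values are hypothetical.

```python
import bisect
import math

def monotone_transport(x, source, target):
    """Empirical 1D monotone OT map T(x) = F_target^{-1}(F_source(x))."""
    src, tgt = sorted(source), sorted(target)
    u = bisect.bisect_right(src, x) / len(src)  # empirical CDF of the source sample at x
    idx = min(len(tgt) - 1, max(0, math.ceil(u * len(tgt)) - 1))
    return tgt[idx]  # empirical quantile of the target sample at level u

mediator_t0 = [1.0, 2.0, 3.0, 4.0]      # hypothetical mediator values under treatment 0
mediator_t1 = [10.0, 20.0, 30.0, 40.0]  # hypothetical mediator values under treatment 1
print([monotone_transport(m, mediator_t0, mediator_t1) for m in mediator_t0])
# [10.0, 20.0, 30.0, 40.0]
```

The map is rank preserving, so each unit keeps its relative position. This is what makes the resulting counterfactual mediators deterministic rather than drawn from a cross-world joint distribution.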

2603.01162 2026-03-24 cs.LG stat.ML

Demystifying Group Relative Policy Optimization: Its Policy Gradient is a U-Statistic

Hongyi Zhou, Kai Ye, Erhan Xu, Jin Zhu, Ying Yang, Shijin Gong, Chengchun Shi

Comments 5 pages, 53 figures

Abstract

Group relative policy optimization (GRPO), a core methodological component of DeepSeekMath and DeepSeek-R1, has emerged as a cornerstone for scaling the reasoning capabilities of large language models. Despite its widespread adoption and the proliferation of follow-up works, the theoretical properties of GRPO remain understudied. This paper provides a unified framework for understanding GRPO through the lens of classical U-statistics. We demonstrate that the GRPO policy gradient is inherently a U-statistic, which allows us to characterize its mean squared error (MSE) and to derive a finite-sample error bound and the asymptotic distribution of the suboptimality gap of its learned policy. Our findings reveal that GRPO is asymptotically equivalent to an oracle policy gradient algorithm (one with access to a value function that quantifies the quality of its current policy at each training iteration) and achieves asymptotically optimal performance within a broad class of policy gradient algorithms. Furthermore, we establish a universal scaling law that offers principled guidance for selecting the optimal group size. Empirical experiments further validate our theoretical findings, demonstrating that the optimal group size is universal, and verify the oracle property of GRPO.
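
One elementary identity hints at where the pairwise structure comes from. Centering a reward by its group mean equals averaging the pairwise kernel h(r_i, r_j) = r_i - r_j over the group: r_i - mean(r) = (1/G) * sum_j (r_i - r_j). The sketch below checks this identity and computes the usual standardized group-relative advantages. It is not the paper's derivation. The use of the population standard deviation and the small eps are assumptions, and implementations differ on both.

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages (r_i - mean) / (std + eps) for one group of sampled responses."""
    mu = statistics.fmean(rewards)
    sd = statistics.pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mu) / (sd + eps) for r in rewards]

def centered_as_pairwise_average(rewards):
    """r_i - mean(r) written as (1/G) * sum_j h(r_i, r_j) with kernel h(a, b) = a - b."""
    g = len(rewards)
    return [sum(ri - rj for rj in rewards) / g for ri in rewards]

rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. binary correctness rewards for a group of G = 4 samples
mu = statistics.fmean(rewards)
assert all(abs((r - mu) - p) < 1e-12
           for r, p in zip(rewards, centered_as_pairwise_average(rewards)))
print(grpo_advantages(rewards))  # approximately [1.0, -1.0, -1.0, 1.0]
```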

2602.22271 2026-03-24 cs.LG math.PR math.ST stat.TH

Support Tokens, Stability Margins, and a New Foundation for Robust LLMs

Deepak Agarwal, Dhyey Dharmendrakumar Mavani, Suyash Gupta, Karthik Sethuraman, Tejas Dharamsi

Comments 45 pages, 9 figures

Abstract

Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We reinterpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much as classical PCA is extended to probabilistic PCA. This reformulation reveals a key structural consequence of the underlying change of variables: a barrier constraint emerges on the parameters of self-attention. The resulting geometry exposes a degeneracy boundary where the attention-induced mapping becomes locally ill-conditioned, yielding a stability-margin interpretation analogous to the margin in support vector machines. This, in turn, naturally gives rise to the concept of support tokens. We further show that causal transformers define a consistent stochastic process over infinite token sequences, providing a rigorous probabilistic foundation for sequence modeling. Building on this view, we derive a Bayesian MAP training objective that requires only a minimal modification to standard LLM training: adding a smooth log-barrier penalty to the usual cross-entropy loss. Empirically, the resulting training objective improves robustness to input perturbations and sharpens the margin geometry of the learned representations without sacrificing out-of-sample accuracy.

2601.22481 2026-03-24 stat.ME stat.AP stat.ML

Changepoint Detection As Model Selection: A General Framework

Michael Grantham, Xueheng Shi, Bertrand Clarke

Abstract

This dissertation presents a general framework for changepoint detection based on L0 model selection. The core method, Iteratively Reweighted Fused Lasso (IRFL), improves upon the generalized lasso by adaptively reweighting penalties to enhance support recovery and minimize criteria such as the Bayesian Information Criterion (BIC). The approach allows for flexible modeling of seasonal patterns, linear and quadratic trends, and autoregressive dependence in the presence of changepoints. Simulation studies demonstrate that IRFL achieves accurate changepoint detection across a wide range of challenging scenarios, including those involving nuisance factors such as trends, seasonal patterns, and serially correlated errors. The framework is further extended to image data, where it enables edge-preserving denoising and segmentation, with applications spanning medical imaging and high-throughput plant phenotyping. Applications to real-world data demonstrate IRFL's utility. In particular, analysis of the Mauna Loa CO2 time series reveals changepoints that align with volcanic eruptions and ENSO events, yielding a more accurate trend decomposition than ordinary least squares. Overall, IRFL provides a robust, extensible tool for detecting structural change in complex data.

2512.19398 2026-03-24 stat.ME stat.CO

A Reduced Basis Decomposition Approach to Efficient Data Collection in Pairwise Comparison Studies

Jiahua Jiang, Joseph Marsh, Rowland G Seymour

Comments Author Accepted Manuscript

Journal ref
Comput Stat 41, 60 (2026)
Abstract

Comparative judgement studies elicit quality assessments through pairwise comparisons, typically analysed using the Bradley-Terry model. A challenge in these studies is experimental design, specifically, determining the optimal pairs to compare to maximize statistical efficiency. Constructing static experimental designs for these studies requires spectral decomposition of a covariance matrix over pairs of pairs, which becomes computationally infeasible for studies with more than approximately 150 objects. We propose a scalable method based on reduced basis decomposition that bypasses explicit construction of this matrix, achieving computational savings of two to three orders of magnitude. We establish eigenvalue bounds guaranteeing approximation quality and characterise the rank structure of the design matrix. Simulations demonstrate speedup factors exceeding 100 for studies with 64 or more objects, with negligible approximation error. We apply the method to construct designs for a 452-region spatial study in under 7 minutes and enable real-time design updates for classroom peer assessment, reducing computation time from 15 minutes to 15 seconds.

2511.10814 2026-03-24 math.PR math.ST stat.TH

Convergence of the extended Kalman filter with small and state-dependent noise

Ibrahim Mbouandi Njiasse, Florent Ouabo Kamkumo, Ralf Wunderlich

Comments 20 pages

Abstract

Nonlinear filtering problems are encountered in many applications, and one solution approach is the extended Kalman filter, which is not always convergent. It is therefore crucial to identify conditions under which the extended Kalman filter provides accurate approximations. This paper generalizes two significant results of Picard (1991) on the efficiency of the continuous-time extended Kalman filter for a filtering system with small noise to a more general setting in which the observation noise may be state-dependent but does not allow the signal to be reconstructed from the quadratic variation of the observation process, as is the case, for example, in epidemic models. First, we show that if the drifts of the signal process and the observation process become nearly linear as the parameter $\varepsilon$, which scales the diffusion coefficients, approaches zero, and the drift coefficient of the observation process is strongly injective, then the estimation error is of order $\sqrt{\varepsilon}$. We then establish conditions under which the impact of the initial filtering error decays exponentially fast.

2511.03115 2026-03-24 physics.med-ph stat.AP

SDE-based Monte Carlo dose calculation for proton therapy validated against Geant4

Christopher B. C. Dean, Maria L. Pérez-Lara, Emma Horton, Matthew Southerby, Jere Koskela, Andreas E. Kyprianou

Comments 30 pages, 11 figures

Abstract

Objective: To assess the accuracy and computational performance of a stochastic differential equation (SDE)-based model for proton beam dose calculation by benchmarking against Geant4 in simplified phantom geometries. Approach: Building on Crossley et al. (2025), we implemented the SDE model using standard approximations to interaction cross sections and mean excitation energies, enabling straightforward adaptation to new materials and configurations. The model was benchmarked against Geant4 in homogeneous, longitudinally heterogeneous and laterally heterogeneous phantoms to assess depth-dose behaviour, lateral transport and material heterogeneities. Main results: Across all phantoms and beam energies, the SDE model reproduced the main depth-dose characteristics predicted by Geant4, with proton range agreement within 0.2 mm for 100 MeV beams and 0.6 mm for 150 MeV beams. Voxel-wise comparisons yielded gamma pass rates exceeding 95% under 2%/0.5 mm criteria with a 1% dose threshold. Differences were localised to steep dose gradients or material interfaces, while overall lateral beam dispersion was well reproduced. The SDE model achieved speed-up factors of about 2.5 to 3 relative to single-threaded Geant4. Significance: The SDE approach reproduces key dosimetric features with good accuracy at lower computational cost and is amenable to parallel and GPU implementations, supporting fast proton therapy dose calculations.

2510.22690 2026-03-24 math.ST math.PR stat.ME stat.TH

Stopping Rules for Monte Carlo Methods of Martingale Difference Type

Jiezhong Wu, Reiichiro Kawai

Comments 30 pages, 4 figures

Journal ref
SIAM J. Sci. Comput., Vol 48 (2026)
Abstract

We establish a practical and easy-to-implement sequential stopping rule for the martingale central limit theorem, focusing on Monte Carlo methods for estimating the mean of a non-iid sequence of martingale difference type. Starting with an impractical scheme based on the standard martingale central limit theorem, we progressively address its limitations from implementation perspectives in the non-asymptotic regime. Along the way, we compare the proposed schemes with their counterparts in the asymptotic regime. The developed framework has potential applications in various domains. Numerical results are provided to demonstrate the effectiveness of the developed stopping rules in terms of reliability and complexity.
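
The starting point of such schemes is the classical fixed-width rule: keep sampling until the CLT half-width z * s_n / sqrt(n) falls below a target tolerance. The sketch below shows that baseline for i.i.d. draws, which is a simplification. The paper's martingale-difference setting and its non-asymptotic corrections are not reproduced. The minimum sample size `n_min`, the cap `n_max`, and the function name are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def sequential_mean(sampler, eps, alpha=0.05, n_min=30, n_max=10**6):
    """Sample until the CLT half-width z * s_n / sqrt(n) drops below eps.

    Welford's algorithm maintains the running mean and sum of squared deviations.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n, mean, m2 = 0, 0.0, 0.0
    while n < n_max:
        x = sampler()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= n_min:
            half_width = z * math.sqrt(m2 / (n - 1)) / math.sqrt(n)
            if half_width <= eps:
                break
    return mean, n

rng = random.Random(1)
est, n_used = sequential_mean(lambda: rng.gauss(2.0, 1.0), eps=0.05)
print(est, n_used)  # n_used is roughly (1.96 / 0.05) ** 2, i.e. about 1500
```

The `n_min` guard exists because a naive rule can stop too early when the first few draws happen to have a small sample variance.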

2510.22083 2026-03-24 stat.ME

Ridge Boosting is Both Robust and Efficient

David Bruns-Smith, Zhongming Xie, Avi Feller

Abstract

Estimators in statistics and machine learning must typically trade off efficiency (low variance for a fixed target) against distributional robustness (for example, multiaccuracy, or low bias over a range of possible targets). In this paper, we consider a simple estimator, ridge boosting: starting from any initial predictor, perform a single boosting step with (kernel) ridge regression. Surprisingly, we show that ridge boosting simultaneously achieves both efficiency and distributional robustness: for target distribution shifts that lie within an RKHS unit ball, this estimator maintains low bias across all such shifts and has variance at the semiparametric efficiency bound for each target. In addition to bridging otherwise distinct research areas, this result has immediate practical value. Since ridge boosting uses only data from the source distribution, researchers can train a single model to obtain both robust and efficient estimates for multiple target estimands at the same time, eliminating the need to fit separate semiparametric efficient estimators for each target. We assess this approach through simulations and an application estimating the age profile of retirement income.
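
Mechanically, the estimator is short: fit kernel ridge regression to the residuals of an initial predictor f0 and add the fit back. The minimal sketch below assumes an RBF kernel, a fixed regularization `lam`, and a deliberately poor f0 (all hypothetical choices). It only shows the one-step construction. The bias and efficiency guarantees over RKHS-bounded shifts are not demonstrated here.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))

def ridge_boost(f0, x, y, lam=1e-2, lengthscale=1.0):
    """One boosting step: kernel ridge regression on the residuals of f0, added back to f0."""
    n = len(y)
    resid = y - f0(x)
    k = rbf_kernel(x, x, lengthscale)
    coef = np.linalg.solve(k + n * lam * np.eye(n), resid)
    return lambda x_new: f0(x_new) + rbf_kernel(x_new, x, lengthscale) @ coef

def f0(z):
    """Deliberately poor initial predictor: always predicts zero."""
    return np.zeros(len(z))

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(200, 1))
y = np.sin(2.0 * x[:, 0]) + 0.1 * rng.standard_normal(200)
f1 = ridge_boost(f0, x, y)
mse0 = np.mean((y - f0(x)) ** 2)
mse1 = np.mean((y - f1(x)) ** 2)
print(f"initial MSE {mse0:.3f}, after one ridge-boosting step {mse1:.3f}")
```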

2510.03798 2026-03-24 cs.LG stat.ML

Robust Batched Bandits

Yunwen Guo, Yunlun Shu, Gongyi Zhuo, Tianyu Wang

Comments 39 pages

Abstract

The batched multi-armed bandit (MAB) problem, in which rewards are collected in batches, is crucial for applications such as clinical trials. Existing research predominantly assumes light-tailed reward distributions, yet many real-world scenarios, including clinical outcomes, exhibit heavy-tailed characteristics. This paper bridges this gap by proposing robust batched bandit algorithms designed for heavy-tailed rewards, within both finite-arm and Lipschitz-continuous settings. We reveal a surprising phenomenon: in the instance-independent regime, as well as in the Lipschitz setting, heavier-tailed rewards necessitate a smaller number of batches to achieve near-optimal regret. In stark contrast, for the instance-dependent setting, the required number of batches to attain near-optimal regret remains invariant with respect to tail heaviness.
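
Algorithms for heavy-tailed rewards usually replace the empirical mean with a robust mean estimator. The sketch below uses median-of-means, one standard choice. The abstract does not say which estimator the paper uses, so treat this choice, the block count, and the sample values as assumptions.

```python
import statistics

def median_of_means(xs, n_blocks):
    """Robust mean estimate: split into n_blocks equal blocks, average each, return the median."""
    k = len(xs) // n_blocks
    if k == 0:
        raise ValueError("need at least one observation per block")
    block_means = [statistics.fmean(xs[i * k:(i + 1) * k]) for i in range(n_blocks)]
    return statistics.median(block_means)

rewards = [1.0] * 9 + [1000.0]  # one extreme draw, as a heavy-tailed reward might produce
print(statistics.fmean(rewards), median_of_means(rewards, n_blocks=5))  # 100.9 1.0
```

In practice, the observations should be shuffled before blocking, and leftover points beyond `n_blocks * k` are dropped in this sketch.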

2509.20721 2026-03-24 cs.LG math.ST stat.ML stat.TH

Scaling Laws are Redundancy Laws

Yuda Bi, Vince D Calhoun

Comments This is not serious research at this time

Abstract

Scaling laws, a defining feature of deep learning, reveal a striking power-law improvement in model performance with increasing dataset and model size. Yet, their mathematical origins, especially the scaling exponent, have remained elusive. In this work, we show that scaling laws can be formally explained as redundancy laws. Using kernel regression, we show that a polynomial tail in the data covariance spectrum yields an excess risk power law with exponent $\alpha = 2s/(2s + 1/\beta)$, where $\beta$ controls the spectral tail and $1/\beta$ measures redundancy. This reveals that the learning curve's slope is not universal but depends on data redundancy, with steeper spectra accelerating returns to scale. We establish the law's universality across boundedly invertible transformations, multi-modal mixtures, finite-width approximations, and Transformer architectures in both linearized (NTK) and feature-learning regimes. This work delivers the first rigorous mathematical explanation of scaling laws as finite-sample redundancy laws, unifying empirical observations with theoretical foundations.
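
To see how redundancy moves the exponent, one can evaluate the abstract's formula directly. The sketch holds s fixed at an arbitrary value; the abstract does not define s, so it is treated here as a given parameter. It varies beta and reports how much the excess risk shrinks when the data doubles, under a power law risk ~ n^(-alpha).

```python
def excess_risk_exponent(s: float, beta: float) -> float:
    """alpha = 2s / (2s + 1/beta); 1/beta plays the role of redundancy."""
    return 2.0 * s / (2.0 * s + 1.0 / beta)

for beta in (0.5, 1.0, 2.0, 4.0):
    alpha = excess_risk_exponent(s=1.0, beta=beta)
    # Under risk ~ n^(-alpha), doubling the data multiplies the excess risk by 2^(-alpha).
    print(f"beta={beta}: alpha={alpha:.3f}, risk ratio on doubling n = {2 ** -alpha:.3f}")
```

With s = 1, alpha rises from 0.5 at beta = 0.5 to about 0.889 at beta = 4. Less redundancy gives a steeper learning curve, consistent with the abstract's statement.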

2509.07322 2026-03-24 stat.ME

Cumulative Marginal Mean Model for Assessing Sequential Effects Using Digital Health Data

Xingche Guo, Zexi Cai, Yuanjia Wang, Donglin Zeng

Abstract

Mobile health (mHealth) leverages digital technologies, such as mobile phones, to capture objective, frequent, and real-world digital phenotypes from individuals, enabling the delivery of tailored interventions to accommodate substantial between-subject and temporal heterogeneity. However, evaluating heterogeneous treatment effects (HTEs) using digital phenotype data is challenging because treatments are delivered dynamically over time and may generate carryover effects that persist beyond the immediate response. Additionally, modeling observational data is complicated by confounding factors. To address these challenges, we propose a double machine learning (DML) method for estimating time-varying HTEs using digital phenotypes under a cumulative marginal mean model that separates current instantaneous effects from lagged carryover effects. Our approach uses a sequential estimation procedure together with Neyman-orthogonal scores to obtain robust inference for the time-varying HTEs. We establish the asymptotic normality of the proposed estimator. Extensive simulation studies validate the finite-sample performance of our approach, demonstrating the advantages of DML and the decomposition of treatment effects. We apply the method to an mHealth study of Parkinson's disease (PD), where we find that treatment is significantly more effective for younger patients. Our results highlight the potential of the proposed approach for advancing precision medicine in mHealth studies.

2508.07392 2026-03-24 cs.LG math.ST stat.ML stat.TH

Tight Bounds for Schrödinger Potential Estimation in Unpaired Data Translation

Nikita Puchkin, Denis Suchkov, Alexey Naumov, Denis Belomestny

Comments The 14th International Conference on Learning Representations (ICLR 2026)

Abstract

Modern methods of generative modelling and unpaired data translation based on Schrödinger bridges and stochastic optimal control theory aim to transform an initial density to a target one in an optimal way. In the present paper, we assume that we only have access to i.i.d. samples from the initial and final distributions. This makes our setup suitable for both generative modelling and unpaired data translation. Relying on the stochastic optimal control approach, we choose an Ornstein-Uhlenbeck process as the reference process and estimate the corresponding Schrödinger potential. Introducing a risk function as the Kullback-Leibler divergence between couplings, we derive tight bounds on the generalization ability of an empirical risk minimizer over a class of Schrödinger potentials, including Gaussian mixtures. Thanks to the mixing properties of the Ornstein-Uhlenbeck process, we achieve nearly fast rates of convergence, up to logarithmic factors, in favourable scenarios. We also illustrate the performance of the suggested approach with numerical experiments.

2507.23646 2026-03-24 math.ST cs.IT math.DG math.IT math.PR q-fin.MF stat.TH

Information geometry of Lévy processes and financial models

Jaehyung Choi

Comments 22 pages

Abstract

We develop the information geometry of Lévy processes. Deriving $α$-divergences directly in terms of the Lévy triplets of the processes, we identify the Fisher information matrix and the $α$-connection on the statistical manifold. In addition, we discuss statistical implications of this information geometry, including bias-reduced estimation and Bayesian predictive priors. Several Lévy processes widely used in financial modeling, such as tempered stable processes, the CGMY model, variance gamma processes, and the Merton model, are investigated through their differential-geometric structures as illustrative examples.

2507.16749 2026-03-24 stat.ME stat.ML

Bootstrapped Control Limits for Score-Based Concept Drift Control Charts

Jiezhong Wu, Daniel W. Apley

Comments 46 pages, 3 figures

Abstract

Monitoring for changes in a predictive relationship represented by a fitted supervised learning model (i.e., concept drift detection) is a widespread problem in modern data-driven applications. A general and powerful Fisher score-based concept drift approach was recently proposed, in which detecting concept drift reduces to detecting changes in the mean of the model's score vector using a multivariate exponentially weighted moving average (MEWMA). To implement the approach, the initial data must be split into two subsets. The first subset serves as the training sample to which the model is fit, and the second subset serves as an out-of-sample test set from which the MEWMA control limit (CL) is determined. In this paper, we retain the same score-based MEWMA monitoring statistic as the existing method and focus instead on improving the computation of the control limit. We develop a novel nested bootstrap procedure for calibrating the CL that allows the entire initial sample to be used for model fitting, thereby yielding a more accurate baseline model while eliminating the need for a large holdout set. The outer bootstrap loop is fully parallelizable, making the method computationally practical, with CL setup times comparable to or faster than the existing method. We show that a standard nested bootstrap substantially underestimates the variability of the monitoring statistic and develop a 0.632-like correction that appropriately accounts for this. We demonstrate the advantages with numerical examples.
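
The monitoring statistic itself is the standard MEWMA recursion z_t = lam * x_t + (1 - lam) * z_{t-1} with T^2_t = z_t' Cov(z_t)^{-1} z_t. The sketch below uses the exact covariance Cov(z_t) = lam / (2 - lam) * (1 - (1 - lam)^(2t)) * Sigma for i.i.d. in-control vectors. It feeds in synthetic Gaussian vectors in place of a fitted model's score vectors, so the data, `lam`, and names are assumptions. The control limit, which the paper calibrates with its nested bootstrap, is not computed.

```python
import numpy as np

def mewma_t2(scores, sigma, lam=0.1):
    """MEWMA T^2 statistics for a sequence of score vectors whose in-control mean is zero."""
    sigma_inv = np.linalg.inv(sigma)
    z = np.zeros(scores.shape[1])
    t2 = []
    for t, x in enumerate(scores, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact in-control covariance of z_t is scale * sigma.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        t2.append(float(z @ sigma_inv @ z) / scale)
    return np.array(t2)

rng = np.random.default_rng(0)
in_control = rng.standard_normal((50, 2))
shifted = rng.standard_normal((50, 2)) + np.array([1.0, 0.0])  # mean shift in one score component
stats = mewma_t2(np.vstack([in_control, shifted]), sigma=np.eye(2))
print(stats[:50].mean(), stats[50:].mean())  # the post-shift average is much larger
```

A drift signal would be raised when T^2 crosses the calibrated control limit.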

2507.14869 2026-03-24 stat.CO math.PR

Bayesian Inversion via Probabilistic Cellular Automata: an application to image denoising

Danilo Costarelli, Michele Piconi, Alessio Troiani

Abstract

We propose using Probabilistic Cellular Automata (PCA) to address inverse problems within the Bayesian approach. In particular, we use PCA to sample from an approximation of the posterior distribution. A distinctive feature of PCA is their intrinsically parallel nature, which permits a straightforward parallel implementation that exploits parallel computing architectures naturally and efficiently. We compare the performance of the PCA method with the standard Gibbs sampler on an image denoising task in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM). The numerical results and the large speedups obtained with this approach suggest that PCA-based algorithms are a promising alternative for Bayesian inference in high-dimensional inverse problems.

2506.03467 2026-03-24 cs.IT cs.CR cs.LG eess.SP math.IT stat.ME

Differentially Private Distribution Release of Gaussian Mixture Models via KL-Divergence Minimization

Hang Liu, Anna Scaglione, Sean Peisert

Comments This work has been submitted to the IEEE for possible publication

Abstract

Gaussian Mixture Models (GMMs) are widely used statistical models for representing multi-modal data distributions, with numerous applications in data mining, pattern recognition, data simulation, and machine learning. However, recent research has shown that releasing GMM parameters poses significant privacy risks, potentially exposing sensitive information about the underlying data. In this paper, we address the challenge of releasing GMM parameters while ensuring differential privacy (DP) guarantees. Specifically, we focus on the privacy protection of mixture weights, component means, and covariance matrices. We propose to use Kullback-Leibler (KL) divergence as a utility metric to assess the accuracy of the released GMM, as it captures the joint impact of noise perturbation on all the model parameters. To achieve privacy, we introduce a DP mechanism that adds carefully calibrated random perturbations to the GMM parameters. Through theoretical analysis, we quantify the effects of privacy budget allocation and perturbation statistics on the DP guarantee, and derive a tractable expression for evaluating KL divergence. We formulate and solve an optimization problem to minimize the KL divergence between the released and original models, subject to a given $(ε, δ)$-DP constraint. Extensive experiments on both synthetic and real-world datasets demonstrate that our approach achieves strong privacy guarantees while maintaining high utility.
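
As a baseline for such mechanisms, consider the classical Gaussian mechanism from the differential privacy literature (as in Dwork and Roth's textbook). It adds i.i.d. N(0, sigma^2) noise with sigma = Delta_2 * sqrt(2 ln(1.25/delta)) / epsilon, a calibration valid for 0 < epsilon < 1. The paper's mechanism instead allocates the budget across weights, means and covariances and optimizes the perturbation statistics to minimize KL divergence; none of that is reproduced here. The sensitivity value and names below are hypothetical.

```python
import math
import random

def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale; the guarantee requires 0 < epsilon < 1."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classical calibration needs 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def privatize(params, l2_sensitivity, epsilon, delta, seed=None):
    """Add iid Gaussian noise to every coordinate of a parameter vector."""
    rng = random.Random(seed)
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return [p + rng.gauss(0.0, sigma) for p in params]

# Hypothetical example: a 2D component mean whose L2 sensitivity is assumed to be 0.01.
noisy_mean = privatize([0.4, -1.2], l2_sensitivity=0.01, epsilon=0.5, delta=1e-5, seed=0)
print(gaussian_sigma(0.01, 0.5, 1e-5), noisy_mean)
```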

2506.01619 2026-03-24 math.ST stat.TH

A projector-rank partition theorem for exact degrees of freedom in experimental design

Nagananda K G

Comments 26 pages

Journal ref
Journal of Statistical Planning and Inference, 2026
Abstract

In many experimental designs -- split-plots, blocked or nested layouts, fractional factorials, and studies with missing or unequal replication -- standard ANOVA procedures no longer tell us exactly how many independent pieces of information each effect truly contributes. We provide a general degrees of freedom $(\mathrm{df})$ partition theorem that resolves this ambiguity. For $N$ observations, we show that the total information in the data (i.e., $N-1$ $\mathrm{df}$) can be split exactly across experimental effects and randomization strata by projecting the data onto each stratum and counting the $\mathrm{df}$ each effect contributes there. This yields integer $\mathrm{df}$ -- not approximations -- for any mix of fixed and random effects, blocking structures, fractionation, or imbalance. This result yields closed-form $\mathrm{df}$ tables for unbalanced split-plot, row-column, lattice, and crossed-nested designs. We introduce practical diagnostics -- the $\mathrm{df}$-retention ratio $ρ$, df deficiency $δ$, and variance-inflation index $α$ -- that measure exactly how many $\mathrm{df}$ an effect retains under blocking or fractionation and the resulting loss of precision, thereby extending Box-Hunter's resolution idea to multi-stratum and incomplete designs. Classical results emerge as corollaries: Cochran's one-stratum identity; Yates's split-plot $\mathrm{df}$; resolution-$R$ identified when an effect retains no $\mathrm{df}$. Empirical studies on split-plot and nested designs, a blocked fractional-factorial design-selection experiment, and timing benchmarks show that our approach delivers calibrated error rates, recovers information to raise power by up to 60% without additional runs, and is orders of magnitude faster than bootstrap-based $\mathrm{df}$ approximations.
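
The counting idea in its simplest, one-stratum case is that each effect's df equals the rank of an orthogonal projector, and for a projector rank equals trace. The sketch below uses a single treatment factor with unequal replication. The treatment projector (group-means projector minus grand-mean projector) and the residual projector partition the N - 1 df exactly. It does not reproduce the multi-stratum construction; the design is a hypothetical example.

```python
import numpy as np

def projector(x):
    """Orthogonal projector onto the column space of x; pinv tolerates rank deficiency."""
    return x @ np.linalg.pinv(x)

group_sizes = [2, 3, 2]  # one treatment factor with unequal replication
labels = np.repeat(np.arange(len(group_sizes)), group_sizes)
n = labels.size
ones = np.ones((n, 1))
indicators = (labels[:, None] == np.arange(len(group_sizes))[None, :]).astype(float)

p_mean = projector(ones)                     # grand-mean subspace
p_treat = projector(indicators) - p_mean     # treatment contrasts
p_resid = np.eye(n) - projector(indicators)  # within-group residual

# For an orthogonal projector, rank = trace, which avoids rank-tolerance issues.
df_treat = int(round(np.trace(p_treat)))
df_resid = int(round(np.trace(p_resid)))
print(df_treat, df_resid, df_treat + df_resid)  # 2 4 6, matching N - 1 = 6
```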

2505.19731 2026-03-24 stat.ML cs.LG

Proximal Point Nash Learning from Human Feedback

Daniil Tiapkin, Daniele Calandriello, Denis Belomestny, Eric Moulines, Alexey Naumov, Kashif Rasul, Michal Valko, Pierre Menard

Abstract

Traditional Reinforcement Learning from Human Feedback (RLHF) often relies on reward models, frequently assuming preference structures like the Bradley--Terry model, which may not accurately capture the complexities of real human preferences (e.g., intransitivity). Nash Learning from Human Feedback (NLHF) offers a more direct alternative by framing the problem as finding a Nash equilibrium of a game defined by these preferences. While many works study the Nash learning problem directly in the policy space, we instead consider it under a more realistic policy parametrization setting. We first analyze a simple self-play policy gradient method, which is equivalent to Online IPO. We establish high-probability last-iterate convergence guarantees for this method, but our analysis also reveals a possible stability limitation of the underlying dynamics. Motivated by this, we embed the self-play updates into a proximal point framework, yielding a stabilized algorithm. For this combined method, we prove high-probability last-iterate convergence and discuss its more practical version, which we call Nash Prox. Finally, we apply this method to post-training of large language models and validate its empirical performance.

2505.12617 2026-03-24 stat.ME stat.AP

Double machine learning to estimate the effects of multiple treatments and their interactions

Qingyan Xiang, Yubai Yuan, Dongyuan Song, Usman J. Wudil, Muktar H. Aliyu, C. William Wester, Bryan E. Shepherd

Abstract

The causal inference literature has focused extensively on binary treatments, with relatively few methods developed for multi-valued treatments. In particular, methods for multiple simultaneously assigned treatments remain understudied despite their practical importance. This paper considers two settings: (1) estimating the effects of multiple treatments of different types (binary, categorical, and continuous) and the effects of treatment interactions, and (2) estimating the average treatment effect across categories of multi-valued regimens. To obtain robust estimates for both settings, we propose a class of methods based on the Double Machine Learning (DML) framework. Our methods are well-suited to complex settings with multiple treatments or regimens, using machine learning to model confounding relationships while overcoming regularization and overfitting biases through Neyman orthogonality and cross-fitting. To our knowledge, this work is the first to apply machine learning for robust estimation of interaction effects in the presence of multiple treatments. We further establish the asymptotic distribution of our estimators and derive variance estimators for statistical inference. Extensive simulations demonstrate the performance of our methods. Finally, we apply the methods to study the effect of three treatments on HIV-associated kidney disease in an adult HIV cohort of 2455 participants in Nigeria.
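
Neyman orthogonality and cross-fitting can be illustrated with the textbook partially linear model Y = D theta + g(X) + eps, where D is a vector of treatments. The minimal sketch below adds an interaction as an extra treatment column. On each fold, Y and every treatment column are residualized on X using fits from the other folds, and the orthogonal moment is then solved by least squares. This is a simplified stand-in, not the paper's estimators. The quadratic least-squares nuisance learner is a placeholder for a machine-learning model, and the simulated data are hypothetical.

```python
import numpy as np

def fit_predict(x_train, y_train, x_test):
    """Placeholder nuisance learner: least squares on the features [1, x, x^2]."""
    def feats(v):
        return np.column_stack([np.ones(len(v)), v, v**2])
    coef, *_ = np.linalg.lstsq(feats(x_train), y_train, rcond=None)
    return feats(x_test) @ coef

def dml_plr(y, d, x, n_folds=2, seed=0):
    """Cross-fitted DML for Y = D @ theta + g(X) + eps, with D an (n, p) treatment matrix."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res = np.empty(n)
    d_res = np.empty(d.shape)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Residualize the outcome and each treatment column on X with out-of-fold fits.
        y_res[test] = y[test] - fit_predict(x[train], y[train], x[test])
        for j in range(d.shape[1]):
            d_res[test, j] = d[test, j] - fit_predict(x[train], d[train, j], x[test])
    # Solve the orthogonal moment sum_i d_res_i * (y_res_i - d_res_i @ theta) = 0.
    theta, *_ = np.linalg.lstsq(d_res, y_res, rcond=None)
    return theta

rng = np.random.default_rng(1)
n = 20_000
x = rng.standard_normal(n)
d1 = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)  # confounded binary treatment
d2 = 0.5 * x + rng.standard_normal(n)                           # confounded continuous treatment
d = np.column_stack([d1, d2, d1 * d2])                          # interaction as a third column
y = 1.0 * d1 + 0.5 * d2 + 0.3 * d1 * d2 + x**2 + rng.standard_normal(n)
theta_hat = dml_plr(y, d, x)
print(theta_hat)  # estimates of the simulated effects (1.0, 0.5, 0.3)
```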

2504.16780 2026-03-24 math.ST stat.ME stat.TH

Linear Regression Using Principal Components from General Hilbert-Space-Valued Covariates

Xinyi Li, Margaret Hoch, Michael R. Kosorok

Abstract

We introduce Adaptive Subspace PCA (AS-PCA), a framework for principal component analysis of random elements in a general separable Hilbert space. AS-PCA projects the covariance operator onto a data-adaptive finite-dimensional subspace prior to eigendecomposition, requiring no kernel specification and accommodating multi-dimensional functional objects including images and surfaces. Under the second-moment condition, we prove a Donsker theorem for Hilbert-space-valued empirical processes and use it to establish uniform consistency and joint Gaussian limits for the leading eigenpairs. A data-driven diagnostic verifies projection accuracy, and a consistent proportion-of-variance-explained rule selects the number of components. Building on AS-PCA, we construct Hilbert-Space Principal Component Regression (HS-PCR) for models combining Euclidean and Hilbert-space-valued covariates. The HS-PCR estimator is root-$n$ consistent and asymptotically normal, with an explicit influence function decomposition accounting for eigenfunction estimation uncertainty. Both nonparametric and wild bootstrap procedures are shown to be asymptotically valid. Simulations with two- and three-dimensional imaging predictors confirm accurate eigenstructure recovery and nominal bootstrap coverage. HS-PCR is applied to Alzheimer's Disease Neuroimaging Initiative data in regression and precision-medicine settings.

2504.03097 2026-03-24 stat.ML cs.LG math.PR math.ST stat.TH

A Computational Transition for Detecting Multivariate Shuffled Linear Regression by Low-Degree Polynomials

Zhangsong Li

Comments 27 pages; improved exposition

Journal ref
IEEE Transactions on Information Theory, 72(4):2444-2456 (April 2026)
Abstract

In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* + \sigma Z)$, where $X$ is an $n\times d$ standard Gaussian design matrix, $Z$ is an $n\times m$ Gaussian noise matrix, $\Pi_*$ is an unknown $n\times n$ permutation matrix, and $Q_*$ is an unknown $d\times m$ matrix on the Grassmannian manifold satisfying $Q_*^{\top} Q_* = \mathbb I_m$. We consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n\times d$ and $n\times m$, respectively. Our results reveal a phase transition in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish the two models, even when $\sigma=0$, provided that $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $\sigma=\omega(1)$, we show that all degree-$D$ polynomials fail to distinguish the two models provided that $D=o(\sigma)$. (3) When $m=d$ and $\sigma=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguishes the two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $\sigma$, and the computational complexity of the testing task.

2502.10010 2026-03-24 stat.ME

Principal Decomposition with Nested Submanifolds

Jiaji Su, Zhigang Yao

Comments 34 pages, 12 figures, 1 table

Abstract

Over the past decades, the increasing dimensionality of data has heightened the need for effective data decomposition methods. Existing approaches, however, often rely on linear models or lack sufficient interpretability or flexibility. To address this issue, we introduce a novel nonlinear decomposition technique called the principal nested submanifolds, which builds on the foundational concepts of principal component analysis. This method exploits the local geometric information of data sets by projecting samples onto a series of nested principal submanifolds with progressively decreasing dimensions. It effectively isolates complex information within the data in a backward stepwise manner by targeting variations associated with smaller eigenvalues in local covariance matrices. Unlike previous methods, the resulting subspaces are smooth manifolds, not merely linear spaces or special shape spaces. Validated through extensive simulation studies and applied to real-world RNA sequencing data, our approach surpasses existing models in delineating intricate nonlinear structures. It provides more flexible subspace constraints that improve the extraction of significant data components and facilitate noise reduction. This approach not only advances the non-Euclidean statistical analysis of data with low-dimensional intrinsic structure within Euclidean spaces, but also offers new perspectives for dealing with high-dimensional noisy data sets in fields such as bioinformatics and machine learning.

2501.02406 2026-03-24 stat.ML cs.AI cs.CL cs.IT cs.LG math.IT

A Training-free Method for LLM Text Attribution

Tara Radvand, Mojtaba Abdolmaleki, Mohamed Mostagir, Ambuj Tewari

Abstract

Verifying the provenance of content is crucial to the functioning of many organizations, e.g., educational institutions, social media platforms, and firms. This problem is becoming increasingly challenging as text generated by Large Language Models (LLMs) becomes almost indistinguishable from human-generated content. In addition, many institutions use in-house LLMs and want to ensure that external, non-sanctioned LLMs do not produce content within their institutions. In this paper, we answer the following question: Given a piece of text, can we identify whether it was produced by a particular LLM, while ensuring a guaranteed low false positive rate? We model LLM text as a sequential stochastic process with complete dependence on history. We then design zero-shot statistical tests to (i) distinguish between text generated by two different known sets of LLMs $A$ (non-sanctioned) and $B$ (in-house), and (ii) identify whether text was generated by a known LLM or by any unknown model. We prove that the Type I and Type II errors of our test decrease exponentially with the length of the text. We also extend our theory to black-box access via sampling and characterize the required sample size to obtain essentially the same Type I and Type II error upper bounds as in the white-box setting (i.e., with access to $A$). We show the tightness of our upper bounds by providing an information-theoretic lower bound. We next present numerical experiments to validate our theoretical results and assess their robustness in settings with adversarial post-editing. Our work has a host of practical applications in which determining the origin of a text is important and can also be useful for combating misinformation and ensuring compliance with emerging AI regulations. See https://github.com/TaraRadvand74/llm-text-detection for code, data, and an online demo of the project.
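The abstract's setup, which scores a text under known models and thresholds the result, can be illustrated with a generic log-likelihood-ratio statistic. The sketch below is not the authors' test. It assumes white-box access to next-token probabilities, replaces real LLMs with hypothetical toy bigram models (`MODEL_A`, `MODEL_B`), and uses an arbitrary threshold of 0 rather than one calibrated for a guaranteed false-positive rate. Because the statistic is a sum of per-token terms, it tends to separate the two hypotheses more clearly as the text gets longer.

```python
import math


def sequence_log_likelihood(bigram, tokens):
    """log p(x_1, ..., x_T) = sum_t log p(x_t | x_{t-1}) under a toy bigram model."""
    total, prev = 0.0, "<s>"
    for tok in tokens:
        total += math.log(bigram[prev][tok])
        prev = tok
    return total


def likelihood_ratio_test(tokens, model_a, model_b, threshold=0.0):
    """Attribute text to A if log pA(text) - log pB(text) exceeds the threshold."""
    stat = (sequence_log_likelihood(model_a, tokens)
            - sequence_log_likelihood(model_b, tokens))
    return ("A" if stat > threshold else "B"), stat


# Hypothetical two-token vocabulary; each context has the same next-token law.
MODEL_A = {ctx: {"x": 0.9, "y": 0.1} for ctx in ("<s>", "x", "y")}
MODEL_B = {ctx: {"x": 0.2, "y": 0.8} for ctx in ("<s>", "x", "y")}

if __name__ == "__main__":
    print(likelihood_ratio_test(["x", "x", "y", "x"], MODEL_A, MODEL_B))
```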

2412.20013 2026-03-24 stat.ME

Kendall's tau and Spearman's rho for normal location-scale and skew-normal scale mixture copulas

Ye Lu

Abstract

We derive explicit formulas for Kendall's tau and Spearman's rho for two broad classes of asymmetric copulas: normal location-scale mixture copulas and skew-normal scale mixture copulas. These classes encompass widely used specifications, including the normal scale mixture, skew-normal, and various skew-$t$ copulas, as special cases. The derived formulas establish functional mappings from copula parameters to rank correlation coefficients, and we investigate and compare how asymmetry parameters influence rank correlation properties and drive departures from the elliptically symmetric case within these two classes. A notable finding is that introducing asymmetry in normal location-scale mixture copulas restricts the attainable range of rank correlations from the full interval $[-1,1]$, which is attainable under elliptical symmetry, to a strict subset of $[-1,1]$. In contrast, the entire interval $[-1,1]$ remains attainable for skew-normal scale mixture copulas.
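The Gaussian copula with correlation $\rho$ is a reference point for the elliptically symmetric case mentioned above. It has the well-known closed forms $\tau = \tfrac{2}{\pi}\arcsin\rho$ and $\rho_S = \tfrac{6}{\pi}\arcsin(\rho/2)$, and both sweep the full interval $[-1,1]$ as $\rho$ ranges over $[-1,1]$. The short Python sketch below evaluates these baseline formulas and an empirical Kendall's tau (tau-a, which assumes no ties). It does not implement the paper's asymmetric-copula formulas.

```python
import math


def gaussian_copula_kendall_tau(rho):
    """Kendall's tau of a Gaussian copula with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)


def gaussian_copula_spearman_rho(rho):
    """Spearman's rho of a Gaussian copula with correlation rho."""
    return 6.0 / math.pi * math.asin(rho / 2.0)


def empirical_kendall_tau(x, y):
    """Tau-a: (concordant - discordant pairs) / (n choose 2); assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))


if __name__ == "__main__":
    for r in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(r, gaussian_copula_kendall_tau(r), gaussian_copula_spearman_rho(r))
```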

2412.07971 2026-03-24 cs.LG cs.DC stat.ML

Effectiveness of Distributed Gradient Descent with Local Steps for Overparameterized Models

Heng Zhu, Harsh Vardhan, Arya Mazumdar

Abstract

In distributed training of machine learning models, gradient descent with local iterative steps, commonly known as Local (Stochastic) Gradient Descent (Local-(S)GD) or Federated Averaging (FedAvg), is a very popular method for mitigating the communication burden. In this method, gradient steps based on local datasets are taken independently at distributed compute nodes to update the local models, which are then aggregated intermittently. In the interpolation regime, Local-GD can converge to zero training loss. However, with many potential solutions corresponding to zero training loss, it is not known which solution Local-GD converges to. In this work, we answer this question by analyzing the implicit bias of Local-GD for classification tasks with linearly separable data. For the interpolation regime, our analysis shows that the aggregated global model obtained from Local-GD, with an arbitrary number of local steps, converges exactly "in direction" to the model that would be obtained if all data were in one place (the centralized model). Our result gives the exact rate of convergence to the centralized model with respect to the number of local steps. We also obtain the same implicit bias, with a learning rate independent of the number of local steps, using a modified version of the Local-GD algorithm. Our analysis offers a new perspective on why Local-GD can still perform well with a very large number of local steps, even for heterogeneous data. Lastly, we discuss the extension of our results to Local-SGD and non-separable data.
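The Local-GD scheme described above runs independent local gradient steps and then averages the results. The Python sketch below applies it to logistic loss on linearly separable data. The names are hypothetical, and the step size, round count, and toy data are assumptions. In the symmetric two-client example in the `__main__` block, the averaged model ends up pointing along $(1,1)$, which is also the max-margin direction of the pooled data. This is a toy illustration, not a proof of the paper's result.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_grad(w, data):
    """Gradient of sum_i log(1 + exp(-y_i <w, x_i>)) over one client's data."""
    g = [0.0] * len(w)
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coef = -y * sigmoid(-margin)
        for k, xk in enumerate(x):
            g[k] += coef * xk
    return g


def local_gd(clients, rounds, local_steps, lr):
    """Each round, every client takes `local_steps` GD steps starting from the
    current global model; the server then averages the local models."""
    w = [0.0] * len(clients[0][0][0])
    for _ in range(rounds):
        local_models = []
        for data in clients:
            v = list(w)
            for _ in range(local_steps):
                g = logistic_grad(v, data)
                v = [vi - lr * gi for vi, gi in zip(v, g)]
            local_models.append(v)
        w = [sum(coords) / len(local_models) for coords in zip(*local_models)]
    return w


if __name__ == "__main__":
    clients = [
        [([2.0, 1.0], 1), ([-2.0, -1.0], -1)],  # client 1
        [([1.0, 2.0], 1), ([-1.0, -2.0], -1)],  # client 2
    ]
    print(local_gd(clients, rounds=20, local_steps=5, lr=0.1))
```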

2411.17841 2026-03-24 stat.ME stat.AP

Bayesian defective Marshall-Olkin Gompertz model: an integrated approach to identifying cure fraction

Dionisio Alves-Neto, Vera Lucia Tomazella, Adriano Suzuki, Danilo Alvares

Abstract

Regression models have a substantial impact on the interpretation of treatments, genetic characteristics, and other potential risk factors in survival analysis. In many applications, the description of censoring and the survival curve reveals the presence of a cure fraction in the data, which calls for alternative modeling. The most common approach to incorporating covariates in this setting is the cure rate model and its variations, although defective distributions have introduced a more parsimonious and integrated approach. A defective distribution has a density function that does not integrate to one once the domain of one of its parameters is changed, making it appropriate for survival curves with an evident plateau. In this work, we introduce a new Bayesian defective regression model for long-term survival outcomes using the Marshall-Olkin Gompertz distribution. Estimation is carried out under the Bayesian paradigm. We evaluate the asymptotic properties of our proposal under a vague prior scheme in Monte Carlo studies. We present a motivating real-world application using data from patients diagnosed with testicular cancer in São Paulo, Brazil, in which long-term survivors were identified. Cure scenarios, with uncertainty quantified via credible intervals, are provided to evaluate characteristics such as risk age, presence of treatment, and cancer stage.
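A quick numerical check shows why a defective Gompertz baseline yields a cure fraction. Assume the common parametrization $S_0(t) = \exp\{-(b/a)(e^{at}-1)\}$ with $b>0$. A negative shape $a<0$ then gives $\lim_{t\to\infty} S_0(t) = e^{b/a} \in (0,1)$. Applying the standard Marshall-Olkin transform $S(t) = \gamma S_0(t) / \{1-(1-\gamma)S_0(t)\}$ with $\gamma>0$ gives the cure fraction $p = \gamma e^{b/a}/\{1-(1-\gamma)e^{b/a}\}$. The paper's exact parametrization and its Bayesian regression layer may differ. This Python sketch only illustrates the plateau.

```python
import math


def gompertz_survival(t, a, b):
    """Gompertz survival S0(t) = exp(-(b/a) * (exp(a t) - 1)), with b > 0, a != 0."""
    if a == 0:
        raise ValueError("shape a must be non-zero")
    return math.exp(-(b / a) * math.expm1(a * t))


def mo_gompertz_survival(t, a, b, gamma):
    """Marshall-Olkin transform: gamma * S0 / (1 - (1 - gamma) * S0), gamma > 0."""
    s0 = gompertz_survival(t, a, b)
    return gamma * s0 / (1.0 - (1.0 - gamma) * s0)


def mo_gompertz_cure_fraction(a, b, gamma):
    """Limit of the survival function as t -> infinity (the cure fraction)."""
    if a >= 0:
        return 0.0                      # proper distribution: no plateau
    p0 = math.exp(b / a)                # defective case: S0(inf) = exp(b/a)
    return gamma * p0 / (1.0 - (1.0 - gamma) * p0)


if __name__ == "__main__":
    a, b, gamma = -0.5, 0.2, 2.0
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, mo_gompertz_survival(t, a, b, gamma))
    print("cure fraction:", mo_gompertz_cure_fraction(a, b, gamma))
```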

2410.18869 2026-03-24 math.PR math.AP math.OC math.ST q-fin.MF stat.TH

On the Mean-Field limit of diffusive games through the master equation: $L^{\infty}$ estimates and extreme value behavior

Erhan Bayraktar, Nikolaos Kolliopoulos

Comments 41 pages including references

Abstract

We consider an $N$-player game where the states of the players evolve with time as Stochastic Differential Equations (SDEs) with interaction only in the drift terms. Each player controls the drift of the SDE satisfied by her state process, aiming to minimize the expected value of a cost that depends on the paths of the player's state and the empirical measure of the states of all the players until a terminal time. When $N \to \infty$, previous works have established Central Limit Theorems and Large Deviation Principles for the state processes when the game is in Nash Equilibrium (the Nash states), by using the Master Equation to construct approximations of those processes that evolve with time as SDEs with classical Mean-Field interaction. Within this framework, we improve an existing $L^{1}$ estimate for the total error of approximating all the Nash states to an $L^{\infty}$ one, and we also establish the $N \to \infty$ asymptotic behavior of the upper order statistics of the Nash states. The latter initiates the development of an Extreme Value Theory for Stochastic Differential Games.

2410.11151 2026-03-24 stat.ME cs.IT math.IT stat.AP

Discovering the critical number of respondents to validate an item in a questionnaire: The Binomial Cut-level Content Validity proposal

Helder Gomes Costa, Eduardo Shimoda, José Fabiano da Serra Costa, Aldo Shimoya, Edilvando Pereira Eufrazio

Comments 17 pages, 1 figure

Journal ref
Quality & Quantity (2026)
Abstract

The question that drives this research is: "How many respondents are necessary to validate the items of a questionnaire as actually essential to the questionnaire's purpose?" Among the efforts on this subject, Lawshe (1975), Wilson et al. (2012), and Ayre and Scally (2014) approached this issue by proposing and refining the Content Validity Ratio (CVR), which seeks to identify items that are actually essential. Despite their contributions, these studies do not check whether an item validated as "essential" could also be validated as "not essential" by the same sample, which would be a paradox. Another issue is the assignment of a probability of 50% to an item being randomly checked by a respondent as essential, even though an evaluator has three options to choose from. Our proposal addresses these issues, making it possible to verify whether a paradoxical situation occurs and to be more precise in recommending whether an item should be retained or discarded from a questionnaire.
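For context, Lawshe's ratio is $\mathrm{CVR} = (n_e - N/2)/(N/2)$, where $n_e$ of the $N$ panelists rate an item "essential". The sketch below illustrates the binomial cut-level idea raised in the abstract. Under random answering, each of the three response options is treated as equally likely ($p = 1/3$ instead of $1/2$). The sketch then finds the smallest number of "essential" votes whose upper-tail probability is at most $\alpha$. It is an assumption-laden illustration with hypothetical function names, not the authors' exact procedure, and it does not implement their paradox check.

```python
from math import comb


def lawshe_cvr(n_essential, n_total):
    """Lawshe's content validity ratio (n_e - N/2) / (N/2)."""
    half = n_total / 2.0
    return (n_essential - half) / half


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def binomial_cut_level(n_total, p=1.0 / 3.0, alpha=0.05):
    """Smallest number of 'essential' votes k with P(X >= k) <= alpha under
    random answering with probability p, or None if even k = n_total fails."""
    for k in range(n_total + 1):
        if binomial_upper_tail(k, n_total, p) <= alpha:
            return k
    return None


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, binomial_cut_level(n, p=1 / 3), binomial_cut_level(n, p=0.5))
```

For $N = 5$ and $\alpha = 0.05$, the cut level is 4 of 5 votes under $p = 1/3$ but 5 of 5 under $p = 1/2$. The choice of null probability therefore changes the recommendation.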

2410.09027 2026-03-24 stat.ME cs.LG econ.EM stat.AP

Variance reduction combining pre-experiment and in-experiment data

Zhexiao Lin, Pablo Crespo

Comments Accepted to 5th Conference on Causal Learning and Reasoning (CLeaR), 2026

Abstract

Online controlled experiments (A/B testing) are fundamental to data-driven decision-making in many companies. Improving the sensitivity of these experiments under fixed sample size constraints requires reducing the variance of the average treatment effect (ATE) estimator. Existing variance reduction techniques such as CUPED and CUPAC use pre-experiment data, but their effectiveness depends on how predictive those data are for outcomes measured during the experiment. In-experiment data are often more strongly correlated with the outcome, but using arbitrary post-treatment variables can introduce bias. In this paper, we propose a general, robust, and scalable framework that combines both pre-experiment and in-experiment data to achieve variance reduction. Our framework is simple, interpretable, and computationally efficient, making it practical for real-world deployment. We develop the asymptotic theory of the proposed estimator and provide consistent variance estimators. Empirical results from multiple online experiments conducted at Etsy demonstrate substantial additional variance reduction over the current pipeline, even when incorporating only a few post-treatment covariates. These findings underscore the effectiveness of our framework in improving experimental sensitivity and accelerating data-driven decision-making.
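The pre-experiment baseline the abstract refers to, CUPED, adjusts each outcome with a covariate $x$. The adjusted outcome is $y_i - \hat\theta(x_i - \bar x)$ with $\hat\theta = \widehat{\mathrm{Cov}}(x,y)/\widehat{\mathrm{Var}}(x)$, which leaves the mean unchanged and cannot increase the sample variance. The Python sketch below shows only this classical single-covariate adjustment. It is not the paper's combined pre-/in-experiment estimator. In an actual A/B test, $\hat\theta$ is commonly estimated on pooled data and applied to both arms.

```python
import random
from statistics import fmean, pvariance


def cuped_adjust(y, x):
    """Return y_i - theta * (x_i - mean(x)) with theta = cov(x, y) / var(x)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    theta = sxy / sxx
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(1000)]        # pre-experiment metric
    y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]      # in-experiment outcome
    print(pvariance(y), pvariance(cuped_adjust(y, x)))
```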

2408.05106 2026-03-24 stat.ME

Restricted Spatial Regression is Reasonable Statistical Practice: Clarifications, Interpretations, and New Developments

Jonathan R. Bradley

Abstract

The spatial linear mixed model (SLMM) consists of fixed and spatial random effects that may be linearly dependent. Partially motivated as a means to address potential issues with confounding, the restricted spatial regression (RSR) model restricts the spatial random effects to lie in the orthogonal complement of the column space of the covariates. Recent articles have shown that the misspecified Bayesian RSR generally performs worse than the SLMM when the data are generated from the SLMM. However, we show that the misspecified Bayesian RSR model's marginal posterior distribution is equivalent, up to a reparameterization, to the SLMM's marginal posterior distribution under a certain prior assumption on the orthogonalized regression coefficients. This suggests that RSR models are not sub-optimal, since the subsequent Bayesian analysis can be interpreted as a type of SLMM Bayesian analysis. This equivalence relationship is developed further in the context of unmeasured confounders and nonlinearity, where we explore a semi-parametric property of the orthogonalized regression effects. Several results are provided to demonstrate new benefits of an RSR. In particular, we show that the RSR can produce clear computational advantages via a direct sampler from the posterior distribution for all hyperparameters, fixed effects, and random effects. Additionally, a transfer learning approach offers a new interpretation of the orthogonalized regression coefficients, which we show empirically can improve inference on dependent regression coefficients in the presence of spatial confounding. Simulations and an illustration using COVID-19 mortality data are provided.
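The restriction in RSR amounts to applying $P^\perp = I - X(X^\top X)^{-1}X^\top$ to the spatial random effects, so the restricted effects are orthogonal to every covariate column. The minimal Python sketch below computes $P^\perp\eta$ through an orthonormal basis of the column space rather than an explicit inverse. Columns that are linearly dependent on earlier ones add nothing to the span and are skipped. The function names are hypothetical.

```python
import math


def orthonormal_basis(columns):
    """Gram-Schmidt basis of span(columns); dependent columns are skipped."""
    basis = []
    for c in columns:
        v = list(c)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:
            basis.append([a / norm for a in v])
    return basis


def restrict_to_orthogonal_complement(eta, covariate_columns):
    """Return P_perp @ eta with P_perp = I - X (X^T X)^{-1} X^T, using
    X (X^T X)^{-1} X^T = sum_k q_k q_k^T over an orthonormal basis {q_k} of col(X)."""
    v = list(eta)
    for q in orthonormal_basis(covariate_columns):
        dot = sum(a * b for a, b in zip(v, q))
        v = [a - dot * b for a, b in zip(v, q)]
    return v


if __name__ == "__main__":
    intercept = [1.0, 1.0, 1.0, 1.0]
    trend = [0.0, 1.0, 2.0, 3.0]
    print(restrict_to_orthogonal_complement([1.0, -2.0, 0.5, 3.0], [intercept, trend]))
```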

2407.18707 2026-03-24 cs.LG stat.ML

Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

Steven Adams, Andrea Patanè, Morteza Lahijanian, Luca Laurenti

Abstract

Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case there are currently no methods available to approximate a trained neural network with a Gaussian model with bounds on the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, with a mixture of Gaussian processes with error bounds on the approximation error. In particular, we consider the Wasserstein distance to quantify the closeness between probabilistic models and, by relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and $ε>0$ our approach is able to return a mixture of Gaussian processes that is $ε$-close to the NN at a finite set of input points. Furthermore, we rely on the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.

2407.05543 2026-03-24 stat.ME

Functional Principal Component Analysis for Sparse Censored Data

Caitrin Murphy, Eric Laber, Rhonda Merwin, Brian Reich, Jake Koerner

Abstract

Functional principal component analysis (FPCA) is a key tool in the study of functional data, driving both exploratory analyses and feature construction for use in formal modeling and testing procedures. However, existing methods for FPCA do not apply when functional observations are truncated, e.g., the measurement instrument only supports recordings within a pre-specified interval, thereby truncating values outside of the range to the nearest boundary. A naive application of existing methods without correction for truncation induces bias. We extend the FPCA framework to accommodate truncated noisy functional data by first recovering smooth mean and covariance surface estimates that are representative of the latent process's mean and covariance functions. Unlike traditional sample covariance smoothing techniques, our procedure yields a positive semi-definite covariance surface, computed without the need to retroactively remove negative eigenvalues in the covariance operator decomposition. Additionally, we construct an FPC score predictor and demonstrate its use in the generalized functional linear model. Convergence rates for the proposed estimators are provided. In simulation experiments, the proposed method yields better predictive performance and lower bias than existing alternatives. We illustrate its practical value through an application to a study with truncated blood glucose measurements.

2407.02419 2026-03-24 quant-ph cs.LG stat.ML

Quantum Curriculum Learning

Quoc Hoan Tran, Yasuhiro Endo, Hirotaka Oshima

Comments Updated with schematic figures of quantum circuits and transparent explanation for Curriculum Learning

Journal ref
Phys. Rev. A 112, 032431 (2025)
Abstract

Quantum machine learning (QML) requires significant quantum resources to address practical real-world problems. When the underlying quantum information exhibits hierarchical structures in the data, limitations persist in training complexity and generalization. Research should prioritize both the efficient design of quantum architectures and the development of learning strategies to optimize resource usage. We propose a framework called quantum curriculum learning (Q-CurL) for quantum data, where the curriculum introduces simpler tasks or data to the learning model before progressing to more challenging ones. Q-CurL exhibits robustness to noise and data limitations, which is particularly relevant for current and near-term noisy intermediate-scale quantum devices. We achieve this through a curriculum design based on quantum data density ratios and a dynamic learning schedule that prioritizes the most informative quantum data. Empirical evidence shows that Q-CurL significantly enhances training convergence and generalization for unitary learning and improves the robustness of quantum phase recognition tasks. Q-CurL is effective in physical learning applications in physics and quantum chemistry.

2406.16849 2026-03-24 math.ST math.PR stat.TH

Computationally tractable nonparametric bootstrap of high-dimensional sample covariance matrices

Holger Dette, Angelika Rohde

Abstract

We introduce a new "$(m,mp/n)$ out of $(n,p)$" sampling-with-replacement bootstrap for eigenvalue statistics of high-dimensional sample covariance matrices based on $n$ independent $p$-dimensional random vectors. As it only uses $q=\lfloor mp/n\rfloor$ coordinates of the observations in a subsample of size $m \ll n$ from the original data, it is computationally tractable for large-scale data. In the high-dimensional scenario $p/n\rightarrow c\in (0,\infty)$, this fully nonparametric bootstrap is shown to consistently reproduce the empirical spectral measure if $m/n\rightarrow 0$. If $m^2/n\rightarrow 0$, it correctly approximates the distribution of linear spectral statistics. The crucial component is a suitably defined Representative Subpopulation Condition, which is shown to be verified in a large variety of situations. Our proofs are conducted under minimal moment requirements and incorporate delicate results on non-centered quadratic forms, combinatorial trace moment estimates, and a conditional bootstrap martingale CLT, which may be of independent interest.

2404.04709 2026-03-24 econ.GN q-fin.EC stat.AP

Two-Sided Flexibility in Platforms

Daniel Freund, Sébastien Martin, Jiayu Kamessi Zhao

Abstract

Flexibility is a cornerstone of operations management, crucial for hedging against stochasticity in product demands, service requirements, and resource allocation. In two-sided platforms, flexibility is also two-sided and can be viewed as the compatibility of agents on one side with agents on the other side. Platform actions often influence the flexibility on either the demand or the supply side. But how should flexibility be jointly allocated across different sides? Whereas the literature has traditionally focused on only one side at a time, our work initiates the study of two-sided flexibility in matching platforms. We propose an abstract matching model in random graphs and identify the flexibility allocation that optimizes the expected size of a maximum matching. Our findings reveal that flexibility allocation is a first-order issue: for a given flexibility budget, the resulting matching size can vary greatly depending on how the budget is allocated. Moreover, even in the simple and symmetric settings we study, the quest for the optimal allocation is complicated. In particular, easy and costly mistakes can be made if the flexibility decisions on the demand and supply sides are optimized independently (e.g., by two different teams in the company), rather than jointly. To guide the search for the optimal flexibility allocation, we uncover two effects, flexibility cannibalization and flexibility asymmetry, that govern when the optimal design places the flexibility budget only on one side or equally on both sides. In doing so, we identify the study of two-sided flexibility as a significant aspect of platform efficiency.
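The objective in the abstract is the expected size of a maximum matching in a random bipartite graph, which can be estimated by Monte Carlo. The Python sketch below uses Kuhn's augmenting-path algorithm. As a simplifying assumption, it uses an Erdős-Rényi-style graph in which each demand-supply pair is compatible independently with probability $p$. It does not model the paper's split of a flexibility budget between the two sides.

```python
import random


def max_matching(adj, n_right):
    """Maximum bipartite matching size via augmenting paths (Kuhn's algorithm).
    adj[u] lists the right-side vertices compatible with left vertex u."""
    match_right = [-1] * n_right          # match_right[v] = left partner of v, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or its current partner can be re-routed elsewhere
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    return sum(1 for u in range(len(adj)) if try_augment(u, [False] * n_right))


def random_bipartite(n_left, n_right, p, rng):
    """Each (demand, supply) pair is compatible independently with probability p."""
    return [[v for v in range(n_right) if rng.random() < p] for _ in range(n_left)]


def expected_matching_size(n, p, trials=200, seed=0):
    """Monte Carlo estimate of E[maximum matching size] on n x n random graphs."""
    rng = random.Random(seed)
    total = sum(max_matching(random_bipartite(n, n, p, rng), n) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2):
        print(p, expected_matching_size(50, p))
```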

2403.10889 2026-03-24 cs.LG stat.ML

List Sample Compression and Uniform Convergence

Steve Hanneke, Shay Moran, Tom Waknine

Abstract

List learning is a variant of supervised classification where the learner outputs multiple plausible labels for each instance rather than just one. We investigate classical principles related to generalization within the context of list learning. Our primary goal is to determine whether classical principles in the PAC setting retain their applicability in the domain of list PAC learning. We focus on uniform convergence (which is the basis of Empirical Risk Minimization) and on sample compression (which is a powerful manifestation of Occam's Razor). In classical PAC learning, both uniform convergence and sample compression satisfy a form of "completeness": whenever a class is learnable, it can also be learned by a learning rule that adheres to these principles. We ask whether the same completeness holds true in the list learning setting. We show that uniform convergence remains equivalent to learnability in the list PAC learning setting. In contrast, our findings reveal surprising results regarding sample compression: we prove that when the label space is $Y=\{0,1,2\}$, there are 2-list-learnable classes that cannot be compressed. This refutes the list version of the sample compression conjecture by Littlestone and Warmuth (1986). We prove an even stronger impossibility result, showing that there are $2$-list-learnable classes that cannot be compressed even when the reconstructed function can work with lists of arbitrarily large size. We prove a similar result for (1-list) PAC learnable classes when the label space is unbounded. This generalizes a recent result (arXiv:2308.06424).

2402.15127 2026-03-24 cs.LG cs.IT math.IT stat.ML

Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention

Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan

Comments 36 pages

Abstract

We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic innovation: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the stochastic instantaneous reward before observing it. When opting for abstention, the agent either suffers a fixed regret or gains a guaranteed reward. This added layer of complexity naturally prompts the key question: can we develop algorithms that are both computationally efficient and asymptotically and minimax optimal in this setting? We answer this question in the affirmative by designing and analyzing algorithms whose regrets meet their corresponding information-theoretic lower bounds. Our results offer valuable quantitative insights into the benefits of the abstention option, laying the groundwork for further exploration in other online decision-making problems with such an option. Extensive numerical experiments validate our theoretical results, demonstrating that our approach not only advances theory but also has the potential to deliver significant practical benefits.

2401.09346 2026-03-24 stat.ML cs.LG

High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

Wanrong Zhu, Zhipeng Lou, Ziyang Wei, Wei Biao Wu

Abstract

Uncertainty quantification for estimation through stochastic optimization solutions in an online setting has gained popularity recently. This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Specifically, we propose to use a small number of independent multi-runs to acquire distribution information and construct a t-based confidence interval. Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free. We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate and allowing for high confidence level inference. In particular, a new Gaussian approximation result is developed for the online estimators to characterize the coverage properties of our confidence intervals in terms of relative errors. Our method also allows parallel computing to be leveraged, using multiple cores to further accelerate calculations. It is easy to implement and can be integrated with existing stochastic algorithms without the need for complicated modifications.
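The multi-run idea can be sketched as follows: run $K$ independent copies of averaged SGD, then form a $t$-interval from the $K$ estimates with $K-1$ degrees of freedom. In this Python sketch the target is simply a population mean, and the step-size schedule $\eta_t = 0.5\,t^{-0.6}$ is an assumption. The critical value is passed in by hand because the standard library has no $t$ quantile ($t_{0.975,9} \approx 2.262$ for $K=10$).

```python
import math
import random
from statistics import fmean, stdev


def averaged_sgd_mean(samples, lr0=0.5, decay=0.6):
    """Polyak-averaged SGD on the loss (theta - x)^2 / 2, whose minimizer is E[X]."""
    theta, avg = 0.0, 0.0
    for t, x in enumerate(samples, start=1):
        theta -= lr0 * t ** (-decay) * (theta - x)   # stochastic gradient step
        avg += (theta - avg) / t                      # running average of iterates
    return avg


def multirun_t_interval(run_estimates, t_crit):
    """t-interval from K independent run estimates: mean +/- t_crit * s / sqrt(K)."""
    k = len(run_estimates)
    center = fmean(run_estimates)
    half_width = t_crit * stdev(run_estimates) / math.sqrt(k)
    return center - half_width, center + half_width


if __name__ == "__main__":
    rng = random.Random(0)
    # K = 10 independent runs, each on its own stream of 5000 draws from N(3, 1)
    runs = [averaged_sgd_mean([rng.gauss(3.0, 1.0) for _ in range(5000)])
            for _ in range(10)]
    print(multirun_t_interval(runs, t_crit=2.262))  # t_{0.975} with 9 d.f.
```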

2309.06053 2026-03-24 stat.ME math.ST stat.TH

Confounder selection via iterative graph expansion

F. Richard Guo, Qingyuan Zhao

Comments 31 pages; new notation and terminology; to appear in the Annals of Statistics

Journal ref
Ann. Statist. 54 (1) 516-541, 2026
Abstract

Confounder selection, namely choosing a set of covariates to control for confounding between a treatment and an outcome, is arguably the most important step in the design of an observational study. Previous methods, such as Pearl's back-door criterion, typically require pre-specifying a causal graph, which can often be difficult in practice. We propose an interactive procedure for confounder selection that does not require pre-specifying the graph or the set of observed variables. This procedure iteratively expands the causal graph by finding what we call "primary adjustment sets" for a pair of possibly confounded variables. This can be viewed as inverting a sequence of marginalizations of the underlying causal graph. Structural information in the form of primary adjustment sets is elicited from the user, bit by bit, until either a set of covariates is found to control for confounding or it can be determined that no such set exists. Other information, such as the causal relations between confounders, is not required by the procedure. We show that if the user correctly specifies the primary adjustment sets in every step, our procedure is both sound and complete.

2305.10413 2026-03-24 stat.ML cs.LG math.ST stat.AP stat.TH

On Consistency of Signature Using Lasso

Xin Guo, Binnan Wang, Ruixun Zhang, Chaoyi Zhao

Abstract

Signatures are iterated path integrals of continuous and discrete-time processes, and their universal nonlinearity linearizes the problem of feature selection in time series data analysis. This paper studies the consistency of signature using Lasso regression, both theoretically and numerically. We establish conditions under which the Lasso regression is consistent both asymptotically and in finite sample. Furthermore, we show that the Lasso regression is more consistent with the Itô signature for time series and processes that are closer to the Brownian motion and with weaker inter-dimensional correlations, while it is more consistent with the Stratonovich signature for mean-reverting time series and processes. We demonstrate that signature can be applied to learn nonlinear functions and option prices with high accuracy, and the performance depends on properties of the underlying process and the choice of the signature.
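For a discretely observed path $x_0,\dots,x_N$, the level-1 and level-2 signature terms can be computed directly. This Python sketch assumes that the Stratonovich signature corresponds to piecewise-linear interpolation and the Itô signature to left-point sums; the paper's exact definitions may differ. The two versions differ by half the discrete quadratic covariation, and the Stratonovich terms satisfy $S^{ij}+S^{ji} = S^iS^j$.

```python
def signature_level2(path, kind="stratonovich"):
    """Level-1 and level-2 signature terms of a discrete path (list of points).

    Itô: left-point sums  sum_k (x_{k-1} - x_0)^i * dx_k^j
    Stratonovich: the same plus 0.5 * dx_k^i * dx_k^j (piecewise-linear path).
    """
    if kind not in ("stratonovich", "ito"):
        raise ValueError("kind must be 'stratonovich' or 'ito'")
    d = len(path[0])
    level1 = [path[-1][i] - path[0][i] for i in range(d)]
    level2 = [[0.0] * d for _ in range(d)]
    for k in range(1, len(path)):
        prev = [path[k - 1][i] - path[0][i] for i in range(d)]
        inc = [path[k][i] - path[k - 1][i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                term = prev[i] * inc[j]
                if kind == "stratonovich":
                    term += 0.5 * inc[i] * inc[j]
                level2[i][j] += term
    return level1, level2


if __name__ == "__main__":
    path = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [3.0, 3.0]]
    print(signature_level2(path, "stratonovich"))
    print(signature_level2(path, "ito"))
```

In a Lasso setting, entries like these (and higher levels) would serve as the regression features.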

2305.03158 2026-03-24 stat.CO stat.ME

Quantile Importance Sampling

Jyotishka Datta, Nicholas G. Polson

Comments Fixed a few typos and errors, and added a real data example

Abstract

In Bayesian inference, the approximation of integrals of the form $\psi = \mathbb{E}_{F}[l(X)] = \int_{\chi} l(\mathbf{x})\, dF(\mathbf{x})$ is a fundamental challenge. Such integrals are crucial for evidence estimation, which is important for various purposes, including model selection and numerical analysis. The existing strategies for evidence estimation are classified into four categories: deterministic approximation, density estimation, importance sampling, and vertical representation (Llorente et al., 2020). In this paper, we show that the Riemann sum estimator due to Yakowitz (1978) can be used in the context of nested sampling (Skilling, 2006) to achieve an $O(n^{-4})$ rate of convergence, faster than the usual rate given by the ergodic central limit theorem. We provide a brief overview of the literature on Riemann sum estimators, the nested sampling algorithm, and its connections to vertical likelihood Monte Carlo. We provide theoretical and numerical arguments to show how merging these two ideas may result in improved and more robust estimators for evidence estimation, especially in higher-dimensional spaces. We also briefly discuss the idea of simulating the Lorenz curve, which avoids the problem of intractable $\Lambda$ functions that are essential for the vertical representation and nested sampling.
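The Riemann-sum idea can be illustrated for a one-dimensional integral $\int_0^1 f(u)\,du$. Sort $n$ uniform draws, add the endpoints 0 and 1, and apply the trapezoid rule over the resulting random grid. The Python sketch below is in the spirit of the Yakowitz-type estimator, not a reproduction of the paper's nested-sampling construction, and the trapezoid weighting is an assumption. Because the grid gaps shrink like $1/n$, the error for smooth $f$ is far smaller than the $O(n^{-1/2})$ error of a plain Monte Carlo average.

```python
import random


def riemann_sum_estimate(f, n, seed=0):
    """Trapezoid rule over sorted U(0,1) draws, with the endpoints 0 and 1 added."""
    rng = random.Random(seed)
    nodes = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
    return sum(0.5 * (f(a) + f(b)) * (b - a) for a, b in zip(nodes, nodes[1:]))


def plain_monte_carlo(f, n, seed=0):
    """Ordinary Monte Carlo average of f over n U(0,1) draws, for comparison."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n


def square(u):
    return u * u  # exact integral over [0, 1] is 1/3


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, riemann_sum_estimate(square, n) - 1 / 3,
              plain_monte_carlo(square, n) - 1 / 3)
```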

2206.10143 2026-03-24 stat.ML cs.LG math.ST stat.ME stat.TH

Noise-contrastive Online Change Point Detection

Nikita Puchkin, Artur Goldman, Konstantin Yakovlev, Valeriia Dzis, Uliana Vinogradova

Comments The preliminary version of this paper was presented at the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023, PMLR 206:5686-5713)

Abstract

We suggest a novel procedure for online change point detection. Our approach extends the idea of maximizing a discrepancy measure between points from the pre-change and post-change distributions. This leads to flexible algorithms suitable for both parametric and nonparametric scenarios. We prove non-asymptotic bounds on the average run length of the procedure and its expected detection delay. The efficiency of the algorithm is illustrated with numerical experiments on synthetic and real-world data sets.

2203.06573 2026-03-24 stat.ME

Homogeneity and Sub-homogeneity Pursuit: Iterative Complement Clustering PCA

Daning Bi, Le Chang, Yanrong Yang

Abstract

Principal component analysis (PCA), the most popular dimension-reduction technique, has been used to analyze high-dimensional data in many areas. It discovers the homogeneity within the data and creates a reduced feature space that captures as much information as possible from the original data. However, when the data have a group structure, PCA often fails to identify group-specific patterns, which this study terms sub-homogeneity. Missing group-specific information can result in an unsatisfactory representation of the data from a particular group. Capturing both homogeneity and sub-homogeneity is important in high-dimensional data analysis, but it poses a great challenge. In this study, we propose a novel iterative complement-clustering principal component analysis (CPCA) to iteratively estimate the homogeneity and sub-homogeneity. A clustering method based on principal component regression is also introduced to provide reliable information about clusters. Theoretically, this study shows that the proposed clustering approach can correctly identify cluster membership under certain conditions. A simulation study and a real analysis of stock return data confirm the superior performance of the proposed methods.

2112.00414 2026-03-24 stat.ME math.ST stat.TH

AR-sieve Bootstrap for High-dimensional Time Series

Daning Bi, Han Lin Shang, Yanrong Yang, Huanjun Zhu

Abstract

This paper proposes a new AR-sieve bootstrap approach for high-dimensional time series. Classical bootstrap methods face a two-fold challenge with high-dimensional time series: the curse of dimensionality and temporal dependence. To address these difficulties, we use factor modeling to reduce dimension and capture temporal dependence simultaneously. A factor-based bootstrap procedure is constructed: it performs an AR-sieve bootstrap on the extracted low-dimensional common factor time series and then recovers bootstrap samples of the original data from the factor model. Asymptotic properties are established for bootstrap mean statistics and extreme eigenvalues. Various simulation studies further demonstrate the advantages of the new AR-sieve bootstrap in high-dimensional scenarios. An empirical application to particulate matter (PM) concentration data is presented, in which bootstrap confidence intervals for mean vectors and autocovariance matrices are provided.

2603.20538 2026-03-24 cs.LG stat.ML

Understanding Behavior Cloning with Action Quantization

Haoqun Cao, Tengyang Xie

Abstract

Behavior cloning is a fundamental paradigm in machine learning, enabling policy learning from expert demonstrations across robotics, autonomous driving, and generative models. Autoregressive models such as transformers have proven remarkably effective, from large language models (LLMs) to vision-language-action systems (VLAs). However, applying autoregressive models to continuous control requires discretizing actions through quantization, a practice that is widely adopted yet poorly understood theoretically. This paper provides theoretical foundations for this practice. We analyze how quantization error propagates along the horizon and interacts with statistical sample complexity. We show that behavior cloning with quantized actions and log-loss achieves optimal sample complexity, matching existing lower bounds, and incurs only polynomial horizon dependence on quantization error, provided the dynamics are stable and the policy satisfies a probabilistic smoothness condition. We further characterize when different quantization schemes satisfy or violate these requirements, and propose a model-based augmentation that provably improves the error bound without requiring policy smoothness. Finally, we establish fundamental limits that jointly capture the effects of quantization error and statistical complexity.
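The per-step quantization error that the analysis propagates along the horizon comes from a mapping like the one sketched below. A continuous action is clipped to a range, assigned to one of `n_bins` equal-width bins (the token an autoregressive model would predict), and decoded back to the bin center. This uniform scheme is only one possible choice, and the helper names are hypothetical. It does show that the per-step error is at most half a bin width.

```python
def make_uniform_quantizer(low, high, n_bins):
    """Return (encode, decode, width) for uniform bins on [low, high]."""
    width = (high - low) / n_bins

    def encode(action):
        clipped = min(max(action, low), high)
        # The right endpoint `high` belongs to the last bin.
        return min(int((clipped - low) / width), n_bins - 1)

    def decode(token):
        return low + (token + 0.5) * width  # bin center

    return encode, decode, width


encode, decode, width = make_uniform_quantizer(-1.0, 1.0, 8)
grid = [i / 1000 - 1.0 for i in range(2001)]
worst = max(abs(decode(encode(a)) - a) for a in grid)
print(encode(0.3), decode(encode(0.3)), worst <= width / 2 + 1e-12)
```

Halving the bin width halves this per-step error, but it doubles the vocabulary the policy must learn over. That is one concrete face of the interaction between quantization error and sample complexity.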

2603.20526 2026-03-24 cs.LG cs.AI stat.ML

Does This Gradient Spark Joy?

Ian Osband

Abstract

Policy gradient computes a backward pass for every sample, even though the backward pass is expensive and most samples carry little learning value. The Delightful Policy Gradient (DG) provides a forward-pass signal of learning value: delight, the product of advantage and surprisal (negative log-probability). We introduce the Kondo gate, which compares delight against a compute price and pays for a backward pass only when the sample is worth it, thereby tracing a quality-cost Pareto frontier. In bandits, zero-price gating preserves useful gradient signal while removing perpendicular noise, and delight is a more reliable screening signal than additive combinations of value and surprise. On MNIST and transformer token reversal, the Kondo gate skips most backward passes while retaining nearly all of DG's learning quality, with gains that grow as problems get harder and backward passes become more expensive. Because the gate tolerates approximate delight, a cheap forward pass can screen samples before expensive backpropagation, suggesting a speculative-decoding-for-training paradigm.
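The gating rule described above can be sketched in a few lines. Each sample carries an advantage and the probability the policy assigned to the taken action. Delight is advantage times surprisal, and only samples whose delight exceeds the compute price would receive a backward pass. We assume a strict `delight > price` comparison; the paper's exact rule and any tie-breaking may differ. The helper names are hypothetical.

```python
import math


def delight(advantage, prob):
    """Delight: advantage times surprisal, where surprisal = -log(prob)."""
    return advantage * -math.log(prob)


def kondo_gate(samples, price):
    """Indices of (advantage, prob) samples whose delight exceeds `price`.

    Only these samples would be sent through the expensive backward pass;
    the rest are skipped after the cheap forward pass.
    """
    return [i for i, (adv, p) in enumerate(samples) if delight(adv, p) > price]


batch = [(1.0, 0.5), (1.0, 0.99), (-1.0, 0.01), (2.0, 0.1)]
print(kondo_gate(batch, price=0.5))  # -> [0, 3]
print(kondo_gate(batch, price=0.0))  # -> [0, 1, 3]
```

Raising `price` skips more backward passes. A surprising action with positive advantage (index 3) survives high prices, while an expected action (index 1) is dropped as soon as the price exceeds its small delight.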

2603.20520 2026-03-24 stat.ML cs.LG

CogFormer: Learn All Your Models Once

Jerry M. Huang, Lukas Schumacher, Niek Stevenson, Stefan T. Radev

Abstract

Simulation-based inference (SBI) with neural networks has accelerated and transformed cognitive modeling workflows. SBI enables modelers to fit complex models that were previously difficult or impossible to estimate, while also allowing rapid estimation across large numbers of datasets. However, the utility of SBI for iterating over varying modeling assumptions remains limited: changing parameterizations, generative functions, priors, and design variables all necessitate model retraining and hence diminish the benefits of amortization. To address these issues, we pilot a meta-amortized framework for cognitive modeling which we nickname the CogFormer. Our framework trains a transformer-based architecture that remains valid across a combinatorial number of structurally similar models, allowing for changing data types, parameters, design matrices, and sample sizes. We present promising quantitative results across families of decision-making models for binary, multi-alternative, and continuous responses. Our evaluation suggests that CogFormer can accurately estimate parameters across model families with a minimal amortization offset, making it a potentially powerful engine that catalyzes cognitive modeling workflows.

2603.20467 2026-03-24 stat.ME cs.LG math.DS

Goal-oriented learning of stochastic dynamical systems using error bounds on path-space observables

Joanna Zou, Han Cheng Lie, Youssef Marzouk

Abstract

The governing equations of stochastic dynamical systems often become cost-prohibitive for numerical simulation at large scales. Surrogate models of the governing equations, learned from data of the high-fidelity system, are routinely used to predict key observables with greater efficiency. However, standard choices of loss function for learning the surrogate model fail to provide error guarantees in path-dependent observables, such as reaction rates of molecular dynamical systems. This paper introduces an error bound for path-space observables and employs it as a novel variational loss for the goal-oriented learning of a stochastic dynamical system. We show the error bound holds for a broad class of observables, including mean first hitting times on unbounded time domains. We derive an analytical gradient of the goal-oriented loss function by leveraging the formula for Fréchet derivatives of expected path functionals, which remains tractable for implementation in stochastic gradient descent schemes. We demonstrate that surrogate models of overdamped Langevin systems developed via goal-oriented learning achieve improved accuracy in predicting the statistics of a first hitting time observable and robustness to distributional shift in the data.

2603.20464 2026-03-24 econ.EM stat.ME stat.ML

Double Machine Learning for Static Panel Data with Instrumental Variables: New Method and Applications

Anna Baiardi, Paul S. Clarke, Andrea A. Naghi, Annalivia Polselli

Abstract

Panel data methods are widely used in empirical analysis to address unobserved heterogeneity, but causal inference remains challenging when treatments are endogenous and confounding variables are high-dimensional and potentially nonlinear. Standard instrumental variables (IV) estimators, such as two-stage least squares (2SLS), become unreliable when instrument validity requires flexibly conditioning on many covariates with potentially nonlinear effects. This paper develops a Double Machine Learning estimator for static panel models with endogenous treatments (panel IV DML) and introduces weak-identification diagnostics for it. We revisit three influential migration studies that use shift-share instruments. In these settings, instrument validity depends on a rich covariate adjustment. In one application, panel IV DML strengthens the predictive power of the instrument and broadly confirms the 2SLS results. In the other cases, flexible adjustment makes the instruments weak, leading to substantially more cautious causal inference than conventional 2SLS. Monte Carlo evidence supports these findings, showing that panel IV DML improves estimation accuracy under strong instruments and delivers more reliable inference under weak identification.

2603.20394 2026-03-24 econ.EM stat.ME

When are time series predictions causal? The potential system and dynamic causal effects

Jacob Carlson, Neil Shephard

Abstract

The potential system is a nonparametric time series model for assessing the causal impact of moving an assignment at time $t$ on an outcome at future time $t+h$, accounting for the presence of features. The potential system provides nonparametric content for, e.g., time series experiments, time series regression, local projection, impulse response functions and SVARs. It closes a gap between time series causality and nonparametric cross-sectional causal methods, and provides a foundation for many new methods which have causal content.

2603.20392 2026-03-24 cs.LG cs.AI stat.ML

SymCircuit: Bayesian Structure Inference for Tractable Probabilistic Circuits via Entropy-Regularized Reinforcement Learning

Y. Sungtaek Ju

Comments 17 pages

Abstract

Probabilistic circuit (PC) structure learning is hampered by greedy algorithms that make irreversible, locally optimal decisions. We propose SymCircuit, which replaces greedy search with a learned generative policy trained via entropy-regularized reinforcement learning. Instantiating the RL-as-inference framework in the PC domain, we show that the optimal policy is a tempered Bayesian posterior, recovering the exact posterior when the regularization temperature is set inversely proportional to the dataset size. The policy is implemented as SymFormer, a grammar-constrained autoregressive Transformer with tree-relative self-attention that guarantees valid circuits at every generation step. We introduce option-level REINFORCE, which restricts gradient updates to structural decisions rather than all tokens, yielding an improvement in signal-to-noise ratio (SNR) and a more than 10-fold gain in sample efficiency on the NLTCS dataset. A three-layer uncertainty decomposition (structural via model averaging, parametric via the delta method, and leaf via conjugate Dirichlet-Categorical propagation) is grounded in the multilinear polynomial structure of PC outputs. On NLTCS, SymCircuit closes 93% of the gap to LearnSPN; preliminary results on Plants (69 variables) suggest scalability.

2603.20388 2026-03-24 math.ST cs.LG econ.EM stat.ML stat.TH

From Cross-Validation to SURE: Asymptotic Risk of Tuned Regularized Estimators

Karun Adusumilli, Maximilian Kasy, Ashia Wilson

Abstract

We derive the asymptotic risk function of regularized empirical risk minimization (ERM) estimators tuned by $n$-fold cross-validation (CV). The out-of-sample prediction loss of such estimators converges in distribution to the squared-error loss (risk function) of shrinkage estimators in the normal means model, tuned by Stein's unbiased risk estimate (SURE). This risk function provides a more fine-grained picture of predictive performance than uniform bounds on worst-case regret, which are common in learning theory: it quantifies how risk varies with the true parameter. As key intermediate steps, we show that (i) $n$-fold CV converges uniformly to SURE, and (ii) while SURE typically has multiple local minima, its global minimum is generically well separated. Well-separation ensures that uniform convergence of CV to SURE translates into convergence of the tuning parameter chosen by CV to that chosen by SURE.
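To make the limiting object concrete, here is a minimal sketch of a shrinkage estimator in the normal means model tuned by SURE. It assumes $x_i \sim N(\theta_i, \sigma^2)$ with known $\sigma^2$ and a ridge-type estimator $\hat\theta = x/(1+\lambda)$. Under these assumptions $\text{SURE}(\lambda) = \lVert x - \hat\theta\rVert^2 + 2\sigma^2 \sum_i \partial\hat\theta_i/\partial x_i - n\sigma^2$. The estimator family, grid, and helper names are illustrative choices, not the paper's setup.

```python
def sure_ridge(x, lam, sigma2=1.0):
    """Stein's unbiased risk estimate for theta_hat = x / (1 + lam),
    assuming x_i ~ N(theta_i, sigma2) independently."""
    n = len(x)
    s = 1.0 / (1.0 + lam)                              # shrinkage factor
    resid = (1.0 - s) ** 2 * sum(xi * xi for xi in x)  # ||x - theta_hat||^2
    divergence = n * s                                 # sum of d theta_hat_i / d x_i
    return resid + 2.0 * sigma2 * divergence - n * sigma2


def tune_by_sure(x, grid, sigma2=1.0):
    """Pick the penalty on `grid` with the smallest SURE (global grid minimum)."""
    return min(grid, key=lambda lam: sure_ridge(x, lam, sigma2))


x = [2.0] * 100
grid = [k / 1000 for k in range(1, 5001)]
print(tune_by_sure(x, grid))  # close to 1/3
```

For this family, minimizing SURE gives the shrinkage factor $1 - n\sigma^2/\lVert x\rVert^2$ (here $0.75$, i.e. $\lambda = 1/3$), a James-Stein-type rule. Searching a full grid, rather than running a local optimizer, mirrors the abstract's point that SURE can have several local minima while its global minimum is what matters.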

2603.20365 2026-03-24 stat.ML cs.AI cs.LG

Comprehensive Description of Uncertainty in Measurement for Representation and Propagation with Scalable Precision

Ali Darijani, Jürgen Beyerer, Zahra Sadat Hajseyed Nasrollah, Luisa Hoffmann, Michael Heizmann

Abstract

Probability theory has become the predominant framework for quantifying uncertainty across scientific and engineering disciplines, with a particular focus on measurement and control systems. However, the widespread reliance on simple Gaussian assumptions, particularly in control theory, manufacturing, and measurement systems, can result in incomplete representations and lossy multistage approximations of complex phenomena, including inaccurate propagation of uncertainty through multistage processes. This work proposes a comprehensive yet computationally tractable framework for representing and propagating quantitative attributes arising in measurement systems using probability density functions (PDFs). Recognizing the constraints that finite memory imposes on software systems, we advocate the use of Gaussian mixture models (GMMs), a principled extension of the familiar Gaussian framework. GMMs are universal approximators of PDFs whose complexity can be tuned to trade approximation accuracy against memory and computation. From both mathematical and computational perspectives, GMMs enable high performance and, in many cases, closed-form solutions of essential operations in control and measurement. The paper presents practical applications in manufacturing and measurement contexts, especially the circular factory, demonstrating how the GMM framework supports accurate representation and propagation of measurement uncertainty and offers improved accuracy compared to the traditional Gaussian framework while keeping computations tractable.
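As an illustration of the closed-form operations mentioned above, the following one-dimensional sketch propagates a GMM through an affine map and adds two independent GMMs. Both results are again GMMs, computed exactly from component parameters. The `Component` type and helper names are hypothetical, and real measurement chains would be multivariate. The sum also shows the memory trade-off: component counts multiply, so practical systems need a mixture-reduction step.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Component:
    weight: float
    mean: float
    var: float


def affine(gmm, a, b):
    """Mixture for a * X + b when X follows `gmm`: each component maps exactly."""
    return [Component(c.weight, a * c.mean + b, a * a * c.var) for c in gmm]


def add_independent(g1, g2):
    """Mixture for X + Y with X, Y independent mixtures: every pair of
    components convolves to one Gaussian, so component counts multiply."""
    return [
        Component(c1.weight * c2.weight, c1.mean + c2.mean, c1.var + c2.var)
        for c1, c2 in product(g1, g2)
    ]


def moments(gmm):
    """Overall mean and variance of a mixture."""
    mean = sum(c.weight * c.mean for c in gmm)
    second = sum(c.weight * (c.var + c.mean ** 2) for c in gmm)
    return mean, second - mean ** 2


x = [Component(0.3, -1.0, 0.5), Component(0.7, 2.0, 1.0)]
print(moments(x))                    # approximately (1.1, 2.74)
print(moments(affine(x, 2.0, 1.0)))  # approximately (3.2, 10.96)
print(len(add_independent(x, x)), moments(add_independent(x, x)))
```

Unlike a single-Gaussian approximation, the bimodal shape of `x` survives both operations unchanged. A moment-matched Gaussian would discard that shape at the first stage.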

2603.20349 2026-03-24 stat.ME stat.AP

Prediction intervals for overdispersed multinomial data with application to historical controls

Sören Budig, Frank Schaarschmidt, Max Menssen

Abstract

In pharmaceutical and toxicological research, historical control data are increasingly used to validate concurrent control groups, typically via the construction of historical control limits. While methods have been described for continuous and dichotomous endpoints, approaches for overdispersed multinomial data, common in developmental and reproductive toxicology or histopathology, are currently lacking. This article introduces and compares methods for constructing simultaneous prediction intervals for future multinomial observations subject to overdispersion. We investigate a range of frequentist approaches, including asymptotic approximations and bootstrap techniques (incorporating symmetric, asymmetric, and marginal calibration, as well as rank-based methods), alongside Bayesian hierarchical models. Extensive simulation studies assessing simultaneous coverage probability and the balance of lower and upper tail error probabilities show that standard asymptotic methods and simple Bonferroni adjustments yield liberal intervals, especially for small sample sizes or rare event categories. In contrast, bootstrap methods, specifically the Marginal Calibration and Rank-Based Simultaneous Confidence Sets, provide reliable error control and equal tail probabilities across diverse scenarios involving varying cluster sizes and degrees of overdispersion. These methods fill an important gap for multinomial endpoints and support the validation of concurrent controls using historical control data, in line with the recent European Food Safety Authority scientific opinion on the use and reporting of historical control data.

2603.20345 2026-03-24 q-bio.QM stat.AP

Towards Improved Short-term Hypoglycemia Prediction and Diabetes Management based on Refined Heart Rate Data

Vaibhav Gupta, Florian Grensing, Beyza Cinar, Louisa van den Boom, Maria Maleshkova

Comments 10 pages, 2 tables

Abstract

Hypoglycemia is a severe condition of decreased blood glucose, specifically below 70 mg/dL (3.9 mmol/L). It is often asymptomatic and difficult to predict in individuals with type 1 diabetes (T1D). Research on hypoglycemia prediction typically combines blood glucose readings and heart rate data to predict hypoglycemic events. Because these features are collected through wearable sensors, they can have missing values, which necessitates efficient imputation methods. This work contributes to the state of the art by introducing two novel techniques for imputing heart rate values over short-term horizons: Controlled Weighted Rational Bézier Curves (CRBC) and Controlled Piecewise Cubic Hermite Interpolating Polynomial with mapped peaks and valleys of Control Points (CMPV). In addition to these imputation methods, we employ two metrics to capture data patterns, alongside a combined metric that integrates the strengths of both individual metrics with RMSE scores, for a comprehensive evaluation of the imputation techniques. According to the combined metric, CMPV outperforms the alternatives with an average score of 0.33 across all time gaps, followed by CRBC with a score of 0.48. These findings demonstrate the effectiveness of the proposed methods in accurately filling in missing heart rate values. Moreover, this study facilitates the detection of abnormal physiological signals, enabling early preventive measures and more accurate diagnosis.

2603.20343 2026-03-24 stat.CO stat.AP

A practical introduction to ODE modelling in Stan for biological systems

Sara Hamis, John Forslund, Cici Chen Gu, Jodie A. Cochrane

Comments 23 pages, 10 figures

Abstract

Integrating dynamical systems models with time series data is a central part of contemporary mathematical biology. With the rich variety of available models and data, numerous methods and computational tools have been developed for these purposes. One such tool is Stan, a freely available and open-source probabilistic programming framework that provides efficient methods for estimating model parameters from data using computational Bayesian inference algorithms. Stan includes built-in mechanisms for working with ordinary differential equation (ODE) models, which are widely used in mathematical biology and related fields to study simulated, experimental, and real-world systems that change over time. Through step-by-step worked examples, including both pedagogical toy models and applications with real data, this article provides a practical, self-contained introduction to performing parameter estimation and model evaluation for first-order linear and nonlinear ODE models in Stan. The article also explains key statistical methods that underpin Stan and discusses computational Bayesian modelling in the context of biological applications.

2603.20318 2026-03-24 stat.ME

Beyond Pairwise: Nonparametric Kernel Estimators for a Generalized Weitzman Coefficient Across k Distributions

Omar Eidous, Noura Almasri

Comments 15 pages, 1 figure, 4 tables

Abstract

This paper presents a generalization of the Weitzman overlapping coefficient, originally defined for two probability density functions, to a setting involving $k$ independent distributions; the generalized coefficient is denoted by $\Delta$. To estimate it, we develop nonparametric methods based on kernel density estimation using $k$ independent random samples ($k \ge 2$). Because deriving $\Delta$ directly from kernel estimators is analytically complex, we propose a novel estimation strategy: $\Delta$ is reformulated as the expected value of a suitably defined function, this expectation is estimated via the method of moments, and the resulting expressions are combined with kernel density estimators to construct the proposed estimators. This approach yields multiple new estimators for the generalized Weitzman coefficient. Their performance is evaluated and compared through extensive Monte Carlo simulations. The results show that the proposed estimators are effective and practically applicable, providing flexible tools for measuring overlap among multiple distributions.
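For intuition about what is being estimated, the sketch below assumes the natural $k$-sample generalization $\Delta = \int \min(f_1, \dots, f_k)\, dx$, which reduces to Weitzman's coefficient when $k = 2$. It uses a simple plug-in estimator: fit a Gaussian kernel density estimate to each sample (Silverman's rule-of-thumb bandwidth) and integrate the pointwise minimum numerically. This plug-in approach is a swapped-in baseline, not the paper's moment-based estimators. The helper names and grid size are hypothetical.

```python
import math
import random
from statistics import stdev


def gaussian_kde(sample):
    """Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth."""
    n = len(sample)
    h = 1.06 * stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density, h


def overlap_k(samples, grid_size=400):
    """Plug-in estimate of Delta = integral of min(f_1, ..., f_k) over the line."""
    kdes = [gaussian_kde(s) for s in samples]
    pad = 4.0 * max(h for _, h in kdes)
    lo = min(min(s) for s in samples) - pad
    hi = max(max(s) for s in samples) + pad
    step = (hi - lo) / grid_size
    values = [min(f(lo + i * step) for f, _ in kdes) for i in range(grid_size + 1)]
    # Trapezoidal rule over the evaluation grid.
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


rng = random.Random(0)
same = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(3)]
apart = [[rng.gauss(mu, 1.0) for _ in range(200)] for mu in (0.0, 10.0, 20.0)]
print(round(overlap_k(same), 3), round(overlap_k(apart), 3))
```

Samples from a common distribution give an overlap near 1, reduced only by sampling noise in the density estimates. Well-separated groups give essentially 0.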

2603.20254 2026-03-24 cs.CY cs.AI stat.OT

AI Detectors Fail Diverse Student Populations: A Mathematical Framing of Structural Detection Limits

Nathan Garland

Abstract

Student experiences and empirical studies report that "black box" AI text detectors produce high false positive rates, with disproportionate errors against certain student populations, yet theoretical analyses typically model detection as a test between two known distributions for human and AI prose. This framing omits a structural feature of university assessment: an assessor generally does not know the individual student's writing distribution, which makes the null hypothesis composite. Applying the standard variational characterisation of total variation distance to this composite null yields trade-off bounds showing that any text-only, one-shot detector with useful power must produce false accusations at a rate governed by the distributional overlap between student writing and AI output. This constraint arises from population diversity; it is logically independent of AI model quality and cannot be overcome by better detector engineering or technology. A subgroup mixture bound connects these quantities to observable demographic groups, providing a theoretical basis for the disparate impact patterns documented empirically. We offer suggestions to improve policy and practice, and argue that detection scores should not serve as sole evidence in misconduct proceedings.
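The variational characterisation used above states that, for distributions $P$ (student writing) and $Q$ (AI output) on the same space, $\sup_A \,[Q(A) - P(A)] = \mathrm{TV}(P, Q)$. Here $A$ ranges over the sets of texts a detector flags. The Python sketch below checks this by brute force on a made-up four-outcome toy space. It then shows the consequence for a second, hypothetical student whose writing resembles the AI: any test's false positive rate on that student is at least its power minus their TV distance. The distributions are invented for illustration only.

```python
from itertools import combinations


def total_variation(p, q):
    """TV distance between two distributions on the same finite outcome set."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)


def best_power_minus_fpr(p, q):
    """Brute-force max over deterministic tests A (flag as AI if text in A)
    of power Q(A) minus false positive rate P(A)."""
    outcomes = list(p)
    best = 0.0
    for r in range(len(outcomes) + 1):
        for region in combinations(outcomes, r):
            best = max(best, sum(q[x] - p[x] for x in region))
    return best


ai = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
student_1 = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
student_2 = {"a": 0.15, "b": 0.2, "c": 0.3, "d": 0.35}  # writes much like the AI

print(best_power_minus_fpr(student_1, ai), total_variation(student_1, ai))

# The test that is optimal against student_1 flags {"c", "d"}.
region = ("c", "d")
power = sum(ai[x] for x in region)
fpr_2 = sum(student_2[x] for x in region)
# FPR on student_2 is at least power minus TV(student_2, ai).
print(power, fpr_2, power - total_variation(student_2, ai))
```

Because the assessor does not know which student distribution applies, a detector tuned to catch AI text with high power must accept a high false accusation rate on any student whose writing lies close to the AI distribution in TV.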

2603.20243 2026-03-24 q-fin.PR q-fin.MF stat.AP

Two-Factor Hull-White Model Revisited: Correlation Structure for Two-Factor Interest Rate Model in CVA Calculation

Osamu Tsuchiya

Abstract

The development of credit valuation adjustment (CVA) and, more broadly, valuation adjustments (XVA) [Green] has increased the importance of simple interest rate models such as the Hull-White model [Tan14] [Tsuchiya]. This is because the XVA model is an FX hybrid model and is tractable only when its interest rate component is a simple Gaussian model. For the XVA calculation of interest rate instruments, de-correlation of the yield curve can be important even for a swap portfolio. Capturing the correlation structure in the two-factor Hull-White model is an integral element of CVA (XVA) modeling. However, apart from the analysis in [AndersenPiterbarg], the correlation structure of the two-factor Hull-White model has not been studied in depth. In this study, that correlation structure is analyzed in detail. The correlation structure of co-initial swap rates is investigated using a combination of an approximation formula and Monte Carlo simulation. The two-factor Hull-White model captures the de-correlation of the yield curve only when its parameters (volatilities and mean-reversion strengths) satisfy certain relationships; under those conditions, XVA valuation with the two-factor Hull-White model is effective.
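The dependence of de-correlation on volatilities and mean reversions can be seen in a simpler quantity than the co-initial swap rates studied here. The sketch below uses instantaneous forward rates in a two-factor Gaussian short-rate model of the common G2++ form: factors $x$, $y$ with mean reversions $a$, $b$, volatilities $\sigma$, $\eta$, and factor correlation $\rho$. The forward rate $f(t, t+\tau)$ then loads $\sigma e^{-a\tau}$ on the first shock and $\eta e^{-b\tau}$ on the second. This parametrization and the parameter values are assumptions for illustration, not the paper's calibration.

```python
import math


def g2pp_forward_corr(tau1, tau2, a, b, sigma, eta, rho):
    """Instantaneous correlation of forward rates f(t, t + tau1) and f(t, t + tau2)
    in a two-factor Gaussian (G2++-type) model with factor vols sigma, eta,
    mean reversions a, b, and factor correlation rho."""
    def loadings(tau):
        return sigma * math.exp(-a * tau), eta * math.exp(-b * tau)

    (u1, w1), (u2, w2) = loadings(tau1), loadings(tau2)
    cov = u1 * u2 + w1 * w2 + rho * (u1 * w2 + w1 * u2)
    var1 = u1 * u1 + w1 * w1 + 2 * rho * u1 * w1
    var2 = u2 * u2 + w2 * w2 + 2 * rho * u2 * w2
    return cov / math.sqrt(var1 * var2)


# A slow and a fast factor with negative correlation de-correlate the curve.
print(round(g2pp_forward_corr(1.0, 10.0, a=0.05, b=1.0, sigma=0.01, eta=0.01, rho=-0.7), 3))
# With a single factor (eta = 0) all forward rates are perfectly correlated.
print(g2pp_forward_corr(1.0, 10.0, a=0.05, b=1.0, sigma=0.01, eta=0.0, rho=-0.7))
```

With `eta = 0`, or with `a == b` and `rho = 1`, the two factors collapse into one and the correlation is exactly 1. Meaningful de-correlation requires distinct mean reversions and a factor correlation away from 1, which is consistent with the parameter relationships the abstract describes.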

2603.20241 2026-03-24 cond-mat.mtrl-sci stat.AP

Probabilistic calibration of crystal plasticity material models with synthetic global and local data

Joshua D. Pribe, Patrick E. Leser, Saikumar R. Yeratapally, George Weber

Abstract

Crystal plasticity models connect macroscopic deformation with the physics of microscale slip in polycrystalline materials. These models can be calibrated using global stress-strain curves, but the resulting parametrization is often not unique: multiple parametrizations can predict the same global behavior but different local, grain-scale behavior. Using local data for calibration can mitigate uniqueness issues, but gathering such data typically requires expensive specialized experiments such as high-energy X-ray diffraction microscopy (HEDM). The computational expense of full-field simulations also often prevents uncertainty quantification with sampling-based calibration algorithms such as Markov chain Monte Carlo. This study presents a two-stage calibration procedure that combines global and local data and balances the efficiency of a surrogate model with the accuracy of full-field crystal plasticity simulations. The procedure quantifies uncertainty using Bayesian inference with an efficient, parallelized sequential Monte Carlo algorithm. Calibrations are completed using synthetic data with a microstructure representative of Inconel 718 to assess the uncertainty and accuracy of the parameters relative to a known ground truth. Global data come from the uniaxial stress-strain curve, while local data come from grain-average stresses, reflecting typical outputs of HEDM experiments. Additional calibrations with limited and noisy local data demonstrate the robustness of the procedure and identify the most important features of the data. Overall, the results demonstrate the computational efficiency of the two-stage procedure and the value of local data for reducing parameter uncertainty. In addition, joint distributions of the calibrated parameters highlight key considerations in choosing constitutive models and calibration data, including challenges resulting from correlated parameters.

2602.12683 2026-03-24 cs.LG stat.ML

Flow Matching from Viewpoint of Proximal Operators

Kenji Fukumizu, Wei Huang, Han Bao, Shuntuo Xu, Nisha Chandramoorthy

Comments 38 pages, 6 figures

Abstract

We reformulate Optimal Transport Conditional Flow Matching (OT-CFM), a class of dynamical generative models, showing that it admits an exact proximal formulation via an extended Brenier potential, without assuming that the target distribution has a density. In particular, the mapping to recover the target point is exactly given by a proximal operator, which yields an explicit proximal expression of the vector field. We also discuss the convergence of minibatch OT-CFM to the population formulation as the batch size increases. Finally, using second epi-derivatives of convex potentials, we prove that, for manifold-supported targets, OT-CFM is terminally normally hyperbolic: after time rescaling, the dynamics contracts exponentially in directions normal to the data manifold while remaining neutral along tangential directions.

2601.10878 2026-03-24 astro-ph.IM stat.AP

Optimal and Unbiased Fluxes from Up-the-Ramp Detectors under Variable Illumination

Bowen Li, Kevin A. McKinnon, Andrew K. Saydjari, Conor Sayres, Gwendolyn M. Eadie, Andrew R. Casey, Jon A. Holtzman, Timothy D. Brandt, Jose G. Fernandez-Trincado

Comments 22 pages, 20 figures

Abstract

Near-infrared (NIR) detectors, which use non-destructive readouts to measure time-series counts per pixel, play a crucial role in modern astrophysics. Standard NIR flux extraction techniques were developed for space-based observations and assume that source fluxes are constant over an observation. However, ground-based telescopes often see short-timescale atmospheric variations that can dramatically change the number of photons arriving at a pixel. This work presents a new statistical model that shares information between neighboring spectral pixels to characterize time-variable observations and extract unbiased fluxes with optimal uncertainties. We generate realistic synthetic data using a variety of flux and amplitude-of-time-variability conditions to confirm that our model recovers unbiased and optimal estimates of both the true flux and the time-variable signal. We find that the time-variable model should be favored over a constant-flux model when the observed count rates change by more than 3.5%. Ignoring time variability in the data can result in flux-dependent, unknown-sign biases that are as large as ~120% of the flux uncertainty. Using real APOGEE spectra, we find empirical evidence for approximately wavelength-independent, time-dependent variations in count rates with amplitudes much greater than the 3.5% threshold. Our model can robustly measure and remove the time dependence in real data, improving the quality of data-model comparison. We show several examples where the observed time dependence quantitatively agrees with independent measurements of observing conditions, such as variable cloud cover and seeing.
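To see why the constant-flux assumption matters, consider the simplest up-the-ramp estimate: an unweighted least-squares slope of cumulative counts against read time. This noise-free sketch is a simplification. Real pipelines use weighted fits that account for read and Poisson noise, and the paper's model is considerably richer. The helper names are hypothetical. For a constant rate the slope is exact, but when illumination varies, the slope is a weighted average of instantaneous rates that over-weights the middle of the ramp, so it differs from the true mean rate.

```python
def ramp_slope(times, counts):
    """Ordinary least-squares slope of cumulative counts vs. read time."""
    n = len(times)
    t_bar = sum(times) / n
    c_bar = sum(counts) / n
    num = sum((t - t_bar) * (c - c_bar) for t, c in zip(times, counts))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def cumulative_reads(rates):
    """Noise-free non-destructive reads: counts accumulate rate * dt with dt = 1."""
    counts = [0.0]
    for r in rates:
        counts.append(counts[-1] + r)
    return list(range(len(counts))), counts


# Constant illumination: the slope recovers the rate exactly.
print(ramp_slope(*cumulative_reads([5.0] * 20)))

# Variable illumination (brighter mid-exposure): the constant-flux slope
# over-weights the middle of the ramp and exceeds the mean rate of 6.0.
rates = [2.0] * 5 + [10.0] * 10 + [2.0] * 5
print(ramp_slope(*cumulative_reads(rates)), sum(rates) / len(rates))
```

The sign of the error depends on when the variation occurs during the exposure. A dip mid-ramp biases the slope low instead, which matches the unknown-sign biases the abstract reports.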

2512.09708 2026-03-24 math.ST stat.TH

A simple geometric proof for the characterisation of e-merging functions

Eugenio Clerico

Comments 4 pages

Abstract

E-values offer a powerful framework for aggregating evidence across different (possibly dependent) statistical experiments. A fundamental question is to identify e-merging functions, namely mappings that merge several e-values into a single valid e-value. A simple and elegant characterisation of this function class was recently obtained by Wang (2025), though via technically involved arguments. This note gives a short and intuitive geometric proof of the same characterisation, based on a supporting hyperplane argument applied to concave envelopes. We also show that the result holds even without imposing monotonicity in the definition of e-merging functions, which was needed for the existing proof. This shows that any non-monotone merging rule is automatically dominated by a monotone one, and hence extending the definition beyond the monotone case brings no additional generality.

2506.20789 2026-03-24 math.PR math.ST stat.TH

Central limit theory for Peaks-over-Threshold partial sums of long memory linear time series

Ioan Scheffel, Marco Oesting, Gilles Stupfler

Comments 61 pages, 4 figures, accepted for publication in Stochastic Processes and their Applications (2026)

Abstract

Over the last 30 years, extensive work has been devoted to developing central limit theory for partial sums of subordinated long memory linear time series. A much less studied problem, motivated by questions that are ubiquitous in extreme value theory, is the asymptotic behavior of such partial sums when the subordination mechanism has a threshold depending on sample size, so as to focus on the right tail of the time series. This article substantially extends longstanding asymptotic techniques by allowing the subordination mechanism to depend on the sample size in this way and to grow at a polynomial rate, while permitting the innovation process to have infinite variance. The cornerstone of our theoretical approach is a tailored reduction principle, which enables the use of classical results on partial sums of long memory linear processes. In this way we obtain asymptotic theory for certain Peaks-over-Threshold estimators with deterministic or random thresholds. Applications cover both heavy- and light-tailed regimes, yielding unexpected results which, to the best of our knowledge, are new to the literature. A simulation study illustrates the relevance of our findings in finite samples.

2505.06760 2026-03-24 stat.ME

Quantifying uncertainty and stability among highly correlated predictors: a subspace perspective

Xiaozhu Zhang, Jacob Bien, Armeen Taeb

Abstract

We study the problem of linear feature selection when features are highly correlated. Such settings pose two fundamental challenges. First, how should model similarity be defined? Simply counting features in common can be misleading: two models may share no features, yet highly correlated features can make the two models very similar in terms of predictive ability. Second, how can feature stability be assessed across runs of a variable selection method? High correlation can yield very different feature sets, so counting how often a feature is selected may label most features as unstable, and selecting stable features would result in models that are too small with poor predictive performance. In essence, these issues arise because existing notions of similarity and stability are "discrete" in nature. To overcome these challenges, we propose a novel framework based on feature subspaces -- the subspaces spanned by selected columns of the feature matrix. This new perspective leads to "continuous" measures of similarity and stability, as well as false positive error, all of which are defined in terms of "closeness" of feature subspaces. Our measures naturally account for feature correlation and reduce to existing discrete notions when features are uncorrelated. To obtain stable models, we propose and theoretically analyze a subspace-based generalization of stability selection (Meinshausen & Bühlmann 2010, Taeb et al. 2020), which combines a discrete model search with a continuous subspace-based assessment of stability. On synthetic and real gene expression data, our method improves on existing stability-based approaches by (i) producing multiple stable models that capture feature interchangeability, and (ii) generating larger models with better predictive performance. Our method is implemented in the R package substab.
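One standard way to quantify the "closeness" of feature subspaces is through principal angles. The cosines of the principal angles between $\mathrm{span}(X_A)$ and $\mathrm{span}(X_B)$ are the singular values of $Q_A^\top Q_B$, where $Q_A$ and $Q_B$ are orthonormal bases. The NumPy sketch below assumes this construction; the paper's similarity and stability measures may be defined differently. It shows two models with no feature in common that nevertheless span nearly the same subspace. The helper name and simulated data are hypothetical.

```python
import numpy as np


def subspace_cosines(X, cols_a, cols_b):
    """Cosines of the principal angles between span(X[:, cols_a]) and span(X[:, cols_b])."""
    qa, _ = np.linalg.qr(X[:, cols_a])  # orthonormal basis for subspace A
    qb, _ = np.linalg.qr(X[:, cols_b])  # orthonormal basis for subspace B
    # Singular values of qa^T qb are the cosines of the principal angles.
    return np.linalg.svd(qa.T @ qb, compute_uv=False)


rng = np.random.default_rng(0)
z = rng.standard_normal(500)
X = np.column_stack([
    z,                                    # feature 0
    z + 0.05 * rng.standard_normal(500),  # feature 1: near-copy of feature 0
    rng.standard_normal(500),             # feature 2: unrelated
])
print(subspace_cosines(X, [0], [1]))        # close to 1: no shared feature, similar model
print(subspace_cosines(X, [0], [2]))        # close to 0
print(subspace_cosines(X, [0, 2], [1, 2]))  # both cosines close to 1
```

Counting shared features would call the models {0} and {1} completely different. The subspace view instead treats them as nearly interchangeable, which is the continuous notion of similarity the abstract argues for.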

2501.16562 2026-03-24 cs.LG stat.ME

C-HDNet: Hyperdimensional Computing for Causal Effect Estimation from Observational Data Under Network Interference

Abhishek Dalvi, Neil Ashtekar, Vasant Honavar

Comments Published at Social Network Analysis and Mining

Abstract

We address the problem of estimating causal effects from observational data in the presence of network confounding, a setting where both treatment assignment and observed outcomes of individuals may be influenced by their neighbors within a network structure, resulting in network interference. Traditional causal inference methods often fail to account for these dependencies, leading to biased estimates. To tackle this challenge, we introduce a novel matching-based approach that utilizes principles from hyperdimensional computing to effectively encode and incorporate structural network information. This enables more accurate identification of comparable individuals, thereby improving the reliability of causal effect estimates. Through extensive empirical evaluation on multiple benchmark datasets, we demonstrate that our method either outperforms or performs on par with existing state-of-the-art approaches, including several recent deep learning-based models that are significantly more computationally intensive. In addition to its strong empirical performance, our method offers substantial practical advantages, achieving nearly an order-of-magnitude reduction in runtime without compromising accuracy, making it particularly well-suited for large-scale or time-sensitive applications.
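The abstract does not spell out the encoding, so the following is only an assumption-laden illustration of matching in hypervector space, not C-HDNet itself. Covariates are mapped to bipolar hypervectors by random projection and sign. Each neighbor is tagged with its treatment status and the neighbors are bundled. Own and neighborhood codes are bound to hypothetical role vectors and summed. Finally, each treated unit is matched to the control with the highest cosine similarity to estimate the average treatment effect on the treated (ATT). All function names, the dimension, and the role-vector scheme are assumptions.

```python
import math
import random


def random_bipolar(dim, rng):
    return [rng.choice((-1, 1)) for _ in range(dim)]


def sign(v):
    # Bipolarize; exact zeros (ties) are broken toward +1.
    return [1 if a >= 0 else -1 for a in v]


def bind(a, b):
    # Elementwise product: the result is nearly orthogonal to both inputs.
    return [x * y for x, y in zip(a, b)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def hdc_matching_att(X, treat, y, neighbors, dim=2000, seed=0):
    """Hypothetical HDC encoding plus 1-nearest-neighbor matching; returns an ATT estimate."""
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in X[0]] for _ in range(dim)]
    role_self, role_nbr, role_treated, role_control = (random_bipolar(dim, rng) for _ in range(4))
    # Covariates -> hypervectors by random projection + sign (similar x give similar codes).
    base = [sign([sum(w * v for w, v in zip(row, x)) for row in proj]) for x in X]
    codes = []
    for i, nbrs in enumerate(neighbors):
        # Tag each neighbor with its treatment status, then bundle (sum + sign), so the
        # code reflects both the neighbors' covariates and their treatment exposure.
        acc = [0] * dim
        for j in nbrs:
            tagged = bind(role_treated if treat[j] else role_control, base[j])
            acc = [s + t for s, t in zip(acc, tagged)]
        nbr_code = sign(acc) if nbrs else [0] * dim
        codes.append([s + t for s, t in zip(bind(role_self, base[i]), bind(role_nbr, nbr_code))])
    controls = [j for j, t in enumerate(treat) if not t]
    effects = []
    for i, t in enumerate(treat):
        if t:
            # Match each treated unit to the most similar control in hypervector space.
            match = max(controls, key=lambda j: cosine(codes[i], codes[j]))
            effects.append(y[i] - y[match])
    return sum(effects) / len(effects)


# Toy data: four (treated, control) twin pairs with identical covariates and neighbor lists,
# and a constant treatment effect of 3.
X = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [-1.0, 0.5]]
treat = [1, 0, 1, 0, 1, 0, 1, 0]
y = [4.0, 1.0, 5.0, 2.0, 3.5, 0.5, 2.0, -1.0]
neighbors = [[2, 5], [2, 5], [0, 7], [0, 7], [1, 3], [1, 3], [], []]
att = hdc_matching_att(X, treat, y, neighbors)
print(att)
```

Binding the own and neighborhood codes to separate role vectors keeps the two components nearly orthogonal, so the cosine similarity of two units roughly adds their covariate similarity and their neighborhood similarity.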

2403.19157 2026-03-24 math.PR math-ph math.MP math.ST stat.TH

Correlation functions between singular values and eigenvalues

Matthias Allard, Mario Kieburg

Comments 42 pages, 1 figure. Updated version: Peer reviewed version

Journal ref
Random Matrices: Theory and Applications 15, 2550024 (2026)
Abstract

Exploiting the explicit bijection between the density of singular values and the density of eigenvalues for bi-unitarily invariant complex random matrix ensembles of finite matrix size, we aim to find the induced probability measure on $j$ eigenvalues and $k$ singular values, which we call the $j,k$-point correlation measure. We find an expression for the $1,k$-point correlation measure that simplifies drastically when the singular values are assumed to follow a polynomial ensemble, yielding a closed formula in terms of the kernel of the determinantal point process of the singular value statistics. These expressions simplify even further when the singular values are drawn from a Pólya ensemble, and they extend known results relating the eigenvalue and singular value statistics of the corresponding bi-unitarily invariant ensemble.
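As a purely numerical illustration (not the paper's analytic result), the sketch below samples the $2\times 2$ complex Ginibre ensemble, which is bi-unitarily invariant, and records one eigenvalue chosen uniformly at random together with both singular values. These samples are an empirical stand-in for the $1,2$-point correlation measure at $N = 2$. The two spectra are tied by deterministic constraints: $|\lambda_1\lambda_2| = |\det M| = \sigma_1\sigma_2$ and Weyl's bounds $\sigma_2 \le |\lambda| \le \sigma_1$. The function names are hypothetical.

```python
import cmath
import math
import random


def ginibre_2x2(rng):
    """2x2 complex Ginibre matrix: i.i.d. standard complex Gaussian entries."""
    def entry():
        return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)
    return [[entry(), entry()], [entry(), entry()]]


def eigenvalues_2x2(m):
    (a, b), (c, d) = m
    half_tr, det = (a + d) / 2, a * d - b * c
    disc = cmath.sqrt(half_tr * half_tr - det)
    return half_tr + disc, half_tr - disc


def singular_values_2x2(m):
    """Square roots of the eigenvalues of the Hermitian matrix M^* M, largest first."""
    (a, b), (c, d) = m
    p = abs(a) ** 2 + abs(c) ** 2               # (M^* M)_11
    q = abs(b) ** 2 + abs(d) ** 2               # (M^* M)_22
    r = a.conjugate() * b + c.conjugate() * d   # (M^* M)_12
    mid = (p + q) / 2
    half_gap = math.sqrt(((p - q) / 2) ** 2 + abs(r) ** 2)
    return math.sqrt(mid + half_gap), math.sqrt(max(mid - half_gap, 0.0))


def sample_1_2_point(n_samples, seed=0):
    """Samples (lambda, eigenvalues, singular values): one eigenvalue chosen uniformly."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        m = ginibre_2x2(rng)
        eigs = eigenvalues_2x2(m)
        samples.append((rng.choice(eigs), eigs, singular_values_2x2(m)))
    return samples


samples = sample_1_2_point(2000)
# A simple statistic of the empirical 1,1-point measure: E[|lambda| / sigma_1], which lies in (0, 1].
ratio = sum(abs(lam) / svs[0] for lam, _, svs in samples) / len(samples)
print(ratio)
```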

2312.03257 2026-03-24 stat.ME stat.AP

Bayesian Functional Analysis for Untargeted Metabolomics Data with Matching Uncertainty and Small Sample Sizes

Guoxuan Ma, Jian Kang, Tianwei Yu

Journal ref
Briefings in Bioinformatics, Volume 25, Issue 3, May 2024, bbae141
Abstract

Untargeted metabolomics based on liquid chromatography-mass spectrometry is quickly gaining widespread application given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identities for the data features measured from samples. Multiple potential matchings exist between data features and known metabolites, whereas the true matching must be one-to-one. Some existing methods attempt to reduce the matching uncertainty, but they are far from able to remove it for most features. This uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) that integrates previously separate tasks, including matching uncertainty inference, metabolite selection, and functional analysis, into a single framework. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels for feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and in assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.
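BAUM's full model (knowledge graph, metabolite selection, functional analysis) is not described here in enough detail to reproduce. As a hedged toy illustration of the matching-uncertainty component only, the sketch below enumerates every one-to-one feature-to-metabolite assignment, weights each assignment by the product of hypothetical prior confidence scores, and reports marginal match probabilities. Enumeration is exponential in the number of features, so a realistic method would sample assignments (for example by MCMC) rather than list them. The function name and weights are assumptions.

```python
def matching_marginals(candidates, weight):
    """
    Toy posterior over one-to-one feature-to-metabolite assignments.

    candidates: dict mapping each feature to its candidate metabolites.
    weight: dict mapping (feature, metabolite) to a positive prior confidence.
    Each full assignment gets probability proportional to the product of its weights;
    returns the marginal probability of every (feature, metabolite) pair that occurs.
    """
    features = list(candidates)
    mass = {}
    total = 0.0

    def extend(k, used, w, chosen):
        nonlocal total
        if k == len(features):
            total += w
            for pair in chosen:
                mass[pair] = mass.get(pair, 0.0) + w
            return
        f = features[k]
        for m in candidates[f]:
            if m not in used:  # one-to-one: a metabolite can explain at most one feature
                extend(k + 1, used | {m}, w * weight[(f, m)], chosen + [(f, m)])

    extend(0, frozenset(), 1.0, [])
    if total == 0.0:
        raise ValueError("no one-to-one assignment is consistent with the candidate sets")
    return {pair: w / total for pair, w in mass.items()}


# F1 could be A or B, but F2 can only be A, so the constraint forces F1 -> B.
forced = matching_marginals({"F1": ["A", "B"], "F2": ["A"]},
                            {("F1", "A"): 1.0, ("F1", "B"): 1.0, ("F2", "A"): 1.0})
# Unequal prior confidence: assignments (F1->A, F2->B) and (F1->B, F2->A) weigh 3 and 1.
weighted = matching_marginals({"F1": ["A", "B"], "F2": ["A", "B"]},
                              {("F1", "A"): 3.0, ("F1", "B"): 1.0,
                               ("F2", "A"): 1.0, ("F2", "B"): 1.0})
print(forced)
print(weighted)
```

The first call shows how the one-to-one constraint alone can resolve an ambiguous feature. The second shows how partial prior knowledge of feature identities shifts the marginal probabilities without fixing them.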

2311.05649 2026-03-24 stat.AP stat.ME

Bayesian Image-on-Image Regression via Deep Kernel Learning based Gaussian Processes

Guoxuan Ma, Bangyao Zhao, Hasan Abu-Amara, Jian Kang

Journal ref
Annals of Applied Statistics 2026, Vol. 20, No. 1, 536-559
Abstract

In neuroimaging studies, it is increasingly important to study associations between different imaging modalities using image-on-image regression (IIR), which faces challenges in interpretation, statistical inference, and prediction. Our motivating problem is how to predict task-evoked fMRI activity using resting-state fMRI data in the Human Connectome Project (HCP). The main difficulty lies in effectively combining different types of imaging predictors with varying resolutions and spatial domains in IIR. To address these issues, we propose Bayesian Image-on-image Regression via Deep Kernel Learning Gaussian Processes (BIRD-GP) and develop efficient posterior computation methods through Stein variational gradient descent. Using simulations, we demonstrate the advantages of BIRD-GP over state-of-the-art IIR methods. For the HCP data analysis using BIRD-GP, we combine voxel-wise fALFF maps and region-wise connectivity matrices to predict fMRI contrast maps for language and social recognition tasks. We show that fALFF is less predictive than the connectivity matrix for both tasks, but combining both yields improved results. Angular Gyrus Right emerges as the most predictable region for the language task (75.9% predictable voxels), while Superior Parietal Gyrus Right ranks highest for the social recognition task (48.9% predictable voxels). Additionally, we identify features from the resting-state fMRI data that are important for task fMRI prediction.
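BIRD-GP computes its posterior with Stein variational gradient descent (SVGD). The sketch below shows only the generic SVGD update on a one-dimensional Gaussian target, using an RBF kernel and a median-heuristic bandwidth (both assumptions). It is not BIRD-GP's deep-kernel posterior. Each particle moves along $\phi(x_i) = \frac{1}{n}\sum_j \big[k(x_j,x_i)\,\nabla \log p(x_j) + \nabla_{x_j} k(x_j,x_i)\big]$. The first term pulls particles toward high density, and the second pushes them apart so they do not collapse onto the mode.

```python
import math
import random
import statistics


def svgd_1d(grad_log_p, particles, step=0.1, n_iter=500):
    """Stein variational gradient descent in 1-D with an RBF kernel k(x, y) = exp(-(x - y)^2 / h)."""
    x = list(particles)
    n = len(x)
    for _ in range(n_iter):
        # Median heuristic: h = median squared pairwise distance / log(n + 1).
        h = statistics.median([(a - b) ** 2 for a in x for b in x]) / math.log(n + 1) + 1e-12
        grads = [grad_log_p(a) for a in x]
        phi = []
        for xi in x:
            total = 0.0
            for xj, gj in zip(x, grads):
                k = math.exp(-((xj - xi) ** 2) / h)
                # k * grad log p pulls x_i toward high density; the kernel gradient
                # d/dx_j k = -2 (x_j - x_i) / h * k pushes x_i away from nearby particles.
                total += k * gj - 2.0 * (xj - xi) / h * k
            phi.append(total / n)
        x = [xi + step * p for xi, p in zip(x, phi)]
    return x


# Target N(3, 1), so grad log p(x) = -(x - 3); all particles start near -2.
rng = random.Random(1)
init = [rng.gauss(-2.0, 0.5) for _ in range(40)]
final = svgd_1d(lambda v: -(v - 3.0), init)
print(statistics.mean(final), statistics.pstdev(final))
```

Only gradients of the log density are needed, so the normalizing constant of the posterior never has to be computed, which is what makes SVGD attractive for posteriors like BIRD-GP's.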