arXivDaily: arXiv daily academic digest, updated Monday through Friday
2604.15288 2026-04-17 math.ST stat.TH

Generalization of Pearl's Front-Door Criterion

Carol Wu, Elina Robeva

Abstract

Pearl's front-door criterion provides a set of sufficient conditions for estimating the total causal effect from observational data in the presence of latent confounding, using the functional P(y | do(x := x*)) = \sum_z P(z | x*) \sum_x P(y | x, z) P(x). An open question is whether these conditions can be generalized to be both necessary and sufficient for the validity of this functional, similar to the generalization achieved for the back-door adjustment criterion by Shpitser. In this paper, we present a new, weakened set of graph-based conditions sufficient for the front-door formula to estimate the total causal effect, expanding the scope of problems amenable to front-door identification.
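
As a minimal sketch of how the front-door functional quoted in this abstract can be evaluated, the Python below applies it to a discrete joint distribution over (x, z, y). The function name `front_door`, the dictionary layout, and the toy chain X -> Z -> Y are illustrative assumptions, not taken from the paper. The inner sum runs over a separate variable `x_prime` so that it is not confused with the intervention value x*.

```python
def front_door(joint, x_star, y):
    """Evaluate sum_z P(z | x*) * sum_x' P(y | x', z) P(x') on a discrete joint.

    `joint` maps (x, z, y) tuples to probabilities (hypothetical layout).
    """
    xs = sorted({k[0] for k in joint})
    zs = sorted({k[1] for k in joint})

    def p(x=None, z=None, yv=None):
        # Marginal probability of the specified coordinates.
        return sum(pr for (a, b, c), pr in joint.items()
                   if (x is None or a == x)
                   and (z is None or b == z)
                   and (yv is None or c == yv))

    total = 0.0
    for z in zs:
        p_z_given_xstar = p(x=x_star, z=z) / p(x=x_star)
        inner = 0.0
        for x_prime in xs:
            p_xz = p(x=x_prime, z=z)
            if p_xz > 0:
                inner += p(x=x_prime, z=z, yv=y) / p_xz * p(x=x_prime)
        total += p_z_given_xstar * inner
    return total


# Toy chain X -> Z -> Y without confounding (assumed numbers).
p_x = {0: 0.5, 1: 0.5}
p_z1_given_x = {0: 0.2, 1: 0.8}
p_y1_given_z = {0: 0.1, 1: 0.9}

joint = {}
for x in (0, 1):
    for z in (0, 1):
        for y in (0, 1):
            pz = p_z1_given_x[x] if z == 1 else 1 - p_z1_given_x[x]
            py = p_y1_given_z[z] if y == 1 else 1 - p_y1_given_z[z]
            joint[(x, z, y)] = p_x[x] * pz * py

effect = front_door(joint, x_star=1, y=1)
```

In this toy chain there is no latent confounder, so the functional should agree with the plain conditional P(y = 1 | x = 1) = 0.8 * 0.9 + 0.2 * 0.1 = 0.74.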

2604.15285 2026-04-17 stat.ML cs.LG math.ST stat.TH

Structural interpretability in SVMs with truncated orthogonal polynomial kernels

Víctor Soto-Larrosa, Nuria Torrado, Edmundo J. Huertas

Abstract

We study post-training interpretability for Support Vector Machines (SVMs) built from truncated orthogonal polynomial kernels. Since the associated reproducing kernel Hilbert space is finite-dimensional and admits an explicit tensor-product orthonormal basis, the fitted decision function can be expanded exactly in intrinsic RKHS coordinates. This leads to Orthogonal Representation Contribution Analysis (ORCA), a diagnostic framework based on normalized Orthogonal Kernel Contribution (OKC) indices. These indices quantify how the squared RKHS norm of the classifier is distributed across interaction orders, total polynomial degrees, marginal coordinate effects, and pairwise contributions. The methodology is fully post-training and requires neither surrogate models nor retraining. We illustrate its diagnostic value on a synthetic double-spiral problem and on a real five-dimensional echocardiogram dataset. The results show that the proposed indices reveal structural aspects of model complexity that are not captured by predictive accuracy alone.

2604.15269 2026-04-17 quant-ph cs.LG math.ST stat.TH

Cloning is as Hard as Learning for Stabilizer States

Nikhil Bansal, Matthias C. Caro, Gaurav Mahajan

Comments 10 + 33 + 8 pages

Abstract

The impossibility of simultaneously cloning non-orthogonal states lies at the foundations of quantum theory. Even when allowing for approximation errors, cloning an arbitrary unknown pure state requires as many initial copies as needed to fully learn the state. Rather than arbitrary unknown states, modern quantum learning theory often considers structured classes of states and exploits such structure to develop learning algorithms that outperform general-state tomography. This raises the question: How do the sample complexities of learning and cloning relate for such structured classes? We answer this question for an important class of states. Namely, for $n$-qubit stabilizer states, we show that the optimal sample complexity of cloning is $\Theta(n)$. Thus, also for this structured class of states, cloning is as hard as learning. To prove these results, we use representation-theoretic tools in the recently proposed Abelian State Hidden Subgroup framework and a new structured version of the recently introduced random purification channel to relate stabilizer state cloning to a variant of the sample amplification problem for probability distributions that was recently introduced in classical learning theory. This allows us to obtain our cloning lower bounds by proving new sample amplification lower bounds for classes of distributions with an underlying linear structure. Our results provide a more fine-grained perspective on No-Cloning theorems, opening up connections from foundations to quantum learning theory and quantum cryptography.

2604.15230 2026-04-17 stat.AP

On the robustness of Mann-Kendall tests used to forecast critical transitions

Tristan Gamot, Nils Thibeau-Sutre, Tom J. M. Van Dooren

Comments 26 pages including appendices, 10 figures, 2 tables

Abstract

Non-parametric approaches to test for trends in time series make use of the Mann-Kendall statistic. Based on asymptotic arguments, these tests assume that its distribution follows a Gaussian distribution, even for autocorrelated time series. Recent results on the lack of validity of this assumption urge a robustness analysis of these approaches. While the issue is relevant across a wide range of applications, we illustrate it here in the context of detecting early warning signals (EWS) of critical transitions, which are used across a variety of research domains, and where commonly applied methods generate autocorrelation. We present a broad analysis, covering all types of critical transitions commonly investigated in EWS studies. We compare empirical distributions of the Mann-Kendall statistic computed from classical EWS indicators preceding critical transitions to the theoretical distributions hypothesized by Mann-Kendall tests. We detect mismatches leading to inflated type I error rates, which would routinely lead to announcing a critical transition while it is not occurring. In contrast to a recent recommendation, we conclude that the use of Mann-Kendall tests for trend detection in the context of forecasting critical transitions should be avoided. We point out several alternative methods available instead.
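
For readers unfamiliar with the test under scrutiny, here is a minimal sketch of the classical Mann-Kendall statistic with its Gaussian approximation. It assumes independent observations without ties, which is exactly the assumption the abstract reports as failing for autocorrelated indicators. The function name `mann_kendall` is illustrative, and none of this code comes from the paper.

```python
import math


def mann_kendall(series):
    """Return (S, z, two-sided p-value) for the classical Mann-Kendall trend test.

    Uses the no-ties variance n(n-1)(2n+5)/18 and a continuity correction.
    The Gaussian approximation is only justified for independent data.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference

    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value


s, z, p_value = mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0])
```

When the input is an autocorrelated indicator, such as a rolling-window variance, the true spread of S is typically larger than `var_s`, which is one way the inflated type I error rates described above can arise.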

2604.15229 2026-04-17 math.ST stat.ME stat.TH

On a Probability Inequality for Order Statistics with Applications to Bootstrap, Conformal Prediction, and more

Manit Paul, Arun Kumar Kuchibhotla

Comments 65 pages, 10 figures

Abstract

"Behind every limit theorem, there is an inequality" said Kolmogorov. We say "for every inequality, there is an approximate inequality under approximate regularity conditions." Suppose $X, X'$ are independent and identically distributed random variables. Then $X \le X'$ with a probability of at least $1/2$, irrespective of the underlying (common) distribution. One can ask what happens to the probability if $X, X'$ are independent but not identically distributed. It should be approximately $1/2$ if the distributions are approximately equal. Similarly, what if the random variables are dependent? It should, again, be approximately $1/2$ if the random variables are approximately independent. We explore an extension of this probability inequality involving order statistics and develop approximate versions of such an inequality under violations of independence and identical distribution assumptions. We further show that this inequality can be used as a basis to prove asymptotic validity of bootstrap/subsampling, finite-sample validity of conformal prediction, permutation tests, and asymptotic validity of rank tests without group invariance. Specifically, in the context of resampling inference, our results can be seen as a finite-sample instantiation of some results by Peter Hall and yield an alternative "cheap bootstrap" that applies to high-dimensional data.
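
The starting inequality can be checked exactly on a small discrete example. The sketch below computes P(X <= X') for independent X and X' with probability mass functions on a common finite support, first with identical distributions and then with a slightly perturbed second distribution. The helper name `prob_le` and the two toy distributions are assumptions made for illustration.

```python
from fractions import Fraction


def prob_le(p, q):
    """P(X <= X') for independent X ~ p and X' ~ q (dicts: value -> probability)."""
    return sum(pa * qb
               for a, pa in p.items()
               for b, qb in q.items()
               if a <= b)


# Toy distributions on {0, 1, 2}; q is a small perturbation of p.
p = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
q = {0: Fraction(2, 5), 1: Fraction(2, 5), 2: Fraction(1, 5)}

iid_case = prob_le(p, p)      # identical distributions
non_iid_case = prob_le(p, q)  # independent, not identically distributed
```

In the identically distributed case the value equals 1/2 plus half the probability of a tie, so it is at least 1/2 and equals 1/2 exactly for continuous distributions. The perturbed case moves away from that value by an amount that depends on how far q is from p.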

2604.15217 2026-04-17 stat.ME

A Bayesian Approach to Unit-level Dependent Multi-type Survey Data

Zewei Kong, Paul A. Parker, Jonathan R. Bradley, Scott H. Holan

Comments 28 pages, 2 figures. Submitted to Journal of Survey Statistics and Methodology

Abstract

The American Community Survey (ACS) Public Use Microdata Sample (PUMS) provides access to a wide range of unit-level survey data consisting of correlated Gaussian and binomial distributed survey responses along with associated survey weights. As such, we propose a Bayesian hierarchical framework for jointly modeling unit-level Gaussian and binomial survey data. The model introduces a shared area-level random effect to capture dependence across responses. Informative sampling is addressed using a pseudo-likelihood construction, and Polya-Gamma data augmentation provides an efficient conjugate Gibbs sampler, enabling scalable inference for large survey datasets. Through empirical simulations based on ACS PUMS data, we show that the joint model achieves notable reductions in mean squared error and improved interval scores compared to univariate and design-based estimators. Applying the method to the 2023 Illinois PUMS data, we find that the joint model yields small-area estimates similar to those from the univariate model and the Horvitz-Thompson estimator, but with smaller posterior variances. The computational cost associated with the joint model is also comparable to that of the univariate binomial model. Combined with the empirical simulation results, these findings demonstrate the practical advantages of the proposed approach.

2604.15114 2026-04-17 stat.ML cs.AI cs.LG

Amortized Optimal Transport from Sliced Potentials

Minh-Phuc Truong, Khai Nguyen

Comments 26 pages, 11 figures, 10 tables

Abstract

We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich potentials derived from sliced OT. We introduce two amortization strategies: regression-based amortization (RA-OT) and objective-based amortization (OA-OT). In RA-OT, we formulate a functional regression model that treats Kantorovich potentials from the original OT problem as responses and those obtained from sliced OT as predictors, and estimate these models via least-squares methods. In OA-OT, we estimate the parameters of the functional model by optimizing the Kantorovich dual objective. In both approaches, the predicted OT plan is subsequently recovered from the estimated potentials. As amortized OT methods, both RA-OT and OA-OT enable efficient solutions to repeated OT problems across different measure pairs by reusing information learned from prior instances to rapidly approximate new solutions. Moreover, by exploiting the structure provided by sliced OT, the proposed models are more parsimonious, independent of specific structures of the measures, such as the number of atoms in the discrete case, while achieving high accuracy. We demonstrate the effectiveness of our approaches on tasks including MNIST digit transport, color transfer, supply-demand transportation on spherical data, and mini-batch OT conditional flow matching.

2604.15107 2026-04-17 stat.ML cs.LG

MinShap: A Modified Shapley Value Approach for Feature Selection

Chenghui Zheng, Garvesh Raskutti

Abstract

Feature selection is a classical problem in statistics and machine learning, and it remains extremely challenging, especially in the context of unknown non-linear relationships with dependent features. On the other hand, Shapley values are a classic solution concept from cooperative game theory that is widely used for feature attribution in general non-linear models with highly dependent features. However, Shapley values are not naturally suited for feature selection since they tend to capture both direct effects from each feature to the response and indirect effects through other features. In this paper, we combine the advantages of Shapley values and adapt them to feature selection by proposing MinShap, a modification of the Shapley value framework, along with a suite of other related algorithms. In particular, instead of taking the average marginal contribution over permutations of features, MinShap considers the minimum marginal contribution across permutations. We provide a theoretical foundation motivated by the faithfulness assumption in DAGs (directed acyclic graphical models), a guarantee for the Type I error of MinShap, and show through numerical simulations and real data experiments that MinShap tends to outperform state-of-the-art feature selection algorithms such as LOCO, GCM and Lasso in terms of both accuracy and stability. We also introduce a suite of algorithms related to MinShap by using the multiple testing/p-value perspective that improves performance in lower-sample settings and provide supporting theoretical guarantees.
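
The contrast between averaging and taking the minimum of marginal contributions can be sketched on a toy cooperative game. The code below is only an illustration of that one sentence of the abstract: the value function `toy_value`, the feature names, and the helper functions are hypothetical, and the paper's actual estimator and testing procedure are not reproduced here.

```python
from itertools import permutations


def marginal_contributions(value, features, j):
    """Marginal contribution of feature j in every ordering of the features."""
    contributions = []
    for perm in permutations(features):
        before = frozenset(perm[:perm.index(j)])
        contributions.append(value(before | {j}) - value(before))
    return contributions


def shapley(value, features, j):
    """Average marginal contribution over all permutations."""
    c = marginal_contributions(value, features, j)
    return sum(c) / len(c)


def min_marginal(value, features, j):
    """Minimum marginal contribution over all permutations."""
    return min(marginal_contributions(value, features, j))


def toy_value(subset):
    # Assumed toy game: "a" drives the response directly, "b" is only a
    # correlated proxy for "a", and "c" is irrelevant.
    if "a" in subset:
        return 1.0
    if "b" in subset:
        return 0.6
    return 0.0


features = ("a", "b", "c")
scores = {j: (shapley(toy_value, features, j),
              min_marginal(toy_value, features, j))
          for j in features}
```

In this toy game the proxy feature "b" receives a positive average contribution but a minimum contribution of zero, because it adds nothing once "a" is already present. The directly relevant feature "a" keeps a positive minimum.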

2604.15106 2026-04-17 stat.ME

Cellwise Robust Twoblock Dimension Reduction

Sven Serneels

Abstract

Cellwise Robust Twoblock (CRTB) is introduced, the first cellwise robust method for simultaneous dimension reduction of multivariate predictor and response blocks, in both a dense and a sparse variable-selecting variant. Classical robust methods protect against casewise outliers by downweighting or removing entire observations, a strategy that becomes inefficient, and eventually breaks down, when contamination is scattered across individual cells rather than concentrated in whole rows. CRTB combines a column-wise pre-filter for cellwise outlier detection with model-based imputation of flagged cells inside an iteratively reweighted M-estimation loop, retaining the clean cells of partially contaminated rows instead of discarding the observation. An efficient algorithm is provided that uses the classical twoblock SVD as a warm start and converges in a handful of IRLS iterations at a moderate computational cost. The method resists settings where more than $50\%$ of rows contain contaminated cells while retaining comparable efficiency on clean data. A simulation study confirms these properties and shows that CRTB additionally recovers the underlying cellwise outlier pattern with high fidelity and, in the sparse setting, the correct set of informative variables. Two compelling examples illustrate CRTB's practical utility. In each of these, CRTB is shown to be conducive to results that are highly interpretable in the respective domains in the presence of cellwise outliers. As a by-product, the corresponding cells are identified with high fidelity.

2604.15104 2026-04-17 stat.ME

On the Conservativeness of Robust Variance Estimators in Propensity Score Weighted Cox Models

Hiroya Morita, Shunichiro Orihara, Fumitaka Shimizu, Masataka Taguri

Comments 19 pages, 4 tables

Abstract

In propensity score weighted analysis, robust variance that does not account for weight estimation is commonly used. In propensity score weighted Cox models (CoxPSW), the robust variance is known to be conservative when weights for the average treatment effect (ATE) are used, but it remains unclear whether this conservativeness also holds for other weighting schemes. This study evaluated the performance of the robust variance in CoxPSW when weights other than ATE are applied. We conducted an asymptotic comparison between the robust variance and a variance estimator that accounts for weight estimation under non-ATE weights. Their performance was further evaluated through simulation studies and real data analysis. The analytical results, simulations, and real data analysis indicated that the robust variance is not necessarily conservative in CoxPSW when weights other than ATE are used. These findings suggest that variance estimators that account for weight estimation should be used when applying non-ATE weights in CoxPSW.

2604.15070 2026-04-17 stat.ME

Adaptive Multi-Prior Lasso for High-Dimensional Generalized Linear Models

Fuzhi Xu, Weijuan Liang, Shuangge Ma, Qingzhao Zhang

Comments 23 pages, 3 figures, 2 tables

Abstract

Incorporation of external information into high-dimensional modeling for gene expression data has been shown, both theoretically and empirically, to substantially enhance performance. Such external information, sometimes referred to as prior information or priors, has become increasingly accessible from multiple sources, yet its reliability may vary considerably. Existing approaches often integrate these priors without sufficiently accounting for their quality, which may result in unsatisfactory or even misleading results. To effectively and selectively exploit such priors, we propose adaptive Multi-Prior Lasso, a novel regularization approach that simultaneously identifies reliable prior sources and integrates them to improve model performance. For high-dimensional generalized linear models (GLMs), an adaptive data-driven weight is assigned to each prior, so that more reliable sources are emphasized while less credible ones are downweighted. Theoretical guarantees are established, and the proposed method is shown through extensive simulations to improve estimation, prediction, and variable selection. An application to TCGA breast cancer gene expression data further illustrates the practical value of the proposed method, showing that incorporating prior information from PubMed published studies improves model performance.

2604.15067 2026-04-17 stat.AP stat.ME

Capturing Aleatoric Uncertainty in Climate Models

Cornelia Gruber, Henri Funk, Magdalena Mittermeier, Helmut Küchenhoff, Göran Kauermann

Abstract

Internal climate variability arises from the climate system's inherently chaotic dynamics. Quantifying it is essential for climate science, as it enables risk-based decision-making and differentiates between externally forced change and internal fluctuations. In statistical terms, natural variability corresponds to aleatoric uncertainty, i.e., irreducible stochastic variability. Despite this close conceptual alignment, the link between internal climate variability and aleatoric uncertainty has not yet been formalized. We establish a theoretical link by showing that member-to-member differences in single-model large ensembles provide a direct representation of aleatoric uncertainty. To quantify the spatio-temporal structure of aleatoric uncertainty, we employ generalized additive models. The proposed framework is validated through comparison with ERA5-Land reanalysis data, demonstrating that ensemble-derived estimates reproduce key spatial and temporal patterns of real-world variability. Applied to the water balance over the Iberian Peninsula, our approach reveals coherent variability structures and pronounced regional heterogeneity. We find a decline in variability in drought-prone regions and seasons, a pattern that strengthens under +3 °C global warming, implying an increased risk of persistent summer drought conditions. Beyond this application, the framework is climate-model agnostic and transferable to other variables and spatial scales, providing a statistical basis for quantifying internal climate variability as aleatoric uncertainty.

2604.15064 2026-04-17 stat.ME

Ranked-choice conjoint experiments

Thomas S. Robinson, Mats Ahrenshop, Spyros Kosmidis

Abstract

Forced-choice conjoint designs have become a staple method in the experimentalist's toolkit. However, the forced-choice outcome is neither always consistent with the types of choices individuals make in real political contexts, nor is it statistically efficient. In this paper, we formalize how ranked outcomes can be integrated into the conjoint framework. We provide a proof that rank-expanded estimators are equivalent to conventional AMCE, a theoretical account of how additional profiles increase the efficiency of conjoint designs, and design-based tests for the transitivity and independence of irrelevant alternatives assumptions that underpin the expansion. Across two pre-registered survey experiments (the first comparing forced-choice and ranked-choice designs across candidate and policy domains, and the second varying the number of ranked profiles), we find that ranked-choice conjoints yield substantively similar but more precise AMCE estimates, shrinking standard errors by 12-13% with one additional profile and up to 55% with six profiles per vignette. Based on efficiency-validity trade-offs, we recommend K = 4 profiles for most applications. We provide an accompanying open-source R package, cjrank, that implements rank expansion, AMCE estimation, efficiency diagnostics, and the assumption tests described in this paper.

2604.15061 2026-04-17 math.ST stat.TH

On general weighted cumulative residual (past) extropy of extreme order statistics

Santosh Kumar Chaudhary, Sarikul Islam, Nitin Gupta

Abstract

Weighted extropy has recently emerged as a flexible information measure for quantifying uncertainty, with particular relevance to order statistics. In this paper, we introduce and study a weighted cumulative analogue of extropy, extending the framework of weighted cumulative residual and cumulative past entropies to extreme order statistics. Specifically, we define the general weighted cumulative residual extropy (GWCREx) for the smallest order statistic and the general weighted cumulative past extropy (GWCPEx) for the largest order statistic, along with their dynamic versions. We show that these weighted measures and their dynamic counterparts uniquely characterize the underlying distribution. Moreover, we establish new characterization results for two widely used reliability models: the generalized Pareto distribution and the power distribution. The proposed framework provides a unified information-theoretic tool for analysing extreme lifetimes in reliability engineering and survival analysis.

2604.14975 2026-04-17 stat.CO cs.NA math.NA stat.AP stat.ML

Theta-regularized Kriging: Modelling and Algorithms

Xuelin Xie, Xiliang Lu

Journal ref Applied Mathematical Modelling, Vol. 136, 115627 (2024)

Abstract

To obtain more accurate model parameters and improve prediction accuracy, we proposed a regularized Kriging model that penalizes the hyperparameter theta in the Gaussian stochastic process, termed the Theta-regularized Kriging. We derived the optimization problem for this model from a maximum likelihood perspective. Additionally, we presented specific implementation details for the iterative process, including the regularized optimization algorithm and the geometric search cross-validation tuning algorithm. Three distinct penalty methods, Lasso, Ridge, and Elastic-net regularization, were meticulously considered. Meanwhile, the proposed Theta-regularized Kriging models were tested on nine common numerical functions and two practical engineering examples. The results demonstrate that, compared with other penalized Kriging models, the proposed model performs better in terms of accuracy and stability.

2604.14971 2026-04-17 stat.AP

Mapping Subnational Vulnerability to Inadequate Micronutrient Intake using a Bayesian Small Area Estimation Framework

Sahoko Ishida, Mohammed Osman, Ziyao Cui, Uchenna Agu, Emily Becher, Gabriel Battcock, Daniel Hernandez, Duccio Piovani, Frances Knight, Seth Flaxman, Kevin Tang

Abstract

Inadequate dietary micronutrient intake is a significant risk factor for deficiency and remains a major global health challenge. Nutrition programmes and interventions are most effective when targeted to populations at greatest risk. Household Consumption and Expenditure Surveys (HCES) are a widely available source of dietary data; however, they are often not powered for estimation below the first administrative level, limiting their utility for geographically targeted interventions. To address this, we applied Bayesian Small Area Estimation (SAE) methods to estimate the prevalence of apparent inadequate intake at the second administrative level. Three approaches were considered: a cluster level Beta binomial model and two area level models (mean smoothing and joint smoothing). Models were evaluated using a Rwanda HCES survey that supports inference at this scale. All models were implemented in a fully Bayesian framework to propagate uncertainty. Simulation results in Rwanda showed that the cluster level Beta binomial model achieved the strongest performance, while the area level joint smoothing model was the most reliable alternative among models accounting for survey design. Based on these results, models were applied to Senegal and Nigeria. In Senegal, second administrative level estimates captured meaningful subnational variation, reduced uncertainty relative to direct estimates, and remained consistent with first administrative level benchmarks. In Nigeria, despite smaller sample sizes and survey design constraints, modelled estimates reduced extreme uncertainty and showed good agreement with first administrative level estimates. This study demonstrates that Bayesian SAE methods can be applied to HCES data to generate reliable fine scale estimates of inadequate micronutrient intake, supporting localised nutrition interventions.

2604.14908 2026-04-17 cs.LG cs.SY eess.SY stat.ML

Multi-User mmWave Beam and Rate Adaptation via Combinatorial Satisficing Bandits

Emre Özyıldırım, Barış Yaycı, Umut Eren Akturk, Cem Tekin

Abstract

We study downlink beam and rate adaptation in a multi-user mmWave MISO system where multiple base stations (BSs), each using analog beamforming from finite codebooks, serve multiple single-antenna user equipments (UEs) with a unique beam per UE and discrete data transmission rates. BSs learn about transmission success based on ACK/NACK feedback. To encode service goals, we introduce a satisficing throughput threshold $\tau_r$ and cast joint beam and rate adaptation as a combinatorial semi-bandit over beam-rate tuples. Within this framework, we propose SAT-CTS, a lightweight, threshold-aware policy that blends conservative confidence estimates with posterior sampling, steering learning toward meeting $\tau_r$ rather than merely maximizing. Our main theoretical contribution provides the first finite-time regret bounds for combinatorial semi-bandits with satisficing objective: when $\tau_r$ is realizable, we upper bound the cumulative satisficing regret to the target with a time-independent constant, and when $\tau_r$ is non-realizable, we show that SAT-CTS incurs only a finite expected transient outside committed CTS rounds, after which its regret is governed by the sum of the regret contributions of restarted CTS rounds, yielding an $O((\log T)^2)$ standard regret bound. On the practical side, we evaluate the performance via cumulative satisficing regret to $\tau_r$ alongside standard regret and fairness. Experiments with time-varying sparse multipath channels show that SAT-CTS consistently reduces satisficing regret and maintains competitive standard regret, while achieving favorable average throughput and fairness across users, indicating that feedback-efficient learning can equitably allocate beams and rates to meet QoS targets without channel state knowledge.

2603.06431 2026-04-17 math.NA cs.LG cs.NA stat.ML

Certified and accurate computation of function space norms of deep neural networks

Johannes Gründler, Moritz Maibaum, Philipp Petersen

Abstract

Neural network methods for PDEs require reliable error control in function space norms. However, trained neural networks can typically only be probed at a finite number of point values. Without strong assumptions, point evaluations alone do not provide enough information to derive tight deterministic and guaranteed bounds on function space norms. In this work, we move beyond a purely black-box setting and exploit the neural network structure directly. We present a framework for the certified and accurate computation of integral quantities of neural networks, including Lebesgue and Sobolev norms, by combining interval arithmetic enclosures on axis-aligned boxes with adaptive marking/refinement and quadrature-based aggregation. On each box, we compute guaranteed lower and upper bounds for function values and derivatives, and propagate these local certificates to global lower and upper bounds for the target integrals. Our analysis provides a general convergence theorem for such certified adaptive quadrature procedures and instantiates it for function values, Jacobians, and Hessians, yielding certified computation of $L^p$, $W^{1,p}$, and $W^{2,p}$ norms. We further show how these ingredients lead to practical certified bounds for PINN interior residuals. Numerical experiments illustrate the accuracy and practical behavior of the proposed methods.
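
The idea of turning per-box interval enclosures into global bounds on an integral can be sketched in one dimension. The example below bounds the squared $L^2$ norm of a tiny ReLU network on [0, 1] using uniform boxes. It is a simplified illustration under stated assumptions: the network parameters and all function names are hypothetical, refinement is uniform rather than adaptive, and plain floating-point arithmetic without outward rounding is used, so the bounds are not rigorous certificates in the sense of the paper.

```python
def net_interval(params, lo, hi):
    """Enclose f(x) = sum_k a_k * relu(w_k * x + b_k) for all x in [lo, hi].

    `params` is a list of (a_k, w_k, b_k) triples (hypothetical layout).
    """
    f_lo = f_hi = 0.0
    for a, w, b in params:
        pre_lo, pre_hi = sorted((w * lo + b, w * hi + b))  # affine map
        r_lo, r_hi = max(pre_lo, 0.0), max(pre_hi, 0.0)    # ReLU is monotone
        t_lo, t_hi = sorted((a * r_lo, a * r_hi))          # scaling may flip
        f_lo += t_lo
        f_hi += t_hi
    return f_lo, f_hi


def square_interval(lo, hi):
    """Enclose y**2 for all y in [lo, hi]."""
    if lo >= 0.0:
        return lo * lo, hi * hi
    if hi <= 0.0:
        return hi * hi, lo * lo
    return 0.0, max(lo * lo, hi * hi)


def l2_norm_sq_bounds(params, left, right, n_boxes):
    """Lower and upper bounds on the integral of f(x)**2 over [left, right]."""
    width = (right - left) / n_boxes
    lower = upper = 0.0
    for i in range(n_boxes):
        box_lo = left + i * width
        box_hi = left + (i + 1) * width
        s_lo, s_hi = square_interval(*net_interval(params, box_lo, box_hi))
        lower += width * s_lo
        upper += width * s_hi
    return lower, upper


# Assumed toy network: f(x) = relu(x) - 2 * relu(x - 0.5), a "hat" on [0, 1].
params = [(1.0, 1.0, 0.0), (-2.0, 1.0, -0.5)]
lower, upper = l2_norm_sq_bounds(params, 0.0, 1.0, 1024)
```

For this hat function the exact value of the integral is 1/12, which lies between the two computed bounds. An adaptive scheme would refine only the boxes where the gap between the local bounds is large instead of splitting uniformly.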

2603.02196 2026-04-17 cs.AI cs.LG math.ST stat.ML stat.TH

Conformal Policy Control

Drew Prinster, Clara Fannjiang, Ji Won Park, Kyunghyun Cho, Anqi Liu, Suchi Saria, Samuel Stanton

Abstract

An agent must try new behaviors to explore and improve. In high-stakes environments, an agent that violates safety constraints may cause harm and must be taken offline, curtailing any future interaction. Imitating old behavior is safe, but excessive conservatism discourages exploration. How much behavior change is too much? We show how to use any safe reference policy as a probabilistic regulator for any optimized but untested policy. Conformal calibration on data from the safe policy determines how aggressively the new policy can act, while provably enforcing the user's declared risk tolerance. Unlike conservative optimization methods, we do not assume the user has identified the correct model class nor tuned any hyperparameters. Unlike previous conformal methods, our theory provides finite-sample guarantees even for non-monotonic bounded loss functions. Our experiments on applications ranging from natural language question answering to biomolecular engineering show that safe exploration is not only possible from the first moment of deployment, but can also improve performance.

2602.06930 2026-04-17 cs.LG math.OC math.ST stat.ML stat.TH

Continuous-time reinforcement learning: ellipticity enables model-free value function approximation

Wenlong Mou

Comments update from previous version: removed unnecessarily strong requirement on discount rate

Abstract

We study off-policy reinforcement learning for controlling continuous-time Markov diffusion processes with discrete-time observations and actions. We consider model-free algorithms with function approximation that learn value and advantage functions directly from data, without unrealistic structural assumptions on the dynamics. Leveraging the ellipticity of the diffusions, we establish a new class of Hilbert-space positive definiteness and boundedness properties for the Bellman operators. Based on these properties, we propose the Sobolev-prox fitted $q$-learning algorithm, which learns value and advantage functions by iteratively solving least-squares regression problems. We derive oracle inequalities for the estimation error, governed by (i) the best approximation error of the function classes, (ii) their localized complexity, (iii) exponentially decaying optimization error, and (iv) numerical discretization error. These results identify ellipticity as a key structural property that renders reinforcement learning with function approximation for Markov diffusions no harder than supervised learning.

2505.07427 2026-04-17 stat.AP cs.CE

Value of Information-based assessment of strain-based thickness loss monitoring in ship hull structures

Nicholas E. Silionis, Konstantinos N. Anyfantis

Comments 39 pages, 18 figures, Preprint submitted to journal

Abstract

Recent advances in Structural Health Monitoring (SHM) have attracted industry interest, yet real-world applications, such as in ship structures, remain scarce. Despite SHM's potential to optimise maintenance, its adoption in ships is limited due to the lack of clearly quantifiable benefits for hull maintenance. This study employs a Bayesian pre-posterior decision analysis to quantify the value of information (VoI) from SHM systems monitoring corrosion-induced thickness loss (CITL) in ship hulls, in a first-of-its-kind analysis for ship structures. We define decision-making consequence cost functions based on exceedance probabilities relative to a target CITL threshold, which can be set by the decision-maker. This introduces a practical aspect to our framework, which enables implicit modelling of the decision-maker's risk perception. We apply this framework to a large-scale, high-fidelity numerical model of a commercial vessel and examine the relative benefits of different CITL monitoring strategies, including strain-based SHM and traditional on-site inspections.

2407.05790 2026-04-17 stat.CO stat.ML

Kinetic Interacting Particle Langevin Monte Carlo

Paul Felix Valsecchi Oliva, O. Deniz Akyildiz

Abstract

This paper introduces and analyses interacting underdamped Langevin algorithms, termed Kinetic Interacting Particle Langevin Monte Carlo (KIPLMC) methods, for statistical inference in latent variable models. We propose a diffusion process that evolves jointly in the space of parameters and latent variables and show that the stationary distribution of this diffusion concentrates around the maximum marginal likelihood estimate of the parameters. We then provide two explicit discretisations of this diffusion as practical algorithms to estimate parameters of statistical models. For each algorithm, we obtain nonasymptotic rates of convergence in Wasserstein-2 distance for the case where the joint log-likelihood is strongly concave with respect to latent variables and parameters. We achieve accelerated convergence rates, clearly demonstrating improvement in dimension dependence. To demonstrate the utility of the introduced methodology, we provide numerical experiments that illustrate the effectiveness of the proposed diffusion for statistical inference. Our setting covers a broad range of applications, including unsupervised learning, statistical inference, and inverse problems.

2604.14860 2026-04-17 stat.ML cs.LG

Best of both worlds: Stochastic & adversarial best-arm identification

Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek, Michal Valko

Comments Published in Conference on Learning Theory (COLT 2018)

Abstract

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
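
The "simple random uniform learner" mentioned above can be sketched as follows. This is one plausible reading, not the paper's algorithm: pull an arm uniformly at random each round, keep importance-weighted estimates of each arm's cumulative gain, and recommend the arm with the largest estimate. The function name and the reward matrix are assumptions made for the example.

```python
import numpy as np


def uniform_best_arm(rewards, rng):
    """Recommend an arm after uniform random sampling.

    rewards: (T, K) array of rewards fixed in advance, which also
             covers the adversarial case (hypothetical input format).
    """
    T, K = rewards.shape
    est = np.zeros(K)  # importance-weighted estimates of cumulative gains
    for t in range(T):
        arm = rng.integers(K)             # each arm is pulled with probability 1/K
        est[arm] += rewards[t, arm] * K   # dividing by 1/K keeps the estimate unbiased
    return int(np.argmax(est))


# Toy check: arm 2 always pays 1, the others pay 0.
toy_rewards = np.zeros((200, 3))
toy_rewards[:, 2] = 1.0
best = uniform_best_arm(toy_rewards, np.random.default_rng(0))
print(best)
```

The learner never adapts its sampling to what it observes, which is what makes it robust to adversarial rewards and also what makes it wasteful when rewards are stochastic.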

2604.14810 2026-04-17 stat.ML cs.LG stat.CO

Scalable Model-Based Clustering with Sequential Monte Carlo

Connie Trojan, Pavel Myshkov, Paul Fearnhead, James Hensman, Tom Minka, Christopher Nemeth

Comments Accepted at AISTATS 2026. 31 pages, 20 figures

Abstract

In online clustering problems, there is often a large amount of uncertainty over possible cluster assignments that cannot be resolved until more data are observed. This difficulty is compounded when clusters follow complex distributions, as is the case with text data. Sequential Monte Carlo (SMC) methods give a natural way of representing and updating this uncertainty over time, but have prohibitive memory requirements for large-scale problems. We propose a novel SMC algorithm that decomposes clustering problems into approximately independent subproblems, allowing a more compact representation of the algorithm state. Our approach is motivated by the knowledge base construction problem, and we show that our method is able to accurately and efficiently solve clustering problems in this setting and others where traditional SMC struggles.

2604.14809 2026-04-17 stat.ML cs.LG stat.AP

Expert-Guided Class-Conditional Goodness-of-Fit Scores for Interpretable Classification with Informative Missingness: An Application to Seismic Monitoring

Shahar Cohen, David M. Steinberg, Yael Radzyner, Yochai Ben Horin

Comments 50 pages, 8 figures

Abstract

We study a classification problem with three key challenges: pervasive informative missingness, the integration of partial prior expert knowledge into the learning process, and the need for interpretable decision rules. We propose a framework that encodes prior knowledge through an expert-guided class-conditional model for one or more classes, and uses this model to construct a small set of interpretable goodness-of-fit features. The features quantify how well the observed data agree with the expert model, isolating the contributions of different aspects of the data, including both observed and missing components. These features are combined with a few transparent auxiliary summaries in a simple discriminative classifier, resulting in a decision rule that is easy to inspect and justify. We develop and apply the framework in the context of seismic monitoring used to assess compliance with the Comprehensive Nuclear-Test-Ban Treaty. We show that the method has strong potential as a transparent screening tool, reducing workload for expert analysts. A simulation designed to isolate the contribution of the proposed framework shows that this interpretable expert-guided method can even outperform strong standard machine-learning classifiers, particularly when training samples are small.

2604.14702 2026-04-17 cs.LG stat.ML

Gating Enables Curvature: A Geometric Expressivity Gap in Attention

Satwik Bathula, Anand A. Joshi

Comments 41 pages, 9 figures

Abstract

Multiplicative gating is widely used in neural architectures and has recently been applied to attention layers to improve performance and training stability in large language models. Despite this success, the mathematical implications of gated attention mechanisms remain poorly understood. We study attention through the geometry of its representations by modeling outputs as mean parameters of Gaussian distributions and analyzing the induced Fisher--Rao geometry. We show that the ungated attention operator is restricted to intrinsically flat statistical manifolds due to its affine structure, while multiplicative gating enables non-flat geometries, including positively curved manifolds that are unattainable in the ungated setting. These results establish a geometric expressivity gap between ungated and gated attention. Empirically, we show that gated models exhibit higher representation curvature and improved performance on tasks requiring nonlinear decision boundaries, whereas they provide no consistent advantage on tasks with linear decision boundaries. Furthermore, we identify a structured regime in which curvature accumulates under composition, yielding a systematic depth amplification effect.

2604.14669 2026-04-17 cs.LG math.DS math.OC stat.ML

Zeroth-Order Optimization at the Edge of Stability

Minhak Song, Liang Zhang, Bingcong Li, Niao He, Michael Muehlebach, Sewoong Oh

Comments 38 pages

Abstract

Zeroth-order (ZO) methods are widely used when gradients are unavailable or prohibitively expensive, including black-box learning and memory-efficient fine-tuning of large models, yet their optimization dynamics in deep learning remain underexplored. In this work, we provide an explicit step size condition that exactly captures the (mean-square) linear stability of a family of ZO methods based on the standard two-point estimator. Our characterization reveals a sharp contrast with first-order (FO) methods: whereas FO stability is governed solely by the largest Hessian eigenvalue, mean-square stability of ZO methods depends on the entire Hessian spectrum. Since computing the full Hessian spectrum is infeasible in practical neural network training, we further derive tractable stability bounds that depend only on the largest eigenvalue and the Hessian trace. Empirically, we find that full-batch ZO methods operate at the edge of stability: ZO-GD, ZO-GDM, and ZO-Adam consistently stabilize near the predicted stability boundary across a range of deep learning training problems. Our results highlight an implicit regularization effect specific to ZO methods, where large step sizes primarily regularize the Hessian trace, whereas in FO methods they regularize the top eigenvalue.
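
For readers unfamiliar with the two-point estimator, here is a minimal sketch of a commonly used form (central difference along a random Gaussian direction) together with plain ZO gradient descent on a small quadratic. The smoothing radius `mu`, the step size, and the test Hessian are assumptions for illustration; the abstract does not state the exact estimator variant or the stability condition, and neither is reproduced here.

```python
import numpy as np


def zo_two_point_grad(f, x, rng, mu=1e-3):
    """Two-point zeroth-order gradient estimate along a random Gaussian direction."""
    u = rng.standard_normal(x.shape)
    # Central difference of f along u, scaled back onto the direction u.
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u


def zo_gd(f, x0, step, n_steps, rng):
    """Gradient descent that only queries function values (ZO-GD sketch)."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x - step * zo_two_point_grad(f, x, rng)
    return x


# Toy quadratic with a hypothetical diagonal Hessian.
H = np.diag([1.0, 0.5, 0.25])
f = lambda x: 0.5 * x @ H @ x
x_final = zo_gd(f, np.ones(3), step=0.05, n_steps=2000, rng=np.random.default_rng(0))
print(f(x_final))
```

On a quadratic the estimate equals `(u @ H @ x) * u`, so each step multiplies the iterate by a random matrix; this is why stability has to be discussed in a mean-square sense rather than through a single eigenvalue.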

2604.14657 2026-04-17 stat.AP

Evacuation destination choices during Hurricane Ian: A direct demand modeling approach

Alessandra Recalde, Luyu Liu, Xiaojian Zhang, Sangung Park, Shangkun Jiang, Xilei Zhao

Abstract

Hurricanes are causing unprecedented damage to the natural environment, infrastructure, and communities. Understanding evacuation behavior is essential for improving emergency preparedness. Past studies have relied on surveys and interviews, which are prone to recall bias. These studies also urge incorporating social vulnerability into evacuation research, emphasizing its impact on evacuation capability and destination choice. This study addresses these gaps by analyzing evacuation behavior using mobile device location data from Hurricane Ian, one of Florida's deadliest hurricanes, and directly incorporating variables from the Social Vulnerability Index (SVI) into a zone-to-zone (census tract level) evacuation demand model. We find that vehicle availability, residence in group quarters, road density, and English proficiency have significant effects on evacuation demand, shaping both the ability to evacuate from origin tracts and the attractiveness of destination tracts. Travel impedance, measured as distance, also plays a significant role, with evacuees substantially less likely to travel longer distances.

2604.14649 2026-04-17 stat.ME math.ST stat.TH

Model Checking for Regressions Based on Weighted Residual Processes with Diverging Number of Predictors

Yue Hu, Haiqi Li, Xintao Xia

Abstract

The integrated conditional moment (ICM) test is a classical and widely used method for assessing the adequacy of regression models. Although it performs well in fixed-dimension settings, its behavior changes dramatically when the predictor dimension diverges: in such regimes, the limiting null and alternative distributions of the ICM statistic degenerate to fixed constants. Moreover, when the number of predictors diverges, the commonly used wild bootstrap no longer approximates the null distribution of the ICM statistic well, leading to size distortion and substantial power loss. To address these challenges, we propose a new specification test based on weighted residual processes for evaluating the parametric form of the regression mean function in high-dimensional settings where the number of predictors increases with the sample size. We establish the asymptotic properties of the test statistic under the null hypothesis and under global and local alternatives. The proposed test maintains the nominal significance level and can detect local alternatives that deviate from the null hypothesis at the parametric rate $1/\sqrt{n}$. Furthermore, we propose a smooth residual bootstrap to approximate the limiting null distribution and establish its validity in high-dimensional settings. Two simulation studies and a real-data example are conducted to evaluate the finite-sample performance of the proposed test.

2604.14587 2026-04-17 cs.LG math.OC stat.ML

CLion: Efficient Cautious Lion Optimizer with Enhanced Generalization

Feihu Huang, Guanyi Zhang, Songcan Chen

Comments 30 pages

Abstract

The Lion optimizer is a popular learning-based optimization algorithm in machine learning that shows impressive performance in training many deep learning models. Although the convergence properties of the Lion optimizer have been studied, its generalization analysis is still missing. To fill this gap, we study the generalization properties of Lion via algorithmic stability, using mathematical induction. Specifically, we prove that Lion has a generalization error of $O(\frac{1}{N\tau^T})$, where $N$ is the training sample size, $\tau>0$ denotes the smallest absolute value of the non-zero elements of the gradient estimator, and $T$ is the total number of iterations. In addition, we obtain an interesting byproduct: the SignSGD algorithm has the same generalization error as Lion. To enhance the generalization of Lion, we design a novel, efficient Cautious Lion (CLion) optimizer that uses the sign function cautiously. Moreover, we prove that CLion has a generalization error of $O(\frac{1}{N})$, lower than the $O(\frac{1}{N\tau^T})$ of Lion, since the parameter $\tau$ is generally very small. We also study the convergence of CLion and prove that it has a fast convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ under the $\ell_1$-norm of the gradient for nonconvex stochastic optimization, where $d$ denotes the model dimension. Extensive numerical experiments demonstrate the effectiveness of the CLion optimizer.
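
For context, the baseline Lion update in its commonly cited form can be sketched as below. This shows only the standard Lion step, not CLion, whose cautious rule is not specified in the abstract; the function name and default hyperparameters are assumptions for the example.

```python
import numpy as np


def lion_step(theta, m, grad, lr=1e-2, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One step of the standard Lion update (sketch; not the CLion variant)."""
    # The update direction is the sign of an interpolation of momentum and gradient.
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    # Every coordinate moves by exactly lr (plus decoupled weight decay).
    theta = theta - lr * (update + weight_decay * theta)
    # The momentum buffer is refreshed with a second, slower interpolation.
    m = beta2 * m + (1.0 - beta2) * grad
    return theta, m


theta0 = np.array([1.0, -2.0])
grad0 = np.array([0.5, -3.0])
theta1, m1 = lion_step(theta0, np.zeros(2), grad0)
print(theta1, m1)
```

Because only the sign of the direction is used, gradient coordinates of very small magnitude still trigger a full-size step, which is the sensitivity that the quantity $\tau$ in the bound above refers to.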

2604.14571 2026-04-17 stat.ME stat.CO

Bayesian sparse principal coordinates analysis with delta-tolerant linear approximation for microbiome data

Hsin-Hsiung Huang, Ruitao Liu, Liangliang Zhang, Shao-Hsuan Wang

Abstract

Principal coordinates analysis (PCoA) is a standard exploratory tool for microbiome beta-diversity studies, but its axes are defined by pairwise dissimilarities and therefore do not directly identify the taxa driving an ordination. We propose Bayesian sparse principal coordinates analysis (BSPCoA), a post hoc framework that approximates the leading principal coordinates by a sparse linear surrogate in the observed taxa. A delta-tolerance diagnostic quantifies the discrepancy between the classical ordination and its best linear surrogate, clarifying when taxon-level interpretation is well supported. We place three-parameter beta normal global-local priors on the surrogate coefficients to induce row sparsity, obtain posterior uncertainty, and select influential taxa. The method reduces to sparse principal component analysis under Euclidean distance, while remaining applicable to ecologically meaningful dissimilarities such as Bray--Curtis and Hellinger distances. We conduct simulation studies to demonstrate that BSPCoA provides an approximately linear representation of the dominant ordination geometry while enhancing interpretability in sparse microbiome settings. In the Hadza gut microbiome data, the method produces an ordination close to that of classical PCoA while highlighting a parsimonious set of taxa associated with seasonal variation.
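
The classical PCoA step that BSPCoA approximates can be sketched as follows: double-centre the squared dissimilarities and take the leading eigenvectors. This is the textbook construction only, with no sparsity or Bayesian component; the function name and the toy points are assumptions for the example.

```python
import numpy as np


def pcoa(D, k=2):
    """Classical principal coordinates analysis of a dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]       # keep the k largest
    evals, evecs = evals[order], evecs[:, order]
    # Scale eigenvectors by sqrt(eigenvalue); clip tiny negatives from round-off.
    coords = evecs * np.sqrt(np.clip(evals, 0.0, None))
    return coords, evals


# Toy example: Euclidean distances between five planar points.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y, top_evals = pcoa(D, k=2)
```

Under Euclidean distance the recovered coordinates reproduce the original pairwise distances up to rotation and reflection, which is the sense in which PCoA coincides with PCA in that case.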

2604.14534 2026-04-17 cs.LG stat.AP

An unsupervised decision-support framework for multivariate biomarker analysis in athlete monitoring

Fernando Barcelos Rosito, Sebastião De Jesus Menezes, Simone Ferreira Sturza, Adriana Seixas, Muriel Figueredo Franco

Comments 15 pages, 4 figures, 3 tables, submitted to Springer Nature Scientific Reports

Abstract

Purpose. Athlete monitoring is constrained by small cohorts, heterogeneous biomarker scales, limited feasibility of repeated sampling, and the lack of reliable injury ground truth. These limitations reduce the interpretability and utility of traditional univariate and binary risk models. This study addresses these challenges by proposing an unsupervised multivariate framework to identify latent physiological states in athletes using real data. Methods. We propose a modular computational framework that operates in the joint biomarker space, integrating preprocessing, clinical safety screening, unsupervised clustering, and centroid-based physiological interpretation. Profiles are learned exclusively from amateur soccer players during a competitive microcycle. Synthetic data augmentation evaluates robustness and scalability. Ward hierarchical clustering supports monitoring and etiological differentiation, while Gaussian Mixture Models (GMM) enable structural stability analysis in high-dimensional settings. Results. The framework identifies coherent profiles that distinguish mechanical damage from metabolic stress while preserving homeostatic states. Synthetic data augmentation demonstrates feasibility and detection of latent silent risk phenotypes typically missed by univariate monitoring. Structural analyses indicate robustness under augmentation and higher-dimensional settings. Conclusion. The framework enables interpretable identification of latent physiological states from multivariate biomarker data without injury labels. By distinguishing mechanisms and revealing silent risk patterns not captured by conventional monitoring, it provides actionable insights for individualized athlete monitoring and decision making.

2604.14517 2026-04-17 stat.ME

Bayesian Node-Level Outlier Detection for Graph Signals

Seongmin Kim, Kyusoon Kim

Comments 35 pages, 4 figures

Abstract

This paper proposes a fully Bayesian framework for node-level outlier detection in graph signals, where measurements are observed on the nodes of an underlying graph. Unlike traditional outlier detection methods, our approach accounts for the relational dependencies induced by the graph, identifying outliers that disrupt the underlying smoothness. We model the observed signal as a combination of a graph-smooth component, captured via an intrinsic Gaussian Markov random field (IGMRF) prior, and a sparse outlier component modeled by a spike-and-slab prior. A key advantage of the proposed method is its ability to provide principled uncertainty quantification by estimating the posterior probability that each node is an outlier, rather than enforcing a deterministic binary decision. To facilitate posterior inference, we develop an efficient Gibbs sampling algorithm. We demonstrate the effectiveness of the proposed method through simulation studies on various graph structures, as well as a real data analysis of PM2.5 levels in California, exploring their relationship with wildfire occurrences.

2604.14498 2026-04-17 cs.AI cs.LG stat.ML

Improving Machine Learning Performance with Synthetic Augmentation

Mel Sohm, Charles Dezons, Sami Sellami, Oscar Ninou, Axel Pincon

Abstract

Synthetic augmentation is increasingly used to mitigate data scarcity in financial machine learning, yet its statistical role remains poorly understood. We formalize synthetic augmentation as a modification of the effective training distribution and show that it induces a structural bias--variance trade-off: while additional samples may reduce estimation error, they may also shift the population objective whenever the synthetic distribution deviates from regions relevant under evaluation. To isolate informational gains from mechanical sample-size effects, we introduce a size-matched null augmentation and a finite-sample, non-parametric block permutation test that remains valid under weak temporal dependence. We evaluate this framework in both controlled Markov-switching environments and real financial datasets, including high-frequency option trade data and a daily equity panel. Across generators spanning bootstrap, copula-based models, variational autoencoders, diffusion models, and TimeGAN, we vary augmentation ratio, model capacity, task type, regime rarity, and signal-to-noise. We show that synthetic augmentation is beneficial only in variance-dominant regimes, such as persistent volatility forecasting, while it degrades performance in bias-dominant settings, including near-efficient directional prediction. Rare-regime targeting can improve domain-specific metrics but may conflict with unconditional permutation inference. Our results provide a structural perspective on when synthetic data improves financial learning performance and when it induces persistent distributional distortion.

2604.14497 2026-04-17 cs.CE stat.AP

Robust Optimal Experimental Design Accounting for Sensor Failure

Rebekah White, Chandler Smith, Drew Kouri, Jace Ritchie, Wilkins Aquino, Timothy Walsh

Abstract

Optimal experimental design (OED) provides a way of determining a priori the best locations at which to place accelerometers in vibration analysis experiments. In practice, however, sensors often fail during experimentation due to high mechanical accelerations. Few works have explored the use of robust OED in the context of vibration analysis, where design spaces (i.e. candidate sensor locations and orientations) are high-dimensional and the finite-element models are expensive to compute. Therefore, this work considers the application of more general robust OED formulations to such a structural dynamics problem. We employ a relaxation-based approach that enables the use of efficient gradient-based optimization. Furthermore, we leverage a binary-inducing penalty during optimization to provide a binary sensor design as an alternative to post-optimization rounding heuristics. We consider performance metrics based on the log-determinant of the parameter covariance as well as those based on parameter and prediction mean-squared errors. We find that although robust and classical designs are similar for the structural dynamics problem of interest, robust designs outperform classical designs on average over relevant failure scenarios.

2604.14482 2026-04-17 math.NT math.CA math.ST stat.TH

Arithmetic functions and learning theory

W. Burstein, A. Iosevich, A. Sant

Abstract

We establish a connection between analytic number theory and computational learning theory by showing that the Möbius function belongs to a class of functions that is statistically hard to learn from random samples. Let $\mu_R$ denote the restriction of the Möbius function to the squarefree integers in $\{1,\dots,R\}$. Using a recent lower bound of Pandey and Radziwiłł for the $L^1$ norm of exponential sums with Möbius coefficients, we prove that \[ \mathrm{FR}(\mu_R) \gg R^{-1/4-\varepsilon} \] for every $\varepsilon>0$, where $\mathrm{FR}$ denotes the Fourier Ratio. We then show that, for a suitable absolute constant $c_0>0$, the class of $\{-1,1\}$-valued functions on the squarefree integers with Fourier Ratio at least $c_0$ has Vapnik--Chervonenkis dimension at least $cR$. It follows that any distribution-independent learning algorithm that succeeds uniformly on the class $\mathcal{H}_R(\eta_R)$ containing $\mu_R$, where $\eta_R \to 0$, requires at least $\Omega(R)$ samples. We also discuss a conditional improvement under a strong uniform bound for additive twists of the Möbius function, and we note that the same method applies to the Liouville function.
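
To make the object of study concrete, the sketch below tabulates the Möbius function up to R with a simple sieve and extracts the squarefree integers, on which its values are exactly ±1. The function name `mobius_sieve` is an assumption for the example; the Fourier Ratio itself is not computed, since the abstract does not define it.

```python
def mobius_sieve(R):
    """Return a list mu with mu[n] = Möbius function of n for 0 <= n <= R (mu[0] = 0)."""
    mu = [1] * (R + 1)
    mu[0] = 0
    is_prime = [True] * (R + 1)
    for p in range(2, R + 1):
        if is_prime[p]:
            # Each distinct prime factor flips the sign.
            for k in range(p, R + 1, p):
                if k > p:
                    is_prime[k] = False
                mu[k] = -mu[k]
            # Any multiple of p^2 is not squarefree, so mu vanishes there.
            for k in range(p * p, R + 1, p * p):
                mu[k] = 0
    return mu


R = 30
mu = mobius_sieve(R)
squarefree = [n for n in range(1, R + 1) if mu[n] != 0]
mu_R = {n: mu[n] for n in squarefree}  # restriction to squarefree integers, values in {-1, 1}
print(mu[1:11])
```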

2604.14407 2026-04-17 stat.ME

Propensity Score Weighting to Ensure Balance in Key Subgroups or Strata: A Practical Guide

Emma K. Mackay, Amol A. Verma, Fahad Razak, Surain B. Roberts

Comments 15 pages, 1 figure

Abstract

Propensity score weighting approaches have been widely implemented in clinical research to estimate the effects of a treatment or exposure while mitigating the risk of confounding in the absence of random assignment. In practice, when working with large electronic health records (EHR) or administrative datasets to evaluate health quality outcomes at the institutional level, or evaluate supportive care interventions for a wide range of hospitalized patients, it may be advisable to stratify the propensity score weighting approach by indication, reason for admission, or other clinical risk factors due to the potential for substantial heterogeneity across subgroups of patients with complex care needs. A stratified approach may be appropriate if (i) prognosis differs substantially between patient subgroups such that achieving balance in the composition of these strata between exposure/treatment groups should be prioritized, (ii) likelihood of exposure differs substantially across clinical subgroups, or (iii) the covariate-exposure associations are expected to differ substantially between subgroups (i.e. there are covariate-subgroup interactions in the exposure/treatment propensity model). For example, we may want to evaluate the impact of prophylactic anticoagulant use for venous thromboembolism prevention in elderly patients admitted to hospital for a wide array of conditions. The purpose of this article is to outline an approach to implementing propensity score weighting with stratification by clinical groups. We also provide guidance on best practices with particular focus on EHR and administrative medical data, and population health settings.

2604.14404 2026-04-17 math.ST stat.ME stat.ML stat.TH

Early-stopped aggregation: Adaptive inference with computational efficiency

Ilsang Ohn, Shitao Fan, Jungbin Jun, Lizhen Lin

Abstract

When considering a model selection or, more generally, an aggregation approach for adaptive statistical inference, it is often necessary to compute estimators over a wide range of model complexities including unnecessarily large models even when the true data-generating process is relatively simple, due to the lack of prior knowledge. This requirement can lead to substantial computational inefficiency. In this work, we propose a novel framework for efficient model aggregation called the early-stopped aggregation (ESA): instead of computing and aggregating estimators for all candidate models, we compute only a small number of simpler ones using an early-stopping criterion and aggregate only these for final inference. Our framework is versatile and applies to both Bayesian model selection, in particular, within the variational Bayes framework, and frequentist estimation, including a general penalized estimation setting. We investigate the adaptive optimality of the ESA approach across three learning paradigms. We first show that ESA achieves optimal adaptive contraction rates in the variational Bayes setting under mild conditions. We extend this result to variational empirical Bayes, where prior hyperparameters are chosen in a data-dependent manner. In addition, we apply the ESA approach to frequentist aggregation including both penalization-based and sample-splitting implementations, and establish corresponding theory. As we demonstrate, there is a clear unification between early-stopped Bayes and frequentist penalized aggregation, with a common "energy" functional comprising a data-fitting term and a complexity-control term that drives both procedures. We further present several applications and numerical studies that highlight the efficiency and strong performance of the proposed approach.

2604.14394 2026-04-17 econ.EM math.ST stat.TH

Generalized Autoregressive Multivariate Models: From Binary to Poisson

Anna Bykhovskaya, Nour Meddahi

Comments 39 pages

Abstract

This paper presents a framework for binary autoregressive time series in which each observation is a Bernoulli variable whose success probability evolves with past outcomes and probabilities, in the spirit of GARCH-type dynamics, accommodating nonlinearities, network interactions, and cross-sectional dependence in the multivariate case. Existence and uniqueness of a stationary solution are established via a coupling argument tailored to the discontinuities inherent in binary data. A key theoretical result, further supported by our empirical illustration on S&P 100 data, shows that, under a rare-events scaling, aggregates of such binary processes converge to a Poisson autoregression, providing a micro-foundation for this widely used count model. Maximum likelihood estimation is proposed and illustrated empirically.
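
A univariate, linear special case of such dynamics can be simulated in a few lines. The recursion `p[t] = omega + alpha * y[t-1] + beta * p[t-1]` is an assumed GARCH-style form chosen for illustration, and the parameter values are hypothetical; the paper's framework is more general (nonlinear and multivariate) and its exact specification is not given in the abstract.

```python
import numpy as np


def simulate_binary_ar(omega, alpha, beta, n, rng):
    """Simulate a Bernoulli series whose success probability follows a linear recursion.

    Assumes omega, alpha, beta >= 0 and omega + alpha + beta <= 1,
    which keeps every probability inside [0, 1].
    """
    p = np.empty(n)
    y = np.empty(n, dtype=int)
    p[0] = omega / (1.0 - alpha - beta)   # start at the stationary mean of this recursion
    y[0] = rng.random() < p[0]
    for t in range(1, n):
        # Probability reacts to the last outcome and the last probability.
        p[t] = omega + alpha * y[t - 1] + beta * p[t - 1]
        y[t] = rng.random() < p[t]
    return p, y


p_path, y_path = simulate_binary_ar(0.05, 0.3, 0.5, n=5000, rng=np.random.default_rng(0))
print(y_path.mean())
```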

2604.14370 2026-04-17 stat.ME cs.LG

Deployment of AI-Assisted Interventions: Capacity Constraints and Noisy Compliance

Carri W. Chan, Yi Han, Hannah Li, Benjamin L. Ranard

Abstract

AI tools increasingly guide targeted interventions in healthcare, education, and recruiting. Algorithms score individuals, trigger outreach to those above a threshold (e.g., high-risk or high-value), and encourage them to request service; then providers deliver service to those who request. Standard practice sets the threshold and selects the algorithm to maximize predictive accuracy, assuming that better predictions yield better outcomes. We show that this approach is suboptimal when limited service capacity and probabilistic behavioral responses influence who receives service. In such settings, the optimal score threshold must balance two effects: ensuring all capacity is filled (utilization) and ensuring high-value individuals are served despite competition between requests (cannibalization). We characterize the optimal threshold and prove that policies based solely on predictive accuracy are generally suboptimal. Further, because optimal thresholds vary with service capacity, algorithm selection metrics like AUC, which weight all thresholds equally, are misaligned with operational performance. We introduce a new metric--Operational AUC (OpAUC)--and show it leads to optimal algorithm selection. Finally, we conduct a case study on sepsis early warning data and illustrate the magnitude of improvement that can be achieved from improved threshold and algorithm selection.

2604.14364 2026-04-17 stat.AP

Joint Bayesian Inference of Genetic Effect Sizes and PK Parameters in Nonlinear Mixed-Effects Models

Julien Martinelli, Ibtissem Rebai, David W. Haas, Julie Bertrand

Abstract

High-dimensional genetic covariate selection in population pharmacokinetic (PK) models is challenging due to the cohort's restricted size and high correlation among single-nucleotide polymorphisms (SNPs). We propose a fully Bayesian, single-stage framework that jointly infers nonlinear mixed effect model (NLMEM) parameters and SNP effect sizes, providing coherent posterior uncertainty and inclusion summaries within a single model fit. We compare five sparsity-inducing priors -- Spike-and-Slab, Hierarchical Lasso, Regularized Horseshoe, R2--D2, and the $\ell_1$-ball -- calibrated through effect-size and sparsity targets. In simulations, all priors showed low false-discovery rates around $0$--$0.08$ under the null, and recovered the causal signal under the alternative, with peak $F_1$ scores around $0.8$--$0.85$ under reasonable inclusion cutoffs. Spike-and-Slab was especially attractive because it provides analytical posterior inclusion probabilities directly, while among priors requiring tolerance-based proxy inclusion summaries, the $\ell_1$-ball combined similarly strong recovery with the most stable behavior across tolerance values. On genetic and PK data from the ANRS 12154 study in 129 Cambodians living with HIV and receiving nevirapine, posterior predictive checks indicated adequate calibration and PK parameter inference remained stable across priors. While the dominant signal was robust across priors, additional candidate SNPs showed only partial agreement in ranking and more prior-sensitive effect-size estimates. These results support Bayesian variable selection within joint NLMEM as a principled approach for pharmacogenetic analyses when uncertainty quantification and regularization are central.

2604.14352 2026-04-17 stat.ME cs.LG stat.AP

PROXIMA: A Reliability Scoring Framework for Proxy Metrics in Online Controlled Experiments

Avinash Amudala

Comments 14 pages. Sole-author submission. Independent research. Companion code at https://github.com/Avinash-Amudala/PROXIMA. Zenodo archive: 10.5281/zenodo.15483241. Related US provisional patent application: 63/974,569 (filed Feb 3, 2026)

Abstract

Online A/B testing at scale relies on proxy metrics -- short-term, easily-measured signals used in place of slow-moving long-term outcomes. When the proxy-outcome relationship is heterogeneous across user segments, aggregate correlation can mask directional failures akin to Simpson's Paradox, leading to costly ship/no-ship errors. We introduce PROXIMA (Proxy Metric Validation Framework for Online Experiments), a lightweight diagnostic framework that scores proxy reliability through a composite of three complementary dimensions: normalised effect correlation, directional accuracy, and segment-level fragility rate. Unlike surrogate-index approaches that predict long-term treatment effects, PROXIMA directly audits whether a candidate proxy leads to correct launch decisions and flags the user segments where it fails. We validate PROXIMA on two public datasets -- the Criteo Uplift corpus (14M observations, advertising) and KuaiRec (7K users, video recommendation) -- using 80 simulated A/B tests. Early engagement metrics achieve a composite reliability of 0.80 on Criteo and 0.62 on KuaiRec, yielding 98.4% average decision agreement with an oracle policy. Fragility analysis reveals that recommendation domains exhibit substantially higher segment-level heterogeneity (68% fragility) than advertising (13%), yet directional accuracy remains above 96% in both cases. A sensitivity analysis over the weight space confirms that no single component suffices and that the composite provides substantially better discrimination between reliable and unreliable proxies than correlation alone. Code and reproduction scripts are available at: https://github.com/Avinash-Amudala/PROXIMA

2604.14338 2026-04-17 cs.LG stat.ML

Path-Sampled Integrated Gradients

Firuz Kamalov, Fadi Thabtah, R. Sivaraj, Neda Abdelhamid

详情
Journal ref
Gulf Journal of Mathematics, Vol 22, Issue 1 (2026)
英文摘要

We introduce path-sampled integrated gradients (PS-IG), a framework that generalizes feature attribution by computing the expected value over baselines sampled along the linear interpolation path. We prove that PS-IG is mathematically equivalent to path-weighted integrated gradients, provided the weighting function matches the cumulative distribution function of the sampling density. This equivalence allows the stochastic expectation to be evaluated via a deterministic Riemann sum, improving the error convergence rate from $O(m^{-1/2})$ to $O(m^{-1})$ for smooth models. Furthermore, we demonstrate analytically that PS-IG functions as a variance-reducing filter against gradient noise - strictly lowering attribution variance by a factor of 1/3 under uniform sampling - while preserving key axiomatic properties such as linearity and implementation invariance.
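
The deterministic side of the stated equivalence can be sketched as a weighted Riemann sum. The sketch assumes baselines of the form b = x' + t(x - x') with t drawn uniformly on [0, 1], so that the weighting function is the uniform CDF w(s) = s; the function names, the midpoint rule, and the toy model f(x) = sum(x^2) are choices made for this example, not details from the paper.

```python
import numpy as np


def path_weighted_ig(grad_f, x, baseline, weight, m=1000):
    """Path-weighted integrated gradients via a midpoint Riemann sum with m nodes."""
    s = (np.arange(m) + 0.5) / m            # midpoints of m equal cells on [0, 1]
    delta = x - baseline
    total = np.zeros_like(x, dtype=float)
    for si in s:
        # Gradient along the straight path, weighted by w(s).
        total += weight(si) * grad_f(baseline + si * delta)
    return delta * total / m


grad_f = lambda z: 2.0 * z                  # gradient of the toy model f(z) = sum(z**2)
x = np.array([1.0, 2.0])
x0 = np.zeros(2)

ig_plain = path_weighted_ig(grad_f, x, x0, weight=lambda s: 1.0)   # ordinary IG
ig_uniform = path_weighted_ig(grad_f, x, x0, weight=lambda s: s)   # uniform-CDF weight
print(ig_plain, ig_uniform)
```

For this toy model the plain attributions sum to f(x) - f(x0), while the uniform-CDF weighting shrinks each attribution to two thirds of its plain value, since the integral of 2s * s over [0, 1] is 2/3.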

2604.14331 2026-04-17 cs.LG stat.ML

Heat and Matérn Kernels on Matchings

Dmitry Eremeev, Salem Said, Viacheslav Borovitskiy

Abstract

Applying kernel methods to matchings is challenging due to their discrete, non-Euclidean nature. In this paper, we develop a principled framework for constructing geometric kernels that respect the natural geometry of the space of matchings. To this end, we first provide a complete characterization of stationary kernels, i.e. kernels that respect the inherent symmetries of this space. Because the class of stationary kernels is too broad, we specifically focus on the heat and Matérn kernel families, adding an appropriate inductive bias of smoothness to stationarity. While these families successfully extend widely popular Euclidean kernels to matchings, evaluating them naively incurs a prohibitive super-exponential computational cost. To overcome this difficulty, we introduce and analyze a novel, sub-exponential algorithm leveraging zonal polynomials for efficient kernel evaluation. Finally, motivated by the known bijective correspondence between matchings and phylogenetic trees (a crucial data modality in biology), we explore whether our framework can be seamlessly transferred to the space of trees, establishing novel negative results and identifying a significant open problem.

2604.14305 2026-04-17 stat.ME cs.LG q-bio.GN stat.AP

Combining Bayesian and Frequentist Inference for Laboratory-Specific Performance Guarantees in Copy Number Variation Detection

Austin Talbot, Alex V. Kotlar, Yue Ke

Abstract

Targeted amplicon panels are widely used in oncology diagnostics, but providing per-gene performance guarantees for copy number variant (CNV) detection remains challenging due to amplification artifacts, process-mismatch heterogeneity, and limited validation sample sizes. While Bayesian CNV callers naturally quantify per-sample uncertainty, translating this into the frequentist population-level guarantees required for clinical validation (coverage rates, false-positive bounds, and minimum detectable copy-number changes) is a fundamentally different inferential problem. We show empirically that even robust Bayesian credible intervals, including coarsened posteriors and sandwich-adjusted intervals, are severely miscalibrated on panels with small amplicon counts per gene. To address this, we propose a hybrid framework that evaluates Bayesian posterior functionals on validation samples and models the resulting squared losses with a Gamma distribution, yielding tolerance intervals with valid frequentist coverage. Three components make the method practical under real-world constraints: (1) imputation that removes the influence of true CNV-positive samples without requiring known ground truth, (2) regularization to address small sample variability, and (3) evidence-based stratification on the log model evidence to accommodate non-exchangeable noise profiles arising from process mismatch. Evaluated on two targeted amplicon panels using leave-one-out cross-validation, the proposed method achieves single-digit mean absolute coverage error across all genes under both process-matched and unmatched conditions, whereas Bayesian comparators exhibit mean absolute errors exceeding 60% on clinically relevant genes such as ERBB2.

2604.14257 2026-04-17 econ.GN q-fin.EC stat.AP

Mapping the causal structure of price formation in Texas's transitioning electricity market

Shiva Madadkhani, Nils Sturma, Mathias Drton, Svetlana Ikonnikova

Abstract

Electricity markets are changing, driven by large-scale renewable integration and rising demand from electrification and digitalisation. This raises fundamental questions about how electricity prices form as the relationships among key price determinants evolve. Here we apply causal discovery to characterise these dynamics across major supply- and demand-side drivers of wholesale electricity prices in Texas, where rapid renewable growth intersects with surging demand. We show that wind generation has become the dominant causal driver of day-ahead electricity prices with effects more than 3 times larger than those of natural gas prices, overturning the view of the Texas market as gas-price-driven. Wind reduces prices locally but redistributes congestion costs across regions in seasonally varying patterns. Natural gas prices remain causally relevant, though their influence is modest and the dominant gas benchmark changes over time. Electricity demand also shows region- and period-specific causal effects. These findings highlight the need for causal models that capture time-varying relationships across both supply and demand to guide system planners and market participants navigating the ongoing transition.

2604.14249 2026-04-17 cs.LG stat.ML

Metric-Aware Principal Component Analysis (MAPCA): A Unified Framework for Scale-Invariant Representation Learning

Michael Leznik

Comments 12 pages, one figure

Abstract

We introduce Metric-Aware Principal Component Analysis (MAPCA), a unified framework for scale-invariant representation learning based on the generalised eigenproblem max Tr(W^T Sigma W) subject to W^T M W = I, where M is a symmetric positive definite metric matrix. The choice of M determines the representation geometry. The canonical beta-family M(beta) = Sigma^beta, beta in [0,1], provides continuous spectral bias control between standard PCA (beta=0) and output whitening (beta=1), with condition number kappa(beta) = (lambda_1/lambda_p)^(1-beta) decreasing monotonically to isotropy. The diagonal metric M = D = diag(Sigma) recovers Invariant PCA (IPCA), a method rooted in Frisch (1928) diagonal regression, as a distinct member of the broader framework. We prove that scale invariance holds if and only if the metric transforms as M_tilde = CMC under rescaling C, a condition satisfied exactly by IPCA but not by the general beta-family at intermediate values. Beyond its classical interpretation, MAPCA provides a geometric language that unifies several self-supervised learning objectives. Barlow Twins and ZCA whitening correspond to beta=1 (output whitening); VICReg's variance term corresponds to the diagonal metric. A key finding is that W-MSE, despite being described as a whitening-based method, corresponds to M = Sigma^{-1} (beta = -1), outside the spectral compression range entirely and in the opposite spectral direction to Barlow Twins. This distinction between input and output whitening is invisible at the level of loss functions and becomes precise only within the MAPCA framework.
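The condition number quoted in the abstract above can be checked with a short derivation. This is a sketch under assumed notation that is not taken from the paper: $\lambda_1 \ge \dots \ge \lambda_p > 0$ are the eigenvalues of $\Sigma$, and $V = M^{1/2} W$ is a change of variables.

```latex
% Assumed notation: eigenvalues of Sigma are lambda_1 >= ... >= lambda_p > 0,
% and V = M^{1/2} W (both introduced here for illustration).
\begin{align*}
\max_{W}\ \operatorname{Tr}\!\left(W^{\top} \Sigma W\right)
  \ \text{s.t.}\ W^{\top} M W = I
&\iff
\max_{V}\ \operatorname{Tr}\!\left(V^{\top} M^{-1/2} \Sigma M^{-1/2} V\right)
  \ \text{s.t.}\ V^{\top} V = I, \\
M = \Sigma^{\beta}
&\implies
M^{-1/2} \Sigma M^{-1/2} = \Sigma^{1-\beta}, \\
\kappa(\beta)
&= \frac{\lambda_1^{\,1-\beta}}{\lambda_p^{\,1-\beta}}
 = \left(\frac{\lambda_1}{\lambda_p}\right)^{1-\beta}.
\end{align*}
```

At $\beta = 0$ this gives the condition number of $\Sigma$ itself (standard PCA), and at $\beta = 1$ it gives $\kappa = 1$ (isotropy), consistent with the two endpoints named in the abstract.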

2604.14230 2026-04-17 stat.AP

A Statistical Market-Design Framework for Academic Job Markets

Ali Kaazempur-Mofrad, Xiaowu Dai, Xuming He

Abstract

The academic job market for new statisticians is highly congested at the interview stage, where departments must rank and select candidates from large applicant pools without credible signals of candidate interest. As a result, interviews and offers are often misallocated, leading to unfilled positions and poor mutual fit. We frame interview allocation as a statistical ranking problem under uncertainty and propose a market-design framework that incorporates structured preference signaling into interview selection. Candidates submit a single standardized questionnaire describing preferences over interpretable job characteristics, which departments combine with traditional application materials and historical hiring data to estimate candidate-specific acceptance probabilities and expected utilities. To account for estimation uncertainty, we employ a confidence-calibrated ranking procedure based on pairwise utility comparisons that provides statistical guarantees for candidate ranking. We establish that truthful participation is optimal for candidates and that preference information improves departmental outcomes and matching stability. We use a dataset of U.S. statistics departments to show that the proposed framework substantially increases matching rates, improves match quality, and reduces hiring failures relative to the current practice.

2604.14209 2026-04-17 cs.LG cs.AI stat.ML

Towards Verified and Targeted Explanations through Formal Methods

Hanchen David Wang, Diego Manzanas Lopez, Preston K. Robinette, Ipek Oguz, Taylor T. Johnson, Meiyi Ma

Comments Paper has been accepted at JAIR

Abstract

As deep neural networks are deployed in safety-critical domains such as autonomous driving and medical diagnosis, stakeholders need explanations that are interpretable but also trustworthy with formal guarantees. Existing XAI methods fall short: heuristic attribution techniques (e.g., LIME, Integrated Gradients) highlight influential features but offer no mathematical guarantees about decision boundaries, while formal methods verify robustness yet remain untargeted, analyzing the nearest boundary regardless of whether it represents a critical risk. In safety-critical systems, not all misclassifications carry equal consequences; confusing a "Stop" sign for a "60 kph" sign is far more dangerous than confusing it with a "No Passing" sign. We introduce ViTaX (Verified and Targeted Explanations), a formal XAI framework that generates targeted semifactual explanations with mathematical guarantees. For a given input (class y) and a user-specified critical alternative (class t), ViTaX: (1) identifies the minimal feature subset most sensitive to the y->t transition, and (2) applies formal reachability analysis to guarantee that perturbing these features by epsilon cannot flip the classification to t. We formalize this through Targeted epsilon-Robustness, certifying whether a feature subset remains robust under perturbation toward a specific target class. ViTaX is the first method to provide formally guaranteed explanations of a model's resilience against user-identified alternatives. Evaluations on MNIST, GTSRB, EMNIST, and TaxiNet demonstrate over 30% fidelity improvement with minimal explanation cardinality.

2604.14206 2026-04-17 cs.LG q-fin.PM stat.ML

Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training

Adhiraj Chattopadhyay

Comments 18 pages of main text. 10 pages of appendices. 35 references. Around 13 figures

Abstract

This paper proposes a machine-learning-assisted portfolio optimization framework designed for low-data environments and regime uncertainty. We construct a teacher-student learning pipeline in which a Conditional Value at Risk (CVaR) optimizer generates supervisory labels, and neural models (Bayesian and deterministic) are trained using both real and synthetically augmented data. The synthetic data is generated using a factor-based model with t-copula residuals, enabling training beyond the limited real sample of 104 labeled observations. We evaluate four student models under a structured experimental framework comprising (i) controlled synthetic experiments (3 x 5 seed grid), (ii) in-distribution real-market evaluation (C2A) and (iii) cross-universe generalization (D2A). In real-market settings, models are deployed using a rolling evaluation protocol where a frozen pretrained model is periodically fine-tuned on recent observations and reset to its base state, ensuring stability while allowing limited adaptation. Results show that student models can match or outperform the CVaR teacher in several settings, while achieving improved robustness under regime shifts and reduced turnover. These findings suggest that hybrid optimization-learning approaches can enhance portfolio construction in data-constrained environments.

2604.14182 2026-04-17 stat.ME stat.ML

Cellwise Outliers

Mia Hubert, Jakob Raymaekers, Peter J. Rousseeuw

Comments This is a review paper

Abstract

In statistics and machine learning, the traditional meaning of the terms 'outlier' and 'anomaly' is a case in the dataset that behaves differently from the bulk of the data. This raises suspicion that it may belong to a different population. But nowadays increasing attention is being paid to so-called cellwise outliers. These are individual values somewhere in the data matrix (or data tensor). Depending on the dimension, even a relatively small proportion of outlying cells can contaminate over half the cases, which is a problem for existing casewise methods. It turns out that detecting cellwise outliers as well as constructing cellwise robust methods requires techniques that are quite different from the casewise setting. For instance, one has to let go of some intuitive equivariance properties. The problem is difficult, but the past decade has seen substantial progress. For high-dimensional data the cellwise approach is becoming dominant, and typically can deal with missing values as well. We review developments in the estimation of location and covariance matrices as well as regression methods, principal component analysis, methods for tensor data, and various other settings.

2604.14181 2026-04-17 math.ST stat.TH

A note on kernel density estimators with optimal bandwidths

Nils Lid Hjort, Stephen G. Walker

Comments 8 pages, 0 figures. Statistical Research Report, Department of Mathematics, University of Oslo, from June 2000, but arXiv'd April 2026. The paper is published in essentially this form in Statistics & Probability Letters, 2001, vol. 54, pages 153-159, at this url: https://www.sciencedirect.com/science/article/pii/S016771520100027X

Journal ref
Statistics & Probability Letters, 2001, vol. 54, pages 153-159
Abstract

We show that the cumulative distribution function corresponding to a kernel density estimator with optimal bandwidth lies outside any confidence interval, around the empirical distribution function, with probability tending to 1 as the sample size increases.

2604.14176 2026-04-17 cs.LG cs.AI stat.ML

The Devil Is in Gradient Entanglement: Energy-Aware Gradient Coordinator for Robust Generalized Category Discovery

Haiyang Zheng, Nan Pu, Yaqi Cai, Teng Long, Wenjing Li, Nicu Sebe, Zhun Zhong

Comments Accepted by CVPR26

Abstract

Generalized Category Discovery (GCD) leverages labeled data to categorize unlabeled samples from known or unknown classes. Most previous methods jointly optimize supervised and unsupervised objectives and achieve promising results. However, inherent optimization interference still limits their ability to improve further. Through quantitative analysis, we identify a key issue, i.e., gradient entanglement, which 1) distorts supervised gradients and weakens discrimination among known classes, and 2) induces representation-subspace overlap between known and novel classes, reducing the separability of novel categories. To address this issue, we propose the Energy-Aware Gradient Coordinator (EAGC), a plug-and-play gradient-level module that explicitly regulates the optimization process. EAGC comprises two components: Anchor-based Gradient Alignment (AGA) and Energy-aware Elastic Projection (EEP). AGA introduces a reference model to anchor the gradient directions of labeled samples, preserving the discriminative structure of known classes against the interference of unlabeled gradients. EEP softly projects unlabeled gradients onto the complement of the known-class subspace and derives an energy-based coefficient to adaptively scale the projection for each unlabeled sample according to its degree of alignment with the known subspace, thereby reducing subspace overlap without suppressing unlabeled samples that likely belong to known classes. Experiments show that EAGC consistently boosts existing methods and establishes new state-of-the-art results. Code is available at https://haiyangzheng.github.io/EAGC.

2604.13861 2026-04-17 cs.LG stat.AP

Simulation-Based Optimisation of Batting Order and Bowling Plans in T20 Cricket

Tinniam V Ganesh

Comments Improved abstract wording and readability; minor textual edits, no change to methodology or results. Submitted to the Journal of Quantitative Analysis in Sports (JQAS), April 2026. 23 pages, 8 figures

Abstract

This paper develops a unified Markov Decision Process (MDP) framework for optimising two recurring in-match decisions in T20 cricket, namely batting order selection and bowling plan assignment, directly in terms of win and defend probability rather than expected runs. A three-phase player profile engine (Powerplay, Middle, Death) with James-Stein shrinkage (a technique that blends a player's individual statistics toward the league average when their phase-specific data is sparse) is estimated from 1,161 IPL ball-by-ball records (2008-2025). Win/defend probabilities are evaluated using vectorised Monte Carlo simulation over N = 50,000 innings trajectories. Batting orders are evaluated by comparing all feasible arrangements of the remaining players and selecting the one that maximises win probability. Bowling plans are optimised through a guided search over possible over assignments, progressively improving the allocation while respecting constraints such as the prohibition on consecutive overs by the same bowler. Applied to two 2026 IPL matches, the optimal batting order improves Mumbai Indians' win probability by 4.1 percentage points (52.4% to 56.5%), and the optimal Gujarat Titans bowling plan improves defend probability by 5.2 percentage points (39.1% to 44.3%). In both cases, the observed sub-optimality is consistent with phase-agnostic deployment: decisions that appear reasonable under aggregate metrics are shown to be costly when phase-specific profiles are applied.
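The shrinkage idea mentioned in the abstract above can be illustrated with a small Python sketch. It uses the textbook positive-part James-Stein estimator that shrinks toward the grand mean with a common, known noise variance. The function name, the input values, and the equal-variance assumption are all hypothetical; the paper's player profile engine may use a different variant (for example, one that accounts for unequal sample sizes per phase).

```python
def shrink_toward_mean(estimates, noise_var):
    """Positive-part James-Stein shrinkage toward the grand mean.

    estimates: raw per-player statistics (hypothetical values).
    noise_var: assumed common sampling variance of each raw estimate.
    """
    k = len(estimates)
    if k < 4:
        # Shrinking toward an estimated mean uses the factor (k - 3),
        # so at least four estimates are needed for any shrinkage.
        raise ValueError("need at least 4 estimates")
    mean = sum(estimates) / k
    spread = sum((e - mean) ** 2 for e in estimates)
    if spread == 0.0:
        return list(estimates)  # all estimates already equal the mean
    # Fraction of each deviation from the mean that is retained, clipped at 0.
    factor = max(0.0, 1.0 - (k - 3) * noise_var / spread)
    return [mean + factor * (e - mean) for e in estimates]


# Hypothetical raw phase-specific rates for five players.
raw = [0.2, 0.3, 0.4, 0.5, 0.6]
shrunk = shrink_toward_mean(raw, noise_var=0.01)
```

With these illustrative inputs, each deviation from the mean of 0.4 is scaled by 0.8, so the extremes move from 0.2 and 0.6 to about 0.24 and 0.56. A larger `noise_var`, as would be expected for players with sparse data, pulls the estimates closer to the mean.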

2602.10955 2026-04-17 stat.ME stat.AP

Prior Smoothing for Multivariate Disease Mapping Models

Garazi Retegui, María Dolores Ugarte, Jaione Etxeberria, Alan E. Gelfand

Abstract

To date, we have seen the emergence of a large literature on multivariate disease mapping. That is, incidence of (or mortality from) multiple diseases is recorded at the scale of areal units where incidence (mortality) across the diseases is expected to manifest dependence. The modeling involves a hierarchical structure: a Poisson model for disease counts (conditioning on the rates) at the first stage, and a specification of a function of the rates using spatial random effects at the second stage. These random effects are specified as a prior and introduce spatial smoothing to the rate (or risk) estimates. What we see in the literature is the amount of smoothing induced under a given prior across areal units compared with the observed/empirical risks. Our contribution here extends previous research on smoothing in univariate areal data models. Specifically, for three different choices of multivariate prior, we investigate both within prior smoothing according to hyperparameters and across prior smoothing. Its benefit to the user is to illuminate the expected nature of departure from perfect fit associated with these priors since model performance is not a question of goodness of fit. We propose both theoretical and empirical metrics for our investigation and illustrate with both simulated and real data.

2601.14147 2026-04-17 math.OC stat.CO

Gradient flow for finding E-optimal designs

Jieling Shi, Kim-Chuan Toh, Xin T. Tong, Weng Kee Wong

Comments 44 pages, 3 figures

Abstract

The $E$-optimality criterion for a regression model maximizes the smallest eigenvalue of the information matrix and becomes non-differentiable when this eigenvalue has multiplicity greater than one. Working in the $2$-Wasserstein space, we show that the Wasserstein gradient at an empirical measure coincides, up to a constant factor, with the Euclidean particle gradient for smooth criteria such as $D$- and $L$-optimality, and that the approximation gap for equal-weight $N$-particle designs vanishes at an explicit rate. The main challenge is the nonsmooth $E$-criterion, for which the Wasserstein gradient does not exist. We replace it with a constrained Wasserstein steepest-ascent field obtained by maximizing feasible directional derivatives over the tangent cone of the design space, and prove that the resulting flow satisfies an exact energy identity and that every limit point is first-order stationary. The particle ascent computation reduces to a convex semidefinite programme whose dimension equals the multiplicity of the smallest eigenvalue. In numerical comparisons on second-order response surface models and a seven-dimensional logistic regression model, the constrained Wasserstein steepest-ascent method attains near-optimal $E$-criterion values and is markedly more reliable than particle swarm optimization in higher-dimensional settings. The framework applies more broadly to other nonsmooth minimax criteria in optimal design, and a numerical experiment on the minimax-single-parameter criterion confirms that the method attains the theoretical optimum.

2512.05024 2026-04-17 stat.ME cs.AI cs.LG

Model-Free Assessment of Simulator Fidelity via Quantile Curves

Garud Iyengar, Yu-Shiou Willy Lin, Kaizheng Wang

Comments 39 pages, 15 figures

Abstract

As generative AI models are increasingly used to simulate real-world systems, quantifying the "sim-to-real" gap is critical. For each input setting of interest -- which we call a scenario, such as a survey question or operating condition -- the real and simulated systems are associated with unobserved latent population parameters, and their discrepancy varies across scenarios. A fundamental challenge is that, for any given scenario, this discrepancy cannot be observed directly, since both systems are accessible only through finite samples, often of heterogeneous sizes across scenarios. Standard predictive inference methods are therefore ill-suited, as they quantify uncertainty in observable outputs rather than latent population parameters. To address this, we construct confidence sets for these latent parameters and use them to derive a robust proxy for the sim-to-real discrepancy. We then estimate the quantile function of this proxy to obtain a distribution-level risk profile of the simulator, which supports a broad range of statistical summaries, including statistical inference for the real output distribution in a new scenario, the calculation of risk measures like Conditional Value-at-Risk (CVaR), and principled comparisons across simulators. Our method is model-agnostic and handles general output spaces, such as categorical survey responses and continuous multi-dimensional data. We demonstrate the practical utility of this method by evaluating the alignment of four major LLMs with human populations on the WorldValueBench dataset.

2511.18107 2026-04-17 cs.LG stat.ML

Active Learning with Selective Time-Step Acquisition for PDEs

Yegon Kim, Hyunsu Kim, Gyeonghoon Ko, Juho Lee

Comments This manuscript is an improvement over the camera-ready version in ICML 2025. We have added a clearer motivation for our acquisition function. (See Sections 2.3 and 3.2)

Journal ref
ICML 2025
Abstract

Accurately solving partial differential equations (PDEs) is critical to understanding complex scientific and engineering phenomena, yet traditional numerical solvers are computationally expensive. Surrogate models offer a more efficient alternative, but their development is hindered by the cost of generating sufficient training data from numerical solvers. In this paper, we present a novel framework for active learning in PDE surrogate modeling that reduces this cost. Unlike the existing AL methods for PDEs that always acquire entire PDE trajectories, our approach, STAP (**S**elective **T**ime-Step **A**cquisition for **P**DEs), strategically generates only the most important time steps with the numerical solver, while employing the surrogate model to approximate the remaining steps. This reduces the cost incurred by each trajectory and thus allows the active learning algorithm to try out a more diverse set of trajectories given the same budget. To accommodate this novel framework, we develop an acquisition function that estimates the utility of a set of time steps by approximating its resulting variance reduction. We demonstrate the effectiveness of our method on several benchmark PDEs.

2510.10260 2026-04-17 math.OC math.PR q-fin.MF stat.ML

Robust Exploratory Stopping under Ambiguity in Reinforcement Learning

Junyan Ye, Hoi Ying Wong, Kyunghyun Park

Comments 31 pages, 9 figures, 1 table

Abstract

We propose and analyze a continuous-time robust reinforcement learning framework for optimal stopping under ambiguity. In this framework, an agent chooses a robust exploratory stopping time motivated by two objectives: robust decision-making under ambiguity and learning about the unknown environment. Here, ambiguity refers to considering multiple probability measures dominated by a reference measure, reflecting the agent's awareness that the reference measure representing her learned belief about the environment would be erroneous. Using the $g$-expectation framework, we reformulate the optimal stopping problem under ambiguity as a robust exploratory control problem with Bernoulli distributed controls. We then characterize the optimal Bernoulli distributed control via backward stochastic differential equations and, based on this, construct the robust exploratory stopping time that approximates the optimal stopping time under ambiguity. Last, we establish a policy iteration theorem and implement it as a reinforcement learning algorithm. Numerical experiments demonstrate the convergence, robustness, and scalability of our reinforcement learning algorithm across different levels of ambiguity and exploration.

2508.06179 2026-04-17 math.ST stat.TH

Consistency of variational inference for Besov priors in non-linear inverse problems

Shaokang Zu, Junxiong Jia, Zhiguo Wang

Comments 37 pages. arXiv admin note: substantial text overlap with arXiv:2409.18415

Abstract

This study investigates the variational posterior convergence rates of inverse problems for partial differential equations (PDEs) with parameters in Besov spaces $B_{pp}^α$ ($p \geq 1$) which are modeled naturally in a Bayesian manner using Besov priors constructed via random wavelet expansions with $p$-exponentially distributed coefficients. Departing from exact Bayesian inference, variational inference transforms the inference problem into an optimization problem by introducing variational sets. Building on a refined "prior mass and testing" framework, we derive general conditions on PDE operators and guarantee that variational posteriors achieve convergence rates matching those of the exact posterior under widely adopted variational families (Besov-type measures or mean-field families). Moreover, our results achieve minimax-optimal rates over $B^α_{pp}$ classes, significantly outperforming the suboptimal rates of Gaussian priors (by a polynomial factor). As specific examples, two typical nonlinear inverse problems, the Darcy flow problems and the inverse potential problem for a subdiffusion equation, are investigated to validate our theory. Besides, we show that our convergence rates of "prediction" loss for these "PDE-constrained regression problems" are minimax optimal.

2506.18994 2026-04-17 stat.ME stat.ML

Causal Decomposition Analysis with Synergistic Interventions: A Triply-Robust Machine Learning Approach to Addressing Multiple Dimensions of Social Disparities

Soojin Park, Su Yeon Kim, Xinyao Zheng, Chioun Lee

Comments The case study section contains errors due to coding issues. Therefore, I would like to withdraw the paper

Abstract

Educational disparities are rooted in and perpetuate social inequalities across multiple dimensions such as race, socioeconomic status, and geography. To reduce disparities, most intervention strategies focus on a single domain and frequently evaluate their effectiveness by using causal decomposition analysis. However, a growing body of research suggests that single-domain interventions may be insufficient for individuals marginalized on multiple fronts. While interventions across multiple domains are increasingly proposed, there is limited guidance on appropriate methods for evaluating their effectiveness. To address this gap, we develop an extended causal decomposition analysis that simultaneously targets multiple causally ordered intervening factors, allowing for the assessment of their synergistic effects. These scenarios often involve challenges related to model misspecification due to complex interactions among group categories, intervening factors, and their confounders with the outcome. To mitigate these challenges, we introduce a triply robust estimator that leverages machine learning techniques to address potential model misspecification. We apply our method to a cohort of students from the High School Longitudinal Study, focusing on math achievement disparities between Black, Hispanic, and White high schoolers. Specifically, we examine how two sequential interventions - equalizing the proportion of students who attend high-performing schools and equalizing enrollment in Algebra I by 9th grade across racial groups - may reduce these disparities.

2506.13763 2026-04-17 cs.LG cs.AI cs.CV stat.ML

Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value

Yixian Xu, Shengjie Luo, Liwei Wang, Di He, Chang Liu

Comments 33 pages, 12 figures, 9 tables. ICLR 2026 Camera Ready version

Abstract

Diffusion models have achieved remarkable success in generative modeling. Despite more stable training, the loss of diffusion models is not indicative of absolute data-fitting quality, since its optimal value is typically not zero but unknown, leading to confusion between large optimal loss and insufficient model capacity. In this work, we advocate the need to estimate the optimal loss value for diagnosing and improving diffusion models. We first derive the optimal loss in closed form under a unified formulation of diffusion models, and develop effective estimators for it, including a stochastic variant scalable to large datasets with proper control of variance and bias. With this tool, we unlock the inherent metric for diagnosing the training quality of mainstream diffusion model variants, and develop a more performant training schedule based on the optimal loss. Moreover, using models with 120M to 1.5B parameters, we find that the power law is better demonstrated after subtracting the optimal loss from the actual training loss, suggesting a more principled setting for investigating the scaling law for diffusion models.

2506.13139 2026-04-17 stat.ML cs.LG

Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models

Zhenyu Liao, Michael W. Mahoney

Comments 30 pages, 6 figures, to appear in IEEE Signal Processing Magazine

Abstract

Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data and rely on overparameterized models, where classical low-dimensional intuitions break down. In particular, the proportional regime where the data dimension, sample size, and number of model parameters are all large and comparable, gives rise to novel and sometimes counterintuitive behaviors. This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models such as DNNs in this regime. We introduce the concept of High-dimensional Equivalent, which unifies and generalizes both Deterministic Equivalent and Linear Equivalent, to systematically address three technical challenges: high dimensionality, nonlinearity, and the need to analyze generic eigenspectral functionals. Leveraging this framework, we provide precise characterizations of the training and generalization performance of linear models, nonlinear shallow networks, and deep networks. Our results capture rich phenomena, including scaling laws, double descent, and nonlinear learning dynamics, offering a unified perspective on the theoretical understanding of deep learning in high dimensions.

2506.11251 2026-04-17 stat.ME cs.AI cs.LG

Measuring multi-calibration

Ido Guy, Daniel Haimovich, Fridolin Linder, Nastaran Okati, Lorenzo Perini, Niek Tax, Mark Tygert

Comments 25 pages, 12 tables

Details
English abstract

A suitable scalar metric can help measure multi-calibration, defined as follows. When the expected values of observed responses are equal to corresponding predicted probabilities, the probabilistic predictions are known as "perfectly calibrated." When the predicted probabilities are perfectly calibrated simultaneously across several subpopulations, the probabilistic predictions are known as "perfectly multi-calibrated." In practice, predicted probabilities are seldom perfectly multi-calibrated, so a statistic measuring the distance from perfect multi-calibration is informative. A recently proposed metric for calibration, based on the classical Kuiper statistic, is a natural basis for a new metric of multi-calibration and avoids well-known problems of metrics based on binning or kernel density estimation. The newly proposed metric weights the contributions of different subpopulations in proportion to their signal-to-noise ratios; ablations in the data analyses demonstrate that the metric becomes noisy when the signal-to-noise ratios are omitted. Numerical examples on benchmark data sets illustrate the new metric.
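To make the idea of a binning-free calibration statistic concrete, here is a minimal Python sketch of a Kuiper-type statistic for a single population. The abstract does not state the formula, so the construction below is an assumption: sort observations by predicted probability, accumulate the differences between responses and predictions (each divided by the sample size), and report the range of that running sum. The function name `kuiper_calibration` and the toy data are hypothetical, and the signal-to-noise weighting across subpopulations described in the abstract is not implemented.

```python
def kuiper_calibration(scores, responses):
    """Range (max minus min) of the cumulative response-minus-score differences.

    Assumed construction, not taken from the paper: observations are sorted
    by predicted probability, and ties keep their input order.
    """
    n = len(scores)
    if n == 0 or n != len(responses):
        raise ValueError("scores and responses must be non-empty and equal in length")
    order = sorted(range(n), key=lambda i: scores[i])
    cumulative = 0.0
    highest = 0.0  # the running sum starts at zero, so zero is included
    lowest = 0.0
    for i in order:
        cumulative += (responses[i] - scores[i]) / n
        highest = max(highest, cumulative)
        lowest = min(lowest, cumulative)
    return highest - lowest


# Hypothetical toy data: predictions that match the outcomes exactly,
# and predictions of 0.5 for outcomes that are always 1.
matched = kuiper_calibration([0.0, 0.0, 1.0, 1.0], [0, 0, 1, 1])
miscalibrated = kuiper_calibration([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 1])
print(matched, miscalibrated)
```

A value near zero indicates that responses track the predicted probabilities along the whole sorted range, while larger values indicate a stretch of scores where predictions are systematically too high or too low.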

2505.07153 2026-04-17 stat.ME

Enhancing Inference for Small Cohorts via Transfer Learning and Weighted Integration of Multiple Datasets

Subharup Guha, Mengqi Xu, Yi Li

Details
English abstract

Lung sepsis remains a significant concern in the Northeastern U.S., yet the national eICU Collaborative Database includes only a small number of patients from this region, highlighting underrepresentation. Understanding clinical variables such as FiO2, creatinine, platelets, and lactate, which reflect oxygenation, kidney function, coagulation, and metabolism, is crucial because these markers influence sepsis outcomes and may vary by sex. Transfer learning helps address small sample sizes by borrowing information from larger datasets, although differences in covariates and outcome-generating mechanisms between the target and external cohorts can complicate the process. We propose a novel weighting method, TRANSfer LeArning wiTh wEights (TRANSLATE), to integrate data from various sources by incorporating domain-specific characteristics through learned weights that align external data with the target cohort. These weights adjust for cohort differences, are proportional to each cohort's effective sample size, and downweight dissimilar cohorts. TRANSLATE offers theoretical guarantees for improved precision and applies to a wide range of estimands, including means, variances, and distribution functions. Simulations and a real-data application to sepsis outcomes in the Northeast cohort, using a much larger sample from other U.S. regions, show that the method enhances inference while accounting for regional heterogeneity.

2504.20470 2026-04-17 stat.ME

The Promises of Multiple Experiments: Identifying Joint Distribution of Potential Outcomes

Peng Wu, Xiaojie Mao

Details
English abstract

Typical causal effects are defined based on the marginal distribution of potential outcomes. However, many real-world applications require causal estimands involving the joint distribution of potential outcomes to enable more nuanced treatment evaluation and selection. In this article, we propose a novel framework for identifying and estimating the joint distribution of potential outcomes using multiple experimental datasets. We introduce the assumption of transportability of state transition probabilities for potential outcomes across datasets and establish the identification of the joint distribution under this assumption, along with a regular full-column rank condition. The key identification assumptions are testable in an overidentified setting and are analogous to those in the context of instrumental variables, with the dataset indicator serving as the "instrument". Moreover, we propose an easy-to-use least-squares-based estimator for the joint distribution of potential outcomes in each dataset, proving its consistency and asymptotic normality. We further extend the proposed framework to identify and estimate principal causal effects. We empirically demonstrate the proposed framework by conducting extensive simulations and applying it to evaluate the surrogate endpoint in a real-world application.

2503.06538 2026-04-17 stat.ME

Association measures for two-way contingency tables based on multi-categorical proportional reduction in error

Wataru Urasaki, Kouji Tahata, Sadao Tomizawa

Details
English abstract

In two-way contingency tables under an asymmetric situation, where the row and column variables are defined as explanatory and response variables, respectively, quantifying the extent to which the explanatory variable contributes to predicting the response variable is important. One quantification method is the association measure, which indicates the degree of association in a range from $0$ to $1$. Among various measures that have been proposed, those based on proportional reduction in error (PRE) are particularly notable for their simplicity and intuitive interpretation. These measures, including Goodman-Kruskal's lambda proposed in 1954, are widely implemented in statistical software such as R and SAS and remain extensively used. However, a well-known limitation of PRE measures is that they can return a value of $0$ even when the row and column variables are not independent. This issue arises because the measures are constructed based solely on the maximum joint and marginal probabilities, failing to make full use of the information available in the contingency table. To address this problem, we propose an extension of PRE measures designed for the proportional reduction in error with multiple categories. The properties of the proposed measures are examined, and their utility is demonstrated through numerical experiments. The results suggest their potential as practical tools in applied statistics.
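The limitation described above can be checked on a small table. The Python sketch below computes the classical Goodman-Kruskal lambda for predicting the column (response) variable from the row (explanatory) variable, using the standard formula (sum of row maxima minus the largest column total, divided by the grand total minus the largest column total). It does not implement the extension proposed in the paper; the function names and the two example tables are hypothetical.

```python
def goodman_kruskal_lambda(table):
    """Classical lambda for predicting the column category from the row category."""
    total = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    max_col = max(col_totals)
    if total == max_col:
        raise ValueError("lambda is undefined when one response category holds all counts")
    sum_row_max = sum(max(row) for row in table)
    return (sum_row_max - max_col) / (total - max_col)


def is_independent(table, tol=1e-12):
    """True if every cell proportion equals the product of its marginal proportions."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(
        abs(table[i][j] / total - row_totals[i] * col_totals[j] / total**2) <= tol
        for i in range(len(table))
        for j in range(len(table[0]))
    )


# Hypothetical counts: the same column is the modal response in every row,
# so knowing the row never changes the best guess.
dependent = [[40, 10], [30, 20]]
# Hypothetical counts: the row determines the column exactly.
diagonal = [[50, 0], [0, 50]]

print(goodman_kruskal_lambda(dependent), is_independent(dependent))
print(goodman_kruskal_lambda(diagonal), is_independent(diagonal))
```

In the first table lambda is zero although the cell proportions differ from the products of the marginals (0.4 versus 0.5 × 0.7 = 0.35 in the top-left cell), which is the situation the abstract describes: only the maxima enter the formula, so the remaining structure of the table is ignored.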

2502.01254 2026-04-17 math.ST stat.TH

A necessary and sufficient condition for convergence in distribution of the quantile process in $L^1(0,1)$

Brendan K. Beare, Tetsuya Kaji

Comments 22 pages

Details
English abstract

We establish a necessary and sufficient condition for the quantile process based on iid sampling to converge in distribution in $L^1(0,1)$. The condition is that the quantile function is locally absolutely continuous and satisfies a slight strengthening of square integrability. If the quantile process converges in distribution then it may be approximated using the bootstrap.

2501.11315 2026-04-17 stat.AP q-bio.QM stat.ML

High-dimensional point forecast combinations for emergency department demand

Peihong Guo, Wen Ye Loh, Kenwin Maung, Esther Li Wen Choo, Borame Lee Dickens, Kelvin Bryan Tan, John Abishgenadan, Pei Ma, Jue Tao Lim

Details
Journal ref
BMC Emerg Med 26, 83 (2026)
English abstract

Current work on forecasting emergency department (ED) admissions focuses on disease aggregates or singular disease types. However, given differences in the dynamics of individual diseases, it is unlikely that any single forecasting model would accurately account for each disease and for all time, leading to significant forecast model uncertainty. Yet, forecasting models for ED admissions to date do not explore the utility of forecast combinations to improve forecast accuracy and stability. It is also unknown whether improvements in forecast accuracy can be obtained from (1) incorporating a large number of environmental and anthropogenic covariates or (2) forecasting total ED causes by aggregating cause-specific ED forecasts. To address this gap, we propose high-dimensional forecast combination schemes to combine a large number of individual forecasting models for forecasting cause-specific ED admissions over multiple causes and forecast horizons. We use time series data of ED admissions with an extensive set of explanatory lagged variables at the national level, including meteorological/ambient air pollutant variables and ED admissions of all 16 causes studied. We show that the simple forecast combinations yield forecast accuracies of around 3.81%-23.54% across causes. Furthermore, forecast combinations outperform individual forecasting models in more than 50% of scenarios (across all ED admission categories and horizons) in a statistically significant manner. Inclusion of high-dimensional covariates and aggregating cause-specific forecasts to provide all-cause ED forecasts provided modest improvements in forecast accuracy. Forecasting cause-specific ED admissions can provide fine-scale forward guidance on resource optimization and pandemic preparedness, and forecast combinations can be used to hedge against model uncertainty when forecasting across a wide range of admission categories.

2501.09331 2026-04-17 cs.LG stat.ML

Identifying Information from Observations with Uncertainty and Novelty

Derek S. Prijatelj, Timothy J. Ireland, Walter J. Scheirer

Comments 29 pages, 4 figures, 2 tables, and 2 inline algorithms

Details
English abstract

A machine that learns a task from observations must encounter and process uncertainty and novelty, especially when it is to maintain performance when observing new information and to select the hypothesis that best fits the current observations. In this context, some key questions arise: what and how much information did the observations provide, how much information is required to identify the data-generating process, how many observations remain to get that information, and how does a predictor determine that it has observed novel information? We formalize identifying information to answer these questions and synthesize prior works. Identifying information consists of the bits that verify or falsify a hypothesis as the data-generating process. In this formalization, we prove the information theoretic characteristics of the computation of hypothesis identification and the resulting sample complexity. We define hypothesis identification and sample complexity via the computation of an indicator function over a set of hypotheses, bridging algorithmic and probabilistic information. We detail the sample complexity and its properties for data-generating processes ranging from deterministic processes to ergodic stationary stochastic processes, which connect the notion of identifying information in finite steps with asymptotic statistics and PAC-learning. The indicator function's computation naturally formalizes novel information and its identification from observations with respect to a hypothesis set, which detects a misspecified hypothesis set. We also prove that a computable PAC-Bayes learner's sample complexity distribution is determined by its moments in terms of the prior probability distribution over a fixed finite hypothesis set, and thus an approximation of the sample complexity distribution is always computable within the desired precision that resources allow.

2307.02582 2026-04-17 q-fin.ST math.PR math.ST stat.TH

Estimating the roughness exponent of stochastic volatility from discrete observations of the integrated variance

Xiyue Han, Alexander Schied

Comments 50 pages, 3 figures

Details
English abstract

We consider the problem of estimating the roughness of the volatility process in a stochastic volatility model that arises as a nonlinear function of fractional Brownian motion with drift. To this end, we introduce a new estimator that measures the so-called roughness exponent of a continuous trajectory, based on discrete observations of its antiderivative. The estimator has a very simple form and can be computed with great efficiency on large data sets. It is not derived from distributional assumptions but from strictly pathwise considerations. We provide conditions on the underlying trajectory under which our estimator converges in a strictly pathwise sense. Then we verify that these conditions are satisfied by almost every sample path of fractional Brownian motion (with drift). As a consequence, we obtain strong consistency theorems in the context of a large class of rough volatility models, such as the rough fractional volatility model and the rough Bergomi model. We also demonstrate that our estimator is robust with respect to proxy errors between the integrated and realized variance, and that it can be applied to estimate the roughness exponent directly from the price trajectory. Numerical simulations show that our estimation procedure performs well after passing to a scale-invariant modification of our estimator.

2304.08974 2026-04-17 econ.EM stat.ME

Doubly Robust Estimators with Weak Overlap

Yukun Ma, Pedro H. C. Sant'Anna, Yuya Sasaki, Takuya Ura

Details
English abstract

Doubly robust (DR) estimators guard against model misspecification but remain sensitive to weak covariate overlap. We show that trimming propensity scores reduces variance but eliminates double robustness. We introduce DR estimators that retain double robustness after trimming through bias correction, preserving the original causal targets across unconfoundedness, instrumental variables, and difference-in-differences designs. In four applications, the proposed estimator yields more precise estimates: ruling out large mortality effects of Medicaid expansion, detecting workforce growth from mental health reform, recovering the Black-White test score gap without strong functional form restrictions, and recovering a positive 401(k) savings effect consistent with the prior literature.

2301.07386 2026-04-17 q-bio.NC stat.AP

Hierarchical Bayesian inference for community detection and connectivity of functional brain networks

Lingbin Bian, Nizhuan Wang, Leonardo Novelli, Jonathan Keith, Adeel Razi

Details
Journal ref
IEEE Transactions on Medical Imaging, 2026
English abstract

Most functional magnetic resonance imaging studies rely on estimates of hierarchically organized functional brain networks whose segregation and integration reflect the cognitive and behavioral changes in humans. However, most existing methods for estimating the community structure of networks, in both individual and group-level analyses, do not account for the variability between subjects. In this paper, we develop a new multilayer community detection method based on a Bayesian latent block model (LBM). The method can robustly detect the community structure of weighted functional networks with an unknown number of communities at both individual and group levels and retain the variability of the individual networks. For validation, we propose a new community structure-based multivariate Gaussian generative model to simulate synthetic signals. Our simulation study shows that the community memberships estimated by hierarchical Bayesian inference are consistent with the predefined node labels in the generative model. The method is also tested via split-half reproducibility using working memory task fMRI data of 100 unrelated healthy subjects from the Human Connectome Project. Analyses using both synthetic and real data show that our proposed method is more accurate and reliable than the commonly used (multilayer) modularity models.

2104.03436 2026-04-17 math.ST stat.ME stat.TH

Synthetic likelihood in misspecified models

David T. Frazier, Christopher Drovandi, David J. Nott

Details
English abstract

Bayesian synthetic likelihood is a widely used approach for conducting Bayesian analysis in complex models where evaluation of the likelihood is infeasible but simulation from the assumed model is tractable. We analyze the behaviour of the Bayesian synthetic likelihood posterior when the assumed model differs from the actual data generating process. We demonstrate that the Bayesian synthetic likelihood posterior can display a wide range of non-standard behaviours depending on the level of model misspecification, including multimodality and asymptotic non-Gaussianity. Our results suggest that likelihood tempering, a common approach for robust Bayesian inference, fails for synthetic likelihood whilst recently proposed robust synthetic likelihood approaches can ameliorate this behavior and deliver reliable posterior inference under model misspecification. All results are illustrated using a simple running example.