arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.12234 2026-02-13 stat.ME math.OC

Batch-based Bayesian Optimal Experimental Design in Linear Inverse Problems

Sofia Mäkinen, Andrew B. Duncan, Tapio Helin

Comments 25 pages, 5 figures

Abstract

Experimental design is central to science and engineering. A ubiquitous challenge is how to maximize the value of information obtained from expensive or constrained experimental settings. Bayesian optimal experimental design (OED) provides a principled framework for addressing such questions. In this paper, we study experimental design problems such as the optimization of sensor locations over a continuous domain in the context of linear Bayesian inverse problems. We focus in particular on batch design, that is, the simultaneous optimization of multiple design variables, which leads to a notoriously difficult non-convex optimization problem. We tackle this challenge using a promising strategy recently proposed in the frequentist setting, which relaxes A-optimal design to the space of finite positive measures. Our main contribution is the rigorous identification of the Bayesian inference problem corresponding to this relaxed A-optimal OED formulation. Moreover, building on recent work, we develop a Wasserstein gradient-flow-based optimization algorithm for the expected utility and introduce novel regularization schemes that guarantee convergence to an empirical measure. These theoretical results are supported by numerical experiments demonstrating both convergence and the effectiveness of the proposed regularization strategy.

2602.12216 2026-02-13 stat.OT

Bayesian inference for the automultinomial model with an application to landcover data

Maria Paula Duenas-Herrera, Stephen Berg, Murali Haran

Abstract

Multicategory lattice data arise in a wide variety of disciplines such as image analysis, biology, and forestry. We consider modeling such data with the automultinomial model, which can be viewed as a natural extension of the autologistic model to multicategory responses, or equivalently as an extension of the Potts model that incorporates covariate information into a pure-intercept model. The automultinomial model has the advantage of having a unique parameter that controls the spatial correlation. However, the model's likelihood involves an intractable normalizing function of the model parameters that poses serious computational problems for likelihood-based inference. We address this difficulty by performing Bayesian inference through the double Metropolis-Hastings algorithm, and implement diagnostics to assess the convergence to the target posterior distribution. Through simulation studies and an application to land cover data, we find that the automultinomial model is flexible across a wide range of spatial correlations while maintaining a relatively simple specification. For large data sets we find it also has advantages over spatial generalized linear mixed models. To make this model practical for scientists, we provide recommendations for its specification and computational implementation.

2602.12082 2026-02-13 cs.LG stat.ML

Empirical Gaussian Processes

Jihao Andreas Lin, Sebastian Ament, Louis C. Tiao, David Eriksson, Maximilian Balandat, Eytan Bakshy

Abstract

Gaussian processes (GPs) are powerful and widely used probabilistic regression models, but their effectiveness in practice is often limited by the choice of kernel function. This kernel function is typically handcrafted from a small set of standard functions, a process that requires expert knowledge, results in limited adaptivity to data, and imposes strong assumptions on the hypothesis space. We study Empirical GPs, a principled framework for constructing flexible, data-driven GP priors that overcome these limitations. Rather than relying on standard parametric kernels, we estimate the mean and covariance functions empirically from a corpus of historical observations, enabling the prior to reflect rich, non-trivial covariance structures present in the data. Theoretically, we show that the resulting model converges to the GP that is closest (in the KL-divergence sense) to the real data-generating process. Practically, we formulate the problem of learning the GP prior from independent datasets as likelihood estimation and derive an Expectation-Maximization algorithm with closed-form updates, allowing the model to handle heterogeneous observation locations across datasets. We demonstrate that Empirical GPs achieve competitive performance on learning curve extrapolation and time series forecasting benchmarks.
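
To make the idea of an empirically estimated prior concrete, here is a minimal sketch of the simplest case only: every historical curve is observed on the same grid, so the prior mean and covariance are plain sample moments. This is not the Expectation-Maximization algorithm from the abstract, which handles heterogeneous observation locations. The function name `empirical_gp_prior` and the 1/N normalization are assumptions made for illustration.

```python
def empirical_gp_prior(curves):
    """Sample mean vector and covariance matrix of curves on a shared grid.

    curves: list of equal-length lists, one per historical dataset
    (hypothetical input format). Uses the 1/N (maximum-likelihood)
    normalization, which is an assumption of this sketch.
    """
    n = len(curves)
    if n == 0:
        raise ValueError("need at least one curve")
    m = len(curves[0])
    if any(len(c) != m for c in curves):
        raise ValueError("all curves must share the same grid")

    # Pointwise average over the corpus gives the prior mean.
    mean = [sum(c[j] for c in curves) / n for j in range(m)]

    # Averaged outer products of the centered curves give the prior covariance.
    cov = [
        [sum((c[i] - mean[i]) * (c[j] - mean[j]) for c in curves) / n
         for j in range(m)]
        for i in range(m)
    ]
    return mean, cov


# Two toy curves on a two-point grid.
mean, cov = empirical_gp_prior([[0.0, 0.0], [2.0, 4.0]])
```

The resulting `mean` and `cov` could then serve as the mean vector and Gram matrix of a GP prior restricted to that grid.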

2602.12072 2026-02-13 stat.AP

Enhanced Forest Inventories for Habitat Mapping: A Case Study in the Sierra Nevada Mountains of California

Maxime Turgeon, Michael Kieser, Dwight Wolfe, Bruce MacArthur

Comments 11 pages, 6 figures

Abstract

Traditional forest inventory systems, originally designed to quantify merchantable timber volume, often lack the spatial resolution and structural detail required for modern multi-resource ecosystem management. In this manuscript, we present an Enhanced Forest Inventory (EFI) and demonstrate its utility for high-resolution wildlife habitat mapping. The project area covers 270,000 acres of the Eldorado National Forest in California's Sierra Nevada. By integrating 118 ground-truth Forest Inventory and Analysis (FIA) plots with multi-modal remote sensing data (LiDAR, aerial photography, and Sentinel-2 satellite imagery), we developed predictive models for key forest attributes. Our methodology employed a two-tier segmentation approach, partitioning the landscape into approximately 575,000 reporting units with an average size of 0.5 acre to capture forest heterogeneity. We utilized an Elastic-Net Regression framework and automated feature selection to relate remote sensing metrics to ground-measured variables such as basal area, stems per acre, and canopy cover. These physical metrics were translated into functional habitat attributes to evaluate suitability for two focal species: the California Spotted Owl (Strix occidentalis occidentalis) and the Pacific Fisher (Pekania pennanti). Our analysis identified 25,630 acres of nesting and 26,622 acres of foraging habitat for the owl, and 25,636 acres of likely habitat for the fisher based on structural requirements like large-diameter trees and high canopy closure. The results demonstrate that EFIs provide a critical bridge between forestry and conservation ecology, offering forest managers a spatially explicit tool to monitor ecosystem health and manage vulnerable species in complex environments.

2602.12043 2026-02-13 econ.EM stat.ME stat.ML

Improved Inference for CSDID Using the Cluster Jackknife

Sunny R. Karim, Morten Ørregaard Nielsen, James G. MacKinnon, Matthew D. Webb

Abstract

Obtaining reliable inferences with traditional difference-in-differences (DiD) methods can be difficult. Problems can arise when both outcomes and errors are serially correlated, when there are few clusters or few treated clusters, when cluster sizes vary greatly, and in various other cases. In recent years, recognition of the "staggered adoption" problem has shifted the focus away from inference towards consistent estimation of treatment effects. One of the most popular new estimators is the CSDID procedure of Callaway and Sant'Anna (2021). We find that the issues of over-rejection with few clusters and/or few treated clusters are at least as severe for CSDID as for traditional DiD methods. We also propose using a cluster jackknife for inference with CSDID, which simulations suggest greatly improves inference. We provide software packages for Stata (csdidjack) and R (didjack) to calculate cluster-jackknife standard errors easily.
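
As a rough illustration of the cluster jackknife idea, the sketch below re-computes a statistic with each cluster left out in turn and forms a standard error from the spread of those leave-one-cluster-out estimates. It is not the csdidjack or didjack implementation. The function name `cluster_jackknife_se`, the toy data, and the use of a plain sample mean in place of the CSDID estimator are assumptions made for illustration. The variance is centered on the mean of the leave-one-out estimates, which is one common convention.

```python
import math


def cluster_jackknife_se(clusters, estimator):
    """Leave-one-cluster-out jackknife standard error.

    clusters:  dict mapping a cluster id to its list of observations
    estimator: function taking a flat list of observations and
               returning a scalar estimate (a stand-in for CSDID here)
    """
    ids = sorted(clusters)
    G = len(ids)
    if G < 2:
        raise ValueError("need at least two clusters")

    # Re-estimate with each cluster dropped in turn.
    loo = []
    for g in ids:
        kept = [obs for h in ids if h != g for obs in clusters[h]]
        loo.append(estimator(kept))

    # Jackknife variance: (G-1)/G times the sum of squared deviations.
    center = sum(loo) / G
    variance = (G - 1) / G * sum((t - center) ** 2 for t in loo)
    return math.sqrt(variance)


# Toy example: three clusters, estimator is the sample mean.
data = {"a": [1.0, 1.0], "b": [2.0, 2.0], "c": [3.0, 3.0]}
se = cluster_jackknife_se(data, lambda xs: sum(xs) / len(xs))
```

Because whole clusters are dropped, the spread of the leave-one-out estimates reflects between-cluster variation, which is the quantity that matters when there are few clusters.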

2602.11947 2026-02-13 math.OC stat.ML

Mixed-Integer Programming for Change-point Detection

Apoorva Narula, Santanu S. Dey, Yao Xie

Abstract

We present a new mixed-integer programming (MIP) approach for offline multiple change-point detection by casting the problem as a globally optimal piecewise linear (PWL) fitting problem. Our main contribution is a family of strengthened MIP formulations whose linear programming (LP) relaxations admit integral projections onto the segment assignment variables, which encode the segment membership of each data point. This property yields provably tighter relaxations than existing formulations for offline multiple change-point detection. We further extend the framework to two settings of active research interest: (i) multidimensional PWL models with shared change-points, and (ii) sparse change-point detection, where only a subset of dimensions undergo structural change. Extensive computational experiments on benchmark real-world datasets demonstrate that the proposed formulations achieve reductions in solution times under both $\ell_1$ and $\ell_2$ loss functions in comparison to the state-of-the-art.

2602.11920 2026-02-13 cs.LG stat.ML

Learning Conditional Averages

Marco Bressan, Nataly Brukhim, Nicolo Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen

Abstract

We introduce the problem of learning conditional averages in the PAC framework. The learner receives a sample labeled by an unknown target concept from a known concept class, as in standard PAC learning. However, instead of learning the target concept itself, the goal is to predict, for each instance, the average label over its neighborhood -- an arbitrary subset of points that contains the instance. In the degenerate case where all neighborhoods are singletons, the problem reduces exactly to classic PAC learning. More generally, it extends PAC learning to a setting that captures learning tasks arising in several domains, including explainability, fairness, and recommendation systems. Our main contribution is a complete characterization of when conditional averages are learnable, together with sample complexity bounds that are tight up to logarithmic factors. The characterization hinges on the joint finiteness of two novel combinatorial parameters, which depend on both the concept class and the neighborhood system, and are closely related to the independence number of the associated neighborhood graph.

2602.11857 2026-02-13 cs.GT cs.LG stat.ML

Scale-Invariant Fast Convergence in Games

Taira Tsuchiya, Haipeng Luo, Shinji Ito

Comments 44 pages

Abstract

Scale-invariance in games has recently emerged as a widely valued desirable property. Yet, almost all fast convergence guarantees in learning in games require prior knowledge of the utility scale. To address this, we develop learning dynamics that achieve fast convergence while being both scale-free, requiring no prior information about utilities, and scale-invariant, remaining unchanged under positive rescaling of utilities. For two-player zero-sum games, we obtain scale-free and scale-invariant dynamics with external regret bounded by $\tilde{O}(A_{\mathrm{diff}})$, where $A_{\mathrm{diff}}$ is the payoff range, which implies an $\tilde{O}(A_{\mathrm{diff}} / T)$ convergence rate to Nash equilibrium after $T$ rounds. For multiplayer general-sum games with $n$ players and $m$ actions, we obtain scale-free and scale-invariant dynamics with swap regret bounded by $O(U_{\mathrm{max}} \log T)$, where $U_{\mathrm{max}}$ is the range of the utilities, ignoring the dependence on the number of players and actions. This yields an $O(U_{\mathrm{max}} \log T / T)$ convergence rate to correlated equilibrium. Our learning dynamics are based on optimistic follow-the-regularized-leader with an adaptive learning rate that incorporates the squared path length of the opponents' gradient vectors, together with a new stopping-time analysis that exploits negative terms in regret bounds without scale-dependent tuning. For general-sum games, scale-free learning is enabled also by a technique called doubling clipping, which clips observed gradients based on past observations.

2602.10515 2026-02-13 econ.EM stat.ME

Quantile optimization in semidiscrete optimal transport

Yinchu Zhu, Ilya O. Ryzhov

Abstract

Optimal transport is the problem of designing a joint distribution for two random variables with fixed marginals. In virtually the entire literature on this topic, the objective is to minimize expected cost. This paper is the first to study a variant in which the goal is to minimize a quantile of the cost, rather than the mean. For the semidiscrete setting, where one distribution is continuous and the other is discrete, we derive a complete characterization of the optimal transport plan and develop simulation-based methods to efficiently compute it. One particularly novel aspect of our approach is the efficient computation of a tie-breaking rule that preserves marginal distributions. In the context of geographical partitioning problems, the optimal plan is shown to produce a novel geometric structure.

2602.07488 2026-02-13 cs.LG cs.AI stat.ML

Deriving Neural Scaling Laws from the statistics of natural language

Francesco Cagnetta, Allan Raventós, Surya Ganguli, Matthieu Wyart

Abstract

Despite the fact that experimental neural scaling laws have substantially guided empirical progress in large-scale machine learning, no existing theory can quantitatively predict the exponents of these important laws for any modern LLM trained on any natural language dataset. We provide the first such theory in the case of data-limited scaling laws. We isolate two key statistical properties of language that alone can predict neural scaling exponents: (i) the decay of pairwise token correlations with time separation between token pairs, and (ii) the decay of the next-token conditional entropy with the length of the conditioning context. We further derive a simple formula in terms of these statistics that predicts data-limited neural scaling exponents from first principles without any free parameters or synthetic data models. Our theory exhibits a remarkable match with experimentally measured neural scaling laws obtained from training GPT-2 and LLaMA style models from scratch on two qualitatively different benchmarks, TinyStories and WikiText.

2602.05716 2026-02-13 stat.ME stat.CO

MixMashNet: An R Package for Single and Multilayer Networks

Maria De Martino, Federico Triolo, Adrien Perigord, Alice Margherita Ornago, Davide Liborio Vetrano, Caterina Gregorio

Abstract

The R package MixMashNet provides an integrated framework for estimating and analyzing single and multilayer networks using Mixed Graphical Models (MGMs), accommodating continuous, count, and categorical variables. In the multilayer setting, layers may comprise different types and numbers of variables, and users can explicitly impose a predefined multilayer topology. Bootstrap procedures are implemented to quantify sampling uncertainty for edge weights and node-level centrality indices. In addition, the package includes tools to assess the stability of node community membership and to compute community scores that summarize the latent dimensions identified through network clustering. MixMashNet also offers interactive Shiny applications to support exploration, visualization, and interpretation of the estimated networks.

2602.03466 2026-02-13 quant-ph stat.ML

Quantum Circuit Generation via test-time learning with large language models

Adriano Macarone-Palmieri, Rosario Lo Franco

Comments 9 pages, 1 figure

Abstract

Large language models (LLMs) can generate structured artifacts, but using them as dependable optimizers for scientific design requires a mechanism for iterative improvement under black-box evaluation. Here, we cast quantum circuit synthesis as a closed-loop, test-time optimization problem: an LLM proposes edits to a fixed-length gate list, and an external simulator evaluates the resulting state with the Meyer-Wallach (MW) global entanglement measure. We introduce a lightweight test-time learning recipe that reuses prior high-performing candidates as an explicit memory trace, augments prompts with score-difference feedback, and applies restart-from-the-best sampling to escape potential plateaus. Across fixed 20-qubit settings, the loop without feedback and restart-from-the-best improves random initial circuits over a range of gate budgets. To raise both performance and success rate, we use the full learning strategy. In the 25-qubit setting, it mitigates the pronounced performance plateau that arises under naive querying. Beyond raw scores, we analyze the structure of synthesized states and find that high-MW solutions can correspond to stabilizer or graph-state-like constructions, but full connectivity is not guaranteed due to the metric property and prompt design. These results illustrate both the promise and the pitfalls of memory evaluator-guided LLM optimization for circuit synthesis, highlighting the critical role of prior human-made theorems in optimally designing a custom tool in support of research.
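
For readers unfamiliar with the score used in this loop, the following is a minimal sketch of the Meyer-Wallach measure for a pure state, in the equivalent purity form Q = 2(1 - (1/n) Σ_k Tr ρ_k²), where ρ_k is the reduced state of qubit k. It is not the authors' simulator. The function name `meyer_wallach` and the convention that qubit k corresponds to bit k of the basis-state index are assumptions of this sketch.

```python
import math


def meyer_wallach(psi, n):
    """Meyer-Wallach global entanglement of an n-qubit pure state.

    psi: list of 2**n amplitudes (float or complex), assumed normalized.
    Qubit k is taken to be bit k of the basis index (an assumption).
    """
    if len(psi) != 2 ** n:
        raise ValueError("state vector must have length 2**n")

    total_purity = 0.0
    for k in range(n):
        mask = 1 << k
        r00 = 0.0   # population of |0> on qubit k
        r11 = 0.0   # population of |1> on qubit k
        r01 = 0j    # coherence between |0> and |1> on qubit k
        for i in range(len(psi)):
            if i & mask:
                continue  # visit each pair (i, i | mask) exactly once
            a0 = psi[i]
            a1 = psi[i | mask]
            r00 += abs(a0) ** 2
            r11 += abs(a1) ** 2
            r01 += a0 * a1.conjugate()
        # Tr(rho_k^2) for a 2x2 density matrix.
        total_purity += r00 ** 2 + r11 ** 2 + 2 * abs(r01) ** 2

    return 2.0 * (1.0 - total_purity / n)


# A Bell state is maximally entangled; a product state is not.
s = 1.0 / math.sqrt(2.0)
q_bell = meyer_wallach([s, 0.0, 0.0, s], 2)
q_product = meyer_wallach([1.0, 0.0, 0.0, 0.0], 2)
```

The measure averages single-qubit mixedness, so a state can score highly without every qubit being entangled with every other, which is consistent with the abstract's remark that full connectivity is not guaranteed.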

2601.16250 2026-02-13 stat.ML cs.CE cs.LG cs.NA math.NA math.PR

Distributional Computational Graphs: Error Bounds

Olof Hallqvist Elias, Michael Selby, Phillip Stanley-Marbell

Comments 28 pages, 2 figures, minor correction to Theorem 1.1

Abstract

We study a general framework of distributional computational graphs: computational graphs whose inputs are probability distributions rather than point values. We analyze the discretization error that arises when these graphs are evaluated using finite approximations of continuous probability distributions. Such an approximation might be the result of representing a continuous real-valued distribution using a discrete representation or from constructing an empirical distribution from samples (or might be the output of another distributional computational graph). We establish non-asymptotic error bounds in terms of the Wasserstein-1 distance, without imposing structural assumptions on the computational graph.
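
The error metric named in this abstract can be illustrated in one dimension. For two empirical distributions with the same number of equally weighted atoms, the Wasserstein-1 distance is the mean absolute difference between sorted samples. The sketch below covers only that special case and does not reproduce the paper's bounds. The function name `wasserstein1_empirical` is an assumption.

```python
def wasserstein1_empirical(xs, ys):
    """Wasserstein-1 distance between two 1-D empirical distributions.

    Assumes both samples have the same size and uniform weights, in
    which case the optimal coupling matches sorted values in order.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Shifting every atom by 1 moves the distribution a distance of exactly 1.
d = wasserstein1_empirical([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```

A distance of this kind measures how far a finite approximation sits from the distribution it stands in for, which is the input-side error that such bounds propagate through the graph.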

2511.00772 2026-02-13 cs.DB cs.LG stat.AP

Reliable Curation of EHR Dataset via Large Language Models under Environmental Constraints

Raymond M. Xiong, Panyu Chen, Tianze Dong, Jian Lu, Louis Hu, Nathan Yu, Benjamin Goldstein, Danyang Zhuo, Anru R. Zhang

Abstract

Electronic health records (EHRs) are central to modern healthcare delivery and research; yet, many researchers lack the database expertise necessary to write complex SQL queries or generate effective visualizations, limiting efficient data use and scientific discovery. To address this barrier, we introduce CELEC, a large language model (LLM)-powered framework for automated EHR data extraction and analytics. CELEC translates natural language queries into SQL using a prompting strategy that integrates schema information, few-shot demonstrations, and chain-of-thought reasoning, which together improve accuracy and robustness. CELEC also adheres to strict privacy protocols: the LLM accesses only database metadata (e.g., table and column names), while all query execution occurs securely within the institutional environment, ensuring that no patient-level data is ever transmitted to or shared with the LLM. On a subset of the EHRSQL benchmark, CELEC achieves execution accuracy comparable to prior systems while maintaining low latency, cost efficiency, and strict privacy by exposing only database metadata to the LLM. Ablation studies confirm that each component of the SQL generation pipeline, particularly the few-shot demonstrations, plays a critical role in performance. By lowering technical barriers and enabling medical researchers to query EHR databases directly, CELEC streamlines research workflows and accelerates biomedical discovery.

2510.24187 2026-02-13 stat.ML cs.LG

Self-Concordant Perturbations for Linear Bandits

Lucas Lévy, Jean-Lou Valeau, Arya Akhavan, Patrick Rebeschini

Abstract

We consider the adversarial linear bandits setting and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, extending the known connection between them from the full-information setting. Within this framework, we introduce self-concordant perturbations, a family of probability distributions that mirror the role of self-concordant barriers previously employed in the FTRL-based SCRiBLe algorithm. Using this idea, we design a novel FTPL-based algorithm that combines self-concordant regularization with efficient stochastic exploration. Our approach achieves a regret of $\mathcal{O}(d\sqrt{n \ln n})$ on both the $d$-dimensional hypercube and the $\ell_2$ ball. On the $\ell_2$ ball, this matches the rate attained by SCRiBLe. For the hypercube, this represents a $\sqrt{d}$ improvement over these methods and matches the optimal bound up to logarithmic factors.

2508.07936 2026-02-13 math.ST stat.TH

Hybrid estimation for a mixed fractional Black-Scholes model with random effects from discrete time observations

Nesrine Chebli, Hamdi Fathallah, Yousri Slaoui

Abstract

We propose a hybrid estimation procedure to estimate global fixed parameters and subject-specific random effects in a mixed fractional Black-Scholes model based on discrete-time observations. Specifically, we consider $N$ independent stochastic processes, each driven by a linear combination of standard Brownian motion and an independent fractional Brownian motion, and governed by a drift term that depends on an unobserved random effect with unknown distribution. Based on $n$ discrete-time statistics of process increments, we construct parametric estimators for the Brownian motion volatility, the scaling parameter for the fractional Brownian motion, and the Hurst parameter using a generalized method of moments. We establish their strong consistency under the two-step regime where the observation frequency $n$ and then the sample size $N$ tend to infinity, and prove their joint asymptotic normality when $H \in \big(\frac12, \frac34\big)$. Then, using a plug-in approach, we consistently estimate the random effects, and we study their asymptotic behavior under the same sequential asymptotic regime. Finally, we construct a nonparametric estimator for the distribution function of these random effects using a method based on Lagrange interpolation at Chebyshev-Gauss nodes, and we analyze its asymptotic properties as both $n$ and $N$ increase. We illustrate the theoretical results through a numerical simulation framework. We further demonstrate the performance of the proposed estimators in an empirical application to crypto returns data, analyzing five major cryptocurrencies to uncover their distinct volatility structures and heterogeneous trend behaviors.

2507.22344 2026-02-13 stat.ME

Risk-inclusive Contextual Bandits for Early Phase Clinical Trials

Rohit Kanrar, Chunlin Li, Zara Ghodsi, Margaret Gamalo

Abstract

Early-phase clinical trials face the challenge of selecting optimal drug doses that balance safety and efficacy due to uncertain dose-response relationships and varied participant characteristics. Traditional randomized dose allocation often exposes participants to sub-optimal doses by not considering individual covariates, necessitating larger sample sizes and prolonging drug development. This paper introduces a risk-inclusive contextual bandit algorithm that utilizes multi-arm bandit (MAB) strategies to optimize dosing through participant-specific data integration. By combining two separate Thompson samplers, one for efficacy and one for safety, the algorithm enhances the balance between efficacy and safety in dose allocation. The effect sizes are estimated with a generalized version of asymptotic confidence sequences (AsympCS), offering a uniform coverage guarantee for sequential causal inference over time. The validity of AsympCS is also established in the MAB setup with a possibly mis-specified model. The empirical results demonstrate the strengths of this method in optimizing dose allocation compared to randomized allocations and traditional contextual bandits focused solely on efficacy. Moreover, an application on real data generated from a recent Phase IIb study aligns with actual findings.

2507.04713 2026-02-13 stat.ME stat.CO

Optimal Exact Designs of Multiresponse Experiments under Linear and Sparsity Constraints

Lenka Filová, Pál Somogyi, Radoslav Harman

Journal ref Applied Stochastic Models in Business and Industry, Volume 42, Issue 2 (2026)

Abstract

We propose a computational approach to constructing exact designs on finite design spaces that are optimal for multiresponse regression experiments under a combination of the standard linear and specific 'sparsity' constraints. The linear constraints address, for example, limits on multiple resource consumption and the problem of optimal design augmentation, while the sparsity constraints control the set of distinct trial conditions utilized by the design. The key idea is to construct an artificial optimal design problem that can be solved using any existing mathematical programming technique for univariate-response optimal designs under pure linear constraints. The solution to this artificial problem can then be directly converted into an optimal design for the primary multivariate-response setting with combined linear and sparsity constraints. We demonstrate the utility and flexibility of the approach through dose-response experiments with constraints on safety, efficacy, and cost, where cost also depends on the number of distinct doses used.

2506.18846 2026-02-13 stat.CO cs.NA math.NA

Bayesian decomposition using Besov priors

Andreas Horst, Babak Maboudi Afkham, Yiqiu Dong, Jakob Lemvig

Comments 28 pages, 13 figures, this is a preprint of an article submitted to the journal of Applied Numerical Mathematics

Abstract

In many inverse problems, the unknown is composed of multiple components with different regularities, for example, in imaging problems, where the unknown can have both rough and smooth features. We investigate linear Bayesian inverse problems, where the unknown consists of two components: one smooth and one piecewise constant. We model the unknown as a sum of two components and assign individual priors on each component to impose the assumed behavior. We propose and compare two prior models: (i) a combination of a Haar wavelet-based Besov prior and a smoothing Besov prior, and (ii) a hierarchical Gaussian prior on the gradient coupled with a smoothing Besov prior. To achieve a balanced reconstruction, we place hyperpriors on the prior parameters and jointly infer both the components and the hyperparameters. We propose Gibbs sampling schemes for posterior inference in both prior models. We demonstrate the capabilities of our approach on 1D and 2D deconvolution problems, where the unknown consists of smooth parts with jumps. The numerical results indicate that our methods improve the reconstruction quality compared to single-prior approaches and that the prior parameters can be successfully estimated to yield a balanced decomposition.

2505.13732 2026-02-13 stat.ML cs.LG

Backward Conformal Prediction

Etienne Gauthier, Francis Bach, Michael I. Jordan

Comments Code available at: https://github.com/GauthierE/backward-cp

Abstract

We introduce $\textit{Backward Conformal Prediction}$, a method that guarantees conformal coverage while providing flexible control over the size of prediction sets. Unlike standard conformal prediction, which fixes the coverage level and allows the conformal set size to vary, our approach defines a rule that constrains how prediction set sizes behave based on the observed data, and adapts the coverage level accordingly. Our method builds on two key foundations: (i) recent results by Gauthier et al. [2025] on post-hoc validity using e-values, which ensure marginal coverage of the form $\mathbb{P}(Y_{\rm test} \in \hat C_n^{\tilde{\alpha}}(X_{\rm test})) \ge 1 - \mathbb{E}[\tilde{\alpha}]$ up to a first-order Taylor approximation for any data-dependent miscoverage $\tilde{\alpha}$, and (ii) a novel leave-one-out estimator $\hat{\alpha}^{\rm LOO}$ of the marginal miscoverage $\mathbb{E}[\tilde{\alpha}]$ based on the calibration set, ensuring that the theoretical guarantees remain computable in practice. This approach is particularly useful in applications where large prediction sets are impractical such as medical diagnosis. We provide theoretical results and empirical evidence supporting the validity of our method, demonstrating that it maintains computable coverage guarantees while ensuring interpretable, well-controlled prediction set sizes.

2504.15923 2026-02-13 stat.AP stat.CO stat.ME

Bayesian sample size calculations for external validation studies of risk prediction models

Mohsen Sadatsafavi, Paul Gustafson, Solmaz Setayeshgar, Laure Wynants, Richard D Riley

Comments 21 pages, 4 tables, 4 figures

Abstract

Contemporary sample size calculations for external validation of risk prediction models require users to specify fixed values of assumed model performance metrics alongside target precision levels (e.g., 95% CI widths). However, due to the finite samples of previous studies, our knowledge of true model performance in the target population is uncertain, and so choosing fixed values represents an incomplete picture. As well, for net benefit (NB) as a measure of clinical utility, the relevance of conventional precision-based inference is doubtful. In this work, we propose a general Bayesian framework for multi-criteria sample size considerations for prediction models for binary outcomes. For statistical metrics of performance (e.g., discrimination and calibration), we propose sample size rules that target desired expected precision or desired assurance probability that the precision criteria will be satisfied. For NB, we propose rules based on Optimality Assurance (the probability that the planned study correctly identifies the optimal strategy) and Value of Information (VoI) analysis. We showcase these developments in a case study on the validation of a risk prediction model for deterioration of hospitalized COVID-19 patients. Compared to the conventional sample size calculation methods, a Bayesian approach requires explicit quantification of uncertainty around model performance, and thereby enables flexible sample size rules based on expected precision, assurance probabilities, and VoI. In our case study, calculations based on VoI for NB suggest considerably lower sample sizes are needed than when focusing on precision of calibration metrics.

2502.14566 2026-02-13 stat.ME stat.AP

Feasible Dose-Response Curves for Continuous Treatments Under Positivity Violations

Han Bao, Michael Schomaker

Comments 43 pages (30 without appendix), 8 figures

Abstract

Positivity violations can complicate estimation and interpretation of causal dose-response curves (CDRCs) for continuous interventions. Weighting-based methods are designed to handle limited overlap, but the resulting weighted targets can be hard to interpret scientifically. Modified treatment policies can be less sensitive to support limitations, yet they typically target policy-defined effects that may not align with the original dose-response question. We develop an approach that addresses limited overlap while remaining close to the scientific target of the CDRC. Our work is motivated by the CHAPAS-3 trial of HIV-positive children in Zambia and Uganda, where clinically relevant efavirenz concentration levels are not uniformly supported across covariate strata. We introduce a diagnostic, the non-overlap ratio, which quantifies, as a function of the target intervention level, the proportion of the population for whom that level is not supported given observed covariates. We also define an individualized most feasible intervention: for each child and target concentration, we retain the target when it is supported, and otherwise map it to the nearest supported concentration. The resulting feasible dose-response curve answers: if we try to set everyone to a given concentration, but it is not realistically attainable for some individuals, what outcome would be expected after shifting those individuals to their nearest attainable concentration? We propose a plug-in g-computation estimator that combines outcome regression with flexible conditional density estimation to learn supported regions and evaluate the feasible estimand. Simulations show reduced bias under positivity violations and recovery of the standard CDRC when support is adequate. An application to CHAPAS-3 yields a stable and interpretable concentration-response summary under realistic support constraints.
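
The two constructions described in this abstract, the non-overlap ratio and the most feasible intervention, can be sketched under a strong simplifying assumption: each individual's supported region is a single known interval. The paper instead learns supported regions through conditional density estimation, which is not reproduced here. All names below (`feasible_level`, `non_overlap_ratio`, `feasible_curve_point`, and the `outcome_model` callback) are hypothetical.

```python
def feasible_level(target, lo, hi):
    """Keep the target if supported, else move to the nearest supported level.

    For an interval [lo, hi], the nearest supported level is a clamp.
    """
    return min(max(target, lo), hi)


def non_overlap_ratio(target, supports):
    """Share of individuals for whom the target level is not supported."""
    unsupported = sum(1 for lo, hi in supports if not (lo <= target <= hi))
    return unsupported / len(supports)


def feasible_curve_point(target, supports, outcome_model):
    """Plug-in value of the feasible dose-response curve at one target.

    outcome_model(i, level) is a hypothetical fitted outcome regression
    giving the predicted outcome for individual i at a given level.
    """
    predictions = [
        outcome_model(i, feasible_level(target, lo, hi))
        for i, (lo, hi) in enumerate(supports)
    ]
    return sum(predictions) / len(predictions)


# Three individuals with different supported concentration ranges.
supports = [(1.0, 4.0), (2.0, 3.0), (0.0, 5.0)]
ratio = non_overlap_ratio(3.5, supports)
value = feasible_curve_point(3.5, supports, lambda i, level: 2.0 * level)
```

In this toy example only the second individual cannot attain the target of 3.5, so that individual is evaluated at 3.0 while the others stay at the target.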

2407.16283 2026-02-13 stat.CO stat.ME

A Randomized Exchange Algorithm for Optimal Design of Multi-Response Experiments

Pál Somogyi, Samuel Rosa, Radoslav Harman

Journal ref Metrika, Volume 89, pages 217-242, (2026)

Abstract

Despite the increasing prevalence of vector observations, computation of optimal experimental design for multi-response models has received limited attention. To address this problem within the framework of approximate designs, we introduce mREX, an algorithm that generalizes the randomized exchange algorithm REX (J Am Stat Assoc 115:529, 2020), originally specialized for single-response models. The mREX algorithm incorporates several improvements: a novel method for computing efficient sparse initial designs, an extension to all differentiable Kiefer's optimality criteria, and an efficient method for performing optimal exchanges of weights. For the most commonly used D-optimality criterion, we propose a technique for optimal weight exchanges based on the characteristic matrix polynomial. The mREX algorithm is applicable to linear, nonlinear, and generalized linear models, and scales well to large problems. It typically converges to optimal designs faster than available alternative methods, although it does not require advanced mathematical programming solvers. We demonstrate the usefulness of mREX to bivariate dose-response Emax models for clinical trials, both without and with the inclusion of covariates.

2403.07772 2026-02-13 math.ST stat.TH

Privacy Guarantees in Posterior Sampling under Contamination

Shenggang Hu, Louis Aslett, Hongsheng Dai, Murray Pollock, Gareth O. Roberts

Comments Minor revisions

Abstract

In recent years, differential privacy has been adopted by tech companies and governmental agencies as the standard for measuring privacy in algorithms. In this article, we study differential privacy in Bayesian posterior sampling settings. We begin by considering differential privacy in the most common privatisation setting in which Laplace or Gaussian noise is injected into the output. In an effort to achieve better differential privacy, we consider adopting Huber's contamination model for use within privacy settings, and replace data points at random with samples from a heavy-tailed distribution (instead of injecting noise into the output). We derive bounds for the differential privacy level $(ε,δ)$ of our approach, without requiring bounded observation and parameter spaces, a restriction commonly imposed in the literature. We further consider for our approach the effect of sample size on the privacy level and the rate at which $(ε,δ)$ converges to zero. Asymptotically, our contamination approach is fully private with no information loss. We also provide examples of inference models for which our approach applies, with theoretical convergence rate analysis and simulation studies.
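
The contamination step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each data point is replaced independently with probability `p`, and the heavy-tailed replacement is a standard Cauchy draw. The abstract does not specify the distribution or the replacement rate, and the function name `contaminate` is hypothetical.

```python
import math
import random


def contaminate(data, p, rng):
    """Replace each point independently with probability p by a heavy-tailed draw.

    The standard Cauchy distribution is an assumed choice of heavy-tailed
    distribution, sampled via the inverse-CDF transform tan(pi * (u - 0.5)).
    """
    released = []
    for x in data:
        if rng.random() < p:
            released.append(math.tan(math.pi * (rng.random() - 0.5)))
        else:
            released.append(x)
    return released


rng = random.Random(0)
data = [0.1 * i for i in range(1000)]
released = contaminate(data, p=0.2, rng=rng)
n_replaced = sum(1 for a, b in zip(data, released) if a != b)
```

Posterior sampling would then run on `released` rather than on `data`; that part, and the privacy bounds themselves, are outside this sketch.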

2309.04414 2026-02-13 stat.AP cs.DL

Scientific productivity as a random walk

Sam Zhang, Nicholas LaBerge, Samuel F. Way, Daniel B. Larremore, Aaron Clauset

Abstract

The expectation that scientific productivity follows regular patterns over a career underpins many scholarly evaluations. However, recent studies of individual productivity patterns reveal a puzzle: the average number of papers published per year robustly follows the "canonical trajectory" of a rapid rise followed by a gradual decline, yet only about 20% of individual productivity trajectories follow this pattern. We resolve this puzzle by modeling scientific productivity as a random walk, showing that the canonical pattern can be explained as a decrease in the variance in changes to productivity in the early-to-mid career. By empirically characterizing the variable structure of 2,085 productivity trajectories of computer science faculty at 205 PhD-granting institutions, spanning 29,119 publications over 1980-2016, we (i) discover remarkably simple patterns in both early-career and year-to-year changes to productivity, and (ii) show that a random walk model of productivity both reproduces the canonical trajectory in the average productivity and captures much of the diversity of individual-level trajectories, including the lognormal distribution of cumulative productivity observed by William Shockley in 1957. We confirm that these results generalize across fields by fitting our model to a separate panel of 22,952 faculty across 12 fields from 2011 to 2023. These results highlight the importance of variance in shaping individual scientific productivity, opening up new avenues for characterizing how systemic incentives and opportunities can be directed for aggregate effect.

2306.14851 2026-02-13 math.OC cs.LG stat.ME

Optimal Cross-Validation for Sparse Linear Regression

Ryan Cory-Wright, Andrés Gómez

Comments Updated manuscript for revision

Abstract

Given a high-dimensional covariate matrix and a response vector, ridge-regularized sparse linear regression selects a subset of features that explains the relationship between covariates and the response in an interpretable manner. To choose hyperparameters that control the sparsity level and amount of regularization, practitioners commonly use k-fold cross-validation. However, cross-validation substantially increases the computational cost of sparse regression as it requires solving many mixed-integer optimization problems (MIOs) for each hyperparameter combination. To address this computational burden, we derive computationally tractable relaxations of the k-fold cross-validation loss, facilitating hyperparameter selection while solving 50-80% fewer MIOs in practice. Our computational results demonstrate, across eleven real-world UCI datasets, that exact MIO-based cross-validation can be competitive with mature software packages such as glmnet and L0Learn, particularly when the sample-to-feature ratio is small.

2302.08763 2026-02-13 math.PR math.ST stat.TH

Rigorous Derivation of the Degenerate Parabolic-Elliptic Keller-Segel System from a Moderately Interacting Stochastic Particle System. Part II Propagation of Chaos

Li Chen, Veniamin Gvozdik, Yue Li

Abstract

This work is part of a series of two articles. The main goal is to rigorously derive the degenerate parabolic-elliptic Keller-Segel system in the sub-critical regime from a moderately interacting stochastic particle system. In the first article [7], we establish the classical solution theory of the degenerate parabolic-elliptic Keller-Segel system and its non-local version. In the second article, which is the current one, we derive a propagation of chaos result, where the classical solution theory obtained in the first article is used to derive required estimates for the particle system. Due to the degeneracy of the non-linear diffusion and the singular aggregation effect in the system, we approximate the stochastic particle system using a cut-off interaction potential. An additional linear diffusion on the particle level is used as a parabolic regularization of the system. We present the propagation of chaos result with logarithmic scalings. Consequently, the propagation of chaos follows directly from convergence in the sense of expectation and the vanishing viscosity argument of the Keller-Segel system.

2602.11747 2026-02-13 math.ST stat.ML stat.TH

High-Probability Minimax Adaptive Estimation in Besov Spaces via Online-to-Batch

Paul Liautaud, Pierre Gaillard, Olivier Wintenberger

Abstract

We study nonparametric regression over Besov spaces from noisy observations under sub-exponential noise, aiming to achieve minimax-optimal guarantees on the integrated squared error that hold with high probability and adapt to the unknown noise level. To this end, we propose a wavelet-based online learning algorithm that dynamically adjusts to the observed gradient noise by adaptively clipping it at an appropriate level, eliminating the need to tune parameters such as the noise variance or gradient bounds. As a by-product of our analysis, we derive high-probability adaptive regret bounds that scale with the $\ell_1$-norm of the competitor. Finally, in the batch statistical setting, we obtain adaptive and minimax-optimal estimation rates for Besov spaces via a refined online-to-batch conversion. This approach carefully exploits the structure of the squared loss in combination with self-normalized concentration inequalities.

2602.11722 2026-02-13 stat.ML cs.LG

PAC-Bayesian Generalization Guarantees for Fairness on Stochastic and Deterministic Classifiers

Julien Bastian, Benjamin Leblanc, Pascal Germain, Amaury Habrard, Christine Largeron, Guillaume Metzler, Emilie Morvant, Paul Viallard

Abstract

Classical PAC generalization bounds on the prediction risk of a classifier are insufficient to provide theoretical guarantees on fairness when the goal is to learn models balancing predictive risk and fairness constraints. We propose a PAC-Bayesian framework for deriving generalization bounds for fairness, covering both stochastic and deterministic classifiers. For stochastic classifiers, we derive a fairness bound using standard PAC-Bayes techniques. For deterministic classifiers, where the usual PAC-Bayes arguments do not apply directly, we leverage a recent advance in PAC-Bayes to extend the fairness bound beyond the stochastic setting. Our framework has two advantages: (i) it applies to a broad class of fairness measures that can be expressed as a risk discrepancy, and (ii) it leads to a self-bounding algorithm in which the learning procedure directly optimizes a trade-off between generalization bounds on the prediction risk and on the fairness. We empirically evaluate our framework with three classical fairness measures, demonstrating not only its usefulness but also the tightness of our bounds.

2602.11711 2026-02-13 stat.ML cs.LG cs.NA math.NA stat.AP

Estimation of instrument and noise parameters for inverse problem based on prior diffusion model

Jean-François Giovannelli

Abstract

This article addresses the issue of estimating observation parameters (response and error parameters) in inverse problems. The focus is on cases where regularization is introduced in a Bayesian framework and the prior is modeled by a diffusion process. In this context, the issue of posterior sampling is well known to be thorny, and a recent paper proposes a notably simple and effective solution. Consequently, it offers remarkable additional flexibility when it comes to estimating observation parameters. The proposed strategy enables us to define an optimal estimator for both the observation parameters and the image of interest. Furthermore, the strategy provides a means of quantifying uncertainty. In addition, MCMC algorithms allow for the efficient computation of estimates and properties of posteriors, while offering some guarantees. The paper presents several numerical experiments that clearly confirm the computational efficiency and the quality of both estimates and uncertainty quantification.

2602.11679 2026-02-13 stat.ML cs.AI cs.LG math.OC stat.ME

Provable Offline Reinforcement Learning for Structured Cyclic MDPs

Kyungbok Lee, Angelica Cristello Sarteau, Michael R. Kosorok

Comments 65 pages, 4 figures. Submitted to JMLR

Abstract

We introduce a novel cyclic Markov decision process (MDP) framework for multi-step decision problems with heterogeneous stage-specific dynamics, transitions, and discount factors across the cycle. In this setting, offline learning is challenging: optimizing a policy at any stage shifts the state distributions of subsequent stages, propagating mismatch across the cycle. To address this, we propose a modular structural framework that decomposes the cyclic process into stage-wise sub-problems. While generally applicable, we instantiate this principle as CycleFQI, an extension of fitted Q-iteration enabling theoretical analysis and interpretation. It uses a vector of stage-specific Q-functions, tailored to each stage, to capture within-stage sequences and transitions between stages. This modular design enables partial control, allowing some stages to be optimized while others follow predefined policies. We establish finite-sample suboptimality error bounds and derive global convergence rates under Besov regularity, demonstrating that CycleFQI mitigates the curse of dimensionality compared to monolithic baselines. Additionally, we propose a sieve-based method for asymptotic inference of optimal policy values under a margin condition. Experiments on simulated and real-world Type 1 Diabetes data sets demonstrate CycleFQI's effectiveness.

2602.11610 2026-02-13 stat.ME math.ST stat.TH

Improving the adjusted Benjamini--Hochberg method using e-values in knockoff-assisted variable selection

Aniket Biswas, Aaditya Ramdas

Comments Main manuscript 18 pages, 4 figures. Appendices 12 pages, 8 figures

Abstract

Considering the knockoff-based multiple testing framework of Barber and Candès [2015], we revisit the method of Sarkar and Tang [2022] and identify it as a specific case of an un-normalized e-value weighted Benjamini-Hochberg procedure. Building on this insight, we extend the method to use bounded p-to-e calibrators that enable more refined and flexible weight assignments. Our approach generalizes the method of Sarkar and Tang [2022], which emerges as a special case corresponding to an extreme calibrator. Within this framework, we propose three procedures: an e-value weighted Benjamini-Hochberg method, its adaptive extension using an estimate of the proportion of true null hypotheses, and an adaptive weighted Benjamini-Hochberg method. We establish control of the false discovery rate (FDR) for the proposed methods. While we do not formally prove that the proposed methods outperform those of Barber and Candès [2015] and Sarkar and Tang [2022], simulation studies and real-data analysis demonstrate large and consistent improvement over the latter in all cases, and better performance than the knockoff method in scenarios with low target FDR, a small number of signals, and weak signal strength. Simulation studies and a real-data application in HIV-1 drug resistance analysis demonstrate strong finite sample FDR control and exhibit improved, or at least competitive, power relative to the aforementioned methods.
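
For orientation, the classical (unweighted) Benjamini-Hochberg step-up procedure that this abstract builds on can be sketched as below. This is the standard textbook procedure applied to p-values, not the e-value weighted or knockoff-assisted variants proposed in the paper; the function name and the toy p-values are illustrative.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the indices rejected by the standard BH step-up procedure.

    With m hypotheses and sorted p-values p_(1) <= ... <= p_(m), find the
    largest k such that p_(k) <= k * alpha / m and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_star = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k_star])


rejected = benjamini_hochberg([0.001, 0.8, 0.012, 0.04, 0.3], alpha=0.05)
```

With these five p-values and a target level of 0.05, the thresholds are 0.01, 0.02, 0.03, 0.04 and 0.05, so only the two smallest p-values (indices 0 and 2) are rejected.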

2602.11520 2026-02-13 stat.ME cs.AI cs.LG stat.ML

Locally Interpretable Individualized Treatment Rules for Black-Box Decision Models

Yasin Khadem Charvadeh, Katherine S. Panageas, Yuan Chen

Abstract

Individualized treatment rules (ITRs) aim to optimize healthcare by tailoring treatment decisions to patient-specific characteristics. Existing methods typically rely on either interpretable but inflexible models or highly flexible black-box approaches that sacrifice interpretability; moreover, most impose a single global decision rule across patients. We introduce the Locally Interpretable Individualized Treatment Rule (LI-ITR) method, which combines flexible machine learning models to accurately learn complex treatment outcomes with locally interpretable approximations to construct subject-specific treatment rules. LI-ITR employs variational autoencoders to generate realistic local synthetic samples and learns individualized decision rules through a mixture of interpretable experts. Simulation studies show that LI-ITR accurately recovers true subject-specific local coefficients and optimal treatment strategies. An application to precision side-effect management in breast cancer illustrates the necessity of flexible predictive modeling and highlights the practical utility of LI-ITR in estimating optimal treatment rules while providing transparent, clinically interpretable explanations.

2602.11511 2026-02-13 stat.ME math.ST stat.TH

Representation Learning with Blockwise Missingness and Signal Heterogeneity

Ziqi Liu, Ye Tian, Weijing Tang

Abstract

Unified representation learning for multi-source data integration faces two important challenges: blockwise missingness and blockwise signal heterogeneity. The former arises from sources observing different, yet potentially overlapping, feature sets, while the latter involves varying signal strengths across subject groups and feature sets. While existing methods perform well with fully observed data or uniform signal strength, their performance degenerates when these two challenges coincide, which is common in practice. To address this, we propose Anchor Projected Principal Component Analysis (APPCA), a general framework for representation learning with structured blockwise missingness that is robust to signal heterogeneity. APPCA first recovers robust group-specific column spaces using all observed feature sets, and then aligns them by projecting shared "anchor" features onto these subspaces before performing PCA. This projection step induces a significant denoising effect. We establish estimation error bounds for embedding reconstruction through a fine-grained perturbation analysis. In particular, using a novel spectral slicing technique, our bound eliminates the standard dependency on the signal strength of subject embeddings, relying instead solely on the signal strength of integrated feature sets. We validate the proposed method through extensive simulation studies and an application to multimodal single-cell sequencing data.

2602.11496 2026-02-13 stat.ME

High-Dimensional Mediation Analysis for Generalized Linear Models Using Bayesian Variable Selection Guided by Mediator Correlation

Youngho Bae, Chanmin Kim, Fenglei Wang, Qi Sun, Kyu Ha Lee

Abstract

High-dimensional mediation analysis aims to identify mediating pathways and to estimate indirect effects linking an exposure to an outcome. In this paper, we propose a Bayesian framework to address key challenges in these analyses, including high dimensionality, complex dependence among omics mediators, and non-continuous outcomes. Furthermore, commonly used approaches assume independent mediators or ignore correlations in the selection stage, which can reduce power when mediators are highly correlated. Addressing these challenges leads to a non-Gaussian likelihood and specialized selection priors, which in turn require efficient and adaptive posterior computation. Our proposed framework selects active pathways under generalized linear models while accounting for mediator dependence. Specifically, the mediators are modeled using a multivariate distribution, exposure-mediator selection is guided by a Markov random field prior on inclusion indicators, and mediator-outcome activation is restricted to mediators supported in the exposure-mediator model through a sequential subsetting Bernoulli prior. Simulation studies show improved operating characteristics in correlated-mediator settings, with appropriate error control under the global null and stable performance under model misspecification. We illustrate the method using real-world metabolomics data to study metabolites that mediate the association between adherence to the Alternate Mediterranean Diet score and two cardiometabolic outcomes.

2602.11403 2026-02-13 stat.ME

Who's Winning? Clarifying Estimands Based on Win Statistics in Cluster Randomized Trials

Kenneth M. Lee, Xi Fang, Fan Li, Michael O. Harhay

Comments 13 pages (main manuscript), 5 pages (supplementary appendix), 4 tables (main manuscript), 3 tables (supplementary appendix)

详情
英文摘要

Treatment effect estimands based on win statistics, including the win ratio, win odds, and win difference, are increasingly popular targets for summarizing endpoints in clinical trials. Such win estimands offer an intuitive approach for prioritizing outcomes by clinical importance. The implementation and interpretation of win estimands is complicated in cluster randomized trials (CRTs), where researchers can target fundamentally different estimands at the individual level or the cluster level. We numerically demonstrate that individual-pair and cluster-pair win estimands can substantially differ when cluster size is informative, that is, when outcomes and/or treatment effects depend on cluster size. With such informative cluster sizes, individual-pair and cluster-pair win estimands can even yield opposite conclusions regarding treatment benefit. We describe consistent estimators for individual-pair and cluster-pair win estimands and propose a leave-one-cluster-out jackknife variance estimator for inference. Despite being consistent, our simulations highlight that some caution is needed when implementing individual-pair win estimators due to finite-sample bias. In contrast, cluster-pair win estimators are unbiased for their respective targets. Altogether, careful specification of the target estimand is essential when applying win estimators in CRTs. Failure to clearly define whether individual-pair or cluster-pair win estimands are of primary interest may result in answering a dramatically different question than intended.
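
As a reminder of how the three win statistics named in this abstract are usually defined, the sketch below compares every treated outcome with every control outcome for a single outcome where larger is better. It is a simplified, non-clustered illustration: it ignores prioritized multi-component endpoints and the individual-pair versus cluster-pair distinction that the paper studies. The function name and toy data are hypothetical.

```python
from itertools import product


def win_statistics(treated, control):
    """Compute win ratio, win odds and win difference from all treated-control pairs.

    Assumes a single outcome where larger values are better and at least
    one pair in which the control unit wins (otherwise the ratio is undefined).
    """
    wins = losses = ties = 0
    for t, c in product(treated, control):
        if t > c:
            wins += 1
        elif t < c:
            losses += 1
        else:
            ties += 1
    total = wins + losses + ties
    p_win, p_loss, p_tie = wins / total, losses / total, ties / total
    return {
        "win_ratio": p_win / p_loss,
        "win_odds": (p_win + 0.5 * p_tie) / (p_loss + 0.5 * p_tie),
        "win_difference": p_win - p_loss,
    }


stats = win_statistics([3, 5, 7], [2, 5, 6])
```

In this toy example the nine pairs give five wins, three losses and one tie, so the win ratio is 5/3, the win odds are 5.5/3.5 and the win difference is 2/9.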

2602.11379 2026-02-13 stat.AP econ.GN q-fin.EC stat.ME

Regularized Ensemble Forecasting for Learning Weights from Historical and Current Forecasts

Han Su, Xiaojia Guo, Xiaoke Zhang

Abstract

Combining forecasts from multiple experts often yields more accurate results than relying on a single expert. In this paper, we introduce a novel regularized ensemble method that extends the traditional linear opinion pool by leveraging both current forecasts and historical performances to set the weights. Unlike existing approaches that rely only on either the current forecasts or past accuracy, our method accounts for both sources simultaneously. It learns weights by minimizing the variance of the combined forecast (or its transformed version) while incorporating a regularization term informed by historical performances. We also show that this approach has a Bayesian interpretation. Different distributional assumptions within this Bayesian framework yield different functional forms for the variance component and the regularization term, adapting the method to various scenarios. In empirical studies on Walmart sales and macroeconomic forecasting, our ensemble outperforms leading benchmark models both when experts' full forecasting histories are available and when experts enter and exit over time, resulting in incomplete historical records. Throughout, we provide illustrative examples that show how the optimal weights are determined and, based on the empirical results, we discuss where the framework's strengths lie and when experts' past versus current forecasts are more informative.

2602.11360 2026-02-13 cs.LG cs.AI stat.ML

Bootstrapping-based Regularisation for Reducing Individual Prediction Instability in Clinical Risk Prediction Models

Sara Matijevic, Christopher Yau

Abstract

Clinical prediction models are increasingly used to support patient care, yet many deep learning-based approaches remain unstable, as their predictions can vary substantially when trained on different samples from the same population. Such instability undermines reliability and limits clinical adoption. In this study, we propose a novel bootstrapping-based regularisation framework that embeds the bootstrapping process directly into the training of deep neural networks. This approach constrains prediction variability across resampled datasets, producing a single model with inherent stability properties. We evaluated models constructed using the proposed regularisation approach against conventional and ensemble models using simulated data and three clinical datasets: GUSTO-I, Framingham, and SUPPORT. Across all datasets, our model exhibited improved prediction stability, with lower mean absolute differences (e.g., 0.019 vs. 0.059 in GUSTO-I; 0.057 vs. 0.088 in Framingham) and markedly fewer significantly deviating predictions. Importantly, discriminative performance and feature importance consistency were maintained, with high SHAP correlations between models (e.g., 0.894 for GUSTO-I; 0.965 for Framingham). While ensemble models achieved greater stability, we show that this came at the expense of interpretability, as each constituent model used predictors in different ways. By regularising predictions to align with bootstrapped distributions, our approach allows prediction models to be developed that achieve greater robustness and reproducibility without sacrificing interpretability. This method provides a practical route toward more reliable and clinically trustworthy deep learning models, particularly valuable in data-limited healthcare settings.

2602.08907 2026-02-13 cs.LG stat.ML

Positive Distribution Shift as a Framework for Understanding Tractable Learning

Marko Medvedev, Idan Attias, Elisabetta Cornacchia, Theodor Misiakiewicz, Gal Vardi, Nathan Srebro

Comments Added acknowledgments. Expanded the summary section

Abstract

We study a setting where the goal is to learn a target function f(x) with respect to a target distribution D(x), but training is done on i.i.d. samples from a different training distribution D'(x), labeled by the true target f(x). Such a distribution shift (here in the form of covariate shift) is usually viewed negatively, as hurting or making learning harder, and the traditional distribution shift literature is mostly concerned with limiting or avoiding this negative effect. In contrast, we argue that with a well-chosen D'(x), the shift can be positive and make learning easier -- a perspective called Positive Distribution Shift (PDS). Such a perspective is central to contemporary machine learning, where much of the innovation is in finding good training distributions D'(x), rather than changing the training algorithm. We further argue that the benefit is often computational rather than statistical, and that PDS allows computationally hard problems to become tractable even using standard gradient-based training. We formalize different variants of PDS, show how certain hard classes are easily learnable under PDS, and make connections with membership query learning.

2601.20269 2026-02-13 stat.ML cs.LG stat.ME

Empirical Likelihood-Based Fairness Auditing: Distribution-Free Certification and Flagging

Jie Tang, Chuanlong Xie, Xianli Zeng, Lixing Zhu

Comments 62 pages, 6 figures; Code available at: https://github.com/Tang-Jay/EL-for-fairness-auditing; Author list is in alphabetical order by last names

Abstract

Machine learning models in high-stakes applications, such as recidivism prediction and automated personnel selection, often exhibit systematic performance disparities across sensitive subpopulations, raising critical concerns regarding algorithmic bias. Fairness auditing addresses these risks through two primary functions: certification, which verifies adherence to fairness constraints; and flagging, which isolates specific demographic groups experiencing disparate treatment. However, existing auditing techniques are frequently limited by restrictive distributional assumptions or prohibitive computational overhead. We propose a novel empirical likelihood-based (EL) framework that constructs robust statistical measures for model performance disparities. Unlike traditional methods, our approach is non-parametric; the proposed disparity statistics follow asymptotically chi-square or mixed chi-square distributions, ensuring valid inference without assuming underlying data distributions. This framework uses a constrained optimization profile that admits stable numerical solutions, facilitating both large-scale certification and efficient subpopulation discovery. Empirically, the EL methods outperform bootstrap-based approaches, yielding coverage rates closer to nominal levels while reducing computational latency by several orders of magnitude. We demonstrate the practical utility of this framework on the COMPAS dataset, where it successfully flags intersectional biases, specifically identifying a significantly higher positive prediction rate for African-American males under 25 and a systemic under-prediction for Caucasian females relative to the population mean.

2601.13458 2026-02-13 stat.ML cs.AI cs.LG math.ST stat.TH

Labels or Preferences? Budget-Constrained Learning with Human Judgments over AI-Generated Outputs

Zihan Dong, Xiaotian Hou, Ruijia Wu, Linjun Zhang

Abstract

The increasing reliance on human preference feedback to judge AI-generated pseudo labels has created a pressing need for principled, budget-conscious data acquisition strategies. We address the crucial question of how to optimally allocate a fixed annotation budget between ground-truth labels and pairwise preferences in AI. Our solution, grounded in semi-parametric inference, casts the budget allocation problem as a monotone missing data framework. Building on this formulation, we introduce Preference-Calibrated Active Learning (PCAL), a novel method that learns the optimal data acquisition strategy and develops a statistically efficient estimator for functionals of the data distribution. Theoretically, we prove the asymptotic optimality of our PCAL estimator and establish a key robustness guarantee that ensures robust performance even with poorly estimated nuisance models. Our flexible framework applies to a general class of problems, by directly optimizing the estimator's variance instead of requiring a closed-form solution. This work provides a principled and statistically efficient approach for budget-constrained learning in modern AI. Simulations and real-data analysis demonstrate the practical benefits and superior performance of our proposed method.

2601.07059 2026-02-13 econ.EM stat.ME

Empirical Bayes Estimation in Heterogeneous Coefficient Panel Models

Myunghyun Song, Sokbae Lee, Serena Ng

详情
英文摘要

We develop an empirical Bayes (EB) G-modeling framework for short-panel linear models with a nonparametric prior for the random intercepts, slopes, dynamics, and non-spherical error variances. We establish identification and consistency of the nonparametric maximum likelihood estimator (NPMLE) under general conditions, and provide low-level sufficient conditions for several models of empirical interest. Conditions for regret consistency of the EB estimators are also established. The NPMLE is computed using a Wasserstein-Fisher-Rao gradient flow algorithm adapted to panel regressions. Using data from the Panel Study of Income Dynamics, we find that the slope coefficient for potential experience is substantially heterogeneous and negatively correlated with the random intercept, and that error variances and autoregressive coefficients vary significantly across individuals. The EB estimates reduce mean squared prediction errors relative to individual maximum likelihood estimates.

2510.12026 2026-02-13 cs.LG stat.ML

Mamba Can Learn Low-Dimensional Targets In-Context via Test-Time Feature Learning

Junsoo Oh, Wei Huang, Taiji Suzuki

Comments 34 pages. Polished writing, added more experiments, and fixed minor errors

Abstract

Mamba, a recently proposed linear-time sequence model, has attracted significant attention for its computational efficiency and strong empirical performance. However, a rigorous theoretical understanding of its underlying mechanisms remains limited. In this work, we provide a theoretical analysis of Mamba's in-context learning (ICL) capability by focusing on tasks defined by low-dimensional nonlinear target functions. Specifically, we study in-context learning of a single-index model $y \approx g_*(\langle \boldsymbol{\beta}, \boldsymbol{x} \rangle)$, which depends on only a single relevant direction $\boldsymbol{\beta}$, referred to as the feature. We prove that Mamba, pretrained by gradient-based methods, can achieve efficient ICL via test-time feature learning, extracting the relevant direction directly from context examples. Consequently, we establish a test-time sample complexity that improves upon linear Transformers -- analyzed to behave like kernel methods -- and is comparable to nonlinear Transformers, which have been shown to surpass the Correlational Statistical Query (CSQ) lower bound and achieve near information-theoretically optimal rate in previous works. Our analysis reveals the crucial role of the nonlinear gating mechanism in Mamba for feature extraction, highlighting it as the fundamental driver behind Mamba's ability to achieve both computational efficiency and high performance.

2509.22341 2026-02-13 stat.ML cs.LG math.ST stat.ME stat.TH

Preventing Model Collapse Under Overparametrization: Optimal Mixing Ratios for Interpolation Learning and Ridge Regression

Anvit Garg, Sohom Bhattacharya, Pragya Sur

Comments 36 pages, 5 figures

Abstract

Model collapse occurs when generative models degrade after repeatedly training on their own synthetic outputs. We study this effect in overparameterized linear regression in a setting where each iteration mixes fresh real labels with synthetic labels drawn from the model fitted in the previous iteration. We derive precise generalization error formulae for minimum-$\ell_2$-norm interpolation and ridge regression under this iterative scheme. Our analysis reveals intriguing properties of the optimal mixing weight that minimizes long-term prediction error and provably prevents model collapse. For instance, in the case of min-$\ell_2$-norm interpolation, we establish that the optimal real-data proportion converges to the reciprocal of the golden ratio for fairly general classes of covariate distributions. Previously, this property was known only for ordinary least squares, and additionally in low dimensions. For ridge regression, we further analyze two popular model classes -- the random-effects model and the spiked covariance model -- demonstrating how spectral geometry governs optimal weighting. In both cases, as well as for isotropic features, we uncover that the optimal mixing ratio should be at least one-half, reflecting the necessity of favoring real data over synthetic data. We study three additional settings: (i) where real data is fixed and fresh labels are not obtained at each iteration, (ii) where covariates vary across iterations but fresh real labels are available each time, and (iii) where covariates vary with time but only a fraction of them receive fresh real labels at each iteration. Across these diverse settings, we characterize when model collapse is inevitable and when synthetic data improves learning. We validate our theoretical results with extensive simulations.
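
For reference, the limiting value quoted for min-$\ell_2$-norm interpolation is easy to state numerically. The symbol $\varphi$ for the golden ratio is our notation, not the abstract's. The value lies above one-half, which is in line with the abstract's broader emphasis on favoring real data.

```latex
\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618,
\qquad
\frac{1}{\varphi} = \frac{2}{1+\sqrt{5}} = \frac{\sqrt{5}-1}{2} \approx 0.618 > \frac{1}{2}.
```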

2509.05864 2026-02-13 stat.ME stat.ML

Beyond ATE: Multi-Criteria Design for A/B Testing

Jiachun Li, Kaining Shi, David Simchi-Levi

Abstract

In the era of large-scale AI deployment and high-stakes clinical trials, adaptive experimentation faces a "trilemma" of conflicting objectives: minimizing cumulative regret (welfare loss during the experiment), maximizing the estimation accuracy of heterogeneous treatment effects (CATE), and ensuring differential privacy (DP) for participants. Existing literature typically optimizes these metrics in isolation or under restrictive parametric assumptions. In this work, we study the multi-objective design of adaptive experiments in a general non-parametric setting. First, we rigorously characterize the instance-dependent Pareto frontier between cumulative regret and estimation error, revealing the fundamental statistical limits of dual-objective optimization. We propose ConSE, a sequential segmentation and elimination algorithm that adaptively discretizes the covariate space to achieve the Pareto-optimal frontier. Second, we introduce DP-ConSE, a privacy-preserving extension that satisfies Joint Differential Privacy. We demonstrate that privacy comes "for free" in our framework, incurring only asymptotically negligible costs to regret and estimation accuracy. Finally, we establish a robust link between experimental design and long-term utility: we prove that any policy derived from our Pareto-optimal algorithms minimizes post-experiment simple regret, regardless of the specific exploration-exploitation trade-off chosen during the trial. Our results provide a theoretical foundation for designing ethical, private, and efficient adaptive experiments in sensitive domains.

2508.03245 2026-02-13 cs.LG stat.ML

Conformal Unlearning: A New Paradigm for Unlearning in Conformal Predictors

Yahya Alkhatib, Muhammad Ahmar Jamal, Wee Peng Tay

Abstract

Conformal unlearning aims to ensure that a trained conformal predictor miscovers data points with specific shared characteristics, such as those from a particular label class, associated with a specific user, or belonging to a defined cluster, while maintaining valid coverage on the remaining data. Existing machine unlearning methods, which typically approximate a model retrained from scratch after removing the data to be forgotten, face significant challenges when applied to conformal unlearning. These methods often lack rigorous, uncertainty-aware statistical measures to evaluate unlearning effectiveness and exhibit a mismatch between their degraded performance on forgotten data and the frequency with which those data are still correctly covered by conformal predictors, a phenomenon we term "fake conformal unlearning". To address these limitations, we propose a new paradigm for conformal machine unlearning that provides finite-sample, uncertainty-aware guarantees on unlearning performance without relying on a retrained model as a reference. We formalize conformal unlearning to require high coverage on retained data and high miscoverage on forgotten data, introduce practical empirical metrics for evaluation, and present an algorithm that optimizes these conformal objectives. Extensive experiments on vision and text benchmarks demonstrate that the proposed approach effectively removes targeted information while preserving utility.

2507.02890 2026-02-13 stat.AP cs.LG stat.ML

Robust Short-Term OEE Forecasting in Industry 4.0 via Topological Data Analysis

Korkut Anapa, İsmail Güzel, Ceylan Yozgatlıgil

Comments 44 pages

Abstract

In Industry 4.0 manufacturing environments, forecasting Overall Equipment Efficiency (OEE) is critical for data-driven operational control and predictive maintenance. However, the highly volatile and nonlinear nature of OEE time series -- particularly in complex production lines and hydraulic press systems -- limits the effectiveness of forecasting. This study proposes a novel informational framework that leverages Topological Data Analysis (TDA) to transform raw OEE data into structured engineering knowledge for production management. The framework models hourly OEE data from production lines and systems using persistent homology to extract large-scale topological features that characterize intrinsic operational behaviors. These features are integrated into a SARIMAX (Seasonal Autoregressive Integrated Moving Average with Exogenous Regressors) architecture, where TDA components serve as exogenous variables to capture latent temporal structures. Experimental results demonstrate forecasting accuracy improvements of at least 17% over standard seasonal benchmarks, with Heat Kernel-based features consistently identified as the most effective predictors. The proposed framework was deployed in a Global Lighthouse Network manufacturing facility, providing a new strategic layer for production management and achieving a 7.4% improvement in total OEE. This research contributes a formal methodology for embedding topological signatures into classical stochastic models to enhance decision-making in knowledge-intensive production systems.

2505.08960 2026-02-13 stat.ME

Modern Causal Inference Approaches to Improve Power for Subgroup Analysis in Randomized Controlled Trials

Antonio D'Alessandro, Jiyu Kim, Samrachana Adhikari, Donald Goff, Falco J. Bargagli Stoffi, Michele Santacatterina

Abstract

Randomized controlled trials (RCTs) often include subgroup analyses to assess whether treatment effects vary across pre-specified patient populations. However, these analyses frequently suffer from small sample sizes which limit the power to detect heterogeneous effects. Power can be improved by leveraging predictors of the outcome -- i.e., through covariate adjustment -- as well as by borrowing external data from similar RCTs or observational studies. The benefits of covariate adjustment may be limited when the trial sample is small. Borrowing external data can increase the effective sample size and improve power, but it introduces two key challenges: (i) integrating data across sources can lead to model misspecification, and (ii) practical violations of the positivity assumption -- where the probability of receiving the target treatment is near-zero for some covariate profiles in the external data -- can lead to extreme inverse-probability weights and unstable inferences, ultimately negating potential power gains. To account for these shortcomings, we present an approach to improving power in pre-planned subgroup analyses of small RCTs that leverages both baseline predictors and external data. We propose debiased estimators that accommodate parametric, machine learning, and nonparametric Bayesian methods. To address practical positivity violations, we introduce three estimators: a covariate-balancing approach, an automated debiased machine learning (DML) estimator, and a calibrated DML estimator. We show improved power in various simulations and offer practical recommendations for the application of the proposed methods. Finally, we apply them to evaluate the effectiveness of citalopram for negative symptoms in first-episode schizophrenia patients across subgroups defined by duration of untreated psychosis, using data from two small RCTs.

2505.07232 2026-02-13 stat.ME

Spatial Confounding in Multivariate Areal Data Analysis

Kyle Lin Wu, Sudipto Banerjee

Comments 29 pages, 2 figures

Abstract

We investigate spatial confounding in the presence of multivariate disease dependence. In the "analysis model perspective" of spatial confounding, adding a spatially dependent random effect can lead to significant variance inflation of the posterior distribution of the fixed effects. The "data generation perspective" views covariates as stochastic and correlated with an unobserved spatial confounder, leading to inferior statistical inference over multiple realizations. Although multiple methods have been proposed for adjusting statistical models to mitigate spatial confounding in estimating regression coefficients, the results on interactions between spatial confounding and multivariate dependence are very limited. We contribute to this domain by investigating spatial confounding from the analysis and data generation perspectives in a Bayesian coregionalized areal regression model. We derive novel results that distinguish variance inflation due to spatial confounding from inflation based on multicollinearity between predictors and provide insights into the estimation efficiency of a spatial estimator under a spatially confounded data generation model. We demonstrate favorable performance of spatial analysis compared to a non-spatial model in our simulation experiments even in the presence of spatial confounding and a misspecified spatial structure. In this regard, we align with several other authors in the defense of traditional hierarchical spatial models (Gilbert et al., 2025; Khan and Berrett, 2023; Zimmerman and Ver Hoef, 2022) and extend this defense to multivariate areal models. We analyze county-level data from the US on obesity / diabetes prevalence and diabetes-related cancer mortality, comparing the results with and without spatial random effects.

2505.00113 2026-02-13 stat.ME

Doubly robust augmented weighting estimators for the analysis of externally controlled single-arm trials and unanchored indirect treatment comparisons

Harlan Campbell, Antonio Remiro-Azócar

Abstract

Externally controlled single-arm trials are critical to assess treatment efficacy across therapeutic indications for which randomized controlled trials are not feasible. A closely related research design, the unanchored indirect treatment comparison, is often required for disconnected treatment networks in health technology assessment. We present a unified causal inference framework for both research designs. We develop a novel estimator that augments a popular weighting approach based on entropy balancing -- matching-adjusted indirect comparison (MAIC) -- by fitting a model for the conditional outcome expectation. The predictions of the outcome model are combined with the entropy balancing MAIC weights. While the standard MAIC estimator is singly robust where the outcome model is non-linear, our augmented MAIC approach is doubly robust, providing increased robustness against model misspecification. This is demonstrated in a simulation study with binary outcomes and a logistic outcome model, where the augmented estimator demonstrates its doubly robust property, while exhibiting higher precision than all non-augmented weighting estimators and near-identical precision to G-computation. We describe the extension of our estimator to the setting with unavailable individual participant data for the external control, illustrating it through an applied example. Our findings reinforce the understanding that entropy balancing-based approaches have desirable properties compared to standard "modeling" approaches to weighting, but should be augmented to improve protection against bias and guarantee double robustness.

2503.16687 2026-02-13 stat.ME

biniLasso: Automated cut-point detection via sparse cumulative binarization

Abdollah Safari, Hamed Halisaz, Peter Loewen

Abstract

We present biniLasso and its sparse variant (sparse biniLasso), novel methods for prognostic analysis of high-dimensional survival data that enable detection of multiple cut-points per feature. Our approach leverages the Cox proportional hazards model with two key innovations: (1) a cumulative binarization scheme with $L_1$-penalized coefficients operating on context-dependent cut-point candidates, and (2) for sparse biniLasso, additional uniLasso regularization to enforce sparsity while preserving univariate coefficient patterns. These innovations yield substantially improved interpretability, computational efficiency (4-11x faster than existing approaches), and prediction performance. Through extensive simulations, we demonstrate superior performance in cut-point detection, particularly in high-dimensional settings. Application to three genomic cancer datasets from TCGA confirms the methods' practical utility, with both variants showing enhanced risk prediction accuracy compared to conventional techniques.

2501.02624 2026-02-13 math.ST stat.ML stat.TH

Simultaneous analysis of approximate leave-one-out cross-validation and mean-field inference

Pierre C Bellec

Abstract

Approximate Leave-One-Out Cross-Validation (ALO-CV) is a method that has been proposed to estimate the generalization error of a regularized estimator in the high-dimensional regime where dimension and sample size are of the same order, the so-called "proportional regime". A new analysis is developed to derive the consistency of ALO-CV for non-differentiable regularizers under Gaussian covariates and strong convexity. Using a conditioning argument, the difference between the ALO-CV weights and their counterparts in mean-field inference is shown to be small. Combined with upper bounds between the mean-field inference estimate and the leave-one-out quantity, this provides a proof that ALO-CV approximates the leave-one-out quantity up to negligible error terms. Linear models with square loss, robust linear regression and single-index models are explicitly treated.

2501.02087 2026-02-13 cs.LG stat.ML

Beyond CVaR: Leveraging Static Spectral Risk Measures for Enhanced Decision-Making in Distributional Reinforcement Learning

Mehrdad Moghimi, Hyejin Ku

Comments Accepted at ICML 2025

Journal ref Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:44571-44593, 2025

Abstract

In domains such as finance, healthcare, and robotics, managing worst-case scenarios is critical, as failure to do so can lead to catastrophic outcomes. Distributional Reinforcement Learning (DRL) provides a natural framework to incorporate risk sensitivity into decision-making processes. However, existing approaches face two key limitations: (1) the use of fixed risk measures at each decision step often results in overly conservative policies, and (2) the interpretation and theoretical properties of the learned policies remain unclear. While optimizing a static risk measure addresses these issues, its use in the DRL framework has been limited to the simple static CVaR risk measure. In this paper, we present a novel DRL algorithm with convergence guarantees that optimizes for a broader class of static Spectral Risk Measures (SRM). Additionally, we provide a clear interpretation of the learned policy by leveraging the distribution of returns in DRL and the decomposition of static coherent risk measures. Extensive experiments demonstrate that our model learns policies aligned with the SRM objective, and outperforms existing risk-neutral and risk-sensitive DRL models in various settings.
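
To make the relationship between CVaR and the broader class of spectral risk measures concrete, here is a minimal sample-based sketch. It is not the paper's algorithm. It assumes returns where higher is better, and the function names `cvar_weights` and `spectral_risk` are hypothetical. A spectral risk measure is taken here as a weighted average of sorted returns with non-increasing weights that sum to one, and CVaR is the special case that puts equal weight on the worst `alpha` fraction.

```python
def cvar_weights(n, alpha):
    """Weights over n sorted samples (worst first) that reproduce lower-tail CVaR at level alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    weights = []
    for i in range(1, n + 1):
        # Mass of the interval ((i-1)/n, i/n] that falls below alpha, rescaled by 1/alpha.
        upper = min(i / n, alpha)
        lower = min((i - 1) / n, alpha)
        weights.append((upper - lower) / alpha)
    return weights


def spectral_risk(returns, weights):
    """Weighted average of sorted returns; weights must be non-increasing and sum to one."""
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    if any(later > earlier + 1e-12 for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing (more weight on worse outcomes)")
    ordered = sorted(returns)  # worst outcome first
    return sum(w * r for w, r in zip(weights, ordered))


def mix_weights(weights_a, weights_b, lam):
    """Convex combination of two weight vectors; the result is again a valid spectrum."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(weights_a, weights_b)]
```

Mixing the CVaR weights at a small `alpha` with the uniform weights (`alpha = 1`, the plain mean) gives a spectrum that interpolates between a tail-focused and a risk-neutral objective, which is one simple way to see why the spectral class is broader than CVaR alone.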

2407.21082 2026-02-13 cs.CL cs.LG stat.ML

Accelerating Large Language Model Inference with Self-Supervised Early Exits

Florian Valade

Abstract

This paper presents a modular approach to accelerate inference in large language models (LLMs) by adding early exit heads at intermediate transformer layers. Each head is trained in a self-supervised manner to mimic the main model's predictions, allowing computation to stop early when a calibrated confidence threshold is reached. We evaluate several confidence metrics and show that entropy provides the most reliable separation between correct and incorrect predictions. Experiments on the Pythia model suite (70M to 2.8B parameters) demonstrate that our method significantly reduces inference cost while maintaining accuracy across multiple benchmarks. We further adapt this approach to speculative decoding, introducing Dynamic Self-Speculative Decoding (DSSD), which achieves 1.66x higher token acceptance than manually-tuned LayerSkip baselines with minimal hyperparameter tuning.
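
The entropy-based exit rule described above can be sketched as follows. This is an illustrative outline only, not the author's implementation: the function names, the plain-list representation of per-head logits, and the threshold value in any call are assumptions. Each intermediate head produces logits; the first head whose predictive entropy falls below the threshold is used, and the final head serves as the fallback.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(per_head_logits, threshold):
    """Return (head_index, token_index) for the first head that is confident enough.

    per_head_logits: list of logit vectors, ordered from the shallowest head
    to the final head. threshold: hypothetical entropy cutoff in nats.
    """
    for index, logits in enumerate(per_head_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            token = max(range(len(probs)), key=probs.__getitem__)
            return index, token
    # No head was confident: fall back to the final head's prediction.
    probs = softmax(per_head_logits[-1])
    return len(per_head_logits) - 1, max(range(len(probs)), key=probs.__getitem__)
```

In practice the threshold would be calibrated on held-out data, as the abstract indicates; a lower threshold exits less often and spends more compute.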

2405.00357 2026-02-13 q-fin.RM math.PR math.ST q-fin.MF stat.TH

Optimal nonparametric estimation of the expected shortfall risk

Daniel Bartl, Stephan Eckstein

Comments To appear in: SIAM Journal on Financial Mathematics

Abstract

We address the problem of estimating the expected shortfall risk of a financial loss using a finite number of i.i.d. data. It is well known that the classical plug-in estimator suffers from poor statistical performance when faced with (heavy-tailed) distributions that are commonly used in financial contexts. Further, it lacks robustness, as the modification of even a single data point can cause a significant distortion. We propose a novel procedure for the estimation of the expected shortfall and prove that it recovers the best possible statistical properties (dictated by the central limit theorem) under minimal assumptions and for all finite numbers of data. Further, this estimator is adversarially robust: even if a (small) proportion of the data is maliciously modified, the procedure continues to optimally estimate the true expected shortfall risk. We demonstrate that our estimator outperforms the classical plug-in estimator through a variety of numerical experiments across a range of standard loss distributions.
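
For context, the classical plug-in estimator that the abstract criticizes can be sketched in a few lines. This is the baseline, not the authors' proposed procedure, which the abstract does not specify. The sketch assumes losses where larger is worse and a level `alpha` in (0, 1), and it evaluates the Rockafellar-Uryasev representation of expected shortfall at an empirical `alpha`-quantile; the function name is hypothetical.

```python
import math


def plug_in_expected_shortfall(losses, alpha):
    """Expected shortfall of the empirical distribution of `losses` at level alpha."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(losses)
    ordered = sorted(losses)
    # Empirical alpha-quantile (value-at-risk) of the sample.
    var = ordered[max(math.ceil(n * alpha) - 1, 0)]
    # Rockafellar-Uryasev form: VaR plus the rescaled mean excess over VaR.
    excess = sum(max(x - var, 0.0) for x in ordered)
    return var + excess / ((1.0 - alpha) * n)
```

Replacing a single observation in a small sample with a very large value shifts this estimate by a large amount, which illustrates the lack of robustness mentioned in the abstract.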

2311.05532 2026-02-13 eess.SP stat.ME

Uncertainty-Aware Bayes' Rule and Its Applications

Shixiong Wang

Abstract

Bayes' rule has enabled innumerable powerful algorithms of statistical signal processing and statistical machine learning. However, when model misspecifications exist in prior and/or data distributions, the direct application of Bayes' rule is questionable. Philosophically, the key is to balance the relative importance between the prior information and the data evidence when calculating posterior distributions: If prior distributions are overly conservative (i.e., exceedingly spread), we upweight the prior belief; if prior distributions are overly aggressive (i.e., exceedingly concentrated), we downweight the prior belief. The same operation also applies to likelihood distributions, which are defined as normalized likelihoods if the normalization exists. This paper studies a generalized Bayes' rule, called uncertainty-aware (UA) Bayes' rule, to technically realize the above philosophy, thus combating model uncertainties in prior and/or data distributions. In particular, the advantage of the proposed UA Bayes' rule over the existing power posterior (i.e., $\alpha$-posterior) is investigated. Applications of the UA Bayes' rule on classification and estimation are discussed: Specifically, the UA naive Bayes classifier, the UA Kalman filter, the UA particle filter, and the UA interactive-multiple-model filter are suggested and experimentally validated.

2212.06338 2026-02-13 stat.ML cs.LG

Minimax Optimal Estimation of Stability Under Distribution Shift

Hongseok Namkoong, Yuanzhe Ma, Peter W. Glynn

Journal ref Operations Research 2026 74:1, 464-483

Abstract

The performance of decision policies and prediction models often deteriorates when applied to environments different from the ones seen during training. To ensure reliable operation, we analyze the stability of a system under distribution shift, which is defined as the smallest change in the underlying environment that causes the system's performance to deteriorate beyond a permissible threshold. In contrast to standard tail risk measures and distributionally robust losses that require the specification of a plausible magnitude of distribution shift, the stability measure is defined in terms of a more intuitive quantity: the level of acceptable performance degradation. We develop a minimax optimal estimator of stability and analyze its convergence rate, which exhibits a fundamental phase shift behavior. Our characterization of the minimax convergence rate shows that evaluating stability against large performance degradation incurs a statistical cost. Empirically, we demonstrate the practical utility of our stability framework by using it to compare system designs on problems where robustness to distribution shift is critical.

2211.00035 2026-02-13 math.ST stat.TH

Statistical properties of approximate geometric quantiles in infinite-dimensional Banach spaces

Gabriel Romon

Comments v4, added a fully fleshed-out proof of Lemma 5.3

Abstract

Geometric quantiles are location parameters which extend classical univariate quantiles to normed spaces (possibly infinite-dimensional) and which include the geometric median as a special case. The infinite-dimensional setting is highly relevant in the modeling and analysis of functional data, as well as for kernel methods. We begin by providing new results on the existence and uniqueness of geometric quantiles. Estimation is then performed with an approximate M-estimator and we investigate its large-sample properties in infinite dimension. When the population quantile is not uniquely defined, we leverage the theory of variational convergence to obtain asymptotic statements on subsequences in the weak topology. When there is a unique population quantile, we show, under minimal assumptions, that the estimator is consistent in the norm topology for a wide range of Banach spaces including every separable uniformly convex space. In separable Hilbert spaces, we establish weak Bahadur-Kiefer representations of the estimator, from which $\sqrt n$-asymptotic normality follows. As a consequence, we obtain the first central limit theorem valid in a generic Hilbert space and under minimal assumptions that exactly match those of the finite-dimensional case. Our consistency and asymptotic normality results significantly improve the state of the art, even for exact geometric medians in Hilbert spaces.

2204.06990 2026-02-13 math.ST stat.ML stat.TH

Observable adjustments in single-index models for regularized M-estimators

Pierre C Bellec

Abstract

We consider observations $(X,y)$ from single index models with unknown link function, Gaussian covariates and a regularized M-estimator $\hat\beta$ constructed from a convex loss function and regularizer. In the regime where sample size $n$ and dimension $p$ are both increasing such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and the predicted values $X\hat\beta$ has been previously characterized in a number of models: The empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between ratio $p/n$, loss, regularization and the data generating process. This connection between $(\hat\beta,X\hat\beta)$ and the corresponding proximal operators requires solving fixed-point equations that typically involve unobservable quantities such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: Approximations of $(\hat\beta,X\hat\beta)$ in terms of proximal operators are provided that only involve observable adjustments. These proposed observable adjustments are data-driven, e.g., they do not require prior knowledge of the index or the link function. These new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic loss in single index models including logistic regression and 1-bit compressed sensing with 20% corrupted bits.