arXivDaily arXiv每日学术速递 周一至周五更新
重置
2602.23341 2026-02-27 cs.LG cs.DS math.ST stat.ML stat.TH

Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis, Felix Zhou, Ziyu Zhu

Comments Abstract truncated to arXiv limits. To appear in ICLR'26

详情
英文摘要

Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample $x$ is drawn from a $d$-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing $x$. When the coarse samples, roughly speaking, have ``low'' information, the mean cannot be uniquely recovered from observed samples (i.e., the problem is not identifiable). Recent work by Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21] established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists of only convex sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open: (1) When is the mean identifiable under convex partitions? (2) Is computationally efficient estimation possible under identifiability and convex partitions? This work resolves both questions. [...]

2602.23336 2026-02-27 cs.LG stat.ML

Differentiable Zero-One Loss via Hypersimplex Projections

Camilo Gomez, Pengyang Wang, Liansheng Tang

Comments To appear in PAKDD 2026 (Pacific-Asia Conference on Knowledge Discovery and Data Mining), 12 pages

详情
英文摘要

Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer inductive biases and tighter alignment with task-specific objectives. In this work, we introduce a novel differentiable approximation to the zero-one loss-long considered the gold standard for classification performance, yet incompatible with gradient-based optimization due to its non-differentiability. Our method constructs a smooth, order-preserving projection onto the n,k-dimensional hypersimplex through a constrained optimization framework, leading to a new operator we term Soft-Binary-Argmax. After deriving its mathematical properties, we show how its Jacobian can be efficiently computed and integrated into binary and multiclass learning systems. Empirically, our approach achieves significant improvements in generalization under large-batch training by imposing geometric consistency constraints on the output logits, thereby narrowing the performance gap traditionally observed in large-batch training.

2602.23311 2026-02-27 stat.ME stat.AP

Data-Efficient Generative Modeling of Non-Gaussian Global Climate Fields via Scalable Composite Transformations

Johannes Brachem, Paul F. V. Wiemann, Matthias Katzfuss

详情
英文摘要

Quantifying uncertainty in future climate projections is hindered by the prohibitive computational cost of running physical climate models, which severely limits the availability of training data. We propose a data-efficient framework for emulating the internal variability of global climate fields, specifically designed to overcome these sample-size constraints. Inspired by copula modeling, our approach constructs a highly expressive joint distribution via a composite transformation to a multivariate standard normal space. We combine a nonparametric Bayesian transport map for spatial dependence modeling with flexible, spatially varying marginal models, essential for capturing non-Gaussian behavior and heavy-tailed extremes. These marginals are defined by a parametric model followed by a semi-parametric B-spline correction to capture complex distributional features. The marginal parameters are spatially smoothed using Gaussian-process priors with low-rank approximations, rendering the computational cost linear in the spatial dimension. When applied to global log-precipitation-rate fields at more than 50,000 grid locations, our stochastic surrogate achieves high fidelity, accurately quantifying the climate distribution's spatial dependence and marginal characteristics, including the tails. Using only 10 training samples, it outperforms a state-of-the-art competitor trained on 80 samples, effectively octupling the computational budget for climate research. We provide a Python implementation at https://github.com/jobrachem/ppptm .

2602.23291 2026-02-27 stat.ME math.ST stat.TH

Identifiability of Treatment Effects with Unobserved Spatially Varying Confounders

Tommy Tang, Xinran Li, Bo Li

Comments 8 pages, 1 figure

详情
英文摘要

The study of causal effects in the presence of unmeasured spatially varying confounders has garnered increasing attention. However, a general framework for identifiability, which is critical for reliable causal inference from observational data, has yet to be advanced. In this paper, we study a linear model with various parametric model assumptions on the covariance structure between the unmeasured confounder and the exposure of interest. We establish identifiability of the treatment effect for many commonly 20 used spatial models for both discrete and continuous data, under mild conditions on the structure of observation locations and the exposure-confounder association. We also emphasize models or scenarios where identifiability may not hold, under which statistical inference should be conducted with caution.

2602.23257 2026-02-27 stat.ME econ.EM

Randomization Tests in Switchback Experiments

Jizhou Liu, Liang Zhong

详情
英文摘要

Switchback experiments--alternating treatment and control over time--are widely used when unit-level randomization is infeasible, outcomes are aggregated, or user interference is unavoidable. In practice, experimentation must support fast product cycles, so teams often run studies for limited durations and make decisions with modest samples. At the same time, outcomes in these time-indexed settings exhibit serial dependence, seasonality, and occasional heavy-tailed shocks, and temporal interference (carryover or anticipation) can render standard asymptotics and naive randomization tests unreliable. In this paper, we develop a randomization-test framework that delivers finite-sample valid, distribution-free p-values for several null hypotheses of interest using only the known assignment mechanism, without parametric assumptions on the outcome process. For causal effects of interests, we impose two primitive conditions--non-anticipation and a finite carryover horizon m--and construct conditional randomization tests (CRTs) based on an ex ante pooling of design blocks into "sections," which yields a tractable conditional assignment law and ensures imputability of focal outcomes. We provide diagnostics for learning the carryover window and assessing non-anticipation, and we introduce studentized CRTs for a session-wise weak null that accommodates within-session seasonality with asymptotic validity. Power approximations under distributed-lag effects with AR(1) noise guide design and analysis choices, and simulations demonstrate favorable size and power relative to common alternatives. Our framework extends naturally to other time-indexed designs.

2602.23233 2026-02-27 stat.AP

The Counterfactual Combine: A Causal Framework for Player Evaluation

Herbert P. Susmann, Antonio D'Alessandro

Comments 23 pages, 8 figures

详情
英文摘要

Evaluating sports players based on their performance shares core challenges with evaluating healthcare providers based on patient outcomes. Drawing on recent advances in healthcare provider profiling, we cast sports player evaluation within a rigorous causal inference framework and define a flexible class of causal player evaluation estimands. Using stochastic interventions, we compare player success rates on repeated tasks (such as field goal attempts or plate appearance) to counterfactual success rates had those same attempts been randomly reassigned to players according to prespecified reference distributions. This setup encompasses direct and indirect standardization parameters familiar from healthcare provider profiling, and we additionally propose a "performance above random replacement" estimand designed for interpretability in sports settings. We develop doubly robust estimators for these evaluation metrics based on modern semiparametric statistical methods, with a focus on Targeted Minimum Loss-based Estimation, and incorporate machine learning methods to capture complex relationships driving player performance. We illustrate our framework in detailed case studies of field goal kickers in the National Football League and batters in Major League Baseball, highlighting how different causal estimands yield distinct interpretations and insights about player performance.

2602.23143 2026-02-27 stat.ME

Dimension Reduction in Multivariate Extremes via Latent Linear Factor Models

Alexis Boulin, Axel Bücher

Comments 36 pages

详情
英文摘要

We propose a new and interpretable class of high-dimensional tail dependence models based on latent linear factor structures. Specifically, extremal dependence of an observable vector is assumed to be driven by a lower-dimensional latent $K$-factor model, where $K \ll d$, thereby inducing an explicit form of dimension reduction. Geometrically, this is reflected in the support of the associated spectral dependence measure, whose intrinsic dimension is at most $K-1$. The loading structure may additionally exhibit sparsity, meaning that each component is influenced by only a small number of latent factors, which further enhances interpretability and scalability. Under mild structural assumptions, we establish identifiability of the model parameters and provide a constructive recovery procedure based on a margin-free tail pairwise dependence matrix, which also yields practical rank-based estimation methods. The framework combines naturally with marginal tail models and is particularly well suited to high-dimensional settings. We illustrate its applicability in a spatial wind energy application, where the latent factor structure enables tractable estimation of the risk that a large proportion of turbines simultaneously fall below their cut-in wind speed thresholds.

2602.23131 2026-02-27 physics.data-an stat.AP

Titanic overconfidence -- dark uncertainty can sink hybrid metrology for semiconductor manufacturing

Ronald G. Dixson, Adam L. Pintar, R. Joseph. Kline, Thomas A. Germer, J. Alexander Liddle, John S. Villarrubia, Samuel M. Stavis

Comments This manuscript has been submitted to the IEEE for possible publication

详情
英文摘要

Hybrid metrology for semiconductor manufacturing is on a collision course with dark uncertainty. An IEEE technology roadmap for this venture has targeted a linewidth uncertainty of +/- 0.17 nm at 95 % coverage and advised the hybridization of results from different measurement methods to hit this target. Related studies have applied statistical models that require consistent results to compel a lower uncertainty, whereas inconsistent results are prevalent. We illuminate this lurking issue, studying how standard methods of uncertainty evaluation fail to account for the causes and effects of dark uncertainty. We revisit a comparison of imaging and scattering methods to measure linewidths of approximately 13 nm, applying contrasting statistical models to highlight the potential effect of dark uncertainty on hybrid metrology. A random effects model allows the combination of inconsistent results, accounting for dark uncertainty and estimating a total uncertainty of +/- 0.8 nm at 95 % coverage. In contrast, a common mean model requires consistent results for combination, ignoring dark uncertainty and underestimating the total uncertainty by as much as a factor of five. To avoid such titanic overconfidence, which can sink a venture, we outline good practices to reduce dark uncertainty and guide the combination of indeterminately consistent results.

2602.23073 2026-02-27 math.ST cs.AI stat.TH

Accelerated Online Risk-Averse Policy Evaluation in POMDPs with Theoretical Guarantees and Novel CVaR Bounds

Yaacov Pariente, Vadim Indelman

详情
英文摘要

Risk-averse decision-making under uncertainty in partially observable domains is a central challenge in artificial intelligence and is essential for developing reliable autonomous agents. The formal framework for such problems is the partially observable Markov decision process (POMDP), where risk sensitivity is introduced through a risk measure applied to the value function, with Conditional Value-at-Risk (CVaR) being a particularly significant criterion. However, solving POMDPs is computationally intractable in general, and approximate methods rely on computationally expensive simulations of future agent trajectories. This work introduces a theoretical framework for accelerating CVaR value function evaluation in POMDPs with formal performance guarantees. We derive new bounds on the CVaR of a random variable X using an auxiliary random variable Y, under assumptions relating their cumulative distribution and density functions; these bounds yield interpretable concentration inequalities and converge as the distributional discrepancy vanishes. Building on this, we establish upper and lower bounds on the CVaR value function computable from a simplified belief-MDP, accommodating general simplifications of the transition dynamics. We develop estimators for these bounds within a particle-belief MDP framework with probabilistic guarantees, and employ them for acceleration via action elimination: actions whose bounds indicate suboptimality under the simplified model are safely discarded while ensuring consistency with the original POMDP. Empirical evaluation across multiple POMDP domains confirms that the bounds reliably separate safe from dangerous policies while achieving substantial computational speedups under the simplified model.

2602.23045 2026-02-27 stat.ME math.ST stat.TH

Semiparametric Joint Inference for Sensitivity and Specificity at the Youden-Optimal Cut-Off

Siyan Liu, Qinglong Tian, Chunlin Wang, Pengfei Li

Comments 23 pages, 2 figures, 6 tables

详情
英文摘要

Sensitivity and specificity evaluated at an optimal diagnostic cut-off are fundamental measures of classification accuracy when continuous biomarkers are used for disease diagnosis. Joint inference for these quantities is challenging because their estimators are evaluated at a common, data-driven threshold estimated from both diseased and healthy samples, inducing statistical dependence. Existing approaches are largely based on parametric assumptions or fully nonparametric procedures, which may be sensitive to model misspecification or lack efficiency in moderate samples. We propose a semiparametric framework for joint inference on sensitivity and specificity at the Youden-optimal cut-off under the density ratio model. Using maximum empirical likelihood, we derive estimators of the optimal threshold and the corresponding sensitivity and specificity, and establish their joint asymptotic normality. This leads to Wald-type and range-preserving logit-transformed confidence regions. Simulation studies show that the proposed method achieves accurate coverage with improved efficiency relative to existing parametric and nonparametric alternatives across a variety of distributional settings. An analysis of COVID-19 antibody data demonstrates the practical advantages of the proposed approach for diagnostic decision-making.

2602.23023 2026-02-27 math.ST cs.LG math.PR stat.ML stat.TH

Low-degree Lower bounds for clustering in moderate dimension

Alexandra Carpentier, Nicolas Verzelen

详情
英文摘要

We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $Δ$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $Δ$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering problem in moderate dimension.

2602.23021 2026-02-27 math.ST stat.TH

On the errors committed by sequences of estimator functionals

Steffen Grønneberg, Nils Lid Hjort

Comments 27 pages, 1 figure. Statistical Research Report, Department of Mathematics, University of Oslo, from March 2010, now arXiv'd February 2026. The paper is published, essentially in this form, in Mathematical Methods of Statistics, 2012, vol. 20, pages 327-346, at this url: link.springer.com/article/10.3103/S106653071104003X

详情
英文摘要

Consider a sequence of estimators $\hat θ_n$ which converges almost surely to $θ_0$ as the sample size $n$ tends to infinity. Under weak smoothness conditions, we identify the asymptotic limit of the last time $\hat θ_n$ is further than $\eps$ away from $θ_0$ when $\eps \rightarrow 0^+$. These limits lead to the construction of sequentially fixed width confidence regions for which we find analytic approximations. The smoothness conditions we impose is that $\hat θ_n$ is to be close to a Hadamard-differentiable functional of the empirical distribution, an assumption valid for a large class of widely used statistical estimators. Similar results were derived in Hjort and Fenstad (1992, Annals of Statistics) for the case of Euclidean parameter spaces; part of the present contribution is to lift these results to situations involving parameter functionals. The apparatus we develop is also used to derive appropriate limit distributions of other quantities related to the far tail of an almost surely convergent sequence of estimators, like the number of times the estimator is more than $\eps$ away from its target. We illustrate our results by giving a new sequential simultaneous confidence set for the cumulative hazard function based on the Nelson--Aalen estimator and investigate a problem in stochastic programming related to computational complexity.

2602.23020 2026-02-27 stat.ME math.ST stat.TH

Testing Partially-Identifiable Causal Queries Using Ternary Tests

Sourbh Bhadane, Joris M. Mooij, Philip Boeken, Onno Zoeter

详情
英文摘要

We consider hypothesis testing of binary causal queries using observational data. Since the mapping of causal models to the observational distribution that they induce is not one-to-one, in general, causal queries are often only partially identifiable. When binary statistical tests are used for testing partially-identifiable causal queries, their results do not translate in a straightforward manner to the causal hypothesis testing problem. We propose using ternary (three-outcome) statistical tests to test partially-identifiable causal queries. We establish testability requirements that ternary tests must satisfy in terms of uniform consistency and present equivalent topological conditions on the hypotheses. To leverage the existing toolbox of binary tests, we prove that obtaining ternary tests by combining binary tests is complete. Finally, we demonstrate how topological conditions serve as a guide to construct ternary tests for two concrete causal hypothesis testing problems, namely testing the instrumental variable (IV) inequalities and comparing treatment efficacy.

2602.22985 2026-02-27 stat.ML cs.IT cs.LG math.IT

Kernel Integrated $R^2$: A Measure of Dependence

Pouya Roudaki, Shakeel Gavioli-Akilagun, Florian Kalinke, Mona Azadkia, Zoltán Szabó

详情
英文摘要

We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced integrated $R^2$ with the flexibility of reproducing kernel Hilbert spaces (RKHSs). The proposed measure extends integrated $R^2$ from scalar responses to responses taking values on general spaces equipped with a characteristic kernel, allowing to measure dependence of multivariate, functional, and structured data, while remaining sensitive to tail behaviour and oscillatory dependence structures. We establish that (i) this new measure takes values in $[0,1]$, (ii) equals zero if and only if independence holds, and (iii) equals one if and only if the response is almost surely a measurable function of the covariates. Two estimators are proposed: a graph-based method using $K$-nearest neighbours and an RKHS-based method built on conditional mean embeddings. We prove consistency and derive convergence rates for the graph-based estimator, showing its adaptation to intrinsic dimensionality. Numerical experiments on simulated data and a real data experiment in the context of dependency testing for media annotations demonstrate competitive power against state-of-the-art dependence measures, particularly in settings involving non-linear and structured relationships.

2602.22975 2026-02-27 stat.ME

permApprox: a general framework for accurate permutation p-value approximation

Stefanie Peschel, Anne-Laure Boulesteix, Erika von Mutius, Christian L. Müller

详情
英文摘要

Permutation procedures are common practice in hypothesis testing when distributional assumptions about the test statistic are not met or unknown. With only few permutations, empirical p-values lie on a coarse grid and may even be zero when the observed test statistic exceeds all permuted values. Such zero p-values are statistically invalid and hinder multiple testing correction. Parametric tail modeling with the Generalized Pareto Distribution (GPD) has been proposed to address this issue, but existing implementations can again yield zero p-values when the estimated shape parameter is negative and the fitted distribution has a finite upper bound. We introduce a method for accurate and zero-free p-value approximation in permutation testing, embedded in the permApprox workflow and R package. Building on GPD tail modeling, the method enforces a support constraint during parameter estimation to ensure valid extrapolation beyond the observed statistic, thereby strictly avoiding zero p-values. The workflow further integrates robust parameter estimation, data-driven threshold selection, and principled handling of hybrid p-values that are discrete in the bulk and continuous in the extreme tail. Extensive simulations using two-sample t-tests and Wilcoxon rank-sum tests show that permApprox produces accurate, robust, and zero-free p-value approximations across a wide range of sample and effect sizes. Applications to single-cell RNA-seq and microbiome data demonstrate its practical utility: permApprox yields smooth and interpretable p-value distributions even with few permutations. By resolving the zero-p-value problem while preserving accuracy and computational efficiency, permApprox enables reliable permutation-based inference in high-dimensional and computationally intensive settings.

2602.22974 2026-02-27 cs.CE cs.CV eess.IV eess.SP stat.ML

An automatic counting algorithm for the quantification and uncertainty analysis of the number of microglial cells trainable in small and heterogeneous datasets

L. Martino, M. M. Garcia, P. S. Paradas, E. Curbelo

详情
Journal ref
Expert Systems With Applications, Volume 296, Part D, 2026. Num. 129208
英文摘要

Counting immunopositive cells on biological tissues generally requires either manual annotation or (when available) automatic rough systems, for scanning signal surface and intensity in whole slide imaging. In this work, we tackle the problem of counting microglial cells in lumbar spinal cord cross-sections of rats by omitting cell detection and focusing only on the counting task. Manual cell counting is, however, a time-consuming task and additionally entails extensive personnel training. The classic automatic color-based methods roughly inform about the total labeled area and intensity (protein quantification) but do not specifically provide information on cell number. Since the images to be analyzed have a high resolution but a huge amount of pixels contain just noise or artifacts, we first perform a pre-processing generating several filtered images {(providing a tailored, efficient feature extraction)}. Then, we design an automatic kernel counter that is a non-parametric and non-linear method. The proposed scheme can be easily trained in small datasets since, in its basic version, it relies only on one hyper-parameter. However, being non-parametric and non-linear, the proposed algorithm is flexible enough to express all the information contained in rich and heterogeneous datasets as well (providing the maximum overfit if required). Furthermore, the proposed kernel counter also provides uncertainty estimation of the given prediction, and can directly tackle the case of receiving several expert opinions over the same image. Different numerical experiments with artificial and real datasets show very promising results. Related Matlab code is also provided.

2602.22965 2026-02-27 stat.ME cs.CE eess.SP stat.CO stat.ML

A note on the area under the likelihood and the fake evidence for model selection

L. Martino, F. Llorente

详情
Journal ref
Computational Statistics, Volume 40, pages 4799-4824, year 2025
英文摘要

Improper priors are not allowed for the computation of the Bayesian evidence $Z=p({\bf y})$ (a.k.a., marginal likelihood), since in this case $Z$ is not completely specified due to an arbitrary constant involved in the computation. However, in this work, we remark that they can be employed in a specific type of model selection problem: when we have several (possibly infinite) models belonging to the same parametric family (i.e., for tuning parameters of a parametric model). However, the quantities involved in this type of selection cannot be considered as Bayesian evidences: we suggest to use the name ``fake evidences'' (or ``areas under the likelihood'' in the case of uniform improper priors). We also show that, in this model selection scenario, using a diffuse prior and increasing its scale parameter asymptotically to infinity, we cannot recover the value of the area under the likelihood, obtained with a uniform improper prior. We first discuss it from a general point of view. Then we provide, as an applicative example, all the details for Bayesian regression models with nonlinear bases, considering two cases: the use of a uniform improper prior and the use of a Gaussian prior, respectively. A numerical experiment is also provided confirming and checking all the previous statements.

2602.22954 2026-02-27 math.ST cs.CE eess.SP stat.CO stat.ML stat.TH

Effective sample size approximations as entropy measures

L. Martino, V. Elvira

详情
Journal ref
Computational Statistics, Volume 40, pages 5433-5464, 2025
英文摘要

In this work, we analyze alternative effective sample size (ESS) metrics for importance sampling algorithms, and discuss a possible extended range of applications. We show the relationship between the ESS expressions used in the literature and two entropy families, the Rényi and Tsallis entropy. The Rényi entropy is connected to the Huggins-Roy's ESS family introduced in \cite{Huggins15}. We prove that that all the ESS functions included in the Huggins-Roy's family fulfill all the desirable theoretical conditions. We analyzed and remark the connections with several other fields, such as the Hill numbers introduced in ecology, the Gini inequality coefficient employed in economics, and the Gini impurity index used mainly in machine learning, to name a few. Finally, by numerical simulations, we study the performance of different ESS expressions contained in the previous ESS families in terms of approximation of the theoretical ESS definition, and show the application of ESS formulas in a variable selection problem.

2602.22929 2026-02-27 math.ST math.PR stat.TH

Remarks on stationary GARCH processes under heavy tail distributions

Marc Taberner-Ortiz, Manfred Denker

详情
英文摘要

Let $(X_n)_{n\in \mathbb Z}$ be a GARCH process with $E(X_0^4)<\infty$, and let $μ_n$ denote the distribution of $\frac 1{\sqrt n}\sum_{i=1}^n [X_i^2-\mathbb E(X_0^2)]$. We derive a numerical approximation of $μ_n$ when $x_1,...,x_n$ are observed. This yields the derivation of confidence intervals for $μ= E(X_0^2)$ and we investigate the accuracy of these confidence intervals in comparison with standard ones based on normal approximation. Moreover, when the innovation process has heavy tail distribution, we improve the method using a new resampling method.

2602.22925 2026-02-27 stat.ML cs.LG

Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

Katerina Papagiannouli, Dario Trevisan, Giuseppe Pio Zitto

详情
英文摘要

We study wide Bayesian neural networks focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives-rate functions-on predictors, providing an emerging notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.

2602.22884 2026-02-27 stat.ML cs.LG

Unsupervised Continual Learning for Amortized Bayesian Inference

Aayush Mishra, Šimon Kucharský, Paul-Christian Bürkner

详情
英文摘要

Amortized Bayesian Inference (ABI) enables efficient posterior estimation using generative neural networks trained on simulated data, but often suffers from performance degradation under model misspecification. While self-consistency (SC) training on unlabeled empirical data can enhance network robustness, current approaches are limited to static, single-task settings and fail to handle sequentially arriving data or distribution shifts. We propose a continual learning framework for ABI that decouples simulation-based pre-training from unsupervised sequential SC fine-tuning on real-world data. To address the challenge of catastrophic forgetting, we introduce two adaptation strategies: (1) SC with episodic replay, utilizing a memory buffer of past observations, and (2) SC with elastic weight consolidation, which regularizes updates to preserve task-critical parameters. Across three diverse case studies, our methods significantly mitigate forgetting and yield posterior estimates that outperform standard simulation-based training, achieving estimates closer to MCMC reference, providing a viable path for trustworthy ABI across a range of different tasks.

2602.22877 2026-02-27 stat.ME

Projection depth for functional data: Practical issues, computation and applications

Filip Bočinec, Stanislav Nagy, Hyemin Yeon

详情
英文摘要

Statistical analysis of functional data is challenging due to their complex patterns, for which functional depth provides an effective means of reflecting their ordering structure. In this work, we investigate practical aspects of the recently proposed regularized projection depth (RPD), which induces a meaningful ordering of functional data while appropriately accommodating their complex shape features. Specifically, we examine the impact and choice of its tuning parameter, which regulates the degree of effective dimension reduction applied to the data, and propose a random projection-based approach for its efficient computation, supported by theoretical justification. Through comprehensive numerical studies, we explore a wide range of statistical applications of the RPD and demonstrate its particular usefulness in uncovering shape features in functional data analysis. This ability allows the RPD to outperform competing depth-based methods, especially in tasks such as functional outlier detection, classification, and two-sample hypothesis testing.

2602.22694 2026-02-27 stat.AP

Robust optimal reconciliation for hierarchical time series forecasting with M-estimation

Zhichao Wang, Shanshan Wang, Wei Cao, Fei Yang

详情
英文摘要

Aggregation constraints, arising from geographical or sectoral division, frequently emerge in a large set of time series. Coherent forecasts of these constrained series are anticipated to conform to their hierarchical structure organized by the aggregation rules. To enhance its resilience against potential irregular series, we explore the robust reconciliation process for hierarchical time series (HTS) forecasting. We incorporate M-estimation to obtain the reconciled forecasts by minimizing a robust loss function of transforming a group of base forecasts subject to the aggregation constraints. The related minimization procedure is developed and implemented through a modified Newton-Raphson algorithm via local quadratic approximation. Extensive numerical experiments are carried out to evaluate the performance of the proposed method, and the results suggest its feasibility in handling numerous abnormal cases (for instance, series with non-normal errors). The proposed robust reconciliation also demonstrates excellent efficiency when no outliers exist in HTS. Finally, we showcase the practical application of the proposed method in a real-data study on Australian domestic tourism.

2602.22687 2026-02-27 stat.ME stat.CO

Renewable estimation in linear expectile regression models with streaming data sets

Wei Cao, Shanshan Wanga, Xiaoxue Hua

详情
英文摘要

Streaming data often exhibit heterogeneity due to heteroscedastic variances or inhomogeneous covariate effects. Online renewable quantile and expectile regression methods provide valuable tools for detecting such heteroscedasticity by combining current data with summary statistics from historical data. However, quantile regression can be computationally demanding because of the non-smooth check function. To address this, we propose a novel online renewable method based on expectile regression, which efficiently updates estimates using both current observations and historical summaries, thereby reducing storage requirements. By exploiting the smoothness of the expectile loss function, our approach achieves superior computational efficiency compared with existing online renewable methods for streaming data with heteroscedastic variances or inhomogeneous covariate effects. We establish the consistency and asymptotic normality of the proposed estimator under mild regularity conditions, demonstrating that it achieves the same statistical efficiency as oracle estimators based on full individual-level data. Numerical experiments and real-data applications demonstrate that our method performs comparably to the oracle estimator while maintaining high computational efficiency and minimal storage costs.

2602.22684 2026-02-27 stat.ME

Learning about Corner Kicks in Soccer by Analysis of Event Times Using a Frailty Model

Riley L Isaacs, X. Joan Hu, K. Ken Peng, Tim Swartz

详情
英文摘要

Corner kicks are an important event in soccer because they are often the result of strong attacking play and can be of keen interest to sports fans and bettors. Peng, Hu, and Swartz (2024, Computational Statistics) formulate the mixture feature of corner kick times caused by previous corner kicks, frame the commonly available corner kick data as right-censored event times, and explore patterns of corner kicks. This paper extends their modeling to accommodate the potential correlations between corner kicks by the same teams within the same games. We con- sider a frailty model for event times and apply the Monte Carlo Expec- tation Maximization (MCEM) algorithm to obtain the maximum like- lihood estimates for the model parameters. We compare the proposed model with the model in Peng, Hu, and Swartz (2024) using likelihood ratio tests. The 2019 Chinese Super League (CSL) data are employed throughout the paper for motivation and illustration.

2602.21585 2026-02-27 cs.LG cs.AI cs.CL stat.ML

Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences

Sweta Karlekar, Carolina Zheng, Magnus Saebo, Nicolas Beltran-Velez, Shuyang Yu, John Bowlan, Michal Kucer, David Blei

详情
英文摘要

Many applications seek to optimize LLM outputs at test time by iteratively proposing, scoring, and refining candidates over a discrete output space. Existing methods use a calibrated scalar evaluator for the target objective to guide search, but for many tasks such scores are unavailable, too sparse, or unreliable. Pairwise comparisons, by contrast, are often easier to elicit, still provide useful signal on improvement directions, and can be obtained from the LLM itself without external supervision. Building on this observation, we introduce Duel-Evolve, an evolutionary optimization algorithm that replaces external scalar rewards with pairwise preferences elicited from the same LLM used to generate candidates. Duel-Evolve aggregates these noisy candidate comparisons via a Bayesian Bradley-Terry model, yielding uncertainty-aware estimates of candidate quality. These quality estimates guide allocation of the comparison budget toward plausible optima using Double Thompson Sampling, as well as selection of high-quality parents to generate improved candidates. We evaluate Duel-Evolve on MathBench, where it achieves 20 percentage points higher accuracy over existing methods and baselines, and on LiveCodeBench, where it improves over comparable iterative methods by over 12 percentage points. Notably, the method requires no reward model, no ground-truth labels during search, and no hand-crafted scoring function. Results show that pairwise self-preferences provide strong optimization signal for test-time improvement over large, discrete output spaces.

2602.19964 2026-02-27 cs.LG cs.AI math.PR stat.ML

On the Equivalence of Random Network Distillation, Deep Ensembles, and Bayesian Inference

Moritz A. Zanger, Yijun Wu, Pascal R. Van der Vaart, Wendelin Böhmer, Matthijs T. J. Spaan

Comments 8 pages, 1 Figure

详情
英文摘要

Uncertainty quantification is central to safe and efficient deployments of deep learning models, yet many computationally practical methods lack lacking rigorous theoretical motivation. Random network distillation (RND) is a lightweight technique that measures novelty via prediction errors against a fixed random target. While empirically effective, it has remained unclear what uncertainties RND measures and how its estimates relate to other approaches, e.g. Bayesian inference or deep ensembles. This paper establishes these missing theoretical connections by analyzing RND within the neural tangent kernel framework in the limit of infinite network width. Our analysis reveals two central findings in this limit: (1) The uncertainty signal from RND -- its squared self-predictive error -- is equivalent to the predictive variance of a deep ensemble. (2) By constructing a specific RND target function, we show that the RND error distribution can be made to mirror the centered posterior predictive distribution of Bayesian inference with wide neural networks. Based on this equivalence, we moreover devise a posterior sampling algorithm that generates i.i.d. samples from an exact Bayesian posterior predictive distribution using this modified \textit{Bayesian RND} model. Collectively, our findings provide a unified theoretical perspective that places RND within the principled frameworks of deep ensembles and Bayesian inference, and offer new avenues for efficient yet theoretically grounded uncertainty quantification methods.

2602.19241 2026-02-27 stat.ML cs.AI cs.LG

Scaling Laws for Precision in High-Dimensional Linear Regression

Dechen Zhang, Xuan Tang, Yingyu Liang, Difan Zou

详情
英文摘要

Low-precision training is critical for optimizing the trade-off between model quality and training costs, necessitating the joint allocation of model size, dataset size, and numerical precision. While empirical scaling laws suggest that quantization impacts effective model and data capacities or acts as an additive error, the theoretical mechanisms governing these effects remain largely unexplored. In this work, we initiate a theoretical study of scaling laws for low-precision training within a high-dimensional sketched linear regression framework. By analyzing multiplicative (signal-dependent) and additive (signal-independent) quantization, we identify a critical dichotomy in their scaling behaviors. Our analysis reveals that while both schemes introduce an additive error and degrade the effective data size, they exhibit distinct effects on effective model size: multiplicative quantization maintains the full-precision model size, whereas additive quantization reduces the effective model size. Numerical experiments validate our theoretical findings. By rigorously characterizing the complex interplay among model scale, dataset size, and quantization error, our work provides a principled theoretical basis for optimizing training protocols under practical hardware constraints.

2602.01434 2026-02-27 cs.LG math.ST stat.TH

Phase Transitions for Feature Learning in Neural Networks

Andrea Montanari, Zihao Wang

Comments 75 pages; 17 pdf figures; v2 is a minor revision of v1

详情
英文摘要

According to a popular viewpoint, neural networks learn from data by first identifying low-dimensional representations, and subsequently fitting the best model in this space. Recent works provide a formalization of this phenomenon when learning multi-index models. In this setting, we are given $n$ i.i.d. pairs $({\boldsymbol x}_i,y_i)$, where the covariate vectors ${\boldsymbol x}_i\in\mathbb{R}^d$ are isotropic, and responses $y_i$ only depend on ${\boldsymbol x}_i$ through a $k$-dimensional projection ${\boldsymbol Θ}_*^{\sf T}{\boldsymbol x}_i$. Feature learning amounts to learning the latent space spanned by ${\boldsymbol Θ}_*$. In this context, we study the gradient descent dynamics of two-layer neural networks under the proportional asymptotics $n,d\to\infty$, $n/d\toδ$, while the dimension of the latent space $k$ and the number of hidden neurons $m$ are kept fixed. Earlier work establishes that feature learning via polynomial-time algorithms is possible if $δ> δ_{\text{alg}}$, for $δ_{\text{alg}}$ a threshold depending on the data distribution, and is impossible (within a certain class of algorithms) below $δ_{\text{alg}}$. Here we derive an analogous threshold $δ_{\text{NN}}$ for two-layer networks. Our characterization of $δ_{\text{NN}}$ opens the way to study the dependence of learning dynamics on the network architecture and training algorithm. The threshold $δ_{\text{NN}}$ is determined by the following scenario. Training first visits points for which the gradient of the empirical risk is large and learns the directions spanned by these gradients. Then the gradient becomes smaller and the dynamics becomes dominated by negative directions of the Hessian. The threshold $δ_{\text{NN}}$ corresponds to a phase transition in the spectrum of the Hessian in this second phase.

2510.16560 2026-02-27 stat.ME

Calibrating confounding strength in sensitivity models for weighting estimators: a comparative review and a new method

Jean-Baptiste Baitairian, Bernard Sebastien, Rana Jreich, Sandrine Katsahian, Agathe Guilloux

详情
英文摘要

Causal inference is only valid when its underlying assumptions are satisfied, one of the most central being the ignorability or unconfoundedness assumption. However, this hypothesis is often unrealistic in observational studies, as some confounding variables may remain unobserved. To address this limitation, sensitivity models for Inverse Probability Weighting (IPW) estimators, known as Marginal Sensitivity Models, have been introduced, allowing for a controlled relaxation of ignorability. A substantial body of literature has emerged around these models, aiming to derive sharp and robust bounds for both binary and continuous treatment effects. A key element of these approaches is the specification of a sensitivity parameter, referred to as the "confounding strength", which quantifies the extent of deviation from ignorability. Yet, determining an appropriate value for this parameter is challenging, and the final interpretation of sensitivity analyses can be unclear. We believe these difficulties represent major obstacles to the adoption of such methods in practice. Therefore, after introducing sensitivity analyses for IPW estimators, we review different strategies to estimate or lower bound the confounding strength, introduce a new method leveraging negative controls, provide a decision tree with guidelines to choose a suitable approach, and compare the methodologies in an in-depth simulation study.

2510.15464 2026-02-27 cs.LG cs.AI stat.ML

Learning to Answer from Correct Demonstrations

Nirmit Joshi, Gene Li, Siddharth Bhandari, Shiva Prasad Kasiviswanathan, Cong Ma, Nathan Srebro

Comments Generalized some results. Updated the presentation in light of an important related work of Syed and Schapire. Improved discussions. Comments are welcome

详情
英文摘要

We study the problem of learning to generate an answer (or completion) to a question (or prompt), where there could be multiple correct answers, any one of which is acceptable at test time. Learning is based on demonstrations of some correct answer to each training question, as in Supervised Fine Tuning (SFT). We formalize the problem as imitation learning (i.e., apprenticeship learning) in contextual bandits, with offline demonstrations from some expert (optimal, or very good) policy, without explicitly observed rewards. In contrast to prior work, which assumes the demonstrator belongs to a bounded-complexity policy class, we propose relying only on the underlying reward model (i.e., specifying which answers are correct) being in a bounded-complexity class, which we argue is a strictly weaker assumption. We show that likelihood-maximization methods can fail in this setting, and instead present an approach that learns to answer nearly as well as the demonstrator, with sample complexity logarithmic in the cardinality of the reward class. Our method is similar to Syed and Schapire 2007, when adapted to a contextual bandit (i.e., single step) setup, but is a simple one-pass online approach that enjoys an "optimistic rate" (i.e., $1/\varepsilon$ when the demonstrator is optimal, versus $1/\varepsilon^2$ in Syed and Schapire), and works even with arbitrarily adaptive demonstrations.

2510.13868 2026-02-27 math.OC cs.LG cs.NA math.NA math.PR stat.ML

DeepMartingale: Duality of the Optimal Stopping Problem with Expressivity and High-Dimensional Hedging

Junyan Ye, Hoi Ying Wong

Comments 46 pages, 2 tables, 11 figures

详情
英文摘要

We propose \textit{DeepMartingale}, a deep-learning framework for the dual formulation of discrete-monitoring optimal stopping problems under continuous-time models. Leveraging a martingale representation, our method implements a \emph{pure-dual} procedure that directly optimizes over a parameterized class of martingales, producing computable and tight \emph{dual upper bounds} for the value function in high-dimensional settings without requiring any primal information or Snell-envelope approximation. We prove convergence of the resulting upper bounds under mild assumptions for both first- and second-moment losses. A key contribution is an expressivity theorem showing that \textit{DeepMartingale} can approximate the true value function to any prescribed accuracy $\varepsilon$ using neural networks of size at most $\tilde{c} d^{\tilde{q}}\varepsilon^{-\tilde{r}}$, with constants independent of the dimension $d$ and accuracy $\varepsilon$, thereby avoiding the curse of dimensionality. Since expressivity in this setting translates into scalability, our theory also motivates estimating the dimension scaling law to guide architecture design and the training setup in deep learning-based numerical computation and the choice of rebalancing frequency for the related hedging strategy. The learned martingale representation further yields a practical and dimension-scalable \emph{deep delta hedging strategy}. Numerical experiments on high-dimensional Bermudan option benchmarks confirm convergence, expressivity, scalable training, and the stability of the resulting upper bounds and hedging performance.

2510.07904 2026-02-27 eess.SY cs.LG cs.SY stat.ML

Multi-level informed optimization via decomposed Kriging for large design problems under uncertainty

Enrico Ampellio, Blazhe Gjorgiev, Giovanni Sansavini

Comments 34 pages, 18 figures

详情
英文摘要

Engineering design involves demanding models encompassing many decision variables and uncontrollable parameters. In addition, unavoidable aleatoric and epistemic uncertainties can be very impactful and add further complexity. The state-of-the-art adopts two steps, uncertainty quantification and design optimization, to optimize systems under uncertainty by means of robust or stochastic metrics. However, conventional scenario-based, surrogate-assisted, and mathematical programming methods are not sufficiently scalable to be affordable and precise in large and complex cases. Here, a multi-level approach is proposed to accurately optimize resource-intensive, high-dimensional, and complex engineering problems under uncertainty with minimal resources. A non-intrusive, fast-scaling, Kriging-based surrogate is developed to map the combined design/parameter domain efficiently. Multiple surrogates are adaptively updated by hierarchical and orthogonal decomposition to leverage the fewer and most uncertainty-informed data. The proposed method is statistically compared to the state-of-the-art via an analytical testbed and is shown to be concurrently faster and more accurate by orders of magnitude.

2509.12051 2026-02-27 stat.AP stat.ML

A comparison between geostatistical and machine learning models for spatio-temporal prediction of PM2.5 data

Zeinab Mohamed, Wenlong Gong

详情
英文摘要

Ambient air pollution poses significant health and environmental challenges. Exposure to high concentrations of PM$_{2.5}$ have been linked to increased respiratory and cardiovascular hospital admissions, more emergency department visits and deaths. Traditional air quality monitoring systems such as EPA-certified stations provide limited spatial and temporal data. The advent of low-cost sensors has dramatically improved the granularity of air quality data, enabling real-time, high-resolution monitoring. This study exploits the extensive data from PurpleAir sensors to assess and compare the effectiveness of various statistical and machine learning models in producing accurate hourly PM$_{2.5}$ maps across California. We evaluate traditional geostatistical methods, including kriging and land use regression, against advanced machine learning approaches such as neural networks, random forests, and support vector machines, as well as ensemble model. Our findings enhanced the predictive accuracy of PM2.5 concentration by correcting the bias in PurpleAir data with an ensemble model, which incorporating both spatiotemporal dependencies and machine learning models.

2507.03772 2026-02-27 cs.LG stat.ML

Skewed Score: A statistical framework to assess autograders

Magda Dubois, Harry Coppock, Mario Giulianelli, Timo Flesch, Lennart Luettgau, Cozmin Ududec

详情
英文摘要

The evaluation of large language model (LLM) outputs is increasingly performed by other LLMs, a setup commonly known as "LLM-as-a-judge", or autograders. While autograders offer a scalable alternative to human evaluation, they have shown mixed reliability and may exhibit systematic biases, depending on response type, scoring methodology, domain specificity, or other factors. Here we propose a statistical framework based on Bayesian generalised linear models (GLMs) that enables researchers to simultaneously assess their autograders while addressing their primary research questions (e.g., LLM evaluation). Our approach models evaluation outcomes (e.g., scores or pairwise preferences) as a function of properties of the grader (e.g., human vs. autograder) and the evaluated item (e.g., response length or the LLM that generated it), allowing for explicit quantification of scoring differences and potential biases within a unified framework. In addition, our method can be used to augment traditional metrics such as inter-rater agreement, by providing uncertainty estimates and clarifying sources of disagreement. Overall, this approach contributes to more robust and interpretable use of autograders in LLM evaluation, enabling both performance analysis and bias detection.

2506.18656 2026-02-27 stat.ML cs.LG math.ST stat.TH

On the Interpolation Error of Nonlinear Attention versus Linear Regression

Zhenyu Liao, Jiaqing Liu, TianQi Hou, Difan Zou, Zenan Ling

Comments 37 pages, 7 figures

详情
英文摘要

Attention has become the core building block of modern machine learning (ML) by efficiently capturing the long-range dependencies among input tokens. Its inherently parallelizable structure allows for efficient performance scaling with the rapidly increasing size of both data and model parameters. Despite its central role, the theoretical understanding of Attention, especially in the nonlinear setting, is progressing at a more modest pace. This paper provides a precise characterization of the interpolation error for a nonlinear Attention, in the high-dimensional regime where the number of input tokens $n$ and the embedding dimension $p$ are both large and comparable. Under a signal-plus-noise data model and for fixed Attention weights, we derive explicit (limiting) expressions for the mean-squared interpolation error. Leveraging recent advances in random matrix theory, we show that nonlinear Attention generally incurs a larger interpolation error than linear regression on random inputs. However, this gap vanishes, and can even be reversed, when the input contains a structured signal, particularly if the Attention weights align with the signal direction. Our theoretical insights are supported by numerical experiments.

2505.08371 2026-02-27 cs.LG stat.ML

Density Ratio-based Causal Discovery from Bivariate Continuous-Discrete Data

Takashi Nicholas Maeda, Shohei Shimizu, Hidetoshi Matsui

详情
英文摘要

We address the problem of inferring the causal direction between a continuous variable $X$ and a discrete variable $Y$ from observational data. For the model $X \to Y$, we adopt the threshold model used in prior work. For the model $Y \to X$, we consider two cases: (1) the conditional distributions of $X$ given different values of $Y$ form a location-shift family, and (2) they are mixtures of generalized normal distributions with independently parameterized components. We establish identifiability of the causal direction through three theoretical results. First, we prove that under $X \to Y$, the density ratio of $X$ conditioned on different values of $Y$ is monotonic. Second, we establish that under $Y \to X$ with non-location-shift conditionals, monotonicity of the density ratio holds only on a set of Lebesgue measure zero in the parameter space. Third, we show that under $X \to Y$, the conditional distributions forming a location-shift family requires a precise coordination between the causal mechanism and input distribution, which is non-generic under the principle of independent mechanisms. Together, these results imply that monotonicity of the density ratio characterizes the direction $X \to Y$, whereas non-monotonicity or location-shift conditionals characterizes $Y \to X$. Based on this, we propose Density Ratio-based Causal Discovery (DRCD), a method that determines causal direction by testing for location-shift conditionals and monotonicity of the estimated density ratio. Experiments on synthetic and real-world datasets demonstrate that DRCD outperforms existing methods.

2502.12108 2026-02-27 cs.LG cs.AI stat.ML

Using the Path of Least Resistance to Explain Deep Networks

Sina Salek, Joseph Enguehard

详情
英文摘要

Integrated Gradients (IG), a widely used axiomatic path-based attribution method, assigns importance scores to input features by integrating model gradients along a straight path from a baseline to the input. While effective in some cases, we show that straight paths can lead to flawed attributions. In this paper, we identify the cause of these misattributions and propose an alternative approach that equips the input space with a model-induced Riemannian metric (derived from the explained model's Jacobian) and computes attributions by integrating gradients along geodesics under this metric. We call this method Geodesic Integrated Gradients (GIG). To approximate geodesic paths, we introduce two techniques: a k-Nearest Neighbours-based approach for smaller models and a Stochastic Variational Inference-based method for larger ones. Additionally, we propose a new axiom, No-Cancellation Completeness (NCC), which strengthens completeness by ruling out feature-wise cancellation. We prove that, for path-based attributions under the model-induced metric, NCC holds if and only if the integration path is a geodesic. Through experiments on both synthetic and real-world image classification data, we provide empirical evidence supporting our theoretical analysis and showing that GIG produces more faithful attributions than existing methods, including IG, on the benchmarks considered.

2407.20683 2026-02-27 stat.ME

An online generalization of the (e-)Benjamini-Hochberg procedure

Lasse Fischer, Ziyu Xu, Aaditya Ramdas

Comments 33 pages, 6 figures

详情
英文摘要

In online multiple testing, the hypotheses arrive one by one, and at each time we must immediately reject or accept the current hypothesis solely based on the data and hypotheses observed so far. Many online procedures have been proposed, but none of them are generalizations of the Benjamini-Hochberg (BH) procedure based on p-values, or of the e-BH procedure that uses e-values. In this paper, we consider a relaxed problem setup that allows the current hypothesis to be rejected at any later step. We show that this relaxation allows us to define -- what we justify extensively to be -- the natural and appropriate online extension of the BH and e-BH procedures. We show that the FDR guarantees for BH (resp. e-BH) and online BH (resp. online e-BH) are identical under positive, negative or arbitrary dependence, at fixed and stopping times. Further, the online BH (resp. online e-BH) rule recovers the BH (resp. e-BH) rule as a special case when the number of hypotheses is known to be fixed. Of independent interest, our proof techniques also allow us to prove that numerous existing online procedures, which were known to control the FDR at fixed times, also control the FDR at stopping times.

2402.16639 2026-02-27 cs.LG stat.CO

Differentiable Particle Filtering using Optimal Placement Resampling

Domonkos Csuzdi, Olivér Törő, Tamás Bécsi

详情
英文摘要

Particle filters are a frequent choice for inference tasks in nonlinear and non-Gaussian state-space models. They can either be used for state inference by approximating the filtering distribution or for parameter inference by approximating the marginal data (observation) likelihood. A good proposal distribution and a good resampling scheme are crucial to obtain low variance estimates. However, traditional methods like multinomial resampling introduce nondifferentiability in PF-based loss functions for parameter estimation, prohibiting gradient-based learning tasks. This work proposes a differentiable resampling scheme by deterministic sampling from an empirical cumulative distribution function. We evaluate our method on parameter inference tasks and proposal learning.

2309.15604 2026-02-27 cs.LG q-bio.MN q-bio.QM stat.ML

Entropic Matching for Expectation Propagation of Markov Jump Processes

Yannick Eich, Bastian Alt, Heinz Koeppl

Comments AISTATS 2025

详情
Journal ref
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:856-864, 2025
英文摘要

We propose a novel, tractable latent state inference scheme for Markov jump processes, for which exact inference is often intractable. Our approach is based on an entropic matching framework that can be embedded into the well-known expectation propagation algorithm. We demonstrate the effectiveness of our method by providing closed-form results for a simple family of approximate distributions and apply it to the general class of chemical reaction networks, which are a crucial tool for modeling in systems biology. Moreover, we derive closed-form expressions for point estimation of the underlying parameters using an approximate expectation maximization procedure. We evaluate our method across various chemical reaction networks and compare it to multiple baseline approaches, demonstrating superior performance in approximating the mean of the posterior process. Finally, we discuss the limitations of our method and potential avenues for future improvement, highlighting its promising direction for addressing complex continuous-time Bayesian inference problems.

2307.06048 2026-02-27 math.OC cs.LG stat.ML

Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization

Massil Hihat, Stéphane Gaïffas, Guillaume Garrigos, Simon Bussy

详情
Journal ref
Advances in Neural Information Processing Systems 2023
英文摘要

We study multi-product inventory control problems where a manager makes sequential replenishment decisions based on partial historical information in order to minimize its cumulative losses. Our motivation is to consider general demands, losses and dynamics to go beyond standard models which usually rely on newsvendor-type losses, fixed dynamics, and unrealistic i.i.d. demand assumptions. We propose MaxCOSD, an online algorithm that has provable guarantees even for problems with non-i.i.d. demands and stateful dynamics, including for instance perishability. We consider what we call non-degeneracy assumptions on the demand process, and argue that they are necessary to allow learning.

2305.12996 2026-02-27 stat.ME

Multilevel Control Functional

Kaiyu Li, Yiming Yang, Xiaoyuan Cheng, Yi He, Zhuo Sun

详情
英文摘要

Control variates are variance reduction techniques for Monte Carlo estimators. They play a critical role in improving Monte Carlo estimators in scientific and machine learning applications that involve computationally expensive integrals. We introduce multilevel control functionals (MLCFs), a novel and widely applicable extension of control variates that combines non-parametric Stein-based control variates with multi-fidelity methods. We show that when the integrand and the density are smooth, and when the dimensionality is not very high, MLCFs enjoy a faster convergence rate. We provide both theoretical analysis and empirical assessments on differential equation examples, including Bayesian inference for ecological models, to demonstrate the effectiveness of our proposed approach. Furthermore, we extend MLCFs for variational inference, and demonstrate improved performance empirically through Bayesian neural network examples.

2208.14537 2026-02-27 stat.CO stat.ME

Bayesian Multinomial Logistic Regression for Numerous Categories

Jared D. Fisher, Kyle R. McEvoy

Comments 14 pages, 2 figures. R package available at https://github.com/kylemcevoy/BayesMultiLogit

详情
英文摘要

Bayesian multinomial logistic regression provides a principled, interpretable approach to multiclass classification, but posterior sampling becomes increasingly expensive as the model dimension grows. Prior work has studied scalability in the number of subjects and covariates; in contrast, this paper focuses on how computation changes as the number of outcome categories increases. To improve scalability in settings with numerous categories, we adapt a gamma-augmentation strategy to decouple category-specific coefficient updates, so that each category's coefficients can be updated conditional on a single auxiliary variable per subject, rather than on the full set of other categories' coefficients. Because the resulting coefficient conditionals are non-conjugate, we couple this augmentation with either adaptive Metropolis-Hastings or elliptical slice sampling. Through simulation and a real-data example, we compare effective sample size and effective sampling rate across several standard competitors. We find that the best-performing sampler depends on the dimension and imbalance regime, and that the proposed augmentation provides substantial speedups in scenarios with numerous categories.

2202.03045 2026-02-27 cs.LG stat.ML

Metric-valued regression

Dan Tsir Cohen, Aryeh Kontorovich

详情
Journal ref
Conference on Learning Theory (COLT) 2022, Proceedings of Machine Learning Research (PMLR), 178: 662-700, 2022
英文摘要

We propose an efficient algorithm for learning mappings between two metric spaces, $\X$ and $\Y$. Our procedure is strongly Bayes-consistent whenever $\X$ and $\Y$ are topologically separable and $\Y$ is "bounded in expectation" (our term; the separability assumption can be somewhat weakened). At this level of generality, ours is the first such learnability result for unbounded loss in the agnostic setting. Our technique is based on metric medoids (a variant of Fréchet means) and presents a significant departure from existing methods, which, as we demonstrate, fail to achieve Bayes-consistency on general instance- and label-space metrics. Our proofs introduce the technique of {\em semi-stable compression}, which may be of independent interest.

2602.22648 2026-02-27 stat.ME math.ST stat.TH

A General (Non-Markovian) Framework for Covariate Adaptive Randomization: Achieving Balance While Eliminating the Shift

Hengjia Fang, Wei Ma

详情
英文摘要

Emerging applications increasingly demand flexible covariate adaptive randomization (CAR) methods that support unequal targeted allocation ratios. While existing procedures can achieve covariate balance, they often suffer from the shift problem. This occurs when the allocation ratios of some additional covariates deviate from the target. We show that this problem is equivalent to a mismatch between the conditional average allocation ratio and the target among units sharing specific covariate values, revealing a failure of existing procedures in the long run. To address it, we derive a new form of allocation function by requiring that balancing covariates ensures the ratio matches the target. Based on this form, we design a class of parameterized allocation functions. When the parameter roughly matches certain characteristics of the covariate distribution, the resulting procedure can balance covariates. Thus, we propose a feasible randomization procedure that updates the parameter based on collected covariate information, rendering the procedure non-Markovian. To accommodate this, we introduce a CAR framework that allows non-Markovian procedure. We then establish its key theoretical properties, including the boundedness of covariate imbalance in probability and the asymptotic distribution of the imbalance for additional covariates. Ultimately, we conclude that the feasible randomization procedure can achieve covariate balance and eliminate the shift.

2602.22612 2026-02-27 stat.ME

Feasible Fusion: Constrained Joint Estimation under Structural Non-Overlap

Yuxi Du, Zhiheng Zhang, Haoxuan Li, Cong Fang, Jixing Xu, Peng Zhen, Jiecheng Guo

详情
英文摘要

Causal inference in modern largescale systems faces growing challenges, including highdimensional covariates, multi-valued treatments, massive observational (OBS) data, and limited randomized controlled trial (RCT) samples due to cost constraints. We formalize treatment-induced structural non-overlap and show that, under this regime, commonly used weighted fusion methods provably fail to satisfy randomized identifying restrictions.To address this issue,we propose a constrained joint estimation framework that minimizes observational risk while enforcing causal validity through orthogonal experimental moment conditions. We further show that structural non-overlap creates a feasibility obstruction for moment enforcement in the original covariate space.We also derive a penalized primaldual algorithm that jointly learns representations and predictors, and establish oracle inequalities decomposing error into overlap recovery, moment violation, and statistical terms.Extensive synthetic experiments demonstrate robust performance under varying degrees of nonoverlap. A largescale ridehailing application shows that our method achieves substantial gains over existing baselines, matching the performance of models trained with significantly more RCT data.

2602.22590 2026-02-27 stat.ME

Beyond Vintage Rotation: Bias-Free Sparse Representation Learning with Oracle Inference

Chengyu Cui, Yunxiao Chen, Jing Ouyang, Gongjun Xu

详情
英文摘要

Learning low-dimensional latent representations is a central topic in statistics and machine learning, and rotation methods have long been used to obtain sparse and interpretable representations. Despite nearly a century of widespread use across many fields, rigorous guarantees for valid inference for the learned representation remain lacking. In this paper, we identify a surprisingly prevalent phenomenon that suggests a reason for this gap: for a broad class of vintage rotations, the resulting estimators exhibit a non-estimable bias. Because this bias is independent of the data, it fundamentally precludes the development of valid inferential procedures, including the construction of confidence intervals and hypothesis testing. To address this challenge, we propose a novel bias-free rotation method within a general representation learning framework based on latent variables. We establish an oracle inference property for the learned sparse representations: the estimators achieve the same asymptotic variance as in the ideal setting where the latent variables are observed. To bridge the gap between theory and computation, we develop an efficient computational framework and prove that its output estimators retain the same oracle property. Our results provide a rigorous inference procedure for the rotated estimators, yielding statistically valid and interpretable representation learning.

2602.22588 2026-02-27 stat.ME

Modeling Covariate Feedback, Reversal, and Latent Traits in Longitudinal Data: A Joint Hierarchical Framework

Niloofar Ramezani, Pascal Nitiema, Jeffrey R. Wilson

Comments 16 pages, 1 figure, 4 tables, 3 supplementary figures

详情
英文摘要

Time-varying covariates in longitudinal studies frequently evolve through reciprocal feedback, undergo role reversal, and reflect unobserved individual heterogeneity. Standard statistical frameworks often assume fixed covariate roles and exogenous predictors, limiting their utility in systems governed by dynamic behavioral or physiological processes. We develop a hierarchical joint modeling framework that unifies three key features of such systems: (i) bidirectional feedback between a binary and a continuous covariate, (ii) role reversal in which these covariates become jointly modeled outcomes at a prespecified decision phase, and (iii) a shared latent trait influencing both intermediate covariates and a final binary endpoint. The model proceeds in three phases: a feedback-driven longitudinal process, a reversal phase in which the two covariates are jointly modeled conditional on the latent trait, and an outcome model linking a binary, decision-relevant endpoint to observed and latent components. Estimation is carried out using maximum likelihood and Bayesian inference, with Hamiltonian Monte Carlo supporting robust posterior estimation for models with latent structure and mixed outcome types. Simulation studies show that the model yields well calibrated coverage, small bias, and improved predictive performance compared to standard generalized linear mixed models, marginal approaches, and models that ignore feedback or latent traits. In an analysis of nationally representative U.S. panel data, the model captures the co-evolution of physical activity and body mass index and their joint influence, moderated by a latent behavioral resilience factor, on income mobility. The framework offers a flexible, practically implementable tool for analyzing longitudinal decision systems in which feedback, covariate role transition, and unmeasured capacity are central to prediction and intervention.

2602.22582 2026-02-27 stat.ME

Predictive variational inference for flexible regression models

Lucas Kock, Scott A. Sisson, G. S. Rodrigues, David J. Nott

详情
英文摘要

A conventional Bayesian approach to prediction uses the posterior distribution to integrate out parameters in a density for unobserved data conditional on the observed data and parameters. When the true posterior is intractable, it is replaced by an approximation; here we focus on variational approximations. Recent work has explored methods that learn posteriors optimized for predictive accuracy under a chosen scoring rule, while regularizing toward the prior or conventional posterior. Our work builds on an existing predictive variational inference (PVI) framework that improves prediction, but also diagnoses model deficiencies through implicit model expansion. In models where the sampling density depends on the parameters through a linear predictor, we improve the interpretability of existing PVI methods as a diagnostic tool. This is achieved by adopting PVI posteriors of Gaussian mixture form (GM-PVI) and establishing connections with plug-in prediction for mixture-of-experts models. We make three main contributions. First, we show that GM-PVI prediction is equivalent to plug-in prediction for certain mixture-of-experts models with covariate-independent weights in generalized linear models and hierarchical extensions of them. Second, we extend standard PVI by allowing GM-PVI posteriors to vary with the prediction covariate and in this case an equivalence to plug-in prediction for mixtures of experts with covariate-dependent weights is established. Third, we demonstrate the diagnostic value of this approach across several examples, including generalized linear models, linear mixed models, and latent Gaussian process models, demonstrating how the parameters of the original model must vary across the covariate space to achieve improvements in prediction.

2602.22505 2026-02-27 cs.LG stat.ML

Sharp Convergence Rates for Masked Diffusion Models

Yuchen Liang, Zhiheng Tan, Ness Shroff, Yingbin Liang

详情
英文摘要

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, with masked (absorbing-rate) variants emerging as competitive alternatives to autoregressive models. Among existing samplers, the Euler method remains the standard choice in many applications, and more recently, the First-Hitting Sampler (FHS) has shown considerable promise for masked diffusion models. Despite their practical success, the theoretical understanding of these samplers remains limited. Existing analyses are conducted in Kullback-Leibler (KL) divergence, which often yields loose parameter dependencies and requires strong assumptions on score estimation. Moreover, these guarantees do not cover recently developed high-performance sampler of FHS. In this work, we first develop a direct total-variation (TV) based analysis for the Euler method that overcomes these limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees without requiring any surrogate initialization. Also for this setting, we provide the first convergence lower bound for the Euler sampler, establishing tightness with respect to both the data dimension $d$ and the target accuracy $\varepsilon$. Finally, we analyze the FHS sampler and show that it incurs no sampling error beyond that induced by score estimation, which we show to be tight with a matching lower error bound. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS, which may be of independent interest.

2602.22492 2026-02-27 stat.ML cs.AI cs.LG

From Shallow Bayesian Neural Networks to Gaussian Processes: General Convergence, Identifiability and Scalable Inference

Gracielle Antunes de Araújo, Flávio B. Gonçalves

Comments 29 pages, 4 figures, 8 tables. Supplementary material included

详情
英文摘要

In this work, we study scaling limits of shallow Bayesian neural networks (BNNs) via their connection to Gaussian processes (GPs), with an emphasis on statistical modeling, identifiability, and scalable inference. We first establish a general convergence result from BNNs to GPs by relaxing assumptions used in prior formulations, and we compare alternative parameterizations of the limiting GP model. Building on this theory, we propose a new covariance function defined as a convex mixture of components induced by four widely used activation functions, and we characterize key properties including positive definiteness and both strict and practical identifiability under different input designs. For computation, we develop a scalable maximum a posterior (MAP) training and prediction procedure using a Nyström approximation, and we show how the Nyström rank and anchor selection control the cost-accuracy trade-off. Experiments on controlled simulations and real-world tabular datasets demonstrate stable hyperparameter estimates and competitive predictive performance at realistic computational cost.

2602.22432 2026-02-27 stat.ML cs.LG

LoBoost: Fast Model-Native Local Conformal Prediction for Gradient-Boosted Trees

Vagner Santos, Victor Coscrato, Luben Cabezas, Rafael Izbicki, Thiago Ramos

详情
英文摘要

Gradient-boosted decision trees are among the strongest off-the-shelf predictors for tabular regression, but point predictions alone do not quantify uncertainty. Conformal prediction provides distribution-free marginal coverage, yet split conformal uses a single global residual quantile and can be poorly adaptive under heteroscedasticity. Methods that improve adaptivity typically fit auxiliary nuisance models or introduce additional data splits/partitions to learn the conformal score, increasing cost and reducing data efficiency. We propose LoBoost, a model-native local conformal method that reuses the fitted ensemble's leaf structure to define multiscale calibration groups. Each input is encoded by its sequence of visited leaves; at resolution level k, we group points by matching prefixes of leaf indices across the first k trees and calibrate residual quantiles within each group. LoBoost requires no retraining, auxiliary models, or extra splitting beyond the standard train/calibration split. Experiments show competitive interval quality, improved test MSE on most datasets, and large calibration speedups.

2602.22369 2026-02-27 math.ST math.PR stat.ML stat.TH

Sampling from Constrained Gibbs Measures: with Applications to High-Dimensional Bayesian Inference

Ruixiao Wang, Xiaohong Chen, Sinho Chewi

详情
英文摘要

This paper considers a non-standard problem of generating samples from a low-temperature Gibbs distribution with \emph{constrained} support, when some of the coordinates of the mode lie on the boundary. These coordinates are referred to as the non-regular part of the model. We show that in a ``pre-asymptotic'' regime in which the limiting Laplace approximation is not yet valid, the low-temperature Gibbs distribution concentrates on a neighborhood of its mode. Within this region, the distribution is a bounded perturbation of a product measure: a strongly log-concave distribution in the regular part and a one-dimensional exponential-type distribution in each coordinate of the non-regular part. Leveraging this structure, we provide a non-asymptotic sampling guarantee by analyzing the spectral gap of Langevin dynamics. Key examples of low-temperature Gibbs distributions include Bayesian posteriors, and we demonstrate our results on three canonical examples: a high-dimensional logistic regression model, a Poisson linear model, and a Gaussian mixture model.

2602.22358 2026-02-27 stat.CO

Multiproposal Elliptical Slice Sampling

Guillermina Senn, Nathan Glatt-Holtz, Giulia Carigi, Andrew Holbrook, Håkon Tjelmeland

详情
英文摘要

We introduce Multiproposal Elliptical Slice Sampling, a self-tuning multiproposal Markov chain Monte Carlo method for Bayesian inference with Gaussian priors. Our method generalizes the Elliptical Slice Sampling algorithm by 1) allowing multiple candidate proposals to be sampled in parallel at each self-tuning step, and 2) basing the acceptance step on a distance-informed transition matrix that can favor proposals far from the current state. This allows larger moves in state space and faster self-tuning, at essentially no additional wall clock time for expensive likelihoods, and results in improved mixing. We additionally provide theoretical arguments and experimental results suggesting dimension-robust mixing behavior, making the algorithm particularly well suited for Bayesian PDE inverse problems.

2602.22334 2026-02-27 cs.LG cs.AI stat.ML

A 1/R Law for Kurtosis Contrast in Balanced Mixtures

Yuda Bi, Wenjun Xiao, Linhao Bai, Vince D Calhoun

详情
英文摘要

Kurtosis-based Independent Component Analysis (ICA) weakens in wide, balanced mixtures. We prove a sharp redundancy law: for a standardized projection with effective width $R_{\mathrm{eff}}$ (participation ratio), the population excess kurtosis obeys $|κ(y)|=O(κ_{\max}/R_{\mathrm{eff}})$, yielding the order-tight $O(c_bκ_{\max}/R)$ under balance (typically $c_b=O(\log R)$). As an impossibility screen, under standard finite-moment conditions for sample kurtosis estimation, surpassing the $O(1/\sqrt{T})$ estimation scale requires $R\lesssim κ_{\max}\sqrt{T}$. We also show that \emph{purification} -- selecting $m\!\ll\!R$ sign-consistent sources -- restores $R$-independent contrast $Ω(1/m)$, with a simple data-driven heuristic. Synthetic experiments validate the predicted decay, the $\sqrt{T}$ crossover, and contrast recovery.

2602.22282 2026-02-27 cs.CR cs.LG stat.AP stat.ME stat.ML

Differentially Private Truncation of Unbounded Data via Public Second Moments

Zilong Cao, Xuan Bi, Hai Zhang

详情
英文摘要

Data privacy is important in the AI era, and differential privacy (DP) is one of the golden solutions. However, DP is typically applicable only if data have a bounded underlying distribution. We address this limitation by leveraging second-moment information from a small amount of public data. We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix and applies a principled truncation whose radius depends only on non-private quantities: data dimension and sample size. This transformation yields a well-conditioned second-moment matrix, enabling its inversion with a significantly strengthened ability to resist the DP noise. Furthermore, we demonstrate the applicability of PMT by using penalized and generalized linear regressions. Specifically, we design new loss functions and algorithms, ensuring that solutions in the transformed space can be mapped back to the original domain. We have established improvements in the models' DP estimation through theoretical error bounds, robustness guarantees, and convergence results, attributing the gains to the conditioning effect of PMT. Experiments on synthetic and real datasets confirm that PMT substantially improves the accuracy and stability of DP models.

2602.22239 2026-02-27 stat.AP cs.LG q-bio.GN

VAE-MS: An Asymmetric Variational Autoencoder for Mutational Signature Extraction

Ida Egendal, Rasmus Froberg Brøndum, Dan J Woodcock, Christopher Yau, Martin Bøgsted

Comments Keywords: Variational Autoencoders, Mutational Signatures

详情
英文摘要

Mutational signature analysis has emerged as a powerful method for uncovering the underlying biological processes driving cancer development. However, the signature extraction process, typically performed using non-negative matrix factorization (NMF), often lacks reliability and clinical applicability. To address these limitations, several solutions have been introduced, including the use of neural networks to achieve more accurate estimates and probabilistic methods to better capture natural variation in the data. In this work, we introduce a Variational Autoencoder for Mutational Signatures (VAE-MS), a novel model that leverages both an asymmetric architecture and probabilistic methods for the extraction of mutational signatures. VAE-MS is compared to with three state-of-the-art models for mutational signature extraction: SigProfilerExtractor, the NMF-based gold standard; MUSE-XAE, an autoencoder that employs an asymmetric design without probabilistic components; and SigneR, a Bayesian NMF model, to illustrate the strength in combining a nonlinear extraction with a probabilistic model. In the ability to reconstruct input data and generalize to unseen data, models with probabilistic components (VAE-MS, SigneR) dramatically outperformed models without (SigProfilerExtractor, MUSE-XAE). The NMF-baed models (SigneR, SigProfilerExtractor) had the most accurate reconstructions in simulated data, while VAE-MS reconstructed more accurately on real cancer data. Upon evaluating the ability to extract signatures consistently, no model exhibited a clear advantage over the others. Software for VAE-MS is available at https://github.com/CLINDA-AAU/VAE-MS.

2602.22130 2026-02-27 cs.LG cs.DS math.ST stat.TH

Sample Complexity Bounds for Robust Mean Estimation with Mean-Shift Contamination

Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Sihan Liu

详情
英文摘要

We study the basic task of mean estimation in the presence of mean-shift contamination. In the mean-shift contamination model, an adversary is allowed to replace a small constant fraction of the clean samples by samples drawn from arbitrarily shifted versions of the base distribution. Prior work characterized the sample complexity of this task for the special cases of the Gaussian and Laplace distributions. Specifically, it was shown that consistent estimation is possible in these cases, a property that is provably impossible in Huber's contamination model. An open question posed in earlier work was to determine the sample complexity of mean estimation in the mean-shift contamination model for general base distributions. In this work, we study and essentially resolve this open question. Specifically, we show that, under mild spectral conditions on the characteristic function of the (potentially multivariate) base distribution, there exists a sample-efficient algorithm that estimates the target mean to any desired accuracy. We complement our upper bound with a qualitatively matching sample complexity lower bound. Our techniques make critical use of Fourier analysis, and in particular introduce the notion of a Fourier witness as an essential ingredient of our upper and lower bounds.

2602.18383 2026-02-27 stat.ME math.ST stat.TH

Design-based inference for generalized causal effects in randomized experiments

Xinyuan Chen, Fan Li

详情
英文摘要

Generalized causal effect estimands, including the Mann-Whitney parameter and causal net benefit, provide flexible summaries of treatment effects in randomized experiments with non-Gaussian or multivariate outcomes. We develop a unified design-based inference framework for regression adjustment and variance estimation of a broad class of generalized causal effect estimands defined through pairwise contrast functions. Leveraging the theory of U-statistics and finite-population asymptotics, we establish the consistency and asymptotic normality of regression estimators constructed from individual pairs and per-unit pair averages, even when the working models are misspecified. Consequently, these estimators are model-assisted rather than model-based. In contrast to classical average treatment effect estimands, we show that for nonlinear contrast functions, covariate adjustment preserves consistency but does not admit a universal efficiency guarantee. For inference, we demonstrate that standard heteroskedasticity-robust and cluster-robust variance estimators are generally inconsistent in this setting. As a remedy, we prove that a complete two-way cluster-robust variance estimator, which fully accounts for pairwise dependence and reverse comparisons, is consistent.

2512.22714 2026-02-27 math.ST stat.ME stat.TH

Polynomial-Time Near-Optimal Estimation over Certain Type-2 Convex Bodies

Matey Neykov

Comments fixed some typos

详情
英文摘要

We develop polynomial-time algorithms for near-optimal minimax mean estimation under $\ell_2$-squared loss in a Gaussian sequence model under convex constraints. The parameter space is an origin-symmetric, type-2 convex body $K \subset \mathbb{R}^n$, and we assume additional regularity conditions: specifically, we assume $K$ is well-balanced, i.e., there exist known radii $r, R > 0$ such that $r B_2 \subseteq K \subseteq R B_2$, as well as oracle access to the Minkowski gauge of $K$. Under these and some further assumptions on $K$, our procedures achieve the minimax rate up to small factors, depending poly-logarithmically on the dimension, while remaining computationally efficient. We further extend our methodology to the linear regression and robust heavy-tailed settings, establishing polynomial-time near-optimal estimators when the constraint set satisfies the regularity conditions above. To the best of our knowledge, these results provide the first general framework for attaining statistically near-optimal performance under such broad geometric constraints while preserving computational tractability.

2512.05251 2026-02-27 stat.ML cs.LG

One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow

Pascal Jutras-Dube, Jiaru Zhang, Ziran Wang, Ruqi Zhang

Comments Accepted to the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

详情
英文摘要

Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps to produce high-quality samples, leading to high computational costs. We introduce one-step diffusion samplers which learn a step-conditioned ODE so that one large step reproduces the trajectory of many small ones via a state-space consistency loss. We further show that standard ELBO estimates in diffusion samplers degrade in the few-step regime because common discrete integrators yield mismatched forward/backward transition kernels. Motivated by this analysis, we derive a deterministic-flow (DF) importance weight for ELBO estimation without a backward kernel. To calibrate DF, we introduce a volume-consistency regularization that aligns the accumulated volume change along the flow across step resolutions. Our proposed sampler therefore achieves both fast sampling and stable evidence estimate in only one or few steps. Across challenging synthetic and Bayesian benchmarks, it achieves competitive sample quality with orders-of-magnitude fewer network evaluations while maintaining robust ELBO estimates.

2509.19494 2026-02-27 astro-ph.CO astro-ph.GA physics.data-an stat.ME

Testing the Constancy of Type Ia Supernova Luminosities with Gaussian Process

Akshay Rana

Comments 14 Pages, 04 Figures, Under review

详情
Journal ref
Physics of the Dark Universe, February, 2026
英文摘要

Type Ia supernovae (SNe~Ia) are central to studies of cosmic expansion, under the assumption that their absolute magnitude $M_B$ does not evolve with redshift. Even small drifts in brightness can bias cosmological parameters such as $H_0$ and $w$. Here we test this assumption using a non-parametric Gaussian Process (GP) reconstruction of the expansion history from cosmic chronometer $H(z)$ data, which provides a model-independent baseline distance modulus, $μ_{\rm GP}(z)$. To propagate uncertainties, we draw Monte Carlo realizations of $H(z)$ from the GP posterior and evaluate them on a Chebyshev grid, which improves numerical stability and quadrature accuracy. Supernova observations are then compared to this baseline through residuals, $ΔM_B(z)$, and their derivatives. Applying this method to Pantheon+ (1701 SNe~Ia) and DES 5YR (435 SNe~Ia), we find that SNe~Ia are consistent with being standard candles within $1σ$, though both datasets exhibit localized departures: near $z \sim 1$ in Pantheon+ and at $z \sim 0.3$--$0.5$ in DES. The presence of similar features in two independent surveys suggests they are not purely statistical. Our results point toward a possible non-monotonic luminosity evolution, likely reflecting different physical drivers at different epochs, and highlight the need for a deeper astrophysical understanding of SN~Ia populations.

2506.01408 2026-02-27 physics.soc-ph stat.AP

Modeling temporal hypergraphs

Jürgen Lerner, Marian-Gabriel Hâncean, Matjaz Perc

详情
Journal ref
J. Complex Netw. 13, cnaf054 (2025)
英文摘要

Networks representing social, biological, technological or other systems are often characterized by higher-order interaction involving any number of nodes. Temporal hypergraphs are given by ordered sequences of hyperedges representing sets of nodes interacting at given points in time. In this paper we discuss how a recently proposed model family for time-stamped hyperedges - relational hyperevent models (RHEM) - can be employed to define tailored null distributions for temporal hypergraphs and to test and control for complex dependencies in hypergraph dynamics. RHEM can be specified with a given vector of temporal hyperedge statistics - functions that quantify the structural position of hyperedges in the history of previous hyperedges - and equate expected values of these statistics with their empirically observed values. This allows, for instance, to analyze the overrepresentation or underrepresentation of temporal hyperedge configurations in a model that reproduces the observed distributions of possibly complex sub-configurations, including but going beyond node degrees. Concrete examples include, but are not limited to, preferential attachment, repetition of subsets of any given size, triadic closure, homophily, and degree assortativity for subsets of any order.

2505.05633 2026-02-27 stat.ME stat.CO

Tutorial on Bayesian Functional Regression Using Stan

Ziren Jiang, Ciprian Crainiceanu, Erjia Cui

详情
英文摘要

This manuscript provides step-by-step instructions for implementing Bayesian functional regression models using Stan. Extensive simulations indicate that the inferential performance of the methods is comparable to that of state-of-the-art frequentist approaches. However, Bayesian approaches allow for more flexible modeling and provide an alternative when frequentist methods are not available or may require additional development. Methods and software are illustrated using the accelerometry data from the National Health and Nutrition Examination Survey (NHANES).

2504.21787 2026-02-27 math.ST cs.IT cs.LG math.IT stat.ML stat.TH

Estimation of discrete distributions in relative entropy, and the deviations of the missing mass

Jaouad Mourtada

Comments Minor revision; 54 pages

详情
英文摘要

We study the problem of estimating a distribution over a finite alphabet from an i.i.d. sample, with accuracy measured in relative entropy (Kullback-Leibler divergence). While optimal bounds on the expected risk are known, high-probability guarantees remain less well-understood. First, we analyze the classical Laplace (add-one) estimator, obtaining matching upper and lower bounds on its performance and establishing its optimality among confidence-independent estimators. We then characterize the minimax-optimal high-probability risk and show that it is achieved by a simple confidence-dependent smoothing technique. Notably, the optimal non-asymptotic risk incurs an additional logarithmic factor compared to the ideal asymptotic rate. Next, motivated by regimes in which the alphabet size exceeds the sample size, we investigate methods that adapt to the sparsity of the underlying distribution. We introduce an estimator using data-dependent smoothing, for which we establish a high-probability risk bound depending on two effective sparsity parameters. As part of our analysis, we also derive a sharp high-probability upper bound on the missing mass.

2503.19874 2026-02-27 cs.LG stat.ML

Extensions of the regret-minimization algorithm for optimal design

Youguang Chen, George Biros

详情
英文摘要

We consider the problem of selecting a subset of points from a dataset of $n$ unlabeled examples for labeling, with the goal of training a multiclass classifier. To address this, we build upon the regret minimization framework introduced by Allen-Zhu et al. in "Near-optimal design of experiments via regret minimization" (ICML, 2017). We propose an alternative regularization scheme within this framework, which leads to a new sample selection objective along with a provable sample complexity bound that guarantees a $(1+ε)$-approximate solution. Additionally, we extend the regret minimization approach to handle experimental design in the ridge regression setting. We evaluate the selected samples using logistic regression and compare performance against several state-of-the-art methods. Our empirical results on MNIST, CIFAR-10, and a 50-class subset of ImageNet demonstrate that our method consistently outperforms competing approaches across most scenarios.

2502.13431 2026-02-27 stat.ME econ.EM

Functional Network Autoregressive Models for Panel Data

Tomohiro Ando, Tadao Hoshino

详情
英文摘要

This study proposes a novel functional vector autoregressive framework for analyzing network interactions of functional outcomes in panel data settings. In this framework, an individual's outcome function is influenced by the outcomes of others through a simultaneous equation system. To estimate the functional parameters of interest, we need to address the endogeneity issue arising from these simultaneous interactions among outcome functions. This issue is carefully handled by developing a novel functional moment-based estimator. We establish the consistency, convergence rate, and pointwise asymptotic normality of the proposed estimator. Additionally, we discuss the estimation of marginal effects and impulse response analysis. As an empirical illustration, we analyze the demand for a bike-sharing service in the U.S. The results reveal statistically significant spatial interactions in bike availability across stations, with interaction patterns varying over the time of day.

2502.06051 2026-02-27 cs.LG cs.AI math.ST stat.ML stat.TH

Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits

Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Tong Zhang, Quanquan Gu

Comments 35 pages

详情
英文摘要

Many offline reinforcement learning algorithms are underpinned by $f$-divergence regularization, but their sample complexity *defined with respect to regularized objectives* still lacks tight analyses, especially in terms of concrete data coverage conditions. In this paper, we study the exact concentrability requirements to achieve the $\tildeΘ(ε^{-1})$ sample complexity for offline $f$-divergence-regularized contextual bandits. For reverse Kullback-Leibler (KL) divergence, arguably the most commonly used one, we achieve an $\tilde{O}(ε^{-1})$ sample complexity under single-policy concentrability for the first time via a novel pessimism-based analysis, surpassing existing $\tilde{O}(ε^{-1})$ bound under all-policy concentrability and $\tilde{O}(ε^{-2})$ bound under single-policy concentrability. We also propose a near-matching lower bound, demonstrating that a multiplicative dependency on single-policy concentrability is necessary to maximally exploit the curvature property of reverse KL. Moreover, for $f$-divergences with strongly convex $f$, to which reverse KL *does not* belong, we show that the sharp sample complexity $\tildeΘ(ε^{-1})$ is achievable even without pessimistic estimation or single-policy concentrability. We further corroborate our theoretical insights with numerical experiments and extend our analysis to contextual dueling bandits. We believe these results take a significant step towards a comprehensive understanding of objectives with $f$-divergence regularization.

2405.06727 2026-02-27 stat.ML cs.LG

Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces

Owen Davis, Gianluca Geraci, Mohammad Motamed

Comments Theorem 1 and 2 proofs are incorrect; the proof strategy is flawed and not easily fixed. The Fourier feature residual network weights in [12] must be controlled to avoid infinite constants C. Any fix would require a near-rewrite, producing a manuscript of different scope and not suitable as a replacement. A corrected, modified Theorem 1 appears in arXiv:2207.09511v2

详情
英文摘要

In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that uses complex exponential activation functions. Our proof is constructive and proceeds by conducting a careful complexity analysis associated with the approximation of a Fourier features residual network by a ReLU network.

2309.01935 2026-02-27 stat.AP

The impact of electronic health records (EHR) data continuity on prediction model fairness and racial-ethnic disparities

Yu Huang, Jingchuan Guo, Zhaoyi Chen, Jie Xu, William T Donahoo, Olveen Carasquillo, Hrushyang Adloori, Jiang Bian, Elizabeth A Shenkman

Comments Substantial revision planned: We're preparing a significantly revised version and prefer to withdraw the current one rather than leave a misleading draft up

详情
英文摘要

Electronic health records (EHR) data have considerable variability in data completeness across sites and patients. Lack of "EHR data-continuity" or "EHR data-discontinuity", defined as "having medical information recorded outside the reach of an EHR system" can lead to a substantial amount of information bias. The objective of this study was to comprehensively evaluate (1) how EHR data-discontinuity introduces data bias, (2) case finding algorithms affect downstream prediction models, and (3) how algorithmic fairness is associated with racial-ethnic disparities. We leveraged our EHRs linked with Medicaid and Medicare claims data in the OneFlorida+ network and used a validated measure (i.e., Mean Proportions of Encounters Captured [MPEC]) to estimate patients' EHR data continuity. We developed a machine learning model for predicting type 2 diabetes (T2D) diagnosis as the use case for this work. We found that using cohorts selected by different levels of EHR data-continuity affects utilities in disease prediction tasks. The prediction models trained on high continuity data will have a worse fit on low continuity data. We also found variations in racial and ethnic disparities in model performances and model fairness in models developed using different degrees of data continuity. Our results suggest that careful evaluation of data continuity is critical to improving the validity of real-world evidence generated by EHR data and health equity.