arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.06931 2026-02-09 math.ST stat.CO stat.TH

On micromodes in Bayesian posterior distributions and their implications for MCMC

Sanket Agrawal, Sebastiano Grazzi, Gareth O. Roberts

Comments 37 pages, 4 figures

We investigate the existence and severity of local modes in posterior distributions from Bayesian analyses. These are known to occur in posterior tails resulting from heavy-tailed error models such as those used in robust regression. To understand this phenomenon clearly, we consider in detail location models with Student-$t$ errors in dimension $d$ with sample size $n$. For sufficiently heavy-tailed data-generating distributions, extreme observations become increasingly isolated as $n \to \infty$. We show that each such observation induces a unique local posterior mode with probability tending to $1$. We refer to such a local mode as a micromode. These micromodes are typically small in height but their domains of attraction are large and grow polynomially with $n$. We then connect this posterior geometry to computation. We establish an Arrhenius law for the time taken by one-dimensional piecewise deterministic Monte Carlo algorithms to exit these micromodes. Our analysis identifies a phase transition where a misspecified and overly underdispersed model causes exit times to increase sharply, leading to a pronounced deterioration in sampling performance.
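
The micromode phenomenon can be illustrated numerically. The following is a minimal sketch, not the authors' analysis: it assumes a one-dimensional Student-$t$ location model with a flat prior, a hypothetical data set of eleven clustered points plus one isolated observation at 100, and $\nu = 2$ degrees of freedom. It scans a grid for strict local maxima of the log posterior.

```python
import math

def log_posterior(theta, data, nu):
    """Log posterior (up to a constant) of a Student-t location model, flat prior."""
    return sum(-(nu + 1) / 2 * math.log1p((x - theta) ** 2 / nu) for x in data)

def local_modes(data, nu, lo, hi, step=0.01):
    """Grid points that are strict local maxima of the log posterior on [lo, hi]."""
    n = int(round((hi - lo) / step))
    grid = [lo + step * i for i in range(n + 1)]
    vals = [log_posterior(t, data, nu) for t in grid]
    return [grid[i] for i in range(1, n)
            if vals[i] > vals[i - 1] and vals[i] > vals[i + 1]]

# Hypothetical data: a bulk of 11 points in [-1, 1] and one isolated observation.
bulk = [-1.0 + 0.2 * i for i in range(11)]
data = bulk + [100.0]
modes = local_modes(data, nu=2.0, lo=-10.0, hi=110.0)
print(modes)  # one mode near the bulk, one micromode just below the outlier
```

In this toy setting the second mode sits slightly below the isolated observation, because the bulk of the data pulls the posterior back toward the origin.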

2602.06918 2026-02-09 math.ST stat.TH

Convex lineability in copula and quasi-copula sets

Enrique de Amo, Juan Fernández-Sánchez, David García-Fernández, Manuel Úbeda-Flores

Comments 4 figures

In this paper, we investigate several subsets of $n$-copulas and $n$-quasi-copulas from the perspective of convex-lineability and the recently introduced concept of convex-spaceability. Our purpose is to determine when such families contain extremely large algebraic structures, namely linearly independent sets of cardinality of the continuum whose convex hull, and in some cases a closed convex linearly independent subset, remain entirely inside the class under study. These include the families of asymmetric copulas, copulas with maximal asymmetric measure, and proper $n$-quasi-copulas, among others. In contrast, for several other natural classes of copulas we show that (maximal) convex lineability holds while convex spaceability remains an open problem.

2602.06910 2026-02-09 stat.ME stat.AP

Assessment of evidence against homogeneity in exhaustive subgroup treatment effect plots

Björn Bornkamp, Jiarui Lu, Frank Bretz

Exhaustive subgroup treatment effect plots are constructed by displaying all subgroup treatment effects of interest against subgroup sample size, providing a useful overview of the observed treatment effect heterogeneity in a clinical trial. As in any exploratory subgroup analysis, however, the observed estimates suffer from small sample sizes and multiplicity issues. To facilitate more interpretable exploratory assessments, this paper introduces a computationally efficient method to generate homogeneity regions within exhaustive subgroup treatment effect plots. Using the Doubly Robust (DR) learner, pseudo-outcomes are used to estimate subgroup effects and derive reference distributions, quantifying how surprising observed heterogeneity is under a homogeneous effects model. Explicit formulas are derived for the homogeneity region and different methods for calculation of the critical values are compared. The method is illustrated with a cardiovascular trial and evaluated via simulation, showing well-calibrated inference and improved performance over standard approaches using simple differences of observed group means.

2602.06900 2026-02-09 cs.LG cs.AI cs.IT cs.NE math.IT stat.ML

Supercharging Simulation-Based Inference for Bayesian Optimal Experimental Design

Samuel Klein, Willie Neiswanger, Daniel Ratner, Michael Kagan, Sean Gasiorowski

Bayesian optimal experimental design (BOED) seeks to maximize the expected information gain (EIG) of experiments. This requires a likelihood estimate, which in many settings is intractable. Simulation-based inference (SBI) provides powerful tools for this regime. However, existing work explicitly connecting SBI and BOED is restricted to a single contrastive EIG bound. We show that the EIG admits multiple formulations which can directly leverage modern SBI density estimators, encompassing neural posterior, likelihood, and ratio estimation. Building on this perspective, we define a novel EIG estimator using neural likelihood estimation. Further, we identify optimization as a key bottleneck of gradient-based EIG maximization and show that a simple multi-start parallel gradient ascent procedure can substantially improve reliability and performance. With these innovations, our SBI-based BOED methods match or outperform existing state-of-the-art approaches by up to $22\%$ across standard BOED benchmarks.
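
For orientation, the standard identities behind the "multiple formulations" can be written as follows. This is the textbook definition of the EIG for a design $d$, parameters $\theta$, and outcome $y$ (notation assumed here, not taken from the paper); the estimators actually proposed in the paper may differ in detail.

$$
\mathrm{EIG}(d)
= \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\!\left[\log\frac{p(y\mid\theta,d)}{p(y\mid d)}\right]
= \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\!\left[\log\frac{p(\theta\mid y,d)}{p(\theta)}\right]
= \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\!\left[\log r(\theta,y,d)\right],
\qquad
r(\theta,y,d)=\frac{p(\theta,y\mid d)}{p(\theta)\,p(y\mid d)}.
$$

The three expressions are equal by Bayes' rule. The first calls for a likelihood (and marginal) estimate, the second for a posterior estimate, and the third for a density-ratio estimate.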

2602.06899 2026-02-09 cs.LG stat.ML

Sample Complexity of Causal Identification with Temporal Heterogeneity

Ameya Rathod, Sujay Belsare, Salvik Krishna Nautiyal, Dhruv Laad, Ponnurangam Kumaraguru

Recovering a unique causal graph from observational data is an ill-posed problem because multiple generating mechanisms can lead to the same observational distribution. This problem becomes solvable only by exploiting specific structural or distributional assumptions. While recent work has separately utilized time-series dynamics or multi-environment heterogeneity to constrain this problem, we integrate both as complementary sources of heterogeneity. This integration yields unified necessary identifiability conditions and enables a rigorous analysis of the statistical limits of recovery under thin versus heavy-tailed noise. In particular, temporal structure is shown to effectively substitute for missing environmental diversity, possibly achieving identifiability even under insufficient heterogeneity. Extending this analysis to heavy-tailed (Student's t) distributions, we demonstrate that while geometric identifiability conditions remain invariant, the sample complexity diverges significantly from the Gaussian baseline. Explicit information-theoretic bounds quantify this cost of robustness, establishing the fundamental limits of covariance-based causal graph recovery methods in realistic non-stationary systems. This work shifts the focus from whether causal structure is identifiable to whether it is statistically recoverable in practice.

2602.06828 2026-02-09 stat.ME

A prediction interval for the population-wise error rate

Remi Luschei, Werner Brannath

We construct an asymptotic prediction interval for the population-wise error rate (PWER), which is a multiple type I error criterion for clinical trials with overlapping patient populations. The PWER is the probability that a randomly selected patient will receive an ineffective treatment. It must usually be estimated due to unknown population strata sizes, such that only an estimate can be controlled at the given significance level. We apply the delta method to find a prediction interval for the resulting true PWER, demonstrate by simulations that the interval has the required coverage probability, and illustrate the approach with real data examples.

2602.06764 2026-02-09 math.ST stat.ME stat.TH

Prediction-based inference for integrated diffusions with high-frequency data

Emil S. Jørgensen, Michael Sørensen

We consider parametric inference for an ergodic and stationary diffusion process, when the data are high-frequency observations of the integral of the diffusion process. Such data are obtained via certain measurement devices, or if positions are recorded and speed is modelled by a diffusion. In finance, realized volatility or variations thereof can be used to construct observations of the latent integrated volatility process. Specifically, we assume that the integrated process is observed at equidistant, deterministic time points and consider the high-frequency/infinite horizon asymptotic scenario, where the number of observations, the sampling frequency and the time of the last observation all go to infinity. Subject to mild standard regularity conditions on the diffusion model, we prove the asymptotic existence and uniqueness of a consistent estimator for useful and tractable classes of prediction-based estimating functions. Asymptotic normality of the estimator is obtained under an additional assumption on the rates. The proofs are based on the useful Euler-Ito expansions of transformations of diffusions and integrated diffusions, which we study in some detail.

2602.06621 2026-02-09 stat.ML cs.LG

Infinite-dimensional generative diffusions via Doob's h-transform

Thorben Pieper-Sethmacher, Daniel Paulin

This paper introduces a rigorous framework for defining generative diffusion models in infinite dimensions via Doob's h-transform. Rather than relying on time reversal of a noising process, a reference diffusion is forced towards the target distribution by an exponential change of measure. Compared to existing methodology, this approach readily generalises to the infinite-dimensional setting, hence offering greater flexibility in the diffusion model. The construction is derived rigorously under verifiable conditions, and bounds with respect to the target measure are established. We show that the forced process under the changed measure can be approximated by minimising a score-matching objective and validate our method on both synthetic and real data.

2602.06584 2026-02-09 cs.CL cs.LG stat.ML

Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning

Deqian Kong, Minglu Zhao, Aoyang Qin, Bo Pang, Chenxin Tao, David Hartmann, Edouardo Honig, Dehong Xu, Amit Kumar, Matt Sarte, Chuan Li, Jianwen Xie, Ying Nian Wu

Standard chain-of-thought reasoning generates a solution in a single forward pass, committing irrevocably to each token and lacking a mechanism to recover from early errors. We introduce Inference-Time Rethinking, a generative framework that enables iterative self-correction by decoupling declarative latent thought vectors from procedural generation. We factorize reasoning into a continuous latent thought vector (what to reason about) and a decoder that verbalizes the trace conditioned on this vector (how to reason). Beyond serving as a declarative buffer, latent thought vectors compress the reasoning structure into a continuous representation that abstracts away surface-level token variability, making gradient-based optimization over reasoning strategies well-posed. Our prior model maps unstructured noise to a learned manifold of valid reasoning patterns, and at test time we employ a Gibbs-style procedure that alternates between generating a candidate trace and optimizing the latent vector to better explain that trace, effectively navigating the latent manifold to refine the reasoning strategy. Training a 0.2B-parameter model from scratch on GSM8K, our method with 30 rethinking iterations surpasses baselines with 10 to 15 times more parameters, including a 3B counterpart. This result demonstrates that effective mathematical reasoning can emerge from sophisticated inference-time computation rather than solely from massive parameter counts.

2602.06579 2026-02-09 stat.ME stat.ML

Efficient Online Variational Estimation via Monte Carlo Sampling

Mathis Chagneux, Mathias Müller, Pierre Gloaguen, Sylvain Le Corff, Jimmy Olsson

This article addresses online variational estimation in parametric state-space models. We propose a new procedure for efficiently computing the evidence lower bound and its gradient in a streaming-data setting, where observations arrive sequentially. The algorithm allows for the simultaneous training of the model parameters and the distribution of the latent states given the observations. It is based on i.i.d. Monte Carlo sampling, coupled with a well-chosen deep architecture, enabling both computational efficiency and flexibility. The performance of the method is illustrated on both synthetic data and real-world air-quality data. The proposed approach is theoretically motivated by the existence of an asymptotic contrast function and the ergodicity of the underlying Markov chain, and applies more generally to the computation of additive expectations under posterior distributions in state-space models.

2602.06557 2026-02-09 cs.LG cs.AI stat.ML

Which Graph Shift Operator? A Spectral Answer to an Empirical Question

Yassine Abbahaddou

Graph Neural Networks (GNNs) have established themselves as the leading models for learning on graph-structured data, generally categorized into spatial and spectral approaches. Central to these architectures is the Graph Shift Operator (GSO), a matrix representation of the graph structure used to filter node signals. However, selecting the optimal GSO, whether fixed or learnable, remains largely empirical. In this paper, we introduce a novel alignment gain metric that quantifies the geometric distortion between the input signal and label subspaces. Crucially, our theoretical analysis connects this alignment directly to generalization bounds via a spectral proxy for the Lipschitz constant. This yields a principled, computation-efficient criterion to rank and select the optimal GSO for any prediction task prior to training, eliminating the need for extensive search.

2602.06545 2026-02-09 stat.ML cs.LG

Operationalizing Stein's Method for Online Linear Optimization: CLT-Based Optimal Tradeoffs

Zhiyu Zhang, Aaditya Ramdas

Adversarial online linear optimization (OLO) is essentially about making performance tradeoffs with respect to the unknown difficulty of the adversary. In the setting of one-dimensional fixed-time OLO on a bounded domain, it has been observed since Cover (1966) that achievable tradeoffs are governed by probabilistic inequalities, and these descriptive results can be converted into algorithms via dynamic programming, which, however, is not computationally efficient. We address this limitation by showing that Stein's method, a classical framework underlying the proofs of probabilistic limit theorems, can be operationalized as computationally efficient OLO algorithms. The associated regret and total loss upper bounds are "additively sharp", meaning that they surpass the conventional big-O optimality and match normal-approximation-based lower bounds by additive lower order terms. Our construction is inspired by the remarkably clean proof of a Wasserstein martingale central limit theorem (CLT) due to Röllin (2018). Several concrete benefits can be obtained from this general technique. First, with the same computational complexity, the proposed algorithm improves upon the total loss upper bounds of online gradient descent (OGD) and multiplicative weight update (MWU). Second, our algorithm can realize a continuum of optimal two-point tradeoffs between the total loss and the maximum regret over comparators, improving upon prior works in parameter-free online learning. Third, by allowing the adversary to randomize on an unbounded support, we achieve sharp in-expectation performance guarantees for OLO with noisy feedback.

2602.06539 2026-02-09 stat.ML cs.CG

Revisiting the Sliced Wasserstein Kernel for persistence diagrams: a Figalli-Gigli approach

Marc Janthial, Théo Lacombe

The Sliced Wasserstein Kernel (SWK) for persistence diagrams was introduced by Carrière et al. (2017) as a powerful tool to implicitly embed persistence diagrams in a Hilbert space with reasonable distortion. This kernel is built on the intuition that the Figalli-Gigli distance (that is, the partial matching distance routinely used to compare persistence diagrams) resembles the Wasserstein distance used in the optimal transport literature, and that the latter could be sliced to define a positive definite kernel on the space of persistence diagrams. This efficient construction nonetheless relies on ad hoc tweaks to the Wasserstein distance to account for the peculiar geometry of the space of persistence diagrams. In this work, we propose to revisit this idea by directly using the Figalli-Gigli distance instead of the Wasserstein one as the building block of our kernel. On the theoretical side, our sliced Figalli-Gigli kernel (SFGK) shares most of the important properties of the SWK of Carrière et al., including distortion results on the induced embedding and its ease of computation, while being more faithful to the natural geometry of persistence diagrams. In particular, it can be directly used to handle infinite persistence diagrams and persistence measures. On the numerical side, we show that the SFGK performs as well as the SWK on benchmark applications.

2602.06482 2026-02-09 stat.ME

On Stein's Method of Moments and Generalized Score Matching

Alfred Kume, Stephen G. Walker

Comments 10 pages, 1 figure

We show that a special case of the method of moments estimator derived from the Stein class coincides with the class of generalized score matching estimators. Choosing a suitable weight function for generalized score matching is not straightforward. However, by placing it within the method of moments framework, we can alleviate this problem by extending the Stein class to the generalized method of moments. As a consequence, we can work with a number of functions and hence derive generalized score matching estimators with optimal properties.

2602.06458 2026-02-09 cond-mat.stat-mech stat.ME

Inferring Microscopic Explanatory Structures from Observational Constraints via Large Deviations

Akihisa Ichiki

We study how macroscopic observational constraints restrict admissible microscopic explanatory structures when no intrinsic order or dynamics is assumed a priori. Starting from an unordered collection of measurement outcomes, we formulate inference as a constrained large deviation problem, selecting probability assignments that minimize relative entropy with respect to a reference measure determined solely by the measurement setup. We show that among all microscopic structures compatible with a given macroscopic constraint, those rendering the observation statistically most typical are selected. As an explicit illustration, we demonstrate how ordered microscopic structures can emerge purely from inference under constraint, even when the reference measure is fully permutation symmetric. Order is thus not assumed but inferred, serving here only as an illustrative example of a broader class of relational explanatory hypotheses constrained by observation.

2602.06435 2026-02-09 stat.ME econ.EM

Social Interactions Models with Latent Structures

Zhongjian Lin, Zhentao Shi, Yapeng Zheng

This paper studies estimation and inference of heterogeneous peer effects featuring group fixed effects and slope heterogeneity under latent structure. We adapt the Classifier-Lasso algorithm to consistently discover latent structures and determine the number of clusters. To solve the incidental parameter problem in the binary choice model with social interactions, we propose a parametric bootstrap method to debias and establish its asymptotic validity. Monte Carlo simulations confirm strong finite sample performance of our methods. In an application to students' risky behaviors, the algorithm detects two latent clusters and finds that peer effects are significant within one of the clusters, demonstrating the practical applicability in uncovering heterogeneous social interactions.

2602.06415 2026-02-09 q-fin.MF q-fin.PR stat.ME

Joint survival annuity derivative valuation in the linear-rational Wishart mortality model

Jose Da Fonseca, Patrick Wong

This study proposes a linear-rational joint survival mortality model based on the Wishart process. The Wishart process, which is a stochastic continuous matrix affine process, allows for a general dependency between the mortality intensities that are constructed to be positive. Using the linear-rational framework along with the Wishart process as state variable, we derive a closed-form expression for the joint survival annuity, as well as the guaranteed joint survival annuity option. Exploiting our parameterisation of the Wishart process, we make explicit the distribution of the mortality intensities and their dependency. We provide the distribution (density and cumulative distribution) of the joint survival annuity. We also develop some polynomial expansions for the underlying state variable that lead to fast and accurate approximations for the guaranteed joint survival annuity option. These polynomial expansions also significantly simplify the implementation of the model. Overall, the linear-rational Wishart mortality model provides a flexible and unified framework for modelling and managing joint mortality risk.

2602.06379 2026-02-09 stat.ME stat.AP

E-values for Adaptive Clinical Trials: Anytime-Valid Monitoring in Practice

Alexandra Sokolova, Vadim Sokolov

Adaptive clinical trials rely on interim analyses, flexible stopping, and data-dependent design modifications that complicate statistical guarantees when fixed-horizon test statistics are repeatedly inspected or reused after adaptations. E-values and e-processes provide anytime-valid tests and confidence sequences that remain valid under optional stopping and optional continuation without requiring a prespecified monitoring schedule. This paper is a methodology guide for practitioners. We develop the betting-martingale construction of e-processes for two-arm randomized controlled trials, show how e-values naturally handle composite null hypotheses and support futility monitoring, and provide guidance on when e-values are appropriate, when established alternatives are preferable, and how to integrate e-value monitoring with group sequential and Bayesian adaptive workflows. A numerical study compares five monitoring rules -- naive and calibrated versions of frequentist, Bayesian, and e-value approaches -- in a two-arm binary-endpoint trial. Naive repeated testing and naive posterior thresholds inflate Type I error substantially under frequent interim looks. Among the valid methods, the calibrated group sequential rule achieves the highest power, the e-value rule provides robust anytime-valid control with moderate power, and the calibrated Bayesian rule is the most conservative. Extended simulations show that the power gap between group sequential and e-value methods depends on the monitoring schedule and reverses under continuous monitoring. The methodology, including futility monitoring, platform trial multiplicity control, and hybrid strategies combining e-values with established methods, is implemented in the open-source R package `evalinger` and situated within the regulatory framework of the January 2026 FDA draft guidance on Bayesian methodology.
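
To make the anytime-valid idea concrete, here is a minimal sketch of a likelihood-ratio e-process. It is a deliberate simplification and an assumption of this example: a one-sample Bernoulli test of the null response rate `p0` against a fixed alternative `p1`, rather than the two-arm betting-martingale construction the paper develops, and it does not use or imitate the `evalinger` API. Function names are hypothetical.

```python
def e_process(outcomes, p0, p1):
    """Running likelihood-ratio e-process for binary outcomes (1 = response)."""
    e, path = 1.0, []
    for x in outcomes:
        # Multiply by the likelihood ratio of the new observation.
        e *= (p1 / p0) if x == 1 else ((1 - p1) / (1 - p0))
        path.append(e)
    return path

def first_rejection(outcomes, p0, p1, alpha):
    """First look (1-indexed) at which the e-process reaches 1/alpha, else None."""
    for t, e in enumerate(e_process(outcomes, p0, p1), start=1):
        if e >= 1 / alpha:
            return t
    return None

print(e_process([1, 1, 1, 0, 1], p0=0.5, p1=0.75))
print(first_rejection([1] * 10, p0=0.5, p1=0.75, alpha=0.05))
```

Under the null the process is a nonnegative martingale with initial value 1, so by Ville's inequality the probability that it ever reaches $1/\alpha$ is at most $\alpha$, whatever the monitoring schedule.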

2602.06360 2026-02-09 math.ST stat.TH

Robust Bayesian estimation in conditionally heteroscedastic time series models

Jeongho Lee, Junmo Song

Comments 31 pages, 5 figures

Outliers can seriously distort statistical inference by inducing excessive sensitivity in the likelihood function, thereby compromising the reliability of Bayesian estimation. To address this issue, we develop a robust Bayesian estimation method for conditionally heteroscedastic time series models by extending the density power divergence (DPD) framework to the Bayesian setting. The resulting DPD-based posterior distribution, controlled by a tuning parameter, achieves a smooth balance between efficiency and robustness. We establish the asymptotic properties of the proposed estimator; specifically, the DPD-based posterior is shown to satisfy a Bernstein-von Mises type theorem, converging to a normal distribution centered at the minimum DPD estimator (MDPDE). Furthermore, the corresponding Bayes estimator, defined as the posterior mean under the DPD-based posterior (EDPE), is asymptotically equivalent to the MDPDE. Monte Carlo simulations based on GARCH(1,1) models confirm that the proposed EDPE performs well under both uncontaminated and contaminated data, maintaining robustness where the ordinary Bayes estimator becomes severely biased. An empirical application to BTC-USD returns further demonstrates the practical advantages of the proposed robust Bayesian framework for financial time series analysis.

2602.06301 2026-02-09 stat.ME stat.AP stat.CO

Design-Conditional Prior Elicitation for Dirichlet Process Mixtures: A Unified Framework for Cluster Counts and Weight Control

JoonHo Lee

Dirichlet process mixture (DPM) models are widely used for semiparametric Bayesian analysis in educational and behavioral research, yet specifying the concentration parameter remains a critical barrier. Default hyperpriors often impose strong, unintended assumptions about clustering, while existing calibration methods based on cluster counts suffer from computational inefficiency and fail to control the distribution of mixture weights. This article introduces Design-Conditional Elicitation (DCE), a unified framework that translates practitioner beliefs about cluster structure into coherent Gamma hyperpriors for a fixed design size J. DCE makes three contributions. First, it solves the computational bottleneck using Two-Stage Moment Matching (TSMM), which couples a closed-form approximation with an exact Newton refinement to calibrate hyperparameters without grid search. Second, addressing the "unintended prior" phenomenon, DCE incorporates a Dual-Anchor protocol to diagnose and optionally constrain the risk of weight dominance while transparently reporting the resulting trade-off against cluster-count fidelity. Third, the complete workflow is implemented in the open-source DPprior R package with reproducible diagnostics and a reporting checklist. Simulation studies demonstrate that common defaults such as Gamma(1, 1) induce posterior collapse rates exceeding 60% regardless of the true cluster structure, while DCE-calibrated priors substantially reduce bias and improve recovery across varying levels of data informativeness.
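
The link between the concentration parameter and cluster counts can be sketched with the classical identity $E[K_J \mid \alpha] = \sum_{i=1}^{J} \alpha/(\alpha + i - 1)$ for a Dirichlet process with fixed $\alpha$ and $J$ units. The code below is an illustration under that fixed-$\alpha$ assumption only: it is not the TSMM procedure, it does not calibrate a Gamma hyperprior, and it does not use the DPprior package. Function names are hypothetical.

```python
def expected_clusters(alpha, J):
    """E[K_J | alpha] for a Dirichlet process with J observations."""
    return sum(alpha / (alpha + i) for i in range(J))

def alpha_for_target(target, J, iters=200):
    """Bisection for the alpha whose expected cluster count equals `target`."""
    if not 1 < target < J:
        raise ValueError("target must lie strictly between 1 and J")
    lo, hi = 1e-12, 1.0
    while expected_clusters(hi, J) < target:  # grow the bracket until it covers target
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_clusters(mid, J) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha_hat = alpha_for_target(5.0, J=50)
print(alpha_hat, expected_clusters(alpha_hat, 50))
```

Bisection works here because the expected count is strictly increasing in $\alpha$. Matching a single moment in this way says nothing about the distribution of mixture weights, which is the gap the abstract describes.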

2602.06297 2026-02-09 stat.ML cs.LG

Time-uniform conformal and PAC prediction

Kayla E. Scharfstein, Arun Kumar Kuchibhotla

Given that machine learning algorithms are increasingly being deployed to aid in high-stakes decision-making, uncertainty quantification methods that wrap around these black-box models, such as conformal prediction, have received much attention in recent years. In sequential settings, where data are observed/generated in a streaming fashion, traditional conformal methods do not provide any guarantee without fixing the sample size. More importantly, traditional conformal methods cannot cope with sequentially updated predictions. As such, we develop an extension of the conformal prediction and related probably approximately correct (PAC) prediction frameworks to sequential settings where the number of data points is not fixed in advance. The resulting prediction sets are anytime-valid in that their expected coverage is at the required level at any time chosen by the analyst even if this choice depends on the data. We present theoretical guarantees for our proposed methods and demonstrate their validity and utility on simulated and real datasets.
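
For contrast with the time-uniform setting, the traditional fixed-sample baseline can be sketched as split conformal prediction with absolute-residual scores. This is the standard textbook procedure, shown here as background; it is not the sequential method proposed in the paper, and the function names are hypothetical.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score, or inf if n is too small."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for the requested level
    return sorted(cal_scores)[k - 1]

def prediction_interval(point_pred, cal_scores, alpha):
    """Symmetric interval around a point prediction using absolute-residual scores."""
    q = conformal_quantile(cal_scores, alpha)
    return (point_pred - q, point_pred + q)

scores = [3, 1, 2, 5, 4, 9, 7, 8, 6]  # hypothetical |y - prediction| on calibration data
print(prediction_interval(10.0, scores, alpha=0.5))
```

The coverage guarantee of this construction is tied to the calibration size `n` being fixed in advance, which is the limitation the abstract points to.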

2602.05227 2026-02-09 stat.ML cs.LG cs.NA math.AP math.NA stat.ME

Radon--Wasserstein Gradient Flows for Interacting-Particle Sampling in High Dimensions

Elias Hess-Childs, Dejan Slepčev, Lantian Xu

Comments 49 pages, 7 figures; corrected Figure 4.4

Gradient flows of the Kullback--Leibler (KL) divergence, such as the Fokker--Planck equation and Stein Variational Gradient Descent, evolve a distribution toward a target density known only up to a normalizing constant. We introduce new gradient flows of the KL divergence with a remarkable combination of properties: they admit accurate interacting-particle approximations in high dimensions, and the per-step cost scales linearly in both the number of particles and the dimension. These gradient flows are based on new transportation-based Riemannian geometries on the space of probability measures: the Radon--Wasserstein geometry and the related Regularized Radon--Wasserstein (RRW) geometry. We define these geometries using the Radon transform so that the gradient-flow velocities depend only on one-dimensional projections. This yields interacting-particle-based algorithms whose per-step cost follows from efficient Fast Fourier Transform-based evaluation of the required 1D convolutions. We additionally provide numerical experiments that study the performance of the proposed algorithms and compare convergence behavior and quantization. Finally, we prove some theoretical results including well-posedness of the flows and long-time convergence guarantees for the RRW flow.

2601.21299 2026-02-09 cs.CE eess.SP stat.AP

Collective Noise Filtering in Complex Networks

Tingyu Zhao, István A. Kovács

Complex networks are powerful representations of complex systems across scales and domains, and the field is experiencing unprecedented growth in data availability. However, real-world network data often suffer from noise, biases, and missing data in edge weights, which undermine the reliability of downstream network analyses. Standard noise filtering approaches, whether treating individual edges one-by-one or assuming a uniform global noise level, are suboptimal, because in reality both signal and noise can be heterogeneous and correlated across multiple edges. As a solution, we introduce the Network Wiener Filter, a principled method for collective edge-level noise filtering that leverages both network structure and noise characteristics, to reduce error in the observed edge weights and to infer missing edge weights. We demonstrate the broad practical efficacy of the Network Wiener Filter in two distinct settings, the genetic interaction network of the budding yeast S. cerevisiae and the Enron Corpus email network, noting compelling evidence of successful noise suppression in both applications. With the Network Wiener Filter, we advocate for a shift toward error-aware network science, one that embraces data imperfection as an inherent feature and learns to navigate it effectively.

2601.09347 2026-02-09 cs.IT math.IT math.ST stat.TH

A Constructive Method to Maximize Entropy under Marginal Constraints

Pierre Jean-Claude Robert Bertrand

We study the problem of maximizing Rényi entropy of order $2$ (equivalently, minimizing the index of coincidence) over the set of joint distributions with prescribed marginals. A closed-form optimizer is known under a feasibility condition on the marginals; we show that this condition is highly restrictive. We then provide an explicit construction of an optimal coupling for arbitrary marginals. Our approach characterizes the optimizer's structure and yields an iterative algorithm that terminates in finite time, returning an exact solution after at most $p-1$ updates, where $p$ is the number of rows.
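
The equivalence in the first sentence follows from $H_2(P) = -\log \sum_{i,j} P_{ij}^2$, where the sum is the index of coincidence. The sketch below only evaluates this objective on two hypothetical couplings that share uniform $2 \times 2$ marginals; it does not implement the paper's construction or its iterative algorithm.

```python
import math

def index_of_coincidence(joint):
    """Sum of squared cell probabilities of a joint distribution."""
    return sum(p * p for row in joint for p in row)

def renyi2_entropy(joint):
    """Renyi entropy of order 2: minus the log of the index of coincidence."""
    return -math.log(index_of_coincidence(joint))

def marginals(joint):
    """Row and column sums of the joint matrix."""
    return [sum(r) for r in joint], [sum(c) for c in zip(*joint)]

independent = [[0.25, 0.25], [0.25, 0.25]]  # product coupling
diagonal = [[0.5, 0.0], [0.0, 0.5]]         # perfectly dependent coupling

print(renyi2_entropy(independent), renyi2_entropy(diagonal))
```

Both matrices have the same marginals, yet the product coupling has the smaller index of coincidence and hence the larger order-2 entropy.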

2601.02157 2026-02-09 physics.flu-dyn stat.ML

Multi-fidelity graph-based neural networks architectures to learn Navier-Stokes solutions on non-parametrized 2D domains

Francesco Songia, Raoul Sallé de Chou, Hugues Talbot, Irene Vignon-Clementel

We propose a graph-based, multi-fidelity learning framework for the prediction of stationary Navier--Stokes solutions in non-parametrized two-dimensional geometries. The method is designed to guide the learning process through successive approximations, starting from reduced-order and full Stokes models, and progressively approaching the Navier--Stokes solution. To effectively capture both local and long-range dependencies in the velocity and pressure fields, we combine graph neural networks with Transformer and Mamba architectures. While Transformers achieve the highest accuracy, we show that Mamba can be successfully adapted to graph-structured data through an unsupervised node-ordering strategy. The Mamba approach significantly reduces computational cost while maintaining performance. Physical knowledge is embedded directly into the architecture through an encoding-processing-physics informed decoding pipeline. Derivatives are computed through algebraic operators constructed via the Weighted Least Squares method. The flexibility of these operators allows us not only to make the output obey the governing equations, but also to constrain selected hidden features to satisfy mass conservation. We introduce additional physical biases through an enriched graph convolution with the same differential operators describing the PDEs. Overall, we successfully guide the learning process by physical knowledge and fluid dynamics insights, leading to more regular and accurate predictions.

2512.23087 2026-02-09 cs.LG cs.AI stat.ML

Dynamic Vocabulary Pruning: Stable LLM-RL by Taming the Tail

Yingru Li, Jiawei Xu, Jiacai Liu, Yuxuan Tong, Ziniu Li, Tianle Cai, Ge Zhang, Qian Liu, Baoxiang Wang

Details
English abstract

Reinforcement Learning (RL) for Large Language Models (LLMs) faces a fundamental tension: the numerical divergence between high-throughput inference engines and numerically precise training engines. Although these systems share the same parameters, they produce slightly different probability distributions, creating a training-inference mismatch. We prove that the bound on the log-probability divergence arising from this mismatch scales as $(1-p)$, where $p$ is the token probability. This scaling induces a highly asymmetric effect: the bound vanishes for high-probability tokens but remains significant for low-probability tokens in the distribution tail. When sampled, these tail tokens introduce systematically biased errors that accumulate over sequences, thereby destabilizing gradient estimation. Instead of applying post-hoc corrections, we propose Dynamic Vocabulary Pruning (DVP), which constrains the RL objective to a dynamically determined "safe" vocabulary that excludes the extreme tail. This strategy trades large, destabilizing numerical errors for a small, bounded optimization bias. We validate DVP empirically by demonstrating stable training, and theoretically by deriving strict bounds on the induced bias.

2512.11648 2026-02-09 stat.AP

Dynamic Conditional SKEPTIC

Gabriele Di Luzio, Giacomo Morelli

Details
English abstract

We introduce the Dynamic Conditional SKEPTIC (DCS), a semiparametric approach for efficiently and robustly estimating time-varying correlations in multivariate models. We exploit nonparametric rank-based statistics, namely Spearman's rho and Kendall's tau, to estimate the unknown correlation matrix and discuss the stationarity, beta-mixing, and rho-mixing conditions of the model. We illustrate the methodology by estimating the time-varying conditional correlation matrix of the stocks included in the S&P 100 and S&P 500 during the period from 02/01/2013 to 23/01/2025. The results show that DCS improves diagnostic checks compared to the classical Dynamic Conditional Correlation (DCC) models, providing uncorrelated and normally distributed residuals. A risk management application shows that global minimum variance portfolios estimated using the DCS model exhibit lower turnover than those based on the DCC and DCC-NL models, while also achieving higher Sharpe ratios for portfolios constructed from S&P 100 constituents.
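
To make the rank-based step concrete, here is a minimal static sketch. It assumes the standard SKEPTIC transforms from the nonparanormal literature, $2\sin(\pi\rho/6)$ for Spearman's rho and $\sin(\pi\tau/2)$ for Kendall's tau; how DCS makes these time-varying is not described in the abstract and is not attempted here. All function names are hypothetical, and ties are assumed absent.

```python
import math

def ranks(x):
    """Ranks 1..n of the values in x (assumes no ties)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / number of pairs (no ties)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def skeptic_from_rho(rho):
    """Map Spearman's rho to a correlation estimate: 2 sin(pi * rho / 6)."""
    return 2.0 * math.sin(math.pi * rho / 6.0)

def skeptic_from_tau(tau):
    """Map Kendall's tau to a correlation estimate: sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2.0)
```

Both transforms send 0 to 0 and 1 to 1, so they only reshape intermediate rank correlations onto the Pearson correlation scale.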

2511.11817 2026-02-09 stat.ML cs.LG

FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition

Zhongde An, Jinhong You, Jiyanglin Li, Yiming Tang, Wen Li, Heming Du, Shouguo Du

Comments Added a code link and fixed minor typos

Details
English abstract

Time series forecasting is essential in a wide range of real-world applications. Recently, frequency-domain methods have attracted increasing interest for their ability to capture global dependencies. However, when applied to non-stationary time series, these methods encounter spectral entanglement and the computational burden of complex-valued learning. Spectral entanglement refers to the overlap of trends, periodicities, and noise across the spectrum due to spectral leakage and the presence of non-stationarity. Existing decompositions are not suited to resolving spectral entanglement. To address this, we propose the Frequency Decomposition Network (FreDN), which introduces a learnable Frequency Disentangler module to separate trend and periodic components directly in the frequency domain. Furthermore, we propose a theoretically supported ReIm Block to reduce the complexity of complex-valued operations while maintaining performance. We also re-examine the frequency-domain loss function and provide new theoretical insights into its effectiveness. Extensive experiments on seven long-term forecasting benchmarks demonstrate that FreDN outperforms state-of-the-art methods by up to 10%. Furthermore, compared with standard complex-valued architectures, our real-imaginary shared-parameter design reduces the parameter count and computational cost by at least 50%.

2511.08855 2026-02-09 q-bio.GN stat.ML

Path Signatures Enable Model-Free Mapping of RNA Modifications

Maud Lemercier, Paola Arrubarrena, Salvatore Di Giorgio, Julia Brettschneider, Thomas Cass, Valerie Griesche, Isabel S. Naarmann-de Vries, Anastasia Papavasiliou, Alessia Ruggieri, Irem Tellioglu, Chia Ching Wu, F. Nina Papavasiliou, Terry Lyons

Details
English abstract

Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types within well-characterized sequence contexts for which ample training data exists. Here, we introduce a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads without any other annotated data. For each nanopore read, our approach extracts robust, modification-sensitive features from the raw ionic current signal at a site using the signature transform, then computes an anomaly score by comparing the resulting feature vector to its nearest neighbors in an unmodified reference dataset. We convert anomaly scores into statistical p-values to enable anomaly detection at both individual read and site levels. Validation on densely modified E. coli rRNA demonstrates that our approach detects known sites harboring diverse modification types, without prior training on these modifications. We further applied this framework to dengue virus (DENV) transcripts and mammalian mRNAs. For DENV sfRNA, it revealed a novel 2'-O-methylated site, which we validate orthogonally by qRT-PCR assays. These results demonstrate that our model-free approach operates robustly across different types of RNAs and datasets generated with different nanopore sequencing chemistries.

2511.08846 2026-02-09 cs.LG math.AT stat.ML

On topological descriptors for graph products

Mattie Ji, Amauri H. Souza, Vikas Garg

Comments 26 pages, 4 tables, 5 figures. Accepted at NeurIPS 2025. Final version, clarified and fixed a bug

Details
English abstract

Topological descriptors have been increasingly utilized for capturing multiscale structural information in relational data. In this work, we consider various filtrations on the (box) product of graphs and their effect on the outputs of two topological descriptors: the Euler characteristic (EC) and persistent homology (PH). In particular, we establish a complete characterization of the expressive power of EC on general color-based filtrations. We also show that the PH descriptors of (virtual) graph products contain strictly more information than the computation on individual graphs, whereas EC does not. Additionally, we provide algorithms to compute the PH diagrams of the product of vertex- and edge-level filtrations on the graph product. We also substantiate our theoretical analysis with empirical investigations on runtime analysis, expressivity, and graph classification performance. Overall, this work paves the way for powerful graph persistent descriptors via product filtrations. Code is available at https://github.com/Aalto-QuML/tda_graph_product.

2510.20942 2026-02-09 stat.ME stat.CO

Bayesian analysis of flexible Heckman selection models using Hamiltonian Monte Carlo

Heeju Lim, Victor E. Lachos, Victor H. Lachos

Journal ref Computational Statistics, Volume 41, article number 40, (2026)

Details
English abstract

The Heckman selection model is widely used in econometric analysis and other social sciences to address sample selection bias in data modeling. A common assumption in Heckman selection models is that the error terms follow an independent bivariate normal distribution. However, real-world data often deviates from this assumption, exhibiting heavy-tailed behavior, which can lead to inconsistent estimates if not properly addressed. In this paper, we propose a Bayesian analysis of Heckman selection models that replace the Gaussian assumption with well-known members of the class of scale mixture of normal distributions, such as the Student's-t and contaminated normal distributions. For these complex structures, Stan's default No-U-Turn sampler is utilized to obtain posterior simulations. Through extensive simulation studies, we compare the performance of the Heckman selection models with normal, Student's-t and contaminated normal distributions. We also demonstrate the broad applicability of this methodology by applying it to medical care and labor supply data. The proposed algorithms are implemented in the R package HeckmanStan.

2510.09796 2026-02-09 cs.LG cs.NA math.NA math.OC stat.ML

A Unified Framework for Lifted Training and Inversion Approaches

Xiaoyu Wang, Alexandra Valavanis, Azhir Mahmood, Andreas Mang, Martin Benning, Audrey Repetti

Details
English abstract

The training of deep neural networks predominantly relies on a combination of gradient-based optimisation and back-propagation for the computation of the gradient. While incredibly successful, this approach faces challenges such as vanishing or exploding gradients, difficulties with non-smooth activations, and an inherently sequential structure that limits parallelisation. Lifted training methods offer an alternative by reformulating the nested optimisation problem into a higher-dimensional, constrained optimisation problem where the constraints are no longer enforced directly but penalised with penalty terms. This chapter introduces a unified framework that encapsulates various lifted training strategies, including the Method of Auxiliary Coordinates, Fenchel Lifted Networks, and Lifted Bregman Training, and demonstrates how diverse architectures, such as Multi-Layer Perceptrons, Residual Neural Networks, and Proximal Neural Networks fit within this structure. By leveraging tools from convex optimisation, particularly Bregman distances, the framework facilitates distributed optimisation, accommodates non-differentiable proximal activations, and can improve the conditioning of the training landscape. We discuss the implementation of these methods using block-coordinate descent strategies, including deterministic implementations enhanced by accelerated and adaptive optimisation techniques, as well as implicit stochastic gradient methods. Furthermore, we explore the application of this framework to inverse problems, detailing methodologies for both the training of specialised networks (e.g., unrolled architectures) and the stable inversion of pre-trained networks. Numerical results on standard imaging tasks validate the effectiveness and stability of the lifted Bregman approach compared to conventional training, particularly for architectures employing proximal activations.

2510.02514 2026-02-09 eess.IV cs.CV cs.IT eess.SP math.IT stat.ML

Learning a distance measure from the information-estimation geometry of data

Guy Ohayon, Pierre-Etienne H. Fiquet, Florentin Guth, Jona Ballé, Eero P. Simoncelli

Comments ICLR 2026. Code is available at https://github.com/ohayonguy/information-estimation-metric

Details
English abstract

We introduce the Information-Estimation Metric (IEM), a novel form of distance function derived from an underlying continuous probability density over a domain of signals. The IEM is rooted in a fundamental relationship between information theory and estimation theory, which links the log-probability of a signal with the errors of an optimal denoiser, applied to noisy observations of the signal. In particular, the IEM between a pair of signals is obtained by comparing their denoising error vectors over a range of noise amplitudes. Geometrically, this amounts to comparing the score vector fields of the blurred density around the signals over a range of blur levels. We prove that the IEM is a valid global distance metric and derive a closed-form expression for its local second-order approximation, which yields a Riemannian metric. For Gaussian-distributed signals, the IEM coincides with the Mahalanobis distance. But for more complex distributions, it adapts, both locally and globally, to the geometry of the distribution. In practice, the IEM can be computed using a learned denoiser (analogous to generative diffusion models) and solving a one-dimensional integral. To demonstrate the value of our framework, we learn an IEM on the ImageNet database. Experiments show that this IEM is competitive with or outperforms state-of-the-art supervised image quality metrics in predicting human perceptual judgments.

2507.23034 2026-02-09 stat.ME

Hypothesis testing for community structure in temporal networks using e-values

Eric Yanchenko, Jonathan P. Williams, Ryan Martin

Details
English abstract

Community structure in networks naturally arises in various applications. But while the topic has received significant attention for static networks, the literature on community structure in temporally evolving networks is more scarce. In particular, there are currently no statistical methods available to test for the presence of community structure in a sequence of networks evolving over time. In this work, we propose a simple yet powerful test using e-values, an alternative to p-values that is more flexible in certain ways. Specifically, an e-value framework retains valid testing properties even after combining dependent information, a relevant feature in the context of testing temporal networks. We apply the proposed test to synthetic and real-world networks, demonstrating various features inherited from the e-value formulation and exposing some of the inherent difficulties of testing on temporal networks.
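
The property that e-values can be combined under dependence can be sketched as follows. This is a generic illustration, not the paper's test: it relies on the fact that the arithmetic mean of e-values is itself an e-value under arbitrary dependence, and on Markov's inequality for the rejection threshold $1/\alpha$. The function names are hypothetical.

```python
def combine_e_values(e_values):
    """Arithmetic mean of e-values; valid as an e-value under arbitrary dependence."""
    if not e_values:
        raise ValueError("need at least one e-value")
    if any(e < 0 for e in e_values):
        raise ValueError("e-values must be non-negative")
    return sum(e_values) / len(e_values)

def reject(e_value, alpha=0.05):
    """Reject the null when E >= 1/alpha; by Markov's inequality,
    this happens with probability at most alpha under the null."""
    return e_value >= 1.0 / alpha
```

For example, per-snapshot e-values of 10, 30, and 50 average to 30, which exceeds the threshold 20 at level 0.05, even though the first snapshot alone would not reject.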

2507.04341 2026-02-09 stat.ML cs.AI cs.LG

Efficient Perplexity Bound and Ratio Matching in Discrete Diffusion Language Models

Etrit Haxholli, Yeti Z. Gurbuz, Ogul Can, Eli Waxman

Details
English abstract

While continuous diffusion models excel in modeling continuous distributions, their application to categorical data has been less effective. Recent work has shown that ratio-matching through score-entropy within a continuous-time discrete Markov chain (CTMC) framework serves as a competitive alternative to autoregressive models in language modeling. To enhance this framework, we first introduce three new theorems concerning the KL divergence between the data and learned distribution. Our results serve as the discrete counterpart to those established for continuous diffusion models and allow us to derive an improved upper bound of the perplexity. Second, we empirically show that ratio-matching performed by minimizing the denoising cross-entropy between the clean and corrupted data enables models to outperform those utilizing score-entropy with up to 10% lower perplexity/generative-perplexity, and 15% faster training steps. To further support our findings, we introduce and evaluate a novel CTMC transition-rate matrix that allows prediction refinement, and derive the analytic expression for its matrix exponential which facilitates the computation of conditional ratios thus enabling efficient training and generation.

2506.14389 2026-02-09 stat.OT math.ST stat.TH

Ole E. Barndorff-Nielsen: Sand, Wind and Inference

Michael Sørensen

Journal ref Bernoulli 32, 2026, 49-67

Details
English abstract

This paper reviews Ole Eiler Barndorff-Nielsen's research in the first decades of his career. The focus is on topics that he kept returning to throughout his scientific life, and on papers that he built on in later important contributions. First, his early contributions to the foundations of statistical inference are reviewed, with a focus on conditional inference and exponential families, two topics in which he had a lifelong interest. The second half of the paper reviews his research on wind-blown sand and hyperbolic distributions and processes, including his early contributions to modelling of turbulent wind fields. This research laid the foundations for his later work on financial econometrics and ambit processes.

2505.23506 2026-02-09 cs.LG stat.ML

Position: Epistemic uncertainty estimation methods are fundamentally incomplete

Sebastián Jiménez, Mira Jürgens, Willem Waegeman

Details
English abstract

Identifying and disentangling sources of predictive uncertainty is essential for trustworthy supervised learning. We argue that widely used second-order methods that disentangle aleatoric and epistemic uncertainty are fundamentally incomplete. First, we show that unaccounted bias contaminates uncertainty estimates by overestimating aleatoric (data-related) uncertainty and underestimating the epistemic (model-related) counterpart, leading to incorrect uncertainty quantification. Second, we demonstrate that existing methods capture only partial contributions to the variance-driven part of epistemic uncertainty; different approaches account for different variance sources, yielding estimates that are incomplete and difficult to interpret. Together, these results highlight that current epistemic uncertainty estimates can only be used in safety-critical and high-stakes decision-making when limitations are fully understood by end users and acknowledged by AI developers.

2505.13370 2026-02-09 stat.ME stat.ML

A Kolmogorov-Arnold Neural Model for Cascading Extremes

Miguel de Carvalho, Clemente Ferrer, Ronny Vallejos

Details
English abstract

This paper addresses the growing concern of cascading extreme events, such as an extreme earthquake followed by a tsunami, by presenting a novel method for risk assessment focused on these domino effects. The proposed approach develops an extreme value theory framework within a Kolmogorov-Arnold network (KAN) to estimate the probability of one extreme event triggering another, conditionally on a feature vector. An extra layer is added to the KAN architecture to ensure that the parameter of interest lies within the unit interval, and we refer to the resulting neural model as KANE (KAN with Natural Enforcement). The proposed method is backed by exhaustive numerical studies and further illustrated with real-world applications to seismology and climatology.

2504.15243 2026-02-09 cs.LG stat.ML

Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints

Ming Yang, Gang Li, Quanqi Hu, Qihang Lin, Tianbao Yang

Details
English abstract

Constrained optimization with multiple functional inequality constraints has significant applications in machine learning. This paper examines a crucial subset of such problems where both the objective and constraint functions are weakly convex. Existing methods often face limitations, including slow convergence rates or reliance on double-loop algorithmic designs. To overcome these challenges, we introduce a novel single-loop penalty-based stochastic algorithm. Following the classical exact penalty method, our approach employs a hinge-based penalty, which permits the use of a constant penalty parameter, enabling us to achieve a state-of-the-art complexity for finding an approximate Karush-Kuhn-Tucker (KKT) solution. We further extend our algorithm to address finite-sum coupled compositional objectives, which are prevalent in artificial intelligence applications, establishing improved complexity over existing approaches. Finally, we validate our method through experiments on fair learning with receiver operating characteristic (ROC) fairness constraints and continual learning with non-forgetting constraints.
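
The role of a constant penalty parameter in a hinge-based exact penalty can be seen on a one-dimensional toy problem. The sketch below is an assumption-laden illustration, not the paper's stochastic algorithm: it minimizes $f(x) = (x-2)^2$ subject to $g(x) = x - 1 \le 0$ by grid search on $f(x) + \beta \max(0, g(x))$, with hypothetical helper names. The toy problem is convex, whereas the paper treats the weakly convex case.

```python
def hinge_penalized(f, g, beta):
    """Return x -> f(x) + beta * max(0, g(x)) for the constraint g(x) <= 0."""
    return lambda x: f(x) + beta * max(0.0, g(x))

def grid_argmin(h, lo, hi, n):
    """Minimize h over n + 1 equally spaced points in [lo, hi]."""
    points = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return min(points, key=h)

f = lambda x: (x - 2.0) ** 2   # objective, unconstrained minimum at x = 2
g = lambda x: x - 1.0          # constraint g(x) <= 0, i.e. x <= 1

# beta = 3 exceeds |f'(1)| = 2, so the penalized minimizer is the constrained one.
x_large = grid_argmin(hinge_penalized(f, g, 3.0), -1.0, 3.0, 400)
# beta = 0.5 is too small: the penalized minimizer violates the constraint.
x_small = grid_argmin(hinge_penalized(f, g, 0.5), -1.0, 3.0, 400)
```

With the larger penalty the minimizer sits at the constrained optimum $x = 1$; with the smaller one it drifts to $x = 1.75$, which is infeasible. A finite, fixed $\beta$ above the threshold suffices, which is what distinguishes exact penalties from quadratic ones.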

2503.06558 2026-02-09 cs.LG stat.ML

Generative modelling with jump-diffusions

Adrian Baule

Comments New version contains: (i) A generalized score function in closed analytical form leading to the jump-Laplace (JL) model; (ii) Additional numerical experiments comparing JL ODE/SDE, Gaussian ODE, and Levy-Ito-Model SDE

Details
English abstract

Score-based diffusion models generate samples from an unknown target distribution using a time-reversed diffusion process. While such models represent state-of-the-art approaches in industrial applications such as artificial image generation, it has recently been noted that their performance can be further improved by considering injection noise with heavy-tailed characteristics. Here, I present a generalization of generative diffusion processes to a wide class of non-Gaussian noise processes. I consider forward processes driven by standard Gaussian noise with superimposed Poisson jumps representing a finite-activity Levy process. The generative process is shown to be governed by a generalized score function that depends on the jump amplitude distribution and can be estimated by minimizing a simple MSE loss as in conventional Gaussian models. Both probability flow ODE and SDE formulations are derived with only basic technical effort. A detailed implementation for a pure jump process with Laplace distributed amplitudes yields a generalized score function in closed analytical form and is shown to outperform the equivalent Gaussian model in specific parameter regimes.

2502.09740 2026-02-09 econ.EM stat.ML

High-dimensional censored MIDAS logistic regression for corporate survival forecasting

Wei Miao, Jad Beyhum, Jonas Striaukas, Ingrid Van Keilegom

Details
English abstract

This paper addresses the challenge of forecasting corporate distress, a problem marked by three key statistical hurdles: (i) right censoring, (ii) high-dimensional predictors, and (iii) mixed-frequency data. To overcome these complexities, we introduce a novel high-dimensional censored MIDAS (Mixed Data Sampling) logistic regression. Our approach handles censoring through inverse probability weighting and achieves accurate estimation with numerous mixed-frequency predictors by employing a sparse-group penalty. We establish finite-sample bounds for the estimation error, accounting for censoring, MIDAS approximation error, and heavy tails. For statistical inference, we develop a de-sparsified version of the proposed penalized estimator and establish its asymptotic theory, which enables valid statistical inference in high-dimensional settings with censoring. We show that censoring induces a nonstandard variance structure for the de-sparsified estimator, a feature that, to the best of our knowledge, has not been studied in the existing literature. The superior performance of the method is demonstrated through Monte Carlo simulations. Finally, we present an extensive application of our methodology to predict the financial distress of Chinese-listed firms and to identify covariates that are statistically significant for predicting distress. Our novel procedure is implemented in the R package Survivalml.

2501.07681 2026-02-09 cs.LG cs.CV math.OC stat.ML

Dataset Distillation as Pushforward Optimal Quantization

Hong Ye Tan, Emma Slade

Comments ICLR 2026, https://openreview.net/forum?id=FMSp8AUF3m

Details
English abstract

Dataset distillation aims to find a synthetic training set such that training on the synthetic data achieves similar performance to training on real data, with orders of magnitude lower computational requirements. Existing methods can be broadly categorized as either bi-level optimization problems that have neural network training heuristics as the lower level problem, or disentangled methods that bypass the bi-level optimization by matching distributions of data. The latter method has the major advantages of speed and scalability in terms of size of both training and distilled datasets. We demonstrate that when equipped with an encoder-decoder structure, the empirically successful disentangled methods can be reformulated as an optimal quantization problem, where a finite set of points is found to approximate the underlying probability measure by minimizing the expected projection distance. In particular, we link existing disentangled dataset distillation methods to the classical optimal quantization and Wasserstein barycenter problems, demonstrating consistency of distilled datasets for diffusion-based generative priors. We propose Dataset Distillation by Optimal Quantization, based on clustering in a latent space. Compared to the previous SOTA method D$^4$M, we achieve better performance and inter-model generalization on the ImageNet-1K dataset with trivial additional computation, and SOTA performance in higher image-per-class settings. Using the distilled noise initializations in a stronger diffusion transformer model, we obtain SOTA distillation performance on ImageNet-1K and its subsets, outperforming diffusion guidance methods.

2501.00382 2026-02-09 econ.GN cs.AI q-fin.EC stat.AP stat.ML

Adventures in Demand Analysis Using AI

Philipp Bach, Victor Chernozhukov, Sven Klaassen, Martin Spindler, Jan Teichert-Kluge, Suhas Vijaykumar

Comments 35 pages, 8 figures

Details
English abstract

This paper advances empirical demand analysis by integrating multimodal product representations derived from artificial intelligence (AI). Using a detailed dataset of toy cars on Amazon.com, we combine text descriptions, images, and tabular covariates to represent each product using transformer-based embedding models. These embeddings capture nuanced attributes, such as quality, branding, and visual characteristics, that traditional methods often struggle to summarize. Moreover, we fine-tune these embeddings for causal inference tasks. We show that the resulting embeddings substantially improve the predictive accuracy of sales ranks and prices and that they lead to more credible causal estimates of price elasticity. Notably, we uncover strong heterogeneity in price elasticity driven by these product-specific features. Our findings illustrate that AI-driven representations can enrich and modernize empirical demand analysis. The insights generated may also prove valuable for applied causal inference more broadly.

2405.16594 2026-02-09 stat.ML cs.LG

Training-Conditional Coverage Bounds under Covariate Shift

Mehrdad Pournaderi, Yu Xiang

Comments Published in Transactions on Machine Learning Research

Details
English abstract

Conformal prediction methodology has recently been extended to the covariate shift setting, where the distribution of covariates differs between training and test data. While existing results ensure that the prediction sets from these methods achieve marginal coverage above a nominal level, their coverage rate conditional on the training dataset (referred to as training-conditional coverage) remains unexplored. In this paper, we address this gap by deriving upper bounds on the tail of the training-conditional coverage distribution, offering probably approximately correct (PAC) guarantees for these methods. Our results characterize the reliability of the prediction sets in terms of the severity of distributional changes and the size of the training dataset.
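
For readers unfamiliar with the setting, the following sketch shows the weighted quantile that underlies split conformal prediction under covariate shift. It assumes the likelihood-ratio weights are known and follows the usual weighted conformal recipe, in which the test point's weight is placed on $+\infty$; it does not implement the paper's PAC bounds. The function name and arguments are hypothetical.

```python
import math

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Level-(1 - alpha) quantile of the weighted calibration scores, with the
    test point's weight placed on +infinity.

    scores:      calibration nonconformity scores
    weights:     likelihood-ratio weights w(X_i) of the calibration points
    test_weight: likelihood-ratio weight w(x) of the test point
    """
    total = sum(weights) + test_weight
    threshold = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= threshold:
            return score
    return math.inf  # the calibration mass never reaches the level
```

With equal weights this reduces to the usual split conformal order statistic. When the test point carries much more weight than the calibration set, the quantile becomes infinite, which signals that the calibration data say little about that region of covariate space.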

2401.08468 2026-02-09 math.ST cs.LG eess.SP stat.TH

Nonparametric Evaluation of Noisy ICA Solutions

Syamantak Kumar, Purnamrita Sarkar, Peter Bickel, Derek Bean

Comments NeurIPS 2024 (Main Conference Track). 44 pages

Journal ref Advances in Neural Information Processing Systems, 37, pp.132647-132690 (2024)

Details
English abstract

Independent Component Analysis (ICA) was introduced in the 1980s as a model for Blind Source Separation (BSS), which refers to the process of recovering the sources underlying a mixture of signals, with little knowledge about the source signals or the mixing process. While there are many sophisticated algorithms for estimation, different methods have different shortcomings. In this paper, we develop a nonparametric score to adaptively pick the right algorithm for ICA with arbitrary Gaussian noise. The novelty of this score stems from the fact that it just assumes a finite second moment of the data and uses the characteristic function to evaluate the quality of the estimated mixing matrix without any knowledge of the parameters of the noise distribution. In addition, we propose some new contrast functions and algorithms that enjoy the same fast computability as existing algorithms like FASTICA and JADE but work in domains where the former may fail. While these also may have weaknesses, our proposed diagnostic, as shown by our simulations, can remedy them. Finally, we propose a theoretical framework to analyze the local and global convergence properties of our algorithms.

2310.12436 2026-02-09 math.ST stat.TH

Nonparametric Prior Learning in Differential Equation Modeling

Junxiong Jia, Deyu Meng, Zongben Xu, Fang Yao

Comments 99 pages

Details
English abstract

This paper addresses Bayesian inference related to partial differential equations (PDEs), particularly nonparametric regression constrained by PDEs. To effectively encode prior information, we propose a novel framework that learns a prediction function of the prior distribution from historical training datasets. We introduce hyper-prior and hyper-posterior distributions and derive a generalization error estimate, which accommodates data-dependent priors by extending the concept of differential privacy. Some mild conditions are given to validate the error estimate, where various typical PDEs such as diffusion and Darcy flow equations can be integrated. We thus formulate an infinite-dimensional optimization problem to obtain the point estimate of the hyper-posterior. Numerical examples demonstrate the performance of our proposed method in learning the prediction function of priors.

2111.15524 2026-02-09 stat.ME math.ST stat.TH

Robustness and Efficiency of Rosenbaum's Rank-based Estimator in Randomized Trials: A Design-based Perspective

Aditya Ghosh, Nabarun Deb, Bikram Karmakar, Bodhisattva Sen

Comments 101 pages

Details
English abstract

Mean-based estimators of causal effects in randomized experiments may behave poorly if the potential outcomes have a heavy tail or contain outliers. An alternative estimator proposed by Rosenbaum (1993) estimates a constant additive treatment effect by inverting a randomization test using ranks. We develop a design-based asymptotic theory for this rank-based estimator and study its robustness and efficiency properties. We show that Rosenbaum's estimator is robust against outliers with a breakdown point that uniformly dominates that of any weighted quantile estimator. When pretreatment covariates are available, a regression-adjusted version of Rosenbaum's estimator uses an agnostic linear regression on the covariates and bases inference on the ranks of residuals. Under mild integrability conditions, we show that this estimator is at most 13.6% less efficient, in the worst case, than the commonly used mean-based regression adjustment method proposed by Lin (2013), and that it often outperforms that method when the residuals have heavy tails. Moreover, under suitable assumptions, Rosenbaum's regression-adjusted estimator is at least as efficient as the unadjusted one. Finally, we initiate the study of Rosenbaum's estimator when the constant treatment effect assumption may be violated. To analyze the regression-adjusted estimator, we develop local asymptotics of rank statistics under the design-based framework, which may be of independent interest.

1905.13599 2026-02-09 stat.CO stat.ME

Component-wise approximate Bayesian computation via Gibbs-like steps

Grégoire Clarté, Christian P. Robert, Robin Ryder, Julien Stoehr

Comments 28 pages, 13 figures, third revision (accepted for publication in Biometrika on 17 September, 2020)

Details
English abstract

Approximate Bayesian computation methods are useful for generative models with intractable likelihoods. These methods are however sensitive to the dimension of the parameter space, requiring exponentially increasing resources as this dimension grows. To tackle this difficulty, we explore a Gibbs version of the ABC approach that runs component-wise approximate Bayesian computation steps aimed at the corresponding conditional posterior distributions, and based on summary statistics of reduced dimensions. While lacking the standard justifications for the Gibbs sampler, the resulting Markov chain is shown to converge in distribution under some partial independence conditions. The associated stationary distribution can further be shown to be close to the true posterior distribution and some hierarchical versions of the proposed mechanism enjoy a closed form limiting distribution. Experiments also demonstrate the gain in efficiency brought by the Gibbs version over the standard solution.
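
As background, the "standard solution" that the Gibbs version is compared against can be sketched as plain ABC rejection sampling. This is a generic textbook sketch, not the authors' component-wise algorithm; the function names, the uniform prior, and the toy simulator are all assumptions made for illustration.

```python
import random

def abc_rejection(observed, prior_sample, simulate, distance, epsilon, n_draws, seed=0):
    """Keep prior draws whose simulated summary lies within epsilon of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)            # draw a parameter from the prior
        synthetic = simulate(theta, rng)     # simulate a summary statistic given theta
        if distance(synthetic, observed) <= epsilon:
            accepted.append(theta)
    return accepted

# Toy problem (hypothetical): infer the mean of a unit-variance normal
# from the sample mean of 20 observations.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20

def distance(a, b):
    return abs(a - b)

draws = abc_rejection(1.0, prior_sample, simulate, distance, 0.2, 2000)
```

The acceptance rate of this scheme collapses as the parameter dimension grows, which is the difficulty that motivates running ABC steps one component at a time on lower-dimensional summaries.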

1706.10096 2026-02-09 stat.CO

Noisy Hamiltonian Monte Carlo for doubly-intractable distributions

Julien Stoehr, Alan Benson, Nial Friel

Abstract

Hamiltonian Monte Carlo (HMC) has been progressively incorporated within the statistician's toolbox as an alternative sampling method in settings where standard Metropolis-Hastings is inefficient. HMC generates a Markov chain on an augmented state space with transitions based on a deterministic differential flow derived from Hamiltonian mechanics. In practice, the evolution of Hamiltonian systems cannot be solved analytically, requiring numerical integration schemes. Under numerical integration, the resulting approximate solution no longer preserves the measure of the target distribution, therefore an accept-reject step is used to correct the bias. For doubly-intractable distributions -- such as posterior distributions based on Gibbs random fields -- HMC suffers from computational difficulties: computing the gradients in the differential flow and computing the accept-reject proposals are both difficult. In this paper, we study the behaviour of HMC when these quantities are replaced by Monte Carlo estimates.
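
For readers unfamiliar with the mechanics described above, here is a minimal sketch of standard one-dimensional HMC with exact gradients: a leapfrog integrator followed by the accept-reject correction. It does not implement the noisy variant studied in the paper. The unit-mass Gaussian momentum, the standard normal target, and all function names are assumptions chosen for illustration.

```python
import math
import random


def leapfrog(q, p, grad_u, step_size, n_steps):
    """Integrate Hamiltonian dynamics for H(q, p) = U(q) + p^2 / 2."""
    p = p - 0.5 * step_size * grad_u(q)      # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_u(q)    # full step for momentum
    p = p - 0.5 * step_size * grad_u(q)      # final half step for momentum
    return q, p


def hmc_step(q, u, grad_u, step_size, n_steps, rng):
    """One HMC transition; returns the new state and whether it was accepted."""
    p = rng.gauss(0.0, 1.0)                  # resample the momentum
    h_old = u(q) + 0.5 * p * p
    q_new, p_new = leapfrog(q, p, grad_u, step_size, n_steps)
    h_new = u(q_new) + 0.5 * p_new * p_new
    # The accept-reject step corrects the bias of the numerical integrator.
    log_ratio = h_old - h_new
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return q_new, True
    return q, False


def potential(q):
    return 0.5 * q * q                       # standard normal target


def grad_potential(q):
    return q


rng = random.Random(1)
state, samples = 0.0, []
for _ in range(2000):
    state, _accepted = hmc_step(state, potential, grad_potential, 0.1, 20, rng)
    samples.append(state)
```

In the doubly-intractable setting of the abstract, both `grad_u` and the energy difference used in the acceptance step would have to be replaced by Monte Carlo estimates, which is exactly where the standard justification above stops applying.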

1508.05680 2026-02-09 math.ST math.AP stat.TH

Bayesian approach to inverse problems for functions with variable index Besov prior

Junxiong Jia, Jigen Peng, Jinghuai Gao

Comments 31 pages. arXiv admin note: text overlap with arXiv:1302.6989 by other authors

Journal ref Inverse Problems, 32(8), 2016, 085006

Abstract

We adopt a Bayesian approach to the inverse problem of estimating a function from noisy observations. One important component of this approach is the prior measure. The total variation prior has been shown to lack the discretization-invariance property, so the Besov prior has recently been proposed. Different prior measures usually correspond to different regularization terms. Variable index TV and variable index Besov regularization terms have been proposed in image analysis; however, no such prior measure exists in Bayesian theory. In this paper, we propose a variable index Besov prior measure, which is a non-Gaussian measure. Based on this prior measure, we build the Bayesian inverse theory and then apply it to integer and fractional order backward diffusion problems. Although there are many studies of fractional order backward diffusion problems, we are the first to apply Bayesian inverse theory to this problem, which provides an opportunity to quantify its uncertainties.

1502.01997 2026-02-09 math.ST stat.CO stat.TH

Calibration of conditional composite likelihood for Bayesian inference on Gibbs random fields

Julien Stoehr, Nial Friel

Comments JMLR Workshop and Conference Proceedings, 18th International Conference on Artificial Intelligence and Statistics (AISTATS), San Diego, California, USA, 9-12 May 2015 (Vol. 38, pp. 921-929). arXiv admin note: substantial text overlap with arXiv:1207.5758

Journal ref Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR 38:921-929, 2015

Abstract

Gibbs random fields play an important role in statistics; however, the resulting likelihood is typically unavailable due to an intractable normalizing constant. Composite likelihoods offer a principled means to construct useful approximations. This paper provides a means to calibrate the posterior distribution resulting from using a composite likelihood and illustrates its performance in several examples.

2602.06210 2026-02-09 stat.AP

Evaluating Predictive Modeling Strategies for Predicting Individual Treatment Effects in Precision Medicine

Pamela M. Chiroque-Solano, M Lee Van Horn, Thomas Jaki

Abstract

Precision medicine seeks to match patients with treatments that produce the greatest benefit. The Predicted Individual Treatment Effect (PITE), the difference between predicted outcomes under treatment and control, quantifies this benefit but is difficult to estimate due to unobserved counterfactuals, high dimensionality, and complex interactions. We compared 30+ modeling strategies, including penalized and projection-based methods, flexible learners, and tree-ensembles, using a structured simulation framework varying sample size, dimensionality, multicollinearity, and interaction complexity. Performance was measured using root mean squared error (RMSE) for prediction accuracy and directional accuracy (DIR) for correctly classifying benefit versus harm. Internal validation produced optimistic estimates, whereas external validation with distributional shifts and higher-order interactions more clearly revealed model weaknesses. Penalized and projection-based approaches, namely ridge, lasso, elastic net, partial least squares (PLS), and principal components regression (PCR), consistently achieved strong RMSE and DIR performance. Flexible learners excelled only under strong signals and sufficient sample sizes. Results highlight robust linear/projection defaults and the necessity of rigorous external validation.

2602.06155 2026-02-09 cs.LG stat.ML

Latent Structure Emergence in Diffusion Models via Confidence-Based Filtering

Wei Wei, Yizhou Zeng, Kuntian Chen, Sophie Langer, Mariia Seleznova, Hung-Hsu Chou

Abstract

Diffusion models rely on a high-dimensional latent space of initial noise seeds, yet it remains unclear whether this space contains sufficient structure to predict properties of the generated samples, such as their classes. In this work, we investigate the emergence of latent structure through the lens of confidence scores assigned by a pre-trained classifier to generated samples. We show that while the latent space appears largely unstructured when considering all noise realizations, restricting attention to initial noise seeds that produce high-confidence samples reveals pronounced class separability. By comparing class predictability across noise subsets of varying confidence and examining the class separability of the latent space, we find evidence of class-relevant latent structure that becomes observable only under confidence-based filtering. As a practical implication, we discuss how confidence-based filtering enables conditional generation as an alternative to guidance-based methods.

2602.06153 2026-02-09 stat.ME

A Compound Logistic Regression Model for Binary Responses

Anthony Almudevar, Jacob Almudevar

Comments 33 pages; 6 figures

Abstract

Logistic regression is the most commonly used method for constructing predictive models for binary responses. One significant drawback to this approach, however, is that the asymptotes of the logistic response function are fixed at 0 and 1, and there are many applications for which this constraint is inappropriate. More flexible models have been proposed for this application, most proceeding by supplementing the logistic response function with additional parameters. In this article we extend these models to allow correlated responses and the inclusion of covariates. This is achieved through the \emph{compound logistic regression model}, for which the mean response is a function of several logistic regression functions. This permits a greater variety of models, while retaining the advantages of logistic regression.

2602.06148 2026-02-09 stat.AP

Non-Linear Drivers of Population Dynamics: a Nonparametric Coalescent Approach

Filippo Monti, Nuno R. Faria, Xiang Ji, Philippe Lemey, Moritz U. G. Kraemer, Marc A. Suchard

Abstract

Effective population size (Ne(t)) is a fundamental parameter in population genetics and phylodynamics that quantifies genetic diversity and reveals demographic history. Coalescent-based methods enable the inference of Ne(t) trajectories through time from phylogenies reconstructed from molecular sequence data. Understanding the ecological and environmental drivers of population dynamics requires linking Ne(t) to external covariates. Existing approaches typically impose log-linear relationships between covariates and Ne(t), which may fail to capture complex biological processes and can introduce bias when the true relationship is nonlinear. We present a flexible Bayesian framework that integrates covariates into coalescent models with piecewise-constant Ne(t) through a Gaussian process (GP) prior. The GP, a distribution over functions, naturally accommodates nonlinear covariate effects without restrictive parametric assumptions. This formulation improves estimation of covariate-Ne(t) relationships, mitigates bias under nonlinear associations, and yields interpretable uncertainty quantification that varies across the covariate space. To balance global covariate-driven patterns with local temporal dynamics, we couple the GP prior with a Gaussian Markov random field that enforces smoothness in Ne(t) trajectories. Through simulation studies and three empirical applications - yellow fever virus dynamics in Brazil (2016-2018), late-Quaternary musk ox demography, and HIV-1 CRF02-AG evolution in Cameroon - we demonstrate that our method both confirms linear relationships where appropriate and reveals nonlinear covariate effects that would otherwise be missed or mischaracterized. This framework advances phylodynamic inference by enabling more accurate and biologically realistic modeling of how environmental and epidemiological factors shape population size through time.

2602.06146 2026-02-09 cs.LG math.OC stat.ML

Optimistic Training and Convergence of Q-Learning -- Extended Version

Prashant Mehta, Sean Meyn

Abstract

In recent work it is shown that Q-learning with linear function approximation is stable, in the sense of bounded parameter estimates, under the $(\varepsilon,\kappa)$-tamed Gibbs policy; $\kappa$ is inverse temperature, and $\varepsilon>0$ is introduced for additional exploration. Under these assumptions it also follows that there is a solution to the projected Bellman equation (PBE). Left open is uniqueness of the solution, and criteria for convergence outside of the standard tabular or linear MDP settings. The present work extends these results to other variants of Q-learning, and clarifies prior work: a one dimensional example shows that under an oblivious policy for training there may be no solution to the PBE, or multiple solutions, and in each case the algorithm is not stable under oblivious training. The main contribution is that far more structure is required for convergence. An example is presented for which the basis is ideal, in the sense that the true Q-function is in the span of the basis. However, there are two solutions to the PBE under the greedy policy, and hence also for the $(\varepsilon,\kappa)$-tamed Gibbs policy for all sufficiently small $\varepsilon>0$ and $\kappa\ge 1$.

2602.06137 2026-02-09 quant-ph cs.LG stat.ML

Warm Starts, Cold States: Exploiting Adiabaticity for Variational Ground-States

Ricard Puig, Berta Casas, Alba Cervera-Lierta, Zoë Holmes, Adrián Pérez-Salinas

Comments 11 + 24 pages, 3 figures

Abstract

Reliable preparation of many-body ground states is an essential task in quantum computing, with applications spanning areas from chemistry and materials modeling to quantum optimization and benchmarking. A variety of approaches have been proposed to tackle this problem, including variational methods. However, variational training often struggles to navigate complex energy landscapes, frequently encountering suboptimal local minima or suffering from barren plateaus. In this work, we introduce an iterative strategy for ground-state preparation based on a stepwise (discretized) Hamiltonian deformation. By complementing the Variational Quantum Eigensolver (VQE) with adiabatic principles, we demonstrate that solving a sequence of intermediate problems facilitates tracking the ground-state manifold toward the target system, even as we scale the system size. We provide a rigorous theoretical foundation for this approach, proving a lower bound on the loss variance that suggests trainability throughout the deformation, provided the system remains away from gap closings. Numerical simulations, including the effects of shot noise, confirm that this path-dependent tracking consistently converges to the target ground state.

2602.06135 2026-02-09 stat.AP

Early warning of Mpox outbreaks in U.S. jurisdictions using Lasso Vector Autoregression models with cross-jurisdictional lags

Hannah Craddock, Joel O. Wertheim, Eliah Aronoff-Spencer, Mark Beatty, David Valentine, Rishi Graham, Jade C. Wang, Lior Rennert, Seema Shah, Ravi Goyal, Natasha K. Martin

Abstract

Mpox is an orthopoxvirus that infects humans and animals and is transmitted primarily through close physical contact. The episodic and spatially heterogeneous dynamics of Mpox transmission underscore the need for timely, area-specific forecasts to support targeted public health responses in the U.S. We develop a Vector Autoregression model with Lasso regularization (VAR-Lasso) to generate rolling two-week-ahead forecasts of weekly Mpox cases for eight high-incidence U.S. jurisdictions using national surveillance data from the Centers for Disease Control and Prevention (CDC). The VAR-Lasso model identifies significant long-lag, cross-jurisdictional predictors. For a case study in San Diego County (SDC), these statistical predictors align with phylogenetic analysis that traces a 2023 cluster in SDC to an outbreak in Illinois six months earlier. As the need for public health action is often greatest when incidence is increasing, our performance evaluation focuses on positive-slope weighted error metrics. Forecast performance of the VAR-Lasso model is compared to a univariate autoregressive (AR) Lasso model and a naive moving-average estimate. The models are compared using slope-weighted Root Mean Squared Error (RMSE), slope-weighted Mean Absolute Error (MAE), and slope-weighted bias. Across all observations, the VAR-Lasso model reduces slope-weighted RMSE, MAE, and bias by 12%, 7%, and 66% relative to the AR model, and by 16%, 13%, and 76% relative to the naive benchmark. Our findings highlight the value of sparse multivariate time-series models that leverage cross-jurisdictional case data for early forecasting of Mpox outbreaks. Such forecasting can aid health departments in proactively providing timely resources and messaging to mitigate the risks of a future outbreak.

2601.15449 2026-02-09 stat.ME

Distributional Balancing for Causal Inference: A Unified Framework via Characteristic Function Distance

Diptanil Santra, Guanhua Chen, Chan Park

Comments 36 pages

Abstract

Weighting methods are essential tools for estimating causal effects in observational studies, with the goal of balancing pre-treatment covariates across treatment groups. Traditional approaches pursue this objective indirectly, for example, via inverse propensity score weighting or by matching a finite number of covariate moments, and therefore do not guarantee balance of the full joint covariate distributions. Recently, distributional balancing methods have emerged as robust, nonparametric alternatives that directly target alignment of entire covariate distributions, but they lack a unified framework, formal theoretical guarantees, and valid inferential procedures. We introduce a unified framework for nonparametric distributional balancing based on the characteristic function distance (CFD) and show that widely used discrepancy measures, including the maximum mean discrepancy and energy distance, arise as special cases. Our theoretical analysis establishes conditions under which the resulting CFD-based weighting estimator achieves $\sqrt{n}$-consistency. Since the standard bootstrap may fail for this estimator, we propose subsampling as a valid alternative for inference. We further extend our approach to an instrumental variable setting to address potential unmeasured confounding. Finally, we evaluate the performance of our method through simulation studies and a real-world application, where the proposed estimator performs well and exhibits results consistent with our theoretical predictions.

2512.12783 2026-02-09 cs.LG q-fin.ST stat.AP

Credit Risk Estimation with Non-Financial Features: Evidence from a Synthetic Istanbul Dataset

Atalay Denknalbant, Emre Sezdi, Zeki Furkan Kutlu

Abstract

Financial exclusion constrains entrepreneurship, increases income volatility, and widens wealth gaps. Underbanked consumers in Istanbul often have no bureau file because their earnings and payments flow through informal channels. To study how such borrowers can be evaluated, we create a synthetic dataset of one hundred thousand Istanbul residents that reproduces first-quarter 2025 TÜİK (TURKSTAT) census marginals and telecom usage patterns. Retrieval-augmented generation feeds these public statistics into the OpenAI o3 model, which synthesises realistic yet private records. Each profile contains seven socio-demographic variables and nine alternative attributes that describe phone specifications, online shopping rhythm, subscription spend, car ownership, monthly rent, and a credit card flag. To test the impact of the alternative financial data, CatBoost, LightGBM, and XGBoost are each trained in two versions. Demo models use only the socio-demographic variables; Full models include both socio-demographic and alternative attributes. Across five-fold stratified validation, the alternative block raises area under the curve by about 1.3 percentage points and lifts balanced F1 from roughly 0.84 to 0.95, a fourteen percent gain. We contribute an open Istanbul 2025 Q1 synthetic dataset, a fully reproducible modeling pipeline, and empirical evidence that a concise set of behavioural attributes can approach bureau-level discrimination power while serving borrowers who lack formal credit records. These findings give lenders and regulators a transparent blueprint for extending fair and safe credit access to the underbanked.

2509.17382 2026-02-09 stat.ML cs.LG math.ST stat.ME stat.TH

Optimal Bias-variance Tradeoff in Matrix and Tensor Estimation

Shivam Kumar, Xiaokai Luo, Haotian Xu, Carlos Misael Madrid Padilla, Oscar Hernan Madrid Padilla, Daren Wang

Abstract

We study matrix and tensor denoising when the underlying signal is \textbf{not} necessarily low-rank. In the tensor setting, we observe \[ Y = X^\ast + Z \in \mathbb{R}^{p_1 \times p_2 \times p_3}, \] where $X^\ast$ is an unknown signal tensor and $Z$ is a noise tensor. We propose a one-step variant of the higher-order SVD (HOSVD) estimator, denoted $\widetilde X$, and show that, uniformly over any user-specified Tucker ranks $(r_1,r_2,r_3)$, with high probability, \[ \|\widetilde X - X^\ast\|_{\mathrm F}^2 = O\Big( \kappa^2\Big\{r_1r_2r_3 + \sum_{k=1}^3 p_k r_k\Big\} + \xi_{(r_1,r_2,r_3)}^2 \Big). \] Here, $\xi_{(r_1,r_2,r_3)}$ is the best achievable Tucker rank-$(r_1,r_2,r_3)$ approximation error of $X^\ast$ (bias), $\kappa^2$ quantifies the noise level, and $\kappa^2\{r_1r_2r_3+\sum_{k=1}^3 p_k r_k\}$ is the variance term scaling with the effective degrees of freedom of $\widetilde X$. This yields a rank-adaptive bias-variance tradeoff: increasing $(r_1,r_2,r_3)$ decreases the bias $\xi_{(r_1,r_2,r_3)}$ while increasing variance. In the matrix setting, we show that truncated SVD achieves an analogous bias-variance tradeoff for arbitrary signal matrices. Notably, our matrix result requires \textbf{no} assumptions on the signal matrix, such as finite rank or spectral gaps. Finally, we complement our upper bounds with matching information-theoretic lower bounds, showing that the resulting bias-variance tradeoff is minimax optimal up to universal constants in both the matrix and tensor settings.

2507.14666 2026-02-09 stat.AP

What Quality Engineers Need to Know about Degradation Models

Jared M. Clark, Jie Min, Mingyang Li, Richard L. Warr, Stephanie P. DeHart, Caleb B. King, Lu Lu, Yili Hong

Comments 38 pages, 16 figures

Abstract

Degradation models play a critical role in quality engineering by enabling the assessment and prediction of system reliability based on data. The objective of this paper is to provide an accessible introduction to degradation models. We explore commonly used degradation data types, including repeated measures degradation data and accelerated destructive degradation test data, and review modeling approaches such as general path models and stochastic process models. Key inference problems, including reliability estimation and prediction, are addressed. Applications across diverse fields, including material science, renewable energy, civil engineering, aerospace, and pharmaceuticals, illustrate the broad impact of degradation models in industry. We also discuss best practices for quality engineers, software implementations, and challenges in applying these models. This paper aims to provide quality engineers with a foundational understanding of degradation models, equipping them with the knowledge necessary to apply these techniques effectively in real-world scenarios.

2506.09871 2026-02-09 stat.ME math.ST stat.TH

Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect

Ruiyang Lin, Yongyi Guo, Kyra Gan

Comments 47 pages, 8 figures, NeurIPS 2025 (accepted)

Abstract

The weighted controlled direct effect (WCDE) generalizes the standard controlled direct effect (CDE) by averaging over the mediator distribution, providing a robust estimate when treatment effects vary across mediator levels. This makes the WCDE especially relevant in fairness analysis, where it isolates the direct effect of an exposure on an outcome, independent of mediating pathways. This work establishes three fundamental advances for WCDE in observational studies: First, we establish necessary and sufficient conditions for the unique identifiability of the WCDE, clarifying when it diverges from the CDE. Next, we consider nonparametric estimation of the WCDE and derive its influence function, focusing on the class of regular and asymptotically linear estimators. Lastly, we characterize the optimal covariate adjustment set that minimizes the asymptotic variance, demonstrating how mediator-confounder interactions introduce distinct requirements compared to average treatment effect estimation. Our results offer a principled framework for efficient estimation of direct effects in complex causal systems, with practical applications in fairness and mediation analysis.

2506.00602 2026-02-09 stat.AP q-bio.QM

Assessing Honey Bee Colony Health Using Temperature Time Series

Karina Arias-Calluari, Theotime Colin, Tanya Latty, Mary Myerscough, Eduardo G. Altmann

Comments 14 pages, 7 figures and 1 repository

Journal ref J R Soc Interface 23 (235): 202e50505 (2026)

Abstract

Honey bees face an increasing number of stressors that disrupt the natural behaviour of colonies and, in extreme cases, can lead to their collapse. Quantifying the status and resilience of colonies is essential to measure the impact of stressors and to identify colonies at risk. In this manuscript, we present and apply new methodologies to efficiently diagnose the status of a honey bee colony from widely available time series of hive and environmental temperature. Healthy hives have a remarkable ability to control temperature near the brood area. Our method exploits this fact and quantifies the status of a hive by measuring how resilient it is to extreme environmental temperatures, which act as natural stressors. Analysing 22 hives during different times of the year, including 3 hives that collapsed, we find the statistical signatures of stress that reveal whether honey bees are doing well or are at risk of failure. Based on these analyses, we propose a simple scale of hive status (stable, warning, and collapse) that can be determined based on a few temperature measurements. Our approach offers a lower-cost and practical bee-monitoring solution, providing a non-invasive way to track hive conditions and trigger interventions to save the hives from collapse.

2505.12743 2026-02-09 stat.ME

Deep Generative Modeling with Spatial and Network Images: An Explainable AI (XAI) Approach

Yeseul Jeon, Rajarshi Guhaniyogi, Aaron Scheffler

Abstract

This article addresses the challenge of modeling the amplitude of spatially indexed low frequency fluctuations (ALFF) in resting state functional MRI as a function of cortical structural features and a multi-task coactivation network in the Adolescent Brain Cognitive Development (ABCD) Study. It proposes a generative model that integrates effects of spatially-varying inputs and a network-valued input using deep neural networks to capture complex non-linear and spatial associations with the output. The method models spatial smoothness, accounts for subject heterogeneity and complex associations between network and spatial images at different scales, enables accurate inference of each image's effect on the output image, and allows prediction with uncertainty quantification via Monte Carlo dropout, contributing to one of the first Explainable AI (XAI) frameworks for heterogeneous imaging data. The model is highly scalable to high-resolution data without the heavy pre-processing or summarization often required by Bayesian methods. Empirical results demonstrate its strong performance compared to existing statistical and deep learning methods. We applied the XAI model to the ABCD data, which revealed associations between cortical features and ALFF throughout the entire brain. Our model performed comparably to existing methods in predictive accuracy but provided superior uncertainty quantification and faster computation, demonstrating its effectiveness for large-scale neuroimaging analysis. Open-source software in Python for XAI is available.

2406.02741 2026-02-09 stat.CO

Sampling From Multiscale Densities With Delayed Rejection Generalized Hamiltonian Monte Carlo

Gilad Turok, Chirag Modi, Bob Carpenter

Comments 9 pages, 5 figures

详情
英文摘要

Hamiltonian Monte Carlo (HMC) is the mainstay of applied Bayesian inference for differentiable models. However, HMC still struggles to sample from hierarchical models that induce densities with multiscale geometry: a large step size is needed to efficiently explore low curvature regions while a small step size is needed to accurately explore high curvature regions. We introduce the delayed rejection generalized HMC (DR-G-HMC) sampler that overcomes this challenge by employing dynamic step size selection, inspired by differential equation solvers. In generalized HMC, each iteration does a single leapfrog step. DR-G-HMC sequentially makes proposals with geometrically decreasing step sizes upon rejection of earlier proposals. This simulates Hamiltonian dynamics that can adjust its step size along a (stochastic) Hamiltonian trajectory to deal with regions of high curvature. DR-G-HMC makes generalized HMC competitive by decreasing the number of rejections, which otherwise cause inefficient backtracking and prevent directed movement. We present experiments to demonstrate that DR-G-HMC (1) correctly samples from multiscale densities, (2) makes generalized HMC methods competitive with the state-of-the-art No-U-Turn sampler, and (3) is robust to tuning parameters.

2405.15167 2026-02-09 stat.ML cs.LG

ProDAG: Projected Variational Inference for Directed Acyclic Graphs

Ryan Thompson, Edwin V. Bonilla, Robert Kohn

Comments To appear in Advances in Neural Information Processing Systems

Abstract

Directed acyclic graph (DAG) learning is a central task in structure discovery and causal inference. Although the field has witnessed remarkable advances over the past few years, it remains statistically and computationally challenging to learn a single (point estimate) DAG from data, let alone provide uncertainty quantification. We address the difficult task of quantifying graph uncertainty by developing a Bayesian variational inference framework based on novel, provably valid distributions that have support directly on the space of sparse DAGs. These distributions, which we use to define our prior and variational posterior, are induced by a projection operation that maps an arbitrary continuous distribution onto the space of sparse weighted acyclic adjacency matrices. While this projection is combinatorial, it can be solved efficiently using recent continuous reformulations of acyclicity constraints. We empirically demonstrate that our method, ProDAG, can outperform state-of-the-art alternatives in both accuracy and uncertainty quantification.

2404.13204 2026-02-09 stat.AP stat.CO

Scalable Bayesian Image-on-Scalar Regression for Population-Scale Neuroimaging Data Analysis

Yuliang Xu, Timothy D. Johnson, Thomas E. Nichols, Jian Kang

Abstract

Bayesian Image-on-Scalar Regression (ISR) provides flexible, uncertainty-aware neuroimaging analysis. However, applying ISR to large-scale datasets such as the UK Biobank is challenging due to intensive computational demands and the need to handle subject-specific brain masks rather than a common mask. We propose a novel Bayesian ISR model that scales efficiently while accommodating these inconsistent masks. Our method leverages Gaussian process priors with salience area indicators and introduces a scalable posterior computation algorithm using stochastic gradient Langevin dynamics combined with memory mapping. This approach achieves linear scaling with subsample size and constrains memory usage to the batch size, facilitating direct spatial posterior inferences on brain activation regions. Simulation studies and analysis of UK Biobank task fMRI data (38,639 subjects; over 120,000 voxels per image) demonstrate a 4- to 11-fold speed increase and an 8-18% enhancement in statistical power compared to traditional Gibbs sampling with zero-imputation. Our analysis reveals a subregion of the amygdala where emotion-related brain activation decreases by approximately 58% between ages 50 and 60.

1805.03273 2026-02-09 stat.ME

Nothing to See Here? A non-inferiority approach to parallel trends

Alyssa Bilinski, Laura A. Hatfield

Abstract

Difference-in-differences is a popular method for observational health policy evaluation. It relies on a causal assumption that in the absence of intervention, treatment groups' outcomes would have evolved in parallel to those of comparison groups. Researchers frequently look for parallel trends in the pre-intervention period to bolster confidence in this assumption. The popular "parallel trends test" evaluates a null hypothesis of parallel trends and, failing to find evidence against the null, concludes that the assumption holds. This tightly controls the probability of falsely concluding that trends are not parallel but may have low power to detect non-parallel trends. When used as a screening step, it can also introduce bias in treatment effect estimates. We propose a non-inferiority/equivalence approach that tightly controls the probability of missing large violations of parallel trends measured on the scale of the treatment effect. Our framework nests several common use cases, including linear trend tests and event studies. We show that our approach may induce no or minimal bias when used as a screening step under commonly-assumed error structures, and absent violations, can offer a higher-power alternative to testing treatment effects in more flexible models. We illustrate our ideas by re-considering a study of the impact of the Affordable Care Act's dependent coverage provision.
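
The contrast between the two testing logics can be sketched with a generic two one-sided tests (TOST) procedure under a normal approximation. This is not the authors' exact procedure: the margin `delta`, the normal approximation, the function names, and the numbers are all assumptions for illustration. Here `estimate` stands for an estimated pre-period difference in trends and `std_error` for its standard error.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def traditional_pvalue(estimate, std_error):
    """Two-sided p-value for the null hypothesis of exactly parallel trends."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z))


def equivalence_pvalue(estimate, std_error, delta):
    """TOST p-value for the null hypothesis |trend difference| >= delta.

    A small value supports the claim that any violation of parallel
    trends is smaller than the margin delta.
    """
    # One-sided test of H0: difference <= -delta.
    p_lower = 1.0 - _STD_NORMAL.cdf((estimate + delta) / std_error)
    # One-sided test of H0: difference >= +delta.
    p_upper = _STD_NORMAL.cdf((estimate - delta) / std_error)
    return max(p_lower, p_upper)


# A noisy pre-period estimate (made-up numbers).
noisy_traditional = traditional_pvalue(0.1, 1.0)
noisy_equivalence = equivalence_pvalue(0.1, 1.0, delta=1.0)

# The same point estimate measured precisely.
precise_equivalence = equivalence_pvalue(0.1, 0.2, delta=1.0)
```

With the noisy estimate, the traditional test fails to reject and would be read as "trends are parallel", while the equivalence test also fails to reject and therefore does not certify parallel trends. Only the precise estimate supports the claim that violations are smaller than `delta`.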