arXivDaily: arXiv Daily Academic Digest (updated Monday through Friday)
2603.05483 2026-03-06 cs.LG cs.AI stat.ML

SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis

Shahriar Noroozizadeh, Xiaobin Shen, Jeremy C. Weiss, George H. Chen

Comments The Fourteenth International Conference on Learning Representations (ICLR 2026)

Abstract

Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine and individualized policy-making. Yet, the survival analysis setting poses unique challenges for HTE estimation due to censoring, unobserved counterfactuals, and complex identification assumptions. Despite recent advances, from Causal Survival Forests to survival meta-learners and outcome imputation approaches, evaluation practices remain fragmented and inconsistent. We introduce SurvHTE-Bench, the first comprehensive benchmark for HTE estimation with censored outcomes. The benchmark spans (i) a modular suite of synthetic datasets with known ground truth, systematically varying causal assumptions and survival dynamics, (ii) semi-synthetic datasets that pair real-world covariates with simulated treatments and outcomes, and (iii) real-world datasets from a twin study (with known ground truth) and from an HIV clinical trial. Across synthetic, semi-synthetic, and real-world settings, we provide the first rigorous comparison of survival HTE methods under diverse conditions and realistic assumption violations. SurvHTE-Bench establishes a foundation for fair, reproducible, and extensible evaluation of causal survival methods. The data and code of our benchmark are available at: https://github.com/Shahriarnz14/SurvHTE-Bench .

2603.05480 2026-03-06 stat.ML cs.LG math.ST stat.TH

Thermodynamic Response Functions in Singular Bayesian Models

Sean Plummer

Abstract

Singular statistical models, including mixtures, matrix factorization, and neural networks, violate regular asymptotics due to parameter non-identifiability and degenerate Fisher geometry. Although singular learning theory characterizes marginal likelihood behavior through invariants such as the real log canonical threshold and singular fluctuation, these quantities remain difficult to interpret operationally. At the same time, widely used criteria such as WAIC and WBIC appear disconnected from underlying singular geometry. We show that posterior tempering induces a one-parameter deformation of the posterior distribution whose associated observables generate a hierarchy of thermodynamic response functions. A universal covariance identity links derivatives of tempered expectations to posterior fluctuations, placing WAIC, WBIC, and singular fluctuation within a unified response framework. Within this framework, classical quantities from singular learning theory acquire natural thermodynamic interpretations: RLCT governs the leading free-energy slope, singular fluctuation corresponds to curvature of the tempered free energy, and WAIC measures predictive fluctuation. We formalize an observable algebra that quotients out non-identifiable directions, allowing structurally meaningful order parameters to be constructed in singular models. Across canonical singular examples, including symmetric Gaussian mixtures, reduced-rank regression, and overparameterized neural networks, we empirically demonstrate phase-transition-like behavior under tempering. Order parameters collapse, susceptibilities peak, and complexity measures align with structural reorganization in posterior geometry. Our results suggest that thermodynamic response theory provides a natural organizing framework for interpreting complexity, predictive variability, and structural reorganization in singular Bayesian learning.
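The covariance identity mentioned in this abstract can be illustrated with a standard calculation. The sketch below assumes a tempered posterior of the usual form, with prior $\varphi$, empirical negative log-likelihood $L_n$, sample size $n$, inverse temperature $\beta$, and a generic observable $A$; these symbols are illustrative notation and may differ from the paper's.

```latex
\begin{align*}
p_\beta(w) &= \frac{\varphi(w)\, e^{-\beta n L_n(w)}}{Z(\beta)}, &
Z(\beta) &= \int \varphi(w)\, e^{-\beta n L_n(w)}\, dw, \\
\frac{d}{d\beta}\, \mathbb{E}_\beta[A] &= -n\, \mathrm{Cov}_\beta\!\left(A, L_n\right), \\
F(\beta) &= -\log Z(\beta), &
F'(\beta) &= n\, \mathbb{E}_\beta[L_n], \qquad
F''(\beta) = -n^2\, \mathrm{Var}_\beta(L_n).
\end{align*}
```

Setting $A = L_n$ in the second line yields the expression for $F''$: the curvature of the tempered free energy is a posterior variance, which is the sense in which fluctuation quantities act as response functions.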

2603.05396 2026-03-06 stat.ML cs.LG

Harnessing Synthetic Data from Generative AI for Statistical Inference

Ahmad Abdel-Azim, Ruoyu Wang, Xihong Lin

Comments Submitted to Statistical Science

Abstract

The emergence of generative AI models has dramatically expanded the availability and use of synthetic data across scientific, industrial, and policy domains. While these developments open new possibilities for data analysis, they also raise fundamental statistical questions about when synthetic data can be used in a valid, reliable, and principled manner. This paper reviews the current landscape of synthetic data generation and use from a statistical perspective, with the goal of clarifying the assumptions under which synthetic data can meaningfully support downstream discovery, inference, and prediction. We survey major classes of modern generative models, their intended use cases, and the benefits they offer, while also highlighting their limitations and characteristic failure modes. We additionally examine common pitfalls that arise when synthetic data are treated as surrogates for real observations, including biases from model misspecification, attenuated uncertainty, and difficulties in generalization. Building on these insights, we discuss emerging frameworks for the principled use of synthetic data. We conclude with practical recommendations, open problems, and cautions intended to guide both method developers and applied researchers.

2603.05370 2026-03-06 cs.LG cs.AI stat.ME

Learning Causal Structure of Time Series using Best Order Score Search

Irene Gema Castillo Mansilla, Urmi Ninad

Abstract

Causal structure learning from observational data is central to many scientific and policy domains, but the time series setting common to many disciplines poses several challenges due to temporal dependence. In this paper we focus on score-based causal discovery for multivariate time series and introduce TS-BOSS, a time series extension of the recently proposed Best Order Score Search (BOSS) (Andrews et al. 2023). TS-BOSS performs a permutation-based search over dynamic Bayesian network structures while leveraging grow-shrink trees to cache intermediate score computations, preserving the scalability and strong empirical performance of BOSS in the static setting. We provide theoretical guarantees establishing the soundness of TS-BOSS under suitable assumptions, and we present an intermediate result that extends classical subgraph minimality results for permutation-based methods to the dynamic (time series) setting. Our experiments on synthetic data show that TS-BOSS is especially effective in high auto-correlation regimes, where it consistently achieves higher adjacency recall at comparable precision than standard constraint-based methods. Overall, TS-BOSS offers a high-performing, scalable approach for time series causal discovery and our results provide a principled bridge for extending sparsity-based, permutation-driven causal learning theory to dynamic settings.

2603.05317 2026-03-06 stat.ML cs.LG

How important are the genes to explain the outcome - the asymmetric Shapley value as an honest importance metric for high-dimensional features

Mark A. van de Wiel, Jeroen Goedhart, Martin Jullum, Kjersti Aas

Comments 32 pages, incl. Supplementary Material

Abstract

In clinical prediction settings the importance of a high-dimensional feature like genomics is often assessed by evaluating the change in predictive performance when adding it to a set of traditional clinical variables. This approach is questionable, because it accounts for neither collinearity nor the known directionality of dependencies between variables. We suggest using asymmetric Shapley values as a more suitable alternative to quantify feature importance in the context of a mixed-dimensional prediction model. We focus on a setting that is particularly relevant in clinical prediction: disease state as a mediating variable for genomic effects, with additional confounders for which the direction of effects may be unknown. We derive efficient algorithms to compute local and global asymmetric Shapley values for this setting. The former are shown to be very useful for inference, whereas the latter provide interpretation by decomposing any predictive performance metric into contributions of the features. Throughout, we illustrate our framework by a leading example: the prediction of progression-free survival for colorectal cancer patients.

2603.05306 2026-03-06 math.PR math.ST stat.TH

Maximum of sparsely equicorrelated Gaussian fields and applications

Johannes Heiny, Tiefeng Jiang, Tuan Pham, Yongcheng Qi

Abstract

We investigate the extreme values of a sparse and equicorrelated Gaussian field on a triangle: the correlations on every vertical or horizontal line are all equal to a parameter $r \in [0,1/2]$ and are zero everywhere else. This problem is closely linked with various problems in high-dimensional statistics and extreme-value theory. We identify the threshold for $r$ at which the standard Gumbel law breaks down. Our result is based on a subtle application of the Chen-Stein method for Poisson approximation. As applications, we discuss the implications of our results for multiple testing and resolve several questions that were left open in \cite{heiny2024maximum}, \cite{tang2022asymptotic} and \cite{Jiang19}.

2603.05288 2026-03-06 stat.ML cs.LG

Bayesian Supervised Causal Clustering

Luwei Wang, Nazir Lone, Sohan Seth

Abstract

Finding patient subgroups with similar characteristics is crucial for personalized decision-making in various disciplines such as healthcare and policy evaluation. While most existing approaches rely on unsupervised clustering methods, there is a growing trend toward using supervised clustering methods that identify operationalizable subgroups in the context of a specific outcome of interest. We propose Bayesian Supervised Causal Clustering (BSCC), with treatment effect as outcome to guide the clustering process. BSCC identifies homogeneous subgroups of individuals who are similar in their covariate profiles as well as their treatment effects. We evaluate BSCC on simulated datasets as well as a real-world dataset from the third International Stroke Trial to assess the practical usefulness of the framework.

2602.16537 2026-03-06 math.ST cs.IT cs.LG math.IT stat.ML stat.TH

Optimal training-conditional regret for online conformal prediction

Jiadong Liang, Zhimei Ren, Yuxin Chen

Abstract

We study online conformal prediction for non-stationary data streams subject to unknown distribution drift. While most prior work studied this problem under adversarial settings and/or assessed performance in terms of gaps of time-averaged marginal coverage, we instead evaluate performance through training-conditional cumulative regret. We specifically focus on independently generated data with two types of distribution shift: abrupt change points and smooth drift. When non-conformity score functions are pretrained on an independent dataset, we propose a split-conformal style algorithm that leverages drift detection to adaptively update calibration sets, which provably achieves minimax-optimal regret. When non-conformity scores are instead trained online, we develop a full-conformal style algorithm that again incorporates drift detection to handle non-stationarity; this approach relies on stability - rather than permutation symmetry - of the model-fitting algorithm, which is often better suited to online learning under evolving environments. We establish non-asymptotic regret guarantees for our online full conformal algorithm, which match the minimax lower bound under appropriate restrictions on the prediction sets. Numerical experiments corroborate our theoretical findings.
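For readers unfamiliar with the split-conformal building block that this abstract extends, the following is a minimal sketch of the static calibration step only. It does not include the drift detection or adaptive calibration-set updates described above; the function name and inputs are hypothetical and not taken from the paper.

```python
import math


def split_conformal_threshold(scores, alpha):
    """Return the split-conformal threshold from calibration scores.

    scores: non-conformity scores computed on a held-out calibration set.
    alpha:  target miscoverage level in (0, 1).

    The threshold is the k-th smallest score with k = ceil((n + 1) * (1 - alpha)).
    If k exceeds n, no finite threshold gives the guarantee, so infinity is returned.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


# Hypothetical calibration scores, e.g. absolute residuals of a pretrained model.
calibration_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
threshold = split_conformal_threshold(calibration_scores, alpha=0.5)
```

A new point is then included in the prediction set whenever its non-conformity score is at most `threshold`. Under distribution drift the calibration scores become stale, which is the problem the adaptive updates in the abstract are meant to address.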

2512.17805 2026-03-06 math.ST cs.NA math.NA stat.ML stat.TH

Towards Sharp Minimax Risk Bounds for Operator Learning

Ben Adcock, Gregor Maier, Rahul Parhi

Abstract

We develop a minimax theory for operator learning, where the goal is to estimate an unknown operator between separable Hilbert spaces from finitely many noisy input-output samples. For uniformly bounded Lipschitz operators, we prove information-theoretic lower bounds together with matching or near-matching upper bounds, covering both fixed and random designs under Hilbert-valued Gaussian noise and Gaussian white noise errors. The rates are controlled by the spectrum of the covariance operator of the measure that defines the error metric. Our setup is very general and allows for measures with unbounded support. A key implication is a curse of sample complexity, which shows that the minimax risk for generic Lipschitz operators cannot decay at any algebraic rate in the sample size. We obtain sharp characterizations when the covariance spectrum decays exponentially and provide general upper and lower bounds in slower-decay regimes. Finally, we show that assuming higher regularity, i.e., Hölder smoothness, does not improve minimax rates over the Lipschitz case, up to potential constants. Thus, we show that learning operators of any finite regularity necessarily suffers a curse of sample complexity.

2511.05840 2026-03-06 stat.ME econ.EM stat.AP

Comparative e-backtests for general risk measures

Zhanyi Jiao, Qiuqi Wang, Yimiao Zhao

Abstract

Backtesting risk measures is a central task in financial regulation. While standard backtests evaluate whether a forecasting model is statistically consistent with observed losses, regulatory practice often requires assessing the performance of an internal model relative to benchmark models. We develop a non-parametric sequential framework for comparative backtests of general elicitable risk measures using e-values and e-processes. The proposed methods provide anytime-valid inference and remain robust under dependence and model misspecification. In particular, we propose a modified three-zone approach based on weak dominance, which yields more informative conclusions in comparative backtesting. As a technical building block, we also construct general standard e-backtests for identifiable risk measures and characterize the associated e-values and e-processes. The resulting procedures apply to a broad class of commonly used risk measures, including the mean, variance, Value-at-Risk, Expected Shortfall, and expectiles. Simulation studies and empirical analyses illustrate the effectiveness of the proposed approach.

2510.15664 2026-03-06 stat.ME cs.LG physics.comp-ph

Bayesian Inference for PDE-based Inverse Problems using the Optimization of a Discrete Loss

Lucas Amoudruz, Sergey Litvinov, Costas Papadimitriou, Petros Koumoutsakos

Abstract

Inverse problems are crucial for many applications in science, engineering and medicine that involve data assimilation, design, and imaging. Their solution infers the parameters or latent states of a complex system from noisy data and partially observable processes. When measurements are an incomplete or indirect view of the system, additional knowledge is required to accurately solve the inverse problem. Adopting a physical model of the system in the form of partial differential equations (PDEs) is a potent method to close this gap. In particular, the method of optimizing a discrete loss (ODIL) has shown great potential in terms of robustness and computational cost. In this work, we introduce B-ODIL, a Bayesian extension of ODIL, that integrates the PDE loss of ODIL as prior knowledge and combines it with a likelihood describing the data. B-ODIL employs a Bayesian formulation of PDE-based inverse problems to infer solutions with quantified uncertainties. We demonstrate the capabilities of B-ODIL in a series of synthetic benchmarks involving PDEs in one, two, and three dimensions. We showcase the application of B-ODIL in estimating tumor concentration and its uncertainty in a patient's brain from MRI scans using a three-dimensional tumor growth model.

2509.24544 2026-03-06 stat.ML cs.LG math.PR

Quantitative convergence of trained single layer neural networks to Gaussian processes

Eloy Mosig, Andrea Agazzi, Dario Trevisan

Comments Submitted and accepted at NeurIPS 2025, main body of 10 pages, 3 figures, 28 pages of supplementary material. Corrected an issue in the proof of Proposition 3.7

Abstract

In this paper, we study the quantitative convergence of shallow neural networks trained via gradient descent to their associated Gaussian processes in the infinite-width limit. While previous work has established qualitative convergence under broad settings, precise, finite-width estimates remain limited, particularly during training. We provide explicit upper bounds on the quadratic Wasserstein distance between the network output and its Gaussian approximation at any training time $t \ge 0$, demonstrating polynomial decay with network width. Our results quantify how architectural parameters, such as width and input dimension, influence convergence, and how training dynamics affect the approximation error.

2506.08921 2026-03-06 math.NA cs.NA math.ST stat.ML stat.TH

Enabling stratified sampling in high dimensions via nonlinear dimensionality reduction

Gianluca Geraci, Daniele E. Schiavazzi, Andrea Zanoni

Abstract

We consider the problem of propagating the uncertainty from a possibly large number of random inputs through a computationally expensive model. Stratified sampling is a well-known variance reduction strategy, but its application, thus far, has focused on models with a limited number of inputs due to the challenges of creating uniform partitions in high dimensions. To overcome these challenges, we propose a simple methodology for constructing an effective stratification of the input domain that is adapted to the model response. Our approach leverages neural active manifolds, a recently introduced nonlinear dimensionality reduction technique based on neural networks that identifies a one-dimensional manifold capturing most of the model variability. The resulting one-dimensional latent space is mapped to the unit interval, where stratification is performed with respect to the uniform distribution. The corresponding strata in the original input space are then recovered through the neural active manifold, generating partitions that tend to follow the level sets of the model. We show that our approach is effective in high dimensions and can be used to further reduce the variance of multifidelity Monte Carlo estimators.
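The final step described in this abstract, stratification on the unit interval with respect to the uniform distribution, can be sketched as follows. This is only the one-dimensional stratification; the neural active manifold that maps the latent coordinate back to the input space is not reproduced here, and the function names and the test integrand are hypothetical.

```python
import random


def stratified_uniform(n_strata, rng):
    """Draw one point from each of n_strata equal-width strata of [0, 1)."""
    return [(k + rng.random()) / n_strata for k in range(n_strata)]


def stratified_mean(f, n_strata, rng):
    """Estimate E[f(U)], U ~ Uniform(0, 1), using one draw per stratum."""
    draws = stratified_uniform(n_strata, rng)
    return sum(f(u) for u in draws) / n_strata


rng = random.Random(0)
# Stand-in for "model evaluated along the latent coordinate"; true mean is 1/3.
estimate = stratified_mean(lambda u: u * u, n_strata=10, rng=rng)
```

Because every stratum contributes exactly one draw, the estimate for a monotone integrand is always sandwiched between the lower and upper Riemann sums, which is one way to see why stratification reduces variance compared with plain Monte Carlo.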

2412.20298 2026-03-06 cs.LG cs.CY stat.ML

An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban, Tai Le Quy

Comments The manuscript is submitted to Springer Nature's journal

Abstract

The digitalization of credit scoring has become essential for financial institutions and commercial banks, especially in the era of digital transformation. Machine learning techniques are commonly used to evaluate customers' creditworthiness. However, the predicted outcomes of machine learning models can be biased toward protected attributes, such as race or gender. Numerous fairness-aware machine learning models and fairness measures have been proposed. Nevertheless, their performance in the context of credit scoring has not been thoroughly investigated. In this paper, we present a comprehensive experimental study of fairness-aware machine learning in credit scoring. The study explores key aspects of credit scoring, including financial datasets, predictive models, and fairness measures. We also provide a detailed evaluation of fairness-aware predictive models and fairness measures on widely used financial datasets. The experimental results show that fairness-aware models achieve a better balance between predictive accuracy and fairness compared to traditional classification models.

2603.05280 2026-03-06 cs.CV cs.LG stat.ML

Layer by layer, module by module: Choose both for optimal OOD probing of ViT

Ambroise Odonnat, Vasilii Feofanov, Laetitia Chapel, Romain Tavenard, Ievgen Redko

Comments Accepted at ICLR 2026 CAO Workshop

Abstract

Recent studies have observed that intermediate layers of foundation models often yield more discriminative representations than the final layer. While initially attributed to autoregressive pretraining, this phenomenon has also been identified in models trained via supervised and discriminative self-supervised objectives. In this paper, we conduct a comprehensive study to analyze the behavior of intermediate layers in pretrained vision transformers. Through extensive linear probing experiments across a diverse set of image classification benchmarks, we find that distribution shift between pretraining and downstream data is the primary cause of performance degradation in deeper layers. Furthermore, we perform a fine-grained analysis at the module level. Our findings reveal that standard probing of transformer block outputs is suboptimal; instead, probing the activation within the feedforward network yields the best performance under significant distribution shift, whereas the normalized output of the multi-head self-attention module is optimal when the shift is weak.

2603.05274 2026-03-06 stat.ME

Monitoring Covariance in Multichannel Profiles via Functional Graphical Models

Christian Capezza, Davide Forcina, Antonio Lepore, Biagio Palumbo

Abstract

Most statistical process monitoring methods for multichannel profiles focus solely on the mean and are almost ineffective when changes involve the covariance structure. Although it is known to be crucial, covariance monitoring requires estimating a much larger number of parameters, which may shift in a subtle and sparse fashion. That is, an out-of-control (OC) state may manifest with small deviations and affect only a very limited subset of these parameters. To address these difficulties, we propose a multichannel profile covariance (MPC) control chart based on functional graphical models that provide an interpretable representation of conditional dependencies between profiles. A nonparametric combination of the likelihood-ratio tests corresponding to different sparsity levels is then used to draw an overall inference and signal whether an OC state may have occurred. Between-profile relationships that are likely to have shifted are naturally identified at no additional computational cost. An extensive Monte Carlo simulation study compares the MPC control chart with state-of-the-art competitors, and a case study on monitoring multichannel temperature profiles in a roasting machine illustrates its practical applicability.

2603.05226 2026-03-06 stat.ML cs.LG

Learning Optimal Individualized Decision Rules with Conditional Demographic Parity

Wenhai Cui, Wen Su, Donglin Zeng, Xingqiu Zhao

Abstract

Individualized decision rules (IDRs) have become increasingly prevalent in societal applications such as personalized marketing, healthcare, and public policy design. However, a critical ethical concern arises from the potential discriminatory effects of IDRs trained on biased data. These algorithms may disproportionately harm individuals from minority subgroups defined by sensitive attributes like gender, race, or language. To address this issue, we propose a novel framework that incorporates demographic parity (DP) and conditional demographic parity (CDP) constraints into the estimation of optimal IDRs. We show that the theoretically optimal IDRs under DP and CDP constraints can be obtained by applying perturbations to the unconstrained optimal IDRs, enabling a computationally efficient solution. Theoretically, we derive convergence rates for both policy value and the fairness constraint term. The effectiveness of our methods is illustrated through comprehensive simulation studies and an empirical application to the Oregon Health Insurance Experiment.

2603.05201 2026-03-06 cs.LG stat.ML

Towards a data-scale independent regulariser for robust sparse identification of non-linear dynamics

Jay Raut, Daniel N. Wilke, Stephan Schmidt

Comments 21 pages, 9 figures, 5 tables

Abstract

Data normalisation, a common and often necessary preprocessing step in engineering and scientific applications, can severely distort the discovery of governing equations by magnitude-based sparse regression methods. This issue is particularly acute for the Sparse Identification of Nonlinear Dynamics (SINDy) framework, where the core assumption of sparsity is undermined by the interaction between data scaling and measurement noise. The resulting discovered models can be dense, uninterpretable, and physically incorrect. To address this critical vulnerability, we introduce the Sequential Thresholding of Coefficient of Variation (STCV), a novel, computationally efficient sparse regression algorithm that is inherently robust to data scaling. STCV replaces conventional magnitude-based thresholding with a dimensionless statistical metric, the Coefficient Presence (CP), which assesses the statistical validity and consistency of candidate terms in the model library. This shift from magnitude to statistical significance makes the discovery process invariant to arbitrary data scaling. Through comprehensive benchmarking on canonical dynamical systems and practical engineering problems, including a physical mass-spring-damper experiment, we demonstrate that STCV consistently and significantly outperforms standard Sequential Thresholding Least Squares (STLSQ) and Ensemble-SINDy (E-SINDy) on normalised, noisy datasets. The results show that STCV-based methods can successfully identify the correct, sparse physical laws even when other methods fail. By mitigating the distorting effects of normalisation, STCV makes sparse system identification a more reliable and automated tool for real-world applications, thereby enhancing model interpretability and trustworthiness.
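The scale sensitivity of magnitude-based thresholding can be seen in a toy example. The sketch below is not STCV or STLSQ; it is a one-term least-squares fit with a hypothetical fixed threshold, showing only that rescaling a variable changes the fitted coefficient's magnitude and therefore whether the term survives. Noise, which the abstract identifies as the other half of the problem, is omitted.

```python
def fit_slope(x, y):
    """Least-squares coefficient c for the one-term model y ≈ c * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def keep_term(coef, threshold):
    """Magnitude-based rule: keep a term only if |coef| reaches the threshold."""
    return abs(coef) >= threshold


x = [1.0, 2.0, 3.0, 4.0]
y = [0.5 * v for v in x]            # underlying law: y = 0.5 * x

threshold = 0.1                     # hypothetical sparsity threshold
c_raw = fit_slope(x, y)             # coefficient in the original units

x_rescaled = [100.0 * v for v in x] # same measurements, different units
c_rescaled = fit_slope(x_rescaled, y)

kept_raw = keep_term(c_raw, threshold)
kept_rescaled = keep_term(c_rescaled, threshold)
```

The same physical term is kept in one unit system and discarded in the other, purely because the coefficient shrinks by the scaling factor. A dimensionless criterion, which is what the abstract proposes, would not change under this rescaling.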

2603.05163 2026-03-06 math.PR math.ST stat.TH

New Berry-Esseen bounds for parameter estimation of Gaussian processes observed at high frequency

Khalifa Es-Sebaiy, Yong Chen

Abstract

The purpose of this paper is to estimate the limiting variance of asymptotically stationary Gaussian processes observed at high frequency, using the second moment estimator (SME). We study rates of convergence of the central limit theorem for the SME in terms of the total variation, Kolmogorov and Wasserstein distances, using some novel techniques and sharp estimates for cumulants. We apply our approach to provide Berry-Esseen bounds in Kolmogorov and Wasserstein distances for estimators of the drift parameters of Gaussian Ornstein-Uhlenbeck processes. Moreover, we prove that most of our estimates are strictly sharper than the ones obtained in the existing literature.

2603.05154 2026-03-06 eess.SP stat.AP

Revitalizing AR Process Simulation of Non-Gaussian Radar Clutter via Series-Based Analytic Continuation

Xingxing Liao, Junhao Xie

Comments 13 pages, 12 figures

Abstract

Owing to its conceptual simplicity, the linear filtering framework, notably the autoregressive (AR) process, has a long history in simulating clutter sequences with specified probability density functions (PDFs) and autocorrelation functions (ACFs). However, linear filtering inevitably distorts the input distribution, which may lead to inaccurate PDF reproduction or restrict applicability to very simple ACFs. To address these challenges, this study proposes a series-based analytic continuation strategy that revitalizes AR process clutter simulation by accurately precomputing the input pre-distortion required to compensate for AR filtering. First, the moments and cumulants of the AR input are derived based on the input-output relationship of the AR process, facilitating the moment and cumulant expansions of the Laplace transform (LT) and the logarithmic LT around zero, respectively. Second, both series expansions are analytically continued via the Padé approximation (PA) to recover the LT over the full complex plane. Notably, the PA-based continuation of the moment expansion, a conventional choice, can be highly inaccurate when the LT exhibits strong oscillations. By contrast, given that the logarithmic LT generally has a simpler structure, the continuation of the cumulant expansion provides a more stable and accurate alternative. Third, the LT recovered from the cumulant expansion facilitates fast simulation of the AR input non-Gaussian white sequence via a random variable transformation method, thereby enabling an efficient AR process. Finally, simulations demonstrate that the proposed strategy enables accurate and fast simulation of non-Gaussian correlated clutter sequences.

2603.05149 2026-03-06 cs.LG cs.AI stat.ML

Federated Causal Discovery Across Heterogeneous Datasets under Latent Confounding

Maximilian Hahn, Alina Zajak, Dominik Heider, Adèle Helena Ribeiro

Abstract

Causal discovery across multiple datasets is often constrained by data privacy regulations and cross-site heterogeneity, limiting the use of conventional methods that require a single, centralized dataset. To address these challenges, we introduce fedCI, a federated conditional independence test that rigorously handles heterogeneous datasets with non-identical sets of variables, site-specific effects, and mixed variable types, including continuous, ordinal, binary, and categorical variables. At its core, fedCI uses a federated Iteratively Reweighted Least Squares (IRLS) procedure to estimate the parameters of generalized linear models underlying likelihood-ratio tests for conditional independence. Building on this, we develop fedCI-IOD, a federated extension of the Integration of Overlapping Datasets (IOD) algorithm, that replaces its meta-analysis strategy and enables, for the first time, federated causal discovery under latent confounding across distributed and heterogeneous datasets. By aggregating evidence federatively, fedCI-IOD not only preserves privacy but also substantially enhances statistical power, achieving performance comparable to fully pooled analyses and mitigating artifacts from low local sample sizes. Our tools are publicly available as the fedCI Python package, a privacy-preserving R implementation of IOD, and a web application for the fedCI-IOD pipeline, providing versatile, user-friendly solutions for federated conditional independence testing and causal discovery.
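The idea behind a federated IRLS step can be sketched for the simplest case: a logistic model with a single coefficient and no intercept. Each site shares only two aggregate numbers per iteration, never raw records. This is an illustrative sketch under those assumptions, not the fedCI package's API; all names and the toy data are hypothetical, and site-specific effects and mixed variable types are not handled.

```python
import math


def site_stats(xs, ys, beta):
    """Local gradient and curvature of the logistic log-likelihood at beta."""
    grad = 0.0
    hess = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        grad += x * (y - p)
        hess += x * x * p * (1.0 - p)
    return grad, hess


def federated_irls(sites, n_iter=25):
    """Newton/IRLS updates using only per-site aggregates."""
    beta = 0.0
    for _ in range(n_iter):
        stats = [site_stats(xs, ys, beta) for xs, ys in sites]
        grad = sum(g for g, _ in stats)   # the server sums site gradients
        hess = sum(h for _, h in stats)   # and site curvatures
        beta += grad / hess
    return beta


# Hypothetical toy data held at two sites: (covariate values, binary outcomes).
site_a = ([-2.0, -1.0, 1.0, 2.0], [0, 1, 0, 1])
site_b = ([-1.5, -0.5, 0.5, 1.5], [0, 0, 1, 1])

beta_federated = federated_irls([site_a, site_b])
beta_pooled = federated_irls([(site_a[0] + site_b[0], site_a[1] + site_b[1])])
```

Because the log-likelihood is a sum over records, summing per-site gradients and curvatures reproduces the pooled update, so `beta_federated` and `beta_pooled` agree up to floating-point rounding. This is consistent with the abstract's claim of performance comparable to fully pooled analyses.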

2603.05119 2026-03-06 q-fin.ST math.ST stat.TH

Asymptotic Separability of Diffusion and Jump Components in High-Frequency CIR and CKLS Models

Sourojyoti Barick

Abstract

This paper develops a robust parametric framework for jump detection in discretely observed CKLS-type jump-diffusion processes with high-frequency asymptotics, based on the minimum density power divergence estimator (MDPDE). The methodology exploits the intrinsic asymptotic scale separation between diffusion increments, which decay at rate $\sqrt{\Delta_n}$, and jump increments, which remain of non-vanishing stochastic magnitude. Using robust MDPDE-based estimators of the drift and diffusion coefficients, we construct standardized residuals whose extremal behavior provides a principled basis for statistical discrimination between continuous and discontinuous components. We establish that, over diffusion intervals, the maximum of the normalized residuals converges to the Gumbel extreme-value distribution, yielding an explicit and asymptotically valid detection threshold. Building on this result, we prove classification consistency of the proposed robust detection procedure: the probability of correctly identifying all jump and diffusion increments converges to one under proper asymptotics. The MDPDE-based normalization attenuates the influence of atypical increments and stabilizes the detection boundary in the presence of discontinuities. Simulation results confirm that robustness improves finite-sample stability and reduces spurious detections without compromising asymptotic validity. The proposed methodology provides a theoretically rigorous and practically resilient robust approach to jump identification in high-frequency stochastic systems.

2603.05065 2026-03-06 stat.ME

Modeling cyclostationarity in time series using ASCA

Daniel Vallejo-España, Jesús García Sánchez, Manuel Villar-Argaiz, Concepción De Linares, José Camacho

Comments 27 pages and 4 figures in main text. 16 pages and 8 figures in supplementary materials

Details
Abstract

Modern data analysis across diverse disciplines increasingly relies on time series. Many of these datasets exhibit cyclostationarity, where patterns approximately repeat in a regular manner, often across multiple time scales, such as daily, weekly or yearly cycles. In this context, statistical inference is essential to distinguish genuine underlying effects from random variability. While tools like Analysis of Variance (ANOVA) provide such inference, they often lack interpretability and struggle with the complexities of multivariate data. To address these limitations, we propose a unified pipeline for the exploratory analysis of cyclostationary time series using ANOVA Simultaneous Component Analysis (ASCA). ASCA is an extension of ANOVA that is able to work in both univariate and multivariate cases. Combining inference with the visualization capabilities of Principal Component Analysis (PCA), ASCA provides powerful options for interpretability. ASCA's capabilities have been well-established in the analysis of experimental data, but they remain largely unexplored for observational data like time series. Our workflow introduces an algorithmic approach to modeling time-dependent data using ASCA, enabling control over multiple cyclostationary time scales while also accounting for the specific challenges of this type of data, such as autocorrelation. Furthermore, we observed that ASCA provides a better separation of variability across factors than ANOVA in unbalanced designs due to its multivariate nature. We demonstrate the efficacy of this methodology through two real-world case studies: water temperature trends in mountain lakes in Sierra Nevada, Spain, and airborne pollen trends over 30 years recorded in the city of Granada, Spain.

2603.04780 2026-03-06 cs.LG stat.ML

Distributional Equivalence in Linear Non-Gaussian Latent-Variable Cyclic Causal Models: Characterization and Learning

Haoyue Dai, Immanuel Albrecht, Peter Spirtes, Kun Zhang

Comments Appears at ICLR 2026 (oral)

Details
Journal ref
Proceedings of the International Conference on Learning Representations (ICLR), 2026
Abstract

Causal discovery with latent variables is a fundamental task. Yet most existing methods rely on strong structural assumptions, such as enforcing specific indicator patterns for latents or restricting how they can interact with others. We argue that a core obstacle to a general, structural-assumption-free approach is the lack of an equivalence characterization: without knowing what can be identified, one generally cannot design methods for how to identify it. In this work, we aim to close this gap for linear non-Gaussian models. We establish the graphical criterion for when two graphs with arbitrary latent structure and cycles are distributionally equivalent, that is, they induce the same observed distribution set. Key to our approach is a new tool, edge rank constraints, which fills a missing piece in the toolbox for latent-variable causal discovery in even broader settings. We further provide a procedure to traverse the whole equivalence class and develop an algorithm to recover models from data up to such equivalence. To our knowledge, this is the first equivalence characterization with latent variables in any parametric setting without structural assumptions, and hence the first structural-assumption-free discovery method. Code and an interactive demo are available at https://equiv.cc.

2603.04752 2026-03-06 stat.ME math.ST stat.TH

Robust estimation via $γ$-divergence for diffusion processes

Tomoyuki Nakagawa, Yusuke Shimizu

Comments 25 pages

Details
Abstract

This paper deals with the problem of outliers in high-frequency observation data from diffusion processes. Robust estimation methods are needed because the inclusion of outliers can lead to incorrect statistical inference even in the diffusion process. To construct a robust estimator, we first approximate the transition density of the diffusion process by a Gaussian density using Kessler's approach and then employ two types of minimum robust divergence estimation methods. In this paper, we provide the asymptotic properties of the robust estimator using $γ$-divergence. Furthermore, we derive the conditional influence functions of the divergence-based estimators and discuss their boundedness.

2603.04697 2026-03-06 stat.ME

A Multi-Fidelity Tensor Emulator for Spatiotemporal Outputs: Emulation of Arctic Sea Ice Dynamics

Tristan Contant, Yawen Guan, Ander Wilson, Adrian K. Turner, Deborah Sulsky

Comments 25 pages, 6 figures

Details
Abstract

Numerical models are widely used to simulate the earth system, but they are computationally expensive and often depend on many uncertain input parameters. Their effective use requires calibration and uncertainty quantification, which typically involve running the model across many input configurations and therefore incur substantial computational cost. Statistical emulation provides a practical alternative for efficiently exploring model behavior. We are motivated by the Arctic sea ice component of the Energy Exascale Earth System Model (MPAS-Seaice), which generates large spatiotemporal outputs at multiple spatial resolutions, with high-resolution (or high-fidelity, HF) simulations being more accurate but computationally more expensive than lower-resolution (low-fidelity, LF) simulations. Multi-fidelity (MF) emulation integrates information across resolutions to construct efficient and accurate surrogate models, yet existing approaches struggle to scale to large spatiotemporal data. We develop an MF emulator that combines tensor decomposition for dimensionality reduction, Gaussian process priors for flexible function approximation, and an additive discrepancy model to capture systematic differences between LF and HF data. The proposed framework enables scalable emulation while maintaining accurate predictions and well-calibrated uncertainty for complex spatiotemporal fields, and consistently achieves lower prediction error and reduced uncertainty than LF-only and HF-only models in both simulation studies and MPAS-Seaice analysis. By leveraging the complementary strengths of LF and HF data and using an efficient tensor decomposition approach, our emulator greatly reduces computational expense, making it well suited for large-scale simulation tasks involving complex physical models.

2603.04690 2026-03-06 math.ST stat.TH

Strong consistency of the local linear estimator for a generalized regression function with dependent functional data

Danilo Hiroshi Matsuoka, Hudson da Silva Torrent

Comments Supplementary material included. Submitted to Annals of the Institute of Statistical Mathematics

Details
Abstract

In this study, we focus on a generalized nonparametric scalar-on-function regression model for heterogeneously distributed and strongly mixing data. We provide almost complete convergence rates for the local linear estimator of the regression function. We show that, under our conditions, the pointwise and uniform convergence rates are the same on a compact set. On the other hand, when the data is dependent, it is proved that the convergence rate can be slower than those obtained for independent data. A simulation study shows the good performance and finite sample properties of the functional local linear estimator (FLL) in comparison to the local constant estimator (FLC). In addition, a one step ahead energy consumption forecasting exercise illustrates that the forecasts of the FLL estimator are significantly more accurate than those of the FLC.

2603.04688 2026-03-06 q-bio.NC cs.AI cs.LG stat.ML

Why the Brain Consolidates: Predictive Forgetting for Optimal Generalisation

Zafeirios Fountas, Adnan Oomerjee, Haitham Bou-Ammar, Jun Wang, Neil Burgess

Comments 25 pages, 6 figures

Details
Abstract

Standard accounts of memory consolidation emphasise the stabilisation of stored representations, but struggle to explain representational drift, semanticisation, or the necessity of offline replay. Here we propose that high-capacity neocortical networks optimise stored representations for generalisation by reducing complexity via predictive forgetting, i.e. the selective retention of experienced information that predicts future outcomes or experience. We show that predictive forgetting formally improves information-theoretic generalisation bounds on stored representations. Under high-fidelity encoding constraints, such compression is generally unattainable in a single pass; high-capacity networks therefore benefit from temporally separated, iterative refinement of stored traces without re-accessing sensory input. We demonstrate this capacity dependence with simulations in autoencoder-based neocortical models, biologically plausible predictive coding circuits, and Transformer-based language models, and derive quantitative predictions for consolidation-dependent changes in neural representational geometry. These results identify a computational role for off-line consolidation beyond stabilisation, showing that outcome-conditioned compression optimises the retention-generalisation trade-off.

2603.04686 2026-03-06 math.ST stat.TH

The augmented van Trees inequality

Elliot H. Young

Details
Abstract

We introduce an augmented form of the van Trees inequality, that yields uniformly tighter lower bounds on the minimax squared Bayes risk of estimators compared with the classical van Trees inequality. Our augmented inequality also accommodates prior distributions whose densities need not vanish at the boundaries of their supports. We demonstrate how this refinement can be utilised for elementary proofs of a number of minimax lower bounds for nonparametric estimands, that also often attain sharper constants than those obtained by the alternative Le Cam convergence of experiments theory and the classical van Trees inequality, and in some cases obtain exact constants. As an example, our augmented van Trees inequality can be used to obtain the asymptotic minimax pointwise mean squared error when estimating the regression function in the model with normal errors: when the regression function is univariate and differentiable with Lipschitz derivative we obtain this quantity up to a constant factor of $1.37$; and in the high dimensional regime with a Hölder smooth regression function of smoothness $β\in(0,2]$ we obtain exact constants. Both these results do not follow from an application of the classical van Trees inequality. The flexibility of our augmented van Trees inequality accommodates lower bounds for models beyond Gaussianity, loss functions beyond the squared error loss, and we are also able to incorporate this augmentation into generalised versions of the van Trees inequality for irregular models.

2603.04685 2026-03-06 math.ST stat.TH

Sequential Multiple Testing: A Second-Order Asymptotic Analysis

Jingyu Liu, Yanglei Song

Details
Abstract

We study sequential multiple testing with independent data streams, where the goal is to identify an unknown subset of signals while controlling commonly used error metrics, including generalized familywise rates and false discovery and non-discovery rates. For these problems, procedures that are first-order optimal are known, in the sense that the ratio of their expected sample size (ESS) to the minimal achievable ESS converges to one as the error tolerance levels vanish. In this work, we develop a unified theory of second-order asymptotic optimality. We establish general sufficient conditions under which second-order Bayesian optimality implies second-order frequentist optimality for broad classes of sequential testing procedures. As a consequence, several procedures previously known to be first-order optimal are shown to be second-order optimal: for every signal configuration, the difference between their ESS and the minimal achievable ESS remains uniformly bounded as the error tolerance levels tend to zero. In addition, we derive a second-order asymptotic expansion of the minimal achievable ESS, which refines the classical first-order approximation by identifying the second-order correction term arising from a boundary-crossing problem for a multidimensional random walk. We apply this result to several commonly used error metrics.

2603.04681 2026-03-06 math.ST stat.TH

Uniform convergence of kernel averages under fixed design with heterogeneous dependent data

Danilo Hiroshi Matsuoka, Hudson da Silva Torrent

Comments Supplementary material included. Submitted to Journal of Time Series Analysis

Details
Abstract

We provide uniform convergence rates for kernel averages on $[0,1]$ under equally-spaced fixed design points of the form $x_{t,T}=t/T,\ t\in\{1,\dotsc, T\},\ T\in\mathbb{N}$. The rates of weak and strong uniform consistency are derived under strong mixing and moment conditions and do not require stationarity. The analysis exploits the grid structure and thus complements existing random-design results such as those of Hansen (2008) and Kristensen (2009), which rely on density-based conditioning arguments. The framework accommodates dependent triangular arrays and is particularly relevant for nonparametric methods applied to time series observed on deterministic grids. As an application, we derive uniform convergence rates for the local linear estimator in a nonparametric regression model with time-varying autoregressive errors. The theoretical results are illustrated through Monte Carlo experiments and an empirical application.

2603.04671 2026-03-06 math.PR math.ST stat.TH

Estimating Graph Dynamics from Population Observations

Peter Braunsteins, Michel Mandjes, Florian Montalescot

Details
Abstract

In this paper we consider a population process evolving on a dynamic random graph. The dynamic random graph is an Erdős--Rényi graph that is resampled every time unit, independently of the previous ones, with `edge existence probability' $p$. The population process consists of $M$ individuals which reside at the vertices of the dynamic graph. At each point in time any of the $M$ individuals, supposing it resides at a vertex with $k$ neighbors, jumps to an adjacent vertex with probability $k/(k+1)$ (where this adjacent vertex is picked uniformly at random), and with probability $1/(k+1)$ it stays where it is. We suppose we observe the numbers of individuals at each of the vertices, but not the evolving random graph itself. We propose two estimators for $p$, and establish their consistency and asymptotic normality.
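
The dynamics described in this abstract can be sketched as a small simulation. Moving to a uniformly chosen neighbor with probability $k/(k+1)$ and staying with probability $1/(k+1)$ is the same as choosing uniformly among the current vertex and its $k$ neighbors, which is what the code does. The function name, the uniform initial placement of individuals, and the order in which graph resampling and movement happen within a time unit are assumptions made for illustration; the two estimators for $p$ proposed in the paper are not reproduced here.

```python
import random
from typing import List


def simulate_counts(n_vertices: int, n_individuals: int, p: float,
                    n_steps: int, seed: int = 0) -> List[List[int]]:
    """Return the observed vertex counts at times 0..n_steps (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: individuals start at independent, uniformly chosen vertices.
    positions = [rng.randrange(n_vertices) for _ in range(n_individuals)]

    def counts(pos: List[int]) -> List[int]:
        c = [0] * n_vertices
        for v in pos:
            c[v] += 1
        return c

    history = [counts(positions)]
    for _ in range(n_steps):
        # Resample a fresh Erdos-Renyi graph: each edge is present with probability p.
        # closed[v] holds v itself plus its neighbors (the closed neighborhood).
        closed = [[v] for v in range(n_vertices)]
        for u in range(n_vertices):
            for w in range(u + 1, n_vertices):
                if rng.random() < p:
                    closed[u].append(w)
                    closed[w].append(u)
        # Uniform choice over the closed neighborhood: stay w.p. 1/(k+1),
        # move to any given neighbor w.p. 1/(k+1).
        positions = [rng.choice(closed[v]) for v in positions]
        history.append(counts(positions))
    return history
```

Only `history` corresponds to what the abstract says is observed; the sampled graphs are discarded at each step, mirroring the setting in which the graph itself is hidden.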

2603.04635 2026-03-06 stat.ML cs.DS cs.LG

Optimal Prediction-Augmented Algorithms for Testing Independence of Distributions

Maryam Aliakbarpour, Alireza Azizi, Ria Stevens

Details
Abstract

Independence testing is a fundamental problem in statistical inference: given samples from a joint distribution $p$ over multiple random variables, the goal is to determine whether $p$ is a product distribution or is $ε$-far from all product distributions in total variation distance. In the non-parametric finite-sample regime, this task is notoriously expensive, as the minimax sample complexity scales polynomially with the support size. In this work, we move beyond these worst-case limitations by leveraging the framework of \textit{augmented distribution testing}. We design independence testers that incorporate auxiliary, but potentially untrustworthy, predictive information. Our framework ensures that the tester remains robust, maintaining worst-case validity regardless of the prediction's quality, while significantly improving sample efficiency when the prediction is accurate. Our main contributions include: (i) a bivariate independence tester for discrete distributions that adaptively reduces sample complexity based on the prediction error; (ii) a generalization to the high-dimensional multivariate setting for testing the independence of $d$ random variables; and (iii) matching minimax lower bounds demonstrating that our testers achieve optimal sample complexity.

2603.04632 2026-03-06 stat.ME stat.CO

Least trimmed squares regression with missing values and cellwise outliers

Jakob Raymaekers, Peter J. Rousseeuw

Details
Abstract

Regression is the workhorse of statistics, and is often faced with real data that contain outliers. When these are casewise outliers, that is, cases that are entirely wrong or belong to a different population, the issue can be remedied by existing casewise robust regression methods. It is another matter when cellwise outliers occur, that is, suspicious individual entries in the data matrix containing the regressors and the response. We propose a new regression method that is robust to both casewise and cellwise outliers, and handles missing values as well. Its construction allows for skewed distributions. We show that it obeys the first breakdown result for cellwise robust regression. It is also the first such method that is geared to making robust out-of-sample predictions. Its performance is studied by simulation, and it is illustrated on a substantial real dataset.

2603.04625 2026-03-06 cs.LG math.ST stat.ML stat.TH

K-Means as a Radial Basis function Network: a Variational and Gradient-based Equivalence

Felipe de Jesus Felix Arredondo, Alejandro Ucan-Puc, Carlos Astengo Noguez

Comments 21 pages, 2 figures, 1 appendix

Details
Abstract

This work establishes a rigorous variational and gradient-based equivalence between the classical K-Means algorithm and differentiable Radial Basis Function (RBF) neural networks with smooth responsibilities. By reparameterizing the K-Means objective and embedding its distortion functional into a smooth weighted loss, we prove that the RBF objective $Γ$-converges to the K-Means solution as the temperature parameter $σ$ vanishes. We further demonstrate that the gradient-based updates of the RBF centers recover the exact K-Means centroid update rule and induce identical training trajectories in the limit. To address the numerical instability of the Softmax transformation in the low-temperature regime, we propose the integration of Entmax-1.5, which ensures stable polynomial convergence while preserving the underlying Voronoi partition structure. These results bridge the conceptual gap between discrete partitioning and continuous optimization, enabling K-Means to be embedded directly into deep learning architectures for the joint optimization of representations and clusters. Empirical validation across diverse synthetic geometries confirms a monotone collapse of soft RBF centroids toward K-Means fixed points, providing a unified framework for end-to-end differentiable clustering.
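
A minimal sketch of the low-temperature limit mentioned in this abstract, for one-dimensional data. It assumes soft responsibilities given by a softmax over negative squared distances divided by a temperature `sigma`; the paper's exact parameterization (and its Entmax-1.5 variant) may differ, and the function names are illustrative only.

```python
import math
from typing import List


def soft_responsibilities(x: float, centers: List[float], sigma: float) -> List[float]:
    """Softmax over -(x - c)^2 / sigma, stabilised by subtracting the max logit."""
    logits = [-((x - c) ** 2) / sigma for c in centers]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def soft_centroid_update(points: List[float], centers: List[float],
                         sigma: float) -> List[float]:
    """One responsibility-weighted mean update of every center."""
    resp = [soft_responsibilities(x, centers, sigma) for x in points]
    updated = []
    for k, old in enumerate(centers):
        mass = sum(r[k] for r in resp)
        weighted = sum(r[k] * x for r, x in zip(resp, points))
        # Keep the old center if no point puts any weight on it.
        updated.append(weighted / mass if mass > 0 else old)
    return updated


points = [0.0, 1.0, 10.0, 11.0]
centers = [0.0, 10.0]
# With a small temperature the responsibilities are numerically one-hot, so the
# update matches the K-Means centroid step (the mean of each Voronoi cell).
cold = soft_centroid_update(points, centers, sigma=1e-3)
# With a large temperature every point contributes to every center.
warm = soft_centroid_update(points, centers, sigma=1e3)
```

In this toy example `cold` coincides with the hard K-Means means of the two groups, while `warm` pulls both centers toward the overall mean of the data.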

2603.04608 2026-03-06 math.ST stat.ME stat.TH

KRAFTY: Khatri-Rao Framework for Joint Cluster Recovery

Siyi Gao, Zachary Lubberts, Marianna Pensky

Comments 47 pages, 38 figures

Details
Abstract

When multiple datasets describe complementary information about the same set of entities, for example, brain scans of an individual over time, global trade network across years, or user information across social media platforms, integrating these snapshots allows us to see a more holistic picture. A common way of identifying structure in data is through clustering, but while clustering may be applied to each dataset separately, we learn more in the multi-view setting by identifying joint clusters. We consider a clustering problem where each view conflates some of these joint clusters, only revealing partial information, and seek to recover the true joint cluster structure. We introduce this multi-view clustering model and a method for recovering it: the transposed Khatri-RAo Framework for joinT cluster recoverY (KRAFTY). The model is flexible and can accommodate a variety of data-generating processes, including latent positions in random dot product graphs and Gaussian mixtures. A key advantage of KRAFTY is that it represents joint clusters in a space with sufficient dimension so that each joint cluster occupies an orthogonal subspace in the transposed Khatri-Rao matrix, which results in a sharp drop in the scree plot at the true number of joint clusters, enabling easy model selection. Our simulations show that when the number of joint clusters exceeds the sum of the numbers of clusters in each individual view, our method outperforms existing methods in both joint clustering accuracy and estimation of the number of joint clusters.

2603.04576 2026-03-06 stat.ME

Variable Selection for Linear Regression Imputation in Surveys

Ziming An, Mehdi Dagdoug, David Haziza

Details
Abstract

Survey sampling is concerned with the estimation of finite population parameters. In practice, survey data suffer from item nonresponse, which is commonly handled through imputation, i.e., replacing missing values with predicted values. As a result, the properties of the resulting imputed estimator depend critically on the properties of the prediction method used. In turn, prediction methods themselves depend on the choice of variables and tuning parameters used to fit the imputation model. In this article, we study the problem of variable selection for linear regression imputation. Although variable selection has been widely studied across many fields, primarily for identification or prediction, its role in imputation for survey data has received comparatively little attention. We introduce the notion of an optimal imputation model defined through an oracle loss function and show that, with probability tending to one, the optimal model coincides with the true model. We also examine the consequences of using misspecified models -- either omitting relevant covariates or including irrelevant ones -- on consistency and asymptotic variance. We then develop a complete methodological framework for constructing confidence intervals after model selection. The proposed confidence intervals are shown to be asymptotically valid and optimal among all candidate models. Simulation studies indicate that the proposed methodology performs well in finite samples.

2603.04551 2026-03-06 stat.AP cs.LG

Weather-Related Crash Risk Forecasting: A Deep Learning Approach for Heterogenous Spatiotemporal Data

Abimbola Ogungbire, Srinivas Pulugurtha

Comments 20 pages, 5 figures

Details
Abstract

This study introduces a deep learning-based framework for forecasting weather-related traffic crash risk using heterogeneous spatiotemporal data. Given the complex, non-linear relationship between crash occurrence and factors such as road characteristics and traffic conditions, we propose an ensemble of Convolutional Long Short-Term Memory (ConvLSTM) models trained over overlapping spatial grids. This approach captures both spatial dependencies and temporal dynamics while addressing spatial heterogeneity in crash patterns. North Carolina was selected as the study area due to its diverse weather conditions, with historical crash, weather, and traffic data aggregated at 5-mi by 5-mi grid resolution. The framework was evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and spatial cross-K analysis. Results show that the ensembled ConvLSTM significantly outperforms baseline models, including linear regression, ARIMA, and standard ConvLSTM, particularly in high-risk zones. The ensemble approach effectively combines the strengths of multiple ConvLSTM models, resulting in lower MSE and RMSE values across all regions, particularly when data from different crash risk zones are aggregated. Notably, the model performs exceptionally well in volatile high-risk areas (Cluster 1), achieving the lowest MSE and RMSE, while in stable low-risk areas (Cluster 2), it still improves upon simpler models but with slightly higher errors due to challenges in capturing subtle variations.

2603.04546 2026-03-06 cs.LG stat.ML

Oracle-efficient Hybrid Learning with Constrained Adversaries

Princewill Okoroafor, Robert Kleinberg, Michael P. Kim

Details
Abstract

The Hybrid Online Learning Problem, where features are drawn i.i.d. from an unknown distribution but labels are generated adversarially, is a well-motivated setting positioned between statistical and fully-adversarial online learning. Prior work has presented a dichotomy: algorithms that are statistically-optimal, but computationally intractable (Wu et al., 2023), and algorithms that are computationally-efficient (given an ERM oracle), but statistically-suboptimal (Wu et al., 2024). This paper takes a significant step towards achieving statistical optimality and computational efficiency simultaneously in the Hybrid Learning setting. To do so, we consider a structured setting, where the Adversary is constrained to pick labels from an expressive, but fixed, class of functions $R$. Our main result is a new learning algorithm, which runs efficiently given an ERM oracle and obtains regret scaling with the Rademacher complexity of a class derived from the Learner's hypothesis class $H$ and the Adversary's label class $R$. As a key corollary, we give an oracle-efficient algorithm for computing equilibria in stochastic zero-sum games when action sets may be high-dimensional but the payoff function exhibits a type of low-dimensional structure. Technically, we develop a number of tools for the design and analysis of our learning algorithm, including a novel Frank-Wolfe reduction with "truncated entropy regularizer" and a new tail bound for sums of "hybrid" martingale difference sequences.

2603.04544 2026-03-06 stat.ME

Proximal Learning for Trials With External Controls: A Case Study in HIV Prevention

Yilin Song, Yinxiang Wu, Raphael J. Landovitz, Susan Buchbinder, Srilatha Edupuganti, Lydia Soto-Torres, Kendrick Li, Xu Shi, Fei Gao, Deborah Donnell, Holly Janes, Ting Ye

Details
Abstract

With the advent of effective pre-exposure prophylaxis agents, active-controlled HIV prevention trials have become a common study design. Nevertheless, estimating absolute efficacy relative to a placebo remains important. In this paper, we introduce a novel application of proximal causal inference methods to estimate the counterfactual cumulative HIV incidence under placebo for participants in an active-controlled trial of cabotegravir, using external control data from a placebo-controlled trial with similar eligibility criteria. We leverage baseline sexually transmitted infection status and geographic region as negative control outcome and exposure variables, respectively. We address two key challenges: unmeasured differences in HIV risk between trials and statistical difficulties arising from low HIV incidence rates in both studies. To overcome these challenges, we develop two proximal inference approaches: (1) a semiparametric inverse probability of censoring weighting estimator, and (2) a two-stage regression-based strategy tailored to low-event-rate settings. Our theoretical and numerical investigations demonstrate these methods yield reliable estimates of the counterfactual one-year cumulative HIV incidence under placebo, and provide robust evidence of the superior efficacy of cabotegravir compared with placebo. These findings highlight the potential of proximal inference methods to estimate placebo-controlled effects in both single-arm and active-controlled trials by leveraging external controls.

2603.04541 2026-03-06 stat.OT

Engaging students with statistics through choice of real data context on homework

Catalina Medina, Mine Dogucu

Comments 25 pages, 3 figures, 2 tables. Submitted to The American Statistician. Supplementary materials and code available at https://github.com/CatalinaMedina/data-context-choice-manuscript

Details
Abstract

Statistics educators recommend teaching with real data with relevant contexts, but defining relevancy is challenging and varies by student. We investigated whether providing student choice of data context increases engagement through a quasi-experiment in two sections of an introductory probability and statistics course at a large public university (n=65 consenting students). Sections alternated as treatment and control: during their treatment, students chose weekly homework from three similar instructor-provided options varying by data context; during control weeks, they received randomly assigned contexts. We found no significant difference in homework grades between treatment and control conditions. However, thematic analysis revealed students with choice reported enhanced engagement and motivation, greater appreciation for statistics' real-world value, and increased autonomy. Students overwhelmingly preferred contexts relevant to their interests, experiences, daily lives, and career paths-though preferences varied considerably across individuals. Based on these findings, we provide four recommendations for statistics educators: (1) use real data with authentic contexts, (2) select contexts students care about, (3) incorporate variety across data contexts, and (4) consider choice as a pedagogical tool.

2603.04479 2026-03-06 stat.ML cs.LG math.PR math.ST stat.AP stat.TH

Bayesian Modeling of Collatz Stopping Times: A Probabilistic Machine Learning Perspective

Nicolò Bonacorsi, Matteo Bordoni

Details
Abstract

We study the Collatz total stopping time $τ(n)$ over $n\le 10^7$ from a probabilistic machine learning viewpoint. Empirically, $τ(n)$ is a skewed and heavily overdispersed count with pronounced arithmetic heterogeneity. We develop two complementary models. First, a Bayesian hierarchical Negative Binomial regression (NB2-GLM) predicts $τ(n)$ from simple covariates ($\log n$ and residue class $n \bmod 8$), quantifying uncertainty via posterior and posterior predictive distributions. Second, we propose a mechanistic generative approximation based on the odd-block decomposition: for odd $m$, write $3m+1=2^{K(m)}m'$ with $m'$ odd and $K(m)=v_2(3m+1)\ge 1$; randomizing these block lengths yields a stochastic approximation calibrated via a Dirichlet-multinomial update. On held-out data, the NB2-GLM achieves substantially higher predictive likelihood than the odd-block generators. Conditioning the block-length distribution on $m\bmod 8$ markedly improves the generator's distributional fit, indicating that low-order modular structure is a key driver of heterogeneity in $τ(n)$.
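
To make the two objects in this abstract concrete, the following Python sketch computes a total stopping time and the odd-block decomposition $3m+1=2^{K(m)}m'$. It assumes the common convention that $τ(n)$ counts every step (halvings and $3n+1$ moves) until the orbit first reaches 1; the paper may use a different convention. The function names are illustrative and are not taken from the authors' code.

```python
from typing import Tuple


def total_stopping_time(n: int) -> int:
    """Number of Collatz steps (n -> n/2 if even, 3n+1 if odd) until n reaches 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def odd_block(m: int) -> Tuple[int, int]:
    """For odd m, return (K, m_next) with 3m + 1 = 2**K * m_next and m_next odd."""
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    x = 3 * m + 1
    k = 0
    while x % 2 == 0:  # K(m) is the 2-adic valuation of 3m + 1
        x //= 2
        k += 1
    return k, x


def stopping_time_via_blocks(m: int) -> int:
    """Total stopping time of odd m as a sum of block lengths 1 + K over odd blocks."""
    total = 0
    while m != 1:
        k, m = odd_block(m)
        total += 1 + k  # one 3m+1 step followed by K halvings
    return total
```

Under this convention each odd block contributes $1+K(m)$ steps, so for odd starting values the sum over blocks agrees with the directly computed stopping time. A stochastic generator of the kind the abstract describes would replace the deterministic $K(m)$ with random block lengths.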

2603.04473 2026-03-06 stat.ML cs.IT cs.LG math.IT

Dictionary Based Pattern Entropy for Causal Direction Discovery

Harikrishnan N B, Shubham Bhilare, Aditi Kathpalia, Nithin Nagaraj

Comments 13 pages

Details
Abstract

Discovering causal direction from temporal observational data is particularly challenging for symbolic sequences, where functional models and noise assumptions are often unavailable. We propose a novel \emph{Dictionary Based Pattern Entropy ($DPE$)} framework that infers both the direction of causation and the specific subpatterns driving changes in the effect variable. The framework integrates \emph{Algorithmic Information Theory} (AIT) and \emph{Shannon Information Theory}. Causation is interpreted as the emergence of compact, rule based patterns in the candidate cause that systematically constrain the effect. $DPE$ constructs direction-specific dictionaries and quantifies their influence using entropy-based measures, enabling a principled link between deterministic pattern structure and stochastic variability. Causal direction is inferred via a minimum-uncertainty criterion, selecting the direction exhibiting stronger and more consistent pattern-driven organization. As summarized in Table 7, $DPE$ consistently achieves reliable performance across diverse synthetic systems, including delayed bit-flip perturbations, AR(1) coupling, 1D skew-tent maps, and sparse processes, outperforming or matching competing AIT-based methods ($ETC_E$, $ETC_P$, $LZ_P$). In biological and ecological datasets, performance is competitive, while alternative methods show advantages in specific genomic settings. Overall, the results demonstrate that minimizing pattern level uncertainty yields a robust, interpretable, and broadly applicable framework for causal discovery.

2602.15581 2026-03-06 stat.OT

Confidence as Forecast: A Decision-Theoretic Interpretation of Confidence Intervals

Scott Lee

Details
English abstract

What, if anything, should a frequentist say about a single realized confidence interval (CI) and its chance of having covered the parameter? Jerzy Neyman's original answer was to refuse any nondegenerate probability for coverage ex post and, instead, to "state that the interval covers". In this paper I argue that the usual frequentist machinery already supports a different reading. I treat the coverage event as a Bernoulli random variable, with the nominal level 1-alpha as its design-based success probability, and view "confidence" as a probability forecast for that Bernoulli outcome. Using strictly proper scoring rules, I show that 1-alpha is the unique optimal constant forecast for coverage, both before and after observing the data, and that it remains optimal post-trial in common unbounded, translation-invariant models with pivot-based CIs. When the design yields a theta-free statistic--such as the relative width of the interval in a finite-window uniform model--the conditional coverage given that statistic provides a nonconstant, design-based refinement of 1-alpha that strictly improves predictive performance. Two thought experiments, a Monty Hall-style shell game and the "lost submarine" example of Morey et al. (2016), illustrate how this perspective resolves familiar interpretational puzzles about CIs without appealing to priors or single-case subjective degrees of belief. I conclude with simple "what to do when you see an interval" guidance for applied work and some implications for teaching confidence intervals as tools for forecasting long-run coverage. Keywords: Confidence intervals, coverage probability, proper scoring rules, probabilistic forecasting, frequentist inference Disclaimer: The findings and conclusions in this report are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention
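The claim that the nominal level is the optimal constant forecast can be checked numerically for two standard strictly proper scoring rules. The sketch below is illustrative only: it assumes the coverage indicator is Bernoulli with success probability $q = 1-\alpha$, and the function names and the forecast grid are hypothetical choices, not taken from the paper.

```python
import math


def expected_brier(p: float, q: float) -> float:
    """Expected Brier score E[(p - B)^2] for B ~ Bernoulli(q)."""
    return q * (1 - p) ** 2 + (1 - q) * p ** 2


def expected_log_loss(p: float, q: float) -> float:
    """Expected logarithmic loss of forecast p for B ~ Bernoulli(q)."""
    return -(q * math.log(p) + (1 - q) * math.log(1 - p))


alpha = 0.05
q = 1 - alpha                              # design-based coverage probability
grid = [i / 1000 for i in range(1, 1000)]  # candidate constant forecasts

best_brier = min(grid, key=lambda p: expected_brier(p, q))
best_log = min(grid, key=lambda p: expected_log_loss(p, q))
print(best_brier, best_log)
```

Both expected losses are strictly convex in `p` with their minimum at `p = q`, so the grid search selects the grid point at the nominal level 0.95 under either rule.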

2601.23236 2026-03-06 cs.LG cs.AI math.OC stat.ML

YuriiFormer: A Suite of Nesterov-Accelerated Transformers

Aleksandr Zimin, Yury Polyanskiy, Philippe Rigollet

Details
English abstract

We propose a variational framework that interprets transformer layers as iterations of an optimization algorithm acting on token embeddings. In this view, self-attention implements a gradient step of an interaction energy, while MLP layers correspond to gradient updates of a potential energy. Standard GPT-style transformers emerge as vanilla gradient descent on the resulting composite objective, implemented via Lie--Trotter splitting between these two energy functionals. This perspective enables principled architectural design using classical optimization ideas. As a proof of concept, we introduce a Nesterov-style accelerated transformer that preserves the same attention and MLP oracles. The resulting architecture consistently outperforms a nanoGPT baseline on TinyStories and OpenWebText, demonstrating that optimization-theoretic insights can translate into practical gains.

2512.12988 2026-03-06 stat.ME math.ST stat.CO stat.ML stat.TH

A Bayesian approach to learning mixtures of nonparametric components

Yilei Zhang, Yun Wei, Aritra Guha, XuanLong Nguyen

Comments 80 pages, 9 figures

Details
English abstract

Mixture models are widely used in modeling heterogeneous data populations. A standard approach to mixture modeling assumes that each mixture component takes a parametric kernel form. In many applications, making parametric assumptions on the latent subpopulation distributions may be unrealistic, which motivates the need for nonparametric modeling of the mixture components themselves. In this paper, we study finite mixtures with nonparametric mixture components, using a Bayesian nonparametric modeling approach. In particular, it is assumed that the data population is generated according to a finite mixture of latent component distributions, where each component is endowed with a Bayesian nonparametric prior such as the Dirichlet process mixture. We present conditions under which the individual mixture component's distribution can be identified, and establish posterior contraction behavior for the data population's density, as well as densities of the latent mixture components. We develop an efficient MCMC algorithm for posterior inference and demonstrate via simulation studies and real-world data illustrations that it is possible to efficiently learn complex forms of probability distribution for the latent subpopulations. In theory, the posterior contraction rate of the component densities is nearly polynomial, which is a significant improvement over the logarithmic convergence rates of estimating mixing measures via deconvolution.

2512.06945 2026-03-06 stat.ML cs.LG

Symmetric Aggregation of Conformity Scores for Efficient Uncertainty Sets

Nabil Alami, Jad Zakharia, Souhaib Ben Taieb

Details
English abstract

Access to multiple predictive models trained for the same task, whether in regression or classification, is increasingly common in many applications. Aggregating their predictive uncertainties to produce reliable and efficient uncertainty quantification is therefore a critical but still underexplored challenge, especially within the framework of conformal prediction (CP). While CP methods can generate individual prediction sets from each model, combining them into a single, more informative set remains a challenging problem. To address this, we propose SACP (Symmetric Aggregated Conformal Prediction), a novel method that aggregates nonconformity scores from multiple predictors. SACP transforms these scores into e-values and combines them using any symmetric aggregation function. This flexible design enables a robust, data-driven framework for selecting aggregation strategies that yield sharper prediction sets. We also provide theoretical insights that help justify the validity and performance of the SACP approach. Extensive experiments on diverse datasets show that SACP consistently improves efficiency and often outperforms state-of-the-art model aggregation baselines.

2510.07093 2026-03-06 cs.LG stat.ML

Non-Asymptotic Analysis of Efficiency in Conformalized Regression

Yunzhen Yao, Lie He, Michael Gastpar

Comments Published as a conference paper at ICLR 2026

Details
English abstract

Conformal prediction provides prediction sets with coverage guarantees. The informativeness of conformal prediction depends on its efficiency, typically quantified by the expected size of the prediction set. Prior work on the efficiency of conformalized regression commonly treats the miscoverage level $α$ as a fixed constant. In this work, we establish non-asymptotic bounds on the deviation of the prediction set length from the oracle interval length for conformalized quantile and median regression trained via SGD, under mild assumptions on the data distribution. Our bounds of order $\mathcal{O}(1/\sqrt{n} + 1/(α^2 n) + 1/\sqrt{m} + \exp(-α^2 m))$ capture the joint dependence of efficiency on the proper training set size $n$, the calibration set size $m$, and the miscoverage level $α$. The results identify phase transitions in convergence rates across different regimes of $α$, offering guidance for allocating data to control excess prediction set length. Empirical results are consistent with our theoretical findings.
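For readers unfamiliar with conformalized quantile regression, the calibration step whose efficiency is analyzed above can be sketched as follows. This is a minimal illustration under stated assumptions: the lower and upper quantile predictions are taken as given (the paper trains them via SGD, which is not reproduced here), and the function name and toy numbers are hypothetical.

```python
import math


def cqr_offset(lo, hi, y, alpha):
    """Conformal correction q for quantile-regression intervals [lo_i, hi_i].

    lo, hi: predicted lower/upper quantiles on the calibration set
    y:      observed calibration responses
    alpha:  target miscoverage level
    """
    m = len(y)
    # Score is positive when y falls outside [lo, hi], negative when inside.
    scores = sorted(max(l - yi, yi - h) for l, h, yi in zip(lo, hi, y))
    k = math.ceil((m + 1) * (1 - alpha))  # rank of the conformal quantile
    return math.inf if k > m else scores[k - 1]


# Toy calibration set of size m = 9 with constant predicted band [0, 1].
y_cal = [0.5, 1.2, -0.1, 0.9, 1.5, 0.3, 0.7, 1.1, 0.2]
lo_cal = [0.0] * len(y_cal)
hi_cal = [1.0] * len(y_cal)

q = cqr_offset(lo_cal, hi_cal, y_cal, alpha=0.2)
print(q)  # approximately 0.2
```

A new point with predicted band `[lo, hi]` then receives the prediction set `[lo - q, hi + q]`, whose length `hi - lo + 2 * q` is the quantity the bounds compare against the oracle interval length. When `alpha` is small relative to `1 / m`, the rank `k` exceeds `m` and the offset becomes infinite, which gives a simple view of why the calibration set size $m$ and the miscoverage level $α$ interact.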

2508.16523 2026-03-06 stat.ME stat.AP

Identifying Treatment Effect Heterogeneity with Bayesian Hierarchical Adjustable Random Partition in Adaptive Enrichment Trials

Xianglin Zhao, Shirin Golchi, Jean-Philippe Gouin, Kaberi Dasgupta

Details
English abstract

Treatment effect heterogeneity refers to the systematic variation in treatment effects across subgroups. There is an increasing need for clinical trials that aim to investigate treatment effect heterogeneity and estimate subgroup-specific responses. While several statistical methods have been proposed to address this problem, existing partitioning-based methods often depend on auxiliary analysis, overlook model uncertainty, or impose inflexible borrowing strength. We propose the Bayesian Hierarchical Adjustable Random Partition (BHARP) model, a self-contained framework that applies a finite mixture model with an unknown number of components to explore the partition space accounting for model uncertainty. The BHARP model jointly estimates subgroup-specific effects and the heterogeneity patterns, and adjusts the borrowing strengths based on within-cluster cohesion without requiring manual calibration. Posterior sampling is performed via a custom reversible-jump Markov chain Monte Carlo sampler tailored to partitioning-based information borrowing in clinical trials. Simulation studies across a range of treatment effect heterogeneity patterns show that the BHARP model achieves better accuracy and precision compared to conventional and advanced methods. We showcase the utilities of the BHARP model in the context of a multi-arm adaptive enrichment trial investigating physical activity interventions in patients with type 2 diabetes.

2508.11847 2026-03-06 stat.ML cs.LG

Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings

Jenny Y. Huang, Yunyi Shen, Dennis Wei, Tamara Broderick

Details
English abstract

We propose a method for evaluating the robustness of widely used LLM ranking systems -- variants of a Bradley--Terry model -- to dropping a very small, worst-case fraction of preference data. Our approach is computationally fast and easy to adopt. When we apply our method to matchups from popular LLM ranking platforms, including Chatbot Arena and derivatives, we find that the rankings of top-performing models can be remarkably sensitive to the removal of a small fraction of preferences; for instance, dropping just 0.003% of human preferences can change the top-ranked model on Chatbot Arena. Our robustness check identifies the specific preferences most responsible for such ranking flips, allowing for inspection of these influential preferences. We observe that the rankings derived from MT-bench preferences are notably more robust than those from Chatbot Arena, likely due to MT-bench's use of expert annotators and carefully constructed prompts. Finally, we find that neither rankings based on crowdsourced human evaluations nor those based on LLM-as-a-judge preferences are systematically more sensitive than the other.
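The sensitivity described above can be reproduced on a toy scale by refitting a Bradley-Terry model after removing a few matchups. The sketch below is not the paper's method (which finds worst-case drops quickly without exhaustive refitting); it uses the classical minorization-maximization updates for Bradley-Terry strengths, and the three-model dataset and all names are made up for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(matches, iters=500):
    """Bradley-Terry strengths from (winner, loser) pairs via MM updates."""
    players = sorted({p for m in matches for p in m})
    wins = defaultdict(int)
    games = defaultdict(int)  # games[(i, j)]: number of matches between i and j
    for w, l in matches:
        wins[w] += 1
        games[(w, l)] += 1
        games[(l, w)] += 1
    p = {i: 1.0 for i in players}
    for _ in range(iters):
        new = {}
        for i in players:
            denom = sum(games[(i, j)] / (p[i] + p[j]) for j in players if j != i)
            new[i] = wins[i] / denom
        total = sum(new.values())
        p = {i: v / total for i, v in new.items()}  # fix the scale
    return p


# Hypothetical preference data: A narrowly leads B head-to-head.
matches = (
    [("A", "B")] * 3 + [("B", "A")] * 2
    + [("A", "C")] * 2 + [("C", "A")]
    + [("B", "C")] * 2 + [("C", "B")]
)

full = fit_bradley_terry(matches)
reduced = fit_bradley_terry(matches[2:])  # drop two A-over-B preferences

top_full = max(full, key=full.get)
top_reduced = max(reduced, key=reduced.get)
print(top_full, top_reduced)  # A B
```

Here removing 2 of 11 preferences changes the top-ranked model. The fraction is far larger than the 0.003% reported for Chatbot Arena, but the mechanism is the same: when the leaders are close, a handful of head-to-head outcomes determines their order.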

2506.14020 2026-03-06 cs.LG cs.AI stat.ML

Bures-Wasserstein Flow Matching for Graph Generation

Keyue Jiang, Jiahao Cui, Xiaowen Dong, Laura Toni

Details
English abstract

Graph generation has emerged as a critical task in fields ranging from drug discovery to circuit design. Contemporary approaches, notably diffusion and flow-based models, have achieved solid graph generative performance through constructing a probability path that interpolates between reference and data distributions. However, these methods typically model the evolution of individual nodes and edges independently and use linear interpolations in the disjoint space of nodes/edges to build the path. This disentangled interpolation breaks the interconnected patterns of graphs, making the constructed probability path irregular and non-smooth, which causes poor training dynamics and faulty sampling convergence. To address the limitation, this paper first presents a theoretically grounded framework for probability path construction in graph generative models. Specifically, we model the joint evolution of the nodes and edges by representing graphs as connected systems parameterized by Markov random fields (MRF). We then leverage the optimal transport displacement between MRF objects to design a smooth probability path that ensures the co-evolution of graph components. Based on this, we introduce BWFlow, a flow-matching framework for graph generation that utilizes the derived optimal probability path to benefit the training and sampling algorithm design. Experimental evaluations in plain graph generation and molecule generation validate the effectiveness of BWFlow with competitive performance, better training convergence, and efficient sampling.

2505.04007 2026-03-06 stat.ML cs.LG

Variational Formulation of Particle Flow

Yinzhuang Yi, Jorge Cortés, Nikolay Atanasov

Details
English abstract

This paper provides a formulation of the log-homotopy particle flow from the perspective of variational inference. We show that the transient density used to derive the particle flow follows a time-scaled trajectory of the Fisher-Rao gradient flow in the space of probability densities. The Fisher-Rao gradient flow is obtained as a continuous-time algorithm for variational inference, minimizing the Kullback-Leibler divergence between a variational density and the true posterior density. When considering a parametric family of variational densities, the function space Fisher-Rao gradient flow simplifies to the natural gradient flow of the variational density parameters. By adopting a Gaussian variational density, we derive a Gaussian approximated Fisher-Rao particle flow and show that, under linear Gaussian assumptions, it reduces to the Exact Daum and Huang particle flow. Additionally, we introduce a Gaussian mixture approximated Fisher-Rao particle flow to enhance the expressive power of our model through a multi-modal variational density. Simulations on low- and high-dimensional estimation problems illustrate our results.

2502.15116 2026-03-06 math.PR math.ST stat.TH

Uniform mean estimation via generic chaining

Daniel Bartl, Shahar Mendelson

Details
Journal ref
Advances in Mathematics, 2026+
English abstract

We introduce an empirical functional $Ψ$ that is an optimal uniform mean estimator: let $F\subset L_2(μ)$ be a class of mean-zero functions, let $u$ be a real-valued function, and let $X_1,\dots,X_N$ be independent and distributed according to $μ$. We show that under minimal assumptions, with exponentially high $μ^{\otimes N}$-probability, \[ \sup_{f\in F} |Ψ(X_1,\dots,X_N,f) - \mathbb{E} u(f(X))| \leq c R(F) \frac{ \mathbb{E} \sup_{f\in F } |G_f| }{\sqrt N}, \] where $(G_f)_{f\in F}$ is the Gaussian process indexed by $F$ and $R(F)$ is an appropriate notion of 'diameter' of the class $\{u(f(X)) : f\in F\}$. The fact that such a bound is possible is surprising, and it leads to the solution of various key problems in high-dimensional probability and high-dimensional statistics. The construction is based on combining Talagrand's generic chaining mechanism with optimal mean estimation procedures for a single real-valued random variable.

2502.11682 2026-03-06 cs.LG math.OC stat.ML

Double Momentum and Error Feedback for Clipping with Fast Rates and Differential Privacy

Rustem Islamov, Samuel Horvath, Aurelien Lucchi, Peter Richtarik, Eduard Gorbunov

Details
English abstract

Strong Differential Privacy (DP) and Optimization guarantees are two desirable properties for a method in Federated Learning (FL). However, existing algorithms do not achieve both properties at once: they either have optimal DP guarantees but rely on restrictive assumptions such as bounded gradients/bounded data heterogeneity, or they ensure strong optimization performance but lack DP guarantees. To address this gap in the literature, we propose and analyze a new method called Clip21-SGD2M based on a novel combination of clipping, heavy-ball momentum, and Error Feedback. In particular, for non-convex smooth distributed problems with clients having arbitrarily heterogeneous data, we prove that Clip21-SGD2M has optimal convergence rate and also near optimal (local-)DP neighborhood. Our numerical experiments on non-convex logistic regression and training of neural networks highlight the superiority of Clip21-SGD2M over baselines in terms of the optimization performance for a given DP-budget.

2502.07584 2026-03-06 stat.ML cs.LG

Generalization Bounds for Markov Algorithms through Entropy Flow Computations

Benjamin Dupuis, Maxime Haddouche, George Deligiannidis, Umut Simsekli

Details
English abstract

Many learning algorithms can be represented as Markov processes, and understanding their generalization error is a central topic in learning theory. For specific continuous-time noisy algorithms, a prominent analysis technique relies on information-theoretic tools and the so-called "entropy flow" method. This technique is compatible with a broad range of assumptions and leverages the convergence properties of learning dynamics to produce meaningful generalization bounds, which can also be informative or extend to discrete-time settings. Despite their success, existing entropy flow formulations are limited to specific noise and algorithm structures (e.g., Langevin dynamics). In this work, we exploit new technical tools to extend its applicability to all learning algorithms whose iterative dynamics is governed by a time-homogeneous Markov process. Our approach builds on a principled continuous-time approximation of Markov algorithms and introduces a new, exact entropy flow formula for such processes. Within this unified framework, we establish novel connections to a well-studied family of modified logarithmic Sobolev inequalities, which we use to connect the generalization error to the ergodic properties of Markov processes. Finally, we provide a detailed analysis of all the terms appearing in our theory and demonstrate its effectiveness by deriving new generalization bounds for several concrete algorithms.

2502.05360 2026-03-06 cs.LG math.OC stat.ML

Curse of Dimensionality in Neural Network Optimization

Sanghoon Na, Haizhao Yang

Comments Accepted for publication in Information and Inference: A Journal of the IMA. 32 pages, 1 figure

Details
English abstract

This paper demonstrates that when a shallow neural network with a Lipschitz continuous activation function is trained using either empirical or population risk to approximate a target function that is $r$ times continuously differentiable on $[0,1]^d$, the population risk may not decay at a rate faster than $t^{-\frac{4r}{d-2r}}$, where $t$ denotes the time parameter of the gradient flow dynamics. This result highlights the presence of the curse of dimensionality in the optimization computation required to achieve a desired accuracy. Instead of analyzing parameter evolution directly, the training dynamics are examined through the evolution of the parameter distribution under the 2-Wasserstein gradient flow. Furthermore, it is established that the curse of dimensionality persists when a locally Lipschitz continuous activation function is employed, where the Lipschitz constant in $[-x,x]$ is bounded by $O(x^δ)$ for any $x \in \mathbb{R}$. In this scenario, the population risk is shown to decay at a rate no faster than $t^{-\frac{(4+2δ)r}{d-2r}}$. Understanding how function smoothness influences the curse of dimensionality in neural network optimization theory is an important and underexplored direction that this work aims to address.

2412.03832 2026-03-06 math.ST stat.TH

Information theoretic limits of robust sub-Gaussian mean estimation under star-shaped constraints

Akshay Prasadan, Matey Neykov

Details
Journal ref
Ann. Statist. 54(1): 490-515 (February 2026)
English abstract

We obtain the minimax rate for a mean location model with a bounded star-shaped set $K \subseteq \mathbb{R}^n$ constraint on the mean, in an adversarially corrupted data setting with Gaussian noise. We assume an unknown fraction $ε\le 1/2-κ$ for some fixed $κ\in(0,1/2]$ of $N$ observations are arbitrarily corrupted. We obtain a minimax risk up to proportionality constants under the squared $\ell_2$ loss of $\max(η^{*2},σ^2ε^2)\wedge d^2$ with \begin{align*} η^* = \sup \bigg\{η\ge 0 : \frac{Nη^2}{σ^2} \leq \log \mathcal{M}_K^{\operatorname{loc}}(η,c)\bigg\}, \end{align*} where $\log \mathcal{M}_K^{\operatorname{loc}}(η,c)$ denotes the local entropy of the set $K$, $d$ is the diameter of $K$, $σ^2$ is the variance, and $c$ is some sufficiently large absolute constant. A variant of our algorithm achieves the same rate for settings with known or symmetric sub-Gaussian noise, with a smaller breakdown point, still of constant order. We further study the case of unknown sub-Gaussian noise and show that the rate is slightly slower: $\max(η^{*2},σ^2ε^2\log(1/ε))\wedge d^2$. We generalize our results to the case when $K$ is star-shaped but unbounded.

2411.13199 2026-03-06 math.ST stat.TH

Sharp Bounds for Multiple Models in Matrix Completion

Dali Liu, Haolei Weng

Comments 37 pages. Accepted by the Electronic Journal of Statistics. All comments are warmly welcomed

Details
English abstract

In this paper, we demonstrate how a class of advanced matrix concentration inequalities, introduced by Brailovskaya and van Handel (2024), can be used to eliminate the dimensional factor in the convergence rate of matrix completion. This dimensional factor represents a significant gap between the upper bound and the minimax lower bound, especially in high dimension. Through a more precise spectral norm analysis, we remove the dimensional factors for three popular matrix completion estimators, thereby establishing their minimax rate optimality.

2411.09847 2026-03-06 cs.LG stat.ML

Towards a Fairer Non-negative Matrix Factorization

Lara Kassab, Erin George, Deanna Needell, Haowen Geng, Nika Jafar Nia, Aoxi Li

Details
English abstract

There has been a recent critical need to study fairness and bias in machine learning (ML) algorithms. Since there is clearly no one-size-fits-all solution to fairness, ML methods should be developed alongside bias mitigation strategies that are practical and approachable to the practitioner. Motivated by recent work on "fair" PCA, here we consider the more challenging method of non-negative matrix factorization (NMF) as both a showcasing example and a method that is important in its own right for both topic modeling tasks and feature extraction for other ML tasks. We demonstrate that a modification of the objective function, by using a min-max formulation, may sometimes be able to offer an improvement in fairness for groups in the population. We derive two methods for the objective minimization, a multiplicative update rule as well as an alternating minimization scheme, and discuss implementation practicalities. We include a suite of synthetic and real experiments that show how the method may improve fairness, while also highlighting that it may sometimes increase error for some individuals. Fairness is not a rigid definition, and the choice of method should strongly depend on the application at hand.

2406.05911 2026-03-06 math.ST stat.TH

Some facts about the optimality of the LSE in the Gaussian sequence model with convex constraint

Akshay Prasadan, Matey Neykov

Details
Journal ref
IEEE Trans. Inf. Theory 71(11): 8928-8958 (November 2025)
English abstract

We consider a convex constrained Gaussian sequence model and characterize necessary and sufficient conditions for the least squares estimator (LSE) to be minimax optimal. For a closed convex set $K\subset \mathbb{R}^n$ we observe $Y=μ+ξ$ for $ξ\sim \mathcal{N}(0,σ^2\mathbb{I}_n)$ and $μ\in K$ and aim to estimate $μ$. We characterize the worst case risk of the LSE in multiple ways by analyzing the behavior of the local Gaussian width on $K$. We demonstrate that optimality is equivalent to a Lipschitz property of the local Gaussian width mapping. We also provide theoretical algorithms that search for the worst case risk. We then provide examples showing optimality or suboptimality of the LSE on various sets, including $\ell_p$ balls for $p\in[1,2]$, pyramids, solids of revolution, and multivariate isotonic regression, among others.

2402.03352 2026-03-06 math.OC cs.LG stat.ML

Zeroth-Order primal-dual Alternating Projection Gradient Algorithms for Nonconvex Minimax Problems with Coupled linear Constraints

Huiling Zhang, Zi Xu, Yuhong Dai

Comments arXiv admin note: text overlap with arXiv:2212.04672

Details
English abstract

In this paper, we study zeroth-order algorithms for nonconvex minimax problems with coupled linear constraints under the deterministic and stochastic settings. Such problems have attracted wide attention in machine learning, signal processing, and many other fields in recent years, e.g., adversarial attacks in resource allocation problems and network flow problems. We propose two single-loop algorithms, namely the zeroth-order primal-dual alternating projected gradient (ZO-PDAPG) algorithm and the zeroth-order regularized momentum primal-dual projected gradient (ZO-RMPDPG) algorithm, for solving deterministic and stochastic nonconvex-(strongly) concave minimax problems with coupled linear constraints. The iteration complexities of the two proposed algorithms to obtain an $\varepsilon$-stationary point are proved to be $\mathcal{O}(\varepsilon ^{-2})$ (resp. $\mathcal{O}(\varepsilon ^{-4})$) for solving nonconvex-strongly concave (resp. nonconvex-concave) minimax problems with coupled linear constraints under deterministic settings and $\tilde{\mathcal{O}}(\varepsilon ^{-3})$ (resp. $\tilde{\mathcal{O}}(\varepsilon ^{-6.5})$) under stochastic settings, respectively. To the best of our knowledge, they are the first two zeroth-order algorithms with iteration complexity guarantees for solving nonconvex-(strongly) concave minimax problems with coupled linear constraints under the deterministic and stochastic settings. The proposed ZO-RMPDPG algorithm, when specialized to stochastic nonconvex-concave minimax problems without coupled constraints, outperforms all existing zeroth-order algorithms by achieving a better iteration complexity, thus setting a new state-of-the-art.

2309.00756 2026-03-06 stat.AP math.OC

Learning Risk Preferences in Markov Decision Processes: an Application to the Fourth Down Decision in the National Football League

Nathan Sandholtz, Lucas Wu, Martin Puterman, Timothy C. Y. Chan

Comments 22 pages, 12 figures

Details
Journal ref
The Annals of Applied Statistics 2024, Vol. 18, No. 4, 3205-3228
English abstract

For decades, National Football League (NFL) coaches' observed fourth down decisions have been largely inconsistent with prescriptions based on statistical models. In this paper, we develop a framework to explain this discrepancy using an inverse optimization approach. We model the fourth down decision and the subsequent sequence of plays in a game as a Markov decision process (MDP), the dynamics of which we estimate from NFL play-by-play data from the 2014 through 2022 seasons. We assume that coaches' observed decisions are optimal but that the risk preferences governing their decisions are unknown. This yields an inverse decision problem for which the optimality criterion, or risk measure, of the MDP is the estimand. Using the quantile function to parameterize risk, we estimate which quantile-optimal policy yields the coaches' observed decisions as minimally suboptimal. In general, we find that coaches' fourth-down behavior is consistent with optimizing low quantiles of the next-state value distribution, which corresponds to conservative risk preferences. We also find that coaches exhibit higher risk tolerances when making decisions in the opponent's half of the field as opposed to their own half, and that league average fourth down risk tolerances have increased over time.

2307.10960 2026-03-06 math.ST math.PR stat.TH

Change point estimation for a stochastic heat equation

Markus Reiß, Claudia Strauch, Lukas Trottner

Details
English abstract

We study a change point model based on a stochastic partial differential equation (SPDE) corresponding to the heat equation governed by the weighted Laplacian $Δ_\vartheta = \nabla\vartheta\nabla$, where $\vartheta=\vartheta(x)$ is a space-dependent diffusivity. As a basic problem the domain $(0,1)$ is considered with a piecewise constant diffusivity with a jump at an unknown point $τ$. Based on local measurements of the solution in space with resolution $δ$ over a finite time horizon, we construct a simultaneous M-estimator for the diffusivity values and the change point. The change point estimator converges at rate $δ$, while the diffusivity constants can be recovered with convergence rate $δ^{3/2}$. Moreover, when the diffusivity parameters are known and the jump height vanishes with the spatial resolution tending to zero, we derive a limit theorem for the change point estimator and identify the limiting distribution. For the mathematical analysis, a precise understanding of the SPDE with discontinuous $\vartheta$, tight concentration bounds for quadratic functionals in the solution, and a generalisation of classical M-estimators are developed.

2209.11691 2026-03-06 econ.EM cs.LG stat.ME

Linear Multidimensional Regression with Interactive Fixed-Effects

Hugo Freeman

Details
English abstract

This paper studies a linear model for multidimensional panel data of three or more dimensions with unobserved interactive fixed-effects. The main estimator uses a Neyman-orthogonal approach, and requires two preliminary steps. First, the model is embedded within a two-dimensional panel framework where factor model methods in Bai (2009) lead to consistent, but slowly converging, estimates. The second step develops a weighted-within transformation that is robust to multidimensional interactive fixed-effects and achieves the parametric rate of consistency. The estimator is shown to be asymptotically normal. The methods are implemented to estimate the demand elasticity for beer.

2012.07167 2026-03-06 math.ST stat.TH

Pseudo-likelihood-based $M$-estimation of random graphs with dependent edges and parameter vectors of increasing dimension

Jonathan R. Stewart, Michael Schweinberger

Details
English abstract

An important question in statistical network analysis is how to estimate models of discrete and dependent network data with intractable likelihood functions, without sacrificing computational scalability and statistical guarantees. We demonstrate that scalable estimation of random graph models with dependent edges is possible, by establishing convergence rates of pseudo-likelihood-based $M$-estimators for discrete undirected graphical models with exponential parameterizations and parameter vectors of increasing dimension in single-observation scenarios. We highlight the impact of two complex phenomena on the convergence rate: phase transitions and model near-degeneracy. The main results have possible applications to discrete and dependent network, spatial, and temporal data. To showcase convergence rates, we introduce a novel class of generalized $β$-models with dependent edges and parameter vectors of increasing dimension, which leverage additional structure in the form of overlapping subpopulations to control dependence. We establish convergence rates of pseudo-likelihood-based $M$-estimators for generalized $β$-models in dense- and sparse-graph settings.

2603.04420 2026-03-06 cs.LG math.DS q-bio.NC stat.ML

Machine Learning for Complex Systems Dynamics: Detecting Bifurcations in Dynamical Systems with Deep Neural Networks

Swadesh Pal, Roderick Melnik

Comments 15 pages; 5 figures

Details
English abstract

Critical transitions are the abrupt shifts between qualitatively different states of a system, and they are crucial to understanding tipping points in complex dynamical systems across ecology, climate science, and biology. Detecting these shifts typically involves extensive forward simulations or bifurcation analyses, which are often computationally intensive and limited by parameter sampling. In this study, we propose a novel machine learning approach based on deep neural networks (DNNs) called equilibrium-informed neural networks (EINNs) to identify critical thresholds associated with catastrophic regime shifts. Rather than fixing parameters and searching for solutions, the EINN method reverses this process by using candidate equilibrium states as inputs and training a DNN to infer the corresponding system parameters that satisfy the equilibrium condition. By analyzing the learned parameter landscape and observing abrupt changes in the feasibility or continuity of equilibrium mappings, critical thresholds can be effectively detected. We demonstrate this capability on nonlinear systems exhibiting saddle-node bifurcations and multi-stability, showing that EINNs can recover the parameter regions associated with impending transitions. This method provides a flexible alternative to traditional techniques, offering new insights into the early detection and structure of critical shifts in high-dimensional and nonlinear systems.
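The inversion at the heart of this idea, mapping a candidate equilibrium to the parameter that makes it an equilibrium, can be illustrated without any neural network on a system where the map is available in closed form. The sketch below is an assumption-laden toy, not the EINN method itself: it uses the bistable model $dx/dt = r + x - x^3$ (chosen here for illustration), replaces the trained DNN with the exact formula $r(x^*) = x^{*3} - x^*$, and detects thresholds as turning points of that map. All names are hypothetical.

```python
import math


def parameter_from_equilibrium(x: float) -> float:
    """Value of r for which x is an equilibrium of dx/dt = r + x - x**3."""
    return x**3 - x


def fold_points(x_min=-2.0, x_max=2.0, n=4001):
    """Scan candidate equilibria and return (x, r) where r(x) turns around."""
    xs = [x_min + (x_max - x_min) * i / (n - 1) for i in range(n)]
    rs = [parameter_from_equilibrium(x) for x in xs]
    folds = []
    for i in range(1, n - 1):
        left = rs[i] - rs[i - 1]
        right = rs[i + 1] - rs[i]
        if left * right < 0:  # slope changes sign: local extremum of r(x)
            folds.append((xs[i], rs[i]))
    return folds


folds = fold_points()
for x_star, r_crit in folds:
    print(f"x* = {x_star:+.3f}, critical r = {r_crit:+.4f}")
```

For this toy system the scan recovers the two saddle-node thresholds at $r = \pm 2/(3\sqrt{3}) \approx \pm 0.385$, where branches of equilibria appear or vanish. In the approach described in the abstract, a trained network plays the role of `parameter_from_equilibrium` when no closed form is available.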

2603.04418 2026-03-06 cs.LG cs.AI stat.ML

Decorrelating the Future: Joint Frequency Domain Learning for Spatio-temporal Forecasting

Zepu Wang, Bowen Liao, Jeff Ban

Details
English abstract

Standard direct forecasting models typically rely on point-wise objectives such as Mean Squared Error, which fail to capture the complex spatio-temporal dependencies inherent in graph-structured signals. While recent frequency-domain approaches such as FreDF mitigate temporal autocorrelation, they often overlook spatial and cross spatio-temporal interactions. To address this limitation, we propose FreST Loss, a frequency-enhanced spatio-temporal training objective that extends supervision to the joint spatio-temporal spectrum. By leveraging the Joint Fourier Transform (JFT), FreST Loss aligns model predictions with ground truth in a unified spectral domain, effectively decorrelating complex dependencies across both space and time. Theoretical analysis shows that this formulation reduces estimation bias associated with time-domain training objectives. Extensive experiments on six real-world datasets demonstrate that FreST Loss is model-agnostic and consistently improves state-of-the-art baselines by better capturing holistic spatio-temporal dynamics.