arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2604.14108 2026-04-16 cs.LG math.DS math.OC stat.ML

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

Arseniy Andreyev, Advikar Ananthkumar, Marc Walden, Tomaso Poggio, Pierfrancesco Beneventano

Comments 40 pages, 38 figures

Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability. We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold. Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes: at small batch sizes it converges to a lower plateau $2(1-β)/η$, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau $2(1+β)/η$, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics. We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling.
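
To make the two plateaus concrete, the following minimal sketch evaluates the expressions quoted in the abstract. The function name and the sample values of the learning rate and momentum are illustrative assumptions, not taken from the paper.

```python
def batch_sharpness_plateaus(eta: float, beta: float) -> tuple[float, float]:
    """Return (small-batch plateau, large-batch plateau) from the abstract.

    eta  : learning rate (assumed > 0)
    beta : momentum coefficient (assumed in [0, 1))
    """
    if eta <= 0 or not 0 <= beta < 1:
        raise ValueError("expected eta > 0 and 0 <= beta < 1")
    lower = 2 * (1 - beta) / eta  # small-batch regime: 2(1 - beta) / eta
    upper = 2 * (1 + beta) / eta  # large-batch regime: 2(1 + beta) / eta
    return lower, upper


# Hypothetical hyperparameters, chosen only for illustration.
lower, upper = batch_sharpness_plateaus(eta=0.01, beta=0.9)
print(f"small-batch plateau ~ {lower:.1f}, large-batch plateau ~ {upper:.1f}")
```

With `beta = 0` both expressions reduce to `2 / eta`, so the gap between the plateaus is entirely a momentum effect in this formula.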

2604.14086 2026-04-16 stat.OT

The Epidemiology of Artificial Intelligence

Harsh Parikh, Tyler McCormick, Emily Johnson, Leo Hickey, Megan Ranney, Bhramar Mukherjee

Comments Perspective/Viewpoint of causal role of AI

Artificial intelligence (AI) systems increasingly shape how people access health information, make medical decisions, and receive care -- yet epidemiology lacks frameworks for measuring AI exposure or studying its health effects at the population level. Here we argue that AI now functions as a determinant of health and propose a conceptual framework, borrowed from environmental epidemiology, for studying it. We distinguish ambient AI exposure -- algorithmic curation and AI-mediated institutional decisions that affect populations regardless of individual choice -- from personal AI exposure -- direct, volitional use of AI tools. We characterize AI's possible causal roles in epidemiological models, show that existing experimental approaches are inadequate for capturing chronic, population-level effects, and illustrate these ideas with nationally representative US survey data. We discuss implications for study design, health equity, and AI governance.

2604.14075 2026-04-16 math.OC cs.LG stat.ML

Multistage Conditional Compositional Optimization

Buse Şen, Yifan Hu, Daniel Kuhn

We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.

2604.14071 2026-04-16 math.ST math.DS stat.TH

Finite-Step Bounds for Iterated Correlation Matrices

Ishrak AlhajjHassan

We establish finite-step probabilistic upper bounds on the contraction ratios $ρ_k = Δ_{k+1}/Δ_k$ for iterated Pearson correlation dynamics. Let $(P_k)_{k\ge 0}$ be the sequence generated by the Pearson update. Define $Δ_k := \|P_{k+1}-P_k\|_F$, $ρ_k := Δ_{k+1}/Δ_k$ for $Δ_k > 0$, and $δ_k := Δ_k/n$. Although $Δ_k \to 0$ along convergent trajectories, the ratios $ρ_k$ may exceed unity in finitely many steps. This behavior is invisible to local linearization. Our main contribution is a probabilistic bounding framework that captures these finite-step expansions. We initialize $P_0$ with i.i.d. $\mathcal{U}[-1,1]$ entries and let $\mathbb{P}$ be the induced measure. For $k \ge 2$, we construct state-dependent bounds $B_p : \mathbb{R}_+ \to \mathbb{R}_+$ satisfying $\mathbb{P}(ρ_k \le B_p(δ_k)) \ge p$. The functions $B^{\mathrm{q}}_p(δ)$ are empirical conditional $p$-quantiles of $\log ρ_k$ given $δ_k$ under logarithmic binning. Larger families $B^{\mathrm{TC}}_{p,τ}(δ)$ and $B^{\mathrm{tol}}_{p,τ}(δ)$ are obtained via multiplicative adjustments, yielding pointwise larger bounds that preserve the $δ$-dependence. Validation on held-out trajectories confirms the bounds hold with empirical coverage matching nominal levels for all $n \in [3,2000]$. The baseline $0.95$-quantile bound $B^{\mathrm{q}}_{0.95}(δ)$ yields two concrete results: $\mathbb{P}(ρ\le 1 \mid δ\le 0.03) \ge 0.95$ uniformly in $n$, and $\mathbb{P}(ρ\le 1.7) \ge 0.95$ for 21 of 22 dimensions. The exception $n = 69$ attains $2.35$, revealing a rare extreme upper tail discontinuity not captured by asymptotic analysis. These are the first finite-step probabilistic bounds for Pearson correlation dynamics. The framework is fully reproducible with provided code and data.
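
The quantities defined in the abstract can be computed along a single trajectory with a few lines of NumPy. This is only a sketch: it assumes that the "Pearson update" means taking the Pearson correlation matrix of the columns of the current iterate, which the abstract does not spell out, and the function names are hypothetical.

```python
import numpy as np


def pearson_update(P: np.ndarray) -> np.ndarray:
    # Assumption: the update correlates the columns of the current matrix.
    return np.corrcoef(P, rowvar=False)


def contraction_ratios(n: int, steps: int, seed: int = 0):
    """Return (Delta_k, rho_k, delta_k) for one trajectory.

    Delta_k = ||P_{k+1} - P_k||_F, rho_k = Delta_{k+1} / Delta_k, delta_k = Delta_k / n.
    """
    rng = np.random.default_rng(seed)
    P = rng.uniform(-1.0, 1.0, size=(n, n))  # i.i.d. U[-1, 1] initialization
    deltas = []
    for _ in range(steps + 1):
        P_next = pearson_update(P)
        deltas.append(np.linalg.norm(P_next - P, ord="fro"))
        P = P_next
    deltas = np.array(deltas)
    # rho_k is only defined where Delta_k > 0; suppress warnings otherwise.
    with np.errstate(divide="ignore", invalid="ignore"):
        rhos = deltas[1:] / deltas[:-1]
    return deltas, rhos, deltas / n


deltas, rhos, scaled = contraction_ratios(n=10, steps=5)
print(rhos)  # ratios above 1 would indicate a finite-step expansion
```

Repeating this over many seeds and binning `rhos` by `scaled` would give the raw material for empirical conditional quantiles of the kind the abstract describes.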

2604.14061 2026-04-16 cs.IT math.IT math.PR stat.ML

Two-Sided Bounds for Entropic Optimal Transport via a Rate-Distortion Integral

Jingbo Liu

Comments IEEE International Symposium on Information Theory (ISIT) 2026

We show that the maximum expected inner product between a random vector and the standard normal vector over all couplings subject to a mutual information constraint or regularization is equivalent to a truncated integral involving the rate-distortion function, up to universal multiplicative constants. The proof is based on a lifting technique, which constructs a Gaussian process indexed by a random subset of the type class of the probability distribution involved in the information-theoretic inequality, and then applies a form of the majorizing measure theorem.

2604.13980 2026-04-16 cs.LG q-bio.QM stat.ML

BOAT: Navigating the Sea of In Silico Predictors for Antibody Design via Multi-Objective Bayesian Optimization

Jackie Rao, Ferran Gonzalez Hernandez, Leon Gerard, Alexandra Gessner

Comments Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026

Antibody lead optimization is inherently a multi-objective challenge in drug discovery. Achieving a balance between different drug-like properties is crucial for the development of viable candidates, and this search becomes exponentially more challenging as the number of desired properties grows. The ever-growing zoo of sophisticated in silico tools for predicting antibody properties calls for an efficient joint optimization procedure to overcome resource-intensive sequential filtering pipelines. We present BOAT, a versatile Bayesian optimization framework for multi-property antibody engineering. Our "plug-and-play" framework couples uncertainty-aware surrogate modeling with a genetic algorithm to jointly optimize various predicted antibody traits while enabling efficient exploration of sequence space. Through systematic benchmarking against genetic algorithms and newer generative learning approaches, we demonstrate competitive performance with state-of-the-art methods for multi-objective protein optimization. We identify clear regimes where surrogate-driven optimization outperforms expensive generative approaches and establish practical limits imposed by sequence dimensionality and oracle costs.

2604.13973 2026-04-16 stat.ME

Improving Treatment Effect Estimation in Trials through Adaptive Borrowing of External Controls

Qinwei Yang, Jingyi Li, Peng Wu, Shu Yang

Randomized controlled trials (RCTs) often suffer from limited inferential efficiency in estimating treatment effects due to their small sample sizes. In recent years, incorporating external controls (ECs) has gained increasing attention as an effective way to augment small RCTs and thereby enhance estimation efficiency. However, ECs are not always comparable to RCTs, and direct borrowing without careful evaluation can introduce substantial bias and, paradoxically, undermine the accuracy of treatment effect estimation. In this paper, we propose a novel adaptive influence-based sample borrowing framework to improve average treatment effect (ATE) estimation in RCTs. The framework quantifies the "comparability" of each sample in ECs using influence functions and identifies the optimal subset of ECs that minimizes the mean squared error of the ATE estimator. The proposed framework is assumption-lean regarding the distribution of ECs and is robust to outliers, making it broadly applicable across diverse settings. Moreover, we develop an outcome calibration method to improve the data utilization efficiency of ECs, further strengthening the adaptive influence-based sample-borrowing framework. We demonstrate the effectiveness of the proposed method using both simulated and real-world datasets.

2604.13944 2026-04-16 stat.ME

High-Dimensional Data Analysis for Elliptically Symmetric Distributions

Long Feng

High-dimensional data arise routinely in modern statistics, econometrics, finance, genomics, and machine learning. While a large body of existing methodology is developed under Gaussian or light-tailed assumptions, many real data sets exhibit heavy tails, heterogeneity, and departures from classical covariance-based models. This book provides a systematic treatment of high-dimensional data analysis under elliptically symmetric distributions, with an emphasis on robust inference based on spatial signs, spatial ranks, multivariate Kendall's tau matrices, and related shape-based methods. The book covers the basic theory of elliptical symmetry, high-dimensional location inference, estimation and testing for covariance and precision matrices, sphericity and proportionality testing, high-dimensional alpha testing in factor pricing models, change-point analysis, white-noise and independence testing, high-dimensional discriminant analysis, and dimension reduction through principal component analysis and factor models. Throughout, we review classical low-dimensional and high-dimensional benchmark methods and then develop robust alternatives tailored to elliptical models. Particular attention is paid to the interplay between sum-type, max-type, and adaptive procedures, as well as to the role of scatter, shape, and rank-based dependence measures in heavy-tailed settings. This book is intended as a unified overview of robust high-dimensional methods under elliptical symmetry and as a synthesis of the author's recent research contributions in this area. It is written for researchers and graduate students in statistics, econometrics, and related fields who are interested in modern high-dimensional inference beyond the Gaussian paradigm.

2604.13890 2026-04-16 physics.soc-ph cs.LG econ.EM econ.TH stat.ML

Sandpile Economics: Theory, Identification, and Evidence

Diego Vallarino

Why do capitalist economies recurrently generate crises whose severity is disproportionate to the size of the triggering shock? This paper proposes a structural answer grounded in the evolutionary geometry of production networks. As economies evolve through specialization, integration, and competitive selection, their inter-sectoral linkages drift toward configurations of increasing geometric fragility, eventually crossing a threshold beyond which small disturbances generate disproportionately large cascades. We introduce Sandpile Economics, a formal framework that interprets macroeconomic instability as an emergent property of disequilibrium production networks. The key state variable is the Forman--Ricci curvature of the input--output graph, capturing local substitution possibilities when supply chains are disrupted. We show that when curvature falls below an endogenous threshold, the distribution of cascade sizes follows a power law with tail index $α\in (1,2)$, implying a regime of unbounded amplification. The underlying mechanism is evolutionary: specialization reduces input substitutability, pushing the economy toward criticality, while crisis episodes induce endogenous network reconfiguration and path dependence. These dynamics are inherently non-ergodic and cannot be captured by representative-agent frameworks. Empirically, using global input--output data, we document that production networks operate in persistently negative curvature regimes and that curvature robustly predicts medium-run output dynamics. A one-standard-deviation increase in curvature is associated with higher cumulative growth over three-year horizons, and curvature systematically outperforms standard network metrics in explaining cross-country differences in resilience.

2604.09832 2026-04-16 stat.CO

Adaptive Riemannian Manifold Hamiltonian Monte Carlo with Hierarchical Metric

Miika Kailas, Matti Vihola, Jonas Wallin

Hamiltonian Monte Carlo (HMC) and its dynamic extensions, such as the No-U-Turn Sampler (NUTS), are powerful Markov chain Monte Carlo methods for sampling from complex, high-dimensional probability distributions. Riemannian manifold Hamiltonian Monte Carlo (RMHMC) extends HMC by allowing the mass matrix to depend on position, which can substantially improve mixing but also makes implementation considerably more challenging. In this paper, we study an adaptive hierarchical version of RMHMC that is well suited to many hierarchical sampling problems. A key feature of hierarchical RMHMC is that, unlike general RMHMC, it admits a closed-form explicit leapfrog integrator, enabling efficient implementation and direct use within dynamic HMC methods such as NUTS. We introduce an adaptive scheme that automatically tunes the parameters of the hierarchical mass matrix during simulation. Importantly, the target density need not exhibit any hierarchical or block structure; the hierarchy is instead imposed on the mass matrix as a modeling device to capture the local geometry of the target distribution. Numerical experiments demonstrate appealing empirical performance in high-dimensional Bayesian inference problems.

2603.24153 2026-04-16 math.ST math.PR stat.TH

Penalized estimation of GEV parameters for extreme quantile regression

Lucien M. Vidagbandji, Alexandre Berred, Cyrille Bertelle, Laurent Amanton

Quantile regression (QR) relies on the estimation of conditional quantiles and explores the relationships between independent and dependent variables. At high probability levels, classical QR methods face extrapolation difficulties due to the scarcity of data in the tail of the distribution. Another challenge arises when the number of predictors is large and the quantile function exhibits a complex structure. In this work, we propose an estimation method designed to overcome these challenges. To enhance extrapolation in the tail of the conditional response distribution, we model block maxima using the generalized extreme value (GEV) distribution, where the parameters depend on covariates. To address the second challenge, we adopt an approach based on generalized random forests (grf) to estimate these parameters. Specifically, we maximize a penalized likelihood, weighted by the weights obtained through the grf method. This penalization helps overcome the limitations of the maximum likelihood estimator (MLE) in small samples, while preserving its optimality in large samples. The effectiveness of our method is validated through comparisons with other approaches in simulation studies and an application to U.S. wage data.
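
Once covariate-dependent GEV parameters have been estimated, a high quantile follows from the standard GEV quantile function. The sketch below implements only that textbook formula; it is not the authors' penalized estimator, and the function name and example parameter values are illustrative assumptions.

```python
import math


def gev_quantile(p: float, mu: float, sigma: float, xi: float) -> float:
    """Quantile of a GEV(mu, sigma, xi) distribution at probability level p.

    Uses the shape convention in which xi > 0 gives a heavy (Frechet-type) tail.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    y = -math.log(p)
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi


# Hypothetical parameter values for one covariate setting.
for level in (0.9, 0.99, 0.999):
    print(level, gev_quantile(level, mu=10.0, sigma=2.0, xi=0.2))
```

Note that some libraries parameterize the shape with the opposite sign, so the convention should be checked before plugging in estimates from another tool.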

2603.02417 2026-04-16 stat.ML cs.LG math.OC

Mini-Batch Covariance, Diffusion Limits, and Oracle Complexity in Stochastic Gradient Descent: A Sampling-Design Perspective

Daniel Zantedeschi, Kumar Muthuraman

Stochastic gradient descent (SGD) is central to simulation optimization, stochastic programming, and online M-estimation, where sampling effort is a decision variable. We study the mini-batch gradient noise as a sampling-design object. Under exchangeable fresh-sampling mini-batches, the conditional covariance given the de Finetti directing measure $\mu$ is $b^{-1} G_\mu(\theta)$, and under identifiability the projected population object is $b^{-1} G^*(\theta)$ -- projected Fisher information for correctly specified likelihoods, the sandwich partner of the Hessian otherwise. This identification fixes the noise matrix entering the diffusion analysis of constant-step SGD: the raw iterate path has a deterministic fluid limit, and the $\sqrt{b/\eta}$-scaled fluctuations satisfy a functional CLT with noise covariance $G^*$; near a nondegenerate optimum the limit is Ornstein-Uhlenbeck, and its Lyapunov covariance scaled by $\eta/b$ matches the linearized discrete recursion at leading order. Under a curvature-noise compatibility condition $\mu_F > 0$, we prove $1/N$ mean-square upper bounds and an i.i.d. parametric Fisher van Trees lower bound of the same rate order, with oracle-complexity guarantees depending on an effective dimension $d_{\mathrm{eff}}$ and condition number $\kappa_F$. Numerical experiments verify the identification and confirm the Lyapunov predictions in direct SGD.

2603.00192 2026-04-16 cs.LG stat.AP stat.ML

Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare

Elizabeth W. Miller, Jeffrey D. Blume

In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a substantial source of variability often goes undetected. Even when the data and model architecture are held fixed, randomness introduced by optimization and initialization can lead to materially different risk estimates for the same patient. This problem is largely obscured by standard evaluation practices, which rely on aggregate performance metrics (e.g., log-loss, accuracy) that are agnostic to individual-level stability. As a result, models with indistinguishable aggregate performance can nonetheless exhibit substantial procedural arbitrariness, which can undermine clinical trust. We propose an evaluation framework that quantifies individual-level prediction instability by using two complementary diagnostics: empirical prediction interval width (ePIW), which captures variability in continuous risk estimates, and empirical decision flip rate (eDFR), which measures instability in threshold-based clinical decisions. We apply these diagnostics to simulated data and the GUSTO-I clinical dataset. Across observed settings, we find that for flexible machine-learning models, randomness arising solely from optimization and initialization can induce individual-level variability comparable to that produced by resampling the entire training dataset. Neural networks exhibit substantially greater instability in individual risk predictions compared to logistic regression models. Risk estimate instability near clinically relevant decision thresholds can alter treatment recommendations. These findings suggest that stability diagnostics should be incorporated into routine model validation for assessing clinical reliability.
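
The two diagnostics can be illustrated on a matrix of risk estimates from repeated training runs. The abstract does not give exact formulas, so the definitions below are assumptions made for this sketch: the interval width is taken between symmetric empirical quantiles across runs, and the flip rate is the share of runs that disagree with the majority decision for each individual. Function names are hypothetical.

```python
import numpy as np


def epiw(preds: np.ndarray, level: float = 0.95) -> np.ndarray:
    """Assumed ePIW: per-individual width of the central `level` interval.

    preds has shape (n_runs, n_individuals); rows are retrained models.
    """
    lo, hi = np.quantile(preds, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return hi - lo


def edfr(preds: np.ndarray, threshold: float) -> np.ndarray:
    """Assumed eDFR: per-individual share of runs off the majority decision."""
    frac_treated = (preds >= threshold).mean(axis=0)
    return np.minimum(frac_treated, 1 - frac_treated)


# Toy example: 4 runs, 2 patients; patient 2 straddles the 0.5 threshold.
preds = np.array([[0.10, 0.40],
                  [0.10, 0.60],
                  [0.10, 0.60],
                  [0.10, 0.40]])
print(epiw(preds))        # patient 1 is perfectly stable, patient 2 is not
print(edfr(preds, 0.5))   # patient 2's decision flips in half of the runs
```

Both functions return one value per individual, which is the point of the framework: aggregate metrics could be identical across the four runs while patient 2's recommendation is effectively arbitrary.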

2601.04193 2026-04-16 cs.IT math.IT math.PR stat.ML

A discrete Benamou-Brenier formulation of Optimal Transport on graphs

Kieran Morris, Oliver Johnson

We propose a discrete transport equation on graphs which connects distributions on both vertices and edges. We then derive a discrete analogue of the Benamou-Brenier formulation for Wasserstein-$1$ distance on a graph and as a result classify all $W_1$ geodesics on graphs.

2512.24968 2026-04-16 econ.GN cs.AI cs.CY q-fin.EC stat.AP

Strategic Response of News Publishers to Generative AI

Hangcheng Zhao, Ron Berman

Generative AI can adversely impact news publishers by lowering consumer demand. It can also reduce demand for newsroom employees, and increase the creation of news "slop." However, it can also form a source of traffic referrals and an information-discovery channel that increases demand. We use high-frequency granular data to analyze the strategic response of news publishers to the introduction of Generative AI. Many publishers strategically blocked LLM access to their websites using the robots.txt file standard. Using a difference-in-differences approach, we find that large publishers who block GenAI bots experience reduced website traffic compared to not blocking. In addition, we find that large publishers shift toward richer content that is harder for LLMs to replicate, without increasing text volume. Finally, we find that the share of new editorial and content-production job postings rises over time. Together, these findings illustrate the levers that publishers choose to use to strategically respond to competitive Generative AI threats, and their consequences.

2510.18099 2026-04-16 stat.ME

Staying on Track: Efficient Trajectory Discovery with Adaptive Batch Sampling

Arindam Fadikar, Abby Stevens, Mickael Binois, Nicholson Collier, David O'Gara, Jonathan Ozik

Bayesian optimization (BO) is a powerful framework for estimating parameters of expensive simulation models, particularly in settings where the likelihood is intractable and evaluations are costly. In stochastic models every simulation is run with a specific parameter set and an implicit or explicit random seed, where each parameter set and random seed combination generates an individual realization, or trajectory, sampled from an underlying random process. Existing BO approaches typically rely on summary statistics over the realizations, such as means, medians, or quantiles, potentially limiting their effectiveness when trajectory-level information is desired. We propose a trajectory-oriented BO method that incorporates a Gaussian process surrogate using both input parameters and random seeds as inputs, enabling direct inference at the trajectory level. Using a common random number approach, we define a surrogate-based likelihood over trajectories and introduce an adaptive Thompson Sampling algorithm that refines a fixed-size input grid through likelihood-based filtering and Metropolis-Hastings-based densification. This approach concentrates computation on statistically promising regions of the input space while balancing exploration and exploitation. We apply the method to stochastic epidemic models, a simple compartmental and a more computationally demanding agent-based model, demonstrating improved sampling efficiency and faster identification of data-consistent trajectories relative to parameter-only inference.

2509.02154 2026-04-16 cs.LG cs.AI cs.CV stat.ML

Heavy-Tailed Class-Conditional Priors for Long-Tailed Generative Modeling

Aymene Mohammed Bouayed, Samuel Deslauriers-Gauthier, Adrian Iaccovelli, David Naccache

Variational Autoencoders (VAEs) with global priors trained under an imbalanced empirical class distribution can lead to underrepresentation of tail classes in the latent space. While $t^3$VAE improves robustness via heavy-tailed Student's $t$-distribution priors, its single global prior still allocates mass proportionally to class frequency. We address this latent geometric bias by introducing C-$t^3$VAE, which assigns a per-class Student's $t$ joint prior over latent and output variables. This design promotes uniform prior mass across class-conditioned components. To optimize our model we derive a closed-form objective from the $γ$-power divergence, and we introduce an equal-weight latent mixture for class-balanced generation. On SVHN-LT, CIFAR100-LT, and CelebA datasets, C-$t^3$VAE consistently attains lower FID scores than $t^3$VAE and Gaussian-based VAE baselines under severe class imbalance while remaining competitive in balanced or mildly imbalanced settings. In per-class F1 evaluations, our model outperforms the conditional Gaussian VAE across highly imbalanced settings. Moreover, we identify the mild imbalance threshold $ρ< 5$, for which Gaussian-based models remain competitive. However, for $ρ\geq 5$ our approach yields improved class-balanced generation and mode coverage.

2505.16051 2026-04-16 stat.ML cs.LG

Flow-based Generative Modeling of Potential Outcomes and Counterfactuals

Dongze Wu, David I. Inouye, Yao Xie

Comments Accepted at 2026 IEEE International Symposium on Information Theory (ISIT 2026)

Predicting potential and counterfactual outcomes from observational data is central to individualized decision-making, particularly in clinical settings where treatment choices must be tailored to each patient rather than guided solely by population averages. We propose PO-Flow, a continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcome distributions and factual-conditioned counterfactual outcomes. Trained via flow matching, PO-Flow provides a unified approach to individualized potential outcome prediction, conditional average treatment effect estimation, and counterfactual prediction. By encoding an observed factual outcome and decoding under an alternative treatment, PO-Flow provides an encode-decode mechanism for factual-conditioned counterfactual prediction. In addition, PO-Flow supports likelihood-based evaluation of potential outcomes, enabling uncertainty-aware assessment of predictions. A supporting recovery guarantee is established under certain assumptions, and empirical results on benchmark datasets demonstrate strong performance across a range of causal inference tasks within the potential outcomes framework.

2504.21143 2026-04-16 stat.AP

Comparative Analysis of Weather-Based Indexes and the Actuaries Climate Index$^{TM}$ for Crop Yield Prediction and Weather-Derivative Pricing

Cem Yavrum, A. Sevtap Selcuk-Kestel, José Garrido

Comments 1) The application of the ACI within a weather-derivative framework is incorporated. 2) A time-trend analysis is integrated prior to crop yield prediction. 3) The iterative M-split leave-k-out cross-validation method is implemented. 4) The Discussion section is added

Climate change poses significant challenges to the agricultural and financial sectors, affecting crop productivity and overall financial stability. This study evaluates the robustness of the Actuaries Climate Index$^{TM}$ (ACI), a newer entrant in the field as a tool for measuring climate impacts, by comparing its explanatory power with well-established weather-based indexes (WBIs) across two key sectors. In the agricultural context, the yields of three major crops are predicted using generalized statistical models and advanced machine learning algorithms with climate indexes as explanatory variables. To enhance model reliability and address multicollinearity among weather-related variables, the study also incorporates both principal component analysis and functional principal component analysis. A total of 22 models, each constructed with different sets of explanatory variables, demonstrate the significant impact of wind speed and sea-level changes, alongside temperature and precipitation, on crop yield variability across six regions of the United States. For the financial market application, the analysis adapts the weather derivative framework, as it is a critical instrument for energy companies, insurers, and agribusinesses seeking to hedge against weather-related risks. By analyzing the payoffs of derivative contracts that use WBIs and ACI components as underlying variables, the findings reveal that the ACI framework holds strong potential as a comprehensive climate risk indicator, not only for the agricultural sector but also for the finance and insurance industries.

2503.00379 2026-04-16 cs.LG stat.ML

Improving clustering quality evaluation in noisy Gaussian mixtures

Renato Cordeiro de Amorim, Vladimir Makarenkov

Clustering is a well-established technique in machine learning and data analysis, widely used across various domains. Cluster validity indices, such as the Average Silhouette Width, Calinski-Harabasz, and Davies-Bouldin indices, play a crucial role in assessing clustering quality when external ground truth labels are unavailable. However, these measures can be affected by different degrees of feature relevance, potentially leading to unreliable evaluations in high-dimensional or noisy data sets. We introduce a theoretically grounded Feature Importance Rescaling (FIR) method that enhances the quality of clustering validation by adjusting feature contributions based on their dispersion. It attenuates noise features, clarifies clustering compactness and separation, and thereby aligns clustering validation more closely with the ground truth. Through extensive experiments on synthetic data sets under different configurations and a case study on real-world data, we demonstrate that FIR consistently improves the correlation between the values of cluster validity indices and the ground truth, particularly in settings with noisy or irrelevant features. The results show that FIR increases the robustness of clustering evaluation, reduces variability in performance across different data sets, and remains effective even when clusters exhibit significant overlap. These findings highlight the potential of FIR as a valuable enhancement of clustering validation, making it a practical tool for unsupervised learning tasks where labelled data is unavailable.

2412.03596 2026-04-16 stat.ME

SMART-MC: Characterizing the Dynamics of Multiple Sclerosis Therapy Transitions Using a Covariate-Based Markov Model

Beomchang Kim, Zongqi Xia, Priyam Das

Treatment switching is a common occurrence in the management of Multiple Sclerosis (MS), where patients transition across various disease-modifying therapies (DMTs) due to heterogeneous treatment responses, differences in disease progression, patient characteristics, and therapy-associated adverse effects. To investigate how patient-level covariates influence the likelihood of treatment transitions among DMTs, we adopt a Markovian framework, Sparse Matrix Estimation with Covariate-Based Transitions in Markov Chain Modeling (SMART-MC), in which the transition probabilities are modeled as functions of these covariates. Modeling real-world treatment transitions under this framework presents several challenges, including ensuring parameter identifiability and handling sparse transitions without overfitting. To address identifiability, we constrain each transition-specific covariate coefficient vector to have a fixed L2 norm. Furthermore, our method automatically estimates transition probabilities for sparsely observed transitions as constants and enforces zero transition probabilities for transitions that are empirically unobserved. This approach mitigates the need for additional model complexity to handle sparsity while maintaining interpretability and efficiency. To optimize the multi-modal likelihood function, we develop a scalable, parallelized global optimization routine, which is validated through benchmark comparisons and supported by key theoretical properties. Our analysis uncovers meaningful patterns in DMT transitions, revealing variations across MS patient subgroups defined by age, race, and other clinical factors.

2408.10610 2026-04-16 cs.LG math.PR stat.ME

On an $L^2$ norm for stationary ARMA processes

Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan

Comments 5 pages

We propose an $L^2$ norm for stationary Autoregressive Moving Average (ARMA) models. We look at ARMA models within the Hilbert space of the past with present of a true purely linearly non-deterministic stationary process $X_t$, and compute the $L^2$ norm based on its Wold decomposition. As an application of this $L^2$ norm, we derive bounds on the mean square prediction error for AR(1) models of MA(1) processes, and verify these bounds empirically for sample data.
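
As a point of reference for the AR(1)-on-MA(1) setting, the mean square one-step prediction error of the best linear AR(1) predictor has a standard closed form. The sketch below computes it and checks it by simulation; it does not reproduce the paper's norm-based bounds, and the function names are hypothetical.

```python
import numpy as np


def ar1_mse_for_ma1(theta: float, sigma: float = 1.0) -> tuple[float, float]:
    """Best AR(1) coefficient and its one-step MSE for X_t = e_t + theta * e_{t-1}."""
    gamma0 = sigma**2 * (1 + theta**2)  # variance of the MA(1) process
    rho1 = theta / (1 + theta**2)       # lag-1 autocorrelation = optimal AR(1) coefficient
    return rho1, gamma0 * (1 - rho1**2)


def simulate_ar1_mse(theta: float, n: int = 200_000, seed: int = 0) -> float:
    """Empirical MSE of the AR(1) predictor on a simulated Gaussian MA(1) path."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + 1)
    x = eps[1:] + theta * eps[:-1]
    phi, _ = ar1_mse_for_ma1(theta)
    resid = x[1:] - phi * x[:-1]
    return float(np.mean(resid**2))


phi, mse = ar1_mse_for_ma1(theta=0.5)
print(phi, mse, simulate_ar1_mse(theta=0.5))
```

The closed-form error always exceeds the innovation variance `sigma**2` for nonzero `theta`, which is the price of approximating an MA(1) process with a single autoregressive lag.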

2408.02839 2026-04-16 stat.ML cs.LG

Mini-batch Estimation for Deep Cox Models: Statistical Foundations and Practical Guidance

Lang Zeng, Weijing Tang, Zhao Ren, Ying Ding

Details
English abstract

The stochastic gradient descent (SGD) algorithm has been widely used to optimize deep Cox neural networks (Cox-NN) by updating model parameters using mini-batches of data. We show that SGD aims to optimize the average of the mini-batch partial-likelihood, which is different from the standard partial-likelihood. This distinction requires developing new statistical properties for the global optimizer, namely, the mini-batch maximum partial-likelihood estimator (mb-MPLE). We establish that mb-MPLE for Cox-NN is consistent and achieves the optimal minimax convergence rate up to a polylogarithmic factor. For Cox regression with linear covariate effects, we further show that mb-MPLE is $\sqrt{n}$-consistent and asymptotically normal with asymptotic variance approaching the information lower bound as batch size increases, which is confirmed by simulation studies. Additionally, we offer practical guidance on using SGD, supported by theoretical analysis and numerical evidence. For Cox-NN, we demonstrate that the ratio of the learning rate to the batch size is critical in SGD dynamics, offering insight into hyperparameter tuning. For Cox regression, we characterize the iterative convergence of SGD, ensuring that the global optimizer, mb-MPLE, can be approximated with sufficiently many iterations. Finally, we demonstrate the effectiveness of mb-MPLE in a large-scale real-world application where the standard MPLE is intractable.
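The distinction between the standard partial-likelihood and the average of mini-batch partial-likelihoods can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' code: the simulated data, the linear risk score, the batch size of 20, the no-ties assumption, and the per-event scaling are all hypothetical choices. The point is only that restricting each risk set to a mini-batch changes the objective being evaluated.

```python
import numpy as np

def neg_log_partial_likelihood(risk, time, event):
    """Negative log Cox partial likelihood, summed over observed events.

    Assumes no tied event times. `risk` is the model output per subject,
    `time` the observed time, `event` 1.0 if observed and 0.0 if censored.
    """
    # at_risk[i, j] is True when subject j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    log_denominator = np.log((np.exp(risk)[None, :] * at_risk).sum(axis=1))
    return -np.sum(event * (risk - log_denominator))

# Hypothetical simulated survival data with independent censoring.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
risk = 0.5 * x
event_time = rng.exponential(scale=np.exp(-risk))
censor_time = rng.exponential(scale=2.0, size=n)
observed = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(float)

# Standard objective: every risk set uses the full sample.
full_loss = neg_log_partial_likelihood(risk, observed, event) / event.sum()

# Mini-batch objective: each risk set is restricted to its own batch.
batches = np.array_split(rng.permutation(n), n // 20)
minibatch_loss = sum(
    neg_log_partial_likelihood(risk[b], observed[b], event[b]) for b in batches
) / event.sum()

print(full_loss, minibatch_loss)
```

Because every batch risk set is a subset of the corresponding full risk set, the mini-batch value here is smaller than the full-sample value at the same parameters, so the two objectives are not interchangeable.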

1909.04024 2026-04-16 stat.ME stat.CO

Estimating the Optimal Linear Combination of Biomarkers using Spherically Constrained Optimization

Priyam Das, Debsurya De, Raju Maiti, Mona Kamal, Katherine A. Hutcheson, Clifton D. Fuller, Bibhas Chakraborty, Christine B. Peterson

Details
English abstract

In the context of a binary classification problem, the optimal linear combination of continuous predictors can be estimated by maximizing an empirical estimate of the area under the receiver operating characteristic (ROC) curve (AUC). For multi-category responses, the optimal predictor combination can similarly be obtained by maximization of the empirical hypervolume under the manifold (HUM). This problem is particularly relevant to medical research, where it may be of interest to diagnose a disease with various subtypes or predict a multi-category outcome. Since the empirical HUM is discontinuous, non-differentiable, and possibly multi-modal, solving this maximization problem requires a global optimization technique. Estimation of the optimal coefficient vector using existing global optimization techniques is computationally expensive, becoming prohibitive as the number of predictors and the number of outcome categories increase. We propose an efficient derivative-free black-box optimization technique based on pattern search to solve this problem. Through extensive simulation studies, we demonstrate that the proposed method achieves better performance compared to existing methods including the step-down algorithm. Finally, we apply the proposed method to predict swallowing difficulty after radiation therapy for oropharyngeal cancer based on radiation dose to various structures in the head and neck.

1904.10046 2026-04-16 stat.ME stat.AP

A distribution-free smoothed combination method of biomarkers to improve diagnostic accuracy in multi-category classification

Raju Maiti, Jialiang Li, Priyam Das, Lei Feng, Derek Hausenloy, Bibhas Chakraborty

Details
English abstract

Results from multiple diagnostic tests are usually combined to improve the overall diagnostic accuracy. For binary classification, maximization of the empirical estimate of the area under the receiver operating characteristic (ROC) curve is widely adopted to produce the optimal linear combination of multiple biomarkers. In the presence of a large number of biomarkers, this method proves to be computationally expensive and difficult to implement since it involves maximization of a discontinuous, non-smooth function for which gradient-based methods cannot be used directly. The complexity of this problem increases when the classification problem becomes multi-category. In this article, we develop a linear combination method that maximizes a smooth approximation of the empirical Hypervolume Under Manifolds (HUM) for multi-category outcomes. We approximate HUM by replacing the indicator function with the sigmoid function or normal cumulative distribution function (CDF). With the above smooth approximations, efficient gradient-based algorithms can be employed to obtain better solutions with less computing time. We show that under some regularity conditions, the proposed method yields consistent estimates of the coefficient parameters. We also derive the asymptotic normality of the coefficient estimates. We conduct extensive simulations to examine our methods. Under different simulation scenarios, the proposed methods are compared with other existing methods and are shown to outperform them in terms of diagnostic accuracy. The proposed method is illustrated using two real medical data sets.
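The smoothing idea can be sketched for the binary special case, where HUM reduces to the AUC. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the simulated two-biomarker data, the coefficient vector `beta`, and the bandwidth value are all hypothetical. The indicator of a correctly ordered pair is replaced by a sigmoid, which makes the objective differentiable in the combination coefficients.

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs that are correctly ordered."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(diff > 0)

def smoothed_auc(scores_pos, scores_neg, bandwidth=0.1):
    """Same pairwise average with the indicator replaced by a sigmoid.

    The sigmoid is written via tanh, sigmoid(z) = 0.5 * (1 + tanh(z / 2)),
    to avoid overflow for large |z|.
    """
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(0.5 * (1.0 + np.tanh(0.5 * diff / bandwidth)))

# Hypothetical data: two biomarkers, diseased group shifted upward.
rng = np.random.default_rng(0)
markers_pos = rng.normal(loc=[1.0, 0.5], size=(100, 2))
markers_neg = rng.normal(size=(100, 2))
beta = np.array([0.8, 0.6])  # illustrative unit-norm coefficient vector

auc_hard = empirical_auc(markers_pos @ beta, markers_neg @ beta)
auc_smooth = smoothed_auc(markers_pos @ beta, markers_neg @ beta, bandwidth=0.01)
print(auc_hard, auc_smooth)
```

As the bandwidth shrinks, the smoothed value approaches the empirical AUC; a larger bandwidth gives a smoother surface that is easier for gradient-based optimizers but a looser approximation.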

1609.02249 2026-04-16 math.OC stat.ME

Clustering sequence data with mixture Markov chains with covariates using multiple simplex constrained optimization routine (MSiCOR)

Priyam Das, Deborshee Sen, Debsurya De, Jue Hou, Zahra S. H. Abad, Nicole Kim, Zongqi Xia, Tianxi Cai

Details
English abstract

The Mixture Markov Model (MMM) is a widely used tool for clustering sequences of events coming from a finite state space. However, because the MMM likelihood is multi-modal, its maximization remains a challenge. Although the Expectation-Maximization (EM) algorithm remains one of the most popular ways to estimate the MMM parameters, its convergence is not always guaranteed. Given the computational challenges in maximizing the mixture likelihood on the constrained parameter space, we develop a pattern search-based global optimization technique which can optimize any objective function on a collection of simplexes, and which is eventually used to maximize the MMM likelihood. This is shown to outperform other related global optimization techniques. In simulation experiments, the proposed method is shown to outperform the EM algorithm in terms of MMM estimation performance. The proposed method is applied to cluster multiple sclerosis (MS) patients based on their treatment sequences of disease-modifying therapies (DMTs). We also propose a novel method to cluster people with MS based on DMT prescriptions and associated clinical features (covariates) using MMM with covariates. Based on the analysis, we divided MS patients into 3 clusters. Further, cluster-specific summaries of relevant covariates indicate patient differences among the clusters.

1604.08636 2026-04-16 math.OC cs.DS stat.ME

Recursive Modified Pattern Search on High-dimensional Simplex : A Blackbox Optimization Technique

Priyam Das

Details
English abstract

In this paper, a novel derivative-free pattern search based algorithm for black-box optimization is proposed over a simplex constrained parameter space. At each iteration, starting from the current solution, a new set of possible solutions is found by adding a set of derived step-size vectors to the starting point. While deriving these step-size vectors, precautions and adjustments are considered so that the set of new possible solution points still remains within the simplex constrained space. Thus, no extra time is spent in evaluating the (possibly expensive) objective function at infeasible points (points outside the unit-simplex space). While minimizing any objective function of m parameters, within each iteration, the objective function is evaluated at 2m new possible solution points. So, up to 2m parallel threads can be used, which makes the computation even faster when optimizing expensive objective functions over a high-dimensional parameter space. Once a local minimum is discovered, a novel 're-start' strategy is used to increase the likelihood of finding a better solution. Unlike existing pattern search based methods, a sparsity control parameter is introduced which can be used to induce sparsity in the solution when the solution is expected a priori to be sparse. A comparative study of the performance of the proposed algorithm and other existing algorithms is shown for a few low-, moderate-, and high-dimensional optimization problems. Up to a 338-fold improvement in computation time is achieved by the proposed algorithm over the genetic algorithm, along with a better solution. The proposed algorithm is used to estimate the simultaneous quantiles of North Atlantic Hurricane velocities during 1981-2006 by maximizing a non-closed form likelihood function with (possibly) multiple maxima.

2604.13772 2026-04-16 stat.ME

Testing Alpha in High-Dimensional Conditional Time-Varying Factor Models with Dependent Observations

Long Feng, Huifang Ma, Zhaojun Wang

Details
English abstract

This paper studies alpha testing in a high-dimensional conditional time-varying factor model with temporally dependent observations. Both factor loadings and alpha processes are allowed to vary smoothly over time, and the cross-sectional dimension may be comparable to or larger than the sample size. Using a B-spline sieve method, we develop a sum-type test for dense alternatives, a max-type test for sparse alternatives, and a Cauchy combination test for adaptive inference. On the theoretical side, we derive explicit stochastic expansions for the estimated average alphas, establish asymptotic normality of the sum statistic, and develop the extreme-value limit theory for the max statistic by showing its Gumbel convergence under temporal dependence together with the validity of block-bootstrap calibration. We further prove asymptotic independence between the sum and max statistics and thereby justify the Cauchy combination test. Simulation results demonstrate that the proposed procedures achieve satisfactory size control and competitive power across a wide range of dense and sparse alternatives. An empirical application further illustrates the usefulness of the proposed methods in testing asset-pricing models with time-varying structure.

2604.13748 2026-04-16 stat.ME stat.ML

Forecasting Multivariate Time Series under Predictive Heterogeneity: A Validation-Driven Clustering Framework

Ziling Ma, Ángel López Oriona, Hernando Ombao, Ying Sun

Details
English abstract

We study adaptive pooling under predictive heterogeneity in high-dimensional multivariate time series forecasting, where global models improve statistical efficiency but may fail to capture heterogeneous predictive structure, while naive specialization can induce negative transfer. We formulate adaptive pooling as a statistical decision problem and propose a validation-driven framework that determines when and how specialization should be applied. Rather than grouping series based on representation similarity, we define partitions through out-of-sample predictive performance, thereby aligning data organization with predictive risk, defined as expected out-of-sample loss and approximated via validation error. Cluster assignments are iteratively updated using validation losses for both point (Huber) and probabilistic (pinball) forecasting, improving robustness to heavy-tailed errors and local anomalies. To ensure reliability, we introduce a leakage-free fallback mechanism that reverts to a global model whenever specialization fails to improve validation performance, providing a safeguard against performance degradation under a strict training-validation-test protocol. Experiments on large-scale traffic datasets demonstrate consistent improvements over strong baselines while avoiding degradation when heterogeneity is weak. Overall, the proposed framework provides a principled and practically reliable approach to adaptive pooling in high-dimensional forecasting problems.

2604.13740 2026-04-16 cs.LG stat.ML

Online learning with noisy side observations

Tomáš Kocák, Gergely Neu, Michal Valko

Comments Published at International Conference on Artificial Intelligence and Statistics (AISTATS) 2016. 13 pages, 7 figures

Details
Journal ref
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1186-1194, 2016
English abstract

We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of $\widetilde{O}(\sqrt{α^* T})$ after $T$ rounds, where $α^*$ is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of $α^*$. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor and Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.

2604.13739 2026-04-16 cs.LG stat.ML

Spectral Thompson sampling

Tomas Kocak, Michal Valko, Remi Munos, Shipra Agrawal

Comments Published at AAAI Conference on Artificial Intelligence (AAAI) 2014

Details
English abstract

Thompson Sampling (TS) has attracted a lot of interest due to its good empirical performance, in particular in computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem, where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph and the expected payoffs of the neighboring nodes are assumed to be similar. Although the setting has applications in both recommender systems and advertising, the traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension $d$, which is small in real-world graphs. We deliver the analysis showing that the regret of SpectralTS scales as $d\sqrt{T \ln N}$ with high probability, where $T$ is the time horizon and $N$ is the number of choices. Since a $d\sqrt{T \ln N}$ regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.

2604.13738 2026-04-16 stat.ML cs.LG

Covariance-adapting algorithm for semi-bandits with application to sparse rewards

Pierre Perrault, Vianney Perchet, Michal Valko

Comments Published at Conference on Learning Theory (COLT) 2020

Details
Journal ref
Proceedings of the 33rd Annual Conference on Learning Theory (COLT 2020), PMLR 125, 2020
English abstract

We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in the standard bandits). Typical distributions considered depend on specific parameter values, whose prior knowledge is required in theory but quite difficult to estimate in practice; an example is the commonly assumed sub-Gaussian family. We alleviate this issue by instead considering a new general family of sub-exponential distributions, which contains bounded and Gaussian ones. We prove a new lower bound on the expected regret on this family, that is parameterized by the unknown covariance matrix of outcomes, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that uses covariance estimates, and provide a tight asymptotic analysis of the regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.

2604.13709 2026-04-16 stat.ME

Adaptive Sample Size Simulations with R package adsasi

Skerdi Haviari

Comments 21 pages, 7 figures

Details
English abstract

Planning empirical experiments such as clinical trials or A/B tests requires sample size determination, which in many interesting cases has no closed-form solution (e.g. factorial or adaptive designs). adsasi is a new R package that enables simulations-first sample size calculations for any trial that can be simulated in short compute time. First, the user specifies a function that takes a sample size as argument, simulates the experiment, and returns a boolean for success/failure. Then, adsasi functions adsasi_0d and adsasi_1d iteratively call it on different sample sizes and progressively home in on the one with nominal success rate (power), assuming that increasing sample size increases power. adsasi_1d can also draw, purely empirically, the relationship between a design parameter and sample size. The implementation uses a modified probit regression (with success/failure as the dependent variable), informed by simulations conducted around the target size, and provides standard errors at each stage using the Cramér-Rao bound derived from a custom analytical Hessian matrix. Simple examples are first presented, yielding results within Monte Carlo variance of their closed-form expressions, then intractable ones (including bootstrapping from an existing medical cohort). adsasi will hopefully facilitate the funding and conduct of interesting, highly complex experimental designs by making their sizing straightforward.
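The simulations-first workflow can be sketched in a few lines. The sketch below is not the adsasi API and does not use its probit-regression search: it is written in Python rather than R, and it swaps in plain bisection over Monte Carlo power estimates. The trial design (two arms, unit variance, effect size 0.5, two-sided z-test at the 5% level) and all function names are hypothetical. It relies on the same assumption the abstract states, namely that power increases with sample size.

```python
import numpy as np

def simulate_trial(n_per_arm, rng, effect=0.5):
    """Simulate one hypothetical two-arm trial; return True on success.

    Success means a two-sided z-test (known unit variance) rejects at 5%.
    """
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(effect, 1.0, n_per_arm)
    z = (treated.mean() - control.mean()) / np.sqrt(2.0 / n_per_arm)
    return abs(z) > 1.96

def estimated_power(n_per_arm, rng, n_sims=2000):
    """Monte Carlo estimate of the success rate at a given sample size."""
    return np.mean([simulate_trial(n_per_arm, rng) for _ in range(n_sims)])

def smallest_n_for_power(target=0.8, lo=2, hi=512, seed=0):
    """Bisection on sample size, assuming power increases with n.

    Each comparison uses a noisy power estimate, so the answer is only
    accurate up to Monte Carlo error.
    """
    rng = np.random.default_rng(seed)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if estimated_power(mid, rng) >= target:
            hi = mid
        else:
            lo = mid
    return hi

n_required = smallest_n_for_power()
print(n_required)
```

For this simple design the closed-form answer is about 63 per arm, so the simulated result should land near that value, up to Monte Carlo noise. A regression-based search such as the one the abstract describes would pool information across the simulated sample sizes instead of discarding it at each bisection step.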

2604.13689 2026-04-16 stat.ME

Fractional lower-order covariance-based measures for cyclostationary time series with heavy-tailed distributions: application to dependence testing and model order identification

Wojciech Żuławiński, Agnieszka Wyłomańska

Comments 26 pages, 17 figures

Details
Journal ref
Digital Signal Processing 163, 105214, 2025
English abstract

This article introduces new methods for the analysis of cyclostationary time series with infinite variance. Traditional cyclostationary analysis, based on periodically correlated (PC) processes, relies on the autocovariance function (ACVF). However, the ACVF is not suitable for data exhibiting a heavy-tailed distribution, particularly with infinite variance. Thus, we propose a novel framework for the analysis of cyclostationary time series with heavy-tailed distributions, utilizing the fractional lower-order covariance (FLOC) as an alternative to covariance. This leads to the introduction of two new autodependence measures: the periodic fractional lower-order autocorrelation function (peFLOACF) and the periodic fractional lower-order partial autocorrelation function (peFLOPACF). These measures generalize the classical periodic autocorrelation function (peACF) and periodic partial autocorrelation function (pePACF), offering robust tools for analyzing infinite-variance processes. Two practical applications of the proposed measures are explored: a portmanteau test for testing dependence in cyclostationary series and a method for order identification in periodic autoregressive (PAR) and periodic moving average (PMA) models with infinite variance. Both applications demonstrate the potential of the new tools, with simulations validating their efficiency. The methodology is further illustrated through the analysis of real-world air pollution data, which showcases its practical utility. The results indicate that the proposed measures based on FLOC provide reliable and efficient techniques for analyzing cyclostationary processes with heavy-tailed distributions.

2604.13656 2026-04-16 cs.LG cs.AI math.ST stat.ML stat.TH

Ordinary Least Squares is a Special Case of Transformer

Xiaojun Tan, Yuchen Zhao

Details
English abstract

The statistical essence of the Transformer architecture has long remained elusive: Is it a universal approximator, or a neural network version of known computational algorithms? Through rigorous algebraic proof, we show that the latter better describes Transformer's basic nature: Ordinary Least Squares (OLS) is a special case of the single-layer Linear Transformer. Using the spectral decomposition of the empirical covariance matrix, we construct a specific parameter setting where the attention mechanism's forward pass becomes mathematically equivalent to the OLS closed-form projection. This means attention can solve the problem in one forward pass, not by iterating. Building upon this prototypical case, we further uncover a decoupled slow and fast memory mechanism within Transformers. Finally, the evolution from our established linear prototype to standard Transformers is discussed. This progression facilitates the transition of the Hopfield energy function from linear to exponential memory capacity, thereby establishing a clear continuity between modern deep architectures and classical statistical inference.
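One way to see how a linear attention pass can reproduce an OLS prediction is the numerical sketch below. It is an illustration under stated assumptions and not necessarily the authors' parameter setting: the simulated data, the shared query/key projection `W`, and the choice of the responses as attention values are all hypothetical. The projection is built from the spectral decomposition of $X^\top X$ so that the query-key inner product equals $x_q^\top (X^\top X)^{-1} x_i$.

```python
import numpy as np

# Hypothetical regression data and a single query point.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
x_query = rng.normal(size=d)

# Spectral decomposition of the unnormalized empirical covariance X^T X.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)

# Shared query/key projection W = Lambda^{-1/2} V^T, chosen so that
# (W x_q) . (W x_i) = x_q^T (X^T X)^{-1} x_i.
W = np.diag(eigvals ** -0.5) @ eigvecs.T

query = W @ x_query   # shape (d,)
keys = X @ W.T        # shape (n, d), one key per training row
values = y            # shape (n,), the responses act as values

# Linear attention: raw dot-product scores, no softmax normalization.
attention_scores = keys @ query
attention_output = attention_scores @ values

# Reference: prediction from the least-squares solution.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_prediction = x_query @ beta_ols

print(attention_output, ols_prediction)
```

The two numbers agree up to floating-point error because the attention output equals $x_q^\top (X^\top X)^{-1} X^\top y$, which is the OLS closed form evaluated at the query, obtained in a single forward pass.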

2604.13598 2026-04-16 cs.LG stat.ME

Enhancing Reinforcement Learning for Radiology Report Generation with Evidence-aware Rewards and Self-correcting Preference Learning

Qin Zhou, Guoyan Liang, Qianyi Yang, Jingyuan Chen, Sai Wu, Chang Yao, Zhe Wang

Comments 13 pages, 4 figures, ACL2026-main

Details
English abstract

Recent reinforcement learning (RL) approaches have advanced radiology report generation (RRG), yet two core limitations persist: (1) report-level rewards offer limited evidence-grounded guidance for clinical faithfulness; and (2) current methods lack an explicit self-improving mechanism to align with clinical preference. We introduce clinically aligned Evidence-aware Self-Correcting Reinforcement Learning (ESC-RL), comprising two key components. First, a Group-wise Evidence-aware Alignment Reward (GEAR) delivers group-wise, evidence-aware feedback. GEAR reinforces consistent grounding for true positives, recovers missed findings for false negatives, and suppresses unsupported content for false positives. Second, a Self-correcting Preference Learning (SPL) strategy automatically constructs a reliable, disease-aware preference dataset from multiple noisy observations and leverages an LLM to synthesize refined reports without human supervision. ESC-RL promotes clinically faithful, disease-aligned reward and supports continual self-improvement during training. Extensive experiments on two public chest X-ray datasets demonstrate consistent gains and state-of-the-art performance.

2604.13563 2026-04-16 math.NA cs.NA math.ST stat.TH

Covariance-Informed Subspace: an Adaptive Gradient-Free Input Dimension Reduction Method for Bayesian Inference

Nadège Polette, Olivier Le Maître, Pierre Sochala, Alexandrine Gesret

Details
English abstract

This paper addresses the challenge of dimension reduction (DR) in Bayesian inference of high-resolution two- or three-dimensional fields, where a priori parametrizations require a large number of terms. The underlying idea is common to state-of-the-art methods in which the parameter space is decomposed into two subspaces, one informed by the likelihood and one constrained by the prior. DR techniques generally use gradient information from the log-likelihood to derive the corresponding subspaces. However, the gradient may be unavailable or expensive to compute accurately, for instance in the case of simulation-based inference. Inspired by approaches based on likelihood-informed subspaces, we develop a new DR method tailored for settings where gradient computation is not feasible. More specifically, we propose a gradient-free indicator for determining whether a direction is informed by the data. This indicator is derived from the posterior-to-prior covariance ratio introduced in Spantini et al. (2015). We show that, in the linear Gaussian case, this indicator combined with an approximate likelihood leads to a better posterior approximation. The method is then extended to nonlinear cases, and strategies to approximate the posterior covariance are detailed. We demonstrate the effectiveness of this DR through two high-dimensional inference problems arising from groundwater and atmospheric applications.

2604.13539 2026-04-16 stat.AP

Relative plausibility versus probabilism: A level-of-analysis error in juridical proof

Stanley E. Lazic

Details
English abstract

Debates about juridical proof are often framed as a conflict between probabilistic approaches and relative plausibility theory (RPT). This paper argues that this opposition rests on a level-of-analysis error. Drawing on Marr's distinction between levels of analysis, we show that RPT and probabilistic approaches operate at different conceptual levels and are therefore compatible rather than competing theories. RPT provides a computational-level description of juridical proof, characterizing the task of comparing explanations in light of the evidence and assessing whether a standard of proof has been met. Probabilistic approaches supply algorithmic-level accounts that specify how such comparative assessments can be represented and computed. When plausibility judgments satisfy minimal coherence conditions, relative plausibility corresponds to posterior odds. Recognizing this distinction clarifies longstanding disputes and highlights the complementary roles of explanation and probability in legal reasoning.

2604.13525 2026-04-16 stat.ML cs.LG math.OC

Robust Low-Rank Tensor Completion based on M-product with Weighted Correlated Total Variation and Sparse Regularization

Biswarup Karmakar, Ratikanta Behera

Comments 32 pages

Details
English abstract

The robust low-rank tensor completion problem addresses the challenge of recovering corrupted high-dimensional tensor data with missing entries, outliers, and sparse noise commonly found in real-world applications. Existing methodologies have encountered fundamental limitations due to their reliance on uniform regularization schemes, particularly the tensor nuclear norm and $\ell_1$ norm regularization approaches, which indiscriminately apply equal shrinkage to all singular values and sparse components, thereby compromising the preservation of critical tensor structures. The proposed tensor weighted correlated total variation (TWCTV) regularizer addresses these shortcomings through an $M$-product framework that combines a weighted Schatten-$p$ norm on gradient tensors for low-rankness with smoothness enforcement and weighted sparse components for noise suppression. The proposed weighting scheme adaptively reduces the thresholding level to preserve both dominant singular values and sparse components, thus improving the reconstruction of critical structural elements and nuanced details in the recovered signal. Through a systematic algorithmic approach, we introduce an enhanced alternating direction method of multipliers (ADMM) that offers both computational efficiency and theoretical substantiation, with convergence properties comprehensively analyzed within the $M$-product framework. Comprehensive numerical evaluations across image completion, denoising, and background subtraction tasks validate the superior performance of this approach relative to established benchmark methods.

2604.13484 2026-04-16 stat.ML cs.LG

Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization

Sida Liu, Yangzi Guo, Mingyuan Wang

Details
English abstract

Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g. linear projection or a neural network) and cluster the data based on the resulting features (e.g. under a Gaussian Mixture Model framework). The framework searches for the dimension reduction parameters and the optimal clusters by traversing a manifold, using Gradient Manifold Optimization. The proposed framework is exemplified with a Gaussian Mixture Model as one simple but efficient example, in a process that is somewhat similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised training of simulated data as well as a benchmark image dataset (i.e. MNIST). The experimental results indicate that our algorithm has better performance than popular clustering algorithms from the literature.

2604.13478 2026-04-16 math.OC cs.CE econ.GN q-fin.EC stat.AP

Deepbullwhip: An Open-Source Simulation and Benchmarking for Multi-Echelon Bullwhip Analyses

Mansur M. Arief

Details
English abstract

The bullwhip effect remains operationally persistent despite decades of analytical research. Two computational deficiencies hinder progress: the absence of modular open-source simulation tools for multi-echelon inventory dynamics with asymmetric costs, and the lack of a standardized benchmarking protocol for comparing mitigation strategies across shared metrics and datasets. This paper introduces deepbullwhip, an open-source Python package that integrates a simulation engine for serial supply chains (with pluggable demand generators, ordering policies, and cost functions via abstract base classes, and a vectorized Monte Carlo engine achieving 50 to 90 times speedup) with a registry-based benchmarking framework shipping a curated catalog of ordering policies, forecasting methods, six bullwhip metrics, and demand datasets including WSTS semiconductor billings. Five sets of experiments on a four-echelon semiconductor chain demonstrate cumulative amplification of 427x (Monte Carlo mean across 1,000 paths), a stochastic filtering phenomenon at upstream tiers (CV = 0.01), super-exponential lead time sensitivity, and scalability to 20.8 million simulation cells in under 7 seconds. Benchmark experiments reveal a 155x disparity between synthetic AR(1) and real WSTS bullwhip severity under the Order-Up-To policy, and quantify the BWR-NSAmp tradeoff across ordering policies, demonstrating that no single metric captures policy quality.

2604.13470 2026-04-16 cs.LG stat.ML

Universality of Gaussian-Mixture Reverse Kernels in Conditional Diffusion

Nafiz Ishtiaque, Syed Arefinul Haque, Kazi Ashraful Alam, Fatima Jahara

Comments 10+19 pages

Details
English abstract

We prove that conditional diffusion models whose reverse kernels are finite Gaussian mixtures with ReLU-network logits can approximate suitably regular target distributions arbitrarily well in context-averaged conditional KL divergence, up to an irreducible terminal mismatch that typically vanishes with increasing diffusion horizon. A path-space decomposition reduces the output error to this mismatch plus per-step reverse-kernel errors; assuming each reverse kernel factors through a finite-dimensional feature map, each step becomes a static conditional density approximation problem, solved by composing Norets' Gaussian-mixture theory with quantitative ReLU bounds. Under exact terminal matching the resulting neural reverse-kernel class is dense in conditional KL.

2604.13446 2026-04-16 physics.ao-ph stat.AP

Modeling the Sea-Level Change from U.S. Vehicle Emissions

Tony Wong

Details
English abstract

Recent U.S. Environmental Protection Agency (EPA) analyses have argued that greenhouse gas emissions from U.S. on-road vehicles contribute negligibly to global mean sea-level rise (GMSLR). Here, I replicate and extend the EPA's modeling framework using the FaIR climate model coupled with the BRICK sea-level model, incorporating a probabilistic weighting approach and a longer model timescale to better represent joint climate-sea-level uncertainty. In addition to the baseline SSP2-4.5 scenario and an EPA-consistent emissions reduction case, I examine alternative scenarios reflecting stalled technological progress and a counterfactual pre-regulation vehicle fleet. Results reproduce EPA estimates of approximately 1-2 cm of GMSLR reduction by 2100 under vehicle emissions mitigation but show that these differences grow substantially over multi-century timescales, exceeding 6 cm by 2200. Downscaling to U.S. coastlines reveals larger local effects, particularly along the Gulf of Mexico Coast. These findings highlight the long-term and regionally amplified benefits of emissions reductions from the transportation sector.

2604.13406 2026-04-16 stat.ME

Leveraging machine learning to estimate individualized treatment effects in cluster-randomized trials

Changjun Li, Xi Fang, Michael O. Harhay, Andrew B. Forbes, F. Perry Wilson, Guangyu Tong, Fan Li

Details
English abstract

Cluster-randomized trials (CRTs) are widely used to evaluate interventions delivered at the clinic, practice, or community level. Although standard analyses typically target average treatment effects, such summaries mask potentially meaningful variation in treatment response across individuals and clusters. This work addresses the estimation of conditional average treatment effects (CATEs) for continuous outcomes in two-arm parallel CRTs by defining causal estimands that incorporate both individual- and cluster-level baseline covariates while marginalizing over unobserved cluster heterogeneity. To estimate these quantities, we develop a unified framework based on mixed-effects machine learning, integrating and extending a range of existing approaches, including Bayesian additive regression trees with random effects, multilevel Bayesian causal forests, mixed-effects random forests, several mixed-effects gradient boosting procedures, and generalized additive mixed models, while incorporating cluster-specific random intercepts to account for within-cluster dependence. We evaluate these methods across diverse simulation scenarios and demonstrate their use in the Task Shifting and Blood Pressure Control in Ghana CRT, which investigates strategies for improving hypertension management. Drawing on these investigations, we provide practical guidance for applying mixed-effects machine learning to quantify treatment-effect heterogeneity in CRTs, together with reproducible code that enables investigators to implement all methods within a coherent workflow.

2604.13393 2026-04-16 math.OC cs.LG stat.ML

A short proof of near-linear convergence of adaptive gradient descent under fourth-order growth and convexity

Damek Davis, Dmitriy Drusvyatskiy

Details
English abstract

Davis, Drusvyatskiy, and Jiang showed that gradient descent with an adaptive stepsize converges locally at a nearly-linear rate for smooth functions that grow at least quartically away from their minimizers. The argument is intricate, relying on monitoring the performance of the algorithm relative to a certain manifold of slow growth -- called the ravine. In this work, we provide a direct Lyapunov-based argument that bypasses these difficulties when the objective is in addition convex and has a unique minimizer. As a byproduct of the argument, we obtain a more adaptive variant than the original algorithm with encouraging numerical performance.

2604.13352 2026-04-16 stat.AP

A Machine Learning Framework for Uncertainty-Calibrated Capability Decision under Finite Samples

Fei Jiang, Lei Yang

Comments 18 pages, 4 figures and 10 tables

Details
English abstract

Process capability indices such as $C_{pk}$ are widely used for manufacturing decisions, yet are typically applied via deterministic thresholding of finite-sample estimates, ignoring uncertainty and leading to unstable outcomes near the capability boundary. This paper reformulates capability approval as a decision-risk calibration problem, quantifying the probability of misclassification under finite-sample variability. We propose an uncertainty-aware hybrid framework that combines a statistically grounded baseline with a data-driven residual learner, where the baseline provides an interpretable approximation of failure risk and the residual captures systematic deviations due to non-normality, measurement effects, and finite-sample uncertainty. A nested Monte Carlo procedure is introduced to approximate oracle decision risk under controlled synthetic settings, enabling direct evaluation of probabilistic calibration. Empirical results show that conventional approaches exhibit substantial miscalibration in near-threshold regimes, while the proposed framework provides a structured and uncertainty-aware representation of decision risk that remains stable under stricter leak-free evaluation. The framework is simple, compatible with existing capability metrics, and readily deployable in industrial analytics systems.

2604.13341 2026-04-16 stat.ME

Newton's Algorithm as a Gradient Flow: A Geometric Framework for Recursive Mixture Estimation

Bernardo Flores

Details
English abstract

Bayesian nonparametric mixture models provide a flexible framework for data analysis but are often hindered by the computational expense of traditional inference methods like MCMC. A fast, recursive algorithm proposed by Newton (2002) offers a practical alternative, yet its formal connection to Bayesian inference and its theoretical properties remain only partially understood. This paper reveals a new geometric interpretation of this class of predictive recursions. We demonstrate that Newton's recursion is a discrete-time approximation of a gradient flow on the space of probability measures governed by the Fisher-Rao geometry, providing the first rigorous dynamical characterisation of this family of estimators. This geometric perspective provides a principled theoretical foundation for studying these recursions: it clarifies their convergence behaviour, situates them within the variational Bayes literature, and yields a systematic basis for generalisation by modifying the underlying geometry and discretisation. In contrast to approaches that construct gradient flows from a prescribed variational objective, this work proceeds in the reverse direction: beginning from an existing recursive estimator and uncovering the variational problem it implicitly solves, it opens a pathway for the systematic analysis and extension of a broad class of sequential Bayesian estimators.
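
For readers unfamiliar with the recursion discussed in this abstract, the following is a minimal sketch of Newton's predictive recursion on a uniform grid. It is not taken from the paper: the Gaussian kernel, the weight sequence w_i = 1/(i+1), the uniform starting density, and the function names are all illustrative assumptions.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Gaussian kernel k(x | theta) with theta as the mean (assumed kernel)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def predictive_recursion(data, grid, kernel=normal_pdf):
    """One pass of the predictive recursion for a mixing density on a uniform grid.

    Each observation x_i updates the current estimate f via
        f <- (1 - w_i) * f + w_i * k(x_i | theta) * f / integral(k(x_i | .) * f),
    with integrals approximated by Riemann sums over the grid.
    """
    step = grid[1] - grid[0]
    # Uniform initial guess that integrates to one over the grid.
    density = [1.0 / (step * len(grid))] * len(grid)
    for i, x in enumerate(data, start=1):
        weight = 1.0 / (i + 1)  # assumed weight sequence, decaying to zero
        lik = [kernel(x, theta) for theta in grid]
        norm = sum(l * f for l, f in zip(lik, density)) * step
        density = [(1.0 - weight) * f + weight * l * f / norm
                   for l, f in zip(lik, density)]
    return density
```

Because every update is a convex combination of two densities that each integrate to one, the estimate keeps unit mass on the grid. The result depends on the order of the data, which is a known property of this kind of single-pass recursion.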

2604.13295 2026-04-16 cs.LG math.PR stat.ML

Some Theoretical Limitations of t-SNE

Rupert Li, Elchanan Mossel

Comments 19 pages, 7 figures

Details
English abstract

t-SNE has gained popularity as a dimension reduction technique, especially for visualizing data. It is well-known that all dimension reduction techniques may lose important features of the data. We provide a mathematical framework for understanding this loss for t-SNE by establishing a number of results in different scenarios showing how important features of data are lost by using t-SNE.

2604.13274 2026-04-16 math.ST cs.CR stat.TH

Sequential Change Detection for Multiple Data Streams with Differential Privacy

Lixing Zhang, Liyan Xie, Ruizhi Zhang

Comments Accepted to the 2026 IEEE International Symposium on Information Theory (ISIT 2026)

Details
English abstract

Sequential change-point detection seeks to rapidly identify distributional changes in streaming data while controlling false alarms. Existing multi-stream detection methods typically rely on non-private access to raw observations or intermediate statistics, limiting their usage in privacy-sensitive settings. We study sequential change-point detection for multiple data streams under differential privacy constraints. We consider multiple independent streams undergoing a synchronized change at an unknown time and in an unknown subset of streams, and propose DP-SUM-CUSUM, a differentially private detection procedure based on the summation of per-stream CUSUM statistics with calibrated Laplace noise injection. We show that DP-SUM-CUSUM satisfies sequential $\varepsilon$-differential privacy and derive bounds on the average run length to false alarm and the worst-case average detection delay, explicitly characterizing the privacy-efficiency tradeoff. A truncation-based extension is also presented to handle distributional shifts with unbounded log-likelihood ratios. Simulations and experiments on an Internet of Things (IoT) botnet dataset validate the proposed approach.
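
The general idea of summing per-stream CUSUM statistics and adding Laplace noise can be sketched as follows. This is only an illustration of that idea, not the paper's DP-SUM-CUSUM procedure: the function names, the way noise is injected at every step, and the free `noise_scale` parameter are assumptions, and no privacy guarantee is claimed for this sketch.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) draw: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum_cusum(llr_streams, threshold, noise_scale, seed=0):
    """Return the first time the noisy summed CUSUM statistic reaches threshold.

    llr_streams[t][k] is the log-likelihood ratio of stream k at time t + 1.
    Returns None if the threshold is never reached.
    """
    rng = random.Random(seed)
    stats = None
    for t, llrs in enumerate(llr_streams, start=1):
        if stats is None:
            stats = [0.0] * len(llrs)
        # Standard CUSUM recursion, applied to each stream separately.
        stats = [max(0.0, w + llr) for w, llr in zip(stats, llrs)]
        # Hypothetical noise injection: perturb the summed statistic.
        noisy_total = sum(stats) + laplace_noise(noise_scale, rng)
        if noisy_total >= threshold:
            return t
    return None
```

In a real deployment the noise scale would have to be calibrated to the sensitivity of the statistic and the privacy budget; here it is left as a plain parameter.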

2604.13265 2026-04-16 stat.ME stat.AP

Efficient estimation of cumulative incidence curves via data fusion with surrogates: application to integrated analysis of vaccine trial and immunobridging data

Pan Zhao, Peter B. Gilbert, Oliver Dukes, Bo Zhang

Details
English abstract

Refined vaccine regimens containing variant-matched inserts are often authorized based on historical phase 3 efficacy trials together with immunobridging studies. Phase 3 trials are essential for establishing immune biomarkers that reliably predict disease risk or vaccine efficacy against clinical endpoints. Once such immune correlates are identified, updated vaccine regimens can be approved through immunobridging designs that compare the immunogenicity of the updated regimen to that of an already-approved vaccine. We develop methods of inference for the counterfactual cumulative incidence curve using participant-level data from both a historical vaccine efficacy trial and an immunobridging study. We further extend these methods to pathogens with multiple serotypes -- such as dengue virus and influenza -- by estimating cause-specific cumulative incidence curves. We describe the identification assumptions, propose efficient and multiply robust estimators, and assess their finite-sample performance through simulation studies. We then apply the proposed methods to (1) estimating the hypothetical cumulative incidence curve for a bivalent mRNA booster and (2) testing a key assumption of no controlled direct effects, using data from the COVID-19 Variant Immunologic Landscape (COVAIL) Trial, a multistage randomized clinical study evaluating the safety and immunogenicity of a second COVID-19 booster dose.

2604.13264 2026-04-16 stat.ME stat.AP

Estimating effect thresholds and beyond: A flexible framework for multivariate alert detection

Lucia Ameis, Niklas Hagemann, Kathrin Möllenhoff

Comments 20 pages

Details
English abstract

Evaluating the influence of continuous covariates, like exposure time or dose, on a response variable is a pivotal objective in the assessment of a compound's effect, particularly when determining toxicity in pre-clinical research or pharmacokinetics in clinical trials. The determination of an alert, such as the ED50 value, at which a pre-specified threshold of the response variable is crossed, is an important tool for the evaluation process. In practice, response data might be available for combinations of different covariates and the alert depending on both is of interest. In this case, it is crucial to use all available information and extrapolate between cases to ensure the optimal utilization of the data. In this paper, we introduce a parametric approach that allows alerts to be estimated in a multidimensional setting. For time-dose-response data, for instance, alert doses at a given time can be determined, even when there are no measurements available at that exact time. Likewise, it allows estimation of alert times for a given dose. More generally, the method makes it possible to characterize the complete alert relationship between covariates by leveraging all available data. This is achieved by fitting a parametric model and constructing either a confidence band for the two-dimensional curve given for example a fixed time or dose or by constructing a confidence plane for the three-dimensional model fit. The initial model fit is achieved by the flexible framework of Generalized Additive Models for Location, Scale and Shape (GAMLSS), which offers the possibility to account for a plethora of complex three-dimensional data structures. We demonstrate the validity of our approach through a simulation study and present an application to data from a study investigating the relevance of the exposure duration on cytotoxicity in primary human hepatocytes.

2604.13253 2026-04-16 cs.LG stat.ME stat.ML

Bias-Corrected Adaptive Conformal Inference for Multi-Horizon Time Series Forecasting

Ankit Lade, Sai Krishna J., Indar Kumar

Comments 14 pages, 3 figures, 2 tables. Preprint

Details
English abstract

Adaptive Conformal Inference (ACI) provides distribution-free prediction intervals with asymptotic coverage guarantees for time series under distribution shift. However, ACI only adapts the quantile threshold -- it cannot shift the interval center. When a base forecaster develops persistent bias after a regime change, ACI compensates by widening intervals symmetrically, producing unnecessarily conservative bands. We propose Bias-Corrected ACI (BC-ACI), which augments standard ACI with an online exponentially weighted moving average (EWM) estimate of forecast bias. BC-ACI corrects nonconformity scores before quantile computation and re-centers prediction intervals, addressing the root cause of miscalibration rather than its symptom. An adaptive dead-zone threshold suppresses corrections when estimated bias is indistinguishable from noise, ensuring no degradation on well-calibrated data. In controlled experiments across 688 runs spanning two base models, four synthetic regimes, and three real datasets, BC-ACI reduces Winkler interval scores by 13-17% under mean and compound distribution shifts (Wilcoxon p < 0.001) while maintaining equivalent performance on stationary data (ratio 1.002x). We provide finite-sample analysis showing that coverage guarantees degrade gracefully with bias estimation error.
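
The mechanism described in this abstract can be illustrated with a small sketch that pairs the standard ACI update of the miscoverage level with an exponentially weighted estimate of forecast bias. It is a simplified reading of the abstract, not the authors' implementation: the function names, the fixed (rather than adaptive) `dead_zone`, the absolute-residual score, and the clipping of the quantile level are all assumptions.

```python
import math


def empirical_quantile(values, level):
    """Conservative empirical quantile; an empty history gives an infinite radius."""
    if not values:
        return float("inf")
    level = min(max(level, 0.0), 1.0)  # simplification: clip instead of widening to infinity
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(level * len(ordered)) - 1))
    return ordered[index]


def bias_corrected_aci(forecasts, outcomes, alpha=0.1, gamma=0.01,
                       ewma_rate=0.1, dead_zone=0.0):
    """Return one (lower, upper) interval per step.

    alpha is the target miscoverage, gamma the ACI step size, ewma_rate the
    smoothing rate of the bias estimate, and dead_zone a fixed threshold below
    which the bias estimate is ignored (hypothetical stand-in for the paper's
    adaptive threshold).
    """
    alpha_t = alpha
    bias = 0.0
    scores = []
    intervals = []
    for forecast, outcome in zip(forecasts, outcomes):
        shift = bias if abs(bias) > dead_zone else 0.0
        center = forecast + shift  # re-centered interval
        radius = empirical_quantile(scores, 1.0 - alpha_t)
        lower, upper = center - radius, center + radius
        intervals.append((lower, upper))
        miss = 0.0 if lower <= outcome <= upper else 1.0
        alpha_t += gamma * (alpha - miss)      # standard ACI update
        scores.append(abs(outcome - center))   # bias-corrected score
        bias = (1.0 - ewma_rate) * bias + ewma_rate * (outcome - forecast)
    return intervals
```

Setting `ewma_rate=0.0` keeps the bias estimate at zero and reduces the sketch to plain ACI with symmetric intervals around the raw forecast, which is a convenient baseline for comparison.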

2604.13218 2026-04-16 stat.ML cs.AI cs.LG math.ST stat.TH

Identifiability of Potentially Degenerate Gaussian Mixture Models With Piecewise Affine Mixing

Danru Xu, Sébastien Lachapelle, Sara Magliacane

Comments 49 pages, 10 figures, AISTATS 2026

Details
English abstract

Causal representation learning (CRL) aims to identify the underlying latent variables from high-dimensional observations, even when variables are dependent on each other. We study this problem for latent variables that follow a potentially degenerate Gaussian mixture distribution and that are only observed through the transformation via a piecewise affine mixing function. We provide a series of progressively stronger identifiability results for this challenging setting in which the probability density functions are ill-defined because of the potential degeneracy. For identifiability up to permutation and scaling, we leverage a sparsity regularization on the learned representation. Based on our theoretical results, we propose a two-stage method to estimate the latent variables by enforcing sparsity and Gaussianity in the learned representations. Experiments on synthetic and image data highlight our method's effectiveness in recovering the ground-truth latent variables.

2604.13188 2026-04-16 econ.EM stat.AP

Is Productivity Advantage of Cities Really Down To Mean and Variance?

Vladislav Morozov, Andrea Sy

Details
English abstract

Firms in denser areas are more productive, a pattern attributed to agglomeration economies and firm selection. To disentangle these two channels, the popular approach of Combes et al. (2012, ECTA) critically assumes that total factor productivity (TFP) distributions between denser and less dense areas are the same up to mean, variance, and left-tail truncation. We empirically validate this assumption using Spanish administrative firm-level data and recent econometric methods adapted to noisy TFP estimates. Our results find that TFP distributions are indeed statistically identical up to these parameters, validating the use of such productivity decompositions. Furthermore, using only the mean and variance is sufficient to capture differences for all sectors. Accordingly, the productivity advantage of cities may be entirely due to agglomeration rather than stronger selection, suggesting that policymakers should focus on policies targeting agglomeration. Finally, our approach extends to related contexts like differences in worker skill distributions.

2604.13130 2026-04-16 cs.LG stat.ML

Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

Saumya Goyal, Rohith Rongali, Ritabrata Ray, Barnabás Póczos

Details
English abstract

We study learning to learn for regression problems through the lens of hyperparameter tuning. We propose the Langevin Gradient Descent Algorithm (LGD), which approximates the mean of the posterior distribution defined by the loss function and regularizer of a convex regression task. We prove the existence of an optimal hyperparameter configuration for which the LGD algorithm achieves the Bayes' optimal solution for squared loss. Subsequently, we study generalization guarantees on meta-learning optimal hyperparameters for the LGD algorithm from a given set of tasks in the data-driven setting. For a number of parameters $d$ and hyperparameter dimension $h$, we show a pseudo-dimension bound of $O(dh)$, up to logarithmic terms under mild assumptions on LGD. This matches the dimensional dependence of the bounds obtained in prior work for the elastic net, which only allows for $h=2$ hyperparameters, and extends their bounds to regression on convex loss. Finally, we show empirical evidence of the success of LGD and the meta-learning procedure for few-shot learning on linear regression using a few synthetically created datasets.

2604.11165 2026-04-16 stat.ML cs.AI cs.LG math.ST stat.TH

Cost-optimal Sequential Testing via Doubly Robust Q-learning

Doudou Zhou, Yiran Zhang, Dian Jin, Yingye Zheng, Lu Tian, Tianxi Cai

Details
English abstract

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.

2603.24654 2026-04-16 quant-ph cs.LG stat.ML

Spectral methods: crucial for machine learning, natural for quantum computers?

Vasilis Belis, Joseph Bowles, Rishabh Gupta, Evan Peters, Maria Schuld

Comments 25 pages, 8 figures

Details
English abstract

This article presents an argument for why quantum computers could unlock new methods for machine learning. We argue that spectral methods, in particular those that learn, regularise, or otherwise manipulate the Fourier spectrum of a machine learning model, are often natural for quantum computers. For example, if a generative machine learning model is represented by a quantum state, the Quantum Fourier Transform allows us to manipulate the Fourier spectrum of the state using the entire toolbox of quantum routines, an operation that is usually prohibitive for classical models. At the same time, spectral methods are surprisingly fundamental to machine learning: A spectral bias has recently been hypothesised to be the core principle behind the success of deep learning; support vector machines have been known for decades to regularise in Fourier space, and convolutional neural nets build filters in the Fourier space of images. Could, then, quantum computing open fundamentally different, much more direct and resource-efficient ways to design the spectral properties of a model? We discuss this potential in detail here, hoping to stimulate a direction in quantum machine learning research that puts the question of "why quantum?" first.

2603.20968 2026-04-16 cs.IT cs.CR math.IT math.ST stat.TH

Composition Theorems for Multiple Differential Privacy Constraints

Cemre Cadir, Salim Najib, Yanina Y. Shkel

Comments Extended version of article in 2026 IEEE International Symposium on Information Theory (ISIT 2026)

Details
English abstract

The exact composition of mechanisms for which two differential privacy (DP) constraints hold simultaneously is studied. The resulting privacy region admits an exact representation as a mixture over compositions of mechanisms of heterogeneous DP guarantees, yielding a framework that naturally generalizes to the composition of mechanisms for which any number of DP constraints hold. This result is shown through a structural lemma for mixtures of binary hypothesis tests. Lastly, the developed methodology is applied to approximate $f$-DP composition.

2603.13848 2026-04-16 stat.ME

A family of divergence-based correlation measures for contingency tables under bivariate normality

Wataru Urasaki

Details
English abstract

We propose a family of association measures for two-way contingency tables whose latent distribution can be assumed to be bivariate normal. When this assumption holds, the power-divergence measuring departure from independence can be approximated in closed form as a function of the latent correlation coefficient. By inverting this relationship, we obtain a family of measures $ρ_{(λ)}$, indexed by a scalar parameter $-1 \leq λ\leq 1$, that directly approximates the latent correlation. Special cases include the informational measure of correlation proposed by Linfoot (1957) at $λ= 0$ and Pearson's contingency coefficient $C$ at $λ= 1$. Additionally, we derive asymptotic distributions via the delta method and construct two families of confidence intervals. Simulation studies confirm that the proposed measures approximate the true latent correlation more faithfully than conventional divergence-based measures, and that they successfully distinguish between weak and moderate associations where existing measures tend to give indistinguishable values. Compared with the polychoric correlation coefficient, the proposed measures are computed several thousand times faster and remain numerically stable even when the latent correlation is close to one.

2603.13464 2026-04-16 stat.ME

Modeling Heterogeneous Mediation Effects in Survival Analysis via an Interpretable M-Learner Framework

Xingyu Li, Qing Liu, Xun Jiang, Hong Amy Xia, Brian P. Hobbs, Peng Wei

Details
English abstract

Mediation analysis is a useful tool to evaluate surrogate endpoints in clinical trials. We propose a novel method, the M-survival learner, for estimating heterogeneous indirect treatment effects in the presence of censored outcomes. The proposed approach enables the identification of interpretable patient subgroups characterized by distinct mediation pathways. To distinguish heterogeneous from homogeneous mediation effects, we introduce a new statistical criterion specifically designed for survival data. The method provides a principled framework for evaluating heterogeneity in surrogate biomarker performance across patient populations, offering evidence to support accelerated drug approval. By explicitly assessing subgroup-specific surrogate validity, the proposed approach addresses key regulatory concerns regarding the reliability of surrogate endpoints. We further establish theoretical properties of the method to justify its statistical guarantees. We apply the approach to data from a Phase III randomized clinical trial of HIV treatment, demonstrating its practical utility in real-world settings. Extensive simulation studies further evaluate and demonstrate its finite-sample performance.

2602.09595 2026-04-16 stat.ME

Sharp Bounds for Treatment Effect Generalization under Outcome Distribution Shift

Amir Asiaee, Samhita Pal, Cole Beck, Jared D. Huling

Details
Journal ref
5th Conference on Causal Learning and Reasoning (CLeaR), 2026
English abstract

Generalizing treatment effects from a randomized trial to a target population requires the assumption that potential outcome distributions are invariant across populations after conditioning on observed covariates. This assumption fails when unmeasured effect modifiers are distributed differently between trial participants and the target population. We develop a sensitivity analysis framework that bounds how much conclusions can change when this transportability assumption is violated. Our approach constrains the likelihood ratio between target and trial outcome densities by a scalar parameter $Λ\geq 1$, with $Λ= 1$ recovering standard transportability. For each $Λ$, we derive sharp bounds on the target average treatment effect -- the tightest interval guaranteed to contain the true effect under all data-generating processes compatible with the observed data and the sensitivity model. We show that the optimal likelihood ratios have a simple threshold structure, leading to a closed-form greedy algorithm that requires only sorting trial outcomes and redistributing probability mass. The resulting estimator runs in $O(n \log n)$ time and is consistent under standard regularity conditions. Simulations demonstrate that our bounds achieve nominal coverage when the true outcome shift falls within the specified $Λ$, provide substantially tighter intervals than worst-case bounds, and remain informative across a range of realistic violations of transportability.
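
The "sort and redistribute probability mass" idea mentioned in this abstract can be illustrated on a much simpler problem than the one the paper treats. The sketch below bounds a single reweighted mean when each observation's weight must lie in [1/Λ, Λ] and the weights must average to one. These constraints, the function names, and the absence of covariates and treatment arms are all assumptions made for illustration; this is not the paper's estimator.

```python
def upper_bound_mean(outcomes, lam):
    """Largest reweighted mean with weights in [1/lam, lam] that average to one.

    Greedy solution of the linear program: give every point the minimum
    weight, then spend the remaining mass on the largest outcomes first.
    """
    if lam < 1.0:
        raise ValueError("lam must be at least 1")
    n = len(outcomes)
    ordered = sorted(outcomes, reverse=True)  # O(n log n) sort dominates the cost
    weights = [1.0 / lam] * n
    budget = n - n / lam          # mass still to be distributed
    max_raise = lam - 1.0 / lam   # most that a single weight can be raised
    for i in range(n):
        if budget <= 0.0:
            break
        add = min(max_raise, budget)
        weights[i] += add
        budget -= add
    return sum(w * y for w, y in zip(weights, ordered)) / n


def lower_bound_mean(outcomes, lam):
    """Smallest reweighted mean, obtained by symmetry from the upper bound."""
    return -upper_bound_mean([-y for y in outcomes], lam)
```

With `lam = 1` both functions return the ordinary sample mean, mirroring the abstract's remark that $Λ = 1$ recovers standard transportability. For the outcomes 0, 1, 2, 3 and `lam = 3`, the bounds work out to 0.5 and 2.5 around the sample mean of 1.5.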

2512.23748 2026-04-16 cs.LG math.PR stat.ML

A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios

Haley Rosso, Talea Mayo

Details
English abstract

For complex simulation problems, inferring parameters often precludes the use of classical likelihood-based techniques due to intractable likelihoods. Simulation-based inference (SBI) methods offer a likelihood-free approach to directly learn posterior distributions $p(\boldsymbol{\theta} \mid \mathbf{x}_{\mathrm{obs}})$ from simulator outputs. Recently, diffusion models have emerged as promising tools for SBI, addressing limitations of earlier neural methods such as neural likelihood/posterior estimation and normalizing flows. This review examines diffusion-based SBI from first principles to applications, emphasizing robustness in three non-ideal data scenarios common to scientific computing: model misspecification (simulator-reality mismatch), unstructured or infinite-dimensional observations, and missing data. We synthesize mathematical foundations and survey eight methods addressing these challenges, such as conditional diffusion for irregular data, guided diffusion for prior adaptation, sequential and factorized approaches for efficiency, and consistency models for fast sampling. Throughout, we maintain consistent notation and emphasize conditions required for accurate posteriors. We conclude with open problems and applications to geophysical uncertainty quantification, where these challenges are acute.

2512.15232 2026-04-16 stat.AP eess.SP

A Blind Source Separation Framework to Monitor Sectoral Power Demand from Grid-Scale Load Measurements

Guillaume Koechlin, Filippo Bovera, Elena Degli Innocenti, Barbara Santini, Alessandro Venturi, Simona Vazio, Piercesare Secchi

Details
English abstract

As demand-side flexibility becomes increasingly necessary to integrate variable renewable energy, understanding electricity demand composition across different grid levels is essential. However, at regional and national scales, visibility into the relative contributions of different consumer categories remains limited due to the complexity and cost of collecting end-use consumption data. To address this challenge, we propose a blind source separation framework to disaggregate open-access high-voltage grid load measurements into sectoral contributions. The approach relies on a constrained variant of non-negative matrix factorization, termed linearly-constrained non-negative matrix factorization (LCNMF), which allows prior information to be incorporated as linear constraints on the factor matrices, thereby providing weak supervision of the separation process. The framework is evaluated using Italian national load data from 2021 to 2023. Results demonstrate the identifiability of residential, services, and industrial load components and provide monthly sectoral consumption estimates consistent with reported statistics. The proposed method is generalizable and applicable to load disaggregation problems across multiple grid scales where disaggregated measurements are unavailable.

2511.20191 2026-04-16 stat.ME stat.CO

A Generalized Additive Partial-Mastery Cognitive Diagnosis Model

Camilo Cárdenas-Hurtado, Sze Ming Lee, Yunxiao Chen, Irini Moustaki

Comments 29 pages, 4 figures. Includes online appendix

Details
English abstract

Cognitive diagnosis models (CDMs) are restricted latent class models widely used to measure attributes of interest in diagnostic assessments across education, psychology, biomedical sciences, and related fields. Partial-mastery CDMs (PM-CDMs) are an important extension of CDMs. They model individuals' status for each attribute as continuous to measure partial mastery levels, thereby relaxing the restrictive discrete-attribute assumption of classical CDMs. As a result, PM-CDMs often yield better fits to real-world data and more refined measurements of the substantive attributes of interest. However, these models inherit strong parametric assumptions from traditional CDMs about item response functions and thus still face a significant risk of model misspecification. This paper proposes a generalized additive PM-CDM (GaPM-CDM) that substantially relaxes the parametric assumptions of PM-CDMs. This proposal leverages model parsimony and interpretability by modeling each item response function as a mixture of nonparametric monotone functions of attributes. A method for estimating GaPM-CDM is developed that combines the marginal maximum likelihood estimator with a sieve approximation of the nonparametric functions. The new model is applicable in both confirmatory and exploratory settings, depending on whether prior knowledge of the relationship between observed variables and attributes is available. The proposed method is evaluated and compared with PM-CDMs through extensive simulation studies and further applied to two measurement problems from educational testing and healthcare research, respectively.

2509.21912 2026-04-16 cs.LG stat.ML

Discrete Guidance Matching: Exact Guidance for Discrete Flow Matching

Zhengyan Wan, Yidong Ouyang, Liyan Xie, Fang Fang, Hongyuan Zha, Guang Cheng

Comments Published as a conference paper at ICLR 2026

Details
English abstract

Guidance provides a simple and effective framework for posterior sampling by steering the generation process towards the desired distribution. When modeling discrete data, existing approaches mostly focus on guidance with the first-order approximation to improve the sampling efficiency. However, such an approximation is inappropriate in discrete state spaces since the approximation error could be large. A novel guidance framework for discrete data is proposed to address this problem: we derive the exact transition rate for the desired distribution given a learned discrete flow matching model, leading to guidance that only requires a single forward pass in each sampling step, significantly improving efficiency. This unified novel framework is general enough, encompassing existing guidance methods as special cases, and it can also be seamlessly applied to the masked diffusion model. We demonstrate the effectiveness of our proposed guidance on energy-guided simulations and preference alignment on text-to-image generation and multimodal understanding tasks. The code is available at https://github.com/WanZhengyan/Discrete-Guidance-Matching.

2508.21025 2026-04-16 math.ST stat.ME stat.TH

Pivotal inference for linear predictions in stationary processes

Holger Dette, Sebastian Kühnert

Comments 33 pages, 3 figures, 2 tables

Details
English abstract

In this paper we develop pivotal inference for the final (FPE) and relative final prediction error (RFPE) of linear forecasts in stationary processes. Our approach is based on a self-normalizing technique and avoids the estimation of the asymptotic variances of the empirical autocovariances. We provide pivotal confidence intervals for the (R)FPE, develop estimates for the minimal order of a linear prediction that is required to obtain a prespecified forecasting accuracy and also propose (pivotal) statistical tests for the hypotheses that the (R)FPE exceeds a given threshold. Additionally, we provide pivotal uncertainty quantification for the commonly used coefficient of determination $R^2$ obtained from a linear prediction based on the past $p \geq 1$ observations and develop new (pivotal) inference tools for the partial autocorrelation, which do not require the assumption of an autoregressive process.

2508.05663 2026-04-16 stat.ML cs.CR cs.LG cs.SY eess.SY

Random Walk Learning and the Pac-Man Attack

Xingran Chen, Parimal Parag, Rohit Bhagat, Zonghong Liu, Salim El Rouayheb

Comments The updated manuscript represents an incomplete version of the work. A substantially updated version will be prepared before further dissemination

Abstract

Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the "Pac-Man" attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the Average Crossing (AC) algorithm, a fully decentralized mechanism for duplicating RWs to prevent RW extinction in the presence of Pac-Man. Our theoretical analysis establishes that (i) the RW population remains almost surely bounded under AC and (ii) RW-based stochastic gradient descent remains convergent under AC, even in the presence of Pac-Man, with a quantifiable deviation from the true optimum. Our extensive empirical results on both synthetic and real-world datasets corroborate our theoretical findings. Furthermore, they uncover a phase transition in the extinction probability as a function of the duplication threshold. We offer theoretical insights by analyzing a simplified variant of the AC, which sheds light on the observed phase transition.
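
The attack described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the function name `simulate_pacman`, the adjacency-list graph format, and the synchronous update order are all assumptions made for illustration. It models only the attack (a malicious node terminating visiting walks with a fixed probability), not the Average Crossing defence, whose details the abstract does not give.

```python
import random
from typing import Dict, List


def simulate_pacman(
    neighbors: Dict[int, List[int]],
    starts: List[int],
    malicious: int,
    kill_prob: float,
    steps: int,
    seed: int = 0,
) -> List[int]:
    """Return the number of surviving random walks after each step.

    neighbors maps each node to its list of adjacent nodes (hypothetical
    input format); starts holds the initial node of every walk.
    """
    rng = random.Random(seed)
    walks = list(starts)
    history = []
    for _ in range(steps):
        # Every active walk moves to a uniformly chosen neighbour.
        moved = [rng.choice(neighbors[node]) for node in walks]
        # A walk that lands on the malicious node is terminated
        # with probability kill_prob; all other walks survive.
        walks = [
            node
            for node in moved
            if not (node == malicious and rng.random() < kill_prob)
        ]
        history.append(len(walks))
    return history
```

Without any duplication mechanism the surviving population can only shrink. On a star graph whose centre is malicious, for example, `kill_prob=1.0` removes every walk in the first step, while `kill_prob=0.0` leaves the population unchanged.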

2507.20846 2026-04-16 astro-ph.IM eess.SP stat.AP

Precision spectral estimation at sub-Hz frequencies: closed-form posteriors and Bayesian noise projection

Lorenzo Sala, Stefano Vitale

Comments This work has been submitted for possible publication

Abstract

We consider the problem of estimating cross-spectral quantities in the low-frequency regime, where long observation times limit averaging over large ensembles of periodograms, thereby preventing the use of approximate Gaussian statistics. This case is relevant for precision low-frequency gravitational experiments such as LISA and LISA Pathfinder. We present a Bayesian method for estimating spectral quantities in multivariate Gaussian time series. The approach, based on periodograms and Wishart statistics, yields closed-form expressions at any given frequency for the marginal posterior distributions of the individual power spectral densities, the pairwise coherence, and the multiple coherence, as well as for the joint posterior distribution of the full cross-spectral density matrix. In the context of noise projection -- where one series is modeled as a linear combination of filtered versions of the others, plus a background component -- the method also provides closed-form posteriors for both the susceptibilities, i.e., the filter transfer functions, and the power spectral density of the background. We apply the method to data from the LISA Pathfinder mission, showing effective decorrelation of temperature-induced acceleration noise and reliable estimation of its coupling coefficient.

2505.12836 2026-04-16 eess.IV cs.CV cs.LG stat.ML

The Gaussian Latent Machine: Efficient Prior and Posterior Sampling for Inverse Problems

Muhamed Kuric, Martin Zach, Andreas Habring, Michael Unser, Thomas Pock

Abstract

We consider the problem of sampling from a product-of-experts-type model that encompasses many standard prior and posterior distributions commonly found in Bayesian imaging. We show that this model can be easily lifted into a novel latent variable model, which we refer to as a Gaussian latent machine. This leads to a general sampling approach that unifies and generalizes many existing sampling algorithms in the literature. Most notably, it yields a highly efficient and effective two-block Gibbs sampling approach in the general case, while also specializing to direct sampling algorithms in particular cases. Finally, we present detailed numerical experiments that demonstrate the efficiency and effectiveness of our proposed sampling approach across a wide range of prior and posterior sampling problems from Bayesian imaging.

2504.18107 2026-04-16 stat.ME

Multi-Task Learning for High-Dimensional Regression with Many Weak Instruments

Di Zhang, Xuanyu Li, Baoluo Sun

Comments 55 pages, 2 figures, 4 tables

Abstract

Many weak instrumental variables (IVs) are routinely used in the health and social sciences to improve identification and inference of the treatment effect of interest, along with a broad collection of data on potential confounding factors in the hope that the IV assumptions hold within each data stratum. We propose a new debiased continuous-updating generalized method of moments estimator with multi-task learning of the IV propensity scores to simultaneously address the biases from a diverging number of weak IVs as well as first-step regularized estimation of nuisance regression functions in high-dimensional potential confounding factors. We develop a new multi-task learning theory for generalized linear models under a general sub-Gaussian design to establish valid inference in the many weak IVs asymptotic regime under appropriate sparsity conditions. We evaluate the proposed method via extensive Monte Carlo studies and an empirical application to investigate the returns to education.

2503.10787 2026-04-16 stat.ME stat.AP

Bayes factor functions for testing partial correlation coefficients

Saptati Datta

Abstract

Partial correlation coefficients are widely applied in the social sciences to evaluate the relationship between two variables after accounting for the influence of others. In this article, we present Bayes Factor Functions (BFFs) for assessing the presence of partial correlation. BFFs represent Bayes factors derived from test statistics and are expressed as functions of a standardized effect size. While traditional frequentist methods based on $p$-values have been criticized for their inability to provide cumulative evidence in favor of the true hypothesis, Bayesian approaches are often challenged due to their computational demands and sensitivity to prior distributions. BFFs overcome these limitations and offer summaries of hypothesis tests as alternative hypotheses are varied over a range of prior distributions on standardized effects. They also enable the integration of evidence across multiple studies.

2502.16758 2026-04-16 math.ST math.PR stat.TH

Stabilizing the Splits through Minimax Decision Trees

Zhenyuan Zhang, Hengrui Luo

Comments 69 pages, 17 figures; a substantial expansion upon the previous version

Abstract

By revisiting the end-cut preference (ECP) phenomenon associated with a single CART (Breiman et al. (1984)), we introduce MinimaxSplit decision trees, a robust alternative to CART that selects splits by minimizing the worst-case child risk rather than the average risk. For regression, we minimize the maximum within-child squared error; for classification, we minimize the maximum child entropy, yielding a C4.5-compatible criterion. We also study a cyclic variant that deterministically cycles coordinates, leading to our main method of cyclic MinimaxSplit decision trees. We prove oracle inequalities that cover both regression and classification, under mild marginal non-atomicity conditions. The bounds control the tree's global excess risk by local worst-case impurities and yield fast convergence rates compared to CART. We extend the analysis to a random-dimension forest variant that subsamples coordinates per node. Empirically, (cyclic) MinimaxSplit trees and their forests improve over baselines on structured heterogeneous data such as EEG amplitude regression over fixed time horizons and image denoising, framed as non-parametric regression on spatial coordinates.
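
The difference between the two split criteria for regression can be sketched on a single feature. This is a minimal illustration, not the authors' implementation: the helper names `sse` and `best_split` are hypothetical, and the child risk is taken to be the within-child sum of squared errors, which is an assumption (the abstract does not say whether the error is summed or averaged).

```python
from typing import List, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs: List[float], ys: List[float], rule: str) -> Tuple[float, float]:
    """Return (threshold, score) of the best 1-D split under the given rule.

    rule="cart":    minimise the total SSE of the two children.
    rule="minimax": minimise the larger of the two child SSEs.
    """
    pairs = sorted(zip(xs, ys))
    best = (float("nan"), float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        left_sse = sse([y for _, y in pairs[:i]])
        right_sse = sse([y for _, y in pairs[i:]])
        if rule == "cart":
            score = left_sse + right_sse
        else:
            score = max(left_sse, right_sse)
        if score < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, score)
    return best
```

On the toy data `xs = [1, ..., 8]`, `ys = [0, 4, 0, 4, 0, 4, 0, 4]`, where the response carries no signal, the total-SSE rule picks an end cut that isolates a single point, whereas the minimax rule picks the balanced threshold 4.5. This is the end-cut preference the abstract refers to.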

2501.02746 2026-04-16 eess.SP math.PR math.SP math.ST stat.TH

A Large-Dimensional Analysis of ESPRIT DoA Estimation: Inconsistency and a Correction via RMT

Zhengyu Wang, Wei Yang, Xiaoyi Mai, Zenan Ling, Zhenyu Liao, Robert C. Qiu

Comments 29 pages, 10 figures, to appear in IEEE Trans. SP. Part of this work was presented at the IEEE 32nd European Signal Processing Conference (EUSIPCO 2024), Lyon, France, under the title "Inconsistency of ESPRIT DoA Estimation for Large Arrays and a Correction via RMT."

Abstract

In this paper, we perform asymptotic analyses of the widely used ESPRIT direction-of-arrival (DoA) estimator for large arrays, where the array size $N$ and the number of snapshots $T$ grow to infinity at the same pace. In this large-dimensional regime, the sample covariance matrix (SCM) is known to be a poor eigenspectral estimator of the population covariance. We show that the classical ESPRIT algorithm, which relies on the SCM, produces inconsistent DoA estimates as $N,T \to \infty$ with $N/T \to c \in (0,\infty)$, for both widely- and closely-spaced DoAs; this is a consequence of the large-dimensional inconsistency of the SCM. Leveraging tools from random matrix theory (RMT), we propose an improved G-ESPRIT method and prove its consistency in the same large-dimensional setting. From a technical perspective, we derive a novel bound on the eigenvalue differences between two potentially non-Hermitian matrices, which may be of independent interest. Numerical simulations are provided to corroborate our theoretical findings.

2501.02378 2026-04-16 cs.LG q-bio.NC stat.ML

A ghost mechanism: An analytical model of abrupt learning in recurrent networks

Fatih Dinc, Ege Cirakman, Bariscan Kurtkaya, Mert Yuksekgonul, Yiqi Jiang, Mark J. Schnitzer, Hidenori Tanaka

Comments to appear in Physical Review X

Abstract

Abrupt learning is a common phenomenon in recurrent neural networks (RNNs) trained on working memory tasks. In such cases, the networks develop transient slow regions in state space that extend the effective timescales of computation. However, the mechanisms driving sudden performance improvements and their causal role remain unclear. To address this gap, we introduce the ghost mechanism, a process by which dynamical systems exhibit transient slowdown near the remnant of a saddle-node bifurcation. By reducing the high-dimensional dynamics near ghost points, we derive a one-dimensional canonical form that analytically captures learning as a process controlled by a single scale parameter. Using this model, we study a form of abrupt learning emerging from ghost points and identify a critical learning rate that scales as an inverse power law with the timescale of the learned computation. Beyond this rate, learning collapses through two interacting modes: (i) vanishing gradients and (ii) oscillatory gradients near minima. These features can lock the system into high-confidence but incorrect predictions when parameter updates trigger a no-learning zone, a region of parameter space where gradients vanish. We validate these predictions in low-rank RNNs, where ghost points precede abrupt transitions, and further demonstrate their generality in full-rank RNNs trained on canonical working memory tasks. Our theory offers two approaches to address these learning difficulties: increasing trainable ranks stabilizes learning trajectories, while reducing output confidence mitigates entrapment in no-learning zones. Overall, the ghost mechanism reveals how the computational demands of a task constrain the optimization landscape, demonstrating that well-known learning difficulties in RNNs partly arise from the dynamical systems they must learn to implement.

2407.13407 2026-04-16 math.OC math.ST stat.TH

Nonconvex landscapes for $\mathbf{Z}_2$ synchronization and graph clustering are benign near exact recovery thresholds

Andrew D. McRae, Pedro Abdalla, Afonso S. Bandeira, Nicolas Boumal

Abstract

We study the optimization landscape of a smooth nonconvex program arising from synchronization over the two-element group $\mathbf{Z}_2$, that is, recovering $z_1, \dots, z_n \in \{\pm 1\}$ from (noisy) relative measurements $R_{ij} \approx z_i z_j$. Starting from a max-cut--like combinatorial problem, for integer parameter $r \geq 2$, the nonconvex problem we study can be viewed both as a rank-$r$ Burer--Monteiro factorization of the standard max-cut semidefinite relaxation and as a relaxation of $\{ \pm 1 \}$ to the unit sphere in $\mathbf{R}^r$. First, we present deterministic, non-asymptotic conditions on the measurement graph and noise under which every second-order critical point of the nonconvex problem yields exact recovery of the ground truth. Then, via probabilistic analysis, we obtain asymptotic guarantees for three benchmark problems: (1) synchronization with a complete graph and Gaussian noise, (2) synchronization with an Erdős--Rényi random graph and Bernoulli noise, and (3) graph clustering under the binary symmetric stochastic block model. In each case, we have, asymptotically as the problem size goes to infinity, a benign nonconvex landscape near a previously-established optimal threshold for exact recovery; we can approach this threshold to arbitrary precision with large enough (but finite) rank parameter $r$. In addition, our results are robust to monotone adversaries.
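
For orientation, the nonconvex program described in this abstract can be written out as follows. This is the standard way to express a rank-$r$ relaxation of this type and is given here as an assumption: the abstract does not state the objective, and the paper's normalization or summation range may differ. Unmeasured pairs are taken to have $R_{ij} = 0$.

```latex
\max_{y_1, \dots, y_n \in \mathbf{R}^r} \;
  \sum_{i,j} R_{ij} \, \langle y_i, y_j \rangle
\quad \text{subject to} \quad \| y_i \|_2 = 1, \quad i = 1, \dots, n .
```

With $r = 1$ the constraint forces $y_i \in \{\pm 1\}$, which gives back the combinatorial problem. Stacking the $y_i$ as rows of $Y \in \mathbf{R}^{n \times r}$, the objective is $\langle R, Y Y^\top \rangle$ with $Y Y^\top$ positive semidefinite, of rank at most $r$, and with unit diagonal, which is the Burer--Monteiro view of the max-cut semidefinite relaxation.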

2405.07432 2026-04-16 stat.ML cs.LG cs.SY eess.SY

Nonparametric Sparse Online Learning of the Koopman Operator

Boya Hou, Sina Sanjari, Nathan Dahlin, Alec Koppel, Subhonmesh Bose

Comments 44 pages

Abstract

The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. However, existing data-driven approaches to learning the Koopman operator rely on batch data. In this work, we present a sparse online learning algorithm that learns the Koopman operator iteratively via stochastic approximation, with explicit control over model complexity and provable convergence guarantees. Specifically, we study the Koopman operator via its action on the reproducing kernel Hilbert space (RKHS), and address the mis-specified scenario where the dynamics may escape the chosen RKHS. In this mis-specified setting, we relate the Koopman operator to the conditional mean embeddings (CME) operator. We further establish both asymptotic and finite-time convergence guarantees for our learning algorithm in the mis-specified setting, with trajectory-based sampling where the data arrive sequentially over time. Numerical experiments demonstrate the algorithm's capability to learn unknown nonlinear dynamics.

2404.12828 2026-04-16 math.OC math.ST stat.TH

Low solution rank of the matrix LASSO under RIP with consequences for rank-constrained algorithms

Andrew D. McRae

Abstract

We show that solutions to the popular convex matrix LASSO problem (nuclear-norm--penalized linear least-squares) have low rank under assumptions similar to those required by classical low-rank matrix sensing error bounds. Although the purpose of the nuclear norm penalty is to promote low solution rank, a proof has not yet (to our knowledge) been provided outside very specific circumstances. Furthermore, we show that this result has significant theoretical consequences for nonconvex rank-constrained optimization approaches. Specifically, we show that if (a) the ground truth matrix has low rank, (b) the (linear) measurement operator has the matrix restricted isometry property (RIP), and (c) the measurement error is small enough relative to the nuclear norm penalty, then the (unique) LASSO solution has rank (approximately) bounded by that of the ground truth. From this, we show (a) that a low-rank--projected proximal gradient descent algorithm will converge linearly to the LASSO solution from any initialization, and (b) that the nonconvex landscape of the low-rank Burer-Monteiro--factored problem formulation is benign in the sense that all second-order critical points are globally optimal and yield the LASSO solution.
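
As a reminder of the problem being analysed, the matrix LASSO is usually written as below. The symbols are assumptions introduced for illustration ($\mathcal{A}$ for the linear measurement operator, $y$ for the observations, $\lambda > 0$ for the penalty weight, $\| \cdot \|_*$ for the nuclear norm), and the factor $1/2$ is one common scaling convention that the paper may not share.

```latex
\min_{X \in \mathbf{R}^{d_1 \times d_2}} \;
  \frac{1}{2} \, \| \mathcal{A}(X) - y \|_2^2 + \lambda \, \| X \|_* .
```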

2312.05593 2026-04-16 econ.EM stat.ME

Benign Overfitting in Economic Forecasting via Noise Regularization

Yuan Liao, Xinjie Ma, Andreas Neuhierl, Zhentao Shi

Abstract

This paper studies linear overparameterized models in economic forecasting and highlights that including noise variables (regressors with no predictive power) regularizes the estimator. We consider a setting where both the outcome variable and the high-dimensional predictors are driven by a small number of latent factors, and show that the linear forecast model is dense rather than sparse. It turns out that a ridgeless regression augmented with noise predictors attains the same asymptotic forecast accuracy as an oracle with known true factors, without estimating the factors or assuming them to be strong. The gain comes from shrinkage of the eigenvalues of the design matrix, which reduces the out-of-sample variance. In contrast, perfect variable selection that removes noise variables can worsen forecasts when the number of retained predictors is comparable to the sample size. Empirically, we apply this approach to forecasting U.S. inflation, international GDP growth, and the U.S. equity risk premium, finding that noise regularization improves and stabilizes predictive performance.

2305.02304 2026-04-16 stat.ML cs.LG

New Equivalences Between Interpolation and SVMs: Kernels and Structured Features

Chiraag Kaushik, Andrew D. McRae, Mark A. Davenport, Vidya Muthukumar

Comments 23 pages, 2 figures

Abstract

The support vector machine (SVM) is a supervised learning algorithm that finds a maximum-margin linear classifier, often after mapping the data to a high-dimensional feature space via the kernel trick. Recent work has demonstrated that in certain sufficiently overparameterized settings, the SVM decision function coincides exactly with the minimum-norm label interpolant. This phenomenon of support vector proliferation (SVP) is especially interesting because it allows us to understand SVM performance by leveraging recent analyses of harmless interpolation in linear and kernel models. However, previous work on SVP has made restrictive assumptions on the data/feature distribution and spectrum. In this paper, we present a new and flexible analysis framework for proving SVP in an arbitrary reproducing kernel Hilbert space with a flexible class of generative models for the labels. We present conditions for SVP for features in the families of general bounded orthonormal systems (e.g. Fourier features) and independent sub-Gaussian features. In both cases, we show that SVP occurs in many interesting settings not covered by prior work, and we leverage these results to prove novel generalization results for kernel SVM classification.

2209.00991 2026-04-16 q-fin.RM math.ST stat.ME stat.TH

E-backtesting

Qiuqi Wang, Ruodu Wang, Johanna Ziegel

Abstract

In the recent Basel Accords, the Expected Shortfall (ES) replaces the Value-at-Risk (VaR) as the standard risk measure for market risk in the banking sector, making it the most important risk measure in financial regulation. One of the most challenging tasks in risk modeling practice is to backtest ES forecasts provided by financial institutions. To design a model-free backtesting procedure for ES, we make use of the recently developed techniques of e-values and e-processes. Backtest e-statistics are introduced to formulate e-processes for risk measure forecasts, and unique forms of backtest e-statistics for VaR and ES are characterized using recent results on identification functions. For a given backtest e-statistic, a few criteria for optimally constructing the e-processes are studied. The proposed method can be naturally applied to many other risk measures and statistical quantities. We conduct extensive simulation studies and data analysis to illustrate the advantages of the model-free backtesting method, and compare it with existing methods in the literature.
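
To make the idea of an e-process backtest concrete, here is a minimal sketch for VaR forecasts. Everything specific in it is an assumption rather than something stated in the abstract: the function name `var_e_process`, the e-statistic $\mathbf{1}\{x > r\}/(1-p)$ for a level-$p$ VaR forecast $r$, and the constant betting fraction `bet` in $[0, 1]$. The e-statistic is chosen because its conditional expectation is at most 1 whenever the forecast is at least the true VaR, so the running product is a nonnegative supermartingale under that null.

```python
from typing import List


def var_e_process(
    losses: List[float],
    var_forecasts: List[float],
    level: float,
    bet: float,
) -> List[float]:
    """Wealth path of a simple e-process for backtesting VaR forecasts.

    Each step multiplies the wealth by 1 - bet + bet * e_t, where
    e_t = 1{loss_t > forecast_t} / (1 - level) is the assumed e-statistic.
    """
    wealth = 1.0
    path = []
    for loss, forecast in zip(losses, var_forecasts):
        # e_t is 0 when the forecast covers the loss, 1 / (1 - level) otherwise.
        e_t = (1.0 if loss > forecast else 0.0) / (1.0 - level)
        wealth *= 1.0 - bet + bet * e_t
        path.append(wealth)
    return path
```

By Ville's inequality, rejecting the forecasts the first time the wealth reaches $1/\alpha$ keeps the type-I error at most $\alpha$ at any stopping time. How to choose the betting fraction well is the kind of question the abstract's "criteria for optimally constructing the e-processes" addresses; the constant used here is only a placeholder.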

1805.00318 2026-04-16 stat.CO

Likelihood-Based Inference with Separable Correlation Matrices

Karl Oskar Ekvall

Abstract

This paper proposes methods for likelihood-based inference in multivariate linear regressions when the correlation matrix of the responses is separable; that is, it has a Kronecker product structure, but the variances are unrestricted. The methods are enabled by a block-coordinate ascent-like algorithm with closed-form updates that strictly increases the likelihood at every iteration until convergence. In the numerical experiments, the proposed algorithm is 300--2500 times faster than a general-purpose solver, making parametric bootstrap tests of correlation and covariance separability practical. Parameters are identifiable, and standard errors can therefore be obtained from the expected Fisher information, which can be computed efficiently using the Kronecker product structure. Simulations show that the proposed estimator has lower error than both separable covariance and unrestricted estimators when the model holds, and that bootstrap tests maintain nominal size where asymptotic tests fail. An application to dissolved oxygen data from the Mississippi River demonstrates that separable correlation captures location-specific variance patterns that separable covariance cannot.
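
The model structure in this abstract, a Kronecker-structured correlation matrix combined with unrestricted variances, can be sketched in a few lines. The function names `kron` and `separable_covariance` and the nested-list matrix representation are assumptions chosen to keep the example dependency-free; the sketch only builds the covariance matrix and does not implement the paper's estimation algorithm.

```python
import math
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two dense matrices given as nested lists."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def separable_covariance(c1: Matrix, c2: Matrix, variances: List[float]) -> Matrix:
    """Covariance whose correlation is kron(c1, c2) and whose variances are free.

    c1 and c2 must be correlation matrices (unit diagonal); variances has one
    entry per row of kron(c1, c2).
    """
    corr = kron(c1, c2)
    sd = [math.sqrt(v) for v in variances]
    n = len(sd)
    # Scale the correlation matrix on both sides by the standard deviations.
    return [[sd[i] * corr[i][j] * sd[j] for j in range(n)] for i in range(n)]
```

A separable covariance model would instead force the variances themselves to be products of two factors. Here any positive `variances` vector is allowed, which is the extra flexibility the abstract describes.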

1804.09154 2026-04-16 cs.LG cs.HC stat.ML

DOOM Level Generation using Generative Adversarial Networks

Edoardo Giacomello, Pier Luca Lanzi, Daniele Loiacono

Abstract

We applied Generative Adversarial Networks (GANs) to learn a model of DOOM levels from human-designed content. Initially, we analysed the levels and extracted several topological features. Then, for each level, we extracted a set of images identifying the occupied area, the height map, the walls, and the position of game objects. We trained two GANs: one using plain level images, the other using both the images and some of the features extracted during the preliminary analysis. We used the two networks to generate new levels and compared the results to assess whether the network that was also trained on the topological features could generate levels more similar to human-designed ones. Our results show that GANs can capture the intrinsic structure of DOOM levels and appear to be a promising approach to level generation in first-person shooter games.