arXivDaily: arXiv daily academic digest, updated Monday to Friday
2410.18696 2026-01-28 stat.ME

Latent Functional PARAFAC for modeling multidimensional longitudinal data

Lucas Sort, Laurent Le Brusquet, Arthur Tenenhaus

In numerous settings, it is increasingly common to deal with longitudinal data organized as high-dimensional multi-dimensional arrays, also known as tensors. Within this framework, the time-continuous property of longitudinal data often implies a smooth functional structure on one of the tensor modes. To help researchers investigate such data, we introduce a new tensor decomposition approach based on the CANDECOMP/PARAFAC decomposition. Our approach allows for representing a high-dimensional functional tensor as a low-dimensional set of functions and feature matrices. Furthermore, to capture the underlying randomness of the statistical setting more efficiently, we introduce a probabilistic latent model in the decomposition. A covariance-based block-relaxation algorithm is derived to obtain estimates of model parameters. Thanks to the covariance formulation of the solving procedure and thanks to the probabilistic modeling, the method can be used in sparse and irregular sampling schemes, making it applicable in numerous settings. We apply our approach to help characterize multiple neurocognitive scores observed over time in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. Finally, intensive simulations show a notable advantage of our method in reconstructing tensors.

2601.19868 2026-01-28 math.ST stat.TH

Estimating ordered variance of two scale mixture of normal distributions

Shrajal Bajpai, Lakshmi Kanta Patra

This study investigates componentwise estimation of the ordered variances of two scale mixtures of normal distributions. Two loss functions are considered: the squared error loss and the entropy loss. We derive general improvement results and use them to obtain estimators that outperform the best affine equivariant estimator (BAEE). Moreover, under certain sufficient conditions, a class of improved estimators is proposed for both loss functions. As a special case of scale mixtures of normal distributions, the results are applied to the multivariate t-distribution, for which the corresponding improvement results are obtained. For this case, a detailed numerical comparison is carried out, which validates our theoretical findings.

2601.19859 2026-01-28 physics.soc-ph nlin.AO stat.AP

Modeling Two-Scale Rank Distributions via Redistribution Dynamics or an Analytic Derivation of the Beta Rank Function

Oscar Fontanelli, Wentian Li

Comments 13 pages, 5 figures

The Beta Rank Function (BRF) is a two-sided distribution characterized by a smooth peak and double power-law decay, widely used to model empirical data exhibiting deviations from pure power laws. In this paper, we introduce a novel two-step generative process that produces data exactly following the BRF distribution. The first step involves any mechanism generating a power-law distribution, while the second step applies a regressive redistribution process that reallocates resources from poorer to richer entities, thereby amplifying inequality. This approach represents the first analytic derivation of an exact BRF distribution from a generative mechanism. We validate the model through applications to income and urban population distributions. Beyond exact generation, this framework offers new insights into the systemic origins of deviations from power laws frequently observed in complex systems, linking rank distributions to underlying feedback and redistribution dynamics.

2601.19814 2026-01-28 physics.soc-ph cs.CY stat.AP stat.OT

Abundance and Economic diversity as a descriptor of cities' economic complexity

Marco A. Rosas Pulido, Roberto Murcio, Omar R. Vázquez, Carlos Gershenson

Intricate interactions among firms, institutions, and spatial structures shape urban economic systems. In this study, we propose a framework based on three structural dimensions of economic units, namely abundance, diversity, and longevity (ADL), as proxies of urban economic complexity and resilience. Using a decade of georeferenced firm-level data from Mexico City, we analyze the relationships among ADL variables using regression, spatial correlation, and time-series clustering. Our results reveal nonlinear dynamics across urban space, with power-law behavior in central zones and logarithmic saturation in peripheral areas, suggesting differentiated growth regimes. Notably, firm longevity modulates the relationship between abundance and diversity, particularly in peri-urban transition zones. These spatial patterns point to an emerging polycentric restructuring within a traditionally monocentric metropolis. By integrating economic complexity theory with spatial analysis, our approach provides a scalable method to assess the adaptive capacity of urban economies. This has implications for understanding informality, designing inclusive urban policies, and navigating structural transitions in rapidly urbanizing regions.

2601.19811 2026-01-28 stat.ML cs.AI cs.LG math.ST stat.ME stat.TH

Revisiting Incremental Stochastic Majorization-Minimization Algorithms with Applications to Mixture of Experts

TrungKhang Tran, TrungTin Nguyen, Gersende Fort, Tung Doan, Hien Duy Nguyen, Binh T. Nguyen, Florence Forbes, Christopher Drovandi

Comments TrungKhang Tran and TrungTin Nguyen are co-first authors

Processing high-volume, streaming data is increasingly common in modern statistics and machine learning, where batch-mode algorithms are often impractical because they require repeated passes over the full dataset. This has motivated incremental stochastic estimation methods, including the incremental stochastic Expectation-Maximization (EM) algorithm formulated via stochastic approximation. In this work, we revisit and analyze an incremental stochastic variant of the Majorization-Minimization (MM) algorithm, which generalizes incremental stochastic EM as a special case. Our approach relaxes key EM requirements, such as explicit latent-variable representations, enabling broader applicability and greater algorithmic flexibility. We establish theoretical guarantees for the incremental stochastic MM algorithm, proving consistency in the sense that the iterates converge to a stationary point characterized by a vanishing gradient of the objective. We demonstrate these advantages on a softmax-gated mixture of experts (MoE) regression problem, for which no stochastic EM algorithm is available. Empirically, our method consistently outperforms widely used stochastic optimizers, including stochastic gradient descent, root mean square propagation, adaptive moment estimation, and second-order clipped stochastic optimization. These results support the development of new incremental stochastic algorithms, given the central role of softmax-gated MoE architectures in contemporary deep neural networks for heterogeneous data modeling. Beyond synthetic experiments, we also validate practical effectiveness on two real-world datasets, including a bioinformatics study of dent maize genotypes under drought stress that integrates high-dimensional proteomics with ecophysiological traits, where incremental stochastic MM yields stable gains in predictive performance.

2601.19800 2026-01-28 math.PR math.ST stat.TH

Towards a complete characterization of indicator variograms and madograms

Xavier Emery, Christian Lantuéjoul, Nadia Mery, Mohammad Maleki

Indicator variograms and madograms are structural tools used in many disciplines of the natural sciences and engineering to describe random sets and random fields. To date, several necessary conditions are known for a function to be a valid indicator variogram but, except for intractable corner-positive inequalities, a complete characterization of indicator variograms is missing. Likewise, only partial characterizations of madograms are known. This paper provides novel necessary and sufficient conditions for a given function to be the variogram of an indicator random field with constant mean value or to be the madogram of a random field, and establishes under which conditions these two families of functions coincide. Our results apply to any set of points where the random field is defined and rely on distance geometry and Gaussian random field theory.

2601.19774 2026-01-28 stat.ME

Cure models: from mixture to matrix distributions

Martin Bladt, Jorge Yslas

Cure rate models address survival data in which a proportion of individuals will never experience the event of interest. Existing parametric approaches are predominantly based on finite mixtures, which impose restrictive assumptions on both the cure mechanism and the distribution of susceptible event times. A cure model based on phase-type distributions is introduced, leveraging their latent Markov jump process representation to allow immunity to occur either at baseline or dynamically during follow-up. This structure yields a flexible and interpretable formulation of long-term survival while encompassing classical mixture cure models as special cases. A unified regression framework is developed for covariate effects on both the cure rate and the susceptible survival distribution, and the proposed model class is dense, reducing the impact of parametric misspecification. Estimation is performed via expectation-maximization algorithms, accompanied by an automatic model selection strategy. Simulation studies and a real-data example demonstrate the practical advantages of the approach.

2601.19756 2026-01-28 cs.LG stat.ML

Provable Learning of Random Hierarchy Models and Hierarchical Shallow-to-Deep Chaining

Yunwei Ren, Yatin Dandi, Florent Krzakala, Jason D. Lee

The empirical success of deep learning is often attributed to deep networks' ability to exploit hierarchical structure in data, constructing increasingly complex features across layers. Yet despite substantial progress in deep learning theory, most optimization results still focus on networks with only two or three layers, leaving the theoretical understanding of hierarchical learning in genuinely deep models limited. This leads to a natural question: can we prove that deep networks, trained by gradient-based methods, can efficiently exploit hierarchical structure? In this work, we consider Random Hierarchy Models, a hierarchical context-free grammar introduced in arXiv:2307.02129 and conjectured to separate deep and shallow networks. We prove that, under mild conditions, a deep convolutional network can be efficiently trained to learn this function class. Our proof builds on a general observation: if intermediate layers can receive clean signal from the labels and the relevant features are weakly identifiable, then training each layer individually (layerwise) suffices to hierarchically learn the target function.

2601.19755 2026-01-28 stat.ML cs.LG

Regularized $f$-Divergence Kernel Tests

Mónica Ribero, Antonin Schrab, Arthur Gretton

We propose a framework to construct practical kernel-based two-sample tests from the family of $f$-divergences. The test statistic is computed from the witness function of a regularized variational representation of the divergence, which we estimate using kernel methods. The proposed test is adaptive over hyperparameters such as the kernel bandwidth and the regularization parameter. We provide theoretical guarantees for statistical test power across our family of $f$-divergence estimates. While our test covers a variety of $f$-divergences, we bring particular focus to the Hockey-Stick divergence, motivated by its applications to differential privacy auditing and machine unlearning evaluation. For two-sample testing, experiments demonstrate that different $f$-divergences are sensitive to different localized differences, illustrating the importance of leveraging diverse statistics. For machine unlearning, we propose a relative test that distinguishes true unlearning failures from safe distributional variations.
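
To make the Hockey-Stick divergence mentioned above concrete, here is a minimal sketch for two discrete distributions. It assumes the usual definition E_gamma(P || Q) = sum_x max(p(x) - gamma * q(x), 0) with gamma >= 1, which is the f-divergence generated by f(t) = max(t - gamma, 0). It does not implement the regularized kernel estimator or the test from the paper; the function name and the toy distributions are illustrative only.

```python
def hockey_stick_divergence(p, q, gamma=1.0):
    """Hockey-Stick divergence between two discrete distributions.

    p and q are sequences of probabilities over the same support.
    Assumes gamma >= 1; at gamma = 1 the value equals total variation.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same support size")
    if gamma < 1.0:
        raise ValueError("this sketch assumes gamma >= 1")
    return sum(max(pi - gamma * qi, 0.0) for pi, qi in zip(p, q))


# Toy distributions (illustrative, not data from the paper).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv = hockey_stick_divergence(p, q, gamma=1.0)       # total variation distance
hs_2 = hockey_stick_divergence(p, q, gamma=2.0)     # stricter threshold
print(tv, hs_2)
```

Larger values of gamma only count regions where P exceeds Q by a multiplicative factor, which is the link to differential privacy auditing that the abstract refers to.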

2601.19729 2026-01-28 stat.ME stat.AP

Coarsened data in small area estimation: a Bayesian two-part model for mapping smoking behaviour

Aldo Gardini, Lorenzo Mori

Comments 25 pages, 7 Figures, 2 Tables

Estimating health indicators for restricted sub-populations is a recurring challenge in epidemiology and public health. When survey data are used, Small Area Estimation (SAE) methods can improve precision by borrowing strength across domains. In many applications, however, outcomes are self-reported and affected by coarsening mechanisms, such as rounding and digit preference, that reduce data resolution and may bias inference. This paper addresses both issues by developing a Bayesian unit-level SAE framework for semi-continuous, coarsened responses. Motivated by the 2019 Italian European Health Interview Survey, we estimate smoking indicators for domains defined by the cross-classification of Italian regions and age groups, capturing both smoking prevalence and intensity. The model adopts a two-part structure: a logistic component for smoking prevalence and a flexible mixture of Lognormal distributions for average cigarette consumption, coupled with an explicit model for coarsening and topcoding. Simulation studies show that ignoring coarsening can yield biased and unstable domain estimates with poor interval coverage, whereas the proposed model improves accuracy and achieves near-nominal coverage. The empirical application provides a detailed picture of smoking patterns across region-age domains, helping to characterize the dynamics of the phenomenon and inform targeted public health policies.

2601.19722 2026-01-28 stat.CO math.ST stat.ME stat.TH

Zeroth-order parallel sampling

Francesco Pozza, Giacomo Zanella

Comments 32 pages, 5 figures

Finding effective ways to exploit parallel computing to accelerate Markov chain Monte Carlo methods is an important problem in Bayesian computation and related disciplines. In this paper, we consider the zeroth-order setting where the unnormalized target distribution can be evaluated but its gradient is unavailable for theoretical, practical, or computational reasons. We also assume access to $m$ parallel processors to accelerate convergence. The proposed approach is inspired by modern zeroth-order optimization methods, which mimic gradient-based schemes by replacing the gradient with a zeroth-order stochastic gradient estimator. Our contribution is twofold. First, we show that a naive application of popular zeroth-order stochastic gradient estimators within Markov chain Monte Carlo methods leads to algorithms with poor dependence on $m$, both for unadjusted and Metropolis-adjusted schemes. We then propose a simple remedy to this problem, based on a random-slice perspective, as opposed to a stochastic gradient one, obtaining a new class of zeroth-order samplers that provably achieve a polynomial speed-up in $m$. Theoretical findings are supported by numerical studies.
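
As an illustration of the kind of zeroth-order stochastic gradient estimator the abstract refers to, the sketch below averages central finite differences along m random Gaussian directions. This is a generic estimator from the zeroth-order optimization literature, not the random-slice sampler proposed in the paper; the function names, the smoothing radius `mu`, and the quadratic test function are assumptions made for the example.

```python
import random


def zeroth_order_gradient(f, x, m=1000, mu=1e-3, rng=None):
    """Estimate the gradient of f at x using function evaluations only.

    Averages (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u over m directions
    u ~ N(0, I). Each direction is independent of the others, so the m
    evaluations could be spread over m parallel processors.
    """
    rng = rng or random.Random(0)
    d = len(x)
    grad = [0.0] * d
    for _ in range(m):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        for i in range(d):
            grad[i] += slope * u[i] / m
    return grad


def neg_log_target(x):
    """Negative log-density of a standard Gaussian, up to a constant."""
    return 0.5 * sum(xi * xi for xi in x)


# The exact gradient at (1, 2) is (1, 2); the estimate is noisy around it.
estimate = zeroth_order_gradient(neg_log_target, [1.0, 2.0], m=20000)
print(estimate)
```

The estimator is unbiased for this quadratic target, but its variance shrinks only at rate 1/m, which gives some intuition for why plugging it into a sampler need not use parallel processors efficiently.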

2601.19715 2026-01-28 math.ST stat.TH

Normalized Fractional Order Entropy-Based Decision-Making Models under Risk

Poulami Paul, Chanchal Kundu

Comments 19 pages, 3 figures

Constructing efficient portfolios requires balancing expected returns with risk through optimal stock selection, while accounting for investor preferences. In a recent work by Paul and Kundu (2026), the fractional-order entropy due to Ubriaco was introduced as an uncertainty measure to capture varying investor attitudes toward risk. Building on this foundation, we introduce a novel normalized fractional-order entropy aligned with investors' risk preferences that combines normalized fractional entropy with expected utility and variance. Risk sensitivity is modeled through the fractional parameter, which interpolates between conservative (risk-averse) and adventurous (high-risk-tolerance) attitudes. Furthermore, the robustness and statistical significance of the resulting risk measures, termed normalized expected utility-fractional entropy (NEU-FE) and normalized expected utility-fractional entropy-variance (NEU-FEV), are examined with machine learning tools, including random forests, ridge regression, lasso regression, and artificial neural networks, using Indian stock market (NIFTY50) data. The results confirm that the proposed decision models support investors in making high-quality portfolio investments.

2601.19710 2026-01-28 stat.CO math.ST stat.ME stat.TH

On randomized step sizes in Metropolis-Hastings algorithms

Sebastiano Grazzi, Samuel Livingstone, Lionel Riou-Durand

The performance of Metropolis-Hastings algorithms is highly sensitive to the choice of step size, and misspecification can lead to severe loss of efficiency. We study algorithms with randomized step sizes, considering both auxiliary-variable and marginalized constructions. We show that algorithms with a randomized step size inherit weak Poincaré inequalities/spectral gaps from their fixed-step-size counterparts under minimal conditions, and that the marginalized kernel should always be preferred in terms of asymptotic variance to the auxiliary-variable choice if it is implementable. In addition, we show that both types of randomization make an algorithm robust to tuning, meaning that spectral gaps decay polynomially as the step size is increasingly poorly chosen. We further show that step-size randomization often preserves high-dimensional scaling limits and algorithmic complexity, while increasing the optimal acceptance rate for Langevin and Hamiltonian samplers when an Exponential or Uniform distribution is chosen to randomize the step size. Theoretical results are complemented with a numerical study on challenging benchmarks such as Poisson regression, Neal's funnel and the Rosenbrock (banana) distribution.
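
The idea of a randomized step size can be sketched for the simplest case, a one-dimensional random walk Metropolis sampler whose step size is redrawn from an Exponential distribution at every iteration. Because the step size is drawn independently of the current state and the Gaussian proposal is symmetric, the acceptance ratio reduces to the ratio of target densities. This is an illustrative sketch only: the function name, the mean step size, and the Gaussian target are assumptions, and the paper's Langevin and Hamiltonian samplers are not reproduced here.

```python
import math
import random


def rwm_random_step(log_target, x0, n_steps, mean_step=2.4, rng=None):
    """Random walk Metropolis with an Exponential-distributed step size."""
    rng = rng or random.Random(0)
    x, chain = x0, []
    for _ in range(n_steps):
        # Draw a fresh step size, independently of the current state.
        h = rng.expovariate(1.0 / mean_step)
        y = x + h * rng.gauss(0.0, 1.0)
        # Symmetric proposal: accept with probability min(1, pi(y) / pi(x)).
        log_ratio = log_target(y) - log_target(x)
        if math.log(1.0 - rng.random()) < log_ratio:
            x = y
        chain.append(x)
    return chain


# Standard Gaussian target (illustrative choice).
chain = rwm_random_step(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(mean, var)
```

Changing `mean_step` by an order of magnitude in either direction is a quick way to see how much (or how little) the sample mean and variance degrade when the step-size scale is poorly chosen.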

2601.19688 2026-01-28 stat.AP

Adaptive L-tests for high dimensional independence

Ping Zhao, Huifang Ma

Testing mutual independence among multiple random variables is a fundamental problem in statistics, with wide applications in genomics, finance, and neuroscience. In this paper, we propose a new class of tests for high-dimensional mutual independence based on $L$-statistics. We establish the asymptotic distribution of the proposed test when the order parameter $k$ is fixed, and prove asymptotic normality when $k$ diverges with the dimension. Moreover, we show the asymptotic independence of the fixed-$k$ and diverging-$k$ statistics, enabling their combination through the Cauchy method. The resulting adaptive test is both theoretically justified and practically powerful across a wide range of alternatives. Simulation studies demonstrate the advantages of our method.
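
The combination step can be illustrated with the standard Cauchy combination rule for p-values, which transforms each p-value to a Cauchy variate, takes a weighted average, and maps the result back to a p-value. The sketch assumes that "the Cauchy method" in the abstract refers to this rule; the function name, equal default weights, and example p-values are illustrative, and the L-statistics themselves are not implemented.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Assumes every p-value lies strictly between 0 and 1 and that the
    weights are nonnegative and sum to one (equal weights by default).
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    # Each term is a standard Cauchy variate under its null hypothesis.
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    # Upper tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


# One strong and one uninformative p-value (made-up numbers).
combined = cauchy_combination([0.001, 0.5])
print(combined)
```

The combined p-value stays small when either input is small, which is why combining a fixed-k and a diverging-k statistic in this way can retain power against different kinds of alternatives.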

2601.19666 2026-01-28 stat.ME math.ST stat.ML stat.TH

Direct Doubly Robust Estimation of Conditional Quantile Contrasts

Josh Givens, Song Liu, Henry W J Reeve, Katarzyna Reluga

Comments To be published as a conference paper at ICLR 2026

Within heterogeneous treatment effect (HTE) analysis, various estimands have been proposed to capture the effect of a treatment conditional on covariates. Recently, the conditional quantile comparator (CQC) has emerged as a promising estimand, offering quantile-level summaries akin to the conditional quantile treatment effect (CQTE) while preserving some interpretability of the conditional average treatment effect (CATE). It achieves this by summarising the treated response conditional on both the covariates and the untreated response. Despite these desirable properties, the CQC's current estimation is limited by the need to first estimate the difference in conditional cumulative distribution functions and then invert it. This inversion obscures the CQC estimate, hampering our ability to both model and interpret it. To address this, we propose the first direct estimator of the CQC, allowing for explicit modelling and parameterisation. This explicit parameterisation enables better interpretation of our estimate while also providing a means to constrain and inform the model. We show, both theoretically and empirically, that our estimation error depends directly on the complexity of the CQC itself, improving upon the existing estimation procedure. Furthermore, it retains the desirable double robustness property with respect to nuisance parameter estimation. We further show our method to outperform existing procedures in estimation accuracy across multiple data scenarios while varying sample size and nuisance error. Finally, we apply it to real-world data from an employment scheme, uncovering a reduced range of potential earnings improvement as participant age increases.

2601.19649 2026-01-28 math.ST stat.TH

Semi-supervised learning in unmatched linear regression using an empirical likelihood approach

Fadoua Balabdaoui, Jinyu Chen

Knowing the link between observed predictive variables and outcomes is crucial for making inference in any regression model. When this link is missing, partially or completely, classical estimation methods fail to recover the true regression function. Deconvolution approaches have been proposed and studied in detail in the unmatched setting where the predictive variables and responses are allowed to be independent. In this work, we consider linear regression in a semi-supervised learning setting where, besides a small sample of matched data, we have access to a relatively large unmatched sample. Using maximum likelihood estimation, we show that under some mild assumptions the semi-supervised learning empirical maximum likelihood estimator (SSLEMLE) is asymptotically normal and give its asymptotic covariance matrix explicitly as a function of the ratio of the matched/unmatched sample sizes and other parameters. Furthermore, we quantify the statistical gain achieved by having the additional large unmatched sample over having only the small matched sample. To illustrate the theory, we present the results of an extensive simulation study and apply our methodology to the "combined cycle power plant" data set.

2601.19595 2026-01-28 cs.LG cs.AI math.OC stat.ML

Intersectional Fairness via Mixed-Integer Optimization

Jiří Němeček, Mark Kozdoba, Illia Kryvoviaz, Tomáš Pevný, Jakub Mareček

Comments 17 pages, 10 figures, 1 table

The deployment of Artificial Intelligence in high-risk domains, such as finance and healthcare, necessitates models that are both fair and transparent. While regulatory frameworks, including the EU's AI Act, mandate bias mitigation, they are deliberately vague about the definition of bias. In line with existing research, we argue that true fairness requires addressing bias at the intersections of protected groups. We propose a unified framework that leverages Mixed-Integer Optimization (MIO) to train intersectionally fair and intrinsically interpretable classifiers. We prove the equivalence of two measures of intersectional fairness (MSD and SPSF) in detecting the most unfair subgroup and empirically demonstrate that our MIO-based algorithm improves performance in finding bias. We train high-performing, interpretable classifiers that bound intersectional bias below an acceptable threshold, offering a robust solution for regulated industries and beyond.

2601.19559 2026-01-28 cs.IR stat.AP

Comparing how Large Language Models perform against keyword-based searches for social science research data discovery

Mark Green, Maura Halstead, Caroline Jay, Richard Kingston, Alex Singleton, David Topping

This paper evaluates the performance of a large language model (LLM) based semantic search tool relative to a traditional keyword-based search for data discovery. Using real-world search behaviour, we compare outputs from a bespoke semantic search system applied to UKRI data services with the Consumer Data Research Centre (CDRC) keyword search. Analysis is based on 131 of the most frequently used search terms extracted from CDRC search logs between December 2023 and October 2024. We assess differences in the volume, overlap, ranking, and relevance of returned datasets using descriptive statistics, qualitative inspection, and quantitative similarity measures, including exact dataset overlap, Jaccard similarity, and cosine similarity derived from BERT embeddings. Results show that the semantic search consistently returns a larger number of results than the keyword search and performs particularly well for place-based, misspelled, obscure, or complex queries. While the semantic search does not capture all keyword-based results, the datasets returned are overwhelmingly semantically similar, with high cosine similarity scores despite lower exact overlap. Rankings of the most relevant results differ substantially between tools, reflecting contrasting prioritisation strategies. Case studies demonstrate that the LLM-based tool is robust to spelling errors, interprets geographic and contextual relevance effectively, and supports natural-language queries that keyword search fails to resolve. Overall, the findings suggest that LLM-driven semantic search offers a substantial improvement for data discovery, complementing rather than fully replacing traditional keyword-based approaches.
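
The two similarity measures named above can be sketched in a few lines. Jaccard similarity compares the sets of returned datasets, while cosine similarity compares embedding vectors. The dataset identifiers and the short vectors below are made up for illustration; the study derives its vectors from BERT embeddings, which this sketch does not compute.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two result lists, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical result lists for one query from each search tool.
keyword_hits = ["ds1", "ds2", "ds3"]
semantic_hits = ["ds2", "ds3", "ds4", "ds5"]
overlap = jaccard(keyword_hits, semantic_hits)   # 2 shared out of 5 distinct

# Hypothetical embedding vectors for two dataset descriptions.
similarity = cosine([0.9, 0.1, 0.3], [0.8, 0.2, 0.4])
print(overlap, similarity)
```

This shows how two result sets can have low exact overlap while the items they contain remain close in embedding space, the pattern the abstract reports.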

2601.19390 2026-01-28 astro-ph.CO stat.AP

Almanac: HMC sampling with bounded velocity

Javier Silva Lafaurie, Lorne Whiteway, Elena Sellentin, Kutay Nazli, Andrew H. Jaffe, Alan F. Heavens, Arthur Loureiro

In Hamiltonian Monte Carlo sampling, the shape of the potential and the choice of the momentum distribution jointly give rise to the Hamiltonian dynamics of the sampler. An efficient sampler propagates quickly in all regions of the parameter space, so that the chain has a low autocorrelation length and the sampler has a high acceptance rate, with the goal of optimising the number of near-independent samples for given computational cost. Standard Gaussian momentum distributions allow arbitrarily large velocities, which can lead to inefficient exploration in posteriors with ridges or funnel-like geometries. We investigate alternative momentum distributions based on relativistic and Student's t kinetic energies, which naturally limit particle velocities and may improve robustness. Using Almanac, a sampler for cosmological posterior distributions of sky maps and power spectra on the sphere, we test these alternatives in both low- and high-dimensional settings. We find that the choice of parameterization and momentum distribution can improve convergence and effective sample rate, though the achievable gains are generally modest and strongly problem-dependent, reaching up to an order of magnitude in favorable cases. Among the momentum distributions that we tested, those with moderately heavy tails achieved the best balance between efficiency and stability. These results highlight the importance of sampler design and encourage future work on adaptive and self-tuning strategies for kinetic energy parameter optimization in high-dimensional settings.
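
To see why a relativistic kinetic energy limits particle velocities, the sketch below compares the velocity implied by a Gaussian momentum distribution, v = p / m, with the velocity implied by the standard relativistic kinetic energy K(p) = m c^2 sqrt(|p|^2 / (m^2 c^2) + 1), whose gradient is v = p / (m sqrt(|p|^2 / (m^2 c^2) + 1)). The abstract does not give Almanac's exact parameterization, so the formula, the function names, and the values of `mass` and `c` are assumptions for illustration.

```python
import math


def gaussian_velocity(p, mass=1.0):
    """Velocity under the usual Gaussian kinetic energy |p|^2 / (2 m)."""
    return [pi / mass for pi in p]


def relativistic_velocity(p, mass=1.0, c=1.0):
    """Velocity under a relativistic kinetic energy; its norm stays below c."""
    norm_sq = sum(pi * pi for pi in p)
    lorentz = math.sqrt(norm_sq / (mass * mass * c * c) + 1.0)
    return [pi / (mass * lorentz) for pi in p]


def speed(v):
    """Euclidean norm of a velocity vector."""
    return math.sqrt(sum(vi * vi for vi in v))


# A very large momentum draw (illustrative values).
p = [1000.0, -2000.0]
print(speed(gaussian_velocity(p)))             # grows with |p|
print(speed(relativistic_velocity(p, c=2.0)))  # capped just below c = 2
```

In a leapfrog integrator the position update uses this velocity, so a bounded speed prevents a single large momentum draw from throwing the trajectory far along a ridge or into the neck of a funnel.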

2601.19389 2026-01-28 math.ST math.PR stat.TH

Sufficient Conditions for Some Stochastic Orders of Discrete Random Variables with Applications in Reliability

F. Belzunce, C. Martínez-Riquelme, M. Pereda

Comments 15 pages. Published open access article

Journal ref Mathematics (2022), 10, 147

In this paper, we provide sufficient conditions for the discrete versions of some well-known stochastic orders in reliability, filling a gap in the literature. In particular, we find conditions based on the unimodality of the likelihood ratio for the comparison of two discrete random variables in some stochastic orders. These results are useful for comparing discrete random variables because the sufficient conditions are easy to check when there are no closed-form expressions for the survival functions, which occurs in many cases. In addition, the results are applied to compare several parametric families of discrete distributions.

2601.19368 2026-01-28 physics.flu-dyn stat.ML

Divergence-Free Diffusion Models for Incompressible Fluid Flows

Wilfried Genuist, Éric Savin, Filippo Gatti, Didier Clouteau

Comments This paper proposes an autoregressive divergence-free diffusion model for 2D Kolmogorov fluid flow forecasting

Generative diffusion models are extensively used in unsupervised and self-supervised machine learning with the aim to generate new samples from a probability distribution estimated with a set of known samples. They have demonstrated impressive results in replicating dense, real-world contents such as images, musical pieces, or human languages. This work investigates their application to the numerical simulation of incompressible fluid flows, with a view toward incorporating physical constraints such as incompressibility in the probabilistic forecasting framework enabled by generative networks. For that purpose, we explore different conditional, score-based diffusion models where the divergence-free constraint is imposed by the Leray spectral projector, and autoregressive conditioning is aimed at stabilizing the forecasted flow snapshots at distant time horizons. The proposed models are run on a benchmark turbulence problem, namely a Kolmogorov flow, which allows for a fairly detailed analytical and numerical treatment and thus simplifies the evaluation of the numerical methods used to simulate it. Numerical experiments of increasing complexity are performed in order to compare the advantages and limitations of the diffusion models we have implemented and appraise their performances, including: (i) in-distribution assessment over the same time horizons and for similar physical conditions as the ones seen during training; (ii) rollout predictions over time horizons unseen during training; and (iii) out-of-distribution tests for forecasting flows markedly different from those seen during training. In particular, these results illustrate the ability of diffusion models to reproduce the main statistical characteristics of Kolmogorov turbulence in scenarios departing from the ones they were trained on.

2601.19323 2026-01-28 math.PR math.ST stat.TH

Positive autocorrelation at unit lag for stationary random walk Metropolis-Hastings in ${\mathbb R}^d$

James Allen Fill, Svante Janson

Comments 52 pages

It is often asserted in the literature that one should expect positive autocorrelation for random walk Metropolis-Hastings (RWMH), especially if the typical proposal step-size is small relative to the variability in the target density. In this paper, we consider a stationary RWMH chain ${\bf X}$ taking values in $d$-dimensional Euclidean space and (subject only to the existence of densities with respect to Lebesgue measure) with general target distribution having finite second moment and general proposal random walk step-distribution. We prove, for any nonzero vector ${\bf c}$, strict positivity of the autocorrelation function at unit lag for the stochastic process $\langle{\bf c},{\bf X}\rangle$, that is, \[{\operatorname{Corr}}(\langle{\bf c},{\bf X}_0\rangle,\langle{\bf c},{\bf X}_1\rangle)>0,\] and we establish the same result, but with weak inequality (which can in some cases be equality) when the state space for ${\bf X}$ is changed to the integer grid ${\mathbb Z}^d$. Further, for ${\bf c}\neq{\bf 0}$ we establish the sharp lower bound \[{\operatorname{Corr}}(\langle{\bf c},{\bf X}_0\rangle,\langle{\bf c},{\bf X}_1\rangle)>\tfrac19\] on autocorrelation when we assume both that (i) the target density $\pi$ is spherically symmetric and unimodal in the specific sense that $\pi({\bf x})=\hat\pi(\|{\bf x}\|)$ for some nonincreasing function $\hat\pi$ on $[0,\infty)$ and that (ii) the proposal step-density is symmetric about ${\bf 0}$. We study the autocorrelation indirectly, by considering the incremental variance function (or incremental second-moment function) at unit lag. The same approach allows us also for $r\in[2,\infty)$ to upper-bound the incremental $r$th-absolute-moment function at unit lag. We give also closely related inequalities for the total variation distance between two distributions on ${\mathbb R}^d$ differing only by a location shift.

2601.19321 2026-01-28 q-fin.CP q-fin.ST stat.ML

Predictive Accuracy versus Interpretability in Energy Markets: A Copula-Enhanced TVP-SVAR Analysis

Fredy Pokou, Jules Sadefo Kamdem, Kpante Emmanuel Gnandi

Abstract

This paper investigates whether structural econometric models can rival machine learning in forecasting energy-macro dynamics while retaining causal interpretability. Using monthly data from 1999 to 2025, we develop a unified framework that integrates Time-Varying Parameter Structural VARs (TVP-SVAR) with advanced dependence structures, including DCC-GARCH, t-copulas, and mixed Clayton-Frank-Gumbel copulas. These models are empirically evaluated against leading machine learning techniques (Gaussian Process Regression (GPR), Artificial Neural Networks, Random Forests, and Support Vector Regression) across seven macro-financial and energy variables, with Brent crude oil as the central asset. The findings reveal three major insights. First, TVP-SVAR consistently outperforms standard VAR models, confirming structural instability in energy transmission channels. Second, copula-based extensions capture non-linear and tail dependence more effectively than symmetric DCC models, particularly during periods of macroeconomic stress. Third, despite their methodological differences, copula-enhanced econometric models and GPR achieve statistically equivalent predictive accuracy (t-test p = 0.8444). However, only the econometric approach provides interpretable impulse responses, regime shifts, and tail-risk diagnostics. We conclude that machine learning can replicate predictive performance but cannot substitute for the explanatory power of structural econometrics. This synthesis offers a pathway where AI accuracy and economic interpretability jointly inform energy policy and risk management.

2601.19312 2026-01-28 cs.LG cs.SY eess.SY stat.CO stat.ML

LightSBB-M: Bridging Schrödinger and Bass for Generative Diffusion Modeling

Alexandre Alouadi, Pierre Henry-Labordère, Grégoire Loeper, Othmane Mazhar, Huyên Pham, Nizar Touzi

Abstract

The Schrödinger Bridge and Bass (SBB) formulation, which jointly controls drift and volatility, is an established extension of the classical Schrödinger Bridge (SB). Building on this framework, we introduce LightSBB-M, an algorithm that computes the optimal SBB transport plan in only a few iterations. The method exploits a dual representation of the SBB objective to obtain analytic expressions for the optimal drift and volatility, and it incorporates a tunable parameter $\beta > 0$ that interpolates between pure drift (the Schrödinger Bridge) and pure volatility (Bass martingale transport). We show that LightSBB-M achieves the lowest 2-Wasserstein distance on synthetic datasets against state-of-the-art SB and diffusion baselines, with up to 32 percent improvement. We also illustrate the generative capability of the framework on an unpaired image-to-image translation task (adult to child faces in FFHQ). These findings demonstrate that LightSBB-M provides a scalable, high-fidelity SBB solver that outperforms existing SB and diffusion baselines across both synthetic and real-world generative tasks. The code is available at https://github.com/alexouadi/LightSBB-M.

2601.19277 2026-01-28 stat.AP

Embedding Birth-Death Processes within a Dynamic Stochastic Block Model

Gabriela Bayolo Soler, Miraine Dávila Felipe, Ghislaine Gayraud

Abstract

Statistical clustering in dynamic networks aims to identify groups of nodes with similar or distinct internal connectivity patterns as the network evolves over time. While early research primarily focused on static Stochastic Block Models (SBMs), recent advancements have extended these models to handle dynamic and weighted networks, allowing for a more accurate representation of temporal variations in structure. Additional developments have introduced methods for detecting structural changes, such as shifts in community membership. However, limited attention has been paid to dynamic networks with variable population sizes, where nodes may enter or exit the network. To address this gap, we propose an extension of dynamic SBMs (dSBMs) that incorporates a birth-death process, enabling the statistical clustering of nodes in dynamic networks with evolving population sizes. This work makes three main contributions: (1) the introduction of a novel model for dSBMs with birth-death processes, (2) a framework for parameter inference and prediction of latent communities in this model, and (3) the development of an adapted Variational Expectation-Maximization (VEM) algorithm for efficient inference within this extended framework.

2601.19256 2026-01-28 cs.LG stat.ML

E-QRGMM: Efficient Generative Metamodeling for Covariate-Dependent Uncertainty Quantification

Zhiyang Liang, Qingkai Zhang

Abstract

Covariate-dependent uncertainty quantification in simulation-based inference is crucial for high-stakes decision-making but remains challenging due to the limitations of existing methods such as conformal prediction and classical bootstrap, which struggle with covariate-specific conditioning. We propose Efficient Quantile-Regression-Based Generative Metamodeling (E-QRGMM), a novel framework that accelerates the quantile-regression-based generative metamodeling (QRGMM) approach by integrating cubic Hermite interpolation with gradient estimation. Theoretically, we show that E-QRGMM preserves the convergence rate of the original QRGMM while reducing grid complexity from $O(n^{1/2})$ to $O(n^{1/5})$ for the majority of quantile levels, thereby substantially improving computational efficiency. Empirically, E-QRGMM achieves a superior trade-off between distributional accuracy and training speed compared to both QRGMM and other advanced deep generative models on synthetic and practical datasets. Moreover, by enabling bootstrap-based construction of confidence intervals for arbitrary estimands of interest, E-QRGMM provides a practical solution for covariate-dependent uncertainty quantification.
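The interpolation ingredient can be illustrated in isolation. The sketch below shows generic cubic Hermite interpolation of a quantile function between two grid levels, using the values and slopes at both ends. It is not the E-QRGMM implementation: the Exp(1) quantile function, the grid levels 0.4 and 0.6, and all function names are assumptions chosen for illustration.

```python
import math


def cubic_hermite(x0, x1, y0, y1, m0, m1, x):
    """Cubic Hermite interpolant on [x0, x1] matching values y0, y1 and slopes m0, m1."""
    h = x1 - x0
    t = (x - x0) / h
    # Standard cubic Hermite basis functions on the unit interval.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * y0 + h10 * h * m0 + h01 * y1 + h11 * h * m1


def exp_quantile(tau):
    """Quantile function of the Exp(1) distribution (illustrative target)."""
    return -math.log(1.0 - tau)


def exp_quantile_slope(tau):
    """Derivative of the Exp(1) quantile function with respect to tau."""
    return 1.0 / (1.0 - tau)


lo, hi, mid = 0.4, 0.6, 0.5
hermite_value = cubic_hermite(
    lo, hi,
    exp_quantile(lo), exp_quantile(hi),
    exp_quantile_slope(lo), exp_quantile_slope(hi),
    mid,
)
linear_value = 0.5 * (exp_quantile(lo) + exp_quantile(hi))
hermite_error = abs(hermite_value - exp_quantile(mid))
linear_error = abs(linear_value - exp_quantile(mid))
print(hermite_error, linear_error)
```

Because the Hermite interpolant uses slope information as well as values, it is more accurate than linear interpolation on the same grid, which is the intuition behind being able to use a coarser grid of quantile levels.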

2601.19223 2026-01-28 math.DS stat.ML

Nonlocal Kramers-Moyal formulas and data-driven discovery of stochastic dynamical systems with multiplicative Lévy noise

Yang Li, Jinqiao Duan

Abstract

Traditional data-driven methods, effective for deterministic systems or stochastic differential equations (SDEs) with Gaussian noise, fail to handle the discontinuous sample paths and heavy-tailed fluctuations characteristic of Lévy processes, particularly when the noise is state-dependent. To bridge this gap, we establish nonlocal Kramers-Moyal formulas, rigorously generalizing the classical Kramers-Moyal relations to SDEs with multiplicative Lévy noise. These formulas provide a direct link between short-time transition probability densities (or sample path statistics) and the underlying SDE coefficients: the drift vector, diffusion matrix, Lévy jump measure kernel, and Lévy noise intensity functions. Leveraging these theoretical foundations, we develop novel data-driven algorithms capable of simultaneously identifying all governing components from data and establish convergence results and error analysis for the algorithms. We validate the framework through extensive numerical experiments on prototypical systems. This work provides a principled and practical toolbox for discovering interpretable SDE models governing complex systems influenced by discontinuous, heavy-tailed, state-dependent fluctuations, with broad applicability in climate science, neuroscience, epidemiology, finance, and biological physics.
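For orientation, the classical Kramers-Moyal relations that the abstract refers to can be written as follows in the Gaussian-noise case. The notation ($b$, $\sigma$, $B_t$) is assumed here for illustration and need not match the paper's; the nonlocal formulas for Lévy noise are not reproduced.

```latex
% Classical (Gaussian-noise) case: dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t
\begin{align*}
b(x) &= \lim_{h \to 0^+} \frac{1}{h}\,
  \mathbb{E}\bigl[X_{t+h} - x \mid X_t = x\bigr], \\
\sigma(x)\sigma(x)^{\top} &= \lim_{h \to 0^+} \frac{1}{h}\,
  \mathbb{E}\bigl[(X_{t+h} - x)(X_{t+h} - x)^{\top} \mid X_t = x\bigr].
\end{align*}
```

These identities link short-time conditional moments of the sample paths to the drift and diffusion coefficients, which is the kind of link the abstract describes extending to jump measures and noise intensities.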

2601.19167 2026-01-28 stat.ME

Modeling Ordinal Survey Data with Unfolding Models

Rayleigh Lei, Abel Rodriguez

Abstract

Surveys that rely on ordinal polychotomous (Likert-like) items are widely employed to capture individual preferences because they allow respondents to express both the direction and strength of their preferences. Latent factor models traditionally used in this context implicitly assume that the response functions (the cumulative distribution of the ordinal outcome) are monotonic on the latent trait. This assumption can be too restrictive in several application areas, including in political science and marketing. In this work, we propose a novel ordinal probit unfolding model that can accommodate both monotonic and non-monotonic response functions. The advantages of the model are illustrated by analyzing an immigration attitude survey conducted in the United States.

2601.19156 2026-01-28 stat.ML cs.LG math.OC

Convergence of Muon with Newton-Schulz

Gyu Yeol Kim, Min-hwan Oh

Comments Accepted at ICLR 2026

Abstract

We analyze Muon as originally proposed and used in practice, that is, with momentum orthogonalization carried out by a few Newton-Schulz steps. Prior theoretical results replace this key step in Muon with an exact SVD-based polar factor. We prove that Muon with Newton-Schulz converges to a stationary point at the same rate as the SVD-polar idealization, up to a constant factor for a given number $q$ of Newton-Schulz steps. We further analyze this constant factor and prove that it converges to 1 doubly exponentially in $q$ and improves with the degree of the polynomial used in Newton-Schulz for approximating the orthogonalization direction. We also prove that Muon removes the typical square-root-of-rank loss compared to its vector-based counterpart, SGD with momentum. Our results explain why Muon with a few low-degree Newton-Schulz steps matches exact-polar (SVD) behavior at a much faster wall-clock time, and they quantify how much momentum matrix orthogonalization via Newton-Schulz gains over the vector-based optimizer. Overall, our theory justifies the practical Newton-Schulz design of Muon, narrowing its practice-theory gap.
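The orthogonalization step can be sketched with the classical cubic Newton-Schulz iteration $X \leftarrow \tfrac32 X - \tfrac12 X X^\top X$. This is an assumption made for illustration: the polynomial and coefficients used in Muon itself may differ, and the helper names and the example matrix are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product for small dense matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def newton_schulz(m, steps):
    """Approximate the orthogonal polar factor of m with `steps` cubic iterations."""
    fro = math.sqrt(sum(v * v for row in m for v in row))
    # Frobenius normalization puts every singular value at or below 1.
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def orthogonality_error(x):
    """Largest absolute entry of x^T x - I."""
    gram = matmul(transpose(x), x)
    n = len(gram)
    return max(abs(gram[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


momentum = [[2.0, -1.0], [1.0, 3.0]]  # stand-in for a momentum matrix
for q in (1, 2, 3, 5, 8):
    print(q, orthogonality_error(newton_schulz(momentum, q)))
```

The printed error shrinks very quickly with the number of steps $q$, which gives a feel for the doubly exponential convergence in $q$ that the abstract mentions.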

2601.19140 2026-01-28 astro-ph.HE stat.AP stat.CO stat.ME

Latent characterisation of the complete BATSE gamma ray bursts catalogue using Gaussian mixture of factor analysers and model-estimated overlap-based syncytial clustering

Fan Dai, Ranjan Maitra

Journal ref Monthly Notices of the Royal Astronomical Society 535 (2024) 3396-3409

Abstract

Characterising and distinguishing gamma-ray bursts (GRBs) has interested astronomers for many decades. While some authors have found two or three groups of GRBs by analysing only a few parameters, recent work identified five ellipsoidally-shaped groups upon considering nine parameters $T_{50}, T_{90}, F_1, F_2, F_3, F_4, P_{64}, P_{256}, P_{1024}$. Yet others suggest sub-classes within the two or three groups found earlier. Using a mixture model of Gaussian factor analysers, we analysed the 1150 GRBs with all nine parameters observed in the current Burst and Transient Source Experiment (BATSE) catalogue, and again established five ellipsoidal-shaped groups to describe the GRBs. These five groups are characterised in terms of their average duration, fluence and spectrum as shorter-faint-hard, long-intermediate-soft, long-intermediate-intermediate, long-bright-intermediate and short-faint-hard. The use of factor analysers in describing individual group densities allows for a more thorough group-wise characterisation of the parameters in terms of a few latent features. However, given the discrepancy with many other existing studies that advocated for two or three groups, we also performed model-estimated overlap-based syncytial clustering (MOBSynC), which successively merges poorly separated groups. The five ellipsoidal groups merge into three and then into two groups, one with GRBs of low durations and the other having longer duration GRBs. These groups are also characterised in terms of a few latent factors made up of the nine parameters. Our analysis provides context for all three sets of results, and in doing so, details a multi-layered characterisation of the BATSE GRBs, while also explaining the structure in their variability.

2601.19071 2026-01-28 math.ST stat.TH

Asymptotic inference for skewed stable Ornstein-Uhlenbeck process

Eitaro Kawamo, Hiroki Masuda

Abstract

We consider the parametric estimation of the Ornstein-Uhlenbeck process driven by a non-Gaussian $α$-stable Lévy process with the stable index $α>1$ and possibly skewed jumps, based on a discrete-time sample over a fixed period. By employing a suitable non-diagonal normalizing matrix, we present the following: the parametric family satisfies the local asymptotic mixed normality with a non-degenerate Fisher information matrix; there exists a local maximum of the log-likelihood function which is asymptotically mixed-normal; the local maximum is asymptotically efficient in the sense that it has maximal concentration around the true value over symmetric convex Borel subsets. In the proof, we prove the asymptotic equivalence between the genuine likelihood and the much simpler Euler-type quasi-likelihood. Furthermore, we propose a simple moment-based method to estimate the parameters of the driving stable Lévy process, which serves as an initial estimator for numerical search of the (quasi-)likelihood, reducing the computational burden of the optimization to a large extent. We also present simulation results, which illustrate the theoretical results and highlight the advantages and disadvantages of the genuine and quasi-likelihood approaches.

2601.19055 2026-01-28 cs.LG cs.AI cs.CL stat.ML

Principled Fine-tuning of LLMs from User-Edits: A Medley of Preference, Supervision, and Reward

Dipendra Misra, Aldo Pacchiano, Ta-Chung Chi, Ge Gao

Comments Accepted at NeurIPS 2025

Abstract

We study how to fine-tune LLMs using user-edit deployment data, where each record consists of a context, an agent's response, and the user's edits. This deployment data is naturally generated by users in applications such as LLM-based writing assistants and coding agents. The natural origin of user edits makes them a desirable source for adapting and personalizing LLMs. This setup unifies several feedback types, namely preferences, supervised labels, and cost, that are typically studied separately in the literature. In this paper, we initiate the theoretical investigation of learning from user edits. We first derive bounds for learning algorithms that learn from each of these feedback types. We prove that these algorithms have different trade-offs depending upon the user, data distribution, and model class. We then propose a simple ensembling procedure to jointly learn from these feedback types. On two domains adapted from Gao et al. 2024, we show that our ensembling procedure outperforms the methods that learn from individual feedback types. Further, we show that our proposed procedure can robustly adapt to different user-edit distributions at test time.

2601.19049 2026-01-28 stat.ME

A class of skew-multivariate distributions for spatial data

Pavel Krupskii

Comments 35 pages, 6 figures, and 3 tables

Abstract

This paper introduces a class of copula models for spatial data, based on multivariate Pareto-mixture distributions. We explore the tail properties of these models, demonstrating their ability to capture both tail dependence and asymptotic independence, as well as the tail asymmetry frequently observed in real-world data. The proposed models also offer flexibility in accounting for permutation asymmetry and can effectively represent both the bulk and extreme tails of the distribution. We consider special cases of these models with computationally tractable likelihoods and present an extensive simulation study to assess the finite-sample performance of the maximum likelihood estimators. Finally, we apply our models to analyze a temperature dataset, showcasing their practical utility.

2601.19044 2026-01-28 stat.ME stat.AP

Local Variable and Neighborhood Selection for Firearm Fatality in the Southeast USA

Debjoy Thakur, Lingyuan Zhao, Soutir Bandyopadhyay

Abstract

Gun-related deaths are a major public health concern in the United States (US). The number of gun injuries varies widely across space because of county-wise heterogeneity in race, sex, age, and income distributions. A major challenge, however, is to locally identify the influential socio-economic factors behind these firearm fatality incidents. For a diverging number of predictors, a rich literature exists regarding SCAD under the independence framework; however, a gap remains for local variable selection with spatially correlated, over-dispersed data. This research presents a two-step localized variable selection and inference framework for spatially indexed gunshot fatality data. In the first step, we select variables locally using the SCAD penalty for specific locations where the number of gunshot incidents exceeds a threshold. For these locations, after selecting the predictors, we proceed to the next step, which involves examining the directional variation in the latent spatial neighborhood structure. We further discuss the theoretical properties of this county-specific local variable selection under infill asymptotics. This method has three advantages: (i) it selects the variables locally, (ii) it provides inference about directional variation of a selected predictor, and (iii) instead of assuming the spatial neighborhood structure in an ad hoc manner, it identifies the specific type of spatial neighborhood structure that is most appropriate for modeling the random effects.
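For readers unfamiliar with the SCAD penalty used in the first step, a minimal sketch of the standard penalty function is given below. It follows the usual three-piece definition with the conventional default $a = 3.7$. The function name `scad_penalty` is hypothetical, and the sketch shows only the penalty itself, not the paper's local selection procedure.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for one coefficient (requires lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually relaxes the shrinkage.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no additional penalty.
    return lam * lam * (a + 1) / 2


for value in (0.5, 1.0, 2.0, 3.7, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

The penalty behaves like the lasso for small coefficients and flattens out for large ones, so strong predictors are selected without being heavily shrunk.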

2601.19037 2026-01-28 cs.LG q-bio.QM stat.ML

XIMP: Cross Graph Inter-Message Passing for Molecular Property Prediction

Anatol Ehrlich, Lorenz Kummer, Vojtech Voracek, Franka Bause, Nils M. Kriege

Abstract

Accurate molecular property prediction is central to drug discovery, yet graph neural networks often underperform in data-scarce regimes and fail to surpass traditional fingerprints. We introduce cross-graph inter-message passing (XIMP), which performs message passing both within and across multiple related graph representations. For small molecules, we combine the molecular graph with scaffold-aware junction trees and pharmacophore-encoding extended reduced graphs, integrating complementary abstractions. While prior work is either limited to a single abstraction or non-iterative communication across graphs, XIMP supports an arbitrary number of abstractions and both direct and indirect communication between them in each layer. Across ten diverse molecular property prediction tasks, XIMP outperforms state-of-the-art baselines in most cases, leveraging interpretable abstractions as an inductive bias that guides learning toward established chemical concepts, enhancing generalization in low-data settings.

2601.19030 2026-01-28 cs.LG cs.AI stat.ML

A Unifying View of Coverage in Linear Off-Policy Evaluation

Philip Amortila, Audrey Huang, Akshay Krishnamurthy, Nan Jiang

Comments To appear at ICLR 2026

Abstract

Off-policy evaluation (OPE) is a fundamental task in reinforcement learning (RL). In the classic setting of linear OPE, finite-sample guarantees often take the form $$ \textrm{Evaluation error} \le \textrm{poly}(C^{\pi}, d, 1/n, \log(1/\delta)), $$ where $d$ is the dimension of the features and $C^{\pi}$ is a coverage parameter that characterizes the degree to which the visited features lie in the span of the data distribution. While such guarantees are well-understood for several popular algorithms under stronger assumptions (e.g. Bellman completeness), the understanding is lacking and fragmented in the minimal setting where only the target value function is linearly realizable in the features. Despite recent interest in tight characterizations of the statistical rate in this setting, the right notion of coverage remains unclear, and candidate definitions from prior analyses have undesirable properties and are starkly disconnected from more standard definitions in the literature. We provide a novel finite-sample analysis of a canonical algorithm for this setting, LSTDQ. Inspired by an instrumental-variable view, we develop error bounds that depend on a novel coverage parameter, the feature-dynamics coverage, which can be interpreted as linear coverage in an induced dynamical system for feature evolution. With further assumptions, such as Bellman completeness, our definition successfully recovers the coverage parameters specialized to those settings, finally yielding a unified understanding of coverage in linear OPE.
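As a reminder of what LSTDQ computes, the estimator can be written in its usual textbook form. The notation below (feature map $\phi$, discount $\gamma$, transitions $(s_i, a_i, r_i, s_i')$, target policy $\pi$) is assumed for illustration and may differ from the paper's.

```latex
% LSTDQ from n transitions (s_i, a_i, r_i, s_i'), with \phi(s, \pi) := E_{a \sim \pi(\cdot \mid s)} \phi(s, a)
\begin{align*}
\hat{A} &= \frac{1}{n} \sum_{i=1}^{n} \phi(s_i, a_i)
  \bigl(\phi(s_i, a_i) - \gamma\, \phi(s_i', \pi)\bigr)^{\top}, &
\hat{b} &= \frac{1}{n} \sum_{i=1}^{n} \phi(s_i, a_i)\, r_i, \\
\hat{\theta} &= \hat{A}^{-1} \hat{b}, &
\hat{Q}^{\pi}(s, a) &= \phi(s, a)^{\top} \hat{\theta}.
\end{align*}
```

The matrix $\hat{A}$ pairs each feature vector with its one-step feature difference under $\pi$, which is where an instrumental-variable reading and a coverage notion tied to feature evolution naturally enter.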

2601.19005 2026-01-28 cs.IR cs.LG stat.ML

Recommending Composite Items Using Multi-Level Preference Information: A Joint Interaction Modeling Approach

Xuan Bi, Yaqiong Wang, Gediminas Adomavicius, Shawn Curley

Abstract

With the advancement of machine learning and artificial intelligence technologies, recommender systems have been increasingly used across a vast variety of platforms to efficiently and effectively match users with items. As application contexts become more diverse and complex, there is a growing need for more sophisticated recommendation techniques. One example is the composite item (for example, fashion outfit) recommendation where multiple levels of user preference information might be available and relevant. In this study, we propose JIMA, a joint interaction modeling approach that uses a single model to take advantage of all data from different levels of granularity and incorporate interactions to learn the complex relationships among lower-order (atomic item) and higher-order (composite item) user preferences as well as domain expertise (e.g., on the stylistic fit). We comprehensively evaluate the proposed method and compare it with advanced baselines through multiple simulation studies as well as with real data in both offline and online settings. The results consistently demonstrate the superior performance of the proposed approach.

2601.19004 2026-01-28 stat.ME math.ST stat.TH

Asymptotic Distribution of Robust Effect Size Index

Xinyu Zhang, Rachael Muscatello, Megan Jones, Blythe Corbett, Simon Vandekar

Abstract

The Robust Effect Size Index (RESI) is a recently proposed standardized effect size to quantify association strength across models. However, its confidence interval (CI) construction has relied on computationally intensive bootstrap procedures. We establish a general theorem for the asymptotic distribution of the RESI using a Taylor expansion that accommodates a broad class of models. Simulations under various linear and logistic regression settings show that RESI and its CI have smaller bias and more reliable coverage than commonly used effect sizes such as Cohen's d and f. Combining the approach with robust covariance estimation yields valid inference under model misspecification. We use the methods to investigate associations of depression and behavioral problems with sex and diagnosis in autism spectrum disorder, and demonstrate that the asymptotic approach achieves up to a 50-fold speedup over the bootstrap. Our work provides a scalable and reliable alternative to bootstrap inference, greatly enhancing the applicability of RESI to high-dimensional studies.

2601.18992 2026-01-28 stat.ME cs.NA math.NA

Mixture-Weighted Ensemble Kalman Filter with Quasi-Monte Carlo Transport

Ilja Klebanov, Claudia Schillings, Dana Wrischnig

Abstract

The Bootstrap Particle Filter (BPF) and the Ensemble Kalman Filter (EnKF) are two widely used methods for sequential Bayesian filtering: the BPF is asymptotically exact but can suffer from weight degeneracy, while the EnKF scales well in high dimension yet is exact only in the linear-Gaussian case. We combine these approaches by retaining the EnKF transport step and adding a principled importance-sampling correction. Our first contribution is a general importance-sampling theory for mixture targets and proposals, including variance comparisons between individual- and mixture-based estimators. We then interpret the stochastic EnKF analysis as sampling from explicit Gaussian-mixture proposals obtained by conditioning on the current or previous ensemble, which leads to six self-normalized IS-EnKF schemes. We embed these updates into a broader class of ensemble-based filters and prove consistency and error bounds, including weight-variance comparisons and sufficient conditions ensuring finite-variance importance weights. As a second contribution, we construct transported quasi-Monte Carlo (TQMC) point sets for the Gaussian-mixture laws arising in prediction and analysis, yielding TQMC-enhanced variants that can substantially reduce sampling error without changing the filtering pipeline. Numerical experiments on benchmark models compare the proposed mixture-weighted and TQMC-enhanced filters, showing improved filtering accuracy relative to BPF, EnKF, and the standard weighted EnKF, and that the weighted schemes eliminate the EnKF error plateau often caused by analysis-target mismatch.
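The importance-sampling correction with a mixture proposal can be illustrated on a toy problem. The sketch below is generic self-normalized importance sampling with mixture-based weights (the full mixture density in the denominator). It is not one of the paper's IS-EnKF schemes: the target $N(0,1)$, the two-component Gaussian mixture proposal, and all names are assumptions.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, means, sd):
    """Density of an equally weighted Gaussian mixture with a common scale."""
    return sum(normal_pdf(x, m, sd) for m in means) / len(means)


def snis_second_moment(n=20_000, means=(-1.0, 1.0), sd=1.5, seed=0):
    """Self-normalized IS estimate of E[X^2] under N(0, 1), sampling from the mixture."""
    rng = random.Random(seed)
    weights, values = [], []
    for _ in range(n):
        # Draw from the mixture: pick a component, then sample from it.
        x = rng.gauss(rng.choice(means), sd)
        # Mixture-based weight: target density over the full mixture density.
        weights.append(normal_pdf(x, 0.0, 1.0) / mixture_pdf(x, means, sd))
        values.append(x * x)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


print(snis_second_moment())
```

Weighting by the full mixture density, rather than by the density of the component that generated each sample, is the mixture-based choice. The abstract's variance comparisons concern exactly this distinction between individual- and mixture-based estimators.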

2601.04499 2026-01-28 stat.ME math.ST stat.CO stat.ML stat.TH

A Generalized Adaptive Joint Learning Framework for High-Dimensional Time-Varying Models

Baolin Chen, Mengfei Ran

Abstract

In modern biomedical and econometric studies, longitudinal processes are often characterized by complex time-varying associations and abrupt regime shifts that are shared across correlated outcomes. Standard functional data analysis (FDA) methods, which prioritize smoothness, often fail to capture these dynamic structural features, particularly in high-dimensional settings. This article introduces Adaptive Joint Learning (AJL), a hierarchical regularization framework designed to integrate functional variable selection with structural changepoint detection in multivariate time-varying coefficient models. Unlike standard simultaneous estimation approaches, we propose a theoretically grounded two-stage screening-and-refinement procedure. This framework first combines adaptive group-wise penalization with sure screening principles to robustly identify active predictors, followed by a refined fused regularization step that borrows strength across multiple outcomes to detect local regime shifts. We provide a rigorous theoretical analysis of the estimator in the ultra-high-dimensional regime ($p \gg n$). Crucially, we establish the sure screening consistency of the first stage, which serves as the foundation for proving that the refined estimator achieves the oracle property, that is, it performs as well as if the true active set and changepoint locations were known a priori. A key theoretical contribution is the explicit handling of approximation bias via undersmoothing conditions to ensure valid asymptotic inference. The proposed method is validated through comprehensive simulations and an application to Sleep-EDF data, revealing novel dynamic patterns in physiological states.

2512.18479 2026-01-28 stat.ME

Calibrating hierarchical Bayesian domain inference for a proportion

Rayleigh Lei, Yajuan Si

Abstract

Small area estimation (SAE) improves estimates for local communities or groups, such as counties, neighborhoods, or demographic subgroups, when data are insufficient for each area. This is important for targeting local resources and policies, especially when national-level or large-area data mask variation at a more granular level. Researchers often fit hierarchical Bayesian models to stabilize SAE when data are sparse. Ideally, Bayesian procedures also exhibit good frequentist properties, as demonstrated by calibrated Bayes metrics. However, hierarchical Bayesian models tend to shrink domain estimates toward the overall mean and may produce credible intervals that do not maintain nominal coverage. Hoff et al. developed the Frequentist, but Assisted by Bayes (FAB) intervals for subgroup estimates with normally distributed outcomes. However, non-normally distributed data present new challenges, and multiple types of intervals have been proposed for estimating proportions. We examine domain inference with binary outcomes and extend FAB intervals to improve nominal coverage. We describe how to numerically compute FAB intervals for a proportion and evaluate their performance through repeated simulation studies. Leveraging multilevel regression and poststratification (MRP), we further refine SAE to correct for sample selection bias, construct the FAB intervals for MRP estimates and assess their repeated sampling properties. Finally, we apply the proposed inference methods to estimate COVID-19 infection rates across geographic and demographic subgroups. We find that the FAB intervals improve nominal coverage, at the cost of wider intervals.

2511.12760 2026-01-28 cs.LG stat.ML

Conformal Online Learning of Deep Koopman Linear Embeddings

Ben Gao, Jordan Patracone, Stéphane Chrétien, Olivier Alata

Comments NeurIPS 2025

Abstract

We introduce Conformal Online Learning of Koopman embeddings (COLoKe), a novel framework for adaptively updating Koopman-invariant representations of nonlinear dynamical systems from streaming data. Our modeling approach combines deep feature learning with multistep prediction consistency in the lifted space, where the dynamics evolve linearly. To prevent overfitting, COLoKe employs a conformal-style mechanism that shifts the focus from evaluating the conformity of new states to assessing the consistency of the current Koopman model. Updates are triggered only when the current model's prediction error exceeds a dynamically calibrated threshold, allowing selective refinement of the Koopman operator and embedding. Empirical results on benchmark dynamical systems demonstrate the effectiveness of COLoKe in maintaining long-term predictive accuracy while significantly reducing unnecessary updates and avoiding overfitting.

2511.09486 2026-01-28 stat.ML cs.AI cs.LG stat.ME

A general framework for adaptive nonparametric dimensionality reduction

Antonio Di Noia, Federico Ravenda, Antonietta Mira

Abstract

Dimensionality reduction is a fundamental task in modern data science. Several projection methods specifically tailored to take into account the non-linearity of the data via local embeddings have been proposed. Such methods are often based on local neighbourhood structures and require tuning the number of neighbours that define this local structure, and the dimensionality of the lower-dimensional space onto which the data are projected. Such choices critically influence the quality of the resulting embedding. In this paper, we exploit a recently proposed intrinsic dimension estimator which also returns the optimal locally adaptive neighbourhood sizes according to some desirable criteria. In principle, this adaptive framework can be employed to perform an optimal hyper-parameter tuning of any dimensionality reduction algorithm that relies on local neighbourhood structures. Numerical experiments on both real-world and simulated datasets show that the proposed method can be used to significantly improve well-known projection methods when employed for various learning tasks, with improvements measurable through both quantitative metrics and the quality of low-dimensional visualizations.

2511.06374 2026-01-28 cs.LG stat.ML

Adaptive Regularization for Large-Scale Sparse Feature Embedding Models

Mang Li, Wei Lyu

Abstract

The one-epoch overfitting problem has drawn widespread attention, especially in CTR and CVR estimation models in search, advertising, and recommendation domains. These models, which rely heavily on large-scale sparse categorical features, often suffer a significant decline in performance when trained for multiple epochs. Although recent studies have proposed heuristic solutions, the fundamental cause of this phenomenon remains unclear. In this work, we present a theoretical explanation grounded in Rademacher complexity, supported by empirical experiments, to explain why overfitting occurs in models with large-scale sparse categorical features. Based on this analysis, we propose a regularization method that constrains the norm budget of embedding layers adaptively. Our approach not only prevents the severe performance degradation observed during multi-epoch training, but also improves model performance within a single epoch. This method has already been deployed in online production systems.

2510.21091 2026-01-28 stat.ML cs.LG stat.ME

Doubly-Regressing Approach for Subgroup Fairness

Kunwoong Kim, Kyungseon Lee, Jihu Lee, Dongyoon Yang, Yongdai Kim

Algorithmic fairness is a socially crucial topic in real-world applications of AI. Among many notions of fairness, subgroup fairness is widely studied when multiple sensitive attributes (e.g., gender, race, age) are present. However, as the number of sensitive attributes grows, the number of subgroups increases accordingly, creating heavy computational burdens and a data sparsity problem (subgroups that are too small). In this paper, we develop a novel learning algorithm for subgroup fairness which resolves these issues by focusing on subgroups with sufficient sample sizes as well as marginal fairness (fairness for each sensitive attribute). To this end, we formalize a notion of subgroup-subset fairness and introduce a corresponding distributional fairness measure called the supremum Integral Probability Metric (supIPM). Building on this formulation, we propose the Doubly Regressing Adversarial learning for subgroup Fairness (DRAF) algorithm, which reduces a surrogate fairness gap for supIPM with much less computation than directly reducing supIPM. Theoretically, we prove that the proposed surrogate fairness gap is an upper bound of supIPM. Empirically, we show that the DRAF algorithm outperforms baseline methods on benchmark datasets, specifically when the number of sensitive attributes is large so that many subgroups are very small.

2510.20717 2026-01-28 math.ST stat.ME stat.TH

Testing Imprecise Hypotheses

Lucas Kania, Tudor Manole, Larry Wasserman, Sivaraman Balakrishnan

Many scientific applications involve testing theories that are only partially specified. This task often amounts to testing the goodness-of-fit of a candidate distribution while allowing for reasonable deviations from it. The tolerant testing framework provides a systematic way of constructing such tests. Rather than testing the simple null hypothesis that data was drawn from a candidate distribution, a tolerant test assesses whether the data is consistent with any distribution that lies within a given neighborhood of the candidate. As this neighborhood grows, the tolerance to misspecification increases, while the power of the test decreases. In this work, we characterize the information-theoretic trade-off between the size of the neighborhood and the power of the test, in several canonical models. On the one hand, we characterize the optimal trade-off for tolerant testing in the Gaussian sequence model, under deviations measured in both smooth and non-smooth norms. On the other hand, we study nonparametric analogues of this problem in smooth regression and density models. Along the way, we establish the sub-optimality of the classical chi-squared statistic for tolerant testing, and study simple alternative hypothesis tests.

2510.03871 2026-01-28 cs.LG cs.AI stat.ML

Optimal Scaling Needs Optimal Norm

Oleg Filatov, Jiangtao Wang, Jan Ebert, Stefan Kesselheim

Despite recent progress in optimal hyperparameter transfer under model and dataset scaling, no unifying explanatory principle has been established. For Adam and Scion optimizers, we discover that joint optimal scaling across model and dataset sizes is conditioned on a single invariant: the operator norm of the output layer. Across models with up to 1.3B parameters trained on up to 138B tokens, the optimal learning rate/batch size pair $(η^{\ast}, B^{\ast})$ consistently has the same operator norm value - a phenomenon we term norm transfer. This constant norm condition is necessary but not sufficient: while for each dataset size, multiple $(η, B)$ reach the optimal norm, only a unique $(η^{\ast}, B^{\ast})$ achieves the best loss. As a sufficient condition, we provide the first measurement of $(η^{\ast}, B^{\ast})$ scaling with dataset size for Scion, and find that the scaling rules are consistent with those of Adam. Tuning per-layer-group learning rates also improves model performance, with the output layer being the most sensitive and hidden layers benefiting from lower learning rates. We provide practical insights on norm-guided optimal scaling and release our Distributed Scion (Disco) implementation with logs from over two thousand runs to support research on LLM training dynamics at scale.

2509.22755 2026-01-28 stat.ML cs.LG math.PR

Concept activation vectors: a unifying view and adversarial attacks

Ekkehard Schnoor, Malik Tiomoko, Jawher Said, Alex Jung, Wojciech Samek

Comments 5 pages, 4 figures

Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model's latent spaces. They are computed from hidden-layer activations of inputs belonging either to a concept class or to non-concept examples. Adopting a probabilistic perspective, the distribution of the (non-)concept inputs induces a distribution over the CAV, making it a random vector in the latent space. This enables us to derive mean and covariance for different types of CAVs, leading to a unified theoretical view. This probabilistic perspective also reveals a potential vulnerability: CAVs can strongly depend on the rather arbitrary non-concept distribution, a factor largely overlooked in prior work. We illustrate this with a simple yet effective adversarial attack, underscoring the need for a more systematic study.

2509.22124 2026-01-28 stat.ML cs.LG

Incorporating priors in learning: a random matrix study under a teacher-student framework

Malik Tiomoko, Ekkehard Schnoor

Comments 5 pages, 4 figures

Regularized linear regression is central to machine learning, yet its high-dimensional behavior with informative priors remains poorly understood. We provide the first exact asymptotic characterization of training and test risks for maximum a posteriori (MAP) regression with Gaussian priors centered at a domain-informed initialization. Our framework unifies ridge regression, least squares, and prior-informed estimators, and -- using random matrix theory -- yields closed-form risk formulas that expose the bias-variance-prior tradeoff, explain double descent, and quantify prior mismatch. We also identify a closed-form minimizer of test risk, enabling a simple estimator of the optimal regularization parameter. Simulations confirm the theory with high accuracy. By connecting Bayesian priors, classical regularization, and modern asymptotics, our results provide both conceptual clarity and practical guidance for learning with structured prior knowledge.

2509.17543 2026-01-28 stat.ML cs.LG stat.ME

Bilateral Distribution Compression: Reducing Both Data Size and Dimensionality

Dominic Broadbent, Nick Whiteley, Robert Allison, Tom Lovett

Comments 46 pages, 26 figures

Existing distribution compression methods reduce the number of observations in a dataset by minimising the Maximum Mean Discrepancy (MMD) between original and compressed sets, but modern datasets are often large in both sample size and dimensionality. We propose Bilateral Distribution Compression (BDC), a two-stage framework that compresses along both axes while preserving the underlying distribution, with overall linear time and memory complexity in dataset size and dimension. Central to BDC is the Decoded MMD (DMMD), which we introduce to quantify the discrepancy between the original data and a compressed set decoded from a low-dimensional latent space. BDC proceeds by (i) learning a low-dimensional projection using the Reconstruction MMD (RMMD), and (ii) optimising a latent compressed set with the Encoded MMD (EMMD). We show that this procedure minimises the DMMD, guaranteeing that the compressed set faithfully represents the original distribution. Experiments show that BDC can achieve comparable or superior downstream task performance to ambient-space compression at substantially lower cost and with significantly higher rates of compression.
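
To make the central quantity concrete, here is a minimal sketch of a squared MMD estimate between a dataset and a candidate compressed set. It is not the BDC procedure and does not implement DMMD, RMMD, or EMMD; the Gaussian kernel, the bandwidth, the biased (V-statistic) estimator, and all function names are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(200)]

# A random subset stands in for a compressed set (hypothetical baseline).
subset = rng.sample(data, 20)
# Shifting the subset moves it away from the data distribution.
shifted = [[value + 2.0 for value in point] for point in subset]

mmd_subset = mmd_squared(data, subset)
mmd_shifted = mmd_squared(data, shifted)
print(mmd_subset, mmd_shifted)
```

A compressed set that represents the data well should give a small value, while the shifted set gives a clearly larger one. The all-pairs sums here cost quadratic time, so this sketch only illustrates the discrepancy itself, not the linear-time complexity described in the abstract.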

2509.06223 2026-01-28 stat.ME math.ST stat.TH

Maximum-likelihood estimation of the Matérn covariance structure of isotropic spatial random fields on finite, sampled grids

Frederik J. Simons, Olivia L. Walbert, Arthur P. Guillaumin, Gabriel L. Eggers, Kevin W. Lewis, Sofia C. Olhede

Comments Accepted by Geophysical Journal International, January 2026

We present a statistically and computationally efficient spectral-domain maximum-likelihood procedure to solve for the structure of Gaussian spatial random fields within the Matérn covariance hyperclass. For univariate, stationary, and isotropic fields, the three controlling parameters are the process variance, smoothness, and range. The debiased Whittle likelihood maximization explicitly treats discretization and edge effects for finite sampled regions in parameter estimation and uncertainty quantification. As even the best parameter estimate may not be good enough, we provide a test for whether the model specification itself warrants rejection. Our results are practical and relevant for the study of a variety of geophysical fields, and for spatial interpolation, out-of-sample extension, kriging, machine learning, and feature detection of geological data. We present procedural details and high-level results on real-world examples.

2509.05823 2026-01-28 math.ST econ.EM stat.ME stat.TH

Polynomial Log-Marginals and Tweedie's Formula: When Is Bayes Possible?

Jyotishka Datta, Nicholas G. Polson

Motivated by Tweedie's formula for the Compound Decision problem, we examine the theoretical foundations of empirical Bayes estimators that directly model the marginal density $m(y)$. Our main result shows that polynomial log-marginals of degree $k \ge 3$ cannot arise from any valid prior distribution in exponential family models, while quadratic forms correspond exactly to Gaussian priors. This provides theoretical justification for why certain empirical Bayes decision rules, while practically useful, do not correspond to any formal Bayes procedures. We also strengthen the diagnostic by showing that a marginal is a Gaussian convolution only if it extends to a bounded solution of the heat equation in a neighborhood of the smoothing parameter, beyond the convexity of $c(y)=\tfrac12 y^2+\log m(y)$.
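
As a worked illustration of the quadratic case, assume the standard Gaussian location model $y \mid \theta \sim \mathcal{N}(\theta, 1)$ with prior $\theta \sim \mathcal{N}(0, \tau^2)$. This unit-variance setup and the symbol $\tau^2$ are assumptions made here for concreteness and are not stated in the abstract. Tweedie's formula then reduces to linear shrinkage:

```latex
\begin{align*}
\mathbb{E}[\theta \mid y] &= y + \frac{d}{dy}\log m(y) = c'(y),
  \qquad c(y) = \tfrac12 y^2 + \log m(y), \\
m(y) &= \mathcal{N}(y;\, 0,\, 1+\tau^2)
  \;\Longrightarrow\;
  \log m(y) = -\frac{y^2}{2(1+\tau^2)} + \text{const}, \\
\mathbb{E}[\theta \mid y] &= y - \frac{y}{1+\tau^2} = \frac{\tau^2}{1+\tau^2}\, y,
  \qquad c(y) = \frac{\tau^2}{2(1+\tau^2)}\, y^2 + \text{const}.
\end{align*}
```

Here the log-marginal is quadratic and $c(y)$ is convex, consistent with the statement that quadratic forms correspond to Gaussian priors.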

2509.01342 2026-01-28 stat.AP stat.ME

Suicide Mortality in Spain (2010-2022): Temporal Trends, Spatial Patterns, and Risk Factors

A. Adin, G. Retegui, A. Sánchez Villegas, M. D. Ugarte

Journal ref International Journal of Health Geographics (2026)

Background: Suicide remains a major public health concern worldwide, responsible for more than 700,000 deaths in 2021, accounting for approximately 1.1% of all global deaths. While many high-income countries have reported declines in age-standardized suicide rates over the past two decades, recent evidence from Spain indicates increasing mortality among women, whereas suicide rates among men have remained relatively stable. To better understand these patterns and their potential underlying determinants, this study examines the spatial and temporal patterns of age-stratified suicide mortality across Spanish provinces from 2010 to 2022, with particular attention to sex-specific differences. Methods: Mixed Poisson models were applied to analyze provincial- and temporal-level suicide mortality rates, stratified by age and sex. The models accounted for spatial and temporal confounding effects and examined associations with various socioeconomic and contextual factors, including rurality and unemployment. Results: Findings highlight the influence of rurality and unemployment on suicide mortality, with distinct gender-specific patterns. A 10% increase in the proportion of residents living in rural areas was associated with more than a 5% rise in male suicide mortality, while a 1% increase in the annual unemployment rate was linked to a 2.4% increase in female suicide mortality. Although male suicide rates remained consistently higher than female rates, a notable and steady upward trend was observed in female suicide mortality over the study period. Conclusions: The use of sophisticated statistical models permits the detection of underlying patterns, revealing both geographic and temporal disparities in suicide mortality across Spanish provinces.

2508.09156 2026-01-28 cs.LG cs.AI stat.AP

Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems

Jan Tauberschmidt, Sophie Fellenz, Sebastian J. Vollmer, Andrew B. Duncan

We present a framework for fine-tuning flow-matching generative models to enforce physical constraints and solve inverse problems in scientific systems. Starting from a model trained on low-fidelity or observational data, we apply a differentiable post-training procedure that minimizes weak-form residuals of governing partial differential equations (PDEs), promoting physical consistency and adherence to boundary conditions without distorting the underlying learned distribution. To infer unknown physical inputs, such as source terms, material parameters, or boundary data, we augment the generative process with a learnable latent parameter predictor and propose a joint optimization strategy. The resulting model produces physically valid field solutions alongside plausible estimates of hidden parameters, effectively addressing ill-posed inverse problems in a data-driven yet physics-aware manner. We validate our method on canonical PDE benchmarks, demonstrating improved satisfaction of PDE constraints and accurate recovery of latent coefficients. Our approach bridges generative modelling and scientific inference, opening new avenues for simulation-augmented discovery and data-efficient modelling of physical systems.

2507.15809 2026-01-28 cs.CV cs.LG physics.geo-ph stat.AP

Diffusion models for multivariate subsurface generation and efficient probabilistic inversion

Roberto Miele, Niklas Linde

Comments 35 p., 16 figs. This updated version corrects an error with the analysis of the denoising scores' magnitudes in the results section. The discussion and conclusions of our study remain unchanged. The scores' trends were erroneously reported with inverted time-step order, now fixed in Figure 5a and b (now with the correct increasing trends) and in the corresponding analysis in the Results section

Journal ref Comput. Geosci. 207 (2026) 106076

Diffusion models offer stable training and state-of-the-art performance for deep generative modeling tasks. Here, we consider their use in the context of multivariate subsurface modeling and probabilistic inversion. We first demonstrate that diffusion models enhance multivariate modeling capabilities compared to variational autoencoders and generative adversarial networks. In diffusion modeling, the generative process involves a comparatively large number of time steps with update rules that can be modified to account for conditioning data. We propose different corrections to the popular Diffusion Posterior Sampling approach by Chung et al. (2023). In particular, we introduce a likelihood approximation accounting for the noise contamination that is inherent in diffusion modeling. We assess performance in a multivariate geological scenario involving facies and correlated acoustic impedance. Conditional modeling is demonstrated using both local hard data (well logs) and nonlinear geophysics (fullstack seismic data). Our tests show significantly improved statistical robustness, enhanced sampling of the posterior probability density function and reduced computational costs, compared to the original approach. The method can be used with both hard and indirect conditioning data, individually or simultaneously. As the inversion is included within the diffusion process, it is faster than other methods requiring an outer loop around the generative model, such as Markov chain Monte Carlo.

2507.07981 2026-01-28 cs.CL cs.AI cs.LG stat.ML

Why is Your Language Model a Poor Implicit Reward Model?

Noam Razin, Yong Lin, Jiarui Yao, Sanjeev Arora

Comments Accepted to ICLR 2026; Code available at https://github.com/princeton-pli/exrm-vs-imrm

Reward models are key to language model post-training and inference pipelines. Conveniently, recent work showed that every language model defines an implicit reward model (IM-RM), without requiring any architectural changes. However, such IM-RMs tend to generalize worse, especially out-of-distribution, compared to explicit reward models (EX-RMs) that apply a dedicated linear head over the hidden representations of a language model. The existence of a generalization gap is puzzling, as EX-RMs and IM-RMs are nearly identical. They can be trained using the same data, loss function, and language model, and differ only in how the reward is computed. Toward a fundamental understanding of the implicit biases underlying different reward model types, we investigate the root cause of this gap. Our main finding, backed by theory and experiments, is that IM-RMs rely more heavily on superficial token-level cues. Consequently, they often generalize worse than EX-RMs under token-level distribution shifts, as well as in-distribution. Furthermore, we provide evidence against alternative hypotheses for the generalization gap. Most notably, we challenge the claim that IM-RMs struggle in tasks where generation is harder than verification because they can operate both as a verifier and a generator. Overall, our results highlight that seemingly minor design choices can substantially impact the generalization behavior of reward models.

2506.05059 2026-01-28 cs.LG stat.ML

NIMO: a Nonlinear Interpretable MOdel

Shijian Xu, Marcello Massimo Negri, Volker Roth

Comments ICLR 2026

Deep learning has achieved remarkable success across many domains, but it has also created a growing demand for interpretability in model predictions. Although many explainable machine learning methods have been proposed, post-hoc explanations lack guaranteed fidelity and are sensitive to hyperparameter choices, highlighting the appeal of inherently interpretable models. For example, linear regression provides clear feature effects through its coefficients. However, such models are often outperformed by more complex neural networks (NNs) that usually lack inherent interpretability. To address this dilemma, we introduce NIMO, a framework that combines inherent interpretability with the expressive power of neural networks. Building on simple linear regression, NIMO is able to provide flexible and intelligible feature effects. We also develop an optimization method based on parameter elimination that allows for optimizing the NN parameters and linear coefficients effectively and efficiently. By relying on adaptive ridge regression, we can easily incorporate sparsity as well. We show empirically that our model can provide faithful and intelligible feature effects while maintaining good predictive performance.

2506.04775 2026-01-28 cs.LG cs.IT math.IT stat.ML

Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

Artin Tajdini, Jonathan Scarlett, Kevin Jamieson

We study stochastic linear bandits with heavy-tailed rewards, where the rewards have a finite $(1+ε)$-absolute central moment bounded by $\upsilon$ for some $ε\in (0,1]$. We improve both upper and lower bounds on the minimax regret compared to prior work. When $\upsilon = \mathcal{O}(1)$, the best prior known regret upper bound is $\tilde{\mathcal{O}}(d T^{\frac{1}{1+ε}})$. While a lower bound with the same scaling has been given, it relies on a construction using $\upsilon = \mathcal{O}(d)$, and adapting the construction to the bounded-moment regime with $\upsilon = \mathcal{O}(1)$ yields only a $Ω(d^{\frac{ε}{1+ε}} T^{\frac{1}{1+ε}})$ lower bound. This matches the known rate for multi-armed bandits and is generally loose for linear bandits, in particular being $\sqrt{d}$ below the optimal rate in the finite-variance case ($ε= 1$). We propose a new elimination-based algorithm guided by experimental design, which achieves regret $\tilde{\mathcal{O}}(d^{\frac{1+3ε}{2(1+ε)}} T^{\frac{1}{1+ε}})$, thus improving the dependence on $d$ for all $ε\in (0,1)$ and recovering a known optimal result for $ε= 1$. We also establish a lower bound of $Ω(d^{\frac{2ε}{1+ε}} T^{\frac{1}{1+ε}})$, which strictly improves upon the multi-armed bandit rate and highlights the hardness of heavy-tailed linear bandit problems. For finite action sets, we derive similarly improved upper and lower bounds for regret. Finally, we provide action set dependent regret upper bounds showing that for some geometries, such as $l_p$-norm balls for $p \le 1 + ε$, we can further reduce the dependence on $d$, and we can handle infinite-dimensional settings via the kernel trick, in particular establishing new regret bounds for the Matérn kernel that are the first to be sublinear for all $ε\in (0, 1]$.

2506.00779 2026-01-28 stat.ME math.ST stat.TH

Uncertainty quantification of synchrosqueezing transform under complicated nonstationary noise

Hau-Tieng Wu, Zhou Zhou

We propose a bootstrapping framework to quantify uncertainty in time-frequency representations (TFRs) generated by the short-time Fourier transform (STFT) and the STFT-based synchrosqueezing transform (SST) for oscillatory signals with time-varying amplitude and frequency contaminated by complex nonstationary noise. To this end, we leverage a recent high-dimensional Gaussian approximation technique to establish a sequential Gaussian approximation for nonstationary processes under mild assumptions. This result is of independent interest and provides a theoretical basis for characterizing the approximate Gaussianity of STFT-induced TFRs as random fields. Building on this foundation, we establish the robustness of SST-based signal decomposition in the presence of nonstationary noise. Furthermore, assuming locally stationary noise, we develop a Gaussian autoregressive bootstrap for uncertainty quantification of SST-based TFRs and provide theoretical justification. We validate the proposed methods with simulations and illustrate their practical utility by analyzing spindle activity in electroencephalogram recordings. Our work bridges time-frequency analysis in signal processing and nonlinear spectral analysis of time series in statistics.

2506.00452 2026-01-28 eess.SP cs.AI stat.ML

Attention-Aided MMSE for OFDM Channel Estimation: Learning Linear Filters with Attention

TaeJun Ha, Chaehyun Jung, Hyeonuk Kim, Jeongwoo Park, Jeonghun Park

Comments 13 pages, 8 figures

In orthogonal frequency division multiplexing (OFDM), accurate channel estimation is crucial. Classical signal processing-based approaches, such as linear minimum mean-squared error (LMMSE) estimation, often require second-order statistics that are difficult to obtain in practice. Recent deep neural network (DNN)-based methods have been introduced to address this; yet they often suffer from high inference complexity. This paper proposes an Attention-aided MMSE (A-MMSE), a model-based DNN framework that learns the linear MMSE filter via the Attention Transformer. Once trained, the A-MMSE performs channel estimation through a single linear operation, eliminating nonlinear activations during inference and thus reducing computational complexity. To improve the learning efficiency of the A-MMSE, we develop a two-stage Attention encoder that captures the frequency and temporal correlation structure of OFDM channels. We also introduce a rank-adaptive extension that enables a flexible performance-complexity trade-off. Numerical simulations show that the proposed A-MMSE consistently outperforms other baseline methods in terms of normalized MSE across a wide range of signal-to-noise ratio (SNR) conditions. In particular, the A-MMSE and its rank-adaptive extension provide an improved performance-complexity trade-off, providing a powerful and highly efficient solution for practical channel estimation.

2505.12825 2026-01-28 cs.LG stat.ML

Theoretical Investigation on Inductive Bias of Isolation Forest

Qin-Cheng Zheng, Shao-Qun Zhang, Shen-Huan Lyu, Yuan Jiang, Zhi-Hua Zhou

Isolation Forest (iForest) stands out as a widely-used unsupervised anomaly detector, primarily owing to its remarkable runtime efficiency and superior performance in large-scale tasks. Despite its widespread adoption, a theoretical foundation explaining iForest's success remains unclear. This paper focuses on the inductive bias of iForest, which theoretically elucidates under what circumstances and to what extent iForest works well. The key is to formulate the growth process of iForest, where the split dimensions and split values are randomly selected. We model the growth process of iForest as a random walk, enabling us to derive the expected depth function, which is the outcome of iForest, using transition probabilities. The case studies reveal key inductive biases: iForest exhibits lower sensitivity to central anomalies while demonstrating greater parameter adaptability compared to $k$-Nearest Neighbor. Our study provides a theoretical understanding of the effectiveness of iForest and establishes a foundation for further theoretical exploration.

2505.08683 2026-01-28 stat.ML cs.LG stat.ME

Uncertainty-Aware Surrogate-based Amortized Bayesian Inference for Computationally Expensive Models

Stefania Scheurer, Philipp Reiser, Tim Brünnette, Wolfgang Nowak, Anneli Guthke, Paul-Christian Bürkner

Comments 27 pages, 15 figures

Journal ref Transactions on Machine Learning Research (2026)

Bayesian inference typically relies on a large number of model evaluations to estimate posterior distributions. Established methods like Markov Chain Monte Carlo (MCMC) and Amortized Bayesian Inference (ABI) can become computationally challenging. While ABI enables fast inference after training, generating sufficient training data still requires thousands of model simulations, which is infeasible for expensive models. Surrogate models offer a solution by providing approximate simulations at a lower computational cost, allowing the generation of large data sets for training. However, the introduced approximation errors and uncertainties can lead to overconfident posterior estimates. To address this, we propose Uncertainty-Aware Surrogate-based Amortized Bayesian Inference (UA-SABI) -- a framework that combines surrogate modeling and ABI while explicitly quantifying and propagating surrogate uncertainties through the inference pipeline. Our experiments show that this approach enables reliable, fast, and repeated Bayesian inference for computationally expensive models, even under tight time constraints.

2504.14610 2026-01-28 cs.LG stat.ML

Imputation-free Learning of Tabular Data with Missing Values using Incremental Feature Partitions in Transformer

Manar D. Samad, Kazi Fuad B. Akhter, Shourav B. Rabbani, Ibna Kowsar

Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often raise concerns regarding data quality and the reliability of data-driven outcomes. To address these concerns, this article proposes an imputation-free incremental attention learning (IFIAL) method for tabular data with missing values. A pair of attention masks is derived and retrofitted to a transformer to directly streamline tabular data without imputing or initializing missing values. The proposed method incrementally learns partitions of overlapping and fixed-size feature sets to enhance the performance of the transformer. The average classification performance rank order across 17 diverse tabular data sets highlights the superiority of IFIAL over 11 state-of-the-art learning methods with or without missing value imputations. Additional experiments corroborate the robustness of IFIAL to varying types and proportions of missing data, demonstrating its superiority over methods that rely on explicit imputations. A feature partition size equal to one-half the original feature space yields the best trade-off between computational efficiency and predictive performance. IFIAL is one of the first solutions that enables deep attention models to learn directly from tabular data, eliminating the need to impute missing values. The source code for this paper is publicly available.

2504.11396 2026-01-28 cs.IT math.IT stat.ML

Property Inheritance for Subtensors in Tensor Train Decompositions

HanQin Cai, Longxiu Huang

Comments 2025 IEEE International Symposium on Information Theory (ISIT 2025)

Journal ref IEEE International Symposium on Information Theory, 2025

Tensor dimensionality reduction is one of the fundamental tools for modern data science. To address the high computational overhead, fiber-wise sampled subtensors that preserve the original tensor rank are often used in designing efficient and scalable tensor dimensionality reduction. However, the theory of property inheritance for subtensors, that is, how the essential properties of the original tensor are passed to its subtensors, is still underdeveloped. This paper theoretically studies the inheritance of two key tensor properties, namely incoherence and condition number, under the tensor train setting. We also show how tensor train rank is preserved through fiber-wise sampling. The key parameters introduced in the theorems are numerically evaluated under various settings. The results show that the properties of interest are well preserved in the subtensors formed via fiber-wise sampling. Overall, this paper provides several handy analytic tools for developing efficient tensor analysis methods.

2504.04480 2026-01-28 eess.SY cs.SY stat.ML

Fine Tuning a Simulation-Driven Estimator

Braghadeesh Lakshminarayanan, Margarita A. Guerrero, Cristian R. Rojas

Comments Published in IEEE Control Systems Letters, vol. 9, pp 2975-2980, 2025

Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping from observations to parameters. However, when the true parameters lie outside the sampled range, predictions suffer from an out-of-distribution (OOD) error. This paper introduces a fine-tuning approach for the Two-Stage estimator that mitigates OOD effects and improves accuracy. The effectiveness of the proposed method is verified through numerical simulations.

2504.02233 2026-01-28 stat.ME math.ST stat.TH

Testing independence and conditional independence in high dimensions via coordinatewise Gaussianization

Jinyuan Chang, Yue Du, Jing He, Qiwei Yao

We propose new statistical tests, in high-dimensional settings, for testing the independence of two random vectors and their conditional independence given a third random vector. The key idea is simple: we first transform each component variable to the standard normal via its marginal empirical distribution, and we then test for independence and conditional independence of the transformed random vectors using appropriate $L_\infty$-type test statistics. Although we test only some necessary conditions of independence or conditional independence, the new tests outperform 13 frequently used testing methods in a large-scale simulation comparison. The advantages of the new tests can be summarized as follows: (i) they do not require any moment conditions, (ii) they allow arbitrary dependence structures of the components among the random vectors, and (iii) they allow the dimensions of the random vectors to diverge at exponential rates of the sample size. The critical values of the proposed tests are determined by a computationally efficient multiplier bootstrap procedure. Theoretical analysis shows that the sizes of the proposed tests can be well controlled by the nominal significance level, and the proposed tests are also consistent under certain local alternatives. The finite sample performance of the new tests is illustrated via extensive simulation studies and a real data application.
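
The Gaussianization step can be sketched as follows. This is an illustration only, not the authors' test: the rank rescaling `rank / (n + 1)`, the maximum absolute cross-correlation used as an $L_\infty$-type statistic, and all function names are assumptions, and the multiplier bootstrap for critical values is omitted.

```python
import math
import random
from statistics import NormalDist


def gaussianize(column):
    """Map a sample to normal scores via its empirical distribution (ranks)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        # rank / (n + 1) keeps the argument strictly inside (0, 1).
        scores[index] = standard_normal.inv_cdf(rank / (n + 1))
    return scores


def correlation(u, v):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def max_abs_cross_correlation(x_columns, y_columns):
    """Largest |correlation| between Gaussianized x and y components."""
    gx = [gaussianize(col) for col in x_columns]
    gy = [gaussianize(col) for col in y_columns]
    return max(abs(correlation(a, b)) for a in gx for b in gy)


rng = random.Random(0)
n = 300
# Heavy-tailed (Cauchy) inputs: no moments exist, yet ranks are well defined.
x_columns = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
             for _ in range(2)]
# First y component is a monotone function of the first x component.
y_dependent = [[value ** 3 for value in x_columns[0]],
               [rng.gauss(0.0, 1.0) for _ in range(n)]]
y_independent = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]

stat_dependent = max_abs_cross_correlation(x_columns, y_dependent)
stat_independent = max_abs_cross_correlation(x_columns, y_independent)
print(stat_dependent, stat_independent)
```

Because the transform depends only on ranks, the statistic is unaffected by the heavy tails and by the monotone cubic map, which is one way to see why no moment conditions are needed.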

2503.21334 2026-01-28 math.ST stat.TH

Safety of particle filters: Some results on the time evolution of particle filter estimates

Mathieu Gerber

Comments 24 pages (major revision of the paper)

Details
English abstract

Particle filters (PFs) form a class of Monte Carlo algorithms that propagate over time a set of $N\geq 1$ particles which can be used to estimate, in an online fashion, the sequence of filtering distributions $(\hat{η}_t)_{t\geq 1}$ defined by a state-space model. Despite the popularity of PFs, the study of the time evolution of their estimates has received barely any attention in the literature. Denoting by $(\hat{η}_t^N)_{t\geq 1}$ the PF estimate of $(\hat{η}_t)_{t\geq 1}$ and letting $κ\in (0,1/2)$, in this work we first show that for any number of particles $N$ it holds that, with probability one, we have $\|\hat{η}_t^N- \hat{η}_t\|\geq κ$ for infinitely many time instants $t\geq 1$, with $\|\cdot\|$ the Kolmogorov distance between probability distributions. Considering a simple filtering problem we then provide reassuring results concerning the ability of PFs to estimate jointly a finite set $\{\hat{η}_t\}_{t=1}^T$ of filtering distributions by studying the probability $\mathbb{P}(\sup_{t\in\{1,\dots,T\}}\|\hat{η}_t^{N}-\hat{η}_t\|\geq κ)$. Finally, on the same toy filtering problem, we prove that sequential quasi-Monte Carlo, a randomized quasi-Monte Carlo version of PF algorithms, offers greater safety guarantees than PFs in the sense that, for this algorithm, it holds that $\lim_{N\rightarrow\infty}\sup_{t\geq 1}\|\hat{η}_t^N-\hat{η}_t\|=0$ with probability one.

2503.09887 2026-01-28 math.PR stat.CO stat.ML

On the contraction properties of Sinkhorn semigroups

O. Deniz Akyildiz, Pierre del Moral, Joaquin Miguez

Details
English abstract

We develop a novel stability theory for Sinkhorn semigroups based on Lyapunov techniques and quantitative contraction coefficients, and establish exponential convergence of Sinkhorn iterations on weighted Banach spaces. This operator-theoretic framework yields explicit exponential decay rates of Sinkhorn iterates toward Schrödinger bridges with respect to a broad class of $ϕ$-divergences and Kantorovich-type distances, including relative entropy, squared Hellinger integrals, $α$-divergences, weighted total variation norms, and Wasserstein distances. To the best of our knowledge, these results provide the first systematic contraction inequalities of this kind for entropic transport and the Sinkhorn algorithm. We further introduce Lyapunov contraction principles under minimal regularity assumptions, leading to quantitative exponential stability estimates for a large family of Sinkhorn semigroups. The framework applies to models with polynomially growing potentials and heavy-tailed marginals on general normed spaces, as well as to more structured boundary state-space models, including semicircle transitions and Beta, Weibull, and exponential marginals, together with semi-compact settings. Finally, our approach extends naturally to statistical finite mixtures of such models, including kernel-based density estimators arising in modern generative modeling.

2503.06917 2026-01-28 cs.LG cs.DS stat.ML

Sample-Efficient Optimization over Generative Priors via Coarse Learnability

Pranjal Awasthi, Sreenivas Gollapudi, Ravi Kumar, Kamesh Munagala

Details
English abstract

In zeroth-order optimization, we seek to minimize a function $d(\cdot)$, which may encode combinatorial feasibility, using only function evaluations. We focus on the setting where solutions must also satisfy qualitative constraints or conform to a complex prior distribution. To address this, we introduce a new framework in which such constraints are represented by an initial generative prior $L(\cdot)$, for example, a Large Language Model (LLM). The objective is to find solutions $s$ that minimize $d(s)$ while having high probability under $L(s)$, effectively sampling from a target distribution proportional to $L(s) \cdot e^{-T \cdot d(s)}$ for a temperature parameter $T$. While this framework aligns with classical Model-Based Optimization (e.g., the Cross-Entropy method), existing theory is ill-suited for deriving sample complexity bounds in black-box deep generative models. We therefore propose a novel learning assumption, which we term coarse learnability, where an agent with access to a polynomial number of samples can learn a model whose point-wise density approximates the target within a polynomial factor. Leveraging this assumption, we design an iterative algorithm that employs a Metropolis-Hastings correction to provably approximate the target distribution using a polynomial number of samples. To the best of our knowledge, this is one of the first works to establish such sample-complexity guarantees for model-based optimization with deep generative priors. We provide two lines of evidence supporting the coarse learnability assumption. Theoretically, we show that maximum likelihood estimation naturally induces the required coverage properties, holding for both standard exponential families and for misspecified models. Empirically, we demonstrate that LLMs can adapt their learned distributions to zeroth-order feedback to solve combinatorial optimization problems.
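To make the target distribution concrete, here is a small sketch of a plain independence Metropolis-Hastings sampler whose proposals come from the prior. It is not the authors' iterative algorithm: the function names, the uniform toy prior on ten states, and the cost function are all illustrative assumptions. With prior proposals, the prior terms cancel in the acceptance ratio, so only the cost difference matters.

```python
import math
import random


def mh_tilted_sampler(sample_prior, d, T, n_steps, rng):
    """Independence Metropolis-Hastings targeting prior(s) * exp(-T * d(s)).

    sample_prior(rng) draws one candidate from the prior; d is the cost.
    Both are hypothetical callables supplied by the caller.
    """
    current = sample_prior(rng)
    chain = []
    for _ in range(n_steps):
        proposal = sample_prior(rng)
        # target(proposal) * prior(current) / (target(current) * prior(proposal))
        # simplifies to exp(-T * (d(proposal) - d(current))).
        log_ratio = -T * (d(proposal) - d(current))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        chain.append(current)
    return chain


# Toy usage: uniform prior on {0, ..., 9}, cost is the distance to state 7.
def toy_prior(rng):
    return rng.randrange(10)


def toy_cost(s):
    return abs(s - 7)


chain = mh_tilted_sampler(toy_prior, toy_cost, T=2.0, n_steps=5000,
                          rng=random.Random(0))
mean_cost = sum(toy_cost(s) for s in chain) / len(chain)
```

Under the uniform toy prior the average cost is 3.1, while the tilted target concentrates near state 7, so the chain's average cost should be much smaller. A larger temperature parameter sharpens this concentration at the price of fewer accepted proposals.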

2502.19086 2026-01-28 stat.ML cs.LG stat.AP

Forecasting intermittent time series with Gaussian Processes and Tweedie likelihood

Stefano Damato, Dario Azzimonti, Giorgio Corani

Comments Published in International Journal of Forecasting

Journal ref International Journal of Forecasting (2025)

Details
English abstract

We adopt Gaussian Processes (GPs) as latent functions for probabilistic forecasting of intermittent time series. The model is trained in a Bayesian framework that accounts for the uncertainty about the latent function. We couple the latent GP variable with two types of forecast distributions: the negative binomial (NegBinGP) and the Tweedie distribution (TweedieGP). While the negative binomial has already been used in forecasting intermittent time series, this is the first time in which a fully parameterized Tweedie density is used for intermittent time series. We properly evaluate the Tweedie density, which has both a point mass at zero and heavy tails, avoiding simplifying assumptions made in existing models. We test our models on thousands of intermittent count time series. Results show that our models provide consistently better probabilistic forecasts than the competitors. In particular, TweedieGP obtains the best estimates of the highest quantiles, thus showing that it is more flexible than NegBinGP.

2502.07580 2026-01-28 cs.LG stat.ML

Generative Modeling with Bayesian Sample Inference

Marten Lienen, Marcel Kollovieh, Stephan Günnemann

Details
English abstract

We derive a novel generative model from iterative Gaussian posterior inference. By treating the generated sample as an unknown variable, we can formulate the sampling process in the language of Bayesian probability. Our model uses a sequence of prediction and posterior update steps to iteratively narrow down the unknown sample starting from a broad initial belief. In addition to a rigorous theoretical analysis, we establish a connection between our model and diffusion models and show that it includes Bayesian Flow Networks (BFNs) as a special case. In our experiments, we demonstrate that our model improves sample quality on ImageNet32 over both BFNs and the closely related Variational Diffusion Models, while achieving equivalent log-likelihoods on ImageNet32 and ImageNet64. Find our code at https://github.com/martenlienen/bsi.

2501.11596 2026-01-28 stat.ME

Precision of Treatment Hierarchy: A Metric for Quantifying Certainty in Treatment Hierarchies from Network Meta-Analysis

Augustine Wigle, Audrey Béliveau, Georgia Salanti, Gerta Rücker, Guido Schwarzer, Dimitris Mavridis, Adriani Nikolakopoulou

Comments 15 pages, 9 figures

Details
English abstract

Network meta-analysis (NMA) is an extension of pairwise meta-analysis which facilitates the estimation of relative effects for multiple competing treatments. A hierarchy of treatments is a useful output of an NMA. Treatment hierarchies are produced using ranking metrics. Common ranking metrics include the Surface Under the Cumulative RAnking curve (SUCRA) and P-scores, which are the frequentist analogue of SUCRAs. Both metrics consider the size and uncertainty of the estimated treatment effects, with larger values indicating a more preferred treatment. Although SUCRAs and P-scores themselves consider uncertainty, treatment hierarchies produced by these ranking metrics are typically reported without a measure of certainty, which might be misleading to practitioners. We propose a new metric, Precision of Treatment Hierarchy (POTH), which quantifies the certainty in producing a treatment hierarchy from SUCRAs or P-scores. The metric connects three statistical quantities: the variance of the SUCRA values, the variance of the mean rank of each treatment, and the average variance of the distribution of individual ranks for each treatment. POTH provides a single, interpretable value which quantifies the degree of certainty in producing a treatment hierarchy. We show how the metric can be adapted to apply to subsets of treatments in a network, for example, to quantify the certainty in the hierarchy of the top three treatments. We calculate POTH for a database of NMAs to investigate its empirical properties, and we demonstrate its use on three published networks.

2410.15244 2026-01-28 eess.IV cs.CV cs.NA eess.SP math.NA stat.ME

Extensions on Low-complexity DCT Approximations for Larger Blocklengths Based on Minimal Angle Similarity

A. P. Radünz, L. Portella, R. S. Oliveira, F. M. Bayer, R. J. Cintra

Comments Clarified methodology; 27 pages, 6 figures, 5 tables

Journal ref J Sign Process Syst 95, 495-516 (2023)

Details
English abstract

The discrete cosine transform (DCT) is a central tool for image and video coding because it can be related to the Karhunen-Loève transform (KLT), which is the optimal transform in terms of retained transform coefficients and data decorrelation. In this paper, we introduce 16-, 32-, and 64-point low-complexity DCT approximations by individually minimizing the angle between the rows of the exact DCT matrix and the matrix induced by the approximate transforms. According to some classical figures of merit, the proposed transforms outperformed the DCT approximations already known in the literature. Fast algorithms were also developed for the low-complexity transforms, ensuring a good balance between performance and computational cost. Practical applications in image encoding showed the relevance of the transforms in this context. In fact, the experiments showed that the proposed transforms had better results than the known approximations in the literature for blocklengths 16, 32, and 64.

2410.14485 2026-01-28 cs.LG stat.ML

CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers for Causally Constrained Predictions

Matthew J. Vowels, Mathieu Rochat, Sina Akbari

Details
English abstract

Artificial Neural Networks (ANNs), including fully-connected networks and transformers, are highly flexible and powerful function approximators, widely applied in fields like computer vision and natural language processing. However, their inability to inherently respect causal structures can limit their robustness, making them vulnerable to covariate shift and difficult to interpret or explain. This poses significant challenges for their reliability in real-world applications. In this paper, we introduce Causal Transformers (CaTs), a general model class designed to operate under predefined causal constraints, as specified by a Directed Acyclic Graph (DAG). CaTs retain the powerful function approximation abilities of traditional neural networks while adhering to the underlying structural constraints, improving robustness, reliability, and interpretability at inference time. This approach opens new avenues for deploying neural networks in more demanding, real-world scenarios where robustness and explainability are critical.

2409.11967 2026-01-28 stat.ME math.ST stat.TH

Incremental effects for continuous exposures

Kyle Schindl, Shuying Shen, Edward H. Kennedy

Details
English abstract

Causal inference problems often involve continuous treatments, such as dose, duration, or frequency. However, identifying and estimating standard dose-response estimands requires that everyone has some chance of receiving any level of the exposure (i.e., positivity). To avoid this assumption, we consider stochastic interventions based on exponentially tilting the treatment distribution by some parameter $δ$ (an incremental effect); this increases or decreases the likelihood a unit receives a given treatment level. We derive the efficient influence function and semiparametric efficiency bound for these incremental effects under continuous exposures. We then show estimation depends on the size of the tilt, as measured by $δ$. In particular, we derive new minimax lower bounds illustrating how the best possible root mean squared error scales with an effective sample size of $n / δ$, instead of $n$. Further, we establish new convergence rates and bounds on the bias of double machine learning-style estimators. Our novel analysis gives a better dependence on $δ$ compared to standard analyses by using mixed supremum and $L_2$ norms. Finally, we define a "reflected" exponential tilt around any interior point and show that taking $δ\to \infty$ yields a new estimator of the dose-response curve across the treatment support.

2408.08080 2026-01-28 stat.AP q-bio.QM

Assessing the properties of the prediction interval in random-effects meta-analysis

Peter Matrai, Tamas Koi, Zoltan Sipos, Nelli Farkas

Comments To be published in Research Synthesis Methods

Journal ref Research Synthesis Methods (2026)

Details
English abstract

Random effects meta-analysis is a widely applied methodology to synthesize research findings of studies on a specific scientific question. Besides estimating the mean effect, an important aim of the meta-analysis is to summarize the heterogeneity, i.e. the variation in the underlying effects caused by the differences in study circumstances. The prediction interval is frequently used for this purpose: a 95% prediction interval contains the true effect of a similar new study in 95% of the cases when it is constructed, or in other words, it covers 95% of the true effects distribution on average. In this article, after providing a clear mathematical background, we present an extensive simulation investigating the performance of all frequentist prediction interval methods published to date. The work focuses on the distribution of the coverage probabilities and how these distributions change depending on the amount of heterogeneity and the number of involved studies. Although the single requirement that a prediction interval has to fulfill is to keep a nominal coverage probability on average, we demonstrate why the distribution of coverages cannot be disregarded, and that for a small number of studies no reliable conclusion can be drawn from the prediction interval. We argue that assessing only the mean coverage can easily lead to misunderstanding and misinterpretation. The length of the intervals and the robustness of the methods concerning non-normality of the true effects are also investigated.

2408.01540 2026-01-28 stat.CO

Monotonic warpings for additive and deep Gaussian processes

Steven D. Barnett, Lauren J. Beesley, Annie S. Booth, Robert B. Gramacy, Dave Osthus

Journal ref Statistics and Computing 35 (2025) 65

Details
English abstract

Gaussian processes (GPs) are canonical as surrogates for computer experiments because they enjoy a degree of analytic tractability. But that breaks when the response surface is constrained, say to be monotonic. Here, we provide a mono-GP construction for a single input that is highly efficient even though the calculations are non-analytic. Key ingredients include transformation of a reference process and elliptical slice sampling. We then show how mono-GP may be deployed effectively in two ways. One is additive, extending monotonicity to more inputs; the other is as a prior on injective latent warping variables in a deep Gaussian process for (non-monotonic, multi-input) non-stationary surrogate modeling. We provide illustrative and benchmarking examples throughout, showing that our methods yield improved performance over the state-of-the-art on examples from those two classes of problems.

2407.10070 2026-01-28 cs.LG math.OC stat.ML

Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression

Pratik Rathore, Zachary Frangella, Jiaming Yang, Michał Dereziński, Madeleine Udell

Comments 63 pages (including appendices), 17 figures, 6 tables

Details
English abstract

Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, full KRR solvers are challenging to scale to large datasets: both direct (e.g., Cholesky decomposition) and iterative methods (e.g., PCG) incur prohibitive computational and storage costs. The standard approach to scale KRR to large datasets chooses a set of inducing points and solves an approximate version of the problem, inducing points KRR. However, the resulting solution tends to have worse predictive performance than the full KRR solution. In this work, we introduce a new solver, ASkotch, for full KRR that provides better solutions faster than state-of-the-art solvers for full and inducing points KRR. ASkotch is a scalable, accelerated, iterative method for full KRR that provably obtains linear convergence. Under appropriate conditions, we show that ASkotch obtains condition-number-free linear convergence. This convergence analysis rests on the theory of ridge leverage scores and determinantal point processes. ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale KRR regression and classification tasks derived from a wide range of application domains, demonstrating the superiority of full KRR over inducing points KRR. Our work opens up the possibility of as-yet-unimagined applications of full KRR across a number of disciplines.

2407.08827 2026-01-28 stat.ME stat.AP

Estimating Methane Emissions from the Upstream Oil and Gas Industry Using a Multi-Stage Framework

Augustine Wigle, Audrey Beliveau

Details
English abstract

Measurement-based methane inventories, which involve surveying oil and gas facilities and compiling data to estimate methane emissions, are becoming the gold standard for quantifying emissions. However, there is a current lack of statistical guidance for the design and analysis of such surveys. The only existing method is a Monte Carlo procedure which is difficult to interpret, computationally intensive, and lacks available open-source code for its implementation. We provide an alternative method by framing methane surveys in the context of multi-stage sampling designs. We contribute estimators of the total emissions along with variance estimators which do not require simulation, as well as stratum-level total estimators. We show that the variance contribution from each stage of sampling can be estimated to inform the design of future surveys. We also introduce a more efficient modification of the estimator. Finally, we propose combining the multi-stage approach with a simple Monte Carlo procedure to model measurement error. The resulting methods are interpretable and require minimal computational resources. We apply the methods to aerial survey data of oil and gas facilities in British Columbia, Canada, to estimate the methane emissions in the province. An R package is provided to facilitate the use of the methods.

2407.06395 2026-01-28 stat.ME

Logit unfolding choice models for binary data

Rayleigh Lei, Abel Rodriguez

Journal ref Stat Comput 35, 41 (2025)

Details
English abstract

Discrete choice models with non-monotonic response functions are important in many areas of application, especially political sciences and marketing. This paper describes a novel unfolding model for binary data that allows for heavy-tailed shocks to the underlying utilities. One of our key contributions is a Markov chain Monte Carlo algorithm that requires little or no parameter tuning, fully explores the support of the posterior distribution, and can be used to fit various extensions of our core model that involve (Bayesian) hypothesis testing on the latent construct. Our empirical evaluations of the model and the associated algorithm suggest that they provide better complexity-adjusted fit to voting data from the United States House of Representatives.

2407.02700 2026-01-28 cs.LG math.PR stat.ML

A simple algorithm for output range analysis for deep neural networks

Helder Rojas, Nilton Rojas, Espinoza J. B., Luis Huamanchumo

Details
English abstract

This paper presents a novel approach for the output range estimation problem in Deep Neural Networks (DNNs) by integrating a Simulated Annealing (SA) algorithm tailored to operate within constrained domains and ensure convergence towards global optima. The method effectively addresses the challenges posed by the lack of local geometric information and the high non-linearity inherent to DNNs, making it applicable to a wide variety of architectures, with a special focus on Residual Networks (ResNets) due to their practical importance. Unlike existing methods, our algorithm imposes minimal assumptions on the internal architecture of neural networks, thereby extending its usability to complex models. Theoretical analysis guarantees convergence, while extensive empirical evaluations, including optimization tests involving functions with multiple local minima, demonstrate the robustness of our algorithm in navigating non-convex response surfaces. The experimental results highlight the algorithm's efficiency in accurately estimating DNN output ranges, even in scenarios characterized by high non-linearity and complex constraints. For reproducibility, the Python code and datasets used in the experiments are publicly available through our GitHub repository.

2406.07409 2026-01-28 stat.ML cs.IT cs.LG eess.SP math.IT math.OC

Accelerating Ill-conditioned Hankel Matrix Recovery via Structured Newton-like Descent

HanQin Cai, Longxiu Huang, Xiliang Lu, Juntao You

Journal ref Inverse Problems, 41(7): 075015, 2025

Details
English abstract

This paper studies the robust Hankel recovery problem, which simultaneously removes sparse outliers and fills in missing entries from partial observations. We propose a novel non-convex algorithm, coined Hankel Structured Newton-Like Descent (HSNLD), to tackle the robust Hankel recovery problem. HSNLD is highly efficient with linear convergence, and its convergence rate is independent of the condition number of the underlying Hankel matrix. The recovery guarantee has been established under some mild conditions. Numerical experiments on both synthetic and real datasets show the superior performance of HSNLD against state-of-the-art algorithms.

2406.06765 2026-01-28 q-bio.PE stat.AP

Classical JAK2V617F+ Myeloproliferative Neoplasms emergence and development based on real life incidence and mathematical modeling

Ana Fernández Baranda, Vincent Bansaye, Evelyne Lauret, Morgane Mounier, Valérie Ugo, Sylvie Méléard, Stéphane Giraudier

Details
English abstract

Mathematical modeling allows us to better understand the emergence and development of myeloproliferative neoplasms (MPN), a group of blood cancers. We test different mathematical models on an initial cohort to determine the emergence and evolution times before diagnosis of JAK2V617F+ classical MPN (Polycythemia Vera (PV) and Essential Thrombocythemia (ET)). We consider the time before diagnosis as the sum of two independent periods: the time (from embryonic development) for the JAK2V617F mutation to occur, not disappear and enter proliferation, and a second time corresponding to the expansion of the clonal population until diagnosis. We prove that the rate of active mutation occurrence increases exponentially with age following the Gompertz model rather than being constant. We find that the first tumorous cell takes an average time of $63.1 \pm 13$ years to appear and start proliferation. On the other hand, the expansion time is constant: $8.8$ years once the mutation has emerged. These results are validated in an external cohort. Using this model, we analyze JAK2V617F ET versus PV, and find that the time of active mutation occurrence is approximately $1.5$ years longer for PV than for ET, while the expansion time is similar. In conclusion, our age-dependent approach for the emergence and development of MPN demonstrates that the emergence of a JAK2V617F mutation should be linked to an aging mechanism, and indicates a period of 8 to 9 years to develop a full MPN.

2405.18531 2026-01-28 econ.EM stat.AP

Difference-in-Discontinuities: Estimation, Inference and Validity Tests

Pedro Picchetti, Cristine C. X. Pinto, Stephanie T. Shinoki

Details
English abstract

This paper provides a formal econometric framework behind the newly developed difference-in-discontinuities design (DiDC). Despite its increasing use in applied research, there are currently limited studies of its properties. We formalize the theory behind the difference-in-discontinuities approach by stating the identification assumptions, proposing a nonparametric estimator, and deriving its asymptotic properties. We also provide comprehensive tests for one of the identification assumptions of the DiDC and sensitivity analysis methods that allow researchers to evaluate the robustness of DiDC estimates under violations of the identifying assumptions. Monte Carlo simulation studies show that the estimators have desirable finite-sample properties. Finally, we revisit Grembi et al. (2016), which studies the effects of relaxing fiscal rules on public finance outcomes. Our results show that most of the qualitative takeaways of the original work are robust to time-varying confounding effects.

2405.16106 2026-01-28 stat.ME stat.AP

On the PM2.5 -- Mortality Association: A Bayesian Model for Spatio-Temporal Confounding

Carlo Zaccardi, Pasquale Valentini, Luigi Ippoliti, Alexandra M. Schmidt

Comments 45 pages, 9 figures

Details
English abstract

In epidemiological studies of air pollution and public health, estimating the health impact of exposure to air pollution may be hindered by the unknown functional form of the exposure-outcome association and by unmeasured confounding factors that are linked to both exposure and outcome. These challenges are especially relevant in spatio-temporal analyses, where their joint exploration remains limited. To study the effects of fine particulate matter on mortality among elderly people in Italy, we propose a Bayesian spatial dynamic generalized linear model that captures the non-linear exposure-outcome association and decomposes the exposure effect across fine and coarse spatio-temporal scales of variation. Together, these features allow reducing the spatio-temporal confounding bias and recovering the shape of the association, as demonstrated through simulation studies. The real-data analysis reveals a clear temporal pattern in the exposure effect, with peaks during summer months. We argue that this finding may be due to interactions of particulate matter with air temperature and unmeasured confounders.

2403.18602 2026-01-28 stat.ME q-bio.MN

Multi-omics network reconstruction with collaborative graphical lasso

Alessio Albanese, Wouter Kohlen, Pariya Behrouzi

Details
English abstract

Motivation: In recent years, the availability of multi-omics data has increased substantially. Multi-omics data integration methods mainly aim to leverage different molecular layers to gain a complete molecular description of biological processes. An attractive integration approach is the reconstruction of multi-omics networks. However, the development of effective multi-omics network reconstruction strategies lags behind. Results: In this study, we introduce collaborative graphical lasso, a novel approach that extends graphical lasso by incorporating collaboration between omics layers, thereby improving multi-omics data integration and enhancing network inference. Our method leverages a collaborative penalty term, which harmonizes the contribution of the omics layers to the reconstruction of the network structure. This promotes a cohesive integration of information across modalities, and it is introduced alongside a dual regularization scheme that separately controls sparsity within and between layers. To address the challenge of model selection in this framework, we propose XStARS, a stability-based criterion for multi-dimensional hyperparameter tuning. We assess the performance of collaborative graphical lasso and the corresponding model selection procedure through simulations, and we apply them to publicly available multi-omics data. This application demonstrates that collaborative graphical lasso recovers established biological interactions while suggesting novel, biologically coherent connections. Availability and implementation: We implemented collaborative graphical lasso as an R package, available on CRAN as coglasso. The results of the manuscript can be reproduced by running the code available at https://github.com/DrQuestion/coglasso_reproducible_code

2309.13183 2026-01-28 math.ST stat.ME stat.ML stat.TH

Statistical Hypothesis Testing for Information Value (IV)

Helder Rojas, Cirilo Alvarez, Nilton Rojas

Details
English abstract

Information Value (IV) is a widely used technique for feature selection prior to the modeling phase, particularly in credit scoring and related domains. However, conventional IV-based practices rely on fixed empirical thresholds, which lack statistical justification and may be sensitive to characteristics such as class imbalance. In this work, we develop a formal statistical framework for IV by establishing its connection with Jeffreys divergence and propose a novel nonparametric hypothesis test, referred to as the J-Divergence test. Our method provides rigorous asymptotic guarantees and enables interpretable decisions based on $p$-values. Numerical experiments, including synthetic and real-world data, demonstrate that the proposed test is more reliable than traditional IV thresholding, particularly under strong imbalance. The test is model-agnostic, computationally efficient, and well-suited for the pre-modeling phase in high-dimensional or imbalanced settings. An open-source Python library is provided for reproducibility and practical adoption.
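The connection between IV and Jeffreys divergence can be checked numerically. The sketch below uses the common credit-scoring definition of IV over binned counts and compares it with the sum of the two Kullback-Leibler divergences. It assumes every bin has a nonzero count in both classes; the function names and toy counts are illustrative and are not taken from the authors' library, and the hypothesis test itself is not shown.

```python
import math


def bin_proportions(counts):
    """Turn per-bin counts into per-bin proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def information_value(good_counts, bad_counts):
    """IV = sum over bins of (p_i - q_i) * ln(p_i / q_i).

    Assumes all counts are strictly positive.
    """
    p = bin_proportions(good_counts)
    q = bin_proportions(bad_counts)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def jeffreys_divergence(good_counts, bad_counts):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    p = bin_proportions(good_counts)
    q = bin_proportions(bad_counts)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy usage with three bins of a hypothetical feature.
good = [40, 30, 30]
bad = [10, 20, 70]
iv = information_value(good, bad)
jd = jeffreys_divergence(good, bad)
```

The two quantities agree term by term, since (p - q) ln(p/q) = p ln(p/q) + q ln(q/p) in every bin.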

2306.12344 2026-01-28 cs.LG cs.DS stat.ML

An efficient, provably optimal algorithm for the 0-1 loss linear classification problem

Xi He, Max A. Little

Comments Published in ICLR 2026, The Fourteenth International Conference on Learning Representations. 19 pages, 6 figures

Details
English abstract

Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can obtain the exact solution to the corresponding 0-1 loss classification problem efficiently, but for data which is not linearly separable, it has been shown that this problem, in full generality, is NP-hard. Alternative approaches all involve approximations of some kind, such as the use of surrogates for the 0-1 loss (for example, the hinge or logistic loss), none of which can be guaranteed to solve the problem exactly. Finding an efficient, rigorously proven algorithm for obtaining an exact (i.e., globally optimal) solution to the 0-1 loss linear classification problem remains an open problem. By analyzing the combinatorial and incidence relations between hyperplanes and data points, we derive a rigorous construction algorithm, incremental cell enumeration (ICE), that can solve the 0-1 loss classification problem exactly in $O(N^{D+1})$ time. To the best of our knowledge, this is the first standalone algorithm, one that does not rely on general-purpose solvers, with rigorously proven guarantees for this problem. Moreover, we further generalize ICE to address the polynomial hypersurface classification problem in $O(N^{G+1})$ time, where $G$ is determined by both the data dimension and the polynomial hypersurface degree. The correctness of our algorithm is proved by the use of tools from the theory of hyperplane arrangements and oriented matroids. We demonstrate the effectiveness of our algorithm on real-world datasets, achieving optimal training accuracy for small-scale datasets and higher test accuracy on most datasets. Furthermore, our complexity analysis shows that the ICE algorithm offers superior computational efficiency compared with the state-of-the-art branch-and-bound algorithm.

2305.19380 2026-01-28 stat.AP

Dynamic Factor Models for Binary Data in Circular Spaces: An Application to the U.S. Supreme Court

Rayleigh Lei, Abel Rodriguez

Abstract

Latent factor models are widely used in the social and behavioral sciences as scaling tools to map discrete multivariate outcomes into low-dimensional, continuous scales. In political science, dynamic versions of classical factor models have been widely used to study the evolution of justices' preferences in multi-judge courts. In this paper, we discuss a new dynamic factor model that relies on a latent circular space that can accommodate voting behaviors in which justices commonly understood to be on opposite ends of the ideological spectrum vote together on a substantial number of otherwise closely divided opinions. We apply this model to data on non-unanimous decisions made by the U.S. Supreme Court between 1937 and 2021, and show that, for most of this period, voting patterns can be better described by a circular latent space.

2210.04146 2026-01-28 stat.ME

Inference on model parameters with many L-moments

Luis Alvarez, Chang Chiann, Pedro Morettin

Journal ref Journal of Econometrics, Volume 252, Part A, November 2025, 106101

Abstract

This paper studies parameter estimation using L-moments, an alternative to traditional moments with attractive statistical properties. The estimation of model parameters by matching sample L-moments is known to outperform maximum likelihood estimation (MLE) in small samples from popular distributions. The choice of the number of L-moments used in estimation remains ad hoc, though: researchers typically set the number of L-moments equal to the number of parameters, which is inefficient in larger samples. In this paper, we show that, by properly choosing the number of L-moments and weighting these accordingly, one is able to construct an estimator that outperforms MLE in finite samples, and yet retains asymptotic efficiency. We do so by introducing a generalised method of L-moments estimator and deriving its properties in an asymptotic framework where the number of L-moments varies with sample size. We then propose methods to automatically select the number of L-moments in a sample. Monte Carlo evidence shows our approach can provide mean-squared-error improvements over MLE in smaller samples, whilst working as well as it in larger samples. We consider extensions of our approach to the estimation of conditional models and a class of semiparametric models. We apply the latter to study expenditure patterns in a ridesharing platform in Brazil.
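
For readers unfamiliar with the sample L-moments being matched, the following is a minimal sketch of how the first four can be computed from unbiased probability-weighted moments. The function name `sample_l_moments` is hypothetical, and the sketch covers only the standard sample statistics, not the generalised estimator or the selection procedure proposed in the paper.

```python
from math import comb


def sample_l_moments(data):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Hypothetical helper for illustration; uses the standard unbiased
    probability-weighted moments b_0..b_3 of the ordered sample.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # b_r = (1/n) * sum_i [C(i, r) / C(n-1, r)] * x_(i), with i counted from 0.
    b = [
        sum(comb(i, r) * x[i] for i in range(n)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    l1 = b[0]                                    # location
    l2 = 2 * b[1] - b[0]                         # scale
    l3 = 6 * b[2] - 6 * b[1] + b[0]              # skewness numerator
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]  # kurtosis numerator
    return l1, l2, l3, l4
```

For an evenly spaced, symmetric sample such as `[1, 2, 3, 4, 5]`, the third L-moment is zero, which reflects its role as a measure of asymmetry.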

2209.10166 2026-01-28 q-fin.MF cs.LG math.PR q-fin.CP stat.ML

Chaotic Hedging with Iterated Integrals and Neural Networks

Ariel Neufeld, Philipp Schmocker

Abstract

In this paper, we derive an $L^p$-chaos expansion based on iterated Stratonovich integrals with respect to a given exponentially integrable continuous semimartingale. By omitting the orthogonality of the expansion, we show that every $p$-integrable functional, $p \in [1,\infty)$, can be approximated by a finite sum of iterated Stratonovich integrals. Using (possibly random) neural networks as integrands, we therefore obtain universal approximation results for $p$-integrable financial derivatives in the $L^p$-sense. Moreover, we can approximately solve the $L^p$-hedging problem (coinciding for $p = 2$ with the quadratic hedging problem), where the approximating hedging strategy can be computed in closed form within short runtime.

2202.11393 2026-01-28 cs.CR stat.ML

Differential privacy for symmetric log-concave mechanisms

Staal A. Vinterbo

Comments AISTATS 2022, v3 included warning about error in Lemma 8

Abstract

Adding random noise to database query results is an important tool for achieving privacy. A challenge is to minimize this noise while still meeting privacy requirements. Recently, a sufficient and necessary condition for $(ε, δ)$-differential privacy for Gaussian noise was published. This condition allows the computation of the minimum privacy-preserving scale for this distribution. We extend this work and provide a sufficient and necessary condition for $(ε, δ)$-differential privacy for all symmetric and log-concave noise densities. Our results allow fine-grained tailoring of the noise distribution to the dimensionality of the query result. We demonstrate that this can yield significantly lower mean squared errors than those incurred by the currently used Laplace and Gaussian mechanisms for the same $ε$ and $δ$.
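
As a point of reference for the two baseline mechanisms named in the abstract, here is a minimal sketch of their textbook calibrations: Laplace noise with scale equal to the $\ell_1$ sensitivity divided by $ε$, and the classical Gaussian calibration $σ = Δ_2 \sqrt{2 \ln(1.25/δ)} / ε$, which is valid for $0 < ε < 1$. All function names are hypothetical, and this does not implement the paper's sufficient and necessary condition for general symmetric log-concave densities.

```python
import math
import random


def laplace_scale(l1_sensitivity, epsilon):
    """Scale b of Laplace noise giving epsilon-DP for a query with this L1 sensitivity."""
    return l1_sensitivity / epsilon


def classical_gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Textbook Gaussian calibration for (epsilon, delta)-DP; requires 0 < epsilon < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("the classical bound assumes 0 < epsilon < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def laplace_mechanism(values, l1_sensitivity, epsilon, rng=random):
    """Add independent Laplace(0, b) noise to each coordinate of a query result."""
    b = laplace_scale(l1_sensitivity, epsilon)
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    return [v + rng.expovariate(1 / b) - rng.expovariate(1 / b) for v in values]
```

The per-coordinate noise variance is $2b^2$ for the Laplace mechanism and $σ^2$ for the Gaussian one, which is the quantity a tailored noise distribution would aim to reduce.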

2109.10755 2026-01-28 math.ST stat.TH

Contraction rates for sparse variational approximations in Gaussian process regression

Dennis Nieman, Botond Szabo, Harry van Zanten

Comments 26 pages, 6 figures, 1 table

Journal ref Journal of Machine Learning Research 23(205), pages 1-26 (2022)

Abstract

We study the theoretical properties of a variational Bayes method in the Gaussian Process regression model. We consider the inducing variables method introduced by Titsias (2009a) and derive sufficient conditions for obtaining contraction rates for the corresponding variational Bayes (VB) posterior. As examples we show that for three particular covariance kernels (Matérn, squared exponential, random series prior) the VB approach can achieve optimal, minimax contraction rates for a sufficiently large number of appropriately chosen inducing variables. The theoretical findings are demonstrated by numerical experiments.

2107.03041 2026-01-28 math.ST stat.TH

Test for independence of long-range dependent time series using distance covariance

Annika Betken, Herold Dehling

Abstract

We apply the concept of distance covariance for testing independence of two long-range dependent time series. As test statistic we propose a linear combination of empirical distance cross-covariances. We derive the asymptotic distribution of the test statistic, and we show consistency against a very general class of alternatives. The asymptotic theory developed in this paper is based on a novel non-central limit theorem for stochastic processes with values in an $L^2$-Hilbert space. This limit theorem is of general theoretical interest which goes beyond the context of this article. Subject to the dependence in the data, the standardization and the limit distributions of the proposed test statistic vary. Since the limit distributions are unknown, we propose a subsampling procedure to determine the critical values for the proposed test, and we provide a proof for the validity of subsampling. In a simulation study, we investigate the finite-sample behavior of our test, and we compare its performance to tests based on the empirical cross-covariances. As an application of our results we analyze the cross-dependencies between mean monthly discharges of three rivers.
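
To make the building block of the test statistic concrete, the sketch below computes the squared empirical distance covariance of two univariate samples via double-centred distance matrices, and a lagged version that pairs $x_t$ with $y_{t+h}$. The function names are hypothetical, and the sketch stops short of the paper's linear combination of lags, its standardization, and the subsampling procedure for critical values.

```python
def _centered_distances(z):
    """Double-centred matrix of pairwise absolute distances for a 1-D sample."""
    n = len(z)
    d = [[abs(z[i] - z[j]) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d]  # row means (equal to column means by symmetry)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_covariance_sq(x, y):
    """Squared empirical distance covariance (V-statistic form) of paired samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def distance_cross_covariance_sq(x, y, lag):
    """Squared empirical distance covariance between x_t and y_{t+lag}, lag >= 0."""
    if len(x) != len(y) or not 0 <= lag < len(x):
        raise ValueError("need equal lengths and 0 <= lag < len(x)")
    return distance_covariance_sq(x[: len(x) - lag], y[lag:])
```

The pairwise-distance matrices make this an $O(n^2)$ computation per lag, which is acceptable for a sketch but not tuned for long series.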

2103.03237 2026-01-28 econ.EM stat.ME

High-dimensional estimation of quadratic variation based on penalized realized variance

Kim Christensen, Mikkel Slot Nielsen, Mark Podolskij

Abstract

In this paper, we develop a penalized realized variance (PRV) estimator of the quadratic variation (QV) of a high-dimensional continuous Itô semimartingale. We adapt the principal idea of regularization from linear regression to covariance estimation in a continuous-time high-frequency setting. We show that under a nuclear norm penalization, the PRV is computed by soft-thresholding the eigenvalues of realized variance (RV). It therefore encourages sparsity of singular values or, equivalently, low rank of the solution. We prove our estimator is minimax optimal up to a logarithmic factor. We derive a concentration inequality, which reveals that the rank of PRV is -- with a high probability -- the number of non-negligible eigenvalues of the QV. Moreover, we also provide the associated non-asymptotic analysis for the spot variance. We suggest an intuitive data-driven subsampling procedure to select the shrinkage parameter. Our theory is supplemented by a simulation study and an empirical application. The PRV detects about three to five factors in the equity market, with a notable rank decrease during times of distress in financial markets. This is consistent with most standard asset pricing models, where a limited number of systematic factors driving the cross-section of stock returns are perturbed by idiosyncratic errors, rendering the QV -- and also RV -- of full rank.
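
The soft-thresholding step described in the abstract can be sketched as follows with NumPy. The function names are hypothetical, and `threshold` is taken directly as the amount subtracted from each eigenvalue; how it relates to the paper's shrinkage parameter depends on the normalization of the penalized objective, which the abstract does not state.

```python
import numpy as np


def realized_variance(increments):
    """Realized variance: sum of outer products of high-frequency increments.

    `increments` is an (n_obs, d) array, one row per observed increment.
    """
    r = np.asarray(increments, dtype=float)
    return r.T @ r


def soft_threshold_eigenvalues(rv, threshold):
    """Shrink each eigenvalue of a symmetric matrix by `threshold`, flooring at zero."""
    eigvals, eigvecs = np.linalg.eigh(rv)
    shrunk = np.maximum(eigvals - threshold, 0.0)
    # Rebuild the matrix from the shrunken spectrum; zeroed eigenvalues lower the rank.
    return (eigvecs * shrunk) @ eigvecs.T
```

Eigenvalues below the threshold are set exactly to zero, which is how the penalty produces a low-rank estimate even when the realized variance itself has full rank.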

2007.10432 2026-01-28 econ.EM stat.ME

Treatment Effects with Targeting Instruments

Sokbae Lee, Bernard Salanié

Abstract

Multivalued treatments are commonplace in applications. We explore the use of discrete-valued instruments to control for selection bias in this setting. Our discussion revolves around the concept of targeting: which instruments target which treatments. It allows us to establish conditions under which counterfactual averages and treatment effects are point- or partially-identified for composite complier groups. We explore the additional identifying power of a positive selection assumption. We illustrate its usefulness by revisiting the findings of Kline and Walters (2016) on the Head Start Impact Study. We derive informative bounds that suggest less beneficial effects of Head Start expansions than their parametric estimates.

2601.18952 2026-01-28 cs.LG stat.ME stat.ML

Vector-Valued Distributional Reinforcement Learning Policy Evaluation: A Hilbert Space Embedding Approach

Mehrdad Mohammadi, Qi Zheng, Ruoqing Zhu

Abstract

We propose an (offline) multi-dimensional distributional reinforcement learning framework (KE-DRL) that leverages Hilbert space mappings to estimate the kernel mean embedding of the multi-dimensional value distribution under a proposed target policy. In our setting, the state-action variables are multi-dimensional and continuous. By mapping probability measures into a reproducing kernel Hilbert space via kernel mean embeddings, our method replaces Wasserstein metrics with an integral probability metric. This enables efficient estimation in multi-dimensional state-action spaces and reward settings, where direct computation of Wasserstein distances is computationally challenging. Theoretically, we establish contraction properties of the distributional Bellman operator under our proposed metric involving the Matérn family of kernels and provide uniform convergence guarantees. Simulations and empirical results demonstrate robust off-policy evaluation and recovery of the kernel mean embedding under mild assumptions, namely, Lipschitz continuity and boundedness of the kernels, highlighting the potential of embedding-based approaches in complex real-world decision-making scenarios and risk evaluation.

2601.18950 2026-01-28 stat.ML cs.IT cs.LG math.IT

Collaborative Compressors in Distributed Mean Estimation with Limited Communication Budget

Harsh Vardhan, Arya Mazumdar

Journal ref Transactions on Machine Learning Research 2025

Abstract

Distributed high-dimensional mean estimation is a common aggregation routine used often in distributed optimization methods. Most of these applications call for a communication-constrained setting where vectors, whose mean is to be estimated, have to be compressed before sharing. One could independently encode and decode these to achieve compression, but that overlooks the fact that these vectors are often close to each other. To exploit these similarities, Suresh et al. (2022), Jhunjhunwala et al. (2021), and Jiang et al. (2023) recently proposed multiple correlation-aware compression schemes. However, in most cases, the correlations have to be known for these schemes to work. Moreover, a theoretical analysis of graceful degradation of these correlation-aware compression schemes with increasing dissimilarity is limited to only the $\ell_2$-error in the literature. In this paper, we propose four different collaborative compression schemes that agnostically exploit the similarities among vectors in a distributed setting. Our schemes are all simple to implement and computationally efficient, while resulting in big savings in communication. The analysis of our proposed schemes shows how the $\ell_2$, $\ell_\infty$ and cosine estimation errors vary with the degree of similarity among vectors.

2601.18938 2026-01-28 cs.LG cs.SI stat.ML

FSD-CAP: Fractional Subgraph Diffusion with Class-Aware Propagation for Graph Feature Imputation

Xin Qiao, Shijie Sun, Anqi Dong, Cong Hua, Xia Zhao, Longfei Zhang, Guangming Zhu, Liang Zhang

Comments 31 pages, 12 figures

Abstract

Imputing missing node features in graphs is challenging, particularly under high missing rates. Existing methods based on latent representations or global diffusion often fail to produce reliable estimates, and may propagate errors across the graph. We propose FSD-CAP, a two-stage framework designed to improve imputation quality under extreme sparsity. In the first stage, a graph-distance-guided subgraph expansion localizes the diffusion process. A fractional diffusion operator adjusts propagation sharpness based on local structure. In the second stage, imputed features are refined using class-aware propagation, which incorporates pseudo-labels and neighborhood entropy to promote consistency. We evaluated FSD-CAP on multiple datasets. With $99.5\%$ of features missing across five benchmark datasets, FSD-CAP achieves average accuracies of $80.06\%$ (structural) and $81.01\%$ (uniform) in node classification, close to the $81.31\%$ achieved by a standard GCN with full features. For link prediction under the same setting, it reaches AUC scores of $91.65\%$ (structural) and $92.41\%$ (uniform), compared to $95.06\%$ for the fully observed case. Furthermore, FSD-CAP demonstrates superior performance on both large-scale and heterophily datasets when compared to other models.

2601.18932 2026-01-28 eess.IV cs.IT cs.LG math.IT stat.ML

Advances in Diffusion-Based Generative Compression

Yibo Yang, Stephan Mandt

Comments Preprint

Abstract

Popularized by their strong image generation performance, diffusion and related methods for generative modeling have found widespread success in visual media applications. In particular, diffusion methods have enabled new approaches to data compression, where realistic reconstructions can be generated at extremely low bit-rates. This article provides a unifying review of recent diffusion-based methods for generative lossy compression, with a focus on image compression. These methods generally encode the source into an embedding and employ a diffusion model to iteratively refine it in the decoding procedure, such that the final reconstruction approximately follows the ground truth data distribution. The embedding can take various forms and is typically transmitted via an auxiliary entropy model, and recent methods also explore the use of diffusion models themselves for information transmission via channel simulation. We review representative approaches through the lens of rate-distortion-perception theory, highlighting the role of common randomness and connections to inverse problems, and identify open challenges.

2601.18907 2026-01-28 stat.ML cs.LG

Implicit Q-Learning and SARSA: Liberating Policy Control from Step-Size Calibration

Hwanwoo Kim, Eric Laber

Abstract

Q-learning and SARSA are foundational reinforcement learning algorithms whose practical success depends critically on step-size calibration. Step-sizes that are too large can cause numerical instability, while step-sizes that are too small can lead to slow progress. We propose implicit variants of Q-learning and SARSA that reformulate their iterative updates as fixed-point equations. This yields an adaptive step-size adjustment that scales inversely with feature norms, providing automatic regularization without manual tuning. Our non-asymptotic analyses demonstrate that implicit methods maintain stability over significantly broader step-size ranges. Under favorable conditions, they permit arbitrarily large step-sizes while achieving comparable convergence rates. Empirical validation across benchmark environments spanning discrete and continuous state spaces shows that implicit Q-learning and SARSA exhibit substantially reduced sensitivity to step-size selection, achieving stable performance with step-sizes that would cause standard methods to fail.
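
One way to read the fixed-point reformulation is sketched below for linear function approximation, under the assumption that only the current action value is evaluated at the new parameter: $θ' = θ + α\,(r + γ \max_{a'} φ(s',a')^\top θ - φ^\top θ')\,φ$. Solving for $θ'$ gives the ordinary update with effective step-size $α / (1 + α\|φ\|^2)$. The function name and this particular form are assumptions for illustration and may differ from the paper's exact scheme.

```python
def implicit_q_update(theta, phi, reward, next_phis, alpha, gamma):
    """One implicit Q-learning step with linear features (illustrative sketch).

    theta     : current weight vector (list of floats)
    phi       : feature vector of the visited (state, action) pair
    next_phis : feature vectors of all actions in the next state
                (an empty list marks a terminal transition)
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    q_sa = dot(theta, phi)
    max_next = max((dot(theta, p) for p in next_phis), default=0.0)
    delta = reward + gamma * max_next - q_sa  # ordinary TD error at theta
    # Closed-form solution of the fixed-point equation: the step-size
    # shrinks as the squared feature norm grows.
    effective_step = alpha / (1.0 + alpha * dot(phi, phi))
    return [t + effective_step * delta * f for t, f in zip(theta, phi)]
```

Because the effective step-size is bounded above by $1/\|φ\|^2$, even a very large nominal `alpha` moves the prediction for the visited pair at most all the way to its target, never beyond it.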

2601.18900 2026-01-28 cs.CV cs.LG stat.ML

RealStats: A Rigorous Real-Only Statistical Framework for Fake Image Detection

Haim Zisman, Uri Shaham

Comments 22 pages, 14 figures. Accepted to AISTATS 2026

Abstract

As generative models continue to evolve, detecting AI-generated images remains a critical challenge. While effective detection methods exist, they often lack formal interpretability and may rely on implicit assumptions about fake content, potentially limiting robustness to distributional shifts. In this work, we introduce a rigorous, statistically grounded framework for fake image detection that focuses on producing a probability score interpretable with respect to the real-image population. Our method leverages the strengths of multiple existing detectors by combining training-free statistics. We compute p-values over a range of test statistics and aggregate them using classical statistical ensembling to assess alignment with the unified real-image distribution. This framework is generic, flexible, and training-free, making it well-suited for robust fake image detection across diverse and evolving settings.
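
The p-value pipeline can be illustrated with a minimal sketch: an empirical p-value of a detector statistic against a reference sample of real images, followed by a classical combination rule. Fisher's method is used here purely as an assumption, since the abstract says only "classical statistical ensembling", and it presumes independent p-values. Both function names are hypothetical.

```python
import math


def empirical_p_value(statistic, real_reference):
    """Share of real-image reference statistics at least as extreme as `statistic`.

    The +1 terms keep the p-value strictly positive for finite reference samples.
    """
    exceed = sum(1 for r in real_reference if r >= statistic)
    return (1 + exceed) / (1 + len(real_reference))


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(log p) is chi-square with 2k degrees of freedom
    under the null; for even degrees of freedom the tail has a closed form.
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    half = -sum(math.log(p) for p in p_values)  # half of the Fisher statistic
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))
```

A small combined p-value indicates that the image is poorly aligned with the real-image reference population across the pooled statistics.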

2601.18889 2026-01-28 stat.AP

A penalized heteroskedastic ordered probit model for DIF (measurement invariance) testing of single-item assessments in cross-cultural research

R Noah Padgett

Comments 10 pages, 3 figures

Abstract

Differential item functioning (DIF) or measurement invariance (MI) testing for single-item assessments has previously been impossible. One reason is that regression-based DIF methods have no conditioning variables to serve as a proxy for the latent variable. Another reason is that factor-analytic approaches require multiple items to estimate parameters. In this technical working paper, I propose an approach for evaluating DIF/MI in a single-item assessment of a construct. The current methods should NOT replace multiple-indicator MG-CFA/IRT analyses of DIF/MI or regression-based methods when these are possible. More items generally provide significantly better construct coverage and provide more rigorous DIF/MI evaluation.

2601.18815 2026-01-28 q-fin.MF stat.ML

Prediction Markets as Bayesian Inverse Problems: Uncertainty Quantification, Identifiability, and Information Gain from Price-Volume Histories under Latent Types

Juan Pablo Madrigal-Cianci, Camilo Monsalve Maya, Lachlan Breakey

Abstract

Prediction markets are often described as mechanisms that "aggregate information" into prices, yet the mapping from dispersed private information to observed market histories is typically noisy, endogenous, and shaped by heterogeneous and strategic participation. This paper formulates prediction markets as Bayesian inverse problems in which the unknown event outcome $Y\in\{0,1\}$ is inferred from an observed history of market-implied probabilities and traded volumes. We introduce a mechanism-agnostic observation model in log-odds space in which price increments conditional on volume arise from a latent mixture of trader types. The resulting likelihood class encompasses informed and uninformed trading, heavy-tailed microstructure noise, and adversarial or manipulative flow, while requiring only price and volume as observables. Within this framework we define posterior uncertainty quantification for $Y$, provide identifiability and well-posedness criteria in terms of Kullback--Leibler separation between outcome-conditional increment laws, and derive posterior concentration statements and finite-sample error bounds under general regularity assumptions. We further study stability of posterior odds to perturbations of the observed price--volume path and define realized and expected information gain via the posterior-vs-prior KL divergence and mutual information. The inverse-problem formulation yields explicit diagnostics for regimes in which market histories are informative and stable versus regimes in which inference is ill-posed due to type-composition confounding or outcome--nuisance symmetries. Extensive experiments on synthetic data validate our theoretical predictions regarding posterior concentration rates and identifiability thresholds.

2601.18806 2026-01-28 cond-mat.soft nlin.AO stat.AP

Is gelation a singularity or a flow induced instability?

Manuel Dedola, Ludovico Cademartiri

Abstract

Gelation in the Smoluchowski coagulation equation is commonly interpreted as a finite-time singularity marked by mass loss or moment divergence. We instead characterize gelation as a loss of dynamical stability of the Smoluchowski flow, quantified through the time-dependent spectrum of the Jacobian along the evolving aggregation dynamics. Studying homogeneous kernels $K(i,j)=(ij)^α$ together with the classical Smoluchowski, we show that gelation is consistently preceded by the appearance of positive real eigenvalues, indicating a loss of local dynamical stability. While non-gelling kernels exhibit only transient finite-size effects, gelling kernels display persistent spectral destabilization associated with macroscopic gel formation. Our results identify gelation as a genuine dynamical instability of the Smoluchowski flow.

2512.05276 2026-01-28 stat.ME stat.AP

A Functional Approach to Testing Overall Effect of Interaction Between DNA Methylation and SNPs

Yvelin Gansou, Karim Oualkacha, Marzia Angela Cremona, Lajmi Lakhal-Chaieb

Comments 30 pages, 14 figures

Journal ref Statistics in Medicine, 2026

Abstract

We introduce a test for the overall effect of interaction between DNA methylation and a set of single nucleotide polymorphisms (SNPs) on a quantitative phenotype. The developed inference procedure is based on a functional approach that extends existing regression models in functional data analysis. Through extensive simulations, we show that the proposed test effectively controls type I error rates and achieves higher empirical power than existing methods, particularly when multiple interactions are present. The use of the proposed test is illustrated with an application to data from obesity patients and controls.

2512.03102 2026-01-28 cs.LG cs.AI stat.CO

Dynamic Correction of Erroneous State Estimates via Diffusion Bayesian Exploration

Yiwei Shi, Hongnan Ma, Mengyue Yang, Cunjia Liu, Weiru Liu

Abstract

In emergency response and other high-stakes societal applications, early-stage state estimates critically shape downstream outcomes. Yet these initial state estimates, often based on limited or biased information, can be severely misaligned with reality, constraining subsequent actions and potentially causing catastrophic delays, resource misallocation, and human harm. Under the stationary bootstrap baseline (zero transition and no rejuvenation), bootstrap particle filters exhibit Stationarity-Induced Posterior Support Invariance (S-PSI), wherein regions excluded by the initial prior remain permanently unexplorable, making corrections impossible even when new evidence contradicts current beliefs. While classical perturbations can in principle break this lock-in, they operate in an always-on fashion and may be inefficient. To overcome this, we propose a diffusion-driven Bayesian exploration framework that enables principled, real-time correction of early state estimation errors. Our method expands posterior support via entropy-regularized sampling and covariance-scaled diffusion. A Metropolis-Hastings check validates proposals and keeps inference adaptive to unexpected evidence. Empirical evaluations on realistic hazardous-gas localization tasks show that our approach matches reinforcement learning and planning baselines when priors are correct. It substantially outperforms classical SMC perturbations and RL-based methods under misalignment, and we provide theoretical guarantees that DEPF resolves S-PSI while maintaining statistical rigor.

2505.21791 2026-01-28 stat.ML cs.LG

Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Julia Nakhleh, Robert D. Nowak

Comments Update to final published version (NeurIPS 2025). Fixed one incorrect citation

Abstract

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network--i.e., the network with the fewest nonzero parameters or neurons--a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing $\ell^p$ quasinorms of the weights for $0 < p < 1$, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.

2502.02580 2026-01-28 math.ST stat.ME stat.ML stat.TH

Minimax-Optimal Spectral Clustering with Covariance Projection for High-Dimensional Anisotropic Mixtures

Chengzhu Huang, Yuqi Gu

Abstract

In mixture models, anisotropic noise within each cluster is widely present in real-world data. This work investigates both computationally efficient procedures and fundamental statistical limits for clustering in high-dimensional anisotropic mixtures. We propose a new clustering method, Covariance Projected Spectral Clustering (COPO), which adapts to a wide range of dependent noise structures. We first project the data onto a low-dimensional space via eigen-decomposition of a diagonal-deleted Gram matrix. Our central methodological idea is to sharpen clustering in this embedding space by a covariance-aware reassignment step, using quadratic distances induced by estimated projected covariances. Through a novel row-wise analysis of the subspace estimation step in weak-signal regimes, which is of independent interest, we establish tight performance guarantees and algorithmic upper bounds for COPO, covering both Gaussian noise with flexible covariance and general noise with local dependence. To characterize the fundamental difficulty of clustering high-dimensional anisotropic Gaussian mixtures, we further establish two distinct and complementary minimax lower bounds, each highlighting different covariance-driven barriers. Our results show that COPO attains minimax-optimal misclustering rates in Gaussian settings. Extensive simulation studies across diverse noise structures, along with a real data application, demonstrate the superior empirical performance of our method.

2407.05895 2026-01-28 cs.LG stat.ML

Link Representation Learning for Probabilistic Travel Time Estimation

Chen Xu, Qiang Wang, Lijun Sun

Journal ref IEEE Transactions on Intelligent Transportation Systems (2025)

Abstract

Travel time estimation is a key task in navigation apps and web mapping services. Existing deterministic and probabilistic methods, based on the assumption of trip independence, predominantly focus on modeling individual trips while overlooking trip correlations. However, real-world conditions frequently introduce strong correlations between trips, influenced by external and internal factors such as weather and the tendencies of drivers. To address this, we propose a deep hierarchical joint probabilistic model ProbETA for travel time estimation, capturing both inter-trip and intra-trip correlations. The joint distribution of travel times across multiple trips is modeled as a low-rank multivariate Gaussian, parameterized by learnable link representations estimated using the empirical Bayes approach. We also introduce a data augmentation method based on trip sub-sampling, allowing for fine-grained gradient backpropagation when learning link representations. During inference, our model estimates the probability distribution of travel time for a queried trip, conditional on spatiotemporally adjacent completed trips. Evaluation on two real-world GPS trajectory datasets demonstrates that ProbETA outperforms state-of-the-art deterministic and probabilistic baselines, with Mean Absolute Percentage Error decreasing by over 12.60%. Moreover, the learned link representations align with the physical network geometry, potentially making them applicable for other tasks.

2311.09426 2026-01-28 stat.CO

Linear-Cost Vecchia Approximation of Multivariate Normal Probabilities

Jian Cao, Matthias Katzfuss

Abstract

Multivariate normal (MVN) probabilities arise in myriad applications, but they are analytically intractable and need to be evaluated via Monte-Carlo-based numerical integration. For the state-of-the-art minimax exponential tilting (MET) method, we show that the complexity of each of its components can be greatly reduced through an integrand parameterization that utilizes the sparse inverse Cholesky factor produced by the Vecchia approximation, whose approximation error is often negligible relative to the Monte-Carlo error. Based on this idea, we derive algorithms that can estimate MVN probabilities and sample from truncated MVN distributions in linear time (and that are easily parallelizable) at the same convergence or acceptance rate as MET, whose complexity is cubic in the dimension of the MVN probability. We showcase the advantages of our methods relative to existing approaches using several simulated examples. We also analyze a groundwater-contamination dataset with over twenty thousand censored measurements to demonstrate the scalability of our method for partially censored Gaussian-process models.

2203.10651 2026-01-28 cs.LG stat.ML

Forecasting Sparse Movement Speed of Urban Road Networks with Nonstationary Temporal Matrix Factorization

Xinyu Chen, Chengyuan Zhang, Xi-Le Zhao, Nicolas Saunier, Lijun Sun

Comments Data and Python codes: https://github.com/xinychen/tracebase

Journal ref Transportation Science (2025)

Abstract

Movement speed data from urban road networks, computed from ridesharing vehicles or taxi trajectories, is often high-dimensional, sparse, and nonstationary (e.g., exhibiting seasonality). To address these challenges, we propose a Nonstationary Temporal Matrix Factorization (NoTMF) model that leverages matrix factorization to project high-dimensional and sparse movement speed data into low-dimensional latent spaces. This results in a concise formula with the multiplication between spatial and temporal factor matrices. To characterize the temporal correlations, NoTMF takes a latent equation on the seasonal differenced temporal factors using higher-order vector autoregression (VAR). This approach not only preserves the low-rank structure of sparse movement speed data but also maintains consistent temporal dynamics, including seasonality information. The learning process for NoTMF involves optimizing the spatial and temporal factor matrices along with a collection of VAR coefficient matrices. To solve this efficiently, we introduce an alternating minimization framework, which tackles the challenging procedure of estimating the temporal factor matrix using the conjugate gradient method, as the subproblem involves both partially observed matrix factorization and seasonal differenced VAR. To evaluate the forecasting performance of NoTMF, we conduct extensive experiments on Uber movement speed datasets, which are estimated from ridesharing vehicle trajectories. These datasets contain a large proportion of missing values due to insufficient ridesharing vehicles on the urban road network. Despite the presence of missing data, NoTMF demonstrates superior forecasting accuracy and effectiveness compared to baseline models. Moreover, as the seasonality of movement speed data is of great concern, the experimental results highlight the significance of addressing the nonstationarity of movement speed data.