arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday
2602.20153 2026-02-24 stat.ML cs.LG stat.ME

JUCAL: Jointly Calibrating Aleatoric and Epistemic Uncertainty in Classification Tasks

Jakob Heiss, Sören Lambrecht, Jakob Weissteiner, Hanna Wutte, Žan Žurič, Josef Teichmann, Bin Yu

Comments 11 pages + appendix. Preliminary version of an ongoing project that will be expanded with further evaluations

Abstract

We study post-calibration uncertainty for trained ensembles of classifiers. Specifically, we consider both aleatoric (label noise) and epistemic (model) uncertainty. Among the most popular and widely used calibration methods in classification are temperature scaling (i.e., pool-then-calibrate) and conformal methods. However, the main shortcoming of these calibration methods is that they do not balance the proportion of aleatoric and epistemic uncertainty. Not balancing these uncertainties can severely misrepresent predictive uncertainty, leading to overconfident predictions in some input regions while being underconfident in others. To address this shortcoming, we present a simple but powerful calibration algorithm Joint Uncertainty Calibration (JUCAL) that jointly calibrates aleatoric and epistemic uncertainty. JUCAL jointly calibrates two constants to weight and scale epistemic and aleatoric uncertainties by optimizing the negative log-likelihood (NLL) on the validation/calibration dataset. JUCAL can be applied to any trained ensemble of classifiers (e.g., transformers, CNNs, or tree-based methods), with minimal computational overhead, without requiring access to the models' internal parameters. We experimentally evaluate JUCAL on various text classification tasks, for ensembles of varying sizes and with different ensembling strategies. Our experiments show that JUCAL significantly outperforms SOTA calibration methods across all considered classification tasks, reducing NLL and predictive set size by up to 15% and 20%, respectively. Interestingly, even applying JUCAL to an ensemble of size 5 can outperform temperature-scaled ensembles of size up to 50 in terms of NLL and predictive set size, resulting in up to 10 times smaller inference costs. Thus, we propose JUCAL as a new go-to method for calibrating ensembles in classification.
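
As a point of reference for the baseline named in the abstract, pool-then-calibrate temperature scaling can be sketched as follows. This is not JUCAL itself, whose two-constant parameterization the abstract does not spell out; the function names, the grid search, and the grid range are illustrative assumptions.

```python
import math

def pool(member_probs):
    """Average the class probabilities of all ensemble members for one input."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]

def temper(probs, temperature):
    """Divide log-probabilities by the temperature and renormalize (softmax)."""
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(pooled, labels, temperature):
    """Mean negative log-likelihood of the true labels after tempering."""
    losses = [-math.log(temper(p, temperature)[y]) for p, y in zip(pooled, labels)]
    return sum(losses) / len(losses)

def fit_temperature(pooled, labels, grid=None):
    """Pick the temperature with the lowest calibration-set NLL (grid search)."""
    grid = grid or [0.25 + 0.05 * i for i in range(96)]  # hypothetical range 0.25 .. 5.0
    return min(grid, key=lambda t: nll(pooled, labels, t))
```

A single temperature rescales the pooled prediction as a whole, so it cannot change how much of the predictive uncertainty comes from disagreement between members; that is the limitation the abstract attributes to this baseline.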

2602.20152 2026-02-24 cs.LG cs.AI stat.ML

Behavior Learning (BL): Learning Hierarchical Optimization Structures from Data

Zhenyao Ma, Yue Liang, Dongxu Li

Comments ICLR 2026

Abstract

Inspired by behavioral science, we propose Behavior Learning (BL), a novel general-purpose machine learning framework that learns interpretable and identifiable optimization structures from data, ranging from single optimization problems to hierarchical compositions. It unifies predictive performance, intrinsic interpretability, and identifiability, with broad applicability to scientific domains involving optimization. BL parameterizes a compositional utility function built from intrinsically interpretable modular blocks, which induces a data distribution for prediction and generation. Each block represents and can be written in symbolic form as a utility maximization problem (UMP), a foundational paradigm in behavioral science and a universal framework of optimization. BL supports architectures ranging from a single UMP to hierarchical compositions, the latter modeling hierarchical optimization structures. Its smooth and monotone variant (IBL) guarantees identifiability. Theoretically, we establish the universal approximation property of BL, and analyze the M-estimation properties of IBL. Empirically, BL demonstrates strong predictive performance, intrinsic interpretability and scalability to high-dimensional data. Code: https://github.com/MoonYLiang/Behavior-Learning ; install via pip install blnetwork.

2602.20151 2026-02-24 stat.ME cs.LG math.ST stat.ML stat.TH

Conformal Risk Control for Non-Monotonic Losses

Anastasios N. Angelopoulos

Abstract

Conformal risk control is an extension of conformal prediction for controlling risk functions beyond miscoverage. The original algorithm controls the expected value of a loss that is monotonic in a one-dimensional parameter. Here, we present risk control guarantees for generic algorithms applied to possibly non-monotonic losses with multidimensional parameters. The guarantees depend on the stability of the algorithm -- unstable algorithms have looser guarantees. We give applications of this technique to selective image classification, FDR and IOU control of tumor segmentations, and multigroup debiasing of recidivism predictions across overlapping race and sex groups using empirical risk minimization.
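
For orientation, the original monotone algorithm that the abstract refers to can be sketched as below, assuming a loss that is non-increasing in a scalar parameter and bounded by `loss_bound`. It selects the smallest parameter whose inflated empirical risk, (n * risk + bound) / (n + 1), is at most alpha. The function and argument names are hypothetical, and the non-monotonic, multidimensional extension of the paper is not covered.

```python
def crc_threshold(loss_fn, calibration, lambdas, alpha, loss_bound=1.0):
    """Smallest lambda on the grid whose inflated empirical risk is <= alpha.

    Assumes loss_fn(example, lam) is non-increasing in lam and bounded
    above by loss_bound (the setting of the original algorithm).
    """
    n = len(calibration)
    for lam in sorted(lambdas):
        risk = sum(loss_fn(example, lam) for example in calibration) / n
        if (n * risk + loss_bound) / (n + 1) <= alpha:
            return lam
    # No grid point satisfies the bound: fall back to the most conservative value.
    return max(lambdas)

def miscoverage(score, lam):
    """Loss 1 when the true label's score exceeds the threshold, else 0."""
    return 1.0 if score > lam else 0.0
```

With the 0/1 miscoverage loss this reduces to the usual split-conformal quantile, which is a convenient sanity check.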

2602.20126 2026-02-24 cs.LG cs.IT math.IT math.ST stat.ML stat.TH

Adaptation to Intrinsic Dependence in Diffusion Language Models

Yunxiao Zhao, Changxiao Cai

Abstract

Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a rigid left-to-right order. Despite growing empirical success, the theoretical understanding of how unmasking schedules -- which specify the order and size of unmasked tokens during sampling -- affect generation quality remains limited. In this work, we introduce a distribution-agnostic unmasking schedule for DLMs that adapts to the (unknown) dependence structure of the target data distribution, without requiring any prior knowledge or hyperparameter tuning. In contrast to prior deterministic procedures that fix unmasking sizes, our method randomizes the number of tokens revealed at each iteration. We show that, for two specific parameter choices, the sampling convergence guarantees -- measured by Kullback-Leibler (KL) divergence -- scale as $\widetilde O(\mathsf{TC}/K)$ and $\widetilde O(\mathsf{DTC}/K)$ respectively. Here, $K$ is the number of iterations, and $\mathsf{TC}$ and $\mathsf{DTC}$ are the total correlation and dual total correlation of the target distribution, capturing the intrinsic dependence structure underlying the data. Importantly, our guarantees hold in the practically relevant parallel-sampling regime $K<L$ where $L$ is the token sequence length. These results significantly improve upon prior convergence theories and yield substantial sampling acceleration for low-complexity distributions. Overall, our findings unveil the adaptivity of DLMs to intrinsic data structures and shed light on the benefit of randomized unmasking sizes in inference schedule design.
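
The two dependence measures in the bounds can be computed exactly for a small discrete distribution. The sketch below uses the standard definitions, TC = sum_i H(X_i) - H(X) and DTC = H(X) - sum_i H(X_i | X_{-i}), with the joint distribution given as a dictionary from token tuples to probabilities; that representation and the function names are assumptions made for illustration.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (in nats) of a distribution given as {outcome: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Marginal distribution over the coordinates listed in keep."""
    out = defaultdict(float)
    for x, p in joint.items():
        out[tuple(x[i] for i in keep)] += p
    return out

def total_correlation(joint):
    """TC = sum of single-coordinate entropies minus the joint entropy."""
    length = len(next(iter(joint)))
    singles = sum(entropy(marginal(joint, [i])) for i in range(length))
    return singles - entropy(joint)

def dual_total_correlation(joint):
    """DTC = H(X) - sum_i H(X_i | X_-i), using H(X_i | X_-i) = H(X) - H(X_-i)."""
    length = len(next(iter(joint)))
    leave_one_out = sum(
        entropy(marginal(joint, [j for j in range(length) if j != i]))
        for i in range(length)
    )
    return leave_one_out - (length - 1) * entropy(joint)
```

Both quantities vanish for independent coordinates, which is the regime where the stated rates predict the fastest convergence.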

2602.20118 2026-02-24 stat.ME

Improving the Power of Bonferroni Adjustments under Joint Normality and Exchangeability

Caleb Hiltunen, Yeonwoo Rho

Abstract

Bonferroni's correction is a popular tool to address multiplicity but is notorious for its low power when tests are dependent. This paper proposes a practical modification of Bonferroni's correction when test statistics are jointly normal and exchangeable. This method is intuitive to practitioners and achieves higher power in sparse alternatives, as our simulations suggest. We also prove that this method successfully controls the family-wise error rate at any significance level.
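
For reference, the classical Bonferroni correction that the paper modifies is sketched below; the proposed modification for jointly normal, exchangeable statistics is not described in the abstract and is not reproduced here. Function names are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m, with m the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: min(1, m * p_i)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]
```

The per-test threshold alpha / m ignores any dependence between the tests, which is the source of the power loss the abstract mentions.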

2602.20115 2026-02-24 math.ST econ.EM stat.ME stat.TH

Compound decisions and empirical Bayes via Bayesian nonparametrics

Nikolaos Ignatiadis, Sid Kankanala

Comments 34 pages

Abstract

We study the Gaussian sequence compound decision problem and analyze a Bayesian nonparametric estimator from an empirical Bayes, regret-based perspective. Motivated by sharp results for the classical nonparametric maximum likelihood estimator (NPMLE), we ask whether an analogous guarantee can be obtained using a standard Bayesian nonparametric prior. We show that a Dirichlet-process-based Bayesian procedure achieves near-optimal regret bounds. Our main results are stated in the compound decision framework, where the mean vector is treated as fixed, while we also provide parallel guarantees under a hierarchical model in which the means are drawn from a true unknown prior distribution. The posterior mean Bayes rule is, a fortiori, admissible, whereas we show that the NPMLE plug-in rule is inadmissible.

2602.20071 2026-02-24 math.ST stat.AP stat.ME stat.TH

Estimators of different delta coefficients based on the unbiased estimator of the expected proportions of agreements

A. Martín Andrés, M. Álvarez Hernández

Abstract

To measure the degree of agreement between two observers that independently classify $n$ subjects within $K$ categories, it is common to use different kappa-type coefficients, the most common of which is the $\kappa_C$ coefficient (Cohen's kappa). As $\kappa_C$ has some weaknesses, such as its poor performance with highly unbalanced marginal distributions, the $\Delta$ coefficient is sometimes used, based on the delta response model. This model allows us to obtain other parameters such as: (a) the contribution $\alpha_i$ of each category $i$ to the value of the global agreement $\Delta=\sum \alpha_i$; and (b) the consistency $\mathcal{S}_i$ in category $i$ (degree of agreement in category $i$), a more appropriate parameter than the kappa value obtained by collapsing the data into category $i$. It has recently been shown that the classic estimator $\hat{\kappa}_C$ underestimates $\kappa_C$, and a new, less biased estimator $\hat{\kappa}_{CU}$ has been obtained. This article demonstrates that something similar happens to the known estimators $\hat{\Delta}$, $\hat{\alpha}_i$, and $\hat{\mathcal{S}}_i$ of $\Delta$, $\alpha_i$ and $\mathcal{S}_i$ (respectively), proposes new and less biased estimators $\hat{\Delta}_U$, $\hat{\alpha}_{iU}$, and $\hat{\mathcal{S}}_{iU}$, determines their variances, analyses the behaviour of all estimators, and concludes that the new estimators should be used when $n$ or $K$ are small (at least when $n\leq 50$ or $K\leq 3$). Additionally, the case where one of the raters is a gold standard is considered, in which situation two new parameters arise: the conformity (the rater's capability to recognize a subject in category $i$) and the predictivity (the reliability of a response $i$ by the rater).
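
As background for the coefficients discussed above, the classic plug-in estimate of Cohen's kappa from a $K \times K$ table of counts can be sketched as follows. This is only the standard estimator, (p_o - p_e) / (1 - p_e); the less biased estimators and the delta-model quantities proposed in the paper are not reproduced, and the function name is illustrative.

```python
def cohen_kappa(table):
    """Classic Cohen's kappa from a square table of counts.

    table[i][j] is the number of subjects placed in category i by the
    first observer and in category j by the second observer.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed proportion of agreement: mass on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the two marginal distributions.
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_o - p_e) / (1 - p_e)
```

For the table `[[20, 5], [10, 15]]`, the observed agreement is 0.7 and the chance agreement is 0.5, giving a kappa of 0.4.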

2602.20062 2026-02-24 cs.LG stat.ML

A Theory of How Pretraining Shapes Inductive Bias in Fine-Tuning

Nicolas Anguita, Francesco Locatello, Andrew M. Saxe, Marco Mondelli, Flavia Mancini, Samuel Lippl, Clementine Domine

Abstract

Pretraining and fine-tuning are central stages in modern machine learning systems. In practice, feature learning plays an important role across both stages: deep neural networks learn a broad range of useful features during pretraining and further refine those features during fine-tuning. However, an end-to-end theoretical understanding of how choices of initialization impact the ability to reuse and refine features during fine-tuning has remained elusive. Here we develop an analytical theory of the pretraining-fine-tuning pipeline in diagonal linear networks, deriving exact expressions for the generalization error as a function of initialization parameters and task statistics. We find that different initialization choices place the network into four distinct fine-tuning regimes that are distinguished by their ability to support feature learning and reuse, and therefore by the task statistics for which they are beneficial. In particular, a smaller initialization scale in earlier layers enables the network to both reuse and refine its features, leading to superior generalization on fine-tuning tasks that rely on a subset of pretraining features. We demonstrate empirically that the same initialization parameters impact generalization in nonlinear networks trained on CIFAR-100. Overall, our results demonstrate analytically how data and network initialization interact to shape fine-tuning generalization, highlighting an important role for the relative scale of initialization across different layers in enabling continued feature learning during fine-tuning.

2602.20029 2026-02-24 stat.ME

Covariance estimation for derivatives of functional data using an additive penalty in P-splines

Yueyun Zhu, Steven Golovkine, Norma Bargary, Andrew J. Simpkin

Abstract

P-splines provide a flexible and computationally efficient smoothing framework and are commonly used for derivative estimation in functional data. Including an additive penalty term in P-splines has been shown to improve estimates of derivatives. We propose a method which incorporates the fast covariance estimation (FACE) algorithm with an additive penalty in P-splines. The proposed method is used to estimate derivatives of covariance for functional data, which play an important role in derivative-based functional principal component analysis (FPCA). Following this, we provide an algorithm for estimating the eigenfunctions and their corresponding scores in derivative-based FPCA. For comparison, we evaluate our algorithm against an existing function \texttt{FPCAder()} in simulation. In addition, we extend the algorithm to multivariate cases, referred to as derivative multivariate functional principal component analysis (DMFPCA). DMFPCA is applied to joint angles in human movement data, where the derivative-based scores demonstrate strong performance in distinguishing locomotion tasks.

2602.19610 2026-02-24 cs.LG stat.CO stat.ME stat.ML

Variational Inference for Bayesian MIDAS Regression

Luigi Simeone

Comments 27 pages, 11 figures

Abstract

We develop a Coordinate Ascent Variational Inference (CAVI) algorithm for Bayesian Mixed Data Sampling (MIDAS) regression with linear weight parameterizations. The model separates impact coefficients from weighting function parameters through a normalization constraint, creating a bilinear structure that renders generic Hamiltonian Monte Carlo samplers unreliable while preserving conditional conjugacy exploitable by CAVI. Each variational update admits a closed-form solution: Gaussian for regression coefficients and weight parameters, Inverse-Gamma for the error variance. The algorithm propagates uncertainty across blocks through second moments, distinguishing it from naive plug-in approximations. In a Monte Carlo study spanning 21 data-generating configurations with up to 50 predictors, CAVI produces posterior means nearly identical to a block Gibbs sampler benchmark while achieving speedups of 107x to 1,772x (Table 9). Generic automatic differentiation VI (ADVI), by contrast, produces bias 714 times larger while being orders of magnitude slower, confirming the value of model-specific derivations. Weight function parameters maintain excellent calibration (coverage above 92%) across all configurations. Impact coefficient credible intervals exhibit the underdispersion characteristic of mean-field approximations, with coverage declining from 89% to 55% as the number of predictors grows, a documented trade-off between speed and interval calibration that structured variational methods can address. An empirical application to realized volatility forecasting on S&P 500 daily returns confirms that CAVI and Gibbs sampling yield virtually identical point forecasts, with CAVI completing each monthly estimation in under 10 milliseconds.

2510.03817 2026-02-24 cs.LG stat.ML

TROLL: Trust Regions improve Reinforcement Learning for Large Language Models

Philipp Becker, Niklas Freymuth, Serge Thilges, Fabian Otto, Gerhard Neumann

Comments Published as a conference paper at ICLR 2026

Abstract

Reinforcement Learning (RL) with PPO-like clip objectives has become the standard choice for reward-based fine-tuning of large language models (LLMs). Although recent work has explored improved estimators of advantages and normalization, the clipping mechanism itself has remained untouched. Originally introduced as a proxy for principled KL-based trust regions, clipping is a crude approximation that often causes unstable updates and suboptimal performance. We replace the clip objective with a novel discrete differentiable trust region projection, which provides principled token-level KL constraints. The projection operates on a sparse subset of the model's most important token logits to balance computational cost and projection effectiveness. Our approach, Trust Region Optimization for Large Language models (TROLL), serves as a direct replacement for PPO-like clipping during training and does not alter the model's inference behavior. Across mathematical reasoning and code generation tasks, model families, as well as advantage-estimation methods, TROLL consistently outperforms PPO-like clipping in terms of training speed, stability, and final success rates.

2506.22975 2026-02-24 math.ST stat.TH

On the Study of Weighted Fractional Cumulative Residual Inaccuracy and its Dynamical Version with Applications

Aman Pandey, Chanchal Kundu

Abstract

In recent years, there has been a growing interest in information measures that quantify inaccuracy and uncertainty in systems. In this paper, we introduce a novel concept called the Weighted Fractional Cumulative Residual Inaccuracy (WFCRI). We develop several fundamental properties of WFCRI and establish important bounds that reveal its analytical behavior. Further, we examine the behavior of WFCRI under a mixture hazard model. A dynamic version of WFCRI is also proposed, and its behavior is studied under the proportional hazard rate model. An empirical estimation method for WFCRI under the proportional hazard rate model framework is also proposed, and its performance is evaluated through simulation studies. Finally, we demonstrate the utility of the WFCRI measure in characterizing chaotic dynamics by applying it to the Ricker and cubic maps. The proposed measure is also applied to real data to assess the uncertainty.

2505.24506 2026-02-24 stat.AP

Enhancing the Accuracy of Spatio-Temporal Models for Wind Speed Prediction by Incorporating Bias-Corrected Crowdsourced Data

Eamonn Organ, Maeve Upton, Denis Allard, Lionel Benoit, James Sweeney

Journal ref Environmetrics 37(2), e70069 (2026)

Abstract

Accurate high-resolution spatial and temporal wind speed data is critical for estimating the wind energy potential of a location. For real-time wind speed prediction, statistical models typically depend on high-quality (near) real-time data from official meteorological stations to improve forecasting accuracy. Personal weather stations (PWS) offer an additional source of real-time data and broader spatial coverage than official stations. However, they are not subject to rigorous quality control and may exhibit bias or measurement errors. This paper presents a framework for incorporating PWS data into statistical models for validated official meteorological station data via a two-stage approach. First, bias correction is performed on PWS wind speed data using reanalysis data. Second, we implement a Bayesian hierarchical spatio-temporal model that accounts for varying measurement error in the PWS data. This enables wind speed prediction across a target area, and is particularly beneficial for improving predictions in regions sparse in official monitoring stations. Our results show that including bias-corrected PWS data improves prediction accuracy compared to using meteorological station data alone, with a 5% reduction in prediction error on average across all sites. The results are comparable with popular reanalysis products, but unlike these numerical weather models our approach is available in real-time and offers improved uncertainty quantification.

2501.10471 2026-02-24 cs.LG q-bio.QM stat.ML

VillageNet: Graph-based, Easily-interpretable, Unsupervised Clustering for Broad Biomedical Applications

Aditya Ballal, Gregory A. DePaul, Esha Datta, Asuka Hatano, Erik Carlsson, Ye Chen-Izu, Javier E. López, Leighton T. Izu

Comments Software available at https://villagenet.streamlit.app/ ; GitHub link: https://github.com/lordareicgnon/VillageNet

Abstract

Clustering large high-dimensional datasets with diverse variables is essential for extracting high-level latent information from these datasets. Here, we developed an unsupervised clustering algorithm that we call "Village-Net". Village-Net is specifically designed to effectively cluster high-dimensional data without a priori knowledge of the number of existing clusters. The algorithm operates in two phases: first, utilizing K-Means clustering, it divides the dataset into distinct subsets we refer to as "villages". Next, a weighted network is created, with each node representing a village, capturing their proximity relationships. To achieve optimal clustering, we process this network using the Walk-likelihood Community Finder (WLCF), a community detection algorithm developed by one of our team members. A salient feature of Village-Net Clustering is its ability to autonomously determine an optimal number of clusters for further analysis based on inherent characteristics of the data. We present extensive benchmarking on extant real-world datasets with known ground-truth labels to showcase its competitive performance, particularly in terms of the normalized mutual information (NMI) score, when compared to other state-of-the-art methods. The algorithm is computationally efficient, boasting a time complexity of O(N*k*d), where N signifies the number of instances, k represents the number of villages and d represents the dimension of the dataset, which makes it well suited for effectively handling large-scale datasets.

2501.01129 2026-02-24 stat.AP

Compositional data analysis for modelling and forecasting mortality using the α-transformation

Han Ying Lim, Dharini Pathmanathan, Sophie Dabo-Niang

Comments 15 pages, 3 tables, 4 figures

Abstract

Mortality forecasting is crucial for demographic planning and actuarial studies, especially for projecting population ageing and longevity risk. Classical approaches largely rely on extrapolative methods, such as the Lee-Carter (LC) model, which use mortality rates as the mortality measure. In recent years, compositional data analysis (CoDA), which respects summability and non-negativity constraints, has gained increasing attention for mortality forecasting. While the centred log-ratio (CLR) transformation is commonly used to map compositional data to real space, the α-transformation, a generalisation of log-ratio transformations, offers greater flexibility and adaptability. This study contributes to mortality forecasting by introducing the α-transformation as an alternative to the CLR transformation within a non-functional CoDA model that has not been previously investigated in existing literature. To fairly compare the impact of transformation choices on forecast accuracy, zero values in the data are imputed, although the α-transformation can inherently handle them. Using age-specific life table death counts for males and females in 31 selected European countries/regions from 1983 to 2018, the proposed method demonstrates comparable performance to the CLR transformation in most cases, with improved forecast accuracy in some instances. These findings highlight the potential of the α-transformation for enhancing mortality forecasting within the non-functional CoDA framework.
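
The centred log-ratio transformation used as the baseline above can be sketched as follows, assuming a strictly positive composition (for example, life-table death counts rescaled to sum to one, with zeros already imputed). The α-transformation itself is not reproduced here; the function names are illustrative.

```python
import math

def clr(composition):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_inverse(coords):
    """Map CLR coordinates back to a composition that sums to one."""
    exps = [math.exp(v) for v in coords]
    total = sum(exps)
    return [e / total for e in exps]
```

CLR coordinates always sum to zero, and the logarithm is undefined at zero, which is why zero counts have to be imputed before this transformation can be applied.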

2412.02094 2026-02-24 cs.LG cs.CY stat.AP

Crash Severity Risk Modeling Strategies under Data Imbalance

Abdullah Al Mamun, Abyad Enan, Debbie A. Indah, Judith Mwakalonge, Gurcan Comert, Mashrur Chowdhury

Comments This second revised version has been resubmitted to the Transportation Research Record: Journal of the Transportation Research Board after addressing the reviewers' comments and is currently awaiting the final decision

Journal ref Transportation Research Record (2025)

Abstract

This study investigates crash severity risk modeling strategies for work zones involving large vehicles (i.e., trucks, buses, and vans) under crash data imbalance between low-severity (LS) and high-severity (HS) crashes. We utilized crash data involving large vehicles in South Carolina work zones from 2014 to 2018, which included four times more LS crashes than HS crashes. The objective of this study is to evaluate the crash severity prediction performance of various statistical, machine learning, and deep learning models under different feature selection and data balancing techniques. Findings highlight a disparity in LS and HS predictions, with lower accuracy for HS crashes due to class imbalance and feature overlap. Discriminative Mutual Information (DMI) yields the most effective feature set for predicting HS crashes without requiring data balancing, particularly when paired with gradient boosting models and deep neural networks such as CatBoost, NeuralNetTorch, XGBoost, and LightGBM. Data balancing techniques such as NearMiss-1 maximize HS recall when combined with DMI-selected features and certain models such as LightGBM, making them well-suited for HS crash prediction. Conversely, RandomUnderSampler, HS Class Weighting, and RandomOverSampler achieve more balanced performance, which is defined as an equitable trade-off between LS and HS metrics, especially when applied to NeuralNetTorch, NeuralNetFastAI, CatBoost, LightGBM, and Bayesian Mixed Logit (BML) using merged feature sets or models without feature selection. The insights from this study offer safety analysts guidance on selecting models, feature selection, and data balancing techniques aligned with specific safety goals, providing a robust foundation for enhancing work-zone crash severity prediction.

2407.16024 2026-02-24 stat.ME

Generalized dynamic functional principal component analysis

Tzung Hsuen Khoo, Issa-Mbenard Dabo, Dharini Pathmanathan, Sophie Dabo-Niang

Abstract

In this paper, we explore dimension reduction for functional time series. We propose a generalized dynamic functional principal component analysis (GDFPCA) which does not rely on spectral density estimation and demonstrates strong empirical performance for both stationary and nonstationary functional time series. We define the generalized dynamic functional principal components (GDFPCs) as static factor time series in a functional dynamic factor model and obtain their multivariate representation from a truncation of the functional dynamic factor model. Estimation is based on a least-squares reconstruction criterion and implemented via a two-step procedure for the coefficient vectors of the loading curves under a basis expansion. We establish mean-square consistency of the reconstructed functional time series under weak stationarity. Simulation studies show that GDFPCA performs comparably to dynamic functional principal component analysis (DFPCA) for stationary data, while providing improved reconstruction accuracy in nonstationary settings, where both DFPCA and functional principal component analysis (FPCA) deteriorate. Applications to real datasets support the empirical advantages observed in the simulations.

2406.07210 2026-02-24 econ.GN physics.soc-ph q-fin.EC stat.AP

The green hydrogen ambition and implementation gap

Adrian Odenweller, Falko Ueckerdt

Journal ref Nat Energy 10, 110-123 (2025)

Abstract

Green hydrogen is critical for decarbonising hard-to-electrify sectors, but faces high costs and investment risks. Here we define and quantify the green hydrogen ambition and implementation gap, showing that meeting hydrogen expectations will remain challenging despite surging announcements of projects and subsidies. Tracking 137 projects over three years, we identify a wide 2022 implementation gap with only 2% of global capacity announcements finished on schedule. In contrast, the 2030 ambition gap towards 1.5°C scenarios is gradually closing as the announced project pipeline has nearly tripled to 441 GW within three years. However, we estimate that, without carbon pricing, realising all these projects would require global subsidies of \$1.6 trillion (\$1.2 - 2.6 trillion range), far exceeding announced subsidies. Given past and future implementation gaps, policymakers must prepare for prolonged green hydrogen scarcity. Policy support needs to secure hydrogen investments, but should focus on applications where hydrogen is indispensable.

2405.09797 2026-02-24 stat.ME stat.ML stat.OT

Extrapolating Single-Treatment Effects Out of Factorial Experiments

Guilherme Duarte

Abstract

Despite their cost, randomized controlled trials (RCTs) are widely regarded as gold-standard evidence in disciplines ranging from social science to medicine. In recent decades, researchers have increasingly sought to reduce the resource burden of repeated RCTs with factorial designs that simultaneously test multiple hypotheses, e.g. experiments that evaluate the effects of many medications or products simultaneously. Here I show that when multiple interventions are randomized in experiments, the effect any single intervention would have outside the experimental setting is not identified absent heroic assumptions, even if otherwise perfectly realistic conditions are achieved. This happens because single-treatment effects involve a counterfactual world with a single focal intervention, allowing other variables to take their natural values (which may be confounded or modified by the focal intervention). In contrast, observational studies and factorial experiments provide information about potential-outcome distributions with zero and multiple interventions, respectively. In this paper, I formalize sufficient conditions for the identifiability of those isolated quantities. I show that researchers who rely on this type of design have to justify either linearity of functional forms or -- in the nonparametric case -- specify with Directed Acyclic Graphs how variables are related in the real world. Finally, I develop nonparametric sharp bounds -- i.e., maximally informative best-/worst-case estimates consistent with limited RCT data -- that show when extrapolations about effect signs are empirically justified. These new results are illustrated with simulated data.

2212.00795 2026-02-24 stat.ME

Causal Selection of Covariates in Regression Calibration for Mismeasured Continuous Exposure

Wenze Tang, Donna Spiegelman, Xiaomei Liao, Molin Wang

Comments 11 pages, 3 figures

Abstract

Regression calibration as developed by Rosner, Spiegelman and Willett is used to correct the bias in effect estimates due to measurement error in continuous exposures. The method involves two models: a measurement error model (MEM) relating the mismeasured exposure to the true exposure and an outcome model relating the mismeasured exposure to the outcome. However, no comprehensive guidance exists for determining which covariates should be included in each model. In this paper, we investigate the selection of the minimal and most efficient covariate adjustment sets under a causal inference framework. We show that in order to correct for the measurement error, researchers must adjust for, in both the MEM and the outcome model, any common causes (1) of true exposure and the outcome and (2) of measurement error and the outcome. When such variable(s) are only available in the main study, researchers should still adjust for them in the outcome model to reduce bias, provided that these covariates are at most weakly associated with measurement error. We also show that adjusting for so-called prognostic variables that are independent of true exposure and measurement error in the outcome model may increase efficiency, while adjusting for any covariates that are associated only with true exposure generally results in efficiency loss in realistic settings. We apply the proposed covariate selection approach to the Health Professionals Follow-up Study dataset to study the effect of fiber intake on cardiovascular disease. Finally, we extend the originally proposed estimators to a non-parametric setting where effect modification by covariates is allowed.

1705.10494 2026-02-24 stat.ML cs.LG

Joint auto-encoders: a flexible multi-task learning framework

Baruch Epstein, Ron Meir, Tomer Michaeli

Details
English abstract

The incorporation of prior knowledge into learning is essential in achieving good performance based on small noisy samples. Such knowledge is often incorporated through the availability of related data arising from domains and tasks similar to the one of current interest. Ideally one would like to allow both the data for the current task and for previous related tasks to self-organize the learning system in such a way that commonalities and differences between the tasks are learned in a data-driven fashion. We develop a framework for learning multiple tasks simultaneously, based on sharing features that are common to all tasks, achieved through the use of a modular deep feedforward neural network consisting of shared branches, dealing with the common features of all tasks, and private branches, learning the specific unique aspects of each task. Once an appropriate weight sharing architecture has been established, learning takes place through standard algorithms for feedforward networks, e.g., stochastic gradient descent and its variations. The method deals with domain adaptation and multi-task learning in a unified fashion, and can easily deal with data arising from different types of sources. Numerical experiments demonstrate the effectiveness of learning in domain adaptation and transfer learning setups, and provide evidence for the flexible and task-oriented representations arising in the network.

2602.19954 2026-02-24 stat.AP

A Two-Step Spatio-Temporal Framework for Turbine-Height Wind Estimation at Unmonitored Sites from Sparse Meteorological Data

Eamonn Organ, Maeve Upton, Denis Allard, Lionel Benoit, James Sweeney

Details
English abstract

Accurate estimates of wind speeds at wind turbine hub heights are crucial for both wind resource assessment and day-to-day management of electricity grids with high renewable penetration. In the absence of direct measurements, parametric models are commonly used to extrapolate wind speeds from observed heights to turbine heights. Recent literature has proposed extensions to allow for spatially or temporally varying vertical wind gradients, that is, the rate at which wind speed changes with height. However, these approaches typically assume that reference height and hub height measurements are available at the same locations, which limits their applicability in operational settings where meteorological stations and wind farms are spatially separated. In this paper, we develop a two-step spatio-temporal framework to estimate turbine height wind speeds using only open-access observations from sparse meteorological stations. First, a non-parametric generalized additive model is trained on reanalysis data to perform vertical height extrapolation. Second, a spatial Gaussian process model interpolates these hub-height estimates to wind farm locations while explicitly propagating uncertainty from the height extrapolation stage. The proposed framework enables the construction of high-resolution, sub-hourly turbine-height wind speed time series and spatial wind maps using data available in real time, capabilities not provided by existing reanalysis products. We further provide calibrated uncertainty estimates that account for both vertical extrapolation and spatial interpolation errors. The approach is validated using hub-height measurements from seven operational wind farms in Ireland, demonstrating improved accuracy relative to ERA5 reanalysis while relying solely on real-time, open-access data.
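
For orientation, the kind of parametric height extrapolation the abstract refers to can be sketched with the widely used power-law wind profile. This is not the authors' two-step framework; the function name `power_law_wind_speed` and the default shear exponent of 1/7 are illustrative assumptions, and in practice the exponent varies with site and atmospheric conditions.

```python
def power_law_wind_speed(v_ref, h_ref, h_hub, alpha=1.0 / 7.0):
    """Extrapolate a wind speed from a reference height to hub height.

    Uses the power-law profile v(h) = v_ref * (h / h_ref) ** alpha.
    alpha is the shear exponent (assumed value here, not from the paper).
    """
    if h_ref <= 0 or h_hub <= 0:
        raise ValueError("heights must be positive")
    return v_ref * (h_hub / h_ref) ** alpha


# Hypothetical example: 6 m/s measured at 10 m, extrapolated to a 100 m hub.
speed_at_hub = power_law_wind_speed(v_ref=6.0, h_ref=10.0, h_hub=100.0)
print(round(speed_at_hub, 2))
```

A fixed exponent is exactly the limitation the abstract points to: the vertical wind gradient changes over space and time, which motivates models that let it vary.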

2602.19952 2026-02-24 stat.AP stat.ME stat.ML

A Bayesian Framework for Post-disruption Travel Time Prediction in Metro Networks

Shayan Nazemi, Aurélie Labbe, Stefan Steiner, Pratheepa Jeganathan, Martin Trépanier, Léo R. Belzile

Details
English abstract

Disruptions are an inherent feature of transportation systems, occurring unpredictably and with varying durations. Even after an incident is reported as resolved, disruptions can induce irregular train operations that generate substantial uncertainty in passenger waiting and travel times. Accurately forecasting post-disruption travel times therefore remains a critical challenge for transit operators and passenger information systems. This paper develops a Bayesian spatiotemporal modeling framework for post-disruption train travel times that explicitly captures train interactions, headway imbalance, and non-Gaussian distributional characteristics observed during recovery periods. The proposed model decomposes travel times into delay and journey components and incorporates a moving-average error structure to represent dependence between consecutive trains. Skew-normal and skew-$t$ distributions are employed to flexibly accommodate heteroskedasticity, skewness, and heavy-tailed behavior in post-disruption travel times. The framework is evaluated using high-resolution track-occupancy and disruption log data from the Montréal metro system, covering two lines in both travel directions. Empirical results indicate that post-disruption travel times exhibit pronounced distributional asymmetries that vary with traveled distance, as well as significant error dependence across trains. The proposed models consistently outperform baseline specifications in both point prediction accuracy and uncertainty quantification, with the skew-$t$ model demonstrating the most robust performance for longer journeys. These findings underscore the importance of incorporating both distributional flexibility and error dependence when forecasting post-disruption travel times in urban rail systems.

2602.19922 2026-02-24 stat.ME

Transfer Learning with Network Embeddings under Structured Missingness

Mengyan Li, Xiaoou Li, Kenneth D Mandl, Tianxi Cai

Details
English abstract

Modern data-driven applications increasingly rely on large, heterogeneous datasets collected across multiple sites. Differences in data availability, feature representation, and underlying populations often induce structured missingness, complicating efforts to transfer information from data-rich settings to those with limited data. Many transfer learning methods overlook this structure, limiting their ability to capture meaningful relationships across sites. We propose TransNEST (Transfer learning with Network Embeddings under STructured missingness), a framework that integrates graphical data from source and target sites with prior group structure to construct and refine network embeddings. TransNEST accommodates site-specific features, captures within-group heterogeneity and between-site differences adaptively, and improves embedding estimation under partial feature overlap. We establish the convergence rate for the TransNEST estimator and demonstrate strong finite-sample performance in simulations. We apply TransNEST to a multi-site electronic health record study, transferring feature embeddings from a general hospital system to a pediatric hospital system. Using a hierarchical ontology structure, TransNEST improves pediatric embeddings and supports more accurate pediatric knowledge extraction, achieving the best accuracy for identifying pediatric-specific relational feature pairs compared with benchmark methods.

2602.19903 2026-02-24 eess.SP cs.LG stat.ML

Rethinking Chronological Causal Discovery with Signal Processing

Kurt Butler, Damian Machlanski, Panagiotis Dimitrakopoulos, Sotirios A. Tsaftaris

Comments 5 pages, 5 figures, Final version accepted to the 59th Asilomar Conference on Signals, Systems, and Computers (2025)

Details
English abstract

Causal discovery problems use a set of observations to deduce causality between variables in the real world, typically to answer questions about biological or physical systems. These observations are often recorded at regular time intervals, determined by a user or a machine, depending on the experiment design. There is generally no guarantee that the timing of these recordings matches the timing of the underlying biological or physical events. In this paper, we examine the sensitivity of causal discovery methods to this potential mismatch. We consider empirical and theoretical evidence to understand how causal discovery performance is impacted by changes of sampling rate and window length. We demonstrate that both classical and recent causal discovery methods exhibit sensitivity to these hyperparameters, and we discuss how ideas from signal processing may help us understand these phenomena.

2602.19893 2026-02-24 cs.LG stat.ML

Generalized Random Direction Newton Algorithms for Stochastic Optimization

Soumen Pachal, Prashanth L. A., Shalabh Bhatnagar, Avinash Achar

Details
English abstract

We present a family of generalized Hessian estimators of the objective using random direction stochastic approximation (RDSA) by utilizing only noisy function measurements. The form of each estimator and the order of the bias depend on the number of function measurements. In particular, we demonstrate that estimators with more function measurements exhibit lower-order estimation bias. We show the asymptotic unbiasedness of the estimators. We also perform asymptotic and non-asymptotic convergence analyses for stochastic Newton methods that incorporate our generalized Hessian estimators. Finally, we perform numerical experiments to validate our theoretical findings.

2602.19859 2026-02-24 stat.ML cs.LG

Dirichlet Scale Mixture Priors for Bayesian Neural Networks

August Arnstad, Leiv Rønneberg, Geir Storvik

Comments 24 pages, 20 figures

Details
English abstract

Neural networks are the cornerstone of modern machine learning, yet can be difficult to interpret, give overconfident predictions and are vulnerable to adversarial attacks. Bayesian neural networks (BNNs) provide some alleviation of these limitations, but have problems of their own. The key step of specifying prior distributions in BNNs is no trivial task, yet is often skipped out of convenience. In this work, we propose a new class of prior distributions for BNNs, the Dirichlet scale mixture (DSM) prior, that addresses current limitations in Bayesian neural networks through structured, sparsity-inducing shrinkage. Theoretically, we derive general dependence structures and shrinkage results for DSM priors and show how they manifest under the geometry induced by neural networks. In experiments on simulated and real-world data we find that DSM priors encourage sparse networks through implicit feature selection, show robustness under adversarial attacks and deliver competitive predictive performance with substantially fewer effective parameters. In particular, their advantages appear most pronounced in correlated, moderately small data regimes, and are more amenable to weight pruning. Moreover, by adopting heavy-tailed shrinkage mechanisms, our approach aligns with recent findings that such priors can mitigate the cold posterior effect, offering a principled alternative to the commonly used Gaussian priors.

2602.19851 2026-02-24 stat.ME cs.LG

Orthogonal Uplift Learning with Permutation-Invariant Representations for Combinatorial Treatments

Xinyan Su, Jiacan Gao, Mingyuan Ma, Xiao Xu, Xinrui Wan, Tianqi Gu, Enyun Yu, Jiecheng Guo, Zhiheng Zhang

Details
English abstract

We study uplift estimation for combinatorial treatments. Uplift measures the pure incremental causal effect of an intervention (e.g., sending a coupon or a marketing message) on user behavior, modeled as a conditional individual treatment effect. Many real-world interventions are combinatorial: a treatment is a policy that specifies context-dependent action distributions rather than a single atomic label. Although recent work considers structured treatments, most methods rely on categorical or opaque encodings, limiting robustness and generalization to rare or newly deployed policies. We propose an uplift estimation framework that aligns treatment representation with causal semantics. Each policy is represented by the mixture it induces over context-action components and embedded via a permutation-invariant aggregation. This representation is integrated into an orthogonalized low-rank uplift model, extending Robinson-style decompositions to learned, vector-valued treatments. We show that the resulting estimator is expressive for policy-induced causal effects, orthogonally robust to nuisance estimation errors, and stable under small policy perturbations. Experiments on large-scale randomized platform data demonstrate improved uplift accuracy and stability in long-tailed policy regimes.

2602.19839 2026-02-24 math.ST stat.TH

Addressing parity blindness of data-driven Sobolev tests on the hypersphere

Marcio Reverbel

Comments 6 pages, 1 figure, submitted to Statistics & Probability Letters

Details
English abstract

We study the asymptotic behavior of the data-driven Sobolev test for testing uniformity on the (hyper)sphere. We show that it can be blind to certain contiguous alternatives and propose a simple modification of the test statistic. This adapted test retains consistency under fixed alternatives and achieves non-trivial asymptotic power against contiguous alternatives for which the original test fails. Simulation results support our theoretical findings.

2602.19838 2026-02-24 stat.ME math.ST stat.TH

Optimality of the Half-Order Exponent in the Turing-Good Identities for Bayes Factors

Kensuke Okada

Details
English abstract

Bayes factors are widely computed by Monte Carlo, yet heavy-tailed sampling distributions can make numerical validation unreliable. The Turing--Good identities provide exact moment equalities for powers of a Bayes factor (a density ratio). When these identities are used as Good-check diagnostics, the power choice becomes a statistical design parameter. We develop a nonasymptotic variance theory for Monte Carlo evaluation of the identities and show that the half-order (square-root) power is uniquely minimax-stable: it equalizes variability across the two model orientations and is the only choice that guarantees finite second moments in a distribution-free worst-case sense over all mutually absolutely continuous model pairs. This yields a balanced two-sample half-order diagnostic that is symmetric in model labeling and has a uniform variance bound at fixed computational budget; in small-overlap regimes it is guaranteed to be no less efficient than the standard one-sided Turing check. Simulations for binomial Bayes factor workflows illustrate stable finite-sample behavior and sensitivity to simulator--evaluator mismatches. We further connect the half-order overlap viewpoint to stable primitives for normalizing-constant ratios and importance-sampling degeneracy summaries.
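
To make the moment equalities concrete, the sketch below checks the identity E_0[BF^k] = E_1[BF^(k-1)] exactly, by summation, for a toy pair of simple binomial hypotheses (so the Bayes factor reduces to a likelihood ratio). The function names and the parameter values are illustrative assumptions, not the paper's simulation design; at the half-order power k = 1/2, both sides equal the overlap sum of sqrt(p0(x) p1(x)).

```python
from math import comb


def binom_pmf(n, p):
    """Probability mass function of Binomial(n, p) as a list over x = 0..n."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def turing_good_sides(n, p0, p1, k):
    """Return (E_0[BF10^k], E_1[BF10^(k-1)]) computed exactly.

    BF10(x) = p1(x) / p0(x) for two simple hypotheses (assumed toy setting).
    Requires 0 < p0, p1 < 1 so that both mass functions are strictly positive.
    """
    pmf0 = binom_pmf(n, p0)
    pmf1 = binom_pmf(n, p1)
    lhs = sum(q0 * (q1 / q0) ** k for q0, q1 in zip(pmf0, pmf1))
    rhs = sum(q1 * (q1 / q0) ** (k - 1) for q0, q1 in zip(pmf0, pmf1))
    return lhs, rhs


# Half-order check: both sides should agree up to floating-point error.
lhs, rhs = turing_good_sides(n=20, p0=0.5, p1=0.7, k=0.5)
print(lhs, rhs)
```

In a Monte Carlo setting the two expectations would be replaced by sample averages under each model, which is where the variance considerations discussed in the abstract come in.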

2602.19803 2026-02-24 math.ST cs.IT math.IT stat.TH

From Asymptotic to Finite-Sample Minimax Robust Hypothesis Testing

Gökhan Gül

Comments 40 pages, 6 figures. Submitted to IEEE Transactions on Information Theory

Details
English abstract

This paper establishes a formal connection between finite-sample and asymptotically minimax robust hypothesis testing under distributional uncertainty. It is shown that, whenever a finite-sample minimax robust test exists, it coincides with the solution of the corresponding asymptotic minimax problem. This result enables the analytical derivation of finite-sample minimax robust tests using asymptotic theory, bypassing the need for heuristic constructions. The total variation distance and band model are examined as representative uncertainty classes. For each, the least favorable distributions and corresponding robust likelihood ratio functions are derived in parametric form. In the total variation case, the new derivation generalizes earlier results by allowing unequal robustness parameters. The theory also explains and systematizes previously heuristic designs. Simulations are provided to illustrate the theoretical results.

2602.19785 2026-02-24 cs.LG cs.NE stat.ML

Unsupervised Anomaly Detection in NSL-KDD Using $β$-VAE: A Latent Space and Reconstruction Error Approach

Dylan Baptiste, Ramla Saddem, Alexandre Philippot, François Foyer

Journal ref 2025 15th France-Japan & 13th Europe-Asia Congress on Mechatronics (MECATRONICS) / 23rd International Conference on Research and Education in Mechatronics (REM), Dec 2025, Saint-Ouen-sur-Seine, France. pp.1-6

Details
English abstract

As Operational Technology increasingly integrates with Information Technology, the need for Intrusion Detection Systems becomes more important. This paper explores an unsupervised approach to anomaly detection in network traffic using $β$-Variational Autoencoders on the NSL-KDD dataset. We investigate two methods: leveraging the latent space structure by measuring distances from test samples to the training data projections, and using the reconstruction error as a conventional anomaly detection metric. By comparing these approaches, we provide insights into their respective advantages and limitations in an unsupervised setting. Experimental results highlight the effectiveness of latent space exploitation for classification tasks.

2602.19761 2026-02-24 stat.ML cs.LG stat.AP

Ensemble Machine Learning and Statistical Procedures for Dynamic Predictions of Time-to-Event Outcomes

Nina van Gerwen, Sten Willemsen, Bettina E. Hansen, Christophe Corpechot, Marco Carbone, Cynthia Levy, Maria-Carlota Londoño, Atsushi Tanaka, Palak Trivedi, Alejandra Villamil, Gideon Hirschfield, Dimitris Rizopoulos

Details
English abstract

Dynamic predictions for longitudinal and time-to-event outcomes have become a versatile tool in precision medicine. Our work is motivated by the application of dynamic predictions in the decision-making process for primary biliary cholangitis patients. For these patients, serial biomarker measurements (e.g., bilirubin and alkaline phosphatase levels) are routinely collected to inform treating physicians of the risk of liver failure and guide clinical decision-making. Two popular statistical approaches to derive dynamic predictions are joint modelling and landmarking. However, recently, machine learning techniques have also been proposed. Each approach has its merits, and no single method exists to outperform all others. Consequently, obtaining the best possible survival estimates is challenging. Therefore, we extend the Super Learner framework to combine dynamic predictions from different models and procedures. Super Learner is an ensemble learning technique that allows users to combine different prediction algorithms to improve predictive accuracy and flexibility. It uses cross-validation and different objective functions of performance (e.g., squared loss) that suit specific applications to build the optimally weighted combination of predictions from a library of candidate algorithms. In our work, we pay special attention to appropriate objective functions for Super Learner to obtain the optimal weighted combination of dynamic predictions. In our primary biliary cholangitis application, Super Learner presented unique benefits due to its ability to flexibly combine outputs from a diverse set of models with varying assumptions for equal or better predictive performance than any model fit separately.

2602.19740 2026-02-24 econ.EM stat.AP

Volatility Spillovers in China's Real Estate Crisis: A Network Approach

Julia Manso

Details
English abstract

Sentiment towards the Chinese real estate sector has deteriorated following the introduction of financing constraints in 2020 with the "three red lines." Forcing developers to restructure their debt, the policy triggered a cascade of financing troubles, defaults, and reduced housing demand, ultimately culminating in a prolonged real estate crisis. This paper utilizes a network approach in line with Demirer et al. (2018) and Diebold and Yilmaz (2014) to measure daily time-varying connectedness in the stock return volatilities of major Chinese real estate developers throughout the crisis. Focusing on spillover between companies as reflected by market perception, this paper examines how connectedness evolves over time across firms with different regional exposures and state-ownership statuses, filling a gap in the literature to elucidate where property demand and real estate firm trustworthiness have deteriorated most. An event-study analysis of four key moments of the crisis outlines distinct phases of market sentiment: with the introduction of the three red lines, connectedness primarily reflects shared exposure and a uniform shock to the market. Then, the early unrest surrounding Evergrande exposes strong regional differentiation, with firms concentrated in less developed regions receiving significant spillover. By one year into the crisis, previously stable regions receive higher levels of spillover, and there is evidence of a substitution effect towards private developers. Two years into the crisis, the market has much less homogeneity in effects across regions and state-ownership status: major shocks induce minimal network changes, reflecting how investors have already priced in their beliefs. This paper also offers one of the most extensive timelines of the Chinese real estate crisis to date, and a new R package, GephiForR, was created for the network visualization in this paper.

2602.19738 2026-02-24 stat.ME

Individualized Causal Effects under Network Interference with Combinatorial Treatments

Yunping Lu, Haoang Chi, Qirui Hu, Zhiheng Zhang

Details
English abstract

Modern causal decision-making increasingly demands individualized treatment-effect estimation in networks where interventions are high-dimensional, combinatorial vectors. While network interference, effect heterogeneity, and multi-dimensional treatments have been studied separately, their intersection yields an exponentially large intervention space that makes standard identification tools and low-dimensional exposure mappings untenable. We bridge this gap with a unified framework that constructs a \emph{global potential-outcome emulator} for unit-level inference. Our method combines (1) rooted network configurations to leverage local smoothness, (2) doubly robust orthogonalization to mitigate confounding from network position and covariates, and (3) sparse spectral learning to efficiently estimate response surfaces over the $2^p$-dimensional treatment space. We also decompose networked effects into own-treatment, structural, and interaction components, and provide finite-sample error bounds and asymptotic consistency guarantees. Overall, we show that individualized causal inference remains feasible in high-dimensional networked settings without collapsing the intervention space.

2602.19709 2026-02-24 math.ST stat.TH

On Expectation Propagation and the Probabilistic Editor in some simple mixture problems

Nils Lid Hjort, Mike Titterington

Comments 22 pages, 0 figures; Mike Titterington passed away in 2023, at the age of 77; this is the October 2010 version of a paper we collaborated on then and (still) planned to extend before submitting to a journal

Details
English abstract

As for other latent-variable problems, exact Bayesian analysis is typically not practicable for mixture problems and approximate methods have been developed. Variational Bayes tends to produce approximate posterior distributions for parameters that are too tightly concentrated in having variances that are too small. The paper identifies a few mixture problems in which Expectation Propagation and variations thereof lead to approximate posterior distributions that asymptotically exhibit 'correct' variances and therefore stand to provide reliable interval estimates for the unknown parameter or parameters.

2602.19663 2026-02-24 q-fin.RM stat.CO

The impact of class imbalance in logistic regression models for low-default portfolios in credit risk

Willem D. Schutte, Charl Pretorius, Neill Smit, Leandra van der Merwe, Robert Maxwell

Comments 24 pages, 9 figures

Details
English abstract

In this paper, we study how class imbalance, typical of low-default credit portfolios, affects the performance of logistic regression models. Using a simulation study with controlled data-generating mechanisms, we vary (i) the level of class imbalance and (ii) the strength of association between the predictors and the response. The results show that, for a given strength of association, achievable classification accuracy deteriorates markedly as the event rate decreases, and the optimal classification cut-off shifts with the level of imbalance. In contrast, the Gini coefficient is comparatively stable with respect to class imbalance once sample sizes are sufficiently large, even when classification accuracy is strongly affected. As a practical guideline, we summarise attainable classification performance as a function of the event rate and strength of association between the predictors and the response.
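
The contrast between accuracy and the Gini coefficient can be illustrated with a small sketch. It assumes the Gini coefficient is the usual credit-scoring accuracy ratio, Gini = 2 * AUC - 1, which the abstract does not spell out; the function names and the toy scores are hypothetical. Because AUC only compares the ranking of events against non-events, it does not depend directly on the event rate, whereas accuracy at a fixed cut-off does.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a random event outscores a random non-event.

    Ties between an event and a non-event count as one half.
    labels: 1 = default (event), 0 = non-default.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def gini_from_scores(scores, labels):
    """Gini coefficient under the assumed definition Gini = 2 * AUC - 1."""
    return 2.0 * auc_from_scores(scores, labels) - 1.0


# Hypothetical predicted default probabilities for a tiny, imbalanced sample.
scores = [0.9, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0]
print(gini_from_scores(scores, labels))
```

The pairwise loop is quadratic in sample size and is meant only for illustration; rank-based formulas are the usual choice for large portfolios.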

2602.19648 2026-02-24 stat.ME

Local depth-based classification of directional data

Giuseppe Gismondi, Rebecca Rivieccio, Giuseppe Pandolfo

Details
English abstract

Directional data arise in many applications where observations are naturally represented as unit vectors or as observations on the surface of a unit hypersphere. In this context, statistical depth functions provide a center--outward ordering of the data. This work aims at proposing the use of a local notion of data depth function to be applied in the DD-plot (Depth vs. Depth plot) to classify directional data. The proposed method is investigated through an extensive simulation study and two real-data examples.

2602.19634 2026-02-24 cs.LG cs.AI stat.ML

Compositional Planning with Jumpy World Models

Jesse Farebrother, Matteo Pirotta, Andrea Tirinzoni, Marc G. Bellemare, Alessandro Lazaric, Ahmed Touati

Details
English abstract

The ability to plan with temporal abstractions is central to intelligent decision-making. Rather than reasoning over primitive actions, we study agents that compose pre-trained policies as temporally extended actions, enabling solutions to complex tasks that no constituent alone can solve. Such compositional planning remains elusive as compounding errors in long-horizon predictions make it challenging to estimate the visitation distribution induced by sequencing policies. Motivated by the geometric policy composition framework introduced in arXiv:2206.08736, we address these challenges by learning predictive models of multi-step dynamics -- so-called jumpy world models -- that capture state occupancies induced by pre-trained policies across multiple timescales in an off-policy manner. Building on Temporal Difference Flows (arXiv:2503.09817), we enhance these models with a novel consistency objective that aligns predictions across timescales, improving long-horizon predictive accuracy. We further demonstrate how to combine these generative predictions to estimate the value of executing arbitrary sequences of policies over varying timescales. Empirically, we find that compositional planning with jumpy world models significantly improves zero-shot performance across a wide range of base policies on challenging manipulation and navigation tasks, yielding, on average, a 200% relative improvement over planning with primitive actions on long-horizon tasks.

2602.19600 2026-02-24 stat.ML cs.LG

Manifold-Aligned Generative Transport

Xinyu Tian, Xiaotong Shen

Comments 64 pages, 5 figures

Details
English abstract

High-dimensional generative modeling is fundamentally a manifold-learning problem: real data concentrate near a low-dimensional structure embedded in the ambient space. Effective generators must therefore balance support fidelity -- placing probability mass near the data manifold -- with sampling efficiency. Diffusion models often capture near-manifold structure but require many iterative denoising steps and can leak off-support; normalizing flows sample in one pass but are limited by invertibility and dimension preservation. We propose MAGT (Manifold-Aligned Generative Transport), a flow-like generator that learns a one-shot, manifold-aligned transport from a low-dimensional base distribution to the data space. Training is performed at a fixed Gaussian smoothing level, where the score is well-defined and numerically stable. We approximate this fixed-level score using a finite set of latent anchor points with self-normalized importance sampling, yielding a tractable objective. MAGT samples in a single forward pass, concentrates probability near the learned support, and induces an intrinsic density with respect to the manifold volume measure, enabling principled likelihood evaluation for generated samples. We establish finite-sample Wasserstein bounds linking smoothing level and score-approximation accuracy to generative fidelity, and empirically improve fidelity and manifold concentration across synthetic and benchmark datasets while sampling substantially faster than diffusion models.

2602.19590 2026-02-24 q-fin.TR cs.CE q-fin.ST stat.CO

Metaorder modelling and identification from public data

Ezra Goliath, Tim Gebbie

Comments 12 pages, 6 figures

Details
English abstract

Market-order flow in financial markets exhibits long-range correlations. This is a widely known stylised fact of financial markets. A popular hypothesis for this stylised fact comes from the Lillo-Mike-Farmer (LMF) order-splitting theory. However, quantitative tests of this theory have historically relied on proprietary datasets with trader identifiers, limiting reproducibility and cross-market validation. We show that the LMF theory can be validated using publicly available Johannesburg Stock Exchange (JSE) data by leveraging recently developed methods for reconstructing synthetic metaorders. We demonstrate the validation using 3 years of Transaction and Quote Data (TAQ) for the largest 100 stocks on the JSE when assuming that there are either N=50 or N=150 effective traders managing metaorders in the market.

2602.19578 2026-02-24 stat.ML cs.LG

Goal-Oriented Influence-Maximizing Data Acquisition for Learning and Optimization

Weichi Yao, Bianca Dumitrascu, Bryan R. Goldsmith, Yixin Wang

Details
English abstract

Active data acquisition is central to many learning and optimization tasks in deep neural networks, yet remains challenging because most approaches rely on predictive uncertainty estimates that are difficult to obtain reliably. To this end, we propose Goal-Oriented Influence-Maximizing Data Acquisition (GOIMDA), an active acquisition algorithm that avoids explicit posterior inference while remaining uncertainty-aware through inverse curvature. GOIMDA selects inputs by maximizing their expected influence on a user-specified goal functional, such as test loss, predictive entropy, or the value of an optimizer-recommended design. Leveraging first-order influence functions, we derive a tractable acquisition rule that combines the goal gradient, training-loss curvature, and candidate sensitivity to model parameters. We show theoretically that, for generalized linear models, GOIMDA approximates predictive-entropy minimization up to a correction term accounting for goal alignment and prediction bias, thereby yielding uncertainty-aware behavior without maintaining a Bayesian posterior. Empirically, across learning tasks (including image and text classification) and optimization tasks (including noisy global optimization benchmarks and neural-network hyperparameter tuning), GOIMDA consistently reaches target performance with substantially fewer labeled samples or function evaluations than uncertainty-based active learning and Gaussian-process Bayesian optimization baselines.

2602.19528 2026-02-24 cs.LG stat.ML

Beyond Accuracy: A Unified Random Matrix Theory Diagnostic Framework for Crash Classification Models

Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma

Details
English abstract

Crash classification models in transportation safety are typically evaluated using accuracy, F1, or AUC, metrics that cannot reveal whether a model is silently overfitting. We introduce a spectral diagnostic framework grounded in Random Matrix Theory (RMT) and Heavy-Tailed Self-Regularization (HTSR) that spans the ML taxonomy: weight matrices for BERT/ALBERT/Qwen2.5, out-of-fold increment matrices for XGBoost/Random Forest, empirical Hessians for Logistic Regression, induced affinity matrices for Decision Trees, and Graph Laplacians for KNN. Evaluating nine model families on two Iowa DOT crash classification tasks (173,512 and 371,062 records respectively), we find that the power-law exponent $α$ provides a structural quality signal: well-regularized models consistently yield $α$ within $[2, 4]$ (mean $2.87 \pm 0.34$), while overfit variants show $α < 2$ or spectral collapse. We observe a strong rank correlation between $α$ and expert agreement (Spearman $ρ = 0.89$, $p < 0.001$), suggesting spectral quality captures model behaviors aligned with expert reasoning. We propose an $α$-based early stopping criterion and a spectral model selection protocol, and validate both against cross-validated F1 baselines. Sparse Lanczos approximations make the framework scalable to large datasets.

2602.19520 2026-02-24 stat.AP

Decomposing Crowd Wisdom: Domain-Specific Calibration Dynamics in Prediction Markets

Nam Anh Le

Details
English abstract

Prediction markets are increasingly used as probability forecasting tools, yet their usefulness depends on calibration, specifically whether a contract trading at 70 cents truly implies a 70% probability. Using 292 million trades across 327,000 binary contracts on Kalshi and Polymarket, this paper shows that calibration is a structured, multidimensional phenomenon. On Kalshi, calibration decomposes into four components (a universal horizon effect, domain-specific biases, domain-by-horizon interactions and a trade-size scale effect) that together explain 87.3% of calibration variance. The dominant pattern is persistent underconfidence in political markets, where prices are chronically compressed toward 50%, and this bias generalises across both exchanges. However, the trade-size scale effect, whereby large trades are associated with amplified underconfidence in politics on Kalshi ($Δ= 0.53$, 95% confidence interval [0.29, 0.75]), does not replicate on Polymarket ($Δ= 0.11$, [-0.15, 0.39]), suggesting platform-specific microstructure. A Bayesian hierarchical model confirms the frequentist decomposition with 96.3% posterior predictive coverage. Consumers of prediction market prices who treat them as face-value probabilities will systematically misinterpret them, and the direction of misinterpretation depends on what is being predicted, when and by whom.
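
The notion of calibration used above (does a contract trading at 70 cents resolve "yes" about 70% of the time?) can be illustrated with a minimal sketch. The function name `calibration_table`, the binning scheme, and the toy data are all assumptions made for illustration; this is not the paper's decomposition, only the basic price-versus-frequency comparison it builds on.

```python
def calibration_table(prices_cents, outcomes, n_bins=10):
    """Return (mean_price_cents, realized_frequency, count) per non-empty price bin."""
    bins = [[] for _ in range(n_bins)]
    for price, outcome in zip(prices_cents, outcomes):
        # Integer arithmetic avoids floating-point edge cases at bin boundaries;
        # a price of 100 cents is placed in the last bin.
        idx = min(price * n_bins // 100, n_bins - 1)
        bins[idx].append((price, outcome))
    rows = []
    for members in bins:
        if members:
            mean_price = sum(p for p, _ in members) / len(members)
            frequency = sum(y for _, y in members) / len(members)
            rows.append((mean_price, frequency, len(members)))
    return rows


# Hypothetical toy data: contracts priced at 70 cents resolve "yes" 90% of the
# time and contracts priced at 30 cents resolve "yes" 10% of the time, i.e.
# prices are compressed toward 50% (underconfidence).
prices_cents = [70] * 10 + [30] * 10
outcomes = [1] * 9 + [0] + [1] + [0] * 9
table = calibration_table(prices_cents, outcomes)
for mean_price, frequency, count in table:
    print(f"price {mean_price:.0f}c  realized {frequency:.0%}  n={count}")
```

A perfectly calibrated market would show realized frequencies equal to the mean price in every bin; in this toy example the realized frequencies are more extreme than the prices.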

2602.19513 2026-02-24 stat.AP stat.ML

Real-time Win Probability and Latent Player Ability via STATS X in Team Sports

Yasutaka Shimizu, Atsushi Yamanobe

Details
English abstract

This study proposes a statistically grounded framework for real-time win probability evaluation and player assessment in score-based team sports, based on minute-by-minute cumulative box-score data. We introduce a continuous dominance indicator (T-score) that maps final scores to real values consistent with win/lose outcomes, and formulate it as a time-evolving stochastic representation (T-process) driven by standardized cumulative statistics. This structure captures temporal game dynamics and enables sequential, analytically tractable updates of in-game win probability. Through this stochastic formulation, competitive advantage is decomposed into interpretable statistical components. Furthermore, we define a latent contribution index, STATS X, which quantifies a player's involvement in favorable dominance intervals identified by the T-process. This allows us to separate a team's baseline strength from game-specific performance fluctuations and provides a coherent, structural evaluation framework for both teams and players. While we do not implement AI methods in this paper, our framework is positioned as a foundational step toward hybrid integration with AI. By providing a structured time-series representation of dominance with an explicit probabilistic interpretation, the framework enables flexible learning mechanisms and incorporation of high-dimensional data, while preserving statistical coherence and interpretability. This work provides a basis for advancing AI-driven sports analytics.

2602.19510 2026-02-24 cs.LG math.OC stat.ML

Less is More: Convergence Benefits of Fewer Data Weight Updates over Longer Horizon

Rudrajit Das, Neel Patel, Meisam Razaviyayn, Vahab Mirrokni

Details
English abstract

Data mixing--the strategic reweighting of training domains--is a critical component in training robust machine learning models. This problem is naturally formulated as a bilevel optimization task, where the outer loop optimizes domain weights to minimize validation loss, and the inner loop optimizes model parameters to minimize the weighted training loss. Classical bilevel optimization relies on hypergradients, which theoretically require the inner optimization to reach convergence. However, due to computational constraints, state-of-the-art methods use a finite, often small, number of inner update steps before updating the weights. The theoretical implications of this approximation are not well understood. In this work, we rigorously analyze the convergence behavior of data mixing with a finite number of inner steps $T$. We prove that the "greedy" practical approach of using $T=1$ can fail even in a simple quadratic example. Under a fixed parameter update budget $N$ and assuming the per-domain losses are strongly convex, we show that the optimal $T$ scales as $Θ(\log N)$ (resp., $Θ({(N \log N)}^{1/2})$) for the data mixing problem with access to full (resp., stochastic) gradients. We complement our theoretical results with proof-of-concept experiments.

2602.19486 2026-02-24 eess.SY cs.SY stat.AP stat.ME

A mixed H-infinity-Passivity approach for Leveraging District Heating Systems as Frequency Ancillary Service in Electric Power Systems

Xinyi Yi, Ioannis Lestas

Details
English abstract

This paper introduces a mixed H-infinity-passivity framework that enables district heating systems (DHSs) with heat pumps to support electric-grid frequency regulation. The analysis illustrates how the DHS regulator influences coupled electro-thermal frequency dynamics and provides LMI conditions for efficient controller design. We also present a disturbance-independent temperature regulator that ensures stability and robustness against heat-demand uncertainty. Simulations demonstrate improved frequency-control dynamics in the electrical power grid while maintaining good thermal performance in the DHS.

2602.19481 2026-02-24 math.ST stat.TH

A Selection Premium Decomposition for the Expected Maximum of Random Walks

Victor H. de la Pena, Fangyuan Lin, Victor K. de la Pena

Details
English abstract

When $K$ models are evaluated on the same validation set of size $n$, the selected winner's apparent performance is biased upward. Suppose $K$ models are evaluated on a shared sequence of i.i.d. observations $X_1,\dots, X_n$, where model $k$ achieves response $f_k(X_i)$ with mean $μ_k = \mathbb E[f_k(X)]$. Writing $Y_{i,k} = f_k(X_i)-μ_k$ for the centered increment and $S_{n,k} = \sum_{i=1}^n Y_{i,k}$ for the centered cumulative score, the expected maximum satisfies $0\le\mathbb E\bigl[\max_k S_{n,k}\bigr] = \sum_{i=1}^n \mathbb E\bigl[φ_K(S_{i-1})\bigr]$ where $φ_K(u) = \mathbb{E}\bigl[\max_k(u_k + Y_k)\bigr] - \max_k u_k$, $u\in \mathbb R^K$, is the selection premium function. This formula corresponds to the null hypothesis case (all models are equal in the sense that they have the same mean), which clarifies that the bias arises from selection. While this decomposition follows from elementary conditioning and telescoping, we develop the analytical consequences in five directions: (i) structural properties of $φ_K$; (ii) extension to stopping times, recovering Wald's equation at $K=1$; (iii) a winner's curse decomposition for heterogeneous means; (iv) a universal bias concentration law showing that the first $α$-fraction of observations generates a $\sqrt{α}$-fraction of total bias.
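
The telescoping identity for the expected maximum can be checked exactly on a small case. The sketch below assumes a toy increment law chosen purely for illustration ($K = 2$, each centered increment vector uniform over the four sign patterns); the function names are hypothetical. It enumerates all paths with exact rational arithmetic and compares the direct expectation with the sum of expected selection premiums.

```python
from fractions import Fraction
from itertools import product

# Assumed toy increment law (not from the paper): K = 2 models, centered
# increments, each of the four sign patterns equally likely.
INCREMENTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
PROB = Fraction(1, len(INCREMENTS))
K = 2


def premium(u):
    """Selection premium phi_K(u) = E[max_k (u_k + Y_k)] - max_k u_k."""
    expected_max = sum(PROB * max(a + b for a, b in zip(u, y)) for y in INCREMENTS)
    return expected_max - max(u)


def position(path):
    """Centered cumulative scores after the given sequence of increments."""
    return tuple(sum(step[k] for step in path) for k in range(K))


def expected_max_direct(n):
    """E[max_k S_{n,k}] by enumerating every path of n increments."""
    return sum(PROB ** n * max(position(path)) for path in product(INCREMENTS, repeat=n))


def expected_max_via_premium(n):
    """sum_{i=1}^n E[phi_K(S_{i-1})], starting from S_0 = 0."""
    total = Fraction(0)
    for i in range(1, n + 1):
        for path in product(INCREMENTS, repeat=i - 1):
            total += PROB ** (i - 1) * premium(position(path))
    return total


n = 4
direct = expected_max_direct(n)
decomposed = expected_max_via_premium(n)
print(direct, decomposed)
```

Both quantities agree exactly for this toy law, and the result is positive even though every model has mean zero, which is the selection bias described in the abstract.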

2602.19462 2026-02-24 stat.ME math.ST stat.AP stat.TH

Zero Variance Portfolio

Jinyuan Chang, Yi Ding, Zhentao Shi, Bo Zhang

Details
English abstract

When the number of assets is larger than the sample size, the minimum variance portfolio interpolates the training data, delivering pathological zero in-sample variance. We show that if the weights of the zero variance portfolio are learned by a novel "Ridgelet" estimator, this portfolio enjoys out-of-sample generalizability on new test data. It exhibits the double descent phenomenon and can achieve optimal risk in the overparametrized regime when the number of assets dominates the sample size. In contrast, a "Ridgeless" estimator which invokes the pseudoinverse fails in-sample interpolation and diverges away from out-of-sample optimality. Extensive simulations and empirical studies demonstrate that the Ridgelet method performs competitively in high-dimensional portfolio optimization.

2602.19455 2026-02-24 cs.LG cs.AI cs.CL stat.ML

SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning

Zelin He, Boran Han, Xiyuan Zhang, Shuai Zhang, Haotian Lin, Qi Zhu, Haoyang Fang, Danielle C. Maddix, Abdul Fatir Ansari, Akash Chandrayan, Abhinav Pradhan, Bernie Wang, Matthew Reimherr

Comments Accepted by the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

Details
English abstract

Time-series diagnostic reasoning is essential for many applications, yet existing solutions face a persistent gap: general reasoning large language models (GRLMs) possess strong reasoning skills but lack the domain-specific knowledge to understand complex time-series patterns. Conversely, fine-tuned time-series LLMs (TSLMs) understand these patterns but lack the capacity to generalize reasoning for more complicated questions. To bridge this gap, we propose a hybrid knowledge-injection framework that injects TSLM-generated insights directly into GRLM's reasoning trace, thereby achieving strong time-series reasoning with in-domain knowledge. As collecting data for knowledge injection fine-tuning is costly, we further leverage a reinforcement learning-based approach with verifiable rewards (RLVR) to elicit knowledge-rich traces without human supervision, then transfer such an in-domain thinking trace into GRLM for efficient knowledge injection. We further release SenTSR-Bench, a multivariate time-series-based diagnostic reasoning benchmark collected from real-world industrial operations. Across SenTSR-Bench and other public datasets, our method consistently surpasses TSLMs by 9.1%-26.1% and GRLMs by 7.9%-22.4%, delivering robust, context-aware time-series diagnostic insights.

2602.19403 2026-02-24 cs.CL stat.AP

Personalized Prediction of Perceived Message Effectiveness Using Large Language Model Based Digital Twins

Jasmin Han, Janardan Devkota, Joseph Waring, Amanda Luken, Felix Naughton, Roger Vilardaga, Jonathan Bricker, Carl Latkin, Meghan Moran, Yiqun Chen, Johannes Thrul

Comments 31 pages, 5 figures, submitted to Journal of the American Medical Informatics Association (JAMIA). Drs. Chen and Thrul share last authorship

Details
English abstract

Perceived message effectiveness (PME) by potential intervention end-users is important for selecting and optimizing personalized smoking cessation intervention messages for mobile health (mHealth) platform delivery. This study evaluates whether large language models (LLMs) can accurately predict PME for smoking cessation messages. We evaluated multiple models for predicting PME across three domains: content quality, coping support, and quitting support. The dataset comprised 3010 message ratings (5-point Likert scale) from 301 young adult smokers. We compared (1) supervised learning models trained on labeled data, (2) zero and few-shot LLMs prompted without task-specific fine-tuning, and (3) LLM-based digital twins that incorporate individual characteristics and prior PME histories to generate personalized predictions. Model performance was assessed on three held-out messages per participant using accuracy, Cohen's kappa, and F1. LLM-based digital twins outperformed zero and few-shot LLMs (12 percentage points on average) and supervised baselines (13 percentage points), achieving accuracies of 0.49 (content), 0.45 (coping), and 0.49 (quitting), with directional accuracies of 0.75, 0.66, and 0.70 on a simplified 3-point scale. Digital twin predictions showed greater dispersion across rating categories, indicating improved sensitivity to individual differences. Integrating personal profiles with LLMs captures person-specific differences in PME and outperforms supervised and zero and few-shot approaches. Improved PME prediction may enable more tailored intervention content in mHealth. LLM-based digital twins show potential for supporting personalization of mobile smoking cessation and other health behavior change interventions.

2602.19398 2026-02-24 stat.ME

Variable selection via knockoffs for clustered data

Silvia Bacci, Leonardo Grilli, Carla Rampichini

Comments 11 pages, under submission

Details
English abstract

We extend the knockoffs method for selecting predictors to clustered data (cross-sectional or repeated measures). In the setting of clustered data, variable selection is complex because some predictors are measured at the observation level (level 1), whereas others are measured at the cluster level (level 2), so their values are constant within clusters. The solution we propose is to conduct variable selection separately at the two levels. To this end, we suggest a two-step approach: (i) decompose each level 1 predictor into level 2 and level 1 components by replacing it with the cluster mean and the deviation from the cluster mean; (ii) perform variable selection separately at the two levels, where the level 1 data matrix includes the deviations from the cluster means and the level 2 data matrix includes the cluster means of level 1 predictors and the level 2 predictors. To evaluate the performance of the proposed approach, we conduct a simulation study comparing the sequential knockoff, the derandomized knockoff, and the Lasso. The study shows satisfactory results in terms of false discovery rate and power. All methods fail when applied to the complete data matrix, including both level 1 and level 2 predictors. In contrast, all methods perform better when applied to the level 1 and level 2 data matrices separately. Moreover, the sequential knockoffs method performs substantially better than the Lasso and the derandomized knockoffs. Our proposal to implement the knockoffs method in a clustered data framework is feasible, flexible, and effective.
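
Step (i) of the two-step approach, replacing a level 1 predictor by its cluster mean and the deviation from that mean, can be sketched as follows. The function name `split_levels` and the toy data are assumptions for illustration; the knockoff selection itself in step (ii) is not shown.

```python
from collections import defaultdict


def split_levels(values, clusters):
    """Split a level 1 predictor into cluster means (level 2) and deviations (level 1)."""
    grouped = defaultdict(list)
    for value, cluster in zip(values, clusters):
        grouped[cluster].append(value)
    means = {cluster: sum(vals) / len(vals) for cluster, vals in grouped.items()}
    cluster_mean = [means[cluster] for cluster in clusters]
    deviation = [value - means[cluster] for value, cluster in zip(values, clusters)]
    return cluster_mean, deviation


# Hypothetical predictor observed on five units in two clusters.
x = [1.0, 3.0, 10.0, 14.0, 12.0]
cluster = ["a", "a", "b", "b", "b"]
between, within = split_levels(x, cluster)
print(between)  # constant within each cluster, goes to the level 2 data matrix
print(within)   # sums to zero within each cluster, goes to the level 1 data matrix
```

Because the deviations sum to zero within every cluster, the two components carry separate between-cluster and within-cluster information, which is what allows selection to be run separately at the two levels.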

2602.19378 2026-02-24 stat.ME

Identification and estimation of the conditional average treatment effect with nonignorable missing covariates, treatment, and outcome

Shuozhi Zuo, Yixin Wang, Fan Yang

Details
English abstract

Treatment effect heterogeneity is central to policy evaluation, social science, and precision medicine, where interventions can affect individuals differently. In observational studies, covariates, treatment, and outcomes are often only partially observed. When missingness depends on unobserved values (missing not at random; MNAR), standard methods can yield biased estimates of the conditional average treatment effect (CATE). This paper establishes nonparametric identification of the CATE under multivariate MNAR mechanisms that allow covariates, treatment, and outcomes to be MNAR. It also develops nonparametric and parametric estimators and proposes a sensitivity analysis framework for assessing robustness to violations of the missingness assumptions.

2602.19370 2026-02-24 stat.AP stat.ME

Reliability of stochastic capacity estimates

Igor Mikolasek

Comments 9 pages, 3 figures, 3 tables, accepted for TRA 2026 conference

Details
English abstract

Stochastic traffic capacity is used in traffic modelling and control for unidirectional sections of road infrastructure, although some of the estimation methods have recently proved flawed. However, even sound estimation methods require sufficient data. Because breakdowns are rare, the number of recorded breakdowns effectively determines sample size. This is especially relevant for temporary traffic infrastructure, but also for permanent bottlenecks (e.g., on- and off-ramps), where practitioners must know when estimates are reliable enough for control or design decisions. This paper studies this reliability along with the impact of censored data using synthetic data with a known capacity distribution. A corrected maximum-likelihood estimator is applied to varied samples. In total, 360 artificial measurements are created and used to estimate the capacity distribution, and the deviation from the pre-defined distribution is then quantified. Results indicate that at least 50 recorded breakdowns are necessary; 100-200 are the recommended minimum for temporary measurements. Beyond this, further improvements are marginal, with the expected average relative error below 5 %.

2602.19351 2026-02-24 stat.AP

Network-Level Travel Time Prediction Considering The Effects of Weather and Seasonality

Yufei Ai, Yao Yu, Wenjing Pu, Lu Gao, Yihao Ren

Details
English abstract

Accurate travel time predictions can help travelers. This study proposes a framework for predicting the network-level travel time index (TTI) using machine learning models. A case study was performed on more than 50,000 TTI observations collected from the Washington DC area over 6 years. The proposed approach is also able to identify the effects of weather and seasonality. The performances of the machine learning models were assessed and compared with each other. It was shown that the ridge regression model outperformed the other models in both short-term and long-term predictions.

2602.19331 2026-02-24 cs.LG cs.NE stat.ML

Partial Soft-Matching Distance for Neural Representational Comparison with Partial Unit Correspondence

Chaitanya Kapoor, Alex H. Williams, Meenakshi Khosla

Details
English abstract

Representational similarity metrics typically force all units to be matched, making them susceptible to noise and outliers common in neural representations. We extend the soft-matching distance to a partial optimal transport setting that allows some neurons to remain unmatched, yielding rotation-sensitive but robust correspondences. This partial soft-matching distance provides theoretical advantages -- relaxing strict mass conservation while maintaining interpretable transport costs -- and practical benefits through efficient neuron ranking in terms of cross-network alignment without costly iterative recomputation. In simulations, it preserves correct matches under outliers and reliably selects the correct model in noise-corrupted identification tasks. On fMRI data, it automatically excludes low-reliability voxels and produces voxel rankings by alignment quality that closely match computationally expensive brute-force approaches. It achieves higher alignment precision across homologous brain areas than standard soft-matching, which is forced to match all units regardless of quality. In deep networks, highly matched units exhibit similar maximally exciting images, while unmatched units show divergent patterns. This ability to partition by match quality enables focused analyses, e.g., testing whether networks have privileged axes even within their most aligned subpopulations. Overall, partial soft-matching provides a principled and practical method for representational comparison under partial correspondence.

2602.19329 2026-02-24 stat.AP cs.LG

Dynamic Elasticity Between Forest Loss and Carbon Emissions: A Subnational Panel Analysis of the United States

Keonvin Park

Details
English abstract

Accurate quantification of the relationship between forest loss and associated carbon emissions is critical for both environmental monitoring and policy evaluation. Although many studies have documented spatial patterns of forest degradation, there is limited understanding of the dynamic elasticity linking tree cover loss to carbon emissions at subnational scales. In this paper, we construct a comprehensive panel dataset of annual forest loss and carbon emission estimates for U.S. subnational administrative units from 2001 to 2023, based on the Hansen Global Forest Change dataset. We apply fixed effects and dynamic panel regression techniques to isolate within-region variation and account for temporal persistence in emissions. Our results show that forest loss has a significant positive short-run elasticity with carbon emissions, and that emissions exhibit strong persistence over time. Importantly, the estimated long-run elasticity, accounting for autoregressive dynamics, is substantially larger than the short-run effect, indicating cumulative impacts of repeated forest loss events. These findings highlight the importance of modeling temporal dynamics when assessing environmental responses to land cover change. The dynamic elasticity framework proposed here offers a robust and interpretable tool for analyzing environmental change processes, and can inform both regional monitoring systems and carbon accounting frameworks.
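
Why the long-run elasticity exceeds the short-run one under persistence can be seen in a minimal sketch. It assumes a standard first-order autoregressive specification, log emissions = persistence × lagged log emissions + short_run × log forest loss, which the abstract does not spell out; the coefficient values are hypothetical and not estimates from the paper.

```python
def long_run_elasticity(short_run, persistence):
    """Long-run effect short_run / (1 - persistence) in an assumed AR(1) model."""
    if not 0 <= persistence < 1:
        raise ValueError("persistence must lie in [0, 1) for a finite long-run effect")
    return short_run / (1.0 - persistence)


def response_path(short_run, persistence, periods):
    """Response of log emissions to a permanent unit increase in log forest loss."""
    path, level = [], 0.0
    for _ in range(periods):
        level = persistence * level + short_run
        path.append(level)
    return path


# Hypothetical coefficients, chosen only to illustrate the mechanics.
short_run, persistence = 0.2, 0.6
long_run = long_run_elasticity(short_run, persistence)
path = response_path(short_run, persistence, periods=50)
print(long_run, path[0], path[-1])
```

The first-period response equals the short-run elasticity, and the response accumulates toward the long-run value as past effects carry over through the lagged term.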

2602.19295 2026-02-24 q-bio.QM stat.AP stat.ME

Time-Varying Hazard Patterns and Co-Mutation Profiles of KRAS G12C and G12D in Real-World NSCLC

Robert Amevor, Dennis Baidoo, Emmanuel Kubuafor

Details
English abstract

Background: KRAS mutations are the largest oncogenic subset in NSCLC. While KRAS G12C is now targetable, no approved therapies exist for G12D. We examined time-to-next-treatment (TTNT) and overall survival (OS) differences between G12C and G12D, allowing for time-varying hazard effects. Methods: De-identified data from AACR Project GENIE BPC NSCLC v2.0-public were analyzed. TTNT served as a real-world surrogate for progression-free survival. Co-mutations (TP53, STK11, KEAP1, SMARCA4, MET), TMB, and PD-L1 were harmonized. Kaplan-Meier, multivariable Cox, and a pre-specified piecewise Cox model (split at median TTNT = 23 months) were applied. Schoenfeld residuals assessed proportional hazards; bootstrap resampling (B=1000) evaluated stability. Results: Among 162 TTNT-evaluable patients (G12C n=130; G12D n=32), median TTNT was 28.6 versus 32.0 months (log-rank p=0.79). Adjusted Cox regression showed no overall hazard difference (HR=0.85; 95% CI 0.53-1.37; p=0.50), but Schoenfeld testing indicated borderline non-proportionality (p=0.053). Piecewise Cox modeling revealed time-varying effects: early TTNT hazard favored G12D (HR=0.41; 95% CI 0.17-0.97; p=0.043) with significant KRAS x period interaction (HR=3.33; p=0.021) and late-period attenuation (HR=1.38; 95% CI 0.77-2.47; p=0.285). Bootstrap resampling confirmed this pattern (median early-period HR=0.39; late-period HR=1.41). Among 278 OS-evaluable patients (133 deaths), G12D showed improved OS (adjusted HR=0.63; 95% CI 0.39-0.99; p=0.048). G12C tumors exhibited higher TMB (9.79 vs 7.83 mut/Mb; p=0.002) and greater STK11/KEAP1 enrichment. Conclusions: KRAS G12D demonstrated early TTNT advantage and improved OS. Late-period TTNT differences were non-significant (post-hoc power: 12.3%). These exploratory findings require validation in larger cohorts but support allele-specific therapeutic development for G12D.

2602.19290 2026-02-24 stat.ME econ.EM math.ST stat.TH

Distributional Discontinuity Design

Kyle Schindl, Larry Wasserman

Details
English abstract

Regression discontinuity and kink designs are typically analyzed through mean effects, even when treatment changes the shape of the entire outcome distribution. To address this, we introduce distributional discontinuity designs, a framework for estimating causal effects for a scalar outcome at the boundary of a discontinuity in treatment assignment. Our estimand is the Wasserstein distance between limiting conditional outcome distributions; a single scale-interpretable measure of distribution shift. We show that this weakly bounds the average treatment effect, where equality holds if and only if the treatment effect is purely additive; thus, departure from equality measures effect heterogeneity. To further encode effect heterogeneity we show that the Wasserstein distance admits an orthogonal decomposition into squared differences in $L$-moments, thereby quantifying the contribution from location, scale, skewness, and higher-order shape components to the overall distributional distance. Next, we extend this framework to distributional kink designs by evaluating the Wasserstein derivative at a policy kink; this describes the flow of probability mass through the kink. In the case of fuzzy kink designs, we derive new identification results. Finally, we apply our methods on real data by re-analyzing two natural experiments to compare our distributional effects to traditional causal estimands.
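
The claim that the Wasserstein distance bounds the average treatment effect, with equality only for a purely additive effect, can be illustrated on toy samples. The sketch assumes the 2-Wasserstein distance (the abstract does not state the order) and uses the fact that, for two equal-size one-dimensional samples, the empirical distance is obtained by pairing sorted values. Function names and data are hypothetical, and no boundary-limit estimation is attempted.

```python
def wasserstein_p(sample_a, sample_b, p=2):
    """Empirical p-Wasserstein distance between two equal-size 1D samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equal sample sizes")
    a, b = sorted(sample_a), sorted(sample_b)
    # In one dimension the optimal coupling matches order statistics.
    return (sum(abs(x - y) ** p for x, y in zip(a, b)) / len(a)) ** (1.0 / p)


def mean_difference(sample_a, sample_b):
    """Absolute difference in sample means (the mean-effect analogue)."""
    return abs(sum(sample_b) / len(sample_b) - sum(sample_a) / len(sample_a))


# Hypothetical outcomes on the control side of the cutoff.
control = [0.0, 1.0, 2.0, 3.0]
shifted = [y + 2.0 for y in control]    # purely additive effect
rescaled = [2.0 * y for y in control]   # effect that also changes the scale

for name, treated in [("shifted", shifted), ("rescaled", rescaled)]:
    print(name, mean_difference(control, treated), wasserstein_p(control, treated))
```

For the additive shift the two numbers coincide; for the rescaled sample the Wasserstein distance is strictly larger than the mean difference, and the gap reflects effect heterogeneity.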

2602.19284 2026-02-24 stat.ME

Localized conformal model selection

Yuhao Wang, Tengyao Wang

Comments 8 pages, 1 figure

Details
English abstract

We propose a localized conformal model selection framework that integrates local adaptivity with post-selection validity for distribution-free prediction. By performing model selection symmetrically across calibration points using upper and lower surrogate intervals, we construct a data-dependent safe index set that contains the oracle model and preserves exchangeability. The resulting ensemble procedure retains exact finite-sample marginal coverage while adapting to spatial heterogeneity and model complexity. Simulations demonstrate substantial reductions in interval length compared to the best fixed model, especially in heterogeneous and low-noise settings.

2602.19263 2026-02-24 stat.AP cs.LG

Prognostics of Multisensor Systems with Unknown and Unlabeled Failure Modes via Bayesian Nonparametric Process Mixtures

Kani Fu, Sanduni S Disanayaka Mudiyanselage, Chunli Dai, Minhee Kim

Details
English abstract

Modern manufacturing systems often experience multiple and unpredictable failure behaviors, yet most existing prognostic models assume a fixed, known set of failure modes with labeled historical data. This assumption limits the use of digital twins for predictive maintenance, especially in high-mix or adaptive production environments, where new failure modes may emerge, and the failure mode labels may be unavailable. To address these challenges, we propose a novel Bayesian nonparametric framework that unifies a Dirichlet process mixture module for unsupervised failure mode discovery with a neural network-based prognostic module. The key innovation lies in an iterative feedback mechanism to jointly learn two modules. These modules iteratively update one another to dynamically infer, expand, or merge failure modes as new data arrive while providing high prognostic accuracy. Experiments on both simulation and aircraft engine datasets show that the proposed approach performs competitively with or significantly better than existing approaches. It also exhibits robust online adaptation capabilities, making it well-suited for digital-twin-based system health management in complex manufacturing environments.

2602.19239 2026-02-24 stat.ML cs.LG

Attention Deficits in Language Models: Causal Explanations for Procedural Hallucinations

Ahmed Karim, Fatima Sheaib, Zein Khamis, Maggie Chlon, Jad Awada, Leon Chlon

Details
English abstract

Large language models can follow complex procedures yet fail at a seemingly trivial final step: reporting a value they themselves computed moments earlier. We study this phenomenon as procedural hallucination: failure to execute a verifiable, prompt-grounded specification even when the correct value is present in context. In long-context binding tasks with a known single-token candidate set, we find that many errors are readout-stage routing failures. Specifically, failures decompose into Stage 2A (gating) errors, where the model does not enter answer mode, and Stage 2B (binding) errors, where it enters answer mode but selects the wrong candidate (often due to recency bias). In the hard regime, Stage 2B accounts for most errors across model families in our tasks (Table 1). On Stage 2B error trials, a linear probe on the final-layer residual stream recovers the correct value far above chance (e.g., 74% vs. 2% on Qwen2.5-3B; Table 2), indicating that the answer is encoded but not used. We formalize "present but not used" via available vs. used mutual information and pseudo-prior interventions, yielding output-computable diagnostics and information-budget certificates. Finally, an oracle checkpointing intervention that restates the true binding near the query can nearly eliminate Stage 2B failures at long distance (e.g., Qwen2.5-3B $0/400 \rightarrow 399/400$ at $k = 1024$; Table 8).

2602.19236 2026-02-24 stat.ME

CoMET: A Compressed Bayesian Mixed-Effects Model for High-Dimensional Tensors

Sreya Sarkar, Kshitij Khare, Sanvesh Srivastava

Comments 50 pages, 11 figures, and 2 tables

Details
English abstract

Mixed-effects models are fundamental tools for analyzing clustered and repeated-measures data, but existing high-dimensional methods largely focus on penalized estimation with vector-valued covariates. Bayesian alternatives in this regime are limited, with no sampling-based mixed-effects framework that supports tensor-valued fixed- and random-effects covariates while remaining computationally tractable. We propose the Compressed Mixed-Effects Tensor (CoMET) model for high-dimensional repeated-measures data with scalar responses and tensor-valued covariates. CoMET performs structured, mode-wise random projection of the random-effects covariance, yielding a low-dimensional covariance parameter that admits simple Gaussian prior specification and enables efficient imputation of compressed random-effects. For the mean structure, CoMET leverages a low-rank tensor decomposition and margin-structured Horseshoe priors to enable fixed-effects selection. These design choices lead to an efficient collapsed Gibbs sampler whose computational complexity grows approximately linearly with the tensor covariate dimensions. We establish high-dimensional theoretical guarantees by identifying regularity conditions under which CoMET's posterior predictive risk decays to zero. Empirically, CoMET outperforms penalized competitors across a range of simulation studies and two benchmark applications involving facial-expression prediction and music emotion modeling.

2602.19220 2026-02-24 stat.ME

A likelihood approach to proper analysis of secondary outcomes in matched case-control studies

Shanshan Liu, Guoqing Diao

Details
English abstract

Matched case-control studies are commonly employed in epidemiological research for their convenience and efficiency. Analysis of secondary outcomes can yield valuable insights into biological pathways and help identify genetic variants of importance. Naive analysis using standard statistical methods, such as least-squares regression for quantitative traits, can be misleading because they fail to account for unequal sampling induced by the case-control design and matching. In this paper, we propose novel statistical methods that appropriately reflect the study design and sampling scheme in the analysis of secondary outcome data. The new methods provide consistent estimation and accurate coverage probabilities for the confidence interval estimators. We demonstrate the advantages of the new methods through simulation studies and a real application with diabetes patients. R code implementing the proposed methods is publicly available.

2602.19216 2026-02-24 stat.ME

Statistical Measures for Explainable Aspect-Based Sentiment Analysis: A Case Study on Environmental Discourse in Reddit

Luisa Stracqualursi, Patrizia Agati

Comments Preprint of an article accepted for publication in Statistics (Taylor & Francis). 14 pages, 2 figures, 4 tables

Details
English abstract

Aspect-Based Sentiment Analysis (ABSA) provides a fine-grained understanding of opinions by linking sentiment to specific aspects in text. While transformer-based models excel at this task, their black-box nature limits their interpretability, posing risks in real-world applications without labeled data. This paper introduces a statistical, model-agnostic framework to assess the behavioral transparency and trustworthiness of ABSA models. Our framework relies on several metrics, such as the entropy of polarity distributions, soft-count-based dominance scores, and sentiment divergence between sources, whose robustness is validated through bootstrap resampling and sensitivity analysis. A case study on environmentally focused Reddit communities illustrates how the proposed indicators provide interpretable diagnostics of model certainty, decisiveness, and cross-source variability. The results show that statistical indicators computed on soft outputs can complement traditional approaches, offering a computationally efficient methodology for validating, monitoring, and interpreting ABSA models in contexts where labeled data are unavailable.

2602.19143 2026-02-24 cs.LG math.OC stat.ML

Incremental Learning of Sparse Attention Patterns in Transformers

Oğuz Kaan Yüksel, Rodrigo Alvarez Lucendo, Nicolas Flammarion

Comments 36 pages, 19 figures

Abstract

This paper introduces a high-order Markov chain task to investigate how transformers learn to integrate information from multiple past positions with varying statistical significance. We demonstrate that transformers learn this task incrementally: each stage is defined by the acquisition of specific information through sparse attention patterns. Notably, we identify a shift in learning dynamics from competitive, where heads converge on the most statistically dominant pattern, to cooperative, where heads specialize in distinct patterns. We model these dynamics using simplified differential equations that characterize the trajectory and prove stage-wise convergence results. Our analysis reveals that transformers ascend a complexity ladder by passing through simpler, misspecified hypothesis classes before reaching the full model class. We further show that early stopping acts as an implicit regularizer, biasing the model toward these simpler classes. These results provide a theoretical foundation for the emergence of staged learning and complex behaviors in transformers, offering insights into generalization for natural language processing and algorithmic reasoning.

2602.19129 2026-02-24 stat.ME

Estimation and Statistical Inference for Generalized Multilayer Latent Space Model

Zhaozhe Liu, Gongjun Xu, Haoran Zhang

Abstract

Multilayer networks have become increasingly ubiquitous across diverse scientific fields, ranging from social sciences and biology to economics and international relations. Despite their broad applications, the inferential theory for multilayer networks remains underdeveloped. In this paper, we propose a flexible latent space model for multilayer directed networks with various edge types, where each node is assigned two latent positions capturing sending and receiving behaviors, and each layer has a connection matrix governing the layer-specific structure. Through nonlinear link functions, the proposed model represents the structure of a multilayer network as a tensor, which admits a Tucker low-rank decomposition. This formulation poses significant challenges for the estimation of, and statistical inference on, the latent positions and connection matrices, where existing techniques are inapplicable. To tackle this issue, a novel unfolding and fusion method is developed to facilitate estimation. We establish both consistency and asymptotic normality for the estimated latent positions and connection matrices, which paves the way for statistical inference tasks in multilayer network applications, such as constructing confidence regions for the latent positions and testing whether two network layers share the same structure. We validate the proposed method through extensive simulation studies and demonstrate its practical utility on real-world data.

2602.17960 2026-02-24 math.PR stat.AP stat.ML

Anisotropic local law for non-separable sample covariance matrices

Zhou Fan, Renyuan Ma, Elliot Paquette, Zhichao Wang

Abstract

We establish local laws for sample covariance matrices $K = N^{-1}\sum_{i=1}^N \mathbf{g}_i\mathbf{g}_i^*$ where the random vectors $\mathbf{g}_1, \ldots, \mathbf{g}_N \in \mathbb{R}^n$ are independent with common covariance $\Sigma$. Previous work has largely focused on the separable model $\mathbf{g} = \Sigma^{1/2}\mathbf{w}$ with $\mathbf{w}$ having independent entries, but this structure is rarely present in statistical applications involving dependent or nonlinearly transformed data. Under a concentration assumption for quadratic forms $\mathbf{g}^*A\mathbf{g}$, we prove an optimal averaged local law showing that the Stieltjes transform of $K$ converges to its deterministic limit uniformly down to the optimal scale $\eta \geq N^{-1+\varepsilon}$. Under an additional structural assumption on the cumulant tensors of $\mathbf{g}$ -- which interpolates between the highly structured case of independent entries and generic dependence -- we establish the full anisotropic local law, providing entrywise control of the resolvent $(K-zI)^{-1}$ in arbitrary directions. We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $\mathbf{g} = \sigma(X\mathbf{w})$ arising in machine learning, and Gaussian measures with nonlinear tilting. The proofs introduce a tensor network framework for analyzing fluctuation averaging in the presence of higher-order cumulant structure.

2602.14934 2026-02-24 stat.ML cs.LG

Activation-Space Uncertainty Quantification for Pretrained Networks

Richard Bergna, Stefan Depeweg, Sergio Calvo-Ordoñez, Jonathan Plenk, Alvaro Cartea, Jose Miguel Hernández-Lobato

Abstract

Reliable uncertainty estimates are crucial for deploying pretrained models; yet, many strong methods for quantifying uncertainty require retraining, Monte Carlo sampling, or expensive second-order computations and may alter a frozen backbone's predictions. To address this, we introduce Gaussian Process Activations (GAPA), a post-hoc method that shifts Bayesian modeling from weights to activations. GAPA replaces standard nonlinearities with Gaussian-process activations whose posterior mean exactly matches the original activation, preserving the backbone's point predictions by construction while providing closed-form epistemic variances in activation space. To scale to modern architectures, we use a sparse variational inducing-point approximation over cached training activations, combined with local k-nearest-neighbor subset conditioning, enabling deterministic single-pass uncertainty propagation without sampling, backpropagation, or second-order information. Across regression, classification, image segmentation, and language modeling, GAPA matches or outperforms strong post-hoc baselines in calibration and out-of-distribution detection while remaining efficient at test time.

2602.14440 2026-02-24 stat.ME cs.LG stat.ML

CAIRO: Decoupling Order from Scale in Regression

Harri Vanhems, Yue Zhao, Peng Shi, Archer Y. Yang

Abstract

Standard regression methods typically optimize a single pointwise objective, such as mean squared error, which conflates the learning of ordering with the learning of scale. This coupling renders models vulnerable to outliers and heavy-tailed noise. We propose CAIRO (Calibrate After Initial Rank Ordering), a framework that decouples regression into two distinct stages. In the first stage, we learn a scoring function by minimizing a scale-invariant ranking loss; in the second, we recover the target scale via isotonic regression. We theoretically characterize a class of "Optimal-in-Rank-Order" objectives -- including variants of RankNet and Gini covariance -- and prove that they recover the ordering of the true conditional mean under mild assumptions. We further show that subsequent monotone calibration recovers the true regression function at the population level and mathematically guarantees that finite-sample predictions are strictly auto-calibrated. Empirically, CAIRO combines the representation learning of neural networks with the robustness of rank-based statistics. It matches the performance of state-of-the-art tree ensembles on tabular benchmarks and significantly outperforms standard regression objectives in regimes with heavy-tailed or heteroskedastic noise.
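
The second stage described above (recovering scale from an ordering via isotonic regression) can be sketched with the pool-adjacent-violators algorithm. This is a minimal illustration, not the authors' implementation: the function name `isotonic_fit` is hypothetical, the first-stage scores are assumed to be given and distinct, and no ranking loss is trained here.

```python
def isotonic_fit(scores, targets):
    """Least-squares non-decreasing fit of targets against scores (PAV)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block holds [sum of targets, number of points]
    for i in order:
        blocks.append([targets[i], 1])
        # Merge backwards while the block means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = [0.0] * len(scores)
    pos = 0
    for s, c in blocks:
        for i in order[pos:pos + c]:
            fitted[i] = s / c  # every point in a block gets the block mean
        pos += c
    return fitted


# Only the ordering of the scores matters, not their scale.
print(isotonic_fit([10, 20, 30, 40], [1.0, 3.0, 2.0, 4.0]))
print(isotonic_fit([0.1, 200, 3000, 1e6], [1.0, 3.0, 2.0, 4.0]))
```

Because each fitted value is the mean of the targets pooled into its block, the in-sample predictions are calibrated by construction: averaging the targets over all points that share a prediction returns that prediction.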

2602.14208 2026-02-24 cs.LG math.OC stat.ML

Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws

Jinbo Wang, Binghui Li, Zhanpeng Zhou, Mingze Wang, Yuxuan Sun, Jiaqi Zhang, Xunliang Cai, Lei Wu

Comments 34 pages, accepted by ICLR 2026 as a conference paper

Abstract

Batch size scheduling (BSS) plays a critical role in large-scale deep learning training, influencing both optimization dynamics and computational efficiency. Yet, its theoretical foundations remain poorly understood. In this work, we show that the functional scaling law (FSL) framework introduced in Li et al. (2025a) provides a principled lens for analyzing BSS. Specifically, we characterize the optimal BSS under a fixed data budget and show that its structure depends sharply on task difficulty. For easy tasks, optimal schedules keep increasing batch size throughout. In contrast, for hard tasks, the optimal schedule maintains small batch sizes for most of training and switches to large batches only in a late stage. To explain the emergence of late switching, we uncover a dynamical mechanism -- the fast catch-up effect -- which also manifests in large language model (LLM) pretraining. After switching from small to large batches, the loss rapidly aligns with the constant large-batch trajectory. Using FSL, we show that this effect stems from rapid forgetting of accumulated gradient noise, with the catch-up speed determined by task difficulty. Crucially, this effect implies that large batches can be safely deferred to late training without sacrificing performance, while substantially reducing data consumption. Finally, extensive LLM pretraining experiments -- covering both Dense and MoE architectures with up to 1.1B parameters and 1T tokens -- validate our theoretical predictions. Across all settings, late-switch schedules consistently outperform constant-batch and early-switch baselines.

2601.13851 2026-02-24 cs.LG stat.ML

Inverting Self-Organizing Maps: A Unified Activation-Based Framework

Alessandro Londei, Matteo Benati, Denise Lanzieri, Vittorio Loreto

Abstract

Self-Organizing Maps (SOMs) provide topology-preserving projections of high-dimensional data, yet their use as generative models remains largely unexplored. We show that the activation pattern of a SOM -- the squared distances to its prototypes -- can be inverted to recover the exact input, following from a classical result in Euclidean distance geometry: a point in $D$ dimensions is uniquely determined by its distances to $D+1$ affinely independent references. We derive the corresponding linear system and characterize the conditions under which inversion is well-posed. Building on this mechanism, we introduce the Manifold-Aware Unified SOM Inversion and Control (MUSIC) update rule, which modifies squared distances to selected prototypes while preserving others, producing controlled, semantically meaningful trajectories aligned with the SOM's piecewise-linear structure. Tikhonov regularization stabilizes the update and ensures smooth motion in high dimensions. Unlike variational or diffusion-based generative models, MUSIC requires no sampling, latent priors, or learned decoders: it operates entirely on prototype geometry. If no perturbation is applied, inversion recovers the exact input; when a target prototype or cluster is specified, MUSIC produces coherent semantic transitions. We validate the framework on synthetic Gaussian mixtures, MNIST digits, and the Labeled Faces in the Wild dataset. Across all settings, MUSIC trajectories maintain high classifier confidence, produce significantly sharper intermediate images than linear interpolation, and reveal an interpretable geometric structure of the learned map.
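
The distance-geometry fact behind the inversion can be illustrated in two dimensions with three prototypes. The sketch below is an assumption-laden toy (hypothetical function name, no regularization, no MUSIC update): subtracting the squared-distance equation of one prototype from the others cancels the quadratic term and leaves a linear system for the input.

```python
def invert_from_sq_distances(protos, sq_dists):
    """Recover a 2-D point from squared distances to 3 affinely independent prototypes."""
    (x0, y0), (x1, y1), (x2, y2) = protos
    d0, d1, d2 = sq_dists
    # |x - p_i|^2 - |x - p_0|^2 = d_i - d_0 is linear in x:
    # 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - (d_i - d_0)
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = (x1**2 + y1**2) - (x0**2 + y0**2) - (d1 - d0)
    b2 = (x2**2 + y2**2) - (x0**2 + y0**2) - (d2 - d0)
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("prototypes are not affinely independent")
    # Cramer's rule for the 2x2 system.
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


protos = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
point = (0.3, -0.7)
activation = [(point[0] - px) ** 2 + (point[1] - py) ** 2 for px, py in protos]
print(invert_from_sq_distances(protos, activation))
```

Collinear prototypes make the determinant vanish, which is the 2-D version of the well-posedness condition mentioned in the abstract.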

2512.04861 2026-02-24 math.ST stat.ML stat.TH

Concentration bounds for intrinsic dimension estimation using Gaussian kernels

Martin Andersson

Comments 24 pages, 8 figures

Abstract

We prove finite-sample concentration and anti-concentration bounds for dimension estimation using Gaussian kernel sums. Our bounds provide explicit dependence on sample size, bandwidth, and local geometric and distributional parameters, characterizing precisely how regularity conditions influence statistical performance. We also propose a bandwidth selection heuristic using derivative information, supported by numerical experiments.

2511.07588 2026-02-24 stat.ME

Weighted Asymptotically Optimal Sequential Testing

Soumyabrata Bose, Jay Bartroff

Abstract

This paper develops a framework for incorporating prior information into sequential multiple testing procedures while maintaining asymptotic optimality. We define a weighted log-likelihood ratio (WLLR) as an additive modification of the standard LLR and use it to construct two new sequential tests: the Weighted Gap and Weighted Gap-Intersection procedures. We prove that both procedures provide strong control of the family-wise error rate. Our main theoretical contribution is to show that these weighted procedures are asymptotically optimal; their expected stopping times achieve the theoretical lower bound as the error probabilities vanish. This first-order optimality is shown to be robust, holding in high-dimensional regimes where the number of null hypotheses grows and in settings with random weights, provided that mild, interpretable conditions on the weight distribution are met.

2511.07270 2026-02-24 math.ST cs.IT cs.LG math.IT math.PR stat.ML stat.TH

High-Dimensional Asymptotics of Differentially Private PCA

Youngjoo Yun, Rishabh Dudeja

Abstract

In differential privacy, random noise is introduced to privatize summary statistics of a sensitive dataset before releasing them. The noise level determines the privacy loss, which quantifies how easily an adversary can detect a target individual's presence in the dataset using the published statistic. Most privacy analyses provide upper bounds on the privacy loss. Sometimes, these bounds offer weak privacy guarantees unless the noise level is so high that it overwhelms the meaningful signal. It is unclear whether such high noise levels are necessary or a limitation of loose and pessimistic privacy bounds. This paper explores whether it is possible to obtain sharp privacy characterizations that determine the exact privacy loss of a mechanism on a given dataset. We study this problem in the context of differentially private principal component analysis (PCA), where the goal is to privatize the leading principal components of a dataset with $n$ samples and $p$ features. We analyze the exponential mechanism in a model-free setting and provide sharp utility and privacy characterizations in the high-dimensional limit ($p \rightarrow \infty$). We show that in high dimensions, detecting a target individual's presence using privatized PCs is exactly as hard as distinguishing between two Gaussians with slightly different means, where the mean difference depends on certain spectral properties of the dataset. Our analysis combines the hypothesis-testing formulation of privacy guarantees proposed by Dong, Roth, and Su (2022) with Le Cam's contiguity arguments.
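
To make the "distinguishing between two Gaussians" statement concrete, the sketch below evaluates the standard trade-off curve for testing N(0, 1) against N(mu, 1) from the hypothesis-testing formulation of privacy: the smallest achievable type II error at type I error alpha is Phi(Phi^{-1}(1 - alpha) - mu). Here `mu` is a free illustrative parameter; how the paper maps a dataset's spectral properties to the mean difference is not reproduced.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def gaussian_tradeoff(alpha, mu):
    """Minimal type II error at type I error alpha for N(0,1) vs N(mu,1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


# A smaller mean difference leaves the adversary closer to random guessing.
for mu in (0.0, 0.5, 1.0, 2.0):
    print(mu, round(gaussian_tradeoff(0.05, mu), 4))
```

With `mu = 0` the curve is the line 1 - alpha (perfect privacy); as `mu` grows the attainable type II error drops, meaning the target individual is easier to detect.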

2511.00958 2026-02-24 cs.LG cs.AI stat.ML

The Hidden Power of Normalization Layers in Neural Networks: Exponential Capacity Control

Khoat Than

Abstract

Normalization layers are critical components of modern AI systems such as ChatGPT, Gemini, and DeepSeek. Empirically, they are known to stabilize training dynamics and improve generalization ability. However, the underlying theoretical mechanism by which normalization layers contribute to both optimization and generalization remains largely unexplained, especially when using many normalization layers in a deep neural network (DNN). In this work, we develop a theoretical framework that elucidates the role of normalization through the lens of capacity control. We prove that an unnormalized DNN can exhibit exponentially large Lipschitz constants with respect to either its parameters or inputs, implying excessive functional capacity and potential overfitting. There are uncountably many such bad DNNs. In contrast, the insertion of normalization layers can provably reduce the Lipschitz constant at an exponential rate in the number of normalization layers. This exponential reduction yields two fundamental consequences: (1) it smooths the loss landscape at an exponential rate, facilitating faster and more stable optimization; and (2) it constrains the effective capacity of the network, thereby enhancing generalization guarantees on unseen data. Our results thus offer a principled explanation for the empirical success of normalization methods in deep learning.

2510.21491 2026-02-24 cs.LG cs.DC stat.ML

Benchmarking Catastrophic Forgetting Mitigation Methods in Federated Time Series Forecasting

Khaled Hallak, Oudom Kem

Comments Accepted for presentation at the FLTA 2025 Conference on Federated Learning. This version corresponds to the camera-ready author manuscript

Abstract

Catastrophic forgetting (CF) poses a persistent challenge in continual learning (CL), especially within federated learning (FL) environments characterized by non-i.i.d. time series data. While existing research has largely focused on classification tasks in vision domains, the regression-based forecasting setting prevalent in IoT and edge applications remains underexplored. In this paper, we present the first benchmarking framework tailored to investigate CF in federated continual time series forecasting. Using the Beijing Multi-site Air Quality dataset across 12 decentralized clients, we systematically evaluate several CF mitigation strategies, including Replay, Elastic Weight Consolidation, Learning without Forgetting, and Synaptic Intelligence. Key contributions include: (i) introducing a new benchmark for CF in time series FL, (ii) conducting a comprehensive comparative analysis of state-of-the-art methods, and (iii) releasing a reproducible open-source framework. This work provides essential tools and insights for advancing continual learning in federated time-series forecasting systems.

2510.16703 2026-02-24 cs.LG cs.AI stat.ME

On the Granularity of Causal Effect Identifiability

Yizuo Chen, Adnan Darwiche

Abstract

The classical notion of causal effect identifiability is defined in terms of treatment and outcome variables. In this paper, we consider the identifiability of state-based causal effects: how an intervention on a particular state of treatment variables affects a particular state of outcome variables. We demonstrate that state-based causal effects may be identifiable even when variable-based causal effects may not. Moreover, we show that this separation occurs only when additional knowledge -- such as context-specific independencies -- is available. We further examine knowledge that constrains the states of variables, and show that such knowledge can improve both variable-based and state-based identifiability when combined with other knowledge such as context-specific independencies. We finally propose an approach for identifying causal effects under these additional constraints, and conduct empirical studies to further illustrate the separations between the two levels of identifiability.

2510.11853 2026-02-24 stat.ME math.ST stat.TH

A Martingale Kernel Two-Sample Test

Anirban Chatterjee, Aaditya Ramdas

Comments Accepted for publication in the proceedings of The 37th International Conference on Algorithmic Learning Theory

Abstract

The Maximum Mean Discrepancy (MMD) is a widely used multivariate distance metric for two-sample testing. The standard MMD test statistic has an intractable null distribution, typically requiring costly resampling or permutation approaches for calibration. In this work, we leverage a martingale interpretation of the estimated squared MMD to propose martingale MMD (mMMD), a quadratic-time statistic which has a limiting standard Gaussian distribution under the null. Moreover, we show that the test is consistent against any fixed alternative and that, for large sample sizes, mMMD offers substantial computational savings over the standard MMD test, with only a minor loss in power.

2510.03734 2026-02-24 cs.LG cs.AI cs.CY stat.ML

Cost Efficient Fairness Audit Under Partial Feedback

Nirjhar Das, Mohit Sharma, Praharsh Nanavati, Kirankumar Shiragur, Amit Deshpande

Comments Accepted at NeurIPS 2025 RegML Workshop; Reliable ML Workshop

Abstract

We study the problem of auditing the fairness of a given classifier under partial feedback, where true labels are available only for positively classified individuals (e.g., loan repayment outcomes are observed only for approved applicants). We introduce a novel cost model for acquiring additional labeled data, designed to more accurately reflect real-world costs such as credit assessment, loan processing, and potential defaults. Our goal is to find optimal fairness audit algorithms that are more cost-effective than random exploration and natural baselines. In our work, we consider two audit settings: a black-box model with no assumptions on the data distribution, and a mixture model, where features and true labels follow a mixture of exponential family distributions. In the black-box setting, we propose a near-optimal auditing algorithm under mild assumptions and show that a natural baseline can be strictly suboptimal. In the mixture model setting, we design a novel algorithm that achieves significantly lower audit cost than the black-box case. Our approach leverages prior work on learning from truncated samples and maximum-a-posteriori oracles, and extends known results on spherical Gaussian mixtures to handle exponential family mixtures, which may be of independent interest. Moreover, our algorithms apply to popular fairness metrics including demographic parity, equal opportunity, and equalized odds. Empirically, we demonstrate strong performance of our algorithms on real-world fair classification datasets like Adult Income and Law School, consistently outperforming natural baselines by around 50% in terms of audit cost.

2509.12666 2026-02-24 stat.ML cs.LG cs.NA math.NA

PBPK-iPINNs: Inverse Physics-Informed Neural Networks for Physiologically Based Pharmacokinetic Brain Models

Charuka D. Wickramasinghe, Krishanthi C. Weerasinghe, Pradeep K. Ranaweera, Nelum S. S. M. Hapuhinna

Comments 28 pages, 12 figures

Abstract

Physics-Informed Neural Networks (PINNs) integrate machine learning with differential equations to solve forward and inverse problems while ensuring that predictions adhere to physical laws. Physiologically based pharmacokinetic (PBPK) modeling advances beyond classical compartmental approaches by employing a mechanistic, physiology-focused framework. Such models involve many unknown parameters that are difficult to measure directly in humans due to ethical and practical constraints. PBPK models are constructed as systems of ordinary differential equations (ODEs); these parametric ODEs are often stiff, and traditional numerical and statistical methods frequently fail to converge. In this study, we consider a permeability-limited, four-compartment PBPK brain model that mimics human brain functionality in drug delivery. We introduce PBPK-iPINN, a method for estimating drug-specific or patient-specific parameters and drug concentration profiles using inverse PINNs. We also conduct a parameter identifiability analysis to determine whether the parameters can be uniquely and reliably estimated from the available data. We demonstrate that, for the inverse problem to converge to the correct solution, the components of the loss function (data loss, initial condition loss, and residual loss) must be appropriately weighted, and the hyperparameters, including the number of layers and neurons, activation functions, learning rate, optimizer, and collocation points, must be carefully tuned. The performance of the PBPK-iPINN approach is then compared with established numerical and statistical methods. Accurate parameter estimation yields precise drug concentration-time profiles, which in turn enable the calculation of pharmacokinetic metrics. These metrics support drug developers and clinicians in designing and optimizing therapies for brain cancer.

2508.13668 2026-02-24 physics.bio-ph stat.CO

Perspective: An outlook on fluorescence tracking

Lance W. Q. Xu, Steve Pressé

Abstract

Tracking single fluorescent molecules has offered resolution into dynamic molecular processes at the single-molecule level. This perspective traces the evolution of single-molecule tracking, highlighting key developments across various methodological branches within fluorescence microscopy. We compare the strengths and limitations of each approach, ranging from conventional widefield offline tracking to real-time confocal tracking. In the final section, we explore emerging efforts to advance physics-inspired tracking techniques, a possibility for parallelization and artificial intelligence, and discuss challenges and opportunities they present toward achieving higher spatiotemporal resolution and greater computational and data efficiency in next-generation single-molecule studies.

2507.11891 2026-02-24 stat.ML cs.LG math.ST stat.TH

Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

Shuangning Li, Chonghuan Wang, Jingyan Wang

Abstract

We study A/B experiments that are designed to compare the performance of two recommendation algorithms. Prior work has observed that the stable unit treatment value assumption (SUTVA) often does not hold in large-scale recommendation systems, and hence the estimate for the global treatment effect (GTE) is biased. Specifically, units under the treatment and control algorithms contribute to a shared pool of data that subsequently train both algorithms, resulting in interference between the two groups. In this paper, we investigate when such interference may affect our decision making on which algorithm is better. We formalize this insight under a multi-armed bandit framework and theoretically characterize when the sign of the difference-in-means estimator of the GTE under data sharing aligns with or contradicts the sign of the true GTE. Our analysis identifies the level of exploration versus exploitation as a key determinant of how data sharing impacts decision making, and we propose a detection procedure based on ramp-up experiments to signal incorrect algorithm comparison in practice.

2507.11768 2026-02-24 stat.ML cs.LG

LLMs are Bayesian, In Expectation, Not in Realization

Leon Chlon, Zein Khamis, Maggie Chlon, Mahdi El Zein, MarcAntonio M. Awada

Abstract

Exchangeability-based martingale diagnostics have been used to question Bayesian explanations of transformer in-context learning. We show that these violations are compatible with Bayesian/MDL behavior once we account for a basic architectural fact: positional encodings break exchangeability. Accordingly, the relevant baseline is performance in expectation over orderings of an exchangeable multiset, not performance under every fixed ordering. In a Bernoulli microscope (under explicit regularity assumptions), we bound the permutation-induced dispersion detected by martingale diagnostics (Theorem 3.4) while proving near-optimal expected MDL/compression over permutations (Theorem 3.6). Empirically, black-box next-token log-probabilities from an Azure OpenAI deployment exhibit nonzero expectation-realization gaps that decay with context length (mean 0.74 at $n = 10$ to 0.26 at $n = 50$; 95% confidence intervals), and permutation averaging reduces order-induced standard deviation with a $k^{-1/2}$ trend (Figure 2). Controlled from-scratch training ablations varying only the positional encoding show within-prefix order variance collapsing to $\approx 10^{-16}$ with no positional encoding, but remaining between $10^{-8}$ and $10^{-6}$ under standard positional encoding schemes (Table 2). Robustness checks extend beyond Bernoulli to categorical sequences, synthetic in-context learning tasks, and evidence-grounded QA with permuted exchangeable evidence chunks.

2506.22740 2026-02-24 cs.AI stat.ML

Explanations are a Means to an End: Decision Theoretic Explanation Evaluation

Ziyang Guo, Berk Ustun, Jessica Hullman

Abstract

Explanations of model behavior are commonly evaluated via proxy properties weakly tied to the purposes explanations serve in practice. We contribute a decision theoretic framework that treats explanations as information signals valued by the expected improvement they enable on a specified decision task. This approach yields three distinct estimands: 1) a theoretical benchmark that upper-bounds the performance achievable by any agent with the explanation, 2) a human-complementary value that quantifies the theoretically attainable value that is not already captured by a baseline human decision policy, and 3) a behavioral value representing the causal effect of providing the explanation to human decision-makers. We instantiate these definitions in a practical validation workflow, and apply them to assess explanation potential and interpret behavioral effects in human-AI decision support and mechanistic interpretability.

2506.18630 2026-02-24 stat.ML cs.LG eess.SP

Trustworthy Prediction with Gaussian Process Knowledge Scores

Kurt Butler, Guanchao Feng, Tong Chen, Petar Djuric

Comments 6 pages, 5 figures, to be published in the Proceedings of the European Signal Processing Conference (EUSIPCO)

Abstract

Probabilistic models are often used to make predictions in regions of the data space where no observations are available, but it is not always clear whether such predictions are well-informed by previously seen data. In this paper, we propose a knowledge score for predictions from Gaussian process regression (GPR) models that quantifies the extent to which observing data has reduced our uncertainty about a prediction. The knowledge score is interpretable and naturally bounded between 0 and 1. We demonstrate in several experiments that the knowledge score can anticipate when predictions from a GPR model are accurate, and that this anticipation improves performance in tasks such as anomaly detection, extrapolation, and missing data imputation. Source code for this project is available online at https://github.com/KurtButler/GP-knowledge.

2506.18215 2026-02-24 math.ST stat.TH

Estimating quantile treatments without strict overlap

Marco Avella-Medina, Richard Davis, Gennady Samorodnitsky

Abstract

We consider the problem of estimating quantile treatment effects without assuming strict overlap, i.e., we do not assume that the propensity score is bounded away from zero. More specifically, we consider an inverse probability weighting (IPW) approach for estimating quantiles in the potential outcomes framework and pay special attention to scenarios where the propensity scores can tend to zero as a regularly varying function. Our approach effectively considers a heavy-tailed objective function for estimating the quantile process. We introduce a truncated IPW estimator that is shown to outperform the standard quantile IPW estimator when strict overlap does not hold. We show that the limiting distribution of the estimated quantile process follows an infinitely divisible law and converges at the rate $n^{1-1/\gamma}$, where $\gamma>1$ is the tail index of the propensity scores when they tend to zero. We propose a practical, data-driven procedure for selecting the truncation parameter, grounded in our asymptotic theory. The performance of our estimators is illustrated in numerical experiments and in a dataset that exhibits the presence of extreme propensity scores.

2506.10572 2026-02-24 stat.ML cs.LG

Probability Bounding: Post-Hoc Calibration via Box-Constrained Softmax

Kyohei Atarashi, Satoshi Oyama, Hiromi Arai, Hisashi Kashima

Comments 46 pages, 4 figures

Abstract

Many studies have observed that modern neural networks achieve high accuracy while producing poorly calibrated probabilities, making calibration a critical practical issue. In this work, we propose probability bounding (PB), a novel post-hoc calibration method that mitigates both underconfidence and overconfidence by learning lower and upper bounds on the output probabilities. To implement PB, we introduce the box-constrained softmax (BCSoftmax) function, a generalization of Softmax that explicitly enforces lower and upper bounds on the output probabilities. While BCSoftmax is formulated as the solution to a box-constrained optimization problem, we develop an exact and efficient algorithm for computing BCSoftmax. We further provide theoretical guarantees for PB and introduce two variants of PB. We demonstrate the effectiveness of our methods experimentally on four real-world datasets, consistently reducing calibration errors. Our Python implementation is available at https://github.com/neonnnnn/torchbcsoftmax.

2506.00486 2026-02-24 cs.LG cs.AI stat.ML

It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs

Jun Wu, Patrick Huang, Jiangtao Wen, Yuxing Han

Details
English abstract

Despite rapid progress in large language models (LLMs), the statistical structure of their weights, activations, and gradients (and its implications for initialization, training dynamics, and efficiency) remains largely unexplored. We empirically show that these quantities in LLMs are well modeled by generalized Gaussian (GG) distributions, and introduce a unified, end-to-end optimization framework grounded in this observation. Our contributions are threefold: (1) a GG-based initialization that aligns with trained model statistics, accelerating convergence and improving accuracy; (2) ACT, a progressive activation-constrained training method that reduces redundancy and propagation overhead; and (3) GCT, a gradient-constrained training algorithm that substantially lowers communication cost in distributed training. Experiments across diverse architectures demonstrate consistently smaller, faster models with minimal communication overhead that match or surpass standard baselines. By anchoring LLM optimization in principled statistical modeling, this work advances efficient, scalable, and hardware-aware AI systems.

2505.21417 2026-02-24 stat.ME stat.CO

Model averaging with mixed criteria for estimating high quantiles of extreme values: Application to heavy rainfall

Yonggwan Shin, Yire Shin, Jeong-Soo Park

Journal ref Shin, Y., Shin, Y. & Park, JS. Model averaging with mixed criteria for estimating high quantiles of extreme values: application to heavy rainfall. Stoch Environ Res Risk Assess 40(2), 47 (2026)

Details
English abstract

Accurately estimating high quantiles beyond the largest observed value is crucial for risk assessment and devising effective adaptation strategies to prevent a greater disaster. The generalized extreme value distribution is widely used for this purpose, with L-moment estimation (LME) and maximum likelihood estimation (MLE) being the primary methods. However, estimating high quantiles with a small sample size becomes challenging when the upper endpoint is unbounded, or equivalently, when there are larger uncertainties involved in extrapolation. This study introduces an improved approach using a model averaging (MA) technique. The proposed method combines MLE and LME to construct candidate submodels and assign weights effectively. The properties of the proposed approach are evaluated through Monte Carlo simulations and an application to maximum daily rainfall data in Korea. In addition, theoretical properties of the MA estimator are examined, including the asymptotic variance with random weights. A surrogate model of MA estimation is also developed and applied for further analysis. Finally, a Bayesian model averaging approach is considered to reduce the estimation bias occurring in the MA methods.

2504.19375 2026-02-24 cs.LG cs.SY eess.SY math.OC stat.ML

$O(1/k)$ Finite-Time Bound for Non-Linear Two-Time-Scale Stochastic Approximation

Siddharth Chandak

Comments Submitted to IEEE Transactions on Automatic Control

Details
English abstract

Two-time-scale stochastic approximation (SA) is an algorithm with coupled iterations which has found broad applications in reinforcement learning, optimization and game control. In this work, we derive mean squared error bounds for non-linear two-time-scale iterations with contractive mappings. In the setting where both stepsizes are order $Θ(1/k)$, commonly referred to as single time-scale SA with multiple coupled sequences, we obtain the first $O(1/k)$ rate without imposing additional smoothness assumptions. In the setting with true time-scale separation, the previous best bound was $O(1/k^{2/3})$. We improve this to $O(1/k^a)$ for any $a<1$ approaching the optimal $O(1/k)$ rate. The key step in our analysis involves rewriting the original iteration in terms of an averaged noise sequence whose variance decays sufficiently fast. Additionally, we use an induction-based approach to show that the iterates are bounded in expectation. Our results apply to Polyak averaging, as well as to algorithms from reinforcement learning, and optimization, including gradient descent-ascent and two-time-scale Lagrangian optimization.
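
To make the setting concrete, here is a toy two-time-scale iteration with coupled updates and different step-size schedules. Everything in it is an illustrative assumption rather than the paper's setup: the linear contractive maps, the Gaussian noise, the specific step sizes, and the function name `two_time_scale_sa`. The coupled fixed point of this toy system is `x = y = 2`.

```python
import numpy as np

def two_time_scale_sa(num_iters=20000, seed=0):
    """Toy coupled stochastic approximation with a fast iterate x and a slow iterate y."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    for k in range(num_iters):
        alpha = 1.0 / (k + 1) ** 0.6  # fast time scale: decays more slowly
        beta = 2.0 / (k + 2)          # slow time scale: order 1/k
        noise_x, noise_y = rng.normal(size=2)
        # Noisy evaluations of the maps f(x, y) = 0.5*y + 1 and g(x, y) = 0.5*x + 1,
        # both computed from the current (x, y) before either is overwritten.
        x_next = x + alpha * (0.5 * y + 1.0 - x + noise_x)
        y_next = y + beta * (0.5 * x + 1.0 - y + noise_y)
        x, y = x_next, y_next
    return x, y

x_final, y_final = two_time_scale_sa()
```

Setting `alpha` and `beta` to the same order-`1/k` schedule gives the single-time-scale case with coupled sequences that the abstract mentions.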

2504.16100 2026-02-24 eess.SP cs.AI cs.LG stat.ML

Towards Accurate Forecasting of Renewable Energy: Building Datasets and Benchmarking Machine Learning Models for Solar and Wind Power in France

Eloi Lindas, Yannig Goude, Philippe Ciais

Comments 24 pages, 4 tables, 18 figures

Journal ref Environmental Data Science, Volume 4, 2025, e45

Details
English abstract

Accurate prediction of non-dispatchable renewable energy sources is essential for grid stability and price prediction. Regional power supply forecasts are usually indirect through a bottom-up approach of plant-level forecasts, incorporate lagged power values, and do not use the potential of spatially resolved data. This study presents a comprehensive methodology for predicting solar and wind power production at country scale in France using machine learning models trained with spatially explicit weather data combined with spatial information about production site capacity. A dataset is built spanning from 2012 to 2023, using daily power production data from RTE (the national grid operator) as the target variable, with daily weather data from ERA5, production site capacity and location, and electricity prices as input features. Three modeling approaches are explored to handle spatially resolved weather data: spatial averaging over the country, dimension reduction through principal component analysis, and a computer vision architecture to exploit complex spatial relationships. The study benchmarks state-of-the-art machine learning models as well as hyperparameter tuning approaches based on cross-validation methods on daily power production data. Results indicate that cross-validation tailored to time series is best suited to reach low error. We found that neural networks tend to outperform traditional tree-based models, which face challenges in extrapolation due to the increasing renewable capacity over time. Model performance ranges from 4% to 10% in nRMSE for midterm horizon, achieving similar error metrics to local models established at a single-plant level, highlighting the potential of these methods for regional power supply forecasting.

2503.09287 2026-02-24 econ.EM stat.AP

On the Wisdom of Crowds (of Economists)

Francis X. Diebold, Aaron Mora, Minchul Shin

Details
English abstract

We study the properties of macroeconomic survey forecast response averages as the number of survey respondents grows. Such averages are "portfolios" of forecasts. We characterize the speed and pattern of the gains from diversification as a function of portfolio size (the number of survey respondents) in both (1) the key real-world data-based environment of the U.S. Survey of Professional Forecasters, and (2) the theoretical model-based environment of equicorrelated forecast errors. We proceed by proposing and comparing various direct and model-based "crowd size signature plots", which summarize the forecasting performance of $k$-average forecasts as a function of $k$, where $k$ is the number of forecasts in the average. We then estimate the equicorrelation model for growth and inflation forecast errors by choosing model parameters to minimize the divergence between direct and model-based signature plots. The results indicate near-perfect equicorrelation model fit for both growth and inflation, which we explicate by showing analytically that, under very weak conditions, the direct and fitted equicorrelation model-based signature plots are identical at a particular model parameter configuration. That parameter configuration immediately suggests an analytic closed-form estimator for the direct signature plot, so that equicorrelation ultimately emerges as a device for convenient calculation of direct signature plots, rather than a separate "model" producing separate signature plots. In any event, we find that the gains from survey diversification are greater for inflation forecasts than for growth forecasts, and that they are largely exhausted with inclusion of 5-10 representative forecasters.
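
The diversification pattern under equicorrelation can be illustrated with a standard variance calculation. The sketch below assumes zero-mean forecast errors with common variance `sigma2` and common pairwise correlation `rho`, in which case the mean squared error of a `k`-forecast average is `sigma2 * (1 + (k - 1) * rho) / k`. The function names and parameter values are illustrative assumptions, and the paper's signature plots may be defined differently.

```python
import numpy as np

def avg_forecast_mse(k, sigma2, rho):
    """Closed-form MSE of the average of k equicorrelated, zero-mean forecast errors."""
    return sigma2 * (1.0 + (k - 1) * rho) / k

def avg_forecast_mse_from_cov(k, sigma2, rho):
    """Same quantity computed as w' Sigma w with equal weights w = 1/k."""
    cov = sigma2 * ((1.0 - rho) * np.eye(k) + rho * np.ones((k, k)))
    w = np.full(k, 1.0 / k)
    return float(w @ cov @ w)

# A model-based "signature" for k = 1..10 with illustrative parameters.
signature = [avg_forecast_mse(k, sigma2=1.0, rho=0.5) for k in range(1, 11)]
```

With `rho = 0.5` the MSE falls from 1.0 at `k = 1` to 0.6 at `k = 5` and can never drop below `rho * sigma2 = 0.5`, which shows how gains from adding forecasters can be largely exhausted at small `k` when errors are strongly correlated.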

2502.09257 2026-02-24 cs.LG cs.AI stat.ML

From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

Liad Erez, Tomer Koren

Details
English abstract

We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round, the learner observes the realized reward of the predicted actions. Motivated by prototypical applications of contextual bandits, we focus on the $s$-sparse regime where we assume that the sum of rewards is bounded by some value $s\ll K$. For example, in recommendation systems the number of products purchased by any customer is significantly smaller than the total number of available products. Our main result is for the $(ε,δ)$-PAC variant of the problem for which we design an algorithm that returns an $ε$-optimal policy with high probability using a sample complexity of $\tilde{O}((poly(K/m)+sm/ε^2) \log(|Π|/δ))$ where $Π$ is the underlying (finite) class and $s$ is the sparsity parameter. This bound improves upon known bounds for combinatorial semi-bandits whenever $s\ll K$, and in the regime where $s=O(1)$, the leading term is independent of $K$. Our algorithm is also computationally efficient given access to an ERM oracle for $Π$. Our framework generalizes the list multiclass classification problem with bandit feedback, which can be seen as a special case with binary reward vectors. In the special case of single-label classification corresponding to $s=m=1$, we prove an $O((K^7+1/ε^2)\log(|H|/δ))$ sample complexity bound, which improves upon recent results in this scenario. Additionally, we consider the regret minimization setting where data can be generated adversarially, and establish a regret bound of $\tilde O(|Π|+\sqrt{smT\log |Π|})$, extending the result of Erez et al. (2024) who consider the simpler single label classification setting.

2501.10117 2026-02-24 econ.EM stat.ME

Prediction Sets and Conformal Inference with Interval Outcomes

Weiguang Liu, Áureo de Paula, Elie Tamer

Details
English abstract

Given data on a random variable $Y$, a prediction set with miscoverage level $α\in (0,1)$ is a set that contains a new draw of $Y$ with probability $1-α$. Among all prediction sets satisfying this coverage property, the oracle prediction set is the one with minimal volume. The oracle prediction set offers a complementary view of the distribution of $Y$, beyond point estimators such as the mean and quantiles, and has attracted considerable interest recently. This paper develops methods for estimating such prediction sets conditional on observed covariates when $Y$ is censored or interval-valued. We characterise the oracle prediction set under partial identification induced by interval censoring and propose consistent estimators for both oracle prediction intervals and more general oracle prediction sets consisting of multiple disjoint intervals. In addition, we apply conformal inference to construct finite-sample valid prediction sets for interval outcomes that remain consistent as the sample size grows, using a conformity score tailored to interval data. The proposed procedure accounts for irreducible prediction uncertainty due to the stochastic nature of outcomes, modelling uncertainty arising from partial identification, and sampling uncertainty that vanishes as sample size increases. We conduct Monte Carlo simulations and two empirical applications using UK job postings data and the US Current Population Survey. The results demonstrate the robustness and efficiency of the proposed methods.

2412.15520 2026-02-24 stat.ME

Logistic Regression Model for Differentially-Private Matrix Masked Data

Linh H Nghiem, Aidong A. Ding, Samuel Wu

Details
English abstract

A recently proposed scheme utilizing local noise addition and matrix masking enables data collection while protecting individual privacy from all parties, including the central data manager. Statistical analysis of such privacy-preserved data is particularly challenging for nonlinear models like logistic regression. By leveraging a relationship between logistic regression and linear regression estimators, we propose the first valid statistical analysis method for logistic regression under this setting. Theoretical analysis of the proposed estimators confirms their validity under an asymptotic framework with increasing noise magnitude to account for strict privacy requirements. Simulations and real data analyses demonstrate the superiority of the proposed estimators over naive logistic regression methods on privacy-preserved data sets.

2411.02770 2026-02-24 cs.LG math.PR stat.CO stat.ML

A spectral mixture representation of isotropic kernels with application to random Fourier features

Nicolas Langrené, Xavier Warin, Pierre Gruet

Comments 27 pages, 12 figures

Details
English abstract

Rahimi and Recht (2007) introduced the idea of decomposing positive definite shift-invariant kernels by randomly sampling from their spectral distribution for machine learning applications. This famous technique, known as Random Fourier Features (RFF), is in principle applicable to any such kernel whose spectral distribution can be identified and simulated. In practice, however, it is usually applied to the Gaussian kernel because of its simplicity, since its spectral distribution is also Gaussian. Clearly, simple spectral sampling formulas would be desirable for broader classes of kernels. In this paper, we show that the spectral distribution of positive definite isotropic kernels in $\mathbb{R}^{d}$ for all $d\geq1$ can be decomposed as a scale mixture of $α$-stable random vectors, and we identify the mixing distribution as a function of the kernel. This constructive decomposition provides a simple and ready-to-use spectral sampling formula for many multivariate positive definite shift-invariant kernels, including exponential power kernels and generalized Cauchy kernels, as well as newly introduced kernels such as the generalized Matérn, Tricomi, and Fox $H$ kernels. In particular, we retrieve the fact that the spectral distributions of these kernels, which can only be expressed explicitly in terms of the Fox $H$ special function, are scale mixtures of the multivariate Gaussian distribution, along with an explicit mixing distribution formula. This result has broad applications for support vector machines, kernel ridge regression, Gaussian processes, and other kernel-based machine learning techniques for which the random Fourier features technique is applicable.
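
For readers unfamiliar with the baseline technique, the sketch below shows classical random Fourier features for the Gaussian kernel only, where frequencies are drawn from a Gaussian spectral distribution. It does not implement the scale-mixture sampling formulas proposed in the paper. The function names, the lengthscale parameterization `exp(-||x - y||^2 / (2 * lengthscale^2))`, and the feature count are assumptions made for this example.

```python
import numpy as np

def rff_features(X, num_features, lengthscale, rng):
    """Random Fourier features z(x) such that z(x) @ z(y) approximates the Gaussian kernel."""
    d = X.shape[1]
    # Spectral distribution of the Gaussian kernel: N(0, I / lengthscale^2).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)  # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def gaussian_kernel(X, lengthscale):
    """Exact Gaussian kernel matrix, for comparison."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Z = rff_features(X, num_features=20000, lengthscale=1.0, rng=rng)
max_err = np.max(np.abs(Z @ Z.T - gaussian_kernel(X, 1.0)))
```

Extending this to another shift-invariant kernel only requires changing how `W` is sampled, which is why explicit spectral sampling formulas for broader kernel classes are useful.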

2407.06898 2026-02-24 math.OC stat.AP

Who Goes Next? Optimizing the Allocation of Adherence-Improving Interventions

Daniel Otero-Leon, Mariel Lavieri, Brian Denton, Jeremy Sussman, Rodney Hayward

Details
English abstract

Long-term adherence to medication is a critical factor in preventing chronic diseases, such as cardiovascular disease. To address poor adherence, physicians may recommend adherence-improving interventions; however, such interventions are costly and limited in their availability. Knowing which patients will stop adhering helps distribute the available resources more effectively. We developed a binary integer program (BIP) model to select patients for adherence-improving intervention under budget constraints. We further studied a long-term adherence prediction model using a dynamic logistic regression (DLR) model that uses patients' claim data, medical health factors, demographics, and monitoring frequencies to predict the risk of future non-adherence. We trained and tested our predictive model on longitudinal data for cardiovascular disease in a large cohort of patients taking medication for cholesterol control seen in the national Veterans Affairs health system. Our study shows the importance of including past adherence to increase prediction accuracy. Finally, we assess the potential benefits of using the prediction model by proposing an algorithm that combines the DLR and BIP models to decrease the number of CVD events in a population.

2404.00888 2026-02-24 math.ST stat.TH

Two step estimations via the Dantzig selector for models of stochastic processes with high-dimensional parameters

Kou Fujimori, Koji Tsukuda

Comments 51 pages, 1 figure

Journal ref Stochastic Processes and their Applications Volume 192 (2026), 104809

Details
English abstract

We consider sparse estimation for stochastic processes with possibly infinite-dimensional nuisance parameters, using the Dantzig selector, which is a sparse estimation method similar to $Z$-estimation. When a consistent estimator for a nuisance parameter is obtained, it is possible to construct an asymptotically normal estimator for the parameter of interest under appropriate conditions. Motivated by this fact, we establish the asymptotic behavior of the Dantzig selector for models of ergodic stochastic processes with high-dimensional parameters of interest and possibly infinite-dimensional nuisance parameters. Moreover, we construct an asymptotically normal estimator by two-step estimation with the help of variable selection through the Dantzig selector and a consistent estimator of the nuisance parameter. Applications to ergodic time series models including integer-valued autoregressive models and ergodic diffusion processes are presented.

2402.12122 2026-02-24 math.NA cs.NA math.PR math.ST stat.TH

Almost sure convergence rates of adaptive increasingly rare Markov chain Monte Carlo

Julian Hofstadler, Krzysztof Latuszynski, Gareth O. Roberts, Daniel Rudolf

Details
English abstract

We consider adaptive increasingly rare Markov chain Monte Carlo (MCMC) algorithms, which are adaptive MCMC methods, where the adaptation concerning the "past" happens less and less frequently over time. Under a contraction assumption with respect to a Wasserstein-like function we deduce upper bounds of the convergence rate of Monte Carlo sums taking a renormalisation factor into account that is "almost" the one that appears in a law of the iterated logarithm. We demonstrate the applicability of our results by considering different settings, among which are those of simultaneous geometric and uniform ergodicity. All proofs are carried out on an augmented state space, including the classical non-augmented setting as a special case. In contrast to other adaptive MCMC limit theory, some technical assumptions, like diminishing adaptation, are not needed.

2401.05812 2026-02-24 stat.CO

A Tidy Framework and Infrastructure to Systematically Assemble Spatio-temporal Indexes from Multivariate Data

H. Sherry Zhang, Dianne Cook, Ursula Laa, Nicolas Langrené, Patricia Menéndez

Journal ref Journal of Computational and Graphical Statistics 34(2) 642-653 (2025)

Details
English abstract

Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards making it possible to understand index behavior in different data conditions, and to determine how their structure affects their values and variation in values. Here we discuss a modular data pipeline recommendation to assemble indexes. It is universally applicable to index computation and allows investigation of index behavior as part of the development procedure. One can compute indexes with different parameter choices, adjust steps in the index definition by adding, removing, and swapping them to experiment with various index designs, calculate uncertainty measures, and assess index robustness. The paper presents three examples to illustrate the pipeline framework usage: comparison of two different indexes designed to monitor the spatio-temporal distribution of drought in Queensland, Australia; the effect of dimension reduction choices in the Global Gender Gap Index (GGGI) on country rankings; and how to calculate bootstrap confidence intervals for the Standardized Precipitation Index (SPI). The methods are supported by a new R package, called tidyindex.

2401.02953 2026-02-24 stat.ME

Linked factor analysis

Giuseppe Vinci

Comments 42 pages, 7 figures

Details
English abstract

Factor models are widely applied to the analysis of multivariate data across disparate fields of research. However, modern scientific data are often incomplete, and estimating a factor model from partially observed data can be very challenging. In this work, we show that if the data are structurally incomplete, the factor model likelihood function can be decomposed into a product of likelihood functions for multiple factor models relative to different observed data subsets. If these factor models are linked together by common parameters, we can obtain complete maximum likelihood estimates of the full factor model parameters. We call this modeling framework Linked Factor Analysis (LINFA). LINFA can be used for covariance matrix completion, dependence estimation, dimension reduction, and data completion. We compute the maximum likelihood estimator through an efficient Expectation-Maximization algorithm, accelerated by a novel Group Vertex Tessellation algorithm. We establish the conditions for the consistency and asymptotic normality of the estimator. We design confidence regions, hypothesis tests, bootstrap algorithms, and methods for selecting the number of factors. Finally, we illustrate the application of LINFA in an extensive simulation study and in the analysis of neuroscience data.

2312.03274 2026-02-24 stat.ME stat.ML

Empirical Bayes Covariance Decomposition, and a Solution to the Multiple Tuning Problem in Sparse PCA

Joonsuk Kang, Matthew Stephens

详情
英文摘要

Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA. However, use of sparse PCA in practice is hindered by the difficulty of tuning the multiple hyperparameters that control the sparsity of different PCs (the "multiple tuning problem", MTP). Here we present a solution to the MTP using Empirical Bayes methods. We first introduce a general formulation for penalized PCA of a data matrix $\mathbf{X}$, which includes some existing sparse PCA methods as special cases. We show that this formulation also leads to a penalized decomposition of the covariance (or Gram) matrix, $\mathbf{X}^T\mathbf{X}$. We introduce empirical Bayes versions of these penalized problems, in which the penalties are determined by prior distributions that are estimated from the data by maximum likelihood rather than cross-validation. The resulting "Empirical Bayes Covariance Decomposition" provides a principled and efficient solution to the MTP in sparse PCA, and one that can be immediately extended to incorporate other structural assumptions (e.g. non-negative PCA). We illustrate the effectiveness of this approach on both simulated and real data examples.

2301.00201 2026-02-24 stat.ML cs.LG math.DG

Exploring Singularities in point clouds with the graph Laplacian: An explicit approach

Martin Andersson, Benny Avelin

Comments 28 pages, 12 figures

Journal ref Journal of Computational Mathematics and Data Science 14 (2025) 100113

Details
English abstract

We develop theory and methods that use the graph Laplacian to analyze the geometry of the underlying manifold of datasets. Our theory provides theoretical guarantees and explicit bounds on the functional forms of the graph Laplacian when it acts on functions defined close to singularities of the underlying manifold. We use these explicit bounds to develop tests for singularities and propose methods that can be used to estimate geometric properties of singularities in the datasets.

2210.01844 2026-02-24 math.OC math.ST q-fin.MF stat.TH

A quickest detection problem with false negatives

Tiziano De Angelis, Jhanvi Garg, Quan Zhou

Comments 35 pages, 4 figures

Details
English abstract

We formulate and solve a variant of the quickest detection problem which features false negatives. A standard Brownian motion acquires a drift at an independent exponential random time which is not directly observable. Based on the observation in continuous time of the sample path of the process, an optimizer must detect the drift as quickly as possible after it has appeared. The optimizer can inspect the system multiple times upon payment of a fixed cost per inspection. If a test is performed on the system before the drift has appeared then, naturally, the test will return a negative outcome. However, if a test is performed after the drift has appeared, then the test may fail to detect it and return a false negative with probability $ε\in(0,1)$. The optimisation ends when the drift is eventually detected. The problem is formulated mathematically as an optimal multiple stopping problem, and it is shown to be equivalent to a recursive optimal stopping problem. Exploiting this connection and free boundary methods, we find explicit formulae for the expected cost and the optimal strategy. We also show that when $ε= 0$ our expected cost is an affine transformation of the one in Shiryaev's classical optimal detection problem with a rescaled model parameter.

2205.00259 2026-02-24 stat.CO stat.ME

cubble: An R Package for Organizing and Wrangling Multivariate Spatio-temporal Data

H. Sherry Zhang, Dianne Cook, Ursula Laa, Nicolas Langrené, Patricia Menéndez

Journal ref Journal of Statistical Software 110(7) 1-27 (2024)

Details
English abstract

Multivariate spatio-temporal data refers to multiple measurements taken across space and time. For many analyses, spatial and time components can be separately studied: for example, to explore the temporal trend of one variable for a single spatial location, or to model the spatial distribution of one variable at a given time. However, for some studies, it is important to analyse different aspects of the spatio-temporal data simultaneously, for instance, temporal trends of multiple variables across locations. In order to facilitate the study of different portions or combinations of spatio-temporal data, we introduce a new data structure, cubble, with a suite of functions enabling easy slicing and dicing of the different spatio-temporal components. The proposed cubble structure ensures that all the components of the data are easy to access and manipulate while providing flexibility for data analysis. In addition, cubble facilitates visual and numerical explorations of the data while easing data wrangling and modelling. The cubble structure and the functions provided in the cubble R package equip users with the capability to handle hierarchical spatial and temporal structures. The cubble structure and the tools implemented in the package are illustrated with different examples of Australian climate data.

2107.05956 2026-02-24 stat.CO stat.ME

IID Sampling from Intractable Distributions

Sourabh Bhattacharya

Comments This updated version will appear in Sankhya A's special issue paying tribute to Professor C. R. Rao

Details
English abstract

We propose a novel methodology for drawing iid realizations from any target distribution on the Euclidean space with arbitrary dimension. No assumption of compact support is necessary for the validity of our theory and method. Our idea is to construct an appropriate infinite sequence of concentric closed ellipsoids, represent the target distribution as an infinite mixture on the central ellipsoid and the ellipsoidal annuli, and to construct efficient perfect samplers for the mixture components. In contrast with most of the existing works on perfect sampling, ours is not only a theoretically valid method but is also practically applicable to all target distributions on Euclidean space of any dimension and very much amenable to parallel computation. We validate the practicality and usefulness of our methodology by generating 10000 iid realizations from standard distributions such as normal, Student's t with 5 degrees of freedom and Cauchy, for dimensions d = 1, 5, 10, 50, 100, as well as from a 50-dimensional mixture normal distribution. The implementation times in all cases are very reasonable, and often less than a minute in our parallel implementation. The results turned out to be highly accurate. We also apply our method to draw 10000 iid realizations from the posterior distributions associated with the well-known Challenger data, a Salmonella data set and the 160-dimensional challenging spatial example of the radionuclide count data on Rongelap Island. Again, we are able to obtain quite encouraging results with very reasonable computing time.

2103.01280 2026-02-24 econ.EM math.ST stat.ME stat.ML stat.TH

Dynamic covariate balancing: estimating treatment effects over time with potential local projections

Davide Viviano, Jelena Bradic

Details
English abstract

This paper studies the estimation and inference of treatment effects in panel data settings when treatments change dynamically over time. We propose a balancing method that allows for (i) treatments to be assigned dynamically over time based on high-dimensional covariates, past outcomes, and treatments; (ii) outcomes and time-varying covariates to depend on the trajectory of all past treatments; (iii) heterogeneity of treatment effects. Our approach recursively projects potential outcomes' expectations on past histories. It then controls the bias arising from the non-experimental and sequential nature of this setting by balancing dynamically observable characteristics over time. We establish inferential guarantees of the proposed method even when the number of observable characteristics significantly exceeds the sample size. We study numerical properties of the estimator and illustrate the benefits of the procedure in an empirical application.

2008.11175 2026-02-24 stat.AP stat.ME

How Ominous is the Premonition of Future Global Warming?

Debashis Chatterjee, Sourabh Bhattacharya

Comments This updated version will appear in Sankhya B's special issue paying tribute to Professor C. R. Rao

Details
English abstract

Global warming, the phenomenon of increasing global average temperature in recent decades, is receiving wide attention due to its very significant adverse effects on climate. Whether global warming will continue in the future is a question that is most important to investigate. In this regard, the so-called general circulation models (GCMs) have attempted to project the future climate, and nearly all of them exhibit alarming rates of global temperature rise in the future. Although global warming in the current time frame is undeniable, it is important to assess the validity of the future predictions of the GCMs. In this article, we attempt such a study using our recently developed Bayesian multiple testing paradigm for model selection in inverse regression problems. The model we assume for the global temperature time series is based on Gaussian process emulation of the black box scenario, realistically treating the dynamic evolution of the time series as unknown. We apply our ideas to datasets available from the Intergovernmental Panel on Climate Change (IPCC) website. The best GCM models selected by our method under different assumptions on future climate change scenarios do not convincingly support the present global warming pattern when only the future predictions are considered known. Using our Gaussian process idea, we also forecast the future temperature time series given the current one. Interestingly, our results do not support drastic future global warming predicted by almost all the GCM models.

2602.19012 2026-02-24 stat.ME stat.AP

Adaptive Weighting for Time-to-Event Continual Reassessment Method: Improving Safety in Phase I Dose-Finding Through Data-Driven Delay Distribution Estimation

Robert Amevor, Emmanuel Kubuafor, Dennis Baidoo

Abstract

Background: Phase I dose-finding trials increasingly encounter delayed-onset toxicities, especially with immunotherapies and targeted agents. The time-to-event continual reassessment method (TITE-CRM) handles incomplete follow-up using fixed linear weights, but this ad hoc approach doesn't reflect actual delay patterns and may expose patients to excessive risk during dose escalation. Methods: We replace TITE-CRM's fixed weights with adaptive weights, posterior predictive probabilities derived from the evolving toxicity delay distribution. Under a Weibull timing model, we get closed-form weight updates through maximum likelihood estimation, making real-time implementation straightforward. We tested our method (AW-TITE) against TITE-CRM and standard designs (3+3, mTPI, BOIN) across three dose-toxicity scenarios through simulation (N = 30 patients, 2,000 replications). We also examined robustness across varying accrual rates, sample sizes, shape parameters, observation windows, and priors. Results: Our AW-TITE reduced patient overdosing by 40.6% compared to TITE-CRM (mean fraction above MTD: 0.202 vs 0.340; 95% CI: -0.210 to -0.067, p < 0.001) while maintaining comparable MTD selection accuracy (mean difference: +0.023, p = 0.21). Against algorithm-based methods, AW-TITE achieved higher MTD identification: +32.6% vs mTPI, +19.8% vs 3+3, and +5.6% vs BOIN. Performance remained robust across all sensitivity analyses. Conclusions: Adaptive weighting offers a practical way to improve Phase I trial safety while preserving MTD selection accuracy. The method requires minimal computation and is ready for real-time use.
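
The contrast between fixed linear weights and delay-based weights can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the adaptive weight for a patient followed for time u without toxicity is F(u)/F(T), the conditional probability that a toxicity occurring within the observation window T would already have appeared by u, with F a Weibull CDF. The function names and the shape and scale values are hypothetical, and the Weibull parameters are held fixed here rather than re-estimated by maximum likelihood as the abstract describes.

```python
import math


def weibull_cdf(t, shape, scale):
    """P(delay <= t) for a Weibull(shape, scale) time-to-toxicity."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def linear_weight(u, window):
    """Fixed TITE-CRM-style weight: fraction of the window completed."""
    return min(max(u / window, 0.0), 1.0)


def adaptive_weight(u, window, shape, scale):
    """Assumed adaptive weight: P(delay <= u | delay <= window)."""
    u = min(max(u, 0.0), window)  # clamp follow-up time to [0, window]
    return weibull_cdf(u, shape, scale) / weibull_cdf(window, shape, scale)
```

With a late-onset delay pattern (shape greater than 1), a patient halfway through the window receives less than half weight under this sketch, which is the kind of caution during escalation that fixed linear weights cannot express.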

2602.18988 2026-02-24 stat.ME stat.AP

Latent Moment Models for Recurrent Binary Outcomes: A Bayesian and Quasi-Distributional Approach

Niloofar Ramezani, Lori P. Selby, Pascal Nitiema, Jeffrey R. Wilson

Comments 16 pages, 1 figure, 4 tables, 1 Supplementary Table

Abstract

Recurrent binary outcomes within individuals, such as hospital readmissions, often reflect latent risk processes that evolve over time. Conventional methods like generalized linear mixed models and generalized estimating equations estimate average risk but fail to capture temporal changes in variability, asymmetry, and tail behavior. We introduce two statistical frameworks that model each binary event as the outcome of a thresholded value drawn from a time-varying latent distribution defined by its location, scale, skewness, and kurtosis. Rather than treating these four quantities as nonparametric moment estimators, we model them as interpretable latent moments within a flexible latent distributional family. The first, BLaS-Recurrent, is a Bayesian model using the sinh-arcsinh distribution (a parametric family that provides explicit control over asymmetry and tail weight) to estimate latent moment trajectories; the second, QuaD-Recurrent, is a quasi-distributional approach that maps simulated moment vectors to event probabilities using a flexible nonparametric surface. Both models support time-dependent covariates, serial correlation, and multiple membership structures. Simulation studies show improved calibration, interpretability, and robustness over standard models. Applied to ICU readmission data from the MIMIC-IV database, both approaches uncover clinically meaningful patterns in latent risk, such as right-skewed escalation and widening dispersion, that are missed by traditional methods. These models provide interpretable, distribution-sensitive tools for longitudinal binary outcomes in healthcare while explicitly acknowledging that latent "moments" summarize but do not uniquely determine the underlying distribution.

2602.18948 2026-02-24 cs.LG cs.NE hep-th stat.ML

Toward Manifest Relationality in Transformers via Symmetry Reduction

J. François, L. Ravera

Comments 12 pages

Abstract

Transformer models contain substantial internal redundancy arising from coordinate-dependent representations and continuous symmetries, in model space and in head space, respectively. While recent approaches address this by explicitly breaking symmetry, we propose a complementary framework based on symmetry reduction. We reformulate representations, attention mechanisms, and optimization dynamics in terms of invariant relational quantities, eliminating redundant degrees of freedom by construction. This perspective yields architectures that operate directly on relational structures, providing a principled geometric framework for reducing parameter redundancy and analyzing optimization.

2602.18870 2026-02-24 stat.ML cs.LG

Federated Measurement of Demographic Disparities from Quantile Sketches

Arthur Charpentier, Agathe Fernandes Machado, Olivier Côté, François Hu

Abstract

Many fairness goals are defined at a population level that misaligns with siloed data collection, which remains unsharable due to privacy regulations. Horizontal federated learning (FL) enables collaborative modeling across clients with aligned features without sharing raw data. We study federated auditing of demographic parity through score distributions, measuring disparity as a Wasserstein--Frechet variance between sensitive-group score laws, and expressing the population metric in federated form that makes explicit how silo-specific selection drives local-global mismatch. For the squared Wasserstein distance, we prove an ANOVA-style decomposition that separates (i) selection-induced mixture effects from (ii) cross-silo heterogeneity, yielding tight bounds linking local and global metrics. We then propose a one-shot, communication-efficient protocol in which each silo shares only group counts and a quantile summary of its local score distributions, enabling the server to estimate global disparity and its decomposition, with $O(1/k)$ discretization bias ($k$ quantiles) and finite-sample guarantees. Experiments on synthetic data and COMPAS show that a few dozen quantiles suffice to recover global disparity and diagnose its sources.
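
To make the quantile-summary idea concrete, here is a small sketch of how a squared 2-Wasserstein distance between two one-dimensional score distributions can be approximated from k quantiles each, using the standard identity that in one dimension the squared distance is the integral of the squared difference of quantile functions. It covers only this two-group ingredient, not the paper's Wasserstein-Fréchet variance, its decomposition, or its federated protocol. The function names and the midpoint quantile levels are assumptions made for illustration.

```python
def quantile_sketch(scores, k):
    """Return k empirical quantiles at the midpoint levels (2j + 1) / (2k)."""
    xs = sorted(scores)
    n = len(xs)
    if n == 0 or k <= 0:
        raise ValueError("need at least one score and one quantile")
    # Integer arithmetic avoids floating-point surprises in the index.
    return [xs[((2 * j + 1) * n) // (2 * k)] for j in range(k)]


def w2_squared(sketch_a, sketch_b):
    """Approximate squared 2-Wasserstein distance from two equal-length sketches."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must use the same number of quantiles")
    k = len(sketch_a)
    # Midpoint-rule approximation of the integral of (Q_a(u) - Q_b(u))^2 over [0, 1].
    return sum((a - b) ** 2 for a, b in zip(sketch_a, sketch_b)) / k
```

Only the k numbers in each sketch need to leave a silo under this scheme, which is what keeps such a summary communication-efficient.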

2602.18865 2026-02-24 stat.ME

Expected Shortfall Regression via Optimization

Yuanzhi Li, Shushu Zhang, Xuming He

Comments Yuanzhi Li and Shushu Zhang contributed equally to this work

Abstract

To provide a comprehensive summary of the tail distribution, the expected shortfall is defined as the average over the tail above (or below) a certain quantile of the distribution. The expected shortfall regression captures the heterogeneous covariate-response relationship and describes the covariate effects on the tail of the response distribution. Based on a critical observation that the superquantile regression from the operations research literature does not coincide with the expected shortfall regression, we propose and validate a novel optimization-based approach for the linear expected shortfall regression, without additional assumptions on the conditional quantile models. While the proposed loss function is implicitly defined, we provide a prototype implementation of the proposed approach with some initial expected shortfall estimators based on binning techniques. With practically feasible initial estimators, we establish the consistency and the asymptotic normality of the proposed estimator. The proposed approach achieves heterogeneity-adaptive weights and therefore often offers efficiency gain over existing linear expected shortfall regression approaches in the literature, as demonstrated through simulation studies.
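
The definition in the first sentence can be illustrated with an unconditional, sample-based version: the upper expected shortfall at level tau is estimated as the mean of the observations in the top (1 - tau) fraction of the sample. This sketch shows only the marginal quantity, not the regression estimator proposed in the paper; the function name and the tie-handling convention are assumptions.

```python
import math


def upper_expected_shortfall(values, tau):
    """Mean of the largest n - ceil(tau * n) observations (upper tail at level tau)."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau must lie in [0, 1)")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    start = math.ceil(tau * n)
    tail = xs[start:] or xs[-1:]  # fall back to the maximum if the tail is empty
    return sum(tail) / len(tail)
```

At tau = 0 the tail is the whole sample, so the estimate reduces to the ordinary mean.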

2602.18808 2026-02-24 math.PR stat.ME

Orthogonal polynomials on path-space

Ilya Chevyrev, Emilio Ferrucci, Darrick Lee, Terry Lyons, Harald Oberhauser, Nikolas Tapia

Comments 38 pages, 4 figures

Abstract

We consider the orthogonalisation of the signature of a stochastic process as the analogue of orthogonal polynomials on path-space. Under an infinite radius of convergence assumption, we prove density of linear functions on the signature in $L^p$ functions on grouplike elements, making it possible to represent a square-integrable function on (rough) paths as an $L^2$-convergent series. By viewing the shuffle algebra as commutative polynomials on the free Lie algebra, we revisit much of the theory of classical orthogonal polynomials in several variables, such as the recurrence relation and Favard's theorem. Finally, we restrict our attention to the case of Brownian motion with and without drift, and prove that dimension-independent orthogonal signature exists with drift but not without. We end with numerical examples of how orthogonal signature polynomials of Brownian motion can be applied for the approximation of functions on paths sampled from the Wiener measure.

2602.18795 2026-02-24 cs.LG stat.ML

Vectorized Bayesian Inference for Latent Dirichlet-Tree Allocation

Zheng Wang, Nizar Bouguila

Comments Submitted to JMLR, under review

Abstract

Latent Dirichlet Allocation (LDA) is a foundational model for discovering latent thematic structure in discrete data, but its Dirichlet prior cannot represent the rich correlations and hierarchical relationships often present among topics. We introduce the framework of Latent Dirichlet-Tree Allocation (LDTA), a generalization of LDA that replaces the Dirichlet prior with an arbitrary Dirichlet-Tree (DT) distribution. LDTA preserves LDA's generative structure but enables expressive, tree-structured priors over topic proportions. To perform inference, we develop universal mean-field variational inference and Expectation Propagation, providing tractable updates for all DT. We reveal the vectorized nature of the two inference methods through theoretical development, and perform fully vectorized, GPU-accelerated implementations. The resulting framework substantially expands the modeling capacity of LDA while maintaining scalability and computational efficiency.

2602.18762 2026-02-24 stat.ML cs.LG

Bounds and Identification of Joint Probabilities of Potential Outcomes and Observed Variables under Monotonicity Assumptions

Naoya Hashimoto, Yuta Kawakami, Jin Tian

Abstract

Evaluating joint probabilities of potential outcomes and observed variables, and their linear combinations, is a fundamental challenge in causal inference. This paper addresses the bounding and identification of these probabilities in settings with discrete treatment and discrete ordinal outcome. We propose new families of monotonicity assumptions and formulate the bounding problem as a linear programming problem. We further introduce a new monotonicity assumption specifically to achieve identification. Finally, we present numerical experiments to validate our methods and demonstrate their application using real-world datasets.

2602.18727 2026-02-24 stat.AP q-bio.QM

Statistical methods for reference-free single-molecule localisation microscopy

Jack Peyton, Benjamin Davis, Emily Gribbin, Daniel Rolfe, Hannah Mitchell

Abstract

MINFLUX (Minimal Photon Flux) is a single-molecule imaging technique capable of resolving fluorophores at a precision of <5 nm. Interpretation of the point patterns generated by this technique presents challenges due to variable emitter density, incomplete bio-labelling of target molecules and their detection, error-prone measurement processes, and the presence of spurious (non-structure associated) fluorescent detections. Together, these challenges ensure structural inferences from single-molecule imaging datasets are non-trivial in the absence of strong a priori information, for all but the smallest of point patterns. In addition, current methods often require subjective parameter tuning and presuppose known structural templates, limiting reference-free discovery. We present a statistically grounded, end-to-end analysis framework. Focusing on MINFLUX-derived datasets and leveraging Bayesian and spatial statistical methods, a pipeline is presented that demonstrates 1) uncertainty-aware clustering of measurements into emitter groups that performs better than current gold standards, 2) rapid identification of molecular structure supergroups, and 3) reconstruction of repeating structures within the dataset without substantial prior knowledge. This pipeline is demonstrated using simulated and real MINFLUX datasets, where emitter clustering and centre detection maintain high performance (emitter subset assignment accuracy > 0.75) across all conditions evaluated, while structural inference achieves reliable discrimination (F1 approx. 0.9) at high labelling efficiency. Template-free reconstruction of Nup96 and DNA-Origami 3x3 grids is achieved.

2602.18677 2026-02-24 stat.ME stat.AP

Bayesian calendar-time survival analysis with epidemic curve priors and variant-specific infection hazards

Angela M Dahl, Elizabeth R Brown

Comments 24 pages, 6 figures

Abstract

In this paper, we develop a Bayesian calendar-time survival model motivated by infectious disease prevention studies occurring during an epidemic, when the risk of infection can change rapidly as the epidemic curve shifts. For studies in which a biomarker is the predictor of interest, we include the option to estimate a threshold of protection for the biomarker. If the intervention is hypothesized to have different associations with several circulating viral variants, or if the infectiousness of the dominant variant(s) changes over the course of the study, we treat infection from different variants as competing risks. We also introduce a novel method for incorporating existing epidemic curve estimates into an informative prior for the baseline hazard function, enabling estimation of the intervention's association with infection risk during periods of calendar time with minimal follow-up in one or more comparator groups. We demonstrate the strengths of this method via simulations, and we apply it to data from an observational COVID-19 vaccine study.

2602.18660 2026-02-24 stat.ME cs.HC

Better Assumptions, Stronger Conclusions: The Case for Ordinal Regression in HCI

Brandon Victor Syiem, Eduardo Velloso

Comments 21 pages, 16 figures, to be published in the Proceedings of the 2026 ACM CHI Conference on Human Factors in Computing Systems

Abstract

Despite the widespread use of ordinal measures in HCI, such as Likert-items, there is little consensus among HCI researchers on the statistical methods used for analysing such data. Both parametric and non-parametric methods have been extensively used within the discipline, with limited reflection on their assumptions and appropriateness for such analyses. In this paper, we examine recent HCI works that report statistical analyses of ordinal measures. We highlight prevalent methods used, discuss their limitations and spotlight key assumptions and oversights that diminish the insights drawn from these methods. Finally, we champion and detail the use of cumulative link (mixed) models (CLM/CLMM) for analysing ordinal data. Further, we provide practical worked examples of applying CLM/CLMMs using R to published open-sourced datasets. This work contributes towards a better understanding of the statistical methods used to analyse ordinal data in HCI and helps to consolidate practices for future work.

2602.18656 2026-02-24 stat.ME math.ST stat.TH

Minimally Discrete and Minimally Randomized p-Values

Joshua Habiger, Pratyaydipta Rudra

Abstract

In meta-analysis, multiple hypothesis testing and many other methods, p-values are utilized as inputs and assumed to be uniformly distributed over the unit interval under the null hypotheses. If data used to generate p-values have discrete distributions then either natural, mid- or randomized p-values are typically utilized. Natural and mid-p-values can allow for valid, albeit conservative, downstream methods since under the null hypothesis they are dominated by uniform distributions in the stochastic and convex order, respectively. Randomized p-values need not lead to conservative procedures since they permit a uniform distribution under the null hypotheses through the generation of independent auxiliary variates. However, the auxiliary variates necessarily add variation to procedures. This manuscript introduces and studies "minimally discrete" (MD) natural p-values, MD mid-p-values and "minimally randomized" (MR) p-values. It is shown that MD p-values dominate their non-MD counterparts in the stochastic and convex order, and hence lead to less conservative, yet still valid, downstream methods. Likewise, MR p-values dominate their non-MR counterparts in that they are still uniformly distributed under the null hypotheses, but the added variation attributable to the independently generated auxiliary variate is smaller. It is anticipated that results here will facilitate the construction of new meta-analysis and multiple testing methods via more efficient p-value construction, and facilitate theoretical study of existing and new methods by establishing gold standards for addressing the unavoidable detrimental "discreteness effect".
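
The three classical constructions mentioned above can be written out for a one-sided binomial test. This sketch shows only the standard natural, mid- and randomized p-values, not the minimally discrete or minimally randomized versions introduced in the manuscript; the function names and the choice of an upper-tailed test are assumptions made for illustration.

```python
import math
import random


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def upper_tail_pvalues(x, n, p0, u=None):
    """Natural, mid- and randomized p-values for H0: p = p0 against H1: p > p0."""
    greater = sum(binom_pmf(j, n, p0) for j in range(x + 1, n + 1))  # P(X > x)
    equal = binom_pmf(x, n, p0)                                      # P(X = x)
    if u is None:
        u = random.random()  # independent auxiliary Uniform(0, 1) variate
    return {
        "natural": greater + equal,
        "mid": greater + 0.5 * equal,
        "randomized": greater + u * equal,
    }
```

The randomized value always lies between P(X > x) and the natural p-value, and the auxiliary variate u is the extra source of variation that the abstract refers to.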

2602.18651 2026-02-24 stat.ME

Hybrid combinations of parametric and empirical likelihoods

Nils Lid Hjort, Ian W. McKeague, Ingrid Van Keilegom

Comments 24 pages, 4 figures. This is the July 2017 authors' manuscript, with Supplementary Material, with final paper published in Statistica Sinica, 2018, their Peter Hall issue, vol. 28, pages 2389-2407, see pmc.ncbi.nlm.nih.gov/articles/PMC6602551/

Journal ref Statistica Sinica, 2018, vol. 28, pages 2389-2407

Abstract

This paper develops a hybrid likelihood (HL) method based on a compromise between parametric and nonparametric likelihoods. Consider the setting of a parametric model for the distribution of an observation $Y$ with parameter $θ$. Suppose there is also an estimating function $m(\cdot,μ)$ identifying another parameter $μ$ via $E\,m(Y,μ)=0$, at the outset defined independently of the parametric model. To borrow strength from the parametric model while obtaining a degree of robustness from the empirical likelihood method, we formulate inference about $θ$ in terms of the hybrid likelihood function $H_n(θ)=L_n(θ)^{1-a}R_n(μ(θ))^a$. Here $a\in[0,1)$ represents the extent of the compromise, $L_n$ is the ordinary parametric likelihood for $θ$, $R_n$ is the empirical likelihood function, and $μ$ is considered through the lens of the parametric model. We establish asymptotic normality of the corresponding HL estimator and a version of the Wilks theorem. We also examine extensions of these results under misspecification of the parametric model, and propose methods for selecting the balance parameter $a$.
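
Taking logarithms of the hybrid likelihood stated above makes the role of the balance parameter explicit. The display below follows directly from the abstract's formula; reading the HL estimator as the maximizer of this criterion is an assumption consistent with, but not stated in, the abstract.

```latex
\log H_n(\theta) = (1-a)\,\log L_n(\theta) + a\,\log R_n\bigl(\mu(\theta)\bigr),
\qquad a \in [0,1),
\qquad
\hat{\theta}_{\mathrm{HL}} = \arg\max_{\theta}\, \log H_n(\theta).
```

Setting $a=0$ recovers the ordinary parametric log-likelihood, while larger values of $a$ give more weight to the empirical likelihood term.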

2602.18636 2026-02-24 cs.CY stat.AP

Statistical Imaginaries, State Legitimacy: Grappling with the Arrangements Underpinning Quantification in the U.S. Census

Jayshree Sarathy, danah boyd

Journal ref Critical Sociology, 51(6), 1267-1288 (2024)

Abstract

Over the last century, the adoption of novel scientific methods for conducting the U.S. census has been met with wide-ranging receptions. Some methods were quietly embraced, while others sparked decades-long controversies. What accounts for these differences? We argue that controversies emerge from arrangements of statistical imaginaries, putting into tension divergent visions of the census. To analyze these dynamics, we compare reactions to two methods designed to improve data accuracy (imputation and adjustment) and two methods designed to protect confidentiality (swapping and differential privacy), offering insight into how each method reconfigures stakeholder orientations and rhetorical claims. These cases allow us to reflect on how technocratic efforts to improve accuracy and confidentiality can strengthen -- or erode -- trust in data. Our analysis shows how the credibility of the Census Bureau and its data stems not just from empirical evaluations of quantification, but also from how statistical imaginaries are contested and stabilized.

2602.18573 2026-02-24 stat.ML cs.LG stat.ME

Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function

Amy Vennos, Xin Xing, Christopher T. Franck

Abstract

Machine-generated probability predictions are essential in modern classification tasks such as image classification. A model is well calibrated when its predicted probabilities correspond to observed event frequencies. Despite the need for multicategory recalibration methods, existing methods are limited to (i) comparing calibration between two or more models rather than directly assessing the calibration of a single model, (ii) requiring under-the-hood model access, e.g., accessing logit-scale predictions within the layers of a neural network, and (iii) providing output which is difficult for human analysts to understand. To overcome (i)-(iii), we propose Multicategory Linear Log Odds (MCLLO) recalibration, which (i) includes a likelihood ratio hypothesis test to assess calibration, (ii) does not require under-the-hood access to models and is thus applicable on a wide range of classification problems, and (iii) can be easily interpreted. We demonstrate the effectiveness of the MCLLO method through simulations and three real-world case studies involving image classification via convolutional neural network, obesity analysis via random forest, and ecology via regression modeling. We compare MCLLO to four comparator recalibration techniques utilizing both our hypothesis test and the existing calibration metric Expected Calibration Error to show that our method works well alone and in concert with other methods.
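
For readers unfamiliar with the Expected Calibration Error used as a comparison metric, a common top-label, equal-width-bin version is sketched below. It is a generic illustration of that metric, not the MCLLO method or the authors' evaluation code; the function name, the default of ten bins and the input layout are assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Top-label ECE with equal-width confidence bins.

    probs:  list of per-class probability lists, one per observation
    labels: list of true class indices
    """
    n = len(labels)
    if n == 0 or len(probs) != n:
        raise ValueError("probs and labels must be non-empty and equally long")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)                      # confidence of the predicted class
        pred = p.index(conf)               # predicted class (first maximum)
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, pred == y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece
```

A model whose 80%-confidence predictions are right 80% of the time scores close to zero on this metric, matching the abstract's description of calibration as agreement between predicted probabilities and observed frequencies.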

2602.18570 2026-02-24 stat.ME

Spatiotemporal double machine learning to estimate the impact of Cambodian land concessions on deforestation

Anika Arifin, Duncan DeProfio, Layla Lammers, Benjamin Shapiro, Brian J Reich, Henry Uddyback, Joshua M Gray

Abstract

Environmental policy evaluation frequently requires thoughtful consideration of space and time in causal inference. We use novel statistical methods to analyze the causal effect of land concessions on deforestation rates in Cambodia. Standard approaches, such as difference-in-differences regression, effectively address spatiotemporally-correlated treatments under some conditions, but they are limited in their ability to account for unobserved confounders affecting both treatment and outcome. Double Spatial Regression (DSR) is an approach that uses double machine learning to address these scenarios. DSR resolves the confounding variables for both treatment and outcome, comparing the residuals to estimate treatment effectiveness. We improve upon DSR by considering time in our analysis of policy interventions with spatial effects. We conduct a large-scale simulation study using Bayesian Additive Regression Trees (BART) with spatial embeddings and find that, under certain conditions, our DSR model outperforms standard approaches for addressing unobserved spatial confounding. We then apply our method to evaluate the policy impacts of land concessions on deforestation in Cambodia.

2602.18518 2026-02-24 cs.LG stat.ME stat.ML

Measuring the Prevalence of Policy Violating Content with ML Assisted Sampling and LLM Labeling

Attila Dobi, Aravindh Manickavasagam, Benjamin Thompson, Xiaohan Yang, Faisal Farooq

Comments 8 pages

Abstract

Content safety teams need metrics that reflect what users actually experience, not only what is reported. We study prevalence: the fraction of user views (impressions) that went to content violating a given policy on a given day. Accurate prevalence measurement is challenging because violations are often rare and human labeling is costly, making frequent, platform-representative studies slow. We present a design-based measurement system that (i) draws daily probability samples from the impression stream using ML-assisted weights to concentrate label budget on high-exposure and high-risk content while preserving unbiasedness, (ii) labels sampled items with a multimodal LLM governed by policy prompts and gold-set validation, and (iii) produces design-consistent prevalence estimates with confidence intervals and dashboard drilldowns. A key design goal is one global sample with many pivots: the same daily sample supports prevalence by surface, viewer geography, content age, and other segments through post-stratified estimation. We describe the statistical estimators, variance and confidence interval construction, label-quality monitoring, and an engineering workflow that makes the system configurable across policies.
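
The idea of oversampling risky content while keeping the estimate representative can be illustrated with inverse-probability weighting. The sketch below uses a Hájek-style ratio estimator, which is one standard design-based choice; the abstract does not say which estimator the system uses, so the estimator, the function name and the example numbers are all assumptions.

```python
def hajek_prevalence(labels, inclusion_probs):
    """Weighted prevalence: sum(w * y) / sum(w), with w = 1 / inclusion probability.

    labels:          1 if the sampled impression was violating, else 0
    inclusion_probs: probability with which each sampled impression was drawn
    """
    if not labels or len(labels) != len(inclusion_probs):
        raise ValueError("labels and inclusion_probs must be non-empty and equally long")
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * y for w, y in zip(weights, labels)) / sum(weights)
```

In this sketch an item sampled with probability 0.5 counts for 2 impressions while one sampled with probability 0.1 counts for 10, so concentrating the label budget on high-risk content does not inflate the estimated prevalence.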

2602.18486 2026-02-24 cs.LG eess.SP stat.ML

Support Vector Data Description for Radar Target Detection

Jean Pinsolle, Yadang Alexis Rouzoumka, Chengfang Ren, Chistèle Morisseau, Jean-Philippe Ovarlez

Comments 5 pages, 2 figures, to appear in Acoustics, Speech and Signal Processing (ICASSP), 2026 IEEE International Conference on, Barcelona, Spain, May 2026

Abstract

Classical radar detection techniques rely on adaptive detectors that estimate the noise covariance matrix from target-free secondary data. While effective in Gaussian environments, these methods degrade in the presence of clutter, which is better modeled by heavy-tailed distributions such as the Complex Elliptically Symmetric (CES) and Compound-Gaussian (CGD) families. Robust covariance estimators like M-estimators or Tyler's estimator address this issue, but still struggle when thermal noise combines with clutter. To overcome these challenges, we investigate the use of Support Vector Data Description (SVDD) and its deep extension, Deep SVDD, for target detection. These one-class learning methods avoid direct noise covariance estimation and are adapted here as CFAR detectors. We propose two novel SVDD-based detection algorithms and demonstrate their effectiveness on simulated radar data.

2602.18465 2026-02-24 cs.LG stat.ML

Revisiting the Seasonal Trend Decomposition for Enhanced Time Series Forecasting

Sanjeev Panta, Xu Yuan, Li Chen, Nian-Feng Tzeng

Comments 5 pages, accepted at 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)

Abstract

Time series forecasting presents significant challenges in real-world applications across various domains. Building upon the decomposition of the time series, we enhance the architecture of machine learning models for better multivariate time series forecasting. To achieve this, we focus on the trend and seasonal components individually and investigate solutions to predict them with fewer errors. Recognizing that reversible instance normalization is effective only for the trend component, we take a different approach with the seasonal component by directly applying backbone models without any normalization or scaling procedures. Through these strategies, we successfully reduce error values of the existing state-of-the-art models and finally introduce dual-MLP models as more computationally efficient solutions. Furthermore, our approach consistently yields positive results with around 10% MSE average reduction across four state-of-the-art baselines on the benchmark datasets. We also evaluate our approach on a hydrological dataset extracted from the United States Geological Survey (USGS) river stations, where our models achieve significant improvements while maintaining linear time complexity, demonstrating real-world effectiveness. The source code is available at https://github.com/Sanjeev97/Time-Series-Decomposition

2602.18442 2026-02-24 stat.ME stat.AP

Ostrom-Weighted Bootstrap: A Theoretically Optimal and Provably Complete Framework for Hierarchical Imputation in Multi-Agent Systems

Hirofumi Wakimoto

Comments 7 pages, initial submission

Abstract

We study the statistical properties of the Ostrom-Weighted Bootstrap (OWB), a hierarchical, variance-aware resampling scheme for imputing missing values and estimating archetypes in multi-agent voting data. At Level 1, under mild linear model assumptions, the ideal OWB estimator -- with known persona-level (agent-level) variances -- is shown to be the Gauss--Markov best linear unbiased estimator (BLUE) and to strictly dominate uniform weighting whenever persona variances differ. At Level 2, within a canonical hierarchical normal model, the ideal OWB coincides with the conditional Bayesian posterior mean of the archetype. We then analyze the feasible OWB, which replaces unknown variances with hierarchically pooled empirical estimates, and show that it can be interpreted as both a feasible generalized least-squares (FGLS) and an empirical-Bayes shrinkage estimator with asymptotically valid weighted bootstrap confidence intervals under mild regularity conditions. Finally, we establish a Zero-NaN Guarantee: as long as each petal has at least one finite observation, the OWB imputation algorithm produces strictly NaN-free completed data using only explicit, non-uniform bootstrap weights and never resorting to uniform sampling or numerical zero-filling. To our knowledge, OWB is the first resampling-based method that simultaneously achieves exact BLUE optimality, conditional Bayesian posterior mean interpretation, empirical Bayes shrinkage of precision parameters, asymptotic efficiency via FGLS, consistent weighted bootstrap inference, and provable zero-NaN completion under minimal data assumptions.
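
The Level 1 claim has a familiar special case that is easy to check numerically: when independent observations share a common mean but have different known variances, the best linear unbiased estimator weights each observation by its inverse variance, and its variance is never larger than that of the unweighted mean. The sketch below covers only this textbook case; the function names are hypothetical and the paper's linear model may be more general.

```python
def inverse_variance_mean(values, variances):
    """BLUE of a common mean under known, unequal variances.

    Returns the estimate and its variance, 1 / sum(1 / variance_i).
    """
    if not values or len(values) != len(variances):
        raise ValueError("values and variances must be non-empty and equally long")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    estimate = sum(p * x for p, x in zip(precisions, values)) / total
    return estimate, 1.0 / total


def uniform_mean_variance(variances):
    """Variance of the unweighted mean of independent observations."""
    n = len(variances)
    return sum(variances) / n**2
```

For two observations with variances 1 and 3, the weighted estimator has variance 0.75 against 1.0 for the unweighted mean, and the gap closes only when all variances are equal.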

2602.11080 2026-02-24 stat.ME math.ST stat.TH

Constrained Fiducial Inference for Gaussian Models

Hank Flury, Jan Hannig, Richard Smith

Abstract

We propose a new fiducial Markov Chain Monte Carlo (MCMC) method for fitting parametric Gaussian models. We utilize the Cayley transform to decompose the parametric covariance matrix, which in turn allows us to formulate a general data generating algorithm for Gaussian data. Leveraging constrained generalized fiducial inference, we are able to create the basis of an MCMC algorithm, which can be specified to parametric models with minimal effort. The appeal of this novel approach is the wide class of models which it permits, ease of implementation and the posterior-like fiducial distribution without the need for a prior. We provide background information for the derivation of the relevant fiducial quantities, and a proof that the proposed MCMC algorithm targets the correct fiducial distribution. We need not assume independence nor identical distribution of the data, which makes the method attractive for application to time series and spatial data. Well-performing simulation results of the MA(1) and Matérn models are presented.

2601.18412 2026-02-24 stat.ME

Preference-based Centrality and Ranking in General Metric Spaces

Lingfeng Lyu, Doudou Zhou

Abstract

Ranking or assessing centrality in multivariate and non-Euclidean data is difficult because there is no canonical order and many depth notions become computationally fragile in high-dimensional or structured settings. We introduce a preference-based notion of centrality defined through population proximity comparisons with respect to a random reference draw, yielding a metric-intrinsic statistical functional that is well-defined on general metric spaces. Because the induced pairwise preferences may be non-transitive, we map them to a coherent one-dimensional score via a Bradley--Terry--Luce cross-entropy projection, viewed as a calibrated aggregation device rather than a correctly specified model. We develop two finite-sample estimators, a convex M-estimator and a fast spectral estimator based on a comparison operator, and establish identifiability and consistency under mild conditions. Simulations and real-data examples, including high-dimensional and functional observations, illustrate that the proposed scores provide stable, interpretable rankings aligned with the underlying preference centrality.

2601.17160 2026-02-24 stat.ML cs.AI cs.LG stat.ME

Information-Theoretic Causal Bounds under Unmeasured Confounding

Yonghan Jung, Bogyeong Kang

Abstract

We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes; require external inputs (for example, instrumental variables, proxies, or user-specified sensitivity parameters); necessitate full structural causal model specifications; or focus solely on population-level averages while neglecting covariate-conditional effects. We overcome all four limitations simultaneously by establishing novel information-theoretic, data-driven divergence bounds. Our key theoretical contribution shows that the f-divergence between the observational distribution P(Y | A = a, X = x) and the interventional distribution P(Y | do(A = a), X = x) is upper bounded by a function of the propensity score alone. This result enables sharp partial identification of conditional causal effects directly from observational data, without requiring external sensitivity parameters, auxiliary variables, full structural specifications, or outcome boundedness assumptions. For practical implementation, we develop a semiparametric estimator satisfying Neyman orthogonality (Chernozhukov et al., 2018), which ensures root-n consistent inference even when nuisance functions are estimated via flexible machine learning methods. Simulation studies and real-world data applications, implemented in the GitHub repository (https://github.com/yonghanjung/Information-Theretic-Bounds), demonstrate that our framework provides tight and valid causal bounds across a wide range of data-generating processes.

2601.15500 2026-02-24 stat.ML cs.AI cs.LG math.ST stat.TH

Low-Dimensional Adaptation of Rectified Flow: A Diffusion and Stochastic Localization Perspective

Saptarshi Roy, Alessandro Rinaldo, Purnamrita Sarkar

Comments 32 pages, 7 figures

English abstract

In recent years, rectified flow (RF) has gained considerable popularity, largely due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic low dimensionality of the support of the target distribution to accelerate sampling. We show that, using a carefully designed choice of the time-discretization scheme and with sufficiently accurate drift estimates, the RF sampler enjoys an iteration complexity of order $O(k/\varepsilon)$ (up to log factors), where $\varepsilon$ is the precision in total variation distance and $k$ is the intrinsic dimension of the target distribution. In addition, we show that the denoising diffusion probabilistic model (DDPM) procedure is equivalent to a stochastic version of RF by establishing a novel connection between these processes and stochastic localization. Building on this connection, we further design a stochastic RF sampler that also adapts to the low dimensionality of the target distribution under milder requirements on the accuracy of the drift estimates, and also with a specific time schedule. Simulations on synthetic data and text-to-image experiments illustrate the improved performance of the proposed samplers under the newly designed time-discretization schedules.

2601.06830 2026-02-24 stat.ML cs.LG cs.NA math.NA math.OC math.PR

Constrained Density Estimation via Optimal Transport

Yinan Hu, Esteban G. Tabak

English abstract

A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the constraints that the expected value of a set of functions adopts or exceeds given values. The framework is generalized to include regularization inequalities to mitigate the artifacts in the target measure. An annealing-like algorithm is developed to address non-smooth constraints, with its effectiveness demonstrated through both synthetic and proof-of-concept real-world examples in finance.

2512.20363 2026-02-24 cs.LG cs.AI cs.DC stat.AP stat.ML

Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning

Daniel M. Jimenez-Gutierrez, Mehrdad Hassanzadeh, David Solans, Mohammed Elbamby, Nicolas Kourtellis, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti

Comments Accepted for publication to the 40th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2026)

English abstract

Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. However, non-independent and identically distributed (non-IID) data across clients biases updates and degrades performance. To alleviate these issues, we propose Clust-PSI-PFL, a clustering-based personalized FL framework that uses the Population Stability Index (PSI) to quantify the level of non-IID data. We compute a weighted PSI metric, $WPSI^L$, which we show to be more informative than common non-IID metrics (Hellinger, Jensen-Shannon, and Earth Mover's distance). Using PSI features, we form distributionally homogeneous groups of clients via K-means++; the optimal number of clusters is chosen by a systematic silhouette-based procedure, typically yielding few clusters with modest overhead. Across six datasets (tabular, image, and text modalities), two partition protocols (Dirichlet with parameter $α$ and Similarity with parameter S), and multiple client sizes, Clust-PSI-PFL delivers up to 18% higher global accuracy than state-of-the-art baselines and markedly improves client fairness, with a relative improvement of 37% under severe non-IID data. These results establish PSI-guided clustering as a principled, lightweight mechanism for robust PFL under label skew.
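
For orientation, the textbook Population Stability Index between a reference distribution and a comparison distribution over the same bins can be sketched as follows. This is a minimal illustration of the standard PSI formula only; it is not the paper's weighted $WPSI^L$ metric, and the function name `psi`, the `eps` floor, and the example label proportions are assumptions made for this sketch.

```python
import math

def psi(expected, actual, eps=1e-6):
    """Standard PSI between two distributions given as per-bin proportions.

    PSI = sum_i (actual_i - expected_i) * ln(actual_i / expected_i).
    Proportions are floored at eps (an assumption of this sketch) so that
    empty bins do not produce log(0) or division by zero.
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical label proportions: a global reference and one skewed client.
global_labels = [0.25, 0.25, 0.25, 0.25]
client_labels = [0.70, 0.10, 0.10, 0.10]
client_psi = psi(global_labels, client_labels)
```

A value of zero means the two distributions coincide, and larger values indicate stronger divergence from the reference, which is the sense in which PSI can quantify how non-IID a client is.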

2512.20007 2026-02-24 stat.ML cs.LG stat.ME

Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing

Zhihan Huang, Ziang Niu

English abstract

Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending them to powerful nonparametric alternatives is difficult due to the lack of suitable score functions. Through a class of exponentially tilted models, we show that the resulting score-based GoF tests are equivalent to the tests based on integral probability metrics (IPMs) indexed by a function class. When the class is rich, the test is universally consistent. This simple yet insightful perspective enables reinterpretation of classical distance-based testing procedures (including those based on the Kolmogorov-Smirnov distance, the Wasserstein-1 distance, and maximum mean discrepancy) as arising from score-based constructions. Building on this insight, we propose a new nonparametric score-based GoF test through a special class of IPMs induced by the kernelized Stein function class, called the semiparametric kernelized Stein discrepancy (SKSD) test. Compared with other nonparametric score-based tests, the SKSD test is computationally efficient and accommodates general nuisance-parameter estimators, supported by a generic parametric bootstrap procedure. The SKSD test is universally consistent and attains Pitman efficiency. Moreover, with the help of Stein's identity, the SKSD test provides simple GoF tests for models with intractable likelihoods but tractable scores; we use two popular models, the kernel exponential family and conditional Gaussian models, to illustrate the power of our method. Our method achieves power comparable to task-specific normality tests such as Anderson-Darling and Lilliefors, despite being designed for general nonparametric alternatives.

2512.13123 2026-02-24 math.OC cs.LG math.ST stat.ML stat.TH

Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences

Liviu Aolaritei, Michael I. Jordan

English abstract

The problem of stopping stochastic gradient descent (SGD) in an online manner, based solely on the observed trajectory, is a challenging theoretical problem with significant consequences for applications. While SGD is routinely monitored as it runs, the classical theory of SGD provides guarantees only at pre-specified iteration horizons and offers no valid way to decide, based on the observed trajectory, when further computation is justified. We address this longstanding gap by developing anytime-valid confidence sequences for stochastic gradient methods, which remain valid under continuous monitoring and directly induce statistically valid, trajectory-dependent stopping rules: stop as soon as the current upper confidence bound on an appropriate performance measure falls below a user-specified tolerance. The confidence sequences are constructed using nonnegative supermartingales, are time-uniform, and depend only on observable quantities along the SGD trajectory, without requiring prior knowledge of the optimization horizon. In convex optimization, this yields anytime-valid certificates for weighted suboptimality of projected SGD under general stepsize schedules, without assuming smoothness or strong convexity. In nonconvex optimization, it yields time-uniform certificates for weighted first-order stationarity under smoothness assumptions. We further characterize the stopping-time complexity of the resulting stopping rules under standard stepsize schedules. To the best of our knowledge, this is the first framework that provides statistically valid, time-uniform stopping rules for SGD across both convex and nonconvex settings based solely on its observed trajectory.

2511.05640 2026-02-24 cs.LG cs.GT stat.ML

Blind Inverse Game Theory: Jointly Decoding Rewards and Rationality in Entropy-Regularized Competitive Games

Hamza Virk, Sandro Amaglobeli, Zuhayr Syed

English abstract

Inverse Game Theory (IGT) methods based on the entropy-regularized Quantal Response Equilibrium (QRE) offer a tractable approach for competitive settings, but critically assume the agents' rationality parameter (temperature $τ$) is known a priori. When $τ$ is unknown, a fundamental scale ambiguity emerges that couples $τ$ with the reward parameters ($θ$), making them statistically unidentifiable. We introduce Blind-IGT, the first statistical framework to jointly recover both $θ$ and $τ$ from observed behavior. We analyze this bilinear inverse problem and establish necessary and sufficient conditions for unique identification by introducing a normalization constraint that resolves the scale ambiguity. We propose an efficient Normalized Least Squares (NLS) estimator and prove it achieves the optimal $\mathcal{O}(N^{-1/2})$ convergence rate for joint parameter recovery. When strong identifiability conditions fail, we provide partial identification guarantees through confidence set construction. We extend our framework to Markov games and demonstrate optimal convergence rates with strong empirical performance even when transition dynamics are unknown.
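
The scale ambiguity described above can be seen in a small sketch. It uses a single-agent logit (softmax) response with temperature tau as a simplified stand-in for a full entropy-regularized QRE; the function name `logit_response` and the payoff values are illustrative assumptions, not the paper's estimator.

```python
import math

def logit_response(payoffs, tau):
    """Softmax policy proportional to exp(payoff / tau), computed stably."""
    scaled = [p / tau for p in payoffs]
    shift = max(scaled)  # subtract the max before exponentiating
    weights = [math.exp(s - shift) for s in scaled]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical payoffs implied by some reward parameters theta.
payoffs = [1.0, 2.0, 0.5]
tau = 0.5

# Scaling rewards and temperature by the same constant c leaves the
# observed behavior unchanged, so (theta, tau) and (c*theta, c*tau)
# cannot be told apart from data without a normalization.
c = 3.0
policy_a = logit_response(payoffs, tau)
policy_b = logit_response([c * p for p in payoffs], c * tau)
```

Because `policy_a` and `policy_b` agree, behavior alone pins down only the ratio of rewards to temperature, which is why a normalization constraint is needed before both can be recovered.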

2511.01222 2026-02-24 stat.ME

Perturbed Double Machine Learning: Nonstandard Inference Beyond the Parametric Length

Mengchu Zheng, Matteo Bonvini, Zijian Guo

English abstract

We study inference on a low-dimensional functional $β$ in the presence of infinite-dimensional nuisance parameters. Classical inferential methods are typically based on Wald intervals, whose large-sample validity rests on asymptotic negligibility of nuisance error; for example, influence-curve based estimators (Double/Debiased Machine Learning, DML) are asymptotically Gaussian when nuisance estimators converge faster than $n^{-1/4}$. Although such negligibility can hold even in nonparametric classes, it can be restrictive. To relax this requirement, we propose Perturbed Double Machine Learning, which ensures valid inference even when nuisance estimators converge slower than $n^{-1/4}$. Our proposal is to (i) inject randomness into the nuisance estimation step to generate perturbed nuisance models, each yielding an estimate of $β$ and a Wald interval, and (ii) filter out perturbations whose deviations from the original DML estimate exceed a threshold. For Lasso nuisance learners, we show that, with high probability, at least one perturbation yields nuisance estimates sufficiently close to the truth, so the associated estimator of $β$ is close to an oracle with known nuisances. The union of retained intervals delivers valid coverage even when the DML estimator converges slower than $n^{-1/2}$. The framework extends to general machine-learning nuisance learners, and simulations show coverage when state-of-the-art methods fail.

2510.22792 2026-02-24 math.ST stat.TH

Composite goodness-of-fit test with the Kernel Stein Discrepancy and a bootstrap for degenerate U-statistics with estimated parameters

Florian Brück, Veronika Reimoser, Fabian Baier

English abstract

This paper formally derives the asymptotic distribution of a goodness-of-fit test based on the Kernel Stein Discrepancy introduced in (Oscar Key et al., "Composite Goodness-of-fit Tests with Kernels", Journal of Machine Learning Research 26.51 (2025), pp. 1-60). The test enables the simultaneous estimation of the optimal parameter within a parametric family of candidate models. Its asymptotic distribution is shown to be a weighted sum of infinitely many $χ^2$-distributed random variables plus an additional disturbance term, which is due to the parameter estimation. Further, we provide a general framework to bootstrap degenerate parameter-dependent $U$-statistics and use it to derive a new Kernel Stein Discrepancy composite goodness-of-fit test.

2510.22664 2026-02-24 cond-mat.stat-mech cs.IT gr-qc hep-ph math.IT math.ST quant-ph stat.TH

The Gravitational Aspect of Information: The Physical Reality of Asymmetric "Distance"

Tomoi Koide, Armin van de Venn

Comments 9 pages, no figures. Typos corrected and text added

English abstract

We show that when a Brownian bridge is physically constrained to satisfy a canonical condition, its time evolution exactly coincides with an m-geodesic on the statistical manifold of Gaussian distributions. This identification provides a direct physical realization of a geometric concept in information geometry. It implies that purely random processes evolve along informationally straight trajectories, analogous to geodesics in general relativity. Our findings suggest that the asymmetry of informational "distance" (divergence) plays a fundamental physical role, offering a concrete step toward an equivalence principle for information.

2509.21655 2026-02-24 cs.LG stat.ML

DriftLite: Lightweight Drift Control for Inference-Time Scaling of Diffusion Models

Yinuo Ren, Wenhao Gao, Lexing Ying, Grant M. Rotskoff, Jiequn Han

Comments Published at ICLR 2026 (https://openreview.net/forum?id=l01eG3Qikl)

English abstract

We study inference-time scaling for diffusion models, where the goal is to adapt a pre-trained model to new target distributions without retraining. Existing guidance-based methods are simple but introduce bias, while particle-based corrections suffer from weight degeneracy and high computational cost. We introduce DriftLite, a lightweight, training-free particle-based approach that steers the inference dynamics on the fly with provably optimal stability control. DriftLite exploits a previously unexplored degree of freedom in the Fokker-Planck equation between the drift and particle potential, and yields two practical instantiations: Variance- and Energy-Controlling Guidance (VCG/ECG) for approximating the optimal drift with minimal overhead. Across Gaussian mixture models, particle systems, and large-scale protein-ligand co-folding problems, DriftLite consistently reduces variance and improves sample quality over pure guidance and sequential Monte Carlo baselines. These results highlight a principled, efficient route toward scalable inference-time adaptation of diffusion models. Our source code is publicly available at https://github.com/yinuoren/DriftLite.

2509.01924 2026-02-24 stat.ML cs.LG stat.AP stat.ME

Non-Linear Model-Based Sequential Decision-Making in Agriculture

Sakshi Arya, Wentao Lin

English abstract

Sequential decision-making is central to sustainable agricultural management and precision agriculture, where resource inputs must be optimized under uncertainty and over time. However, such decisions must often be made with limited observations, whereas classical bandit and reinforcement learning approaches typically rely on either linear or black-box reward models that may misrepresent domain knowledge or require large amounts of data. We propose a family of \emph{nonlinear, model-based bandit algorithms} that embed domain-specific response curves directly into the exploration-exploitation loop. By coupling (i) principled uncertainty quantification with (ii) closed-form or rapidly computable profit optima, these algorithms achieve sublinear regret and near-optimal sample complexity while preserving interpretability. Theoretical analysis establishes regret and sample complexity bounds, and extensive simulations emulating real-world fertilizer-rate decisions show consistent improvements over both linear and nonparametric baselines (such as linear UCB and $k$-NN UCB) in the low-sample regime, under both well-specified and shape-compatible misspecified models. Because our approach leverages mechanistic insight rather than large data volumes, it is especially suited to resource-constrained settings, supporting sustainable, inclusive, and transparent sequential decision-making across agriculture, environmental management, and allied applications.

2508.14487 2026-02-24 stat.ME stat.CO

Bridge Sampling Diagnostics

Giorgio Micaletto, Aki Vehtari

Comments 19 pages

English abstract

In Bayesian statistics, the marginal likelihood is used for model selection and averaging, yet it is often challenging to compute accurately for complex models. Approaches such as bridge sampling, while effective, may suffer from issues of high variability of the estimates. We present how to estimate Monte Carlo standard error (MCSE) for bridge sampling, and how to diagnose the reliability of MCSE estimates using Pareto-$\hat{k}$ and block reshuffling diagnostics without the need to repeatedly re-run full posterior inference. We demonstrate the behavior with increasingly more difficult simulated posteriors and many real posteriors from the posteriordb database.

2508.01457 2026-02-24 physics.ao-ph stat.ML

NICE^k Metrics: Unified and Multidimensional Framework for Evaluating Deterministic Solar Forecasting Accuracy

Cyril Voyant, Milan Despotovic, Luis Garcia-Gutierrez, Rodrigo Amaro e Silva, Philippe Lauret, Ted Soubdhan, Nadjem Bailek

Comments 24 pages, 1 Table, 5 Figures

Journal ref Sustainable Energy Technologies and Assessments (2025), 104588

English abstract

Accurate solar energy output prediction is key for integrating renewables into grids, maintaining stability, and improving energy management. However, standard error metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Skill Scores (SS) fail to capture the multidimensional nature of solar irradiance forecasting. These metrics lack sensitivity to forecastability, rely on arbitrary baselines (e.g., clear-sky models), and are poorly suited for operational use. To address this, we introduce the NICEk framework (Normalized Informed Comparison of Errors, with k = 1, 2, 3, Sigma), offering a robust and interpretable evaluation of forecasting models. Each NICEk score corresponds to an Lk norm: NICE1 targets average errors, NICE2 emphasizes large deviations, NICE3 highlights outliers, and NICESigma combines all. Using Monte Carlo simulations and data from 68 stations in the Spanish SIAR network, we evaluated methods including autoregressive models, extreme learning, and smart persistence. Theoretical and empirical results align when assumptions hold (e.g., R^2 ~ 1.0 for NICE2). Most importantly, NICESigma consistently shows higher discriminative power (p < 0.05), outperforming traditional metrics (p > 0.05). The NICEk metrics exhibit stronger statistical significance (e.g., p-values from 10^-6 to 0.004 across horizons) and greater generalizability. They offer a unified and operational alternative to standard error metrics in deterministic solar forecasting.

2507.11732 2026-02-24 cs.LG stat.ML

Graph Neural Networks Powered by Encoder Embedding for Improved Node Learning

Shiyu Chen, Cencheng Shen, Youngser Park, Carey E. Priebe

English abstract

Graph neural networks (GNNs) have emerged as a powerful framework for a wide range of node-level graph learning tasks. However, their performance typically depends on random or minimally informed initial feature representations, where poor initialization can lead to slower convergence and increased training instability. In this paper, we address this limitation by leveraging a statistically grounded one-hot graph encoder embedding (GEE) as a high-quality, structure-aware initialization for node features. Integrating GEE into standard GNNs yields the GEE-powered GNN (GG) framework. Across extensive simulations and real-world benchmarks, GG provides consistent and substantial performance gains in both unsupervised and supervised settings. For node classification, we further introduce GG-C, which concatenates the outputs of GG and GEE and outperforms competing methods, achieving roughly 10-50% accuracy improvements across most datasets. These results demonstrate the importance of principled, structure-aware initialization for improving the efficiency, stability, and overall performance of graph neural network architecture, enabling models to better exploit graph topology from the outset.

2507.10373 2026-02-24 math.ST stat.ME stat.TH

Post-reduction inference for confidence sets of models

Heather Battey, Daniel Garcia Rasines, Yanbo Tang

English abstract

Sparsity in a regression context makes the model itself an object of interest, pointing to a confidence set of models as the appropriate presentation of evidence. A difficulty in areas such as genomics, where the number of candidate variables is vast, arises from the need for preliminary reduction prior to the assessment of models. The present paper considers a resolution using inferential separations fundamental to the Fisherian approach to conditional inference, namely, the sufficiency/co-sufficiency separation, and the ancillary/co-ancillary separation. The advantage of these separations is that no direction for departure from any hypothesised model is needed, avoiding issues that would otherwise arise from using the same data for reduction and for model assessment. In idealised cases with no nuisance parameters, the separations extract all the information in the data solely for the purpose for which it is useful, without loss or redundancy. The extent to which estimation of nuisance parameters affects the idealised information extraction is illustrated in detail for the normal-theory linear regression model, extending immediately to a log-normal accelerated-life model for time-to-event outcomes. This idealised analysis provides insight into when sample-splitting is likely to perform as well as, or better than, the co-sufficient or ancillary tests, and when it may be unreliable. The considerations involved in extending the detailed implementation to canonical exponential-family and more general regression models are briefly discussed. As part of the analysis for the Gaussian model, we introduce a modified version of the refitted cross-validation estimator of Fan et al. (2012), whose distribution theory is tractable in the appropriate conditional sense.

2506.09217 2026-02-24 cs.RO cs.CV stat.AP

Perception Characteristics Distance: Measuring Stability and Robustness of Perception System in Dynamic Conditions under a Certain Decision Rule

Boyu Jiang, Liang Shi, Zhengzhi Lin, Lanxin Xiang, Loren Stowe, Feng Guo

Comments This paper has been accepted to the CVPR 2026 Main Conference

English abstract

The safety of autonomous driving systems (ADS) depends on accurate perception across distance and driving conditions. The outputs of AI perception algorithms are stochastic, which has a major impact on decision making and safety outcomes, including time-to-collision estimation. However, current perception evaluation metrics do not reflect the stochastic nature of perception algorithms. We introduce the Perception Characteristics Distance (PCD), a novel metric incorporating model output uncertainty as represented by the farthest distance at which an object can be reliably detected. To represent a system's overall perception capability in terms of reliable detection distance, we average PCD values across multiple detection quality and probabilistic thresholds to produce the average PCD (aPCD). For empirical validation, we present the SensorRainFall dataset, collected on the Virginia Smart Road using a sensor-equipped vehicle (cameras, radar, and LiDAR) under different weather (clear and rainy) and illumination conditions (daylight, streetlight, and nighttime). The dataset includes ground-truth distances, bounding boxes, and segmentation masks for target objects. Experiments with state-of-the-art models show that aPCD captures meaningful differences across weather, daylight, and illumination conditions, which traditional evaluation metrics fail to reflect. PCD provides an uncertainty-aware measure of perception performance, supporting safer and more robust ADS operation, while the SensorRainFall dataset offers a valuable benchmark for evaluation. The SensorRainFall dataset is publicly available at https://www.kaggle.com/datasets/datadrivenwheels/sensorrainfall, and the evaluation code is available at https://github.com/datadrivenwheels/PCD_Python.

2505.23546 2026-02-24 math.OC stat.ML

Going from a Representative Agent to Counterfactuals in Combinatorial Choice

Yanqiu Ruan, Karthyek Murthy, Karthik Natarajan

Comments 34 pages, 6 figures

English abstract

We study decision-making problems where data comprises points from a collection of binary polytopes, capturing aggregate information stemming from various combinatorial selection environments. We propose a nonparametric approach for counterfactual inference in this setting based on a representative agent model, where the available data is viewed as arising from maximizing separable concave utility functions over the respective binary polytopes. Our first contribution is to precisely characterize the selection probabilities representable under this model and show that verifying the consistency of any given aggregated selection dataset reduces to solving a polynomial-sized linear program. Building on this characterization, we develop a nonparametric method for counterfactual prediction. When data is inconsistent with the model, finding a best-fitting approximation for prediction reduces to solving a compact mixed-integer convex program. Numerical experiments based on synthetic data demonstrate the method's flexibility, predictive accuracy, and strong representational power even under model misspecification.

2505.19371 2026-02-24 cs.AI cs.LG math.ST stat.TH

Foundations of Top-$k$ Decoding For Language Models

Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban

English abstract

Top-$k$ decoding is a widely used method for sampling from LLMs: at each token, only the largest $k$ next-token-probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-$k$ and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-$k$ decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-$k$ decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider \emph{Bregman decoders} obtained by minimizing a separable Bregman divergence (for both the \emph{primal} and \emph{dual} cases) with a sparsity-inducing $\ell_0$ regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in $k$, so that binary search provably and efficiently finds the optimal $k$. We show that top-$k$ decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).
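
The basic top-$k$ step described at the start of this abstract (keep the $k$ largest next-token probabilities, renormalize, then sample) can be sketched as below. This illustrates only plain top-$k$ truncation, not the Bregman decoders studied in the paper; the function names and the toy distribution are assumptions made for this sketch.

```python
import random

def top_k_probs(probs, k):
    """Keep the k largest probabilities and renormalize them to sum to one."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and len(probs)")
    # Indices of the k largest entries; ties keep their original order.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    truncated = [0.0] * len(probs)
    for i in keep:
        truncated[i] = probs[i] / total
    return truncated

def sample_top_k(probs, k, rng=random):
    """Sample one token index from the truncated, renormalized distribution."""
    truncated = top_k_probs(probs, k)
    return rng.choices(range(len(probs)), weights=truncated, k=1)[0]

# Toy next-token distribution over a four-token vocabulary.
next_token_probs = [0.5, 0.3, 0.15, 0.05]
truncated = top_k_probs(next_token_probs, 2)
token = sample_top_k(next_token_probs, 2, random.Random(0))
```

With `k = 2`, the two smallest probabilities are zeroed out and the remaining mass is rescaled, so only the first two tokens can ever be sampled.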

2505.06595 2026-02-24 stat.ML cs.AI cs.CV cs.LG math.PR

Feature Representation Transferring to Lightweight Models via Perception Coherence

Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone

Comments Published in Transactions on Machine Learning Research (02/2026)

Journal ref Published in Transactions on Machine Learning Research (02/2026)

English abstract

In this paper, we propose a method for transferring feature representations from larger teacher models to lightweight student models. We mathematically define a new notion called perception coherence. Based on this notion, we propose a loss function that takes into account the dissimilarities between data points in feature space through their ranking. At a high level, by minimizing this loss function, the student model learns to mimic how the teacher model perceives inputs. More precisely, our method is motivated by the fact that the representational capacity of the student model is weaker than that of the teacher model. Hence, we aim to develop a new method allowing for a better relaxation. This means that the student model does not need to preserve the absolute geometry of the teacher model, while still preserving global coherence through dissimilarity ranking. Importantly, while rankings are defined only on finite sets, our notion of perception coherence extends them into a probabilistic form. This formulation depends on the input distribution and applies to general dissimilarity metrics. Our theoretical insights provide a probabilistic perspective on the process of feature representation transfer. Our experimental results show that our method outperforms or achieves on-par performance compared to strong baseline methods for representation transfer.

2503.22933 2026-02-24 stat.ME

Improving Transportability of Regression Calibration Under the Main/External Validation Study Design

Zexiang Li, Donna Spiegelman, Molin Wang, Zuoheng Wang, Xin Zhou

English abstract

In epidemiology, obtaining accurate individual exposure measurements can be costly and challenging. Thus, these measurements are often subject to error. Regression calibration with a validation study is widely employed as a study design and analysis method to correct for measurement error in the main study due to its broad applicability and simple implementation. However, relying on an external validation study to assess the measurement error process carries the risk of introducing bias into the analysis. Specifically, if the parameters of the regression calibration model estimated from the external validation study are not transportable to the main study, the subsequent estimated parameter describing the exposure-disease association will be biased. In this work, we improve the regression calibration method for linear regression models using an external validation study. Unlike the original approach, our proposed method ensures that the regression calibration model is transportable by estimating the parameters in the measurement error generating process using the external validation study and obtaining the remaining parameter values in the regression calibration model directly from the main study. This guarantees that parameter values in the regression calibration model will be applicable to the main study. We derive the theoretical properties of our proposed method. The simulation results show that the proposed method effectively reduces bias and maintains nominal confidence interval coverage. We apply this method to data from the Health Professionals Follow-Up Study (main study) and the Men's Lifestyle Validation Study (external validation study) to assess the effects of dietary intake on body weight.

2503.14381 2026-02-24 stat.ML cs.LG math.ST stat.ME stat.TH

Optimizing High-Dimensional Oblique Splits

Chien-Ming Chi

Comments 91 pages, 13 tables

English abstract

Evidence suggests that oblique splits can significantly enhance the performance of decision trees. This paper explores the optimization of high-dimensional oblique splits for decision tree construction, establishing the Sufficient Impurity Decrease (SID) convergence that takes into account $s_0$-sparse oblique splits. We demonstrate that the SID function class expands as the sparsity parameter $s_0$ increases, enabling the model to capture complex data-generating processes such as the $s_0$-dimensional XOR function. Thus, $s_0$ represents the unknown potential complexity of the underlying data-generating function. Furthermore, we establish that learning these complex functions necessitates greater computational resources. This highlights a fundamental trade-off between statistical accuracy, which is governed by the $s_0$-dependent size of the SID function class, and computational cost. In particular, for challenging problems, the required candidate oblique split set can become prohibitively large, rendering standard ensemble approaches computationally impractical. To address this, we propose progressive trees that optimize oblique splits through an iterative refinement process rather than a single-step optimization. These splits are integrated alongside traditional orthogonal splits into ensemble models like Random Forests to enhance finite-sample performance. The effectiveness of our approach is validated through simulations and real-data experiments, where it consistently outperforms various existing oblique tree models.

2503.11842 2026-02-24 cs.LG stat.ML

Test-Time Training Provably Improves Transformers as In-context Learners

Halil Alperen Gozeten, M. Emrullah Ildiz, Xuechen Zhang, Mahdi Soltanolkotabi, Marco Mondelli, Samet Oymak

Comments Accepted at ICML 2025

English abstract

Test-time training (TTT) methods explicitly update the weights of a model to adapt to the specific test instance, and they have found success in a variety of settings, including most recently language modeling and reasoning. To demystify this success, we investigate a gradient-based TTT algorithm for in-context learning, where we train a transformer model on the in-context demonstrations provided in the test prompt. Specifically, we provide a comprehensive theoretical characterization of linear transformers when the update rule is a single gradient step. Our theory (i) delineates the role of alignment between pretraining distribution and target task, (ii) demystifies how TTT can alleviate distribution shift, and (iii) quantifies the sample complexity of TTT including how it can significantly reduce the eventual sample size required for in-context learning. As our empirical contribution, we study the benefits of TTT for TabPFN, a tabular foundation model. In line with our theory, we demonstrate that TTT significantly reduces the required sample size for tabular classification (3 to 5 times fewer) unlocking substantial inference efficiency with a negligible training cost.

2502.04591 2026-02-24 cs.LG cs.AI stat.ML

Are We Measuring Oversmoothing in Graph Neural Networks Correctly?

Kaicheng Zhang, Piero Deidda, Desmond Higham, Francesco Tudisco

Comments Accepted into ICLR 2026

English abstract

Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. We argue that these metrics have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks, while typical GNNs show a performance drop already with as few as 10 layers. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide extensive numerical evaluation across diverse graph architectures and datasets to show that rank-based metrics consistently capture oversmoothing, whereas energy-based metrics often fail. Notably, we reveal that drops in the rank align closely with performance degradation, even in scenarios where energy metrics remain unchanged. Along with the experimental evaluation, we provide theoretical support for this approach, clarifying why Dirichlet-like measures may fail to capture performance drop and proving that the numerical rank of feature representations collapses to one for a broad family of GNN architectures.
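As a rough illustration of the rank-based metrics mentioned in this abstract, the sketch below computes two common quantities from the singular values of a feature matrix. It assumes the usual definitions: effective rank as the exponential of the entropy of the normalized singular values, and numerical (stable) rank as the squared Frobenius norm divided by the squared spectral norm. The paper's exact definitions may differ, and the function names are hypothetical.

```python
import math

def effective_rank(singular_values):
    """exp(entropy) of the singular values normalized to sum to one."""
    s = [v for v in singular_values if v > 0]
    if not s:
        raise ValueError("need at least one positive singular value")
    total = sum(s)
    entropy = -sum((v / total) * math.log(v / total) for v in s)
    return math.exp(entropy)

def numerical_rank(singular_values):
    """Sum of squared singular values divided by the largest one squared."""
    top = max(singular_values)
    if top <= 0:
        raise ValueError("need at least one positive singular value")
    return sum(v * v for v in singular_values) / (top * top)
```

Both quantities equal the matrix dimension when all singular values coincide and drop to one when a single singular value dominates, which is the collapse the abstract describes.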

2410.19412 2026-02-24 cs.LG cs.AI cs.CE econ.EM stat.CO

Robust Time Series Causal Discovery for Agent-Based Model Validation

Gene Yu, Ce Guo, Wayne Luk

Comments A peer-reviewed version titled "VCDF: A Validated Consensus-Driven Framework for Time Series Causal Discovery" is accepted to Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2026. Please cite the PAKDD version

English abstract

Agent-Based Model (ABM) validation is crucial because it helps ensure the reliability of simulations, and causal discovery has become a powerful tool in this context. However, current causal discovery methods often face accuracy and robustness challenges when applied to complex and noisy time series data, which are typical in ABM scenarios. This study addresses these issues by proposing a Robust Cross-Validation (RCV) approach to enhance causal structure learning for ABM validation. We develop RCV-VarLiNGAM and RCV-PCMCI, novel extensions of two prominent causal discovery algorithms. These aim to better reduce the impact of noise and give more reliable causal relation results, even with high-dimensional, time-dependent data. The proposed approach is then integrated into an enhanced ABM validation framework, which is designed to handle diverse data and model structures. The approach is evaluated using synthetic datasets and a complex simulated fMRI dataset. The results demonstrate greater reliability in causal structure identification. The study examines how various characteristics of datasets affect the performance of established causal discovery methods. These characteristics include linearity, noise distribution, stationarity, and causal structure density. This analysis is then extended to the RCV method to see how it compares in these different situations. This examination helps confirm whether the results are consistent with existing literature and also reveals the strengths and weaknesses of the novel approaches. By tackling key methodological challenges, the study aims to enhance ABM validation by presenting a more resilient validation framework. These improvements increase the reliability of model-driven decision-making processes in complex systems analysis.

2410.10251 2026-02-24 math.ST stat.TH

Convergence rates for estimating multivariate scale mixtures of uniform densities

Arlene K. H. Kim, Gil Kur, Adityanand Guntuboyina

Journal ref Electron. J. Statist. 19(2): 3771-3834 (2025)

English abstract

The Grenander estimator is a well-studied procedure for univariate nonparametric density estimation. It is usually defined as the Maximum Likelihood Estimator (MLE) over the class of all non-increasing densities on the positive real line. It can also be seen as the MLE over the class of all scale mixtures of uniform densities. Using the latter viewpoint, Pavlides and Wellner proposed a multivariate extension of the Grenander estimator as the nonparametric MLE over the class of all multivariate scale mixtures of uniform densities. We prove that this multivariate estimator achieves the univariate cube root rate of convergence with only a logarithmic multiplicative factor that depends on the dimension. The usual curse of dimensionality is therefore avoided to some extent for this multivariate estimator. This result positively resolves a conjecture of Pavlides and Wellner under an additional lower bound assumption. Our proof proceeds via a general accuracy result for the Hellinger accuracy of MLEs over convex classes of densities. We also provide algorithms for computing the estimator, and illustrate performance on real and simulated datasets.
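For context, the classical univariate Grenander estimator can be computed as the left derivative of the least concave majorant of the empirical distribution function. The sketch below covers only that univariate case, not the multivariate estimator studied in the paper. The function name `grenander` and its return format are illustrative assumptions.

```python
def grenander(samples):
    """Return (knots, slopes): the estimated density equals slopes[k]
    on the interval (knots[k], knots[k + 1]]."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0 or xs[0] <= 0:
        raise ValueError("samples must be non-empty and strictly positive")
    # Vertices of the empirical CDF, starting from the origin.
    pts = [(0.0, 0.0)] + [(x, (i + 1) / n) for i, x in enumerate(xs)]
    hull = []
    for p in pts:
        # Drop the last hull vertex while it lies on or below the chord
        # from the vertex before it to the new point (upper hull scan).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    knots = [x for x, _ in hull]
    slopes = [
        (hull[k + 1][1] - hull[k][1]) / (hull[k + 1][0] - hull[k][0])
        for k in range(len(hull) - 1)
    ]
    return knots, slopes
```

The returned step function is non-increasing by construction and integrates to one, because the majorant runs from the origin up to the largest sample at height one.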

2410.08958 2026-02-24 stat.ML cs.LG

The MAPS Algorithm: Fast model-agnostic and distribution-free prediction intervals for supervised learning

Daniel Salnikov, Dan Leonte, Kevin Michalewicz

Comments 28 pages, 3 algorithms, 5 figures, 3 tables

English abstract

A fundamental problem in modern supervised learning is computing reliable conditional prediction intervals in high-dimensional settings: existing methods often rely on restrictive modelling assumptions, do not scale as predictor dimension increases, or only guarantee marginal (population-level) rather than conditional (individual-level) coverage. We introduce the $\textit{lifted predictive model}$ (LPM), a new conditional representation, and propose the MAPS (Model-Agnostic Prediction Sets) algorithm that produces distribution-free conditional prediction intervals and adapts to any trained predictive model. Our procedure is bootstrap-based, scales to high-dimensional inputs and accounts for heteroscedastic errors. We establish the theoretical properties of the LPM, connect prediction accuracy to interval length, and provide sufficient conditions for asymptotic conditional coverage. We evaluate the finite-sample performance of MAPS in a simulation study, and apply our method to simulation-based inference and image classification. In the former, MAPS provides the first approach for debiasing neural Bayes estimators and constructing valid confidence intervals for model parameters given the estimators, at any desired level. In the latter, it provides the first approach that accounts for uncertainty in model calibration and label prediction.

2404.12613 2026-02-24 stat.ML cs.LG eess.SP stat.ME

Model Selection and Parameter Estimation of One-Dimensional Gaussian Mixture Models

Xinyu Liu, Hai Zhang

English abstract

In this paper, we study the problem of learning one-dimensional Gaussian mixture models (GMMs) with a specific focus on estimating both the model order and the mixing distribution from independent and identically distributed (i.i.d.) samples. This paper establishes the optimal sampling complexity for model order estimation in one-dimensional Gaussian mixture models. We prove a fundamental lower bound on the number of samples required to correctly identify the number of components with high probability, showing that this limit depends critically on the separation between component means and the total number of components. We then propose a Fourier-based approach to estimate both the model order and the mixing distribution. Our algorithm utilizes Fourier measurements constructed from the samples, and our analysis demonstrates that its sample complexity matches the established lower bound, thereby confirming its optimality. Numerical experiments further show that our method outperforms conventional techniques in terms of efficiency and accuracy.

2403.18248 2026-02-24 econ.EM stat.ML

Statistical Inference of Optimal Allocations I: Regularities and their Implications

Kai Feng, Han Hong, Denis Nekipelov

English abstract

In this paper, we develop a functional differentiability approach for solving statistical optimal allocation problems. We derive Hadamard differentiability of the value functions through analyzing the properties of the sorting operator using tools from geometric measure theory. Building on our Hadamard differentiability results, we apply the functional delta method to obtain the asymptotic properties of the value function process for the binary constrained optimal allocation problem and the plug-in ROC curve estimator. Moreover, the convexity of the optimal allocation value functions facilitates demonstrating the degeneracy of first order derivatives with respect to the policy. We then present a double / debiased estimator for the value functions. Importantly, the conditions that validate Hadamard differentiability justify the margin assumption from the statistical classification literature for the fast convergence rate of plug-in methods.

2402.11717 2026-02-24 math.RA math.ST stat.TH

A symmetric function approach to polynomial regression

Hans-Christian Herbig, Daniel Herden, Christopher Seaton

Comments 12 pages, 2 figures

Journal ref Aequationes mathematicae Volume 100, article number 21, (2026)

English abstract

We give an explicit solution formula for the polynomial regression problem in terms of Schur polynomials and Vandermonde determinants. We thereby generalize the work of Chang, Deng, and Floater to the case of model functions of the form $\sum _{i=1}^{n} a_{i} x^{d_{i}}$ for some integer exponents $d_{1} >d_{2} >\dotsc >d_{n} \geq 0$ and phrase the results using Schur polynomials. Even though the solution circumvents the well-known problems with the forward stability of the normal equation, it is only of practical value if $n$ is small because the number of terms in the formula grows rapidly with the number $m$ of data points. The formula can be evaluated essentially without rounding.

2402.10758 2026-02-24 stat.ML cs.LG stat.CO

Stochastic Localization via Iterative Posterior Sampling

Louis Grenioux, Maxence Noble, Marylou Gabrié, Alain Oliviero Durmus

Comments Accepted at ICML 2024, improved assumption A0 (and consequences), fixed corollary 11

English abstract

Building upon score-based learning, new interest in stochastic localization techniques has recently emerged. In these models, one seeks to noise a sample from the data distribution through a stochastic process, called the observation process, and progressively learns a denoiser associated with these dynamics. Apart from specific applications, the use of stochastic localization for the problem of sampling from an unnormalized target density has not been explored extensively. This work contributes to filling this gap. We consider a general stochastic localization framework and introduce an explicit class of observation processes, associated with flexible denoising schedules. We provide a complete methodology, $\textit{Stochastic Localization via Iterative Posterior Sampling}$ (SLIPS), to obtain approximate samples of these dynamics, and as a by-product, samples from the target distribution. Our scheme is based on a Markov chain Monte Carlo estimation of the denoiser and comes with detailed practical guidelines. We illustrate the benefits and applicability of SLIPS on several benchmarks of multi-modal distributions, including Gaussian mixtures in increasing dimensions, Bayesian logistic regression and a high-dimensional field system from statistical mechanics.

2308.04825 2026-02-24 stat.ME math.PR

Repelled point processes with application to numerical integration

Diala Hawat, Gabriel Mastrilli, Rémi Bardenet, Raphaël Lachièze-Rey

English abstract

We look at Monte Carlo numerical integration from a stochastic geometry point of view. While crude Monte Carlo estimators relate to linear statistics of a homogeneous Poisson point process (PPP), linear statistics of more regularly spread point processes can yield unbiased estimators with faster-decaying variance, and thus lower integration error. Following this intuition, we introduce a Coulomb repulsion operator, which reduces clustering by slightly pushing the points of a configuration away from each other. Our empirical findings show that applying the repulsion operator to a PPP as well as, intriguingly, to more regular point processes, preserves unbiasedness while reducing the variance of the corresponding Monte Carlo estimator, thus enhancing the method. We prove this variance reduction when the initial point process is a PPP. On the computational side, the complexity of the operator is quadratic and the corresponding algorithm can be parallelized without communication across tasks.
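To make the idea of a repulsion step concrete, the sketch below moves each point by a small multiple of the summed Coulomb-like force exerted by all other points. The force form `(x - y) / |x - y|^d`, the step size `epsilon`, and the function name `repel` are assumptions for illustration. The abstract does not specify the operator, and boundary handling is ignored here.

```python
import math

def repel(points, epsilon):
    """Return new points, each shifted by epsilon times the total
    Coulomb-like force from every other point (quadratic cost)."""
    d = len(points[0])
    moved = []
    for i, x in enumerate(points):
        force = [0.0] * d
        for j, y in enumerate(points):
            if i == j:
                continue
            diff = [a - b for a, b in zip(x, y)]
            dist = math.sqrt(sum(c * c for c in diff))
            if dist == 0.0:
                continue  # coincident points: force undefined, skipped in this sketch
            for k in range(d):
                force[k] += diff[k] / dist ** d
        moved.append(tuple(a + epsilon * f for a, f in zip(x, force)))
    return moved
```

Each point's displacement depends only on the original configuration, so the outer loop can be split across workers without communication, in line with the parallelization remark in the abstract.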

2306.01485 2026-02-24 cs.LG cs.AI cs.NA math.NA stat.ML

Robust low-rank training via approximate orthonormal constraints

Dayana Savostianova, Emanuele Zangrando, Gianluca Ceruti, Francesco Tudisco

Journal ref Proceedings NeurIPS 2023

English abstract

With the growth of model and data sizes, a broad effort has been made to design pruning techniques that reduce the resource demand of deep learning pipelines, while retaining model performance. In order to reduce both inference and training costs, a prominent line of work uses low-rank matrix factorizations to represent the network weights. Although able to retain accuracy, we observe that low-rank methods tend to compromise model robustness against adversarial perturbations. By modeling robustness in terms of the condition number of the neural network, we argue that this loss of robustness is due to the exploding singular values of the low-rank weight matrices. Thus, we introduce a robust low-rank training algorithm that maintains the network's weights on the low-rank matrix manifold while simultaneously enforcing approximate orthonormal constraints. The resulting model reduces both training and inference costs while ensuring well-conditioning and thus better adversarial robustness, without compromising model accuracy. This is shown by extensive numerical evidence and by our main approximation theorem that shows the computed robust low-rank network well-approximates the ideal full model, provided a highly performing low-rank sub-network exists.

2210.04140 2026-02-24 stat.ME

Bayesian Repulsive Mixture Modeling with Matérn Point Processes

Hanxi Sun, Boqian Zhang, Minhyeok Kim, Vinayak Rao

Comments Main doc: 18 pages, 8 figures. Supp: 16 pages, 19 figures. Changes: added author (Minhyeok Kim) and section/results on setting repulsion parameters

English abstract

Mixture models are a standard tool in statistical analyses, widely used for density modeling and model-based clustering. In this work, we propose a Bayesian mixture model with repulsion between mixture components. Such repulsion helps address the problem of overlapping or poorly separated clusters, and assists with model interpretability and robustness. Our modeling approach introduces repulsion via a generalized Matérn type-III repulsive point process model, and proceeds by applying a dependent sequential thinning scheme to a latent Poisson point process. A key feature of our model is that in contrast to most existing approaches to modeling repulsion, efficient posterior inference is possible via a Gibbs sampler, one that exploits the latent Poisson of our problem. This novel sampler also allows posterior inference over the number of clusters, and is of independent interest even in standard clustering applications without repulsion. We demonstrate the utility of the proposed method on a number of synthetic and real-world problems.

2203.14959 2026-02-24 stat.AP physics.ao-ph physics.data-an

Benchmarks for Solar Radiation Time Series Forecasting

Cyril Voyant, Gilles Notton, Jean-Laurent Duchaud, Luis Antonio García Gutiérrez, Jamie M. Bright, Dazhi Yang

Comments 32 pages, 9 Tables and 4 Figures

Journal ref Volume 191, May 2022, Pages 747-762

English abstract

With an ever-increasing share of intermittent renewable energy in the world's energy mix, there is an increasing need for advanced solar power forecasting models to optimize the operation and control of solar power plants. In order to justify the need for more elaborate forecast modeling, one must compare the performance of advanced models with naive reference methods. On this point, a rigorous formalism using statistical tools, variational calculation and quantification of noise in the measurement is studied and five naive reference forecasting methods are considered, among which there is a newly proposed approach called ARTU (a particular autoregressive model of order two). These methods require no training phase and little or no historical data. Additionally, motivated by the well-known benefits of ensemble forecasting, a combination of these models is considered, and then validated using data from multiple sites with diverse climatological characteristics, based on various error metrics, some of which are rarely used in the field of solar energy. The most appropriate benchmarking method depends on the salient features of the variable being forecast (e.g., seasonality, cyclicity, or conditional heteroscedasticity) as well as the forecast horizon. Hence, to ensure fair benchmarking, forecasters should endeavor to discover the most appropriate naive reference method for their setup by testing all available options. Among the methods proposed in this paper, the combination and ARTU statistically offer the best results for the proposed study conditions.

1408.0705 2026-02-24 stat.ME econ.EM

Using Invalid Instruments on Purpose: Focused Moment Selection and Averaging for GMM

Francis J. DiTraglia

Journal ref Journal of Econometrics, Volume 195, Issue 2, December 2016, Pages 187-208

English abstract

In finite samples, the use of a slightly endogenous but highly relevant instrument can reduce mean-squared error (MSE). Building on this observation, I propose a novel moment selection procedure for GMM -- the Focused Moment Selection Criterion (FMSC) -- in which moment conditions are chosen not based on their validity but on the MSE of their associated estimator of a user-specified target parameter. The FMSC mimics the situation faced by an applied researcher who begins with a set of relatively mild "baseline" assumptions and must decide whether to impose any of a collection of stronger but more controversial "suspect" assumptions. When the (correctly specified) baseline moment conditions identify the model, the FMSC provides an asymptotically unbiased estimator of asymptotic MSE, allowing us to select over the suspect moment conditions. I go on to show how the framework used to derive the FMSC can address the problem of inference post-moment selection. Treating post-selection estimators as a special case of moment-averaging, in which estimators based on different moment sets are given data-dependent weights, I propose simulation-based procedures for inference that can be applied to a variety of formal and informal moment-selection and averaging procedures. Both the FMSC and confidence interval procedures perform well in simulations. I conclude with an empirical example examining the effect of instrument selection on the estimated relationship between malaria and income per capita.