arXivDaily: arXiv daily academic digest, updated Monday to Friday
2506.01874 2026-01-23 econ.EM stat.ME

Life Sequence Transformer: Generative Modelling of Socio-Economic Trajectories from Administrative Data

Alberto Cabezas, Carlotta Montorsi

Generative modelling with Transformer architectures can simulate complex sequential structures across various applications. We extend this line of work to the social sciences by introducing a Transformer-based generative model tailored to longitudinal socio-economic data. Our contributions are: (i) we design a novel encoding method that represents socio-economic life histories as sequences, including overlapping events across life domains; and (ii) we adapt generative modelling techniques to simulate plausible alternative life trajectories conditioned on past histories. Using large-scale data from the Italian social security administration (INPS), we show that the model can be trained at scale, reproduces realistic labour market patterns consistent with known causal relationships, and generates coherent hypothetical life paths. This work demonstrates the feasibility of generative modelling for socio-economic trajectories and opens new opportunities for policy-oriented research, with counterfactual generation as a particularly promising application.

2601.16095 2026-01-23 stat.ME math.ST stat.TH

On the spherical cardioid distribution and its goodness-of-fit

Eduardo García-Portugués

Comments 53 pages, 7 figures, 2 tables

In this paper, we study the spherical cardioid distribution, a higher-dimensional and higher-order generalization of the circular cardioid distribution. This distribution is rotationally symmetric and generates unimodal, multimodal, axial, and girdle-like densities. We show several characteristics of the spherical cardioid that make it highly tractable: simple density evaluation, closedness under convolution, explicit expressions for vectorized moments, and efficient simulation. The moments of the spherical cardioid up to a given order coincide with those of the uniform distribution on the sphere, highlighting its closeness to the latter. We derive estimators by the method of moments and maximum likelihood, their asymptotic distributions, and their asymptotic relative efficiencies. We give the machinery for a bootstrap goodness-of-fit test based on the projected-ecdf approach, including the projected distribution and closed-form expressions for test statistics. An application to modeling the orbits of long-period comets shows the usefulness of the spherical cardioid distribution in real data analyses.

2601.16089 2026-01-23 stat.CO

A forward-only scheme for online learning of proposal distributions in particle filters

Sylvain Procope-Mamert, Nicolas Chopin, Maud Delattre, Guillaume Kon Kam King

We introduce a new online approach for constructing proposal distributions in particle filters using a forward scheme. Our method progressively incorporates future observations to refine proposals. This is in contrast to backward-scheme algorithms that require access to the entire dataset, such as the iterated auxiliary particle filters (Guarniero et al., 2017, arXiv:1511.06286) and controlled sequential Monte Carlo (Heng et al., 2020, arXiv:1708.08396), which leverage all future observations through backward recursion. In comparison, our forward scheme achieves a gradual improvement of proposals that converges toward the proposal targeted by these backward methods. We show that backward approaches can be numerically unstable even in simple settings. Our forward method, however, offers significantly greater robustness with only a minor trade-off in performance, measured by the variance of the marginal likelihood estimator. Numerical experiments on both simulated and real data illustrate the enhanced stability of our forward approach.

2601.16070 2026-01-23 stat.ML cs.LG math.ST stat.TH

On damage of interpolation to adversarial robustness in regression

Jingfu Peng, Yuhong Yang

Deep neural networks (DNNs) typically involve a large number of parameters and are trained to achieve zero or near-zero training error. Despite such interpolation, they often exhibit strong generalization performance on unseen data, a phenomenon that has motivated extensive theoretical investigations. Comforting results show that interpolation indeed may not affect the minimax rate of convergence under the squared error loss. Meanwhile, DNNs are well known to be highly vulnerable to adversarial perturbations in future inputs. A natural question then arises: Can interpolation also escape from suboptimal performance under a future $X$-attack? In this paper, we investigate the adversarial robustness of interpolating estimators in a framework of nonparametric regression. A finding is that interpolating estimators must be suboptimal even under a subtle future $X$-attack, and achieving perfect fitting can substantially damage their robustness. An interesting phenomenon in the high interpolation regime, which we term the curse of sample size, is also revealed and discussed. Numerical experiments support our theoretical findings.

2601.16058 2026-01-23 math.ST stat.ME stat.TH

Fully Functional Weighted Testing for Abrupt and Gradual Location Changes in Functional Time Series

Claudia Kirch, Hedvika Ranošová, Martin Wendler

Change point tests for abrupt changes in the mean of functional data, i.e., random elements in infinite-dimensional Hilbert spaces, are either based on dimension reduction techniques, e.g., based on principal components, or directly based on a functional CUSUM (cumulative sum) statistic. The former have often been criticized as not being fully functional and losing too much information. On the other hand, unlike the latter, they take the covariance structure of the data into account by weighting the CUSUM statistics obtained after dimension reduction with the inverse covariance matrix. In this paper, as a middle ground between these two approaches, we propose an alternative statistic that includes the covariance structure with an offset parameter to produce a scale-invariant test procedure and to increase power when the change is not aligned with the first components. We obtain the asymptotic distribution under the null hypothesis for this new test statistic, allowing for time dependence of the data. Furthermore, we introduce versions of all three test statistics for gradual change situations, which have not been previously considered for functional data, and derive their limit distribution. Further results shed light on the asymptotic power behavior for all test statistics under various ground truths for the alternatives.
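
As a rough illustration of the plain functional CUSUM statistic mentioned in this abstract, the sketch below treats each curve as a vector of values on a shared grid and computes the maximum over k of ||S_k - (k/n) S_n|| / sqrt(n), where S_k is the partial sum of the first k curves. The function name, the discretisation, and the use of the Euclidean norm as a stand-in for the L2 norm are assumptions made here; the covariance-weighted statistic proposed in the paper is not reproduced.

```python
import math

def functional_cusum(curves):
    """Return (max_k ||S_k - (k/n) S_n|| / sqrt(n), argmax k).

    Each curve is a list of values on a shared grid (an assumed
    discretisation, used only for illustration).
    """
    n = len(curves)
    d = len(curves[0])
    total = [sum(c[j] for c in curves) for j in range(d)]
    partial = [0.0] * d
    best_stat, best_k = 0.0, 0
    for k in range(1, n):
        for j in range(d):
            partial[j] += curves[k - 1][j]
        # Euclidean norm as a simple stand-in for the L2 norm of a function
        norm = math.sqrt(sum((partial[j] - k / n * total[j]) ** 2 for j in range(d)))
        stat = norm / math.sqrt(n)
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_stat, best_k

# Toy data: 10 curves at level 0 followed by 10 curves at level 1.
curves = [[0.0, 0.0, 0.0]] * 10 + [[1.0, 1.0, 1.0]] * 10
stat, k_hat = functional_cusum(curves)
```

On this toy example the statistic peaks at the true change location. In practice the statistic would be compared against critical values from the limit distribution, which this sketch does not compute.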

2601.16041 2026-01-23 math.ST cs.LG math.OC stat.TH

Risk reversal for least squares estimators under nested convex constraints

Omar Al-Ghattas

Comments 31 pages, 5 figures

In constrained stochastic optimization, one naturally expects that imposing a stricter feasible set does not increase the statistical risk of an estimator defined by projection onto that set. In this paper, we show that this intuition can fail even in canonical settings. We study the Gaussian sequence model, a deliberately austere test bed, where for a compact, convex set $Θ\subset \mathbb{R}^d$ one observes \[ Y = θ^\star + σZ, \qquad Z \sim N(0, I_d), \] and seeks to estimate an unknown parameter $θ^\star \in Θ$. The natural estimator is the least squares estimator (LSE), which coincides with the Euclidean projection of $Y$ onto $Θ$. We construct an explicit example exhibiting \emph{risk reversal}: for sufficiently large noise, there exist nested compact convex sets $Θ_S \subset Θ_L$ and a parameter $θ^\star \in Θ_S$ such that the LSE constrained to $Θ_S$ has strictly larger risk than the LSE constrained to $Θ_L$. We further show that this phenomenon can persist at the level of worst-case risk, with the supremum risk over the smaller constraint set exceeding that over the larger one. We clarify this behavior by contrasting noise regimes. In the vanishing-noise limit, the risk admits a first-order expansion governed by the statistical dimension of the tangent cone at $θ^\star$, and tighter constraints uniformly reduce risk. In contrast, in the diverging-noise regime, the risk is determined by global geometric interactions between the constraint set and random noise directions. Here, the embedding of $Θ_S$ within $Θ_L$ can reverse the risk ordering. These results reveal a previously unrecognized failure mode of projection-based estimators: in sufficiently noisy settings, tightening a constraint can paradoxically degrade statistical performance.
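
The setup in this abstract can be sketched numerically: the LSE is the Euclidean projection of Y onto the constraint set, and its risk can be estimated by Monte Carlo. The sketch below uses Euclidean balls as the constraint sets purely because their projection has a closed form; the choice of sets, the function names, and the parameter values are assumptions, and this example does not reproduce the paper's risk-reversal construction.

```python
import math
import random

def project_onto_ball(y, radius):
    """Euclidean projection of y onto {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in y))
    if norm <= radius:
        return list(y)
    return [v * radius / norm for v in y]

def mc_risk(theta_star, sigma, radius, n_draws=2000, seed=0):
    """Monte Carlo estimate of E||proj(Y) - theta_star||^2, Y = theta_star + sigma * Z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        y = [t + sigma * rng.gauss(0.0, 1.0) for t in theta_star]
        est = project_onto_ball(y, radius)
        total += sum((e - t) ** 2 for e, t in zip(est, theta_star))
    return total / n_draws

theta_star = [0.5, 0.0, 0.0]                              # lies in both balls
risk_small = mc_risk(theta_star, sigma=1.0, radius=1.0)   # tighter constraint
risk_large = mc_risk(theta_star, sigma=1.0, radius=5.0)   # looser constraint
```

For concentric balls the tighter constraint can never lose: projecting onto the small ball equals projecting onto the large ball and then onto the small one, and projection onto a convex set is nonexpansive. A reversal therefore requires a less symmetric pair of nested sets, as in the explicit example the abstract describes.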

2601.15996 2026-01-23 math.OC math.ST stat.TH

Minimax-optimal Halpern iterations for Lipschitz maps

Mario Bravo, Roberto Cominetti, Jongmin Lee

This paper investigates the minimax-optimality of Halpern fixed-point iterations for Lipschitz maps in general normed spaces. Starting from an a priori bound on the orbit of iterates, we derive non-asymptotic estimates for the fixed-point residuals. These bounds are tight, meaning that they are attained by a suitable Lipschitz map and an associated Halpern sequence. By minimizing these tight bounds we identify the minimax-optimal Halpern scheme. For contractions, the optimal iteration exhibits a transition from an initial Halpern phase to the classical Banach-Picard iteration and, as the Lipschitz constant approaches one, we recover the known convergence rate for nonexpansive maps. For expansive maps, the algorithm is purely Halpern with no Banach-Picard phase; moreover, on bounded domains, the residual estimates converge to the minimal displacement bound. Inspired by the minimax-optimal iteration, we design an adaptive scheme whose residuals are uniformly smaller than the minimax-optimal bounds, and can be significantly sharper in practice. Finally, we extend the analysis by introducing alternative bounds based on the distance to a fixed point, which allow us to handle mappings on unbounded domains, including the case of affine maps, for which we also identify the minimax-optimal iteration.
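
For readers unfamiliar with the scheme, a Halpern iteration anchors each step to the starting point: x_{k+1} = beta_k * x_0 + (1 - beta_k) * T(x_k). The sketch below is a minimal illustration under stated assumptions: the weights beta_k = 1/(k+2) are a classical choice for nonexpansive maps, not the minimax-optimal schedule derived in the paper, and the 90-degree rotation used as T is just a convenient example for which plain Banach-Picard iteration fails to converge.

```python
import math

def halpern(T, x0, n_iters, beta=lambda k: 1.0 / (k + 2)):
    """Halpern iteration x_{k+1} = beta_k * x0 + (1 - beta_k) * T(x_k)."""
    x = list(x0)
    for k in range(n_iters):
        b = beta(k)
        tx = T(x)
        x = [b * a + (1.0 - b) * t for a, t in zip(x0, tx)]
    return x

def residual(T, x):
    """Fixed-point residual ||x - T(x)||."""
    return math.sqrt(sum((a - t) ** 2 for a, t in zip(x, T(x))))

def rotate90(x):
    # Nonexpansive map (an isometry) whose only fixed point is the origin
    return [-x[1], x[0]]

x0 = [1.0, 0.0]
x_halpern = halpern(rotate90, x0, 200)

x_picard = list(x0)
for _ in range(200):
    x_picard = rotate90(x_picard)   # plain Picard iteration x_{k+1} = T(x_k)

res_halpern = residual(rotate90, x_halpern)
res_picard = residual(rotate90, x_picard)
```

The Picard iterates keep circling the origin, so their residual stays at sqrt(2), while the anchored iterates are pulled toward the fixed point and the residual shrinks as the iteration count grows.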

2601.15942 2026-01-23 stat.ME

A Hierarchical Bayesian Framework for Model-based Prognostics

Xinyu Jia, Iason Papaioannou, Daniel Straub

In prognostics and health management (PHM) of engineered systems, maintenance decisions are ideally informed by predictions of a system's remaining useful life (RUL) based on operational data. Model-based prognostics algorithms rely on a parametric model of the system degradation process. The model parameters are learned from real-time operational data collected on the system. However, there can be valuable information in data from similar systems or components, which is not typically utilized in PHM. In this contribution, we propose a hierarchical Bayesian modeling (HBM) framework for PHM that integrates both operational data and run-to-failure data from similar systems or components. The HBM framework utilizes hyperparameter distributions learned from data of similar systems or components as priors. It enables efficient updates of predictions as more information becomes available, allowing for increasingly accurate assessments of the degradation process and its associated variability. The effectiveness of the proposed framework is demonstrated through two experimental applications involving real-world data from crack growth and lithium battery degradation. Results show significant improvements in RUL prediction accuracy and demonstrate how the framework facilitates uncertainty management through predictive distributions.

2601.15936 2026-01-23 stat.AP

Detecting interpolation errors in infant mortality counts in 20th Century England and Wales

Tessa Wilkie, Idris Eckley, Paul Fearnhead, Ian Gregory

Comments 40 pages, 18 figures

Understanding historical datasets, such as the England and Wales infant mortality data, for local government districts can provide valuable insights into our changing society. Such analyses can prove challenging in practice, due to frequent changes in the boundaries of local government districts for which records are collected. One solution adopted in the literature to overcome such practical challenges is to pre-process data using areal interpolation to render the units consistent over the time period of focus. However, such methods are prone to errors. In this paper we introduce a novel changepoint method to detect instances where interpolation performs poorly. We demonstrate the utility of our method on original data, and also demonstrate how correcting interpolation errors can affect the clustering of the infant mortality curves.

2601.15896 2026-01-23 stat.ME

Leave-one-out testing for node-level differences in Gaussian graphical models

Davide Benussi, Ester Alongi, Erika Banzato

We study two-sample equality testing in Gaussian graphical models. Classical likelihood ratio tests on decomposable graphs admit clique-wise factorizations, offering limited localization and unstable finite-sample behaviour. We propose node-level inference via a leave-one-out Bartlett-adjusted test on a fully connected graph. The resulting increments have standard chi-square null limits, enabling calibrated significance for single nodes and fixed-size subsets. Simulations confirm validity, and a case study shows practical utility.

2601.15807 2026-01-23 stat.CO cs.NE math.AC math.ST stat.TH

Algebraic Statistics in OSCAR

Tobias Boege, Antony Della Vecchia, Marina Garrote-López, Benjamin Hollering

We introduce the AlgebraicStatistics section of the OSCAR computer algebra system. We give an overview of its extensible design and highlight its features including serialization of data types for sharing results and creating databases, and state-of-the-art implicitization algorithms.

2601.15696 2026-01-23 stat.ME stat.ML

Learning Functional Graphs with Nonlinear Sufficient Dimension Reduction

Kyongwon Kim, Bing Li

Functional graphical models have undergone extensive development in recent years, leading to a variety of models such as the functional Gaussian graphical model, the functional copula Gaussian graphical model, the functional Bayesian graphical model, the nonparametric functional additive graphical model, and the conditional functional graphical model. These models rely either on some parametric form of distributions on random functions, or on additive conditional independence, a criterion that is different from probabilistic conditional independence. In this paper we introduce a nonparametric functional graphical model based on functional sufficient dimension reduction. Our method not only relaxes the Gaussian or copula Gaussian assumptions, but also enhances estimation accuracy by avoiding the "curse of dimensionality". Moreover, it retains probabilistic conditional independence as the criterion to determine the absence of edges. Through a simulation study and an analysis of an fMRI dataset, we demonstrate the advantages of our method.

2601.15675 2026-01-23 stat.AP

Climate Vulnerability and Community Health: Identifying Greensboro Neighborhoods at Intersectional Risk

Rehinatu Usman, Onyedikachi J. Okeke

Comments 18 pages, 8 figures

This study develops an integrated, intersectional climate vulnerability assessment for Greensboro, North Carolina, a midsize city in the rapidly changing American Southeast. Moving beyond generalized mapping, we combine demographic, socioeconomic, health, and environmental data at the census tract level to identify neighborhoods where flood exposure, chronic health burdens, and social disadvantage spatially converge. Through k-means and hierarchical clustering, we identify four distinct neighborhood typologies, including a critically high-risk cluster characterized by high flood exposure, extreme poverty, poor respiratory health, and aging housing. The findings demonstrate that climate-related risks are not randomly distributed but systematically cluster in historically marginalized communities, revealing a clear environmental justice disparity. This place-based typology approach provides a targeted framework for policymakers to design integrated interventions that bridge flood management, public health, housing, and social services to build equitable urban resilience.

2601.15640 2026-01-23 cs.LG stat.ML

An Empirical Study on Ensemble-Based Transfer Learning Bayesian Optimisation with Mixed Variable Types

Natasha Trinkle, Huong Ha, Jeffrey Chan

Comments 36 pages, 16 figures

Bayesian optimisation is a sample-efficient method for finding a global optimum of expensive black-box objective functions. Historic datasets from related problems can be exploited to help improve the performance of Bayesian optimisation by adapting transfer learning methods to various components of the Bayesian optimisation pipeline. In this study we perform an empirical analysis of various ensemble-based transfer learning Bayesian optimisation methods and pipeline components. We expand on previous work in the literature by contributing some specific pipeline components, and three new real-time transfer learning Bayesian optimisation benchmarks. In particular we propose to use a weighting strategy for ensemble surrogate model predictions based on regularised regression with weights constrained to be positive, and a related component for handling the case when transfer learning is not improving Bayesian optimisation performance. We find that, in general, two components that help improve transfer learning Bayesian optimisation performance are warm start initialisation and constraining the weights used with the ensemble surrogate model to be positive.

2601.15635 2026-01-23 cs.SI physics.soc-ph stat.ME

Community-Size Biases in Statistical Inference of Communities in Temporal Networks

Theodore Y. Faust, Arash A. Amini, Mason A. Porter

Comments 45 pages, 11 figures

In the study of time-dependent (i.e., temporal) networks, researchers often examine the evolution of communities, which are sets of densely connected nodes that are connected sparsely to other nodes. An increasingly prominent approach to studying community structure in temporal networks is statistical inference. In the present paper, we study the performance of a class of statistical-inference methods for community detection in temporal networks. We represent temporal networks as multilayer networks, with each layer encoding a time step, and we illustrate that statistical-inference models that generate community assignments via either a uniform distribution on community assignments or discrete-time Markov processes are biased against generating communities with large or small numbers of nodes. In particular, we demonstrate that statistical-inference methods that use such generative models tend to poorly identify community structure in networks with large or small communities. To rectify this issue, we introduce a novel statistical model that generates the community assignments of the nodes in a given layer (i.e., at a given time) using all of the community assignments in the previous layer. We prove results that guarantee that our approach greatly mitigates the bias against large and small communities, so using our generative model is beneficial for studying community structure in networks with large or small communities. Our code is available at https://github.com/tfaust0196/TemporalCommunityComparison.

2601.15608 2026-01-23 math.OC stat.AP

Lead distance under a pickoff limit in Major League Baseball: A sequential game model

Scott Powers, Sivaramakrishnan Ramani, Jacob Hahn, Andrew J. Schaefer

Comments 33 pages

Major League Baseball (MLB) recently limited pitchers to three pickoff attempts, creating a cat-and-mouse game between pitcher and runner. Each failed attempt adds pressure on the pitcher to avoid using another, and the runner can intensify this pressure by extending their leadoff toward the next base. We model this dynamic as a two-player zero-sum sequential game in which the runner first chooses a lead distance, and then the pitcher chooses whether to attempt a pickoff. We establish optimality characterizations for the game and present variants of value iteration and policy iteration to solve the game. Using lead distance data, we estimate generalized linear mixed-effects models for pickoff and stolen base outcome probabilities given lead distance, context, and player skill. We compute the game-theoretic equilibria under the two-player model, as well as the optimal runner policy under a simplified one-player Markov decision process (MDP) model. In the one-player setting, our results establish an actionable rule of thumb: the Two-Foot Rule, which recommends that a runner increase their lead by two feet after each pickoff attempt.

2601.15603 2026-01-23 math.ST cs.IT math.IT stat.ML stat.TH

On the Nonasymptotic Scaling Guarantee of Hyperparameter Estimation in Inhomogeneous, Weakly-Dependent Complex Network Dynamical Systems

Yi Yu, Yubo Hou, Yinchong Wang, Nan Zhang, Jianfeng Feng, Wenlian Lu

Hierarchical Bayesian models are increasingly used in large, inhomogeneous complex network dynamical systems by modeling parameters as draws from a hyperparameter-governed distribution. However, theoretical guarantees for these estimates as the system size grows have been lacking. A critical concern is that hyperparameter estimation may diverge for larger networks, undermining the model's reliability. Formulating the system's evolution from a measure transport perspective, we propose a theoretical framework for estimating hyperparameters with mean-type observations, which are prevalent in many scientific applications. Our primary contribution is a nonasymptotic bound on the deviation of the hyperparameter estimates in inhomogeneous complex network dynamical systems with respect to network population size, which is established for a general family of optimization algorithms within a fixed observation duration. While we first establish a consistency result for systems with independent nodes, our main result extends this guarantee to the more challenging and realistic setting of weakly-dependent nodes. We validate our theoretical findings with numerical experiments on two representative models: a Susceptible-Infected-Susceptible model and a Spiking Neuronal Network model. In both cases, the results confirm that the estimation error decreases as the network population size increases, aligning with our theoretical guarantees. This research proposes the foundational theory to ensure that hierarchical Bayesian methods are statistically consistent for large-scale inhomogeneous systems, filling a gap in this area of theoretical research and justifying their application in practice.

2601.15566 2026-01-23 stat.ME

Model-Free Inference for Characterizing Protein Mutations through a Coevolutionary Lens

Fan Yang, Zhao Ren, Wen Zhou, Kejue Jia, Robert Jernigan

Multiple sequence alignment (MSA) data play a crucial role in the study of protein mutations, with contact prediction being a notable application. Existing methods are often model-based or algorithmic and typically do not incorporate statistical inference to quantify the uncertainty of the prediction outcomes. To address this, we propose a novel framework that transforms the task of contact prediction into a statistical testing problem. Our approach is motivated by the partial correlation for continuous random variables. With one-hot encoding of MSA data, we are able to construct a partial correlation graph for multivariate categorical variables. In this framework, two connected nodes in the graph indicate that the corresponding positions on the protein form a contact. A new spectrum-based test statistic is introduced to test whether two positions are partially correlated. Moreover, the new framework enables the identification of amino acid combinations that contribute to the correlation within the identified contacts, an important but largely unexplored aspect of protein mutations. Numerical experiments demonstrate that our proposed method is valid in terms of controlling Type I errors and powerful in general. Real data applications on various protein families further validate the practical utility of our approach in coevolution and mutation analysis.

2601.15552 2026-01-23 cs.LG cs.AI stat.ML

BanditLP: Large-Scale Stochastic Optimization for Personalized Recommendations

Phuc Nguyen, Benjamin Zelditch, Joyce Chen, Rohit Patra, Changshuai Wei

We present BanditLP, a scalable multi-stakeholder contextual bandit framework that unifies neural Thompson Sampling for learning objective-specific outcomes with a large-scale linear program for constrained action selection at serving time. The methodology is application-agnostic, compatible with arbitrary neural architectures, and deployable at web scale, with an LP solver capable of handling billions of variables. Experiments on public benchmarks and synthetic data show consistent gains over strong baselines. We apply this approach in LinkedIn's email marketing system and demonstrate business wins, illustrating the value of integrated exploration and constrained optimization in production.

2601.15514 2026-01-23 stat.AP stat.ML

Assessing the informative value of macroeconomic indicators for public health forecasting

Shome Chakraborty, Fardil Khan, Soutik Ghosal

Comments 16 pages, 6 figures

Macroeconomic conditions influence the environments in which health systems operate, yet their value as leading signals of health system capacity has not been systematically evaluated. In this study, we examine whether selected macroeconomic indicators contain predictive information for several capacity-related public health targets, including employment in the health and social assistance workforce, new business applications in the sector, and health care construction spending. Using monthly U.S. time series data, we evaluate multiple forecasting approaches, including neural network models with different optimization strategies, generalized additive models, random forests, and time series models with exogenous macroeconomic indicators, under alternative model fitting designs. Across evaluation settings, we find that macroeconomic indicators provide a consistent and reproducible predictive signal for some public health targets, particularly workforce and infrastructure measures, while other targets exhibit weaker or less stable predictability. Models emphasizing stability and implicit regularization tend to perform more reliably during periods of economic volatility. These findings suggest that macroeconomic indicators may serve as useful upstream signals for digital public health monitoring, while underscoring the need for careful model selection and validation when translating economic trends into health system forecasting tools.

2601.15491 2026-01-23 stat.AP

Geometric Morphometrics approach for classifying children's nutritional status on out of sample data

Laura Medialdea, Ana Arribas-Gil, Álvaro Pérez-Romero, Amador Gómez

Journal ref Scientific Reports 15, 3906 (2025)

Current alignment-based methods for classification in geometric morphometrics do not generally address the classification of new individuals that were not part of the study sample. However, in the context of infant and child nutritional assessment from body shape images this is a relevant problem. In this setting, classification rules obtained on the shape space from a reference sample cannot be used on out-of-sample individuals in a straightforward way. Indeed, a series of sample dependent processing steps, such as alignment (Procrustes analysis, for instance) or allometric regression, need to be conducted before the classification rule can be applied. This work proposes ways of obtaining shape coordinates for a new individual and analyzes the effect of using different template configurations on the sample of study as target for registration of the out-of-sample raw coordinates. Understanding sample characteristics and collinearity among shape variables is crucial for optimal classification results when evaluating children's nutritional status using arm shape analysis from photos. The SAM Photo Diagnosis App© Program's goal is to develop an offline smartphone tool, enabling updates of the training sample across different nutritional screening campaigns.

2601.15467 2026-01-23 stat.OT

Treatment effect: a critique

Heather Battey, Charlotte Edgar

Comments Presented at the Nordic-Baltic Biometrics Conference (Oslo, June 2025), and the RSS International Conference (Edinburgh, September 2025)

Two broad positions within statistics define a treatment effect, on the one hand, as a parameter of a statistical model, and on the other, as an appropriate population-level difference in outcomes or counterfactual outcomes under the different treatment regimes. This short expository paper presents some simple but consequential insights on the two formulations, contrasting the answers under the most favourable fictitious idealisation for the counterfactual framework. These observations clarify the relationship between Fisherian model-based inference and modern counterfactual formulations, and emphasise concerns, raised by Cox and others, regarding the suitability of model-free definitions as targets of inference when scientific conclusions are intended to generalise beyond the observed sample. Parts of the paper are necessarily controversial; we follow Cox (1958a) in not putting these forward in any dogmatic spirit.

2601.15442 2026-01-23 cs.AI cs.LG cs.LO cs.NA math.NA stat.ML

A tensor network formalism for neuro-symbolic AI

Alex Goessmann, Janina Schütte, Maximilian Fröhlich, Martin Eigel

Comments 51 pages, 14 figures

The unification of neural and symbolic approaches to artificial intelligence remains a central open challenge. In this work, we introduce a tensor network formalism, which captures sparsity principles originating in the different approaches in tensor decompositions. In particular, we describe a basis encoding scheme for functions and model neural decompositions as tensor decompositions. The proposed formalism can be applied to represent logical formulas and probability distributions as structured tensor decompositions. This unified treatment identifies tensor network contractions as a fundamental inference class and formulates efficiently scaling reasoning algorithms, originating from probability theory and propositional logic, as contraction message passing schemes. The framework enables the definition and training of hybrid logical and probabilistic models, which we call Hybrid Logic Networks. The theoretical concepts are accompanied by the Python library tnreason, which enables the implementation and practical use of the proposed architectures.

2601.15380 2026-01-23 cs.LG cs.CL stat.ML

You Need Better Attention Priors

Elon Litman, Gabe Guo

We generalize the attention mechanism by viewing it through the lens of Entropic Optimal Transport, revealing that standard attention corresponds to a transport problem regularized by an implicit uniform prior. We introduce Generalized Optimal transport Attention with Trainable priors (GOAT), a new attention mechanism that replaces this naive assumption with a learnable, continuous prior. This prior maintains full compatibility with optimized kernels such as FlashAttention. GOAT also provides an EOT-based explanation of attention sinks and materializes a solution for them, avoiding the representational trade-offs of standard attention. Finally, by absorbing spatial information into the core attention computation, GOAT learns an extrapolatable prior that combines the flexibility of learned positional embeddings with the length generalization of fixed encodings.
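
The link between standard attention and a uniform prior can be made concrete with a small sketch. One common way to express a prior over keys is to weight each key j by prior_j * exp(score_j), which is the maximiser of the expected score minus a KL penalty to the prior; with a uniform prior the log-prior term is constant and cancels, leaving plain softmax. The function below is an illustration under these assumptions only: the abstract does not describe how GOAT parameterises its prior, so the fixed `prior` vector here is hypothetical.

```python
import math

def attention_weights(scores, prior=None):
    """Weights w_j proportional to prior_j * exp(scores_j) for one query.

    prior=None means a uniform prior, which reduces to plain softmax.
    Prior entries must be strictly positive.
    """
    n = len(scores)
    if prior is None:
        prior = [1.0 / n] * n
    logits = [s + math.log(p) for s, p in zip(scores, prior)]
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

scores = [2.0, 1.0, 0.0]
uniform = attention_weights(scores)                         # standard softmax attention
skewed = attention_weights(scores, prior=[0.1, 0.1, 0.8])   # prior favouring the last key
```

Because the prior enters as an additive bias on the logits, this form is compatible in principle with attention implementations that accept an additive bias or mask term.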

2601.15363 2026-01-23 stat.ML cs.LG

Non-Stationary Functional Bilevel Optimization

Jason Bohne, Ieva Petrulionyte, Michael Arbel, Julien Mairal, Paweł Polak

Abstract

Functional bilevel optimization (FBO) provides a powerful framework for hierarchical learning in function spaces, yet current methods are limited to static offline settings and perform suboptimally in online, non-stationary scenarios. We propose SmoothFBO, the first algorithm for non-stationary FBO with both theoretical guarantees and practical scalability. SmoothFBO introduces a time-smoothed stochastic hypergradient estimator that reduces variance through a window parameter, enabling stable outer-loop updates with sublinear regret. Importantly, the classical parametric bilevel case is a special reduction of our framework, making SmoothFBO a natural extension to online, non-stationary settings. Empirically, SmoothFBO consistently outperforms existing FBO methods in non-stationary hyperparameter optimization and model-based reinforcement learning, demonstrating its practical effectiveness. Together, these results establish SmoothFBO as a general, theoretically grounded, and practically viable foundation for bilevel optimization in online, non-stationary scenarios.

2601.15360 2026-01-23 stat.ML cs.LG econ.EM stat.ME

Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation

Eichi Uehara

Comments 17 pages, 4 figures, 4 tables

Abstract

Estimating Heterogeneous Treatment Effects (HTE) in industrial applications such as AdTech and healthcare presents a dual challenge: extreme class imbalance and heavy-tailed outcome distributions. While the X-Learner framework effectively addresses imbalance through cross-imputation, we demonstrate that it is fundamentally vulnerable to "Outlier Smearing" when reliant on Mean Squared Error (MSE) minimization. In this failure mode, the bias from a few extreme observations ("whales") in the minority group is propagated to the entire majority group during the imputation step, corrupting the estimated treatment effect structure. To resolve this, we propose the Robust X-Learner (RX-Learner). This framework integrates a redescending γ-divergence objective -- structurally equivalent to the Welsch loss under Gaussian assumptions -- into the gradient boosting machinery. We further stabilize the non-convex optimization using a Proxy Hessian strategy grounded in Majorization-Minimization (MM) principles. Empirical evaluation on a semi-synthetic Criteo Uplift dataset demonstrates that the RX-Learner reduces the Precision in Estimation of Heterogeneous Effect (PEHE) metric by 98.6% compared to the standard X-Learner, effectively decoupling the stable "Core" population from the volatile "Periphery".
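To make the "redescending" property concrete, the sketch below compares the Welsch loss with squared error as objectives for a gradient-boosting step. It assumes the common parameterization ρ(r) = (c²/2)(1 − exp(−r²/c²)). The positive curvature used as a "proxy Hessian" is the standard quadratic-majorizer (IRLS) weight and is an assumption here, not necessarily the exact strategy used in the paper.

```python
import math

def welsch_loss(r, c=1.0):
    # rho(r) = (c^2 / 2) * (1 - exp(-(r/c)^2)); bounded above by c^2 / 2.
    return 0.5 * c * c * (1.0 - math.exp(-(r / c) ** 2))

def welsch_grad(r, c=1.0):
    # d rho / d r = r * exp(-(r/c)^2); tends to 0 as |r| grows (redescending).
    return r * math.exp(-(r / c) ** 2)

def welsch_hess(r, c=1.0):
    # Exact second derivative; negative once |r| > c / sqrt(2).
    return math.exp(-(r / c) ** 2) * (1.0 - 2.0 * (r / c) ** 2)

def proxy_hess(r, c=1.0):
    # Curvature of the quadratic majorizer at r (the IRLS weight); always positive.
    return math.exp(-(r / c) ** 2)

def mse_grad(r):
    # Gradient of r^2 / 2; grows linearly, so one outlier can dominate a fit.
    return r

for r in (0.5, 2.0, 10.0):
    print(r, mse_grad(r), welsch_grad(r), welsch_hess(r), proxy_hess(r))
```

The exact Hessian changes sign for large residuals, which breaks Newton-style leaf updates, while the majorizer curvature stays positive and shrinks the influence of extreme observations.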

2601.15353 2026-01-23 stat.AP cs.LG stat.ML

Statistical Reinforcement Learning in the Real World: A Survey of Challenges and Future Directions

Asim H. Gazi, Yongyi Guo, Daiqi Gao, Ziping Xu, Kelly W. Zhang, Susan A. Murphy

Abstract

Reinforcement learning (RL) has achieved remarkable success in real-world decision-making across diverse domains, including gaming, robotics, online advertising, public health, and natural language processing. Despite these advances, a substantial gap remains between RL research and its deployment in many practical settings. Two recurring challenges often underlie this gap. First, many settings offer limited opportunity for the agent to interact extensively with the target environment due to practical constraints. Second, many target environments often undergo substantial changes, requiring redesign and redeployment of RL systems (e.g., advancements in science and technology that change the landscape of healthcare delivery). Addressing these challenges and bridging the gap between basic research and application requires theory and methodology that directly inform the design, implementation, and continual improvement of RL systems in real-world settings. In this paper, we frame the application of RL in practice as a three-component process: (i) online learning and optimization during deployment, (ii) post- or between-deployment offline analyses, and (iii) repeated cycles of deployment and redeployment to continually improve the RL system. We provide a narrative review of recent advances in statistical RL that address these components, including methods for maximizing data utility for between-deployment inference, enhancing sample efficiency for online learning within-deployment, and designing sequences of deployments for continual improvement. We also outline future research directions in statistical RL that are use-inspired -- aiming for impactful application of RL in practice.

2601.15249 2026-01-23 cs.LG cs.AI cs.GT stat.ME

Recommending Best Paper Awards for ML/AI Conferences via the Isotonic Mechanism

Garrett G. Wen, Buxin Su, Natalie Collina, Zhun Deng, Weijie Su

Abstract

Machine learning and artificial intelligence conferences such as NeurIPS and ICML now regularly receive tens of thousands of submissions, posing significant challenges to maintaining the quality and consistency of the peer review process. This challenge is particularly acute for best paper awards, which are an important part of the peer review process, yet whose selection has increasingly become a subject of debate in recent years. In this paper, we introduce an author-assisted mechanism to facilitate the selection of best paper awards. Our method employs the Isotonic Mechanism for eliciting authors' assessments of their own submissions in the form of a ranking, which is subsequently utilized to adjust the raw review scores for optimal estimation of the submissions' ground-truth quality. We demonstrate that authors are incentivized to report truthfully when their utility is a convex additive function of the adjusted scores, and we validate this convexity assumption for best paper awards using publicly accessible review data of ICLR from 2019 to 2023 and NeurIPS from 2021 to 2023. Crucially, in the special case where an author has a single quota -- that is, may nominate only one paper -- we prove that truthfulness holds even when the utility function is merely nondecreasing and additive. This finding represents a substantial relaxation of the assumptions required in prior work. For practical implementation, we extend our mechanism to accommodate the common scenario of overlapping authorship. Finally, simulation results demonstrate that our mechanism significantly improves the quality of papers selected for awards.
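The score adjustment described above can be sketched as a least-squares projection of the raw review scores onto the set of score vectors consistent with the author's ranking, computed by the pool-adjacent-violators algorithm. This is a minimal single-author illustration. The function name and example scores are hypothetical, and the paper's handling of overlapping authorship is not covered.

```python
def isotonic_adjust(scores, ranking):
    # scores[i]: raw review score of paper i.
    # ranking: paper indices ordered best-first, as reported by the author.
    # Returns the closest scores (in squared error) that are non-increasing
    # along the reported ranking.
    ordered = [scores[i] for i in ranking]
    blocks = []  # each block is [sum, count]
    for y in ordered:
        blocks.append([y, 1])
        # Merge while an earlier block has a lower mean than the one after it.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    adjusted = [0.0] * len(scores)
    for pos, i in enumerate(ranking):
        adjusted[i] = fitted[pos]
    return adjusted

raw = [6.0, 8.0, 5.0, 7.0]   # hypothetical raw scores for papers 0..3
ranking = [0, 1, 3, 2]       # author claims paper 0 is best, paper 2 is worst
adjusted = isotonic_adjust(raw, ranking)
print(adjusted)              # [7.0, 7.0, 5.0, 7.0]
```

Here the raw scores for papers 0 and 1 contradict the reported ranking, so they are pooled to their average. Scores that already agree with the ranking are left unchanged.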

2601.14872 2026-01-23 math.ST cs.LG stat.ME stat.ML stat.TH

Finite-Sample Inference for Sparsely Permuted Linear Regression

Hirofumi Ota, Masaaki Imaizumi

Abstract

We study a linear observation model with an unknown permutation called permuted/shuffled linear regression, where responses and covariates are mismatched and the permutation forms a discrete, factorial-size parameter. The permutation is a key component of the data-generating process, yet its statistical investigation remains challenging due to its discrete nature. We develop a general statistical inference framework on the permutation and regression coefficients. First, we introduce a localization step that reduces the permutation space to a small candidate set building on recent advances in the repro samples method, whose miscoverage decays polynomially with the number of Monte Carlo samples. Then, based on this localized set, we provide statistical inference procedures: a conditional Monte Carlo test of permutation structures with valid finite-sample Type-I error control. We also develop coefficient inference that remains valid under alignment uncertainty of permutations. For computational purposes, we develop a linear assignment problem computable in polynomial time and demonstrate that, with high probability, the solution is equivalent to that of the conventional least squares with large computational cost. Extensions to partially permuted designs and ridge regularization are further discussed. Extensive simulations and an application to air-quality data corroborate finite-sample validity, strong power to detect mismatches, and practical scalability.

2601.14727 2026-01-23 stat.ME

Recent advances in the Bradley--Terry Model: theory, algorithms, and applications

Shuxing Fang, Ruijian Han, Yuanhang Luo, Yiming Xu

Abstract

This article surveys recent progress in the Bradley-Terry (BT) model and its extensions. We focus on the statistical and computational aspects, with emphasis on the regime in which both the number of objects and the volume of comparisons tend to infinity, a setting relevant to large-scale applications. The main topics include asymptotic theory for statistical estimation and inference, along with the associated algorithms. We also discuss applications of these models, including recent work on preference alignment in machine learning. Finally, we discuss several key challenges and outline directions for future research.
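As a concrete reference point for the estimation problem surveyed here: the BT model sets P(i beats j) = w_i / (w_i + w_j), and the maximum likelihood strengths can be computed with the classical minorization-maximization (Zermelo-type) iteration. The sketch below is a generic textbook version with hypothetical data, not an algorithm taken from the survey. It assumes a zero diagonal, a connected comparison graph, and at least one win per item.

```python
def fit_bradley_terry(wins, iters=200):
    # wins[i][j]: number of times item i beat item j (wins[i][i] == 0).
    # Returns strengths w, normalized to sum to 1.
    n = len(wins)
    w = [1.0 / n] * n
    for _ in range(iters):
        new_w = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Comparisons involving i, each weighted by 1 / (w_i + w_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (w[i] + w[j])
                for j in range(n)
                if j != i
            )
            new_w.append(total_wins / denom)
        s = sum(new_w)
        w = [x / s for x in new_w]
    return w

# Hypothetical round-robin data: 10 comparisons per pair.
wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(wins)
print(strengths)
```

At the fixed point, each item's expected number of wins under the fitted model equals its observed number of wins, which is the likelihood equation for this model.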

2601.13448 2026-01-23 cs.LG math.OC stat.ML

Fairness-informed Pareto Optimization: An Efficient Bilevel Framework

Sofiane Tanji, Samuel Vaiter, Yassine Laguel

Abstract

Despite their promise, fair machine learning methods often yield Pareto-inefficient models, in which the performance of certain groups can be improved without degrading that of others. This issue arises frequently in traditional in-processing approaches such as fairness-through-regularization. In contrast, existing Pareto-efficient approaches are biased towards a certain perspective on fairness and fail to adapt to the broad range of fairness metrics studied in the literature. In this paper, we present BADR, a simple framework to recover the optimal Pareto-efficient model for any fairness metric. Our framework recovers its models through a Bilevel Adaptive Rescalarisation procedure. The lower level is a weighted empirical risk minimization task where the weights are a convex combination of the groups, while the upper level optimizes the chosen fairness objective. We equip our framework with two novel large-scale, single-loop algorithms, BADR-GD and BADR-SGD, and establish their convergence guarantees. We release badr, an open-source Python toolbox implementing our framework for a variety of learning tasks and fairness metrics. Finally, we conduct extensive numerical experiments demonstrating the advantages of BADR over existing Pareto-efficient approaches to fairness.

2601.01147 2026-01-23 stat.ML cs.LG

Conformal Blindness: A Note on $A$-Cryptic change-points

Johan Hallberg Szabadváry

Comments 6 pages, 3 figures

Abstract

Conformal Test Martingales (CTMs) are a standard method within the Conformal Prediction framework for testing the crucial assumption of data exchangeability by monitoring deviations from uniformity in the p-value sequence. Although exchangeability implies uniform p-values, the converse does not hold. This raises the question of whether a significant break in exchangeability can occur, such that the p-values remain uniform, rendering CTMs blind. We answer this affirmatively, demonstrating the phenomenon of conformal blindness. Through explicit construction, for the theoretically ideal "predictive oracle" conformity measure (given by the true conditional density), we demonstrate the possibility of an $A$-cryptic change-point (where $A$ refers to the conformity measure). Using bivariate Gaussian distributions, we identify a line along which a change in the marginal means does not alter the distribution of the conformity scores, thereby producing perfectly uniform p-values. Simulations confirm that even a massive distribution shift can be perfectly cryptic to the CTM, highlighting a fundamental limitation and emphasising the critical role of the alignment of the conformity measure with potential shifts. By contrasting the predictive oracle with recent results on detection-optimal scores, we emphasise that validity monitoring in safety-critical systems requires careful separation of predictive and diagnostic goals.

2512.08671 2026-01-23 cs.LG stat.ML

DS FedProxGrad: Asymptotic Stationarity Without Noise Floor in Fair Federated Learning

Huzaifa Arif

Comments Withdrawn due to an error in the proof; a corrected version will be posted

Abstract

Recent work introduced Federated Proximal Gradient (FedProxGrad) for solving non-convex composite optimization problems in group fair federated learning. However, the original analysis established convergence only to a noise-dominated neighborhood of stationarity, with explicit dependence on a variance-induced noise floor. In this work, we provide an improved asymptotic convergence analysis for a generalized FedProxGrad-type analytical framework with inexact local proximal solutions and explicit fairness regularization. We call this extended analytical framework DS FedProxGrad (Decay Step Size FedProxGrad). Under a Robbins-Monro step-size schedule (Robbins and Monro, 1951) and a mild decay condition on local inexactness, we prove that $\liminf_{r\to\infty} \mathbb{E}[\|\nabla F(\mathbf{x}^r)\|^2] = 0$, i.e., the algorithm is asymptotically stationary and the convergence rate does not depend on a variance-induced noise floor.

2511.22049 2026-01-23 stat.ME

Univariate-Guided Sparse Regression for Biobank-Scale High-Dimensional Omics Data

Joshua Richland, Tuomo Kiiskinen, William Wang, Sophia Lu, Balasubramanian Narasimhan, Trevor Hastie, Manuel Rivas, Robert Tibshirani

Abstract

We present a scalable framework for computing polygenic risk scores (PRS) in high-dimensional genomic settings using the recently introduced Univariate-Guided Sparse Regression (uniLasso). UniLasso is a two-stage penalized regression procedure that leverages univariate coefficients and magnitudes to stabilize feature selection and enhance interpretability. Building on its theoretical and empirical advantages, we adapt uniLasso for application to the UK Biobank, a population-based repository comprising over one million genetic variants measured on hundreds of thousands of individuals from the United Kingdom. We further extend the framework to incorporate external summary statistics to increase predictive accuracy. Our results demonstrate that uniLasso attains predictive performance comparable to standard Lasso while selecting substantially fewer variants, yielding sparser and more interpretable models. Moreover, it exhibits superior performance in estimating PRS relative to its competitors, such as PRS-CS. Integrating external scores further improves prediction while maintaining sparsity.

2511.01785 2026-01-23 stat.ME

RESOLVE-IPD: High-Fidelity Individual Patient Data Reconstruction and Uncertainty-Aware Subgroup Meta-Analysis

Lang Lang, Yao Zhao, Qiuxin Gao, Yanxun Xu

Abstract

Individual patient data (IPD) from oncology trials are essential for reliable evidence synthesis but are rarely publicly available, necessitating reconstruction from published Kaplan-Meier (KM) curves. Existing reconstruction methods suffer from digitization errors, unrealistic uniform censoring assumptions, and the inability to recover subgroup-level IPD when only aggregate statistics are available. We developed RESOLVE-IPD, a unified computational framework that enables high-fidelity IPD reconstruction and uncertainty-aware subgroup meta-analysis to address these limitations. RESOLVE-IPD comprises two components. The first component, High-Fidelity IPD Reconstruction, integrates the VEC-KM and CEN-KM modules: VEC-KM extracts precise KM coordinates and explicit censoring marks from vectorized figures, minimizing digitization error, while CEN-KM corrects overlapping censor symbols and eliminates the uniform censoring assumption. The second component, Uncertainty-Aware Subgroup Recovery, employs the MAPLE (Marginal Assignment of Plausible Labels and Evidence Propagation) algorithm to infer patient-level subgroup labels consistent with published summary statistics (e.g., hazard ratio, median overall survival) when subgroup KM curves are unavailable. MAPLE generates ensembles of mathematically valid labelings, facilitating a propagating meta-analysis that quantifies and reflects uncertainty from subgroup reconstruction. RESOLVE-IPD was validated through a subgroup meta-analysis of four trials in advanced esophageal squamous cell carcinoma, focusing on the programmed death ligand 1 (PD-L1)-low population. RESOLVE-IPD enables accurate IPD reconstruction and robust, uncertainty-aware subgroup meta-analyses, strengthening the reliability and transparency of secondary evidence synthesis in precision oncology.

2510.06815 2026-01-23 stat.ME math.ST stat.TH

Inference in pseudo-observation-based regression using (biased) covariance estimation and naive bootstrapping

Simon Mack, Morten Overgaard, Dennis Dobler

Comments 35 pages, 7 tables; minor changes to abstract, added clarification regarding assumption 2

Abstract

The pseudo-observation method is regularly applied to time-to-event data. However, to date such analyses have relied on statements that have not been formally verified or on ad-hoc methods regarding covariance estimation. This paper strives to close this gap in the literature. To begin with, we demonstrate that the usual Huber-White estimator is not consistent for the limiting covariance of parameter estimates in pseudo-observation regression approaches. By confirming that a plug-in estimator can be used instead, we obtain asymptotically exact and consistent tests for general linear hypotheses in the parameters of the model. Additionally, we confirm that naive bootstrapping cannot be used for covariance estimation in the pseudo-observation model either. However, it can be used for hypothesis testing by applying a suitable studentization. Simulations illustrate the good performance of our proposed methods in many scenarios. Finally, we obtain a general uniform law of large numbers for U- and V-statistics, as such statistics are central in the mathematical analysis of the inference procedures developed in this work.

2510.04256 2026-01-23 stat.CO

regTPS-KLE: A Novel Approach To Approximate A Gaussian Random Field for Bayesian Spatial Modeling

Joaquin Cavieres, Sebastian Krumscheid

Abstract

The Gaussian random field is a ubiquitous model for spatial phenomena in diverse scientific disciplines. Its approximation is often crucial for computational feasibility in simulation, inference, and uncertainty quantification. The Karhunen-Loève Expansion provides a theoretically optimal basis for representing a Gaussian random field as a sum of deterministic orthonormal functions weighted by uncorrelated random variables. While this is a well-established method for dimension reduction and approximation of (spatial) stochastic processes, its practical application depends on the explicit or implicit definition of the covariance structure. In this work, we propose a novel approach, referred to as regTPS-KLE, for approximating a Gaussian random field by explicitly constructing its covariance via a regularized thin plate spline (TPS) kernel. Because TPS kernels are conditionally positive definite and lack a direct spectral decomposition, we formulate the covariance as the inverse of a regularized elliptic operator. To evaluate its statistical performance, we compare its predictive accuracy and computational efficiency with a Gaussian random field approximation constructed using the stochastic partial differential equations (SPDE) method and implemented within an MCMC algorithm. In simulation studies, the predictive differences between the SPDE and regTPS-KLE models were minimal when the spatial field was generated using Matérn and exponential covariance functions, while regTPS-KLE models consistently outperformed the SPDE approach in terms of computational efficiency. In a real data application, regTPS-KLE exhibits superior predictive accuracy compared with SPDE models based on leave-one-out cross-validation while also achieving improved computational efficiency.

2509.21191 2026-01-23 stat.AP q-bio.QM

Not All Accuracy Is Equal: Prioritizing Independence in Infectious Disease Forecasting

Carson Dudley, Marisa Eisenberg

Comments 5 pages, 2 figures

Abstract

Ensemble forecasts have become a cornerstone of large-scale disease response, underpinning decision making at agencies such as the US Centers for Disease Control and Prevention (CDC). Their growing use reflects the goal of combining multiple models to improve accuracy and stability versus relying on any single model. However, while ensembles regularly demonstrate stability against individual model failures, improved accuracy is not guaranteed. During the COVID-19 pandemic, the CDC's multi-model ensemble outperformed the best single model by only 1%, and CDC flu ensembles have often ranked below individual models. Prior work has established that ensemble performance depends critically on diversity: when models make independent errors, combining them yields substantial gains. In practice, however, this diversity is often lacking. Here, we propose that this is due in part to how models are developed and selected: both modelers and ensemble builders optimize for stand-alone accuracy rather than ensemble contribution, and most epidemic forecasts are built from a small set of approaches trained on the same surveillance data. The result is highly correlated errors, limiting the benefit of ensembling. This suggests that in developing models and ensembles, we should prioritize models that contribute complementary information rather than replicating existing approaches. We present a toy example illustrating the theoretical cost of correlated errors, analyze correlations among COVID-19 forecasting models, and propose improvements to model fitting and ensemble construction that foster genuine diversity. Ensembles built with this principle in mind produce forecasts that are more robust and more valuable for epidemic preparedness and response.
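The cost of correlated errors can be seen in a standard textbook calculation, which is not necessarily the toy example used in the paper. If n unbiased models each have error variance σ² and every pair of errors has correlation ρ, the equal-weight ensemble mean has error variance σ²(1 + (n − 1)ρ)/n. The function name below is hypothetical.

```python
def ensemble_error_variance(n_models, sigma2=1.0, rho=0.0):
    # Variance of the average of n_models errors, each with variance sigma2
    # and common pairwise correlation rho.
    return sigma2 * (1.0 + (n_models - 1) * rho) / n_models

independent = ensemble_error_variance(10, rho=0.0)   # about 0.10
correlated = ensemble_error_variance(10, rho=0.9)    # about 0.91
many_correlated = ensemble_error_variance(1000, rho=0.9)

print(independent, correlated, many_correlated)
```

With independent errors, ten models cut the error variance tenfold. With ρ = 0.9 the reduction is under 10%, and adding more models cannot push the variance below ρσ².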

2508.21804 2026-01-23 stat.ME cs.LG

Considerations for Estimating Causal Effects of Informatively Timed Treatments

Arman Oganisian

Comments Epidemiology, January 15, 2026

Abstract

Epidemiological studies are often concerned with estimating causal effects of a sequence of treatment decisions on survival outcomes. In many settings, treatment decisions do not occur at fixed, pre-specified follow-up times. Rather, timing varies across subjects in ways that may be informative of subsequent treatment decisions and potential outcomes. Awareness of the issue and its potential solutions is lacking in the literature, which motivates this work. Here, we formalize the issue of informative timing, describe the problems associated with ignoring it, and show how g-methods can be used to analyze sequential treatments that are informatively timed. As we describe, in such settings, the waiting times between successive treatment decisions may be properly viewed as time-varying confounders. Using synthetic examples, we illustrate how g-methods that do not adjust for these waiting times may be biased and how adjustment can be done in scenarios where patients may die or be censored in between treatments. We draw connections between adjustment and identification with discrete-time versus continuous-time models. Finally, we provide implementation guidance and examples using publicly available software. Our concluding message is that 1) considering timing is important for valid inference and 2) correcting for informative timing can be done with g-methods that adjust for waiting times between treatments as time-varying confounders.

2508.03636 2026-01-23 stat.ML cs.LG math.ST stat.AP stat.ME stat.TH

Likelihood Matching for Diffusion Models

Lei Qian, Wu Su, Yanqi Huang, Song Xi Chen

Abstract

We propose a Likelihood Matching approach for training diffusion models by first establishing an equivalence between the likelihood of the target data distribution and a likelihood along the sample path of the reverse diffusion. To efficiently compute the reverse sample likelihood, a quasi-likelihood is considered to approximate each reverse transition density by a Gaussian distribution with matched conditional mean and covariance. The score and Hessian functions for the diffusion generation are estimated by maximizing the quasi-likelihood, ensuring a consistent matching of the first two transitional moments between every two time points. A stochastic sampler is introduced to facilitate computation that leverages both the estimated score and Hessian information. We establish consistency of the quasi-maximum likelihood estimation, and provide non-asymptotic convergence guarantees for the proposed sampler, quantifying the rates of the approximation errors due to the score and Hessian estimation, dimensionality, and the number of diffusion steps. Empirical and simulation evaluations demonstrate the effectiveness of the proposed Likelihood Matching and validate the theoretical results.

2507.06775 2026-01-23 cs.LG math.AT stat.ML

Stability, Complexity and Data-Dependent Worst-Case Generalization Bounds

Mario Tuci, Lennart Bastian, Benjamin Dupuis, Nassir Navab, Tolga Birdal, Umut Şimşekli

Comments 29 pages

Abstract

Providing generalization guarantees for stochastic optimization algorithms remains a key challenge in learning theory. Recently, numerous works demonstrated the impact of the geometric properties of optimization trajectories on generalization performance. These works propose worst-case generalization bounds in terms of various notions of intrinsic dimension and/or topological complexity, which were found to empirically correlate with the generalization error. However, most of these approaches involve intractable mutual information terms, which limit a full understanding of the bounds. In contrast, some authors built on algorithmic stability to obtain worst-case bounds involving geometric quantities of a combinatorial nature, which are impractical to compute. In this paper, we address these limitations by combining empirically relevant complexity measures with a framework that avoids intractable quantities. To this end, we introduce the concept of random set stability, tailored for the data-dependent random sets produced by stochastic optimization algorithms. Within this framework, we show that the worst-case generalization error can be bounded in terms of (i) the random set stability parameter and (ii) empirically relevant, data- and algorithm-dependent complexity measures of the random set. Moreover, our framework improves existing topological generalization bounds by recovering previous complexity notions without relying on mutual information terms. Through a series of experiments in practically relevant settings, we validate our theory by evaluating the tightness of our bounds and the interplay between topological complexity and stability.

2504.15211 2026-01-23 cs.AI stat.AP

Embracing Ambiguity: Bayesian Nonparametrics and Stakeholder Participation for Ambiguity-Aware Safety Evaluation

Yanan Long

Comments AAAI 2026 workshop MURE

Abstract

Evaluations of generative AI models often collapse nuanced behaviour into a single number computed for a single decoding configuration. Such point estimates obscure tail risks, demographic disparities, and the existence of multiple near-optimal operating points. We propose a unified framework that embraces multiplicity by modelling the distribution of harmful behaviour across the entire space of decoding knobs and prompts, quantifying risk through tail-focused metrics, and integrating stakeholder preferences. Our technical contributions are threefold: (i) we formalise decoding Rashomon sets, regions of knob space whose risk is near-optimal under given criteria, and measure their size and disagreement; (ii) we develop a dependent Dirichlet process (DDP) mixture with stakeholder-conditioned stick-breaking weights to learn multi-modal harm surfaces; and (iii) we introduce an active sampling pipeline that uses Bayesian deep learning surrogates to explore knob space efficiently. Our approach bridges multiplicity theory, Bayesian nonparametrics, and stakeholder-aligned sensitivity analysis, paving the way for trustworthy deployment of generative models.

2502.15655 2026-01-23 math.ST math.PR stat.ML stat.TH

Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions

Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath

Comments Final version. 60 pages, 7 figures

Abstract

We study the local geometry of empirical risks in high dimensions via the spectral theory of their Hessian and information matrices. We focus on settings where the data, $(Y_\ell)_{\ell =1}^n \in \mathbb{R}^d$, are i.i.d. draws of a $k$-Gaussian mixture model, and the loss depends on the projection of the data onto a fixed number of vectors, namely $\mathbf{x}^\top Y$, where $\mathbf{x}\in \mathbb{R}^{d\times C}$ are the parameters, and $C$ need not equal $k$. This setting captures a broad class of problems such as classification by one- and two-layer networks and regression on multi-index models. We provide exact formulas for the limits of the empirical spectral distribution and outlier eigenvalues and eigenvectors of such matrices in the proportional asymptotics limit, where the number of samples and dimension $n,d\to\infty$ and $n/d=\phi\in (0,\infty)$. These limits depend on the parameters $\mathbf{x}$ only through the summary statistic of the $(C+k)\times (C+k)$ Gram matrix of the parameters and class means, $\mathbf{G} = (\mathbf{x},\boldsymbol{\mu})^\top(\mathbf{x},\boldsymbol{\mu})$. It is known that under general conditions, when $\mathbf{x}$ is trained by online stochastic gradient descent, the evolution of these same summary statistics along training converges to the solution of an autonomous system of ODEs, called the effective dynamics. This enables us to connect the training dynamics to the spectral theory of these matrices generated with test data. We demonstrate our general results by analyzing the effective spectrum along the effective dynamics in the case of multi-class logistic regression. In this setting, the empirical Hessian and information matrices have substantially different spectra, each with their own static and even dynamical spectral transitions.

2502.00214 2026-01-23 stat.ME

A critical evaluation of longitudinal proportional effect models

Michael C. Donohue, Philip S. Insel, Oliver Langford

Comments 9 pages, 4 figures

Abstract

Nonlinear longitudinal proportional effect models have been proposed to improve power and provide direct estimates of the proportional treatment effect in randomized clinical trials. These models assume a fixed proportional treatment effect over time, which can lead to bias and Type I error inflation when the assumption is violated. Even when the proportional effect assumption holds, these models are biased, and their inference is sensitive to the labeling of treatment groups. Typically, this bias favors the active group, inflates Type I error, and can result in one-sided testing. Conversely, the bias can make it more difficult to detect treatment harm, creating a safety concern.

2501.19373 2026-01-23 stat.ML cs.LG

Beyond Fixed Horizons: A Theoretical Framework for Adaptive Denoising Diffusions

Sören Christensen, Jan Kallsen, Claudia Strauch, Lukas Trottner

Abstract

We introduce a new class of generative diffusion models that, unlike conventional denoising diffusion models, achieve a time-homogeneous structure for both the noising and denoising processes, allowing the number of steps to adaptively adjust based on the noise level. This is accomplished by conditioning the forward process using Doob's $h$-transform, which terminates the process at a suitable sampling distribution at a random time. The model is particularly well suited for generating data with lower intrinsic dimensions, as the termination criterion simplifies to a first-hitting rule. A key feature of the model is its adaptability to the target data, enabling a variety of downstream tasks using a pre-trained unconditional generative model. These tasks include natural conditioning through appropriate initialisation of the denoising process and classification of noisy data.

2412.01212 2026-01-23 stat.ML cond-mat.stat-mech cs.CL cs.LG

Berezinskii--Kosterlitz--Thouless transition in a context-sensitive random language model

Yuma Toji, Jun Takahashi, Vwani Roychowdhury, Hideyuki Miyahara

Comments accepted for publication in PRE

Journal ref Phys. Rev. E 113, 015305 (Published 21 January, 2026)

English abstract

Several power-law critical properties involving different statistics in natural languages -- reminiscent of scaling properties of physical systems at or near phase transitions -- have been documented for decades. The recent rise of large language models has added further evidence and excitement by providing intriguing similarities with notions in physics such as scaling laws and emergent abilities. However, specific instances of classes of generative language models that exhibit phase transitions, as understood by the statistical physics community, are lacking. In this work, inspired by the one-dimensional Potts model in statistical physics, we construct a simple probabilistic language model that falls under the class of context-sensitive grammars, which we call the context-sensitive random language model, and numerically demonstrate an unambiguous phase transition in the framework of a natural language model. We explicitly show that a precisely defined order parameter -- that captures symbol frequency biases in the sentences generated by the language model -- changes from strictly zero to a strictly nonzero value (in the infinite-length limit of sentences), implying a mathematical singularity arising when tuning the parameter of the stochastic language model we consider. Furthermore, we identify the phase transition as a variant of the Berezinskii--Kosterlitz--Thouless (BKT) transition, which is known to exhibit critical properties not only at the transition point but also in the entire phase. This finding leads to the possibility that critical properties in natural languages may not require careful fine-tuning nor self-organized criticality, but are generically explained by the underlying connection between language structures and the BKT phases.

2409.15059 2026-01-23 math.ST stat.TH

Multivariate change estimation for a stochastic heat equation from local measurements

Anton Tiepner, Lukas Trottner

Comments 37 pages, 4 figures

English abstract

We study a stochastic heat equation with piecewise constant diffusivity $\theta$ having a jump at a hypersurface $\Gamma$ that splits the underlying space $[0,1]^d$, $d\geq 2$, into two disjoint sets $\Lambda_-\cup\Lambda_+$. Based on multiple spatially localized measurement observations on a regular $\delta$-grid of $[0,1]^d$, we propose a joint M-estimator for the diffusivity values and the set $\Lambda_+$ that is inspired by statistical image reconstruction methods. We study convergence of the domain estimator $\hat{\Lambda}_+$ in the vanishing resolution level regime $\delta\to 0$ and with respect to the expected symmetric difference pseudometric. As a first main finding we give a characterization of the convergence rate for $\hat{\Lambda}_+$ in terms of the complexity of $\Gamma$ measured by the number of intersecting hypercubes from the regular $\delta$-grid. Furthermore, for the special case of domains $\Lambda_+$ that are built from hypercubes from the $\delta$-grid, we demonstrate that perfect identification with overwhelming probability is possible with a slight modification of the estimation approach. Implications of our general results are discussed under two specific structural assumptions on $\Lambda_+$. For a $\beta$-Hölder smooth boundary fragment $\Gamma$, the set $\Lambda_+$ is estimated with rate $\delta^\beta$. If we assume $\Lambda_+$ to be convex, we obtain a $\delta$-rate. While our approach only aims at optimal domain estimation rates, we also demonstrate consistency of our diffusivity estimators, which is strengthened to a CLT at minimax optimal rate for sets $\Lambda_+$ anchored on the $\delta$-grid.

2406.12205 2026-01-23 cs.LG cs.AI cs.IT math.IT math.ST stat.ML stat.TH

On the Exponential Convergence for Offline RLHF with Pairwise Comparisons

Zhirui Chen, Vincent Y. F. Tan

Comments Accepted as an oral presentation at AAAI 2026 (AI Alignment Track)

English abstract

We consider the problem of offline reinforcement learning from human feedback (RLHF) with pairwise comparisons proposed by Zhu et al. (2023), where the implicit reward is a linear function of an unknown parameter. Given an offline dataset, our objective consists in ascertaining the optimal action for each state, with the ultimate goal of minimizing the simple regret. We propose an algorithm, RL with Locally Optimal Weights, or RL-LOW, which yields an exponential form of simple regret of $\exp(-\Omega(n/H))$, where $n$ is the number of data samples and $H$ denotes an instance-dependent hardness quantity that depends explicitly on the suboptimality gap of each action. Furthermore, we derive a first-of-its-kind instance-dependent lower bound in offline RLHF with pairwise comparisons. Interestingly, we observe that the lower and upper bounds on the simple regret match order-wise in the exponent, demonstrating order-wise optimality of RL-LOW. In view of privacy considerations in practical applications, we also extend RL-LOW to the setting of $(\varepsilon,\delta)$-differential privacy and show, somewhat surprisingly, that the hardness parameter $H$ is unchanged in the asymptotic regime as $n$ tends to infinity; this underscores the inherent efficiency of RL-LOW in terms of preserving the privacy of the observed rewards. Given our focus on establishing instance-dependent bounds of exponential convergence, our work fills a gap left by existing studies, which concentrate on establishing worst-case regrets of inverse polynomial convergence (e.g., $\widetilde{O}(\frac{1}{\sqrt{n}})$) for offline RLHF with pairwise comparisons.

2404.18678 2026-01-23 stat.ME

Sequential model confidence sets

Sebastian Arnold, Georgios Gavrilopoulos, Benedikt Schulz, Johanna Ziegel

English abstract

In most prediction and estimation situations, scientists consider various statistical models for the same problem, and naturally want to select amongst the best. Hansen et al. (2011) provide a powerful solution to this problem by the so-called model confidence set, a subset of the original set of available models that contains the best models with a given level of confidence. Importantly, model confidence sets respect the underlying selection uncertainty by being flexible in size. However, they presuppose a fixed sample size, which stands in contrast to the fact that model selection and forecast evaluation are inherently sequential tasks where we successively collect new data and where the decision to continue or conclude a study may depend on the previous outcomes. In this article, we extend model confidence sets sequentially over time by relying on sequential testing methods. Recently, e-processes and confidence sequences have been introduced as new, safe methods for assessing statistical evidence. Sequential model confidence sets allow continuous monitoring of the models' performance and come with time-uniform, nonasymptotic coverage guarantees.

2401.14193 2026-01-23 eess.IV cs.CV stat.AP

Clinical Melanoma Diagnosis with Artificial Intelligence: Insights from a Prospective Multicenter Study

Lukas Heinlein, Roman C. Maron, Achim Hekler, Sarah Haggenmüller, Christoph Wies, Jochen S. Utikal, Friedegund Meier, Sarah Hobelsberger, Frank F. Gellrich, Mildred Sergon, Axel Hauschild, Lars E. French, Lucie Heinzerling, Justin G. Schlager, Kamran Ghoreschi, Max Schlaak, Franz J. Hilke, Gabriela Poch, Sören Korsing, Carola Berking, Markus V. Heppt, Michael Erdmann, Sebastian Haferkamp, Konstantin Drexler, Dirk Schadendorf, Wiebke Sondermann, Matthias Goebeler, Bastian Schilling, Eva Krieghoff-Henning, Titus J. Brinker

English abstract

Early detection of melanoma, a potentially lethal type of skin cancer with high prevalence worldwide, improves patient prognosis. In retrospective studies, artificial intelligence (AI) has proven to be helpful for enhancing melanoma detection. However, there are few prospective studies confirming these promising results. Existing studies are limited by low sample sizes, overly homogeneous datasets, or lack of inclusion of rare melanoma subtypes, preventing a fair and thorough evaluation of AI and its generalizability, a crucial aspect for its application in the clinical setting. Therefore, we assessed 'All Data are Ext' (ADAE), an established open-source ensemble algorithm for detecting melanomas, by comparing its diagnostic accuracy to that of dermatologists on a prospectively collected, external, heterogeneous test set comprising eight distinct hospitals, four different camera setups, rare melanoma subtypes, and special anatomical sites. We advanced the algorithm with real test-time augmentation (R-TTA, i.e. providing real photographs of lesions taken from multiple angles and averaging the predictions), and evaluated its generalization capabilities. Overall, the AI showed higher balanced accuracy than dermatologists (0.798, 95% confidence interval (CI) 0.779-0.814 vs. 0.781, 95% CI 0.760-0.802; p<0.001), obtaining a higher sensitivity (0.921, 95% CI 0.900-0.942 vs. 0.734, 95% CI 0.701-0.770; p<0.001) at the cost of a lower specificity (0.673, 95% CI 0.641-0.702 vs. 0.828, 95% CI 0.804-0.852; p<0.001). As the algorithm exhibited a significant performance advantage on our heterogeneous dataset exclusively comprising melanoma-suspicious lesions, AI may offer the potential to support dermatologists particularly in diagnosing challenging cases.
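
To relate the three reported metrics, the sketch below assumes the usual binary definition of balanced accuracy as the mean of sensitivity and specificity. The abstract does not state which formula the authors used, so this is an assumption, and the helper name `balanced_accuracy` is hypothetical.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity (assumed definition)."""
    return (sensitivity + specificity) / 2.0

# Point estimates quoted in the abstract.
ai = balanced_accuracy(0.921, 0.673)             # AI algorithm
dermatologists = balanced_accuracy(0.734, 0.828)  # dermatologists

print(round(ai, 3), round(dermatologists, 3))
```

Under this assumption the rounded inputs give about 0.797 for the AI and 0.781 for the dermatologists, close to the reported 0.798 and 0.781. The small gap for the AI may come from rounding of the published sensitivity and specificity, or from a different aggregation in the study.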

2309.04746 2026-01-23 stat.ME

Global quantile regression

Tomáš Mrkvička, Konstantinos Konstantinou, Mikko Kuronen, Mari Myllymäki

Comments 44 pages, 12 figures

Journal ref Statistics and Computing 2026; 36, 66

English abstract

Quantile regression is used to study the effects of covariates on a particular quantile of the data distribution. Here we are interested in the question of whether a covariate has any effect on the entire data distribution, i.e., on any of the quantiles. To this end, we treat all the quantiles simultaneously and consider global tests for the existence of the covariate effect in the presence of nuisance covariates. This global quantile regression can be used as an extension of linear regression or as an extension of distribution comparison in the sense of the Kolmogorov-Smirnov test. The proposed method is based on pointwise coefficients, permutations and global envelope tests. The global envelope test serves as the multiple test adjustment procedure under the control of the family-wise error rate and provides a graphical interpretation which automatically shows the quantiles or the levels of a categorical covariate responsible for the rejection. The Freedman-Lane permutation strategy made the test liberal for extreme quantiles; we therefore propose four alternatives that work well even for extreme quantiles and are suitable in different conditions. We present a simulation study to inspect the performance of these strategies, and we apply the chosen strategies to two data examples.

2309.03494 2026-01-23 eess.IV cs.CV stat.AP

Evaluating Deep Learning-based Melanoma Classification using Immunohistochemistry and Routine Histology: A Three Center Study

Christoph Wies, Lucas Schneider, Sarah Haggenmueller, Tabea-Clara Bucher, Sarah Hobelsberger, Markus V. Heppt, Gerardo Ferrara, Eva I. Krieghoff-Henning, Titus J. Brinker

English abstract

Pathologists routinely use immunohistochemical (IHC)-stained tissue slides against MelanA in addition to hematoxylin and eosin (H&E)-stained slides to improve their accuracy in diagnosing melanomas. The use of diagnostic Deep Learning (DL)-based support systems for automated examination of tissue morphology and cellular composition has been well studied in standard H&E-stained tissue slides. In contrast, there are few studies that analyze IHC slides using DL. Therefore, we investigated the separate and joint performance of ResNets trained on MelanA and corresponding H&E-stained slides. The MelanA classifier achieved an area under the receiver operating characteristic curve (AUROC) of 0.82 and 0.74 on out-of-distribution (OOD) datasets, similar to the H&E-based benchmark classification of 0.81 and 0.75, respectively. A combined classifier using MelanA and H&E achieved AUROCs of 0.85 and 0.81 on the OOD datasets. DL MelanA-based assistance systems show the same performance as the benchmark H&E classification and may be improved by multi-stain classification to assist pathologists in their clinical routine.

2306.12674 2026-01-23 stat.ME stat.AP

Mapping poverty at multiple geographical scales

Silvia De Nicolò, Enrico Fabrizi, Aldo Gardini

Comments 22 pages, 7 figures

Journal ref De Nicolò, S., Fabrizi, E., Gardini, A. (2024). Mapping non-monetary poverty at multiple geographical scales. Journal of the Royal Statistical Society Series A: Statistics in Society, 187(4), 1096-1119

English abstract

Poverty mapping is a powerful tool to study the geography of poverty. The choice of the spatial resolution is central, as poverty measures defined at a coarser level may mask their heterogeneity at finer levels. We introduce a small area multi-scale approach integrating survey and remote sensing data that leverages information at different spatial resolutions and accounts for hierarchical dependencies, preserving the coherence of the estimates. We map poverty rates by proposing a Bayesian Beta-based model equipped with a new benchmarking algorithm that accounts for the double-bounded support. A simulation study shows the effectiveness of our proposal, and an application to Bangladesh is discussed.

2107.08950 2026-01-23 stat.ME econ.EM

Mind the Income Gap: Bias Correction of Inequality Estimators in Small-Sized Samples

Silvia De Nicolò, Maria Rosaria Ferrante, Silvia Pacei

Comments 21 pages, 4 figures

Journal ref De Nicolò, S., Ferrante, M. R., Pacei, S. (2024). Small-sample bias correction of inequality estimators in complex surveys. Journal of Official Statistics, 40, 238-261

English abstract

Income inequality estimators are biased in small samples, generally leading to underestimation. This aspect deserves particular attention when estimating inequality in small domains and performing small area estimation at the area level. We propose a bias correction framework for a large class of inequality measures comprising the Gini Index, the Generalized Entropy and the Atkinson index families by accounting for complex survey designs. The proposed methodology is very flexible, as it does not require any parametric assumption on the income distribution. A design-based performance evaluation of our proposal has been carried out using EU-SILC data; the results show a noticeable bias reduction for all the measures. Lastly, an illustrative example of application in small area estimation confirms that ignoring ex-ante bias correction leads to model misspecification.
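
The small-sample underestimation described in the abstract can be seen in a toy simulation. The sketch below uses the plain, unweighted plug-in Gini index on simulated lognormal incomes. It ignores survey weights and complex designs and does not implement the authors' bias correction. The function names, the lognormal income model, and the sample sizes are assumptions made only for this example.

```python
import numpy as np

def gini(incomes):
    """Unweighted plug-in Gini index computed from sorted incomes."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n

def mean_small_sample_gini(n, reps=2000, seed=0):
    """Average the plug-in Gini over many simulated samples of size n."""
    rng = np.random.default_rng(seed)
    draws = [gini(rng.lognormal(0.0, 1.0, n)) for _ in range(reps)]
    return float(np.mean(draws))

# Hypothetical setting: samples of size 10 versus one sample of size 5000.
small = mean_small_sample_gini(10)
large = gini(np.random.default_rng(1).lognormal(0.0, 1.0, 5000))
print(small, large)
```

In this toy setting the average over small samples comes out below the large-sample value, which is the direction of bias the abstract describes.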