arXivDaily: arXiv daily academic digest, updated Monday to Friday
2603.02204 2026-03-03 cs.LG stat.ML

Partial Causal Structure Learning for Valid Selective Conformal Inference under Interventions

Amir Asiaee, Kavey Aryan, James P. Long

Abstract

Selective conformal prediction can yield substantially tighter uncertainty sets when we can identify calibration examples that are exchangeable with the test example. In interventional settings, such as perturbation experiments in genomics, exchangeability often holds only within subsets of interventions that leave a target variable "unaffected" (e.g., non-descendants of an intervened node in a causal graph). We study the practical regime where this invariance structure is unknown and must be learned from data. Our contributions are: (i) a contamination-robust conformal coverage theorem that quantifies how misclassification of "unaffected" calibration examples degrades coverage via an explicit function $g(δ,n)$ of the contamination fraction and calibration set size, providing a finite-sample lower bound that holds for arbitrary contaminating distributions; (ii) a task-driven partial causal learning formulation that estimates only the binary descendant indicators $Z_{a,i}=\mathbf{1}\{i\in\mathrm{desc}(a)\}$ needed for selective calibration, rather than the full causal graph; and (iii) algorithms for descendant discovery via perturbation intersection patterns (differentially affected variable set intersections across interventions), and for approximate distance-to-intervention estimation via local invariant causal prediction. We provide recovery conditions under which contamination is controlled. Experiments on synthetic linear structural equation models (SEMs) validate the bound: under controlled contamination up to $δ=0.30$, the corrected procedure maintains $\ge 0.95$ coverage while uncorrected selective CP degrades to $0.867$. A proof-of-concept on Replogle K562 CRISPR interference (CRISPRi) perturbation data demonstrates applicability to real genomic screens.
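The selective calibration step can be sketched as ordinary split conformal prediction restricted to calibration examples flagged as unaffected. This is a minimal illustration under stated assumptions, not the authors' procedure. It assumes absolute-residual scores and a hypothetical boolean mask `unaffected` (for instance, estimated indicators $Z_{a,i} = 0$). It also omits the contamination correction $g(δ,n)$, whose exact form is not given in the abstract.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]

def selective_interval(y_pred_test, cal_residuals, unaffected, alpha=0.1):
    """Prediction interval calibrated only on examples flagged as 'unaffected'.

    cal_residuals: absolute residuals |y - f(x)| on the calibration set.
    unaffected: hypothetical flags, e.g. estimated descendant indicators equal to 0.
    """
    selected = [r for r, keep in zip(cal_residuals, unaffected) if keep]
    q = conformal_quantile(selected, alpha)
    return (y_pred_test - q, y_pred_test + q)
```

Suppose some flagged examples are in fact descendants of the intervened node (contamination). Their scores are then no longer exchangeable with the test score. This is the source of the coverage loss that the paper's theorem quantifies.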

2603.02195 2026-03-03 stat.AP

Comparative Analysis of Spatiotemporal Volatility Models: An Empirical Study on Financial Network Series

Ariane N. Meli Chrisko, Jessie Li, Philipp Otto, Wolfgang Schmid

Comments 28 pages, 21 figures. Submitted to the Vienna-Copenhagen Conference on Financial Econometrics (2026)

Abstract

Various spatiotemporal and network GARCH models have recently been proposed to capture volatility interactions, such as the transmission of market risk across financial networks. These approaches rely heavily on the specification of the adjacency or spatiotemporal weight matrix, for which several alternatives exist in the literature. This paper evaluates the out-of-sample forecasting performance of a range of spatiotemporal volatility models and multivariate GARCH benchmarks under nine alternative network specifications. The empirical analysis uses daily data for 16 sectorally diversified S&P 500 stocks from 22 December 1998 to 20 October 2024. A one-step-ahead forecasting framework is implemented, and models are assessed using BIC, RMSFE, and MAFE, with forecasts evaluated against a single realised volatility proxy based on squared log-returns. The nine spatial weight matrices reflect diverse economic and statistical relationships, including Granger-filtered and EGARCH-based spillovers. Results show that some spatiotemporal models outperform standard GARCH benchmarks in out-of-sample forecasting accuracy. Notably, the Dynamic Spatiotemporal ARCH model achieves the lowest RMSFE and MAFE across all network specifications at minimal computational cost. Pairwise Diebold-Mariano tests confirm significant differences in predictive accuracy. These findings underscore the value of incorporating spatial structure into volatility modelling as a parsimonious and interpretable alternative for financial network analysis.
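The sketch below illustrates the point-forecast metrics. It computes the RMSFE and MAFE of one-step-ahead variance forecasts against the squared log-return proxy. It assumes that `forecasts[t]` targets the same day as `proxy[t]`. The paper's exact loss scaling, data handling, and Diebold-Mariano tests are not reproduced here.

```python
import math

def squared_log_returns(prices):
    """Volatility proxy r_t^2, with r_t = log(P_t / P_{t-1})."""
    return [math.log(p1 / p0) ** 2 for p0, p1 in zip(prices[:-1], prices[1:])]

def _check(forecasts, proxy):
    if len(forecasts) != len(proxy) or not proxy:
        raise ValueError("need equal-length, non-empty forecast and proxy series")

def rmsfe(forecasts, proxy):
    """Root mean squared forecast error."""
    _check(forecasts, proxy)
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(forecasts, proxy)) / len(proxy))

def mafe(forecasts, proxy):
    """Mean absolute forecast error."""
    _check(forecasts, proxy)
    return sum(abs(f - y) for f, y in zip(forecasts, proxy)) / len(proxy)
```

Because RMSFE squares the errors, a few large volatility spikes weigh more heavily in it than in MAFE. This is one reason to report both.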

2603.02193 2026-03-03 cs.LG cs.AI stat.ML

Symbol-Equivariant Recurrent Reasoning Models

Richard Freinschlag, Timo Bertram, Erich Kobler, Andreas Mayr, Günter Klambauer

Abstract

Reasoning problems such as Sudoku and ARC-AGI remain challenging for neural networks. Recurrent Reasoning Models (RRMs) are a family of structured problem-solving architectures that includes the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM). They offer a compact alternative to large language models, but currently handle symbol symmetries only implicitly, via costly data augmentation. We introduce Symbol-Equivariant Recurrent Reasoning Models (SE-RRMs), which enforce permutation equivariance at the architectural level through symbol-equivariant layers, guaranteeing identical solutions under symbol or color permutations. SE-RRMs outperform prior RRMs on 9x9 Sudoku. Although trained only on 9x9 instances, they also generalize to smaller 4x4 and larger 16x16 and 25x25 instances, to which existing RRMs cannot extrapolate. On ARC-AGI-1 and ARC-AGI-2, SE-RRMs achieve competitive performance with substantially less data augmentation and only 2 million parameters, demonstrating that explicitly encoding symmetry improves the robustness and scalability of neural reasoning. Code is available at https://github.com/ml-jku/SE-RRM.

2603.02191 2026-03-03 math.ST math.AG stat.TH

Algebraic statistics of Hüsler-Reiss graphical models in multivariate extremes

Carlos Améndola, Jane Ivy Coons, Alexandros Grosdos, Frank Röttger

Comments 27 pages, 2 figures

Abstract

The field of extreme value statistics is concerned with modeling and predicting rare events. In a Hüsler-Reiss graphical model, a graph represents extremal conditional independence (CI) relations between random variables. These models are exponential families parameterized by a graph Laplacian and are considered the analogue of multivariate Gaussian models in the extremal setting. We study these models from the perspective of algebraic geometry. Translating the CI relations into polynomial constraints in the parameters, we define extremal CI ideals and find a determinantal representation of their generators. In terms of parametric inference, we study the extremal maximum likelihood degree as the number of solutions to a conditionally negative definite matrix completion problem. We also define and analyze the extremal maximum likelihood threshold for Hüsler-Reiss graphical models. This threshold certifies the existence of a surrogate MLE in terms of the dimensionality of the point configuration that realizes the underlying summary statistic as a Euclidean distance matrix. Throughout, we highlight many similarities to, and differences from, Gaussian graphical models.

2603.02178 2026-03-03 cs.LG cs.AI stat.ML

Reservoir Subspace Injection for Online ICA under Top-n Whitening

Wenjun Xiao, Yuda Bi, Vince D Calhoun

Abstract

Reservoir expansion can improve online independent component analysis (ICA) under nonlinear mixing, yet top-$n$ whitening may discard injected features. We formalize this bottleneck as reservoir subspace injection (RSI): injected features help only if they enter the retained eigenspace without displacing passthrough directions. RSI diagnostics (IER, SSO, $ρ_x$) identify a failure mode in our top-$n$ setting: stronger injection increases IER but crowds out passthrough energy ($ρ_x$: $1.00 \rightarrow 0.77$), degrading SI-SDR by up to 2.2 dB. A guarded RSI controller preserves passthrough retention and recovers mean performance to within 0.1 dB of baseline $1/N$ scaling. With passthrough preserved, RE-OICA improves over vanilla online ICA by +1.7 dB under nonlinear mixing and achieves positive SI-SDR$_{\mathrm{sc}}$ on the tested super-Gaussian benchmark (+0.6 dB).

2603.02160 2026-03-03 stat.ME

Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control

Sarah Organ, Toby Kenney, Hong Gu

Comments 24 pages, 5 figures

Abstract

Controlling the false discovery rate (FDR) in variable selection becomes challenging when predictors are correlated, as existing methods often exclude all members of correlated groups and consequently perform poorly for prediction. We introduce a new setwise variable-selection framework that identifies clusters of potential predictors rather than forcing selection of a single variable. By allowing any member of a selected set to serve as a surrogate predictor, our approach supports strong predictive performance while maintaining rigorous FDR control. We construct sets via hierarchical clustering of predictors based on correlation, then test whether each set contains any non-null effects. Similar clustering and setwise selection have been applied in the familywise error rate (FWER) control regime, but previous research has been unable to overcome the inherent challenges of extending this to the FDR control framework. To control the FDR, we develop substantial generalizations of linear step-up procedures, extending the Benjamini-Hochberg and Benjamini-Yekutieli methods to accommodate the logical dependencies among these composite hypotheses. We prove that these procedures control the FDR at the nominal level and highlight their broader applicability. Simulation studies and real-data analyses show that our methods achieve higher power than existing approaches while preserving FDR control, yielding more informative variable selections and improved predictive models.
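For reference, the sketch below shows the classical Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) linear step-up procedures that the paper generalizes. This is only the textbook version for individual hypotheses. It does not encode the logical dependencies among setwise (composite) hypotheses that the paper's generalization handles.

```python
def linear_step_up(pvalues, q=0.05, method="bh"):
    """Return sorted indices of rejected hypotheses.

    method="bh": Benjamini-Hochberg, thresholds k*q/m.
    method="by": Benjamini-Yekutieli, thresholds k*q/(m * c_m) with c_m = sum_{i<=m} 1/i.
    """
    m = len(pvalues)
    if method == "bh":
        c_m = 1.0
    elif method == "by":
        c_m = sum(1.0 / i for i in range(1, m + 1))
    else:
        raise ValueError("method must be 'bh' or 'by'")
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up: find the LARGEST rank k whose p-value is under its threshold,
    # then reject every hypothesis with rank <= k (even ones above their own threshold).
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / (m * c_m):
            k = rank
    return sorted(order[:k])
```

With p-values (0.02, 0.03, 0.04) and q = 0.05, BH rejects all three, even though the smallest p-value exceeds its own threshold of 0.05/3. The BY correction factor $c_m$ makes the procedure valid under arbitrary dependence, at the cost of power.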

2603.02155 2026-03-03 cs.LG cs.AI math.ST stat.ML stat.TH

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Kaixuan Ji, Qingyue Zhao, Heyang Zhao, Qiwei Di, Quanquan Gu

Abstract

Recent studies have shown that reinforcement learning with KL-regularized objectives can enjoy faster rates of convergence or logarithmic regret, in contrast to the classical $\sqrt{T}$-type regret in the unregularized setting. However, the statistical efficiency of online learning with respect to KL-regularized objectives remains far from completely characterized, even when specialized to multi-armed bandits (MABs). We address this problem for MABs via a sharp analysis of KL-UCB using a novel peeling argument, which yields a $\tilde{O}(ηK\log^2T)$ upper bound: the first high-probability regret bound with linear dependence on $K$. Here, $T$ is the time horizon, $K$ is the number of arms, $η^{-1}$ is the regularization intensity, and $\tilde{O}$ hides all logarithmic factors except those involving $\log T$. The near-tightness of our analysis is certified by the first non-constant lower bound $Ω(ηK \log T)$, which follows from subtle hard-instance constructions and a tailored decomposition of the Bayes prior. Moreover, in the low-regularization regime (i.e., large $η$), we show that the KL-regularized regret for MABs is $η$-independent and scales as $\tilde{Θ}(\sqrt{KT})$. Overall, our results provide a thorough understanding of KL-regularized MABs across all regimes of $η$ and yield nearly optimal bounds in terms of $K$, $η$, and $T$.
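To make the role of $η$ concrete, the sketch below assumes the standard KL-regularized objective $J(π) = \mathbb{E}_{a\simπ}[r(a)] - η^{-1}\,\mathrm{KL}(π \,\|\, π_{\mathrm{ref}})$ with known mean rewards. This objective form and the names `ref_probs` and `eta` are assumptions for illustration. Under them, the maximizer is the Gibbs policy $π^*(a) \propto π_{\mathrm{ref}}(a)\exp(η\, r(a))$, with optimal value $η^{-1}\log\sum_a π_{\mathrm{ref}}(a)\exp(η\, r(a))$. The KL-UCB algorithm analyzed in the paper, which must estimate the rewards online, is not implemented here.

```python
import math

def kl_regularized_optimum(rewards, ref_probs, eta):
    """Gibbs policy maximizing E_pi[r] - (1/eta) * KL(pi || pi_ref); ref_probs must be > 0."""
    logits = [math.log(p) + eta * r for p, r in zip(ref_probs, rewards)]
    m = max(logits)                       # log-sum-exp shift for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    policy = [w / z for w in weights]
    value = (m + math.log(z)) / eta       # optimal regularized objective
    return policy, value

def regularized_objective(policy, rewards, ref_probs, eta):
    """Evaluate E_pi[r] - (1/eta) * KL(pi || pi_ref) for an arbitrary policy."""
    expected = sum(p * r for p, r in zip(policy, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(policy, ref_probs) if p > 0)
    return expected - kl / eta
```

As $η$ grows (weaker regularization), the Gibbs policy concentrates on the best arm and the problem approaches the unregularized bandit. This is the low-regularization regime discussed in the abstract.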

2603.02131 2026-03-03 stat.AP cs.SI physics.soc-ph stat.OT

Socio-Spatial Patterns of Suicide Mortality in the United States

Kushagra Tiwari, M. Amin Rahimian, Marie-Laure Charpignon, Philippe J. Giabbanelli, Praveen Kumar

Comments Code and data: https://github.com/kut97/suicide-sci

Abstract

Suicides cause over 49,000 deaths yearly in the United States, 55% of them involving firearms. Suicide mortality exhibits substantial geographical and sociodemographic heterogeneity, yet the role of social networks remains underexplored. To assess how suicide risk and firearm restriction policies propagate through social ties, we integrate county-level suicide mortality data (2010-2022) with the Facebook Social Connectedness Index (SCI). We also examine Extreme Risk Protection Orders (ERPO), state-level policies restricting firearm access for individuals at risk of self-harm. In two-way fixed effects regressions, a one-standard-deviation increase in the SCI-weighted average suicide mortality rate of connected counties was associated with +2.78 deaths per 100,000 in a focal county. In contrast, a one-standard-deviation increase in ERPO social exposure was associated with -0.214 deaths per 100,000. These associations persisted after adjusting for geographic proximity and including state-by-year fixed effects. They confirm the role of social networks in the diffusion of both harmful exposures and protective interventions.

2603.02102 2026-03-03 cs.SI stat.AP

Political attitudes differ but share a common low-dimensional structure across social media and survey data

Antoine Vendeville, Hiroki Yamashita, Pedro Ramaciotti

Abstract

Does polarization online reflect the state of polarization in society? We study ideological positions and attitudes on several issues in France, a country with documented issue nonalignment. We compare distributions on X/Twitter with a nationally representative sample, focusing on two key properties: ideological polarization and issue alignment. There are significant divergences on individual issues. Nevertheless, the positions of both the X population and the nationally representative sample share a similar two-dimensional structure along two dominant bundles of aligned issues: a Left-Right divide and a Global-Local divide. We then study how our results vary when accounting for key structural parameters of the online public sphere: activity, popularity, and visibility. We find that restricting the sample to increasingly active users raises ideological polarization and shrinks the dimensionality of attitude distributions. The divergence between political attitudes on social media and in survey data is largely mediated by the combination of activity and popularity of social media users: the users who receive the most exposure are also the most representative of the general public. Together, our results shed light on the structural similarities and differences between the political attitudes of social media users and those of the general public.

2603.02069 2026-03-03 cs.LG cs.AI math.OC stat.ML

Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?

Jihwan Kim, Dogyoon Song, Chulhee Yun

Comments Accepted at ICLR 2026, 89 pages, 25 figures

Abstract

We study scaling laws of signSGD under a power-law random features (PLRF) model that accounts for both feature and target decay. We analyze the population risk of a linear model trained with one-pass signSGD on Gaussian-sketched features. We express the risk as a function of model size, training steps, learning rate, and the feature and target decay parameters. Comparing against the SGD risk analyzed by Paquette et al. (2024), we identify a drift-normalization effect and a noise-reshaping effect unique to signSGD. We then obtain compute-optimal scaling laws under the optimal choice of learning rate. Our analysis shows that the noise-reshaping effect can make the compute-optimal slope of signSGD steeper than that of SGD in regimes where noise is dominant. Finally, we observe that the widely used warmup-stable-decay (WSD) schedule further reduces the noise term and sharpens the compute-optimal slope, when feature decay is fast but target decay is slow.
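The sketch below sets the two update rules side by side for a single streaming sample under the squared loss $\tfrac12(x^\top w - y)^2$. It illustrates only the per-step updates. The power-law random features model, Gaussian sketching, and learning-rate schedules such as WSD are not modeled.

```python
def sign(v):
    """Elementwise sign: +1, 0, or -1."""
    return (v > 0) - (v < 0)

def squared_loss_grad(w, x, y):
    """Gradient of 0.5 * (x.w - y)^2 with respect to w."""
    resid = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [resid * xi for xi in x]

def sgd_step(w, x, y, lr):
    """One-pass SGD: step proportional to the gradient."""
    g = squared_loss_grad(w, x, y)
    return [wi - lr * gi for wi, gi in zip(w, g)]

def signsgd_step(w, x, y, lr):
    """One-pass signSGD: step uses only the sign of each gradient coordinate."""
    g = squared_loss_grad(w, x, y)
    return [wi - lr * sign(gi) for wi, gi in zip(w, g)]
```

Each signSGD coordinate moves by exactly `lr` regardless of the gradient's magnitude. This gives an intuitive reason why its risk dynamics, including how gradient noise enters, differ from those of SGD.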

2603.02059 2026-03-03 stat.ML cs.LG

TRAKNN: Efficient Trajectory Aware Spatiotemporal kNN for Rare Meteorological Trajectory Detection

Guillaume Coulaud, Davide Faranda

Abstract

Extreme weather events, such as windstorms and heatwaves, are driven by persistent atmospheric circulation patterns that evolve over several consecutive days. Traditional circulation-based studies often focus on instantaneous atmospheric states. However, capturing the temporal evolution, or trajectory, of these spatial fields is essential for characterizing rare and potentially impactful atmospheric behavior. Performing an exhaustive similarity search on multi-decadal, continental-scale gridded datasets presents significant computational and memory challenges. In this paper, we propose TRAKNN (TRajectory Aware KNN), a fully unsupervised and data-agnostic framework that detects geometrically rare short trajectories in spatio-temporal data using an exact kNN approach. TRAKNN combines a recurrence-based algorithm, which decouples computational complexity from trajectory length, with efficient batch operations that maximize computational intensity. These optimizations enable exhaustive analysis on standard workstations, on either CPU or GPU. We evaluate our approach on 75 years of daily European sea-level pressure data. Our results illustrate that rare trajectories identified by TRAKNN correspond to physically coherent atmospheric anomalies and align with independent extreme-event databases.
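A recurrence can make the cost independent of trajectory length by applying a sliding-window update along each diagonal of the pairwise distance matrix. The sketch below assumes, hypothetically, that the distance between two length-$L$ trajectories is the sum of their per-day squared Euclidean distances. TRAKNN's actual distance, recurrence, batching, and handling of temporally overlapping neighbors may differ.

```python
def pointwise_sq_dist(a, b):
    """Squared Euclidean distance between two daily fields (flattened)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def trajectory_distances(series, L):
    """D[i][j] = sum_{k<L} d(series[i+k], series[j+k]) for all pairs of start days.

    Each diagonal (fixed offset j - i) is swept with a sliding window:
    D(i, j) = D(i-1, j-1) - d(i-1, j-1) + d(i+L-1, j+L-1),
    so after the first window every step costs O(1) pointwise distances, whatever L is.
    """
    n = len(series) - L + 1
    D = [[0.0] * n for _ in range(n)]
    for offset in range(1, n):
        acc = sum(pointwise_sq_dist(series[k], series[k + offset]) for k in range(L))
        D[0][offset] = D[offset][0] = acc
        for i in range(1, n - offset):
            j = i + offset
            acc += (pointwise_sq_dist(series[i + L - 1], series[j + L - 1])
                    - pointwise_sq_dist(series[i - 1], series[j - 1]))
            D[i][j] = D[j][i] = acc
    return D

def knn_rarity(D, k):
    """Mean distance to the k nearest other trajectories; larger means rarer."""
    scores = []
    for i, row in enumerate(D):
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(sum(others[:k]) / k)
    return scores
```

This naive version still visits all $O(n^2)$ pairs. The saving is only that each pair costs $O(1)$ rather than $O(L)$ pointwise distances.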

2603.02043 2026-03-03 cs.LG stat.ML

Leave-One-Out Prediction for General Hypothesis Classes

Jian Qian, Jiachen Xu

Abstract

Leave-one-out (LOO) prediction provides a principled, data-dependent measure of generalization, yet guarantees in fully transductive settings remain poorly understood beyond specialized models. We introduce Median of Level-Set Aggregation (MLSA), a general aggregation procedure based on empirical-risk level sets around the ERM. For arbitrary fixed datasets and losses satisfying a mild monotonicity condition, we establish a multiplicative oracle inequality for the LOO error of the form \[ \mathrm{LOO}_S(\hat{h}) \le C \cdot \frac{1}{n} \min_{h\in H} L_S(h) + \frac{\mathrm{Comp}(S,H,\ell)}{n}, \qquad C>1. \] The analysis is based on a local level-set growth condition that controls how the set of near-optimal empirical-risk minimizers expands as the tolerance increases. We verify this condition in several canonical settings. For classification with VC classes under the 0-1 loss, the resulting complexity scales as $O(d \log n)$, where $d$ is the VC dimension. For finite hypothesis and density classes under bounded or log loss, it scales as $O(\log |H|)$ and $O(\log |P|)$, respectively. For logistic regression with bounded covariates and parameters, a volumetric argument based on the empirical covariance matrix yields complexity scaling as $O(d \log n)$ up to problem-dependent factors.

2603.02010 2026-03-03 cs.LG stat.ML

Noise-Calibrated Inference from Differentially Private Sufficient Statistics in Exponential Families

Amir Asiaee, Samhita Pal

Abstract

Many differentially private (DP) data release systems either output DP synthetic data and leave analysts to perform inference as usual, which can lead to severe miscalibration, or output a DP point estimate without a principled way to do uncertainty quantification. This paper develops a clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing. Our contributions are: (1) a general recipe for approximate-DP release of clipped sufficient statistics under the Gaussian mechanism; (2) asymptotic normality, explicit variance inflation, and valid Wald-style confidence intervals for the plug-in DP MLE; (3) a noise-aware likelihood correction that is first-order equivalent to the plug-in but supports bootstrap-based intervals; and (4) a matching minimax lower bound showing the privacy distortion rate is unavoidable. The resulting theory yields concrete design rules and a practical pipeline for releasing DP synthetic data with principled uncertainty quantification, validated on three exponential families and real census data.
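The sketch below is a toy instance of releasing only a noisy sufficient statistic and then inflating the variance. It uses a Bernoulli model, whose sufficient statistic $\sum_i x_i$ needs no clipping. It assumes replace-one neighboring datasets (sensitivity 1), a public sample size $n$, and the classical Gaussian-mechanism calibration $σ = Δ\sqrt{2\ln(1.25/δ)}/ε$, which is valid for $0<ε<1$. The paper's general clipping recipe, likelihood correction, and bootstrap intervals are not reproduced here.

```python
import math
import random

def gaussian_mechanism_sigma(sensitivity, epsilon, delta):
    """Classical (epsilon, delta)-DP noise scale; this calibration assumes 0 < epsilon < 1."""
    if not 0 < epsilon < 1:
        raise ValueError("classical calibration assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

def dp_bernoulli_wald_ci(xs, epsilon, delta, z=1.96, rng=random):
    """Release a noisy sum of 0/1 data, then form a noise-aware Wald interval."""
    n = len(xs)
    sigma = gaussian_mechanism_sigma(1.0, epsilon, delta)  # replace-one sensitivity of sum(x)
    noisy_sum = sum(xs) + rng.gauss(0.0, sigma)            # the only quantity released
    p_hat = min(max(noisy_sum / n, 0.0), 1.0)              # post-processing: clip to [0, 1]
    # Variance = sampling variance + privacy-noise variance (the explicit inflation term).
    se = math.sqrt(p_hat * (1 - p_hat) / n + sigma ** 2 / n ** 2)
    return p_hat, (p_hat - z * se, p_hat + z * se)
```

Everything after the noisy sum is post-processing, so it consumes no extra privacy budget. Dropping the $σ^2/n^2$ term would give intervals that are too narrow; the abstract describes this kind of miscalibration.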

2603.02003 2026-03-03 stat.ME

Analysis of Stepped-Wedge Randomised Cluster Trial using a generalized pairwise comparison approach: a simulation study

Yohan Bard, Emilie Presles, Marc Buyse, Silvy Laporte, Paul Zufferey, Frederikus A. Klok, Olivier Sanchez, Francis Couturaud, Edouard Ollier

Abstract

Stepped-wedge cluster randomised trials (SW-CRTs) increasingly evaluate complex interventions, yet methodological guidance for analysing composite endpoints using generalized pairwise comparisons (GPC) remains limited. This work investigates the performance of several GPC-based estimators in the presence of clustering, temporal trends, and varying correlation structures typical of SW-CRTs. We conducted an extensive simulation study covering a range of intraclass correlations (ICC), cluster autocorrelation coefficients (CAC), time effects, and treatment effect sizes. Eight analytical approaches were compared, including unadjusted estimators, cluster-stratified win odds, mixed-effects models applied to cluster-period win odds, and probabilistic index models (PIMs). Methods ignoring time or clustering showed strongly compromised Type I error control. Only two approaches consistently maintained nominal error rates: a hierarchical mixed-effects model with sequence and cluster-level random slopes (b4) and a cluster-restricted PIM (c2). These two methods were further evaluated in terms of statistical power. c2 generally showed higher efficiency, particularly under strong clustering, low CAC, or the presence of temporal trends, while both converged to similar performance for large treatment effects. Overall, our findings identify b4 and c2 as the most reliable GPC-based strategies for SW-CRT analysis and provide practical guidance for their application, including for ongoing trials such as ETHER.

2603.01989 2026-03-03 math.ST stat.TH

Wasserstein-based identification of metastable states in time series data via change point detection and segment clustering

David Gentile, Joshua Huang, James M. Murphy

Abstract

Change point detection for time series analysis is a difficult and important problem in applied statistics, and a variety of approaches have been developed for it over the past several decades. Here, the Wasserstein metric is employed as a tool for change point identification in multi-dimensional time series data, in order to identify clusters in time series in an unsupervised way. We leverage the simplicity of the optimal transport cost in the one-dimensional setting to quickly identify both a segmentation (a family of change points for a trajectory) and a clustering of the data when the number of segments is much smaller than the number of data points. The method makes no parametric assumptions about the particular distributions involved. Our change point detection method scales linearly in the size of the data and in the dimension of the samples. We test our approach on idealized synthetic data trajectories, as well as on real-world trajectories from molecular dynamics simulations and underwater acoustics. The time series are segmented at change points obtained by estimating the Wasserstein metric derivative. The resulting segments are then clustered as measures, with similarity measured by the Wasserstein metric. We find that this procedure successfully identifies metastable states in the law of the processes.
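In one dimension the optimal transport cost is cheap to compute. For two equal-size empirical samples, the 1-Wasserstein distance is the mean absolute difference between their sorted values. The sketch below uses this fact to score candidate change points in a univariate series by comparing adjacent windows. The window width `w` and this scoring rule are assumptions, not the paper's Wasserstein-derivative estimator.

```python
def wasserstein1_1d(xs, ys):
    """W1 between two equal-size 1-D empirical distributions (sort and match)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes equal, non-zero sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def window_change_scores(series, w):
    """Score each split point t by W1 between series[t-w:t] and series[t:t+w]."""
    return {t: wasserstein1_1d(series[t - w:t], series[t:t + w])
            for t in range(w, len(series) - w + 1)}
```

Peaks in the score suggest candidate change points. Because windows are compared as distributions rather than through their means, a change in spread alone would also register.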

2603.01981 2026-03-03 stat.AP

Quantifying Uncertainty in Void Swelling Prediction: A Conformal Prediction Framework for Reactor Safety Margins

Minhee Kim, Yong Yang

Abstract

Irradiation-induced void swelling is a critical degradation mechanism for structural materials in nuclear reactors, dictating component operational lifespan and safety. While recent machine learning (ML) approaches have improved the accuracy of swelling rate predictions, they often fail to account for the inherent stochasticity of radiation damage, providing point estimates without rigorous uncertainty quantification. This lack of probabilistic context limits their applications in materials qualification, reactor licensing and risk assessment. In this work, we develop a framework that integrates ensemble ML models with Conformal Prediction (CP) to generate statistically calibrated prediction intervals. Unlike standard error estimation or Bayesian methods that often rely on rigid distributional assumptions, this approach specifically addresses the physical heteroscedasticity of swelling data, where variance transitions from the nucleation-dominated incubation regime to the growth-dominated steady-state regime. We demonstrate that log-transformed conformal prediction inference provides valid empirical coverage consistent with target confidence levels even in sparse data regimes. This framework offers a pathway to replace overly conservative upper-bound curves with Probabilistic Risk Assessment (PRA) tools for high-dose reactor core internals.

2603.01975 2026-03-03 stat.ML cs.NA math.NA

Density-Matrix Spectral Embeddings for Categorical Data: Operator Structure and Stability

Raquel Bosch-Romeu, Antonio Falcó, José-Antonio Rodríguez-Gallego

Abstract

We introduce a supervised dimensionality reduction methodology for categorical (and discretized mixed-type) data based on a density-matrix construction induced by class-conditional frequencies. Given a labeled dataset encoded in a one-hot survey space, we assemble a frequency matrix whose columns aggregate feature occurrences within each class, and define a normalized Gram-type operator that satisfies the axioms of a density matrix. The resulting representation admits an intrinsic rank bound controlled by the number of classes, enabling low-dimensional spectral embeddings via dominant eigenmodes. Classification is performed in the reduced space through class-conditional kernel density estimation and a maximum-likelihood decision rule. We establish structural invariances, provide complexity estimates, and validate the approach on synthetic benchmarks probing high cardinality, sparsity, noise, and class imbalance.

2603.01971 2026-03-03 stat.ML cs.LG

LOCUS: A Distribution-Free Loss-Quantile Score for Risk-Aware Predictions

Matheus Barreto, Mário de Castro, Thiago R. Ramos, Denis Valle, Rafael Izbicki

Comments 9 pages of main text and a 12-page appendix

Abstract

Modern machine learning models can be accurate on average yet still make mistakes that dominate deployment cost. We introduce Locus, a distribution-free wrapper that produces a per-input loss-scale reliability score for a fixed prediction function. Rather than quantifying uncertainty about the label, Locus models the realized loss of the prediction function using any engine that outputs a predictive distribution for the loss given an input. A simple split-calibration step turns this function into a distribution-free interpretable score that is comparable across inputs and can be read as an upper loss level. The score is useful on its own for ranking, and it can optionally be thresholded to obtain a transparent flagging rule with distribution-free control of large-loss events. Experiments across 13 regression benchmarks show that Locus yields effective risk ranking and reduces large-loss frequency compared to standard heuristics.

2603.01951 2026-03-03 cs.LG math.OC stat.ML

Accelerating Single-Pass SGD for Generalized Linear Prediction

Qian Chen, Shihong Ding, Cong Fang

Comments 50 pages

Abstract

We study generalized linear prediction in a streaming setting, where each iteration uses only one fresh data point for a gradient-level update. While momentum is well established in deterministic optimization, a fundamental open question is whether it can accelerate such single-pass, non-quadratic stochastic optimization. We propose the first algorithm that successfully incorporates momentum, via a novel data-dependent proximal method, achieving dual-momentum acceleration. The derived excess risk bound decomposes into three components: an improved optimization error, a minimax optimal statistical error, and a higher-order model-misspecification error. The proof handles misspecification via a fine-grained stationary analysis of the inner updates, while localizing the statistical error through a two-phase outer-loop analysis. As a result, we resolve the open problem posed by Jain et al. [2018a] and demonstrate that momentum acceleration is more effective than variance reduction for generalized linear prediction in the streaming setting.

2603.01943 2026-03-03 stat.ME

A Simulation Study to Compare Inferential Properties when Modelling Ordinal Outcomes: The Case for the (Plain but Robust) Proportional Odds Model

Stefan Inerle, Markus Pauly, Moritz Berger

Abstract

Ordinal measurements are common outcomes in psychology and in the social and behavioral sciences. Choosing an appropriate regression model for analysing such data is difficult. This paper aims to facilitate modeling decisions for quantitative researchers by presenting the results of an extensive simulation study on the inferential properties of common ordinal regression models. These are the proportional odds model, the category-specific odds model, the location-shift model, the location-scale model, and the linear model, which incorrectly treats ordinal outcomes as metric. The simulations were conducted under different data generating processes based on each of the ordinal models and varying parameter configurations within each model class. We examined the bias of parameter estimates as well as type I error rates ($α$-errors) and the power of the statistical testing procedures corresponding to the respective models. Several findings stand out. For parameter estimates, cumulative ordinal regression models exhibited large biases when the true data generating process had large parameter values and a highly skewed outcome distribution. Regarding statistical hypothesis testing, the proportional odds model and the linear model showed the most reliable results. Because of its better fit and interpretability for ordinal outcomes, we recommend the proportional odds model unless there are relevant contraindications.
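To aid comparison between the models, the proportional odds model can be written as $P(Y \le j \mid x) = \mathrm{logit}^{-1}(θ_j - x^\top β)$. Here $θ_1 < \dots < θ_{J-1}$ are ordered thresholds, and a single slope vector $β$ is shared by all cumulative logits. The sketch below computes the implied category probabilities. The sign convention in front of $x^\top β$ varies across software, and the function names are illustrative.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def proportional_odds_probs(x, beta, thresholds):
    """Category probabilities under P(Y <= j | x) = logistic(theta_j - x'beta).

    The same beta enters every cumulative logit: the proportional odds assumption.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cum = [logistic(t - eta) for t in thresholds] + [1.0]   # P(Y <= j), j = 1..J
    return [c - prev for c, prev in zip(cum, [0.0] + cum[:-1])]
```

Under this model, changing $x$ shifts every cumulative log-odds by the same amount $-\Delta x^\top β$. Roughly speaking, the category-specific odds model relaxes this by allowing a separate $β_j$ for each threshold.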

2603.01858 2026-03-03 math.PR math.ST stat.TH

Existence, properties, and parametric inference for possibly hyperuniform Gibbs perturbed lattices

Jean-François Coeurjolly, Christopher Renaud-Chan

Abstract

This work lies at the intersection of Gibbs models and hyperuniform point processes. Classical Gibbs models, whether defined on lattices or in continuous space, provide flexible tools to describe interacting particle systems but are generally not hyperuniform. Conversely, known hyperuniform models such as the Ginibre process or perturbed lattices lack flexibility and typically cannot enforce physically relevant constraints such as hard-core interactions. We introduce a new class of models, termed Gibbs perturbed lattice models, which preserve a lattice structure while allowing interactions through a Hamiltonian defined on the perturbed particle locations. We establish existence results for the associated Gibbs measures, derive DLR-type equilibrium equations, and show that some models in this class exhibit hyperuniformity. Finally, we propose statistical inference methods based on a Takacs-Fiksel-type approach and prove their asymptotic properties.

2602.08212 2026-03-03 stat.ME

Improved Conditional Logistic Regression using Information in Concordant Pairs with Software

Jacob Tennenbaum, Adam Kapelner

Comments 18 pages, 7 tables

Abstract

We develop an improvement to conditional logistic regression (CLR) for the setting where the parameter of interest is the additive effect of a binary treatment on the log-odds of the positive level of a binary response. Concordant response pairs are usually discarded in CLR. Our improvement is simply to use the information they carry about the coefficients of the nuisance control covariates to build an informative prior on those coefficients. This prior is then used in the CLR, which is run on the discordant pairs. Our power improvements over CLR are most notable in small samples and in models where the log-odds of a positive response are nonlinear. Our methods are released in an optimized R package called bclogit.

2601.21895 2026-03-03 cs.CL cs.AI stat.ML

Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text

Hongyi Zhou, Jin Zhu, Kai Ye, Ying Yang, Erhan Xu, Chengchun Shi

Comments Accepted by ICLR 2026

Abstract

Modern large language models (LLMs) such as GPT, Claude, and Gemini have transformed the way we learn, work, and communicate. Yet their ability to produce highly human-like text raises serious concerns about misinformation and academic integrity, creating an urgent need for reliable algorithms to detect LLM-generated content. In this paper, we start by presenting a geometric approach to demystify rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their generalization ability. Building on this insight, we introduce a novel rewrite-based detection algorithm that adaptively learns the distance between the original and rewritten text. Theoretically, we demonstrate that an adaptively learned distance function is more effective for detection than a fixed distance. Empirically, we conduct extensive experiments in over 100 settings and find that our approach outperforms baseline algorithms in the majority of scenarios. In particular, it achieves relative improvements of 54.3% to 75.4% over the strongest baseline across different target LLMs (e.g., GPT, Claude, and Gemini). A Python implementation of our proposal is publicly available at https://github.com/Mamba413/L2D.

2512.13599 2026-03-03 physics.geo-ph stat.ME

Correcting exponentiality test for binned earthquake magnitudes

Angela Stallone, Ilaria Spassiani

Journal ref
Seismica 5.1 (2026) 1-10
Abstract

Above the magnitude of completeness (the minimum threshold for which a 100% detection rate is assumed), earthquake magnitudes are typically modeled by a continuous exponential distribution. In practice, however, earthquake catalogs report magnitudes with finite resolution, resulting in a discrete (geometric) distribution. The Lilliefors test is commonly applied to determine the magnitude of completeness. Because this test assumes continuous data, it is standard practice to add uniform noise to binned magnitudes before testing exponentiality. Here we show analytically that uniform dithering does not recover the underlying continuous exponential distribution from its discretized (geometric) form. Instead, it returns a piecewise-constant residual lifetime distribution whose deviation from the exponential model becomes detectable as catalog size or bin width increases. Through numerical experiments, we demonstrate that this deviation yields a systematic overestimation of the magnitude of completeness, with biases exceeding one magnitude unit in large, high-resolution catalogs. We derive the exact noise distribution, a truncated exponential within each magnitude bin, that correctly restores the continuous exponential distribution over the whole magnitude range. Numerical tests show that this correction yields Lilliefors rejection probabilities consistent with the significance level across a wide range of bin widths and catalog sizes. Although illustrated for the Lilliefors test, the identified bias and the proposed correction are independent of the specific statistical test and apply generally to exponentiality testing of discretized magnitude data.
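The correction can be sketched in NumPy. The key fact is memorylessness: if the bin index of an exponential variable is geometric, adding a within-bin offset drawn from the exponential truncated to one bin width reproduces the continuous exponential exactly, whereas uniform offsets do not. The sketch makes three assumptions:

- magnitudes are reported by their bin centres, with bins of width `bin_width`;
- the completeness level sits on a bin edge;
- the exponential rate `beta` (β = b ln 10) is known, whereas in practice it must be estimated.

The function name is hypothetical and this is not the authors' code.

```python
import numpy as np

def truncated_exp_dither(bin_centres, bin_width, beta, rng=None):
    """Dither binned magnitudes with within-bin truncated-exponential noise.

    bin_centres: reported (binned) magnitudes; bin_width: catalog resolution;
    beta: exponential rate (b * ln 10), assumed known here.
    """
    rng = np.random.default_rng(rng)
    centres = np.asarray(bin_centres, float)
    u = rng.uniform(size=centres.shape)
    # Inverse CDF of Exp(beta) truncated to [0, bin_width)
    offset = -np.log1p(-u * (1.0 - np.exp(-beta * bin_width))) / beta
    # Offset is measured from the bin's lower edge, so results stay inside the bin
    return centres - bin_width / 2.0 + offset

# Check against simulated continuous magnitudes above a completeness level mc.
rng = np.random.default_rng(0)
beta, dm, mc = np.log(10.0), 0.1, 2.0                # b-value of 1, bins of 0.1
m = mc + rng.exponential(1.0 / beta, 200_000)        # continuous magnitudes
centres = mc + (np.floor((m - mc) / dm) + 0.5) * dm  # catalog-style binning
dithered = truncated_exp_dither(centres, dm, beta, rng=1)
print((dithered - mc).mean(), 1.0 / beta)            # sample mean vs exponential mean
```

Replacing the truncated-exponential offset with `rng.uniform(0, dm)` gives the standard uniform dithering that the abstract shows to be biased.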

2512.12046 2026-03-03 cs.LG cs.RO cs.SY eess.SY stat.ML

Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning

Vittorio Giammarino, Ahmed H. Qureshi

Abstract

Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.

2510.22835 2026-03-03 cs.LG stat.CO stat.ML

Clustering by Denoising: Latent plug-and-play diffusion for single-cell data

Dominik Meier, Shixing Yu, Sagnik Nandy, Promit Ghosal, Kyra Gan

Abstract

Single-cell RNA sequencing (scRNA-seq) enables the study of cellular heterogeneity. Yet clustering accuracy, and with it downstream analyses based on cell labels, remains challenging due to measurement noise and biological variability. In standard latent spaces (e.g., obtained through PCA), data from different cell types can be projected close together, making accurate clustering difficult. We introduce a latent plug-and-play diffusion framework that separates the observation and denoising spaces. This separation is operationalized through a novel Gibbs sampling procedure: the learned diffusion prior is applied in a low-dimensional latent space to perform denoising, while, to steer this process, noise is reintroduced into the original high-dimensional observation space. This "input-space steering" ensures that the denoising trajectory remains faithful to the original data structure. Our approach offers three key advantages: (1) adaptive noise handling via a tunable balance between prior and observed data; (2) uncertainty quantification through principled uncertainty estimates for downstream analysis; and (3) generalizable denoising, by leveraging clean reference data to denoise noisier datasets and, via averaging, improving quality beyond the training set. We evaluate robustness on both synthetic and real single-cell genomics data. Our method improves clustering accuracy on synthetic data across varied noise levels and dataset shifts. On real-world single-cell data, our method demonstrates improved biological coherence in the resulting cell clusters, with cluster boundaries that better align with known cell type markers and developmental trajectories.

2510.07088 2026-03-03 stat.ML cs.LG

Fourier Analysis on the Boolean Hypercube via Hoeffding Functional Decomposition

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

Abstract

Fourier analysis on the Boolean hypercube is fundamentally defined as the orthogonal decomposition of the space of pseudo-Boolean functions with respect to the uniform probability measure. In this work, we propose an ANOVA-based generalization of the Fourier decomposition on the Boolean hypercube endowed with an arbitrary probability measure. We provide an explicit decomposition basis that generalizes the Walsh-Hadamard (or parity-function) basis under any arbitrary probability measure on the Boolean hypercube. We formulate the computation of the entire functional decomposition as a least squares problem and also provide a method to address the classical curse of dimensionality. We thus provide a comprehensive generalization of Fourier analysis on the Boolean hypercube, enabling the handling of non-uniform configuration spaces inherent to real-world machine learning tasks, e.g., when dealing with one-hot encoded features. Finally, we demonstrate its practical impact in explainable AI through comparative studies with feature attribution methods such as SHAP and TreeHFD.
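For orientation, the classical uniform-measure case that the paper generalizes can be computed as a least squares problem over the parity (Walsh-Hadamard) basis. The sketch below encodes the hypercube as {-1, +1}^n and enumerates all 2^n points. The design matrix is therefore orthogonal, and the least squares solution gives the exact Fourier coefficients. It does not implement the authors' basis for non-uniform measures, and the function names are hypothetical.

```python
import itertools
import numpy as np

def parity_basis(X, subsets):
    """Columns chi_S(x) = prod_{i in S} x_i for x in {-1, +1}^n (chi_empty = 1)."""
    return np.column_stack([np.prod(X[:, list(S)], axis=1) for S in subsets])

def walsh_fourier_coefficients(f, n):
    """Fourier coefficients of f on {-1, +1}^n under the uniform measure."""
    X = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))  # all 2^n points
    subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]
    y = np.array([f(x) for x in X])
    coef, *_ = np.linalg.lstsq(parity_basis(X, subsets), y, rcond=None)
    return dict(zip(subsets, coef))

coefs = walsh_fourier_coefficients(lambda x: x[0] * x[1] + 0.5 * x[2], n=3)
print({S: round(float(c), 6) for S, c in coefs.items() if abs(c) > 1e-9})
```

Under a non-uniform measure the parity columns are no longer orthogonal, which is the situation the paper's explicit basis and least squares formulation address.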

2510.01339 2026-03-03 cs.CV stat.ML

LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration

Alessio Spagnoletti, Andrés Almansa, Marcelo Pereyra

Comments 30 pages, 16 figures. The Fourteenth International Conference on Learning Representations, ICLR 2026

Abstract

Computational imaging methods increasingly rely on powerful generative diffusion models to tackle challenging image restoration tasks. In particular, state-of-the-art zero-shot image inverse solvers leverage distilled text-to-image latent diffusion models (LDMs) to achieve unprecedented accuracy and perceptual quality with high computational efficiency. However, extending these advances to high-definition video restoration remains a significant challenge, due to the need to recover fine spatial detail while capturing subtle temporal dependencies. Consequently, methods that naively apply image-based LDM priors on a frame-by-frame basis often result in temporally inconsistent reconstructions. We address this challenge by leveraging recent advances in Video Consistency Models (VCMs), which distill video latent diffusion models into fast generators that explicitly capture temporal causality. Building on this foundation, we propose LVTINO, the first zero-shot or plug-and-play inverse solver for high definition video restoration with priors encoded by VCMs. Our conditioning mechanism bypasses the need for automatic differentiation and achieves state-of-the-art video reconstruction quality with only a few neural function evaluations, while ensuring strong measurement consistency and smooth temporal transitions across frames. Extensive experiments on a diverse set of video inverse problems show significant perceptual improvements over current state-of-the-art methods that apply image LDMs frame by frame, establishing a new benchmark in both reconstruction fidelity and computational efficiency. The code is available on GitHub.

2507.16545 2026-03-03 stat.ME

Bayesian Variational Inference for Mixed Data Mixture Models

Junyang Wang, James Bennett, Victor Lhoste, Sarah Filippi

Comments Updated Corollary 5 to include contraction rate

Abstract

Heterogeneous, mixed-type datasets that include both continuous and categorical variables are ubiquitous, and they enrich data analysis by allowing more complex relationships and interactions to be modelled. Mixture models offer a flexible framework for capturing the underlying heterogeneity and relationships in mixed-type datasets. Most current approaches for modelling mixed data either forgo uncertainty quantification and perform only point estimation, or use MCMC, which incurs a high computational cost and does not scale to large datasets. This paper develops a coordinate ascent variational inference (CAVI) algorithm for mixture models on mixed (continuous and categorical) data, which circumvents the high computational cost of MCMC while retaining uncertainty quantification. We demonstrate our approach through simulation studies as well as an applied case study of the NHANES risk factor dataset. We provide theoretical justification for our method by establishing that the CAVI variational posterior mean converges locally to the true parameter value, at a gap of $O(1/n)$ from the maximum likelihood estimator. Building on this result, we show that the CAVI variational posterior contracts around the true parameter at an $O(n^{-1/2})$ rate.

2506.06267 2026-03-03 stat.ME

A causal framework for evaluating the total effect of strategies aiming to expand screening and to improve outcomes

Joy Zora Nakato, Janice Litunya, Brian Beesiga, Jane Kabami, James Ayieko, Moses R. Kamya, Gabriel Chamie, Laura B. Balzer

Comments 20 pages, 3 figures

Abstract

For many health conditions, there are highly efficacious treatment and prevention products. Maximizing their impact requires strategies that improve the reach of health screening in order to establish who could benefit. For example, HIV prevention strategies aim to expand risk screening and to improve uptake of pre-exposure prophylaxis (PrEP) among those experiencing risk. Often, these strategies induce changes at the group level (e.g., health clinics or communities) and are evaluated through cluster randomized trials. This scenario creates a complex problem involving multilevel data, mediation, and missing data, for the following reasons. First, the strategy is delivered at the cluster level, while health screening and outcomes are at the individual level. Second, the strategy improves health outcomes both directly and indirectly through improved health screening. Third, everyone has an underlying status, which is only observed among those screened. To formally define the total effect in such settings, we use Counterfactual Strata Effects: causal estimands where the outcome is only relevant for a group whose membership is subject to missingness and/or impacted by the exposure of interest. To identify and estimate the corresponding statistical estimand, we propose a novel extension of Two-Stage targeted minimum loss-based estimation (TMLE). Simulations demonstrate the practical performance of our approach as well as the limitations of existing approaches.

2501.04134 2026-03-03 stat.ML cs.LG math.OC math.ST stat.TH

Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

Mario Bravo, Juan P. Flores-Mella, Cristóbal Guzmán

Comments 38 pages, 2 figures

Abstract

We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA which are, in some important cases, dimension-free and polylogarithmic in the accuracy, closely matching the existing results in the smooth convex case. Additionally, we establish new upper bounds for the privacy curve of the subsampled noisy SGD algorithm. These bounds show a crucial dependency on the regularity of gradients and are useful for a wide range of convex losses beyond the smooth case. Our analysis relies on a suitable extension of the Privacy Amplification by Iteration (PABI) framework (Feldman et al., 2018; Altschuler and Talwar, 2022, 2023) to noisy iterations whose gradient map is not necessarily nonexpansive. This extension is achieved by designing an optimization problem that accounts for the best possible Rényi divergence bound obtained by an application of PABI, where the tractability of the problem is crucially related to the modulus of continuity of the associated gradient mapping. We show that in several interesting cases, namely the nonsmooth convex, weakly smooth, and (strongly) dissipative cases, this optimization problem can be solved exactly and explicitly, yielding the tightest possible PABI-based bounds.
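For reference, the projected Langevin iteration itself is short. Each step takes a gradient step, adds Gaussian noise of variance 2η, and then projects Euclidean-wise onto the constraint set. This minimal sketch assumes the constraint set is a centred Euclidean ball (so the projection has a closed form) and a fixed step size η. The names are hypothetical, and the mixing-time and privacy analysis is not reproduced.

```python
import numpy as np

def project_onto_ball(x, radius):
    """Euclidean projection onto the centred ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def projected_langevin(grad_f, x0, step, n_steps, radius, rng=None):
    """Projected Langevin algorithm, approximately sampling exp(-f) restricted to the ball.

    x_{k+1} = Proj(x_k - step * grad_f(x_k) + sqrt(2 * step) * xi_k), xi_k ~ N(0, I).
    """
    rng = np.random.default_rng(rng)
    x = project_onto_ball(np.asarray(x0, float), radius)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = project_onto_ball(x - step * grad_f(x) + np.sqrt(2.0 * step) * noise, radius)
        samples[k] = x
    return samples

# Example: f(x) = ||x||^2 / 2, i.e. a standard Gaussian restricted to the unit disc.
samples = projected_langevin(lambda x: x, x0=np.zeros(2), step=0.01, n_steps=5000, radius=1.0, rng=0)
print(samples[1000:].mean(axis=0))  # the target is symmetric about the origin
```

The projection is what makes the gradient map of the composite iteration differ from a plain Langevin step. The abstract's analysis concerns how the regularity of `grad_f` (its modulus of continuity) governs mixing.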

2412.16031 2026-03-03 stat.ML cs.LG math.ST stat.TH

Learning sparsity-promoting regularizers for linear inverse problems

Giovanni S. Alberti, Ernesto De Vito, Tapio Helin, Matti Lassas, Luca Ratti, Matteo Santacesaria

Comments 28 pages, 4 figures

Journal ref
SIAM Journal on Mathematics of Data Science 2026 8:1, 167-199
Abstract

This paper introduces a novel approach to learning sparsity-promoting regularizers for solving linear inverse problems. We develop a bilevel optimization framework to select an optimal synthesis operator, denoted as $B$, which regularizes the inverse problem while promoting sparsity in the solution. The method leverages statistical properties of the underlying data and incorporates prior knowledge through the choice of $B$. We establish the well-posedness of the optimization problem, provide theoretical guarantees for the learning process, and present sample complexity bounds. The approach is demonstrated through theoretical infinite-dimensional examples, including compact perturbations of a known operator and the problem of learning the mother wavelet, and through extensive numerical simulations. This work extends previous efforts in Tikhonov regularization by addressing non-differentiable norms and proposing a data-driven approach for sparse regularization in infinite dimensions.
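For a fixed synthesis operator $B$, the lower-level problem is, under our assumption of the standard synthesis-form objective $\frac12\|ABw - y\|^2 + \lambda\|w\|_1$ with reconstruction $x = Bw$, an $\ell^1$-regularized least squares problem in the coefficients $w$. A minimal finite-dimensional sketch solves it with ISTA (proximal gradient with soft thresholding). It does not implement the bilevel learning of $B$ or the infinite-dimensional setting, and the names are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_synthesis(A, B, y, lam, n_iter=500):
    """Minimise 0.5 * ||A B w - y||^2 + lam * ||w||_1 by ISTA; return (B w, w)."""
    M = A @ B
    L = np.linalg.norm(M, 2) ** 2  # Lipschitz constant of the smooth part's gradient
    w = np.zeros(M.shape[1])
    for _ in range(n_iter):
        w = soft_threshold(w - M.T @ (M @ w - y) / L, lam / L)
    return B @ w, w

# Example with a random forward operator A and a random (not learned) synthesis operator B.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((20, 30)), rng.standard_normal((30, 50))
w_true = np.zeros(50)
w_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = A @ B @ w_true + 0.01 * rng.standard_normal(20)
x_hat, w_hat = ista_synthesis(A, B, y, lam=0.1)
```

In a bilevel scheme, an outer loop would adjust `B` to improve the average reconstruction error of `x_hat` over training pairs. Because the soft-thresholding step is non-differentiable, that outer loop is where the paper's non-differentiable-norm analysis matters.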

2410.23450 2026-03-03 cs.LG cs.AI cs.RO stat.ML

Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

Ruhan Wang, Yu Yang, Zhishuai Liu, Dongruo Zhou, Pan Xu

Comments 26 pages, 11 tables, 8 figures. Published in Transactions on Machine Learning Research (TMLR)

Journal ref
Transactions on Machine Learning Research, 2026
Abstract

We study offline off-dynamics reinforcement learning (RL), using data from an easily accessible source domain to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), particularly Decision Transformer (DT) type frameworks, which predict actions conditioned on desired return guidance and the complete trajectory history. Previous works address the dynamics shift problem by augmenting the reward in source-domain trajectories to match the optimal trajectory in the target domain. However, this strategy cannot be directly applied in RCSL owing to (1) the unique form of the RCSL policy class, which explicitly depends on the return, and (2) the absence of a straightforward representation of the optimal trajectory distribution. We propose the Return Augmented (REAG) method for DT type frameworks, in which we augment the return in the source domain by aligning its distribution with that in the target domain. We provide a theoretical analysis demonstrating that the RCSL policy learned from REAG achieves the same level of suboptimality as would be obtained without a dynamics shift. We introduce two practical implementations, REAG$_\text{Dara}^{*}$ and REAG$_\text{MV}^{*}$. Thorough experiments on D4RL datasets and various DT-type baselines demonstrate that our methods consistently enhance the performance of DT type frameworks in off-dynamics RL.
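The abstract does not spell out how the return distribution is aligned. As one generic illustration only, and not necessarily what REAG$_\text{MV}^{*}$ does, a mean-and-standard-deviation match can be written as an affine map of source-domain returns onto target-domain return statistics. The function name is hypothetical.

```python
import numpy as np

def align_returns_mean_std(source_returns, target_returns):
    """Affinely map source-domain returns onto the target's mean and std (generic sketch)."""
    src = np.asarray(source_returns, float)
    tgt = np.asarray(target_returns, float)
    src_std = src.std()
    if src_std == 0.0:  # degenerate source: collapse onto the target mean
        return np.full_like(src, tgt.mean())
    return tgt.mean() + (tgt.std() / src_std) * (src - src.mean())

rng = np.random.default_rng(0)
source = rng.normal(300.0, 80.0, size=1000)  # plentiful source-domain returns
target = rng.normal(220.0, 40.0, size=50)    # scarce target-domain returns
aligned = align_returns_mean_std(source, target)
```

In a DT-style pipeline, the aligned values would presumably replace the source trajectories' return conditioning. That wiring is an assumption here and is not shown.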

2407.08086 2026-03-03 cs.LG stat.CO stat.ML

The GeometricKernels Package: Heat and Matérn Kernels for Geometric Learning on Manifolds, Meshes, and Graphs

Peter Mostowsky, Vincent Dutordoir, Iskander Azangulov, Noémie Jaquier, Michael John Hutchinson, Aditya Ravuri, Leonel Rozo, Alexander Terenin, Viacheslav Borovitskiy

Journal ref
Journal of Machine Learning Research, 2025
Abstract

Kernels are a fundamental technical primitive in machine learning. In recent years, kernel-based methods such as Gaussian processes have become increasingly important in applications where quantifying uncertainty is of key interest. In settings that involve structured data defined on graphs, meshes, manifolds, or other related spaces, defining kernels with good uncertainty-quantification behavior, and computing their values numerically, is less straightforward than in the Euclidean setting. To address this difficulty, we present GeometricKernels, a Python software package that implements the geometric analogs of the classical Euclidean squared exponential (also known as heat) and Matérn kernels, which are widely used in settings where uncertainty is of key interest. As a byproduct, we obtain the ability to compute Fourier-feature-type expansions, which are widely used in their own right, on a wide set of geometric spaces. Our implementation supports automatic differentiation in every major current framework simultaneously via a backend-agnostic design. In this companion paper to the package and its documentation, we outline the capabilities of the package and present an illustrated example of its interface. We also include a brief overview of the theory the package is built upon and provide some historical context in the appendix.
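To illustrate what the graph versions of these kernels look like, here is a plain NumPy sketch that builds heat and Matérn kernels on a small graph. It works from the eigendecomposition of the combinatorial Laplacian $L$, using the forms $\exp(-\tfrac{\kappa^2}{2}L)$ and $(\tfrac{2\nu}{\kappa^2}I + L)^{-\nu}$ from the graph Matérn Gaussian process literature. It deliberately does not use the GeometricKernels API. The function names, the normalisation to unit average variance, and the choice of Laplacian are our assumptions.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Combinatorial Laplacian D - A of an undirected weighted graph."""
    A = np.asarray(adjacency, float)
    return np.diag(A.sum(axis=1)) - A

def graph_matern_kernel(adjacency, nu, kappa, sigma2=1.0):
    """Matérn kernel (2*nu/kappa**2 * I + L)**(-nu) via the Laplacian eigenpairs."""
    lam, phi = np.linalg.eigh(graph_laplacian(adjacency))
    spectrum = (2.0 * nu / kappa**2 + lam) ** (-nu)
    K = (phi * spectrum) @ phi.T  # sum_i spectrum_i * phi_i phi_i^T
    return sigma2 * K / np.mean(np.diag(K))  # normalise average variance to sigma2

def graph_heat_kernel(adjacency, kappa, sigma2=1.0):
    """Heat (squared-exponential analogue) kernel exp(-kappa**2 / 2 * L)."""
    lam, phi = np.linalg.eigh(graph_laplacian(adjacency))
    K = (phi * np.exp(-0.5 * kappa**2 * lam)) @ phi.T
    return sigma2 * K / np.mean(np.diag(K))

path = np.diag(np.ones(4), 1)
path = path + path.T  # adjacency of a 5-node path graph
K_matern = graph_matern_kernel(path, nu=1.5, kappa=1.0)
K_heat = graph_heat_kernel(path, kappa=1.0)
```

Truncating the eigen-expansion to the smallest Laplacian eigenvalues gives a low-rank, Fourier-feature-style approximation, which is the kind of expansion the abstract mentions.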

2405.00081 2026-03-03 math.PR math.ST stat.ML stat.TH

Imprecise Markov Semigroups and their Ergodicity

Michele Caprio, Mengqi Chen

Abstract

We introduce the concept of an imprecise Markov semigroup \(\mathbf Q\). It is a tool that allows us to represent ambiguity around both the transition probabilities and the invariant measure of a continuous-time Markov process via a collection of Markov semigroups, each associated with a (possibly different) Markov process. We use techniques from topology, geometry, and probability to analyze ergodic limits under model uncertainty encoded by \(\mathbf Q\). We establish long-term bounds that are uniform in the initial state and identify regimes in which the imprecision in these bounds collapses asymptotically. Our results are proved in progressively more general settings. We first assume that \(\mathbf Q\) is compact and that the state space is Euclidean or a Riemannian manifold, working with a fixed bounded observable. We then allow the state space to be standard Borel, while keeping \(\mathbf Q\) compact and the observable fixed. Finally, we drop compactness and work on Polish metric spaces of finite diameter, where we treat arbitrary bounded Lipschitz observables. The importance of our findings for the fields of artificial intelligence and computer vision is also discussed at a high level, in particular for studying how the probability of an output evolves over time as we perturb the input of a convolutional autoencoder.

2404.08480 2026-03-03 cs.LG cs.CL stat.CO

Using ChatGPT for Data Science Analyses

Ozan Evkaya, Miguel de Carvalho

Comments 19 pages with figures and appendix

Journal ref
Harvard Data Science Review, 8(1) (2026)
Abstract

As a result of recent advancements in generative AI, the field of data science is undergoing various changes. The way practitioners construct their data science workflows is now irreversibly shaped by these advancements, particularly by tools like OpenAI's Data Analysis plugin. While it offers powerful support as a quantitative co-pilot, its limitations demand careful consideration in empirical analysis. This paper assesses the potential of ChatGPT for data science analyses, illustrating its capabilities for data exploration and visualization, as well as for commonly used supervised and unsupervised modeling tasks. While we focus here on how the Data Analysis plugin can serve as a co-pilot for data science workflows, its broader potential for automation is implicit throughout.

2401.17961 2026-03-03 math.ST stat.TH

A Bernstein-von Mises Theorem for Generalized Fiducial Distributions

J. E. Borgert, Jan Hannig

Abstract

An established and growing literature on generalized fiducial inference and related fiducial ideas points to the adoption of fiducial inference as a mainstream perspective among modern statisticians. Like Bayesian posteriors, generalized fiducial distributions (GFDs) are known to satisfy Bernstein-von Mises (BvM)-type results under classical regularity conditions. Existing fiducial BvM results, however, rely on relatively restrictive smoothness assumptions and are limited in scope. In this paper, we establish a Bernstein-von Mises theorem for generalized fiducial inference under the general framework of local asymptotic normality, which accommodates non-i.i.d. data settings and reduces to the familiar differentiability in quadratic mean condition in the i.i.d. case. We apply our result to extend existing fiducial theory for free-knot spline models first developed in Sonderegger and Hannig (2014), and further illustrate its generality in models where classical regularity conditions fail or i.i.d. assumptions are not met.

2401.00664 2026-03-03 math.OC cs.LG math.PR math.ST stat.TH

Metric Entropy-Free Sample Complexity Bounds for Sample Average Approximation in Convex Stochastic Programming

Hongcheng Liu, Jindong Tong

Abstract

This paper studies sample average approximation (SAA) for solving convex or strongly convex stochastic programming (SP) problems. In estimating SAA's sample efficiency, the state-of-the-art sample complexity bounds entail metric entropy terms (such as the logarithm of the feasible region's covering number), which often grow polynomially with problem dimensionality. While it has been shown that metric entropy-free complexity rates are attainable under a uniform Lipschitz condition, such an assumption can be overly restrictive for many important SP problem settings. In response, this paper presents metric entropy-free sample complexity bounds for SAA under standard SP assumptions, in the absence of the uniform Lipschitz condition. For a $d$-dimensional problem, the new results often lead to an $O(d)$-improvement in the complexity rate compared with the state of the art. An important revelation of the newly established complexity bounds is that SAA and the canonical stochastic mirror descent (SMD) method, two mainstream solution approaches to SP, entail almost identical rates of sample efficiency, closing a theoretical gap between SAA and SMD, also by a factor of $O(d)$. Furthermore, this paper explores non-Lipschitzian scenarios where SAA maintains provable efficacy but the corresponding results for SMD remain mostly unexplored, indicating SAA's potentially better applicability in some irregular settings. The results of our numerical experiments align with our theoretical findings.
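To make the two approaches compared above concrete, here is a toy one-dimensional sketch. SAA minimizes the sample-average objective, while SMD with the Euclidean mirror map reduces to projected stochastic gradient descent with iterate averaging. The problem $F(x,\xi) = \tfrac12(x-\xi)^2$ on an interval, the step-size schedule, and the function names are hypothetical. Nothing here reflects the dimension dependence the paper studies.

```python
import numpy as np

def saa_solution(xi, lo, hi):
    """SAA for F(x, xi) = 0.5 * (x - xi)^2 on [lo, hi]: the clipped sample mean."""
    return float(np.clip(np.mean(xi), lo, hi))

def smd_solution(xi, lo, hi, c=1.0):
    """Euclidean SMD (projected SGD), one pass, steps c / sqrt(k + 1), averaged iterate."""
    x = min(max(0.0, lo), hi)  # feasible starting point
    running_sum = 0.0
    for k, sample in enumerate(xi):
        # stochastic gradient step on F(., sample), then projection onto [lo, hi]
        x = min(max(x - c / np.sqrt(k + 1) * (x - sample), lo), hi)
        running_sum += x
    return running_sum / len(xi)

rng = np.random.default_rng(0)
xi = rng.normal(0.3, 1.0, size=20_000)  # minimiser of E[F(x, xi)] on [-1, 1] is 0.3
print(saa_solution(xi, -1.0, 1.0), smd_solution(xi, -1.0, 1.0))
```

SAA solves one deterministic problem built from all samples. SMD processes samples one at a time. The paper's result is that, under standard assumptions, their sample efficiencies are nearly identical.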

2310.17624 2026-03-03 math-ph math.MP math.PR math.ST quant-ph stat.TH

The six blinds and the elephant or an interdisciplinary selection of measurement features

Ask Ellingsen, Douglas Lundholm, Jean-Pierre Magnot

Comments 28 pages, 11 figures, 1 table. Proceedings of the XL Workshop on Geometric Methods in Physics, Bialowieza, 2023

Journal ref
In: Kielanowski, P., Beltita, D., Dobrogowska, A., Goliński, T. (eds) Geometric Methods in Physics XL. WGMP 2022. Trends in Mathematics. Birkhäuser, Cham
Abstract

We present here selected current features of measurement problems, based on concerns in our respective fields of research. Their technical similarity across apparently disconnected fields motivates this joint communication. Problems of coherence and consistency, correlation, randomness, and uncertainty are discussed in various fields, including physics, decision theory, and game theory, while the underlying mathematical structures are very similar.

2307.14025 2026-03-03 cs.LG cs.CV eess.IV q-bio.QM stat.ML

Topological Inductive Bias fosters Multiple Instance Learning in Data-Scarce Scenarios

Salome Kazeminia, Carsten Marr, Bastian Rieck

Journal ref
Transactions on Machine Learning Research, 2026
Abstract

Multiple instance learning (MIL) is a framework for weakly supervised classification, where labels are assigned to sets of instances, i.e., bags, rather than to individual data points. This paradigm has proven effective in tasks where fine-grained annotations are unavailable or costly to obtain. However, the effectiveness of MIL drops sharply when training data are scarce, such as for rare disease classification. To address this challenge, we propose incorporating topological inductive biases into the data representation space within the MIL framework. This bias introduces a topology-preserving constraint that encourages the instance encoder to maintain the topological structure of the instance distribution within each bag when mapping instances to the MIL latent space. As a result, our Topology Guided MIL (TG-MIL) method enhances the performance and generalizability of MIL classifiers across different aggregation functions, especially under scarce-data regimes. Our evaluations show average performance improvements of 15.3% for synthetic MIL datasets, 2.8% for MIL benchmarks, and 5.5% for rare anemia classification compared to current state-of-the-art MIL models, where only 17-120 samples per class are available. We make our code publicly available.

2305.04979 2026-03-03 cs.LG cs.DC stat.ML

FedHB: Hierarchical Bayesian Federated Learning

Minyoung Kim, Timothy Hospedales

Abstract

We propose a novel hierarchical Bayesian approach to Federated Learning (FL), in which our model reasonably describes the generative process of clients' local data via hierarchical Bayesian modeling: the clients' local models are random variables governed by a higher-level global variate. Interestingly, variational inference in our Bayesian model leads to an optimisation problem whose block-coordinate descent solution is a distributed algorithm that is separable over clients and never requires them to reveal their private data, making it fully compatible with FL. We also highlight that our block-coordinate algorithm takes particular forms that subsume well-known FL algorithms, including Fed-Avg and Fed-Prox, as special cases. Beyond introducing novel modeling and derivations, we also offer a convergence analysis showing that our block-coordinate FL algorithm converges to a (local) optimum of the objective at the rate of $O(1/\sqrt{t})$, the same rate as regular (centralised) SGD. We further provide a generalisation error analysis proving that the test error of our model on unseen data is guaranteed to vanish as the training data size increases, so the model is asymptotically optimal.
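The abstract states that FedHB subsumes Fed-Avg as a special case, so a minimal sketch of a plain Fed-Avg round may help orient readers. Clients run a few local gradient steps from the current global parameters, and the server averages the results weighted by local data size. The quadratic client losses, step size, and function names are hypothetical. FedHB's block-coordinate variational updates are not shown.

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Server step: average client parameter vectors, weighted by local data size."""
    weights = np.asarray(client_sizes, float)
    params = np.asarray(client_params, float)
    return weights @ params / weights.sum()

def fedavg_round(global_params, client_optima, client_sizes, lr=0.1, local_steps=5):
    """One round with toy client losses 0.5 * ||theta - c_i||^2 (hypothetical)."""
    updated = []
    for c in client_optima:
        theta = np.array(global_params, float)  # each client starts from the global model
        for _ in range(local_steps):
            theta -= lr * (theta - np.asarray(c, float))  # local gradient step
        updated.append(theta)
    return fedavg_aggregate(updated, client_sizes)

theta = np.zeros(2)
for _ in range(200):
    theta = fedavg_round(theta, client_optima=[[0.0, 0.0], [2.0, 4.0]], client_sizes=[1, 3])
print(theta)  # approaches the size-weighted mean of the client optima
```

With these identical-curvature quadratics, the fixed point is exactly the size-weighted mean of the client optima. With heterogeneous client losses, local steps generally bias Fed-Avg away from the global optimum.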

2303.16668 2026-03-03 cs.LG cs.AI cs.CR stat.ML

Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

Edoardo Gabrielli, Dimitri Belli, Zoe Matrullo, Vittorio Miori, Gabriele Tolomei

Abstract

Current defense mechanisms against model poisoning attacks in federated learning (FL) systems have proven effective up to a certain threshold of malicious clients. In this work, we introduce FLANDERS, a novel pre-aggregation filter for FL resilient to large-scale model poisoning attacks, i.e., when malicious clients far exceed legitimate participants. FLANDERS treats the sequence of local models sent by clients in each FL round as a matrix-valued time series. Then, it identifies malicious client updates as outliers in this time series by comparing actual observations with estimates generated by a matrix autoregressive forecasting model maintained by the server. Experiments conducted in several non-iid FL setups show that FLANDERS significantly improves robustness across a wide spectrum of attacks when paired with standard and robust existing aggregation methods.

2303.04945 2026-03-03 quant-ph cs.DS cs.NA math.NA math.ST stat.TH

A Survey of Quantum Alternatives to Randomized Algorithms: Monte Carlo Integration and Beyond

Philip Intallura, Georgios Korpas, Sudeepto Chakraborty, Vyacheslav Kungurtsev, Rufus Lawrence, Ales Wodecki, Jakub Marecek

Abstract

Monte Carlo sampling is a powerful toolbox of algorithmic techniques widely used in applications where some noisy quantity, or a summary statistic of it, must be estimated. In this paper, we survey the literature on implementing Monte Carlo procedures with quantum circuits, focusing on the potential to obtain a quantum advantage in the computational speed of these procedures. We revisit the quantum algorithms that could replace classical Monte Carlo and then consider both existing quantum algorithms and potential quantum realizations that include adaptive enhancements as alternatives to the classical procedure.

2210.17453 2026-03-03 stat.ME stat.AP stat.ML

Adaptive Selection of the Optimal Strategy to Improve Precision and Power in Randomized Trials

Laura B. Balzer, Erica Cai, Lucas Godoy Garraza, Pracheta Amaranath

Comments 27 pages (double spaced); 2 figures; 9 tables

Journal ref
Biometrics, Volume 80, Issue 1, March 2024, ujad034
Abstract

Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: *how* to select the adjustment approach (which variables and in which form) to maximize precision while maintaining Type-I error control. Balzer et al. previously proposed *Adaptive Prespecification* within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials ($N < 40$). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models adjusting for a single covariate. Now, we tailor Adaptive Prespecification to trials with many randomized units. Using $V$-fold cross-validation and the estimated squared influence curve as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. As assessed in simulations exploring a variety of data generating processes, our approach maintains Type-I error control (under the null) and offers substantial gains in precision, equivalent to 20-43% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements overall and within subgroups.

2105.11205 2026-03-03 stat.ME

Deconvolution density estimation with penalised MLE

Yun Cai, Hong Gu, Toby Kenney

Comments 25 pages, 4 figures, Appendix - 30 pages 8 figures

Journal ref
Journal of the American Statistical Association 120 (2025) 1711-1723
Abstract

Deconvolution is the important problem of estimating the distribution of a quantity of interest from a sample contaminated by additive measurement error. Nearly all methods in the literature are based on the Fourier transform because it offers a mathematically neat solution. However, in practice these methods are unstable and produce poor estimates when the signal-to-noise ratio or the sample size is low. In this paper, we develop a new deconvolution method based on maximum likelihood with a smoothness penalty. We show that our new method performs much better than existing methods, particularly for small sample sizes or low signal-to-noise ratios.

2603.01837 2026-03-03 cs.LG stat.ML

Constrained Particle Seeking: Solving Diffusion Inverse Problems with Just Forward Passes

Hongkun Dou, Zike Chen, Zeyu Li, Hongjue Li, Lijun Yang, Yue Deng

Comments Accepted by AAAI 2026

Abstract

Diffusion models have gained prominence as powerful generative tools for solving inverse problems due to their ability to model complex data distributions. However, existing methods typically rely on complete knowledge of the forward observation process to compute gradients for guided sampling, limiting their applicability in scenarios where such information is unavailable. In this work, we introduce Constrained Particle Seeking (CPS), a novel gradient-free approach that leverages all candidate particle information to actively search for the optimal particle while incorporating constraints aligned with high-density regions of the unconditional prior. Unlike previous methods that passively select promising candidates, CPS reformulates the inverse problem as a constrained optimization task, enabling more flexible and efficient particle seeking. We demonstrate that CPS can effectively solve both image and scientific inverse problems, achieving results comparable to gradient-based methods while significantly outperforming gradient-free alternatives. Code is available at https://github.com/deng-ai-lab/CPS.

2603.01825 2026-03-03 cs.LG cs.GT stat.ML

Uncertainty Quantification of Click and Conversion Estimates for the Autobidding

Ivan Zhigalskii, Andrey Pudovikov, Aleksandr Katrutsa, Egor Samosvat

Comments 17 pages (10 main text + 7 appendix), 5 figures, 2 tables

Abstract

Modern e-commerce platforms employ various auction mechanisms to allocate paid slots for a given item. To scale this approach to millions of auctions, platforms offer promotion tools based on autobidding algorithms. These algorithms typically depend on Click-Through-Rate (CTR) and Conversion-Rate (CVR) estimates provided by a pre-trained machine learning model. However, the predictions of such models are uncertain, and this uncertainty can significantly affect the performance of the autobidding algorithm. To address this issue, we propose the DenoiseBid method, which corrects the generated CTRs and CVRs to make the resulting bids more efficient in auctions. The underlying idea of our method is to employ a Bayesian approach and replace noisy CTR or CVR estimates with those from recovered distributions. To demonstrate the performance of the proposed approach, we perform extensive experiments on the synthetic, iPinYou, and BAT datasets. To evaluate the robustness of our approach to the noise scale, we use both synthetic noise and noise estimated from the predictions of the pre-trained machine learning model.

2603.01786 2026-03-03 cs.LG cs.AI stat.ML

Learning Shortest Paths with Generative Flow Networks

Nikita Morozov, Ian Maksimov, Daniil Tiapkin, Sergey Samsonov

Abstract

In this paper, we present a novel learning framework for finding shortest paths in graphs using Generative Flow Networks (GFlowNets). First, we examine theoretical properties of GFlowNets in non-acyclic environments in relation to shortest paths. We prove that, if the total flow is minimized, forward and backward policies traverse the environment graph exclusively along shortest paths between the initial and terminal states. Building on this result, we show that the pathfinding problem in an arbitrary graph can be solved by training a non-acyclic GFlowNet with flow regularization. We experimentally demonstrate the performance of our method on pathfinding in permutation environments and on solving Rubik's Cubes. For the latter problem, our approach achieves solution lengths competitive with state-of-the-art machine learning approaches designed specifically for this task, while requiring a smaller search budget at test time.

2603.01737 2026-03-03 eess.SP math.ST stat.TH

Detection of weak signals under arbitrary noise distributions

J. Zschetzsche, M. Weimar, O. Lang, S. Schuster, A. Haberl, S. Schertler, B. Lehner, J. Reisinger, M. Huemer, S. Rotter

Comments 24 pages, 8 figures, Code available at https://github.com/jonaslindenberger/LRao-detector

Abstract

Detecting weak signals buried in complex, non-Gaussian noise is a fundamental challenge in science and engineering, with applications ranging from radar systems and communications to industrial monitoring and gravitational wave detection. The Rao detector, a key concept in this domain, achieves asymptotically optimal performance as the number of measurements increases, but requires precise knowledge of the data's statistical properties, often relying on simplified noise models. We propose a hybrid framework that combines a lightweight neural network with the Rao detection framework to address this limitation. The neural network, trained on noise-only data, learns the optimal multivariate nonlinearity, transforming noisy data to enhance signal detectability. The newly introduced LRao detector then fully extracts the signal information, achieving asymptotically optimal performance even under challenging noise conditions. Validated on both simulated and real-world magnetic sensor data, our method significantly outperforms conventional approaches. By bridging data-driven techniques with model-based signal processing, it offers a robust and interpretable solution for signal detection across diverse applications.
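For context, the classical Rao detector that LRao builds on can be sketched for a known signal shape $s_n$ in i.i.d. noise with known density $p$: its nonlinearity is the score function $-p'(x)/p(x)$, which LRao instead learns with a neural network. This sketch assumes Laplace noise with known scale $b$, for which the score is $\operatorname{sign}(x)/b$ and the per-sample Fisher information is $1/b^2$; the function name `rao_statistic` and the test signal are hypothetical.

```python
import numpy as np

def rao_statistic(x, s, score, fisher_info):
    """Rao (score) statistic for H0: theta = 0 in x_n = theta * s_n + w_n,
    with i.i.d. noise w_n of known density p.
    score(u) = -p'(u)/p(u); fisher_info = E[score(w)^2]."""
    num = np.sum(s * score(x)) ** 2
    return num / (fisher_info * np.sum(s ** 2))

b = 1.0                                    # Laplace scale (assumed known)
laplace_score = lambda u: np.sign(u) / b   # -p'/p for p(u) proportional to exp(-|u|/b)
laplace_info = 1.0 / b ** 2

rng = np.random.default_rng(0)
n = 2000
s = np.sin(2 * np.pi * 0.01 * np.arange(n))    # known signal shape (assumed)
noise = rng.laplace(0.0, b, n)
T_noise = rao_statistic(noise, s, laplace_score, laplace_info)
T_signal = rao_statistic(0.2 * s + noise, s, laplace_score, laplace_info)
threshold = 3.84   # approx. 95% quantile of chi-square(1), the asymptotic H0 law
```

The sign nonlinearity clips large noise excursions; with a mismatched (e.g. Gaussian, linear) nonlinearity the same detector loses efficiency, which is the gap a learned nonlinearity aims to close.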

2603.01719 2026-03-03 stat.ML cs.LG

Co-optimization for Adaptive Conformal Prediction

Xiaoyi Su, Zhixin Zhou, Rui Luo

Abstract

Conformal prediction (CP) provides finite-sample, distribution-free marginal coverage, but standard conformal regression intervals can be inefficient under heteroscedasticity and skewness. In particular, popular constructions such as conformalized quantile regression (CQR) often inherit a fixed notion of center and enforce equal-tailed errors, which can displace the interval away from high-density regions and produce unnecessarily wide sets. We propose Co-optimization for Adaptive Conformal Prediction (CoCP), a framework that learns prediction intervals by jointly optimizing a center $m(x)$ and a radius $h(x)$. CoCP alternates between (i) learning $h(x)$ via quantile regression on the folded absolute residual around the current center, and (ii) refining $m(x)$ with a differentiable soft-coverage objective whose gradients concentrate near the current boundaries, effectively correcting mis-centering without estimating the full conditional density. Finite-sample marginal validity is guaranteed by split-conformal calibration with a normalized nonconformity score. Theory characterizes the population fixed point of the soft objective and shows that, under standard regularity conditions, CoCP asymptotically approaches the length-minimizing conditional interval at the target coverage level as the estimation error and smoothing vanish. Experiments on synthetic and real benchmarks demonstrate that CoCP yields consistently shorter intervals and achieves state-of-the-art conditional-coverage diagnostics.
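CoCP's validity guarantee comes from a standard split-conformal calibration step with the normalized score $|y - m(x)|/h(x)$. The sketch below shows only that step, with fixed stand-in functions for $m$ and $h$ (assumptions; CoCP would learn them by its alternating optimization, which is not reproduced here). The function name `normalized_split_conformal` is hypothetical.

```python
import numpy as np

def normalized_split_conformal(m, h, x_cal, y_cal, x_test, alpha=0.1):
    """Interval m(x) +/- q * h(x), with q the conformal quantile of the
    normalized calibration scores |y - m(x)| / h(x)."""
    scores = np.abs(y_cal - m(x_cal)) / h(x_cal)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))      # finite-sample rank
    q = np.inf if k > n else np.sort(scores)[k - 1]
    center, radius = m(x_test), q * h(x_test)
    return center - radius, center + radius

rng = np.random.default_rng(0)

def sample(size):
    # Heteroscedastic toy data: noise scale grows with |x|.
    x = rng.uniform(-2, 2, size)
    return x, x + (0.1 + np.abs(x)) * rng.normal(size=size)

m = lambda x: x                  # stand-in for a fitted center m(x)
h = lambda x: 0.1 + np.abs(x)    # stand-in for a fitted radius h(x)
x_cal, y_cal = sample(1000)
x_test, y_test = sample(5000)
lo, hi = normalized_split_conformal(m, h, x_cal, y_cal, x_test, alpha=0.1)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
```

Dividing by $h(x)$ lets a single calibrated multiplier $q$ produce intervals that widen where the noise is large, while the rank $\lceil (n+1)(1-\alpha) \rceil$ preserves marginal coverage whatever $m$ and $h$ are.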

2603.01716 2026-03-03 stat.ME

A spatial scan statistic for categorical, functional data

Camille Frévent, Moustapha Sarr, Sophie Dabo-Niang

Abstract

We have developed and tested a spatial scan statistic for categorical, functional data (CFSS) - a data structure within which current approaches cannot identify spatial clusters. Our methodology combines an encoding scheme for categorical, functional observations with a nonparametric scan statistic. In a simulation study with three distinct scenarios, the CFSS accurately recovered the simulated spatial clusters and gave very low false positive rates, high true positive rates, and high positive predictive values. We have also used the CFSS to identify and characterize spatial clusters in French air pollution data from the winter of 2024.

2603.01715 2026-03-03 stat.ME stat.AP

Power and Sample Size Calculations for Bayes Factors in two-arm clinical Phase II Trials with binary Endpoints

Riko Kelter

Comments 53 pages, 10 figures

Abstract

Bayesian sample size calculations in clinical trials usually rely in practice on complex Monte Carlo simulations. Bounds on Bayesian notions of the false-positive rate and power often lack closed-form or approximate numerical solutions. In this paper, we focus on power and sample size calculations for Bayes factors in the two-arm binomial setting of phase II trials. We cover point-null versus composite and directional hypothesis tests, derive the corresponding Bayes factors, and discuss relevant aspects to consider when pursuing Bayesian design of experiments with the introduced approach. Based on these Bayes factors, we propose a numerical approach that determines the sample size necessary to obtain prespecified bounds on Bayesian power and type-I error rate in a computationally efficient way. Our method does not rely on Monte Carlo simulations; it relies solely on standard numerical methods. Real-world examples of phase II trials from oncology and autoimmune diseases illustrate the advantage of the proposed calibration method. In summary, our approach allows for a Bayes-frequentist compromise by providing a Bayesian analogue to a frequentist power analysis for various Bayes factors in the two-arm binomial setting of a phase II clinical trial. The methods are implemented in our R package bfbin2arm.
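As an illustration of simulation-free design, the sketch below computes a Bayes factor for $H_0: p_1 = p_2$ (common $p \sim \mathrm{Beta}(a, b)$) against independent $\mathrm{Beta}(a, b)$ priors under $H_1$, then finds the smallest per-arm sample size by exact enumeration over all binomial outcomes. The Beta(1, 1) priors, the evidence threshold $k = 3$, evaluating operating characteristics at fixed response rates, and all function names are assumptions; none of this is taken from the bfbin2arm package.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bf10(y1, n1, y2, n2, a=1.0, b=1.0):
    """BF10 for H1: p1, p2 iid Beta(a, b) vs H0: p1 = p2 ~ Beta(a, b).
    The binomial coefficients cancel and are omitted."""
    log_m1 = (log_beta(a + y1, b + n1 - y1) + log_beta(a + y2, b + n2 - y2)
              - 2 * log_beta(a, b))
    log_m0 = log_beta(a + y1 + y2, b + n1 + n2 - y1 - y2) - log_beta(a, b)
    return exp(log_m1 - log_m0)

def binom_pmf(y, n, p):
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

def prob_bf_exceeds(n, p1, p2, k=3.0, a=1.0, b=1.0):
    """P(BF10 > k) for y1 ~ Bin(n, p1), y2 ~ Bin(n, p2), by exact enumeration."""
    total = 0.0
    for y1 in range(n + 1):
        for y2 in range(n + 1):
            if bf10(y1, n, y2, n, a, b) > k:
                total += binom_pmf(y1, n, p1) * binom_pmf(y2, n, p2)
    return total

def smallest_n(p1, p2, p0, k=3.0, power=0.8, alpha=0.05, n_max=200):
    """Smallest per-arm n with P(BF10 > k) >= power at (p1, p2)
    and P(BF10 > k) <= alpha when both arms have rate p0."""
    for n in range(2, n_max + 1):
        if (prob_bf_exceeds(n, p1, p2, k) >= power
                and prob_bf_exceeds(n, p0, p0, k) <= alpha):
            return n
    return None

n_found = smallest_n(0.2, 0.6, p0=0.4)
```

Enumeration costs $O(n^2)$ Bayes factor evaluations per candidate $n$, which is cheap for phase II sizes and gives exact probabilities instead of Monte Carlo estimates.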

2603.01670 2026-03-03 math.PR math.ST stat.ML stat.TH

Statistical Consistency of Discrete-to-Continuous Limits of Determinantal Point Processes

Hugo Jaquard, Nicolas Keriven

Abstract

We investigate the limiting behavior of discrete determinantal point processes (DPPs) towards continuous DPPs when the size of the set to sample from goes to infinity. We propose a non-asymptotic characterization of this limit in terms of the concentration of statistics associated with these processes, which we refer to as "weak coherency". This allows one to translate statistical guarantees from the limiting process to the original, discrete one. Our main result describes sufficient conditions for weak coherency to hold. In particular, our study encompasses settings where both the kernel of the continuous process and its underlying space are inaccessible, or where the discrete marginal kernel is a noisy version of its continuous counterpart. We illustrate our theory on several examples. We prove that a discrete multivariate orthogonal polynomial ensemble can be used to produce coresets strictly smaller than independent sampling for the same error. We propose a process achieving repulsive sampling on an unknown manifold from a set of points sampled from an unknown density. Finally, we show that continuous DPPs can be obtained as limits on random graphs with Bernoulli edges, even when only observing the graph structure. We obtain interesting byproduct results along the way.

2603.01653 2026-03-03 stat.AP

Probabilistic forecasting of weather-driven faults in electricity networks: a flexible approach for extreme and non-extreme events

Mateus Maia, Daniela Castro-Camilo, Jethro Browell

Abstract

Electricity networks are vulnerable to weather damage, with severe events often leading to faults and power outages. Timely forecasts of fault occurrences, ranging from nowcasts to several days ahead, can enhance preparedness, support faster response, and reduce outage durations. To be operationally useful, such forecasts must quantify uncertainty, enabling risk-informed resource allocation. We present a novel probabilistic framework for forecasting fault counts that captures both typical and extreme events. Non-extreme faults are modeled by linearly interpolating estimates from multiple additive quantile regressions, while extreme events are described through a discrete generalized Pareto distribution. To incorporate the impact of weather fluctuations, we use ensemble numerical weather predictions, which help to quantify uncertainty in the forecasts. This approach is designed to provide reliable fault predictions up to four days ahead. We evaluate the model through numerical experiments and apply it to historical fault data from two electricity distribution networks in Great Britain. The resulting forecasts demonstrate substantial improvements over business-as-usual and alternative modeling approaches. A practitioner trial conducted with Scottish Power Energy Networks from October 2024 to March 2025 further demonstrates the operational value of the forecasts. Engineers found them sufficiently reliable to inform decision-making, offering benefits to both network operators and electricity consumers.
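The tail component can be illustrated with one common definition of the discrete generalized Pareto distribution, obtained by discretizing a continuous GPD on the integers: $P(Y \ge y) = (1 + \xi y/\sigma)_+^{-1/\xi}$ for $y = 0, 1, 2, \dots$, so that $P(Y = y) = P(Y \ge y) - P(Y \ge y + 1)$. The threshold, the parameter values, and the function names are hypothetical, and the quantile-regression body of the model is not shown.

```python
import numpy as np

def dgpd_survival(y, sigma, xi):
    """P(Y >= y), y = 0, 1, 2, ..., for a discrete GPD obtained by
    discretizing a continuous GPD(sigma, xi) on the integers."""
    y = np.asarray(y, dtype=float)
    if xi == 0.0:
        return np.exp(-y / sigma)                   # geometric limit
    base = np.maximum(1.0 + xi * y / sigma, 0.0)    # bounded support if xi < 0
    return base ** (-1.0 / xi)

def dgpd_pmf(y, sigma, xi):
    y = np.asarray(y, dtype=float)
    return dgpd_survival(y, sigma, xi) - dgpd_survival(y + 1, sigma, xi)

# Exceedances Y = N - u over a hypothetical threshold of u = 20 faults.
u, sigma, xi = 20, 3.0, 0.2
k = np.arange(0, 1000)
pmf = dgpd_pmf(k, sigma, xi)
p_at_least_30 = dgpd_survival(30 - u, sigma, xi)    # P(N >= 30 | N >= u)
```

Defining the pmf through differences of a survival function keeps it automatically non-negative and summing to one, and a positive shape $\xi$ gives the heavy tail needed for rare, severe storm days.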

2603.01588 2026-03-03 cs.LG stat.ML

Jump Like A Squirrel: Optimized Execution Step Order for Anytime Random Forest Inference

Daniel Biebert, Christian Hakert, Kay Heider, Daniel Kuhse, Sebastian Buschjäger, Jian-Jia Chen

Abstract

Due to their efficiency and small size, decision trees and random forests are popular machine learning models used for classification on resource-constrained systems. In such systems, the available execution time for inference in a random forest might not be sufficient for a complete model execution. Ideally, the prediction confidence gained so far should be retained. An anytime algorithm can be aborted at any time and still returns a result whose quality increases with the time spent. Previous approaches have realized random forests as anytime algorithms at the granularity of trees, stopping after some but not all trees of a forest have been executed. However, because decision trees subdivide the sample space at every step, every additional step within a tree increases prediction quality. In this paper, we realize decision trees and random forests as anytime algorithms at the granularity of single steps in trees. This approach opens a design space for defining the step order in a forest, which has the potential to optimize the mean accuracy. We propose the Optimal Order, which finds a step order with maximal mean accuracy in exponential runtime, and the polynomial-runtime heuristics Forward Squirrel Order and Backward Squirrel Order, which greedily maximize the accuracy for each additional step taken down and up the trees, respectively. Our evaluation shows that the Backward Squirrel Order performs $\sim94\%$ as well as the Optimal Order and $\sim99\%$ as well as all other step orders.

2603.01514 2026-03-03 cs.LG stat.ML

Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning

Gautam Goel, Mahdi Soltanolkotabi, Peter Bartlett

Abstract

We study the training dynamics of gradient descent in a softmax self-attention layer trained to perform linear regression and show that a simple first-order optimization algorithm can converge to the globally optimal self-attention parameters at a geometric rate. Our analysis proceeds in two steps. First, we show that in the infinite-data limit the regression problem solved by the self-attention layer is equivalent to a nonconvex matrix factorization problem. Second, we exploit this connection to design a novel "structure-aware" variant of gradient descent which efficiently optimizes the original finite-data regression objective. Our optimization algorithm features several innovations over standard gradient descent, including a preconditioner and regularizer which help avoid spurious stationary points, and a data-dependent spectral initialization of parameters which lie near the manifold of global minima with high probability.

2603.01468 2026-03-03 stat.ME stat.ML

Wild Bootstrap Inference for Non-Negative Matrix Factorization with Random Effects

Kenichi Satoh

Abstract

Non-negative matrix factorization (NMF) is widely used for parts-based representations, yet formal inference for covariate effects is rarely available when the basis is learned under non-negativity. We introduce non-negative matrix factorization with random effects (NMF-RE), a mean-structure latent-variable model $Y=X(ΘA+U)+\mathcal{E}$ that combines covariate-driven scores with unit-specific deviations. Random effects act as a working device for modeling heterogeneity and controlling complexity; we monitor their effective degrees of freedom and enforce a df-based cap to prevent near-saturated fits. Estimation alternates closed-form ridge (BLUP-like) updates for $U$ with multiplicative non-negative updates for $X$ and $Θ$. For inference on $Θ$, we condition on $(\widehat X,\widehat U)$ and obtain fast uncertainty quantification via asymptotic linearization, a one-step Newton update, and a multiplier (wild) bootstrap; this avoids repeated constrained re-optimization. Simulations include a targeted stress test showing that, without df control, the random-effects penalty can collapse and inference for $Θ$ becomes degenerate, whereas the df-cap prevents this failure mode. The non-negativity constraint induces sparse, parts-based loadings -- a measurement-side variable selection -- while inference on $Θ$ identifies which covariates affect which components, providing covariate-side selection. Longitudinal, psychometric, spatial-flow, and text examples further illustrate stable, interpretable covariate-effect inference.

2603.01434 2026-03-03 math.ST q-fin.RM stat.TH

A Laplace-based perspective on conditional mean risk sharing

Christopher Blier-Wong

Abstract

The conditional mean risk-sharing (CMRS) rule is an important tool for distributing aggregate losses across individual risks, but its implementation in continuous multivariate models typically requires complicated multidimensional integrals. We develop a framework to compute CMRS allocations from the joint Laplace-Stieltjes transform (LST) of the risk vector. The LSTs of the allocation measures $ν_i(B)=\mathbb{E}[X_i\boldsymbol{1}_{\{S\in B\}}]$ are expressed as partial derivatives of the joint LST evaluated on the diagonal $t_1=\cdots=t_n$. When densities exist, this yields one-dimensional Laplace inversions for $f_S$ and $ξ_i$, and hence $h_i(s)=ξ_i(s)/f_S(s)$ on the absolutely continuous part, providing closed-form or semi-analytic solutions for a broad class of distributions. We also develop numerical inversion methods for cases where analytic inversion is unavailable. We introduce an exponential tilting procedure to stabilize numerical inversion in low-probability aggregate events. We provide several examples to illustrate the approach, including some high-dimensional settings where existing approaches are infeasible.
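A worked example of the diagonal-derivative identity, assuming independent gamma risks $X_i \sim \mathrm{Gamma}(\alpha_i, \beta)$ with a common rate $\beta$ and $\alpha = \sum_i \alpha_i$. This case is chosen only because every step inverts in closed form; $\xi_i$ denotes the density of $\nu_i$.

```latex
\begin{aligned}
\mathcal{L}(t_1,\dots,t_n) &= \mathbb{E}\big[e^{-\sum_j t_j X_j}\big]
  = \prod_{j=1}^{n}\Big(\frac{\beta}{\beta+t_j}\Big)^{\alpha_j},\\
\int_0^\infty e^{-ts}\,\nu_i(ds) &= \mathbb{E}\big[X_i e^{-tS}\big]
  = -\frac{\partial \mathcal{L}}{\partial t_i}\Big|_{t_1=\cdots=t_n=t}
  = \frac{\alpha_i}{\beta+t}\Big(\frac{\beta}{\beta+t}\Big)^{\alpha}
  = \frac{\alpha_i}{\beta}\Big(\frac{\beta}{\beta+t}\Big)^{\alpha+1},\\
\xi_i(s) &= \frac{\alpha_i}{\beta}\cdot\frac{\beta^{\alpha+1}s^{\alpha}e^{-\beta s}}{\Gamma(\alpha+1)},
\qquad
f_S(s) = \frac{\beta^{\alpha}s^{\alpha-1}e^{-\beta s}}{\Gamma(\alpha)},\\
h_i(s) &= \frac{\xi_i(s)}{f_S(s)} = \frac{\alpha_i}{\alpha}\,s .
\end{aligned}
```

The result recovers the familiar proportional allocation for this family, and the allocations $h_1(s), \dots, h_n(s)$ sum to $s$, as any conditional mean allocation must.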

2603.01428 2026-03-03 stat.AP astro-ph.IM

A Hybrid Particle Gaussian Mixture Filtering Method for Cislunar Orbit Determination Under Extreme Uncertainty

Ishan Paranjape, Tarun Hejmadi, Utkarsh Ranjan Mishra, Suman Chakravorty

Comments 8 pages; submitted to the 2026 IEEE FUSION conference

Abstract

Gauss's method of orbit determination (OD) and its variants are among the most popular initial state estimation techniques for astronomers and engineers alike. However, owing to its two-body assumptions, Gauss's method is inapplicable in the cislunar domain, where three-body effects dominate. We introduce a hybrid Particle Gaussian Mixture filtering method, a purely recursive probabilistic orbit determination framework based on a combination of the Markov chain Monte Carlo-based Particle Gaussian Mixture-II (PGM-II) and Particle Gaussian Mixture-I (PGM-I) filters. This method enables us to fuse probabilistic information with angles-only observations from terrestrial telescopes for short- and long-term cislunar target tracking. We demonstrate this technique on an important cislunar orbit regime.

2603.01402 2026-03-03 stat.ME

Wrapped flat-top kernel density estimation with circular data

Yasuhito Tsuruta

Comments Accepted for publication in Statistical Papers

Abstract

Kernel density estimators for circular data have been studied extensively for decades, as they allow flexible estimation even when the shape of the underlying density is complex. Many recent studies have examined bias correction methods; however, the order of these methods limits how far the convergence rate of the bias can be improved, even when the true density is sufficiently smooth. To overcome this limitation, the present study considers a new bias correction approach based on the characteristic function of the underlying circular density. We introduce wrapped flat-top kernels, which are generated by wrapping the standard flat-top kernels defined on the real line onto the circumference of the unit circle. We then derive the asymptotic mean squared errors of the wrapped flat-top kernel density estimators. The results show that these estimators converge faster than previously introduced estimators. Furthermore, wrapped flat-top kernel density estimators achieve $\sqrt{n}$-consistency when the characteristic function has finite support, as for the circular uniform and cardioid distributions. We confirm these theoretical results through numerical experiments. In empirical analyses, we also show that wrapped flat-top kernel density estimators effectively capture the shape of the data. Therefore, such estimators are expected to allow flexible and accurate estimation in circular data analysis.
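A minimal sketch, assuming a trapezoidal flat-top characteristic function $\lambda(t)$ that equals 1 for $|t| \le c$ and decreases linearly to 0 at $|t| = 1$. Wrapping a real-line kernel with characteristic function $\lambda$ and bandwidth $h$ onto the circle yields an estimator whose Fourier coefficients are the empirical trigonometric moments tapered by $\lambda(hk)$, so it can be evaluated as a finite Fourier series. The values of `c` and `h` and the function names are illustrative. Like other bias-reducing kernels, the estimate can dip below zero in sparse regions.

```python
import numpy as np

def flat_top_taper(t, c=0.5):
    """Trapezoidal flat-top characteristic function:
    1 on |t| <= c, linear down to 0 at |t| = 1, and 0 beyond."""
    t = np.abs(t)
    return np.clip((1.0 - t) / (1.0 - c), 0.0, 1.0)

def wrapped_flat_top_kde(data, theta, h, c=0.5):
    """f_hat(theta) = (1/2pi) * [1 + 2 * sum_k lambda(h k) Re(phi_hat_k e^{-i k theta})]."""
    data, theta = np.asarray(data), np.asarray(theta)
    K = int(np.floor(1.0 / h))                      # lambda(h k) = 0 for k > 1/h
    k = np.arange(1, K + 1)
    phi_hat = np.exp(1j * np.outer(k, data)).mean(axis=1)   # empirical trig moments
    coef = flat_top_taper(h * k, c) * phi_hat
    series = (coef[:, None] * np.exp(-1j * np.outer(k, theta))).real.sum(axis=0)
    return (1.0 + 2.0 * series) / (2.0 * np.pi)

rng = np.random.default_rng(0)
data = rng.vonmises(1.0, 2.0, size=500)             # sample centred at 1 rad
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
f_hat = wrapped_flat_top_kde(data, theta, h=0.2)
mass = f_hat.mean() * 2.0 * np.pi                   # Riemann sum over the circle
mode = theta[np.argmax(f_hat)]
```

Because the taper is exactly 1 for low frequencies, those trigonometric moments are used without shrinkage, which is the source of the reduced bias; when the true characteristic function vanishes beyond a finite frequency, as for the cardioid, the only remaining error is the sampling noise in $\hat\varphi_k$.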

2603.01378 2026-03-03 stat.ME

Integration of Individual Participant and Aggregate Data Under Dataset Shift: Summary Statistic Comparison and Scalable Computation

Ming-Yueh Huang, Jing Qin, Chiung-Yu Huang

Abstract

Integrated IPD-AD analysis, which combines individual participant data (IPD) with aggregate data (AD), is increasingly recognized as an effective strategy for generating more reliable and generalizable inferences from heterogeneous studies. While most existing work has focused on algorithmic approaches, this paper investigates a complementary yet underexplored question: how different forms of AD influence the efficiency of data integration. Working within a constrained maximum likelihood estimation framework, we compare commonly reported summary statistics and show that subgroup-specific summaries can substantially improve estimation efficiency. In particular, we find that AD derived from outcome-stratified subgroups (e.g., cases and controls) consistently yield greater efficiency gains than those based on covariate-stratified subgroups (e.g., age or exposure categories), especially when the outcome is continuous. Although outcome-stratified summaries are commonly reported for discrete outcomes, they are rarely provided when the outcome is continuous. Our findings therefore support the routine inclusion of outcome-stratified summaries for continuous endpoints in trial reports and public data repositories to facilitate more efficient evidence synthesis. We further extend the constrained maximum likelihood framework to accommodate dataset shift and develop a fast, non-iterative estimation procedure to improve numerical stability and scalability. We illustrate the proposed methodology with two applications: an analysis of income data under covariate shift and an analysis of housing data under prior probability shift.

2603.01376 2026-03-03 cs.LG stat.ML

3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs

Mehdi Makni, Xiang Meng, Rahul Mazumder

Comments The Thirty-ninth Annual Conference on Neural Information Processing Systems

Abstract

Sparse plus Low-Rank $(\mathbf{S} + \mathbf{LR})$ decomposition of Large Language Models (LLMs) has emerged as a promising direction in model compression, aiming to decompose pre-trained model weights into a sum of sparse and low-rank matrices $(\mathbf{W} \approx \mathbf{S} + \mathbf{LR})$. Despite recent progress, existing methods often suffer from substantial performance degradation compared to dense models. In this work, we introduce 3BASiL-TM, an efficient one-shot post-training method for $(\mathbf{S} + \mathbf{LR})$ decomposition of LLMs that addresses this gap. Our approach first introduces a novel 3-Block Alternating Direction Method of Multipliers (ADMM) method, termed 3BASiL, to minimize the layer-wise reconstruction error with convergence guarantees. We then design an efficient transformer-matching (TM) refinement step that jointly optimizes the sparse and low-rank components across transformer layers. This step minimizes a novel memory-efficient loss that aligns outputs at the transformer level. Notably, the TM procedure is universal, as it can enhance any $(\mathbf{S} + \mathbf{LR})$ decomposition, including pure sparsity. Our numerical experiments show that 3BASiL-TM reduces the WikiText2 perplexity gap relative to the dense LLaMA-8B model by over 30% under a (2:4 Sparse + 64 LR) configuration, compared to prior methods. Moreover, our method achieves over 2.5x faster compression runtime on an A100 GPU than the state-of-the-art $(\mathbf{S} + \mathbf{LR})$ method. Our code is available at https://github.com/mazumder-lab/3BASiL.
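To make the $(\mathbf{S} + \mathbf{LR})$ objective concrete, the sketch below runs plain alternating projections on a single weight matrix: a truncated SVD for the low-rank part and a 2:4 magnitude projection for the sparse part. This is a simpler stand-in, not 3BASiL's 3-block ADMM, and it measures error on the weights directly rather than the layer-wise reconstruction error on calibration activations. The function names are hypothetical. Because each step is an exact best approximation given the other block, the Frobenius error cannot increase from one iteration to the next.

```python
import numpy as np

def project_2_4(M):
    """Keep the 2 largest-magnitude entries in every group of 4 consecutive
    entries along each row (2:4 semi-structured sparsity)."""
    rows, cols = M.shape
    assert cols % 4 == 0, "column count must be a multiple of 4"
    G = M.reshape(rows, cols // 4, 4)
    order = np.argsort(np.abs(G), axis=-1)                 # ascending magnitude
    mask = np.ones(G.shape, dtype=bool)
    np.put_along_axis(mask, order[..., :2], False, axis=-1)  # drop the 2 smallest
    return (G * mask).reshape(rows, cols)

def project_rank(M, r):
    """Best rank-r approximation in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def sparse_plus_low_rank(W, r, iters=30):
    S = np.zeros_like(W)
    L = np.zeros_like(W)
    errors = []
    for _ in range(iters):
        L = project_rank(W - S, r)     # optimal low-rank block given S
        S = project_2_4(W - L)         # optimal 2:4 block given L
        errors.append(np.linalg.norm(W - S - L))
    return S, L, errors

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
S, L, errors = sparse_plus_low_rank(W, r=8, iters=30)
rank_only_error = np.linalg.norm(W - project_rank(W, 8))
```

Block coordinate descent of this kind only guarantees a monotone error, not a global optimum; methods such as the ADMM described above are designed to reach better solutions of the harder, activation-weighted objective.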

2603.01346 2026-03-03 cs.LG stat.ML

Relatively Smart: A New Approach for Instance-Optimal Learning

Shaddin Dughmi, Alireza F. Pour

Abstract

We revisit the framework of Smart PAC learning, which seeks supervised learners that compete with semi-supervised learners provided with full knowledge of the marginal distribution on unlabeled data. Prior work has shown that such marginal-by-marginal guarantees are possible for "most" marginals, with respect to an arbitrary fixed and known measure, but not more generally. We discover that this failure can be attributed to an "indistinguishability" phenomenon: there are marginals which cannot be statistically distinguished from other marginals that require different learning approaches. In such settings, semi-supervised learning cannot certify its guarantees from unlabeled data, rendering them arguably non-actionable. We propose relatively smart learning, a new framework which demands that a supervised learner compete only with the best "certifiable" semi-supervised guarantee. We show that such a modest relaxation suffices to bypass the impossibility results from prior work. In the distribution-free setting, we show that the OIG learner is relatively smart up to squaring the sample complexity, and show that no supervised learning algorithm can do better. For distribution-family settings, we show that relatively smart learning can be impossible or can require idiosyncratic learning approaches, and its difficulty can be non-monotone in the inclusion order on distribution families.

2603.01339 2026-03-03 stat.ML cs.LG

Causal Effects with Unobserved Unit Types in Interacting Human-AI Systems

William Overman, Sadegh Shirani, Mohsen Bayati

Abstract

We study experiments on interacting populations of humans and AI agents, where both unit types and the interaction network remain unobserved. Although causal effects propagate throughout the system, the goal is to estimate effects on humans. Examples include online platforms where human users interact alongside AI-driven accounts. We assume a human-AI prior that gives each unit a probability of being human. While humans cannot be distinguished at the unit level, the prior allows us to compute the average human composition within large subpopulations. We then model outcome dynamics through a causal message passing (CMP) framework and analyze sample-mean outcomes across subpopulations. We show that by constructing subpopulations that vary in expected human composition and treatment exposure, one can consistently recover human-specific causal effects. Our results characterize when distributional knowledge of population composition (without observing unit types or the interaction network) is sufficient for identification. We validate the approach on a simulated human-AI platform driven by behaviorally differentiated LLM agents. Together, these results provide a theoretical and practical framework for experimentation in emerging human-AI systems.

2603.01337 2026-03-03 stat.ML cs.LG

Adaptive Estimation and Inference in Conditional Moment Models via the Discrepancy Principle

Jiyuan Tan, Vasilis Syrgkanis

Abstract

We study adaptive estimation and inference in ill-posed linear inverse problems defined by conditional moment restrictions. Existing regularized estimators such as Regularized DeepIV (RDIV) require prior knowledge of the smoothness of the nuisance function, typically encoded by a beta source condition to tune their regularization parameters. In practice, this smoothness is unknown, and misspecified hyperparameters can lead to suboptimal convergence or instability. We introduce a discrepancy-principle-based framework for adaptive hyperparameter selection that automatically balances bias and variance without relying on the unknown smoothness parameter. Our framework applies to both RDIV (Li et al. [2024]) and the Tikhonov Regularized Adversarial Estimator (TRAE) (Bennett et al. [2023a]) and achieves the same rates in both weak and strong metrics. Building on this, we construct a fully adaptive doubly robust estimator for linear functionals that attains the optimal rate of the better-conditioned primal or dual problem, providing a practical, theoretically grounded approach for adaptive inference in ill-posed econometric models.

2603.01309 2026-03-03 cs.LG stat.ML

PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure

Joshua Steier

Comments 43 pages

Abstract

When data is scarce or mistakes are costly, average-case metrics fall short. What a practitioner needs is a guarantee: with probability at least $1-δ$, the learned policy is $\varepsilon$-close to optimal after $N$ episodes. This is the PAC promise, and between 2018 and 2025 the RL theory community made striking progress on when such promises can be kept. We survey that progress. Our organizing tool is the Coverage-Structure-Objective (CSO) framework, proposed here, which decomposes nearly every PAC sample complexity result into three factors: coverage (how data were obtained), structure (intrinsic MDP or function-class complexity), and objective (what the learner must deliver). CSO is not a theorem but an interpretive template that identifies bottlenecks and makes cross-setting comparison immediate. The technical core covers tight tabular baselines and the uniform-PAC bridge to regret; structural complexity measures (Bellman rank, witness rank, Bellman-Eluder dimension) governing learnability with function approximation; results for linear, kernel/NTK, and low-rank models; reward-free exploration as upfront coverage investment; and pessimistic offline RL where inherited coverage is the binding constraint. We provide practitioner tools: rate lookup tables indexed by CSO coordinates, Bellman residual diagnostics, coverage estimation with deployment gates, and per-episode policy certificates. A final section catalogs open problems, separating near-term targets from frontier questions where coverage, structure, and computation tangle in ways current theory cannot resolve.

2603.01304 2026-03-03 cs.LG stat.ML

Nonconvex Latent Optimally Partitioned Block-Sparse Recovery via Log-Sum and Minimax Concave Penalties

Takanobu Furuhashi, Hiroki Kuroda, Masahiro Yukawa, Qibin Zhao, Hidekata Hontani, Tatsuya Yokota

Comments 13 pages, 11 figures

详情
英文摘要

We propose two nonconvex regularization methods, LogLOP-l2/l1 and AdaLOP-l2/l1, for recovering block-sparse signals with unknown block partitions. These methods address the underestimation bias of existing convex approaches by extending the log-sum penalty and the Minimax Concave Penalty (MCP) to the block-sparse domain via novel variational formulations. Unlike Generalized Moreau Enhancement (GME) and Bayesian methods, which depend on the squared-error data fidelity term, our proposed methods are compatible with a broad range of data fidelity terms. We develop efficient Alternating Direction Method of Multipliers (ADMM)-based algorithms for these formulations that exhibit stable empirical convergence. Numerical experiments on synthetic data, angular power spectrum estimation, and denoising of nanopore currents demonstrate that our methods outperform state-of-the-art baselines in estimation accuracy.

2603.01293 2026-03-03 cs.LG cs.AI stat.ML

Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models

Adel Javanmard, Baharan Mirzasoleiman, Vahab Mirrokni

Comments 35 pages, 5 figures

Abstract

Large Language Models (LLMs) are pretrained on massive datasets and later instruction-tuned via supervised fine-tuning (SFT) or reinforcement learning (RL). Best practices emphasize large, diverse pretraining data, whereas post-training operates differently: SFT relies on smaller, high-quality datasets, while RL benefits more from scale, with larger amounts of feedback often outweighing label quality. Yet it remains unclear why pretraining and RL require large datasets, why SFT excels on smaller ones, and what defines high-quality SFT data. In this work, we theoretically analyze transformers trained on an in-context weight prediction task for linear regression. Our analysis reveals several key findings: $(i)$ balanced pretraining data can induce latent capabilities later activated during post-training, and $(ii)$ SFT learns best from a small set of examples challenging for the pretrained model, while excessively large SFT datasets may dilute informative pretraining signals. In contrast, RL is most effective on large-scale data that is not overly difficult for the pretrained model. We validate these theoretical insights with experiments on large nonlinear transformer architectures.

2603.01268 2026-03-03 cs.DS cs.IT math.IT math.PR math.ST stat.TH

Achievability of Heterogeneous Hypergraph Recovery from its Graph Projection

Alexander Morgan, Chenghao Guo

Abstract

We formulate and analyze a heterogeneous random hypergraph model, and we provide an achievability result for recovery of hyperedges from the observed projected graph. We observe a projected graph which combines random hyperedges across all degrees, where a projected edge appears if and only if both vertices appear together in at least one hyperedge. Our goal is to reconstruct the original set of hyperedges of degree $d_j$ for some $j$. Our achievability result is based on the idea of selecting maximal cliques of size $d_j$ in the projected graph, and we show that this algorithm succeeds under a natural condition on the densities. This achievability condition generalizes a known threshold for $d$-uniform hypergraphs with noiseless and noisy projections. We conjecture the threshold to be optimal for recovering hyperedges with the largest degree.
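The recovery rule described above can be sketched in plain Python: project the hyperedges onto a graph, enumerate maximal cliques with Bron-Kerbosch (with pivoting, a standard choice assumed here, not necessarily the authors'), and keep those of size exactly $d$. Function names are illustrative.

```python
from itertools import combinations

def project(hyperedges, n):
    """Graph projection: u ~ v iff u and v appear together in some hyperedge."""
    adj = {v: set() for v in range(n)}
    for e in hyperedges:
        for u, v in combinations(e, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())

def recover_degree(adj, d):
    """Estimate the degree-d hyperedges as the maximal cliques of size exactly d."""
    return {c for c in maximal_cliques(adj) if len(c) == d}
```

With hyperedges `(0, 1, 2)`, `(2, 3)` and `(4, 5, 6)` the rule returns both triangles for `d = 3` and `{2, 3}` for `d = 2`. Three pairwise edges `(0, 1)`, `(1, 2)`, `(0, 2)` instead produce a spurious triangle and no maximal 2-cliques, which is why success needs a condition on the hyperedge densities.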

2603.01237 2026-03-03 stat.ME

Robust measures of dispersion for circular data with an anomaly detection rule

Houyem Demni, Mia Hubert, Giovanni C. Porzio, Peter J. Rousseeuw

Abstract

Circular variables that represent directions or periodic observations arise in many fields, such as biology and environmental sciences. An important issue when dealing with circular data is how to estimate their dispersion robustly, avoiding undue effects of anomalies. This work extends three robust dispersion measures from the line to the circle. Their robustness is studied via their influence functions and relative bias curves. From these dispersion measures, robust estimators of parameters of circular distributions can be derived. This yields robust estimators for the concentration parameter of the von Mises distribution and the dispersion parameter of the wrapped normal distribution. Their breakdown values and statistical efficiencies are obtained, and they are compared in a simulation study. Building on the best performing estimator, a robust circular anomaly detection procedure is developed, and employed to visualize outliers through a circular violin plot. Three real datasets are analyzed.
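As a hedged illustration of why robustness matters on the circle, the sketch below contrasts the classical circular variance $1-\bar R$ with a simple median-based measure: the median arc distance from a sample circular median. This median-based measure is an assumed example for illustration only, not one of the three measures extended in the paper.

```python
import math
from statistics import median

def arc_dist(a, b):
    """Geodesic distance between two angles on the unit circle, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def circular_variance(theta):
    """Classical (non-robust) dispersion 1 - R_bar, R_bar = mean resultant length."""
    c = sum(math.cos(t) for t in theta) / len(theta)
    s = sum(math.sin(t) for t in theta) / len(theta)
    return 1.0 - math.hypot(c, s)

def circular_median(theta):
    """Sample point minimizing the summed arc distance to all observations."""
    return min(theta, key=lambda m: sum(arc_dist(m, t) for t in theta))

def circular_mad(theta):
    """Median arc distance from the circular median (illustrative robust dispersion)."""
    m = circular_median(theta)
    return median(arc_dist(m, t) for t in theta)
```

On the angles `[0.0, 0.1, -0.1, 0.2, -0.2]`, adding two anomalies at `2.0` and `-2.0` moves the median-based measure only from 0.1 to 0.2, while $1-\bar R$ jumps from about 0.01 to about 0.41.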

2603.01230 2026-03-03 stat.CO

Stochastic Neural Networks for Causal Inference with Missing Confounders

Yaxin Fang, Faming Liang

Comments Accepted at the International Conference on Learning Representations (ICLR) 2026

Journal ref In Proceedings of the International Conference on Learning Representations (ICLR), 2026

Abstract

Unmeasured confounding is a fundamental obstacle to causal inference from observational data. Latent-variable methods address this challenge by imputing unobserved confounders, yet many lack explicit model-based identification guarantees and are difficult to extend to richer causal structures. We propose Confounder Imputation with Stochastic Neural Networks (CI-StoNet), which parameterizes the conditional structure of a causal directed acyclic graph using a stochastic neural network and imputes latent confounders via adaptive stochastic-gradient Hamiltonian Monte Carlo. Under SUTVA and overlap, and assuming that the structural components of the data-generating process are well approximated by a capacity-controlled sparse deep neural network class, we establish model identification and consistent estimation of the mean potential outcome under a fixed intervention within this class. Although the latent confounder is identifiable only up to reparameterizations that preserve the joint treatment-outcome distribution, the causal estimand is invariant across this observationally equivalent class. We further characterize the effect of overlap on estimation accuracy. Empirical results on simulated and benchmark datasets demonstrate accurate performance, and the framework extends naturally to proxy-variable and multiple-cause settings with overlap diagnostics and bootstrap-based uncertainty quantification.

2603.01184 2026-03-03 cs.LG cs.AI q-bio.NC stat.CO

Scaling of learning time for high dimensional inputs

Carlos Stein Brito

Comments 14 pages, 5 figures

Abstract

Representation learning from complex data typically involves models with a large number of parameters, which in turn require large amounts of data samples. In neural network models, model complexity grows with the number of inputs to each neuron, with a trade-off between model expressivity and learning time. A precise characterization of this trade-off would help explain the connectivity and learning times observed in artificial and biological networks. We present a theoretical analysis of how learning time depends on input dimensionality for a Hebbian learning model performing independent component analysis. Based on the geometry of high-dimensional spaces, we show that the learning dynamics reduce to a unidimensional problem, with learning times dependent only on initial conditions. For higher input dimensions, initial parameters have smaller learning gradients and larger learning times. We find that learning times have supralinear scaling, becoming quickly prohibitive for high input dimensions. These results reveal a fundamental limitation for learning in high dimensions and help elucidate how the optimal design of neural networks depends on data complexity. Our approach outlines a new framework for analyzing learning dynamics and model complexity in neural network models.

2603.01119 2026-03-03 stat.ME cs.AI

Robust Weighted Triangulation of Causal Effects Under Model Uncertainty

Rohit Bhattacharya, Ina Ocelli, Ted Westling

Comments 17 pages

Abstract

A fundamental challenge in causal inference with observational data is correct specification of a causal model. When there is model uncertainty, analysts may seek to use estimates from multiple candidate models that rely on distinct, and possibly partially overlapping, sets of identifying assumptions to infer the causal effect, a process known as triangulation. Principled methods for triangulation, however, remain underdeveloped. Here, we develop a framework for causal effect triangulation that combines model testability methods from causal discovery with statistical inference methods from semiparametric theory, while avoiding explicit model selection and post-selection inference problems. We propose a triangulation functional that combines identified functionals from each model with data-driven measures of model validity. We provide a bound on the distance of the functional from the true causal effect along with conditions under which this distance can be taken to zero. Finally, we derive valid statistical inference for this functional. Our framework formalizes robustness under causal pluralism without requiring agreement across models or commitment to a single specification. We demonstrate its performance through simulations and an empirical application.

2603.01117 2026-03-03 cs.DL cs.SI econ.GN q-fin.EC stat.AP

China leads scientific trends; the West launches new ones

Jeffrey W. Lockhart, Jamshid Sourati, Feng Shi, James Evans

Comments 16 pages, 4 figures

Abstract

How nations shape the scientific frontier matters for technological competition, but standard metrics, including publication counts, citations, and disruption indices, look backward and fail to distinguish between fundamentally different leadership strategies. We develop and validate two forward-looking model-based measures and apply them to tens of millions of articles since 1990. The first embeds research pathways within an evolving hypergraph of concepts and scientists to identify leadership in emerging areas--work that anticipates where the scientific crowd is heading. The second embeds evolving samples of ideas and disciplines drawn upon in past research to identify leadership in surprising new directions as unexpected combinations become routine and science reorganizes around them. China became the global leader in emerging areas roughly a decade ago, well before it led in volume, reflecting a capacity to detect and amplify nascent consensus at scale. The United States and Europe show the opposite profile: declining emergence shares but persistent leadership in prescient work, especially research bridging disciplinary boundaries. These patterns replicate across databases, attribution methods, and strategic domains, including AI, biotechnology, energy, and semiconductors. Nations lead science by reading the landscape or by reshaping it, and the institutional requirements for each strategy lie in tension. The distribution of these strategies promises to shape the global structure of technological leadership for decades.

2603.01085 2026-03-03 stat.AP stat.CO

Recovery-Informed Forecasting Strategy Enhancement

Feng Li, Taozhu Ruan

Abstract

We propose a three-stage framework named Recovery-Informed Strategy Enhancement (RISE) to forecast the recovery of Chinese outbound tourism following the coronavirus disease 2019 pandemic. The framework decomposes the forecasts into three parts: the initial forecasts, the terminal forecasts and the recovery curve forecasts that connect the two points. We integrate multiple sources of information and employ forecast combination techniques in all stages, enhancing both the accuracy and robustness of recovery forecasts. Compared with conventional forecasting approaches, our framework provides a structured and transparent pipeline to integrate model-based forecasts with expert-informed judgment under structural breaks and high uncertainty. Our findings demonstrate the effectiveness of this framework, offering an adaptable tool for recovery trajectory forecasting in post-crisis contexts.

2603.01081 2026-03-03 stat.AP

Issue-Specific Polarization and Cohesion in a Multi-Party Legislature: Integrating the Latent Space Item Response Model with Topic-Based Regression

Seungju Lee, In-Kyun Kim, Ick Hoon Jin

Abstract

We develop a one-stage Bayesian framework for quantifying issue-specific legislative alignment in multi-party systems. The approach integrates a Latent Space Item Response Model (LSIRM), which embeds legislators and bills in a shared Euclidean space, with Bayesian beta regression using text-derived topic proportions as bill-level covariates. This yields posterior distributions of legislator- and issue-specific coefficients, enabling coherent comparison of polarization and cohesion across policy domains. Uncertainty is propagated through a one-stage MCMC sampler that jointly updates the latent-space and regression components. Application to the 17th Korean National Assembly reveals substantial heterogeneity in partisan conflict: fiscal domains such as Taxation and Grants and Local Government Budget show sharp polarization with tight within-party clustering, whereas Armed Services, Patriots, and Veterans exhibits weak party structuring and greater intra-party variability. The Democratic Labor Party (DLP) forms a coherent and distinct cluster on several issues even where the two major parties are not strongly polarized, confirming that important dimensions of legislative conflict are not captured by a single left-right ordering. The framework provides a principled tool for analyzing issue-structured voting behavior in legislatures where one-dimensional ideal point models yield unreliable estimates.

2603.01077 2026-03-03 math.DS cs.NA cs.SY eess.SY math.NA stat.ML

Kernel Methods for Stochastic Dynamical Systems with Application to Koopman Eigenfunctions: Feynman-Kac Representations and RKHS Approximation

Boumediene Hamzi, Houman Owhadi, Umesh Vaidya

Abstract

We extend the unified kernel framework for transport equations and Koopman eigenfunctions, developed in previous work by the authors for deterministic systems, to stochastic differential equations (SDEs). In the deterministic setting, three analytically grounded constructions (Lions-type variational principles, Green's function convolution, and resolvent operators along characteristic flows) were shown to yield identical reproducing kernels. For stochastic systems, the Koopman generator includes a second-order diffusion term, transforming the first-order hyperbolic transport equation into a second-order elliptic-parabolic PDE. This fundamental change necessitates replacing the method of characteristics with probabilistic representations based on the Feynman--Kac formula. Our main contributions include: (i) extension of all three kernel constructions to stochastic systems via Feynman--Kac path-integral representations; (ii) proof of kernel equivalence under uniform ellipticity assumptions; (iii) a collocation-based computational framework incorporating second-order differential operators; (iv) error bounds separating RKHS approximation error from Monte Carlo sampling error; (v) analysis of how diffusion affects numerical conditioning; and (vi) connections to generator EDMD, diffusion maps, and kernel analog forecasting. Numerical experiments on Ornstein--Uhlenbeck processes, nonlinear SDEs with varying diffusion strength, and multi-dimensional systems validate the theoretical developments and demonstrate that moderate diffusion can improve numerical stability through elliptic regularization.

2603.01057 2026-03-03 physics.flu-dyn stat.AP

Extreme-value statistics of curl-of-vorticity precursor peaks in perturbed Taylor-Green vortex turbulence

Satori Tsuzuki

Abstract

Precursor peaks in the wavenumber $k_{\mathrm{peak}}(t)$ maximizing the curl-of-vorticity spectrum have been observed to precede the dissipation peak in decaying turbulence. Because small perturbations in the initial condition can shift peak times, the associated lead time should be characterized statistically. We perform a pseudospectral DNS ensemble of $N_s=1000$ perturbed Taylor--Green vortex realizations at $N=256^3$ and $ν=10^{-3}$. For each run we extract $k_{\mathrm{peak}}(t)$, several definitions of the precursor time $t_k$, the dissipation-peak time $t_\varepsilon$, and run-wise extrema including $K_{\max}=\max_t k_{\mathrm{peak}}(t)$ and $M_{\max}=\max_t\max_k \mathcal{C}(k,t)$, where $\mathcal{C}(k,t)$ is the isotropic curl-of-vorticity spectrum. The distribution of $Δt_{\varepsilon,k}=t_\varepsilon-t_k$ shows that the precursor typically leads, while rare lagging realizations occur and are strongly conditioned on $K_{\max}$. Using peaks-over-threshold extreme-value theory, we fit generalized Pareto models to the right tails of $X=-Δt_{\varepsilon,k}$ and $M_{\max}$; negative shape parameters indicate bounded tails and enable worst-case lag and endpoint estimates. Finally, $M_{\max}$ correlates strongly with $\varepsilon_{\max}$ and ensemble cross-correlations reveal a reproducible phase offset, supporting a dynamical coupling between high-curvature activity and dissipation bursts.

2603.01047 2026-03-03 cs.LG stat.ML

Evaluating GFlowNet from partial episodes for stable and flexible policy-based training

Puhua Niu, Shili Wu, Xiaoning Qian

Comments Accepted by ICLR 2026

Abstract

Generative Flow Networks (GFlowNets) were developed to learn policies for efficiently sampling combinatorial candidates by interpreting their generative processes as trajectories in directed acyclic graphs. In the value-based training workflow, the objective is to enforce the balance over partial episodes between the flows of the learned policy and the estimated flows of the desired policy, implicitly encouraging policy divergence minimization. The policy-based strategy alternates between estimating the policy divergence and updating the policy, but reliable estimation of the divergence under directed acyclic graphs remains a major challenge. This work bridges the two perspectives by showing that flow balance also yields a principled policy evaluator that measures the divergence, and an evaluation balance objective over partial episodes is proposed for learning the evaluator. As demonstrated on both synthetic and real-world tasks, evaluation balance not only strengthens the reliability of policy-based training but also broadens its flexibility by seamlessly supporting parameterized backward policies and enabling the integration of offline data-collection techniques.

2602.23629 2026-03-03 stat.ML cs.LG math.ST stat.AP stat.ME stat.TH

Multivariate Spatio-Temporal Neural Hawkes Processes

Christopher Chukwuemeka, Hojun You, Mikyoung Jun

Comments 16 pages, 20 figures (including supplementary material)

Abstract

We propose a Multivariate Spatio-Temporal Neural Hawkes Process for modeling complex multivariate event data with spatio-temporal dynamics. The proposed model extends continuous-time neural Hawkes processes by integrating spatial information into latent state evolution through learned temporal and spatial decay dynamics, enabling flexible modeling of excitation and inhibition without predefined triggering kernels. By analyzing fitted intensity functions of deep learning-based temporal Hawkes process models, we identify a modeling gap in how fitted intensity behavior is captured beyond likelihood-based performance, which motivates the proposed spatio-temporal approach. Simulation studies show that the proposed method successfully recovers sensible temporal and spatial intensity structure in multivariate spatio-temporal point patterns, while existing temporal neural Hawkes process approaches fail to do so. An application to terrorism data from Pakistan further demonstrates the proposed model's ability to capture complex spatio-temporal interaction across multiple event types.

2602.22803 2026-03-03 stat.ME

Rejoinder to the discussants of the two JASA articles `Frequentist Model Averaging' and `The Focused Information Criterion', by Nils Lid Hjort and Gerda Claeskens

Nils Lid Hjort, Gerda Claeskens

Comments 16 pages; Statistical Research Report, Department of Mathematics, University of Oslo, August 2003, and arXiv'd February 2026 (v2 simply corrects spelling in v1). This rejoinder to the two papers `FMA' and `FIC' is published in JASA, 2003, vol. 98, pages 938-945, at this url: tandfonline.com/doi/abs/10.1198/016214503000000882

Abstract

We are honoured to have our work read and discussed at such a thorough level by several experts. Words of appreciation and encouragement are gratefully received, while the many supplementary comments, thoughtful reminders, new perspectives and additional themes raised are warmly welcomed and deeply appreciated. Our thanks go also to JASA Editor Francisco Samaniego and his editorial helpers for organising this discussion. Space does not allow us to answer all of the many worthwhile points raised by our discussants, but in the following we attempt to respond to what we perceive as the major issues. Our responses are organised by themes rather than by discussants. We shall refer to our two articles as `the FMA paper' (Hjort and Claeskens) and `the FIC paper' (Claeskens and Hjort).

2602.19113 2026-03-03 cs.LG cs.AI stat.ML

Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training

Wei Chen, Junle Chen, Yuqian Wu, Yuxuan Liang, Xiaofang Zhou

Abstract

Spatio-temporal forecasting is fundamental to intelligent systems in transportation, climate science, and urban planning. However, training deep learning models on the massive, often redundant, datasets from these domains presents a significant computational bottleneck. Existing solutions typically focus on optimizing model architectures or optimizers, while overlooking the inherent inefficiency of the training data itself. This conventional approach of iterating over the entire static dataset each epoch wastes considerable resources on easy-to-learn or repetitive samples. In this paper, we explore a novel training-efficiency technique, namely learning from complexity with dynamic sample pruning (ST-Prune), for spatio-temporal forecasting. Through dynamic sample pruning, we aim to intelligently identify the most informative samples based on the model's real-time learning state, thereby accelerating convergence and improving training efficiency. Extensive experiments conducted on real-world spatio-temporal datasets show that ST-Prune significantly accelerates training while maintaining or even improving model performance, and that it is scalable and broadly applicable.

2602.07667 2026-03-03 econ.EM stat.AP stat.ML

Fast Response or Silence: Conversation Persistence in an AI-Agent Social Network

Aysajan Eziz

Comments 32 pages, 6 figures, 18 tables

Abstract

Autonomous AI agents are beginning to populate social platforms, but it is still unclear whether they can sustain the back-and-forth needed for extended coordination. We study Moltbook, an AI-agent social network, using a first-week snapshot and introduce interaction half-life: how quickly a comment's chance of receiving a direct reply fades as the comment ages. Across tens of thousands of commented threads, Moltbook discussions are dominated by first-layer reactions rather than extended chains. Most comments never receive a direct reply, reciprocal back-and-forth is rare, and when replies do occur they arrive almost immediately -- typically within seconds -- implying persistence on the order of minutes rather than hours. Moltbook is often described as running on an approximately four-hour ``heartbeat'' check-in schedule; using aggregate spectral tests on the longest contiguous activity window, we do not detect a reliable four-hour rhythm in this snapshot, consistent with jittered or out-of-phase individual schedules. A contemporaneous Reddit baseline analyzed with the same estimators shows substantially deeper threads and much longer reply persistence. Overall, early agent social interaction on Moltbook fits a ``fast response or silence'' regime, suggesting that sustained multi-step coordination will likely require explicit memory, thread resurfacing, and re-entry scaffolds.

2602.02577 2026-03-03 stat.ML cs.IT cs.LG math.IT

Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

Shiji Xiao, Yufeng Zhang, Chubo Liu, Yan Ding, Keqin Li, Kenli Li

Abstract

The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing work has demonstrated that KL divergence between multivariate Gaussian distributions follows a relaxed triangle inequality. Given any three multivariate Gaussian distributions $\mathcal{N}_1, \mathcal{N}_2$, and $\mathcal{N}_3$, if $KL(\mathcal{N}_1, \mathcal{N}_2)\leq ε_1$ and $KL(\mathcal{N}_2, \mathcal{N}_3)\leq ε_2$, then $KL(\mathcal{N}_1, \mathcal{N}_3)< 3ε_1+3ε_2+2\sqrt{ε_1ε_2}+o(ε_1)+o(ε_2)$. However, the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ is still unknown. In this paper, we investigate the relaxed triangle inequality for the KL divergence between multivariate Gaussian distributions and give the supremum of $KL(\mathcal{N}_1, \mathcal{N}_3)$ as well as the conditions under which the supremum is attained. When $ε_1$ and $ε_2$ are small, the supremum is $ε_1+ε_2+2\sqrt{ε_1ε_2}+o(ε_1)+o(ε_2)$. Finally, we demonstrate several applications of our results in out-of-distribution detection with flow-based generative models and safe reinforcement learning.
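A small numerical check helps show why the triangle inequality fails and why the leading term has the form $(\sqrt{ε_1}+\sqrt{ε_2})^2$. The sketch uses the closed-form Gaussian KL, $\tfrac12[\mathrm{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2-\mu_1)^\top\Sigma_2^{-1}(\mu_2-\mu_1) - k + \ln\det\Sigma_2 - \ln\det\Sigma_1]$. With equal covariances, $\sqrt{2\,KL}$ is a Mahalanobis distance, so collinear mean shifts give $KL(\mathcal{N}_1,\mathcal{N}_3) = ε_1+ε_2+2\sqrt{ε_1ε_2}$ exactly. The function name is illustrative, and this example does not by itself establish the supremum.

```python
import numpy as np

def kl_gauss(mu1, cov1, mu2, cov2):
    """Closed-form KL(N(mu1, cov1) || N(mu2, cov2)) for k-dimensional Gaussians."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    k = mu1.shape[0]
    cov2_inv = np.linalg.inv(cov2)
    diff = mu2 - mu1
    _, logdet1 = np.linalg.slogdet(cov1)  # log-determinants via slogdet for stability
    _, logdet2 = np.linalg.slogdet(cov2)
    return 0.5 * (np.trace(cov2_inv @ cov1) + diff @ cov2_inv @ diff
                  - k + logdet2 - logdet1)
```

For identity covariances and means $0$, $a$, $2a$ with $\|a\|^2 = 0.2$, each consecutive divergence is $0.1$ while $KL(\mathcal{N}_1,\mathcal{N}_3)=0.4 = 0.1+0.1+2\sqrt{0.01}$, exceeding $ε_1+ε_2=0.2$.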

2601.19957 2026-03-03 stat.CO astro-ph.CO astro-ph.IM

SunBURST: Deterministic GPU-Accelerated Bayesian Evidence via Mode-Centric Laplace Integration

Ira Wolfson

Comments 46 pages, 1 figure, 10 tables

Abstract

Bayesian evidence evaluation becomes computationally prohibitive in high dimensions due to the curse of dimensionality and the sequential nature of sampling-based methods. We introduce SunBURST, a deterministic GPU-native algorithm for Bayesian evidence calculation that replaces global volume exploration with mode-centric geometric integration. The pipeline combines radial mode discovery, batched L-BFGS refinement, and Laplace-based analytic integration, treating modes independently and converting large batches of likelihood evaluations into massively parallel GPU workloads. For Gaussian and near-Gaussian posteriors, where the Laplace approximation is exact or highly accurate, SunBURST achieves numerical agreement at double-precision tolerance in dimensions up to 1024 in our benchmarks, with sub-linear wall-clock scaling across the tested range. In multimodal Gaussian mixtures, conservative configurations yield sub-percent accuracy while maintaining favorable scaling. SunBURST is not intended as a universal replacement for sampling-based inference. Its design targets regimes common in physical parameter estimation and inverse problems, where posterior mass is locally well approximated by Gaussian structure around a finite number of modes. In strongly non-Gaussian settings, the method can serve as a fast geometry-aware evidence estimator or as a preprocessing stage for hybrid workflows. These results show that high-precision Bayesian evidence evaluation can be made computationally tractable in very high dimensions through deterministic integration combined with massive parallelism.
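The mode-centric idea rests on the Laplace formula $\log Z \approx \log p(x^\*) + \tfrac{d}{2}\log 2\pi - \tfrac12\log\det(-H)$, where $x^\*$ is a mode and $H$ the Hessian of the log-posterior there. The single-mode CPU sketch below finds the mode by Newton's method with finite-difference derivatives; this is a swapped-in technique, not SunBURST's radial mode discovery, batched L-BFGS, or GPU pipeline, and the function names are hypothetical.

```python
import numpy as np

def fd_grad_hess(f, x, h=1e-4):
    """Central finite-difference gradient and Hessian of a scalar function f at x."""
    d = x.size
    E = np.eye(d) * h
    fx = f(x)
    g = np.zeros(d)
    H = np.zeros((d, d))
    for i in range(d):
        fp, fm = f(x + E[i]), f(x - E[i])
        g[i] = (fp - fm) / (2 * h)
        H[i, i] = (fp - 2 * fx + fm) / h**2
        for j in range(i + 1, d):
            H[i, j] = H[j, i] = (
                f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                - f(x - E[i] + E[j]) + f(x - E[i] - E[j])
            ) / (4 * h**2)
    return g, H

def laplace_log_evidence(log_post, x0, newton_steps=20):
    """Laplace estimate of log Z, Z = integral of exp(log_post), around one mode."""
    x = np.asarray(x0, dtype=float)
    for _ in range(newton_steps):
        g, H = fd_grad_hess(log_post, x)
        x = x - np.linalg.solve(H, g)  # Newton step toward a stationary point
    _, H = fd_grad_hess(log_post, x)
    # Cholesky of -H raises LinAlgError unless x is a strict local maximum
    L = np.linalg.cholesky(-H)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return log_post(x) + 0.5 * x.size * np.log(2 * np.pi) - 0.5 * logdet, x
```

For an unnormalized Gaussian log-posterior $\log c - \tfrac12(x-m)^\top A(x-m)$ the estimate matches the exact $\log c + \tfrac d2\log 2\pi - \tfrac12\log\det A$ up to finite-difference error, mirroring the abstract's point that the approach is exact for Gaussian posteriors and only approximate otherwise.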

2512.12937 2026-03-03 math.PR math.ST stat.TH

Asymptotic Normality of Subgraph Counts in Sparse Inhomogeneous Random Graphs

Sayak Chatterjee, Anirban Chatterjee, Abhinav Chakraborty, Bhaswar B. Bhattacharya

Comments Revised version. 28 pages, 3 figures

Abstract

In this paper, we derive the asymptotic distribution of the number of copies of a fixed graph $H$ in a random graph $G_n$ sampled from a sparse graphon model. Specifically, we provide a refined analysis that separates the contributions of edge randomness and vertex-label randomness, allowing us to identify distinct sparsity regimes in which each component dominates or both contribute jointly to the fluctuations. As a result, we establish asymptotic normality for the count of any fixed graph $H$ in $G_n$ across the entire range of sparsity (above the containment threshold for $H$ in $G_n$). These results provide a complete description of subgraph count fluctuations in sparse inhomogeneous networks, closing several gaps in the existing literature that were limited to specific motifs or suboptimal sparsity assumptions.

2511.03087 2026-03-03 stat.ME math.OC math.ST stat.ML stat.TH

Beyond Maximum Likelihood: Variational Inequality Estimation for Generalized Linear Models

Linglingzhi Zhu, Jonghyeok Lee, Yao Xie

Abstract

Generalized linear models (GLMs) are fundamental tools for statistical modeling, with maximum likelihood estimation (MLE) serving as the classical approach for parameter inference. While MLE performs well for canonical GLMs, it can become computationally challenging in more general settings with non-canonical, non-smooth, or nonlinear link functions, where the resulting optimization landscape may be ill-conditioned, non-convex, or non-differentiable. In this paper, we study an alternative estimation framework based on variational inequalities (VIs), which formulates GLM estimation through an operator-based equilibrium condition rather than likelihood minimization. We analyze the VI estimator from a statistical perspective and establish finite-sample error bounds and asymptotic normality under mild regularity conditions, together with convergence guarantees for fixed-point and stochastic approximation algorithms. The framework accommodates a broad class of link functions, including non-canonical and non-monotone cases satisfying a strong Minty-type condition, and extends naturally to generalized additive models via basis expansion. Numerical experiments demonstrate that the VI approach achieves competitive finite-sample accuracy and improved numerical stability relative to MLE, particularly in GLMs and GAMs with non-canonical or non-smooth link functions.
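One common way to write a GLM as a variational inequality, assumed here rather than taken from the paper, uses the monotone operator $F(\theta) = \frac1n\sum_i x_i\,(g(x_i^\top\theta) - y_i)$ for an increasing link $g$, solved by the projected fixed-point iteration $\theta \leftarrow \Pi_\Theta(\theta - \eta F(\theta))$. With a canonical link, $F(\theta)=0$ is the MLE score equation; with a non-canonical link such as the softplus mean used below, it generally defines a different estimator. Names and the `radius` constraint are illustrative.

```python
import numpy as np

def softplus(u):
    """Numerically stable log(1 + exp(u)); increasing, with derivative bounded by 1."""
    return np.logaddexp(0.0, u)

def vi_glm(X, y, link, n_iter=2000, step=None, radius=None):
    """Projected fixed-point iteration for F(theta) = mean_i x_i * (link(x_i^T theta) - y_i).

    Without `radius` this seeks F(theta) = 0; with it, each iterate is projected
    onto the Euclidean ball of that radius (the VI over that constraint set).
    """
    n, d = X.shape
    if step is None:
        # assumes link' <= 1, so F is Lipschitz with constant <= lambda_max(X^T X / n)
        step = 1.0 / np.linalg.eigvalsh(X.T @ X / n)[-1]
    theta = np.zeros(d)
    for _ in range(n_iter):
        F = X.T @ (link(X @ theta) - y) / n
        theta = theta - step * F
        if radius is not None:
            norm = np.linalg.norm(theta)
            if norm > radius:
                theta = theta * (radius / norm)
    return theta
```

Only evaluations of the link are needed, not its derivative or a likelihood, which is one reason such operator-based estimators can remain usable when the likelihood is non-smooth or non-convex.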

2511.02137 2026-03-03 stat.ML cs.LG stat.ME

DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series

Dongze Wu, Feng Qiu, Yao Xie

Comments Accepted to the 14th International Conference on Learning Representations (ICLR 2026)

Abstract

Time-series forecasting increasingly demands not only accurate observational predictions but also causal forecasting under interventional and counterfactual queries in multivariate systems. We present DoFlow, a flow-based generative model defined over a causal Directed Acyclic Graph (DAG) that delivers coherent observational and interventional predictions, as well as counterfactuals through the natural encoding-decoding mechanism of continuous normalizing flows (CNFs). We also provide a supporting counterfactual recovery theory under certain assumptions. Beyond forecasting, DoFlow provides explicit likelihoods of future trajectories, enabling principled anomaly detection. Experiments on synthetic datasets with various causal DAG structures and real-world hydropower and cancer-treatment time series show that DoFlow achieves accurate system-wide observational forecasting, enables causal forecasting over interventional and counterfactual queries, and effectively detects anomalies. This work contributes to the broader goal of unifying causal reasoning and generative modeling for complex dynamical systems.

2510.08409 2026-03-03 stat.ML cs.LG

Optimal Stopping in Latent Diffusion Models

Yu-Han Wu, Quentin Berthet, Gérard Biau, Claire Boyer, Romuald Elie, Pierre Marion

Abstract

We identify and analyze a surprising phenomenon of Latent Diffusion Models (LDMs) where the final steps of the diffusion can degrade sample quality. In contrast to conventional arguments that justify early stopping for numerical stability, this phenomenon is intrinsic to the dimensionality reduction in LDMs. We provide a principled explanation by analyzing the interaction between latent dimension and stopping time. Under a Gaussian framework with linear autoencoders, we characterize the conditions under which early stopping is needed to minimize the distance between generated and target distributions. More precisely, we show that lower-dimensional representations benefit from earlier termination, whereas higher-dimensional latent spaces require later stopping time. We further establish that the latent dimension interplays with other hyperparameters of the problem such as constraints in the parameters of score matching. Experiments on synthetic and real datasets illustrate these properties, underlining that early stopping can improve generative quality. Together, our results offer a theoretical foundation for understanding how the latent dimension influences the sample quality, and highlight stopping time as a key hyperparameter in LDMs.

2510.05060 2026-03-03 cs.LG math.ST stat.ML stat.TH

ResCP: Reservoir Conformal Prediction for Time Series Forecasting

Roberto Neglia, Andrea Cini, Michael M. Bronstein, Filippo Maria Bianchi

Comments ICLR 2026

Abstract

Conformal prediction offers a powerful framework for building distribution-free prediction intervals for exchangeable data. Existing methods that extend conformal prediction to sequential data rely on fitting a relatively complex model to capture temporal dependencies. However, these methods can fail if the sample size is small and often require expensive retraining when the underlying data distribution changes. To overcome these limitations, we propose Reservoir Conformal Prediction (ResCP), a novel training-free conformal prediction method for time series. Our approach leverages the efficiency and representation learning capabilities of reservoir computing to dynamically reweight conformity scores. In particular, we compute similarity scores among reservoir states and use them to adaptively reweight the observed residuals at each step. With this approach, ResCP enables us to account for local temporal dynamics when modeling the error distribution without compromising computational scalability. We prove that, under reasonable assumptions, ResCP achieves asymptotic conditional coverage, and we empirically demonstrate its effectiveness across diverse forecasting tasks.
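The core mechanism (reweighting past residuals by how similar their reservoir states are to the current state) can be sketched in pure Python. Everything here is an assumption for illustration, not the ResCP algorithm: a fixed random tanh echo-state reservoir, a softmax over negative squared state distances with a hypothetical temperature `tau`, and a weighted empirical quantile of absolute residuals (without any finite-sample correction) as the interval half-width.

```python
import math
import random

def make_reservoir(n_units, seed=0, scale=0.9):
    """Random, untrained reservoir weights (hypothetical hyperparameters)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) * scale / n_units ** 0.5 for _ in range(n_units)]
         for _ in range(n_units)]
    w_in = [rng.uniform(-1, 1) for _ in range(n_units)]
    return w, w_in

def run_reservoir(series, w, w_in):
    """Echo-state update h_t = tanh(W h_{t-1} + w_in * x_t); nothing is trained."""
    h = [0.0] * len(w_in)
    states = []
    for x in series:
        h = [math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + win * x)
             for row, win in zip(w, w_in)]
        states.append(h)
    return states

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    total = sum(weights)
    acc = 0.0
    for v, wt in sorted(zip(values, weights)):
        acc += wt / total
        if acc >= q:
            return v
    return max(values)

def interval_halfwidth(states, residuals, query_state, alpha=0.1, tau=1.0):
    """Weight each past |residual| by the similarity of its reservoir state to the query state."""
    d2 = [sum((a - b) ** 2 for a, b in zip(s, query_state)) for s in states]
    m = min(d2)
    weights = [math.exp(-(d - m) / tau) for d in d2]  # shift by the minimum for stability
    return weighted_quantile([abs(r) for r in residuals], weights, 1 - alpha)
```

With equal weights the procedure reduces to an ordinary empirical quantile of past residuals; the reservoir only changes which past steps count most.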

2510.03605 2026-03-03 cs.AI cs.LG stat.ML

Understanding the Role of Training Data in Test-Time Scaling

Adel Javanmard, Baharan Mirzasoleiman, Vahab Mirrokni

Comments 25 pages, 5 figures, accepted in ICLR 2026

Abstract

Test-time scaling improves the reasoning capabilities of large language models (LLMs) by allocating extra compute to generate longer Chains-of-Thought (CoTs). This enables models to tackle more complex problems by breaking them down into additional steps, backtracking, and correcting mistakes. Despite its strong performance, demonstrated by OpenAI's o1 and DeepSeek R1, the conditions in the training data under which long CoTs emerge, and when such long CoTs improve the performance, remain unclear. In this paper, we study the performance of test-time scaling for transformers trained on an in-context weight prediction task for linear regression. Our analysis provides a theoretical explanation for several intriguing observations: First, at any fixed test error, increasing test-time compute allows us to reduce the number of in-context examples (context length) in training prompts. Second, if the skills required to solve a downstream task are not sufficiently present in the training data, increasing test-time compute can harm performance. Finally, we characterize task hardness via the smallest eigenvalue of its feature covariance matrix and show that training on a diverse, relevant, and hard set of tasks results in the best performance for test-time scaling. We confirm our findings with experiments on large, nonlinear transformer architectures.

2510.00504 2026-03-03 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT

A universal compression theory for lottery ticket hypothesis and neural scaling laws

Hong-Yi Wang, Di Luo, Tomaso Poggio, Isaac L. Chuang, Liu Ziyin

Comments 26 pages. Accepted by ICLR 2026 conference

Abstract

When training large-scale models, the performance typically scales with the number of parameters and the dataset size according to a slow power law. A fundamental theoretical and practical question is whether comparable performance can be achieved with significantly smaller models and substantially less data. In this work, we provide a positive and constructive answer. We prove that a generic permutation-invariant function of $d$ objects can be asymptotically compressed into a function of $\operatorname{polylog} d$ objects with vanishing error, which is proved to be the optimal compression rate. This theorem yields two key implications: (Ia) a large neural network can be compressed to polylogarithmic width while preserving its learning dynamics; (Ib) a large dataset can be compressed to polylogarithmic size while leaving the loss landscape of the corresponding model unchanged. Implication (Ia) directly establishes a proof of the dynamical lottery ticket hypothesis, which states that any ordinary network can be strongly compressed such that the learning dynamics and result remain unchanged. (Ib) shows that a neural scaling law of the form $L\sim d^{-α}$ can be boosted to an arbitrarily fast power law decay, and ultimately to $\exp(-α' \sqrt[m]{d})$.

2509.26560 2026-03-03 stat.ML cs.LG q-bio.NC

Estimating Dimensionality of Neural Representations from Finite Samples

Chanwoo Chun, Abdulkadir Canatar, SueYeon Chung, Daniel Lee

Abstract

The global dimensionality of a neural representation manifold provides rich insight into the computational process underlying both artificial and biological neural networks. However, all existing measures of global dimensionality are sensitive to the number of samples, i.e., the number of rows and columns of the sample matrix. We show that, in particular, the participation ratio of eigenvalues, a popular measure of global dimensionality, is highly biased with small sample sizes, and propose a bias-corrected estimator that is more accurate with finite samples and with noise. On synthetic data examples, we demonstrate that our estimator can recover the true known dimensionality. We apply our estimator to neural brain recordings, including calcium imaging, electrophysiological recordings, and fMRI data, and to the neural activations in a large language model and show our estimator is invariant to the sample size. Finally, our estimators can additionally be used to measure the local dimensionalities of curved neural manifolds by weighting the finite samples appropriately.
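The participation ratio can be computed without an eigendecomposition, because the sum of eigenvalues of a covariance matrix C is tr(C) and the sum of squared eigenvalues is the squared Frobenius norm of C. The pure-Python sketch below (helper names are hypothetical) implements only the naive, uncorrected estimator, which makes the small-sample bias described above easy to reproduce. It is not the authors' bias-corrected estimator.

```python
import random

def covariance(samples):
    """Sample covariance (denominator n - 1); rows are samples, columns are units/neurons."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]

def participation_ratio(cov):
    """PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) = tr(C)^2 / ||C||_F^2."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    frob2 = sum(v * v for row in cov for v in row)
    return trace * trace / frob2
```

For isotropic Gaussian data in 20 dimensions the true value is 20, yet the naive estimate from only 30 samples falls well below it, while a few thousand samples bring it close to 20.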

2509.21513 2026-03-03 cs.LG cs.AI cs.CV math.PR stat.ML

DistillKac: Few-Step Image Generation via Damped Wave Equations

Weiqiao Han, Chenlin Meng, Christopher D. Manning, Stefano Ermon

Comments Accepted to ICLR 2026

Abstract

We present DistillKac, a fast image generator that uses the damped wave equation and its stochastic Kac representation to move probability mass at finite speed. In contrast to diffusion models, whose reverse-time velocities can become stiff and implicitly allow unbounded propagation speed, Kac dynamics enforce finite-speed transport and yield globally bounded kinetic energy. Building on this structure, we introduce classifier-free guidance in velocity space that preserves square integrability under mild conditions. We then propose endpoint-only distillation, which trains a student to match a frozen teacher over long intervals. We prove a stability result that promotes supervision at the endpoints to closeness along the entire path. Experiments demonstrate that DistillKac delivers high-quality samples with very few function evaluations while retaining the numerical stability benefits of finite-speed probability flows.
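The finite-speed property comes from Kac's classical stochastic representation of the 1-D telegraph (damped wave) equation: a particle moves at constant speed c and reverses direction at the jump times of a Poisson process, so its displacement can never exceed c·t. The sketch below simulates only this underlying process and its known second moment; it is not DistillKac's generator, guidance, or distillation procedure.

```python
import math
import random

def kac_telegraph_path(c, rate, t_end, rng):
    """Position at time t_end of a 1-D Kac (telegraph) particle started at 0.

    It moves at speed c in a random initial direction and reverses direction
    at the jump times of a Poisson process with the given rate.
    """
    x, t = 0.0, 0.0
    direction = rng.choice((-1.0, 1.0))
    while True:
        tau = rng.expovariate(rate)           # waiting time until the next reversal
        if t + tau >= t_end:
            return x + direction * c * (t_end - t)
        x += direction * c * tau
        t += tau
        direction = -direction

def telegraph_second_moment(c, rate, t):
    """E[X_t^2] = (c^2 / rate) * (t - (1 - exp(-2 rate t)) / (2 rate))."""
    return (c * c / rate) * (t - (1.0 - math.exp(-2.0 * rate * t)) / (2.0 * rate))
```

Every simulated position satisfies |X_t| ≤ c·t, whereas a Brownian particle has unbounded support at any t > 0; for large `rate` with c²/rate fixed, the second moment approaches the diffusive value c²t/rate.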

2509.02391 2026-03-03 cs.LG cs.GT stat.ML

Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh

Comments Published in Transactions on Machine Learning Research (TMLR), 2026. Camera-ready version. OpenReview: https://openreview.net/forum?id=Ck3q5YdWIv

Journal ref
Transactions on Machine Learning Research, 2026
Abstract

The success of federated learning (FL) ultimately depends on how strategic participants behave under partial observability, yet most formulations still treat FL as a static optimization problem. We instead view FL deployments as governed strategic systems and develop an analytical framework that separates welfare-improving behavior from metric gaming. Within this framework, we introduce indices that quantify manipulability, the price of gaming, and the price of cooperation, and we use them to study how rules, information disclosure, evaluation metrics, and aggregator-switching policies reshape incentives and cooperation patterns. We derive threshold conditions for deterring harmful gaming while preserving benign cooperation, and for triggering auto-switch rules when early-warning indicators become critical. Building on these results, we construct a design toolkit including a governance checklist and a simple audit-budget allocation algorithm with a provable performance guarantee. Simulations across diverse stylized environments and a federated learning case study consistently match the qualitative and quantitative patterns predicted by our framework. Taken together, our results provide design principles and operational guidelines for reducing metric gaming while sustaining stable, high-welfare cooperation in FL platforms.

2508.11727 2026-03-03 cs.LG stat.ML

Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks

Songyao Jin, Biwei Huang

Abstract

The multivariate Hawkes process provides a powerful framework for modeling temporal dependencies and event-driven interactions in complex systems. While existing methods primarily focus on uncovering causal structures among observed subprocesses, real-world systems are often only partially observed, with latent subprocesses posing significant challenges. In this paper, we show that continuous-time event sequences can be represented by a discrete-time causal model as the time interval shrinks, and we leverage this insight to establish necessary and sufficient conditions for identifying latent subprocesses and their causal influences. Accordingly, we propose a two-phase iterative algorithm that alternates between inferring causal relationships among discovered subprocesses and uncovering new latent subprocesses, guided by path-based conditions that guarantee identifiability. Experiments on both synthetic and real-world datasets show that our method effectively recovers causal structures despite the presence of latent subprocesses.
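The two ingredients behind the discrete-time view can be sketched as follows, assuming exponential excitation kernels with a shared decay `beta` (an assumption; the paper's setting may be more general, and this code ignores latent subprocesses entirely). The conditional intensity shows how an edge j → i appears as a positive excitation `alpha[i][j]`, and the binning turns event times into per-interval counts on which a lagged causal model can be fit.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of each subprocess at time t (exponential kernels):
        lambda_i(t) = mu[i] + sum_j sum_{t_k in events[j], t_k < t} alpha[i][j] * exp(-beta * (t - t_k)).
    alpha[i][j] > 0 means past events of subprocess j excite subprocess i.
    """
    lam = []
    for i in range(len(mu)):
        total = mu[i]
        for j, times in enumerate(events):
            total += sum(alpha[i][j] * math.exp(-beta * (t - tk)) for tk in times if tk < t)
        lam.append(total)
    return lam

def discretize(events, t_end, delta):
    """Per-interval event counts (bin width delta) for each subprocess."""
    n_bins = int(math.ceil(t_end / delta))
    counts = [[0] * n_bins for _ in events]
    for j, times in enumerate(events):
        for tk in times:
            counts[j][min(int(tk / delta), n_bins - 1)] += 1
    return counts
```

As `delta` shrinks, each bin holds at most one event with high probability, and the count in bin k of subprocess i depends on earlier bins through the same `alpha` structure, which is the intuition for treating the binned series as a discrete-time causal model.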

2508.09576 2026-03-03 stat.ME

Decoding Neuronal Ensembles from Spatially-Referenced Calcium Traces: A Bayesian Semiparametric Approach

Laura D'Angelo, Francesco Denti, Antonio Canale, Michele Guindani

Abstract

Understanding how neurons coordinate their activity is a fundamental question in neuroscience, with implications for learning, memory, and neurological disorders. Calcium imaging has emerged as a powerful method to observe large-scale neuronal activity in freely moving animals, providing time-resolved recordings of hundreds of neurons. However, fluorescence signals are noisy and only indirectly reflect underlying spikes of neuronal activity, complicating the extraction of reliable patterns of neuronal coordination. We introduce a fully Bayesian, semiparametric model that jointly infers spiking activity and identifies functionally coherent neuronal ensembles from calcium traces. Our approach models each neuron's spiking probability through a latent Gaussian process and encourages anatomically coherent clustering using a location-dependent stick-breaking prior. A spike-and-slab Dirichlet process captures heterogeneity in spike amplitudes while filtering out negligible events. We consider calcium imaging data from the hippocampal CA1 region of a mouse as it navigates a circular arena, a setting critical for understanding spatial memory and neuronal representation of environments. Our model uncovers spatially structured co-activation patterns among neurons and can be employed to reveal how ensemble structures vary with the animal's position.
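For readers unfamiliar with stick-breaking priors, the basic, location-independent truncated construction is sketched below: each break proportion v_k is drawn from Beta(1, concentration) and the k-th weight is v_k times the stick remaining after the earlier breaks. In the paper, the break proportions additionally depend on each neuron's spatial location, which this sketch does not model; the truncation level is an assumption of the example.

```python
import random

def stick_breaking_weights(n_components, concentration, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, concentration),
    w_k = v_k * prod_{l<k} (1 - v_l); the last component takes the rest of the stick."""
    weights, remaining = [], 1.0
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights
```

Smaller `concentration` values make early breaks large, concentrating mass on a few clusters; a location-dependent version would let nearby neurons share similar break proportions and hence similar cluster memberships.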

2507.07469 2026-03-03 stat.ML cs.LG econ.EM

A Projection-Based ARIMA Framework for Nonlinear Dynamics in Macroeconomic and Financial Time Series: Closed-Form Estimation and Rolling-Window Inference

Haojie Liu, Zihan Lin

Abstract

We introduce Galerkin-ARIMA and Galerkin-SARIMA, projection-based extensions of classical ARIMA/SARIMA that replace rigid linear lag operators with low-dimensional Galerkin basis expansions while preserving the familiar AR-MA decomposition. Experiments on synthetic series and on quarterly GDP and daily S&P 500 returns show that Galerkin-SARIMA matches or improves forecast accuracy relative to classical ARIMA/SARIMA. Estimation is closed-form via a two-stage least-squares procedure, which enables efficient rolling-window re-estimation and facilitates applications in central bank forecasting and portfolio risk management. We establish approximation-estimation trade-offs under weak dependence, provide consistency and asymptotic distributional results for the unpenalized estimator, compare prediction risk to classical SARIMA, and propose information-criterion selection of basis size. We further develop bootstrap-based inference for exogenous factor blocks and block-bootstrap prediction intervals that account for serial dependence and the two-stage generated-regressor structure.

2507.05958 2026-03-03 math.ST stat.TH

Importance sampling for Sobol' indices estimation

Haythem Boucharif, Jérôme Morio, Paul Rochet

Abstract

We propose a new importance sampling framework for the estimation and analysis of Sobol' indices. We focus on the estimation of the conditional second-moment quantity underlying these indices, which is the most challenging term to estimate. We show that this quantity, originally defined under a reference input distribution, can be estimated from samples drawn under auxiliary distributions by reweighting the model outputs. We derive the optimal sampling distribution that minimises the asymptotic variance of efficient estimators and demonstrate its impact on estimation. Beyond variance reduction, the framework also supports distributional sensitivity analysis through reverse importance sampling.
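To make the reweighting idea concrete, the sketch below uses the standard pick-freeze identity E[Y Y'] = E[E[Y | X1]^2], where Y and Y' share X1 but use independent copies of the other input. The toy model Y = X1 + X2 with X1, X2 ~ U(0, 1) and the auxiliary density q(x) = 0.5 + x for X1 are our own choices for illustration, not the optimal sampling distribution derived in the paper. For this toy model the true value is E[(X1 + 0.5)^2] = 1/3 + 3/4 = 13/12.

```python
import math
import random

def sample_q(rng):
    """Draw from q(x) = 0.5 + x on [0, 1] by inverting its CDF F(x) = 0.5 x + 0.5 x^2."""
    u = rng.random()
    return (-1.0 + math.sqrt(1.0 + 8.0 * u)) / 2.0

def conditional_second_moment_is(model, n, rng):
    """Importance-sampled pick-freeze estimate of E[E[Y | X1]^2] under the reference
    law X1, X2 ~ Uniform(0, 1), with X1 drawn from q instead of the reference.
    Each product Y * Y' is reweighted by p(x1) / q(x1) = 1 / (0.5 + x1)."""
    total = 0.0
    for _ in range(n):
        x1 = sample_q(rng)
        x2, x2_prime = rng.random(), rng.random()   # independent copies of X2
        weight = 1.0 / (0.5 + x1)                   # reference density of X1 is 1 on [0, 1]
        total += weight * model(x1, x2) * model(x1, x2_prime)
    return total / n
```

The estimate is unbiased for any q that is positive wherever the reference density is; choosing q well (the subject of the paper) only changes its variance.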

2506.16965 2026-03-03 cs.LG stat.ML

RocketStack: Level-aware Deep Recursive Ensemble Learning Architecture

Çağatay Demirel

Comments 32 pages, 1 graphical abstract, 8 figures, 10 tables, 2 supplementary figures

Abstract

Ensemble learning remains a cornerstone of machine learning, with stacking used to integrate predictions from multiple base learners through a meta-model. However, deep stacking remains uncommon due to feature redundancy, complexity, and computational burden. To address these limitations, RocketStack is introduced as a level-aware recursive stacking architecture explored up to ten stacking levels, extending beyond prior architectures. At level 1, base-learner predictions are fused with original features; at later levels, weaker learners are incrementally pruned using out-of-fold (OOF) scores. To curb early saturation, pruning is regularized by applying Gaussian perturbations at two noise scales to OOF scores prior to model selection for next-level stacking, alongside deterministic pruning. To control feature growth, periodic compression is applied at levels 3, 6, and 9 using Simple, Fast, Efficient (SFE) filtering, attention-based selection, and autoencoders. Across 33 datasets (23 binary, 10 multi-class), increasing accuracy with depth is confirmed by linear mixed-effects trend tests, and the best meta-model per level increasingly outperforms the best standalone ensemble. OOF-perturbed pruning is found to improve stability and late-level gains, while periodic compression is found to yield substantial runtime and dimensionality reductions with minimal accuracy drop. At the deepest level, accuracy slightly surpasses established deep tabular baselines. When hyperparameter optimization is performed on baseline models, early performance is boosted; however, untuned RocketStack closes the gap with depth and remains competitive at later levels. It achieves deep recursive stacking with sublinear computational growth and provides a modular, depth-aware foundation for scalable decision fusion as model pools and feature spaces evolve.
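The OOF-perturbed pruning step can be illustrated with a hypothetical keep-top-fraction rule: Gaussian noise is added to each model's out-of-fold score before ranking, and a noise scale of zero recovers deterministic pruning. The function name, the `keep_fraction` rule, and the noise parameterization are assumptions for illustration; RocketStack's actual thresholds, noise scales, and per-level schedule are not reproduced here.

```python
import random

def prune_models(oof_scores, keep_fraction, noise_sd, rng):
    """Keep the top `keep_fraction` of models ranked by (optionally perturbed) OOF score.

    noise_sd > 0 adds Gaussian noise to each score before ranking (randomized pruning);
    noise_sd == 0 ranks on the raw scores (deterministic pruning).
    """
    perturbed = {name: s + rng.gauss(0.0, noise_sd) if noise_sd > 0 else s
                 for name, s in oof_scores.items()}
    n_keep = max(1, round(keep_fraction * len(perturbed)))
    ranked = sorted(perturbed, key=perturbed.get, reverse=True)
    return ranked[:n_keep]
```

The noise occasionally lets a model with a slightly lower OOF score survive to the next level, which is one way to avoid the early saturation that purely greedy pruning can cause.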

2505.16953 2026-03-03 cs.LG stat.ML

ICYM2I: The illusion of multimodal informativeness under missingness

Young Sang Choi, Vincent Jeanselme, Pierre Elias, Shalmali Joshi

Comments Published as a conference paper at ICLR 2026

Abstract

Multimodal learning is of continued interest in artificial intelligence-based applications, motivated by the potential information gain from combining different data modalities. However, modalities observed in the source environment may differ from the modalities observed in the target environment due to multiple factors, including cost, hardware failure, or the perceived informativeness of a given modality. This change in missingness patterns between the source and target environment has not been carefully studied. Naïve estimation of the information gain associated with including an additional modality without accounting for missingness may result in improper estimates of that modality's value in the target environment. We formalize the problem of missingness, demonstrate its ubiquity, and show that the subsequent distribution shift induces bias when the missingness process is not explicitly accounted for. To address this issue, we introduce ICYM2I (In Case You Multimodal Missed It), a framework for the evaluation of predictive performance and information gain under missingness through inverse probability weighting-based correction. We demonstrate the importance of the proposed adjustment to estimate information gain under missingness on synthetic, semi-synthetic, and real-world datasets.
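The correction in ICYM2I is based on inverse probability weighting. As a generic illustration, assuming the observation propensities are known (in practice they must be estimated), a self-normalized IPW estimate of an average performance metric computed only from samples where a modality is observed looks like this. It is not the full ICYM2I framework for estimating information gain.

```python
def ipw_mean(values, observed, propensities):
    """Self-normalized inverse-probability-weighted mean.

    values[i] is only used where observed[i] is True (it may be None elsewhere);
    propensities[i] = P(observed | covariates of sample i), assumed known and > 0.
    """
    num = sum(v / p for v, o, p in zip(values, observed, propensities) if o)
    den = sum(1.0 / p for o, p in zip(observed, propensities) if o)
    return num / den
```

For example, with per-sample losses [1, 1, 0, 0] (true mean 0.5), where the two high-loss samples are each observed with probability 0.5 and only the first is observed, the naive mean over observed samples is 1/3, while the IPW estimate recovers 0.5.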

2505.14042 2026-03-03 cs.LG cs.CV stat.ML

Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners

Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

Comments ICLR26

Abstract

Adversarial training is one of the most effective defenses against adversarial attacks, but it incurs a high computational cost. In this study, we present the first theoretical analysis suggesting that adversarially pretrained transformers can serve as universally robust foundation models -- models that can adapt robustly to diverse downstream tasks with only lightweight tuning. Specifically, we demonstrate that single-layer linear transformers, after adversarial pretraining across a variety of classification tasks, can generalize robustly to unseen classification tasks through in-context learning from clean demonstrations (i.e., without requiring additional adversarial training or examples). This universal robustness stems from the model's ability to adaptively focus on robust features within given tasks. We also identify two open challenges for attaining robustness: the accuracy-robustness trade-off and sample-hungry training. This study initiates the discussion on the utility of universally robust foundation models. While their training is expensive, the investment would prove worthwhile as downstream tasks can obtain adversarial robustness for free. The code is available at https://github.com/s-kumano/universally-robust-in-context-learner.

2504.08428 2026-03-03 stat.ME cond-mat.stat-mech cs.LG math.ST stat.ML stat.TH

Standardization of Weighted Ranking Correlation Coefficients

Pierangelo Lombardo

Comments 24 pages, 5 figures

Abstract

A fundamental problem in statistics is measuring the correlation between two rankings of a set of items. Kendall's $τ$ and Spearman's $ρ$ are well-established correlation coefficients whose symmetric structure guarantees zero expected value between two rankings randomly chosen with uniform probability. In many modern applications, however, greater importance is assigned to top-ranked items, motivating weighted variants of these coefficients. Such weighting schemes generally break the symmetry of the original formulations, resulting in a non-zero expected value under independence and compromising the interpretation of zero correlation. We propose a general standardization function $g(\cdot)$ that transforms a ranking correlation coefficient $Γ$ into a standardized form $g(Γ)$ with zero expected value under randomness. The transformation preserves the domain $[-1,1]$, satisfies the boundary conditions, is continuous and increasing, and reduces to the identity for coefficients that already satisfy the zero-expected-value property. The construction of $g(x)$ depends on three distributional parameters of $Γ$: its mean, variance, and left variance; since their exact calculation becomes infeasible for large ranking lengths $n$, we develop accurate numerical estimates based on Monte Carlo sampling combined with polynomial regression to capture their dependence on $n$.
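The Monte Carlo step can be sketched for plain Kendall's τ, whose null mean is 0 and whose null variance 2(2n+5)/(9n(n−1)) is known in closed form, so the estimator can be checked. A weighted coefficient can be passed in place of `kendall_tau` (a hypothetical helper name). The left variance and the polynomial regression in n described above are not reproduced.

```python
import random

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings without ties."""
    n = len(rank_a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) > 0 else -1
    return 2.0 * s / (n * (n - 1))

def null_moments(coefficient, n, n_draws, rng):
    """Monte Carlo mean and variance of `coefficient` between a fixed ranking
    and uniformly random permutations of length n."""
    base = list(range(n))
    draws = []
    for _ in range(n_draws):
        perm = base[:]
        rng.shuffle(perm)
        draws.append(coefficient(base, perm))
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return mean, var
```

For a weighted coefficient the estimated mean would generally be non-zero, which is exactly the quantity a standardization such as g(·) needs as input.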

2504.08214 2026-03-03 stat.CO stat.ME

An Optimal Transport-Based Generative Model for Bayesian Posterior Sampling

Ke Li, Wei Han, Yuexi Wang, Yun Yang

Abstract

We investigate the problem of sampling from posterior distributions with intractable normalizing constants in Bayesian inference. Our solution is a new generative modeling approach based on optimal transport (OT) that learns a deterministic map from a reference distribution to the target posterior through constrained optimization. The method uses structural constraints from OT theory to ensure uniqueness of the solution and allows efficient generation of many independent, high-quality posterior samples. The framework supports both continuous and mixed discrete-continuous parameter spaces, with specific adaptations for latent variable models and near-Gaussian posteriors. Beyond computational benefits, it also enables new inferential tools based on OT-derived multivariate ranks and quantiles for Bayesian exploratory analysis and visualization. We demonstrate the effectiveness of our approach through multiple simulation studies and a real-world data analysis.

2502.21278 2026-03-03 cs.LG stat.ML

Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion

Kulin Shah, Alkis Kalavasis, Adam R. Klivans, Giannis Daras

Comments 33 pages

Abstract

There is strong empirical evidence that the state-of-the-art diffusion modeling paradigm leads to models that memorize the training set, especially when the training set is small. Prior methods to mitigate the memorization problem often lead to a decrease in image quality. Is it possible to obtain strong and creative generative models, i.e., models that achieve high generation quality and low memorization? Despite the current pessimistic landscape of results, we make significant progress in pushing the trade-off between fidelity and memorization. We first provide theoretical evidence that memorization in diffusion models is only necessary for denoising problems at low noise scales (usually used in generating high-frequency details). Using this theoretical insight, we propose a simple, principled method to train the diffusion models using noisy data at large noise scales. We show that our method significantly reduces memorization without decreasing the image quality, for both text-conditional and unconditional models and for a variety of data availability settings.

2502.11788 2026-03-03 stat.AP

Comparison of offset and ratio weighted regressions in Tweedie models with application to mid-term cancellations

Boucher Jean-Philippe, Coulibaly Raïssa

Comments 30 pages, 9 figures, 1 table

Abstract

In property and casualty insurance, particularly in automobile insurance, risk exposure is commonly assumed to be proportional to the duration of coverage. This assumption leads to two standard estimation strategies: the ratio approach, which normalizes the response variable (e.g., claim cost or premium) by the exposure, and the offset approach, which incorporates a transformation of the exposure (typically its logarithm) as a fixed regressor in the mean structure of the model. Although both approaches rely on the same proportionality assumption, they are not equivalent when the response variable follows a Tweedie distribution, a framework widely used in insurance analytics. In this paper, we show that each approach can be implemented independently and yields a consistent estimator of the true mean parameter vector. We then show that the offset approach is asymptotically more efficient than the ratio approach, a result established both theoretically and through simulation studies. However, when evaluated from the perspective of portfolio-level financial balance, the ratio approach exhibits superior performance, particularly in the presence of heterogeneous or truncated exposures arising from mid-term policy cancellations. These theoretical results are illustrated through an empirical analysis of an automobile insurance portfolio with a high cancellation rate, highlighting the practical implications of model choice for premium estimation under variable exposure conditions.

2502.00251 2026-03-03 stat.ME

Two-stage least squares with treatment-covariate interactions for treatment effect heterogeneity

Anqi Zhao, Peng Ding, Fan Li

Abstract

Treatment effect heterogeneity with respect to covariates is common in instrumental variable (IV) analyses. An intuitive approach, which we call the interacted two-stage least squares (2sls), is to postulate a working linear model of the outcome on the treatment, covariates, and treatment-covariate interactions, and instrument it using the IV, covariates, and IV-covariate interactions. We clarify the causal interpretation of the interacted 2sls under the local average treatment effect (LATE) framework when the IV is valid conditional on the covariates. Our main findings are threefold. First, we show that the coefficients on the treatment-covariate interactions from the interacted 2sls are consistent for estimating treatment effect heterogeneity with respect to covariates among compliers for any outcome-generating process if and only if the product of the IV propensity score and covariates is linear in the covariates, referred to as the linear IV-covariate interactions condition. Second, assuming that the covariate vector has dimension K and includes a constant term, we show that the linear IV-covariate interactions condition holds only if the IV propensity score takes at most K distinct values. As a result, this condition is difficult to satisfy beyond two special cases: (a) the covariates are categorical with K levels, or (b) the IV is randomly assigned. These results underscore the difficulty of interpreting regression coefficients from specifications with treatment-covariate interactions when the covariates are not saturated and the IV is not unconditionally randomized, absent correct specification of the outcome model. Third, as an application of our theory, we show that the interacted 2sls with demeaned covariates is consistent for estimating the LATE under the linear IV-covariate interactions condition.

2411.19305 2026-03-03 stat.ML cs.LG math.DS

LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations

Pengpeng Xiao, Phillip Si, Peng Chen

Abstract

Data assimilation techniques are crucial for accurately tracking complex dynamical systems by integrating observational data with numerical forecasts. Recently, score-based data assimilation methods have emerged as powerful tools for high-dimensional and nonlinear data assimilation. However, these methods still incur substantial computational costs due to the need for expensive forward simulations. In this work, we propose LD-EnSF, a novel score-based data assimilation method that eliminates the need for full-space simulations by evolving dynamics directly in a compact latent space. Our method incorporates improved Latent Dynamics Networks (LDNets) to learn accurate surrogate dynamics and introduces a history-aware LSTM encoder to effectively process sparse and irregular observations. By operating entirely in the latent space, LD-EnSF achieves speedups of orders of magnitude over existing methods while maintaining high accuracy and robustness. We demonstrate the effectiveness of LD-EnSF on several challenging high-dimensional benchmarks with highly sparse (in both space and time) and noisy observations.

2410.08939 2026-03-03 stat.CO stat.ME stat.ML

Linear-cost unbiased posterior estimates for crossed effects and matrix factorization models via couplings

Paolo Maria Ceriani, Andrea Pandolfi, Giacomo Zanella

Comments 48 pages, 10 figures, 1 table

Abstract

We design and analyze unbiased Markov chain Monte Carlo (MCMC) schemes based on couplings of blocked Gibbs samplers (BGSs), whose total computational costs scale linearly with the number of parameters and data points. Our methodology is designed for and applicable to high-dimensional BGS with conditionally independent blocks, which are often encountered in Bayesian modeling. We provide bounds on the expected number of iterations needed for coalescence for Gaussian targets, as well as on the tails of the distribution of coalescence times. These imply that practical two-step coupling strategies achieve coalescence times that match the relaxation times of the original BGS scheme up to logarithmic factors. To illustrate the practical relevance of our methodology, we apply it to high-dimensional crossed random effect and probabilistic matrix factorization models, for which we develop a novel BGS scheme with improved convergence speed. Our methodology provides unbiased posterior estimates at linear cost (usually requiring only a few BGS iterations for problems with thousands of parameters), matching state-of-the-art procedures for both frequentist and Bayesian estimation of those models.
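Coupling-based unbiased MCMC needs two chains that meet exactly after a random number of steps. A common building block is a maximal coupling of the two chains' conditional distributions, which makes them land on the same value with the largest possible probability. The sketch below is the standard rejection-based maximal coupling for two Gaussians with equal variance (as arises for Gaussian Gibbs conditionals); it is not the paper's two-step coupling strategy for blocked Gibbs samplers.

```python
import math
import random

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def maximal_coupling_normal(m1, m2, sd, rng):
    """Sample (X, Y) with X ~ N(m1, sd^2) and Y ~ N(m2, sd^2) such that
    P(X == Y) = 1 - TV(N(m1, sd^2), N(m2, sd^2)), the largest possible value."""
    x = rng.gauss(m1, sd)
    if rng.random() * normal_pdf(x, m1, sd) <= normal_pdf(x, m2, sd):
        return x, x                                   # the two chains meet
    while True:                                       # otherwise draw Y from the non-overlapping part
        y = rng.gauss(m2, sd)
        if rng.random() * normal_pdf(y, m2, sd) > normal_pdf(y, m1, sd):
            return x, y
```

For N(0, 1) and N(1, 1) the meeting probability is 1 - TV = 2 - 2Φ(0.5) ≈ 0.617, and both coordinates keep their correct marginals, which is what keeps the coupled estimator unbiased.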

2410.04264 2026-03-03 stat.ML cs.LG

Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement

Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Niclas Goring, Ouns El Harzli, Abdurrahman Hadi Erturk, Soufiane Hayou, Ard A. Louis

Comments Published at ICLR 2026

Abstract

Dynamic feature transformation (the rich regime) does not always align with predictive performance (better representation), yet accuracy is often used as a proxy for richness, limiting analysis of their relationship. We propose a computationally efficient, performance-independent metric of richness grounded in the low-rank bias of rich dynamics, which recovers neural collapse as a special case. The metric is empirically more stable than existing alternatives and captures known lazy-to-rich transitions (e.g., grokking) without relying on accuracy. We further use it to examine how training factors (e.g., learning rate) relate to richness, confirming recognized assumptions and highlighting new observations (e.g., batch normalization promotes rich dynamics). An eigendecomposition-based visualization is also introduced to support interpretability, together providing a diagnostic tool for studying the relationship between training factors, dynamics, and representations.

2407.15256 2026-03-03 math.ST econ.EM stat.TH

Weak-instrument-robust subvector inference in instrumental variables regression: A subvector Lagrange multiplier test and properties of subvector Anderson-Rubin confidence sets

Malte Londschien, Peter Bühlmann

Abstract

We propose a weak-instrument-robust subvector Lagrange multiplier test for instrumental variables regression. We show that it is asymptotically size-correct under a technical condition or as the number of instruments grows to infinity. This is the first weak-instrument-robust subvector test for instrumental variables regression to recover the degrees of freedom of the commonly used non-weak-instrument-robust Wald test. Additionally, we provide a closed-form solution for subvector confidence sets obtained by inverting the subvector Anderson-Rubin test. We show that they are centered around a k-class estimator. We show that the subvector confidence sets for single coefficients of the causal parameter are jointly bounded if and only if Anderson's likelihood-ratio test rejects the null hypothesis that the first-stage regression parameter is of reduced rank, that is, that the causal parameter is not identified. Finally, we show that if a confidence set obtained by inverting the Anderson-Rubin test is bounded and nonempty, it is equal to a Wald-based confidence set with a data-dependent confidence level. We explicitly compute this Wald-based confidence set and its confidence level.

2406.16227 2026-03-03 stat.ML cs.LG stat.ME

VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data

Jackie Rao, Paul D. W. Kirk

Abstract

Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growing availability of high-dimensional categorical data, including 'omics data, necessitates computationally efficient clustering algorithms. We present VICatMix, a variational Bayesian finite mixture model designed for the clustering of categorical data. The use of variational inference (VI) in its training allows the model to outperform competitors in terms of efficiency while maintaining high accuracy. VICatMix furthermore performs variable selection, enhancing its performance on high-dimensional, noisy data. The proposed model incorporates summarisation and model averaging to mitigate poor local optima in VI, allowing improved estimation of the true number of clusters simultaneously with feature saliency. We demonstrate the performance of VICatMix with both simulated and real-world data, including applications to datasets from The Cancer Genome Atlas (TCGA), showing its use in cancer subtyping and driver gene discovery. We also demonstrate VICatMix's utility in integrative cluster analysis with different 'omics datasets, enabling the discovery of novel subtypes. Availability: VICatMix is freely available as an R package, incorporating C++ for faster computation, at https://github.com/j-ackierao/VICatMix.

2406.02701 2026-03-03 stat.CO

MPCR: Multi-Precision Computations Package in R

Mary Lai O. Salvana, Sameh Abdulah, Minwoo Kim, David Helmy, Ying Sun, Marc G. Genton

Abstract

In the early days of computing, severe memory constraints made it necessary to use lower floating-point precision. As hardware capabilities have advanced, modern systems, particularly in computational statistics and scientific computing, have widely adopted 64-bit precision to reduce numerical errors and support complex calculations. However, in some applications, double-precision accuracy exceeds practical requirements, prompting interest in lower-precision alternatives that reduce computational cost while maintaining adequate accuracy. This trend has accelerated with the advent of hardware optimized for low-precision computations, such as the Tensor Cores in recent NVIDIA GPUs. Although lower precision can introduce numerical and accuracy challenges, many applications demonstrate robustness under these conditions. Consequently, new multi-precision algorithms have been developed to balance accuracy and computational cost. To facilitate the adoption of these approaches in statistical computing, this article introduces MPCR, a new R package that supports arithmetic operations at 16-, 32-, and 64-bit precision. Written in C++ and integrated with Rcpp, MPCR delivers highly optimized multi-precision computations on both CPU and GPU, enabling seamless low-precision operations. Several examples demonstrate the benefits of MPCR in terms of both performance and accuracy.
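The accuracy side of this trade-off can be seen by storing the same number at the three precisions MPCR supports. The sketch below uses Python's standard struct module (format codes 'e', 'f', 'd' for IEEE 754 half, single and double precision); it is not MPCR's R interface, and the helper names are hypothetical.

```python
import struct

# IEEE 754 storage formats understood by the struct module.
FORMATS = {16: "e", 32: "f", 64: "d"}


def round_trip(value, bits):
    """Store value at the given precision and read it back as a Python float."""
    fmt = FORMATS[bits]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def storage_errors(value):
    """Absolute representation error of value at 16, 32 and 64 bits."""
    return {bits: abs(round_trip(value, bits) - value) for bits in FORMATS}


for bits, err in storage_errors(0.1).items():
    print(f"{bits}-bit error for 0.1: {err:.2e}")
```

For 0.1 the half-precision error is about 2e-5 and the single-precision error about 1e-9, while the double is exact relative to Python's own float; whether errors of that size are acceptable is exactly the application-dependent question the abstract raises.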

2309.13324 2026-03-03 stat.ME

Targeted Learning on Variable Importance Measure for Heterogeneous Treatment Effect

Haodong Li, Alan E Hubbard, Oliver J Hines, Andrea M Storås, Kajsa Kvist, Mark van der Laan

Abstract

Quantifying the heterogeneity of treatment effect is important for understanding how a commercial product or medical treatment affects different population subgroups. While much of treatment effect heterogeneity analysis focuses on the conditional average treatment effect, an alternative parameter that captures treatment effect heterogeneity is the variance of the treatment effect across different covariate groups. One can also derive variable importance parameters that measure (and rank) how much of the treatment effect heterogeneity is explained by a targeted subset of covariates. In this article, we propose a new targeted maximum likelihood estimator (TMLE) for a treatment effect variable importance measure, in the form of the difference of the variances of conditional average treatment effects. This TMLE is a pure plug-in estimator that consists of two steps: 1) the initial estimation of relevant components to plug in and 2) an iterative updating step to optimize the bias-variance tradeoff. Simulation results show that the proposed estimator has competitive performance in terms of lower bias and better confidence interval coverage compared to a simple substitution estimator and an estimating equation estimator. We apply these methods to data from a randomized clinical trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults. We find that the estimating equation estimator and the proposed TMLE suggest similar rankings of variable importance. The application also demonstrates the advantage of a plug-in estimator, which always respects both the global constraints on the data distribution and the fact that the estimand is a particular function of that distribution.

2603.01033 2026-03-03 stat.ME stat.AP stat.OT

Interpreting Net Survival: What We Estimate Versus What We Think We Estimate

Matthew J. Smith

Comments 21 pages, 4 figures

Abstract

Net survival is conventionally defined as "survival if cancer were the only possible cause of death", an estimand corresponding to cancer-specific mortality alone. The Pohar Perme estimator targets this by removing general-population other-cause mortality from observed total mortality, but achieves it only when cancer patients experience the same other-cause mortality as the general population. However, cancer patients often experience elevated other-cause mortality due to baseline health differences and treatment-induced effects. Using recent theoretical work decomposing total mortality into four components (cancer deaths, baseline health differences, treatment-induced other-cause deaths, and general-population other-cause mortality), we show that the Pohar Perme estimator delivers the sum of cancer deaths, baseline differences, and treatment-induced deaths, falling short of its intended estimand whenever either source of excess is present. Drawing on Botta et al., we present empirical evidence of relative risks of other-cause death ranging from 1.0 (colorectal cancer) to over 4.0 (head and neck cancers), together with calculations demonstrating that net survival can substantially underestimate cancer-specific survival probability when the relative risk exceeds 1.0. Critically, treatment-induced other-cause deaths represent irreducible causal pathways from cancer to death that cannot be eliminated through better stratification. We recommend interpreting net survival as "survival where general-population other-cause mortality is removed" rather than as a causal counterfactual, and call for more precise language in cancer epidemiology.
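The direction of the bias can be checked with a deliberately simple constant-hazard calculation. The sketch assumes (a simplification of ours, not the paper's model) that patients' other-cause hazard is a constant multiple rr of the general-population rate, lumping baseline differences and treatment-induced deaths into rr; the function name and the rates are hypothetical.

```python
import math


def net_vs_cancer_specific(cancer_hazard, pop_other_hazard, rr, t):
    """Toy comparison under constant hazards.

    Patients die of cancer at rate cancer_hazard and of other causes at
    rr * pop_other_hazard. A Pohar Perme-type estimator subtracts only the
    general-population rate, so its excess hazard also absorbs the extra
    (rr - 1) * pop_other_hazard.
    """
    total_hazard = cancer_hazard + rr * pop_other_hazard
    excess_hazard = total_hazard - pop_other_hazard
    net_survival = math.exp(-excess_hazard * t)       # what is estimated
    cancer_specific = math.exp(-cancer_hazard * t)    # what is often intended
    return net_survival, cancer_specific


for rr in (1.0, 2.0, 4.0):
    net, cs = net_vs_cancer_specific(0.10, 0.02, rr, t=5.0)
    print(f"rr={rr}: net survival {net:.3f} vs cancer-specific {cs:.3f}")
```

With these made-up rates, five-year net survival falls from about 0.61 at rr = 1 to about 0.45 at rr = 4, while cancer-specific survival stays at about 0.61, matching the qualitative claim that the gap opens as soon as the relative risk exceeds 1.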

2603.00973 2026-03-03 stat.AP stat.ME

A Dirichlet-Multinomial-Poisson framework for the coherent analysis and forecast of cause-specific mortality

Andrea Nigri, Han Lin Shang, Francesco Ungolo

Abstract

Separate modelling of cause-specific mortality rates and their projections can yield inconsistent forecasts when the sum of deaths by cause does not match the total observed in a population. We develop a hierarchical probabilistic framework for cause-specific mortality counts in which both the total number of deaths and the distribution of deaths across causes are treated as random. Conditional on the total number of deaths, cause-specific counts follow a multinomial distribution; the total count is modelled using a Poisson distribution, and the vector of cause-of-death probabilities is assigned a Dirichlet distribution. Variation in cause-specific mortality rates by age and calendar year is captured in both the Poisson and Dirichlet models, yielding interpretable demographic patterns while, by construction, preserving coherence between the sum of deaths by cause and total mortality. The method is illustrated through the analysis of cause-specific mortality rates in the United States and France, sourced from the Human Mortality Database from 1979 to 2023, separately by sex and across ages, with deaths grouped into major cause categories. The empirical analysis uses a rolling 15-year out-of-sample evaluation and compares the proposed model with the standard Lee-Carter model and its compositional extension. The results show that coherent projections can be obtained across countries and sexes, that competitive predictive accuracy is achieved, and that uncertainty is well calibrated for both total and cause-specific mortality.
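The generative structure described above (Poisson total, Dirichlet cause probabilities, multinomial split) can be sketched with Python's standard random module. In the paper the Poisson rate and Dirichlet parameters vary with age and calendar year; here they are fixed illustrative constants, and all function names are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; fine for moderate rates (exp(-rate) must not underflow)."""
    threshold, count, product = math.exp(-rate), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_dirichlet(alpha, rng):
    """Dirichlet draw via normalised independent Gamma(alpha_j, 1) variables."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_multinomial(n, probs, rng):
    """Assign each of n deaths to a cause with probabilities probs."""
    counts = [0] * len(probs)
    for j in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[j] += 1
    return counts


def sample_deaths(expected_total, alpha, seed=0):
    """One draw of (total deaths, deaths by cause) from the three-level model."""
    rng = random.Random(seed)
    total = sample_poisson(expected_total, rng)       # N ~ Poisson(expected_total)
    probs = sample_dirichlet(alpha, rng)              # p ~ Dirichlet(alpha)
    by_cause = sample_multinomial(total, probs, rng)  # D | N, p ~ Multinomial(N, p)
    return total, by_cause


total, by_cause = sample_deaths(expected_total=200.0, alpha=[8.0, 5.0, 2.0])
print(total, by_cause, sum(by_cause) == total)  # the last value is always True
```

The final check is the coherence property: because cause-specific counts are generated by splitting the simulated total, they sum to it in every draw, whatever values the rates take.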

2603.00971 2026-03-03 stat.ML cs.LG math.ST stat.TH

Random Features for Operator-Valued Kernels: Bridging Kernel Methods and Neural Operators

Mike Nguyen, Nicole Mücke

Abstract

In this work, we investigate the generalization properties of random feature methods. Our analysis extends prior results for Tikhonov regularization to a broad class of spectral regularization techniques and further generalizes the setting to operator-valued kernels. This unified framework enables a rigorous theoretical analysis of neural operators and neural networks through the lens of the Neural Tangent Kernel (NTK). In particular, it allows us to establish optimal learning rates and to quantify how many neurons are required to achieve a given accuracy. Furthermore, we establish minimax rates in the well-specified case and in the misspecified case, where the target is not contained in the reproducing kernel Hilbert space. These results sharpen and complete earlier findings for specific kernel algorithms.
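As a concrete instance of a random feature method, the sketch below builds random Fourier features whose inner products approximate a scalar Gaussian kernel; Tikhonov regularization on these features would then be ordinary ridge regression with num_features parameters. This is the classical scalar-valued construction, not the operator-valued setting analysed in the paper, and the function names are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def make_random_features(dim, num_features, lengthscale=1.0, seed=0):
    """Random Fourier feature map whose inner products approximate gaussian_kernel.

    Frequencies are drawn from the kernel's spectral density N(0, lengthscale^-2 I)
    and phases uniformly on [0, 2*pi).
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w * xi for w, xi in zip(freq, x)) + b)
                for freq, b in zip(freqs, phases)]

    return feature_map


phi = make_random_features(dim=2, num_features=4000)
x, y = [0.3, -0.2], [0.5, 0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
print(approx, gaussian_kernel(x, y))  # close; the error shrinks like 1/sqrt(num_features)
```

The number of features plays the role of the number of neurons in the abstract: the question of how many are needed for a given accuracy is the question of how fast this approximation, propagated through the regularized estimator, becomes negligible.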

2603.00968 2026-03-03 stat.ML cs.LG stat.ME

Learning with the Nash-Sutcliffe loss

Hristos Tyralis, Georgia Papacharalampous

Comments 77 pages, 4 figures, 6 tables

Abstract

The Nash-Sutcliffe efficiency ($\text{NSE}$) is a widely used, positively oriented relative measure for evaluating forecasts across multiple time series. However, it lacks a decision-theoretic foundation for this purpose. To address this, we examine its negatively oriented counterpart, which we refer to as Nash-Sutcliffe loss, defined as $L_{\text{NS}} = 1 - \text{NSE}$. We prove that $L_{\text{NS}}$ is strictly consistent for an elicitable and identifiable multi-dimensional functional, which we name the Nash-Sutcliffe functional. This functional is a data-weighted component-wise mean. The common practice of maximizing the average NSE across multiple series is the sample analog of minimizing the expected $L_{\text{NS}}$. Consequently, this operation implicitly assumes that all series originate from a single non-stationary, stochastic process. We introduce Nash-Sutcliffe linear regression, a multi-dimensional model estimated by minimizing the average $L_{\text{NS}}$, which reduces to a data-weighted least squares formulation. By reorienting the sample average loss function, we extend the previously proposed evaluation and estimation framework to forecasting multiple stationary dependent time series with differing stochastic properties. This constitutes a more natural empirical implementation of the $\text{NSE}$ than the earlier formulation. Our results establish a decision-theoretic foundation for $\text{NSE}$-based model estimation and forecast evaluation in large datasets, while further clarifying the benefits of global over local machine learning models.
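The relation between NSE, the Nash-Sutcliffe loss and data-weighted least squares can be made concrete in a few lines. The sketch below is a minimal illustration with hypothetical function names: each series' squared errors are divided by that series' total sum of squares, so averaging L_NS over series amounts to least squares with per-series weights 1/SST.

```python
def ns_loss(obs, sim):
    """Nash-Sutcliffe loss L_NS = 1 - NSE = SSE / SST for one series."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return sse / sst


def nse(obs, sim):
    """Nash-Sutcliffe efficiency (1 is a perfect fit, 0 matches the series mean)."""
    return 1.0 - ns_loss(obs, sim)


def average_ns_loss(pairs):
    """Average L_NS over (obs, sim) pairs: squared errors of each series are
    weighted by 1 / SST of that series, a data-weighted least-squares criterion."""
    return sum(ns_loss(obs, sim) for obs, sim in pairs) / len(pairs)


obs = [1.0, 2.0, 3.0, 4.0]
print(nse(obs, obs), nse(obs, [2.5] * 4))               # 1.0 0.0
print(average_ns_loss([(obs, obs), (obs, [2.5] * 4)]))  # 0.5
```

Because SST depends only on the observations, the weights are fixed during model fitting, which is why minimizing the average L_NS reduces to a weighted least-squares problem.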

2603.00955 2026-03-03 stat.ME cs.AI

Beyond False Discovery Rate: A Stepdown Group SLOPE Approach for Grouped Variable Selection

Xuelin Zhang, Jingxuan Liang, Xinyue Liu, Hong Chen, Biqin Song

Abstract

High-dimensional feature selection routinely requires balancing statistical power against strict control of multiple-error metrics such as the k-Family-Wise Error Rate (k-FWER) and the False Discovery Proportion (FDP), yet some existing frameworks, such as Sorted L-One Penalized Estimation (SLOPE), are confined to the narrower goal of controlling the expected False Discovery Rate (FDR) and cannot exploit the group structure of the covariates. We introduce the Group Stepdown SLOPE, a unified optimization procedure that embeds the Lehmann-Romano stepdown rules into SLOPE to achieve finite-sample guarantees under k-FWER and FDP thresholds. Specifically, we derive closed-form regularization sequences under orthogonal designs that provably bound k-FWER and FDP at user-specified levels, and extend these results to grouped settings via gk-SLOPE and gF-SLOPE, which control the analogous group-level errors gk-FWER and gFDP. For general non-orthogonal designs, we provide a calibrated data-driven sequence inspired by Gaussian approximation and Monte Carlo correction, preserving convexity and scalability. Extensive simulations are conducted across sparse, correlated, and group-structured regimes. The empirical results corroborate our theoretical findings: the proposed methods achieve nominal error control while yielding markedly higher power than competing stepdown procedures, confirming the practical value of the theoretical advances.
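To make the ingredients concrete, the sketch below computes the sorted-L1 (SLOPE) penalty, an unweighted group variant that ranks groups by Euclidean norm, the familiar BH-type lambda sequence, and the classical Lehmann-Romano stepdown levels for k-FWER. Mapping those levels to penalties through Phi^{-1}(1 - a_i/2) mirrors how the BH sequence is built, but it is an assumption for illustration; the paper derives its own closed-form sequences, and all names here are hypothetical.

```python
import math
from statistics import NormalDist


def lambdas_from_levels(levels):
    """Map per-rank two-sided levels a_i to penalties Phi^{-1}(1 - a_i / 2)."""
    z = NormalDist()
    return [z.inv_cdf(1.0 - a / 2.0) for a in levels]


def bh_lambdas(p, q):
    """BH-type SLOPE sequence: levels i * q / p, giving Phi^{-1}(1 - i*q/(2p))."""
    return lambdas_from_levels([i * q / p for i in range(1, p + 1)])


def lehmann_romano_kfwer_levels(p, k, alpha):
    """Lehmann-Romano stepdown levels for k-FWER control:
    k*alpha/p for i <= k and k*alpha/(p + k - i) for i > k."""
    return [k * alpha / p if i <= k else k * alpha / (p + k - i)
            for i in range(1, p + 1)]


def sorted_l1(beta, lambdas):
    """SLOPE penalty: the largest |beta| gets the largest lambda, and so on."""
    mags = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, mags))


def group_sorted_l1(groups, lambdas):
    """Group variant: rank groups by Euclidean norm instead of single coefficients."""
    norms = sorted((math.sqrt(sum(b * b for b in g)) for g in groups), reverse=True)
    return sum(lam * n for lam, n in zip(lambdas, norms))


kfwer_lambdas = lambdas_from_levels(lehmann_romano_kfwer_levels(p=5, k=2, alpha=0.05))
print([round(lam, 3) for lam in kfwer_lambdas])
print(sorted_l1([1.0, -3.0, 2.0], [3.0, 2.0, 1.0]))  # 3*3 + 2*2 + 1*1 = 14.0
```

The design point is that any nondecreasing sequence of per-rank levels yields a nonincreasing penalty sequence, so the sorted-L1 norm stays convex whichever stepdown rule supplies the levels.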

2603.00935 2026-03-03 stat.ML cs.LG

Time-Aware Latent Space Bayesian Optimization

Tuan A. Vu, Julien Martinelli, Harri Lähdesmäki

Abstract

Latent-space Bayesian optimization (LSBO) extends Bayesian optimization to structured domains, such as molecular design, by searching in the continuous latent space of a generative model. However, most LSBO methods assume a fixed objective, whereas real design campaigns often face temporal drift (e.g., evolving preferences or shifting targets). Bringing time-varying BO into LSBO is nontrivial: drift can affect not only the surrogate but also the latent search-space geometry induced by the representation. We propose Time-Aware Latent-space Bayesian Optimization (TALBO), which incorporates time in both the surrogate and the learned generative representation via a GP-prior variational autoencoder, yielding a latent space that stays aligned as objectives evolve. To evaluate time-varying LSBO systematically, we adapt widely used molecular design tasks to drifting multi-property objectives and introduce metrics tailored to changing targets. Across these benchmarks, TALBO consistently outperforms strong LSBO baselines and remains robust across drift speeds and design choices, while remaining competitive under truly time-invariant objectives.
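One common way to put time into a GP surrogate is a separable kernel over (latent point, time) pairs. The sketch below shows that generic construction only; TALBO additionally places time in the generative representation through a GP-prior VAE, which is not reproduced here, and the function names and length-scales are hypothetical.

```python
import math


def rbf(a, b, lengthscale):
    """Squared-exponential kernel on vectors a and b."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def time_aware_kernel(z1, t1, z2, t2, latent_ls=1.0, time_ls=5.0, variance=1.0):
    """Separable surrogate kernel k((z, t), (z', t')) = variance * k_z(z, z') * k_t(t, t').

    Evaluations made long ago (|t - t'| much larger than time_ls) are only weakly
    correlated with the present, so a GP using this kernel gradually discounts stale data.
    """
    return variance * rbf(z1, z2, latent_ls) * rbf([t1], [t2], time_ls)


z = [0.2, -1.0]
print(time_aware_kernel(z, 0.0, z, 1.0), time_aware_kernel(z, 0.0, z, 20.0))
```

Setting time_ls very large recovers a time-invariant surrogate, which is one way such a model can remain competitive when the objective does not actually drift.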

2603.00927 2026-03-03 stat.ME

Laplace Variational Inference for Bayesian Envelope Models

Seunghyeon Kim, Kwangmin Lee, Yeonhee Park

Comments 63 pages, 4 figures. Code available at https://github.com/Seunghyeon-Kim-stat/env-LVI

Abstract

Envelope models provide a sufficient dimension reduction framework for multivariate regression analysis. Bayesian inference for these models has been developed primarily using Markov chain Monte Carlo (MCMC) methods; however, the Gibbs sampling and Metropolis-Hastings algorithms used in this setting suffer from slow mixing and high computational cost. Although automatic differentiation variational inference (ADVI) has been explored for Bayesian envelope models, the resulting gradient-based optimization is often numerically unstable due to severe ill-conditioning of the posterior distribution. To address this issue, we propose a novel reparameterization of the posterior distribution that alleviates the ill-conditioning inherent in conventional variational approaches. Building on this reparameterization, we develop an efficient variational inference procedure. Since the resulting likelihood remains nonconjugate, we approximate the corresponding variational factor using a Laplace approximation within a coordinate-ascent variational inference (CAVI) framework. We establish theoretical results showing that, at each one-step coordinate update, the error of the Laplace approximation relative to the exact variational coordinate update converges to zero. Simulation studies and a real-data analysis demonstrate that the proposed method substantially improves computational efficiency while maintaining estimation accuracy and model-selection performance relative to existing approaches.
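The Laplace step used for a nonconjugate coordinate can be illustrated in one dimension: find the mode of the unnormalised log density with Newton's method and use the negative inverse curvature there as the variance. The toy factor below (a Poisson count with a Gaussian prior on its log-rate) is a hypothetical stand-in, not the reparameterized envelope factor from the paper.

```python
import math


def laplace_approximation(grad, hess, x0, tol=1e-10, max_iter=100):
    """Return (mode, variance) of the Gaussian N(mode, -1 / hess(mode)) that
    approximates a univariate log-concave unnormalised density.

    grad and hess are the first and second derivatives of the log density;
    the mode is found with Newton's method.
    """
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)
        x -= step
        if abs(step) < tol:
            break
    return x, -1.0 / hess(x)


# Toy non-conjugate factor (hypothetical): a Poisson count Y with log-rate x
# and a N(0, S2) prior, so log q(x) = Y*x - exp(x) - x^2 / (2*S2) + const.
Y, S2 = 7, 4.0


def grad_log_q(x):
    return Y - math.exp(x) - x / S2


def hess_log_q(x):
    return -math.exp(x) - 1.0 / S2


mode, var = laplace_approximation(grad_log_q, hess_log_q, x0=0.0)
print(f"mode {mode:.4f}, variance {var:.4f}")
```

Inside CAVI, a step like this would replace the intractable exact coordinate update each time the nonconjugate factor is revisited, with the other factors' expectations folded into grad and hess.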

2603.00888 2026-03-03 cs.LG stat.ML

Probabilistic Learning and Generation in Deep Sequence Models

Wenlong Chen

Comments PhD thesis

Abstract

Despite the exceptional predictive performance of deep sequence models (DSMs), a central concern in their deployment is their lack of uncertainty awareness. In contrast, probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Notably, Bayesian methods leverage Bayes' rule to express beliefs about unobserved variables in a principled way. Since exact Bayesian inference is computationally infeasible at scale, approximate inference is required in practice. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. In Chapters 3 and 4, we investigate how the architectures of DSMs themselves can inform the design of priors or approximations in probabilistic models. We first develop an approximate Bayesian inference method tailored to the Transformer, based on the similarity between attention and sparse Gaussian processes. Next, we exploit the long-range memory preservation capability of HiPPOs (High-order Polynomial Projection Operators) to construct an interdomain inducing point for Gaussian processes, which successfully memorizes the history in online learning. Beyond the progress of DSMs in predictive tasks, sequential generative models consisting of a sequence of latent variables have become popular in deep generative modelling. Inspired by the explicit self-supervised signals for these latent variables in diffusion models, in Chapter 5 we explore the possibility of improving other generative models with self-supervision for their sequential latent states, and investigate desirable probabilistic structures over them. Overall, this thesis leverages inductive biases in DSMs to design probabilistic inference methods and structures, bridging the gap between DSMs and probabilistic models to their mutual benefit.

2603.00875 2026-03-03 cs.CE cs.SY eess.SY stat.AP

Battery Lifetime Prediction using Data-driven Modeling Approaches

Vikram C Patil

Abstract

Batteries are ubiquitous today, with applications ranging from smartphones, watches, and laptops to electric cars, drones, and electric aircraft. Lithium-ion batteries are widely used in these applications due to their high energy density, rechargeability, and low lifecycle cost. Understanding the lifetime of lithium-ion batteries is essential for their effective utilization across many domains. In this study, data-driven modeling approaches are explored to predict the lifetime of lithium-ion batteries using various measurable battery parameters. A battery dataset from NASA's electric aircraft experiments was used, which included 17 predictor variables and remaining flight time as the response variable representing battery lifetime. The dataset contained more than 4,000,000 rows. However, the original dataset provided limited directly useful information about battery utilization over time; therefore, feature engineering was performed to generate more informative variables. Additionally, dimensionality reduction using principal component analysis (PCA) was applied to reduce computational cost and model complexity by selecting a smaller number of principal components as predictors for model development. Random forest and neural network models were explored for battery lifetime prediction using the engineered features. Multiple neural network configurations were evaluated, including single- and double-hidden-layer architectures with varying numbers of nodes. Mean squared error (MSE) on the test dataset was used as the performance metric for model comparison. The results indicate that data-driven modeling approaches are effective for battery lifetime prediction, with neural network models outperforming other models based on the MSE metric. Furthermore, neural networks demonstrate robustness in handling high-dimensional battery data.

2603.00849 2026-03-03 math.ST stat.TH

A new kernel-based index for the global sensitivity analysis of models with correlated inputs

Troy Larsen, Alen Alexanderian

Comments 22 pages

Abstract

We present an HSIC-based approach for global sensitivity analysis of broad classes of models with correlated and possibly function-valued inputs and outputs. To this end, we define the total HSIC sensitivity index: a bounded, interpretable, and moment-independent analogue to the total-effect Sobol' index. These desirable qualities hinge upon the key property of monotonicity under marginalization for the HSIC. We rigorously establish this monotonicity property by using a suitable class of augmented kernels. Furthermore, we provide an efficient algorithm for computing an empirical estimator of the HSIC that significantly reduces computational complexity and storage requirements. The effectiveness and interpretability of the total HSIC sensitivity indices are demonstrated through computational experiments on models that feature nonlinear relationships, correlated inputs, and functional outputs.
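For orientation, the sketch below computes the standard biased (V-statistic) HSIC estimate with Gaussian kernels on scalar inputs. It is the basic dependence measure the proposed index builds on, not the total HSIC sensitivity index, the augmented kernels, or the paper's efficient estimator; function names and bandwidths are hypothetical.

```python
import math


def gaussian_gram(values, bandwidth):
    """Gram matrix of the Gaussian kernel on scalar inputs."""
    return [[math.exp(-(a - b) ** 2 / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def double_center(gram):
    """Return H K H with H = I - (1/n) * ones * ones^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(xs, ys, bw_x=1.0, bw_y=1.0):
    """Biased (V-statistic) HSIC estimate trace(K H L H) / n^2."""
    n = len(xs)
    k_centered = double_center(gaussian_gram(xs, bw_x))
    l_gram = gaussian_gram(ys, bw_y)
    # trace(K H L H) = trace((H K H) L) by cyclic invariance of the trace
    return sum(k_centered[i][j] * l_gram[j][i]
               for i in range(n) for j in range(n)) / n ** 2


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
print(hsic(xs, [x ** 2 for x in xs]), hsic(xs, [1.0] * len(xs)))
```

The second value is zero (up to rounding) because a constant output carries no information about the input, which is the moment-independent behaviour that makes HSIC attractive for sensitivity analysis. The naive estimator costs O(n^2) time and memory, which is the burden the paper's algorithm aims to reduce.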

2603.00819 2026-03-03 math.NA cs.LG cs.NA math.ST stat.TH

A short tour of operator learning theory: Convergence rates, statistical limits, and open questions

Simone Brugiapaglia, Nicola Rares Franco, Nicholas H. Nelsen

Comments 12 pages

Abstract

This paper surveys recent developments at the intersection of operator learning, statistical learning theory, and approximation theory. First, it reviews error bounds for empirical risk minimization with a focus on holomorphic operators and neural network approximations. Next, it illustrates fundamental performance limits in terms of sample size by adopting a minimax perspective and considering various notions of regularity beyond holomorphy. The paper ends with a discussion on the interplay between these two perspectives and related open questions.

2603.00794 2026-03-03 math.ST stat.TH

A New Look at the Visual Performance of Nonparametric Hazard Rate Estimators

Olaf Gefeller, Nils Lid Hjort

Comments 8 pages, no figures. Statistical Research Report, Department of Mathematics, University of Oslo, June 1997, but now arXiv'd March 2026. Has later appeared in "Data Highways and Information Flooding, a Challenge for Classification and Data Analysis", 1997, Springer Verlag

详情
英文摘要

Nonparametric curve estimation by kernel methods has attracted widespread interest in theoretical and applied statistics. One area of conflict between theory and application concerns how the performance of the estimators should be evaluated. Recently, Marron and Tsybakov (1995) proposed visual error criteria for addressing this controversy in density estimation. Their core idea consists in using integrated alternatives to the Hausdorff distance, based on the Euclidean distance, for measuring the closeness of two sets. In this paper, we transfer these ideas to hazard rate estimation from censored data. We derive analogous results that help clarify when the new criteria lead to answers that differ from those given by the conventional approaches.
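A typical kernel hazard rate estimator from censored data smooths the Nelson-Aalen increments d_i / Y_i with a kernel (a Ramlau-Hansen-type estimator). The sketch below assumes an Epanechnikov kernel, a fixed bandwidth, and no tied observation times; it is one standard estimator of the kind whose visual performance the paper studies, not necessarily the exact one used there, and the function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_hazard(t, times, events, bandwidth):
    """Kernel-smoothed Nelson-Aalen hazard estimate at time t.

    times: observed times (event or censoring); events: 1 = event, 0 = censored.
    Assumes no tied times, so the at-risk count at the i-th ordered time is n - i.
    """
    data = sorted(zip(times, events))
    n = len(data)
    estimate = 0.0
    for i, (ti, di) in enumerate(data):
        at_risk = n - i  # subjects whose observed time is >= ti
        if di:
            estimate += epanechnikov((t - ti) / bandwidth) / at_risk
    return estimate / bandwidth


print(kernel_hazard(2.0, [3.0, 1.0, 2.0], [1, 1, 1], bandwidth=1.5))  # 67/108, about 0.6204
```

Censored observations still reduce the at-risk counts but contribute no increment, which is how the estimator uses incomplete follow-up.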

2603.00784 2026-03-03 math.PR math.ST stat.TH

On the time a diffusion process spends along a line

Nils Lid Hjort, Rafail Zalmonovich Khasminskii

Comments 16 pages, 0 figures; Statistical Research Report, Department of Mathematics, University of Oslo, October 1992, but now arXiv'd in March 2026. The paper is published, in essentially this form, in Stochastic Processes and their Applications, 1993, vol. 47, pages 229-247, and may be found at this url: www.sciencedirect.com/science/article/pii/030441499390016W

Journal ref
Stochastic Processes and their Applications, 1993, vol. 47, pages 229-247
Abstract

For an arbitrary diffusion process $X$ with time-homogeneous drift and variance parameters $μ(x)$ and $σ^2(x)$, let $V_\varepsilon$ be $1/\varepsilon$ times the total time $X(t)$ spends in the strip $[a+bt-(1/2)\varepsilon,a+bt+(1/2)\varepsilon]$. The limit $V$ as $\varepsilon\rightarrow0$ is the full half-line version of the local time of $X(t)-a-bt$ at zero, and can be thought of as the time $X$ spends along the straight line $x=a+bt$. We prove that $V$ is either infinite with probability 1 or distributed as a mixture of an exponential and a unit point mass at zero, and we give formulae for the parameters of this distribution in terms of $μ(\cdot)$, $σ(\cdot)$, $a$, $b$, and the starting point $X(0)$. The special case of a Brownian motion is studied in more detail, leading in particular to a full process $V(b)$ with continuous sample paths and exponentially distributed marginals. This construction leads to new families of bivariate and multivariate exponential distributions. Truncated versions of such 'total relative time' variables are also studied. A relation is pointed out to a second-order asymptotics problem in statistical estimation theory, recently investigated in Hjort and Fenstad (1992a, 1992b).

2603.00750 2026-03-03 math.ST math.PR stat.TH

A simple integral representation of single-event scoring rules

Alexander R. Pruss

Abstract

A simple integral representation involving no derivatives or continuity assumptions is given for proper single-event scoring rules.
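For background, the sketch below evaluates the classical Schervish-type integral representation of a proper scoring rule for a single binary event, with the mixing measure given by a weight function w on (0, 1) and the integrals approximated by the midpoint rule. Pruss's representation may differ in form and, unlike this sketch, needs no density or continuity assumptions; the function names are hypothetical.

```python
def penalty(p, outcome, weight, steps=20000):
    """Penalty for forecasting probability p when the event's outcome is 0 or 1.

    Schervish-type representation with a weight function w on (0, 1):
      outcome = 1: integral from p to 1 of (1 - t) * w(t) dt
      outcome = 0: integral from 0 to p of t * w(t) dt
    approximated here with the midpoint rule.
    """
    lo, hi = (p, 1.0) if outcome == 1 else (0.0, p)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += ((1.0 - t) if outcome == 1 else t) * weight(t)
    return total * h


def brier_weight(t):
    """Constant weight 2 reproduces the Brier penalties (1 - p)^2 and p^2."""
    return 2.0


def expected_penalty(p, q, weight):
    """Expected penalty of forecast p when the event has true probability q."""
    return q * penalty(p, 1, weight) + (1.0 - q) * penalty(p, 0, weight)


print(penalty(0.3, 1, brier_weight), penalty(0.3, 0, brier_weight))  # about 0.49 and 0.09
grid = [i / 10 for i in range(1, 10)]
print(min(grid, key=lambda p: expected_penalty(p, 0.3, brier_weight)))  # 0.3
```

The last line illustrates propriety: when the true probability is 0.3, reporting 0.3 minimizes the expected penalty on the grid. Other nonnegative weights give other proper rules (for example, w(t) = 1/(t(1 - t)) gives the logarithmic score).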

2603.00749 2026-03-03 stat.ME

Hidden in Plain Sight: How Non-Collapsibility Biases Treatment Effects in (Network) Meta-Analysis

Harlan Campbell, Jeroen P. Jansen

Abstract

Network meta-analysis (NMA) is widely used to compare multiple interventions simultaneously by synthesizing direct and indirect evidence. The general fixed or random effects contrast-based NMA model can be applied to different outcomes and data structures by opting for either an arm-based or contrast-based likelihood, depending on the data available. Depending on the outcome and link function, we estimate either collapsible or non-collapsible effect measures. Using an illustrative example involving binary outcomes and the non-collapsible odds ratio, we demonstrate that the standard NMA model produces estimates for non-collapsible effect measures that are biased toward the null when studies in the evidence base enroll heterogeneous populations (mixtures of distinct risk groups) that vary across studies. Importantly, this also holds when there are no differences in effect modifiers across studies: the standard assumption of a common treatment effect in the absence of differences in the distribution of effect modifiers is not appropriate when studies have different baseline risks. As a potential solution, we propose a "bookend" approach that explicitly models mixed-population studies as weighted combinations of two homogeneous subpopulations identified from studies with extreme baseline risks, and we provide guidance for practitioners to determine whether bias due to non-collapsibility may be a concern.
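The bias toward the null can be reproduced with a two-subgroup calculation: both risk groups share the same conditional odds ratio, yet a study enrolling their mixture reports a smaller marginal odds ratio. The risks, weights and function names below are hypothetical, chosen only to illustrate non-collapsibility.

```python
def odds(p):
    return p / (1.0 - p)


def risk_from_odds(o):
    return o / (1.0 + o)


def marginal_odds_ratio(control_risks, weights, conditional_or):
    """Mix subpopulations that share a common conditional OR and return the
    population-level (marginal) OR that a study enrolling the mixture would report."""
    treated_risks = [risk_from_odds(odds(r) * conditional_or) for r in control_risks]
    p0 = sum(w * r for w, r in zip(weights, control_risks))
    p1 = sum(w * r for w, r in zip(weights, treated_risks))
    return odds(p1) / odds(p0)


# Equal mix of a low-risk (10%) and a high-risk (60%) group, conditional OR = 3.
print(marginal_odds_ratio([0.1, 0.6], [0.5, 0.5], 3.0))  # roughly 2.13
```

Because the attenuation depends on the mixing weights and baseline risks, studies enrolling different mixtures report different marginal odds ratios even with identical effect modifiers, which is the situation the proposed "bookend" approach models explicitly.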

2603.00734 2026-03-03 stat.ME stat.AP

Robust Power and Sample Size Calculations in Quasi-likelihood Models: Methods and Practice

Shijie Yuan, Amy Cochran, Paul Rathouz

Abstract

Accurate power and sample size (PSS) calculations are essential for designing studies that use quasi-likelihood (QL) models, which extend generalized linear models (GLMs) to settings where the full distribution of the outcome is not specified. Traditional PSS approaches often rely on restrictive distributional assumptions, limiting their applicability when responses have non-standard distributions, variance functions are misspecified, or when predictors exhibit complex dependence structures. Building on recent advances in effect size measures for PSS - specifically, 2 Standard Deviations in the Linear Predictor (2SLiP) and Pseudo-Partial $R^2$ (P2R2) - developed with interpretability in mind, this paper extends and evaluates these effect size measures in the QL framework, keying in particular on their utility in PSS. We assess their empirical performance for the Wald test and then extend to the score test through extensive simulations across diverse outcome types, link functions, and variance structures. To illustrate practical utility, we applied these effect size measures to survey data on frontline health care workers from \citet{cahill2022occupational} to quantify the association between perceived personal protective equipment adequacy and mental health outcomes during the COVID-19 pandemic, adjusting for covariates. Our findings demonstrate that both 2SLiP and P2R2 provide robust and interpretable alternatives to traditional methods, maintaining accuracy with minimal distributional assumptions and enhancing the flexibility of PSS for realistic study designs.

2603.00716 2026-03-03 cs.LG stat.ML

Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^π$ Realizability for Deterministic Dynamics

Yijing Ke, Zihan Zhang, Ruosong Wang

Abstract

We study computationally and statistically efficient reinforcement learning under the linear $Q^π$ realizability assumption, where any policy's $Q$-function is linear in a given state-action feature representation. Prior methods in this setting are either computationally intractable or require (local) access to a simulator. In this paper, we propose a computationally efficient online RL algorithm, named Frozen Policy Iteration, under the linear $Q^π$ realizability setting that works for Markov Decision Processes (MDPs) with stochastic initial states, stochastic rewards and deterministic transitions. Our algorithm achieves a regret bound of $\widetilde{O}(\sqrt{d^2H^6T})$, where $d$ is the dimensionality of the feature space, $H$ is the horizon length, and $T$ is the total number of episodes. Our regret bound is optimal for linear (contextual) bandits, which are a special case of our setting with $H = 1$. Existing policy iteration algorithms under the same setting rely heavily on repeatedly sampling the same state via simulator access, which is not implementable in the online setting with stochastic initial states studied in this paper. In contrast, our new algorithm circumvents this limitation by strategically using only the high-confidence part of the trajectory data and freezing the policy for well-explored states, which ensures that all data used by our algorithm remain effectively on-policy throughout learning. We further demonstrate the versatility of our approach by extending it to the Uniform-PAC setting and to function classes with bounded eluder dimension.

2603.00636 2026-03-03 cs.LG physics.ao-ph stat.ML

Retrodictive Forecasting: A Proof-of-Concept for Exploiting Temporal Asymmetry in Time Series Prediction

Cedric Damour

Comments 27 pages, 13 figures, 5 tables, Code available at https://github.com/cdamour/retrodictive-forecasting (Zenodo: https://doi.org/10.5281/zenodo.18803446)

Abstract

We propose a retrodictive forecasting paradigm for time series: instead of predicting the future from the past, we identify the future that best explains the observed present via inverse MAP optimization over a Conditional Variational Autoencoder (CVAE). This conditioning is a statistical modeling choice for Bayesian inversion; it does not assert that future events cause past observations. The approach is theoretically grounded in an information-theoretic arrow-of-time measure: the symmetrized Kullback-Leibler divergence between forward and time-reversed trajectory ensembles provides both the conceptual rationale and an operational GO/NO-GO diagnostic for applicability. We implement the paradigm as MAP inference over an inverse CVAE with a learned RealNVP normalizing-flow prior and evaluate it on six time series cases: four synthetic processes with controlled temporal asymmetry and two ERA5 reanalysis datasets (wind speed and solar irradiance). The work makes four contributions: (i) a formal retrodictive inference formulation; (ii) an inverse CVAE architecture; (iii) a model-free irreversibility diagnostic; and (iv) a falsifiable validation protocol with four pre-specified predictions. All pre-specified predictions are empirically supported: the diagnostic correctly classifies all six cases; the learned flow prior improves over an isotropic Gaussian baseline on GO cases; the inverse MAP yields no spurious advantage on time-reversible dynamics; and on irreversible GO cases, it achieves competitive or superior RMSE relative to forward baselines, with a statistically significant 17.7% reduction over a forward MLP on ERA5 solar irradiance. These results provide a structured proof-of-concept that retrodictive forecasting can constitute a viable alternative to conventional forward prediction when statistical time-irreversibility is present and exploitable.
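The arrow-of-time idea behind the GO/NO-GO diagnostic can be illustrated on a discrete Markov chain, where the symmetrized KL divergence between forward pairs (X_t, X_{t+1}) and time-reversed pairs (X_{t+1}, X_t) is zero exactly when detailed balance holds. This pair-level, model-based quantity is only an illustration under that assumption; the paper's diagnostic is model-free and operates on trajectory ensembles of real-valued series, and the function names are hypothetical.

```python
import math


def stationary_distribution(P, iters=2000):
    """Stationary distribution of an ergodic, aperiodic chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def kl(p, q):
    """KL divergence between two joint distributions given as nested lists."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)


def arrow_of_time_score(P):
    """Symmetrised KL between forward pairs (X_t, X_{t+1}) and time-reversed
    pairs (X_{t+1}, X_t) of a stationary Markov chain; zero iff detailed balance holds."""
    pi = stationary_distribution(P)
    n = len(P)
    forward = [[pi[i] * P[i][j] for j in range(n)] for i in range(n)]
    reverse = [[forward[j][i] for j in range(n)] for i in range(n)]
    return kl(forward, reverse) + kl(reverse, forward)


reversible = [[0.6, 0.4], [0.4, 0.6]]
cyclic = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
print(arrow_of_time_score(reversible), arrow_of_time_score(cyclic))  # 0 and about 2.91
```

A score near zero corresponds to a NO-GO case, where running inference backwards in time has nothing to exploit; the cyclic chain, which prefers one direction of rotation, gives a clearly positive score.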

2603.00553 2026-03-03 math.ST stat.TH

Minimax Simple Bayes Estimators of a Normal Variance

Yuzo Maruyama

Comments 6 pages

This paper is a follow-up to Maruyama and Strawderman (2006, Journal of Statistical Planning and Inference), which identified a new class of generalized Bayes estimators with a particularly simple form for estimating a normal variance under entropy loss. Although their previous work established the Bayesianity of these estimators, it did not provide a closed-form result for their minimaxity. In this paper, we revisit the problem and establish a definitive closed-form minimaxity result for this class of simple Bayes estimators.
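
For context on the minimaxity question, the following textbook calculation assumes the common form of entropy (Stein) loss, $L(\delta,\sigma^2)=\delta/\sigma^2-\log(\delta/\sigma^2)-1$, and a statistic $S$ with $S/\sigma^2\sim\chi^2_\nu$. It is not the paper's result. The risk of the multiple $cS$ is free of $\sigma^2$, and minimizing it over $c$ gives $S/\nu$. This is the best estimator of the form $cS$ and the usual benchmark in this literature:

$$
R(cS,\sigma^2)=\mathbb{E}\left[\frac{cS}{\sigma^2}-\log\frac{cS}{\sigma^2}-1\right]
= c\,\nu-\log c-\mathbb{E}\left[\log\chi^2_\nu\right]-1,
\qquad
\frac{\partial R}{\partial c}=\nu-\frac{1}{c}=0\;\Longrightarrow\;c^\ast=\frac{1}{\nu}.
$$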

2603.00410 2026-03-03 stat.ME

Sensitivity Analysis for False Discovery Rate Estimation with Published p-Values

Tianyu Cao, Sangyoon Yi, Joshua Habiger

There is recent interest in estimating the false discovery rate (FDR) with published p-values. However, there is little formal research that addresses the manner and extent to which the presumed selection, or publication, bias model impacts the bias and variance of FDR estimators. This manuscript provides general and closed-form expressions for the bias and variance of an established FDR estimator when the publication bias model ($p<0.05$) may or may not be correct. These expressions reveal that FDR estimates could be conservative or liberal, depending on how well a $p<0.05$ publication rule approximates the true selection mechanism. Analysis of a well-studied large-scale replication project in psychology, where selection model parameters are estimable, suggests that the bias expressions are accurate in practice. Another well-studied collection of p-values mined from medical journal abstracts is used to illustrate how the closed-form expressions can facilitate a simple sensitivity analysis when the goal is FDR estimation from selected p-values with an unknown selection mechanism.
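
The sketch below shows why the assumed selection rule matters. It is a Storey-type estimate of the null fraction among published p-values, built on the assumption that a result is published if and only if $p<\alpha$. Under that rule, null p-values are Uniform(0, $\alpha$), so rescaling by $\alpha$ makes them Uniform(0, 1). This is an illustrative plug-in, not necessarily the established estimator studied in the paper. The function name and the default `lam` are hypothetical.

```python
import random

def published_null_fraction(pvals, alpha=0.05, lam=0.5):
    """Storey-type null-fraction estimate among published p-values.

    Assumes publication iff p < alpha, so null p-values are Uniform(0, alpha)
    and p / alpha is Uniform(0, 1) under the null.
    """
    u = [p / alpha for p in pvals if p < alpha]   # rescale survivors to (0, 1)
    if not u:
        raise ValueError("no p-values below alpha")
    return min(1.0, sum(x > lam for x in u) / (len(u) * (1.0 - lam)))
```

The real selection rule may differ from a hard $p<0.05$ cut, for example if marginal results are published more or less often. In that case the Uniform(0, $\alpha$) assumption for nulls fails, and the estimate can drift in either direction.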

2603.00393 2026-03-03 physics.geo-ph cs.LG stat.ML

Dual-space posterior sampling for Bayesian inference in constrained inverse problems

Ali Siahkoohi, Kamal Aghazade, Ali Gholami

Inverse problems constrained by partial differential equations are often ill-conditioned due to noisy and incomplete data or inherent non-uniqueness. A prominent example is full waveform inversion, which estimates Earth's subsurface properties by fitting seismic measurements subject to the wave equation, where ill-conditioning is inherent to noisy, band-limited, finite-aperture data and shadow zones. Casting the inverse problem into a Bayesian framework allows for a more comprehensive description of its solution, where instead of a single estimate, the posterior distribution characterizes non-uniqueness and can be sampled to quantify uncertainty. However, no clear procedure exists for translating hard physical constraints, such as the wave equation, into prior distributions amenable to existing sampling techniques. To address this, we perform posterior sampling in the dual space using an augmented Lagrangian formulation, which translates hard constraints into penalties amenable to sampling algorithms while ensuring their exact satisfaction. We achieve this by seamlessly integrating the alternating direction method of multipliers (ADMM) with Stein variational gradient descent (SVGD), a particle-based sampler, where the constraint is relaxed at each iteration and multiplier updates progressively enforce satisfaction. This enables constrained posterior sampling while inheriting the favorable conditioning properties of dual-space solvers, where partial constraint relaxation allows productive updates even when the current model is far from the true solution. We validate the method on a stylized Rosenbrock conditional inference problem and on frequency-domain full waveform inversion for a Gaussian anomaly model and the Marmousi II benchmark, demonstrating well-calibrated uncertainty estimates and posterior contraction with increasing data coverage.
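
The sampler at the core of the method is SVGD. The sketch below is a minimal NumPy version of one SVGD update on an unconstrained target, using an RBF kernel and a median-heuristic bandwidth. It does not include the augmented Lagrangian penalty or the ADMM multiplier updates that the paper wraps around the sampler. The step size and bandwidth rule are illustrative assumptions, and `svgd_step` is a hypothetical name.

```python
import numpy as np

def svgd_step(x, grad_log_p, step=0.5, h=None):
    """One SVGD update for particles x of shape (n, d), kernel k(a, b) = exp(-||a - b||^2 / h)."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]          # diff[j, i] = x_j - x_i
    sq = np.sum(diff ** 2, axis=-1)
    if h is None:                                  # median heuristic bandwidth
        h = np.median(sq) / np.log(n + 1.0) + 1e-12
    k = np.exp(-sq / h)                            # k[j, i] = k(x_j, x_i), symmetric
    drive = k.T @ grad_log_p(x)                    # sum_j k(x_j, x_i) grad log p(x_j)
    # sum_j grad_{x_j} k(x_j, x_i) = -2/h sum_j (x_j - x_i) k(x_j, x_i): keeps particles apart
    repulse = -2.0 / h * np.sum(diff * k[:, :, None], axis=0)
    return x + step * (drive + repulse) / n
```

Under the median heuristic, each kernel row sums to only a few units, so the effective per-particle step is roughly `step` times a few, divided by n. In practice, larger steps or adaptive step sizes are therefore common.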

2603.00365 2026-03-03 stat.AP econ.GN q-fin.EC

Randomized Recruitment Driven Sampling

Adam Visokay, Laura Boudreau, Rachel M. Heath, Tyler H. McCormick

Surveys are critical inputs for research and policy, yet enumerating a sampling frame is logistically infeasible or financially nonviable in many circumstances, such as during pandemics, natural disasters, or armed conflict. Respondent Driven Sampling (RDS) does not require a sampling frame, yet non-random peer recruitment often introduces substantial bias, particularly under high homophily. We introduce and evaluate Randomized Recruitment Driven Sampling (RRDS), a cellphone-based adaptation of RDS that incorporates researcher-controlled randomization into each recruitment wave. While standard RDS is necessary for stigmatized groups where network transparency is infeasible, RRDS is designed for low-stigma populations that become difficult to access due to logistical barriers. In these contexts, RRDS enforces the random recruitment assumption that traditional RDS relies upon but rarely achieves. Through simulation and an experiment surveying Bangladeshi garment workers during the COVID-19 pandemic, we demonstrate that RRDS produces less biased estimates and improved confidence interval coverage compared to traditional RDS. RRDS offers a scalable, remote-compatible alternative for studying low-stigma groups in challenging contexts where large-scale probability sampling is unsafe or infeasible.

2603.00347 2026-03-03 stat.ME

Synthetic Priors

Nick Polson, Vadim Sokolov

Bayesian inference in generalized linear models requires a prior on the coefficient vector $β$. Practitioners naturally reason about response probabilities at specific covariate values, not about abstract log-odds parameters. We develop synthetic priors: informative Bayesian priors for GLMs grounded in Good's device of imaginary observations, the principle that every conjugate prior is equivalent to a likelihood on pseudo-data from the same exponential family. The conditional means prior of Bedrick, Christensen and Johnson (1996; BCJ) elicits independent Beta priors on the conditional mean response at $p$ expert-chosen design points; the induced prior on $β$ is a product of binomial likelihoods at synthetic data points. Combined with Pólya-Gamma data augmentation (Polson et al., 2013), the posterior admits an exact conjugate Gibbs sampler (no tuning, no Metropolis step) by treating the augmented dataset as a standard logistic regression. We show that ridge regression and catalytic priors (Huang et al., 2020) are instances of Good's device, and identify prediction-powered inference (Angelopoulos et al., 2023) as a structural analogue in the frequentist setting; all three mediate a variance-bias tradeoff through a single informativeness parameter. We illustrate the approach on two benchmark problems: the Challenger O-ring data (Dalal et al., 1989), where the BCJ prior provides a more moderate posterior predictive at the 31°F launch temperature; and a Phase II atopic dermatitis dose-finding trial ($n = 300$), where the synthetic prior narrows 95% credible intervals by 3-6% and raises decision probabilities by up to 2 percentage points relative to a flat prior.
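
The phrase "treating the augmented dataset as a standard logistic regression" can be made concrete with a small sketch. It encodes each Beta($a_i$, $b_i$) belief at a design point as binomial pseudo-data, $a_i$ successes in $a_i + b_i$ trials. It then finds the posterior mode by Newton-Raphson. This is an assumption-laden simplification. It returns a MAP point estimate rather than running the paper's Pólya-Gamma Gibbs sampler, and it ignores any Jacobian term in the induced prior. The function name is hypothetical.

```python
import numpy as np

def fit_logistic_with_synthetic_data(X, y, X_syn, a, b, n_iter=50):
    """Posterior mode of logistic-regression coefficients when Beta(a_i, b_i) beliefs at
    design points X_syn are encoded as a_i successes out of a_i + b_i synthetic trials."""
    X_all = np.vstack([X, X_syn])
    succ = np.concatenate([y, a]).astype(float)
    trials = np.concatenate([np.ones(len(y)), np.asarray(a) + np.asarray(b)]).astype(float)
    beta = np.zeros(X_all.shape[1])
    for _ in range(n_iter):  # Newton-Raphson on the combined binomial log-likelihood
        p = 1.0 / (1.0 + np.exp(-X_all @ beta))
        grad = X_all.T @ (succ - trials * p)
        hess = X_all.T @ (X_all * (trials * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta
```

With no real data and a single intercept-only design point with $a=3$, $b=1$, the fit returns $\log 3$, the logit of the prior mean $3/4$. Real observations simply enter as additional rows with one trial each.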

2603.00343 2026-03-03 stat.ME

Causal Inference with MNAR Self-Masking Confounders: A Stratified Delta-Imputed Propensity Estimation Method

Md. Niamul Islam Sium, Mohammad Hridoy Patwary

Comments Under Review

In observational studies, causal inference becomes difficult when confounders are missing not at random (MNAR), particularly when the missingness depends on the confounder's own unreported value (self-masking). Existing methods for handling MNAR confounders often rely on strong, unverifiable assumptions, leading to biased estimates. We propose a simple approach, the Stratified Delta-Imputed Propensity Estimator (SDIPE), for settings with self-masking confounders. SDIPE first stratifies the data into observed and missing groups and imputes missing confounders via delta-adjusted multiple imputation. Then, within each group, average treatment effects (ATEs) are estimated using stabilized inverse probability weights. The final ATE is obtained by combining the subgroup-specific estimates, weighted by their respective proportions in the sample. A simulation study shows that SDIPE achieves low bias and near-nominal coverage (94-96%) across varying missingness rates, sample sizes, and treatment prevalences. In contrast, conventional sensitivity-based multiple imputation exhibits substantial bias and poor coverage (18-89%). Additionally, SDIPE is robust to the choice of the delta parameter. Applied to NHANES 2017-2018, SDIPE estimates that married individuals have a 1.19-point lower depression score than unmarried individuals (95% CI: -1.76, -0.64), adjusting for MNAR income data. SDIPE provides a practical and robust approach for causal inference with self-masking MNAR confounders, offering improved performance over existing methods without requiring restrictive assumptions about the missingness mechanism.
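
The sketch below covers only the weighting and combination steps. It assumes each stratum already has propensity scores; the delta-adjusted imputation and the propensity model are not shown. It uses the normalized (Hajek) form of IPW. Rescaling all weights in a treatment arm by a constant leaves the ratio unchanged, so this gives the same answer as stabilized weights $P(T=t)/e$. Stratum ATEs are then combined by sample proportions, as described above. Function names are hypothetical.

```python
import numpy as np

def stabilized_ipw_ate(y, t, e):
    """Normalized (Hajek) IPW estimate of the ATE given propensity scores e."""
    y, t, e = map(np.asarray, (y, t, e))
    w1, w0 = t / e, (1 - t) / (1 - e)       # arm-specific inverse probability weights
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))

def stratified_ate(strata):
    """Combine stratum ATEs weighted by stratum sample proportions.
    `strata` is a list of (y, t, e) tuples, e.g. observed and imputed groups."""
    sizes = np.array([len(s[0]) for s in strata], dtype=float)
    ates = np.array([stabilized_ipw_ate(*s) for s in strata])
    return float(np.sum(sizes / sizes.sum() * ates))
```

In a small constructed example, the outcome is $y = x + 2t$ and treatment is assigned exactly in proportion to known propensities 0.2 and 0.6. There the weighted contrast recovers the effect of 2 despite confounding by $x$.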

2603.00333 2026-03-03 math.OC cs.NA math.NA stat.ML

Dynamic Proximal Gradient Algorithms for Schatten-$p$ Quasi-Norm Regularized Problems

Weiping Shen, Linglingzhi Zhu, Yaohua Hu, Chong Li, Xiaoqi Yang

This paper investigates numerical solution methods for the Schatten-$p$ quasi-norm regularized problem with $p \in [0,1]$, which has been widely studied for finding low-rank solutions of linear inverse problems and has found successful applications in various fields of mathematics and applied science. We propose a dynamic proximal gradient algorithm that, through the use of the Cayley transformation, avoids computationally expensive singular value decompositions at each iteration, thereby significantly reducing the computational complexity. The algorithm incorporates two step size selection strategies: an adaptive backtracking search and an explicit step size rule. We establish the sublinear convergence of the proposed algorithm for all $p \in [0,1]$ within the framework of the Kurdyka-Łojasiewicz property. Notably, under mild assumptions, we show that the generated sequence converges to a stationary point of the objective function. For the special case $p=1$, linear convergence is further proved under the strict complementarity-type regularity condition commonly used in the linear convergence analysis of forward-backward splitting algorithms. Preliminary numerical results validate the superior computational efficiency of the proposed algorithm.
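
For reference, the sketch below is the plain SVD-based proximal gradient baseline for the $p=1$ (nuclear norm) case. The proximal map soft-thresholds singular values, and that full SVD per iteration is exactly the cost the paper's Cayley-transform scheme is designed to avoid. This is not the proposed algorithm. The fixed step size and function names are assumptions.

```python
import numpy as np

def prox_nuclear(Z, tau):
    """Proximal map of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def prox_grad_nuclear(grad_f, X0, lam, step, n_iter=100):
    """Proximal gradient for min_X f(X) + lam * ||X||_* (Schatten-1 case), fixed step."""
    X = X0
    for _ in range(n_iter):
        X = prox_nuclear(X - step * grad_f(X), step * lam)   # one SVD per iteration
    return X
```

For $f(X)=\tfrac12\|X-M\|_F^2$, the minimizer is `prox_nuclear(M, lam)`. With `step=1.0`, a single iteration reaches it. Smaller steps converge to the same point.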

2603.00322 2026-03-03 stat.ME

Fast distance computation of multivariate distributions via nonparanormal transport

Edward Shao, Junyoung Park, Naresh Punjabi, Hui Jiang, Irina Gaynanova

Comments 21 pages, 9 figures

With the increasing availability of data objects in the form of probability distributions, there is a growing need for statistical methods tailored to distributional data. Distance measures, especially the pairwise distance matrix between data objects, provide the foundation for a wide range of modern data analysis methods, such as clustering, multidimensional scaling, and distance-based regression. The Wasserstein distance is commonly used with distributional data due to its compelling optimal transport property. However, while the Wasserstein distance can be efficiently computed for univariate distributions, its application to multivariate distributions is limited by high computational costs. To address these scalability issues, we introduce the Nonparanormal Transport (NPT) metric, a closed-form distance based on the flexible nonparanormal distribution family for modeling skewed and non-Gaussian multivariate data. Simulation studies demonstrate that NPT maintains a high level of agreement with the Wasserstein distance while being at least 1000 times faster than efficient Wasserstein variants when computing a pairwise distance matrix for 100 distributions in both 2 and 5 dimensions. We illustrate the utility of NPT through a multidimensional scaling analysis of bivariate oxygen desaturation distributions of 723 individuals with sleep apnea in the Sleep Heart Health Study.
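
Two standard closed forms help explain the speed gap. The first is the univariate case, where the empirical Wasserstein distance reduces to comparing order statistics. The second is the Gaussian case, where the 2-Wasserstein distance has an analytic expression. The sketch below implements both. It is not the NPT metric itself, since how NPT handles nonparanormal marginals is not reproduced here. Function names are hypothetical.

```python
import numpy as np

def _psd_sqrt(S):
    """Symmetric square root of a positive semidefinite matrix via eigh."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def gaussian_w2(m1, S1, m2, S2):
    """Closed-form 2-Wasserstein distance between N(m1, S1) and N(m2, S2):
    ||m1 - m2||^2 + tr(S1 + S2 - 2 (S2^{1/2} S1 S2^{1/2})^{1/2})."""
    r2 = _psd_sqrt(S2)
    cross = _psd_sqrt(r2 @ S1 @ r2)
    d2 = np.sum((np.asarray(m1) - np.asarray(m2)) ** 2) + np.trace(S1 + S2 - 2.0 * cross)
    return float(np.sqrt(max(d2, 0.0)))

def empirical_w1_1d(x, y):
    """W1 between two equal-size univariate samples: mean gap between sorted values."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))
```

For diagonal covariances, the trace term reduces to $\sum_k(\sqrt{s_{1k}}-\sqrt{s_{2k}})^2$. For example, $N(0,\mathrm{diag}(1,4))$ and $N((3,4),\mathrm{diag}(4,9))$ are at distance $\sqrt{25+1+1}=\sqrt{27}$.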

2603.00291 2026-03-03 econ.EM stat.AP

Anticorruption Enforcement and Sale Mechanism Choice in China's Land Market

Julia Manso

Upon taking office in late 2012, Chinese President Xi Jinping launched one of the most intensive anticorruption campaigns in the history of the People's Republic of China. Prior to the campaign, China's land market suffered from corruption, particularly surrounding the choice of sale method (auction versus listing). Listing is a two-stage sale mechanism that prior research has identified as more susceptible to corruption, leading to lower prices. This paper examines the campaign's impact on land allocation, focusing on whether corruption influences the choice of sale method and, in turn, land sale prices. This paper is the first to apply the marginal structural model with fixed effects in the inverse probability of treatment weighting model of Blackwell and Yamauchi (2021, 2024); by absorbing time-invariant unobserved confounding and using a set of time-varying covariates as controls, this model can estimate causal effects in the land sale case. I find that indictments in a prefecture cause a statistically significant drop in the probability that land is sold via listing, an effect that is further compounded when indictments occur in consecutive months. Sensitivity analyses indicate that any violations of the identification assumptions would bias estimates towards zero, confirming the negative effect. A second marginal structural model shows that both mean and median land sale prices increase in the presence of indictments. Together, these results suggest that the anticorruption campaign not only deterred actual corrupt allocation practices but also affected the discretionary use of listings.

2603.00277 2026-03-03 stat.ME stat.CO

CliPS -- How to identify cluster distributions in Bayesian mixture models

Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, Bettina Grün

We propose the CliPS procedure for fitting Bayesian mixture models in the context of model-based clustering; it identifies the cluster distributions while simultaneously assessing the suitability of a cluster solution and validating the cluster structure. The procedure relies on the point process representation of a mixture model and is based on the assumption that a suitable cluster solution requires the clusters to be distinguishable with respect to a low-dimensional functional of the component-specific parameters of the mixture. CliPS maps the component-specific MCMC draws to the point process representation and identifies clusters there, exploiting the fact that, while data distributions usually overlap, the posteriors of these functionals become increasingly separated as the sample size grows. We outline the procedure and illustrate its use on several model-based clustering examples.

2602.21487 2026-03-03 math.ST stat.TH

Moment bounds for condition numbers and singular values of high-dimensional Gaussian random matrices: Applications and limitations

Partha Sarkar, Kshitij Khare, Sanvesh Srivastava

Spectral properties of Gram matrices are central to high-dimensional asymptotic analyses of statistical estimators in regression and covariance estimation. These properties, in turn, depend critically on the extreme singular values and condition numbers of Gaussian random matrices. For many applications, sharp positive and negative moment bounds for these quantities are required to control expected prediction risk and related performance metrics. Although extensive work provides concentration and tail bounds for extreme singular values of Gaussian random matrices, these results do not readily yield the moment bounds needed in such analyses. Motivated by this gap, we establish non-asymptotic moment bounds for arbitrary positive moments of the largest singular value and arbitrary negative moments of the smallest singular value, and uniform bounds for arbitrary positive moments of the condition number of high-dimensional Gaussian random matrices. We demonstrate the utility of these bounds by applying them to derive explicit risk guarantees in high-dimensional regression and covariance estimation, as well as to obtain bounds on the mean iteration complexity of gradient descent for solving Gram linear systems. Finally, we present counterexamples demonstrating that the positive condition number moment bounds and negative smallest singular value moment bounds cannot, in general, be extended to the broader class of sub-Gaussian random matrices.

2602.19691 2026-03-03 stat.ML cs.LG cs.NA math.NA

Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations

Yuhao Liu, Zilin Wang, Lei Wu, Shaobo Zhang

Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we study both approximation and statistical properties of neural networks with smooth activations for learning functions in the Sobolev space $W^{s,\infty}([0,1]^d)$ with $s>0$. We prove that constant-depth networks equipped with smooth activations achieve smoothness adaptivity: increasing width alone suffices to attain the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In contrast, for non-smooth activations such as ReLU, smoothness adaptivity is fundamentally limited by depth: the attainable approximation order is bounded by depth, and higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, complementary to depth, for achieving optimal rates over Sobolev function classes. Technically, our analysis is based on a multi-scale approximation framework that yields explicit neural network approximators with controlled parameter norms and model size. This complexity control ensures statistical learnability under empirical risk minimization (ERM) and avoids the impractical $\ell^0$-sparsity constraints commonly required in prior analyses.

2602.13104 2026-03-03 stat.ML cs.LG math.ST stat.TH

Random Forests as Statistical Procedures: Design, Variance, and Dependence

Nathaniel S. O'Connell

Comments 55 pages (35 page main text; 20 page supplement); 10 figures (9 main text; 1 supplement). Version 2: Added procedure-aligned synthetic resampling (PASR) estimation framework, pointwise prediction and confidence intervals, and comprehensive simulations validating theoretical claims

We develop a finite-sample, design-based theory for random forests in which each tree is a randomized conditional predictor acting on fixed covariates and the forest is their Monte Carlo average. An exact variance identity separates Monte Carlo error from a covariance floor that persists under infinite aggregation. The floor arises through two mechanisms: observation reuse, where the same training outcomes receive weight across multiple trees, and partition alignment, where independently generated trees discover similar conditional prediction rules. We prove the floor is strictly positive under minimal conditions and show that alignment persists even when sample splitting eliminates observation overlap entirely. We introduce procedure-aligned synthetic resampling (PASR) to estimate the covariance floor, decomposing the total prediction uncertainty of a deployed forest into interpretable components. For continuous outcomes, resulting prediction intervals achieve nominal coverage with a theoretically guaranteed conservative bias direction. For classification forests, the PASR estimator is asymptotically unbiased, providing the first pointwise confidence intervals for predicted conditional probabilities from a deployed forest. Nominal coverage is maintained across a range of design configurations for both outcome types, including high-dimensional settings. The underlying theory extends to any tree-based ensemble with an exchangeable tree-generating mechanism.
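
The split between Monte Carlo error and a covariance floor follows from the textbook variance of an average of exchangeable terms. Suppose the tree predictions $T_1(x),\dots,T_B(x)$ at a fixed $x$ are exchangeable, with common variance $\sigma^2(x)$ and pairwise covariance $c(x)$. (This notation is introduced here for illustration, and the paper's exact identity may be stated differently.) Then the second term vanishes as $B\to\infty$, while $c(x)$ does not:

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
= \frac{1}{B^2}\Big[B\,\sigma^2(x) + B(B-1)\,c(x)\Big]
= c(x) + \frac{\sigma^2(x) - c(x)}{B}.
$$

Observation reuse and partition alignment are then the two channels that keep $c(x)$ strictly positive.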

2601.18075 2026-03-03 stat.ME

Maximum-Variance-Reduction Stratification for Improved Subsampling

Dingyi Wang, Haiying Wang, Qingpei Hu

Subsampling is a widely used and effective approach for addressing the computational challenges posed by massive datasets. Substantial progress has been made in developing non-uniform, probability-based subsampling schemes that prioritize more informative observations. We propose a novel stratification mechanism that can be combined with existing subsampling designs to further improve estimation efficiency. We establish the estimator's asymptotic normality and quantify the resulting efficiency gains, which enables a principled procedure for selecting stratification variables and interval boundaries that target reductions in asymptotic variance. The resulting algorithm, Maximum-Variance-Reduction Stratification (MVRS), achieves significant improvements in estimation efficiency while incurring only linear additional computational cost. MVRS is applicable to both non-uniform and uniform subsampling methods. Experiments on simulated and real datasets confirm that MVRS markedly reduces estimator variance and improves accuracy compared with existing subsampling methods.

2601.12175 2026-03-03 q-fin.ST stat.AP

Distributional Fitting and Tail Analysis of Lead-Time Compositions: Nights vs. Revenue on Airbnb

Harrison E. Katz, Jess Needleman, Liz Medina

We analyze daily lead-time distributions for two Airbnb demand metrics, Nights Booked (volume) and Gross Booking Value (revenue), treating each day's allocation across 0-365 days as a compositional vector. The data span 2,557 days from January 2019 through December 2025 in a large North American region. Three findings emerge. First, GBV concentrates more heavily in mid-range horizons: beyond 90 days, GBV tail mass typically exceeds that of Nights by 20-50%, with ratios reaching 75% at the 180-day threshold during peak seasons. Second, Gamma and Weibull distributions fit comparably well under interval-censored cross-entropy. Gamma wins on 61% of days for Nights and 52% for GBV, with Weibull close behind at 38% and 45%. Lognormal rarely wins (<3%). Nonparametric GAMs achieve 18-80x lower CRPS but sacrifice interpretability. Third, generalized Pareto fits suggest bounded tails for both metrics at thresholds below 150 days, though this may partly reflect right-truncation at 365 days; above 150 days, estimates destabilize. Bai-Perron tests with HAC standard errors identify five structural breaks in the Wasserstein distance series, with early breaks coinciding with COVID-19 disruptions. The results show that volume and revenue lead-time shapes diverge systematically, that simple two-parameter distributions capture daily pmfs adequately, and that tail inference requires care near truncation boundaries.

2601.04906 2026-03-03 math.ST stat.TH

Inference for concave distribution functions under measurement error

Mohammed Es-Salih Benjrada, Cecile Durot, Tommaso Lando

We propose an estimator of a concave cumulative distribution function under the measurement error model, where the non-negative variables of interest are perturbed by additive independent random noise. The estimator is defined as the least concave majorant on the positive half-line of the deconvolution estimator of the distribution function. We show its uniform consistency and its square root convergence in law in $\ell_\infty(\mathbb R)$. To assess the validity of the concavity assumption, we construct a test for the nonparametric null hypothesis that the distribution function is concave on the positive half-line, against the alternative that it is not. We calibrate the test using bootstrap methods. The theoretical justification for calibration led us to establish a bootstrap version of Theorem 1 in Söhl and Trabs (2012), a Donsker-type result from which we obtain, as a special case, the limiting behavior of the deconvolution estimator of the distribution function in a bootstrap setting with measurement error. Combining this Donsker-type theorem with the functional delta method, we show that the test statistic and its bootstrap version have the same limiting distribution under the null hypothesis, whereas under the alternative, the bootstrap statistic is stochastically smaller. Consequently, the power of the test tends to one, for any fixed alternative, as the sample size tends to infinity. In addition to the theoretical results for the estimator and the test, we investigate their finite-sample performance in simulation studies.
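
The least concave majorant step is a small geometric computation. On a grid, the LCM is the upper convex hull of the points $(x_i, F(x_i))$, linearly interpolated back onto the grid. The minimal sketch below assumes the input curve has already been evaluated on an increasing grid of the positive half-line. In the paper, that input would be the deconvolution estimate of the distribution function, which is not shown here, and neither is the bootstrap calibration. The function name is hypothetical.

```python
import numpy as np

def least_concave_majorant(xs, ys):
    """LCM of points (xs[i], ys[i]) with strictly increasing xs, evaluated at xs.
    Built as the upper convex hull (monotone-chain scan) plus linear interpolation."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    hull = []  # indices of upper-hull vertices, left to right
    for i in range(len(xs)):
        # Drop the last vertex while it lies on or below the chord from hull[-2] to point i.
        while len(hull) >= 2:
            j, k = hull[-2], hull[-1]
            cross = (xs[k] - xs[j]) * (ys[i] - ys[j]) - (ys[k] - ys[j]) * (xs[i] - xs[j])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(xs, xs[hull], ys[hull])
```

A concave input is returned unchanged. For a non-concave input such as values $(0, 0, 2, 2)$ on grid $(0, 1, 2, 3)$, the dip at $x=1$ is lifted to the chord value 1.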

2601.04584 2026-03-03 math.PR math.ST stat.TH

Distributional Limits for Eigenvalues of Graphon Kernel Matrices

Behzad Aalipur

We study the fluctuation behavior of individual eigenvalues of kernel matrices arising from dense graphon-based random graphs. Under minimal integrability and boundedness assumptions on the graphon, we establish distributional limits for simple, well-separated eigenvalues of the associated integral operator. A sharp probabilistic dichotomy emerges: in the non-degenerate regime, the properly normalized empirical eigenvalue satisfies a central limit theorem with an explicit variance, whereas in the degenerate regime the leading stochastic term vanishes and the centered eigenvalue converges to a weighted chi-square law determined by the operator spectrum. The analysis requires no smoothness or Lipschitz conditions on the kernel. Prior work under comparable assumptions established only operator convergence and eigenspace consistency; the present results characterize the full distributional behavior of individual eigenvalues, extending fluctuation theory beyond the reach of classical operator-level arguments. The proofs combine second-order perturbation expansions, concentration bounds for kernel matrices, and Hoeffding decompositions for symmetric statistics, revealing that at the $\sqrt{n}$ scale the dominant randomness arises from latent-position sampling rather than Bernoulli edge noise.
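
A rank-one graphon makes the non-degenerate regime easy to see. For $W(x,y)=xy$, the integral operator has a single nonzero eigenvalue, $\int_0^1 x^2\,dx = 1/3$. The top eigenvalue of the kernel matrix $K_{ij}=W(U_i,U_j)/n$ equals $\frac1n\sum_i U_i^2$ exactly, so $\sqrt{n}(\hat\lambda-1/3)$ has variance $\operatorname{Var}(U^2)=1/5-1/9=4/45$, a standard deviation of about 0.298. In this example all randomness comes from the latent positions. The sketch below is an illustration under stated assumptions: the diagonal is kept, and no Bernoulli edges are drawn. Function names are hypothetical.

```python
import numpy as np

def top_kernel_eigenvalue(W, n, rng):
    """Largest eigenvalue of K_ij = W(U_i, U_j) / n with U_i ~ Uniform(0, 1), diagonal kept."""
    u = rng.uniform(size=n)
    K = W(u[:, None], u[None, :]) / n
    return float(np.linalg.eigvalsh(K)[-1])   # eigvalsh sorts ascending

def scaled_fluctuations(W, lam, n, reps, seed=0):
    """Draws of sqrt(n) * (empirical top eigenvalue - operator eigenvalue lam)."""
    rng = np.random.default_rng(seed)
    return np.array([np.sqrt(n) * (top_kernel_eigenvalue(W, n, rng) - lam)
                     for _ in range(reps)])
```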

2601.03059 2026-03-03 stat.ME

On the bias of the Hoover index estimator: Results for the gamma distribution

Roberto Vila, Helton Saulo

Comments 16 pages, 2 figures

The Hoover index is a widely used measure of inequality with an intuitive interpretation, yet little is known about the finite-sample properties of its empirical estimator. In this paper, we derive a simple expression for the expected value of the Hoover index estimator for general non-negative populations, based on Laplace transform techniques and exponential tilting. This unified framework applies to both continuous and discrete distributions. Explicit bias expressions are obtained for gamma populations, showing that the estimator is generally biased in finite samples. Numerical and simulation results illustrate the magnitude of the bias and its dependence on the underlying distribution and sample size.
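
The sketch below gives the estimator and a Monte Carlo check of its bias under a gamma population. The population value uses the gamma mean absolute deviation, $2k^k e^{-k}/\Gamma(k)$ for unit scale. Dividing by twice the mean $k$ gives $H = k^{k-1}e^{-k}/\Gamma(k)$, which is scale-free and equals $1/e$ for the exponential. The paper's exact Laplace-transform expressions are not reproduced. Function names are hypothetical.

```python
import math
import random

def hoover_index(x):
    """Empirical Hoover index: sum_i |x_i - mean| / (2 * sum_i x_i)."""
    total = sum(x)
    mean = total / len(x)
    return sum(abs(v - mean) for v in x) / (2.0 * total)

def gamma_hoover(shape):
    """Population Hoover index of a gamma law with shape k: k^(k-1) e^(-k) / Gamma(k)."""
    k = shape
    return math.exp((k - 1.0) * math.log(k) - k - math.lgamma(k))

def mc_bias(shape, n, reps=1000, seed=0):
    """Monte Carlo bias of the estimator for samples of size n from Gamma(shape, 1)."""
    rng = random.Random(seed)
    est = [hoover_index([rng.gammavariate(shape, 1.0) for _ in range(n)]) for _ in range(reps)]
    return sum(est) / reps - gamma_hoover(shape)
```

Comparing `mc_bias` at small and large `n` shows how the finite-sample bias shrinks with sample size.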

2512.06116 2026-03-03 stat.AP q-bio.QM

Spatial Analysis for AI-segmented Histopathology Images: Methods and Implementation

Yoolkyu Park, Fangjiang Wu, Xin Feng, Shengjie Yang, Elizabeth H. Wang, Bo Yao, Chul Moon, Guanghua Xiao, Qiwei Li

Comments 44 pages, 2 figures

Quantitative characterization of cellular spatial organization is critical for understanding tumor progression and immune response. Recent advances in artificial intelligence (AI) enable large-scale segmentation and classification of nuclei from digitized histopathology slides, producing massive point pattern and marked point pattern data. However, accessible and standardized tools for downstream spatial statistical analysis remain limited. We present SASHIMI (Spatial Analysis for Segmented Histopathology Images using Machine Intelligence), a browser-based platform for real-time spatial analysis of AI-segmented histopathology images. Rather than proposing new spatial methods, SASHIMI systematically organizes and operationalizes 27 widely used spatial summary statistics, areal indices, and topological features within a unified computational framework. The platform computes mathematically grounded descriptors including K-, L-, G-, F-, and J-functions, pair correlation and mark connection functions, spatial autocorrelation measures, similarity indices, and persistent homology-based topological summaries. Outputs include both functional curves and scalar feature tables suitable for downstream statistical modeling. We illustrate the framework using two cancer cohorts: oral potentially malignant disorders and non-small-cell lung cancer. Across datasets, cross-type spatial interactions and topological descriptors show associations with patient survival, demonstrating that complementary spatial features capture distinct aspects of tumor microenvironment architecture. SASHIMI provides an accessible, reproducible platform for single-cell-level spatial profiling of tumor tissue, enabling interactive visualization and standardized feature extraction without requiring programming expertise.
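
Among the listed descriptors, the K- and L-functions are the simplest to state. The minimal sketch below is a naive Ripley's K estimate with no edge correction. It counts ordered pairs within distance $r$, scaled by area over $n(n-1)$, and applies Besag's transform $L(r)=\sqrt{K(r)/\pi}$, which is approximately $r$ under complete spatial randomness. SASHIMI's actual estimators and edge corrections are not reproduced here. Function names are hypothetical.

```python
import math

def ripley_k(points, r, area):
    """Naive Ripley's K (no edge correction): area / (n (n - 1)) * #{ordered i != j: d_ij <= r}."""
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                count += 1
    return area * count / (n * (n - 1))

def ripley_l(points, r, area):
    """Besag's L-function, L(r) = sqrt(K(r) / pi)."""
    return math.sqrt(ripley_k(points, r, area) / math.pi)
```

For the four corners of the unit square at $r=1$, each point has two neighbours at distance 1. That gives 8 ordered pairs and $K = 8/12$. At $r=1.5$, the diagonals are included and $K=1$.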

2511.10967 2026-03-03 stat.CO math.OC math.ST stat.TH

Autocovariance and Optimal Design for Random Walk Metropolis-Hastings Algorithm

Jingyi Zhang, James C. Spall

The Metropolis-Hastings algorithm has been extensively studied in the estimation and simulation literature, with most prior work focusing on convergence behavior and asymptotic theory. However, its covariance structure, an important statistical property for both theory and implementation, remains less understood. In this work, we provide new theoretical insights into the scalar case, focusing primarily on symmetric unimodal target distributions with symmetric random walk proposals, where we also establish an optimal proposal design. In addition, we derive some more general results beyond this setting. For the high-dimensional case, we relate the covariance matrix to the classical 0.23 average acceptance rate tuning criterion.
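
The scalar setting can be explored empirically. The sketch below runs random walk Metropolis on a standard normal target, which is symmetric and unimodal, using symmetric Gaussian proposals. It reports the acceptance rate and the lag-1 sample autocorrelation. It is a simulation aid, not the paper's analytical covariance results or optimal design. For a one-dimensional normal target, a proposal scale near 2.4 gives acceptance close to 0.44, while very small steps accept almost everything but mix slowly. The function name is hypothetical.

```python
import math
import random

def rwmh_normal(n, step, seed=0):
    """Random walk Metropolis on N(0, 1) with N(0, step^2) proposals.
    Returns the chain, its acceptance rate, and its lag-1 sample autocorrelation."""
    rng = random.Random(seed)
    x, chain, accepted = 0.0, [], 0
    for _ in range(n):
        y = x + step * rng.gauss(0.0, 1.0)
        # Log acceptance ratio for the N(0, 1) target; the symmetric proposal cancels.
        if math.log(1.0 - rng.random()) < 0.5 * (x * x - y * y):
            x, accepted = y, accepted + 1
        chain.append(x)
    m = sum(chain) / n
    c0 = sum((v - m) ** 2 for v in chain) / n
    c1 = sum((chain[i] - m) * (chain[i + 1] - m) for i in range(n - 1)) / n
    return chain, accepted / n, c1 / c0
```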

2511.00944 2026-03-03 stat.ME econ.EM

On the estimation of leverage effect and volatility of volatility in the presence of jumps

Qiang Liu, Zhi Liu, Wang Zhou

We study the estimation of the leverage effect and the volatility of volatility using high-frequency data in the presence of jumps. We first construct a spot volatility estimator that uses the empirical characteristic function of the high-frequency increments to deal with the effect of jumps, and based on it we propose estimators of the leverage effect and the volatility of volatility. Compared with existing estimators, our method is valid under more general jumps, making it a better alternative for empirical applications. Under some mild conditions, the asymptotic normality of the estimators is established, and consistent estimators of the limiting variances are proposed based on the estimation of volatility functionals. We conduct an extensive simulation study to verify the theoretical results. The results demonstrate that our estimators perform better than existing ones, especially when the jumps are of infinite variation. Finally, we apply our estimators to a real high-frequency dataset, which reveals a nonzero leverage effect and volatility of volatility in the market.
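
The jump robustness of the characteristic-function approach comes from a simple fact. If increments are $\Delta X \sim N(0,\sigma^2\Delta)$, then $\mathbb{E}\cos(u\,\Delta X/\sqrt{\Delta}) = e^{-u^2\sigma^2/2}$. Inverting the empirical average therefore estimates $\sigma^2$, and each jump can move the average by at most $2/n$. The sketch below is a simplified, window-level illustration of that idea only, assuming constant volatility within the window. The paper's spot estimator, its bias corrections, and the leverage and vol-of-vol estimators built on it are not reproduced. The function name and the default `u` are hypothetical.

```python
import math
import random

def ecf_volatility(increments, dt, u=1.0):
    """Jump-robust variance-per-unit-time estimate over one window (constant sigma assumed):
    invert mean(cos(u * dX / sqrt(dt))) ~= exp(-u^2 sigma^2 / 2)."""
    phi = sum(math.cos(u * dx / math.sqrt(dt)) for dx in increments) / len(increments)
    return -2.0 * math.log(phi) / (u * u)   # requires phi > 0, so u must not be too large
```

With $\sigma=0.5$ and a few large jumps added, the estimate stays near $0.25$. The realized variance $\sum \Delta X^2$ over the same unit window is inflated by the squared jump sizes.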

2510.21314 2026-03-03 cs.LG cs.AI stat.ML

A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization

Xuan Tang, Jichu Li, Difan Zou

Comments 68 pages, 13 figures, ICLR 2026

The rapid scaling of large language models (LLMs) has made low-precision training essential for reducing memory, improving efficiency, and enabling larger models and datasets. Existing convergence theories for adaptive optimizers, however, assume all components are exact and neglect hardware-aware quantization, leaving open the question of why low-precision training remains effective. We introduce the first theoretical framework for analyzing the convergence of adaptive optimizers, including Adam and Muon, under floating-point quantization of gradients, weights, and optimizer states (e.g., moment estimates). Within this framework, we derive convergence rates on smooth non-convex objectives under standard stochastic gradient assumptions, explicitly characterizing how quantization errors from different components affect convergence. We show that both algorithms retain rates close to their full-precision counterparts provided mantissa length scales only logarithmically with the number of iterations. Our analysis further reveals that Adam is highly sensitive to weight and second-moment quantization due to its reliance on $β_2 \to 1$, while Muon requires weaker error control and is thus potentially more robust. These results narrow the gap between empirical success and theoretical understanding of low-precision training methods. Numerical experiments on synthetic and real-world data corroborate our theory.
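
Weight quantization can erase small updates entirely. The minimal sketch below rounds values to `m_bits` significant bits (including the leading bit), using round-to-nearest and ignoring exponent range limits (no overflow, underflow or subnormals). It then runs one scalar Adam step with the gradient, both moments and the weight stored at that precision. The quantizer and the Adam wrapper are illustrative and are not the paper's framework. Once an update is smaller than half a unit in the last place of the weight, rounding removes it.

```python
import math

def quantize_mantissa(x, m_bits):
    """Round x to m_bits significant bits (round-to-nearest); relative error <= 2**-m_bits.
    Exponent range limits are ignored."""
    if x == 0.0 or not math.isfinite(x):
        return x
    mant, exp = math.frexp(x)            # x = mant * 2**exp with 0.5 <= |mant| < 1
    scale = 2.0 ** m_bits
    return math.ldexp(round(mant * scale) / scale, exp)

def adam_step_quantized(w, g, state, m_bits, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One scalar Adam update with gradient, moments and weight stored at m_bits bits."""
    q = lambda v: quantize_mantissa(v, m_bits)
    t = state["t"] + 1
    g = q(g)
    m = q(b1 * state["m"] + (1 - b1) * g)
    v = q(b2 * state["v"] + (1 - b2) * g * g)
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias-corrected moments
    w = q(w - lr * m_hat / (math.sqrt(v_hat) + eps))
    state.update(t=t, m=m, v=v)
    return w
```

With 8 significant bits, the first Adam step of about `lr = 1e-3` on a weight of 1.0 is rounded away. With 24 bits, the step is kept.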

2509.23357 2026-03-03 cs.LG math.OC stat.ML

Landing with the Score: Riemannian Optimization through Denoising

Andrey Kharitenko, Zebang Shen, Riccardo de Santi, Niao He, Florian Doerfler

Comments 41 pages, 9 figures

Details
Abstract

Under the data manifold hypothesis, high-dimensional data are concentrated near a low-dimensional manifold. We study the problem of Riemannian optimization over such manifolds when they are given only implicitly through the data distribution, and the standard manifold operations required by classical algorithms are unavailable. This formulation captures a broad class of data-driven design problems that are central to modern generative AI. Our key idea is to introduce a link function that connects the data distribution to the geometric operations needed for optimization. We show that this function enables the recovery of essential manifold operations, such as retraction and Riemannian gradient computation. Moreover, we establish a direct connection between our construction and the score function in diffusion models of the data distribution. This connection allows us to leverage well-studied parameterizations, efficient training procedures, and even pretrained score networks from the diffusion model literature to perform optimization. Building on this foundation, we propose two efficient inference-time algorithms -- Denoising Landing Flow (DLF) and Denoising Riemannian Gradient Descent (DRGD) -- and provide theoretical guarantees for both feasibility (approximate manifold adherence) and optimality (small Riemannian gradient norm). Finally, we demonstrate the effectiveness of our approach on finite-horizon reference tracking tasks in data-driven control, highlighting its potential for practical generative and design applications.
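
One standard way to see a link between denoising and manifold operations is Tweedie's formula: for data smoothed with Gaussian noise of scale sigma, the posterior-mean denoiser equals x + sigma**2 * score(x). The toy below assumes a finite point cloud on the unit circle, computes that denoiser exactly (no learned score network), and alternates a Euclidean gradient step with a denoising step to minimize f(x) = -x[0]. It only illustrates the general idea; it is not DLF, DRGD, or the paper's link function, and all names and parameters are hypothetical.

```python
# Illustrative toy only: exact Tweedie denoiser on a known point cloud; names are hypothetical.
import math


def tweedie_denoise(x, data, sigma):
    """Posterior mean of a data point given a noisy observation x.

    By Tweedie's formula this equals x + sigma**2 * score(x), where score is the
    gradient of the log-density of the data smoothed with N(0, sigma**2 I). For an
    empirical distribution it is a softmax-weighted average of the data points.
    """
    logw = [-((x[0] - a) ** 2 + (x[1] - b) ** 2) / (2.0 * sigma**2) for a, b in data]
    top = max(logw)  # subtract the max before exponentiating for numerical stability
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return (
        sum(wi * a for wi, (a, _) in zip(w, data)) / total,
        sum(wi * b for wi, (_, b) in zip(w, data)) / total,
    )


def denoised_gradient_descent(grad, x0, data, sigma=0.05, step=0.1, iters=200):
    """Euclidean gradient step, then a denoising step that pulls the iterate back toward the data."""
    x = tweedie_denoise(x0, data, sigma)
    for _ in range(iters):
        g = grad(x)
        x = tweedie_denoise((x[0] - step * g[0], x[1] - step * g[1]), data, sigma)
    return x


# Toy "data manifold": 360 points on the unit circle. The minimizer of
# f(x) = -x[0] on the circle is (1, 0); the gradient of f is (-1, 0).
circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360)) for k in range(360)]
x_star = denoised_gradient_descent(lambda x: (-1.0, 0.0), (-0.6, 0.8), circle)
print(x_star)
```

Because the denoiser averages nearby data points, the final iterate sits slightly inside the circle (a sigma-dependent offset), loosely analogous to the approximate manifold adherence mentioned in the abstract.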

2509.22240 2026-03-03 eess.IV cs.CV cs.LG stat.AP stat.ML

COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics

Matt Y. Cheung, Ashok Veeraraghavan, Guha Balakrishnan

Comments Accepted at ICLR 2026

Details
Abstract

In clinical applications, the utility of segmentation models is often judged by the accuracy of derived downstream metrics such as organ size, rather than by the pixel-level accuracy of the segmentation masks themselves. Thus, uncertainty quantification for such metrics is crucial for decision-making. Conformal prediction (CP) is a popular framework to derive such principled uncertainty guarantees, but applying CP naively to the final scalar metric is inefficient because it treats the complex, non-linear segmentation-to-metric pipeline as a black box. We introduce COMPASS, a practical framework that generates efficient, metric-based CP intervals for image segmentation models by leveraging the inductive biases of their underlying deep neural networks. COMPASS performs calibration directly in the model's representation space by perturbing intermediate features along low-dimensional subspaces maximally sensitive to the target metric. We prove that COMPASS achieves valid marginal coverage under the assumption of exchangeability. Empirically, we demonstrate that COMPASS produces significantly tighter intervals than traditional CP baselines on four medical image segmentation tasks for area estimation of skin lesions and anatomical structures. Furthermore, we show that leveraging learned internal features to estimate importance weights allows COMPASS to also recover target coverage under covariate shifts. COMPASS paves the way for practical, metric-based uncertainty quantification for medical image segmentation.
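
For reference, the naive baseline that the abstract contrasts with, split conformal prediction applied directly to the scalar metric, can be sketched as follows. The absolute-residual score and the name `split_conformal_interval` are assumptions; COMPASS itself perturbs intermediate features and is not reproduced here.

```python
# Illustrative sketch of the naive scalar-metric baseline, not COMPASS.
import math


def split_conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal interval for a scalar metric (e.g. a predicted lesion area).

    Scores are absolute residuals on a held-out calibration set. The half-width is
    the ceil((n + 1)(1 - alpha))-th smallest score, which gives marginal coverage of
    at least 1 - alpha when calibration and test points are exchangeable.
    """
    scores = sorted(abs(p - t) for p, t in zip(cal_pred, cal_true))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial interval is valid.
        return test_pred - math.inf, test_pred + math.inf
    q = scores[k - 1]
    return test_pred - q, test_pred + q


cal_pred = [10.0] * 20
cal_true = [10.0 + r for r in range(1, 21)]   # residuals 1, 2, ..., 20
print(split_conformal_interval(cal_pred, cal_true, test_pred=50.0, alpha=0.1))  # -> (31.0, 69.0)
```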

2509.20323 2026-03-03 cs.LG math.OC stat.ML

A Recovery Guarantee for Sparse Neural Networks

Sara Fridovich-Keil, Mert Pilanci

Comments ICLR 2026

Details
Abstract

We prove the first guarantees of sparse recovery for ReLU neural networks, where the sparse network weights constitute the signal to be recovered. Specifically, we study structural properties of the sparse network weights for two-layer, scalar-output networks under which a simple iterative hard thresholding algorithm recovers these weights exactly, using memory that grows linearly in the number of nonzero weights. We validate this theoretical result with simple experiments on recovery of sparse planted MLPs, MNIST classification, and implicit neural representations. Experimentally, we find performance that is competitive with, and often exceeds, a high-performing but memory-inefficient baseline based on iterative magnitude pruning. Code is available at https://github.com/voilalab/MLP-IHT.
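
The paper studies a form of iterative hard thresholding (IHT) for network weights. For orientation, the sketch below shows the classic IHT update for sparse linear recovery, x <- H_s(x + mu * A^T (y - A x)), with a step size kept below 1 / ||A||_2^2 so that the least-squares objective is non-increasing. It is not the code from the authors' repository; the problem sizes, seed, and helper names are hypothetical.

```python
# Illustrative sketch of classic IHT for sparse linear recovery; names and sizes are hypothetical.
import math
import random


def matvec(A, x):
    """Compute A x for a dense row-major matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a dense row-major matrix A."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, a in enumerate(row):
            out[j] += a * ri
    return out


def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and set the rest to zero."""
    keep = sorted(range(len(x)), key=lambda j: abs(x[j]), reverse=True)[:s]
    out = [0.0] * len(x)
    for j in keep:
        out[j] = x[j]
    return out


def spectral_norm_sq(A, iters=200):
    """Estimate ||A||_2^2, the largest eigenvalue of A^T A, by power iteration."""
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = rmatvec(A, matvec(A, v))
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return sum(t * t for t in matvec(A, v))


def iht(A, y, s, iters=500):
    """Iterative hard thresholding: x <- H_s(x + mu * A^T (y - A x))."""
    mu = 0.9 / spectral_norm_sq(A)  # conservative step below 1 / ||A||_2^2
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [yi - ai for yi, ai in zip(y, matvec(A, x))]
        grad = rmatvec(A, residual)
        x = hard_threshold([xi + mu * gi for xi, gi in zip(x, grad)], s)
    return x


rng = random.Random(1)
m, n = 40, 60
A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[7], x_true[31] = 5.0, -5.0
y = matvec(A, x_true)
x_hat = iht(A, y, s=2)
print("max entrywise error:", max(abs(a - b) for a, b in zip(x_hat, x_true)))
```

Exact support recovery is common in small random designs like this one but is not guaranteed for every draw; guarantees depend on structural conditions of the kind the paper analyzes.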

2507.21783 2026-03-03 stat.AP cs.LG stat.ME stat.ML

Domain Generalization and Adaptation in Intensive Care with Anchor Regression

Malte Londschien, Manuel Burger, Gunnar Rätsch, Peter Bühlmann

Details
Abstract

The performance of predictive models in clinical settings often degrades when deployed in new hospitals due to distribution shifts. This paper presents a large-scale study of causality-inspired domain generalization on heterogeneous multi-center intensive care unit (ICU) data. We apply anchor regression and introduce anchor boosting, a novel, tree-based nonlinear extension, to a large dataset comprising 400,000 patients from nine distinct ICU databases. We find that anchor regularization improves out-of-distribution performance, particularly for the most dissimilar target domains. The methods appear robust to violations of theoretical assumptions, such as anchor exogeneity. Furthermore, we propose a novel conceptual framework to quantify the utility of large external datasets. By evaluating performance as a function of available target-domain data, we identify three regimes: (i) a domain generalization regime, where only the external model should be used, (ii) a domain adaptation regime, where refitting the external model is optimal, and (iii) a data-rich regime, where external data provides no additional value.
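
Linear anchor regression minimizes ||(I - P_A)(y - Xb)||^2 + gamma * ||P_A (y - Xb)||^2, where P_A projects onto the anchor variables, and it can be computed as ordinary least squares after transforming the data with I + (sqrt(gamma) - 1) * P_A. The one-covariate sketch below assumes a single anchor, no intercept, and made-up numbers; the paper's anchor boosting extension is not shown.

```python
# Illustrative sketch: one covariate, one anchor, no intercept; the data are made up.
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(anchor, v):
    """Orthogonal projection of v onto span{anchor}."""
    c = dot(anchor, v) / dot(anchor, anchor)
    return [c * a for a in anchor]


def anchor_regression_1d(anchor, x, y, gamma):
    """Minimize ||(I - P_A)(y - x b)||^2 + gamma * ||P_A (y - x b)||^2 over b.

    Transforming v -> v + (sqrt(gamma) - 1) * P_A v turns the objective into an
    ordinary least-squares problem, solved here in closed form.
    """
    shrink = math.sqrt(gamma) - 1.0
    px, py = project(anchor, x), project(anchor, y)
    x_t = [xi + shrink * pi for xi, pi in zip(x, px)]
    y_t = [yi + shrink * pi for yi, pi in zip(y, py)]
    return dot(x_t, y_t) / dot(x_t, x_t)


anchor = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 3.0, 2.0, 5.0]
y = [2.0, 5.0, 5.0, 9.0]
for gamma in (0.0, 1.0, 10.0, 1e8):
    print(gamma, anchor_regression_1d(anchor, x, y, gamma))
```

Setting gamma = 1 recovers ordinary least squares, gamma = 0 partials out the anchor, and very large gamma approaches the instrumental-variable estimate (anchor . y) / (anchor . x) obtained by using the anchor as an instrument.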

2507.07815 2026-03-03 stat.ME stat.CO

Vecchia approximated Bayesian heteroskedastic Gaussian processes

Parul V. Patil, Robert B. Gramacy, Cayelan C. Carey, R. Quinn Thomas

Comments 33 pages, 14 figures

Details
Abstract

Many computer simulations are stochastic and exhibit input-dependent noise. In such situations, heteroskedastic Gaussian processes (hetGPs) make ideal surrogates, as they estimate a latent, non-constant variance. However, existing hetGP implementations cannot handle large simulation campaigns and use point estimates for all unknown quantities, including latent variances. This limits their applicability to small experiments and undercuts uncertainty quantification. We propose a Bayesian hetGP using elliptical slice sampling (ESS) for posterior variance integration, and the Vecchia approximation to circumvent computational bottlenecks. We show good performance for our upgraded hetGP capability, compared to alternatives, on a benchmark example and a motivating corpus of more than 9 million lake temperature simulations. An open-source implementation is provided as bhetGP on CRAN.

2505.12096 2026-03-03 cs.LG cs.AI stat.ML

When Bias Meets Trainability: Connecting Theories of Initialization

Alberto Bassi, Marco Baity-Jesi, Aurelien Lucchi, Carlo Albert, Emanuele Francazi

Details
Abstract

The statistical properties of deep neural networks (DNNs) at initialization play an important role in understanding their trainability and the intrinsic architectural biases they possess before data exposure. Well-established mean field (MF) theories have shown that the distribution of parameters of randomly initialized networks strongly influences the behavior of the gradients, dictating whether they explode or vanish. Recent work has shown that untrained DNNs also manifest an initial guessing bias (IGB), in which large regions of the input space are assigned to a single class. In this work, we provide a theoretical proof that links IGB to previous MF theories for a vast class of DNNs, showing that efficient learning is tightly connected to a network's prejudice towards a specific class. This connection leads to a counterintuitive conclusion: the initialization that optimizes trainability is systematically biased rather than neutral.

2505.07800 2026-03-03 stat.ME stat.AP

Moderation effects and elasticities in compositional regression with a total. Application to Bayesian spatiotemporal modelling of all-cause mortality from environmental stressors

Germà Coenders, Javier Palarea-Albaladejo, Marc Saez, Maria A. Barceló

Details
Journal ref
Stochastic Environmental Research and Risk Assessment, 40 (2026), 56
Abstract

Compositional regression models with a real-valued response variable can generally be specified as log-contrast models subject to a zero-sum constraint on the model coefficients. This formulation emphasises the relative information conveyed in the composition, while the overall total is regarded as irrelevant. In this work, this setting is extended to account not only for total effects, formally defined in a so-called T-space, but also for moderation or interaction effects. This is applied in the context of complex spatiotemporal data modelling, through an adaptation of the integrated nested Laplace approximation (INLA) method within a Bayesian estimation framework. Particular emphasis is placed on the interpretation of model coefficients and results, both on the original scale of the response variable and in terms of elasticities. The methodology is demonstrated through a detailed case study investigating the relationship between all-cause mortality and the interaction between extreme temperatures, air pollution composition, and total air pollution in Catalonia, Spain, during the summer of 2022. The results indicate that extreme temperatures are associated with an increased risk of mortality four days after exposure. Additionally, exposure to total air pollution, especially to NO2, is linked to elevated mortality risk regardless of temperature. In contrast, particulate matter is associated with increased mortality only when exposure occurs on days of extreme heat.
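
The zero-sum constraint is what makes a log-contrast model ignore the total. A minimal sketch with hypothetical coefficients and concentrations shows that scaling every part by the same factor leaves the prediction unchanged; the paper's T-space total and interaction terms are not included.

```python
# Illustrative sketch of the base log-contrast model only; all numbers are hypothetical.
import math


def log_contrast_predict(composition, beta, intercept=0.0):
    """Log-contrast model y = intercept + sum_j beta_j * log(x_j) with sum_j beta_j = 0.

    Rescaling every part by a factor c adds log(c) * sum(beta) = 0 to the
    prediction, so only the relative composition matters.
    """
    if abs(sum(beta)) > 1e-12:
        raise ValueError("log-contrast coefficients must sum to zero")
    return intercept + sum(b * math.log(x) for b, x in zip(beta, composition))


beta = [0.5, -0.2, -0.3]             # hypothetical coefficients for three pollutant parts
parts = [12.0, 30.0, 8.0]            # hypothetical concentrations
doubled = [2.0 * p for p in parts]   # same relative composition, twice the total
print(log_contrast_predict(parts, beta), log_contrast_predict(doubled, beta))
```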

2503.22739 2026-03-03 econ.GN q-fin.EC stat.AP

The "Days of Learning" Metric for Education Evaluations

Gregory Camilli

Details
Abstract

The third National Charter School Study (NCSS III) aimed to test whether charter schools were effective and to highlight outcomes on academic progress. The authors reported that typical charter school students outperformed similar students in non-charter public schools by 6 days in mathematics and 16 days in reading. This "days of learning" metric was used to claim relatively higher performance in charter schools than in comparable public schools. The logic of this metric is critiqued in this paper, and an alternative method of reporting outcomes is proposed.

2503.09026 2026-03-03 stat.ME

A Sparse Linear Model for Positive Definite Estimation of Covariance Matrices

Rakheon Kim, Irina Gaynanova

Details
Abstract

Sparse covariance matrices play crucial roles by encoding the interdependencies between variables in numerous fields such as genetics and neuroscience. Despite substantial research on sparse covariance matrices, existing methods face several challenges, such as the correlation among the elements of the sample covariance matrix, positive definiteness, and unbiased estimation of the diagonal elements. To address these challenges, we formulate a linear covariance model for estimating sparse covariance matrices and propose a penalized regression. This method is general enough to encompass existing sparse covariance estimators and can additionally account for correlation among the elements of the sample covariance matrix while avoiding unnecessary bias in the diagonal elements and preserving positive definiteness. We develop a consensus ADMM algorithm for estimation and derive the $\ell_2$ convergence rate of the proposed estimator. We apply our estimator to simulated data and real data from neuroscience and genetics to demonstrate the efficacy of our proposed method.

2502.12063 2026-03-03 stat.ML cs.LG math.OC math.ST stat.ME stat.TH

Low-Rank Thinning

Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

Details
Abstract

The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.

2501.04272 2026-03-03 stat.ML cs.LG

On weight and variance uncertainty in neural networks for regression tasks

Moein Monemi, Morteza Amini, S. Mahmoud Taheri, Mohammad Arashi

Comments Submitted to journal

Details
Abstract

We investigate the problem of weight uncertainty originally proposed by [Blundell et al. (2015). Weight uncertainty in neural networks. In International conference on machine learning, 1613-1622, PMLR.] in the context of neural networks designed for regression tasks, and we extend their framework by incorporating variance uncertainty into the model. Our analysis demonstrates that explicitly modeling uncertainty in the variance parameter can significantly enhance the predictive performance of Bayesian neural networks. By considering a full posterior distribution over the variance, the model achieves improved generalization compared to approaches that treat variance as fixed or deterministic. We evaluate the generalization capability of our proposed approach through a function approximation example and further validate it on the riboflavin genetic dataset. Our exploration encompasses both fully connected dense networks and dropout neural networks, employing Gaussian and spike-and-slab priors respectively for the network weights, providing a comprehensive assessment of how variance uncertainty affects model performance across different architectural choices.

2412.17879 2026-03-03 stat.AP stat.ME

Strategy to control biases in prior event rate ratio method, with application to palliative care in patients with advanced cancer

Xiangmei Ma, Grace Meijuan Yang, Qingyuan Zhuang, Yin Bun Cheung

Comments 35 pages, including 3 tables, 1 figure and 3 supplemental materials

Details
Abstract

Objectives: The prior event rate ratio (PERR) method has been shown to perform well in mitigating confounding in real-world evidence research, but it depends on several model assumptions. We propose an analytic strategy to correct biases arising from violation of two model assumptions, namely, population homogeneity and event-independent treatment. Study Design and Setting: We reformulate PERR estimation by embedding a treatment-by-period interaction term in an analytic model for recurrent event data, which is robust to bias arising from unobserved heterogeneity. Based on this model, we propose a set of methods to examine the presence of event-dependent treatment and to correct the resulting bias. We evaluate the proposed methods by simulation and apply them to a de-identified dataset on palliative care and emergency department (ED) visits in patients with advanced cancer. Results: Simulation results showed that the proposed method could mitigate the two sources of bias in PERR. In the palliative care study, analysis by the Cox model showed that patients who had started receiving palliative care had a higher incidence of ED visits than their matched controls (hazard ratio 3.31; 95% confidence interval 2.78 to 3.94). Using PERR without the proposed bias control strategy indicated a 19% reduction in incidence (0.81; 0.64 to 1.02). However, there was evidence of event-dependent treatment. The proposed correction method showed no effect of palliative care on ED visits (1.00; 0.79 to 1.26). Conclusions: The proposed analytic strategy can control two sources of bias in the PERR approach. It enriches the armamentarium for real-world evidence research.
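
For readers unfamiliar with PERR, the estimator divides the post-treatment hazard ratio by the pre-treatment ("prior") one. As a sketch under a proportional hazards model with a treatment-group indicator T, a post-period indicator P, and their interaction (the kind of treatment-by-period term the abstract mentions), the ratio reduces to the exponentiated interaction coefficient. The notation is ours, not the paper's, and the paper's recurrent-event model and bias corrections are not shown.

```latex
\log h(t \mid T, P) = \log h_0(t) + \beta_1 T + \beta_2 P + \beta_3\, T P
\quad\Longrightarrow\quad
\mathrm{HR}_{\mathrm{prior}} = e^{\beta_1},\qquad
\mathrm{HR}_{\mathrm{post}} = e^{\beta_1 + \beta_3},\qquad
\mathrm{HR}_{\mathrm{PERR}} = \frac{\mathrm{HR}_{\mathrm{post}}}{\mathrm{HR}_{\mathrm{prior}}} = e^{\beta_3}.
```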

2412.05397 2026-03-03 stat.ME

Network Structural Equation Models for Causal Mediation and Spillover Effects

Ritoban Kundu, Peter X. K. Song

Details
Abstract

Social network interference induces complex dependencies where a unit's outcome is influenced not only by its own exposure and mediator but also by those of connected neighbors. In such settings, a significant challenge lies in distinguishing direct exposure effects from interference-driven spillover effects, and further separating these from indirect effects mediated by intermediate variables. To address this, we propose a theoretical framework utilizing structural graphical models. Central to our approach is the Random Effects Network Structural Equation Model (REN-SEM), which extends the exposure mapping paradigm to capture these multifaceted spillover and mediation mechanisms while accounting for latent dependencies within mediators and outcomes. We establish general identification conditions and derive decomposition formulas for six distinct mechanistic estimands. Furthermore, for the class of Linear REN-SEMs, we develop a maximum likelihood estimation framework and establish a rigorous asymptotic theory tailored to non-i.i.d. network data, proving the consistency of our estimators and the validity of the variance estimates. The robustness and practical utility of our methodology are demonstrated through simulation experiments and an analysis of the Twitch Gamers Network, underscoring its effectiveness in quantifying intricate network-mediated exposure effects.

2411.04340 2026-03-03 cs.HC cs.CY stat.CO

Survival of the Notable: Gender Asymmetry in Wikipedia Collective Deliberations

Khandaker Tasnim Huq, Giovanni Luca Ciampaglia

Details
Journal ref
Proc. ACM Hum.-Comput. Interact. 9, 7, Article CSCW482 (November 2025), 29 pages
Abstract

Communities on the web rely on open conversation forums for a number of tasks, including governance, information sharing, and decision making. However, these forms of collective deliberation can often result in biased outcomes. A prime example is Articles for Deletion (AfD) discussions on Wikipedia, which allow editors to gauge the notability of existing articles and which, as prior work has suggested, may play a role in perpetuating Wikipedia's notorious gender gap. Prior attempts to address this question have been hampered by narrow observation windows, reliance on limited subsets of both biographies and editorial outcomes, and potential confounding factors. To address these limitations, we adopt a competing risk survival framework to situate biographical AfD discussions within the full editorial cycle of Wikipedia content. We find that biographies of women are nominated for deletion faster than those of men, despite editors taking longer to reach a consensus for deleting biographies of women, even after controlling for the size of the discussion. Furthermore, we find that AfDs about historical figures show a strong tendency to result in the redirecting or merging of the biography under discussion into other encyclopedic entries, and that there is a striking gender asymmetry: biographies of women are redirected or merged into biographies of men more often than the other way round. Our study provides a more complete picture of the role of AfD in Wikipedia's gender gap, with implications for the governance of the open knowledge infrastructure of the web.
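
Competing-risk analyses often start from the nonparametric cumulative incidence function (the Aalen-Johansen estimator), in which the incidence of cause k accumulates as S(t-) * d_k(t) / n(t), with S the all-cause Kaplan-Meier survival. The sketch below assumes causes coded as integers with 0 for censoring and uses toy data; it does not reproduce the paper's models or covariate adjustment.

```python
# Illustrative sketch of the Aalen-Johansen cumulative incidence; the data are toy values.
from itertools import groupby


def cumulative_incidence(times, causes):
    """Cumulative incidence curves for competing risks.

    times  : observed follow-up times
    causes : 0 for censored, otherwise an integer label for the event type
    Returns {cause: [(t, CIF(t)), ...]} evaluated at each time with at least one event.
    """
    at_risk = len(times)
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    labels = sorted({c for c in causes if c != 0})
    cif = {c: 0.0 for c in labels}
    curves = {c: [] for c in labels}
    for t, group in groupby(sorted(zip(times, causes)), key=lambda pair: pair[0]):
        group_causes = [c for _, c in group]
        events = {c: group_causes.count(c) for c in labels}
        total_events = sum(events.values())
        if total_events:
            for c in labels:
                cif[c] += surv * events[c] / at_risk
                curves[c].append((t, cif[c]))
            surv *= 1.0 - total_events / at_risk
        at_risk -= len(group_causes)  # events and censorings both leave the risk set
    return curves


# Toy example: cause 1 = deletion, cause 2 = redirect/merge, 0 = still open (censored).
print(cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1]))
```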

2410.21603 2026-03-03 stat.ME stat.CO

Approximate Bayesian Computation with Statistical Distances for Model Selection

Clara Grazian

Details
Abstract

Model selection in the presence of intractable likelihoods remains a central challenge in Bayesian inference. Approximate Bayesian computation (ABC) provides a flexible likelihood-free framework, but its use for model choice is known to be sensitive to the choice of summary statistics, often leading to poorly calibrated posterior model probabilities. Recent ABC variants based on statistical distances allow comparisons to be performed directly on empirical distributions, avoiding data reduction and offering improved theoretical guarantees under suitable conditions. This paper provides a systematic evaluation of discrepancy-based ABC methods for Bayesian model selection, focusing on their empirical behavior across a range of simulation settings and levels of model complexity. We compare full-data ABC approaches based on Wasserstein, Cramér-von Mises, and maximum mean discrepancy metrics with summary-statistic-based ABC and neural network classifiers. The results highlight settings in which full-data ABC yields stable and well-calibrated posterior model probabilities, as well as scenarios where performance degrades due to model overlap or dependence. An application to toad movement models illustrates the practical implications of these findings. Overall, the study clarifies the strengths and limitations of discrepancy-based ABC for likelihood-free model choice and provides guidance for its use in realistic inferential settings.
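
A minimal sketch of rejection ABC for model choice with the 1-Wasserstein distance on the full data, assuming equal sample sizes, a uniform prior over two toy Gaussian models, and acceptance of the closest 5% of simulations. The models, sample sizes, and acceptance rule are illustrative assumptions rather than the paper's experimental setup, and the Cramér-von Mises and MMD variants are not shown.

```python
# Illustrative sketch: rejection ABC model choice with a full-data Wasserstein distance.
import random


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size samples: mean gap of sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def abc_model_choice(observed, simulators, n_sims=2000, keep=100, seed=0):
    """Rejection ABC for model choice with a uniform prior over models.

    Each simulator maps (rng, n) to a synthetic dataset of size n. The `keep`
    simulations closest to the observed data are retained, and each model's
    posterior probability is estimated by its share of the retained draws.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        k = rng.randrange(len(simulators))
        synthetic = simulators[k](rng, len(observed))
        draws.append((wasserstein_1d(observed, synthetic), k))
    draws.sort()
    accepted = [k for _, k in draws[:keep]]
    return [accepted.count(k) / keep for k in range(len(simulators))]


def centered_model(rng, n):
    """Toy model 0: N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def shifted_model(rng, n):
    """Toy model 1: N(3, 1)."""
    return [rng.gauss(3.0, 1.0) for _ in range(n)]


data_rng = random.Random(123)
observed = [data_rng.gauss(3.0, 1.0) for _ in range(50)]
probs = abc_model_choice(observed, [centered_model, shifted_model])
print(probs)
```

With models this far apart, every retained simulation comes from the data-generating model; the abstract's point is that calibration becomes harder when models overlap.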

2409.12446 2026-03-03 cs.LG cs.AI math.ST stat.ML stat.TH

Neural Networks Generalize on Low Complexity Data

Sourav Chatterjee, Timothy Sudijono

Comments 37 pages. Small corrections made

Details
Abstract

We show that feedforward neural networks with ReLU activation generalize on low complexity data, suitably defined. Given i.i.d. data generated from a simple programming language, the minimum description length (MDL) feedforward neural network which interpolates the data generalizes with high probability. We define this simple programming language, along with a notion of description length of such networks. We provide several examples on basic computational tasks, such as checking primality of a natural number. For primality testing, our theorem shows the following and more. Suppose that we draw an i.i.d. sample of $n$ numbers uniformly at random from $1$ to $N$. For each number $x_i$, let $y_i = 1$ if $x_i$ is a prime and $0$ if it is not. Then, the interpolating MDL network accurately answers, with probability $1- O((\ln N)/n)$, whether a newly drawn number between $1$ and $N$ is a prime or not. Note that the network is not designed to detect primes; minimum description length learning discovers a network which does so. Extensions to noisy data are also discussed, suggesting that MDL neural network interpolators can demonstrate tempered overfitting.

2407.18341 2026-03-03 stat.ME

Generalizing the Finkelstein-Schoenfeld Test to Incorporate Multiple Alternating Thresholds

Yunhan Mou, Tassos Kyriakides, Scott Hummel, Fan Li, Yuan Huang

Details
Abstract

Composite endpoints consisting of both terminal and non-terminal events, such as death and hospitalization, are frequently used in cardiovascular clinical trials. The Finkelstein-Schoenfeld (FS) test provides a way to employ a hierarchical structure to combine fatal and non-fatal events by giving death information an absolute priority, which may limit the contribution of clinically meaningful non-fatal events. To provide a more flexible alternative, we propose the Finkelstein-Schoenfeld with Multiple Thresholds (FS-MT) test, which extends the standard FS test by incorporating multiple thresholds applied sequentially and alternating across endpoints. A weighted adaptive approach is also developed to help determine the thresholds in FS-MT. The proposed approach retains the statistical properties of the FS test while allowing more flexible use of information from lower-priority events. We evaluate the operating characteristics of the proposed test through simulations that vary the follow-up time, the correlation between events, and the treatment effect sizes. A case study based on the Digitalis Investigation Group clinical trial data is presented to further illustrate our proposed method. An R package "FSMT" that implements the proposed methodology has been developed.

2406.04098 2026-03-03 stat.ML cs.LG

A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

Lukas Burk, John Zobolas, Bernd Bischl, Andreas Bender, Marvin N. Wright, Raphael Sonabend

Comments 44 pages, 20 figures

Details
Abstract

This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are smaller in scale in terms of the number of datasets used and the extent of empirical evaluation. They often lack appropriate tuning or evaluation procedures, while other comparison studies focus on qualitative reviews rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable guidelines for practitioners. We benchmark 19 models, ranging from classical statistical approaches to many common machine learning methods, on 34 publicly available datasets. The benchmark tunes models using both a discrimination measure (Harrell's C-index) and a scoring rule (Integrated Survival Brier Score), and evaluates them across six metrics covering discrimination, calibration, and overall predictive performance. Despite superior average ranks in overall predictive performance from individual learners like oblique random survival forests and likelihood-based boosting, and better discrimination rankings from multiple boosting- and tree-based methods as well as parametric survival models, no method significantly outperforms the commonly used Cox proportional hazards model for either tuning measure. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox proportional hazards model remains a simple and robust method, sufficient for most practitioners. All code, data, and results are publicly available on GitHub https://github.com/slds-lmu/paper_2023_survival_benchmark
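
The benchmark tunes models with Harrell's C-index; a minimal O(n^2) sketch of the standard definition follows. It assumes no tied observation times (pairs with equal times are simply not counted as comparable) and counts tied risk scores as one half; the benchmark's own implementation may differ.

```python
# Illustrative sketch of Harrell's C-index; tie handling is a simplifying assumption.
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had an event and a strictly shorter
    observed time than subject j. It is concordant when subject i also has the
    higher predicted risk; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data: 5 comparable pairs, 4 of them concordant.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))
```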

2404.11345 2026-03-03 stat.ME

Jacobi Prior: An Alternative Bayesian Method for Supervised Learning

Sourish Das, Shouvik Sardar

Comments 44 pages, 10 figures

Details
Abstract

The Jacobi prior offers an alternative Bayesian framework, designed to achieve superior computational efficiency without compromising predictive performance. Compared to widely used methods such as Lasso, Ridge, Elastic Net, uniLasso, the MCMC-based Horseshoe prior, and non-Bayesian machine learning methods including Support Vector Machines (SVM), Random Forests, and Extreme Gradient Boosting (XGBoost), the Jacobi prior achieves competitive or better accuracy with significantly reduced computational cost. The method is well suited to distributed computing environments, as it naturally accommodates partitioned data across multiple servers. We propose a parallelisable Monte Carlo algorithm to quantify the uncertainty in the estimated coefficients. We establish that the Jacobi estimator is asymptotically close to, and asymptotically equivalent to, the posterior mode under the Jacobi prior. To demonstrate its practical utility, we conduct a comprehensive simulation study comprising seven experiments focused on statistical consistency, prediction accuracy, scalability, sensitivity analysis, and robustness. We further present three real-data applications: multi-class classification of stars, quasars, and galaxies using Sloan Digital Sky Survey data, and spinal degeneration classification using sagittal MRI scans from the RSNA 2024 Lumbar Spine Degenerative Classification Challenge. In the spine classification task, we extract last-layer features from a fine-tuned ResNet-50 model and evaluate multiple classifiers, including Jacobi-Multinomial logit regression, SVM, and Random Forest. All code and datasets used in this paper are available at: https://github.com/sourish-cmi/Jacobi-Prior/

2403.17609 2026-03-03 stat.ME stat.CO

Estimation Method under Three-Parameter Generalized Exponential Model: Consistency, Uniqueness and its Applications

Kiran Prajapat, Sharmishtha Mitra, Debasis Kundu

Comments Accepted for publication in the Japanese Journal of Statistics and Data Science

Details
Abstract

In numerous instances, the generalized exponential distribution can be used as an alternative to the most widely used non-regular families of distributions, namely the three-parameter Weibull, gamma, and lognormal distributions, when analyzing lifetime or other skewed continuous data. A non-regular family is a class of probability distributions that do not satisfy the regularity conditions typically assumed in classical statistical inference. Some key features of such families are: the support of the probability density function depends on one of its parameters; the likelihood function may be unbounded over part of the parameter space, in which case maximum likelihood estimators do not exist; and the likelihood function may not be differentiable or integrable as needed, so the Fisher information may not exist or may be infinite. Moreover, standard results such as the existence, consistency, and asymptotic normality of the MLE may fail. Therefore, specialized or robust inferential techniques are needed. This article offers a consistent method for estimating the parameters of a three-parameter generalized exponential distribution that sidesteps the issue of an unbounded likelihood function. The method hinges on maximum likelihood estimation of the shape and scale parameters using a location-invariant statistic. Important estimator properties, such as uniqueness and consistency, are demonstrated for the first time under this approach. In addition, quantile estimates for the assumed distribution are provided. We present a Monte Carlo simulation study along with comparisons to a number of well-known estimation techniques in terms of bias and root mean square error. For illustrative purposes, a real dataset from reliability engineering has been analyzed, and the goodness of fit and the bootstrap confidence intervals are compared with those of existing traditional methods.
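
The unbounded-likelihood problem can be seen directly. The sketch below evaluates the log-likelihood of the three-parameter generalized exponential distribution, with CDF F(x) = (1 - exp(-(x - mu) / beta))^alpha for x > mu, and shows that for alpha < 1 it grows without bound as the location mu approaches the sample minimum. The data and parameter values are made up, and the paper's location-invariant estimator is not reproduced.

```python
# Illustrative sketch of the unbounded likelihood; data and parameters are hypothetical.
import math


def ge3_loglik(data, alpha, beta, mu):
    """Log-likelihood of the three-parameter generalized exponential distribution.

    Density: f(x) = (alpha / beta) * (1 - exp(-z)) ** (alpha - 1) * exp(-z),
    with z = (x - mu) / beta and x > mu.
    """
    if mu >= min(data):
        return -math.inf  # every observation must lie above the location parameter
    ll = 0.0
    for x in data:
        z = (x - mu) / beta
        # -expm1(-z) computes 1 - exp(-z) accurately when z is tiny
        ll += math.log(alpha) - math.log(beta) + (alpha - 1.0) * math.log(-math.expm1(-z)) - z
    return ll


sample = [1.2, 1.5, 2.3, 3.1, 4.0]
for gap in (1e-2, 1e-4, 1e-6, 1e-8):
    # With alpha < 1 the log-likelihood keeps increasing as mu -> min(sample).
    print(gap, ge3_loglik(sample, alpha=0.5, beta=1.0, mu=min(sample) - gap))
```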

2402.10018 2026-03-03 cs.IT math.IT q-bio.QM stat.AP

Two-Stage Decoding Algorithm and Bounds for Group Testing with Prior Statistics

Ayelet C. Portnoy, Amit Solomon, Alejandro Cohen

Details
Abstract

In this paper, we propose an efficient two-stage decoding algorithm for non-adaptive Group Testing (GT) with general correlated prior statistics. The proposed solution can be applied to any correlated statistical prior represented as a trellis, e.g., finite state machines and Markov processes. We introduce a variation of the List Viterbi Algorithm (LVA) that enables accurate recovery using far fewer tests than objectives and efficiently exploits the structure of the correlated prior statistics. We also provide a sufficiency bound on the number of pooled tests required by any Maximum A Posteriori (MAP) decoder with an arbitrary correlation, i.e., dependence between infected items. Our numerical results demonstrate that the proposed two-stage decoding GT (2SDGT) algorithm can attain the optimal MAP performance with feasible complexity in practical regimes, such as COVID-19 and sparse signal recovery applications, and, in the scenarios tested, reduces the number of pooled tests by at least $25\%$ compared to existing classical low-complexity GT algorithms. Moreover, we analytically characterize the complexity of the proposed 2SDGT algorithm, which guarantees its efficiency.

2402.06223 2026-03-03 cs.LG cs.CV stat.ML

Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning

Yuhang Liu, Zhen Zhang, Dong Gong, Erdun Gao, Biwei Huang, Mingming Gong, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

Abstract

Directed Acyclic Graphs (DAGs) are a standard tool in causal modeling, but their suitability for capturing the complexity of large-scale multimodal data is questionable. In practice, real-world multimodal datasets are often collected from heterogeneous generative processes that do not conform to a single DAG. Instead, they may involve multiple, and even opposing, DAG structures with inverse causal directions. To address this gap, we first propose a novel latent partial causal model tailored for multimodal representation learning, featuring two parts of latent coupled variables connected by an undirected edge to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by MultiModal Contrastive Learning (MMCL) correspond to the latent coupled variables up to a trivial transformation. This result deepens our understanding of why MMCL works, highlights its potential for representation disentanglement, and expands the utility of pre-trained models such as CLIP. Synthetic experiments confirm the robustness of our findings, even when the assumptions are partially violated. Most importantly, experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets. Together, these contributions push the boundaries of MMCL in both theory and practice.
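
For reference, the MMCL objective analyzed here is usually a symmetric InfoNCE loss of the kind used to train CLIP. The sketch below computes it in NumPy for toy paired embeddings built from a shared latent component plus modality-specific noise; the temperature `tau = 0.07`, the data, and the function names are illustrative assumptions and do not reproduce the paper's identifiability construction.

```python
import numpy as np

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_infonce(img: np.ndarray, txt: np.ndarray, tau: float = 0.07) -> float:
    """CLIP-style contrastive loss: row i of `img` is the positive pair of row i of `txt`."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / tau                    # cosine similarities scaled by temperature
    diag = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # each image picks its text
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()  # each text picks its image
    return float((loss_i2t + loss_t2i) / 2)

rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))                  # stand-in for the coupled latent variables
img = shared + 0.1 * rng.normal(size=shared.shape) # modality-specific noise
txt = shared + 0.1 * rng.normal(size=shared.shape)
print(symmetric_infonce(img, txt))                 # matched pairs: low loss
print(symmetric_infonce(img, txt[::-1]))           # broken pairing: much higher loss
```

The loss is small only when each pair shares information that the other items lack, which is the intuition behind reading MMCL representations as recovering the shared (coupled) latent part.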

2308.13905 2026-03-03 stat.ME math.ST stat.TH

Estimation and Hypothesis Testing of Derivatives in Smoothing Spline ANOVA Models

Ruiqi Liu, Kexuan Li, Meng Li

Abstract

Within the framework of smoothing spline ANOVA, we propose a plug-in kernel ridge regression estimator for the derivatives of the underlying multivariate regression function. We first establish an $L_\infty$ convergence rate for the proposed estimator under general random designs. When the covariates are uniformly distributed, we provide an in-depth analysis that includes a sharp upper bound and the minimax lower bound for the $L_2$ convergence rate. Additionally, motivated by a wide range of applications, we propose a hypothesis testing procedure to examine whether a derivative is zero. Theoretical results demonstrate that the proposed testing procedure achieves the correct size under the null hypothesis and is asymptotically powerful under local alternatives. For ease of use, we also develop an associated bootstrap algorithm to construct the rejection region and calculate the p-value, and we establish the consistency of this algorithm. Simulation studies using synthetic data and an application to a real-world dataset confirm the effectiveness of our approach.
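
The plug-in idea can be sketched as: fit kernel ridge regression, then differentiate the fitted function term by term, since $\hat f(t)=\sum_i c_i k(t,x_i)$ implies $\hat f'(t)=\sum_i c_i\,\partial_t k(t,x_i)$. The example below swaps in a one-dimensional Gaussian kernel rather than a smoothing spline ANOVA kernel, and its bandwidth, penalty, and test function are hypothetical choices; it does not implement the paper's hypothesis test or bootstrap.

```python
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, ell: float) -> np.ndarray:
    """k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 ell^2)) for 1-D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * ell ** 2))

def krr_fit(x: np.ndarray, y: np.ndarray, ell: float, lam: float) -> np.ndarray:
    """Kernel ridge regression coefficients c = (K + n * lam * I)^{-1} y."""
    n = x.size
    K = gaussian_kernel(x, x, ell)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(x_new, x, coef, ell):
    """Fitted function f_hat(t) = sum_i c_i k(t, x_i)."""
    return gaussian_kernel(x_new, x, ell) @ coef

def krr_derivative(x_new, x, coef, ell):
    """Plug-in derivative f_hat'(t) = sum_i c_i * d/dt k(t, x_i)."""
    diff = x_new[:, None] - x[None, :]
    return (-diff / ell ** 2 * gaussian_kernel(x_new, x, ell)) @ coef

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 200))
y = np.sin(x) + 0.02 * rng.normal(size=x.size)  # true f = sin, so f' = cos
coef = krr_fit(x, y, ell=0.8, lam=1e-3)
grid = np.array([np.pi / 2, np.pi, 3 * np.pi / 2])
print(krr_derivative(grid, x, coef, ell=0.8))   # compare with cos(grid) = [0, -1, 0]
```

In this 1-D sketch no separate derivative model is fitted; the same coefficients serve for $\hat f$ and $\hat f'$, which is what makes the estimator "plug-in".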

2210.11611 2026-03-03 stat.AP

3D Bivariate Spatial Modelling of Argo Ocean Temperature and Salinity

Mary Lai Salvana, Jian Cao, Mikyoung Jun

Abstract

Variables measured in the global oceans can detect and reveal the effects of the warming climate, because the oceans absorb huge amounts of solar energy. Hence, information on the joint spatial distribution of ocean variables is critical for climate monitoring. In this paper, we investigate the spatial correlation structure between ocean temperature and salinity using data from the Argo program and construct a model to capture their bivariate spatial dependence from the surface to the ocean's interior. We develop a flexible class of multivariate nonstationary covariance models defined in three-dimensional (3D) space (longitude $\times$ latitude $\times$ depth) that allows the variances and correlation to change along the vertical pressure dimension. These models describe the joint spatial distribution of the two variables while incorporating the underlying vertical structure of the ocean. We demonstrate that the proposed cross-covariance models describe the complex vertical cross-covariance structure well, whereas existing cross-covariance models, including bivariate Matérn models, fit the empirical cross-covariance structure poorly. Furthermore, the results show that incorporating one variable significantly enhances prediction of the other and that the estimated spatial dependence structures are consistent with ocean stratification.
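
The gain from using one variable to predict the other can be illustrated with simple cokriging. The sketch below assumes a stationary, separable bivariate model $C_{ij}(h)=B_{ij}\exp(-|h|/a)$ (an intrinsic coregionalization model) on a one-dimensional transect, which is far simpler than the paper's 3D nonstationary class. Locations, variances, and the cross-correlation are hypothetical; the code compares the prediction variance for temperature at a point where only salinity was measured, with and without the salinity data.

```python
import numpy as np

def exp_corr(a: np.ndarray, b: np.ndarray, scale: float) -> np.ndarray:
    """Exponential correlation exp(-|a_i - b_j| / scale) on a 1-D transect."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / scale)

def cov_block(locs_a, var_a, locs_b, var_b, B, scale):
    """Cov(Y_{var_a}(locs_a), Y_{var_b}(locs_b)) under C_ij(h) = B[i, j] * exp(-|h| / scale)."""
    return B[var_a, var_b] * exp_corr(locs_a, locs_b, scale)

def simple_kriging_variance(obs, target, B, scale):
    """Prediction variance of the best linear predictor with known zero mean.

    obs    : list of (locations, variable index); variable 0 = temperature, 1 = salinity
    target : (location array of length 1, variable index)
    """
    C = np.block([[cov_block(la, va, lb, vb, B, scale) for (lb, vb) in obs]
                  for (la, va) in obs])
    t_locs, t_var = target
    c = np.concatenate([cov_block(la, va, t_locs, t_var, B, scale)[:, 0] for (la, va) in obs])
    return float(B[t_var, t_var] - c @ np.linalg.solve(C, c))

B = np.array([[1.0, 0.8],
              [0.8, 1.0]])          # hypothetical: unit variances, cross-correlation 0.8
scale = 2.0
temp_locs = np.array([0.0, 3.0, 6.0])
sal_locs = np.array([1.5, 4.5])
target = (np.array([4.5]), 0)       # temperature where only salinity was measured
kriging = simple_kriging_variance([(temp_locs, 0)], target, B, scale)
cokriging = simple_kriging_variance([(temp_locs, 0), (sal_locs, 1)], target, B, scale)
print(f"kriging variance: {kriging:.3f}, cokriging variance: {cokriging:.3f}")
```

Adding data can never increase the simple kriging variance, so the informative quantity is the size of the drop, which depends on the cross-covariance model; a poorly fitting cross-covariance would misstate that gain.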

2603.00216 2026-03-03 math.ST math.PR stat.TH

The relative efficiency of sequential tests

Henri Doerks, Erik Ekström, Yuqiong Wang

Abstract

While many statistical procedures rely on a fixed sample size, sequential methods allow a decision-maker to adapt the sample size to achieve a given precision. In this way, sequential tests reduce the average number of observations required to achieve a given power of the test -- but by how much? To address this question, we focus on the scenario of testing the unknown drift of a Brownian motion, comparing the Wald sequential probability ratio test with tests that use a pre-determined fixed sample size. We provide precise bounds on the average reduction in sample size needed to achieve a desired precision. Specifically, we demonstrate that for symmetric error bounds, the sequential test reduces the average sample size by at least 36\% and by at most 75\%. Moreover, the reduction in sample size increases monotonically with the power of the test, meaning that the relative advantage of using a sequential test over a fixed sample size test grows as higher power is required. We also study the relative efficiency in the case with asymmetric error bounds, and we provide a lower bound in terms of the symmetric case.
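
Under the textbook setup (simple hypotheses $\theta_0$ versus $\theta_1$ for the drift, symmetric error probabilities $\alpha=\beta$, and Wald's identities, which are exact for continuous paths), the ratio of the SPRT's expected sample size to the fixed horizon has a closed form, $E[\tau]/T=(1-2\alpha)\ln\frac{1-\alpha}{\alpha}\big/\left(2z_{1-\alpha}^2\right)$, the same under either hypothesis. This is our own derivation, offered as a consistency check rather than the paper's proof; its limits $2/\pi\approx 0.637$ as $\alpha\to 1/2$ and $1/4$ as $\alpha\to 0$ correspond to the 36% and 75% reductions quoted above.

```python
import math
from statistics import NormalDist

def relative_efficiency(alpha: float) -> float:
    """E[tau] / T for testing the drift of a Brownian motion, alpha = beta in (0, 1/2).

    SPRT: the log-likelihood ratio has drift Delta^2 / 2 and hits the thresholds
    +/- A exactly, with A = ln((1 - alpha) / alpha), so E[tau] = 2 (1 - 2 alpha) A / Delta^2.
    Fixed horizon: Delta * sqrt(T) / 2 = z_{1 - alpha}, so T = 4 z^2 / Delta^2.
    Delta = theta_1 - theta_0 cancels in the ratio.
    """
    a = math.log((1 - alpha) / alpha)
    z = NormalDist().inv_cdf(1 - alpha)
    return (1 - 2 * alpha) * a / (2 * z * z)

for alpha in (0.4, 0.2, 0.05, 0.01, 1e-6):
    r = relative_efficiency(alpha)
    print(f"alpha = {alpha:g}: E[tau]/T = {r:.3f}, sample-size reduction = {1 - r:.1%}")
```

The printed ratios decrease as $\alpha$ shrinks, in line with the statement that the advantage of the sequential test grows with the required power.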

2603.00135 2026-03-03 econ.EM stat.ME

Shift-Share Designs in Political Science

Peter Kyungtae Park

Abstract

Shift-share designs are gaining popularity in political science. This article introduces shift-share designs, reviews their application in the literature, synthesizes recent methodological developments, and discusses their potential utility in the field. Although shift-share designs have long been used in economics, their causal properties have only recently begun to be understood. Political science articles tend to be aware of these developments but do not fully discuss or test identifying assumptions, and they sometimes apply the methods incorrectly. Most articles rely on the share exogeneity framework, suggesting that the shifter exogeneity framework is underutilized despite its comparable prevalence in economics. I illustrate the shifter exogeneity framework and develop auxiliary theoretical results that may be useful when applying the framework in political science settings.
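
For readers new to the design, a shift-share (Bartik) instrument combines unit-level exposure shares $s_{\ell n}$ with common shocks $g_n$ as $z_\ell=\sum_n s_{\ell n} g_n$. The sketch below builds such an instrument on simulated data in which the shocks are drawn as-good-as-randomly, in the spirit of the shifter exogeneity framework, and compares OLS with the just-identified IV estimate. All data and names are hypothetical, and no inference (e.g., shock-level standard errors) is attempted.

```python
import numpy as np

def shift_share(shares: np.ndarray, shifts: np.ndarray) -> np.ndarray:
    """z_l = sum_n s_{l n} g_n; shares is (units, shocks), shifts is (shocks,)."""
    return shares @ shifts

def iv_slope(z: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    """Just-identified IV estimate cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))

rng = np.random.default_rng(0)
n_units, n_shocks = 5000, 20
shares = rng.dirichlet(np.ones(n_shocks), size=n_units)  # each row sums to 1
shifts = rng.normal(size=n_shocks)                       # shocks drawn as-good-as-random
z = shift_share(shares, shifts)
v = 0.3 * rng.normal(size=n_units)
x = z + v                                   # endogenous treatment
u = 0.8 * v + 0.3 * rng.normal(size=n_units)  # error correlated with x through v, not with z
y = 2.0 * x + u                             # true effect = 2
print("OLS:", float(np.polyfit(x, y, 1)[0]), "shift-share IV:", iv_slope(z, x, y))
```

OLS is pulled away from 2 by the shared component `v`, while the IV estimate is not, because the instrument varies only through shares and shocks that are independent of `u` in this simulation.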

2603.00105 2026-03-03 cs.LG cs.CL stat.ME stat.ML

LIDS: LLM Summary Inference Under the Layered Lens

Dylan Park, Yingying Fan, Jinchi Lv

Comments 48 pages, 15 figures

Abstract

Large language models (LLMs) have gained significant attention from researchers and practitioners in natural language processing (NLP) since the introduction of ChatGPT in 2022. One notable feature of ChatGPT is its ability to generate summaries based on prompts. Yet evaluating the quality of these summaries remains challenging due to the complexity of language. To this end, we propose a new method of LLM summary inference with a BERT-SVD-based direction metric and SOFARI (LIDS) that assesses summary accuracy and provides interpretable key words for layered themes. LIDS uses a latent SVD-based direction metric to measure the similarity between the summaries and the original text, leveraging BERT embeddings and repeated prompts to quantify statistical uncertainty. As a result, LIDS gives a natural embedding of each summary for large text reduction. We further exploit SOFARI to uncover important key words associated with each latent theme in the summary with a controlled false discovery rate (FDR). Comprehensive empirical studies demonstrate the practical utility and robustness of LIDS through human verification and comparisons with other similarity metrics, including a comparison of different LLMs.

2603.00100 2026-03-03 stat.AP cs.LG

Using Artificial Neural Networks to Predict Claim Duration in a Work Injury Compensation Environment

Anthony Almudevar

Comments 8 pages; 9 figures; 6 tables

Abstract

Currently, work injury compensation boards in Canada track injury information using a standard system of codes under the National Work Injury Statistics Program (NWISP). These codes capture the medical nature and original cause of the injury in some detail, so they potentially contain information that may be used to predict the severity of an injury and the resulting time lost from work. Claim duration measurements and forecasts are central to the operation of a work injury compensation program. However, due to the complexity of the codes, traditional statistical modelling techniques are of limited value. We will describe an artificial neural network implementation of Cox proportional hazards regression, due to Ripley (1998 thesis), which is used as the basis for a model predicting claim duration within a work injury compensation environment. The model accepts as input the injury codes, as well as basic demographic and workplace information. The output is a claim duration prediction in the form of a distribution. The input represents information available when a claim is first filed, so the model may be used in a claims management setting. We will describe the model selection procedure, as well as a procedure for handling inputs with missing covariates.
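
In neural-network versions of Cox regression, the network output typically takes the place of the linear predictor $\eta_i$, and the model is fitted by maximizing the Cox partial likelihood. A minimal sketch of that loss follows; the network itself is omitted, ties are handled with the Breslow convention, and the function name is a hypothetical choice rather than Ripley's implementation.

```python
import math
from typing import Sequence

def cox_neg_log_partial_likelihood(risk: Sequence[float], time: Sequence[float],
                                   event: Sequence[int]) -> float:
    """Negative log Cox partial likelihood (Breslow convention for ties).

    risk[i]  : model output eta_i (e.g. the final linear unit of a neural network)
    time[i]  : observed claim duration, or censoring time for open claims
    event[i] : 1 if the claim closed (event observed), 0 if censored
    """
    total = 0.0
    for i in range(len(time)):
        if not event[i]:
            continue                      # censored claims only enter risk sets
        at_risk = [risk[j] for j in range(len(time)) if time[j] >= time[i]]
        m = max(at_risk)                  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += risk[i] - log_denominator
    return -total

# Claims that close earlier should receive higher risk scores.
times, events = [1.0, 2.0, 3.0], [1, 1, 1]
print(cox_neg_log_partial_likelihood([1.0, 0.0, -1.0], times, events))  # well ordered: lower loss
print(cox_neg_log_partial_likelihood([-1.0, 0.0, 1.0], times, events))  # reversed: higher loss
```

Because the loss depends on the risk scores only through their ordering within risk sets, a predicted duration distribution additionally needs a baseline hazard estimate, for example Breslow's estimator, fitted after training.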

2603.00098 2026-03-03 stat.OT cs.CY cs.LG econ.GN math.PR q-fin.EC

Profiling vs. Case-specific Evidence: A Probabilistic Analysis

Marcello Di Bello, Nicolò Cangiotti, Michele Loi

Comments 16 pages

Abstract

The use of profiling evidence in criminal trials is a longstanding controversy in legal epistemology and evidence law theory. Many scholars, even when they oppose its use at trial, still assume that profiling evidence can be probative of guilt. We reject that assumption. Profiling evidence may support a generic hypothesis, but is not evidence that the defendant is guilty of the specific crime of which they are accused. We contrast profiling evidence with case-specific evidence, which speaks more directly to the facts of the case. Our critique departs from others by grounding the argument in a probabilistic analysis of evidentiary value. We also explore the implications of our account for debates about stereotyping.

2603.00039 2026-03-03 cs.LG cs.AI stat.ML

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala

Abstract

LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM judges exhibit correlated errors caused by shared latent confounders -- such as verbosity, stylistic preferences, or training artifacts -- causing standard aggregation rules like majority vote or averaging to provide little gain or even amplify systematic mistakes. To address this, we introduce CARE, a confounder-aware aggregation framework that explicitly models LLM judge scores as arising from both a latent true-quality signal and shared confounding factors. Rather than heuristically re-weighting judges, CARE separates quality from confounders without access to ground-truth labels. We provide theoretical guarantees for identifiability and finite-sample recovery under shared confounders, and we quantify the systematic bias incurred when aggregation models omit confounding latent factors. Across 12 public benchmarks spanning continuous scoring, binary classification, and pairwise preference settings, CARE improves aggregation accuracy, reducing error by up to 26.8%. Code is available at https://github.com/SprocketLab/CARE.
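
The failure mode motivating CARE can be seen in a small simulation. Assuming judge scores $s_j=q+b\,c+\varepsilon_j$ with independent standard normal quality $q$ and confounder $c$, and judge noise of variance $\sigma^2$, the correlation of the averaged score with $q$ is $1/\sqrt{1+b^2+\sigma^2/m}$, which plateaus at $1/\sqrt{1+b^2}$ however many judges $m$ are added. The model and numbers below are illustrative assumptions; the sketch shows the problem, not CARE's estimator.

```python
import numpy as np

def corr_of_average(n_judges: int, confound: float, noise: float = 1.0) -> float:
    """Population corr(mean score, q) when s_j = q + confound * c + eps_j,
    with q, c ~ N(0, 1) independent and eps_j ~ N(0, noise^2) independent across judges."""
    return float(1.0 / np.sqrt(1.0 + confound ** 2 + noise ** 2 / n_judges))

rng = np.random.default_rng(0)
n_items, n_judges, confound = 5000, 50, 0.8
q = rng.normal(size=n_items)     # latent true quality
c = rng.normal(size=n_items)     # shared confounder (e.g. verbosity), common to all judges
scores = q[:, None] + confound * c[:, None] + rng.normal(size=(n_items, n_judges))
avg = scores.mean(axis=1)        # plain averaging of the judge ensemble
sim = float(np.corrcoef(avg, q)[0, 1])
print("simulated:", sim, "theory:", corr_of_average(n_judges, confound))
print("ceiling as n_judges -> infinity:", 1 / np.sqrt(1 + confound ** 2))
```

Averaging removes the independent noise $\varepsilon_j$ but leaves the shared term $b\,c$ untouched, which is why separating quality from confounders, rather than adding judges, is needed to move past the ceiling.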