arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.04863 2026-02-05 cs.LG cs.AI cs.CL stat.ML

Subliminal Effects in Your Data: A General Mechanism via Log-Linearity

Ishaq Aden-Ali, Noah Golowich, Allen Liu, Abhishek Shetty, Ankur Moitra, Nika Haghtalab

Comments: Code available at https://github.com/ishaqadenali/logit-linear-selection

Abstract

Training modern large language models (LLMs) has become a veritable smorgasbord of algorithms and datasets designed to elicit particular behaviors, making it critical to develop techniques to understand the effects of datasets on the model's properties. This is exacerbated by recent experiments that show datasets can transmit signals that are not directly observable from individual datapoints, posing a conceptual challenge for dataset-centric understandings of LLM training and suggesting a missing fundamental account of such phenomena. Towards understanding such effects, inspired by recent work on the linear structure of LLMs, we uncover a general mechanism through which hidden subtexts can arise in generic datasets. We introduce Logit-Linear-Selection (LLS), a method that prescribes how to select subsets of a generic preference dataset to elicit a wide range of hidden effects. We apply LLS to discover subsets of real-world datasets so that models trained on them exhibit behaviors ranging from having specific preferences, to responding to prompts in a different language not present in the dataset, to taking on a different persona. Crucially, the effect persists for the selected subset, across models with varying architectures, supporting its generality and universality.

2602.04855 2026-02-05 stat.ME

Marginal Likelihood Inference for Fitting Dynamical Survival Analysis Models to Epidemic Count Data

Suchismita Roy, Alexander A. Fisher, Jason Xu

Comments: 25 pages, 2 figures and 6 tables

Abstract

Stochastic compartmental models are prevalent tools for describing disease spread, but inference under these models is challenging for many types of surveillance data when the marginal likelihood function becomes intractable due to missing information. To address this, we develop a closed-form likelihood for discretely observed incidence count data under the dynamical survival analysis (DSA) paradigm. The method approximates the stochastic population-level hazard by a large population limit while retaining a count-valued stochastic model, and leads to survival analytic inferential strategies that are both computationally efficient and flexible to model generalizations. Through simulation, we show that parameter estimation is competitive with recent exact but computationally expensive likelihood-based methods in partially observed settings. Previous work has shown that the DSA approximation is generalizable, and we show that the inferential developments here also carry over to models featuring individual heterogeneity, such as frailty models. We consider case studies of both Ebola and COVID-19 data on variants of the model, including a network-based epidemic model and a model with distributions over susceptibility, demonstrating its flexibility and practical utility on real, partially observed datasets.

2602.04823 2026-02-05 math.ST stat.TH

Adaptive estimation of Sobolev-type energy functionals on the sphere

Claudio Durastanti

Comments: 26 pages, 3 figures

Abstract

We study the estimation of quadratic Sobolev-type integral functionals of an unknown density on the unit sphere. The functional is defined through fractional powers of the Laplace--Beltrami operator and provides a global measure of smoothness and spectral energy. Our approach relies on spherical needlet frames, which yield a localized multiscale decomposition while preserving tight frame properties in the natural square-integrable function space on the sphere. We construct unbiased estimators of suitably truncated versions of the functional and derive sharp oracle risk bounds through an explicit bias--variance analysis. When the smoothness of the density is unknown, we propose a Lepski-type data-driven selection of the resolution level. The resulting adaptive estimator achieves minimax-optimal rates over Sobolev classes, without resorting to nonlinear or sparsity-based methods.

2602.04798 2026-02-05 stat.ME stat.ML

Score-Based Change-Point Detection and Region Localization for Spatio-Temporal Point Processes

Wenbin Zhou, Liyan Xie, Shixiang Zhu

Abstract

We study sequential change-point detection for spatio-temporal point processes, where actionable detection requires not only identifying when a distributional change occurs but also localizing where it manifests in space. While classical quickest change detection methods provide strong guarantees on detection delay and false-alarm rates, existing approaches for point-process data predominantly focus on temporal changes and do not explicitly infer affected spatial regions. We propose a likelihood-free, score-based detection framework that jointly estimates the change time and the change region in continuous space-time without assuming parametric knowledge of the pre- or post-change dynamics. The method leverages a localized and conditionally weighted Hyvärinen score to quantify event-level deviations from nominal behavior and aggregates these scores using a spatio-temporal CUSUM-type statistic over a prescribed class of spatial regions. Operating sequentially, the procedure outputs both a stopping time and an estimated change region, enabling real-time detection with spatial interpretability. We establish theoretical guarantees on false-alarm control, detection delay, and spatial localization accuracy, and demonstrate the effectiveness of the proposed approach through simulations and real-world spatio-temporal event data.
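For orientation, the classical one-dimensional CUSUM recursion, which the abstract's spatio-temporal statistic builds on, can be sketched as follows. This is the textbook procedure applied to a precomputed stream of per-event scores, not the authors' method: the function name, the sign convention (positive scores count as evidence for a change), and the threshold are illustrative assumptions, and no spatial region search is attempted.

```python
from typing import Iterable, Optional, Tuple


def cusum_stopping_time(
    scores: Iterable[float], threshold: float
) -> Tuple[Optional[int], float]:
    """Run the classical CUSUM recursion S_t = max(0, S_{t-1} + score_t).

    Returns (t, S_t) for the first 1-based index t at which S_t exceeds
    `threshold`, or (None, final statistic) if no alarm is raised.
    """
    stat = 0.0
    for t, score in enumerate(scores, start=1):
        # Resetting at zero discards evidence accumulated before a change.
        stat = max(0.0, stat + score)
        if stat > threshold:
            return t, stat
    return None, stat
```

A larger threshold lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off that the guarantees mentioned in the abstract quantify.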

2602.04788 2026-02-05 stat.ME stat.AP

Species Sensitivity Distribution revisited: a Bayesian nonparametric approach

Louise Alamichel, Julyan Arbel, Guillaume Kon Kam King, Igor Prünster

Abstract

We present a novel approach to ecological risk assessment by recasting the Species Sensitivity Distribution (SSD) method within a Bayesian nonparametric (BNP) framework. Widely mandated by environmental regulatory bodies globally, SSD has faced criticism due to its historical reliance on parametric assumptions when modeling species variability. By adopting nonparametric mixture models, we address this limitation, establishing a statistically robust foundation for SSD. Our BNP approach offers several advantages, including its efficacy in handling small datasets or censored data, which are common in ecological risk assessment, and its ability to provide principled uncertainty quantification alongside simultaneous density estimation and clustering. We utilize a specific nonparametric prior as the mixing measure, chosen for its robust clustering properties, a crucial consideration given the lack of strong prior beliefs about the number of components. Through simulation studies and analysis of real datasets, we demonstrate the superiority of our BNP-SSD over classical SSD methods. We also provide a BNP-SSD Shiny application, making our methodology available to the Ecotoxicology community. Moreover, we exploit the inherent clustering structure of the mixture model to explore patterns in species sensitivity. Our findings underscore the effectiveness of the proposed approach in improving ecological risk assessment methodologies.

2602.04762 2026-02-05 q-bio.PE stat.AP stat.OT

Uncertainty in Island-based Ecosystem Services and Climate Change

Nazli Demirel, Ioannis N. Vogiatzakis, George Zittis, Mirela Tase, Attila D. Sandor, Savvas Zotos, Christos Zoumides, Turgay Dindaroglu, Mauro Fois, Irene Christoforidi, Valentini Stamatiadou, Shiri Zemah-Shamir, Tamer Albayrak, Cigdem Kaptan Ayhan, Paraskevi Manolaki, Ina Sieber, Ziv Zemah-Shamir, Elli Tzirkalli, Aristides Moustakas

Abstract

Small and medium-sized islands are acutely exposed to climate change and ecosystem degradation, yet the extent to which uncertainty is systematically addressed in scientific assessments of their ecosystem services remains poorly understood. This study revisits 226 peer-reviewed articles drawn from two global systematic reviews on island ecosystem services and climate change, applying a structured post hoc analysis to evaluate how uncertainty is treated across methods, service categories, ecosystem realms, and decision contexts. Studies were classified according to whether uncertainty was explicitly analysed, just mentioned, or ignored. Only 30 percent of studies incorporated uncertainty explicitly, while more than half did not address it at all. Scenario-based approaches dominated uncertainty assessment, whereas probabilistic and ensemble-based frameworks remained limited. Cultural ecosystem services and extreme climate impacts exhibited the lowest levels of uncertainty integration, and few studies connected uncertainty treatment to policy relevant decision frameworks. Weak or absent treatment of uncertainty emerges as a structural challenge in island systems, where narrow ecological thresholds, strong land-sea coupling, limited spatial buffers, and reduced institutional redundancy amplify the consequences of decision-making under incomplete knowledge. Systematic mapping of how uncertainty is framed, operationalised, or neglected reveals persistent methodological and conceptual gaps and informs concrete directions for strengthening uncertainty integration in future island-focused ecosystem service and climate assessments. Embedding uncertainty more robustly into modelling practices, participatory processes, and policy tools is essential for enhancing scientific credibility, governance relevance, and adaptive capacity in insular socio-ecological systems.

2602.04761 2026-02-05 cs.LG stat.ML

Improved Dimension Dependence for Bandit Convex Optimization with Gradient Variations

Hang Yu, Yu-Hu Yan, Peng Zhao

Abstract

Gradient-variation online learning has drawn increasing attention due to its deep connections to game theory, optimization, etc. It has been studied extensively in the full-information setting, but is underexplored with bandit feedback. In this work, we focus on gradient variation in Bandit Convex Optimization (BCO) with two-point feedback. By proposing a refined analysis on the non-consecutive gradient variation, a fundamental quantity in gradient variation with bandits, we improve the dimension dependence for both convex and strongly convex functions compared with the best known results (Chiang et al., 2013). Our improved analysis for the non-consecutive gradient variation also implies other favorable problem-dependent guarantees, such as gradient-variance and small-loss regrets. Beyond the two-point setup, we demonstrate the versatility of our technique by achieving the first gradient-variation bound for one-point bandit linear optimization over hyper-rectangular domains. Finally, we validate the effectiveness of our results in more challenging tasks such as dynamic/universal regret minimization and bandit games, establishing the first gradient-variation dynamic and universal regret bounds for two-point BCO and fast convergence rates in bandit games.
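As background on two-point feedback, the standard spherical two-point gradient estimator can be sketched as below. This is a generic illustration of the feedback model, not the refined analysis of the paper; the function names, the smoothing radius `delta`, and the choice of a uniformly random unit direction are assumptions made for the example.

```python
import numpy as np


def sample_unit_vector(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def two_point_gradient(f, x: np.ndarray, u: np.ndarray, delta: float) -> np.ndarray:
    """Estimate the gradient of f at x from two function evaluations.

    g = (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u,
    where d is the dimension of x and u is a unit direction.
    """
    d = x.shape[0]
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u
```

For a linear loss f(x) = a·x the estimate equals d (a·u) u, whose expectation over a uniform direction u is exactly a. The factor d in front of the estimate is one way the dimension enters regret bounds in this setting.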

2602.04751 2026-02-05 stat.CO stat.ME

Multiple Imputation Methods under Extreme Values

Enzo Porto Brasil

Comments: 36 pages main text, 20 pages appendix, 12 figures, 28 tables. Submitted to the Austrian Journal of Statistics (under review)

Abstract

Missing data are ubiquitous in empirical databases, yet statistical analyses typically require complete data matrices. Multiple imputation offers a principled solution for filling these gaps. This study evaluates the performance of several multiple imputation methods, both in the presence and absence of extreme values, using the MICE package in R. Through Monte Carlo simulations, we generated incomplete data sets with three variables and assessed each imputation method within regression models. The results indicate that the linear regression based imputation method showed the best overall predictive performance (CV-MSE), whereas the sparse model approach was generally less efficient. Our findings underscore the relevance of extreme values when selecting an imputation strategy and highlight sample size, proportion of missingness, presence of extremes, and the type of fitted model as key determinants of performance. Despite its limitations, the study offers practical recommendations for researchers, stressing the need to examine the missingness mechanism and the occurrence of extreme values before choosing an imputation method.
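For readers unfamiliar with how multiply imputed analyses are combined, the pooling step known as Rubin's rules can be sketched as follows. The study itself uses the MICE package in R; this Python function is only an illustrative stand-in, and its name and interface are assumptions.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Pool m completed-data analyses with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The between-imputation term is what carries the extra uncertainty due to missingness; an imputation method that handles extreme values poorly would typically show up as a change in this term or as bias in the pooled estimate.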

2602.04736 2026-02-05 stat.ML cs.LG

Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates

Thatchanon Anancharoenkij, Donlapark Ponnoprat

Comments: Code is available at https://github.com/donlap/Conditional-Counterfactual-Mean-Embeddings

Abstract

A complete understanding of heterogeneous treatment effects involves characterizing the full conditional distribution of potential outcomes. To this end, we propose the Conditional Counterfactual Mean Embeddings (CCME), a framework that embeds conditional distributions of counterfactual outcomes into a reproducing kernel Hilbert space (RKHS). Under this framework, we develop a two-stage meta-estimator for CCME that accommodates any RKHS-valued regression in each stage. Based on this meta-estimator, we develop three practical CCME estimators: (1) Ridge Regression estimator, (2) Deep Feature estimator that parameterizes the feature map by a neural network, and (3) Neural-Kernel estimator that performs RKHS-valued regression, with the coefficients parameterized by a neural network. We provide finite-sample convergence rates for all estimators, establishing that they possess the double robustness property. Our experiments demonstrate that our estimators accurately recover distributional features including multimodal structure of conditional counterfactual distributions.

2602.04708 2026-02-05 math.ST stat.TH

Statistical inference for the stochastic wave equation based on discrete observations

Anton Tiepner, Mathias Trabs, Eric Ziebell

Comments: 44 pages, 6 figures

Abstract

The wave speed of a stochastic wave equation driven by Riesz noise on the unbounded multidimensional spatial domain is estimated based on discrete measurements. Central limit theorems for second-order variations of the observations in space, time, and space-time are established. Under general assumptions on the spatial and temporal sampling frequencies, the resulting method-of-moments estimators are asymptotically normally distributed. The covariance structure of the discrete increments admits a closed-form representation involving two different Fejér-type kernels, enabling a precise analysis of the interplay between spatial and temporal contributions.

2602.04691 2026-02-05 stat.ME

Linear Regression: Inference Based on Cluster Estimates

Subhodeep Dey, Gopal K. Basak, Samarjit Das

Abstract

This article proposes a novel estimator for regression coefficients in clustered data that explicitly accounts for within-cluster dependence. We study the asymptotic properties of the proposed estimator under both finite and infinite cluster sizes. The analysis is then extended to a standard random coefficient model, where we derive asymptotic results for the average (common) parameters and develop a Wald-type test for general linear hypotheses. We also investigate the performance of the conventional pooled ordinary least squares (POLS) estimator within the random coefficients framework and show that it can be unreliable across a wide range of empirically relevant settings. Furthermore, we introduce a new test for parameter stability at a higher (superblock; Tier 2, Tier 3,...) level, assuming that parameters are stable across clusters within that level. Extensive simulation studies demonstrate the effectiveness of the proposed tests, and an empirical application illustrates their practical relevance.

2602.04682 2026-02-05 stat.ME

Covariate Selection for Joint Latent Space Modeling of Sparse Network Data

Emma G Crenshaw, Yuhua Zhang, Jukka-Pekka Onnela

Abstract

Network data are increasingly common in the social sciences and infectious disease epidemiology. Analyses often link network structure to node-level covariates, but existing methods falter with sparse networks and high-dimensional node features. We propose a joint latent space modeling framework for sparse networks with high-dimensional binary node covariates that performs covariate selection while accounting for uncertainty in estimated latent positions. Building on joint latent space models that couple edges and node variables through shared latent positions, we introduce a group lasso screening step and incorporate a measurement-error-aware stabilization term to mitigate bias from using estimated latent positions as predictors. We establish prediction error rates for the covariate component both when latent positions are treated as observed and when they are estimated with bounded error; under uniform control across $q$ covariates and $n$ nodes, the rate is of order $O(\log q / n)$ up to an additional term due to latent position estimation error. Our method addresses three challenges: (1) incorporating information from isolated nodes, which are common in sparse networks but often ignored; (2) selecting relevant covariates from high-dimensional spaces; and (3) accounting for uncertainty in estimated latent positions. Simulations show predictive performance remains stable as covariate sparsity grows, while naive approaches degrade. We illustrate how the method can support efficient study design using household social networks from 75 Indian villages, where an emulated pilot study screens a large covariate battery and substantially reduces required subsequent data collection without sacrificing network predictive accuracy.

2602.04679 2026-02-05 stat.CO cs.CY

LID Framework: A new method for geospatial and exploratory data analysis of potential innovation determinants at the neighborhood level

Eleni Oikonomaki, Belivanis Dimitris, Kakderi Christina

Abstract

The geography of innovation offers a framework to understand how territorial characteristics shape innovation, often via spatial and cognitive proximity. Empirical research has focused largely on national and regional scales, while urban and sub-regional geographies receive less attention. Local studies typically rely on limited indicators (e.g., firm-level data, patents, basic socioeconomic measures), with few offering a systematic framework integrating urban form, mobility, amenities, and human-capital proxies at the neighborhood scale. Our study investigates innovation at a finer spatial resolution, going beyond proprietary or static indicators. We develop the Local Innovation Determinants (LID) database and framework to identify key enabling factors across regions, combining traditional government data with publicly available data via APIs for a more granular understanding of spatial dynamics shaping innovation capacity. Using exploratory big and geospatial data analytics and random forest models, we examine neighborhoods in New York and Massachusetts across four dimensions: social factors, economic characteristics, land use and mobility, morphology, and environment. Results show that alternative data sources offer significant yet underexplored potential to enhance insights into innovation dynamics. City policymakers should consider neighborhood-specific determinants and characteristics when designing and implementing local innovation strategies.

2602.04667 2026-02-05 stat.ML cs.LG

Causal explanations of outliers in systems with lagged time-dependencies

Philipp Alexander Schwarz, Johannes Oberpriller, Sven Klaassen

Abstract

Root-cause analysis in controlled time-dependent systems poses a major challenge in applications. Energy systems are especially difficult to handle, as they exhibit instantaneous as well as delayed effects and, if equipped with storage, have memory. In this paper, we adapt the causal root-cause analysis method of Budhathoki et al. [2022] to general time-dependent systems, as it can be regarded as a strictly causal definition of the term "root cause". In particular, we discuss two truncation approaches to handle the infinite dependency graphs present in time-dependent systems. While one leaves the causal mechanisms intact, the other approximates the mechanisms at the start nodes. The effectiveness of the different approaches is benchmarked using a challenging data generation process inspired by a problem in factory energy management: the avoidance of peaks in power consumption. We show that, given enough lags, our extension is able to localize the root causes in the feature and time domains. We further discuss the effect of mechanism approximation.

2602.04638 2026-02-05 stat.AP

Inference for Within- and Between-Partnership Transmission Rates for HIV Infection

Irene García Muñoz, Ian Hall, Thomas House

Comments: 14 pages, 3 figures, 3 tables

Abstract

HIV transmission within serodiscordant couples remains a significant public health challenge, particularly in sub-Saharan Africa. Estimating the rate of such infection, alongside the rates of introduction of infection from outside the partnership, is a special case of the more general epidemiological challenge of inferring within- and between-group intensities of transmission. This study presents a stochastic susceptible-infected (SI) pair model for estimating key epidemiological parameters governing HIV transmission within and between couples, which we further extend to account for gender-specific differences in infection dynamics. Using a likelihood-based inference approach, we estimate transmission parameters and associated uncertainty from observed data. These values can be used to inform infection prevention strategies for HIV, and the methodology proposed can be generalised to other epidemiological settings.

2602.04596 2026-02-05 stat.ML cs.LG stat.ME

A principled framework for uncertainty decomposition in TabPFN

Sandra Fortini, Kenyon Ng, Sonia Petrone, Judith Rousseau, Susan Wei

Comments: 9 pages (+2 reference, +34 appendix). Code in https://github.com/weiyaw/ud4pfn

Abstract

TabPFN is a transformer that achieves state-of-the-art performance on supervised tabular tasks by amortizing Bayesian prediction into a single forward pass. However, there is currently no method for uncertainty decomposition in TabPFN. Because it behaves, in an idealised limit, as a Bayesian in-context learner, we cast the decomposition challenge as a Bayesian predictive inference (BPI) problem. The main computational tool in BPI, predictive Monte Carlo, is challenging to apply here as it requires simulating unmodeled covariates. We therefore pursue the asymptotic alternative, filling a gap in the theory for supervised settings by proving a predictive CLT under quasi-martingale conditions. We derive variance estimators determined by the volatility of predictive updates along the context. The resulting credible bands are fast to compute, target epistemic uncertainty, and achieve near-nominal frequentist coverage. For classification, we further obtain an entropy-based uncertainty decomposition.
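To make the idea of an entropy-based decomposition concrete, the generic split of predictive entropy into aleatoric and epistemic parts can be sketched as below. This is the common mutual-information decomposition over a collection of sampled predictive distributions and is offered only as background; how the paper obtains such a decomposition for TabPFN is not specified in the abstract, and the function names and the input layout are assumptions.

```python
import numpy as np


def entropy(p: np.ndarray, axis: int = -1) -> np.ndarray:
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    safe = np.where(p > 0, p, 1.0)
    return -np.sum(p * np.log(safe), axis=axis)


def entropy_decomposition(probs):
    """Split predictive entropy for one test point.

    probs: array of shape (n_draws, n_classes); each row is one plausible
    class-probability vector for the same input.
    Returns (total, aleatoric, epistemic) with total = aleatoric + epistemic.
    """
    probs = np.asarray(probs, dtype=float)
    total = float(entropy(probs.mean(axis=0)))  # entropy of the averaged prediction
    aleatoric = float(entropy(probs).mean())    # average entropy of each draw
    return total, aleatoric, total - aleatoric
```

When every draw agrees, the epistemic term is zero and all uncertainty is aleatoric; when the draws are individually confident but disagree with one another, the epistemic term dominates.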

2602.04594 2026-02-05 stat.ME

Distributed Convoluted Rank Regression for Non-Shareable Data under Non-Additive Losses

Wen Zhang, Liping Zhu, Songshan Yang

Abstract

We study high-dimensional rank regression when data are distributed across multiple machines and the loss is a non-additive U-statistic, as in convoluted rank regression (CRR). Classical communication-efficient surrogate likelihood (CSL) methods crucially rely on the additivity of the empirical loss and therefore break down for CRR, whose global loss couples all sample pairs across machines. We propose a distributed convoluted rank regression (DCRR) framework that constructs a similar surrogate loss and demonstrate its validity under the non-additive losses. We show that this surrogate shares the same population minimizer as the full-data CRR loss and yields estimators that are statistically equivalent to centralized CRR. Building on this, we develop a two-stage sparse DCRR procedure -- an iterative $\ell_1$-penalized stage followed by a folded-concave refinement -- and establish non-asymptotic error bounds, a distributed strong oracle property, and a DHBIC-type criterion for consistent model selection. A scaling result shows that the number of machines may diverge as $M = o({N/(s^2\log p)})$ while achieving centralized oracle rates with only $O(\log N)$ communication rounds. Simulations and a large-scale real data example demonstrate substantial gains over naive divide-and-conquer, particularly under heavy-tailed errors.
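The non-additivity mentioned in the abstract can be seen in a small sketch of a pairwise rank-type loss. The function below computes an unsmoothed absolute-difference loss over all ordered pairs of residuals; the convolution smoothing that gives CRR its name is omitted, and the function name and interface are assumptions made for illustration.

```python
import numpy as np


def pairwise_rank_loss(beta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """Average |e_i - e_j| over ordered pairs i != j, with e = y - X @ beta.

    Every residual is compared with every other residual, so the loss is a
    U-statistic over sample pairs rather than a sum of per-sample terms.
    """
    resid = y - X @ beta
    n = resid.shape[0]
    diffs = np.abs(resid[:, None] - resid[None, :])  # n x n matrix, zero diagonal
    return float(diffs.sum() / (n * (n - 1)))
```

If the rows of `X` are stored on different machines, the cross-machine pairs cannot be evaluated locally, so the global loss is not an average of per-machine losses. This is the obstacle that a surrogate loss in the distributed setting has to work around.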

2602.04554 2026-02-05 stat.AP stat.CO

mmcmcBayes: An R Package Implementing a Multistage MCMC Framework for Detecting the Differentially Methylated Regions

Zhexuan Yang, Duchwan Ryu, Feng Luan

Comments: 27 pages, 3 figures

Abstract

Identifying differentially methylated regions is an important task in epigenome-wide association studies, where differential signals often arise across groups of neighboring CpG sites. Many existing methods detect differentially methylated regions by aggregating CpG-level test results, which may limit their ability to capture complex regional methylation patterns. In this paper, we introduce the R package mmcmcBayes, which implements a multistage Markov chain Monte Carlo procedure for region-level detection of differentially methylated regions. The method models sample-wise regional methylation summaries using the alpha-skew generalized normal distribution and evaluates evidence for differential methylation between groups through Bayes factors. We use a multistage region-splitting strategy to refine candidate regions based on statistical evidence. We describe the underlying methodology and software implementation, and illustrate its performance through simulation studies and applications to Illumina 450K methylation data. The mmcmcBayes package provides a practical region-level alternative to existing CpG-based differentially methylated regions detection methods and includes supporting functions for summarizing, comparing, and visualizing detected regions.

2602.04548 2026-02-05 cs.LG stat.ML

Gradient Flow Through Diagram Expansions: Learning Regimes and Explicit Solutions

Dmitry Yarotsky, Eugene Golikov, Yaroslav Gusev

Comments: 48 pages, under review for ICML'2026

Abstract

We develop a general mathematical framework to analyze scaling regimes and derive explicit analytic solutions for gradient flow (GF) in large learning problems. Our key innovation is a formal power series expansion of the loss evolution, with coefficients encoded by diagrams akin to Feynman diagrams. We show that this expansion has a well-defined large-size limit that can be used to reveal different learning phases and, in some cases, to obtain explicit solutions of the nonlinear GF. We focus on learning Canonical Polyadic (CP) decompositions of high-order tensors, and show that this model has several distinct extreme lazy and rich GF regimes such as free evolution, NTK and under- and over-parameterized mean-field. We show that these regimes depend on the parameter scaling, tensor order, and symmetry of the model in a specific and subtle way. Moreover, we propose a general approach to summing the formal loss expansion by reducing it to a PDE; in a wide range of scenarios, it turns out to be 1st order and solvable by the method of characteristics. We observe a very good agreement of our theoretical predictions with experiment.

2602.04527 2026-02-05 cs.GT stat.AP

Graph-Based Audits for Meek Single Transferable Vote Elections

Edouard Heitzmann

Abstract

In the context of election security, a Risk-Limiting Audit (RLA) is a statistical framework that uses a minimal partial recount of the ballots to guarantee that the results of the election were correctly reported. A generalized RLA framework has remained elusive for algorithmic election rules such as the Single Transferable Vote (STV) rule, because of the dependence of these rules on the chronology of eliminations and elections leading to the outcome of the election. This paper proposes a new graph-based approach to audit these algorithmic election rules, by considering the space of all possible sequences of elections and eliminations. If we fix a subgraph of this universal space ahead of the audit, a sufficient strategy is to verify statistically that the true election sequence does not leave the fixed subgraph. This makes for a flexible framework to audit these elections in a chronology-agnostic way.

2602.04459 2026-02-05 stat.ML cs.LG

Bayesian PINNs for uncertainty-aware inverse problems (BPINN-IP)

Ali Mohammad-Djafari

Comments: submitted to ICIP 2006 conference

Abstract

The main contribution of this paper is a hierarchical Bayesian formulation of PINNs for linear inverse problems, called BPINN-IP. The proposed methodology extends PINNs to account for prior knowledge on the nature of the expected NN output, as well as on its weights. Because the posterior probability distributions are accessible, uncertainties can be quantified naturally. Variational inference and Monte Carlo dropout are employed to provide predictive means and variances for reconstructed images. An example application to deconvolution and super-resolution is considered, details of the implementation steps are given, and some preliminary results are presented.

2602.04457 2026-02-05 stat.ME cs.LG

Journey to the Centre of Cluster: Harnessing Interior Nodes for A/B Testing under Network Interference

Qianyi Chen, Anpeng Wu, Bo Li, Lu Deng, Yong Wang

Comments: ICLR 2026

Abstract

A/B testing on platforms often faces challenges from network interference, where a unit's outcome depends not only on its own treatment but also on the treatments of its network neighbors. To address this, cluster-level randomization has become standard, enabling the use of network-aware estimators. These estimators typically trim the data to retain only a subset of informative units, achieving low bias under suitable conditions but often suffering from high variance. In this paper, we first demonstrate that the interior nodes - units whose neighbors all lie within the same cluster - constitute the vast majority of the post-trimming subpopulation. In light of this, we propose directly averaging over the interior nodes to construct the mean-in-interior (MII) estimator, which circumvents the delicate reweighting required by existing network-aware estimators and substantially reduces variance in classical settings. However, we show that interior nodes are often not representative of the full population, particularly in terms of network-dependent covariates, leading to notable bias. We then augment the MII estimator with a counterfactual predictor trained on the entire network, allowing us to adjust for covariate distribution shifts between the interior nodes and the full population. By rearranging the expression, we reveal that our augmented MII estimator embodies an analytical form of the point estimator within the prediction-powered inference framework. This insight motivates a semi-supervised lens, wherein interior nodes are treated as labeled data subject to selection bias. Extensive and challenging simulation studies demonstrate the outstanding performance of our augmented MII estimator across various settings.
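The definition of interior nodes given in the abstract translates directly into code. The sketch below identifies them and then takes a plain difference in mean outcomes between interior nodes of treated and control clusters. Reading "directly averaging over the interior nodes" as a difference in means is an assumption, as are the data structures and function names; the augmented estimator with a counterfactual predictor is not attempted.

```python
from statistics import mean
from typing import Dict, Hashable, Iterable, Set


def interior_nodes(
    neighbors: Dict[Hashable, Iterable[Hashable]],
    cluster: Dict[Hashable, Hashable],
) -> Set[Hashable]:
    """Nodes whose neighbors all lie in the node's own cluster.

    Isolated nodes have no neighbors and therefore count as interior here.
    """
    return {
        v
        for v, nbrs in neighbors.items()
        if all(cluster[u] == cluster[v] for u in nbrs)
    }


def mean_in_interior(outcome, neighbors, cluster, treated_clusters) -> float:
    """Difference in mean outcome between treated and control interior nodes."""
    interior = interior_nodes(neighbors, cluster)
    treated = [outcome[v] for v in interior if cluster[v] in treated_clusters]
    control = [outcome[v] for v in interior if cluster[v] not in treated_clusters]
    return mean(treated) - mean(control)
```

Under cluster-level randomization, an interior node's neighbors all share its treatment, so its observed outcome is not contaminated by the other arm. The boundary nodes dropped here are the source of the representativeness bias the abstract goes on to address.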

2602.04400 2026-02-05 stat.ME math.PR math.ST stat.TH

Unit Shiha Distribution and its Applications to Engineering and Medical Data

F. A. Shiha

Abstract

There is a growing need for flexible statistical distributions that can accurately model data defined on the unit interval. This paper introduces a new unit distribution, termed the unit Shiha (USh) distribution, which is derived from the original Shiha (Sh) distribution through an inverse exponential transformation. The probability density function of the USh distribution is sufficiently flexible to model both left- and right-skewed data, while its hazard rate function is capable of capturing various failure-rate patterns, including increasing, bathtub-shaped, and J-shaped forms. Several statistical properties of the proposed distribution are investigated, including moments and related measures, the quantile function, entropy, and stress-strength reliability. Parameter estimation is carried out using the maximum likelihood method, and its performance is evaluated through a simulation study. The practical usefulness of the USh distribution is demonstrated using four real-life data sets, and its performance is compared with several well-known competing unit distributions. The comparative results indicate that the proposed model fits the data better than the competitive models applied in this study.

2602.04364 2026-02-05 stat.ML cs.LG

Anytime-Valid Conformal Risk Control

Bror Hultberg, Dave Zachariah, Antônio H. Ribeiro

Details
English abstract

Prediction sets provide a means of quantifying the uncertainty in predictive tasks. Using held out calibration data, conformal prediction and risk control can produce prediction sets that exhibit statistically valid error control in a computationally efficient manner. However, in the standard formulations, the error is only controlled on average over many possible calibration datasets of fixed size. In this paper, we extend the control to remain valid with high probability over a cumulatively growing calibration dataset at any time point. We derive such guarantees using quantile-based arguments and illustrate the applicability of the proposed framework to settings involving distribution shift. We further establish a matching lower bound and show that our guarantees are asymptotically tight. Finally, we demonstrate the practical performance of our methods through both simulations and real-world numerical examples.
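
As a point of reference for the fixed-size setting that this abstract starts from, the following is a minimal sketch of standard split conformal prediction, not the anytime-valid procedure the paper proposes. The function names, the absolute-residual score, and the calibration values are illustrative assumptions.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # 1-indexed rank ceil((n + 1) * (1 - alpha)) among the sorted scores.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the set is unbounded.
        return math.inf
    return sorted(scores)[k - 1]

def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction with half-width q."""
    return (point_prediction - q, point_prediction + q)

# Hypothetical absolute residuals |y - f(x)| on held-out calibration data.
calibration_scores = [0.3, 1.2, 0.7, 0.1, 0.9, 0.5, 1.5, 0.4, 0.8]
q = conformal_quantile(calibration_scores, alpha=0.2)
interval = prediction_interval(2.0, q)
```

The coverage guarantee of this baseline holds on average over calibration sets of a fixed size, which is the limitation the abstract sets out to remove.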

2601.22378 2026-02-05 stat.ML cs.LG stat.AP

It's all In the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms

Keegan Kang, Kerong Wang, Ding Zhang, Rameshwar Pratap, Bhisham Dev Verma, Benedict H. W. Wong

Comments 36 pages, 15 figures, accepted to AISTATS 2026 (poster)

Details
English abstract

Maximum likelihood estimators (MLE) and control variate estimators (CVE) have been used in conjunction with known information across sketching algorithms and applications in machine learning. We prove that under certain conditions in an exponential family, an optimal CVE will achieve the same asymptotic variance as the MLE, giving an Expectation-Maximization (EM) algorithm for the MLE. Experiments show the EM algorithm is faster and numerically stable compared to other root finding algorithms for the MLE for the bivariate Normal distribution, and we expect this to hold across distributions satisfying these conditions. We show how the EM algorithm leads to reproducibility for algorithms using MLE / CVE, and demonstrate how the EM algorithm leads to finding the MLE when the CV weights are known.

2601.13874 2026-02-05 stat.ML cs.LG

Unified Unbiased Variance Estimation for Maximum Mean Discrepancy: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses

Shijie Zhong, Yikun Yang, Da Gong, Jiangfeng Fu

Details
English abstract

The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from $\mathcal O(n^2)$ to $\mathcal O(n \log n)$.
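
For orientation, the quadratic-time baseline mentioned above can be sketched as follows. This is the textbook unbiased U-statistic estimate of the squared MMD with a Laplacian kernel on univariate data, not the accelerated method or the variance estimator from the paper. The function names and the bandwidth parameter are illustrative assumptions.

```python
import math

def laplacian_kernel(a, b, bandwidth=1.0):
    """Laplacian kernel k(a, b) = exp(-|a - b| / bandwidth) for scalars."""
    return math.exp(-abs(a - b) / bandwidth)

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples xs and ys, in O(n^2) time.

    The sample sizes may differ (imbalanced data), but each needs >= 2 points.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    # Within-sample sums exclude the diagonal terms i == j.
    xx = sum(laplacian_kernel(xs[i], xs[j], bandwidth)
             for i in range(m) for j in range(m) if i != j)
    yy = sum(laplacian_kernel(ys[i], ys[j], bandwidth)
             for i in range(n) for j in range(n) if i != j)
    # The cross term uses every pair.
    xy = sum(laplacian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return xx / (m * (m - 1)) + yy / (n * (n - 1)) - 2.0 * xy / (m * n)

far_apart = mmd2_unbiased([0.0, 0.1, 0.2], [10.0, 10.1, 10.2])
```

Because the estimate is unbiased, it can be slightly negative when the two samples come from the same distribution.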

2512.17688 2026-02-05 cs.LG stat.ML

Convergence Guarantees for Federated SARSA with Local Training and Heterogeneous Agents

Paul Mangold, Eloïse Berthier, Eric Moulines

Comments Deep FedSARSA !

Details
English abstract

We present a novel theoretical analysis of Federated SARSA (FedSARSA) with linear function approximation and local training. We establish convergence guarantees for FedSARSA in the presence of heterogeneity, both in local transitions and rewards, providing the first sample and communication complexity bounds in this setting. At the core of our analysis is a new, exact multi-step error expansion for single-agent SARSA, which is of independent interest. Our analysis precisely quantifies the impact of heterogeneity, demonstrating the convergence of FedSARSA with multiple local updates. Crucially, we show that FedSARSA achieves linear speed-up with respect to the number of agents, up to higher-order terms due to Markovian sampling. Numerical experiments support our theoretical findings.
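
The building blocks named in this abstract can be illustrated with a small sketch: a semi-gradient SARSA update with linear function approximation, and a plain average of the agents' weight vectors after local training. The averaging rule, the function names, and the numbers are assumptions made for illustration, and the paper's exact algorithm and step-size schedule may differ.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sarsa_update(w, phi, reward, phi_next, step_size, discount):
    """One semi-gradient SARSA step for Q(s, a) = w . phi(s, a).

    phi and phi_next are the features of the current and next
    state-action pairs along the agent's own trajectory.
    """
    td_error = reward + discount * dot(w, phi_next) - dot(w, phi)
    return [wi + step_size * td_error * pi for wi, pi in zip(w, phi)]

def average_weights(weight_vectors):
    """Server step (assumed): coordinate-wise mean of the local weights."""
    count = len(weight_vectors)
    return [sum(column) / count for column in zip(*weight_vectors)]

# One local step for a single hypothetical agent, then an averaging round.
w_local = sarsa_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0], 0.5, 0.9)
w_global = average_weights([[0.0, 2.0], [2.0, 0.0]])
```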

2510.07473 2026-02-05 cs.LG stat.ML

metabeta -- A fast neural model for Bayesian mixed-effects regression

Alex Kipnis, Marcel Binz, Eric Schulz

Comments 19 pages, 9 main text, 8 figures

Details
English abstract

Hierarchical data with multiple observations per group is ubiquitous in empirical sciences and is often analyzed using mixed-effects regression. In such models, Bayesian inference gives an estimate of uncertainty but is analytically intractable and requires costly approximation using Markov Chain Monte Carlo (MCMC) methods. Neural posterior estimation shifts the bulk of computation from inference time to pre-training time, amortizing over simulated datasets with known ground truth targets. We propose metabeta, a neural network model for Bayesian mixed-effects regression. Using simulated and real data, we show that it reaches stable and comparable performance to MCMC-based parameter estimation at a fraction of the usually required time, enabling new use cases for Bayesian mixed-effects modeling.

2510.06136 2026-02-05 stat.ME

Geometric Model Selection for Latent Space Network Models: Hypothesis Testing via Multidimensional Scaling and Resampling Techniques

Jieyun Wang, Anna L. Smith

Details
English abstract

Latent space models assume that network ties are more likely between nodes that are closer together in an underlying latent space. Euclidean space is a popular choice for the underlying geometry, but hyperbolic geometry can mimic more realistic patterns of ties in complex networks. To identify the underlying geometry, past research has applied non-Euclidean extensions of multidimensional scaling (MDS) to the observed geodesic distances: the shortest path lengths between nodes. The difference in stress, a standard goodness-of-fit metric for MDS, across the geometries is then used to select a latent geometry with superior model fit (lower stress). The effectiveness of this method is assessed through simulations of latent space networks in Euclidean and hyperbolic geometries. To better account for uncertainty, we extend permutation-based hypothesis tests for MDS to the latent network setting. However, these tests do not incorporate any network structure. We propose a parametric bootstrap distribution of networks, conditioned on observed geodesic distances and the Gaussian Latent Position Model (GLPM). Our method extends the Davidson-MacKinnon J-test to latent space network models with differing latent geometries. We pay particular attention to large and sparse networks, and both the permutation test and the bootstrapping methods show an improvement in detecting the underlying geometry.

2510.04769 2026-02-05 cs.LG cs.AI math.PR math.ST stat.ML stat.TH

When Do Credal Sets Stabilize? Fixed-Point Theorems for Credal Set Updates

Michele Caprio, Siu Lun Chau, Krikamol Muandet

Details
English abstract

Many machine learning algorithms rely on iterative updates of uncertainty representations, ranging from variational inference and expectation-maximization, to reinforcement learning, continual learning, and multi-agent learning. In the presence of imprecision and ambiguity, credal sets -- closed, convex sets of probability distributions -- have emerged as a popular framework for representing imprecise probabilistic beliefs. Under such imprecision, many learning problems in imprecise probabilistic machine learning (IPML) may be viewed as processes involving successive applications of update rules on credal sets. This naturally raises the question of whether this iterative process converges to stable fixed points -- or, more generally, under what conditions on the updating mechanism such fixed points exist, and whether they can be attained. We provide the first analysis of this problem, and illustrate our findings using Credal Bayesian Deep Learning as a concrete example. Our work demonstrates that incorporating imprecision into the learning process not only enriches the representation of uncertainty, but also reveals structural conditions under which stability emerges, thereby offering new insights into the dynamics of iterative learning under imprecision.

2510.04441 2026-02-05 cs.LG stat.ML

Domain Generalization Under Posterior Drift

Yilun Zhu, Naihao Deng, Naichen Shi, Aditya Gangrade, Clayton Scott

Details
English abstract

Domain generalization (DG) is the problem of generalizing from several distributions (or domains), for which labeled training data are available, to a new test domain for which no labeled data is available. For the prevailing benchmark datasets in DG, there exists a single classifier that performs well across all domains. In this work, we study a fundamentally different regime where the domains satisfy a posterior drift assumption, in which the optimal classifier might vary substantially with domain. We establish a decision-theoretic framework for DG under posterior drift, and investigate the practical implications of this framework through experiments on language and vision tasks.

2508.06377 2026-02-05 stat.ML cs.CR cs.LG math.ST stat.TH

DP-SPRT: Differentially Private Sequential Probability Ratio Tests

Thomas Michel, Debabrota Basu, Emilie Kaufmann

Comments Accepted for spotlight presentation at AISTATS 2026. 36 pages, 5 figures, 1 table

Details
English abstract

We revisit Wald's celebrated Sequential Probability Ratio Test for sequential tests of two simple hypotheses, under privacy constraints. We propose DP-SPRT, a wrapper that can be calibrated to achieve desired error probabilities and privacy constraints, addressing a significant gap in previous work. DP-SPRT relies on a private mechanism that processes a sequence of queries and stops after privately determining when the query results fall outside a predefined interval. This OutsideInterval mechanism improves upon naive composition of existing techniques like AboveThreshold, achieving a factor-of-2 privacy improvement and thus potentially benefiting other continual monitoring procedures. We prove generic upper bounds on the error and sample complexity of DP-SPRT that can accommodate various noise distributions based on the practitioner's privacy needs. We exemplify them in two settings: Laplace noise (pure Differential Privacy) and Gaussian noise (Rényi differential privacy). In the former setting, by providing a lower bound on the sample complexity of any $\varepsilon$-DP test with prescribed type I and type II errors, we show that DP-SPRT is near optimal when both errors are small and the two hypotheses are close. Moreover, we conduct an experimental study revealing its good practical performance.
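
For readers unfamiliar with the underlying test, here is a minimal sketch of Wald's classical, non-private SPRT for two simple Bernoulli hypotheses. It does not include the privacy mechanism that DP-SPRT adds. The function name, the default error levels, and the use of Wald's approximate thresholds are illustrative assumptions.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Classical SPRT of H0: p = p0 against H1: p = p1 on 0/1 data.

    Uses Wald's approximate thresholds for type I error alpha and
    type II error beta. Returns (decision, number of samples used).
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    llr = 0.0  # running log-likelihood ratio of H1 against H0
    for t, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return ("accept H1", t)
        if llr <= lower:
            return ("accept H0", t)
    return ("undecided", len(observations))

decision, samples_used = sprt_bernoulli([1, 1, 1, 1, 1, 1], p0=0.3, p1=0.7)
```

The test stops as soon as the running log-likelihood ratio leaves the interval between the two thresholds, which is the kind of interval-exit event that the abstract's OutsideInterval mechanism detects privately.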

2508.05844 2026-02-05 cs.GT cs.LG stat.ML

Online Budget Allocation with Censored Semi-Bandit Feedback

François Bachoc, Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni

Details
English abstract

We study a stochastic budget-allocation problem over $K$ tasks. At each round $t$, the learner chooses an allocation $X_t \in \Delta_K$. Task $k$ succeeds with probability $F_k(X_{t,k})$, where $F_1,\dots,F_K$ are nondecreasing budget-to-success curves, and upon success yields a random reward with unknown mean $\mu_k$. The learner observes which tasks succeed, and observes a task's reward only upon success (censored semi-bandit feedback). This model captures, for instance, splitting payments across crowdsourcing workers or distributing bids across simultaneous auctions, and subsumes stochastic multi-armed bandits and semi-bandits. We design an optimism-based algorithm that operates under censored semi-bandit feedback. Our main result shows that in diminishing-returns regimes, the regret of this algorithm scales polylogarithmically with the horizon $T$ without any ad hoc tuning. For general nondecreasing curves, we prove that the same algorithm (with the same tuning) achieves a worst-case regret upper bound of $\tilde O(K\sqrt{T})$. Finally, we establish a matching worst-case regret lower bound of $\Omega(K\sqrt{T})$ that holds even for full-feedback algorithms, highlighting the intrinsic hardness of our problem outside diminishing returns.

2507.06969 2026-02-05 cs.LG cs.AI cs.CR cs.CY stat.ML

Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Jamie Hayes, Borja Balle, Flavio P. Calmon, Jean Louis Raisaro

Comments NeurIPS 2025

Details
English abstract

Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks -- re-identification, attribute inference, and data reconstruction -- are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary, including worst-case, levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., an accuracy increase from 52% to 70% in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.

2506.24007 2026-02-05 econ.EM cs.LG math.ST stat.ME stat.ML stat.TH

Minimax and Bayes Optimal Best-Arm Identification

Masahiro Kato

Details
English abstract

This study investigates minimax and Bayes optimal strategies for fixed-budget best-arm identification. We consider an adaptive procedure consisting of a sampling phase followed by a recommendation phase, and we design an adaptive experiment within this framework to efficiently identify the best arm, defined as the one with the highest expected outcome. In our proposed strategy, the sampling phase consists of two stages. The first stage is a pilot phase, in which we allocate samples uniformly across arms to eliminate clearly suboptimal arms and to estimate outcome variances. Before entering the second stage, we solve a Gaussian minimax game, which yields a sampling ratio and a decision rule. In the second stage, samples are allocated according to this sampling ratio. After the sampling phase, the procedure enters the recommendation phase, where we select an arm using the decision rule. We prove that this single strategy is simultaneously asymptotically minimax and Bayes optimal for the simple regret, and we establish upper bounds that coincide exactly with our lower bounds, including the constant terms.

2506.06571 2026-02-05 cs.LG cs.AI stat.ML

Graph Persistence goes Spectral

Mattie Ji, Amauri H. Souza, Vikas Garg

Comments 32 pages, 4 figures, 7 tables. Accepted at NeurIPS 2025. Final version, clarified minor bug

Details
English abstract

Including intricate topological information (e.g., cycles) provably enhances the expressivity of message-passing graph neural networks (GNNs) beyond the Weisfeiler-Leman (WL) hierarchy. Consequently, Persistent Homology (PH) methods are increasingly employed for graph representation learning. In this context, recent works have proposed decorating classical PH diagrams with vertex and edge features for improved expressivity. However, these methods still fail to capture basic graph structural information. In this paper, we propose SpectRe -- a new topological descriptor for graphs that integrates spectral information into PH diagrams. Notably, SpectRe is strictly more expressive than PH and spectral information on graphs alone. We also introduce notions of global and local stability to analyze existing descriptors and establish that SpectRe is locally stable. Finally, experiments on synthetic and real-world datasets demonstrate the effectiveness of SpectRe and its potential to enhance the capabilities of graph models in relevant learning tasks. Code is available at https://github.com/Aalto-QuML/SpectRe/.

2506.01913 2026-02-05 cs.LG stat.ML

Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness

Thomas Pethick, Wanyun Xie, Mete Erdogan, Kimon Antonakopoulos, Antonio Silveti-Falls, Volkan Cevher

Details
English abstract

This work introduces a hybrid non-Euclidean optimization method which generalizes gradient norm clipping by combining steepest descent and conditional gradient approaches. The method achieves the best of both worlds by establishing a descent property under a generalized notion of $(L_0,L_1)$-smoothness. Weight decay is incorporated in a principled manner by identifying a connection to the Frank-Wolfe short step. In the stochastic case, we show an order-optimal $O(n^{-1/4})$ convergence rate by leveraging a momentum-based gradient estimator. We discuss how to instantiate the algorithms for deep learning, which we dub Clipped Scion, and demonstrate their properties on image classification and language modeling. The code is available at https://github.com/LIONS-EPFL/ClippedScion.
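
The starting point that this work generalizes, ordinary Euclidean gradient norm clipping, can be sketched as below. This is only the familiar special case and not the non-Euclidean method or Clipped Scion. The function names and values are illustrative assumptions.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        # Already inside the ball (this also covers the zero gradient).
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

def clipped_gd_step(params, grad, learning_rate, max_norm):
    """One gradient-descent step that uses the clipped gradient."""
    clipped = clip_by_norm(grad, max_norm)
    return [p - learning_rate * g for p, g in zip(params, clipped)]

clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
new_params = clipped_gd_step([1.0, 1.0], [3.0, 4.0], 0.1, 1.0)
```

Small gradients pass through unchanged, so the step behaves like plain gradient descent near a stationary point and like a normalized step when the gradient is large.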

2504.21106 2026-02-05 econ.EM stat.ME

An Axiomatic Approach to Comparing Sensitivity Parameters

Paul Diegert, Matthew A. Masten, Alexandre Poirier

Comments This paper is a revised, shorter version of our now-superseded previous working paper arXiv:2206.02303v4, without the identification analysis or empirical results of the former sections 4 and 5. The identification analysis and empirical results can now be found in our companion paper arXiv:2206.02303v5

Details
English abstract

Many methods are available for assessing the importance of omitted variables in linear regression. These methods typically make different, non-falsifiable assumptions. Hence the data alone cannot tell us which method is most appropriate. Since it is unreasonable to expect results to be robust against all possible robustness checks, researchers often use methods deemed "interpretable," a subjective criterion with no formal definition. In contrast, we develop the first formal, axiomatic framework for comparing and selecting among these methods. Our framework is analogous to the standard approach for comparing estimators based on their sampling distributions. We propose that sensitivity parameters be selected based on their covariate sampling distributions, a design distribution of parameter values induced by an assumption on how covariates are assigned to be observed or unobserved. Using this idea, we define new concepts of parameter consistency and monotonicity, and argue that a reasonable sensitivity parameter should satisfy both properties. We prove that the literature's most popular approach is inconsistent and non-monotonic, while several alternatives satisfy both.

2501.06148 2026-02-05 cs.LG stat.ML

From discrete-time policies to continuous-time diffusion samplers: Asymptotic equivalences and faster training

Julius Berner, Lorenz Richter, Marcin Sendera, Jarrid Rector-Brooks, Nikolay Malkin

Comments TMLR final version; code: https://github.com/GFNOrg/gfn-diffusion/tree/stagger

Details
English abstract

We study the problem of training neural stochastic differential equations, or diffusion models, to sample from a Boltzmann distribution without access to target samples. Existing methods for training such models enforce time-reversal of the generative and noising processes, using either differentiable simulation or off-policy reinforcement learning (RL). We prove equivalences between families of objectives in the limit of infinitesimal discretization steps, linking entropic RL methods (GFlowNets) with continuous-time objects (partial differential equations and path space measures). We further show that an appropriate choice of coarse time discretization during training allows greatly improved sample efficiency and the use of time-local objectives, achieving competitive performance on standard sampling benchmarks with reduced computational cost.

2410.18844 2026-02-05 cs.LG cs.AI stat.ME stat.ML

Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints

Udvas Das, Debabrota Basu

Details
English abstract

Pure exploration in bandits formalises multiple real-world problems, such as tuning hyper-parameters or conducting user studies to test a set of items, where different safety, resource, and fairness constraints on the decision space naturally appear. We study these problems as pure exploration in multi-armed bandits with unknown linear constraints, where the aim is to identify an $r$-optimal and feasible policy as fast as possible with a given level of confidence. First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints. Second, we leverage properties of convex optimisation in the Lagrangian lower bound to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX. Then, we propose a constraint-adaptive stopping rule and, while tracking the lower bound, use an optimistic estimate of the feasible set at each step. We show that LAGEX achieves an asymptotically optimal sample complexity upper bound, while LATS shows asymptotic optimality up to novel constraint-dependent constants. Finally, we conduct numerical experiments with different reward distributions and constraints that validate the efficient performance of LATS and LAGEX.

2410.07427 2026-02-05 cs.LG stat.ML

A Generalization Bound for a Family of Implicit Networks

Samy Wu Fung, Benjamin Berkels

Details
English abstract

Implicit networks are a class of neural networks whose outputs are defined by the fixed point of a parameterized operator. They have enjoyed success in many applications, including natural language processing and image processing, among others. While they have found abundant empirical success, theoretical work on their generalization is still under-explored. In this work, we consider a large family of implicit networks defined by parameterized contractive fixed point operators. We show a generalization bound for this class based on a covering number argument for the Rademacher complexity of these architectures.
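
To make the fixed-point definition concrete, here is a minimal sketch of the forward pass of one hypothetical implicit layer, T(z; x) = tanh(W z + U x + b), evaluated by plain fixed-point iteration. The operator, the weights, and the tolerance are assumptions chosen so that the map is a contraction (the maximum absolute row sum of W is below 1 and tanh is 1-Lipschitz). The family studied in the paper is more general.

```python
import math

def layer(z, x, W, U, b):
    """One application of the assumed operator T(z; x) = tanh(W z + U x + b)."""
    return [
        math.tanh(
            sum(W[i][j] * z[j] for j in range(len(z)))
            + sum(U[i][k] * x[k] for k in range(len(x)))
            + b[i]
        )
        for i in range(len(b))
    ]

def implicit_forward(x, W, U, b, tol=1e-10, max_iter=1000):
    """Return the fixed point z* = T(z*; x), found by iterating from zero."""
    z = [0.0] * len(b)
    for _ in range(max_iter):
        z_next = layer(z, x, W, U, b)
        if max(abs(a - c) for a, c in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

# Illustrative weights: max absolute row sum of W is 0.4, so T contracts.
W = [[0.2, -0.1], [0.1, 0.3]]
U = [[1.0], [-0.5]]
b = [0.0, 0.1]
z_star = implicit_forward([0.7], W, U, b)
```

Contractivity guarantees that the iteration converges to a unique output for every input, which is what makes the network's output well defined.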

2305.19557 2026-02-05 math.OC cs.LG eess.SP stat.ML

Dictionary Learning under Symmetries via Group Representations

Subhroshekhar Ghosh, Aaron Y. R. Low, Yong Sheng Soh, Zhuohang Feng, Brendan K. Y. Tan

Comments 33 pages, 3 figures

Details
English abstract

The dictionary learning problem can be viewed as a data-driven process to learn a suitable transformation so that data is sparsely represented directly from example data. In this paper, we examine the problem of learning a dictionary that is invariant under a pre-specified group of transformations. Natural settings include Cryo-EM, multi-object tracking, synchronization, pose estimation, etc. We specifically study this problem under the lens of mathematical representation theory. Leveraging the power of non-abelian Fourier analysis for functions over compact groups, we prescribe an algorithmic recipe for learning dictionaries that obey such invariances. We relate the dictionary learning problem in the physical domain, which is naturally modelled as being infinite dimensional, with the associated computational problem, which is necessarily finite dimensional. We establish that the dictionary learning problem can be effectively understood as an optimization instance over certain matrix orbitopes having a particular block-diagonal structure governed by the irreducible representations of the group of symmetries. This perspective enables us to introduce a band-limiting procedure which obtains dimensionality reduction in applications. We provide guarantees for our computational ansatz to provide a desirable dictionary learning outcome. We apply our paradigm to investigate the dictionary learning problem for the groups SO(2) and SO(3). While the SO(2)-orbitope admits an exact spectrahedral description, substantially less is understood about the SO(3)-orbitope. We describe a tractable spectrahedral outer approximation of the SO(3)-orbitope, and contribute an alternating minimization paradigm to perform optimization in this setting. We provide numerical experiments to highlight the efficacy of our approach in learning SO(3)-invariant dictionaries, both on synthetic and on real world data.

2301.11791 2026-02-05 stat.CO

Improving Software Engineering in Biostatistics: Challenges and Opportunities

Daniel Sabanés Bové, Heidi Seibold, Anne-Laure Boulesteix, Juliane Manitz, Alessandro Gasparini, Burak K. Günhan, Oliver Boix, Armin Schüler, Sven Fillinger, Sven Nahnsen, Anna E. Jacob, Thomas Jaki

Journal ref Drug Discovery Today, Vol. 31, Issue 3, 2026, 104613

Details
English abstract

Programming is ubiquitous in applied biostatistics; adopting software engineering skills will help biostatisticians do a better job. To explain this, we start by highlighting key challenges for software development and application in biostatistics. Silos between different statistician roles, projects, departments, and organizations lead to the development of duplicate and suboptimal code. Building on top of open-source software requires critical appraisal and risk-based assessment of the used modules. Code that is written needs to be readable to ensure reliable software. The software needs to be easily understandable for the user, as well as developed within testing frameworks to ensure that long-term maintenance of the software is feasible. Finally, the reproducibility of research results is hindered by manual analysis workflows and uncontrolled code development. We next describe how awareness of the importance and application of good software engineering practices and strategies can help address these challenges. The foundation is a better education in basic software engineering skills in schools, universities, and during the work life. Dedicated software engineering teams within academic institutions and companies can be a key factor in establishing good software engineering practices and can catalyze improvements across research projects. Providing attractive career paths is important for the retention of talent. Readily available tools can improve the reproducibility of statistical analyses and their use can be exercised in community events. [...]

2206.02303 2026-02-05 econ.EM stat.ME

Assessing Omitted Variable Bias when the Controls are Endogenous

Paul Diegert, Matthew A. Masten, Alexandre Poirier

Comments This paper is a revised, shorter version of our now-superseded previous working paper arXiv:2206.02303v4, without the design-based framework of Section 3. The design-based framework and associated results can now be found in our companion paper arXiv:2504.21106

Details
English abstract

Omitted variables are one of the most important threats to the identification of causal effects. Several widely used methods assess the impact of omitted variables on empirical conclusions by comparing measures of selection on observables with measures of selection on unobservables. The recent literature has discussed various limitations of these existing methods, however. This includes challenges that arise when the omitted variables are endogenous, meaning that they are correlated with the included controls. We develop a new approach to regression sensitivity analysis that avoids those limitations, while still allowing researchers to calibrate sensitivity parameters by comparing the magnitude of selection on observables with the magnitude of selection on unobservables as in previous methods. We illustrate our results in an empirical study of the effect of historical American frontier life on modern cultural beliefs. Finally, we implement these methods in the companion Stata module regsensitivity for easy use in practice.

2602.04335 2026-02-05 stat.ML cs.LG

Geometry-Aware Optimal Transport: Fast Intrinsic Dimension and Wasserstein Distance Estimation

Ferdinand Genans, Olivier Wintenberger

Details
English abstract

Solving large-scale Optimal Transport (OT) in machine learning typically relies on sampling measures to obtain a tractable discrete problem. While the discrete solver's accuracy is controllable, the rate of convergence of the discretization error is governed by the intrinsic dimension of our data. Therefore, the true bottleneck is the knowledge and control of the sampling error. In this work, we tackle this issue by introducing novel estimators for both sampling error and intrinsic dimension. The key finding is a simple, tuning-free estimator of $\text{OT}_c(\rho, \hat{\rho})$ that utilizes the semi-dual OT functional and, remarkably, requires no OT solver. Furthermore, we derive a fast intrinsic dimension estimator from the multi-scale decay of our sampling error estimator. This framework unlocks significant computational and statistical advantages in practice, enabling us to (i) quantify the convergence rate of the discretization error, (ii) calibrate the entropic regularization of Sinkhorn divergences to the data's intrinsic geometry, and (iii) introduce a novel, intrinsic-dimension-based Richardson extrapolation estimator that strongly debiases Wasserstein distance estimation. Numerical experiments demonstrate that our geometry-aware pipeline effectively mitigates the discretization error bottleneck while maintaining computational efficiency.

2602.04322 2026-02-05 stat.ME

Exact Multiple Change-Point Detection Via Smallest Valid Partitioning

Vincent Runge, Anica Kostic, Alexandre Combeau, Gaetano Romano

Details
English abstract

We introduce smallest valid partitioning (SVP), a segmentation method for multiple change-point detection in time-series. SVP relies on a local notion of segment validity: a candidate segment is retained only if it passes a user-chosen validity test (e.g., a single change-point test). From the collection of valid segments, we propose a coherent aggregation procedure that constructs a global segmentation which is the exact solution of an optimization problem. Our main contribution is the use of a lexicographic order for the optimization problem that prioritizes parsimony. We analyze the computational complexity of the resulting procedure, which ranges from linear to cubic time depending on the chosen cost and validity functions, the data regime and the number of detected changes. Finally, we assess the quality of SVP through comparisons with standard optimal partitioning algorithms, showing that SVP yields competitive segmentations while explicitly enforcing segment validity. The flexibility of SVP makes it applicable to a broad class of problems; as an illustration, we demonstrate robust change-point detection by encoding robustness in the validity criterion.

2602.04318 2026-02-05 stat.ME math.ST stat.TH

Accurate and Efficient Approximation of the Null Distribution of Rao's Spacing Test

Yoshiki Kinoshita, Aya Shinozaki, Toshinari Kamakura

Comments 10 pages

Details
English abstract

Rao's spacing test is a widely used nonparametric method for assessing uniformity on the circle. However, its broader applicability in practical settings has been limited because the null distribution is not easily calculated. As a result, practitioners have traditionally depended on pre-tabulated critical values computed for a limited set of sample sizes, which restricts the flexibility and generality of the method. In this paper, we address this limitation by recursively computing higher-order moments of Rao's spacing test statistic and employing the Gram-Charlier expansion to derive an accurate approximation to its null distribution. This approach allows for the efficient and direct computation of p-values for arbitrary sample sizes, thereby eliminating the dependency on existing critical value tables. Moreover, we confirm that our method remains accurate and effective even for large sample sizes that are not represented in current tables, thus overcoming a significant practical limitation. Comparative evaluations with published critical values and saddlepoint approximations demonstrate that our method achieves a high degree of accuracy across a wide range of sample sizes. These findings greatly improve the practicality and usability of Rao's spacing test in both theoretical investigations and applied statistical analyses.
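
The test statistic itself is simple to compute, as the sketch below shows for angles given in degrees. It uses the standard definition U = (1/2) * sum |T_i - 360/n| over the n spacings T_i between consecutive sorted angles, including the wrap-around spacing. It does not implement the moment recursion or the Gram-Charlier approximation from the paper, so it yields no p-value. The function name is an illustrative assumption.

```python
def rao_spacing_statistic(angles_deg):
    """Rao's spacing statistic U (in degrees) for a sample of angles."""
    n = len(angles_deg)
    if n < 2:
        raise ValueError("need at least two angles")
    ordered = sorted(a % 360.0 for a in angles_deg)
    # Spacings between consecutive points, plus the arc that wraps past 360.
    spacings = [ordered[i + 1] - ordered[i] for i in range(n - 1)]
    spacings.append(360.0 - ordered[-1] + ordered[0])
    expected = 360.0 / n  # spacing under perfectly even placement
    return 0.5 * sum(abs(t - expected) for t in spacings)

evenly_spaced = rao_spacing_statistic([0.0, 90.0, 180.0, 270.0])
fully_clustered = rao_spacing_statistic([10.0, 10.0, 10.0, 10.0])
```

Evenly spaced points give the smallest possible value, and a sample concentrated at a single direction gives the largest, 360(1 - 1/n) degrees.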

2602.04272 2026-02-05 stat.CO cs.LG stat.ME

Bures-Wasserstein Importance-Weighted Evidence Lower Bound: Exposition and Applications

Peiwen Jiang, Takuo Matsubara, Minh-Ngoc Tran

Comments 27 pages, 6 figures. Submitted to Bayesian Analysis

Abstract

The Importance-Weighted Evidence Lower Bound (IW-ELBO) has emerged as an effective objective for variational inference (VI), tightening the standard ELBO and mitigating the mode-seeking behaviour. However, optimizing the IW-ELBO in Euclidean space is often inefficient, as its gradient estimators suffer from a vanishing signal-to-noise ratio (SNR). This paper formulates the optimisation of the IW-ELBO in Bures-Wasserstein space, a manifold of Gaussian distributions equipped with the 2-Wasserstein metric. We derive the Wasserstein gradient of the IW-ELBO and project it onto the Bures-Wasserstein space to yield a tractable algorithm for Gaussian VI. A pivotal contribution of our analysis concerns the stability of the gradient estimator. While the SNR of the standard Euclidean gradient estimator is known to vanish as the number of importance samples $K$ increases, we prove that the SNR of the Wasserstein gradient scales favourably as $Ω(\sqrt{K})$, ensuring optimisation efficiency even for large $K$. We further extend this geometric analysis to the Variational Rényi Importance-Weighted Autoencoder bound, establishing analogous stability guarantees. Experiments demonstrate that the proposed framework achieves superior approximation performance compared to other baselines.
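
As background, the K-sample importance-weighted bound is the expectation of log((1/K) Σ_k p(x, z_k)/q(z_k)) with z_k drawn from q. The sketch below estimates it for a toy conjugate Gaussian model; the model, function names, and parameters are assumptions chosen for illustration. It shows the plain Euclidean-space objective only and does not implement the Bures-Wasserstein gradient from the paper.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def iw_elbo_estimate(x, q_mean, q_var, num_samples, rng):
    """One Monte Carlo estimate of the K-sample importance-weighted bound.

    Toy model (an assumption): z ~ N(0, 1) and x | z ~ N(z, 1),
    with a Gaussian variational family q(z) = N(q_mean, q_var).
    """
    log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_weights.append(log_joint - log_normal_pdf(z, q_mean, q_var))
    return log_mean_exp(log_weights)
```

In this toy model the exact posterior is N(x/2, 1/2) and the marginal is N(0, 2), so choosing q equal to the posterior makes every importance weight equal to p(x) and the bound tight for any K.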

2602.04270 2026-02-05 cs.LG q-bio.NC q-bio.QM stat.ML

Multi-Integration of Labels across Categories for Component Identification (MILCCI)

Noga Mudrik, Yuxi Chen, Gal Mishne, Adam S. Charles

Abstract

Many fields collect large-scale temporal data through repeated measurements (trials), where each trial is labeled with a set of metadata variables spanning several categories. For example, a trial in a neuroscience study may be linked to a value from category (a): task difficulty, and category (b): animal choice. A critical challenge in time-series analysis is to understand how these labels are encoded within the multi-trial observations, and disentangle the distinct effect of each label entry across categories. Here, we present MILCCI, a novel data-driven method that i) identifies the interpretable components underlying the data, ii) captures cross-trial variability, and iii) integrates label information to understand each category's representation within the data. MILCCI extends a sparse per-trial decomposition that leverages label similarities within each category to enable subtle, label-driven cross-trial adjustments in component compositions and to distinguish the contribution of each category. MILCCI also learns each component's corresponding temporal trace, which evolves over time within each trial and varies flexibly across trials. We demonstrate MILCCI's performance through both synthetic and real-world examples, including voting patterns, online page view trends, and neuronal recordings.

2602.04250 2026-02-05 math.PR math.ST stat.TH

A Note on Physical Dependence and Mixing Conditions for Triangular Arrays

Florian Heinrichs

Comments Keywords: Weak Dependence, Strong Mixing, $β$-Mixing, Physical Dependence, Triangular Arrays, Local Stationarity

Abstract

Under mild structural assumptions and regularity conditions on the marginal and conditional densities, an explicit bound on the $β$-mixing coefficients in terms of the physical dependence measure is provided. Consequently, weak physical dependence implies $β$-mixing and strong mixing for triangular arrays, complementing Hill (2025), who proved the converse implication under moment assumptions.

2602.04233 2026-02-05 stat.ML cs.LG

Provable Target Sample Complexity Improvements as Pre-Trained Models Scale

Kazuto Fukuchi, Ryuichiro Hataya, Kota Matsui

Comments AISTATS2026

Abstract

Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. The advantages of pre-trained models have been highlighted by empirical studies on scaling laws, which demonstrate that larger pre-trained models can significantly reduce the sample complexity of downstream learning. However, existing theoretical investigations of pre-trained models lack the capability to explain this phenomenon. In this paper, we provide a theoretical investigation by introducing a novel framework, caulking, inspired by parameter-efficient fine-tuning (PEFT) methods such as adapter-based fine-tuning, low-rank adaptation, and partial fine-tuning. Our analysis establishes that improved pre-trained models provably decrease the sample complexity of downstream tasks, thereby offering theoretical justification for the empirically observed scaling laws relating pre-trained model size to downstream performance, a relationship not covered by existing results.

2602.04230 2026-02-05 stat.ME econ.EM

Validating Causal Message Passing Against Network-Aware Methods on Real Experiments

Albert Tan, Sadegh Shirani, James Nordlund, Mohsen Bayati

Abstract

Estimating total treatment effects in the presence of network interference typically requires knowledge of the underlying interaction structure. However, in many practical settings, network data is either unavailable, incomplete, or measured with substantial error. We demonstrate that causal message passing, a methodology that leverages temporal structure in outcome data rather than network topology, can recover total treatment effects comparable to network-aware approaches. We apply causal message passing to two large-scale field experiments where a recently developed bipartite graph methodology, which requires network knowledge, serves as a benchmark. Despite having no access to the interaction network, causal message passing produces effect estimates that match the network-aware approach in direction across all metrics and in statistical significance for the primary decision metric. Our findings validate the premise of causal message passing: that temporal variation in outcomes can serve as an effective substitute for network observation when estimating spillover effects. This has important practical implications: practitioners facing settings where network data is costly to collect, proprietary, or unreliable can instead exploit the temporal dynamics of their experimental data.

2602.04178 2026-02-05 stat.ME stat.AP stat.CO

Sparse group principal component analysis via double thresholding with application to multi-cellular programs

Qi Xu, Jing Lei, Kathryn Roeder

Abstract

Multi-cellular programs (MCPs) are coordinated patterns of gene expression across interacting cell types that collectively drive complex biological processes such as tissue development and immune responses. While MCPs are typically estimated from high-dimensional gene expression data using methods like sparse principal component analysis or latent factor models, these approaches often suffer from high computational costs and limited statistical power. In this work, we propose Sparse Group Principal Component Analysis (SGPCA) to estimate MCPs by leveraging their inherent group and individual sparsity. We introduce an efficient double-thresholding algorithm based on power iteration. In each iteration, a group thresholding step first identifies relevant gene groups, followed by an individual thresholding step to select active cell types. This algorithm achieves a linear computational complexity of $O(np)$, making it highly efficient and scalable for large-scale genomic analyses. We establish theoretical guarantees for SGPCA, including statistical consistency and a convergence rate that surpasses competing methods. Through extensive simulations, we demonstrate that SGPCA achieves superior estimation accuracy and improved statistical power for signal detection. Furthermore, we apply SGPCA to a Lupus study, discovering differentially expressed MCPs distinguishing Lupus patients from normal subjects.

2602.04164 2026-02-05 cs.ET stat.AP stat.CO stat.OT

The Dynamics of Attention across Automated and Manual Driving Modes: A Driving Simulation Study

Yuan Cai, Mustafa Demir, Farzan Sasangohar, Mohsen Zare

Abstract

This study aims to explore the dynamics of driver attention to various zones, including the road, the central mirror, the embedded Human-Machine Interface (HMI), and the speedometer, across different driving modes in AVs. The integration of autonomous vehicles (AVs) into transportation systems has introduced critical safety concerns, particularly regarding driver re-engagement during mode transitions. Past accidents underscore the risks of overreliance on automation and highlight the need to understand dynamic attention allocation to support safety in autonomous driving. A high-fidelity driving simulation was conducted. Eye-tracking technology was used to measure fixation duration, fixation count, and time to first fixation across distinct driving modes (automated, manual, and transition), which were then used to assess how drivers allocated attention to various areas of interest (AOIs). Findings show that drivers' attention varies significantly across driving modes. In manual mode, attention consistently focuses on the road, while in automated mode, prolonged fixation on the embedded HMI was observed. During the handover and takeover phases, attention shifts dynamically between environmental and technological elements. The study reveals that driver attention allocation is mode-dependent. These findings inform the design of adaptive HMIs in AVs that align with drivers' attention patterns. By presenting relevant information according to the driving context, such systems can enhance driver-vehicle interaction, support effective transitions, and improve overall safety. Systematic analysis of visual attention dynamics across driving modes is gaining prominence, as it informs adaptive HMI designs and driver readiness interventions. The GLMM findings can be directly applied to the design of adaptive HMIs or driver training programs to enhance attention and improve safety.

2602.04125 2026-02-05 stat.ML cs.LG stat.ME

Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits

Qingwen Zhang, Wenjia Wang

Abstract

Modern systems, such as digital platforms and service systems, increasingly rely on contextual bandits for online decision-making; however, their deployment can inadvertently create unfair exposure among arms, undermining long-term platform sustainability and supplier trust. This paper studies the contextual bandit problem under a uniform $(1-δ)$-fairness constraint, and addresses its unique vulnerabilities to strategic manipulation. The fairness constraint ensures that preferential treatment is strictly justified by an arm's actual reward across all contexts and time horizons, using uniformity to prevent statistical loopholes. We develop novel algorithms that achieve (nearly) minimax-optimal regret for both linear and smooth reward functions, while maintaining strong $(1-\tilde{O}(1/T))$-fairness guarantees, and further characterize the theoretically inherent yet asymptotically marginal "price of fairness". However, we reveal that such merit-based fairness becomes uniquely susceptible to signal manipulation. We show that an adversary with a minimal $\tilde{O}(1)$ budget can not only degrade overall performance as in traditional attacks, but also selectively induce insidious fairness-specific failures while leaving conspicuous regret measures largely unaffected. To counter this, we design robust variants incorporating corruption-adaptive exploration and error-compensated thresholding. Our approach yields the first minimax-optimal regret bounds under $C$-budgeted attack while preserving $(1-\tilde{O}(1/T))$-fairness. Numerical experiments and a real-world case demonstrate that our algorithms sustain both fairness and efficiency.

2602.04078 2026-02-05 cs.LG cs.AI stat.ML

Principles of Lipschitz continuity in neural networks

Róisín Luo

Comments Ph.D. Thesis

Abstract

Deep learning has achieved remarkable success across a wide range of domains, significantly expanding the frontiers of what is achievable in artificial intelligence. Yet, despite these advances, critical challenges remain -- most notably, ensuring robustness to small input perturbations and generalization to out-of-distribution data. These critical challenges underscore the need to understand the underlying fundamental principles that govern robustness and generalization. Among the theoretical tools available, Lipschitz continuity plays a pivotal role in governing the fundamental properties of neural networks related to robustness and generalization. It quantifies the worst-case sensitivity of a network's outputs to small input perturbations. While its importance is widely acknowledged, prior research has predominantly focused on empirical regularization approaches based on Lipschitz constraints, leaving the underlying principles less explored. This thesis seeks to advance a principled understanding of Lipschitz continuity in neural networks within the paradigm of machine learning, examined from two complementary perspectives: an internal perspective -- focusing on the temporal evolution of Lipschitz continuity in neural networks during training (i.e., training dynamics); and an external perspective -- investigating how Lipschitz continuity modulates the behavior of neural networks with respect to features in the input data, particularly its role in governing frequency signal propagation (i.e., modulation of frequency signal propagation).

2602.04077 2026-02-05 stat.ML cs.LG

Efficient Subgroup Analysis via Optimal Trees with Global Parameter Fusion

Zhongming Xie, Joseph Giorgio, Jingshen Wang

Abstract

Identifying and making statistical inferences on differential treatment effects (commonly known as subgroup analysis in clinical research) is central to precision health. Subgroup analysis allows practitioners to pinpoint populations for whom a treatment is especially beneficial or protective, thereby advancing targeted interventions. Tree based recursive partitioning methods are widely used for subgroup analysis due to their interpretability. Nevertheless, these approaches encounter significant limitations, including suboptimal partitions induced by greedy heuristics and overfitting from locally estimated splits, especially under limited sample sizes. To address these limitations, we propose a fused optimal causal tree method that leverages mixed integer optimization (MIO) to facilitate precise subgroup identification. Our approach ensures globally optimal partitions and introduces a parameter fusion constraint to facilitate information sharing across related subgroups. This design substantially improves subgroup discovery accuracy and enhances statistical efficiency. We provide theoretical guarantees by rigorously establishing out of sample risk bounds and comparing them with those of classical tree based methods. Empirically, our method consistently outperforms popular baselines in simulations. Finally, we demonstrate its practical utility through a case study on the Health and Aging Brain Study Health Disparities (HABS-HD) dataset, where our approach yields clinically meaningful insights.

2602.04028 2026-02-05 cs.AI cs.LG stat.ME

Axiomatic Foundations of Counterfactual Explanations

Leila Amgoud, Martin Cooper

Abstract

Explaining autonomous and intelligent systems is critical in order to improve trust in their decisions. Counterfactuals have emerged as one of the most compelling forms of explanation. They address ``why not'' questions by revealing how decisions could be altered. Despite the growing literature, most existing explainers focus on a single type of counterfactual and are restricted to local explanations, focusing on individual instances. There has been no systematic study of alternative counterfactual types, nor of global counterfactuals that shed light on a system's overall reasoning process. This paper addresses the two gaps by introducing an axiomatic framework built on a set of desirable properties for counterfactual explainers. It proves impossibility theorems showing that no single explainer can satisfy certain axiom combinations simultaneously, and fully characterizes all compatible sets. Representation theorems then establish five one-to-one correspondences between specific subsets of axioms and the families of explainers that satisfy them. Each family gives rise to a distinct type of counterfactual explanation, uncovering five fundamentally different types of counterfactuals. Some of these correspond to local explanations, while others capture global explanations. Finally, the framework situates existing explainers within this taxonomy, formally characterizes their behavior, and analyzes the computational complexity of generating such explanations.

2602.04021 2026-02-05 cs.LG q-bio.QM stat.ML

Group Contrastive Learning for Weakly Paired Multimodal Data

Aditya Gorla, Hugues Van Assel, Jan-Christian Huetter, Heming Yao, Kyunghyun Cho, Aviv Regev, Russell Littman

Abstract

We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data where samples across modalities are weakly paired through shared perturbation labels but lack direct correspondence. Our primary contribution is GroupCLIP, a novel group-level contrastive loss that bridges the gap between CLIP for paired cross-modal data and SupCon for uni-modal supervised contrastive learning, addressing a fundamental gap in contrastive learning for weakly-paired settings. We integrate GroupCLIP with an on-the-fly backtranslating autoencoder framework to encourage cross-modally entangled representations while maintaining group-level coherence within a shared latent space. Critically, we introduce a comprehensive combinatorial evaluation framework that systematically assesses representation learners across multiple optimal transport aligners, addressing key limitations in existing evaluation strategies. This framework includes novel simulations that systematically vary shared versus modality-specific perturbation effects enabling principled assessment of method robustness. Our combinatorial benchmarking reveals that there is not yet an aligner that uniformly dominates across settings or modality pairs. Across simulations and two real single-cell genetic perturbation datasets, GROOVE performs on par with or outperforms existing approaches for downstream cross-modal matching and imputation tasks. Our ablation studies demonstrate that GroupCLIP is the key component driving performance gains. These results highlight the importance of leveraging group-level constraints for effective multi-modal representation learning in scenarios where only weak pairing is available.

2602.04010 2026-02-05 stat.ME

Robust Nonparametric Two-Sample Tests via Mutual Information using Extended Bregman Divergence

Arijit Pyne

Abstract

We introduce a generalized formulation of mutual information (MI) based on the extended Bregman divergence, a framework that subsumes the generalized S-Bregman (GSB) divergence family. The GSB divergence unifies two important classes of statistical distances, namely the S-divergence and the Bregman exponential divergence (BED), thereby encompassing several widely used subfamilies, including the power divergence (PD), density power divergence (DPD), and S-Hellinger distance (S-HD). In parametric inference, minimum divergence estimators are well known to balance robustness with high asymptotic efficiency relative to the maximum likelihood estimator. However, nonparametric tests based on such statistical distances have been relatively less explored. In this paper, we construct a class of consistent and robust nonparametric two-sample tests for the equality of two absolutely continuous distributions using the generalized MI. We establish the asymptotic normality of the proposed test statistics under the null and contiguous alternatives. The robustness properties of the generalized MI are rigorously studied through the influence function and the breakdown point, demonstrating that stability of the generalized MI translates into stability of the associated tests. Extensive simulation studies show that divergences beyond the PD family often yield superior robustness under contamination while retaining high asymptotic power. A data-driven scheme for selecting optimal tuning parameters is also proposed. Finally, the methodology is illustrated with applications to real data.

2602.03985 2026-02-05 stat.ME stat.AP

Doubly-Robust Bayesian Estimation of Optimal Individualized Treatment Rules using Network Meta-Analysis

Augustine Wigle, Erica E. M. Moodie

Abstract

An optimal individualized treatment rule (ITR) is a function that takes a patient's characteristics, such as demographics, biomarkers, and treatment history, and outputs a treatment that is expected to give the best outcome for that patient. Major Depressive Disorder (MDD) is a common and disabling mental health condition for which an optimal ITR is of interest. Unfortunately, the power to detect treatment-covariate interactions in individual studies of MDD treatments is low. Additionally, all treatments of interest are not compared head-to-head in a single study. Network meta-analysis (NMA) is a method of synthesizing data from multiple studies to estimate the relative effects of a set of treatments. Recently, two-stage ITR NMA was proposed as a method to estimate ITRs that has the potential to improve power and simultaneously consider all relevant treatment options. In the first stage, study-specific ITRs are estimated, and in the second stage, they are pooled using a Bayesian NMA model. The existing approach is vulnerable to model misspecification and fails to address missing outcomes, which occur in the MDD data. We overcome these challenges by proposing Bayesian Bootstrap dynamic Weighted Ordinary Least Squares (BBdWOLS), a doubly-robust approach to ITR estimation that accounts for missing at random outcomes and naturally quantifies the uncertainty in estimation. We also propose an improvement to the NMA model that incorporates the full variance-covariance matrix of study-specific estimates. In a simulation study, we show that our fully Bayesian ITR NMA method is more robust and efficient than the existing approach. We apply our method to the motivating dataset consisting of three studies of pharmacological treatments for MDD, and explore how ITR NMA results can support personalized decision making in this context.

2602.03954 2026-02-05 stat.ML cs.LG stat.CO stat.ME

Learning Multi-type heterogeneous interacting particle systems

Quanjun Lang, Xiong Wang, Fei Lu, Mauro Maggioni

Abstract

We propose a framework for the joint inference of network topology, multi-type interaction kernels, and latent type assignments in heterogeneous interacting particle systems from multi-trajectory data. This learning task is a challenging non-convex mixed-integer optimization problem, which we address through a novel three-stage approach. First, we leverage shared structure across agent interactions to recover a low-rank embedding of the system parameters via matrix sensing. Second, we identify discrete interaction types by clustering within the learned embedding. Third, we recover the network weight matrix and kernel coefficients through matrix factorization and a post-processing refinement. We provide theoretical guarantees with estimation error bounds under a Restricted Isometry Property (RIP) assumption and establish conditions for the exact recovery of interaction types based on cluster separability. Numerical experiments on synthetic datasets, including heterogeneous predator-prey systems, demonstrate that our method yields an accurate reconstruction of the underlying dynamics and is robust to noise.

2602.03948 2026-02-05 stat.ML cs.CR cs.LG cs.SI math.ST stat.TH

Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks

Bibhabasu Mandal, Sagnik Nandy

Abstract

In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees of the nodes in the network. We adopt the $β$ model, which is the prototypical statistical model adopted for this form of aggregated relational information, and study the problem of minimax-optimal parameter estimation under both local and central differential privacy constraints. We establish finite sample minimax lower bounds that characterize the precise dependence of the estimation risk on the network size and the privacy parameters, and we propose simple estimators that achieve these bounds up to constants and logarithmic factors under both local and central differential privacy frameworks. Our results provide the first comprehensive finite sample characterization of privacy utility trade offs for parameter estimation in $β$ models, addressing the classical graph case and extending the analysis to higher order hypergraph models. We further demonstrate the effectiveness of our methods through experiments on synthetic data and a real world communication network.

2602.03919 2026-02-05 cond-mat.stat-mech math.ST stat.TH

Tsallis Entropy derived from the Chaitin-Kolmogorov Informational Entropy

Airton Deppman

Comments 16 pages, 1 figure

Abstract

We provide a rigorous first-principle derivation of the non-additive Tsallis' entropy by employing the Chaitin-Kolmogorov algorithmic information theory. By applying non-local restrictive rules on the string formation (grammar), we show that the algorithmic cost follows a power-law of the string length, instead of the linear behaviour obtained in the classical theory. As a result, the Tsallis entropy governs the increase of information. We explore the result showing, through Landauer's limit, that the heat dissipation in systems with long-range correlations is diminished. The $Ω_q$ number, which remains incompressible, now offers the possibility of a continuous increase of complexity, measured by the parameter $q$. We show the consistency of the results by a numerical simulation, and discuss Zipf's law in light of the new findings.

2602.03914 2026-02-05 cs.LG stat.ME

Causal Discovery for Cross-Sectional Data Based on Super-Structure and Divide-and-Conquer

Wenyu Wang, Yaping Wan

Comments 7 pages, 16 figures

Abstract

This paper tackles a critical bottleneck in Super-Structure-based divide-and-conquer causal discovery: the high computational cost of constructing accurate Super-Structures--particularly when conditional independence (CI) tests are expensive and domain knowledge is unavailable. We propose a novel, lightweight framework that relaxes the strict requirements on Super-Structure construction while preserving the algorithmic benefits of divide-and-conquer. By integrating weakly constrained Super-Structures with efficient graph partitioning and merging strategies, our approach substantially lowers CI test overhead without sacrificing accuracy. We instantiate the framework in a concrete causal discovery algorithm and rigorously evaluate its components on synthetic data. Comprehensive experiments on Gaussian Bayesian networks, including magic-NIAB, ECOLI70, and magic-IRRI, demonstrate that our method matches or closely approximates the structural accuracy of PC and FCI while drastically reducing the number of CI tests. Further validation on the real-world China Health and Retirement Longitudinal Study (CHARLS) dataset confirms its practical applicability. Our results establish that accurate, scalable causal discovery is achievable even under minimal assumptions about the initial Super-Structure, opening new avenues for applying divide-and-conquer methods to large-scale, knowledge-scarce domains such as biomedical and social science research.

2602.03911 2026-02-05 cs.LG math.OC stat.ML

The Role of Target Update Frequencies in Q-Learning

Simon Weissmann, Tilman Aach, Benedikt Wille, Sebastian Kassing, Leif Döring

Abstract

The target network update frequency (TUF) is a central stabilization mechanism in (deep) Q-learning. However, its selection remains poorly understood and is often treated merely as another tunable hyperparameter rather than as a principled design decision. This work provides a theoretical analysis of target fixing in tabular Q-learning through the lens of approximate dynamic programming. We formulate periodic target updates as a nested optimization scheme in which each outer iteration applies an inexact Bellman optimality operator, approximated by a generic inner loop optimizer. Rigorous theory yields a finite-time convergence analysis for the asynchronous sampling setting, specializing to stochastic gradient descent in the inner loop. Our results deliver an explicit characterization of the bias-variance trade-off induced by the target update period, showing how to optimally set this critical hyperparameter. We prove that constant target update schedules are suboptimal, incurring a logarithmic overhead in sample complexity that is entirely avoidable with adaptive schedules. Our analysis shows that the optimal target update frequency increases geometrically over the course of the learning process.
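
The nested scheme described in the abstract can be pictured with a small tabular sketch. Everything here is an illustrative assumption: the deterministic toy MDP, the function name, the synchronous sweep over all state-action pairs (the paper analyses asynchronous sampling), and the constant update period, which the paper argues is suboptimal compared with adaptive schedules.

```python
def q_learning_with_target(rewards, next_state, gamma, period, outer_iters, lr):
    """Tabular Q-learning on a deterministic MDP with a periodically frozen target.

    rewards[s][a] and next_state[s][a] describe a hypothetical toy MDP.
    """
    n_states = len(rewards)
    q = [[0.0 for _ in row] for row in rewards]
    for _ in range(outer_iters):
        # Outer iteration: freeze a copy of the table (the "target network").
        target = [row[:] for row in q]
        for _ in range(period):
            # Inner loop: gradient-style steps towards the fixed Bellman target.
            for s in range(n_states):
                for a in range(len(rewards[s])):
                    y = rewards[s][a] + gamma * max(target[next_state[s][a]])
                    q[s][a] += lr * (y - q[s][a])
    return q
```

A longer period makes each outer iteration a closer approximation of one exact Bellman update, at the cost of more inner steps per refresh; that is the trade-off the abstract refers to.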

2602.03906 2026-02-05 cs.LG cs.AI cs.IT math.IT stat.ML

GeoIB: Geometry-Aware Information Bottleneck via Statistical-Manifold Compression

Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu

Abstract

Information Bottleneck (IB) is widely used, but in deep learning, it is usually implemented through tractable surrogates, such as variational bounds or neural mutual information (MI) estimators, rather than directly controlling the MI I(X;Z) itself. The looseness and estimator-dependent bias can make IB "compression" only indirectly controlled and optimization fragile. We revisit the IB problem through the lens of information geometry and propose a Geometric Information Bottleneck (GeoIB) that dispenses with mutual information (MI) estimation. We show that I(X;Z) and I(Z;Y) admit exact projection forms as minimal Kullback-Leibler (KL) distances from the joint distributions to their respective independence manifolds. Guided by this view, GeoIB controls information compression with two complementary terms: (i) a distribution-level Fisher-Rao (FR) discrepancy, which matches KL to second order and is reparameterization-invariant; and (ii) a geometry-level Jacobian-Frobenius (JF) term that provides a local capacity-type upper bound on I(Z;X) by penalizing pullback volume expansion of the encoder. We further derive a natural-gradient optimizer consistent with the FR metric and prove that the standard additive natural-gradient step is first-order equivalent to the geodesic update. We conducted extensive experiments and observed that the GeoIB achieves a better trade-off between prediction accuracy and compression ratio in the information plane than the mainstream IB baselines on popular datasets. GeoIB improves invariance and optimization stability by unifying distributional and geometric regularization under a single bottleneck multiplier. The source code of GeoIB is released at "https://anonymous.4open.science/r/G-IB-0569".
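
The projection view mentioned in the abstract rests on a standard identity: for a joint distribution p(x, z), the KL divergence to any product distribution q(x)r(z) equals I(X;Z) + KL(p_X || q) + KL(p_Z || r), so the minimum over the independence manifold is attained at the product of the marginals and equals the mutual information. The sketch below checks this on a small discrete table; the function names are hypothetical and it is not taken from the GeoIB code.

```python
import math


def kl(p, q):
    """KL divergence between two joint tables given as nested lists."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        for pij, qij in zip(p_row, q_row):
            if pij > 0.0:
                total += pij * math.log(pij / qij)
    return total


def product(px, pz):
    """Joint table of the product distribution with marginals px and pz."""
    return [[a * b for b in pz] for a in px]


def mutual_information(joint):
    """I(X;Z) as the KL divergence from the joint to the product of its marginals."""
    px = [sum(row) for row in joint]
    pz = [sum(col) for col in zip(*joint)]
    return kl(joint, product(px, pz))
```

Any other product distribution, for example `product([0.6, 0.4], [0.5, 0.5])` for a joint with uniform marginals, yields a strictly larger KL divergence than `mutual_information(joint)`.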

2602.03899 2026-02-05 stat.ML cs.AI cs.LG math.OC math.ST stat.TH

Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

Gilles Bareilles, Wassim Bouaziz, Julien Fageot, El-Mahdi El-Mhamdi

Abstract

Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule has been extensively studied, and endowed with formal robustness and convergence guarantees. Yet, MultiKrum, a natural extension of Krum, is often preferred in practice for its superior empirical performance, even though no theoretical guarantees were available until now. In this work, we provide the first proof that MultiKrum is a robust aggregation rule, and bound its robustness coefficient. To do so, we introduce $κ^\star$, the optimal *robustness coefficient* of an aggregation rule, which quantifies the accuracy of mean estimation in the presence of adversaries in a tighter manner compared with previously adopted notions of robustness. We then construct an upper and a lower bound on MultiKrum's robustness coefficient. As a by-product, we also improve on the best-known bounds on Krum's robustness coefficient. We show that MultiKrum's bounds are never worse than Krum's, and better in realistic regimes. We illustrate this analysis by an experimental investigation on the quality of the lower bound.
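
For orientation, here is a minimal sketch of one common formulation of the two rules. Each vector is scored by the sum of squared distances to its n − f − 2 nearest neighbours, where f is the assumed number of Byzantine workers; Krum returns the single lowest-scoring vector, and MultiKrum averages the m lowest-scoring ones. The function names and the choice of m are assumptions, and the sketch does not reproduce the robustness-coefficient analysis of the paper.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def krum_scores(vectors, num_byzantine):
    """Krum score of each vector: sum of squared distances to its n-f-2 nearest neighbours."""
    n = len(vectors)
    k = n - num_byzantine - 2
    if k < 1:
        raise ValueError("need at least f + 3 vectors")
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(
            squared_distance(v, w) for j, w in enumerate(vectors) if j != i
        )
        scores.append(sum(dists[:k]))
    return scores


def multi_krum(vectors, num_byzantine, num_selected):
    """Average the num_selected lowest-scoring vectors (num_selected=1 gives Krum)."""
    scores = krum_scores(vectors, num_byzantine)
    order = sorted(range(len(vectors)), key=lambda i: scores[i])
    chosen = [vectors[i] for i in order[:num_selected]]
    dim = len(vectors[0])
    return [sum(v[d] for v in chosen) / len(chosen) for d in range(dim)]
```

Averaging several well-scored vectors rather than returning one reduces the variance of the aggregate, which is a plausible reason for the empirical preference for MultiKrum noted in the abstract.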

2602.03889 2026-02-05 stat.ML cs.LG

Transcendental Regularization of Finite Mixtures: Theoretical Guarantees and Practical Limitations

Ernest Fokoué

Comments 24 pages, 6 figures, 2 tables

English abstract

Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regularization, a penalized likelihood framework with analytic barrier functions that prevent degeneracy while maintaining asymptotic efficiency. The resulting Transcendental Algorithm for Mixtures of Distributions (TAMD) offers strong theoretical guarantees: identifiability, consistency, and robustness. Empirically, TAMD successfully stabilizes estimation and prevents collapse, yet achieves only modest improvements in classification accuracy, highlighting fundamental limits of mixture models for unsupervised learning in high dimensions. Our work provides both a novel theoretical framework and an honest assessment of practical limitations, implemented in an open-source R package.

2602.01928 2026-02-05 stat.ML cs.LG

Privacy Amplification by Missing Data

Simon Roburin, Rafaël Pinot, Erwan Scornet

English abstract

Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications often involve datasets with missing values due to non-response, data corruption, or deliberate anonymization. Missing data is traditionally viewed as a limitation because it reduces the information available to analysts and can degrade model performance. In this work, we take an alternative perspective and study missing data from a privacy preservation standpoint. Intuitively, when features are missing, less information is revealed about individuals, suggesting that missingness could inherently enhance privacy. We formalize this intuition by analyzing missing data as a privacy amplification mechanism within the framework of differential privacy. We show, for the first time, that incomplete data can yield privacy amplification for differentially private algorithms.

2601.06514 2026-02-05 stat.ML cs.LG cs.NA math.NA math.OC math.ST stat.TH

Inference-Time Alignment for Diffusion Models via Variationally Stable Doob's Matching

Jinyuan Chang, Chenguang Duan, Yuling Jiao, Yi Xu, Jerry Zhijian Yang

English abstract

Inference-time alignment for diffusion models aims to adapt a pre-trained reference diffusion model toward a target distribution without retraining the reference score network, thereby preserving the generative capacity of the reference model while enforcing desired properties at the inference time. A central mechanism for achieving such alignment is guidance, which modifies the sampling dynamics through an additional drift term. In this work, we introduce variationally stable Doob's matching, a novel framework for provable guidance estimation grounded in Doob's $h$-transform. Our approach formulates guidance as the gradient of logarithm of an underlying Doob's $h$-function and employs gradient-regularized regression to simultaneously estimate both the $h$-function and its gradient, resulting in a consistent estimator of the guidance. Theoretically, we establish non-asymptotic convergence rates for the estimated guidance. Moreover, we analyze the resulting controllable diffusion processes and prove non-asymptotic convergence guarantees for the generated distributions in the 2-Wasserstein distance. Finally, we show that variationally stable guidance estimators are adaptive to unknown low dimensionality, effectively mitigating the curse of dimensionality under low-dimensional subspace assumptions.

2512.21005 2026-02-05 stat.ML cs.LG math.PR

Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments

Edwin Fong, Lancelot F. James, Juho Lee

Comments v2: Revised version incorporating peer review feedback from book chapter submission. Clarifies modeling objectives for infectious disease prediction and situates the work within a three-paper PHIBP framework, highlighting suitability for future AI/LLM plug-and-play model specification

English abstract

Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious disease prediction, with a focus on predicting outbreaks in geographic regions that have historically reported zero cases. To this end, we present the detailed computational framework and experimental application of the Poisson Hierarchical Indian Buffet Process (PHIBP), with demonstrated success in handling sparse count data in microbiome and ecological studies. The PHIBP's architecture, grounded in the concept of absolute abundance, systematically borrows statistical strength from related regions and circumvents the known sensitivities of relative-rate methods to zero counts. Through a series of experiments on infectious disease data, we show that this principled approach provides a robust foundation for generating coherent predictive distributions and for the effective use of comparative measures such as alpha and beta diversity. The chapter's emphasis on algorithmic implementation and experimental results confirms that this unified framework delivers both accurate outbreak predictions and meaningful epidemiological insights in data-sparse settings.

2512.12742 2026-02-05 stat.ML cs.LG

A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals

Pingping Yin, Xiyun Jiao

English abstract

We propose a unified framework that employs variational inference (VI) with (conditional) normalizing flows (NFs) to train both between-model and within-model proposals for reversible jump Markov chain Monte Carlo, enabling efficient trans-dimensional Bayesian inference. In contrast to the transport reversible jump (TRJ) of Davies et al. (2023), which optimizes forward KL divergence using pilot samples from the complex target distribution, our approach minimizes the reverse KL divergence, requiring only samples from a simple base distribution and substantially reducing computational cost. In particular, we develop a novel trans-dimensional VI method with conditional NFs to fit the conditional transport proposal of Davies et al. (2023). We use RealNVP flows to learn the model-specific transport maps used for constructing proposals so that the calculation is parallelizable. Our framework also provides accurate estimates of marginal likelihoods, which may facilitate efficient model comparison and help design rejection-free proposals. Extensive numerical studies demonstrate that the TRJ method trained under our framework achieves faster mixing compared to existing baselines.

2511.02235 2026-02-05 stat.ME econ.EM

Diffusion Index Forecasting with Tensor Data

Bin Chen, Yuefeng Han, Qiyang Yu

English abstract

In this paper, we consider diffusion index forecasting with both tensor and non-tensor predictors, where the tensor structure is preserved with a Canonical Polyadic (CP) tensor factor model. When the number of non-tensor predictors is small, we study the asymptotic properties of the least squares estimator in this tensor factor-augmented regression, allowing for factors with different strengths. We derive an analytical formula for prediction intervals that accounts for the estimation uncertainty of the latent factors. In addition, we propose a novel thresholding estimator for the high-dimensional covariance matrix that is robust to cross-sectional dependence. When the number of non-tensor predictors exceeds or diverges with the sample size, we introduce a multi-source factor-augmented sparse regression model and establish the consistency of the corresponding penalized estimator. Simulation studies validate our theoretical results and an empirical application to U.S. trade flows demonstrates the advantages of our approach over other popular methods in the literature.

2510.13060 2026-02-05 cs.LG cs.GT math.OC stat.ML

Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games

Anupam Nayak, Tong Yang, Osman Yagan, Gauri Joshi, Yuejie Chi

English abstract

Reverse Kullback-Leibler (KL) divergence-based regularization with respect to a fixed reference policy is widely used in modern reinforcement learning to preserve the desired traits of the reference policy and sometimes to promote exploration (using uniform reference policy, known as entropy regularization). Beyond serving as a mere anchor, the reference policy can also be interpreted as encoding prior knowledge about good actions in the environment. In the context of alignment, recent game-theoretic approaches have leveraged KL regularization with pretrained language models as reference policies, achieving notable empirical success in self-play methods. Despite these advances, the theoretical benefits of KL regularization in game-theoretic settings remain poorly understood. In this work, we develop and analyze algorithms that provably achieve improved sample efficiency under KL regularization. We study both two-player zero-sum matrix games and Markov games: for matrix games, we propose OMG, an algorithm based on best response sampling with optimistic bonuses, and extend this idea to Markov games through the algorithm SOMG, which also uses best response sampling and a novel concept of superoptimistic bonuses. Both algorithms achieve a logarithmic regret in $T$ that scales inversely with the KL regularization strength $β$ in addition to the traditional $\widetilde{\mathcal{O}}(\sqrt{T})$ regret without the $β^{-1}$ dependence.

2509.25783 2026-02-05 stat.ML cs.LG math.OC

Sharpness of Minima in Deep Matrix Factorization

Anil Kamber, Rahul Parhi

Comments 18 pages, 7 figures

English abstract

Understanding the geometry of the loss landscape near a minimum is key to explaining the implicit bias of gradient-based methods in non-convex optimization problems such as deep neural network training and deep matrix factorization. A central quantity to characterize this geometry is the maximum eigenvalue of the Hessian of the loss. Currently, its precise role has been obfuscated because no exact expressions for this sharpness measure were known in general settings. In this paper, we present the first exact expression for the maximum eigenvalue of the Hessian of the squared-error loss at any minimizer in deep matrix factorization/deep linear neural network training problems, resolving an open question posed by Mulayoff & Michaeli (2020). This expression reveals a fundamental property of the loss landscape in deep matrix factorization: Having a constant product of the spectral norms of the left and right intermediate factors across layers is a sufficient condition for flatness. Most notably, in both depth-$2$ matrix and deep overparameterized scalar factorization, we show that this condition is both necessary and sufficient for flatness, which implies that flat minima are spectral-norm balanced even though they are not necessarily Frobenius-norm balanced. To complement our theory, we provide the first empirical characterization of an escape phenomenon during gradient-based training near a minimizer of a deep matrix factorization problem.

2509.24095 2026-02-05 stat.ML cs.LG

Singleton-Optimized Conformal Prediction

Tao Wang, Yan Sun, Edgar Dobriban

English abstract

Conformal prediction can be used to construct prediction sets that cover the true outcome with a desired probability, but can sometimes lead to large prediction sets that are costly in practice. The most useful outcome is a singleton prediction (an unambiguous decision), yet existing efficiency-oriented methods primarily optimize average set size. Motivated by this, we propose a new nonconformity score that aims to minimize the probability of producing non-singleton sets. Starting from a non-convex constrained optimization problem as a motivation, we provide a geometric reformulation and associated algorithm for computing the nonconformity score and associated split conformal prediction sets in O(K) time for K-class problems. Using this score in split conformal prediction leads to our proposed Singleton-Optimized Conformal Prediction (SOCOP) method. We evaluate our method in experiments on image classification and LLM multiple-choice question-answering, comparing with standard nonconformity scores such as the (negative) label probability estimates and their cumulative distribution function, both of which are motivated by optimizing length. The results show that SOCOP increases singleton frequency (sometimes by over 20%) compared to the above scores, with minimal impact on average set size.
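As background for the split conformal procedure mentioned above, here is a minimal pure-Python sketch using the standard baseline score `1 - p(y)` (an affine shift of the negative label probability), not the SOCOP score, whose definition the abstract does not give. The function names and toy probabilities are hypothetical. The threshold is the `ceil((n + 1)(1 - alpha))`-th smallest calibration score, and a test point's prediction set contains every class whose score does not exceed it.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold from calibration data.

    cal_probs[i] is the predicted class-probability list for example i,
    cal_labels[i] its true class index. Score used here: 1 - p(true class).
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration set: nine binary examples, true label always 0.
cal_p = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55]
cal_probs = [[p, 1.0 - p] for p in cal_p]
cal_labels = [0] * len(cal_p)

threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print(prediction_set([0.7, 0.2, 0.1], threshold))
```

With this score, a set is a singleton only when exactly one class clears the threshold, which is the event the singleton-oriented score in the abstract is designed to make more frequent.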

2508.13131 2026-02-05 cs.CL cs.LG stat.ML

Improving Detection of Watermarked Language Models

Dara Bahri, John Wieting

Comments Published at TMLR 2026

English abstract

Watermarking has recently emerged as an effective strategy for detecting the generations of large language models (LLMs). The strength of a watermark typically depends strongly on the entropy afforded by the language model and the set of input prompts. However, entropy can be quite limited in practice, especially for models that are post-trained, for example via instruction tuning or reinforcement learning from human feedback (RLHF), which makes detection based on watermarking alone challenging. In this work, we investigate whether detection can be improved by combining watermark detectors with non-watermark ones. We explore a number of hybrid schemes that combine the two, observing performance gains over either class of detector under a wide range of experimental conditions.

2506.15492 2026-02-05 cs.LG stat.ML

LIT-LVM: Structured Regularization for Interaction Terms in Linear Predictors using Latent Variable Models

Mohammadreza Nemati, Zhipeng Huang, Kevin S. Xu

Comments Published in the Transactions on Machine Learning Research (2025). https://openreview.net/forum?id=3uW5nxESu1

English abstract

Some of the simplest, yet most frequently used predictors in statistics and machine learning use weighted linear combinations of features. Such linear predictors can model non-linear relationships between features by adding interaction terms corresponding to the products of all pairs of features. We consider the problem of accurately estimating coefficients for interaction terms in linear predictors. We hypothesize that the coefficients for different interaction terms have an approximate low-dimensional structure and represent each feature by a latent vector in a low-dimensional space. This low-dimensional representation can be viewed as a structured regularization approach that further mitigates overfitting in high-dimensional settings beyond standard regularizers such as the lasso and elastic net. We demonstrate that our approach, called LIT-LVM, achieves superior prediction accuracy compared to the elastic net, hierarchical lasso, and factorization machines on a wide variety of simulated and real data, particularly when the number of interaction terms is high compared to the number of samples. LIT-LVM also provides low-dimensional latent representations for features that are useful for visualizing and analyzing their relationships.

2506.12818 2026-02-05 cs.LG cs.AI stat.ML

Taking the GP Out of the Loop

Mehul Bafna, Siddhant anand Jadhav, David Sweet

Comments 12 pages, 11 figures

English abstract

Bayesian optimization (BO) has traditionally solved black-box problems where function evaluation is expensive and, therefore, observations are few. Recently, however, there has been growing interest in applying BO to problems where function evaluation is cheaper and observations are more plentiful. In this regime, scaling to many observations $N$ is impeded by Gaussian-process (GP) surrogates: GP hyperparameter fitting scales as $\mathcal{O}(N^3)$ (reduced to roughly $\mathcal{O}(N^2)$ in modern implementations), and it is repeated at every BO iteration. Many methods improve scaling at acquisition time, but hyperparameter fitting still scales poorly, making it the bottleneck. We propose Epistemic Nearest Neighbors (ENN), a lightweight alternative to GPs that estimates function values and uncertainty (epistemic and aleatoric) from $K$-nearest-neighbor observations. ENN scales as $\mathcal{O}(N)$ for both fitting and acquisition. Our BO method, TuRBO-ENN, replaces the GP surrogate in TuRBO with ENN and its Thompson-sampling acquisition with $\mathrm{UCB} = μ(x) + σ(x)$. For the special case of noise-free problems, we can omit fitting altogether by replacing $\mathrm{UCB}$ with a non-dominated sort over $μ(x)$ and $σ(x)$. We show empirically that TuRBO-ENN reduces proposal time (i.e., fitting time + acquisition time) by one to two orders of magnitude compared to TuRBO at up to 50,000 observations.

2505.18526 2026-02-05 stat.ML cs.LG

Scalable Deep Basis Kernel Gaussian Processes

Yunqin Zhu, Henry Shaowu Yuchi, Yao Xie

Comments Previous title: Scalable Gaussian Processes with Low-Rank Deep Kernel Decomposition

English abstract

Learning expressive kernels while retaining tractable inference remains a central challenge in scaling Gaussian processes (GPs) to large and complex datasets. We propose a scalable GP regressor based on deep basis kernels (DBKs). Our DBK is constructed from a small set of neural-network-parameterized basis functions with an explicit low-rank structure. This formulation immediately enables linear-complexity inference with respect to the number of samples, possibly without inducing points. DBKs provide a unifying perspective that recovers sparse deep kernel learning and Gaussian Bayesian last-layer methods as special cases. We further identify that naively maximizing the marginal likelihood can lead to oversimplified uncertainty and rank-deficient solutions. To address this, we introduce a mini-batch stochastic objective that directly targets the predictive distribution with decoupled regularization. Empirically, DBKs show advantages in predictive accuracy, uncertainty quantification, and computational efficiency across a range of large-scale regression benchmarks.

2505.00785 2026-02-05 stat.ME econ.EM

Proper Correlation Coefficients for Nominal Random Variables

Jan-Lukas Wermuth

English abstract

This paper develops an intuitive concept of perfect dependence between two variables of which at least one has a nominal scale. Perfect dependence is attainable for all marginal distributions. It furthermore proposes a set of dependence measures that are 1 if and only if this perfect dependence is satisfied. The advantages of these dependence measures relative to classical dependence measures like contingency coefficients, Goodman-Kruskal's lambda and tau and the so-called uncertainty coefficient are twofold. Firstly, they are defined if one of the variables exhibits continuities. Secondly, they satisfy the property of attainability. That is, they can take all values in the interval [0,1] irrespective of the marginals involved. Neither property is shared by classical dependence measures, which need two discrete marginal distributions and can in some situations yield values close to 0 even though the dependence is strong or even perfect. Additionally, the paper provides a consistent estimator for one of the new dependence measures together with its asymptotic distribution under independence as well as in the general case. This allows the construction of confidence intervals and an independence test with good finite sample properties, as a subsequent simulation study shows. Finally, two applications on the dependence between the variables country and income, and country and religion, respectively, illustrate the use of the new measure.

2504.00049 2026-02-05 stat.ME stat.CO

Scalable Durational Event Models: Application to Physical and Digital Interactions

Cornelius Fritz, Riccardo Rastelli, Michael Fop, Alberto Caimo

English abstract

Durable interactions are ubiquitous in social network analysis and are increasingly observed with precise time stamps. Phone and video calls, for example, are events to which a specific duration can be assigned. We term data encoding interactions with the start and end times "durational event data". Recent advances in data collection have enabled the observation of such data over extended periods of time and between large populations of actors. Methodologically, we propose the Durational Event Model, an extension of Relational Event Models that decouples the modeling of event incidence from event duration. Computationally, we derive a fast, memory-efficient, and exact block-coordinate ascent algorithm to facilitate large-scale inference. Theoretical complexity analysis and numerical simulations demonstrate computational superiority of this approach over state-of-the-art methods. We apply the model to physical and digital interactions among college students in Copenhagen. Our empirical findings reveal that past interactions drive physical interactions, whereas digital interactions are influenced predominantly by friendship ties and prior dyadic contact.

2408.07872 2026-02-05 cs.RO stat.AP

Autonomous on-Demand Shuttles for First Mile-Last Mile Connectivity: Design, Optimization, and Impact Assessment

Sudipta Roy, Gabriel Dadashev, Lampros Yfantis, Bat-hen Nahmias-Biran, Samiul Hasan

Comments 25 Pages, 13 Figures, 1 Table

English abstract

First-Mile Last-Mile (FMLM) connectivity is crucial for improving public transit accessibility and efficiency, particularly in sprawling suburban regions where traditional fixed-route transit systems are often inadequate. Autonomous on-Demand Shuttles (AODS) are a promising option for FMLM connections due to their cost-effectiveness and improved safety features, thereby enhancing user convenience and reducing reliance on personal vehicles. A critical issue in AODS service design is the optimization of travel paths, for which realistic traffic network assignment combined with optimal routing offers a viable solution. In this study, we design an AODS controller that integrates a mesoscopic simulation-based dynamic traffic assignment model with a greedy insertion heuristic to optimize the travel routes of the shuttles. The controller also considers the charging infrastructure and strategies, and the impact of the shuttles on regular traffic flow, for route and fleet-size planning. The controller is implemented in the Aimsun traffic simulator, with Lake Nona in Orlando, Florida as a case study. We show that, under the present demand based on 1% of total trips as transit riders, a fleet of 3 autonomous shuttles can serve about 80% of FMLM trip requests on an on-demand basis with an average waiting time below 4 minutes. Additional power sources have a significant effect on service quality, as the inactive waiting time for charging would increase the fleet size. We also show that low-speed autonomous shuttles would have negligible impact on regular vehicle flow, making them suitable for suburban areas. These findings have important implications for sustainable urban planning and public transit operations.

2407.13731 2026-02-05 stat.ML cs.LG

Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM

Dimitris Bertsimas, Nicholas A. G. Johnson

English abstract

We study the problem of learning a partially observed matrix under the low rank assumption in the presence of fully observed side information that depends linearly on the true underlying matrix. This problem is an important generalization of the Matrix Completion problem, a central problem in Statistics, Operations Research and Machine Learning, that arises in applications such as recommendation systems, signal processing, system identification and image denoising. We formalize this problem as an optimization problem with an objective that balances the strength of the fit of the reconstruction to the observed entries with the ability of the reconstruction to be predictive of the side information. We derive a mixed-projection reformulation of the resulting optimization problem and present a strong semidefinite cone relaxation. We design an efficient, scalable alternating direction method of multipliers algorithm that produces high quality feasible solutions to the problem of interest. Our numerical results demonstrate that in the small rank regime ($k \leq 10$), our algorithm outputs solutions that achieve on average $2.3\%$ lower objective value and $41\%$ lower $\ell_2$ reconstruction error than the solutions returned by the best performing benchmark method on synthetic data. The runtime of our algorithm is competitive with and often superior to that of the benchmark methods. Our algorithm is able to solve problems with $n = 10000$ rows and $m = 10000$ columns in less than a minute. On large scale real world data, our algorithm produces solutions that achieve $67\%$ lower out of sample error than benchmark methods in $97\%$ less execution time.

2405.04636 2026-02-05 cs.LG stat.ML

Data-driven Error Estimation: Excess Risk Bounds without Class Complexity as Input

Sanath Kumar Krishnamurthy, Anna Lyubarskaja, Emma Brunskill, Susan Athey

English abstract

Constructing confidence intervals that are simultaneously valid across a class of estimates is central to tasks such as multiple mean estimation, generalization guarantees, and adaptive experimental design. We frame this as an "error estimation problem," where the goal is to determine a high-probability upper bound on the maximum error for a class of estimates. We propose an entirely data-driven approach that derives such bounds for both finite and infinite class settings, naturally adapting to a potentially unknown correlation structure of random errors. Notably, our method does not require class complexity as an input, overcoming a major limitation of existing approaches. We present our simple yet general solution and demonstrate applications to simultaneous confidence intervals, excess-risk control and optimizing exploration in contextual bandit algorithms.

2404.01390 2026-02-05 math.ST math.OC stat.TH

Convex relaxation for the generalized maximum-entropy sampling problem

Gabriel Ponte, Marcia Fampa, Jon Lee

English abstract

The generalized maximum-entropy sampling problem (GMESP) is to select an order-$s$ principal submatrix from an order-$n$ covariance matrix, to maximize the product of its $t$ greatest eigenvalues, $0<t\leq s <n$. Introduced more than 25 years ago, GMESP is a natural generalization of two fundamental problems in statistical design theory: (i) maximum-entropy sampling problem (MESP); (ii) binary D-optimality (D-Opt). In the general case, it can be motivated by a selection problem in the context of principal component analysis (PCA). We introduce the first convex-optimization based relaxation for GMESP, study its behavior, compare it to an earlier spectral bound, and demonstrate its use in a branch-and-bound scheme. We find that such an approach is practical when $s-t$ is very small.

2403.14881 2026-02-05 math.ST stat.TH

The German Tank Problem with Multiple Factories

Steven J. Miller, Kishan Sharma, Andrew K. Yang

English abstract

During the Second World War, estimates of the number of tanks deployed by Germany were critically needed. The Allies adopted a successful statistical approach to estimate this information: assume that the tanks are sequentially numbered starting from, say, 1, and ending at an unknown positive integer $N$. If we observe the numbers of $k$ tanks, then the best linear unbiased estimator for $N$ is $M(1+1/k)-1$ where $M$ is the maximum observed serial number. While this approach was successful, there are many more adversarial situations where the approach for the original German Tank Problem falls short. Typically the number of "factories" is a possibly unknown $l>1$, and tanks produced by different factories may have serial numbers in disjoint ranges that are often separated by unknown amounts. Clark, Gonye and Miller (CGM) presented an unbiased estimator for $N$ when the minimum serial number is unknown. So if one can identify which samples correspond to which factory, one can then estimate each factory's range using CGM's method, and sum them for an estimate of the rival's total productivity. We present a procedure to estimate the total productivity and prove that it is effective when $\log l/\log k$ is sufficiently small. In the final section, we show that if we have a small number of samples, we can make an estimator that performs orders of magnitude better when given additional information about the size of the gaps.
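The single-factory estimator quoted above, $M(1+1/k)-1$, can be illustrated with a short Python sketch. The function name and the sample serial numbers are hypothetical; the multi-factory procedure of the paper is not reproduced here.

```python
def german_tank_estimate(serials):
    """Estimate the population size N from observed serial numbers
    using M * (1 + 1/k) - 1, where M is the sample maximum and k the
    sample size (serials assumed drawn without replacement from 1..N)."""
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m * (1 + 1 / k) - 1


# Hypothetical sample of four captured serial numbers.
observed = [19, 40, 42, 60]
print(german_tank_estimate(observed))  # 60 * 1.25 - 1 = 74.0
```

The correction term grows as the sample shrinks: with a single observation the estimate is $2M - 1$, while for large $k$ it approaches the sample maximum $M$.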

2310.01184 2026-02-05 stat.AP

Applications of Improvements to the Pythagorean Won-Loss Expectation in Optimizing Rosters

Alexander F. Almeida, Kevin Dayaratna, Steven J. Miller, Andrew K. Yang

English abstract

Bill James' Pythagorean formula has for decades done an excellent job estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted respectively by ${\rm RS}$ and ${\rm RA}$, there is some $γ\approx 2$ such that the winning percentage is approximately ${\rm RS}^γ/ ({\rm RS}^γ+ {\rm RA}^γ)$. One use case is to determine the value of potential signings to the team, as it allows us to estimate how many more wins one obtains over a season given an estimated change in run production and concession. We summarize earlier work on the subject, and extend the earlier theoretical model of Miller (who assumed the home and away teams' runs arise from independent Weibull distributions with the same shape parameter $γ$; this has been observed to describe the observed run data well and yields a win probability equivalent to that of James' formula). We extend this work to model runs scored and allowed as being drawn from independent Weibull distributions with different shape parameters, and then consider the first and second moments to solve a system of four equations in the four unknowns. Doing so fits the training data better, yielding a higher winning percentage over the last 30 MLB seasons (1994 to 2023). This comes at a small cost as we no longer have a closed form expression for the win probability, but must evaluate a two-dimensional integral of Weibull distributions and numerically estimate the solutions to the system of equations. These are trivial to do with simple computational programs.
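The classical formula stated above, ${\rm RS}^γ/({\rm RS}^γ+{\rm RA}^γ)$, and the roster use case can be sketched in a few lines of Python. This covers only the single-exponent formula, not the paper's two-shape-parameter Weibull extension. The function names, the 162-game season length, and the example run totals are assumptions made for illustration.

```python
def pythagorean_win_pct(rs, ra, gamma=2.0):
    """Pythagorean expectation RS^gamma / (RS^gamma + RA^gamma)."""
    rs_g = rs ** gamma
    ra_g = ra ** gamma
    return rs_g / (rs_g + ra_g)


def extra_wins(rs, ra, delta_rs=0.0, delta_ra=0.0, games=162, gamma=2.0):
    """Estimated change in season wins if average runs scored and allowed
    per game shift by delta_rs and delta_ra (a hypothetical signing)."""
    before = pythagorean_win_pct(rs, ra, gamma)
    after = pythagorean_win_pct(rs + delta_rs, ra + delta_ra, gamma)
    return games * (after - before)


# Hypothetical team: 4.5 runs scored and 4.5 allowed per game; a signing
# is projected to add 0.3 runs scored per game.
print(pythagorean_win_pct(4.5, 4.5))
print(extra_wins(4.5, 4.5, delta_rs=0.3))
```

A team that scores and allows the same number of runs is predicted to win half its games for any $γ$, so the second call isolates the effect of the projected change in run production.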

2306.10767 2026-02-05 stat.ML cs.LG math.ST stat.TH

P-Tensors: a General Formalism for Constructing Higher Order Message Passing Networks

Andrew Hands, Tianyi Sun, Risi Kondor

Journal ref Proc. AISTATS, PMLR 238:424-432, 2024

English abstract

Several recent papers have proposed increasing the expressive power of graph neural networks by exploiting subgraphs or other topological structures. In parallel, researchers have investigated higher order permutation equivariant networks. In this paper we tie these two threads together by providing a general framework for higher order permutation equivariant message passing in subgraph neural networks. We introduce a new type of mathematical object called $P$-tensors, which provide a simple way to define the most general form of permutation equivariant message passing in both of the above categories of networks. We show that the $P$-tensor paradigm can achieve state-of-the-art performance on benchmark molecular datasets.

2305.00081 2026-02-05 stat.ME

Mixture Quantiles Estimated by Constrained Linear Regression

Cheng Peng, Yizhou Li, Stan Uryasev

English abstract

We study the problem of modeling univariate distributions via their quantile functions. We introduce a flexible family of distributions whose quantile function is a linear combination of basis quantiles. Because the model is linear in its parameters, estimation reduces to constrained linear regression, yielding a convex optimization problem that readily accommodates cardinality constraints as well as L1 or smoothness regularization. For Lq-type objectives we show the estimator is asymptotically equivalent to a minimum q-Wasserstein distance estimator and establish asymptotic normality. Experiments on simulated and real-world datasets demonstrate that the proposed method accurately captures both the central body and extreme tails of distributions while requiring substantially less computation than standard benchmark approaches.