arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.11040 2026-03-12 math.ST cs.IT math.CA math.FA math.IT math.MG stat.TH

On positive definite thresholding of correlation matrices

Sujit Sakharam Damase, James Eldred Pascoe

Comments 15 pages

Abstract

Standard thresholding techniques for correlation matrices often destroy positive semidefiniteness. We investigate the construction of positive definite functions that vanish on specific sets $K \subseteq [-1,1)$, ensuring that the thresholded matrix remains a valid correlation matrix. We establish existence results, define a criterion for faithfulness based on the linear coefficient of the normalized Gegenbauer expansion in analogy with Delsarte's method in coding theory, and provide bounds for thresholding at single points and pairs of points. We prove that for correlation matrices of rank $n$, any soft-thresholding operator that preserves positive semidefiniteness necessarily induces a geometric collapse of the feature space, as quantified by an $\mathcal{O}(1/n)$ bound on the faithfulness constant. This demonstrates that geometrically unbiased soft-thresholding limits the recoverable signal.
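
As a small illustration of the abstract's opening claim (not the authors' construction), the Python sketch below hard-thresholds a hand-picked 3×3 correlation matrix and inspects its smallest eigenvalue. The matrix, the cutoff of 0.5, and the helper name `hard_threshold` are assumptions chosen only for this example.

```python
import numpy as np

def hard_threshold(corr, cutoff):
    """Zero every entry whose absolute value is below cutoff, keeping a unit diagonal."""
    out = np.where(np.abs(corr) >= cutoff, corr, 0.0)
    np.fill_diagonal(out, 1.0)
    return out

# Hypothetical correlation matrix; it is positive definite before thresholding.
corr = np.array([
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.8],
    [0.4, 0.8, 1.0],
])

thresholded = hard_threshold(corr, cutoff=0.5)  # the 0.4 entries become 0

# eigvalsh is appropriate because both matrices are symmetric.
min_eig_before = np.linalg.eigvalsh(corr).min()
min_eig_after = np.linalg.eigvalsh(thresholded).min()

print("smallest eigenvalue before:", min_eig_before)
print("smallest eigenvalue after: ", min_eig_after)
```

The thresholded matrix is tridiagonal with off-diagonal entries 0.8, so its smallest eigenvalue is $1 - 0.8\sqrt{2} < 0$: it is no longer a valid correlation matrix.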

2603.11019 2026-03-12 stat.ME stat.AP

Don't Disregard the Data for Lack of a Likelihood: Bayesian Synthetic Likelihood for Enhanced Multilevel Network Meta-Regression

Harlan Campbell, Charles C. Margossian, Jeroen P. Jansen, Paul Gustafson

Abstract

Multilevel network meta-regression (ML-NMR) enables population-adjusted indirect treatment comparisons by combining individual patient data (IPD) with aggregate data. When individual-level covariates are unavailable, ML-NMR marginalizes over the covariate distribution, but this strategy cannot exploit subgroup-level summary results that are often available and potentially highly informative. We propose using Bayesian Synthetic Likelihood (BSL) to leverage this ancillary summary information and present an implementation strategy for Hamiltonian Monte Carlo (HMC), a gradient-based Markov chain Monte Carlo (MCMC) algorithm. At each MCMC iteration, the BSL method imputes missing covariates by sampling from the model-implied conditional distribution, computes synthetic subgroup summaries from the imputed data, and matches these synthetic summaries to observed summaries via a multivariate normal synthetic likelihood. Fitting this model with HMC presents multiple challenges: first, gradients cannot be computed exactly but must be estimated stochastically; and second, the model's likelihood may be non-differentiable at certain points, a pathology that can deeply frustrate the performance of HMC. We address these challenges with pre-drawn random numbers, continuous relaxation of the likelihood, and Pareto-smoothed importance sampling. This work (1) introduces a novel application of BSL to missing data problems where summary statistics from the complete dataset are available despite substantial missingness in the individual-level data, (2) demonstrates how BSL strategies can be implemented within Stan's HMC framework, and (3) shows, using a network of plaque psoriasis trials, that BSL-enhanced ML-NMR can substantially improve upon standard ML-NMR by leveraging informative ancillary information.

2603.11016 2026-03-12 stat.AP

A Model-Based Restricted Shapley Value to Measure the Players' Contribution to Shot Actions in Football

Mattia Cefis, Rodolfo Metulini, Maurizio Carpita

Comments 20 pages, 4 figures. Submitted to "Computational Statistics" (Springer)

Abstract

This paper proposes a novel framework to assess individual player contributions in football, explicitly accounting for the cooperative nature of shot-ending offensive actions. By incorporating team interaction into player evaluation, it also supports economically sustainable decision-making, with practical implications for performance analysis and player scouting. Extending the expected Goal (xG) paradigm, we propose the expected Goal Action (xGA), a measure of shot quality that incorporates build-up play and passing networks. Furthermore, we adapt cooperative game theory and introduce the Player's Restricted Shapley (PRS) statistic, a contribution metric based on restricted coalition structures derived from observed passing interactions, where xGA is adopted to compute the cohesion function. Unlike traditional Shapley approaches, the PRS one restricts coalitions to tactically admissible player subsets, offering action-specific, interpretable measures of marginal contribution in a cooperative context. We apply the framework to 8,421 shot-actions from the Italian League Serie A season 2022/23, and the case studies of AC Milan and SSC Napoli reveal some heterogeneity in contributions within teams. Furthermore, combining the PRS statistic with a final efficiency metric highlights the discrepancies between cooperative engagement and goal conversion.

2603.10991 2026-03-12 math.ST cs.LG cs.NE stat.CO stat.TH

ForwardFlow: Simulation only statistical inference using deep learning

Stefan Böhringer

Abstract

Deep learning models are being used for the analysis of parametric statistical models based on simulation-only frameworks. Bayesian models using normalizing flows simulate data from a prior distribution and are composed of two deep neural networks: a summary network that learns a sufficient statistic for the parameter and a normalizing flow that, conditional on the summary network, can approximate the posterior distribution. Here, we explore frequentist models that are based on a single summary network. During training, the input of the network is a simulated data set based on a parameter and the loss function minimizes the mean-square error between learned summary and parameter. The network thereby solves the inverse problem of parameter estimation. We propose a branched network structure that contains collapsing layers that reduce a data set to summary statistics that are further mapped through fully connected layers to approximate the parameter estimate. We motivate our choice of network structure by theoretical considerations. In simulations we demonstrate three desirable properties of parameter estimates: finite sample exactness, robustness to data contamination, and algorithm approximation. These properties are achieved by offering the network varying sample size, contaminated data, and data needing algorithmic reconstruction during the training phase. In our simulations an EM-algorithm for genetic data is automatically approximated by the network. Simulation only approaches seem to offer practical advantages in complex modeling tasks where the simpler data simulation part is left to the researcher and the more complex problem of solving the inverse problem is left to the neural network. Challenging future work includes offering pre-trained models that can be used in a wide variety of applications.

2603.10989 2026-03-12 stat.ME

Causal Survival Analysis in Platform Trials with Non-Concurrent Controls

Antonio D'Alessandro, Samrachana Adhikari, Michele Santacatterina

Abstract

Platform trials allow treatment arms to enter and exit over time while maintaining a shared control arm, yielding concurrent and non-concurrent controls (NCC). Pooling NCC is often motivated as a strategy to improve statistical efficiency, but it is unclear which estimand is targeted, what assumptions justify identification and estimation, and when precision gains are achievable; these questions are further complicated by time-to-event/survival data. Motivated by the Adaptive COVID-19 Treatment Trial (ACTT) platform trial with time to recovery as the primary endpoint, we develop an estimand-first causal survival framework targeting the treatment-specific counterfactual survival curve in the concurrent population and the corresponding functionals including the concurrent restricted mean survival time (RMST). We give nonparametric identification results and formalize conditions that justify pooling using NCC. We study covariate-adjusted outcome-regression (OR) and doubly robust (DR) estimators for the concurrent RMST, comparing concurrent-only versions to pooled-control versions. Pooling improves precision for OR estimators only when the pooling assumption holds and parametric hazard models are correctly specified; otherwise, pooling can induce bias. Moreover, in certain settings, pooling NCC yields no efficiency gain for the DR estimator. Overall, the most robust route to improve precision is to target concurrent causal survival estimands and use a covariate-adjusted DR estimation that uses only concurrent controls. An ACTT application corroborates these results.

2603.10950 2026-03-12 cs.LG stat.ML

When should we trust the annotation? Selective prediction for molecular structure retrieval from mass spectra

Mira Jürgens, Gaetan De Waele, Morteza Rakhshaninejad, Willem Waegeman

Abstract

Machine learning methods for identifying molecular structures from tandem mass spectra (MS/MS) have advanced rapidly, yet current approaches still exhibit significant error rates. In high-stakes applications such as clinical metabolomics and environmental screening, incorrect annotations can have serious consequences, making it essential to determine when a prediction can be trusted. We introduce a selective prediction framework for molecular structure retrieval from MS/MS spectra, enabling models to abstain from predictions when uncertainty is too high. We formulate the problem within the risk-coverage tradeoff framework and comprehensively evaluate uncertainty quantification strategies at two levels of granularity: fingerprint-level uncertainty over predicted molecular fingerprint bits, and retrieval-level uncertainty over candidate rankings. We compare scoring functions including first-order confidence measures, aleatoric and epistemic uncertainty estimates from second-order distributions, as well as distance-based measures in the latent space. All experiments are conducted on the MassSpecGym benchmark. Our analysis reveals that while fingerprint-level uncertainty scores are poor proxies for retrieval success, computationally inexpensive first-order confidence measures and retrieval-level aleatoric uncertainty achieve strong risk-coverage tradeoffs across evaluation settings. We demonstrate that by applying distribution-free risk control via generalization bounds, practitioners can specify a tolerable error rate and obtain a subset of annotations satisfying that constraint with high probability.
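
To make the risk-coverage tradeoff mentioned in this abstract concrete, here is a minimal Python sketch that is not taken from the paper. It assumes each prediction comes with a scalar confidence score and a correctness flag; the function name `risk_coverage_curve` and the toy inputs are hypothetical.

```python
import numpy as np

def risk_coverage_curve(confidence, correct):
    """Return (coverage, risk) when keeping the k most confident predictions, k = 1..n."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    # Sort from most to least confident; abstaining removes the tail first.
    order = np.argsort(-confidence, kind="stable")
    errors = (~correct[order]).astype(float)
    kept = np.arange(1, len(errors) + 1)
    coverage = kept / len(errors)        # fraction of inputs that receive a prediction
    risk = np.cumsum(errors) / kept      # error rate among the kept predictions
    return coverage, risk

# Toy scores: the third most confident prediction is the only wrong one.
coverage, risk = risk_coverage_curve(
    confidence=[0.9, 0.6, 0.8, 0.7],
    correct=[True, True, True, False],
)
for c, r in zip(coverage, risk):
    print(f"coverage={c:.2f}  selective risk={r:.3f}")
```

A good uncertainty score keeps the risk low at high coverage; a poor one gives a curve that is nearly flat at the overall error rate.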

2603.10937 2026-03-12 cs.LG stat.AP

Quantifying Membership Disclosure Risk for Tabular Synthetic Data Using Kernel Density Estimators

Rajdeep Pathak, Sayantee Jana

Abstract

The use of synthetic data has become increasingly popular as a privacy-preserving alternative to sharing real datasets, especially in sensitive domains such as healthcare, finance, and demography. However, the privacy assurances of synthetic data are not absolute, and remain susceptible to membership inference attacks (MIAs), where adversaries aim to determine whether a specific individual was present in the dataset used to train the generator. In this work, we propose a practical and effective method to quantify membership disclosure risk in tabular synthetic datasets using kernel density estimators (KDEs). Our KDE-based approach models the distribution of nearest-neighbour distances between synthetic data and the training records, allowing probabilistic inference of membership and enabling robust evaluation via ROC curves. We propose two attack models: a 'True Distribution Attack', which assumes privileged access to training data, and a more realistic, implementable 'Realistic Attack' that uses auxiliary data without true membership labels. Empirical evaluations across four real-world datasets and six synthetic data generators demonstrate that our method consistently achieves higher F1 scores and sharper risk characterization than a prior baseline approach, without requiring computationally expensive shadow models. The proposed method provides a practical framework and metric for quantifying membership disclosure risk in synthetic data, which enables data custodians to conduct a post-generation risk assessment prior to releasing their synthetic datasets for downstream use. The datasets and codes for this study are available at https://github.com/PyCoder913/MIA-KDE.

2603.10924 2026-03-12 stat.ME

Calibrated Bayesian Nonparametric Tolerance Intervals

Tony Pourmohamad, Robert Richardson, Bruno Sansó

Abstract

Tolerance intervals provide bounds that contain a specified proportion of a population with a given confidence level, yet their construction remains challenging when parametric assumptions fail or sample sizes are small. Traditional nonparametric methods, such as Wilks' intervals, lack flexibility and often require large samples to be valid. We propose a fully nonparametric approach for constructing one-sided and two-sided tolerance intervals using a calibrated Gibbs posterior. Leveraging the connection between tolerance limits and population quantiles, we employ a Gibbs posterior based on the asymmetric Laplace (check) loss function. A key feature of our method is the calibration of the learning rate, which ensures nominal frequentist coverage across diverse distributional shapes. Simulation studies show that the proposed approach often yields shorter intervals than classical nonparametric benchmarks while maintaining reliable coverage. The framework's practical utility is illustrated through applications in ecology, biopharmaceutical manufacturing, and environmental monitoring, demonstrating its flexibility and robustness across diverse applications.
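
The link between the check loss and quantiles that this abstract relies on can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the prior is flat on a grid, the learning rate is fixed at 1 rather than calibrated, and the names `check_loss` and `gibbs_posterior_on_grid` are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Asymmetric Laplace (check) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (u < 0))

def gibbs_posterior_on_grid(x, grid, tau, learning_rate=1.0):
    """Normalized Gibbs posterior weights over grid, assuming a flat prior (illustrative)."""
    risk = np.array([check_loss(x - q, tau).sum() for q in grid])
    log_post = -learning_rate * risk
    log_post -= log_post.max()           # stabilize before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

x = np.arange(1, 102, dtype=float)       # toy sample: 1, 2, ..., 101
grid = np.linspace(1, 101, 1001)         # candidate quantile values, step 0.1
post = gibbs_posterior_on_grid(x, grid, tau=0.9)

mode = grid[np.argmax(post)]
print("Gibbs posterior mode:", mode)
print("sample 0.9 quantile: ", np.quantile(x, 0.9))
```

The posterior mode sits at the minimizer of the empirical check loss, which for this sample is the 0.9 quantile. The learning rate controls how concentrated the weights are around it, which is why its calibration matters for coverage.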

2603.10918 2026-03-12 stat.ME

Redefining shared information: a heterogeneity-adaptive framework for meta-analysis

Elizabeth M. Davis, Emily C. Hector

Comments 47 pages, 10 tables, 8 figures

Abstract

Meta-analytic methods tend to take all-or-nothing approaches to study-level heterogeneity, assuming all studies are heterogeneous or homogeneous, leading to inefficiency and/or bias in estimation and inference. In this paper, we develop a heterogeneity-adaptive meta-analysis in linear models that adapts to the amount of information shared between datasets. The primary mechanism for the information-sharing is a shrinkage of dataset-specific distributions towards a new "centroid" distribution through a Kullback-Leibler divergence penalty. The Kullback-Leibler divergence is uniquely geometrically suited for measuring relative information between datasets, and leads to relatively simple closed form estimators with intuitive interpretations. We establish our estimator's desirable inferential properties without assuming homogeneity of dataset parameters. Among other results, we show that our estimator has a provably smaller mean squared error than the dataset-specific maximum likelihood estimators, and establish asymptotically valid inference procedures. A comprehensive set of simulations highlights our estimator's versatility, and an analysis of data from the eICU Collaborative Research Database illustrates its performance in a real-world setting.

2507.20558 2026-03-12 stat.AP

Time-to-Event Modeling with Pseudo-Observations in Federated Settings

Hyojung Jang, Malcolm Risk, Yaojie Wang, Norrina Bai Allen, Xu Shi, Lili Zhao

Comments 30 pages, 5 figures

Abstract

In multi-center clinical research, privacy regulations often prohibit pooling individual-level records, complicating the analysis of time-to-event data. Current federated survival methods frequently require iterative communication or rely strictly on proportional hazards (PH) assumptions or require sensitive survival information. We propose a one-shot federated framework using pseudo-observations derived from a sequentially updated Kaplan-Meier estimator and fitted via a renewable generalized estimating equation. Unlike traditional methods, our approach allows flexible link functions tailored to the target estimand and accommodates non-proportional hazards. To address site-level heterogeneity, we introduce a covariate-wise debiasing procedure that shrinks noise-driven local deviations toward the global estimate while preserving genuine site-specific effects. Simulation studies demonstrate that our framework achieves inferential accuracy comparable to pooled Cox regression and the privacy-preserving One-shot Distributed Algorithm to fit a multicenter Cox proportional hazards model (ODAC) under PH assumptions, while recovering time-varying coefficient trajectories when PH is violated. Furthermore, simulations confirm that the debiasing procedure optimizes the bias-variance trade-off, adaptively balancing global stability with the preservation of genuine site-specific deviations. Applied to pediatric obesity data from the Chicago Area Patient-Centered Outcomes Research Network (CAPriCORN) network ($N=45,865$), the model produced robust estimates of time-invariant and time-varying hazard ratios, offering a flexible, privacy-preserving alternative for collaborative survival research.

2504.09831 2026-03-12 stat.ML cs.AI cs.LG math.ST stat.AP stat.TH

Offline Dynamic Inventory and Pricing Strategy: Addressing Censored and Dependent Demand

Korel Gundem, Zhengling Qi

Abstract

In this paper, we study the offline sequential feature-based pricing and inventory control problem where the current demand depends on the past demand levels and any demand exceeding the available inventory is lost. Our goal is to leverage the offline dataset, consisting of past prices, ordering quantities, inventory levels, covariates, and censored sales levels, to estimate the optimal pricing and inventory control policy that maximizes long-term profit. While the underlying dynamic without censoring can be modeled by a Markov decision process (MDP), the primary obstacle arises from the observed process where demand censoring is present, resulting in missing profit information, the failure of the Markov property, and a non-stationary optimal policy. To overcome these challenges, we first approximate the optimal policy by solving a high-order MDP characterized by the number of consecutive censoring instances, which ultimately boils down to solving a specialized Bellman equation tailored for this problem. Inspired by offline reinforcement learning and survival analysis, we propose two novel data-driven algorithms for solving these Bellman equations and, thus, estimate the optimal policy. Furthermore, we establish finite-sample regret bounds to validate the effectiveness of these algorithms. Finally, we conduct numerical experiments to demonstrate the efficacy of our algorithms in estimating the optimal policy. To the best of our knowledge, this is the first data-driven approach to learning optimal pricing and inventory control policies in a sequential decision-making environment characterized by censored and dependent demand. The implementations of the proposed algorithms are available at https://github.com/gundemkorel/Inventory_Pricing_Control

2410.10649 2026-03-12 math.ST stat.ME stat.TH

Vecchia Gaussian Processes: on probabilistic and statistical properties

Botond Szabo, Yichen Zhu

Abstract

Gaussian Processes (GPs) are widely used to model dependencies in spatial statistics and machine learning. However, exact inference is computationally intractable for GP regression, with a time complexity of $O(n^3)$. The Vecchia approximation scales up computation by introducing sparsity into the spatial dependency structure, represented by a directed acyclic graph (DAG). Despite its practical popularity, this approach lacks rigorous theoretical foundations, and the choice of DAG structure remains an open problem. In this paper, we systematically study the Vecchia approximation of the popular, isotropic Matérn GP as a standalone stochastic process and uncover key probabilistic and statistical properties. We propose selecting parent sets as norming sets with fixed cardinality in the Vecchia approximation. On the probabilistic side, we show that the conditional distributions of Matérn GPs, as well as their Vecchia approximations, can be characterized by polynomial interpolations. This enables us to establish several results on small ball probabilities and the Reproducing Kernel Hilbert Spaces (RKHSs) of Vecchia GPs. Building on these probabilistic results, we prove that in the nonparametric regression model, the corresponding posterior contracts around the truth at the optimal minimax rate, both under oracle rescaling and hierarchical tuning of the prior. We illustrate the theoretical findings through numerical experiments on synthetic datasets. Our core algorithms are implemented in C++ with an R interface.

2410.08523 2026-03-12 stat.ME

Parametric multi-fidelity Monte Carlo estimation with applications to extremes

Minji Kim, Brendan Brown, Vladas Pipiras

Comments 46 pages, 7 figures

Abstract

In a multi-fidelity setting, data are available from two sources, high- and low-fidelity. Low-fidelity data are larger in size and can be leveraged to make more efficient inference about quantities of interest, e.g. the mean, for high-fidelity variables. In this work, such a multi-fidelity setting is studied when the goal is to fit a parametric model to high-fidelity data more efficiently. Three multi-fidelity parameter estimation methods are considered: joint maximum likelihood, (multi-fidelity) moment estimation and (multi-fidelity) marginal maximum likelihood. They are illustrated on several parametric models, with the focus on parametric families used in extreme value analysis. An application is also provided concerning quantification of occurrences of extreme ship motions generated by two computer codes of varying fidelity.

2603.10889 2026-03-12 stat.AP

Estimands and the Choice of Non-Inferiority Margin under ICH E9(R1)

Tobias Mütze, Helle Lynggaard, Sunita Rehal, Oliver N. Keene, Marian Mitroiu, David Wright

Abstract

Since the release of the ICH E9(R1) addendum on estimands, its application in non-inferiority trials has received far less attention than in superiority settings. A key conclusion from Lynggaard et al. was that the "choice of non-inferiority margin must reflect the chosen estimand." However, current regulatory guidance predates ICH E9(R1) and therefore does not reflect how the estimand influences the historical evidence and constancy assumption (assay sensitivity) used to derive the non-inferiority margin. This paper investigates the degree to which the non-inferiority margin depends on the estimand. Using simulated patient journeys in a weight-management setting, we illustrate how different intercurrent event strategies and variations in the intercurrent event frequency affect the estimand, and consequently the estimated treatment effect. These results emphasize that the historical treatment effect of the reference treatment versus placebo, and thus the margin $M_{1}$, is specific to an estimand and may differ even when trials formally target similar questions. We further illustrate the process of determining the non-inferiority margin using two examples in non-inferiority trials for a new theoretical weight management treatment. In the first example, we focus on the setting where the historical clinical trials use the estimand framework highlighting that even when they include the estimand framework, determining the non-inferiority margin can be challenging in case the historical trials target an estimand different from the one in the planned study. A second example highlights challenges when historical trials did not employ the estimand framework and the targeted estimand cannot be fully reconstructed.

2603.10869 2026-03-12 stat.ME

Risk time splitting for improved estimation of screening programs effect on later mortality

Harald Weedon-Fekjær, Elsebeth Lynge, Niels Keiding

Abstract

There is a great need for evaluating screening programs, but analysing data from population screening is often complicated by a delayed screening effect. In cancer screening, only new, not yet clinically diagnosed cases might benefit from screening through earlier treatment. Hence, mortality data following screening should be analysed based on refined mortality, separating cases based on diagnosis before and after first screening invitation. Historically, refined mortality has been implemented by selecting comparison groups from the available data to disentangle the causal effect. While this gives valid estimates, ignoring large parts of the available data has limited study precision. In BMJ 2014, Weedon-Fekjær et al. used a new estimation approach applying all the available Norwegian mammography screening data. The estimation uses historic pre-screening data on time from clinical diagnosis to death to estimate the proportion of post-screening mortality that is expected to be based on cases incident before first screening invitation, in the absence of a screening effect. Utilizing this expected proportion of post-screening incident cases, Poisson regression offsets are added to align the expected number of cases. The screening effect is then estimated adjusting for relevant covariables. While the method increases study precision, it has not been easily available and widely adopted. We here explain the method in detail, add maximum likelihood estimation, and lay the foundation for widespread use. Applying the method to Norwegian and Danish data, bootstrap confidence intervals are considerably narrower than intervals seen using other refined mortality methods, especially for the gradually introduced Norwegian program.

2603.10866 2026-03-12 stat.OT

Beyond Reproducible Research: Building a Formal Representation of a Data Analysis

Roger D. Peng

Abstract

Data analyses are often constructed in an imperative manner, where commands representing actions taken on the data are issued sequentially. The publication of these commands, along with the data, is essential to the reproducibility of the analysis by others. However, simply presenting the code and the results of running the code can hide important details about the data analyst's premises, expectations, and assumptions about the data. Understanding this analysis reasoning can be critical to evaluating the quality of an analysis and for suggesting possible improvements. We argue that a formal representation of a data analysis that externalizes its logical construction offers more useful information for statically illustrating an analyst's reasoning. Such a formal representation would allow for the evaluation of some aspects of a data analysis without the need for the data, the visualization of the logical connections leading to a conclusion, and the ability to assess the sensitivity of an analyst's assumptions to unexpected features in the data. In this paper we describe an implementation of this formal representation and how it might be applied to some common data analysis tasks.

2603.10830 2026-03-12 stat.ME stat.AP

Bayesian Design and Analysis of Precision Trials with Partial Borrowing

Shirin Golchi, Satoshi Morita

Abstract

With the advancement of precision medicine there is an increasing need for design and analysis methods in clinical trials with the objective of investigating effect heterogeneity and estimating subgroup effects. As this requires precise estimation of interaction effects, borrowing information from external data sources including retrospective studies and early phase clinical trials to enrich the trial in sparse subgroups is pertinent. Motivated by a trial in gastric cancer we consider a practical design and analysis framework for borrowing from external data sources that only partially inform the inference. As the analysis model we propose an individually weighted model where the external data are weighted based on their fit with the target population based on the distribution of a set of covariates. In a simulation study we assess the performance of the model under various scenarios and make comparisons to dynamic borrowing. In addition, we provide a Bayesian design framework where design priors are extracted from the external data to determine decision boundaries and sample sizes. The design procedure is demonstrated within the context of our motivating example.

2603.10772 2026-03-12 stat.ME

Multiple change-point detection on the circle via isolation using permutation testing

Sophia Loizidou, Andreas Anastasiou, Christophe Ley

Comments 22 pages, 7 figures

Abstract

In this paper we propose a new method for multiple change-point detection for piecewise-constant circular signals, a setting that, despite its importance in many scientific domains, remains comparatively under-explored. The proposed method, Permutation-based Circular Isolate-Detect, denoted PCID, uses an appropriately chosen contrast function and permutation testing to detect change-points in an offline manner, for the data sequence under consideration. Prior to detection, PCID isolates the change-points. The contrast function used is derived under the assumption of von Mises distribution for the noise, but we show that the method is robust and performs well for other distributions as well. Simulations are used to showcase the usability of the method in different signal and noise structures, including serially correlated noise. In order to exhibit the practical relevance of the method in real-world applications, PCID is applied to three real-world datasets, namely flare, acrophase and wave data.

2603.10731 2026-03-12 cs.LG stat.ML

Beyond Accuracy: Reliability and Uncertainty Estimation in Convolutional Neural Networks

Sanne Ruijs, Alina Kosiakova, Farrukh Javed

Comments 30 pages, 39 figures

Abstract

Deep neural networks (DNNs) have become integral to a wide range of scientific and practical applications due to their flexibility and strong predictive performance. Despite their accuracy, however, DNNs frequently exhibit poor calibration, often assigning overly confident probabilities to incorrect predictions. This limitation underscores the growing need for integrated mechanisms that provide reliable uncertainty estimation. In this article, we compare two prominent approaches for uncertainty quantification: a Bayesian approximation via Monte Carlo Dropout and the nonparametric Conformal Prediction framework. Both methods are assessed using two convolutional neural network architectures, H-CNN VGG16 and GoogLeNet, trained on the Fashion-MNIST dataset. The empirical results show that although H-CNN VGG16 attains higher predictive accuracy, it tends to exhibit pronounced overconfidence, whereas GoogLeNet yields better-calibrated uncertainty estimates. Conformal Prediction additionally demonstrates consistent validity by producing statistically guaranteed prediction sets, highlighting its practical value in high-stakes decision-making contexts. Overall, the findings emphasize the importance of evaluating model performance beyond accuracy alone and contribute to the development of more reliable and trustworthy deep learning systems.
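
For readers unfamiliar with how conformal prediction produces prediction sets, the following is a minimal split-conformal sketch in Python. It is not the paper's experimental setup: random softmax outputs stand in for CNN probabilities, the nonconformity score is assumed to be one minus the probability of the true class, and the helper names `conformal_threshold` and `prediction_sets` are hypothetical.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a calibration set, using 1 - p(true class) as the score."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        return np.inf                    # too few calibration points: include every class
    return np.sort(scores)[k - 1]

def prediction_sets(probs, q_hat):
    """Boolean mask of shape (n_samples, n_classes); True marks a class in the set."""
    return (1.0 - probs) <= q_hat

# Synthetic stand-in for a classifier's softmax outputs (assumption for illustration).
rng = np.random.default_rng(0)
n_cal, n_classes, alpha = 500, 10, 0.1
logits = rng.normal(size=(n_cal, n_classes))
cal_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
cal_labels = np.array([rng.choice(n_classes, p=p) for p in cal_probs])

q_hat = conformal_threshold(cal_probs, cal_labels, alpha)
cal_sets = prediction_sets(cal_probs, q_hat)
cal_coverage = cal_sets[np.arange(n_cal), cal_labels].mean()
print("threshold:", q_hat)
print("coverage on the calibration set:", cal_coverage)
```

The coverage guarantee of at least $1 - \alpha$ applies to new points that are exchangeable with the calibration data; the coverage printed here is computed on the calibration set itself and serves only as a sanity check.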

2603.10687 2026-03-12 stat.CO math.DG stat.AP

A Python implementation of some geometric tools on Kendall 3D shape space for practical applications

Jorge Valero, Vicent Gimeno i Garcia, M. Victoría Ibáñez, Pau Martinavarro, Amelia Simó

Abstract

This work addresses the challenge of analyzing geometric structures using Kendall's 3D Shape Space. While Riemannian geometry provides a robust framework for shape analysis (independent of scale, position, and orientation), the transition from theoretical manifolds to practical computational workflows remains difficult. Although Geomstats is currently the leading Python library for manifold-based statistics, it lacks specific utilities required for advanced 3D shape analysis. This article introduces tools designed to bridge this gap, translating complex mathematical abstractions into efficient, accessible software solutions for researchers.

2603.10686 2026-03-12 stat.ME

Investigations of Heterogeneity in Diagnostic Test Accuracy Meta-Analysis: A Methodological Review

Lukas Mischinger, Angela Ernst, Bernhard Haller, Alexey Formenko, Zekeriya Aktuerk, Alexander Hapfelmeier

Comments 39 pages, 5 tables, one figure

Abstract

Background: Subgroup analyses and meta-regression are commonly used to investigate heterogeneity in diagnostic test accuracy (DTA) meta-analyses (MA), but adherence to methodological guidance is unclear. This methodological review summarizes investigations of heterogeneity (IoH) in DTA-MAs, examining their frequency, characteristics, and alignment with recommendations. Methods: We included DTA-MAs published in 2024 reporting at least one pair of summary sensitivity and specificity. Non-DTA reviews, narrative syntheses, studies reporting only alternative measures, and overviews of systematic reviews were excluded. MEDLINE (via Ovid) was searched for English-language publications, with the final search in January 2025. Results: From 403 records, the most recent 100 DTA-MAs were included, each contributing one index test. IoH were reported in 61 analyses. The number of primary studies was positively associated with conducting an investigation (OR 1.66; p = 0.008). Subgroup analyses were used in 35/61 (57%), while 26/61 (43%) applied meta-regression alone or with subgroup analyses. Subgroup analyses examined fewer variables than meta-regression (p < 0.001). Among 44/61 (72%) analyses with sufficient detail to identify a statistical model, the bivariate model was used in 28/44 (64%), univariate random-effects models in 14/44 (32%), and the HSROC model in 5/44 (11%). Formal tests for subgroup differences were reported in 37/61 (61%). Protocols were available for 43/61 (70%) analyses, of which 19/43 (44%) fully prespecified IoH. Discussion: IoH were common and more likely when more primary studies were available, although individual subgroups were often supported by limited data. Reporting of statistical models and model choice was frequently unclear. Greater prespecification of IoH in protocols may reduce spurious findings and improve transparency in diagnostic research.

2603.10674 2026-03-12 stat.ME stat.AP

Conformal prediction for high-dimensional functional time series: Applications to subnational mortality

Han Lin Shang

Comments 23 pages, 5 figures, 2 tables

Abstract

In statistics, forecast uncertainty is often quantified using a specified statistical model, though such approaches may be vulnerable to model misspecification, selection bias, and limited finite-sample validity. While bootstrapping can potentially mitigate some of these concerns, it is often computationally demanding. Instead, we take a model-agnostic and distribution-free approach, namely conformal prediction, to construct prediction intervals in high-dimensional functional time series. Among a rich family of conformal prediction methods, we study split and sequential conformal prediction. In split conformal prediction, the data are divided into training, validation, and test sets, where the validation set is used to select optimal tuning parameters by calibrating empirical coverage probabilities to match nominal levels; after this, prediction intervals are constructed for the test set, and their accuracy is evaluated. In contrast, sequential conformal prediction removes the need for a validation set by updating predictive quantiles sequentially via an autoregressive process. Using subnational age-specific log-mortality data from Japan and Canada, we compare the finite-sample forecast performance of these two conformal methods using empirical coverage probability and the mean interval score.
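
As a rough illustration of the split conformal idea mentioned above, the sketch below computes a prediction-interval half-width from absolute residuals on a held-out calibration set. It is the textbook split conformal recipe for scalar forecasts, not the paper's procedure for functional time series; the function name `conformal_halfwidth`, the toy residuals, and the point forecast are assumptions made for illustration.

```python
import math


def conformal_halfwidth(calibration_residuals, alpha):
    """Half-width q such that [forecast - q, forecast + q] has marginal
    coverage of at least 1 - alpha when the data are exchangeable."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this level: the interval is unbounded.
        return math.inf
    return scores[k - 1]


# Toy calibration residuals (hypothetical values, not from the paper).
residuals = [-1.0, 2.0, -3.0, 4.0, 5.0, -6.0, 7.0, 8.0, -9.0]
half_width = conformal_halfwidth(residuals, alpha=0.1)
point_forecast = 10.0
interval = (point_forecast - half_width, point_forecast + half_width)
```

The coverage guarantee of this recipe relies on exchangeability, which time series generally violate; that is one reason sequential variants such as the one studied in the abstract exist.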

2603.10595 2026-03-12 math.ST stat.ME stat.TH

Strong Gaussian approximation for U-statistics in high dimensions and beyond

Weijia Li, Leheng Cai, Qirui Hu

Abstract

We establish a strong Gaussian approximation for high-dimensional non-degenerate U-statistics with diverging dimension. Under mild assumptions, we construct, on a sufficiently rich probability space, a Gaussian process that uniformly approximates the entire sequential U-statistic process. The approximation error is explicitly characterized and vanishes under polynomial growth of the dimension. The key technical contribution is a sharp martingale maximal inequality for completely degenerate U-statistics, combined with a high-dimensional strong approximation for independent sums. This coupling yields functional Gaussian limits without relying on $\mathcal{L}^\infty$-type bounds or bootstrap arguments. The theory is illustrated through three representative examples of U-statistics: the spatial Kendall's tau matrix, the multivariate Gini's mean difference, and the characteristic dispersion parameter. As applications, we derive Brownian bridge approximations for U-statistic-based change-point statistics and develop a self-normalized relevant testing procedure whose limiting distribution is fully pivotal. The framework naturally accommodates bounded kernels and therefore remains valid under heavy-tailed distributions. Overall, our results provide a unified probability-theoretic foundation for high-dimensional inference based on U-statistics.

2603.10511 2026-03-12 stat.ME

Post-Experiment Decisions: The Dual Adjustments for Rollout and Downstream Optimizations

Guoxing He, Dan Yang, Wei Zhang

Abstract

Firms increasingly use randomized experiments to decide whether to scale up an intervention and, if so, how to re-optimize related operational choices such as inventory, capacity, or pricing. In many settings, experiments are performed on small samples, so the estimated effect of the intervention is uncertain. A common practice is to plug a 'significant' estimate of the effect into both (i) the rollout rule and (ii) the downstream optimization. However, this can lead to avoidable losses because the costs of over- versus under-estimating the effect are often asymmetric. The technically ideal approach is to obtain a data-dependent decision rule that minimizes the Bayes risk, but this lacks transparency and requires more computation. We propose Predict-Adjust-Then-Rollout-Optimize (PATRO), a plug-in approach that keeps the standard estimate but applies a separate data-independent adjustment to each of the two types of decision. We show that the two adjustments can be substitutes or complements and provide an alternating-iteration method to compute the pair. Both in theory and numerically, PATRO performs close or equivalent to the Bayes-optimal benchmark, making it a simple, effective way to convert noisy experimental results into better rollout and operational decisions.

2603.10452 2026-03-12 stat.ML cs.LG

Brenier Isotonic Regression

Han Bao, Amirreza Eshraghi, Yutong Wang

Comments AISTATS2026

Abstract

Isotonic regression (IR) is a shape-constrained regression that keeps a univariate fitted curve non-decreasing, with numerous applications including single-index models and probability calibration. In multi-output regression, classical IR is no longer applicable because monotonicity does not readily extend. We consider a novel multi-output regression problem in which the regression function is cyclically monotone. Roughly speaking, a cyclically monotone function is the gradient of some convex potential. Although enforcing cyclic monotonicity appears challenging, we leverage the fact that Kantorovich's optimal transport (OT) always yields a cyclically monotone coupling as an optimal solution. This perspective naturally allows us to interpret the regression function and the convex potential as a link function in generalized linear models and Brenier's potential in OT, respectively, and hence we call this IR extension Brenier isotonic regression. We present experiments on probability calibration and generalized linear models. In particular, IR robustly outperforms many well-known baselines in probability calibration.
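
For readers unfamiliar with the classical univariate case that the abstract starts from, here is a minimal sketch of the pool-adjacent-violators algorithm, which computes the least-squares non-decreasing fit. It covers only classical IR with equal weights and says nothing about the Brenier (optimal transport) extension; the function name `isotonic_fit` is an assumption for illustration.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each pooled block back to one fitted value per input point.
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


fit = isotonic_fit([1.0, 3.0, 2.0, 4.0])  # the violating pair (3, 2) is pooled
```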

2603.10442 2026-03-12 cs.LG stat.ML

GGMPs: Generalized Gaussian Mixture Processes

Vardaan Tekriwal, Mark D. Risser, Hengrui Luo, Marcus M. Noack

Abstract

Conditional density estimation is complicated by multimodality, heteroscedasticity, and strong non-Gaussianity. Gaussian processes (GPs) provide a principled nonparametric framework with calibrated uncertainty, but standard GP regression is limited by its unimodal Gaussian predictive form. We introduce the Generalized Gaussian Mixture Process (GGMP), a GP-based method for multimodal conditional density estimation in settings where each input may be associated with a complex output distribution rather than a single scalar response. GGMP combines local Gaussian mixture fitting, cross-input component alignment and per-component heteroscedastic GP training to produce a closed-form Gaussian mixture predictive density. The method is tractable, compatible with standard GP solvers and scalable methods, and avoids the exponentially large latent-assignment structure of naive multimodal GP formulations. Empirically, GGMPs improve distributional approximation on synthetic and real-world datasets with pronounced non-Gaussianity and multimodality.

2603.10435 2026-03-12 stat.ML cs.LG

Adaptive Active Learning for Regression via Reinforcement Learning

Simon D. Nguyen, Troy Russo, Kentaro Hoffman, Tyler H. McCormick

Comments 33 pages, 103 figures. Main paper (8 pages, 4 figures) plus appendix with proofs and supplemental experimental results. Submitted to UAI2026. Codebase available at https://github.com/thatswhatsimonsaid/WeightedGreedySampling

Abstract

Active learning for regression reduces labeling costs by selecting the most informative samples. Improved Greedy Sampling is a prominent method that balances feature-space diversity and output-space uncertainty using a static, multiplicative rule. We propose Weighted improved Greedy Sampling (WiGS), which replaces this framework with a dynamic, additive criterion. We formulate weight selection as a reinforcement learning problem, enabling an agent to adapt the exploration-investigation balance throughout learning. Experiments on 18 benchmark datasets and a synthetic environment show WiGS outperforms iGS and other baseline methods in both accuracy and labeling efficiency, particularly in domains with irregular data density where the baseline's multiplicative rule ignores high-error samples in dense regions.

2603.10400 2026-03-12 cs.LG cs.AI math.OC stat.ML

Designing Service Systems from Textual Evidence

Ruicheng Ao, Hongyu Chen, Siyang Gao, Hanwei Li, David Simchi-Levi

Comments 67 pages

Abstract

Designing service systems requires selecting among alternative configurations -- choosing the best chatbot variant, the optimal routing policy, or the most effective quality control procedure. In many service systems, the primary evidence of performance quality is textual -- customer support transcripts, complaint narratives, compliance review reports -- rather than the scalar measurements assumed by classical optimization methods. Large language models (LLMs) can read such textual evidence and produce standardized quality scores, but these automated judges exhibit systematic biases that vary across alternatives and evaluation instances. Human expert review remains accurate but costly. We study how to identify the best service configuration with high confidence while minimizing expensive human audits, given that automated evaluation is cheap but biased. We formalize this as a sequential decision problem where a biased proxy score is observed for every evaluation, and a verified outcome can be acquired selectively at additional cost. We prove that LLM-only selection fails under arm-dependent bias, and that naive selective-audit estimators can be asymptotically biased. We develop an estimator combining proxy scores with inverse-propensity-weighted residuals and construct anytime-valid confidence sequences. Our algorithm, PP-LUCB, jointly decides which alternatives to evaluate and whether to request human audits, concentrating reviews where the LLM judge is least reliable. We prove correctness and establish instance-dependent cost bounds showing near-optimal efficiency. On a customer support ticket classification task, our algorithm correctly identifies the best model in 40/40 trials while achieving a 90% audit cost reduction.
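
One generic way to combine proxy scores with inverse-propensity-weighted residuals, as the abstract describes in words, is sketched below for a single alternative. This is an assumed textbook-style form (proxy mean plus a weighted audit correction), not the paper's estimator or the PP-LUCB algorithm; the function name `proxy_ipw_mean` and all inputs are hypothetical.

```python
def proxy_ipw_mean(proxy, audited, propensity, verified):
    """Mean proxy score plus inverse-propensity-weighted audit residuals.

    proxy[i]      : cheap automated score for evaluation i
    audited[i]    : True if a human audit was requested for evaluation i
    propensity[i] : probability with which that audit was requested (> 0)
    verified[i]   : human-verified outcome, read only where audited[i] is True
    """
    total = 0.0
    for f, a, p, y in zip(proxy, audited, propensity, verified):
        # The residual (y - f) is observed only for audited items; dividing
        # by the audit probability compensates for the unaudited ones.
        correction = (y - f) / p if a else 0.0
        total += f + correction
    return total / len(proxy)


# Toy data (hypothetical): two evaluations, one audited with probability 0.5.
estimate = proxy_ipw_mean(
    proxy=[0.8, 0.6],
    audited=[True, False],
    propensity=[0.5, 0.5],
    verified=[1.0, None],
)
```

When the audit probabilities are known and strictly positive, the correction term has mean equal to the proxy's bias, so the estimate is unbiased for the mean verified outcome whatever the bias of the proxy.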

2603.10389 2026-03-12 stat.ME

Robust Updating of a Risk Prediction Model by Integrating External Ranking Information

Nicholas C. Henderson

Abstract

Utilizing established risk factors and prognostic models can often improve the construction of a newer risk model that uses novel biomarkers in a smaller, internal study. However, directly borrowing information from an established prognostic model is often unsuitable due to differences in study populations, patient outcomes measured, and other specific features of the internal study design. To better enable the use of established prognostic information when constructing a novel risk model, we propose an estimation approach centered around the idea that the risk rankings rather than the risk scores from an established prognostic model are often more transportable to the internal study context. To leverage external ranking information, our approach introduces the ranking parameters associated with the regression coefficients of an internal risk model and estimates the internal risk model parameters by penalizing a ranking-based discrepancy measure between the ranking parameters and the rankings implied by the established prognostic model. Our method does not require the external prognostic model to have a specific form, but only requires one to compute risk score rankings from an external model. Simulation studies demonstrate that our method leads to competitive predictive performance and performs particularly well when the true internal and external prognostic models have high rank correlation but large discrepancies between their underlying risk scores. We demonstrate the use of our approach through the development of a prognostic model for advanced prostate cancer patients who were treated with an immune checkpoint inhibitor.

2603.10346 2026-03-12 stat.ML cs.LG

On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

Leo Maynard-Zhang, Zhihan Xiong, Kevin Jamieson, Maryam Fazel

Abstract

We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$ with high probability. In this setting, it is well-known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-\Theta\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.

2603.10329 2026-03-12 stat.ME math.PR

Optimized combination of independent or simultaneous e-values

Jiahao Ming, Yi Shen, Ruodu Wang

Abstract

We show that a class of optimized e-value combinations, arising from a standard construction of e-processes, remains valid even when the tuning parameter is optimized based on the data. This result holds for independent e-values, and, more generally, for a new class called simultaneous e-variables, whose dependence structure lies between independence and sequential validity. We further propose an improved combination test for such e-values based on elementary symmetric polynomials.
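
To make the phrase "elementary symmetric polynomials" concrete, the sketch below averages the products of all size-k subsets of a list of e-values. For independent e-values each such product has expectation at most 1 under the null, so the average is again an e-value; this is a standard merging construction and is not claimed to be the improved test proposed in the paper. The function name `symmetric_mean_evalue` and the toy inputs are assumptions.

```python
from math import comb


def symmetric_mean_evalue(e_values, k):
    """Average of the products over all size-k subsets of e_values.

    Computed as the k-th elementary symmetric polynomial divided by C(n, k).
    """
    n = len(e_values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(e_values)")
    # poly[j] holds the j-th elementary symmetric polynomial of the values
    # processed so far; iterating j downwards uses each value at most once.
    poly = [1.0] + [0.0] * k
    for e in e_values:
        for j in range(k, 0, -1):
            poly[j] += e * poly[j - 1]
    return poly[k] / comb(n, k)


combined = symmetric_mean_evalue([2.0, 3.0, 4.0], k=2)
# By Markov's inequality, rejecting when an e-value reaches 1 / alpha
# gives a level-alpha test; here alpha = 0.05.
reject = combined >= 1 / 0.05
```

Setting k = 1 recovers the arithmetic mean and k = n the full product, so k interpolates between the two classical ways of merging e-values.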

2603.10318 2026-03-12 math.PR cs.IT math.CO math.IT math.OC stat.CO

Optimising two-block averaging kernels to speed up Markov chains

Ryan J. Y. Lim, Michael C. H. Choi

Comments 45 pages, 5 figures

Abstract

We study the problem of selecting optimal two-block partitions to accelerate the mixing of finite Markov chains under group-averaging transformations. The main objectives considered are the Kullback-Leibler (KL) divergence and the Frobenius distance to stationarity. We establish explicit connections between these objectives and the induced projection chain. In the case of the KL divergence, this reduction yields explicit decay rates in terms of the log-Sobolev constant. For the Frobenius distance, we identify a Cheeger-type functional that characterises optimal cuts. This formulation recasts two-block selection as a structured combinatorial optimisation problem admitting difference-of-submodular decompositions. We further propose several algorithmic approximations, including majorisation-minimisation and coordinate descent schemes, as computationally feasible alternatives to exhaustive combinatorial search. Our numerical experiments reveal that optimal cuts under the two objectives can substantially reduce total variation distance to stationarity and demonstrate the practical effectiveness of the proposed approximation algorithms.

2603.10287 2026-03-12 stat.ML cs.LG

MultiwayPAM: Multiway Partitioning Around Medoids for LLM-as-a-Judge Score Analysis

Chihiro Watanabe, Jingyu Sun

Abstract

LLM-as-a-Judge is a flexible framework for text evaluation, which allows us to obtain scores for the quality of a given text from various perspectives by changing the prompt template. Two main challenges in using LLM-as-a-Judge are the computational cost of LLM inference, especially when evaluating a large number of texts, and the inherent bias of an LLM evaluator. To address these issues and reveal the structure of score bias caused by an LLM evaluator, we propose to apply a tensor clustering method to a given LLM-as-a-Judge score tensor, whose entries are the scores for different combinations of questions, answerers, and evaluators. Specifically, we develop a new tensor clustering method, MultiwayPAM, with which we can simultaneously estimate the cluster membership and the medoids for each mode of a given data tensor. By observing the medoids obtained by MultiwayPAM, we can gain knowledge about the membership of each question/answerer/evaluator cluster. We experimentally show the effectiveness of MultiwayPAM by applying it to the score tensors for two practical datasets.

2603.10230 2026-03-12 math.OC cs.LG cs.NA math.NA stat.ML

A Trust-Region Interior-Point Stochastic Sequential Quadratic Programming Method

Yuchen Fang, Jihun Kim, Sen Na, James Demmel, Javad Lavaei

Abstract

In this paper, we propose a trust-region interior-point stochastic sequential quadratic programming (TR-IP-SSQP) method for solving optimization problems with a stochastic objective and deterministic nonlinear equality and inequality constraints. In this setting, exact evaluations of the objective function and its gradient are unavailable, but their stochastic estimates can be constructed. In particular, at each iteration our method builds stochastic oracles, which estimate the objective value and gradient to satisfy proper adaptive accuracy conditions with a fixed probability. To handle inequality constraints, we adopt an interior-point method (IPM), in which the barrier parameter follows a prescribed decaying sequence. Under standard assumptions, we establish global almost-sure convergence of the proposed method to first-order stationary points. We implement the method on a subset of problems from the CUTEst test set, as well as on logistic regression problems, to demonstrate its practical performance.

2603.10219 2026-03-12 stat.ML cs.AI cs.LG math.ST stat.TH

A Diffusion Analysis of Policy Gradient for Stochastic Bandits

Tor Lattimore

Comments 17 pages

Abstract

We study a continuous-time diffusion approximation of policy gradient for $k$-armed stochastic bandits. We prove that with a learning rate $\eta = O(\Delta^2/\log(n))$ the regret is $O(k \log(k) \log(n) / \eta)$ where $n$ is the horizon and $\Delta$ the minimum gap. Moreover, we construct an instance with only logarithmically many arms for which the regret is linear unless $\eta = O(\Delta^2)$.

2603.10218 2026-03-12 stat.AP

Bayesian Synchronization of Proxy Paleorecords with Reference Chronologies

Marco A. Aquino-López, Francesco Muschitiello, Matt Osman

Comments 24 pages, 10 figures

Abstract

Many scientific fields compare two or more noisy time series that integrate the same underlying process but are recorded on different time scales. In paleoclimate studies, for example, proxy measurements are collected versus stratigraphic depth in a climate archive and then converted to calendar time. Synchronizing two proxy records often requires estimating an alignment that maps the depth (or preliminary age) of an input record onto the calendar-time scale of an absolutely-dated target record so that corresponding proxy signals line up. Existing alignment approaches are generally optimization-based and return a single transformation, providing limited formal uncertainty quantification. Here, we introduce BSync, a Bayesian synchronization framework that treats alignments as inference over a monotone time-mapping function to match an input to a target record. The alignment is expressed as a transformation of the input depth (or age) scale to match the target record, achieved through a link function that locally expands and compresses the input scale. The model is parameterized through interpretable local rate parameters, enabling the specification of priors on deposition times to regularize the alignment toward physically plausible deformations. BSync jointly infers the aligned chronology and provides posterior uncertainty for the time-warping function and the resulting age scale. In synthetic data experiments and a real-data case study, BSync yields well-calibrated credible intervals for the aligned time scale and achieves more accurate alignments than a state-of-the-art automated method, particularly when independent age constraints are sparse.

2603.10215 2026-03-12 q-bio.PE cs.LG stat.ML

SDSR: A Spectral Divide-and-Conquer Approach for Species Tree Reconstruction

Ortal Reshef, Ofer Glassman, Or Zuk, Yariv Aizenbud, Boaz Nadler, Ariel Jaffe

Comments 35 pages, 13 figures. Code available at https://github.com/reshefo/sdsr

Abstract

Recovering a tree that represents the evolutionary history of a group of species is a key task in phylogenetics. Performing this task using sequence data from multiple genetic markers poses two key challenges. The first is the discordance between the evolutionary history of individual genes and that of the species. The second challenge is computational, as contemporary studies involve thousands of species. Here we present SDSR, a scalable divide-and-conquer approach for species tree reconstruction based on spectral graph theory. The algorithm recursively partitions the species into subsets until their sizes are below a given threshold. The trees of these subsets are reconstructed by a user-chosen species tree algorithm. Finally, these subtrees are merged to form the full tree. On the theoretical front, we derive recovery guarantees for SDSR, under the multispecies coalescent (MSC) model. We also perform a runtime complexity analysis. We show that SDSR, when combined with a species tree reconstruction algorithm as a subroutine, yields substantial runtime savings as compared to applying the same algorithm on the full data. Empirically, we evaluate SDSR on synthetic benchmark datasets with incomplete lineage sorting and horizontal gene transfer. In accordance with our theoretical analysis, the simulations show that combining SDSR with common species tree methods, such as CA-ML or ASTRAL, yields up to 10-fold faster runtimes. In addition, SDSR achieves a comparable tree reconstruction accuracy to that obtained by applying these methods on the full data.

2603.10204 2026-03-12 stat.ME

A General Theory of Outcome Weighted Learning for Individualized Treatment Rules

Zhu Wang

Abstract

Personalized medicine aims to tailor treatments to individual patients, especially when people respond heterogeneously to therapies. A key objective is to learn individualized treatment rules that recommend optimal treatments from patient characteristics. Outcome weighted learning (OWL) is an important framework because it reformulates the task as a weighted classification problem targeting clinical benefit and using modern machine learning tools. Existing OWL theory has focused on specific surrogate losses and Gaussian kernels. Matérn kernels, which allow adjustable smoothness and better match many real-world data structures, are often more suitable and include the Gaussian kernel as a special case. This work develops a general relationship between population 0-1 risk and risks from a broad class of nonnegative surrogate losses using a constrained variational transformation. The transform simplifies for convex losses and provides simple expressions for certain nonconvex losses. A condition is established that ensures a nontrivial upper bound on the excess 0-1 risk. The paper establishes convergence rates for kernel-based OWL under smoothness conditions with Matérn kernels or geometric noise conditions with Gaussian kernels for both convex and nonconvex losses. It also proposes two iteratively reweighted convex optimization algorithms. Simulations and an application to ACTG 175 show strong performance.

2603.10184 2026-03-12 stat.ML cs.LG

Stability and Robustness via Regularization: Bandit Inference via Regularized Stochastic Mirror Descent

Budhaditya Halder, Ishan Sengupta, Koustav Chowdhury, Koulik Khamaru

Abstract

Statistical inference with bandit data presents fundamental challenges due to adaptive sampling, which violates the independence assumptions underlying classical asymptotic theory. Recent work has identified stability as a sufficient condition for valid inference under adaptivity. This paper develops a systematic theory of stability for bandit algorithms based on stochastic mirror descent, a broad algorithmic framework that includes the widely used EXP3 algorithm as a special case. Our contributions are threefold. First, we establish a general stability criterion: if the average iterates of a stochastic mirror descent algorithm converge in ratio to a non-random probability vector, then the induced bandit algorithm is stable. This result provides a unified lens for analyzing stability across diverse algorithmic instantiations. Second, we introduce a family of regularized-EXP3 algorithms employing a log-barrier regularizer with appropriately tuned parameters. We prove that these algorithms satisfy our stability criterion and, as an immediate corollary, that Wald-type confidence intervals for linear functionals of the mean parameter achieve nominal coverage. Notably, we show that the same algorithms attain minimax-optimal regret guarantees up to logarithmic factors, demonstrating that inference-enabling stability and learning efficiency are compatible objectives within the mirror descent framework. Third, we establish robustness to corruption: a modified variant of regularized-EXP3 maintains asymptotic normality of empirical arm means even in the presence of $o(T^{1/2})$ adversarial corruptions. This stands in sharp contrast to other stable algorithms such as UCB, which suffer linear regret even under logarithmic levels of corruption.

2603.10169 2026-03-12 stat.ME stat.AP

Novel g-computation algorithms for time-varying actions with recurrent and semi-competing events

Alena Sorensen D'Alessio, Lucas M. Neuroth, Jessie K Edwards, Chantel L. Martin, Paul N Zivich

Comments 22 pages, 2 figures, 5 tables

Abstract

Background: A core aspect of epidemiology is determining the impacts of potential public health interventions over time. With long follow-up periods, epidemiologists may need to consider semi-competing events, in which a terminal event, like death, precludes a non-terminal event, like hypertension. Time-varying confounding poses an additional challenge when studying time-varying interventions or actions. Existing methods do not simultaneously address semi-competing events and time-varying confounding. Methods: We propose two novel g-computation algorithms for causal effects with semi-competing events and time-varying actions. To explore performance of our novel g-computation estimators, we conducted a Monte Carlo simulation study. We then applied our estimator to investigate how cigarette smoking prevention throughout young and middle adulthood might impact prevalent hypertension using data from Waves III (aged 18-26 years) - VI (aged 39-51 years) of the National Longitudinal Study of Adolescent to Adult Health. Results: Our simulations show that the novel g-computation estimators had little bias and appropriate confidence interval coverage. They outperformed existing alternative estimators across sample sizes. In the illustrative application, the novel estimator identified a small reduction in prevalence of hypertension and risk of death in midlife had all cigarette smoking been prevented across follow-up compared to the observed smoking patterns. Conclusion: As long-running cohorts progress in age, death within the study sample will become an increasing concern for studies of aging-related outcomes, life course analyses, and investigations into chronic disease development. Our novel g-computation estimators provide a simultaneous solution.

2603.10136 2026-03-12 stat.ME

Pseudo Empirical Best Prediction of Multiple Characteristics in Small Areas

William Acero, Domingo Morales, Isabel Molina

Abstract

Small area estimators that ignore the sampling design lack design consistency when the sampling mechanism is complex and may be severely biased under informative designs. Existing procedures that account for the survey weights under unit-level models typically focus on a single response variable. This paper addresses the estimation of area means for several dependent target variables under a multivariate nested error regression (MNER) model. We propose a multivariate pseudo-empirical best linear unbiased predictor that accounts for the sampling mechanism. Moreover, by aggregating the MNER model, we derive a unified predictor that can be obtained from either unit-level or area-level data. Bootstrap procedures are proposed to estimate the mean squared errors (MSEs) of the proposed predictors. Simulation experiments are conducted to examine the properties of the proposed small area estimators and the MSE estimators. Finally, an application with housing data illustrates the proposed methods.

2603.10132 2026-03-12 cs.CV cs.LG math.ST stat.TH

Unbalanced Optimal Transport Dictionary Learning for Unsupervised Hyperspectral Image Clustering

Joshua Lentz, Nicholas Karris, Alex Cloninger, James M. Murphy

Comments IEEE WHISPERS 2025

Abstract

Hyperspectral images capture vast amounts of high-dimensional spectral information about a scene, making labeling an intensive task that is resistant to out-of-the-box statistical methods. Unsupervised learning of clusters allows for automated segmentation of the scene, enabling a more rapid understanding of the image. Partitioning the spectral information contained within the data via dictionary learning in Wasserstein space has proven an effective method for unsupervised clustering. However, this approach requires balancing the spectral profiles of the data, blurring the classes, and sacrificing robustness to outliers and noise. In this paper, we suggest improving this approach by utilizing unbalanced Wasserstein barycenters to learn a lower-dimensional representation of the underlying data. The deployment of spectral clustering on the learned representation results in an effective approach for the unsupervised learning of labels.

2603.10095 2026-03-12 cs.LG stat.ML

Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts

Yuze Dong, Jinsong Wu

Abstract

Time-series forecasting often faces challenges from non-stationarity, particularly distributional drift, where the data distribution evolves over time. This dynamic behavior can undermine the effectiveness of adaptive optimizers, such as Adam, which are typically designed for stationary objectives. In this paper, we revisit Adam in the context of non-stationary forecasting and identify that its second-order bias correction limits responsiveness to shifting loss landscapes. To address this, we propose TS_Adam, a lightweight variant that removes the second-order correction from the learning rate computation. This simple modification improves adaptability to distributional drift while preserving the optimizer's core structure and requiring no additional hyperparameters. TS_Adam integrates easily into existing models and consistently improves performance across long- and short-term forecasting tasks. On the ETT datasets with the MICN model, it achieves an average reduction of 12.8% in MSE and 5.7% in MAE compared to Adam. These results underscore the practicality and versatility of TS_Adam as an effective optimization strategy for real-world forecasting scenarios involving non-stationary data. Code is available at: https://github.com/DD-459-1/TS_Adam.
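
As a rough illustration of the modification described in this abstract, the sketch below implements one scalar Adam update with a switch that skips the bias correction of the second-moment estimate. It is one possible reading of the abstract, not the authors' implementation (which is at the URL above); the function name `adam_step` and the flag `correct_second_moment` are hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, correct_second_moment=True):
    """One scalar Adam update; t is the 1-based step count.

    With correct_second_moment=False the raw second-moment estimate v is
    used in the denominator (an assumed reading of the TS_Adam idea).
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # first-moment correction is kept
    v_used = v / (1.0 - beta2 ** t) if correct_second_moment else v
    theta = theta - lr * m_hat / (math.sqrt(v_used) + eps)
    return theta, m, v


# First step from zero state with a unit gradient, with and without correction.
theta_std, _, _ = adam_step(0.0, 1.0, 0.0, 0.0, t=1)
theta_var, _, _ = adam_step(0.0, 1.0, 0.0, 0.0, t=1, correct_second_moment=False)
```

Under this reading and the default `beta2`, the uncorrected first step is larger by a factor of about 1/sqrt(1 - beta2), roughly 31.6, and the two updates converge as `beta2 ** t` decays.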

2603.10089 2026-03-12 stat.ME stat.AP

Trajectory-informed graph-based clustering for longitudinal cancer subtyping

Lara Cavinato, Marco Rocchi, Luca Viganò, Francesca Ieva

Abstract

Cancer subtyping plays a crucial role in informing prognosis and guiding personalized treatment strategies. However, conventional subtyping approaches often rely on static, biopsy-derived scores that hardly capture the biological heterogeneity and temporal evolution of the disease. In this study, we propose a novel trajectory-informed clustering method for cancer subtyping that integrates multi-modal clinical data and longitudinal patient trajectories. Our method constructs a patient similarity graph using time-varying imaging-derived features, clinical covariates, and transitions among key clinical states such as therapy, surveillance, relapse, and death. This graph structure enables the identification of patient subgroups that are not only phenotypically and genotypically distinct but also aligned with patterns of disease progression. We position our approach within the landscape of existing subtyping methods and highlight its advantages in terms of temporal modeling and graph-based interpretability. Through simulation studies and application to a real-world dataset of liver metastases, we demonstrate the ability of our framework to uncover clinically relevant subtypes with distinct prognostic trajectories. Our results underscore the potential of trajectory-informed clustering to enhance personalized oncology by bridging cross-sectional biomarkers with dynamic disease evolution.

2603.10073 2026-03-12 math.ST cs.IT math.IT math.PR stat.TH

Universal Shuffle Asymptotics, Part II: Non-Gaussian Limits for Shuffle Privacy -- Poisson, Skellam, and Compound-Poisson Regimes

Alex Shvets

Comments 35 pages. Part II of a series; Part I is arXiv:2602.09029

Abstract

Part I of this series (arXiv:2602.09029) develops a sharp Gaussian (LAN/GDP) limit theory for neighboring shuffle experiments when the local randomizer is fixed and has full support bounded away from zero. The present paper characterizes the first universality-breaking frontier: critical sequences of increasingly concentrated local randomizers for which classical Lindeberg conditions fail and the shuffle score exhibits rare macroscopic jumps. For shuffled binary randomized response with local privacy $\varepsilon_0 = \varepsilon_0(n)$, we prove experiment-level convergence (in Le Cam distance) to explicit shift limit experiments: a Poisson-shift limit for the canonical neighboring pair when $\exp(\varepsilon_0(n))/n \to c^2$, and a Skellam-shift limit for proportional compositions $k/n \to \pi \in (0,1)$ in the same scaling, including an explicit disappearance of the two-sided $\delta$-floor away from boundary compositions. For general finite alphabets, we introduce a sparse-error critical regime and prove a multivariate compound-Poisson / independent Poisson vector limit for the centered released histogram, yielding a multivariate Poisson-shift experiment and an explicit limiting $(\varepsilon, \delta)$ curve as a multivariate Poisson series. Together with Part I, these results yield a three-regime picture (Gaussian/GDP, critical Poisson/Skellam/compound-Poisson, and super-critical no privacy) under convergent macroscopic scalings.
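
A short back-of-the-envelope calculation suggests why the scaling $\exp(\varepsilon_0(n))/n \to c^2$ is critical. It assumes the standard parameterization of binary randomized response, in which each user reports the true bit with probability $e^{\varepsilon_0}/(1+e^{\varepsilon_0})$ and flips it otherwise. That parameterization is an assumption here, not taken from the abstract.

```latex
\Pr[\text{flip}] = \frac{1}{1+e^{\varepsilon_0(n)}},
\qquad
\mathbb{E}\bigl[\#\text{flips among } n \text{ users}\bigr]
  = \frac{n}{1+e^{\varepsilon_0(n)}}
  = \frac{1}{\tfrac{1}{n} + \tfrac{e^{\varepsilon_0(n)}}{n}}
  \;\longrightarrow\; \frac{1}{c^{2}} .
```

Because users flip independently, the number of flips is Binomial with a mean converging to $1/c^2$, so by the classical law of rare events it converges to a Poisson$(1/c^2)$ variable. This is consistent with the Poisson-type limits stated in the abstract, but it is only a heuristic, not the paper's experiment-level argument.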

2603.07371 2026-03-12 cs.LG cs.AI stat.AP stat.ME stat.ML

ConfHit: Conformal Generative Design with Oracle Free Guarantees

Siddhartha Laghuvarapu, Ying Jin, Jimeng Sun

Comments Accepted at ICLR 2026

Abstract

The success of deep generative models in scientific discovery requires not only the ability to generate novel candidates but also reliable guarantees that these candidates indeed satisfy desired properties. Recent conformal-prediction methods offer a path to such guarantees, but their application to generative modeling in drug discovery is limited by budget constraints, lack of oracle access, and distribution shift. To this end, we introduce ConfHit, a distribution-free framework that provides validity guarantees under these conditions. ConfHit formalizes two central questions: (i) Certification: whether a generated batch can be guaranteed to contain at least one hit with a user-specified confidence level, and (ii) Design: whether the generation can be refined to a compact set without weakening this guarantee. ConfHit leverages weighted exchangeability between historical and generated samples to eliminate the need for an experimental oracle, constructs a multiple-sample density-ratio-weighted conformal p-value to quantify statistical confidence in hits, and proposes a nested testing procedure to certify and refine candidate sets of multiple generated samples while maintaining statistical guarantees. Across representative generative molecule design tasks and a broad range of methods, ConfHit consistently delivers valid coverage guarantees at multiple confidence levels while maintaining compact certified sets, establishing a principled and reliable framework for generative modeling.

2603.07331 2026-03-12 physics.ao-ph physics.soc-ph stat.AP

Causal Attribution of Coastal Water Clarity Degradation to Nickel Processing Expansion at the Indonesia Morowali Industrial Park, Sulawesi

Sandy Hardian Susanto Herho, Alfita Puspa Handayani, Iwan Pramesti Anwar, Faruq Khadami, Karina Aprilia Sujatmiko, Doandy Yonathan Wibisono, Rusmawan Suwarman, Dasapta Erwin Irawan

Comments 19 pages, 8 figures

Abstract

Indonesia's nickel ore export ban has driven rapid expansion of smelting and hydrometallurgical processing capacity at the Indonesia Morowali Industrial Park (IMIP), now the world's largest integrated nickel processing complex, on the coast of Central Sulawesi. Whether this industrialization has degraded the adjacent marine environment remains unquantified. We apply Bayesian structural time-series (BSTS) causal inference to a multi-decadal, multi-sensor satellite ocean color record of the diffuse attenuation coefficient at 490 nm, $K_d(490)$, to test for a causal link between IMIP expansion and nearshore turbidity change. A consensus structural breakpoint, a significant posterior causal effect estimated against a Banda Sea counterfactual, and a distribution-free placebo rank test collectively establish that coastal water clarity deteriorated after the transition from initial nickel pig iron production to hyper-expansion of high-pressure acid leaching facilities for battery-grade nickel. Satellite-derived land cover analysis independently corroborates this timing, showing substantial built-area growth and concurrent tree cover loss within the IMIP footprint. The resulting euphotic zone shoaling occurs in oligotrophic waters supporting high marine biodiversity, where even moderate optical degradation may impair coral photosynthesis and compress depth-dependent reef habitat. These findings quantify a marine environmental cost absent from Indonesia's mineral downstreaming policy discourse and demonstrate a transferable, satellite-based quasi-experimental framework for causal impact assessment at coastal industrial sites in data-limited tropical settings.

2603.07252 2026-03-12 stat.ME

Insights into the Relationship Between D- and A-optimal Designs

Andrew T. Karl, Bradley Jones

Abstract

For a fixed linear-model basis, we show that the $A$ criterion factors into an inverse-$D$ scale term and a dimensionless sphericity factor that depends only on eigenvalue dispersion. This factor isolates exactly the part of $A$ not controlled by the determinant, explaining why designs that are exact or near ties in $D$ can differ materially in coefficient-variance, aliasing, and prediction-variance behavior. We illustrate the factorization on a published $D$ tie and on screening settings with infinitely many $D$-optimal solutions, then use the same scale/shape viewpoint as a lightweight post-screen within a space-filling candidate pool. A final section connects the same idea to Kiefer's $\Phi$-class and introduces sphericity profiles.
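
One natural way to write a factorization of this kind is sketched below. For an information matrix $M$ with positive eigenvalues $\lambda_1, \dots, \lambda_p$, the trace of $M^{-1}$ equals $p \det(M)^{-1/p}$ times the ratio of the arithmetic to the geometric mean of the $1/\lambda_i$. The paper's exact normalization may differ, and the function name `a_d_factorization` is hypothetical.

```python
import math


def a_d_factorization(eigenvalues):
    """Split A = tr(M^{-1}) into a determinant-based scale and a shape factor.

    eigenvalues: positive eigenvalues of the information matrix M.
    Returns (A, scale, sphericity) with A == scale * sphericity.
    """
    p = len(eigenvalues)
    inverse = [1.0 / lam for lam in eigenvalues]
    a_value = sum(inverse)                     # A criterion: trace of M^{-1}
    det = math.prod(eigenvalues)               # D criterion: det(M)
    scale = p * det ** (-1.0 / p)              # depends on M only through det(M)
    arithmetic_mean = a_value / p
    geometric_mean = math.prod(inverse) ** (1.0 / p)
    sphericity = arithmetic_mean / geometric_mean   # >= 1, with 1 iff all equal
    return a_value, scale, sphericity


# Two hypothetical designs tied in D (det = 16) but different in A.
balanced = a_d_factorization([4.0, 4.0])
spread = a_d_factorization([2.0, 8.0])
```

In this toy example both designs share the same scale term (0.5), while the shape factor is 1 for the balanced eigenvalues and 1.25 for the spread ones, which is exactly the gap in their $A$ values (0.5 versus 0.625).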

2603.03507 2026-03-12 cs.LG cond-mat.dis-nn q-bio.NC stat.ML

Solving adversarial examples requires solving exponential misalignment

Alessandro Salvatore, Stanislav Fort, Surya Ganguli

Abstract

Adversarial attacks (input perturbations imperceptible to humans that fool neural networks) remain both a persistent failure mode in machine learning and a phenomenon with mysterious origins. To shed light, we define and analyze a network's perceptual manifold (PM) for a class concept as the space of all inputs confidently assigned to that class by the network. We find, strikingly, that the dimensionalities of neural network PMs are orders of magnitude higher than those of natural human concepts. Since volume typically grows exponentially with dimension, this suggests exponential misalignment between machines and humans, with exponentially many inputs confidently assigned to concepts by machines but not humans. Furthermore, this provides a natural geometric hypothesis for the origin of adversarial examples: because a network's PM fills such a large region of input space, any input will be very close to any class concept's PM. Our hypothesis thus suggests that adversarial robustness cannot be attained without dimensional alignment of machine and human PMs, and therefore makes strong predictions: both robust accuracy and distance to any PM should be negatively correlated with the PM dimension. We confirmed these predictions across 18 different networks of varying robust accuracy. Crucially, we find even the most robust networks are still exponentially misaligned, and only the few PMs whose dimensionality approaches that of human concepts exhibit alignment to human perception. Our results connect the fields of alignment and adversarial examples, and suggest the curse of high dimensionality of machine PMs is a major impediment to adversarial robustness.

2603.01630 2026-03-12 cs.AI stat.AP

SEED-SET: Scalable Evolving Experimental Design for System-level Ethical Testing

Anjali Parashar, Yingke Li, Eric Yang Yu, Fei Chen, James Neidhoefer, Devesh Upadhyay, Chuchu Fan

Comments 10 main pages plus an appendix with additional results; accepted at ICLR 2026

Abstract

As autonomous systems, such as drones, become increasingly deployed in high-stakes, human-centric domains, it is critical to evaluate their ethical alignment, since failure to do so poses imminent danger to human lives and risks long-term bias in decision-making. Automated ethical benchmarking of these systems is understudied due to the lack of ubiquitous, well-defined evaluation metrics and to stakeholder-specific subjectivity, which cannot be modeled analytically. To address these challenges, we propose SEED-SET, a Bayesian experimental design framework that incorporates domain-specific objective evaluations and subjective value judgments from stakeholders. SEED-SET models both evaluation types separately with hierarchical Gaussian Processes and uses a novel acquisition strategy to propose interesting test candidates based on learnt qualitative preferences and objectives that align with the stakeholder preferences. We validate our approach for ethical benchmarking of autonomous agents on two applications and find that our method performs best. Our method provides an interpretable and efficient trade-off between exploration and exploitation, generating up to $2\times$ optimal test candidates compared to baselines, with a $1.25\times$ improvement in coverage of high-dimensional search spaces.

2602.18045 2026-03-12 stat.ME cs.AI cs.LG

Conformal Tradeoffs: Operational Profiles Beyond Coverage

Petrus H. Zwart

Abstract

Conformal prediction gives exact finite-sample coverage guarantees under exchangeability, but deployed systems are judged by more than coverage alone. For a fixed calibrated rule reused over a finite operational window, stakeholders also care about deployment-facing quantities such as commitment frequency, deferral, and decisive error exposure. These are not determined by coverage: calibration choices with similar coverage can still induce materially different operational profiles. We study this characterization gap in a scoped setting: binary split conformal prediction under exchangeability with a fixed deployed rule. We introduce the Small-Sample Beta Correction (SSBC) which gives finite-sample coverage semantics for the deployed rule: it inverts the Beta/Beta--Binomial law governing calibration-conditional coverage to map a user request $(\alpha^\star, \delta)$ to the least conservative calibration grid point with calibration-conditional PAC semantics for the realized deployed rule. Calibrate-and-Audit then fixes the rule by calibration and uses an independent audit split to estimate the induced region--class label table, a reusable summary from which deployment-facing Key Performance Indicators (KPIs) follow by projection. Under this design, fixed operational rates admit exact finite-sample Binomial inference, while Beta--Binomial envelopes serve as practical predictive summaries for future windows. The induced partition also exposes regime boundaries, Pareto-relevant tradeoffs, and inverse-pricing questions for fixed downstream conventions. Simulations validate the SSBC semantics and compare audit-based summaries with leave-one-out planning proxies; molecular toxicity data provide an audit-based empirical example, and a solubility case study illustrates scenario planning once coverage semantics are fixed.
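
The idea of inverting a Beta law for calibration-conditional coverage can be illustrated generically. For split conformal prediction with $n$ exchangeable, continuous calibration scores, thresholding at the $k$-th smallest score gives coverage distributed as Beta$(k, n+1-k)$ conditional on the calibration set. The sketch below searches for the smallest such rank meeting a request $(\alpha, \delta)$. It is a textbook-style construction under those assumptions, not the authors' SSBC procedure, and it ignores the binary class-conditional structure studied in the paper. The names `binom_cdf` and `pac_rank` are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k), summed directly (adequate for moderate n)."""
    return sum(math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
               for j in range(k + 1))


def pac_rank(n, alpha, delta):
    """Smallest rank k whose coverage is >= 1 - alpha with probability
    >= 1 - delta over the calibration draw; None if no rank qualifies."""
    for k in range(1, n + 1):
        # coverage ~ Beta(k, n + 1 - k), hence
        # P(coverage >= 1 - alpha) = P(Bin(n, 1 - alpha) <= k - 1)
        if binom_cdf(k - 1, n, 1.0 - alpha) >= 1.0 - delta:
            return k
    return None


rank_small = pac_rank(10, 0.1, 0.1)    # calibration set too small for this request
rank_large = pac_rank(100, 0.1, 0.1)   # a rank above the marginal choice of 91
```

Choosing the smallest qualifying rank gives the least conservative threshold that still meets the request, and a `None` result signals that the calibration set is too small for the requested pair.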

2602.04472 2026-03-12 math.ST cs.LG math.PR stat.ML stat.TH

Universality of General Spiked Tensor Models

Yanjin Xiang, Zhihua Zhang

Comments 115 pages

Abstract

We study asymmetric rank-one spiked tensor models in the high-dimensional regime, where the noise entries are independent and identically distributed with zero mean, unit variance, and finite fourth moment. This extends the classical Gaussian framework to a substantially broader class of noise distributions. We analyze the maximum-likelihood estimator associated with the best rank-one approximation of an order-$d$ tensor, for $d\ge 3$. Our approach is formulated along an informative, spectrally separated branch of stationary points of the non-convex maximum-likelihood landscape. In the core order-three asymmetric model, we verify locally in the high-signal regime that such an informative branch exists and remains separated from the bulk. Under this branch-selection framework, we show that the empirical spectral distribution of a suitable block-wise tensor contraction converges almost surely to the same deterministic limit as in the Gaussian case. As a consequence, the asymptotic singular value and the mode-wise alignments between the estimated and planted spike directions admit the same explicit characterizations as under Gaussian noise. These results establish a universality principle for asymmetric spiked tensor models: the high-dimensional spectral behavior and statistical limits of the selected maximum-likelihood stationary point are robust beyond the Gaussian setting. Our proof combines resolvent methods from random matrix theory, cumulant expansions under finite fourth-moment assumptions, and Efron--Stein-type variance bounds. A main technical difficulty is to control the statistical dependence between the estimator and the noise, including the associated cross terms in the non-Gaussian setting.

2602.04347 2026-03-12 stat.ML cs.LG

A Bandit-Based Approach to Educational Recommender Systems: Contextual Thompson Sampling for Learner Skill Gain Optimization

Lukas De Kerpel, Arthur Thuy, Dries F. Benoit

Comments Accepted for publication in INFORMS Transactions on Education

Abstract

In recent years, instructional practices in Operations Research (OR), Management Science (MS), and Analytics have increasingly shifted toward digital environments, where large and diverse groups of learners make it difficult to provide practice that adapts to individual needs. This paper introduces a method that generates personalized sequences of exercises by selecting, at each step, the exercise most likely to advance a learner's understanding of a targeted skill. The method uses information about the learner and their past performance to guide these choices, and learning progress is measured as the change in estimated skill level before and after each exercise. Using data from an online mathematics tutoring platform, we find that the approach recommends exercises associated with greater skill improvement and adapts effectively to differences across learners. From an instructional perspective, the framework enables personalized practice at scale, highlights exercises with consistently strong learning value, and helps instructors identify learners who may benefit from additional support.

2602.00387 2026-03-12 stat.ML cs.LG stat.AP

Singular Bayesian Neural Networks

Mame Diarra Toure, David A. Stephens

Comments 8 pages Main text, 53 pages Appendix, 20 figures

Abstract

Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in \mathbb{R}^{n \times r}$, we induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. This singularity captures structured weight correlations through shared latent factors, geometrically distinct from mean-field's independence assumption. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{m n}$, and prove loss bounds that decompose the error into optimization and rank-induced bias using the Eckart-Young-Mirsky theorem. We further adapt recent Gaussian complexity bounds for low-rank deterministic networks to Bayesian predictive means. Empirically, across MLPs, LSTMs, and Transformers on standard benchmarks, our method achieves predictive performance competitive with 5-member Deep Ensembles while using up to $15\times$ fewer parameters. Furthermore, it substantially improves OOD detection and often improves calibration relative to mean-field and perturbation baselines.
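
To make the parameter-count comparison concrete, the sketch below counts variational parameters for a mean-field Gaussian over a full $m \times n$ weight matrix versus one over the factors $A$ and $B$, and draws a sample $W = AB^{\top}$. It assumes one mean and one standard deviation per entry. The function names are hypothetical and this is not the authors' code.

```python
import random


def mean_field_params(m, n):
    """Mean and standard deviation for each of the m*n weights."""
    return 2 * m * n


def low_rank_params(m, n, r):
    """Mean and standard deviation for each entry of A (m x r) and B (n x r)."""
    return 2 * r * (m + n)


def sample_low_rank_weight(mean_a, std_a, mean_b, std_b, rng):
    """Draw A and B entrywise from independent Gaussians; return W = A B^T."""
    a = [[rng.gauss(mu, sd) for mu, sd in zip(mrow, srow)]
         for mrow, srow in zip(mean_a, std_a)]
    b = [[rng.gauss(mu, sd) for mu, sd in zip(mrow, srow)]
         for mrow, srow in zip(mean_b, std_b)]
    r = len(a[0])
    return [[sum(a[i][k] * b[j][k] for k in range(r)) for j in range(len(b))]
            for i in range(len(a))]


# A 512 x 512 layer at rank 16 needs 16x fewer variational parameters.
ratio = mean_field_params(512, 512) / low_rank_params(512, 512, 16)

# Every sampled W has rank at most r; for r = 1 a 2 x 2 sample is singular.
w = sample_low_rank_weight([[1.0], [2.0]], [[0.1], [0.1]],
                           [[0.5], [1.5]], [[0.1], [0.1]], random.Random(0))
```

Because every draw of $W$ lies on the rank-$r$ set, the induced distribution over $W$ has no density with respect to Lebesgue measure on $\mathbb{R}^{m \times n}$ when $r < \min(m, n)$, which is the sense of "singular" used in the abstract.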

2601.18837 2026-03-12 cs.LG stat.ML

Time series forecasting with Hahn Kolmogorov-Arnold networks

Md Zahidul Hasan, A. Ben Hamza, Nizar Bouguila

Abstract

Recent Transformer- and MLP-based models have demonstrated strong performance in long-term time series forecasting, yet Transformers remain limited by their quadratic complexity and permutation-equivariant attention, while MLPs exhibit spectral bias. We propose HaKAN, a versatile model based on Kolmogorov-Arnold Networks (KANs), leveraging Hahn polynomial-based learnable activation functions and providing a lightweight and interpretable alternative for multivariate time series forecasting. Our model integrates channel independence, patching, a stack of Hahn-KAN blocks with residual connections, and a bottleneck structure composed of two fully connected layers. The Hahn-KAN block consists of inter- and intra-patch KAN layers to effectively capture both global and local temporal patterns. Extensive experiments on various forecasting benchmarks demonstrate that our model consistently outperforms recent state-of-the-art methods, with ablation studies validating the effectiveness of its core components.

2601.17374 2026-03-12 stat.ML cs.LG cs.NA math.NA

Error Analysis of Bayesian Inverse Problems with Generative Priors

Bamdad Hosseini, Ziqi Huang

Comments 30 pages, 8 figures

Abstract

Data-driven methods for the solution of inverse problems have become widely popular in recent years thanks to the rise of machine learning techniques. A popular approach concerns the training of a generative model on additional data to learn a bespoke prior for the problem at hand. In this article, we analyze such problems by presenting quantitative error bounds for minimum Wasserstein-2 generative models for the prior. We show that, under some assumptions, the error in the posterior due to the generative prior inherits the same rate as the prior with respect to the Wasserstein-1 distance. We further present numerical experiments verifying that aspects of our error analysis manifest in some benchmarks, followed by an elliptic PDE inverse problem where a generative prior is used to model a non-stationary field.

2601.17217 2026-03-12 stat.ME stat.ML

Transfer learning for functional linear regression via control variates

Yuping Yang, Zhiyang Zhou

Comments 45 pages, 2 figures

Abstract

Transfer learning (TL) has emerged as a powerful tool for improving estimation and prediction performance by leveraging information from related datasets, with the offset TL (O-TL) being a prevailing implementation. In this paper, we adapt the control-variates (CVS) method for TL and develop CVS-based estimators for scalar-on-function regression, one of the most fundamental models in functional data analysis. These estimators rely exclusively on dataset-specific summary statistics, thereby avoiding the pooling of subject-level data and remaining applicable in privacy-restricted or decentralized settings. We establish, for the first time, a theoretical connection between O-TL and CVS-based TL, showing that these two seemingly distinct TL strategies adjust local estimators in fundamentally similar ways. We further derive convergence rates that explicitly account for the unavoidable but typically overlooked smoothing error arising from discretely observed functional predictors, and clarify how similarity among covariance functions across datasets governs the performance of TL. Numerical studies support the theoretical findings and demonstrate that the proposed methods achieve competitive estimation and prediction performance compared with existing alternatives.

2601.08527 2026-03-12 math.NA cs.LG cs.NA math.PR stat.ML

Sampling via Stochastic Interpolants by Langevin-based Velocity and Initialization Estimation in Flow ODEs

Chenguang Duan, Yuling Jiao, Gabriele Steidl, Christian Wald, Jerry Zhijian Yang, Ruizhe Zhang

Abstract

We propose a novel method for sampling from unnormalized Boltzmann densities based on a probability flow ordinary differential equation (ODE) derived from linear stochastic interpolants. The key innovation of our approach is the use of a sequence of Langevin samplers to enable efficient simulation of the flow. Specifically, these Langevin samplers are employed (i) to generate samples from the interpolant distribution at intermediate times and (ii) to construct, starting from these intermediate times, a robust estimator of the velocity field governing the probability flow ODE. Theoretically, we provide convergence guarantees for both Langevin components, and establish a non-asymptotic convergence rate for the probability flow ODE. Extensive numerical experiments demonstrate the efficiency of the proposed method on challenging multimodal distributions across a range of dimensions, as well as its effectiveness in Bayesian inference tasks.

2512.10445 2026-03-12 stat.ML cs.AI cs.LG stat.ME

Maximum Risk Minimization with Random Forests

Francesco Freni, Anya Fries, Linus Kühne, Markus Reichstein, Jonas Peters

Comments 47 pages, 13 figures

Abstract

We consider a regression setting where observations are collected in different environments modeled by different data distributions. The field of out-of-distribution (OOD) generalization aims to design methods that generalize better to test environments whose distributions differ from those observed during training. One line of such works has proposed to minimize the maximum risk across environments, a principle that we refer to as MaxRM (Maximum Risk Minimization). In this work, we introduce variants of random forests based on the principle of MaxRM. We provide computationally efficient algorithms and prove statistical consistency for our primary method. Our proposed method can be used with each of the following three risks: the mean squared error, the negative reward, and the regret (which quantifies the excess risk relative to the best predictor). For MaxRM with regret as the risk, we prove a novel out-of-sample guarantee over unseen test distributions. Finally, we evaluate the proposed methods on both simulated and real-world data.
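
The MaxRM objective itself is simple to state in code. The sketch below evaluates a predictor by its largest per-environment mean squared error, and optionally by its largest regret when a per-environment baseline risk is supplied. It shows only the objective, not the random-forest algorithms of the paper; the names `mse` and `max_risk` and the toy data are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def max_risk(envs, predict, baseline_risk=None):
    """Return (environment, risk) for the worst environment.

    envs: dict mapping an environment name to (inputs, targets).
    baseline_risk: optional dict of the best achievable risk per environment;
                   when given, the risk reported is the regret.
    """
    risks = {}
    for name, (xs, ys) in envs.items():
        risk = mse(ys, [predict(x) for x in xs])
        if baseline_risk is not None:
            risk -= baseline_risk[name]      # excess risk over the best predictor
        risks[name] = risk
    worst = max(risks, key=risks.get)
    return worst, risks[worst]


# Toy data: two environments with different input-output relationships.
envs = {"a": ([0.0, 1.0], [0.0, 1.0]), "b": ([0.0, 1.0], [1.0, 3.0])}
identity_worst = max_risk(envs, lambda x: x)      # ("b", 2.5)
constant_worst = max_risk(envs, lambda x: 1.0)    # ("b", 2.0)
```

On this toy data the constant predictor has the smaller worst-case risk (2.0 versus 2.5), so a MaxRM criterion would prefer it even though the identity predictor is perfect in environment "a".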

2511.06934 2026-03-12 cs.GT cs.MA stat.ME stat.OT

Sequential Causal Normal Form Games: Theory, Computation, and Strategic Signaling

Dennis Thumm

Comments AAAI 2026 Workshop on Foundations of Agentic Systems Theory

Abstract

Can classical game-theoretic frameworks be extended to capture the bounded rationality and causal reasoning of AI agents? We investigate this question by extending Causal Normal Form Games (CNFGs) to sequential settings, introducing Sequential Causal Multi-Agent Systems (S-CMAS) that incorporate Pearl's Causal Hierarchy across leader-follower interactions. While theoretically elegant -- we prove PSPACE-completeness, develop equilibrium refinements, and establish connections to signaling theory -- our comprehensive empirical investigation reveals a critical limitation: S-CNE provides zero welfare improvement over classical Stackelberg equilibrium across all tested scenarios. Through 50+ Monte Carlo simulations and hand-crafted synthetic examples, we demonstrate that backward induction with rational best-response eliminates any strategic advantage from causal layer distinctions. We construct a theoretical example illustrating conditions where benefits could emerge ($\varepsilon$-rational satisficing followers), though implementation confirms that even relaxed rationality assumptions prove insufficient when good instincts align with optimal play. This negative result provides valuable insight: classical game-theoretic extensions grounded in rational choice are fundamentally incompatible with causal reasoning advantages, motivating new theoretical frameworks beyond standard Nash equilibrium for agentic AI.

2511.04361 2026-03-12 q-fin.CP cs.LG stat.ME stat.OT

Causal Regime Detection in Energy Markets With Augmented Time Series Structural Causal Models

Dennis Thumm

Comments EurIPS 2025 Workshop Causality for Impact: Practical challenges for real-world applications of causal methods

Abstract

Energy markets exhibit complex causal relationships between weather patterns, generation technologies, and price formation, with regime changes occurring continuously rather than at discrete break points. Current approaches model electricity prices without explicit causal interpretation or counterfactual reasoning capabilities. We introduce Augmented Time Series Causal Models (ATSCM) for energy markets, extending counterfactual reasoning frameworks to multivariate temporal data with learned causal structure. Our approach models energy systems through interpretable factors (weather, generation mix, demand patterns), rich grid dynamics, and observable market variables. We integrate neural causal discovery to learn time-varying causal graphs without requiring ground truth DAGs. Applied to real-world electricity price data, ATSCM enables novel counterfactual queries such as "What would prices be under different renewable generation scenarios?".

2510.25408 2026-03-12 math.ST stat.TH

Empirical Orlicz norms

Fabian Mies

Abstract

The empirical Orlicz norm based on a random sample is defined as a natural estimator of the Orlicz norm of a univariate probability distribution. A law of large numbers is derived under minimal assumptions. The latter extends readily to a linear and a nonparametric regression model. Secondly, sufficient conditions for a central limit theorem with a standard rate of convergence are supplied. The conditions for the CLT exclude certain canonical examples, such as the empirical sub-Gaussian norm of normally distributed random variables. For the latter, we discover a nonstandard rate of $n^{1/4} \log(n)^{3/8}$, with a heavy-tailed, stable limit distribution. It is shown that in general, the empirical Orlicz norm does not admit any uniform rate of convergence for the class of distributions with bounded Orlicz norm.
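
For concreteness, the sketch below computes an empirical sub-Gaussian norm under the usual Luxemburg-type definition, that is, the smallest $c > 0$ with $\frac{1}{n}\sum_i \psi_2(|X_i|/c) \le 1$ for $\psi_2(x) = e^{x^2} - 1$. Whether the paper uses exactly this normalization is an assumption, and the function name `empirical_subgaussian_norm` is hypothetical.

```python
import math


def empirical_subgaussian_norm(sample, iterations=200):
    """Smallest c with mean(exp((x / c)^2) - 1) <= 1, found by bisection."""
    n = len(sample)
    biggest = max(abs(x) for x in sample)
    if biggest == 0.0:
        return 0.0

    def mean_psi(c):
        return sum(math.exp((x / c) ** 2) - 1.0 for x in sample) / n

    # At lo the largest observation alone contributes n / n = 1, so
    # mean_psi(lo) >= 1; at hi every term is at most exp(ln 2) - 1 = 1,
    # so mean_psi(hi) <= 1. mean_psi is decreasing in c, so bisection applies.
    lo = biggest / math.sqrt(math.log(n + 1.0))
    hi = biggest / math.sqrt(math.log(2.0))
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_psi(mid) <= 1.0:
            hi = mid
        else:
            lo = mid
    return hi


norm_single = empirical_subgaussian_norm([2.0])   # equals 2 / sqrt(ln 2)
```

The bracket keeps every exponent at or below $\ln(n+1)$, so the search cannot overflow, and the estimator is scale-equivariant: doubling the sample doubles the result.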

2510.25296 2026-03-12 stat.ME

Nonparametric bounds for vaccine effects in randomized trials

Rachel Axelrod, Uri Obolski, Daniel Nevo

Abstract

Vaccine randomized trials are typically designed to be blinded, ensuring that the estimated vaccine efficacy (VE) reflects the immunological effect of the vaccine. When blinding is broken, however, the estimated VE reflects not only the immunological effect but also behavioral effects stemming from participants' awareness of their treatment status. Recent work has proposed alternative causal estimands to the standard VE to address this issue, but their point identification results require a strong assumption: the absence of unmeasured common causes of infection risk and participants' belief about whether they received the vaccine. Personality traits, for example, may plausibly violate this assumption. We relax this assumption and derive nonparametric causal bounds for different types of VE. We construct these bounds using two approaches: linear programming-based and monotonicity-based methods. We further consider several possible causal structures for vaccine trials and show how the nonparametric bounds differ across these scenarios. Finally, we illustrate the performance of the proposed bounds using fully synthetic data and a semi-synthetic data example based on a COVID-19 vaccine trial.

2510.13065 2026-03-12 cs.LG stat.ML

Absolute indices for determining compactness, separability and number of clusters

Adil M. Bagirov, Ramiz M. Aliguliyev, Nargiz Sultanova, Sona Taheri

Comments 25 pages, 11 figures, 9 tables

Abstract

Finding "true" clusters in a data set is a challenging problem. Clustering solutions obtained using different models and algorithms do not necessarily provide compact and well-separated clusters or the optimal number of clusters. Cluster validity indices are commonly applied to identify such clusters. Nevertheless, these indices are typically relative, and they are used to compare clustering algorithms or choose the parameters of a clustering algorithm. Moreover, the success of these indices depends on the underlying data structure. This paper introduces novel absolute cluster indices to determine both the compactness and separability of clusters. We define a compactness function for each cluster and a set of neighboring points for cluster pairs. This function is utilized to determine the compactness of each cluster and the whole cluster distribution. The set of neighboring points is used to define the margin between clusters and the overall distribution margin. The proposed compactness and separability indices are applied to identify the true number of clusters. Using a number of synthetic and real-world data sets, we demonstrate the performance of these new indices and compare them with other widely-used cluster validity indices.

2509.24962 2026-03-12 cs.LG stat.ML

Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Valentyn Melnychuk, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

Journal ref
Proceedings of the Fourteenth International Conference on Learning Representations (ICLR 2026), Rio de Janeiro, Brazil
Abstract

The conditional average treatment effect (CATE) is widely used in personalized medicine to inform therapeutic decisions. However, state-of-the-art methods for CATE estimation (so-called meta-learners) often perform poorly in the presence of low overlap. In this work, we introduce a new approach to tackle this issue and improve the performance of existing meta-learners in the low-overlap regions. Specifically, we introduce Overlap-Adaptive Regularization (OAR) that regularizes target models proportionally to overlap weights so that, informally, the regularization is higher in regions with low overlap. To the best of our knowledge, our OAR is the first approach to leverage overlap weights in the regularization terms of the meta-learners. Our OAR approach is flexible and works with any existing CATE meta-learner: we demonstrate how OAR can be applied to both parametric and non-parametric second-stage models. Furthermore, we propose debiased versions of our OAR that preserve the Neyman-orthogonality of existing meta-learners and thus ensure more robust inference. Through a series of (semi-)synthetic experiments, we demonstrate that our OAR significantly improves CATE estimation in low-overlap settings in comparison to constant regularization.

2509.22953 2026-03-12 cs.LG stat.ML

GDR-learners: Orthogonal Learning of Generative Models for Potential Outcomes

Valentyn Melnychuk, Stefan Feuerriegel

Journal ref
Proceedings of the Fourteenth International Conference on Learning Representations (ICLR 2026), Rio de Janeiro, Brazil
Abstract

Various deep generative models have been proposed to estimate potential outcomes distributions from observational data. However, none of them have the favorable theoretical property of general Neyman-orthogonality and, associated with it, quasi-oracle efficiency and double robustness. In this paper, we introduce a general suite of generative Neyman-orthogonal (doubly-robust) learners that estimate the conditional distributions of potential outcomes. Our proposed generative doubly-robust learners (GDR-learners) are flexible and can be instantiated with many state-of-the-art deep generative models. In particular, we develop GDR-learners based on (a) conditional normalizing flows (which we call GDR-CNFs), (b) conditional generative adversarial networks (GDR-CGANs), (c) conditional variational autoencoders (GDR-CVAEs), and (d) conditional diffusion models (GDR-CDMs). Unlike the existing methods, our GDR-learners possess the properties of quasi-oracle efficiency and rate double robustness, and are thus asymptotically optimal. In a series of (semi-)synthetic experiments, we demonstrate that our GDR-learners are very effective and outperform the existing methods in estimating the conditional distributions of potential outcomes.

2509.20985 2026-03-12 stat.ML cs.LG

Empirical PAC-Bayes Bounds for Markov Chains

Vahe Karagulyan, Pierre Alquier

Comments To appear in the proceedings of AISTATS 2026

Abstract

The core of generalization theory was developed for independent observations. Some PAC and PAC-Bayes bounds are available for data that exhibit a temporal dependence. However, there are constants in these bounds that depend on properties of the data-generating process: mixing coefficients, mixing time, spectral gap... Such constants are unknown in practice. In this paper, we prove a new PAC-Bayes bound for Markov chains. This bound depends on a quantity called the pseudo-spectral gap. The main novelty is that we can provide an empirical bound on the pseudo-spectral gap when the state space is finite. Thus, we obtain the first fully empirical PAC-Bayes bound for Markov chains. This extends beyond the finite case, although this requires additional assumptions. On simulated experiments, the empirical version of the bound is essentially as tight as the non-empirical one.

2509.18149 2026-03-12 math.NA cs.LG cs.NA eess.SP math.OC stat.CO stat.ML

Tensor Train Completion from Fiberwise Observations Along a Single Mode

Shakir Showkat Sofi, Lieven De Lathauwer

Comments 26 pages, 12 figures

Journal ref
Mathematics 2026, 14(5), 922
Abstract

Tensor completion is an extension of matrix completion aimed at recovering a multiway data tensor by leveraging a given subset of its entries (observations) and the pattern of observation. The low-rank assumption is key in establishing a relationship between the observed and unobserved entries of the tensor. The low-rank tensor completion problem is typically solved using numerical optimization techniques, where the rank information is used either implicitly (in the rank minimization approach) or explicitly (in the error minimization approach). Current theories concerning these techniques often study probabilistic recovery guarantees under conditions such as random uniform observations and incoherence requirements. However, if an observation pattern exhibits some low-rank structure that can be exploited, more efficient algorithms with deterministic recovery guarantees can be designed by leveraging this structure. This work shows how to use only standard linear algebra operations to compute the tensor train decomposition of a specific type of "fiber-wise" observed tensor, where some of the fibers of a tensor (along a single specific mode) are either fully observed or entirely missing, unlike the usual entry-wise observations. From an application viewpoint, this setting is relevant when it is easier to sample or collect a multiway data tensor along a specific mode (e.g., temporal). The proposed completion method is fast and is guaranteed to work under reasonable deterministic conditions on the observation pattern. Through numerical experiments, we showcase interesting applications and use cases that illustrate the effectiveness of the proposed approach.

2509.10067 2026-03-12 stat.ME

Robust evaluation of treatment effects in longitudinal studies with truncation by death or other intercurrent events

Georgi Baklicharov, Kelly Van Lancker, Stijn Vansteelandt

Abstract

Intercurrent events, such as treatment switching, rescue medication, dropout, or truncation by death, frequently complicate intention-to-treat analyses in randomized clinical trials. Existing causal inference frameworks typically target hypothetical or principal stratum estimands (e.g., survivor average causal effects), which rely on unverifiable assumptions and can be sensitive to unmeasured confounders or positivity violations. We propose a novel approach that mitigates this sensitivity by using only information measured prior to the intercurrent event. Our key idea is to compare treated and untreated individuals, matched on baseline covariates, at the most recent time point before either experiences an intercurrent event. We call these contrasts Pairwise Last Observation Time (PLOT) estimands. PLOT estimands are identified in randomized trials without structural assumptions, even under severe positivity violations. Although PLOT-based tests may theoretically be susceptible to residual selection bias, we show this bias vanishes under standard conditions and remains negligible in extensive simulations. We develop asymptotically efficient, model-free tests and treatment effect estimators using data-adaptive nuisance parameter estimation. We evaluate performance via simulation and apply the method to re-analyze the DEVOTE trial, affected by truncation by death. PLOT offers a robust, data-driven alternative for evaluating treatment efficacy in the presence of complex intercurrent events.

2507.18125 2026-03-12 stat.ME stat.AP

Regression approaches for modelling genotype-environment interaction and making predictions into unseen environments

Maksym Hrachov, Hans-Peter Piepho, Niaz Md. Farhat Rahman, Waqas Ahmed Malik

Comments 26 pages, 1 Figure

Journal ref
Theor.Appl.Genet.139 (2026)
Abstract

In plant breeding and variety testing, there is an increasing interest in making use of environmental information to enhance predictions for new environments. Here, we will review linear mixed models that have been proposed for this purpose. The emphasis will be on predictions and on methods to assess the uncertainty of predictions for new environments. Our point of departure is straight-line regression, which may be extended to multiple environmental covariates and genotype-specific responses. When observable environmental covariates are used, this is also known as factorial regression. Early work along these lines can be traced back to Stringfield & Salter (1934) and Yates & Cochran (1938), who proposed a method nowadays best known as Finlay-Wilkinson regression. This method, in turn, has close ties with regression on latent environmental covariates and factor-analytic variance-covariance structures for genotype-environment interaction. Extensions of these approaches - reduced rank regression, kernel- or kinship-based approaches, random coefficient regression, and extended Finlay-Wilkinson regression - will be the focus of this paper. Our objective is to demonstrate how seemingly disparate methods are very closely linked and fall within a common model-based prediction framework. The framework considers environments as random throughout, with genotypes also modelled as random in most cases. We will discuss options for assessing uncertainty of predictions, including cross validation and model-based estimates of uncertainty, the latter one being estimated using our new suggested approach. The methods are illustrated using a long-term rice variety trial dataset from Bangladesh.

2506.11322 2026-03-12 stat.ME

Bayesian Sensitivity Analysis for Causal Estimation with Time-varying Unmeasured Confounding

Yushu Zou, Liangyuan Hu, Amanda Ricciuto, Mark Deneau, Kuan Liu

Abstract

Causal inference relies on the untestable assumption of no unmeasured confounding. Sensitivity analysis can be used to quantify the impact of unmeasured confounding on causal estimates. Among sensitivity analysis methods proposed in the literature for unmeasured confounding, the latent confounder approach is favoured for its intuitive interpretation via the use of bias parameters to specify the relationship between the observed and unobserved variables, whereas the sensitivity function approach directly characterizes the net causal effect of the unmeasured confounding without explicitly introducing latent variables to the causal models. In this paper, we developed and extended two sensitivity analysis approaches, namely the Bayesian sensitivity analysis with latent confounding variables and the Bayesian sensitivity function approach, for the estimation of time-varying treatment effects with longitudinal observational data subject to time-varying unmeasured confounding. We investigated the performance of these methods in a series of simulation studies and applied them to multi-center pediatric disease registry data to provide practical guidance on their implementation.

2506.03054 2026-03-12 stat.ME stat.AP

Constructing Evidence-Based Tailoring Variables for Adaptive Interventions

John J. Dziak, Inbal Nahum-Shani

Comments 34 pages, 4 figures. Re-submitted to Annals of Behavioral Medicine

Abstract

Background: Adaptive interventions provide a guide for using ongoing information about individuals to decide whether and how to modify the type, amount, delivery modality, or timing of treatment, to improve intervention effectiveness while reducing cost and burden. The variables that inform treatment modification decisions are called tailoring variables. Specifying a tailoring variable requires describing what should be measured, when to measure it, when the measure should be used to make decisions, and what cutoffs should be used in making decisions. These questions are causal and prescriptive (what to do, when), not merely predictive. They involve tradeoffs between specificity and sensitivity, and between waiting for sufficient information versus intervening quickly. Purpose: There is little specific guidance in the literature on how to empirically choose tailoring variables, including cutoffs, measurement times, and decision times. Methods: We review possible approaches for comparing potential tailoring variables and propose a framework for systematically developing tailoring variables. Results: Although secondary observational data can be used to select tailoring variables, additional assumptions are needed. A specifically designed experiment for optimization (an optimization randomized controlled trial), e.g., a multi-arm randomized trial, sequential multiple assignment randomized trial, factorial experiment, or hybrid design, may provide a more direct way to answer these questions. Conclusions: Using randomization directly to inform tailoring variables provides the most direct causal evidence but requires more effort and resources than secondary data analysis. More research is needed on how best to design tailoring variables for effective, scalable interventions.

2505.13755 2026-03-12 cs.LG cs.NE nlin.CD stat.ML

Panda: A pretrained forecast model for chaotic dynamics

Jeffrey Lai, Anthony Bao, William Gilpin

Abstract

Chaotic systems are intrinsically sensitive to small errors, challenging efforts to construct predictive data-driven models of real-world dynamical systems such as fluid flows or neuronal activity. Prior efforts comprise either specialized models trained on individual time series, or foundation models trained on vast time series databases with little underlying dynamical structure. Motivated by dynamical systems theory, we present Panda, Patched Attention for Nonlinear DynAmics. We train Panda on a novel synthetic, extensible dataset of $2 \times 10^4$ chaotic dynamical systems that we discover using an evolutionary algorithm. Trained purely on simulated data, Panda exhibits emergent properties: zero-shot forecasting of unseen chaotic systems preserving both short-term accuracy and distributional measures, nonlinear resonance patterns in attention heads, and effective prediction of real-world experimental time series. Despite having been trained only on low-dimensional ordinary differential equations, Panda spontaneously develops the ability to predict partial differential equations without retraining. We also demonstrate a neural scaling law for differential equations, underscoring the potential of pre-trained models for probing abstract mathematical domains like nonlinear dynamics.

2505.09828 2026-03-12 stat.CO

Optimally balancing exploration and exploitation to automate multi-fidelity statistical estimation

Thomas Dixon, Alex Gorodetsky, John Jakeman, Akil Narayan, Yiming Xu

Comments 40 pages

Abstract

Multi-fidelity methods that use an ensemble of models to compute a Monte Carlo estimator of the expectation of a high-fidelity model can significantly reduce computational costs compared to single-model approaches. These methods use oracle statistics, specifically the covariance between models, to optimally allocate samples to each model in the ensemble. However, in practice, the oracle statistics are estimated using additional model evaluations, whose computational cost and induced error are typically ignored. To address this issue, this paper proposes an adaptive algorithm to optimally balance the resources between oracle statistics estimation and final multi-fidelity estimator construction, leveraging ideas from multilevel best linear unbiased estimators in Schaden and Ullmann (2020) and a bandit-learning procedure in Xu et al. (2022). Under mild assumptions, we demonstrate that the multi-fidelity estimator produced by the proposed algorithm exhibits mean-squared error commensurate with that of the best linear unbiased estimator under the optimal allocation computed with oracle statistics. Our theoretical findings are supported by detailed numerical experiments, including a parametric elliptic PDE and an ice-sheet mass-change modeling problem.
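
As a rough illustration of the exploration/exploitation trade-off described above, the following sketch uses a plain two-model control-variate estimator. It is not the adaptive algorithm proposed in the paper: the toy models `f_high` and `f_low`, the function `control_variate_estimate`, and the fixed sample sizes are all assumptions made for illustration. Pilot samples (exploration) estimate the covariance-based weight, and the remaining samples (exploitation) build the estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_high(x):
    # Stand-in for an expensive high-fidelity model (hypothetical).
    return np.exp(x)

def f_low(x):
    # Stand-in for a cheap, correlated low-fidelity model (hypothetical).
    return 1.0 + x

def control_variate_estimate(n_pilot, n_high, n_low):
    # Exploration: pilot samples estimate the covariance-based weight.
    xp = rng.uniform(0.0, 1.0, n_pilot)
    c = np.cov(f_high(xp), f_low(xp))
    alpha = c[0, 1] / c[1, 1]
    # Exploitation: paired evaluations plus many extra cheap evaluations.
    xh = rng.uniform(0.0, 1.0, n_high)
    xl = rng.uniform(0.0, 1.0, n_low)
    return f_high(xh).mean() + alpha * (f_low(xl).mean() - f_low(xh).mean())

# Estimates E[exp(X)] for X ~ Uniform(0, 1), whose exact value is e - 1.
estimate = control_variate_estimate(n_pilot=50, n_high=200, n_low=20000)
```

In this sketch the pilot budget `n_pilot` is fixed by hand; the abstract's point is that this split should itself be chosen adaptively, since pilot evaluations consume part of the total budget.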

2504.08937 2026-03-12 cs.GR cs.CV cs.LG eess.IV stat.ML

Rethinking Few-Shot Image Fusion: Granular Ball Priors Enable General-Purpose Deep Fusion

Minjie Deng, Yan Wei, An Wu, Yuncan Ouyang, Hao Zhai, Qianyao Peng

Abstract

In image fusion tasks, the absence of real fused images as supervision signals poses significant challenges for supervised learning. Existing deep learning methods typically address this issue either by designing handcrafted priors or by relying on large-scale datasets to learn model parameters. Different from previous approaches, this paper introduces the concept of incomplete priors, which formally describe handcrafted priors at the algorithmic level and estimate their confidence. Based on this idea, we couple incomplete priors with the neural network through a sample-level adaptive loss function, enabling the network to learn and re-infer fusion rules under conditions that approximate the real fusion process. To generate incomplete priors, we propose a Granular Ball Pixel Computation (GBPC) algorithm based on the principles of granular computing. The algorithm models fused-image pixels as information units, estimating pixel weights at a fine-grained level while statistically evaluating prior reliability at a coarse-grained level. This design enables the algorithm to perceive cross-modal discrepancies and perform adaptive inference. Experimental results demonstrate that even under few-shot conditions, a lightweight neural network can still learn effective fusion rules by training only on image patches extracted from ten image pairs. Extensive experiments across multiple fusion tasks and datasets further show that the proposed method achieves superior performance in both visual quality and model compactness. The code is available at: https://github.com/DMinjie/GBFF

2504.06903 2026-03-12 stat.ME stat.CO

Network Cross-Validation and Model Selection via Subsampling

Sayan Chakrabarty, Srijan Sengupta, Yuguo Chen

Abstract

Complex and larger networks are becoming increasingly prevalent in scientific applications in various domains. Although a number of models and methods exist for such networks, cross-validation on networks remains challenging due to the unique structure of network data. In this paper, we propose a general cross-validation procedure called NETCROP (NETwork CRoss-Validation using Overlapping Partitions). The key idea is to divide the original network into multiple subnetworks with a shared overlap part, producing training sets consisting of the subnetworks and a test set with the node pairs between the subnetworks. This train-test split provides the basis for a network cross-validation procedure that can be applied on a wide range of model selection and parameter tuning problems for networks. The method is computationally efficient for large networks as it uses smaller subnetworks for the training step. We provide methodological details and theoretical guarantees for several model selection and parameter tuning tasks using NETCROP. Numerical results demonstrate that NETCROP performs accurate cross-validation on a diverse set of network model selection and parameter tuning problems. The results also indicate that NETCROP is computationally much faster while being often more accurate than the existing methods for network cross-validation.
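
The train-test split described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the helper name `overlap_partition_split`, its parameters `n_sub` and `overlap_size`, and the uniform random assignment of nodes are assumptions.

```python
import numpy as np

def overlap_partition_split(n_nodes, n_sub, overlap_size, seed=0):
    """Hypothetical helper: pick a shared overlap set, divide the remaining
    nodes into n_sub disjoint groups, and list the held-out node pairs."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_nodes)
    overlap = perm[:overlap_size]
    # Remaining nodes are divided as evenly as possible among the groups.
    groups = np.array_split(perm[overlap_size:], n_sub)
    # Each training subnetwork is induced by the overlap nodes plus one group.
    train_node_sets = [np.sort(np.concatenate([overlap, g])) for g in groups]
    # Test pairs connect non-overlap nodes belonging to two different groups.
    test_pairs = [
        (int(u), int(v))
        for a in range(n_sub)
        for b in range(a + 1, n_sub)
        for u in groups[a]
        for v in groups[b]
    ]
    return train_node_sets, test_pairs

train_sets, test_pairs = overlap_partition_split(n_nodes=10, n_sub=2, overlap_size=2)
```

A model would then be fitted on the adjacency submatrix of each training node set, and the fits would be scored on the edges (or non-edges) at the test pairs; how estimates from different subnetworks are aligned through the overlap is model-specific and not shown here.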

2504.04371 2026-03-12 cs.LG stat.ML

An Algorithm to perform Covariance-Adjusted Support Vector Classification in Non-Euclidean Spaces

Satyajeet Sahoo, Jhareswar Maiti

Abstract

Traditional Support Vector Machine (SVM) classification is carried out by finding the max-margin classifier for the training data that divides the margin space into two equal sub-spaces. This study demonstrates limitations of performing Support Vector Classification in non-Euclidean spaces by establishing that the underlying principle of max-margin classification and Karush Kuhn Tucker (KKT) boundary conditions are optimal only in the Euclidean vector spaces. The study establishes a methodology to perform Support Vector Classification in Non-Euclidean Spaces by incorporating data covariance into the optimization problem using Cholesky Decomposition of respective class covariance structure. It also demonstrates that in non-Euclidean spaces KKT modelling is sub-optimal as the principle of maximum margin is a function of intra-class data covariances and the classifier obtained separates the margin space in ratio of the respective class population covariance matrix. The study proposes an algorithm to iteratively estimate the population covariance-adjusted SVM classifier in non-Euclidean space from sample covariance matrices of the training data. The effectiveness of this SVM classification approach is demonstrated by applying the classifier on multiple datasets and comparing the performance with traditional SVM kernels and whitening algorithms. The Cholesky-SVM model shows marked improvement in the accuracy, precision, F1 scores and ROC performance compared to linear and other kernel SVMs.

2503.07022 2026-03-12 math.ST math.PR stat.TH

The level of self-organized criticality in oscillating Brownian motion: $n$-consistency and stable Poisson-type convergence of the MLE

Johannes Brutsche, Angelika Rohde

Abstract

For some discretely observed path of oscillating Brownian motion with level of self-organized criticality $ρ_0$, we prove in the infill asymptotics that the MLE is $n$-consistent, where $n$ denotes the sample size, and derive its limit distribution with respect to stable convergence. As the transition density of this homogeneous Markov process is not even continuous in $ρ_0$, the analysis is highly non-standard. Therefore, interesting and somewhat unexpected phenomena occur: The likelihood function splits into several components, each of them contributing very differently depending on how close the argument $ρ$ is to $ρ_0$. Correspondingly, the MLE is successively excluded from lying outside a compact set, a $1/\sqrt{n}$-neighborhood and finally a $1/n$-neighborhood of $ρ_0$ asymptotically. The crucial argument to derive the stable convergence is to exploit the semimartingale structure of the sequential suitably rescaled local log-likelihood function (as a process in time). Both sequentially and as a process in $ρ$, it exhibits a bivariate Poissonian behavior in the stable limit with its intensity being a multiple of the local time at $ρ_0$.

2502.07460 2026-03-12 cs.LG stat.ML

Logarithmic Regret for Online KL-Regularized Reinforcement Learning

Heyang Zhao, Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

Abstract

Recent advances in Reinforcement Learning from Human Feedback (RLHF) have shown that KL-regularization plays a pivotal role in improving the efficiency of RL fine-tuning for large language models (LLMs). Despite its empirical advantage, the theoretical difference between KL-regularized RL and standard RL remains largely under-explored. While there is a recent line of work on the theoretical analysis of KL-regularized objective in decision making (Xiong et al., 2024; Xie et al., 2024; Zhao et al., 2024), these analyses either reduce to the traditional RL setting or rely on strong coverage assumptions. In this paper, we propose an optimism-based KL-regularized online contextual bandit algorithm, and provide a novel analysis of its regret. By carefully leveraging the benign optimization landscape induced by the KL-regularization and the optimistic reward estimation, our algorithm achieves an $\mathcal{O}\big(η\log (N_{\mathcal R} T)\cdot d_{\mathcal R}\big)$ logarithmic regret bound, where $η, N_{\mathcal R},T,d_{\mathcal R}$ denote the KL-regularization parameter, the cardinality of the reward function class, number of rounds, and the complexity of the reward function class. Furthermore, we extend our algorithm and analysis to reinforcement learning by developing a novel decomposition over transition steps and also obtain a similar logarithmic regret bound.
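
To make the role of the KL term concrete, the sketch below computes the closed-form maximizer of a KL-regularized objective for a single context with finitely many actions. It assumes the objective has the form $E_π[r] - (1/η)\,\mathrm{KL}(π \| π_0)$; this scaling convention, the function name `kl_regularized_policy`, and the toy numbers are assumptions for illustration and are not taken from the paper, and the sketch does not implement the paper's optimistic algorithm.

```python
import numpy as np

def kl_regularized_policy(rewards, ref_policy, eta):
    """Maximizer of E_pi[r] - (1/eta) * KL(pi || ref_policy) over
    distributions pi: pi(a) is proportional to ref_policy(a) * exp(eta * r(a))."""
    logits = np.log(ref_policy) + eta * np.asarray(rewards, dtype=float)
    logits -= logits.max()  # stabilise the exponentials
    weights = np.exp(logits)
    return weights / weights.sum()

ref = np.array([0.5, 0.3, 0.2])       # reference policy (toy values)
rewards = np.array([0.0, 1.0, 2.0])   # rewards for one context (toy values)
pi = kl_regularized_policy(rewards, ref, eta=1.0)
```

Under this convention, `eta=0` returns the reference policy unchanged, and larger `eta` moves probability mass toward the highest-reward action.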

2501.07437 2026-03-12 stat.ML cs.LG

Pairwise Comparisons without Stochastic Transitivity: Model, Theory and Applications

Sze Ming Lee, Yunxiao Chen

Comments 55 pages, 2 figures

Abstract

Most statistical models for pairwise comparisons, including the Bradley-Terry (BT) and Thurstone models and many extensions, make a relatively strong assumption of stochastic transitivity. This assumption imposes the existence of an unobserved global ranking among all the players/teams/items and monotone constraints on the comparison probabilities implied by the global ranking. However, the stochastic transitivity assumption does not hold in many real-world scenarios of pairwise comparisons, especially games involving multiple skills or strategies. As a result, models relying on this assumption can have suboptimal predictive performance. In this paper, we propose a general family of statistical models for pairwise comparison data without a stochastic transitivity assumption, substantially extending the BT and Thurstone models. In this model, the pairwise probabilities are determined by an (approximately) low-dimensional skew-symmetric matrix. Likelihood-based estimation methods and computational algorithms are developed, which allow for sparse data with only a small proportion of observed pairs. Theoretical analysis shows that the proposed estimator achieves minimax-rate optimality, which adapts effectively to the sparsity level of the data. The spectral theory for skew-symmetric matrices plays a crucial role in the implementation and theoretical analysis. The proposed method's superiority against the BT model, along with its broad applicability across diverse scenarios, is further supported by simulations and real data analysis.
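
A minimal sketch of how a skew-symmetric matrix can encode comparison probabilities without a global ranking is given below. The logistic link, the function name `win_probabilities`, and the three-player example are assumptions chosen for illustration; the paper's exact model and estimator are not reproduced here.

```python
import numpy as np

def win_probabilities(M):
    """P[i, j] = probability that i beats j, using a logistic link
    (an assumption) applied to a skew-symmetric matrix M."""
    M = np.asarray(M, dtype=float)
    assert np.allclose(M, -M.T), "M must be skew-symmetric"
    return 1.0 / (1.0 + np.exp(-M))

# Rock-paper-scissors style example: 0 tends to beat 1, 1 tends to beat 2,
# and 2 tends to beat 0, so no global ranking is consistent with the data.
M = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])
P = win_probabilities(M)
```

Skew-symmetry guarantees `P[i, j] + P[j, i] == 1` for every pair. A Bradley-Terry model corresponds to the special case `M[i, j] = theta[i] - theta[j]`, which cannot represent the cycle above.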

2501.03009 2026-03-12 stat.AP

Equipoise calibration of clinical trial design

Fabio Rigat

Comments 15 pages, 1 figure, 4 tables

Abstract

Clinical trial design ensures that primary analysis outcomes have strong statistical properties. However, mainstream methodology for randomised study design does not establish a formal link between statistical and clinical significance. This paper contributes to bridging this gap by calibrating the operational characteristics of primary trial outcomes to the establishment of clinical equipoise imbalance. Common late phase designs are shown to provide at least 90% evidence of equipoise imbalance. Designs carrying 95% power at 5% false positive rate are shown to demonstrate 95% evidence of equipoise imbalance, providing an operational definition of a robustly powered study. Equipoise calibration is applied to the design of clinical development plans comprising phase 2 and phase 3 studies using standard oncology endpoints. Commonly used power and false positive error rates are shown to provide strong equipoise imbalance when positive outcomes are observed in both phase 2 and phase 3. Establishing strong equipoise imbalance based on inconsistent outcomes of phase 2 and phase 3 studies is shown to require large sample sizes unlikely to be associated with clinically meaningful effect sizes.

2410.11238 2026-03-12 math.ST stat.ME stat.TH

Impact of existence and nonexistence of pivot on the coverage of empirical best linear prediction intervals for small areas

Yuting Chen, Masayo Y. Hirose, Partha Lahiri

Abstract

We advance the theory of parametric bootstrap in constructing highly efficient empirical best (EB) prediction intervals of small area means. The coverage error of such a prediction interval is of the order $O(m^{-3/2})$, where $m$ is the number of small areas to be pooled using a linear mixed normal model. In the context of an area level model where the random effects follow a non-normal known distribution except possibly for unknown hyperparameters, we analytically show that the order of coverage error of the empirical best linear (EBL) prediction interval remains the same even if we relax the normality of the random effects by the existence of a pivot for suitably standardized random effects when hyperparameters are known. Recognizing the challenge of showing existence of a pivot, we develop a simple moment-based method to claim non-existence of a pivot. We show that the existing parametric bootstrap EBL prediction interval fails to achieve the desired order of the coverage error, i.e. $O(m^{-3/2})$, in the absence of a pivot. We obtain a surprising result that the order $O(m^{-1})$ term is always positive under certain conditions, indicating possible overcoverage of the existing parametric bootstrap EBL prediction interval. In general, we analytically show for the first time that the coverage problem can be corrected by adopting a suitably devised double parametric bootstrap. Our Monte Carlo simulations show that our proposed single bootstrap method performs reasonably well when compared to rival methods.

2410.08727 2026-03-12 stat.ML cs.LG

Losing dimensions: Geometric memorization in generative diffusion

Beatrice Achilli, Enrico Ventura, Gianluigi Silvestri, Bao Pham, Gabriel Raya, Dmitry Krotov, Carlo Lucibello, Luca Ambrogioni

Comments 17 pages, 9 figures

Abstract

Diffusion models power leading generative AI, but when and how they memorize training data, especially on low-dimensional manifolds, remains unclear. We find memorization emerges gradually, not abruptly: as data become scarce, diffusion models experience a smooth collapse where their capacity to vary across independent directions diminishes. Measuring latent dimensionality via the learned score field, we reveal how generative behavior increasingly centers on a few examples while other variations "freeze out". We propose a geometric memorization theory, showing that salient features collapse first, then finer details, leading to near point-wise replication. This mirrors physical systems condensing into a few low-energy configurations. Our theoretical predictions align with both synthetic and real data, identifying geometric memorization as a distinct phase between generalization and exact copying.

2410.08226 2026-03-12 physics.geo-ph cs.LG stat.AP stat.ML

EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes

Samuel Stockman, Daniel Lawson, Maximilian Werner

Comments Accepted to Transactions on Machine Learning Research (TMLR), 2026

Abstract

For decades, classical point process models, such as the epidemic-type aftershock sequence (ETAS) model, have been widely used for forecasting the event times and locations of earthquakes. Recent advances have led to Neural Point Processes (NPPs), which promise greater flexibility and improvements over such classical models. However, the currently-used benchmark for NPPs does not represent an up-to-date challenge in the seismological community, since it contains data leakage and omits the largest earthquake sequence from the region. Additionally, initial earthquake forecasting benchmarks fail to compare NPPs with state-of-the-art forecasting models commonly used in seismology. To address these gaps, we introduce EarthquakeNPP: a benchmarking platform that curates and standardizes existing public resources: globally available earthquake catalogs, the ETAS model, and evaluation protocols from the seismology community. The datasets cover a range of small to large target regions within California, dating from 1971 to 2021, and include different methodologies for dataset generation. Benchmarking experiments, using both log-likelihood and generative evaluation metrics widely recognised in seismology, show that none of the five NPPs tested outperform ETAS. These findings suggest that current NPP implementations are not yet suitable for practical earthquake forecasting. Nonetheless, EarthquakeNPP provides a platform to foster future collaboration between the seismology and machine learning communities.

2408.09335 2026-03-12 math.OC cs.LG q-fin.MF stat.ML

Exploratory Optimal Stopping: A Singular Control Formulation

Jodi Dianetti, Giorgio Ferrari, Renyuan Xu

Comments 49 pages, 3 figures

English abstract

This paper explores continuous-time and state-space optimal stopping problems from a reinforcement learning perspective. We begin by formulating the stopping problem using randomized stopping times, where the decision maker's control is represented by the probability of stopping within a given time; specifically, a bounded, non-decreasing, càdlàg control process. To encourage exploration and facilitate learning, we introduce a regularized version of the problem by penalizing the performance criterion with the cumulative residual entropy of the randomized stopping time. The regularized problem takes the form of an (n+1)-dimensional degenerate singular stochastic control with finite-fuel, where the regularized free boundary becomes the graph of a function mapping the state variable of the original stopping problem into the probability of stopping. We address this singular control problem through the dynamic programming principle, which enables us to identify the unique optimal exploratory strategy. Finally, we propose both model-based and model-free reinforcement learning algorithms tailored for exploratory optimal stopping problems. We establish policy improvement guarantees for the proposed algorithms. Moreover, the model-free method is of actor-critic type and is scalable in high dimensions under neural network parameterization.

2408.09155 2026-03-12 stat.ME math.ST stat.CO stat.ML stat.TH

Learning Robust Treatment Rules for Censored Data

Yifan Cui, Junyi Liu, Tao Shen, Zhengling Qi, Xi Chen

English abstract

There is a fast-growing literature on estimating optimal treatment rules directly by maximizing the expected outcome. In biomedical studies and operations applications, censored survival outcomes are frequently observed, in which case the truncated mean survival time and survival probability are of great interest. In this paper, we propose two robust criteria for learning optimal treatment rules with censored survival outcomes: the first targets an optimal treatment rule maximizing the truncated mean survival time, where the cutoff is specified by a given quantile such as the median; the second targets an optimal treatment rule maximizing buffered survival probabilities, where the predetermined threshold is adjusted to account for the truncated mean survival time. We develop a sampling-based difference-of-convex algorithm for learning the proposed optimal treatment rules, and provide theoretical justifications for them. In simulation studies, our estimators show improved performance compared to existing methods. We also demonstrate the proposed method using AIDS clinical trial data.

2405.05781 2026-03-12 stat.ME

Nonparametric estimation of a state entry time distribution conditional on a "past" state occupation in a progressive multistate model with current status data

Samuel Anyaso-Samuel, Somnath Datta

Comments 26 pages, 5 tables, 4 figures

English abstract

Case-I interval-censored (current status) data from multistate systems are often encountered in biomedical and epidemiological studies. In this article, we focus on the problem of estimating state entry distribution and occupation probabilities, contingent on a preceding state occupation. This endeavor is particularly complex owing to the inherent challenge of the unavailability of directly observed counts of individuals at risk of transitioning from a state, due to severe interval censoring. We propose two nonparametric approaches, one using the fractional at-risk set approach recently adopted in the right-censoring framework and the other a new estimator based on the ratio of marginal state occupation probabilities. Both estimation approaches utilize innovative applications of concepts from the competing risks paradigm. The finite-sample behavior of the proposed estimators is studied via extensive simulation studies where we show that the estimators based on severely censored current status data have good performance when compared with those based on complete data. We demonstrate the application of the two methods to analyze data from patients diagnosed with breast cancer.

2312.09877 2026-03-12 cs.LG cs.AI cs.DC stat.ML

Optimal Transport Aggregation for Distributed Mixture-of-Experts

Faïcel Chamroukhi, Nhat Thien Pham

English abstract

Mixture-of-experts (MoE) models provide a flexible statistical framework for modeling heterogeneity and nonlinear relationships. In many modern applications, however, datasets are naturally distributed across multiple machines due to storage, computational, or governance constraints. We consider a distributed model aggregation setting in which local MoE models are trained independently on decentralized datasets and subsequently combined into a global estimator. Aggregating MoE models is challenging because standard averaging produces models that do not preserve the MoE structure, and therefore do not yield estimates of the global model parameters. To address this issue, we propose a principled aggregation framework based on optimal transport that constructs a reduced global MoE estimator by minimizing a transportation divergence between the collection of local estimators and the aggregated model. An efficient majorization-minimization (MM) algorithm is derived to solve the resulting optimization problem. The method requires only a single communication step from local machines to a central server, making it a frugal distributed learning approach particularly attractive for large-scale settings where communication costs are a major bottleneck. We further establish statistical guarantees for the aggregated estimator, including consistency under standard assumptions on the local estimators. Experiments on synthetic and real datasets demonstrate that the approach achieves performance comparable to centralized training while significantly reducing computation time. The source code is publicly available on GitHub.

2311.02766 2026-03-12 cs.LG stat.ME stat.ML

Riemannian Laplace Approximation with the Fisher Metric

Hanlin Yu, Marcelo Hartmann, Bernardo Williams, Mark Girolami, Arto Klami

Comments AISTATS 2024, with additional fixes and improvements. Theorem 2 is fixed

English abstract

Laplace's method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace approximation transforms the Gaussian approximation according to a chosen Riemannian geometry, providing a richer approximation family while still retaining computational efficiency. However, as shown here, its properties depend heavily on the chosen metric; indeed, the metric adopted in previous work results in approximations that are overly narrow as well as being biased even in the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact in the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.
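
As a point of reference for the first sentence of this abstract, here is a minimal sketch of the ordinary (Euclidean) Laplace approximation in one dimension: locate the mode with Newton's method, then use the negative inverse curvature there as the Gaussian variance. It does not implement the Riemannian variants the paper proposes; the function name, the finite-difference step, and the Gamma-shaped example target are assumptions made for illustration.

```python
import math

def laplace_approximation(log_density, x0, step=1e-4, iters=100, tol=1e-10):
    """Return (mode, variance) of the Gaussian fitted at the mode of log_density.

    Derivatives are taken by central finite differences, so log_density only
    needs to be evaluable near the mode; it may be unnormalized.
    """
    def curvature(x):
        return (log_density(x + step) - 2.0 * log_density(x) + log_density(x - step)) / step ** 2

    x = x0
    for _ in range(iters):
        grad = (log_density(x + step) - log_density(x - step)) / (2.0 * step)
        hess = curvature(x)
        if hess >= 0:
            raise ValueError("log density is not locally concave; pick another start")
        delta = grad / hess
        x -= delta  # Newton step towards the mode
        if abs(delta) < tol:
            break
    return x, -1.0 / curvature(x)

if __name__ == "__main__":
    # Unnormalized Gamma(shape=5, rate=2) log density; its mode is (5 - 1) / 2.
    target = lambda x: 4.0 * math.log(x) - 2.0 * x
    mode, variance = laplace_approximation(target, x0=1.5)
    print(mode, variance)
```

For this skewed target the fitted Gaussian is symmetric about the mode, which illustrates the kind of crudeness the abstract refers to.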

2305.12292 2026-03-12 cs.LG math.OC stat.ML

Disjunctive Branch-and-Bound for Certifiably Optimal Low-Rank Matrix Completion

Dimitris Bertsimas, Ryan Cory-Wright, Sean Lo, Jean Pauphilet

Comments Updated version for revision at INFORMS Journal on Computing

English abstract

Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not provide an instance-wise certificate of optimality. We reexamine matrix completion with an optimality-oriented eye. We reformulate low-rank matrix completion problems as convex problems over the non-convex set of projection matrices and implement a disjunctive branch-and-bound scheme that solves them to certifiable optimality. Further, we derive a novel and often near-exact class of convex relaxations by decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing that two-by-two minors in each rank-one matrix have determinant zero. In numerical experiments, our new convex relaxations decrease the optimality gap by two orders of magnitude compared to existing attempts, and our disjunctive branch-and-bound scheme solves $n \times m$ rank-$k$ matrix completion problems to certifiable optimality or near optimality in hours for $\max \{m, n\} \leq 2500$ and $k \leq 5$. Moreover, this reduction in the training error translates into an average $2\%$--$50\%$ reduction in the test set error compared with alternating minimization-based methods.

2211.01720 2026-03-12 eess.SY cs.SY math.ST stat.TH

Response time central-limit and failure rate estimation for stationary periodic rate monotonic real-time systems

Kevin Zagalo, Avner Bar-Hen

Comments submitted to IEEE Journal

English abstract

Real-time systems consist of a set of tasks, a scheduling policy, and a system architecture, all constrained by timing requirements. Many everyday embedded systems, within devices such as airplanes, cars, trains, and space probes, operate as real-time systems. To ensure safe failure rates, response times (the time required for the execution of a task) must be bounded. Rate Monotonic real-time systems prioritize tasks according to their arrival rate. This paper focuses on the use of the central limit of response times built in \cite{zagalo2022} and an approximation of their distribution with an inverse Gaussian mixture distribution. The distribution parameters and their associated failure rates are estimated through a suitable re-parameterization of the inverse Gaussian distribution and an adapted Expectation-Maximization algorithm. Extensive simulations demonstrate that the method is well-suited for the approximation of failure rates. We discuss the extension of this method to a chi-squared independence test adapted to real-time systems.

2207.02943 2026-03-12 econ.EM stat.ME

Degrees of Freedom and Information Criteria for the Synthetic Control Method

Guillaume Allaire Pouliot, Zhen Xie, Ziyi Liu

English abstract

We provide an analytical characterization of the model flexibility of the synthetic control method (SCM) in the familiar form of degrees of freedom. We obtain estimable information criteria, which may be used to circumvent cross-validation when selecting either the tuning parameter in penalized variants of SCM or the weighting matrix in the SCM with covariates. We assess the impact of car license rationing in Tianjin; while a natural match is available, both it and other donors are noisy, inviting the use of SCM to average over approximately matching donors. The very large number of candidate donors calls for penalized variants of SCM and we observe that model selection using information criteria outperforms that based on cross-validation.

2110.07583 2026-03-12 math.ST cs.LG quant-ph stat.TH

Near optimal sample complexity for matrix and tensor normal models via geodesic convexity

Cole Franks, Rafael Oliveira, Akshay Ramachandran, Michael Walter

Comments 76 pages, accepted in Annals of Statistics

Journal ref
Annals of Statistics 54 (1), 93-119 (2026)
English abstract

The matrix normal model, i.e., the family of Gaussian matrix-variate distributions whose covariance matrices are the Kronecker product of two lower dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor normal models. For the above models, we show that the maximum likelihood estimator (MLE) achieves nearly optimal nonasymptotic sample complexity and nearly tight error rates in the Fisher-Rao and Thompson metrics. In contrast to prior work, our results do not rely on the factors being well-conditioned or sparse, nor do we need to assume an accurate enough initial guess. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and for overall covariance matrix are minimax optimal up to constant factors provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that the flip-flop algorithm, a practical and widely used iterative procedure to compute the MLE, converges linearly with high probability. Our main technical insight is that, given enough samples, the negative log-likelihood function is strongly geodesically convex in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels.

1809.08801 2026-03-12 stat.AP cs.CV

Beyond Binomial and Negative Binomial: Adaptation in Bernoulli Parameter Estimation

Safa C. Medin, John Murray-Bruce, David Castañón, Vivek K Goyal

Comments 13 pages, 16 figures

Journal ref
IEEE Trans. Computational Imaging, vol. 5, no. 4, pp. 570-584, December 2019
English abstract

Estimating the parameter of a Bernoulli process arises in many applications, including photon-efficient active imaging where each illumination period is regarded as a single Bernoulli trial. Motivated by acquisition efficiency when multiple Bernoulli processes are of interest, we formulate the allocation of trials under a constraint on the mean as an optimal resource allocation problem. An oracle-aided trial allocation demonstrates that there can be a significant advantage from varying the allocation for different processes and inspires a simple trial allocation gain quantity. Motivated by realizing this gain without an oracle, we present a trellis-based framework for representing and optimizing stopping rules. Considering the convenient case of Beta priors, three implementable stopping rules with similar performances are explored, and the simplest of these is shown to asymptotically achieve the oracle-aided trial allocation. These approaches are further extended to estimating functions of a Bernoulli parameter. In simulations inspired by realistic active imaging scenarios, we demonstrate significant mean-squared error improvements: up to 4.36 dB for the estimation of p and up to 1.80 dB for the estimation of log p.
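
To make the "Beta priors plus stopping rule" setting concrete, the sketch below shows the conjugate Beta update for a Bernoulli parameter together with one simple, hypothetical stopping rule (stop once the posterior variance falls below a threshold). This rule is an assumption chosen for illustration; it is not claimed to be one of the three rules studied in the paper, and the trellis-based optimization is not reproduced.

```python
def beta_posterior(successes, failures, a=1.0, b=1.0):
    """Conjugate update: a Beta(a, b) prior becomes Beta(a + successes, b + failures)."""
    return a + successes, b + failures

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

def run_until_confident(trials, var_threshold, a=1.0, b=1.0):
    """Consume Bernoulli outcomes until the posterior variance is small enough.

    Hypothetical stopping rule for illustration only. Returns the posterior
    mean estimate of p and the number of trials actually used.
    """
    used = 0
    for outcome in trials:
        if beta_mean_var(a, b)[1] <= var_threshold:
            break
        if outcome:
            a += 1.0
        else:
            b += 1.0
        used += 1
    return beta_mean_var(a, b)[0], used

if __name__ == "__main__":
    dark_pixel = [0] * 50           # made-up outcomes: no detections
    mixed_pixel = [1, 0, 1, 0] * 50  # made-up outcomes: detections half the time
    print(run_until_confident(dark_pixel, var_threshold=0.01))
    print(run_until_confident(mixed_pixel, var_threshold=0.01))
```

Under such a rule, processes whose parameter is near 0 or 1 stop earlier than those near 1/2, which is the intuition behind allocating different numbers of trials to different processes.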

1710.03181 2026-03-12 stat.AP

Bayesian analysis of 210Pb dating

Marco A Aquino-López, Maarten Blaauw, J Andrés Christen, Nicole K. Sanderson

Comments 22 Pages, 4 Figures

English abstract

In many studies of environmental change of the past few centuries, 210Pb dating is used to obtain chronologies for sedimentary sequences. One of the most commonly used approaches to estimate the ages of depths in a sequence is to assume a constant rate of supply (CRS) or influx of 'unsupported' 210Pb from the atmosphere, together with a constant or varying amount of 'supported' 210Pb. Current 210Pb dating models do not use a proper statistical framework and thus provide poor estimates of errors. Here we develop a new model for 210Pb dating, where both ages and values of supported and unsupported 210Pb form part of the parameters. We apply our model to a case study from Canada as well as to some simulated examples. Our model can extend beyond the current CRS approach, deal with asymmetric errors and mix 210Pb with other types of dating, thus obtaining more robust, realistic and statistically better defined estimates.
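
For context, the classical CRS calculation that the abstract contrasts itself with can be sketched as follows: the age at a depth is (1/λ)·ln(A(0)/A(x)), where A(0) is the total unsupported 210Pb inventory and A(x) is the inventory remaining below that depth. This is the conventional point-estimate formula, not the Bayesian model proposed in the paper; the half-life value, the function name, and the example inventories are assumptions.

```python
import math

PB210_HALF_LIFE_YEARS = 22.3  # commonly quoted value; treated here as an assumption
DECAY_CONSTANT = math.log(2) / PB210_HALF_LIFE_YEARS

def crs_ages(layer_inventories):
    """Classical CRS ages (years) at the top of each sediment layer.

    layer_inventories lists the unsupported 210Pb inventory of each layer,
    ordered from the surface downwards, all in the same units and all positive.
    """
    total = sum(layer_inventories)
    remaining = total  # inventory at and below the current layer top
    ages = []
    for inventory in layer_inventories:
        ages.append(math.log(total / remaining) / DECAY_CONSTANT)
        remaining -= inventory
    return ages

if __name__ == "__main__":
    # Made-up inventories: half of the total sits in the top layer.
    print(crs_ages([50.0, 25.0, 25.0]))
```

The formula returns a single age per depth with no uncertainty attached, which is the limitation the abstract points to when it says current models provide poor estimates of errors.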

1605.08887 2026-03-12 q-bio.GN stat.AP stat.ME

Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies

Wei Jiang, Weichuan Yu

English abstract

In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze data sets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous data sets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical data sets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html.

1508.06715 2026-03-12 q-bio.GN stat.AP stat.ME

Estimating Reproducibility in Genome-Wide Association Studies

Wei Jiang, Jing-Hao Xue, Weichuan Yu

English abstract

Genome-wide association studies (GWAS) are widely used to discover genetic variants associated with diseases. To control false positives, all findings from GWAS need to be verified with additional evidence, even for associations discovered in a high-power study. A replication study, which uses independent samples, is a common verification method. An association is regarded as a true positive with high confidence when it can be identified in both the primary study and the replication study. Currently, there is no systematic study of the behavior of positives in the replication study when the positive results of the primary study are taken as prior information. In this paper, two probabilistic measures named Reproducibility Rate (RR) and False Irreproducibility Rate (FIR) are proposed to quantitatively describe the behavior of primary positive associations (i.e. positive associations identified in the primary study) in the replication study. RR is a conditional probability measuring how likely a primary positive association is to also be positive in the replication study. This can be used to guide the design of the replication study and to check the consistency between the results of the primary study and those of the replication study. FIR, in contrast, measures how likely a primary positive association is to still be a true positive even when it is negative in the replication study. This can be used to generate a list of potentially true associations among the irreproducible findings for further scrutiny. Estimation methods for these two measures are given. Simulation results and real experiments show that our estimation methods have high accuracy and good prediction performance.

1406.1761 2026-03-12 stat.AP

Photon-Efficient Computational 3D and Reflectivity Imaging with Single-Photon Detectors

Dongeek Shin, Ahmed Kirmani, Vivek K Goyal, Jeffrey H. Shapiro

Comments 11 pages, 8 figures

Journal ref
IEEE Trans. Computational Imaging, vol. 1, no. 2, pp. 112-125, June 2015
English abstract

Capturing depth and reflectivity images at low light levels from active illumination of a scene has wide-ranging applications. Conventionally, even with single-photon detectors, hundreds of photon detections are needed at each pixel to mitigate Poisson noise. We develop a robust method for estimating depth and reflectivity using on the order of 1 detected photon per pixel averaged over the scene. Our computational imager combines physically accurate single-photon counting statistics with exploitation of the spatial correlations present in real-world reflectivity and 3D structure. Experiments conducted in the presence of strong background light demonstrate that our computational imager is able to accurately recover scene depth and reflectivity, while traditional maximum-likelihood based imaging methods lead to estimates that are highly noisy. Our framework increases photon efficiency 100-fold over traditional processing and also improves, somewhat, upon first-photon imaging under a total acquisition time constraint in raster-scanned operation. Thus our new imager will be useful for rapid, low-power, and noise-tolerant active optical imaging, and its fixed dwell time will facilitate parallelization through use of a detector array.