arXivDaily: arXiv daily academic digest, updated Monday to Friday
2603.08676 2026-03-10 stat.ML cs.LG stat.CO

Momentum SVGD-EM for Accelerated Maximum Marginal Likelihood Estimation

Adam Rozzio, Rafael Athanasiades, O. Deniz Akyildiz

Comments Accepted to AISTATS 2026

Maximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisation (EM) algorithm admits a natural interpretation as a coordinate descent method over the joint space of model parameters and probability measures. Recently, a significant body of work has adopted this perspective, leading to interacting particle algorithms for MMLE. In this paper, we propose an accelerated version of one such procedure, based on Stein variational gradient descent (SVGD), by introducing Nesterov acceleration in both the parameter updates and in the space of probability measures. The resulting method, termed Momentum SVGD-EM, consistently accelerates convergence in terms of required iterations across various tasks of increasing difficulty, demonstrating effectiveness in both low- and high-dimensional settings.
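
For readers unfamiliar with the building block this abstract refers to, the following is a minimal sketch of a plain SVGD particle update in one dimension. It is not the paper's Momentum SVGD-EM: there is no momentum and no parameter (EM) update. The standard normal target, the fixed RBF bandwidth, the step size, and all function names are assumptions chosen only for illustration.

```python
import math


def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0):
    """One plain SVGD update with an RBF kernel (1-D, no momentum)."""
    n = len(particles)
    h2 = bandwidth ** 2
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * h2))
            attraction = k * grad_log_p(xj)  # pulls particles toward high density
            repulsion = k * (xi - xj) / h2   # d/dxj k(xj, xi): keeps particles apart
            phi += attraction + repulsion
        updated.append(xi + step * phi / n)
    return updated


def run_svgd_demo(n_particles=30, n_steps=500):
    # Assumed target: standard normal, so grad log p(x) = -x.
    # Particles start far from the target, evenly spaced on [-5, -3].
    particles = [-5.0 + 2.0 * i / (n_particles - 1) for i in range(n_particles)]
    for _ in range(n_steps):
        particles = svgd_step(particles, lambda x: -x)
    mean = sum(particles) / n_particles
    var = sum((x - mean) ** 2 for x in particles) / n_particles
    return mean, var
```

Calling `run_svgd_demo()` returns the empirical mean and variance of the particles after the updates, which can be compared with the target's mean 0 and variance 1.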

2603.08607 2026-03-10 stat.ME stat.AP

RESAPLE: An Approximate One-Step Restricted Likelihood Estimator of Spatial Dependence for Exploratory Spatial Analysis

Aditya Khan, Meredith Franklin

Diagnostics such as Moran's index and approximate profile likelihood-based estimators (APLE) for Gaussian spatial autoregressive models are widely used in exploratory data analysis to assess the strength of spatial dependence. Yet, although Moran's index is often applied to regression residuals, and APLE is typically formulated for raw outcomes, neither is explicitly constructed as an estimator of residual spatial dependence after adjustment for large-scale trends and covariates. We propose RESAPLE, a one-step approximate restricted maximum likelihood (REML) estimator of the spatial error model's spatial dependence parameter $ρ$, constructed from REML residuals. Because RESAPLE is a Rayleigh coefficient, it retains the interpretability and diagnostic convenience of exploratory indices, while also providing a computationally inexpensive and accurate estimator of $ρ$ for moderate dependence. We show that for small to medium sample sizes and adequately specified trend models, RESAPLE is a better estimator of, and test statistic for, residual spatial dependence relative to existing alternatives including Moran's index and the APLE across a wide range of practical settings. The theory we develop also yields a diagnostic for spatial weight selection, providing guidance towards resolving a common point of ambiguity in spatial data analysis. We illustrate the method using simulations on both regular and highly irregular lattices with a case study using American Community Survey tract-level data.

2603.08553 2026-03-10 stat.ML cs.LG math.OC q-fin.PM q-fin.RM

Generative Adversarial Regression (GAR): Learning Conditional Risk Scenarios

Saeed Asadi, Jonathan Yu-Meng Li

We propose Generative Adversarial Regression (GAR), a framework for learning conditional risk scenarios through generators aligned with downstream risk objectives. GAR builds on a regression characterization of conditional risk for elicitable functionals, including quantiles, expectiles, and jointly elicitable pairs. We extend this principle from point prediction to generative modeling by training generators whose policy-induced risk matches that of real data under the same context. To ensure robustness across all policies, GAR adopts a minimax formulation in which an adversarial policy identifies worst-case discrepancies in risk evaluation while the generator adapts to eliminate them. This structure preserves alignment with the risk functional across a broad class of policies rather than a fixed, pre-specified set. We illustrate GAR through a tail-risk instantiation based on jointly elicitable $(\mathrm{VaR}, \mathrm{ES})$ objectives. Experiments on S&P 500 data show that GAR produces scenarios that better preserve downstream risk than unconditional, econometric, and direct predictive baselines while remaining stable under adversarially selected policies.

2603.08542 2026-03-10 math.ST cs.DS math.PR stat.TH

Bayesian inference of planted matchings: Local posterior approximation and infinite-volume limit

Zhou Fan, Timothy L. H. Wee, Kaylee Y. Yang

We study Bayesian inference of an unknown matching $π^*$ between two correlated random point sets $\{X_i\}_{i=1}^n$ and $\{Y_i\}_{i=1}^n$ in $[0,1]^d$, under a critical scaling $\|X_i-Y_{π^*(i)}\|_2 \asymp n^{-1/d}$, in both an exact matching model where all points are observed and a partial matching model where a fraction of points may be missing. Restricting to the simplest setting of $d=1$, in this work, we address the questions of (1) whether the posterior distribution over matchings is approximable by a local algorithm, and (2) whether marginal statistics of this posterior have a well-defined limit as $n \to \infty$. We answer both questions affirmatively for partial matching, where a decay-of-correlations arises for large $n$. For exact matching, we show that the posterior is approximable locally only after a global sorting of the points, and that defining a large-$n$ limit of marginal statistics requires a careful indexing of points in the Poisson point process limit of the data, based on a notion of flow. We leave as an open question the extensions of such results to dimensions $d \geq 2$.

2603.08518 2026-03-10 cs.LG stat.ML

Breaking the Bias Barrier in Concave Multi-Objective Reinforcement Learning

Swetha Ganesh, Vaneet Aggarwal

While standard reinforcement learning optimizes a single reward signal, many applications require optimizing a nonlinear utility $f(J_1^π,\dots,J_M^π)$ over multiple objectives, where each $J_m^π$ denotes the expected discounted return of a distinct reward function. A common approach is concave scalarization, which captures important trade-offs such as fairness and risk sensitivity. However, nonlinear scalarization introduces a fundamental challenge for policy gradient methods: the gradient depends on $\partial f(J^π)$, while in practice only empirical return estimates $\hat J$ are available. Because $f$ is nonlinear, the plug-in estimator is biased ($\mathbb{E}[\partial f(\hat J)] \neq \partial f(\mathbb{E}[\hat J])$), leading to persistent gradient bias that degrades sample complexity. In this work we identify and overcome this bias barrier in concave-scalarized multi-objective reinforcement learning. We show that existing policy-gradient methods suffer an intrinsic $\widetilde{\mathcal{O}}(ε^{-4})$ sample complexity due to this bias. To address this issue, we develop a Natural Policy Gradient (NPG) algorithm equipped with a multi-level Monte Carlo (MLMC) estimator that controls the bias of the scalarization gradient while maintaining low sampling cost. We prove that this approach achieves the optimal $\widetilde{\mathcal{O}}(ε^{-2})$ sample complexity for computing an $ε$-optimal policy. Furthermore, we show that when the scalarization function is second-order smooth, the first-order bias cancels automatically, allowing vanilla NPG to achieve the same $\widetilde{\mathcal{O}}(ε^{-2})$ rate without MLMC. Our results provide the first optimal sample complexity guarantees for concave multi-objective reinforcement learning under policy-gradient methods.
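
The plug-in bias described in this abstract can be seen in a small Monte Carlo sketch. The setup is an assumption for illustration only, not the paper's experiment: a single objective, the concave utility $f(J)=\log J$ (so $\partial f(J)=1/J$), and returns drawn from an exponential distribution with mean 1. For this choice the expected plug-in gradient is $n/(n-1)$ when $\hat J$ averages $n$ samples, whereas the true gradient is 1.

```python
import random


def plug_in_gradient_bias(n_samples=5, n_trials=20000, seed=0):
    """Compare E[f'(J_hat)] with f'(E[J_hat]) for f = log (illustrative setup)."""
    rng = random.Random(seed)
    true_return = 1.0  # E[J_hat] for exponential returns with rate 1
    total = 0.0
    for _ in range(n_trials):
        # Empirical return estimate from a small batch of sampled returns.
        j_hat = sum(rng.expovariate(1.0) for _ in range(n_samples)) / n_samples
        total += 1.0 / j_hat  # plug-in gradient f'(J_hat)
    mean_plug_in = total / n_trials
    true_gradient = 1.0 / true_return  # f'(E[J_hat])
    return mean_plug_in, true_gradient
```

With the default `n_samples=5`, the first returned value should sit near 1.25 rather than 1, and the gap does not shrink as `n_trials` grows; it shrinks only as `n_samples` grows.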

2603.08495 2026-03-10 cs.LG stat.ML

Efficient Credal Prediction through Decalibration

Paul Hofman, Timo Löhr, Maximilian Muschalik, Yusuf Sale, Eyke Hüllermeier

A reliable representation of uncertainty is essential for the application of modern machine learning methods in safety-critical settings. In this regard, the use of credal sets (i.e., convex sets of probability distributions) has recently been proposed as a suitable approach to representing epistemic uncertainty. However, as with other approaches to epistemic uncertainty, training credal predictors is computationally complex and usually involves (re-)training an ensemble of models. The resulting computational complexity prevents their adoption for complex models such as foundation models and multi-modal systems. To address this problem, we propose an efficient method for credal prediction that is grounded in the notion of relative likelihood and inspired by techniques for the calibration of probabilistic classifiers. For each class label, our method predicts a range of plausible probabilities in the form of an interval. To produce the lower and upper bounds of these intervals, we propose a technique that we refer to as decalibration. Extensive experiments show that our method yields credal sets with strong performance across diverse tasks, including coverage-efficiency evaluation, out-of-distribution detection, and in-context learning. Notably, we demonstrate credal prediction on models such as TabPFN and CLIP -- architectures for which the construction of credal sets was previously infeasible.

2603.08377 2026-03-10 cs.LG stat.ML

Beyond the Markovian Assumption: Robust Optimization via Fractional Weyl Integrals in Imbalanced Data

Gustavo A. Dorrego

Comments 5 pages, 3 figures

Standard Gradient Descent and its modern variants assume local, Markovian weight updates, making them highly susceptible to noise and overfitting. This limitation becomes critically severe in extremely imbalanced datasets such as financial fraud detection where dominant class gradients systematically overwrite the subtle signals of the minority class. In this paper, we introduce a novel optimization algorithm grounded in Fractional Calculus. By isolating the core memory engine of the generalized fractional derivative, the Weighted Fractional Weyl Integral, we replace the instantaneous gradient with a dynamically weighted historical sequence. This fractional memory operator acts as a natural regularizer. Empirical evaluations demonstrate that our method prevents overfitting in medical diagnostics and achieves an approximately 40 percent improvement in PR-AUC over classical optimizers in financial fraud detection, establishing a robust bridge between pure fractional topology and applied Machine Learning.

2603.08370 2026-03-10 stat.ML cs.IR cs.LG stat.ME

Unifying On- and Off-Policy Variance Reduction Methods

Olivier Jeunen

Continuous and efficient experimentation is key to the practical success of user-facing applications on the web, both through online A/B-tests and off-policy evaluation. Despite their shared objective -- estimating the incremental value of a treatment -- these domains often operate in isolation, utilising distinct terminologies and statistical toolkits. This paper bridges that divide by establishing a formal equivalence between their canonical variance reduction methods. We prove that the standard online Difference-in-Means estimator is mathematically identical to an off-policy Inverse Propensity Scoring estimator equipped with an optimal (variance-minimising) additive control variate. Extending this unification, we demonstrate that widespread regression adjustment methods (such as CUPED, CUPAC, and ML-RATE) are structurally equivalent to Doubly Robust estimation. This unified view extends our understanding of commonly used approaches, and can guide practitioners and researchers working on either class of problems.

2603.08353 2026-03-10 math.ST stat.TH

Limiting Spectral Distribution of moderately large Kendall's correlation matrix and its application

Raunak Shevade, Monika Bhattacharjee

Comments 25 pages, Submitted to journal

We establish the limiting spectral distribution of Kendall's correlation matrices in the moderate high-dimensional regime where the dimension grows slower than the sample size. Our framework allows observations to be independent but not necessarily identically distributed, and accommodates both discrete and continuous data. Unlike existing results developed under i.i.d. observations, our approach remains valid under substantial distributional heterogeneity and also covers certain i.i.d. models beyond previously studied settings. Under mild symmetry and convergence conditions on some traces, we prove that the empirical spectral distribution of a properly centered and scaled Kendall's correlation matrix converges weakly almost surely to a deterministic, generally model-dependent limit. The analysis clarifies how distributional heterogeneity influences the limiting spectrum. As an application, we propose a graphical tool for detecting dependence among components in high-dimensional data and show that ignoring heterogeneity may lead to spurious detection of dependence.

2603.08349 2026-03-10 cs.LG cs.AI stat.ML

Towards plausibility in time series counterfactual explanations

Marcin Kostrzewa, Krzysztof Galus, Maciej Zięba

We present a new method for generating plausible counterfactual explanations for time series classification problems. The approach performs gradient-based optimization directly in the input space. To enforce plausibility, we integrate soft-DTW (dynamic time warping) alignment with $k$-nearest neighbors from the target class, which effectively encourages the generated counterfactuals to adopt a realistic temporal structure. The overall optimization objective is a multi-faceted loss function that balances key counterfactual properties. It incorporates losses for validity, sparsity, and proximity, alongside the novel soft-DTW-based plausibility component. We conduct an evaluation of our method against several strong reference approaches, measuring the key properties of the generated counterfactuals across multiple dimensions. The results demonstrate that our method achieves competitive performance in validity while significantly outperforming existing approaches in distributional alignment with the target class, indicating superior temporal realism. Furthermore, a qualitative analysis highlights the critical limitations of existing methods in preserving realistic temporal structure. This work shows that the proposed method consistently generates counterfactual explanations for time series classifiers that are not only valid but also highly plausible and consistent with temporal patterns.
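
As background for the soft-DTW component mentioned in this abstract, here is a minimal sketch of the standard soft-DTW recursion for two univariate series. It covers only the alignment cost. The paper's full loss, its $k$-nearest-neighbor selection, and its gradient-based optimization are not reproduced, and the function names and squared-difference cost are assumptions.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    m = min(values)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW cost between two 1-D sequences with squared-difference cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # r[i][j] holds the soft alignment cost of x[:i] against y[:j].
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            r[i][j] = cost + soft_min(
                (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]), gamma
            )
    return r[n][m]
```

As `gamma` approaches zero the value approaches the classical DTW cost. Larger `gamma` gives a smoother, differentiable objective that can fall below the hard DTW cost.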

2603.08345 2026-03-10 stat.ME q-bio.QM

Amortized Phylodynamic Inference with Neural Bayes Estimators and Recursive Neural Networks

Alexander E. Zarebski, Thomas Williams, Louis du Plessis

Phylodynamics is used to estimate epidemic dynamics from phylogenetic trees or genomic sequences of pathogens, but the likelihood calculations needed can be challenging for complex models. We present a neural Bayes estimator (NBE) for key epidemic quantities: the reproduction number, prevalence, and cumulative infections through time. By performing quantile regression over tree space, the NBE allows us to estimate posterior medians and credible intervals directly from a reconstructed tree. Our approach uses a recursive neural network as a tree embedding network with a prediction network conditioned on time and quantile level to generate the estimates. In simulation studies, the NBE achieves good predictive performance, with conservative uncertainty estimates. Compared with a BEAST2 fixed-tree analysis, the NBE gives less biased estimates of time-varying reproduction numbers in our test setting. Under a misspecified sampling model, the NBE performance degrades (as expected) but remains reasonable, and fine-tuning a pre-trained model yields estimates comparable to those from a model trained from scratch, at substantially lower computational cost.

2603.08320 2026-03-10 math.ST math.PR stat.TH

Size-Location Correlation for Set-Valued Processes: Theory, Estimation, and Laws of Large Numbers under $ρ$-Mixing

Tuyen Luc Tri

Comments 47 pages, 2 figures

We propose a variational framework for analyzing dependence structures of convex compact random sets based on their support functions. The approach relies on the canonical even-odd decomposition on the unit sphere, which separates size-related and location-related components and induces an exact orthogonality in the sphere $L^2(σ)$ space. This decomposition yields an additive variance-covariance structure that is intrinsic to set-valued data and cannot be recovered from point-based or selection-based representations. Within this framework, we introduce size, location, and total covariance and correlation indices for random sets, together with compatible $ρ$-mixing coefficients for set-valued processes. The resulting dependence measures are geometrically interpretable, invariant under translations, and free of degeneracies that arise for centrally symmetric sets under classical approaches. Weak and strong laws of large numbers are established under weak stationarity, providing asymptotic stability of Minkowski averages in the $L^2(σ)$ support-function norm. The proposed quantities admit natural numerical realizations via directional Monte Carlo and spherical designs. Applications to interval-valued and convex-valued data, including regression with set-valued responses, illustrate how the even-odd decomposition disentangles directional location dependence from size effects beyond what can be captured by finite-dimensional summaries such as the Steiner point.

2603.08311 2026-03-10 math.ST cs.LG stat.TH

Sign Identifiability of Causal Effects in Stationary Stochastic Dynamical Systems

Gijs van Seeventer, Saber Salehkaleybar

We study identifiability in continuous-time linear stationary stochastic differential equations with known causal structure. Unlike existing approaches, we relax the assumption of a known diffusion matrix, thereby respecting the model's intrinsic scale invariance. Rather than recovering drift coefficients themselves, we introduce edge-sign identifiability: for a given causal structure, we ask whether the sign of a given drift entry is uniquely determined across all observational covariance matrices induced by parametrizations compatible with that structure. Under a notion of faithfulness, we derive criteria for characterising identifiability, non-identifiability, and partial identifiability for general graphs. Applying our criteria to specific causal structures, both analogous to classical causal settings (e.g., instrumental variables) and novel cyclic settings, we determine their edge-sign identifiability and, in some cases, obtain explicit expressions for the sign of a target edge in terms of the observational covariance matrix.

2603.08287 2026-03-10 stat.ML cs.LG

Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces

Hamish Flynn, Joe Watson, Ingmar Posner, Jan Peters

Comments 37 pages, 8 figures

We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is an effective heuristic for decision-making under uncertainty that has been used to develop successful algorithms for a variety of continuous control problems. However, theoretical work on GP-PSRL is limited. All known regret bounds either fail to achieve a tight dependence on a kernel-dependent quantity called the maximum information gain or fail to properly account for the fact that the set of possible system states is unbounded. Through a recursive application of the Borell-Tsirelson-Ibragimov-Sudakov inequality, we show that, with high probability, the states actually visited by the algorithm are contained within a ball of near-constant radius. To obtain tight dependence on the maximum information gain, we use the chaining method to control the regret suffered by GP-PSRL. Our main result is a Bayesian regret bound of the order $\widetilde{\mathcal{O}}(H^{3/2}\sqrt{γ_{T/H} T})$, where $H$ is the horizon, $T$ is the number of time steps and $γ_{T/H}$ is the maximum information gain. With this result, we resolve the limitations with prior theoretical work on PSRL, and provide the theoretical foundation and tools for analyzing PSRL in complex settings.

2603.08285 2026-03-10 stat.ME

An objective non-local prior for skew-symmetric models

F. J. Rubio

Comments R code and real data available at: https://github.com/FJRubio67/MOOMIN

We propose an objective non-local prior for testing symmetry against skew-symmetric alternatives. The prior is derived through a formal construction rule by assigning a uniform distribution to a discrepancy-based measure of the shape parameter's effect. This approach avoids the need for user-specified hyperparameters and produces a weakly informative prior tailored to the skew-symmetric family. We illustrate the use of the proposed prior in the context of testing normality against skew-normal alternatives through both a simulation study and a real-data application.

2603.08257 2026-03-10 stat.ML cs.LG

Beyond ReinMax: Low-Variance Gradient Estimators for Discrete Latent Variables

Daniel Wang, Thang D. Bui

Machine learning models involving discrete latent variables require gradient estimators to facilitate backpropagation in a computationally efficient manner. The most recent addition to the Straight-Through family of estimators, ReinMax, can be viewed from a numerical ODE perspective as incorporating an approximation via Heun's method to reduce bias, but at the cost of high variance. In this work, we introduce the ReinMax-Rao and ReinMax-CV estimators which incorporate Rao-Blackwellisation and control variate techniques into ReinMax to reduce its variance. Our estimators demonstrate superior performance on training variational autoencoders with discrete latent spaces. Furthermore, we investigate the possibility of leveraging alternative numerical methods for constructing more accurate gradient approximations and present an alternative view of ReinMax from a simpler numerical integration perspective.

2603.08242 2026-03-10 cs.LG stat.AP

Optimising antibiotic switching via forecasting of patient physiology

Magnus Ross, Nel Swanepoel, Akish Luintel, Emma McGuire, Ingemar J. Cox, Steve Harris, Vasileios Lampos

Comments 32 pages, 8 figures

Timely transition from intravenous (IV) to oral antibiotic therapy shortens hospital stays, reduces catheter-related infections, and lowers healthcare costs, yet one in five patients in England remain on IV antibiotics despite meeting switching criteria. Clinical decision support systems can improve switching rates, but approaches that learn from historical decisions reproduce the delays and inconsistencies of routine practice. We propose using neural processes to model vital sign trajectories probabilistically, predicting switch-readiness by comparing forecasts against clinical guidelines rather than learning from past actions, and ranking patients to prioritise clinical review. The design yields interpretable outputs, adapts to updated guidelines without retraining, and preserves clinical judgement. Validated on MIMIC-IV (US intensive care, 6,333 encounters) and UCLH (a large urban academic UK hospital group, 10,584 encounters), the system selects 2.2-3.2$\times$ more relevant patients than random. Our results demonstrate that forecasting patient physiology offers a principled foundation for decision support in antibiotic stewardship.

2603.06759 2026-03-10 math.ST stat.TH

A New Estimator of Kullback-Leibler Divergence via Shannon Entropy

Mehmet Siddik Cadirci, Martin Singull

Comments 20 pages, 6 figures, 2 tables

We examine the estimation of the Kullback-Leibler (KL) divergence and the use of the goodness-of-fit test for multivariate continuous distributions. Our starting point is the maximum entropy principle for Shannon entropy: among all distributions with a fixed mean vector and covariance matrix, the multivariate Gaussian distributions uniquely maximize entropy. As a result, the KL divergence from a moment-matched Gaussian distribution to an unknown density can then be written as the entropy difference, which is a suitable information-theoretic measure of divergence from the Gaussian distribution. To estimate, we use $k$-nearest neighbor (kNN) estimators based on Shannon entropy and KL divergence derived from the Kozachenko-Leonenko approach and subsequent improvements, along with the consistency and $L^{2}$-convergence results established for these estimators. Motivated by previous entropy-based goodness-of-fit ideas developed for Rényi-type functionals under generalized Gaussian and Student-type models, we describe a KL-based test statistic as being the difference between (i) the entropy of a Gaussian model fitted to the sample mean and covariance and (ii) the KL divergence between the unknown entropy and the kNN estimate. The statistic converges to zero under multivariate normality and converges to a strictly positive bound under non-Gaussian alternatives. Results from Monte Carlo simulations on various dimensions and sample sizes indicate that the proposed procedure achieves accurate Type I error control and accurate, generally superior power compared to conventional multivariate tests of normality, particularly at medium and high dimensions.
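
The entropy-difference identity mentioned in this abstract follows from a short standard argument, sketched here. The notation is an assumption, not taken from the paper: $p$ is the unknown density on $\mathbb{R}^d$ with mean $\mu$ and covariance $\Sigma$, $g$ is the Gaussian density with the same $\mu$ and $\Sigma$, and $H$ denotes Shannon entropy. Because $\log g$ is a quadratic function of $x$ and $p$ and $g$ share their first two moments, $\int p \log g = \int g \log g$.

```latex
\begin{aligned}
D_{\mathrm{KL}}(p \,\|\, g)
  &= \int p(x) \log \frac{p(x)}{g(x)} \, dx
   = -H(p) - \int p(x) \log g(x) \, dx \\
  &= -H(p) - \int g(x) \log g(x) \, dx
   = H(g) - H(p), \\
H(g) &= \tfrac{1}{2} \log\!\left( (2\pi e)^{d} \det \Sigma \right).
\end{aligned}
```

Since the KL divergence is nonnegative and vanishes only when $p = g$ almost everywhere, the difference $H(g) - H(p)$ is zero exactly in the Gaussian case and positive otherwise.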

2602.20640 2026-03-10 math.ST stat.ML stat.TH

Scalable multitask Gaussian processes for complex mechanical systems with functional covariates

Razak Christophe Sabi Gninkou, Andrés F. López-Lopera, Franck Massa, Rodolphe Le Riche

Functional covariates arise in many scientific and engineering applications when model inputs take the form of time-dependent or spatially distributed profiles, such as varying boundary conditions or changing material behaviours. In addition, new practices in digital simulation require predictions accompanied by confidence intervals. Models based on Gaussian processes (GPs) provide principled uncertainty quantification. However, GPs capable of jointly handling functional covariates and multiple correlated functional tasks remain largely under-explored. In this work, we extend the framework of GPs with functional covariates to multitask problems by introducing a fully separable kernel structure that captures dependencies across tasks and functional inputs. By taking advantage of the Kronecker structure of the covariance matrix, the model is made scalable. The proposed model is validated on a synthetic benchmark and applied to a realistic structure, a riveted assembly with functional descriptions of the material behaviour and response forces. The proposed functional multitask GP significantly improves over single-task GPs. For the riveted assembly, it requires fewer than 100 samples to produce an accurate mean and confidence interval prediction. Despite its larger number of parameters, the multitask GP is computationally easier to learn than its single-task counterpart.
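
To illustrate why a Kronecker-structured covariance helps scalability, the standard identities below show how a linear solve factorizes. This is a sketch under assumptions that may differ from the paper's model: a noise-free covariance of pure Kronecker form $K_T \otimes K_X$, with $K_T$ a symmetric $T \times T$ task covariance, $K_X$ an $N \times N$ input covariance, $A$ and $Y$ being $N \times T$ matrices, and $\operatorname{vec}$ stacking columns.

```latex
\begin{aligned}
(K_T \otimes K_X)\,\operatorname{vec}(A)
  &= \operatorname{vec}\!\left( K_X A K_T^{\top} \right), \\
(K_T \otimes K_X)^{-1} \operatorname{vec}(Y)
  &= \operatorname{vec}\!\left( K_X^{-1} Y K_T^{-1} \right).
\end{aligned}
```

Under these assumptions the solve costs on the order of $N^3 + T^3$ operations for the two factorizations, instead of $(NT)^3$ for the full $NT \times NT$ matrix.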

2602.10760 2026-03-10 math.ST stat.TH

Covariate-Adaptive Randomization in Clinical Trials without Inflated Variances

Zhang Li-Xin

Comments 30 pages

Covariate-adaptive randomization (CAR) procedures are extensively used to reduce the likelihood of covariate imbalances occurring in clinical trials. In the literature, many CAR procedures have been proposed so that the specified covariates are balanced well between treatments. However, the variance of the imbalance of the unspecified covariates may be inflated compared to that under simple randomization. This variance inflation invalidates the usual test of treatment effects, and adjusting the test is not easy. In this paper, we propose a new kind of covariate-adaptive randomization procedure to balance covariates between two treatments with a ratio $ρ:(1-ρ)$. Under this kind of CAR procedure, the convergence rate of the imbalance of the specified covariates is $o(n^{1/2})$, and at the same time the asymptotic variance of the imbalance of any unspecified (observed or unobserved) covariates does not exceed that under simple randomization. The "shift problem" found by Liu, Hu, and Ma (2025) will not appear under the new CAR procedures.

2601.02241 2026-03-10 stat.ML cs.LG

From Mice to Trains: Amortized Bayesian Inference on Graph Data

Svenja Jedhoff, Elizaveta Semenova, Aura Raulo, Anne Meyer, Paul-Christian Bürkner

Graphs arise across diverse domains, from biology and chemistry to social and information networks, as well as in transportation and logistics. Inference on graph-structured data requires methods that are permutation-invariant, scalable across varying sizes and sparsities, and capable of capturing complex long-range dependencies, making posterior estimation on graph parameters particularly challenging. Amortized Bayesian Inference (ABI) is a simulation-based framework that employs generative neural networks to enable fast, likelihood-free posterior inference. We adapt ABI to graph data to address these challenges to perform inference on node-, edge-, and graph-level parameters. Our approach couples permutation-invariant graph encoders with flexible neural posterior estimators in a two-module pipeline: a summary network maps attributed graphs to fixed-length representations, and an inference network approximates the posterior over parameters. In this setting, several neural architectures can serve as the summary network. In this work we evaluate multiple architectures and assess their performance on controlled synthetic settings and two real-world domains - biology and logistics - in terms of recovery and calibration.

2512.24327 2026-03-10 stat.ML cs.CG cs.LG

Topological Spatial Graph Coarsening

Anna Calissano, Etienne Lasalle

Spatial graphs are particular graphs for which the nodes are localized in space (e.g., public transport network, molecules, branching biological structures). In this work, we consider the problem of spatial graph reduction, which aims to find a smaller spatial graph (i.e., with fewer nodes) with the same overall structure as the initial one. In this context, performing the graph reduction while preserving the main topological features of the initial graph is particularly relevant, due to the additional spatial information. Thus, we propose a topological spatial graph coarsening approach based on a new framework that finds a trade-off between the graph reduction and the preservation of the topological characteristics. The coarsening is realized by collapsing short edges. In order to capture the topological information required to calibrate the reduction level, we adapt the construction of classical topological descriptors made for point clouds (the so-called persistence diagrams) to spatial graphs. This construction relies on the introduction of a new filtration called triangle-aware graph filtration. Our coarsening approach is parameter-free and we prove that it is equivariant under rotations, translations and scaling of the initial spatial graph. We evaluate the performance of our method on synthetic and real spatial graphs, and show that it significantly reduces the graph sizes while preserving the relevant topological information.

2511.19905 2026-03-10 math.ST stat.ME stat.TH

Sigmoid-FTRL: Design-Based Adaptive Neyman Allocation for AIPW Estimators

Fangyi Chen, Shu Ge, Jian Qian, Christopher Harshaw

We consider the problem of Adaptive Neyman Allocation for the class of AIPW estimators in a design-based setting, where potential outcomes and covariates are deterministic. As each subject arrives, an adaptive procedure must select both a treatment assignment probability and a pair of linear predictors to be used in the AIPW estimator. Our goal is to construct an adaptive procedure that minimizes the Neyman Regret, which is the difference between the variance of the adaptive procedure and an oracle variance which uses the optimal non-adaptive choice of assignment probabilities and linear predictors. While previous work has drawn insightful connections between Neyman Regret and online convex optimization for the Horvitz-Thompson estimator, one of the central challenges for the AIPW estimator is that the underlying optimization is non-convex. In this paper, we propose Sigmoid-FTRL, an adaptive experimental design which addresses the non-convexity via simultaneous minimization of two convex regrets. We prove that under standard regularity conditions, the Neyman Regret of Sigmoid-FTRL converges at a $T^{-1/2} R$ rate, where $T$ is the number of subjects in the experiment and $R$ is the maximum norm of covariate vectors. Moreover, we show that no adaptive design can improve upon the $T^{-1/2} R$ rate under our regularity conditions, establishing the minimax rate of Neyman Regret. Finally, we establish a central limit theorem and a consistently conservative variance estimator which facilitate the construction of asymptotically valid Wald-type confidence intervals.

2510.04602 2026-03-10 stat.ML cs.AI cs.LG

Wasserstein Gradient Flows for Scalable and Regularized Barycenter Computation

Eduardo Fernandes Montesuma, Yassir Bendou, Mike Gartrell

Comments Under review

Details
English abstract

Wasserstein barycenters provide a principled approach for aggregating probability measures, while preserving the geometry of their ambient space. Existing discrete methods are not scalable as they assume access to the complete set of samples from the input measures. Meanwhile, neural network approaches do scale well, but rely on complex optimization problems and cannot easily incorporate label information. We address these limitations through gradient flows in the space of probability measures. Through time discretization, we achieve a scalable algorithm that i) relies on mini-batch optimal transport, ii) accepts modular regularization through task-aware functions, and iii) seamlessly integrates supervised information into the ground-cost. We empirically validate our approach on domain adaptation benchmarks that span computer vision, neuroscience, and chemical engineering. Our method establishes a new state-of-the-art barycenter solver, with labeled barycenters consistently outperforming unlabeled ones.

2510.04543 2026-03-10 cs.LG stat.ML

The Role of Feature Interactions in Graph-based Tabular Deep Learning

Elias Dubbeldam, Reza Mohammadi, Marit Schoonhoven, S. Ilker Birbil

Comments 12 pages, 5 figures, accepted at TMLR 2026

Details
English abstract

Accurate predictions on tabular data rely on capturing complex, dataset-specific feature interactions. Attention-based methods and graph neural networks, referred to as graph-based tabular deep learning (GTDL), aim to improve predictions by modeling these interactions as a graph. In this work, we analyze how these methods model the feature interactions. Current GTDL approaches primarily focus on optimizing predictive accuracy, often neglecting the accurate modeling of the underlying graph structure. Using synthetic datasets with known ground-truth graph structures, we find that current GTDL methods fail to recover meaningful feature interactions, as their edge recovery is close to random. This suggests that the attention mechanism and message-passing schemes used in GTDL do not effectively capture feature interactions. Furthermore, when we impose the true interaction structure, we find that the predictive accuracy improves. This highlights the need for GTDL methods to prioritize accurate modeling of the graph structure, as it leads to better predictions.

2510.01734 2026-03-10 stat.ME

Stabilizing Thompson Sampling with Null Hypothesis Bayesian Response-Adaptive Randomization

Samuel Pawel, Leonhard Held

Details
English abstract

Response-adaptive randomization (RAR) methods can be used to adapt randomization probabilities based on accumulating data, aiming to increase the probability of allocating patients to effective treatments. A popular RAR method is Thompson sampling, which randomizes patients proportionally to the Bayesian posterior probability that each treatment is the most effective. However, its high variability can also increase the risk of assigning patients to inferior treatments and lead to inferential problems such as confidence interval undercoverage. We propose a principled method based on Bayesian hypothesis testing to address these issues: We introduce a null hypothesis postulating equal effectiveness of treatments. Bayesian model averaging then induces shrinkage toward equal randomization probabilities, with the degree of shrinkage controlled by the prior probability of the null hypothesis. Equal randomization and Thompson sampling arise as special cases when the prior probability is set to one or zero, respectively. Simulated and real-world examples illustrate that the method balances highly variable Thompson sampling with static equal randomization. A simulation study demonstrates that the method can mitigate issues with Thompson sampling and has comparable statistical properties to Thompson sampling with common ad hoc modifications such as power transformation and probability capping. We implement the method in the free and open-source R package brar, enabling experimenters to easily perform null hypothesis Bayesian RAR and support more effective randomization of patients.
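The shrinkage described in this abstract can be illustrated with a minimal sketch. It assumes two arms with binary outcomes, uniform Beta(1, 1) priors, and a point null of equal success probabilities; these modelling choices and all function names are hypothetical and are not taken from the paper or from the brar package.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_arm1_best(s1, f1, s2, f2, draws=20000, seed=0):
    """Monte Carlo estimate of P(p1 > p2 | data) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(1 + s1, 1 + f1) > rng.betavariate(1 + s2, 1 + f2)
        for _ in range(draws)
    )
    return wins / draws


def null_averaged_probability(s1, f1, s2, f2, prior_null):
    """Randomization probability for arm 1, averaged over H0 (equal arms) and H1."""
    # Log marginal likelihoods; binomial coefficients cancel between the models.
    log_m0 = log_beta(1 + s1 + s2, 1 + f1 + f2)  # H0: common success probability
    log_m1 = log_beta(1 + s1, 1 + f1) + log_beta(1 + s2, 1 + f2)  # H1: separate
    shift = max(log_m0, log_m1)  # subtract the maximum for numerical stability
    w0 = prior_null * math.exp(log_m0 - shift)
    w1 = (1.0 - prior_null) * math.exp(log_m1 - shift)
    post_null = w0 / (w0 + w1)
    # Under H0 both arms are equally good, so arm 1 is randomized with probability 1/2.
    return post_null * 0.5 + (1.0 - post_null) * prob_arm1_best(s1, f1, s2, f2)


if __name__ == "__main__":
    # Arm 1: 8 successes, 2 failures; arm 2: 3 successes, 7 failures.
    for prior in (0.0, 0.5, 1.0):
        print(prior, null_averaged_probability(8, 2, 3, 7, prior))
```

In this sketch, a prior probability of one for the null returns equal randomization and a prior probability of zero returns the plain Thompson sampling probability, matching the two special cases named in the abstract.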

2509.26429 2026-03-10 stat.ML cs.LG

An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes

Emil Javurek, Valentyn Melnychuk, Jonas Schweisthal, Konstantin Hess, Dennis Frauen, Stefan Feuerriegel

Comments Published as a conference paper at ICLR 2026

Details
English abstract

Predicting individualized potential outcomes in sequential decision-making is central for optimizing therapeutic decisions in personalized medicine (e.g., which dosing sequence to give to a cancer patient). However, predicting potential outcomes over long horizons is notoriously difficult. Existing methods that break the curse of the horizon typically lack strong theoretical guarantees such as orthogonality and quasi-oracle efficiency. In this paper, we revisit the problem of predicting individualized potential outcomes in sequential decision-making (i.e., estimating Q-functions in Markov decision processes with observational data) through a causal inference lens. In particular, we develop a comprehensive theoretical foundation for meta-learners in this setting with a focus on beneficial theoretical properties. As a result, we obtain a novel meta-learner called DRQ-learner and establish that it is: (1) doubly robust (i.e., valid inference under the misspecification of one of the nuisances), (2) Neyman-orthogonal (i.e., insensitive to first-order estimation errors in the nuisance functions), and (3) quasi-oracle efficient (i.e., behaves asymptotically as if the ground-truth nuisance functions were known). Our DRQ-learner is applicable to settings with both discrete and continuous state spaces. Further, our DRQ-learner is flexible and can be used together with arbitrary machine learning models (e.g., neural networks). We validate our theoretical results through numerical experiments, thereby showing that our meta-learner outperforms state-of-the-art baselines.

2507.14391 2026-03-10 stat.ME econ.EM

Policy relevance of causal quantities in networks

Sahil Loomba, Dean Eckles

Comments 27 Pages, 4 figures

Details
English abstract

In settings where units' outcomes are affected by others' treatments, there has been a proliferation of ways to quantify effects of treatments on outcomes, including via indirect exposure to other units' treatments. Here we consider two properties we might want estimands to have: being interpretable as summaries of unit-level effects, and being relevant to choice of a policy governing treatment assignment. We characterize many estimands as involving one of two orders of averaging over units in a population and over treatment assignments under a policy. The more common representation often results in quantities that are insufficient for optimal policy choice. This occurs because these quantities summarize outcomes under homogeneous exposure to treatment, but even homogeneous policies often lead to heterogeneous exposures. The other representation often yields quantities that lack an interpretation as summaries of unit-level effects. We argue that, among various estimands, the expected average outcome, which averages over units and treatment assignments in either order, deserves further attention from researchers. This estimand, or contrasts among these estimands under different policies, is both a summary of unit-level effects and is sufficient for optimal policy choice with utilitarian welfare.

2505.03234 2026-03-10 stat.ME

Designing clinical trials for the comparison of single and multiple quantiles with right-censored data

Beatriz Farah, Olivier Bouaziz, Aurélien Latouche

Details
English abstract

Based on the test for equality of quantiles originally introduced by Kosorok (1999), we propose new power formulas for the comparison of one quantile between two treatment groups, as well as for the comparison of a collection of quantiles. Under the null hypothesis of equality of quantiles, the test statistic asymptotically follows a normal distribution in the univariate case and a chi-squared distribution with J degrees of freedom in the multivariate case, where J is the number of quantiles compared. The variance of the test statistic depends on the estimation of the probability density function of the distribution of failure times at the quantile being tested. In order to apply the test on real data, we propose to estimate this quantity using a resampling-based method, as an alternative to Kosorok's original kernel density estimator. The whole procedure provides a practical tool for designing and analyzing data arising from clinical trials using quantiles of survival as an endpoint. Simulation studies are performed to show the appropriateness of the power formulas. We illustrate the proposed test in a phase III randomized clinical trial where the proportional hazards assumption between treatment arms does not hold.

2503.00290 2026-03-10 econ.EM math.ST stat.TH

GMM and M Estimation under Network Dependence

Yuya Sasaki

Details
English abstract

This paper presents GMM and M estimators and their asymptotic properties for network-dependent data. To this end, I build on Kojevnikov, Marmer, and Song (KMS, 2021) and develop a novel uniform law of large numbers (ULLN), which is essential to ensure desired asymptotic behaviors of nonlinear estimators (e.g., Newey and McFadden, 1994, Section 2). Using this ULLN, I establish the consistency and asymptotic normality of both GMM and M estimators. For practical convenience, complete estimation and inference procedures are also provided.

2502.03849 2026-03-10 math.ST stat.CO stat.ME stat.TH

Fast confidence bounds for the false discovery proportion over a path of hypotheses

Guillermo Durand

Details
Journal ref
Computo, 2025
English abstract

This paper presents a new algorithm (and an additional trick) for fast computation of an entire curve of post hoc bounds for the False Discovery Proportion when the construction of the underlying bound $V^*_{\mathfrak{R}}$ is based on a reference family $\mathfrak{R}$ with a forest structure à la Durand et al. (2020). By an entire curve, we mean the values $V^*_{\mathfrak{R}}(S_1),\dotsc,V^*_{\mathfrak{R}}(S_m)$ computed on a path of increasing selection sets $S_1\subsetneq\dotsb\subsetneq S_m$, $|S_t|=t$. The new algorithm leverages the fact that going from $S_t$ to $S_{t+1}$ is done by adding only one hypothesis. Compared to a more naive approach, the new algorithm has a complexity in $O(|\mathcal K|m)$ instead of $O(|\mathcal K|m^2)$, where $|\mathcal K|$ is the cardinality of the family.

2409.16044 2026-03-10 stat.AP stat.ME

Stable Survival Extrapolation via Transfer Learning

Anastasios Apsemidis, Nikolaos Demiris

Comments 28 pages, 6 figures, 1 table

Details
English abstract

The mean survival is the key ingredient of the decision process in several applications, notably in health economic evaluations. It is defined as the area under the complete survival curve, thus necessitating extrapolation of the observed data. This may be achieved in a more stable manner by borrowing long-term evidence from registry and demographic data. Such borrowing can be seen as an implicit bias-variance trade-off in unseen data. In this article we employ a Bayesian mortality model and transfer its projections in order to construct the baseline population that acts as an anchor of the survival model. We then propose extrapolation methods based on flexible parametric polyhazard models which can naturally accommodate diverse shapes, including non-proportional hazards and crossing survival curves, while typically maintaining a natural interpretation. We estimate the mean survival and related estimands in three cases, namely breast cancer, cardiac arrhythmia and advanced melanoma. Specifically, we evaluate the survival disadvantage of triple-negative breast cancer cases, the efficacy of combining immunotherapy with mRNA cancer therapeutics for melanoma treatment and the suitability of implantable cardioverter defibrillators for cardiac arrhythmia. The latter is conducted in a competing risks context illustrating how working on the cause-specific hazard alone minimizes potential instability. The results suggest that the proposed approach is flexible, interpretable and robust when survival extrapolation is required.

2409.09787 2026-03-10 cs.LG cs.AI stat.CO stat.ML

BNEM: A Boltzmann Sampler Based on Bootstrapped Noised Energy Matching

RuiKang OuYang, Bo Qiang, José Miguel Hernández-Lobato

Comments Camera-ready version for TMLR (03/2026)

Details
Journal ref
Transactions on Machine Learning Research (TMLR), 2026
English abstract

Developing an efficient sampler capable of generating independent and identically distributed (IID) samples from a Boltzmann distribution is a crucial challenge in scientific research, e.g., in molecular dynamics. In this work, we intend to learn neural samplers given energy functions instead of data sampled from the Boltzmann distribution. By learning the energies of the noised data, we propose a diffusion-based sampler, Noised Energy Matching (NEM), which theoretically has lower variance and more complexity compared to related works. Furthermore, a novel bootstrapping technique is applied to NEM to balance between bias and variance. We evaluate NEM and BNEM on a 2-dimensional Gaussian Mixture Model (GMM) with 40 components and a 4-particle double-well potential (DW-4). The experimental results demonstrate that BNEM can achieve state-of-the-art performance while being more robust.

2408.13143 2026-03-10 stat.ME

A Restricted Latent Class Model with Polytomous Attributes and Respondent-Level Covariates

Eric Alan Wayman, Steven Andrew Culpepper, Jeff Douglas, Jesse Bowers

Comments 42 pages, 1 figure, 11 tables. Added second simulation study, expanded explanations, added runtime information, and fixed typos. The version of record of this article, first published in Behaviormetrika, is available on the publisher's website at https://doi.org/10.1007/s41237-025-00271-8

Details
Journal ref
Behaviormetrika (October, 2025)
English abstract

We present an exploratory restricted latent class model where response data is for a single time point, polytomous, and differing across items, and where latent classes reflect a multi-attribute state where each attribute is ordinal. Our model extends previous work to allow for correlation of the attributes through a multivariate probit specification and to allow for respondent-specific covariates. We demonstrate that the model recovers parameters well in a variety of realistic scenarios, and apply the model to the analysis of a particular dataset designed to diagnose depression. The application demonstrates the utility of the model in identifying the latent structure of depression beyond single-factor approaches which have been used in the past.

2406.13691 2026-03-10 stat.ME stat.CO

Computationally efficient multi-level Gaussian process regression for functional data observed under completely or partially regular sampling designs

Adam Gorm Hoffmann, Claus Thorn Ekstrøm, Andreas Kryger Jensen

Comments 48 pages, 3 figures; Figure 1 corrected

Details
Journal ref
TEST 35 (2026) 211-231
English abstract

Gaussian process regression is a frequently used statistical method for flexible yet fully probabilistic non-linear regression modeling. A common obstacle is its computational complexity, which scales poorly with the number of observations. This is especially an issue when applying Gaussian process models to multiple functions simultaneously in various applications of functional data analysis. We consider a multi-level Gaussian process regression model where a common mean function and individual subject-specific deviations are modeled simultaneously as latent Gaussian processes. We derive exact analytic and computationally efficient expressions for the log-likelihood function and the posterior distributions in the case where the observations are sampled on either a completely or partially regular grid. This enables us to fit the model to large data sets that are currently computationally inaccessible using a standard implementation. We show through a simulation study that our analytic expressions are several orders of magnitude faster than a standard implementation, and we provide an implementation in the probabilistic programming language Stan.

2405.08290 2026-03-10 stat.CO stat.ME

MCMC using $\textit{bouncy}$ Hamiltonian dynamics: A unifying framework for Hamiltonian Monte Carlo and piecewise deterministic Markov process samplers

Andrew Chin, Akihiko Nishimura

Details
English abstract

Piecewise-deterministic Markov process (PDMP) samplers constitute a state-of-the-art Markov chain Monte Carlo paradigm in Bayesian computation, with examples including the zig-zag and bouncy particle sampler (bps). Recent work on the zig-zag has indicated its connection to Hamiltonian Monte Carlo (HMC), a version of the Metropolis algorithm that exploits Hamiltonian dynamics. Here we establish that, in fact, the connection between the two paradigms extends far beyond the specific instance. The key lies in (1) the fact that any time-reversible deterministic dynamics provides a valid Metropolis proposal and (2) how PDMPs' characteristic velocity changes constitute an alternative to the usual acceptance-rejection. We turn this observation into a rigorous framework for constructing rejection-free Metropolis proposals based on bouncy Hamiltonian dynamics which simultaneously possess Hamiltonian-like properties and generate discontinuous trajectories similar in appearance to PDMPs. When combined with periodic refreshment of the inertia, the dynamics converge strongly to PDMP equivalents in the limit of increasingly frequent refreshment. We demonstrate the practical implications of this new framework with a sampler based on a bouncy Hamiltonian dynamics closely related to the bps. The resulting sampler exhibits competitive performance on challenging real-data posteriors involving tens of thousands of parameters. As the sampler of choice in modern probabilistic programming languages, HMC plays a critical role in applied Bayesian modeling; by generalizing the paradigm and elucidating its connection to the leading competitor, our framework opens up opportunities for cross-pollination and innovation to further scale Bayesian inference.

2312.10330 2026-03-10 math.OC stat.ML

Convergence and complexity of block majorization-minimization for constrained block-Riemannian optimization

Yuchen Li, Laura Balzano, Deanna Needell, Hanbaek Lyu

Comments 54 pages, 8 figures. Related work updated

Details
Journal ref
Journal of Machine Learning Research, 2026
English abstract

Block majorization-minimization (BMM) is a simple iterative algorithm for nonconvex optimization that sequentially minimizes a majorizing surrogate of the objective function in each block coordinate while the other block coordinates are held fixed. We consider a family of BMM algorithms for minimizing smooth nonconvex objectives, where each parameter block is constrained within a subset of a Riemannian manifold. We establish that this algorithm converges asymptotically to the set of stationary points, and attains an $ε$-stationary point within $\widetilde{O}(ε^{-2})$ iterations. In particular, the assumptions for our complexity results are completely Euclidean when the underlying manifold is a product of Euclidean or Stiefel manifolds, although our analysis makes explicit use of the Riemannian geometry. Our general analysis applies to a wide range of algorithms with Riemannian constraints: Riemannian MM, block projected gradient descent, optimistic likelihood estimation, geodesically constrained subspace tracking, robust PCA, and Riemannian CP-dictionary-learning. We experimentally validate that our algorithm converges faster than standard Euclidean algorithms applied to the Riemannian setting.
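The block update described in this abstract can be sketched in its simplest Euclidean form. The example below is an illustration under assumed choices, not the paper's algorithm: the two-block objective, the interval constraints, the `margin` added to the curvature, and all function names are hypothetical. Each block minimizes a quadratic majorizer over its interval, which reduces to a clipped gradient step (a block projected gradient instance).

```python
def objective(x, y):
    """Smooth nonconvex toy objective in two scalar blocks."""
    return (x * y - 1.0) ** 2 + 0.1 * (x * x + y * y)


def block_step(block, other, lower, upper, margin=1.0):
    """Minimize a quadratic majorizer in one block, holding the other block fixed."""
    # Partial derivative of the objective with respect to the active block.
    grad = 2.0 * other * (block * other - 1.0) + 0.2 * block
    # The objective is quadratic in each block with curvature 2 * other**2 + 0.2;
    # adding a positive margin gives a surrogate that upper-bounds the objective.
    curvature = 2.0 * other * other + 0.2 + margin
    candidate = block - grad / curvature
    # For a one-dimensional convex quadratic, the constrained minimizer over an
    # interval is the unconstrained minimizer clipped to that interval.
    return min(max(candidate, lower), upper)


def block_mm(x, y, iterations=500, lower=0.0, upper=3.0):
    """Cycle through the two blocks and record the objective after each sweep."""
    history = [objective(x, y)]
    for _ in range(iterations):
        x = block_step(x, y, lower, upper)  # the objective is symmetric in x and y,
        y = block_step(y, x, lower, upper)  # so one step function serves both blocks
        history.append(objective(x, y))
    return x, y, history


if __name__ == "__main__":
    x, y, history = block_mm(2.0, 0.1)
    print(x, y, history[0], history[-1])
```

Because each surrogate touches the objective at the current iterate and lies above it elsewhere, the recorded objective values cannot increase from one sweep to the next.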

2309.03122 2026-03-10 stat.AP physics.soc-ph stat.ME

Bayesian Evidence Synthesis for Modeling SARS-CoV-2 Transmission

Anastasios Apsemidis, Nikolaos Demiris

Comments 27 pages, 6 figures

Details
English abstract

The acute phase of the Covid-19 pandemic has made apparent the need for decision support based upon accurate epidemic modeling. This process is substantially hampered by under-reporting of cases and related data incompleteness issues. In this article we adopt the Bayesian paradigm and synthesize publicly available data via a discrete-time stochastic epidemic modeling framework. The models allow for estimating the total number of infections while accounting for the endemic phase of the pandemic. We assess the prediction of the infection rate utilizing mobility information, notably the principal components of the mobility data. We evaluate variational Bayes in this context and find that Hamiltonian Monte Carlo offers a robust inference alternative for such models. We elaborate upon vector analysis of the epidemic dynamics, thus enriching the traditional tools used for decision making. In particular, we show how certain 2-dimensional plots on the phase plane may yield intuitive information regarding the speed and the type of transmission dynamics. We investigate the potential of a two-stage analysis as a consequence of cutting feedback, for inference on certain functionals of the model parameters. Finally, we show that a point mass on critical parameters is overly restrictive and investigate informative priors as a suitable alternative.

2603.08156 2026-03-10 cs.LG stat.ML

Are We Winning the Wrong Game? Revisiting Evaluation Practices for Long-Term Time Series Forecasting

Thanapol Phungtua-eng, Yoshitaka Yamamoto

Comments First draft

Details
English abstract

Long-term time series forecasting (LTSF) is widely recognized as a central challenge in data mining and machine learning. LTSF has increasingly evolved into a benchmark-driven "GAME," where models are ranked, compared, and declared state-of-the-art based primarily on marginal reductions in aggregated pointwise error metrics such as MSE and MAE. Across a small set of canonical datasets and fixed forecasting horizons, progress is communicated through leaderboard-style tables in which lower numerical scores define success. In this GAME, what is measured becomes what is optimized, and incremental error reduction becomes the dominant currency of advancement. We argue that this metric-centric regime is not merely incomplete, but structurally misaligned with the broader objectives of forecasting. In real-world settings, forecasting often prioritizes preserving temporal structure, trend stability, seasonal coherence, robustness to regime shifts, and supporting downstream decision processes. Optimizing aggregate pointwise error does not necessarily imply modeling these structural properties. As a result, leaderboard improvement may increasingly reflect specialization in benchmark configurations rather than a deeper understanding of temporal dynamics. This paper revisits LTSF evaluation as a foundational question in data science: what does it mean to measure forecasting progress? We propose a multi-dimensional evaluation perspective that integrates statistical fidelity, structural coherence, and decision-level relevance. By challenging the current metric monoculture, we aim to redirect attention from winning benchmark tables toward advancing meaningful, context-aware forecasting.

2603.08149 2026-03-10 math.ST math.PR stat.TH

The W-footrule coefficient: A copula-based measure of countermonotonicity

Enrique de Amo, David García-Fernández, Manuel Úbeda-Flores

Details
English abstract

We introduce the $W$-footrule coefficient $Φ_C$, a copula-based coefficient of negative association defined as the $L^1$-distance to the countermonotonic copula $W$. We prove that Gini's gamma admits the decomposition $γ_C = \frac{2}{3}(φ_C+Φ_C)$, linking it to Spearman's footrule $φ_C$. A rank-based estimator is introduced, with its strong consistency and asymptotic normality established via the functional delta method. Monte Carlo simulations confirm the estimator's finite-sample validity and its sensitivity to negative dependence structures.
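The decomposition stated in this abstract can be checked numerically on the three basic copulas. The sketch below assumes the standard diagonal representations of Spearman's footrule, $φ_C = 6\int_0^1 C(u,u)\,du - 2$, and Gini's gamma, $γ_C = 4\int_0^1 [C(u,u) + C(u,1-u)]\,du - 2$, and recovers $Φ_C$ by rearranging the stated identity as $Φ_C = \frac{3}{2}γ_C - φ_C$. It does not reproduce the paper's $L^1$ definition, and all function names are hypothetical.

```python
def diagonal_integral(copula, n=20000):
    """Midpoint-rule approximation of the integral of C(u, u) over [0, 1]."""
    h = 1.0 / n
    return sum(copula((i + 0.5) * h, (i + 0.5) * h) for i in range(n)) * h


def antidiagonal_integral(copula, n=20000):
    """Midpoint-rule approximation of the integral of C(u, 1 - u) over [0, 1]."""
    h = 1.0 / n
    return sum(copula((i + 0.5) * h, 1.0 - (i + 0.5) * h) for i in range(n)) * h


def spearman_footrule(copula):
    """Spearman's footrule via its diagonal representation."""
    return 6.0 * diagonal_integral(copula) - 2.0


def gini_gamma(copula):
    """Gini's gamma via its diagonal and anti-diagonal representation."""
    return 4.0 * (diagonal_integral(copula) + antidiagonal_integral(copula)) - 2.0


def w_footrule(copula):
    """W-footrule implied by gamma = (2/3) * (footrule + W-footrule)."""
    return 1.5 * gini_gamma(copula) - spearman_footrule(copula)


def independence(u, v):
    return u * v


def comonotonic(u, v):
    return min(u, v)


def countermonotonic(u, v):
    return max(u + v - 1.0, 0.0)


if __name__ == "__main__":
    for name, c in [("Pi", independence), ("M", comonotonic), ("W", countermonotonic)]:
        print(name, spearman_footrule(c), gini_gamma(c), w_footrule(c))
```

Under these assumptions the implied coefficient takes its lowest value at the countermonotonic copula $W$ and vanishes under independence, which is consistent with its description as a measure of negative association.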

2603.08130 2026-03-10 cs.LG stat.ML

Explainable Condition Monitoring via Probabilistic Anomaly Detection Applied to Helicopter Transmissions

Aurelio Raffa Ugolini, Jessica Leoni, Valentina Breschi, Damiano Paniccia, Francesco Aldo Tucci, Luigi Capone, Mara Tanelli

Details
English abstract

We present a novel Explainable methodology for Condition Monitoring, relying on healthy data only. Since faults are rare events, we propose to focus on learning the probability distribution of healthy observations only, and to detect Anomalies at runtime. This objective is achieved via the definition of probabilistic measures of deviation from nominality, which allow faults to be detected and anticipated. The Bayesian perspective underpinning our approach allows us to perform Uncertainty Quantification to inform decisions. At the same time, we provide descriptive tools to enhance the interpretability of the results, supporting the deployment of the proposed strategy also in safety-critical applications. The methodology is validated experimentally on two use cases: a publicly available benchmark for Predictive Maintenance, and a real-world Helicopter Transmission dataset collected over multiple years. In both applications, the method achieves competitive detection performance with respect to state-of-the-art anomaly detection methods.

2603.08101 2026-03-10 stat.AP

Non-stationary GEV models for estimating design sea-states in a changing climate. Applications to offshore wind farms along the French coasts

Nicolas Raillard, Coline Poppeschi, Tessa Chevallier, Youen Kervella, Laurent Dubus

Comments This work is under review for journal "Advances in Statistical Climatology, Meteorology and Oceanography" (ASCMO)

Details
English abstract

The rapid expansion of the French offshore wind sector requires a critical reassessment of structural durability in the face of evolving marine conditions driven by climate change. Traditional design methodologies, which rely on the assumption of stationary environmental conditions, are no longer adequate. This study introduces a novel statistical framework to assess future changes in significant wave height by employing non-stationary Generalized Extreme Value (GEV) models applied to monthly maxima. This approach aims to reduce uncertainty and provide robust design tools adapted to the non-stationary conditions of the future. Based on CMIP6 climate models and reanalysis data, results reveal a projected trend towards a more pronounced seasonal contrast along the French Atlantic and English Channel coasts under future scenarios (SSP1-2.6 and SSP5-8.5), whereas the French Mediterranean Sea exhibits results that are more difficult to interpret, due to a weaker increase of extremes and large uncertainties (inter-model spread). Projections indicate more intense winters and calmer summers, along with a shift in the seasonal cycle. Overall, the multi-model ensemble suggests an increase in the design levels for extreme sea states. The research concludes by defining a new methodology for calculating an equivalent design level over the structure's operational lifespan. This tool is deemed essential for ensuring the resilience and economic viability of future offshore wind farms in a changing climate.

2603.08002 2026-03-10 math.ST stat.ME stat.TH

Post-Hoc Large-Sample Statistical Inference

Ben Chugg, Etienne Gauthier, Michael I. Jordan, Aaditya Ramdas, Ian Waudby-Smith

Comments 61 pages, 7 figures

Details
English abstract

We derive inferential procedures for large sample sizes that remain valid under data-dependent significance levels (so-called "post-hoc valid inference"). Classical statistical tools require that the significance level -- the "type-I error" -- is selected prior to seeing or analyzing any data. This restriction leads to some drawbacks. For instance, if an analyst generates an inconclusive confidence interval, repeating the process with a larger significance level is not an option -- the result is final. Recently, e-values have emerged as the solution to this problem, being both necessary and sufficient tools for performing various forms of post-hoc inference. All such results, however, have thus far been nonasymptotic. As a result, they inherit familiar limitations of nonasymptotic inferential procedures such as requiring strong moment assumptions and being conservative in general. This paper develops a theory of post-hoc inference in the asymptotic setting, yielding asymptotic post-hoc confidence sets and asymptotic post-hoc p-values that make weaker assumptions and are sharper than their nonasymptotic counterparts.

2603.07971 2026-03-10 math.ST stat.TH

Estimation of differential entropy for normal populations under prior information

Somnath Mandal, Lakshmi Kanta Patra

Comments 29 pages, 28 figures, 3 tables, 34 references

Details
English abstract

The problem of estimating nonlinear functionals of parameters, such as differential entropy, has received much attention in information theory and statistics. In many situations, prior information about the parameters is available in the form of order restrictions. This information should be taken into account to obtain improved estimators. In this paper, we study the problems of point-wise and interval estimation of the entropy of two normal populations under a general location-invariant loss function. For point-wise estimation, we derive the maximum likelihood estimator (MLE), the restricted MLE and the uniformly minimum variance unbiased estimator (UMVUE). Further, we derive a sufficient condition for improvement over affine equivariant estimators. A class of improved estimators is derived that dominates the best affine equivariant estimator (BAEE). Furthermore, we obtain a class of smooth improved estimators that dominates the BAEE. We present special loss functions and derive expressions for the proposed improved estimators. A numerical study is conducted to compare the risk performance of the proposed estimators under quadratic and linex loss functions. For interval estimation, we derive an asymptotic confidence interval, bootstrap confidence intervals, an HPD credible interval, and intervals based on generalized pivot variables. A comprehensive numerical comparison of these intervals is carried out in terms of coverage probabilities and average lengths. Finally, the proposed results are illustrated with a real example: the failure of the air-conditioning systems on Boeing 720 jet planes.

2603.07965 2026-03-10 stat.ML cs.LG

Local Constrained Bayesian Optimization

Jing Jingzhe, Fan Zheyi, Szu Hui Ng, Qingpei Hu

Details
English abstract

Bayesian optimization (BO) for high-dimensional constrained problems remains a significant challenge due to the curse of dimensionality. We propose Local Constrained Bayesian Optimization (LCBO), a novel framework tailored for such settings. Unlike trust-region methods that are prone to premature shrinking when confronting tight or complex constraints, LCBO leverages the differentiable landscape of constraint-penalized surrogates to alternate between rapid local descent and uncertainty-driven exploration. Theoretically, we prove that LCBO achieves a convergence rate for the Karush-Kuhn-Tucker (KKT) residual that depends polynomially on the dimension $d$ for common kernels under mild assumptions, offering a rigorous alternative to global BO where regret bounds typically scale exponentially. Extensive evaluations on high-dimensional benchmarks (up to 100D) demonstrate that LCBO consistently outperforms state-of-the-art baselines.

2603.07921 2026-03-10 stat.ML cs.LG

Robust Transfer Learning with Side Information

Akram S. Awad, Shihab Ahmed, Yue Wang, George K. Atia

Details
English abstract

Robust Markov Decision Processes (MDPs) address environmental shift through distributionally robust optimization (DRO) by finding an optimal worst-case policy within an uncertainty set of transition kernels. However, standard DRO approaches require enlarging the uncertainty set under large shifts, which leads to overly conservative and pessimistic policies. In this paper, we propose a framework for transfer under environment shift that derives a robust target-domain policy via estimate-centered uncertainty sets, constructed through constrained estimation that integrates limited target samples with side information about the source-target dynamics. The side information includes bounds on feature moments, distributional distances, and density ratios, yielding improved kernel estimates and tighter uncertainty sets. Error bounds and convergence results are established for both robust and non-robust value functions. Moreover, we provide a finite-sample guarantee on the learned robust policy and analyze the robust sub-optimality gap. Under mild low-dimensional structure on the transition model, the side information reduces this gap and improves sample efficiency. We assess the performance of our approach across OpenAI Gym environments and classic control problems, consistently demonstrating superior target-domain performance over state-of-the-art robust and non-robust baselines.

2603.07899 2026-03-10 cs.LG stat.ML

Bayesian Transformer for Probabilistic Load Forecasting in Smart Grids

Sajib Debnath, Md. Uzzal Mia

Abstract

The reliable operation of modern power grids requires probabilistic load forecasts with well-calibrated uncertainty estimates. However, existing deep learning models produce overconfident point predictions that fail catastrophically under extreme weather distributional shifts. This study proposes a Bayesian Transformer (BT) framework that integrates three complementary uncertainty mechanisms into a PatchTST backbone: Monte Carlo Dropout for epistemic parameter uncertainty, variational feed-forward layers with log-uniform weight priors, and stochastic attention with learnable Gaussian noise perturbations on pre-softmax logits, representing, to the best of our knowledge, the first application of Bayesian attention to probabilistic load forecasting. A seven-level multi-quantile pinball-loss prediction head and post-training isotonic regression calibration produce sharp prediction intervals with near-nominal coverage. Evaluation on five grid datasets (PJM, ERCOT, ENTSO-E Germany, France, and Great Britain) augmented with NOAA covariates across 24-, 48-, and 168-hour horizons demonstrates state-of-the-art performance. On the primary benchmark (PJM, H=24h), BT achieves a CRPS of 0.0289, improving 7.4% over Deep Ensembles and 29.9% over the deterministic LSTM, with 90.4% PICP at the 90% nominal level and the narrowest prediction intervals (4,960 MW) among all probabilistic baselines. During heat-wave and cold-snap events, BT maintains 89.6% and 90.1% PICP respectively, versus 64.7% and 67.2% for the deterministic LSTM, confirming that Bayesian epistemic uncertainty naturally widens intervals for out-of-distribution inputs. Calibration remains stable across all horizons (89.8-90.4% PICP), while ablation confirms that each component contributes distinct value. The calibrated outputs directly support risk-based reserve sizing, stochastic unit commitment, and demand response activation.
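For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of the pinball (quantile) loss and of the prediction interval coverage probability (PICP). It is not the authors' implementation: the seven quantile levels in `LEVELS` and all function names are assumptions chosen for illustration, since the abstract does not list them.

```python
from typing import Sequence

# Hypothetical seven quantile levels; the abstract does not state which are used.
LEVELS = (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)


def pinball_loss(y_true: float, y_pred: float, q: float) -> float:
    """Pinball loss of one quantile forecast y_pred at level q."""
    diff = y_true - y_pred
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return max(q * diff, (q - 1.0) * diff)


def mean_multi_quantile_loss(
    y_true: Sequence[float],
    y_pred_by_level: Sequence[Sequence[float]],
    levels: Sequence[float] = LEVELS,
) -> float:
    """Average pinball loss over all observations and quantile levels."""
    total, count = 0.0, 0
    for y, preds in zip(y_true, y_pred_by_level):
        for q, p in zip(levels, preds):
            total += pinball_loss(y, p, q)
            count += 1
    return total / count


def picp(y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of observations falling inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)
```

With the assumed levels, a 90% nominal interval would be formed from the 0.05 and 0.95 quantile forecasts, and a well-calibrated model should report a PICP close to 0.90 on held-out data.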

2603.07887 2026-03-10 cs.LG cs.AI cs.CL math.ST stat.ML stat.TH

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

Noah Golowich, Fan Chen, Dhruv Rohatgi, Raghav Singhal, Carles Domingo-Enrich, Dylan J. Foster, Akshay Krishnamurthy

Abstract

Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost tradeoffs. In this paper, we introduce a route to rigorously study such approaches using the lens of *particle filtering* algorithms such as Sequential Monte Carlo (SMC). Given a base language model and a *process reward model* estimating expected terminal rewards, we ask: *how accurately can we sample from a target distribution given some number of process reward evaluations?* Theoretically, we identify (1) simple criteria enabling non-asymptotic guarantees for SMC; (2) algorithmic improvements to SMC; and (3) a fundamental limit faced by all particle filtering methods. Empirically, we demonstrate that our theoretical criteria effectively govern the *sampling error* of SMC, though not necessarily its final *accuracy*, suggesting that theoretical perspectives beyond sampling may be necessary.

2603.07871 2026-03-10 stat.ME

Effective and flexible depth-based inference for functional parameters

Hyemin Yeon

Abstract

For hypothesis testing of functional parameters, given a functional statistic $T_n$ and a functional depth $D$ with respect to the distribution $P_n$ of $T_n$, we propose the depth value $DT_n \equiv D(T_n;P_n)$ as a test statistic, which we refer to as a depth statistic. In practice, its sampling distribution is approximated by a resampling method such as the bootstrap. While achieving accurate sizes, a test based on the proposed depth statistic attains higher power, as it remains sensitive even to subtle variations arising from complex functional patterns in the alternatives. Moreover, it is applicable to a broad range of inference problems for functional parameters, including two-sample tests, analysis of variance, and regression. We provide theoretical guarantees under mild assumptions, along with examples of bootstrap methods and functional depths that satisfy these conditions. Its effectiveness is thoroughly investigated through numerical studies under two popular frameworks: (i) two-sample functional mean tests and (ii) mean response inference for function-on-function regression. The proposed depth statistic is illustrated with two data examples: the Canadian weather and German electricity prices datasets.

2603.07864 2026-03-10 stat.ML cs.LG

An Interpretable Generative Framework for Anomaly Detection in High-Dimensional Financial Time Series

Waldyn G Martinez

Abstract

Detecting structural instability and anomalies in high-dimensional financial time series is challenging due to complex temporal dependence and evolving cross-sectional structure. We propose ReGEN-TAD, an interpretable generative framework that integrates modern machine learning with econometric diagnostics for anomaly detection. The model combines joint forecasting and reconstruction within a refined convolutional-transformer architecture and aggregates complementary signals capturing predictive inconsistency, reconstruction degradation, latent distortion, and volatility shifts. Robust calibration yields a unified anomaly score without labeled data. Experiments on synthetic and financial panels demonstrate improved robustness to structured deviations while enabling economically coherent factor-level attribution.

2603.07856 2026-03-10 stat.ME

Variational Inference for Variable Selection in Scalar-on-Function Regression

Ana Carolina da Cruz, Camila P. E. de Souza, Pedro H. T. O. Sousa

Comments 41 pages in main text and 18 pages in the Supplementary Material

Abstract

In practical regression applications, multiple covariates are often measured, but not all may be associated with the response variable. Identifying and including only the relevant covariates in the model is crucial for improving prediction accuracy. In this work, we develop a variational inference approach for estimation and variable selection in scalar-on-function regression, involving only functional covariates, and in partially functional regression models that also include scalar covariates. Specifically, we develop a variational expectation-maximization (VEM) algorithm, with a variational Bayes procedure implemented in the E-step to obtain approximate marginal posterior distributions for most model parameters, except for the regularization parameters, which are updated in the M-step. Our method accurately identifies relevant covariates while maintaining strong predictive performance, as demonstrated through extensive simulation studies across diverse scenarios. Compared with alternative approaches, including BGLSS (Bayesian Group Lasso with Spike-and-Slab priors), grLASSO (group Least Absolute Shrinkage and Selection Operator), grMCP (group Minimax Concave Penalty), and grSCAD (group Smoothly Clipped Absolute Deviation), our approach achieves a superior balance between goodness-of-fit and sparsity in most scenarios. We further illustrate its practical utility through real-data applications involving spectral analysis of sugar samples and weather measurements from Japan.

2603.07842 2026-03-10 stat.ME

New results and tests for stochastic dominance between linear combinations

Tommaso Lando, Paulo Eduardo Oliveira

Abstract

Convex combinations of i.i.d. random variables without a finite mean can behave in a strikingly different way from the finite-mean case: as the weight vector becomes more balanced, the resulting combination may become stochastically larger, rather than less dispersed. Existing results establish stochastic dominance between pairs of linear combinations (or between a convex combination and the underlying variable) under shape restrictions on the distribution and structural assumptions on the weights. We expand the class for which the general result can be derived. Nonetheless, two practical limitations remain: (i) the sufficient conditions vary across results, and (ii) being non-necessary, they exclude many relevant configurations. Moreover, from a statistical perspective, where the true distribution of the data is assumed to be unknown, these conditions cannot be checked. Motivated by this gap, we develop nonparametric procedures to test whether two linear combinations are stochastically ordered. We propose two complementary approaches: a least-favorable calibration and a bootstrap-based method. We show that both tests control size asymptotically under the null of stochastic dominance and are consistent against alternatives of non-dominance. Monte Carlo experiments illustrate the finite-sample performance of the proposed procedures across a range of models and weight configurations.

2603.07813 2026-03-10 econ.EM stat.AP

At-Risk Transformation for U.S. Recession Prediction

Rahul Billakanti, Minchul Shin

Comments 46 pages, 2 figures

Abstract

We propose a simple binarization of predictors, an "at-risk" transformation, as an alternative to the standard practice of using continuous, standardized variables in recession forecasting models. By converting predictors into indicators of unusually weak states based on a thresholding rule estimated from training data, we demonstrate their ability to capture the discrete nature of rare events such as U.S. recessions. Using a large panel of monthly U.S. macroeconomic and financial data, we show that binarized predictors consistently improve out-of-sample forecasting performance, often making linear models competitive with flexible machine learning methods, and that the gains are particularly pronounced around the onset of recessions.
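As a rough illustration of the idea described above, the sketch below binarizes a predictor with a threshold estimated from training data only. The abstract does not specify the thresholding rule, so the lower empirical quantile, the default `quantile=0.10`, the "low values are weak" direction, and the function names are all assumptions made for this example.

```python
from typing import List, Sequence


def fit_threshold(train_values: Sequence[float], quantile: float = 0.10) -> float:
    """Estimate a threshold as a lower empirical quantile of the training data.

    Assumed rule for illustration; the paper's actual rule may differ.
    """
    ordered = sorted(train_values)
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]


def at_risk(values: Sequence[float], threshold: float) -> List[int]:
    """Return 1 where the predictor is at or below the threshold, else 0.

    Assumes low values indicate a weak state; for predictors where high
    values signal weakness (e.g. unemployment claims), negate the series first.
    """
    return [1 if v <= threshold else 0 for v in values]


# The threshold is fitted on the training window only, then applied out of sample.
train = [5.0, 1.0, 3.0, 2.0, 4.0]
threshold = fit_threshold(train, quantile=0.5)
indicators = at_risk([2.5, 3.0, 3.5], threshold)
```

Fitting the threshold on training data alone keeps the transformation free of look-ahead, which matters for the out-of-sample comparisons the abstract reports.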

2603.07791 2026-03-10 stat.ME stat.AP

Design Effect Ratios for Bayesian Survey Models: A Diagnostic Framework for Identifying Survey-Sensitive Parameters

JoonHo Lee

Abstract

Bayesian hierarchical models fit to complex survey data require variance correction for the sampling design, yet applying this correction uniformly harms parameters already protected by the hierarchical structure. We propose the Design Effect Ratio -- the ratio of design-corrected to model-based posterior variance -- as a per-parameter diagnostic identifying which quantities are survey-sensitive. Closed-form decompositions show that fixed-effect sensitivity depends on whether identifying variation lies between or within clusters, while random-effect sensitivity is governed by hierarchical shrinkage. These results yield a compute-classify-correct workflow adding negligible overhead to Bayesian estimation. In simulations spanning 54 scenarios and 10,800 replications of hierarchical logistic regression, selective correction achieves 87-88% coverage for survey-sensitive parameters -- matching blanket correction -- while preserving near-nominal coverage for protected parameters that blanket correction collapses to 20-21%. A threshold of 1.2 produces zero false positives, with a separation ratio of approximately 4:1. Applied to the 2019 National Survey of Early Care and Education (6,785 providers, 51 states), the diagnostic flags exactly 1 of 54 parameters for correction; blanket correction would have narrowed the worst remaining interval to 4.3% of its original width. The entire pipeline completes in under 0.03 seconds, bridging design-based and model-based survey inference.
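The compute-classify-correct workflow can be sketched in a few lines, under stated assumptions. The ratio definition and the 1.2 threshold come from the abstract; the function names, the parameter names `beta_between` and `beta_within`, the strict inequality, and the rule of substituting the design-corrected variance for flagged parameters are illustrative guesses, not the author's code.

```python
from typing import Dict, Tuple


def design_effect_ratio(design_var: float, model_var: float) -> float:
    """Ratio of design-corrected to model-based posterior variance."""
    if model_var <= 0.0:
        raise ValueError("model-based variance must be positive")
    return design_var / model_var


def selective_correction(
    design_vars: Dict[str, float],
    model_vars: Dict[str, float],
    threshold: float = 1.2,
) -> Dict[str, Tuple[float, bool, float]]:
    """Compute the ratio, classify each parameter, and correct only flagged ones.

    Returns, per parameter, (ratio, flagged, variance_to_report).
    """
    result = {}
    for name, model_var in model_vars.items():
        der = design_effect_ratio(design_vars[name], model_var)
        flagged = der > threshold  # assumed strict inequality
        # Flagged (survey-sensitive) parameters take the design-corrected
        # variance; the rest keep the model-based variance.
        result[name] = (der, flagged, design_vars[name] if flagged else model_var)
    return result


# Hypothetical variances for two parameters.
result = selective_correction(
    design_vars={"beta_between": 0.9, "beta_within": 0.21},
    model_vars={"beta_between": 0.3, "beta_within": 0.2},
)
```

In this toy input, `beta_between` has a ratio near 3 and is flagged, while `beta_within` has a ratio of 1.05 and keeps its model-based variance.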

2603.07780 2026-03-10 econ.EM math.ST stat.TH

Testing for Endogeneity: A Moment-Based Bayesian Approach

Siddhartha Chib, Minchul Shin, Anna Simoni

Comments 109 pages, 4 figures

Abstract

A standard assumption in the Bayesian estimation of linear regression models is that the regressors are exogenous in the sense that they are uncorrelated with the model error term. In practice, however, this assumption can be invalid. In this paper, using the exponentially tilted empirical likelihood framework, we develop a Bayes factor test for endogeneity that compares a base model that is correctly specified under exogeneity but misspecified under endogeneity against an extended model that is correctly specified in either case. We provide a comprehensive study of the log-marginal exponentially tilted empirical likelihood. We demonstrate that our testing procedure is consistent from a frequentist point of view: as the sample grows, it almost surely selects the base model if and only if the regressors are exogenous, and the extended model if and only if the regressors are endogenous. The methods are illustrated with simulated data, and problems concerning the causal effect of automobile prices on automobile demand and the causal effect of potentially endogenous airplane ticket prices on passenger volume.

2603.07742 2026-03-10 stat.OT

A Cylindrical Galton Board at the Galton Board's 150th Anniversary

Kanti V. Mardia, Colin Goodall, John Rubbo

Comments 18 pages, 8 Figures

Abstract

The Galton board is a well-known device for showing how repeated Bernoulli trials on a triangular lattice produce an approximately normal distribution. Marking the 150th anniversary of Galton's 1875 construction, this paper revisits the original apparatus and extends it to a cylindrical setting in which the peg lattice is wrapped around a cylinder. This creates angular periodicity and leads to height-dependent behaviour that does not arise in the classical planar design. The cylindrical form links Galton's demonstration of variation and the emergence of the normal distribution with modern ideas in circular statistics, giving a physical realisation of binomial random walks on a circular linear product space. We distinguish cases where the wrapped lattice covers only an arc from those that span the full circumference, and show how these geometries lead to wrapped binomial and wrapped normal behaviour. We describe the construction of our physical model, discuss practical issues for replication, and analyse its statistical and pedagogical properties as a modern reinterpretation of Galton's work.
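To make the notion of wrapped binomial behaviour concrete, here is a small sketch that folds a binomial distribution onto a circle of `m` bins. It assumes an idealised board in which the bin index is the number of rightward deflections taken modulo `m`; the real peg geometry in the paper may index bins differently, and the function name is hypothetical.

```python
from math import comb
from typing import List


def wrapped_binomial_pmf(n: int, p: float, m: int) -> List[float]:
    """Probability of each of m circular bins after n Bernoulli(p) deflections.

    Bin j collects every count k of rightward deflections with k % m == j.
    """
    pmf = [0.0] * m
    for k in range(n + 1):
        pmf[k % m] += comb(n, k) * p**k * (1.0 - p) ** (n - k)
    return pmf


# With fewer rows than bins (n < m) the lattice covers only an arc and the
# planar binomial is recovered; with n >= m the tails overlap around the cylinder.
arc_case = wrapped_binomial_pmf(n=4, p=0.5, m=8)
full_case = wrapped_binomial_pmf(n=2, p=0.5, m=2)
```

The two calls mirror the distinction drawn in the abstract between a lattice that covers only an arc and one that spans the full circumference.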

2603.07701 2026-03-10 cond-mat.str-el hep-th physics.app-ph quant-ph stat.CO

Fractional Topological Phases, Flat Bands, and Robust Edge States on Finite Cyclic Graphs via Single-Coin Split-Step Quantum Walks

Dinesh Kumar Panda, Colin Benjamin

Comments 18 pages, 18 figures, 2 tables

Abstract

We report the first realization of a fractional topological phase in a fully unitary, noninteracting discrete-time quantum walk implemented on finite cyclic graphs. Using a single-coin split-step cyclic quantum walk (SCSS-CQW), we uncover topological phenomena that are inaccessible within conventional cyclic quantum-walk dynamics. The protocol enables controlled engineering of quasienergy spectra, flat bands, and topological phase transitions through the step-dependency parameter and coin-rotation angle. We show that cyclic graphs with even and odd numbers of sites exhibit qualitatively different band structures, while rotational flat bands arise exclusively in $4n$-site cycles; a general analytic condition for their emergence is derived. The SCSS-CQW produces fractional winding numbers $\pm \frac{1}{2}$ (Zak phases $\pm \frac{\pi}{2}$), in sharp contrast with the integer invariants of standard quantum walks. These fractional invariants lead to an unconventional bulk-boundary correspondence and support edge states beyond the usual integer topological classification. In the step-dependent protocol, transitions between distinct fractional winding sectors generate robust edge modes. Numerical simulations show that these states remain stable in the presence of both dynamic and static coin disorder as well as phase-preserving perturbations, while survival-probability analysis demonstrates their long-time persistence. Requiring only a constant number of detectors independent of the evolution time, the proposed scheme offers a minimal-resource and experimentally accessible platform for realizing fractional topology, flat bands, and protected edge states in small-scale synthetic quantum systems.

2603.07656 2026-03-10 stat.ME math.ST stat.AP stat.CO stat.TH

Group-Sparse Smoothing for Longitudinal Models with Time-Varying Coefficients

Yu Lu, Tianni Zhang, Yuyao Wang, Mengfei Ran

Abstract

Longitudinal data analysis is fundamental for understanding dynamic processes in biomedical and social sciences. Although varying coefficient models (VCMs) provide a flexible framework by allowing covariate effects to evolve over time, fitting all effects as time-varying may lead to overfitting, efficiency loss, and reduced interpretability when some effects are actually constant. In contrast, standard linear mixed models (LMMs) may suffer substantial bias when temporal heterogeneity is ignored. To address this issue, we propose time-varying effect selection, TV-Select, a unified framework for structural identification that simultaneously selects relevant variables and determines whether their effects are constant or time-varying. The proposed method decomposes each coefficient function into a time-invariant mean component and a centered time-varying deviation, where the latter is approximated by B-splines. We then construct a doubly penalized objective function that combines a group Lasso penalty for structural sparsity with a roughness penalty for smoothness control. An efficient block coordinate descent algorithm is developed for computation. Under regular semiparametric conditions, we establish selection consistency and oracle-type asymptotic properties, including asymptotic normality for the constant-effect component after correct structure recovery. Simulation studies and a real-data application show that TV-Select achieves more accurate structural recovery, smoother functional estimation, and better predictive performance than competing methods.

2603.07634 2026-03-10 stat.ME physics.data-an

Dissecting Spectral Granger Causality through Partial Information Decomposition

Luca Faes, Gorana Mijatovic, Riccardo Pernice, Daniele Marinazzo, Sebastiano Stramaglia, Yuri Antonacci

Abstract

Granger causality (GC), a popular statistical method for the inference of directional influences between time series measured from a complex network, is sensitive to high-order (non-pairwise) interactions which fundamentally shape the collective network dynamics. This work introduces Partial Decomposition of Granger Causality (PDGC), a tool eliciting redundant and synergistic causal interactions in the pattern of information flow between the subsystems of physiological networks. The tool exploits the framework of partial information decomposition to dissect the multivariate GC from a set of driver random processes to a target process into unique effects carried exclusively by each driver, redundant effects carried identically by more drivers, and synergistic effects carried jointly by some drivers but not by any of them individually. Computation is based on multivariate state-space models expanded in the frequency domain to assess PDGC both in specific bands of physiological interest and in the time domain after whole-band integration. The spectral PDGC was tested in physiological networks probed by measuring the variability series of arterial pressure, heart period, respiration and cerebral blood velocity in patients prone to neurally-mediated syncope compared to healthy controls. This application revealed unprecedented modes of physiological interaction, related to the sympathetic control of low-frequency cardiovascular and cerebrovascular oscillations, characterizing distinctive patterns of autonomic dysfunction. The extraction of high-order causality patterns from the spectral GC favors dissecting the mechanisms of causal influence underlying multivariate interactions among oscillatory processes in many data-driven applications of network science.

2603.07527 2026-03-10 stat.ME

An efficient method of posterior sampling for Poisson INGARCH models

Yixuan Fan, Zhengwei Liu, Fukang Zhu

Abstract

We develop an efficient posterior sampling scheme for Poisson INGARCH models. The proposed method is based on an approximation of the posterior density that exploits the Poisson limit of the negative binomial distribution. It allows us to rewrite the model in a form amenable to a Pólya-Gamma data augmentation scheme, which yields simple conditionally Gaussian updates for the autoregressive coefficients. Sampling from the approximate posterior is straightforward via Gibbs-type iterations and remains numerically stable even under strong temporal dependence. Using this sampler as a proposal distribution enhances the efficiency of Metropolis-Hastings algorithms and adaptive importance sampling. Numerical simulations indicate accurate posterior estimates, high effective sample sizes, and rapidly mixing chains.

2603.07522 2026-03-10 stat.ML cs.LG

Beyond Data Splitting: Full-Data Conformal Prediction by Differential Privacy

Young Hyun Cho, Jordan Awan

Abstract

Privacy protection and uncertainty quantification are increasingly important in data-driven decision making. Conformal prediction provides finite-sample marginal coverage, but existing private approaches often rely on data splitting, reducing the effective sample size. We propose a full-data privacy-preserving conformal prediction framework that avoids splitting. Our framework leverages stability induced by differential privacy to control the gap between in-sample and out-of-sample conformal scores, and pairs this with a conservative private quantile routine designed to prevent under-coverage. We show that a generic differential privacy guarantee yields a universal coverage floor, yet cannot generally recover the nominal $1-\alpha$ level. We then provide a refined, mechanism-specific stability analysis that yields asymptotic recovery of the nominal level. Experiments demonstrate sharper prediction sets than the split-based private baseline.

2603.07505 2026-03-10 stat.ME

Adapting to noise tails in private linear regression

Jinyuan Chang, Lin Yang, Mengyue Zha, Wen-Xin Zhou

Abstract

While the traditional goal of statistics is to infer population parameters, modern practice increasingly demands protection of individual privacy. One way to address this need is to adapt classical statistical procedures into privacy-preserving algorithms. In this paper, we develop differentially private tail-robust methods for linear regression. The trade-off among bias, privacy, and robustness is controlled by a tunable robustification parameter in the Huber loss. We implement noisy clipped gradient descent for low-dimensional settings and noisy iterative hard thresholding for high-dimensional sparse models. Under sub-Gaussian errors, our method achieves near-optimal convergence rates while relaxing several assumptions required in earlier work. For heavy-tailed errors, we explicitly characterize how the non-asymptotic convergence rate depends on the moment index, privacy parameters, sample size, and intrinsic dimension. Our analysis shows how the moment index influences the choice of robustification parameters and, in turn, the resulting statistical error and privacy cost. By quantifying the interplay among bias, privacy, and robustness, we extend classical perspectives on privacy-preserving robust regression. The proposed methods are evaluated through simulations and two real datasets.

2603.07479 2026-03-10 stat.ME

Mixed Effects Mixture of Experts: Modeling Double Heterogeneous Trajectories

Xinkai Yue, Xiaodong Yan, Haohui Han, Liya Fu

Comments 5 figures

Abstract

The linear mixed-effects model (LMM) is a cornerstone of longitudinal data analysis, but it is limited in its ability to handle heterogeneity in both group-specific fixed effects and subject-specific random effects. To address this challenge, we propose a novel statistical framework built on a large-model prototype: a mixed effects mixture of experts model (MEMoE). This framework integrates the divide-and-conquer paradigm of Mixture of Experts models with classical mixed-effects modeling. In the proposed MEMoE, each expert is a full LMM dedicated to capturing the longitudinal trajectory of a specific latent subpopulation, while a separate gating function learns to route subjects to the most appropriate expert in a data-driven manner based on baseline covariates. We develop a robust inferential procedure for parameter estimation based on the Laplace Expectation-Maximization algorithm, with standard errors calibrated using robust sandwich estimators to account for potential model misspecification. Extensive simulation studies and an empirical application demonstrate that MEMoE outperforms both the traditional single-population LMM and conventional Mixture of Experts models in terms of parameter recovery, classification accuracy, and overall model fit.

2603.07478 2026-03-10 stat.ME math.OC stat.AP

Evaluating consumption effects of intelligent control algorithms for district heated buildings

Antti Solonen, Arttu Häkkinen, Sallamaari Rapo, Antti Mäkinen, Sampo Kaukonen, Felipe Uribe

Abstract

As buildings become increasingly connected and sensor-rich, intelligent remote heating control is rapidly superseding conventional local heating control. Such control algorithms often aim at reducing energy consumption by minimizing over-heating and utilizing free solar energy, for instance. Numerous companies offering heating optimization solutions have recently emerged. After installing such a system, end-users naturally want to quantify and verify the effect of such an investment, i.e., monetary return. Methods for tracking buildings' heating efficiency are diverse, ranging from simple weather normalization to more complex modeling approaches, but lack transparency and commonly agreed best practices. The problem is further complicated by the fact that buildings constantly undergo non-control-related changes that affect their energy efficiency, making it difficult to isolate and track only control-related effects using the existing methods. In this paper, we first review and derive methods for monitoring the overall efficiency of buildings, and show their inability to isolate the control effects from other changes happening in the buildings. We then propose a model-based approach for estimating and tracking only the control-related effects. Moreover, we show how the models can decompose the total control effect into sub-components to reveal where the energy effects come from. We demonstrate the methods using real data collected over approximately 10 years from the Danfoss Leanheat Building platform. Our scope focuses on district heated buildings with substation-level (supply temperature) control, but the methodology extends to other cases as well.

2603.07467 2026-03-10 stat.ML cs.LG math.PR math.ST stat.ME stat.TH

Probabilistic Inference and Learning with Stein's Method

Qiang Liu, Lester Mackey, Chris Oates

Abstract

This monograph provides a rigorous overview of theoretical and methodological aspects of probabilistic inference and learning with Stein's method. Recipes are provided for constructing Stein discrepancies from Stein operators and Stein sets, and properties of these discrepancies such as computability, separation, convergence detection, and convergence control are discussed. Further, the connection between Stein operators and Stein variational gradient descent is set out in detail. The main definitions and results are precisely stated, and references to all proofs are provided.

2603.07458 2026-03-10 econ.EM stat.AP

ForeComp: An R Package for Comparing Predictive Accuracy Using Fixed-Smoothing Asymptotics

Minchul Shin, Nathan Schor

Comments 45 pages, 2 figures

Abstract

We introduce ForeComp, an R package for comparing predictive accuracy using Diebold-Mariano-type tests of equal predictive ability with standard and fixed-smoothing inference. The package provides a common interface for testing based on loss differentials and includes Plot Tradeoff, a visual diagnostic for bandwidth sensitivity and the size-power tradeoff. We illustrate the toolkit with Survey of Professional Forecasters applications and Monte Carlo evidence on finite-sample performance.

2603.07447 2026-03-10 stat.ME math.ST stat.AP stat.TH

Dirichlet kernel density estimation on the simplex with missing data

Hanen Daayeb, Wissem Jedidi, Salah Khardani, Guanjie Lyu, Frédéric Ouimet

Comments 32 pages, 9 figures, 2 tables

Abstract

Nonparametric density estimation for compositional data supported on the simplex is examined under a missing at random mechanism. Rather than imputing missing values and estimating the density from a completed data set, we adopt a strategy based on inverse probability weighting. The proposed estimator uses an adaptive Dirichlet kernel, which ensures nonnegativity on the simplex and favorable behavior near the boundary. When the observation probabilities are unknown, they are estimated through a Nadaraya-Watson regression step. The large-sample properties of the estimator are derived, including pointwise bias and variance expansions, optimal smoothing rates, and asymptotic normality. A simulation study investigates its finite-sample performance under varying sample sizes and missing rates. Simulations show our method outperforms inverse-probability-weighted kernel density estimators based on additive and isometric log-ratio transformations of the data for certain target densities. The methodology is further illustrated through an application to leukocyte composition data from the National Health and Nutrition Examination Survey (NHANES), which allows for the identification of the modal immune profile in the sampled population.

2603.07437 2026-03-10 cs.LG cs.SY eess.SY math.OC stat.ML

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

Comments 38 pages; preliminary version appeared in IEEE CDC 2023; this is the extended journal version, with an end-to-end guarantee added

Abstract

We study the problem of state representation learning for control from partial and potentially high-dimensional observations. We approach this problem via cost-driven state representation learning, in which we learn a dynamical model in a latent state space by predicting cumulative costs. In particular, we establish finite-sample guarantees on finding a near-optimal representation function and a near-optimal controller using the learned latent model for infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control. We study two approaches to cost-driven representation learning, which differ in whether the transition function of the latent state is learned explicitly or implicitly. The first approach has also been investigated in Part I of this work, for finite-horizon time-varying LQG control. The second approach closely resembles MuZero, a recent breakthrough in empirical reinforcement learning, in that it learns latent dynamics implicitly by predicting cumulative costs. A key technical contribution of this Part II is to prove persistency of excitation for a new stochastic process that arises from the analysis of quadratic regression in our approach, and may be of independent interest.

2603.07409 2026-03-10 stat.ME stat.ML

Tree-Based Predictive Models for Noisy Input Data

Kevin McCoy, Zachary Wooten, Christine B. Peterson

Comments 17 pages, 9 figures

Abstract

Measurement error is prevalent across all domains of scientific research where only imprecise observations, rather than the true underlying values, can be obtained. For example, estimates of human microbiome diversity are based on small samples from a much larger, generally unobserved system and reflect both sampling error and technical variation. In high-noise settings like these, it becomes difficult to make accurate predictions and to summarize uncertainty. Methods have previously been proposed to accommodate measurement error in classic predictive models, such as linear regression. However, relatively little work has been done to address measurement error in more complex and flexible models. Bayesian additive regression trees (BART), a Bayesian nonparametric model that sums the output of many decision trees, offers robust predictions with built-in uncertainty quantification. In this work, we propose measurement error BART (meBART), a novel extension to the BART model that directly incorporates measurement error in the independent variable(s). Through simulation studies, we show that in the presence of measurement error, our model enables more accurate parameter estimation, more robust uncertainty quantification, and superior predictive performance. We illustrate the utility of our proposed approach through two biomedical applications where the predictors of interest are subject to measurement error.

2603.07380 2026-03-10 stat.AP

Excessive data censoring in fMRI undermines individual precision and weakens brain-behavior associations

Amanda Mejia, Joanne Hwang, Damon Pham, Stephanie Noble, Theodore D. Satterthwaite, Thomas E. Nichols, B. T. Thomas Yeo

Details
Abstract

Censoring high-motion volumes in fMRI is common practice to reduce effects of head motion on functional connectivity (FC). Although aggressive censoring removes more noise, it causes extensive data loss, creating a tradeoff that may ultimately improve or degrade FC accuracy. Here, we evaluate how censoring affects FC estimation and downstream brain-wide association studies (BWAS). Using extensively sampled participants from the Human Connectome Project (HCP) Retest dataset, we establish individual "ground truth" FC and assess the accuracy of FC estimated from 5-30 minute scans. We find that censoring degrades FC accuracy, with more aggressive censoring being more detrimental, particularly among participants exhibiting above-average motion. In these participants, aggressive censoring reduces FC accuracy by 30% for 30-minute scans denoised with ICA-FIX, an advanced denoising method, and by 3% for scans denoised with conventional confound regression. These effects reflect substantial data loss (34%) that outweighs comparatively modest noise reductions: 7% with ICA-FIX and 18% with confound regression. Compensating for this would require substantially longer scans (62% with confound regression; 76% with ICA-FIX), inflating data collection budgets. Introducing a repeated measures framework to separate motion trait from artifact, we find that standard QC metrics are dominated by motion trait and overstate motion bias, which is effectively mitigated with less aggressive censoring. Finally, using data from nearly 1,000 HCP participants, we demonstrate that unreliable FC substantially attenuates BWAS correlations: by ~30% under optimal conditions (longer ICA-FIX scans with no censoring) but exceeding 75% in short, aggressively censored scans. Our findings support the use of advanced denoising methods, limiting censoring, and collecting longer scans to maximize fidelity of FC and BWAS.

2603.07351 2026-03-10 cs.RO cs.LG stat.ML

A Distributed Gaussian Process Model for Multi-Robot Mapping

Seth Nabarro, Mark van der Wilk, Andrew J. Davison

Comments ICRA 2026, 8 pages

Details
Abstract

We propose DistGP: a multi-robot learning method for collaborative learning of a global function using only local experience and computation. We utilise a sparse Gaussian process (GP) model with a factorisation that mirrors the multi-robot structure of the task, and admits distributed training via Gaussian belief propagation (GBP). Our loopy model outperforms Tree-Structured GPs \cite{bui2014tree} and can be trained online and in settings with dynamic connectivity. We show that such distributed, asynchronous training can reach the same performance as a centralised, batch-trained model, albeit with slower convergence. Last, we compare to DiNNO \cite{yu2022dinno}, a distributed neural network (NN) optimiser, and find DistGP achieves superior accuracy, is more robust to sparse communication and is better able to learn continually.

2602.22758 2026-03-10 cs.AI stat.AP

Decomposing Physician Disagreement in HealthBench

Satya Borgohain, Roy Mariathas

Details
Abstract

We decompose physician disagreement in the HealthBench medical AI evaluation dataset to understand where variance resides and what observable features can explain it. Rubric identity accounts for 15.8% of met/not-met label variance but only 3.6-6.9% of disagreement variance; physician identity accounts for just 2.4%. The dominant 81.8% case-level residual is not reduced by HealthBench's metadata labels (z = -0.22, p = 0.83), normative rubric language (pseudo R^2 = 1.2%), medical specialty (0/300 Tukey pairs significant), surface-feature triage (AUC = 0.58), or embeddings (AUC = 0.485). Disagreement follows an inverted-U with completion quality (AUC = 0.689), confirming physicians agree on clearly good or bad outputs but split on borderline cases. Physician-validated uncertainty categories reveal that reducible uncertainty (missing context, ambiguous phrasing) more than doubles disagreement odds (OR = 2.55, p < 10^(-24)), while irreducible uncertainty (genuine medical ambiguity) has no effect (OR = 1.01, p = 0.90), though even the former explains only ~3% of total variance. The agreement ceiling in medical AI evaluation is thus largely structural, but the reducible/irreducible dissociation suggests that closing information gaps in evaluation scenarios could lower disagreement where inherent clinical ambiguity does not, pointing toward actionable evaluation design improvements.

2602.15319 2026-03-10 stat.ME stat.AP

Bayesian Inference for Joint Tail Risk in Paired Biomarkers via Archimedean Copulas with Restricted Jeffreys Priors

Agnideep Aich, Md. Monzur Murshed, Sameera Hewage, Ashit Baran Aich

Details
Abstract

We propose a Bayesian copula-based framework to quantify clinically interpretable joint tail risks from paired continuous biomarkers. After converting each biomarker margin to rank-based pseudo-observations, we model dependence using one-parameter Archimedean copulas and focus on three probability-scale summaries at tail level $α$: the lower-tail joint risk $R_L(θ)=C_θ(α,α)$, the upper-tail joint risk $R_U(θ)=2α-1+C_θ(1-α,1-α)$, and the conditional lower-tail risk $R_C(θ)=R_L(θ)/α$. Uncertainty is quantified via a restricted Jeffreys prior on the copula parameter and grid-based posterior approximation, which induces an exact posterior for each tail-risk functional. In simulations from Clayton and Gumbel copulas across multiple dependence strengths, posterior credible intervals achieve near-nominal coverage for $R_L$, $R_U$, and $R_C$. We then analyze NHANES 2017--2018 fasting glucose (GLU) and HbA1c (GHB) ($n=2887$) at $α=0.05$, obtaining tight posterior credible intervals for both the dependence parameter and induced tail risks. The results reveal markedly elevated extremal co-movement relative to independence; under the Gumbel model, the posterior mean joint upper-tail risk is $R_U(α)=0.0286$, approximately $11.46\times$ the independence benchmark $α^2=0.0025$. Overall, the proposed approach provides a principled, dependence-aware method for reporting joint and conditional extremal-risk summaries with Bayesian uncertainty quantification in biomedical applications.
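
The three tail-risk summaries defined in this abstract can be evaluated directly once a copula family and parameter are chosen. The sketch below is illustrative only and is not the authors' code: it uses the textbook closed forms of the Clayton and Gumbel copulas, the function names are hypothetical, and the parameter values are arbitrary rather than taken from the paper. It covers only the plug-in formulas, not the restricted Jeffreys prior or the grid-based posterior.

```python
import math

def clayton(u, v, theta):
    """Textbook Clayton copula C(u, v), valid for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel(u, v, theta):
    """Textbook Gumbel copula C(u, v), valid for theta >= 1 (theta = 1 is independence)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def tail_risks(copula, theta, alpha):
    """Return (R_L, R_U, R_C) at tail level alpha, as defined in the abstract."""
    r_lower = copula(alpha, alpha, theta)                          # R_L = C(alpha, alpha)
    r_upper = 2 * alpha - 1 + copula(1 - alpha, 1 - alpha, theta)  # R_U
    r_cond = r_lower / alpha                                       # R_C = R_L / alpha
    return r_lower, r_upper, r_cond

alpha = 0.05
# Independence check: Gumbel with theta = 1 reduces to C(u, v) = u * v,
# so both joint tail risks equal the benchmark alpha ** 2.
indep_lower, indep_upper, indep_cond = tail_risks(gumbel, 1.0, alpha)

# Arbitrary illustrative dependence strengths (not estimates from the paper).
gumbel_lower, gumbel_upper, gumbel_cond = tail_risks(gumbel, 2.0, alpha)
clayton_lower, clayton_upper, clayton_cond = tail_risks(clayton, 2.0, alpha)
```

Under independence both joint risks collapse to $α^2$, which is why the abstract reports the upper-tail risk as a multiple of that benchmark.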

2512.03112 2026-03-10 cs.LG cs.AI stat.ML

Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability

Jialai She

Details
Abstract

Shapley values, a gold standard for feature attribution in Explainable AI, face two key challenges. First, the canonical Shapley framework assumes that the worth function is additive, yet real-world payoff constructions--driven by non-Gaussian distributions, heavy tails, feature dependence, or domain-specific loss scales--often violate this assumption, leading to distorted attributions. Second, achieving sparse explanations in high-dimensional settings by computing dense Shapley values and then applying ad hoc thresholding is costly and risks inconsistency. We introduce Sparse Isotonic Shapley Regression (SISR), a unified nonlinear explanation framework. SISR simultaneously learns a monotonic transformation to restore additivity--obviating the need for a closed-form specification--and enforces an L0 sparsity constraint on the Shapley vector, enhancing computational efficiency in large feature spaces. Its optimization algorithm leverages Pool-Adjacent-Violators for efficient isotonic regression and normalized hard-thresholding for support selection, ensuring ease in implementation and global convergence guarantees. Analysis shows that SISR recovers the true transformation in a wide range of scenarios and achieves strong support recovery even in high noise. Moreover, we are the first to demonstrate that irrelevant features and inter-feature dependencies can induce a true payoff transformation that deviates substantially from linearity. Extensive experiments demonstrate that SISR stabilizes attributions across payoff schemes and correctly filters irrelevant features; in contrast, standard Shapley values suffer severe rank and sign distortions. By unifying nonlinear transformation estimation with sparsity pursuit, SISR advances the frontier of nonlinear explainability, providing a theoretically grounded and practical attribution framework.
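
For readers unfamiliar with the Pool-Adjacent-Violators step named in this abstract, here is a minimal sketch of the standard algorithm for least-squares isotonic (non-decreasing) regression. It is the generic textbook procedure, not the SISR implementation; the function name `pava` and the optional weights argument are assumptions made for illustration.

```python
def pava(y, w=None):
    """Least-squares non-decreasing fit to the sequence y (optional positive weights w)."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

example_fit = pava([1, 3, 2, 4])
```

In the example, the adjacent pair 3, 2 violates the ordering and is pooled to its average, while the other points are left unchanged.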

2510.16717 2026-03-10 stat.ME math.ST quant-ph stat.TH

Correlation of divergency: c-delta. Being different in a similar way or not

Johan F. Hoorn

Comments 17 pages, 1 table

Details
Abstract

This paper introduces the correlation-of-divergency coefficient, c-delta, a custom statistical measure designed to quantify the similarity of internal divergence patterns between two groups of values. Unlike conventional correlation coefficients such as Pearson or Spearman, which assess the association between paired values, c-delta evaluates whether the way values differ within one group is mirrored in another. The method involves calculating, for each value, its divergence from all other values in its group, and then comparing these patterns across the two groups (e.g., human vs machine intelligence). The coefficient is normalised by the average root mean square divergence within each group, ensuring scale invariance. Potential applications of c-delta span quantum physics, where it can compare the spread of measurement outcomes between quantum systems, as well as fields such as genetics, ecology, psychometrics, manufacturing, machine learning, and social network analysis. The measure is particularly useful for benchmarking, clustering validation, and assessing the similarity of variability structures. While c-delta is not bounded between -1 and 1 and may be sensitive to outliers (but so is Pearson's r), it offers a new perspective for analysing internal variability and divergence. The article discusses the mathematical formulation, potential adaptations for complex data, and the interpretative considerations relevant to this alternative approach.

2509.02171 2026-03-10 stat.ML cs.LG stat.AP

Synthetic data for ratemaking: imputation-based methods vs adversarial networks and autoencoders

Yevhen Havrylenko, Meelis Käärik, Artur Tuttar

Comments 35 pages, 2 figures, 2 tables

Details
Abstract

Actuarial ratemaking depends on high-quality data, yet access to such data is often limited by the cost of obtaining new data, privacy concerns, etc. In this paper, we explore synthetic-data generation as a potential solution to these issues. In addition to generative methods previously studied in the actuarial literature, we explore and benchmark another class of approaches based on Multivariate Imputation by Chained Equations (MICE). In a comparative study using an open-source dataset, MICE-based models are evaluated against other generative models like Variational Autoencoders and Conditional Tabular Generative Adversarial Networks. We assess how well synthetic data preserves the original marginal distributions of variables as well as the multivariate relationships among covariates. The consistency between Generalized Linear Models (GLMs) trained on synthetic data with GLMs trained on the original data is also investigated. Furthermore, we assess the ease of use of each generative approach and study the impact of generically augmenting original data with synthetic data on the performance of GLMs for predicting claim counts. Our results highlight the potential of MICE-based methods in creating high-fidelity tabular data while offering lower implementation complexity compared to deep generative models.

2508.01920 2026-03-10 q-bio.NC q-bio.QM stat.AP

CITS: Nonparametric Statistical Causal Modeling for High-Resolution Neural Time Series

Rahul Biswas, SuryaNarayana Sripada, Somabha Mukherjee, Reza Abbasi-Asl

Comments arXiv admin note: text overlap with arXiv:2312.09604

Details
Abstract

Identifying causal interactions in complex dynamical systems is a fundamental challenge across the computational sciences. Existing functional connectivity methods capture correlations but not causation. While addressing directionality, popular causal inference tools such as Granger causality and the Peter-Clark algorithm rely on restrictive assumptions that limit their applicability to high-resolution time-series data, such as the large-scale recordings now standard in neuroscience. Here, we introduce CITS (Causal Inference in Time Series), a nonparametric framework for inferring statistically causal structure from multivariate time series. CITS models dynamics using a structural causal model of arbitrary Markov order and statistical tests for lagged conditional independence. We prove consistency under mild assumptions and demonstrate superior accuracy over state-of-the-art baselines across simulated linear, nonlinear, and recurrent neural network benchmarks. Applying CITS to large-scale neuronal recordings from the mouse visual cortex, thalamus, and hippocampus, we uncover stimulus-specific causal pathways and inter-regional hierarchies that align with known anatomy while revealing new functional insights. We further highlight the ability of CITS to accurately identify conditional dependencies within small inferred neuronal motifs. These results establish CITS as a theoretically grounded and empirically validated method for discovering interpretable statistically causal networks in neural time series. Beyond neuroscience, the framework is broadly applicable to causal discovery in complex temporal systems across domains.

2505.09496 2026-03-10 stat.ML cs.LG

Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data

Rui Miao, Babak Shahbaba, Annie Qu

Details
Abstract

Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments in order to maximize the expected total rewards by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus, may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time-stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data application demonstrate the superior numerical performance of the proposed method compared with existing methods.

2504.20527 2026-03-10 math.OC stat.ML

Adaptive Replication Strategies in Trust-Region-Based Bayesian Optimization of Stochastic Functions

Mickael Binois, Jeffrey Larson

Details
Abstract

We develop and analyze a method for stochastic simulation optimization based on Gaussian process models within a trust-region framework. We focus on settings where the variance of the objective function is large, making accurate estimation challenging and often requiring many evaluations. To address this regime, we combine local modeling with adaptive replication, allowing the method to allocate repeated evaluations where they are most beneficial. We introduce several mechanisms to promote and adapt replication, including modifications to the acquisition function and cost-aware evaluation strategies. These components enable our approach to scale effectively when high levels of sampling are required to reduce noise. Numerical experiments show that adaptive replication can improve solution accuracy by several orders of magnitude over baseline methods, and can also improve computational efficiency when evaluation costs are taken into account.

2502.13711 2026-03-10 math.ST math.PR stat.AP stat.TH

On noncentral Wishart mixtures of noncentral Wisharts and their use for testing random effects in factorial design models

Christian Genest, Anne MacKay, Frédéric Ouimet

Comments 12 pages, 0 figures, 2 tables

Details
Journal ref
Journal of Mathematical Analysis and Applications (2026), 554 (1), 1-11
Abstract

It is shown that a noncentral Wishart mixture of noncentral Wishart distributions with the same degrees of freedom yields a noncentral Wishart distribution, thereby extending the main result of Jones and Marchand [Stat 10 (2021), Paper No. e398, 7 pp.] from the chi-square to the Wishart setting. To illustrate its use, this fact is then employed to derive the finite-sample distribution of test statistics for random effects in a two-factor factorial design model with $d$-dimensional normal data, thereby broadening the findings of Bilodeau [ArXiv (2022), 6 pp.], who treated the case $d = 1$. The same approach makes it possible to test random effects in more general factorial design models.

2501.15163 2026-03-10 cs.LG stat.ML

The Exploration of Error Bounds in Classification with Noisy Labels

Haixia Liu, Boxiao Li, Can Yang, Yang Wang

Comments 21 pages

Details
Abstract

Numerous studies have shown that label noise can lead to poor generalization performance, negatively affecting classification accuracy. Therefore, understanding the effectiveness of classifiers trained using deep neural networks in the presence of noisy labels is of considerable practical significance. In this paper, we focus on the error bounds of excess risks for classification problems with noisy labels within deep learning frameworks. We derive error bounds for the excess risk, decomposing it into statistical error and approximation error. To handle statistical dependencies (e.g., mixing sequences), we employ an independent block construction to bound the error, leveraging techniques for dependent processes. For the approximation error, we extend these theoretical results to the vector-valued setting, where the output space consists of $K$-dimensional unit vectors. Finally, under the low-dimensional manifold hypothesis, we further refine the approximation error to mitigate the impact of high-dimensional input spaces.

2501.06024 2026-03-10 stat.ME math.ST stat.TH

Doubly-Robust Functional Average Treatment Effect Estimation

Lorenzo Testa, Tobia Boschi, Francesca Chiaromonte, Edward H. Kennedy, Matthew Reimherr

Comments 19 pages, 2 figures

Details
Abstract

Understanding causal relationships in the presence of complex, structured data remains a central challenge in modern statistics and science in general. While traditional causal inference methods are well-suited for scalar outcomes, many scientific applications demand tools capable of handling functional data -- outcomes observed as functions over continuous domains such as time or space. Motivated by this need, we propose DR-FoS, a novel method for estimating the Functional Average Treatment Effect (FATE) in observational studies with functional outcomes. DR-FoS exhibits double robustness properties, ensuring consistent estimation of FATE even if either the outcome or the treatment assignment model is misspecified. By leveraging recent advances in functional data analysis and causal inference, we establish the asymptotic properties of the estimator, proving its convergence to a Gaussian process. This guarantees valid inference with simultaneous confidence bands across the entire functional domain. Through extensive simulations, we show that DR-FoS achieves robust performance under a wide range of model specifications. Finally, we illustrate the utility of DR-FoS in a real-world application, analyzing functional outcomes to uncover meaningful causal insights in the SHARE (Survey of Health, Aging and Retirement in Europe) dataset.

2501.04959 2026-03-10 econ.EM stat.CO

DisSim-FinBERT: Text Simplification for Core Message Extraction in Complex Financial Texts

Wonseong Kim, Christina Niklaus, Choong Lyol Lee, Siegfried Handschuh

Comments 28 pages, 5 figures, 2 tables

Details
Abstract

This study proposes DisSim-FinBERT, a novel framework that integrates Discourse Simplification (DisSim) with Aspect-Based Sentiment Analysis (ABSA) to enhance sentiment prediction in complex financial texts. By simplifying intricate documents such as Federal Open Market Committee (FOMC) minutes, DisSim improves the precision of aspect identification, resulting in sentiment predictions that align more closely with economic events. The model preserves the original informational content and captures the inherent volatility of financial language, offering a more nuanced and accurate interpretation of long-form financial communications. This approach provides a practical tool for policymakers and analysts aiming to extract actionable insights from central bank narratives and other detailed economic documents.

2410.21263 2026-03-10 stat.ME cs.LG math.ST stat.ML stat.TH

Adaptive Transfer Clustering: A Unified Framework

Yuqi Gu, Zhongyuan Lyu, Kaizheng Wang

Comments 72 pages

Details
Abstract

We propose a general transfer learning framework for clustering given a main dataset and an auxiliary one about the same subjects. The two datasets may reflect similar but different latent grouping structures of the subjects. We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy, by optimizing an estimated bias-variance decomposition. It applies to a broad class of statistical models including Gaussian mixture models, stochastic block models, and latent class models. A theoretical analysis proves the optimality of ATC under the Gaussian mixture model and explicitly quantifies the benefit of transfer. Extensive simulations and real data experiments confirm our method's effectiveness in various scenarios.

2409.08838 2026-03-10 stat.ME

Intrinsic Geometry-Based Angular Covariance: A Novel Framework for Nonparametric Changepoint Detection in Meteorological Data

Surojit Biswas, Buddhananda Banerjee, Arnab Kumar Laha

Comments arXiv admin note: text overlap with arXiv:2403.00508

Details
Abstract

In many temporal datasets, the parameters of the underlying distribution may change abruptly at unknown times. Detecting such changepoints is crucial for numerous applications. Although such a problem has been extensively studied for linear data, there has been notably less research on bivariate angular data. To the best of our knowledge, this paper presents the first attempt to address the changepoint detection problem for the mean direction of toroidal and spherical data. By defining the "square of an angle" through intrinsic geometry, we construct a curved dispersion matrix for bivariate angular data, analogous to the linear dispersion matrix in Euclidean space. Using the analogous measure of the "Mahalanobis distance," we develop two new non-parametric tests to identify changes in the mean direction parameters for toroidal and spherical distributions. The pivotal distributions of the test statistics are shown to follow the Kolmogorov distribution under the null hypothesis. Under the alternative hypothesis, we establish the consistency of the proposed tests. We also apply the proposed methods to detect changes in mean direction for hourly wind-wave direction (toroidal) measurements and the path (spherical) of the cyclonic storm "Biporjoy," which occurred between 6th and 19th June 2023 over the Arabian Sea, off the western coast of India.

2407.19602 2026-03-10 stat.ME

Metropolis-Hastings with Scalable Subsampling

Estevão Prado, Christopher Nemeth, Chris Sherlock

Comments 78 pages, 14 figures, 9 tables

Details
Abstract

The Metropolis-Hastings (MH) algorithm is one of the most widely used Markov Chain Monte Carlo schemes for generating samples from Bayesian posterior distributions. The algorithm is asymptotically exact, flexible and easy to implement. However, in the context of Bayesian inference for large datasets, evaluating the likelihood on the full data for thousands of iterations until convergence can be prohibitively expensive. This paper introduces a new subsample MH algorithm that satisfies detailed balance with respect to the target posterior and utilises control variates to enable exact, efficient Bayesian inference on datasets with large numbers of observations. Through theoretical results, simulation experiments and real-world applications on certain generalised linear models, we demonstrate that our method requires substantially smaller subsamples and is computationally more efficient than the standard MH algorithm and other exact subsample MH algorithms.

2407.05110 2026-03-10 math.ST math.OC stat.AP stat.TH

Distributional stability of sparse inverse covariance matrix estimators

Renjie Chen, Huifu Xu, Henryk Zähle

Details
Abstract

Finding an approximation of the inverse of the covariance matrix, also known as precision matrix, of a random vector with empirical data is widely discussed in finance and engineering. In data-driven problems, empirical data may be "contaminated". This raises the question as to whether the approximate precision matrix is reliable from a statistical point of view. In this paper, we concentrate on a much-noticed sparse estimator of the precision matrix and investigate the issue from the perspective of distributional stability. Specifically, we derive an explicit local Lipschitz bound for the distance between the distributions of the sparse estimator under two different distributions (regarded as the true data distribution and the distribution of "contaminated" data). The distance is measured by the Kantorovich metric on the set of all probability measures on a matrix space. We also present analogous results for the standard estimators of the covariance matrix and its eigenvalues. Furthermore, we discuss several applications and conduct some numerical experiments.

2301.08056 2026-03-10 stat.ME math.PR math.ST stat.TH

Geodesic slice sampling on the sphere

Michael Habeck, Mareike Hasenpflug, Shantanu Kodgirwar, Daniel Rudolf

Comments 38 pages, 10 figures in the main text, 1 table in the appendix, appeared in Journal of Machine Learning Research, 26(297), 1-28, (2025)

Details
Abstract

Probability measures on the sphere form an important class of statistical models and are used, for example, in modeling directional data or shapes. Due to their widespread use, but also as an algorithmic building block, efficient sampling of distributions on the sphere is highly desirable. We propose a shrinkage-based and an idealized geodesic slice sampling Markov chain, designed to generate approximate samples from distributions on the sphere. In particular, the shrinkage-based version of the algorithm can be implemented such that it runs efficiently and has no tuning parameters. We verify reversibility and prove that under weak regularity conditions geodesic slice sampling is uniformly ergodic. Numerical experiments show that the proposed slice samplers achieve excellent mixing on challenging targets including distributions arising in rigid-registration problems and mixtures of von Mises-Fisher distributions. In these settings our approach outperforms standard samplers such as random-walk Metropolis-Hastings and Hamiltonian Monte Carlo.

2212.14857 2026-03-10 math.ST stat.ME stat.ML stat.TH

Nuisance Function Tuning and Sample Splitting for Optimally Estimating a Doubly Robust Functional

Sean McGrath, Rajarshi Mukherjee

Details
Abstract

Estimators of doubly robust functionals typically rely on estimating two complex nuisance functions, such as the propensity score and conditional outcome mean for the average treatment effect functional. We consider the problem of how to estimate nuisance functions to obtain optimal rates of convergence for a doubly robust nonparametric functional that has witnessed applications across the causal inference and conditional independence testing literature. For several plug-in estimators and a first-order bias-corrected estimator, we illustrate the interplay between different tuning parameter choices for the nuisance function estimators and sample splitting strategies on the optimal rate of estimating the functional of interest. For each of these estimators and each sample splitting strategy, we show the necessity to either undersmooth or oversmooth the nuisance function estimators under low regularity conditions to obtain optimal rates of convergence for the functional of interest. Unlike the existing literature, we show that plug-in and first-order bias-corrected estimators can achieve minimax rates of convergence across all Hölder smoothness classes of the nuisance functions by careful combinations of sample splitting and nuisance function tuning strategies. We complement these results with numerical simulations illustrating the impact of different nuisance function tuning and sample splitting strategies.

2212.14511 2026-03-10 cs.LG cs.SY eess.SY math.OC stat.ML

Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part I

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

Comments 51 pages; preliminary version appeared in L4DC 2023; this is the extended journal version, with an end-to-end guarantee added

Details
Abstract

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a cost-driven approach, where a dynamic model in some latent state space is learned by predicting the costs without predicting the observations or actions. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model, for finite-horizon time-varying LQG control problems. To the best of our knowledge, despite various empirical successes, finite-sample guarantees of such a cost-driven approach remain elusive. Our result underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations. A second part of this work, that is to appear as Part II, addresses the infinite-horizon linear time-invariant setting; it also extends the results to an approach that implicitly learns the latent dynamics, inspired by the recent empirical breakthrough of MuZero in model-based reinforcement learning.

2108.07636 2026-03-10 stat.ML cs.LG

Accounting for shared covariates in semi-parametric Bayesian additive regression trees

Estevão B. Prado, Andrew C. Parnell, Keefe Murphy, Nathan McJames, Ann O'Shea, Rafael A. Moral

Comments 48 pages, 8 tables, 10 figures

Details
Journal ref
The Annals of Applied Statistics 19 (1) 302 - 328, March 2025
Abstract

We propose some extensions to semi-parametric models based on Bayesian additive regression trees (BART). In the semi-parametric BART paradigm, the response variable is approximated by a linear predictor and a BART model, where the linear component is responsible for estimating the main effects and BART accounts for non-specified interactions and non-linearities. Previous semi-parametric models based on BART have assumed that the set of covariates in the linear predictor and the BART model are mutually exclusive in an attempt to avoid poor coverage properties and reduce bias in the estimates of the parameters in the linear predictor. The main novelty in our approach lies in the way we change the tree-generation moves in BART to deal with this bias and resolve non-identifiability issues between the parametric and non-parametric components, even when they have covariates in common. This allows us to model complex interactions involving the covariates of primary interest, both among themselves and with those in the BART component. Our novel method is developed with a view to analysing data from an international education assessment, where certain predictors of students' achievements in mathematics are of particular interpretational interest. Through additional simulation studies and another application to a well-known benchmark dataset, we also show competitive performance when compared to regression models, alternative formulations of semi-parametric BART, and other tree-based methods. The implementation of the proposed method is available at https://github.com/ebprado/CSP-BART.

1905.01358 2026-03-10 cs.MA stat.AP

Agent based decision making for Integrated Air Defense system

Sumanta Kumar Das, Sumant Mukherjee

Comments 8 pages, 9 figures, 2 tables

Details
Journal ref
Journal of Battlefield Technology, 2011, vol. 14, no. 1
English abstract

This paper presents decision-making agent algorithms for an integrated air defense (IAD) system. The advantage of an agent-based system over a conventional decision-making system is its ability to automatically detect and track targets and, if required, allocate weapons to neutralize threats in an integrated mode. Such an approach is particularly useful for futuristic network-centric warfare. Two agents are presented that perform the basic decision-making tasks of command and control (C2), such as detection of and action against jamming, threat assessment, and weapons allocation. The belief-desire-intention (BDI) architecture underlies the building blocks of these agents. The agents decide their actions through a meta-level plan reasoning process. The proposed agent-based IAD system runs without any manual input and represents a state-of-the-art model for C2 autonomy.

2603.07320 2026-03-10 stat.ME

Bayesian repulsive mixture model for multivariate functional data

Ricardo Cunha Pedroso, Fernando Andrés Quintana, Rosangela Helena Loschi

Comments 25 pages, 7 figures

Details
English abstract

We introduce a repulsive mixture model to cluster observation units represented by multivariate functional data, based on similarity of curve shapes and individual-specific covariates. We propose a repulsive prior distribution for the component-specific location parameters that depends on a B-spline curve-tailored distance, extending existing repulsive priors to the context of multivariate functional data. The proposed model favors the identification of well-differentiated clusters, avoiding the presence of redundant ones. To sample from the posterior distribution, we propose an MCMC algorithm that includes a novel split-merge step that significantly improves the chain mixing. Different features of the proposed model, including the effects of repulsion and covariates in the clustering, are evaluated through simulation. The proposed model is fitted to analyze Chronic Ankle Instability (CAI) data, focusing on identifying individuals with similar types of physical dysfunctions based on the similarity of movement patterns.

2603.07310 2026-03-10 stat.CO math.PR

A note on diffusive/random-walk behaviour in Metropolis--Hastings algorithms

Yuxin Liu, Peiyi Zhou, Samuel Livingstone

Comments 12 pages, 9 pages of appendix

Details
English abstract

We prove a general result that if a Metropolis--Hastings algorithm has a proposal that is not geometrically ergodic and the acceptance rate approaches unity at a suitable rate as the state variable becomes large, then the Metropolised chain will also not be geometrically ergodic. Our conditions seem stronger than might be expected, but are shown to be necessary through a counterexample. We then turn our attention to the random walk and guided walk Metropolis algorithms. We show that if the target distribution has polynomial tails the latter converges at twice the polynomial rate of the former, but that if instead the target distribution has strictly convex potential then the random walk Metropolis behaves as a $1/2$-lazy version of the guided walk Metropolis when the state variable is large, and therefore moves at a similar (ballistic) speed.
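The two samplers compared in this abstract can be sketched on a toy target. The snippet below is a minimal illustration, not the authors' code: it assumes a one-dimensional standard normal target, a Gaussian increment, and the usual guided-walk construction in which the chain keeps a direction and flips it on rejection. The function names, step size, and seed are hypothetical choices.

```python
import math
import random


def log_target(x):
    # Illustrative target: standard normal log-density up to a constant.
    return -0.5 * x * x


def random_walk_metropolis(n_steps, step=1.0, x0=0.0, seed=0):
    """Random walk Metropolis with symmetric Gaussian proposals."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        y = x + step * rng.gauss(0.0, 1.0)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_target(y) - log_target(x):
            x = y
        samples.append(x)
    return samples


def guided_walk_metropolis(n_steps, step=1.0, x0=0.0, seed=0):
    """Guided walk: propose in the current direction, flip it on rejection."""
    rng = random.Random(seed)
    x, direction, samples = x0, 1, []
    for _ in range(n_steps):
        y = x + direction * abs(step * rng.gauss(0.0, 1.0))
        if math.log(1.0 - rng.random()) < log_target(y) - log_target(x):
            x = y
        else:
            direction = -direction
        samples.append(x)
    return samples


rw_samples = random_walk_metropolis(20000)
gw_samples = guided_walk_metropolis(20000)
```

Both chains use the same acceptance ratio; the only difference is that the guided walk reuses its direction until a rejection occurs, which is the mechanism behind the persistent motion discussed in the abstract.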

2603.07288 2026-03-10 stat.ME

Loglinear modelling of huge contingency tables

Veronica Vinciotti, Ernst C. Wit

Details
English abstract

Contingency tables are a fundamental representation of multivariate categorical data. As the size of the contingency table grows exponentially with the number of variables, even a moderate number of variables, each with a moderate number of levels, will result in a huge number of cells, the majority of which will remain empty even with a significant amount of data. We propose an efficient method for inferring higher-order loglinear models in such scenarios. We tackle the computational challenge by using only a sample of the empty cells and deriving the associated likelihood under a Poisson sampling scheme. This allows us to define an iteratively re-weighted least squares (IRWLS) algorithm for parameter estimation. Under the extreme setting of huge contingency tables, we show how standard Poisson regression on the sampled data converges to this IRWLS scheme, when the number of sampled empty cells exceeds the number of observations. We illustrate the method with an analysis of data from the General Social Survey, which consists of 15014 observations in a 70-dimensional contingency table with a total of $2.6 \times 10^{39}$ cells.

2603.07276 2026-03-10 cs.CV cs.LG stat.ML

Variational Flow Maps: Make Some Noise for One-Step Conditional Generation

Abbas Mammadov, So Takao, Bohan Chen, Ricardo Baptista, Morteza Mardani, Yee Whye Teh, Julius Berner

Details
English abstract

Flow maps enable high-quality image generation in a single forward pass. However, unlike iterative diffusion models, their lack of an explicit sampling trajectory impedes incorporating external constraints for conditional generation and solving inverse problems. We put forth Variational Flow Maps, a framework for conditional sampling that shifts the perspective of conditioning from "guiding a sampling path", to that of "learning the proper initial noise". Specifically, given an observation, we seek to learn a noise adapter model that outputs a noise distribution, so that after mapping to the data space via flow map, the samples respect the observation and data prior. To this end, we develop a principled variational objective that jointly trains the noise adapter and the flow map, improving noise-data alignment, such that sampling from complex data posterior is achieved with a simple adapter. Experiments on various inverse problems show that VFMs produce well-calibrated conditional samples in a single (or few) steps. For ImageNet, VFM attains competitive fidelity while accelerating the sampling by orders of magnitude compared to alternative iterative diffusion/flow models. Code is available at https://github.com/abbasmammadov/VFM

2603.07273 2026-03-10 math.ST stat.TH

Maximal Ancillarity, Semiparametric Efficiency, and the Elimination of Nuisances

Marc Hallin, Bas J. M. Werker, Bo Zhou

Details
English abstract

Restricting statistical experiments via nuisance-ancillary $σ$-fields yields nuisance-free experiments. However, a moot point with ancillarity is that maximal ancillary $σ$-fields are typically not unique. There are exceptions, though, among which the limiting experiments in a locally asymptotically normal (LAN) context. Building on this, we address the maximal ancillarity uniqueness problem by adopting a Hájek-Le Cam asymptotic perspective and define the concept of sequences of locally asymptotically maximal nuisance-ancillary $σ$-fields. We then show that any semiparametrically efficient procedure admits versions that are measurable with respect to such $σ$-fields while enjoying strict finite-sample nuisance-ancillarity, hence eliminating the nuisance without the hassle of estimating it. This is in sharp contrast with classical tangent space projections, which also achieve semiparametric efficiency but only enjoy asymptotic nuisance-ancillarity -- at the price, moreover, of adequately estimating the nuisance. When the nuisance is the density of some noise or innovation driving the data-generating process of a LAN experiment, we show that a sequence of locally asymptotically maximal nuisance-ancillary $σ$-fields is generated by the so-called center-outward residual ranks and signs based on measure transportation results. Restricting local experiments to such $σ$-fields yields sequences of finite-sample nuisance-free (here, distribution-free) restrictions of the original local LAN experiments that nevertheless achieve the semiparametric efficiency bounds of the original ones.

2603.07247 2026-03-10 math.NA cs.NA stat.CO

Multi-parameter determination in the semilinear Helmholtz equation

Long-Ling Du, Zejun Sun, Li-Li Wang, Guang-Hui Zheng

Comments 26 pages

Details
English abstract

This paper studies an inverse boundary value problem for a semilinear Helmholtz equation with Neumann boundary conditions in a bounded domain $Ω\subset \mathbb{R}^n$ ($n\ge2$). The objective is to recover the unknown linear and nonlinear coefficients from the associated Neumann-to-Dirichlet (NtD) map. Using a higher-order linearization approach, we establish the unique determination of both coefficients from boundary measurements. For spatial dimensions $n\ge3$, uniqueness holds under $C^γ(\overlineΩ)$ regularity assumptions with $0<γ<1$, while in the two-dimensional case uniqueness is obtained under Sobolev regularity $W^{1,p}(Ω)$ with $p>2$. The analysis relies on the well-posedness of the forward problem together with techniques from linear inverse problems, including Runge-type approximation arguments and Fourier analysis. In addition, we develop a numerical reconstruction framework for recovering the coefficients from boundary data. The forward problem is discretized using a finite difference scheme combined with a quasi-Newton iteration, and the inverse problem is formulated within a Bayesian inference framework. Posterior distributions of the coefficients are explored using the preconditioned Crank-Nicolson (pCN) Markov chain Monte Carlo algorithm, which provides both point estimates and uncertainty quantification. Numerical experiments demonstrate the effectiveness of the proposed reconstruction method and illustrate the theoretical uniqueness results.
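The preconditioned Crank-Nicolson (pCN) step mentioned in this abstract can be illustrated on a scalar toy problem. This is a sketch under stated assumptions, not the paper's reconstruction code: the prior is taken to be N(0, 1), the data misfit `toy_potential` is a hypothetical Gaussian likelihood for a single observation, and the step parameter `beta`, chain length, and seed are arbitrary. In the paper's setting the potential would instead involve the discretized forward solver.

```python
import math
import random


def toy_potential(u, y=1.0, sigma=0.5):
    # Hypothetical negative log-likelihood: y = u + noise, noise ~ N(0, sigma^2).
    return (y - u) ** 2 / (2.0 * sigma ** 2)


def pcn_sampler(potential, n_steps, beta=0.5, u0=0.0, seed=0):
    """pCN Metropolis sampler for a posterior with an N(0, 1) prior."""
    rng = random.Random(seed)
    u, chain = u0, []
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # fresh draw from the prior
        v = math.sqrt(1.0 - beta ** 2) * u + beta * xi  # prior-preserving proposal
        # The acceptance ratio involves only the potential, not the prior density.
        if math.log(1.0 - rng.random()) < potential(u) - potential(v):
            u = v
        chain.append(u)
    return chain


chain = pcn_sampler(toy_potential, 20000)
kept = chain[2000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
posterior_var = sum((c - posterior_mean) ** 2 for c in kept) / len(kept)
```

For this toy model the exact posterior is Gaussian with mean 0.8 and variance 0.2, so the chain averages give a quick sanity check on both the point estimate and the uncertainty estimate.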

2603.07230 2026-03-10 stat.ME cs.LG stat.ML

Conditional Rank-Rank Regression via Deep Conditional Transformation Models

Xiaoyi Wang, Long Feng, Zhaojun Wang

Details
English abstract

Intergenerational mobility quantifies the transmission of socio-economic outcomes from parents to children. While rank-rank regression (RRR) is standard, adding covariates directly (RRRX) often yields parameters with unclear interpretation. Conditional rank-rank regression (CRRR) resolves this by using covariate-adjusted (conditional) ranks to measure within-group mobility. We improve and extend CRRR by estimating conditional ranks with a deep conditional transformation model (DCTM) and cross-fitting, enabling end-to-end conditional distribution learning with structural constraints and strong performance under nonlinearity, high-order interactions, and discrete ordered outcomes where the distributional regression used in traditional CRRR may be cumbersome or prone to misconfiguration. We further extend CRRR to discrete outcomes via an $ω$-indexed conditional-rank definition and study sensitivity to $ω$. For continuous outcomes, we establish an asymptotic theory for the proposed estimators and verify the validity of exchangeable bootstrap inference. Simulations across simple/complex continuous and discrete ordered designs show clear accuracy gains in challenging settings. Finally, we apply our method to two empirical studies, revealing substantial within-group persistence in U.S. income and pronounced gender differences in educational mobility in India.
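As a point of reference for the baseline this abstract starts from, plain rank-rank regression can be sketched as an ordinary least-squares slope of the child's rank on the parent's rank. The snippet is an illustrative sketch only: it uses unconditional ranks with average ranks for ties, the helper names are hypothetical, and it does not implement the conditional ranks, DCTM, or cross-fitting described above.

```python
def fractional_ranks(values):
    """Ranks scaled to (0, 1]; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / n
        i = j + 1
    return ranks


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def rank_rank_slope(parent, child):
    """Rank-rank slope: regress child ranks on parent ranks."""
    return ols_slope(fractional_ranks(parent), fractional_ranks(child))


# Toy data: child outcome is a monotone function of parent outcome.
slope_up = rank_rank_slope([10, 20, 30, 40, 50], [1, 4, 9, 16, 25])
slope_down = rank_rank_slope([10, 20, 30, 40, 50], [25, 16, 9, 4, 1])
```

A slope near 1 indicates strong persistence of relative position across generations, a slope near 0 indicates high mobility; the conditional variant replaces these marginal ranks with ranks computed within covariate-defined groups.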

2603.07169 2026-03-10 cs.LG stat.ML

Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts

Yuxuan Han, Meng-Hao Guo, Zhengning Liu, Wenguang Chen, Shi-Min Hu

Details
English abstract

Optimizing GPU kernels manually is a challenging and time-consuming task. With the rapid development of LLMs, automated GPU kernel optimization is gradually becoming a tangible reality. However, current LLM-driven automated optimization methods narrowly focus on machine learning applications, such as PyTorch operator optimization, while overlooking broader domains like sparse matrix operations in scientific computing. Extending to these broader applications brings new challenges for the benchmark and algorithm. Therefore, developing a general-purpose automated kernel optimization method becomes our primary focus. In this paper, we address the absence of systematic evaluation for multi-scenario settings by introducing MSKernelBench, which spans multiple scenarios, including fundamental algebraic operations, common LLM kernels, sparse matrix operators, and scientific computing routines, each supporting both FP32 and BF16 precision. Building on this benchmark, we introduce CUDAMaster, a multi-agent, hardware-aware system for kernel optimization that leverages profiling information and automatically constructs the full compilation and execution toolchain. Experimental results demonstrate that CUDAMaster achieves significant speedups across most operators, outperforming Astra by about 35%. In several cases, its performance matches or surpasses that of highly optimized, closed-source libraries such as cuBLAS. A demo showcasing the original and optimized code for each operator is available at https://hanyx2021.github.io/MSKernelBenchDemo/.

2603.07132 2026-03-10 math.PR math.ST stat.TH

Quadratic form of heavy-tailed self-normalized random vector with applications in $α$-heavy Marčenko--Pastur law

Zhaorui Dong, Johannes Heiny, Jianfeng Yao

Details
English abstract

Let $\mathbf{x}$ be a random vector with $n$ i.i.d. real-valued components in the domain of attraction of an $α$-stable law with $α\in(0,2)$, and let $\mathbf{y}=\mathbf{x}/\|\mathbf{x}\|_2$ be the associated self-normalized vector on the unit sphere. For a (possibly random) Hermitian matrix $\mathbf{A}_n=\big(a_{ij}^{(n)}\big)$ independent of $\mathbf{y}$, we study the asymptotic law of the quadratic form $Q_n=\mathbf{y}^\top \mathbf{A}_n \mathbf{y}$. Building on the sharp separation between diagonal and off-diagonal contributions in this heavy-tailed setting, we show that under a mild assumption on the Frobenius norm of the off-diagonal part of $\mathbf{A}_n$ the limiting law is solely governed by the empirical distribution of the diagonal entries and the index $α$. More precisely, if $n^{-1}\sum_{i=1}^n δ_{a^{(n)}_{ii}}$ converges weakly almost surely to a deterministic $ν$, then $Q_n$ converges in distribution to a non-degenerate law $μ_{ν,α}$ characterized through its Stieltjes transform. The law $μ_{ν,α}$ is shown to be atom-free (provided that $ν$ is non-degenerate) with an explicit density and tractable tail behavior. As an application in random matrix theory, we derive an implicit resolvent-based representation of the $α$-heavy Marčenko--Pastur law $H_{α,γ}$ for heavy-tailed sample correlation matrices and prove that $H_{α,γ}$ has no atoms except possibly at the origin. For comparison with the light-tailed setting, we also provide a Hanson--Wright-type concentration inequality for $\mathbf{y}^\top \mathbf{A}_n \mathbf{y}$ when the components of $\mathbf{x}$ are sub-Gaussian.

2603.07122 2026-03-10 cs.LG stat.ML

Combining Adam and its Inverse Counterpart to Enhance Generalization of Deep Learning Optimizers

Tao Shi, Liangming Chen, Long Jin, Mengchu Zhou

Details
English abstract

In the training of neural networks, adaptive moment estimation (Adam) typically converges fast but exhibits suboptimal generalization performance. A widely accepted explanation for its defect in generalization is that it often tends to converge to sharp minima. To enhance its ability to find flat minima, we propose its new variant named inverse Adam (InvAdam). The key improvement of InvAdam lies in its parameter update mechanism, which is opposite to that of Adam. Specifically, it computes element-wise multiplication of the first-order and second-order moments, while Adam computes the element-wise division of these two moments. This modification aims to increase the step size of the parameter update when the elements in the second-order moments are large and vice versa, which helps the parameter escape sharp minima and stay at flat ones. However, InvAdam's update mechanism may face challenges in convergence. To address this challenge, we propose dual Adam (DualAdam), which integrates the update mechanisms of both Adam and InvAdam, ensuring convergence while enhancing generalization performance. Additionally, we introduce the diffusion theory to mathematically demonstrate InvAdam's ability to escape sharp minima. Extensive experiments are conducted on image classification tasks and large language model (LLM) fine-tuning. The results validate that DualAdam outperforms Adam and its state-of-the-art variants in terms of generalization performance. The code is publicly available at https://github.com/LongJin-lab/DualAdam.
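The contrast between the two update rules can be made concrete for a single scalar parameter. The sketch below is not taken from the DualAdam repository: the abstract only says that InvAdam multiplies the two moments element-wise where Adam divides, so the exact form used here (multiplying by the square root of the second moment, with no bias correction) is an assumption, and all function names are hypothetical.

```python
import math


def update_moments(m, v, grad, beta1=0.9, beta2=0.999):
    """Exponential moving averages of the gradient and the squared gradient."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    return m, v


def adam_direction(m, v, eps=1e-8):
    # Standard Adam (bias correction omitted): first moment DIVIDED by sqrt(v).
    return m / (math.sqrt(v) + eps)


def inverse_style_direction(m, v):
    # ASSUMPTION: first moment MULTIPLIED by sqrt(v); the paper's exact
    # scaling may differ from this illustrative choice.
    return m * math.sqrt(v)


# Same first moment, two different second moments.
m = 0.5
for v in (0.01, 4.0):
    print(v, adam_direction(m, v), inverse_style_direction(m, v))
```

With the first moment held fixed, the division-based direction shrinks as the second moment grows, while the multiplication-based direction grows with it, which is the qualitative behaviour the abstract describes.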

2603.07114 2026-03-10 physics.soc-ph stat.CO

Robustness and size-dependence of circadian rhythms in multiscale suprachiasmatic-nucleus networks

Youhao Zhuo, Yingpeng Liu, Jiao Wu, Kesheng Xu, Muhua Zheng

Comments 20 pages, 14 figures

Details
Journal ref
Phys. Rev. E 113, 034304 (2026)
English abstract

Understanding how multi-scale network structure influences circadian rhythms in the suprachiasmatic nucleus (SCN) is essential for uncovering the principles of rhythmic robustness and synchronization. Previous studies using synthetic SCN networks suggested a size-dependent phenomenon, in which rhythmic activity initially strengthens with network size and then saturates, but it remains unclear whether this occurs in real SCN networks. Here, we apply geometric branch growth (GBG) and geometric renormalization (GR) to generate self-similar scaled-up and scaled-down replicas from a single-scale functional mouse SCN network. Unlike synthetic models, these SCN replicas do not exhibit size-dependent rhythms: average period, amplitude, and synchronization remain stable across scales. By increasing the average degree with network size, we reproduce size-dependent rhythms and show that they arise from network connectivity, whereas low-degree networks fragment and fail to sustain oscillations. Disrupting clustering self-similarity slightly reduces synchronization, but circadian rhythms remain robust, indicating that average degree, rather than clustering, is the dominant structural driver. These results highlight the resilience of SCN rhythms to network scaling and provide a framework for linking multi-scale network structure to biological timekeeping.

2603.07108 2026-03-10 stat.ML cs.LG stat.ME

Deep Generative Spatiotemporal Engression for Probabilistic Forecasting of Epidemics

Rajdeep Pathak, Tanujit Chakraborty

Details
English abstract

Accurate and reliable forecasting of epidemic incidences is critical for public health preparedness, yet it remains a challenging task due to complex nonlinear temporal dependencies and heterogeneous spatial interactions. Often, point forecasts generated by spatiotemporal models are unreliable in assigning uncertainty to future epidemic events. Probabilistic forecasting of epidemics is therefore crucial for providing the best or worst-case scenarios rather than a simple, often inaccurate, point estimate. We present deep spatiotemporal engression methods to generate accurate and reliable probabilistic forecasts on low-frequency epidemic datasets. The proposed methods act as distributional lenses, and out-of-sample probabilistic forecasts are generated by sampling from the trained models. Our frameworks encapsulate lightweight deep generative architectures, wherein uncertainty is quantified endogenously, driven by a pre-additive noise component during model construction. We establish geometric ergodicity and asymptotic stationarity of the spatiotemporal engression processes under mild assumptions on the network weights and pre-additive noise process. Comprehensive evaluations across six epidemiological datasets over three forecast horizons demonstrate that the proposal consistently outperforms several temporal and spatiotemporal benchmarks in both point and probabilistic forecasting. Additionally, we explore the explainability of the proposal to enhance the models' practical application for informed, timely public health interventions.

2603.07099 2026-03-10 stat.ME stat.CO

Parametric modal regression for right-censored positive responses

Christian E. Galarza, Víctor H. Lachos

Comments 25 pages, 7 figures, 4 tables. R package available at https://github.com/chedgala/ModalCens

Details
English abstract

We present a unified parametric framework for modal regression applicable to continuous positive distributions, with explicit support for right-censored observations. The key contribution is a systematic analytical reparameterization of density parameters as direct functions of the conditional mode. This closed-form mapping is derived for the Gamma, Beta, Weibull, Lognormal, and Inverse Gaussian distributions, directly linking the mode to a linear predictor. Maximum likelihood estimation is performed using the censored log-likelihood, with asymptotic inference based on the observed Fisher information matrix. A Monte Carlo simulation study across multiple distributions, sample sizes, and censoring levels confirms consistent parameter recovery. Empirical bias and RMSE decrease as expected, and Wald confidence intervals achieve nominal coverage. Finally, the proposed methodology is illustrated through an application to real-world reliability data. All methodology is implemented in the open-source R package ModalCens.
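The idea of reparameterizing a density by its mode and then maximizing a censored log-likelihood can be sketched for the Weibull case. This is an illustration written in Python rather than R and is not the ModalCens implementation: it relies on the standard Weibull facts that the mode is $λ((k-1)/k)^{1/k}$ for shape $k>1$ and that the survival function is $\exp(-(t/λ)^k)$, while the log link, the single covariate, and every function name are assumptions.

```python
import math


def weibull_scale_from_mode(mode, shape):
    """Invert mode = scale * ((shape - 1) / shape) ** (1 / shape), shape > 1."""
    if shape <= 1.0:
        raise ValueError("an interior mode requires shape > 1")
    return mode / ((shape - 1.0) / shape) ** (1.0 / shape)


def weibull_logpdf(t, shape, scale):
    z = t / scale
    return math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape


def weibull_logsf(t, shape, scale):
    # Log survival function, used for right-censored observations.
    return -((t / scale) ** shape)


def censored_loglik(times, observed, x, beta0, beta1, shape):
    """Right-censored log-likelihood with mode = exp(beta0 + beta1 * x)."""
    total = 0.0
    for t, is_event, xi in zip(times, observed, x):
        mode = math.exp(beta0 + beta1 * xi)  # assumed log link for the mode
        scale = weibull_scale_from_mode(mode, shape)
        if is_event:
            total += weibull_logpdf(t, shape, scale)
        else:
            total += weibull_logsf(t, shape, scale)
    return total


# One observed failure at t = 1 and one unit censored at t = 2.
value = censored_loglik([1.0, 2.0], [True, False], [0.0, 0.0], 0.0, 0.0, 2.0)
```

Passing `censored_loglik` to a numerical optimizer over `beta0`, `beta1`, and `shape` would give maximum likelihood estimates in which the regression coefficients act directly on the conditional mode.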

2603.07055 2026-03-10 stat.ME econ.EM math.ST stat.TH

Integrating Heterogeneous Information in Randomized Experiments: A Unified Calibration Framework

Wei Ma, Zeqi Wu, Zheng Zhang

Details
English abstract

In modern randomized experiments, large-scale data collection increasingly yields rich baseline covariates and auxiliary information from multiple sources. Such information offers opportunities for more precise treatment effect estimation, but it also raises the challenge of integrating heterogeneous information coherently without compromising validity. Covariate-adaptive randomization (CAR) is widely used to improve covariate balance at the design stage, but it typically balances only a small set of covariates used to form strata, making covariate adjustment at the analysis stage essential for more efficient estimation of treatment effects. Beyond standard covariate adjustment, it is often desirable to incorporate auxiliary information, including cross-stratum information, predictions from various machine learning models, and external data from historical trials or real-world sources. While this auxiliary information is widely available, existing covariate adjustment methods under CAR primarily exploit within-stratum covariates and do not provide a coherent mechanism for integrating it. We propose a unified calibration framework that integrates such information through an information proxy vector and calibration weights defined by a convex optimization problem. The resulting estimator recovers many recent covariate adjustment procedures as special cases while providing a systematic mechanism for both internal and external information borrowing within a single framework. We establish large-sample validity and a no-harm efficiency guarantee, showing that incorporating additional information sources cannot increase asymptotic variance, and we extend the theory to settings in which both the number of strata and the number of information sources grow with the sample size.

2603.07014 2026-03-10 stat.ME math.ST stat.ML stat.TH

Fréchet regression of multivariate distributions with nonparanormal transport

Junyoung Park, Irina Gaynanova

Comments 62 pages, 4 figures

Details
English abstract

Regression with distribution-valued responses and Euclidean predictors has gained increasing scientific relevance. While methodology for univariate distributional data has advanced rapidly in recent years, multivariate distributions, which additionally encode dependence across univariate marginals, have received less attention and pose computational and statistical challenges. In this work, we address these challenges with a new regression approach for multivariate distributional responses, in which distributions are modeled within the semiparametric nonparanormal family. By incorporating the nonparanormal transport (NPT) metric -- an efficient closed-form surrogate for the Wasserstein distance -- into the Fréchet regression framework, our approach decomposes the problem into separate regressions of marginal distributions and their dependence structure, facilitating both efficient estimation and granular interpretation of predictor effects. We provide theoretical justification for NPT, establishing its topological equivalence to the Wasserstein distance and proving that it mitigates the curse of dimensionality. We further prove uniform convergence guarantees for regression estimators, both when distributional responses are fully observed and when they are estimated from empirical samples, attaining fast convergence rates comparable to the univariate case. The utility of our method is demonstrated via simulations and an application to continuous glucose monitoring data.

2603.07005 2026-03-10 cs.LG stat.ML

Combinatorial Allocation Bandits with Nonlinear Arm Utility

Yuki Shibukawa, Koichi Tanaka, Yuta Saito, Shinji Ito

Comments 32 pages

Details
English abstract

A matching platform is a system that matches different types of participants, such as companies and job-seekers. In such a platform, merely maximizing the number of matches can result in matches being concentrated on highly popular participants, which may increase dissatisfaction among other participants, such as companies, and ultimately lead to their churn, reducing the platform's profit opportunities. To address this issue, we propose a novel online learning problem, Combinatorial Allocation Bandits (CAB), which incorporates the notion of *arm satisfaction*. In CAB, at each round $t=1,\dots,T$, the learner observes $K$ feature vectors corresponding to $K$ arms for each of $N$ users, assigns each user to an arm, and then observes feedback following a generalized linear model (GLM). Unlike prior work, the learner's objective is not to maximize the amount of positive feedback, but rather to maximize the arm satisfaction. For CAB, we provide an upper confidence bound algorithm that achieves an approximate regret upper bound, which matches the existing lower bound for the special case. Furthermore, we propose a Thompson sampling (TS) algorithm and provide an approximate regret upper bound. Finally, we conduct experiments on synthetic data to demonstrate the effectiveness of the proposed algorithms compared to other methods.

2603.06970 2026-03-10 stat.ME

Deep Probabilistic Spatial Modeling for Multivariate Mixed-Type Responses

Yeseul Jeon, Kyeong Eun Lee, Joon Jin Song

Details
English abstract

Many scientific applications involve mixed spatially indexed outcomes of heterogeneous types that are driven by shared latent mechanisms. Modeling such data is challenging due to complex, nonlinear, and potentially nonstationary spatial dependence, as well as the need for coherent joint inference across mixed outcome distributions. Existing multivariate mixed outcome models often rely on restrictive linear assumptions, while recent deep learning approaches emphasize predictive flexibility but typically lack coherent joint modeling and uncertainty quantification for spatial data. We develop MultiDeepGP, a scalable and statistically principled framework for joint modeling of multivariate mixed outcomes in spatial settings. The proposed approach introduces a shared latent spatial component that governs cross-outcome dependence while allowing outcome-specific distributions. Spatial dependence and nonlinear structure are captured through a deep latent representation, and uncertainty quantification is enabled via an efficient Monte Carlo-based inference strategy. This construction balances modeling flexibility with probabilistic interpretability and computational feasibility. The proposed method is evaluated through simulation studies designed to reflect key challenges in mixed outcome spatial modeling, as well as an application to georeferenced environmental and public health data from the African Great Lakes region. The results demonstrate that the proposed framework provides accurate joint prediction and reliable uncertainty quantification in complex spatial settings.

2603.06957 2026-03-10 stat.ML cs.AI cs.LG

Post-Training with Policy Gradients: Optimality and the Base Model Barrier

Alireza Mousavi-Hosseini, Murat A. Erdogdu

Comments 36 pages, 2 figures

Details
English abstract

We study post-training linear autoregressive models with outcome and process rewards. Given a context $\boldsymbol{x}$, the model must predict the response $\boldsymbol{y} \in Y^N$, a sequence of length $N$ that satisfies a $γ$ margin condition, an extension of the standard separability to sequences. We prove that on test samples where the base model achieves a non-trivial likelihood $α$, a variant of policy gradient (PG) can achieve likelihood $1 - \varepsilon$ with an essentially minimax optimal number of reward queries $\tilde{O}((α^{-1} + \varepsilon^{-1})/γ^2)$. However, a barrier arises for going beyond the support of the base model. We prove that the overall expected error after post-training with outcome rewards is governed by a property of the base model called the Likelihood Quantile (LQ), and that variants of PG, while minimax optimal, may require a number of reward queries exponential in $N$ to go beyond this support, regardless of the pre-training algorithm. To overcome this barrier, we study post-training with a process reward model, and demonstrate how PG variants in this setting avoid the curse of dimensionality in $N$ via dependence on a token-level LQ. Along the way, we prove that under the margin condition, SGD with adaptive learning rate (LR) achieves a near optimal test error for statistical learning, and PG with adaptive LR achieves a near optimal number of mistakes for online learning while being computationally efficient whenever possible, both of which may be of independent interest.

2603.06944 2026-03-10 stat.ME

Estimating Complex Densities using Two-Stage Normalizing Flows

Roxana Darvishi, David C. Stenning, Ted von Hippel, Owen G. Ward

Details
English abstract

In many scientific applications, the target probability distribution cannot be evaluated in closed form or sampled from directly. Instead, it can often be decomposed into multiple components, some of which are accessible only through samples generated by simulators or external datasets, while others admit tractable mathematical expressions or are specified through statistical assumptions about variable relationships. Developing inference methods that coherently integrate these heterogeneous sources of information remains an open challenge. In this paper, we propose a Two-Stage Normalizing Flows framework for approximating and sampling from such distributions. The method first learns the densities of components for which only samples are available, and then combines the outputs with the analytically specified terms to reconstruct the full target distribution in a second stage. The resulting model enables both point-wise density evaluation and efficient generation of representative samples, without requiring direct access to the full target density or joint samples from the complete model. We assess the proposed approach through simulation studies in joint density inference and Bayesian hierarchical models with inaccessible likelihoods. The proposed framework is able to accurately recover complex, highly nonlinear target structures using only partial information about the target density, providing stable and flexible approximations in settings where standard modeling assumptions do not hold (or when complete access to the target distribution is not available). Analysis of a large scale astronomy application highlights interesting differences between our method and existing approaches. Our normalizing flows procedure offers a robust and flexible approach to inference for intractable target distributions across both simulated and real-world applications.

2603.06941 2026-03-10 math.ST stat.ME stat.TH

Demonstration Experiments

Guido Imbens, Lorenzo Masoero, Alexander Rakhlin, Thomas S. Richardson, Suhas Vijaykumar

Details
English abstract

Adaptive experiments are used extensively in online platforms, healthcare and biotechnology, and a variety of other settings. In many of these applications, the main goal is not to precisely estimate a treatment effect, but to demonstrate that at least one candidate intervention yields a positive effect, for some subpopulation, on some measured outcome. We formalize this objective in a multi-armed bandit framework and develop inference procedures for testing whether any arm's mean exceeds a given threshold under fully adaptive sampling: one which pools information across promising arms, and one which corresponds to time-uniform multiple inference on the means of individual arms. To support the latter, we establish a moderate deviations principle for the sequential t-statistic, justifying anytime-valid testing of a large number of hypotheses concurrently. To illustrate how adaptive design can target the proposed statistics, we recast experimental design as bandit optimization where an arm's reward corresponds to its signal-to-noise ratio, and analyze an adaptive allocation rule for which we establish a logarithmic regret bound.

2603.06916 2026-03-10 stat.ME stat.AP

Living forwards or understanding backwards? A comparison of Inverse Probability of Treatment Weighting and G-estimation methods for targeting hypothetical full adherence estimands in longitudinal cohort studies

Xiaoran Liang, Deniz Türkmen, Jane A H Masoli, Luke C Pilling, Jack Bowden

Details
English abstract

Medication adherence is essential to ensure treatment effectiveness, but too often in routine care non-adherence compromises the desired outcome. We explore longitudinal causal modelling using observational data to estimate the time-varying effects of continuous drug adherence measures on health outcomes over a sustained period. The goal of such analyses is to quantify the potential impact of interventions to improve adherence on long-term health. We consider two established longitudinal causal approaches designed to handle time-varying confounding under the "no unmeasured confounding" (NUC) assumption: G-estimation and inverse probability of treatment weighting (IPTW). In randomized controlled trials, NUC-based methods have been applied to address non-adherence as an intercurrent event, and instrumental variable (IV) extensions of G-estimation have also been introduced for settings where the NUC assumption may fail. We adapt these methods to observational data settings and illustrate their use for assessing how adherence over time impacts health outcomes. We align the causal parameters across methods and show they can target the same causal estimand: the average effect among treated individuals of full adherence versus zero adherence. We set out the identification conditions for IPTW and G-estimation under NUC, and for an IV-based extension that has specific utility when the NUC assumption is implausible. We assess the statistical properties, strengths and weaknesses of each approach through Monte Carlo simulations designed to reflect longitudinal studies with a continuous exposure. We demonstrate these methods by quantifying the effect of full statin adherence on LDL cholesterol control in 13,000 UK Biobank participants with linked primary care data.

2603.06901 2026-03-10 stat.ML cs.LG

Fairness May Backfire: When Leveling-Down Occurs in Fair Machine Learning

Yi Yang, Xiangyu Chang, Pei-yu Chen

Comments Short version of the paper (Nov 20, 2025)

Details
English abstract

As machine learning (ML) systems increasingly shape access to credit, jobs, and other opportunities, the fairness of algorithmic decisions has become a central concern. Yet it remains unclear when enforcing fairness constraints in these systems genuinely improves outcomes for affected groups or instead leads to "leveling down," making one or both groups worse off. We address this question in a unified, population-level (Bayes) framework for binary classification under prevalent group fairness notions. Our Bayes approach is distribution-free and algorithm-agnostic, isolating the intrinsic effect of fairness requirements from finite-sample noise and from training and intervention specifics. We analyze two deployment regimes for ML classifiers under common legal and governance constraints: attribute-aware decision-making (sensitive attributes available at decision time) and attribute-blind decision-making (sensitive attributes excluded from prediction). We show that, in the attribute-aware regime, fair ML necessarily (weakly) improves outcomes for the disadvantaged group and (weakly) worsens outcomes for the advantaged group. In contrast, in the attribute-blind regime, the impact of fairness is distribution-dependent: fairness can benefit or harm either group and may shift both groups' outcomes in the same direction, leading to either leveling up or leveling down. We characterize the conditions under which these patterns arise and highlight the role of "masked" candidates in driving them. Overall, our results provide structural guidance on when pursuing algorithmic fairness is likely to improve group outcomes and when it risks systemic leveling down, informing fair ML design and deployment choices.

2603.06872 2026-03-10 math.NA cs.NA math.DS stat.ML

Kernel Methods for Some Transport Equations with Application to Learning Kernels for the Approximation of Koopman Eigenfunctions: A Unified Approach via Variational Methods, Green's Functions and the Method of Characteristics

Boumediene Hamzi, Houman Owhadi, Umesh Vaidya

Details
English abstract

We present a unified theoretical and computational framework for constructing reproducing kernels tailored to transport equations and adapted to Koopman eigenfunctions of nonlinear dynamical systems. These eigenfunctions satisfy a transport-type partial differential equation (PDE) that we invert using three analytically grounded methods: (i) A Lions-type variational principle in a reproducing kernel Hilbert space (RKHS), (ii) convolution with a Green's function, and (iii) a resolvent operator constructed via Laplace transforms along characteristic flows. We prove that these three constructions yield identical kernels under mild smoothness and causality assumptions. We further show that the associated kernel eigenfunctions (Mercer modes) converge in L^2 to true Koopman eigenfunctions when the latter lie in the RKHS. Our approach is numerically realized through a mesh-free, convex optimization framework, enhanced with boundary regularization to handle eigenfunction blow-up. A multiple-kernel learning (MKL) scheme selects kernels automatically via residual minimization. Finally, we demonstrate that the same framework applies verbatim to a broader class of linear transport PDEs, including the advection, continuity, and Liouville equations. The unification of variational principles, Green's functions, and the method of characteristics enables the development of novel schemes for approximating eigenfunctions of transport equations, including those of the Koopman operator, and introduces a data-driven approach for learning kernels tailored to these approximations. Numerical experiments confirm the practical utility and robustness of the method.

2603.06851 2026-03-10 stat.ML cs.GT cs.LG

Bilateral Trade Under Heavy-Tailed Valuations: Minimax Regret with Infinite Variance

Hangyi Zhao

Comments 9 pages

Details
English abstract

We study contextual bilateral trade under full feedback when trader valuations have bounded density but infinite variance. We first extend the self-bounding property of Bachoc et al. (ICML 2025) from bounded to real-valued valuations, showing that the expected regret of any price $π$ satisfies $\mathbb{E}[g(m,V,W) - g(π,V,W)] \le L|m-π|^2$ under bounded density alone. Combining this with truncated-mean estimation, we prove that an epoch-based algorithm achieves regret $\widetilde{O}(T^{1-2β(p-1)/(βp + d(p-1))})$ when the noise has finite $p$-th moment for $p \in (1,2)$ and the market value function is $β$-Hölder, and we establish a matching $Ω(\cdot)$ lower bound via Assouad's method with a smoothed moment-matching construction. Our results characterize the exact minimax rate for this problem, interpolating between the classical nonparametric rate at $p=2$ and the trivial linear rate as $p \to 1^+$.
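The two endpoints named in the last sentence can be checked directly from the stated exponent. The block below is only that arithmetic, using the formula as written in the abstract and ignoring the logarithmic factors hidden in $\widetilde{O}(\cdot)$.

```latex
\[
\left. 1-\frac{2\beta(p-1)}{\beta p+d(p-1)} \right|_{p=2}
= 1-\frac{2\beta}{2\beta+d}
= \frac{d}{2\beta+d},
\qquad
\lim_{p\to 1^{+}}\left(1-\frac{2\beta(p-1)}{\beta p+d(p-1)}\right)
= 1-\frac{0}{\beta}
= 1 .
\]
```

So the bound reads $T^{d/(2β+d)}$ at $p=2$ and degrades to the linear rate $T$ as $p \to 1^+$, matching the interpolation described above.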

2603.06826 2026-03-10 stat.ML cs.LG stat.ME

CREDO: Epistemic-Aware Conformalized Credal Envelopes for Regression

Luben M. C. Cabezas, Sabina J. Sloman, Bruno M. Resende, Fanyi Wu, Michele Caprio, Rafael Izbicki

Comments 26 pages, 5 figures

Details
English abstract

Conformal prediction delivers prediction intervals with distribution-free coverage, but its intervals can look overconfident in regions where the model is extrapolating, because standard conformal scores do not explicitly represent epistemic uncertainty. Credal methods, by contrast, make epistemic effects visible by working with sets of plausible predictive distributions, but they are typically model-based and lack calibration guarantees. We introduce CREDO, a simple "credal-then-conformalize" recipe that combines both strengths. CREDO first builds an interpretable credal envelope that widens when local evidence is weak, then applies split conformal calibration on top of this envelope to guarantee marginal coverage without further assumptions. This separation of roles yields prediction intervals that are interpretable: their width can be decomposed into aleatoric noise, epistemic inflation, and a distribution-free calibration slack. We provide a fast implementation based on trimming extreme posterior predictive endpoints, prove validity, and show on benchmark regressions that CREDO maintains target coverage while improving sparsity adaptivity at competitive efficiency.
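The "conformalize on top of an envelope" step can be illustrated with a generic split-conformal sketch. This is not CREDO's implementation: the score used here (how far an observation falls outside a given interval) is an assumed, commonly used choice, and the names `outside_score`, `conformal_quantile`, and `calibrate`, as well as the toy calibration data, are hypothetical.

```python
import math

def outside_score(lower, upper, y):
    """Signed distance by which y falls outside [lower, upper] (negative if inside)."""
    return max(lower - y, y - upper)

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only an infinite interval is valid.
        return math.inf
    return sorted(scores)[rank - 1]

def calibrate(lower, upper, q):
    """Widen (or shrink, if q < 0) an envelope by the calibrated slack q."""
    return lower - q, upper + q

# Toy calibration set of (envelope lower, envelope upper, observed y); made-up numbers.
calibration = [
    (0.0, 1.0, 0.5), (0.0, 1.0, 1.4), (1.0, 2.0, 0.7), (1.0, 2.0, 1.5),
    (2.0, 3.0, 3.2), (2.0, 3.0, 2.5), (0.0, 1.0, -0.1), (0.0, 1.0, 0.9),
    (1.0, 2.0, 2.6), (2.0, 3.0, 1.9),
]
scores = [outside_score(lo, hi, y) for lo, hi, y in calibration]
q_hat = conformal_quantile(scores, alpha=0.2)

# Calibrated interval for a new point whose (uncalibrated) envelope is [1.0, 2.0].
print(calibrate(1.0, 2.0, q_hat))
```

Under exchangeability of calibration and test points, intervals built this way cover the new response with probability at least 1 - alpha marginally, which is the distribution-free guarantee the abstract refers to.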

2603.06715 2026-03-10 q-bio.PE stat.CO

Understanding and Managing Frogeye Leaf Spot through Network-Based Modeling in Soybean

Chinthaka Weerarathna, Thien-Minh Le, Jin Wang

Comments 22 pages, 7 figures, 3 tables

Details
English abstract

Frogeye Leaf Spot (FLS), caused by Cercospora sojina, poses a significant threat to soybean production, with yield losses of 30-60%. Traditional mass-action models assume homogeneous mixing, which rarely holds in real fields and limits their ability to inform FLS management. To address this, we developed a network-based model that incorporates real-field structure to improve FLS management in soybeans. Using approximate Bayesian computation, we estimated key epidemiological parameters and found that infection origin can shift the balance between transmission routes. Data analyses indicated that tillage and non-tillage plots did not differ significantly in fungal spread, decay, or disease severity. Finally, we show that early, targeted roguing is more effective than delayed or random removal. Together, these findings offer science-based guidance for FLS management and highlight the value of network-based models to inform agricultural disease control.

2603.06616 2026-03-10 cs.LG cs.AI math.ST stat.TH

RACER: Risk-Aware Calibrated Efficient Routing for Large Language Models

Sai Hao, Hao Zeng, Hongxin Wei, Bingyi Jing

Details
English abstract

Efficiently routing queries to the optimal large language model (LLM) is crucial for optimizing the cost-performance trade-off in multi-model systems. However, most existing routers rely on single-model selection, making them susceptible to misrouting. In this work, we formulate LLM routing as the $α$-VOR problem to minimize expected set size while controlling the misrouting risk, and propose a novel method -- RACER, extending base routers to output model sets that can be subsequently aggregated for improved output. In particular, RACER constructs nested model sets via augmented scoring and utilizes finite-sample concentration bounds to calibrate a threshold that allows for both variable set sizes and abstention. We theoretically prove that RACER achieves rigorous distribution-free risk control on unseen test data in a post-hoc and model-agnostic manner. Extensive experiments verify our theoretical guarantees and demonstrate that RACER consistently enhances downstream accuracy across a wide range of benchmarks.

2603.01198 2026-03-10 eess.SY cs.SY stat.AP

Digital Twin-Based Cooling System Optimization for Data Center

Shrenik Jadhav, Zheng Liu

Comments 30 pages, 8 figures

Details
English abstract

Data center cooling systems consume significant auxiliary energy, yet optimization studies rarely quantify the gap between theoretically optimal and operationally deployable control strategies. This paper develops a digital twin of the liquid cooling infrastructure at the Frontier exascale supercomputer, in which a hot-temperature water system comprises three parallel subloops, each serving dedicated coolant distribution unit clusters through plate heat exchangers and variable-speed pumps. The surrogate model is built with Modelica and validated through one full calendar year of 10-minute operational data following ASHRAE Guideline 14. The model achieves a subloop coefficient of variation of the root mean square error below 2.7% and a normalized mean bias error within 2.5%. Using this validated surrogate model, a layered optimization framework evaluates three progressively constrained strategies: an analytical flow-only optimization achieves 20.4% total energy saving, unconstrained joint optimization of flow rate and supply temperature demonstrates 30.1% total energy saving, and ramp-constrained optimization of flow rate and supply temperature, enforcing actuator rate limits, can reach a total energy saving of 27.8%. The analysis reveals that the baseline system operates at 2.9 times the minimum thermally safe flow rate, and that co-optimizing supply temperature with flow rate nearly doubles the savings achievable by flow reduction alone.
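For readers unfamiliar with the two validation metrics quoted above, the following is a minimal sketch of how CV(RMSE) and NMBE are commonly computed in calibration work of this kind. The degrees-of-freedom adjustment `p=1` and the sign convention (measured minus predicted) are assumptions here and may differ from what the authors used; the function names are hypothetical.

```python
import math

def cv_rmse_percent(measured, predicted, p=1):
    """Coefficient of variation of the RMSE, in percent of the measured mean."""
    n = len(measured)
    mean_measured = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, predicted))
    return 100.0 * math.sqrt(sse / (n - p)) / mean_measured

def nmbe_percent(measured, predicted, p=1):
    """Normalized mean bias error, in percent of the measured mean."""
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, predicted))
    return 100.0 * bias / ((n - p) * mean_measured)

# Made-up numbers, only to show the call pattern.
measured = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]
print(cv_rmse_percent(measured, predicted), nmbe_percent(measured, predicted))
```

CV(RMSE) penalizes scatter regardless of sign, while NMBE lets over- and under-predictions cancel, which is why both are usually reported together.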

2603.00827 2026-03-10 math.ST stat.TH

Minimax convergence rates of a binary plug-in type classification procedure for time-homogeneous SDE paths under low-noise conditions

Eddy Michel Ella-Mintsa

Comments 55 pages

Details
English abstract

The study of minimax convergence rates for classification procedures adapted to SDE paths is rarely addressed in the literature. Only one paper has established optimal convergence rates for a binary classifier for SDE paths constructed from the white noise model. In this paper, we consider a more complex diffusion model with space-dependent drift and diffusion coefficients where the drift depends on the class and the diffusion coefficient is common to all classes. We establish, under the low-noise condition, a faster convergence rate over a Hölder space. This result requires establishing an exponential inequality, which is essential to obtain the expected rate. We then study the lower bound on the excess risk of the empirical classifier.

2603.00202 2026-03-10 stat.ML cs.LG math.PR

The Partition Principle Revisited: Non-Equal Volume Designs Achieve Minimal Expected Star Discrepancy

Xiaoda Xu

Comments Wrong in critical steps

Details
English abstract

We study the expected star discrepancy under a newly designed class of non-equal volume partitions. The main contributions are twofold. First, we establish a strong partition principle for the star discrepancy, showing that our newly designed non-equal volume partitions yield stratified sampling point sets with lower expected star discrepancy than classical jittered sampling. Specifically, we prove that $\mathbb{E}(D^{*}_{N}(Z)) < \mathbb{E}(D^{*}_{N}(Y))$, where $Y$ and $Z$ represent jittered sampling and our non-equal volume partition sampling, respectively. Second, we derive explicit upper bounds for the expected star discrepancy under our non-equal volume partition models, which improve upon existing bounds for jittered sampling. Our results provide a theoretical foundation for using non-equal volume partitions in high-dimensional numerical integration.

2602.23355 2026-03-10 stat.ME

Robust model selection using likelihood as data

Jongwoo Choi, Neil A. Spencer, Jeffrey W. Miller

Details
English abstract

Model selection is a central task in statistics, but standard methods are not robust in misspecified settings where the true data-generating process (DGP) is not in the set of candidate models. The key limitation is that existing methods -- including information criteria and Bayesian posteriors -- do not quantify uncertainty about how well each candidate model approximates the true DGP. In this paper, we introduce a novel approach to model selection based on modeling the likelihood values themselves. Specifically, given $K$ candidate models and $n$ observations, we view the $n\times K$ matrix of negative log-likelihood values as a random data matrix and observe that the expectation of each row is equal to the vector of Kullback--Leibler divergences between the $K$ models and the true DGP, up to an additive constant. We use a multivariate normal model to estimate and quantify uncertainty in this expectation, providing calibrated inferences for robust model selection under misspecification. The procedure is easy to compute, interpretable, and comes with theoretical guarantees, including consistency.
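The "likelihood as data" idea can be sketched as follows. This is a simplified stand-in, not the authors' procedure: instead of a full multivariate normal model over all $K$ columns, it uses a pairwise normal approximation for the difference between two columns. The candidate models, the data-generating choice, and all function names are hypothetical.

```python
import math
import random
import statistics

def normal_nll(x, mu, sigma):
    """Negative log-likelihood of one observation under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

def nll_matrix(data, models):
    """n x K matrix: row i holds the NLL of observation i under each candidate model."""
    return [[normal_nll(x, mu, sigma) for mu, sigma in models] for x in data]

def column_means(matrix):
    """Column means estimate KL(true DGP || model k) up to a shared additive constant."""
    k = len(matrix[0])
    return [statistics.fmean(row[j] for row in matrix) for j in range(k)]

def paired_difference(matrix, j, k):
    """Mean and standard error of the per-observation NLL difference (model j minus model k)."""
    diffs = [row[j] - row[k] for row in matrix]
    return statistics.fmean(diffs), statistics.stdev(diffs) / math.sqrt(len(diffs))

# Misspecified setting: data come from N(0.2, 1.5^2), which neither candidate matches.
rng = random.Random(0)
data = [rng.gauss(0.2, 1.5) for _ in range(2000)]
models = [(0.0, 1.0), (1.0, 1.0)]  # (mu, sigma) of the two candidate models

matrix = nll_matrix(data, models)
means = column_means(matrix)
diff, se = paired_difference(matrix, 0, 1)
print(means, diff, se)
```

A difference that is negative by many standard errors suggests the first model is closer to the data-generating process in Kullback-Leibler terms, even though both are wrong. In this toy setup the population difference is $0.2 - 0.5 = -0.3$.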

2602.20912 2026-03-10 stat.AP

A Corrected Welch Satterthwaite Equation. And: What You Always Wanted to Know About Kish's Effective Sample but Were Afraid to Ask

Matthias von Davier

Comments 16 pages

Details
English abstract

This article presents a corrected version of the Satterthwaite (1941, 1946) approximation for the degrees of freedom of a weighted sum of independent variance components. The original formula is known to yield biased estimates when component degrees of freedom are small. The correction, derived from exact moment matching, adjusts for the bias by incorporating a factor that accounts for the estimation of fourth moments. We show that Kish's (1965) effective sample size formula emerges as a special case when all variance components are equal, and component degrees of freedom are ignored. Simulation studies demonstrate that the corrected estimator closely matches the expected degrees of freedom even for small component sizes, while the original Satterthwaite estimator exhibits substantial downward bias. Additional applications are discussed, including jackknife variance estimation, multiple imputation total variance, and the Welch test for unequal variances.
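The relationship between the two formulas mentioned above can be seen in a few lines. The sketch below implements the classical (uncorrected) Welch-Satterthwaite approximation and Kish's effective sample size; it does not implement the paper's correction, which the abstract does not spell out. Reading "component degrees of freedom are ignored" as setting every component's degrees of freedom to 1 is an assumption made for this illustration, and the function names are hypothetical.

```python
def satterthwaite_df(weights, variances, dfs):
    """Classical Welch-Satterthwaite degrees of freedom for sum_i a_i * s_i^2."""
    terms = [a * s2 for a, s2 in zip(weights, variances)]
    numerator = sum(terms) ** 2
    denominator = sum(t * t / nu for t, nu in zip(terms, dfs))
    return numerator / denominator

def kish_effective_n(weights):
    """Kish's effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Special case: equal variance components and all component df set to 1.
w = [1.0, 2.0, 3.0]
print(satterthwaite_df(w, [1.0, 1.0, 1.0], [1, 1, 1]), kish_effective_n(w))

# Welch two-sample setting: weights 1/n_i, variances s_i^2, df n_i - 1 (made-up numbers).
n1, n2 = 10, 15
print(satterthwaite_df([1 / n1, 1 / n2], [4.0, 9.0], [n1 - 1, n2 - 1]))
```

With equal variances and unit degrees of freedom the variance terms cancel, leaving $(\sum a_i)^2 / \sum a_i^2$, which is exactly Kish's formula.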

2602.00784 2026-03-10 q-fin.RM math.LO math.PR math.ST q-fin.MF stat.TH

Non-standard analysis for coherent risk estimation: hyperfinite representations, discrete Kusuoka formulae, and plug-in asymptotics

Tomasz Kania

Comments 42 pp

详情
英文摘要

We develop a non-standard analysis framework for coherent risk measures and their finite-sample analogues, coherent risk estimators, building on recent work of Aichele, Cialenco, Jelito, and Pitera. Coherent risk measures on $L^\infty$ are realised as standard parts of internal support functionals on Loeb probability spaces, and coherent risk estimators arise as finite-grid restrictions. Our main results are: (i) a hyperfinite robust representation theorem that yields, as finite shadows, the robust representation results for coherent risk estimators; (ii) a discrete Kusuoka representation for law-invariant coherent risk estimators as suprema of mixtures of discrete expected shortfalls on $\{k/n:k=1,\ldots,n\}$; (iii) uniform almost sure consistency (with an explicit rate) for canonical spectral plug-in estimators over Lipschitz spectral classes; (iv) a Kusuoka-type plug-in consistency theorem under tightness and uniform estimation assumptions; (v) bootstrap validity for spectral plug-in estimators via an NSA reformulation of the functional delta method (under standard smoothness assumptions on $F_X$); and (vi) asymptotic normality obtained through a hyperfinite central limit theorem. The hyperfinite viewpoint provides a transparent probability-to-statistics dictionary: applying a risk measure to a law corresponds to evaluating an internal functional on a hyperfinite empirical measure and taking the standard part. We include a standard self-contained introduction to the required non-standard tools.

2601.17205 2026-03-10 stat.ME

Bayesian Inference for Discrete Markov Random Fields Through Coordinate Rescaling

Giuseppe Arena, Maarten Marsman

Details
English abstract

Discrete Markov random fields are undirected graphical models that capture complex conditional dependencies between discrete variables. Conducting exact posterior inference in these models is often computationally challenging because evaluating their normalizing constant requires summation over all possible state configurations, and the size of this state space grows exponentially with the number of variables and their possible states. As a result, exact likelihood-based inference is infeasible in many practical settings, and existing methods, such as Double Metropolis-Hastings or pseudo-likelihood approximations, either scale poorly to large systems or underestimate posterior variability. To address these limitations, we propose a new class of coordinate-rescaling sampling methods that transform pseudo-likelihood-based posteriors toward the target posterior while preserving computational efficiency. The resulting samplers retain scalability while improving uncertainty quantification. In simulation studies, we compare the proposed methods to existing approaches and demonstrate that coordinate-rescaling sampling yields more accurate estimates of posterior variability, providing a scalable and reliable approach to Bayesian inference in discrete MRFs.

2601.16120 2026-03-10 stat.ML cs.LG stat.ME

Synthetic Augmentation in Imbalanced Learning: When It Helps, When It Hurts, and How Much to Add

Zhengchi Ma, Anru R. Zhang

Details
English abstract

Imbalanced classification often causes standard training procedures to prioritize the majority class and perform poorly on rare but important cases. A classic and widely used remedy is to augment the minority class with synthetic samples, but two basic questions remain under-resolved: when does synthetic augmentation actually help, and how many synthetic samples should be generated? We develop a unified statistical framework for synthetic augmentation in imbalanced learning, studying models trained on imbalanced data augmented with synthetic minority samples. Our theory shows that synthetic data is not always beneficial. In a "local symmetry" regime, imbalance is not the dominant source of error, so adding synthetic samples cannot improve learning rates and can even degrade performance by amplifying generator mismatch. When augmentation can help ("local asymmetry"), the optimal synthetic size depends on generator accuracy and on whether the generator's residual mismatch is directionally aligned with the intrinsic majority-minority shift. This structure can make the best synthetic size deviate from naive full balancing. Practically, we recommend Validation-Tuned Synthetic Size (VTSS): select the synthetic size by minimizing balanced validation loss over a range centered near the fully balanced baseline, while allowing meaningful departures. Extensive simulations and real data analysis further support our findings.
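The VTSS recommendation amounts to a one-dimensional search over synthetic sample sizes. The sketch below shows only that search skeleton: the grid of multipliers, the function names, and the `fit_and_validate` callback (which would train on the augmented data and return a balanced validation loss) are all hypothetical, and a toy loss stands in for real training.

```python
from statistics import fmean

def balanced_loss(majority_losses, minority_losses):
    """Balanced validation loss: average of the two per-class mean losses."""
    return 0.5 * (fmean(majority_losses) + fmean(minority_losses))

def candidate_sizes(n_majority, n_minority, factors=(0.0, 0.5, 0.75, 1.0, 1.25, 1.5)):
    """Grid of synthetic sizes centred on full balancing (factor 1.0 closes the class gap)."""
    gap = n_majority - n_minority
    return sorted({max(0, round(f * gap)) for f in factors})

def select_synthetic_size(sizes, fit_and_validate):
    """Return the size whose fitted model has the smallest balanced validation loss."""
    return min(sizes, key=fit_and_validate)

# Toy stand-in for "train with m synthetic minority samples, return balanced validation loss".
def toy_fit_and_validate(m):
    return (m - 700) ** 2

sizes = candidate_sizes(n_majority=1000, n_minority=100)
best = select_synthetic_size(sizes, toy_fit_and_validate)
print(sizes, best)
```

Allowing multipliers on both sides of 1.0 is what lets the selected size depart from naive full balancing when the validation loss favours it.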

2601.02275 2026-03-10 eess.SY cs.SY stat.AP

Machine Learning Guided Cooling System Optimization for Data Center

Shrenik Jadhav, Zheng Liu

Comments 11 pages, 11 figures

Details
English abstract

Effective data center cooling is crucial for reliable operation; however, cooling systems often exhibit inefficiencies that result in excessive energy consumption. This paper presents a three-stage, physics-guided machine learning framework for identifying and reducing cooling energy waste in high-performance computing facilities. Using one year of 10-minute resolution operational data from the Frontier exascale supercomputer, we first train a monotonicity-constrained gradient boosting surrogate that predicts facility accessory power from coolant flow rates, temperatures, and server power. The surrogate achieves a mean absolute error of 0.026 MW and predicts power usage effectiveness within 0.01 of measured values for 98.7% of test samples. In the second stage, the surrogate serves as a physics-consistent baseline to quantify excess cooling energy, revealing approximately 85 MWh of annual inefficiency concentrated in specific months, hours, and operating regimes. The third stage evaluates guardrail-constrained counterfactual adjustments to supply temperature and subloop flows, demonstrating that up to 96% of identified excess can be recovered through small, safe setpoint changes while respecting thermal limits and operational constraints. The framework yields interpretable recommendations, supports counterfactual analyses such as flow reduction during low-load periods and redistribution of thermal duty across cooling loops, and provides a practical pathway toward quantifiable reductions in accessory power. The developed framework is readily compatible with model predictive control and provides a template that, with site-specific recalibration, could be adapted to other liquid-cooled data centers with different configurations and cooling requirements.

2511.20968 2026-03-10 stat.CO

SVEMnet: An R package for Self-Validated Elastic-Net Ensembles and Multi-Response Optimization in Small-Sample Mixture-Process Experiments

Andrew T. Karl

Details
Journal ref
Chemometrics and Intelligent Laboratory Systems, Volume 271, 2026, 105660
English abstract

SVEMnet is an R package for fitting Self-Validated Ensemble Models (SVEM) with elastic-net base learners and performing multi-response optimization in small-sample mixture-process design-of-experiments (DOE) studies with numeric, categorical, and mixture factors. SVEMnet wraps elastic-net and relaxed elastic-net models for Gaussian and binomial responses from glmnet in a fractional random-weight (FRW) resampling scheme with anti-correlated train/validation weights; penalties are selected by validation-weighted AIC- and BIC-type criteria, and predictions are averaged across replicates to stabilize fits near the interpolation boundary. In addition to the core SVEM engine, the package provides deterministic high-order formula expansion, a permutation-based whole-model test heuristic, and a mixture-constrained random-search optimizer that combines Derringer-Suich desirability functions, bootstrap-based uncertainty summaries, and optional mean-level specification-limit probabilities to generate scored candidate tables and diverse exploitation and exploration medoids for sequential fit-score-run-refit workflows. A simulated lipid nanoparticle (LNP) formulation study illustrates these tools in a small-sample mixture-process DOE setting, and simulation experiments based on sparse quadratic response surfaces benchmark SVEMnet against repeated cross-validated elastic-net baselines.

2511.19525 2026-03-10 cs.LG cs.CV stat.ML

Shortcut Invariance: Targeted Jacobian Regularization in Disentangled Latent Space

Shivam Pal, Sakshi Varshney, Piyush Rai

Details
English abstract

Deep neural networks are prone to learning shortcuts, spurious correlations present in the training data that undermine out-of-distribution (OOD) generalization. Most prior work mitigates shortcut learning through input-space reweighting, either relying on explicit shortcut labels or inferring shortcut structure from heuristics such as per-sample loss. Moreover, these approaches typically assume the presence of some shortcut-conflicting examples in the training set, an assumption that is often violated in practice, particularly in medical imaging where data is aggregated across institutions with different acquisition protocols. We propose a latent-space method that views shortcut learning as over-reliance on shortcut-aligned axes. In a disentangled latent space, we identify candidate shortcut-aligned axes via their strong correlation with labels and reduce classifier reliance on them by injecting targeted anisotropic noise during training. Unlike prior latent-space based approaches that remove, project out, or adversarially suppress shortcut features, our method preserves the full representation and instead imposes functional invariance by regularizing the classifier's sensitivity along those axes. We show that injecting anisotropic noise induces targeted Jacobian and curvature regularization, effectively flattening the decision boundary along shortcut axes while leaving core feature dimensions largely unaffected. Our method achieves state-of-the-art OOD performance across standard shortcut-learning benchmarks without requiring shortcut labels or shortcut-conflicting samples.

2511.04060 2026-03-10 math.ST stat.TH

A Unified Graphical Criterion for Characterizing a Linear Causal Interpretation of Partial Regression Coefficients

Masato Shimokawa

Comments v6: Added Theorem 3.7. v7: Corrected a typo in Definition 2.5. v8: Changed the title. Added a Discussion section. Removed Lemma 5.9. The correction does not affect the main results. v9: Focused the discussion on the main theme

Details
English abstract

This paper characterizes the values of partial regression coefficients, defined as projection coefficients onto the space spanned by explanatory variables, for random variables generated by linear structural equation models using graphical structures. First, we derive a generalized graphical criterion that unifies the d-separation, single-door, and back-door criteria. This criterion provides a generically necessary and sufficient condition under which a partial regression coefficient coincides with the linear causal effect not mediated by other explanatory variables. Second, we reveal the mechanism underlying post-treatment bias and characterize it quantitatively. This provides a unified framework for discussing the graph structures that generate post-treatment bias, which have previously been examined individually, and clarifies the existence of graph structures that cannot be prevented by the conventional concept of path-blocking. These results are based on the algebraic properties of acyclic directed mixed graphs and do not rely on any specific probability distribution, making them applicable to a broad class of linear models.

2511.01040 2026-03-10 stat.OT stat.AP stat.CO

From Structural Equation Modeling to Targeted Learning: A Tutorial Introduction to Targeted Maximum Likelihood Estimation for SEM Researchers

Junjie Ma, Xiaoya Zhang, Guangye He, Yuting Han, Ting Ge, Feng Ji

Details
English abstract

Structural equation modeling (SEM) and path analysis have long been central tools for studying complex causal relationships in the social and behavioral sciences, yet their reliance on parametric assumptions can lead to biased inference under model misspecification. To bridge traditional SEM with modern causal machine learning, this paper introduces targeted maximum likelihood estimation (TMLE), a doubly robust framework built on nonparametric structural equation modeling. We formally connect TMLE to classical path analysis, showing that standard SEM estimators arise as special cases of TMLE under restrictive parametric specifications and that both approaches can estimate common causal quantities such as direct, indirect, and total effects. Through simulation studies under both correctly specified and misspecified models, we demonstrate that while the two methods perform similarly when models are correctly specified, TMLE consistently achieves lower bias, reduced mean squared error, and improved confidence interval coverage when parametric assumptions are violated. We further illustrate these differences using an applied mediation analysis examining the role of poverty in access to high school education, where path analysis suggests a significant direct effect, whereas TMLE does not, highlighting the practical consequences of robustness in causal inference. Overall, this tutorial offers SEM researchers a conceptual and practical introduction to targeted learning, providing guidance on leveraging TMLE to enhance causal analysis beyond traditional parametric frameworks.

2510.23745 2026-03-10 stat.ML cs.LG

Bayesian neural networks with interpretable priors from Mercer kernels

Alex Alberts, Ilias Bilionis

Comments Published in Computer Methods in Applied Mechanics and Engineering

Details
English abstract

Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing a Bayesian posterior distribution over the network parameters. However, the prior, which is of key importance in any Bayesian setting, is rarely meaningful for BNNs. This is because the complexity of the input-to-output map of a BNN makes it difficult to understand how certain distributions enforce any interpretable constraint on the output space of the network. Gaussian processes (GPs), on the other hand, are often preferred in uncertainty quantification tasks due to their interpretability. The drawback is that GPs are limited to small datasets without advanced techniques, which often rely on the covariance kernel having a specific structure. To address these challenges, we introduce a new class of priors for BNNs, called Mercer priors, such that the resulting BNN has samples which approximate those of a specified GP. The method works by defining a prior directly over the network parameters from the Mercer representation of the covariance kernel, and does not rely on the network having a specific structure. In doing so, we can exploit the scalability of BNNs in a meaningful Bayesian way.

2510.03449 2026-03-10 stat.ME stat.AP stat.CO

Bayesian Transfer Learning for High-Dimensional Linear Regression via Adaptive Shrinkage

Parsa Jamshidian, Donatello Telesca

Details
English abstract

We introduce BLAST, Bayesian Linear regression with Adaptive Shrinkage for Transfer, a Bayesian multi-source transfer learning framework for high-dimensional linear regression. The proposed analytical framework leverages global-local shrinkage priors together with Bayesian source selection to balance information sharing and regularization. We show how Bayesian source selection allows for the extraction of the most useful data sources, while discounting biasing information that may lead to negative transfer. In this framework, both source selection and sparse regression are jointly accounted for in prediction and inference via Bayesian model averaging. The structure of our model admits efficient posterior simulation via a Metropolis-within-Gibbs sampling algorithm allowing full posterior inference for the target regression coefficients, making BLAST both computationally practical and inferentially straightforward. Our method achieves more accurate posterior inference for the target than regularization approaches based on target data alone, while offering competitive predictive performance and superior uncertainty quantification compared to current state-of-the-art transfer learning methods. We validate its effectiveness through extensive simulation studies and illustrate its analytical properties when applied to a case study on the estimation of tumor mutational burden from gene expression, using data from The Cancer Genome Atlas (TCGA).

2509.26112 2026-03-10 stat.AP

Shotgun DNA sequencing evidence: sample-specific and unknown genotyping error probabilities

Mikkel Meyer Andersen

Comments Handling multiple markers (including adding maximising profile likelihood) in Methods and reworked Results as a consequence

Details
English abstract

Many forensic genetic trace samples are of too low quality to obtain short tandem repeat (STR) DNA profiles because the nuclear DNA they contain is highly degraded (e.g., telogen hairs). Instead, shotgun DNA sequencing of such samples can provide valuable information on, e.g., single nucleotide polymorphism (SNP) markers. As a result, shotgun sequencing is starting to gain more attention in forensic genetics, and statistical models to correctly interpret such evidence, including properly accounting for sequencing errors, are needed. One such model is the wgsLR model by Andersen et al. (2025), which enables evaluating the evidential strength of a comparison between the genotypes in the trace sample and reference sample, assuming a single-source contribution to both samples. This paper extends the wgsLR model to allow for different (asymmetric) genotyping error probabilities (e.g., from a low quality trace sample and a high quality reference sample). The model is also extended to handle unknown genotyping error probabilities, both by maximising the profile likelihood and by using a prior distribution. The sensitivity of the wgsLR model to overdispersion was also investigated, and the model was found to be robust against it. With a sufficient number of independent markers, the methods for handling an unknown genotyping error probability of the trace sample gave concordant weight of evidence (WoE) under both hypotheses (same or different individuals being donors of the trace and reference samples). Using a too small trace sample genotyping error probability was found to be more conservative than using a too large one, as the latter can explain genotype inconsistencies by errors rather than by two different individuals being the donors of the trace sample and reference sample. The extensions of the model are implemented in the R package wgsLR.

2509.02937 2026-03-10 math.OC cs.LG stat.ML

Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization

Lesi Chen, Junru Li, El Mahdi Chayti, Jingzhao Zhang

Comments ICLR 2026; Add one additional author compared to v1

Details
English abstract

This paper studies the complexity of finding an $ε$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method, F${}^2$SA, achieving the $\tilde{\mathcal{O}}(ε^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $Ω(ε^{-4})$ complexity lower bound in its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F$^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods F${}^2$SA-$p$ that uses $p$th-order finite difference for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p ε^{-4-p/2})$ for $p$th-order smooth problems. Finally, we demonstrate that the $Ω(ε^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region $p = Ω( \log ε^{-1} / \log \log ε^{-1})$.
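The role of the finite-difference order can be illustrated on a generic one-dimensional derivative. The sketch below is not the F${}^2$SA-$p$ method itself; it only compares a forward difference (first-order accurate) with a central difference (second-order accurate) on a smooth test function. The function names, the test function `math.sin`, and the step size `h` are illustrative assumptions.

```python
import math

def forward_diff(f, x, h):
    """First-order forward difference: truncation error is O(h)."""
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    """Second-order central difference: truncation error is O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Hypothetical test problem: derivative of sin at x = 1 is cos(1).
h = 1e-3
exact = math.cos(1.0)
err_forward = abs(forward_diff(math.sin, 1.0, h) - exact)
err_central = abs(central_diff(math.sin, 1.0, h) - exact)

print(f"forward error: {err_forward:.2e}")
print(f"central error: {err_central:.2e}")
```

For a smooth function and the same step size, the higher-order formula gives a much smaller approximation error, which is the general intuition behind using higher-order differences when higher-order smoothness is available.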

2506.18562 2026-03-10 stat.ME

Multi-Rank Subspace Change-Point Detection for Monitoring Robotic Swarms

Jonghyeok Lee, Yao Xie, Youngser Park, Jason Hindes, Ira Schwartz, Carey Priebe

Details
English abstract

We study real-time detection of low-rank changes in the covariance structure of high-dimensional streaming data, motivated by robotic swarm monitoring. Building on the spiked covariance model, we propose the Multi-rank Subspace-CUSUM (MRS-C) procedure, which extends classical CUSUM by tracking projection energy onto an estimated signal subspace. We analyze performance by characterizing the expected detection delay (EDD) under a prescribed average run length (ARL), deriving closed-form asymptotically optimal choices of the window size and drift. We further prove that MRS-C is first-order asymptotically optimal relative to the oracle Exact CUSUM, with an explicit efficiency constant that depends on heterogeneity in spike strengths. When the signal rank is unknown, we use a parallel procedure. Simulations and robotic swarm-behavior data illustrate robustness and effectiveness.
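A CUSUM recursion driven by projection energy can be sketched as follows. This is a simplified illustration and not the MRS-C procedure: it assumes the signal subspace is known rather than estimated from a sliding window, and the drift, threshold, spike strength, and dimensions are arbitrary values chosen for the demonstration.

```python
import numpy as np

def subspace_cusum(stream, basis, drift, threshold):
    """Return the first index where the CUSUM statistic exceeds threshold, else None.

    basis: (dim, rank) matrix with orthonormal columns (assumed known here).
    """
    stat = 0.0
    for t, x in enumerate(stream):
        energy = float(np.sum((basis.T @ x) ** 2))  # projection energy onto the subspace
        stat = max(0.0, stat + energy - drift)      # classical CUSUM recursion
        if stat > threshold:
            return t
    return None

rng = np.random.default_rng(0)
dim, rank, change_at = 20, 2, 200

# Hypothetical signal subspace with orthonormal columns.
basis, _ = np.linalg.qr(rng.standard_normal((dim, rank)))

# Before the change: isotropic noise. After: noise plus a rank-2 spike.
pre = rng.standard_normal((change_at, dim))
spikes = rng.standard_normal((300, rank)) * np.sqrt(5.0)
post = rng.standard_normal((300, dim)) + spikes @ basis.T

alarm = subspace_cusum(np.vstack([pre, post]), basis, drift=6.0, threshold=60.0)
print("change at", change_at, "alarm at", alarm)
```

The drift is placed between the expected pre-change energy (2 in this setup) and the expected post-change energy (12), so the statistic stays near zero before the change and grows steadily after it.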

2505.13564 2026-03-10 cs.LG stat.ML

Online Decision-Focused Learning

Aymeric Capitaine, Maxime Haddouche, Eric Moulines, Michael I. Jordan, Etienne Boursier, Alain Durmus

Details
English abstract

Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks. Instead of merely optimizing for predictive accuracy, DFL trains models to directly minimize the loss associated with downstream decisions. However, existing studies focus solely on scenarios where a fixed batch of data is available and the objective function does not change over time. We instead investigate DFL in dynamic environments where the objective function and data distribution evolve over time. This setting is challenging for online learning because the objective function has zero or undefined gradients, which prevents the use of standard first-order optimization methods, and is generally non-convex. To address these difficulties, we (i) regularize the objective to make it differentiable and (ii) use perturbation techniques along with a near-optimal oracle to overcome non-convexity. Combining those techniques yields two original online algorithms tailored for DFL, for which we establish respectively static and dynamic regret bounds. These are the first provable guarantees for the online decision-focused problem. Finally, we showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.

2505.04957 2026-03-10 math.ST stat.ME stat.TH

The Poisson tensor completion parametric estimator

Daniel M. Dunlavy, Richard B. Lehoucq, Carolyn D. Mayer, Arvind Prasadan

Comments 19 pages, 9 figures

Details
English abstract

We introduce the Poisson tensor completion (PTC) estimator that exploits inter-sample relationships to compute a low-rank Poisson tensor decomposition of the frequency histogram for samples of a multivariate distribution. Our crucial observation is that the histogram bins are an instance of a space partitioning of counts and thus can be identified with a spatial non-homogeneous Poisson process. The Poisson tensor decomposition leads to a completion of the mean measure over all bins -- including those containing few to no samples -- and leads to our proposed estimator. A Poisson tensor decomposition models the underlying distribution of the count data and guarantees non-negative estimated values obviating the need for additional constraints to ensure non-negativity. Furthermore, we demonstrate that our PTC estimator is a substantial improvement over standard histogram-based estimators for sub-Gaussian probability distributions because of the concentration of norm phenomenon.

2505.00940 2026-03-10 cs.LG math.OC stat.CO stat.ME

StablePCA: Distributionally Robust Learning of Shared Representations from Multi-Source Data

Zhenyu Wang, Molei Liu, Jing Lei, Francis Bach, Zijian Guo

Details
English abstract

When synthesizing multi-source high-dimensional data, a key objective is to extract low-dimensional representations that effectively approximate the original features across different sources. Such representations facilitate the discovery of transferable structures and help mitigate systematic biases such as batch effects. We introduce Stable Principal Component Analysis (StablePCA), a distributionally robust framework for constructing stable latent representations by maximizing the worst-case explained variance over multiple sources. A primary challenge in extending classical PCA to the multi-source setting lies in the nonconvex rank constraint, which renders the StablePCA formulation a nonconvex optimization problem. To overcome this challenge, we conduct a convex relaxation of StablePCA and develop an efficient Mirror-Prox algorithm to solve the relaxed problem, with global convergence guarantees. Since the relaxed problem generally differs from the original formulation, we further introduce a data-dependent certificate to assess how well the algorithm solves the original nonconvex problem and establish the condition under which the relaxation is tight. Finally, we explore alternative distributionally robust formulations of multi-source PCA based on different loss functions.
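The worst-case explained variance objective can be illustrated with a small numerical example. The sketch below only evaluates the objective for two candidate one-dimensional projections; it does not implement the convex relaxation or the Mirror-Prox algorithm. The two diagonal covariance matrices and all function names are hypothetical.

```python
import numpy as np

def explained_variance(cov, basis):
    """Variance captured by projecting onto the orthonormal columns of basis."""
    return float(np.trace(basis.T @ cov @ basis))

def worst_case_explained_variance(covs, basis):
    """Minimum explained variance across all sources."""
    return min(explained_variance(c, basis) for c in covs)

def top_eigvecs(cov, k):
    """Top-k eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argsort(vals)[::-1][:k]]

# Two hypothetical sources whose dominant directions disagree.
cov_a = np.diag([5.0, 2.0, 0.2])
cov_b = np.diag([0.2, 2.0, 1.0])
covs = [cov_a, cov_b]

pooled_basis = top_eigvecs((cov_a + cov_b) / 2, 1)  # classical PCA on the pooled covariance
stable_basis = np.array([[0.0], [1.0], [0.0]])      # direction shared by both sources

worst_pooled = worst_case_explained_variance(covs, pooled_basis)
worst_stable = worst_case_explained_variance(covs, stable_basis)
print("pooled PCA worst case:", worst_pooled)
print("shared direction worst case:", worst_stable)
```

In this toy setup, pooled PCA picks the direction that dominates one source but explains little of the other, while the shared direction has a higher worst-case explained variance.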

2502.07937 2026-03-10 cs.LG stat.ML

Active Advantage-Aligned Online Reinforcement Learning with Offline Data

Xuefeng Liu, Hung T. C. Le, Siyu Chen, Rick Stevens, Zhuoran Yang, Matthew R. Walter, Yuxin Chen

Details
English abstract

Online reinforcement learning (RL) enhances policies through direct interactions with the environment, but faces challenges related to sample efficiency. In contrast, offline RL leverages extensive pre-collected data to learn policies, but often produces suboptimal results due to limited data coverage. Recent efforts integrate offline and online RL in order to harness the advantages of both approaches. However, effectively combining online and offline RL remains challenging due to issues that include catastrophic forgetting, lack of robustness to data quality, and limited sample efficiency in data utilization. In an effort to address these challenges, we introduce A3RL, which incorporates a novel confidence-aware Active Advantage Aligned (A3) sampling strategy that dynamically prioritizes data aligned with the policy's evolving needs from both online and offline sources, optimizing policy improvement. Moreover, we provide theoretical insights into the effectiveness of our active sampling strategy and conduct diverse empirical experiments and ablation studies, demonstrating that our method outperforms competing online RL techniques that leverage offline data.

2410.13744 2026-03-10 stat.ME q-bio.MN

Inferring the dynamics of quasi-reaction systems via nonlinear local mean-field approximations

Matteo Framba, Veronica Vinciotti, Ernst C. Wit

Details
English abstract

In the modelling of stochastic phenomena, such as quasi-reaction systems, parameter estimation of kinetic rates can be challenging, particularly when the time gap between consecutive measurements is large. Local linear approximation approaches account for the stochasticity in the system but fail to capture the nonlinear nature of the underlying process. At the mean level, the dynamics of the system can be described by a system of ODEs, which have an explicit solution only for simple unitary systems. An analytical solution for generic quasi-reaction systems is proposed via a first order Taylor approximation of the hazard rate. This allows a nonlinear forward prediction of the future dynamics given the current state of the system. Predictions and corresponding observations are embedded in a nonlinear least-squares approach for parameter estimation. The performance of the algorithm is compared to existing SDE and ODE-based methods via a simulation study. Besides the increased computational efficiency of the approach, the results show an improvement in the kinetic rate estimation, particularly for data observed at large time intervals. Additionally, the availability of an explicit solution makes the method robust to stiffness, which is often present in biological systems. An illustration on Rhesus Macaque data shows the applicability of the approach to the study of cell differentiation.

2408.06710 2026-03-10 cs.LG cs.AI stat.ML

Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling

Jian Xu, Shian Du, Junmei Yang, Qianli Ma, Delu Zeng, John Paisley

Details
English abstract

Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility and non-linear nature. An importance-weighted version of the Bayesian GPLVMs has been proposed to obtain a tighter variational bound. However, this version of the approach is primarily limited to analyzing simple data structures, as the generation of an effective proposal distribution can become quite challenging in high-dimensional spaces or with complex data sets. In this work, we propose an Annealed Importance Sampling (AIS) approach to address these issues. By transforming the posterior into a sequence of intermediate distributions using annealing, we combine the strengths of Sequential Monte Carlo samplers and VI to explore a wider range of posterior distributions and gradually approach the target distribution. We further propose an efficient algorithm by reparameterizing all variables in the evidence lower bound (ELBO). Experimental results on both toy and image datasets demonstrate that our method outperforms state-of-the-art methods in terms of tighter variational bounds, higher log-likelihoods, and more robust convergence.

2408.00329 2026-03-10 cs.LG cs.AI math.OC stat.ML

OTAD: An Optimal Transport-Induced Robust Model for Agnostic Adversarial Attack

Kuo Gai, Sicong Wang, Shihua Zhang

Comments 15 pages, 2 figures

Details
English abstract

Deep neural networks (DNNs) are vulnerable to small adversarial perturbations of the inputs, posing a significant challenge to their reliability and robustness. Empirical methods such as adversarial training can defend against particular attacks but remain vulnerable to more powerful attacks. Alternatively, Lipschitz networks provide certified robustness to unseen perturbations but lack sufficient expressive power. To harness the advantages of both approaches, we design a novel two-step Optimal Transport induced Adversarial Defense (OTAD) model that can fit the training data accurately while preserving the local Lipschitz continuity. First, we train a DNN with a regularizer derived from optimal transport theory, yielding a discrete optimal transport map linking data to its features. By leveraging the map's inherent regularity, we interpolate the map by solving the convex integration problem (CIP) to guarantee the local Lipschitz property. OTAD is extensible to diverse architectures of ResNet and Transformer, making it suitable for complex data. For efficient computation, the CIP can be solved through training neural networks. OTAD opens a novel avenue for developing reliable and secure deep learning systems through the regularity of optimal transport maps. Empirical results demonstrate that OTAD can outperform other robust models on diverse datasets.

2407.16786 2026-03-10 stat.ME

Causal generalized linear models via Pearson risk invariance

Alice Polinelli, Veronica Vinciotti, Ernst C. Wit

Details
English abstract

Prediction invariance of causal models under heterogeneous settings has been exploited by a number of recent methods for causal discovery, typically focussing on recovering the causal parents of a target variable of interest. Existing methods require observational data from a number of sufficiently different environments, which is rarely available. In this paper, we consider a structural equation model where the target variable is described by a generalized linear model conditional on its parents. Besides having finite moments, no modelling assumptions are made on the conditional distributions of the other variables in the system, and nonlinear effects on the target variable can naturally be accommodated by a generalized additive structure. Under this setting, we characterize the causal model uniquely by means of two key properties: the Pearson risk is invariant under the causal model and, conditional on the causal parents, the causal parameters maximize the expected likelihood. These two properties form the basis of a computational strategy for searching the causal model among all possible models. A stepwise greedy search is proposed for systems with a large number of variables. Crucially, for generalized linear models with a known dispersion parameter, such as Poisson and logistic regression, the causal model can be identified from a single data environment. The method is implemented in the R package causalreg.

2406.14380 2026-03-10 econ.EM cs.LG stat.ME

Estimating Treatment Effects under Algorithmic Interference: A Structured Neural Networks Approach

Ruohan Zhan, Shichao Han, Yuchen Hu, Zhenling Jiang

Details
English abstract

Online user-generated content platforms allocate billions of dollars of promotional traffic through algorithms in two-sided marketplaces. To evaluate updates to these algorithms, platforms frequently rely on creator-side randomized experiments. However, because treated and control creators compete for exposure, such experiments suffer from algorithmic interference: exposure outcomes depend on competitors' treatment status. We show that commonly used difference-in-means estimators can therefore be severely biased and may even recommend deploying inferior algorithms. To address this challenge, we develop a structured semiparametric framework that explicitly models the competitive allocation mechanism underlying exposure. Our approach combines an algorithm choice model that characterizes how exposure is allocated across competing content with a viewer response model that captures engagement conditional on exposure. We construct a debiased estimator grounded in the double machine learning framework to recover the global treatment effect of platform-wide rollout. Methodologically, we extend DML asymptotic theory to accommodate correlated samples arising from overlapping consideration sets. Using Monte Carlo simulations and a large-scale field experiment on a major short-video platform, we show that our estimator closely matches an interference-free benchmark obtained from a costly double-sided experimental design. In contrast, standard estimators exhibit substantial bias and, in some cases, even reverse the sign of the effect.

2406.09055 2026-03-10 stat.ME

Relational event models with global covariates

Melania Lembo, Rūta Juozaitienė, Veronica Vinciotti, Ernst C. Wit

Details
English abstract

Bike sharing is an increasingly popular mobility choice as it is a sustainable, healthy and economically viable transportation mode. By interpreting rides between bike stations over time as temporal events connecting two bike stations, relational event models can provide important insights into this phenomenon. The focus of relational event models, as a typical event history model, is normally on dyadic or node-specific covariates, as global covariates are considered nuisance parameters in a partial likelihood approach. As full likelihood approaches are infeasible given the sheer size of the relational process, we propose an innovative sampling approach of temporally shifted non-events to recover important global drivers of the relational process. The method applies nested case-control sampling to a time-shifted version of the event process. This leads to a partial likelihood of the relational event process that is identical to that of a degenerate logistic additive model, enabling efficient estimation of both global and non-global covariate effects. The computational effectiveness of the method is demonstrated through a simulation study. The analysis of around 350,000 bike rides in the Washington D.C. area reveals significant influences of weather and time of day on bike sharing dynamics, besides a number of traditional node-specific and dyadic covariates.

2404.12556 2026-03-10 stat.CO

Bias- and Variance-Aware Probabilistic Rounding Error Analysis for Floating-Point Arithmetic

Sahil Bhola, Karthik Duraisamy

Details
English abstract

Probabilistic rounding error analysis can yield much sharper bounds than classical worst-case theory, but existing results typically rely on zero-mean rounding errors and often leave the confidence parameter implicit. This work revisits probabilistic rounding error analysis in a moment-aware setting. We first derive a confidence-calibrated reformulation of the Higham and Mary [16] bound that makes its confidence parameter explicit. We then introduce a variance-informed probabilistic backward error bound based on the first two moments of $\log(1+δ)$, where $δ$ is the relative rounding error. This allows the analysis to accommodate biased rounding error models rather than relying on a zero-mean assumption. To illustrate this framework, we study both a uniform model and a log-space $\operatorname{Beta}$ model for rounding errors, the latter of which provides a simple way to represent bias. This perspective shows that the growth of probabilistic rounding error bounds is not universal: near-zero-mean regimes recover $\sqrt{n}$-like behavior, while biased models can exhibit faster accumulation. $\texttt{CUDA}$ experiments in single and half precision on dot products, sparse matrix-vector products, and a stochastic boundary-value problem show that the proposed framework is especially useful in low-precision regimes where deterministic bounds are overly conservative and where bias-aware modeling better matches observed error growth.
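The contrast between near-zero-mean and biased error accumulation can be checked with a short simulation of the accumulated log-error $\sum_i \log(1+δ_i)$. This is a rough sketch under simplifying assumptions: the rounding errors are drawn as independent uniform variables (on $[-u, u]$ for the zero-mean case and on $[0, u]$ as a hypothetical biased case), which is not the paper's log-space Beta model, and no actual floating-point kernel is run.

```python
import numpy as np

def accumulated_log_error(n, low, high, trials, rng):
    """Sum of log(1 + delta_i) over n simulated rounding errors, for each trial."""
    deltas = rng.uniform(low, high, size=(trials, n))
    return np.log1p(deltas).sum(axis=1)

rng = np.random.default_rng(1)
u = 2.0 ** -11  # unit roundoff of IEEE half precision
n = 10_000
trials = 200

zero_mean = accumulated_log_error(n, -u, u, trials, rng)  # symmetric errors
biased = accumulated_log_error(n, 0.0, u, trials, rng)    # hypothetical one-sided errors

rms_zero = float(np.sqrt(np.mean(zero_mean ** 2)))
mean_biased = float(np.mean(biased))

print("zero-mean RMS / (sqrt(n) * u):", rms_zero / (np.sqrt(n) * u))
print("biased mean / (n * u):        ", mean_biased / (n * u))
```

Under these assumptions, the symmetric model gives an accumulated error on the order of $\sqrt{n}\,u$, while the one-sided model gives an error on the order of $n u$, matching the qualitative distinction drawn in the abstract.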

2302.00941 2026-03-10 cs.GT stat.ML

A Robust Multi-Item Auction Design with Statistical Learning

Jiale Han, Xiaowu Dai

Details
English abstract

We propose a novel statistical learning method for multi-item auctions that incorporates credible intervals. Our approach employs nonparametric density estimation to estimate credible intervals for bidder types based on historical data. We introduce two new strategies that leverage these credible intervals to reduce the time cost of implementing auctions. The first strategy screens potential winners' value regions within the credible intervals, while the second strategy simplifies the type distribution when the length of the interval is below a threshold value. These strategies are easy to implement and ensure fairness, dominant-strategy incentive compatibility, and dominant-strategy individual rationality with a high probability, while simultaneously reducing implementation costs. We demonstrate the effectiveness of our strategies using the Vickrey-Clarke-Groves mechanism and evaluate their performance through simulation experiments. Our results show that the proposed strategies consistently outperform alternative methods, achieving both revenue maximization and cost reduction objectives.

2012.09828 2026-03-10 math.ST stat.TH

Nonparametric two-sample hypothesis testing for low-rank random graphs of differing sizes

Joshua Agterberg, Minh Tang, Carey Priebe

Details
English abstract

Given two networks of differing sizes, it is of interest to test whether the two networks belong to the same distribution. We formalize the notion of "equality of distribution" under the framework of the generalized random dot product graph, which considers as special cases a number of popular network models with low-rank expectations. We then propose a nonparametric two-sample test statistic to conduct this test, assuming only that the networks have independent edges generated from low-rank probability matrices. Our proposed test statistic involves using the maximum mean discrepancy applied to suitably rotated rows of a graph embedding, where the rotation is estimated using optimal transport. We show that our test statistic, appropriately scaled, is consistent for sufficiently dense graphs, and we study its convergence under different sparsity regimes. Our results are demonstrated in numerical simulations.
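The maximum mean discrepancy component can be sketched on plain point clouds of different sizes. This example covers only a Gaussian-kernel MMD estimate; it omits the graph embedding and the optimal-transport rotation step described in the abstract, and the kernel bandwidth, sample sizes, and shift are arbitrary assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD; x and y may differ in size."""
    return float(
        gaussian_kernel(x, x, bandwidth).mean()
        + gaussian_kernel(y, y, bandwidth).mean()
        - 2 * gaussian_kernel(x, y, bandwidth).mean()
    )

rng = np.random.default_rng(2)
x = rng.standard_normal((300, 2))              # sample of 300 points
y_same = rng.standard_normal((200, 2))         # 200 points, same distribution
y_shift = rng.standard_normal((200, 2)) + 1.5  # 200 points, shifted mean

same = mmd_squared(x, y_same)
shifted = mmd_squared(x, y_shift)
print("same distribution:   ", same)
print("shifted distribution:", shifted)
```

The estimate stays close to zero when both samples come from the same distribution and becomes clearly positive when the distributions differ, regardless of the unequal sample sizes.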

2010.01388 2026-03-10 cs.LG cs.AI stat.ML

Online Neural Networks for Change-Point Detection

Mikhail Hushchyn, Kenenbek Arzymatov, Denis Derkach

Comments This version of the article has been submitted to the journal but is not the Version of Record and does not reflect peer-review improvements, post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10994-026-07000-6

Details
Journal ref
Mach Learn 115, 56 (2026)
English abstract

Moments when a time series changes its behavior are called change points. The occurrence of a change point implies that the state of the system has changed, and its timely detection might help to prevent unwanted consequences. In this paper, we present two change-point detection approaches based on neural networks and online learning. These algorithms demonstrate linear computational complexity and are suitable for change-point detection in large time series. We compare them with the best known algorithms on various synthetic and real-world data sets. Experiments show that the proposed methods outperform known approaches. We also prove the convergence of the algorithms to the optimal solutions and describe the conditions under which the current approach is more powerful than the offline one.