arXivDaily: arXiv daily academic digest, updated Monday to Friday
2603.30017 2026-04-01 cs.LG cs.CR stat.ML

Refined Detection for Gumbel Watermarking

Tor Lattimore

Abstract

We propose a simple detection mechanism for the Gumbel watermarking scheme proposed by Aaronson (2022). The new mechanism is proven to be near-optimal in a problem-dependent sense among all model-agnostic watermarking schemes under the assumption that the next-token distribution is sampled i.i.d.
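For context, the baseline scheme this abstract builds on can be sketched as follows. This is a minimal sketch of the commonly described Gumbel sampling rule and its classical sum score, not the refined detector proposed in the paper. The helper `uniforms`, the toy `generate` loop, the choice of context, and the fixed next-token distribution are all assumptions made for illustration.

```python
import hashlib
import math
import random

def uniforms(key, context, vocab_size):
    """Hypothetical helper: one pseudo-random uniform per token, derived from a secret key and the context."""
    digest = hashlib.sha256(f"{key}|{context!r}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    # Clamp away from 0 and 1 so the logarithms below stay finite.
    return [min(max(rng.random(), 1e-12), 1 - 1e-12) for _ in range(vocab_size)]

def sample_token(probs, u):
    """Pick argmax_i u_i ** (1 / p_i), computed as argmax_i log(u_i) / p_i."""
    support = [i for i, p in enumerate(probs) if p > 0]
    return max(support, key=lambda i: math.log(u[i]) / probs[i])

def detection_score(tokens, contexts, key, vocab_size):
    """Classical sum score: each term is Exp(1) distributed when the text was produced independently of the key."""
    return sum(-math.log(1 - uniforms(key, ctx, vocab_size)[tok])
               for tok, ctx in zip(tokens, contexts))

def generate(probs, length, key):
    """Toy generator (assumption): fixed next-token distribution, context = (position, previous token)."""
    tokens, contexts, prev = [], [], None
    for pos in range(length):
        ctx = (pos, prev)
        tok = sample_token(probs, uniforms(key, ctx, len(probs)))
        tokens.append(tok)
        contexts.append(ctx)
        prev = tok
    return tokens, contexts
```

Scoring the generated tokens with the correct key should typically give a larger value than scoring them with an unrelated key, since watermarked tokens tend to have uniforms close to 1.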

2603.29884 2026-04-01 math.ST stat.TH

Csiszár indices and interpolating copulas

Cristina Butucea, Jean-François Delmas, Anne Dutfoy, Antoine Schoonaert

Abstract

We study various properties of $f$-divergences and Csiszár indices between two probability distributions in very general setups for the convex function $f$ and for the probability distributions. We establish general structural properties of $f$-divergences and show how they are inherited by the associated Csiszár indices, including monotonicity and invariance under suitable transformations. We also study the relationship between Csiszár indices and copula representations of random vectors. When the marginal distributions have atoms, the copula representation is not unique and the Csiszár index of the transformed vectors may increase. We build a large family of interpolating copulas which minimize the Csiszár index and thus preserve the dependence structure of the initial vector.

2603.29786 2026-04-01 stat.OT

On the symmetry of evidential support

Grant Molnar

Comments 20 pages

Abstract

For events $A$ and $B$, we have \[ \mathbb{P}(A\mid B) > \mathbb{P}(A\mid \neg B) \qquad\Longleftrightarrow\qquad \mathbb{P}(B\mid A) > \mathbb{P}(B\mid \neg A) \] whenever all four quantities are defined. In other words, $B$ is evidence for $A$ if and only if $A$ is evidence for $B$. This note gives seven different proofs of this fact -- by cross-multiplication, covariance, coupling parameters, odds ratios, pointwise mutual information, combinatorial double counting, and mixed discrete derivatives -- and develops a surrounding web of interpretations. Once the marginals $\mathbb{P}(A)$ and $\mathbb{P}(B)$ are fixed, a $2\times 2$ table has only one degree of freedom, so every scalar notion of positive association must be governed by the same signed parameter.
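The cross-multiplication argument, the first of the proofs listed in the abstract, can be sketched as follows, assuming $0 < \mathbb{P}(A) < 1$ and $0 < \mathbb{P}(B) < 1$ so that all four conditional probabilities are defined. The presentation in the note itself may differ.

```latex
\begin{align*}
\mathbb{P}(A\mid B) - \mathbb{P}(A\mid \neg B)
  &= \frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}
   - \frac{\mathbb{P}(A) - \mathbb{P}(A\cap B)}{1-\mathbb{P}(B)} \\
  &= \frac{\mathbb{P}(A\cap B)\,\bigl(1-\mathbb{P}(B)\bigr)
         - \bigl(\mathbb{P}(A)-\mathbb{P}(A\cap B)\bigr)\,\mathbb{P}(B)}
          {\mathbb{P}(B)\,\bigl(1-\mathbb{P}(B)\bigr)} \\
  &= \frac{\mathbb{P}(A\cap B) - \mathbb{P}(A)\,\mathbb{P}(B)}
          {\mathbb{P}(B)\,\bigl(1-\mathbb{P}(B)\bigr)} .
\end{align*}
```

The denominator is positive, so the left-hand side is positive exactly when $\mathbb{P}(A\cap B) > \mathbb{P}(A)\,\mathbb{P}(B)$. That condition is symmetric in $A$ and $B$, so swapping the roles of the two events gives the same criterion for $\mathbb{P}(B\mid A) > \mathbb{P}(B\mid \neg A)$.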

2603.29764 2026-04-01 stat.ME

High dimensional alpha test for linear factor pricing model with $L_q$-norm

Ping Zhao, Huifang Ma, Long Feng

Abstract

We consider testing zero pricing errors in high-dimensional linear factor pricing models. Existing methods are mainly based on either an $L_2$ statistic, which is effective under dense alternatives, or an $L_\infty$ statistic, which is powerful under very sparse alternatives. To bridge these two regimes, we develop a class of $L_q$-based tests for finite $q$, including the practically useful $L_4$ and $L_6$ cases. We show that larger $q$ leads to greater sensitivity to sparse alternatives. We further establish the asymptotic independence between the $L_\infty$ statistic and the $L_q$ statistic for any finite $q$, which motivates a Cauchy combination test that adapts to a broad range of sparsity levels. Simulation studies and a real-data analysis show that the proposed methods are more robust to the unknown sparsity of the alternative and can outperform existing procedures in finite samples.
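The combination step mentioned above can be illustrated with the standard Cauchy combination rule for p-values. This is a minimal sketch, assuming the two tests have already produced p-values; the function name `cauchy_combination` and the example inputs are hypothetical, and the paper's exact weighting may differ.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p-values via the Cauchy combination rule.

    Each p-value is mapped to a standard Cauchy quantile, the quantiles are
    averaged with nonnegative weights summing to one, and the average is
    mapped back to a p-value through the Cauchy survival function.
    """
    n = len(p_values)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights by default
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    return 0.5 - math.atan(t) / math.pi

# Hypothetical p-values from an L_q-type test and an L_infinity-type test.
combined = cauchy_combination([0.001, 0.8])
```

Because the Cauchy transform is heavy-tailed, one very small p-value dominates the average, which is why the combined test can stay sensitive whichever of the two statistics detects the alternative.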

2603.29725 2026-04-01 stat.ML cs.LG

Unbounded Density Ratio Estimation and Its Application to Covariate Shift Adaptation

Ren-Rui Liu, Jun Fan, Lei Shi, Zheng-Chu Guo

Comments 48 pages, 1 figure, 1 table

Abstract

This paper focuses on the problem of unbounded density ratio estimation -- an understudied yet critical challenge in statistical learning -- and its application to covariate shift adaptation. Much of the existing literature assumes that the density ratio is either uniformly bounded or unbounded but known exactly. These conditions are often violated in practice, creating a gap between theoretical guarantees and real-world applicability. In contrast, this work directly addresses unbounded density ratios and integrates them into importance weighting for effective covariate shift adaptation. We propose a three-step estimation method that leverages unlabeled data from both the source and target distributions: (1) estimating a relative density ratio; (2) applying a truncation operation to control its unboundedness; and (3) transforming the truncated estimate back into the standard density ratio. The estimated density ratio is then employed as importance weights for regression under covariate shift. We establish rigorous, non-asymptotic convergence guarantees for both the proposed density ratio estimator and the resulting regression function estimator, demonstrating optimal or near-optimal convergence rates. Our findings offer new theoretical insights into density ratio estimation and learning under covariate shift, extending classical learning theory to more practical and challenging scenarios.
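The three steps listed above can be sketched on the level of ratio values. This sketch assumes the commonly used $\alpha$-relative density ratio $r_\alpha = p / (\alpha p + (1-\alpha) q)$, which is bounded by $1/\alpha$ and satisfies $r_\alpha = r / (\alpha r + 1 - \alpha)$ for the standard ratio $r = p/q$. The paper's definitions, estimator, and truncation rule may differ; the function names, `alpha`, and the threshold `tau` are assumptions, and step (1) is replaced by given values rather than an actual estimator.

```python
def relative_ratio(r, alpha):
    """alpha-relative ratio written in terms of the standard ratio r = p/q (assumed definition)."""
    return r / (alpha * r + 1.0 - alpha)

def truncate(r_alpha, tau):
    """Step 2: cap the relative ratio at tau, where tau < 1/alpha keeps step 3 finite."""
    return min(r_alpha, tau)

def to_standard_ratio(r_alpha, alpha):
    """Step 3: invert r_alpha = r / (alpha * r + 1 - alpha) to recover r."""
    return (1.0 - alpha) * r_alpha / (1.0 - alpha * r_alpha)

def importance_weights(r_alpha_hat, alpha, tau):
    """Turn (estimated) relative ratios into bounded importance weights."""
    if not 0.0 < tau < 1.0 / alpha:
        raise ValueError("tau must lie strictly between 0 and 1/alpha")
    return [to_standard_ratio(truncate(v, tau), alpha) for v in r_alpha_hat]

# Stand-in for step 1: relative ratios computed from known standard ratios.
alpha, tau = 0.5, 1.8
r_alpha_hat = [relative_ratio(r, alpha) for r in (0.2, 3.0, 1e6)]
weights = importance_weights(r_alpha_hat, alpha, tau)
```

With these choices the weights never exceed $(1-\alpha)\tau / (1 - \alpha\tau)$, so an extremely large true ratio is capped while moderate ratios pass through unchanged.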

2603.29724 2026-04-01 stat.CO

A new gradient-free active subspace estimation method with application to rare event probability estimation

Valentin Breaz, Miguel Munoz Zuniga, Olivier Zahm, Richard Wilkinson

Comments 24 pages, 6 figures

Abstract

To reduce the cost of estimating the probability of a rare event involving a very large number of random parameters, we propose a new strategy for dimension reduction coupled with a surrogate model for the expensive part of the algorithm. To this end, we extend the Ordinary Kriging Active Subspace (OK-AS) method into a sequential version. Our approach consists of iteratively re-estimating the active subspace using a Kriging surrogate trained in a rotated coordinate system until the active subspace stabilises. This method allows for a reduction in prediction error and a better approximation of the active subspace on a benchmark of test problems. Furthermore, we integrate our algorithm into an efficient pre-existing approach for estimating the probability of a rare event. This approach is based on learning the active subspace associated with the random event whose probability is to be estimated. The sequential learning of an importance sampling density is necessary and corresponds to the expensive part of this strategy. To circumvent this issue, we integrate our sequential OK-AS version into the estimation of the importance sampling density. The numerical results indicate that our method allows for reducing the cost required to obtain a precise estimate of the rare event probability.

2603.29715 2026-04-01 cs.LG eess.SP math.OC stat.ML

Nonnegative Matrix Factorization in the Component-Wise L1 Norm for Sparse Data

Giovanni Seraghiti, Kévin Dubrulle, Arnaud Vandaele, Nicolas Gillis

Comments 21 pages before supplementary, code available from https://github.com/giovanniseraghiti/wL1-NMF

Abstract

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix, $X$, by the product of two nonnegative factors, $WH$, where $W$ has $r$ columns and $H$ has $r$ rows. In this paper, we consider NMF using the component-wise L1 norm as the error measure (L1-NMF), which is suited for data corrupted by heavy-tailed noise, such as Laplace noise or salt and pepper noise, or in the presence of outliers. Our first contribution is an NP-hardness proof for L1-NMF, even when $r=1$, in contrast to the standard NMF that uses least squares. Our second contribution is to show that L1-NMF strongly enforces sparsity in the factors for sparse input matrices, thereby favoring interpretability. However, if the data is affected by false zeros, too sparse solutions might degrade the model. Our third contribution is a new, more general, L1-NMF model for sparse data, dubbed weighted L1-NMF (wL1-NMF), where the sparsity of the factorization is controlled by adding a penalization parameter to the entries of $WH$ associated with zeros in the data. The fourth contribution is a new coordinate descent (CD) approach for wL1-NMF, denoted as sparse CD (sCD), where each subproblem is solved by a weighted median algorithm. To the best of our knowledge, sCD is the first algorithm for L1-NMF whose complexity scales with the number of nonzero entries in the data, making it efficient in handling large-scale, sparse data. We perform extensive numerical experiments on synthetic and real-world data to show the effectiveness of our new proposed model (wL1-NMF) and algorithm (sCD).
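To illustrate why a weighted median appears in coordinate descent for L1 fitting, consider the scalar subproblem of minimizing $\sum_i |a_i - b_i x|$ over $x \ge 0$ with $b_i \ge 0$. A minimizer is the weighted median of the ratios $a_i / b_i$ with weights $b_i$, clipped at zero. The sketch below covers only this plain L1 subproblem; it is not the sCD algorithm or the wL1-NMF penalty from the paper, and the function names are hypothetical.

```python
def weighted_median(values, weights):
    """Return a minimizer of sum_i weights[i] * |values[i] - x| (lower weighted median)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value
    raise ValueError("weights must be positive and non-empty")

def best_nonnegative_scalar(a, b):
    """Solve min over x >= 0 of sum_i |a[i] - b[i] * x| for entries with b[i] > 0."""
    ratios = [ai / bi for ai, bi in zip(a, b) if bi > 0]
    weights = [bi for bi in b if bi > 0]
    if not weights:
        return 0.0  # the objective does not depend on x
    # The objective is convex in x, so clipping the unconstrained minimizer at 0 is optimal.
    return max(0.0, weighted_median(ratios, weights))
```

For example, with `a = [0, 0, 5]` and `b = [1, 1, 1]` the minimizer is 0, which hints at how zeros in the data can pull factor entries to exactly zero under the L1 loss.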

2603.29654 2026-04-01 cs.LG cs.AI stat.ML

Concept frustration: Aligning human concepts and machine representations

Enrico Parisini, Christopher J. Soelistyo, Ahab Isaac, Alessandro Barp, Christopher R. S. Banerji

Comments 34 pages, 7 figures

Abstract

Aligning human-interpretable concepts with the internal representations learned by modern machine learning systems remains a central challenge for interpretable AI. We introduce a geometric framework for comparing supervised human concepts with unsupervised intermediate representations extracted from foundation model embeddings. Motivated by the role of conceptual leaps in scientific discovery, we formalise the notion of concept frustration: a contradiction that arises when an unobserved concept induces relationships between known concepts that cannot be made consistent within an existing ontology. We develop task-aligned similarity measures that detect concept frustration between supervised concept-based models and unsupervised representations derived from foundation models, and show that the phenomenon is detectable in task-aligned geometry while conventional Euclidean comparisons fail. Under a linear-Gaussian generative model we derive a closed-form expression for Bayes-optimal concept-based classifier accuracy, decomposing predictive signal into known-known, known-unknown and unknown-unknown contributions and identifying analytically where frustration affects performance. Experiments on synthetic data and real language and vision tasks demonstrate that frustration can be detected in foundation model representations and that incorporating a frustrating concept into an interpretable model reorganises the geometry of learned concept representations, to better align human and machine reasoning. These results suggest a principled framework for diagnosing incomplete concept ontologies and aligning human and machine conceptual reasoning, with implications for the development and validation of safe interpretable AI for high-risk applications.

2603.29647 2026-04-01 stat.CO stat.ME

A Hybrid NUTS-Gibbs Sampler with State Space Marginalization for Estimation of Dynamic Structural Equation Models with Binomial Outcomes

Øystein Sørensen, Ethan M. McCormick

Abstract

Dynamic structural equation modeling (DSEM) is widely used for analyzing intensive longitudinal data (ILD). Although many ILD have categorical (Bernoulli or binomially distributed) responses, currently available Metropolis-within-Gibbs samplers for estimating DSEMs are limited to using the probit link and the Bernoulli distribution. These samplers scale poorly with increasing model complexity and/or data size. Here, we present a hybrid sampler -- alternating between one step of the No-U-Turn Sampler (NUTS) and one Gibbs step -- which solves both of these problems: the Gibbs step naturally handles Pólya-Gamma distributed latent variables arising from binomially distributed responses with a logit link, and the NUTS step utilizes a Kalman filter to exactly marginalize over latent states, alleviating the need to sample these variables. We demonstrate in simulation experiments that the proposed sampler is more efficient than alternative algorithms, and that it makes DSEM estimation with binomial data feasible for larger data and models than what has previously been possible. We also illustrate its use in an example application of predicting panic attacks.

2603.29612 2026-04-01 stat.ME cs.LG

Central limit theorems for the outputs of fully convolutional neural networks with time series input

Annika Betken, Giorgio Micali, Johannes Schmidt-Hieber

Abstract

Deep learning is widely deployed for time series learning tasks such as classification and forecasting. Despite the empirical successes, little theory has been developed so far in the time series context. In this work, we prove that if the network inputs are generated from short-range dependent linear processes, the outputs of fully convolutional neural networks (FCNs) with global average pooling (GAP) are asymptotically Gaussian and the limit is attained if the length of the observed time series tends to infinity. The proof leverages existing tools from the theoretical time series literature. Based on our theory, we propose a generalization of the GAP layer by considering a global weighted pooling step with slowly varying, learnable coefficients.

2603.29571 2026-04-01 math.PR cs.IT math.CO math.IT math.OC math.ST stat.TH

Randomstrasse101: Open Problems of 2025

Afonso S. Bandeira, Daniil Dmitriev, Kevin Lucca, Petar Nizić-Nikolac, Almut Rödder

Abstract

Randomstrasse101 is a blog dedicated to Open Problems in Mathematics, with a focus on Probability Theory, Computation, Combinatorics, Statistics, and related topics. This manuscript serves as a stable record of the Open Problems posted in 2025, with the goal of easing academic referencing. The blog can currently be accessed at randomstrasse101.math.ethz.ch

2603.12299 2026-04-01 stat.CO math.PR

Regenerative Rejection Sampling

Tommaso Bozzi

Comments EPFL Master's Thesis, 122 pages, 30 figures

Abstract

This thesis presents Regenerative Rejection Sampling (RRS), a novel approximate sampling algorithm inspired by classical Rejection Sampling and Markov Chain Monte Carlo methods. The method constructs a continuous-time regenerative process whose stationary distribution coincides with a target density known only up to a normalizing constant. Unlike standard Rejection Sampling, RRS does not require the existence of a finite constant that upper-bounds the likelihood ratio. As a result, its total variation convergence rate remains exponential for a larger class of scenarios compared to, for example, the Independent Metropolis-Hastings sampler, which requires a finite bounding constant. To explain the workings of the method, we first present a detailed review of renewal and regenerative processes, including their limit theorems, stationary versions, and convergence properties under standard conditions. We explain a coupling proof for exponential convergence of regenerative processes, under the assumption of a spread-out cycle length distribution. We then introduce the RRS algorithm, and derive its convergence rate. Its performance is compared theoretically and empirically with classical MCMC methods. Numerical experiments demonstrate that RRS can exhibit lower autocorrelations and faster effective mixing, both in synthetic examples and in a Bayesian probit regression model applied to a real medical dataset. Moreover, if the algorithm is run until time $t$, we show that the usual $O(1/t)$ bias of time-average estimators improves to $O(1/t^2)$ for the estimator constructed from the RRS method, and we provide easy-to-estimate non-asymptotic bounds for this bias.

2602.17525 2026-04-01 cs.LG math.ST stat.ML stat.TH

Variational inference via radial transport

Luca Ghafourpour, Sinho Chewi, Alessio Figalli, Aram-Alexandre Pooladian

Abstract

In variational inference (VI), the practitioner approximates a high-dimensional distribution $π$ with a simple surrogate one, often a (product) Gaussian distribution. However, in many cases of practical interest, Gaussian distributions might not capture the correct radial profile of $π$, resulting in poor coverage. In this work, we approach the VI problem from the perspective of optimizing over these radial profiles. Our algorithm radVI is a cheap, effective add-on to many existing VI schemes, such as Gaussian (mean-field) VI and Laplace approximation. We provide theoretical convergence guarantees for our algorithm, owing to recent developments in optimization over the Wasserstein space--the space of probability distributions endowed with the Wasserstein distance--and new regularity properties of radial transport maps in the style of Caffarelli (2000).

2602.11873 2026-04-01 eess.IV physics.med-ph stat.ME

Time-resolved aortic 3D shape reconstruction from a limited number of cine 2D MRI slices

Gloria Wolkerstorfer, Stefano Buoso, Rabea Schlenker, Jochen von Spiczak, Robert Manka, Sebastian Kozerke

Abstract

Background and Objective: To assess the feasibility and accuracy of reconstructing time-resolved, three-dimensional, subject-specific aortic geometries from a limited number of standard cine 2D magnetic resonance imaging (MRI) acquisitions. This is achieved by coupling a statistical shape model with a differentiable volumetric mesh optimization algorithm. Methods: Cine 2D MRI slices were manually segmented and used to reconstruct subject-specific aortic geometries via a differentiable mesh optimization algorithm, constrained by a statistical shape model. Optimal slice positioning was first evaluated on synthetic data, followed by in-vivo acquisition in 30 subjects (19 volunteers and 11 aortic stenosis patients). Time-resolved aortic geometries were reconstructed, from which geometric descriptors and radial strain were derived. In a subset of 10 subjects, 4D flow MRI data was acquired to provide volumetric reference for peak-systolic shape comparison. Results: Accurate reconstruction was achieved using as few as six cine 2D MRI slices. Agreement with 4D flow MRI reference data yielded a Dice score of (89.9 +/- 1.6) %, Intersection over Union of (81.7 +/- 2.7) %, Hausdorff distance of (7.3 +/- 3.3) mm, and Chamfer distance of (3.7 +/- 0.6) mm. The mean absolute radius error along the aortic arch was (0.8 +/- 0.6) mm. Secondary analysis demonstrated significant differences in geometric features and radial strain across age groups, with strain decreasing progressively with age at values of (11.00 +/- 3.11) $\times 10^{-2}$ vs. (3.74 +/- 1.25) $\times 10^{-2}$ vs. (2.89 +/- 0.87) $\times 10^{-2}$ for the young, mid-age, and elderly groups, respectively. Conclusion: The proposed framework enables reconstruction of time-resolved, subject-specific aortic geometries from a limited number of standard cine 2D MRI acquisitions, providing a practical basis for downstream computational analysis.

2601.08411 2026-04-01 stat.CO

Particle Filtering for a Class of State-Space Models with Low and Degenerate Observational Noise

Abylay Zhumekenov, Alexandros Beskos, Dan Crisan, Ajay Jasra, Nikolas Kantas

Comments 27 pages, 13 figures

Abstract

We consider the discrete-time filtering problem in scenarios where the observation noise is low or degenerate. We focus on the case where the observation equation is a linear function of the state and the data involve additive noise. However, we place minimal assumptions on the hidden state process. For such a class of models we derive new particle filters (PFs) with the key property that their performance is robust to the size of the observation noise. As a consequence, the developed PFs are well-defined in the limiting case of degenerate observation noise. Indicatively, we prove (under assumptions) that the PF applied in this low noise setting inherits the properties of the PF used in the degenerate case. We extend our framework to the case where the hidden states are drawn from a diffusion process. In this scenario we develop new PFs which are robust to both low noise and fine levels of time discretization. We illustrate our algorithms numerically on several examples.

2511.11161 2026-04-01 stat.ML cs.LG math.ST stat.TH

Drift Estimation for Diffusion Processes Using Neural Networks Based on Discretely Observed Independent Paths

Yuzhen Zhao, Yating Liu, Marc Hoffmann

Comments Accepted for an oral presentation at the 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26)

Journal ref
Proceedings of the AAAI Conference on Artificial Intelligence, 2026, 40(34), 28778-28785
Abstract

This paper addresses the nonparametric estimation of the drift function over a compact domain for a time-homogeneous diffusion process, based on high-frequency discrete observations from $N$ independent trajectories. We propose a neural network-based estimator and derive a non-asymptotic convergence rate, decomposed into a training error, an approximation error, and a diffusion-related term scaling as ${\log N}/{N}$. For compositional drift functions, we establish an explicit rate. In the numerical experiments, we consider a drift function with local fluctuations generated by a double-layer compositional structure featuring local oscillations, and show that the empirical convergence rate becomes independent of the input dimension $d$. Compared to the $B$-spline method, the neural network estimator achieves better convergence rates and more effectively captures local features, particularly in higher-dimensional settings.

2510.14582 2026-04-01 stat.ML cs.AI cs.LG

Local Causal Discovery for Statistically Efficient Causal Inference

Mátyás Schubert, Tom Claassen, Sara Magliacane

Comments Accepted at AISTATS 2026

Abstract

Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests if the causal effect is identifiable by using only local information. If it is identifiable, it finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection. Otherwise, it returns the locally valid parent adjustment sets. In our experiments on synthetic and realistic data LOAD outperforms global methods in scalability, while providing more accurate effect estimation than local methods.

2509.13130 2026-04-01 math.ST stat.ME stat.ML stat.TH

Fuzzy Prediction Sets: Conformal Prediction with E-values

Nick W. Koning, Sam van Meer

Comments Shortened and more polished version

Abstract

Prediction sets offer a binary inclusion/exclusion for each element at the same fixed confidence level. We generalize to fuzzy prediction sets, which exclude elements at their own data-driven confidence level. Our key insight is that a fuzzy prediction set \emph{is} an e-value, capturing precisely what e-values bring to predictive inference. Fuzzy prediction sets inherit the merging properties of their e-value and offer richer guarantees to decision-makers. We also show in what sense optimal e-values give rise to optimal (fuzzy) prediction sets. We apply our results to conformal prediction, deriving optimal fuzzy conformal prediction sets, and characterizing in what sense classical conformal prediction is optimal.

2509.03378 2026-04-01 stat.ML cs.LG

Understanding and Improving Shampoo and SOAP via Kullback-Leibler Minimization

Wu Lin, Scott C. Lowe, Felix Dangel, Runa Eschenhagen, Zikun Xu, Roger B. Grosse

Comments an extended version of the ICLR 2026 paper (added a sentence about viewing KL-Shampoo from a gradient orthogonalization viewpoint)

Abstract

Shampoo and its efficient variant, SOAP, employ structured second-moment estimations and have shown strong performance for training neural networks (NNs). In practice, however, Shampoo typically requires step-size grafting with Adam to be competitive, and SOAP mitigates this by applying Adam in Shampoo's eigenbasis -- at the cost of additional memory overhead from Adam in both methods. Prior analyses have largely relied on the Frobenius norm to motivate these estimation schemes. We instead recast their estimation procedures as covariance estimation under Kullback-Leibler (KL) divergence minimization, revealing a previously overlooked theoretical limitation and motivating principled redesigns. Building on this perspective, we develop $\textbf{KL-Shampoo}$ and $\textbf{KL-SOAP}$, practical schemes that match or exceed the performance of Shampoo and SOAP in NN pre-training while achieving SOAP-level per-iteration runtime. Notably, KL-Shampoo does not rely on Adam to attain competitive performance, eliminating the memory overhead introduced by Adam. Across our experiments, KL-Shampoo consistently outperforms SOAP, Shampoo, and even KL-SOAP, establishing the KL-based approach as a promising foundation for designing structured methods in NN optimization. An implementation of KL-Shampoo/KL-SOAP is available at https://github.com/yorkerlin/KL-Methods

2505.23496 2026-04-01 cs.LG stat.ML

Epistemic Errors of Imperfect Multitask Learners When Distributions Shift

Sabina J. Sloman, Michele Caprio, Samuel Kaski

Abstract

Uncertainty-aware machine learners, such as Bayesian neural networks, output a quantification of uncertainty instead of a point prediction. We provide uncertainty-aware learners with a principled framework to characterize, and identify ways to eliminate, errors that arise from reducible (epistemic) uncertainty. We introduce a principled definition of epistemic error, and provide a decompositional epistemic error bound which operates in the very general setting of imperfect multitask learning under distribution shift. In this setting, the training (source) data may arise from multiple tasks, the test (target) data may differ systematically from the source data tasks, and/or the learner may not arrive at an accurate characterization of the source data. Our bound separately attributes epistemic errors to each of multiple aspects of the learning procedure and environment. As corollaries of the general result, we provide epistemic error bounds specialized to the settings of Bayesian transfer learning and distribution shift within $ε$-neighborhoods.

2505.19635 2026-04-01 cs.LG math.ST stat.ML stat.TH

When fractional quasi p-norms concentrate

Ivan Y. Tyukin, Bogdan Grechuk, Evgeny M. Mirkes, Alexander N. Gorban

Abstract

Concentration of distances in high dimension is an important factor for the development and design of stable and reliable data analysis algorithms. In this paper, we address the fundamental long-standing question about the concentration of distances in high dimension for fractional quasi $p$-norms, $p\in(0,1)$. The topic has been at the centre of various theoretical and empirical controversies. Here we, for the first time, identify conditions when fractional quasi $p$-norms concentrate and when they don't. We show that contrary to some earlier suggestions, for broad classes of distributions, fractional quasi $p$-norms admit exponential and uniform in $p$ concentration bounds. For these distributions, the results effectively rule out previously proposed approaches to alleviate concentration by "optimally" setting the values of $p$ in $(0,1)$. At the same time, we specify conditions and the corresponding families of distributions for which one can still control concentration rates by appropriate choices of $p$. We also show that in an arbitrarily small vicinity of a distribution from a large class of distributions for which uniform concentration occurs, there are uncountably many other distributions featuring anti-concentration properties. Importantly, this behavior enables devising relevant data encoding or representation schemes favouring or discouraging distance concentration. The results shed new light on this long-standing problem and resolve the tension around the topic in both theory and empirical evidence reported in the literature.

2504.20586 2026-04-01 physics.comp-ph cs.CE stat.AP

Faster Random Walk-based Capacitance Extraction with Generalized Antithetic Sampling

Periklis Liaskovitis, Marios Visvardis, Efthymios Efstathiou

Abstract

Floating random walk-based capacitance extraction has emerged in recent years as a tried and true approach for extracting parasitic capacitance in very large scale integrated circuits. Being a Monte Carlo method, its performance is dependent on the variance of sampled quantities and variance reduction methods are crucial for the challenges posed by ever denser process technologies and layout-dependent effects. In this work, we present a novel, universal variance reduction method for floating random walk-based capacitance extraction, which is conceptually simple, highly efficient and provably reduces variance in all extractions, especially when layout-dependent effects are present. It is complementary to existing mathematical formulations for variance reduction and its performance gains are experienced on top of theirs. Numerical experiments demonstrate substantial gains of up to 50% in the number of walks required, as well as in actual extraction times, compared to the best previously proposed variance reduction approaches for the floating random walk.

2504.14173 2026-04-01 stat.ME

A generalized tetrad constraint for testing conditional independence given a latent variable

Naiwen Ying, Ping Zhang, Shanshan Luo, Wang Miao

Abstract

The tetrad constraint is widely used to test whether four observed variables are conditionally independent given a latent variable, based on the fact that if four observed variables following a linear model are mutually independent after conditioning on an unobserved variable, then products of covariances of any two different pairs of these four variables are equal. It is an important tool for discovering a latent common cause or distinguishing between alternative linear causal structures. However, the classical tetrad constraint fails in nonlinear models because the covariance of observed variables cannot capture nonlinear association. In this paper, we propose a generalized tetrad constraint, which establishes a testable implication for conditional independence given a latent variable in nonlinear and nonparametric models. In linear models, this constraint implies the classical tetrad constraint; in nonlinear models, it remains a necessary condition for conditional independence but the classical tetrad constraint no longer is. Based on this constraint, we further propose a formal test, which can control type I error and has power approaching unity under certain conditions. We illustrate the proposed approach via simulations and two real data applications on mental ability tests and on moral attitudes towards dishonesty.
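The classical constraint described at the start of this abstract can be derived in a few lines. The sketch below assumes a linear one-factor model with latent variable $U$, loadings $\lambda_i$, and noise terms $\varepsilon_i$ that are uncorrelated with each other and with $U$; this notation is introduced here for illustration and is not taken from the paper.

```latex
\begin{align*}
X_i &= \lambda_i U + \varepsilon_i, \qquad i = 1,\dots,4, \\
\sigma_{ij} := \operatorname{Cov}(X_i, X_j) &= \lambda_i \lambda_j \operatorname{Var}(U), \qquad i \neq j, \\
\sigma_{12}\,\sigma_{34} = \sigma_{13}\,\sigma_{24} = \sigma_{14}\,\sigma_{23}
  &= \lambda_1 \lambda_2 \lambda_3 \lambda_4 \operatorname{Var}(U)^2 .
\end{align*}
```

Each product pairs the four loadings exactly once, which is why all three agree. The argument relies on covariances factorizing through $U$, a property of the linear model that need not hold when the dependence on the latent variable is nonlinear.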

2503.21473 2026-04-01 stat.ML cs.LG

DeepRV: Accelerating Spatiotemporal Inference with Pre-trained Neural Priors

Jhonathan Navott, Daniel Jenson, Seth Flaxman, Elizaveta Semenova

Comments Code to reproduce all experiments is available in the dl4bi codebase: https://github.com/MLGlobalHealth/dl4bi

Abstract

Gaussian Processes (GPs) provide a flexible and statistically principled foundation for modelling spatiotemporal phenomena, but their $O(N^3)$ scaling makes them intractable for large datasets. Approximate methods such as variational inference (VI), inducing-point (sparse) GPs, low-rank kernel approximations (e.g., Nystrom methods and random Fourier features), and approximations such as INLA improve scalability but typically trade off accuracy, calibration, or modelling flexibility. We introduce DeepRV, a neural-network surrogate that replaces GP prior sampling, while closely matching full GP accuracy at inference including hyperparameter estimates, and reducing computational complexity to $O(N^2)$, increasing scalability and inference speed. DeepRV serves as a drop-in replacement for GP prior realisations in e.g. MCMC-based probabilistic programming pipelines, preserving full model flexibility. Across simulated benchmarks, non-separable spatiotemporal GPs, and a real-world application to education deprivation in London (n = 4,994 locations), DeepRV achieves the highest fidelity to exact GPs while substantially accelerating inference. Code is provided in the dl4bi Python package, with all experiments run on a single consumer-grade GPU to ensure accessibility for practitioners.

2502.17275 2026-04-01 cond-mat.mtrl-sci stat.CO

Time-dependent global sensitivity analysis of the Doyle-Fuller-Newman model

Elia Zonta, Ivana Jovanovic Buha, Michele Spinola, Christoph Weißinger, Hans-Joachim Bungartz, Andreas Jossen

Journal ref
Journal of Energy Storage 158 (2026) 121751
Abstract

The Doyle-Fuller-Newman model is arguably the most ubiquitous electrochemical model in lithium-ion battery research. Since it is a highly nonlinear model, its input-output relations are still poorly understood. Researchers therefore often employ sensitivity analyses to elucidate relative parametric importance for certain use cases. However, some methods are ill-suited for the complexity of the model and appropriate methods often face the downside of only being applicable to scalar quantities of interest. We implement a novel framework for global sensitivity analysis of time-dependent model outputs and apply it to a drive cycle simulation. We conduct a full and a subgroup sensitivity analysis to resolve lowly sensitive parameters and explore the model error when unimportant parameters are set to arbitrary values. Our findings suggest that the method identifies insensitive parameters whose variations cause only small deviations in the voltage response of the model. By providing the methodology, we hope research questions related to parametric sensitivity for time-dependent quantities of interest, such as voltage responses, can be addressed more easily and adequately in simulative battery research and beyond.

2406.14399 2026-04-01 cs.LG cs.CV physics.ao-ph stat.ML

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

Tao Han, Zhibin Wen, Zhenghao Chen, Dazhao Du, Song Guo, Lei Bai

Comments 34 pages, 20 figures

Abstract

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

2404.13497 2026-04-01 cs.GR stat.CO

Histropy: A Computer Program for Quantifications of Histograms of 2D Gray-scale Images

Sagarika Menon, Peter Moeck

Abstract

The computer program "Histropy" is an interactive Python program for the quantification of selected features of two-dimensional (2D) images/patterns (in either JPG/JPEG, PNG, GIF, BMP, or baseline TIF/TIFF formats) using calculations based on the pixel intensities in this data, their histograms, and user-selected sections of those histograms. The histograms of these images display pixel-intensity values along the x-axis (of a 2D Cartesian plot), with the frequency of each intensity value within the image represented along the y-axis. The images need to be of 8-bit or 16-bit information depth and can be of arbitrary size. Histropy generates an image's histogram surrounded by a graphical user interface that allows one to select any range of image-pixel intensity levels, i.e. sections along the histograms' x-axis, using either the computer mouse or numerical text entries. The program subsequently calculates the (so-called Monkey Model) Shannon entropy and root-mean-square contrast for the selected section and displays them as part of what we call a "histogram-workspace-plot." To support the visual identification of small peaks in the histograms, the user can switch between a linear and log-base-10 display scale for the y-axis of the histograms. Pixel intensity data from different images can be overlaid onto the same histogram-workspace-plot for visual comparisons. The visual outputs of the program can be saved as histogram-workspace-plots in the PNG format for future usage. The source code of the program and a brief user manual are published in the supporting materials as well as on GitHub. Instead of taking only 2D images as inputs, the program's functionality could be extended by a few lines of code to other potential uses employing data tables with one or two dimensions in the CSV format.
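
The two quantities named in the abstract can be sketched for a selected intensity range as below. This is not Histropy's source code. It assumes that the Shannon entropy is the first-order entropy of the histogram in bits and that the root-mean-square contrast is the unnormalized standard deviation of the selected intensities; the program may normalize differently. The function name is hypothetical.

```python
import numpy as np

def histogram_entropy_and_rms_contrast(pixels, lo, hi, levels=256):
    """Entropy (bits) and RMS contrast of the pixels whose intensity lies in [lo, hi]."""
    pixels = np.asarray(pixels).ravel()
    selected = pixels[(pixels >= lo) & (pixels <= hi)]
    if selected.size == 0:
        raise ValueError("no pixels in the selected intensity range")
    # Histogram of the selected section only.
    counts = np.bincount(selected.astype(np.int64), minlength=levels)
    probs = counts[counts > 0] / selected.size
    entropy_bits = float(-np.sum(probs * np.log2(probs)))
    # Assumption: RMS contrast taken as the standard deviation of intensities.
    rms_contrast = float(np.std(selected.astype(float)))
    return entropy_bits, rms_contrast

# Toy 8-bit image: half the pixels are black, half are white.
image = np.array([[0, 0, 255, 255], [0, 0, 255, 255]], dtype=np.uint8)
entropy_bits, rms_contrast = histogram_entropy_and_rms_contrast(image, 0, 255)
```

For 16-bit images the same sketch applies with `levels=65536`.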

2603.29539 2026-04-01 stat.ME stat.AP

Tree models for covariate-dependent method agreement with repeated measurements in clinical research

Siranush Karapetyan, Achim Zeileis, Moritz Flick, Bernd Saugel, Alexander Hapfelmeier

Comments 22 pages, 10 figures

Abstract

Background: In clinical research, the Bland-Altman analysis is commonly used to assess agreement of metric measurements made by two or more techniques, devices or methods. The approach can also deal with repeated measurements per subject or observational unit. However, a strong and implicit assumption is that agreement of methods is homogeneous across subjects. Objective: To extend the previously introduced multivariable modeling of conditional method agreement with single measurements per subject to the frequent case of repeated measurements. Methods: Appropriate regression trees, called conditional method agreement trees (COAT), are generalized to capture the dependence of the parameters of the Bland-Altman analysis on covariates. These parameters, the expectation and variance of the differences between the methods, are decomposed into subject-specific components to account for repeated measurements. Whilst the theoretical, asymptotic properties of tree models are known, a simulation study was carried out to assess the performance of COAT in finite samples. A comparison of devices measuring cardiac output serves as an application example. Results: COAT is applicable to the two relevant cases of paired and unpaired repeated measurements. In the simulation study, it controlled the type-I error at the nominal level and could detect covariate-dependent method agreement with increasing sample size. The Adjusted Rand Index, a measure of concordance between the estimated and true subgroups, reached very high values close to the maximum of 1. The analysis of cardiac output showed that patients' characteristics may influence the agreement between measuring devices, with implications for use in patient care. Conclusion: COAT can explicitly define subgroups of heterogeneous method agreement in dependence of covariates with appropriate statistical testing in case of repeated measurements.
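
The Bland-Altman parameters mentioned above (the expectation and variance of the between-method differences) lead to the familiar bias and limits of agreement. The sketch below covers only the basic case with one measurement per subject and no covariates; it does not implement COAT or the repeated-measurements decomposition. The data values and the function name are made up for illustration.

```python
import numpy as np

def bland_altman(method_a, method_b):
    """Return bias and 95% limits of agreement for paired measurements."""
    diffs = np.asarray(method_a, dtype=float) - np.asarray(method_b, dtype=float)
    bias = diffs.mean()
    sd = diffs.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired readings from two devices (arbitrary units).
device_a = [5.1, 4.8, 6.0, 5.5, 4.9]
device_b = [5.0, 5.0, 5.7, 5.6, 4.6]
bias, lower, upper = bland_altman(device_a, device_b)
```

The homogeneity assumption criticised in the abstract corresponds to using a single `bias` and `sd` for all subjects.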

2603.29485 2026-04-01 math.ST stat.TH

Inference in covariate-adjusted bipartite network models

Wu Zuhui, Wang Qiuping, Yan Ting

Comments 23 pages, 3 figures, 2 tables

Abstract

In this paper, we introduce a general model for jointly modelling nodal heterogeneity and covariates in weighted or unweighted bipartite networks, which contain two different types of nodes. The model has a degree heterogeneity parameter for each node and a fixed-dimensional regression coefficient for the covariates. We use the method of moments to estimate the unknown parameters. When the model belongs to the exponential family of distributions, the moment estimator is identical to the maximum likelihood estimator. We show the uniform consistency of the moment estimator when the number of actors and the number of events both go to infinity under some conditions. Further, we derive an asymptotic representation of the moment estimator, which leads to its asymptotic normal distribution under some conditions. We present two applications to illustrate the unified results. Numerical simulations and a real-data analysis demonstrate our theoretical findings.

2603.29440 2026-04-01 stat.ME

A Robbins-Monro algorithm for non-parametric estimation of NAR process with Markov-Switching: asymptotic normality

Lisandro Fermin, Ricardo Rios, Luis-Ángel Rodríguez

Abstract

This paper is the second part of our study on the non-parametric estimation of MS-NAR processes, which started with [L. Fermin et al. 2017]. We consider the Nadaraya-Watson type regression function estimator for non-linear autoregressive Markov switching processes. In this context, the regression function estimator is interpreted as a solution of a local weighted least-squares problem. In the first work, we introduced a restoration-estimation Robbins-Monro algorithm to approximate the estimator, and we proved the identifiability of the model and the consistency of the non-parametric estimator. In this work, we obtain the central limit theorem for the non-parametric estimator, whether the Markov chain is observed or not. Finally, we present a detailed simulation study illustrating the performance of our estimation procedure.
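
To make the basic building block concrete, here is a sketch of a plain Nadaraya-Watson estimator applied to a non-linear autoregression without regime switching. It does not implement the restoration-estimation Robbins-Monro algorithm of the paper. The Gaussian kernel, the bandwidth, the simulated model, and the function name are assumptions for illustration.

```python
import numpy as np

def nadaraya_watson(x_query, x_obs, y_obs, bandwidth):
    """Kernel-weighted local average of y_obs at each query point."""
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    u = (x_query[:, None] - x_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)  # Gaussian kernel weights
    return (weights @ y_obs) / weights.sum(axis=1)

# Hypothetical single-regime NAR(1) process: y_t = 0.8 * tanh(y_{t-1}) + noise.
rng = np.random.default_rng(1)
series = np.zeros(500)
for t in range(1, len(series)):
    series[t] = 0.8 * np.tanh(series[t - 1]) + 0.3 * rng.standard_normal()

# Regress y_t on y_{t-1} to estimate the autoregression function on a grid.
grid = np.linspace(-1.0, 1.0, 5)
fitted = nadaraya_watson(grid, series[:-1], series[1:], bandwidth=0.2)
```

The estimate at each grid point is the minimizer of a kernel-weighted sum of squared residuals over constants, which is the local weighted least-squares interpretation.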

2603.29436 2026-04-01 stat.ME stat.CO

Efficient Amortized Bayesian Inference for Markov Random Fields via Gradient-Informed Grid Selection

Laura Bazahica, Alejandra Avalos-Pacheco, Matthew Moores, Lassi Roininen

Comments 15 pages

Abstract

Bayesian inference for models with intractable likelihoods, such as Markov random fields, poses a fundamental computational challenge due to the tradeoff between inferential accuracy and computational cost. Various MCMC methods have been developed to address this challenge. The exchange algorithm targets the exact posterior, but requires an expensive perfect sampling step at each iteration, which is often infeasible in practice. In contrast, path sampling approximates the Metropolis acceptance ratio using a precomputed grid of likelihood values, but may introduce bias when the grid is poorly chosen. We introduce a novel amortized MCMC framework that retains the theoretical validity of exact methods while substantially reducing the computational burden. The proposed approach employs a gradient-informed grid selection procedure and constructs a surrogate likelihood via Hermite interpolation, yielding a smooth approximation with low error. A simulation study characterizes the rate at which inferential accuracy improves as the number of grid points increases. We further demonstrate the practical performance of the method through applications to a hidden Potts model for satellite imagery and an autologistic model for Arctic ice floes.

2603.29354 2026-04-01 eess.SP cs.SY eess.SY stat.ME

ARC: Alignment-based RPM Estimation with Curvature-adaptive Tracking

Weiheng Hua, Changyu Hao

Abstract

Tacho-less rotational speed estimation is critical for vibration-based prognostics and health management (PHM) of rotating machinery, yet traditional methods--such as time-domain periodicity, cepstrum, and harmonic comb matching--struggle under noise, non-stationarity, and inharmonic interference. Probabilistic tracking offers a principled way to fuse multiple estimators, but a major challenge is that heterogeneous estimators produce evidence on incompatible axes and scales. We address this with ARC (Alignment-based RPM Estimation with Curvature-adaptive Tracking) by unifying the observation representation. Each estimator outputs a one-dimensional evidence curve on its native axis, which is mapped onto a shared RPM grid and converted into a comparable grid-based log-likelihood via robust standardization and a Gibbs-form energy shaping. Standard recursive filtering with fixed-variance motion priors can fail under multi-modal or ambiguous evidence. To overcome this, ARC introduces a curvature-informed, state-dependent motion prior, where the transition variance is derived from the local discrete Hessian of the previous log-posterior. This design enforces smooth tracking around confident modes while preserving competing hypotheses, such as octave alternatives. Experiments on synthetic stress tests and real vibration-table data demonstrate stable, physically plausible trajectories with interpretable uncertainty, and ablations confirm that these gains arise from uncertainty-aware temporal propagation rather than per-frame peak selection or ad hoc rules.
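
The idea of deriving a transition variance from the local discrete Hessian of a log-posterior can be sketched as below. The abstract does not state how ARC maps curvature to variance, so the inverse-curvature rule used here (in the spirit of a Laplace approximation), the fallback for non-concave regions, and the function name are all assumptions.

```python
import numpy as np

def curvature_variance(log_post, grid_step, floor=1e-6, fallback=np.inf):
    """Variance implied by the second difference of a gridded log-posterior.

    Returns one value per interior grid point; points where the curve is
    not concave receive the fallback value.
    """
    log_post = np.asarray(log_post, dtype=float)
    # Central second difference approximates the second derivative.
    hess = (log_post[2:] - 2.0 * log_post[1:-1] + log_post[:-2]) / grid_step**2
    var = np.full(hess.shape, fallback)
    concave = hess < -floor
    var[concave] = -1.0 / hess[concave]
    return var

# Hypothetical Gaussian-shaped log-posterior on an RPM grid (std of 5 RPM).
rpm_grid = np.linspace(900.0, 1100.0, 201)
log_post = -0.5 * ((rpm_grid - 1000.0) / 5.0) ** 2
variances = curvature_variance(log_post, grid_step=rpm_grid[1] - rpm_grid[0])
```

For a quadratic log-posterior the second difference is exact, so the sketch recovers the variance of 25; sharper modes give smaller variances and flatter regions give larger ones.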

2603.29333 2026-04-01 stat.ME

Semiparametric analysis for paired comparisons with covariates

Haoyue Song, Lianqiang Qu, Ting Yan, Yuguo Chen

Comments 21 pages

Abstract

Statistical inference in parametric models (e.g., the Bradley--Terry model and its variants) for paired-comparison data has been explored in the high-dimensional regime, in which the number of items involved in paired comparisons diverges. However, parametric models are highly susceptible to model misspecification. To relax the assumption of known distributions and provide flexibility, we propose a semiparametric framework for modeling the merits of items and covariate effects (e.g., home-field advantage) by introducing latent random variables with unspecified distributions. As the number of parameters increases with the number of items, semiparametric inference is highly nontrivial. To address this issue, we employ a kernel-based least squares approach to estimate all unknown parameters. When each pair of items has a fixed number of comparisons and the number of items tends to infinity, we prove the consistency of all resulting estimators and derive their asymptotic normal distributions. To the best of our knowledge, this is the first study to conduct a semiparametric analysis of paired comparisons with an increasing dimension. We conduct simulations to evaluate the finite-sample performance of the proposed method and illustrate its practical utility by analyzing an NBA dataset.
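
For reference, the parametric baseline mentioned above can be written out as follows. The symbols $\beta_i$ (merit of item $i$), $z_{ij}$ (pair-level covariates such as a home-field indicator) and $\gamma$ (covariate effects) are notation assumed here for illustration and may differ from the paper's. The second display is one common covariate-adjusted variant, not necessarily the one the authors generalize.

```latex
\[
\Pr(i \text{ beats } j) \;=\; \frac{\exp(\beta_i)}{\exp(\beta_i) + \exp(\beta_j)},
\qquad
\operatorname{logit}\,\Pr(i \text{ beats } j \mid z_{ij}) \;=\; \beta_i - \beta_j + z_{ij}^{\top}\gamma .
\]
```

Both displays fix the logistic link, which is the kind of distributional assumption a semiparametric approach would leave unspecified.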

2603.29215 2026-04-01 stat.ME

Semi-supervised Classification for Functional Data with Application to Astronomical Spectra Analysis

Ruoxu Tan, Mingjie Jian, Yiming Zang

Abstract

Despite its extensive development for multivariate data, semi-supervised learning remains underdeveloped for functional data. To address this challenge, we extend the Fermat distance, a density-sensitive metric aligning with the semi-supervised setting, to the functional domain. Leveraging the Fermat distance, we propose novel semi-supervised classifiers, including the weighted $k$-nearest neighbors (NN) classifier and multidimensional scaling (MDS)-induced classifiers. To accommodate massive datasets commonly seen in semi-supervised applications, we design a computationally efficient estimation procedure tailored for discrete and noisy functional observations. Theoretically, we establish exponentially decaying convergence rates of the $k$-NN classifier and the consistency of the estimated Fermat distance. Crucially, our results reveal a phenomenon unique to error-contaminated functional data: Incorporating unlabeled data leads to improved classification accuracy only when the individual sampling rate grows sufficiently fast. Applying our framework to simulated data and a large-scale dataset of Gaia astronomical spectra, we demonstrate that our proposed semi-supervised classifiers uniformly outperform existing supervised benchmarks.

2603.29169 2026-04-01 stat.ME stat.CO

BLOC: A Global Optimization Framework for Sparse Covariance Estimation with Non-Convex Penalties

Priyam Das, Trambak Banerjee, Prajamitra Bhuyan

Abstract

We introduce BLOC (Black-box Optimization over Correlation matrices), a general framework for sparse covariance estimation with non-convex penalties. BLOC operates on the manifold of correlation matrices and reparameterizes it via an angular Cholesky mapping, transforming the positive-definite, unit-diagonal constraint into an unconstrained search over a Euclidean hyperrectangle. This enables gradient-free global optimization of diverse objectives, including non-differentiable or black-box losses, using a pattern search routine with adaptive coordinate polling, run-wise restarts to escape local minima, and leveraging up to $d(d-1)$ parallel threads when optimizing a $d$-dimensional correlation matrix. The method is penalty-agnostic and ensures that every iterate is a valid correlation matrix, from which covariance estimates are obtained. We establish convergence guarantees, including stationarity, probabilistic escape from poor local minima, and sublinear rates under smooth convex losses. From a statistical perspective, we prove consistency, convergence rates, and sparsistency for penalized correlation estimators under general conditions, extending sparse covariance theory beyond the Gaussian setting. Empirically, BLOC with nonconvex penalties such as SCAD and MCP outperforms leading estimators in both low- and high-dimensional regimes, achieving lower estimation error and improved sparsity recovery. A parallel implementation enhances scalability, and a proteomic network application demonstrates robust, positive-definite sparse covariance estimation.
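
One standard way to realise an angular Cholesky mapping is the hyperspherical parameterization sketched below, which turns $d(d-1)/2$ angles in $(0, \pi)$ into a valid correlation matrix. Whether BLOC uses exactly this ordering of angles is not stated in the abstract, so the construction and the function name should be read as assumptions.

```python
import numpy as np

def angles_to_correlation(angles, d):
    """Map d*(d-1)/2 angles in (0, pi) to a d x d correlation matrix."""
    angles = np.asarray(angles, dtype=float)
    if angles.size != d * (d - 1) // 2:
        raise ValueError("expected d*(d-1)/2 angles")
    chol = np.zeros((d, d))
    chol[0, 0] = 1.0
    idx = 0
    for i in range(1, d):
        remaining = 1.0  # running product of sines along row i
        for j in range(i):
            theta = angles[idx]
            idx += 1
            chol[i, j] = remaining * np.cos(theta)
            remaining *= np.sin(theta)
        chol[i, i] = remaining  # positive because every angle lies in (0, pi)
    return chol @ chol.T

rng = np.random.default_rng(0)
d = 4
angles = rng.uniform(0.1, np.pi - 0.1, size=d * (d - 1) // 2)
corr = angles_to_correlation(angles, d)
```

Every row of the Cholesky factor has unit Euclidean norm, so the diagonal of the product is one, and the strictly positive diagonal of the factor makes the result positive definite. Any point of the hyperrectangle therefore yields a valid iterate.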

2603.29168 2026-04-01 stat.ME

Linear models for causal inference under network interference

Eric Tong, Salvador V. Balkus

Comments 12 pages, 4 figures

Abstract

In causal inference, interference occurs when the treatment of one unit may affect the outcomes of other units. The goal of this work is to serve as a guide to the use of linear outcome modeling for estimating causal effects in settings where interference may pose a challenge to identification and estimation, such as spatial and network data. We demonstrate that, under a linear model, causal effects of binary and continuous treatments can be identified in terms of regression coefficients under totally and partially known interference structures. Our work constructs unbiased and consistent point and variance estimators for these effects under one or more possible fixed or random interference networks. A chief advantage is that this approach can be implemented using standard linear regression software, and is easily augmented with random effects and heteroscedastic or autocorrelation consistent standard errors. Numerical experiments and an example data analysis demonstrate the efficacy of this approach in eliminating interference bias.

2603.29134 2026-04-01 stat.ME stat.AP

When Bayes goes bad: Weakly-regularized covariate adjustment leads to a biased estimate of prevalence

Swen Kuh, Lauren Kennedy, Qixuan Chen, Andrew Gelman

Abstract

When estimating population prevalence from a non-random sample, it is important to adjust for differences between sample and population. However, adjustment for multiple factors requires analysis that can be difficult to understand and validate. In this manuscript, we explore an unexpected downward trend of estimates when covariates are added sequentially to a Bayesian hierarchical model for the estimation of the prevalence of SARS-CoV-2 specific antibodies in an Australian city in late 2020. We compare our data analysis to results from a simulation study to understand four potential contributors to this effect: (i) correction for differences between sample and population, (ii) rare-events bias in logistic regression, (iii) inclusion of the uncertainty of test sensitivity and specificity in a multilevel model, and (iv) increasing model dimensionality. We find that weak prior distributions on the logistic regression coefficients lead to a systematic increase in the amount of partial pooling across adjustment cells (the prior becomes stronger as model dimensionality increases), which in turn feeds through to the estimated assay specificity, which then feeds back to the model and results in lowering the estimated prevalence. Our paper contributes three elements: (i) immediate and longer-term recommendations for using these types of models, (ii) simulation studies to explore the impact of the contributors to this effect, and (iii) a worked example of investigation of unexpected results in a model with multiple adjustment factors.
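
To see why assay specificity matters so much at low prevalence, the classical Rogan-Gladen correction can serve as a simple reference point. This is a plug-in point estimator, not the Bayesian hierarchical model studied in the paper, and the numbers below are invented for illustration.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an observed positive rate for imperfect test accuracy."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be better than chance")
    estimate = (apparent_prevalence + specificity - 1.0) / denom
    # Clip to the valid range for a proportion.
    return min(1.0, max(0.0, estimate))

# Hypothetical survey: 2% of samples test positive, sensitivity 90%.
high_spec = rogan_gladen(0.02, sensitivity=0.90, specificity=0.990)
low_spec = rogan_gladen(0.02, sensitivity=0.90, specificity=0.985)
```

In this toy setting, changing the assumed specificity by half a percentage point roughly halves the corrected prevalence. This illustrates how a shift in estimated specificity can move the prevalence estimate.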

2603.29120 2026-04-01 math.ST stat.TH

Computable error bounds for high-dimensional Edgeworth expansions in sphericity testing under two-step monotone incomplete data

Tetsuya Sato, Tomoyuki Nakagawa

Comments 37 pages

Abstract

In this paper, we consider the sphericity test for a one-sample problem under high-dimensional two-step monotone incomplete data. Existing asymptotic expansions for the null distributions of the likelihood ratio test (LRT) statistic and modified LRT statistic are inaccurate in high-dimensional settings. Therefore, we derive Edgeworth expansions for the null distribution of the LRT statistic in such settings and obtain computable error bounds. Furthermore, we demonstrate that our proposed Edgeworth expansions provide better approximation accuracy than the existing asymptotic expansions. We also conduct numerical experiments using Monte Carlo simulations to evaluate the maximum absolute error (MAE) between the distribution function of the standardized test statistic and Edgeworth expansions for the null distribution of the LRT statistic, as well as to assess the performance of the computable error bounds.

2603.29110 2026-04-01 stat.ME

Optimal Data Integration and Adaptive Sampling for Efficient Treatment Effect Estimation

Yen-Chun Liu, Alexander Volfovsky, German Schnaidt, Cristobal Garib, Eric Laber

Abstract

This study addresses the challenge of estimating average treatment effects (ATEs) for advertising campaigns in online marketplaces where complete randomized experimentation is infeasible. We propose two key innovations: (1) a shrinkage estimator that optimally combines observational and experimental data without assuming smooth treatment effects across campaigns, and (2) a Bayesian adaptive experimental design framework that efficiently selects campaigns for randomized evaluation so as to minimize cumulative risk. Our shrinkage estimator achieves lower risk than existing methods by balancing bias-variance tradeoffs, while our adaptive design significantly reduces the costs of campaign randomization. We establish theoretical guarantees including asymptotic normality and regret bounds. In an application to Amazon Ads data analyzing 2,583 campaigns, our approach achieves equivalent estimation precision while requiring only half of the randomized experiments needed by random sampling, the standard method widely used in practice today. The proposed method serves as a practical solution for marketplace platforms to efficiently measure advertising effectiveness while managing experimentation costs.

2603.29096 2026-04-01 stat.ME

Efficient and Fast Sampling from Arbitrary Probability Kernels using Sliced Gibbs Sampler

Prithwish Ghosh, Sujit K Ghosh

Comments 27 pages, 24 figures

Abstract

An Automated Sliced Gibbs framework is proposed for fully automated Markov chain Monte Carlo sampling from arbitrary finite dimensional probability kernels. The method targets unnormalized, non-smooth, heavy tailed, and highly multimodal densities. A Cauchy transformation based effective support estimator is combined with slice driven Gibbs updates. This construction removes the need for user specified truncation bounds, proposal scales, step-size tuning, or conditional optimization. Unlike existing slice samplers, ASG does not require manually chosen bracket widths or geometric insight into the support. All calibration is performed automatically within each Gibbs cycle. The resulting Markov chain preserves invariance and ergodicity. Automated support detection allows efficient movement across disconnected high density regions. The sampler adapts to sharp curvature and irregular geometry without gradient information. Extensive numerical experiments evaluate performance on complex kernels, including univariate Beta mixtures, multivariate Rosenbrock and Ackley benchmarks, and non-smooth kernels derived from generalized LASSO type loss functions. Across these challenging settings, ASG consistently achieves higher effective sample size per second and faster decorrelation than Random Walk Metropolis Hastings, adaptive Gibbs variants, and some recently proposed slice based methods. The framework provides a scalable and general-purpose solution for sampling from complicated probability kernels where existing algorithms require substantial tuning or exhibit slow mixing.

2603.29058 2026-04-01 stat.ME math.ST stat.AP stat.TH

A Unified Framework for Nonlinear Mediation Analysis of Random Objects

Wenxi Tan, Bing Li, Lingzhou Xue

Comments 35 pages, 7 figures

Abstract

Mediation analysis for complex, non-Euclidean data, such as probability distributions, compositions, images, and networks, presents significant methodological challenges due to the inherent nonlinearity and geometric constraints of such spaces. Existing approaches are often restricted to Euclidean settings or specific data types. We propose Random Object Mediation Analysis (ROMA), a unified framework that simultaneously accommodates object-valued exposures, mediators, and outcomes, enabling the analysis of nonlinear causal pathways in general metric spaces. ROMA leverages an additive Reproducing Kernel Hilbert Space (RKHS) operator model to rigorously disentangle direct and indirect causal pathways, which is a significant advancement over existing single-predictor or purely predictive additive frameworks. Theoretically, we establish the nonparametric identification of causal effects and derive global asymptotic normality for our estimators. Crucially, this theoretical foundation enables the construction of simultaneous confidence bands and global test statistics without the need for computationally intensive resampling. We demonstrate the practical utility of ROMA through simulations and real-world applications involving compositional mediators and distributional outcomes, extending the scope of mediation analysis.

2603.29054 2026-04-01 stat.ME

Likelihood-Free Inference via Structured Score Matching

Haoyu Jiang, Yuexi Wang, Yun Yang

Abstract

In many statistical problems, the data distribution is specified through a generative process for which the likelihood function is analytically intractable, yet inference on the associated model parameters remains of primary interest. We develop a likelihood-free inference framework that combines score matching with gradient-based optimization and bootstrap procedures to facilitate parameter estimation together with uncertainty quantification. The proposed methodology introduces tailored score-matching estimators for approximating likelihood score functions, and incorporates an architectural regularization scheme that embeds the statistical structure of log-likelihood scores to improve both accuracy and scalability. We provide theoretical guarantees and demonstrate the practical utility of the method through numerical experiments, where it performs favorably compared to existing approaches.

2603.29003 2026-04-01 stat.ME cs.HC

Active Inference with People: a general approach to real-time adaptive experiments

Lucas Gautheron, Nori Jacoby, Peter Harrison

Abstract

Adaptive experiments automatically optimize their design throughout the data collection process, which can bring substantial benefits compared to conventional experimental settings. Potential applications include, among others: computerized adaptive testing (for selecting informative tasks in ability measurements), adaptive treatment assignment (when searching for experimental conditions that maximize certain outcomes), and active learning (for choosing optimal training data for machine learning algorithms). However, implementing these techniques in real time poses substantial computational and technical challenges. Additionally, despite their conceptual similarity, the above scenarios are often treated as separate problems with distinct solutions. In this paper, we introduce a practical and unified approach to real-time adaptive experiments that can encompass all of the above scenarios, regardless of the modality of the task (including textual, visual, and audio inputs). Our strategy combines active inference, a Bayesian framework inspired by cognitive neuroscience, with PsyNet, a platform for large-scale online behavioral experiments. While active inference provides a compact, flexible, and principled mathematical framework for adaptive experiments generally, PsyNet is a highly modular Python package that supports social and behavioral experiments with stimuli and responses in arbitrary domains. We illustrate this approach through two concrete examples: (1) an adaptive testing experiment estimating participants' ability by selecting optimal challenges, effectively reducing the number of trials required by 30--40\%; and (2) an adaptive treatment assignment strategy that identifies the optimal treatment up to three times as accurately as a fixed design in our example. We provide detailed instructions to facilitate the adoption of these techniques.

2603.28999 2026-04-01 math.OC cs.LG stat.ML

Transfer Learning in Bayesian Optimization for Aircraft Design

Ali Tfaily, Youssef Diouane, Nathalie Bartoli, Michael Kokkolaras

Abstract

The use of transfer learning within Bayesian optimization addresses the disadvantages of the so-called \textit{cold start} problem by using source data to aid in the optimization of a target problem. We present a method that leverages an ensemble of surrogate models using transfer learning and integrates it in a constrained Bayesian optimization framework. We identify challenges particular to aircraft design optimization related to heterogeneous design variables and constraints. We propose the use of a partial-least-squares dimension reduction algorithm to address design space heterogeneity, and a \textit{meta} data surrogate selection method to address constraint heterogeneity. Numerical benchmark problems and an aircraft conceptual design optimization problem are used to demonstrate the proposed methods. Results show significant improvement in convergence in early optimization iterations compared to standard Bayesian optimization, with improved prediction accuracy for both objective and constraint surrogate models.

2603.28989 2026-04-01 math.ST cs.IT math.IT stat.CO stat.ME stat.TH

Linear Regression from 1-bit Quantized Data

Daniel Hill, Martin Slawski

Abstract

Motivated by the prevalence of environments in which data is abundant while resources for storage and/or transmission might be scarce, we study linear regression when predictors, their squares, and responses are subject to single-bit dithered quantization. An estimator relying on plug-in estimation of the quadratic and linear terms in the quadratic program formulation of the least squares problem is proposed. We provide a non-asymptotic bound on the $\ell_2$-estimation error of this estimator and obtain its asymptotic distribution when the number of predictors is fixed, which can be used for inference and an investigation of the mean-square error efficiency relative to the ordinary least squares estimator. It is shown that for the quantization protocol under consideration, substantial improvements over the proposed estimator cannot be expected. A compression pipeline in which the underlying data is first subject to sketching and subsequently quantization can be studied within our framework as well. We also present an extension to address high-dimensional predictors. Numerical experiments with synthetic data complement our theoretical findings.
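
The reason single-bit dithered quantization still carries usable information can be seen in a small simulation: with a uniform dither on $[-\lambda, \lambda]$ and $|x| \le \lambda$, the rescaled bit $\lambda \cdot \operatorname{sign}(x + \tau)$ is an unbiased estimate of $x$. The sketch below illustrates this generic mechanism only; the paper's exact quantization protocol and estimator may differ, and the names and constants are assumptions.

```python
import numpy as np

def dithered_one_bit(x, scale, rng):
    """Quantize x to +/-1 after adding uniform dither on [-scale, scale]."""
    x = np.asarray(x, dtype=float)
    dither = rng.uniform(-scale, scale, size=x.shape)
    return np.where(x + dither >= 0.0, 1.0, -1.0)

rng = np.random.default_rng(0)
scale = 2.0       # dither range; must bound |x| for unbiasedness
true_value = 0.7  # hypothetical scalar to be recovered from bits alone

bits = dithered_one_bit(np.full(200_000, true_value), scale, rng)
estimate = scale * bits.mean()  # average of rescaled bits approximates true_value
```

The same averaging argument applies to products and squares of quantized quantities. This makes plug-in estimates of the linear and quadratic terms of the least squares problem plausible.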

2603.28987 2026-04-01 math.OC cs.CE stat.ML

Multi-fidelity approaches for general constrained Bayesian optimization with application to aircraft design

Oihan Cordelier, Youssef Diouane, Nathalie Bartoli, Eric Laurendeau

Details
English abstract

Aircraft design relies heavily on solving challenging and computationally expensive Multidisciplinary Design Optimization (MDO) problems. In this context, there has been growing interest in multi-fidelity models for Bayesian optimization to improve the MDO process by balancing computational cost and accuracy through the combination of high- and low-fidelity simulation models, enabling efficient exploration of the design process with minimal computational effort. In the existing literature, fidelity selection focuses only on the objective function to decide how to integrate multiple fidelity levels, balancing precision and computational cost using variance reduction criteria. In this work, we propose novel multi-fidelity selection strategies. Specifically, we demonstrate how incorporating information from both the objective and the constraints can further reduce computational costs without compromising the optimality of the solution. We validate the proposed multi-fidelity optimization strategy by applying it to four analytical test cases, showcasing its effectiveness. The proposed method is used to efficiently solve a challenging aircraft wing aero-structural design problem. The proposed setting uses a linear vortex lattice method and a finite element method for the aerodynamic and structural analysis, respectively. We show that employing our proposed multi-fidelity approach leads to $86\%$ to $200\%$ more constraint-compliant solutions given a limited budget compared to the state-of-the-art approach.

2603.28973 2026-04-01 quant-ph stat.CO

Bell's Inequality, Causal Bounds, and Quantum Bayesian Computation: A Unified Framework

Nick Polson, Vadim Sokolov, Daniel Zantedeschi

Details
English abstract

Bell inequalities characterize the boundary of the local-realist correlation polytope -- the set of joint probability distributions achievable by classical hidden-variable models. Quantum mechanics exceeds this boundary through non-commutativity, reaching the Tsirelson bound $2\sqrt{2}$ for CHSH. We show that this polytope structure is not specific to quantum foundations: it appears identically in the causal inference literature, where the instrumental inequality, the Balke--Pearl linear programming bounds, and the Tian--Pearl probabilities of causation all arise as facets of the same marginal compatibility polytope. Fine's theorem -- that CHSH inequalities hold if and only if a joint distribution exists -- is precisely the pivot: the instrumental variable model in causal inference is structurally equivalent to the Bell local hidden-variable model, with the instrument playing the role of the measurement setting and the latent confounder playing the role of the hidden variable $λ$. We develop this correspondence in detail, extending it to algorithmic (Kolmogorov complexity) and entropic formulations of Bell inequalities, the NPA semidefinite programming hierarchy, and the MIP$^*$=RE undecidability result. We further show that the Born-rule / Bayes-rule duality underlying quantum Bayesian computation exploits the same non-commutativity that enables Bell violation, providing polynomial speedups for posterior inference. The framework yields a concrete dictionary between quantum information theory, causal econometrics, and Bayesian computation, and suggests new directions including NPA-based quantum causal inference algorithms and quantum architectures for function approximation.
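The gap between the classical CHSH bound of 2 and the Tsirelson bound $2\sqrt{2}$ can be checked numerically. The sketch below is not taken from the paper: it assumes the standard singlet-state correlation $E(a,b) = -\cos(a-b)$ and one conventional choice of measurement angles, and it enumerates all deterministic local strategies to recover the classical bound. Function names are hypothetical.

```python
import math
from itertools import product

def singlet_correlation(a, b):
    """Quantum prediction E(a, b) = -cos(a - b) for a spin singlet (assumed model)."""
    return -math.cos(a - b)

def chsh(E, a0, a1, b0, b1):
    """CHSH combination S = E(a0,b0) + E(a0,b1) + E(a1,b0) - E(a1,b1)."""
    return E(a0, b0) + E(a0, b1) + E(a1, b0) - E(a1, b1)

def max_classical_chsh():
    """Largest |S| over all deterministic local assignments of +/-1 outcomes."""
    best = 0
    for A0, A1, B0, B1 in product((-1, 1), repeat=4):
        S = A0 * B0 + A0 * B1 + A1 * B0 - A1 * B1
        best = max(best, abs(S))
    return best

if __name__ == "__main__":
    quantum = abs(chsh(singlet_correlation, 0.0, math.pi / 2, math.pi / 4, -math.pi / 4))
    print(max_classical_chsh(), quantum, 2 * math.sqrt(2))
```

Because mixtures of deterministic strategies cannot exceed the best deterministic one, the enumeration gives the bound for all local hidden-variable models in this two-setting, two-outcome scenario.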

2603.28958 2026-04-01 stat.AP

The Problem of Dynamic Spatial Sampling and Geofence Surveillance

Marty Davidson, Jason Byers

Details
English abstract

Geofencing surveillance poses a dynamic spatial sampling problem. Law enforcement must establish geofence perimeters to identify a relevant suspect. This requires identifying a sampling region around a surveillance site and counting the number of intersecting individuals as proxied by geolocation tags. Law enforcement commonly constructs sampling regions with fixed distance intervals or fixed polygon boundaries. This generates privacy concerns as considerations for constructing these perimeters do not factor in the local density of human activity, such as pedestrian flows or traffic patterns. This increases the risk of selective expansion where agencies attempt to extend their data collection beyond what a warrant previously approved. This paper attempts to balance law enforcement's needs for surveillance with individual level privacy by proposing a set of optimal radius estimators. These plug-in estimators use the empirical distribution of human activity patterns to estimate an optimal radius. Given a surveillance site and set of point densities, the optimal radius generates surveillance perimeters that adapt to local conditions. We discuss the implications of applying this estimator to policing surveillance efforts and how law enforcement can use algorithms to better protect the privacy of its citizens.

2603.28956 2026-04-01 math.FA cs.LG math.MG math.PR math.ST stat.TH

Minimum Norm Interpolation via The Local Theory of Banach Spaces: The Role of $2$-Uniform Convexity

Gil Kur, Pierre Bizeul

Comments A preliminary version of this work, "Minimum Norm Interpolation Meets The Local Theory of Banach Spaces", appeared at the International Conference on Machine Learning 2024 (consider this info for citations)

Details
English abstract

The minimum-norm interpolator (MNI) framework has recently attracted considerable attention as a tool for understanding generalization in overparameterized models, such as neural networks. In this work, we study the MNI under a $2$-uniform convexity assumption, which is weaker than requiring the norm to be induced by an inner product, and it typically does not admit a closed-form solution. At a high level, we show that this condition yields an upper bound on the MNI bias in both linear and nonlinear models. We further show that this bound is sharp for overparameterized linear regression when the unit ball of the norm is in isotropic (or John's) position, and the covariates are isotropic, symmetric, i.i.d. sub-Gaussian, such as vectors with i.i.d. Bernoulli entries. Finally, under the same assumption on the covariates, we prove sharp generalization bounds for the $\ell_p$-MNI when $p \in \bigl(1 + C/\log d, 2\bigr]$. To the best of our knowledge, this is the first work to establish sharp bounds for non-Gaussian covariates in linear models when the norm is not induced by an inner product. This work is deeply inspired by classical works on $K$-convexity, and more modern work on the geometry of 2-uniform and isotropic convex bodies.

2603.28936 2026-04-01 math.PR cs.DS math.CO math.NT math.ST stat.TH

Composition of random functions and word reconstruction

Guillaume Chapuy, Guillem Perarnau

Details
English abstract

Given two functions $\mathbf{a}\!:\! [n] \rightarrow [n]$ and $\mathbf{b}\!:\! [n] \rightarrow [n]$ chosen uniformly at random, any word $w=w_1w_2\dots w_k\in \{a,b\}^k$ induces a random function $\mathbf{w}\!:\! [n] \rightarrow [n]$ by composition, i.e. $\mathbf{w}=ϕ_{w_k}\circ \dots \circ ϕ_{w_1}$ with $ϕ_a=\mathbf{a}$ and $ϕ_b=\mathbf{b}$. We study the following question: assuming $w$ is fixed but unknown, and $n$ goes to infinity, does one sample of $\mathbf{w}$ carry enough information to (partially) recover the word $w$ with good enough probability? We show that the length of $w$, and its exponent (largest $d$ such that $w={u}^d$ for some word ${u}$) can be recovered with high probability. We also prove that the random functions stemming from two different words are separated in total variation distance, provided that a certain ``auto-correlation'' word-dependent constant $c(w)$ is different for each of them. We give an explicit expression for $c(w)$ and conjecture that non-isomorphic words have different constants. We prove that this is the case assuming a major conjecture in transcendental number theory, Schanuel's conjecture.

2603.28930 2026-04-01 stat.CO econ.EM econ.GN q-bio.QM q-fin.EC

Retrospective Economic Evaluation of Group Testing in the COVID-19 Pandemic

Michael Balzer, Kainat Khowaja, Christiane Fuchs

Details
English abstract

Surveillance of diseases in a pandemic is an important part of public health policy. Diagnostic testing at the individual level is often infeasible due to resource constraints. To circumvent these constraints, group testing can be applied. The economic cost evaluation from the payer's perspective typically focuses only on deterministic costs, which overlooks the substantial economic impact of productivity losses resulting from quarantine and workplace disruptions. The objective of this article is to develop a mathematical model for a retrospective economic evaluation of group testing that incorporates both deterministic costs and income-based economic loss. Group testing algorithms are revisited and simulated at optimized pool sizes to determine the required number of tests. Income data from the German Socio-Economic Panel are integrated into a mathematical model to capture the economic loss. Afterward, hybrid Monte Carlo experiments are conducted by evaluating the economic cost in the Coronavirus disease 2019 pandemic in Germany. Monte Carlo experiments show that the optimal choice of group testing algorithms changes substantially when income-based economic losses are included. Evaluations considering only deterministic costs systematically underestimate the total economic cost. Algorithms with a longer quarantine duration are less attractive than those with a shorter quarantine duration if income-based economic loss is accounted for. The findings show that current evaluations underestimate the true economic cost. Group testing algorithms with shorter duration and fewer stages are preferred, even when they require a larger number of tests. These results underscore the importance of incorporating income-based economic loss into a mathematical model.
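To make "optimized pool sizes" concrete, the sketch below uses the classical two-stage Dorfman scheme as a stand-in: pool $k$ samples, and retest everyone individually if the pool is positive. This is an assumption for illustration only; the abstract does not say which algorithms the paper evaluates, and its cost model (including income-based loss) is not reproduced. The sketch further assumes perfect tests and independent infections with prevalence `p`; all function names are hypothetical.

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per person under two-stage Dorfman pooling.

    Assumptions: prevalence p, independent infections, error-free tests.
    k = 1 means individual testing.
    """
    if k == 1:
        return 1.0
    prob_pool_positive = 1.0 - (1.0 - p) ** k
    # One pooled test shared by k people, plus one retest each if the pool is positive.
    return 1.0 / k + prob_pool_positive

def best_pool_size(p, max_k):
    """Pool size in 1..max_k minimizing the expected number of tests per person."""
    return min(range(1, max_k + 1), key=lambda k: dorfman_tests_per_person(p, k))

if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05, 0.2):
        k = best_pool_size(p, 100)
        print(p, k, round(dorfman_tests_per_person(p, k), 4))
```

A test-count objective like this one is what the abstract calls a deterministic cost; the paper's argument is that adding quarantine-related income loss can change which algorithm is preferred.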

2603.28907 2026-04-01 stat.AP

GPU-accelerated Bayesian inference for block-cave mine monitoring via muon tomography

Miguel Biron-Lattes, Patrick Belliveau, Faezeh Yazdi, Samopriya Basu, Donald Estep, Derek Bingham, Doug Schouten

Comments Submitted to Computers & Geosciences

Details
English abstract

We describe a Bayesian framework for an inverse problem arising from monitoring block caving operations via muon tomography. We work with a low dimensional surface-based representation of the geometry of the block cave, which dramatically reduces the computational requirements of the model while allowing realistic geometries. Adopting a Bayesian approach, we define a prior distribution on the space of geometries that favors realistic cave shapes. Pairing this prior with a likelihood based on the muon tomography forward model, we obtain a posterior distribution over cave geometries using Bayes rule. We obtain approximate samples from this posterior distribution using Markov chain Monte Carlo algorithms running on GPUs, resulting in fast and accurate sampling. We test the fidelity of our methodology by applying it to a simulated block caving scenario for which the ground truth is known. Results show that our method produces a diverse array of sensible geometries that are simultaneously compatible with the data.

2603.28844 2026-04-01 stat.AP stat.OT

Violence Against Women: a pilot study on the perception of Apulian High school students

Crescenza Calculli, Serena Arima, Alessio Pollice, Nunziata Ribecco

Details
English abstract

Violence Against Women (VAW) is a widespread issue deeply rooted in social and cultural structures. Affecting women of all ages and backgrounds, VAW is often underreported due to stigma and victim-blaming. This study explores young people's perceptions of VAW in the Apulia region (Southern Italy), using a local survey inspired by a National framework on gender stereotypes and attitudes towards VAW. The survey gathers insights into youth opinions on gender roles, the acceptability of violence, and awareness of VAW within their communities, aiming to uncover the underlying attitudes that perpetuate this issue. The analysis combines two methodological approaches to examine these data. A network-based approach explores relationships within item responses, allowing for an in-depth look at the direct interactions among youth attitudes. This approach is paired with a psychometric model based on Item Response Theory, specifically the Graded Response Model, which interprets attitudes as manifestations of latent traits, revealing how different factors shape perceptions of VAW. Together, these methods offer a comprehensive analysis of young people's views on VAW, highlighting both individual response patterns and broader cultural trends essential for designing effective interventions. Findings indicate a gradual shift in attitudes toward gender roles; however, traditional views remain prevalent, especially among young males. Socioeconomic factors, such as parents' employment status, also contribute to the persistence of stereotypes, underscoring the need for targeted interventions to address and reduce VAW in youth populations.

2603.28786 2026-04-01 cs.CY cs.AI stat.AP

AI in Work-Based Learning: Understanding the Purposes and Effects of Intelligent Tools Among Student Interns

John Paul P. Miranda, Rhiziel P. Manalese, Sheila M. Geronimo, Vernon Grace M. Maniago, Charlie K. Padilla, Aileen P. De Leon, Santa L. Merle, Mark Anthony A. Castro

Comments 5 pages, 2 tables, conference proceedings

Details
Journal ref
2025 International Workshop on Artificial Intelligence and Education (2026) 411-415
English abstract

This study examined how student interns in Philippine higher education use intelligent tools during their on-the-job training (OJT). Data were collected from 384 respondents using a structured questionnaire that asked about AI tool usage, task-specific applications, and perceptions of confidence, ethics, and support. Analysis of task-based usage identified four main purposes: productivity and report writing, communication and content drafting, technical assistance and code support, and independent task completion. ChatGPT was the most commonly used AI tool, followed by Quillbot, Canva AI, and Grammarly. Students reported moderate confidence in using AI and applied these tools selectively and ethically during OJT tasks. These findings indicate that AI tools assist student interns in various OJT activities related to work-readiness. The study suggests that higher education programs include AI literacy and onboarding. Clear policies and fair access to AI tools are important to support responsible use and prepare students for future careers.

2603.23397 2026-04-01 stat.ME cs.NA math.NA stat.ML

Kinetic Langevin Splitting Schemes for Constrained Sampling

Neil K. Chada, Lu Yu

Comments 35 pages

Details
English abstract

Constrained sampling is an important and challenging task in computational statistics, concerned with generating samples from a distribution under certain constraints. There are numerous types of algorithms aimed at this task, ranging from general Markov chain Monte Carlo to unadjusted Langevin methods. In this article we propose a series of new sampling algorithms based on the latter of these, specifically the kinetic Langevin dynamics. Our series of algorithms is motivated by advanced numerical methods, namely splitting order schemes, which include the BU and BAO families of splitting schemes. Their advantage lies in the fact that they have favorable strong order (bias) rates and computational efficiency. In particular, we provide a number of theoretical insights which include a Wasserstein contraction and convergence results. We are able to demonstrate favorable results, such as improved complexity bounds over existing non-splitting methodologies. Our results are verified through numerical experiments on a range of models with constraints, which include a toy example and Bayesian linear regression.

2603.20546 2026-04-01 stat.AP cs.IT math.IT

On the Limits of Prediction: Forecastability Profiles and Information Decay in Time Series

Peter Maurice Catt

Comments Resubmitted as a highly revised paper - Forecastability as an Information-Theoretic Limit on Prediction [arXiv:2603.27074]

Details
English abstract

Forecasting accuracy is bounded by the information available about the future. This paper makes that statement precise using information-theoretic tools. Under logarithmic loss, the expected performance of any probabilistic forecast decomposes into two parts: an irreducible component and an approximation component. The irreducible term is the conditional entropy of the future given the available information, while the approximation term is the divergence between the true conditional distribution and the forecasting method. The gap between this conditional-entropy limit and an unconditional baseline is exactly the mutual information between the future observation and the declared information set. This leads to a definition of forecastability as the maximum achievable reduction in expected log loss. Evaluated across horizons, forecastability forms a profile that describes how predictive information varies with lead time. This profile reflects the dependence structure of the process and need not be monotone: predictive information may be concentrated at particular lags, including seasonal horizons, even when intermediate horizons contain little useful signal. From this profile, the paper defines the informative horizon set: the horizons at which forecastability exceeds a practical threshold. At horizons not in this set, the achievable gain over the unconditional baseline is necessarily small, regardless of the forecasting method used. The framework therefore separates what is learnable from what is not, and distinguishes limits imposed by the data from errors introduced by modelling. The result is a pre-modelling diagnostic that identifies where meaningful prediction is feasible before any model is chosen, providing a principled basis for allocating modelling effort across forecast horizons.
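The decomposition described in the abstract can be written out as follows. The notation is an assumption introduced here, not taken from the paper: $Y$ is the future observation, $X$ the declared information set, $p$ the true conditional distribution, and $q$ the forecast.

```latex
% Expected log loss of a probabilistic forecast q splits into an
% irreducible term and an approximation term:
\mathbb{E}\bigl[-\log q(Y \mid X)\bigr]
  = \underbrace{H(Y \mid X)}_{\text{irreducible}}
  + \underbrace{\mathbb{E}_X\bigl[ D_{\mathrm{KL}}\bigl(p(\cdot \mid X) \,\|\, q(\cdot \mid X)\bigr) \bigr]}_{\text{approximation}}

% The unconditional baseline forecasts with the marginal p(Y) and incurs
% expected log loss H(Y). Its gap to the conditional-entropy limit is the
% mutual information:
H(Y) - H(Y \mid X) = I(Y; X)
```

Since the KL term is non-negative, no forecast can beat $H(Y \mid X)$, and $I(Y; X)$ is the largest achievable improvement over the baseline, which matches the abstract's definition of forecastability.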

2512.15987 2026-04-01 cs.LG cs.AI cs.DS stat.ML

Provably Extracting the Features from a General Superposition

Allen Liu

Details
English abstract

It is widely believed that complex machine learning models generally encode features through linear representations. This is the foundational hypothesis behind a vast body of work on interpretability. A key challenge toward extracting interpretable features, however, is that they exist in superposition. In this work, we study the question of extracting features in superposition from a learning theoretic perspective. We start with the following fundamental setting: we are given query access to a function \[ f(x)=\sum_{i=1}^n σ_i(v_i^\top x), \] where each unit vector $v_i$ encodes a feature direction and $σ_i:\mathbb{R}\to\mathbb{R}$ is an arbitrary response function, and our goal is to recover the $v_i$ and the function $f$. In learning-theoretic terms, superposition refers to the \emph{overcomplete regime}, when the number of features is larger than the underlying dimension (i.e. $n > d$), which has proven especially challenging for typical algorithmic approaches. Our main result is an efficient query algorithm that, from noisy oracle access to $f$, identifies all feature directions whose responses are non-degenerate and reconstructs the function $f$. Crucially, our algorithm works in a significantly more general setting than all related prior results. We allow for essentially arbitrary superpositions, only requiring that $v_i, v_j$ are not nearly identical for $i \neq j$, and allowing for general response functions $σ_i$. At a high level, our algorithm introduces an approach for searching in Fourier space by iteratively refining the search space to locate the hidden directions $v_i$.

2511.13603 2026-04-01 stat.AP

Variance Stabilizing Transformations for Electricity Price Forecasting in Periods of Increased Volatility

Bartosz Uniejewski

Comments Forthcoming in Electric Power Systems Research

Details
Journal ref
Electric Power Systems Research, 257, 112992, 2026
English abstract

Accurate day-ahead electricity price forecasts are critical for power system operation and market participation, yet growing renewable penetration and recent crises have caused unprecedented volatility that challenges standard models. This paper revisits variance stabilizing transformations (VSTs) as a preprocessing tool by introducing a novel parametrization of the asinh transformation, systematically analyzing parameter sensitivity and calibration window size, and explicitly testing performance under volatile market regimes. Using data from Germany, Spain, and France over 2015-2024 with two model classes (NARX and LEAR), we show that VSTs substantially reduce forecast errors, with gains of up to 14.6% for LEAR and 8.7% for NARX relative to untransformed benchmarks. The new parametrized asinh consistently outperforms its standard form, while rolling averaging across transformations delivers the most robust improvements, reducing errors by up to 17.7%. Results demonstrate that VSTs are especially valuable in volatile regimes, making them a powerful tool for enhancing electricity price forecasting in today's power markets.
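For readers unfamiliar with variance stabilizing transformations, the sketch below shows the standard asinh transform applied to normalized prices, together with its inverse for mapping forecasts back to the price scale. The paper's new parametrization is not described in the abstract and is not reproduced here. Normalizing by the median and the median absolute deviation is an assumption made for this illustration, and the function names are hypothetical.

```python
import math
import statistics

def fit_center_scale(prices):
    """Median and median absolute deviation of a calibration window (assumed normalization)."""
    center = statistics.median(prices)
    mad = statistics.median([abs(p - center) for p in prices])
    return center, (mad if mad > 0 else 1.0)

def asinh_forward(prices, center, scale):
    """Standard asinh VST: y = asinh((p - center) / scale). Handles negative prices."""
    return [math.asinh((p - center) / scale) for p in prices]

def asinh_inverse(values, center, scale):
    """Inverse transform: p = center + scale * sinh(y)."""
    return [center + scale * math.sinh(v) for v in values]

if __name__ == "__main__":
    window = [35.0, 42.0, 38.5, -12.0, 410.0, 51.0]  # made-up prices with one spike
    c, s = fit_center_scale(window)
    y = asinh_forward(window, c, s)
    print([round(v, 3) for v in y])
    print([round(p, 3) for p in asinh_inverse(y, c, s)])
```

The transform is roughly linear near the center and logarithmic in the tails, so a price spike is compressed without the sign problems a plain logarithm would have for negative prices.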

2510.06662 2026-04-01 cs.LG stat.ML

The Effect of Attention Head Count on Transformer Approximation

Penghao Yu, Haotian Jiang, Zeyu Bao, Ruoxi Yu, Qianxiao Li

Comments Accepted by ICLR 2026

Details
English abstract

The transformer has become the dominant architecture for sequence modeling, yet a detailed understanding of how its structural parameters influence expressive power remains limited. In this work, we study the approximation properties of transformers, with particular emphasis on the role of the number of attention heads. Our analysis begins with the introduction of a generalized $D$-retrieval task, which we prove to be dense in the space of continuous functions, thereby providing the basis for our theoretical framework. We then establish both upper and lower bounds on the parameter complexity required for $ε$-approximation. Specifically, we show that transformers with sufficiently many heads admit efficient approximation, whereas with too few heads, the number of parameters must scale at least as $O(1/ε^{cT})$, for some constant $c$ and sequence length $T$. To the best of our knowledge, this constitutes the first rigorous lower bound of this type in a nonlinear and practically relevant setting. We further examine the single-head case and demonstrate that an embedding dimension of order $O(T)$ allows complete memorization of the input, where approximation is entirely achieved by the feed-forward block. Finally, we validate our theoretical findings with experiments on both synthetic data and real-world tasks, illustrating the practical relevance of our results.

2510.03638 2026-04-01 cs.LG cs.AI math.RT stat.ML

Expressive Power of Implicit Models: Rich Equilibria and Test-Time Scaling

Jialin Liu, Lisang Ding, Stanley Osher, Wotao Yin

Details
English abstract

Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. This architecture realizes an infinite-depth, weight-tied network that trains with constant memory, significantly reducing memory needs for the same level of performance compared to explicit models. While it is empirically known that these compact models can often match or even exceed the accuracy of larger explicit networks by allocating more test-time compute, the underlying mechanism remains poorly understood. We study this gap through a nonparametric analysis of expressive power. We provide a strict mathematical characterization, showing that a simple and regular implicit operator can, through iteration, progressively express more complex mappings. We prove that for a broad class of implicit models, this process lets the model's expressive power scale with test-time compute, ultimately matching a much richer function class. The theory is validated across four domains: image reconstruction, scientific computing, operations research, and LLM reasoning, demonstrating that as test-time iterations increase, the complexity of the learned mapping rises, while the solution quality simultaneously improves and stabilizes.

2509.20702 2026-04-01 stat.AP cs.AI q-bio.GN

Incorporating LLM Embeddings for Variation Across the Human Genome

Hongqian Niu, Jordan Bryan, Jacob Williams, Hufeng Zhou, Haoyu Zhang, Xihao Li, Didong Li

Details
English abstract

Recent advances in large language model (LLM) embeddings have enabled powerful representations for biological data, but most applications to date focus on gene-level information. We present one of the first systematic frameworks to generate genetic variant-level embeddings across the entire human genome. Using curated annotations from FAVOR, ClinVar, and the GWAS Catalog, we construct functional text descriptions for 8.9 billion possible variants and generate embeddings at three scales: 1.5 million HapMap3/MEGA variants, 90 million imputed UK Biobank (UKB) variants, and all 9 billion possible variants. Embeddings were produced using general-purpose models, including both OpenAI's text-embedding-3-large and the open-source Qwen3-Embedding-0.6B models. Baseline quality control experiments demonstrate high predictive accuracy for variant-level properties, validating the embeddings as structured representations of genomic variation. We further apply them to real-world embedding-augmented genetic risk predictions that demonstrate the performance of using LLM embeddings in polygenic risk score (PRS) style predictions over the UK Biobank cohort data. These resources, publicly available on Hugging Face, provide a foundation for advancing large-scale genomic discovery and precision medicine.

2509.03317 2026-04-01 stat.ML cs.LG

Bayesian Additive Regression Trees for functional ANOVA model

Seokhun Park, Insung Kong, Yongdai Kim

Details
English abstract

Bayesian Additive Regression Trees (BART) is a powerful statistical model that leverages the strengths of Bayesian inference and regression trees. It has received significant attention for capturing complex non-linear relationships and interactions among predictors. However, the accuracy of BART often comes at the cost of interpretability. To address this limitation, we propose ANOVA Bayesian Additive Regression Trees (ANOVA-BART), a novel extension of BART based on the functional ANOVA decomposition, which is used to decompose the variability of a function into different interactions, each representing the contribution of a different set of covariates or factors. Our proposed ANOVA-BART enhances interpretability, preserves and extends the theoretical guarantees of BART, and achieves comparable prediction performance. Specifically, we establish that the posterior concentration rate of ANOVA-BART is nearly minimax optimal, and we further provide the same convergence rates for each interaction, which are not available for BART. Moreover, comprehensive experiments confirm that ANOVA-BART is comparable to BART in both accuracy and uncertainty quantification, while also demonstrating its effectiveness in component selection. These results suggest that ANOVA-BART offers a compelling alternative to BART by balancing predictive accuracy, interpretability, and theoretical consistency.

2508.12627 2026-04-01 stat.ML cs.DS cs.NA math.NA stat.CO stat.ME

On computing and the complexity of computing higher-order $U$-statistics, exactly

Xingyu Chen, Ruiqi Zhang, Lin Liu

Comments Comments are welcome! 71 pages, 8 tables, 5 figures. An accompanying Python package is available at: https://libraries.io/pypi/u-stats or https://github.com/Amedar-Asterisk/U-Statistics-Python

Details
English abstract

Higher-order $U$-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper aims to fill this gap by presenting several results related to the computational aspect of $U$-statistics. First, we derive a useful decomposition from an $m$-th order $U$-statistic to a linear combination of $V$-statistics with orders not exceeding $m$, which are generally more feasible to compute. Second, we explore the connection between exactly computing $V$-statistics and Einstein summation, a tool often used in computational mathematics and quantum computing to accelerate tensor computations. Third, we provide an optimistic estimate of the time complexity for exactly computing $U$-statistics, based on the treewidth of a particular graph associated with the $U$-statistic kernel. The above ingredients lead to (1) a new, much more runtime-efficient algorithm to exactly compute general higher-order $U$-statistics, and (2) a more streamlined characterization of runtime complexity of computing $U$-statistics. We develop an accompanying open-source package called \texttt{u-stats} in both Python (https://github.com/zrq1706/U-Statistics-Python) and R (https://github.com/cxy0714/U-Statistics-R). We demonstrate through three examples in statistics that \texttt{u-stats} achieves impressive runtime performance compared to existing benchmarks. This paper also aspires to achieve two goals: (1) to capture the interest of researchers in both statistics and other related areas to further advance the algorithmic development of $U$-statistics and (2) to lift the burden of implementing higher-order $U$-statistics from practitioners.
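The simplest instance of the $U$-to-$V$ decomposition is the second-order case, where removing the diagonal terms from a $V$-statistic recovers the $U$-statistic: $U = \bigl(n^2 V - \sum_i h(x_i, x_i)\bigr) / \bigl(n(n-1)\bigr)$. The sketch below checks this identity in plain Python. It does not use the \texttt{u-stats} package or Einstein summation, and it is not the paper's general algorithm; all function names are hypothetical.

```python
def u_stat_brute(xs, h):
    """Order-2 U-statistic: average of h over all ordered pairs with i != j."""
    n = len(xs)
    total = sum(h(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def v_stat(xs, h):
    """Order-2 V-statistic: average of h over all pairs, diagonal included."""
    n = len(xs)
    return sum(h(a, b) for a in xs for b in xs) / n ** 2

def u_stat_from_v(xs, h):
    """Recover the U-statistic from the V-statistic by subtracting the diagonal."""
    n = len(xs)
    diagonal = sum(h(a, a) for a in xs)
    return (n ** 2 * v_stat(xs, h) - diagonal) / (n * (n - 1))

def u_stat_product_kernel(xs):
    """For h(x, y) = x * y the V-statistic factorizes, giving an O(n) formula."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return (s1 * s1 - s2) / (n * (n - 1))

if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 7.0]
    prod = lambda x, y: x * y
    print(u_stat_brute(data, prod), u_stat_from_v(data, prod), u_stat_product_kernel(data))
```

The last function hints at why $V$-statistics are easier to compute: with unrestricted index sums, a product-form kernel factorizes into separate one-dimensional sums, which is the kind of structure tensor-contraction tools exploit.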

2507.11780 2026-04-01 econ.EM cs.LG math.ST stat.ME stat.TH

Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing

Justin Whitehouse, Qizhao Chen, Morgane Austern, Vasilis Syrgkanis

Comments 82 pages, 4 figures, 1 table

Details
English abstract

Constructing confidence intervals for the value of an (unknown) optimal treatment policy is a fundamental problem in causal inference. Insight into the optimal policy value can guide the development of reward-maximizing, individualized treatment regimes. However, because the functional that defines the optimal value is non-differentiable, standard semi-parametric approaches for performing inference fail to be directly applicable. Many existing works circumvent non-differentiability by making the unrealistic assumption of zero probability of treatment non-response, i.e. that every unit responds (either positively or negatively) to an assigned treatment. Further, works that don't circumvent this restriction rely on refitting nuisance models a number of times proportional to the sample size. In this paper, we construct and analyze a simple, softmax smoothing-based estimator for the value of an optimal treatment policy. Our estimator applies in both static and dynamic treatment regimes, only requires fitting a constant number of nuisance models, and is statistically efficient when there is zero probability of non-response to treatment. Also, while our estimator does not require making semi-parametric restrictions, it can exploit them when they exist. We further show how our softmax smoothing approach can be used to estimate general parameters that are specified as a maximum of scores involving nuisance components, and look at conditional Balke and Pearl bounds and $L^1$ calibration error as salient examples.
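The idea of replacing a non-differentiable maximum with a softmax-weighted average can be sketched in a few lines. This is only the generic smoothing device, shown under the assumption of known scores; the paper's estimator also handles estimated nuisance components and inference, none of which is reproduced here. The temperature parameter `beta` and the function names are hypothetical.

```python
import math

def softmax_weights(scores, beta):
    """Softmax weights exp(beta * s_i) / sum_j exp(beta * s_j), computed stably."""
    m = max(scores)
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_max(scores, beta):
    """Softmax-weighted average of the scores: a smooth lower bound on max(scores)."""
    weights = softmax_weights(scores, beta)
    return sum(w * s for w, s in zip(weights, scores))

if __name__ == "__main__":
    scores = [0.2, 1.0, 0.9]  # e.g. made-up conditional mean outcomes per treatment
    for beta in (0.0, 1.0, 10.0, 100.0):
        print(beta, smoothed_max(scores, beta))
```

At `beta = 0` the smoothed value is the plain average of the scores, and it approaches the true maximum as `beta` grows, so the temperature trades smoothness (which helps standard inference) against approximation bias.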

2506.23062 2026-04-01 math.PR cs.DS cs.NA math.AP math.NA math.ST stat.TH

Shifted Composition IV: Toward Ballistic Acceleration for Log-Concave Sampling

Jason M. Altschuler, Sinho Chewi, Matthew S. Zhang

Comments v3: amending minor typos

Details
English abstract

Acceleration is a celebrated cornerstone of convex optimization, enabling gradient-based algorithms to converge sublinearly in the condition number. A major open question is whether an analogous acceleration phenomenon is possible for log-concave sampling. Underdamped Langevin dynamics (ULD) has long been conjectured to be the natural candidate for acceleration, but a central challenge is that its degeneracy necessitates the development of new analysis approaches, e.g., the theory of hypocoercivity. Although recent breakthroughs established ballistic acceleration for the (continuous-time) ULD diffusion via space-time Poincaré inequalities, (discrete-time) algorithmic results remain entirely open: the discretization error of existing analysis techniques dominates any continuous-time acceleration. In this paper, we give a new coupling-based local error framework for analyzing ULD and its numerical discretizations in KL divergence. This extends the framework in Shifted Composition III from uniformly elliptic diffusions to degenerate diffusions, and shares its virtues: the framework is user-friendly, applies to sophisticated discretization schemes, and does not require contractivity. Applying this framework to the randomized midpoint discretization of ULD establishes the first ballistic acceleration result for log-concave sampling (i.e., sublinear dependence on the condition number). Along the way, we also obtain the first $d^{1/3}$ iteration complexity guarantee for sampling to constant total variation error in dimension $d$.

2506.20114 2026-04-01 stat.ML cs.LG

Extracting Interpretable Models from Tree Ensembles: Computational and Statistical Perspectives

Brian Liu, Rahul Mazumder, Peter Radchenko

Details
English abstract

Tree ensembles are non-parametric methods widely recognized for their accuracy and ability to capture complex interactions. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. We propose an estimator to extract compact sets of decision rules from tree ensembles. The extracted models are accurate and can be manually examined to reveal relationships between the predictors and the response. A key novelty of our estimator is the flexibility to jointly control the number of rules extracted and the interaction depth of each rule, which improves accuracy. We develop a tailored exact algorithm to efficiently solve optimization problems underlying our estimator and an approximate algorithm for computing regularization paths, sequences of solutions that correspond to varying model sizes. We also establish novel non-asymptotic prediction error bounds for our proposed approach, comparing it to an oracle that chooses the best data-dependent linear combination of the rules in the ensemble subject to the same complexity constraint as our estimator. The bounds illustrate that the large-sample predictive performance of our estimator is on par with that of the oracle. Through experiments, we demonstrate that our estimator outperforms existing algorithms for rule extraction.

2504.15543 2026-04-01 stat.ME cs.IT math.IT stat.ML

Bayesian model-averaging stochastic item selection for adaptive testing

Tina Su, Edison Choe, Joshua C. Chang

Comments Under review; major revision

Details
English abstract

Computer Adaptive Testing (CAT) aims to accurately estimate an individual's ability using only a subset of an Item Response Theory (IRT) instrument. Many applications also require diverse item exposure across testing sessions, preventing any single item from being over- or underutilized. In CAT, items are selected sequentially based on a running estimate of a respondent's ability. Prior methods almost universally see item selection through an optimization lens, motivating greedy item selection procedures. While efficient, these deterministic methods tend to have poor item exposure. Existing stochastic methods for item selection are ad-hoc, with item sampling weights that lack theoretical justification. We formulate stochastic CAT as a Bayesian model averaging problem. We seek item sampling probabilities, treated in the long-run frequentist sense, that perform optimal model averaging for the ability estimate in a Bayesian sense. The derivation yields an information criterion for optimal stochastic mixing: the expected entropy of the next posterior. We tested our method on seven publicly available psychometric instruments spanning personality, social attitudes, narcissism, and work preferences, in addition to the eight scales of the Work Disability Functional Assessment Battery. Across all instruments, accuracy differences between selection methods at a given test length are varied but minimal relative to the natural noise in ability estimation; however, the stochastic selector achieves full item bank exposure, resolving the longstanding tradeoff between measurement efficiency and item security at negligible accuracy cost.

2504.13565 2026-04-01 stat.ME

Constructive Instrumental Variable Identification and Inference with Many Weak Interaction Moments

Di Zhang, Minhao Yao, Zhonghua Liu, Baoluo Sun

Comments 35 pages, 1 figure, 3 tables

Details
English abstract

Instrumental variable methods are widely used for causal inference, but identification becomes especially challenging when instruments are weak and potentially invalid. These challenges are particularly pronounced in Mendelian randomization, where genetic variants serve as instruments and violations of the exclusion restriction or independence assumptions are common. We propose MAGIC, a constructive and assumption-lean framework that achieves identification even when all candidate instruments may be invalid. The method exploits pairwise and higher-order interactions among mutually independent instruments to construct moment conditions orthogonal to both unmeasured confounding and direct effects under a linear structural model. The resulting estimation problem involves many potentially weak interaction moments with unknown nuisance parameters. We develop a semiparametric generalized method of moments estimator and introduce a global Neyman orthogonality condition to ensure robustness of both the moment function and its derivative to nuisance estimation under many weak moment asymptotics. We establish consistency and asymptotic normality when the number of moments diverges with sample size and characterize the semiparametric efficiency bound under fixed dimension. Simulations and an application to UK Biobank data illustrate the method.

2406.14753 2026-04-01 cs.LG stat.ME

A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

Weiqin Chen, Mark S. Squillante, Chai Wah Wu, Santiago Paternain

Details
English abstract

We devise a control-theoretic reinforcement learning approach to support direct learning of the optimal policy. We establish various theoretical properties of our approach, such as convergence and optimality of our analog of the Bellman operator and Q-learning, a new control-policy-variable gradient theorem, and a specific gradient ascent algorithm based on this theorem within the context of a specific control-theoretic framework. We empirically evaluate the performance of our control-theoretic approach on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time over state-of-the-art methods.

2404.00256 2026-04-01 stat.ME

Robust Bayesian Modeling with Adaptive Posterior FDR Control for Large-Scale Data

Yoshiko Hayashi

Details
English abstract

Controlling the false discovery rate (FDR) is a critical challenge in large-scale data analysis, particularly in the presence of outliers. A common practice involves imposing a Student-$t$ distribution to eliminate the influence of outliers. Here, we develop a robust Bayesian analysis based on heavy-tailed modeling, apply it to large-scale studies in Bayesian inference, and perform diagnoses for detecting outliers using the posterior predictive $p$-value ($ppp$). In addition, we propose an adaptive method to decide the level of the posterior false discovery rate, using the proportion of true null genes estimated with Storey's $q$-value method. We demonstrate the utility of our methods using gene expression data for colorectal cancer.
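
As background for the Storey-based step mentioned in this abstract, the following is a minimal sketch of Storey's standard estimator of the proportion of true nulls, which counts $p$-values above a threshold `lam` and rescales by the expected fraction of null $p$-values in that region. The function name `storey_pi0` and the default `lam=0.5` are illustrative assumptions; how the paper maps this estimate to a posterior FDR level is not described in the abstract and is not shown here.

```python
def storey_pi0(p_values, lam=0.5):
    """Storey's estimate of the proportion of true null hypotheses.

    pi0_hat = #{p_i > lam} / (m * (1 - lam)), capped at 1.
    `lam` is a tuning threshold in [0, 1); 0.5 is only an illustrative default.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must be non-empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    # Null p-values are uniform, so about (1 - lam) of them exceed lam.
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


if __name__ == "__main__":
    # Hypothetical p-values: three small (likely signals), two large.
    print(storey_pi0([0.01, 0.02, 0.03, 0.6, 0.8]))
```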

2310.11065 2026-04-01 stat.ML cs.LG

Cheap Bootstrap for Fast Uncertainty Quantification of Stochastic Gradient Descent

Henry Lam, Zitong Wang

Details
Journal ref
Journal of Machine Learning Research, 27(25-0008):1-42, 2026
English abstract

Stochastic gradient descent (SGD) or stochastic approximation has been widely used in model training and stochastic optimization. While there is a huge literature on analyzing its convergence, inference on the obtained solutions from SGD has only been recently studied, yet it is important due to the growing need for uncertainty quantification. We investigate two computationally cheap resampling-based methods to construct confidence intervals for SGD solutions. One uses multiple, but few, SGDs in parallel via resampling with replacement from the data, and another operates this in an online fashion. Our methods can be regarded as enhancements of established bootstrap schemes to substantially reduce the computation effort in terms of resampling requirements, while bypassing the intricate mixing conditions in existing batching methods. We achieve these via a recent so-called cheap bootstrap idea and refinement of a Berry-Esseen-type bound for SGD.
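
The parallel variant described in this abstract can be sketched as follows, assuming the usual form of the cheap bootstrap interval: a handful of resampled estimates, their spread measured around the original estimate, and a $t$ critical value with as many degrees of freedom as there are resamples. The toy SGD below (estimating a mean with step size $1/t$), the function names, and the hard-coded critical value are illustrative assumptions rather than the authors' implementation; the online variant is not shown.

```python
import math
import random


def sgd_mean(data, theta0=0.0):
    """One pass of SGD on the loss 0.5 * (theta - x)^2 with step size 1/t."""
    theta = theta0
    for t, x in enumerate(data, start=1):
        theta -= (theta - x) / t  # gradient step; reproduces the running mean
    return theta


def cheap_bootstrap_ci(data, t_crit, n_resamples=5, seed=0):
    """Interval theta_hat +/- t_crit * S from a few resampled SGD runs.

    S^2 = (1/B) * sum_b (theta_b - theta_hat)^2, centred at the original
    estimate. `t_crit` should be the t quantile with B = n_resamples
    degrees of freedom, supplied by the caller.
    """
    rng = random.Random(seed)
    theta_hat = sgd_mean(data)
    replicates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        replicates.append(sgd_mean(resample))
    spread = math.sqrt(
        sum((r - theta_hat) ** 2 for r in replicates) / n_resamples
    )
    return theta_hat - t_crit * spread, theta_hat + t_crit * spread


if __name__ == "__main__":
    sample = [float(i) for i in range(1, 101)]
    # 2.571 is approximately the 97.5% quantile of a t distribution with 5 df.
    print(cheap_bootstrap_ci(sample, t_crit=2.571, n_resamples=5))
```

Only five extra SGD runs are needed here, which is the sense in which the resampling cost is small compared with a conventional bootstrap using hundreds of replicates.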

2212.03944 2026-04-01 math.PR math-ph math.MP math.ST stat.TH

LDP for Inhomogeneous U-Statistics

Sohom Bhattacharya, Nabarun Deb, Sumit Mukherjee

Comments 37 pages, accepted for publication in the Annals of Applied Probability

Details
English abstract

In this paper we derive a Large Deviation Principle (LDP) for inhomogeneous U/V-statistics of a general order. Using this, we derive an LDP for two types of statistics: random multilinear forms, and number of monochromatic copies of a subgraph. We show that the corresponding rate functions in these cases can be expressed as a variational problem over a suitable space of functions. We use the tools developed to study Gibbs measures with the corresponding Hamiltonians, which include tensor generalizations of both Ising (with non-compact base measure) and Potts models. For these Gibbs measures, we establish scaling limits of log normalizing constants, and weak laws in terms of weak* topology, which are of possible independent interest.