arXivDaily: arXiv daily academic digest, updated Monday to Friday
2601.14542 2026-03-11 astro-ph.GA physics.data-an stat.AP

New techniques to investigate the AGN-SF connection with integral field spectroscopy

Aman Chopra, Henry R. M. Zovaro, Rebecca L. Davies

Comments 26 pages, 23 figures

Journal ref
Publ. Astron. Soc. Aust. 43 (2026) e023
Abstract

Understanding the connection between active galactic nuclei and star-formation (the AGN-SF connection) is one of the longest standing problems in modern astrophysics. In the age of large Integral Field Unit (IFU) surveys, studies of the AGN-SF connection greatly benefit from spatially resolving AGN and SF contributions to study the two processes independently. Using IFU data for 54 local active galaxies from the S7 sample, we present a new method to separate emission from AGN activity and SF using mixing sequences observed in the [NII]$λ6584$/H$α$ vs. [OIII]$λ5007$/H$β$ Baldwin-Phillips-Terlevich (BPT) diagram. We use the new decomposition method to calculate the H$α$ star-formation rate and AGN [OIII] luminosity for the galaxies. Our new method is robust to outliers in the line-ratio distribution and can be applied to large galaxy samples with little manual intervention. We infer star-formation histories (SFHs) using pPXF, conducting detailed recovery tests to determine the quantities that can be considered robust. We test the correlation between the AGN Eddington ratio, using the proxy L[OIII]/$σ_*^4$, and star-formation properties. We find a moderately strong correlation between the Eddington ratio and the star-formation rate (SFR). We also observe marginally significant correlations between the AGN Eddington ratio and the light-weighted stellar age under 100 Myr. Our results point to higher AGN accretion being associated with young nuclear star formation under 100 Myr, consistent with timelines presented in previous studies. The correlations found in this paper are relatively weak; extending our methods to larger samples, including radio-quiet galaxies, will help better constrain the physical mechanisms and timescales of the AGN-SF connection.

2407.18835 2026-03-11 stat.ME math.ST stat.AP stat.OT stat.TH

Robust Estimation of Polychoric Correlation

Max Welz, Patrick Mair, Andreas Alfons

Comments 78 pages (37 main text), 21 figures (9 in main text), 10 tables (5 in main text). This is the final version of this article, as accepted in Psychometrika

Journal ref
Psychometrika 91 (2026) 247-278
Abstract

Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance through violations of latent normality assumptions. We propose a novel estimator that is designed to be robust against partial misspecification of the polychoric model, that is, when the model is misspecified for an unknown fraction of observations, such as careless respondents. To this end, the estimator minimizes a robust loss function based on the divergence between observed frequencies and theoretical frequencies implied by the polychoric model. In contrast to existing literature, our estimator makes no assumption on the type or degree of model misspecification. It furthermore generalizes ML estimation, is consistent as well as asymptotically normally distributed, and comes at no additional computational cost. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and an empirical application on a Big Five administration. In the latter, the polychoric correlation estimates of our estimator and ML differ substantially, which, after further inspection, is likely due to the presence of careless respondents that the estimator helps identify.

2603.09952 2026-03-11 cs.LG cs.NA cs.SY eess.SY math.NA math.OC stat.ML

On the Width Scaling of Neural Optimizers Under Matrix Operator Norms I: Row/Column Normalization and Hyperparameter Transfer

Ruihan Xu, Jiajin Li, Yiping Lu

Abstract

A central question in modern deep learning is how to design optimizers whose behavior remains stable as the network width $w$ increases. We address this question by interpreting several widely used neural-network optimizers, including AdamW and Muon, as instances of steepest descent under matrix operator norms. This perspective links optimizer geometry with the Lipschitz structure of the network forward map, and enables width-independent control of both Lipschitz and smoothness constants. However, steepest-descent rules induced by standard $p \to q$ operator norms lack layerwise composability and therefore cannot provide width-independent bounds in deep architectures. We overcome this limitation by introducing a family of mean-normalized operator norms, denoted $\pmean \to \qmean$, that admit layerwise composability, yield width-independent smoothness bounds, and give rise to practical optimizers such as rescaled AdamW, row normalization, and column normalization. The resulting width-aware learning-rate scaling rules recover $μ$P scaling \cite{yang2021tensor} as a special case and provide a principled mechanism for cross-width learning-rate transfer across a broad class of optimizers. We further show that Muon can suffer an $\mathcal{O}(\sqrt{w})$ worst-case growth in the smoothness constant, whereas a new family of row-normalized optimizers we propose achieves width-independent smoothness guarantees. Based on these observations, we propose MOGA (Matrix Operator Geometry Aware), a width-aware optimizer based only on row/column-wise normalization that enables stable learning-rate transfer across model widths. Large-scale pre-training on GPT-2 and LLaMA shows that MOGA, especially with row normalization, is competitive with Muon while being notably faster in large-token and low-loss regimes.
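
As a rough illustration of what "row normalization" of an update can look like, the sketch below rescales each row of a gradient matrix to unit Euclidean norm before taking a step. This is only an assumed, simplified reading: it is not the MOGA update, and it omits the width-aware learning-rate scaling, momentum, and any other details of the paper. The function names `row_normalize` and `row_normalized_step` are hypothetical.

```python
import math

def row_normalize(grad, eps=1e-12):
    """Scale each row of a matrix (list of lists) to unit Euclidean norm.

    eps guards against division by zero for all-zero rows (an assumption
    of this sketch, not something taken from the abstract).
    """
    out = []
    for row in grad:
        norm = math.sqrt(sum(g * g for g in row))
        out.append([g / (norm + eps) for g in row])
    return out

def row_normalized_step(weights, grad, lr):
    """One plain descent step using the row-normalized gradient as direction."""
    direction = row_normalize(grad)
    return [[w - lr * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weights, direction)]

# Toy 2x2 example: rows of the gradient have norms 5 and 2.
grad = [[3.0, 4.0], [0.0, 2.0]]
weights = [[1.0, 1.0], [1.0, 1.0]]
new_weights = row_normalized_step(weights, grad, lr=0.1)
```

After normalization every row of the direction has the same norm, so the per-row step size is set by `lr` alone rather than by the raw gradient magnitude.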

2603.09842 2026-03-11 cs.LG stat.ME stat.ML

A Unified Hierarchical Multi-Task Multi-Fidelity Framework for Data-Efficient Surrogate Modeling in Manufacturing

Manan Mehta, Zhiqiao Dong, Yuhang Yang, Chenhui Shao

Abstract

Surrogate modeling is an essential data-driven technique for quantifying relationships between input variables and system responses in manufacturing and engineering systems. Two major challenges limit its effectiveness: (1) large data requirements for learning complex nonlinear relationships, and (2) heterogeneous data collected from sources with varying fidelity levels. Multi-task learning (MTL) addresses the first challenge by enabling information sharing across related processes, while multi-fidelity modeling addresses the second by accounting for fidelity-dependent uncertainty. However, existing approaches typically address these challenges separately, and no unified framework simultaneously leverages inter-task similarity and fidelity-dependent data characteristics. This paper develops a novel hierarchical multi-task multi-fidelity (H-MT-MF) framework for Gaussian process-based surrogate modeling. The proposed framework decomposes each task's response into a task-specific global trend and a residual local variability component that is jointly learned across tasks using a hierarchical Bayesian formulation. The framework accommodates an arbitrary number of tasks, design points, and fidelity levels while providing predictive uncertainty quantification. We demonstrate the effectiveness of the proposed method using a 1D synthetic example and a real-world engine surface shape prediction case study. Compared to (1) a state-of-the-art MTL model that does not account for fidelity information and (2) a stochastic kriging model that learns tasks independently, the proposed approach improves prediction accuracy by up to 19% and 23%, respectively. The H-MT-MF framework provides a general and extensible solution for surrogate modeling in manufacturing systems characterized by heterogeneous data sources.

2603.09768 2026-03-11 math.ST stat.TH

The exact region between Chatterjee's and Blest's rank correlations

Marcus Rockel

Abstract

Exact regions between rank correlations describe the set of all pairs of values that two dependence measures can attain simultaneously on the same copula and thus yield sharp inequalities between them. In this paper, we determine the exact region between Chatterjee's rank correlation $ξ$ and Blest's rank correlation $ν$ over the class of all bivariate copulas. Our approach is based on a constrained optimization problem whose solution is characterized by Karush--Kuhn--Tucker conditions. This leads to a novel extremal copula family that uniquely traces the boundary of the region. For this family, we derive closed-form expressions for both $ξ$ and $ν$, which provide an explicit parametrization of the exact attainable region.

2603.09680 2026-03-11 math.NT stat.ML

Murmurations: a case study in AI-assisted mathematics

Yang-Hui He, Kyu-Hwan Lee, Thomas Oliver, Alexey Pozdnyakov

Comments 12 pages, 15 figures

Abstract

We report the emergence of a striking new phenomenon in arithmetic, which we call murmurations. First observed experimentally through averages over large arithmetic datasets, murmurations can be detected and analyzed using standard interpretability tools from machine learning, including principal component weightings, saliency curves, and convolutional filters. Although discovered computationally, they constitute a genuinely new and intriguing phenomenon in arithmetic that can be formulated and investigated using established tools of number theory. In particular, murmurations encode subtle information about Frobenius traces and naturally belong to the framework of arithmetic statistics. More precisely, murmurations connect to central themes surrounding the conjecture of Birch and Swinnerton-Dyer and perspectives from random matrix theory. In this paper, we present an overview of murmurations, contextualizing them within number theory and AI.

2603.09629 2026-03-11 math.ST stat.TH

On the last time and the number of times an estimator is more than epsilon from its target value

Nils Lid Hjort, Grete Fenstad

Comments 18 pages, no figures; Statistical Research Report, Department of Mathematics, University of Oslo, from April 1991, now arXiv'd March 2026. The paper has appeared in Annals of Statistics, 1992, vol. 20, pages 469-489, at this url: projecteuclid.org/journals/annals-of-statistics/volume-20/issue-1/On-the-Last-Time-and-the-Number-of-Times-an/10.1214/aos/1176348533.full

Abstract

Suppose $\widehat{θ}_n$ is a strongly consistent estimator for $θ_0$ in some i.i.d. situation. Let $N_\varepsilon$ and $Q_\varepsilon$ be respectively the last $n$ and the total number of $n$ for which $\widehat{θ}_n$ is at least $\varepsilon$ away from $θ_0$. The limit distributions for ${\varepsilon}^2 N_\varepsilon$ and ${\varepsilon}^2 Q_\varepsilon$ as $\varepsilon$ goes to zero are obtained under natural and weak conditions. The theory covers both parametric and nonparametric cases, multi-dimensional parameters, and general distance functions. Our results are of probabilistic interest, and, on the statistical side, suggest ways in which competing estimators can be compared. In particular several new optimality properties for the maximum likelihood estimator sequence in parametric families are established. Another use of our results is ways of constructing sequential fixed-volume or shrinking-volume confidence sets, as well as sequential tests with power 1. The paper also includes limit distribution results for the last $n$ and the number of $n$ for which the supremum distance $\|F_n-F\|\ge\varepsilon$, where $F_n$ is the empirical distribution function. Yet other results are reached for $\varepsilon^{5/2} N_\varepsilon$ and $\varepsilon^{5/2} Q_\varepsilon$ in the context of nonparametric density estimation, referring to the last time and the number of times where $|f_n(x) - f(x)|\ge\varepsilon$. Finally it is shown that our results extend to several non-i.i.d. situations.
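
The two quantities defined in this abstract can be made concrete with a small sketch, assuming the estimator is the running sample mean and the sequence is truncated at a finite horizon (the true $N_\varepsilon$ and $Q_\varepsilon$ are defined over the whole infinite sequence, so this is only a finite-sample stand-in). The helper names `running_means` and `last_and_count` are hypothetical.

```python
def running_means(xs):
    """Return the sample mean after each of the first n observations."""
    total = 0.0
    means = []
    for n, x in enumerate(xs, start=1):
        total += x
        means.append(total / n)
    return means

def last_and_count(estimates, target, eps):
    """Return (N, Q) over the given finite horizon.

    N is the last 1-based index n with |estimate_n - target| >= eps
    (0 if there is none); Q is the number of such indices.
    """
    last_n, count = 0, 0
    for n, est in enumerate(estimates, start=1):
        if abs(est - target) >= eps:
            last_n = n
            count += 1
    return last_n, count

# Fixed toy sequence with true mean taken to be 0 (an assumption of the example).
xs = [1.0, -1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
means = running_means(xs)
n_eps, q_eps = last_and_count(means, target=0.0, eps=0.22)
```

Here the running means are 1, 0, 1/3, 1/4, 1/5, ..., so the estimator is at least 0.22 away from the target at n = 1, 3, 4, giving a last time of 4 and a count of 3.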

2603.09601 2026-03-11 cs.LG stat.ME stat.ML

MM-algorithms for traditional and convex NMF with Tweedie and Negative Binomial cost functions and empirical evaluation

Elisabeth Sommer James, Asger Hobolth, Marta Pelizzola

Abstract

Non-negative matrix factorisation (NMF) is a widely used tool for unsupervised learning and feature extraction, with applications ranging from genomics to text analysis and signal processing. Standard formulations of NMF are typically derived under Gaussian or Poisson noise assumptions, which may be inadequate for data exhibiting overdispersion or other complex mean-variance relationships. In this paper, we develop a unified framework for both traditional and convex NMF under a broad class of distributional assumptions, including Negative Binomial and Tweedie models, where the connection between the Tweedie and the $β$-divergence is also highlighted. Using a Majorize-Minimisation approach, we derive multiplicative update rules for all considered models, and novel updates for convex NMF with Poisson and Negative Binomial cost functions. We provide a unified implementation of all considered models, including the first implementations of several convex NMF models. Empirical evaluations on mutational and word count data demonstrate that the choice of noise model critically affects model fit and feature recovery, and that convex NMF can provide an efficient and robust alternative to traditional NMF in scenarios where the number of classes is large. The code for our proposed updates is available in the R package nmfgenr and can be found at https://github.com/MartaPelizzola/nmfgenr.

2603.09564 2026-03-11 stat.ML cs.LG

a-TMFG: Scalable Triangulated Maximally Filtered Graphs via Approximate Nearest Neighbors

Lionel Yelibi

Abstract

The traditional Triangular Maximally Filtered Graph (TMFG) construction requires pre-computation and storage of a dense correlation matrix; this limits its applicability to small and medium-sized datasets. Here we identify key memory and runtime complexity challenges when using TMFG at scale. We then present the Approximate Triangular Maximally Filtered Graph (a-TMFG) algorithm. This is a novel approach to scaling the construction of artificial graphs from data inspired by TMFG. The method employs k-Nearest Neighbors Graphs (kNNG) for initial construction, and implements a memory management strategy to search and estimate missing correlations on-the-fly. This provides representations to control combinatorial explosion. The algorithm is tested for robustness to the parameters and noise, and is evaluated on datasets with millions of observations. This new method provides a parsimonious way to construct graphs for use-cases where graphs are used as input to supervised and unsupervised learning but where no natural graph exists.

2603.09532 2026-03-11 stat.ML cs.LG

What Do We Care About in Bandits with Noncompliance? BRACE: Bandits with Recommendations, Abstention, and Certified Effects

Nicolás Della Penna

Abstract

Bandits with noncompliance separate the learner's recommendation from the treatment actually delivered, so the learning target itself must be chosen. A platform may care about recommendation welfare in the current mediated workflow, treatment learning for a future direct-control regime, or anytime-valid uncertainty for one of those targets. These objectives need not agree. We formalize this objective-choice problem, identify the direct-control regime in which recommendation and treatment objectives collapse, and show by example that recommendation welfare can strictly exceed every learner-measurable treatment policy when downstream actors use private information. For finite-context square-IV problems we propose BRACE, a parameter-free phase-doubling algorithm that performs IV inversion only after matrix certification and otherwise returns full-range but honest structural intervals. BRACE delivers simultaneous policy-value validity, fixed-gap identification of the operationally optimal recommendation policy, and fixed-gap identification of the structurally optimal treatment policy under contextual homogeneity and invertibility. We complement the theory with a finite-context empirical benchmark spanning direct control, mediated present-versus-future tradeoffs, weak identification, homogeneity failure, and rectangular overidentification. The experiments show that safety appears as regret on easy problems, as abstention and wide valid intervals under weak identification, as a reason to prefer recommendation welfare under homogeneity failure, and as tighter structural uncertainty when extra instruments are available. For rich contexts, we also derive an orthogonal score whose conditional bias factorizes into compliance-model and outcome-model errors, clarifying what must be stabilized for anytime-valid semiparametric IV inference.

2603.09504 2026-03-11 math.PR math.ST stat.TH

Uniform Lorden-type bounds for overshoot moments for standard exponential families: small drift and an exponential correction

El'mira Yu. Kalimulina, Mark Ya. Kelbert

Comments 20 pages, no figures

Abstract

We study the overshoot \(R_b=S_{τ(b)}-b\) of a random walk with independent identically distributed increments from a standardised one-parameter exponential family, with primary emphasis on the small-drift regime \(θ\downarrow0\). Unlike the classical renewal-process setting with nonnegative increments, we allow sign-changing increments and assume only a positive drift \(μ_θ>0\). For each \(k\in\mathbb N\) we obtain Lorden-type moment bounds, uniform in the barrier \(b\), for \(\mathbb{E}_θ[R_b^k]\) with an explicit remainder term decaying exponentially in \(b\). The proof reduces the problem to the renewal process of strict ascending ladder heights and combines a simple bound for the limiting overshoot moments with a uniform exponential estimate for the rate of convergence of the distribution functions of \(R_b\) to the limiting random variable \(R_\infty\) as \(b\to\infty\), uniformly in \(θ\in[0,θ^\ast]\). As a consequence, the classical constant \((k+2)/(k+1)\) arising in residual-life bounds improves to \(C_k=1\) for sufficiently large \(b\) at fixed \(θ\), and also uniformly over all \(b\ge0\) in the small-drift regime. Counterexamples are provided showing that the stronger inequality with \(kμ_θ\) in the denominator cannot hold uniformly in \((b,θ)\). Finally, the exponential CDF estimate is interpreted in terms of optimal transport: we obtain exponential convergence in the metric \(W_1\), a quantile coupling with \(\mathbb{E}|\widetilde R_b-\widetilde R_\infty|=O(e^{-rb})\), error bounds for Lipschitz functionals and a total-variation bound for smoothed distributions.

2603.09428 2026-03-11 stat.AP

Bayesian Species Distribution Models using Hierarchical Decomposition Priors

Luisa Ferrari, Massimo Ventrucci, Alex Laini

Comments 43 pages, 8 figures

Abstract

Understanding the relative contributions of environmental, spatial, and temporal processes in shaping species distribution is a central objective in ecology. Bayesian species distribution models (SDMs) offer a flexible framework for this task, yet prior specification for variance components remains challenging. To address this issue, we adapt the Hierarchical Decomposition (HD) prior framework to latent Gaussian SDMs, enabling direct and transparent prior control over variance partitioning. The HD approach reparametrizes variances into a total variance and a set of interpretable proportions, structured through a decomposition tree that reflects both model architecture and ecologically meaningful groupings of effects. We discuss a principled approach for a default tree design tailored to SDMs and a practical workflow for the step-by-step implementation of the method. The framework is illustrated using presence--absence data for 39 demersal fish species from the NOAA Northeast Fisheries Science Center fall bottom trawl survey. Results demonstrate predictive performance comparable to established priors, while providing substantially improved interpretability and transparency in variance attribution and prior sensitivity analysis.

2603.09425 2026-03-11 stat.AP cs.AI

CERES: A Probabilistic Early Warning System for Acute Food Insecurity

Tom Danny S. Pedersen

Comments 12 pages, 4 tables, 2 appendices. Live system: https://ceres.northflow.no

Abstract

We present CERES (Calibrated Early-warning and Risk Estimation System), an automated probabilistic forecasting system for acute food insecurity. CERES generates 90-day ahead probability estimates of IPC Phase 3+ (Crisis), Phase 4+ (Emergency), and Phase 5 (Famine) conditions for 43 high-risk countries globally, updated weekly. The system fuses six data streams, namely precipitation anomalies (CHIRPS), vegetation indices (MODIS NDVI), conflict events (ACLED), IPC classifications, food consumption scores (WFP), and cereal price indices (FAO/WFP), through a logistic scoring model with author-specified initial coefficients and parametric input-perturbation intervals (n=2,000 draws). In historical back-validation against four IPC Phase 4-5 events selected for data completeness, CERES assigned TIER-1 classification in all four cases; these are in-sample sanity checks only, not prospective performance claims. All prospective predictions are timestamped, cryptographically identified, and archived for public verification against IPC outcome data at the T+90 horizon. To the author's knowledge, CERES is the first famine early warning system that is simultaneously: (1) probabilistic, (2) open-access, (3) continuously running, (4) machine-readable at prediction level, and (5) committed to public prospective verification of every prediction made.

2603.09318 2026-03-11 stat.ME stat.AP stat.OT

Anomaly detection using surprisals

Rob J Hyndman, David T. Frazier

Abstract

Anomaly detection methods are widely used but often rely on ad hoc rules or strong assumptions, and they often focus on tail events, missing "inlier" anomalies that occur in low-density gaps between modes. We propose a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. For each observation we compute its surprisal (the negative log generalized density) and define an anomaly score as the probability of a surprisal at least as large as that observed. This reduces anomaly detection for complex univariate or multivariate data to estimating the upper tail of a univariate surprisal distribution. We develop two model-robust estimators of these tail probabilities: an empirical estimator based on the observed surprisal distribution and an extreme-value estimator that fits a Generalized Pareto Distribution above a high threshold. For the empirical method we give conditions under which tail ordering is preserved and derive finite-sample confidence guarantees via the Dvoretzky--Kiefer--Wolfowitz inequality. For the GPD method we establish broad tail conditions ensuring classical extreme-value behavior. Simulations and applications to French mortality and Test-cricket data show the approach remains effective under substantial model misspecification.
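
The surprisal-based score described above can be sketched in a few lines, assuming a standard normal working model and reading the "empirical estimator" as the fraction of observed surprisals at least as large as the one in question. Both choices are assumptions made for illustration; the paper's estimators, models, and thresholds may differ. The function names are hypothetical.

```python
import math

def normal_surprisal(y, mu=0.0, sigma=1.0):
    """Negative log density of y under a N(mu, sigma^2) working model."""
    z = (y - mu) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2.0 * math.pi))

def empirical_scores(surprisals):
    """For each surprisal s, the fraction of observed surprisals >= s.

    Small values flag observations that are unusually improbable
    relative to the rest of the sample.
    """
    n = len(surprisals)
    return [sum(1 for t in surprisals if t >= s) / n for s in surprisals]

# Toy data: five points near zero and one far out at 4.0.
data = [-0.5, 0.1, 0.3, -0.2, 0.0, 4.0]
surprisals = [normal_surprisal(y) for y in data]
scores = empirical_scores(surprisals)
most_anomalous = scores.index(min(scores))
```

Because the score depends only on the ordering of surprisals, a point in a low-density gap between modes would also receive a small score under a working model that assigns it low density, which a pure tail rule on the raw data would miss.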

2603.09314 2026-03-11 math.ST stat.TH

Second order asymptotics for the number of times an estimator is more than epsilon from its target value

Nils Lid Hjort, Grete Fenstad

Comments 11 pages, no figures; Statistical Research Report, Department of Mathematics, University of Oslo, September 1994, but now arXiv'd March 2026. The paper has appeared in essentially this form in Journal of Statistical Planning and Inference, 1995, vol. 48, pages 261-275, at this url: www.sciencedirect.com/science/article/pii/037837589500008W

Abstract

Suppose $\{\widehat{θ}_n\colon n\ge1\}$ is a strongly consistent sequence of estimators for a parameter $θ$, where $\widehat{θ}_n$ is based on the first $n$ observations. Consider $Q_\varepsilon$, the number of times $|\widehat{θ}_n-θ|\ge\varepsilon$. In another paper (Hjort and Fenstad, 1992) we have shown that $\varepsilon^2 Q_\varepsilon$ has a limit distribution as $\varepsilon\rightarrow0$, depending only on $σ$, the standard deviation of the limit distribution for $\sqrt{n}(\widehat{θ}_n-θ)$, under natural regularity conditions. The present paper investigates some second order asymptotics for differences between $Q_\varepsilon$ variables. The limit of ${\rm E}(Q_{1,\varepsilon}-Q_{2,\varepsilon})$ is calculated in cases where ${\rm E} Q_{1,\varepsilon}/{\rm E} Q_{2,\varepsilon}$ goes to 1, leading to a notion of 'asymptotic relative deficiency' in cases where the asymptotic relative efficiency is 1. This is used to distinguish between competing estimators with identical limit distributions. Thus using denominator $n-{1\over3}$ in the familiar formula for estimating a normal variance is better than both $n$ and $n-1$ and indeed all other choices, for example, in the sense of leading to the smallest possible expected number of $\varepsilon$ errors. Results of this type are found in a selection of familiar estimation problems, using limit results for expected differences, and are compared to corresponding asymptotic relative deficiency analysis in the sense of Hodges and Lehmann. Some second order distributional results are reached as well. It is shown how $\varepsilon$ times a $Q_\varepsilon$-difference tends to a variable which is related to some exponential distributions associated with Brownian motion, and that have recently been investigated by Hjort and Khasminskii (1993).

2603.09310 2026-03-11 cs.LG math.PR stat.ML

A Gaussian Comparison Theorem for Training Dynamics in Machine Learning

Ashkan Panahi

Abstract

We study training algorithms with data following a Gaussian mixture model. For a specific family of such algorithms, we present a non-asymptotic result, connecting the evolution of the model to a surrogate dynamical system, which can be easier to analyze. The proof of our result is based on the celebrated Gordon comparison theorem. Using our theorem, we rigorously prove the validity of the dynamic mean-field (DMF) expressions in the asymptotic scenarios. Moreover, we suggest an iterative refinement scheme to obtain more accurate expressions in non-asymptotic scenarios. We specialize our theory to the analysis of training a perceptron model with a generic first-order (full-batch) algorithm and demonstrate that fluctuation parameters in a non-asymptotic domain emerge in addition to the DMF kernels.

2603.09306 2026-03-11 stat.ME

Contrastive Bayesian Inference for Unnormalized Models

Naruki Sonobe, Shonosuke Sugasawa, Daichi Mochihashi, Takeru Matsuda

Abstract

Unnormalized (or energy-based) models provide a flexible framework for capturing the characteristics of data with complex dependency structures. However, the application of standard Bayesian inference methods has been severely limited because the parameter-dependent normalizing constant is either analytically intractable or computationally prohibitive to evaluate. A promising approach is score-based generalized Bayesian inference, which avoids evaluating the normalizing constant by replacing the likelihood with a scoring rule. However, this approach requires careful tuning of the likelihood information, and it may fail to yield valid inference without appropriate control. To overcome this difficulty, we propose a fully Bayesian framework for inference on unnormalized models that does not require such tuning. We build on noise contrastive estimation, which recasts inference as a binary classification problem between observed and noise samples, and treat the normalizing constant as an additional unknown parameter within the resulting likelihood. For exponential families, the classification likelihood becomes conditionally Gaussian via Pólya-Gamma data augmentation, leading to a simple Gibbs sampler. We demonstrate the proposed approach through two models: time-varying density models for temporal point process data and sparse torus graph models for multivariate circular data. Through simulation studies and real-data analyses, the proposed method provides accurate point estimation and enables principled uncertainty quantification.

2603.09257 2026-03-11 cs.LG stat.ML

Transductive Generalization via Optimal Transport and Its Application to Graph Node Classification

MoonJeong Park, Seungbeom Lee, Kyungmin Kim, Jaeseung Heo, Seunghyuk Cho, Shouheng Li, Sangdon Park, Dongwoo Kim

Abstract

Many existing transductive bounds rely on classical complexity measures that are computationally intractable and often misaligned with empirical behavior. In this work, we establish new representation-based generalization bounds in a distribution-free transductive setting, where learned representations are dependent, and test features are accessible during training. We derive global and class-wise bounds via optimal transport, expressed in terms of Wasserstein distances between encoded feature distributions. We demonstrate that our bounds are efficiently computable and strongly correlate with empirical generalization in graph node classification, improving upon classical complexity measures. Additionally, our analysis reveals how the GNN aggregation process transforms the representation distributions, inducing a trade-off between intra-class concentration and inter-class separation. This yields depth-dependent characterizations that capture the non-monotonic relationship between depth and generalization error observed in practice. The code is available at https://github.com/ml-postech/Transductive-OT-Gen-Bound.

2603.09251 2026-03-11 stat.ML cs.LG cs.NA math.NA

A Generative Sampler for distributions with possible discrete parameter based on Reversibility

Lei Li, Zhen Wang, Lishuo Zhang

Abstract

Learning to sample from complex unnormalized distributions is a fundamental challenge in computational physics and machine learning. While score-based and variational methods have achieved success in continuous domains, extending them to discrete or mixed-variable systems remains difficult due to ill-defined gradients or high variance in estimators. We propose a unified, target-gradient-free generative sampling framework applicable across diverse state spaces. Building on the fact that detailed balance implies the time-reversibility of the equilibrium stochastic process, we enforce this symmetry as a statistical constraint. Specifically, using a prescribed physical transition kernel (such as Metropolis-Hastings), we minimize the Maximum Mean Discrepancy (MMD) between the joint distributions of forward and backward Markov trajectories. Crucially, this training procedure relies solely on energy evaluations via acceptance ratios, circumventing the need for target score functions or continuous relaxations. We demonstrate the versatility of our method on three distinct benchmarks: (1) a continuous multi-modal Gaussian mixture, (2) the discrete high-dimensional Ising model, and (3) a challenging hybrid system coupling discrete indices with continuous dynamics. Experiments show that our framework accurately reproduces thermodynamic observables and captures mode-switching behavior across all regimes, offering a physically grounded and universally applicable alternative for equilibrium sampling.

2603.09168 2026-03-11 cs.LG cs.DS stat.ML

Better Bounds for the Distributed Experts Problem

David P. Woodruff, Samson Zhou

Abstract

In this paper, we study the distributed experts problem, where $n$ experts are distributed across $s$ servers for $T$ timesteps. The loss of each expert at each time $t$ is the $\ell_p$ norm of the vector that consists of the losses of the expert at each of the $s$ servers at time $t$. The goal is to minimize the regret $R$, i.e., the loss of the distributed protocol compared to the loss of the best expert, amortized over all $T$ times, while using the minimum amount of communication. We give a protocol that achieves regret roughly $R\gtrsim\frac{1}{\sqrt{T}\cdot\text{poly}\log(nsT)}$, using $\mathcal{O}\left(\frac{n}{R^2}+\frac{s}{R^2}\right)\cdot\max(s^{1-2/p},1)\cdot\text{poly}\log(nsT)$ bits of communication, which improves on previous work.

2603.06602 2026-03-11 cs.LG stat.ML

Khatri-Rao Clustering for Data Summarization

Martino Ciaperoni, Collin Leiber, Aristides Gionis, Heikki Mannila

Details
English abstract

As datasets continue to grow in size and complexity, finding succinct yet accurate data summaries poses a key challenge. Centroid-based clustering, a widely adopted approach to address this challenge, finds informative summaries of datasets in terms of few prototypes, each representing a cluster in the data. Despite the wide adoption of this approach, the resulting data summaries often contain redundancies, limiting their effectiveness, particularly in datasets characterized by a large number of underlying clusters. To overcome this limitation, we introduce the Khatri-Rao clustering paradigm that extends traditional centroid-based clustering to produce more succinct but equally accurate data summaries by postulating that centroids arise from the interaction of two or more succinct sets of protocentroids. We study two central approaches to centroid-based clustering, namely the well-established k-Means algorithm and the increasingly popular topic of deep clustering, under the lens of the Khatri-Rao paradigm. To this end, we introduce the Khatri-Rao k-Means algorithm and the Khatri-Rao deep clustering framework. Extensive experiments show that Khatri-Rao k-Means can strike a more favorable trade-off between succinctness and accuracy in data summarization than standard k-Means. Leveraging representation learning, the Khatri-Rao deep clustering framework offers even greater benefits, further reducing the size of data summaries given by deep clustering while preserving their accuracy.

2603.00945 2026-03-11 math.OC cs.LG stat.ML

Non-Rectangular Average-Reward Robust MDPs: Optimal Policies and Their Transient Values

Shengbo Wang, Nian Si

Details
English abstract

We study non-rectangular robust Markov decision processes under the average-reward criterion, where the ambiguity set couples transition probabilities across states and the adversary commits to a stationary kernel for the entire horizon. We show that any history-dependent policy achieving sublinear expected regret uniformly over the ambiguity set is robust-optimal, and that the robust value admits a minimax representation as the infimum over the ambiguity set of the classical optimal gains, without requiring any form of rectangularity or robust dynamic programming principle. Under the weak communication assumption, we establish the existence of such policies by converting high-probability regret bounds from the average-reward reinforcement learning literature into the expected-regret criterion. We then introduce a transient-value framework to evaluate finite-time performance of robust optimal policies, proving that average-reward optimality alone can mask arbitrarily poor transients and deriving regret-based lower bounds on transient values. Finally, we construct an epoch-based policy that combines an optimal stationary policy for the worst-case model with an anytime-valid sequential test and an online learning fallback, achieving a constant-order transient value.

2602.20007 2026-03-11 math.ST stat.ME stat.TH

Order-Induced Variance in the Moving-Range Sigma Estimator: A Total-Variance Decomposition

Andrew T. Karl

Details
English abstract

I-MR charts commonly estimate the process standard deviation $σ$ via the span-2 average moving range divided by the unbiasing constant $d_2$; unlike the unbiased sample standard deviation ($S/c_4$), this estimator depends on ordering through adjacency, so permuting a fixed sample changes it. We formalize this by introducing an independent uniformly random permutation and applying the law of total variance, yielding an exact decomposition into a values component (variance of the permutation mean) and an adjacency component (expected conditional variance over permutations). The permutation mean is order-invariant and equals $\mathrm{GMD}/d_2$, where $\mathrm{GMD}$ is the sample Gini mean difference. Under i.i.d. Normal sampling, both components admit closed forms; the adjacency fraction converges to $0.3813$, and the familiar asymptotic efficiency loss relative to $S/c_4$ is almost entirely an adjacency effect.
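
As a minimal sketch of the order dependence described in this abstract (not the author's code), the Python snippet below computes the span-2 moving-range estimate for a small hypothetical sample, shows that reordering the sample changes it, and checks by brute-force enumeration that its mean over all permutations equals $\mathrm{GMD}/d_2$. It assumes $d_2 = 2/\sqrt{\pi}$, the span-2 constant under normality, and takes the Gini mean difference to be the average absolute difference over distinct pairs. The sample values and function names are illustrative only.

```python
import itertools
import math

# Assumed unbiasing constant for span-2 ranges under normality: E|X1 - X2| = d_2 * sigma.
D2 = 2.0 / math.sqrt(math.pi)


def mr_sigma(x):
    """Span-2 average moving range divided by d_2 (depends on the order of x)."""
    ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    return sum(ranges) / len(ranges) / D2


def gini_mean_difference(x):
    """Average absolute difference over all distinct pairs (order-invariant)."""
    n = len(x)
    total = sum(abs(a - b) for a, b in itertools.combinations(x, 2))
    return 2.0 * total / (n * (n - 1))


def permutation_mean(x):
    """Mean of the moving-range estimate over every ordering of x."""
    perms = list(itertools.permutations(x))
    return sum(mr_sigma(p) for p in perms) / len(perms)


sample = [2.0, 3.5, 1.0, 4.2, 2.8]  # hypothetical data, 5! = 120 orderings
perm_mean = permutation_mean(sample)
gmd_over_d2 = gini_mean_difference(sample) / D2

print("as observed:", mr_sigma(sample))
print("sorted     :", mr_sigma(sorted(sample)))
print("perm. mean :", perm_mean, " GMD/d2:", gmd_over_d2)
```

Full enumeration is only feasible for very small samples; for larger ones the identity can be checked approximately by averaging over randomly drawn permutations.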

2601.14947 2026-03-11 math.ST stat.ME stat.TH

Central subspace data depth

Giacomo Francisci, Claudio Agostinelli

Comments 25+34 pages, 7+4 figures

Details
English abstract

Statistical data depth plays an important role in the analysis of multivariate data sets. The main outcome is a center-outward ordering of the observations that can be used both to highlight features of the underlying distribution of the data and as input to further statistical analysis. An important property of data depth is related to symmetric distributions as the point with the highest depth value, the center, coincides with the point of symmetry. However, there are applications in which it is more natural to consider symmetry with respect to a subspace of a certain dimension rather than to a point, i.e. a subspace of dimension zero. We provide a general framework to construct statistical data depths which attain maximum value in a subspace, providing a center-outward ordering from that subspace. We refer to these data depths as central subspace data depths. Moreover, if the distribution is symmetric with respect to a subspace, then the depth is maximized at that subspace. We introduce general notions of symmetry about a subspace for distributions, study the properties of central subspace data depths and provide asymptotic convergence for the corresponding sample versions. Additionally, we discuss connections with projection pursuit and dimension reduction. An application based on custom data fraud detection shows the importance of the proposed approach and strengthens its potential.

2601.05355 2026-03-11 stat.ML cs.AI cs.LG stat.CO stat.ME

An AI-powered Bayesian Generative Modeling Approach for Arbitrary Conditional Inference

Qiao Liu, Wing Hung Wong

Details
English abstract

Modern data analysis increasingly requires flexible conditional inference P(X_B | X_A), where (X_A, X_B) is an arbitrary partition of the observed variable X. Existing approaches are either restricted to a fixed conditioning structure or depend strongly on the distribution of conditioning masks during training. To address these limitations, we introduce Bayesian generative modeling (BGM), a unified framework for arbitrary conditional inference. BGM learns a generative model of X via a stochastic iterative Bayesian updating algorithm in which model parameters and latent variables are updated until convergence. Once trained, any conditional distribution can be obtained without retraining. Empirically, BGM achieves superior predictive performance with posterior predictive intervals, demonstrating that a single learned model can serve as a universal engine for conditional prediction with principled uncertainty quantification. We provide theoretical guarantees for convergence of the stochastic iterative algorithm, statistical consistency, and conditional risk bounds. The proposed BGM framework leverages modern AI to capture complex relationships among variables while adhering to Bayesian principles, offering a promising approach for a wide range of applications in modern data science. Code for BGM is available at https://github.com/liuq-lab/bayesgm. Documentation for BGM is available at https://bayesgm.readthedocs.io.

2512.11427 2026-03-11 stat.ME

Conditional Copula models using loss-based Bayesian Additive Regression Trees

Tathagata Basu, Fabrizio Leisen, Cristiano Villa, Kevin Wilson

Comments Typos related to the loss function inside the prior are fixed

Details
English abstract

The study of dependence between random variables under external influences is a challenging problem in multivariate analysis. We address this by proposing a novel semi-parametric approach for conditional copula models using Bayesian additive regression trees (BART) models. BART is becoming a popular approach in statistical modelling due to its simple ensemble-type formulation, complemented by its ability to provide inferential insights. Although BART allows us to model complex functional relationships, it tends to suffer from overfitting. In this article, we exploit a loss-based prior for the tree topology that is designed to reduce the tree complexity. In addition, we propose a novel adaptive Reversible Jump Markov Chain Monte Carlo algorithm that is ergodic in nature and requires very few assumptions, allowing us to model complex and non-smooth likelihood functions with ease. Moreover, we show that our method can efficiently recover the true tree structure and approximate a complex conditional copula parameter, and that our adaptive routine can explore the true likelihood region under a sub-optimal proposal variance. Lastly, we provide case studies concerning the effect of gross domestic product on the dependence between the life expectancies and literacy rates of the male and female populations of different countries.

2509.18978 2026-03-11 math.ST math.DG math.PR stat.TH

Refining Cramér-Rao Bound With Multivariate Parameters: An Extrinsic Geometry Perspective

Sunder Ram Krishnan

Comments Vector parameter extension of work done in arXiv:2509.17886

Details
English abstract

We derive a vector generalization of the curvature-corrected Cramér-Rao bound (CRB) in the nonasymptotic regime using a Hilbert space square-root embedding. Building on previous scalar results, we establish a directional curvature correction derived from the second fundamental form of the model manifold. To obtain matrix-valued refinements, we formulate sufficient conditions for a conservative matrix-level correction using a semidefinite program (SDP) based on sum-of-squares (SOS) relaxations. The framework is rigorously illustrated with two distinct geometries: (i) a curved Gaussian location model, which reveals a characteristic pinching effect where directional bounds vanish along principal axes despite non-zero extrinsic curvature, and where classical subspace-based bounds using the second-order Bhattacharyya matrix provide overly optimistic variance predictions that fail to account for the manifold's directional topology, and (ii) a spherical multinomial model where the curvature is isotropic. Our results demonstrate that while classical second-order corrections using the Bhattacharyya matrix provide useful benchmarks derived from the local coordinate basis, the proposed directional and SOS-certified bounds offer a more faithful and geometry-consistent representation of the directional sensitivity and fundamental limits of estimation in curved statistical families.

2509.17886 2026-03-11 math.ST math.DG math.PR stat.TH

Improving Cramér-Rao Bound And Its Variants: An Extrinsic Geometry Perspective

Sunder Ram Krishnan

Comments Improved and corrected version

Details
English abstract

This work presents a geometric refinement of the classical Cramér-Rao bound (CRB) in the non-asymptotic regime by incorporating curvature-aware corrections based on the second fundamental form associated with the statistical model manifold. That is, our formulation shows that relying on the extrinsic geometry of the square root embedding of the manifold in the ambient Hilbert space comprising square integrable functions with respect to a fixed base measure offers a rigorous (and intuitive) way to improve upon the CRB and some of its variants, such as the Bhattacharyya-type bounds, that use higher-order derivatives of the log-likelihood. Precisely, the improved bounds in the latter case make explicit use of the elegant framework offered by employing the Faà di Bruno formula and exponential Bell polynomials in expressing the jets associated with the square root embedding in terms of the raw scores. The interplay between the geometry of the statistical embedding and the behavior of the estimator variance is quantitatively analyzed in concrete examples, showing that our corrections can meaningfully tighten the lower bound, suggesting further exploration into connections with estimator efficiency in more general situations.

2509.10325 2026-03-11 stat.ME

Using the rejection sampling for finding tests

Markku Kuismin

Details
English abstract

A new method based on rejection sampling for finding statistical tests is proposed. This method is conceptually intuitive, easy to implement, and applicable in arbitrary dimension. To illustrate its potential applicability, three distinct empirical examples are presented: (1) examining the differences between group means of correlated (repeated) or independent samples, (2) examining whether a mean vector equals a specific fixed vector, and (3) investigating whether samples come from a specific population distribution. The simulation examples indicate that the new test has statistical power similar to that of uniformly most powerful (unbiased) tests. Moreover, these examples demonstrate that the new test is a powerful goodness-of-fit test.

2509.10166 2026-03-11 stat.ML cs.LG

Repulsive Monte Carlo on the sphere for the sliced Wasserstein distance

Vladimir Petrovic, Rémi Bardenet, Agnès Desolneux

Details
English abstract

In this paper, we consider the problem of computing the integral of a function on the unit sphere, in any dimension, using Monte Carlo methods. Although the methods we present are general, our guiding thread is the sliced Wasserstein distance between two measures on $\mathbb{R}^d$, which is precisely an integral on the $d$-dimensional sphere. The sliced Wasserstein distance (SW) has gained momentum in machine learning either as a proxy to the less computationally tractable Wasserstein distance, or as a distance in its own right, due in particular to its built-in alleviation of the curse of dimensionality. There have been recent numerical benchmarks of quadratures for the sliced Wasserstein, and our viewpoint differs in that we concentrate on quadratures where the nodes are repulsive, i.e. negatively dependent. Indeed, negative dependence can bring variance reduction when the quadrature is adapted to the integration task. Our first contribution is to extract and motivate quadratures from the recent literature on determinantal point processes (DPPs) and repelled point processes, as well as repulsive quadratures from the literature specific to the sliced Wasserstein distance. We then numerically benchmark these quadratures. Moreover, we analyze the variance of the UnifOrtho estimator, an orthogonal Monte Carlo estimator. Our analysis sheds light on UnifOrtho's success for the estimation of the sliced Wasserstein in large dimensions, as well as counterexamples from the literature. Our final recommendation for the computation of the sliced Wasserstein distance is to use randomized quasi-Monte Carlo in low dimensions and UnifOrtho in large dimensions. DPP-based quadratures only shine when quasi-Monte Carlo also does, while repelled quadratures show moderate variance reduction in general, but more theoretical effort is needed to make them robust.
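
To make the idea of negatively dependent directions concrete, here is a hedged Python sketch of a sliced Wasserstein estimate that uses a random orthonormal frame as its set of directions. It is one reading of an orthogonal Monte Carlo estimator and is not claimed to match the paper's UnifOrtho construction. The frame is obtained by Gram-Schmidt on Gaussian vectors, the two point clouds are hypothetical, and all function names are illustrative. When one cloud is a translate of the other by a vector $v$, the squared sliced 2-Wasserstein distance is $\|v\|^2/d$, and a full orthonormal frame recovers that value exactly, whereas the same number of independent directions only does so on average.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_direction(d, rng):
    """One direction drawn uniformly on the unit sphere of R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(g, g))
    return [a / norm for a in g]


def random_orthonormal_frame(d, rng):
    """Gram-Schmidt on Gaussian vectors: d mutually orthogonal unit directions."""
    frame = []
    while len(frame) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in frame:
            c = dot(v, u)
            v = [a - c * b for a, b in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-8:  # skip numerically degenerate draws
            frame.append([a / norm for a in v])
    return frame


def sliced_w2_squared(xs, ys, directions):
    """Average over directions of the squared 1-D W2 between projected samples.

    Assumes xs and ys contain the same number of points, so the 1-D optimal
    coupling simply matches sorted projections.
    """
    total = 0.0
    for theta in directions:
        px = sorted(dot(x, theta) for x in xs)
        py = sorted(dot(y, theta) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / len(directions)


rng = random.Random(0)
d = 3
shift = [1.0, -2.0, 0.5]  # hypothetical translation vector
xs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(20)]
ys = [[a + b for a, b in zip(x, shift)] for x in xs]

exact = dot(shift, shift) / d
est_ortho = sliced_w2_squared(xs, ys, random_orthonormal_frame(d, rng))
est_iid = sliced_w2_squared(xs, ys, [random_direction(d, rng) for _ in range(d)])
print("exact:", exact, " orthogonal:", est_ortho, " i.i.d.:", est_iid)
```

The translation case is deliberately easy; for general pairs of measures an orthogonal frame is no longer exact, and the variance comparison has to be made empirically.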

2506.20533 2026-03-11 stat.ML cs.LG math.OC

Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery

Gilad Lerman, Kang Li, Tyler Maunu, Teng Zhang

Details
English abstract

Robust subspace estimation is fundamental to many machine learning and data analysis tasks. Iteratively Reweighted Least Squares (IRLS) is an elegant and empirically effective approach to this problem, yet its theoretical properties remain poorly understood. This paper establishes that, under deterministic conditions, a variant of IRLS with dynamic smoothing regularization converges linearly to the underlying subspace from any initialization. We extend these guarantees to affine subspace estimation, a setting that lacks prior recovery theory. Additionally, we illustrate the practical benefits of IRLS through an application to low-dimensional neural network training. Our results provide the first global convergence guarantees for IRLS in robust subspace recovery and, more broadly, for nonconvex IRLS on a Riemannian manifold.

2506.12842 2026-03-11 cs.SI cs.LG stat.ML

Uncovering Social Network Activity Using Joint User and Topic Interaction

Gaspard Abel, Argyris Kalogeratos, Jean-Pierre Nadal, Julien Randon-Furling

Comments Content: 13 pages, 8 figures, 4 tables

Details
Journal ref
IEEE Transactions on Computational Social Systems, 2026
English abstract

The emergence of online social platforms, such as social networks and social media, has drastically affected the way people apprehend the information flows to which they are exposed. In such platforms, the various information cascades spreading among users are the main force creating complex dynamics of opinion formation, each user being characterized by their own behavior adoption mechanism. Moreover, the spread of multiple pieces of information or beliefs in a networked population is rarely uncorrelated. In this paper, we introduce the Mixture of Interacting Cascades (MIC), a model of marked multidimensional Hawkes processes with the capacity to jointly model non-trivial interactions between cascades and users. We emphasize the interplay between information cascades and user activity, and use a mixture of temporal point processes to build a coupled user/cascade point process model. Experiments on synthetic and real data highlight the benefits of this approach and demonstrate that MIC achieves superior performance to existing methods in modeling the spread of information cascades. Finally, we demonstrate how MIC can provide, through its learned parameters, insightful bi-layered visualizations of real social network activity data.

2506.00168 2026-03-11 q-bio.QM q-bio.CB stat.ML

SSRCA: a novel machine learning pipeline to perform sensitivity analysis for agent-based models

Edward H. Rohr, John T. Nardini

Details
English abstract

Agent-based models (ABMs) are widely used in biology to understand how individual actions scale into emergent population behavior. Modelers employ sensitivity analysis (SA) algorithms to quantify input parameters' impact on model outputs; however, it is hard to perform SA for ABMs due to their computational and complex nature. In this work, we develop the Simulate, Summarize, Reduce, Cluster, and Analyze (SSRCA) methodology, a machine-learning-based pipeline designed to facilitate SA for ABMs. In particular, SSRCA can achieve the following tasks for ABMs: 1) identify sensitive model parameters, 2) reveal common output model patterns, and 3) determine which input parameter values generate these patterns. We use an example ABM of tumor spheroid growth to showcase how SSRCA identifies four common patterns from the ABM and the parameter regions that generate these outputs. Additionally, we compare the SA results between SSRCA and the popular Sobol' Method and find that SSRCA's identified sensitive parameters are robust to the choice of model descriptors while Sobol's are not. This analysis could streamline data-driven tasks, such as parameter estimation, for ABMs by reducing parameter space. While we highlight these results with an ABM on tumor spheroid formation, the SSRCA Methodology is broadly applicable to biological ABMs.

2503.20940 2026-03-11 stat.ME

A Restricted Latent Class Hidden Markov Model for Polytomous Responses, Polytomous Attributes, and Covariates: Identifiability and Application

Eric Alan Wayman, Steven Andrew Culpepper, Jeff Douglas, Jesse Bowers

Comments 60 pages, 3 figures, 34 tables. Edited language for clarity, removed one table, and fixed typos. Published in the Journal of Educational and Behavioral Statistics

Details
Journal ref
Journal of Educational and Behavioral Statistics (February, 2026)
English abstract

We introduce a restricted latent class exploratory model for longitudinal data with ordinal attributes and respondent-specific covariates. Responses follow a time-inhomogeneous hidden Markov model where the probability of a respondent's latent state at the current time point is conditional on the respondent's latent state at the previous time point as well as the respondent's covariates at the current time point. We prove that the model is identifiable, state a Bayesian formulation, and demonstrate its efficacy in a variety of scenarios through two simulation studies. We apply the model to response data from a mathematics examination, comparing the results to a previously published confirmatory analysis, and also apply it to emotional state response data measured over a several-day period.

2502.15933 2026-03-11 stat.ME stat.AP

Empirical best prediction of poverty indicators via nested error regression with high dimensional parameters

Yuting Chen, Partha Lahiri, Nicola Salvati

Details
English abstract

The Nested Error Regression Model with High-Dimensional Parameters (NERHDP) is extended to address challenges in small area poverty estimation. A robust and flexible framework is proposed to derive empirical best predictors (EBPs) of small area poverty indicators while accommodating heterogeneity in regression coefficients and sampling variances across areas. To mitigate the computational limitations of the existing algorithm, an efficient estimation procedure is introduced, substantially reducing computation time and enhancing scalability for large datasets. A novel approach for generating area-specific poverty estimates in out-of-sample areas is also developed, improving the reliability of synthetic estimates. Uncertainty is quantified through a parametric bootstrap method specifically tailored to the extended model. Under heterogeneous data-generating scenarios, the proposed method yields lower relative bias and relative root mean squared prediction error than existing approaches. The methodology is further illustrated using data from the 2002 Albania Living Standards Measurement Survey, combined with auxiliary information from the 2001 census, to estimate poverty indicators for 374 municipalities.

2410.09067 2026-03-11 stat.AP cs.CG physics.soc-ph

Evaluating Cooling Center Coverage Using Persistent Homology of a Filtered Witness Complex

Erin O'Neil, Sarah Tymochko

Details
English abstract

In light of the increase in frequency of extreme heat events, there is a critical need to develop tools to identify geographic locations that are at risk of heat-related mortality. This paper aims to identify such locations by assessing holes in cooling-center coverage using persistent homology (PH), a method from topological data analysis (TDA). Persistent homology has shown promising results in identifying holes in coverage of specific resources. We adapt these methods using a witness complex construction to study the coverage of cooling centers. We test our approach on four locations (central Boston, MA; central Austin, TX; Portland, OR; and Miami, FL) and use death times, a measurement of the size and scale of the gap in coverage, to identify the most at-risk regions. For comparison, we implement a standard technique for studying the risk of heat-related mortality called a heat vulnerability index (HVI). The HVI is a numerical score calculated for a geographic area based on demographic information. PH and the HVI identify different locations as vulnerable, thus indicating a potential value of assessing vulnerability from multiple perspectives. By using the regions identified by both persistent homology and the HVI, we provide a more holistic understanding of coverage.

2410.05861 2026-03-11 stat.ME econ.EM

Persistence-Robust Break Detection in Predictive CoVaR Regressions

Yannick Hoga

Details
English abstract

Forecasting risk (as measured by quantiles) and systemic risk (as measured by Adrian and Brunnermeier's (2016) CoVaR) is important in economics and finance. However, past research has shown that predictive relationships may be unstable over time. Therefore, this paper develops structural break tests in predictive quantile and CoVaR regressions. These tests can detect changes in the forecasting power of covariates, and are based on the principle of self-normalization. We show that our tests are valid irrespective of whether the predictors are stationary or near-stationary, rendering the tests suitable for a range of practical applications. Simulations illustrate the good finite-sample properties of our tests. Two empirical applications concerning equity premium and systemic risk forecasting models show the usefulness of the tests.

2410.02840 2026-03-11 cs.LG cs.CY math.ST stat.TH

Overcoming Representation Bias in Fairness-Aware data Repair using Optimal Transport

Abigail Langbridge, Anthony Quinn, Robert Shorten

Details
English abstract

Optimal transport (OT) has an important role in transforming data distributions in a manner which engenders fairness. Typically, the OT operators are learnt from the unfair attribute-labelled data, and then used for their repair. Two significant limitations of this approach are as follows: (i) the OT operators for underrepresented subgroups are poorly learnt (i.e. they are susceptible to representation bias); and (ii) these OT repairs cannot be effected on identically distributed but out-of-sample (i.e. archival) data. In this paper, we address both of these problems by adopting a Bayesian nonparametric stopping rule for learning each attribute-labelled component of the data distribution. The induced OT-optimal quantization operators can then be used to repair the archival data. We formulate a novel definition of the fair distributional target, along with quantifiers that allow us to trade fairness against damage in the transformed data. These are used to reveal excellent performance of our representation-bias-tolerant scheme in simulated and benchmark data sets.

2407.05277 2026-03-11 eess.SP math.ST stat.TH

Einstein from Noise: Statistical Analysis

Amnon Balanov, Wasim Huleihel, Tamir Bendory

Details
English abstract

"Einstein from noise" (EfN) is a prominent example of the model bias phenomenon: systematic errors in the statistical model that lead to spurious but consistent estimates. In the EfN experiment, one falsely believes that a set of observations contains noisy, shifted copies of a template signal (e.g., an Einstein image), whereas in reality, it contains only pure noise observations. To estimate the signal, the observations are first aligned with the template using cross-correlation, and then averaged. Although the observations contain nothing but noise, it was recognized early on that this process produces a signal that resembles the template signal! This pitfall was at the heart of a central scientific controversy about validation techniques in structural biology. This paper provides a comprehensive statistical analysis of the EfN phenomenon above. We show that the Fourier phases of the EfN estimator (namely, the average of the aligned noise observations) converge to the Fourier phases of the template signal, explaining the observed structural similarity. Additionally, we prove that the convergence rate is inversely proportional to the number of noise observations and, in the high-dimensional regime, to the Fourier magnitudes of the template signal. Moreover, in the high-dimensional regime, the Fourier magnitudes converge to a scaled version of the template signal's Fourier magnitudes. This work not only deepens the theoretical understanding of the EfN phenomenon but also highlights potential pitfalls in template matching techniques and emphasizes the need for careful interpretation of noisy observations across disciplines in engineering, statistics, physics, and biology.
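
The align-then-average procedure described in this abstract can be reproduced in a few lines. The following Python sketch is a one-dimensional toy version under stated assumptions, not the authors' experiment: the template is a hypothetical zero-mean signal of length 32 standing in for the Einstein image, alignment uses circular cross-correlation over integer shifts, and the observations are i.i.d. standard Gaussian noise. All function names are illustrative.

```python
import math
import random


def circular_shift(x, s):
    """Cyclic shift: out[k] = x[(k + s) % len(x)]."""
    return x[s:] + x[:s]


def best_shift(obs, template):
    """Shift of obs that maximizes its inner product with the template."""
    def score(s):
        return sum(a * b for a, b in zip(circular_shift(obs, s), template))
    return max(range(len(template)), key=score)


def efn_estimate(template, n_obs, rng):
    """Align pure-noise observations to the template, then average them."""
    d = len(template)
    total = [0.0] * d
    for _ in range(n_obs):
        noise = [rng.gauss(0.0, 1.0) for _ in range(d)]  # contains no signal
        aligned = circular_shift(noise, best_shift(noise, template))
        total = [t + a for t, a in zip(total, aligned)]
    return [t / n_obs for t in total]


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den


d = 32
# Hypothetical zero-mean template; any fixed signal could be used instead.
template = [math.sin(2 * math.pi * k / d) + 0.5 * math.cos(4 * math.pi * k / d)
            for k in range(d)]
estimate = efn_estimate(template, n_obs=500, rng=random.Random(0))
correlation = pearson(estimate, template)
print(f"correlation between averaged noise and template: {correlation:.3f}")
```

The positive correlation is built into the procedure: each observation is shifted to the position where its inner product with the template is largest, so the average inherits a component along the template even though no observation contains it.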

2402.18741 2026-03-11 stat.ME

Spectral Graph Filtering for Modality-Specific Representation Learning

Shira Yoffe, Amit Moscovich, Ariel Jaffe

Details
English abstract

Multimodal datasets, where measurements are obtained from multiple sensors, have become central to many scientific domains. In unsupervised settings, most representation learning methods focus on identifying shared latent structures, such as clusters or continuous processes that appear across modalities. However, some aspects of the data may be observed only through a single modality. For example, in computational biology, certain cell-subtypes may appear in genetic profiles but not in epigenetic markers. In this paper, we present DELVE, a spectral method for extracting modality-specific (differential) latent variables. Our approach constructs a graph for each modality and leverages differences in their connectivity patterns to design a graph filter that attenuates shared signals while preserving modality-specific components. We provide an asymptotic convergence analysis for our method under a product manifold model. To evaluate the performance of our method, we test its ability to recover differential latent structures in several synthetic and real datasets.

2311.15485 2026-03-11 stat.ME

Calibrated Generalized Bayesian Inference

David T. Frazier, Christopher Drovandi, Robert Kohn

Comments This paper is a substantially revised version of arXiv:2302.06031v1. This revised version has a slightly different focus, additional examples, and theoretical results, as well as different authors

Details
English abstract

We propose a simple approach that provides accurate uncertainty quantification for Bayesian inference in misspecified or approximate models, and for generalized (Gibbs) posteriors. While existing solutions in this context are based on explicit Gaussian approximations or post-processing procedures, we demonstrate that correct uncertainty quantification can be achieved by substituting the usual posterior with an intuitively appealing alternative that conveys the same information. This solution applies to both likelihood-based and loss-based posteriors, and is formally demonstrated to reliably quantify uncertainty. This new approach is demonstrated through a range of examples, including generalized linear models, and doubly intractable models.

2210.13687 2026-03-11 stat.AP cs.CY

Implicit Biases in Refereeing: Lessons from NBA Referees

Konstantinos Pelechrinis

Details
English abstract

Implicit biases occur automatically and unintentionally and are particularly present when we have to make split-second decisions. One such situation appears in refereeing, where referees have to make an instantaneous decision on a potential violation. In this work we revisit and extend some of the existing work on implicit biases in refereeing. In particular, we focus on refereeing in the NBA and examine three different types of implicit bias: (i) home-vs-away bias, (ii) bias towards individual players or teams, and (iii) racial bias. For our study, we use play-by-play data and data from the Last Two Minutes reports the league office releases for games that were within 5 points in the last 2 minutes since the 2015 season. Our results indicate that there is a bias towards the home team - particularly pronounced during the playoffs - but it has been reduced since the COVID-19 pandemic. Furthermore, there is robust statistical evidence that specific players benefit from referee decisions more than expected from pure chance. However, we find no evidence of negative bias towards individual players, or towards specific teams. Finally, our analysis on racial bias indicates the absence of any bias.

1904.11060 2026-03-11 econ.EM math.ST stat.TH

Normal Approximation in Large Network Models

Michael P. Leung, Hyungsik Roger Moon

Details
English abstract

We prove a central limit theorem for network formation models with strategic interactions and homophilous agents. Since data often consists of observations on a single large network, we consider an asymptotic framework in which the network size diverges. We argue that a modification of "stabilization" conditions from the literature on geometric graphs provides a useful high-level formulation of weak dependence which we utilize to establish an abstract central limit theorem. Using results in branching process theory, we derive interpretable primitive conditions for stabilization. The main conditions restrict the strength of strategic interactions and equilibrium selection mechanism. We discuss practical inference procedures justified by our results.

2603.09067 2026-03-11 stat.ML cond-mat.stat-mech cs.LG math-ph math.MP

Verifying Good Regulator Conditions for Hypergraph Observers: Natural Gradient Learning from Causal Invariance via Established Theorems

Max Zhuravlev

Comments 18 pages, 15 formal results. Part of a series of companion papers submitted simultaneously; cross-references updated with arXiv IDs in v2

Details
English abstract

We verify that persistent observers in causally invariant hypergraph substrates satisfy the conditions of the Conant-Ashby Good Regulator Theorem. Building on Wolfram's hypergraph physics and Vanchurin's neural network cosmology, we formalize persistent observers as entities that minimize prediction error at their boundary with the environment. Applying a modern reformulation of the Conant-Ashby theorem, we demonstrate that hypergraph observers satisfy Good Regulator conditions, requiring them to maintain internal models. Once an internal model with loss function exists, the emergence of a Fisher information metric follows from standard information geometry. Invoking Amari's uniqueness theorem for reparameterization-invariant gradients, we show that natural gradient descent is the unique admissible learning rule. Under the ansatz M=F^2 for exponential family observers and one specific convergence time functional, we derive a closed-form formula for the regime parameter alpha in Vanchurin's Type II framework, with a quantum-classical threshold at kappa(F)=2. However, three alternative convergence models do not reproduce this result, so this prediction is strongly model-dependent. We further introduce the directional regime parameter alpha_{v_k} and the trace-free deviation tensor, showing that a single observer can simultaneously occupy different Vanchurin regimes along different eigendirections of the Fisher metric. This connects Wolfram and Vanchurin frameworks through established theorems, providing approximately 25-30% novel contribution.

2603.09061 2026-03-11 stat.AP stat.ME

Distribution-free screening of spatially variable genes in spatial transcriptomics

Changhu Wang, Qiyun Huang, Zihao Chen, Jin Liu, Ruibin Xi

Details
English abstract

Spatial transcriptomics (ST) technologies enable transcriptome-wide gene expression profiling while preserving spatial resolution, offering unprecedented opportunities to uncover complex spatial structures. Due to the ultra-high dimensionality of ST data, identifying spatially variable genes (SVGs) associated with unknown spatial clusters has become a central task in ST data analysis. Here, we develop a distribution-free SVG screening method based on a novel quasi-likelihood ratio statistic, the MM-test, combined with a knockoff procedure to control the false discovery rate (FDR). MM-test leverages auxiliary information, such as spatial distances, about the unknown spatial domains for SVG screening. Notably, in addition to two-dimensional ST datasets, MM-test is well-suited for increasingly common three-dimensional (3D), multi-slice ST datasets. Extensive benchmarking using simulations and 34 real ST datasets demonstrates that MM-test consistently outperforms existing SVG detection methods. In a 3D mouse brain dataset, MM-test accurately delineates fine-scale structures that are challenging for other methods, such as the 3D architecture of the pyramidal layer of the hippocampal cornu ammonis and the dentate gyrus. Theoretical guarantees, including selection consistency, FDR control, and an error bound for post-selection clustering, are also established.

2603.09058 2026-03-11 stat.ME cs.LG

Adaptive Active Learning for Online Reliability Prediction of Satellite Electronics

Shixiang Li, Yubin Tian, Dianpeng Wang, Piao Chen, Mengying Ren

Details
English abstract

Accurate on-orbit reliability prediction for satellite electronics is often hindered by limited data availability, varying operational conditions, and considerable unit-to-unit variability. To overcome these obstacles, this paper proposes a novel integrated online reliability prediction framework. The main contributions are twofold. First, a Wiener process-based degradation model is developed, incorporating a generalized Arrhenius link function, individual random effects, and spatial correlations among adjacent units. A customized maximum likelihood estimation method is further devised to facilitate efficient and accurate parameter inference. Second, a two-stage active learning sampling scheme is designed to adaptively enhance prediction accuracy. This strategy initially selects representative units based on spatial configuration, and subsequently determines optimal sampling times using a comprehensive criterion that balances unit-specific information, model uncertainty, and degradation dynamics. Numerical experiments and a practical case study from the Tiangong space station demonstrate that the proposed method markedly improves reliability prediction accuracy while significantly reducing data requirements, offering an efficient solution for the prognostic and health management of complex satellite electronic systems.

2603.09041 2026-03-11 stat.ME stat.AP

AgroDesign: A Design-Aware Statistical Inference Framework for Agricultural Experiments in Python

Aqib Gul

Comments 21 pages, 8 figures, 8 tables

Details
English abstract

Statistical analysis of agricultural experiments is based on structured experimental designs such as randomized block, factorial, split-plot, and multi-environment trials. While the theoretical bases of these approaches are sound, their implementation in modern programming frameworks usually involves manual specification of statistical models, choice of error terms, and subjective interpretation of interaction effects. This divide between experimental design and computational implementation opens the door to misleading inference and inconsistent reporting. We introduce AgroDesign, a Python framework that makes experimental design the central specification of statistical analysis. The framework translates specified experimental designs directly into valid linear models, automatically identifies error strata, conducts hypothesis testing and mean separation, checks assumptions of linear models, and provides decision-focused interpretations. The framework integrates fixed-effect ANOVA, hierarchical designs, linear mixed models, and genotype-by-environment stability analysis into a single declarative framework. AgroDesign is validated on canonical designs in agricultural statistics and shows consistency with traditional statistical analysis while strictly enforcing correct interpretation constraints, especially in interaction-dominant and multi-stratum designs. By integrating design semantics into computation, the framework minimizes analyst-driven modeling choices and enhances reproducibility.

2603.09009 2026-03-11 stat.ML cs.LG

Statistical Inference via Generative Models: Flow Matching and Causal Inference

Shinto Eguchi

Details
English abstract

Generative AI has achieved remarkable empirical success, but from the perspective of statistics it often remains opaque: its predictions may be accurate, yet the underlying mechanism is difficult to interpret, analyze, and trust. This book reinterprets generative AI in the language of statistics, using flow matching as a central example. The key idea is that generative models should be understood not merely as devices for producing plausible data, but as methods for the nonparametric learning of high-dimensional probability distributions. From this viewpoint, missing-data imputation becomes principled sampling from learned conditional distributions, counterfactual analysis becomes the estimation of intervention distributions, and distributional dynamics become statistically analyzable objects. Mathematically, flow matching represents distributional deformation through the continuity equation and a time-dependent velocity field, thereby extending score matching from the learning of static score fields to the learning of transport paths themselves. Building on this foundation, the book develops a statistical framework in which generative models are used to estimate nuisance components while inferential validity is maintained through orthogonalization and cross-fitting in the spirit of double/debiased machine learning. Applications to survival analysis, censoring, missingness, and causal inference show how generative models can be integrated into statistical inference for structured high-dimensional problems.

2603.08981 2026-03-11 stat.ME stat.AP

Uncertainty quantification for critical energy systems during compound extremes via BMW-GAM

Mitchell L. Krock, W. Neal Mann, Zhi Zhou

Details
English abstract

Extreme weather poses a large risk to critical energy systems (Ekisheva, Rieder, Norris, Lauby, & Dobson 2021; Levin, Botterud, Mann, Kwon, & Zhou 2022). Uncertainty quantification of negative impacts is important for developing resilience, especially during compound extreme weather events involving multiple climate variables. We leverage BMW-GAM (Economou & Garry 2022), a copula workflow that relies on fitting marginal distributions with Bayesian generalized additive models in moving windows -- an embarrassingly parallel task. The Gaussian copula has separable multivariate space-time correlation, allowing for efficient emulation and likelihood fitting with big datasets. Overall, the formulation is interpretable and provides uncertainty quantification through probabilistic simulations of weather variables during extreme events. Our method is illustrated in an analysis of temperature, wind speed, and global horizontal irradiance from Argonne National Laboratory's high-fidelity climate model output ADDA.

2603.08979 2026-03-11 math.OC cs.LG stat.ML

Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach

Sivaramakrishnan Ramani

Details
English abstract

We consider Markov decision processes (MDPs) with unknown disturbance distribution and address this problem using the robust Markov decision process (RMDP) approach. We construct the empirical distribution of the unknown disturbance distribution and characterize our ambiguity set of distributions as the sublevel set of a nonnegative distance function from the empirical distribution. By connecting the weak convergence of distributions to convergence with respect to the distance function, we prove that the robust optimal value function and the out-of-sample value function converge to the true optimal value function with increasing sample-sizes. We establish that, for finite sample-sizes, the robust optimal value function serves as a high probability upper bound on the out-of-sample value function. We also obtain probabilistic convergence rates, sample complexity bounds, and out-of-distribution performance bounds. The finite sample performance guarantees rely on the distance function satisfying a certain concentration type inequality. Several well-studied distances in the literature meet the requirements imposed on the distance function. We also analyze the data-driven properties of empirical MDPs and demonstrate that, unlike our data-driven RMDPs, empirical MDPs fail to satisfy some of the finite sample performance guarantees.

2603.08963 2026-03-11 stat.ME stat.ML

Estimation of heterogeneous principal effects under principal ignorability

Rui Zhang, Charles R. Doss, Jared D. Huling

Details
English abstract

We study estimation and inference for heterogeneous principal causal effects with binary treatments and binary intermediate variables. Principal causal effects are subgroup effects within strata defined by potential values of an intermediate variable, including effects among compliers. We propose a framework for estimating and forming pointwise confidence intervals for heterogeneous principal causal effects under the principal ignorability assumption. Several estimators are developed, and their robustness properties are characterized: one estimator is doubly robust, whereas the other two attain intermediate robustness between double and triple robustness; in contrast, principal causal effects can be estimated in a triply robust manner only. We establish large-sample theory under nonparametric smoothness conditions and analyze the bias contributions of each approach, providing insight into performance beyond the smooth setting, including in high-dimensional regimes. The Camden Coalition hotspotting randomized trial is used to illustrate the methods by estimating heterogeneous complier effects.

2603.08947 2026-03-11 stat.ML cs.LG

Towards Reliable Simulation-based Inference

Arnaud Delaunoy

Comments PhD thesis

Details
English abstract

Scientific knowledge expands by observing the world, hypothesizing some theories about it, and testing them against collected data. When those theories take the form of statistical models, statistical analyses are involved in the process of testing and refining scientific hypotheses. In this thesis, we focus on statistical models that take the form of scientific simulators and provide background about how machine learning can be used for statistical analyses in this context. The first part of this thesis is about showing empirically that performing statistical analyses with machine learning involves a degree of approximation. Specifically, all statistical analyses involve a level of uncertainty in the conclusions drawn, and we show that approximations can lead to overconfident conclusions. We draw caution regarding such overconfident conclusions and introduce a criterion to diagnose overconfident approximations. In the second part, we introduce balancing, a way to regularize machine learning models to reduce overconfidence and favor calibrated or underconfident approximations. Balancing is first introduced for neural ratio estimation algorithms and then extended to other algorithms. Intuition about why balancing leads to less overconfident solutions is provided, and it is shown empirically that balanced algorithms are often either close to calibrated or underconfident. The third part shows that Bayesian neural networks can also be used to mitigate the overconfidence of approximations. Unlike balancing, no regularization is required, and this solution can then work with few training samples and, hence, computationally expensive simulators. To that end, a new Bayesian neural network prior tailored for simulation-based inference is developed, and empirical results show a reduction in overconfidence compared to similar solutions without Bayesian neural networks.

2603.08945 2026-03-11 math.ST cs.LG stat.ML stat.TH

Kernel Debiased Plug-in Estimation based on the Universal Least Favorable Submodel

Haiyi Chen, Yang Liu, Ivana Malenica

Details
English abstract

We propose ULFS-KDPE, a kernel debiased plug-in estimator based on the universal least favorable submodel, for estimating pathwise differentiable parameters in nonparametric models. The method constructs a data-adaptive debiasing flow in a reproducing kernel Hilbert space (RKHS), producing a plug-in estimator that achieves semiparametric efficiency without requiring explicit derivation or evaluation of efficient influence functions. We place ULFS-KDPE on a rigorous functional-analytic foundation by formulating the universal least favorable update as a nonlinear ordinary differential equation on probability densities. We establish existence, uniqueness, stability, and finite-time convergence of the empirical score along the induced flow. Under standard regularity conditions, the resulting estimator is regular, asymptotically linear, and attains the semiparametric efficiency bound simultaneously for a broad class of pathwise differentiable parameters. The method admits a computationally tractable implementation based on finite-dimensional kernel representations and principled stopping criteria. In finite samples, the combination of solving a rich collection of score equations with RKHS-based smoothing and avoidance of direct influence-function evaluation leads to improved numerical stability. Simulation studies illustrate the method and support the theoretical results.

2603.08925 2026-03-11 math.ST stat.ML stat.TH

Functional Bias and Tangent-Space Geometry in Variational Inference

Sean Plummer

Details
English abstract

Variational inference approximates Bayesian posterior distributions by projecting onto a tractable family of distributions. While most theoretical analyses evaluate the quality of this approximation using global divergence measures, many applications rely on specific posterior summaries such as expectations, variances, or tail probabilities. We develop a geometric framework for analyzing the bias of posterior functionals under variational approximations. We show that the leading-order bias of a posterior functional is determined by its component orthogonal to the variational tangent space induced by the variational family. Functionals aligned with this space incur only second-order bias. For structured mean-field variational families we characterize the tangent space explicitly and show that it consists of block-additive functions of the parameter blocks, while interaction components determine the leading-order bias. Under standard local asymptotic normality conditions we further derive explicit asymptotic expansions for the bias of posterior functionals and show that omitted interaction directions produce first-order distortion of cross-block dependence measures. These results provide a geometric explanation for several well-known properties of mean-field variational inference, including the systematic distortion of cross-block dependencies.

2603.08907 2026-03-11 cs.LG cs.AI stat.ML

Cross-Domain Uncertainty Quantification for Selective Prediction: A Comprehensive Bound Ablation with Transfer-Informed Betting

Abhinaba Basu

Details
English abstract

We present a comprehensive ablation of nine finite-sample bound families for selective prediction with risk control, combining concentration inequalities (Hoeffding, Empirical Bernstein, Clopper-Pearson, Wasserstein DRO, CVaR) with multiple-testing corrections (union bound, Learn Then Test fixed-sequence) and betting-based confidence sequences (WSR). Our main theoretical contribution is Transfer-Informed Betting (TIB), which warm-starts the WSR wealth process using a source domain's risk profile, achieving tighter bounds in data-scarce settings with a formal dominance guarantee. We prove that the TIB wealth process remains a valid supermartingale under all source-target divergences, that TIB dominates standard WSR when domains match, and that no data-independent warm-start can achieve better convergence. The combination of betting-based confidence sequences, LTT monotone testing, and cross-domain transfer is, to our knowledge, a three-way novelty not present in the literature. We evaluate all nine bound families on four benchmarks, MASSIVE (n=1,102), NyayaBench (n=280), CLINC-150 (n=22.5K), and Banking77 (n=13K), across 18 (alpha, delta) configurations. On MASSIVE at alpha=0.10, LTT eliminates the ln(K) union-bound penalty, achieving 94.0% guaranteed coverage versus 73.8% for Hoeffding, a 27% relative improvement. On NyayaBench, where the small calibration set makes Hoeffding-family bounds infeasible below alpha=0.20, Transfer-Informed Betting achieves 18.5% coverage at alpha=0.10, a 5.4x improvement over LTT + Hoeffding. We additionally compare with split-conformal prediction, showing that conformal methods produce prediction sets (avg. 1.67 classes) whereas selective prediction provides single-prediction risk guarantees. We apply these methods to agentic caching systems, formalizing a progressive trust model where the guarantee determines when cached responses can be served autonomously.
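
To make the ln(K) union-bound penalty mentioned in this abstract concrete, the following is a minimal Python sketch of the textbook Hoeffding upper confidence bound for a risk taking values in [0, 1], with an optional union-bound correction over K candidate thresholds. The function names, the empirical risk of 0.05, and the choices delta = 0.05 and K = 100 are illustrative assumptions, not values from the paper; the paper's other bound families (LTT, WSR, TIB) are not reproduced here.

```python
import math


def hoeffding_ucb(risk_hat, n, delta, num_thresholds=1):
    """Textbook Hoeffding upper bound on a [0, 1]-valued risk.

    With probability at least 1 - delta, simultaneously over
    `num_thresholds` candidates (union bound), the true risk is at most
    risk_hat + sqrt(ln(num_thresholds / delta) / (2 * n)).
    """
    slack = math.sqrt(math.log(num_thresholds / delta) / (2.0 * n))
    return risk_hat + slack


def relative_improvement(new, old):
    """Relative gain of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


# Hypothetical inputs: empirical risk 0.05 on n = 1,102 calibration points.
n = 1102
single = hoeffding_ucb(0.05, n, delta=0.05)                      # no correction
union = hoeffding_ucb(0.05, n, delta=0.05, num_thresholds=100)   # ln(K) penalty

print(f"slack without union bound: {single - 0.05:.4f}")
print(f"slack with K = 100:        {union - 0.05:.4f}")

# The coverage figures quoted in the abstract: 94.0% versus 73.8%.
print(f"relative improvement: {relative_improvement(94.0, 73.8):.1%}")
```

The union-bound slack grows like sqrt(ln K), which is the penalty a fixed-sequence procedure is said to avoid; the last line reproduces the quoted 27% figure from the two coverage numbers.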

2603.08871 2026-03-11 stat.ME

Efficient semiparametric estimation of marginal treatment effects with genetic instrumental variables

Ashish Patel, Francis J DiTraglia, Stephen Burgess

Details
English abstract

Alcohol misuse is a key target of public health strategies aimed at reducing cardiovascular risk. The effect of excessive alcohol consumption on blood pressure may vary systematically with individuals' unobserved propensity to engage in heavy drinking, complicating causal inference with observational data. The marginal treatment effects framework uses an instrumental variable for treatment choice (excessive alcohol consumption) to study how selection into treatment is linked with the treatment effect. We explore the use of a genetic instrument within this framework, which is challenging because genetic compliers (individuals for whom a change in the instrument changes their treatment choice) are likely to be a small proportion of the overall sample. This can lead to greater sampling uncertainty in the tails of the propensity score distribution, i.e., the conditional probability of choosing treatment, and in turn poor estimation of causal estimands that measure heterogeneous treatment effects. We show that the use of efficient influence functions of target estimands improves estimation in terms of robustness to sampling uncertainty in nonparametrically estimated propensity scores. We find evidence of reverse selection on gains: individuals most prone to excessive alcohol consumption experience larger adverse effects on blood pressure.

2603.08803 2026-03-11 cs.LG stat.ML

The Temporal Markov Transition Field

Michael Leznik

Comments 13 pages, 2 figures

Details
English abstract

The Markov Transition Field (MTF), introduced by Wang and Oates (2015), encodes a time series as a two-dimensional image by mapping each pair of time steps to the transition probability between their quantile states, estimated from a single global transition matrix. This construction is efficient when the transition dynamics are stationary, but produces a misleading representation when the process changes regime over time: the global matrix averages across regimes and the resulting image loses all information about \emph{when} each dynamical regime was active. In this paper we introduce the \emph{Temporal Markov Transition Field} (TMTF), an extension that partitions the series into $K$ contiguous temporal chunks, estimates a separate local transition matrix for each chunk, and assembles the image so that each row reflects the dynamics local to its chunk rather than the global average. The resulting $T \times T$ image has $K$ horizontal bands of distinct texture, each encoding the transition dynamics of one temporal segment. We develop the formal definition, establish the key structural properties of the representation, work through a complete numerical example that makes the distinction from the global MTF concrete, analyse the bias--variance trade-off introduced by temporal chunking, and discuss the geometric interpretation of the local transition matrices in terms of process properties such as persistence, mean reversion, and trending behaviour. The TMTF is amplitude-agnostic and order-preserving, making it suitable as an input channel for convolutional neural networks applied to time series characterisation tasks.
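
The construction described in this abstract can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's reference implementation: quantile states are assigned from global quantiles of the whole series, chunks are split as evenly as possible, and rows with no observed transitions are left at zero. The names `quantile_states`, `transition_matrix`, and `tmtf` are hypothetical.

```python
import numpy as np


def quantile_states(x, n_bins):
    """Assign each observation to a quantile bin in 0 .. n_bins - 1."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")


def transition_matrix(states, n_bins):
    """Row-normalised first-order transition counts between states."""
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions stay at zero (an assumption).
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)


def tmtf(x, n_bins=4, n_chunks=2):
    """T x T image whose row i uses the transition matrix of i's chunk."""
    x = np.asarray(x, dtype=float)
    states = quantile_states(x, n_bins)
    T = len(x)
    image = np.empty((T, T))
    for rows in np.array_split(np.arange(T), n_chunks):
        W_local = transition_matrix(states[rows], n_bins)
        # Entry (i, j) is the local probability of moving from the
        # state at time i to the state at time j.
        image[rows, :] = W_local[states[rows][:, None], states[None, :]]
    return image


# A series with two regimes: oscillation, then a steady upward trend.
x = np.concatenate([np.sin(np.linspace(0, 20, 50)), np.linspace(-1, 1, 50)])
img = tmtf(x, n_bins=4, n_chunks=2)
print(img.shape)
```

With `n_chunks=1` the sketch reduces to a single global transition matrix, which corresponds to the global MTF that the abstract contrasts against.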

2603.08773 2026-03-11 cs.LG cs.AI stat.ML

Multi-level meta-reinforcement learning with skill-based curriculum

Sichen Yang, Mauro Maggioni

Comments 78 pages, 12 figures

Details
English abstract

We consider problems in sequential decision making with natural multi-level structure, where sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure has remained a longstanding challenge; we describe an efficient multi-level procedure for repeatedly compressing Markov decision processes (MDPs), wherein a parametric family of policies at one level is treated as single actions in the compressed MDPs at higher levels, while preserving the semantic meanings and structure of the original MDP, and mimicking the natural logic to address a complex MDP. Higher-level MDPs are themselves independent MDPs with less stochasticity, and may be solved using existing algorithms. As a byproduct, spatial or temporal scales may be coarsened at higher levels, making it more efficient to find long-term optimal policies. The multi-level representation delivered by this procedure decouples sub-tasks from each other and usually greatly reduces unnecessary stochasticity and the policy search space, leading to fewer iterations and computations when solving the MDPs. A second fundamental aspect of this work is that these multi-level decompositions plus the factorization of policies into embeddings (problem-specific) and skills (including higher-order functions) yield new transfer opportunities of skills across different problems and different levels. This whole process is framed within curriculum learning, wherein a teacher organizes the student agent's learning process in a way that gradually increases the difficulty of tasks and promotes transfer across MDPs and levels within and across curricula. The consistency of this framework and its benefits can be guaranteed under mild assumptions. We demonstrate abstraction, transferability, and curriculum learning in examples, including MazeBase+, a more complex variant of the MazeBase example.

2603.08753 2026-03-11 stat.ML cs.AI cs.LG

Permutation-Equivariant 2D State Space Models: Theory and Canonical Architecture for Multivariate Time Series

Seungwoo Jeong, Heung-Il Suk

Details
English abstract

Multivariate time series (MTS) modeling often implicitly imposes an artificial ordering over variables, violating the inherent exchangeability found in many real-world systems where no canonical variable axis exists. We formalize this limitation as a violation of the permutation symmetry principle and require state-space dynamics to be permutation-equivariant along the variable axis. In this work, we theoretically characterize the complete canonical form of linear variable coupling under this symmetry constraint. We prove that any permutation-equivariant linear 2D state-space system naturally decomposes into local self-dynamics and a global pooled interaction, rendering ordered recurrence not only unnecessary but structurally suboptimal. Motivated by this theoretical foundation, we introduce the Variable-Invariant Two-Dimensional State Space Model (VI 2D SSM), which realizes the canonical equivariant form via permutation-invariant aggregation. This formulation eliminates sequential dependency chains along the variable axis, reducing the dependency depth from $\mathcal{O}(C)$ to $\mathcal{O}(1)$ and simplifying stability analysis to two scalar modes. Furthermore, we propose VI 2D Mamba, a unified architecture integrating multi-scale temporal dynamics and spectral representations. Extensive experiments on forecasting, classification, and anomaly detection benchmarks demonstrate that our model achieves state-of-the-art performance with superior structural scalability, validating the theoretical necessity of symmetry-preserving 2D modeling.
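
The decomposition into local self-dynamics and a global pooled interaction can be illustrated with a standard fact about linear maps on R^C: a matrix commutes with every permutation of the coordinates exactly when it has the form a·I + (b/C)·11^T. The Python sketch below checks this numerically for one such matrix. It is an illustration of that general fact only, not the paper's VI 2D SSM; the helper name `equivariant_coupling` and the values a = 0.9, b = 0.3, C = 5 are assumptions chosen for the example.

```python
import numpy as np


def equivariant_coupling(a, b, num_vars):
    """a * I + (b / C) * 1 1^T: a self term plus a mean-pooled term."""
    identity = np.eye(num_vars)
    pooling = np.ones((num_vars, num_vars)) / num_vars
    return a * identity + b * pooling


rng = np.random.default_rng(0)
C = 5
A = equivariant_coupling(0.9, 0.3, C)
x = rng.normal(size=C)
perm = rng.permutation(C)

# Equivariance: reordering the variables and then mixing gives the same
# result as mixing and then reordering.
assert np.allclose(A @ x[perm], (A @ x)[perm])

# Only two distinct eigenvalues: a (multiplicity C - 1) and a + b
# (along the all-ones direction).
eigenvalues = np.linalg.eigvalsh(A)
print(np.round(eigenvalues, 6))
```

The two distinct eigenvalues are one way to read the abstract's remark that stability analysis reduces to two scalar modes.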

2603.08742 2026-03-11 cs.NE cs.LG cs.NA math.NA stat.ML

Robust Parameter and State Estimation in Multiscale Neuronal Systems Using Physics-Informed Neural Networks

Changliang Wei, Yangyang Wang, Xueyu Zhu

Details
English abstract

Inferring biophysical parameters and hidden state variables from partial and noisy observations is a fundamental challenge in computational neuroscience. This problem is particularly difficult for fast-slow spiking and bursting models, where strong nonlinearities, multiscale dynamics, and limited observational data often lead to severe sensitivity to initial parameter guesses and convergence failure in methods relying on traditional numerical forward solvers. In this work, we develop a physics-informed neural network (PINN) framework for the joint reconstruction of unobserved state variables and the estimation of unknown biophysical parameters in neuronal models. We demonstrate the effectiveness of the method on biophysical neuron models, including the Morris-Lecar model across multiple spiking and bursting regimes and a respiratory model neuron. The method requires only partial voltage observations over short observation windows and remains robust even when initialized with non-informative parameter guesses. These results suggest that PINNs can deliver robust and accurate parameter inference and state reconstruction, providing a promising alternative for inverse problems in multiscale neuronal dynamics, where traditional techniques often struggle.

2603.06820 2026-03-11 econ.EM stat.OT

Hippocratic Utility

Tomasz Strzalecki

Details
English abstract

A utility function has been proposed that values more those lives that are saved by not imposing a harmful treatment and values less those lives that could be saved by treating people who would otherwise die. I do not dispute the ethical motivation behind this kind of asymmetry. However, as my example illustrates, the scope of applicability of such a decision criterion may be limited.

2603.06465 2026-03-11 stat.AP

Risk Prediction in Cancer Imaging Using Enriched Radiomics Features

Alec Reinhardt, Tsung-Hung Yao, Raven Hollis, Galia Jacobson, Millicent Roach, Mohamed Badawy, Peter Park, Laura Beretta, David Fuentes, Newsha Nikzad, Prasun Jalal, Eugene Koay, Suprateek Kundu

Details
English abstract

Background: We aim to develop enriched radiomics features that integrate classical structural radiomics with novel functional radiomics derived from liver MRI for diagnosis and risk stratification in liver cancer. The proposed framework leverages enhancement pattern mapping (EPM) images to provide an automated and robust radiomics representation that captures intratumoral heterogeneity through pixel-level functional information. Methods: Pixel-wise EPM data reflecting blood perfusion were extracted from T1-weighted MRI scans. Classical structural radiomics features were extracted via existing software such as PyRadiomics. In addition, empirical quantiles of EPM values over all pixels within the image were computed and then smoothed using a suitable basis. The smoothed quantiles, along with the classical structural quantiles, are used as functional radiomics features for diagnostic classification and tumor grade stratification, using an L1-penalized logistic model that automatically downweights the contribution of the irrelevant features. Further, we conducted longitudinal analyses using Bayesian tensor response regression, which enables spatial smoothing and parsimonious modeling of temporally evolving imaging patterns. Results: The enriched radiomics features illustrate higher diagnostic classification performance (AUC=0.96, sensitivity > 0.8) and superior tumor grade stratification accuracy (AUC=0.87, sensitivity=0.8) compared to alternate radiomics features. Moreover, we find that the proportion of lesion pixels with significant reduction in EPM values over time is considerably higher (median = 0.12) in aggressive lesions versus stable or mildly aggressive lesions (median = 0.025). Conclusion: The enriched novel radiomics features can potentially replace classical radiomics analysis and be used for imaging biomarkers in cross-sectional and in longitudinal cancer imaging studies.

2602.10696 2026-03-11 stat.ML cs.LG math.OC math.ST stat.TH

Robust Assortment Optimization from Observational Data

Miao Lu, Yuxuan Han, Han Zhong, Zhengyuan Zhou, Jose Blanchet

Comments 65 pages, 9 figures

Details
English abstract

Assortment optimization is a fundamental challenge in modern retail and recommendation systems, where the goal is to select a subset of products that maximizes expected revenue under complex customer choice behaviors. While recent advances in data-driven methods have leveraged historical data to learn and optimize assortments, these approaches typically rely on strong assumptions -- namely, the stability of customer preferences and the correctness of the underlying choice models. However, such assumptions frequently break in real-world scenarios due to preference shifts and model misspecification, leading to poor generalization and revenue loss. Motivated by this limitation, we propose a robust framework for data-driven assortment optimization that accounts for potential distributional shifts in customer choice behavior. Our approach models potential preference shift from a nominal choice model that generates data and seeks to maximize worst-case expected revenue. We first establish the computational tractability of robust assortment planning when the nominal model is known, then advance to the data-driven setting, where we design statistically optimal algorithms that minimize the data requirements while maintaining robustness. Our theoretical analysis provides both upper bounds and matching lower bounds on the sample complexity, offering theoretical guarantees for robust generalization. Notably, we uncover and identify the notion of ``robust item-wise coverage'' as the minimal data requirement to enable sample-efficient robust assortment learning. Our work bridges the gap between robustness and statistical efficiency in assortment learning, contributing new insights and tools for reliable assortment optimization under uncertainty.

2602.04146 2026-03-11 math.ST stat.TH

Bayes, E-values and Testing

Nicholas G. Polson, Vadim Sokolov, Daniel Zantedeschi

Comments Revised submission: fixed typos, added clarifications, and compressed the exposition

Details
Abstract

E-values and E-processes (nonnegative supermartingales) provide anytime-valid evidence for sequential testing via Ville's inequality, yet their connections to Bayesian reasoning, representational structure, and computational feasibility are often conflated in the literature. We develop a typed framework that separates sequential evidence into three layers: (i) representation (Radon-Nikodym / likelihood-ratio geometry), (ii) validity (supermartingale certificates under optional stopping), and (iii) decision (boundary design and efficiency calibration). Our main results are: (a) under log-loss and Bayes-risk minimization, the likelihood ratio is the unique evidence representation within the coherent predictive subclass; (b) the likelihood-ratio stopping time satisfies $E_1[\tau_b] = (\log b)/\mu + O(\sqrt{\log b})$ under Cramér conditions, while validity-only thresholds admit no such growth-rate guarantee; and (c) regret-optimal codes (e.g., NML/MDL) do not in general yield valid E-processes, while prequential codes do. Monte Carlo experiments confirm the theoretical predictions. The framework applies to online model validation, adaptive experimentation, conformal prediction, and sequential changepoint detection.
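
A likelihood-ratio E-process and its Ville threshold can be sketched in a few lines. This is a generic textbook construction, not code from the paper: the two Gaussian hypotheses, their parameter values, and the function names are assumptions chosen for illustration. Under the null, the running likelihood ratio is a nonnegative martingale with initial value 1, so by Ville's inequality it reaches 1/alpha with probability at most alpha, whenever one chooses to stop.

```python
import math
import random

def lr_e_process(xs, mu0=0.0, mu1=0.5, sigma=1.0):
    """Running likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2)."""
    log_e = 0.0
    path = []
    for x in xs:
        # Log of the Gaussian density ratio f1(x) / f0(x).
        log_e += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        path.append(math.exp(log_e))
    return path

def first_crossing(path, alpha=0.05):
    """1-based index at which the process first reaches 1/alpha, else None."""
    for t, e in enumerate(path, start=1):
        if e >= 1 / alpha:
            return t
    return None

rng = random.Random(0)
data = [rng.gauss(0.5, 1.0) for _ in range(500)]  # simulated under the alternative
print(first_crossing(lr_e_process(data)))
```

Working on the log scale and exponentiating only when recording the path avoids underflow or overflow from multiplying many ratios directly.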

2510.16232 2026-03-11 stat.ML cs.LG cs.MA cs.SY eess.SY

Personalized Collaborative Learning with Affinity-Based Variance Reduction

Chenyu Zhang, Navid Azizan

Comments Published as a conference paper at ICLR 2026

Details
Abstract

Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, δ\}$, where $n$ is the number of agents and $δ\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
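
The stated factor $\max\{n^{-1}, δ\}$ can be evaluated directly to see how it interpolates between the two regimes. The helper name below is hypothetical and the snippet only computes the factor from the abstract; it does not implement AffPCL.

```python
def affinity_factor(n_agents, delta):
    """Sample-complexity factor max{1/n, delta} relative to independent learning."""
    if n_agents < 1 or not 0.0 <= delta <= 1.0:
        raise ValueError("need n_agents >= 1 and delta in [0, 1]")
    return max(1.0 / n_agents, delta)

# Identical agents (delta = 0) recover the 1/n linear speedup;
# fully heterogeneous agents (delta = 1) fall back to independent learning.
for delta in (0.0, 0.05, 0.3, 1.0):
    print(delta, affinity_factor(10, delta))
```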

2508.20943 2026-03-11 stat.AP

DESA: An R Package for Detecting Epidemics using a School-Absenteeism Surveillance Framework

Vinay Joshy, Zeny Feng, Lorna Deeth, Kayla Vanderkruk, Justin Slater

Details
Journal ref
Journal of Open Research Software, vol. 14, no. 1, 2026
Abstract

Absenteeism of elementary school children has been shown to be effective in the early detection of an incoming influenza epidemic within a given population. This paper introduces DESA, an R package designed to: 1) model an epidemic using school absenteeism data, 2) raise an alert for an incoming epidemic using school absenteeism data, 3) evaluate the timeliness of the raised alert using different metrics, and 4) simulate community-level household populations, epidemics, and school absenteeism to facilitate research in related fields. This paper provides an overview of the functions in the package and demonstrates its complete workflow using simulated data generated within the package. DESA offers researchers and public health officials a tool for improving early detection of seasonal influenza epidemics or epidemics of other diseases. The package is available on CRAN, making it readily accessible to the R user community.

2508.20924 2026-03-11 math.ST math.PR stat.TH

Palm distributions of superposed point processes for statistical inference

Mario Beraha, Federico Camerlenghi, Lorenzo Ghilotti

Comments This submission replaces arXiv:2409.14753

Details
Abstract

Palm distributions play a central role in the study of point processes and their associated summary statistics. In this paper, we characterize the Palm distributions of the superposition of independent point processes, establishing a simple mixture representation depending on the point processes' Palm distributions and moment measures. We explore two statistical applications enabled by our main result. First, we consider minimum contrast estimation for corrupted point processes. Second, we investigate the class of shot noise Cox processes and derive explicit expressions for their higher-order Palm distributions. In the finite case, we further obtain a tractable expression for the Janossy density, which plays the role of a likelihood function and thus can be used for new likelihood-based inference strategies. Extensions to the superposition of multiple point processes and to higher-order Palm distributions are also presented.

2506.04626 2026-03-11 stat.ML cs.LG

Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning

Haochen Zhang, Zhong Zheng, Lingzhou Xue

Comments arXiv admin note: text overlap with arXiv:2502.02859

Details
Abstract

Motivated by real-world settings where data collection and policy deployment -- whether for a single agent or across multiple agents -- are costly, we study the problem of on-policy single-agent reinforcement learning (RL) and federated RL (FRL) with a focus on minimizing burn-in costs (the sample sizes needed to reach near-optimal regret) and policy switching or communication costs. In parallel finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states and $A$ actions, existing methods either require superlinear burn-in costs in $S$ and $A$ or fail to achieve logarithmic switching or communication costs. We propose two novel model-free RL algorithms -- Q-EarlySettled-LowCost and FedQ-EarlySettled-LowCost -- that are the first in the literature to simultaneously achieve: (i) the best near-optimal regret among all known model-free RL or FRL algorithms, (ii) low burn-in cost that scales linearly with $S$ and $A$, and (iii) logarithmic policy switching cost for single-agent RL or communication cost for FRL. Additionally, we establish gap-dependent theoretical guarantees for both regret and switching/communication costs, improving or matching the best-known gap-dependent bounds.

2504.04528 2026-03-11 cs.LG cs.AI stat.ME stat.ML

A Consequentialist Critique of Binary Classification Evaluation: Theory, Practice, and Tools

Gerardo Flores, Abigail Schiff, Alyssa H. Smith, Julia A Fukuyama, Ashia C. Wilson

Details
Abstract

Machine learning-supported decisions, such as ordering diagnostic tests or determining preventive custody, often require converting probabilistic forecasts into binary classifications. We adopt a consequentialist perspective from decision theory to argue that evaluation methods should prioritize forecast quality across thresholds and base rates. This motivates the use of proper scoring rules such as the Brier score and log loss. However, our empirical review of practices at major ML venues (ICML, FAccT, CHIL) reveals a dominant reliance on top-K metrics or fixed-threshold evaluations. To bridge this disconnect, we introduce a decision-theoretic framework that maps evaluation metrics to their appropriate use cases, accompanied by a practical Python package, briertools, which lowers the barrier to applying proper scoring rules in practice. Methodologically, we derive and implement a clipped Brier score variant that avoids full integration and better reflects bounded, interpretable threshold ranges. Theoretically, we reconcile the Brier score with decision curve analysis, directly addressing the critique of Assel et al. (2017) regarding the clinical utility of proper scoring rules.
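
The two proper scoring rules named above have short standard definitions, sketched here in plain Python. This does not use or reproduce the briertools API, and it does not implement the clipped Brier variant; the forecasts and labels are made up.

```python
import math

def brier_score(probs, labels):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-likelihood of the 0/1 outcomes under the forecasts."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # keep log() finite at p = 0 or p = 1
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)

probs = [0.9, 0.2, 0.6, 0.4]   # made-up forecasts
labels = [1, 0, 1, 0]          # made-up outcomes
print(brier_score(probs, labels), log_loss(probs, labels))
```

Both scores are evaluated on the probabilities themselves, so neither requires fixing a classification threshold in advance.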

2410.12367 2026-03-11 math.ST cs.LG stat.ME stat.TH

Adaptive and Stratified Subsampling for High-Dimensional Robust Estimation

Prateek Mittal, Joohi Chauhan

Details
Abstract

We study robust high-dimensional sparse regression under finite-variance heavy-tailed noise, epsilon-contamination, and alpha-mixing dependence via two subsampling estimators: Adaptive Importance Sampling (AIS) and Stratified Sub-sampling (SS). Under sub-Gaussian design, whose scope is precisely delimited, and finite-variance noise, a subsample of size m achieves the minimax-optimal rate. We close the theory-algorithm gap: Theorem 4.6 applies to AIS at termination conditional on stabilized weights (Proposition 4.1), and SS fits the median-of-means M-estimation framework of Lecué and Lerasle (Proposition 4.3). The de-biasing step is fully specified via the nodewise-Lasso precision estimator under a new sparse-precision assumption, yielding valid coordinate-wise CIs (Theorem 4.14). The alpha-mixing extension uses a calendar-time block protocol that guarantees temporal separation (Theorem 4.12). Empirically, AIS achieves 3.10 times lower error than uniform subsampling at 20% contamination, and 29.5% lower test MSE on Riboflavin (p=4,088 and n=71).
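
The median-of-means principle that the SS estimator is said to fit can be shown in its simplest form, for a scalar mean. This is the basic textbook estimator, not the paper's regression procedure; the function name, block assignment, and data are assumptions for illustration.

```python
import random
from statistics import mean, median

def median_of_means(values, n_blocks, seed=0):
    """Split the data into blocks, average each block, return the median average."""
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    # Deal the shuffled values round-robin into blocks of near-equal size.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    return median(mean(block) for block in blocks)

# Made-up sample: twenty clean points and one gross outlier.
sample = [1.0] * 20 + [1000.0]
print(mean(sample), median_of_means(sample, n_blocks=7))
```

A single outlier can corrupt only the block it lands in, so the median of the block means is unaffected here while the plain mean is pulled far from 1.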

2409.13060 2026-03-11 stat.ME

Forecasting Causal Effects of Future Interventions: Confounding and Transportability Issues

Laura Forastiere, Fan Li, Michela Baccini

Details
Abstract

Recent developments in causal inference allow us to transport a causal effect of a time-fixed treatment from a randomized trial to a target population across space but within the same time frame. In contrast to transportability across space, transporting causal effects across time, or forecasting causal effects of future interventions, is more challenging due to time-varying confounders and time-varying effect modifiers. In this article, we seek to formally clarify the causal estimands for forecasting causal effects over time and the structural assumptions required to identify these estimands. Specifically, we develop a set of novel nonparametric identification formulas--g-computation formulas--for these causal estimands, and lay out the conditions required to accurately forecast causal effects from a past observed sample to a future population in a future time window. Our overarching objective is to leverage modern causal inference theory to provide a theoretical framework for investigating whether the effects seen in a past sample would carry over to a new future population. Throughout the article, a working example addressing the effect of public policies or social events on COVID-related deaths is used to contextualize the analytical results.

2405.11111 2026-03-11 stat.ME

Euclidean mirrors and first-order changepoints in network time series

Tianyi Chen, Zachary Lubberts, Avanti Athreya, Youngser Park, Carey E. Priebe

Details
Abstract

We describe a model for a network time series whose evolution is governed by an underlying stochastic process, known as the latent position process, in which network evolution can be represented in Euclidean space by a curve, called the Euclidean mirror. We define the notion of a first-order changepoint for a time series of networks, and construct a family of latent position process networks with underlying first-order changepoints. We prove that a spectral estimate of the associated Euclidean mirror localizes these changepoints, even when the graph distribution evolves continuously, but at a rate that changes. Simulated and real data examples on organoid networks show that this localization captures empirically significant shifts in network evolution.

2401.15014 2026-03-11 stat.ME

Constructing Genetic Risk Scores: Robust Bayesian Approach through Projected Summary Statistics and Flexible Shrinkage

Yuzheng Dun, Nilanjan Chatterjee, Jin Jin, Akihiko Nishimura

Details
Abstract

Polygenic risk scores (PRS) developed from genome-wide association studies (GWAS) can be used for risk stratification by quantifying the genetic contribution to disease, and many clinical applications have been proposed. Bayesian methods are popular for building PRS because of their natural ability to regularize models and incorporate external information. In this article, we present new theoretical results, methods, and extensive numerical studies to advance Bayesian methods for PRS applications. We identify a potential risk, under a common Bayesian PRS framework, of posterior impropriety when integrating the required GWAS summary statistics and linkage disequilibrium (LD) data from distinct sources. As a principled remedy, we propose a projection of the summary statistics that ensures compatibility between the two sources and in turn a proper behavior of the posterior. We further introduce a new PRS method, with accompanying software, under the less-explored Bayesian bridge prior to more flexibly model varying sparsity levels in effect-size distributions. We extensively benchmark it against alternative Bayesian methods using synthetic and real datasets, quantifying the impact of prior specification and LD estimation strategy. Our proposed PRS-Bridge, equipped with the projection technique and flexible prior, demonstrates the most consistent and generally superior performance across a variety of scenarios.

2307.14282 2026-03-11 econ.EM econ.TH stat.ME

Causal Effects in Matching Mechanisms with Strategically Reported Preferences

Marinho Bertanha, Margaux Luflade, Ismael Mourifié

Details
Abstract

A growing number of authorities use mechanisms to allocate students to schools in a way that reflects student preferences and school priorities. However, most real-world mechanisms incentivize students to strategically misreport their preferences. Misreporting complicates the identification of causal parameters that depend on true preferences, which are necessary inputs for a broad class of counterfactual analyses. We provide an identification approach robust to misreporting and derive sharp bounds on causal effects of school assignment. Our approach applies to allocation rules characterized by placement scores and cutoffs. We use data from a deferred acceptance mechanism that assigns students to university programs in Chile. Matching theory predicts and empirical evidence shows that students behave strategically in Chile because they face constraints on preference submission and have good prior information about school accessibility. Our bounds are informative enough to reveal significant heterogeneity in graduation success with respect to preferences and school assignment.
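
For readers unfamiliar with the mechanism, student-proposing deferred acceptance with placement scores and cutoffs can be sketched as follows. This is the generic textbook algorithm, not the Chilean system's implementation or the paper's identification method; the students, programs, scores, and function names are all made up.

```python
def deferred_acceptance(prefs, scores, capacity):
    """Student-proposing deferred acceptance.

    prefs[s]     -> programs in the order student s reported them
    scores[p][s] -> placement score of student s at program p (higher is better)
    capacity[p]  -> number of seats at program p
    """
    next_choice = {s: 0 for s in prefs}
    held = {p: [] for p in capacity}
    free = list(prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(prefs[s]):
            continue  # reported list exhausted: s stays unassigned
        p = prefs[s][next_choice[s]]
        next_choice[s] += 1
        held[p].append(s)
        held[p].sort(key=lambda t: scores[p][t], reverse=True)
        if len(held[p]) > capacity[p]:
            free.append(held[p].pop())  # reject the lowest-scoring applicant
    return {s: p for p, admitted in held.items() for s in admitted}

def cutoffs(assignment, scores, capacity):
    """Lowest admitted score at each full program; None if seats remain."""
    out = {}
    for p in capacity:
        admitted = [scores[p][s] for s, q in assignment.items() if q == p]
        out[p] = min(admitted) if admitted and len(admitted) == capacity[p] else None
    return out

prefs = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["X"]}  # c submits a short list
scores = {"X": {"a": 90, "b": 80, "c": 70}, "Y": {"a": 60, "b": 85, "c": 50}}
capacity = {"X": 1, "Y": 1}
match = deferred_acceptance(prefs, scores, capacity)
print(match, cutoffs(match, scores, capacity))
```

In this toy run student c lists only program X and ends up unassigned, which illustrates why a cap on list length gives students a reason to report strategically rather than truthfully.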

2202.00190 2026-03-11 math.ST stat.ME stat.TH

Sketching stochastic valuation functions

Milan Vojnović, Yiliu Wang

Details
Journal ref
INFORMS Journal on Optimization 2026
Abstract

We consider the problem of sketching set valuation functions, defined as the expectation of a valuation function applied to independent random item values. For valuation functions that are monotone and either subadditive or submodular, and that satisfy a weak homogeneity condition (or other structural conditions), we show that there exist discretized versions of the item value distributions -- each with support size $O(k \log k)$ -- that yield a sketch valuation function providing a constant-factor approximation to the true valuation for any subset of items of size at most $k$. These discretized distributions can be computed efficiently for each item independently, making the approach highly scalable. Our results apply broadly to valuation functions commonly encountered in practice, including team performance based on the best member (e.g., maximum functions), constant elasticity of substitution (CES) production functions with diminishing returns in economics, and others. Sketch valuation functions are especially useful in optimization problems such as best set selection and welfare maximization, where exact value computations are costly or intricate. They enable efficient approximate evaluation of value oracle queries while preserving provable approximation guarantees for the original stochastic optimization problem.