arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.10055 2026-02-11 math.ST stat.TH

The weak law of large numbers for the friendship paradox index

Mingao Yuan

Abstract

The friendship paradox index is a network summary statistic used to quantify the friendship paradox, the tendency for an individual's friends to have more friends than the individual. In this paper, we use Markov's inequality to derive the weak law of large numbers for the friendship paradox index in a random geometric graph, a widely used model for networks with spatial dependence and geometry. For the uniform random geometric graph, in which the nodes are uniformly distributed in a space, the friendship paradox index is asymptotically equal to $1/4$. In contrast, in nonuniform random geometric graphs, the nonuniform node distribution leads to distinct limiting behavior of the index. In the relatively sparse regime, the index is still asymptotically equal to $1/4$, as in the uniform case. In the intermediate sparse regime, however, the index converges in probability to $1/4$ plus a constant that depends explicitly on the node distribution. Finally, in the relatively dense regime, the index diverges to infinity as the graph size increases. Our results highlight the sharp contrast between the uniform case and its nonuniform counterpart.
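
The abstract does not spell out the index, so the following is only a sketch under stated assumptions: it simulates a uniform random geometric graph on the unit square (nodes are joined when within a hypothetical `radius`) and computes one common friendship-paradox gap, the average over non-isolated nodes of (mean neighbour degree minus own degree). This quantity is not necessarily normalised like the paper's index, so it should not be expected to approach $1/4$. The second function checks the standard identity that the summed gap equals $\sum_{\{i,j\}\in E}(d_i-d_j)^2/(d_i d_j) \ge 0$, which is why, on average, friends have at least as many friends.

```python
import random


def random_geometric_graph(n, radius, seed=0):
    """Uniform random geometric graph on the unit square: nodes i and j are
    joined when their Euclidean distance is at most `radius`."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    adj = [[] for _ in range(n)]
    r2 = radius * radius
    for i in range(n):
        for j in range(i + 1, n):
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            if dx * dx + dy * dy <= r2:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def friendship_paradox_gap(adj):
    """Average over non-isolated nodes of (mean neighbour degree - own degree)."""
    deg = [len(nbrs) for nbrs in adj]
    gaps = [sum(deg[j] for j in nbrs) / len(nbrs) - deg[i]
            for i, nbrs in enumerate(adj) if nbrs]
    return sum(gaps) / len(gaps) if gaps else 0.0


def summed_gap_by_edges(adj):
    """The summed gap rewritten edge by edge: sum of (d_i - d_j)^2 / (d_i d_j) >= 0."""
    deg = [len(nbrs) for nbrs in adj]
    return sum((deg[i] - deg[j]) ** 2 / (deg[i] * deg[j])
               for i, nbrs in enumerate(adj) for j in nbrs if i < j)
```

For example, `friendship_paradox_gap(random_geometric_graph(500, 0.08))` gives the gap for one simulated graph; varying `radius` with `n` mimics the sparse and dense regimes discussed above.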

2602.10045 2026-02-11 cs.CV cs.LG stat.ME stat.ML

Conformal Prediction Sets for Instance Segmentation

Kerri Lu, Dan M. Kluger, Stephen Bates, Sherrie Wang

Abstract

Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee for the probability that at least one of the predictions has high Intersection-Over-Union (IoU) with the true object instance mask. We apply our algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Empirically, we find that our prediction sets vary in size based on query difficulty and attain the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with asymptotic and finite sample guarantees.
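
For readers new to conformal prediction, the calibration step that such methods build on can be sketched generically. The code below is standard split conformal prediction, not the authors' segmentation algorithm: a nonconformity score $s(x, y)$ (its form is an assumption left to the user) is evaluated at the true label on a held-out calibration set of size $n$, and a candidate label for a new input enters the prediction set if its score is at most the $\lceil (n+1)(1-\alpha)\rceil$-th smallest calibration score. Under exchangeability this gives coverage of at least $1-\alpha$.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest
    calibration score, or +inf when the calibration set is too small."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # every candidate must be kept to guarantee coverage
    return sorted(cal_scores)[k - 1]


def prediction_set(candidate_scores, threshold):
    """Indices of candidate labels whose nonconformity score is within the threshold."""
    return [i for i, s in enumerate(candidate_scores) if s <= threshold]
```

With seven calibration scores and $\alpha = 0.25$, the threshold is the 6th smallest score, so a candidate with a larger score is excluded from the set.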

2602.10026 2026-02-11 stat.ME

Degrees-of-Freedom Approximations for Conditional-Mean Inference in Random-Lot Stability Analysis

Andrew T. Karl, Heath Rushing, Richard K. Burdick, Jeff Hofer

Abstract

Linear mixed models are widely used for pharmaceutical stability trending when sufficient lots are available. Expiry support is typically based on whether lot-specific conditional-mean confidence limits remain within specification through a proposed expiry. These limits depend on the denominator degrees-of-freedom (DDF) method used for $t$-based inference. We document an operationally important boundary-proximal phenomenon: when a fitted random-effect variance component is close to zero, Satterthwaite DDF for conditional-mean predictions can collapse, inflating $t$ critical values and producing unnecessarily wide and sometimes nonmonotone pointwise confidence limits on scheduled time grids. In contrast, containment DDF yields stable degrees of freedom and avoids sharp discontinuities as variance components approach the boundary. Using a worked example and simulation studies, we show that DDF choice can materially change pass/fail conclusions even when observed data comfortably meet specifications. Containment-based inference with the full random-effects model provides a single modeling framework that avoids the discontinuities introduced by data-dependent model reduction at arbitrary cutoffs. When containment is unavailable, a 10% variance-contribution reduction workflow mitigates extreme Satterthwaite behavior by simplifying the random-effects structure only when fitted contributions at the proposed expiry are negligible. An AICc step-down is also evaluated but is best treated as a sensitivity analysis, as it can be liberal when the margin between the mean trend and the specification limit at the proposed expiry is small.
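
The Satterthwaite idea can be seen in its classical form: for a linear combination $\sum_i c_i s_i^2$ of independent mean squares with $\nu_i$ degrees of freedom, the approximate degrees of freedom are $\nu = (\sum_i c_i s_i^2)^2 / \sum_i (c_i s_i^2)^2/\nu_i$. The mixed-model DDF discussed in the abstract generalises this using the estimated covariance of the variance-component estimates, so the sketch below only illustrates the mechanism: when a term with few degrees of freedom dominates the combination, $\nu$ falls toward that term's degrees of freedom, which inflates $t$ critical values.

```python
def satterthwaite_df(terms):
    """Welch-Satterthwaite degrees of freedom for sum_i c_i * s2_i, where the
    s2_i are independent mean squares with nu_i degrees of freedom.

    `terms` is a list of (c_i, s2_i, nu_i) tuples."""
    numerator = sum(c * s2 for c, s2, _ in terms) ** 2
    denominator = sum((c * s2) ** 2 / nu for c, s2, nu in terms)
    return numerator / denominator


# Two balanced terms pool their degrees of freedom (about 20 here) ...
balanced = satterthwaite_df([(1.0, 1.0, 10), (1.0, 1.0, 10)])
# ... but a dominant term with 2 df drags the approximation down to about 2.
dominated = satterthwaite_df([(1.0, 100.0, 2), (1.0, 1.0, 1000)])
```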

2602.10018 2026-02-11 stat.ME math.ST stat.ML stat.TH

Online Selective Conformal Prediction with Asymmetric Rules: A Permutation Test Approach

Mingyi Zheng, Ying Jin

Abstract

Selective conformal prediction aims to construct prediction sets with valid coverage for a test unit conditional on it being selected by a data-driven mechanism. While existing methods in the offline setting handle any selection mechanism that is permutation invariant to the labeled data, their extension to the online setting -- where data arrives sequentially and later decisions depend on earlier ones -- is challenged by the fact that the selection mechanism is naturally asymmetric. As such, existing methods only address a limited collection of selection mechanisms. In this paper, we propose PErmutation-based Mondrian Conformal Inference (PEMI), a general permutation-based framework for selective conformal prediction with arbitrary asymmetric selection rules. Motivated by full and Mondrian conformal prediction, PEMI identifies all permutations of the observed data (or a Monte-Carlo subset thereof) that lead to the same selection event, and calibrates a prediction set using conformity scores over this selection-preserving reference set. Under standard exchangeability conditions, our prediction sets achieve finite-sample exact selection-conditional coverage for any asymmetric selection mechanism and any prediction model. PEMI naturally incorporates additional offline labeled data, extends to selection mechanisms with multiple test samples, and achieves FCR control with fine-grained selection taxonomies. We further work out several efficient instantiations for commonly-used online selection rules, including covariate-based rules, conformal p/e-values-based procedures, and selection based on earlier outcomes. Finally, we demonstrate the efficacy of our methods across various selection rules on a real drug discovery dataset and investigate their performance via simulations.

2602.10012 2026-02-11 stat.ME

Doubly Robust Estimation of Desirability of Outcome Ranking (DOOR) Probability with Application to MDRO Studies

Shiyu Shu, Toshimitsu Hamasaki, Scott Evans, Lauren Komarow, David van Duin, Guoqing Diao

Abstract

In observational studies, adjusting for confounders is necessary when a treatment comparison is planned. A crude comparison of the primary endpoint without covariate adjustment is subject to bias, and adding regression models can improve precision by accounting for imbalanced covariates and thus support valid inference. Desirability of outcome ranking (DOOR) is a patient-centric benefit-risk evaluation methodology designed for randomized clinical trials; robust covariate adjustment methods could extend its applicability to observational studies. In DOOR analysis, each participant's outcome is ranked according to pre-specified clinical criteria, where the most desirable rank represents a good outcome with no side effects and the least desirable rank is the worst possible clinical outcome. We develop a causal framework for estimating the population-level DOOR probability via the inverse probability of treatment weighting (IPTW) method, the G-computation method, and a doubly robust method that combines both. The performance of the proposed methods is examined through simulations. We also perform a causal analysis of the Multi-Drug Resistant Organism (MDRO) network within the Antibacterial Resistance Leadership Group (ARLG), comparing the benefit-risk profiles of mono-drug therapy and combination-drug therapy.
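
The DOOR probability itself is simple to estimate in a randomized comparison: over all treated-control pairs, count the fraction in which the treated participant has the more desirable rank, with ties counted as one half. The sketch below does this and optionally reweights each pair by the product of per-subject weights, a hypothetical Hájek-style IPTW variant of our own; it does not implement the paper's G-computation or doubly robust estimators, and any weights would have to come from a separately fitted propensity model.

```python
def door_probability(ranks_t, ranks_c, w_t=None, w_c=None):
    """Pairwise estimate of P(treated outcome more desirable) + 0.5 * P(tie).

    Rank 1 is the most desirable outcome. Optional per-subject weights (for
    example, hypothetical inverse-propensity weights) weight each
    treated-control pair by w_i * w_j, and the result is normalised."""
    if w_t is None:
        w_t = [1.0] * len(ranks_t)
    if w_c is None:
        w_c = [1.0] * len(ranks_c)
    num = den = 0.0
    for rt, wt in zip(ranks_t, w_t):
        for rc, wc in zip(ranks_c, w_c):
            w = wt * wc
            if rt < rc:        # treated participant has the better (lower) rank
                num += w
            elif rt == rc:     # ties count as one half
                num += 0.5 * w
            den += w
    return num / den
```

A value of 0.5 means no difference in desirability between arms; identical rank distributions in both arms return exactly 0.5.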

2602.09982 2026-02-11 stat.ME

Kelly Betting as Bayesian Model Evaluation: A Framework for Time-Updating Probabilistic Forecasts

Michael Beuoy

Comments: 31 pages, 10 figures

Abstract

This paper proposes a new way of evaluating the accuracy and validity of probabilistic forecasts that change over time (such as an in-game win probability model or an election forecast). Under this approach, each model to be evaluated is treated as a canonical Kelly bettor, and the models are pitted against each other in an iterative betting contest. The growth or decline of each model's bankroll serves as the evaluation metric. Market consensus probabilities and implied model credibilities can be updated in real time as each model updates, without waiting for the final outcome. Using a simulation model, we show that this method is generally more accurate than traditional average log-loss and Brier score methods at distinguishing a correct model from an incorrect one. The Kelly approach is shown to have a direct mathematical and conceptual analogue to Bayesian inference, with bankroll serving as a proxy for Bayesian credibility.
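
A minimal sketch of the betting-as-Bayes analogy for a single binary event, under assumptions that are ours rather than the paper's: the market price $m$ is the bankroll-weighted average of the models' probabilities, and each model is a log-utility (Kelly) bettor that splits its whole bankroll across the two outcomes in proportion to its own probabilities. Settling the bet multiplies model $i$'s bankroll by $p_i/m$ (or $(1-p_i)/(1-m)$), so total bankroll is conserved and normalised bankrolls update exactly like Bayesian posterior model weights, with each model's outcome probability as its likelihood. The paper's real-time updating between intermediate forecasts is not reproduced here.

```python
def consensus(bankrolls, probs):
    """Bankroll-weighted average of the models' probabilities (the market price)."""
    total = sum(bankrolls)
    return sum(b * p for b, p in zip(bankrolls, probs)) / total


def settle(bankrolls, probs, outcome):
    """Settle one binary event. Each log-utility (Kelly) bettor spends the share
    p of its bankroll on 'yes' at price m and 1 - p on 'no' at price 1 - m.
    Assumes 0 < m < 1."""
    m = consensus(bankrolls, probs)
    if outcome:
        return [b * p / m for b, p in zip(bankrolls, probs)]
    return [b * (1 - p) / (1 - m) for b, p in zip(bankrolls, probs)]
```

For instance, `settle([1.0, 1.0], [0.8, 0.2], True)` returns `[1.6, 0.4]`: the same 4:1 split that Bayes' rule gives two models with equal prior weight.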

2602.09959 2026-02-11 math.ST cs.LG stat.ML stat.TH

Statistical-Computational Trade-offs in Learning Multi-Index Models via Harmonic Analysis

Hugo Latourelle-Vigeant, Theodor Misiakiewicz

Comments: 91 pages

Abstract

We study the problem of learning multi-index models (MIMs), where the label depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown $\mathsf{s}$-dimensional projection $\boldsymbol{W}_*^\mathsf{T} \boldsymbol{x} \in \mathbb{R}^\mathsf{s}$. Exploiting the equivariance of this problem under the orthogonal group $\mathcal{O}_d$, we obtain a sharp harmonic-analytic characterization of the learning complexity for MIMs with spherically symmetric inputs -- which refines and generalizes previous Gaussian-specific analyses. Specifically, we derive statistical and computational complexity lower bounds within the Statistical Query (SQ) and Low-Degree Polynomial (LDP) frameworks. These bounds decompose naturally across spherical harmonic subspaces. Guided by this decomposition, we construct a family of spectral algorithms based on harmonic tensor unfolding that sequentially recover the latent directions and (nearly) achieve these SQ and LDP lower bounds. Depending on the choice of harmonic degree sequence, these estimators can realize a broad range of trade-offs between sample and runtime complexity. From a technical standpoint, our results build on the semisimple decomposition of the $\mathcal{O}_d$-action on $L^2 (\mathbb{S}^{d-1})$ and the intertwining isomorphism between spherical harmonics and traceless symmetric tensors.

2602.09936 2026-02-11 stat.ML cs.LG math.ST stat.TH

The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It

Roy R. Lederman, David Silva-Sánchez, Ziling Chen, Gilles Mordant, Amnon Balanov, Tamir Bendory

Abstract

Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high probability, essentially every partition of the data is a fixed point. Consequently, Lloyd's algorithm simply returns its initial partition, even when the underlying clusters are trivially recoverable by other methods. In contrast, we prove that Hartigan's k-means algorithm does not exhibit this pathology. Our results show the stark difference between these algorithms and offer a theoretical explanation for the empirical difficulties often observed with k-means in high dimensions.
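
The contrast between the two update rules can be checked on a tiny one-dimensional example (our own toy illustration, not the paper's high-dimensional setting). Lloyd's algorithm moves a point only if another centroid is strictly closer; Hartigan's rule moves a point $x$ from cluster $A$ to cluster $B$ whenever doing so lowers the within-cluster sum of squares, that is, when $\frac{n_B}{n_B+1}\lVert x-\mu_B\rVert^2 < \frac{n_A}{n_A-1}\lVert x-\mu_A\rVert^2$. The partition $\{0, 10\}, \{16\}$ is a fixed point of Lloyd's algorithm, yet Hartigan's rule still finds an improving move.

```python
def centroid(cluster):
    return sum(cluster) / len(cluster)


def is_lloyd_fixed_point(clusters):
    """True if no point is strictly closer to another cluster's centroid."""
    cents = [centroid(c) for c in clusters]
    for k, cluster in enumerate(clusters):
        for x in cluster:
            own = (x - cents[k]) ** 2
            if any((x - m) ** 2 < own for j, m in enumerate(cents) if j != k):
                return False
    return True


def hartigan_improving_move(clusters):
    """Return (point, from_cluster, to_cluster) for the first move that lowers the
    within-cluster sum of squares, or None if the partition is Hartigan-stable."""
    cents = [centroid(c) for c in clusters]
    for k, cluster in enumerate(clusters):
        n_a = len(cluster)
        if n_a < 2:
            continue  # moving the only point would empty the cluster
        for x in cluster:
            saving = n_a / (n_a - 1) * (x - cents[k]) ** 2  # SSE drop from removing x
            for j, other in enumerate(clusters):
                if j == k:
                    continue
                n_b = len(other)
                cost = n_b / (n_b + 1) * (x - cents[j]) ** 2  # SSE rise from adding x
                if cost < saving:
                    return x, k, j
    return None
```

Here `is_lloyd_fixed_point([[0.0, 10.0], [16.0]])` is `True`, while `hartigan_improving_move` on the same partition returns `(10.0, 0, 1)`; applying that move lowers the sum of squares from 50 to 18.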

2511.20605 2026-02-11 cs.LG stat.ML

How to Purchase Labels? A Cost-Effective Approach Using Active Learning Markets

Xiwen Huang, Pierre Pinson

Comments: Accepted for publication in INFORMS Journal on Data Science (IJDS). This is the authors' preprint

Abstract

We introduce and analyse active learning markets as a way to purchase labels in situations where analysts aim to acquire additional data to improve model fitting or to better train models for predictive analytics applications. This contrasts with the many existing proposals for purchasing features and examples. Through an original formalisation of market clearing as an optimisation problem, we integrate budget constraints and improvement thresholds into the label acquisition process. We focus on a single-buyer, multiple-seller setup and propose two active learning strategies (variance-based and query-by-committee-based), paired with distinct pricing mechanisms. They are compared with benchmark baselines including random sampling and a greedy knapsack heuristic. The proposed strategies are validated on real-world datasets from two critical application domains: real estate pricing and energy forecasting. Results demonstrate the robustness of our approach, which consistently achieves superior performance while acquiring fewer labels than conventional methods. Our proposal offers an easy-to-implement, practical solution for optimising data acquisition in resource-constrained environments.

2506.22499 2026-02-11 cs.CV cs.AI stat.AP

Scalable Dynamic Origin-Destination Demand Estimation Enhanced by High-Resolution Satellite Imagery Data

Jiachao Liu, Pablo Guarda, Koichiro Niinuma, Sean Qian

Abstract

This study presents a novel integrated framework for dynamic origin-destination demand estimation (DODE) in multi-class mesoscopic network models, incorporating high-resolution satellite imagery together with conventional traffic data from local sensors. Unlike sparse local detectors, satellite imagery offers consistent, city-wide road and traffic information on both parked and moving vehicles, overcoming data availability limitations. To extract information from imagery data, we design a computer vision pipeline for class-specific vehicle detection and map matching, generating link-level traffic density observations by vehicle class. Building on this information, we formulate a computational graph-based DODE framework that calibrates dynamic network states by jointly matching observed traffic counts and speeds from local sensors with density measurements derived from satellite imagery. To assess the accuracy and robustness of the proposed framework, we conduct a series of numerical experiments using both synthetic and real-world data. The results demonstrate that supplementing traditional data with satellite-derived density significantly improves estimation performance, especially for links without local sensors. Real-world experiments also show the framework's potential for practical deployment on large-scale networks. A sensitivity analysis further evaluates the impact of satellite imagery data quality.

2504.03560 2026-02-11 math.OC cs.LG math.ST stat.ML stat.TH

Stochastic Optimization with Optimal Importance Sampling

Liviu Aolaritei, Bart P. G. Van Parys, Henry Lam, Michael I. Jordan

Abstract

Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its effectiveness, the performance of IS is highly sensitive to the choice of the proposal distribution and often requires stochastic calibration. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a lesser-known fundamental challenge: the decision variable and the importance sampling distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both convergence analysis and variance control. In this paper, we consider the generic setting of convex stochastic optimization with linear constraints. We propose a single-loop stochastic approximation algorithm, based on a variant of Nesterov's dual averaging, that jointly updates the decision variable and the importance sampling distribution, notably without time-scale separation or nested optimization. The method is globally convergent and achieves the minimal asymptotic variance among stochastic gradient schemes, which moreover matches the performance of an oracle sampler adapted to the optimal solution and thus effectively resolves the circular optimization challenge.
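
The estimation-setting use of importance sampling that the abstract contrasts with can be sketched for a textbook rare event: estimating $P(Z>4)$ for $Z\sim N(0,1)$ with the shifted proposal $N(4,1)$ and likelihood ratio $\varphi(x)/\varphi(x-4)=e^{-4x+8}$. The proposal here is fixed in advance; the paper's difficulty, a proposal that must adapt to a moving decision variable inside an optimisation loop, is not addressed by this sketch.

```python
import math
import random


def is_tail_probability(threshold, n, seed=0):
    """Importance-sampling estimate of P(Z > threshold) for Z ~ N(0, 1).

    Draws come from the shifted proposal N(threshold, 1); each exceedance is
    weighted by the likelihood ratio phi(x) / phi(x - threshold)
    = exp(-threshold * x + threshold**2 / 2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + threshold ** 2 / 2)
    return total / n
```

With 100,000 draws, crude Monte Carlo would expect only about three exceedances of 4, whereas the weighted estimate `is_tail_probability(4.0, 100_000)` is typically within a few percent of the exact value `0.5 * math.erfc(4 / math.sqrt(2))` (about 3.17e-5).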

2402.15004 2026-02-11 stat.ME math.ST stat.TH

Repro Samples Method for a Performance Guaranteed Inference in General and Irregular Inference Problems

Minge Xie, Peng Wang

Abstract

Rapid advancements in data science require fundamentally new frameworks to tackle prevalent but highly non-trivial "irregular" inference problems, to which the large-sample central limit theorem does not apply. Typical examples include problems involving discrete or non-numerical parameters and those involving non-numerical data. In this article, we present an innovative, wide-reaching, and effective approach, called the "repro samples method," to conduct statistical inference for these irregular problems and beyond. The development relates to, but improves on, several existing simulation-inspired inference approaches, and we provide both exact and approximate theories to support it. Moreover, the proposed approach is broadly applicable and subsumes the classical Neyman-Pearson framework as a special case. For the commonly encountered irregular inference problems that involve both discrete/non-numerical and continuous parameters, we propose an effective three-step procedure to make inferences for all parameters. We also develop a unique matching scheme that turns the discreteness of discrete/non-numerical parameters from an obstacle to forming inferential theories into a beneficial attribute for improving computational efficiency. We demonstrate the effectiveness of the proposed general methodology using various examples, including a case study on a Gaussian mixture model with an unknown number of components. This case study provides a solution to a long-standing open question in statistical inference: how to quantify the estimation uncertainty for the unknown number of components and other associated parameters. Real data and simulation studies, with comparisons to existing approaches, demonstrate the far superior performance of the proposed method.

2310.01153 2026-02-11 math.ST stat.ME stat.TH

Measuring Evidence against Exchangeability and Group Invariance with E-values

Nick W. Koning

Abstract

We study e-values for quantifying evidence against exchangeability and general invariance of a random variable under a compact group. We start by characterizing such e-values, and explaining how they nest traditional group invariance tests as a special case. We show they can be easily designed for an arbitrary test statistic, and computed through Monte Carlo sampling. We prove a result that characterizes optimal e-values for group invariance against optimality targets that satisfy a mild orbit-wise decomposition property. We apply this to design expected-utility-optimal e-values for group invariance, which include both Neyman-Pearson-optimal tests and log-optimal e-values. Moreover, we generalize the notion of rank- and sign-based testing to compact groups, by using a representative inversion kernel. In addition, we characterize e-processes for group invariance for arbitrary filtrations, and provide tools to construct them. We also describe test martingales under a natural filtration, which are simpler to construct. Peeking beyond compact groups, we encounter e-values and e-processes based on ergodic theorems. These nest e-processes based on de Finetti's theorem for testing exchangeability.
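
One simple Monte Carlo construction in this spirit, specialised to exchangeability (the permutation group), is sketched below; it is our illustration, not necessarily the paper's construction. For any nonnegative statistic, divide its value on the observed sequence by its average over the observed sequence plus $M$ uniformly random permutations. Under exchangeability, the observed sequence and its random permutations are themselves exchangeable, so this ratio has expectation exactly 1 and is a valid e-value. The example statistic `trend_stat` is a hypothetical choice for nonnegative data.

```python
import random


def mc_exchangeability_evalue(x, stat, n_perm=999, seed=0):
    """Monte Carlo e-value against exchangeability of the sequence x.

    `stat` must be nonnegative. The e-value is stat(x) divided by the average of
    stat over x and n_perm uniformly random permutations of x; under
    exchangeability its expectation is exactly 1."""
    rng = random.Random(seed)
    observed = stat(x)
    values = [observed]
    for _ in range(n_perm):
        y = list(x)
        rng.shuffle(y)
        values.append(stat(y))
    mean = sum(values) / len(values)
    return observed / mean if mean > 0 else 1.0  # all-zero case keeps E[e] = 1


def trend_stat(y):
    """Example statistic for nonnegative data: large when larger values come later."""
    return sum(i * v for i, v in enumerate(y))
```

On a strictly increasing sequence such as `list(range(1, 11))`, the observed order maximises `trend_stat` over all orderings, so the e-value is at least 1; values well above 1 count as evidence against exchangeability.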

2602.09911 2026-02-11 stat.ME

Doubly Robust Machine Learning for Population Size Estimation with Missing Covariates: Application to Gaza Conflict Mortality

Mateo Dulce Rubio, Edward H. Kennedy, Nicholas P. Jewell

Abstract

Population size estimation from capture-recapture data is central for studying hard-to-reach populations, incorporating auxiliary covariates to account for heterogeneous capture probabilities and recapture dependencies. However, missing attributes pose a critical methodological challenge due to reluctance to share sensitive information, data collection limitations, and imperfect record linkage. Existing approaches either ignore missingness or rely on a priori imputation, potentially introducing substantial bias. In this work, we develop a novel nonparametric estimation framework using a Missing at Random assumption to identify capture probabilities under missing covariates. Using semiparametric efficiency theory, we construct one-step estimators that combine efficiency, robustness, and finite-sample validity: they approximately achieve the nonparametric efficiency bound, accommodate flexible machine learning methods through a doubly robust structure, and provide approximately valid inference for any sample size. Simulations demonstrate substantial improvements over naive imputation approaches, with our doubly robust ML estimators maintaining valid inference even at high missingness rates where competing methods fail. We apply our methodology to re-estimate mortality in the Gaza Strip from October 7, 2023, to June 30, 2024, using three-list capture-recapture data with missing demographic information. Our approach yields more conservative yet precise estimates compared to previous methods, indicating the true death toll exceeds official statistics by approximately 26%. Our framework provides practitioners with principled tools for handling incomplete data in conflict settings and other applications with hard-to-reach populations.
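
As background for the capture-recapture setting, the simplest two-list estimator is Chapman's bias-corrected Lincoln-Petersen estimator, $\hat N = (n_1+1)(n_2+1)/(m+1) - 1$, where $n_1, n_2$ are the list sizes and $m$ is the overlap. It assumes independent lists, homogeneous capture probabilities, and fully observed, correctly linked records, which are the assumptions that the covariate-adjusted, doubly robust three-list approach in the abstract relaxes; this sketch is a baseline, not the authors' method.

```python
def chapman_estimate(n1, n2, m):
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size:
    n1 and n2 are the list sizes and m is the number of records on both lists."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def chapman_from_lists(list_a, list_b):
    """Same estimate from two lists of (already linked) individual identifiers."""
    a, b = set(list_a), set(list_b)
    return chapman_estimate(len(a), len(b), len(a & b))
```

For example, two lists of 100 identifiers sharing 20 give an estimate of about 485 individuals in total, well above the 180 distinct individuals actually observed.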

2602.09847 2026-02-11 stat.ML cs.LG

Stabilized Maximum-Likelihood Iterative Quantum Amplitude Estimation for Structural CVaR under Correlated Random Fields

Alireza Tabarraei

Abstract

Conditional Value-at-Risk (CVaR) is a central tail-risk measure in stochastic structural mechanics, yet its accurate evaluation under high-dimensional, spatially correlated material uncertainty remains computationally prohibitive for classical Monte Carlo methods. Leveraging bounded-expectation reformulations of CVaR compatible with quantum amplitude estimation, we develop a quantum-enhanced inference framework that casts CVaR evaluation as a statistically consistent, confidence-constrained maximum-likelihood amplitude estimation problem. The proposed method extends iterative quantum amplitude estimation (IQAE) by embedding explicit maximum-likelihood inference within a rigorously controlled interval-tracking architecture. To ensure global correctness under finite-shot noise and the non-injective oscillatory response induced by Grover amplification, we introduce a stabilized inference scheme incorporating multi-hypothesis feasibility tracking, periodic low-depth disambiguation, and a bounded restart mechanism governed by an explicit failure-probability budget. This formulation preserves the quadratic oracle-complexity advantage of amplitude estimation while providing finite-sample confidence guarantees and reduced estimator variance. The framework is demonstrated on benchmark problems with spatially correlated lognormal Young's modulus fields generated using a Nyström low-rank Gaussian kernel model. Numerical results show that the proposed estimator achieves substantially lower oracle complexity than classical Monte Carlo CVaR estimation at comparable confidence levels, while maintaining rigorous statistical reliability. This work establishes a practically robust and theoretically grounded quantum-enhanced methodology for tail-risk quantification in stochastic continuum mechanics.
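
For comparison, the classical Monte Carlo baseline can be written in a few lines using the Rockafellar-Uryasev representation $\mathrm{CVaR}_\alpha = \mathrm{VaR}_\alpha + \mathbb{E}[(L-\mathrm{VaR}_\alpha)^+]/(1-\alpha)$. In the paper's setting each sampled loss would require a structural solve; here the losses are arbitrary numbers. Its error shrinks like $O(n^{-1/2})$ in the number of samples, which is the rate the quadratic oracle-complexity advantage of amplitude estimation improves on.

```python
import math


def empirical_cvar(losses, alpha):
    """Monte Carlo CVaR at level alpha (0 < alpha < 1) via the Rockafellar-Uryasev
    formula CVaR = VaR + E[(L - VaR)^+] / (1 - alpha), where VaR is the empirical
    alpha-quantile of the sampled losses."""
    xs = sorted(losses)
    n = len(xs)
    var = xs[max(0, math.ceil(alpha * n) - 1)]  # smallest x with F_n(x) >= alpha
    mean_excess = sum(max(x - var, 0.0) for x in xs) / n
    return var + mean_excess / (1 - alpha)
```

For the losses 1, 2, ..., 10 at $\alpha = 0.8$, the empirical VaR is 8 and the CVaR is 9.5, the mean of the worst 20% of outcomes.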

2602.09845 2026-02-11 stat.CO

Estimating Individual Customer Lifetime Values with R: The CLVTools Package

Markus Meierer, Patrick Bachmann, Jeffrey Näf, Patrik Schilter, René Algesheimer

Abstract

Customer lifetime value (CLV) describes a customer's long-term economic value for a business. This metric is widely used in marketing, for example, to select customers for a marketing campaign. However, modeling CLV is challenging. When relying on customers' purchase histories, the input data is sparse. Additionally, given its long-term focus, prediction horizons are often longer than estimation periods. Probabilistic models are able to overcome these challenges and, thus, are a popular option among researchers and practitioners. The latter also appreciate their applicability for both small and big data as well as their robust predictive performance without any fine-tuning requirements. Their popularity is due to three characteristics: data parsimony, scalability, and predictive accuracy. The R package CLVTools provides an efficient and user-friendly implementation framework to apply key probabilistic models such as the Pareto/NBD and Gamma-Gamma model. Further, it provides access to the latest model extensions to include time-invariant and time-varying covariates, parameter regularization, and equality constraints. This article gives an overview of the fundamental ideas of these statistical models and illustrates their application to derive CLV predictions for existing and new customers.

2602.09833 2026-02-11 math.ST stat.TH

Density estimation from batched broken random samples

Hancheng Bi, Bernhard Schmitzer, Thilo D. Stier

Comments: 18 pages, 4 figures

Abstract

The broken random sample problem was first introduced by DeGroot, Feder, and Goel (1971, Ann. Math. Statist.): in each observation (batch), a random sample of $M$ i.i.d. point pairs $((X_i,Y_i))_{i=1}^M$ is drawn from a joint distribution with density $p(x,y)$, but we can observe only the unordered multisets $(X_i)_{i=1}^M$ and $(Y_i)_{i=1}^M$ separately; that is, the pairing information is lost. For large $M$, inferring $p$ from a single observation has been shown to be essentially impossible. In this paper, we propose a parametric method based on a pseudo-log-likelihood to estimate $p$ from $N$ i.i.d. broken sample batches, and we prove a fast convergence rate in $N$ for our estimator that is uniform in $M$, under mild assumptions.

2602.09762 2026-02-11 math.ST cs.NA math.NA stat.TH

Asymptotic analysis of the Gaussian kernel matrix for partially noisy data in high dimensions

Kensuke Aishima

Abstract

The Gaussian kernel is one of the most important kernels and is used in many research fields, including scientific computing and data science. In this paper, we present an asymptotic analysis of the Gaussian kernel matrix in high dimensions under a statistical model of noisy data. The main result combines El Karoui's asymptotic analysis with procedures for constrained low-rank matrix approximation. More specifically, El Karoui clarified an important asymptotic structure of the Gaussian kernel matrix, which leads to strong consistency of the eigenvectors even though the eigenvalues are inconsistent. Building on these results, this paper presents a consistent estimator that uses the smallest eigenvalue whenever the target kernel matrix tends to low rank in the asymptotic regime. Importantly, the asymptotic analysis is carried out under a statistical model representing partial noise. Although a naive estimator is inconsistent, applying an optimization method for low-rank approximation with constraints overcomes this difficulty, yielding a new estimator that is strongly consistent in rank-deficient cases.

2602.09731 2026-02-11 stat.ME

Bayesian identification of early warning signals for long-range dependent climatic time series

Sigrunn H. Sørbye, Eirik Myrvoll-Nilsen, Håvard Rue

Comments: 27 pages, 9 figures

Abstract

Detecting early warning signals in climatic time series is essential for anticipating critical transitions and tipping points. Common statistical indicators include increased variance and lag-one autocorrelation prior to bifurcation points. However, these indicators are sensitive to observational noise, long-term mean trends, and long-memory dependence, all of which are prevalent in climatic time series. Such effects can easily obscure genuine signals or generate spurious detections. To address these challenges, we employ a flexible Bayesian framework for modelling time-varying autocorrelation in long-range dependent time series, also accounting for time-varying variance. The approach uses a mixture of two fractional Gaussian noise processes with a time-dependent weight function to represent fractional Gaussian noise with a time-varying Hurst exponent. Inference is performed via integrated nested Laplace approximation, enabling joint estimation of mean trends and handling of irregularly sampled observations. The strengths and limitations of detecting changes in the autocorrelation are investigated in extensive simulations. Applied to real climatic data sets, we find evidence of early warning signals in a reconstructed Atlantic multidecadal variability index, while dismissing such signals for paleoclimate records spanning the Dansgaard-Oeschger events.
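
The two common indicators named above, rolling variance and lag-one autocorrelation, can be sketched as follows. This is the conventional sliding-window approach that the abstract says is sensitive to noise, trends, and long memory, not the authors' Bayesian fractional-Gaussian-noise model; the window length is a user choice.

```python
def rolling_indicators(series, window):
    """Rolling variance and lag-one autocorrelation over a sliding window.

    Returns one (variance, lag1_autocorrelation) pair per window position."""
    out = []
    for end in range(window, len(series) + 1):
        w = series[end - window:end]
        mean = sum(w) / window
        dev = [v - mean for v in w]
        var = sum(d * d for d in dev) / window
        cov1 = sum(dev[i] * dev[i + 1] for i in range(window - 1)) / window
        ac1 = cov1 / var if var > 0 else float("nan")
        out.append((var, ac1))
    return out
```

Applied to a simulated AR(1) series whose coefficient slowly increases (a classic critical-slowing-down toy model), both indicators trend upward; a deterministic trend or long memory in the data could produce a similar rise, which is the confounding the paper's model is designed to handle.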

2602.09720 2026-02-11 stat.ML cs.LG

Continual Learning for non-stationary regression via Memory-Efficient Replay

Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo, Martín Alonso-Gamarra

Abstract

Data streams are rarely static in dynamic environments like Industry 4.0. Instead, they constantly change, making traditional offline models outdated unless they can quickly adjust to the new data. This need can be adequately addressed by continual learning (CL), which allows systems to gradually acquire knowledge without incurring the prohibitive costs of retraining them from scratch. Most research on continual learning focuses on classification problems, while very few studies address regression tasks. We propose the first prototype-based generative replay framework designed for online task-free continual regression. Our approach defines an adaptive output-space discretization model, enabling prototype-based generative replay for continual regression without storing raw data. Evidence obtained from several benchmark datasets shows that our framework reduces forgetting and provides more stable performance than other state-of-the-art solutions.

2602.09704 2026-02-11 stat.ME stat.ML

Extended Isolation Forest with feature sensitivities

Illia Donhauzer

Comments: The automated classifier suggested cs.LG. We believe the paper is primarily machine learning theory, and we would appreciate cross-listing to cs.LG or stat.ML if deemed appropriate

Abstract

Compared to theoretical frameworks that assume equal sensitivity to deviations in all features of data, the theory of anomaly detection allowing for variable sensitivity across features is less developed. To the best of our knowledge, this issue has not yet been addressed in the context of isolation-based methods, and this paper represents the first attempt to do so. This paper introduces an Extended Isolation Forest (EIF) with feature sensitivities, which we refer to as the Anisotropic Isolation Forest (AIF). In contrast to the standard EIF, the AIF enables anomaly detection with controllable sensitivity to deviations in different features or directions in the feature space. The paper also introduces novel measures of directional sensitivity, which allow quantification of the AIF's sensitivity in different directions in the feature space. These measures enable adjustment of the AIF's sensitivity to task-specific requirements. We demonstrate the performance of the algorithm by applying it to synthetic and real-world datasets. The results show that the AIF enables anomaly detection that focuses on directions in the feature space where deviations from typical behavior are more important.
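
For orientation, the standard Extended Isolation Forest that the AIF modifies can be sketched compactly: random hyperplane splits with Gaussian normal vectors and intercepts drawn uniformly in the bounding box, and the usual score $s = 2^{-E[h(x)]/c(\psi)}$, where $h$ is path length and $c(\psi)$ normalises by subsample size. This is our illustrative implementation, not the paper's AIF: the abstract does not describe how feature sensitivities enter, and changing the distribution from which `normal` is drawn would be only one hypothetical way to make splits direction-dependent.

```python
import math
import random


def c_factor(n):
    """Average unsuccessful-search path length in a BST of n points (normaliser)."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # exact H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _goes_left(point, normal, intercept):
    return sum((x - p) * w for x, p, w in zip(point, intercept, normal)) <= 0


def build_tree(points, depth, max_depth, rng):
    """Extended isolation tree: split by a random hyperplane through a random point."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = len(points[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random slope, as in EIF
    intercept = [rng.uniform(min(p[k] for p in points), max(p[k] for p in points))
                 for k in range(dim)]
    left = [p for p in points if _goes_left(p, normal, intercept)]
    right = [p for p in points if not _goes_left(p, normal, intercept)]
    return {"normal": normal, "intercept": intercept,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node, depth=0):
    if "size" in node:
        return depth + c_factor(node["size"])  # adjust for the unbuilt subtree
    child = "left" if _goes_left(point, node["normal"], node["intercept"]) else "right"
    return path_length(point, node[child], depth + 1)


def anomaly_scores(data, queries, n_trees=100, sample_size=64, seed=0):
    """Scores s = 2 ** (-E[h(x)] / c(psi)); values near 1 indicate anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))  # assumes psi >= 2
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(q, t) for t in trees) / n_trees / norm)
            for q in queries]
```

On two-dimensional standard normal data, a far-away query such as `(6.0, 6.0)` is isolated after very few splits and therefore scores higher than a central point such as `(0.0, 0.0)`.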

2602.09643 2026-02-11 math.PR math.ST stat.TH

A simple proof of the discreteness of Dirichlet processes

Nils Lid Hjort

Comments Based on pages 18-19 in N.L. Hjort's graduate thesis, 1976

Abstract

That Dirichlet processes are discrete with probability 1 is demonstrated once more. And yes, these two pages spent fifty years in Norwegian.
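The paper's two-page argument is not reproduced here. The same fact can be illustrated by a different route: Sethuraman's stick-breaking representation. It writes $G = \sum_k w_k \delta_{\theta_k}$ with $w_k = v_k \prod_{l<k}(1 - v_l)$, $v_k \sim \mathrm{Beta}(1, \alpha)$, and atoms $\theta_k$ drawn i.i.d. from the base measure $G_0$. Because $G$ is a countable mixture of point masses, draws from it repeat values even when $G_0$ is continuous. The sketch truncates the sum at a fixed number of atoms, which is an approximation. With $\alpha = 2$ and 1000 atoms the neglected mass has expectation $(2/3)^{1000}$.

```python
import numpy as np


def sample_from_dp(alpha, n, n_atoms=1000, seed=0):
    """Draw n observations from one G ~ DP(alpha, N(0, 1)) via truncated stick-breaking."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=n_atoms)                  # stick-breaking fractions
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    weights = v * stick_left
    weights /= weights.sum()                                # absorb the truncated tail
    atoms = rng.standard_normal(n_atoms)                    # atoms i.i.d. from G0 = N(0, 1)
    return atoms[rng.choice(n_atoms, size=n, p=weights)]


draws = sample_from_dp(alpha=2.0, n=500)
# A continuous G0 alone would give 500 distinct values; draws from G repeat.
print(len(draws), len(np.unique(draws)))
```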

2602.09632 2026-02-11 stat.AP

Bayesian network approach to building an affective module for a driver behavioural model

Dorota Młynarczyk, Gabriel Calvo, Francisco Palmi-Perales, Carmen Armero, Virgilio Gómez-Rubio, Ana de la Torre-García, Ricardo Bayona Salvador

Abstract

This paper focuses on the affective component of a driver behavioural model (DBM). This component models drivers' mental states, such as mental load and active fatigue, that may affect driving performance. We have used Bayesian networks (BNs) to explore the dependencies between various relevant random variables and assess the probability that a driver is in a particular mental state based on their physiological and demographic conditions. Through this approach, our goal is to improve our understanding of driver behaviour in dynamic environments, with potential applications in traffic safety and autonomous vehicle technologies.

2602.09619 2026-02-11 math.ST math.AG stat.TH

Discrete-time, discrete-state multistate Markov models from the perspective of algebraic statistics

Dario Gasbarra, Kaie Kubjas, Sangita Kulathinal, Nataliia Kushnerchuk, Fatemeh Mohammadi, Etienne Sebag

Abstract

We study discrete-time, discrete-state multistate Markov models from the perspective of algebraic statistics. These models are widely studied in event history analysis, and are characterized by the state space, the initial distribution and the transition probabilities. A finite path under the multistate Markov model is a particular set of states occupied at finite time instances $\{1, \dots, n\}$. The main goal of this paper is to establish a bridge between event history analysis and algebraic statistics. The joint probabilities of finite paths in these models have a natural monomial parametrization in terms of the initial distribution and the transition probabilities. We study the polynomial relations among joint path probabilities. When the statistical constraints on the parameters are disregarded, nonhomogeneous multistate Markov models of arbitrary order can be viewed as slices of decomposable hierarchical models. This yields a complete description of their vanishing ideals as toric ideals generated by explicit families of binomials. Moreover, the variety of this vanishing ideal equals the nonhomogeneous multistate Markov model on the probability simplex. In contrast, homogeneous multistate Markov models exhibit different algebraic behavior, as time homogeneity imposes additional polynomial relations, leading to vanishing ideals that are strictly larger than in the nonhomogeneous case. We also derive families of binomial relations that vanish on homogeneous multistate Markov models. We investigate maximum likelihood estimation from statistical and algebraic perspectives. For nonhomogeneous models, classical and algebraic formulas agree; in the homogeneous case, the algebraic approach is more complex. Lastly, we provide data applications where we demonstrate the statistical theory to obtain the maximum likelihood estimates of the parameters under specific multistate Markov models.
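Consider a first-order nonhomogeneous chain observed at three times. Its joint path probabilities are the monomials $p_{ijl} = \pi_i P^{(1)}_{ij} P^{(2)}_{jl}$. The short Python check below uses random parameters and three states. It confirms that these monomials satisfy binomial relations of the form $p_{ijl}\,p_{i'jl'} - p_{ijl'}\,p_{i'jl} = 0$, which is the algebraic form of "past and future are independent given the present state". This is one illustrative family of relations, not the paper's full generating set.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
k = 3                                          # number of states
pi = rng.dirichlet(np.ones(k))                 # initial distribution
P1 = rng.dirichlet(np.ones(k), size=k)         # transitions from time 1 to 2 (rows sum to 1)
P2 = rng.dirichlet(np.ones(k), size=k)         # transitions from time 2 to 3 (nonhomogeneous)

# Monomial parametrization of path probabilities: p[i, j, l] = pi_i * P1[i, j] * P2[j, l].
p = np.einsum("i,ij,jl->ijl", pi, P1, P2)
assert np.isclose(p.sum(), 1.0)

# For each middle state j: p[i, j, l] * p[i2, j, l2] - p[i, j, l2] * p[i2, j, l] = 0.
max_violation = max(
    abs(p[i, j, l] * p[i2, j, l2] - p[i, j, l2] * p[i2, j, l])
    for i, i2, j, l, l2 in itertools.product(range(k), repeat=5)
)
print(max_violation)   # zero up to floating-point rounding
```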

2602.09566 2026-02-11 cs.LG cs.AI cs.CV stat.ME

ECG-IMN: Interpretable Mesomorphic Neural Networks for 12-Lead Electrocardiogram Interpretation

Vajira Thambawita, Jonas L. Isaksen, Jørgen K. Kanters, Hugo L. Hammer, Pål Halvorsen

Abstract

Deep learning has achieved expert-level performance in automated electrocardiogram (ECG) diagnosis, yet the "black-box" nature of these models hinders their clinical deployment. Trust in medical AI requires not just high accuracy but also transparency regarding the specific physiological features driving predictions. Existing explainability methods for ECGs typically rely on post-hoc approximations (e.g., Grad-CAM and SHAP), which can be unstable, computationally expensive, and unfaithful to the model's actual decision-making process. In this work, we propose the ECG-IMN, an Interpretable Mesomorphic Neural Network tailored for high-resolution 12-lead ECG classification. Unlike standard classifiers, the ECG-IMN functions as a hypernetwork: a deep convolutional backbone generates the parameters of a strictly linear model specific to each input sample. This architecture enforces intrinsic interpretability, as the decision logic is mathematically transparent and the generated weights (W) serve as exact, high-resolution feature attribution maps. We introduce a transition decoder that effectively maps latent features to sample-wise weights, enabling precise localization of pathological evidence (e.g., ST-elevation, T-wave inversion) in both time and lead dimensions. We evaluate our approach on the PTB-XL dataset for classification tasks, demonstrating that the ECG-IMN achieves competitive predictive performance (AUROC comparable to black-box baselines) while providing faithful, instance-specific explanations. By explicitly decoupling parameter generation from prediction execution, our framework bridges the gap between deep learning capability and clinical trustworthiness, offering a principled path toward "white-box" cardiac diagnostics.
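The key property is that the prediction is linear in the input once the weights are generated: $\text{logit} = \sum_{\ell,t} W_{\ell t}(x)\,x_{\ell t} + b(x)$. Either $W$ itself or the elementwise product $W \odot x$ can be read as a lead-by-time map, and the latter sums exactly to the logit minus the bias. The toy numpy sketch below illustrates this decomposition. In it, `toy_backbone` is a hypothetical stand-in for the paper's convolutional backbone and transition decoder, and the 500-sample signal length is assumed.

```python
import numpy as np

N_LEADS, N_TIME = 12, 500   # 12 leads; 500 time samples per lead (assumed length)


def toy_backbone(x, theta):
    """Hypothetical stand-in for the deep backbone.

    Maps an ECG x of shape (leads, time) to sample-specific weights W(x) of
    the same shape and a scalar bias b(x).
    """
    W = np.tanh(theta["scale"] * x + theta["shift"])
    b = float(theta["b0"] + 0.1 * W.mean())
    return W, b


def predict_with_attribution(x, theta):
    """Logit of the sample-specific linear model: sum(W(x) * x) + b(x)."""
    W, b = toy_backbone(x, theta)
    attribution = W * x                      # per-lead, per-time contribution
    logit = float(attribution.sum() + b)
    return logit, attribution, b


rng = np.random.default_rng(0)
theta = {
    "scale": rng.standard_normal((N_LEADS, 1)),   # one value per lead, broadcast over time
    "shift": rng.standard_normal((N_LEADS, 1)),
    "b0": 0.0,
}
ecg = rng.standard_normal((N_LEADS, N_TIME))
logit, attribution, bias = predict_with_attribution(ecg, theta)
per_lead = attribution.sum(axis=1)           # summary of evidence by lead
```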

2602.09542 2026-02-11 stat.ME

High Dimensional Mean Test for Shrinking Random Variables with Applications to Backtesting

Liujun Chen, Chen Zhou

Abstract

We propose a high dimensional mean test framework for shrinking random variables, where the underlying random variables shrink to zero as the sample size increases. By pooling observations across overlapping subsets of dimensions, we estimate subset means and test whether the maximum absolute mean deviates from zero. This approach overcomes cancellations that occur in simple averaging and remains valid even when marginal asymptotic normality fails. We establish theoretical properties of the test statistic and develop a multiplier bootstrap procedure to approximate its distribution. The method provides a flexible and powerful tool for the validation and comparative backtesting of value-at-risk. Simulations show superior performance in high-dimensional settings, and a real-data application demonstrates its practical effectiveness in backtesting.
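The abstract does not give the exact pooling rule or statistic. The sketch below assumes that pooling means averaging the coordinates of each subset within each observation. On that basis it implements a generic max-type statistic calibrated by a Gaussian multiplier bootstrap, and it ignores the shrinking-variable asymptotics that are the paper's focus. Taking the maximum over subsets, rather than averaging all coordinates, keeps positive and negative deviations in different subsets from cancelling.

```python
import numpy as np


def max_subset_mean_test(x, subsets, n_boot=2000, seed=0):
    """Max-type test of H0: every subset mean is zero.

    x: (n, d) data; subsets: list of (possibly overlapping) index arrays.
    Returns the statistic and a multiplier-bootstrap p-value.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Assumed pooling rule: average the coordinates of each subset, per observation.
    y = np.column_stack([x[:, s].mean(axis=1) for s in subsets])   # shape (n, K)
    ybar = y.mean(axis=0)
    sd = y.std(axis=0, ddof=1)
    stat = float(np.max(np.sqrt(n) * np.abs(ybar) / sd))
    # Gaussian multipliers applied to standardized, centered observations.
    z = (y - ybar) / sd
    e = rng.standard_normal((n_boot, n))
    boot = np.max(np.abs(e @ z), axis=1) / np.sqrt(n)
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, float(p_value)


rng = np.random.default_rng(1)
n, d = 200, 50
subsets = [np.arange(j, j + 5) for j in range(0, d - 4, 2)]   # overlapping windows
x_alt = rng.standard_normal((n, d))
x_alt[:, :5] += 0.5                                            # signal in the first subset
print(max_subset_mean_test(x_alt, subsets))
```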

2602.09537 2026-02-11 stat.ME

A joint QoL-Survival framework with debiased estimation under truncation by death

Torben Martinussen, Klaus K. Holst, Christian Bressen Pipper, Per Kragh Andersen

Abstract

Evaluating quality-of-life (QoL) outcomes in populations with high mortality risk is complicated by truncation by death, since QoL is undefined for individuals who do not survive to the planned measurement time. We propose a framework that jointly models the distribution of QoL and survival without extrapolating QoL beyond death. Inspired by multistate formulations, we extend the joint characterization of binary health states and mortality to continuous QoL outcomes. Because treatment effects cannot be meaningfully summarized in a single one-dimensional estimand without strong assumptions, our approach simultaneously considers both survival and the joint distribution of QoL and survival, with the latter conveniently displayed in a simplex. We develop assumption-lean, semiparametric estimators based on efficient influence functions, yielding flexible, root-n consistent estimators that accommodate machine-learning methods while making transparent the conditions these must satisfy. The proposed method is illustrated through simulation studies and two real-data applications.

2602.09512 2026-02-11 stat.ME stat.CO

Continuous mixtures of Gaussian processes as models for spatial extremes

Lorenzo Dell'Oro, Carlo Gaetan, Thomas Opitz

Abstract

Spatial modelling of extreme values allows studying the risk of joint occurrence of extreme events at different locations and is of significant interest in climatic and other environmental sciences. A popular class of dependence models for spatial extremes is that of random location-scale mixtures, in which a spatial "baseline" process is multiplied or shifted by a random variable, potentially altering its extremal dependence behaviour. Gaussian location-scale mixtures retain benefits of their Gaussian baseline processes while overcoming some of their limitations, such as symmetry, light tails and weak tail dependence. We review properties of Gaussian location-scale mixtures and develop novel constructions with interesting features, together with a general algorithm for conditional simulation from these models. We leverage their flexibility to propose extended extreme-value models that allow for appropriately modelling not only the tails but also the bulk of the data. This is important in many applications and avoids the need to explicitly select the events considered as extreme. We propose new solutions for likelihood inference in parametric models of Gaussian location-scale mixtures, in order to avoid the numerical bottleneck given by the latent location and scale variables that can lead to high computational cost of standard likelihood evaluations. The effectiveness of the models and of the inference methods is confirmed with simulated data examples, and we present an application to wildfire-related weather variables in Portugal. Although not detailed here, the approaches would also be straightforward to use for modelling multivariate (non-spatial) data.

2602.09456 2026-02-11 cs.LG stat.ML

Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits

Hao Qin, Chicheng Zhang

Comments 40 pages (13 pages main body, 24 pages supplementary materials)

Abstract

We propose an algorithmic framework, Offline Estimation to Decisions (OE2D), that reduces contextual bandit learning with general reward function approximation to offline regression. The framework allows near-optimal regret for contextual bandits with large action spaces with $O(\log T)$ calls to an offline regression oracle over $T$ rounds, and makes $O(\log\log T)$ calls when $T$ is known. The design of the OE2D algorithm generalizes Falcon~\citep{simchi2022bypassing} and its linear reward version~\citep[][Section 4]{xu2020upper} in that it chooses an action distribution, which we term "exploitative F-design", that simultaneously guarantees low regret and good coverage, trading off exploration and exploitation. Central to our regret analysis is a new complexity measure, the Decision-Offline Estimation Coefficient (DOEC), which we show is bounded in the bounded Eluder dimension per-context and smoothed regret settings. We also establish a relationship between DOEC and the Decision Estimation Coefficient (DEC)~\citep{foster2021statistical}, bridging the design principles of offline- and online-oracle efficient contextual bandit algorithms for the first time.
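The gap between $O(\log T)$ and $O(\log\log T)$ oracle calls comes from the epoch schedule, because the oracle is called once per epoch. The sketch below assumes the two schedules used by Falcon as we understand them. When $T$ is unknown, epochs double, with $\tau_m = 2^m$. When $T$ is known, $\tau_m = 2T^{1-2^{-m}}$. OE2D's exact schedule may differ.

```python
def oracle_calls(T, schedule):
    """Number of epochs (one offline regression call each) until the epoch end reaches T."""
    m, tau = 0, 0.0
    while tau < T:
        m += 1
        tau = schedule(m, T)
    return m


def doubling(m, T):
    """Horizon unknown: epoch m ends at round 2^m."""
    return 2 ** m


def known_horizon(m, T):
    """Horizon known: epoch m ends at round 2 * T^(1 - 2^-m)."""
    return 2 * T ** (1 - 2.0 ** (-m))


for T in (10**4, 10**6, 10**8):
    print(T, oracle_calls(T, doubling), oracle_calls(T, known_horizon))
```

For $T = 10^6$ this gives 20 calls versus 5. The known-horizon count grows like $\log_2 \log_2 T$ because $2T^{1-2^{-m}} \ge T$ exactly when $2^m \ge \log_2 T$.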

2602.09356 2026-02-11 math.ST stat.TH

Regularized geometric quantiles and universal linear distribution functionals

Dimitri Konen, Gilles Stupfler

Abstract

Geometric quantiles are popular location functionals to build rank-based statistical procedures in multivariate settings. They are obtained through the minimization of a non-smooth convex objective function. As a result, the singularity of the directional derivatives leads to numerical instabilities and poor sample properties as well as surprising `phase transitions' from empirical to population distributions. To solve these issues, we introduce a regularized version of geometric distribution functions and quantiles that are provably close to the usual geometric concepts and share their qualitative properties, both in the empirical and continuous case, while allowing for a much broader applicability of asymptotic results without any moment condition. We also show that any linear assignment of probability measures (such as the univariate distribution function), that is also translation- and orthogonal-equivariant, necessarily coincides with one of our regularized geometric distribution functions.
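The empirical geometric distribution function is $F_n(x) = \frac{1}{n}\sum_i (x - X_i)/\lVert x - X_i\rVert$, which is the gradient of $x \mapsto \frac{1}{n}\sum_i \lVert x - X_i\rVert$ away from the data points. The geometric quantile of order $u$, with $\lVert u\rVert < 1$, solves $F(q) = u$. The unit vectors in $F_n$ are undefined at the data points, and this is the singularity behind the instabilities described in the abstract. The abstract does not give the paper's regularization. The smoothed version below, which replaces $\lVert z\rVert$ with $\sqrt{\lVert z\rVert^2 + \delta^2}$, is therefore a hypothetical stand-in that only shows the singularity being removed.

```python
import numpy as np


def geometric_df(x, data):
    """Empirical geometric distribution function: mean of (x - X_i) / ||x - X_i||.

    Undefined (0/0) when x coincides with a data point.
    """
    diff = x - data
    return (diff / np.linalg.norm(diff, axis=1, keepdims=True)).mean(axis=0)


def smoothed_geometric_df(x, data, delta=1e-2):
    """Hypothetical regularization: ||z|| replaced by sqrt(||z||^2 + delta^2)."""
    diff = x - data
    return (diff / np.sqrt((diff ** 2).sum(axis=1, keepdims=True) + delta ** 2)).mean(axis=0)


rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))
with np.errstate(invalid="ignore"):
    at_data_point = geometric_df(data[0], data)                # contains NaN
smooth_at_data_point = smoothed_geometric_df(data[0], data)    # finite, norm < 1
```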

2602.09351 2026-02-11 stat.ME

Supervised Learning of Functional Outcomes with Predictors at Different Scales: A Functional Gaussian Process Approach

R. Jacob Andros, Rajarshi Guhaniyogi, Devin Francom, Donatella Pasqualini

Abstract

The analysis of complex computer simulations, often involving functional data, presents unique statistical challenges. Conventional regression methods, such as function-on-function regression, typically associate functional outcomes with both scalar and functional predictors on a per-realization basis. However, simulation studies often demand a more nuanced approach to disentangle nonlinear relationships of functional outcome with predictors observed at multiple scales: domain-specific functional predictors that are fixed across simulation runs, and realization-specific global predictors that vary between runs. In this article, we develop a novel supervised learning framework tailored to this setting. We propose an additive nonlinear regression model that flexibly captures the influence of both predictor types. The effects of functional predictors are modeled through spatially-varying coefficients governed by a Gaussian process prior. Crucially, to capture the impact of global predictors on the functional outcome, we introduce a functional Gaussian process (fGP) prior. This new prior jointly models the entire collection of unknown, spatially-indexed nonlinear functions that encode the effects of the global predictors over the entire domain, explicitly accounting for their spatial dependence. This integrated architecture enables simultaneous learning from both predictor types, provides principled strategies to quantify their respective contributions in predicting the functional outcome, and delivers rigorous uncertainty estimates for both model parameters and predictions. The utility and robustness of our approach are demonstrated through multiple synthetic datasets and a real-world application involving outputs from the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) model.

2602.09314 2026-02-11 cs.LG cs.AI stat.ML

Clarifying Shampoo: Adapting Spectral Descent to Stochasticity and the Parameter Trajectory

Runa Eschenhagen, Anna Cai, Tsung-Hsien Lee, Hao-Jun Michael Shi

Abstract

Optimizers leveraging the matrix structure in neural networks, such as Shampoo and Muon, are more data-efficient than element-wise algorithms like Adam and Signum. While in specific settings, Shampoo and Muon reduce to spectral descent analogous to how Adam and Signum reduce to sign descent, their general relationship and relative data efficiency under controlled settings remain unclear. Through extensive experiments on language models, we demonstrate that Shampoo achieves higher token efficiency than Muon, mirroring Adam's advantage over Signum. We show that Shampoo's update applied to weight matrices can be decomposed into an adapted Muon update. Consistent with this, Shampoo's benefits can be exclusively attributed to its application to weight matrices, challenging interpretations agnostic to parameter shapes. This admits a new perspective that also avoids shortcomings of related interpretations based on variance adaptation and whitening: rather than enforcing semi-orthogonality as in spectral descent, Shampoo's updates are time-averaged semi-orthogonal in expectation.
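Consider Shampoo with no accumulated statistics and no damping, applied to a gradient $G = U\Sigma V^\top$ with $L = GG^\top$ and $R = G^\top G$. Then $L^{-1/4} G R^{-1/4} = U\Sigma^{-1/2}U^\top U\Sigma V^\top V\Sigma^{-1/2}V^\top = UV^\top$, which is the spectral-descent direction that Muon approximates. The numpy check below assumes a square, full-rank $G$ and uses an exact SVD in place of Muon's Newton-Schulz iteration.

```python
import numpy as np


def inverse_root(M, p):
    """M^(-1/p) for a symmetric positive-definite M via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals ** (-1.0 / p)) @ vecs.T


def shampoo_one_step(G):
    """Shampoo direction L^(-1/4) G R^(-1/4) with L = G G^T, R = G^T G (no accumulation)."""
    return inverse_root(G @ G.T, 4) @ G @ inverse_root(G.T @ G, 4)


def spectral_direction(G):
    """Orthogonalized direction U V^T (computed exactly here, approximated in Muon)."""
    U, _, Vt = np.linalg.svd(G)
    return U @ Vt


rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))          # square and (almost surely) full rank
print(np.max(np.abs(shampoo_one_step(G) - spectral_direction(G))))
```

In practice Shampoo accumulates $L$ and $R$ over many steps and adds $\epsilon I$, so the identity holds only in this special case. This is why the general relationship between the two methods is less obvious.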

2602.09279 2026-02-11 stat.ME math.ST stat.TH

Stochastic EM Estimation and Inference for Zero-Inflated Beta-Binomial Mixed Models for Longitudinal Count Data

John Barrera, Ana Arribas-Gil, Dae-Jin Lee, Cristian Meza

Comments 21 pages, 4 figures

Abstract

Analyzing overdispersed, zero-inflated, longitudinal count data poses significant modeling and computational challenges, which standard count models (e.g., Poisson or negative binomial mixed effects models) fail to adequately address. We propose a Zero-Inflated Beta-Binomial Mixed Effects Regression (ZIBBMR) model that augments a beta-binomial count model with a zero-inflation component, fixed effects for covariates, and subject-specific random effects, accommodating excessive zeros, overdispersion, and within-subject correlation. Maximum likelihood estimation is performed via a Stochastic Approximation EM (SAEM) algorithm with latent variable augmentation, which circumvents the model's intractable likelihood and enables efficient computation. Simulation studies show that ZIBBMR achieves accuracy comparable to leading mixed-model approaches in the literature and surpasses simpler zero-inflated count formulations, particularly in small-sample scenarios. As a case study, we analyze longitudinal microbiome data, comparing ZIBBMR with an external Zero-Inflated Beta Regression (ZIBR) benchmark; the results indicate that applying both count- and proportion-based models in parallel can enhance inference robustness when both data types are available.
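The distributional core of the model is the zero-inflated beta-binomial pmf: $P(Y = 0) = \pi + (1-\pi)\,\mathrm{BB}(0)$ and $P(Y = k) = (1-\pi)\,\mathrm{BB}(k)$ for $k \ge 1$. The SciPy sketch below assumes a mean-precision parametrization $a = \mu\phi$, $b = (1-\mu)\phi$. This is a common choice but not necessarily the paper's. The sketch also omits the fixed and random effects, which would enter through link functions for $\pi$ and $\mu$.

```python
import numpy as np
from scipy.stats import betabinom


def zibb_pmf(k, n, pi, mu, phi):
    """Zero-inflated beta-binomial pmf.

    pi: probability of a structural zero; mu in (0, 1): mean proportion of the
    beta-binomial part; phi > 0: precision, so a = mu * phi and b = (1 - mu) * phi.
    """
    k = np.asarray(k)
    bb = betabinom.pmf(k, n, mu * phi, (1.0 - mu) * phi)
    return np.where(k == 0, pi + (1.0 - pi) * bb, (1.0 - pi) * bb)


ks = np.arange(0, 21)
probs = zibb_pmf(ks, n=20, pi=0.3, mu=0.4, phi=5.0)
print(probs[0], (ks * probs).sum())   # inflated P(0); mean (1 - pi) * n * mu
```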

2602.09277 2026-02-11 stat.ML cs.LG

Mutual Information Collapse Explains Disentanglement Failure in $β$-VAEs

Minh Vu, Xiaoliang Wan, Shuangqing Wei

Abstract

The $β$-VAE is a foundational framework for unsupervised disentanglement, using $β$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: benchmarks such as MIG and SAP typically peak at intermediate $β$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, where strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $β> 1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $λβ$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $λ$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $λ> 0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $β$, providing a principled theoretical justification for dual-parameter regularization in variational inference backbones.

2602.09247 2026-02-11 stat.CO

Motivating REML via Prediction-Error Covariances in EM Updates for Linear Mixed Models

Andrew T. Karl

Abstract

We present a computational motivation for restricted maximum likelihood (REML) estimation in linear mixed models using an expectation--maximization (EM) algorithm. At each iteration, maximum likelihood (ML) and REML solve the same mixed-model equations for the best linear unbiased estimator (BLUE) of the fixed effects and the best linear unbiased predictor (BLUP) of the random effects. They differ only in the trace adjustments used in the variance-component updates: ML uses conditional covariances of the random effects given the data, whereas REML uses prediction-error covariances from Henderson's C-matrix, reflecting uncertainty from estimating the fixed effects. Short R code makes this switch explicit, exposes the key matrices for classroom inspection, and reproduces lme4 ML and REML fits.
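The following Python sketch shows the same switch for a model with a single random-effect variance component: $y = X\beta + Zu + e$, $u \sim N(0, \sigma_u^2 I_q)$, $e \sim N(0, \sigma_e^2 I_n)$. It is our illustration, not the paper's R code. Let $\lambda = \sigma_e^2/\sigma_u^2$, $W = [X\ Z]$, and $C = W^\top W + \mathrm{diag}(0, \lambda I_q)$. Both methods solve $C\theta = W^\top y$. REML then takes its traces from blocks of $C^{-1}$. ML instead uses $(Z^\top Z + \lambda I)^{-1}$, which is the conditional covariance of $u$ given $y$ up to the factor $\sigma_e^2$.

```python
import numpy as np


def em_variance_components(y, X, Z, method="REML", max_iter=20000, tol=1e-12):
    """EM updates for (sigma_u^2, sigma_e^2) in y = X b + Z u + e.

    ML and REML share the mixed-model equations; only the trace terms differ.
    """
    n, p = X.shape
    q = Z.shape[1]
    W = np.hstack([X, Z])
    WtW, Wty, ZtZ = W.T @ W, W.T @ y, Z.T @ Z
    s2u, s2e = 1.0, 1.0
    for _ in range(max_iter):
        lam = s2e / s2u
        C = WtW.copy()                         # Henderson's coefficient matrix
        C[p:, p:] += lam * np.eye(q)
        theta = np.linalg.solve(C, Wty)        # [BLUE of b, BLUP of u]
        u_hat = theta[p:]
        resid = y - W @ theta
        if method == "REML":
            # Prediction-error covariances: s2e times blocks of C^{-1}.
            C_inv = np.linalg.inv(C)
            tr_u = np.trace(C_inv[p:, p:])
            tr_e = np.trace(C_inv @ WtW)
        elif method == "ML":
            # Conditional covariance of u given y, with b held at its current estimate.
            M_inv = np.linalg.inv(ZtZ + lam * np.eye(q))
            tr_u = np.trace(M_inv)
            tr_e = np.trace(M_inv @ ZtZ)
        else:
            raise ValueError("method must be 'ML' or 'REML'")
        new_s2u = (u_hat @ u_hat + s2e * tr_u) / q
        new_s2e = (resid @ resid + s2e * tr_e) / n
        converged = abs(new_s2u - s2u) + abs(new_s2e - s2e) < tol
        s2u, s2e = new_s2u, new_s2e
        if converged:
            break
    return s2u, s2e


# Balanced one-way layout: a groups of m observations each.
rng = np.random.default_rng(0)
a, m = 8, 6
groups = np.repeat(np.arange(a), m)
y = 3.0 + rng.normal(0.0, 2.0, a)[groups] + rng.normal(0.0, 1.0, a * m)
X = np.ones((a * m, 1))
Z = np.eye(a)[groups]
print(em_variance_components(y, X, Z, "REML"), em_variance_components(y, X, Z, "ML"))
```

For balanced one-way data with $a$ groups of size $m$, the fixed points can be checked against closed forms, provided these are positive. REML gives $\hat\sigma_e^2 = \mathrm{MSW}$ and $\hat\sigma_u^2 = (\mathrm{MSB} - \mathrm{MSW})/m$. ML gives the same $\hat\sigma_e^2$ and $\hat\sigma_u^2 = (\mathrm{SSB}/a - \mathrm{MSW})/m$.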

2602.09240 2026-02-11 math.ST cs.IT cs.LG math.IT math.PR stat.ML stat.TH

Optimal Estimation in Orthogonally Invariant Generalized Linear Models: Spectral Initialization and Approximate Message Passing

Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli

Abstract

We consider the problem of parameter estimation from a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design to have an arbitrary distribution of singular values and only assumes that its singular vectors are generic. It is a vast generalization of the i.i.d. Gaussian design typically considered in the theoretical literature, and is motivated by the fact that real data often have a complex correlation structure so that methods relying on i.i.d. assumptions can be highly suboptimal. Building on the paradigm of spectrally-initialized iterative optimization, this paper proposes optimal spectral estimators and combines them with an approximate message passing (AMP) algorithm, establishing rigorous performance guarantees for these two algorithmic steps. Both the spectral initialization and the subsequent AMP meet existing conjectures on the fundamental limits to estimation -- the former on the optimal sample complexity for efficient weak recovery, and the latter on the optimal errors. Numerical experiments suggest the effectiveness of our methods and accuracy of our theory beyond orthogonally invariant data.

2602.09235 2026-02-11 cs.LG stat.AP stat.ME

RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata

Matthias Templ, Oscar Thees, Roman Müller

Comments 29 pages, 5 figures

Abstract

Statistical data anonymization increasingly relies on fully synthetic microdata, for which classical identity disclosure measures are less informative than an adversary's ability to infer sensitive attributes from released data. We introduce RAPID (Risk of Attribute Prediction--Induced Disclosure), a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For continuous sensitive attributes, RAPID reports the proportion of records whose predicted values fall within a specified relative error tolerance. For categorical attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone, and we summarize risk as the fraction of records exceeding a policy-defined threshold. This construction yields an interpretable, bounded risk metric that is robust to class imbalance, independent of any specific synthesizer, and applicable with arbitrary learning algorithms. We illustrate threshold calibration, uncertainty quantification, and comparative evaluation of synthetic data generators using simulations and real data. Our results show that RAPID provides a practical, attacker-realistic upper bound on attribute-inference disclosure risk that complements existing utility diagnostics and disclosure control frameworks.
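For continuous attributes the measure is a simple proportion: $\mathrm{RAPID} = \frac{1}{n}\sum_i \mathbf{1}\{|\hat y_i - y_i| \le \tau |y_i|\}$. Here $\hat y_i$ is the attacker's prediction for real record $i$ from a model fitted only on synthetic data. The sketch below uses an ordinary-least-squares attacker, which is an assumption, since the measure allows any learner. The categorical, baseline-normalized score is not sketched because the abstract does not give its formula.

```python
import numpy as np


def rapid_continuous(X_syn, y_syn, X_real, y_real, tol=0.10):
    """Share of real records whose sensitive value is predicted within relative error tol.

    The attacker is fitted on synthetic data only (here: OLS with an intercept).
    """
    def design(X):
        return np.column_stack([np.ones(len(X)), X])

    coef, *_ = np.linalg.lstsq(design(X_syn), y_syn, rcond=None)
    y_pred = design(X_real) @ coef                      # attack on real quasi-identifiers
    return float(np.mean(np.abs(y_pred - y_real) <= tol * np.abs(y_real)))


rng = np.random.default_rng(0)
beta = np.array([2.0, -1.0])
X_syn = rng.uniform(1, 2, size=(500, 2))
y_syn = 3 + X_syn @ beta + rng.normal(0, 0.3, 500)      # released synthetic records
X_real = rng.uniform(1, 2, size=(300, 2))
y_real = 3 + X_real @ beta + rng.normal(0, 0.3, 300)    # confidential real records
print(rapid_continuous(X_syn, y_syn, X_real, y_real, tol=0.10))
```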

2602.09219 2026-02-11 math.ST stat.TH

Goodness-of-fit testing for nonlinear inverse problems with random observations

Remo Kretschmann, Han Cheng Lie

Comments 44 pages

Abstract

This work is concerned with nonparametric goodness-of-fit testing in the context of nonlinear inverse problems with random observations. Bayesian posterior distributions based upon a Gaussian process prior distribution are proven to contract at a certain rate uniformly over a set of true parameters. The corresponding posterior mean is shown to converge uniformly at the posterior contraction rate in the sense of satisfying a concentration inequality. Distinguishability for bounded alternatives separated from a composite null hypothesis at the posterior contraction rate is established using infimum plug-in tests based on the posterior mean and also on maximum a posteriori estimators. The results are applied to a class of inverse problems governed by ordinary differential equation initial value problems that is widely used in pharmacokinetics. For this class, uniform posterior contraction rates are proven and then used to establish distinguishability.

2602.09196 2026-02-11 cs.LG stat.ML

Fair Feature Importance Scores via Feature Occlusion and Permutation

Camille Little, Madeline Navarro, Santiago Segarra, Genevera Allen

Abstract

As machine learning models increasingly impact society, their opaque nature poses challenges to trust and accountability, particularly in fairness contexts. Understanding how individual features influence model outcomes is crucial for building interpretable and equitable models. While feature importance metrics for accuracy are well-established, methods for assessing feature contributions to fairness remain underexplored. We propose two model-agnostic approaches to measure fair feature importance. First, we propose to compare model fairness before and after permuting feature values. This simple intervention-based approach decouples a feature and model predictions to measure its contribution to training. Second, we evaluate the fairness of models trained with and without a given feature. This occlusion-based score enjoys dramatic computational simplification via minipatch learning. Our empirical results reflect the simplicity and effectiveness of our proposed metrics for multiple predictive tasks. Both methods offer simple, scalable, and interpretable solutions to quantify the influence of features on fairness, providing new tools for responsible machine learning development.
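The sketch below covers the permutation variant only. It assumes the demographic-parity gap as the fairness metric (the paper may use others) and takes the fitted model as any callable that returns 0/1 predictions. The importance of a feature is the drop in the gap after permuting its column, averaged over repeats. A feature the model ignores gets an importance of zero.

```python
import numpy as np


def parity_gap(pred, group):
    """Demographic-parity gap |P(pred=1 | A=1) - P(pred=1 | A=0)|."""
    return abs(pred[group == 1].mean() - pred[group == 0].mean())


def permutation_fair_importance(model, X, group, j, n_repeats=20, seed=0):
    """Average drop in the parity gap when column j is randomly permuted."""
    rng = np.random.default_rng(seed)
    base = parity_gap(model(X), group)
    permuted_gaps = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between feature j and A
        permuted_gaps.append(parity_gap(model(Xp), group))
    return base - float(np.mean(permuted_gaps))


def model(Z):
    """Toy fitted classifier: uses features 0 and 1, ignores feature 2."""
    return (Z[:, 0] + 0.1 * Z[:, 1] > 0.5).astype(float)


rng = np.random.default_rng(1)
n = 2000
group = rng.integers(0, 2, n)                              # protected attribute A
X = np.column_stack([group + rng.normal(0, 0.5, n),        # proxy for A
                     rng.normal(size=n),
                     rng.normal(size=n)])
print([permutation_fair_importance(model, X, group, j) for j in range(3)])
```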

2602.09167 2026-02-11 stat.ME

Mean regression for (0,1) responses via beta scale mixtures

Arno Otto, Andriëtte Bekker, Johan Ferreira, Lebogang Rathebe

Comments 21 pages, 11 figures

Abstract

To achieve greater flexibility in modeling heavy-tailed bounded responses, a beta scale mixture model is proposed. Each member of the family is obtained by multiplying the scale parameter of the conditional beta distribution by a mixing random variable taking values on all or part of the positive real line and whose distribution depends on a single parameter governing the tail behavior of the resulting compound distribution. These family members allow for a wider range of values for skewness and kurtosis. To validate the effectiveness of the proposed model, we conduct experiments on both simulated data and real datasets. The results indicate that the beta scale mixture model demonstrates superior performance relative to the classical beta regression model and alternative competing methods for modeling responses on the bounded unit domain.
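The sampling sketch below makes two choices of our own, for illustration only. It uses the mean-precision form $\mathrm{Beta}(\mu\phi, (1-\mu)\phi)$, with the precision $\phi$ playing the role of the scale parameter, and a Gamma$(\nu, 1/\nu)$ mixing variable $W$. Because $E[W] = 1$ and the conditional mean is $\mu$ for every $W$, the mean is unchanged. The variance, however, grows: $\mathrm{Var}(Y) = E[\mu(1-\mu)/(1+\phi W)] > \mu(1-\mu)/(1+\phi)$ by Jensen's inequality.

```python
import numpy as np


def sample_beta_scale_mixture(mu, phi, nu, size, seed=0):
    """Draw Y ~ Beta(mu * phi * W, (1 - mu) * phi * W) with W ~ Gamma(shape=nu, scale=1/nu).

    W has mean 1, so E[Y] = mu is preserved; smaller nu gives a more variable
    precision and heavier-tailed responses. The gamma mixing law is an assumption.
    """
    rng = np.random.default_rng(seed)
    w = rng.gamma(shape=nu, scale=1.0 / nu, size=size)
    return rng.beta(mu * phi * w, (1.0 - mu) * phi * w)


y_mix = sample_beta_scale_mixture(mu=0.3, phi=10.0, nu=2.0, size=200_000)
plain_var = 0.3 * 0.7 / (1 + 10.0)           # variance of Beta(3, 7), i.e. no mixing
print(y_mix.mean(), y_mix.var(), plain_var)
```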

2602.09145 2026-02-11 stat.ME

Estimating causal effects of functional treatments with modified functional treatment policies

Ziren Jiang, Erjia Cui, Jared D. Huling

Abstract

Functional data are increasingly prevalent in biomedical research. While functional data analysis has been established for decades, causal inference with functional treatments remains largely unexplored. Existing methods typically focus on estimating the causal average dose response functional (ADRF), which requires strong positivity assumptions and offers limited interpretability. In this work, we target a new causal estimand, the modified functional treatment policy (MFTP), which focuses on estimating the average potential outcome when each individual slightly modifies their treatment trajectory from the observed one. A major challenge for this new estimand is the need to define an average over an infinite-dimensional object with no density. By proposing a novel definition of the population average over a functional variable using a functional principal component analysis (FPCA) decomposition, we establish the causal identifiability of the MFTP estimand. We further derive outcome regression, inverse probability weighting, and doubly robust estimators for the MFTP, and provide theoretical guarantees under mild regularity conditions. The proposed estimators are validated through extensive simulation studies. Applying our MFTP framework to the National Health and Nutrition Examination Survey (NHANES) accelerometer data, we estimate the causal effects of reducing disruptive nighttime activity and low-activity duration on all-cause mortality.

2602.09058 2026-02-11 stat.ML cs.AI cs.IT cs.LG math.IT

Persistent Entropy as a Detector of Phase Transitions

Matteo Rucco

Abstract

Persistent entropy (PE) is an information-theoretic summary statistic of persistence barcodes that has been widely used to detect regime changes in complex systems. Despite its empirical success, a general theoretical understanding of when and why persistent entropy reliably detects phase transitions has remained limited, particularly in stochastic and data-driven settings. In this work, we establish a general, model-independent theorem providing sufficient conditions under which persistent entropy provably separates two phases. We show that persistent entropy exhibits an asymptotically non-vanishing gap across phases. The result relies only on continuity of persistent entropy along the convergent diagram sequence, or under mild regularization, and is therefore broadly applicable across data modalities, filtrations, and homological degrees. To connect asymptotic theory with finite-time computations, we introduce an operational framework based on topological stabilization, defining a topological transition time by stabilizing a chosen topological statistic over sliding windows, and a probability-based estimator of critical parameters within a finite observation horizon. We validate the framework on the Kuramoto synchronization transition, the Vicsek order-to-disorder transition in collective motion, and neural network training dynamics across multiple datasets and architectures. Across all experiments, stabilization of persistent entropy and collapse of variability across realizations provide robust numerical signatures consistent with the theoretical mechanism.
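Persistent entropy turns a barcode with finite bar lengths $\ell_i = d_i - b_i$ into $\mathrm{PE} = -\sum_i p_i \log p_i$, where $p_i = \ell_i / \sum_j \ell_j$. For example, $k$ bars of equal length give $\log k$. The sketch below computes the statistic together with one possible stabilization rule for the transition time: the start of the first window of length $w$ over which the statistic varies by at most $\varepsilon$. This rule is our assumption, not the paper's definition.

```python
import numpy as np


def persistent_entropy(barcode):
    """Shannon entropy of normalized bar lengths; barcode is a list of (birth, death) pairs."""
    bars = np.asarray(barcode, dtype=float)
    lengths = bars[:, 1] - bars[:, 0]
    p = lengths / lengths.sum()
    p = p[p > 0]                                 # zero-length bars contribute nothing
    return float(-(p * np.log(p)).sum())


def stabilization_time(series, window, eps):
    """First index t such that max - min of series[t : t + window] is at most eps.

    Hypothetical operationalization of a topological transition time; None if never stable.
    """
    series = np.asarray(series, dtype=float)
    for t in range(len(series) - window + 1):
        w = series[t:t + window]
        if w.max() - w.min() <= eps:
            return t
    return None


print(persistent_entropy([(0, 1)] * 4), np.log(4))
print(stabilization_time([0.0, 1.0] * 25 + [2.0] * 30, window=10, eps=1e-9))
```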

2602.08681 2026-02-11 cs.LG stat.ML

The Theory and Practice of MAP Inference over Non-Convex Constraints

Leander Kurscheidt, Gabriele Masina, Roberto Sebastiani, Antonio Vergari

Abstract

In many safety-critical settings, probabilistic ML systems have to make predictions subject to algebraic constraints, e.g., predicting the most likely trajectory that does not cross obstacles. These real-world constraints are rarely convex, nor are the densities considered (log-)concave. This makes computing this constrained maximum a posteriori (MAP) prediction efficiently and reliably extremely challenging. In this paper, we first investigate under which conditions we can perform constrained MAP inference over continuous variables exactly and efficiently and devise a scalable message-passing algorithm for this tractable fragment. Then, we devise a general constrained MAP strategy that interleaves partitioning the domain into convex feasible regions with numerical constrained optimization. We evaluate both methods on synthetic and real-world benchmarks, showing our approaches outperform constraint-agnostic baselines, and scale to complex densities intractable for SoTA exact solvers.
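The following toy instance of the partition strategy has a closed form at every step: maximize an isotropic 2-D Gaussian density outside an axis-aligned box obstacle. The feasible set is non-convex, but four convex half-planes cover it. On each half-plane the constrained maximizer is the Euclidean projection of the mean, and the answer is the best of the four candidates. This is an illustration only, since the paper handles general densities and uses numerical optimization within each region.

```python
import numpy as np


def constrained_map_isotropic_gaussian(mean, box_lo, box_hi):
    """MAP of N(mean, s^2 I) in 2-D, constrained to lie outside an axis-aligned box.

    The feasible set is the union of four convex half-planes (left of, right of,
    below, above the box). On each one, the maximizer is the projection of the mean.
    """
    mean = np.asarray(mean, dtype=float)
    candidates = []
    for axis in range(2):
        lo_pt = mean.copy()
        lo_pt[axis] = min(mean[axis], box_lo[axis])   # half-plane x[axis] <= box_lo[axis]
        hi_pt = mean.copy()
        hi_pt[axis] = max(mean[axis], box_hi[axis])   # half-plane x[axis] >= box_hi[axis]
        candidates += [lo_pt, hi_pt]
    # Highest isotropic Gaussian density <=> smallest squared distance to the mean.
    return min(candidates, key=lambda c: float(np.sum((c - mean) ** 2)))


print(constrained_map_isotropic_gaussian((0.2, 0.1), (-1.0, -1.0), (1.0, 1.0)))
```

The half-planes overlap, so they form a cover rather than a partition. This does not affect the maximum.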

2602.07707 2026-02-11 stat.ME

Generation of Multivariate Discrete Data with Generalized Poisson, Negative Binomial and Binomial Marginal Distributions

Chak Kwong, Cheng, Hakan Demirtas

Abstract

The analysis of multivariate discrete data is crucial in various scientific research areas, such as epidemiology, the social sciences, genomics, and environmental studies. As the availability of such data increases, developing robust analytical and data generation tools is necessary to understand the relationships among variables. This paper builds upon previous work on data generation frameworks for multivariate ordinal data with a prespecified correlation matrix. The proposed algorithm generates multivariate discrete data from marginal distributions that follow the generalized Poisson, negative binomial, and binomial distributions. A step-by-step algorithm is provided, and its performance is illustrated in four simulated data scenarios and three real-data scenarios. This technique has the potential to be applied in a wide range of settings involving the generation of correlated discrete data.

2602.07681 2026-02-11 stat.ME cs.AI

Mapping Drivers of Greenness: Spatial Variable Selection for MODIS Vegetation Indices

Qishi Zhan, Cheng-Han Yu, Yuchi Chen, Zhikang Dong, Rajarshi Guhaniyogi

Understanding how environmental drivers relate to vegetation condition motivates spatially varying regression models, but estimating a separate coefficient surface for every predictor can yield noisy patterns and poor interpretability when many predictors are irrelevant. Motivated by MODIS vegetation index studies, we examine predictors from spectral bands, productivity and energy fluxes, observation geometry, and land surface characteristics. Because these relationships vary with canopy structure, climate, land use, and measurement conditions, methods should both model spatially varying effects and identify where predictors matter. We propose a spatially varying coefficient model where each coefficient surface uses a tensor product B-spline basis and a Bayesian group lasso prior on the basis coefficients. This prior induces predictor-level shrinkage, pushing negligible effects toward zero while preserving spatial structure. Posterior inference uses Markov chain Monte Carlo and provides uncertainty quantification for each effect surface. We summarize retained effects with spatial significance maps that mark locations where the 95 percent posterior credible interval excludes zero, and we define a spatial coverage probability as the proportion of locations where the credible interval excludes zero. Simulations recover sparsity and achieve prediction. A MODIS application yields a parsimonious subset of predictors whose effect maps clarify dominant controls across landscapes.
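
The two summaries at the end of the abstract are simple functions of posterior draws. The sketch below assumes hypothetical draws of one coefficient surface, stored as a list of draws per location; it computes an equal-tailed 95% credible interval at each location from empirical quantiles, flags locations whose interval excludes zero (the significance map), and reports the flagged proportion as the spatial coverage probability. The nearest-rank quantile rule is our choice and may differ from the paper's.

```python
import math
import random

def credible_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws using nearest-rank quantiles."""
    s = sorted(draws)
    n = len(s)
    alpha = (1 - level) / 2
    lo = s[max(0, math.ceil(alpha * n) - 1)]
    hi = s[min(n - 1, math.ceil((1 - alpha) * n) - 1)]
    return lo, hi

def significance_map(draws_by_location, level=0.95):
    # True where the credible interval excludes zero.
    flags = []
    for draws in draws_by_location:
        lo, hi = credible_interval(draws, level)
        flags.append(lo > 0 or hi < 0)
    return flags

def spatial_coverage_probability(draws_by_location, level=0.95):
    flags = significance_map(draws_by_location, level)
    return sum(flags) / len(flags)

# Hypothetical draws: the effect is 1.0 at the first 30 of 100 locations and 0 elsewhere.
rng = random.Random(1)
draws = [[(1.0 if loc < 30 else 0.0) + rng.gauss(0, 0.2) for _ in range(1000)]
         for loc in range(100)]
scp = spatial_coverage_probability(draws)
```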

2602.07632 2026-02-11 stat.ML cs.LG

Scalable Mean-Field Variational Inference via Preconditioned Primal-Dual Optimization

Jinhua Lyu, Tianmin Yu, Ying Ma, Naichen Shi

In this work, we investigate the large-scale mean-field variational inference (MFVI) problem from a mini-batch primal-dual perspective. By reformulating MFVI as a constrained finite-sum problem, we develop a novel primal-dual algorithm based on an augmented Lagrangian formulation, termed primal-dual variational inference (PD-VI). PD-VI jointly updates global and local variational parameters in the evidence lower bound in a scalable manner. To further account for heterogeneous loss geometry across different variational parameter blocks, we introduce a block-preconditioned extension, P$^2$D-VI, which adapts the primal-dual updates to the geometry of each parameter block and improves both numerical robustness and practical efficiency. We establish convergence guarantees for both PD-VI and P$^2$D-VI with properly chosen constant step sizes, without relying on conjugacy assumptions or explicit bounded-variance conditions. In particular, we prove $O(1/T)$ convergence to a stationary point in general settings and linear convergence under strong convexity. Numerical experiments on synthetic data and a real large-scale spatial transcriptomics dataset demonstrate that our methods consistently outperform existing stochastic variational inference approaches in terms of convergence speed and solution quality.
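
To make the "constrained finite-sum" reformulation concrete, here is a generic augmented Lagrangian primal-dual loop on a toy consensus problem: minimize $\sum_i (x_i - a_i)^2/2$ subject to $x_i = z$ for all $i$. Each round minimizes the augmented Lagrangian in the local variables, then in the global variable, then takes a dual ascent step. This is a textbook ADMM-style sketch under our own assumptions, not PD-VI: in PD-VI the blocks would be local and global variational parameters and the objective would come from the ELBO, neither of which is modeled here.

```python
def consensus_alm(a, rho=1.0, iters=200):
    """ADMM-style augmented Lagrangian method for
    min sum_i (x_i - a_i)^2 / 2  s.t.  x_i = z for all i."""
    n = len(a)
    x = [0.0] * n
    y = [0.0] * n  # dual variables, one per constraint x_i = z
    z = 0.0
    for _ in range(iters):
        # Local primal step: closed-form minimizer of the augmented Lagrangian in each x_i.
        x = [(a_i - y_i + rho * z) / (1.0 + rho) for a_i, y_i in zip(a, y)]
        # Global primal step: closed-form minimizer in z.
        z = sum(x_i + y_i / rho for x_i, y_i in zip(x, y)) / n
        # Dual ascent on the constraint residuals x_i - z.
        y = [y_i + rho * (x_i - z) for x_i, y_i in zip(x, y)]
    return z, x, y

z, x, y = consensus_alm([1.0, 2.0, 6.0])
```

At convergence all local copies agree on the mean of the $a_i$ (here 3), and each dual variable equals $a_i - 3$, the multiplier that balances the local loss gradient.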

2601.20152 2026-02-11 math.ST math.PR stat.ML stat.TH

Concentration Inequalities for Exchangeable Tensors and Matrix-valued Data

Chen Cheng, Rina Foygel Barber

Comments 45 pages, 3 figures

We study concentration inequalities for structured weighted sums of random data, including (i) tensor inner products and (ii) sequential matrix sums. We are interested in tail bounds and concentration inequalities for such structured weighted sums under exchangeability, extending beyond the classical framework of independent terms. We develop Hoeffding- and Bernstein-type bounds under structure-dependent exchangeability. Along the way, we recover known results for weighted sums of exchangeable random variables and for i.i.d. sums of random matrices, with optimal constants. Notably, we develop a sharper concentration bound for combinatorial sums of matrix arrays than the results previously derived from Chatterjee's method of exchangeable pairs. For applications, the richer structures provide novel analytical tools for estimating the average effect of multi-factor response models and for studying fixed-design sketching methods in federated averaging. We apply our results to these problems and find that our theoretical predictions are corroborated by numerical evidence.
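
As a baseline for the exchangeable setting, recall that Hoeffding (1963) showed the classical bound $P(\bar X - \mu \ge t) \le \exp(-2nt^2/(b-a)^2)$ still holds when the $n$ values are drawn without replacement from a finite population in $[a, b]$; such draws are exchangeable but not independent. The Monte Carlo check below, on a hypothetical 0/1 population, illustrates only that classical special case; the structured tensor and matrix bounds of the paper are not implemented.

```python
import math
import random

def hoeffding_bound(n, t, a=0.0, b=1.0):
    return math.exp(-2.0 * n * t * t / (b - a) ** 2)

def empirical_tail(population, n, t, trials=20000, seed=0):
    """Monte Carlo estimate of P(sample mean - population mean >= t)."""
    rng = random.Random(seed)
    mu = sum(population) / len(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # draws without replacement
        if sum(sample) / n - mu >= t:
            hits += 1
    return hits / trials

population = [1] * 300 + [0] * 700  # hypothetical finite population with mean 0.3
n, t = 50, 0.1
emp = empirical_tail(population, n, t)
bound = hoeffding_bound(n, t)
```

The empirical tail comes out well below the bound of $e^{-1} \approx 0.37$, as expected from a worst-case inequality.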

2601.19186 2026-02-11 stat.ML cs.LG

Double Fairness Policy Learning: Integrating Action Fairness and Outcome Fairness in Decision-making

Zeyu Bian, Lan Wang, Chengchun Shi, Zhengling Qi

Fairness is a central pillar of trustworthy machine learning, especially in domains where accuracy- or profit-driven optimization is insufficient. While most fairness research focuses on supervised learning, fairness in policy learning remains less explored. Because policy learning is interventional, it induces two distinct fairness targets: action fairness (equitable action assignments) and outcome fairness (equitable downstream consequences). Crucially, equalizing actions does not generally equalize outcomes when groups face different constraints or respond differently to the same action. We propose a novel double fairness learning (DFL) framework that explicitly manages the trade-off among three objectives: action fairness, outcome fairness, and value maximization. We integrate fairness directly into a multi-objective optimization problem for policy learning and employ a lexicographic weighted Tchebyshev method that recovers Pareto solutions beyond convex settings, with theoretical guarantees on the regret bounds. Our framework is flexible and accommodates various commonly used fairness notions. Extensive simulations demonstrate improved performance relative to competing methods. In applications to a motor third-party liability insurance dataset and an entrepreneurship training dataset, DFL substantially improves both action and outcome fairness while incurring only a modest reduction in overall value.
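
The scalarization named above can be illustrated on a finite set of candidate policies. In this toy sketch (hypothetical objective values, all to be maximized), each policy is scored by its weighted Tchebyshev distance to the ideal point, $\max_k w_k (z^*_k - f_k)$, and ties are broken lexicographically by the sum of the gaps, which is what rules out weakly dominated solutions. The weights and values are made up, and the paper's regret-guaranteed learning procedure is not reproduced.

```python
def lex_weighted_tchebyshev(values, weights):
    """Pick a policy by lexicographic weighted Tchebyshev scalarization.
    values: dict name -> tuple of objective values (larger is better).
    weights: one positive weight per objective."""
    m = len(weights)
    ideal = [max(v[k] for v in values.values()) for k in range(m)]

    def score(v):
        gaps = [ideal[k] - v[k] for k in range(m)]
        primary = max(w * g for w, g in zip(weights, gaps))
        secondary = sum(gaps)  # tie-breaker that excludes weakly dominated points
        return (primary, secondary)

    # Smaller (primary, secondary) is better, compared lexicographically.
    return min(values, key=lambda name: score(values[name]))

# Hypothetical objectives: (action fairness, outcome fairness, value)
policies = {
    "value_only":   (0.20, 0.30, 1.00),
    "action_fair":  (0.95, 0.50, 0.80),
    "balanced":     (0.85, 0.85, 0.90),
    "balanced_dup": (0.85, 0.80, 0.90),  # weakly dominated by "balanced"
}
choice = lex_weighted_tchebyshev(policies, weights=(1.0, 1.0, 1.0))
```

"balanced" and "balanced_dup" tie on the Tchebyshev term; the second-level sum selects "balanced", the Pareto-optimal one.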

2601.17400 2026-02-11 stat.ME

Variational autoencoder for inference of nonlinear mixed effect models based on ordinary differential equations

Zhe Li, Mélanie Prague, Rodolphe Thiébaut, Quentin Clairon

We propose a variational autoencoder (VAE) approach for parameter estimation in nonlinear mixed-effects models based on ordinary differential equations (NLME-ODEs) using longitudinal data from multiple subjects. In moderate dimensions, likelihood-based inference via the stochastic approximation EM algorithm (SAEM) is widely used, but it relies on Markov chain Monte Carlo (MCMC) to approximate subject-specific posteriors. As model complexity increases or observations per subject become sparse and irregular, performance often deteriorates because of a complex, multimodal likelihood surface that can cause MCMC convergence difficulties. We instead estimate parameters by maximizing the evidence lower bound (ELBO), a regularized surrogate for the marginal likelihood. A VAE with a shared encoder amortizes inference of subject-specific random effects, avoiding per-subject optimization and the use of MCMC. Beyond point estimation, we quantify parameter uncertainty using an observed-information-based variance estimator and verify that practical identifiability of the model parameters is not compromised by nuisance parameters introduced in the encoder. We evaluate the method in three simulation case studies (pharmacokinetics, humoral response to vaccination, and TGF-$β$ activation dynamics in asthmatic airways) and on a real-world antibody kinetics dataset, comparing against SAEM baselines.

2601.14049 2026-02-11 stat.ME

Tail-Aware Density Forecasting of Locally Explosive Time Series: A Neural Network Approach

Elena Dumitrescu, Julien Peignon, Arthur Thomas

This paper proposes a Mixture Density Network specifically designed for forecasting time series that exhibit locally explosive behavior. By incorporating skewed t-distributions as mixture components, our approach offers enhanced flexibility in capturing the skewed, heavy-tailed, and potentially multimodal nature of predictive densities associated with bubble dynamics modeled by mixed causal-noncausal ARMA processes. In addition, we implement an adaptive weighting scheme that emphasizes tail observations during training and hence leads to accurate density estimation in the extreme regions most relevant for financial applications. Equally important, once trained, the MDN produces near-instantaneous density forecasts. Through extensive Monte Carlo simulations and two empirical applications, on the natural gas price and inflation, we show that the proposed MDN-based framework delivers superior forecasting performance relative to existing approaches.

2601.07752 2026-02-11 econ.EM cs.LG math.ST stat.ME stat.ML stat.TH

A Unified Framework for Debiased Machine Learning: Riesz Representer Fitting under Bregman Divergence

Masahiro Kato

Estimating the Riesz representer is central to debiased machine learning for causal and structural parameter estimation. We propose generalized Riesz regression, a unified framework for estimating the Riesz representer by fitting a representer model via Bregman divergence minimization. This framework includes various divergences as special cases, such as the squared distance and the Kullback-Leibler (KL) divergence, where the former recovers Riesz regression and the latter recovers tailored loss minimization. Under suitable pairs of divergence and model specifications (link functions), the dual problems of the Riesz representer fitting problem correspond to covariate balancing, which we call automatic covariate balancing. Moreover, under the same specifications, the sample average of outcomes weighted by the estimated Riesz representer satisfies Neyman orthogonality even without estimating the regression function, a property we call automatic Neyman orthogonalization. This property not only reduces the estimation error of Neyman orthogonal scores but also clarifies a key distinction between debiased machine learning and targeted maximum likelihood estimation (TMLE). Our framework can also be viewed as a generalization of density ratio fitting under Bregman divergences to Riesz representer estimation, and it applies beyond density ratio estimation. We provide convergence analyses for both reproducing kernel Hilbert space (RKHS) and neural network model classes. A Python package for generalized Riesz regression is released as genriesz and is available at https://github.com/MasaKat0/genriesz.
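
A worked special case may help with the squared-distance end of the framework (Riesz regression). For the average treatment effect the functional is $m(W; \alpha) = \alpha(1, X) - \alpha(0, X)$, and Riesz regression minimizes the empirical loss $\frac{1}{n}\sum_i [\alpha(W_i)^2 - 2 m(W_i; \alpha)]$. For a linear model $\alpha(w) = b(w)^\top \beta$ the minimizer is $\beta = G^{-1} M$ with $G = \frac{1}{n}\sum_i b(W_i) b(W_i)^\top$ and $M = \frac{1}{n}\sum_i m(W_i; b)$. With the hypothetical basis $b(d) = (d, 1 - d)$ and no covariates, this reproduces the inverse-propensity weights $d/\hat p - (1-d)/(1-\hat p)$. The sketch and data are our own illustration; it does not use or mirror the genriesz package API.

```python
def solve(G, M):
    """Solve G beta = M by Gauss-Jordan elimination with partial pivoting."""
    k = len(M)
    A = [row[:] + [M[i]] for i, row in enumerate(G)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def riesz_regression(W, basis, m_basis):
    """Least-squares Riesz regression for alpha(w) = basis(w) . beta:
    minimizes mean[alpha(W)^2 - 2 m(W; alpha)] over beta."""
    n = len(W)
    k = len(basis(W[0]))
    G = [[sum(basis(w)[i] * basis(w)[j] for w in W) / n for j in range(k)] for i in range(k)]
    M = [sum(m_basis(w)[i] for w in W) / n for i in range(k)]
    return solve(G, M)

# ATE with a binary treatment d and no covariates (hypothetical data).
D = [1, 1, 1, 0, 0, 0, 0, 0]
Y = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0, 2.0, 2.0]
basis = lambda d: (d, 1 - d)
# m(w; b) = b(1) - b(0) for the ATE functional (does not depend on w without covariates)
m_basis = lambda d: tuple(b1 - b0 for b1, b0 in zip(basis(1), basis(0)))
beta = riesz_regression(D, basis, m_basis)
alpha = [sum(b * c for b, c in zip(basis(d), beta)) for d in D]
ate = sum(a * y for a, y in zip(alpha, Y)) / len(Y)
```

Here $\hat p = 3/8$, so $\beta = (8/3, -8/5)$ and the weighted average equals the difference in group means, $6 - 2 = 4$.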

2512.23190 2026-02-11 cs.LG math.OC stat.ML

A Simple, Optimal and Efficient Algorithm for Online Exp-Concave Optimization

Yi-Han Wang, Peng Zhao, Zhi-Hua Zhou

Online eXp-concave Optimization (OXO) is a fundamental problem in online learning, where the goal is to minimize regret when loss functions are exponentially concave. The standard algorithm, Online Newton Step (ONS), guarantees an optimal $O(d \log T)$ regret, where $d$ is the dimension and $T$ is the time horizon. Despite its simplicity, ONS may face a computational bottleneck due to the Mahalanobis projection at each round. This step costs $Ω(d^ω)$ arithmetic operations for bounded domains, even for simple domains such as the unit ball, where $ω\in (2,3]$ is the matrix-multiplication exponent. As a result, the total runtime can reach $\tilde{O}(d^ωT)$, particularly when iterates frequently oscillate near the domain boundary. This paper proposes a simple variant of ONS, called LightONS, which reduces the total runtime to $O(d^2 T + d^ω\sqrt{T \log T})$ while preserving the optimal regret. Deploying LightONS with the online-to-batch conversion implies a method for stochastic exp-concave optimization with runtime $\tilde{O}(d^3/ε)$, thereby answering an open problem posed by Koren [2013]. The design leverages domain-conversion techniques from parameter-free online learning and defers expensive Mahalanobis projections until necessary, thereby preserving the elegant structure of ONS and enabling LightONS to act as an efficient plug-in replacement in broader scenarios, including gradient-norm adaptivity, parametric stochastic bandits, and memory-efficient OXO.
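
For orientation, here is plain Online Newton Step in one dimension, where the Mahalanobis projection that LightONS defers collapses to clipping onto an interval; in higher dimensions that projection requires solving a small constrained quadratic program every round, which is the bottleneck described above. The losses $f_t(x) = (x - y_t)^2$ on $[-1, 1]$ (exp-concave on a bounded domain), the parameters `gamma` and `eps`, and the data are our assumptions; LightONS itself is not implemented.

```python
import random

def ons_1d(ys, gamma=0.5, eps=1.0, radius=1.0):
    """Online Newton Step on [-radius, radius] with losses (x - y_t)^2."""
    x, A = 0.0, eps
    total_loss, xs = 0.0, []
    for y in ys:
        xs.append(x)
        total_loss += (x - y) ** 2
        g = 2.0 * (x - y)                 # gradient of the current loss at x
        A += g * g                        # ONS curvature matrix (a scalar in 1-D)
        x = x - g / (gamma * A)           # Newton-like step with A^{-1}
        x = max(-radius, min(radius, x))  # 1-D Mahalanobis projection = clipping
    return xs, total_loss

rng = random.Random(0)
ys = [0.3 + rng.uniform(-0.5, 0.5) for _ in range(5000)]
xs, loss = ons_1d(ys)
best_fixed = sum((0.3 - y) ** 2 for y in ys)  # loss of the fixed comparator x = 0.3
```

The per-round excess loss over the fixed comparator shrinks quickly, consistent with the logarithmic regret of ONS.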

2512.15771 2026-02-11 cs.LG cs.AI cs.NA math.NA stat.ML

Solving PDEs With Deep Neural Nets under General Boundary Conditions

Chenggong Zhang

Comments 7 pages, 2 figures

Partial Differential Equations (PDEs) are central to modeling complex systems across physical, biological, and engineering domains, yet traditional numerical methods often struggle with high-dimensional or complex problems. Physics-Informed Neural Networks (PINNs) have emerged as an efficient alternative by embedding physics-based constraints into deep learning frameworks, but they face challenges in achieving high accuracy and handling complex boundary conditions. In this work, we extend the Time-Evolving Natural Gradient (TENG) framework to address Dirichlet boundary conditions, integrating natural gradient optimization with numerical time-stepping schemes, including Euler and Heun methods, to ensure both stability and accuracy. By incorporating boundary condition penalty terms into the loss function, the proposed approach enables precise enforcement of Dirichlet constraints. Experiments on the heat equation demonstrate the superior accuracy of the Heun method due to its second-order corrections and the computational efficiency of the Euler method for simpler scenarios. This work establishes a foundation for extending the framework to Neumann and mixed boundary conditions, as well as broader classes of PDEs, advancing the applicability of neural network-based solvers for real-world problems.
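
The accuracy contrast between the two time-stepping schemes already appears on the scalar decay equation $u' = -ku$, which is what the amplitude of a single Fourier mode $\sin(\pi x)$ of the heat equation $u_t = u_{xx}$ on $[0, 1]$ obeys, with $k = \pi^2$. The sketch is our own illustration, independent of the TENG framework: Euler is first order and Heun (explicit trapezoidal) is second order, so halving the step roughly halves the Euler error and quarters the Heun error.

```python
import math

def euler(f, u0, dt, steps):
    u = u0
    for _ in range(steps):
        u = u + dt * f(u)
    return u

def heun(f, u0, dt, steps):
    u = u0
    for _ in range(steps):
        predictor = u + dt * f(u)                 # Euler predictor
        u = u + 0.5 * dt * (f(u) + f(predictor))  # trapezoidal corrector
    return u

k = math.pi ** 2  # decay rate of the mode sin(pi x) under u_t = u_xx on [0, 1]
f = lambda u: -k * u
T = 0.1
exact = math.exp(-k * T)

errors = {}
for steps in (50, 100):
    dt = T / steps
    errors[steps] = (abs(euler(f, 1.0, dt, steps) - exact),
                     abs(heun(f, 1.0, dt, steps) - exact))
```

Heun costs two right-hand-side evaluations per step versus one for Euler, which is the efficiency trade-off the abstract alludes to.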

2512.14609 2026-02-11 stat.ME econ.EM

Asymptotic Inference for Rank Correlations

Marc-Oliver Pohle, Jan-Lukas Wermuth, Christian H. Weiß

Kendall's tau and Spearman's rho are widely used tools for measuring dependence. Surprisingly, when it comes to asymptotic inference for these rank correlations, some fundamental results and methods have not yet been developed, in particular for discrete random variables and in the time series case, and concerning variance estimation in general. Consequently, asymptotic confidence intervals are not available. We provide a comprehensive treatment of asymptotic inference for classical rank correlations, including Kendall's tau, Spearman's rho, Goodman-Kruskal's gamma, Kendall's tau-b, and grade correlation. We derive asymptotic distributions for both iid and time series data, resorting to asymptotic results for U-statistics, and introduce consistent variance estimators. This enables the construction of confidence intervals and tests, generalizes classical results for continuous random variables and leads to corrected versions of widely used tests of independence. We analyze the finite-sample performance of our variance estimators, confidence intervals, and tests in simulations and illustrate their use in case studies.
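
For discrete data, several of the coefficients listed above differ only in how ties enter. The pairwise sketch below ($O(n^2)$, fine for illustration) counts concordant pairs $C$, discordant pairs $D$, and pairs tied in $x$ ($n_1$) or in $y$ ($n_2$); then $\tau_a = (C - D)/n_0$ with $n_0 = n(n-1)/2$, $\gamma = (C - D)/(C + D)$, and $\tau_b = (C - D)/\sqrt{(n_0 - n_1)(n_0 - n_2)}$. These are the standard point estimates only; the variance estimators derived in the paper are not reproduced, and the data are hypothetical.

```python
import math
from itertools import combinations

def rank_correlations(x, y):
    """Kendall's tau-a, tau-b and Goodman-Kruskal's gamma from pair counts."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1  # includes pairs tied in both x and y
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = n * (n - 1) // 2
    s = concordant - discordant
    tau_a = s / n0
    gamma = s / (concordant + discordant)
    tau_b = s / math.sqrt((n0 - tied_x) * (n0 - tied_y))
    return tau_a, tau_b, gamma

x = [1, 2, 2, 3, 3, 3]  # hypothetical ordinal data with ties
y = [1, 1, 2, 2, 3, 3]
tau_a, tau_b, gamma = rank_correlations(x, y)
```

With 9 concordant pairs, no discordant pairs, and 4 and 3 tied pairs out of 15, this gives $\tau_a = 0.6$, $\gamma = 1$, and $\tau_b = 9/\sqrt{132} \approx 0.78$, showing how strongly the handling of ties moves the estimate.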

2512.02266 2026-02-11 stat.AP

Estimating excess mortality during the Covid-19 pandemic in Aotearoa New Zealand: Addendum

Michael J. Plank, Pubudu Senanayake, Richard Lyon

Journal ref International Journal of Epidemiology (2026) 55(1): dyag008

In our previous article, we estimated excess mortality during the Covid-19 pandemic in Aotearoa New Zealand for 2020 to 2023. Since our work was published, Statistics NZ has released updated population estimates. In this short letter, we provide the results of applying our original model to the new population data. Our updated excess mortality estimate of 2.0% (95% CI [0.5%, 3.3%]) is 1.3 percentage points higher than our original estimate because the new population estimates for the period 2020 to 2023 are smaller, but the main conclusions of our original article still apply.

2512.01965 2026-02-11 stat.AP

Predicting Onsets and Dry Spells of the West African Monsoon Season Using Machine Learning Methods

Colin Bobocea, Yves Atchadé

The beginning of the rainy season and the occurrence of dry spells in West Africa are notoriously difficult to predict, yet they are the key indicators farmers use to decide when to plant crops and have a major influence on overall yield. While many studies have shown correlations between global sea surface temperatures and characteristics of the West African monsoon season, few effectively incorporate this information into machine learning (ML) prediction models. In this study, we investigated the best ways to define our target variables, onset and dry spell, and developed methods to predict them for upcoming seasons using sea surface temperature teleconnections. Defining our target variables required combining two well-known definitions of onset. We then applied custom statistical techniques, such as total variation regularization and predictor selection, to the two models we constructed: a linear model and an adaptive-threshold logistic regression model. Results for onset prediction were mixed: spatial verification showed signs of significant skill, while temporal verification showed little to none. For dry spells, however, we found significant accuracy across multiple binary classification metrics. These models overcome some limitations of current approaches, such as high computational cost and the need for bias correction. We also present this study as a framework for using ML methods for targeted prediction of specific weather phenomena from climatologically relevant variables. As ML techniques are applied to more problems, we see clear benefits for fields like meteorology, and we lay out several new directions for further research.

2510.23631 2026-02-11 cs.LG cs.AI stat.ME stat.ML

Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling

Yuxuan Tang, Yifan Feng

Comments Accepted by The Fourteenth International Conference on Learning Representations (ICLR 2026)

Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this approach overlooks the opportunity to learn from richer forms of human feedback, such as multiway comparisons and top-$k$ rankings. We introduce Ranked Choice Preference Optimization (RCPO), a unified framework that bridges preference optimization with (ranked) choice modeling via maximum likelihood estimation. RCPO supports both utility-based and rank-based models, subsumes several pairwise methods (such as DPO and SimPO) as special cases, and provides principled training objectives for richer feedback formats. We instantiate this framework with two representative models (Multinomial Logit and Mallows-RMJ). Experiments on Llama-3-8B-Instruct, Gemma-2-9B-it, and Mistral-7B-Instruct across in-distribution and out-of-distribution settings show that RCPO consistently outperforms competitive baselines. RCPO shows that directly leveraging ranked preference data, combined with the right choice models, yields more effective alignment. It offers an extensible foundation for incorporating (ranked) choice modeling into LLM training.
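
The utility-based side of RCPO can be pictured with the Plackett-Luce form of the multinomial logit for rankings: given item scores and a (top-$k$) ranking, the likelihood is a product of softmax choices, each taken over the items not yet chosen. The sketch computes that negative log-likelihood for arbitrary scores. Treating the scores as DPO-style implicit rewards is our illustrative assumption, not the paper's exact objective, and the score values are hypothetical. With two items the loss reduces to the pairwise logistic (Bradley-Terry) loss $-\log\sigma(s_w - s_l)$.

```python
import math

def plackett_luce_nll(scores, ranking, top_k=None):
    """Negative log-likelihood of an (optionally top-k) ranking under Plackett-Luce.
    scores: dict item -> real-valued score; ranking: items from best to worst."""
    remaining = set(scores)
    nll = 0.0
    for item in ranking[: top_k or len(ranking)]:
        # Stable log-sum-exp over the items still available at this stage.
        m = max(scores[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        nll += log_z - scores[item]
        remaining.remove(item)
    return nll

# Hypothetical implicit rewards for three responses to one prompt.
scores = {"a": 1.2, "b": 0.4, "c": -0.3}
full = plackett_luce_nll(scores, ["a", "b", "c"])
pairwise = plackett_luce_nll({"a": 1.2, "b": 0.4}, ["a", "b"])
bt = -math.log(1.0 / (1.0 + math.exp(-(1.2 - 0.4))))  # -log sigmoid(s_a - s_b)
```

Setting `top_k` truncates the product after the first $k$ choices, which is how top-$k$ feedback enters without requiring a full ordering.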

2510.15632 2026-02-11 stat.ME math.ST stat.AP stat.TH

Robust estimation of polyserial correlation coefficients: A density power divergence approach

Max Welz

Comments 69 pages (32 main text), 19 figures and 5 tables in total

Journal ref Forthcoming in Psychometrika (2026+)

The association between a continuous and an ordinal variable is commonly modeled through the polyserial correlation model. However, this model, which is based on a partially-latent normality assumption, may be misspecified in practice due to, for example (but not limited to), outliers or careless responses. The typically used maximum likelihood (ML) estimator is highly susceptible to such misspecification: a single observation not generated by partially-latent normality can suffice to produce arbitrarily poor estimates. As a remedy, we propose a novel estimator of the polyserial correlation model designed to be robust against the adverse effects of observations discrepant from that model. The estimator leverages density power divergence estimation to achieve robustness by implicitly downweighting such observations; the ensuing weights constitute a useful tool for pinpointing potential sources of model misspecification. The proposed estimator generalizes ML and is consistent as well as asymptotically Gaussian. As the price for robustness, some efficiency must be sacrificed, but substantial robustness can be gained while maintaining more than 98% of ML efficiency. We demonstrate our estimator's robustness and practical usefulness in simulation experiments and an empirical application in personality psychology, where our estimator helps identify outliers. Finally, the proposed methodology is implemented in free open-source software.
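
The downweighting mechanism is easiest to see in the simplest density power divergence (DPD) problem, estimating a normal mean $\mu$ with known $\sigma$, used here purely as an illustration (it is not the polyserial model). The DPD objective is $\int f^{1+\alpha} - (1 + 1/\alpha)\frac{1}{n}\sum_i f(x_i; \mu)^\alpha$; for this model the integral does not depend on $\mu$, so the estimate maximizes $\sum_i f(x_i; \mu)^\alpha$, whose stationarity condition is a weighted mean with weights $w_i = f(x_i; \mu)^\alpha$. Points far from the bulk receive weights near zero, and $\alpha \to 0$ recovers the ML estimate (the sample mean). The tuning value $\alpha = 0.5$ and the data are hypothetical.

```python
import math

def dpd_normal_mean(xs, sigma=1.0, alpha=0.5, iters=100):
    """DPD estimate of a normal mean with known sigma via the fixed-point
    iteration mu <- sum(w_i x_i) / sum(w_i), with w_i = f(x_i; mu)^alpha."""
    mu = sorted(xs)[len(xs) // 2]  # start from the (upper) median
    for _ in range(iters):
        # Unnormalized weights: constant factors of the density cancel in the ratio.
        w = [math.exp(-alpha * (x - mu) ** 2 / (2 * sigma ** 2)) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
    return mu, w  # w are the weights from the final iteration

data = [-0.8, -0.3, 0.1, 0.2, 0.5, 0.9, -0.4, 0.3, 100.0]  # one gross outlier
mle = sum(data) / len(data)
robust, weights = dpd_normal_mean(data)
```

The sample mean is dragged above 11 by the single outlier, while the DPD estimate stays near the bulk of the data and assigns the outlier a weight of essentially zero; inspecting such weights is how outliers can be flagged.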

2510.08174 2026-02-11 math.ST stat.TH

Dimension-free Bounds for Covariance Estimation with Tensor-Train Structure

Artsiom Patarusau, Nikita Puchkin, Maxim Rakhuba, Fedor Noskov

We consider the problem of covariance estimation from a sample of i.i.d. high-dimensional random vectors. To avoid the curse of dimensionality, we impose an additional assumption on the structure of the covariance matrix $Σ$. More precisely, we study the case when $Σ$ can be approximated by a sum of double Kronecker products of smaller matrices in a tensor train (TT) format. Our setup naturally extends the widely known Kronecker sum and CANDECOMP/PARAFAC models but admits richer interactions across modes. We propose an iterative polynomial-time algorithm based on TT-SVD and higher-order orthogonal iteration (HOOI) adapted to a Tucker-2 hybrid structure. We derive non-asymptotic, dimension-free bounds on the accuracy of covariance estimation that take into account the hidden Kronecker product and tensor train structures. The efficiency of our approach is illustrated with numerical experiments.

2509.21996 2026-02-11 stat.ML cs.LG

A Nonparametric Discrete Hawkes Model with a Collapsed Gaussian-Process Prior

Trinnhallen Brisley, Gordon Ross, Daniel Paulin

Hawkes process models are used in settings where past events increase the likelihood of future events occurring. Many applications record events as counts on a regular grid, yet discrete-time Hawkes models remain comparatively underused and are often constrained by fixed-form baselines and excitation kernels. In particular, there is a lack of flexible, nonparametric treatments of both the baseline and the excitation in discrete time. To this end, we propose the Gaussian Process Discrete Hawkes Process (GP-DHP), a nonparametric framework that places Gaussian process priors on both the baseline and the excitation and performs inference through a collapsed latent representation. This yields smooth, data-adaptive structure without prespecifying trends, periodicities, or decay shapes, and enables maximum a posteriori (MAP) estimation with near-linear-time $O(T\log T)$ complexity. A closed-form projection recovers interpretable baseline and excitation functions from the optimized latent trajectory. In simulations, GP-DHP recovers diverse excitation shapes and evolving baselines. In case studies on U.S. terrorism incidents and weekly Cryptosporidiosis counts, it improves test predictive log-likelihood over standard parametric discrete Hawkes baselines while capturing bursts, delays, and seasonal background variation. The results indicate that flexible discrete-time self-excitation can be achieved without sacrificing scalability or interpretability.
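
For reference, a common discrete-time Hawkes specification (an assumption here; the GP-DHP parameterization may differ) models counts $y_t$ as Poisson with intensity $\lambda_t = \mu_t + \sum_{s<t} \phi(t - s)\, y_s$, where $\mu$ is the baseline and $\phi$ the excitation kernel. The sketch evaluates the resulting log-likelihood with a direct $O(T^2)$ convolution and a fixed, hypothetical geometric kernel whose total mass is 0.775 < 1, so the process does not explode. GP-DHP instead places Gaussian-process priors on $\mu$ and $\phi$ and reaches $O(T \log T)$ cost with faster machinery, none of which is reproduced here.

```python
import math

def hawkes_intensity(y, baseline, kernel):
    """lambda_t = baseline[t] + sum_{s < t} kernel[t - s] * y[s] (direct O(T^2))."""
    T = len(y)
    lam = []
    for t in range(T):
        # Only lags 1 .. len(kernel) - 1 contribute; kernel[0] is never used.
        excite = sum(kernel[t - s] * y[s] for s in range(max(0, t - len(kernel) + 1), t))
        lam.append(baseline[t] + excite)
    return lam

def poisson_loglik(y, lam):
    return sum(k * math.log(l) - l - math.lgamma(k + 1) for k, l in zip(y, lam))

# Hypothetical inputs: constant baseline, geometric excitation kernel over 5 lags.
y = [0, 2, 1, 0, 0, 3, 1, 0, 0, 0]
baseline = [0.5] * len(y)
kernel = [0.0] + [0.4 * 0.5 ** (lag - 1) for lag in range(1, 6)]
lam = hawkes_intensity(y, baseline, kernel)
ll = poisson_loglik(y, lam)
```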

2509.09569 2026-02-11 stat.AP

Measuring football fever through wearable technology: A case study on the German cup final

Timo Adam, Jonas Bauer, Christian Deutscher, Christiane Fuchs, Tamara Schamberger, David Winkelmann

Football is the world's most popular sport, evoking strong physiological and emotional responses among its fans. Yet, the specific dynamics of fan attachment to matches have received little attention in the literature. In this paper, we quantify these dynamics through a unique case study from professional football: the 2025 cup final of the German Football Association (DFB) between first-division club VfB Stuttgart and third-division club Arminia Bielefeld. We collected high-resolution smartwatch data, including heart rate and stress level, from 229 Arminia Bielefeld fans over approximately 12 weeks, complemented by survey responses on club attachment, match attendance, and personal characteristics from a subset of 37 participants. By combining physiological data with survey information, we analyse variations in emotional engagement across individuals and contexts, as well as physiological reactions to key match events. This approach provides rare, data-driven insights into the football fever that captivates fans during high-stakes competitions. Furthermore, we compare the vital parameters recorded on the day of the match with baseline levels on non-matchdays throughout the entire observation period. Our findings reveal pronounced physiological responses among fans, beginning hours before the match and peaking at kick-off.

2508.21536 2026-02-11 stat.ME econ.EM

Triply Robust Panel Estimators

Susan Athey, Guido Imbens, Zhaonan Qu, Davide Viviano

This paper studies estimation of causal effects in a panel data setting. We introduce a new estimator, the Triply RObust Panel (TROP) estimator, that combines (i) a flexible model for the potential outcomes based on a low-rank factor structure on top of a two-way-fixed effect specification, with (ii) unit weights intended to upweight units similar to the treated units and (iii) time weights intended to upweight time periods close to the treated time periods. We study the performance of the estimator in a set of simulations designed to closely match several commonly studied real data sets. We find that there is substantial variation in the performance of the estimators across the settings considered. The proposed estimator outperforms two-way-fixed-effect/difference-in-differences, synthetic control, matrix completion and synthetic-difference-in-differences estimators. We investigate what features of the data generating process lead to this performance, and assess the relative importance of the three components of the proposed estimator. We have two recommendations. Our preferred strategy is that researchers use simulations closely matched to the data they are interested in, along the lines discussed in this paper, to investigate which estimators work well in their particular setting. A simpler approach is to use more robust estimators such as synthetic difference-in-differences or the new triply robust panel estimator which we find to substantially outperform two-way fixed effect estimators in many empirically relevant settings.

2508.13366 2026-02-11 stat.AP econ.GN q-fin.EC stat.ME

Monotonic Path-Specific Effects: Application to Estimating Educational Returns

Aleksei Opacic

Conventional research on educational effects typically either employs a "years of schooling" measure of education, or dichotomizes attainment as a point-in-time treatment. Yet, such a conceptualization of education is misaligned with the sequential process by which individuals make educational transitions. In this paper, I propose a causal mediation framework for the study of educational effects on outcomes such as earnings. The framework considers the effect of a given educational transition as operating indirectly, via progression through subsequent transitions, as well as directly, net of these transitions. I demonstrate that the average treatment effect (ATE) of education can be additively decomposed into mutually exclusive components that capture these direct and indirect effects. The decomposition has several special properties which distinguish it from conventional mediation decompositions of the ATE, properties which facilitate less restrictive identification assumptions as well as identification of all causal paths in the decomposition. An analysis of the returns to high school completion in the NLSY97 cohort suggests that the payoff to a high school degree stems overwhelmingly from its direct labor market returns. Mediation via college attendance, completion and graduate school attendance is small because of individuals' low counterfactual progression rates through these subsequent transitions.

2507.15529 2026-02-11 stat.CO

Algorithms for Approximating Conditionally Optimal Bounds

George Bissias

This work develops algorithms for non-parametric confidence regions for samples from a univariate distribution whose support is a discrete mesh bounded on the left. We generalize the theory of Learned-Miller to preorders over the sample space. In this context, we show that the lexicographic low and lexicographic high orders are in some way extremal in the class of monotone preorders. From this theory we derive several approximation algorithms: 1) Closed form approximations for the lexicographic low and high orders with error tending to zero in the mesh size; 2) A polynomial-time approximation scheme for quantile orders with error tending to zero in the mesh size; 3) Monte Carlo methods for calculating quantile and lexicographic low orders applicable to any mesh size.

2507.09093 2026-02-11 stat.ML cs.LG math.OC

Sharp High-Probability Rates for Nonlinear SGD under Heavy-Tailed Noise via Symmetrization

Aleksandar Armacki, Dragana Bajovic, Dusan Jakovetic, Soummya Kar

Comments 43 pages, 1 figure

We study high-probability convergence of SGD-type methods in non-convex optimization in the presence of heavy-tailed noise. To combat the heavy-tailed noise, a general black-box nonlinear framework is considered, subsuming nonlinearities like sign, clipping, normalization and their smooth counterparts. Our first result shows that nonlinear SGD (N-SGD) achieves the rate $\widetilde{\mathcal{O}}(t^{-1/2})$, for any noise with unbounded moments and a symmetric probability density function (PDF). Crucially, N-SGD has exponentially decaying tails, matching the performance of linear SGD under light-tailed noise. To handle non-symmetric noise, we propose two novel estimators, based on the idea of noise symmetrization. The first, dubbed Symmetrized Gradient Estimator (SGE), assumes a noiseless gradient at any reference point is available at the start of training, while the second, dubbed Mini-batch SGE (MSGE), uses mini-batches to estimate the noiseless gradient. Combined with the nonlinear framework, we get N-SGE and N-MSGE methods, respectively, both achieving the same convergence rate and exponentially decaying tails as N-SGD, while allowing for non-symmetric noise with unbounded moments and PDF satisfying a mild technical condition, with N-MSGE additionally requiring bounded noise moment of order $p \in (1,2]$. Compared to works assuming noise with bounded $p$-th moment, our results: 1) are based on a novel symmetrization approach; 2) provide a unified framework and relaxed moment conditions; 3) imply optimal oracle complexity of N-SGD and N-SGE, strictly better than existing works when $p < 2$, while the complexity of N-MSGE is close to existing works. Compared to works assuming symmetric noise with unbounded moments, we: 1) provide a sharper analysis and improved rates; 2) facilitate state-dependent symmetric noise; 3) extend the strong guarantees to non-symmetric noise.
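
A small simulation illustrates why a bounded nonlinearity helps when the noise has no finite moments. Below, N-SGD with the clipping nonlinearity minimizes $f(x) = x^2/2$ using gradients corrupted by standard Cauchy noise (symmetric, with no finite mean). The step-size schedule $1/\sqrt{t+1}$, the clipping threshold, and the horizon are our choices for this toy; the sketch covers only the symmetric-noise case, not the SGE/MSGE symmetrization estimators.

```python
import math
import random

def clip(g, threshold=1.0):
    return max(-threshold, min(threshold, g))

def nonlinear_sgd(x0, steps, seed=0, nonlinearity=clip):
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        noise = math.tan(math.pi * (rng.random() - 0.5))  # standard Cauchy draw
        g = x + noise                                      # noisy gradient of x^2 / 2
        x -= nonlinearity(g) / math.sqrt(t + 1)            # step size 1 / sqrt(t + 1)
    return x

x_final = nonlinear_sgd(x0=5.0, steps=20000)
```

Passing `nonlinearity=lambda g: math.copysign(1.0, g)` gives sign-SGD instead. Because the noise is symmetric, the expected clipped gradient still points toward the minimizer, while any single Cauchy outlier can move the iterate by at most one step size.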

2507.06556 2026-02-11 math.PR math.CO math.ST stat.TH

Spectra of high-dimensional sparse random geometric graphs

Yifan Cao, Yizhe Zhu

Comments 26 pages, 4 figures

We analyze the spectral properties of the high-dimensional random geometric graph $G(n, d, p)$, formed by sampling $n$ i.i.d. vectors $\{v_i\}_{i=1}^{n}$ uniformly on a $d$-dimensional unit sphere and connecting each pair $\{i,j\}$ whenever $\langle v_i, v_j \rangle \geq τ$ so that $p=\mathbb P(\langle v_i,v_j\rangle \geq τ)$. This model defines a nonlinear random matrix ensemble with dependent entries. We show that if $d =ω( np\log^{2}(1/p))$ and $np\to\infty$, the limiting spectral distribution of the normalized adjacency matrix $\frac{A}{\sqrt{np(1-p)}}$ is the semicircle law. To our knowledge, this is the first such result for $G(n, d, p)$ in the sparse regime. In the constant sparsity case $p=α/n$, we further show that if $d=ω(\log^2(n))$ the limiting spectral distribution of $A$ in $G(n,d,α/n)$ coincides with that of the Erdős-Rényi graph $G(n,α/n)$. Our approach combines the classical moment method in random matrix theory with a novel recursive decomposition of closed-walk graphs, leveraging block-cut trees and ear decompositions, to control the moments of the empirical spectral distribution. A refined high trace analysis further yields a near-optimal bound on the second eigenvalue when $np=Ω(\log^4 (n))$, removing technical conditions previously imposed in (Liu et al. 2023). As an application, we demonstrate that this improved eigenvalue bound sharpens the parameter requirements on $d$ and $p$ for spontaneous synchronization on random geometric graphs in (Abdalla et al. 2024) under the homogeneous Kuramoto model.
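
The model is straightforward to simulate. The sketch takes the vectors to live on the unit sphere in $\mathbb{R}^d$ (normalized Gaussian vectors, which are uniformly distributed there), connects pairs whose inner product is at least $τ$, and reports the empirical edge density as an estimate of $p$. The sizes and threshold are hypothetical. Computing the spectrum of $A/\sqrt{np(1-p)}$ to compare with the semicircle law would need a linear-algebra library, so the sketch stops at the adjacency matrix.

```python
import math
import random

def sample_rgg(n, d, tau, seed=0):
    """Random geometric graph on the sphere: i ~ j iff <v_i, v_j> >= tau."""
    rng = random.Random(seed)
    vs = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        vs.append([c / norm for c in g])  # uniform on the unit sphere in R^d
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if sum(a * b for a, b in zip(vs[i], vs[j])) >= tau:
                adj[i][j] = adj[j][i] = 1
    return adj

n, d, tau = 200, 50, 0.2
adj = sample_rgg(n, d, tau)
edges = sum(sum(row) for row in adj) // 2
p_hat = edges / (n * (n - 1) / 2)
```

For large $d$ the inner product of two independent uniform unit vectors is approximately $N(0, 1/d)$, so here $p \approx P(Z \ge 0.2\sqrt{50}) \approx 0.08$; raising $τ$ or $d$ moves the graph toward the sparse regime studied in the paper.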

2507.05526 2026-02-11 cs.LG stat.ME stat.ML

Estimating Interventional Distributions with Uncertain Causal Graphs through Meta-Learning

Anish Dhir, Cristiana Diaconu, Valentinian Mihai Lungu, James Requeima, Richard E. Turner, Mark van der Wilk

Abstract

In scientific domains -- from biology to the social sciences -- many questions boil down to "What effect will we observe if we intervene on a particular variable?" If the causal relationships (e.g., a causal graph) are known, it is possible to estimate the interventional distributions. In the absence of this domain knowledge, the causal structure must be discovered from the available observational data. However, observational data are often compatible with multiple causal graphs, making methods that commit to a single structure prone to overconfidence. A principled way to manage this structural uncertainty is via Bayesian inference, which averages over a posterior distribution on possible causal structures and functional mechanisms. Unfortunately, the number of causal structures grows super-exponentially with the number of nodes in the graph, making computations intractable. We propose to circumvent these challenges by using meta-learning to create an end-to-end model: the Model-Averaged Causal Estimation Transformer Neural Process (MACE-TNP). The model is trained to predict the Bayesian model-averaged interventional posterior distribution, and its end-to-end nature bypasses the need for expensive calculations. Empirically, we demonstrate that MACE-TNP outperforms strong Bayesian baselines. Our work establishes meta-learning as a flexible and scalable paradigm for approximating complex Bayesian causal inference, which can be scaled to increasingly challenging settings in the future.
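The super-exponential growth of the space of causal structures can be made concrete by counting labelled DAGs on $n$ nodes with Robinson's recurrence $a(n) = \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k} 2^{k(n-k)} a(n-k)$, with $a(0) = 1$. This sketch only counts candidate graphs to show why exact Bayesian model averaging becomes intractable; it is not part of MACE-TNP.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 8):
    print(n, num_dags(n))
```

Already at 7 nodes there are over a billion candidate DAGs, so enumerating a posterior over structures quickly stops being an option.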

2506.13865 2026-02-11 quant-ph cond-mat.dis-nn cs.LG cs.NE stat.ML

Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms

Kasidit Srimahajariyapong, Supanut Thanasilp, Thiparat Chotibut

Comments 17+9 pages, 9+7 figures

Abstract

Variational quantum algorithms (VQAs) promise near-term quantum advantage, yet parametrized quantum states commonly built from the digital gate-based approach often suffer from scalability issues such as barren plateaus, where the loss landscape becomes flat. We study an analog VQA ansatz composed of $M$ quenches of a disordered Ising chain, whose dynamics is native to several quantum simulation platforms. By tuning the disorder strength, we place each quench in either a thermalized phase or a many-body-localized (MBL) phase and analyse (i) the ansatz's expressivity and (ii) the scaling of the loss variance. Numerics show that both phases reach maximal expressivity at large $M$, but barren plateaus emerge at far smaller $M$ in the thermalized phase than in the MBL phase. Exploiting this gap, we propose an MBL initialisation strategy: initialise the ansatz in the MBL regime at an intermediate number of quenches $M$, enabling initial trainability while retaining sufficient expressivity for subsequent optimization. The results link quantum phases of matter and VQA trainability, and provide practical guidelines for scaling analog-hardware VQAs.

2506.05905 2026-02-11 stat.ME cs.NA math.NA stat.CO stat.ML

Sequential Monte Carlo approximations of Wasserstein--Fisher--Rao gradient flows

Francesca R. Crucinio, Sahani Pathiraja

Comments Changes from v1: the study of tempered dynamics was removed in favour of a larger experimental section

Abstract

We consider the problem of sampling from a probability distribution $π$. It is well known that this can be written as an optimisation problem over the space of probability distributions, in which we aim to minimise the Kullback--Leibler divergence from $π$. We consider several partial differential equations (PDEs) whose solution is a minimiser of the Kullback--Leibler divergence from $π$ and connect them to well-known Monte Carlo algorithms. We focus in particular on PDEs obtained by considering the Wasserstein--Fisher--Rao geometry over the space of probabilities and show that these lead to a natural implementation using importance sampling and sequential Monte Carlo. We propose a novel algorithm to approximate the Wasserstein--Fisher--Rao flow of the Kullback--Leibler divergence and conduct an extensive empirical study to identify when these algorithms outperform other popular Monte Carlo algorithms.
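To make the link between these flows and sequential Monte Carlo concrete: the pure Fisher--Rao gradient flow of the Kullback--Leibler divergence has the closed-form solution $μ_t \propto μ_0^{e^{-t}} π^{1-e^{-t}}$, so it can be followed by importance reweighting with tempering level $λ_t = 1 - e^{-t}$ plus resampling. In the sketch below, unadjusted Langevin moves (the Langevin diffusion being the Wasserstein gradient flow of the same divergence) rejuvenate the particles at each level, as in a standard tempered SMC sampler. This is a hypothetical illustration assuming $μ_0 = N(0, 3^2)$, $π = N(2, 1)$, a fixed time grid and a fixed Langevin step; it is not the novel algorithm proposed in the paper.

```python
import math
import random


def log_mu0(x):
    # Initial distribution N(0, 3^2), up to an additive constant.
    return -x * x / 18.0


def log_pi(x):
    # Target N(2, 1), up to an additive constant.
    return -0.5 * (x - 2.0) ** 2


def grad_log_bridge(x, lam):
    # Gradient of log(mu0^(1 - lam) * pi^lam), the density at tempering level lam.
    return (1.0 - lam) * (-x / 9.0) + lam * (2.0 - x)


def smc_fisher_rao(n=2000, t_max=5.0, n_stages=20, step=0.05, n_moves=5, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 3.0) for _ in range(n)]
    lam_prev = 0.0
    for k in range(1, n_stages + 1):
        t = t_max * k / n_stages
        lam = 1.0 - math.exp(-t)  # Fisher-Rao flow: mu_t ∝ mu0^{e^{-t}} * pi^{1 - e^{-t}}
        # Importance reweighting from level lam_prev to level lam.
        logw = [(lam - lam_prev) * (log_pi(x) - log_mu0(x)) for x in xs]
        m = max(logw)
        weights = [math.exp(lw - m) for lw in logw]
        # Multinomial resampling.
        xs = rng.choices(xs, weights=weights, k=n)
        # Unadjusted Langevin moves targeting the current level.
        for _ in range(n_moves):
            xs = [
                x + step * grad_log_bridge(x, lam) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
                for x in xs
            ]
        lam_prev = lam
    return xs


particles = smc_fisher_rao()
mean = sum(particles) / len(particles)
var = sum((x - mean) ** 2 for x in particles) / len(particles)
print(f"mean = {mean:.3f}, variance = {var:.3f}")  # target is N(2, 1)
```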

2506.05776 2026-02-11 stat.AP stat.OT

Analyzing the retraining frequency of global forecasting models: towards more stable forecasting systems

Marco Zanotti

Abstract

Forecast stability, that is, the consistency of predictions over time, is essential in business settings where sudden shifts in forecasts can disrupt planning and erode trust in predictive systems. Despite its importance, stability is often overlooked in favor of accuracy. In this study, we evaluate the stability of point and probabilistic forecasts across several retraining scenarios using three large forecasting datasets and ten different global forecasting models. To analyze stability in the probabilistic setting, we propose a new model-agnostic, distribution-free, and scale-free metric that measures probabilistic stability: the Scaled Multi-Quantile Change (SMQC). The results show that less frequent retraining not only preserves but often improves forecast stability, challenging the need for frequent retraining. Moreover, the study shows that accuracy and stability are not necessarily conflicting objectives when adopting a global modeling approach. The study promotes a shift toward stability-aware forecasting practices, proposing a new metric to evaluate forecast stability effectively in probabilistic settings, and offering practical guidelines for building more stable and sustainable forecasting systems.

2505.21208 2026-02-11 stat.ML cs.LG math.OC

Input Convex Kolmogorov Arnold Networks

Thomas Deschatre, Xavier Warin

Abstract

This article presents an input convex neural network architecture based on Kolmogorov-Arnold networks (ICKAN). Two specific networks are presented: the first is based on a low-order, piecewise-linear representation of functions, for which a universal approximation theorem is provided. The second is based on cubic splines, for which only numerical results support convergence. We demonstrate on simple tests that these networks perform competitively with classical input convex neural networks (ICNNs). In the second part, we use the networks to solve some optimal transport problems requiring a convex approximation of functions and demonstrate their effectiveness. Comparisons with ICNNs show that cubic ICKANs produce results similar to those of classical ICNNs.
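One reason piecewise-linear representations suit input convexity: a univariate function $f(x) = b_0 + b_1 x + \sum_k d_k \max(0, x - κ_k)$ is convex whenever every $d_k \ge 0$, because its slope can only increase at each knot $κ_k$. The sketch below builds such a function and checks midpoint convexity on a grid. The parameterisation and the name `make_convex_pwl` are assumptions for illustration, not the ICKAN layer from the paper; in a trainable model, nonnegativity could be enforced, for example, by passing raw parameters through a softplus.

```python
def make_convex_pwl(knots, slope_increments, base_slope=0.0, intercept=0.0):
    """f(x) = intercept + base_slope * x + sum_k d_k * max(0, x - knot_k).

    Each d_k >= 0 makes the slope nondecreasing across knots, so f is convex.
    """
    if len(knots) != len(slope_increments):
        raise ValueError("need one slope increment per knot")
    if any(d < 0 for d in slope_increments):
        raise ValueError("negative increments would break convexity")

    def f(x):
        return intercept + base_slope * x + sum(
            d * max(0.0, x - k) for k, d in zip(knots, slope_increments)
        )

    return f


f = make_convex_pwl(knots=[-1.0, 0.0, 1.0], slope_increments=[1.0, 0.5, 2.0], base_slope=-2.0)
grid = [i / 10 for i in range(-30, 31)]
is_convex = all(f((s + t) / 2) <= (f(s) + f(t)) / 2 + 1e-12 for s in grid for t in grid)
print("midpoint-convex on grid:", is_convex)
```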

2505.19013 2026-02-11 cs.LG cs.AI econ.GN q-fin.EC stat.ML

Faithful Group Shapley Value

Kiljae Lee, Ziqi Liu, Weijing Tang, Yuan Zhang

Comments Accepted to NeurIPS 2025

Abstract

Data Shapley is an important tool for data valuation, which quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batches. However, we identify that existing group-level extensions of Data Shapley are vulnerable to shell company attacks, where strategic group splitting can unfairly inflate valuations. We propose Faithful Group Shapley Value (FGSV) that uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.
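The shell company attack can be reproduced on a toy example with exact Shapley values. Under an assumed concave utility $v(S) = \sqrt{\text{number of data points in } S}$, two providers with 4 points each receive equal group-level values; if provider A instead registers two shells of 2 points each and the shells' values are summed, A's total rises at B's expense. This is a hypothetical illustration of the vulnerability of naive group-level Shapley values, not an implementation of FGSV.

```python
import math
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = []
        for p in order:
            before = value(coalition)
            coalition.append(p)
            totals[p] += value(coalition) - before
    return {p: s / len(orderings) for p, s in totals.items()}


# Hypothetical data sizes: provider A owns 4 points, provider B owns 4 points.
sizes = {"A": 4, "B": 4, "A1": 2, "A2": 2}


def utility(coalition):
    # Assumed concave utility: value grows like sqrt(total number of data points).
    return math.sqrt(sum(sizes[p] for p in coalition))


honest = shapley_values(["A", "B"], utility)
split = shapley_values(["A1", "A2", "B"], utility)
print(f"A as one group: {honest['A']:.4f}")
print(f"A split into shells: {split['A1'] + split['A2']:.4f}, B: {split['B']:.4f}")
```

The total value $\sqrt{8}$ is the same in both scenarios (Shapley efficiency); splitting only redistributes it toward the provider that split.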

2505.17133 2026-02-11 stat.ML cs.AI cs.LG

Learning Probabilities of Causation with Mask-Augmented Data

Shuai Wang, Yizhou Sun, Judea Pearl, Ang Li

Comments arXiv admin note: text overlap with arXiv:2502.08858

Abstract

Probabilities of causation play a central role in modern decision making. Tian and Pearl first introduced formal definitions and derived tight bounds for three binary probabilities of causation, including the probability of necessity and sufficiency (PNS). However, estimating these probabilities requires both experimental and observational distributions specific to each subpopulation, which are often unreliable or impractical to obtain from limited population-level data. To solve this problem, we propose two machine learning models: Exact-MLP and Mask-MLP, which are trained on a small set of reliable subpopulations and are able to predict PNS bounds for all other subpopulations. We validate our models across four Structural Causal Models (SCMs), each evaluated on population-level data with sample sizes between 100k and 200k. Our models achieve average mean absolute errors (MAEs) of roughly 0.03 on main tasks, reducing MAE by about 80% relative to the corresponding baselines. These results demonstrate both the feasibility of machine learning models for learning probabilities of causation and the effectiveness of the proposed approach.
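The PNS bounds that the models learn to predict combine experimental and observational probabilities for a binary treatment $X$ and outcome $Y$. A minimal sketch of the Tian-Pearl bounds, as commonly stated in the literature, follows; the input numbers are made up for illustration, and the function computes the bounds directly from known distributions rather than implementing Exact-MLP or Mask-MLP.

```python
def pns_bounds(p_y_do_x, p_y_do_xp, p_x_y, p_x_yp, p_xp_y, p_xp_yp):
    """Tian-Pearl bounds on PNS for binary treatment X and outcome Y.

    Experimental inputs: p_y_do_x = P(y_x), p_y_do_xp = P(y_{x'}).
    Observational inputs: joint probabilities P(x, y), P(x, y'), P(x', y), P(x', y').
    """
    p_y = p_x_y + p_xp_y
    lower = max(0.0, p_y_do_x - p_y_do_xp, p_y - p_y_do_xp, p_y_do_x - p_y)
    upper = min(
        p_y_do_x,
        1.0 - p_y_do_xp,  # P(y'_{x'})
        p_x_y + p_xp_yp,
        p_y_do_x - p_y_do_xp + p_x_yp + p_xp_y,
    )
    return lower, upper


# Made-up subpopulation numbers, for illustration only.
lo, hi = pns_bounds(0.6, 0.3, p_x_y=0.3, p_x_yp=0.2, p_xp_y=0.1, p_xp_yp=0.4)
print(f"{lo:.2f} <= PNS <= {hi:.2f}")
```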

2505.10919 2026-02-11 physics.flu-dyn cs.LG stat.ML

A Physics-Informed Spatiotemporal Deep Learning Framework for Turbulent Systems

Luca Menicali, Andrew Grace, David H. Richter, Stefano Castruccio

Abstract

Fluid thermodynamics underpins atmospheric dynamics, climate science, industrial applications, and energy systems. However, direct numerical simulations (DNS) of such systems can be computationally prohibitive. To address this, we present a novel physics-informed spatiotemporal surrogate model for Rayleigh-Bénard convection (RBC), a canonical example of convective fluid flow. Our approach combines convolutional neural networks, for spatial dimension reduction, with an innovative recurrent architecture, inspired by large language models, to model long-range temporal dynamics. The model is penalized for deviations from the governing partial differential equations to ensure physical interpretability. Since RBC exhibits turbulent behavior, we quantify uncertainty using a conformal prediction framework. This model replicates key physical features of RBC dynamics while significantly reducing computational cost, offering a scalable alternative to DNS for long-term simulations.
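The conformal step can be illustrated with the generic split conformal recipe for a scalar output: score a held-out calibration set with absolute residuals, take the $\lceil (n+1)(1-α) \rceil$-th smallest score as the half-width, and under exchangeability the interval $\hat y \pm q$ covers new outcomes with probability at least $1 - α$. This is a minimal sketch with a toy linear predictor standing in for the surrogate, assumed for illustration; the paper's spatiotemporal framework may construct its scores differently.

```python
import math
import random


def conformal_halfwidth(cal_scores, alpha):
    """Split conformal: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else sorted(cal_scores)[k - 1]


rng = random.Random(0)


def draw(m):
    # Toy data: y = 2x + Gaussian noise; the stand-in "surrogate" predicts 2x.
    xs = [rng.uniform(0.0, 1.0) for _ in range(m)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.3)) for x in xs]


def predict(x):
    return 2.0 * x


alpha = 0.1
calibration = draw(500)
q = conformal_halfwidth([abs(y - predict(x)) for x, y in calibration], alpha)
test = draw(5000)
coverage = sum(abs(y - predict(x)) <= q for x, y in test) / len(test)
print(f"half-width = {q:.3f}, empirical coverage = {coverage:.3f}")
```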

2505.08654 2026-02-11 stat.ME econ.EM q-fin.ST

Holistic Multi-Scale Inference of the Leverage Effect: Efficiency under Dependent Microstructure Noise

Ziyang Xiong, Zhao Chen, Christina Dan Wang

Abstract

This paper addresses the long-standing challenge of estimating the leverage effect from high-frequency data contaminated by dependent, non-Gaussian microstructure noise. We depart from the conventional reliance on pre-averaging or volatility "plug-in" methods by introducing a holistic multi-scale framework that operates directly on the leverage effect. We propose two novel estimators: the Subsampling-and-Averaging Leverage Effect (SALE) and the Multi-Scale Leverage Effect (MSLE). Central to our approach is a shifted window technique that constructs a noise-unbiased base estimator, significantly simplifying the multi-scale architecture. We provide a rigorous theoretical foundation for these estimators, establishing central limit theorems and stable convergence results that remain valid under both noise-free and dependent-noise settings. The primary contribution to estimation efficiency is a specifically designed weighting strategy for the MSLE estimator. By optimizing the weights based on the asymptotic covariance structure across scales and incorporating finite-sample variance corrections, we achieve substantial efficiency gains over existing benchmarks. Extensive simulation studies and an empirical analysis of 30 U.S. assets demonstrate that our framework consistently yields smaller estimation errors and superior performance in realistic, noisy market environments.

2504.03784 2026-02-11 stat.ML cs.AI cs.LG

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning

Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, Chengchun Shi

Abstract

Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Terry model, which relies on assumptions about human preferences that may not reflect the complexity and variability of real-world judgments. In this paper, we propose a robust algorithm to enhance the performance of existing approaches under such reward model misspecifications. Theoretically, our algorithm reduces the variance of reward and policy estimators, leading to improved regret bounds. Empirical evaluations on LLM benchmark datasets demonstrate that the proposed algorithm consistently outperforms existing methods, with 77-81% of responses being favored over baselines on the Anthropic Helpful and Harmless dataset. The code is available at https://github.com/VRPO/VRPO.
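For reference, the Bradley-Terry model mentioned above sets $P(\text{chosen} \succ \text{rejected}) = σ(r_{\text{chosen}} - r_{\text{rejected}})$, and standard reward-model training minimises $-\log σ(r_w - r_l)$ over preference pairs. The sketch below shows only this baseline loss, with hypothetical function names; it is not the paper's robust algorithm, which targets settings where this model is misspecified.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bt_preference_prob(r_chosen, r_rejected):
    # Bradley-Terry: P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected).
    return 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))


def bt_reward_loss(reward_pairs):
    """Mean negative log-likelihood -log sigmoid(r_w - r_l) over (r_chosen, r_rejected) pairs."""
    return sum(softplus(-(rw - rl)) for rw, rl in reward_pairs) / len(reward_pairs)


pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
print(bt_preference_prob(1.2, 0.3), bt_reward_loss(pairs))
```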

2412.08794 2026-02-11 cs.LG stat.ML

Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar, Cody Fleming

Journal ref International Conference on Learning Representations (ICLR), 2025

Abstract

In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.

2412.06582 2026-02-11 math.ST stat.ME stat.TH

Optimal estimation in private distributed functional data analysis

Gengyu Xue, Zhenhua Lin, Yi Yu

Abstract

We systematically investigate the preservation of differential privacy in functional data analysis, beginning with functional mean estimation and extending to varying coefficient model estimation. Our work introduces a distributed learning framework involving multiple servers, each responsible for collecting several sparsely observed functions. This hierarchical setup introduces a mixed notion of privacy. Within each function, user-level differential privacy is applied to $m$ discrete observations. At the server level, central differential privacy is deployed to account for the centralised nature of data collection. Across servers, only private information is exchanged, adhering to federated differential privacy constraints. To address this complex hierarchy, we employ minimax theory to reveal several fundamental phenomena: from sparse to dense functional data analysis, from user-level to central and federated differential privacy costs, and the intricate interplay between different regimes of functional data analysis and privacy preservation. To the best of our knowledge, this is the first study to rigorously examine functional data estimation under multiple privacy constraints. Our theoretical findings are complemented by efficient private algorithms and extensive numerical evidence, providing a comprehensive exploration of this challenging problem.
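The basic building block behind private mean estimation is a clipped mean plus calibrated noise. The sketch below shows the standard Laplace mechanism for an $ε$-differentially private mean on a single server under replace-one neighbouring datasets; the clipping bound, $ε$ and data are assumptions, and it does not implement the paper's hierarchical user-level, central and federated scheme for functional data.

```python
import random


def laplace_noise(scale, rng):
    # Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, bound, epsilon, rng):
    """epsilon-DP mean of values clipped to [-bound, bound] via the Laplace mechanism.

    Replacing one record changes the clipped mean by at most 2 * bound / n.
    """
    n = len(values)
    clipped = [max(-bound, min(bound, v)) for v in values]
    sensitivity = 2.0 * bound / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(1)
data = [rng.gauss(0.5, 1.0) for _ in range(10_000)]
estimate = private_mean(data, bound=3.0, epsilon=1.0, rng=rng)
print(f"non-private mean = {sum(data) / len(data):.4f}, private mean = {estimate:.4f}")
```

Smaller $ε$ or fewer records inflate the noise scale $2B/(nε)$, which is the basic privacy-accuracy trade-off that minimax analyses quantify.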

2410.04165 2026-02-11 stat.ME econ.EM

How to Compare Copula Forecasts?

Tobias Fissler, Yannick Hoga

Journal ref Journal of Business & Economic Statistics (2026)

Abstract

This paper lays out a principled approach to compare copula forecasts via strictly consistent scores. We first establish the negative result that, in general, copulas fail to be elicitable, implying that copula predictions cannot sensibly be compared on their own. A notable exception is Fréchet classes, that is, when the marginal distribution structure is given and fixed, in which case we give suitable scores for the copula forecast comparison. As a remedy for the general non-elicitability of copulas, we establish novel multi-objective scores for copula forecasts along with marginal forecasts. They give rise to two-step tests of equal or superior predictive ability which admit attribution of the forecast ranking to the accuracy of the copulas or the marginals. Simulations show that our two-step tests work well in terms of size and power. We illustrate our new methodology via an empirical example using copula forecasts for international stock market indices.

2408.00955 2026-02-11 stat.ML cs.LG stat.ME

Aggregation Models with Optimal Weights for Distributed Gaussian Processes

Haoyuan Chen, Rui Tuo

Comments 34 pages, 8 figures, 2 tables

Abstract

Gaussian process (GP) models have received increasing attention in recent years due to their superb prediction accuracy and modeling flexibility. To address the computational burdens of GP models for large-scale datasets, distributed learning for GPs is often adopted. Current aggregation models for distributed GPs are not time-efficient when incorporating correlations between GP experts. In this work, we propose a novel approach for aggregated prediction in distributed GPs. The technique is suitable for both exact and sparse variational GPs. The proposed method incorporates correlations among experts, leading to better prediction accuracy with manageable computational requirements. As demonstrated by empirical studies, the proposed approach results in more stable predictions in less time than state-of-the-art consistent aggregation models.
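For context, the simplest aggregation rules combine the experts' Gaussian predictions at a test point by precision weighting. The sketch below shows the classical product-of-experts (PoE) rule and a generalised PoE with per-expert weights $β_k$. Both treat experts as independent, which is the limitation the proposed method addresses by incorporating correlations; the weights here are user-supplied placeholders, not the paper's optimal weights.

```python
def poe(means, variances, betas=None):
    """(Generalised) product-of-experts aggregation at a single test point.

    Aggregated precision = sum_k beta_k / var_k;
    aggregated mean = precision-weighted average of the expert means.
    betas=None gives the plain PoE (all beta_k = 1).
    """
    if betas is None:
        betas = [1.0] * len(means)
    precisions = [b / v for b, v in zip(betas, variances)]
    total = sum(precisions)
    mean = sum(p * m for p, m in zip(precisions, means)) / total
    return mean, 1.0 / total


experts_mu = [1.0, 1.4, 0.8]
experts_var = [0.2, 0.5, 0.4]
print(poe(experts_mu, experts_var))
print(poe(experts_mu, experts_var, betas=[1 / 3] * 3))
```

Plain PoE becomes overconfident as the number of experts grows, since precisions simply add; weights with $\sum_k β_k = 1$ temper this.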

2406.05637 2026-02-11 math.OC cs.LG math.PR stat.ML

A Generalized Version of Chung's Lemma and its Applications

Li Jiang, Xiao Li, Andre Milzarek, Junwen Qiu

Comments 38 pages

Abstract

Chung's Lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's Lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad applicability of the proposed generalized lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as Stochastic Gradient Descent (SGD) and Random Reshuffling (RR), under a general $(θ,μ)$-Polyak-Lojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes exhibit superior adaptivity to both landscape geometry and gradient noise; specifically, they achieve optimal convergence rates without requiring exact knowledge of the underlying landscape or separate parameter selection strategies for noisy and noise-free regimes. Our results demonstrate that the developed variant of Chung's Lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.
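The step size families named above can be written as simple schedules over iterations $k = 0, \dots, T$. The exact parameterisations below are assumptions for illustration (in particular, the exponential schedule is taken to decay geometrically from $α_0$ to $α_0 β / T$ over $T$ steps); the paper's precise choices may differ.

```python
import math


def polynomial_step(k, a0=1.0, b=1.0, p=0.5):
    # a_k = a0 / (k + b)^p
    return a0 / (k + b) ** p


def constant_step(k, a0=0.1):
    return a0


def exponential_step(k, T, a0=1.0, beta=1.0):
    # Geometric decay: a_k = a0 * (beta / T)^(k / T), so a_T = a0 * beta / T.
    return a0 * (beta / T) ** (k / T)


def cosine_step(k, T, a0=1.0):
    # Cosine annealing from a0 at k = 0 to 0 at k = T.
    return 0.5 * a0 * (1.0 + math.cos(math.pi * k / T))


T = 100
for k in (0, 25, 50, 100):
    print(k, polynomial_step(k), constant_step(k), exponential_step(k, T), cosine_step(k, T))
```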

2312.05319 2026-02-11 stat.ME math.ST stat.TH

Hyperbolic Network Latent Space Model with Learnable Curvature

Jinming Li, Gongjun Xu, Ji Zhu

Journal ref Journal of the American Statistical Association 2026

Abstract

Network data is ubiquitous in various scientific disciplines, including sociology, economics, and neuroscience. Latent space models are often employed in network data analysis, but the geometric effect of latent space curvature remains a significant, unresolved issue. In this work, we propose a hyperbolic network latent space model with a learnable curvature parameter. We theoretically justify that learning the optimal curvature is essential to minimizing the embedding error across all hyperbolic embedding methods beyond network latent space models. A maximum-likelihood estimation strategy, employing manifold gradient optimization, is developed, and we establish the consistency and convergence rates for the maximum-likelihood estimators, both of which are technically challenging due to the non-linearity and non-convexity of the hyperbolic distance metric. We further demonstrate the geometric effect of latent space curvature and the superior performance of the proposed model through extensive simulation studies and an application using a Facebook friendship network.
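To see where the curvature parameter enters, recall the geodesic distance in the Poincaré ball of curvature $-c$ (with $c > 0$): $d_c(x, y) = \frac{1}{\sqrt{c}} \operatorname{arcosh}\left(1 + \frac{2c\lVert x - y \rVert^2}{(1 - c\lVert x \rVert^2)(1 - c\lVert y \rVert^2)}\right)$. The sketch below computes it and plugs it into a latent space edge probability; the logistic link $\operatorname{logit} P(A_{ij} = 1) = α - d_c(z_i, z_j)$ and the names are assumptions for illustration, not necessarily the paper's exact specification.

```python
import math


def poincare_distance(x, y, c=1.0):
    """Geodesic distance in the Poincare ball of curvature -c (c > 0); needs c*|x|^2 < 1."""
    def sq(v):
        return sum(t * t for t in v)

    diff = sq([xi - yi for xi, yi in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


def edge_probability(z_i, z_j, alpha, c=1.0):
    # Assumed logistic link: logit P(A_ij = 1) = alpha - d_c(z_i, z_j).
    return 1.0 / (1.0 + math.exp(-(alpha - poincare_distance(z_i, z_j, c))))


z1, z2 = [0.1, 0.2], [-0.3, 0.4]
for c in (0.5, 1.0, 2.0):
    print(c, poincare_distance(z1, z2, c), edge_probability(z1, z2, alpha=1.0, c=c))
```

For the same latent coordinates, changing $c$ changes every pairwise distance, which is why treating the curvature as a fixed constant can bias the fitted model.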

2311.17407 2026-02-11 math.ST cs.NA math.NA stat.TH

Strong consistency of an estimator by the truncated singular value decomposition for an errors-in-variables regression model with collinearity

Kensuke Aishima

Comments arXiv admin note: text overlap with arXiv:2302.06824

Journal ref Linear Algebra and its Applications, Volume 721, 15 September 2025, Pages 520-541

Abstract

In this paper, we prove strong consistency of an estimator by the truncated singular value decomposition for a multivariate errors-in-variables linear regression model with collinearity. This result is an extension of Gleser's proof of the strong consistency of total least squares solutions to the case with modern rank constraints. While the usual discussion of consistency in the absence of solution uniqueness deals with the minimal norm solution, the contribution of this study is to develop a theory that shows the strong consistency of a set of solutions. The proof is based on properties of orthogonal projections, specifically properties of the Rayleigh-Ritz procedure for computing eigenvalues. This makes it suitable for targeting problems where some row vectors of the matrices do not contain noise. Therefore, this paper gives a proof for the regression model with the above condition on the row vectors, resulting in a natural generalization of the strong consistency for the standard TLS estimator.
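The simplest instance of total least squares has one regressor and full rank: the TLS line passes through the centroid along the leading principal direction of the centred data, which gives the closed-form slope $\hat β = \frac{s_{yy} - s_{xx} + \sqrt{(s_{yy} - s_{xx})^2 + 4 s_{xy}^2}}{2 s_{xy}}$. The sketch below covers only this scalar special case, assuming equal error variances in $x$ and $y$, to illustrate the errors-in-variables setting; it does not implement the truncated-SVD estimator for collinear multivariate models studied in the paper.

```python
import math
import random


def tls_line(xs, ys):
    """Orthogonal (total least squares) fit y = a + b * x for one regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    if sxy == 0:
        raise ValueError("TLS direction is axis-aligned when s_xy == 0")
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b


def ols_slope(xs, ys):
    # Ordinary least squares slope, for comparison (attenuated when x is noisy).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


# Both coordinates observed with noise of the same standard deviation (0.5).
rng = random.Random(0)
truth = [rng.uniform(0.0, 10.0) for _ in range(20_000)]
xs = [t + rng.gauss(0.0, 0.5) for t in truth]
ys = [1.0 + 2.0 * t + rng.gauss(0.0, 0.5) for t in truth]
a_tls, b_tls = tls_line(xs, ys)
b_ols = ols_slope(xs, ys)
print(f"TLS slope = {b_tls:.3f}, OLS slope = {b_ols:.3f}")  # true slope is 2
```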

2308.14240 2026-02-11 stat.AP

Bayesian Multivariate Track Geometry Degradation Modelling and its use in Condition-Based Inspection

Huy Truong-Ba, Sinda Rebello, Michael E. Cholette, Venkat Reddy, Pietro Borghesani

Journal ref Railway Engineering Science, 2025

Abstract

Effective maintenance of railway infrastructure is crucial for safe and comfortable transportation. Among the various degradation modes, track geometry deformation due to repeated loading significantly impacts operational safety. Detecting and maintaining acceptable track geometry involves the use of track recording vehicles (TRVs) that inspect and record geometric parameters. This study aims to develop a novel track geometry degradation model that considers multiple indicators and their correlations, accounting for both imperfect manual and mechanized tamping. A multivariate Wiener model is formulated to capture the characteristics of track geometry degradation. To address data limitations, a hierarchical Bayesian approach with Markov Chain Monte Carlo (MCMC) simulation is employed. This research contributes to the analysis of a multivariate predictive model, which considers the correlation between the degradation rates of multiple indicators, providing insights for rail operators and new track-monitoring systems. The model's performance is validated through a real-world case study on a commuter track in Queensland, Australia, using actual data and independent test datasets. Additionally, the study demonstrates the application of the proposed multivariate degradation model in developing a condition-based inspection policy for track geometry, potentially reducing the number of TRV runs while maintaining abnormal detection levels and failure rates.

2212.00133 2026-02-11 cs.LG math.OC stat.ML

Universal Neural Optimal Transport

Jonathan Geuter, Gregor Kornhardt, Ingimar Tomasson, Vaios Laschos

Comments 37 pages, 19 figures, accepted to ICML 2025

Journal ref Proceedings of the 42nd International Conference on Machine Learning, PMLR 267:19196-19232, 2025

Abstract

Optimal Transport (OT) problems are a cornerstone of many applications, but solving them is computationally expensive. To address this problem, we propose UNOT (Universal Neural Optimal Transport), a novel framework capable of accurately predicting (entropic) OT distances and plans between discrete measures for a given cost function. UNOT builds on Fourier Neural Operators, a universal class of neural networks that map between function spaces and that are discretization-invariant, which enables our network to process measures of variable resolutions. The network is trained adversarially using a second, generating network and a self-supervised bootstrapping loss. We ground UNOT in an extensive theoretical framework. Through experiments on Euclidean and non-Euclidean domains, we show that our network not only accurately predicts OT distances and plans across a wide range of datasets, but also captures the geometry of the Wasserstein space correctly. Furthermore, we show that our network can be used as a state-of-the-art initialization for the Sinkhorn algorithm with speedups of up to $7.4\times$, significantly outperforming existing approaches.
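For reference, the Sinkhorn iterations that a learned predictor would warm-start alternate two diagonal scalings of the Gibbs kernel $K = e^{-C/ε}$. The minimal sketch below accepts an optional initial column scaling `v0`, a hypothetical stand-in for an initialisation produced by a network such as UNOT (whose actual interface is not described here). With or without it, the iterations converge to the same entropic plan; a good `v0` only reduces the number of iterations needed.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, iters=200, v0=None):
    """Entropic OT plan between discrete measures a (length n) and b (length m).

    Alternates u = a / (K v) and v = b / (K^T u) with K = exp(-C / eps);
    returns P = diag(u) K diag(v). v0 is an optional positive initial scaling.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    v = list(v0) if v0 is not None else [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


points = [0.0, 0.5, 1.0]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
C = [[(x - y) ** 2 for y in points] for x in points]
P = sinkhorn(a, b, C)
print([[round(p, 4) for p in row] for row in P])
print("transport cost:", sum(P[i][j] * C[i][j] for i in range(3) for j in range(3)))
```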

2101.00245 2026-02-11 stat.ML cs.CV cs.LG cs.NE

The Bayesian Method of Tensor Networks

Erdong Guo, David Draper

Comments 13 pages, 4 figures

Journal ref Neurocomputing 675 (2026) 132961

Abstract

Bayesian learning is a powerful learning framework which combines the external information of the data (background information) with the internal information (training data) in a logically consistent way in inference and prediction. By Bayes rule, the external information (prior distribution) and the internal information (training data likelihood) are combined coherently, and the posterior distribution and the posterior predictive (marginal) distribution obtained by Bayes rule summarize the total information needed in inference and prediction, respectively. In this paper, we study the Bayesian framework of the Tensor Network from two perspectives. First, we introduce a prior distribution on the weights of the Tensor Network and predict the labels of new observations by the posterior predictive (marginal) distribution. Because the parameter integral in the normalization constant is intractable, we approximate the posterior predictive distribution by the Laplace approximation and obtain an outer-product approximation of the Hessian matrix of the posterior distribution of the Tensor Network model. Second, to estimate the parameters of the stationary mode, we propose a stable initialization trick that accelerates the inference process, allowing the Tensor Network to converge to the stationary path more efficiently and stably with the gradient descent method. We verify our work on the MNIST, Phishing Website and Breast Cancer data sets. We study the Bayesian properties of the Bayesian Tensor Network by visualizing the parameters of the model and the decision boundaries on a two-dimensional synthetic data set. In applications, our work can reduce overfitting and improve the performance of the standard Tensor Network model.