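arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.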

2603.23449 2026-03-25 math.ST stat.TH

Asymptotics of Nonparametric Estimation under general non-monotone MAR missingness: A Bayesian Approach

Badr-Eddine Chérief-Abdellatif, Jeffrey Näf

Abstract

Missing values are ubiquitous in (data) science, with potential detrimental consequences for any statistical analysis. As a consequence, a wealth of methods and theoretical results have been developed in recent years. Still, many questions remain open, in particular in the case of general non-monotone missing at random (MAR). In this work, we extend nonparametric Bayesian theory to this MAR setting. We introduce a general theorem of posterior contraction under MAR and an additional mild positivity condition. Using this result, we are able to show that, despite the missing values, the density of the uncontaminated data can be estimated with the minimax posterior contraction rate up to log factors. To the best of our knowledge, this is the first nonparametric result showing that the uncontaminated distribution can be consistently estimated under Rubin's MAR definition. As a consequence, we obtain an algorithm that takes data contaminated with missing values and returns a sample from a provably consistent estimate of the uncontaminated distribution.

2603.23322 2026-03-25 stat.AP cs.AI cs.CY physics.geo-ph

Leveraging LLMs and Social Media to Understand User Perception of Smartphone-Based Earthquake Early Warnings

Hanjing Wang, S. Mostafa Mousavi, Patrick Robertson, Richard M. Allen, Alexie Barski, Robert Bosch, Nivetha Thiruverahan, Youngmin Cho, Tajinder Gadh, Steve Malkos, Boone Spooner, Greg Wimpey, Marc Stogaitis

2603.23318 2026-03-25 cs.LG stat.ML

Robustness Quantification for Discriminative Models: a New Robustness Metric and its Application to Dynamic Classifier Selection

Rodrigo F. L. Lassance, Jasper De Bock

2603.23309 2026-03-25 stat.ME

Tail-Calibrated Estimation of Extreme Quantile Treatment Effects

Mengran Li, Daniela Castro-Camilo

2603.23305 2026-03-25 stat.ML cs.LG

Contextual Graph Matching with Correlated Gaussian Features

Mohammad Hassan Ahmad Yarandi, Luca Ganassali

2603.23302 2026-03-25 math.ST stat.ML stat.TH

A Theory of Nonparametric Covariance Function Estimation for Discretely Observed Data

Yoshikazu Terada, Atsutomo Yara

Comments 32 pages

2603.16146 2026-03-25 stat.ML cs.LG cs.SY eess.SY stat.ME

Deep Adaptive Model-Based Design of Experiments

Arno Strouwen, Sebastian Micluţa-Câmpeanu

2602.17503 2026-03-25 stat.ME

An extension to reversible jump Markov chain Monte Carlo for change point problems with heterogeneous temporal dynamics

Emily Gribbin, Benjamin Davis, Daniel Rolfe, Hannah Mitchell

Abstract

Detecting brief changes in time-series data remains a major challenge in fields where short-lived states carry meaning. In single-molecule localisation microscopy, this problem is particularly acute as fluorescent molecules used to tag protein oligomers display heterogeneous photophysical behaviour that can complicate photobleach step analysis, a key step in resolving nanoscale protein organisation. Existing methods often require extensive filtering or prior calibration, and can fail to accurately account for blinking or reversible dark states that may contaminate downstream analysis. In this paper, an extension to reversible jump Markov chain Monte Carlo (RJMCMC) is proposed for change point detection with heterogeneous temporal dynamics. This approach is applied to the problem of estimating per-frame active fluorophore counts from one-dimensional integrated intensity traces derived from Fluorescence Localisation Imaging with Photobleaching (FLImP), where compound change point pair moves are introduced to better account for short-lived events known as blinking and dark states. The approach is validated using simulated and experimental data, demonstrating improved accuracy and robustness when compared with current photobleach step analysis methods and with the existing analysis approach for FLImP data. This Compound RJMCMC (CRJMCMC) algorithm performs reliably across a wide range of fluorophore counts and signal-to-noise conditions, with signal-to-noise ratio (SNR) down to 0.001 and counts as high as nineteen fluorophores, while also effectively estimating low counts observed when studying EGFR oligomerisation. Beyond single molecule imaging, this work has applications for a variety of time series change point detection problems with heterogeneous state persistence. Examples include electrocorticography brain-state segmentation, fault detection in industrial process monitoring, and realised volatility in financial time series.

2601.09220 2026-03-25 cs.LG math.ST stat.AP stat.TH

From Hawkes Processes to Attention: Time-Modulated Mechanisms for Event Sequences

Xinzi Tan, Kejian Zhang, Junhan Yu, Doudou Zhou

2511.09431 2026-03-25 math.ST stat.TH

A Novel Testing Approach for Differences Among Brain Connectomes

Nicolas Escobar-Velasquez, Jaroslaw Harezlak

2510.16673 2026-03-25 stat.ME

Identification and estimation of causal mechanisms in cluster-randomized trials with post-treatment confounding using Bayesian nonparametrics

Yuki Ohnishi, Michael J. Daniels, Lei Yang, Fan Li

Comments 78 pages

2509.01540 2026-03-25 stat.ME

Discrete Chi-Square Method can model and forecast complex time series, like El Nino data between 1870 and 2026

Lauri Jetsu

Comments Submitted to Computational Statistics (Springer Verlag)

2412.15713 2026-03-25 stat.AP

Data Set of Load Tests and Structural Health Monitoring of a concrete boxgirder bridge

Martin Koehncke, Yogi Jaelani, Alexander Mendler, Lizzie Neumann, Philipp Wittenberg, Alina Rode-Klemm, Sylvia Kessler

Comments 16 pages, 7 figures, 7 tables

2412.07586 2026-03-25 cs.LG stat.ML

Paired Wasserstein Autoencoders for Conditional Sampling

Moritz Piening, Matthias Chung

2312.10618 2026-03-25 stat.ME cs.LG stat.ML

Sparse Learning and Class Probability Estimation with Weighted Support Vector Machines

Liyun Zeng, Hao Helen Zhang

Abstract

Classification and probability estimation are fundamental tasks with broad applications across modern machine learning and data science, spanning fields such as biology, medicine, engineering, and computer science. Recent development of weighted Support Vector Machines (wSVMs) has demonstrated considerable promise in robustly and accurately predicting class probabilities and performing classification across a variety of problems (Wang et al., 2008). However, the existing framework relies on an $\ell^2$-norm regularized binary wSVM optimization formulation, which is designed for dense features and exhibits limited performance in the presence of sparse features with redundant noise. Effective sparse learning thus requires prescreening of important variables for each binary wSVM to ensure accurate estimation of pairwise conditional probabilities. In this paper, we propose a novel class of wSVM frameworks that incorporate automatic variable selection with accurate probability estimation for sparse learning problems. We develop efficient algorithms for variable selection by solving either the $\ell^1$-norm or elastic net regularized wSVM optimization problems. Class probability is then estimated either via the $\ell^2$-norm regularized wSVM framework applied to the selected variables, or directly through elastic net regularized wSVMs. The two-step approach offers a strong advantage in simultaneous automatic variable selection and reliable probability estimators with competitive computational efficiency. The elastic net regularized wSVMs achieve superior performance in both variable selection and probability estimation, with the added benefit of variable grouping, at the cost of increased computation time in high-dimensional settings. The proposed wSVM-based sparse learning methods are broadly applicable and can be naturally extended to $K$-class problems through ensemble learning.

2303.13865 2026-03-25 math.CT stat.ME

Compositionality in algorithms for smoothing

Moritz Schauer, Frank van der Meulen, Andi Q. Wang

2302.02200 2026-03-25 math.CO math.ST stat.TH

Rank-based linkage I: triplet comparisons and oriented simplicial complexes

R. W. R. Darling, Will Grilliette, Adam Logan

Comments 39 pages, 13 figures

DOI: 10.46298/compositionality-8-2
Journal ref: Compositionality, Volume 8 (2026) (March 20, 2026) compositionality:14123

Abstract

Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and "similarity" between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$ where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the links whose in-sway is at least $t$, and partition $S$ into components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of "out-ordered" digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not "rip apart" the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Orientation sheaves play a fundamental role and ensure that partially overlapping data sets can be "glued" together. Open combinatorial problems are presented in the last section.
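
The final partitioning step described in this abstract (keep the links whose in-sway is at least $t$, then take connected components) can be sketched in a few lines. This is only an illustration under stated assumptions: the in-sway values below are made up, the function name `components_at_threshold` is hypothetical, and the computation of the in-sway itself from the simplicial complex is not reproduced.

```python
from collections import defaultdict


def components_at_threshold(objects, in_sway, t):
    """Partition `objects` into connected components of the graph whose
    edges are the links with in-sway >= t, i.e. the graph (S, L_t)."""
    parent = {x: x for x in objects}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (x, y), weight in in_sway.items():
        if weight >= t:
            parent[find(x)] = find(y)  # merge the two components

    groups = defaultdict(set)
    for x in objects:
        groups[find(x)].add(x)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical in-sway values on five objects (not taken from the paper).
S = ["a", "b", "c", "d", "e"]
sigma = {("a", "b"): 5, ("b", "c"): 2, ("d", "e"): 4}

for t in (1, 3, 6):
    print(t, components_at_threshold(S, sigma, t))
```

Raising $t$ can only remove links, so the partitions are nested: each component at a higher threshold sits inside a component at a lower one.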

2603.23294 2026-03-25 econ.EM stat.ME

Granger Causality in Expectiles: an M-vine copula test

Roberto Fuentes-Martínez, Irene Crimaldi

2603.23277 2026-03-25 stat.ME

A reduced rank model for spatial categorical data with many classes

Paul B May, Andrew Simpson, Semhar Michael

2603.23220 2026-03-25 cs.LG cs.AI stat.ML

General Machine Learning: Theory for Learning Under Variable Regimes

Aomar Osmani

Comments 56 pages

2603.23205 2026-03-25 stat.ML cs.LG stat.ME

Between Resolution Collapse and Variance Inflation: Weighted Conformal Anomaly Detection in Low-Data Regimes

Oliver Hennhöfer, Christine Preisach

Comments 18 pages, 2 figures, 7 tables

2603.23196 2026-03-25 math.ST cond-mat.dis-nn cond-mat.stat-mech math.PR stat.ML stat.TH

Gaussian mixtures and non-parametric likelihoods through the lens of statistical mechanics

Subhroshekhar Ghosh, Adityanand Guntuboyina, Satyaki Mukherjee, Hoang-Son Tran

Comments Authors listed in alphabetical order of surnames; 73 pages

2603.23184 2026-03-25 cs.CL cs.AI stat.AP

ImplicitRM: Unbiased Reward Modeling from Implicit Preference Data for LLM alignment

Hao Wang, Haocheng Yang, Licheng Pan, Lei Shen, Xiaoxi Li, Yinuo Wang, Zhichao Chen, Yuan Lu, Haoxuan Li, Zhouchen Lin

2603.23134 2026-03-25 cs.LG stat.AP

A Bayesian Learning Approach for Drone Coverage Network: A Case Study on Cardiac Arrest in Scotland

Tathagata Basu, Edoardo Patelli, Gianluca Filippi, Ben Parsonage, Christy Maddock, Massimiliano Vasile, Marco Fossati, Adam Loyd, Shaun Marshall, Paul Gowens

2603.23106 2026-03-25 stat.ML cs.LG cs.NA math.NA quant-ph

High-Resolution Tensor-Network Fourier Methods for Exponentially Compressed Non-Gaussian Aggregate Distributions

Juan José Rodríguez-Aldavero, Juan José García-Ripoll

Comments 22 pages, 13 figures

2603.22990 2026-03-25 stat.ME

A Top-Down Scale Approach for Multiscale Geographically and Temporally Weighted Regression

Ghislain Geniaux, César Martinez, Samuel Soubeyrand

Comments Preprint -- Submitted to Spatial Statistics

Abstract

This paper proposes tds mgtwr, a multiscale geographically and temporally weighted regression (MGTWR) model with covariate-specific spatial and temporal scales. The approach combines a separable spatio-temporal kernel with a Top-Down Scale (TDS) calibration scheme, where spatial and temporal bandwidths are selected for each covariate through a coordinate-wise search over ordered grids guided by the corrected Akaike Information Criterion (AICc). By avoiding unconstrained multidimensional optimization, this strategy extends to the spatio-temporal setting the stabilizing properties of the TDS calibration scheme (Geniaux, 2026). The multiscale backfitting procedure combines the Top-Down Scale calibration scheme with an adaptive, importance-driven update schedule that prioritizes covariates according to their current scale-normalized contribution to the fitted signal, thereby limiting the number of local recalibrations required and accelerating convergence while maintaining estimator fidelity. We also introduce a generic prediction method for MGWR and MGTWR based on kernel sharpening. Monte Carlo experiments show that modeling both space and time improves coefficient recovery and predictive accuracy relative to purely spatial multiscale models when temporal variation is present and sufficiently supported by the data. Gains increase with sample size and signal-to-noise ratio. Two empirical applications illustrate the method under contrasting regimes. For Beet Yellows severity, a plant epidemiology and pest management problem, multiscale spatial modeling is essential, while spatio-temporal extensions yield additional gains when temporal information is rich. In modeling house prices, MGTWR consistently outperforms spatial local and STVC models. In both cases, predictive performance rivals flexible machine-learning benchmarks while preserving interpretable spatio-temporal scales.
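
To make "separable spatio-temporal kernel" concrete, here is a minimal sketch of how such a weight is commonly formed: a spatial kernel times a temporal kernel, each with its own bandwidth. The Gaussian kernel shape, the symmetric treatment of time, and the names `gaussian_kernel` and `separable_weight` are assumptions for illustration, not the paper's specification.

```python
import math


def gaussian_kernel(distance, bandwidth):
    """Gaussian kernel weight for a distance at a given bandwidth (assumed shape)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def separable_weight(d_space, d_time, h_space, h_time):
    """Separable spatio-temporal weight: spatial kernel times temporal kernel."""
    return gaussian_kernel(d_space, h_space) * gaussian_kernel(d_time, h_time)


# Hypothetical observation: 3 distance units and 2 time units from the focal
# point, weighted under a large and a small spatial bandwidth.
print(separable_weight(3.0, 2.0, h_space=10.0, h_time=4.0))
print(separable_weight(3.0, 2.0, h_space=1.0, h_time=4.0))
```

Because the weight factorizes, the spatial and temporal bandwidths can be tuned one coordinate at a time, which is what makes a coordinate-wise search over ordered bandwidth grids natural.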

2603.22964 2026-03-25 quant-ph cond-mat.quant-gas cs.LG stat.ML

A PAC-Bayesian approach to generalization for quantum models

Pablo Rodriguez-Grasa, Matthias C. Caro, Jens Eisert, Elies Gil-Fuster, Franz J. Schreiber, Carlos Bravo-Prieto

Comments 15+29 pages, 4 figures

2603.22959 2026-03-25 stat.ML cs.LG

Stepwise Variational Inference with Vine Copulas

Elisabeth Griesbauer, Leiv Rønneberg, Arnoldo Frigessi, Claudia Czado, Ingrid Hobæk Haff

2603.22914 2026-03-25 stat.ME econ.EM

Nonparametric regression with dependent censoring or competing risks

Jia-Han Shih, Simon M. S. Lo, Ralf A. Wilke

Comments 39 pages, 2 figures, for associated sample code, see https://github.com/ralfawilke/nonparreg

2603.22900 2026-03-25 stat.ME cs.AI cs.LG stat.ML

Off-Policy Evaluation and Learning for Survival Outcomes under Censoring

Kohsuke Kubota, Mitsuhiro Takahashi, Yuta Saito

Comments Preprint

2603.22845 2026-03-25 stat.AP stat.ME

DROP: Distributionally Robust Optimization for Multi-task Learning in Graphical Models

Canruo Shen, Xintong Ji, Qiong Li, Wenzhi Yang, Xiaoping Shi

2603.22838 2026-03-25 stat.ME

Community Detection on Inhomogeneous Multilayer Networks with Extreme Sparsity

Tao Shen, Wanjie Wang

Comments 35 pages, 2 figures

2603.22824 2026-03-25 cs.LG math.OC stat.ML

Towards The Implicit Bias on Multiclass Separable Data Under Norm Constraints

Shengping Xie, Zekun Wu, Quan Chen, Kaixu Tang

2603.22750 2026-03-25 stat.ML cs.LG

REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees

Simon D. Nguyen, Hayden McTavish, Kentaro Hoffman, Cynthia Rudin, Tyler H. McCormick

2603.22741 2026-03-25 cs.DS cs.LG cs.NA math.NA math.ST stat.ML stat.TH

Algorithmic warm starts for Hamiltonian Monte Carlo

Matthew S. Zhang, Jason M. Altschuler, Sinho Chewi

2603.22729 2026-03-25 cs.LG cs.MA stat.ME

Behavioral Heterogeneity as Quantum-Inspired Representation

Mohammad Elayan, Wissam Kontar

2603.22719 2026-03-25 stat.ME

A Frequency-Domain Approach for Integrating Multiple Functional Time Series

Zerui Guo, Jianbin Tan, Hui Huang

2603.22712 2026-03-25 math.ST stat.TH

Efficient partially replicated block designs with each replication number one or two

R. A. Bailey, Rahul Mukerjee

Comments 25 pages

2603.22644 2026-03-25 stat.ML cs.LG

Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

Xiaohan Zhu, Mesrob I. Ohannessian, Nathan Srebro

2603.22636 2026-03-25 stat.ME

When lookout sees crackle: Anomaly detection via kernel density estimation

Rob J Hyndman, Sevvandi Kandanaarachchi, Katharine Turner

Comments 30 pages

2603.22611 2026-03-25 math.ST math.PR stat.TH

A Martingale Approach To Fluctuations of Rank Estimators in Sensitivity Analysis

Reda Chhaibi, Fabrice Gamboa, Clément Pellegrini

Comments 48 pages, no figures. v1: Preliminary version. All comments are welcome

2603.22569 2026-03-25 q-fin.RM stat.ME

Proxy-Reliance Control in Conformal Recalibration of One-Sided Value-at-Risk

Tenghan Zhong

Comments 44 pages, 4 figures, 9 tables, appendix included

2603.22563 2026-03-25 stat.ML cs.LG

Privacy-Preserving Reinforcement Learning from Human Feedback via Decoupled Reward Modeling

Young Hyun Cho, Will Wei Sun

2603.22540 2026-03-25 stat.ME

Variable Selection in Functional Linear Quantile Regression for Identifying Associations between Daily Patterns of Physical Activity and Cognitive Function

Yuanzhen Yue, Stella Self, Yichao Wu, Jiajia Zhang, Rahul Ghosal

2603.22468 2026-03-25 stat.ML cs.LG math.ST stat.TH

SPDE Methods for Nonparametric Bayesian Posterior Contraction and Laplace Approximation

Enric Alberola-Boloix, Ioar Casado-Telletxea

Comments 32 pages, under review

2603.22465 2026-03-25 cs.LG cs.DC cs.IT cs.NI math.IT stat.ML

A Theoretical Framework for Energy-Aware Gradient Pruning in Federated Learning

Emmanouil M. Athanasakos

Comments 8 pages, 2 figures. This work has been submitted to the IEEE for possible publication

2603.22408 2026-03-25 stat.ME

Spline Quantile Regression with Cubic and Linear Smoothing Splines

Ta-Hsin Li

2603.20938 2026-03-25 stat.ME stat.AP

Refactor Analysis: Predictive Evaluations of Factor Models and Dimensionality

Michael Hardy

Abstract

Unidimensional factor models justify some of the most consequential summaries in science -- single scores, single ranks, and single leaderboards -- yet unidimensionality is usually assessed indirectly by fitting and evaluating models on images of the data (e.g., correlation matrices) rather than on the response matrix itself. We introduce Refactor analysis, a data-first evaluation paradigm that converts a one-factor solution into a rank-1 prediction of the original matrix by estimating both respondent- and item-side structure from dual association images. We further introduce Verifactor analysis, which evaluates the same construction under bi-cross-validated (BCV) row-column partitions for improved generalization. In simulations where the data-generating mechanism is truly rank-1 and correlational, Refactor metrics align with classical unidimensionality indices, validating the approach. However, across 200 public dichotomous datasets, traditional fit and unidimensionality measures, though highly intercorrelated, are weakly related to data recoverability, especially out of sample. This gap exposes a methodological vulnerability: excellent image-based fit can coexist with poor data-level explanatory power. Finally, treating the association measure itself as a testable hypothesis, we compare $\phi$, tetrachoric, and quadrant correlation, $q^\prime$, an important reintroduction. Quadrant correlation emerges as a simple, interpretable, and remarkably robust alternative, yielding consistently stronger reconstruction and more stable behavior under sample-size variation than commonly used correlations. Together, Refactor and Verifactor shift unidimensionality assessment from "does a one-factor model fit the correlation matrix?" to the question that matters for measurement and benchmarking: does a one-factor dependence structure recover and generalize the observed responses?
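
Of the association measures compared in this abstract, the $\phi$ coefficient is the simplest to state: it is the Pearson correlation of two dichotomous variables, computed from their 2x2 table. The sketch below shows only that textbook formula; the function name is hypothetical, and the paper's quadrant correlation $q^\prime$ and the Refactor reconstruction itself are not reproduced here.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length 0/1 response vectors."""
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = sum(1 for a, b in pairs if a == 0 and b == 0)
    # Product of the four marginal totals of the 2x2 table.
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when an item shows no variation")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical responses of six respondents to two dichotomous items.
item_a = [1, 1, 1, 0, 0, 0]
item_b = [1, 1, 0, 1, 0, 0]
print(phi_coefficient(item_a, item_b))
```

Computing this for every pair of items gives the item-side association image of a response matrix; doing the same across rows gives a respondent-side image.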

2603.20655 2026-03-25 cs.LG stat.ML

Exponential Family Discriminant Analysis: Generalizing LDA-Style Generative Classification to Non-Gaussian Models

Anish Lakkapragada

Comments Preprint, 15 pages, 5 figures

2603.20328 2026-03-25 stat.ML cs.LG

Decorrelation, Diversity, and Emergent Intelligence: The Isomorphism Between Social Insect Colonies and Ensemble Machine Learning

Ernest Fokoué, Gregory Babbitt, Yuval Levental

Comments 47 pages, 13 figures, 4 tables

2603.15426 2026-03-25 cond-mat.stat-mech stat.OT

Exact and limit results for the CTRW in presence of drift and position dependent noise intensity

Marco Bianucci, Mauro Bologna, Riccardo Mannella

Comments 76 pages, 12 Figures

Abstract

Continuous-time random walks (CTRWs) with drift and position-dependent jumps provide a general framework for describing a wide range of natural and engineered systems. We analyze the stochastic differential equation associated with this class of models, in which the driving noise consists of spike (shot) events, and we derive two exact analytical results. First, we obtain a closed-form expression for the $n$-time correlation functions of the noise, expressed as a sum over all $2^{n-1}$ ordered partitions of the observation times (Proposition 2). Second, using the $G$-cumulant formalism, we derive an exact non-local master equation (ME) for the probability density function (PDF) of the CTRW variable, valid without invoking diffusive limits, fractional scaling assumptions, or closure hypotheses (Proposition 3). In interaction representation, this ME retains the same structural form as that of the standard CTRW without drift or position-dependent jumps. Our main result is the emergence of a universal local master equation: at long times, the exact non-local ME is universally and accurately approximated by a time-local ME whose only coefficient is the instantaneous renewal rate $R(t)$. From this equation, exact in the well-known Poissonian case, both local and global properties of the PDF can be readily inferred. For example, the temporal behavior of the PDF is directly controlled by that of the rate function $R(t)$: if the waiting-time distribution decays as a power law with exponent $\mu>2$, then $R(t)\to \mathrm{const}$ and the system converges to the Poissonian equilibrium. By contrast, for $\mu<2$, the rate decays in time and the effective diffusion induced by the noise slowly weakens, without leading to a stationary state. Numerical experiments confirm the remarkable accuracy of the local ME even far beyond regimes where a naive time-scale separation would justify it.
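
The dichotomy at $\mu = 2$ is what standard renewal theory would lead one to expect. As a hedged reading, assume (the abstract does not say so explicitly) that $\mu$ is the tail exponent of the waiting-time density $\psi(\tau)$ and that $1 < \mu$ so that $\psi$ is normalizable:

```latex
\begin{align*}
\psi(\tau) \sim \tau^{-\mu}\ (\tau\to\infty)
  &\;\Longrightarrow\;
  \langle\tau\rangle = \int_0^\infty \tau\,\psi(\tau)\,\mathrm{d}\tau < \infty
  \iff \mu > 2, \\
\mu > 2 &:\quad R(t) \to \frac{1}{\langle\tau\rangle}
  \quad\text{(renewal theorem, finite mean waiting time)}, \\
1 < \mu < 2 &:\quad R(t) \propto t^{\mu-2} \to 0
  \quad\text{(diverging mean waiting time)}.
\end{align*}
```

Under this reading, a finite mean waiting time is exactly what lets the rate settle to a constant, matching the Poissonian limit named in the abstract.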

2603.12058 2026-03-25 math.PR math.ST stat.ME stat.TH

Low-Rank and Sparse Drift Estimation for High-Dimensional Lévy-Driven Ornstein--Uhlenbeck Processes

Marina Palaisti

2601.17145 2026-03-25 stat.ME math.ST stat.TH

Optimal Design under Interference, Homophily, and Robustness Trade-offs

Vydhourie Thiyageswaran, Alex Kokot, Jennifer Brennan, Marina Meila, Christina Lee Yu, Maryam Fazel

2601.13698 2026-03-25 cs.LG cs.AI cs.IT math.IT stat.ML

Does Privacy Always Harm Fairness? Data-Dependent Trade-offs via Chernoff Information Neural Estimation

Arjun Nichani, Hsiang Hsu, Chun-Fu Chen, Haewon Jeong

2601.13419 2026-03-25 stat.ME stat.AP

Pathway-based Bayesian factor models for 'omics data

Lorenzo Mauri, Federica Stolf, Amy H. Herring, Cameron Miller, David B. Dunson

2601.06807 2026-03-25 stat.ME

Adversarially Perturbed Precision Matrix Estimation

Yiling Xie

2512.18884 2026-03-25 stat.CO

Fast simulation of Gaussian random fields with flexible correlation models in Euclidean spaces

Moreno Bevilacqua, Xavier Emery, Francisco Cuevas-Pacheco

2512.09275 2026-03-25 stat.ML cs.LG

Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression

Weiyi He, Yue Xing

Comments 25 pages, 3 figures

2512.06428 2026-03-25 stat.ME

Community detection in heterogeneous signed networks

Yuwen Wang, Shiwen Ye, Jingnan Zhang, Junhui Wang

2512.04165 2026-03-25 cs.LG stat.ML

Mitigating the Curse of Detail: Scaling Arguments for Feature Learning and Sample Complexity

Noa Rubin, Orit Davidovich, Zohar Ringel

2512.01074 2026-03-25 stat.AP q-bio.QM

COVID-19 Forecasting from U.S. Wastewater Surveillance Data: A Retrospective Multi-Model Study (2022-2024)

Faharudeen Alhassan, Hamed Karami, Amanda Bleichrodt, James M. Hyman, Isaac C. H. Fung, Ruiyan Luo, Gerardo Chowell

Comments 39 pages, 20 figures

2511.04568 2026-03-25 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH

Riesz Regression As Direct Density Ratio Estimation

Masahiro Kato

2510.12416 2026-03-25 stat.ML cs.LG

Geopolitics, Geoeconomics, and Sovereign Risk: Different Shocks, Different Channels

Alvaro Ortiz, Tomasa Rodrigo, Pablo Saborido

2510.08294 2026-03-25 cs.LG cs.AI stat.ML

Counterfactual Identifiability via Dynamic Optimal Transport

Fabio De Sousa Ribeiro, Ainkaran Santhirasekaram, Ben Glocker

Comments Accepted at NeurIPS 2025

2510.03131 2026-03-25 stat.ME stat.ML

Total robustness in Bayesian Nonlinear Regression

Mengqi Chen, Charita Dellaporta, Thomas B. Berrett, Theodoros Damoulas

Comments 76 pages, 13 figures

2509.25802 2026-03-25 stat.ML eess.SP

Graph Distribution-valued Signals: A Wasserstein Space Perspective

Yanan Zhao, Feng Ji, Xingchao Jian, Wee Peng Tay

Comments Accepted by IEEE ICASSP 2026

2509.15197 2026-03-25 math.ST stat.ME stat.ML stat.TH

Consistent Bayesian causal discovery for structural equation models with equal error variances

Anamitra Chaudhuri, Yang Ni, Anirban Bhattacharya

2509.12066 2026-03-25 math.ST math.PR stat.AP stat.ME stat.TH

On the universal calibration of heavy-tailed combination tests

Parijat Chakraborty, F. Richard Guo, Kerby Shedden, Stilian Stoev

Comments 5 figures, 44 pages

Abstract

It is often of interest to test a global null hypothesis using multiple, possibly dependent $p$-values by combining their strengths while controlling the type-I error. Recently, several heavy-tailed combination tests, such as the harmonic mean test and the Cauchy combination test, have been proposed: they transform $p$-values into heavy-tailed random variables before combining them into a single test statistic. The resulting tests, which are calibrated under some form of independence assumption among the $p$-values, have been shown to be rather robust to dependence asymptotically as the level $\alpha$ gets small. Yet, it has remained an open problem to understand this general phenomenon and characterize how such tests behave under dependence. Using the framework of multivariate regular variation from extreme value theory, we show that for a class of combination tests that are homogeneous, the asymptotic level of the test can be expressed using the angular measure under multivariate regular variation. This measure characterizes the dependence of the transformed heavy-tailed variables in their upper tails, or equivalently, the dependence of the $p$-values near zero. We use this result to study several tests. The harmonic mean test, which coincides with the Pareto linear combination test, is shown to be universally calibrated regardless of the tail dependence; further, this test is shown to be the only one that achieves universal calibration among all homogeneous heavy-tailed combination tests. In contrast, the Cauchy combination test is shown to be universally honest but often conservative; the Dunn-Šidák correction, also known as Tippett's method, while being honest, is calibrated if and only if the underlying $p$-values are independent near zero. These theoretical findings are corroborated with simulations and an application to independence testing with survey data.
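
For orientation, the three combination rules named in this abstract have short textbook forms. The sketch below uses those standard formulas with equal weights by default; the function names are hypothetical, the $p$-values are made up, and the raw harmonic mean shown here is the test statistic only, without the calibration analysed in the paper.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Cauchy combination test: combined p-value from a weighted sum of
    Cauchy-transformed p-values (weights are assumed to sum to 1)."""
    n = len(pvalues)
    weights = weights or [1.0 / n] * n
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(stat) / math.pi


def harmonic_mean(pvalues):
    """Raw harmonic mean of the p-values (a statistic, not a calibrated p-value)."""
    return len(pvalues) / sum(1.0 / p for p in pvalues)


def sidak_min(pvalues):
    """Tippett / Dunn-Sidak combined p-value based on the smallest p-value."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)


# Hypothetical p-values, each strictly between 0 and 1.
pvals = [0.01, 0.20, 0.35, 0.60]
print(cauchy_combination(pvals), harmonic_mean(pvals), sidak_min(pvals))
```

The first two rules are driven by the heavy-tailed transforms $\tan((0.5 - p)\pi)$ and $1/p$, so a single very small $p$-value dominates the statistic; that upper-tail behaviour is what the abstract's angular-measure analysis is about.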

2508.10149 2026-03-25 stat.ML cs.LG

Prediction-Powered Inference with Inverse Probability Weighting

Jyotishka Datta, Nicholas G. Polson

Comments 10 pages, 3 figures

2506.20768 2026-03-25 math.ST stat.TH

Proof of The TAP Free Energy for High-Dimensional Linear Regression with Spherical Priors at All Temperatures

Zhiyuan Yu, Jingbo Liu

2504.14094 2026-03-25 cs.LG cs.AI stat.ML

Leakage and Interpretability in Concept-Based Models

Enrico Parisini, Tapabrata Chakraborti, Chris Harbron, Ben D. MacArthur, Christopher R. S. Banerji

Comments 39 pages, 25 figures

2409.04090 2026-03-25 math.ST stat.TH

Estimation of service value parameters for a queue with unobserved balking

Daniel Podorojnyi, Liron Ravner

2407.00644 2026-03-25 stat.ML cs.LG

Clusterpath Gaussian Graphical Modeling

D. J. W. Touw, A. Alfons, P. J. F. Groenen, I. Wilms

2404.15654 2026-03-25 math.ST stat.ME stat.TH

Autoregressive networks with dependent edges

Jinyuan Chang, Qin Fang, Eric D. Kolaczyk, Peter W. MacDonald, Qiwei Yao

Comments 33 pages, 2 tables, 3 figures

2312.15222 2026-03-25 stat.ME

Is control of type I error rate needed in Bayesian clinical trial designs?

Elja Arjas, Dario Gasbarra

Comments 31 pages, 2 figures

2312.03538 2026-03-25 stat.CO stat.ME

Bayesian variable selection in sample selection models using spike-and-slab priors

Adam J. Iqbal, Emmanuel O. Ogundimu, F. Javier Rubio

Comments An implementation and code used to reproduce simulation studies and the real data applications can be found at https://github.com/adam-iqbal/selection-spike-slab

2309.07250 2026-03-25 quant-ph cond-mat.stat-mech cs.LG stat.ML

All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks

Richard D. P. East, Guillermo Alonso-Linaje, Chae-Yeun Park

Comments 19 + 7 pages, close to a version accepted to Quantum Science and Technology

2302.10426 2026-03-25 cs.AI cs.LG eess.SP stat.AP

An Accurate and Interpretable Framework for Trustworthy Process Monitoring

Hao Wang, Zhiyu Wang, Yunlong Niu, Zhaoran Liu, Haozhe Li, Yilin Liao, Yuxin Huang, Xinggao Liu

2202.05775 2026-03-25 stat.ML cs.LG

Inference of Multiscale Gaussian Graphical Model

Do Edmond Sanou, Christophe Ambroise, Geneviève Robin

Comments 31 pages

2003.08745 2026-03-25 cs.CV cs.LG stat.ML

On the Road with 16 Neurons: Mental Imagery with Bio-inspired Deep Neural Networks

Alice Plebe, Mauro Da Lio

Comments 18 pages, 10 figures

DOI: 10.1109/ACCESS.2020.3028185

Abstract

This paper proposes a strategy for visual prediction in the context of autonomous driving. Humans, when not distracted or drunk, are still the best drivers you can currently find. For this reason we take inspiration from two theoretical ideas about the human mind and its neural organization. The first idea concerns how the brain uses a hierarchical structure of neuron ensembles to extract abstract concepts from visual experience and code them into compact representations. The second idea suggests that these neural perceptual representations are not neutral but functional to the prediction of the future state of affairs in the environment. Similarly, the prediction mechanism is not neutral but oriented to the current planning of a future action. We identify within the deep learning framework two artificial counterparts of the aforementioned neurocognitive theories. We find a correspondence between the first theoretical idea and the architecture of convolutional autoencoders, while we translate the second theory into a training procedure that learns compact representations which are not neutral but oriented to driving tasks, from two distinct perspectives. From a static perspective, we force groups of neural units in the compact representations to distinctly represent specific concepts crucial to the driving task. From a dynamic perspective, we encourage the compact representations to be predictive of how the current road scenario will change in the future. We successfully learn compact representations that use as few as 16 neural units for each of the two basic driving concepts we consider: car and lane. We prove the efficiency of our proposed perceptual representations on the SYNTHIA dataset. Our source code is available at https://github.com/3lis/rnn_vae

1805.09108 2026-03-25 stat.ML cs.LG physics.med-ph stat.CO

Deep Learning Estimation of Absorbed Dose for Nuclear Medicine Diagnostics

Luciano Melodia

Comments Code available at https://codeberg.org/Jiren/MADVK

2603.22374 2026-03-25 stat.ME

On inference in parametric survival data models

Nils Lid Hjort

Comments 34 pages, no figures. Statistical Research Report, Department of Mathematics, University of Oslo, October 1991. The paper is published in essentially this form in International Statistical Review, 1992, vol. 60, pages 355-387, at this url: https://www.jstor.org/stable/1403683?seq=1

2603.22373 2026-03-25 stat.ME

Normalised Local Hazard Plots

Nils Lid Hjort, Thomas Lumley

Comments 41 pages, 15 figures. Statistical Research Report, Department of Mathematics, University of Oslo, from May 1993, but now arXiv'd March 2026. A Splus package for generating such plots is available from either of the authors

2603.22355 2026-03-25 stat.ML cs.CL cs.LG

Demystifying Low-Rank Knowledge Distillation in Large Language Models: Convergence, Generalization, and Information-Theoretic Guarantees

Alberlucia Rafael Soarez, Daniel Kim, Mariana Costa, Alejandro Torre

2603.22344 2026-03-25 cs.IR cs.LG stat.AP stat.ME

Errors in AI-Assisted Retrieval of Medical Literature: A Comparative Study

Jenny Gao, Yongfeng Zhang, Mary L Disis, Lanjing Zhang

Details

Abstract

Large language model (LLM)-assisted literature retrieval may lead to erroneous references, but these errors have not been rigorously quantified. Therefore, we quantitatively assess errors in reference retrieval of widely used free-version LLM platforms and identify the factors associated with retrieval errors. We evaluated 2,000 references retrieved by 5 LLMs (Grok-2, ChatGPT GPT-4.1, Google Gemini Flash 2.5, Perplexity AI, and DeepSeek GPT-4) for 40 randomly-selected original articles (10 per journal) published Jan. 2024 to July 2025 from British Medical Journal (BMJ), Journal of the American Medical Association, and The New England Journal of Medicine (NEJM). Primary outcomes were a multimetric score ratio combining validity of digital object identifier, PubMed ID, Google-Scholar link, and relevance; and complete miss rate (proportion of references failing all applicable metrics). Multivariable regression was used to examine independent associations. LLM platforms completely failed to retrieve correct reference data 47.8% of the time. The average score ratio of the 5 LLM platforms was 0.29 (standard deviation, 0.35; range, 0-1.25), with a higher score ratio indicating a higher accuracy in retrieving relevant references and correct bibliographic data. The highest and lowest accuracies were achieved by Grok (0.57) and Gemini (0.11), respectively. Compared with BMJ, NEJM articles had lower score ratios and higher complete miss rates. Multivariable analysis shows LLM platforms and journals were independently associated with score ratios and complete miss rate, respectively. We show modest overall performance of LLMs and significant variability in retrieval accuracy across platforms and journals. LLM platforms and journals are associated with LLM's performance in retrieving medical literature. Bibliographic data should be carefully reviewed when using LLM-assisted literature retrieval.

2603.22328 2026-03-25 cs.LG cs.AI stat.ML

Beyond the Mean: Distribution-Aware Loss Functions for Bimodal Regression

Abolfazl Mohammadi-Seif, Carlos Soares, Rita P. Ribeiro, Ricardo Baeza-Yates

Comments 28 pages, 27 figures

2603.22320 2026-03-25 cs.LG stat.AP stat.ML

Bridging the Gap Between Climate Science and Machine Learning in Climate Model Emulation

Luca Schmidt, Nina Effenberger

2603.22302 2026-03-25 cs.LG cs.CY stat.AP

Research on Individual Trait Clustering and Development Pathway Adaptation Based on the K-means Algorithm

Qianru Wei, Jihaoyu Yang, Cheng Zhang, Jinming Yang

2603.22298 2026-03-25 stat.AP

Should the Olympic sprint skaters run the 500 meter twice?

Nils Lid Hjort

Comments 29 pages, 6 figures. This often cited report changed the Olympics. Statistical Research Report, Department of Mathematics, University of Oslo, November 1994, but arXiv'd March 2026