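arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.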

2603.16854 2026-03-18 stat.ME

Spatial Causal Tensor Completion for Multiple Exposures and Outcomes: An Application to the Health Effects of PFAS Pollution

Xiaodan Zhou, Brian J Reich, Shu Yang

Abstract

Per- and polyfluoroalkyl substances (PFAS) are typically encountered as mixtures of distinct chemicals with distinct effects on multiple health outcomes. Estimating joint causal effects using spatially-dependent observed data is challenging. We propose a spatial causal tensor completion framework that jointly models multiple exposures and outcomes within a low-rank tensor structure, while adjusting for observed confounders and latent spatial confounders. This method combines a low-rank tensor representation to pool information across exposures and outcomes with a spectral adjustment step that incorporates graph-Laplacian eigenvectors to approximate unmeasured spatial confounders, implemented via a projected-gradient descent algorithm. This framework enables causal inference in the presence of unmeasured spatial confounding and pervasive missingness of potential outcomes. We establish theoretical guarantees for the estimator and evaluate its finite-sample performance through extensive simulations. In an application to national PFAS monitoring data, our approach yields more conservative and credible causal relationships between PFOA and PFOS exposure and 13 chronic disease outcomes compared with existing alternatives.

2603.16829 2026-03-18 stat.ML cs.LG math.ST stat.ME stat.TH

Conditional Distributional Treatment Effects: Doubly Robust Estimation and Testing

Saksham Jain, Alex Luedtke

2603.16798 2026-03-18 cs.LG cs.DS math.ST stat.ML stat.TH

High-Dimensional Gaussian Mean Estimation under Realizable Contamination

Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas

2603.16785 2026-03-18 math.ST stat.TH

Local asymptotic normality for mixed fractional Ornstein-Uhlenbeck process under high-frequency observation

Chunhao Cai, Yiwu Shang, Cong Zhang

2603.12454 2026-03-18 stat.ME stat.AP

Rank-based methods for estimating landmark win probability in longitudinal randomized controlled trials with missing data

Guangyong Zou, Shi-Fang Qui, Joshua Zou, Emma Davies Smith, Yun-Hee Choi, Yuhan Bi

2602.16933 2026-03-18 stat.ME math.ST stat.ML stat.TH

M-estimation under Two-Phase Multiwave Sampling with Applications to Prediction-Powered Inference

Dan M. Kluger, Stephen Bates

2506.19015 2026-03-18 stat.ME

Principal stratification with recurrent events truncated by a terminal event: A nested Bayesian nonparametric approach

Yuki Ohnishi, Michael O. Harhay, Guangyu Tong, Fan Li

Comments 58 pages

2603.16756 2026-03-18 stat.CO stat.AP

Sequential Bayesian Experimental Design for Prediction in Physical Experiments Informed by Computer Models

Hao Zhu, Markus Hainy

Comments Accepted for presentation at the SIAM Conference on Uncertainty Quantification (UQ26), March 22-25, 2026, Minneapolis, USA

2603.16741 2026-03-18 cs.LG q-bio.NC q-bio.QM stat.ML

Bayesian Inference of Psychometric Variables From Brain and Behavior in Implicit Association Tests

Christian A. Kothe, Sean Mullen, Michael V. Bronstein, Grant Hanada, Marcelo Cicconet, Aaron N. McInnes, Tim Mullen, Marc Aafjes, Scott R. Sponheim, Alik S. Widge

Comments 43 pages, 7 figures, 6 tables, submitted to: Journal of Neural Engineering

Abstract

Objective. We establish a principled method for inferring mental-health-related psychometric variables from neural and behavioral data using the Implicit Association Test (IAT) as the data generation engine, aiming to overcome the limited predictive performance (typically under 0.7 AUC) of the gold-standard D-score method, which relies solely on reaction times. Approach. We propose a sparse hierarchical Bayesian model that leverages multi-modal data to predict experiences related to mental illness symptoms in new participants. The model is a multivariate generalization of the D-score with trainable parameters, engineered for parameter efficiency in the small-cohort regime typical of IAT studies. Data from two IAT variants were analyzed: a suicidality-related E-IAT ($n=39$) and a psychosis-related PSY-IAT ($n=34$). Main Results. Our approach overcomes high inter-individual variability and a low within-session effect size in the dataset, reaching AUCs of 0.73 (E-IAT) and 0.76 (PSY-IAT) in the best modality configurations, though corrected 95% confidence intervals are wide ($\pm 0.18$) and results are marginally significant after FDR correction ($q=0.10$). Restricting the E-IAT to MDD participants improves AUC to 0.79 $[0.62, 0.97]$ (significant at $q=0.05$). Performance is on par with the best reference methods (shrinkage LDA and EEGNet) for each task, even when the latter were adapted to the task, while the proposed method was not. Accuracy was substantially above near-chance D-scores (0.50-0.53 AUC) in both tasks, with more consistent cross-task performance than any single reference method. Significance. Our framework shows promise for enhancing IAT-based assessment of experiences related to entrapment and psychosis, and potentially other mental health conditions, though further validation on larger and independent cohorts will be needed to establish clinical utility.

2603.16729 2026-03-18 cs.LG cs.CE econ.EM math.OC stat.ML

GeMA: Learning Latent Manifold Frontiers for Benchmarking Complex Systems

Jia Ming Li, Anupriya, Daniel J. Graham

Comments Latent manifold frontiers for benchmarking complex production systems, and applications to national rail operators, wind farms, and macroeconomic productivity are presented

2603.16712 2026-03-18 math.ST cs.DS cs.LG stat.ML stat.TH

High-dimensional estimation with missing data: Statistical and computational limits

Kabir Aladin Verchand, Ankit Pensia, Saminul Haque, Rohith Kuditipudi

2603.16535 2026-03-18 cs.LG math.OC stat.ML

SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds

Viktor Stein, Wuchen Li, Gabriele Steidl

Comments 24 pages, 2 figures, 3 tables, comments welcome!

2603.16530 2026-03-18 stat.ME

Estimation and Hypothesis Testing of Fixed Effects Models-Based Uncertainty for Factor Designs

Fan Zhang, Zhiming Li

Comments 24 pages, 10 tables

2603.16400 2026-03-18 stat.ME

A nonparametric approach to understand multivariate quantile dynamics in financial time series

Kunal Rai, Archi Roy, Itai Dattner, Soudeep Deb

2603.16345 2026-03-18 astro-ph.IM stat.AP

Optimising the FRB Search Pipeline for the Northern Cross Radio Telescope

Hayley Camilleri, Alessio Magro, Andrea Geminardi, Giovanni Naldi, Gianni Bernardi, Luca Bruno, Valentina Cesare, Francesco Fiori, Davide Pelliciari, Maura Pilia, Matteo Trudu

Comments 18 pages, 8 figures

Abstract

FRB search pipelines are being developed to operate under strict real-time constraints while maintaining sensitivity to short-duration transient signals. In incoherent dedispersion based pipelines such as Heimdall, apart from observation bandwidth and number of beams, detection performance and computational throughput are strongly dependent on the choice of processing parameters, which are often selected heuristically. In this work, we present a systematic evaluation of key dedispersion and matched filtering parameters and quantify their impact on both detection accuracy and runtime performance. A controlled synthetic injection framework is developed in which artificial FRB pulses with known DMs, SNRs, and pulse widths are embedded into realistic filterbank data containing instrumental noise representative of observations from the Northern Cross radio telescope. Using this framework, a grid of Heimdall configurations is explored, spanning DM tolerance, boxcar filter width, and processing gulp size. Detection performance is assessed by comparing recovered and injected signal properties, while computational performance is evaluated through end-to-end processing time measurements. The results reveal clear trade-offs between sensitivity and throughput across parameter choices. We identify an empirically optimal configuration that provides burst recovery while maintaining processing speeds exceeding real-time requirements. While the specific optimal parameters are derived for the Northern Cross, the methodology and findings are broadly applicable to any real-time transient detection pipeline employing matched-filtering and dedispersion, and are particularly relevant for low-frequency radio telescopes with similar observing configurations. These findings demonstrate the value of data-driven parameter evaluation for improving the performance of real-time transient detection pipelines.
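
As a rough illustration of the injection-and-recovery idea described in this abstract, the sketch below embeds a rectangular pulse of known width in a one-dimensional time series and scans a set of boxcar widths for the highest peak S/N. It is a toy under stated assumptions, not Heimdall and not the authors' pipeline: dedispersion is skipped entirely, the noise is taken to be white with unit variance, and the function names (`inject_pulse`, `boxcar_snr`, `search_widths`) and the width grid are hypothetical.

```python
import numpy as np


def inject_pulse(series, start, width, amplitude):
    """Return a copy of `series` with a rectangular pulse added (hypothetical helper)."""
    out = np.array(series, dtype=float, copy=True)
    out[start:start + width] += amplitude
    return out


def boxcar_snr(series, width):
    """Peak response and its start index after boxcar filtering.

    The kernel is scaled by 1/sqrt(width) so that unit-variance white noise
    keeps unit variance after filtering; the peak can then be read as an S/N.
    """
    kernel = np.ones(width) / np.sqrt(width)
    filtered = np.convolve(series, kernel, mode="valid")
    idx = int(np.argmax(filtered))
    return float(filtered[idx]), idx


def search_widths(series, widths=(1, 2, 4, 8, 16, 32)):
    """Try each boxcar width and return the one with the highest peak response."""
    results = {w: boxcar_snr(series, w) for w in widths}
    best = max(results, key=lambda w: results[w][0])
    return best, results


# Demo: inject a pulse of width 8 into simulated unit-variance noise.
rng = np.random.default_rng(0)
noisy = inject_pulse(rng.normal(size=4096), start=1000, width=8, amplitude=5.0)
best_width, all_results = search_widths(noisy)
print("best boxcar width:", best_width)
print("peak S/N and start index:", all_results[best_width])
```

With this scaling, a noise-free rectangular pulse produces its largest response when the boxcar width equals the pulse width, which is one way to see why the choice of boxcar width grid affects sensitivity.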

2603.16344 2026-03-18 stat.ME

A flexible wrapped Lindley-type distribution for angular data modelling

Johan Ferreira, Delene van Wyk-de Ridder, Janet van Niekerk

2603.16317 2026-03-18 stat.OT

Balance and Fairness through Multicalibration in Nonlife Insurance Pricing

Michel Denuit, Marie Michaelides, Julien Trufin

2603.16294 2026-03-18 math.ST stat.TH

A Kernel Two-Sample Test Invariant under Group Action with Applications to Functional Data

Madison Giacofci, Anouar Meynaoui, Alex Podgorny

2603.16279 2026-03-18 cs.RO stat.ML

Agile Interception of a Flying Target using Competitive Reinforcement Learning

Timothée Gavin, Simon Lacroix, Murat Bronz

2603.16213 2026-03-18 math.ST stat.ME stat.TH

Equivalence testing with data-dependent and post-hoc equivalence margins

Stan Koobs, Nick W. Koning

2603.16062 2026-03-18 stat.ML cs.LG

Safe Distributionally Robust Feature Selection under Covariate Shift

Hiroyuki Hanada, Satoshi Akahane, Noriaki Hashimoto, Shion Takeno, Ichiro Takeuchi

2603.16056 2026-03-18 cond-mat.stat-mech quant-ph stat.ML

Population Annealing as a Discrete-Time Schrödinger Bridge

Masayuki Ohzeki

Comments 4 pages

2603.16042 2026-03-18 math.OC cs.LG stat.ML

Shuffling the Stochastic Mirror Descent via Dual Lipschitz Continuity and Kernel Conditioning

Junwen Qiu, Leilei Mei, Junyu Zhang

Comments 28 pages, 3 figures

2603.16041 2026-03-18 stat.ME cs.LG

Power Analysis for Prediction-Powered Inference

Yiqun T. Chen, Moran Guo, Shengy Li

2603.16014 2026-03-18 stat.CO

Fast Multitask Gaussian Process Regression

Aleksei G. Sorokin, Pieterjan Robbe, Fred J. Hickernell

2603.16005 2026-03-18 math.ST stat.ML stat.TH

Breakdown properties of optimal transport maps: general transportation costs

Alberto Gonzalez-Sanz, Marco Avella Medina

2603.15924 2026-03-18 stat.ME

Time Partitioning in Target Trial Emulation

Harold Tankpinou Zoumenou, Simon Ferreira, Charles Assaad, Nathanael Lapidus, Daria Bystrova, Benjamin Glemain

2603.15923 2026-03-18 stat.ML cs.LG

Learning to Recall with Transformers Beyond Orthogonal Embeddings

Nuri Mert Vural, Alberto Bietti, Mahdi Soltanolkotabi, Denny Wu

Comments ICLR 2026

2603.15902 2026-03-18 stat.CO stat.ME

SEMMS with Random Effects: A Mixed-Model Extension for Variable Selection in Clustered and Longitudinal Data

Haim Bar, Martin T. Wells

Abstract

SEMMS (Scalable Empirical-Bayes Model for Marker Selection) is a variable-selection procedure for generalized linear models that uses a three-component normal mixture prior on regression coefficients. In its original form, SEMMS assumes that all observations are independent. Many real-world datasets, however, arise from repeated-measures or clustered designs in which observations within the same subject are correlated. Ignoring this correlation inflates the apparent residual variance and can severely degrade variable-selection performance. We extend SEMMS to accommodate random intercepts, random slopes, or both, via an alternating coordinate-ascent algorithm. After each round of fixed-effect variable selection, the subject-level best linear unbiased predictors (BLUPs) are updated with lmer (Gaussian) or glmer (non-Gaussian); the fixed-effect step then operates on the random-effect-adjusted response. We describe the algorithm, evaluate its performance in three Gaussian simulation studies spanning a range of signal strengths, random-effect magnitudes, and sample/predictor-space regimes, and present a semi-synthetic real-data example. We further extend the framework to non-Gaussian families (Poisson, binomial) via an IRLS working-response adaptation: at each outer iteration the fixed-effects step uses the RE-adjusted working response computed from the current glmer fitted values rather than the raw response. When the fixed-effect signal is strong relative to the random-effect variance, both the original and extended procedures perform comparably. When the random-effect variance dominates (the scenario most likely to cause plain SEMMS to fail), the mixed-model extension recovers the exact true predictor set in 93% of simulated datasets (Gaussian), 61% (Poisson), and 65% (binomial), compared with 1%, 45%, and 39% for plain SEMMS respectively.

2603.15175 2026-03-18 stat.ME math.DS q-bio.PE

Bayesian Inference in Epidemic Modelling: A Beginner's Guide

Augustine Okolie

Comments 12 pages, 2 plots

2603.15057 2026-03-18 stat.ML cs.AI cs.LG

Analyzing Error Sources in Global Feature Effect Estimation

Timo Heiß, Coco Bögel, Bernd Bischl, Giuseppe Casalicchio

Comments Accepted to The 4th World Conference on eXplainable Artificial Intelligence (XAI 2026)

2603.14894 2026-03-18 cs.LG cs.AI stat.ML

Informative Perturbation Selection for Uncertainty-Aware Post-hoc Explanations

Sumedha Chugh, Ranjitha Prasad, Nazreen Shah

2603.14198 2026-03-18 cs.LG cs.AI stat.ML

Efficient Federated Conformal Prediction with Group-Conditional Guarantees

Haifeng Wen, Osvaldo Simeone, Hong Xing

Comments 22 pages, 5 figures, submitted for possible publication

2603.13784 2026-03-18 math.ST stat.ME stat.TH

Mixed difference integer-valued GARCH model for $\mathbb{Z}$-valued time series

Abdelhakim Aknouche, Christian Francq, Yuichi Goto

Comments 61 pages, 8 figures

2603.11267 2026-03-18 stat.AP

A Statistically Reliable Optimization Framework for Bandit Experiments in Scientific Discovery

Tong Li, Travis Mandel, Goldie Phillips, Anna Rafferty, Eric M. Schwartz, Dehan Kong, Joseph J. Williams

2603.09531 2026-03-18 q-bio.QM cs.CV eess.IV stat.AP

Association of Progressive PPFE and Mortality in Lung Cancer Screening Cohorts

Shahab Aslani, Mehran Azimbagirad, Daryl Cheng, Daisuke Yamada, Ryoko Egashira, Adam Szmul, Justine Chan-Fook, Robert Chapman, Alfred Chung Pui So, Shanshan Wang, John McCabe, Tianqi Yang, Jose M Brenes, Eyjolfur Gudmundsson, The SUMMIT Consortium, Susan M. Astley, Daniel C. Alexander, Sam M. Janes, Joseph Jacob

2603.01381 2026-03-18 stat.ME

Differential gene expression analysis via two-component mixture models with a semiparametric skew-normal scale mixture alternative

Sangkon Oh, Geoffrey J. McLachlan

2602.17922 2026-03-18 stat.CO stat.ME stat.ML

Data-driven configuration tuning of glmnet for balancing accuracy and computational efficiency

Shuhei Muroya, Kei Hirose

Comments 23 pages, 9 figures. Title changed. Revised for linguistic clarity and stylistic improvements; no changes to the main results

2602.03999 2026-03-18 math.PR cs.DS cs.LG math.ST stat.ML stat.TH

Functional Stochastic Localization

Anming Gu, Bobby Shi, Kevin Tian

Comments Comments welcome! v2 adds citations and fixes typos

2601.01259 2026-03-18 stat.ME math.ST stat.TH

A Novel Multiple Imputation Approach For Parameter Estimation in Observation-Driven Time Series Models With Missing Data

Guilherme Pumi, Taiane Schaedler Prass, Douglas Krauthein Verdum

Comments This version presents the large sample theory for the proposed method, showing its strong consistency under mild assumptions, regardless of the amount of missing data or its generating mechanism

2512.07709 2026-03-18 econ.EM stat.CO stat.ME

Bounds on inequality with incomplete data

James Banks, Thomas Glinnan, Tatiana Komarova

2512.00698 2026-03-18 cs.LG stat.ML

Flow Matching for Tabular Data Synthesis

Bahrul Ilmi Nasution, Floor Eijkelboom, Mark Elliot, Richard Allmendinger, Christian A. Naesseth

Comments Published at TMLR

2510.25289 2026-03-18 cs.SI math.ST stat.TH

Testing Correlation in Graphs by Counting Bounded Degree Motifs

Dong Huang, Pengkun Yang

Comments 46 pages, 8 figures

2510.14055 2026-03-18 math.ST cs.IT math.IT math.PR stat.TH

Minimum Hellinger Distance Estimators for Complex Survey Designs

David Kepplinger, Anand N. Vidyashankar

Comments 36 pages

2510.06122 2026-03-18 cs.LG stat.ML

PolyGraph Discrepancy: a classifier-based metric for graph generation

Markus Krimmel, Philip Hartout, Karsten Borgwardt, Dexiong Chen

Comments Camera-ready version published at ICLR 2026

2509.09965 2026-03-18 stat.ME math.ST stat.TH

Confidence Intervals for Extinction Risk: Validating Population Viability Analysis with Limited Data

Hiroshi Hakoyama

Comments 151 pages, 32 figures, 30 tables

2508.11814 2026-03-18 stat.ME

Simulation-based validation of Bayes factor computation

Martin Modrák, Sebastian Stroppel, Paul-Christian Bürkner

Comments 49 pages, 14 figures

2508.09554 2026-03-18 stat.AP

A Bayesian factor analysis model for non-randomised staggered designs

Constantin Schmidt, Shaun R. Seaman, Beatrice Emmanouil, Leila Reid, Stuart Smith, Daniela De Angelis, Pantelis Samartsidis

Comments 17 pages, 5 figures

2508.03675 2026-03-18 stat.ME stat.AP

Partial Conjunction Analysis in Neuroimaging: A Comparative Study

Monitirtha Dey, Anna Vesely, Thorsten Dickhaus

2506.01324 2026-03-18 stat.ML cs.IT cs.LG math.IT math.PR

Near-Optimal Clustering in Mixture of Markov Chains

Junghyun Lee, Yassir Jedra, Alexandre Proutière, Se-Young Yun

Comments AISTATS 2026 (50 pages, 6 figures) (ver3: camera-ready version, major revisions)

2505.21777 2026-03-18 cs.LG cond-mat.dis-nn cs.CV q-bio.NC stat.ML

Memorization to Generalization: Emergence of Diffusion Models from Associative Memory

Bao Pham, Gabriel Raya, Matteo Negri, Mohammed J. Zaki, Luca Ambrogioni, Dmitry Krotov

2505.09647 2026-03-18 cs.DS cs.IT cs.LG math.IT math.PR math.ST stat.TH

On Unbiased Low-Rank Approximation with Minimum Distortion

Leighton Pate Barnes, Stephen Cameron, Benjamin Howard

2505.07272 2026-03-18 stat.ML cs.LG eess.SP

ALPCAH: Subspace Learning for Sample-wise Heteroscedastic Data

Javier Salazar Cavazos, Jeffrey A. Fessler, Laura Balzano

2504.13336 2026-03-18 stat.ML cs.LG math.ST stat.TH

On the minimax optimality of Flow Matching through the connection to kernel density estimation

Lea Kunkel, Mathias Trabs

2504.09347 2026-03-18 stat.ML cs.LG math.ST stat.TH

Inference for Deep Neural Network Estimators in Generalized Nonparametric Models

Xuran Meng, Yi Li

Comments 91 pages, 14 figures, 20 tables

2504.03228 2026-03-18 econ.EM stat.ML

Weak instrumental variables due to nonlinearities in panel data: A Super Learner Control Function estimator

Monika Avila-Marquez

Abstract

A triangular structural panel data model with additive separable individual-specific effects is used to model the causal effect of a covariate on an outcome variable when there are unobservable confounders, some of which are time-invariant. In this setup, a linear reduced-form equation might be problematic when the conditional mean of the endogenous covariate given the instrumental variables is nonlinear. The reason is that ignoring the nonlinearity could lead to weak instruments (instruments that are weakly correlated with the endogenous covariate). As a solution, we propose a triangular simultaneous equation model for panel data with additive separable individual-specific fixed effects composed of a linear structural equation with a nonlinear reduced form equation. The parameter of interest is the structural parameter of the endogenous variable. The identification of this parameter is obtained under the assumption of available exclusion restrictions and using a control function approach. Estimating the parameter of interest is done using an estimator that we call Super Learner Control Function (SLCF) estimator. The estimation procedure is composed of two main steps and sample splitting. First, we estimate the control function using a super learner. In the following step, we use the estimated control function to control for endogeneity in the structural equation. Sample splitting is done across the individual dimension. The estimator is consistent and asymptotically normal achieving a parametric rate of convergence. We perform a Monte Carlo simulation to test the performance of the estimators proposed. We conclude that the Super Learner Control Function Estimators significantly outperform Within 2SLS estimators. Finally, we show that the SLCF estimator differs from both the plug-in IV estimator and a naive plug-in 2SLS estimator.
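
The two-step logic in this abstract (estimate the control function from a nonlinear reduced form, then add it as a regressor in the structural equation) can be sketched on simulated data. This is a simplified illustration under loud assumptions, not the SLCF estimator: the data are cross-sectional rather than panel, there are no fixed effects and no sample splitting, and a cubic polynomial in the instrument stands in for the super learner. The function names and the data-generating process are hypothetical.

```python
import numpy as np


def control_function_estimate(y, x, z, degree=3):
    """Two-step control function estimate of the coefficient on x."""
    # Step 1: nonlinear reduced form x = g(z) + v.
    # Assumption: a polynomial in z replaces the super learner.
    Z = np.vander(z, degree + 1, increasing=True)  # columns 1, z, z^2, ...
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    v_hat = x - Z @ gamma  # estimated control function (reduced-form residual)

    # Step 2: structural equation with v_hat added to absorb the endogeneity.
    X = np.column_stack([np.ones_like(x), x, v_hat])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


def naive_ols(y, x):
    """OLS slope of y on x, ignoring endogeneity."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[1])


def simulate(n=5000, beta=1.5, seed=0):
    """Hypothetical design: x depends on z only through z**2."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                        # instrument
    u = rng.normal(size=n)                        # unobserved confounder
    v = 0.8 * u + rng.normal(scale=0.6, size=n)   # reduced-form error, correlated with u
    x = z**2 + v                                  # linear first stage in z would be weak
    y = beta * x + u + rng.normal(scale=0.5, size=n)
    return y, x, z


y, x, z = simulate()
print("control function estimate:", control_function_estimate(y, x, z))
print("naive OLS estimate:", naive_ols(y, x))
```

In this design the instrument is uncorrelated with the endogenous covariate in levels, so a linear first stage carries almost no signal, while the nonlinear first stage recovers it.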

2504.02547 2026-03-18 stat.ME

Outlier-Robust Multi-Group Gaussian Mixture Modeling with Flexible Group Reassignment

Patricia Puchhammer, Ines Wilms, Peter Filzmoser

2503.14978 2026-03-18 math.ST cs.NA math.AP math.NA math.PR stat.TH

Inferring diffusivity from killed diffusion

Richard Nickl, Fanny Seizilles

Comments 33 pages, to appear in the Annals of Statistics

2503.13986 2026-03-18 math.ST stat.TH

Stratified Permutational Berry–Esseen Bounds and Their Applications to Statistics

Pengfei Tian, Fan Yang, Peng Ding

2503.13148 2026-03-18 stat.ME math.ST stat.TH

Spearman's rho for zero-inflated count data: formulation and attainable bounds

Jasper Arends, Guanjie Lyu, Mhamed Mesfioui, Elisa Perrone, Julien Trufin

2503.12966 2026-03-18 cs.LG stat.ML

Optimal Denoising in Score-Based Generative Models: The Role of Data Regularity

Eliot Beyler, Francis Bach

2503.07327 2026-03-18 stat.ME stat.ML

Casewise and Cellwise Robust Multilinear Principal Component Analysis

Mehdi Hirari, Fabio Centofanti, Mia Hubert, Stefan Van Aelst

2501.11738 2026-03-18 stat.ME

A new class of non-stationary Gaussian fields with general smoothness on metric graphs

David Bolin, Lenin Riera-Segura, Alexandre B. Simas

2409.04412 2026-03-18 stat.ME q-fin.MF q-fin.RM

Robust Elicitable Functionals

Kathleen E. Miao, Silvana M. Pesenti

2409.01983 2026-03-18 stat.ME math.ST stat.TH

The causal interpretation of acceleration factors

Mari Brathovde, Hein Putter, Morten Valberg, Richard A. J. Post

2408.08771 2026-03-18 stat.ME stat.CO

Dynamic factor analysis for sparse and irregular longitudinal data: an application to metabolite measurements in a COVID-19 study

Jiachen Cai, Robert J. B. Goudie, Brian D. M. Tom

DOI: 10.1002/sim.70499
Journal ref: Stat. Med. 2026, 45(6-7):e70499

Abstract

It is of scientific interest to identify essential biomarkers in biological processes underlying diseases to facilitate precision medicine. Factor analysis (FA) has long been used to address this goal: by assuming latent biological pathways drive the activity of measurable biomarkers, a biomarker is more influential if its absolute factor loading is larger. Although correlation between biomarkers has been properly handled under this framework, correlation between latent pathways is often overlooked, as one classical assumption in FA is the independence between factors. However, this assumption may not be realistic in the context of pathways, as existing biological knowledge suggests that pathways interact with one another rather than functioning independently. Motivated by sparsely and irregularly collected longitudinal measurements of metabolites in a COVID-19 study of large sample size, we propose a dynamic factor analysis model that can account for the potential cross-correlations between pathways, through a multi-output Gaussian process (MOGP) prior on the factor trajectories. To mitigate overfitting caused by sparsity of longitudinal measurements, we introduce a roughness penalty upon MOGP hyperparameters and allow for non-zero mean functions. To estimate these hyperparameters, we develop a stochastic expectation maximization (StEM) algorithm that scales well to the large sample size. In our simulation studies, StEM leads across all sample sizes considered to a more accurate and stable estimate of the MOGP hyperparameters than a comparator algorithm used in previous research. Application to the motivating example identifies a kynurenine pathway that affects the clinical severity of patients with COVID-19. In particular, a novel biomarker taurine is discovered, which has been receiving increased attention clinically, yet its role was overlooked in a previous analysis.

2408.03415 2026-03-18 stat.ME stat.CO

Gradient-Based Approximate Bayesian Inference with Entropy-Optimized Summary Statistics for Compartmental Models

Xiahui Li, Fergus J. Chadwick, Ben Swallow

2407.19892 2026-03-18 stat.ML cs.LG q-bio.GN

Making Multi-Axis Gaussian Graphical Models Scalable to Millions of Cells

Bailey Andrew, Erica L. Harris, James A. Poulter, David R. Westhead, Luisa Cutillo

Comments 8 pages (35 with appendix+references), 8 figures, 10 tables

2407.18360 2026-03-18 stat.AP

Evaluating Organizational Effectiveness: A New Strategy to Leverage Multisite Randomized Trials for Valid Assessment

Guanglei Hong, Jonah Deutsch, Peter Kress, Jose Eos Trinidad, Zhengyan Xu

Comments To appear in the American Journal of Evaluation

2405.19553 2026-03-18 math.ST cs.LG math.PR stat.ML stat.TH

Convergence Bounds for Sequential Monte Carlo on Multimodal Distributions using Soft Decomposition

Holden Lee, Matheau Santana-Gijzen

2402.03819 2026-03-18 stat.ML cs.LG

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants

Abdoulaye Sakho, Emmanuel Malherbe, Erwan Scornet

2310.12032 2026-03-18 cs.LG stat.ML

Exact and general decoupled solutions of the LMC Multitask Gaussian Process model

Olivier Truffinet, Karim Ammar, Jean-Philippe Argaud, Bertrand Bouriquet

Comments 78 pages, 12 figures, submitted to Neurocomputing

2308.11458 2026-03-18 stat.ME

Towards a unified approach to formal risk of bias assessments for causal and descriptive inference

Oliver L. Pescott, Robin J. Boyd, Gary D. Powney, Gavin B. Stewart

DOI: 10.1007/s11135-026-02687-0

Abstract

Statistics is sometimes described as the science of reasoning under uncertainty. Statistical models provide one view of this uncertainty, but what is frequently neglected is the 'invisible' portion of uncertainty: that assumed not to exist once a model has been fitted to some data. Systematic errors, i.e. bias, in data relative to some model and inferential goal can seriously undermine research conclusions, and qualitative and quantitative techniques have been created across several disciplines to quantify and generally appraise such potential biases. Perhaps best known are so-called 'risk of bias' assessment instruments used to investigate the likely quality of randomised controlled trials in medical research. However, the logic of assessing the risks caused by various types of systematic error to statistical arguments applies far more widely. This logic applies even when statistical adjustment strategies for potential biases are used, as these frequently make assumptions (e.g. data 'missing at random') that can rarely be empirically guaranteed. Mounting concern about such situations can be seen in the increasing calls for greater consideration of biases caused by nonprobability sampling in descriptive inference (e.g. in survey sampling), and the statistical generalisability of in-sample causal effect estimates in causal inference. Both of these relate to the consideration of model-based and wider uncertainty when presenting research conclusions from models. Given that model-based adjustments are never perfect, we argue that qualitative risk of bias reporting frameworks for both descriptive and causal inferential arguments should be further developed and made mandatory by journals and funders. It is only through clear statements of the limits to statistical arguments that consumers of research can fully judge their value for any given application.

2303.08528 2026-03-18 stat.ME

Translating predictive distributions into informative priors

Andrew A. Manderson, Robert J. B. Goudie

Comments Revised, added KL methodology and comparisons, improved manuscript for clarity

2211.04129 2026-03-18 math.OC cs.LG stat.ML

An Efficient Global Optimization Algorithm with Adaptive Estimates of the Local Lipschitz Constants

Danny D'Agostino

Comments Accepted in Journal of Global Optimization, Springer

2211.03274 2026-03-18 stat.ME math.ST stat.TH

A General Framework for Cutting Feedback within Modularised Bayesian Inference

Yang Liu, Robert J. B. Goudie

Comments 30 pages, 9 figures

2106.00996 2026-03-18 stat.ME

Generalized Geographically Weighted Regression Model within a Modularized Bayesian Framework

Yang Liu, Robert J. B. Goudie

Comments 34 pages, 11 figures

2006.01584 2026-03-18 stat.ME stat.CO

Stochastic Approximation Cut Algorithm for Inference in Modularized Bayesian Models

Yang Liu, Robert J. B. Goudie

Comments 36 pages, 9 figures, 1 table

1607.06779 2026-03-18 stat.ME stat.AP stat.CO

Joining and splitting models with Markov melding

Robert J. B. Goudie, Anne M. Presanis, David Lunn, Daniela De Angelis, Lorenz Wernisch

2603.15840 2026-03-18 cs.LG cs.AI cs.CL stat.ML

When Stability Fails: Hidden Failure Modes of LLMs in Data-Constrained Scientific Decision-Making

Nazia Riasat

Comments 13 pages, 5 figures. Accepted at ICLR 2026 Workshop: I Can't Believe It's Not Better (ICBINB 2026). OpenReview: https://openreview.net/pdf?id=vf8vs2ibso

2603.15839 2026-03-18 stat.AP q-fin.RM

A Portfolio-Anchored Frequency-Severity Risk Index for Trip and Driver Assessment Using Telematics Signals

Jongtaek Lee, Andrei Badescu, X. Sheldon Lin

Comments 31 pages, 4 figures. Submitted to ASTIN Bulletin

2603.15814 2026-03-18 cs.LG stat.AP

Longitudinal Risk Prediction in Mammography with Privileged History Distillation

Banafsheh Karimian, Alexis Guichemerre, Soufiane Belharbi, Natacha Gillet, Luke McCaffrey, Mohammadhadi Shateri, Eric Granger

Abstract

Breast cancer remains a leading cause of cancer-related mortality worldwide. Longitudinal mammography risk prediction models improve multi-year breast cancer risk prediction based on prior screening exams. However, in real-world clinical practice, longitudinal histories are often incomplete, irregular, or unavailable due to missed screenings, first-time examinations, heterogeneous acquisition schedules, or archival constraints. The absence of prior exams degrades the performance of longitudinal risk models and limits their practical applicability. While substantial longitudinal history is available during training, prior exams are commonly absent at test time. In this paper, we address missing history at inference time and propose a longitudinal risk prediction method that uses mammography history as privileged information during training and distills its prognostic value into a student model that only requires the current exam at inference time. The key idea is a privileged multi-teacher distillation scheme with horizon-specific teachers: each teacher is trained on the full longitudinal history to specialize in one prediction horizon, while the student receives only a reconstructed history derived from the current exam. This allows the student to inherit horizon-dependent longitudinal risk cues without requiring prior screening exams at deployment. Our new Privileged History Distillation (PHD) method is validated on a large longitudinal mammography dataset with multi-year cancer outcomes, CSAW-CC, comparing full-history and no-history baselines to their distilled counterparts. Using time-dependent AUC across horizons, our privileged history distillation method markedly improves the performance of long-horizon prediction over no-history models and is comparable to that of full-history models, while using only the current exam at inference time.

2603.15785 2026-03-18 math.PR math.MG math.ST stat.TH

On the Uniqueness of Fréchet Means for Polytope Norms

Roan Talbut, Andrew McCormack, Anthea Monod

Comments 28 pages, 1 figure

2603.15683 2026-03-18 stat.ML cs.LG

Beyond Distance: Quantifying Point Cloud Dynamics with Persistent Homology and Dynamic Optimal Transport

Yixin Wang, Ting Gao, Jinqiao Duan

Comments 42 pages, 15 figures

2603.15664 2026-03-18 stat.AP cs.AI cs.CE stat.ML

Quantum Amplitude Estimation for Catastrophe Insurance Tail-Risk Pricing: Empirical Convergence and NISQ Noise Analysis

Alexis Kirke