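arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.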

2603.02204 2026-03-03 cs.LG stat.ML

Partial Causal Structure Learning for Valid Selective Conformal Inference under Interventions

Amir Asiaee, Kavey Aryan, James P. Long

Abstract

Selective conformal prediction can yield substantially tighter uncertainty sets when we can identify calibration examples that are exchangeable with the test example. In interventional settings, such as perturbation experiments in genomics, exchangeability often holds only within subsets of interventions that leave a target variable "unaffected" (e.g., non-descendants of an intervened node in a causal graph). We study the practical regime where this invariance structure is unknown and must be learned from data. Our contributions are: (i) a contamination-robust conformal coverage theorem that quantifies how misclassification of "unaffected" calibration examples degrades coverage via an explicit function $g(\delta,n)$ of the contamination fraction and calibration set size, providing a finite-sample lower bound that holds for arbitrary contaminating distributions; (ii) a task-driven partial causal learning formulation that estimates only the binary descendant indicators $Z_{a,i}=\mathbf{1}\{i\in\mathrm{desc}(a)\}$ needed for selective calibration, rather than the full causal graph; and (iii) algorithms for descendant discovery via perturbation intersection patterns (differentially affected variable set intersections across interventions), and for approximate distance-to-intervention estimation via local invariant causal prediction. We provide recovery conditions under which contamination is controlled. Experiments on synthetic linear structural equation models (SEMs) validate the bound: under controlled contamination up to $\delta=0.30$, the corrected procedure maintains $\ge 0.95$ coverage while uncorrected selective CP degrades to $0.867$. A proof-of-concept on Replogle K562 CRISPR interference (CRISPRi) perturbation data demonstrates applicability to real genomic screens.
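
The selective calibration idea in this abstract can be illustrated with a minimal split-conformal sketch, assuming absolute residuals as conformity scores and a boolean "unaffected" flag per calibration example. The abstract does not state the form of $g(\delta,n)$, so the `slack` argument below is a hypothetical stand-in that simply tightens the miscoverage level; it is not the paper's correction, and the function names are illustrative.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile: the ceil((n+1)(1-alpha))-th
    smallest calibration score, or +inf when that rank exceeds n."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]

def selective_interval(pred, cal_residuals, cal_unaffected, alpha, slack=0.0):
    """Prediction interval built only from calibration residuals flagged as
    'unaffected'. `slack` is a hypothetical hedge against misflagged
    examples: it lowers the miscoverage level before taking the quantile."""
    selected = [abs(r) for r, keep in zip(cal_residuals, cal_unaffected) if keep]
    q = conformal_quantile(selected, max(alpha - slack, 0.0))
    return pred - q, pred + q

# Usage: ten calibration residuals flagged as unaffected, two excluded.
residuals = [1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 100, 200]
unaffected = [True] * 10 + [False] * 2
interval = selective_interval(0.0, residuals, unaffected, alpha=0.25)
```

Restricting to the flagged subset keeps the two large residuals out of the quantile, which is where the tighter sets come from; a wrong flag would let such residuals in or out, which is the contamination the abstract's bound addresses.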

2603.02195 2026-03-03 stat.AP

Comparative Analysis of Spatiotemporal Volatility Models: An Empirical Study on Financial Network Series

Ariane N. Meli Chrisko, Jessie Li, Philipp Otto, Wolfgang Schmid

Comments 28 pages, 21 figures. Submitted to the Vienna-Copenhagen Conference on Financial Econometrics (2026)

2603.02193 2026-03-03 cs.LG cs.AI stat.ML

Symbol-Equivariant Recurrent Reasoning Models

Richard Freinschlag, Timo Bertram, Erich Kobler, Andreas Mayr, Günter Klambauer

2603.02191 2026-03-03 math.ST math.AG stat.TH

Algebraic statistics of Hüsler-Reiss graphical models in multivariate extremes

Carlos Améndola, Jane Ivy Coons, Alexandros Grosdos, Frank Röttger

Comments 27 pages, 2 figures

2603.02178 2026-03-03 cs.LG cs.AI stat.ML

Reservoir Subspace Injection for Online ICA under Top-n Whitening

Wenjun Xiao, Yuda Bi, Vince D Calhoun

2603.02160 2026-03-03 stat.ME

Setwise Hierarchical Variable Selection and the Generalized Linear Step-Up Procedure for False Discovery Rate Control

Sarah Organ, Toby Kenney, Hong Gu

Comments 24 pages, 5 figures

2603.02155 2026-03-03 cs.LG cs.AI math.ST stat.ML stat.TH

Near-Optimal Regret for KL-Regularized Multi-Armed Bandits

Kaixuan Ji, Qingyue Zhao, Heyang Zhao, Qiwei Di, Quanquan Gu

2603.02131 2026-03-03 stat.AP cs.SI physics.soc-ph stat.OT

Socio-Spatial Patterns of Suicide Mortality in the United States

Kushagra Tiwari, M. Amin Rahimian, Marie-Laure Charpignon, Philippe J. Giabbanelli, Praveen Kumar

Comments Code and data: https://github.com/kut97/suicide-sci

2603.02102 2026-03-03 cs.SI stat.AP

Political attitudes differ but share a common low-dimensional structure across social media and survey data

Antoine Vendeville, Hiroki Yamashita, Pedro Ramaciotti

2603.02069 2026-03-03 cs.LG cs.AI math.OC stat.ML

Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?

Jihwan Kim, Dogyoon Song, Chulhee Yun

Comments Accepted at ICLR 2026, 89 pages, 25 figures

2603.02059 2026-03-03 stat.ML cs.LG

TRAKNN: Efficient Trajectory Aware Spatiotemporal kNN for Rare Meteorological Trajectory Detection

Guillaume Coulaud, Davide Faranda

2603.02043 2026-03-03 cs.LG stat.ML

Leave-One-Out Prediction for General Hypothesis Classes

Jian Qian, Jiachen Xu

2603.02010 2026-03-03 cs.LG stat.ML

Noise-Calibrated Inference from Differentially Private Sufficient Statistics in Exponential Families

Amir Asiaee, Samhita Pal

2603.02003 2026-03-03 stat.ME

Analysis of Stepped-Wedge Randomised Cluster Trial using a generalized pairwise comparison approach: a simulation study

Yohan Bard, Emilie Presles, Marc Buyse, Silvy Laporte, Paul Zufferey, Frederikus A. Klok, Olivier Sanchez, Francis Couturaud, Edouard Ollier

2603.01989 2026-03-03 math.ST stat.TH

Wasserstein-based identification of metastable states in time series data via change point detection and segment clustering

David Gentile, Joshua Huang, James M. Murphy

2603.01981 2026-03-03 stat.AP

Quantifying Uncertainty in Void Swelling Prediction: A Conformal Prediction Framework for Reactor Safety Margins

Minhee Kim, Yong Yang

2603.01975 2026-03-03 stat.ML cs.NA math.NA

Density-Matrix Spectral Embeddings for Categorical Data: Operator Structure and Stability

Raquel Bosch-Romeu, Antonio Falcó, José-Antonio Rodríguez-Gallego

2603.01971 2026-03-03 stat.ML cs.LG

LOCUS: A Distribution-Free Loss-Quantile Score for Risk-Aware Predictions

Matheus Barreto, Mário de Castro, Thiago R. Ramos, Denis Valle, Rafael Izbicki

Comments The article contains nine pages and the appendix twelve

2603.01951 2026-03-03 cs.LG math.OC stat.ML

Accelerating Single-Pass SGD for Generalized Linear Prediction

Qian Chen, Shihong Ding, Cong Fang

Comments 50 pages

2603.01943 2026-03-03 stat.ME

A Simulation Study to Compare Inferential Properties when Modelling Ordinal Outcomes: The Case for the (Plain but Robust) Proportional Odds Model

Stefan Inerle, Markus Pauly, Moritz Berger

2603.01858 2026-03-03 math.PR math.ST stat.TH

Existence, properties, and parametric inference for possibly hyperuniform Gibbs perturbed lattices

Jean-François Coeurjolly, Christopher Renaud-Chan

2602.08212 2026-03-03 stat.ME

Improved Conditional Logistic Regression using Information in Concordant Pairs with Software

Jacob Tennenbaum, Adam Kapelner

Comments 18 pages, 7 tables

2601.21895 2026-03-03 cs.CL cs.AI stat.ML

Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text

Hongyi Zhou, Jin Zhu, Kai Ye, Ying Yang, Erhan Xu, Chengchun Shi

Comments Accepted by ICLR2026

2512.13599 2026-03-03 physics.geo-ph stat.ME

Correcting exponentiality test for binned earthquake magnitudes

Angela Stallone, Ilaria Spassiani

DOI: 10.26443/seismica.v5i1.2257
Journal ref: Seismica 5.1 (2026) 1-10

Abstract

Above the magnitude of completeness (the minimum threshold for which a 100% detection rate is assumed), earthquake magnitudes are typically modeled as a continuous exponential distribution. In practice, however, earthquake catalogs report magnitudes with finite resolution, resulting in a discrete (geometric) distribution. To determine the magnitude of completeness, the Lilliefors test is commonly applied. Because this test assumes continuous data, it is standard practice to add uniform noise to binned magnitudes prior to testing exponentiality. Here we show analytically that uniform dithering does not recover the underlying continuous exponential distribution from its discretized (geometric) form. It instead returns a piecewise-constant residual lifetime distribution, whose deviation from the exponential model becomes detectable as catalog size or bin width increases. Through numerical experiments, we demonstrate that this deviation yields a systematic overestimation of the magnitude of completeness, with biases exceeding one magnitude unit in large, high-resolution catalogs. We derive the exact noise distribution, a truncated exponential within each magnitude bin, that correctly restores the continuous exponential distribution over the whole magnitude range. Numerical tests show that this correction yields Lilliefors rejection probabilities that are consistent with the significance level across a wide range of bin widths and catalog sizes. Although illustrated for the Lilliefors test, the identified bias and the proposed correction are independent of the specific statistical test and apply generally to exponentiality testing of discretized magnitude data.
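
The correction described in this abstract can be sketched as follows, assuming magnitudes are reported at bin centers with bin width `width` and that the exponential rate `beta` is known or estimated separately (for a Gutenberg-Richter b-value, beta = b ln 10). The function name and the centered-bin convention are illustrative assumptions, not taken from the paper. Each binned value is replaced by an inverse-CDF draw from an exponential density truncated to its own bin, instead of a uniform draw.

```python
import math
import random

def truncated_exponential_dither(binned, beta, width, rng):
    """Replace each binned magnitude m by a draw from an exponential(beta)
    density truncated to the bin [m - width/2, m + width/2)."""
    span = 1.0 - math.exp(-beta * width)  # probability mass of one bin
    restored = []
    for m in binned:
        lower = m - width / 2.0
        u = rng.random()  # uniform on [0, 1)
        # Inverse CDF of the truncated exponential on [lower, lower + width).
        restored.append(lower - math.log(1.0 - u * span) / beta)
    return restored

# Usage on synthetic data: exponential magnitudes above m_c, binned, then restored.
rng = random.Random(0)
beta, width, m_c = 2.3, 0.1, 2.0
continuous = [m_c + rng.expovariate(beta) for _ in range(20000)]
binned = [m_c + width * (math.floor((x - m_c) / width) + 0.5) for x in continuous]
restored = truncated_exponential_dither(binned, beta, width, rng)
```

Because the conditional law of an exponential variable within any bin is this same truncated exponential, the restored sample is again exponential with rate `beta` above `m_c`, which is what a continuous-data test such as Lilliefors expects.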

2512.12046 2026-03-03 cs.LG cs.RO cs.SY eess.SY stat.ML

Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning

Vittorio Giammarino, Ahmed H. Qureshi

2510.22835 2026-03-03 cs.LG stat.CO stat.ML

Clustering by Denoising: Latent plug-and-play diffusion for single-cell data

Dominik Meier, Shixing Yu, Sagnik Nandy, Promit Ghosal, Kyra Gan

2510.07088 2026-03-03 stat.ML cs.LG

Fourier Analysis on the Boolean Hypercube via Hoeffding Functional Decomposition

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

2510.01339 2026-03-03 cs.CV stat.ML

LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration

Alessio Spagnoletti, Andrés Almansa, Marcelo Pereyra

Comments 30 pages, 16 figures. The Fourteenth International Conference on Learning Representations, ICLR 2026

2507.16545 2026-03-03 stat.ME

Bayesian Variational Inference for Mixed Data Mixture Models

Junyang Wang, James Bennett, Victor Lhoste, Sarah Filippi

Comments Updated Corollary 5 to include contraction rate

2506.06267 2026-03-03 stat.ME

A causal framework for evaluating the total effect of strategies aiming to expand screening and to improve outcomes

Joy Zora Nakato, Janice Litunya, Brian Beesiga, Jane Kabami, James Ayieko, Moses R. Kamya, Gabriel Chamie, Laura B. Balzer

Comments 20 pages, 3 figures

2501.04134 2026-03-03 stat.ML cs.LG math.OC math.ST stat.TH

Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

Mario Bravo, Juan P. Flores-Mella, Cristóbal Guzmán

Comments 38 pages, 2 figures

2412.16031 2026-03-03 stat.ML cs.LG math.ST stat.TH

Learning sparsity-promoting regularizers for linear inverse problems

Giovanni S. Alberti, Ernesto De Vito, Tapio Helin, Matti Lassas, Luca Ratti, Matteo Santacesaria

Comments 28 pages, 4 figures

2410.23450 2026-03-03 cs.LG cs.AI cs.RO stat.ML

Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

Ruhan Wang, Yu Yang, Zhishuai Liu, Dongruo Zhou, Pan Xu

Comments 26 pages, 11 tables, 8 figures. Published in Transactions on Machine Learning Research (TMLR)

2407.08086 2026-03-03 cs.LG stat.CO stat.ML

The GeometricKernels Package: Heat and Matérn Kernels for Geometric Learning on Manifolds, Meshes, and Graphs

Peter Mostowsky, Vincent Dutordoir, Iskander Azangulov, Noémie Jaquier, Michael John Hutchinson, Aditya Ravuri, Leonel Rozo, Alexander Terenin, Viacheslav Borovitskiy

2405.00081 2026-03-03 math.PR math.ST stat.ML stat.TH

Imprecise Markov Semigroups and their Ergodicity

Michele Caprio, Mengqi Chen

2404.08480 2026-03-03 cs.LG cs.CL stat.CO

Using ChatGPT for Data Science Analyses

Ozan Evkaya, Miguel de Carvalho

Comments 19 pages with figures and appendix

2401.17961 2026-03-03 math.ST stat.TH

A Bernstein-von Mises Theorem for Generalized Fiducial Distributions

J. E. Borgert, Jan Hannig

2401.00664 2026-03-03 math.OC cs.LG math.PR math.ST stat.TH

Metric Entropy-Free Sample Complexity Bounds for Sample Average Approximation in Convex Stochastic Programming

Hongcheng Liu, Jindong Tong

2310.17624 2026-03-03 math-ph math.MP math.PR math.ST quant-ph stat.TH

The six blinds and the elephant or an interdisciplinary selection of measurement features

Ask Ellingsen, Douglas Lundholm, Jean-Pierre Magnot

Comments 28 pages, 11 figures, 1 table. Proceedings of the XL Workshop on Geometric Methods in Physics, Bialowieza, 2023

2307.14025 2026-03-03 cs.LG cs.CV eess.IV q-bio.QM stat.ML

Topological Inductive Bias fosters Multiple Instance Learning in Data-Scarce Scenarios

Salome Kazeminia, Carsten Marr, Bastian Rieck

2305.04979 2026-03-03 cs.LG cs.DC stat.ML

FedHB: Hierarchical Bayesian Federated Learning

Minyoung Kim, Timothy Hospedales

2303.16668 2026-03-03 cs.LG cs.AI cs.CR stat.ML

Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

Edoardo Gabrielli, Dimitri Belli, Zoe Matrullo, Vittorio Miori, Gabriele Tolomei

2303.04945 2026-03-03 quant-ph cs.DS cs.NA math.NA math.ST stat.TH

A Survey of Quantum Alternatives to Randomized Algorithms: Monte Carlo Integration and Beyond

Philip Intallura, Georgios Korpas, Sudeepto Chakraborty, Vyacheslav Kungurtsev, Rufus Lawrence, Ales Wodecki, Jakub Marecek

2210.17453 2026-03-03 stat.ME stat.AP stat.ML

Adaptive Selection of the Optimal Strategy to Improve Precision and Power in Randomized Trials

Laura B. Balzer, Erica Cai, Lucas Godoy Garraza, Pracheta Amaranath

Comments 27 pages (double spaced); 2 figures; 9 tables

DOI: 10.1093/biomtc/ujad034
Journal ref: Biometrics, Volume 80, Issue 1, March 2024, ujad034

Abstract

Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: *how* to select the adjustment approach (which variables and in which form) to maximize precision, while maintaining Type-I error control. Balzer et al. previously proposed *Adaptive Prespecification* within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials ($N<40$). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models, adjusting for a single covariate. Now, we tailor Adaptive Prespecification to trials with many randomized units. Using $V$-fold cross-validation and the estimated influence curve-squared as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. As assessed in simulations exploring a variety of data generating processes, our approach maintains Type-I error control (under the null) and offers substantial gains in precision, equivalent to 20-43% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements overall and within subgroups.
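
The selection step in this abstract (cross-validation with the squared estimated influence curve as the loss) can be sketched under simplifying assumptions: a two-arm trial with known randomization probability, a single covariate, two hypothetical candidate outcome regressions, and the standard augmented inverse-probability form of the influence curve for the average treatment effect. This is an illustration of the idea only, not the authors' TMLE implementation, and all function names are made up for the example.

```python
import random
from statistics import mean

def fit_unadjusted(rows):
    """Candidate 1: predict the arm-specific mean outcome, ignoring W."""
    arm_mean = {a: mean(y for w, arm, y in rows if arm == a) for a in (0, 1)}
    return lambda a, w: arm_mean[a]

def fit_linear(rows):
    """Candidate 2: arm-specific least-squares line in the covariate W."""
    coef = {}
    for a in (0, 1):
        ws = [w for w, arm, y in rows if arm == a]
        ys = [y for w, arm, y in rows if arm == a]
        wbar, ybar = mean(ws), mean(ys)
        sxx = sum((w - wbar) ** 2 for w in ws)
        sxy = sum((w - wbar) * (y - ybar) for w, y in zip(ws, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        coef[a] = (ybar - slope * wbar, slope)
    return lambda a, w: coef[a][0] + coef[a][1] * w

def cv_ic_risk(rows, fitter, n_folds=5, p_treat=0.5):
    """Cross-validated mean of the squared influence-curve estimate."""
    terms = []
    for v in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != v]
        valid = [r for i, r in enumerate(rows) if i % n_folds == v]
        q = fitter(train)  # outcome regression fitted without the validation fold
        for w, a, y in valid:
            h = a / p_treat - (1 - a) / (1 - p_treat)
            terms.append(h * (y - q(a, w)) + q(1, w) - q(0, w))
    psi = mean(terms)  # effect estimate used to center the influence curve
    return mean((t - psi) ** 2 for t in terms)

def select_adjustment(rows, candidates):
    """Return the candidate name with the smallest cross-validated risk."""
    risks = {name: cv_ic_risk(rows, fit) for name, fit in candidates.items()}
    return min(risks, key=risks.get), risks

# Usage on simulated trial data where W is strongly prognostic.
rng = random.Random(1)
rows = []
for _ in range(400):
    w, a = rng.gauss(0, 1), rng.randint(0, 1)
    rows.append((w, a, 1.0 * a + 2.0 * w + rng.gauss(0, 0.5)))
best, risks = select_adjustment(rows, {"unadjusted": fit_unadjusted, "linear": fit_linear})
```

Evaluating each candidate on held-out folds is what guards against choosing an overfit adjustment; the candidate with the smaller squared influence curve corresponds to the smaller estimated variance.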

2105.11205 2026-03-03 stat.ME

Deconvolution density estimation with penalised MLE

Yun Cai, Hong Gu, Toby Kenney

Comments 25 pages, 4 figures, Appendix - 30 pages 8 figures

2603.01837 2026-03-03 cs.LG stat.ML

Constrained Particle Seeking: Solving Diffusion Inverse Problems with Just Forward Passes

Hongkun Dou, Zike Chen, Zeyu Li, Hongjue Li, Lijun Yang, Yue Deng

Comments Accepted by AAAI 2026

2603.01825 2026-03-03 cs.LG cs.GT stat.ML

Uncertainty Quantification of Click and Conversion Estimates for the Autobidding

Ivan Zhigalskii, Andrey Pudovikov, Aleksandr Katrutsa, Egor Samosvat

Comments 17 pages (10 main text + 7 appendix), 5 figures, 2 tables

2603.01786 2026-03-03 cs.LG cs.AI stat.ML

Learning Shortest Paths with Generative Flow Networks

Nikita Morozov, Ian Maksimov, Daniil Tiapkin, Sergey Samsonov

2603.01737 2026-03-03 eess.SP math.ST stat.TH

Detection of weak signals under arbitrary noise distributions

J. Zschetzsche, M. Weimar, O. Lang, S. Schuster, A. Haberl, S. Schertler, B. Lehner, J. Reisinger, M. Huemer, S. Rotter

Comments 24 pages, 8 figures, Code available at https://github.com/jonaslindenberger/LRao-detector

2603.01719 2026-03-03 stat.ML cs.LG

Co-optimization for Adaptive Conformal Prediction

Xiaoyi Su, Zhixin Zhou, Rui Luo

2603.01716 2026-03-03 stat.ME

A spatial scan statistic for categorical, functional data

Camille Frévent, Moustapha Sarr, Sophie Dabo-Niang

2603.01715 2026-03-03 stat.ME stat.AP

Power and Sample Size Calculations for Bayes Factors in two-arm clinical Phase II Trials with binary Endpoints

Riko Kelter

Comments 53 pages, 10 figures

2603.01670 2026-03-03 math.PR math.ST stat.ML stat.TH

Statistical Consistency of Discrete-to-Continuous Limits of Determinantal Point Processes

Hugo Jaquard, Nicolas Keriven

2603.01653 2026-03-03 stat.AP

Probabilistic forecasting of weather-driven faults in electricity networks: a flexible approach for extreme and non-extreme events

Mateus Maia, Daniela Castro-Camilo, Jethro Browell

2603.01588 2026-03-03 cs.LG stat.ML

Jump Like A Squirrel: Optimized Execution Step Order for Anytime Random Forest Inference

Daniel Biebert, Christian Hakert, Kay Heider, Daniel Kuhse, Sebastian Buschjäger, Jian-Jia Chen

2603.01514 2026-03-03 cs.LG stat.ML

Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning

Gautam Goel, Mahdi Soltanolkotabi, Peter Bartlett

2603.01468 2026-03-03 stat.ME stat.ML

Wild Bootstrap Inference for Non-Negative Matrix Factorization with Random Effects

Kenichi Satoh

2603.01434 2026-03-03 math.ST q-fin.RM stat.TH

A Laplace-based perspective on conditional mean risk sharing

Christopher Blier-Wong

2603.01428 2026-03-03 stat.AP astro-ph.IM

A Hybrid Particle Gaussian Mixture Filtering Method for Cislunar Orbit Determination Under Extreme Uncertainty

Ishan Paranjape, Tarun Hejmadi, Utkarsh Ranjan Mishra, Suman Chakravorty

Comments 8 pages; submitted to the 2026 IEEE FUSION conference

2603.01402 2026-03-03 stat.ME

Wrapped flat-top kernel density estimation with circular data

Yasuhito Tsuruta

Comments Accepted for publication in Statistical Papers

2603.01378 2026-03-03 stat.ME

Integration of Individual Participant and Aggregate Data Under Dataset Shift: Summary Statistic Comparison and Scalable Computation

Ming-Yueh Huang, Jing Qin, Chiung-Yu Huang

2603.01376 2026-03-03 cs.LG stat.ML

3BASiL: An Algorithmic Framework for Sparse plus Low-Rank Compression of LLMs

Mehdi Makni, Xiang Meng, Rahul Mazumder

Comments The Thirty-ninth Annual Conference on Neural Information Processing Systems

2603.01346 2026-03-03 cs.LG stat.ML

Relatively Smart: A New Approach for Instance-Optimal Learning

Shaddin Dughmi, Alireza F. Pour

2603.01339 2026-03-03 stat.ML cs.LG

Causal Effects with Unobserved Unit Types in Interacting Human-AI Systems

William Overman, Sadegh Shirani, Mohsen Bayati

2603.01337 2026-03-03 stat.ML cs.LG

Adaptive Estimation and Inference in Conditional Moment Models via the Discrepancy Principle

Jiyuan Tan, Vasilis Syrgkanis

2603.01309 2026-03-03 cs.LG stat.ML

PAC Guarantees for Reinforcement Learning: Sample Complexity, Coverage, and Structure

Joshua Steier

Comments 43 pages

2603.01304 2026-03-03 cs.LG stat.ML

Nonconvex Latent Optimally Partitioned Block-Sparse Recovery via Log-Sum and Minimax Concave Penalties

Takanobu Furuhashi, Hiroki Kuroda, Masahiro Yukawa, Qibin Zhao, Hidekata Hontani, Tatsuya Yokota

Comments 13 pages, 11 figures

2603.01293 2026-03-03 cs.LG cs.AI stat.ML

Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models

Adel Javanmard, Baharan Mirzasoleiman, Vahab Mirrokni

Comments 35 pages, 5 figures

2603.01268 2026-03-03 cs.DS cs.IT math.IT math.PR math.ST stat.TH

Achievability of Heterogeneous Hypergraph Recovery from its Graph Projection

Alexander Morgan, Chenghao Guo

2603.01237 2026-03-03 stat.ME

Robust measures of dispersion for circular data with an anomaly detection rule

Houyem Demni, Mia Hubert, Giovanni C. Porzio, Peter J. Rousseeuw

2603.01230 2026-03-03 stat.CO

Stochastic Neural Networks for Causal Inference with Missing Confounders

Yaxin Fang, Faming Liang

Comments Accepted at the International Conference on Learning Representations (ICLR) 2026

2603.01184 2026-03-03 cs.LG cs.AI q-bio.NC stat.CO

Scaling of learning time for high dimensional inputs

Carlos Stein Brito

Comments 14 pages, 5 figures

2603.01119 2026-03-03 stat.ME cs.AI

Robust Weighted Triangulation of Causal Effects Under Model Uncertainty

Rohit Bhattacharya, Ina Ocelli, Ted Westling

Comments 17 pages

2603.01117 2026-03-03 cs.DL cs.SI econ.GN q-fin.EC stat.AP

China leads scientific trends; the West launches new ones

Jeffrey W. Lockhart, Jamshid Sourati, Feng Shi, James Evans

Comments 16 pages, 4 figures

2603.01085 2026-03-03 stat.AP stat.CO

Recovery-Informed Forecasting Strategy Enhancement

Feng Li, Taozhu Ruan

2603.01081 2026-03-03 stat.AP

Issue-Specific Polarization and Cohesion in a Multi-Party Legislature: Integrating the Latent Space Item Response Model with Topic-Based Regression

Seungju Lee, In-Kyun Kim, Ick Hoon Jin

2603.01077 2026-03-03 math.DS cs.NA cs.SY eess.SY math.NA stat.ML

Kernel Methods for Stochastic Dynamical Systems with Application to Koopman Eigenfunctions: Feynman-Kac Representations and RKHS Approximation

Boumediene Hamzi, Houman Owhadi, Umesh Vaidya

2603.01057 2026-03-03 physics.flu-dyn stat.AP

Extreme-value statistics of curl-of-vorticity precursor peaks in perturbed Taylor-Green vortex turbulence

Satori Tsuzuki

2603.01047 2026-03-03 cs.LG stat.ML

Evaluating GFlowNet from partial episodes for stable and flexible policy-based training

Puhua Niu, Shili Wu, Xiaoning Qian

Comments Accepted by ICLR 2026

2602.23629 2026-03-03 stat.ML cs.LG math.ST stat.AP stat.ME stat.TH

Multivariate Spatio-Temporal Neural Hawkes Processes

Christopher Chukwuemeka, Hojun You, Mikyoung Jun

Comments 16 pages, 20 figures (including supplementary material)

2602.22803 2026-03-03 stat.ME

Rejoinder to the discussants of the two JASA articles `Frequentist Model Averaging' and `The Focused Information Criterion', by Nils Lid Hjort and Gerda Claeskens

Nils Lid Hjort, Gerda Claeskens

Comments 16 pages; Statistical Research Report, Department of Mathematics, University of Oslo, August 2003, and arXiv'd February 2026 (v2 simply corrects spelling in v1). This rejoinder to the two papers `FMA' and `FIC' is published in JASA, 2003, vol. 98, pages 938-945, at this url: tandfonline.com/doi/abs/10.1198/016214503000000882

2602.19113 2026-03-03 cs.LG cs.AI stat.ML

Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training

Wei Chen, Junle Chen, Yuqian Wu, Yuxuan Liang, Xiaofang Zhou

2602.07667 2026-03-03 econ.EM stat.AP stat.ML

Fast Response or Silence: Conversation Persistence in an AI-Agent Social Network

Aysajan Eziz

Comments 32 pages, 6 figures, 18 tables

2602.02577 2026-03-03 stat.ML cs.IT cs.LG math.IT

Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions

Shiji Xiao, Yufeng Zhang, Chubo Liu, Yan Ding, Keqin Li, Kenli Li

2601.19957 2026-03-03 stat.CO astro-ph.CO astro-ph.IM

SunBURST: Deterministic GPU-Accelerated Bayesian Evidence via Mode-Centric Laplace Integration

Ira Wolfson

Comments 46 pages, 1 figure, 10 tables

2512.12937 2026-03-03 math.PR math.ST stat.TH

Asymptotic Normality of Subgraph Counts in Sparse Inhomogeneous Random Graphs

Sayak Chatterjee, Anirban Chatterjee, Abhinav Chakraborty, Bhaswar B. Bhattacharya

Comments Revised version. 28 pages, 3 figures

2511.03087 2026-03-03 stat.ME math.OC math.ST stat.ML stat.TH

Beyond Maximum Likelihood: Variational Inequality Estimation for Generalized Linear Models

Linglingzhi Zhu, Jonghyeok Lee, Yao Xie

2511.02137 2026-03-03 stat.ML cs.LG stat.ME

DoFlow: Flow-based Generative Models for Interventional and Counterfactual Forecasting on Time Series

Dongze Wu, Feng Qiu, Yao Xie

Comments Accepted to the 14th International Conference on Learning Representations (ICLR 2026)

2510.08409 2026-03-03 stat.ML cs.LG

Optimal Stopping in Latent Diffusion Models

Yu-Han Wu, Quentin Berthet, Gérard Biau, Claire Boyer, Romuald Elie, Pierre Marion

2510.05060 2026-03-03 cs.LG math.ST stat.ML stat.TH

ResCP: Reservoir Conformal Prediction for Time Series Forecasting

Roberto Neglia, Andrea Cini, Michael M. Bronstein, Filippo Maria Bianchi

Comments ICLR 2026

2510.03605 2026-03-03 cs.AI cs.LG stat.ML

Understanding the Role of Training Data in Test-Time Scaling

Adel Javanmard, Baharan Mirzasoleiman, Vahab Mirrokni

Comments 25 pages, 5 figures, accepted in ICLR 2026

2510.00504 2026-03-03 stat.ML cond-mat.dis-nn cs.IT cs.LG math.IT

A universal compression theory for lottery ticket hypothesis and neural scaling laws

Hong-Yi Wang, Di Luo, Tomaso Poggio, Isaac L. Chuang, Liu Ziyin

Comments 26 pages. Accepted by ICLR 2026 conference

2509.26560 2026-03-03 stat.ML cs.LG q-bio.NC

Estimating Dimensionality of Neural Representations from Finite Samples

Chanwoo Chun, Abdulkadir Canatar, SueYeon Chung, Daniel Lee

2509.21513 2026-03-03 cs.LG cs.AI cs.CV math.PR stat.ML

DistillKac: Few-Step Image Generation via Damped Wave Equations

Weiqiao Han, Chenlin Meng, Christopher D. Manning, Stefano Ermon

Comments Accepted to ICLR 2026

2509.02391 2026-03-03 cs.LG cs.GT stat.ML

Gaming and Cooperation in Federated Learning: What Can Happen and How to Monitor It

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh

Comments Published in Transactions on Machine Learning Research (TMLR), 2026. Camera-ready version. OpenReview: https://openreview.net/forum?id=Ck3q5YdWIv

2508.11727 2026-03-03 cs.LG stat.ML

Causal Structure Learning in Hawkes Processes with Complex Latent Confounder Networks

Songyao Jin, Biwei Huang

2508.09576 2026-03-03 stat.ME

Decoding Neuronal Ensembles from Spatially-Referenced Calcium Traces: A Bayesian Semiparametric Approach

Laura D'Angelo, Francesco Denti, Antonio Canale, Michele Guindani

2507.07469 2026-03-03 stat.ML cs.LG econ.EM

A Projection-Based ARIMA Framework for Nonlinear Dynamics in Macroeconomic and Financial Time Series: Closed-Form Estimation and Rolling-Window Inference

Haojie Liu, Zihan Lin

2507.05958 2026-03-03 math.ST stat.TH

Importance sampling for Sobol' indices estimation

Haythem Boucharif, Jérôme Morio, Paul Rochet

2506.16965 2026-03-03 cs.LG stat.ML

RocketStack: Level-aware Deep Recursive Ensemble Learning Architecture

Çağatay Demirel

Comments 32 pages, 1 graphical abstract, 8 figures, 10 tables, 2 supplementary figures

Abstract

Ensemble learning remains a cornerstone of machine learning, with stacking used to integrate predictions from multiple base learners through a meta-model. However, deep stacking remains uncommon due to feature redundancy, complexity, and computational burden. To address these limitations, RocketStack is introduced as a level-aware recursive stacking architecture explored up to ten stacking levels, extending beyond prior architectures. At level 1, base-learner predictions are fused with original features; at later levels, weaker learners are incrementally pruned using out-of-fold (OOF) scores. To curb early saturation, pruning is regularized by applying Gaussian perturbations at two noise scales to OOF scores prior to model selection for next-level stacking, alongside deterministic pruning. To control feature growth, periodic compression is applied at levels 3, 6, and 9 using Simple, Fast, Efficient (SFE) filtering, attention-based selection, and autoencoders. Across 33 datasets (23 binary, 10 multi-class), increasing accuracy with depth is confirmed by linear mixed-effects trend tests, and the best meta-model per level increasingly outperforms the best standalone ensemble. OOF-perturbed pruning is found to improve stability and late-level gains, while periodic compression is found to yield substantial runtime and dimensionality reductions with minimal accuracy drop. At the deepest level, accuracy slightly surpasses established deep tabular baselines. When hyperparameter optimization is performed on baseline models, early performance is boosted; however, untuned RocketStack closes the gap with depth and remains competitive at later levels. It achieves deep recursive stacking with sublinear computational growth and provides a modular, depth-aware foundation for scalable decision fusion as model pools and feature spaces evolve.
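
The pruning step in this abstract can be sketched as follows, assuming a simple keep-the-top-k rule; the abstract does not give the actual pruning rule or noise scales, so `keep`, `noise_scale`, and the function name are hypothetical. With `noise_scale=0.0` the sketch reduces to deterministic pruning on the raw out-of-fold (OOF) scores.

```python
import random

def prune_models(oof_scores, keep, noise_scale=0.0, rng=None):
    """Keep the `keep` models with the highest OOF scores, optionally after
    adding zero-mean Gaussian noise to each score before ranking."""
    rng = rng or random.Random(0)
    perturbed = {name: score + rng.gauss(0.0, noise_scale)
                 for name, score in oof_scores.items()}
    ranked = sorted(perturbed, key=perturbed.get, reverse=True)
    return ranked[:keep]

# Usage: deterministic pruning versus pruning on perturbed scores.
scores = {"gbm": 0.90, "rf": 0.88, "svm": 0.87, "knn": 0.80}
kept_plain = prune_models(scores, keep=2)
kept_noisy = prune_models(scores, keep=2, noise_scale=0.02, rng=random.Random(3))
```

The noise lets a model whose score is close to the cut survive occasionally, which is one plausible reading of how perturbation could curb early saturation.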

2505.16953 2026-03-03 cs.LG stat.ML

ICYM2I: The illusion of multimodal informativeness under missingness

Young Sang Choi, Vincent Jeanselme, Pierre Elias, Shalmali Joshi

Comments Published as a conference paper at ICLR 2026

2505.14042 2026-03-03 cs.LG cs.CV stat.ML

Adversarially Pretrained Transformers May Be Universally Robust In-Context Learners

Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki

Comments ICLR26

2504.08428 2026-03-03 stat.ME cond-mat.stat-mech cs.LG math.ST stat.ML stat.TH

Standardization of Weighted Ranking Correlation Coefficients

Pierangelo Lombardo

Comments 24 pages, 5 figures

2504.08214 2026-03-03 stat.CO stat.ME

An Optimal Transport-Based Generative Model for Bayesian Posterior Sampling

Ke Li, Wei Han, Yuexi Wang, Yun Yang

2502.21278 2026-03-03 cs.LG stat.ML

Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion

Kulin Shah, Alkis Kalavasis, Adam R. Klivans, Giannis Daras

Comments 33 pages

2502.11788 2026-03-03 stat.AP

Comparison of offset and ratio weighted regressions in tweedie models with application to mid-term cancellations

Boucher Jean-Philippe, Coulibaly Raïssa

Comments 30 pages, 9 figures, 1 table

2502.00251 2026-03-03 stat.ME

Two-stage least squares with treatment-covariate interactions for treatment effect heterogeneity

Anqi Zhao, Peng Ding, Fan Li

2411.19305 2026-03-03 stat.ML cs.LG math.DS

LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations

Pengpeng Xiao, Phillip Si, Peng Chen

2410.08939 2026-03-03 stat.CO stat.ME stat.ML

Linear-cost unbiased posterior estimates for crossed effects and matrix factorization models via couplings

Paolo Maria Ceriani, Andrea Pandolfi, Giacomo Zanella

Comments 48 pages, 10 figures, 1 table

2410.04264 2026-03-03 stat.ML cs.LG

Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement

Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Niclas Goring, Ouns El Harzli, Abdurrahman Hadi Erturk, Soufiane Hayou, Ard A. Louis

Comments Published at ICLR 2026

2407.15256 2026-03-03 math.ST econ.EM stat.TH

Weak-instrument-robust subvector inference in instrumental variables regression: A subvector Lagrange multiplier test and properties of subvector Anderson-Rubin confidence sets

Malte Londschien, Peter Bühlmann

2406.16227 2026-03-03 stat.ML cs.LG stat.ME

VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data

Jackie Rao, Paul D. W. Kirk

2406.02701 2026-03-03 stat.CO

MPCR: Multi-Precision Computations Package in R

Mary Lai O. Salvana, Sameh Abdulah, Minwoo Kim, David Helmy, Ying Sun, Marc G. Genton

2309.13324 2026-03-03 stat.ME

Targeted Learning on Variable Importance Measure for Heterogeneous Treatment Effect

Haodong Li, Alan E Hubbard, Oliver J Hines, Andrea M Storås, Kajsa Kvist, Mark van der Laan

2603.01033 2026-03-03 stat.ME stat.AP stat.OT

Interpreting Net Survival: What We Estimate Versus What We Think We Estimate

Matthew J. Smith

Comments 21 pages, 4 figures

2603.00973 2026-03-03 stat.AP stat.ME

A Dirichlet-Multinomial-Poisson framework for the coherent analysis and forecast of cause-specific mortality

Andrea Nigri, Han Lin Shang, Francesco Ungolo

2603.00971 2026-03-03 stat.ML cs.LG math.ST stat.TH

Random Features for Operator-Valued Kernels: Bridging Kernel Methods and Neural Operators

Mike Nguyen, Nicole Mücke

2603.00968 2026-03-03 stat.ML cs.LG stat.ME

Learning with the Nash-Sutcliffe loss

Hristos Tyralis, Georgia Papacharalampous

Comments 77 pages, 4 figures, 6 tables

2603.00955 2026-03-03 stat.ME cs.AI

Beyond False Discovery Rate: A Stepdown Group SLOPE Approach for Grouped Variable Selection

Xuelin Zhang, Jingxuan Liang, Xinyue Liu, Hong Chen, Biqin Song

2603.00935 2026-03-03 stat.ML cs.LG

Time-Aware Latent Space Bayesian Optimization

Tuan A. Vu, Julien Martinelli, Harri Lähdesmäki

2603.00927 2026-03-03 stat.ME

Laplace Variational Inference for Bayesian Envelope Models

Seunghyeon Kim, Kwangmin Lee, Yeonhee Park

Comments 63 pages, 4 figures. Code available at https://github.com/Seunghyeon-Kim-stat/env-LVI

2603.00888 2026-03-03 cs.LG stat.ML

Probabilistic Learning and Generation in Deep Sequence Models

Wenlong Chen

Comments PhD thesis

Details

English abstract

Despite the exceptional predictive performance of deep sequence models (DSMs), the main concern about their deployment centers on their lack of uncertainty awareness. In contrast, probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Notably, Bayesian methods leverage Bayes' rule to express our beliefs about unobserved variables in a principled way. Since exact Bayesian inference is computationally infeasible at scale, approximate inference is required in practice. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. In Chapters 3 and 4, we investigate how the architectures of DSMs themselves can inform the design of priors or approximations in probabilistic models. We first develop an approximate Bayesian inference method tailored to the Transformer, based on the similarity between attention and sparse Gaussian processes. Next, we exploit the long-range memory preservation capability of HiPPOs (High-order Polynomial Projection Operators) to construct an interdomain inducing point for Gaussian processes, which successfully memorizes the history in online learning. In addition to the progress of DSMs in predictive tasks, sequential generative models consisting of a sequence of latent variables have become popular in the domain of deep generative models. Inspired by the explicit self-supervised signals for these latent variables in diffusion models, in Chapter 5 we explore the possibility of improving other generative models with self-supervision for their sequential latent states, and investigate desired probabilistic structures over them. Overall, this thesis leverages inductive biases in DSMs to design probabilistic inference or structure, bridging the gap between DSMs and probabilistic models and leading to mutually reinforced improvement.

2603.00875 2026-03-03 cs.CE cs.SY eess.SY stat.AP

Battery Lifetime Prediction using Data-driven Modeling Approaches

Vikram C Patil

Details

English abstract

Batteries are ubiquitous today, with applications ranging from smartphones, watches, and laptops to electric cars, drones, and electric aircraft. Lithium-ion batteries are widely used in these applications due to their high energy density, rechargeability, and low lifecycle cost. Understanding the lifetime of lithium-ion batteries is essential for their effective utilization across many domains. In this study, data-driven modeling approaches are explored to predict the lifetime of lithium-ion batteries using various measurable battery parameters. A battery dataset from NASA's electric aircraft experiments was used, which included 17 predictor variables and remaining flight time as the response variable representing battery lifetime. The dataset contained more than 4,000,000 rows. However, the original dataset provided limited directly useful information about battery utilization over time; therefore, feature engineering was performed to generate more informative variables. Additionally, dimensionality reduction using principal component analysis (PCA) was applied to reduce computational cost and model complexity by selecting a smaller number of principal components as predictors for model development. Random forest and neural network models were explored for battery lifetime prediction using the engineered features. Multiple neural network configurations were evaluated, including single- and double-hidden-layer architectures with varying numbers of nodes. Mean squared error (MSE) on the test dataset was used as the performance metric for model comparison. The results indicate that data-driven modeling approaches are effective for battery lifetime prediction, with neural network models outperforming other models based on the MSE metric. Furthermore, neural networks demonstrate robustness in handling high-dimensional battery data.

2603.00849 2026-03-03 math.ST stat.TH

A new kernel-based index for the global sensitivity analysis of models with correlated inputs

Troy Larsen, Alen Alexanderian

Comments 22 pages

2603.00819 2026-03-03 math.NA cs.LG cs.NA math.ST stat.TH

A short tour of operator learning theory: Convergence rates, statistical limits, and open questions

Simone Brugiapaglia, Nicola Rares Franco, Nicholas H. Nelsen

Comments 12 pages

2603.00794 2026-03-03 math.ST stat.TH

A New Look at the Visual Performance of Nonparametric Hazard Rate Estimators

Olaf Gefeller, Nils Lid Hjort

Comments 8 pages, no figures. Statistical Research Report, Department of Mathematics, University of Oslo, June 1997, but now arXiv'd March 2026. Has later appeared in "Data Highways and Information Flooding, a Challenge for Classification and Data Analysis", 1997, Springer Verlag

2603.00784 2026-03-03 math.PR math.ST stat.TH

On the time a diffusion process spends along a line

Nils Lid Hjort, Rafail Zalmonovich Khasminskii

Comments 16 pages, 0 figures; Statistical Research Report, Department of Mathematics, University of Oslo, October 1992, but now arXiv'd in March 2026. The paper is published, in essentially this form, in Stochastic Processes and their Applications, 1993, vol. 47, pages 229-247, and may be found at this url: www.sciencedirect.com/science/article/pii/030441499390016W

2603.00750 2026-03-03 math.ST math.PR stat.TH

A simple integral representation of single-event scoring rules

Alexander R. Pruss

2603.00749 2026-03-03 stat.ME

Hidden in Plain Sight: How Non-Collapsibility Biases Treatment Effects in (Network) Meta-Analysis

Harlan Campbell, Jeroen P. Jansen

2603.00734 2026-03-03 stat.ME stat.AP

Robust Power and Sample Size Calculations in Quasi-likelihood Models: Methods and Practice

Shijie Yuan, Amy Cochran, Paul Rathouz

2603.00716 2026-03-03 cs.LG stat.ML

Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^π$ Realizability for Deterministic Dynamics

Yijing Ke, Zihan Zhang, Ruosong Wang

2603.00636 2026-03-03 cs.LG physics.ao-ph stat.ML

Retrodictive Forecasting: A Proof-of-Concept for Exploiting Temporal Asymmetry in Time Series Prediction

Cedric Damour

Comments 27 pages, 13 figures, 5 tables, Code available at https://github.com/cdamour/retrodictive-forecasting (Zenodo: https://doi.org/10.5281/zenodo.18803446)

Details

English abstract

We propose a retrodictive forecasting paradigm for time series: instead of predicting the future from the past, we identify the future that best explains the observed present via inverse MAP optimization over a Conditional Variational Autoencoder (CVAE). This conditioning is a statistical modeling choice for Bayesian inversion; it does not assert that future events cause past observations. The approach is theoretically grounded in an information-theoretic arrow-of-time measure: the symmetrized Kullback-Leibler divergence between forward and time-reversed trajectory ensembles provides both the conceptual rationale and an operational GO/NO-GO diagnostic for applicability. We implement the paradigm as MAP inference over an inverse CVAE with a learned RealNVP normalizing-flow prior and evaluate it on six time series cases: four synthetic processes with controlled temporal asymmetry and two ERA5 reanalysis datasets (wind speed and solar irradiance). The work makes four contributions: (i) a formal retrodictive inference formulation; (ii) an inverse CVAE architecture; (iii) a model-free irreversibility diagnostic; and (iv) a falsifiable validation protocol with four pre-specified predictions. All pre-specified predictions are empirically supported: the diagnostic correctly classifies all six cases; the learned flow prior improves over an isotropic Gaussian baseline on GO cases; the inverse MAP yields no spurious advantage on time-reversible dynamics; and on irreversible GO cases, it achieves competitive or superior RMSE relative to forward baselines, with a statistically significant 17.7% reduction over a forward MLP on ERA5 solar irradiance. These results provide a structured proof-of-concept that retrodictive forecasting can constitute a viable alternative to conventional forward prediction when statistical time-irreversibility is present and exploitable.
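
The symmetrized Kullback-Leibler divergence named in the abstract can be illustrated for the simplest case of two discrete distributions. This is only a minimal sketch: how the paper estimates the divergence from forward and time-reversed trajectory ensembles is not described in this listing, and the function name `symmetrized_kl` and the strictly-positive-probability requirement are assumptions made for illustration.

```python
import math
from typing import Sequence


def symmetrized_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetrized KL divergence KL(p||q) + KL(q||p) for discrete distributions.

    Illustrative helper (hypothetical name). Both inputs are assumed to be
    probability vectors over the same support with strictly positive entries.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        raise ValueError("all probabilities must be strictly positive")
    # KL(p||q) + KL(q||p) simplifies to sum_i (p_i - q_i) * log(p_i / q_i).
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


forward = [0.9, 0.1]
reversed_ = [0.5, 0.5]
print(symmetrized_kl(forward, reversed_))  # positive when the two differ
print(symmetrized_kl(forward, forward))    # zero for identical distributions
```

A value of zero means the two distributions are indistinguishable; larger values indicate stronger asymmetry between them.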

2603.00553 2026-03-03 math.ST stat.TH

Minimax Simple Bayes Estimators of a Normal Variance

Yuzo Maruyama

Comments 6 pages

2603.00410 2026-03-03 stat.ME

Sensitivity Analysis for False Discovery Rate Estimation with Published p-Values

Tianyu Cao, Sangyoon Yi, Joshua Habiger

2603.00393 2026-03-03 physics.geo-ph cs.LG stat.ML

Dual-space posterior sampling for Bayesian inference in constrained inverse problems

Ali Siahkoohi, Kamal Aghazade, Ali Gholami

Details

English abstract

Inverse problems constrained by partial differential equations are often ill-conditioned due to noisy and incomplete data or inherent non-uniqueness. A prominent example is full waveform inversion, which estimates Earth's subsurface properties by fitting seismic measurements subject to the wave equation, where ill-conditioning is inherent to noisy, band-limited, finite-aperture data and shadow zones. Casting the inverse problem into a Bayesian framework allows for a more comprehensive description of its solution, where instead of a single estimate, the posterior distribution characterizes non-uniqueness and can be sampled to quantify uncertainty. However, no clear procedure exists for translating hard physical constraints, such as the wave equation, into prior distributions amenable to existing sampling techniques. To address this, we perform posterior sampling in the dual space using an augmented Lagrangian formulation, which translates hard constraints into penalties amenable to sampling algorithms while ensuring their exact satisfaction. We achieve this by seamlessly integrating the alternating direction method of multipliers (ADMM) with Stein variational gradient descent (SVGD) -- a particle-based sampler -- where the constraint is relaxed at each iteration and multiplier updates progressively enforce satisfaction. This enables constrained posterior sampling while inheriting the favorable conditioning properties of dual-space solvers, where partial constraint relaxation allows productive updates even when the current model is far from the true solution. We validate the method on a stylized Rosenbrock conditional inference problem and on frequency-domain full waveform inversion for a Gaussian anomaly model and the Marmousi II benchmark, demonstrating well-calibrated uncertainty estimates and posterior contraction with increasing data coverage.

2603.00365 2026-03-03 stat.AP econ.GN q-fin.EC

Randomized Recruitment Driven Sampling

Adam Visokay, Laura Boudreau, Rachel M. Heath, Tyler H. McCormick

2603.00347 2026-03-03 stat.ME

Synthetic Priors

Nick Polson, Vadim Sokolov

2603.00343 2026-03-03 stat.ME

Causal Inference with MNAR Self-Masking Confounders: A Stratified Delta-Imputed Propensity Estimation Method

Md. Niamul Islam Sium, Mohammad Hridoy Patwary

Comments Under Review

2603.00333 2026-03-03 math.OC cs.NA math.NA stat.ML

Dynamic Proximal Gradient Algorithms for Schatten-$p$ Quasi-Norm Regularized Problems

Weiping Shen, Linglingzhi Zhu, Yaohua Hu, Chong Li, Xiaoqi Yang

2603.00322 2026-03-03 stat.ME

Fast distance computation of multivariate distributions via nonparanormal transport

Edward Shao, Junyoung Park, Naresh Punjabi, Hui Jiang, Irina Gaynanova

Comments 21 pages 9 figures

2603.00291 2026-03-03 econ.EM stat.AP

Anticorruption Enforcement and Sale Mechanism Choice in China's Land Market

Julia Manso

2603.00277 2026-03-03 stat.ME stat.CO

CliPS -- How to identify cluster distributions in Bayesian mixture models

Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, Bettina Grün

2602.21487 2026-03-03 math.ST stat.TH

Moment bounds for condition numbers and singular values of high-dimensional Gaussian random matrices: Applications and limitations

Partha Sarkar, Kshitij Khare, Sanvesh Srivastava

2602.19691 2026-03-03 stat.ML cs.LG cs.NA math.NA

Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations

Yuhao Liu, Zilin Wang, Lei Wu, Shaobo Zhang

2602.13104 2026-03-03 stat.ML cs.LG math.ST stat.TH

Random Forests as Statistical Procedures: Design, Variance, and Dependence

Nathaniel S. O'Connell

Comments 55 pages (35 page main text; 20 page supplement); 10 figures (9 main text; 1 supplement). Version 2: Added procedure-aligned synthetic resampling (PASR) estimation framework, pointwise prediction and confidence intervals, and comprehensive simulations validating theoretical claims

2601.18075 2026-03-03 stat.ME

Maximum-Variance-Reduction Stratification for Improved Subsampling

Dingyi Wang, Haiying Wang, Qingpei Hu

2601.12175 2026-03-03 q-fin.ST stat.AP

Distributional Fitting and Tail Analysis of Lead-Time Compositions: Nights vs. Revenue on Airbnb

Harrison E. Katz, Jess Needleman, Liz Medina

2601.04906 2026-03-03 math.ST stat.TH

Inference for concave distribution functions under measurement error

Mohammed Es-Salih Benjrada, Cecile Durot, Tommaso Lando

2601.04584 2026-03-03 math.PR math.ST stat.TH

Distributional Limits for Eigenvalues of Graphon Kernel Matrices

Behzad Aalipur

2601.03059 2026-03-03 stat.ME

On the bias of the Hoover index estimator: Results for the gamma distribution

Roberto Vila, Helton Saulo

Comments 16 pages, 2 figures

2512.06116 2026-03-03 stat.AP q-bio.QM

Spatial Analysis for AI-segmented Histopathology Images: Methods and Implementation

Yoolkyu Park, Fangjiang Wu, Xin Feng, Shengjie Yang, Elizabeth H. Wang, Bo Yao, Chul Moon, Guanghua Xiao, Qiwei Li

Comments 44 pages, 2 figures

2511.10967 2026-03-03 stat.CO math.OC math.ST stat.TH

Autocovariance and Optimal Design for Random Walk Metropolis-Hastings Algorithm

Jingyi Zhang, James C. Spall

2511.00944 2026-03-03 stat.ME econ.EM

On the estimation of leverage effect and volatility of volatility in the presence of jumps

Qiang Liu, Zhi Liu, Wang Zhou

2510.21314 2026-03-03 cs.LG cs.AI stat.ML

A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization

Xuan Tang, Jichu Li, Difan Zou

Comments 68 pages, 13 figures, ICLR 2026

2509.23357 2026-03-03 cs.LG math.OC stat.ML

Landing with the Score: Riemannian Optimization through Denoising

Andrey Kharitenko, Zebang Shen, Riccardo de Santi, Niao He, Florian Doerfler

Comments 41 pages, 9 figures

2509.22240 2026-03-03 eess.IV cs.CV cs.LG stat.AP stat.ML

COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics

Matt Y. Cheung, Ashok Veeraraghavan, Guha Balakrishnan

Comments Accepted at ICLR 2026

2509.20323 2026-03-03 cs.LG math.OC stat.ML

A Recovery Guarantee for Sparse Neural Networks

Sara Fridovich-Keil, Mert Pilanci

Comments ICLR 2026

2507.21783 2026-03-03 stat.AP cs.LG stat.ME stat.ML

Domain Generalization and Adaptation in Intensive Care with Anchor Regression

Malte Londschien, Manuel Burger, Gunnar Rätsch, Peter Bühlmann

2507.07815 2026-03-03 stat.ME stat.CO

Vecchia approximated Bayesian heteroskedastic Gaussian processes

Parul V. Patil, Robert B. Gramacy, Cayelan C. Carey, R. Quinn Thomas

Comments 33 pages, 14 figures

2505.12096 2026-03-03 cs.LG cs.AI stat.ML

When Bias Meets Trainability: Connecting Theories of Initialization

Alberto Bassi, Marco Baity-Jesi, Aurelien Lucchi, Carlo Albert, Emanuele Francazi

2505.07800 2026-03-03 stat.ME stat.AP

Moderation effects and elasticities in compositional regression with a total. Application to Bayesian spatiotemporal modelling of all-cause mortality from environmental stressors

Germà Coenders, Javier Palarea-Albaladejo, Marc Saez, Maria A. Barceló

Details

DOI: 10.1007/s00477-026-03183-5
Journal ref: Stochastic Environmental Research and Risk Assessment, 40 (2026), 56

English abstract

Compositional regression models with a real-valued response variable can generally be specified as log-contrast models subject to a zero-sum constraint on the model coefficients. This formulation emphasises the relative information conveyed in the composition, while the overall total is regarded as irrelevant. In this work, such a setting is extended to account not only for total effects, formally defined in a so-called T-space, but also for moderation or interaction effects. This is applied in the context of complex spatiotemporal data modelling, through an adaptation of the integrated nested Laplace approximation (INLA) method within a Bayesian estimation framework. Particular emphasis is placed on the interpretation of model coefficients and results, both on the original scale of the response variable and in terms of elasticities. The methodology is demonstrated through a detailed case study investigating the relationship between all-cause mortality and the interaction between extreme temperatures, air pollution composition, and total air pollution in Catalonia, Spain, during the summer of 2022. The results indicate that extreme temperatures are associated with an increased risk of mortality four days after exposure. Additionally, exposure to total air pollution, especially to NO2, is linked to elevated mortality risk regardless of temperature. In contrast, particulate matter is associated with increased mortality only when exposure occurs on days of extreme heat.
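
The zero-sum constraint mentioned at the start of the abstract can be made concrete with the standard log-contrast form. The notation below ($y_i$, $x_{ij}$, $\beta_j$, $D$, $c$) is assumed for illustration and is not taken from the paper; it covers only the basic model, not the T-space or moderation extensions. The second line shows why the constraint makes the total irrelevant: rescaling every part of the composition by the same factor $c$ leaves the linear predictor unchanged.

```latex
% Notation assumed for illustration; not taken from the paper.
\begin{align*}
y_i &= \beta_0 + \sum_{j=1}^{D} \beta_j \log x_{ij} + \varepsilon_i,
  \qquad \sum_{j=1}^{D} \beta_j = 0, \\
\sum_{j=1}^{D} \beta_j \log (c\, x_{ij})
  &= \log c \sum_{j=1}^{D} \beta_j + \sum_{j=1}^{D} \beta_j \log x_{ij}
   = \sum_{j=1}^{D} \beta_j \log x_{ij}
  \quad \text{for any } c > 0.
\end{align*}
```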

2503.22739 2026-03-03 econ.GN q-fin.EC stat.AP

The "Days of Learning" Metric for Education Evaluations

Gregory Camilli

2503.09026 2026-03-03 stat.ME

A Sparse Linear Model for Positive Definite Estimation of Covariance Matrices

Rakheon Kim, Irina Gaynanova

2502.12063 2026-03-03 stat.ML cs.LG math.OC math.ST stat.ME stat.TH

Low-Rank Thinning

Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey

2501.04272 2026-03-03 stat.ML cs.LG

On weight and variance uncertainty in neural networks for regression tasks

Moein Monemi, Morteza Amini, S. Mahmoud Taheri, Mohammad Arashi

Comments Submitted to journal

2412.17879 2026-03-03 stat.AP stat.ME

Strategy to control biases in prior event rate ratio method, with application to palliative care in patients with advanced cancer

Xiangmei Ma, Grace Meijuan Yang, Qingyuan Zhuang, Yin Bun Cheung

Comments 35 pages, including 3 tables, 1 figure and 3 supplemental materials

2412.05397 2026-03-03 stat.ME

Network Structural Equation Models for Causal Mediation and Spillover Effects

Ritoban Kundu, Peter X. K. Song

2411.04340 2026-03-03 cs.HC cs.CY stat.CO

Survival of the Notable: Gender Asymmetry in Wikipedia Collective Deliberations

Khandaker Tasnim Huq, Giovanni Luca Ciampaglia

Details

DOI: 10.1145/3757663
Journal ref: Proc. ACM Hum.-Comput. Interact. 9, 7, Article CSCW482 (November 2025), 29 pages

English abstract

Communities on the web rely on open conversation forums for a number of tasks, including governance, information sharing, and decision making. However, these forms of collective deliberation can often result in biased outcomes. A prime example is the Articles for Deletion (AfD) discussions on Wikipedia, which allow editors to gauge the notability of existing articles and which, as prior work has suggested, may play a role in perpetuating the notorious gender gap of Wikipedia. Prior attempts to address this question have been hampered by access to narrow observation windows, reliance on limited subsets of both biographies and editorial outcomes, and by potential confounding factors. To address these limitations, here we adopt a competing risk survival framework to fully situate biographical AfD discussions within the full editorial cycle of Wikipedia content. We find that biographies of women are nominated for deletion faster than those of men, despite editors taking longer to reach a consensus on deleting biographies of women, even after controlling for the size of the discussion. Furthermore, we find that AfDs about historical figures show a strong tendency to result in the redirecting or merging of the biography under discussion into other encyclopedic entries, and that there is a striking gender asymmetry: biographies of women are redirected or merged into biographies of men more often than the other way round. Our study provides a more complete picture of the role of AfD in the gender gap of Wikipedia, with implications for the governance of the open knowledge infrastructure of the web.

2410.21603 2026-03-03 stat.ME stat.CO

Approximate Bayesian Computation with Statistical Distances for Model Selection

Clara Grazian

2409.12446 2026-03-03 cs.LG cs.AI math.ST stat.ML stat.TH

Neural Networks Generalize on Low Complexity Data

Sourav Chatterjee, Timothy Sudijono

Comments 37 pages. Small corrections made

2407.18341 2026-03-03 stat.ME

Generalizing the Finkelstein-Schoenfeld Test to Incorporate Multiple Alternating Thresholds

Yunhan Mou, Tassos Kyriakides, Scott Hummel, Fan Li, Yuan Huang

2406.04098 2026-03-03 stat.ML cs.LG

A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

Lukas Burk, John Zobolas, Bernd Bischl, Andreas Bender, Marvin N. Wright, Raphael Sonabend

Comments 44 pages, 20 figures

Details

English abstract

This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and existing model classes through proper empirical evaluation. Existing benchmarks in the survival literature are smaller in scale regarding the number of used datasets and extent of empirical evaluation. They often lack appropriate tuning or evaluation procedures, while other comparison studies focus on qualitative reviews rather than quantitative comparisons. This comprehensive study aims to fill the gap by neutrally evaluating a broad range of methods and providing generalizable guidelines for practitioners. We benchmark 19 models, ranging from classical statistical approaches to many common machine learning methods, on 34 publicly available datasets. The benchmark tunes models using both a discrimination measure (Harrell's C-index) and a scoring rule (Integrated Survival Brier Score), and evaluates them across six metrics covering discrimination, calibration, and overall predictive performance. Despite superior average ranks in overall predictive performance from individual learners like oblique random survival forests and likelihood-based boosting, and better discrimination rankings from multiple boosting- and tree-based methods as well as parametric survival models, no method significantly outperforms the commonly used Cox proportional hazards model for either tuning measure. We conclude that for predictive purposes in the standard survival analysis setting of low-dimensional, right-censored data, the Cox proportional hazards model remains a simple and robust method, sufficient for most practitioners. All code, data, and results are publicly available on GitHub at https://github.com/slds-lmu/paper_2023_survival_benchmark
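
For readers unfamiliar with the discrimination measure named in the abstract, Harrell's C-index can be sketched in a few lines. This is a minimal illustration and not the benchmark's implementation: the function name `harrell_c_index` is hypothetical, pairs with tied observed times are simply skipped, and ties in the risk score are counted as half-concordant, which is one common convention.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) and a strictly shorter observed time than subject j.
    The pair is concordant when subject i also has the higher risk score.
    Simplifying assumption: pairs with tied times are ignored.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]  # the third subject is right-censored
print(harrell_c_index(times, events, [4.0, 3.0, 2.0, 1.0]))  # perfectly ordered risks
print(harrell_c_index(times, events, [1.0, 1.0, 1.0, 1.0]))  # uninformative risks
```

A value of 1 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and values below 0.5 indicate systematically reversed rankings.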

2404.11345 2026-03-03 stat.ME

Jacobi Prior: An Alternative Bayesian Method for Supervised Learning

Sourish Das, Shouvik Sardar

Comments 44 pages, 10 figures

2403.17609 2026-03-03 stat.ME stat.CO

Estimation Method under Three-Parameter Generalized Exponential Model: Consistency, Uniqueness and its Applications

Kiran Prajapat, Sharmishtha Mitra, Debasis Kundu

Comments Accepted for publication in the Japanese Journal of Statistics and Data Science

Details

English abstract

In numerous instances, the generalized exponential distribution can be used as an alternative to the most widely used non-regular families of distributions (the three-parameter Weibull, gamma, and lognormal) when analyzing lifetime or any skewed continuous data. A non-regular family is a class of probability distributions that do not satisfy the regularity conditions typically assumed in classical statistical inference. Some key features of such a family of distributions are: the support of its probability density function depends on one of its parameters; its likelihood function may not be bounded for a certain range of the parameter space, hence maximum likelihood estimators do not exist; the likelihood function may not even be differentiable or integrable as needed, hence the Fisher information may not exist or may be infinite. Moreover, standard results like MLE existence, consistency, and asymptotic normality may fail. Therefore, specialized or robust inferential techniques are needed. This article offers a consistent method for estimating the parameters of a three-parameter generalized exponential distribution that sidesteps the issue of an unbounded likelihood function. The method hinges on maximum likelihood estimation of the shape and scale parameters using a location-invariant statistic. Important estimator properties, such as uniqueness and consistency, are demonstrated for the first time under this approach. In addition, quantile estimates for the assumed distribution are provided. We present a Monte Carlo simulation study along with comparisons to a number of well-known estimation techniques in terms of bias and root mean square error. For illustrative purposes, a real dataset from reliability engineering has been analyzed, and the goodness of fit along with the bootstrap confidence intervals are compared with existing traditional methods.

2402.10018 2026-03-03 cs.IT math.IT q-bio.QM stat.AP

Two-Stage Decoding Algorithm and Bounds for Group Testing with Prior Statistics

Ayelet C. Portnoy, Amit Solomon, Alejandro Cohen

2402.06223 2026-03-03 cs.LG cs.CV stat.ML

Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning

Yuhang Liu, Zhen Zhang, Dong Gong, Erdun Gao, Biwei Huang, Mingming Gong, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

2308.13905 2026-03-03 stat.ME math.ST stat.TH

Estimation and Hypothesis Testing of Derivatives in Smoothing Spline ANOVA Models

Ruiqi Liu, Kexuan Li, Meng Li

2210.11611 2026-03-03 stat.AP

3D Bivariate Spatial Modelling of Argo Ocean Temperature and Salinity

Mary Lai Salvana, Jian Cao, Mikyoung Jun

2603.00216 2026-03-03 math.ST math.PR stat.TH

The relative efficiency of sequential tests

Henri Doerks, Erik Ekström, Yuqiong Wang

2603.00135 2026-03-03 econ.EM stat.ME

Shift-Share Designs in Political Science

Peter Kyungtae Park

2603.00105 2026-03-03 cs.LG cs.CL stat.ME stat.ML

LIDS: LLM Summary Inference Under the Layered Lens

Dylan Park, Yingying Fan, Jinchi Lv

Comments 48 pages, 15 figures

2603.00100 2026-03-03 stat.AP cs.LG

Using Artificial Neural Networks to Predict Claim Duration in a Work Injury Compensation Environment

Anthony Almudevar

Comments 8 pages; 9 figures; 6 tables

2603.00098 2026-03-03 stat.OT cs.CY cs.LG econ.GN math.PR q-fin.EC

Profiling vs. Case-specific Evidence: A Probabilistic Analysis

Marcello Di Bello, Nicolò Cangiotti, Michele Loi

Comments 16 pages

2603.00039 2026-03-03 cs.LG cs.AI stat.ML

CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation

Jitian Zhao, Changho Shin, Tzu-Heng Huang, Satya Sai Srinath Namburi GNVV, Frederic Sala