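arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.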

2604.03218 2026-04-06 math.ST math.PR stat.ML stat.TH

Power one sequential tests exist for weakly compact $\mathscr P$ against $\mathscr P^c$

Ashwin Ram, Aaditya Ramdas

Comments Preprint

Abstract

Suppose we observe data from a distribution $P$ and we wish to test the composite null hypothesis that $P\in\mathscr P$ against a composite alternative $P\in \mathscr Q\subseteq \mathscr P^c$. Herbert Robbins and coauthors pointed out around 1970 that, while no batch test can have a level $\alpha\in(0,1)$ and power equal to one, sequential tests can be constructed with this fantastic property. Since then, and especially in the last decade, a plethora of sequential tests have been developed for a wide variety of settings. However, the literature has not yet provided a clean and general answer as to when such power-one sequential tests exist. This paper provides a remarkably general sufficient condition (that we also prove is not necessary). Focusing on i.i.d. laws in Polish spaces without any further restriction, we show that there exists a level-$\alpha$ sequential test for any weakly compact $\mathscr P$, that is power-one against $\mathscr P^c$ (or any subset thereof). We show how to aggregate such tests into an $e$-process for $\mathscr P$ that increases to infinity under $\mathscr P^c$. We conclude by building an $e$-process that is asymptotically relatively growth rate optimal against $\mathscr P^c$, an extremely powerful result.

2604.03215 2026-04-06 stat.ME stat.AP

Directional Dependence of Extreme Events

Matthieu Garcin, Maxime L. D. Nicolas

2604.03076 2026-04-06 stat.AP

Carbon cost pass-through rate in power system: evidence from Italy under the EU ETS

Pierdomenico Duttilo, Francesco Lisi

2604.03073 2026-04-06 stat.ME

Modeling within-department homogeneity in research quality rankings: an application to the Italian ISPD

Giorgio E. Montanari, Marco Doretti

Comments 20 pages, 6 figures, 2 tables

2604.03068 2026-04-06 cond-mat.dis-nn cond-mat.stat-mech stat.ML

Escape dynamics and implicit bias of one-pass SGD in overparameterized quadratic networks

Dario Bocchi, Theotime Regimbeau, Carlo Lucibello, Luca Saglietti, Chiara Cammarota

Comments 30 pages, 6 figures

2604.03015 2026-04-06 cs.LG math.PR stat.ML

Generating DDPM-based Samples from Tilted Distributions

Himadri Mandal, Dhruman Gupta, Rushil Gupta, Sarvesh Ravichandran Iyer, Agniv Bandyopadhyay, Achal Bassamboo, Varun Gupta, Sandeep Juneja

Comments 33 pages, 4 figures

2604.02992 2026-04-06 stat.OT stat.AP stat.ME

Why is Regularization Underused? An Empirical Study on Trust and Adoption of Statistical Methods

Konstantin Emil Thiel, Marléne Baumeister, Nicole Krämer, Andreas Groll, Markus Pauly, Magdalena Wischnewski

2604.02887 2026-04-06 stat.ML cs.LG

Lipschitz bounds for integral kernels

Justin Reverdi, Sixin Zhang, Fabrice Gamboa, Serge Gratton

2604.02886 2026-04-06 stat.ME q-bio.GN q-bio.QM stat.AP stat.ML

High-dimensional Many-to-many-to-many Mediation Analysis

Tien Dat Nguyen, Trung Khang Tran, Cong Khanh Truong, Duy-Cat Can, Binh T. Nguyen, Oliver Y. Chén

2604.02849 2026-04-06 cs.NE stat.ML

Frame Theoretical Derivation of Three Factor Learning Rule for Oja's Subspace Rule

Taiki Yamada

Comments 5 pages note

2604.02802 2026-04-06 stat.ME

A Scale-Invariant Entropy Statistic for Distance Distributions

Mohamed Gewily

Comments 8 pages

2604.02739 2026-04-06 stat.ME stat.ML

Quotient-Based Posterior Analysis for Euclidean Latent Space Models

Kisung You, Mauro Giuffrè

2604.02738 2026-04-06 stat.ML cs.LG math.OC stat.CO

State estimations and noise identifications with intermittent corrupted observations via Bayesian variational inference

Peng Sun, Ruoyu Wang, Xue Luo

Comments 8 pages, 6 figures

2604.02722 2026-04-06 math.ST stat.TH

Parameter Estimation of Incomplete Gamma Subordinators

Meena Sanjay Babulal, Sunil Kumar Gauttam, Aditya Maheshwari

2604.02678 2026-04-06 stat.ME cs.AI stat.AP

Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis

Yao Zhao, Zhiyue Zhang, Yanxun Xu

Abstract

Clinical evidence synthesis requires identifying relevant trials from large registries and aggregating results that account for population differences. While recent LLM-based approaches have automated components of systematic review, they do not support end-to-end evidence synthesis. Moreover, conventional meta-analysis weights studies by statistical precision without considering clinical compatibility reflected in eligibility criteria. We propose EligMeta, an agentic framework that integrates automated trial discovery with eligibility-aware meta-analysis, translating natural-language queries into reproducible trial selection and incorporating eligibility alignment into study weighting to produce cohort-specific pooled estimates. EligMeta employs a hybrid architecture separating LLM-based reasoning from deterministic execution: LLMs generate interpretable rules from natural-language queries and perform schema-constrained parsing of trial metadata, while all logical operations, weight computations, and statistical pooling are executed deterministically to ensure reproducibility. The framework structures eligibility criteria and computes similarity-based study weights reflecting population alignment between target and comparator trials. In a gastric cancer landscape analysis, EligMeta reduced 4,044 candidate trials to 39 clinically relevant studies through rule-based filtering, recovering all 13 guideline-cited trials. In an olaparib adverse events meta-analysis across four trials, eligibility-aware weighting shifted the pooled risk ratio from 2.18 (95% CI: 1.71-2.79) under conventional Mantel-Haenszel estimation to 1.97 (95% CI: 1.76-2.20), demonstrating quantifiable impact of incorporating eligibility alignment. EligMeta bridges automated trial discovery with eligibility-aware meta-analysis, providing a scalable and reproducible framework for evidence synthesis in precision medicine.

2604.02664 2026-04-06 stat.ME astro-ph.IM stat.AP

A comparison of methods for Poisson regression in the presence of background

Massimiliano Bonamente, Vinay Kashyap, Xiaoli Li, Jelle de Plaa

Comments Submitted to ApJ

2604.02659 2026-04-06 cs.LG cs.AI cs.NA math.NA stat.ML

Low-Rank Compression of Pretrained Models via Randomized Subspace Iteration

Farhad Pourkamali-Anaraki

Comments 13 pages

2604.02610 2026-04-06 stat.ML cs.LG

Structure-Preserving Multi-View Embedding Using Gromov-Wasserstein Optimal Transport

Rafael Pereira Eufrazio, Eduardo Fernandes Montesuma, Charles Casimiro Cavalcante

Comments This manuscript is currently under review for possible publication in the journal Signal Processing (ELSEVIER)

2604.02595 2026-04-06 stat.ME

Multi-Site Health Research Integrating Complementary Data Sources: A Scoping Review of Statistical Inference Methods for Vertically Partitioned Data

Marie-Pier Domingue, Simon Lévesque, Anita Burgun, Jean-François Ethier, Félix Camirand Lemyre

Abstract

To address the multidimensional nature of health-related questions, advances in health research often require integrating information from various data sources within statistical analyses. When complementary information pertaining to the same set of individuals is distributed across different institutions, vertical methods make it possible to obtain analysis results without sharing or pooling individual-level data. To guide stakeholders toward a transparent use of vertical methods, this study aims to (1) identify existing vertical methods enabling statistical inference; and (2) characterize the methodological properties of these methods and the current extent of their use with health data. We conducted a scoping review using four interdisciplinary databases. We then systematically extracted the characteristics of identified vertical methods with respect to comparability with the pooled analysis, efficiency of communication schemes, and confidentiality. We additionally screened studies that cited included articles to identify applications on vertically partitioned real-world health data. Among 2887 articles initially screened, 30 were included in the review. Inference for the linear and logistic regression frameworks was the most frequent statistical inference task undertaken in proposed methods. Equivalence with the pooled analyses was not systematically addressed and most methods required multiple communications between participating parties. Almost all articles described their approach as privacy-preserving, although only a minority provided privacy assessments. The scope of existing approaches enabling statistical inference for vertically partitioned data is still relatively limited. Most existing methods do not concurrently achieve results equivalent to centralized analyses, high communication efficiency, and guaranteed protection of individual-level data.

2604.02581 2026-04-06 stat.ML cs.LG cs.NA math.NA

Learning interacting particle systems from unlabeled data

Viska Wei, Fei Lu

Comments 39 pages, 7 figures

2604.02526 2026-04-06 stat.AP

Applied Statistics Requires Scientific Context

Ashley I Naimi

Comments 12 pages, 1 figure, 63 references

Abstract

Statistical methods are indispensable to scientific inference. However, there exists a longstanding tension across a wide range of scientific disciplines about the role that "context" should play in the application of statistical methods and the interpretation of statistical results. Though frequently invoked, the notion of "scientific context" refers to at least two distinct concepts: a set of foundational, nuanced, and elusive background assumptions and substantive features of a given area of study that shape the validity and reliability of statistical methods; and more quantifiable contextual issues that affect the performance of statistical methods and interpretation of statistical results. I argue here that the application and interpretation of statistical methods require careful consideration of foundational contextual issues. To motivate the arguments, I review a recent re-formulation of the $p$-value as a measure of divergence between an observed dataset and a set of assumptions used to construct statistical measures. I use this framework to illustrate the role that context plays in two randomized trials: one on low-dose aspirin for pregnancy loss, and one on a new inhibitor of a key biochemical pathway affecting ankylosing spondylitis. Finally, I note that the adoption of low significance thresholds in genome-wide association studies and high energy particle physics has been successful more because of the extensive validity-checking gauntlets and contextual considerations that have accompanied these low thresholds than because of the low thresholds themselves. I use these illustrations and arguments to suggest that (i) the adoption of a universal threshold for significance testing should be abandoned as a goal of statistics reform; and (ii) the validity and optimal use of applied statistical tools require careful consideration of nuanced scientific context.

2604.02507 2026-04-06 stat.ML cs.LG

Reinforcement Learning from Human Feedback: A Statistical Perspective

Pangpang Liu, Chengchun Shi, Will Wei Sun

2604.02489 2026-04-06 stat.ME

Sequentially-Rerandomized Switchback Experiments

Zhenghao Zeng, Christopher Adjaho, Alonso Bucarey, Chao Qin, Ruixuan Zhang, Paul Hoban, Ramesh Johari, Stefan Wager

Comments 33 pages, 10 figures

2604.02454 2026-04-06 stat.AP

Remote, bivariate expert elicitation to determine the prior probability distribution for sample size calculation in a Bayesian non-inferiority multicenter randomized controlled trial (Croup Dosing Trial)

Arlene Jiang, Alex Aregbesola, Apoorva Gangwani, Terry P. Klassen, Amy C. Plint, Elisabete Doyle, William Craig, Mohamed Eltorki, Banke Oketola, Hoda Badra, Yongdong Ouyang, Anna Heath

Abstract

Prior distributions must be specified for the parameters of interest in a Bayesian clinical trial. When existing evidence on the effects of the trial interventions is limited, prior distributions can be constructed with expert elicitation. However, conventional elicitation requires face-to-face interactions and intensive pre-elicitation training, which can be infeasible. Our remote elicitation was based on established expert elicitation methods. We used bivariate prior distributions for dependencies between elicited quantities. We elicited a prior distribution for the Croup Dosing Trial, which will assess the number of return visits to the emergency department within 7 days in children with croup. This trial evaluates the non-inferiority of 0.15 mg/kg of dexamethasone, compared to the standard dose of 0.60 mg/kg to treat croup. We conducted three remote workshops to elicit expert beliefs on the efficacy of the two doses of dexamethasone. Each workshop consisted of two survey rounds, separated by a group discussion. Prior to the workshop, experts reviewed provided literature on the effects of the two doses of dexamethasone. Beliefs were aggregated with expert-specific bivariate distributions. The aggregated distribution and surveyed non-inferiority margin determined the sample size. Twelve emergency medicine physicians participated in our remote elicitation exercise. The elicitation generated a prior distribution centered at 6% for the 0.60 mg/kg dose and 8% for the 0.15 mg/kg dose. The aggregated prior distribution produced a sample size of 1850, based on a non-inferiority margin of 4%. We elicited a prior distribution that incorporated past evidence and expert opinion. The elicited prior is consistent with literature on the efficacy of the dexamethasone doses in treating croup. Our approach demonstrates the feasibility of remotely eliciting bivariate distributions for clinical trials.

2604.02403 2026-04-06 econ.EM cs.CL stat.ME

Measuring What Cannot Be Surveyed: LLMs as Instruments for Latent Cognitive Variables in Labor Economics

Cristian Espinal Maya

Comments Working paper. 13 pages, 7 figures, 6 references. Part of the Cognitive Factor Economics research program. Code: https://github.com/Cespial/cognitive-factor-economics

2604.02400 2026-04-06 stat.AP

Varying risk exposure in auto insurance: a weighted Tweedie framework for experience rating and cancellation penalties

Jean-Philippe Boucher, Raïssa Coulibaly, Julien Trufin

Comments 31 pages, 22 figures, 4 tables

2604.02394 2026-04-06 q-bio.GN stat.ME

Benchmarking Heritability Estimation Strategies Across 86 Configurations and Their Downstream Effect on Polygenic Risk Score Performance

Muhammad Muneeb, David B. Ascher

2604.02380 2026-04-06 q-bio.GN math.MG stat.ME

VeloTree: Inferring single-cell trajectories from RNA velocity fields with varifold distances

Elodie Maignant, Tim Conrad, Christoph von Tycowicz

Comments arXiv admin note: text overlap with arXiv:2507.11313

2604.02336 2026-04-06 math.FA math.ST stat.TH

Stationary Process Invertibility and the Unilateral Shift Operator

Anand Ganesh, Babhrubahan Bose, Anand Rajagopalan

Comments 4 pages

2604.01735 2026-04-06 stat.AP physics.data-an

Correlation analysis of the dispersion of SARS-CoV-2 in Mexico

Pablo Carlos López, Marcos Flores, Soham Biswas

Comments 8 pages, 6 figures

2603.29889 2026-04-06 econ.EM stat.ML

Penalized GMM Framework for Inference on Functionals of Nonparametric Instrumental Variable Estimators

Edvard Bakhitov

Comments Previously circulated as "Automatic Debiased Machine Learning in Presence of Endogeneity"

2603.29415 2026-04-06 math.ST math.PR stat.TH

Concentration of the bootstrap empirical process, with applications to statistical inference

Guillaume Maillard, Adrien Saumard

2603.28681 2026-04-06 stat.ML cs.LG

Functional Natural Policy Gradients

Aurelien Bibaut, Houssam Zenati, Thibaud Rahier, Nathan Kallus

2603.26227 2026-04-06 stat.ML cs.LG

Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms

Ayaka Sakata, Haruka Tanzawa

Comments 53 pages, 11 figures

2603.24705 2026-04-06 stat.ME cs.LG econ.EM

Amortized Inference for Correlated Discrete Choice Models via Equivariant Neural Networks

Easton Huch, Michael Keane

2603.10288 2026-04-06 math.ST stat.TH

Version-Robust Methods for Identifying Minimal Sufficient Statistics

Rafael Oliveira Cavalcante, Alexandre Galvão Patriota

Comments 29 pages (now it provides a generalization to separable spaces)

2512.16383 2026-04-06 cs.LG stat.ML

Multivariate Uncertainty Quantification with Tomographic Quantile Forests

Takuya Kanazawa

Comments 36 pages. v2: matches published version

2512.03537 2026-04-06 cs.LG stat.ML

Pushing the Limits of Distillation-Based Continual Learning via Classifier-Proximal Lightweight Plugins

Zhiming Xu, Baile Xu, Jian Zhao, Furao Shen, Suorong Yang

Comments 10 pages, 8 figures, 2 tables

2512.00508 2026-04-06 stat.ME

High-dimensional Autoregressive Modeling for Time Series with Hierarchical Structures

Lan Li, Shibo Yu, Yingzhou Wang, Guodong Li

2511.21595 2026-04-06 stat.ME math.ST stat.TH

Degrees of Freedom in Penalized Regression: Model Selection with Adaptive Penalties

Mauro Bernardi, Antonio Canale, Marco Stefanucci

2511.13394 2026-04-06 cs.LG stat.ML

Fast and Robust Simulation-Based Inference With Optimization Monte Carlo

Vasilis Gkolemis, Christos Diou, Michael U. Gutmann

Comments Accepted at AISTATS 2026

2511.01154 2026-04-06 math.PR cs.LG math.ST stat.TH

Stability of the Kim–Milman flow map

Sinho Chewi, Aram-Alexandre Pooladian, Matthew S. Zhang

2510.15483 2026-04-06 stat.ML cs.LG

Fast Best-in-Class Regret for Contextual Bandits

Samuel Girard, Aurelien Bibaut, Arthur Gretton, Nathan Kallus, Houssam Zenati

2510.15075 2026-04-06 cs.LG stat.ML

Physics-informed data-driven machine health monitoring for two-photon lithography

Sixian Jia, Zhiqiao Dong, Chenhui Shao

2510.02513 2026-04-06 stat.ML cs.DS cs.LG cs.NA math.NA stat.CO

Adaptive randomized pivoting and volume sampling

Ethan N. Epperly

Comments 14 pages, 2 figures

2509.25708 2026-04-06 stat.ME stat.AP

Modeling Spatial Heterogeneity in Exposure Buffers and Risk: A Hierarchical Bayesian Approach

Saskia Comess, Daniel E Ho, Joshua L Warren

Comments Submitted to the Journal of the Royal Statistical Society, Series C

2509.05221 2026-04-06 stat.ME

A functional tensor model for dynamic multilayer networks with common invariant subspaces and the RKHS estimation

Runshi Tang, Runbing Zheng, Anru R. Zhang, Carey E. Priebe

2509.04603 2026-04-06 stat.AP cs.LG

DRtool: An Interactive Tool for Analyzing High-Dimensional Clusterings

Justin Lin, Julia Fukuyama

Comments 32 pages, 14 figures

2506.06845 2026-04-06 stat.CO stat.ML

Linear Discriminant Analysis with Gradient Optimization

Cencheng Shen, Yuexiao Dong

Comments 26 pages

2505.21723 2026-04-06 stat.CO cs.LG stat.ML

Are Statistical Methods Obsolete in the Era of Deep Learning? A Study of ODE Inverse Problems

Skyler Wu, Shihao Yang, S. C. Kou

Comments 35 pages, 11 figures (main text)

2505.07647 2026-04-06 math.PR stat.ML

Langevin Diffusion Approximation to Same Marginal Schrödinger Bridge

Medha Agarwal, Zaid Harchaoui, Garrett Mulcahy, Soumik Pal

Comments Final version. arXiv admin note: substantial text overlap with arXiv:2406.10823

2504.19331 2026-04-06 math.ST stat.TH

Bahadur asymptotic efficiency in the zone of moderate deviation probabilities

Mikhail Ermakov

Comments 9 pages

2504.01938 2026-04-06 cs.LG cs.NA math.NA stat.ML

A Unified Approach to Analysis and Design of Denoising Markov Models

Yinuo Ren, Grant M. Rotskoff, Lexing Ying

2503.10773 2026-04-06 stat.ML cs.LG

Learn then Decide: A Learning Approach for Designing Data Marketplaces

Yingqi Gao, Wenlu Xu, Jin J. Zhou, Hua Zhou, Yong Chen, Xiaowu Dai

2503.04876 2026-04-06 stat.ME math.ST stat.TH

Estimation of relative risk, odds ratio and their logarithms with guaranteed accuracy and controlled sample size ratio

Luis Mendo

Comments 47 pages, 19 figures

2502.11868 2026-04-06 stat.ME

Phylogenetic latent space models for network data

Federico Pavone, Daniele Durante, Robin J. Ryder

2502.00092 2026-04-06 math.ST cond-mat.dis-nn math.MG math.PR stat.TH

Minkowski tensors for point clouds and voxelized data: robust, asymptotically unbiased estimators

Daniel Hug, Michael A. Klatt, Dominik Pabst

Comments Substantially revised version

2410.06128 2026-04-06 cs.LG stat.ML

Amortized Inference of Causal Models via Conditional Fixed-Point Iterations

Divyat Mahajan, Jannes Gladrow, Agrin Hilmkil, Cheng Zhang, Meyer Scetbon

Comments Transactions on Machine Learning Research (TMLR) 2025 (J2C Certification). ICLR 2026

2409.11167 2026-04-06 stat.ME math.ST stat.TH

Using fractional derivatives to derive marginal densities

Si-Yang Li, David A. van Dyk, Maximilian Autenrieth

2404.03198 2026-04-06 stat.ME

Delaunay Weighted Two-sample Test for High-dimensional Data by Incorporating Geometric Information

Jiaqi Gu, Ruoxu Tan, Guosheng Yin

2402.01207 2026-04-06 cs.LG cs.AI stat.ME

Efficient Causal Graph Discovery Using Large Language Models

Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, Yoshua Bengio

2308.02005 2026-04-06 stat.ME

Randomization-Based Inference for Average Treatment Effects in Inexactly Matched Observational Studies

Jianan Zhu, Jeffrey Zhang, Zijian Guo, Siyu Heng

2109.11142 2026-04-06 stat.ME math.ST stat.TH

Sparse PCA: A New Scalable Estimator Based On Integer Programming

Kayhan Behdin, Rahul Mazumder

Comments To appear in the Annals of Statistics

2104.04590 2026-04-06 econ.EM stat.ME

Identification of Dynamic Panel Logit Models with Fixed Effects

Christopher Dobronyi, Jiaying Gu, Kyoo il Kim, Thomas M. Russell