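arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.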

2603.20184 2026-03-23 cs.LG stat.ML

Kolmogorov-Arnold causal generative models

Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, Juan Parras

Comments 14 pages, 8 figures, 3 tables, 5 algorithms, preprint

Abstract

Causal generative models provide a principled framework for answering observational, interventional, and counterfactual queries from observational data. However, many deep causal models rely on highly expressive architectures with opaque mechanisms, limiting auditability in high-stakes domains. We propose KaCGM, a causal generative model for mixed-type tabular data where each structural equation is parameterized by a Kolmogorov-Arnold Network (KAN). This decomposition enables direct inspection of learned causal mechanisms, including symbolic approximations and visualization of parent-child relationships, while preserving query-agnostic generative semantics. We introduce a validation pipeline based on distributional matching and independence diagnostics of inferred exogenous variables, allowing assessment using observational data alone. Experiments on synthetic and semi-synthetic benchmarks show competitive performance against state-of-the-art methods. A real-world cardiovascular case study further demonstrates the extraction of simplified structural equations and interpretable causal effects. These results suggest that expressive causal generative modeling and functional transparency can be achieved jointly, supporting trustworthy deployment in tabular decision-making settings. Code: https://github.com/aalmodovares/kacgm

2603.20155 2026-03-23 cs.LG cs.CV stat.ML

Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans

2603.20135 2026-03-23 math.ST cs.IT math.IT stat.TH

Classifier-Based Nonparametric Sequential Hypothesis Testing

Chia-Yu Hsu, Shubhanshu Shekhar

2603.20134 2026-03-23 econ.EM math.ST stat.TH

Triple/Double-Debiased Lasso

Denis Chetverikov, Jesper R. -V. Sørensen, Aleh Tsyvinski

Comments 47 pages, 10 figures

2603.20082 2026-03-23 math.ST stat.ME stat.TH

Inference in high-dimensional logistic regression under tensor network dependence

Josh Miles, Sohom Bhattacharya

2603.20071 2026-03-23 stat.ME math.ST stat.TH

Posterior inference via Hill's prediction model

Pier Giovanni Bissiri, Chris Holmes, Stephen G. Walker

Comments 23 pages, 7 figures

2603.20070 2026-03-23 math.ST cond-mat.stat-mech cs.CC stat.ML stat.TH

The monotonicity of the Franz-Parisi potential is equivalent with Low-degree MMSE lower bounds

Konstantinos Tsirkas, Leda Wang, Ilias Zadik

Comments 92 pages

2603.20068 2026-03-23 stat.ME stat.CO

Approximate posterior recalibration

Tiffany Cai, Philip Greengard, Ben Goodrich, Andrew Gelman

2603.20052 2026-03-23 physics.ao-ph stat.AP

Uncertainty in wind and solar projections depends on global and regional climate models

Nina Effenberger, Reto Knutti

2603.20022 2026-03-23 stat.ME

Q-approximation of operating characteristics of clinical trial designs

Susanna Gentile, Daniel E. Schwartz, Riddhiman Saha, Lorenzo Trippa

2603.20015 2026-03-23 stat.ME stat.AP

On the Calibration of Bayesian Success Criteria and Operating Characteristics for Clinical Trials

Peng Yang, Li Wang, Ying Yuan

2603.19986 2026-03-23 stat.AP

Probabilistic Estimation of Hidden Migrant Fatalities Along the Central Mediterranean Route

Gregor Zens, Zoe Sigman

2603.19977 2026-03-23 stat.ME

Scalable and Robust Spatial Prediction via Multi-Resolution Ensembles of Predictive Processes

Nicolas Bianco, Nadja Klein

2603.19945 2026-03-23 stat.ME

Cancer Survival Rates Are Misleading

Allen B. Downey

Comments 10 pages, 1 PDF figure. Companion analysis notebook: https://colab.research.google.com/github/AllenDowney/ThinkBayes2/blob/master/examples/cancer.ipynb

2603.19902 2026-03-23 cond-mat.dis-nn stat.ML

A Federated Many-to-One Hopfield model for associative Neural Networks

Andrea Alessandrelli, Fabrizio Durante, Andrea Ladiana, Andrea Lepre

2603.19899 2026-03-23 stat.ML cs.LG stat.AP

Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects

Hao Wang, Licheng Pan, Qingsong Wen, Jialin Yu, Zhichao Chen, Chunyuan Zheng, Xiaoxi Li, Zhixuan Chu, Chao Xu, Mingming Gong, Haoxuan Li, Yuan Lu, Zhouchen Lin, Philip Torr, Yan Liu

2603.19840 2026-03-23 stat.ML cs.LG

Explainable cluster analysis: a bagging approach

Federico Maria Quetti, Elena Ballante, Silvia Figini, Paolo Giudici

2603.19804 2026-03-23 math.ST stat.ML stat.TH

Uncertainty Quantification Via the Posterior Predictive Variance

Sanjay Chaudhuri, Dean Dustin, Bertrand Clarke

2603.19799 2026-03-23 stat.ME stat.CO

Estimation of Multivariate Functional Principal Components from Sparse Functional Data

Uche Mbaka, Michelle Carey

2603.19792 2026-03-23 cs.LG cs.DS stat.CO stat.ME stat.ML

Scalable Learning of Multivariate Distributions via Coresets

Zeyu Ding, Katja Ickstadt, Nadja Klein, Alexander Munteanu, Simon Omlor

Comments AISTATS 2026

2603.19756 2026-03-23 stat.AP

Extraction of tabulated statistical results with tableParser

Ingmar Böschen

Comments 16 pages, 14 tables

Abstract

Tabulated content is omnipresent in scientific literature. This work presents the R package *tableParser*, designed to extract and postprocess tables from NISO-JATS-encoded XML, HTML, DOCX, and, with limitations, PDF documents. *tableParser* focuses on extracting and analyzing statistical test results reported in scientific publications. It can be used for large-scale analysis of effect sizes, reporting practices, or summarization of results, as well as for checking completeness and consistency of standard test results in unpublished documents. Documents can be processed at three decoding levels. *table2matrix()* compiles all tables into a list of character matrices with captions and footnotes. *table2text()* collapses the matrix contents into human-readable text, mimicking a screen reader. Optionally, many common codings that are reported within the table's caption and footnote can be used to decode and expand the table's content. The collapsed and decoded table content can be further processed to match an ideal input for the extraction of statistical standard results with the *standardStats()* function from the *JATSdecoder* package. The output of *table2stats()* is a data frame with all detected standard results as columns and, if calculation is possible, a recalculated p-value. If desired, an automated consistency check of the reported and the coded p-values with the recalculated p-value can be initiated. *tableParser* works best on barrier-free HTML tables encoded in NISO-JATS, where captions and footnotes are clearly identifiable. By guessing the tables' captions and footnotes conservatively, the processing of tables within HTML and DOCX documents is comparably robust. Technically, tables in PDFs often fail to be correctly extracted, with captions and footnotes not detectable. Therefore, a decoding of codes is not possible, which lowers *tableParser*'s decoding accuracy on PDFs.

2603.19755 2026-03-23 math.AP stat.ML

Regularity of Solutions to Beckmann's Parametric Optimal Transport

Hanno Gottschalk, Tobias J. Riedlinger

Comments arXiv admin note: text overlap with arXiv:2503.10729

2603.19736 2026-03-23 stat.ML cs.LG

A two-step sequential approach for hyperparameter selection in finite context models

José Contente, Ana Martins, Armando J. Pinho, Sónia Gouveia

2603.19728 2026-03-23 stat.ME

Objective Model Prior Probabilities in Variable Selection

James Berger, Gonzalo García-Donato, Elías Moreno, Luis Pericchi

2603.19703 2026-03-23 math.ST cs.LG stat.TH

Minimax and Adaptive Covariance Matrix Estimation under Differential Privacy

T. Tony Cai, Yicheng Li

2603.19657 2026-03-23 stat.ML cs.LG

Model Selection and Parameter Estimation of Multi-dimensional Gaussian Mixture Model

Xinyu Liu, Hai Zhang

2603.19648 2026-03-23 cs.LG cs.SY eess.SY math.OC stat.ML

Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis

Siddharth Chandak, Anuj Yadav, Ayfer Ozgur, Nicholas Bambos

Comments Submitted to IEEE Transactions on Automatic Control

2603.19633 2026-03-23 cs.LG stat.ML

Alternating Diffusion for Proximal Sampling with Zeroth Order Queries

Hirohane Takagi, Atsushi Nitanda

Comments Accepted to ICLR2026

2603.19629 2026-03-23 stat.ML cs.LG physics.geo-ph

On the role of memorization in learned priors for geophysical inverse problems

Ali Siahkoohi, Davide Sabeddu

2603.18168 2026-03-23 stat.ML cs.LG math.PR

ResNets of All Shapes and Sizes: Convergence of Training Dynamics in the Large-scale Limit

Louis-Pierre Chaintron, Lénaïc Chizat, Javier Maass

2603.17381 2026-03-23 econ.EM stat.ML

An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination

Minchul Shin

Comments 34 pages, no figure

2603.16982 2026-03-23 astro-ph.IM math.DS stat.AP

Trajectory Stability and Signature Diagnostics for Comet-Based Interstellar Navigation

Bo Pieter Johannes Andrée

Comments 31 pages, 2 figures, 4 added references

2603.15781 2026-03-23 stat.ML cs.LG

Learnability with Partial Labels and Adaptive Nearest Neighbors

Nicolas A. Errandonea, Santiago Mazuelas, Jose A. Lozano, Sanjoy Dasgupta

2602.18184 2026-03-23 math.ST math.PR stat.ME stat.TH

Kolmogorov-Type Maximal Inequalities for Independent and Dependent Negative Binomial Random Variables: Sharp Bounds, Sub-Exponential Refinements, and Applications to Overdispersed Count Data

Aristides V. Doumas, S. Spektor

Comments 11 pages, 8 figures, 2 tables

2602.11132 2026-03-23 math.ST stat.ME stat.TH

A New Look at Bayesian Testing

Jyotishka Datta, Nicholas G. Polson, Vadim Sokolov, Daniel Zantedeschi

Comments Revised version addresses proofs and references

2601.20018 2026-03-23 math.ST econ.EM math.PR stat.TH

Decoupling and randomization for double-indexed permutation statistics

Mingxuan Zou, Jingfan Xu, Peng Ding, Fang Han

Comments 42 pages

2511.12435 2026-03-23 stat.ME

Transfer learning for high-dimensional Factor-augmented sparse linear model

Bo Fu, Dandan Jiang

Comments 54 pages, 1 figure

2510.05685 2026-03-23 math.ST math.PR stat.TH

Sample complexity for divergence regularized optimal transport with radial cost

Ruiyu Han, Johannes Wiesel

2510.01803 2026-03-23 stat.AP

The Perceived Impact of Environment on Health in Italy: a Penalized Ordinal Regression Approach

Mattia Stival, Angela Andreella, Gaia Bertarelli, Catarina Midões, Stefano Federico Tonellato, Stefano Campostrini

2509.26385 2026-03-23 stat.ME stat.CO

An Order of Magnitude Time Complexity Reduction for Gaussian Graphical Model Posterior Sampling Using a Reverse Telescoping Block Decomposition

Zejin Gao, Ksheera Sagar, Anindya Bhadra

2509.01597 2026-03-23 cs.CR cs.DS stat.AP

Statistics-Friendly Confidentiality Protection for Establishment Data, with Applications to the QCEW

Kaitlyn Webb, Prottay Protivash, John Durrell, Daniell Toth, Aleksandra Slavković, Daniel Kifer

Comments 42 pages (13 main text, 2 references, and 27 appendix pages), 13 figures (4 in main text)

2508.18716 2026-03-23 stat.AP

Dynamic Count Models with Flexible Innovation Processes for Irregular Maritime Migration

Gregor Zens, Jakub Bijak

2508.15954 2026-03-23 math.OC stat.AP

A Heuristic Framework of Variable Neighborhood Descent Methods for the Large-Scale Multi-Level Facility Location Problem in Supply Chain Networks

Haibo Wang, Bahram Alidaee

Comments 48 pages 3 figures

2505.24144 2026-03-23 math.PR math.ST stat.TH

Sharp Concentration of Simple Random Tensors II: Asymmetry

Jiaheng Chen, Daniel Sanz-Alonso

Comments 42 pages, to appear in Information and Inference

2505.14255 2026-03-23 stat.ME math.PR

Statistical Inference for Quasi-Infinitely Divisible Distributions via Fourier Methods

Vladimir Panov, Anton Ryabchenko

Comments 25 pages, 9 figures

2502.10647 2026-03-23 cs.LG math.ST stat.ML stat.TH

A Power Transform

Jonathan T. Barron

2502.04082 2026-03-23 stat.AP

Market-based insurance ratemaking: application to pet insurance

Pierre-Olivier Goffard, Pierrick Piette, Gareth W. Peters

2501.11868 2026-03-23 stat.ME math.ST stat.ML stat.TH

Automatic Debiased Machine Learning for Smooth Functionals of Nonparametric M-Estimands

Lars van der Laan, Aurelien Bibaut, Nathan Kallus, Alex Luedtke

2501.11421 2026-03-23 cs.LG cs.IT math.IT math.ST stat.TH

Online Clustering of Data Sequences with Bandit Information

G Dhinesh Chandran, Srinivas Reddy Kota, Srikrishna Bhashyam

Abstract

We study the problem of online clustering of data sequences in the multi-armed bandit (MAB) framework under the fixed-confidence setting. There are $M$ arms, each providing i.i.d. samples from a parametric distribution whose parameters are unknown. The $M$ arms form $K$ clusters based on the distance between the true parameters. In the MAB setting, one arm can be sampled at each time. The objective is to estimate the clusters of the arms using as few samples as possible from the arms, subject to an upper bound on the error probability. Our setting allows for arms within a cluster to have non-identical distributions, vector parameter arms, vector observations, and $K \le M$ clusters. We propose and analyze the Average Tracking Bandit Online Clustering (ATBOC) algorithm. ATBOC is asymptotically order-optimal for multivariate Gaussian arms, with expected sample complexity that grows at most twice as fast as the lower bound as $\delta \rightarrow 0$, and this guarantee extends to multivariate sub-Gaussian arms. For single-parameter exponential family arms, ATBOC is asymptotically optimal, matching the lower bound. We also propose two computationally more efficient alternatives: the Lower and Upper Confidence Bound based Bandit Online Clustering Algorithm (LUCBBOC) and Bandit Online Clustering-Elimination (BOC-ELIM). We derive the computational complexity of the proposed algorithms and compare their per-sample runtime through simulations. LUCBBOC and BOC-ELIM require lower per-sample runtime than ATBOC while achieving comparable performance. All the proposed algorithms are $\delta$-probably correct, i.e., the error probability of the cluster estimate at the stopping time is at most $\delta$. We validate the asymptotic optimality guarantees through simulations, and compare our proposed algorithms with other related work through simulations on both synthetic and real-world datasets.

2410.15166 2026-03-23 math.ST stat.ME stat.TH

Adversarial Estimation of Assortment Probabilities under Independence Structure

Alexandre Belloni, Yan Chen, Matthew Harding

2409.19435 2026-03-23 cs.LG stat.CO stat.ML

Simulation-based Inference with the Python Package sbijax

Simon Dirmeier, Antonietta Mira, Carlo Albert

2409.18010 2026-03-23 eess.SY cs.SY math.OC stat.ML

End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic data

Nicolas Chatzikiriakos, Robin Strässer, Frank Allgöwer, Andrea Iannelli

Comments Accepted for publication in Automatica

2409.06271 2026-03-23 stat.ML cs.LG stat.ME

A new paradigm for global sensitivity analysis

Gildas Mazo

2408.12905 2026-03-23 math.ST stat.TH

On the relation between likelihood ratios and p-values for testing success probabilities of Bernoulli trials

Wouter Kager, Ronald Meester

Comments 24 pages, 2 figures

2405.13591 2026-03-23 stat.ME

Practical limitations for real-life application of data fission and data thinning in post-clustering differential analysis

Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P. Hejblum

2405.01425 2026-03-23 cs.DS cs.LG math.ST stat.ML stat.TH

In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies

Yunbum Kook, Santosh S. Vempala, Matthew S. Zhang

Comments To appear in Random Structures & Algorithms; conference version appeared in NeurIPS 2024 (spotlight)

2211.09875 2026-03-23 stat.CO

Mixture of Experts Distributional Regression: Implementation Using Robust Estimation with Adaptive First-order Methods

David Rügamer, Florian Pfisterer, Bernd Bischl, Bettina Grün

Comments arXiv admin note: text overlap with arXiv:2010.06889

2210.12790 2026-03-23 math.ST cond-mat.dis-nn cond-mat.soft math.PR stat.TH

A genuine test for hyperuniformity

Michael A. Klatt, Günter Last, Norbert Henze

Comments 46 pages, 11 figures, 1 table

2603.19577 2026-03-23 math.PR q-bio.QM stat.ME

Stochastic Averaging and Statistical Inference of Glycolytic Pathway

Arnab Ganguly, Hye-Won Kang

Comments 33 pages, 2 figures

2603.19549 2026-03-23 cs.CY cs.AI cs.ET stat.AP

Plagiarism or Productivity? Students Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing

John Paul P. Miranda, Rhiziel P. Manalese, Mark Anthony A. Castro, Renen Paul M. Viado, Vernon Grace M. Maniago, Rudante M. Galapon, Jovita G. Rivera, Amado B. Martinez

Comments 5 pages, 1 figure, 2 tables, conference proceeding

2603.19506 2026-03-23 math.ST stat.ME stat.TH

Doubly-Unlinked Regression for Dependent Data

Anik Burman, Sayantan Choudhury, Debangan Dey

Comments 81 pages, 6 figures, supplementary appendix included

2603.19480 2026-03-23 stat.ME math.ST stat.TH

Regression Adjustments for Double Randomization in Two-Sided Marketplaces

Timothy Sudijono, Lihua Lei, Lorenzo Masoero, Suhas Vijaykumar, Guido Imbens, James McQueen

Comments 72 pages. Comments welcome

2603.19440 2026-03-23 stat.ML cs.LG

Near-Equivalent Q-learning Policies for Dynamic Treatment Regimes

Sophia Yazzourh, Erica E. M. Moodie

Comments 13 pages, 2 figures

2603.19439 2026-03-23 stat.ML cs.LG eess.SP

Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs

Mohammad Eini, Abdullah Karaaslanli, Vassilis Kalantzis, Panagiotis A. Traganitis

2603.19403 2026-03-23 stat.AP

Evaluation of Individual and Trial Level Association Metrics in the Validation of a Binary Surrogate Endpoint for a True Time-to-Event Endpoint

Renee Y. Ge, Azadeh Shohoudi, Malini Iyengar, Quefeng Li, Judy Li

Comments 29 pages, 6 figures

2603.19336 2026-03-23 stat.ME

Coordinate Descent Algorithm for Least Absolute Deviations Regression

Zehaan Naik, Debasis Kundu

Comments 28 pages, 7 figures

2603.19331 2026-03-23 cs.LG stat.ML

FalconBC: Flow matching for Amortized inference of Latent-CONditioned physiologic Boundary Conditions

Chloe H. Choi, Alison L. Marsden, Daniele E. Schiavazzi

2603.19291 2026-03-23 cs.LG cs.AI stat.ML

A Visualization for Comparative Analysis of Regression Models

Nassime Mountasir, Baptiste Lafabregue, Bruno Albert, Nicolas Lachiche

2512.18720 2026-03-23 stat.ML cs.LG

Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning

Feng Yu, MD Saifur Rahman Mazumder, Ying Su, Oscar Contreras Velasco

2509.24005 2026-03-23 cs.LG stat.ML

Does Weak-to-strong Generalization Happen under Spurious Correlations?

Chenruo Liu, Yijun Dong, Qi Lei

2507.16945 2026-03-23 stat.ME

Optimal two-phase sampling designs for generalized raking estimators with multiple parameters of interest

Jasper B. Yang, Bryan E. Shepherd, Thomas Lumley, Pamela A. Shaw

Comments 40 pages (27 main, 13 supplemental); 1 figure, 5 tables

2506.12177 2026-03-23 stat.ME q-bio.QM stat.AP

A proxy-based approach for unmeasured confounding in electronic health records research

Haley Colgate Kottler, Amy Cochran

2505.08729 2026-03-23 stat.ME econ.EM

Which Covariates to Adjust for? Specification-robust Causal Inference in Observational Studies

Aditya Ghosh, Dominik Rothenhäusler

Comments 61 pages, 4 figures

2502.09880 2026-03-23 physics.soc-ph cs.LG cs.SI nlin.AO stat.ML

Interpretable Early Warnings using Machine Learning in an Online Game-experiment

Guillaume Falmagne, Anna B. Stephenson, Simon A. Levin

DOI: 10.1073/pnas.2503493122
Journal ref: PNAS 123(1), e2503493122 (2026)

Abstract

Stemming from physics and later applied to other fields such as ecology, the theory of critical transitions suggests that some regime shifts are preceded by statistical early warning signals. Reddit's r/place experiment, a large-scale social game, provides a unique opportunity to test these signals consistently across thousands of subsystems undergoing critical transitions. In r/place, millions of users collaboratively created "compositions", or pixel-art drawings, in which transitions occur when one composition rapidly replaces another. We develop a machine-learning-based early warning system that combines the predictive power of multiple system-specific time series via gradient-boosted decision trees with memory-retaining features. Our method significantly outperforms standard early warning indicators. Trained on the 2022 r/place data, our algorithm detects half of the transitions occurring within 20 min at a false positive rate of just 3.6%. Its performance remains robust when tested on the 2023 r/place event, demonstrating generalizability across different contexts. Using SHapley Additive exPlanations (SHAP) for interpreting the predictions, we investigate the underlying drivers of warnings, which could be relevant to other complex systems, especially online social systems. We reveal an interplay of patterns preceding transitions, such as critical slowing down or speeding up, a lack of innovation or coordination, turbulent histories, and a lack of image complexity. These findings show the potential of machine learning indicators in socio-ecological systems for predicting regime shifts and understanding their dynamics.

2502.05709 2026-03-23 cs.LG stat.ML

Flow-based Conformal Prediction for Multi-dimensional Time Series

Junghwan Lee, Chen Xu, Yao Xie

2409.06890 2026-03-23 stat.ML cs.LG

Learning Representations for Independence Testing

Nathaniel Xu, Feng Liu, Danica J. Sutherland

Comments v3: as published at TMLR (https://openreview.net/forum?id=pDvKoXRsnW), including many relatively smaller improvements

2305.07433 2026-03-23 stat.AP

Aligning the Western Balkans power sectors with the European Green Deal

Emir Fejzić, Taco Niet, Cameron Wade, Will Usher

Comments 34 pages, 14 figures

1911.01850 2026-03-23 stat.ME stat.AP

Stabilizing Variable Selection and Regression

Niklas Pfister, Evan G. Williams, Jonas Peters, Ruedi Aebersold, Peter Bühlmann