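arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.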

2604.18569 2026-04-21 stat.ML cs.LG

Revisiting Active Sequential Prediction-Powered Mean Estimation

Maria-Eleni Sfyraki, Jun-Kun Wang

Comments Published as a conference paper at ICLR 2026

Abstract

In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. If the label is not queried, the prediction from a machine learning model is used instead. Prior work proposed an elegant scheme that determines the query probability by combining an uncertainty-based suggestion with a constant probability that encodes a soft constraint on the query probability. We explored different values of the mixing parameter and observed an intriguing empirical pattern: the smallest confidence width tends to occur when the weight on the constant probability is close to one, thereby reducing the influence of the uncertainty-based component. Motivated by this observation, we develop a non-asymptotic analysis of the estimator and establish a data-dependent bound on its confidence interval. Our analysis further suggests that when a no-regret learning approach is used to determine the query probability and control this bound, the query probability converges to the maximum value allowed by the constraint when it is chosen obliviously to the current covariates. We also conduct simulations that corroborate these theoretical findings.
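
The setup described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes the commonly used inverse-probability-weighted form of the active prediction-powered mean estimator, and it assumes the query probability is a convex combination of an uncertainty score and a constant. The names `mixed_query_prob`, `active_ppi_mean`, `weight`, and `p_const` are hypothetical.

```python
import numpy as np


def mixed_query_prob(uncertainty, p_const, weight):
    """Mix an uncertainty-based suggestion with a constant probability.

    Assumption: a convex combination, where weight=1 ignores the
    uncertainty score entirely and uses only the constant probability.
    """
    uncertainty = np.clip(np.asarray(uncertainty, dtype=float), 0.0, 1.0)
    return weight * p_const + (1.0 - weight) * uncertainty


def active_ppi_mean(y, preds, query_prob, rng):
    """Inverse-probability-weighted mean estimate with random label queries.

    Each label is queried with its own probability. Unqueried samples
    contribute only the model prediction; queried samples add a
    correction (y - prediction) / query_prob.
    """
    y = np.asarray(y, dtype=float)
    preds = np.asarray(preds, dtype=float)
    query_prob = np.asarray(query_prob, dtype=float)
    if np.any(query_prob <= 0.0) or np.any(query_prob > 1.0):
        raise ValueError("query probabilities must lie in (0, 1]")
    queried = rng.random(y.shape[0]) < query_prob
    correction = np.where(queried, (y - preds) / query_prob, 0.0)
    return float(np.mean(preds + correction)), queried


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    y = rng.normal(loc=1.0, scale=1.0, size=n)
    preds = y + rng.normal(scale=0.5, size=n)   # imperfect model predictions
    uncertainty = rng.uniform(size=n)           # stand-in uncertainty scores
    pi = mixed_query_prob(uncertainty, p_const=0.3, weight=0.9)
    estimate, queried = active_ppi_mean(y, preds, pi, rng)
    print(f"estimate={estimate:.3f}, labels queried={queried.sum()}")
```

Setting `weight` close to one corresponds to the regime the abstract reports as giving the smallest confidence width, in which the uncertainty-based component has little influence.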

2604.18547 2026-04-21 stat.ML cs.CL cs.LG

FUSE: Ensembling Verifiers with Zero Labeled Data

Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès

2604.18523 2026-04-21 cond-mat.dis-nn cs.IT math.IT math.ST stat.TH

BBP transition and the leading eigenvector of the spiked Wigner model with inhomogeneous noise

Leonardo S. Ferreira, Fernando L. Metz

Comments 21 pages, 7 figures

2604.18441 2026-04-21 math.ST cs.LG stat.ML stat.TH

Conformal Robust Set Estimation

Alejandro Cholaquidis, Emilien Joly, Leonardo Moreno

2604.18045 2026-04-21 stat.ME

An ensemble-based approach for multi-fidelity emulation and adaptive sampling

Hossein Mohammadi

2511.02757 2026-04-21 cs.LG math.OC stat.ML

ConMeZO: Adaptive Descent-Direction Sampling for Gradient-Free Finetuning of Large Language Models

Lejs Deen Behric, Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil

2510.16750 2026-04-21 math.ST cs.IT math.IT stat.ML stat.TH

On Robust Hypothesis Testing with respect to the Hellinger Distance

Eeshan Modak, Sivaraman Balakrishnan, Ananda Theertha Suresh

Comments 15 pages, 2 figures. Updated authors list. Some changes in notations and exposition. Shorter version to appear in the proceedings of ISIT 2026

2604.18505 2026-04-21 cs.IT math.IT stat.ML

Bayesian experimental design: grouped geometric pooled posterior via ensemble Kalman methods

Huchen Yang, Xinghao Dong, Jinlong Wu

2604.18497 2026-04-21 stat.ME

Missingness-Adaptive Factor Identification in High-Dimensional Data

Ping Zeng, Yicheng Zeng, Lixing Zhu

Comments 46 pages, 4 figures, 14 tables

2604.18450 2026-04-21 stat.ML cs.LG math.ST stat.TH

Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario

Florentin Coeurdoux, Grégoire Ferré, Jean-Philippe Bouchaud

2604.18430 2026-04-21 stat.ME

Shrinkage through multiple identifiability

Carlos García Meixide, David Ríos Insua

2604.18420 2026-04-21 stat.ML cs.LG

Spectral bandits for smooth graph functions

Michal Valko, Rémi Munos, Branislav Kveton, Tomáš Kocák

Comments Published in International Conference on Machine Learning (ICML 2014)

2604.18402 2026-04-21 stat.ML math.DS

Adaptive Kernel Selection for Kernelized Diffusion Maps

Othmane Aboussaad, Adam Miraoui, Boumediene Hamzi, Houman Owhadi

2604.18388 2026-04-21 stat.ME

Order Dependence in Regression by Composition: Discussion on "Regression by Composition" by Farewell, Daniel, Stensrud, and Huitfeldt

Mei Dong, Linbo Wang, Lin Liu, Oliver Dukes

2604.18363 2026-04-21 stat.ME

Effect Sizes in Marketing Research: Why Cohen's Local f^2 Belongs in the Toolkit

Wolfgang Messner

2604.18341 2026-04-21 stat.ME

Statistical inference with win statistics in cluster-randomized trials with composite outcomes

Xi Fang, Guangyu Tong, Yuan Huang, F. Perry Wilson, Patrick J. Heagerty, Fan Li

2604.18323 2026-04-21 stat.ME

Which Small-Sample Correction Should Be Used When Analyzing Stepped-Wedge Designs with Time-Varying Treatment Effects?

Yongdong Ouyang, Monica Taljaard, James P. Hughes, Fan Li

Comments first draft

2604.18319 2026-04-21 stat.ML cs.LG stat.ME

Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference

Jonas Arruda, Sophie Chervet, Paula Staudt, Andreas Wieser, Michael Hoelscher, Isabelle Sermet-Gaudelus, Nadine Binder, Lulla Opatowski, Jan Hasenauer

Abstract

Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in epidemiological or survey settings, individuals with certain outcomes may be more likely to be included, resulting in biased prevalence estimates with potentially substantial downstream impact. Classical corrections, such as inverse-probability weighting or explicit likelihood-based models of the selection process, rely on tractable likelihoods, which limits their applicability in complex stochastic models with latent dynamics or high-dimensional structure. Simulation-based inference enables Bayesian analysis without tractable likelihoods but typically assumes missingness at random and thus fails when selection depends on unobserved outcomes or covariates. Here, we develop a bias-aware simulation-based inference framework that explicitly incorporates selection into neural posterior estimation. By embedding the selection mechanism directly into the generative simulator, the approach enables amortized Bayesian inference without requiring tractable likelihoods. This recasting of selection bias as part of the simulation process allows us to both obtain debiased estimates and explicitly test for the presence of bias. The framework integrates diagnostics to detect discrepancies between simulated and observed data and to assess posterior calibration. The method recovers well-calibrated posterior distributions across three statistical applications with diverse selection mechanisms, including settings in which likelihood-based approaches yield biased estimates. These results recast the correction of selection bias as a simulation problem and establish simulation-based inference as a practical and testable strategy for parameter estimation under selection bias.

2604.18314 2026-04-21 stat.ME

Embarrassingly Causal: Causal Use of Associational Data in Magic The Gathering Drafts

Mark Louie F. Ramos, Ph. D

2604.18310 2026-04-21 stat.ML cs.LG

Symmetry Guarantees Statistic Recovery in Variational Inference

Daniel Marks, Dario Paccagnan, Mark van der Wilk

Comments 19 pages, 2 figures

2604.18291 2026-04-21 stat.AP

Data (in)equities in data science: Dissecting systemic and systematic biases in pulse oximetry

Lillian Rountree, Harsh Parikh, Bhramar Mukherjee

2604.18253 2026-04-21 math.ST math.PR stat.CO stat.TH

Gamma-Based Expansion for the First-Passage Time Distribution of Stochastic Logistic Models with Harvesting

Simone Catanzaro, Elvira Di Nardo

2604.18251 2026-04-21 cs.CV cs.AI cs.LG stat.AP

Style-Based Neural Architectures for Real-Time Weather Classification

Hamed Ouattara, Pascal Houssam Salmane, Pierre Duthon, Frédéric Bernardin, Omar Ait Aider

Comments 9 pages, 21 figures

2604.18229 2026-04-21 stat.ME

Inference for Functional Data under Markov Constraints

Ulysse Naepels, Victor M. Panaretos

2604.18152 2026-04-21 stat.ML cs.LG

mlr3torch: A Deep Learning Framework in R based on mlr3 and torch

Sebastian Fischer, Lukas Burk, Carson Zhang, Bernd Bischl, Martin Binder

2604.18089 2026-04-21 cs.LG stat.ML

Towards E-Value Based Stopping Rules for Bayesian Deep Ensembles

Emanuel Sommer, Rickmer Schulte, Sarah Deubner, Julius Kobialka, David Rügamer

Comments Accepted for presentation at the OPTIMAL Workshop at AISTATS 2026, Tangier, Morocco

2604.18088 2026-04-21 cs.CV cs.AI stat.AP

Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation

Sascha Emanuel Zell, Toni Schneidereit, Armin Fügenschuh, Michael Breuß

Comments Submitted to "Applied Intelligence"

2604.18057 2026-04-21 stat.ME stat.AP stat.CO

Efficient Bayesian inference for non-linear association structures in joint models: A hierarchical approach via INLA

Denis Rustand, Håvard Rue, Lisa Le Gall, Karen Leffondre

2604.18042 2026-04-21 stat.ME math.ST stat.TH

A Bayesian framework with adaptive elastic nets for the inference of Gaussian graphical models

Roland B. Sogan, Tabea Rebafka, Fanny Villers

2604.18022 2026-04-21 q-bio.BM cond-mat.stat-mech cs.LG stat.ML

Boltzmann Machine Learning with a Parallel, Persistent Markov chain Monte Carlo method for Estimating Evolutionary Fields and Couplings from a Protein Multiple Sequence Alignment

Sanzo Miyazawa

Comments A manuscript of 11 pages including 3 figures and 3 tables, and a supplementary material of 9 pages including 8 figures. The program and multiple sequence alignments employed here are available from https://gitlab.com/sanzo.miyazawa/BM/ and https://github.com/Sanzo-Miyazawa/BM/

2604.17984 2026-04-21 cs.LG stat.ML

Online Conformal Prediction with Adversarial Semi-bandit Feedback via Regret Minimization

Junyoung Yang, Kyungmin Kim, Sangdon Park

2604.17956 2026-04-21 cs.LG stat.ME

Federated Rule Ensemble Method in Medical Data

Ke Wan, Kensuke Tanioka, Toshio Shimokawa

2604.17952 2026-04-21 econ.EM cs.SI stat.AP

Causal inference for social network formation

Maximilian Kasy, Elizabeth Linos, Sanaz Mobasseri

2601.08184 2026-04-21 math.PR cs.LG stat.ML

Wasserstein-p Central Limit Theorem Rates: From Local Dependence to Markov Chains

Yixuan Zhang, Qiaomin Xie

Comments ACM SIGMETRICS 2026. 73 pages

2512.23405 2026-04-21 cs.LG stat.ML

On the Sample Complexity of Learning for Blind Inverse Problems

Nathan Buskulic, Luca Calatroni, Lorenzo Rosasco, Silvia Villa

Abstract

Blind inverse problems arise in many experimental settings where both the signal of interest and the forward operator are (partially) unknown. In this context, methods developed for the non-blind case cannot be adapted in a straightforward manner due to identifiability issues and symmetric solutions inherent to the blind setting. Recently, data-driven approaches have been proposed to address such problems, demonstrating strong empirical performance and adaptability. However, these methods often lack interpretability and are not supported by theoretical guarantees, limiting their reliability in domains such as applied imaging where a blind approach often relates to a calibration of the acquisition device. In this work, we shed light on learning in blind inverse problems within the insightful framework of Linear Minimum Mean Square Estimators (LMMSEs). We provide a theoretical analysis, deriving closed-form expressions for optimal estimators and extending classical recovery results to the blind setting. In particular, we establish equivalences with tailored Tikhonov-regularized formulations, where the regularization structure depends explicitly on the distributions of the unknown signal, of the noise, and of the random forward operator. We also show how the reconstruction error converges as the noise and the randomness of the operator diminish when we use a source condition assumption. Furthermore, we derive finite-sample error bounds that characterize the performance of the learned estimators as a function of the noise level, problem conditioning, and number of available samples. These bounds explicitly quantify the impact of operator randomness and show the dependence of the associated convergence rates on these randomness factors. Finally, we validate our theoretical findings through illustrative numerical experiments that confirm the predicted convergence behavior.

2512.02744 2026-04-21 stat.ME econ.EM stat.AP

Implicit score-driven filters for time-varying parameter models

Rutger-Jan Lange, Bram van Os, Dick van Dijk

Comments 73 pages

2511.16172 2026-04-21 econ.EM stat.ME

Confidence Sets for the Emergence, Collapse, and Recovery Dates of a Bubble

Eiji Kurozumi, Anton Skrobotov

2511.09249 2026-04-21 econ.EM math.ST stat.TH

Robust Cauchy-Based Methods for Predictive Regressions

Rustam Ibragimov, Jihyun Kim, Anton Skrobotov

2509.18964 2026-04-21 cs.LG math.OC stat.ML

Central Limit Theorems for Asynchronous Averaged Q-Learning

Xingtu Liu

2507.20993 2026-04-21 cs.LG cs.AI stat.ML

Annotation-Assisted Learning of Treatment Policies From Multimodal Electronic Health Records

Henri Arno, Thomas Demeester

Comments Preprint. Under review

2506.12771 2026-04-21 stat.ME

Machine-Learning-Powered Specification Testing in Linear Instrumental Variable Models

Cyrill Scheidegger, Malte Londschien, Peter Bühlmann

2505.21722 2026-04-21 cs.LG cs.AI stat.ML

Saddle-To-Saddle Dynamics in Deep ReLU Networks: Low-Rank Bias in the First Saddle Escape

Ioannis Bantzis, James B. Simon, Arthur Jacot

Comments Accepted at ICLR 2026. Camera-ready version

2505.20168 2026-04-21 stat.ME

Causal Meta-Analysis: Rethinking the Foundations of Evidence-Based Medicine

Clément Berenfeld, Ahmed Boughdiri, Bénédicte Colnet, Wouter A. C. van Amsterdam, Aurélien Bellet, Rémi Khellaf, Erwan Scornet, Julie Josse

Comments 26 pages, 4 figures, 2 tables. v2: Adding Sec 4.3 and correcting variance equations in Sec 3. v3: Refactoring of the manuscript and additional results (Thm 1 and 2). v4: additional appendix to Sec 3

2505.15342 2026-04-21 stat.ML cs.LG math.ST stat.TH

Policy Testing in Markov Decision Processes

Kaito Ariu, Po-An Wang, Alexandre Proutiere, Kenshi Abe

2503.21421 2026-04-21 math.OC math.PR math.ST stat.TH

Robust Mean Estimation for Optimization: The Impact of Heavy Tails

Bart P. G. van Parys, Bert Zwart

2503.09299 2026-04-21 math.ST cs.GT stat.TH

Low-Rank Graphon Estimation: Theory and Applications to Graphon Games

Olga Klopp, Fedor Noskov

2502.13570 2026-04-21 stat.ML cs.LG math.ST stat.ME stat.TH

A Scalable Nystrom-Based Kernel Two-Sample Test with Permutations

Antoine Chatalic, Marco Letizia, Nicolas Schreuder, Lorenzo Rosasco

2502.08531 2026-04-21 cs.LG stat.ML

On Different Notions of Redundancy in Conditional-Independence-Based Discovery of Graphical Models

Philipp M. Faller, Dominik Janzing

Comments AISTATS 2026. Previous versions contained incorrect claims about partial correlations and the necessity of the condition in proposition 2

2501.16315 2026-04-21 math.CA math.ST stat.TH

A varifold-type estimation for data sampled on a rectifiable set

Charly Boricaud, Blanche Buet

2409.15965 2026-04-21 math.ST math.OC stat.TH

A Christoffel-like function for high-dimensional support inference in graphical models

Jean-Bernard Lasserre, Lucas Slot

Comments Implemented referee comments. Added a missing assumption to the statement of Corollary 4.11

2407.03725 2026-04-21 econ.EM stat.ME

Is Inference Conditional on Not Rejecting a Pre-test Less Reliable than Unconditional Inference?

Clément de Chaisemartin, Xavier D'Haultfœuille

Comments 42 pages. Many changes compared to v2. In particular, we have added conditions for exact inference and results under local alternatives

2404.05486 2026-04-21 math.ST cs.IT math.IT stat.TH

Quickest Change Detection for Multiple Data Streams Using the James-Stein Estimator

Topi Halme, Venugopal V. Veeravalli, Visa Koivunen

2604.17762 2026-04-21 stat.OT

A Parameter-Centric View on Regression

Jingxin Yan, Lin Liu, Oliver Dukes, Qizhai Li, Linbo Wang

2604.17760 2026-04-21 stat.OT

Toward Variation-Independent Regression by Composition

Ruixuan Zhao, Oliver Dukes, Linbo Wang, Lin Liu

2604.17711 2026-04-21 math.ST math.OC math.PR stat.TH

Quantitative Stability of the Shadow for Wasserstein Projections and Sample Complexity

Jakwang Kim

Comments 13 pages

2604.17705 2026-04-21 math.ST stat.TH

Asymptotic behavior of the variance of the BLUE for the mean of stationary processes

Mamikon S. Ginovyan

2604.17694 2026-04-21 stat.ME cs.LG stat.ML

Improving reproducibility by controlling random seed stability in machine learning based estimation via bagging

Nicholas Williams, Alejandro Schuler

2604.17670 2026-04-21 cs.LG stat.ML

Prior-Fitted Functional Flow: In-Context Generative Models for Pharmacokinetics

César Ojeda, Niklas Hartung, Wilhelm Huisinga, Tim Jahn, Purity Kamene Kavwele, Marian Klose, Piyush Kumar, Ramsés J. Sánchez, Darius A. Faroughy

Comments 9 pages, 2 tables and 4 figures

2604.17593 2026-04-21 q-fin.PM stat.ME

Post-Screening Portfolio Selection

Yoshimasa Uematsu, Shinya Tanaka

2604.17568 2026-04-21 cs.LG math.ST stat.ML stat.TH

Diverse Dictionary Learning

Yujia Zheng, Zijian Li, Shunxing Fan, Andrew Gordon Wilson, Kun Zhang

Comments ICLR 2026

2604.17526 2026-04-21 math.PR math.ST stat.TH

Convergence of Langevin AIS for multimodal distributions

Akshat Agarwal, Gautam Iyer, Aidan Jameson, Seungjae Son, Wyatt Wimmer

2604.17490 2026-04-21 math.ST q-fin.RM stat.TH

Joint Exclusivity

Nawaf Mohammed

2604.17410 2026-04-21 math.ST cs.DS stat.ML stat.TH

Algorithmic Contiguity from Low-Degree Heuristic II: Predicting Detection-Recovery Gaps

Zhangsong Li

Comments 74 pages. This is the second part of arXiv:2502.09832. Also merged the results in arXiv:2601.20522

Abstract

The low-degree polynomial framework has emerged as a powerful tool for providing evidence of statistical-computational gaps in high-dimensional inference. For detection problems, the standard approach bounds the low-degree advantage through an explicit orthonormal basis. However, this method does not extend naturally to estimation tasks, and thus fails to capture the detection-recovery gap phenomenon that arises in many high-dimensional problems. Although several important advances have been made to overcome this limitation [SW22, SW25, CGGV25+], the existing approaches often rely on delicate, model-specific combinatorial arguments. In this work, we develop a general approach for obtaining conditional computational lower bounds for recovery problems from mild bounds on low-degree testing advantage. Our method combines the notion of algorithmic contiguity in [Li25] with a cross-validation reduction in [DHSS25] that converts successful recovery into a hypothesis test with lopsided success probabilities. In contrast to prior unconditional lower bounds, our argument is conceptually simple, flexible, and largely model-independent. We apply this framework to several canonical inference problems, including planted submatrix, planted dense subgraph, stochastic block model, multi-frequency angular synchronization, orthogonal group synchronization, and multi-layer stochastic block model. In the first three settings, our method recovers existing low-degree lower bounds for recovery in [SW22, SW25] via a substantially simpler argument. In the latter three, it gives new evidence for conjectured computational thresholds including the persistence of detection-recovery gaps. Together, these results suggest that mild control of low-degree advantage is often sufficient to explain computational barriers for recovery in high-dimensional statistical models.

2604.17395 2026-04-21 stat.ME math.AT stat.AP

A Null Model for Mapper Subtype Claims

Chad M. Topaz

2604.17381 2026-04-21 stat.ML cs.LG

StrEBM: A Structured Latent Energy-Based Model for Blind Source Separation

Yuan-Hao Wei

2604.17267 2026-04-21 cs.AI stat.AP

Rectification Difficulty and Optimal Sample Allocation in LLM-Augmented Surveys

Zikun Ye, Hema Yoganarasimhan

2604.17254 2026-04-21 stat.ME stat.AP

Detecting Breast Carcinoma Metastasis on Whole-Slide Images by Partially Subsampled Multiple Instance Learning

Baichen Yu, Xuetong Li, Jing Zhou, Hansheng Wang

2604.17250 2026-04-21 stat.AP

Improving post-operative discharge destination prediction of geriatric patients with generative data augmentation

Pegah Golchian, Pauline Maier, Thomas Kocar, Marvin N. Wright

2604.17239 2026-04-21 math.ST econ.EM stat.TH

Bootstrap consistency for general double/debiased machine learning estimators

Ziming Lin, Fang Han

Comments 30 pages

2604.17236 2026-04-21 math.ST stat.TH

Learning Mixtures of Nonparametric and Convolutional Measures on Effectively Low-dimensional Affine Spaces

Sunrit Chakraborty, XuanLong Nguyen

2604.17224 2026-04-21 cs.LG stat.ML

LASER: Low-Rank Activation SVD for Efficient Recursion

Ege Çakar, Ketan Ali Raghu, Lia Zheng

Comments Accepted to the Latent and Implicit Thinking Workshop at ICLR 2026

2604.17219 2026-04-21 stat.ML cs.LG

PAC-Bayes Bounds for Gibbs Posteriors via Singular Learning Theory

Chenyang Wang, Yun Yang

2604.17213 2026-04-21 math.OC cs.SY eess.SY stat.ML

Symplectic Inductive Bias for Data-Driven Target Reachability in Hamiltonian Systems

Zhuo Ouyang, Jixian Liu, Enrique Mallada

2604.17194 2026-04-21 stat.ML cs.LG

Forecast Sports Outcomes under Efficient Market Hypothesis: Theoretical and Experimental Analysis of Odds-Only and Generalised Linear Models

Kaito Goto, Naoya Takeishi, Takehisa Yairi

Abstract

Converting betting odds into accurate outcome probabilities is a fundamental challenge in using betting odds as a benchmark for sports forecasting and market efficiency analysis. In this study, we propose two methods to overcome the limitations of existing conversion methods. Firstly, we propose an odds-only method to convert betting odds to probabilities without using historical data for model fitting. While odds-only methods such as Multiplicative, Shin, and Power already exist, they do not adjust for biases or relationships we found in our betting odds dataset, which consists of 90014 football matches across five different bookmakers. To overcome these limitations, our proposed Odds-Only-Equal-Profitability-Confidence (OO-EPC) method aligns with the bookmakers' pricing objectives of having equal confidence in profitability for each outcome. We provide empirical evidence from our betting odds dataset that, for the majority of bookmakers, our proposed OO-EPC method outperforms the existing odds-only methods. Beyond controlled experiments, we applied the OO-EPC method under real-world uncertainty by using it for six iterations of an annual basketball outcome forecasting competition. Secondly, we propose a generalised linear model that utilises historical data for model fitting and then converts betting odds to probabilities. Existing generalised linear models attempt to capture relationships that the Efficient Market Hypothesis already captures. To overcome this shortcoming, our proposed Favourite-Longshot-Bias-Adjusted Generalised Linear Model (FL-GLM) fits just one parameter to capture the favourite-longshot bias, providing a more interpretable alternative. We provide empirical evidence from historical football matches where, for all bookmakers, our proposed FL-GLM outperforms the existing multinomial and logistic generalised linear models.
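
For context, the existing Multiplicative baseline named in this abstract can be sketched in a few lines. This is only the standard baseline (normalise the inverse decimal odds so they sum to one), not the proposed OO-EPC or FL-GLM methods, whose details the abstract does not give. The function names and the example odds are hypothetical.

```python
def overround(decimal_odds):
    """Bookmaker margin: how far the inverse odds exceed a total of one."""
    return sum(1.0 / o for o in decimal_odds) - 1.0


def multiplicative_probabilities(decimal_odds):
    """Convert decimal odds to probabilities by proportional normalisation.

    Each inverse odd is divided by the sum of all inverse odds, so the
    margin is removed in proportion to each outcome's implied probability.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    inverse = [1.0 / o for o in decimal_odds]
    total = sum(inverse)
    return [q / total for q in inverse]


if __name__ == "__main__":
    odds = [1.8, 3.6, 4.5]  # hypothetical home / draw / away odds
    print("overround:", round(overround(odds), 4))
    print("probabilities:", [round(p, 4) for p in multiplicative_probabilities(odds)])
```

Because the margin is removed proportionally, this baseline cannot correct an outcome-dependent distortion such as the favourite-longshot bias that the abstract's second method targets.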

2604.17160 2026-04-21 math.ST stat.TH

Bayesian analysis for a generalised Dirichlet process prior

Nils Lid Hjort

Comments 24 pages, no figures. Statistical Research Report, Department of Mathematics, University of Oslo, Nov 2000. Annals of Statistics was interested and invited a revision, which somehow was not followed up. The tech report has gathered quite a few citations, and is brought to arXiv April 2026 for better visibility

2604.17154 2026-04-21 stat.ME

Model Selection and Parameter Inference through Constraints via Sequences of Surrogate Smoothing Functions

Mateen R Shaikh

Comments submitted for peer review on April 18 2026

2604.15632 2026-04-21 math.AG stat.ML

Algebraic Invariants of Lightning Self-Attention

Yulia Alexandr, Hao Duan, Guido Montúfar

2604.14621 2026-04-21 stat.ML cs.LG

Differentially Private Conformal Prediction

Jiamei Wu, Ce Zhang, Zhipeng Cai, Jingsen Kong, Bei Jiang, Linglong Kong, Lingchen Kong

2604.09663 2026-04-21 econ.EM q-fin.GN stat.ME

JFR-rg: A New Macroeconomic Framework for High-Debt, Low-Growth Economies under Financial Repression

Hirofumi Wakimoto

Comments JEL Classification: E44, E52, E62, F31, H63. v2: bibliographic corrections, consistency fixes, and clarifications of scope conditions, falsification language, and selected interpretations; results unchanged

Abstract

Standard macroeconomic frameworks have correctly identified Japan's government debt - now exceeding 240% of GDP - as carrying substantial fiscal risk. Yet FRED data from 2013 to 2026 present an empirical record inviting a complementary perspective: debt ratios have stabilized, nominal GDP has exceeded 670 trillion yen (SAAR), and unemployment has remained near 2.6-2.7%. This paper formalizes these channels through the Japanese Financial Repression r-g (JFR-rg) model. Building on Blanchard (2019), the framework incorporates a financial repression bias (epsilon_t = pi_t - r^n_t, directly observable from FRED) and a non-linear exchange-rate channel. Three theoretical contributions extend the literature: (i) the Debt Sustainability Corridor, a characterization of stability in (epsilon_t, g^n*_t) space; (ii) the Normalization Ratchet, a path-dependence theorem showing that temporary policy errors generate persistently higher debt trajectories; and (iii) the Captive Financial System Parameter (phi_t), which endogenizes the institutional precondition for JFR-rg stability. Appendices H-L provide supporting empirical evidence (VAR, ARDL, Local Projections) showing the framework's claims are empirically disciplined and falsifiable. The core debt-dynamics propositions are anchored in the consolidated government budget identity (Layer L1), while selected propositions additionally rely on minimal structural assumptions; identification concerns apply only to the empirical Layer L2. Counterfactual simulations illustrate a Normalization Trap: aggressive rate hikes can produce counterproductive debt dynamics. For high-debt, low-growth economies sharing Japan's institutional characteristics, strategically deploying the resulting Repression Dividend into productivity-enhancing investment may represent a regime-contingent equilibrium possibility, conditional on the captive system condition being maintained.

2603.00883 2026-04-21 cs.LG cs.AI cs.CY stat.AP

Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact

Michael Hardy, Yunsung Kim

2601.15690 2026-04-21 cs.AI stat.AP

From Passive Metric to Active Signal: The Evolving Role of Uncertainty Quantification in Large Language Models

Jiaxin Zhang, Wendi Cui, Zhuohang Li, Lifu Huang, Bradley Malin, Caiming Xiong, Chien-Sheng Wu

Comments This paper has been accepted by ACL 2026

2601.09173 2026-04-21 cs.LG cs.CL q-bio.QM stat.ML

Geometric Stability: The Missing Axis of Representations

Prashant C. Raju

Abstract

Representational similarity analysis and related methods have become standard tools for comparing the internal geometries of neural networks and biological systems. These methods measure what is represented, the alignment between two representational spaces, but not whether that structure is robust. We introduce geometric stability, a distinct dimension of representational quality that quantifies how reliably a representation's pairwise distance structure holds under perturbation. Our metric, Shesha, measures self-consistency through split-half correlation of representational dissimilarity matrices constructed from complementary feature subsets. A key formal property distinguishes stability from similarity: Shesha is not invariant to orthogonal transformations of the feature space, unlike CKA and Procrustes, enabling it to detect compression-induced damage to manifold structure that similarity metrics cannot see. Spectral analysis reveals the mechanism: similarity metrics collapse after removing the top principal component, while stability retains sensitivity across the eigenspectrum. Across 2463 encoder configurations in seven domains -- language, vision, audio, video, protein sequences, molecular profiles, and neural population recordings -- stability and similarity are empirically uncorrelated ($ρ=-0.01$). A regime analysis shows this independence arises from opposing effects: geometry-preserving transformations make the metrics redundant, while compression makes them anti-correlated, canceling in aggregate. Applied to 94 pretrained models across 6 datasets, stability exposes a "geometric tax": DINOv2, the top-performing model for transfer learning, ranks last in geometric stability on 5/6 datasets. Contrastive alignment and hierarchical architecture predict stability, providing actionable guidance for model selection in deployment contexts where representational reliability matters.
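
The split-half idea described in this abstract can be sketched roughly as follows. This is not the Shesha implementation: the Euclidean distance, the Pearson correlation, the random half split, and the averaging over splits are all assumptions made for illustration, and the names `rdm_upper` and `split_half_stability` are hypothetical.

```python
import numpy as np


def rdm_upper(features):
    """Upper triangle of the pairwise Euclidean distance matrix (an RDM)."""
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    rows, cols = np.triu_indices(features.shape[0], k=1)
    return dist[rows, cols]


def split_half_stability(X, n_splits=10, seed=0):
    """Mean correlation between RDMs built from complementary feature halves.

    X has shape (n_items, n_features). Each split randomly partitions the
    feature columns into two halves, builds one RDM per half, and
    correlates their upper triangles.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    half = n_features // 2
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n_features)
        rdm_a = rdm_upper(X[:, perm[:half]])
        rdm_b = rdm_upper(X[:, perm[half:]])
        scores.append(np.corrcoef(rdm_a, rdm_b)[0, 1])
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    latent = rng.normal(size=(30, 1))
    # Every feature is a noisy copy of one shared latent factor.
    redundant = latent + 0.1 * rng.normal(size=(30, 16))
    unstructured = rng.normal(size=(30, 16))
    print("redundant features:   ", round(split_half_stability(redundant), 3))
    print("unstructured features:", round(split_half_stability(unstructured), 3))
```

A score computed this way depends on which feature axes fall into each half, so it changes when the feature space is rotated. That is consistent with the abstract's point that the metric is not invariant to orthogonal transformations.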

2512.06324 2026-04-21 math.ST stat.TH

Subsampling Confidence Bound for Persistent Diagram via Time-delay Embedding

Donghyun Park, Junhyun An, Taehyoung Kim, Jisu Kim

Comments 23 pages, 2 figures, Minor corrections in formatting

2511.12069 2026-04-21 cs.SE stat.ME

A Code Smell Refactoring Approach using GNNs

HanYu Zhang, Tomoji Kishi

2511.09215 2026-04-21 stat.ME

Principled analysis of crossover designs: causal effects, efficient estimation, and robust inference

Zhichao Jiang, Peng Ding

2511.07261 2026-04-21 math.NA cs.NA stat.CO stat.ML

High-dimensional Bayesian filtering through deep density approximation

Kasper Bågmark, Filip Rydin

Comments 30 pages, 13 figures

2510.14142 2026-04-21 stat.ME

Complier General Causal Effect in Randomized Controlled Trials with One-Sided Noncompliance

Yin Tang, Yanyuan Ma, Jiwei Zhao

2510.05573 2026-04-21 stat.ML cs.IT cs.LG math.IT

On the Theory of Continual Learning with Gradient Descent for Neural Networks

Hossein Taheri, Avishek Ghosh, Arya Mazumdar

2509.19088 2026-04-21 cs.CY cs.AI cs.HC stat.AP

Digital Twins as Funhouse Mirrors: Five Key Distortions

Tianyi Peng, George Gui, Melanie Brucks, Daniel J. Merlau, Grace Jiarui Fan, Malek Ben Sliman, Eric J. Johnson, Abdullah Althenayyan, Silvia Bellezza, Dante Donati, Hortense Fong, Elizabeth Friedman, Ariana Guevara, Mohamed Hussein, Kinshuk Jerath, Bruce Kogut, Akshit Kumar, Kristen Lane, Hannah Li, Vicki Morwitz, Oded Netzer, Patryk Perkowski, Olivier Toubia

2508.17235 2026-04-21 stat.AP

On the relationship between the Wasserstein distance and differences in life expectancy at birth

Markus Sauerberg

Comments 19 pages, 18 figures

2508.10630 2026-04-21 math.NA cs.NA stat.CO stat.ML

Nonlinear filtering based on density approximation and deep BSDE prediction

Kasper Bågmark, Adam Andersson, Stig Larsson

Comments 18 pages, 6 figures

2508.01942 2026-04-21 math.OC math.ST stat.TH

Central Limit Theorems for Sample Average Approximations in Stochastic Optimal Control

Johannes Milz, Alexander Shapiro

2508.00040 2026-04-21 cs.LG math.PR stat.AP stat.ML

Regime-Aware Conditional Neural Processes with Multi-Criteria Decision Support for Operational Electricity Price Forecasting

Abhinav Das, Stephan Schlüter

2506.12176 2026-04-21 cs.LG stat.ML

"Faithful to What?" On the Limits of Fidelity-Based Explanations

Jackson Eshbaugh

Comments 6 pages, 3 figures, 3 tables. Accepted at the Workshop on Scientific Methods for Understanding Deep Learning (Sci4DL) at ICLR 2026. Code available at https://github.com/jacksoneshbaugh/lambda-linearity-score/tree/main

2505.12548 2026-04-21 stat.ME stat.ML

Modeling Nonstationary Extremal Dependence via Deep Spatial Deformations

Xuanjie Shao, Jordan Richards, Raphael Huser

2502.19499 2026-04-21 cs.LG math.OC stat.ML

On the Interpolation Effect of Score Smoothing in Diffusion Models

Zhengdao Chen

Comments 34 pages, 14 figures. Code available at: https://github.com/google-research/diffusion-score-smoothing

2502.05075 2026-04-21 cs.LG cs.NA math.NA stat.ML

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension

Yijun Dong, Yicheng Li, Yunai Li, Jason D. Lee, Qi Lei

Comments ICML 2025

2501.02703 2026-04-21 stat.ME

Full-conformal novelty detection

Junu Lee, Ilia Popov, Zhimei Ren

2412.06628 2026-04-21 stat.ME math.ST stat.TH

Partial identification of principal causal effects under violations of principal ignorability

Minxuan Wu, Joseph Antonelli

Comments Corrected Figure 2: the top-row plots were inadvertently duplicated in the previous version. No changes to the related text, results, or conclusions. A few minor edits

2410.10282 2026-04-21 stat.CO stat.ME

Exact MCMC for Intractable Proposals

Dwija Kakkad, Dootika Vats

2409.14585 2026-04-21 math.NA cs.NA math.PR stat.CO stat.ML

A convergent scheme for the Bayesian filtering problem based on the Fokker-Planck equation and deep splitting

Kasper Bågmark, Adam Andersson, Stig Larsson, Filip Rydin

Comments 22 pages, 3 figures

2409.10030 2026-04-21 stat.ME econ.EM stat.ML

LASSO Inference for High Dimensional Predictive Regressions

Zhan Gao, Ji Hyung Lee, Ziwei Mei, Zhentao Shi

2409.06172 2026-04-21 stat.ME

Nonparametric Inference for Balance in Signed Networks

Xuyang Chen, Yinjie Wang, Weijing Tang

2405.20936 2026-04-21 stat.ME

Bayesian Deep Generative Models for Multiplex Networks with Multiscale Overlapping Clusters

Yuren Zhou, Yuqi Gu, David B. Dunson

2307.16421 2026-04-21 math.PR math.AP stat.ML

Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm

Nabarun Deb, Young-Heon Kim, Soumik Pal, Geoffrey Schiebinger

Comments 49 pages, 2 figures, Accepted in the Annals of Probability

2307.01908 2026-04-21 stat.ME

Efficient Estimation of Average Treatment Effect on the Treated under Endogenous Treatment Assignment

Trinetri Ghosh, Jiawei Shan, Menggang Yu, Jiwei Zhao

2604.17144 2026-04-21 stat.ME

Statistical Validation of Computer Models: Global and Subdomain Hypothesis Testing

Chaoan Li, Xianyang Zhang, Rui Tuo

2604.17136 2026-04-21 math.NT math.PR math.ST stat.TH

On the normality of the concatenated Fibonacci constant

José Ricardo G. Mendonça

Comments AMSart style, 18+ε pages, 8 tables, no figures, 27 references

2604.17130 2026-04-21 stat.ME cs.LG stat.ML

A proposal for PU classification under Non-SCAR using clustering and logistic model

Konrad Furmanczyk, Kacper Paczutkowski

Comments 12 pages, 2 figures, MDAI 25

2604.17067 2026-04-21 math.OC cs.LG math.ST stat.TH

Trajectory-Restricted Optimization Conditions and Geometry-Aware Linear Convergence

Faris Chaudhry, Anthea Monod, Keisuke Yano

Comments 37 pages, 2 figures

2604.16975 2026-04-21 math.NA cs.LG cs.NA stat.ML

Convergence theory for Hermite approximations under adaptive coordinate transformations

Yahya Saleh

2604.16949 2026-04-21 cs.LG eess.SP stat.ME

L1 Regularization Paths in Linear Models by Parametric Gaussian Message Passing

Yun-Peng Li, Hans-Andrea Loeliger

2604.16932 2026-04-21 stat.ML cs.LG

Neighbor Embedding for High-Dimensional Sparse Poisson Data

Noga Mudrik, Adam S. Charles

2604.16900 2026-04-21 stat.AP

Analyzing Process Data from Computer-Based Assessments: A Tutorial on Preprocessing, Feature Extraction, and Model-Based Inference

Daeun Hwangbo, Junyeong Park, Minjeong Jeon, Ick Hoon Jin

Abstract

Computer-based assessments routinely generate detailed interaction logs, commonly referred to as process data, that record every action a respondent performs during task completion. Yet systematic preprocessing guidance, integrated analytical workflows, and cross-method consistency checks remain scarce in the literature. This paper provides a unified, end-to-end framework for analyzing process data from large-scale assessments, covering the full pipeline from raw log preprocessing to model-based inference, using the Programme for the International Assessment of Adult Competencies (PIAAC) Problem Solving in Technology-Rich Environments (PS-TRE) domain as an illustrative example. We first present a systematic preprocessing pipeline (timestamp correction, duplicate removal, action block consolidation, and LLM-assisted standardization) that transforms raw event-level logs into analysis-ready action sequences. We then review and demonstrate two complementary families of analytical methods. The first consists of feature-based methods and their downstream applications, including descriptive process indicators, n-gram analysis with TF-IDF weighting, multidimensional scaling, and process data-informed differential item functioning (DIF) analysis. The second consists of model-based approaches, namely hidden Markov models and the subtask identification procedure. Empirical illustrations using the United States sample show that n-gram-based behavioral clusters carry differential diagnostic information primarily among incorrect respondents, that multidimensional scaling-derived features comprehensively reconstruct observed behavioral variables, and that process-informed DIF analyses can identify and mitigate construct-irrelevant sources of group differences. Reproducible R code implementations are provided for all major techniques.
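As a rough illustration of the n-gram analysis with TF-IDF weighting named in the abstract, the sketch below weights action bigrams across a few respondents' action sequences. It is not the authors' implementation (they provide R code): the action labels, the function names `ngrams` and `tfidf`, and the specific weighting (relative frequency times log(N/df)) are all assumptions made for this example, and other TF-IDF variants exist.

```python
import math
from collections import Counter


def ngrams(actions, n):
    """Return the list of contiguous n-grams (as tuples) in one action sequence."""
    return [tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)]


def tfidf(sequences, n=2):
    """Compute TF-IDF weights of n-grams for each sequence.

    TF is the n-gram's relative frequency within the sequence; IDF is
    log(N / df), where df counts the sequences containing the n-gram.
    This is one common variant, chosen here as an assumption.
    """
    counts = [Counter(ngrams(seq, n)) for seq in sequences]
    n_docs = len(sequences)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each sequence counts once per n-gram
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {g: (k / total) * math.log(n_docs / doc_freq[g]) for g, k in c.items()}
        )
    return weights


# Hypothetical, already-preprocessed action sequences (labels are made up).
sequences = [
    ["START", "OPEN_MAIL", "SORT", "NEXT"],
    ["START", "OPEN_MAIL", "NEXT"],
    ["START", "SEARCH", "SORT", "NEXT"],
]

bigram_weights = tfidf(sequences, n=2)
unigram_weights = tfidf(sequences, n=1)
```

Under this weighting, an n-gram that occurs in every sequence (here the unigram `START`) receives weight zero, while n-grams specific to a few respondents are weighted up, which is why such features can help separate behavioral patterns.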

2604.16894 2026-04-21 cs.LG stat.ME stat.ML

Covariance-Based Structural Equation Modeling in Small-Sample Settings with $p>n$

Hiroki Hasegawa, Aoba Tamura, Yukihiko Okada

Comments 31 pages, 7 figures and 7 tables

2604.16865 2026-04-21 stat.ML cs.LG math.PR

Extraction of informative statistical features in the problem of forecasting time series generated by Itô-type processes

Victor Korolev, Mikhail Ivanov, Tatiana Kukanova, Artyom Rukavitsa, Alexander Vakshin, Peter Solomonov, Alexander Zeifman

2604.16809 2026-04-21 stat.ML cs.LG math.OC

A Mechanism Study of Delayed Loss Spikes in Batch-Normalized Linear Models

Peifeng Gao, Wenyi Fang, Yang Zheng, Difan Zou

2604.16714 2026-04-21 cs.LG stat.CO stat.ML

How to Approximate Inference with Subtractive Mixture Models

Lena Zellinger, Nicola Branchini, Lennert De Smet, Víctor Elvira, Nikolay Malkin, Antonio Vergari

Comments Accepted version at AISTATS 2026

2604.16671 2026-04-21 stat.ME

Multi-Experiment Analysis

Reza Hosseini

2604.16661 2026-04-21 math.ST stat.ME stat.TH

Horseshoe Predictive Inference

Percy S. Zhai, Veronika Ročková

2604.16610 2026-04-21 stat.ML cs.LG

Fairness Constraints in High-Dimensional Generalized Linear Models

Yixiao Lin, James Booth

2604.16537 2026-04-21 stat.ME cs.AI stat.AP

Robustifying and Selecting Cohort-Appropriate Prognostic Models under Distributional Shifts

Dimitris Bertsimas, Carol Gao, Angelos G. Koulouras, Georgios Antonios Margonis

2604.16464 2026-04-21 stat.AP cs.LG

Horizon-Aware Forecasting of Passenger Assistance Demand for Rail Station Workforce Planning

Michael Sheehan, Irina Timoshenko

Comments 26 pages, 6 figures, 3 tables

2604.16453 2026-04-21 cs.LG cs.AI stat.ML

Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo

Jelena Markovic-Voronov, Wenhui Zhu, Bo Long, Zhipeng Wang, Suyash Gupta, Kayhan Behdin, Bee-Chung Chen, Deepak Agarwal

2604.16435 2026-04-21 eess.SP cs.IT math.IT math.ST stat.TH

Beyond the Flat-Spike: Adaptive Sparse CCA for Decaying and Unbalanced Signals

Mengchu Xu, Jian Wang, Yonina C. Eldar

Comments 15 pages, 4 figures; submitted to IEEE TSP

2604.16428 2026-04-21 cs.LG cs.AI stat.ML

Non-Stationarity in the Embedding Space of Time Series Foundation Models

Jinmyeong Choi, Brad Shook, Artur Dubrawski

Comments 17 pages, 7 figures

2604.14949 2026-04-21 stat.ML cs.LG

Unsupervised feature selection using Bayesian Tucker decomposition

Y-h. Taguchi, Yoh-ichi Mototake

Comments 24 pages, 10 figures, to appear in Neural Computation

2604.03337 2026-04-21 cs.CV stat.AP

Significance and Stability Analysis of Gene-Environment Interaction using RGxEStat

Meng'en Qin, Zhe Li, Xiaohui Yang

2604.01502 2026-04-21 stat.ML cs.LG

Conformal Risk Control under Non-Monotone Losses: Theory and Finite-Sample Guarantees

Tareq Aldirawi, Yun Li, Wenge Guo

Comments 39 pages, 6 figures, 3 tables

2603.24647 2026-04-21 cs.LG stat.ML

Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch

Fabio Ferreira, Lucca Wobbe, Arjun Krishnakumar, Frank Hutter, Arber Zela

2512.13400 2026-04-21 econ.EM stat.ML

Profit-Aligned CATE Estimation: Reconciling Policy Learning and Inference

Artem Timoshenko, Caio Waisman

2511.22003 2026-04-21 stat.ML cs.LG stat.ME

A Sensitivity Approach to Causal Inference Under Limited Overlap

Yuanzhe Ma, Yian Huang, Hongseok Namkoong

2511.09872 2026-04-21 math.NA cs.NA stat.ML

Randomized batch-sampling Kaczmarz methods for solving linear systems

Dong-Yue Xie, Xi Yang

2511.03951 2026-04-21 math.ST stat.TH

A unified approach to the Behrens-Fisher problem

Nagananda K G, Jong Sung Kim

Comments 22 pages, 2 figures

2511.03535 2026-04-21 math.ST stat.TH

Asymptotics of the maximum likelihood estimator of the location parameter of Pearson Type VII distribution

Kazuki Okamura

Comments 32 pages, Simulation results added, Exposition modified, to appear in Sankhya A

2510.22341 2026-04-21 stat.AP q-fin.TR

Understanding Carbon Trade Dynamics: A European Union Emissions Trading System Perspective

Avirup Chakraborty

2510.12916 2026-04-21 stat.ML cs.LG

Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete Space

Giosue Migliorini, Padhraic Smyth

2509.09773 2026-04-21 stat.ME math.ST stat.TH

Optimal Inference of the Mean Outcome under Optimal Treatment Regime

Shuoxun Xu, Xinzhou Guo

Comments 17 pages, 5 figures

2508.17412 2026-04-21 cs.LG cs.AI stat.ML

A Ridge Too Far: Correcting Over-Shrinkage via Negative Regularization

Dongseok Kim, Gisung Oh

Comments Substantially revised and reorganized version with a new title, updated framing, and new experiments; the core idea of the work remains unchanged

2506.10060 2026-04-21 cs.LG cs.AI stat.ML

Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems

Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, Zhaoyan Liu, Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell

Comments ICLR 2026

2505.13660 2026-04-21 math.OC cs.LG stat.ML

Sobolev Gradient Ascent for Optimal Transport: Barycenter Optimization and Convergence Analysis

Kaheon Kim, Bohan Zhou, Changbo Zhu, Xiaohui Chen

2505.07033 2026-04-21 stat.ML cs.LG stat.ME

Introducing the O-Value: A Universal Standardization for Confusion-Matrix-Based Classification Performance Metrics

Ningsheng Zhao, Trang Bui, Jia Yuan Yu, Krzysztof Dzieciolowski

2501.02817 2026-04-21 math.AT math.ST stat.ML stat.TH

A Stable Measure of Similarity for Time Series using Persistent Homology

Bala Krishnamoorthy, Elizabeth P. Thompson

Comments Modified our similarity measure and included associated results on real climate data

2410.15368 2026-04-21 math.OC cs.LG stat.ML

Tighter Performance Theory of FedExProx

Wojciech Anyszka, Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik

Comments 43 pages, 4 figures

2410.09296 2026-04-21 cs.CR cs.DS stat.AP stat.ML

The 2020 US Decennial Census is more private than you (might) think

Buxin Su, Weijie J. Su, Chendi Wang

2401.15604 2026-04-21 cs.LG stat.ML

Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization

Yinbin Han, Meisam Razaviyayn, Renyuan Xu

Comments 58 pages

2112.07572 2026-04-21 math.PR math.ST stat.TH

The high-dimensional asymptotics of first order methods with random data

Michael Celentano, Chen Cheng, Andrea Montanari

Comments 78 pages; v3: introduction, motivations and examples expanded

2006.12024 2026-04-21 stat.ML cs.LG

Bayesian Neural Networks: An Introduction and Survey

Ethan Goan, Clinton Fookes

Comments 44 pages, 8 figures, Fix typos in Eqn 30, 48, and alpha divergence