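arXivDaily daily academic digest: syncs the full arXiv dataset and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.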

2604.07063 2026-04-09 stat.ME

Introduction to Relational Event Modelling

Martina Boschi, Ernst C. Wit

Abstract

Interactions and time shape many aspects of life. Everyday activities -- like conversations, emails, money transfers, citations, and even acts of violence -- are relational events: interactions between a sender and a receiver at a specific moment. At the intersection of event-history analysis and network modelling, relational event models (REMs) offer a powerful framework for studying when and why these events occur. Recent advances have made it possible to express REMs as generalized additive models, allowing researchers to capture complex, non-linear patterns over time. While an essay and a comprehensive review exist, a hands-on tutorial paper on REMs is still missing. This work fills that gap. It provides a practical introduction to REMs, incorporating the latest developments in the field. It demonstrates how to simulate synthetic relational-event data and walks through several empirical applications, comparing different modelling and inference strategies. By bringing together theory, simulation, and application, this tutorial lowers the barrier to entry and makes REMs a more accessible and practical tool.
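To make the simulation step concrete, here is a sketch that generates synthetic relational events from a very simple REM. It rests on assumptions that do not come from the paper: a fixed set of actors, a log-linear hazard for each ordered sender-receiver pair with a baseline term and a single reciprocity effect, and hazards that change only when an event occurs. Under those assumptions the next event time is exponential with the total rate, and the pair that fires is drawn in proportion to its rate. The function name and parameter values are hypothetical. This is an illustration, not the tutorial's code.

```python
import math
import random


def simulate_rem(n_actors, beta0, beta_recip, horizon, seed=0):
    """Simulate (time, sender, receiver) events on [0, horizon].

    Hypothetical hazard for each ordered pair (i, j), i != j:
        rate_ij = exp(beta0 + beta_recip * recip_ij),
    where recip_ij = 1 if j has already sent an event to i, else 0.
    Rates change only at event times, so the waiting time to the next
    event is Exponential(sum of rates) and the pair that fires is drawn
    with probability proportional to its rate (Gillespie-style sampling).
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n_actors) for j in range(n_actors) if i != j]
    seen = set()  # ordered (sender, receiver) pairs observed so far
    events = []
    t = 0.0
    while True:
        rates = [
            math.exp(beta0 + beta_recip * (1.0 if (j, i) in seen else 0.0))
            for i, j in pairs
        ]
        t += rng.expovariate(sum(rates))
        if t > horizon:
            return events
        sender, receiver = rng.choices(pairs, weights=rates, k=1)[0]
        events.append((t, sender, receiver))
        seen.add((sender, receiver))
```

For example, `simulate_rem(5, -2.0, 1.5, 10.0)` returns a time-ordered list of `(time, sender, receiver)` triples. A larger `beta_recip` raises the hazard of replies to earlier events.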

2604.07018 2026-04-09 stat.ME stat.ML

Time Series Gaussian Chain Graph Models

Qin Fang, Xinghao Qiao, Zihan Wang

2604.06915 2026-04-09 stat.ME

Covariance Correction for Permutation Statistics in Multiple Testing Problems

Merle Munko, Paavo Sattler

2604.06894 2026-04-09 stat.AP

How Does LLM Help Regional CPI Forecast: An LLM-powered Deep Panel Modeling Framework

Tianchen Gao, Ao Sun, Yurou Wang, Jingyuan Liu, Cheng Hsiao

2604.06864 2026-04-09 stat.ML cs.LG

A Data-Informed Variational Clustering Framework for Noisy High-Dimensional Data

Wan Ping Chen

2604.06701 2026-04-09 cs.LG stat.ML

Bi-Lipschitz Autoencoder With Injectivity Guarantee

Qipeng Zhan, Zhuoping Zhou, Zexuan Wang, Qi Long, Li Shen

Comments Accepted for publication at ICLR 2026, 27 Pages, 15 Figures

2604.06659 2026-04-09 stat.ME

Transfer Learning for Robust Structured Regression with Bi-level Source Detection

Haoming Shi, Yang Feng, Xiaoqian Liu

Comments 34 pages, 7 Figures

2604.06621 2026-04-09 cs.GL cs.LG stat.ML

The Theorems of Dr. David Blackwell and Their Contributions to Artificial Intelligence

Napoleon Paxton

Comments Survey article, 19 pages, 1 figure, 2 tables

2604.06548 2026-04-09 cs.CE stat.AP

A Rolling-Horizon Stochastic Optimization Framework for NBA Franchise Management with Distributionally Robust Risk Constraints

Siming Zhang, Zhehui Shen, Shijie Chen, Jian Zhou

Comments 27 pages, 12 figures

2604.06499 2026-04-09 stat.AP stat.ML

Equivalence Testing Under Privacy Constraints

Savita Pareek, Luca Insolia, Roberto Molinari, Stéphane Guerrier

2604.06492 2026-04-09 cs.LG cs.CR stat.ML

Optimal Rates for Pure ε-Differentially Private Stochastic Convex Optimization with Heavy Tails

Andrew Lowy

2604.06464 2026-04-09 cs.LG physics.app-ph stat.ML

Weighted Bayesian Conformal Prediction

Xiayin Lou, Peng Luo

2604.06445 2026-04-09 stat.ME

From Simple to Composite Perturbations: A Unified Decomposition Framework for Stochastic Block Models

Jianwei Hu, Ding Chen, Ji Zhu

Abstract

Statistical inference for stochastic block models typically relies on the spectrum of the normalized adjacency matrix $\mathbf{A}^*$. In practice, the true probability matrix $\mathbf{B}$ is unknown and must be replaced by a plug-in estimator $\hat{\mathbf{B}}$. This substitution introduces two distinct types of estimation error: a simple perturbation $\boldsymbol{\Delta}$, arising when $\hat{\mathbf{B}}$ replaces $\mathbf{B}$ only in the numerator, and a composite perturbation $\tilde{\boldsymbol{\Delta}}$, arising when the replacement occurs in both the numerator and the denominator. Under both perturbation regimes, we decompose the total sum of squares into three components and conduct a detailed analysis of their asymptotic properties. This reveals a key, and perhaps surprising, distinction between simple and composite perturbations: the cross term $\operatorname{tr}(\mathbf{A}^*\boldsymbol{\Delta})$ is asymptotically negligible, whereas its composite counterpart $\operatorname{tr}(\mathbf{A}^*\tilde{\boldsymbol{\Delta}})$ is not. Motivated by this, we develop a unified decomposition framework, expressing the composite perturbation matrix as $\tilde{\boldsymbol{\Delta}}=\check{\mathbf{A}}+\boldsymbol{\Delta}+\check{\boldsymbol{\Delta}}$, where $\check{\mathbf{A}}$ is a bias matrix of the normalized adjacency matrix, $\boldsymbol{\Delta}$ is the simple perturbation, and $\check{\boldsymbol{\Delta}}$ is a bias matrix of $\boldsymbol{\Delta}$. This structured decomposition allows us to precisely isolate and control each source of error, leading to a refined limiting theory for two key classes of test statistics. Concretely, for the largest eigenvalue statistic, we improve the existing condition from $K=O(n^{1/6-\tau})$ to the optimal rate $K=o(n^{1/6})$ under both simple and composite perturbations. For the linear spectral statistic, our unified decomposition framework provides the necessary structure to systematically control these errors term by term, leading to a complete and rigorous proof of asymptotic normality.
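An entrywise identity shows one way the three terms in the composite perturbation $\tilde{\boldsymbol{\Delta}}$ can arise. The derivation assumes the normalization commonly used in spectral goodness-of-fit tests for SBMs: $A^*_{ij}=(A_{ij}-P_{ij})/\sqrt{(n-1)P_{ij}(1-P_{ij})}$ for $i\neq j$, with $P_{ij}=B_{g_ig_j}$ and $g_i$ the community label of node $i$. The symbols $P$, $\hat P$, $d$, $\hat d$, and $r$ are introduced here and may not match the paper's exact definitions.

```latex
\begin{aligned}
d_{ij} &= \sqrt{(n-1)P_{ij}(1-P_{ij})}, \qquad
\hat d_{ij} = \sqrt{(n-1)\hat P_{ij}(1-\hat P_{ij})}, \qquad
r_{ij} = d_{ij}/\hat d_{ij},\\
A^*_{ij} &= \frac{A_{ij}-P_{ij}}{d_{ij}}, \qquad
\Delta_{ij} = \frac{A_{ij}-\hat P_{ij}}{d_{ij}} - A^*_{ij} = \frac{P_{ij}-\hat P_{ij}}{d_{ij}},\\
\tilde\Delta_{ij} &= \frac{A_{ij}-\hat P_{ij}}{\hat d_{ij}} - A^*_{ij}
 = r_{ij}\,\frac{A_{ij}-P_{ij}}{d_{ij}} + r_{ij}\,\frac{P_{ij}-\hat P_{ij}}{d_{ij}} - A^*_{ij}\\
 &= \underbrace{(r_{ij}-1)\,A^*_{ij}}_{\check A_{ij}}
 + \underbrace{\Delta_{ij}}_{\text{simple}}
 + \underbrace{(r_{ij}-1)\,\Delta_{ij}}_{\check\Delta_{ij}}.
\end{aligned}
```

Under this assumption, both extra terms are rescalings by $r_{ij}-1$, the relative error in the estimated standard deviation. This is consistent with describing them as bias matrices of $\mathbf{A}^*$ and of $\boldsymbol{\Delta}$.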

2604.06438 2026-04-09 stat.AP cs.LG

Learning Debt and Cost-Sensitive Bayesian Retraining: A Forecasting Operations Framework

Harrison Katz

2604.06417 2026-04-09 stat.CO

Niching Importance Sampling for Multi-modal Rare-event Simulation

Hugh J. Kinnear, F. A. DiazDelaO

2604.06407 2026-04-09 stat.ME

Dealing with positivity violations in mediation analysis via weighted controlled effects, with application to assessing immune correlates of protection in antigen-experienced participants

Qijia He, Bo Zhang

Abstract

Causal mediation analysis has become an important and increasingly used framework for evaluating candidate immune response biomarkers in vaccine research. A controlled effects approach has been proposed to estimate controlled risk curves under a counterfactual scenario in which the entire study population is vaccinated and their post-vaccination immune responses are set to a range of fixed levels. This framework performs well when the study population is antigenically naïve, that is, individuals have not been previously exposed to the antigen, as is common in HIV-1 vaccine research and during the early phases of the COVID-19 pandemic. However, the controlled effects framework becomes more challenging to apply in antigen-experienced populations, where prior vaccination or infection has occurred, as in the case of influenza, dengue, and more recent phases of the COVID-19 pandemic. In such settings, a key identification assumption for valid causal mediation analysis, the positivity assumption, is violated: it is no longer plausible to conceive of a hypothetical intervention that sets a post-vaccination immune marker to a fixed level below an individual's baseline immune level. In this article, we introduce a weighted controlled risk approach that targets a subpopulation for whom there is a prespecified probability of attaining a post-vaccination immune marker level. We further generalize this framework to study contrasts of controlled risks for relevant subpopulations. We demonstrate the validity of the proposed estimators through simulation studies and apply the method to reanalyze post-vaccination neutralizing antibody titers against Omicron BA.4/BA.5 as an immune correlate of COVID-19 in the Coronavirus Variant Immunologic Landscape (COVAIL) trial. R code to implement the proposed method can be found on GitHub: https://github.com/Qijia-He/weighted_CVE.

2604.06395 2026-04-09 cs.LG q-bio.NC stat.ML

Bridging Theory and Practice in Crafting Robust Spiking Reservoirs

Ruggero Freddi, Nicolas Seseri, Diana Nigrisoli, Alessio Basti

2604.06394 2026-04-09 stat.ME

Depth-Based Vector Median Absolute Deviation Moments for Robust Multivariate Shape Analysis

Elsayed Elamir

Comments 14 pages, 3 figures

2604.06366 2026-04-09 cs.LG stat.ML

Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks

Guillaume Corlouer, Avi Semler, Alexander Strang, Alexander Gietelink Oldenziel

2604.06282 2026-04-09 stat.ML cs.LG

Tight Convergence Rates for Online Distributed Linear Estimation with Adversarial Measurements

Nibedita Roy, Vishal Halder, Gugan Thoppe, Alexandre Reiffers-Masson, Mihir Dhanakshirur, Naman, Alexandre Azor

Comments Preprint

2604.06281 2026-04-09 stat.ML math.PR

Generalization error bounds for two-layer neural networks with Lipschitz loss function

Jiang Yu Nguwi, Nicolas Privault

2604.06251 2026-04-09 cs.AI cs.LG stat.AP

Toward Reducing Unproductive Container Moves: Predicting Service Requirements and Dwell Times

Elena Villalobos, Adolfo De Unánue T., Fernanda Sobrino, David Aké, Stephany Cisneros, Jorge Lecona, Alejandra Matadamaz

Comments Preprint, 20 pages, 9 figures, 5 tables (including appendices)

2604.04868 2026-04-09 cs.LG cs.AI stat.ML

Noise Immunity in In-Context Tabular Learning: An Empirical Robustness Analysis of TabPFN's Attention Mechanisms

James Hu, Mahdi Ghelichi

Abstract

Tabular foundation models (TFMs) such as TabPFN (Tabular Prior-Data Fitted Network) are designed to generalize across heterogeneous tabular datasets through in-context learning (ICL). They perform prediction in a single forward pass conditioned on labeled examples without dataset-specific parameter updates. This paradigm is particularly attractive in industrial domains (e.g., finance and healthcare) where tabular prediction is pervasive. Retraining a bespoke model for each new table can be costly or infeasible in these settings, while data quality issues such as irrelevant predictors, correlated feature groups, and label noise are common. In this paper, we provide strong empirical evidence that TabPFN is highly robust under these sub-optimal conditions. We study TabPFN and its attention mechanisms for binary classification problems with controlled synthetic perturbations that vary: (i) dataset width by injecting random uncorrelated features and by introducing nonlinearly correlated features, (ii) dataset size by increasing the number of training rows, and (iii) label quality by increasing the fraction of mislabeled targets. Beyond predictive performance, we analyze internal signals including attention concentration and attention-based feature ranking metrics. Across these parametric tests, TabPFN is remarkably resilient: ROC-AUC remains high, attention stays structured and sharp, and informative features are highly ranked by attention-based metrics. Qualitative visualizations with attention heatmaps, feature-token embeddings, and SHAP plots further support a consistent pattern across layers in which TabPFN increasingly concentrates on useful features while separating their signals from noise. Together, these findings suggest that TabPFN is a robust TFM capable of maintaining both predictive performance and coherent internal behavior under various scenarios of data imperfections.
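The controlled perturbations described above can be generated with a few lines of code. The helper below is a hypothetical sketch. Its name and interface are not from the paper, and it does not call TabPFN. It covers only two of the perturbations: appending uncorrelated random features and flipping a fixed fraction of binary labels.

```python
import random


def perturb_dataset(X, y, n_noise_features, flip_fraction, seed=0):
    """Return new copies of (X, y) with extra noise columns and flipped labels.

    X is a list of rows (lists of floats) and y a list of 0/1 labels.
    - n_noise_features: N(0, 1) columns appended to every row, generated
      independently of y, so they carry no information about the label.
    - flip_fraction: share of labels flipped 0 <-> 1 (label noise).
    The inputs are not modified.
    """
    rng = random.Random(seed)
    X_new = [row + [rng.gauss(0.0, 1.0) for _ in range(n_noise_features)] for row in X]
    y_new = list(y)
    n_flip = round(flip_fraction * len(y))
    for idx in rng.sample(range(len(y)), n_flip):
        y_new[idx] = 1 - y_new[idx]
    return X_new, y_new
```

Sweeping `n_noise_features` and `flip_fraction` over a grid, and refitting the classifier at each setting, gives the kind of parametric robustness curve the abstract describes.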

2603.11090 2026-04-09 cs.LG stat.ME

Interventional Time Series Priors for Causal Foundation Models

Dennis Thumm, Ying Chen

Comments ICLR 2026 1st Workshop on Time Series in the Age of Large Models (TSALM)

2603.06257 2026-04-09 stat.ML cs.LG

Robust support vector model based on bounded asymmetric elastic net loss for binary classification

Haiyan Du, Hu Yang

Comments Upon re-examination, we found fundamental flaws in the BAEN-SVM model that undermine our conclusions. The design inadequately addresses the geometric rationality of the slack variables, which calls its generalizability into question. Thus, we retract this manuscript. We are exploring a different model and will resubmit after thorough validation. We apologize for any confusion.

2602.15889 2026-04-09 stat.AP cs.AI cs.CL physics.ed-ph

Daily and Weekly Periodicity in Large Language Model Performance and Its Implications for Research

Paul Tschisgale, Peter Wulff

Comments The Supplementary Information can be found in the OSF repository cited in the Data Availability Statement

2512.10717 2026-04-09 stat.ME

Dynamic sparse graphs with overlapping communities

Xenia Miscouridou, Francesca Panero, Antreas Laos

2512.01423 2026-04-09 stat.ME

Active Hypothesis Testing under Computational Budgets with Applications to GWAS and LLM

Qi Kuang, Bowen Gang, Yin Xia

2511.05834 2026-04-09 stat.OT

Impacts of Data Splitting Strategies on Parameterized Link Prediction Algorithms

Xinshan Jiao, Yuxin Luo, Yilin Bi, Tao Zhou

Comments 18 pages, 3 figures. Published in Physica A (2026)

2511.01028 2026-04-09 quant-ph math-ph math.MP math.ST stat.TH

Pseudo quantum advantages in perceptron storage capacity

Fabio Benatti, Masoud Gharahi, Giovanni Gramegna, Stefano Mancini, Vincenzo Parisi

Comments 24 pages, 1 figure; minor changes, typos corrected

2510.11169 2026-04-09 stat.ML cs.LG

PAC-Bayesian Bounds on Constrained f-Entropic Risk Measures

Hind Atbir, Farah Cherfaoui, Guillaume Metzler, Emilie Morvant, Paul Viallard

Comments Accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

2510.08974 2026-04-09 stat.CO cs.NA math.NA

Bayesian Active Learning for Bayesian Model Updating: the Art of Acquisition Functions and Beyond

Jingwen Song, Pengfei Wei

Comments 47 pages, 15 figures, submitted to Elsevier journal

DOI: 10.1016/j.ymssp.2026.114237
Journal ref: Mechanical Systems and Signal Processing 251 (2026) 114237

Abstract

Estimating posteriors and the associated model evidences with the desired accuracy and at affordable computational cost is a core issue in Bayesian model updating. It can be highly challenging when models are expensive to evaluate and posteriors have complex features such as multi-modalities of unequal importance, nonlinear dependencies, and high sharpness. Bayesian Quadrature (BQ) equipped with active learning has emerged as a competitive framework for tackling this challenge, as it provides a flexible balance between computational cost and accuracy. The performance of a BQ scheme is fundamentally dictated by the acquisition function, since it exclusively governs the active generation of integration points. We reexamine one of the most advanced acquisition functions from a prospective inference perspective and reformulate the quadrature rules for prediction. On this basis, we develop four new acquisition functions, inspired by distinct intuitions on expected rewards, all of which come with elegant interpretations and highly efficient numerical estimators. Mathematically, these four acquisition functions measure, respectively, the prediction uncertainty of the posterior, the contribution to the prediction uncertainty of the evidence, and the expected reductions of the prediction uncertainties concerning the posterior and the evidence, and thus provide flexibility for highly effective design of integration points. These acquisition functions are further extended to the transitional BQ scheme, along with several specific refinements, to tackle the above-mentioned challenges with high efficiency and robustness. The effectiveness of these developments is ultimately demonstrated with extensive benchmark studies and an application to an engineering example.
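The role of an acquisition function can be shown with a minimal, generic sketch. It assumes a one-dimensional parameter, a zero-mean Gaussian process with a fixed-length-scale RBF kernel as the surrogate for the likelihood, and a finite candidate grid. The rule picks the candidate that maximizes the variance of the GP-predicted unnormalized posterior, `prior(x)**2 * Var[L(x)]`. This is plain variance-based uncertainty sampling and is not one of the four acquisition functions proposed in the paper. All function names are hypothetical.

```python
import math


def rbf(a, b, length=0.5):
    # Squared-exponential kernel with unit signal variance.
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    # Lower-triangular L with L L^T = K (K symmetric positive definite).
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    # Solve L z = b for lower-triangular L.
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def gp_variance(x, X, L, length=0.5):
    # Posterior variance k(x, x) - k_x^T K^{-1} k_x = k(x, x) - ||L^{-1} k_x||^2.
    v = forward_solve(L, [rbf(x, xi, length) for xi in X])
    return max(rbf(x, x, length) - sum(vi * vi for vi in v), 0.0)


def next_point(X, candidates, prior_pdf, length=0.5, jitter=1e-8):
    # Score each candidate by prior(x)^2 * Var[L(x)] and return the maximizer.
    K = [[rbf(a, b, length) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    scores = [prior_pdf(c) ** 2 * gp_variance(c, X, L, length) for c in candidates]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```

With fixed hyperparameters, a GP's posterior variance depends only on where the model has been evaluated, not on the observed likelihood values. That is a limitation of this simple rule.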

2508.05423 2026-04-09 cs.LG stat.ML

Negative Binomial Variational Autoencoders for Overdispersed Latent Modeling

Yixuan Zhang, Jinhao Sheng, Wenxin Zhang, Quyu Kong, Feng Zhou

2507.18937 2026-04-09 physics.ao-ph cs.AI cs.LG stat.ML

CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction

Takuya Inoue, Takuya Kawabata

Comments 48 pages, 14 figures

2507.10303 2026-04-09 stat.ML cs.LG stat.CO stat.ME

MF-GLaM: A multifidelity stochastic emulator using generalized lambda models

K. Giannoukou, X. Zhu, S. Marelli, B. Sudret

Journal ref: Computer Methods in Applied Mechanics and Engineering, Volume 448, Part B, January 2026, 118498

Abstract

Stochastic simulators exhibit intrinsic stochasticity due to unobservable, uncontrollable, or unmodeled input variables, resulting in random outputs even at fixed input conditions. Such simulators are common across various scientific disciplines; however, emulating their entire conditional probability distribution is challenging, as it is a task traditional deterministic surrogate modeling techniques are not designed for. Additionally, accurately characterizing the response distribution can require prohibitively large datasets, especially for computationally expensive high-fidelity (HF) simulators. When lower-fidelity (LF) stochastic simulators are available, they can enhance limited HF information within a multifidelity surrogate modeling (MFSM) framework. While MFSM techniques are well-established for deterministic settings, constructing multifidelity emulators to predict the full conditional response distribution of stochastic simulators remains a challenge. In this paper, we propose multifidelity generalized lambda models (MF-GLaMs) to efficiently emulate the conditional response distribution of HF stochastic simulators by exploiting data from LF stochastic simulators. Our approach builds upon the generalized lambda model (GLaM), which represents the conditional distribution at each input by a flexible, four-parameter generalized lambda distribution. MF-GLaMs are non-intrusive, requiring no access to the internal stochasticity of the simulators nor multiple replications of the same input values. We demonstrate the efficacy of MF-GLaM through synthetic examples of increasing complexity and a realistic earthquake application. Results show that MF-GLaMs can achieve improved accuracy at the same cost as single-fidelity GLaMs, or comparable performance at significantly reduced cost.
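Once the four parameters of a generalized lambda distribution are known at an input x, the response can be sampled by inverse transform. The sketch below assumes the FKML parameterization of the quantile function, Q(u) = λ1 + [(u^λ3 − 1)/λ3 − ((1 − u)^λ4 − 1)/λ4] / λ2 with λ2 > 0. The input-to-parameter map `toy_lambdas` is entirely hypothetical, since a GLaM would fit these functions from data. The sketch does not implement the multifidelity construction proposed in the paper.

```python
import math
import random


def _lambda_term(v, lam):
    # (v**lam - 1) / lam, replaced by its lam -> 0 limit log(v).
    return math.log(v) if abs(lam) < 1e-12 else (v ** lam - 1.0) / lam


def gld_quantile(u, l1, l2, l3, l4):
    """FKML generalized lambda quantile function (assumed form), 0 < u < 1, l2 > 0."""
    if not 0.0 < u < 1.0 or l2 <= 0.0:
        raise ValueError("need 0 < u < 1 and l2 > 0")
    return l1 + (_lambda_term(u, l3) - _lambda_term(1.0 - u, l4)) / l2


def toy_lambdas(x):
    # Hypothetical smooth maps from input x to (l1, l2, l3, l4); exp keeps l2 > 0.
    return 1.0 + 0.5 * x, math.exp(1.0 - 0.2 * x), 0.1, 0.1 + 0.05 * x


def sample_response(x, n, seed=0):
    # Inverse-transform sampling: Y = Q(U; lambda(x)) with U ~ Uniform(0, 1).
    rng = random.Random(seed)
    lams = toy_lambdas(x)
    samples = []
    while len(samples) < n:
        u = rng.random()
        if u > 0.0:  # random() can return 0.0, which lies outside Q's domain
            samples.append(gld_quantile(u, *lams))
    return samples
```

With λ3 = λ4 = 0 the quantile function reduces to that of a logistic distribution, and λ3 = λ4 gives a distribution symmetric about λ1. These special cases give a quick sanity check.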

2507.06580 2026-04-09 math.PR math.ST stat.TH

On the rate of convergence to the Boolean extreme value distribution under the von Mises condition

Yuki Ueda

Comments 15 pages. This version has been revised from the previous one (see Section 4.2). Accepted in IDAQP

2503.02129 2026-04-09 cs.LG cs.AI math.ST stat.TH

Path Regularization: A Near-Complete and Optimal Nonasymptotic Generalization Theory for Multilayer Neural Networks and Double Descent Phenomenon

Hao Yu

2501.10806 2026-04-09 math.OC cs.LG cs.SY eess.SY stat.ML

Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis

Siddharth Chandak

Comments Accepted for publication to SIAM Journal on Control and Optimization

2411.11728 2026-04-09 stat.ME

Davis-Kahan Theorem in the two-to-infinity norm and its application to perfect clustering

Marianna Pensky

Comments 45 pages

2411.10858 2026-04-09 stat.ME

Scalable Gaussian Process Regression Via Median Posterior Inference for Estimating Multi-Pollutant Mixture Health Effects

Aaron Sonabend, Jiangshan Zhang, Edgar Castro, Joel Schwartz, Brent A. Coull, Junwei Lu

DOI: 10.1093/biomtc/ujag071

Abstract

Humans are exposed to complex mixtures of environmental pollutants rather than single chemicals, necessitating methods to quantify the health effects of such mixtures. Research on environmental mixtures provides insights into realistic exposure scenarios, informing regulatory policies that better protect public health. However, statistical challenges, including complex correlations among pollutants and nonlinear multivariate exposure-response relationships, complicate such analyses. A popular Bayesian semi-parametric Gaussian process regression framework (Coull et al., 2015) addresses these challenges by modeling exposure-response functions with Gaussian processes and performing feature selection to manage high-dimensional exposures while accounting for confounders. Originally designed for small to moderate-sized cohort studies, this framework does not scale well to massive datasets. To address this, we propose a divide-and-conquer strategy, partitioning data, computing posterior distributions in parallel, and combining results using the generalized median. While we focus on Gaussian process models for environmental mixtures, the proposed distributed computing strategy is broadly applicable to other Bayesian models with computationally prohibitive full-sample Markov Chain Monte Carlo fitting. We provide theoretical guarantees for the convergence of the proposed posterior distributions to those derived from the full sample. We apply this method to estimate associations between a mixture of ambient air pollutants and ~650,000 birthweights recorded in Massachusetts during 2001-2012. Our results reveal negative associations between birthweight and traffic pollution markers, including elemental and organic carbon and PM2.5, and positive associations with ozone and vegetation greenness.
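The combination step relies on a median's robustness. It can be illustrated with the geometric median, computed here by Weiszfeld's algorithm. This is a simplified sketch: it takes the median of subset posterior summaries (for example, subset posterior means) treated as points in R^d, whereas the paper combines full posterior distributions with a generalized median. The function name and the numbers in the usage lines are hypothetical.

```python
import math


def geometric_median(points, tol=1e-10, max_iter=1000, eps=1e-12):
    """Weiszfeld iteration for the geometric median of points in R^d.

    points: list of equal-length tuples of floats (e.g. subset posterior means).
    """
    d = len(points[0])
    y = [sum(p[k] for p in points) / len(points) for k in range(d)]  # start at the mean
    for _ in range(max_iter):
        num = [0.0] * d
        den = 0.0
        for p in points:
            dist = math.sqrt(sum((p[k] - y[k]) ** 2 for k in range(d)))
            w = 1.0 / max(dist, eps)  # inverse-distance weight; eps avoids division by zero
            den += w
            for k in range(d):
                num[k] += w * p[k]
        y_new = [num[k] / den for k in range(d)]
        step = math.sqrt(sum((y_new[k] - y[k]) ** 2 for k in range(d)))
        y = y_new
        if step < tol:
            break
    return y


# Hypothetical summaries from five workers, one of them badly off.
subset_means = [(0.98, 2.01), (1.03, 1.97), (1.00, 2.02), (0.99, 1.99), (25.0, -40.0)]
combined = geometric_median(subset_means)
```

Unlike the coordinate-wise mean, the combined value stays near the four agreeing workers (around (1.0, 2.0)) rather than being dragged toward the outlying one.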

2410.02941 2026-04-09 stat.ME

Efficient collaborative learning of the average treatment effect

Sijia Li, Rui Duan

Comments 30 pages, 6 figures

2409.14590 2026-04-09 cs.LG cs.AI stat.ML

Explainable AI needs formalization

Stefan Haufe, Rick Wilming, Benedict Clark, Rustam Zhumagambetov, Ahcène Boubekki, Jörg Martin, Danny Panknin

2409.06490 2026-04-09 cs.CV stat.AP

UAVDB: Point-Guided Masks for UAV Detection and Segmentation

Yu-Hsi Chen

Comments 14 pages, 4 figures, 4 tables

2407.20162 2026-04-09 math.ST stat.TH

Non-standard boundary behaviour in two-component mixture models

Heather Battey, Peter McCullagh, Daniel Xiang

2406.06408 2026-04-09 stat.ML cs.CR cs.LG math.ST stat.TH

Differentially Private Best-Arm Identification

Achraf Azize, Marc Jourdan, Aymen Al Marjani, Debabrota Basu

Comments 85 pages, 5 figures, 3 tables, 11 algorithms. To be published in the Journal of Machine Learning Research 27. This journal paper is an extended version of the conference paper Azize et al. ("On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence", NeurIPS 2023)

2405.08253 2026-04-09 stat.ML cs.LG math.OC

Thompson Sampling for Infinite-Horizon Discounted Decision Processes

Daniel Adelman, Cagla Keceli, Alba V. Olivares-Nadal

Abstract

This paper develops a viable notion of learning for sampling-based algorithms that applies in broader settings than previously considered. More specifically, we model discounted infinite-horizon MDPs with Borel state and action spaces, whose rewards and transitions depend on an unknown parameter. To analyze adaptive learning algorithms based on sampling we introduce a general canonical probability space in this setting. Since standard definitions of regret are inadequate for policy evaluation in this setting, we propose new metrics that arise from decomposing the standard expected regret in discounted infinite-horizon MDPs into three terms: (i) the expected finite-time regret, (ii) the expected state regret, and (iii) the expected residual regret. Component (i) translates into the traditional concept of expected regret over a finite horizon. Term (ii) reflects how much future performance is compromised at a given time because earlier decisions have led the system to a less favorable state than under an optimal policy. Finally, metric (iii) measures regret with respect to the optimal reward from the current period onward, disregarding the irreversible consequences of past decisions. We further disaggregate this term by introducing the probabilistic residual regret, a finer, sample-path version of (iii) that captures the remaining loss in future performance from the current period onward, conditional on the observed history. Its expectation coincides with (iii). We then focus on Thompson sampling (TS); under assumptions that extend those used in prior work on finite state and action spaces to the Borel setting, we show that component (iii) for TS converges to zero exponentially fast. We further show that, under mild conditions ensuring the existence of the relevant limits, its probabilistic counterpart converges to zero almost surely and TS achieves complete learning.
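The simplest special case of this setting already shows how Thompson sampling works and what "complete learning" means. Take a single-state problem (a two-armed Bernoulli bandit) with a two-point parameter set. The posterior then concentrates on the true parameter, and the sampled actions become optimal. This toy is a hypothetical illustration under those assumptions. It does not implement the paper's Borel-space MDP setting or its regret metrics, and all names and probabilities are made up.

```python
import math
import random


def _prob_first(log_a, log_b):
    # Normalized weight of the first of two log-weights, computed without overflow.
    m = max(log_a, log_b)
    ea, eb = math.exp(log_a - m), math.exp(log_b - m)
    return ea / (ea + eb)


def thompson_two_point(true_theta, n_steps, seed=0):
    """Thompson sampling on a toy two-armed Bernoulli problem.

    Hypothetical setup: theta is 0 or 1 with a uniform prior. Under theta = 0
    the arms pay off with probabilities (0.8, 0.2); under theta = 1 these are
    swapped, so the optimal arm under theta is arm theta.
    Returns the final posterior P(theta = true_theta) and the arms pulled.
    """
    payoff = {0: (0.8, 0.2), 1: (0.2, 0.8)}
    rng = random.Random(seed)
    log_post = [math.log(0.5), math.log(0.5)]
    arms = []
    for _ in range(n_steps):
        # 1. Draw theta from the current posterior.
        p_theta1 = _prob_first(log_post[1], log_post[0])
        sampled = 1 if rng.random() < p_theta1 else 0
        # 2. Act optimally as if the sampled theta were true.
        arm = sampled
        reward = rng.random() < payoff[true_theta][arm]
        # 3. Bayes update: add the log-likelihood of the outcome under each theta.
        for th in (0, 1):
            p = payoff[th][arm]
            log_post[th] += math.log(p if reward else 1.0 - p)
        arms.append(arm)
    return _prob_first(log_post[true_theta], log_post[1 - true_theta]), arms
```

In this toy every arm is informative about theta, which makes learning easy. It should not be read as representative of the harder cases the paper's conditions are meant to handle.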

2404.04794 2026-04-09 stat.ME

Local Balance Calibration for Nonparametric Propensity Score Estimation

Maosen Peng, Yan Li, Chong Wu, Liang Li

Comments Corresponding author: Chong Wu (Email: CWu18@mdanderson.org) and Liang Li (Email: LLi15@mdanderson.org)

2403.07628 2026-04-09 math.PR math-ph math.MP math.ST stat.TH

Asymptotic Expansions of the Limit Laws of Gaussian and Laguerre (Wishart) Ensembles at the Soft Edge

Folkmar Bornemann

Comments V5: using an alternative expression for the parameter tau that better fits the style of the other parameters in the Laguerre/Wishart cases, more remarks on the rationale of the scaling in the symplectic cases; 70 pages, 8 figures

2403.05281 2026-04-09 stat.ML math.ST stat.TH

A Generative Approach to Quasi-Random Sampling from Copulas via Space-Filling Designs

Sumin Wang, Chenxian Huang, Yongdao Zhou, Min-Qian Liu

Comments 42 pages, 5 figures

2403.03208 2026-04-09 stat.ML cs.LG stat.ME

Active Statistical Inference

Tijana Zrnic, Emmanuel J. Candès

2304.07797 2026-04-09 math.ST math.OC math.PR stat.TH

Optimal distributions for randomized unbiased estimators with an infinite horizon and an adaptive algorithm

Chao Zheng, Jiangtao Pan, Qun Wang

2303.05443 2026-04-09 stat.ME

Likelihood-based Inference for Skewed Responses in a Crossover Trial Setup

Savita Pareek, Kalyan Das, Siuli Mukhopadhyay

2105.07446 2026-04-09 stat.ML cs.LG math.ST stat.TH

Sobolev Norm Learning Rates for Conditional Mean Embeddings

Prem Talwai, Ali Shameli, David Simchi-Levi

Comments Appears in AISTATS 2022

2604.07325 2026-04-09 stat.ME math.ST stat.ML stat.TH

Conformal Prediction with Time-Series Data via Sequential Conformalized Density Regions

M. Sampson, K. S. Chan

2604.07323 2026-04-09 stat.ML cs.LG math.PR

Gaussian Approximation for Asynchronous Q-learning

Artemy Rubtsov, Sergey Samsonov, Vladimir Ulyanov, Alexey Naumov

Comments 41 pages

2604.07290 2026-04-09 physics.ins-det physics.geo-ph stat.AP

Multispectral representation of Distributed Acoustic Sensing data: a framework for physically interpretable feature extraction and visualization

Sergio Morell-Monzó, Dídac Diego-Tortosa, Isabel Pérez-Arjona, Víctor Espinosa

2604.07267 2026-04-09 stat.ML cs.LG

The Theory and Practice of Highly Scalable Gaussian Process Regression with Nearest Neighbours

Robert Allison, Tomasz Maciazek, Anthony Stephenson

Comments 92 pages (35-page main text + self-contained appendix with theorem proofs and auxiliary lemmas)

2604.07179 2026-04-09 stat.ME

NLP-Informed Dynamic Cognitive Diagnosis Modelling

Yawen Ma, Sahoko Ishida, Kate Cain, Gabriel Wallin

2604.07153 2026-04-09 math.ST stat.ME stat.TH

Non-asymptotic two-sample kernel testing with the spectrally truncated normalized MMD

Perrine Lacroix, Bertrand Michel, Franck Picard, Vincent Rivoirard

2604.07143 2026-04-09 cs.LG stat.AP stat.ML

Lumbermark: Resistant Clustering by Chopping Up Mutual Reachability Minimum Spanning Trees

Marek Gagolewski

2604.07135 2026-04-09 stat.ME

Private Federated Learning for High-dimensional Time Series

Kejun Chen, Qianqian Zhu

2604.06104 2026-04-09 physics.soc-ph stat.AP

Modeling Disruptions to Urban Metabolism using Interconnected Networks

Bharat Sharma, Abhilasha J. Saroj, Evan Scherrer, Melissa R. Allen-Dumas

2603.14984 2026-04-09 stat.ME stat.AP

Spatiotemporally Consistent Multivariate Bias Correction for Climate Projections via Nested Vine Copulas

Theresa Meier, Erwan Koch, Valérie Chavez-Demoulin, Thibault Vatter

Comments 58 pages, 15 figures, 7 tables

2603.14135 2026-04-09 stat.ML cs.LG

Conditional flow matching for physics-constrained inverse problems with finite training data

Agnimitra Dasgupta, Ali Fardisi, Mehrnegar Aminy, Brianna Binder, Bryan Shaddy, Saeed Moazami, Assad Oberai

Abstract

This study presents a conditional flow matching framework for solving physics-constrained Bayesian inverse problems. In this setting, samples from the joint distribution of inferred variables and measurements are assumed available, while explicit evaluation of the prior and likelihood densities is not required. We derive a simple and self-contained formulation of both the unconditional and conditional flow matching algorithms, tailored specifically to inverse problems. In the conditional setting, a neural network is trained to learn the velocity field of a probability flow ordinary differential equation that transports samples from a chosen source distribution directly to the posterior distribution conditioned on observed measurements. This black-box formulation accommodates nonlinear, high-dimensional, and potentially non-differentiable forward models without restrictive assumptions on the noise model. We further analyze the behavior of the learned velocity field in the regime of finite training data. Under mild architectural assumptions, we show that overtraining can induce degenerate behavior in the generated conditional distributions, including variance collapse and a phenomenon termed selective memorization, wherein generated samples concentrate around training data points associated with similar observations. A simplified theoretical analysis explains this behavior, and numerical experiments confirm it in practice. We demonstrate that standard early-stopping criteria based on monitoring test loss effectively mitigate such degeneracy. The proposed method is evaluated on several physics-based inverse problems. We investigate the impact of different choices of source distributions, including Gaussian and data-informed priors. Across these examples, conditional flow matching accurately captures complex, multimodal posterior distributions while maintaining computational efficiency.
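A deliberately extreme one-dimensional sketch makes the memorization effect visible. It replaces the neural network with the exact minimizer of the empirical flow matching loss on a finite training set, roughly the limit that an overtrained, very flexible network approaches. The assumptions are a standard normal source, the straight-line path x_t = (1 − t)·x0 + t·x1, and hypothetical function names. This is an unconditional simplification, not the paper's conditional method or architecture.

```python
import math
import random


def empirical_velocity(x, t, data):
    """Exact minimizer of the empirical (unconditional) flow matching loss.

    For the straight-line path the conditional velocity is (x1 - x) / (1 - t).
    Averaging it over the training points x1_i, with weights proportional to
    N(x; t * x1_i, (1 - t)^2), gives the loss-minimizing velocity field.
    """
    s2 = (1.0 - t) ** 2
    logw = [-(x - t * xi) ** 2 / (2.0 * s2) for xi in data]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]  # subtract the max for numerical stability
    mean_x1 = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
    return (mean_x1 - x) / (1.0 - t)


def generate(data, n_samples, n_steps=100, seed=0):
    # Euler integration of dx/dt = v(x, t) from t = 0 to t = 1, starting from N(0, 1).
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for k in range(n_steps):
            t = k / n_steps
            x += empirical_velocity(x, t, data) / n_steps
        out.append(x)
    return out
```

With training data `[-1.0, 1.0]`, every generated sample ends essentially on one of the two training points. The generated distribution has collapsed onto the data (zero within-mode variance) instead of generalizing.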

2506.13017 2026-04-09 stat.AP

Spatially Varying Deep Functional Neural Network: Application in Large-Scale Crop Yield Prediction

Yeonjoo Park, Bo Li, Yehua Li

DOI: 10.1093/jrsssc/qlag023
Journal ref: Journal of the Royal Statistical Society Series C: Applied Statistics (2026)

Abstract

Accurate prediction of crop yield is critical for supporting food security, agricultural planning, and economic decision-making. However, yield forecasting remains a significant challenge due to the complex and nonlinear relationships between weather variables and crop production, as well as spatial heterogeneity across agricultural regions. We propose DSNet, a deep neural network architecture that integrates functional and scalar predictors with spatially varying coefficients and spatial random effects. The method is designed to flexibly model spatially indexed functional data, such as daily temperature curves, and their relationship to variability in the response, while accounting for spatial correlation. DSNet mitigates the curse of dimensionality through a low-rank structure inspired by the spatially varying functional index model (SVFIM). Through comprehensive simulations, we demonstrate that DSNet outperforms state-of-the-art functional regression models for spatial data, when the functional predictors exhibit complex structure and their relationship with the response varies spatially in a potentially nonstationary manner. Application to corn yield data from the U.S. Midwest demonstrates that DSNet achieves superior predictive accuracy compared to both leading machine learning approaches and parametric statistical models. These results highlight the model's robustness and its potential applicability to other weather-sensitive crops.

2503.24209 2026-04-09 math.ST math.PR stat.TH

Optimal low-rank posterior mean and distribution approximation in linear Gaussian inverse problems on Hilbert spaces

Giuseppe Carere, Han Cheng Lie

Comments To be published in Inverse Problems and Imaging, 43 pages, 5 figures

2503.08028 2026-04-09 stat.ML cs.LG

Computational bottlenecks for denoising diffusions

Andrea Montanari, Viet Vu

Comments 51 pages; 2 figures

2411.19653 2026-04-09 stat.ML cs.LG

Nonparametric Instrumental Regression via Kernel Methods is Minimax Optimal

Dimitri Meunier, Zhu Li, Tim Christensen, Arthur Gretton

2402.14260 2026-04-09 stat.ME

A New Regression Lens on Multi-Class Classification

Xin Bing, Bingqing Li, Marten Wegkamp

2307.03571 2026-04-09 cs.LG math.OC stat.ML

Smoothing the Edges: Smooth Optimization for Sparse Regularization using Hadamard Overparametrization

Chris Kolb, Christian L. Müller, Bernd Bischl, David Rügamer