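arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.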

2603.03247 2026-03-04 stat.ME stat.AP

Fusing Sparse Observations and Dense Simulations for Spatial Extreme Value Analysis: Application to U.S. Coastal Sea Levels

Brian N. White, Brian Blanton, Rick Luettich, Richard L. Smith

Comments 34 pages, 7 figures, 7 tables; Supporting Information included

Abstract

Estimating spatial extremes from sparse observational networks produces uncertain return level maps, but dense output from physics-based simulation models is often available as a complementary data source. We develop a two-stage frequentist framework for fusing observations and simulations. In Stage 1, generalized extreme value (GEV) distributions are fitted independently at each site, with a nonstationary location parameter where appropriate to accommodate observed trends. In Stage 2, the parameter estimates from all sources are modeled jointly as a high-dimensional spatial process through a linear model of coregionalization (LMC). Cross-source correlations, estimated from spatially interspersed networks without co-located sites, provide the mechanism for information transfer; an analytic gradient for the resulting likelihood keeps estimation computationally practical. We apply the framework to U.S. coastal sea levels over 1979-2021, fusing 29 NOAA tide gauge records with 100 ADCIRC hydrodynamic simulation sites. Leave-one-out cross-validation shows a 35% reduction in 100-year return level RMSE relative to a gauge-only model. Geographic block cross-validation confirms that fusion benefits persist under spatial extrapolation. The approach is implemented in the R package evfuse.
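
For readers unfamiliar with the quantity being cross-validated above, the following is a minimal sketch of how a T-year return level follows from a set of GEV parameters. It is not the evfuse API and not the authors' code: the function names and the parameter values are hypothetical, chosen only for illustration, and the standard GEV parameterization (location `mu`, scale `sigma`, shape `xi`) is assumed.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z) for location mu, scale sigma, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if abs(xi) < 1e-12:
        # Gumbel limit (xi -> 0)
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0:
        # z lies outside the support: below the lower endpoint when xi > 0,
        # above the upper endpoint when xi < 0
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(T, mu, sigma, xi):
    """Level exceeded in any one year with probability 1/T (requires T > 1)."""
    if T <= 1:
        raise ValueError("T must exceed 1")
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from the paper.
    mu, sigma, xi = 1.0, 0.2, 0.1
    z100 = gev_return_level(100, mu, sigma, xi)
    print(f"100-year return level: {z100:.3f}")
    print(f"G(z100) = {gev_cdf(z100, mu, sigma, xi):.4f}")
```

A nonstationary location parameter, as mentioned in Stage 1, could be handled in this sketch by evaluating `mu` at the year of interest before calling `gev_return_level`; the specific trend form used in the paper is not stated in the abstract.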

2603.03235 2026-03-04 stat.ML cs.LG stat.ME

The elbow statistic: Multiscale clustering statistical significance

Francisco J. Perez-Reche

Comments 30 pages, 3 figures, 5 tables

2603.03191 2026-03-04 stat.ML cs.LG math.OC

A Covering Framework for Offline POMDPs Learning using Belief Space Metric

Youheng Zhu, Yiping Lu

2603.03154 2026-03-04 stat.ME stat.CO

Extending the saemix package for R to fit non Gaussian outcomes

Emmanuelle Comets, Maud Delattre, Belhal Karimi

Comments Main text: 24 pages, 6 figures, 6 tables

2603.03035 2026-03-04 stat.ML cs.LG

Generalized Bayes for Causal Inference

Emil Javurek, Dennis Frauen, Yuxin Wang, Stefan Feuerriegel

2603.03008 2026-03-04 econ.EM stat.ME

Focused Weighted-Average Least Squares Estimator

Shou-Yung Yin

2603.02906 2026-03-04 cs.LG stat.ME

Towards Accurate and Interpretable Time-series Forecasting: A Polynomial Learning Approach

Bo Liu, Shao-Bo Lin, Changmiao Wang, Xiaotong Liu

2603.02898 2026-03-04 q-fin.ST econ.EM stat.AP

Range-Based Volatility Estimators for Monitoring Market Stress: Evidence from Local Food Price Data

Bo Pieter Johannes Andrée

Comments 41 pages, 10 figures, 11 tables

2603.02890 2026-03-04 math.ST math.PR stat.TH

Markov processes on a circular lattice

Sourav Majumdar

2603.02861 2026-03-04 stat.ME

Focused Information Criteria for Semiparametric Linear Hazard Regression

Axel Gandy, Nils Lid Hjort

Comments 16 pages, 4 figures, 3 tables; Statistical Research Report, Department of Mathematics, University of Oslo, February 2009, now arXiv'd March 2026. The paper was accepted by Biometrika in 2010, modulo "minor changes", but things slipped away from our tables

2603.02840 2026-03-04 cs.LG stat.ML

Adapting Time Series Foundation Models through Data Mixtures

Thomas L. Lee, Edoardo M. Ponti, Amos Storkey

Comments Preprint, 8 pages

2603.02753 2026-03-04 cs.LG q-bio.QM stat.ML

Deep learning-guided evolutionary optimization for protein design

Erik Hartman, Di Tang, Johan Malmström

Comments Code available at GitHub

2602.21501 2026-03-04 stat.ML cs.LG math.ST stat.TH

A Researcher's Guide to Empirical Risk Minimization

Lars van der Laan

Comments Version 2; minor edits and clarifications, expanded references, extended Section 2 (high-probability bounds)

Abstract

This guide provides a reference for high-probability regret bounds in empirical risk minimization (ERM). The presentation is modular: we begin with intuition and general proof strategies, then state broadly applicable guarantees under high-level conditions and provide tools for verifying them for specific losses and function classes. We emphasize that many ERM rate derivations can be organized around a three-step recipe -- a basic inequality, a uniform local concentration bound, and a fixed-point argument -- which yields regret bounds in terms of a critical radius, defined via localized Rademacher complexity, under a mild Bernstein-type variance-risk condition. To make these bounds concrete, we upper bound the critical radius using local maximal inequalities and metric-entropy integrals, thereby recovering familiar rates for VC-subgraph, Sobolev/Hölder, and bounded-variation classes. We also study ERM with nuisance components -- including weighted ERM and Neyman-orthogonal losses -- as they arise in causal inference, missing data, and domain adaptation. Following the orthogonal statistical learning framework, we highlight that these problems often admit regret-transfer bounds linking regret under an estimated loss to population regret under the target loss. These bounds typically decompose the regret into (i) statistical error under the estimated loss and (ii) approximation error due to nuisance estimation. Under sample splitting or cross-fitting, the first term can be controlled using standard fixed-loss ERM regret bounds, while the second depends only on nuisance-estimation accuracy. As a novel contribution, we also treat the in-sample regime, in which the nuisances and the ERM are fit on the same data, deriving regret bounds and showing that fast oracle rates remain attainable under suitable smoothness and Donsker-type conditions.

2602.20394 2026-03-04 stat.ML cond-mat.stat-mech cs.LG

Selecting Optimal Variable Order in Autoregressive Ising Models

Shiba Biswal, Marc Vuffray, Andrey Y. Lokhov

2602.05797 2026-03-04 cs.LG stat.ME

Classification Under Local Differential Privacy with Model Reversal and Model Averaging

Caihong Qin, Yang Bai

2511.08111 2026-03-04 math.PR math.ST stat.TH

On the Kantorovich contraction of Markov semigroups

Pierre Del Moral, Mathieu Gerber

Comments 39 pages

2510.08646 2026-03-04 cs.LG cs.AI cs.CL stat.ML

Mitigating Over-Refusal in Aligned Large Language Models via Inference-Time Activation Energy

Eric Hanchen Jiang, Weixuan Ou, Run Liu, Shengyuan Pang, Guancheng Wan, Ranjie Duan, Wei Dong, Kai-Wei Chang, XiaoFeng Wang, Ying Nian Wu, Xinfeng Li

2510.08382 2026-03-04 cs.LG stat.ML

Characterizing the Multiclass Learnability of Forgiving 0-1 Loss Functions

Jacob Trauger, Tyson Trauger, Ambuj Tewari

Comments 15 pages

2509.22613 2026-03-04 cs.AI cs.CL cs.LG stat.ML

Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective

Siwei Wang, Yifei Shen, Haoran Sun, Shi Feng, Shang-Hua Teng, Li Dong, Yaru Hao, Wei Chen

2508.14400 2026-03-04 math.ST stat.TH

Gaussian Multiplier Bootstrap Procedure for the $k$th Largest Coordinate of High-Dimensional Statistics

Yixi Ding, Qizhai Li, Yuke Shi, Liuquan Sun, Luobin Zhang

2508.12983 2026-03-04 stat.ME

Dynamic Latent Class Structural Equation Modeling: A Hands-On Tutorial for Modeling Intensive Longitudinal Data

Roberto Faleh, Sofia Morelli, Vivato Andriamiarana, Zachary J. Roman, Christoph Flückiger, Holger Brandt

Comments 41 pages, 13 figures, 13 tables

2506.07275 2026-03-04 cs.LG cs.HC stat.AP

Tailored Behavior-Change Messaging for Physical Activity: Integrating Contextual Bandits and Large Language Models

Haochen Song, Dominik Hofer, Rania Islambouli, Laura Hawkins, Ananya Bhattacharjee, Zahra Hassanzadeh, Jan Smeddinck, Meredith Franklin, Joseph Jay Williams

Abstract

Contextual multi-armed bandit (cMAB) algorithms offer a promising framework for adapting behavioral interventions to individuals over time. However, cMABs often require large samples to learn effectively and typically rely on a finite pre-set of fixed message templates. In this paper, we present a hybrid cMABxLLM approach in which the cMAB selects an intervention type, and a large language model (LLM) personalizes the message content within the selected type. We deployed this approach in a 30-day physical-activity intervention, comparing four behavioral change intervention types: behavioral self-monitoring, gain-framing, loss-framing, and social comparison, delivered as daily motivational messages to support motivation and achieve a daily step count. Message content is personalized using dynamic contextual factors, including daily fluctuations in self-efficacy, social influence, and regulatory focus. Over the trial, participants received daily messages assigned by one of five models: equal randomization (RCT), cMAB only, LLM only, LLM with interaction history, or cMABxLLM. Outcomes include motivation towards physical activity and message usefulness, assessed via ecological momentary assessments (EMAs). We evaluate and compare the five delivery models using pre-specified statistical analyses that account for repeated measures and time trends. We find that the cMABxLLM approach retains the perceived acceptance of LLM-generated messages, while reducing token usage and providing an explicit, reproducible decision rule for intervention selection. This hybrid approach also avoids the skew in intervention delivery by improving support for under-delivered intervention types. More broadly, our approach provides a deployable template for combining Bayesian adaptive experimentation with generative models in a way that supports both personalization and interpretability.

2506.05116 2026-03-04 stat.ME econ.EM math.ST stat.TH

The Spurious Factor Dilemma: Robust Inference in Heavy-Tailed Elliptical Factor Models

Jiang Hu, Jiahui Xie, Yangchun Zhang, Wang Zhou

Comments Added some content and some simulations

2505.21813 2026-03-04 cs.LG stat.ML

Optimizing Data Augmentation through Bayesian Model Selection

Madi Matymov, Ba-Hien Tran, Michael Kampffmeyer, Markus Heinonen, Maurizio Filippone

Comments 26 pages, 3 figures

2505.18996 2026-03-04 cs.LG stat.ML

Automatic and Structure-Aware Sparsification of Hybrid Neural ODEs

Bob Junyi Zou, Lu Tian

Comments Accepted at The 14th International Conference on Learning Representations (ICLR) 2026

2505.15008 2026-03-04 cs.LG cs.AI stat.ML

Know When to Abstain: Optimal Selective Classification with Likelihood Ratios

Alvin Heng, Harold Soh

2502.13583 2026-03-04 math.NA cs.NA math.OC stat.ML

Fundamental Bias in Inverting Random Sampling Matrices with Application to Sub-sampled Newton

Chengmei Niu, Zhenyu Liao, Zenan Ling, Michael W. Mahoney

Comments 55 pages, 4 figures. This version incorporates minor revisions to the proof

2501.13483 2026-03-04 stat.ML cs.LG

Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data

Aayush Mishra, Daniel Habermann, Marvin Schmitt, Stefan T. Radev, Paul-Christian Bürkner

Comments Accepted to International Conference on Learning Representations (ICLR) 2026

2501.09648 2026-03-04 stat.ME math.ST stat.TH

Central limit theorems for interacting innovation processes, related statistical tools and general results

Giacomo Aletti, Irene Crimaldi, Andrea Ghiglietti

2411.12725 2026-03-04 cs.GT econ.TH stat.ML

The Bounds of Algorithmic Collusion; $Q$-learning, Gradient Learning, and the Folk Theorem

Galit Askenazi-Golan, Domenico Mergoni Cecchelli, Edward Plumb, Clemens Possnig

Comments This is a new version of a previous paper by the title "Reinforcement Learning, Collusion, and the Folk Theorem" by the three (alphabetically) first authors

2410.22047 2026-03-04 math.PR math.ST stat.TH

Self-normalized Cramér-type Moderate Deviation of Stochastic Gradient Langevin Dynamics

Hongsheng Dai, Xiequan Fan, Jianya Lu

2408.03039 2026-03-04 math.ST stat.TH

Gaussian Approximations for the $k$th coordinate of sums of random vectors

Yixi Ding, Qizhai Li, Yuke Shi, Wei Zhang

Comments This submission is a duplicate of arXiv:2508.14400. We mistakenly created a new submission instead of a replacement

2407.01656 2026-03-04 cs.LG cond-mat.dis-nn physics.data-an q-bio.NC stat.ML

Absolute abstraction: a renormalisation group approach

Carlo Orientale Caputo, Elias Seiffert, Enrico Frausin, Matteo Marsili

Comments 35 pages, 6 figures

2405.15204 2026-03-04 stat.ME

A New Fit Assessment Framework for Common Factor Models Using Generalized Residuals

Youjin Sung, Youngjin Han, Yang Liu

2312.01518 2026-03-04 stat.AP

Analyzing State-Level Longevity Trends with the U.S. Mortality Database

Mike Ludkovski, Doris Padilla

Comments 31 pages, 18 figures

2210.09709 2026-03-04 stat.ML cs.LG math.ST stat.TH

Importance Weighting Correction of Regularized Least-Squares for Target Shift

Davit Gogolashvili

2603.02729 2026-03-04 cs.LG math.OC stat.ML

The power of small initialization in noisy low-tubal-rank tensor recovery

Zhiyu Liu, Haobo Geng, Xudong Wang, Yandong Tang, Zhi Han, Yao Wang

2603.02723 2026-03-04 stat.ME

The partly parametric and partly nonparametric additive risk model

Nils Lid Hjort, Emil Aas Stoltenberg

Comments 26 pages, 5 figures; Statistical Research Report, Department of Mathematics, University of Oslo, August 2021, but arXiv'd March 2026. The article has appeared in essentially this form in Lifetime Data Analysis 2021, vol. 27, pages 1-31, at this url: link.springer.com/content/pdf/10.1007/s10985-021-09535-3.pdf

2603.02649 2026-03-04 cs.LG math.OC stat.ML

HomeAdam: Adam and AdamW Algorithms Sometimes Go Home to Obtain Better Provable Generalization

Feihu Huang, Guanyi Zhang, Songcan Chen

Comments 39 pages

2603.02616 2026-03-04 stat.AP cs.AI cs.LG stat.ME stat.ML

Detecting Structural Heart Disease from Electrocardiograms via a Generalized Additive Model of Interpretable Foundation-Model Predictors

Ya Zhou, Zhaohong Sun, Tianxiang Hao, Xiangjie Li

2603.02611 2026-03-04 stat.ME stat.AP

A Bayesian Hierarchical Hurdle Beta-Binomial Model for Survey-Weighted Bounded Counts and Its Application to Childcare Enrollment

JoonHo Lee

2603.02607 2026-03-04 stat.ML cs.DS cs.LG math.OC

Combinatorial Sparse PCA Beyond the Spiked Identity Model

Syamantak Kumar, Purnamrita Sarkar, Kevin Tian, Peiyuan Zhang

Comments 36 pages, 6 figures

2603.02594 2026-03-04 stat.ML cs.CC cs.DS cs.LG

Low-Degree Method Fails to Predict Robust Subspace Recovery

He Jia, Aravindan Vijayaraghavan

Comments 27 pages, 1 figure

2603.02593 2026-03-04 stat.CO stat.ME

Composite Wavelet Matrix-Based Transforms and Applications

Radhika Kulkarni, Brani Vidakovic

Comments 30 pages, 9 figures, 6 tables

2603.02574 2026-03-04 stat.AP stat.CO

An Augmented Rating System for Test cricket: adapting Glicko's model

Rhitankar Bandyopadhyay, Diganta Mukherjee

Comments 23 pages, 17 tables, 1 figure

2603.02533 2026-03-04 cs.IT cs.CV cs.LG math.IT math.ST stat.ML stat.TH

Functional Properties of the Focal-Entropy

Jaimin Shah, Martina Cardone, Alex Dytso

Comments Accepted to AISTATS 2026

2603.02509 2026-03-04 stat.ME stat.AP

Semi-partitioned Generalized Method of Moments for Longitudinal Data with Lagged and Feedback Covariates

Niloofar Ramezani, Jeffrey R. Wilson

Comments 15 pages, 0 figures, 5 tables

2603.02502 2026-03-04 stat.ME

Tree-Embedded Bayesian Factor Models for Multidimensional Categorical Distributions

Naoki Awaya, Keisuke Sasaki, Genya Kobayashi, Shonosuke Sugasawa

2603.02492 2026-03-04 cs.IT cs.LO math.IT math.LO math.ST stat.TH

E-variables and tests of randomness for distribution classes

Georgii Potapov, Yuri Kalnishkan

2603.02485 2026-03-04 stat.ME stat.AP

A Decision Analysis Framework for High-fidelity and Low-fidelity Systems with Applications in Manufacturing Processes

Fan Zhang, Qiong Zhang, Madhura Limaye, Dhanashree Shinde, Gang Li, Sai Aditya Pradeep, Srikanth Pilla

2603.02483 2026-03-04 stat.ML cs.CG cs.CV cs.LG

Geometric structures and deviations on James' symmetric positive-definite matrix bicone domain

Jacek Karwowski, Frank Nielsen

Comments 35 pages, 4 figures

2603.02474 2026-03-04 stat.ME

Transportable inference using target population summary statistics under covariate shift

Ying Sheng, Yifei Sun, Chiung-Yu Huang

2603.02467 2026-03-04 stat.CO

CCMnet: A Software Package for Network Generation with Congruence Class Models

Ravi Goyal, Victor De Gruttola, Natasha K. Martin, Lior Rennert, Jukka-Pekka Onnela

Comments 27 pages, 9 figures, 2 tables

2603.02452 2026-03-04 cs.LG cs.AI stat.ML

Manifold Aware Denoising Score Matching (MAD)

Alona Levy-Jurgenson, Alvaro Prat, James Cuin, Yee Whye Teh

2603.02437 2026-03-04 stat.CO stat.ME

Leveraging Sparsity to Improve No-U-Turn Sampling Efficiency for Hierarchical Bayesian Models

Cole C. Monnahan, Kasper Kristensen, James T. Thorson, Bob Carpenter

Comments 26 pages, 12 figures including appendices

2603.02429 2026-03-04 cs.LG math.OC stat.ML

Dimension-Independent Convergence of Underdamped Langevin Monte Carlo in KL Divergence

Shiyuan Zhang, Qiwei Di, Xuheng Li, Quanquan Gu

Comments 51 pages, 1 table

2603.02424 2026-03-04 stat.AP

On the misuse of time-dependent models in assessing mask usage and excess mortality

Beny Spira, Daniel V. Tausk

Comments 19 pages, 3 figures

Abstract

The effectiveness of face masks as a population level intervention against respiratory viral transmission remains contested. While a large observational literature published during the COVID-19 pandemic reported beneficial effects, randomized controlled trials have consistently shown limited or no impact. An ecological analysis of European countries reported that average mask usage during the years 2020 and 2021 is positively associated with excess mortality in that same period in 24 European countries (Tausk and Spira, 2025). Such association remains after several attempts at controlling for confounding variables. This finding was later challenged by other authors and attributed to reverse causality (Cerqueira-Silva et al., 2026). In this paper, we reassess those criticisms in detail. We show that their analysis is fundamentally flawed, as the time-dependent regression framework used to refute the original findings yields spurious results partly due to the use of cumulative excess mortality as an outcome variable, thereby incorporating pre-intervention deaths and producing statistically significant effects even at impossible negative time lags. Diagnostic analyses further demonstrate that key assumptions of the model are violated, invalidating any association or causal interpretation. Finally, we present an original longitudinal analysis of mask usage designed to directly test the reverse causality hypothesis. By constructing multiple indices that capture mask adoption during distinct phases of pandemic waves, including interwave periods characterized by low mortality, we show that the association between mask usage and excess mortality persists and is not driven by reactive increases in masking. These findings provide substantial evidence that reverse causality provides, at most, a minor contribution to the observed association.

2603.02418 2026-03-04 stat.AP

Contributions of geolocated weather and building related data for insurance assessment of flood risks

Mulah Moriah, Franck Vermet, Pierre Ailliot, Philippe Naveau, Juliette Legrand

2603.02372 2026-03-04 stat.OT physics.pop-ph physics.space-ph stat.AP

Implications of the Pessimistic Lower Limit on the Drake Equation

Max Baak, Hella Snoek

Comments 8 pages, 2 figures

Abstract

The observation of life on Earth is generally accepted to be uninformative concerning the probability of life on other Earth-like planets, a belief first formalized by Brandon Carter and based on the selection effect of our existence. In a similar way, the Drake equation is either presented as an estimate of the total number of active, communicative, extraterrestrial civilizations in our Galaxy ($n^g_{\rm civ}$), i.e. excluding humanity, or humanity is included in the estimate but judged to be an uninformative data point. Daniel Whitmire has recently challenged the Carter abiogenesis argument, claiming the logic behind it is flawed, as the conditional likelihoods used by Carter in Bayes' theorem are not evaluated prior to the occurrence of the evidence of life on Earth, but posterior. Doing so correctly, the anthropic selection effect is removed and the observation of life on Earth is informative after all. Following this argument, we treat the Drake equation as an estimate of all technological civilizations in a statistical counting experiment and include the data point of humanity as informative evidence. This allows one to set a pessimistic lower limit on $n^o_{\rm civ}$ for the observable universe, $n^o_{\rm civ} > 0.051$ at 95\% C.L., or $n^g_{\rm civ} > 8\times10^{-13}$ at 95\% C.L. for the Galaxy. In particular, this excludes models that predict $n^o_{\rm civ}\ll 1$ for the observable universe and refines the allowable parameter space for hypotheses like Rare Earth. Our analysis substantially reduces the portion of the Drake equation parameter space that predicts humanity is alone; when applying the lower limit this study finds $P(n^o_{\rm civ}>1 |\, {\rm humanity}) = 97.6\%$, making solitude in the observable universe a disfavored outcome. For the low-end estimate of $n^o_{\rm civ}\! =\! 1$ we calculate a probability of 42\% for the existence of other communicating civilizations.

2603.02360 2026-03-04 math.PR stat.AP

"Game, Set, Match": Double Delight Watching a Grand Slam Tennis Match

Edsel A. Pena, Dip Das, Yuexuan Wu

2603.02289 2026-03-04 stat.ME cs.LG stat.ML

Topological Causal Effects

Kwangho Kim, Hajin Lee

2603.02282 2026-03-04 stat.ME

A Simpson Based Estimation Approach for the Overlapping Coefficient of k>=2 Normal Distributions

Omar Eidous, Majd Alsheyyab

Comments 15 pages, 2 tables

2602.19988 2026-03-04 stat.ME stat.CO

Change point analysis of high-dimensional data using random projections

Yi Xu, Yeonwoo Rho

2601.13759 2026-03-04 stat.ME

ChauBoxplot and AdaptiveBoxplot: Two R packages for boxplot-based outlier detection

Tiejun Tong, Hongmei Lin, Bowen Gang, Riquan Zhang

Comments 11 pages, 2 figures, 2 tables

2511.19476 2026-03-04 stat.ML cs.AI cs.LG

FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection

Jin Cui, Boran Zhao, Jiajun Xu, Jiaqi Guo, Shuo Guan, Pengju Ren

Comments 20 pages, 17 figures

Abstract

Coreset selection compresses large datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. Existing methods are either: (i) DNN-based, which are tied to model-specific parameters and introduce architectural bias; or (ii) DNN-free, which rely on heuristics lacking theoretical guarantees. Neither approach explicitly constrains distributional equivalence, largely because continuous distribution matching is considered inapplicable to discrete sampling. Moreover, prevalent metrics (e.g., MSE, KL, CE, MMD) cannot accurately capture higher-order moment discrepancies, leading to suboptimal coresets. In this work, we propose FAST, the first DNN-free distribution-matching coreset selection framework that formulates the coreset selection task as a graph-constrained optimization problem grounded in spectral graph theory and employs the Characteristic Function Distance (CFD) to capture full distributional information in the frequency domain. We further discover that naive CFD suffers from a "vanishing phase gradient" issue in medium and high-frequency regions; to address this, we introduce an Attenuated Phase-Decoupled CFD. Furthermore, for better convergence, we design a Progressive Discrepancy-Aware Sampling strategy that progressively schedules frequency selection from low to high, preserving global structure before refining local details and enabling accurate matching with fewer frequencies while avoiding overfitting. Extensive experiments demonstrate that FAST significantly outperforms state-of-the-art coreset selection methods across all evaluated benchmarks, achieving an average accuracy gain of 9.12%. Compared to other baseline coreset methods, it reduces power consumption by 96.57% and achieves a 2.2x average speedup, underscoring its high performance and energy efficiency.

2510.10902 2026-03-04 cs.LG stat.ML

Auditing Information Disclosure During LLM-Scale Gradient Descent Using Gradient Uniqueness

Sleem Abdelghafar, Maryam Aliakbarpour, Chris Jermaine

2509.20508 2026-03-04 stat.ML cs.LG

Fast Estimation of Wasserstein Distances via Regression on Sliced Wasserstein Distances

Khai Nguyen, Hai Nguyen, Nhat Ho

Comments Accepted to ICLR 2026, 34 pages, 30 figures, 6 tables

2508.01154 2026-03-04 stat.ME

On classes of distributions on the unit interval: structural properties and application to inequality data

Roberto Vila, Helton Saulo, Poliana Matos, Subhankar Dutta

Comments 39 pages, 24 figures

2507.18686 2026-03-04 math.ST math.AG math.CO math.CV stat.TH

One-dimensional Discrete Models of Maximum Likelihood Degree One

Carlos Améndola, Viet Duc Nguyen, Janike Oldekop

Comments 25 pages, minor improvements and more references to CR geometry

2507.08965 2026-03-04 cs.LG cs.AI stat.ML

Improving Classifier-Free Guidance in Masked Diffusion: Low-Dim Theoretical Insights with High-Dim Impact

Kevin Rojas, Ye He, Chieh-Hsin Lai, Yuhta Takida, Yuki Mitsufuji, Molei Tao

2507.08150 2026-03-04 stat.ML cs.LG stat.ME

CLEAR: Calibrated Learning for Epistemic and Aleatoric Risk

Ilia Azizi, Juraj Bodik, Jakob Heiss, Bin Yu

Comments Project page: https://unco3892.github.io/clear/

2506.01502 2026-03-04 cs.LG cs.AI stat.ML

Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme

Mikhail Persiianov, Jiawei Chen, Petr Mokrov, Alexander Tyurin, Evgeny Burnaev, Alexander Korotin

2505.22423 2026-03-04 math.ST stat.TH

Max-laws of large numbers for weakly dependent high dimensional arrays with applications

Jonathan B. Hill

2505.13614 2026-03-04 cs.LG stat.ML

Deterministic Bounds and Random Estimates of Metric Tensors on Neuromanifolds

Ke Sun

Comments Published at the Fourteenth International Conference on Learning Representations (ICLR 2026)

2503.20513 2026-03-04 q-bio.BM q-bio.QM stat.AP

A Principal Submanifold-based Approach for Clustering and Multiscale RNA Correction

Menghao Wu, Zhigang Yao

Comments 30 pages, 15 figures

2501.18912 2026-03-04 stat.AP cs.SI

Constructing Reliable Social Networks from Conversational Data: An Ensemble Prompt Engineering Approach with Uncertainty Quantification

Gwanghee Kim, Ick Hoon Jin, Minjeong Jeon

2412.02333 2026-03-04 stat.ME

Estimation of a multivariate von Mises distribution for contaminated torus data

Giulia Bertagnolli, Luca Greco, Claudio Agostinelli

2412.00798 2026-03-04 cs.LG stat.ML

Combinatorial Rising Bandits

Seockbean Song, Youngsik Yoon, Siwei Wang, Wei Chen, Jungseul Ok

2411.01563 2026-03-04 math.ST stat.ML stat.TH

Statistical guarantees for denoising reflected diffusion models

Asbjørn Holk, Claudia Strauch, Lukas Trottner

2410.06378 2026-03-04 stat.ML cs.AI cs.IT cs.LG math.IT

Covering Numbers for Deep ReLU Networks with Applications to Function Approximation and Nonparametric Regression

Weigutian Ou, Helmut Bölcskei

Comments To appear in Foundations of Computational Mathematics

2407.10417 2026-03-04 stat.ML cs.LG

Proper losses regret at least 1/2-order

Han Bao, Asuka Takatsu

Comments JMLR accepted (50 pages)

2406.01819 2026-03-04 stat.ME

Bayesian Linear Models: A compact general set of results

J Andres Christen

Comments 13 pages, 4 figures, Python implementation

2406.00961 2026-03-04 math.PR math.ST stat.TH

Kronecker-product random matrices and a matrix least squares problem

Zhou Fan, Renyuan Ma

2403.10945 2026-03-04 stat.ME stat.AP

Zero-inflated stochastic volatility model for disaggregated inflation data with exact zeros

Geonhee Han, Kaoru Irie

2302.12717 2026-03-04 stat.ME stat.ML

Statistical Inference with Stochastic Gradient Methods under $ϕ$-mixing Data

Ruiqi Liu, Xi Chen, Zuofeng Shang

2210.10278 2026-03-04 cs.LG cs.GT stat.ML

A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design

Rui Ai, Boxiang Lyu, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan