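arXivDaily: a daily academic digest that syncs the full arXiv feed with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems.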

2602.05997 2026-02-06 stat.ML cs.LG stat.ME

Causal Inference on Stopped Random Walks in Online Advertising

Jia Yuan Yu

2602.05996 2026-02-06 cs.LG stat.ML

Orthogonal Self-Attention

Leo Zhang, James Martens

Comments Preprint

2602.05927 2026-02-06 stat.ML cs.LG

Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences

Siquan Li, Yao Tong, Haonan Wang, Tianyang Hu

Abstract

Transformers underpin modern large language models (LLMs) and are commonly assumed to be behaviorally unstructured at random initialization, with all meaningful preferences emerging only through large-scale training. We challenge this assumption by showing that randomly initialized transformers already exhibit strong and systematic structural biases. In particular, untrained models display extreme token preferences: across random input sequences, certain tokens are predicted with probabilities orders of magnitude larger. We provide a mechanistic explanation for this phenomenon by dissecting the transformer architecture at initialization. We show that extreme token preference arises from a contraction of token representations along a random seed-dependent direction. This contraction is driven by two interacting forces: (i) asymmetric nonlinear activations in MLP sublayers induce global (inter-sequence) representation concentration, and (ii) self-attention further amplifies this effect through local (intra-sequence) aggregation. Together, these mechanisms align hidden representations along a direction determined solely by the random initialization, producing highly non-uniform next-token predictions. Beyond mechanistic insight, we demonstrate that these initialization-induced biases persist throughout training, forming a stable and intrinsic model identity. Leveraging this property, we introduce SeedPrint, a fingerprinting method that can reliably distinguish models that differ only in their random initialization, even after extensive training and under substantial distribution shift. Finally, we identify a fundamental positional discrepancy inherent to the attention mechanism's intra-sequence contraction that is causally linked to the attention-sink phenomenon. This discovery provides a principled explanation for the emergence of sinks and offers a pathway for their control.
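The first mechanism named in the abstract, asymmetric activations concentrating representations, can be illustrated with a toy experiment. The sketch below is not the authors' setup: it assumes a single randomly initialized ReLU layer in place of a transformer MLP sublayer, Gaussian random inputs in place of token embeddings, and mean pairwise cosine similarity as a stand-in measure of concentration. All function names and sizes are hypothetical.

```python
import math
import random


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors."""
    total, count = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            u, v = vectors[i], vectors[j]
            dot = sum(a * b for a, b in zip(u, v))
            norm_u = math.sqrt(sum(a * a for a in u))
            norm_v = math.sqrt(sum(b * b for b in v))
            total += dot / (norm_u * norm_v)
            count += 1
    return total / count


def relu_layer(weights, x):
    """One randomly initialized hidden layer: ReLU(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def concentration_demo(seed=0, dim=16, width=64, n_inputs=30):
    """Return mean pairwise cosine similarity before and after the ReLU layer."""
    rng = random.Random(seed)
    # Assumed initialization: i.i.d. Gaussian weights with variance 1/dim.
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
               for _ in range(width)]
    # Assumed inputs: independent Gaussian vectors, so they start nearly orthogonal.
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_inputs)]
    hidden = [relu_layer(weights, x) for x in inputs]
    return mean_pairwise_cosine(inputs), mean_pairwise_cosine(hidden)


before, after = concentration_demo()
print(f"mean cosine before: {before:.3f}, after ReLU layer: {after:.3f}")
```

Because ReLU outputs are non-negative, unrelated inputs are mapped to vectors that share a common positive component, so the similarity after the layer should be clearly higher than before. Which direction they share depends on the sampled weights, which is the seed dependence the abstract describes; the attention-driven amplification is not modeled here.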

2602.05861 2026-02-06 cs.LG stat.ML

CFRecs: Counterfactual Recommendations on Real Estate User Listing Interaction Graphs

Seyedmasoud Mousavi, Ruomeng Xu, Xiaojing Zhu

2602.05852 2026-02-06 cs.LG cs.IT math.IT stat.ML

Exact Recovery in the Data Block Model

Amir R. Asadi, Akbar Davoodi, Ramin Javadi, Farzad Parvaresh

Comments 35 pages

2602.05846 2026-02-06 stat.ML cs.LG

Optimal scaling laws in learning hierarchical multi-index models

Leonardo Defilippis, Florent Krzakala, Bruno Loureiro, Antoine Maillard

2602.05812 2026-02-06 cs.LG stat.ML

Principled Confidence Estimation for Deep Computed Tomography

Matteo Gätzner, Johannes Kirschner

2602.05799 2026-02-06 math.OC cs.LG stat.ML

Non-Stationary Inventory Control with Lead Times

Nele H. Amiri, Sean R. Sinclair, Maximiliano Udenio

2602.05798 2026-02-06 stat.ME cs.LG eess.SP stat.ML

Learning False Discovery Rate Control via Model-Based Neural Networks

Arnau Vilella, Jasin Machkour, Michael Muma, Daniel P. Palomar

Comments Accepted to IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2026

2602.05784 2026-02-06 stat.ME

Correcting Measurement Error and Zero Inflation in Functional Covariates for Scalar-on-Function Quantile Regression

Caihong Qin, Lan Xue, Ufuk Beyaztas, Roger S. Zoh, Mark Benden, Jeff Goldsmith, Carmen D. Tekwe

2602.05778 2026-02-06 stat.ME

Copula-based models for spatially dependent cylindrical data

Francesca Labanca, Anna Gottard, Nadja Klein

2602.05704 2026-02-06 cs.LG stat.ML

Limitations of SGD for Multi-Index Models Beyond Statistical Queries

Daniel Barzilai, Ohad Shamir

2602.05611 2026-02-06 stat.ME

The stochastic view used in climate sciences: (some) perspectives from (some of) mathematical statistics

Nils Lid Hjort

Comments 17 pages, 18 figures

2602.05592 2026-02-06 math.ST econ.EM stat.TH

An invariant modification of the bilinear form test

Angelo Garate, Felipe Osorio, Federico Crudu

Comments 7 pages

2602.05559 2026-02-06 stat.CO math.ST stat.AP stat.ML stat.TH

Piecewise Deterministic Markov Processes for Bayesian Inference of PDE Coefficients

Leon Riccius, Iuri B. C. M. Rocha, Joris Bierkens, Hanne Kekkonen, Frans P. van der Meer

Comments 38 pages, 17 figures

2602.05531 2026-02-06 math.OC cs.LG stat.ML

Solving Stochastic Variational Inequalities without the Bounded Variance Assumption

Ahmet Alacaoglu, Jun-Hyun Kim

2602.05489 2026-02-06 math.OC cs.LG stat.ML

Convergence Rate of the Last Iterate of Stochastic Proximal Algorithms

Kevin Kurian Thomas Vaidyan, Michael P. Friedlander, Ahmet Alacaoglu

2602.05460 2026-02-06 math.ST stat.TH

Complexity reduction in online stochastic Newton methods with potential O(N d) total cost

Antoine Godichon-Baggioni, Bruno Portier, Guillaume Sallé

2602.05379 2026-02-06 stat.ML cs.LG

Variance Reduction Based Experience Replay for Policy Optimization

Hua Zheng, Wei Xie, M. Ben Feng, Keilung Choy

Comments 24 pages, 4 figures. arXiv admin note: text overlap with arXiv:2208.12341

2602.05377 2026-02-06 stat.CO stat.AP

Optimal Accelerated Life Testing Sampling Plan Design with Piecewise Linear Function based Modeling of Lifetime Characteristics

Sandip Barui, Shovan Chowdhury

2602.05351 2026-02-06 stat.ME math.ST stat.TH

A Flexible Modeling of Extremes in the Presence of Inliers

Shivshankar Nila, Ishapathik Das, N. Balakrishna

2602.05340 2026-02-06 stat.ML cs.LG

Decision-Focused Sequential Experimental Design: A Directional Uncertainty-Guided Approach

Beichen Wan, Mo Liu, Paul Grigas, Zuo-Jun Max Shen

2602.05335 2026-02-06 stat.ME cs.GR stat.AP

Boxplots and quartile plots for grouped and periodic angular data

Joshua D. Berlinski, Fan Dai, Ranjan Maitra

Comments 7 pages, 8 figures

2602.05225 2026-02-06 math.ST stat.ML stat.TH

Metric space valued Fréchet regression

László Györfi, Pierre Humbert, Batiste Le Bars

2602.04795 2026-02-06 cs.LG cs.NA eess.SP math.NA stat.ML

Maximum-Volume Nonnegative Matrix Factorization

Olivier Vu Thanh, Nicolas Gillis

Comments arXiv admin note: substantial text overlap with arXiv:2412.06380 (this paper is an updated version of Chapter 7 of the thesis of the first author, available from arXiv:2412.06380). The code is available from https://gitlab.com/vuthanho/maxvolmf.jl

2602.04408 2026-02-06 cs.LG stat.ML

Separation-Utility Pareto Frontier: An Information-Theoretic Characterization

Shizhou Xu

2602.03539 2026-02-06 math.ST stat.TH

Optimal neural network approximation of smooth compositional functions on sets with low intrinsic dimension

Thomas Nagler, Sophie Langer

2601.21200 2026-02-06 stat.ML cs.LG

Provably Reliable Classifier Guidance via Cross-Entropy Control

Sharan Sahu, Arisina Banerjee, Yuchen Wu

Comments 31 pages, 3 figures

2510.25753 2026-02-06 stat.ML cs.LG

How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs

Samet Demir, Zafer Dogan

Comments NeurIPS 2025, 24 pages, 6 figures

2510.25502 2026-02-06 cs.LG cs.AI stat.ML

TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting

Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, Frank Hutter

Comments 38 pages, 22 figures, 17 tables

2510.22031 2026-02-06 cs.LG cs.AI stat.ML

Differentiable Constraint-Based Causal Discovery

Jincheng Zhou, Mengbo Wang, Anqi He, Yumeng Zhou, Hessam Olya, Murat Kocaoglu, Bruno Ribeiro

2510.18713 2026-02-06 cs.LG cs.AI stat.ML

Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options

Joongkyu Lee, Seouh-won Yi, Min-hwan Oh

Comments Accepted at NeurIPS 2025

2510.08916 2026-02-06 stat.ML cs.LG

A Representer Theorem for Hawkes Processes via Penalized Least Squares Minimization

Hideaki Kim, Tomoharu Iwata

Comments Accepted to ICLR 2026

2509.06505 2026-02-06 cs.LG cs.IT math.IT stat.ML

On optimal solutions of classical and sliced Wasserstein GANs with non-Gaussian data

Yu-Jui Huang, Hsin-Hua Shen, Yu-Chih Huang, Wan-Yi Lin, Shih-Chun Lin

2508.12735 2026-02-06 cs.DL stat.AP

Citation accuracy, citation noise, and citation bias: A foundation of citation analysis

Lutz Bornmann, Christian Leibel

2508.08336 2026-02-06 stat.ME

Empirical Bayes for Data Integration

Paul Rognon-Vael, David Rossell

2505.20295 2026-02-06 cs.CL cs.AI cs.LG stat.ML

SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?

Michael Kirchhof, Luca Füger, Adam Goliński, Eeshan Gunesh Dhekane, Arno Blaas, Seong Joon Oh, Sinead Williamson

Comments Accepted at ICLR 2026

2505.15423 2026-02-06 cs.LG econ.EM stat.AP stat.ME stat.ML

SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding

Marcell T. Kurbucz, Nikolaos Tzivanakis, Nilufer Sari Aslam, Adam M. Sykulski

Comments 15 pages, 1 figure, 3 tables

Journal ref Scientific Reports 15, 42454 (2025)

2504.12841 2026-02-06 cs.LG cs.AI cs.CV cs.MS stat.ML

ALT: A Python Package for Lightweight Feature Representation in Time Series Classification

Balázs P. Halmos, Balázs Hajós, Vince Á. Molnár, Marcell T. Kurbucz, Antal Jakovác

Comments 16 pages, 4 figures

Journal ref Machine Learning: Science and Technology (2026)

2504.02974 2026-02-06 math.ST stat.TH

Testing hypotheses generated by constraints

Martin Larsson, Aaditya Ramdas, Johannes Ruf

2503.22548 2026-02-06 stat.AP

Comparing methods to assess treatment effect heterogeneity in general parametric regression models

Yao Chen, Sophie Sun, Konstantinos Sechidis, Cong Zhang, Torsten Hothorn, Björn Bornkamp

2503.03557 2026-02-06 stat.AP stat.ME

Causal language jumps in clinical practice guidelines for diabetes management

Keling Wang, Chang Wei, Jeremy A. Labrecque

Comments 10 pages, 4 figures, 3 tables, 4 supplementary files

Journal ref BMJ Open 2026;16:e109205

2503.03530 2026-02-06 stat.ME

Inference for Heterogeneous Treatment Effects with Efficient Instruments and Machine Learning

Cyrill Scheidegger, Zijian Guo, Peter Bühlmann

2502.09986 2026-02-06 stat.ME stat.ML

Statistical description and dimension reduction of continuous time categorical trajectories with multivariate functional principal components

Hervé Cardot, Caroline Peltier

2502.00713 2026-02-06 stat.AP stat.ME

Using Individualized Treatment Effects to Assess Treatment Effect Heterogeneity

Konstantinos Sechidis, Cong Zhang, Sophie Sun, Yao Chen, Asher Spector, Björn Bornkamp

Journal ref Statistics in Medicine 2025

2501.09217 2026-02-06 cs.LG cs.AI cs.CV stat.ML

Adaptive Law-Based Transformation (ALT): A Lightweight Feature Representation for Time Series Classification

Marcell T. Kurbucz, Balázs Hajós, Balázs P. Halmos, Vince Á. Molnár, Antal Jakovác

Comments 8 pages, 1 figure, 5 tables

Journal ref Scientific Reports 15, 41775 (2025)

2412.00160 2026-02-06 q-bio.QM stat.AP

How reproducible are data-driven subtypes of Alzheimer's disease atrophy?

Emma Prevot, Cameron Shand, Neil Oxtoby, for Alzheimer's Disease Neuroimaging Initiative

Journal ref Journal of Alzheimer's Disease (2026)

2410.04560 2026-02-06 cs.LG stat.ML

GAMformer: Bridging Tabular Foundation Models and Interpretable Machine Learning

Andreas Mueller, Julien Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter

Comments 22 pages, 15 figures

2409.01978 2026-02-06 quant-ph cs.LG stat.ML

Application of Langevin Dynamics to Advance the Quantum Natural Gradient Optimization Algorithm

Oleksandr Borysenko, Mykhailo Bratchenko, Ilya Lukin, Mykola Luhanko, Ihor Omelchenko, Andrii Sotnikov, Alessandro Lomi

Comments 11 pages, 3 figures

Journal ref Physica A 682 (2026) 131158

2407.02085 2026-02-06 stat.ME stat.CO

Regularized estimation of Monge-Kantorovich quantiles for spherical data

Bernard Bercu, Jérémie Bigot, Gauthier Thurin

2405.14982 2026-02-06 cs.LG cs.AI cs.CL stat.ML

In-context Time Series Predictor

Jiecheng Lu, Yan Sun, Shihao Yang

Comments Camera-ready version. Accepted at ICLR 2025

Journal ref Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025)

2403.01673 2026-02-06 stat.ML cs.AI cs.LG

CATS: Enhancing Multivariate Time Series Forecasting by Constructing Auxiliary Time Series as Exogenous Variables

Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang

Comments Camera-ready version. Accepted at ICML 2024

Journal ref Proceedings of the Forty-first International Conference on Machine Learning (ICML 2024)

2402.10506 2026-02-06 math.ST math.PR stat.TH

Optimistic Estimation of Convergence in Markov Chains with the Average-Mixing Time

Geoffrey Wolfer, Pierre Alquier

2310.09488 2026-02-06 stat.ML cs.LG

ARM: Refining Multivariate Forecasting with Adaptive Temporal-Contextual Learning

Jiecheng Lu, Xu Han, Shihao Yang

Comments Camera-ready version. Accepted at ICLR 2024

Journal ref Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024)

2309.16858 2026-02-06 stat.ML cs.LG

Improved Generalization Bounds for Transductive Learning by Transductive Local Complexity and Its Applications

Yingzhen Yang

Comments The ICML 2025 conference version (https://openreview.net/pdf?id=NRVdvg7VMn) is a special case of this paper where the chain length is fixed at 2 (i.e., $Q=2$, see Def. 5.1), and its main results follow directly from the results here. This paper further provides a nearly optimal excess risk bound for realizable transductive learning and a stronger bound for transductive kernel learning

2307.01930 2026-02-06 cs.LG cs.AI cs.CV stat.AP stat.ML

Learning ECG Signal Features Without Backpropagation Using Linear Laws

Péter Pósfay, Marcell T. Kurbucz, Péter Kovács, Antal Jakovác

Comments 35 pages, 3 figures, 3 tables

Journal ref Machine Learning: Science and Technology 6, 035001 (2025)

2305.15793 2026-02-06 cs.LG cs.AI cs.CE stat.CO

Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS)

Gergely Hanczár, Marcell Stippinger, Dávid Hanák, Marcell T. Kurbucz, Olivér M. Törteli, Ágnes Chripkó, Zoltán Somogyvári

Comments 9 pages, 2 figures, 2 tables

Journal ref Machine Learning: Science and Technology 4, 045012 (2023)

2304.14211 2026-02-06 cs.LG cs.AI cs.CV cs.MS stat.ML

LLT: An R package for Linear Law-based Feature Space Transformation

Marcell T. Kurbucz, Péter Pósfay, Antal Jakovác

Comments 15 pages, 5 figures, 1 table

Journal ref SoftwareX 25, 101623 (2024)

2301.05936 2026-02-06 math.PR math.OC math.ST stat.TH

Arcade Processes for Informed Martingale Interpolation

Georges Kassis, Andrea Macrina

Comments On 4 February 2026, accepted for publication in Stochastic Processes and Their Applications

2210.00200 2026-02-06 stat.ME math.ST stat.TH

Semiparametric Efficient Fusion of Individual Data and Summary Statistics

Wenjie Hu, Ruoyu Wang, Wei Li, Wang Miao

Comments 69 pages, 5 figures

2602.05272 2026-02-06 math.ST math.PR stat.ML stat.TH

Asymptotically optimal sequential change detection for bounded means

Ashwin Ram, Aaditya Ramdas

Comments Preprint

2602.05259 2026-02-06 math.ST stat.ML stat.TH

An Asymptotic Law of the Iterated Logarithm for $\mathrm{KL}_{\inf}$

Ashwin Ram, Aaditya Ramdas

Comments Preprint

2602.05246 2026-02-06 stat.AP

Active Simulation-Based Inference for Scalable Car-Following Model Calibration

Menglin Kong, Chengyuan Zhang, Lijun Sun

2602.05239 2026-02-06 stat.ME

Impact Range Assessment (IRA): An Interpretable Sensitivity Measure for Regression Modelling

Jihao You, Dan Tulpan, Jiaojiao Diao, Jennifer L. Ellis

Comments 17 pages, 3 figures. This manuscript was first submitted to MethodsX on February 4, 2026

2602.05230 2026-02-06 cs.LG cs.AI stat.ML

ZeroS: Zero-Sum Linear Attention for Efficient Transformers

Jiecheng Lu, Xu Han, Yan Sun, Viresh Pati, Yubin Kim, Siddhartha Somani, Shihao Yang

Comments Camera-ready version. Accepted at NeurIPS 2025

Journal ref Proceedings of the Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025)

2602.05226 2026-02-06 stat.AP econ.EM stat.ME

Predictive Synthesis under Sporadic Participation: Evidence from Inflation Density Surveys

Matthew C. Johnson, Matteo Luciani, Minzhengxiong Zhang, Kenichiro McAlinn

2602.05174 2026-02-06 stat.ML cs.AI cs.LG math.ST stat.TH

Total Variation Rates for Riemannian Flow Matching

Yunrui Guan, Krishnakumar Balasubramanian, Shiqian Ma

2602.05106 2026-02-06 cs.CL cs.LG stat.ML

Data Kernel Perspective Space Performance Guarantees for Synthetic Data from Transformer Models

Michael Browder, Kevin Duh, J. David Harris, Vince Lyzinski, Paul McNamee, Youngser Park, Carey E. Priebe, Peter Viechnicki

2602.05082 2026-02-06 cs.LG cs.AI stat.ML

Reliable Explanations or Random Noise? A Reliability Metric for XAI

Poushali Sengupta, Sabita Maharjan, Frank Eliassen, Shashi Raj Pandey, Yan Zhang

2602.05065 2026-02-06 cs.LG math.OC stat.ML

Does SGD Seek Flatness or Sharpness? An Exactly Solvable Model

Yizhou Xu, Pierfrancesco Beneventano, Isaac Chuang, Liu Ziyin

2602.05041 2026-02-06 stat.ME

A Weighting Framework for Clusters as Confounders in Observational Studies

Eli Ben-Michael, Avi Feller, Luke Keele

2602.05032 2026-02-06 stat.CO

Fast Compute via MC Boosting

Sarah Polson, Vadim Sokolov

2602.05028 2026-02-06 stat.AP

Physics-Informed Diffusion Models for Vehicle Speed Trajectory Generation

Vadim Sokolov, Farnaz Behnia, Dominik Karbowski

2602.05022 2026-02-06 stat.ME

Double Variable Importance Matching to Estimate Distinct Causal Effects on Event Probability and Timing

Yuqi Li, Quinn Lanners, Matthew M. Engelhard

2602.04895 2026-02-06 cs.CR cs.DS cs.LG stat.ML

Privacy Amplification Persists under Unlimited Synthetic Data Release

Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

2602.04891 2026-02-06 stat.ME physics.comp-ph stat.AP

Penalized Likelihood Parameter Estimation for Differential Equation Models: A Computational Tutorial

Matthew J Simpson, James S Bennett, Alexander Johnston, Ruth E Baker

Comments 28 pages, 6 figures

2602.04886 2026-02-06 cs.LG cs.AI cs.CE stat.ML

Denoising diffusion networks for normative modeling in neuroimaging

Luke Whitbread, Lyle J. Palmer, Mark Jenkinson

Comments 55 pages, 20 figures

2511.15120 2026-02-06 stat.ML cs.AI cs.IT cs.LG math.IT math.ST stat.TH

Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit

Bohan Zhang, Zihao Wang, Hengyu Fu, Jason D. Lee

Comments 85 pages, 2 figures. The order of the first two authors was determined by a coin flip. Accepted by ICLR 2026

2511.08667 2026-02-06 cs.LG stat.ML

TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models

Léo Grinsztajn, Klemens Flöge, Oscar Key, Felix Birkel, Philipp Jund, Brendan Roof, Benjamin Jäger, Dominik Safaric, Simone Alessi, Adrian Hayler, Mihir Manium, Rosen Yu, Felix Jablonski, Shi Bin Hoo, Anurag Garg, Jake Robertson, Magnus Bühler, Vladyslav Moroshan, Lennart Purucker, Clara Cornu, Lilly Charlotte Wehrhahn, Alessandro Bonetto, Bernhard Schölkopf, Sauraj Gambhir, Noah Hollmann, Frank Hutter

2510.24710 2026-02-06 math.OC cs.IT cs.LG math.IT stat.ML

A Single-Loop First-Order Algorithm for Linearly Constrained Bilevel Optimization

Wei Shen, Jiawei Zhang, Minhui Huang, Cong Shen

Comments NeurIPS 2025

2508.06483 2026-02-06 math.PR math.ST stat.ML stat.TH

A variational approach to dimension-free self-normalized concentration

Ben Chugg, Aaditya Ramdas

Comments 42 pages

2508.00120 2026-02-06 stat.ME stat.ML

AdapDISCOM: An Adaptive Sparse Regression Method for High-Dimensional Multimodal Data With Block-Wise Missingness and Measurement Errors

Maimouna Baldé, Abdoul O. Diakité, Claudia Moreau, Gleb Bezgin, Nikhil Bhagwat, Pedro Rosa-Neto, Jean-Baptiste Poline, Simon Girard, Amadou Barry

Comments 49 pages, 4 figures

2505.21799 2026-02-06 math.OC cs.LG stat.ML

PolarGrad: A Class of Matrix-Gradient Optimizers from a Unifying Preconditioning Perspective

Tim Tsz-Kit Lau, Qi Long, Weijie Su

Comments Minor typos corrected

2505.17004 2026-02-06 cs.LG cs.AI cs.NA math.NA stat.ML

Guided Diffusion Sampling on Function Spaces with Applications to PDEs

Jiachen Yao, Abbas Mammadov, Julius Berner, Gavin Kerrigan, Jong Chul Ye, Kamyar Azizzadenesheli, Anima Anandkumar

Comments Accepted to NeurIPS 2025

2505.05961 2026-02-06 math.DG stat.CO

GEORCE: A Fast New Control Algorithm for Computing Geodesics

Frederik Möbius Rygaard, Søren Hauberg

2502.07244 2026-02-06 cs.LG cs.AI stat.ML

Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting

Jiecheng Lu, Shihao Yang

Comments Camera-ready version. Accepted at ICML 2025

Journal ref Proceedings of the Forty-second International Conference on Machine Learning (ICML 2025)

2501.16156 2026-02-06 stat.ME

Moving toward best practice when using propensity score weighting in survey observational studies

Yukang Zeng, Fan Li, Guangyu Tong

2411.09686 2026-02-06 stat.ML cs.LG

Conditional regression for the Nonlinear Single-Variable Model

Yantao Wu, Mauro Maggioni

Comments 63 pages, 11 figures

2410.03159 2026-02-06 cs.LG cs.AI stat.ML

WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting

Jiecheng Lu, Xu Han, Yan Sun, Shihao Yang

Comments Camera-ready version. Accepted at ICML 2025

Journal ref Proceedings of the Forty-second International Conference on Machine Learning (ICML 2025)

2406.05014 2026-02-06 stat.ML cs.LG

Root Cause Analysis of Outliers with Missing Structural Knowledge

William Roy Orchard, Nastaran Okati, Sergio Hernan Garrido Mejia, Patrick Blöbaum, Dominik Janzing

Comments Accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

2405.07109 2026-02-06 stat.ME

Bridging Binarization: Causal Inference with Dichotomized Continuous Exposures

Kaitlyn J. Lee, Alan Hubbard, Alejandro Schuler

DOI: 10.1515/jci-2024-0049

Abstract

The average treatment effect (ATE) is a common parameter estimated in causal inference literature, but it is only defined for binary exposures. Thus, despite concerns raised by some researchers, many studies seeking to estimate the causal effect of a continuous exposure create a new binary exposure variable by dichotomizing the continuous values into two categories. In this paper, we affirm binarization as a statistically valid method for answering causal questions about continuous exposures by showing the equivalence between the binarized ATE and the difference in the average outcomes of two specific modified treatment policies. These policies impose cut-offs corresponding to the binarized exposure variable and assume preservation of relative self-selection. Relative self-selection is the ratio of the probability density of an individual having an exposure equal to one value of the continuous exposure variable versus another. The policies assume that, for any two values of the exposure variable with non-zero probability density after the cut-off, this ratio will remain unchanged. Through this equivalence, we clarify the assumptions underlying binarization and discuss how to properly interpret the resulting estimator. Additionally, we introduce a new target parameter that can be computed after binarization that considers the observed world as a benchmark. We argue that this parameter addresses more relevant causal questions than the traditional binarized ATE parameter. We present a simulation study to illustrate the implications of these assumptions when analyzing data and to demonstrate how to correctly implement estimators of the parameters discussed. Finally, we present an application of this method to evaluate the effect of a law in the state of California which seeks to limit exposures to oil and gas wells on birth outcomes to further illustrate the underlying assumptions.
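The relative self-selection condition in the abstract can be written out as follows. The notation is assumed for illustration and is not taken from the paper: $A$ is the continuous exposure, $W$ the covariates, $Y$ the outcome, $c$ the cut-off, $f(a \mid w)$ the observed conditional exposure density, and $f_{d_1}$ the exposure density under the policy that forces $A > c$. One density that keeps the ratio unchanged above the cut-off is the truncated one, and under it the policy mean reduces to a conditional mean given the binarized exposure:

```latex
% Preservation of relative self-selection above the cut-off c (assumed notation)
\[
  \frac{f_{d_1}(a \mid w)}{f_{d_1}(a' \mid w)} = \frac{f(a \mid w)}{f(a' \mid w)}
  \quad \text{for all } a, a' > c \text{ with } f(a' \mid w) > 0,
  \qquad
  f_{d_1}(a \mid w) = \frac{f(a \mid w)\,\mathbb{1}\{a > c\}}{P(A > c \mid W = w)} .
\]
% Averaging the outcome regression over the truncated density gives
% the conditional mean among units above the cut-off.
\[
  \int \mathbb{E}[Y \mid A = a, W = w]\, f_{d_1}(a \mid w)\, da
  = \mathbb{E}[Y \mid A > c, W = w] .
\]
```

The same construction with $\mathbb{1}\{a \le c\}$ gives the policy for the lower category. The difference of the two conditional means, averaged over $W$, is the usual adjusted contrast for the binarized exposure, which is consistent with the equivalence the abstract states.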

2405.07102 2026-02-06 stat.ME stat.AP stat.OT

Nested Instrumental Variables Analysis: Switcher Average Treatment Effect, Identification, Efficient Estimation and Generalizability

Rui Wang, Ying-Qi Zhao, Oliver Dukes, Bo Zhang

2404.15617 2026-02-06 cs.LG cs.AI math.OC math.ST stat.TH

A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen, Chandrajit Bajaj

Comments NeurIPS 2025

2311.01147 2026-02-06 stat.ME stat.CO

Variational Inference for Sparse Poisson Regression

Mitra Kharabati, Morteza Amini, Mohammad Arashi

Comments A part of the PhD thesis of Miss Mitra Kharabati