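arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.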

2507.09875 2026-03-05 cs.CL cs.AI cs.LG

Function Induction and Task Generalization: An Interpretability Study with Off-by-One Addition

Qinyuan Ye, Robin Jia, Xiang Ren

Comments ICLR 2026. Code: https://github.com/INK-USC/function-induction

Details

English abstract

Large language models demonstrate the intriguing ability to perform unseen tasks via in-context learning. However, it remains unclear what mechanisms inside the model drive such task-level generalization. In this work, we approach this question through the lens of off-by-one addition (i.e., 1+1=3, 2+2=5, 3+3=?), a two-step, counterfactual task with an unexpected +1 function as a second step. Leveraging circuit-style interpretability techniques such as path patching, we analyze the models' internal computations behind their performance and present three key findings. First, we identify a mechanism that explains the model's generalization from standard addition to off-by-one addition. It resembles the induction head mechanism described in prior work, yet operates at a higher level of abstraction; we therefore term it "function induction" in this work. Second, we show that the induction of the +1 function is governed by multiple attention heads in parallel, each of which emits a distinct piece of the +1 function. Finally, we find that this function induction mechanism is reused in a broader range of tasks, including synthetic tasks such as shifted multiple-choice QA and algorithmic tasks such as base-8 addition. Overall, our findings offer deeper insights into how reusable and composable structures within language models enable task-level generalization.
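The off-by-one addition task from the abstract (1+1=3, 2+2=5, 3+3=?) can be illustrated with a minimal prompt builder. This is only a sketch: the function names (`off_by_one_example`, `build_prompt`, `expected_answer`) and the newline-separated prompt layout are assumptions made for illustration, and are not taken from the authors' repository.

```python
def off_by_one_example(a: int, b: int) -> str:
    """Format one demonstration whose stated answer is the true sum plus one."""
    return f"{a}+{b}={a + b + 1}"


def build_prompt(demos: list[tuple[int, int]], query: tuple[int, int]) -> str:
    """Join the demonstrations and append the unanswered query line."""
    lines = [off_by_one_example(a, b) for a, b in demos]
    qa, qb = query
    lines.append(f"{qa}+{qb}=")
    return "\n".join(lines)


def expected_answer(query: tuple[int, int]) -> int:
    """Answer under the two-step rule: standard addition, then the +1 function."""
    return query[0] + query[1] + 1


if __name__ == "__main__":
    # Hypothetical layout: one demonstration per line, query on the last line.
    print(build_prompt([(1, 1), (2, 2)], (3, 3)))
    print(expected_answer((3, 3)))
```

A model that has induced the +1 function from the demonstrations would complete the final line with 7 rather than the standard sum 6.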


2507.09768 2026-03-05 cs.LG cs.SD eess.AS

Knowing When to Quit: Probabilistic Early Exits for Speech Separation

Kenny Falkær Olsen, Mads Østergaard, Karl Ulbæk, Søren Føns Nielsen, Rasmus Malik Høegh Lindrup, Bjørn Sand Jensen, Morten Mørup

Comments Accepted at ICLR 2026

2507.08492 2026-03-05 cs.CV

D2Dewarp: Dual Dimensions Geometric Representation Learning Based Document Image Dewarping

Heng Li, Xiangping Wu, Qingcai Chen

Comments Accepted by CVPR 2026

2507.06196 2026-03-05 cs.CL cs.AI cs.LG

UQLM: A Python Package for Uncertainty Quantification in Large Language Models

Dylan Bouchard, Mohit Singh Chauhan, David Skarbrevik, Ho-Kyeong Ra, Viren Bajaj, Zeya Ahmad

Comments Accepted by JMLR; UQLM Repository: https://github.com/cvs-health/uqlm

2507.03112 2026-03-05 cs.CL cs.AI cs.CY

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

Peisong Wang, Ruotian Ma, Bang Zhang, Xingyu Chen, Zhiwei He, Kang Luo, Qingsong Lv, Qingxuan Jiang, Zheng Xie, Shanyi Wang, Yuan Li, Fanghua Ye, Jian Li, Yifan Yang, Zhaopeng Tu, Xiaolong Li

Comments Code: https://github.com/Tencent/DigitalHuman/tree/main/RLVER

2507.02751 2026-03-05 cs.CV

Partial Weakly-Supervised Oriented Object Detection

Mingxin Liu, Peiyuan Zhang, Yuan Liu, Wei Zhang, Yue Zhou, Ning Liao, Ziyang Gong, Junwei Luo, Zhirui Wang, Yi Yu, Xue Yang

Comments 10 pages, 5 figures, 4 tables, source code: https://github.com/VisionXLab/PWOOD

2506.23971 2026-03-05 cs.LG

UMA: A Family of Universal Models for Atoms

Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, Sushree Jagriti Sahoo, Zachary W. Ulissi, C. Lawrence Zitnick

Comments 33 pages, 8 figures

2506.18703 2026-03-05 cs.CL cs.LG

Context Biasing for Pronunciation-Orthography Mismatch in Automatic Speech Recognition

Christian Huber, Alexander Waibel

2506.17896 2026-03-05 cs.CV cs.AI

EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations

Junho Park, Andrew Sangwoo Ye, Taein Kwon

Comments Accepted by ICLR 2026. Project Page: https://redorangeyellowy.github.io/EgoWorld/

2506.15963 2026-03-05 cs.LG

On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy

Jingyi Cui, Qi Zhang, Yifei Wang, Yisen Wang

Comments Accepted to ICLR 2026

2506.13150 2026-03-05 cs.LG math.OC stat.ML

Federated ADMM from Bayesian Duality

Thomas Möllenhoff, Siddharth Swaroop, Finale Doshi-Velez, Mohammad Emtiyaz Khan

Comments First two authors contributed equally. Published at ICLR 2026. Code is at https://github.com/team-approx-bayes/bayes-admm

2506.09669 2026-03-05 cs.CL

Query-Level Uncertainty in Large Language Models

Lihu Chen, Gerard de Melo, Fabian M. Suchanek, Gaël Varoquaux

Comments ICLR 2026

2506.05937 2026-03-05 cs.LG cs.AI

Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning

Charmaine Barker, Daniel Bethell, Simos Gerasimou

2506.05634 2026-03-05 cs.LG cs.AI cs.NE

AutoQD: Automatic Discovery of Diverse Behaviors with Quality-Diversity Optimization

Saeed Hedayatian, Stefanos Nikolaidis

Comments Accepted to ICLR 2026

2506.02168 2026-03-05 cs.LG

An Approximation Theory Perspective on Machine Learning

Hrushikesh N. Mhaskar, Efstratios Tsoukanis, Ameya D. Jagtap

Comments 64 pages

2506.01756 2026-03-05 cs.RO

Learning with pyCub: A Simulation and Exercise Framework for Humanoid Robotics

Lukas Rustler, Matej Hoffmann

Comments Accepted to 17th International Conference on Robotics in Education (RiE 2026)

2505.21574 2026-03-05 cs.CV cs.LG

Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models

Dang Nguyen, Jiping Li, Jinghao Zheng, Baharan Mirzasoleiman

2505.21281 2026-03-05 cs.AI

RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

Yue Zhang, Zhiliang Tian, Shicheng Zhou, Haiyang Wang, Wenqing Hou, Yuying Liu, Xuechen Zhao, Minlie Huang, Ye Wang, Bin Zhou

2505.20065 2026-03-05 cs.LG cs.AI

SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

Geon-Hyeong Kim, Yu Jin Kim, Byoungjip Kim, Honglak Lee, Kyunghoon Bae, Youngsoo Jang, Moontae Lee

Comments 40 pages

Details

Journal ref: In Proceedings of the International Conference on Learning Representations (ICLR), 2026

English abstract

As Large Language Models (LLMs) are increasingly deployed in real-world applications, balancing helpfulness and safety has become a central challenge. A natural approach is to incorporate safety constraints into Reinforcement Learning from Human Feedback (RLHF), where recent studies have shown promising progress. However, these methods often rely on auxiliary networks or multi-stage pipelines, thereby increasing complexity. In this work, we revisit the original safety alignment objective and show that, under mild assumptions, it admits a closed-form optimal policy. We further derive a provably equivalent and tractable objective, enabling direct optimization. Building on this insight, we propose SafeDPO, a lightweight method that preserves the optimal solution of the underlying safety-constrained objective while requiring only one additional hyperparameter and minimal modifications to existing preference-based training methods. SafeDPO eliminates the need for reward models, cost models, and online sampling, relying only on preference data and safety indicators. Despite its simplicity, SafeDPO achieves competitive safety-helpfulness trade-offs compared to existing safety alignment methods. Experiments on the PKU-SafeRLHF-30K benchmark demonstrate that SafeDPO substantially improves safety while maintaining competitive helpfulness. Ablation studies further show that the additional hyperparameter provides a flexible mechanism to enhance safety while preserving the theoretical optimum, and confirm that SafeDPO scales reliably to LLMs with up to 13B parameters. Overall, our results highlight that a simple, theory-driven objective can provide a lightweight yet effective solution for safety alignment in practice.


2505.18535 2026-03-05 cs.LG math.PR stat.ML

Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD

Dmitry Dudukalov, Artem Logachov, Vladimir Lotov, Timofei Prasolov, Evgeny Prokopenko, Anton Tarasenko

Comments The introduction, Subsections 2.1 ("Suitable Time Scaling") and 2.2 ("Sticking to a Critical Point"), as well as a small portion of the proof, have been revised. Subsection 2.3 ("Leaving the Neighborhood of a Sharp Maximum") has undergone minor revisions due to the equality in the doubly exponential case

2505.16985 2026-03-05 cs.CV cs.AI cs.LG cs.RO

Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation

Moru Liu, Hao Dong, Jessica Kelly, Olga Fink, Mario Trapp

Comments NeurIPS 2025

2505.15643 2026-03-05 cs.LG cs.IT math.IT stat.ML

Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima

Lan V. Truong

Comments To appear in IEEE Transactions on Information Theory

2505.13943 2026-03-05 cs.CV

From Press to Pixels: Evolving Urdu Text Recognition

Samee Arif, Sualeha Farid

2505.13033 2026-03-05 cs.LG cs.AI

TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis

Vijay Ekambaram, Subodh Kumar, Arindam Jati, Sumanta Mukherjee, Tomoya Sakai, Pankaj Dayama, Wesley M. Gifford, Jayant Kalagnanam

Comments Accepted in ICLR 2026

2505.12506 2026-03-05 cs.LG cs.AI

Unsupervised Representation Learning - an Invariant Risk Minimization Perspective

Yotam Norman, Ron Meir

2505.10118 2026-03-05 cs.CV cs.CL

Why 1 + 1 < 1 in Visual Token Pruning: Beyond Naive Integration via Multi-Objective Balanced Covering

Yangfu Li, Hongjian Zhan, Tianyi Chen, Qi Liu, Yue Lu

Comments 31 pages, 9 figures, conference

2505.07757 2026-03-05 cs.AI cs.LG

Emotion-Gradient Metacognitive RSI (Part I): Theoretical Foundations and Single-Agent Architecture

Rintaro Ando

Comments Withdrawn due to a critical error discovered in the stability and convergence proofs (specifically Lemma 2, Theorem 12, and Proposition 10) in Section 3. The identified flaw invalidates the core theoretical guarantees regarding capability growth and system stability

2505.07380 2026-03-05 cs.CV cs.CR eess.IV

Apple's Synthetic Defocus Noise Pattern: Characterization and Forensic Applications

David Vázquez-Padín, Fernando Pérez-González, Pablo Pérez-Miguélez

Comments The last version of the paper is now published in IEEE Transactions on Information Forensics & Security, vol. 21, pp. 1096-1111, 2026

2505.06743 2026-03-05 cs.RO cs.AI

TPK: Trustworthy Trajectory Prediction Integrating Prior Knowledge For Interpretability and Kinematic Feasibility

Marius Baden, Ahmed Abouelazm, Christian Hubschneider, Yin Wu, Daniel Slieter, J. Marius Zöllner

Comments First and Second authors contributed equally; Accepted in the 36th IEEE Intelligent Vehicles Symposium (IV 2025) for oral presentation; Winner of the best paper award

2505.02888 2026-03-05 cs.LG cs.AI cs.CL

When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger

Rintaro Ando

Comments Withdrawn due to a critical error discovered in the mathematical derivation and proof of Theorem 2 (Unbounded Growth) and related Lemma 2 (Compression gain lower bound). This flaw invalidates the paper's main conclusion that N2M-RSI guarantees unbounded growth, requiring a fundamental revision of the theoretical framework