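arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.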

2603.15619 2026-03-17 cs.CL cs.AI

Mixture-of-Depths Attention

Lianghui Zhu, Yuxin Fang, Bencheng Liao, Shijie Wang, Tianheng Cheng, Zilong Huang, Chen Chen, Lai Wei, Yutao Zeng, Ya Wang, Yi Lin, Yu Li, Xinggang Wang

Comments Code is released at https://github.com/hustvl/MoDA

English abstract

Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers. We introduce mixture-of-depths attention (MoDA), a mechanism that allows each attention head to attend to sequence KV pairs at the current layer and depth KV pairs from preceding layers. We further describe a hardware-efficient algorithm for MoDA that resolves non-contiguous memory-access patterns, achieving 97.3% of FlashAttention-2's efficiency at a sequence length of 64K. Experiments on 1.5B-parameter models demonstrate that MoDA consistently outperforms strong baselines. Notably, it improves average perplexity by 0.2 across 10 validation benchmarks and increases average performance by 2.11% on 10 downstream tasks, with a negligible 3.7% FLOPs computational overhead. We also find that combining MoDA with post-norm yields better performance than using it with pre-norm. These results suggest that MoDA is a promising primitive for depth scaling. Code is released at https://github.com/hustvl/MoDA .
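
As a rough illustration of the mechanism the abstract describes, the sketch below lets one attention head, at one query position, attend jointly to sequence KV pairs from the current layer and depth KV pairs from preceding layers. The abstract does not say how the two sets are combined or which depth entries are selected, so the single softmax over the concatenated pairs, the function name `moda_attend`, and the toy inputs are all assumptions for illustration; this is not the released MoDA implementation and it ignores the hardware-efficient kernel entirely.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def moda_attend(q, seq_kv, depth_kv):
    """Hypothetical single-head, single-position sketch.

    q        : query vector at the current layer and position
    seq_kv   : list of (key, value) pairs from the current layer
               (the caller passes only causally visible positions)
    depth_kv : list of (key, value) pairs taken from preceding layers
    Returns (output vector, attention weights over seq_kv + depth_kv).
    """
    # Assumption: both KV sets compete in one softmax.
    pairs = list(seq_kv) + list(depth_kv)
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k, _ in pairs]

    # Numerically stable softmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]

    # Weighted sum of values.
    dim = len(pairs[0][1])
    out = [sum(w * v[i] for w, (_, v) in zip(weights, pairs)) for i in range(dim)]
    return out, weights


# Toy usage: two sequence positions plus one KV pair from an earlier layer.
q = [1.0, 0.0]
seq_kv = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
depth_kv = [([1.0, 0.0], [0.0, 2.0])]
out, weights = moda_attend(q, seq_kv, depth_kv)
```

With an empty `depth_kv`, the function reduces to ordinary softmax attention over the sequence pairs, which makes the depth pairs easy to isolate when experimenting.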

2603.15617 2026-03-17 cs.LG

HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

Erik Y. Wang, Sumeet Motwani, James V. Roggeveen, Eliot Hodges, Dulhan Jayalath, Charles London, Kalyan Ramakrishnan, Flaviu Cipcigan, Philip Torr, Alessandro Abate

2603.15616 2026-03-17 cs.CV

GlyphPrinter: Region-Grouped Direct Preference Optimization for Glyph-Accurate Visual Text Rendering

Xincheng Shuai, Ziye Li, Henghui Ding, Dacheng Tao

Comments CVPR 2026, Project Page: https://henghuiding.com/GlyphPrinter/

2603.15615 2026-03-17 cs.CL cs.AI

Mechanistic Origin of Moral Indifference in Language Models

Lingyu Li, Yan Teng, Yingchun Wang

Comments 24 pages, 11 figures, 5 tables

2603.15614 2026-03-17 cs.CV

Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion

Zhenghong Zhou, Xiaohang Zhan, Zhiqin Chen, Soo Ye Kim, Nanxuan Zhao, Haitian Zheng, Qing Liu, He Zhang, Zhe Lin, Yuqian Zhou, Jiebo Luo

Comments Project page: https://zhouzhenghong-gt.github.io/Tri-Prompting-Page/

2603.15612 2026-03-17 cs.CV cs.RO

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Yukang Cao, Haozhe Xie, Fangzhou Hong, Long Zhuo, Zhaoxi Chen, Liang Pan, Ziwei Liu

Comments https://yukangcao.github.io/HSImul3R/

2603.15611 2026-03-17 cs.CL

Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

Aozhe Wang, Yuchen Yan, Nan Zhou, Zhengxi Lu, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

Comments Project Page: https://zju-real.github.io/Code-A1 Code: https://github.com/ZJU-REAL/Code-A1

2603.15607 2026-03-17 cs.AI cs.HC

Do Metrics for Counterfactual Explanations Align with User Perception?

Felix Liedeker, Basil Ell, Philipp Cimiano, Christoph Düsing

Comments Accepted at the 4th World Conference on eXplainable Artificial Intelligence (XAI 2026)

2603.15605 2026-03-17 cs.RO

Perception-Aware Autonomous Exploration in Feature-Limited Environments

Moji Shi, Rajitha de Silva, Hang Yu, Riccardo Polvara, Marija Popović

2603.15604 2026-03-17 cs.RO

EAAE: Energy-Aware Autonomous Exploration for UAVs in Unknown 3D Environments

Jacob Elskamp, Moji Shi, Leonard Bauersfeld, Davide Scaramuzza, Marija Popović

2603.15603 2026-03-17 cs.CV

Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery

Timing Yang, Sicheng He, Hongyi Jing, Jiawei Yang, Zhijian Liu, Chuhang Zou, Yue Wang

2603.15600 2026-03-17 cs.RO cs.AI cs.CL cs.CV

From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation

Yibin Liu, Yaxing Lyu, Daqi Gao, Zhixuan Liang, Weiliang Tang, Shilong Mu, Xiaokang Yang, Yao Mu

Comments 31 pages

2603.15599 2026-03-17 cs.LG

SmartSearch: How Ranking Beats Structure for Conversational Memory Retrieval

Jesper Derehag, Carlos Calva, Timmy Ghiurau

2603.15596 2026-03-17 cs.LG

Robust and Computationally Efficient Linear Contextual Bandits under Adversarial Corruption and Heavy-Tailed Noise

Naoto Tani, Futoshi Futami

2603.15594 2026-03-17 cs.AI cs.CL

OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Yuwen Du, Rui Ye, Shuo Tang, Xinyu Zhu, Yijun Lu, Yuzhu Cai, Siheng Chen

Comments 15 pages, 6 figures

2603.15590 2026-03-17 cs.LG

Effective Distillation to Hybrid xLSTM Architectures

Lukas Hauzenberger, Niklas Schmidinger, Thomas Schmied, Anamaria-Roberta Hartl, David Stap, Pieter-Jan Hoedt, Maximilian Beck, Sebastian Böck, Günter Klambauer, Sepp Hochreiter

2603.15583 2026-03-17 cs.CV

Grounding World Simulation Models in a Real-World Metropolis

Junyoung Seo, Hyunwook Choi, Minkyung Kwon, Jinhyeok Choi, Siyoon Jin, Gayoung Lee, Junho Kim, JoungBin Lee, Geonmo Gu, Dongyoon Han, Sangdoo Yun, Seungryong Kim, Jin-Hwa Kim

Comments project page: https://seoul-world-model.github.io/

2603.15576 2026-03-17 cs.LG math.OC stat.ML

Unbiased and Biased Variance-Reduced Forward-Reflected-Backward Splitting Methods for Stochastic Composite Inclusions

Quoc Tran-Dinh, Nghia Nguyen-Trung

Comments 34 pages and 2 figures

2603.15574 2026-03-17 cs.CV

Severe Domain Shift in Skeleton-Based Action Recognition: A Study of Uncertainty Failure in Real-World Gym Environments

Aaditya Khanal, Junxiu Zhou

Comments 6 pages, 7 figures

2603.15569 2026-03-17 cs.LG

Mamba-3: Improved Sequence Modeling using State Space Principles

Aakash Lahoti, Kevin Y. Li, Berlin Chen, Caitlin Wang, Aviv Bick, J. Zico Kolter, Tri Dao, Albert Gu

Comments ICLR 2026

2603.15564 2026-03-17 cs.LG stat.AP stat.ML

Predictive Uncertainty in Short-Term PV Forecasting under Missing Data: A Multiple Imputation Approach

Parastoo Pashmchi, Jérôme Benoit, Motonobu Kanagawa

Comments 10 pages

2603.15558 2026-03-17 cs.CV cs.RO

Panoramic Affordance Prediction

Zixin Zhang, Chenfei Liao, Hongfei Zhang, Harold Haodong Chen, Kanghao Chen, Zichen Wen, Litao Guo, Bin Ren, Xu Zheng, Yinchuan Li, Xuming Hu, Nicu Sebe, Ying-Cong Chen

2603.15557 2026-03-17 cs.CV

Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Lexiang Xiong, Qi Li, Jingwen Ye, Xinchao Wang

2603.15555 2026-03-17 cs.CV

Learning Latent Proxies for Controllable Single-Image Relighting

Haoze Zheng, Zihao Wang, Xianfeng Wu, Yajing Bai, Yexin Liu, Yun Li, Xiaogang Xu, Harry Yang

Comments Accepted by CVPR2026

2603.15553 2026-03-17 cs.CV cs.LG

Self-Distillation of Hidden Layers for Self-Supervised Representation Learning

Scott C. Lowe, Anthony Fuller, Sageev Oore, Evan Shelhamer, Graham W. Taylor

2603.15547 2026-03-17 cs.CL cs.AI cs.HC

Can LLMs Model Incorrect Student Reasoning? A Case Study on Distractor Generation

Yanick Zengaffinen, Andreas Opedal, Donya Rooein, Kv Aditya Srivatsa, Shashank Sonkar, Mrinmaya Sachan

2603.15546 2026-03-17 cs.CV cs.GR cs.RO

Kimodo: Scaling Controllable Human Motion Generation

Davis Rempe, Mathis Petrovich, Ye Yuan, Haotian Zhang, Xue Bin Peng, Yifeng Jiang, Tingwu Wang, Umar Iqbal, David Minor, Michael de Ruyter, Jiefeng Li, Chen Tessler, Edy Lim, Eugene Jeong, Sam Wu, Ehsan Hassani, Michael Huang, Jin-Bey Yu, Chaeyeon Chung, Lina Song, Olivier Dionne, Jan Kautz, Simon Yuen, Sanja Fidler

Comments Project page: https://research.nvidia.com/labs/sil/projects/kimodo/

2603.15541 2026-03-17 cs.LG cs.NI

Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing

Yung-Fu Chen, Anish Arora

2603.15539 2026-03-17 cs.LG

Vib2ECG: A Paired Chest-Lead SCG-ECG Dataset and Benchmark for ECG Reconstruction

Guorui Lu, Xiaohui Cai, Todor Stefanov, Qinyu Chen

Comments This work has been submitted to the IEEE for possible publication

2603.15527 2026-03-17 cs.AI cs.CY

Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph

Zhenheng Tang, Xiang Liu, Qian Wang, Eunsol Choi, Bo Li, Xiaowen Chu