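arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.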

2603.02227 2026-03-04 cs.LG cs.CL

Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat

Keston Aquino-Michaels

Comments 14 pages, 4 figures

Details

English abstract

Can a transformer learn which attention entries matter during training? In principle, yes: attention distributions are highly concentrated, and a small gate network can identify the important entries post-hoc with near-perfect accuracy. In practice, barely. When sparse attention is trained end-to-end, the model's Q/K/V projections co-adapt to whatever mask is imposed, absorbing the routing signal until learned gates perform little better than frozen random gates. We call this routing absorption and present four independent lines of evidence for it in a controlled 31M-parameter transformer: (1) differentiable soft gating converges to nearly the same perplexity whether the gate is learned or random (48.73 +/- 0.60 vs. 49.83 +/- 0.04 over 3 seeds); (2) hard top-k gating receives exactly zero gradient through the mask; (3) a gate distilled onto co-adapted Q/K/V achieves high F1 against oracle masks but catastrophic perplexity when deployed (601.6 vs. 48.6 on mask-agnostic Q/K/V); and (4) stochastic mask randomization during training fails to prevent co-adaptation (78.2 ppl deployed dense vs. 37.3 baseline). We connect routing absorption to the same phenomenon in Mixture-of-Experts, where random routing matches learned routing because experts co-adapt to any router, but show that attention exhibits a structurally more severe form: shared Q/K/V parameters enable cross-layer compensation pathways absent in MoE, where experts are self-contained modules. The implication is that end-to-end sparse attention methods employing per-query token-level gating face absorption pressure proportional to the parameter asymmetry between the gate and the model, and that post-hoc approaches, which decouple representation learning from sparsification, sidestep this entirely.
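Point (2) of the abstract, that hard top-k gating passes no gradient through the mask, can be illustrated with a toy sketch. This is not the paper's setup: the function names, the scalar values, and the use of central finite differences in place of autodiff are all assumptions made for illustration. The idea is only that a top-k mask is piecewise constant in the gate scores, so a small change to any score leaves the output unchanged.

```python
import math

def topk_mask(gate_scores, k):
    """Return a 0/1 mask keeping the k highest-scoring entries."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(gate_scores))]

def masked_attention(logits, values, mask):
    """Softmax over the kept entries only, then a weighted sum of scalar values."""
    kept = [i for i, m in enumerate(mask) if m > 0]
    mx = max(logits[i] for i in kept)
    weights = {i: math.exp(logits[i] - mx) for i in kept}
    z = sum(weights.values())
    return sum(weights[i] / z * values[i] for i in kept)

def output(gate_scores, logits, values, k):
    """Attention output as a function of the (hypothetical) gate scores."""
    return masked_attention(logits, values, topk_mask(gate_scores, k))

def finite_diff_grad(gate_scores, logits, values, k, eps=1e-4):
    """Central finite-difference gradient of the output w.r.t. each gate score."""
    grads = []
    for i in range(len(gate_scores)):
        up = list(gate_scores)
        up[i] += eps
        down = list(gate_scores)
        down[i] -= eps
        diff = output(up, logits, values, k) - output(down, logits, values, k)
        grads.append(diff / (2 * eps))
    return grads

# Toy inputs (assumed): the gaps between gate scores are much larger than eps,
# so no perturbation changes which entries are selected.
gate_scores = [0.9, 0.1, 0.5, -0.3]
logits = [1.0, 2.0, 0.5, -1.0]
values = [1.0, -2.0, 3.0, 0.5]

mask = topk_mask(gate_scores, k=2)
grads = finite_diff_grad(gate_scores, logits, values, k=2)
print(mask)   # entries 0 and 2 are kept
print(grads)  # every entry is 0.0: the gate receives no learning signal
```

The only points where the output depends on the gate scores are the discontinuities where two scores swap rank, which is why gradient-based training cannot move a hard gate without some surrogate gradient.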


2603.02224 2026-03-04 cs.LG

Subspace Geometry Governs Catastrophic Forgetting in Low-Rank Adaptation

Brady Steele

Comments 15 pages, 5 figures, 6 tables

2603.02223 2026-03-04 cs.LG cs.AI

Characterizing and Predicting Wildfire Evacuation Behavior: A Dual-Stage ML Approach

Sazzad Bin Bashar Polock, Anandi Dutta, Subasish Das

Comments This is the author's preprint version of a paper accepted for presentation at SoutheastCon 2026. The final published version will appear in the official conference proceedings. Conference site: https://ieeesoutheastcon.org/

2603.02222 2026-03-04 cs.LG cs.AI

MedCalc-Bench Doesn't Measure What You Think: A Benchmark Audit and the Case for Open-Book Evaluation

Artus Krohn-Grimberghe

2603.02219 2026-03-04 cs.LG cs.AI

NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels

Junfeng Fang, Nachuan Chen, Houcheng Jiang, Dan Zhang, Fei Shen, Xiang Wang, Xiangnan He, Tat-Seng Chua

2603.02217 2026-03-04 cs.LG cs.AI

Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression

Sieun Hyeon, Jaeyoung Do

2603.02216 2026-03-04 cs.LG cs.AI

ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue

Ruike Cao, Shaojie Bai, Fugen Yao, Liang Dong, Jian Xu, Li Xiao

Comments Accepted to ICLR 2026

2603.02215 2026-03-04 cs.LG cs.AI

RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

Ran Li, Shimin Di, Haowei LI, Luanshi Bu, Jiachuan Wang, Wangze Ni, Lei Chen

2603.02213 2026-03-04 cs.CL cond-mat.stat-mech q-bio.GN

A Zipf-preserving, long-range correlated surrogate for written language and other symbolic sequences

Marcelo A. Montemurro, Mirko Degli Esposti

Details

DOI: 10.1016/j.physa.2025.131227
Journal ref: Physica A 683 (2026) 131227

English abstract

Symbolic sequences such as written language and genomic DNA display characteristic frequency distributions and long-range correlations extending over many symbols. In language, this takes the form of Zipf's law for word frequencies together with persistent correlations spanning hundreds or thousands of tokens, while in DNA it is reflected in nucleotide composition and long-memory walks under purine-pyrimidine mappings. Existing surrogate models usually preserve either the frequency distribution or the correlation properties, but not both simultaneously. We introduce a surrogate model that retains both constraints: it preserves the empirical symbol frequencies of the original sequence and reproduces its long-range correlation structure, quantified by the detrended fluctuation analysis (DFA) exponent. Our method generates surrogates of symbolic sequences by mapping fractional Gaussian noise (FGN) onto the empirical histogram through a frequency-preserving assignment. The resulting surrogates match the original in first-order statistics and long-range scaling while randomising short-range dependencies. We validate the model on representative texts in English and Latin, and illustrate its broader applicability with genomic DNA, showing that base composition and DFA scaling are reproduced. This approach provides a principled tool for disentangling structural features of symbolic systems and for testing hypotheses on the origin of scaling laws and memory effects across language, DNA, and other symbolic domains.
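The mapping of fractional Gaussian noise onto the empirical histogram can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the assignment rule, so the sketch assumes a rank-ordered one (the lowest noise values receive the most frequent symbol, and so on). The function names, the Cholesky-based sampler, and the Hurst value are likewise hypothetical choices. The sketch checks only that symbol counts are preserved; it does not estimate the DFA exponent.

```python
import math
import random
from collections import Counter

def fgn_autocov(k, hurst):
    """Autocovariance of unit-variance fractional Gaussian noise at lag k."""
    h2 = 2.0 * hurst
    return 0.5 * (abs(k + 1) ** h2 - 2.0 * abs(k) ** h2 + abs(k - 1) ** h2)

def sample_fgn(n, hurst, rng):
    """Draw an FGN sample via Cholesky factorisation (O(n^3), small n only)."""
    cov = [[fgn_autocov(i - j, hurst) for j in range(n)] for i in range(n)]
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated sample: lower-triangular factor times white noise.
    return [sum(chol[i][m] * z[m] for m in range(i + 1)) for i in range(n)]

def make_surrogate(sequence, hurst, seed=0):
    """Assumed rank-ordered assignment of symbols onto an FGN series."""
    rng = random.Random(seed)
    noise = sample_fgn(len(sequence), hurst, rng)
    # Pool of symbols, most frequent first, each repeated by its count.
    counts = Counter(sequence)
    pool = [sym for sym, c in counts.most_common() for _ in range(c)]
    # Positions ordered by ascending noise value.
    order = sorted(range(len(sequence)), key=lambda i: noise[i])
    out = [None] * len(sequence)
    for rank, pos in enumerate(order):
        out[pos] = pool[rank]
    return out

# Toy input (assumed): a short character sequence standing in for a text.
original = list("abracadabra" * 8)
surrogate = make_surrogate(original, hurst=0.8, seed=42)

print(Counter(original) == Counter(surrogate))  # symbol frequencies are preserved
print("".join(surrogate[:40]))
```

Because each symbol is placed exactly as many times as it occurs in the original, the histogram is preserved by construction, while the ordering of symbols follows the noise series rather than the original text.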


2603.02206 2026-03-04 cs.SD

VoiceAgentRAG: Solving the RAG Latency Bottleneck in Real-Time Voice Agents Using Dual-Agent Architectures

Jielin Qiu, Jianguo Zhang, Zixiang Chen, Liangwei Yang, Ming Zhu, Juntao Tan, Haolin Chen, Wenting Zhao, Rithesh Murthy, Roshan Ram, Akshara Prabhakar, Shelby Heinecke, Caiming Xiong, Silvio Savarese, Huan Wang

2603.02022 2026-03-04 cs.SD cs.AI

CodecFlow: Efficient Bandwidth Extension via Conditional Flow Matching in Neural Codec Latent Space

Bowen Zhang, Junchuan Zhao, Ian McLoughlin, Ye Wang, A S Madhukumar

Comments 7 pages, 7 figures

2603.01741 2026-03-04 cs.LG cs.AI cs.RO

Rethinking Policy Diversity in Ensemble Policy Gradient in Large-Scale Reinforcement Learning

Naoki Shitanda, Motoki Omura, Tatsuya Harada, Takayuki Osa

Comments In ICLR 2026. Website at https://naoki04.github.io/paper-cpo/

2603.01690 2026-03-04 cs.CL cs.AI

QIME: Constructing Interpretable Medical Text Embeddings via Ontology-Grounded Questions

Yixuan Tang, Zhenghong Lin, Yandong Sun, Wynne Hsu, Mong Li Lee, Anthony K. H. Tung

2603.01650 2026-03-04 cs.CV

PromptStereo: Zero-Shot Stereo Matching via Structure and Motion Prompts

Xianqi Wang, Hao Yang, Hangtian Wang, Junda Cheng, Gangwei Xu, Min Lin, Xin Yang

Comments Accepted to CVPR 2026

2603.01562 2026-03-04 cs.AI

RubricBench: Aligning Model-Generated Rubrics with Human Standards

Qiyuan Zhang, Junyi Zhou, Yufei Wang, Fuyuan Lyu, Yidong Ming, Can Xu, Qingfeng Sun, Kai Zheng, Peng Kang, Xue Liu, Chen Ma

2603.01412 2026-03-04 cs.CV

UETrack: A Unified and Efficient Framework for Single Object Tracking

Ben Kang, Jie Zhao, Xin Chen, Wanting Geng, Bin Zhang, Lu Zhang, Dong Wang, Huchuan Lu

Comments This paper was accepted by CVPR 2026

2603.01404 2026-03-04 cs.RO

D-GVIO: A Buffer-Driven and Efficient Decentralized GNSS-Visual-Inertial State Estimator for Multi-Agent Systems

Yarong Luo, Wentao Lu, Chi Guo, Ming Li

Comments Accepted by ICRA 2026

2603.01398 2026-03-04 cs.CV

Continuous Exposure-Time Modeling for Realistic Atmospheric Turbulence Synthesis

Junwei Zeng, Dong Liang, Sheng-Jun Huang, Kun Zhan, Songcan Chen

Comments Accepted to CVPR 2026!

2603.01099 2026-03-04 cs.CV

HeroGS: Hierarchical Guidance for Robust 3D Gaussian Splatting under Sparse Views

Jiashu Li, Xumeng Han, Zhaoyang Wei, Zipeng Wang, Kuiran Wang, Guorong Li, Zhenjun Han, Jianbin Jiao

2603.00976 2026-03-04 cs.CV

PreciseCache: Precise Feature Caching for Efficient and High-fidelity Video Generation

Jiangshan Wang, Kang Zhao, Jiayi Guo, Jiayu Wang, Hang Guo, Chenyang Zhu, Xiu Li, Xiangyu Yue

Comments ICLR 2026

2603.00755 2026-03-04 cs.CV cs.LG

BornoViT: A Novel Efficient Vision Transformer for Bengali Handwritten Basic Characters Classification

Rafi Hassan Chowdhury, Naimul Haque, Kaniz Fatiha

2603.00624 2026-03-04 cs.LG cs.AI cs.CV

IDER: IDempotent Experience Replay for Reliable Continual Learning

Zhanwang Liu, Yuting Li, Haoyuan Gao, Yexin Li, Linghe Kong, Lichao Sun, Weiran Huang

Comments Accepted by ICLR 2026

2603.00582 2026-03-04 cs.CL

Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research

Yubo Dong, Nianhao You, Yuxuan Hou, Zixun Sun, Yue Zhang, Liang Zhang, Siyuan Zhao, Hehe Fan

2603.00147 2026-03-04 cs.CV cs.IR eess.IV

Leveraging GenAI for Segmenting and Labeling Centuries-old Technical Documents

Carlos Monroy, Benjamin Navarro

Comments 6 pages, 7 figures

Details

DOI: 10.1109/IEEE-CH65308.2025.11279526
Journal ref: 2025 IEEE International Conference on Cyber Humanities (IEEE-CH), Florence, Italy, 2025, pp. 1-6

English abstract

Image segmentation and image recognition are well-established computational techniques in the broader discipline of image processing. Segmentation locates areas in an image, while recognition identifies specific objects within an image. These techniques have shown remarkable accuracy with modern images, mainly because the amount of training data is vast. Achieving similar accuracy in digitized images of centuries-old documents is more challenging. This difficulty is due to two main reasons: first, the lack of sufficient training data, and second, the degree of specialization in a given domain. Despite these limitations, the ability to segment and recognize objects in these collections is important for automating the curation, cataloging, and dissemination of knowledge, making the contents of priceless collections accessible to scholars and the general public. In this paper, we report on our ongoing work in segmenting and labeling images pertaining to shipbuilding treatises from the XVI and XVII centuries, a historical period known as the Age of Exploration. To this end, we leverage SAM2 for image segmentation; Florence2 and ChatGPT for labeling; and a specialized ontology, ontoShip, and glossary, glosShip, of nautical architecture for enhancing the labeling process. Preliminary results demonstrate the potential of marrying these technologies for improving curation and retrieval of priceless historical documents. We also discuss the challenges and limitations encountered in this approach and ideas on how to overcome them in the future.


2602.23652 2026-03-04 cs.CV cs.AI

3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection

Haowen Zhu, Ning Yin, Xiaogen Zhou

2602.23191 2026-03-04 cs.CV

Uni-Animator: Towards Unified Visual Colorization

Xinyuan Chen, Yao Xu, Shaowen Wang, Pengjie Song, Bowen Deng

Comments 12 pages, 8 figures

2602.21646 2026-03-04 cs.CL

Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion

Yexing Du, Youcheng Pan, Zekun Wang, Zheng Chu, Yichong Huang, Kaiyuan Liu, Bo Yang, Yang Xiang, Ming Liu, Bing Qin

Comments Accepted in ICLR 2026

2602.18936 2026-03-04 cs.CV

CRAFT-LoRA: Content-Style Personalization via Rank-Constrained Adaptation and Training-Free Fusion

Yu Li, Yujun Cai, Chi Zhang

2602.15651 2026-03-04 cs.SD cs.CV eess.AS

UniTAF: A Modular Framework for Joint Text-to-Speech and Audio-to-Face Modeling

Qiangong Zhou, Nagasaka Tomohiro

Comments We have identified inaccuracies in some results that require further verification. To avoid misleading the research community, we are temporarily withdrawing the paper

2602.14744 2026-03-04 cs.CL

Rethinking the Role of LLMs in Time Series Forecasting

Xin Qiu, Junlong Tong, Yirong Sun, Yunpu Ma, Wei Zhang, Xiaoyu Shen