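arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.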

2603.01571 2026-03-03 cs.AI

Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models

Qiyuan Zhang, Yufei Wang, Tianhe Wu, Can Xu, Qingfeng Sun, Kai Zheng, Xue Liu, Chen Ma

Abstract

Recent advancements in Generative Reward Models (GRMs) have demonstrated that scaling the length of Chain-of-Thought (CoT) reasoning considerably enhances the reliability of evaluation. However, current works predominantly rely on unstructured length scaling, ignoring the divergent efficacy of different reasoning mechanisms: Breadth-CoT (B-CoT, i.e., multi-dimensional principle coverage) and Depth-CoT (D-CoT, i.e., substantive judgment soundness). To address this, we introduce Mix-GRM, a framework that reconfigures raw rationales into structured B-CoT and D-CoT through a modular synthesis pipeline, subsequently employing Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR) to internalize and optimize these mechanisms. Comprehensive experiments demonstrate that Mix-GRM establishes a new state-of-the-art across five benchmarks, surpassing leading open-source RMs by an average of 8.2%. Our results reveal a clear divergence in reasoning: B-CoT benefits subjective preference tasks, whereas D-CoT excels in objective correctness tasks. Consequently, misaligning the reasoning mechanism with the task directly degrades performance. Furthermore, we demonstrate that RLVR acts as a switching amplifier, inducing an emergent polarization where the model spontaneously allocates its reasoning style to match task demands. The synthesized data and models are released on Hugging Face (https://huggingface.co/collections/DonJoey/mix-grm), and the code is released on GitHub (https://github.com/Don-Joey/Mix-GRM).

2603.01568 2026-03-03 cs.LG cs.CV cs.IT math.IT q-bio.NC

Rate-Distortion Signatures of Generalization and Information Trade-offs

Leyla Roksan Caglar, Pedro A. M. Mediano, Baihan Lin

2603.01563 2026-03-03 cs.LG cs.AI

LFPO: Likelihood-Free Policy Optimization for Masked Diffusion Models

Chenxing Wei, Jiazhen Kang, Hong Wang, Jianqing Zhang, Hao Jiang, Xiaolong Xu, Ningyuan Sun, Ying He, F. Richard Yu, Yao Shu, Bo Jiang

2603.01560 2026-03-03 cs.RO

(hu)Man vs. Machine: In the Future of Motorsport, can Autonomous Vehicles Compete?

Armand Amaritei, Amber-Lily Blackadder, Sebastian Donnelly, Lora Hernandez, James Vine, Alexander Rast, Matthias Rolf, Andrew Bradley

2603.01557 2026-03-03 cs.AI

Benchmarking LLM Summaries of Multimodal Clinical Time Series for Remote Monitoring

Aditya Shukla, Yining Yuan, Ben Tamo, Yifei Wang, Micky Nnamdi, Shaun Tan, Jieru Li, Benoit Marteau, Brad Willingham, May Wang

2603.01554 2026-03-03 cs.AI

S5-HES Agent: Society 5.0-driven Agentic Framework to Democratize Smart Home Environment Simulation

Akila Siriweera, Janani Rangila, Keitaro Naruse, Incheon Paik, Isuru Jayanada

Comments 12 pages, 9 figures, and Journal

2603.01553 2026-03-03 cs.AI cs.LG

State-Action Inpainting Diffuser for Continuous Control with Delay

Dongqi Han, Wei Wang, Enze Zhang, Dongsheng Li

2603.01552 2026-03-03 cs.CV

Align-cDAE: Alzheimer's Disease Progression Modeling with Attention-Aligned Conditional Diffusion Auto-Encoder

Ayantika Das, Keerthi Ram, Mohanasankar Sivaprakasam

Abstract

Generative AI framework-based modeling and prediction of longitudinal human brain images offer an efficient mechanism to track neurodegenerative progression, essential for the assessment of diseases like Alzheimer's. Among the existing generative approaches, recent diffusion-based models have emerged as an effective alternative to generate disease progression images. Incorporating multi-modal and non-imaging attributes as conditional information into diffusion frameworks has been shown to improve controllability during such generations. However, existing methods do not explicitly ensure that information from non-imaging conditioning modalities is meaningfully aligned with image features to introduce desirable changes in the generated images, such as modulation of progression-specific regions. Further, more precise control over the generation process can be achieved by introducing progression-relevant structure into the internal representations of the model, lacking in the existing approaches. To address these limitations, we propose a diffusion autoencoder-based framework for disease progression modeling that explicitly enforces alignment between different modalities. The alignment is enforced by introducing an explicit objective function that enables the model to focus on the regions exhibiting progression-related changes. Further, we devise a mechanism to better structure the latent representational space of the diffusion auto-encoding framework. Specifically, we assign separate latent subspaces for integrating progression-related conditions and retaining subject-specific identity information, allowing better-controlled image generation. These results demonstrate that enforcing alignment and better structuring of the latent representational space of diffusion auto-encoding framework leads to more anatomically precise modeling of Alzheimer's disease progression.

2603.01548 2026-03-03 cs.AI cs.SE

Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents

Neeraj Bholani

Comments Working paper. 27 references, 13 figures, 8 tables, pseudocode appendix

2603.01547 2026-03-03 cs.CV

PathMoE: Interpretable Multimodal Interaction Experts for Pediatric Brain Tumor Classification

Jian Yu, Joakim Nguyen, Jinrui Fang, Awais Naeem, Zeyuan Cao, Sanjay Krishnan, Nicholas Konz, Tianlong Chen, Chandra Krishnan, Hairong Wang, Edward Castillo, Ying Ding, Ankita Shukla

2603.01545 2026-03-03 cs.CV

Training-Free Spatio-temporal Decoupled Reasoning Video Segmentation with Adaptive Object Memory

Zhengtong Zhu, Jiaqing Fan, Zhixuan Liu, Fanzhang Li

Comments Accepted by AAAI 2026

2603.01544 2026-03-03 cs.CV

RA-Det: Towards Universal Detection of AI-Generated Images via Robustness Asymmetry

Xinchang Wang, Yunhao Chen, Yuechen Zhang, Congcong Bian, Zihao Guo, Xingjun Ma, Hui Li

2603.01537 2026-03-03 cs.AI q-bio.BM q-bio.QM

Pharmacology Knowledge Graphs: Do We Need Chemical Structure for Drug Repurposing?

Youssef Abo-Dahab, Ruby Hernandez, Ismael Caleb Arechiga Duran

Comments 34 pages, 5 figures. Under review at Discover Artificial Intelligence

Abstract

The contributions of model complexity, data volume, and feature modalities to knowledge graph-based drug repurposing remain poorly quantified under rigorous temporal validation. We constructed a pharmacology knowledge graph from ChEMBL 36 comprising 5,348 entities including 3,127 drugs, 1,156 proteins, and 1,065 indications. A strict temporal split was enforced with training data up to 2022 and testing data from 2023 to 2025, together with biologically verified hard negatives mined from failed assays and clinical trials. We benchmarked five knowledge graph embedding models and a standard graph neural network with 3.44 million parameters that incorporates drug chemical structure using a graph attention encoder and ESM-2 protein embeddings. Scaling experiments ranging from 0.78 to 9.75 million parameters and from 25 to 100 percent of the data, together with feature ablation studies, were used to isolate the contributions of model capacity, graph density, and node feature modalities. Removing the graph attention based drug structure encoder and retaining only topological embeddings combined with ESM-2 protein features improved drug protein PR-AUC from 0.5631 to 0.5785 while reducing VRAM usage from 5.30 GB to 353 MB. Replacing the drug encoder with Morgan fingerprints further degraded performance, indicating that explicit chemical structure representations can be detrimental for predicting pharmacological network interactions. Increasing model size beyond 2.44 million parameters yielded diminishing returns, whereas increasing training data consistently improved performance. External validation confirmed 6 of the top 14 novel predictions as established therapeutic indications. These results show that drug pharmacological behavior can be accurately predicted using target-centric information and drug network topology alone, without requiring explicit chemical structure representations.
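
The strict temporal split mentioned in this abstract (training data up to 2022, testing data from 2023 to 2025) can be illustrated with a minimal sketch. It assumes each knowledge-graph edge carries a year; the `Edge` class, its field names, the `temporal_split` helper, and the toy records are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical edge record; the paper's actual schema is not given here.
    head: str
    relation: str
    tail: str
    year: int  # assumed: year the association was first recorded


def temporal_split(edges, train_until=2022, test_from=2023, test_until=2025):
    """Split edges by year so that no test edge predates the training cutoff."""
    train = [e for e in edges if e.year <= train_until]
    test = [e for e in edges if test_from <= e.year <= test_until]
    return train, test


# Toy records for illustration only.
edges = [
    Edge("drug_a", "targets", "protein_x", 2019),
    Edge("drug_b", "treats", "indication_y", 2022),
    Edge("drug_a", "treats", "indication_z", 2024),
    Edge("drug_c", "targets", "protein_x", 2026),  # outside both windows
]

train, test = temporal_split(edges)
print(len(train), len(test))
```

Splitting on time rather than at random keeps later associations out of training, which is the kind of leakage a temporal validation protocol is meant to avoid.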

2603.01528 2026-03-03 cs.CV

Boosting AI Reliability with an FSM-Driven Streaming Inference Pipeline: An Industrial Case

Yutian Zhang, Zhongyi Pei, Yi Mao, Chen Wang, Lin Liu, Jianmin Wang

Comments Preprint. The work was done in 2024

2603.01526 2026-03-03 cs.LG

Scalable Multi-Task Low-Rank Model Adaptation

Zichen Tian, Antoine Ledent, Qianru Sun

Comments Published as a conference paper at ICLR 2026. 21 pages, 4 figures, 11 tables. Code is available

Journal ref: International Conference on Learning Representations (ICLR), 2026

Abstract

Scaling multi-task low-rank adaptation (LoRA) to a large number of tasks induces catastrophic performance degradation, such as an accuracy drop from 88.2% to 2.0% on DOTA when scaling from 5 to 15 tasks. This failure is due to parameter and representation misalignment. We find that existing solutions, like regularization and dynamic routing, fail at scale because they are constrained by a fundamental trade-off: strengthening regularization to reduce inter-task conflict inadvertently suppresses the essential feature discrimination required for effective routing. In this work, we identify two root causes for this trade-off. First, uniform regularization disrupts inter-task knowledge sharing: shared underlying knowledge concentrates in high-SV components (89% alignment on Flanv2->BBH). Uniform regularization forces high-SV components to update in orthogonal directions, directly disrupting the shared knowledge. Second, Conflict Amplification: Applying LoRA at the component-level (e.g., W_q, W_v) amplifies gradient conflicts; we show block-level adaptation reduces this conflict by 76% with only 50% parameters. Based on these insights, we propose mtLoRA, a scalable solution with three novel designs: 1) Spectral-Aware Regularization to selectively orthogonalize low-SV components while preserving high-SV shared knowledge, 2) Block-Level Adaptation to mitigate conflict amplification and largely improve parameter efficiency, and 3) Fine-Grained Routing using dimension-specific weights for superior expressive power. On four large-scale (15-25 tasks) vision (DOTA and iNat2018) and NLP (Dolly-15k and BBH) benchmarks, mtLoRA achieves 91.7%, 81.5%, 44.5% and 38.5% accuracy on DOTA, iNat2018, Dolly-15k and BBH respectively, outperforming the state-of-the-art by 2.3% on average while using 47% fewer parameters and 24% less training time.

2603.01524 2026-03-03 cs.CV

Better Matching, Less Forgetting: A Quality-Guided Matcher for Transformer-based Incremental Object Detection

Qirui Wu, Shizhou Zhang, De Cheng, Yinghui Xing, Lingyan Ran, Dahu Shi, Peng Wang

Comments Accepted in AAAI2026

2603.01514 2026-03-03 cs.LG stat.ML

Training Dynamics of Softmax Self-Attention: Fast Global Convergence via Preconditioning

Gautam Goel, Mahdi Soltanolkotabi, Peter Bartlett

2603.01509 2026-03-03 cs.CV cs.AI

Retrieval, Refinement, and Ranking for Text-to-Video Generation via Prompt Optimization and Test-Time Scaling

Zillur Rahman, Alex Sheng, Cristian Meo

Comments 2026 ICLR TTU Workshop

2603.01505 2026-03-03 cs.RO

FATE: Closed-Loop Feasibility-Aware Task Generation with Active Repair for Physically Grounded Robotic Curricula

Bingchuan Wei, Bingqi Huang, Jingheng Ma, Zeyu Zhang, Sen Cui

Comments 16 Pages, 4 Figures

2603.01502 2026-03-03 cs.CL eess.AS

Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs

Ming-Hao Hsu, Xueyao Zhang, Xiaohai Tian, Jun Zhang, Zhizheng Wu

2603.01501 2026-03-03 cs.LG cs.AI

GAC: Stabilizing Asynchronous RL Training for LLMs via Gradient Alignment Control

Haofeng Xu, Junwei Su, Yukun Tian, Lansong Diao, Zhengping Qian, Chuan Wu

2603.01491 2026-03-03 cs.CV cs.GR

Radiometrically Consistent Gaussian Surfels for Inverse Rendering

Kyu Beom Han, Jaeyoon Kim, Woo Jae Kim, Jinhwan Seo, Sung-eui Yoon

Comments 9 pages, 6 figures, ICLR 2026 Oral paper

2603.01490 2026-03-03 cs.CV cs.AI

ATA: Bridging Implicit Reasoning with Attention-Guided and Action-Guided Inference for Vision-Language Action Models

Cheng Yang, Jianhao Jiao, Lingyi Huang, Jinqi Xiao, Zhexiang Tang, Yu Gong, Yibiao Ying, Yang Sui, Jintian Lin, Wen Huang, Bo Yuan

Comments Accepted by ICRA 2026

2603.01486 2026-03-03 cs.AI

Agentic Multi-Source Grounding for Enhanced Query Intent Understanding: A DoorDash Case Study

Emmanuel Aboah Boateng, Kyle MacDonald, Akshad Viswanathan, Sudeep Das

Comments 5 pages, 4 figures

2603.01485 2026-03-03 cs.CV

SCATR: Mitigating New Instance Suppression in LiDAR-based Tracking-by-Attention via Second Chance Assignment and Track Query Dropout

Brian Cheong, Letian Wang, Sandro Papais, Steven L. Waslander

2603.01481 2026-03-03 cs.AI

Harmonizing Dense and Sparse Signals in Multi-turn RL: Dual-Horizon Credit Assignment for Industrial Sales Agents

Haojin Yang, Ai Jian, Xinyue Huang, Yiwei Wang, Weipeng Zhang, Ke Zeng, Xunliang Cai, Jingqing Ruan

Comments 15 pages, 6 figures

2603.01480 2026-03-03 cs.RO

Towards Robot Skill Learning and Adaptation with Gaussian Processes

A K M Nadimul Haque, Fouad Sukkar, Sheila Sujipto, Cedric Le Gentil, Marc G. Carmichael, Teresa Vidal-Calleja

2603.01477 2026-03-03 cs.RO

SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment

Chaoran Xiong, Litao Wei, Xinhao Hu, Kehui Ma, Ziyi Xia, Zixin Jiang, Zhen Sun, Ling Pei

Comments Accepted by 2026 IEEE International Conference on Robotics and Automation (ICRA)

2603.01475 2026-03-03 cs.CV

WildCross: A Cross-Modal Large Scale Benchmark for Place Recognition and Metric Depth Estimation in Natural Environments

Joshua Knights, Joseph Reid, Kaushik Roy, David Hall, Mark Cox, Peyman Moghadam

Comments IEEE International Conference on Robotics & Automation (ICRA) 2026

2603.01469 2026-03-03 cs.RO cs.AI

Mean-Flow based One-Step Vision-Language-Action

Yang Chen, Xiaoguang Ma, Bin Zhao