arXivDaily每日学术速递，同步arXiv全量数据，AI总结、翻译，覆盖人工智能、机器人、计算机、金融、统计学、数学、物理学、生物学、经济学、电气&系统等方向。

2604.10966 2026-04-17 cs.CV cs.AI

You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna

Comments 9 pages, 4 figures

详情

英文摘要

We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring multiple forward passes, one for each potential response. Our approach concatenates multiple responses with separator tokens and applies cross-entropy over their scalar scores, enabling direct comparative reasoning and efficient $N$-way preference learning. The multi-response design also yields up to $N\times$ wall-clock speedup and FLOPs reduction over conventional single-response scoring. To enable $N$-way reward evaluation beyond existing pairwise benchmarks, we construct two new benchmarks: (1) MR$^2$Bench-Image contains human-annotated rankings over responses from 8 diverse models; (2) MR$^2$Bench-Video is a large-scale video-based reward benchmark derived from 94K crowdsourced pairwise human judgments over video question-answering spanning 19 models, denoised via preference graph ensemble. Both benchmarks provide 4-response evaluation variants sampled from the full rankings. Built on a 4B vision-language backbone with LoRA fine-tuning and a lightweight MLP value head, our model achieves state-of-the-art results on six multimodal reward benchmarks, including MR$^2$Bench-Image, MR$^2$Bench-Video, and four other existing benchmarks. Our model outperforms existing larger generative and discriminative reward models. We further demonstrate that our reward model, when used in reinforcement learning with GRPO, produces improved policy models that maintain performance across standard multimodal benchmarks while substantially improving open-ended generation quality, outperforming a single-response discriminative reward model (RM) baseline by a large margin in both training stability and open-ended generation quality.

URL PDF HTML ☆

赞 0 踩 0

2604.10548 2026-04-17 cs.RO

Simple but Stable, Fast and Safe: Achieve End-to-end Control by High-Fidelity Differentiable Simulation

Fanxing Li, Shengyang Wang, Yuxiang Huang, Fangyu Sun, Shuyu Wu, Yufei Yan, Danping Zou, Wenxian Yu

2604.10410 2026-04-17 cs.AI

CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation

Shantam Srivastava, Mahesh Bhosale, David Doermann, Mingchen Gao

Comments Accepted to MIDL 2026 (Oral)

2604.10351 2026-04-17 cs.RO

Trajectory-based actuator identification via differentiable simulation

Vyacheslav Kovalev, Ekaterina Chaikovskaia, Egor Davydenko, Roman Gorbachev

2604.10101 2026-04-17 cs.CL

Who Wrote This Line? Evaluating the Detection of LLM-Generated Classical Chinese Poetry

Jiang Li, Tian Lan, Shanshan Wang, Dongxing Zhang, Dianqing Lin, Guanglai Gao, Derek F. Wong, Xiangdong Su

Comments Accepted to ACL 2026 Main Conference

2604.09665 2026-04-17 cs.LG cs.AI

Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

Pankayaraj Pathmanathan, Furong Huang

2604.09057 2026-04-17 cs.CV cs.MM cs.SD

Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence

Junchao Liao, Zhenghao Zhang, Xiangyu Meng, Litao Li, Ziying Zhang, Siyu Zhu, Long Qin, Weizhi Wang

Comments 12 pages, 5 tables, 5 figures

2604.08952 2026-04-17 cs.CL cs.IR

MAB-DQA: Addressing Query Aspect Importance in Document Question Answering with Multi-Armed Bandits

Yixin Xiang, Yunshan Ma, Xiaoyu Du, Yibing Chen, Yanxin Zhang, Jinhui Tang

Comments Accepted by ACL 2026. 20 pages, 9 figures, 6 tables

2604.07941 2026-04-17 cs.CL cs.AI cs.LG

Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning

Shiwan Zhao, Zhihu Wang, Xuyang Zhao, Jiaming Zhou, Caiyue Xu, Chenfei Liu, Liting Zhang, Yuhang Jia, Yanzhe Zhang, Hualong Yu, Zichen Xu, Qicheng Li, Yong Qin

Comments 38 pages, 1 figure, 8 tables

详情

英文摘要

Post-training has become central to turning pretrained large language models (LLMs) into aligned, capable, and deployable systems. Recent progress spans supervised fine-tuning (SFT), preference optimization, reinforcement learning (RL), process supervision, verifier-guided methods, distillation, and multi-stage pipelines. Yet these methods are often discussed in fragmented ways, organized by labels or objectives rather than by the behavioral bottlenecks they address. This survey argues that LLM post-training is best understood as structured intervention on model behavior. We organize the field first by trajectory provenance, which defines two primary regimes: off-policy learning on externally supplied trajectories and on-policy learning on learner-generated rollouts. We then interpret methods through two recurring roles -- effective support expansion, which makes useful behaviors more reachable, and policy reshaping, which improves behavior within already reachable regions -- together with a complementary systems-level role, behavioral consolidation, which preserves, transfers, and amortizes useful behavior across stages and model transitions. Under this view, SFT may serve either support expansion or policy reshaping; preference optimization is usually off-policy reshaping, though online variants move closer to learner-generated states. On-policy RL often improves behavior on learner-generated states, but stronger guidance can also make hard-to-reach reasoning paths reachable. Distillation is often better understood as consolidation rather than only compression, and hybrid pipelines emerge as coordinated multi-stage compositions. Overall, the framework helps diagnose post-training bottlenecks and reason about stage composition, suggesting that progress increasingly depends on coordinated systems design rather than any single dominant objective.

URL PDF HTML ☆

赞 0 踩 0

2604.07663 2026-04-17 cs.LG

SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization

Wooin Lee, Hyun-Tae Kim

Comments Accepted to Findings of the Association for Computational Linguistics: ACL 2026. 13 pages, 4 figures, 4 tables

2604.07021 2026-04-17 cs.CV

ModuSeg: Decoupling Object Discovery and Semantic Retrieval for Training-Free Weakly Supervised Segmentation

Qingze He, Fagui Liu, Dengke Zhang, Qingmao Wei, Quan Tang

2604.06647 2026-04-17 cs.CL

Feedback Adaptation for Retrieval-Augmented Generation

Jihwan Bang, Seunghan Yang, Kyuhong Shim, Simyung Chang, Juntae Lee, Sungha Choi

Comments Accepted at ACL 2026 Findings

2604.06296 2026-04-17 cs.LG cs.AI cs.MA cs.SE

AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

Wenyue Hua, Sripad Karne, Qian Xie, Armaan Agrawal, Nikos Pagonas, Kostis Kaffes, Tianyi Peng

Comments 24 pages, 1 figure

2604.05541 2026-04-17 cs.CV

EchoAgent: Towards Reliable Echocardiography Interpretation with "Eyes","Hands" and "Minds"

Qin Wang, Zhiqing He, Yu Liu, Bowen Guo, Zeju Li, Miao Zhao, Wenhao Ju, Zhiling Luo, Xianhong Shu, Yi Guo, Yuanyuan Wang

Comments Accepted by CVPR 2026 CV4Clinical, 11 pages, 6 figures

详情

英文摘要

Reliable interpretation of echocardiography (Echo) is crucial for assessing cardiac function, which demands clinicians to synchronously orchestrate multiple capabilities, including visual observation (eyes), manual measurement (hands), and expert knowledge learning and reasoning (minds). While current task-specific deep-learning approaches and multimodal large language models have demonstrated promise in assisting Echo analysis through automated segmentation or reasoning, they remain focused on restricted skills, i.e., eyes-hands or eyes-minds, thereby limiting clinical reliability and utility. To address these issues, we propose EchoAgent, an agentic system tailored for end-to-end Echo interpretation, which achieves a fully coordinated eyes-hands-minds workflow that learns, observes, operates, and reasons like a cardiac sonographer. First, we introduce an expertise-driven cognition engine where our agent can automatically assimilate credible Echo guidelines into a structured knowledge base, thus constructing an Echo-customized mind. Second, we devise a hierarchical collaboration toolkit to endow EchoAgent with eyes-hands, which can automatically parse Echo video streams, identify cardiac views, perform anatomical segmentation, and quantitative measurement. Third, we integrate the perceived multimodal evidence with the exclusive knowledge base into an orchestrated reasoning hub to conduct explainable inferences. We evaluate EchoAgent on CAMUS and MIMIC-EchoQA datasets, which cover 48 distinct echocardiographic views spanning 14 cardiac anatomical regions. Experimental results show that EchoAgent achieves optimal performance across diverse structure analyses, yielding overall accuracy of up to 80.00%. Importantly, EchoAgent empowers a single system with abilities to learn, observe, operate and reason like an echocardiologist, which holds great promise for reliable Echo interpretation.

URL PDF HTML ☆

赞 0 踩 0

2604.05418 2026-04-17 cs.CV cs.AI

VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

Honghao Fu, Miao Xu, Yiwei Wang, Dailing Zhang, Jun Liu, Yujun Cai

Comments Accepted by ACL 2026

2604.05407 2026-04-17 cs.AI cs.SE

CODESTRUCT: Code Agents over Structured Action Spaces

Myeongsoo Kim, Joe Hsu, Dingmin Wang, Shweta Garg, Varun Kumar, Murali Krishna Ramanathan

Comments Accepted at ACL 2026 main conference

2604.05302 2026-04-17 cs.CL

Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification

Jinhong Jeong, Junghun Park, Youngjae Yu

Comments Accepted to ACL 2026

2604.05242 2026-04-17 cs.CL cs.AI cs.CR

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang

Comments Accepted by ACL 2026 as a main conference paper

2604.03920 2026-04-17 cs.CL

From Plausible to Causal: Counterfactual Semantics for Policy Evaluation in Simulated Online Communities

Agam Goyal, Yian Wang, Eshwar Chandrasekharan, Hari Sundaram

Comments Accepted to PoliSim@CHI'26: 6 pages, 1 table (Best Paper Award)

2604.03611 2026-04-17 cs.CV

PortraitCraft: A Benchmark for Portrait Composition Understanding and Generation

Yuyang Sha, Zijie Lou, Youyun Tang, Xiaochao Qu, Zheng Qu, Ben Xia, Haoxiang Li, Ting Liu, Luoqi Liu

2604.03307 2026-04-17 cs.CV cs.AI

V-Reflection: Transforming MLLMs from Passive Observers to Active Interrogators

Jiazhou Zhou, Yucheng Chen, Hongyang Li, Qing Jiang, Hu Zhou, Ying-Cong Chen, Lei Zhang

Comments Main paper 14 pages with supplementary 7 pages

2604.02585 2026-04-17 cs.AI cs.CL

Mitigating LLM biases toward spurious social contexts using direct preference optimization

Hyunji Nam, Dorottya Demszky

Comments 26 pages

2604.00998 2026-04-17 cs.CV

Large Vision Model-Guided Masked Low-Rank Approximation for Ground-Roll Attenuation

Jiacheng Liao, Feng Qian, Ziyin Fan, Yongjian Guo

2604.00313 2026-04-17 cs.CV

Label-efficient underwater species classification with logistic regression on frozen foundation model embeddings

Thomas Manuel Rost

Comments v2. New methodology, instead of semi supervised now focused on linear separability. Methodological updates, performance upgrades. The concept is still the same, but the experimental setup and the implementation details have changed

2603.29693 2026-04-17 cs.AI

Measuring the metacognition of AI

Richard Servajean, Philippe Servajean

Comments 19 pages, 5 figures, 2 tables

2603.26528 2026-04-17 cs.CV

Learnable Quantum Efficiency Filters for Urban Hyperspectral Segmentation

Imad Ali Shah, Jiarong Li, Ethan Delaney, Enda Ward, Martin Glavin, Edward Jones, Brian Deegan

详情

DOI: 10.1109/OJVT.2026.3681390
Journal ref: IEEE Open Journal of Vehicular Technology, 2026

英文摘要

Hyperspectral sensing provides rich spectral information for scene understanding in urban driving, but its high dimensionality poses challenges for interpretation and efficient learning. We introduce Learnable Quantum Efficiency (LQE), a physics-inspired, interpretable dimensionality reduction (DR) method that parameterizes smooth high-order spectral response functions that emulate plausible sensor quantum efficiency curves. Unlike conventional methods or unconstrained learnable layers, LQE enforces physically motivated constraints, including a single dominant peak, smooth responses, and bounded bandwidth. This formulation yields a compact spectral representation that preserves discriminative information while remaining fully differentiable and end-to-end trainable within semantic segmentation models (SSMs). We conduct systematic evaluations across three publicly available multi-class hyperspectral urban driving datasets, comparing LQE against six conventional and seven learnable baseline DR methods across six SSMs. Averaged across all SSMs and configurations, LQE achieves the highest average mIoU, improving over conventional methods by 2.45\%, 0.45\%, and 1.04\%, and over learnable methods by 1.18\%, 1.56\%, and 0.81\% on HyKo, HSI-Drive, and Hyperspectral City, respectively. LQE maintains strong parameter efficiency (12--36 parameters compared to 51--22K for competing learnable approaches) and competitive inference latency. Ablation studies show that low-order configurations are optimal, while the learned spectral filters converge to dataset-intrinsic wavelength patterns. These results demonstrate that physics-informed spectral learning can improve both performance and interpretability, providing a principled bridge between hyperspectral perception and data-driven multispectral sensor design for automotive vision systems.

URL PDF HTML ☆

赞 0 踩 0

2603.25903 2026-04-17 cs.RO

Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories

Yiyuan Pan, Xusheng Luo, Hanjiang Hu, Peiqi Yu, Changliu Liu

2603.24992 2026-04-17 cs.CV

C2W-Tune: Cavity-to -Wall Transfer Learning for Thin Atrial Wall Segmentation in 3D Late Gadolinium-enhanced Magnetic Resonance

Yusri Al-Sanaani, Rebecca Thornhill, Sreeraman Rajan

Comments Submitted this to the International Conference on Artificial Intelligence in Medicine (AIME 2026)

2603.24470 2026-04-17 cs.CV cs.AI cs.CL cs.SI

Counting Without Numbers and Finding Without Words

Badri Narayana Patro

2603.23998 2026-04-17 cs.CL

Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping

Yao Chen, Yilong Chen, Yinqi Yang, Junyuan Shang, Zhenyu Zhang, Zefeng Zhang, Shuaiyi Nie, Shuohuan Wang, Yu Sun, Hua Wu, HaiFeng Wang, Tingwen Liu