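arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.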

2601.13942 2026-04-30 cs.CV cs.AI

Glance-or-Gaze: Incentivizing LMMs to Adaptively Focus Search via Reinforcement Learning

Hongbo Bai, Yujin Zhou, Yile Wu, Chi-Min Chan, Pengcheng Wen, Kunhao Pan, Sirui Han, Yike Guo

Journal ref: ACL 2026 Findings

Abstract

Large Multimodal Models (LMMs) have achieved remarkable success in visual understanding, yet they struggle with knowledge-intensive queries involving long-tail entities or evolving information due to static parametric knowledge. Recent search-augmented approaches attempt to address this limitation, but existing methods rely on indiscriminate whole-image retrieval that introduces substantial visual redundancy and noise, and lack deep iterative reflection, limiting their effectiveness on complex visual queries. To overcome these challenges, we propose Glance-or-Gaze (GoG), a fully autonomous framework that shifts from passive perception to active visual planning. GoG introduces a Selective Gaze mechanism that dynamically chooses whether to glance at global context or gaze into high-value regions, filtering irrelevant information before retrieval. We design a dual-stage training strategy: Reflective GoG Behavior Alignment via supervised fine-tuning instills the fundamental GoG paradigm, while Complexity-Adaptive Reinforcement Learning further enhances the model's capability to handle complex queries through iterative reasoning. Experiments across six benchmarks demonstrate state-of-the-art performance. Ablation studies confirm that both Selective Gaze and complexity-adaptive RL are essential for effective visual search.

2601.13606 2026-04-30 cs.CV

ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch

Zheng Liu, Honglin Lin, Chonghan Qin, Xiaoyang Wang, Xin Gao, Yu Li, Mengzhang Cai, Yun Zhu, Zhanping Zhong, Qizhi Pei, Zhuoshi Pan, Xiaoran Shang, Bin Cui, Conghui He, Wentao Zhang, Lijun Wu

Comments 29 pages

2601.11923 2026-04-30 cs.CL

Mapping the maturation of TCM as an adjuvant to radiotherapy

P. Bilha Githinji, Aikaterini Melliou

2601.11568 2026-04-30 cs.LG cs.AI cs.CL

AdaFRUGAL: Adaptive Memory-Efficient Training with Dynamic Control

Quang-Hung Bui, Anh Son Ta

Comments We have identified issues in the current version of the manuscript that may affect the validity of some results. We are withdrawing the paper to conduct further verification and improvements before resubmission.

2601.09107 2026-04-30 cs.CV cs.RO

Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams

Lachlan Holden, Feras Dayoub, Alberto Candela, David Harvey, Tat-Jun Chin

Comments 7 pages, 10 figures. Presented at the International Conference on Space Robotics (iSpaRo) 2025 in Sendai, Japan. Dataset available: https://doi.org/10.5281/zenodo.17364038

2601.06160 2026-04-30 cs.AI

Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration

Dayu Wang, Jiaye Yang, Weikang Li, Jiahui Liang, Yang Li, Deguo Xia, Jizhou Huang

Comments Accepted to ACL 2026 Main Conference

2601.03654 2026-04-30 cs.LG math.OC math.QA

Hybrid Quantum-Classical Ridgelet Neural Networks for Portfolio Optimization

Bahadur Yadav, Sanjay Kumar Mohanty

2601.03423 2026-04-30 cs.CL cs.AI

Training-Free Adaptation of New-Generation LLMs using Legacy Clinical Models

Sasha Ronaghi, Chloe Stanwyck, Asad Aali, Amir Ronaghi, Miguel Fuentes, Tina Hernandez-Boussard, Emily Alsentzer

2601.02731 2026-04-30 cs.SD cs.CV cs.MM

Omni2Sound: Towards Unified Video-Text-to-Audio Generation

Yusheng Dai, Zehua Chen, Yuxuan Jiang, Baolong Gao, Qiuhong Ke, Jianfei Cai, Jun Zhu

2512.20340 2026-04-30 cs.CV

The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection

Qingdong He, Xueqin Chen, Yanjie Pan, Peng Tang, Pengcheng Xu, Zhenye Gan, Chengjie Wang, Xiaobin Hu, Jiangning Zhang, Yabiao Wang

Comments Accepted by CVPR 2026 (Main Conference)

2512.18583 2026-04-30 cs.LG cs.RO

SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models

Pengcheng Li, Qiang Fang, Tong Zhao, Yixing Lan, Xin Xu

Comments This paper has the following problems: limited novelty, not clearly differentiated from existing methods or concepts; limited experimental validation; and sufficiently serious structural, language, or other issues that impact the comprehensibility of the manuscript

2512.08982 2026-04-30 cs.CV cs.AI

Consist-Retinex: One-Step Noise-Emphasized Consistency Training Accelerates High-Quality Retinex Enhancement

Jian Xu, Wei Chen, Shigui Li, Delu Zeng, John Paisley, Qibin Zhao

2512.06565 2026-04-30 cs.CV

GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose Estimation

Xiujin Liu

Comments 1 figure, 2 tables, 14 pages

2511.22972 2026-04-30 cs.CL

Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match

Jinze Li, Yixing Xu, Guanchen Li, Shuo Yang, Jinfeng Xu, Xuanwu Yin, Dong Li, Edith C. H. Ngai, Emad Barsoum

Comments Published as a conference paper at ICLR 2026

2511.22958 2026-04-30 cs.CV

Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records

Shiyu Shen, Zhe Gao, Taifeng Chai, Yang Huang, Bin Pan

Comments arXiv admin note: This submission has been withdrawn due to violation of arXiv policies for acceptable submissions

2511.20714 2026-04-30 cs.CV cs.AI

Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

Inferix Team, Tianyu Feng, Yizeng Han, Jiahao He, Yuanyu He, Xi Lin, Teng Liu, Hanfeng Lu, Jiasheng Tang, Wei Wang, Zhiyuan Wang, Jichao Wu, Mingyang Yang, Yinghao Yu, Zeyu Zhang, Bohan Zhuang

2511.20032 2026-04-30 cs.CV

Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention

Jianfei Zhao, Feng Zhang, Xin Sun, Chong Feng, Zhixing Tan

Comments CVPR 2026

2511.19543 2026-04-30 cs.RO

A Virtual Mechanical Interaction Layer Enables Resilient Human-to-Robot Object Handovers

Omar Faris, Sławomir Tadeja, Fulvio Forni

Comments Accepted for publication in IEEE Robotics and Automation Letters (RA-L)

2511.13285 2026-04-30 cs.CV

SkyReels-Text: Fine-Grained Font-Controllable Text Editing for Poster Design

Yunjie Yu, Jingchen Wu, Junchen Zhu, Chunze Lin, Guibin Chen

Comments Accepted to CVPR 2026

2511.10816 2026-04-30 cs.RO

Dynamically Extensible and Retractable Robotic Leg Linkages for Multi-task Execution in Search and Rescue Scenarios

William Harris, Lucas Yager, Syler Sylvester, Elizabeth Peiros, Michael C. Yip

2511.07689 2026-04-30 cs.CL cs.AI cs.LG

Stress Testing Factual Consistency Metrics for Long-Document Summarization

Zain Muhammad Mujahid, Dustin Wright, Isabelle Augenstein

Comments Accepted in Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

2511.03691 2026-04-30 cs.RO

Source-Free Bistable Fluidic Gripper for Size-Selective and Stiffness-Adaptive Grasping

Zhihang Qin, Yueheng Zhang, Wan Su, Linxin Hou, Shenghao Zhou, Zhijun Chen, Yu Jun Tan, Cecilia Laschi

2510.21828 2026-04-30 cs.CV cs.CL

Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images

Yichi Zhang, Zhuo Chen, Lingbing Guo, Wen Zhang, Huajun Chen

Comments Accepted by Findings of ACL 2026

2510.18165 2026-04-30 cs.AI cs.CL cs.LG cs.SE

Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model

Yihong Dong, Zhaoyu Ma, Xue Jiang, Zhiyuan Fan, Jiaru Qian, Yongmin Li, Jianha Xiao, Zhi Jin, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li, Ge Li

Comments Accepted to ACL 2026 (main)

2510.17548 2026-04-30 cs.CL

When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity

Nisrine Rair, Alban Goupil, Valeriu Vrabie, Emmanuel Chochoy

Comments Accepted to appear in the Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025, Main Conference)

2510.14703 2026-04-30 cs.AI

ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling

Jianghao Lin, Yuanyuan Shi, Xin Peng, Renjie Ding, Hairui Wang, Yuxuan Peng, Bizhe Bai, Weixi Song, Fengshuo Bai, Huacan Chai, Weinan Zhang, Fei Huang, Ying Wen

Comments ACL 2026 (main)

2510.14438 2026-04-30 cs.CL

WebAggregator: Enhancing Compositional Reasoning Capabilities of Deep Research Agent Foundation Models

Rui Wang, Ce Zhang, Jun-Yu Ma, Jianshu Zhang, Hongru Wang, Yi Chen, Boyang Xue, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu, Kam-Fai Wong

2510.10150 2026-04-30 cs.LG cs.AI

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

Zhezheng Hao, Hong Wang, Haoyang Liu, Jian Luo, Jiarui Yu, Hande Dong, Qiang Lin, Can Wang, Jiawei Chen

2510.08547 2026-04-30 cs.RO cs.CV

R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

Xiuwei Xu, Angyuan Ma, Hankun Li, Bingyao Yu, Zheng Zhu, Jie Zhou, Jiwen Lu

Comments Accepted to RSS 2026. Project page: https://r2rgen.github.io/

2510.08049 2026-04-30 cs.CL cs.AI

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, Weinan Zhang