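arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.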

2604.20486 2026-04-23 cs.CV

ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards

Wentao Yan, Shengqin Wang, Huichi Zhou, Yihang Chen, Kun Shao, Yuan Xie, Zhizhong Zhang

Abstract

Training multimodal agents via reinforcement learning for knowledge-intensive visual reasoning is fundamentally hindered by the extreme sparsity of outcome-based supervision and the unpredictability of live web environments. To resolve these algorithmic and environmental bottlenecks, we introduce ProMMSearchAgent, establishing a novel Sim-to-Real training paradigm for multimodal search. We decouple policy learning into a deterministic, local static sandbox. Crucially, to learn effectively within this constrained environment, we propose an introspective process-oriented reward. By probing the agent's own parametric knowledge boundaries, we generate dense behavioral metadata that explicitly rewards the correct cognitive decision, initiating a multimodal or text search only when visually or factually uncertain. Extensive experiments demonstrate that our locally-trained policy transfers zero-shot to the live Google Search API. ProMMSearchAgent achieves new SOTA performance, outperforming MMSearch-R1 by +5.1% on FVQA-test, +6.3% on InfoSeek, and +11.3% on MMSearch.

2604.20474 2026-04-23 cs.CV

Random Walk on Point Clouds for Feature Detection

Yuhe Zhang, Zhikun Tu, Zhi Li, Jian Gao, Bao Guo, Shunli Zhang

Comments 20 pages, 11 figures. Published in Information Sciences

2604.20473 2026-04-23 cs.CV

Video-ToC: Video Tree-of-Cue Reasoning

Qizhong Tan, Zhuotao Tian, Guangming Lu, Jun Yu, Wenjie Pei

2604.20472 2026-04-23 cs.RO cs.LG

Temporal Difference Calibration in Sequential Tasks: Application to Vision-Language-Action Models

Shelly Francis-Meretzki, Mirco Mutti, Yaniv Romano, Aviv Tamar

2604.20470 2026-04-23 cs.CV

DynamicRad: Content-Adaptive Sparse Attention for Long Video Diffusion

Yongji Long, Shijun Liang, Jintao Li, Yun Li

2604.20460 2026-04-23 cs.CV

CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs

Xingcheng Zhou, Hao Guo, Rui Song, Walter Zimmer, Mingyu Liu, André Schamschurko, Hu Cao, Alois Knoll

2604.20458 2026-04-23 cs.LG physics.chem-ph

Surrogate Functionals for Machine-Learned Orbital-Free Density Functional Theory

Roman Remme, Fred A. Hamprecht

2604.20454 2026-04-23 cs.CL

Not all ANIMALs are equal: metaphorical framing through source domains and semantic frames

Yulia Otmakhova, Matteo Guida, Lea Frermann

Comments Accepted to ACL 2026 Findings

2604.20447 2026-04-23 cs.CL

Decoding Text Spans for Efficient and Accurate Named-Entity Recognition

Andrea Maracani, Savas Ozkan, Junyi Zhu, Sinan Mutlu, Mete Ozay

2604.20446 2026-04-23 cs.LG stat.ML

The Origin of Edge of Stability

Elon Litman

2604.20444 2026-04-23 cs.RO cs.AI cs.DB cs.LG

VTouch++: A Multimodal Dataset with Vision-Based Tactile Enhancement for Bimanual Manipulation

Qianxi Hua, Xinyue Li, Zheng Yan, Yang Li, Chi Zhang, Yongyao Li, Yufei Liu

2604.20441 2026-04-23 cs.AI

MedSkillAudit: A Domain-Specific Audit Framework for Medical Research Agent Skills

Yingyong Hou, Xinyuan Lao, Huimei Wang, Qianyu Yao, Wei Chen, Bocheng Huang, Fei Sun, Yuxian Lv, Weiqi Lei, Xueqian Wen, Pengfei Xia, Zhujun Tan, Shengyang Xie

Comments 20 pages, 9 figures, 1 graphic abstract, 4 tables

2604.20429 2026-04-23 cs.CV

Fast-then-Fine: A Two-Stage Framework with Multi-Granular Representation for Cross-Modal Retrieval in Remote Sensing

Xi Chen, Xu Chen, Xiangyang Jia, Xu Zhang, Shuquan Wei, Wei Wang

2604.20423 2026-04-23 cs.RO

OVPD: A Virtual-Physical Fusion Testing Dataset of OnSite Autonomous Driving Challenge

Yuhang Zhang, Jiarui Zhang, Bowen Jian, Xin Zhou, Zhichao Lv, Peng Hang, Rongjie Yu, Ye Tian, Jian Sun

Comments 11 pages, 6 figures, 3 tables

2604.20421 2026-04-23 cs.LG

Unlocking the Forecasting Economy: A Suite of Datasets for the Full Lifecycle of Prediction Market: [Experiments & Analysis]

Huaiyu Jia, Luofeng Zhou, Wentao Zhang, Lin William Cong, Siguang Li, Shuo Sun

Comments Project page: https://www.polymonitor.club/

2604.20420 2026-04-23 cs.LG cs.AI

Scalable AI Inference: Performance Analysis and Optimization of AI Model Serving

Hung Cuong Pham, Fatih Gedikli

2604.20413 2026-04-23 cs.AI

Self-Awareness before Action: Mitigating Logical Inertia via Proactive Cognitive Awareness

Fulong Fan, Peilin Liu, Fengzhe Liu, Shuyan Yang, Gang Yan

Comments Accepted to ACL 2026. 12 pages, 3 figures

2604.20409 2026-04-23 cs.LG stat.ML

Calibrating conditional risk

Andrey Vasilyev, Yikai Wang, Xiaocheng Li, Guanting Chen

2604.20398 2026-04-23 cs.CL cs.LG cs.SE

WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning

Juyong Jiang, Chenglin Cai, Chansung Park, Jiasi Shen, Sunghun Kim, Jianguo Li, Yue Wang

Abstract

While Large Language Models (LLMs) excel at function-level code generation, project-level tasks such as generating functional and visually aesthetic multi-page websites remain highly challenging. Existing works are often limited to single-page static websites, while agentic frameworks typically rely on multi-turn execution with proprietary models, leading to substantial token costs, high latency, and brittle integration. Training a small LLM end-to-end with reinforcement learning (RL) is a promising alternative, yet it faces a critical bottleneck in designing reliable and computationally feasible rewards for website generation. Unlike single-file coding tasks that can be verified by unit tests, website generation requires evaluating inherently subjective aesthetics, cross-page interactions, and functional correctness. To this end, we propose WebGen-R1, an end-to-end RL framework tailored for project-level website generation. We first introduce a scaffold-driven structured generation paradigm that constrains the large open-ended action space and preserves architectural integrity. We then design a novel cascaded multimodal reward that seamlessly couples structural guarantees with execution-grounded functional feedback and vision-based aesthetic supervision. Extensive experiments demonstrate that our WebGen-R1 substantially transforms a 7B base model from generating nearly nonfunctional websites into producing deployable, aesthetically aligned multi-page websites. Remarkably, our WebGen-R1 not only consistently outperforms heavily scaled open-source models (up to 72B), but also rivals the state-of-the-art DeepSeek-R1 (671B) in functional success, while substantially exceeding it in valid rendering and aesthetic alignment. These results position WebGen-R1 as a viable path for scaling small open models from function-level code generation to project-level web application generation.

2604.20393 2026-04-23 cs.CV

MLG-Stereo: ViT Based Stereo Matching with Multi-Stage Local-Global Enhancement

Haoyu Zhang, Jingyi Zhou, Peng Ye, Jiakang Yuan, Lin Zhang, Feng Xu, Tao Chen

2604.20392 2026-04-23 cs.CV

Self-supervised pretraining for an iterative image size agnostic vision transformer

Nedyalko Prisadnikov, Danda Pani Paudel, Yuqian Fu, Luc Van Gool

2604.20382 2026-04-23 cs.CL

Graph2Counsel: Clinically Grounded Synthetic Counseling Dialogue Generation from Client Psychological Graphs

Aishik Mandal, Hiba Arnaout, Clarissa W. Ong, Juliet Bockhorst, Kate Sheehan, Rachael Moldow, Tanmoy Chakraborty, Iryna Gurevych

Comments 49 pages, 46 figures, 11 tables

2604.20381 2026-04-23 cs.LG cs.NE cs.RO

Distributional Value Estimation Without Target Networks for Robust Quality-Diversity

Behrad Koohy, Jamie Bayne

Comments Accepted as Full Paper at GECCO'26

2604.20374 2026-04-23 cs.LG

Towards Event-Aware Forecasting in DeFi: Insights from On-chain Automated Market Maker Protocols

Huaiyu Jia, Jiehshun You, Yizhi Luo, Jingyu Liu, Shuo Sun

2604.20370 2026-04-23 cs.LG stat.ML

Cold-Start Forecasting of New Product Life-Cycles via Conditional Diffusion Models

Ruihan Zhou, Zishi Zhang, Jinhui Han, Yijie Peng, Xiaowei Zhang

2604.20368 2026-04-23 cs.CV cs.AI

LaplacianFormer: Rethinking Linear Attention with Laplacian Kernel

Zhe Feng, Sen Lian, Changwei Wang, Muyang Zhang, Tianlong Tan, Rongtao Xu, Weiliang Meng, Xiaopeng Zhang

2604.20366 2026-04-23 cs.CV

Mitigating Hallucinations in Large Vision-Language Models without Performance Degradation

Xingyu Zhu, Junfeng Fang, Shuo Wang, Beier Zhu, Zhicai Wang, Yonghui Yang, Xiangnan He

Comments ACL 2026 (Oral)

2604.20365 2026-04-23 cs.RO cs.AI

Benefits of Low-Cost Bio-Inspiration in the Age of Overparametrization

Kevin Godin-Dubois, Anil Yaman, Anna V. Kononova

2604.20361 2026-04-23 cs.CV

Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models

Rong Quan, Yantao Lai, Dong Liang, Jie Qin

Comments ICMR 2026

2604.20358 2026-04-23 cs.CV

ConeSep: Cone-based Robust Noise-Unlearning Compositional Network for Composed Image Retrieval

Zixu Li, Yupeng Hu, Zhiwei Chen, Mingyu Zhang, Zhiheng Fu, Liqiang Nie

Comments Accepted by CVPR 2026