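arXivDaily daily academic digest: syncs the full arXiv dataset with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems science, and more.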

2604.13891 2026-04-16 cs.RO cs.AI cs.SY eess.SY

Beyond Conservative Automated Driving in Multi-Agent Scenarios via Coupled Model Predictive Control and Deep Reinforcement Learning

Saeed Rahmani, Gözde Körpe, Zhenlin Xu, Bruno Brito, Simeon Craig Calvert, Bart van Arem

Comments This work has been submitted to the IEEE for possible publication

Abstract

Automated driving at unsignalized intersections is challenging due to complex multi-vehicle interactions and the need to balance safety and efficiency. Model Predictive Control (MPC) offers structured constraint handling through optimization but relies on hand-crafted rules that often produce overly conservative behavior. Deep Reinforcement Learning (RL) learns adaptive behaviors from experience but often struggles with safety assurance and generalization to unseen environments. In this study, we present an integrated MPC-RL framework to improve navigation performance in multi-agent scenarios. Experiments show that MPC-RL outperforms standalone MPC and end-to-end RL across three traffic-density levels. Overall, compared to pure MPC, MPC-RL reduces the collision rate by 21% and improves the success rate by 6.5%. We further evaluate zero-shot transfer to a highway merging scenario without retraining. Both MPC-based methods transfer substantially better than end-to-end PPO, which highlights the role of the MPC backbone in cross-scenario robustness. The framework also shows faster loss stabilization than end-to-end RL during training, which indicates a reduced learning burden. These results suggest that the integrated approach can improve the balance between safety and efficiency in multi-agent intersection scenarios, while the MPC component provides a strong foundation for generalization across driving environments. The implementation code is available as open source.

2604.13888 2026-04-16 cs.AI

GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis

Bo Yu, Cheng Yang, Dongyang Hou, Chengfu Liu, Jiayao Liu, Chi Wang, Zhiming Zhang, Haifeng Li, Wentao Yang

Comments 20 pages, 3 figures, 6 tables

Abstract

The integration of Large Language Models (LLMs) into Geographic Information Systems (GIS) marks a paradigm shift toward autonomous spatial analysis. However, evaluating these LLM-based agents remains challenging due to the complex, multi-step nature of geospatial workflows. Existing benchmarks primarily rely on static text or code matching, neglecting dynamic runtime feedback and the multimodal nature of spatial outputs. To address this gap, we introduce GeoAgentBench (GABench), a dynamic and interactive evaluation benchmark tailored for tool-augmented GIS agents. GABench provides a realistic execution sandbox integrating 117 atomic GIS tools and encompassing 53 typical spatial analysis tasks across 6 core GIS domains. Recognizing that precise parameter configuration is the primary determinant of execution success in dynamic GIS environments, we designed the Parameter Execution Accuracy (PEA) metric, which uses a "Last-Attempt Alignment" strategy to quantify the fidelity of implicit parameter inference. Complementing this, we propose a Vision-Language Model (VLM)-based verification method to assess data-spatial accuracy and adherence to cartographic style. Furthermore, to address the frequent task failures caused by parameter misalignments and runtime anomalies, we developed a novel agent architecture, Plan-and-React, that mimics expert cognitive workflows by decoupling global orchestration from step-wise reactive execution. Extensive experiments with seven representative LLMs demonstrate that the Plan-and-React paradigm significantly outperforms traditional frameworks, achieving the best balance between logical rigor and execution robustness, particularly in multi-step reasoning and error recovery. Our findings highlight current capability boundaries and establish a robust standard for assessing and advancing the next generation of autonomous GeoAI.

2604.13883 2026-04-16 cs.CV cs.LG

Context Sensitivity Improves Human-Machine Visual Alignment

Frieda Born, Tom Neuhäuser, Lukas Muttenthaler, Brett D. Roads, Bernhard Spitzer, Andrew K. Lampinen, Matt Jones, Klaus-Robert Müller, Michael C. Mozer

2604.13882 2026-04-16 cs.LG cs.AI

Evaluating Supervised Machine Learning Models: Principles, Pitfalls, and Metric Selection

Xuanyan Liu, Ignacio Cabrera Martin, Marcello Trovati, Xiaolong Xu, Nikolaos Polatidis

2604.13863 2026-04-16 cs.CV

PostureObjectstitch: Anomaly Image Generation Considering Assembly Relationships in Industrial Scenarios

Zebei Tong, Hongchang Chen, Yujie Lei, Gang Chen, Yushi Liu, Zhi Zheng, Hao Chen, Jieming Zhang, Ying Li, Dongpu Cao

2604.13856 2026-04-16 cs.CV

Any3DAvatar: Fast and High-Quality Full-Head 3D Avatar Reconstruction from Single Portrait Image

Yujie Gao, Yao Xiao, Xiangnan Zhu, Ya Li, Yiyi Zhang, Liqing Zhang, Jianfu Zhang

2604.13853 2026-04-16 cs.RO

Mosaic: An Extensible Framework for Composing Rule-Based and Learned Motion Planners

Nick Le Large, Marlon Steiner, Lingguang Wang, Willi Poh, Jan-Hendrik Pauls, Ömer Şahin Taş, Christoph Stiller

Comments 7 pages, 5 figures, 4 tables, submitted to the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems

2604.13841 2026-04-16 cs.CV

DiffMagicFace: Identity Consistent Facial Editing of Real Videos

Huanghao Yin, Shenkun Xu, Kanle Shi, Junhai Yong, Bin Wang

2604.13835 2026-04-16 cs.CV

A Resource-Efficient Hybrid CNN-LSTM network for image-based bean leaf disease classification

Hye Jin Rhee, Joseph Damilola Akinyemi

2604.13828 2026-04-16 cs.CL

MUSE: Multi-Domain Chinese User Simulation via Self-Evolving Profiles and Rubric-Guided Alignment

Zihao Liu, Hantao Zhou, Jiguo Li, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He, Peng Wang

2604.13824 2026-04-16 cs.LG

Beyond State Consistency: Behavior Consistency in Text-Based World Models

Youling Huang, Guanqiao Chen, Junchi Yao, Lu Wang, Fangkai Yang, Chao Du, ChenZhuo Zhao, Pu Zhao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Comments 20 pages, 2 figures

2604.13822 2026-04-16 cs.LG

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

Zhengxi Lu, Fei Tang, Guangyi Liu, Kaitao Song, Xu Tan, Jin Ma, Wenqi Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen

2604.13817 2026-04-16 cs.LG

RPS: Information Elicitation with Reinforcement Prompt Selection

Tao Wang, Jingyao Lu, Xibo Wang, Haonan Huang, Su Yao, Zhiqiang Hu, Xingyan Chen, Enmao Diao

2604.13816 2026-04-16 cs.LG

Composite Silhouette: A Subsampling-based Aggregation Strategy

Aggelos Semoglou, Aristidis Likas, John Pavlopoulos

Comments 32 pages including Appendix

2604.13812 2026-04-16 cs.AI quant-ph

AlphaCNOT: Learning CNOT Minimization with Model-Based Planning

Jacopo Cossio, Daniele Lizzio Bosco, Riccardo Romanello, Giuseppe Serra, Carla Piazza

Comments 22 pages, 11 figures, journal

2604.13806 2026-04-16 cs.LG

Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate

Jaemin Kim, Sungkyun Kim, Junyeol Lee, Jiwon Seo

Comments EuroMLSys 2026

2604.13804 2026-04-16 cs.LG

Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning

Dongjie Fu, Fangming Feng, Xize Cheng, Linjun Li, Zhou Zhao, Tao Jin

2604.13803 2026-04-16 cs.CV cs.AI

Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation

Arya Shah, Vaibhav Tripathi, Mayank Singh, Chaklam Silpasuwanchai

Comments 28 pages, 9 figures, 13 tables

2604.13800 2026-04-16 cs.RO

EmbodiedClaw: Conversational Workflow Execution for Embodied AI Development

Xueyang Zhou, Yihan Sun, Xijie Gong, Guiyao Tie, Pan Zhou, Lichao Sun, Yongchao Chen

Comments 13 pages, 7 figures

2604.13797 2026-04-16 cs.CV

DRG-Font: Dynamic Reference-Guided Few-shot Font Generation via Contrastive Style-Content Disentanglement

Rejoy Chakraborty, Prasun Roy, Saumik Bhattacharya, Umapada Pal

Comments 11 pages

2604.13795 2026-04-16 cs.CV cs.LG

Artificial intelligence application in lymphoma diagnosis with Vision Transformer using weakly supervised training

Nghia Nguyen, Amer Wahed, Andy Quesada, Yasir Ali, Hanadi El Achi, Y. Helen Zhang, Jocelyn Ursua, Alex Banerjee, Sahib Kalra, L. Jeffrey Medeiros, Jie Xu

Comments 23 pages, 6 figures, 1 table

2604.13793 2026-04-16 cs.CV

From Synchrony to Sequence: Exo-to-Ego Generation via Interpolation

Mohammad Mahdi, Nedko Savov, Danda Pani Paudel, Luc Van Gool

2604.13791 2026-04-16 cs.CV

PBE-UNet: A Lightweight Progressive Boundary-Enhanced U-Net with Scale-Aware Aggregation for Ultrasound Image Segmentation

Chen Wang, Yixin Zhu, Yongbin Zhu, Fengyuan Shi, Qi Li, Jun Wang, Zuozhu Liu, Keli Hu

Comments 14 pages, 14 figures

2604.13789 2026-04-16 cs.CV

Temporally Consistent Long-Term Memory for 3D Single Object Tracking

Jaejoon Yoo, SuBeen Lee, Yerim Jeon, Miso Lee, Jae-Pil Heo

Comments Accepted to CVPR 2026 Findings

2604.13016 2026-04-16 cs.LG cs.AI cs.CL

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding

Comments 30 pages, 23 figures. Code: https://github.com/thunlp/OPD

2604.12865 2026-04-16 cs.AI

From edges to meaning: Semantic line sketches as a cognitive scaffold for ancient pictograph invention

Seowung Leem, Lin Gu, Ruogu Fang

2604.12358 2026-04-16 cs.CV

Why and When Visual Token Pruning Fails? A Study on Relevant Visual Information Shift in MLLMs Decoding

Jiwan Kim, Kibum Kim, Wonjoong Kim, Byung-Kwan Lee, Chanyoung Park

Comments Preprint. Project: https://ptkjw1997.github.io/DSTP-page/

2604.12213 2026-04-16 cs.AI cs.MA cs.SE

Modality-Native Routing in Agent-to-Agent Networks: A Multimodal A2A Protocol Extension

Vasundra Srinivasan

Comments 14 pages, 4 figures (TikZ). PDFLaTeX. Supplementary code and experiment artifacts: https://github.com/vasundras/modality-native-routing-a2a-protocol

2604.11748 2026-04-16 cs.CL cs.LG

LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

Yuxin Chen, Chumeng Liang, Hangke Sui, Ruihan Guo, Chaoran Cheng, Jiaxuan You, Ge Liu

2604.11465 2026-04-16 cs.AI

Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents

S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis, Antonios Saravanos