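arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.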

2604.24564 2026-05-01 cs.CL cs.IR cs.IT math.IT

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

Xihang Wang, Zihan Wang, Chengkai Huang, Quan Z. Sheng, Lina Yao

Abstract

Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides superficial relevance. Existing metrics often rely on heuristic position-based confidence, which fails to capture the informational density of multimodal entities. To address this, we propose Multi-modal Evidence Grounding (MEG), a semantic-aware metric that quantifies the contribution of retrieved evidence. Unlike standard confidence measures, MEG utilizes Semantic Certainty Anchoring, focusing on high-IDF information-bearing tokens that better capture the semantic core of the answer. Building on MEG, we introduce MEG-RAG, a framework that trains a multimodal reranker to align retrieved evidence with the semantic anchors of the ground truth. By prioritizing high-value content based on semantic grounding rather than token probability distributions, MEG-RAG improves the accuracy and multimodal consistency of generated outputs. Extensive experiments on the M$^2$RAG benchmark show that MEG-RAG consistently outperforms strong baselines and demonstrates robust generalization across different teacher models.
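
The abstract describes scoring evidence on high-IDF, information-bearing answer tokens rather than on all tokens. The sketch below illustrates that general idea only; it is not the authors' MEG definition. The smoothed IDF formula, the `top_k` cutoff, the function names, and the "mean change in log-probability" score are all assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(corpus):
    """Smoothed IDF per token over a list of tokenized documents (assumed formula)."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def anchor_tokens(answer_tokens, idf, top_k=2):
    """Return the top_k distinct answer tokens with the highest IDF.

    Tokens missing from the IDF table are treated as maximally rare.
    Ties keep their order of first appearance in the answer.
    """
    unique = list(dict.fromkeys(answer_tokens))
    default = max(idf.values())
    ranked = sorted(unique, key=lambda t: idf.get(t, default), reverse=True)
    return ranked[:top_k]


def anchored_gain(logp_with, logp_without, anchors):
    """Mean change in log-probability on anchor tokens when evidence is added.

    logp_with / logp_without are hypothetical per-token log-probabilities
    of the reference answer, with and without the retrieved evidence.
    """
    deltas = [logp_with[t] - logp_without[t] for t in anchors]
    return sum(deltas) / len(deltas)


corpus = [
    ["the", "tower", "is", "in", "paris"],
    ["the", "cat", "is", "here"],
    ["the", "dog", "is", "in", "a", "park"],
]
idf = inverse_document_frequency(corpus)
anchors = anchor_tokens(["the", "tower", "is", "in", "paris"], idf)
gain = anchored_gain(
    {"tower": -0.2, "paris": -0.1},   # made-up values, with evidence
    {"tower": -1.2, "paris": -2.1},   # made-up values, without evidence
    anchors,
)
```

In this toy setup, frequent tokens such as "the" and "is" receive low IDF and are ignored, so the score reflects only how much the evidence helps on the rarer content words.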

2604.24026 2026-05-01 cs.CL cs.AI

From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills

Qiliang Liang, Hansi Wang, Zhong Liang, Yang Liu

Comments 21 pages, 1 figure

Abstract

LLM agents increasingly rely on reusable skills, capability packages that combine instructions, control flow, constraints, and tool calls. In most current agent systems, however, skills are still represented by text-heavy artifacts, including SKILL.md-style documents and structured records whose machine-usable evidence remains embedded largely in natural-language descriptions. This poses a challenge for skill-centered agent systems: managing skill collections and using skills to support agents both require reasoning over invocation interfaces, execution structure, and concrete side effects that are often entangled in a single textual surface. An explicit representation of skill knowledge may therefore help make these artifacts easier for machines to acquire and leverage. Drawing on Memory Organization Packets, Script Theory, and Conceptual Dependency from Schank and Abelson's classical work on linguistic knowledge representation, we introduce what is, to our knowledge, the first structured representation for agent skill artifacts that disentangles skill-level scheduling signals, scene-level execution structure, and logic-level action and resource-use evidence: the Scheduling-Structural-Logical (SSL) representation. We instantiate SSL with an LLM-based normalizer and evaluate it on a corpus of skills in two tasks, Skill Discovery and Risk Assessment, where it outperforms the text-only baselines: in Skill Discovery, SSL improves MRR from 0.573 to 0.707; in Risk Assessment, it improves macro F1 from 0.744 to 0.787. These findings reveal that explicit, source-grounded structure makes agent skills easier to search and review. They also suggest that SSL is best understood as a practical step toward more inspectable, reusable, and operationally actionable skill representations for agent systems, rather than as a finished standard or an end-to-end mechanism for managing and using skills.
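
For readers unfamiliar with the Skill Discovery metric quoted above, mean reciprocal rank (MRR) averages 1/rank of the first correct item across queries. The sketch below shows the standard definition on made-up data; the function name and the toy skill names are illustrative and are not taken from the paper's evaluation code.

```python
def mean_reciprocal_rank(ranked_lists, relevant_items):
    """Average of 1/rank of the relevant item per query (0 if it is absent)."""
    total = 0.0
    for ranking, target in zip(ranked_lists, relevant_items):
        for position, item in enumerate(ranking, start=1):
            if item == target:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


# Three toy queries; the relevant skill is ranked 1st, 2nd, and 3rd.
rankings = [
    ["resize_image", "send_email", "parse_csv"],
    ["send_email", "resize_image", "parse_csv"],
    ["parse_csv", "send_email", "resize_image"],
]
targets = ["resize_image", "resize_image", "resize_image"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/3) / 3
```

A higher MRR means the correct skill tends to appear nearer the top of the retrieved list.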

2604.23763 2026-05-01 cs.CV

Edit Where You Mean: Region-Aware Adapter Injection for Mask-Free Local Image Editing

Honghao Cai, Xiangyuan Wang, Yunhao Bai, Haohua Chen, Tianze Zhou, Runqi Wang, Wei Zhu, Yibo Chen, Xu Tang, Yao Hu, Zhen Li

2604.22655 2026-05-01 cs.LG

Associativity-Peakiness Metric for Contingency Tables

Naomi E. Zirkind, William J. Diehl

Comments 38 pages, 21 figures

2604.22161 2026-05-01 cs.LG

Logistic Bandits with $\tilde{O}(\sqrt{dT})$ Regret without Context Diversity Assumptions

Seoungbin Bae, Dabeen Lee

2604.21729 2026-05-01 cs.RO

A Compact Peristaltic Pump Based on Magneto-Elastic Hysteresis with Single Pneumatic Control

Minjo Park, Metin Sitti

Comments Submitted to IEEE CBS 2026. This work has been submitted to the IEEE for possible publication

2604.21479 2026-05-01 cs.CV

Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction

Yanjiao Liu, Jiawei Liu, Xun Gong, Zifei Nie

Comments Accepted for publication at IEEE Intelligent Vehicles Symposium 2026

2604.19962 2026-05-01 cs.RO

Radar Odometry Subject to High Tilt Dynamics of Subarctic Environments

Matěj Boxan, William Larrivée-Hardy, François Pomerleau

2604.19606 2026-05-01 cs.AI cs.MA

AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

Xue Xia, Chengkai Yao, Mingyu Tsoi, Xinjie Mao, Wenxuan Huang, Jiaqi Wei, Hao Wu, Cheng Tan, Lang Yu, Yuejin Yang, Mengdi Liu, Siqi Sun, Zhangyang Gao

Comments 25 pages, 5 figures

2604.19571 2026-05-01 cs.CV

TransSplat: Unbalanced Semantic Transport for Language-Driven 3DGS Editing

Yanhui Chen, Jiahong Li, Jingchao Wang, Junyi Lin, Zixin Zeng, Yang Shi

2604.19221 2026-05-01 cs.AI cs.SD eess.AS

UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction

Yadong Li, Guoxin Wu, Haiping Hou, Biye Li

2604.18390 2026-05-01 cs.LG cs.AI

Randomly Initialized Networks Can Learn from Peer-to-Peer Consensus

Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo

Comments 6 pages, 10 figures. To be published in ChileCON 2025 proceedings

2604.17175 2026-05-01 cs.LG cs.AI q-bio.BM

RosettaSearch: Multi-Objective Inference-Time Search for Protein Sequence Design

Meghana Kshirsagar, Allen Nie, Ching-An Cheng, Fanglei Xue, Rahul Dodhia, Juan Lavista Ferres, Kevin K. Yang, Frank DiMaio

2604.17022 2026-05-01 cs.CL cs.AI

Beyond Black-Box Labels: Interpretable Criteria for Diagnosing Subjective NLP Tasks

Nisrine Rair, Alban Goupil, Valeriu Vrabie, Emmanuel Chochoy

Comments Accepted to ACL Findings 2026

2604.13294 2026-05-01 cs.CV

PAT-VCM: Plug-and-Play Auxiliary Tokens for Video Coding for Machines

Wei Jiang, Wei Wang

Comments 22 pages, 4 figures, 23 tables

2604.12656 2026-05-01 cs.RO cs.LG

FeaXDrive: Feasibility-aware Trajectory-Centric Diffusion Planning for End-to-End Autonomous Driving

Baoyun Wang, Zhuoren Li, Ran Yu, Yu Che, Xinrui Zhang, Ming Liu, Jia Hu, Chen Lv, Bo Leng

Comments 22 pages, 6 figures

2604.09408 2026-05-01 cs.AI

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

Mohamed Elfeki, Tu Trinh, Kelvin Luu, Guangze Luo, Nathan Hunt, Ernesto Montoya, Nandan Marwaha, Yannis He, Charles Wang, Fernando Crabedo, Alessa Castilo, Bing Liu

2604.07392 2026-05-01 cs.LG cs.IR cs.RO

Event-Centric World Modeling with Memory-Augmented Retrieval for Embodied Decision-Making

Zhaowen Fan, Rongchao Zhang

Comments This is the initial version (v1) released to establish priority for the proposed framework. Subsequent versions will include expanded experimental validation and exhaustive hardware benchmarking

2604.06715 2026-05-01 cs.CV cs.AI

HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation

Md Aminur Hossain, Ayush V. Patel, Siddhant Gole, Sanjay K. Singh, Biplab Banerjee

Comments 17 pages

2603.26571 2026-05-01 cs.CV cs.AI

GVCC: Zero-Shot Video Compression via Codebook-Driven Stochastic Rectified Flow

Ziyue Zeng, Xun Su, Haoyuan Liu, Bingyu Lu, Yui Tatsumi, Hiroshi Watanabe

Comments 9 pages, 3 figures

2603.22201 2026-05-01 cs.RO

Make Tracking Easy: Neural Motion Retargeting for Humanoid Whole-body Control

Qingrui Zhao, Kaiyue Yang, Xiyu Wang, Shiqi Zhao, Yi Lu, Xinfang Zhang, Qiu Shen, Xiao-Xiao Long, Xun Cao

Comments Report, 12 pages, 5 figures, 4 tables, webpage: https://nju3dv-humanoidgroup.github.io/nmr.github.io

2603.22078 2026-05-01 cs.RO

Do World Action Models Generalize Better than VLAs? A Robustness Study

Zhanguang Zhang, Zhiyuan Li, Behnam Rahmati, Rui Heng Yang, Yintao Ma, Amir Rasouli, Sajjad Pakdamansavoji, Yangzheng Wu, Lingfeng Zhang, Tongtong Cao, Feng Wen, Xinyu Wang, Xingyue Quan, Yingxue Zhang

Abstract

Robot action planning in the real world is challenging as it requires not only understanding the current state of the environment but also predicting how it will evolve in response to actions. Vision-language-action (VLA) models, which repurpose large-scale vision-language models for robot action generation using action experts, have achieved notable success across a variety of robotic tasks. Nevertheless, their performance remains constrained by the scope of their training data, exhibiting limited generalization to unseen scenarios and vulnerability to diverse contextual perturbations. More recently, world models have been revisited as an alternative to VLAs. These models, referred to as world action models (WAMs), are built upon world models that are trained on large corpora of video data to predict future states. With minor adaptations, their latent representation can be decoded into robot actions. It has been suggested that their explicit dynamic prediction capacity, combined with spatiotemporal priors acquired from web-scale video pretraining, enables WAMs to generalize more effectively than VLAs. In this paper, we conduct a comparative study of prominent state-of-the-art VLA policies and recently released WAMs. We evaluate their performance on the LIBERO-Plus and RoboTwin 2.0-Plus benchmarks under various visual and language perturbations. Our results show that WAMs achieve strong robustness, with LingBot-VA reaching 74.2% success rate on RoboTwin 2.0-Plus and Cosmos-Policy achieving 82.2% on LIBERO-Plus. While VLAs such as $\pi_{0.5}$ can achieve comparable robustness on certain tasks, they typically require extensive training with diverse robotic datasets and varied learning objectives. Hybrid approaches that partially incorporate video-based dynamic learning exhibit intermediate robustness, highlighting the importance of how video priors are integrated.

2603.21016 2026-05-01 cs.CL cs.AI cs.LG

Mitigating Selection Bias in Large Language Models via Permutation-Aware GRPO

Jinquan Zheng, Jia Yuan, Jiacheng Yao, Chenyang Gu, Pujun Zheng, Guoxiu He

Comments Accepted to ACL 2026 Main Conference. 19 pages, 3 figures, 6 tables

2603.19044 2026-05-01 cs.CL

MoRI: Learning Motivation-Grounded Reasoning for Scientific Ideation in Large Language Models

Chenyang Gu, Jiahao Cheng, Meicong Zhang, Pujun Zheng, Jinquan Zheng, Guoxiu He

Comments Accepted to ACL 2026 Main Conference

2603.07101 2026-05-01 cs.AI

Grounding Machine Creativity in Game Design Knowledge Representations: Empirical Probing of LLM-Based Executable Synthesis of Goal Playable Patterns under Structural Constraints

Hugh Xuechen Liu, Kıvanç Tatar

2603.01444 2026-05-01 cs.LG

Autoregressive Synthesis of Sparse and Semi-Structured Mixed-Type Data

Thomas Rückstieß, Robin Vujanic

Comments Under Submission

2602.24262 2026-05-01 cs.LG

Coverage-Aware Web Crawling for Domain-Specific Supplier Discovery via a Web–Knowledge–Web Pipeline

Yijiashun Qi, Yijiazhen Qi, Tanmay Wagh

Comments Accepted by 2026 7th International Conference on Computer Information and Big Data Applications

2602.17469 2026-05-01 cs.CL cs.HC

Cross-Lingual Sentiment Misalignment: Auditing Multilingual Language Models for Inversion Risk, Dialectal Representation, and Affective Stability

Nusrat Jahan Lia, Shubhashis Roy Dipta

2602.16516 2026-05-01 cs.CL

Supercharging Agenda Setting Research: The ParlaCAP Dataset of 28 European Parliaments and a Scalable Multilingual LLM-Based Classification

Taja Kuzman Pungeršek, Peter Rupnik, Daniela Širinić, Nikola Ljubešić

Comments 17 pages, 7 figures, 7 tables. Presented in the PoliticalNLP 2026 workshop, co-located with LREC 2026 conference

2602.13559 2026-05-01 cs.AI

OpAgent: Operator Agent for Web Navigation

Yuyu Guo, Wenjie Yang, Siyuan Yang, Ziyang Liu, Cheng Chen, Yuan Wei, Yun Hu, Yang Huang, Guoliang Hao, Dongsheng Yuan, Jianming Wang, Xin Chen, Hang Yu, Lei Lei, Peng Di