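arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.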

2604.18245 2026-04-28 cs.LG

Correction and Corruption: A Two-Rate View of Error Flow in LLM Protocols

Fernando Reitich

Comments 36 pages main paper, 19 pages supplementary material included as ancillary file

Abstract

Large language models are increasingly deployed as protocols: structured multi-call procedures that spend additional computation to transform a baseline answer into a final one. These protocols are evaluated only by end-to-end accuracy, giving limited insight into when they help, when they hurt, and whether their behavior transfers under distribution shift or composition. We propose a paired-outcome measurement interface for auditing a single protocol step on exact-match tasks. For each instance, the interface records a baseline correctness bit $E_0\in\{0,1\}$ and a post-step correctness bit $E_1\in\{0,1\}$, separating correction ($E_0=0\to E_1=1$) from corruption ($E_0=1\to E_1=0$) through two rates: $c=\Pr(E_1=1\mid E_0=0)$ and $\gamma=\Pr(E_1=0\mid E_0=1)$. These rates predict accuracy changes and define a reusable empirical interface testable across seeds, mixtures, and pipelines. We identify three failure mechanisms. Under mixture shift, pooled estimates of $(c,\gamma)$ become biased when calibration and deployment mixtures differ; conditioning on a difficulty proxy restores stability without additional model calls. Under presentation contamination, selection protocols alter the interface through stable presentation artifacts when candidate content is fixed. Under state insufficiency, the correctness bit may not carry enough history for multi-step pipelines to compose predictably; a Markov factorization test identifies when composition is valid and where additional state is needed. When a protocol step passes these diagnostics, it becomes an auditable module: gated by estimated gain, conditioned on a difficulty proxy to correct mixture bias, and composed into multi-step pipelines with predictable accuracy. We demonstrate these ideas on synthetic mathematical tasks and on GSM8K, where the calibrated interface correctly predicts when protocol steps should be activated or suppressed.
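
As a rough illustration of the two-rate view described in the abstract, the sketch below estimates the correction rate c and the corruption rate gamma from paired correctness bits, then applies the law of total probability, a1 = a0(1 - gamma) + (1 - a0)c, to predict post-step accuracy. The function names and the toy data are assumptions made for this example; they are not taken from the paper, and the paper's actual estimators and gating rule may differ.

```python
from typing import Sequence, Tuple


def two_rates(e0: Sequence[int], e1: Sequence[int]) -> Tuple[float, float]:
    """Estimate (c, gamma) from paired correctness bits E0 (before) and E1 (after)."""
    wrong_before = [after for before, after in zip(e0, e1) if before == 0]
    right_before = [after for before, after in zip(e0, e1) if before == 1]
    if not wrong_before or not right_before:
        raise ValueError("need at least one instance with E0=0 and one with E0=1")
    c = sum(wrong_before) / len(wrong_before)          # Pr(E1=1 | E0=0)
    gamma = 1 - sum(right_before) / len(right_before)  # Pr(E1=0 | E0=1)
    return c, gamma


def predicted_accuracy(a0: float, c: float, gamma: float) -> float:
    """Post-step accuracy implied by baseline accuracy a0 and the two rates."""
    return a0 * (1 - gamma) + (1 - a0) * c


# Toy calibration data, invented for illustration only.
e0 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
e1 = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

c, gamma = two_rates(e0, e1)
a0 = sum(e0) / len(e0)
a1 = predicted_accuracy(a0, c, gamma)

# The step helps only while (1 - a0) * c > a0 * gamma, i.e. a0 < c / (c + gamma).
break_even = c / (c + gamma)
a1_easy_mixture = predicted_accuracy(0.8, c, gamma)  # hypothetical easier deployment mix
```

On this toy data the predicted accuracy matches the observed mean of `e1`, and the same rates predict a loss when the baseline accuracy rises above the break-even point, which is one simple way a gain-based gate could decide to suppress the step.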

2604.17745 2026-04-28 cs.CL

HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution

Hanhua Hong, Yizhi LI, Jiaoyan Chen, Sophia Ananiadou, Xiaoli Li, Jung-jae Kim, Chenghua Lin

Comments 29 pages

2604.17527 2026-04-28 cs.RO

Safer Trajectory Planning with CBF-guided Diffusion Model for Unmanned Aerial Vehicles

Peiwen Yang, Shiyu Bai, Weisong Wen, Yixin Gao, Jiahao Hu

Comments Some equations and sentences need to be checked again and will be uploaded again

2604.16909 2026-04-28 cs.CL cs.AI

PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations

Yuhe Wu, Guangyu Wang, Yuran Chen, Jiatong Zhang, Yutong Zhang, Yujie Chen, Jiaming Shang, Guang Zhang, Zhuang Liu

Comments Accepted by ACL main conference 2026

2604.16817 2026-04-28 cs.LG cs.AI

Self-Reinforcing Controllable Synthesis of Rare Relational Data via Bayesian Calibration

Chongsheng Zhang, Hao Wang, Zelong Yu, Esteban Garces Arias, Julian Rodemann, Zhanshuo Zhang, Qilong Li, Gaojuan Fan, Krikamol Muandet, Christian Heumann

Comments Accepted at: Findings of the Association for Computational Linguistics: ACL 2026 (ACL 2026 Findings), San Diego, California, USA, July 2-7, 2026

2604.16514 2026-04-28 cs.CV cs.LG

BARD: Bridging AutoRegressive and Diffusion Vision-Language Models Via Highly Efficient Progressive Block Merging and Stage-Wise Distillation

Baoyou Chen, Hanchen Xia, Peng Tu, Haojun Shi, Liwei Zhang, Weihao Yuan, Siyu Zhu

2604.16452 2026-04-28 cs.RO cs.PL cs.SY eess.SY

Compiling OpenSCENARIO 2.1 for Scenario-Based Testing in CARLA

Thoshitha Gamage, Lasanthi Gamage

2604.14989 2026-04-28 cs.AI cs.AR

Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement

Wenji Fang, Yao Lu, Shang Liu, Jing Wang, Ziyan Guo, Junxian He, Fengbin Tu, Zhiyao Xie

2604.14910 2026-04-28 cs.CV

Reward-Aware Trajectory Shaping for Few-step Visual Generation

Rui Li, Bingyu Li, Yuanzhi Liang, Haibin Huang, Chi Zhang, XueLong Li

2604.14888 2026-04-28 cs.CL cs.AI cs.CV cs.LG

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

Danae Sánchez Villegas, Samuel Lewis-Lim, Nikolaos Aletras, Desmond Elliott

2604.12373 2026-04-28 cs.CL

Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness

Tomer Ashuach, Shai Gretz, Yoav Katz, Yonatan Belinkov, Liat Ein-Dor

Comments Accepted to ACL 2026 (Main Conference). 8 pages, 16 figures, 2 tables

2604.10708 2026-04-28 cs.SD cs.AI cs.CV cs.MM

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

Zeyue Tian, Binxin Yang, Zhaoyang Liu, Jiexuan Zhang, Ruibin Yuan, Hubery Yin, Qifeng Chen, Chen Li, Jing Lyu, Wei Xue, Yike Guo

2604.10516 2026-04-28 cs.CL

Structure-Grounded Knowledge Retrieval via Code Dependencies for Multi-Step Data Reasoning

Xinyi Huang

2604.07042 2026-04-28 cs.AI

Planning Task Shielding: Detecting and Repairing Flaws in Planning Tasks through Turning them Unsolvable

Alberto Pozanco, Marianela Morales, Pietro Totis, Daniel Borrajo

2604.05631 2026-04-28 cs.AI cs.ET cs.HC

Beyond Behavior: Why AI Evaluation Needs a Cognitive Revolution

Amir Konigsberg

2604.05621 2026-04-28 cs.CV

FunRec: Reconstructing Functional 3D Scenes from Egocentric Interaction Videos

Alexandros Delitzas, Chenyangguang Zhang, Alexey Gavryushin, Tommaso Di Mario, Boyang Sun, Rishabh Dabral, Leonidas Guibas, Christian Theobalt, Marc Pollefeys, Francis Engelmann, Daniel Barath

Comments CVPR 2026. Project page: https://functionalscenes.github.io

2604.03768 2026-04-28 cs.AI cs.LG

RL-Driven Sustainable Land-Use Allocation for the Lake Malawi Basin

Ying Yao

Comments 9 pages, 11 figures; added baseline comparison under "Result" section; revised limitation and discussion

2604.02923 2026-04-28 cs.CL cs.AI

Council Mode: A Heterogeneous Multi-Agent Consensus Framework for Reducing LLM Hallucination and Bias

Shuai Wu, Xue Li, Yanna Feng, Yufang Li, Zhijun Wang, Ran Wang

Comments 24 pages, 8 figures, 16 tables, 1 algorithm. Open-source implementation: https://github.com/Noah-Wu66/Vectaix-Research. Archived software DOI: 10.5281/zenodo.19767626

2604.01897 2026-04-28 cs.SD eess.AS

FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection

Chengyou Wang, Hongfei Xue, Chunjiang He, Jingbin Hu, Shuiyuan Wang, Bo Wu, Yuyu Ji, Jimeng Zheng, Ruofei Chen, Zhou Zhu, Lei Xie

Comments 5 pages, 2 figures

2604.01644 2026-04-28 cs.CV cs.MM

TOL: Textual Localization with OpenStreetMap

Youqi Liao, Shuhao Kang, Jingyu Xu, Olaf Wysocki, Yan Xia, Jianping Li, Zhen Dong, Bisheng Yang, Xieyuanli Chen

Comments Tech repo

Abstract

Natural language provides an intuitive way to express spatial intent in geospatial applications. While existing localization methods often rely on dense point cloud maps or high-resolution imagery, OpenStreetMap (OSM) offers a compact and freely available map representation that encodes rich semantic and structural information, making it well-suited for large-scale localization. However, text-to-OSM (T2O) localization remains largely unexplored. In this paper, we formulate the T2O localization task, which aims to estimate accurate 2D positions in urban environments from textual scene descriptions without relying on geometric observations or GNSS-based initial location. To support the proposed task, we introduce TOL, a large-scale benchmark spanning multiple continents and diverse urban environments. TOL contains approximately 121K textual queries paired with OSM map tiles and covers about 316 km of road trajectories across Boston, Karlsruhe, and Singapore. We further propose TOLoc, a coarse-to-fine localization framework that explicitly models the semantics of surrounding objects and their directional information. In the coarse stage, direction-aware features are extracted from both textual descriptions and OSM tiles to construct global descriptors, which are used to retrieve candidate locations for the query. In the fine stage, the query text and top-1 retrieved tile are jointly processed, where a dedicated alignment module fuses the textual descriptor and local map features to regress the 2-DoF pose. Experimental results demonstrate that TOLoc achieves strong localization performance, outperforming the best existing method by 6.53%, 9.93%, and 8.32% at 5 m, 10 m, and 25 m thresholds, respectively, and shows strong generalization to unseen environments. Dataset, code and models will be publicly available at: https://github.com/WHU-USI3DV/TOL.
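
The abstract reports results at 5 m, 10 m, and 25 m thresholds without spelling out the metric. A common reading, assumed here, is the fraction of queries whose predicted 2D position falls within a given distance of the ground truth. The sketch below computes that quantity; the function name, the use of `<=` at the boundary, and the toy coordinates (meters in a local planar frame) are all assumptions for illustration, not details from the paper.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in meters, local planar frame (assumed)


def success_rate_at(
    pred: Sequence[Point],
    gt: Sequence[Point],
    thresholds: Sequence[float] = (5.0, 10.0, 25.0),
) -> Dict[float, float]:
    """Fraction of queries whose 2D position error is within each threshold."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return {t: sum(e <= t for e in errors) / len(errors) for t in thresholds}


# Toy predictions with errors of 0 m, 5 m, 8 m, and 30 m (invented values).
gt = [(0.0, 0.0)] * 4
pred = [(0.0, 0.0), (3.0, 4.0), (0.0, 8.0), (30.0, 0.0)]
rates = success_rate_at(pred, gt)
```

By construction the rate is non-decreasing in the threshold, so a method's numbers at 5 m, 10 m, and 25 m can be read as points on a cumulative error curve.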

2603.27507 2026-04-28 cs.CV

Chat-Scene++: Exploiting Context-Rich Object Identification for 3D LLM

Haifeng Huang, Yilun Chen, Zehan Wang, Jiangmiao Pang, Zhou Zhao

2603.26729 2026-04-28 cs.CV cs.AI

Multi-view Graph Convolutional Network with Fully Leveraging Consistency via Granular-ball-based Topology Construction, Feature Enhancement and Interactive Fusion

Chengjie Cui, Taihua Xu, Shuyin Xia, Qinghua Zhang, Yun Cui, Shiping Wang

Abstract

The effective utilization of consistency is crucial for multi-view learning. Graph convolutional networks (GCNs) leverage node connections to propagate information across the graph, facilitating the exploitation of consistency in multi-view data. However, most existing GCN-based multi-view methods suffer from several limitations. First, current approaches predominantly rely on k-nearest neighbors (KNN) for topology construction, where the manual selection of the k value significantly constrains the effective exploitation of inter-node consistency. Second, the inter-feature consistency within individual views is often overlooked, which adversely affects the quality of the final embedding representations. Third, these methods fail to fully utilize inter-view consistency because the fusion of embedded representations from multiple views is often implemented after the intra-view graph convolutional operation. Collectively, these issues limit the model's capacity to fully capture inter-node, inter-feature and inter-view consistency. To address these issues, this paper proposes MGCN-FLC, a multi-view graph convolutional network that fully leverages consistency via granular-ball (GB)-based topology construction, feature enhancement and interactive fusion. MGCN-FLC exploits the three types of consistency through three modules: (1) a topology construction module based on the granular ball algorithm, which clusters nodes into granular balls with high internal similarity to capture inter-node consistency; (2) a feature enhancement module that improves feature representations by capturing inter-feature consistency; and (3) an interactive fusion module that enables each view to deeply interact with all other views, thereby obtaining more comprehensive inter-view consistency. Experimental results on nine datasets show that the proposed MGCN-FLC outperforms state-of-the-art semi-supervised node classification methods.
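
To make the first limitation concrete, the sketch below builds a plain KNN graph (the baseline the abstract criticizes, not the granular-ball construction the paper proposes) and shows how the resulting topology depends on the hand-picked k. The function name and the toy one-dimensional data are assumptions for this example.

```python
import math
from typing import Sequence, Set, Tuple


def knn_edges(points: Sequence[Sequence[float]], k: int) -> Set[Tuple[int, int]]:
    """Undirected KNN graph: connect each node to its k nearest neighbors."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        # Sort the other nodes by Euclidean distance to node i.
        by_distance = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in by_distance[:k]:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return edges


def has_cross_cluster_edge(edges: Set[Tuple[int, int]], split: int) -> bool:
    """True if any edge links a node with index < split to one with index >= split."""
    return any(i < split <= j for i, j in edges)


# Two well-separated toy clusters of three nodes each (invented data).
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
clean = has_cross_cluster_edge(knn_edges(points, k=2), split=3)
noisy = has_cross_cluster_edge(knn_edges(points, k=3), split=3)
```

With k=2 every neighbor lies inside the node's own cluster, while k=3 forces each node to link across the gap, so a one-step change in k alters which nodes exchange information during graph convolution.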

2603.25562 2026-04-28 cs.LG cs.AI cs.CL

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes

Yuqian Fu, Haohuan Huang, Kaiwen Jiang, Jiacai Liu, Zhuo Jiang, Yuanheng Zhu, Dongbin Zhao

2603.24231 2026-04-28 cs.CL cs.SI

When Annotators Agree but Labels Disagree: The Projection Problem in Stance Detection

Bowen Zhang

2603.19040 2026-04-28 cs.LG

When Differential Privacy Meets Wireless Federated Learning: An Improved Analysis for Privacy and Convergence

Chen Yaoling, Liang Hao, Tu Xiaotong

Comments 5 pages, 1 figure

2603.17834 2026-04-28 cs.RO cs.AI

Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control

Zunzhe Zhang, Runhan Huang, Yicheng Liu, Shaoting Zhu, Linzhan Mou, Hang Zhao

Comments 18 pages, 6 figures

2603.13502 2026-04-28 cs.RO cs.SY eess.SY

Safety-aware Goal-oriented Semantic Sensing, Communication, and Control for Robotics

Wenchao Wu, Shutong Chen, Wenjie Liu, Zhibo Pang, Yansha Deng, Robert Schober

Comments 7 pages. This paper has been submitted to the IEEE Wireless Communications Magazine

2603.11831 2026-04-28 cs.CV

Towards High-Fidelity CAD Generation via LLM-Driven Program Generation and Text-Based B-Rep Primitive Grounding

Jiahao Li, Qingwang Zhang, Qiuyu Chen, Guozhan Qiu, Yunzhong Lou, Xiangdong Zhou

Comments preprint

2603.09886 2026-04-28 cs.RO

Robust Cooperative Localization in Featureless Environments: A Comparative Study of DCL, StCL, CCL, CI, and Standard-CL

Nivand Khosravi, Rodrigo Ventura, Meysam Basiri

Comments Accepted and presented at the 2026 12th International Conference on Automation, Robotics and Applications (ICARA); to appear in IEEE conference proceedings

2603.08592 2026-04-28 cs.CV

Boosting MLLM Spatial Reasoning with Geometrically Referenced 3D Scene Representations

Jiangye Yuan, Gowri Kumar, Baoyuan Wang