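arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.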

2604.04207 2026-04-07 cs.AI

Don't Blink: Evidence Collapse during Multimodal Reasoning

Suresh Raghu, Satwik Pandey

Comments 8 pages, 6 figures, 1 table, plus appendix. Submitted to UAI 2026

Abstract

Reasoning VLMs can become more accurate while progressively losing visual grounding as they think. This creates task-conditional danger zones where low-entropy predictions are confident but ungrounded, a failure mode text-only monitoring cannot detect. Evaluating three reasoning VLMs on MathVista, HallusionBench, and MMMU_Pro, we find a pervasive evidence-collapse phenomenon: attention to annotated evidence regions drops substantially, often losing over half of evidence mass, as reasoning unfolds. Full-response entropy is the most reliable text-only uncertainty signal under cross-dataset transfer, yet adding vision features with a single global linear rule is brittle and often degrades transfer. An entropy-vision interaction model reveals a task-conditional regime: low-entropy, visually disengaged predictions are hazardous on sustained visual-reference tasks but benign on symbolic tasks. Using this structure, a targeted vision veto reduces selective risk by up to 1.9 percentage points at 90% coverage, while avoiding degradations where disengagement is expected. The results support task-aware multimodal monitoring for safe deployment under distribution shift.
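The selective-risk figure in the abstract above can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' code: the function names, the `evidence_mass` feature, both thresholds, and the toy data are hypothetical. Selective risk at coverage c is taken here to be the error rate on the fraction c of predictions with the lowest uncertainty score, and the veto simply demotes low-entropy predictions with little attention on evidence regions, but only for tasks flagged as needing sustained visual reference.

```python
import numpy as np

def selective_risk(scores, correct, coverage=0.9):
    """Error rate on the `coverage` fraction of items with the lowest uncertainty score."""
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    n_keep = max(1, int(round(coverage * len(scores))))
    keep = np.argsort(scores, kind="stable")[:n_keep]
    return 1.0 - correct[keep].mean()

def vision_veto(entropy, evidence_mass, visual_task, ent_thresh=0.5, mass_thresh=0.2):
    """Hypothetical veto: move confident-but-ungrounded predictions to the back of the ranking."""
    entropy = np.asarray(entropy, dtype=float)
    evidence_mass = np.asarray(evidence_mass, dtype=float)
    visual_task = np.asarray(visual_task, dtype=bool)
    # "Danger zone": low entropy, low evidence attention, on a visual-reference task.
    danger = visual_task & (entropy < ent_thresh) & (evidence_mass < mass_thresh)
    return np.where(danger, entropy.max() + 1.0, entropy)

# Toy data (made up): item 1 is confident, visually disengaged, and wrong.
entropy = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
evidence_mass = [0.5, 0.05, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
correct = [True, False, True, True, True, True, True, True, True, True]
visual_task = [True] * 10

risk_entropy_only = selective_risk(entropy, correct, coverage=0.9)
risk_with_veto = selective_risk(
    vision_veto(entropy, evidence_mass, visual_task), correct, coverage=0.9
)
# On a symbolic task the veto leaves the ranking untouched.
unchanged = vision_veto(entropy, evidence_mass, [False] * 10)
```

In this toy case the entropy-only ranking keeps the ungrounded error inside the 90% covered set, while the veto pushes it into the abstained 10%. Gating the veto on task type mirrors the abstract's point that disengagement is benign on symbolic tasks.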

2604.04204 2026-04-07 cs.CL cs.AI cs.CY cs.ET cs.LG

Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models

Mir Tafseer Nayeem, Davood Rafiei

Comments Preprint

2604.04198 2026-04-07 cs.CV cs.RO

DriveVA: Video Action Models are Zero-Shot Drivers

Mengmeng Liu, Diankun Zhang, Jiuming Liu, Jianfeng Cui, Hongwei Xie, Guang Chen, Hangjun Ye, Michael Ying Yang, Francesco Nex, Hao Cheng

2604.04196 2026-04-07 cs.RO cs.AI

Robots Need Some Education: On the complexity of learning in evolutionary robotics

Fuda van Diggelen

Comments PhD thesis

2604.04195 2026-04-07 cs.LG cs.CY

Stable and Privacy-Preserving Synthetic Educational Data with Empirical Marginals: A Copula-Based Approach

Gabriel Diaz Ramos, Lorenzo Luzi, Debshila Basu Mallick, Richard Baraniuk

Comments 10 pages, 6 figures. Accepted at the Educational Data Mining (EDM) 2026 conference

2604.04190 2026-04-07 cs.AI

Schema-Aware Planning and Hybrid Knowledge Toolset for Reliable Knowledge Graph Triple Verification

Xinyan Ma, Xianhao Ou, Weihao Zhang, Shixin Jiang, Runxuan Liu, Dandan Tu, Lei Chen, Ming Liu, Bing Qin

2604.04184 2026-04-07 cs.CV

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Xudong Lu, Yang Bo, Jinpeng Chen, Shuhan Li, Xintong Guo, Huankang Guan, Fang Liu, Dunyuan Xu, Peiwen Sun, Heyang Sun, Rui Liu, Hongsheng Li

2604.04183 2026-04-07 cs.CV

Scale-Aware Vision-Language Adaptation for Extreme Far-Distance Video Person Re-identification

Ashwat Rajbhandari, Bharatesh Chakravarthi

2604.04182 2026-04-07 cs.AI

Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty

Haomiaomiao Wang, Tomás E Ward, Lili Zhang

Comments 14 pages, 2 figures, accepted by IPMU 2026, SS04: Explainable AI and Decision-Making Under Uncertainty: Bridging Interpretability and Robustness

2604.04175 2026-04-07 cs.LG

Uncertainty-Aware Foundation Models for Clinical Data

Qian Zhou, Yuanyun Zhang, Shi Li

2604.04174 2026-04-07 cs.AI

CoALFake: Collaborative Active Learning with Human-LLM Co-Annotation for Cross-Domain Fake News Detection

Esma Aïmeur, Gilles Brassard, Dorsaf Sallami

2604.04172 2026-04-07 cs.CV cs.AI

GENFIG1: Visual Summaries of Scholarly Work as a Challenge for Vision-Language Models

Yaohan Guan, Pristina Wang, Najim Dehak, Alan Yuille, Jieneng Chen, Daniel Khashabi

2604.04171 2026-04-07 cs.AI

A Model of Understanding in Deep Learning Systems

David Peter Wallis Freeborn

2604.04170 2026-04-07 cs.CV cs.AI

Incomplete Multi-View Multi-Label Classification via Shared Codebook and Fused-Teacher Self-Distillation

Xu Yan, Jun Yin, Shiliang Sun, Minghua Wan

2604.04166 2026-04-07 cs.RO

Primitive-based Truncated Diffusion for Efficient Trajectory Generation of Differential Drive Mobile Manipulators

Long Xu, Choilam Wong, Yuhang Zhong, Junxiao Lin, Jialiang Hou, Fei Gao

Comments 9 pages, 6 figures

2604.04158 2026-04-07 cs.CV

Hierarchical Co-Embedding of Font Shapes and Impression Tags

Yugo Kubota, Kaito Shiku, Seiichi Uchida

2604.04157 2026-04-07 cs.AI

Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents

Hsieh-Ting Lin, Tsung-Yu Hou

Comments 7 pages (PNAS format), 4 figures, 2 tables, 49 references. Submitted to PNAS

Abstract

Theory of Mind (ToM) -- the ability to model others' mental states -- is fundamental to human social cognition. Whether large language models (LLMs) can develop ToM has been tested exclusively through static vignettes, leaving open whether ToM-like reasoning can emerge through dynamic interaction. Here we report that autonomous LLM agents playing extended sessions of Texas Hold'em poker progressively develop sophisticated opponent models, but only when equipped with persistent memory. In a 2x2 factorial design crossing memory (present/absent) with domain knowledge (present/absent), each with five replications (N = 20 experiments, ~6,000 agent-hand observations), we find that memory is both necessary and sufficient for ToM-like behavior emergence (Cliff's delta = 1.0, p = 0.008). Agents with memory reach ToM Level 3-5 (predictive to recursive modeling), while agents without memory remain at Level 0 across all replications. Strategic deception grounded in opponent models occurs exclusively in memory-equipped conditions (Fisher's exact p < 0.001). Domain expertise does not gate ToM-like behavior emergence but enhances its application: agents without poker knowledge develop equivalent ToM levels but less precise deception (p = 0.004). Agents with ToM deviate from game-theoretically optimal play (67% vs. 79% TAG adherence, delta = -1.0, p = 0.008) to exploit specific opponents, mirroring expert human play. All mental models are expressed in natural language and directly readable, providing a transparent window into AI social cognition. Cross-model validation with GPT-4o yields weighted Cohen's kappa = 0.81 (almost perfect agreement). These findings demonstrate that functional ToM-like behavior can emerge from interaction dynamics alone, without explicit training or prompting, with implications for understanding artificial social intelligence and biological social cognition.
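For readers unfamiliar with the effect size quoted in the abstract above, the following sketch computes Cliff's delta and an exact two-sided permutation p-value for two small groups. The ToM levels in it are invented for illustration and are not the paper's data, and the abstract does not say which test produced p = 0.008; the permutation test is an assumption. It is still worth noting that two completely separated groups of five give an exact two-sided p of 2/252, which rounds to 0.008.

```python
from itertools import combinations
from math import comb

def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    greater = sum(x > y for x in xs for y in ys)
    less = sum(x < y for x in xs for y in ys)
    return (greater - less) / (len(xs) * len(ys))

def exact_two_sided_p(xs, ys):
    """Exact permutation p-value using |Cliff's delta| as the test statistic."""
    pooled = list(xs) + list(ys)
    n = len(xs)
    observed = abs(cliffs_delta(xs, ys))
    hits = 0
    # Enumerate every way of assigning n of the pooled values to the first group.
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(cliffs_delta(a, b)) >= observed - 1e-12:
            hits += 1
    return hits / comb(len(pooled), n)

# Hypothetical ToM levels for five replications per group (not from the paper).
with_memory = [3, 4, 4, 5, 5]
without_memory = [0, 0, 0, 0, 0]

delta = cliffs_delta(with_memory, without_memory)
p_value = exact_two_sided_p(with_memory, without_memory)
```

A delta of 1.0 means every memory-equipped run scored above every memoryless run. With five runs per group, no rank-based test can return a smaller two-sided p than 2/252, so the p-value reflects the sample size as much as the effect.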

2604.04155 2026-04-07 cs.LG cs.IT math.IT q-bio.QM stat.ML

The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models

Prashant C. Raju

2604.04153 2026-04-07 cs.CV cs.AI cs.LG

Uncertainty-Aware Test-Time Adaptation for Cross-Region Spatio-Temporal Fusion of Land Surface Temperature

Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai

Comments Accepted to IGARSS 2026

2604.04145 2026-04-07 cs.AI

Solar-VLM: Multimodal Vision-Language Models for Augmented Solar Power Forecasting

Hang Fan, Haoran Pei, Runze Liang, Weican Liu, Long Cheng, Wei Wei

2604.04142 2026-04-07 cs.CV

OP-GRPO: Efficient Off-Policy GRPO for Flow-Matching Models

Liyu Zhang, Kehan Li, Tingrui Han, Tao Zhao, Yuxuan Sheng, Shibo He, Chao Li

2604.04138 2026-04-07 cs.RO cs.AI

Learning Dexterous Grasping from Sparse Taxonomy Guidance

Juhan Park, Taerim Yoon, Seungmin Kim, Joonggil Kim, Wontae Ye, Jeongeun Park, Yoonbyung Chai, Geonwoo Cho, Geunwoo Cho, Dohyeong Kim, Kyungjae Lee, Yongjae Kim, Sungjoon Choi

2604.04136 2026-04-07 cs.CV

Rethinking Exposure Correction for Spatially Non-uniform Degradation

Ao Li, Jiawei Sun, Le Dong, Zhenyu Wang, Weisheng Dong

2604.04133 2026-04-07 cs.CV cs.AI

Learning Robust Visual Features in Computed Tomography Enables Efficient Transfer Learning for Clinical Tasks

Rubén Moreno-Aguado, Alba Magallón, Victor Moreno, Yingying Fang, Guang Yang

2604.04131 2026-04-07 cs.AI

Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents

Paulo Akira F. Enabe

2604.04129 2026-04-07 cs.SD cs.LG

Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift

Sheng-You Chien, Bo-Yi Mao, Yi-Ning Chang, Po-Chih Kuo

Comments 17 pages, 6 figures, LibriBrain Competition @NeurIPS2025

2604.04127 2026-04-07 cs.CV

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

Fenghao Song, Shaojing Yang, Xi Zhou

Comments 10 pages, 4 figures, published in JSTARS (IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing)

2604.04120 2026-04-07 cs.CL

Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

Lingjie Zeng, Xiaofan Chen, Yanbo Wang, Xiuying Chen

2604.04117 2026-04-07 cs.RO cs.CV cs.LG

Efficient Onboard Spacecraft Pose Estimation with Event Cameras and Neuromorphic Hardware

Arunkumar Rathinam, Jules Lecomte, Jost Reelsen, Gregor Lenz, Axel von Arnim, Djamila Aouada

Comments AI4SPACE workshop at CVPR 2026

2604.04108 2026-04-07 cs.CV

Hypothesis Graph Refinement: Hypothesis-Driven Exploration with Cascade Error Correction for Embodied Navigation

Peixin Chen, Guoxi Zhang, Jianwei Ma, Qing Li

Abstract

Embodied agents must explore partially observed environments while maintaining reliable long-horizon memory. Existing graph-based navigation systems improve scalability, but they often treat unexplored regions as semantically unknown, leading to inefficient frontier search. Although vision-language models (VLMs) can predict frontier semantics, erroneous predictions may be embedded into memory and propagate through downstream inferences, causing structural error accumulation that confidence attenuation alone cannot resolve. These observations call for a framework that can leverage semantic predictions for directed exploration while systematically retracting errors once new evidence contradicts them. We propose Hypothesis Graph Refinement (HGR), a framework that represents frontier predictions as revisable hypothesis nodes in a dependency-aware graph memory. HGR introduces (1) a semantic hypothesis module, which estimates context-conditioned semantic distributions over frontiers and ranks exploration targets by goal relevance, travel cost, and uncertainty, and (2) verification-driven cascade correction, which compares on-site observations against predicted semantics and, upon mismatch, retracts the refuted node together with all its downstream dependents. Unlike additive map-building, this allows the graph to contract by pruning erroneous subgraphs, keeping memory reliable throughout long episodes. We evaluate HGR on multimodal lifelong navigation (GOAT-Bench) and embodied question answering (A-EQA, EM-EQA). HGR achieves 72.41% success rate and 56.22% SPL on GOAT-Bench, and shows consistent improvements on both QA benchmarks. Diagnostic analysis reveals that cascade correction eliminates approximately 20% of structurally redundant hypothesis nodes and reduces revisits to erroneous regions by 4.5x, with specular and transparent surfaces accounting for 67% of corrected prediction errors.
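The cascade correction described in the abstract above, retracting a refuted node together with all its downstream dependents, can be sketched as a traversal of a dependency graph. This is a minimal toy under assumptions, not the HGR implementation: the `HypothesisGraph` class, its method names, and the example labels are hypothetical, and the sketch retracts a node as soon as any one of its parents is refuted, which the abstract does not specify.

```python
from collections import defaultdict, deque

class HypothesisGraph:
    """Toy dependency-aware memory: edges run from a hypothesis to what was inferred from it."""

    def __init__(self):
        self.semantics = {}                  # node id -> predicted label
        self.dependents = defaultdict(set)   # node id -> nodes derived from it

    def add(self, node, label, derived_from=()):
        self.semantics[node] = label
        for parent in derived_from:
            self.dependents[parent].add(node)

    def verify(self, node, observed_label):
        """Compare an observation with the prediction; on mismatch, retract the subgraph."""
        if self.semantics.get(node) == observed_label:
            return set()
        return self._retract(node)

    def _retract(self, root):
        removed, queue = set(), deque([root])
        # Breadth-first walk over everything that depends on the refuted node.
        while queue:
            current = queue.popleft()
            if current in removed or current not in self.semantics:
                continue
            removed.add(current)
            del self.semantics[current]
            queue.extend(self.dependents.pop(current, ()))
        # Drop dangling references to removed nodes from surviving parents.
        for children in self.dependents.values():
            children -= removed
        return removed

# Hypothetical example: a wrong "kitchen" guess invalidates what was inferred from it.
graph = HypothesisGraph()
graph.add("f1", "kitchen")
graph.add("f2", "fridge", derived_from=["f1"])
graph.add("f3", "milk", derived_from=["f2"])
graph.add("f4", "bedroom")

removed = graph.verify("f1", "bathroom")
confirmed = graph.verify("f4", "bedroom")
```

Because retraction deletes nodes rather than lowering their confidence, the graph can shrink, which is the contrast with additive map-building that the abstract draws.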
