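arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.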

2603.01126 2026-03-03 cs.RO

Pro-HOI: Perceptive Root-guided Humanoid-Object Interaction

Yuhang Lin, Jiyuan Shi, Dewei Wang, Jipeng Kong, Yong Liu, Chenjia Bai, Xuelong Li

Abstract

Executing reliable Humanoid-Object Interaction (HOI) tasks for humanoid robots is hindered by the lack of generalized control interfaces and robust closed-loop perception mechanisms. In this work, we introduce Perceptive Root-guided Humanoid-Object Interaction, Pro-HOI, a generalizable framework for robust humanoid loco-manipulation. First, we collect box-carrying motions that are suitable for real-world deployment and optimize penetration artifacts through a Signed Distance Field loss. Second, we propose a novel training framework that conditions the policy on a desired root-trajectory while utilizing reference motion exclusively as a reward. This design not only eliminates the need for intricate reward tuning but also establishes root trajectory as a universal interface for high-level planners, enabling simultaneous navigation and loco-manipulation. Furthermore, to ensure operational reliability, we incorporate a persistent object estimation module. By fusing real-time detection with Digital Twin, this module allows the robot to autonomously detect slippage and trigger re-grasping maneuvers. Empirical validation on a Unitree G1 robot demonstrates that Pro-HOI significantly outperforms baselines in generalization and robustness, achieving reliable long-horizon execution in complex real-world scenarios.

2603.01125 2026-03-03 cs.CV cs.AI

Predictive Reasoning with Augmented Anomaly Contrastive Learning for Compositional Visual Relations

Chengtai Li, Yuting He, Jianfeng Ren, Ruibin Bai, Yitian Zhao, Heng Yu, Xudong Jiang

Comments Accepted by IEEE Transactions on Multimedia

2603.01124 2026-03-03 cs.CV cs.AI

ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models

Xiwei Liu, Yulong Li, Xinlin Zhuang, Xuhui Li, Jianxu Chen, Haolin Yang, Imran Razzak, Yutong Xie

2603.01121 2026-03-03 cs.AI

HVR-Met: A Hypothesis-Verification-Replaning Agentic System for Extreme Weather Diagnosis

Shuo Tang, Jiadong Zhang, Jian Xu, Gengxian Zhou, Qizhao Jin, Qinxuan Wang, Yi Hu, Ning Hu, Hongchang Ren, Lingli He, Jiaolan Fu, Jingtao Ding, Shiming Xiang, Chenglin Liu

2603.01115 2026-03-03 cs.CV

GuiDINO: Rethinking Vision Foundation Model in Medical Image Segmentation

Zhuonan Liang, Wei Guo, Jie Gan, Yaxuan Song, Runnan Chen, Hang Chang, Weidong Cai

Comments 12 pages, 2 figures, 3 tables

2603.01113 2026-03-03 cs.RO

From Dialogue to Execution: Mixture-of-Agents Assisted Interactive Planning for Behavior Tree-Based Long-Horizon Robot Execution

Kanata Suzuki, Kazuki Hori, Haruka Miyoshi, Shuhei Kurita, Tetsuya Ogata

2603.01110 2026-03-03 cs.RO

Compact Task-Aligned Imitation Learning for Laboratory Automation

Kanata Suzuki, Hanon Nakamurama, Kana Miyamoto, Tetsuya Ogata

2603.01108 2026-03-03 cs.CV

GroundedSurg: A Multi-Procedure Benchmark for Language-Conditioned Surgical Tool Segmentation

Tajamul Ashraf, Abrar Ul Riyaz, Wasif Tak, Tavaheed Tariq, Sonia Yadav, Moloud Abdar, Janibul Bashir

Comments https://github.com/gaash-lab/GroundedSurg

2603.01106 2026-03-03 cs.AI

DIVA-GRPO: Enhancing Multimodal Reasoning through Difficulty-Adaptive Variant Advantage

Haowen Gao, Zhenyu Zhang, Liang Pang, Fangda Guo, Hongjian Dou, Guannan Lv, Shaoguo Liu, Tingting Gao, Huawei Shen, Xueqi Cheng

Comments Accepted to ICLR 2026. Code and models are available at https://github.com/Siaaaaaa1/DIVA-GRPO

2603.01103 2026-03-03 cs.CV

Data-Efficient Brushstroke Generation with Diffusion Models for Oil Painting

Dantong Qin, Alessandro Bozzon, Xian Yang, Xun Zhang, Yike Guo, Pan Wang

2603.01101 2026-03-03 cs.SD cs.AI

SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation

Hongrui Wang, Fan Zhang, Zhiyuan Yu, Ziya Zhou, Xi Chen, Can Yang, Yang Wang

Comments Accepted by ICLR 2026

2603.01096 2026-03-03 cs.CV cs.AI cs.CL cs.LG

Unified Vision-Language Modeling via Concept Space Alignment

Yifu Qiu, Paul-Ambroise Duquenne, Holger Schwenk

Comments ICLR 2026

2603.01089 2026-03-03 cs.CL cs.LG

CARD: Towards Conditional Design of Multi-agent Topological Structures

Tongtong Wu, Yanming Li, Ziye Tang, Chen Jiang, Linhao Luo, Guilin Qi, Shirui Pan, Gholamreza Haffari

Comments Accepted to ICLR 2026

2603.01083 2026-03-03 cs.CV

Can Vision Language Models Assess Graphic Design Aesthetics? A Benchmark, Evaluation, and Dataset Perspective

Arctanx An, Shizhao Sun, Danqing Huang, Mingxi Cheng, Yan Gao, Ji Li, Yu Qiao, Jiang Bian

Comments ICLR 2026

2603.01082 2026-03-03 cs.CV cs.IR

Beyond Global Similarity: Towards Fine-Grained, Multi-Condition Multimodal Retrieval

Xuan Lu, Kangle Li, Haohang Huang, Rui Meng, Wenjun Zeng, Xiaoyu Shen

Comments Accepted by CVPR 2026

2603.01074 2026-03-03 cs.CV

Adaptive Augmentation-Aware Latent Learning for Robust LiDAR Semantic Segmentation

Wangkai Li, Zhaoyang Li, Yuwen Pan, Rui Sun, Yujia Chen, Tianzhu Zhang

Comments Accepted by International Conference on Learning Representations (ICLR 2026)

2603.01068 2026-03-03 cs.CV cs.LG

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

Zebin You, Xiaolu Zhang, Jun Zhou, Chongxuan Li, Ji-Rong Wen

2603.01064 2026-03-03 cs.LG

A level-wise training scheme for learning neural multigrid smoothers with application to integral equations

Lingfeng Li, Yin King Chu, Raymond Chan, Justin Wan

2603.01063 2026-03-03 cs.CV

Unleashing VLA Potentials in Autonomous Driving via Explicit Learning from Failures

Yuechen Luo, Qimao Chen, Fang Li, Shaoqing Xu, Jaxin Liu, Ziying Song, Zhi-xin Yang, Fuxi Wen

2603.01055 2026-03-03 cs.AI

MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning

Eileen Wang, Hiba Arnaout, Dhita Pratama, Shuo Yang, Dangyang Liu, Jie Yang, Josiah Poon, Jeff Pan, Caren Han

2603.01052 2026-03-03 cs.LG

No More Maybe-Arrows: Resolving Causal Uncertainty by Breaking Symmetries

Tingrui Huang, Devendra Singh Dhami

2603.01050 2026-03-03 cs.CV cs.AI

MM-DeepResearch: A Simple and Effective Multimodal Agentic Search Baseline

Huanjin Yao, Qixiang Yin, Min Yang, Ziwang Zhao, Yibo Wang, Haotian Luo, Jingyi Zhang, Jiaxing Huang

Comments Technical report

2603.01047 2026-03-03 cs.LG stat.ML

Evaluating GFlowNet from partial episodes for stable and flexible policy-based training

Puhua Niu, Shili Wu, Xiaoning Qian

Comments Accepted by ICLR 2026

2602.24080 2026-03-03 cs.AI cs.SD

Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction

Xiang Li, Jiabao Gao, Sipei Lin, Xuan Zhou, Chi Zhang, Bo Cheng, Jiale Han, Benyou Wang

Comments Accepted by ICLR 2026 Conference

2602.23893 2026-03-03 cs.CV cs.RO

AoE: Always-on Egocentric Human Video Collection for Embodied AI

Bowen Yang, Zishuo Li, Yang Sun, Changtao Miao, Yifan Yang, Man Luo, Xiaotong Yan, Feng Jiang, Jinchuan Shi, Yankai Fu, Ning Chen, Junkai Zhao, Pengwei Wang, Guocai Yao, Shanghang Zhang, Hao Chen, Zhe Li, Kai Zhu

2602.23795 2026-03-03 cs.LG

GRAIL: Post-hoc Compensation by Linear Reconstruction for Compressed Networks

Wenwu Tang, Dong Wang, Lothar Thiele, Olga Saukh

Comments Conference on Parsimony and Learning (CPAL)

2602.23524 2026-03-03 cs.RO cs.CV cs.LG

V-MORALS: Visual Morse Graph-Aided Estimation of Regions of Attraction in a Learned Latent Space

Faiz Aladin, Ashwin Balasubramanian, Lars Lindemann, Daniel Seita

2602.23302 2026-03-03 cs.AI cs.LO math.LO

The logic of KM belief update is contained in the logic of AGM belief revision

Giacomo Bonanno

Comments arXiv admin note: text overlap with arXiv:2310.11506

2602.23239 2026-03-03 cs.AI cs.CY

Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive

Radha Sarma

Comments About 10,600 words in all (includes ~1000 words of literature and ~2400 words of Appendices). Under journal review

Abstract

AI systems are increasingly deployed in high-stakes contexts (medical diagnosis, legal research, financial analysis) under the assumption they can be governed by norms. This paper demonstrates that the assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). Genuine agency requires two necessary and jointly sufficient architectural conditions. First, the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights (Incommensurability). Second, a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful, unifying all values on a scalar metric and always selecting the highest-scoring output, are precisely the operations that preclude normative governance and agency. This incompatibility is not a correctable training bug awaiting a technical fix. It is a formal constraint inherent to what optimization is. Consequently, documented failure modes (sycophancy, hallucination, and unfaithful reasoning) are not accidents but expected structural manifestations. Misaligned deployment triggers a second-order risk termed the Convergence Crisis. When humans are forced to verify AI outputs under metric pressure, they degrade from genuine agents into criteria-checking optimizers, eliminating the only component capable of bearing normative accountability. Beyond the incompatibility proof, this paper's primary positive contribution is a substrate-neutral architectural specification deriving what any system (biological, artificial, or institutional) must necessarily satisfy to qualify as a genuine agent rather than a sophisticated instrument.

2602.23166 2026-03-03 cs.CV

AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Zhaochen Su, Jincheng Gao, Hangyu Guo, Zhenhua Liu, Lueyang Zhang, Xinyu Geng, Shijue Huang, Peng Xia, Guanyu Jiang, Cheng Wang, Yue Zhang, Yi R. Fung, Junxian He

Comments The project website is available at https://agentvista-bench.github.io/, and the code is available at https://github.com/hkust-nlp/AgentVista