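arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.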

2602.20608 2026-02-25 cs.CV

VAGNet: Grounding 3D Affordance from Human-Object Interactions in Videos

Aihua Mao, Kaihang Huang, Yong-Jin Liu, Chee Seng Chan, Ying He

Abstract

3D object affordance grounding aims to identify regions on 3D objects that support human-object interaction (HOI), a capability essential to embodied visual reasoning. However, most existing approaches rely on static visual or textual cues, neglecting that affordances are inherently defined by dynamic actions. As a result, they often struggle to localize the true contact regions involved in real interactions. We take a different perspective. Humans learn how to use objects by observing and imitating actions, not just by examining shapes. Motivated by this intuition, we introduce video-guided 3D affordance grounding, which leverages dynamic interaction sequences to provide functional supervision. To achieve this, we propose VAGNet, a framework that aligns video-derived interaction cues with 3D structure to resolve ambiguities that static cues cannot address. To support this new setting, we introduce PVAD, the first HOI video-3D pairing affordance dataset, providing functional supervision unavailable in prior works. Extensive experiments on PVAD show that VAGNet achieves state-of-the-art performance, significantly outperforming static-based baselines. The code and dataset will be made publicly available.

2602.20597 2026-02-25 cs.CV

Interaction-aware Representation Modeling with Co-occurrence Consistency for Egocentric Hand-Object Parsing

Yuejiao Su, Yi Wang, Lei Yao, Yawen Cui, Lap-Pui Chau

Abstract

A fine-grained understanding of egocentric human-environment interactions is crucial for developing next-generation embodied agents. One fundamental challenge in this area involves accurately parsing hands and active objects. While transformer-based architectures have demonstrated considerable potential for such tasks, several key limitations remain unaddressed: 1) existing query initialization mechanisms rely primarily on semantic cues or learnable parameters, demonstrating limited adaptability to changing active objects across varying input scenes; 2) previous transformer-based methods utilize pixel-level semantic features to iteratively refine queries during mask generation, which may introduce interaction-irrelevant content into the final embeddings; and 3) prevailing models are susceptible to "interaction illusion", producing physically inconsistent predictions. To address these issues, we propose an end-to-end Interaction-aware Transformer (InterFormer), which integrates three key components, i.e., a Dynamic Query Generator (DQG), a Dual-context Feature Selector (DFS), and the Conditional Co-occurrence (CoCo) loss. The DQG explicitly grounds query initialization in the spatial dynamics of hand-object contact, enabling targeted generation of interaction-aware queries for hands and various active objects. The DFS fuses coarse interactive cues with semantic features, thereby suppressing interaction-irrelevant noise and emphasizing the learning of interactive relationships. The CoCo loss incorporates hand-object relationship constraints to enhance physical consistency in prediction. Our model achieves state-of-the-art performance on both the EgoHOS and the challenging out-of-distribution mini-HOI4D datasets, demonstrating its effectiveness and strong generalization ability. Code and models are publicly available at https://github.com/yuggiehk/InterFormer.

2602.20593 2026-02-25 cs.LG cs.CR

Is the Trigger Essential? A Feature-Based Triggerless Backdoor Attack in Vertical Federated Learning

Yige Liu, Yiwei Lou, Che Wang, Yongzhi Cao, Hanpin Wang

2602.20592 2026-02-25 cs.SD eess.AS

Quantifying Dimensional Independence in Speech: An Information-Theoretic Framework for Disentangled Representation Learning

Bipasha Kashyap, Björn W. Schuller, Pubudu N. Pathirana

2602.20584 2026-02-25 cs.CV cs.RO

Long-Term Multi-Session 3D Reconstruction Under Substantial Appearance Change

Beverley Gorry, Tobias Fischer, Michael Milford, Alejandro Fontan

2602.20583 2026-02-25 cs.CV

PropFly: Learning to Propagate via On-the-Fly Supervision from Pre-trained Video Diffusion Models

Wonyong Seo, Jaeho Moon, Jaehyup Lee, Soo Ye Kim, Munchurl Kim

Comments The first two authors contributed equally to this work (equal contribution)

2602.20580 2026-02-25 cs.CL cs.AI cs.CR cs.LG

Personal Information Parroting in Language Models

Nishant Subramani, Kshitish Ghate, Mona Diab

Comments EACL Findings 2026

2602.20578 2026-02-25 cs.LG math.OC stat.ML

Upper-Linearizability of Online Non-Monotone DR-Submodular Maximization over Down-Closed Convex Sets

Yiyang Lu, Haresh Jadav, Mohammad Pedramfar, Ranveer Singh, Vaneet Aggarwal

2602.20577 2026-02-25 cs.CV

Efficient and Explainable End-to-End Autonomous Driving via Masked Vision-Language-Action Diffusion

Jiaru Zhang, Manav Gagvani, Can Cui, Juntong Peng, Ruqi Zhang, Ziran Wang

2602.20575 2026-02-25 cs.CV

An interactive enhanced driving dataset for autonomous driving

Haojie Feng, Peizhi Zhang, Mengjie Tian, Xinrui Zhang, Zhuoren Li, Junpeng Huang, Xiurong Wang, Junfan Zhu, Jianzhou Wang, Dongxiao Yin, Lu Xiong

2602.20574 2026-02-25 cs.LG cs.CL

GATES: Self-Distillation under Privileged Context with Consensus Gating

Alex Stein, Furong Huang, Tom Goldstein

Comments 10 Pages of main text with an additional 7 pages of supplementary material

2602.20569 2026-02-25 cs.CV

AIForge-Doc: A Benchmark for Detecting AI-Forged Tampering in Financial and Form Documents

Jiaqi Wu, Yuchen Zhou, Muduo Xu, Zisheng Liang, Simiao Ren, Jiayu Xue, Meige Yang, Siying Chen, Jingheng Huan

Comments 17 pages, 10 figures

2602.20567 2026-02-25 cs.LG math.OC stat.ML

Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs

Yifei Liang, Yan Sun, Xiaochun Cao, Li Shen

Comments 47 Pages

2602.20566 2026-02-25 cs.RO cs.CV

BFA++: Hierarchical Best-Feature-Aware Token Prune for Multi-View Vision Language Action Model

Haosheng Li, Weixin Mao, Zihan Lan, Hongwei Xiong, Hongan Wang, Chenyang Si, Ziwei Liu, Xiaoming Deng, Hua Chen

Comments 9 pages, 10 figures

2602.20557 2026-02-25 cs.LG cs.SC

GENSR: Symbolic Regression Based in Equation Generative Space

Qian Li, Yuxiao Hu, Juncheng Liu, Yuntian Chen

2602.20556 2026-02-25 cs.CV

WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos

Hanhui Li, Xuan Huang, Wanquan Liu, Yuhao Cheng, Long Chen, Yiqiang Yan, Xiaodan Liang, Chenqiang Gao

2602.20550 2026-02-25 cs.CV

The Finite Primitive Basis Theorem for Computational Imaging: Formal Foundations of the OperatorGraph Representation

Chengshuai Yang

2602.20548 2026-02-25 cs.CV

Robust Spiking Neural Networks Against Adversarial Attacks

Shuai Wang, Malu Zhang, Yulin Jiang, Dehao Zhang, Ammar Belatreche, Yu Liang, Yimeng Shan, Zijian Zhou, Yang Yang, Haizhou Li

Comments Published as a conference paper at ICLR 2026

2602.20543 2026-02-25 cs.CV

Beyond Human Performance: A Vision-Language Multi-Agent Approach for Quality Control in Pharmaceutical Manufacturing

Subhra Jyoti Mandal, Lara Rachidi, Puneet Jain, Matthieu Duvinage, Sander W. Timmer

2602.20532 2026-02-25 cs.LG cs.AI cs.CL

Actor-Curator: Co-adaptive Curriculum Learning via Policy-Improvement Bandits for RL Post-Training

Zhengyao Gu, Jonathan Light, Raul Astudillo, Ziyu Ye, Langzhou He, Henry Peng Zou, Wei Cheng, Santiago Paternain, Philip S. Yu, Yisong Yue

Comments 37 pages, 8 figures, 1 table. Preprint under review. Equal contribution by first two authors

2602.20531 2026-02-25 cs.CV

A Lightweight Vision-Language Fusion Framework for Predicting App Ratings from User Interfaces and Metadata

Azrin Sultana, Firoz Ahmed

Comments 24 pages, 10 figures

2602.20530 2026-02-25 cs.LG cs.SD eess.AS

Memory-guided Prototypical Co-occurrence Learning for Mixed Emotion Recognition

Ming Li, Yong-Jin Liu, Fang Liu, Huankun Sheng, Yeying Fan, Yixiang Wei, Minnan Luo, Weizhan Zhang, Wenping Wang

2602.20528 2026-02-25 cs.CL cs.LG

Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning

Justin Lovelace, Christian Belardi, Sofian Zalouk, Adhitya Polavaram, Srivatsa Kundurthy, Kilian Q. Weinberger

Comments COLM 2025

2602.20527 2026-02-25 cs.LG cs.AI

A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies

Md Mirajul Islam, Xi Yang, Adittya Soukarjya Saha, Rajesh Debnath, Min Chi

Comments 16 pages

2602.20520 2026-02-25 cs.CV cs.AI

How Do Inpainting Artifacts Propagate to Language?

Pratham Yashwante, Davit Abrahamyan, Shresth Grover, Sukruth Rao

2602.20517 2026-02-25 cs.AI cs.CL cs.LG

Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination

Rakshit Trivedi, Kartik Sharma, David C Parkes

Comments Spotlight paper at NeurIPS 2025

2602.20513 2026-02-25 cs.CL

From Performance to Purpose: A Sociotechnical Taxonomy for Evaluating Large Language Model Utility

Gavin Levinson, Keith Feldman

2602.20512 2026-02-25 cs.RO

Conflict-Based Search for Multi-Agent Path Finding with Elevators

Haitong He, Xuemian Wu, Shizhe Zhao, Zhongqiang Ren

2602.20502 2026-02-25 cs.AI cs.LG

ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory

Hongbin Zhong, Fazle Faisal, Luis França, Tanakorn Leesatapornwongsa, Adriana Szekeres, Kexin Rong, Suman Nath

2602.20500 2026-02-25 cs.RO cs.CV

Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining

Keyu Zhou, Peisen Xu, Yahao Wu, Jiming Chen, Gaofeng Li, Shunlei Li

Comments Submitted to IEEE Transactions on Robotics (T-RO). 19 pages, 9 figures