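arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.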

2603.28766 2026-03-31 cs.CV

HandX: Scaling Bimanual Motion and Interaction Generation

Zimu Zhang, Yucheng Zhang, Xiyan Xu, Ziyin Wang, Sirui Xu, Kai Zhou, Bing Zhou, Chuan Guo, Jian Wang, Yu-Xiong Wang, Liang-Yan Gui

Comments CVPR 2026. Project Page: https://handx-project.github.io. Code: https://github.com/handx-project/HandX

English abstract

Synthesizing human motion has advanced rapidly, yet realistic hand motion and bimanual interaction remain underexplored. Whole-body models often miss the fine-grained cues that drive dexterous behavior, such as finger articulation, contact timing, and inter-hand coordination, and existing resources lack high-fidelity bimanual sequences that capture nuanced finger dynamics and collaboration. To fill this gap, we present HandX, a unified foundation spanning data, annotation, and evaluation. We consolidate and filter existing datasets for quality, and collect a new motion-capture dataset targeting underrepresented bimanual interactions with detailed finger dynamics. For scalable annotation, we introduce a decoupled strategy that extracts representative motion features, e.g., contact events and finger flexion, and then leverages reasoning from large language models to produce fine-grained, semantically rich descriptions aligned with these features. Building on the resulting data and annotations, we benchmark diffusion and autoregressive models with versatile conditioning modes. Experiments demonstrate high-quality dexterous motion generation, supported by our newly proposed hand-focused metrics. We further observe clear scaling trends: larger models trained on larger, higher-quality datasets produce more semantically coherent bimanual motion. Our dataset is released to support future research.

2603.28765 2026-03-31 cs.CL

Adaptive Block-Scaled Data Types

Jack Cook, Hyemin S. Lee, Kathryn Le, Junxian Guo, Giovanni Traverso, Anantha P. Chandrakasan, Song Han

Comments 19 pages, 9 figures

2603.28763 2026-03-31 cs.CV

PoseDreamer: Scalable and Photorealistic Human Data Generation Pipeline with Diffusion Models

Lorenza Prospero, Orest Kupyn, Ostap Viniavskyi, João F. Henriques, Christian Rupprecht

2603.28760 2026-03-31 cs.CV cs.RO

SHOW3D: Capturing Scenes of 3D Hands and Objects in the Wild

Patrick Rim, Kevin Harris, Braden Copple, Shangchen Han, Xu Xie, Ivan Shugurov, Sizhe An, He Wen, Alex Wong, Tomas Hodan, Kun He

Comments CVPR 2026

2603.28757 2026-03-31 cs.CV cs.MM cs.SD

SonoWorld: From One Image to a 3D Audio-Visual Scene

Derong Jin, Xiyi Chen, Ming C. Lin, Ruohan Gao

Comments Accepted by CVPR 2026, project page: https://humathe.github.io/sonoworld/

2603.28744 2026-03-31 cs.LG

Stop Probing, Start Coding: Why Linear Probes and Sparse Autoencoders Fail at Compositional Generalisation

Vitória Barin Pacela, Shruti Joshi, Isabela Camacho, Simon Lacoste-Julien, David Klindt

2603.28740 2026-03-31 cs.RO

FocusVLA: Focused Visual Utilization for Vision-Language-Action Models

Yichi Zhang, Weihao Yuan, Yizhuo Zhang, Xidong Zhang, Jia Wan

Comments 25 pages, 18 figures

2603.28739 2026-03-31 cs.LG stat.ML

Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks

Meitong Liu, Christopher Jung, Rui Li, Xue Feng, Han Zhao

2603.28732 2026-03-31 cs.RO cs.CV

Pandora: Articulated 3D Scene Graphs from Egocentric Vision

Alan Yu, Yun Chang, Christopher Xie, Luca Carlone

Comments 14 pages, 5 figures. Presented at the 2025 British Machine Vision Conference (BMVC) in Sheffield, UK

2603.28718 2026-03-31 cs.LG cs.AI cs.CV

Stepwise Credit Assignment for GRPO on Flow-Matching Models

Yash Savani, Branislav Kveton, Yuchen Liu, Yilin Wang, Jing Shi, Subhojyoti Mukherjee, Nikos Vlassis, Krishna Kumar Singh

Comments Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026. Project page: https://stepwiseflowgrpo.com

2603.28713 2026-03-31 cs.CV

DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing

Kailai Feng, Yuxiang Wei, Bo Chen, Yang Pan, Hu Ye, Songwei Liu, Chenqian Yan, Yuan Gao

Comments https://carlofkl.github.io/dreamlite/

2603.28708 2026-03-31 cs.LG cs.DC

GPU-Accelerated Optimization of Transformer-Based Neural Networks for Real-Time Inference

Soutrik Mukherjee, Sangwhan Cha

Comments 10 pages, 8 figures, 15 tables

2603.28698 2026-03-31 cs.CL

EpiScreen: Early Epilepsy Detection from Electronic Health Records with Large Language Models

Shuang Zhou, Kai Yu, Zaifu Zhan, Huixue Zhou, Min Zeng, Feng Xie, Zhiyi Sha, Rui Zhang

Comments 24 pages, 5 figures, 4 tables

2603.28696 2026-03-31 cs.CV cs.AI

AdaptToken: Entropy-based Adaptive Token Selection for MLLM Long Video Understanding

Haozhe Qi, Kevin Qu, Mahdi Rad, Rui Wang, Alexander Mathis, Marc Pollefeys

Comments Project page: https://haozheqi.github.io/adapt-token

2603.28691 2026-03-31 cs.RO

DRIVE-Nav: Directional Reasoning, Inspection, and Verification for Efficient Open-Vocabulary Navigation

Maoguo Gao, Zejun Zhu, Zhiming Sun, Zhengwei Ma, Longze Yuan, Zhongjing Ma, Zhigang Gao, Jinhui Zhang, Suli Zou

Comments 8 pages, 4 figures. Project page: https://coolmaoguo.github.io/drive-nav-page/

2603.28690 2026-03-31 cs.RO cs.CE

Vision-Based Robotic Disassembly Combined with Real-Time MFA Data Acquisition

Federico Zocco, Maria Pozzi, Monica Malvezzi

Comments Submitted

2603.28678 2026-03-31 cs.LG

Subspace Optimization for Backpropagation-Free Continual Test-Time Adaptation

Damian Sójka, Sebastian Cygert, Marc Masana

2603.28674 2026-03-31 cs.RO

Serialized Red-Green-Gray: Quicker Heuristic Validation of Edges in Dynamic Roadmap Graphs

Yulie Arad, Stav Ashur, Marta Markowicz, James D. Motes, Marco Morales, Nancy M. Amato

2603.28673 2026-03-31 cs.LG cs.CR cs.DC

FL-PBM: Pre-Training Backdoor Mitigation for Federated Learning

Osama Wehbi, Sarhad Arisdakessian, Omar Abdel Wahab, Azzam Mourad, Hadi Otrok, Jamal Bentahar

Comments 12 pages, 3 figures, 1 table, 2 algorithms, Regular Journal Paper

2603.28670 2026-03-31 cs.CV cs.RO

Sim-to-Real Fruit Detection Using Synthetic Data: Quantitative Evaluation and Embedded Deployment with Isaac Sim

Martina Hutter-Mironovova

Comments 18 pages, 6 figures

2603.28662 2026-03-31 cs.LG cs.AI

AMIGO: Agentic Multi-Image Grounding Oracle Benchmark

Min Wang, Ata Mahjoubfar

2603.28660 2026-03-31 cs.CV

Industrial3D: A Terrestrial LiDAR Point Cloud Dataset and CrossParadigm Benchmark for Industrial Infrastructure

Chao Yin, Hongzhe Yue, Qing Han, Difeng Hu, Zhenyu Liang, Fangzhou Lin, Bing Sun, Boyu Wang, Mingkai Li, Wei Yao, Jack C. P. Cheng

Comments 49 pages, 8 figures, 14 tables

2603.28658 2026-03-31 cs.CV

Divide and Restore: A Modular Task-Decoupled Framework for Universal Image Restoration

Joanna Wiekiera, Martyna Zur

2603.28652 2026-03-31 cs.LG cs.CR cs.DC cs.GT

Mitigating Backdoor Attacks in Federated Learning Using PPA and MiniMax Game Theory

Osama Wehbi, Sarhad Arisdakessian, Omar Abdel Wahab, Anderson Avila, Azzam Mourad, Hadi Otrok

Comments 12 pages, 4 images, 2 tables, 2 algorithms, Regular Journal Paper

2603.28651 2026-03-31 cs.AI

Not Search, But Scan: Benchmarking MLLMs on Scan-Oriented Academic Paper Reasoning

Rongjin Li, Zichen Tang, Xianghe Wang, Xinyi Hu, Zhengyu Wang, Zhengyu Lu, Yiling Huang, Jiayuan Chen, Weisheng Tan, Jiacheng Liu, Zhongjun Yang, Haihong E

Comments Accepted to ICLR 2026

2603.28644 2026-03-31 cs.SD cs.LG cs.MM

Constructing Composite Features for Interpretable Music-Tagging

Chenhao Xue, Weitao Hu, Joyraj Chakraborty, Zhijin Guo, Kang Li, Tianyu Shi, Martin Reed, Nikolaos Thomos

Comments 5 pages, 8 figures, accepted at ICASSP 2026

2603.28643 2026-03-31 cs.AI cs.CL cs.HC

The Ultimate Tutorial for AI-driven Scale Development in Generative Psychometrics: Releasing AIGENIE from its Bottle

Lara Russell-Lasalandra, Hudson Golino, Luis Eduardo Garrido, Alexander P. Christensen

Comments 38 pages, 8 figures, 3 tables

2603.26660 2026-03-31 cs.RO cs.AI

Ruka-v2: Tendon Driven Open-Source Dexterous Hand with Wrist and Abduction for Robot Learning

Xinqi Lucas Liu, Ruoxi Hu, Alejandro Ojeda Olarte, Zhuoran Chen, Kenny Ma, Charles Cheng Ji, Lerrel Pinto, Raunaq Bhirangi, Irmak Guzey

2603.16739 2026-03-31 cs.LG cs.AI cs.HC

SpecMoE: Spectral Mixture-of-Experts Foundation Model for Cross-Species EEG Decoding

Davy Darankoum, Chloé Habermacher, Julien Volle, Sergei Grudinin

Comments 34 pages (12 pages in the main text and 22 pages in Supplementary Information)

2603.14022 2026-03-31 cs.CV

A Hyperbolic Perspective on Hierarchical Structure in Object-Centric Scene Representations

Neelu Madan, Àlex Pujol, Andreas Møgelmose, Sergio Escalera, Kamal Nasrollahi, Graham W. Taylor, Thomas B. Moeslund

Comments Accepted at CVPR Workshops 2026