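arXivDaily: a daily academic digest synced with the full arXiv data, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.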

2604.01224 2026-04-02 cs.RO

Functional Force-Aware Retargeting from Virtual Human Demos to Soft Robot Policies

Uksang Yoo, Mengjia Zhu, Evan Pezent, Jom Preechayasomboon, Jean Oh, Jeffrey Ichnowski, Amir Memar, Ben Abbatematteo, Homanga Bharadhwaj, Ashish Deshpande, Harsha Prahlad

Abstract

We introduce SoftAct, a framework for teaching soft robot hands to perform human-like manipulation skills by explicitly reasoning about contact forces. Leveraging immersive virtual reality, our system captures rich human demonstrations, including hand kinematics, object motion, dense contact patches, and detailed contact force information. Unlike conventional approaches that retarget human joint trajectories, SoftAct employs a two-stage, force-aware retargeting algorithm. The first stage attributes demonstrated contact forces to individual human fingers and allocates robot fingers proportionally, establishing a force-balanced mapping between human and robot hands. The second stage performs online retargeting by combining baseline end-effector pose tracking with geodesic-weighted contact refinements, using contact geometry and force magnitude to adjust robot fingertip targets in real time. This formulation enables soft robotic hands to reproduce the functional intent of human demonstrations while naturally accommodating extreme embodiment mismatch and nonlinear compliance. We evaluate SoftAct on a suite of contact-rich manipulation tasks using a custom non-anthropomorphic pneumatic soft robot hand. SoftAct's controller reduces fingertip trajectory tracking RMSE by up to 55 percent and reduces tracking variance by up to 69 percent compared to kinematic and learning-based baselines. At the policy level, SoftAct achieves consistently higher success in zero-shot real-world deployment and in simulation. These results demonstrate that explicitly modeling contact geometry and force distribution is essential for effective skill transfer to soft robotic hands, and cannot be recovered through kinematic imitation alone. Project videos and additional details are available at https://soft-act.github.io/.
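The first retargeting stage is described only at a high level: contact forces are attributed to human fingers, and robot fingers are allocated in proportion. A minimal sketch of one way such a proportional allocation could be computed follows. It is not the paper's algorithm: the function name `allocate_robot_fingers`, the largest-remainder rounding rule, and the example force values are all assumptions made for illustration.

```python
def allocate_robot_fingers(human_forces, n_robot_fingers):
    """Split n_robot_fingers among human fingers in proportion to force.

    human_forces maps a finger name to a non-negative contact force.
    Rounding uses the largest-remainder rule (an assumed choice, not
    taken from the abstract).
    """
    total = sum(human_forces.values())
    if total <= 0:
        raise ValueError("total contact force must be positive")

    # Ideal (fractional) share of robot fingers for each human finger.
    quotas = {f: n_robot_fingers * v / total for f, v in human_forces.items()}

    # Start from the integer part of every share.
    counts = {f: int(q) for f, q in quotas.items()}

    # Give the remaining fingers to the largest fractional remainders.
    leftover = n_robot_fingers - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda f: quotas[f] - counts[f], reverse=True)
    for f in by_remainder[:leftover]:
        counts[f] += 1
    return counts


# Hypothetical per-finger forces (arbitrary units) and a three-finger robot hand.
demo_forces = {"thumb": 6.0, "index": 3.0, "middle": 1.0}
allocation = allocate_robot_fingers(demo_forces, 3)
print(allocation)
```

With these example values the thumb, which carries most of the force, is assigned two robot fingers and the index finger one. The sketch covers only the counting step, not how the force attribution or the online refinement would be done.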

2604.01221 2026-04-02 cs.AI cs.CV

HippoCamp: Benchmarking Contextual Agents on Personal Computers

Zhe Yang, Shulin Tian, Kairui Hu, Shuai Liu, Hoang-Nhat Nguyen, Yichi Zhang, Zujin Guo, Mengying Yu, Zinan Zhang, Jingkang Yang, Chen Change Loy, Ziwei Liu

Comments: Project Page: https://hippocamp-ai.github.io/

2604.01220 2026-04-02 cs.CL

Universal YOCO for Efficient Depth Scaling

Yutao Sun, Li Dong, Tianzhu Ye, Shaohan Huang, Jianyong Wang, Furu Wei

2604.01216 2026-04-02 cs.LG cs.AI cs.CV

LAtent Phase Inference from Short time sequences using SHallow REcurrent Decoders (LAPIS-SHRED)

Yuxuan Bao, Xingyue Zhang, J. Nathan Kutz

Abstract

Reconstructing full spatio-temporal dynamics from sparse observations in both space and time remains a central challenge in complex systems, as measurements can be spatially incomplete and can also be limited to narrow temporal windows. Yet approximating the complete spatio-temporal trajectory is essential for mechanistic insight and understanding, model calibration, and operational decision-making. We introduce LAPIS-SHRED (LAtent Phase Inference from Short time sequence using SHallow REcurrent Decoders), a modular architecture that reconstructs and/or forecasts complete spatiotemporal dynamics from sparse sensor observations confined to short temporal windows. LAPIS-SHRED operates through a three-stage pipeline: (i) a SHRED model is pre-trained entirely on simulation data to map sensor time-histories into a structured latent space, (ii) a temporal sequence model, trained on simulation-derived latent trajectories, learns to propagate latent states forward or backward in time to span unobserved temporal regions from short observational time windows, and (iii) at deployment, only a short observation window of hyper-sparse sensor measurements from the true system is provided, from which the frozen SHRED model and the temporal model jointly reconstruct or forecast the complete spatiotemporal trajectory. The framework supports bidirectional inference, inherits data assimilation and multiscale reconstruction capabilities from its modular structure, and accommodates extreme observational constraints including single-frame terminal inputs. We evaluate LAPIS-SHRED on six experiments spanning complex spatio-temporal physics: turbulent flows, multiscale propulsion physics, volatile combustion transients, and satellite-derived environmental fields, highlighting a lightweight, modular architecture suited for operational settings where observation is constrained by physical or logistical limitations.

2604.01215 2026-04-02 cs.LG cs.AI physics.ao-ph

The Recipe Matters More Than the Kitchen: Mathematical Foundations of the AI Weather Prediction Pipeline

Piyush Garg, Diana R. Gergel, Andrew E. Shao, Galen J. Yacalis

Abstract

AI weather prediction has advanced rapidly, yet no unified mathematical framework explains what determines forecast skill. Existing theory addresses specific architectural choices rather than the learning pipeline as a whole, while operational evidence from 2023-2026 demonstrates that training methodology, loss function design, and data diversity matter at least as much as architecture selection. This paper makes two interleaved contributions. Theoretically, we construct a framework rooted in approximation theory on the sphere, dynamical systems theory, information theory, and statistical learning theory that treats the complete learning pipeline (architecture, loss function, training strategy, data distribution) rather than architecture alone. We establish a Learning Pipeline Error Decomposition showing that estimation error (loss- and data-dependent) dominates approximation error (architecture-dependent) at current scales. We develop a Loss Function Spectral Theory formalizing MSE-induced spectral blurring in spherical harmonic coordinates, and derive Out-of-Distribution Extrapolation Bounds proving that data-driven models systematically underestimate record-breaking extremes with bias growing linearly in record exceedance. Empirically, we validate these predictions via inference across ten architecturally diverse AI weather models using NVIDIA Earth2Studio with ERA5 initial conditions, evaluating six metrics across 30 initialization dates spanning all seasons. Results confirm universal spectral energy loss at high wavenumbers for MSE-trained models, rising Error Consensus Ratios showing that the majority of forecast error is shared across architectures, and linear negative bias during extreme events. A Holistic Model Assessment Score provides unified multi-dimensional evaluation, and a prescriptive framework enables mathematical evaluation of proposed pipelines before training.

2604.01213 2026-04-02 cs.RO cs.MA

Collaborative Task and Path Planning for Heterogeneous Robotic Teams using Multi-Agent PPO

Matthias Rubio, Julia Richter, Hendrik Kolvenbach, Marco Hutter

Comments: 8 pages, 3 figures, associated code on https://github.com/leggedrobotics/multi_robot_global_planner

2604.01212 2026-04-02 cs.CL cs.AI

$\texttt{YC-Bench}$: Benchmarking AI Agents for Long-Term Planning and Consistent Execution

Muyu He, Adit Jain, Anand Kumar, Vincent Tu, Soumyadeep Bakshi, Sachin Patro, Nazneen Rajani

Comments: 16 pages, 10 figures

2604.01210 2026-04-02 cs.LG cs.AI

CliffSearch: Structured Agentic Co-Evolution over Theory and Code for Scientific Algorithm Discovery

Youssef Mroueh, Carlos Fonseca, Brian Belgodere, David Cox

2604.01207 2026-04-02 cs.CV

TRACE: High-Fidelity 3D Scene Editing via Tangible Reconstruction and Geometry-Aligned Contextual Video Masking

Jiyuan Hu, Zechuan Zhang, Zongxin Yang, Yi Yang

Comments: 22 pages, 9 figures

2604.01206 2026-04-02 cs.CL cs.LG

LLM REgression with a Latent Iterative State Head

Yiheng Su, Matthew Lease

2604.01193 2026-04-02 cs.CL

Embarrassingly Simple Self-Distillation Improves Code Generation

Ruixiang Zhang, Richard He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang

2602.02326 2026-04-02 cs.CL

Language Steering for Multilingual In-Context Learning

Neeraja Kirtane, Kuan-Hao Huang

2601.02728 2026-04-02 cs.LG

CRoPE: Efficient Parametrization of Rotary Positional Embedding

Beicheng Lou, Zifei Xu, Vivian W. H. Wong

2511.08592 2026-04-02 cs.CL cs.AI

The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions

Azza Bouleimen, Giordano De Marzo, Taehee Kim, Nicolò Pagan, Hannah Metzler, Silvia Giordano, Anikó Hannák, David Garcia

2505.20507 2026-04-02 cs.CV cs.AI

Electrolyzers-HSI: Close-Range Multi-Scene Hyperspectral Imaging Benchmark Dataset

Elias Arbash, Ahmed Jamal Afifi, Ymane Belahsen, Margret Fuchs, Pedram Ghamisi, Paul Scheunders, Richard Gloaguen

DOI: 10.1038/s41597-025-06279-9
Journal ref: Sci Data 12, 1818 (2025)

Abstract

The global challenge of sustainable recycling demands automated, fast, accurate, state-of-the-art (SOTA) material detection systems that act as a bedrock for a circular economy. Democratizing access to these cutting-edge solutions that enable real-time waste analysis is essential for scaling up recycling efforts and fostering the Green Deal. In response, we introduce Electrolyzers-HSI, a novel multimodal benchmark dataset designed to accelerate the recovery of critical raw materials through accurate electrolyzer materials classification. The dataset comprises 55 co-registered high-resolution RGB images and hyperspectral imaging (HSI) data cubes spanning the 400-2500 nm spectral range, yielding over 4.2 million pixel vectors and 424,169 labeled ones. This enables non-invasive spectral analysis of shredded electrolyzer samples, supporting quantitative and qualitative material classification and spectral properties investigation. We evaluate a suite of baseline machine learning (ML) methods alongside SOTA transformer-based deep learning (DL) architectures, including Vision Transformer, SpectralFormer, and the Multimodal Fusion Transformer, to investigate architectural bottlenecks for further efficiency optimisation when deploying transformers in material identification. We implement zero-shot detection techniques and majority voting across pixel-level predictions to establish object-level classification robustness. In adherence to the FAIR data principles, the Electrolyzers-HSI dataset and accompanying codebase are openly available at https://github.com/hifexplo/Electrolyzers-HSI and https://rodare.hzdr.de/record/3668, supporting reproducible research and facilitating the broader adoption of smart and sustainable e-waste recycling solutions.
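The abstract mentions majority voting across pixel-level predictions to obtain object-level classifications. A minimal sketch of that general idea is given below. It is not taken from the dataset's codebase: the function names, the flat-list input layout, and the class names in the example are assumptions chosen for illustration.

```python
from collections import Counter


def object_label(pixel_labels):
    """Return the most frequent pixel-level class for one object."""
    if not pixel_labels:
        raise ValueError("object has no pixel predictions")
    return Counter(pixel_labels).most_common(1)[0][0]


def vote_per_object(pixel_predictions, object_ids):
    """Group pixel predictions by object id and majority-vote each group."""
    grouped = {}
    for label, obj in zip(pixel_predictions, object_ids):
        grouped.setdefault(obj, []).append(label)
    return {obj: object_label(labels) for obj, labels in grouped.items()}


# Hypothetical pixel-level predictions for two segmented objects.
preds = ["nickel", "nickel", "steel", "mesh", "mesh", "mesh"]
objs = [0, 0, 0, 1, 1, 1]
result = vote_per_object(preds, objs)
print(result)
```

In this example the single "steel" pixel in object 0 is outvoted, so isolated pixel errors do not change the object-level label. How ties are broken and how objects are segmented are left open here.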

2604.01179 2026-04-02 cs.RO cs.AI cs.CV

A ROS 2 Wrapper for Florence-2: Multi-Mode Local Vision-Language Inference for Robotic Systems

J. E. Domínguez-Vidal

Comments: 5 pages, 1 figure

2604.01175 2026-04-02 cs.LG

NeuroDDAF: Neural Dynamic Diffusion-Advection Fields with Evidential Fusion for Air Quality Forecasting

Prasanjit Dey, Soumyabrata Dev, Angela Meyer, Bianca Schoen-Phelan

Comments: This manuscript is under review

2604.01171 2026-04-02 cs.CV

Open-Set Supervised 3D Anomaly Detection: An Industrial Dataset and a Generalisable Framework for Unknown Defects

Hanzhe Liang, Luocheng Zhang, Junyang Xia, HanLiang Zhou, Bingyang Guo, Yingxi Xie, Can Gao, Ruiyun Yu, Jinbao Wang, Pan Li

Comments: Resources: https://github.com/hzzzzzhappy/open-industry

2604.01170 2026-04-02 cs.LG cs.AI cs.CL stat.AP stat.ML

Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning

Cai Zhou, Zekai Wang, Menghua Wu, Qianyu Julie Zhu, Flora C. Shi, Chenyu Wang, Ashia Wilson, Tommi Jaakkola, Stephen Bates

Comments: 20 pages

2604.01169 2026-04-02 cs.LG cond-mat.mtrl-sci q-bio.BM

Bridging the Simulation-to-Experiment Gap with Generative Models using Adversarial Distribution Alignment

Kai Nelson, Tobias Kreiman, Sergey Levine, Aditi S. Krishnapriyan

2604.01158 2026-04-02 cs.RO

SMASH: Mastering Scalable Whole-Body Skills for Humanoid Ping-Pong with Egocentric Vision

Junli Ren, Yinghui Li, Kai Zhang, Penglin Fu, Haoran Jiang, Yixuan Pan, Guangjun Zeng, Tao Huang, Weizhong Guo, Peng Lu, Tianyu Li, Jingbo Wang, Li Chen, Hongyang Li, Ping Luo

2604.01155 2026-04-02 cs.SD

FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining

Xiquan Li, Xuenan Xu, Ziyang Ma, Wenxi Chen, Haolin He, Qiuqiang Kong, Xie Chen

2604.01152 2026-04-02 cs.CL cs.AI

Brainstacks: Cross-Domain Cognitive Capabilities via Frozen MoE-LoRA Stacks for Continual LLM Learning

Mohammad R. Abu Ayyash

Comments: 26 pages, 13 figures, 4 tables

2604.01142 2026-04-02 cs.RO cs.LG

Deep Reinforcement Learning for Robotic Manipulation under Distribution Shift with Bounded Extremum Seeking

Shaifalee Saxena, Rafael Fierro, Alexander Scheinker

2604.01141 2026-04-02 cs.CV cs.AI eess.IV

Looking into a Pixel by Nonlinear Unmixing -- A Generative Approach

Maofeng Tang, Hairong Qi

2604.01134 2026-04-02 cs.RO cs.DB eess.IV

VRUD: A Drone Dataset for Complex Vehicle-VRU Interactions within Mixed Traffic

Ziyu Wang, Hongrui Kou, Cheng Wang, Ruochen Li, Hubert P. H. Shum, Amir Atapour-Abarghouei, Yuxin Zhang

2604.01129 2026-04-02 cs.CV

ReinDriveGen: Reinforcement Post-Training for Out-of-Distribution Driving Scene Generation

Hao Zhang, Lue Fan, Weikang Bian, Zehuan Wu, Lewei Lu, Zhaoxiang Zhang, Hongsheng Li

Comments: Project page: https://drive-sim.github.io/ReinDriveGen/

2604.01128 2026-04-02 cs.CL cs.AI cs.LG

Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers

Atsuyuki Miyai, Mashiro Toyooka, Zaiying Zhao, Kenta Watanabe, Toshihiko Yamasaki, Kiyoharu Aizawa

Comments: Project Page: https://agent4science-utokyo.github.io/PaperRecon_HP/

2604.01118 2026-04-02 cs.CV cs.AI cs.LG

Lightweight Prompt-Guided CLIP Adaptation for Monocular Depth Estimation

Reyhaneh Ahani Manghotay, Jie Liang

Comments: 14 pages, 2 figures

2604.01117 2026-04-02 cs.LG

Reconsidering Dependency Networks from an Information Geometry Perspective

Kazuya Takabatake, Shotaro Akaho

Comments: 25 pages, 7 figures