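arXivDaily daily academic digest: syncs the full arXiv dataset with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.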

2603.28422 2026-03-31 cs.RO

Active Stereo-Camera Outperforms Multi-Sensor Setup in ACT Imitation Learning for Humanoid Manipulation

Robin Kühn, Moritz Schappler, Thomas Seel, Dennis Bank

Comments 7 pages

Abstract

The complexity of teaching humanoid robots new tasks is one of the major reasons hindering their widespread adoption in the industry. While Imitation Learning (IL), particularly Action Chunking with Transformers (ACT), enables rapid task acquisition, there is no consensus yet on the optimal sensory hardware required for manipulation tasks. This paper benchmarks 14 sensor combinations on the Unitree G1 humanoid robot equipped with three-finger hands for two manipulation tasks. We explicitly evaluate the integration of tactile and proprioceptive modalities alongside active vision. Our analysis demonstrates that strategic sensor selection can outperform complex configurations in data-limited regimes while reducing computational overhead. We develop an open-source Unified Ablation Framework that utilizes sensor masking on a comprehensive master dataset. Results indicate that additional modalities often degrade performance for IL with limited data. A minimal active stereo-camera setup outperformed complex multi-sensor configurations, achieving 87.5% success in a spatial generalization task and 94.4% in a structured manipulation task. Conversely, adding pressure sensors to this setup reduced success to 67.3% in the latter task due to a low signal-to-noise ratio. We conclude that in data-limited regimes, active vision offers a superior trade-off between robustness and complexity. While tactile modalities may require larger datasets to be effective, our findings validate that strategic sensor selection is critical for designing an efficient learning process.
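The sensor-masking idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' Unified Ablation Framework: the modality names, the zero-fill masking rule, and the helper functions `mask_observation` and `sensor_subsets` are all hypothetical, and the abstract does not say which 14 combinations were benchmarked, so the sketch simply enumerates every non-empty subset of an assumed modality list.

```python
from itertools import combinations

# Hypothetical modality names; the abstract does not list the actual sensors.
MODALITIES = ("stereo_camera", "wrist_camera", "pressure", "proprioception")


def mask_observation(observation, active):
    """Return a copy of the observation with inactive modalities zeroed out.

    Zero-filling is an assumption made for illustration; a real framework
    might drop the inputs entirely or use a learned mask token instead.
    """
    masked = {}
    for name, values in observation.items():
        if name in active:
            masked[name] = list(values)
        else:
            masked[name] = [0.0] * len(values)
    return masked


def sensor_subsets(modalities):
    """Yield every non-empty subset of the given modalities."""
    for size in range(1, len(modalities) + 1):
        for subset in combinations(modalities, size):
            yield frozenset(subset)


# One frame of a hypothetical master dataset that records every modality.
sample = {
    "stereo_camera": [0.2, 0.7, 0.1],
    "wrist_camera": [0.5, 0.5],
    "pressure": [0.03, 0.01],
    "proprioception": [1.2, -0.4, 0.9],
}

# Derive a vision-only training view without re-recording any data.
vision_only = mask_observation(sample, {"stereo_camera"})
```

Under these assumptions, one recorded dataset yields a separate training view per sensor subset, so each configuration can be trained and compared on identical demonstrations.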

2603.28420 2026-03-31 cs.LG cs.AI

Spectral Higher-Order Neural Networks

Gianluca Peri, Timoteo Carletti, Duccio Fanelli, Diego Febbe

2603.28418 2026-03-31 cs.CL

LombardoGraphia: Automatic Classification of Lombard Orthography Variants

Edoardo Signoroni, Pavel Rychlý

Comments To be published at LREC 2026

2603.28417 2026-03-31 cs.LG cs.AI

KGroups: A Versatile Univariate Max-Relevance Min-Redundancy Feature Selection Algorithm for High-dimensional Biological Data

Malick Ebiele, Malika Bendechache, Rob Brennan

2603.28416 2026-03-31 cs.LG cs.AI

Evolutionary Discovery of Reinforcement Learning Algorithms via Large Language Models

Alkis Sygkounas, Amy Loutfi, Andreas Persson

Comments accepted at GECCO 2026

2603.28414 2026-03-31 cs.CV

Unified Restoration-Perception Learning: Maritime Infrared-Visible Image Fusion and Segmentation

Weichao Cai, Weiliang Huang, Biao Xue, Chao Huang, Fei Yuan, Bob Zhang

2603.28410 2026-03-31 cs.LG stat.ML

Mixture-Model Preference Learning for Many-Objective Bayesian Optimization

Manisha Dubey, Sebastiaan De Peuter, Wanrong Wang, Samuel Kaski

Comments 18 pages, 9 figures

2603.28407 2026-03-31 cs.AI cs.CL

MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome

Fangda Ye, Yuxin Hu, Pengxiang Zhu, Yibo Li, Ziqi Jin, Yao Xiao, Yibo Wang, Lei Wang, Zhen Zhang, Lu Wang, Yue Deng, Bin Wang, Yifan Zhang, Liangcai Su, Xinyu Wang, He Zhao, Chen Wei, Qiang Ren, Bryan Hooi, An Bo, Shuicheng Yan, Lidong Bing

Comments GitHub: https://github.com/MiroMindAI/MiroEval

Abstract

Recent progress in deep research systems has been impressive, but evaluation still lags behind real user needs. Existing benchmarks predominantly assess final reports using fixed rubrics, failing to evaluate the underlying research process. Most also offer limited multimodal coverage, rely on synthetic tasks that do not reflect real-world query complexity, and cannot be refreshed as knowledge evolves. To address these gaps, we introduce MiroEval, a benchmark and evaluation framework for deep research systems. The benchmark comprises 100 tasks (70 text-only, 30 multimodal), all grounded in real user needs and constructed via a dual-path pipeline that supports periodic updates, enabling a live and evolving setting. The proposed evaluation suite assesses deep research systems along three complementary dimensions: adaptive synthesis quality evaluation with task-specific rubrics, agentic factuality verification via active retrieval and reasoning over both web sources and multimodal attachments, and process-centric evaluation that audits how the system searches, reasons, and refines throughout its investigation. Evaluation across 13 systems yields three principal findings: the three evaluation dimensions capture complementary aspects of system capability, with each revealing distinct strengths and weaknesses across systems; process quality serves as a reliable predictor of overall outcome while revealing weaknesses invisible to output-level metrics; and multimodal tasks pose substantially greater challenges, with most systems declining by 3 to 10 points. The MiroThinker series achieves the most balanced performance, with MiroThinker-H1 ranking the highest overall in both settings. Human verification and robustness results confirm the reliability of the benchmark and evaluation framework. MiroEval provides a holistic diagnostic tool for the next generation of deep research agents.

2603.28405 2026-03-31 cs.CV cs.AI

EdgeDiT: Hardware-Aware Diffusion Transformers for Efficient On-Device Image Generation

Sravanth Kodavanti, Manjunath Arveti, Sowmya Vajrala, Srinivas Miriyala, Vikram N R

Comments Accepted at the Mobile AI Workshop, CVPR 2026

2603.28396 2026-03-31 cs.LG cs.CR

Label-efficient Training Updates for Malware Detection over Time

Luca Minnei, Cristian Manca, Giorgio Piras, Angelo Sotgiu, Maura Pintor, Daniele Ghiani, Davide Maiorca, Giorgio Giacinto, Battista Biggio

Comments Submitted to IEEE Transactions on Information Forensics and Security

2603.28390 2026-03-31 cs.CV eess.SP

SVH-BD : Synthetic Vegetation Hyperspectral Benchmark Dataset for Emulation of Remote Sensing Images

Chedly Ben Azizi, Claire Guilloteau, Gilles Roussel, Matthieu Puigt

2603.28387 2026-03-31 cs.AI cs.LG

The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

Doan Nam Long Vu, Simone Balloccu

2603.28386 2026-03-31 cs.AI

COvolve: Adversarial Co-Evolution of Large-Language-Model-Generated Policies and Environments via Two-Player Zero-Sum Game

Alkis Sygkounas, Rishi Hazra, Andreas Persson, Pedro Zuidberg Dos Martires, Amy Loutfi

Comments Accepted at GECCO 2026

2603.28385 2026-03-31 cs.LG cs.AI cs.NE cs.RO

Critic-Free Deep Reinforcement Learning for Maritime Coverage Path Planning on Irregular Hexagonal Grids

Carlos S. Sepúlveda, Gonzalo A. Ruz

2603.28376 2026-03-31 cs.CL cs.AI

Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo

2603.28367 2026-03-31 cs.CV

Rethinking Structure Preservation in Text-Guided Image Editing with Visual Autoregressive Models

Tao Xia, Jiawei Liu, Yukun Zhang, Ting Liu, Wei Wang, Lei Zhang

2603.28366 2026-03-31 cs.CV

AutoCut: End-to-end advertisement video editing based on multimodal discretization and controllable generation

Milton Zhou, Sizhong Qin, Yongzhi Li, Quan Chen, Peng Jiang

Comments Accepted by CVPR 2026

2603.28363 2026-03-31 cs.CV

SEA: Evaluating Sketch Abstraction Efficiency via Element-level Commonsense Visual Question Answering

Jiho Park, Sieun Choi, Jaeyoon Seo, Minho Sohn, Yeana Kim, Jihie Kim

2603.28362 2026-03-31 cs.RO cond-mat.mtrl-sci cond-mat.soft physics.app-ph

A Foldable and Agile Soft Electromagnetic Robot for Multimodal Navigation in Confined and Unstructured Environments

Zhihao Lv, Xiaoyong Zhang, Mengfan Zhang, Xiaoyu Song, Xingyue Liu, Yide Liu, Shaoxing Qu, Guoyong Mao

2603.28361 2026-03-31 cs.AI cs.MA

Deep Research of Deep Research: From Transformer to Agent, From AI to AI for Science

Yipeng Yu

2603.28360 2026-03-31 cs.AI

CoE: Collaborative Entropy for Uncertainty Quantification in Agentic Multi-LLM Systems

Kangkang Sun, Jun Wu, Jianhua Li, Minyi Guo, Xiuzhen Che, Jianwei Huang

Comments 18 pages, 7 figures; published in the ICLR workshop "Agentic AI in the Wild: From Hallucinations to Reliable Autonomy"

2603.28357 2026-03-31 cs.CV cs.LG

Optimized Weighted Voting System for Brain Tumor Classification Using MRI Images

Ha Anh Vu

2603.28353 2026-03-31 cs.CV

VistaGEN: Consistent Driving Video Generation with Fine-Grained Control Using Multiview Visual-Language Reasoning

Li-Heng Chen, Ke Cheng, Yahui Liu, Lei Shi, Shi-Sheng Huang, Hongbo Fu

2603.28351 2026-03-31 cs.CL

Not All Subjectivity Is the Same! Defining Desiderata for the Evaluation of Subjectivity in NLP

Urja Khurana, Michiel van der Meer, Enrico Liscio, Antske Fokkens, Pradeep K. Murukannaiah

Comments Under review

2603.28348 2026-03-31 cs.RO cs.HC

Proposing a Game Theory Approach to Explore Group Dynamics with Social Robot

Giulia Pusceddu

Comments Honorable Mention at HRI Pioneers 2025. Peer-reviewed. https://hripioneers.org/archives/hri25/participants/

2603.28346 2026-03-31 cs.LG stat.ML

Machine Learning-Assisted High-Dimensional Matrix Estimation

Wan Tian, Hui Yang, Zhouhui Lian, Lingyue Zhang, Yijie Peng

2603.28334 2026-03-31 cs.LG cs.DC

Key-Embedded Privacy for Decentralized AI in Biomedical Omics

Rongyu Zhang, Hongyu Dong, Gaole Dai, Ziqi Qiao, Shenli Zheng, Yuan Zhang, Aosong Cheng, Xiaowei Chi, Jincai Luo, Pin Li, Li Du, Dan Wang, Yuan Du, Xudong Xing, Jianxu Chen, Shanghang Zhang

2603.28333 2026-03-31 cs.CV cs.AI

Integrating Multimodal Large Language Model Knowledge into Amodal Completion

Heecheol Yun, Eunho Yang

2603.28328 2026-03-31 cs.LG physics.geo-ph

Physics-Informed Neural Networks for Predicting Hydrogen Sorption in Geological Formations: Thermodynamically Constrained Deep Learning Integrating Classical Adsorption Theory

Mohammad Nooraiepour, Mohammad Masoudi, Zezhang Song, Helge Hellevang

Abstract

Accurate prediction of hydrogen sorption in fine-grained geological materials is essential for evaluating underground hydrogen storage capacity, assessing caprock integrity, and characterizing hydrogen migration in subsurface energy systems. Classical isotherm models perform well at the individual-sample level but fail when generalized across heterogeneous populations, with the coefficient of determination collapsing from 0.80-0.90 for single-sample fits to 0.09-0.38 for aggregated multi-sample datasets. We present a multi-scale physics-informed neural network framework that addresses this limitation by embedding classical adsorption theory and thermodynamic constraints directly into the learning process. The framework utilizes 1,987 hydrogen sorption isotherm measurements across clays, shales, and coals, supplemented by 224 characteristic uptake measurements. A seven-category physics-informed feature engineering scheme generates 62 thermodynamically meaningful descriptors from raw material characterization data. The loss function enforces saturation limits, a monotonic pressure response, and van 't Hoff temperature dependence via penalty weighting, while a three-phase curriculum-based training strategy ensures stable integration of competing physical constraints. An architecture-diverse ensemble of ten members provides calibrated uncertainty quantification, with post-hoc temperature scaling achieving target prediction interval coverage. The optimized PINN achieves R² = 0.9544, RMSE = 0.0484 mmol/g, and MAE = 0.0231 mmol/g on the held-out test set, with 98.6% monotonicity satisfaction and zero non-physical negative predictions. Physics-informed regularization yields a 10-15% cross-lithology generalization advantage over a well-tuned random forest under leave-one-lithology-out validation, confirming that thermodynamic constraints transfer meaningfully across geological boundaries.
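As a rough illustration of the penalty-weighted loss described in the abstract above, the sketch below adds two soft physics penalties to a mean-squared error: one for uptake that decreases as pressure rises, and one for uptake above a saturation limit. It is an assumption-laden toy, not the authors' loss: the Langmuir isotherm is used only as a familiar example of classical adsorption theory (the abstract does not name a specific model), the function names and weights `w_mono` and `w_sat` are hypothetical, and the van 't Hoff temperature term is omitted.

```python
def langmuir(pressure, q_max, k):
    """Langmuir isotherm: uptake q = q_max * k * p / (1 + k * p)."""
    return q_max * k * pressure / (1.0 + k * pressure)


def physics_penalty(predictions, q_max, w_mono=1.0, w_sat=1.0):
    """Soft penalties for predictions ordered by increasing pressure.

    w_mono and w_sat are hypothetical penalty weights.
    """
    # Monotonicity: penalize any drop in uptake between consecutive pressures.
    mono = sum(
        max(0.0, predictions[i] - predictions[i + 1]) ** 2
        for i in range(len(predictions) - 1)
    )
    # Saturation: penalize any uptake above the assumed capacity q_max.
    sat = sum(max(0.0, q - q_max) ** 2 for q in predictions)
    return w_mono * mono + w_sat * sat


def total_loss(predictions, targets, q_max, w_mono=1.0, w_sat=1.0):
    """Mean-squared data error plus the weighted physics penalties."""
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    return mse + physics_penalty(predictions, q_max, w_mono, w_sat)


pressures = [1.0, 5.0, 10.0, 20.0]  # illustrative values, arbitrary units
physical = [langmuir(p, q_max=0.5, k=0.2) for p in pressures]
unphysical = [0.1, 0.3, 0.2, 0.6]  # dips once and exceeds q_max = 0.5

clean_penalty = physics_penalty(physical, q_max=0.5)
dirty_penalty = physics_penalty(unphysical, q_max=0.5)
```

In this toy setup a curve that follows the isotherm incurs no penalty, while the curve that dips and overshoots the capacity is penalized on both terms; the weights would control how strongly each constraint competes with the data fit.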

2603.28322 2026-03-31 cs.CV

SFDemorpher: Generalizable Face Demorphing for Operational Morphing Attack Detection

Raul Ismayilov, Luuk Spreeuwers