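arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.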

2603.21013 2026-03-24 cs.AI cs.LG cs.RO

A Framework for Low-Latency, LLM-driven Multimodal Interaction on the Pepper Robot

Erich Studerus, Vivienne Jia Zhong, Stephan Vonschallen

Comments 4 pages, 2 figures. To appear in Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI '26), Edinburgh, Scotland, March 2026

Details

DOI: 10.1145/3757279.3788808

English abstract

Despite recent advances in integrating Large Language Models (LLMs) into social robotics, two weaknesses persist. First, existing implementations on platforms like Pepper often rely on cascaded Speech-to-Text (STT)->LLM->Text-to-Speech (TTS) pipelines, resulting in high latency and the loss of paralinguistic information. Second, most implementations fail to fully leverage the LLM's capabilities for multimodal perception and agentic control. We present an open-source Android framework for the Pepper robot that addresses these limitations through two key innovations. First, we integrate end-to-end Speech-to-Speech (S2S) models to achieve low-latency interaction while preserving paralinguistic cues and enabling adaptive intonation. Second, we implement extensive Function Calling capabilities that elevate the LLM to an agentic planner, orchestrating robot actions (navigation, gaze control, tablet interaction) and integrating diverse multimodal feedback (vision, touch, system state). The framework runs on the robot's tablet but can also be built to run on regular Android smartphones or tablets, decoupling development from robot hardware. This work provides the HRI community with a practical, extensible platform for exploring advanced LLM-driven embodied interaction.


2603.21010 2026-03-24 cs.CV

SkinCLIP-VL: Consistency-Aware Vision-Language Learning for Multimodal Skin Cancer Diagnosis

Zhixiang Lu, Shijie Xu, Kaicheng Yan, Xuyue Cai, Chong Zhang, Yulong Li, Angelos Stefanidis, Anh Nguyen, Jionglong Su

Comments Accepted by 2026 IEEE International Conference on Multimedia and Expo (ICME 2026)

2603.20994 2026-03-24 cs.AI cs.GT cs.LG

The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

Benedikt Hornig, Reuth Mirsky

Comments Accepted for presentation at the Rebellion and Disobedience in AI (RaD-AI) at AAMAS 2026

2603.20993 2026-03-24 cs.LG cs.AI

Long-Term Outlier Prediction Through Outlier Score Modeling

Yuma Aoki, Joon Park, Koh Takeuchi, Hisashi Kashima, Shinya Akimoto, Ryuichi Hashimoto, Takahiro Adachi, Takeshi Kishikawa, Takamitsu Sasaki

Comments 15 pages, 6 figures

2603.20992 2026-03-24 cs.RO

Geometrically Plausible Object Pose Refinement using Differentiable Simulation

Anil Zeybek, Rhys Newbury, Snehal Dikhale, Nawid Jamali, Soshi Iba, Akansel Cosgun

2603.20988 2026-03-24 cs.AI q-bio.NC

Can we automatize scientific discovery in the cognitive sciences?

Akshay K. Jagadish, Milena Rmus, Kristin Witte, Marvin Mathony, Marcel Binz, Eric Schulz

2603.20987 2026-03-24 cs.LG cond-mat.dis-nn cond-mat.stat-mech

Interpreting the Synchronization Gap: The Hidden Mechanism Inside Diffusion Transformers

Emil Albrychiewicz, Andrés Franco Valiente, Li-Ching Chen, Viola Zixin Zhao

Comments 38 pages, 5 figures

2603.20986 2026-03-24 cs.AI cond-mat.mes-hall

AutoMOOSE: An Agentic AI for Autonomous Phase-Field Simulation

Sukriti Manna, Henry Chan, Subramanian K. R. S. Sankaranarayanan

Details

English abstract

Multiphysics simulation frameworks such as MOOSE provide rigorous engines for phase-field materials modeling, yet adoption is constrained by the expertise required to construct valid input files, coordinate parameter sweeps, diagnose failures, and extract quantitative results. We introduce AutoMOOSE, an open-source agentic framework that orchestrates the full simulation lifecycle from a single natural-language prompt. AutoMOOSE deploys a five-agent pipeline in which the Input Writer coordinates six sub-agents and the Reviewer autonomously corrects runtime failures without user intervention. A modular plugin architecture enables new phase-field formulations without modifying the core framework, and a Model Context Protocol (MCP) server exposes the workflow as ten structured tools for interoperability with any MCP-compatible client. Validated on a four-temperature copper grain growth benchmark, AutoMOOSE generates MOOSE input files with 6 of 12 structural blocks matching a human expert reference exactly and 4 functionally equivalent, executes all runs in parallel with a 1.8x speedup, and performs an end-to-end physical consistency check spanning intent, finite-element execution, and Arrhenius kinetics with no human verification. Grain coarsening kinetics are recovered with R^2 = 0.90-0.95 at T >= 600 K; the recovered activation energy Q_fit = 0.296 eV is consistent with a human-written reference (Q_fit = 0.267 eV) under identical parameters. Three runtime failure classes were diagnosed and resolved autonomously within a single correction cycle, and every run produces a provenance record satisfying FAIR data principles. These results show that the gap between knowing the physics and executing a validated simulation campaign can be bridged by a lightweight multi-agent orchestration layer, providing a pathway toward AI-driven materials discovery and self-driving laboratories.


2603.20985 2026-03-24 cs.CV

Consistent but Dangerous: Per-Sample Safety Classification Reveals False Reliability in Medical Vision-Language Models

Binesh Sadanandan, Vahid Behzadan

Comments CVPR 2026 Workshop on Medical Reasoning with Vision Language Foundation Models

2603.20984 2026-03-24 cs.LG

Joint Surrogate Learning of Objectives, Constraints, and Sensitivities for Efficient Multi-objective Optimization of Neural Dynamical Systems

Frithjof Gressmann, Ivan Georgiev Raikov, Seung Hyun Kim, Mattia Gazzola, Lawrence Rauchwerger, Ivan Soltesz

2603.20976 2026-03-24 cs.LG cs.AI cs.HC

Detection of adversarial intent in Human-AI teams using LLMs

Abed K. Musaffar, Ambuj Singh, Francesco Bullo

2603.20975 2026-03-24 cs.CL cs.LG

DiscoUQ: Structured Disagreement Analysis for Uncertainty Quantification in LLM Agent Ensembles

Bo Jiang

2603.20970 2026-03-24 cs.CV

GraPHFormer: A Multimodal Graph Persistent Homology Transformer for the Analysis of Neuroscience Morphologies

Uzair Shah, Marco Agus, Mahmoud Gamal, Mahmood Alzubaidi, Corrado Cali, Pierre J. Magistretti, Abdesselam Bouzerdoum, Mowafa Househ

Comments Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2603.20969 2026-03-24 cs.LG cs.CL

Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge

Bhavya Vasudeva, Puneesh Deora, Alberto Bietti, Vatsal Sharan, Christos Thrampoulidis

Comments 28 pages, 26 figures

2603.20955 2026-03-24 cs.LG cs.AI

Beyond Expression Similarity: Contrastive Learning Recovers Functional Gene Associations from Protein Interaction Structure

Jason Dury

Comments 21 pages, 5 figures, code at https://github.com/EridosAI/GeneticCAL

2603.20948 2026-03-24 cs.AI cs.DB

gUFO: A Gentle Foundational Ontology for Semantic Web Knowledge Graphs

João Paulo A. Almeida, Giancarlo Guizzardi, Tiago Prince Sales, Claudenir M. Fonseca

Comments 29 pages, 1 figure

2603.20939 2026-03-24 cs.CL cs.AI cs.HC cs.IR stat.ML

User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction

Yuren Hao, Shuhaib Mehri, ChengXiang Zhai, Dilek Hakkani-Tür

Comments 21 pages including appendices

2603.20932 2026-03-24 cs.RO

Implementing Robust M-Estimators with Certifiable Factor Graph Optimization

Zhexin Xu, Hanna Jiamei Zhang, Helena Calatrava, Pau Closas, David M. Rosen

Comments The paper was accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026)

2603.20930 2026-03-24 cs.LG cs.AI cs.IT math.IT

Causally-Guided Diffusion for Stable Feature Selection

Arun Vignesh Malarkkan, Xinyuan Wang, Kunpeng Liu, Denghui Zhang, Yanjie Fu

Comments 8 pages + references + appendix

2603.20925 2026-03-24 cs.AI

Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions

Shouqiao Wang, Marcello Politi, Samuele Marro, Davide Crapis

2603.20921 2026-03-24 cs.LG

Discriminative Representation Learning for Clinical Prediction

Yang Zhang, Li Fan, Samuel Lawrence, Shi Li

2603.20919 2026-03-24 cs.LG cs.AI

Enhancing LIME using Neural Decision Trees

Mohamed Aymen Bouyahia, Argyris Kalogeratos

2603.20911 2026-03-24 cs.AI cs.CY

Do LLM-Driven Agents Exhibit Engagement Mechanisms? Controlled Tests of Information Load, Descriptive Norms, and Popularity Cues

Tai-Quan Peng, Yuan Tian, Songsong Liang, Dazhen Deng, Yingcai Wu

2603.20908 2026-03-24 cs.LG stat.ML

Bayesian Scattering: A Principled Baseline for Uncertainty on Image Data

Bernardo Fichera, Zarko Ivkovic, Kjell Jorner, Philipp Hennig, Viacheslav Borovitskiy

2603.20899 2026-03-24 cs.CL cs.AI

Mitigating Shortcut Reasoning in Language Models: A Gradient-Aware Training Approach

Hongyu Cao, Kunpeng Liu, Dongjie Wang, Yanjie Fu

Comments 12 pages, 2 figures. Preprint. Experiments on synthetic reasoning benchmarks. Code available

2603.20898 2026-03-24 cs.LG cs.AI cs.CV

Natural Gradient Descent for Online Continual Learning

Joe Khawand, David Colliaux

Comments 13 pages, 2 figures

2603.20896 2026-03-24 cs.LG cs.AI

Beyond the Birkhoff Polytope: Spectral-Sphere-Constrained Hyper-Connections

Zhaoyi Liu, Haichuan Zhang, Ang Li

Comments 16 pages

2603.20887 2026-03-24 cs.CV

Scene Graph-guided SegCaptioning Transformer with Fine-grained Alignment for Controllable Video Segmentation and Captioning

Xu Zhang, Jin Yuan, BinHong Yang, Xuan Liu, Qianjun Zhang, Yuyi Wang, Zhiyong Li, Hanwang Zhang

Comments 12 pages, 6 figures

2603.20885 2026-03-24 cs.RO cs.AI cs.HC

Characterizing the onset and offset of motor imagery during passive arm movements induced by an upper-body exoskeleton

Kanishka Mitra, Frigyes Samuel Racz, Satyam Kumar, Ashish D. Deshpande, José del R. Millán

Comments Accepted to IROS 2023. 6 pages, 6 figures. Project page available at https://mitrakanishka.github.io/projects/passive-arm-mi/

Details

DOI: 10.1109/IROS55552.2023.10342492
Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, pp. 3789-3794

English abstract

Two distinct technologies have gained attention lately due to their prospects for motor rehabilitation: robotics and brain-machine interfaces (BMIs). Harnessing their combined efforts is a largely uncharted and promising direction that has immense clinical potential. However, a significant challenge is whether motor intentions from the user can be accurately detected using non-invasive BMIs in the presence of instrumental noise and passive movements induced by the rehabilitation exoskeleton. As an alternative to the straightforward continuous control approach, this study instead aims to characterize the onset and offset of motor imagery during passive arm movements induced by an upper-body exoskeleton to allow for the natural control (initiation and termination) of functional movements. Ten participants were recruited to perform kinesthetic motor imagery (MI) of the right arm while attached to the robot, simultaneously cued with LEDs indicating the initiation and termination of a goal-oriented reaching task. Using electroencephalogram signals, we built a decoder to detect the transition between i) rest and beginning MI and ii) maintaining and ending MI. Offline decoder evaluation achieved group average onset accuracy of 60.7% and 66.6% for offset accuracy, revealing that the start and stop of MI could be identified while attached to the robot. Furthermore, pseudo-online evaluation could replicate this performance, forecasting reliable online exoskeleton control in the future. Our approach showed that participants could produce quality and reliable sensorimotor rhythms regardless of noise or passive arm movements induced by wearing the exoskeleton, which opens new possibilities for BMI control of assistive devices.


2603.20869 2026-03-24 cs.AI cs.LG

ReLaMix: Residual Latency-Aware Mixing for Delay-Robust Financial Time-Series Forecasting

Tianyou Lai, Wentao Yue, Jiayi Zhou, Chaoyuan Hao, Lingke Chang, Qingyu Mao, Zhibo Niu, Qilei Li

Comments 6 pages, 5 figures