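arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.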

2603.15975 2026-03-18 cs.CV

UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors

Xiaoyan Cong, Zekun Li, Zhiyang Dou, Hongyu Li, Omid Taheri, Chuan Guo, Abhay Mittal, Sizhe An, Taku Komura, Wojciech Matusik, Michael J. Black, Srinath Sridhar

Comments Project Page: https://oliver-cong02.github.io/UMO.github.io/

Details

English abstract

Large-scale foundation models (LFMs) have recently made impressive progress in text-to-motion generation by learning strong generative priors from massive 3D human motion datasets and paired text descriptions. However, how to effectively and efficiently leverage such single-purpose motion LFMs, i.e., text-to-motion synthesis, in more diverse cross-modal and in-context motion generation downstream tasks remains largely unclear. Prior work typically adapts pretrained generative priors to individual downstream tasks in a task-specific manner. In contrast, our goal is to unlock such priors to support a broad spectrum of downstream motion generation tasks within a single unified framework. To bridge this gap, we present UMO, a simple yet general unified formulation that casts diverse downstream tasks into compositions of atomic per-frame operations, enabling in-context adaptation to unlock the generative priors of pretrained DiT-based motion LFMs. Specifically, UMO introduces three learnable frame-level meta-operation embeddings to specify per-frame intent and employs lightweight temporal fusion to inject in-context cues into the pretrained backbone, with negligible runtime overhead compared to the base model. With this design, UMO finetunes the pretrained model, originally limited to text-to-motion generation, to support diverse previously unsupported tasks, including temporal inpainting, text-guided motion editing, text-serialized geometric constraints, and multi-identity reaction generation. Experiments demonstrate that UMO consistently outperforms task-specific and training-free baselines across a wide range of benchmarks, despite using a single unified model. Code and model will be publicly available. Project Page: https://oliver-cong02.github.io/UMO.github.io/

2603.15969 2026-03-18 cs.CL

Robust Language Identification for Romansh Varieties

Charlotte Model, Sina Ahmadi, Jannis Vamvas

2603.15968 2026-03-18 cs.AI cs.CL cs.LG cs.MA

MAC: Multi-Agent Constitution Learning

Rushil Thareja, Gautam Gupta, Francesco Pinto, Nils Lukas

Comments Code: https://github.com/rushil-thareja/MAC-Multi-Agent-Constitution-Learning | PyPI: https://pypi.org/project/mac-prompt/ | Website: https://www.mac-prompt.com/

2603.15965 2026-03-18 cs.CL cs.AI

MoLoRA: Composable Specialization via Per-Token Adapter Routing

Shrey Shah, Justin Wagle

2603.15960 2026-03-18 cs.AI

Optimizing Hospital Capacity During Pandemics: A Dual-Component Framework for Strategic Patient Relocation

Sadaf Tabatabaee, Hicham El Baz, Mohammed Khalil Ghali, Nagendra N. Nagarur

Comments 6 pages. Published in Proceedings of the IISE Annual Conference & Expo 2025. DOI: 10.21872/2025IISE_6202

2603.15958 2026-03-18 cs.LG

Deriving Hyperparameter Scaling Laws via Modern Optimization Theory

Egor Shulgin, Dimitri von Rütte, Tianyue H. Zhang, Niccolò Ajroldi, Bernhard Schölkopf, Antonio Orvieto

Comments v1: Preprint based on a short version published as a conference paper at SciForDL Workshop, 2nd edition

2603.15957 2026-03-18 cs.LG

GASP: Guided Asymmetric Self-Play For Coding LLMs

Swadesh Jana, Cansu Sancaktar, Tomáš Daniš, Georg Martius, Antonio Orvieto, Pavel Kolev

Comments Accepted at ICLR 2026 Workshop on AI with Recursive Self-Improvement (RSI 2026) as Spotlight, and ICLR 2026 Workshop on Lifelong Agents (LLA 2026)

2603.15953 2026-03-18 cs.CL cs.AI cs.LG

A Family of LLMs Liberated from Static Vocabularies

Aleph Alpha: Adnen Abdessaied, Artur Baranowski, Lukas Balles, Michael Barlow, Fabien C. Y. Benureau, Felix Berkenkamp, Lukas Bluebaum, Bastian Boll, Thomas F. Burns, Björn Deiseroth, Constantin Eichenberg, David Friede, Pablo Iyu Guerrero, Ahmed Hammam, Bastian Harren, Johann Higl, Yasser Jadidi, Carina Kauf, Johannes Messner, Jan Hendrik Metzen, Max Meuer, Vedant Nanda, Pit Neitemeier, Koen Oostermeijer, Letitia Parcalabescu, Markus Pernpointner, Felix Reinfurt, Dylan Rodriquez, Grégory Schott, Philipp Siedler, Martin Simonovsky, Till Speicher, Volker Stampa, Stephan Wäldchen, Samuel Weinbach, Gregor Ziegltrum

2603.15951 2026-03-18 cs.RO

Gaze-Aware Task Progression Detection Framework for Human-Robot Interaction Using RGB Cameras

Linlin Cheng, Koen Hindriks, Artem V. Belopolsky

Comments 9 pages, 7 figures. This article has been accepted for publication in IEEE Robotics and Automation Letters

Details

DOI: 10.1109/lra.2026.3673990

English abstract

In human-robot interaction (HRI), detecting a human's gaze helps robots interpret user attention and intent. However, most gaze detection approaches rely on specialized eye-tracking hardware, limiting deployment in everyday settings. Appearance-based gaze estimation methods remove this dependency by using standard RGB cameras, but their practicality in HRI remains underexplored. We present a calibration-free framework for detecting task progression when information is conveyed via integrated display interfaces. The framework uses only the robot's built-in monocular RGB camera (640x480 resolution) and state-of-the-art gaze estimation to monitor attention patterns. It leverages natural behavior, where users shift focus from task interfaces to the robot's face to signal task completion, formalized through three Areas of Interest (AOI): tablet, robot face, and elsewhere. Systematic parameter optimization identifies configurations that balance detection accuracy and interaction latency. We validate our framework in a "First Day at Work" scenario, comparing it to button-based interaction. Results show a task completion detection accuracy of 77.6%. Compared to button-based interaction, the proposed system exhibits slightly higher response latency but preserves information retention and significantly improves comfort, social presence, and perceived naturalness. Notably, most participants reported that they did not consciously use eye movements to guide the interaction, underscoring the intuitive role of gaze as a communicative cue. This work demonstrates the feasibility of intuitive, low-cost, RGB-only gaze-based HRI for natural and engaging interactions.
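The completion cue described in this abstract (gaze moving from the tablet to the robot's face) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the AOI label strings, the `CompletionDetector` class, and both frame thresholds are hypothetical, and the sketch assumes an upstream gaze estimator has already assigned one AOI label to each camera frame.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical AOI labels; the abstract names the three areas but not their encoding.
TABLET, ROBOT_FACE, ELSEWHERE = "tablet", "robot_face", "elsewhere"


@dataclass
class CompletionDetector:
    """Flags task completion when gaze dwells on the robot's face after the tablet.

    min_tablet_frames and min_face_frames are illustrative thresholds,
    not values reported in the paper.
    """
    min_tablet_frames: int = 5
    min_face_frames: int = 3

    def detect(self, aoi_per_frame: Iterable[str]) -> Optional[int]:
        """Return the frame index at which completion is signalled, or None."""
        tablet_seen = 0  # total frames spent on the tablet so far
        face_run = 0     # current run of consecutive robot-face frames
        for i, aoi in enumerate(aoi_per_frame):
            if aoi == TABLET:
                tablet_seen += 1
                face_run = 0
            elif aoi == ROBOT_FACE:
                face_run += 1
                if (tablet_seen >= self.min_tablet_frames
                        and face_run >= self.min_face_frames):
                    return i
            else:
                # Looking elsewhere breaks the robot-face dwell.
                face_run = 0
        return None
```

Raising `min_face_frames` in this sketch makes detection slower but less sensitive to brief glances, which mirrors the accuracy-versus-latency trade-off the abstract mentions.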

2603.15950 2026-03-18 cs.CL cs.CY cs.SI

POLAR: A Per-User Association Test in Embedding Space

Pedro Bento, Arthur Buzelin, Arthur Chagas, Yan Aquino, Victoria Estanislau, Samira Malaquias, Pedro Robles Dutenhefner, Gisele L. Pappa, Virgilio Almeida, Wagner Meira Jr.

Comments Accepted paper at ICWSM 2026

2603.15946 2026-03-18 cs.AI

Argumentative Human-AI Decision-Making: Toward AI Agents That Reason With Us, Not For Us

Stylianos Loukas Vasileiou, Antonio Rago, Francesca Toni, William Yeoh

2603.15939 2026-03-18 cs.LG cs.AI

Data-Local Autonomous LLM-Guided Neural Architecture Search for Multiclass Multimodal Time-Series Classification

Emil Hardarson, Luka Biedebach, Ómar Bessi Ómarsson, Teitur Hrólfsson, Anna Sigridur Islind, María Óskarsdóttir

Details

English abstract

Applying machine learning to sensitive time-series data is often bottlenecked by the iteration loop: Performance depends strongly on preprocessing and architecture, yet training often has to run on-premise under strict data-local constraints. This is a common problem in healthcare and other privacy-constrained domains (e.g., a hospital developing deep learning models on patient EEG). This bottleneck is particularly challenging in multimodal fusion, where sensor modalities must be individually preprocessed and then combined. LLM-guided neural architecture search (NAS) can automate this exploration, but most existing workflows assume cloud execution or access to data-derived artifacts that cannot be exposed. We present a novel data-local, LLM-guided search framework that handles candidate pipelines remotely while executing all training and evaluation locally under a fixed protocol. The controller observes only trial-level summaries, such as pipeline descriptors, metrics, learning-curve statistics, and failure logs, without ever accessing raw samples or intermediate feature representations. Our framework targets multiclass, multimodal learning via one-vs-rest binary experts per class and modality, a lightweight fusion MLP, and joint search over expert architectures and modality-specific preprocessing. We evaluate our method on two regimes: UEA30 (public multivariate time-series classification dataset) and SleepEDFx sleep staging (heterogeneous clinical modalities such as EEG, EOG, and EMG). The results show that the modular baseline model is strong, and the LLM-guided NAS further improves it. Notably, our method finds models that perform within published ranges across most benchmark datasets. Across both settings, our method reduces manual intervention by enabling unattended architecture search while keeping sensitive data on-premise.
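The data-local constraint in this abstract, where the controller sees only trial-level summaries, can be sketched as a whitelist applied before anything leaves the site. This is an illustration under stated assumptions, not the paper's code: the field names (`pipeline`, `metrics`, `val_losses`, `failure_log`) and the `summarize_trial` helper are hypothetical, since the abstract lists the kinds of summaries but gives no schema.

```python
from statistics import mean
from typing import Any, Dict, List

# Hypothetical whitelist of fields that may be sent to the remote controller.
ALLOWED_KEYS = {"pipeline", "metrics", "curve_stats", "failure_log"}


def summarize_trial(trial: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a local trial record to the summary that may leave the site.

    Raw samples and intermediate features stored in `trial` are never copied.
    """
    losses: List[float] = trial.get("val_losses", [])
    summary = {
        "pipeline": trial["pipeline"],
        "metrics": trial.get("metrics", {}),
        # Learning-curve statistics only, not the per-sample predictions.
        "curve_stats": {
            "epochs": len(losses),
            "final": losses[-1] if losses else None,
            "best": min(losses) if losses else None,
            "mean": mean(losses) if losses else None,
        },
        "failure_log": trial.get("failure_log"),
    }
    # Defensive check: nothing outside the whitelist is returned.
    assert set(summary) <= ALLOWED_KEYS
    return summary
```

In a real deployment the failure log would likely need its own scrubbing step, because stack traces can echo data values; the sketch passes it through unchanged for brevity.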

2603.15932 2026-03-18 cs.CV

Nodule-Aligned Latent Space Learning with LLM-Driven Multimodal Diffusion for Lung Nodule Progression Prediction

James Song, Yifan Wang, Chuan Zhou, Liyue Shen

2603.15927 2026-03-18 cs.LG cs.NA math.DS math.NA

Discovery of interaction and diffusion kernels in particle-to-mean-field multi-agent systems

Giacomo Albi, Alessandro Alla, Elisa Calzola

2603.15926 2026-03-18 cs.LG cs.AI

Evaluating Causal Discovery Algorithms for Path-Specific Fairness and Utility in Healthcare

Nitish Nagesh, Elahe Khatibi, Thomas Hughes, Mahdi Bagheri, Pratik Gajane, Amir M. Rahmani

2603.15925 2026-03-18 cs.LG

Generative Inverse Design with Abstention via Diagonal Flow Matching

Miguel de Campos, Werner Krebs, Hanno Gottschalk

2603.15916 2026-03-18 cs.LG cs.AI

Auto Researching, not hyperparameter tuning: Convergence Analysis of 10,000 Experiments

Xiaoyi Li

2603.15914 2026-03-18 cs.LG cs.AI

The Agentic Researcher: A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning

Max Zimmer, Nico Pelleriti, Christophe Roux, Sebastian Pokutta

2603.15909 2026-03-18 cs.AI cs.CL cs.HC

Prompt Engineering for Scale Development in Generative Psychometrics

Lara Lee Russell-Lasalandra, Hudson Golino

Comments 22 pages, 7 figures

2603.15907 2026-03-18 cs.LG cs.SY eess.SY

Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions

Goutam Das, Michael Dorothy, Kyle Volle, Daigo Shishika

Comments 7 pages, ACC 2026

2603.15905 2026-03-18 cs.SD

INSTRUMENTAL: Automatic Synthesizer Parameter Recovery from Audio via Evolutionary Optimization

Philipp Bogdan

Comments 5 pages

2603.15903 2026-03-18 cs.CL

Agent-based imitation dynamics can yield efficiently compressed population-level vocabularies

Nathaniel Imel, Richard Futrell, Michael Franke, Noga Zaslavsky

2603.15901 2026-03-18 cs.LG cs.AI cs.CV

Federated Learning for Privacy-Preserving Medical AI

Tin Hoang

Comments MSc Dissertation

2603.15897 2026-03-18 cs.CL cs.AI

COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives

Azwad Anjum Islam, Tisa Islam Erana

Comments System description paper in SemEval-2026, Task 5

2603.15887 2026-03-18 cs.CV cs.NE

EvoIQA - Explaining Image Distortions with Evolved White-Box Logic

Ruchika Gupta, Illya Bakurov, Nathan Haut, Wolfgang Banzhaf

Comments 11 pages, 3 figures

2603.15885 2026-03-18 cs.AI cs.RO

Resilience Meets Autonomy: Governing Embodied AI in Critical Infrastructure

Puneet Sharma, Christer Henrik Pursiainen

Comments 6 pages

2603.15880 2026-03-18 cs.LG cs.AI

Electrodermal Activity as a Unimodal Signal for Aerobic Exercise Detection in Wearable Sensors

Rena Mira Krishna, Ramya Sankar, Shadi Ghiasi

2603.15871 2026-03-18 cs.LG cs.AI

Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning

Ezgi Korkmaz

Comments NeurIPS 2025 Spotlight

2603.15862 2026-03-18 cs.CV cs.LG

Self-supervised Disentanglement of Disease Effects from Aging in 3D Medical Shapes

Jakaria Rabbi, Nilanjan Ray, Dana Cobzas

Comments 10 pages

2603.15857 2026-03-18 cs.AI cs.LG cs.RO

Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

Pranaya Jajoo, Harshit Sikchi, Siddhant Agarwal, Amy Zhang, Scott Niekum, Martha White

Comments ICLR 2026