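arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.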

2604.21154 2026-04-24 cs.AI

Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction

Abhishek Dharmaratnakar, Srivaths Ranganathan, Anushree Sinha, Debanshu Das

Comments 3 pages, 2 figures, submitted to ICDH IEEE conference

Abstract

At-home physiotherapy compliance remains critically low due to a lack of personalized supervision and dynamic feedback. Existing digital health solutions rely on static, pre-recorded video libraries or generic 3D avatars that fail to account for a patient's specific injury limitations or home environment. In this paper, we propose a novel Multi-Agent System (MAS) architecture that leverages Generative AI and computer vision to close the tele-rehabilitation loop. Our framework consists of four specialized micro-agents: a Clinical Extraction Agent that parses unstructured medical notes into kinematic constraints; a Video Synthesis Agent that utilizes foundational video generation models to create personalized, patient-specific exercise videos; a Vision Processing Agent for real-time pose estimation; and a Diagnostic Feedback Agent that issues corrective instructions. We present the system architecture, detail the prototype pipeline using Large Language Models and MediaPipe, and outline our clinical evaluation plan. This work demonstrates the feasibility of combining generative media with agentic autonomous decision-making to scale personalized patient care safely and effectively.
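
The abstract above does not give implementation details, so the following is only a minimal sketch of how a pose-correction check of this kind might look. It assumes 2D landmark coordinates have already been produced by a pose estimator (the abstract names MediaPipe; no MediaPipe calls are made here), and the function names `joint_angle` and `feedback`, the landmark values, and the allowed angle range are all hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Return the interior angle at point b (in degrees) formed by a-b-c.

    Each point is an (x, y) tuple, e.g. hip, knee, ankle landmarks.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("landmarks must not coincide with the joint")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def feedback(angle_deg, min_deg, max_deg):
    """Compare a measured angle with a (hypothetical) prescribed range."""
    if angle_deg < min_deg:
        return "angle below prescribed range"
    if angle_deg > max_deg:
        return "angle above prescribed range"
    return "within prescribed range"


# Hypothetical landmarks: hip, knee, ankle forming a right angle at the knee.
hip, knee, ankle = (0.0, 0.0), (0.0, 1.0), (1.0, 1.0)
knee_angle = joint_angle(hip, knee, ankle)
message = feedback(knee_angle, min_deg=100.0, max_deg=180.0)
print(round(knee_angle, 1), message)
```

In a setup like the one the abstract describes, the allowed range would presumably come from the constraints extracted from clinical notes rather than being hard-coded as it is here.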

2604.21146 2026-04-24 cs.CV

WFM: 3D Wavelet Flow Matching for Ultrafast Multi-Modal MRI Synthesis

Yalcin Tur, Mihajlo Stojkovic, Ulas Bagci

Comments 17 pages, 4 figures, 3 tables. Accepted at MIDL 2026 (Poster)

2604.21144 2026-04-24 cs.CL cs.AI cs.HC

Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue

Biswesh Mohapatra, Giovanni Duca, Laurent Romary, Justine Cassell

Comments Work under review. Biswesh Mohapatra and Giovanni Duca both contributed equally to this work

2604.21139 2026-04-24 cs.CL cs.LG

Slot Machines: How LLMs Keep Track of Multiple Entities

Paul C. Bogdan, Jack Lindsey

2604.21138 2026-04-24 cs.RO cs.AI

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

Jiabao Ji, Yongchao Chen, Yang Zhang, Ramana Rao Kompella, Chuchu Fan, Gaowen Liu, Shiyu Chang

2604.21134 2026-04-24 cs.CL

Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents

Yiyang Lu, Woong Shin, Ahmad Maroof Karimi, Feiyi Wang, Jie Ren, Evgenia Smirni

Comments 18 pages, 8 figures

2604.21133 2026-04-24 cs.CL

GRISP: Guided Recurrent IRI Selection over SPARQL Skeletons

Sebastian Walter, Hannah Bast

2604.21130 2026-04-24 cs.RO

Self-Predictive Representation for Autonomous UAV Object-Goal Navigation

Angel Ayala, Donling Sui, Francisco Cruz, Mitchell Torok, Mohammad Deghat, Bruno J. T. Fernandes

Comments Submitted to T-RO

2604.21127 2026-04-24 cs.CV

HyperFM: An Efficient Hyperspectral Foundation Model with Spectral Grouping

Zahid Hassan Tushar, Sanjay Purushotham

Comments 15 pages, 8 figures, to be published in CVPR 2026 Findings. Code and data are publicly available at https://github.com/umbc-sanjaylab/HyperFM

Abstract

The NASA PACE mission provides unprecedented hyperspectral observations of ocean color, aerosols, and clouds, offering new insights into how these components interact and influence Earth's climate and air quality. Its Ocean Color Instrument measures light across hundreds of finely spaced wavelength bands, enabling detailed characterization of features such as phytoplankton composition, aerosol properties, and cloud microphysics. However, hyperspectral data of this scale is large, complex, and difficult to label, requiring specialized processing and analysis techniques. Existing foundation models, which have transformed computer vision and natural language processing, are generally trained on standard RGB imagery and therefore struggle to interpret the continuous spectral signatures captured by PACE. While recent advances have introduced hyperspectral foundation models, they are typically trained on cloud-free observations and often remain limited to single-sensor datasets due to spectral inconsistencies across instruments. Moreover, existing models tend to be parameter-heavy and computationally expensive, limiting scalability and adoption in operational settings. To address these challenges, we introduce HyperFM, a parameter-efficient hyperspectral foundation model that leverages intra-group and inter-group spectral attention along with hybrid parameter decomposition to better capture spectral spatial relationships while reducing computational cost. HyperFM demonstrates consistent performance improvements over existing hyperspectral foundation models and task-specific state-of-the-art methods across four benchmark downstream atmospheric cloud property retrieval tasks. To support further research, we additionally release HyperFM250K, a large-scale hyperspectral dataset from the PACE mission that includes both clear and cloudy scenes.

2604.21120 2026-04-24 cs.LG cs.CL

TabSHAP

Aryan Chaudhary, Prateek Agarwal, Tejasvi Alladi

2604.21119 2026-04-24 cs.CV cs.AI cs.SD

Materialistic RIR: Material Conditioned Realistic RIR Generation

Mahnoor Fatima Saad, Sagnik Majumder, Kristen Grauman, Ziad Al-Halah

Comments Accepted to CVPR 2026 Findings. Project page: https://mahnoor-fatima-saad.github.io/MatRIR.html

2604.21104 2026-04-24 cs.CV cs.LG

Pretrain Where? Investigating How Pretraining Data Diversity Impacts Geospatial Foundation Model Performance

Amandeep Kaur, Mirali Purohit, Gedeon Muhawenayo, Esther Rolf, Hannah Kerner

Comments Accepted at EarthVision workshop, CVPR 2026

2604.21103 2026-04-24 cs.AI econ.GN q-fin.EC

AI Governance under Political Turnover: The Alignment Surface of Compliance Design

Andrew J. Peterson

2604.21102 2026-04-24 cs.CV cs.AI

Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery

Siyuan Yao, Siavash Ghorbany, Kuangshi Ai, Arnav Cherukuthota, Meghan Forstchen, Alexis Korotasz, Matthew Sisk, Ming Hu, Chaoli Wang

2604.21100 2026-04-24 cs.LG

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

Neehal Tumma, Noel Loo, Daniela Rus

2604.21098 2026-04-24 cs.AI cs.CL

Propensity Inference: Environmental Contributors to LLM Behaviour

Olli Järviniemi, Oliver Makins, Jacob Merizian, Robert Kirk, Ben Millwood

2604.21094 2026-04-24 cs.LG

Spectral Embeddings Leak Graph Topology: Theory, Benchmark, and Adaptive Reconstruction

Thinh Nguyen-Cong, Truong-Son Hy, Thang N. Dinh

2604.21093 2026-04-24 cs.LG cs.AI

TRAVELFRAUDBENCH: A Configurable Evaluation Framework for GNN Fraud Ring Detection in Travel Networks

Bhavana Sajja

2604.21092 2026-04-24 cs.AI cs.SE

Mind the Prompt: Self-adaptive Generation of Task Plan Explanations via LLMs

Gricel Vázquez, Alexandros Evangelidis, Sepeedeh Shahbeigi, Radu Calinescu, Simos Gerasimou

2604.21082 2026-04-24 cs.CL cs.LG

Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting

Alexander Weers, Daniel Rueckert, Martin J. Menten

2604.21079 2026-04-24 cs.CV

Foveated Reasoning: Stateful, Action-based Visual Focusing for Vision-Language Models

Juhong Min, Lazar Valkov, Vitali Petsiuk, Hossein Souri, Deen Dayal Mohan

2604.21078 2026-04-24 cs.RO

Impact-Aware Model Predictive Control for UAV Landing on a Heaving Platform

Jess Stephenson, Melissa Greeff

Comments To be published in the proceedings of International Federation of Automatic Control (IFAC) World Congress 2026

2604.21076 2026-04-24 cs.CL cs.AI

Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation

Sanjoy Pator

Comments 14 pages, 7 figures, independent research

Abstract

Medication reconciliation at clinical handoffs is a high-stakes, error-prone process. Large language models are increasingly proposed to assist with this task using FHIR-structured patient records, but a fundamental and largely unstudied variable is how the FHIR data is serialised before being passed to the model. We present the first systematic comparison of four FHIR serialisation strategies (Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline) across five open-weight models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B) on a controlled benchmark of 200 synthetic patients, totalling 4,000 inference runs. We find that serialisation strategy has a large, statistically significant effect on performance for models up to 8B parameters: Clinical Narrative outperforms Raw JSON by up to 19 F1 points for Mistral-7B (r = 0.617, p < 10^{-10}). This advantage reverses at 70B, where Raw JSON achieves the best mean F1 of 0.9956. In all 20 model and strategy combinations, mean precision exceeds mean recall: omission is the dominant failure mode, with models more often missing an active medication than fabricating one, which changes how clinical safety auditing priorities should be set. Smaller models plateau at roughly 7-10 concurrent active medications, leaving polypharmacy patients, the patients most at risk from reconciliation errors, systematically underserved. BioMistral-7B, a domain-pretrained model without instruction tuning, produces zero usable output in all conditions, showing that domain pretraining alone is not sufficient for structured extraction. These results offer practical, evidence-based format recommendations for clinical LLM deployment: Clinical Narrative for models up to 8B, Raw JSON for 70B and above. The complete pipeline is reproducible on open-source tools running on an AWS g6e.xlarge instance (NVIDIA L40S, 48 GB VRAM).
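
To make the abstract's two central ideas concrete, here is a minimal sketch, not the paper's pipeline. It assumes a heavily simplified medication record (the field names `medication`, `dose`, and `status` are hypothetical and are not the FHIR schema), shows two of the four serialisation styles in spirit (raw JSON and a narrative sentence), and computes set-based precision, recall, and F1 to illustrate why an omission lowers recall while leaving precision untouched.

```python
import json

# Hypothetical, simplified records; real FHIR resources are far richer.
records = [
    {"medication": "metformin", "dose": "500 mg", "status": "active"},
    {"medication": "lisinopril", "dose": "10 mg", "status": "active"},
    {"medication": "atorvastatin", "dose": "20 mg", "status": "active"},
    {"medication": "apixaban", "dose": "5 mg", "status": "active"},
]


def to_raw_json(recs):
    """Serialise the records as indented JSON text."""
    return json.dumps(recs, indent=2)


def to_narrative(recs):
    """Serialise the active records as one plain-language sentence."""
    active = [f"{r['medication']} {r['dose']}" for r in recs if r["status"] == "active"]
    return "The patient is currently taking " + ", ".join(active) + "."


def precision_recall_f1(predicted, reference):
    """Set-based precision, recall, and F1 over medication names."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


reference = [r["medication"] for r in records]
# A model output that omits one active medication but fabricates none.
predicted = ["metformin", "lisinopril", "atorvastatin"]

precision, recall, f1 = precision_recall_f1(predicted, reference)
print(to_narrative(records))
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f}")
```

With one of four medications omitted and nothing fabricated, precision stays at 1.00 while recall drops to 0.75, which is the precision-above-recall pattern the abstract reports across its model and strategy combinations.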

2604.21070 2026-04-24 cs.CL cs.LG

DWTSumm: Discrete Wavelet Transform for Document Summarization

Rana Salama, Abdou Youssef, Mona Diab

2604.21061 2026-04-24 cs.AI

InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language

Nicklas Neu, Thomas Ebner, Jasmin Primus, Raphael Zefferer, Bernhard Schenkenfelder, Mathias Brunbauer, Florian Kromp

Comments 15 pages, 2 figures

2604.21057 2026-04-24 cs.CL

TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping

Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, John D. Kelleher

2604.21053 2026-04-24 cs.RO cs.CV

Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains

Fatemeh Ziaeetabar

2604.21045 2026-04-24 cs.CL

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

Siqi Ouyang, Shuoyang Ding, Oleksii Hrinchuk, Vitaly Lavrukhin, Brian Yan, Boris Ginsburg, Lei Li

Comments ACL 2026 Oral

2604.21044 2026-04-24 cs.AI

Active Data

Richard Arthur, Virginia DiDomizio, Louis Hoebel

Comments 9 pages, 7 figures, 2 tables

2604.21042 2026-04-24 cs.LG

Interpretable Quantile Regression by Optimal Decision Trees

Valentin Lemaire, Gaël Aglin, Siegfried Nijssen