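arXivDaily: a daily academic digest that syncs the full arXiv dataset and provides AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.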

Gernot Fiala, Markus Plass, Robert Harb, Peter Regitnig, Kristijan Skok, Wael Al Zoughbi, Carmen Zerner, Paul Torke, Michaela Kargl, Heimo Müller, Tomas Brazdil, Matej Gallo, Jaroslav Kubín, Roman Stoklasa, Rudolf Nenutil, Norman Zerbe, Andreas Holzinger, Petr Holub

Journal ref Artificial Intelligence in Medicine, Volume 174 (2026), 103368

2508.14746 2026-02-16 cs.LG

MissionHD: Hyperdimensional Refinement of Distribution-Deficient Reasoning Graphs for Video Anomaly Detection

Sanggeon Yun, Raheeb Hassan, Ryozo Masukawa, Nathaniel D. Bastian, Mohsen Imani

2508.08275 2026-02-16 cs.CL cs.AI

MLLM-CTBench: A Benchmark for Continual Instruction Tuning with Reasoning Process Diagnosis

Haiyun Guo, Zhiyan Hou, Yandu Sun, Jinghan He, Yu Chen, Yuzhe Zhou, Yuheng Jia, Jinqiao Wang, Tat-Seng Chua

Comments under review

Abstract

Continual instruction tuning (CIT) during the post-training phase is crucial for adapting multimodal large language models (MLLMs) to evolving real-world demands. However, progress is hampered by the lack of benchmarks with rigorous, protocol-consistent evaluation. To bridge this gap, we introduce MLLM-CTBench, a comprehensive benchmark for CIT of MLLMs, covering seven challenging tasks across six diverse domains. MLLM-CTBench makes three key contributions. First, we establish a multidimensional evaluation framework that jointly assesses final-answer accuracy and process-level reasoning quality, where Chain-of-Thought (CoT) traces serve as an observable signal to diagnose catastrophic forgetting beyond answer-only evaluation. Second, we conduct a large-scale evaluation of continual learning methods by systematically assessing eight representative algorithms from four major families under a unified protocol across task orders, providing actionable insights for algorithm design. Third, we expand the scope from Supervised Fine-Tuning (SFT) to Reinforcement Fine-Tuning (RFT) in CIT. By investigating GRPO, an on-policy RL algorithm that stabilizes updates through explicit KL-divergence control to a prior policy, we aim to analyze how this mechanism affects cross-task knowledge retention. Our experiments yield several findings: (1) Process-level reasoning quality is often more resilient to catastrophic forgetting than final-answer accuracy, and forgetting is primarily driven by degradation in domain knowledge. (2) Model capability is a critical factor influencing continual learning outcomes, with stronger baseline models exhibiting greater resistance to catastrophic forgetting. (3) On-policy RFT (GRPO), with its inherent KL control, achieves more stable cross-task retention than SFT, whereas removing KL control can amplify forgetting despite potential gains on new tasks.
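
As a rough illustration of the KL control this abstract refers to, the following is a simplified, sequence-level sketch of a GRPO-style objective: rewards are normalized within a group of sampled responses, and a penalty discourages drift from a reference (prior) policy. This is not the paper's implementation. The function names, the `beta` and `clip` values, and the toy log-probabilities are all assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_estimate(logp_new, logp_ref):
    """Non-negative per-sample KL estimate: exp(d) - d - 1, d = logp_ref - logp_new."""
    d = logp_ref - logp_new
    return math.exp(d) - d - 1.0


def kl_regularized_objective(logp_new, logp_old, logp_ref, rewards,
                             beta=0.04, clip=0.2):
    """Clipped surrogate minus a KL penalty, averaged over the group.

    beta and clip are illustrative values, not taken from the paper.
    Setting beta=0 removes the KL control entirely.
    """
    advantages = group_relative_advantages(rewards)
    total = 0.0
    for ln, lo, lr, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(ln - lo)
        clipped = max(min(ratio, 1.0 + clip), 1.0 - clip)
        surrogate = min(ratio * adv, clipped * adv)
        total += surrogate - beta * kl_estimate(ln, lr)
    return total / len(rewards)


# Toy group of four responses (hypothetical numbers).
rewards = [1.0, 0.0, 1.0, 0.0]
logp_old = [-2.0, -2.0, -2.0, -2.0]
logp_new = [-1.5, -2.5, -1.5, -2.5]   # current policy has drifted
logp_ref = [-2.0, -2.0, -2.0, -2.0]   # frozen prior policy

with_kl = kl_regularized_objective(logp_new, logp_old, logp_ref, rewards, beta=0.1)
without_kl = kl_regularized_objective(logp_new, logp_old, logp_ref, rewards, beta=0.0)
print(with_kl, without_kl)
```

With `beta=0` the objective ignores how far the policy has moved from the reference, which is the "removing KL control" setting the abstract contrasts against.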

2508.07675 2026-02-16 cs.LG

Semantic Caching for Low-Cost LLM Serving: From Offline Learning to Online Adaptation

Xutong Liu, Baran Atalar, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, John C. S. Lui, Wei Chen, Carlee Joe-Wong

Comments Accepted to INFOCOM 2026

2508.06095 2026-02-16 cs.RO

Incremental Language Understanding for Online Motion Planning of Robot Manipulators

Mitchell Abrams, Thies Oelerich, Christian Hartl-Nesic, Andreas Kugi, Matthias Scheutz

Comments 8 pages, 9 figures, accepted at IROS 2025

2508.05004 2026-02-16 cs.LG cs.AI cs.CL

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu

2508.02872 2026-02-16 cs.CL cs.LG

Highlight & Summarize: RAG without the jailbreaks

Giovanni Cherubin, Andrew Paverd

2508.01669 2026-02-16 cs.LG cs.DC

Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models

Ziru Niu, Hai Dong, A. K. Qin

Comments Accepted by ICLR 2026 (poster)

Journal ref ICLR 2026

2508.01504 2026-02-16 cs.LG

Instruction-based Time Series Editing

Jiaxing Qiu, Dongliang Guo, Brynne Sullivan, Teague R. Henry, Thomas Hartvigsen

Comments (KDD 26) Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1

2507.19561 2026-02-16 cs.LG

Harnessing intuitive local evolution rules for physical learning

Roie Ezraty, Menachem Stern, Shmuel M. Rubinstein

Comments 26 pages, 6 figures (with appendices). Submitted to Physical Review E

2507.06971 2026-02-16 cs.CV cs.RO eess.IV

Hallucinating 360°: Panoramic Street-View Generation via Local Scenes Diffusion and Probabilistic Prompting

Fei Teng, Kai Luo, Sheng Wu, Siyu Li, Pujun Guo, Jiale Wei, Jiaming Zhang, Kunyu Peng, Kailun Yang

Comments Accepted to ICRA 2026. The source code will be publicly available at https://github.com/FeiT-FeiTeng/Percep360

Abstract

Panoramic perception holds significant potential for autonomous driving, enabling vehicles to acquire a comprehensive 360° surround view in a single shot. However, autonomous driving is a data-driven task. Complete panoramic data acquisition requires complex sampling systems and annotation pipelines, which are time-consuming and labor-intensive. Although existing street view generation models have demonstrated strong data regeneration capabilities, they can only learn from the fixed data distribution of existing datasets and cannot leverage stitched pinhole images as a supervisory signal. In this paper, we propose Percep360, the first panoramic generation method for autonomous driving. Percep360 enables coherent generation of panoramic data with control signals based on the stitched panoramic data. Percep360 focuses on two key aspects: coherence and controllability. Specifically, to overcome the inherent information loss caused by the pinhole sampling process, we propose the Local Scenes Diffusion Method (LSDM). LSDM reformulates panorama generation as a spatially continuous diffusion process, bridging the gaps between different data distributions. Additionally, to achieve controllable generation of panoramic images, we propose a Probabilistic Prompting Method (PPM). PPM dynamically selects the most relevant control cues, enabling controllable panoramic image generation. We evaluate the effectiveness of the generated images from three perspectives: image quality assessment (i.e., no-reference and with reference), controllability, and their utility in real-world Bird's Eye View (BEV) segmentation. Notably, the generated data consistently outperforms the original stitched images in no-reference quality metrics and enhances downstream perception models. The source code will be publicly available at https://github.com/FeiT-FeiTeng/Percep360.

2507.04103 2026-02-16 cs.AI cs.LG stat.ML

How to Train Your LLM Web Agent: A Statistical Diagnosis

Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia

2507.03262 2026-02-16 cs.CV cs.AI

Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders

Yizhou Wang, Song Mao, Yang Chen, Yufan Shen, Yinqiao Yan, Pinlong Cai, Ding Wang, Guohang Yan, Zhi Yu, Xuming Hu, Botian Shi

Comments accepted by ICLR2026, project website: https://github.com/MaoSong2022/Encoder-Redundancy

2506.11906 2026-02-16 cs.RO

Palpation Alters Auditory Pain Expressions with Gender-Specific Variations in Robopatients

Chapa Sirithunge, Yue Xie, Saitarun Nadipineni, Fumiya Iida, Thilina Dulantha Lalitharatne

Comments 12 pages, 9 figures, journal

DOI: 10.1109/TMRB.2025.3613785

Abstract

Diagnostic errors remain a major cause of preventable mortality, particularly in resource-limited settings. Medical training simulators, including robopatients, help reduce such errors by replicating patient responses during procedures such as abdominal palpation. However, generating realistic multimodal feedback, especially auditory pain expressions, remains challenging due to the complex, nonlinear relationship between applied palpation forces and perceived pain sounds. The high dimensionality and perceptual variability of pain vocalizations further limit conventional modeling approaches. We propose a novel experimental paradigm for adaptive pain expressivity in robopatients that dynamically generates auditory pain responses to palpation forces using human-in-the-loop machine learning. Specifically, we employ Proximal Policy Optimization (PPO), a reinforcement learning algorithm suited for continuous control, to iteratively refine pain sound generation based on real-time human evaluative feedback. The system initializes randomized mappings between force inputs and sound outputs, and the learning agent progressively adjusts them to align with human perceptual preferences. Results show that the framework adapts to individual palpation behaviors and subjective sound preferences while capturing a broad range of perceived pain intensities, from mild discomfort to acute distress. We also observe perceptual saturation at lower force ranges, with gender-specific thresholds in pain sound perception. This work demonstrates the feasibility of human-in-the-loop reinforcement learning for co-optimizing haptic input and auditory pain expression in medical simulators, highlighting the potential of adaptive and immersive platforms to enhance palpation training and reduce diagnostic errors.

2506.11827 2026-02-16 cs.RO cs.HC

Auditory-Tactile Congruence for Synthesis of Adaptive Pain Expressions in RoboPatients

Saitarun Nadipineni, Chapa Sirithunge, Yue Xie, Fumiya Iida, Thilina Dulantha Lalitharatne

Comments 20 pages, 8 figures, journal

2506.04166 2026-02-16 cs.LG stat.CO stat.ML

N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion

Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi

Comments 21 pages, 6 figures

2505.16814 2026-02-16 cs.CL

Does Synthetic Data Help Named Entity Recognition for Low-Resource Languages?

Gaurav Kamath, Sowmya Vajjala

Comments Accepted at AACL 2025. Camera-ready version

Journal ref https://aclanthology.org/2025.ijcnlp-short.15/

2505.16308 2026-02-16 cs.LG

Beyond All-to-All: Causal-Aligned Transformer with Dynamic Structure Learning for Multivariate Time Series Forecasting

Xingyu Zhang, Hanyun Du, Zeen Song, Siyu Zhao, Changwen Zheng, Wenwen Qiang

2505.12988 2026-02-16 cs.LG

Optimal Formats for Weight Quantisation

Douglas Orr, Luka Ribar, Carlo Luschi

Comments 36 pages, 35 figures

2505.02467 2026-02-16 cs.CV cs.AI

Timing Is Everything: Finding the Optimal Fusion Points in Multimodal Medical Imaging

Valerio Guarrasi, Klara Mogensen, Sara Tassinari, Sara Qvarlander, Paolo Soda

DOI: 10.1109/IJCNN64981.2025.11227201

Abstract

Multimodal deep learning harnesses diverse imaging modalities, such as MRI sequences, to enhance diagnostic accuracy in medical imaging. A key challenge is determining the optimal timing for integrating these modalities, specifically identifying the network layers where fusion modules should be inserted. Current approaches often rely on manual tuning or exhaustive search, which are computationally expensive without any guarantee of converging to optimal results. We propose a sequential forward search algorithm that incrementally activates and evaluates candidate fusion modules at different layers of a multimodal network. At each step, the algorithm retrains from previously learned weights and compares validation loss to identify the best-performing configuration. This process systematically reduces the search space, enabling efficient identification of the optimal fusion timing without exhaustively testing all possible module placements. The approach is validated on two multimodal MRI datasets, each addressing different classification tasks. Our algorithm consistently identified configurations that outperformed unimodal baselines, late fusion, and a brute-force ensemble of all potential fusion placements. These architectures demonstrated superior accuracy, F-score, and specificity while maintaining competitive or improved AUC values. Furthermore, the sequential nature of the search significantly reduced computational overhead, making the optimization process more practical. By systematically determining the optimal timing to fuse imaging modalities, our method advances multimodal deep learning for medical imaging. It provides an efficient and robust framework for fusion optimization, paving the way for improved clinical decision-making and more adaptable, scalable architectures in medical AI applications.
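
The sequential forward search described in this abstract can be sketched as a generic greedy selection loop. This is a minimal sketch of the general technique, not the authors' code: candidate fusion points are plain integers, and `toy_validation_loss` is a hypothetical stand-in for "retrain from the previously learned weights, then measure validation loss".

```python
from typing import Callable, FrozenSet, List, Tuple


def sequential_forward_search(
    candidates: List[int],
    evaluate: Callable[[FrozenSet[int]], float],
) -> Tuple[FrozenSet[int], float]:
    """Greedily activate fusion points while validation loss keeps improving."""
    active: FrozenSet[int] = frozenset()
    best_loss = evaluate(active)
    remaining = set(candidates)
    while remaining:
        # Try adding each remaining candidate to the current configuration.
        trials = [(evaluate(active | {c}), c) for c in sorted(remaining)]
        loss, choice = min(trials)
        if loss >= best_loss:
            break  # no single addition improves the validation loss
        active, best_loss = active | {choice}, loss
        remaining.remove(choice)
    return active, best_loss


def toy_validation_loss(config: FrozenSet[int]) -> float:
    # Hypothetical per-layer gains; layer 3 hurts when fusion is enabled there.
    gains = {0: 0.05, 1: 0.30, 2: 0.20, 3: -0.10}
    return round(1.0 - sum(gains[c] for c in config), 6)


best_config, best_loss = sequential_forward_search([0, 1, 2, 3], toy_validation_loss)
print(sorted(best_config), best_loss)
```

In this toy run the loop evaluates at most 4 + 3 + 2 + 1 configurations plus the empty baseline, rather than all 16 subsets of four candidate layers, which is the kind of search-space reduction the abstract describes.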

2505.01390 2026-02-16 cs.CV cs.AI cs.LG

Multimodal Doctor-in-the-Loop: A Clinically-Guided Explainable Framework for Predicting Pathological Response in Non-Small Cell Lung Cancer

Alice Natalina Caragliano, Claudia Tacconi, Carlo Greco, Lorenzo Nibid, Edy Ippolito, Michele Fiore, Giuseppe Perrone, Sara Ramella, Paolo Soda, Valerio Guarrasi

Comments arXiv admin note: substantial text overlap with arXiv:2502.17503

2505.01096 2026-02-16 cs.CV cs.CL

Evaluating Vision Language Model Adaptations for Radiology Report Generation in Low-Resource Languages

Marco Salmè, Rosa Sicilia, Paolo Soda, Valerio Guarrasi

DOI: 10.1109/IJCNN64981.2025.11227552

Abstract

The integration of artificial intelligence in healthcare has opened new horizons for improving medical diagnostics and patient care. However, challenges persist in developing systems capable of generating accurate and contextually relevant radiology reports, particularly in low-resource languages. In this study, we present a comprehensive benchmark to evaluate the performance of instruction-tuned Vision-Language Models (VLMs) in the specialized task of radiology report generation across three low-resource languages: Italian, German, and Spanish. Employing the LLaVA architectural framework, we conducted a systematic evaluation of pre-trained models utilizing general datasets, domain-specific datasets, and low-resource language-specific datasets. In light of the unavailability of models that possess prior knowledge of both the medical domain and low-resource languages, we analyzed various adaptations to determine the most effective approach for these contexts. The results revealed that language-specific models substantially outperformed both general and domain-specific models in generating radiology reports, emphasizing the critical role of linguistic adaptation. Additionally, models fine-tuned with medical terminology exhibited enhanced performance across all languages compared to models with generic knowledge, highlighting the importance of domain-specific training. We also explored the influence of the temperature parameter on the coherence of report generation, providing insights for optimal model settings. Our findings highlight the importance of tailored language and domain-specific training for improving the quality and accuracy of radiological reports in multilingual settings. This research not only advances our understanding of VLM adaptability in healthcare but also points to significant avenues for future investigations into model tuning and language-specific adaptations.
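
For readers unfamiliar with the temperature parameter mentioned in this abstract, the sketch below shows the standard way temperature rescales next-token logits before sampling. It is a generic illustration with made-up logits, not the evaluation setup used in the paper.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=2.0)
print(sharp)
print(flat)
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution and make sampled text more varied.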

2505.01091 2026-02-16 cs.CV cs.AI

Any-to-Any Vision-Language Model for Multimodal X-ray Imaging and Radiological Report Generation

Daniele Molino, Francesco di Feola, Linlin Shen, Paolo Soda, Valerio Guarrasi

Comments arXiv admin note: substantial text overlap with arXiv:2501.04614

2504.07282 2026-02-16 cs.CL

RAISE: Reinforced Adaptive Instruction Selection For Large Language Models

Qingsong Lv, Yangning Li, Zihua Lan, Zishan Xu, Jiwei Tang, Tingwei Lu, Yinghui Li, Wenhao Jiang, Hong-Gee Kim, Hai-Tao Zheng, Philip S. Yu

Comments Accepted by EMNLP 2025 findings

2504.01342 2026-02-16 cs.CL

Foundations and Evaluations in NLP

Jungyeul Park

Comments Mémoire d'habilitation à diriger des recherches, 2025-2026

2503.24258 2026-02-16 cs.CV cs.AI

Beyond a Single Mode: GAN Ensembles for Diverse Medical Data Generation

Lorenzo Tronchin, Tommy Löfstedt, Paolo Soda, Valerio Guarrasi

2503.22809 2026-02-16 cs.LG cs.AI

Data-Driven Worker Activity Recognition and Efficiency Estimation in Manual Fruit Harvesting

Uddhav Bhattarai, Rajkishan Arikapudi, Steven A. Fennimore, Frank N Martin, Stavros G. Vougioukas

Comments Published in Elsevier Biosystems Engineering

Journal ref Biosystems Engineering, Vol. 261, 104326 (2026)

2502.17503 2026-02-16 cs.LG cs.AI cs.CV eess.IV

Doctor-in-the-Loop: An Explainable, Multi-View Deep Learning Framework for Predicting Pathological Response in Non-Small Cell Lung Cancer

Alice Natalina Caragliano, Filippo Ruffini, Carlo Greco, Edy Ippolito, Michele Fiore, Claudia Tacconi, Lorenzo Nibid, Giuseppe Perrone, Sara Ramella, Paolo Soda, Valerio Guarrasi

2502.09567 2026-02-16 cs.CL cs.AI

MorphNLI: A Stepwise Approach to Natural Language Inference Using Text Morphing

Vlad Andrei Negru, Robert Vacareanu, Camelia Lemnaru, Mihai Surdeanu, Rodica Potolea

Comments 16 pages, 11 figures, 8 tables. Accepted for NAACL 2025 Findings

2501.04614 2026-02-16 cs.AI cs.LG

XGeM: A Multi-Prompt Foundation Model for Multimodal Medical Data Generation

Daniele Molino, Francesco Di Feola, Eliodoro Faiella, Deborah Fazzini, Domiziana Santucci, Linlin Shen, Valerio Guarrasi, Paolo Soda