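arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.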

2604.06988 2026-04-09 cs.CV

Canopy Tree Height Estimation Using Quantile Regression: Modeling and Evaluating Uncertainty in Remote Sensing

Karsten Schrödter, Jan Pauls, Fabian Gieseke

Comments Accepted to AISTATS 2026

2604.06987 2026-04-09 cs.CV cs.AI cs.CR

CAAP: Capture-Aware Adversarial Patch Attacks on Palmprint Recognition Models

Renyang Liu, Jiale Li, Jie Zhang, Cong Wu, Xiaojun Jia, Shuxin Li, Wei Zhou, Kwok-Yan Lam, See-kiong Ng

Abstract

Palmprint recognition is deployed in security-critical applications, including access control and palm-based payment, due to its contactless acquisition and highly discriminative ridge-and-crease textures. However, the robustness of deep palmprint recognition systems against physically realizable attacks remains insufficiently understood. Existing studies are largely confined to the digital setting and do not adequately account for the texture-dominant nature of palmprint recognition or the distortions introduced during physical acquisition. To address this gap, we propose CAAP, a capture-aware adversarial patch framework for palmprint recognition. CAAP learns a universal patch that can be reused across inputs while remaining effective under realistic acquisition variation. To match the structural characteristics of palmprints, the framework adopts a cross-shaped patch topology, which enlarges spatial coverage under a fixed pixel budget and more effectively disrupts long-range texture continuity. CAAP further integrates three modules: ASIT for input-conditioned patch rendering, RaS for stochastic capture-aware simulation, and MS-DIFE for feature-level identity-disruptive guidance. We evaluate CAAP on the Tongji, IITD, and AISEC datasets against generic CNN backbones and palmprint-specific recognition models. Experiments show that CAAP achieves strong untargeted and targeted attack performance with favorable cross-model and cross-dataset transferability. The results further show that, although adversarial training can partially reduce the attack success rate, substantial residual vulnerability remains. These findings indicate that deep palmprint recognition systems remain vulnerable to physically realizable, capture-aware adversarial patch attacks, underscoring the need for more effective defenses in practice. Code available at https://github.com/ryliu68/CAAP.
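
The abstract's claim that a cross-shaped patch "enlarges spatial coverage under a fixed pixel budget" can be illustrated with a small geometric sketch. This is not the CAAP implementation: the function names, the 128x128 canvas, and the patch dimensions below are all hypothetical choices made only to show that a cross and a square with the same pixel count span very different extents.

```python
import numpy as np

def square_mask(size: int, side: int) -> np.ndarray:
    """Centered square patch mask on a size x size canvas (hypothetical helper)."""
    mask = np.zeros((size, size), dtype=bool)
    start = (size - side) // 2
    mask[start:start + side, start:start + side] = True
    return mask

def cross_mask(size: int, arm_length: int, thickness: int) -> np.ndarray:
    """Centered plus-shaped mask: one horizontal and one vertical bar (hypothetical helper)."""
    mask = np.zeros((size, size), dtype=bool)
    c0 = (size - arm_length) // 2   # where each bar starts along its long axis
    t0 = (size - thickness) // 2    # where each bar starts along its short axis
    mask[t0:t0 + thickness, c0:c0 + arm_length] = True  # horizontal bar
    mask[c0:c0 + arm_length, t0:t0 + thickness] = True  # vertical bar
    return mask

def extent(mask: np.ndarray) -> tuple[int, int]:
    """Height and width of the bounding box around the masked pixels."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return int(rows[-1] - rows[0] + 1), int(cols[-1] - cols[0] + 1)

# Illustrative sizes only: both masks cover the same number of pixels.
square = square_mask(128, side=28)                    # 28 * 28 pixels
cross = cross_mask(128, arm_length=53, thickness=8)   # 2 * 53 * 8 - 8 * 8 pixels

print("pixel budget:", int(square.sum()), int(cross.sum()))
print("square extent:", extent(square))
print("cross extent:", extent(cross))
```

With these assumed sizes the two masks use the same pixel budget, but the cross spans almost twice the height and width of the square, so it intersects more of the long ridge and crease lines that the abstract identifies as the discriminative texture.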

2604.06985 2026-04-09 cs.LG cs.AI

Frailty Estimation in Elderly Oncology Patients Using Multimodal Wearable Data and Multi-Instance Learning

Ioannis Kyprakis, Vasileios Skaramagkas, Georgia Karanasiou, Lampros Lakkas, Andri Papakonstantinou, Domen Ribnikar, Kalliopi Keramida, Dorothea Tsekoura, Ketti Mazzocco, Anastasia Constantinidou, Konstantinos Marias, Dimitrios I. Fotiadis, Manolis Tsiknakis

Comments 7 pages, 1 figure, under review for IEEE EMBC 2026

2604.06972 2026-04-09 cs.RO cs.MA

Differentiable Environment-Trajectory Co-Optimization for Safe Multi-Agent Navigation

Zhan Gao, Gabriele Fadini, Stelian Coros, Amanda Prorok

2604.06966 2026-04-09 cs.CV

MAR-GRPO: Stabilized GRPO for AR-diffusion Hybrid Image Generation

Xiaoxiao Ma, Jiachen Lei, Tianfei Ren, Jie Huang, Siming Fu, Aiming Hao, Jiahong Wu, Xiangxiang Chu, Feng Zhao

2604.06954 2026-04-09 cs.CV

Compression as an Adversarial Amplifier Through Decision Space Reduction

Lewis Evans, Harkrishan Jandu, Zihan Ye, Yang Lu, Shreyank N Gowda

2604.06949 2026-04-09 cs.RO

Learning-Based Strategy for Composite Robot Assembly Skill Adaptation

Khalil Abuibaid, Aleksandr Sidorenko, Achim Wagner, Martin Ruskowski

Comments Accepted at RAAD 2026 (Springer). 6 pages, 4 figures

2604.06943 2026-04-09 cs.RO

Sustainable Transfer Learning for Adaptive Robot Skills

Khalil Abuibaid, Vinit Hegiste, Nigora Gafur, Achim Wagner, Martin Ruskowski

Comments Published in RAAD 2025 (Springer). 7 pages, 5 figures

2604.06938 2026-04-09 cs.CV

POS-ISP: Pipeline Optimization at the Sequence Level for Task-aware ISP

Jiyun Won, Heemin Yang, Woohyeok Kim, Jungseul Ok, Sunghyun Cho

2604.06934 2026-04-09 cs.CV cs.AI

Multi-modal user interface control detection using cross-attention

Milad Moradi, Ke Yan, David Colwell, Matthias Samwald, Rhona Asgari

2604.06932 2026-04-09 cs.RO

Towards Multi-Object Nonprehensile Transportation via Shared Teleoperation: A Framework Based on Virtual Object Model Predictive Control

Xinyang Fan, Zhaoyang Chen, Shu Xin, Yi Ren, Zainan Jiang, Fenglei Ni, Hong Liu

2604.06916 2026-04-09 cs.LG cs.AI cs.CV

FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

Yitong Li, Junsong Chen, Shuchen Xue, Pengcuo Zeren, Siyuan Fu, Dinghao Yang, Yangyang Tang, Junjie Bai, Ping Luo, Song Han, Enze Xie

2604.06914 2026-04-09 cs.LG

Equivariant Multi-agent Reinforcement Learning for Multimodal Vehicle-to-Infrastructure Systems

Charbel Bou Chaaya, Mehdi Bennis

2604.06912 2026-04-09 cs.CV cs.AI

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

Yuheng Shi, Xiaohuan Pei, Linfeng Wen, Minjing Dong, Chang Xu

Comments 16 pages, 9 figures

2604.06906 2026-04-09 cs.CL cs.AI cs.CY

The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era

Rudra Jadhav, Janhavi Danve

Comments 11 pages, 12 figures, 2 tables, 17 references. Code and data available at

2604.06903 2026-04-09 cs.CL

Is Biomedical Specialization Still Worth It? Insights from Domain-Adaptive Language Modelling with a New French Health Corpus

Aidan Mannion, Cécile Macaire, Armand Violle, Stéphane Ohayon, Xavier Tannier, Didier Schwab, Lorraine Goeuriot, François Portet

2604.06902 2026-04-09 cs.CL

iTAG: Inverse Design for Natural Text Generation with Accurate Causal Graph Annotations

Wenshuo Wang, Boyu Cao, Nan Zhuang, Wei Li

Comments Accepted at ACL 2026

2604.06896 2026-04-09 cs.LG cs.SE physics.bio-ph

VertAX: a differentiable vertex model for learning epithelial tissue mechanics

Alessandro Pasqui, Jim Martin Catacora Ocana, Anshuman Sinha, Matthieu Perez, Fabrice Delbary, Giorgio Gosti, Mattia Miotto, Domenico Caudo, Maxence Ernoult, Hervé Turlier

Comments 28 pages, 4 figures

2604.06883 2026-04-09 cs.CV

SCT-MOT: Enhancing Air-to-Air Multiple UAVs Tracking with Swarm-Coupled Motion and Trajectory Guidance

Zhaochen Chu, Tao Song, Ren Jin, Shaoming He, Defu Lin, Siqing Cheng

Comments 17 pages, 7 figures. Under review at IEEE Transactions on Aerospace and Electronic Systems (TAES). This work has been submitted to the IEEE for possible publication

Abstract

Air-to-air tracking of swarm UAVs presents significant challenges due to the complex nonlinear group motion and weak visual cues for small objects, which often cause detection failures, trajectory fragmentation, and identity switches. Although existing methods have attempted to improve performance by incorporating trajectory prediction, they model each object independently, neglecting the swarm-level motion dependencies. Their limited integration between motion prediction and appearance representation also weakens the spatio-temporal consistency required for tracking in visually ambiguous and cluttered environments, making it difficult to maintain coherent trajectories and reliable associations. To address these challenges, we propose SCT-MOT, a tracking framework that integrates Swarm-Coupled motion modeling and Trajectory-guided feature fusion. First, we develop a Swarm Motion-Aware Trajectory Prediction (SMTP) module that jointly models historical trajectories and posture-aware appearance features from a swarm-level perspective, enabling more accurate forecasting of the nonlinear, coupled group trajectories. Second, we design a Trajectory-Guided Spatio-Temporal Feature Fusion (TG-STFF) module that aligns predicted positions with historical visual cues and deeply integrates them with current frame features, enhancing temporal consistency and spatial discriminability for weak objects. Extensive experiments on three public air-to-air swarm UAV tracking datasets, including AIRMOT, MOT-FLY, and UAVSwarm, demonstrate that SMTP achieves more accurate trajectory forecasts and yields a 1.21% IDF1 improvement over the state-of-the-art trajectory prediction module EqMotion when integrated into the same MOT framework. Overall, our SCT-MOT consistently achieves superior accuracy and robustness compared to state-of-the-art trackers across multiple metrics under complex swarm scenarios.

2604.06882 2026-04-09 cs.RO cs.SY eess.SP eess.SY

Telecom World Models: Unifying Digital Twins, Foundation Models, and Predictive Planning for 6G

Hang Zou, Yuzhi Yang, Lina Bariah, Yu Tian, Yuhuan Lu, Bohao Wang, Anis Bara, Brahim Mefgouda, Hao Liu, Yiwei Tao, Sergy Petrov, Salma Cheour, Nassim Sehad, Sumudu Samarakoon, Chongwen Huang, Samson Lasaulce, Mehdi Bennis, Mérouane Debbah

Abstract

The integration of machine learning tools into telecom networks has led to two prevailing paradigms, namely, language-based systems, such as Large Language Models (LLMs), and physics-based systems, such as Digital Twins (DTs). While LLM-based approaches enable flexible interaction and automation, they lack explicit representations of network dynamics. DTs, in contrast, offer a high-fidelity network simulation, but remain scenario-specific and are not designed for learning or decision-making under uncertainty. This gap becomes critical for 6G systems, where decisions must take into account the evolving network states, uncertainty, and the cascading effects of control actions across multiple layers. In this article, we introduce the Telecom World Model (TWM) concept, an architecture for learned, action-conditioned, uncertainty-aware modeling of telecom system dynamics. We decompose the problem into two interacting worlds, a controllable system world consisting of operator-configurable settings and an external world that captures propagation, mobility, traffic, and failures. We propose a three-layer architecture, comprising a field world model for spatial environment prediction, a control/dynamics world model for action-conditioned Key Performance Indicator (KPI) trajectory prediction, and a telecom foundation model layer for intent translation and orchestration. We showcase a comparative analysis between existing paradigms, which demonstrates that TWM jointly provides telecom state grounding, fast action-conditioned roll-outs, calibrated uncertainty, multi-timescale dynamics, model-based planning, and LLM-integrated guardrails. Furthermore, we present a proof-of-concept on network slicing to validate the proposed architecture, showing that the full three-layer pipeline outperforms single-world baselines and accurately predicts KPI trajectories.

2604.06871 2026-04-09 cs.CL cs.AI

Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models

Bajian Xiang, Tingwei Guo, Xuan Chen, Yang Han

Comments Accepted to ACL 2026 (Findings)

2604.06870 2026-04-09 cs.CV

RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details

Dewei Zhou, You Li, Zongxin Yang, Yi Yang

Comments 18 pages

2604.06865 2026-04-09 cs.CV cs.AI

Physical Adversarial Attacks on AI Surveillance Systems: Detection, Tracking, and Visible-Infrared Evasion

Miguel A. DelaCruz, Patricia Mae Santos, Rafael T. Navarro

2604.06854 2026-04-09 cs.CL

To Adapt or not to Adapt, Rethinking the Value of Medical Knowledge-Aware Large Language Models

Ane G. Domingo-Aldama, Iker De La Iglesia, Maitane Urruela, Aitziber Atutxa, Ander Barrena

Abstract

BACKGROUND: Recent studies have shown that domain-adapted large language models (LLMs) do not consistently outperform general-purpose counterparts on standard medical benchmarks, raising questions about the need for specialized clinical adaptation. METHODS: We systematically compare general and clinical LLMs on a diverse set of multiple-choice clinical question answering tasks in English and Spanish. We introduce a perturbation-based evaluation benchmark that probes model robustness, instruction following, and sensitivity to adversarial variations. Our evaluation includes one-step and two-step question transformations, multi-prompt testing, and instruction-guided assessment. We analyze a range of state-of-the-art clinical models and their general-purpose counterparts, focusing on Llama 3.1-based models. Additionally, we introduce Marmoka, a family of lightweight 8B-parameter clinical LLMs for English and Spanish, developed via continual domain-adaptive pretraining on medical corpora and instructions. RESULTS: The experiments show that clinical LLMs do not consistently outperform their general-purpose counterparts on English clinical tasks, even under the proposed perturbation-based benchmark. However, for the Spanish subsets, the proposed Marmoka models obtain better results compared to Llama. CONCLUSIONS: Our results show that, under current short-form MCQA benchmarks, clinical LLMs offer only marginal and unstable improvements over general-purpose models in English, suggesting that existing evaluation frameworks may be insufficient to capture genuine medical expertise. We further find that both general and clinical models exhibit substantial limitations in instruction following and strict output formatting. Finally, we demonstrate that robust medical LLMs can be successfully developed for low-resource languages such as Spanish, as evidenced by the Marmoka models.

2604.06849 2026-04-09 cs.CV

Vision-Language Model-Guided Deep Unrolling Enables Personalized, Fast MRI

Fangmao Ju, Yuzhu He, Zhiwen Xue, Chunfeng Lian, Jianhua Ma

2604.06846 2026-04-09 cs.CL cs.AI

MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors

Xiaotian Luo, Xun Jiang, Jiangcheng Wu

Comments 9 pages, 4 figures, 9 tables. Preprint

Abstract

Interactive medical dialogue benchmarks have shown that LLM diagnostic accuracy degrades significantly when interacting with non-cooperative patients, yet existing approaches either apply adversarial behaviors without graded severity or case-specific grounding, or reduce patient non-cooperation to a single ungraded axis, and none analyze cross-dimension interactions. We introduce MedDialBench, a benchmark enabling controlled, dose-response characterization of how individual patient behavior dimensions affect LLM diagnostic robustness. It decomposes patient behavior into five dimensions (Logic Consistency, Health Cognition, Expression Style, Disclosure, and Attitude), each with graded severity levels and case-specific behavioral scripts. This controlled factorial design enables graded sensitivity analysis, dose-response profiling, and cross-dimension interaction detection. Evaluating five frontier LLMs across 7,225 dialogues (85 cases × 17 configurations × 5 models), we find a fundamental asymmetry: information pollution (fabricating symptoms) produces 1.7-3.4x larger accuracy drops than information deficit (withholding information), and fabricating is the only configuration achieving statistical significance across all five models (McNemar p < 0.05). Among six dimension combinations, fabricating is the sole driver of super-additive interaction: all three fabricating-involving pairs produce O/E ratios of 0.70-0.81 (35-44% of eligible cases fail under the combination despite succeeding under each dimension alone), while all non-fabricating pairs show purely additive effects (O/E ~ 1.0). Inquiry strategy moderates deficit but not pollution: exhaustive questioning recovers withheld information, but cannot compensate for fabricated inputs. Models exhibit distinct vulnerability profiles, with worst-case drops ranging from 38.8 to 54.1 percentage points.
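
Two pieces of arithmetic in this abstract can be checked with a short sketch: the dialogue count implied by the factorial design, and the kind of paired test named by "McNemar p < 0.05". The abstract does not say which McNemar variant was used, so the exact binomial form below is an assumption, and the discordant-pair counts in the example are invented for illustration, not taken from the paper.

```python
from math import comb

def total_dialogues(cases: int, configurations: int, models: int) -> int:
    """Size of a full factorial evaluation: every case under every configuration for every model."""
    return cases * configurations * models

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases answered correctly at baseline but incorrectly under the perturbation
    c: cases answered incorrectly at baseline but correctly under the perturbation
    Under the null hypothesis, each discordant pair falls in b or c with probability 0.5.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Figures stated in the abstract: 85 cases, 17 configurations, 5 models.
print(total_dialogues(85, 17, 5))

# Hypothetical discordant counts for one model and one configuration (not from the paper).
p_value = mcnemar_exact(b=20, c=5)
print(f"p = {p_value:.4f}")
```

The test uses only the cases whose outcome changes between the two conditions, which is why it suits a design where the same 85 cases are replayed under each patient-behavior configuration.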

2604.06845 2026-04-09 cs.CL cs.AI

HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Yijie Zhong, Yunfan Gao, Haofen Wang

Comments Accepted by TheWebConf 2026

2604.06844 2026-04-09 cs.CV

CloudMamba: An Uncertainty-Guided Dual-Scale Mamba Network for Cloud Detection in Remote Sensing Imagery

Jiajun Yang, Keyan Chen, Zhengxia Zou, Zhenwei Shi

2604.06838 2026-04-09 cs.AI cs.LG

Explaining Neural Networks in Preference Learning: a Post-hoc Inductive Logic Programming Approach

Daniele Fossemò, Filippo Mignosi, Giuseppe Placidi, Luca Raggioli, Matteo Spezialetti, Fabio Aurelio D'Asaro

Comments Under consideration for publication in Theory and Practice of Logic Programming (TPLP)

2604.06837 2026-04-09 cs.LG

Contraction-Aligned Analysis of Soft Bellman Residual Minimization with Weighted Lp-Norm for Markov Decision Problem

Hyukjun Yang, Han-Dong Lim, Donghwan Lee