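arXivDaily: a daily academic digest synchronized with the full arXiv data, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.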

2603.04943 2026-03-06 cs.SD

Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction

Yun Liu, Xuechen Liu, Xiaoxiao Miao, Junichi Yamagishi

Abstract

Target speaker extraction (TSE) aims to isolate a specific speaker's voice from multi-speaker mixtures. Despite strong benchmark results, real-world performance often degrades due to different interacting factors. Previous curriculum learning approaches for TSE typically address these factors separately, failing to capture their complex interactions and relying on predefined difficulty factors that may not align with actual model learning behavior. To address this challenge, we first propose a multi-factor curriculum learning strategy that jointly schedules SNR thresholds, speaker counts, overlap ratios, and synthetic/real proportions, enabling progressive learning from simple to complex scenarios. However, determining optimal scheduling without predefined assumptions remains challenging. We therefore introduce TSE-Datamap, a visualization framework that grounds curriculum design in observed training dynamics by tracking confidence and variability across training epochs. Our analysis reveals three characteristic data regions: (i) easy-to-learn examples where models consistently perform well, (ii) ambiguous examples where models oscillate between alternative predictions, and (iii) hard-to-learn examples where models persistently struggle. Guided by these data-driven insights, our methods improve extraction results over random sampling, with particularly strong gains in challenging multi-speaker scenarios.
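The abstract describes tracking confidence and variability across training epochs and reading three data regions off the result. A minimal sketch of that idea follows. It assumes each training mixture has a per-epoch quality score already normalized to [0, 1], takes confidence as the mean and variability as the population standard deviation of those scores, and uses hypothetical thresholds; the function name, the score definition, and the cut-offs are illustrative and are not taken from the paper.

```python
from statistics import mean, pstdev


def data_map_regions(scores_per_epoch, conf_threshold=0.5, var_threshold=0.2):
    """Assign each example to a data-map region.

    scores_per_epoch maps an example id to its per-epoch scores in [0, 1].
    Both thresholds are hypothetical values chosen for illustration.
    """
    regions = {}
    for example_id, scores in scores_per_epoch.items():
        confidence = mean(scores)      # average score across epochs
        variability = pstdev(scores)   # spread of the score across epochs
        if variability >= var_threshold:
            region = "ambiguous"       # the model oscillates on this example
        elif confidence >= conf_threshold:
            region = "easy-to-learn"   # consistently good
        else:
            region = "hard-to-learn"   # consistently poor
        regions[example_id] = (confidence, variability, region)
    return regions


# Toy training history with one example per region (made-up numbers).
history = {
    "mix_a": [0.90, 0.92, 0.95, 0.94],
    "mix_b": [0.20, 0.80, 0.30, 0.90],
    "mix_c": [0.10, 0.15, 0.10, 0.20],
}
for example_id, (conf, var, region) in data_map_regions(history).items():
    print(example_id, round(conf, 3), round(var, 3), region)
```

A curriculum could then, for instance, sample more heavily from one region at a given training stage; how the paper actually schedules the regions is not stated in the abstract.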

2603.04938 2026-03-06 cs.CV cs.LG cs.RO

Person Detection and Tracking from an Overhead Crane LiDAR

Nilusha Jayawickrama, Henrik Toikka, Risto Ojala

Comments 8 pages, 7 figures, 4 tables. Submitted to Ubiquitous Robots (UR) 2026. Code: https://github.com/nilushacj/O-LiPeDeT-Overhead-LiDAR-Person-Detection-and-Tracking

2603.04936 2026-03-06 cs.LG

Semantic Communication-Enhanced Split Federated Learning for Vehicular Networks: Architecture, Challenges, and Case Study

Lu Yu, Zheng Chang, Ying-Chang Liang

Comments Accepted for publication in IEEE Communications Magazine. 7 pages, 5 figures

2603.04933 2026-03-06 cs.CL

AILS-NTUA at SemEval-2026 Task 3: Efficient Dimensional Aspect-Based Sentiment Analysis

Stavros Gazetas, Giorgos Filandrianos, Maria Lymperaiou, Paraskevi Tzouveli, Athanasios Voulodimos, Giorgos Stamou

2603.04921 2026-03-06 cs.CL

AILS-NTUA at SemEval-2026 Task 10: Agentic LLMs for Psycholinguistic Marker Extraction and Conspiracy Endorsement Detection

Panagiotis Alexios Spanakis, Maria Lymperaiou, Giorgos Filandrianos, Athanasios Voulodimos, Giorgos Stamou

2603.04920 2026-03-06 cs.AI

Knowledge-informed Bidding with Dual-process Control for Online Advertising

Huixiang Luo, Longyu Gao, Yaqi Liu, Qianqian Chen, Pingchun Huang, Tianning Li

2603.04918 2026-03-06 cs.LG cs.AI

BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning

Yuan Li, Bo Wang, Yufei Gao, Yuqian Yao, Xinyuan Wang, Zhangyue Yin, Xipeng Qiu

Comments Code available at https://github.com/OpenMOSS/BandPO.git

2603.04915 2026-03-06 cs.LG cs.AI cs.CR

EVMbench: Evaluating AI Agents on Smart Contract Security

Justin Wang, Andreas Bigger, Xiaohai Xu, Justin W. Lin, Andy Applebaum, Tejal Patwardhan, Alpin Yukseloglu, Olivia Watkins

2603.04914 2026-03-06 cs.RO cs.SY eess.SY math.OC

U-OBCA: Uncertainty-Aware Optimization-Based Collision Avoidance via Wasserstein Distributionally Robust Chance Constraints

Zehao Wang, Yuxuan Tang, Han Zhang, Jingchuan Wang, Weidong Chen

2603.04913 2026-03-06 cs.RO cs.CV

Beyond the Patch: Exploring Vulnerabilities of Visuomotor Policies via Viewpoint-Consistent 3D Adversarial Object

Chanmi Lee, Minsung Yoon, Woojae Kim, Sebin Lee, Sung-eui Yoon

Comments 8 pages, 10 figures, Accepted to ICRA 2026. Project page: https://chan-mi-lee.github.io/3DAdvObj/

2603.04910 2026-03-06 cs.RO cs.AI cs.LG

VPWEM: Non-Markovian Visuomotor Policy with Working and Episodic Memory

Yuheng Lei, Zhixuan Liang, Hongyuan Zhang, Ping Luo

2603.04908 2026-03-06 cs.CV

AdaIAT: Adaptively Increasing Attention to Generated Text to Alleviate Hallucinations in LVLM

Li'an Zhong, Ziqiang He, Jibin Zheng, Jin Li, Z. Jane Wang, Xiangui Kang

2603.04904 2026-03-06 cs.AI cs.CL

Alignment Backfire: Language-Dependent Reversal of Safety Interventions Across 16 Languages in LLM Multi-Agent Systems

Hiroki Fukui

Comments 89 pages, 4 figures, 4 supplementary figures, 12 supplementary tables; preprint

2603.04900 2026-03-06 cs.AI

EvoTool: Self-Evolving Tool-Use Policy Optimization in LLM Agents via Blame-Aware Mutation and Diversity-Aware Selection

Shuo Yang, Soyeon Caren Han, Xueqi Ma, Yan Li, Mohammad Reza Ghasemi Madani, Eduard Hovy

Comments Work under review, 9 pages, 5 figures

2603.04899 2026-03-06 cs.CV

FC-VFI: Faithful and Consistent Video Frame Interpolation for High-FPS Slow Motion Video Generation

Ganggui Ding, Hao Chen, Xiaogang Xu

Comments ICASSP2026

2603.04898 2026-03-06 cs.LG cs.NI

U-Parking: Distributed UWB-Assisted Autonomous Parking System with Robust Localization and Intelligent Planning

Yiang Wu, Qiong Wu, Pingyi Fan, Kezhi Wang, Wen Chen, Guoqiang Mao, Khaled B. Letaief

Comments This paper has been accepted by INFOCOM. The source code has been released at: https://github.com/qiongwu86/U-Parking

2603.04896 2026-03-06 cs.AI

Authorize-on-Demand: Dynamic Authorization with Legality-Aware Intellectual Property Protection for VLMs

Lianyu Wang, Meng Wang, Huazhu Fu, Daoqiang Zhang

2603.04894 2026-03-06 cs.AI

Differentially Private Multimodal In-Context Learning

Ivoline C. Ngong, Zarreen Reza, Joseph P. Near

2603.04893 2026-03-06 cs.CL cs.AI

Free Lunch for Pass@$k$? Low Cost Diverse Sampling for Diffusion Language Models

Sean Lamont, Christian Walder, Paul Montague, Amir Dezfouli, Michael Norrish

2603.04892 2026-03-06 cs.CV

Locality-Attending Vision Transformer

Sina Hajimiri, Farzad Beizaee, Fereshteh Shakeri, Christian Desrosiers, Ismail Ben Ayed, Jose Dolz

Comments Accepted to ICLR 2026

2603.04890 2026-03-06 cs.LG cs.AI cs.CV

FedAFD: Multimodal Federated Learning via Adversarial Fusion and Distillation

Min Tan, Junchao Ma, Yinfu Feng, Jiajun Ding, Wenwen Pan, Tingting Han, Qian Zheng, Zhenzhong Kuang, Zhou Yu

Comments Accepted by CVPR 2026

2603.04887 2026-03-06 cs.CV

Federated Modality-specific Encoders and Partially Personalized Fusion Decoder for Multimodal Brain Tumor Segmentation

Hong Liu, Dong Wei, Qian Dai, Xian Wu, Yefeng Zheng, Liansheng Wang

Comments Medical Image Analysis 2025. arXiv admin note: substantial text overlap with arXiv:2403.11803

DOI: 10.1016/j.media.2025.103759

Abstract

Most existing federated learning (FL) methods for medical image analysis only considered intramodal heterogeneity, limiting their applicability to multimodal imaging applications. In practice, some FL participants may possess only a subset of the complete imaging modalities, posing intermodal heterogeneity as a challenge to effectively training a global model on all participants' data. Meanwhile, each participant expects a personalized model tailored to its local data characteristics in FL. This work proposes a new FL framework with federated modality-specific encoders and partially personalized multimodal fusion decoders (FedMEPD) to address the two concurrent issues. Specifically, FedMEPD employs an exclusive encoder for each modality to account for the intermodal heterogeneity. While these encoders are fully federated, the decoders are partially personalized to meet individual needs -- using the discrepancy between global and local parameter updates to dynamically determine which decoder filters are personalized. Implementation-wise, a server with full-modal data employs a fusion decoder to fuse representations from all modality-specific encoders, thus bridging the modalities to optimize the encoders via backpropagation. Moreover, multiple anchors are extracted from the fused multimodal representations and distributed to the clients in addition to the model parameters. Conversely, the clients with incomplete modalities calibrate their missing-modal representations toward the global full-modal anchors via scaled dot-product cross-attention, making up for the information loss due to absent modalities. FedMEPD is validated on the BraTS 2018 and 2020 multimodal brain tumor segmentation benchmarks. Results show that it outperforms various up-to-date methods for multimodal and personalized FL, and its novel designs are effective.
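The calibration step mentioned in the abstract relies on scaled dot-product cross-attention between a client's local features and the global anchors. The sketch below shows only that generic operation in plain Python. It assumes the anchors serve as both keys and values and omits any learned projection matrices; the names `cross_attention`, `anchors`, and `local` are hypothetical, and this is not the FedMEPD implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, anchors):
    """Scaled dot-product cross-attention.

    Each query vector attends over the anchor vectors, which act as both
    keys and values here (an assumption made for brevity).
    """
    d = len(anchors[0])
    outputs = []
    for q in queries:
        # Similarity of the query to every anchor, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in anchors]
        weights = softmax(scores)
        # Weighted sum of the anchors gives the calibrated feature.
        outputs.append([sum(w * k[j] for w, k in zip(weights, anchors))
                        for j in range(d)])
    return outputs


# Two toy anchors and one local feature that mostly resembles the first.
anchors = [[1.0, 0.0], [0.0, 1.0]]
local = [[4.0, 0.0]]
calibrated = cross_attention(local, anchors)
print(calibrated)
```

In this toy case the output is pulled almost entirely toward the first anchor, which is the sense in which a missing-modal representation can be moved toward full-modal anchors.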

2603.04882 2026-03-06 cs.CV cs.AI cs.MM

DeformTrace: A Deformable State Space Model with Relay Tokens for Temporal Forgery Localization

Xiaodong Zhu, Suting Wang, Yuanming Zheng, Junqi Yang, Yangxu Liao, Yuhong Yang, Weiping Tu, Zhongyuan Wang

Comments 9 pages, 4 figures, accepted by AAAI 2026

2603.04878 2026-03-06 cs.CV

Structure Observation Driven Image-Text Contrastive Learning for Computed Tomography Report Generation

Hong Liu, Dong Wei, Qiong Peng, Yawen Huang, Xian Wu, Yefeng Zheng, Liansheng Wang

Comments Accepted to IPMI 2025

DOI: 10.1007/978-3-031-96625-5_15

Abstract

Computed Tomography Report Generation (CTRG) aims to automate the clinical radiology reporting process, thereby reducing the workload of report writing and facilitating patient care. While deep learning approaches have achieved remarkable advances in X-ray report generation, their effectiveness may be limited in CTRG due to larger data volumes of CT images and more intricate details required to describe them. This work introduces a novel two-stage (structure- and report-learning) framework tailored for CTRG featuring effective structure-wise image-text contrasting. In the first stage, a set of learnable structure-specific visual queries observe corresponding structures in a CT image. The resulting observation tokens are contrasted with structure-specific textual features extracted from the accompanying radiology report with a structure-wise image-text contrastive loss. In addition, text-text similarity-based soft pseudo targets are proposed to mitigate the impact of false negatives, i.e., semantically identical image structures and texts from non-paired images and reports. Thus, the model learns structure-level semantic correspondences between CT images and reports. Further, a dynamic, diversity-enhanced negative queue is proposed to guide the network in learning to discriminate various abnormalities. In the second stage, the visual structure queries are frozen and used to select the critical image patch embeddings depicting each anatomical structure, minimizing distractions from irrelevant areas while reducing memory consumption. Also, a text decoder is added and trained for report generation. Our extensive experiments on two public datasets demonstrate that our framework establishes new state-of-the-art performance for CTRG in clinical efficiency, and its components are effective.
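The soft pseudo targets described above can be illustrated with a generic soft-target contrastive loss. The sketch below assumes that the targets are a temperature-scaled softmax over text-text similarities and that the loss is the cross-entropy against the softmax of image-text similarities. The function name, the temperature value, and the toy similarity matrix are all hypothetical, and the paper's exact formulation may differ.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_contrastive_loss(image_text_sim, text_text_sim, temperature=0.1):
    """Mean cross-entropy between soft targets and image-to-text predictions.

    Row i of image_text_sim holds similarities between image i and every
    report in the batch; row i of text_text_sim holds similarities between
    report i and every report. Reports that are near-duplicates of the
    paired one receive non-zero target mass instead of being treated as
    hard negatives.
    """
    loss = 0.0
    for img_row, txt_row in zip(image_text_sim, text_text_sim):
        targets = softmax(txt_row, temperature)
        preds = softmax(img_row, temperature)
        loss -= sum(t * math.log(p) for t, p in zip(targets, preds))
    return loss / len(image_text_sim)


# Toy batch: reports 0 and 1 are nearly identical, report 2 is unrelated.
text_text = [[1.0, 0.9, 0.0],
             [0.9, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
print(soft_target_contrastive_loss(text_text, text_text))
```

With hard one-hot targets, image 0 would be penalized for matching report 1 even though the two reports say nearly the same thing; the soft targets reduce that penalty.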

2603.04874 2026-03-06 cs.CV cs.AI cs.LG

Interpretable Pre-Release Baseball Pitch Type Anticipation from Broadcast 3D Kinematics

Jerrin Bright, Michelle Lu, John Zelek

Comments Submitted to CVPRW'26

2603.04869 2026-03-06 cs.CV

SURE: Semi-dense Uncertainty-REfined Feature Matching

Sicheng Li, Zaiwang Gu, Jie Zhang, Qing Guo, Xudong Jiang, Jun Cheng

Comments Accepted by ICRA 2026

2603.04868 2026-03-06 cs.AI

K-Gen: A Multimodal Language-Conditioned Approach for Interpretable Keypoint-Guided Trajectory Generation

Mingxuan Mu, Guo Yang, Lei Chen, Ping Wu, Jianxun Cui

2603.04864 2026-03-06 cs.CV

Scalable Injury-Risk Screening in Baseball Pitching From Broadcast Video

Jerrin Bright, Justin Mende, John Zelek

Comments Submitted to CVPRW'26

2603.04861 2026-03-06 cs.AI cs.LG cs.RO

Causally Robust Reward Learning from Reason-Augmented Preference Feedback

Minjune Hwang, Yigit Korkmaz, Daniel Seita, Erdem Bıyık

Comments Published in International Conference on Learning Representations (ICLR) 2026

2603.04857 2026-03-06 cs.CL cs.SE

FireBench: Evaluating Instruction Following in Enterprise and API-Driven LLM Applications

Yunfan Zhang, Yijie Bei, Jetashree Ravi, Pawel Garbacki