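arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.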

2602.20860 2026-02-25 cs.CV

DA-Cal: Towards Cross-Domain Calibration in Semantic Segmentation

Wangkai Li, Rui Sun, Zhaoyang Li, Yujia Chen, Tianzhu Zhang

Abstract

While existing unsupervised domain adaptation (UDA) methods greatly enhance target domain performance in semantic segmentation, they often neglect network calibration quality, resulting in misalignment between prediction confidence and actual accuracy -- a significant risk in safety-critical applications. Our key insight emerges from observing that performance degrades substantially when soft pseudo-labels replace hard pseudo-labels in cross-domain scenarios due to poor calibration, despite the theoretical equivalence of perfectly calibrated soft pseudo-labels to hard pseudo-labels. Based on this finding, we propose DA-Cal, a dedicated cross-domain calibration framework that transforms target domain calibration into soft pseudo-label optimization. DA-Cal introduces a Meta Temperature Network to generate pixel-level calibration parameters and employs bi-level optimization to establish the relationship between soft pseudo-labels and UDA supervision, while utilizing complementary domain-mixing strategies to prevent overfitting and reduce domain discrepancies. Experiments demonstrate that DA-Cal seamlessly integrates with existing self-training frameworks across multiple UDA segmentation benchmarks, significantly improving target domain calibration while delivering performance gains without inference overhead. The code will be released.
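
The calibration idea in this abstract can be illustrated with a minimal sketch of per-pixel temperature scaling. This is not the authors' implementation: the logits and temperatures below are hypothetical values chosen for illustration, whereas the abstract says the temperatures come from a learned Meta Temperature Network trained by bi-level optimization, which is not reproduced here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) for one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for two pixels over three classes.
pixel_logits = [[4.0, 1.0, 0.5], [2.0, 1.8, 0.1]]
# Hypothetical per-pixel temperatures (assumed constants, not learned here).
pixel_temperatures = [2.0, 1.0]

# Soft pseudo-labels: one calibrated class distribution per pixel.
soft_labels = [
    softmax_with_temperature(logits, t)
    for logits, t in zip(pixel_logits, pixel_temperatures)
]
# Hard pseudo-labels: the argmax class, which temperature scaling leaves unchanged.
hard_labels = [max(range(len(p)), key=p.__getitem__) for p in soft_labels]

# Confidence of the first pixel without scaling, for comparison.
unscaled_confidence = max(softmax_with_temperature(pixel_logits[0], 1.0))
scaled_confidence = max(soft_labels[0])
```

A temperature above 1 lowers the top-class confidence without changing which class is predicted, which is why scaling of this kind can adjust calibration while leaving hard pseudo-labels intact.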

2602.20859 2026-02-25 cs.CL

FinAnchor: Aligned Multi-Model Representations for Financial Prediction

Zirui He, Huopu Zhang, Yanguang Liu, Sirui Wu, Mengnan Du

Comments: 11 pages, 4 figures, 5 tables

2602.20853 2026-02-25 cs.CV

On the Explainability of Vision-Language Models in Art History

Stefanie Schneider

2602.20851 2026-02-25 cs.CV

Hybrid Fusion: One-Minute Efficient Training for Zero-Shot Cross-Domain Image Fusion

Ran Zhang, Xuanhua He, Liu Liu

2602.20850 2026-02-25 cs.RO

KCFRC: Kinematic Collision-Aware Foothold Reachability Criteria for Legged Locomotion

Lei Ye, Haibo Gao, Huaiguang Yang, Peng Xu, Haoyu Wang, Tie Liu, Junqi Shan, Zongquan Deng, Liang Ding

2602.20823 2026-02-25 cs.SD eess.AS

Geometric Analysis of Speech Representation Spaces: Topological Disentanglement and Confound Detection

Bipasha Kashyap, Pubudu N. Pathirana

Comments: Submitted to INTERSPEECH 2026

2602.20818 2026-02-25 cs.CV

GatedCLIP: Gated Multimodal Fusion for Hateful Memes Detection

Yingying Guo, Ke Zhang, Zirong Zeng

Comments: Preprint

2602.20813 2026-02-25 cs.AI

Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

Nora Petrova, John Burden

Comments: Preprint

2602.20812 2026-02-25 cs.AI

Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset

Jia-Rui Lin, Yun-Hong Cai, Xiang-Rui Ni, Shaojie Zhou, Peng Pan

2602.20810 2026-02-25 cs.AI

POMDPPlanners: Open-Source Package for POMDP Planning

Yaacov Pariente, Vadim Indelman

2602.20809 2026-02-25 cs.LG cs.AI

Regret-Guided Search Control for Efficient Learning in AlphaZero

Yun-Jui Tsai, Wei-Yu Chen, Yan-Ru Ju, Yu-Hung Chang, Ti-Rong Wu

Comments: Accepted by the Fourteenth International Conference on Learning Representations (ICLR 2026)

2602.20805 2026-02-25 cs.SD cs.LG

Assessing the Impact of Speaker Identity in Speech Spoofing Detection

Anh-Tuan Dao, Driss Matrouf, Nicholas Evans

2602.20796 2026-02-25 cs.LG

Exploring the Impact of Parameter Update Magnitude on Forgetting and Generalization of Continual Learning

JinLi He, Liang Bai, Xian Yang

2602.20794 2026-02-25 cs.CV

VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving

Jie Wang, Guang Li, Zhijian Huang, Chenxu Dang, Hangjun Ye, Yahong Han, Long Chen

Comments: CVPR 2026

2602.20791 2026-02-25 cs.LG

Understanding the Role of Rehearsal Scale in Continual Learning under Varying Model Capacities

JinLi He, Liang Bai, Xian Yang

2602.20790 2026-02-25 cs.CV cs.RO

Real-time Motion Segmentation with Event-based Normal Flow

Sheng Zhong, Zhongyang Ren, Xiya Zhu, Dehao Yuan, Cornelia Fermuller, Yi Zhou

2602.20782 2026-02-25 cs.LG

On Electric Vehicle Energy Demand Forecasting and the Effect of Federated Learning

Andreas Tritsarolis, Gil Sampaio, Nikos Pelekis, Yannis Theodoridis

Abstract

The wide spread of new energy resources, smart devices, and demand side management strategies has motivated several analytics operations, from infrastructure load modeling to user behavior profiling. Energy Demand Forecasting (EDF) of Electric Vehicle Supply Equipments (EVSEs) is one of the most critical operations for ensuring efficient energy management and sustainability, since it enables utility providers to anticipate energy/power demand, optimize resource allocation, and implement proactive measures to improve grid reliability. However, accurate EDF is a challenging problem due to external factors, such as the varying user routines, weather conditions, driving behaviors, unknown state of charge, etc. Furthermore, as concerns and restrictions about privacy and sustainability have grown, training data has become increasingly fragmented, resulting in distributed datasets scattered across different data silos and/or edge devices, calling for federated learning solutions. In this paper, we investigate different well-established time series forecasting methodologies to address the EDF problem, from statistical methods (the ARIMA family) to traditional machine learning models (such as XGBoost) and deep neural networks (GRU and LSTM). We provide an overview of these methods through a performance comparison over four real-world EVSE datasets, evaluated under both centralized and federated learning paradigms, focusing on the trade-offs between forecasting fidelity, privacy preservation, and energy overheads. Our experimental results demonstrate, on the one hand, the superiority of gradient boosted trees (XGBoost) over statistical and NN-based models in both prediction accuracy and energy efficiency and, on the other hand, an insight that Federated Learning-enabled models balance these factors, offering a promising direction for decentralized energy demand forecasting.

2602.20773 2026-02-25 cs.CV

Federated Learning for Cross-Modality Medical Image Segmentation via Augmentation-Driven Generalization

Sachin Dudda Nagaraju, Ashkan Moradi, Bendik Skarre Abrahamsen, Mattijs Elschot

Comments: Submitted to IEEE JBHI

2602.20770 2026-02-25 cs.AI

Pipeline for Verifying LLM-Generated Mathematical Solutions

Varvara Sazonova, Dmitri Shmelkin, Stanislav Kikot, Vasily Motolygin

2602.20768 2026-02-25 cs.RO

Visual Cooperative Drone Tracking for Open-Path Gas Measurements

Marius Schaab, Alisha Kiefer, Thomas Wiedemann, Patrick Hinsen, Achim J. Lilienthal

2602.20759 2026-02-25 cs.CL

Overton Pluralistic Reinforcement Learning for Large Language Models

Yu Fu, Seongho Son, Ilija Bogunovic

Comments: 28 pages, 8 figures

2602.20758 2026-02-25 cs.LG

Deep unfolding of MCMC kernels: scalable, modular & explainable GANs for high-dimensional posterior sampling

Jonathan Spence, Tobías I. Liaudat, Konstantinos Zygalakis, Marcelo Pereyra

Comments: 37 pages, 10 figures, 5 tables

2602.20752 2026-02-25 cs.CV cs.AI

OrthoDiffusion: A Generalizable Multi-Task Diffusion Foundation Model for Musculoskeletal MRI Interpretation

Tian Lan, Lei Xu, Zimu Yuan, Shanggui Liu, Jiajun Liu, Jiaxin Liu, Weilai Xiang, Hongyu Yang, Dong Jiang, Jianxin Yin, Dingyu Wang

Abstract

Musculoskeletal disorders represent a significant global health burden and are a leading cause of disability worldwide. While MRI is essential for accurate diagnosis, its interpretation remains exceptionally challenging. Radiologists must identify multiple potential abnormalities within complex anatomical structures across different imaging planes, a process that requires significant expertise and is prone to variability. We developed OrthoDiffusion, a unified diffusion-based foundation model designed for multi-task musculoskeletal MRI interpretation. The framework utilizes three orientation-specific 3D diffusion models, pre-trained in a self-supervised manner on 15,948 unlabeled knee MRI scans, to learn robust anatomical features from sagittal, coronal, and axial views. These view-specific representations are integrated to support diverse clinical tasks, including anatomical segmentation and multi-label diagnosis. Our evaluation demonstrates that OrthoDiffusion achieves excellent performance in the segmentation of 11 knee structures and the detection of 8 knee abnormalities. The model exhibited remarkable robustness across different clinical centers and MRI field strengths, consistently outperforming traditional supervised models. Notably, in settings where labeled data was scarce, OrthoDiffusion maintained high diagnostic precision using only 10% of training labels. Furthermore, the anatomical representations learned from knee imaging proved highly transferable to other joints, achieving strong diagnostic performance across 11 diseases of the ankle and shoulder. These findings suggest that diffusion-based foundation models can serve as a unified platform for multi-disease diagnosis and anatomical segmentation, potentially improving the efficiency and accuracy of musculoskeletal MRI interpretation in real-world clinical workflows.

2602.20751 2026-02-25 cs.CL cs.AI cs.LG

SibylSense: Adaptive Rubric Learning via Memory Tuning and Adversarial Probing

Yifei Xu, Guilherme Potje, Shivam Shandilya, Tiancheng Yuan, Leonardo de Oliveira Nunes, Rakshanda Agarwal, Saeid Asgari, Adam Atkinson, Emre Kıcıman, Songwu Lu, Ranveer Chandra, Tusher Chakraborty

2602.20749 2026-02-25 cs.CL

Explicit Grammar Semantic Feature Fusion for Robust Text Classification

Azrin Sultana, Firoz Ahmed

Comments: 30 pages, 9 figures

2602.20744 2026-02-25 cs.SD cs.AI

Voices of the Mountains: Deep Learning-Based Vocal Error Detection System for Kurdish Maqams

Darvan Shvan Khairaldeen, Hossein Hassani

Abstract

Maqam, a singing type, is a significant component of Kurdish music. A maqam singer receives training in a traditional face-to-face setting or through self-training. Automatic Singing Assessment (ASA) uses machine learning (ML) to assess the accuracy of singing and can help learners improve their performance through error detection. Currently, the available ASA tools follow Western music rules, under which all notes must stay within their expected pitch range from start to finish. Such tools fail to detect micro-intervals and pitch bends, so they identify Kurdish maqam singing as incorrect even though the singer performs according to traditional rules. Kurdish maqam requires recognizing performance errors within microtonal spaces, which is beyond Western equal temperament. This research is the first attempt to address this gap. While many error types happen during singing, our focus is on pitch, rhythm, and modal stability errors in the context of Bayati-Kurd. We collected 50 songs from 13 vocalists (2-3 hours) and annotated 221 error spans (150 fine pitch, 46 rhythm, 25 modal drift). The data was segmented into 15,199 overlapping windows and converted to log-mel spectrograms. We developed a two-headed CNN-BiLSTM with attention mode to decide whether a window contains an error and to classify it based on the chosen errors. Trained for 20 epochs with early stopping at epoch 10, the model reached a validation macro-F1 of 0.468. On the full 50-song evaluation at a 0.750 threshold, recall was 39.4% and precision 25.8%. Within detected windows, type macro-F1 was 0.387, with F1 of 0.492 (fine pitch), 0.536 (rhythm), and 0.133 (modal drift); modal drift recall was 8.0%. The better performance on common error types shows that the method works, while the poor modal-drift recall shows that more data and balancing are needed.
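
The figures reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are illustrative, and the detection F1 is a derived quantity that the abstract itself does not report.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-type F1 scores within detected windows, as stated in the abstract.
type_f1 = {"fine pitch": 0.492, "rhythm": 0.536, "modal drift": 0.133}
# Macro-F1 is the unweighted mean of the per-type scores.
macro_f1 = sum(type_f1.values()) / len(type_f1)

# Annotated error spans per type, as stated in the abstract.
error_spans = {"fine pitch": 150, "rhythm": 46, "modal drift": 25}
total_spans = sum(error_spans.values())

# Detection F1 implied by the reported precision (25.8%) and recall (39.4%).
# Derived here for illustration; it is not a figure given in the abstract.
detection_f1 = f1_score(0.258, 0.394)

print(f"macro-F1: {macro_f1:.3f}, total spans: {total_spans}")
print(f"implied detection F1: {detection_f1:.3f}")
```

The unweighted mean of the three per-type scores reproduces the reported type macro-F1 of 0.387, and the span counts sum to the reported 221. Modal drift accounts for only 25 of those 221 spans, which is consistent with the abstract's remark that more data and balancing are needed.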

2602.20739 2026-02-25 cs.AI cs.CV

PyVision-RL: Forging Open Agentic Vision Models via RL

Shitian Zhao, Shaoheng Lin, Ming Li, Haoquan Zhang, Wenshuo Peng, Kaipeng Zhang, Chen Wei

Comments: preprint

2602.20732 2026-02-25 cs.AI

CHESS: Context-aware Hierarchical Efficient Semantic Selection for Long-Context LLM Inference

Chao Fei, Guozhong Li, Chenxi Liu, Panos Kalnis

2602.20731 2026-02-25 cs.CV cs.AI cs.LG

Communication-Inspired Tokenization for Structured Image Representations

Aram Davtyan, Yusuf Sahin, Yasaman Haghighi, Sebastian Stapf, Pablo Acuaviva, Alexandre Alahi, Paolo Favaro

Comments: Project website: https://araachie.github.io/comit/

2602.20728 2026-02-25 cs.AI

Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback

Chenyang Zhao, Vinny Cahill, Ivana Dusparic