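arXivDaily daily academic digest: synchronized with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.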

2507.12900 2026-03-19 cs.LG

Knowing What You Cannot Explain: Learning to Reject Low-Quality Explanations

Luca Stradiotti, Dario Pesenti, Stefano Teso, Jesse Davis

Details

English abstract

Learning to Reject (LtR) frameworks allow ML models to abstain from uncertain predictions and promote user trust. However, since current LtR strategies focus solely on predictive performance, they completely neglect explanation quality. Low-quality explanations -- whether they inaccurately reflect the model's reasoning or fail to satisfy users -- can severely compromise trust assessments and induce over-reliance on incorrect predictions. We argue that models should abstain from making a prediction when they cannot offer a satisfactory explanation for it and introduce a framework for learning to reject low-quality explanations (LtX) in which predictors are equipped with a rejector that evaluates the explanation quality. Focusing on popular attribution techniques, we propose REX (REjector of low-quality eXplanations), which learns a rejector from explanation quality labels combining machine-side judgments with explicit human annotations to assess explanation quality. Our empirical evaluation demonstrates that REX outperforms popular LtR strategies and baselines relying on isolated explanation metrics. Finally, to support future research, we publicly release a novel, larger-scale dataset of 1050 human-annotated machine explanations.


2507.06231 2026-03-19 cs.CV

RSRefSeg 2: Decoupling Referring Remote Sensing Image Segmentation with Foundation Models

Keyan Chen, Chenyang Liu, Bowen Chen, Jiafan Zhang, Zhengxia Zou, Zhenwei Shi

Details

DOI: 10.1109/TGRS.2025.3647535
Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 64, pp. 1-20, 2026, Art no. 4700420

English abstract

Referring Remote Sensing Image Segmentation provides a flexible and fine-grained framework for remote sensing scene analysis via vision-language collaborative interpretation. Current approaches predominantly utilize a three-stage pipeline encompassing dual-modal encoding, cross-modal interaction, and pixel decoding. These methods demonstrate significant limitations in managing complex semantic relationships and achieving precise cross-modal alignment, largely due to their coupled processing mechanism that conflates target localization with boundary delineation. This architectural coupling amplifies error propagation under semantic ambiguity while restricting model generalizability and interpretability. To address these issues, we propose RSRefSeg 2, a decoupling paradigm that reformulates the conventional workflow into a collaborative dual-stage framework: coarse localization followed by fine segmentation. RSRefSeg 2 integrates CLIP's cross-modal alignment strength with SAM's segmentation generalizability through strategic foundation model collaboration. Specifically, CLIP is employed as the dual-modal encoder to activate target features within its pre-aligned semantic space and generate localization prompts. To mitigate CLIP's misactivation challenges in multi-entity scenarios described by referring texts, a cascaded second-order prompter is devised, which enhances precision through implicit reasoning via decomposition of text embeddings into complementary semantic subspaces. These optimized semantic prompts subsequently direct SAM to generate pixel-level refined masks, thereby completing the semantic transmission pipeline. Extensive experiments on RefSegRS, RRSIS-D, and RISBench demonstrate that RSRefSeg 2 surpasses contemporary methods in segmentation accuracy (approximately +3% gIoU) and complex semantic interpretation. Code is available at: https://github.com/KyanChen/RSRefSeg2.


2507.05591 2026-03-19 cs.AI

MLlm-DR: Towards Explainable Depression Recognition with MultiModal Large Language Models

Wei Zhang, Juan Chen, En Zhu, Wenhong Cheng, YunPeng Li, Yanbo J. Wang

2507.05257 2026-03-19 cs.CL cs.AI

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

Yuanzhe Hu, Yu Wang, Julian McAuley

Comments Y. Hu and Y. Wang contribute equally

2506.22342 2026-03-19 cs.LG cs.AI

Improving Epidemic Analyses with Privacy-Preserving Integration of Sensitive Data

Zihan Guan, Zhiyuan Zhao, Fengwei Tian, Dung Nguyen, Payel Bhattacharjee, Ravi Tandon, B. Aditya Prakash, Anil Vullikanti

Comments 19 pages, 7 figures

2506.17237 2026-03-19 cs.CV

Mechanistic Interpretability of Diffusion Models: Circuit-Level Analysis and Causal Validation

Dip Roy

2506.10680 2026-03-19 cs.LG cs.AI

SatSOM: Saturation Self-Organizing Maps for Continual Learning

Igor Urbanik, Paweł Gajewski

Comments GitHub repository: https://github.com/Radinyn/satsom

2506.02070 2026-03-19 cs.LG

An Introduction to Flow Matching and Diffusion Models

Peter Holderrieth, Ezra Erives

2505.23914 2026-03-19 cs.CL cs.AI

Probing Association Biases in LLM Moderation Over-Sensitivity

Yuxin Wang, Botao Yu, Ivory Yang, Saeed Hassanpour, Soroush Vosoughi

Comments Preprint

2505.22899 2026-03-19 cs.LG

On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning

Naram Mhaisen, George Iosifidis

Comments Fixed typos. Proceedings of ICML 2025

2505.22882 2026-03-19 cs.RO

TwinTrack: Bridging Vision and Contact Physics for Real-Time Tracking of Unknown Objects in Contact-Rich Scenes

Wen Yang, Zhixian Xie, Yiting Wang, Abhijit Tadepalli, Heni Ben Amor, Shan Lin, Wanxin Jin

Comments Accepted by IEEE International Conference on Robotics & Automation (ICRA) 2026

2505.19208 2026-03-19 cs.CV

Domain and Task-Focused Example Selection for Data-Efficient Contrastive Medical Image Segmentation

Tyler Ward, Aaron Moseley, Abdullah-Al-Zubaer Imran

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://www.melba-journal.org/2026:002

Details

DOI: 10.59275/j.melba.2026-3cg7
Journal ref: Machine Learning for Biomedical Imaging 2026 (2026)

English abstract

Segmentation is one of the most important tasks in the medical imaging pipeline as it influences a number of image-based decisions. To be effective, fully supervised segmentation approaches require large amounts of manually annotated training data. However, the pixel-level annotation process is expensive, time-consuming, and error-prone, hindering progress and making it challenging to perform effective segmentations. Therefore, models must learn efficiently from limited labeled data. Self-supervised learning (SSL), particularly contrastive learning via pre-training on unlabeled data and fine-tuning on limited annotations, can facilitate such limited labeled image segmentation. To this end, we propose a novel self-supervised contrastive learning framework for medical image segmentation, leveraging inherent relationships of different images, dubbed PolyCL. Without requiring any pixel-level annotations or unreasonable data augmentations, our PolyCL learns and transfers context-aware discriminant features useful for segmentation from an innovative surrogate, in a task-related manner. Additionally, we integrate the Segment Anything Model (SAM) into our framework in two novel ways: as a post-processing refinement module that improves the accuracy of predicted masks using bounding box prompts derived from coarse outputs, and as a propagation mechanism via SAM 2 that generates volumetric segmentations from a single annotated 2D slice. Experimental evaluations on three public computed tomography (CT) datasets demonstrate that PolyCL outperforms fully-supervised and self-supervised baselines in both low-data and cross-domain scenarios. Our code is available at https://github.com/tbwa233/PolyCL.


2505.19161 2026-03-19 cs.CV

Benchmarking Endoscopic Surgical Image Restoration and Beyond

Jialun Pei, Diandian Guo, Donghui Yang, Zhixi Li, Yuxin Feng, Long Ma, Bo Du, Pheng-Ann Heng

Comments This work has been accepted by CVPR 2026

2505.18945 2026-03-19 cs.CV cs.RO

Echo Planning for Autonomous Driving: From Current Observations to Future Trajectories and Back

Jintao Sun, Hu Zhang, Gangyi Ding, Zhedong Zheng

Comments 12 pages, 4 figures

2505.15151 2026-03-19 cs.LG

Time Tracker: Mixture-of-Experts-Enhanced Foundation Time Series Forecasting Model with Decoupled Training Pipelines

Aobo Liang, Yan Sun, Xiaohou Shi, Ke Li

2505.13377 2026-03-19 cs.LG

Score Distillation Beyond Acceleration: Generative Modeling from Corrupted Data

Yasi Zhang, Tianyu Chen, Zhendong Wang, Ying Nian Wu, Mingyuan Zhou, Oscar Leong

Comments This paper merges DSD (Denoising Score Distillation) and RSD (Restoration Score Distillation) v1. Tianyu Chen and Yasi Zhang contributed equally; Oscar Leong and Mingyuan Zhou advised equally

2505.11714 2026-03-19 cs.LG cs.AI cs.GT

Bi-Level Policy Optimization with Nyström Hypergradients

Arjun Prakash, Naicheng He, Denizalp Goktas, Jacob Makar-Limanov, Amy Greenwald

2505.04389 2026-03-19 cs.LG

Clust-Splitter - an Efficient Nonsmooth Optimization-Based Algorithm for Clustering Large Datasets

Jenni Lampainen, Kaisa Joki, Napsu Karmitsa, Marko M. Mäkelä

Comments 36 pages, 23 figures; first version. A revised version has been published in 'Advances in Data Analysis and Classification' (2026)

2504.20667 2026-03-19 cs.LG

Explanations Go Linear: Post-hoc Explainability for Tabular Data with Interpretable Meta-Encoding

Simone Piaggesi, Riccardo Guidotti, Fosca Giannotti, Dino Pedreschi

Comments Accepted at ICDM 2025

2504.18346 2026-03-19 cs.CL cs.AI

Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review

Toghrul Abbasli, Kentaroh Toyoda, Yuan Wang, Leon Witt, Muhammad Asif Ali, Yukai Miao, Dan Li, Qingsong Wei

2504.17749 2026-03-19 cs.LG

MSGCN: Multiplex Spatial Graph Convolution Network for Interlayer Link Weight Prediction

Steven E. Wilson, Sina Khanmohammadi

2504.04893 2026-03-19 cs.CV cs.AI

SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models

Justus Westerhoff, Erblina Purelku, Jakob Hackstein, Jonas Loos, Leo Pinetzki, Erik Rodner, Lorenz Hufe

Comments Accepted at CVPR 2025 Workshop EVAL-FoMo-2

2504.00638 2026-03-19 cs.LG cs.AI eess.IV

Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models

Alireza Aghabagherloo, Aydin Abadi, Sumanta Sarkar, Vishnu Asutosh Dasu, Bart Preneel

2503.22179 2026-03-19 cs.CV

High-Fidelity Diffusion Face Swapping with ID-Constrained Facial Conditioning

Dailan He, Xiahong Wang, Shulun Wang, Guanglu Song, Bingqi Ma, Hao Shao, Yu Liu, Hongsheng Li

Comments CVPR 2026

2503.22174 2026-03-19 cs.CV

Synergistic Bleeding Region and Point Detection in Laparoscopic Surgical Videos

Jialun Pei, Zhangjun Zhou, Diandian Guo, Zhixi Li, Jing Qin, Bo Du, Pheng-Ann Heng

Comments This work has been accepted by CVPR 2026

2503.22063 2026-03-19 cs.LG

Arch-VQ: Discrete Architecture Representation Learning with Autoregressive Priors

Deshani Geethika Poddenige, Sachith Seneviratne, Asela Hevapathige, Damith Senanayake, Mahesan Niranjan, PN Suganthan, Saman Halgamuge

2503.19405 2026-03-19 cs.CV

Multi-modal 3D Pose and Shape Estimation with Computed Tomography

Mingxiao Tu, Hoijoon Jung, Alireza Moghadam, Jineel Raythatha, Lachlan Allan, Jeremy Hsu, Andre Kyme, Jinman Kim

2503.14576 2026-03-19 cs.LG cs.AI

SocialJax: An Evaluation Suite for Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Zihao Guo, Shuqing Shi, Richard Willis, Tristan Tomilin, Joel Z. Leibo, Yali Du

Comments Accepted at ICLR 2026

2503.08485 2026-03-19 cs.CV

Test-Time 3D Occupancy Prediction

Fengyi Zhang, Xiangyu Sun, Huitong Yang, Zheng Zhang, Zi Huang, Yadan Luo

Comments CVPR 2026

2503.06462 2026-03-19 cs.CV cs.AI

StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting

Zexu Huang, Min Xu, Stuart Perry

Details

DOI: 10.1109/TMM.2025.3639991
Journal ref: IEEE Transactions on Multimedia, vol. 28, pp. 1499-1510, 2025

English abstract

Recent advancements in 3D reconstruction coupled with neural rendering techniques have greatly improved the creation of photo-realistic 3D scenes, influencing both academic research and industry applications. The technique of 3D Gaussian Splatting and its variants incorporate the strengths of both primitive-based and volumetric representations, achieving superior rendering quality. While 3D Gaussian Splatting (3DGS) and its variants have advanced the field of 3D representation, they fall short in capturing the stochastic properties of non-local structural information during the training process. Additionally, the initialisation of spherical functions in 3DGS-based methods often fails to engage higher-order terms in early training rounds, leading to unnecessary computational overhead as training progresses. Furthermore, current 3DGS-based approaches require training on higher resolution images to render higher resolution outputs, significantly increasing memory demands and prolonging training durations. We introduce StructGS, a framework that enhances 3DGS for improved novel-view synthesis in 3D reconstruction. StructGS innovatively incorporates a patch-based SSIM loss, dynamic spherical harmonics initialisation and a Multi-scale Residual Network (MSRN) to address the above-mentioned limitations, respectively. Our framework significantly reduces computational redundancy, enhances detail capture and supports high-resolution rendering from low-resolution inputs. Experimentally, StructGS demonstrates superior performance over state-of-the-art (SOTA) models, achieving higher quality and more detailed renderings with fewer artifacts.
