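arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and more.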

2603.08445 2026-03-10 cs.CV

Alfa: Attentive Low-Rank Filter Adaptation for Structure-Aware Cross-Domain Personalized Gaze Estimation

He-Yen Hsieh, Wei-Te Mark Ting, H. T. Kung

Comments 21 pages, 16 figures, AAAI2026

Abstract

Pre-trained gaze models learn to identify useful patterns commonly found across users, but subtle user-specific variations (e.g., eyelid shape or facial structure) can degrade model performance. Test-time personalization (TTP) adapts pre-trained models to these user-specific domain shifts using only a few unlabeled samples. Efficient fine-tuning is critical for this domain adaptation because data and computation resources can be limited, especially for on-device customization. While popular parameter-efficient fine-tuning (PEFT) methods reduce adaptation costs by updating only a small set of weights, they may not take full advantage of the structures encoded in pre-trained filters. To more effectively leverage existing structures learned during pre-training, we reframe personalization as reweighting existing features rather than learning entirely new ones. We present Attentive Low-Rank Filter Adaptation (Alfa) to adapt gaze models by reweighting semantic patterns in pre-trained filters. With Alfa, singular value decomposition (SVD) extracts dominant spatial components that capture eye and facial characteristics across users. Via an attention mechanism, we need only a few unlabeled samples to adjust and reweight pre-trained structures, selectively amplifying those relevant to a target user. Alfa achieves the lowest average gaze errors across four cross-dataset gaze benchmarks, outperforming existing TTP methods and low-rank adaptation (LoRA)-based variants. We also show that Alfa's attentive low-rank methods can be applied beyond vision, for example to diffusion-based language models.
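
One schematic way to read the SVD-plus-reweighting idea in this abstract is given below. It is a sketch under stated assumptions, not the paper's formulation: the symbols W, r and a_i are introduced here for illustration only, and the abstract does not specify how the attention weights are computed.

```latex
% Assumed notation: W is a pre-trained filter reshaped to a matrix,
% (\sigma_i, u_i, v_i) are its singular triplets, r is the retained rank,
% and a_i \ge 0 are attention-derived weights estimated from a few unlabeled samples.
W = \sum_{i} \sigma_i \, u_i v_i^{\top},
\qquad
W' = \sum_{i=1}^{r} a_i \, \sigma_i \, u_i v_i^{\top}
```

Under this reading the directions u_i and v_i learned during pre-training stay fixed and only the per-component weights change, which is what "reweighting existing features rather than learning entirely new ones" would amount to.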

2603.08436 2026-03-10 cs.CV cs.CL

Can Vision-Language Models Solve the Shell Game?

Tiedong Liu, Wee Sun Lee

2603.08434 2026-03-10 cs.CV

Information Maximization for Long-Tailed Semi-Supervised Domain Generalization

Leo Fillioux, Omprakash Chakraborty, Quentin Gopée, Pierre Marza, Paul-Henry Cournède, Stergios Christodoulidis, Maria Vakalopoulou, Ismail Ben Ayed, Jose Dolz

2603.08433 2026-03-10 cs.RO

FoMo: A Multi-Season Dataset for Robot Navigation in Forêt Montmorency

Matěj Boxan, Gabriel Jeanson, Alexander Krawciw, Effie Daum, Xinyuan Qiao, Sven Lilge, Timothy D. Barfoot, François Pomerleau

2603.08429 2026-03-10 cs.CL cs.AI cs.IR

One Model Is Enough: Native Retrieval Embeddings from LLM Agent Hidden States

Bo Jiang

2603.08425 2026-03-10 cs.AI cs.HC cs.LG cs.MA cs.SY eess.SY

IronEngine: Towards General AI Assistant

Xi Mo

Comments Technical Report

2603.08424 2026-03-10 cs.LG cs.AI

SYNAPSE: Framework for Neuron Analysis and Perturbation in Sequence Encoding

Jesús Sánchez Ochoa, Enrique Tomás Martínez Beltrán, Alberto Huertas Celdrán

Abstract

In recent years, Artificial Intelligence has become a powerful partner for complex tasks such as data analysis, prediction, and problem-solving, yet its lack of transparency raises concerns about its reliability. In sensitive domains such as healthcare or cybersecurity, ensuring transparency, trustworthiness, and robustness is essential, since the consequences of wrong decisions or successful attacks can be severe. Prior neuron-level interpretability approaches are primarily descriptive, task-dependent, or require retraining, which limits their use as systematic, reusable tools for evaluating internal robustness across architectures and domains. To overcome these limitations, this work proposes SYNAPSE, a systematic, training-free framework for understanding and stress-testing the internal behavior of Transformer models across domains. It extracts per-layer [CLS] representations, trains a lightweight linear probe to obtain global and per-class neuron rankings, and applies forward-hook interventions during inference. This design enables controlled experiments on internal representations without altering the original model, thereby allowing weaknesses, stability patterns, and label-specific sensitivities to be measured and compared directly across tasks and architectures. Across all experiments, SYNAPSE reveals a consistent, domain-independent organization of internal representations, in which task-relevant information is encoded in broad, overlapping neuron subsets. This redundancy provides a strong degree of functional stability, while class-wise asymmetries expose heterogeneous specialization patterns and enable label-aware analysis. In contrast, small structured manipulations in weight or logit space are sufficient to redirect predictions, highlighting complementary vulnerability profiles and illustrating how SYNAPSE can guide the development of more robust Transformer models.
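
The probe-then-intervene loop described in this abstract can be illustrated with a toy example. This is a minimal sketch and not the SYNAPSE code: the function names, the use of absolute probe weight as the importance score, and zeroing as the intervention are all assumptions made for illustration, and plain lists stand in for the [CLS] vector and the trained probe.

```python
def rank_neurons(probe_weights):
    """Return (global_ranking, per_class_rankings) of neuron indices.

    probe_weights[c][j] is the weight a linear probe gives neuron j for class c.
    Importance = absolute weight (an assumed scoring rule).
    """
    n_neurons = len(probe_weights[0])
    per_class = [
        sorted(range(n_neurons), key=lambda j: -abs(row[j]))
        for row in probe_weights
    ]
    global_score = [sum(abs(row[j]) for row in probe_weights) for j in range(n_neurons)]
    global_rank = sorted(range(n_neurons), key=lambda j: -global_score[j])
    return global_rank, per_class


def ablate(representation, neuron_ids):
    """Zero the selected neurons, mimicking a hook that edits activations at inference."""
    drop = set(neuron_ids)
    return [0.0 if j in drop else x for j, x in enumerate(representation)]


def probe_logits(probe_weights, representation):
    """Class scores of the linear probe for one representation vector."""
    return [sum(w * x for w, x in zip(row, representation)) for row in probe_weights]


# Hypothetical 2-class probe over a 3-neuron [CLS] vector.
probe_weights = [[2.0, -0.1, 0.5], [-0.3, 0.2, 1.0]]
cls_vector = [1.0, 1.0, 1.0]

global_rank, per_class_rank = rank_neurons(probe_weights)
before = probe_logits(probe_weights, cls_vector)
after = probe_logits(probe_weights, ablate(cls_vector, global_rank[:1]))

print(global_rank, per_class_rank)
# Ablating the single top-ranked neuron flips the predicted class from 0 to 1.
print(before.index(max(before)), after.index(max(after)))
```

In a real Transformer the `ablate` step would typically run inside a forward hook on the chosen layer, so the original weights stay untouched.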

2603.08423 2026-03-10 cs.RO

Tactile Recognition of Both Shapes and Materials with Automatic Feature Optimization-Enabled Meta Learning

Hongliang Zhao, Wenhui Yang, Yang Chen, Zhuorui Wang, Baiheng Liu, Longhui Qin

Comments 7 pages, 7 figures, conference paper accepted by ICRA 2026

2603.08418 2026-03-10 cs.LG

Meta-RL with Shared Representations Enables Fast Adaptation in Energy Systems

Théo Zangato, Aomar Osmani, Pegah Alizadeh

Comments accepted at PAKDD 2026, Hong Kong

2603.08412 2026-03-10 cs.CL cs.AI

Aligning to Illusions: Choice Blindness in Human and AI Feedback

Wenbin Wu

Comments 16 pages, 6 figures, 2 tables

2603.08399 2026-03-10 cs.LG cs.AI cs.RO

A Recipe for Stable Offline Multi-agent Reinforcement Learning

Dongsu Lee, Daehee Lee, Amy Zhang

Comments Preprint

2603.08398 2026-03-10 cs.CL cs.AI cs.LG

Revealing Behavioral Plasticity in Large Language Models: A Token-Conditional Perspective

Liyuan Mao, Le Yu, Jing Zhou, Chujie Zheng, Bowen Yu, Chang Gao, Shixuan Liu, An Yang, Weinan Zhang, JunYang Lin

Comments Work done during an internship at the Qwen Team, Alibaba Group

2603.08392 2026-03-10 cs.CL

COACH meets QUORUM: A Framework and Pipeline for Aligning User, Expert and Developer Perspectives in LLM-generated Health Counselling

Yee Man Ng, Bram van Dijk, Pieter Beynen, Otto Boekesteijn, Joris Jansen, Gerard van Oortmerssen, Max van Duijn, Marco Spruit

Comments Under review for the CL4Health workshop

2603.08387 2026-03-10 cs.CV

AULLM++: Structural Reasoning with Large Language Models for Micro-Expression Recognition

Zhishu Liu, Kaishen Yuan, Bo Zhao, Hui Ma, Zitong Yu

2603.08386 2026-03-10 cs.CV

Real-Time Drone Detection in Event Cameras via Per-Pixel Frequency Analysis

Michael Bezick, Majid Sahin

2603.08383 2026-03-10 cs.RO

MoMaStage: Skill-State Graph Guided Planning and Closed-Loop Execution for Long-Horizon Indoor Mobile Manipulation

Chenxu Li, Zixuan Chen, Yetao Li, Jiapeng Xu, Hongyu Ding, Jieqi Shi, Jing Huo, Yang Gao

Comments 8 pages

2603.08379 2026-03-10 cs.RO

Perception-Aware Communication-Free Multi-UAV Coordination in the Wild

Manuel Boldrer, Michal Kamler, Afzal Ahmad, Martin Saska

2603.08377 2026-03-10 cs.LG stat.ML

Beyond the Markovian Assumption: Robust Optimization via Fractional Weyl Integrals in Imbalanced Data

Gustavo A. Dorrego

Comments 5 pages, 3 figures

2603.08374 2026-03-10 cs.CV

This Looks Distinctly Like That: Grounding Interpretable Recognition in Stiefel Geometry against Neural Collapse

Junhao Jia, Jiaqi Wang, Yunyou Liu, Haodong Jing, Yueyi Wu, Xian Wu, Yefeng Zheng

2603.08369 2026-03-10 cs.AI

M$^3$-ACE: Rectifying Visual Perception in Multimodal Math Reasoning via Multi-Agentic Context Engineering

Peijin Xie, Zhen Xu, Bingquan Liu, Baoxun Wang

Abstract

Multimodal large language models have recently shown promising progress in visual mathematical reasoning. However, their performance is often limited by a critical yet underexplored bottleneck: inaccurate visual perception. Through systematic analysis, we find that most failures originate from incorrect or incomplete visual evidence extraction rather than deficiencies in reasoning capability. Moreover, models tend to remain overly confident in their initial perceptions, making standard strategies such as prompt engineering, multi-round self-reflection, or posterior guidance insufficient to reliably correct errors. To address this limitation, we propose M3-ACE, a multi-agentic context engineering framework designed to rectify visual perception in multimodal math reasoning. Instead of directly aggregating final answers, our approach decouples perception and reasoning by dynamically maintaining a shared context centered on visual evidence lists. Multiple agents collaboratively contribute complementary observations, enabling the system to expose inconsistencies and recover missing perceptual information. To support stable multi-turn collaboration, we further introduce two lightweight tools: a Summary Tool that organizes evidence from different agents into consistent, complementary, and conflicting components, and a Refine Tool that filters unreliable samples and guides iterative correction. Extensive experiments demonstrate that M3-ACE substantially improves visual mathematical reasoning performance across multiple benchmarks. Our method establishes a new state-of-the-art result of 89.1 on the MathVision benchmark and achieves consistent improvements on other related datasets, including MathVista and MathVerse. These results highlight the importance of perception-centric multi-agent collaboration for advancing multimodal reasoning systems.
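
The three-way split performed by the Summary Tool can be pictured with a small example. This is a hedged sketch, not the M3-ACE implementation: the function name, the dictionary layout, and the rule used to decide what counts as consistent, complementary, or conflicting are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_evidence(agent_evidence):
    """Split per-agent evidence into consistent / complementary / conflicting parts.

    agent_evidence maps an agent name to {evidence_key: observed_value}.
    Assumed rule: agents disagree -> conflicting; several agents agree ->
    consistent; reported by only one agent -> complementary.
    """
    reports = defaultdict(dict)  # evidence_key -> {agent: value}
    for agent, evidence in agent_evidence.items():
        for key, value in evidence.items():
            reports[key][agent] = value

    summary = {"consistent": {}, "complementary": {}, "conflicting": {}}
    for key, by_agent in reports.items():
        values = set(by_agent.values())
        if len(values) > 1:
            summary["conflicting"][key] = dict(by_agent)
        elif len(by_agent) > 1:
            summary["consistent"][key] = next(iter(values))
        else:
            summary["complementary"][key] = next(iter(values))
    return summary


# Hypothetical visual evidence lists read off a geometry figure by two agents.
agents = {
    "agent_a": {"angle_ABC": "90", "side_AB": "5"},
    "agent_b": {"angle_ABC": "90", "side_AB": "6", "side_BC": "12"},
}
summary = summarize_evidence(agents)
print(summary["consistent"])     # {'angle_ABC': '90'}
print(summary["complementary"])  # {'side_BC': '12'}
print(summary["conflicting"])    # {'side_AB': {'agent_a': '5', 'agent_b': '6'}}
```

The conflicting bucket is the part a later refinement step would need to resolve, since it marks where the agents' perceptions of the figure disagree.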

2603.08364 2026-03-10 cs.CV

Diffusion-Based Data Augmentation for Image Recognition: A Systematic Analysis and Evaluation

Zekun Li, Yinghuan Shi, Yang Gao, Dong Xu

2603.08361 2026-03-10 cs.CV

$Δ$VLA: Prior-Guided Vision-Language-Action Models via World Knowledge Variation

Yijie Zhu, Jie He, Rui Shao, Kaishen Yuan, Tao Tan, Xiaochen Yuan, Zitong Yu

2603.08358 2026-03-10 cs.CL

Do Language Models Know Theo Has a Wife? Investigating the Proviso Problem

Tara Azin, Daniel Dumitrescu, Diana Inkpen, Raj Singh

2603.08349 2026-03-10 cs.LG cs.AI stat.ML

Towards plausibility in time series counterfactual explanations

Marcin Kostrzewa, Krzysztof Galus, Maciej Zięba

2603.08347 2026-03-10 cs.CV

Local-Global Prompt Learning via Sparse Optimal Transport

Deniz Kizaroğlu, Ülku Tuncer Küçüktas, Emre Çakmakyurdu, Alptekin Temizel

Comments 9 pages, 3 figures, 4 tables. Code available at GitHub

2603.08342 2026-03-10 cs.RO

PhaForce: Phase-Scheduled Visual-Force Policy Learning with Slow Planning and Fast Correction for Contact-Rich Manipulation

Mingxin Wang, Zhirun Yue, Renhao Lu, Yizhe Li, Zihan Wang, Guoping Pan, Kangkang Dong, Jun Cheng, Yi Cheng, Houde Liu

2603.08336 2026-03-10 cs.RO

Hierarchical Multi-Modal Planning for Fixed-Altitude Sparse Target Search and Sampling

Lingpeng Chen, Yuchen Zheng, Apple Pui-Yi Chui, Junfeng Wu, Ziyang Hong

Comments 8 pages, 9 figures, conference

2603.08329 2026-03-10 cs.CL cs.AI cs.IR

SPD-RAG: Sub-Agent Per Document Retrieval-Augmented Generation

Yagiz Can Akay, Muhammed Yusuf Kartal, Esra Alparslan, Faruk Ortakoyluoglu, Arda Akpinar

Comments 12 pages

2603.08328 2026-03-10 cs.CV cs.LG

Beyond Attention Heatmaps: How to Get Better Explanations for Multiple Instance Learning Models in Histopathology

Mina Jamshidi Idaji, Julius Hense, Tom Neuhäuser, Augustin Krause, Yanqing Luo, Oliver Eberle, Thomas Schnake, Laure Ciernik, Farnoush Rezaei Jafari, Reza Vahidimajd, Jonas Dippel, Christoph Walz, Frederick Klauschen, Andreas Mock, Klaus-Robert Müller

Abstract

Multiple instance learning (MIL) has enabled substantial progress in computational histopathology, where a large number of patches from gigapixel whole slide images are aggregated into slide-level predictions. Heatmaps are widely used to validate MIL models and to discover tissue biomarkers. Yet, the validity of these heatmaps has barely been investigated. In this work, we introduce a general framework for evaluating the quality of MIL heatmaps without requiring additional labels. We conduct a large-scale benchmark experiment to assess six explanation methods across histopathology task types (classification, regression, survival), MIL model architectures (Attention-, Transformer-, Mamba-based), and patch encoder backbones (UNI2, Virchow2). Our results show that explanation quality mostly depends on MIL model architecture and task type, with perturbation ("Single"), layer-wise relevance propagation (LRP), and integrated gradients (IG) consistently outperforming attention-based and gradient-based saliency heatmaps, which often fail to reflect model decision mechanisms. We further demonstrate the advanced capabilities of the best-performing explanation methods: (i) we provide a proof of concept that MIL heatmaps of a bulk gene expression prediction model can be correlated with spatial transcriptomics for biological validation, and (ii) we showcase the discovery of distinct model strategies for predicting human papillomavirus (HPV) infection from head and neck cancer slides. Our work highlights the importance of validating MIL heatmaps and establishes that improved explainability can enable more reliable model validation and yield biological insights, making a case for a broader adoption of explainable AI in digital pathology. Our code is provided in a public GitHub repository: https://github.com/bifold-pathomics/xMIL/tree/xmil-journal
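
A toy example can show why an attention heatmap and a perturbation-style heatmap may disagree. This is a minimal sketch and not the xMIL code: the tiny attention-MIL model, its weights, and the reading of a "Single"-style score as the model output on a bag holding only one patch are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_mil(bag, w_att, w_out):
    """Toy attention-MIL model: returns (slide_prediction, attention_weights)."""
    attn = softmax([dot(w_att, h) for h in bag])
    # Attention-weighted average of the patch embeddings.
    pooled = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(len(w_out))]
    return dot(w_out, pooled), attn


def single_patch_scores(bag, w_att, w_out):
    """Score each patch by the model output on a bag containing only that patch."""
    return [attention_mil([h], w_att, w_out)[0] for h in bag]


# Hypothetical 2-dimensional patch embeddings and model weights.
bag = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
w_att = [1.0, 0.0]
w_out = [1.0, 2.0]

prediction, attn = attention_mil(bag, w_att, w_out)
scores = single_patch_scores(bag, w_att, w_out)

print(attn.index(max(attn)))      # 2: the third patch receives the most attention
print(scores)                     # [1.0, 2.0, 0.0]
print(scores.index(max(scores)))  # 1: the second patch carries the most evidence
```

In this toy model the most attended patch contributes the least evidence on its own, which is one way an attention heatmap can fail to reflect what drives the prediction.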

2603.08324 2026-03-10 cs.RO cs.AI

EndoSERV: A Vision-based Endoluminal Robot Navigation System

Junyang Wu, Fangfang Xie, Minghui Zhang, Hanxiao Zhang, Jiayuan Sun, Yun Gu, Guang-Zhong Yang