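arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.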

2604.03657 2026-04-07 cs.CV cs.IR cs.MM

Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning

Tianci Luo, Haohao Pan, Jinpeng Wang, Niu Lian, Xinrui Chen, Bin Chen, Shu-Tao Xia, Chun Yuan

Comments: Accepted to CVPR 2026. 10 pages, 5 figures, 3 tables

Abstract

Visual in-context learning (VICL) enables visual foundation models to handle multiple tasks by steering them with demonstrative prompts. The choice of prompts strongly influences VICL performance, which makes prompt selection a key challenge. Prior work has made substantial progress on prompt retrieval and reranking strategies, but it mainly focuses on prompt images while overlooking labels. We show that these approaches sometimes retrieve visually similar but label-inconsistent prompts, which can degrade VICL performance. Conversely, higher label consistency between the query and its prompts tends to indicate stronger VICL results. Motivated by these findings, we develop a framework named LaPR (Label-aware Prompt Retrieval), which highlights the role of labels in prompt selection. Our framework first designs an image-label joint representation for prompts to incorporate label cues explicitly. In addition, to handle the unavailability of query labels at test time, we introduce a mixture-of-experts mechanism into the dual encoders with query-adaptive routing. Each expert is expected to capture a specific label mode, while the router infers query-adaptive mixture weights and helps to learn label-aware representations. We carefully design an alternating optimization scheme for the experts and the router, using a VICL performance-guided contrastive loss and a label-guided contrastive loss, respectively. Extensive experiments show promising and consistent improvements of LaPR on in-context segmentation, detection, and colorization tasks. Moreover, LaPR generalizes well across feature extractors and cross-fold scenarios, suggesting the importance of label utilization in prompt retrieval for VICL. Code is available at https://github.com/luotc-why/CVPR26-LaPR.

2604.03656 2026-04-07 cs.AI

Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization

XinYu Zhao, ChengYou Li, XiangBao Meng, Kai Zhang, XiaoDong Liu

2604.03653 2026-04-07 cs.CV cs.IR cs.MM

Imagine Before Concentration: Diffusion-Guided Registers Enhance Partially Relevant Video Retrieval

Jun Li, Xuhang Lou, Jinpeng Wang, Yuting Wang, Yaowei Wang, Shu-Tao Xia, Bin Chen

Comments: Accepted to CVPR 2026. 15 pages, 7 figures, 3 tables

2604.03652 2026-04-07 cs.CV

Motion-Adaptive Multi-Scale Temporal Modelling with Skeleton-Constrained Spatial Graphs for Efficient 3D Human Pose Estimation

Ruochen Li, Shuang Chen, Wenke E, Farshad Arvin, Amir Atapour-Abarghouei

Comments: Accepted to IJCNN 2026, full paper

2604.03650 2026-04-07 cs.CL

CAGMamba: Context-Aware Gated Cross-Modal Mamba Network for Multimodal Sentiment Analysis

Minghai Jiao, Jing Xiao, Peng Xiao, Ende Zhang, Shuang Kan, Wenyan Jiang, Jinyao Li, Yixian Liu, Haidong Xin

2604.03649 2026-04-07 cs.CV cs.AI

ART: Adaptive Relational Transformer for Pedestrian Trajectory Prediction with Temporal-Aware Relations

Ruochen Li, Ziyi Chang, Junyan Hu, Jiannan Li, Amir Atapour-Abarghouei, Hubert P. H. Shum

2604.03640 2026-04-07 cs.CV cs.CR

ComPrivDet: Efficient Privacy Object Detection in Compressed Domains Through Inference Reuse

Yunhao Yao, Zhiqiang Wang, Ruiqi Li, Haoran Cheng, Puhan Luo, Xiangyang Li

Comments: 6 pages, 6 figures

2604.03637 2026-04-07 cs.CV

SAGE-GAN: Towards Realistic and Robust Segmentation of Spatially Ordered Nanoparticles via Attention-Guided GANs

Anindya Pal, Varun Ajith, Saumik Bhattacharya, Sayantari Ghosh

Comments: 10 pages, 7 figures, journal submission

2604.03635 2026-04-07 cs.CV cs.AI

A Generative Foundation Model for Multimodal Histopathology

Jinxi Xiang, Mingjie Li, Siyu Hou, Yijiang Chen, Xiangde Luo, Yuanfeng Ji, Xiang Zhou, Ehsan Adeli, Akshay Chaudhari, Curtis P. Langlotz, Kilian M. Pohl, Ruijiang Li

Comments: 33 pages, 9 figures

Abstract

Accurate diagnosis and treatment of complex diseases require integrating histological, molecular, and clinical data, yet in practice these modalities are often incomplete owing to tissue scarcity, assay cost, and workflow constraints. Existing computational approaches attempt to impute missing modalities from available data but rely on task-specific models trained on narrow, single source-target pairs, limiting their generalizability. Here we introduce MuPD (Multimodal Pathology Diffusion), a generative foundation model that embeds hematoxylin and eosin (H&E)-stained histology, molecular RNA profiles, and clinical text into a shared latent space through a diffusion transformer with decoupled cross-modal attention. Pretrained on 100 million histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs spanning 34 human organs, MuPD supports diverse cross-modal synthesis tasks with minimal or no task-specific fine-tuning. For text-conditioned and image-to-image generation, MuPD synthesizes histologically faithful tissue architectures, reducing Fréchet inception distance (FID) scores by 50% relative to domain-specific models and improving few-shot classification accuracy by up to 47% through synthetic data augmentation. For RNA-conditioned histology generation, MuPD reduces FID by 23% compared with the next-best method while preserving cell-type distributions across five cancer types. As a virtual stainer, MuPD translates H&E images to immunohistochemistry and multiplex immunofluorescence, improving average marker correlation by 37% over existing approaches. These results demonstrate that a single, unified generative model pretrained across heterogeneous pathology modalities can substantially outperform specialized alternatives, providing a scalable computational framework for multimodal histopathology.

2604.03631 2026-04-07 cs.AI

Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning Behaviors

Likai Peng, Shihui Feng

Comments: 15 pages, 4 figures. To be published in the 27th International Conference on Artificial Intelligence in Education (AIED 2026)

2604.03630 2026-04-07 cs.AI q-bio.QM

A Multimodal Foundation Model of Spatial Transcriptomics and Histology for Biological Discovery and Clinical Prediction

Jinxi Xiang, Siyu Hou, Yuchen Li, Ryan Quinton, Xiaoming Zhang, Feyisope Eweje, Xiangde Luo, Yijiang Chen, Zhe Li, Colin Bergstrom, Ted Kim, Sierra Willens, Francesca Maria Olguin, Matthew Abikenari, Andrew Heider, Sanjeeth Rajaram, Joel Neal, Maximilian Diehn, Xiang Zhou, Ruijiang Li

Comments: 29 pages, 5 figures. This manuscript is a work in progress; further updates and revisions will be posted as they become available

2604.03623 2026-04-07 cs.RO eess.SP

Towards Edge Intelligence via Autonomous Navigation: A Robot-Assisted Data Collection Approach

Tingting Huang, Yingyang Chen, Sixian Qin, Zhijian Lin, Jun Li, Li Wang

Comments: 6 pages, 9 figures, submitted to IEEE International Conference on Communications (ICC) 2026

2604.03619 2026-04-07 cs.CV

Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling?

Peter Yongho Kim, Juhyeon Park, Jungwoo Park, Jubin Choi, Jungwoo Seo, Jiook Cha, Taesup Moon

Comments: CVPR 2026

2604.03616 2026-04-07 cs.CL

The Format Tax

Ivan Yee Lee, Loris D'Antoni, Taylor Berg-Kirkpatrick

2604.03614 2026-04-07 cs.LG cs.AI

Neural Global Optimization via Iterative Refinement from Noisy Samples

Qusay Muzaffar, David Levin, Michael Werman

Comments: 17 pages, 5 figures, 2 tables

2604.03613 2026-04-07 cs.RO

Human-Robot Copilot for Data-Efficient Imitation Learning

Rui Yan, Zaitian Gongye, Lars Paulsen, Xuxin Cheng, Xiaolong Wang

2604.03606 2026-04-07 cs.LG

BlazeFL: Fast and Deterministic Federated Learning Simulation

Kitsuya Azuma, Takayuki Nishio

Comments: 9 pages, 4 figures. Accepted to FedVision at CVPR 2026 (CVPRW)

2604.03603 2026-04-07 cs.CV cs.LG eess.IV

Stochastic Generative Plug-and-Play Priors

Chicago Y. Park, Edward P. Chandler, Yuyang Hu, Michael T. McCann, Cristina Garcia-Cardona, Brendt Wohlberg, Ulugbek S. Kamilov

2604.03599 2026-04-07 cs.LG

Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score

Philipp Seitz, Jan Schmitt, Andreas Schiffler

Comments: 5 pages, 2 figures, 2 tables, 1 algorithm, 9th International Conference on Advances in Artificial Intelligence (ICAAI 2025)

2604.03592 2026-04-07 cs.CL cs.AI

Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation

Kening Zheng, Wei-Chieh Huang, Jiahao Huo, Zhonghao Li, Henry Peng Zou, Yibo Yan, Xin Zou, Jungang Li, Junzhuo Li, Hanrong Zhang, Xuming Hu, Philip S. Yu

2604.03590 2026-04-07 cs.CV

SBF: An Effective Representation to Augment Skeleton for Video-based Human Action Recognition

Zhuoxuan Peng, Yiyi Ding, Yang Lin, S.-H. Gary Chan

Comments: Accepted by ABAW 2026 (CVPR Workshop)

2604.03589 2026-04-07 cs.AI

Entropy and Attention Dynamics in Small Language Models: A Trace-Level Structural Analysis on the TruthfulQA Benchmark

Adeyemi Adeseye, Aisvarya Adeseye, Hannu Tenhunen, Jouni Isoaho

Comments: Accepted for publication at the 12th Intelligent Systems Conference 2026, 3-4 September 2026, Amsterdam, The Netherlands

2604.03586 2026-04-07 cs.CL

MultiPress: A Multi-Agent Framework for Interpretable Multimodal News Classification

Tailong Luo, Hao Li, Rong Fu, Xinyue Jiang, Huaxuan Ding, Yiduo Zhang, Zilin Zhao, Simon Fong, Guangyin Jin, Jianyuan Ni

Comments: Accepted at the International Joint Conference on Neural Networks (IJCNN) 2026

2604.03583 2026-04-07 cs.CL

Text Summarization With Graph Attention Networks

Mohammadreza Ardestani, Yllias Chali

Comments: Published in Proceedings of the 4th NeurIPS Efficient Natural Language and Speech Processing Workshop (ENLSP-IV), Vancouver, Canada, 2024. 14 pages, 8 figures

2604.03582 2026-04-07 cs.LG

Simple yet Effective: Low-Rank Spatial Attention for Neural Operators

Zherui Yang, Haiyang Xin, Tao Du, Ligang Liu

2604.03581 2026-04-07 cs.RO cs.CV

HAD: Combining Hierarchical Diffusion with Metric-Decoupled RL for End-to-End Driving

Wenhao Yao, Xinglong Sun, Zhenxin Li, Shiyi Lan, Zi Wang, Jose M. Alvarez, Zuxuan Wu

Comments: 17 pages, 7 figures

2604.03572 2026-04-07 cs.CV physics.optics

Physics-Informed Untrained Learning for RGB-Guided Superresolution Single-Pixel Hyperspectral Imaging

Hao Zhang, Bilige Xu, Lichen Wei, Xu Ma, Wenyi Ren

Comments: 9 pages, 13 figures, 5 tables

2604.03571 2026-04-07 cs.AI

Selective Forgetting for Large Reasoning Models

Tuan Le, Wei Qian, Mengdi Huai

2604.03565 2026-04-07 cs.AI cs.NE

Personality Requires Struggle: Three Regimes of the Baldwin Effect in Neuroevolved Chess Agents

Diego Armando Resendez Prado

Comments: 18 pages, 4 figures, 4 tables

2604.03562 2026-04-07 cs.AI

When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling

Yuanhang Li

Comments: 8 pages, 3 figures