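arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.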

2603.19157 2026-03-20 cs.CV

ADAPT: Attention Driven Adaptive Prompt Scheduling and InTerpolating Orthogonal Complements for Rare Concepts Generation

Kwanyoung Lee, Hyunwoo Oh, SeungJu Cha, Sungho Koh, Dong-Jin Kim

Comments Accepted in CVPR 2026 (findings). 10 pages, 4 figures; supplementary material included (8 pages, 10 figures)

2603.19152 2026-03-20 cs.CL cs.AI

VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models

Chonghan Liu, Yimin Du, Qi An, Xin He, Cunqi Zhai, Fei Tan, Weijia Lin, Xiaochun Gong, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang

Comments 23 pages. Includes figures and tables. Conference submission

2603.19149 2026-03-20 cs.CL cs.LG

Optimal Splitting of Language Models from Mixtures to Specialized Domains

Skyler Seto, Pierre Ablin, Anastasiia Filippova, Jiayuan Ye, Louis Bethune, Angelos Katharopoulos, David Grangier

Comments 26 pages, 11 tables, 17 figures

2603.19145 2026-03-20 cs.LG

Enhancing Pretrained Model-based Continual Representation Learning via Guided Random Projection

Ruilin Li, Heming Zou, Xiufeng Yan, Zheming Liang, Jie Yang, Chenliang Li, Xue Yang

2603.19144 2026-03-20 cs.CL cs.AI

UGID: Unified Graph Isomorphism for Debiasing Large Language Models

Zikang Ding, Junchi Yao, Junhao Li, Yi Zhang, Wenbo Jiang, Hongbo Liu, Lijie Hu

2603.19141 2026-03-20 cs.LG

SHAPCA: Consistent and Interpretable Explanations for Machine Learning Models on Spectroscopy Data

Mingxing Zhang, Nicola Rossberg, Simone Innocente, Katarzyna Komolibus, Rekha Gautam, Barry O'Sullivan, Luca Longo, Andrea Visentin

Comments 25 pages, 6 figures

2603.19139 2026-03-20 cs.LG q-bio.NC

Hierarchical Latent Structure Learning through Online Inference

Ines Aitsahalia, Kiyohito Iigaya

Comments 4 figures, 5 supplementary figures

2603.19138 2026-03-20 cs.AI cs.CR cs.SE

Implicit Patterns in LLM-Based Binary Analysis

Qiang Li, XiangRui Zhang, Haining Wang

Comments 18 pages

2603.19137 2026-03-20 cs.CV cs.RO

GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning

Yiren Lu, Yi Du, Disheng Liu, Yunlai Zhou, Chen Wang, Yu Yin

Comments Project page at https://vulab-ai.github.io/GSMem/

2603.19134 2026-03-20 cs.RO cs.HC

Introducing M: A Modular, Modifiable Social Robot

Victor Nikhil Antony, Zhili Gong, Yoonjae Kim, Chien-Ming Huang

2603.19131 2026-03-20 cs.LG cs.RO

From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models

Zhuofan Li, Hongkun Yang, Zhenyang Chen, Yangxuan Chen, Yingyan Lin, Chaojian Li

2603.19127 2026-03-20 cs.LG

On Optimizing Multimodal Jailbreaks for Spoken Language Models

Aravind Krishnan, Karolina Stańczak, Dietrich Klakow

Comments Under Review at INTERSPEECH 2026

2603.19122 2026-03-20 cs.CV

Revisiting Autoregressive Models for Generative Image Classification

Ilia Sudakov, Artem Babenko, Dmitry Baranchuk

Comments Tech report

2603.19118 2026-03-20 cs.AI cs.CL cs.LG

How Uncertainty Estimation Scales with Sampling in Reasoning Models

Maksym Del, Markus Kängsepp, Marharyta Domnich, Ardi Tampuu, Lisa Yankovskaya, Meelis Kull, Mark Fishel

2603.19098 2026-03-20 cs.CV

TAU-R1: Visual Language Model for Traffic Anomaly Understanding

Yuqiang Lin, Kehua Chen, Sam Lockyer, Arjun Yadav, Mingxuan Sui, Shucheng Zhang, Yan Shi, Bingzhang Wang, Yuang Zhang, Markus Zarbock, Florain Stanek, Adrian Evans, Wenbin Li, Yinhai Wang, Nic Zhang

2603.19097 2026-03-20 cs.CL cs.AI

DaPT: A Dual-Path Framework for Multilingual Multi-hop Question Answering

Yilin Wang, Yuchun Fan, Jiaoyang Li, Ziming Zhu, Yongyu Mu, Qiaozhi He, Tong Xiao, Jingbo Zhu

Comments Accepted by ICASSP 2026

2603.19092 2026-03-20 cs.CV cs.AI cs.CL cs.LG

SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues

Carlos Hinojosa, Clemens Grange, Bernard Ghanem

2603.19087 2026-03-20 cs.AI cs.CL

Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity

Qiawen Ella Liu, Marina Dubova, Henry Conklin, Takumi Harada, Thomas L. Griffiths

2603.19082 2026-03-20 cs.CL

A Dataset and Resources for Identifying Patient Health Literacy Information from Clinical Notes

Madeline Bittner, Dina Demner-Fushman, Yasmeen Shabazz, Davis Bartels, Dukyong Yoon, Brad Quitadamo, Rajiv Menghrajani, Leo Celi, Sarvesh Soni

2603.19077 2026-03-20 cs.CV

Multi-Modal Building Change Detection for Large-Scale Small Changes: Benchmark and Baseline

Ye Wang, Wei Lu, Zhihui You, Keyan Chen, Tongfei Liu, Kaiyu Li, Hongruixuan Chen, Qingling Shu, Sibao Chen

Comments 15 pages, 12 figures

Abstract

Change detection in optical remote sensing imagery is susceptible to illumination fluctuations, seasonal changes, and variations in surface land-cover materials. Relying solely on RGB imagery often produces pseudo-changes and leads to semantic ambiguity in features. Incorporating near-infrared (NIR) information provides heterogeneous physical cues that are complementary to visible light, thereby enhancing the discriminability of building materials and tiny structures while improving detection accuracy. However, existing multi-modal datasets generally lack high-resolution and accurately registered bi-temporal imagery, and current methods often fail to fully exploit the inherent heterogeneity between these modalities. To address these issues, we introduce the Large-scale Small-change Multi-modal Dataset (LSMD), a bi-temporal RGB-NIR building change detection benchmark dataset targeting small changes in realistic scenarios, providing a rigorous testing platform for evaluating multi-modal change detection methods in complex environments. Based on LSMD, we further propose the Multi-modal Spectral Complementarity Network (MSCNet) to achieve effective cross-modal feature fusion. MSCNet comprises three key components: the Neighborhood Context Enhancement Module (NCEM) to strengthen local spatial details, the Cross-modal Alignment and Interaction Module (CAIM) to enable deep interaction between RGB and NIR features, and the Saliency-aware Multisource Refinement Module (SMRM) to progressively refine fused features. Extensive experiments demonstrate that MSCNet effectively leverages multi-modal information and consistently outperforms existing methods under multiple input configurations, validating its efficacy for fine-grained building change detection. The source code will be made publicly available at: https://github.com/AeroVILab-AHU/LSMD

2603.19076 2026-03-20 cs.CV cs.RO

DROID-SLAM in the Wild

Moyang Li, Zihan Zhu, Marc Pollefeys, Daniel Barath

Comments CVPR 2026, Project Page: https://moyangli00.github.io/droid-w/

2603.19074 2026-03-20 cs.RO cs.AI

CAMO: A Conditional Neural Solver for the Multi-objective Multiple Traveling Salesman Problem

Fengxiaoxiao Li, Xiao Mao, Mingfeng Fan, Yifeng Zhang, Yi Li, Tanishq Duhan, Guillaume Sartoretti

Comments 9 pages, 3 figures

2603.19067 2026-03-20 cs.LG eess.SP

Communication-Efficient and Robust Multi-Modal Federated Learning via Latent-Space Consensus

Mohamed Badi, Chaouki Ben Issaid, Mehdi Bennis

Comments Accepted for publication in IEEE Wireless Communications Letters

2603.19066 2026-03-20 cs.CL cs.AI

Parallelograms Strike Back: LLMs Generate Better Analogies than People

Qiawen Ella Liu, Raja Marjieh, Jian-Qiao Zhu, Adele E. Goldberg, Thomas L. Griffiths

2603.19063 2026-03-20 cs.RO cs.GR

Fire as a Service: Augmenting Robot Simulators with Thermally and Visually Accurate Fire Dynamics

Anton R. Wagner, Madhan Balaji Rao, Helge Wrede, Sören Pirk, Xuesu Xiao

2603.19059 2026-03-20 cs.CV

SignAgent: Agentic LLMs for Linguistically-Grounded Sign Language Annotation and Dataset Curation

Oliver Cory, Ozge Mercanoglu Sincan, Richard Bowden

2603.19054 2026-03-20 cs.CV cs.AI

Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding

Yikai Zheng, Xin Ding, Yifan Yang, Shiqi Jiang, Hao Wu, Qianxi Zhang, Weijun Wang, Ting Cao, Yunxin Liu

2603.19053 2026-03-20 cs.CV cs.GR

SwiftTailor: Efficient 3D Garment Generation with Geometry Image Representation

Phuc Pham, Uy Dieu Tran, Binh-Son Hua, Phong Nguyen

Comments CVPR 2026

2603.19048 2026-03-20 cs.CV

Measuring 3D Spatial Geometric Consistency in Dynamic Generated Videos

Weijia Dou, Wenzhao Zheng, Weiliang Chen, Yu Zheng, Jie Zhou, Jiwen Lu

Comments Code available at https://github.com/tj12323/SGC

2603.19039 2026-03-20 cs.CV

TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

Yan Shu, Bin Ren, Zhitong Xiong, Xiao Xiang Zhu, Begüm Demir, Nicu Sebe, Paolo Rota

Comments Accepted by CVPR 2026 (Main Track)