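arXivDaily daily academic digest: synced with the full arXiv dataset, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.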

2603.09938 2026-03-31 cs.CL

Model Merging in the Era of Large Language Models: Methods, Applications, and Future Directions

Mingyang Song, Mao Zheng

Abstract

Model merging combines the parameters of multiple neural networks into a single model without additional training. As fine-tuned large language models (LLMs) proliferate, merging offers a computationally efficient alternative to ensembles and full retraining, enabling practitioners to compose specialized capabilities at minimal cost. This survey examines model merging in the LLM era through the FUSE taxonomy, organized along Foundations, Unification Strategies, Scenarios, and Ecosystem. We first establish the theoretical underpinnings of merging, including loss landscape geometry and mode connectivity, then systematically review the algorithmic space spanning weight averaging, task vector arithmetic, sparsification-enhanced methods, mixture-of-experts architectures, and evolutionary optimization. We further examine downstream applications across multi-task learning, safety alignment, domain specialization, and federated learning, and survey the supporting ecosystem of tools and evaluation benchmarks. Finally, we identify key open challenges and future directions, aiming to equip researchers and practitioners with a structured foundation for advancing model merging.

2602.13191 2026-03-31 cs.CV cs.AI cs.CL

CoPE-VideoLM: Leveraging Codec Primitives For Efficient Video Language Modeling

Sayan Deb Sarkar, Rémi Pautrat, Ondrej Miksik, Marc Pollefeys, Iro Armeni, Mahdi Rad, Mihai Dusmanu

Comments Project Page: https://microsoft.github.io/CoPE

2602.11448 2026-03-31 cs.LG cs.CV

Hierarchical Concept Embedding & Pursuit for Interpretable Image Classification

Nghia Nguyen, Tianjiao Ding, René Vidal

Comments To be published in Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2601.15288 2026-03-31 cs.CV

APPLE: Attribute-Preserving Pseudo-Labeling for Diffusion-Based Face Swapping

Jiwon Kang, Yeji Choi, JoungBin Lee, Wooseok Jang, Jinhyeok Choi, Taekeun Kang, Yongjae Park, Myungin Kim, Seungryong Kim

Comments Accepted at CVPR 2026. Project Page: https://cvlab-kaist.github.io/APPLE/

2601.11404 2026-03-31 cs.RO

ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models

Linqing Zhong, Yi Liu, Yifei Wei, Ziyu Xiong, Maoqing Yao, Si Liu, Guanghui Ren

Comments Accepted by Conference on Computer Vision and Pattern Recognition (CVPR) 2026

2601.04497 2026-03-31 cs.CV cs.AI cs.CL

Vision-Language Agents for Interactive Forest Change Analysis

James Brock, Ce Zhang, Nantheera Anantrasirichai

Comments 5 pages, 4 figures, Accepted into IGARSS 2026

2512.04890 2026-03-31 cs.CV

Equivariant symmetry-aware head pose estimation for fetal MRI

Ramya Muthukrishnan, Borjan Gagoski, Aryn Lee, P. Ellen Grant, Elfar Adalsteinsson, Benjamin Billot, Polina Golland

2511.07732 2026-03-31 cs.RO cs.AI cs.CL cs.CV cs.LG

ViPRA: Video Prediction for Robot Actions

Sandeep Routray, Hengkai Pan, Unnat Jain, Shikhar Bahl, Deepak Pathak

Comments In ICLR 2026. Website: https://vipra-project.github.io

2510.10063 2026-03-31 cs.CL cs.AI

CLMN: Concept based Language Models via Neural Symbolic Reasoning

Yibo Yang

Comments 7 pages, 2 figures

2510.01349 2026-03-31 cs.LG stat.ML

To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking

Hannah Lawrence, Elyssa Hofgard, Vasco Portilheiro, Yuxuan Chen, Tess Smidt, Robin Walters

Comments Published as a conference paper at ICLR 2026. A short version of this paper appeared at the ICLR AI4Mat workshop in April 2025

2509.22578 2026-03-31 cs.RO

EgoDemoGen: Egocentric Demonstration Generation for Viewpoint Generalization in Robotic Manipulation

Yuan Xu, Jiabing Yang, Xiaofeng Wang, Yixiang Chen, Zheng Zhu, Bowen Fang, Guan Huang, Xinze Chen, Yun Ye, Qiang Zhang, Peiyan Li, Xiangnan Wu, Kai Wang, Bing Zhan, Shuo Lu, Jing Liu, Nianfeng Liu, Yan Huang, Liang Wang

2509.04094 2026-03-31 cs.RO

Object-Reconstruction-Aware Whole-body Control of Mobile Manipulators

Fatih Dursun, Bruno Vilhena Adorno, Simon Watson, Wei Pan

Comments 19 pages, 17 figures, 5 tables. Under Review for the IEEE Transactions on Robotics (T-RO)

2508.09096 2026-03-31 cs.CL cs.IR

Link Prediction for Event Logs in the Process Industry

Anastasia Zhukova, Thomas Walton, Christian E. Lobmüller, Bela Gipp

Comments accepted to RESOURCEFUL 2026, co-located with LREC 2026

2505.05800 2026-03-31 cs.RO cs.CV

3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks

Vineet Bhat, Yu-Hsiang Lan, Prashanth Krishnamurthy, Ramesh Karri, Farshad Khorrami

Comments Accepted at the 1st Workshop on 3D LLM/VLA, CVPR 2025. This work has been submitted to the IEEE for possible publication

2411.17132 2026-03-31 cs.LG

Understanding SAM's Robustness to Noisy Labels through Gradient Down-weighting

Hoang-Chau Luong, Quang-Thuc Nguyen, Dat Ba Tran, Minh-Triet Tran

2409.11018 2026-03-31 cs.CV

Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation

Rui Yu, Runkai Zhao, Jiagen Li, Qingsong Zhao, HuaiCheng Yan, Meng Wang

2404.15390 2026-03-31 cs.LG cs.AI

Remedying uncertainty representations in visual inference through Explaining-Away Variational Autoencoders

Josefina Catoni, Domonkos Martos, Ferenc Csikor, Enzo Ferrante, Diego H. Milone, Balázs Meszéna, Gergő Orbán, Rodrigo Echeveste

2603.28625 2026-03-31 cs.RO cs.AI cs.SY eess.SY

Dynamic Lookahead Distance via Reinforcement Learning-Based Pure Pursuit for Autonomous Racing

Mohamed Elgouhary, Amr S. El-Wakeel

2603.28613 2026-03-31 cs.CV cs.AI cs.CR cs.MM

TGIF2: Extended Text-Guided Inpainting Forgery Dataset & Benchmark

Hannes Mareen, Dimitrios Karageorgiou, Paschalis Giakoumoglou, Peter Lambert, Symeon Papadopoulos, Glenn Van Wallendael

Comments 33 pages, accepted at Journal on Information Security

2603.28611 2026-03-31 cs.LG

LACE: Loss-Adaptive Capacity Expansion for Continual Learning

Shivnath Tathe

2603.28605 2026-03-31 cs.CV cs.CY cs.LG

Unsafe2Safe: Controllable Image Anonymization for Downstream Utility

Mih Dinh, SouYoung Jin

Comments Accepted at CVPR 2026 and CVPR 2026 Workshop on Machine Unlearning for Computer Vision

2603.28603 2026-03-31 cs.CV

ELViS: Efficient Visual Similarity from Local Descriptors that Generalizes Across Domains

Pavel Suma, Giorgos Kordopatis-Zilos, Yannis Kalantidis, Giorgos Tolias

Comments ICLR 2026

2603.28597 2026-03-31 cs.LG

Position: Explainable AI is Causality in Disguise

Amir-Hossein Karimi

2603.28593 2026-03-31 cs.LG physics.app-ph

Physics-Informed Framework for Impact Identification in Aerospace Composites

Natália Ribeiro Marinho, Richard Loendersloot, Jan Willem Wiegman, Frank Grooteman, Tiedo Tinga

2603.28589 2026-03-31 cs.AI cs.LG

Towards a Medical AI Scientist

Hongtao Wu, Boyun Zheng, Dingjie Song, Yu Jiang, Jianfeng Gao, Lei Xing, Lichao Sun, Yixuan Yuan

Abstract

Autonomous systems that generate scientific hypotheses, conduct experiments, and draft manuscripts have recently emerged as a promising paradigm for accelerating discovery. However, existing AI Scientists remain largely domain-agnostic, limiting their applicability to clinical medicine, where research is required to be grounded in medical evidence with specialized data modalities. In this work, we introduce Medical AI Scientist, the first autonomous research framework tailored to clinical autonomous research. It enables clinically grounded ideation by transforming extensively surveyed literature into actionable evidence through a clinician-engineer co-reasoning mechanism, which improves the traceability of generated research ideas. It further facilitates evidence-grounded manuscript drafting guided by structured medical compositional conventions and ethical policies. The framework operates under 3 research modes, namely paper-based reproduction, literature-inspired innovation, and task-driven exploration, each corresponding to a distinct level of automated scientific inquiry with progressively increasing autonomy. Comprehensive evaluations by both large language models and human experts demonstrate that the ideas generated by the Medical AI Scientist are of substantially higher quality than those produced by commercial LLMs across 171 cases, 19 clinical tasks, and 6 data modalities. Meanwhile, our system achieves strong alignment between the proposed method and its implementation, while also demonstrating significantly higher success rates in executable experiments. Double-blind evaluations by human experts and the Stanford Agentic Reviewer suggest that the generated manuscripts approach MICCAI-level quality, while consistently surpassing those from ISBI and BIBM. The proposed Medical AI Scientist highlights the potential of leveraging AI for autonomous scientific discovery in healthcare.

2603.28583 2026-03-31 cs.CV cs.AI cs.MM

Navigating the Mirage: A Dual-Path Agentic Framework for Robust Misleading Chart Question Answering

Yanjie Zhang, Yafei Li, Rui Sheng, Zixin Chen, Yanna Lin, Huamin Qu, Lei Chen, Yushi Sun

Comments 10 pages, 4 figures

2603.28581 2026-03-31 cs.RO

A Self-Rotating Tri-Rotor UAV for Field of View Expansion and Autonomous Flight

Xiaobin Zhou, Zihao Zheng, Aoxu Jin, Lei Qiang, Bo Zhu

2603.28575 2026-03-31 cs.LG cs.AI

ChemCLIP: Bridging Organic and Inorganic Anticancer Compounds Through Contrastive Learning

Mohamad Koohi-Moghadam, Hongzhe Sun, Hongyan Li, Kyongtae Tyler Bae

Comments 15 pages

2603.28573 2026-03-31 cs.LG cs.AI cs.MA

Learning Partial Action Replacement in Offline MARL

Yue Jin, Giovanni Montana

2603.28572 2026-03-31 cs.LG

Unrestrained Simplex Denoising for Discrete Data. A Non-Markovian Approach Applied to Graph Generation

Yoann Boget, Alexandros Kalousis

Comments Simplex Denoising