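arXivDaily: a daily academic digest synced with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and related fields.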

2410.05352 2026-03-31 cs.LG cs.AI

Recent Advances of Multimodal Continual Learning: A Comprehensive Survey

Dianzhi Yu, Xinni Zhang, Yankai Chen, Aiwei Liu, Yifei Zhang, Philip S. Yu, Irwin King

Abstract

Continual learning (CL) aims to empower machine learning models to learn continually from new data, while building upon previously acquired knowledge without forgetting. As models have evolved from small to large pre-trained architectures, and from supporting unimodal to multimodal data, multimodal continual learning (MMCL) methods have recently emerged. The primary complexity of MMCL is that it extends beyond a simple stacking of unimodal CL methods. Such straightforward approaches often suffer from multimodal catastrophic forgetting, yielding unsatisfactory performance. In addition, MMCL introduces new challenges that unimodal CL methods fail to adequately address, including modality imbalance, complex modality interaction, high computational costs, and degradation of pre-trained zero-shot capability of multimodal backbones. In this work, we present the first comprehensive survey on MMCL. We provide essential background knowledge and MMCL settings, as well as a structured taxonomy of MMCL methods. We categorize MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based methods, explaining their methodologies and highlighting their key innovations. Additionally, to prompt further research in this field, we summarize open MMCL datasets and benchmarks, provide an in-depth discussion, and discuss several promising future directions. We have also created a GitHub repository for indexing relevant MMCL papers and open resources available at https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning.

2409.20283 2026-03-31 cs.CV

Match Stereo Videos via Bidirectional Alignment

Junpeng Jing, Ye Mao, Anlan Qiu, Krystian Mikolajczyk

Comments TPAMI 2026

2409.03166 2026-03-31 cs.RO cs.AI cs.CL

Continual Robot Skill and Task Learning via Dialogue

Weiwei Gu, Suresh Kondepudi, Anmol Gupta, Lixiao Huang, Nakul Gopalan

2405.13580 2026-03-31 cs.CV cs.HC

AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks

Omar Moured, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen

Comments Concerns about reproducibility of the training results and dataset availability

2402.05066 2026-03-31 cs.RO

Mobile Robot Exploration Without Maps via Out-of-Distribution Deep Reinforcement Learning

Shathushan Sivashangaran, Apoorva Khairnar, Azim Eskandarian

Comments © 2025 the authors. This work has been accepted to IFAC for publication under a Creative Commons License CC-BY-NC-ND

2306.02192 2026-03-31 cs.LG cs.NA math.NA

Correcting Auto-Differentiation in Neural-ODE Training

Yewei Xu, Shi Chen, Qin Li

Comments Accepted for publication in SIAM Journal on Applied Mathematics. This version corresponds to the final draft, prior to copyediting and production

2301.12230 2026-03-31 cs.LG cs.AI

Continual Graph Learning: A Survey

Qiao Yuan, Sheng-Uei Guan, Pin Ni, Tianlun Luo, Ka Lok Man, Prudence Wong, Victor Chang

2211.01512 2026-03-31 cs.LG math.ST stat.TH

Convergence of the Inexact Langevin Algorithm in KL Divergence with Application to Score-based Generative Models

Kaylee Yingxi Yang, Andre Wibisono

Comments Improved SGM convergence dependency on the LSI constant, and a minor correction to the MGF error assumption

2209.14267 2026-03-31 cs.LG cs.CV

Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets

Deborah Pereg, Martin Villiger, Brett Bouma, Polina Golland

Comments Version 3 is focused exclusively on the first part of v1 and v2, correcting minor mathematical errors. The original co-authors have transitioned in separate follow-up works

2603.26934 2026-03-31 cs.CV

Leveraging Avatar Fingerprinting: A Multi-Generator Photorealistic Talking-Head Public Database and Benchmark

Laura Pedrouzo-Rodriguez, Luis F. Gomez, Ruben Tolosana, Ruben Vera-Rodriguez, Roberto Daza, Aythami Morales, Julian Fierrez

2603.26929 2026-03-31 cs.CV

Live Interactive Training for Video Segmentation

Xinyu Yang, Haozheng Yu, Yihong Sun, Bharath Hariharan, Jennifer J. Sun

Comments CVPR 2026

2603.26908 2026-03-31 cs.CV

FusionAgent: A Multimodal Agent with Dynamic Model Selection for Human Recognition

Jie Zhu, Xiao Guo, Yiyang Su, Anil Jain, Xiaoming Liu

Comments CVPR 2026

2603.26900 2026-03-31 cs.CV

Computer Vision with a Superpixelation Camera

Sasidharan Mahalingam, Rachel Brown, Atul Ingle

2603.26891 2026-03-31 cs.LG cs.AI cs.GT

Strategic Candidacy in Generative AI Arenas

Chris Hays, Rachel Li, Bailey Flanigan, Manish Raghavan

Comments 43 pages, 5 figures

2603.26889 2026-03-31 cs.LG cond-mat.mtrl-sci

Property-Guided Molecular Generation and Optimization via Latent Flows

Alexander Arjun Lobo, Urvi Awasthi, Leonid Zhukov

Comments 25 pages, 18 figures. Accepted to ICLR 2026 AI4Mat Workshop

2603.26866 2026-03-31 cs.CV cs.AI

LACON: Training Text-to-Image Model from Uncurated Data

Zhiyang Liang, Ziyu Wan, Hongyu Liu, Dong Chen, Qiu Shen, Hao Zhu, Dongdong Chen

2603.26859 2026-03-31 cs.CV cs.AI eess.IV

Beyond Textual Knowledge-Leveraging Multimodal Knowledge Bases for Enhancing Vision-and-Language Navigation

Dongsheng Yang, Yinfeng Yu, Liejun Wang

Comments Main paper (37 pages). Accepted for publication in Information Processing and Management, Volume 63, Issue 6, September 2026, 104766

2603.26858 2026-03-31 cs.LG math.SP q-bio.GN stat.ML

A Hierarchical Sheaf Spectral Embedding Framework for Single-Cell RNA-seq Analysis

Xiang Xiang Wang, Guo-Wei Wei

2603.26856 2026-03-31 cs.SD cs.AI eess.AS

AFSS: Artifact-Focused Self-Synthesis for Mitigating Bias in Audio Deepfake Detection

Hai-Son Nguyen-Le, Hung-Cuong Nguyen-Thanh, Nhien-An Le-Khac, Dinh-Thuc Nguyen, Hong-Hanh Nguyen-Le

Comments Accepted at International Joint Conference on Neural Networks 2026

2603.26849 2026-03-31 cs.CV

Dual-View Optical Flow for 4D Micro-Expression Recognition - A Multi-Stream Fusion Attention Approach

Luu Tu Nguyen, Thi Bich Phuong Man, Vu Tram Anh Khuong, Thanh Ha Le, Thi Duyen Ngo

2603.26841 2026-03-31 cs.LG cs.AI

FatigueFormer: Static-Temporal Feature Fusion for Robust sEMG-Based Muscle Fatigue Recognition

Tong Zhang, Hong Guo, Shuangzhou Yan, Dongkai Weng, Jian Wang, Hongxin Zhang

2603.26838 2026-03-31 cs.AI cs.LG

Concerning Uncertainty -- A Systematic Survey of Uncertainty-Aware XAI

Helena Löfström, Tuwe Löfström, Anders Hjort, Fatima Rabia Yapicioglu

Comments 21 pages, 2 figures, journal

2603.26837 2026-03-31 cs.RO cs.AI cs.CV

SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation

Jiwen Zhang, Xiangyu Shi, Siyuan Wang, Zerui Li, Zhongyu Wei, Qi Wu

Comments 10 pages, 4 figures, 5 tables. Homepage: https://imnearth.github.io/Spatial-X/

2603.26831 2026-03-31 cs.CV cs.AI

Envisioning global urban development with satellite imagery and generative AI

Kailai Sun, Yuebing Liang, Mingyi He, Yunhan Zheng, Alok Prakash, Shenhao Wang, Jinhua Zhao, Alex "Sandy" Pentland

2603.26830 2026-03-31 cs.LG cs.AI cs.SE

A Regression Framework for Understanding Prompt Component Impact on LLM Performance

Andrew Lauziere, Jonathan Daugherty, Taisa Kushner

Comments 9 pages, 4 figures, 1 table

2603.26829 2026-03-31 cs.LG cs.AI

Squish and Release: Exposing Hidden Hallucinations by Making Them Surface as Safety Signals

Nathaniel Oh, Paul Attie

2603.26828 2026-03-31 cs.CL cs.LG

Arithmetic OOD Failure Unfolds in Stages in Minimal GPTs

Seine A. Shintani

Comments 16 pages, 4 figures

2603.26827 2026-03-31 cs.LG cs.AI cs.CV

Central-to-Local Adaptive Generative Diffusion Framework for Improving Gene Expression Prediction in Data-Limited Spatial Transcriptomics

Yaoyu Fang, Jiahe Qian, Xinkun Wang, Lee A. Cooper, Bo Zhou

Comments 31 pages, 12 figures, under review

2603.26823 2026-03-31 cs.LG cs.AI cs.PF

Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations

Mayank Jha

Comments 5 pages double sided

2603.26821 2026-03-31 cs.LG cs.AI

Epileptic Seizure Prediction Using Patient-Adaptive Transformer Networks

Mohamed Mahdi, Asma Baghdadi