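arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and other fields.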

2512.24955 2026-03-24 cs.LG cs.AI cs.RO cs.SY eess.SY

MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control

Yongwei Zhang, Yuanzhe Xing, Quanyi Liang, Quan Quan, Zhikun She

Comments This work has been submitted to the IEEE for possible publication

Abstract

For stabilizing control tasks, model-free reinforcement learning (RL) approaches face numerous challenges, particularly with respect to effectiveness and efficiency in complex high-dimensional environments with limited training data. To address these challenges, we propose Multi-Step Actor-Critic Learning with Lyapunov Certificates (MSACL), a novel approach that integrates exponential stability into off-policy maximum entropy reinforcement learning (MERL). In contrast to existing RL-based approaches that depend on elaborate reward engineering and single-step constraints, MSACL adopts an intuitive reward design and exploits multi-step samples to enable exploratory actor-critic learning. Specifically, we first introduce Exponential Stability Labels (ESLs) to categorize training samples and propose a λ-weighted aggregation mechanism to learn Lyapunov certificates. Based on these certificates, we further design a stability-aware advantage function to guide policy optimization, thereby promoting rapid Lyapunov descent and robust state convergence. We evaluate MSACL on six benchmarks, comprising four stabilizing tasks and two high-dimensional tracking tasks. Experimental results demonstrate consistent performance improvements over both standard RL baselines and state-of-the-art Lyapunov-based RL algorithms. Beyond rapid convergence, MSACL exhibits robustness to environmental uncertainties and generalizes to unseen reference signals. The source code and benchmarking environments are available at https://github.com/YuanZhe-Xing/MSACL.
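A minimal sketch of how an exponential-stability check and a λ-weighted multi-step aggregation could fit together, assuming a scalar Lyapunov value V per state, a decay rate alpha, a time step dt, and TD(λ)-style weights (1 - λ)λ^(k-1) renormalised over a finite horizon. The function names, the hinge penalty and the weighting scheme are illustrative assumptions, not MSACL's exact formulation; the linked repository holds the authors' implementation.

```python
import math
from typing import Sequence


def exponential_stability_label(v_t: float, v_tk: float, k: int,
                                alpha: float, dt: float) -> bool:
    """Hypothetical label: True if V decayed at least as fast as exp(-alpha * k * dt)."""
    return v_tk <= math.exp(-alpha * k * dt) * v_t


def lambda_weighted_violation(values: Sequence[float], alpha: float,
                              dt: float, lam: float) -> float:
    """Aggregate multi-step Lyapunov-decrease violations along one trajectory.

    values[0] is V(x_t) and values[k] is V(x_{t+k}). Step k gets the assumed
    weight (1 - lam) * lam**(k - 1), renormalised over the finite horizon.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    horizon = len(values) - 1
    if horizon < 1:
        raise ValueError("need V at two or more time steps")
    weights = [(1.0 - lam) * lam ** (k - 1) for k in range(1, horizon + 1)]
    total = sum(weights)
    loss = 0.0
    for k, w in enumerate(weights, start=1):
        # Hinge on violation of V(x_{t+k}) <= exp(-alpha * k * dt) * V(x_t).
        violation = values[k] - math.exp(-alpha * k * dt) * values[0]
        loss += (w / total) * max(0.0, violation)
    return loss
```

With lam = 0 only the one-step condition contributes; values of lam closer to 1 spread the penalty more evenly over longer horizons.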

2512.24845 2026-03-24 cs.RO

ArtiSG: Functional 3D Scene Graph Construction via Human-demonstrated Articulated Objects Manipulation

Qiuyi Gu, Yuze Sheng, Jincheng Yu, Jiahao Tang, Xiaolong Shan, Zhaoyang Shen, Tinghao Yi, Xiaodan Liang, Xinlei Chen, Yu Wang

2512.21778 2026-03-24 cs.CV

Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models

Nimrod Berman, Adam Botach, Emanuel Ben-Baruch, Shunit Haviv Hakimi, Asaf Gendler, Ilan Naiman, Erez Yosef, Igor Kviatkovsky

Comments Accepted for publication at CVPR 2026

2512.21284 2026-03-24 cs.CV

Toward Real-Time Surgical Scene Segmentation via a Spike-Driven Video Transformer with Spike-Informed Pretraining

Shihao Zou, Jingjing Li, Wei Ji, Jincai Huang, Kai Wang, Guo Dan, Weixin Si, Yi Pan

Abstract

Modern surgical systems increasingly rely on intelligent scene understanding to improve intra-operative safety and situational awareness, with surgical scene segmentation playing a fundamental role in fine-grained surgical perception. Although recent artificial neural network (ANN) models, especially large foundation models, have achieved impressive accuracy, their high computational and energy demands often hinder deployment in resource-constrained operative environments. To address this challenge, we explore spiking neural networks (SNNs) as a highly efficient paradigm. However, SNN performance in surgical scene segmentation remains constrained by sparse spike representations and limited annotated surgical data. We therefore propose SpikeSurgSeg, the first spike-driven video Transformer for surgical scene segmentation. It preserves the real-time and energy-efficient advantages of SNNs while achieving performance competitive with most ANN models in data-scarce surgical scenarios. Specifically, we introduce a spike-informed pretraining strategy based on masked autoencoders (MAE), in which mask generation is guided by spike firing activity to better align with sparse spike representations, together with a layer-wise tube masking scheme that reduces information leakage and encourages contextual reasoning. To further strengthen semantic representation, we propose multi-spectral knowledge distillation, which aligns teacher ANN and student SNN features in the frequency domain, where the mismatch between continuous activation patterns and spike-driven temporal representations can be effectively mitigated. Built on the pretrained SNN encoder, we further design a lightweight spike-driven segmentation head. Extensive experiments on EndoVis18 and our in-house SurgBleed dataset show that SpikeSurgSeg achieves mIoU comparable to state-of-the-art (SOTA) ANN models while reducing inference latency by at least 8×. Notably, it delivers over 20× speedup relative to most foundation models.
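One possible reading of mask generation guided by spike firing activity, combined with tube masking, is sketched below. It assumes per-patch firing rates are already available, that higher-firing patches are masked preferentially, and that a single spatial mask is shared across all frames of a clip. The sampling rule (Efraimidis-Spirakis weighted sampling without replacement) and the function name are hypothetical choices, and the layer-wise aspect of the paper's scheme is not modelled.

```python
import random
from typing import List, Sequence, Set


def spike_guided_tube_mask(firing_rates: Sequence[float], num_frames: int,
                           mask_ratio: float, seed: int = 0,
                           eps: float = 1e-6) -> List[Set[int]]:
    """Pick patches to mask, favouring high firing rates, and repeat the mask over time.

    firing_rates[i] is an (assumed) non-negative firing rate for spatial patch i.
    Returns one set of masked patch indices per frame; all sets are identical.
    """
    num_masked = int(round(mask_ratio * len(firing_rates)))
    rng = random.Random(seed)
    # Efraimidis-Spirakis: key = u ** (1 / w); keeping the largest keys draws
    # items without replacement, each draw proportional to weight among those left.
    keyed = [(rng.random() ** (1.0 / (rate + eps)), idx)
             for idx, rate in enumerate(firing_rates)]
    keyed.sort(reverse=True)
    chosen = {idx for _, idx in keyed[:num_masked]}
    # Sharing the spatial mask across frames (a "tube") stops the encoder from
    # recovering a masked patch by copying it from an adjacent frame.
    return [set(chosen) for _ in range(num_frames)]
```

The small eps keeps patches with zero firing from causing a division by zero; such patches receive near-zero keys and are masked last.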

2512.19402 2026-03-24 cs.RO cs.CV cs.GR

Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface

Yujie Zhao, Hongwei Fan, Di Chen, Shengcong Chen, Liliang Chen, Xiaoqi Li, Guanghui Ren, Hao Dong

Comments Accepted to CVPR 2026

2512.18687 2026-03-24 cs.AI

Social Comparison without Explicit Inference of Others' Reward Values: A Constructive Approach Using a Probabilistic Generative Model

Yosuke Taniuchi, Chie Hieida, Atsushi Noritake, Kazushi Ikeda, Masaki Isoda

Comments This work has been submitted to the IEEE for possible publication

2512.16975 2026-03-24 cs.CV cs.AI

InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression

Haotian Ye, Qiyuan He, Jiaqi Han, Puheng Li, Jiaojiao Fan, Zekun Hao, Fitsum Reda, Yogesh Balaji, Huayu Chen, Sheng Liu, Angela Yao, James Zou, Stefano Ermon, Haoxiang Wang, Ming-Yu Liu

2512.14395 2026-03-24 cs.AI

Massive Editing for Large Language Models Based on Dynamic Weight Generation

Wentao Wan, Qiqing Lao, Zhiwei Xie, Hefeng Wu, Runnan Lin, Liang Lin, Keze Wang

Comments Accepted by ICLR 2026

2512.13285 2026-03-24 cs.CV

CausalCLIP: Causally-Informed Feature Disentanglement and Filtering for Generalizable Detection of Generated Images

Bo Liu, Qiao Qin, Qinghui He

Comments 9 pages; accepted to AAAI 2026

2512.13157 2026-03-24 cs.CV cs.AI

Intrinsic Image Fusion for Multi-View 3D Material Reconstruction

Peter Kocsis, Lukas Höllein, Matthias Nießner

Comments Project page: https://peter-kocsis.github.io/IntrinsicImageFusion/ Video: https://www.youtube.com/watch?v=-Vs3tR1Xl7k

2512.11438 2026-03-24 cs.CV cs.AI

Flowception: Temporally Expansive Flow Matching for Video Generation

Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, Jakob Verbeek, Ricky T. Q. Chen

2512.11374 2026-03-24 cs.CL cs.CY

Mining Legal Arguments to Study Judicial Formalism

Tomáš Koref, Lena Held, Mahammad Namazov, Harun Kumru, Yassine Thlija, Ivan Habernal

Comments pre-print under review

2512.07698 2026-03-24 cs.CV cs.RO

sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data Only

Arslan Artykov, Tom Ravaud, Corentin Sautier, Vincent Lepetit

2512.07010 2026-03-24 cs.LG

Always Keep Your Promises: A Model-Agnostic Attribution Algorithm for Neural Networks

Kevin Lee, Duncan Smith-Halverson, Pablo Millan Arias

Comments Manuscript under review

Abstract

Layer-wise Relevance Propagation (LRP) provides principled attribution for neural networks through conservation properties and foundations in Deep Taylor Decomposition. However, existing implementations operate at the module level, requiring architecture-specific propagation rules and model modifications. These requirements limit the range of supported target models and the sustainability of implementations as architectures evolve. We introduce DynamicLRP, a model-agnostic LRP framework operating at the tensor operation level. By decomposing attribution to individual operations within computation graphs and introducing a novel mechanism for deferred activation resolution, named the Promise System, our approach achieves true architecture agnosticism while maintaining LRP's theoretical guarantees. This design operates independently of the backpropagation machinery and requires no model modification, enabling side-by-side execution with gradient backpropagation. Because it is based on computation graphs, the method is theoretically extensible to other deep learning libraries that support automatic differentiation. We demonstrate faithfulness matching or exceeding specialized implementations (1.77 vs. 1.69 ABPC on VGG, equivalent performance on ViT, and 93.70% and 95.06% top-1 attribution accuracy for explaining RoBERTa-large and Flan-T5-large answers on SQuADv2, respectively) while maintaining practical efficiency on models with 100M-1B parameters. We achieved 99.92% node coverage across 31,465 computation graph nodes from 15 diverse architectures, including state-space models (Mamba), audio transformers (Whisper), and multimodal systems (DePlot), without any model-specific code, using rules implemented for 47 fundamental operations. Our operation-level decomposition and Promise System establish a sustainable, extensible foundation for LRP across evolving architectures. All code is available at https://github.com/keeinlev/dynamicLRP.
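To make the conservation idea concrete, here is a minimal sketch of the generic LRP epsilon rule applied to two individual tensor operations (a bias-free linear map and an elementwise addition), written with plain Python lists. It is a textbook rule, not DynamicLRP's code, it does not model the Promise System, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _stabilize(z: float, eps: float) -> float:
    # Epsilon-rule stabiliser: push the denominator away from zero, keeping its sign.
    return z + eps if z >= 0 else z - eps


def lrp_linear(x: Sequence[float], weights: Sequence[Sequence[float]],
               relevance_out: Sequence[float], eps: float = 1e-9) -> List[float]:
    """LRP-epsilon for z_j = sum_i weights[j][i] * x[i] (no bias term)."""
    relevance_in = [0.0] * len(x)
    for j, row in enumerate(weights):
        z_j = sum(w * xi for w, xi in zip(row, x))
        denom = _stabilize(z_j, eps)
        for i, (w, xi) in enumerate(zip(row, x)):
            # Input i receives the share x_i * w_ji / z_j of output j's relevance.
            relevance_in[i] += xi * w / denom * relevance_out[j]
    return relevance_in


def lrp_add(a: Sequence[float], b: Sequence[float],
            relevance_out: Sequence[float], eps: float = 1e-9
            ) -> Tuple[List[float], List[float]]:
    """LRP-epsilon for elementwise z = a + b: split relevance by each addend's share."""
    rel_a, rel_b = [], []
    for ai, bi, r in zip(a, b, relevance_out):
        denom = _stabilize(ai + bi, eps)
        rel_a.append(ai / denom * r)
        rel_b.append(bi / denom * r)
    return rel_a, rel_b
```

Because the bias is zero and eps is tiny, the relevance summed over inputs matches the relevance summed over outputs up to rounding, which is the conservation property the abstract refers to.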

2512.06476 2026-03-24 cs.CL

Knowing What's Missing: Assessing Information Sufficiency in Question Answering

Akriti Jain, Aparna Garimella

Comments Accepted to EACL Findings 2026

2512.05959 2026-03-24 cs.CL cs.AI cs.CV

M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG

David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra Winata

Comments Accepted to CVPR 2026

2512.05556 2026-03-24 cs.LG cs.AI

Beyond Linear Surrogates: High-Fidelity Local Explanations for Black-Box Models

Sanjeev Shrestha, Rahul Dubey, Hui Liu

Comments Accepted at the AAAI Spring Symposium Series 2026

2512.03794 2026-03-24 cs.CV cs.AI cs.CL cs.LG

AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition

Zichuan Lin, Yicheng Liu, Yang Yang, Lvfang Tao, Deheng Ye

Comments Accepted by CVPR 2026. Code and models are available at https://github.com/AdaptVision/AdaptVision

2512.01755 2026-03-24 cs.CV

FreqEdit: Preserving High-Frequency Features for Robust Multi-Turn Image Editing

Yucheng Liao, Jiajun Liang, Kaiqian Cui, Baoquan Zhao, Haoran Xie, Wei Liu, Qing Li, Xudong Mao

2512.00422 2026-03-24 cs.CV

PhysGen: Physically Grounded 3D Shape Generation for Industrial Design

Yingxuan You, Chen Zhao, Hantao Zhang, Ming Xu, Pascal Fua

Comments Accepted to CVPR 2026. 14 pages, 10 figures

2512.00065 2026-03-24 cs.CV cs.AI

Satellite to Street: Disaster Impact Estimator

Sreesritha Sai, Sai Venkata Suma Sreeja, Sai Sri Deepthi, Nikhil

Comments 6 pages, 4 figures, 2 tables

Abstract

Accurate assessment of post-disaster damage is essential for prioritizing emergency response, yet current practices rely heavily on manual interpretation of satellite imagery. This approach is time-consuming, subjective, and difficult to scale during large-area disasters. Although recent deep-learning models for semantic segmentation and change detection have improved automation, many still struggle to capture subtle structural variations and often perform poorly on highly imbalanced datasets, where undamaged buildings dominate. This thesis introduces Satellite-to-Street: Disaster Impact Estimator, a deep-learning framework that produces detailed, pixel-level damage maps by jointly analyzing pre- and post-disaster satellite images. The model is built on a modified dual-input U-Net architecture that strengthens feature fusion between the two images, allowing it to detect not only small, localized changes but also broader contextual patterns across the scene. To address the imbalance between damage categories, a class-aware weighted loss function is used, which helps the model better recognize major and destroyed structures. A consistent preprocessing pipeline is employed to align image pairs, standardize resolutions, and prepare the dataset for training. Experiments conducted on publicly available disaster datasets show that the proposed framework classifies damaged regions better than conventional segmentation networks. The generated damage maps provide a faster and more objective method for analyzing disaster impact, working alongside expert judgment rather than replacing it. In addition to identifying which areas are damaged, the system distinguishes different levels of severity, ranging from slight impact to complete destruction. This provides a more detailed and practical understanding of how the disaster has affected each region.
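A class-aware weighted loss of the kind described can be sketched as pixel-wise cross-entropy with inverse-frequency class weights, assuming flattened integer damage labels and per-pixel class probabilities. The "balanced" weighting w_c = N / (C * n_c) and the function names are assumed choices for illustration; the thesis may weight its damage classes differently.

```python
import math
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Assumed 'balanced' weights w_c = N / (C * n_c); absent classes get weight 0."""
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def weighted_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                           weights: Sequence[float], eps: float = 1e-12) -> float:
    """Per-pixel cross-entropy averaged with class weights (normalised by total weight)."""
    weighted_nll = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        # Clamp the probability so log() never sees zero.
        weighted_nll += -w * math.log(max(p[y], eps))
        weight_sum += w
    return weighted_nll / weight_sum if weight_sum else 0.0
```

Rare classes such as destroyed buildings receive larger weights, so their misclassified pixels cost more. Normalising by the summed weights mirrors how PyTorch's weighted cross-entropy averages with its default mean reduction.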

2511.22862 2026-03-24 cs.LG cs.CV

Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation

Jiacheng Li, Songhe Feng

Comments Accepted by AAAI 2026 (Oral)

DOI: 10.1609/aaai.v40i27.39457
Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 2026, 40(27): 22931-22939

Abstract

Test-time adaptation (TTA) enables online model adaptation using only unlabeled test data, aiming to bridge the gap between source and target distributions. However, in multimodal scenarios, varying degrees of distribution shift across modalities give rise to a complex coupling of unimodal shallow feature shift and cross-modal high-level semantic misalignment, posing a major obstacle to extending existing TTA methods to the multimodal setting. To address this challenge, we propose a novel multimodal test-time adaptation (MMTTA) framework, termed Bridging Modalities via Progressive Re-alignment (BriMPR). BriMPR consists of two progressively enhanced modules and tackles the coupling effect with a divide-and-conquer strategy. Specifically, we first decompose MMTTA into multiple unimodal feature alignment sub-problems. Leveraging the strong function approximation ability of prompt tuning, we calibrate each unimodal global feature distribution to its respective source distribution, achieving an initial semantic re-alignment across modalities. Subsequently, we assign credible pseudo-labels to combinations of masked and complete modalities and introduce inter-modal instance-wise contrastive learning to further enhance information interaction among modalities and refine the alignment. Extensive experiments on MMTTA tasks, including both corruption-based and real-world domain shift benchmarks, demonstrate the superiority of our method. Our source code is available at https://github.com/Luchicken/BriMPR.
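Inter-modal instance-wise contrastive learning is commonly implemented with an InfoNCE objective, in which the other modality's embedding of the same instance is the positive and all other instances in the batch are negatives. The sketch below assumes that form, cosine similarity and a temperature of 0.1; it omits BriMPR's pseudo-labels and masked-modality combinations, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors: Sequence[Sequence[float]], positives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] (modality A) should match positives[i] (modality B).

    Every positives[j] with j != i acts as a negative for anchors[i].
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)
```

When paired embeddings from the two modalities point in the same direction the loss approaches zero; when instances are mismatched across modalities it grows roughly like 1 / temperature.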

2511.20011 2026-03-24 cs.CV cs.AI

Multi-Context Fusion Transformer for Pedestrian Crossing Intention Prediction in Urban Environments

Yuanzhe Li, Hang Zhong, Steffen Müller

2511.17848 2026-03-24 cs.LG cond-mat.mtrl-sci

Scaling Kinetic Monte-Carlo Simulations of Grain Growth with Combined Convolutional and Graph Neural Networks

Zhihui Tian, Ethan Suwandi, Tomas Oppelstrup, Vasily V. Bulatov, Joel B. Harley, Fei Zhou

Comments Accepted for publication in Acta Materialia

2511.16407 2026-03-24 cs.RO

LAOF: Robust Latent Action Learning with Optical Flow Constraints

Xizhou Bu, Jiexi Lyu, Fulei Sun, Ruichen Yang, Zhiqiang Ma, Wei Li

Comments CVPR 2026; Project page: https://github.com/XizoB/LAOF

2511.12985 2026-03-24 cs.LG cs.CV

Angular Gradient Sign Method: Uncovering Vulnerabilities in Hyperbolic Networks

Minsoo Jo, Dongyoon Yang, Taesup Kim

Comments Accepted by AAAI 2026. Code available at: https://github.com/J-Minsoo/AGSM

2511.12876 2026-03-24 cs.AI econ.GN q-fin.EC

Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making

Heyang Ma, Qirui Mi, Qipeng Yang, Zijun Fan, Bo Li, Haifeng Zhang

Comments Extended version of an accepted paper at AAAI 2026

2511.11236 2026-03-24 cs.CV

StyleQoRA: Quality-Aware Low-Rank Adaptation for Few-Shot Multi-Style Editing

Cong Cao, Huanjing Yue, Yujie Xu, Xiaodong Xu

2511.08935 2026-03-24 cs.RO cs.CV

Expand Your SCOPE: Semantic Cognition over Potential-Based Exploration for Embodied Visual Navigation

Ningnan Wang, Weihuang Chen, Liming Chen, Haoxuan Ji, Zhongyu Guo, Xuchong Zhang, Hongbin Sun

Comments Accepted to AAAI 2026

2511.08455 2026-03-24 cs.CL

Bot Meets Shortcut: How Can LLMs Aid in Handling Unknown Invariance OOD Scenarios?

Shiyan Zheng, Herun Wan, Minnan Luo, Junhang Huang