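arXivDaily: a daily academic digest synchronized with the full arXiv dataset, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and other fields.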

2603.24093 2026-03-26 cs.LG cs.AI

Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang, Ran Tao, Bryan Dai, Wayne Xin Zhao, Jian Yang, Hongteng Xu

Abstract

Recently, reinforcement learning (RL) has become an important approach for improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards (RLVR) has emerged as a promising paradigm for reasoning tasks. However, existing RL-based training still remains only a rough approximation to human learning. Human learners leverage both external and internal experience to guide exploration and gradually internalize useful trajectories into stable knowledge. Motivated by this gap, we ask: how can LLMs better utilize and internalize experience during RLVR training? To answer this question, we propose Dual Guidance Optimization (DGO), a unified framework that leverages external and internal experience to improve training effectiveness. Specifically, DGO first constructs an experience bank from previously explored trajectories. The policy then performs exploration under the joint guidance of the experience bank and the model's internal knowledge. The resulting trajectories are further used to refine the experience bank and optimize model parameters, forming a closed loop of experience utilization and internalization. Experiments show that DGO consistently outperforms baseline methods, suggesting that better utilization and internalization of experience lead to more effective reasoning.

2603.24086 2026-03-26 cs.CV cs.GR

LGTM: Training-Free Light-Guided Text-to-Image Diffusion Model via Initial Noise Manipulation

Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser, Ko Watanabe, Riku Takahashi, Andreas Dengel

Comments Accepted to IJCNN2026

2603.24084 2026-03-26 cs.AI

Bridging the Evaluation Gap: Standardized Benchmarks for Multi-Objective Search

Hadar Peer, Carlos Hernandez, Sven Koenig, Ariel Felner, Oren Salzman

2603.24083 2026-03-26 cs.RO cs.AI cs.LG

Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning

Aditya Narendra, Mukhammadrizo Maribjonov, Dmitry Makarov, Dmitry Yudin, Aleksandr Panov

Comments 8 pages, 8 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA 2026)

2603.24079 2026-03-26 cs.CV cs.AI cs.CR

When Understanding Becomes a Risk: Authenticity and Safety Risks in the Emerging Image Generation Paradigm

Ye Leng, Junjie Chu, Mingjie Li, Chenhao Lin, Chao Shen, Michael Backes, Yun Shen, Yang Zhang

Comments Accepted by CVPR 2026. 15 pages, 11 figures

2603.24078 2026-03-26 cs.CV

PosterIQ: A Design Perspective Benchmark for Poster Understanding and Generation

Yuheng Feng, Wen Zhang, Haodong Duan, Xingxing Zou

Comments CVPR 2026, Project Page: https://github.com/ArtmeScienceLab/PosterIQ-Benchmark

2603.24076 2026-03-26 cs.LG cs.SY eess.SY

The impact of sensor placement on graph-neural-network-based leakage detection

J. J. H. van Gemert, V. Breschi, D. R. Yntema, K. J. Keesman, M. Lazar

2603.24073 2026-03-26 cs.CL

ConceptKT: A Benchmark for Concept-Level Deficiency Prediction in Knowledge Tracing

Yu-Chen Kang, Yu-Chien Tang, An-Zi Yen

Comments Accepted by LREC 2026

2603.24065 2026-03-26 cs.AI

Enhanced Mycelium of Thought (EMoT): A Bio-Inspired Hierarchical Reasoning Architecture with Strategic Dormancy and Mnemonic Encoding

Florian Odi Stummer

Comments 32 pages, 6 figures, 15 tables; includes ablation studies and reasoning trace visualisation

2603.24059 2026-03-26 cs.CV

AD-Reasoning: Multimodal Guideline-Guided Reasoning for Alzheimer's Disease Diagnosis

Qiuhui Chen, Yushan Deng, Xuancheng Yao, Yi Hong

Comments ICME 2026

2603.24057 2026-03-26 cs.CV

Beyond Semantic Priors: Mitigating Optimization Collapse for Generalizable Visual Forensics

Jipeng Liu, Haichao Shi, Siyu Xing, Rong Yin, Xiao-Yu Zhang

Abstract

While Vision-Language Models (VLMs) like CLIP have emerged as a dominant paradigm for generalizable deepfake detection, a representational disconnect remains: their semantic-centric pre-training is ill-suited for capturing non-semantic artifacts inherent to hyper-realistic synthesis. In this work, we identify a failure mode termed Optimization Collapse, where detectors trained with Sharpness-Aware Minimization (SAM) degenerate to random guessing on non-semantic forgeries once the perturbation radius exceeds a narrow threshold. To theoretically formalize this collapse, we propose the Critical Optimization Radius (COR) to quantify the geometric stability of the optimization landscape, and leverage the Gradient Signal-to-Noise Ratio (GSNR) to measure generalization potential. We establish a theorem proving that COR increases monotonically with GSNR, thereby revealing that the geometric instability of SAM optimization originates from degraded intrinsic generalization potential. This result identifies the layer-wise attenuation of GSNR as the root cause of Optimization Collapse in detecting non-semantic forgeries. Although naively reducing perturbation radius yields stable convergence under SAM, it merely treats the symptom without mitigating the intrinsic generalization degradation, necessitating enhanced gradient fidelity. Building on this insight, we propose the Contrastive Regional Injection Transformer (CoRIT), which integrates a computationally efficient Contrastive Gradient Proxy (CGP) with three training-free strategies: Region Refinement Mask to suppress CGP variance, Regional Signal Injection to preserve CGP magnitude, and Hierarchical Representation Integration to attain more generalizable representations. Extensive experiments demonstrate that CoRIT mitigates optimization collapse and achieves state-of-the-art generalization across cross-domain and universal forgery benchmarks.

2603.24051 2026-03-26 cs.CL

FinToolSyn: A forward synthesis Framework for Financial Tool-Use Dialogue Data with Dynamic Tool Retrieval

Caishuang Huang, Yang Qiao, Rongyu Zhang, Junjie Ye, Pu Lu, Wenxi Wu, Meng Zhou, Xiku Du, Tao Gui, Qi Zhang, Xuanjing Huang

2603.24047 2026-03-26 cs.RO

PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning

Huanyu Li, Dewei Wang, Xinmiao Wang, Xinzhe Liu, Peng Liu, Chenjia Bai, Xuelong Li

Comments 8 pages, 7 figures

2603.24045 2026-03-26 cs.CV

LGEST: Dynamic Spatial-Spectral Expert Routing for Hyperspectral Image Classification

Jiawen Wen, Suixuan Qiu, Zihang Luo, Xiaofei Yang, Haotian Shi

2603.24044 2026-03-26 cs.LG cs.CL

MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning

Andrea Manzoni

Comments 17 pages, 6 figures, 10 tables

2603.24043 2026-03-26 cs.CV

HAM: A Training-Free Style Transfer Approach via Heterogeneous Attention Modulation for Diffusion Models

Yeqi He, Liang Li, Zhiwen Yang, Xichun Sheng, Zhidong Zhao, Chenggang Yan

Comments Accepted in CVPR 2026 Findings

2603.24039 2026-03-26 cs.CV cs.GR cs.HC

SemLayer: Semantic-aware Generative Segmentation and Layer Construction for Abstract Icons

Haiyang Xu, Ronghuan Wu, Li-Yi Wei, Nanxuan Zhao, Chenxi Liu, Cuong Nguyen, Zhuowen Tu, Zhaowen Wang

Comments Accepted to CVPR 2026

2603.24036 2026-03-26 cs.CV

SpectralSplats: Robust Differentiable Tracking via Spectral Moment Supervision

Avigail Cohen Rimon, Amir Mann, Mirela Ben Chen, Or Litany

Comments Project page: https://avigailco.github.io/SpectralSplats/

2603.24034 2026-03-26 cs.CL cs.AI

From Oracle to Noisy Context: Mitigating Contextual Exposure Bias in Speech-LLMs

Xiaoyong Guo, Nanjie Li, Zijie Zeng, Kai Wang, Hao Huang, Haihua Xu, Wei Shi

2603.24030 2026-03-26 cs.CV cs.MM

Decompose and Transfer: CoT-Prompting Enhanced Alignment for Open-Vocabulary Temporal Action Detection

Sa Zhu, Wanqian Zhang, Lin Wang, Xiaohua Chen, Chenxu Cui, Jinchao Zhang, Bo Li

Comments Accepted by CVPR 2026

2603.24025 2026-03-26 cs.LG stat.ME

i-IF-Learn: Iterative Feature Selection and Unsupervised Learning for High-Dimensional Complex Data

Chen Ma, Wanjie Wang, Shuhao Fan

Comments 28 pages, 5 figures, including appendix. Accepted at AISTATS

2603.24023 2026-03-26 cs.CL cs.AI

Schema on the Inside: A Two-Phase Fine-Tuning Method for High-Efficiency Text-to-SQL at Scale

Chinmay Soni, Shivam Chourasia, Gaurav Kumar, Hitesh Kapoor

Comments 8 pages, 6 figures. Published in the Proceedings of the Fortieth AAAI Conference on Artificial Intelligence (AAAI-26), 2026

2603.24021 2026-03-26 cs.RO

QuadFM: Foundational Text-Driven Quadruped Motion Dataset for Generation and Control

Li Gao, Fuzhi Yang, Jianhui Chen, Liu Liu, Yao Zheng, Yang Cai, Ziqiao Li

2603.24018 2026-03-26 cs.AI

ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents

Bingqing Wei, Zhongyu Xia, Dingai Liu, Xiaoyu Zhou, Zhiwei Lin, Yongtao Wang

2603.24016 2026-03-26 cs.CV cs.LG

COVTrack++: Learning Open-Vocabulary Multi-Object Tracking from Continuous Videos via a Synergistic Paradigm

Zekun Qian, Wei Feng, Ruize Han, Junhui Hou

Abstract

Multi-Object Tracking (MOT) has traditionally focused on a few specific categories, restricting its applicability to real-world scenarios involving diverse objects. Open-Vocabulary Multi-Object Tracking (OVMOT) addresses this by enabling tracking of arbitrary categories, including novel objects unseen during training. However, current progress is constrained by two challenges: the lack of continuously annotated video data for training, and the lack of a customized OVMOT framework to synergistically handle detection and association. We address the data bottleneck by constructing C-TAO, the first continuously annotated training set for OVMOT, which increases annotation density by 26x over the original TAO and captures smooth motion dynamics and intermediate object states. For the framework bottleneck, we propose COVTrack++, a synergistic framework that achieves a bidirectional reciprocal mechanism between detection and association through three modules: (1) Multi-Cue Adaptive Fusion (MCF) dynamically balances appearance, motion, and semantic cues for association feature learning; (2) Multi-Granularity Hierarchical Aggregation (MGA) exploits hierarchical spatial relationships in dense detections, where visible child nodes (e.g., object parts) assist occluded parent objects (e.g., whole body) for association feature enhancement; (3) Temporal Confidence Propagation (TCP) recovers flickering detections through high-confidence tracked objects boosting low-confidence candidates across frames, stabilizing trajectories. Extensive experiments on TAO demonstrate state-of-the-art performance, with novel TETA reaching 35.4% and 30.5% on validation and test sets, improving novel AssocA by 4.8% and novel LocA by 5.8% over previous methods, and show strong zero-shot generalization on BDD100K. The code and dataset will be publicly available.

2603.24014 2026-03-26 cs.AI

Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing

Xusen Guo, Mingxing Peng, Hongliang Lu, Hai Yang, Jun Ma, Yuxuan Liang

Comments 19 pages, 12 figures

2603.24006 2026-03-26 cs.CV

UW-VOS: A Large-Scale Dataset for Underwater Video Object Segmentation

Hongshen Zhao, Jingkang Tai, Yuhang Wu, Wenkang Zhang, Xi Lan, Shangyan Wang, Tianyu Zhang, Wankou Yang

2603.24005 2026-03-26 cs.CV

DB SwinT: A Dual-Branch Swin Transformer Network for Road Extraction in Optical Remote Sensing Imagery

Zongyang He, Xiangli Yang, Xian Gao, Zhiguo Wang

2603.24004 2026-03-26 cs.CL

Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning

Kun-Yang Yu, Zhi Zhou, Shi-Yu Tian, Xiao-Wen Yang, Zi-Yi Jia, Ming Yang, Zi-Jian Cheng, Lan-Zhe Guo, Yu-Feng Li

Comments 20 pages, 6 figures

2603.23995 2026-03-26 cs.RO

MIRROR: Visual Motion Imitation via Real-time Retargeting and Teleoperation with Parallel Differential Inverse Kinematics

Junheng Li, Lizhi Yang, Aaron D. Ames

Comments 8 pages, 7 figures