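arXivDaily daily academic digest: syncs the full arXiv feed with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and other fields.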

2512.16848 2026-03-10 cs.LG cs.AI

Meta-RL Induces Exploration in Language Agents

Yulun Jiang, Liangze Jiang, Damien Teney, Michael Moor, Maria Brbic

Comments ICLR 2026

Abstract

Reinforcement learning (RL) has enabled the training of large language model (LLM) agents that interact with the environment and solve multi-turn, long-horizon tasks. However, RL-trained agents often struggle in tasks that require active exploration and fail to adapt efficiently from trial-and-error experience. In this paper, we present LaMer, a general Meta-RL framework that enables LLM agents to actively explore and learn from environment feedback at test time. LaMer consists of two key components: (i) a cross-episode training framework that encourages exploration and long-term reward optimization; and (ii) in-context policy adaptation via reflection, which allows the agent to adapt its policy from task feedback signals without gradient updates. Experiments across diverse environments show that LaMer significantly improves performance over RL baselines, with gains of 11%, 14%, and 19% on Sokoban, MineSweeper, and Webshop, respectively. Moreover, LaMer generalizes better than RL-trained agents to more challenging or previously unseen tasks. Overall, our results demonstrate that Meta-RL provides a principled approach to inducing exploration in language agents, enabling more robust adaptation to novel environments through learned exploration strategies.
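To make the idea of cross-episode credit concrete, here is a minimal Python sketch, assuming the agent attempts the same task for several consecutive episodes and that each episode's training return also counts discounted rewards from later episodes. The function name `cross_episode_returns` and the cross-episode discount `gamma` are illustrative assumptions, not LaMer's published objective.

```python
from typing import List, Sequence


def cross_episode_returns(episode_rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_k = sum_{j >= k} gamma**(j - k) * R_j for each episode k.

    Assumption (hypothetical formulation): R_j is the total reward of episode j
    within one multi-episode trial on the same task, and gamma discounts across
    episodes rather than across individual steps.
    """
    returns = [0.0] * len(episode_rewards)
    running = 0.0
    # Sweep backwards so each episode accumulates the discounted rewards of later ones.
    for k in range(len(episode_rewards) - 1, -1, -1):
        running = episode_rewards[k] + gamma * running
        returns[k] = running
    return returns


if __name__ == "__main__":
    # Three attempts: the first two fail (reward 0), the third succeeds (reward 1).
    print(cross_episode_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

With `gamma = 0` this reduces to ordinary per-episode returns; with `gamma > 0`, an early exploratory episode that fails but gathers useful information still receives credit when a later episode succeeds.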

2512.16301 2026-03-10 cs.AI cs.CL

Adaptation of Agentic AI: A Survey of Post-Training, Memory, and Skills

Pengcheng Jiang, Jiacheng Lin, Zhiyi Shi, Zifeng Wang, Luxi He, Yichen Wu, Ming Zhong, Peiyang Song, Qizheng Zhang, Heng Wang, Xueqiang Xu, Hanwen Xu, Pengrui Han, Dylan Zhang, Jiashuo Sun, Chaoqi Yang, Kun Qian, Tian Wang, Changran Hu, Manling Li, Quanzheng Li, Hao Peng, Sheng Wang, Jingbo Shang, Chao Zhang, Jiaxuan You, Liyuan Liu, Pan Lu, Yu Zhang, Heng Ji, Yejin Choi, Dawn Song, Jimeng Sun, Jiawei Han

2512.13215 2026-03-10 cs.RO

Multi-directional Safe Rectangle Corridor-Based MPC for Nonholonomic Robots Navigation in Cluttered Environment

Yinsong Qu, Yunxiang Li, Shanlin Zhong

Comments 9 pages, 11 figures, conference paper for the 2025 International Conference on Advanced Robotics and Mechatronics (ICARM), accepted

DOI: 10.1109/ICARM65671.2025.11293633
Journal ref: 2025 International Conference on Advanced Robotics and Mechatronics (ICARM), Portsmouth, United Kingdom, 2025, pp. 634-641

Abstract

Autonomous Mobile Robots (AMRs) have become indispensable in industrial applications due to their operational flexibility and efficiency. Navigation serves as a crucial technical foundation for accomplishing complex tasks. However, navigating AMRs in dense, cluttered, and semi-structured environments remains challenging, primarily due to nonholonomic vehicle dynamics, interactions with mixed static/dynamic obstacles, and the non-convex, constrained nature of such operational spaces. To address these problems, this paper proposes an Improved Sequential Model Predictive Control (ISMPC) navigation framework that systematically reformulates navigation tasks as sequential switched optimal control problems. The framework addresses the aforementioned challenges through two key innovations: 1) a Multi-Directional Safety Rectangular Corridor (MDSRC) algorithm, which encodes the free space as rectangular convex regions to avoid collisions with static obstacles, eliminating redundant computation and accelerating solver convergence; 2) a sequential MPC navigation framework that integrates corridor constraints with barrier function constraints to achieve both static and dynamic obstacle avoidance. The ISMPC navigation framework generates velocity commands for AMRs directly, simplifying traditional navigation algorithm architectures. Comparative experiments demonstrate the framework's superior free-space utilization (a 41.05% increase in average corridor area) while maintaining real-time computational performance (an average corridor generation latency of 3 ms).
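The corridor idea can be illustrated with a simple greedy rectangle-growing routine on a 2D occupancy grid. This is a minimal sketch under stated assumptions: the grid encoding (0 = free, 1 = occupied), the helper names `block_is_free` and `grow_rectangle`, and the fixed expansion order are all hypothetical, and the rectangle is axis-aligned only, whereas MDSRC considers multiple directions.

```python
from typing import List, Tuple

Grid = List[List[int]]  # hypothetical encoding: 0 = free cell, 1 = occupied cell
Rect = Tuple[int, int, int, int]  # (top, bottom, left, right), inclusive cell indices


def block_is_free(grid: Grid, r0: int, r1: int, c0: int, c1: int) -> bool:
    """True if rows r0..r1 and columns c0..c1 lie inside the grid and are all free."""
    if r0 < 0 or c0 < 0 or r1 >= len(grid) or c1 >= len(grid[0]):
        return False
    return all(grid[r][c] == 0 for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))


def grow_rectangle(grid: Grid, seed: Tuple[int, int]) -> Rect:
    """Greedily inflate an axis-aligned free rectangle around a free seed cell."""
    r, c = seed
    if grid[r][c] != 0:
        raise ValueError("seed cell must be free")
    top, bottom, left, right = r, r, c, c
    grown = True
    while grown:
        grown = False
        # Try to push each side out by one cell; accept only if the new strip is obstacle-free.
        if block_is_free(grid, top - 1, top - 1, left, right):
            top -= 1
            grown = True
        if block_is_free(grid, bottom + 1, bottom + 1, left, right):
            bottom += 1
            grown = True
        if block_is_free(grid, top, bottom, left - 1, left - 1):
            left -= 1
            grown = True
        if block_is_free(grid, top, bottom, right + 1, right + 1):
            right += 1
            grown = True
    return top, bottom, left, right


if __name__ == "__main__":
    grid = [
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
    ]
    print(grow_rectangle(grid, (1, 1)))  # (0, 2, 1, 2)
```

Because expansion is greedy and order-dependent, the result is maximal (no side can grow further) but not necessarily the largest free rectangle. Each such box is convex and can be written as four linear inequalities on the robot position, which is what makes rectangular corridors convenient as MPC constraints.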

2512.10416 2026-03-10 cs.CV cs.AI

Beyond Endpoints: Path-Centric Reasoning for Vectorized Off-Road Network Extraction

Wenfei Guan, Jilin Mei, Tong Shen, Xumin Wu, Shuo Wang, Chen Min, Yu Hu

Comments This revision improves clarity and consistency throughout the paper. We refine terminology to more precisely describe the vertex extraction optimization, add motivational context to the edge feature encoding section, and clarify the overall inference pipeline. We also add an Acknowledgments section

2512.08877 2026-03-10 cs.RO

IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams

Ryan LeRoy, Jack Kolb

Comments 4 pages, 3 figures, appendix

2512.08564 2026-03-10 cs.CV

Modular Neural Image Signal Processing

Mahmoud Afifi, Zhongling Wang, Ran Zhang, Michael S. Brown

2512.07580 2026-03-10 cs.CV

When Token Pruning is Worse than Random: Understanding Visual Token Information in VLLMs

Yahong Wang, Juncheng Wu, Zhangkai Ni, Longzhen Yang, Yihang Liu, Chengmei Yang, Ying Wen, Lianghua He, Xianfeng Tang, Hui Liu, Yuyin Zhou

Comments Accepted to CVPR 2026

Abstract

Vision Large Language Models (VLLMs) incur high computational costs because they rely on hundreds of visual tokens to represent each image. While token pruning is a promising way to accelerate inference, this paper makes a key observation: in deeper layers (e.g., beyond the 20th), existing training-free pruning methods perform no better than random pruning. We hypothesize that this degradation is caused by "vanishing token information", where visual tokens progressively lose their salience as network depth increases. To validate this hypothesis, we quantify a token's information content by measuring the change in the model's output probabilities upon its removal. Using this metric, our analysis of visual-token information across layers reveals three key findings: (1) As layers deepen, the information of visual tokens gradually becomes uniform and eventually vanishes at an intermediate layer, which we term the "information horizon", beyond which visual tokens become redundant; (2) The position of this horizon is not static; it extends deeper for visually intensive tasks, such as Optical Character Recognition (OCR), than for more general tasks like Visual Question Answering (VQA); (3) The horizon is also strongly correlated with model capacity, as stronger VLLMs (e.g., Qwen2.5-VL) make use of visual tokens in deeper layers than weaker models (e.g., LLaVA-1.5) do. Based on these findings, we show that simple random pruning in deep layers efficiently balances performance and efficiency. Moreover, integrating random pruning consistently enhances existing methods. Combining DivPrune with random pruning achieves state-of-the-art results, maintaining 96.9% of Qwen2.5-VL-7B's performance while pruning 50% of visual tokens. The code is available at https://github.com/YahongWang1/Information-Horizon.
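The two ideas in this abstract, a removal-based information score and random pruning, can be sketched in a few lines. This is a toy illustration under stated assumptions: `toy_prob` is a hypothetical stand-in for a VLLM's probability of the reference answer, tokens are plain numbers rather than hidden states, and pruning is applied to a flat list rather than inside a specific deep layer.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def token_information(prob_fn: Callable[[List[float]], float], tokens: Sequence[float]) -> List[float]:
    """Score token i by |p(answer | all tokens) - p(answer | all tokens except i)|."""
    base = prob_fn(list(tokens))
    scores = []
    for i in range(len(tokens)):
        ablated = [t for j, t in enumerate(tokens) if j != i]  # drop by index, not by value
        scores.append(abs(base - prob_fn(ablated)))
    return scores


def random_prune(tokens: Sequence[T], keep_ratio: float, seed: int = 0) -> List[T]:
    """Keep a uniformly random subset of tokens, preserving their original order."""
    k = round(len(tokens) * keep_ratio)
    rng = random.Random(seed)
    kept = sorted(rng.sample(range(len(tokens)), k))
    return [tokens[i] for i in kept]


def toy_prob(tokens: List[float]) -> float:
    """Hypothetical model: answer probability = sigmoid(sum of token values)."""
    return 1.0 / (1.0 + math.exp(-sum(tokens)))


if __name__ == "__main__":
    visual_tokens = [2.0, 0.0, -1.0]
    # Token 1 scores 0.0: removing it leaves the output probability unchanged.
    print(token_information(toy_prob, visual_tokens))
    # Keeps 4 of the 8 indices, in their original order.
    print(random_prune(list(range(8)), keep_ratio=0.5))
```

Under this toy model, a token whose removal leaves the output unchanged carries zero information. In the abstract's terms, past the information horizon visual tokens become uniformly uninformative like this, which is why a random subset performs as well as a carefully selected one there.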

2512.03112 2026-03-10 cs.LG cs.AI stat.ML

Beyond Additivity: Sparse Isotonic Shapley Regression toward Nonlinear Explainability

Jialai She

Abstract

Shapley values, a gold standard for feature attribution in Explainable AI, face two key challenges. First, the canonical Shapley framework assumes that the worth function is additive, yet real-world payoff constructions, driven by non-Gaussian distributions, heavy tails, feature dependence, or domain-specific loss scales, often violate this assumption, leading to distorted attributions. Second, obtaining sparse explanations in high-dimensional settings by computing dense Shapley values and then applying ad hoc thresholding is costly and risks inconsistency. We introduce Sparse Isotonic Shapley Regression (SISR), a unified nonlinear explanation framework. SISR simultaneously learns a monotonic transformation that restores additivity, obviating the need for a closed-form specification, and enforces an L0 sparsity constraint on the Shapley vector, improving computational efficiency in large feature spaces. Its optimization algorithm uses Pool-Adjacent-Violators for efficient isotonic regression and normalized hard-thresholding for support selection, ensuring ease of implementation and global convergence guarantees. Analysis shows that SISR recovers the true transformation in a wide range of scenarios and achieves strong support recovery even under high noise. Moreover, we are the first to demonstrate that irrelevant features and inter-feature dependencies can induce a true payoff transformation that deviates substantially from linearity. Extensive experiments demonstrate that SISR stabilizes attributions across payoff schemes and correctly filters out irrelevant features; in contrast, standard Shapley values suffer severe rank and sign distortions. By unifying nonlinear transformation estimation with sparsity pursuit, SISR advances the frontier of nonlinear explainability, providing a theoretically grounded and practical attribution framework.
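The two optimization building blocks named in this abstract are standard and can be sketched independently. Below, `pava` is the classic Pool-Adjacent-Violators algorithm for weighted least-squares isotonic (non-decreasing) regression, and `hard_threshold` keeps the k largest-magnitude entries of a vector. This is an illustrative sketch, not SISR itself: how SISR alternates these steps, and the normalization in its "normalized hard-thresholding", are not shown here.

```python
from typing import List, Optional, Sequence


def pava(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted least-squares fit to y under a non-decreasing constraint."""
    if w is None:
        w = [1.0] * len(y)
    blocks: List[List[float]] = []  # each block: [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per input point.
    fit: List[float] = []
    for mean, _, count in blocks:
        fit.extend([mean] * int(count))
    return fit


def hard_threshold(v: Sequence[float], k: int) -> List[float]:
    """Keep the k entries of v with the largest absolute value; zero out the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


if __name__ == "__main__":
    print(pava([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
    print(hard_threshold([0.1, -2.0, 0.5, 1.5], k=2))  # [0.0, -2.0, 0.0, 1.5]
```

PAVA runs in linear time overall because each point is merged into a block at most once, which is why it is a natural choice for the isotonic step inside an iterative solver.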

2512.03034 2026-03-10 cs.CV

MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

Youxin Pang, Jiajun Liu, Lingfeng Tan, Yong Zhang, Feng Gao, Xiang Deng, Zhuoliang Kang, Xiaoming Wei, Yebin Liu

2512.02593 2026-03-10 cs.CL cs.MA cs.NE cs.SD eess.AS

Spoken Conversational Agents with Large Language Models

Chao-Han Huck Yang, Andreas Stolcke, Larry Heck

Comments Accepted to EMNLP 2025 Tutorial

2512.02581 2026-03-10 cs.LG

Evolving Diffusion and Flow Matching Policies for Online Reinforcement Learning

Chubin Zhang, Zhenglin Wan, Feng Chen, Fuchao Yang, Lang Feng, Yaxin Zhou, Xingrui Yu, Yang You, Ivor Tsang, Bo An

Comments Ver 2

2512.02486 2026-03-10 cs.LG

Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts

Zhongjian Qiao, Rui Yang, Jiafei Lyu, Xiu Li, Zhongxiang Dai, Zhuoran Yang, Siyang Gao, Shuang Qiu

Comments Accepted at ICLR 2026

2512.01782 2026-03-10 cs.LG cs.AI

Dual Randomized Smoothing: Beyond Global Noise Variance

Chenhao Sun, Yuhao Mao, Martin Vechev

Comments ICLR'26

Abstract

Randomized Smoothing (RS) is a prominent technique for certifying the robustness of neural networks against adversarial perturbations. With RS, achieving high accuracy at small radii requires a small noise variance, while achieving high accuracy at large radii requires a large noise variance. However, the global noise variance used in the standard RS formulation imposes a fundamental limitation: no single global noise variance achieves strong performance at both small and large radii simultaneously. To overcome this limitation, we propose a dual RS framework that enables input-dependent noise variances. To this end, we first prove that RS remains valid with input-dependent noise variances, provided the variance is locally constant around each input. Building on this result, we introduce two components: (i) a variance estimator that predicts an optimal noise variance for each input, and (ii) a standard RS classifier that uses this estimated variance. The variance estimator is independently smoothed via RS to ensure local constancy, which allows flexible design. We also introduce training strategies that iteratively optimize the two components. Experiments on CIFAR-10 demonstrate that our dual RS method provides strong performance at both small and large radii, which is unattainable with a global noise variance, while incurring only 60% computational overhead at inference. Moreover, it outperforms prior input-dependent noise approaches across most radii, with gains of 15.6%, 20.0%, and 15.7% at radii 0.5, 0.75, and 1.0. On ImageNet, dual RS remains effective across all radii, with advantages of 8.6%, 17.1%, and 9.1% at radii 0.5, 1.0, and 1.5. Additionally, the dual RS framework offers a routing perspective on certified robustness, improving the accuracy-robustness trade-off with off-the-shelf expert RS models.
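The certification step that dual RS builds on can be sketched with the standard Gaussian randomized-smoothing bound (Cohen et al., 2019): if the smoothed classifier's top-class probability under noise N(0, σ²I) is at least p_A > 1/2, the prediction is certified within L2 radius σ·Φ⁻¹(p_A). In the sketch below, σ is simply passed in per input, standing in for the output of the separately smoothed variance estimator the abstract describes. The estimator itself, the sample counts, and the use of a Hoeffding bound (instead of the Clopper-Pearson interval Cohen et al. use) are assumptions made to keep the example stdlib-only.

```python
import math
from statistics import NormalDist
from typing import Optional


def hoeffding_lower_bound(count_a: int, n: int, alpha: float) -> float:
    """One-sided (1 - alpha) lower confidence bound on p_A from n noisy samples.

    Assumption: Hoeffding's inequality is used here for simplicity; Cohen et al.
    use the tighter Clopper-Pearson interval.
    """
    return count_a / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))


def certified_radius(p_a_lower: float, sigma: float) -> Optional[float]:
    """Certified L2 radius sigma * Phi^{-1}(p_A_lower), or None (abstain) if p_A_lower <= 0.5.

    p_a_lower must be < 1. With input-dependent noise, sigma is the value predicted
    for this particular input and is assumed locally constant around it.
    """
    if p_a_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_a_lower)


if __name__ == "__main__":
    # Hypothetical per-input noise levels and top-class counts out of 1000 noisy samples.
    for sigma, count_a in [(0.25, 995), (1.0, 900)]:
        p_lower = hoeffding_lower_bound(count_a, n=1000, alpha=0.001)
        print(sigma, round(p_lower, 4), certified_radius(p_lower, sigma))
```

The loop illustrates the trade-off described in the abstract: a small σ tends to keep p_A high but caps the achievable radius, while a large σ can certify larger radii at the cost of a lower p_A. Choosing σ per input is what the dual RS framework exploits.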

2512.01763 2026-03-10 cs.CV

HiconAgent: History Context-aware Policy Optimization for GUI Agents

Xurui Zhou, Gongwei Chen, Yuquan Xie, Zaijing Li, Kaiwen Zhou, Shuai Wang, Shuo Yang, Zhuotao Tian, Rui Shao

2512.00912 2026-03-10 cs.CV cs.AI cs.LG

ForamDeepSlice: A High-Accuracy Deep Learning Framework for Foraminifera Species Classification from 2D Micro-CT Slices

Abdelghafour Halimi, Ali Alibrahim, Didier Barradas-Bautista, Ronell Sicat, Abdulkader M. Afifi

2511.18694 2026-03-10 cs.RO cs.AI cs.CV

Stable Multi-Drone GNSS Tracking System for Marine Robots

Shuo Wen, Edwin Meriaux, Mariana Sosa Guzmán, Zhizun Wang, Junming Shi, Gregory Dudek

2511.16670 2026-03-10 cs.CV

Learning to Think Fast and Slow for Visual Language Models

Chenyu Lin, Cheng Chi, Jinlin Wu, Sharon Li, Kaiyang Zhou

2511.08946 2026-03-10 cs.LG

Improving Conditional VAE with Non-Volume Preserving transformations

Tuhin Subhra De

Comments Independent Work

2511.06830 2026-03-10 cs.CV

MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks

Tianang Chen, Jian Jin, Shilv Cai, Zhuangzi Li, Weisi Lin

Comments ICASSP 2026

2511.05694 2026-03-10 cs.LG

Distributionally Robust Self Paced Curriculum Reinforcement Learning

Anirudh Satheesh, Keenan Powell, Vaneet Aggarwal

2511.02872 2026-03-10 cs.LG cs.AI cs.FL cs.LO

FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels

Jiedong Jiang, Wanyi He, Yuefeng Wang, Guoxiong Gao, Yongle Hu, Jingting Wang, Nailin Guan, Peihao Wu, Chunbo Dai, Liang Xiao, Bin Dong

2511.01743 2026-03-10 cs.LG cs.AI cs.NI

Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing

Song Gao, Songyang Zhang, Shusen Jing, Shuai Zhang, Xiangwei Zhou, Yue Wang, Zhipeng Cai

2510.27178 2026-03-10 cs.RO

MobiDock: Design and Control of A Modular Self Reconfigurable Bimanual Mobile Manipulator via Robotic Docking

Xuan-Thuan Nguyen, Khac Nam Nguyen, Ngoc Duy Tran, Thi Thoa Mac, Anh Nguyen, Hoang Hiep Ly, Tung D. Ta

Comments Submitted to IROS 2026

2510.26909 2026-03-10 cs.RO

NaviTrace: Evaluating Embodied Navigation of Vision-Language Models

Tim Windecker, Manthan Patel, Moritz Reuss, Richard Schwarzkopf, Cesar Cadena, Rudolf Lioutikov, Marco Hutter, Jonas Frey

Comments 11 pages, 6 figures, with appendix, accepted to ICRA 2026

2510.24793 2026-03-10 cs.CL cs.AI

SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications

Edouard Lansiaux, Antoine Simonet, Eric Wiel

2510.24118 2026-03-10 cs.RO cs.AI

LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation

Haotian Zhou, Xiaole Wang, He Li, Zhuo Qi, Jinrun Yin, Haiyu Kong, Jianghuan Xu, Huijing Zhao

2510.20542 2026-03-10 cs.LG

A Unified Framework for Zero-Shot Reinforcement Learning

Jacopo Di Ventura, Jan Felix Kleuker, Aske Plaat, Thomas Moerland

2510.20331 2026-03-10 cs.CV

AnyPcc: Compressing Any Point Cloud with a Single Universal Model

Kangli Wang, Qianxi Yi, Yuqi Ye, Shihao Li, Wei Gao

Comments CVPR 2026

2510.18591 2026-03-10 cs.LG

Robustness Verification of Graph Neural Networks Via Lightweight Satisfiability Testing

Chia-Hsuan Lu, Tony Tan, Michael Benedikt

2510.17525 2026-03-10 cs.RO

HumanHalo - Safe and Efficient 3D Navigation Among Humans via Minimally Conservative MPC

Simon Schaefer, Helen Oleynikova, Sandra Hirche, Stefan Leutenegger